How do I perform basic data manipulation with pandas?

Question

QA Hub Editorial · Accepted Answer

Short answer

pandas provides DataFrame and Series objects for tabular data manipulation. You can load data from CSV or dictionaries, filter rows, select columns, handle missing values, and aggregate with groupby.

Steps

1. Import pandas: import pandas as pd.
2. Create or load a DataFrame.
3. Filter, select, and transform.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["NYC", "LA", "NYC"]
})

# Filtering
adults = df[df["age"] >= 30]
print(adults)

# Selection
print(df[["name", "age"]])

# Aggregation
print(df.groupby("city")["age"].mean())
```

Tips

- Use .loc[] for label-based indexing and .iloc[] for position-based indexing.
- fillna() and dropna() manage missing data.
- apply() and map() allow custom transformations.
- Vectorized operations are much faster than row-by-row iteration.

```python
# Vectorized operation
df["age_next_year"] = df["age"] + 1

# Missing data
df["score"] = [90, None, 85]
df["score"] = df["score"].fillna(0)
```

Common issues

- Modifying a slice of a DataFrame with chained indexing (df[...][...] = value) can fail or produce a SettingWithCopyWarning; use .loc instead.
- Large DataFrames can consume significant memory; use dtype optimization or chunking for big files.
- NaN is a float, so integer columns with missing values become float; use Int64 (nullable integer) if needed.
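The .loc and .iloc advice above can be illustrated with a minimal sketch. It reuses the sample columns from the answer; the choice to overwrite missing scores with 0 is only an assumption made for illustration, mirroring the fillna(0) call above.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "score": [90, None, 85],  # None is stored as NaN, so the column is float64
})

# Label-based access: row with index label 1, column "name"
bob_name = df.loc[1, "name"]

# Position-based access: first row, second column
first_age = df.iloc[0, 1]

# A single .loc call selects rows and column together, so the assignment
# targets the original DataFrame rather than a temporary copy.
# (Chained form to avoid: df[df["score"].isna()]["score"] = 0)
df.loc[df["score"].isna(), "score"] = 0

print(bob_name, first_age)
print(df)
```

Doing the row filter and column selection in one .loc call is what avoids the ambiguity behind SettingWithCopyWarning.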

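The memory and dtype points under Common issues can be sketched in the same way. This is a rough illustration, not a tuning guide: the in-memory CSV text, the chunk size of 2, and the repeated city list are all made up for the example.

```python
import io

import pandas as pd

# An integer list with a missing value is inferred as float64 by default
scores = pd.Series([90, None, 85])
print(scores.dtype)

# Requesting the nullable Int64 dtype keeps the values as integers
nullable = pd.Series([90, None, 85], dtype="Int64")
print(nullable.dtype)
filled = nullable.fillna(0)

# dtype optimization: a repetitive string column often shrinks as "category"
cities = pd.Series(["NYC", "LA", "NYC"] * 1000)
as_category = cities.astype("category")
print(cities.memory_usage(deep=True), as_category.memory_usage(deep=True))

# Chunking: read a file in pieces instead of all at once.
# io.StringIO stands in for a large CSV file on disk (hypothetical data).
csv_text = "city,age\nNYC,25\nLA,30\nNYC,35\n"
total_age = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total_age += int(chunk["age"].sum())
print(total_age)
```

How much memory the category conversion saves depends on how many distinct values the column holds, so it is worth measuring with memory_usage(deep=True) on your own data.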