How do I perform basic data manipulation with pandas?
Category: Python Programming
Short answer
pandas provides DataFrame and Series objects for tabular data manipulation. You can load data from CSV or dictionaries, filter rows, select columns, handle missing values, and aggregate with groupby.
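As a minimal sketch of the two loading routes mentioned above, the snippet below builds the same table once from CSV text and once from a dictionary. The CSV content and the variable names are assumptions made for illustration; with a real file you would typically pass its path to read_csv instead of a StringIO object.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice this would usually be a file on disk.
csv_text = """name,age,city
Alice,25,NYC
Bob,30,LA
Charlie,35,NYC
"""

# read_csv accepts a file path or any file-like object.
df_from_csv = pd.read_csv(io.StringIO(csv_text))

# The same table built from a dictionary that maps column names to values.
df_from_dict = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["NYC", "LA", "NYC"],
})

print(df_from_csv.head())
print(df_from_dict.dtypes)
```

Either route gives a DataFrame with one row per record and one column per field, so the filtering and aggregation steps work the same way afterwards.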
Steps
- Import pandas: import pandas as pd.
- Create or load a DataFrame.
- Filter, select, and transform.
import pandas as pd
df = pd.DataFrame({
"name": ["Alice", "Bob", "Charlie"],
"age": [25, 30, 35],
"city": ["NYC", "LA", "NYC"]
})
# Filtering
adults = df[df["age"] >= 30]
print(adults)
# Selection
print(df[["name", "age"]])
# Aggregation
print(df.groupby("city")["age"].mean())
Tips
- Use .loc[] for label-based indexing and .iloc[] for position-based indexing.
- fillna() and dropna() manage missing data.
- apply() and map() allow custom transformations.
- Vectorized operations are much faster than row-by-row iteration.
# Vectorized operation
df["age_next_year"] = df["age"] + 1
# Missing data
df["score"] = [90, None, 85]
df["score"] = df["score"].fillna(0)
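The remaining tips (.loc[] versus .iloc[], dropna(), map(), and apply()) could be sketched as follows. The people DataFrame, its string index, and the helper variable names are hypothetical and chosen only so that label-based and position-based access visibly differ.

```python
import pandas as pd

# Hypothetical data with string row labels.
people = pd.DataFrame(
    {"age": [25, 30, 35], "score": [90.0, None, 85.0]},
    index=["alice", "bob", "charlie"],
)

# .loc selects by label; .iloc selects by integer position.
bob_age_by_label = people.loc["bob", "age"]
bob_age_by_position = people.iloc[1, 0]

# dropna() removes rows that contain missing values instead of filling them.
complete_rows = people.dropna()

# map() applies a function to each element of a Series.
age_group = people["age"].map(lambda a: "30+" if a >= 30 else "under 30")

# apply() with axis=1 calls the function once per row. It is flexible but
# slow, so prefer a vectorized expression such as people["age"] + 1.
age_plus_one = people.apply(lambda row: row["age"] + 1, axis=1)

print(bob_age_by_label, bob_age_by_position)
print(complete_rows)
print(age_group)
```

The row-wise apply() call is included only to show the API; the vectorized form in the earlier snippet is usually the better choice for simple arithmetic.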
Common issues
- Modifying a slice of a DataFrame with chained indexing (df[...][...] = value) can fail or produce a SettingWithCopyWarning; use .loc instead.
- Large DataFrames can consume significant memory; use dtype optimization or chunking for big files.
- NaN is a float, so integer columns with missing values become float; use Int64 (nullable integer) if needed.
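The issues above can be illustrated with a short sketch. It assumes the same small example table as earlier; the CSV text used for chunking is a hypothetical stand-in for a large file, and the category conversion is just one possible dtype optimization.

```python
import io
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["NYC", "LA", "NYC"],
})

# One .loc call selects rows and column together, replacing chained
# indexing such as df[df["city"] == "NYC"]["age"] = 0, which may write
# to a temporary copy instead of df.
df.loc[df["city"] == "NYC", "age"] = 0

# A missing value turns a plain integer column into float64 ...
as_float = pd.Series([1, None, 3])
# ... while the nullable "Int64" dtype keeps integers and stores <NA>.
as_nullable_int = pd.Series([1, None, 3], dtype="Int64")
print(as_float.dtype, as_nullable_int.dtype)

# Columns with few distinct strings often take less memory as categoricals.
df["city"] = df["city"].astype("category")

# chunksize makes read_csv yield the data in pieces instead of all at once.
csv_text = "value\n" + "\n".join(str(i) for i in range(10))
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += int(chunk["value"].sum())
print(total)
```

Whether the category dtype actually saves memory depends on how many distinct values the column holds, so it is worth checking with df.memory_usage(deep=True) on real data.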