How do I perform basic data manipulation with pandas?

Category: Python Programming

Short answer

pandas provides two core objects for tabular data: the DataFrame (a two-dimensional table) and the Series (a single labeled column). You can load data from CSV files or dictionaries, filter rows, select columns, handle missing values, and aggregate with groupby().

Steps

  1. Import pandas: import pandas as pd.
  2. Create or load a DataFrame.
  3. Filter, select, and transform.

import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["NYC", "LA", "NYC"]
})

# Filtering
adults = df[df["age"] >= 30]
print(adults)

# Selection
print(df[["name", "age"]])

# Aggregation
print(df.groupby("city")["age"].mean())
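
The example above builds a DataFrame from a dictionary. Loading from CSV goes through pd.read_csv() instead. The sketch below reads from an in-memory string so that it runs without a file on disk; the CSV text and the variable names (csv_text, people) are illustrative assumptions that mirror the dictionary example.

```python
import io

import pandas as pd

# Illustrative CSV text standing in for a file on disk (assumed data).
csv_text = """name,age,city
Alice,25,NYC
Bob,30,LA
Charlie,35,NYC
"""

# pd.read_csv accepts a file path or any file-like object.
people = pd.read_csv(io.StringIO(csv_text))

print(people.dtypes)  # shows the dtype inferred for each column
print(people.head())  # shows the first rows
```

With a real file, pass its path in place of the StringIO object.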

Tips

  • Use .loc[] for label-based indexing and .iloc[] for position-based indexing.
  • fillna() replaces missing values; dropna() removes the rows or columns that contain them.
  • apply() and map() allow custom transformations.
  • Vectorized operations on whole columns are usually much faster than row-by-row iteration.

# Vectorized operation
df["age_next_year"] = df["age"] + 1

# Missing data: None is stored as NaN, which fillna(0) then replaces with 0
df["score"] = [90, None, 85]
df["score"] = df["score"].fillna(0)
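
The remaining tips (.loc[] and .iloc[], dropna(), map() and apply()) can be sketched the same way. This is a minimal illustration, not the only way to use these methods; the string index labels, the city_names mapping, and the new column names are assumptions added for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Charlie"],
        "age": [25, 30, 35],
        "city": ["NYC", "LA", "NYC"],
        "score": [90, None, 85],
    },
    index=["a", "b", "c"],  # hypothetical labels, to contrast with positions
)

# .loc selects by label: row "b", column "age".
age_by_label = df.loc["b", "age"]

# .iloc selects by integer position: second row, second column.
age_by_position = df.iloc[1, 1]

# dropna() removes rows with a missing value in the given column.
complete_rows = df.dropna(subset=["score"])

# map() translates each value of a Series through a dict (or a function).
city_names = {"NYC": "New York", "LA": "Los Angeles"}  # illustrative mapping
df["city_full"] = df["city"].map(city_names)

# apply() calls a function on each element of a Series.
df["name_length"] = df["name"].apply(len)

print(age_by_label, age_by_position)
print(complete_rows)
print(df[["city_full", "name_length"]])
```

Both lookups return Bob's age here because label "b" happens to sit at position 1; the two only differ once the index is not a simple 0-based range.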

Common issues

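  • Assigning through chained indexing (df[...][...] = value) may modify a temporary copy instead of the original DataFrame, so the change can be silently lost or trigger a SettingWithCopyWarning; select rows and columns in a single .loc call instead.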
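  • Large DataFrames can consume significant memory; reduce it by choosing smaller dtypes (for example, category for repeated strings) or by reading big files in chunks with the chunksize parameter of read_csv().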
  • NaN is a float, so an integer column that gains missing values is converted to float; use the nullable Int64 dtype (capital I) to keep the values as integers.
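
The three issues above can be sketched in one short script. It is a hedged illustration under assumed data: the tiny csv_text string stands in for a large file, and the int8 and category choices are examples of dtype optimization, not recommendations for every dataset.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["NYC", "LA", "NYC"],
})

# Instead of chained indexing such as df[df["city"] == "NYC"]["age"] = 0,
# select rows and column in a single .loc call so the original is updated.
df.loc[df["city"] == "NYC", "age"] = 0

# A missing value turns a plain integer Series into float64 ...
as_float = pd.Series([1, None, 3])
# ... while the nullable "Int64" dtype keeps integers and stores <NA>.
as_nullable_int = pd.Series([1, None, 3], dtype="Int64")

# Memory: a smaller integer type, and category for repeated strings.
df["age"] = df["age"].astype("int8")
df["city"] = df["city"].astype("category")

# Chunking: read a CSV in pieces instead of all at once.
csv_text = "value\n1\n2\n3\n4\n5\n"  # stand-in for a large file (assumed data)
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total += chunk["value"].sum()

print(df)
print(as_float.dtype, as_nullable_int.dtype)
print(total)
```

Before downcasting a real integer column, check that its values fit the smaller type; int8 only holds values from -128 to 127.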