How to use groupby for aggregations

QA Hub Editorial · Accepted Answer

Short answer

Groupby splits a DataFrame into groups based on the values of one or more key columns. It applies a computation, such as a sum or mean, within each group and then combines the per-group results into a summary.
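
To make the split-apply-combine idea concrete, here is a minimal pandas sketch. It performs the three steps by hand and then does the same work with one groupby call. The store and sales columns are hypothetical example data.

```python
import pandas as pd

# Hypothetical data: 'store' is the grouping key, 'sales' the value to aggregate
df = pd.DataFrame({
    "store": ["A", "B", "A", "B", "C"],
    "sales": [10, 20, 30, 40, 50],
})

# Split: one sub-DataFrame per distinct store value
pieces = {key: part for key, part in df.groupby("store")}

# Apply: compute a sum inside each piece
sums = {key: part["sales"].sum() for key, part in pieces.items()}

# Combine: assemble the per-group results into one Series
manual = pd.Series(sums, name="sales")
manual.index.name = "store"

# The same three steps in a single groupby call
via_groupby = df.groupby("store")["sales"].sum()

print(via_groupby)
```

In practice you would write only the last form. The manual version is shown only to make each phase visible.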

Steps

Choose grouping columns that define the meaningful partitions for your analysis.
Select aggregation functions such as sum, mean, count, or custom lambdas.
Apply multiple aggregations simultaneously with agg for comprehensive summaries.
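Use transform when you need one result per original row, aligned with the original DataFrame index (for example, each row's group total).
Call reset_index after aggregation to move the group keys out of a (possibly hierarchical) row index and back into regular columns.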
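
The steps above can be sketched as follows. This is a minimal example under assumed data: the region, product, units, and price columns and their values are hypothetical.

```python
import pandas as pd

# Hypothetical sales data; 'region' and 'product' are the grouping columns
df = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "product": ["pen", "ink", "pen", "pen", "ink"],
    "units": [5, 3, 8, 2, 4],
    "price": [1.5, 4.0, 1.5, 1.6, 4.2],
})

# Steps 1-3: group on two columns and apply several aggregations at once.
# The result has a MultiIndex on rows (region, product) and on columns,
# e.g. ('units', 'sum').
summary = df.groupby(["region", "product"]).agg(
    {"units": ["sum", "count"], "price": "mean"}
)

# A custom aggregation with a lambda: price range within each region
price_range = df.groupby("region")["price"].agg(lambda s: s.max() - s.min())

# Step 4: transform returns one value per original row, aligned to df.index
df["region_units"] = df.groupby("region")["units"].transform("sum")
df["share"] = df["units"] / df["region_units"]

# Step 5: reset_index moves the group keys out of the row index into columns
flat = summary.reset_index()
print(flat)
```

Note that transform keeps the original row count, which is what lets you compute a per-row share of the group total in one line. The agg results collapse each group to a single row.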

Tips

Use as_index=False to keep grouping columns as regular columns.
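Use named aggregation, such as agg(total=('sales', 'sum')), to choose the output column names directly.
Use filter to keep or drop entire groups based on a group-level condition, such as group size.
Use size instead of count when you want the total number of rows per group; count only counts non-missing values in each column.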
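
Here is a short sketch of these tips on a small hypothetical DataFrame. The team and amount columns are illustrative assumptions, not part of any real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in 'amount'
df = pd.DataFrame({
    "team": ["red", "red", "blue", "blue", "blue", "green"],
    "amount": [10.0, np.nan, 5.0, 7.0, 9.0, 3.0],
})

# as_index=False keeps 'team' as an ordinary column instead of the index
totals = df.groupby("team", as_index=False)["amount"].sum()

# Named aggregation: keyword=(column, function) sets the output column name
named = df.groupby("team").agg(
    total=("amount", "sum"),
    average=("amount", "mean"),
)

# filter keeps or drops whole groups; here, teams with at least 2 rows
big_teams = df.groupby("team").filter(lambda g: len(g) >= 2)

# size counts every row; count skips NaN values in the selected column
rows_per_team = df.groupby("team").size()
non_null_per_team = df.groupby("team")["amount"].count()
```

For the red team, size reports 2 rows while count reports 1, because one of its amounts is missing.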

Common issues

Slow performance when grouping on high-cardinality columns.
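Unexpected MultiIndex rows or columns complicating downstream operations.
Missing groups, either because rows were filtered out before aggregation or because rows with a NaN group key are dropped by default (dropna=True).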
Memory spikes from intermediate group objects in complex pipelines.
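
The sketch below shows some common mitigations: dropna=False for NaN keys, flattening MultiIndex columns, and observed=True for categorical keys. The city and temp data and the category list are hypothetical. Whether observed=True noticeably helps performance depends on your data, so treat that part as a suggestion to measure.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row has a missing key in 'city'
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", None, "Rome"],
    "temp": [3.0, 5.0, 20.0, 18.0],
})

# By default (dropna=True) rows whose key is NaN/None vanish from the result
default_groups = df.groupby("city")["temp"].mean()
# dropna=False keeps them as their own NaN group
all_groups = df.groupby("city", dropna=False)["temp"].mean()

# Multiple aggregations produce MultiIndex columns such as ('temp', 'mean');
# joining the levels yields flat names that are easier to use downstream
stats = df.groupby("city").agg({"temp": ["mean", "max"]})
stats.columns = ["_".join(col) for col in stats.columns]
stats = stats.reset_index()

# With a categorical key, observed=True returns only categories that occur,
# instead of an (empty) row for every declared category
df["city_cat"] = df["city"].astype(
    pd.CategoricalDtype(["Oslo", "Rome", "Paris"])
)
observed_only = df.groupby("city_cat", observed=True)["temp"].mean()
```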

Example

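import pandas as pd
import numpy as np

df = pd.DataFrame({'region': ['East', 'East', 'West', 'West'],
                   'sales': [100, 150, 200, np.nan]})
# Fill each missing value with the median of its own region
df['sales'] = df['sales'].fillna(df.groupby('region')['sales'].transform('median'))
# Aggregate sales per region
print(df.groupby('region')['sales'].agg(['sum', 'mean', 'count']))

This snippet creates a DataFrame with a hypothetical region column. It uses groupby with transform to fill the missing value with its region's median, then prints the total, mean, and count of sales for each region.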
