How to reshape data with melt and pivot
Category: Data Science
Short answer
Use melt to go from wide to long format, and pivot or pivot_table to go from long to wide. Reshaping between the two formats aligns the data with the requirements of visualization libraries, statistical models, and reporting tools.
Steps
- Identify whether your analysis needs observations in rows or variables in columns.
- Use melt to convert wide data into long format by unpivoting columns into key-value pairs.
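- Use pivot to spread long data into wide format when every index/column combination is unique; use pivot_table when values need to be aggregated.
- Handle duplicate entries by switching to pivot_table and specifying an aggregation function (aggfunc), since pivot itself does not aggregate.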
- Validate the result by checking row counts and value distributions.
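The steps above can be sketched as a small round trip. This is a minimal illustration, not a definitive recipe: the table and the names store, q1, q2, quarter, and sales are hypothetical and chosen only for the example.

```python
import pandas as pd

# Hypothetical wide table: one row per store, one column per quarter.
wide = pd.DataFrame({
    "store": ["A", "B"],
    "q1": [100, 150],
    "q2": [120, 130],
})

# Wide -> long: keep "store" as the identifier, unpivot the quarter columns.
long_df = wide.melt(id_vars=["store"], value_vars=["q1", "q2"],
                    var_name="quarter", value_name="sales")

# Long -> wide: each (store, quarter) pair is unique, so plain pivot works.
round_trip = (long_df.pivot(index="store", columns="quarter", values="sales")
                     .reset_index())
round_trip.columns.name = None  # drop the leftover "quarter" axis label

# Validate: melt yields (rows x melted columns) records, and totals must match.
assert len(long_df) == len(wide) * 2
assert long_df["sales"].sum() == wide[["q1", "q2"]].to_numpy().sum()

print(long_df)
print(round_trip)
```

Checking the row count and the column total before and after is a cheap way to catch an identifier column that was melted by mistake.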
Tips
- Use pivot_table instead of pivot when index combinations are not unique.
- Reset the index after pivoting so the row keys become regular columns again, which avoids index complications in later merges and exports.
- Melt only the columns that represent variables, keeping identifiers fixed.
- Use stack and unstack for multi-level index reshaping.
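The tips above might look like this in practice. The data and column names are hypothetical, and aggfunc="sum" is an assumption made for this sketch (pivot_table averages by default).

```python
import pandas as pd

# Hypothetical long table with a repeated (store, quarter) pair: A / q1.
long_df = pd.DataFrame({
    "store":   ["A", "A", "A", "B", "B"],
    "quarter": ["q1", "q1", "q2", "q1", "q2"],
    "sales":   [100, 50, 120, 150, 130],
})

# pivot_table aggregates the duplicate pair instead of failing.
wide = long_df.pivot_table(index="store", columns="quarter",
                           values="sales", aggfunc="sum")

# Turn the "store" index back into a regular column.
flat = wide.reset_index()
flat.columns.name = None

# stack moves the column level into the row index (a Series with a
# two-level index); unstack moves it back out into columns.
stacked = wide.stack()
restored = stacked.unstack()

print(flat)
print(stacked)
```

Recent pandas releases may emit a FutureWarning about the stack implementation; the result in this example is the same either way because no cell is missing.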
Common issues
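- Hitting a ValueError when pivot encounters duplicate index/column keys, or losing detail when duplicates are dropped or aggregated without being inspected first.
- Creating wide tables with many NaN values when most index/column combinations have no data.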
- Confusion between long and wide formats causing incorrect analysis.
- Memory overhead from intermediate DataFrames during complex reshapes.
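The first two issues can be reproduced on a small hypothetical table, as sketched below. Filling missing cells with 0 assumes that "no record" means "no sales", which may not hold for real data.

```python
import pandas as pd

# Hypothetical long table: (A, q1) appears twice and (B, q2) is missing.
long_df = pd.DataFrame({
    "store":   ["A", "A", "A", "B"],
    "quarter": ["q1", "q1", "q2", "q1"],
    "sales":   [100, 50, 120, 150],
})

# Plain pivot cannot place two values in one cell, so it raises ValueError.
try:
    long_df.pivot(index="store", columns="quarter", values="sales")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# Count duplicate keys before reshaping so the aggregation is a conscious choice.
n_dupes = int(long_df.duplicated(subset=["store", "quarter"]).sum())

# pivot_table aggregates the duplicates; the missing (B, q2) cell becomes NaN
# unless fill_value is given (assumption: 0 is a sensible stand-in here).
with_nan = long_df.pivot_table(index="store", columns="quarter",
                               values="sales", aggfunc="sum")
filled = long_df.pivot_table(index="store", columns="quarter",
                             values="sales", aggfunc="sum", fill_value=0)

print(pivot_failed, n_dupes)
print(with_nan)
print(filled)
```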
Example
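import pandas as pd
import numpy as np
# Build a one-column DataFrame with one missing value.
df = pd.DataFrame({'sales': [100, 150, 200, np.nan]})
# Fill the missing value with the column median (NaN is ignored when computing it).
df['sales'] = df['sales'].fillna(df['sales'].median())
# Print count, mean, standard deviation, min, quartiles, and max.
print(df.describe())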
This snippet creates a DataFrame, fills the missing sales value with the column median (150), and prints summary statistics, a common step in exploratory analysis.