How to reshape data with melt and pivot

· Category: Data Science

Short answer

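In pandas, use melt to convert wide data to long format, and pivot (or pivot_table when keys repeat) to convert long data back to wide. Reshaping between the two formats aligns the data with the requirements of visualization libraries, statistical models, and reporting tools.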

Steps

  1. Identify whether your analysis needs observations in rows or variables in columns.
  2. Use melt to convert wide data into long format by unpivoting columns into key-value pairs.
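  3. Use pivot to spread long data into wide format; it reshapes without aggregating and requires each index/column combination to be unique.
  4. Handle duplicate index/column combinations by switching to pivot_table and specifying an aggregation function (aggfunc), since pivot itself does not accept one.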
  5. Validate the result by checking row counts and value distributions.
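
The steps above can be sketched as a small round trip. The table, its column names (store, q1, q2), and the values are hypothetical and chosen only for illustration; the validation checks are one reasonable choice, not the only one.

```python
import pandas as pd

# Hypothetical wide table: one row per store, one column per quarter.
wide_df = pd.DataFrame({
    "store": ["A", "B"],
    "q1": [100, 150],
    "q2": [200, 250],
})

# Wide -> long: keep "store" as the identifier and unpivot the quarter columns.
long_df = wide_df.melt(
    id_vars="store",
    value_vars=["q1", "q2"],
    var_name="quarter",
    value_name="sales",
)

# Long -> wide: every (store, quarter) pair is unique here, so pivot suffices.
back_df = long_df.pivot(index="store", columns="quarter", values="sales").reset_index()
back_df.columns.name = None  # drop the leftover "quarter" label on the columns

# Validate: melting 2 value columns should double the row count
# and leave the total of the values unchanged.
assert len(long_df) == len(wide_df) * 2
assert long_df["sales"].sum() == wide_df[["q1", "q2"]].to_numpy().sum()

print(long_df)
print(back_df)
```

Naming var_name and value_name explicitly avoids the default column names variable and value, which tend to be ambiguous once several reshapes are chained.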

Tips

  • Use pivot_table instead of pivot when index combinations are not unique.
  • Reset the index after pivoting to avoid hierarchical index complications.
  • Melt only the columns that represent variables, keeping identifiers fixed.
  • Use stack and unstack for multi-level index reshaping.
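
As a rough illustration of these tips, the sketch below uses a hypothetical long table in which one (store, quarter) pair appears twice. Summing the duplicates is an assumption made for the example; the right aggregation depends on the data.

```python
import pandas as pd

# Hypothetical long table; the (A, q1) pair appears twice.
long_df = pd.DataFrame({
    "store": ["A", "A", "A", "B", "B"],
    "quarter": ["q1", "q1", "q2", "q1", "q2"],
    "sales": [100, 50, 200, 150, 250],
})

# pivot_table aggregates the duplicate pair; aggfunc is stated explicitly
# rather than relying on the default (mean).
wide_df = long_df.pivot_table(
    index="store", columns="quarter", values="sales", aggfunc="sum"
)

# reset_index turns "store" back into an ordinary column.
flat_df = wide_df.reset_index()
flat_df.columns.name = None

# stack moves the column level into the row index (a MultiIndex Series);
# unstack moves it back out into columns.
stacked = wide_df.stack()
unstacked = stacked.unstack()

print(flat_df)
print(stacked)
```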

Common issues

  • pivot raising a ValueError on duplicate keys, or pivot_table silently collapsing them with its default aggregation (mean).
  • Creating sparse wide tables with many NaN values when most index/column combinations have no data.
  • Confusion between long and wide formats causing incorrect analysis.
  • Memory overhead from intermediate DataFrames during complex reshapes.
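
The first two issues can be reproduced on a small hypothetical table, as in the sketch below. Filling gaps with 0 is an assumption that only makes sense when a missing combination really means "no sales".

```python
import pandas as pd

# Hypothetical long table: (A, q1) is duplicated, (A, q2) and (B, q1) are missing.
long_df = pd.DataFrame({
    "store": ["A", "A", "B"],
    "quarter": ["q1", "q1", "q2"],
    "sales": [100, 50, 250],
})

# Flag rows that share the same key before attempting to reshape.
dupes = long_df.duplicated(subset=["store", "quarter"], keep=False)
print(long_df[dupes])

# pivot refuses to reshape when the keys are not unique.
try:
    long_df.pivot(index="store", columns="quarter", values="sales")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)

# pivot_table aggregates the duplicates; missing combinations become NaN...
sparse_df = long_df.pivot_table(
    index="store", columns="quarter", values="sales", aggfunc="sum"
)

# ...unless fill_value supplies a replacement.
filled_df = long_df.pivot_table(
    index="store", columns="quarter", values="sales", aggfunc="sum", fill_value=0
)

print(sparse_df)
print(filled_df)
```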

Example

import pandas as pd
import numpy as np

df = pd.DataFrame({'sales': [100, 150, 200, np.nan]})
df['sales'] = df['sales'].fillna(df['sales'].median())
print(df.describe())

This snippet creates a single-column DataFrame, fills the missing value with the column median (150), and prints summary statistics. It shows a typical cleanup step in exploratory analysis and does not itself reshape the data.