How to create heatmaps for correlation

· Category: Data Science

Short answer

Correlation heatmaps provide a compact visual summary of pairwise relationships among numerical variables in a dataset.

Steps

  1. Compute a correlation matrix using Pearson, Spearman, or Kendall methods.
  2. Pass the matrix to a heatmap function such as seaborn.heatmap.
  3. Choose a diverging color map centered at zero so that positive and negative correlations of equal magnitude appear with equal intensity.
  4. Annotate cells with correlation coefficients for precise reading.
  5. Reorder rows and columns with clustering to reveal variable groupings.
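
Steps 1 and 5 can be sketched without any plotting code. The following is a minimal sketch, assuming the data is a pandas DataFrame with no constant numeric columns. The helper name clustered_corr is hypothetical, and using 1 - |r| as the distance with average linkage is one reasonable choice among several, not a prescribed method.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform


def clustered_corr(df, method="pearson"):
    """Return the correlation matrix with rows/columns reordered by clustering.

    `method` is passed to DataFrame.corr: "pearson", "spearman", or "kendall".
    Assumes no constant columns (they would produce NaN correlations).
    """
    # Step 1: correlation matrix over the numeric columns only.
    corr = df.select_dtypes(include="number").corr(method=method)

    # Step 5: turn correlations into distances (assumption: strongly
    # correlated variables, positive or negative, count as "close").
    dist = np.clip(1.0 - corr.abs().to_numpy(), 0.0, None)
    np.fill_diagonal(dist, 0.0)

    # Hierarchical clustering on the condensed distance vector, then read
    # off the leaf order of the resulting dendrogram.
    condensed = squareform(dist, checks=False)
    order = leaves_list(linkage(condensed, method="average"))
    return corr.iloc[order, order]
```

The reordered matrix can be passed to a heatmap function in place of the raw one, so that related variables appear as contiguous blocks along the diagonal.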

Tips

  • Mask the upper triangle to reduce redundancy since correlation matrices are symmetric.
  • Use hierarchical clustering to group highly correlated variables together.
  • Set appropriate figure size so labels remain legible.
  • Highlight statistically significant correlations with asterisks or bold formatting.
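
The masking, sizing, and significance tips can be combined in one plotting helper. This is a sketch under stated assumptions: every column of the DataFrame is numeric with no missing values, significance is taken from unadjusted Pearson p-values (no multiple-testing correction), and the function names, the 0.05 threshold, and the 0.8 inches-per-cell sizing rule are all illustrative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import pearsonr


def upper_triangle_mask(corr):
    """Boolean mask that hides cells strictly above the diagonal."""
    return np.triu(np.ones(corr.shape, dtype=bool), k=1)


def significance_labels(df, alpha=0.05):
    """Cell labels such as '0.85*', starred when the p-value is below alpha."""
    cols = df.columns
    labels = pd.DataFrame("", index=cols, columns=cols)
    for i in cols:
        for j in cols:
            if i == j:
                labels.loc[i, j] = "1.00"
                continue
            r, p = pearsonr(df[i], df[j])
            labels.loc[i, j] = f"{r:.2f}" + ("*" if p < alpha else "")
    return labels


def plot_masked_heatmap(df, alpha=0.05, cell_size=0.8):
    """Draw the lower triangle of the correlation matrix with starred labels."""
    corr = df.corr()
    # Scale the figure with the number of variables so labels stay legible.
    side = max(4.0, cell_size * len(corr))
    fig, ax = plt.subplots(figsize=(side, side))
    sns.heatmap(
        corr,
        mask=upper_triangle_mask(corr),   # True means the cell is hidden
        annot=significance_labels(df, alpha),
        fmt="",                           # labels are already strings
        cmap="vlag",
        vmin=-1,
        vmax=1,
        square=True,
        ax=ax,
    )
    return ax
```

Calling plot_masked_heatmap(df) followed by plt.show() displays the figure. Passing k=0 to np.triu would hide the diagonal as well.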

Common issues

  • Including categorical variables (for example, integer-coded labels) in correlation calculations, which yields coefficients with no meaningful interpretation.
  • Interpreting correlation as causation without controlling for confounders.
  • Cluttered heatmaps with too many variables making cells unreadable.
  • Using default sequential color maps that do not center at zero.
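
Two of these issues can be guarded against before plotting. The sketch below, with hypothetical helper names, restricts the calculation to numeric columns and, for wide datasets, lists the strongest pairs as an alternative to a heatmap that would be too crowded to read.

```python
import numpy as np
import pandas as pd


def numeric_corr(df, method="pearson"):
    """Correlation matrix over numeric columns only; text columns are dropped."""
    return df.select_dtypes(include="number").corr(method=method)


def strongest_pairs(corr, k=10):
    """Return the k variable pairs with the largest absolute correlation."""
    # Keep each pair once by looking only above the diagonal.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(k)
```

The result of strongest_pairs is a Series indexed by (row variable, column variable), which can also be used to pick a smaller subset of variables to plot.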

Example

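import matplotlib.pyplot as plt
import seaborn as sns

# df is assumed to be an existing pandas DataFrame
corr = df.select_dtypes('number').corr()  # Pearson by default

sns.set_theme(style='white')
plt.figure(figsize=(10, 8))
# Diverging palette pinned to [-1, 1] so that zero sits at the neutral midpoint
sns.heatmap(corr, cmap='vlag', vmin=-1, vmax=1, annot=True, fmt='.2f')
plt.title('Correlation Heatmap')
plt.show()

This snippet computes the Pearson correlation matrix of the numeric columns in df and draws it as an annotated heatmap with a diverging color map centered at zero.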