How to create heatmaps for correlation
· Category: Data Science
Short answer
Correlation heatmaps provide a compact visual summary of pairwise relationships among numerical variables in a dataset.
Steps
- Compute a correlation matrix using Pearson, Spearman, or Kendall methods.
- Pass the matrix to a heatmap function such as seaborn.heatmap.
- Choose a diverging color map centered at zero, with the color limits fixed at -1 and 1, so positive and negative correlations of equal strength get equal color intensity.
- Annotate cells with correlation coefficients for precise reading.
- Reorder rows and columns with clustering to reveal variable groupings.
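The first and last steps above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the data, the column names, the choice of Spearman, the distance 1 - |r|, and average linkage are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform

# Hypothetical data: two pairs of closely related columns.
rng = np.random.default_rng(0)
base1 = rng.normal(size=200)
base2 = rng.normal(size=200)
df = pd.DataFrame({
    "a1": base1 + 0.1 * rng.normal(size=200),
    "b1": base2 + 0.1 * rng.normal(size=200),
    "a2": base1 + 0.1 * rng.normal(size=200),
    "b2": base2 + 0.1 * rng.normal(size=200),
})

# Correlation matrix; method may be "pearson", "spearman", or "kendall".
corr = df.corr(method="spearman")

# Turn correlations into distances (assumed here: 1 - |r|) and cluster them.
dist = 1.0 - corr.abs().to_numpy()
np.fill_diagonal(dist, 0.0)
dist = (dist + dist.T) / 2.0  # remove tiny floating-point asymmetries
order = leaves_list(linkage(squareform(dist, checks=False), method="average"))

# Apply the same ordering to rows and columns so the matrix stays symmetric.
corr_ordered = corr.iloc[order, order]
print(corr_ordered.round(2))
```

The reordered matrix can then be passed to seaborn.heatmap in place of the original one, so that related variables appear as contiguous blocks.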
Tips
- Mask the upper triangle to reduce redundancy since correlation matrices are symmetric.
- Use hierarchical clustering to group highly correlated variables together.
- Set appropriate figure size so labels remain legible.
- Highlight significant correlations with asterisks or bold formatting.
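As a rough sketch of the masking and figure-size tips, the snippet below hides the upper triangle with a boolean mask. The random data, column names, color map, and figure size are placeholders chosen for illustration only.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Placeholder data; substitute your own numeric DataFrame.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(150, 5)), columns=list("abcde"))
corr = df.corr()

# True marks cells to hide: the diagonal and everything above it.
mask = np.triu(np.ones(corr.shape, dtype=bool))

# A larger, square-celled figure keeps labels and annotations legible.
fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(
    corr,
    mask=mask,
    cmap="coolwarm",
    vmin=-1,
    vmax=1,
    center=0,
    annot=True,
    fmt=".2f",
    square=True,
    ax=ax,
)
ax.set_title("Lower-triangle correlation heatmap")
fig.tight_layout()
```

Call plt.show() or fig.savefig with a file name of your choice to display or save the figure. To keep the diagonal visible, np.triu accepts k=1, which masks only the cells strictly above it.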
Common issues
- Including categorical variables in correlation calculations.
- Interpreting correlation as causation without controlling for confounders.
- Cluttered heatmaps with too many variables making cells unreadable.
- Using default sequential color maps that do not center at zero.
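Two of these issues can be guarded against before plotting. The sketch below assumes a hypothetical DataFrame with one categorical column. The rule for trimming variables (keep the k columns with the highest mean absolute correlation) is only one possible heuristic and is not prescribed by the text above.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type data; "region" is categorical.
rng = np.random.default_rng(2)
n = 50
df = pd.DataFrame({
    "price": rng.normal(100, 10, size=n),
    "units": rng.integers(1, 20, size=n),
    "discount": rng.uniform(0, 0.3, size=n),
    "rating": rng.normal(4, 0.5, size=n),
    "region": rng.choice(["north", "south"], size=n),
})

# Keep only numeric columns so categorical data never enters the calculation.
numeric = df.select_dtypes(include="number")
corr = numeric.corr()

# Assumed heuristic for cluttered plots: keep the k most correlated variables.
k = 3
keep = corr.abs().mean().nlargest(k).index
corr_small = corr.loc[keep, keep]
print(corr_small.round(2))
```

Filtering with select_dtypes makes the exclusion explicit instead of relying on how a particular pandas version treats non-numeric columns in DataFrame.corr.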
Example
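import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Placeholder data for illustration; substitute your own numeric DataFrame.
df = pd.DataFrame(np.random.default_rng(0).normal(size=(100, 4)), columns=list('abcd'))
corr = df.corr()  # Pearson by default
sns.set_theme(style='white')
plt.figure(figsize=(10, 6))
sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm', vmin=-1, vmax=1, center=0)
plt.title('Correlation Heatmap')
plt.show()
This snippet computes a Pearson correlation matrix and draws it as an annotated heatmap, using a diverging color map centered at zero and a clear title.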