How to choose the right statistical test
Category: Data Science
Short answer
Choosing the right statistical test depends on the measurement scale of the data, the number of groups being compared, whether observations are paired or independent, the sample size, and whether the test's distributional assumptions are met.
Steps
- Identify the measurement scale of your variables: nominal, ordinal, interval, or ratio.
- Determine the number of groups and whether comparisons are paired or independent.
- Check assumptions such as normality (for example with a Q-Q plot or the Shapiro-Wilk test) and equal variance (for example with Levene's test).
- Select a parametric test if assumptions hold; otherwise choose a non-parametric equivalent.
- Verify that the test addresses the specific research question and effect of interest.
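The steps above can be sketched for the two-group case as follows. This is a minimal illustration, assuming SciPy is available; the helper name `compare_two_groups`, the 0.05 threshold, and the simulated data are hypothetical choices made for the example, not a prescribed procedure. Formal assumption tests are only one way to check assumptions and can be unreliable with very small or very large samples.

```python
import numpy as np
from scipy import stats

def compare_two_groups(a, b, paired=False, alpha=0.05):
    """Pick a two-group test from simple assumption checks (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    if paired:
        # Paired design: the normality assumption applies to the differences.
        normal = stats.shapiro(a - b).pvalue > alpha
        if normal:
            return "paired t-test", stats.ttest_rel(a, b)
        return "Wilcoxon signed-rank", stats.wilcoxon(a, b)

    # Independent design: check normality within each group.
    normal = stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
    if not normal:
        return "Mann-Whitney U", stats.mannwhitneyu(a, b, alternative="two-sided")

    # Normality looks plausible: check equal variance to choose Student vs. Welch.
    equal_var = stats.levene(a, b).pvalue > alpha
    name = "Student t-test" if equal_var else "Welch t-test"
    return name, stats.ttest_ind(a, b, equal_var=equal_var)

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=10.0, scale=2.0, size=30)
group_b = rng.normal(loc=11.0, scale=2.0, size=30)

name, result = compare_two_groups(group_a, group_b)
print(name, result.pvalue)
```

The helper returns the name of the chosen test alongside the result so the selection rationale can be recorded with the analysis.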
Tips
- Use flowcharts or decision trees to guide test selection systematically.
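- When in doubt, run both the parametric and the non-parametric test as a sensitivity check and report both results; do not pick whichever gives the smaller p-value.
- Consider permutation tests for complex designs; they give exact inference when observations are exchangeable under the null hypothesis and all permutations are enumerated, and a close approximation when permutations are sampled at random.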
- Document your rationale for test selection in analysis reports.
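As a rough sketch of the permutation-test tip, the function below estimates a two-sided p-value for a difference in means by randomly reshuffling group labels. The function name `permutation_test_mean_diff`, the number of resamples, and the toy data are assumptions made for illustration; because it samples permutations rather than enumerating them, the p-value is an approximation.

```python
import numpy as np

def permutation_test_mean_diff(a, b, n_resamples=10_000, seed=0):
    """Approximate two-sided permutation test for a difference in means."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])

    count = 0
    for _ in range(n_resamples):
        # Under the null hypothesis, group labels are exchangeable.
        shuffled = rng.permutation(pooled)
        diff = shuffled[:a.size].mean() - shuffled[a.size:].mean()
        if abs(diff) >= abs(observed):
            count += 1

    # The +1 correction counts the observed arrangement, so p is never exactly 0.
    p_value = (count + 1) / (n_resamples + 1)
    return observed, p_value

# Toy data, for illustration only.
treated = [10, 11, 12, 13, 14]
control = [0, 1, 2, 3, 4]
observed, p_value = permutation_test_mean_diff(treated, control)
print(observed, p_value)
```

For real analyses, `scipy.stats.permutation_test` offers a more general implementation of the same idea.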
Common issues
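- Running repeated t-tests across more than two groups without correcting for multiple comparisons, which inflates the false-positive rate; use ANOVA with post-hoc tests or apply a correction such as Bonferroni.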
- Applying parametric tests to ordinal data with small samples.
- Ignoring paired structures and treating repeated measures as independent.
- Confusing one-tailed and two-tailed hypotheses, which leads to incorrect p-values.
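Two of the issues above can be illustrated with a short sketch, assuming SciPy is available. The measurements below are made up for the example. The first part shows how treating paired measurements as independent can hide a consistent within-subject change; the second uses a single one-way ANOVA for three groups instead of repeated t-tests.

```python
import numpy as np
from scipy import stats

# Made-up before/after measurements on the same eight subjects.
before = np.array([72, 85, 90, 66, 78, 95, 81, 70], dtype=float)
after = before + np.array([2, 3, 1, 2, 4, 2, 3, 1], dtype=float)

# Wrong for this design: ignores that each subject was measured twice.
independent = stats.ttest_ind(before, after)
# Appropriate for this design: tests the within-subject differences.
paired = stats.ttest_rel(before, after)
print("independent p:", independent.pvalue)
print("paired p:", paired.pvalue)

# Three independent groups: one omnibus test instead of three pairwise t-tests.
group_1 = [5.1, 4.9, 5.4, 5.0, 5.2]
group_2 = [5.8, 6.1, 5.9, 6.3, 6.0]
group_3 = [5.0, 5.3, 4.8, 5.1, 5.2]
anova = stats.f_oneway(group_1, group_2, group_3)
print("ANOVA p:", anova.pvalue)
```

If the ANOVA indicates a difference, pairwise follow-up comparisons still need a multiple-comparison correction.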
Example
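import pandas as pd
import numpy as np

# Small example dataset with one missing value
df = pd.DataFrame({'sales': [100, 150, 200, np.nan]})
# Replace the missing value with the column median (150 here)
df['sales'] = df['sales'].fillna(df['sales'].median())
# Summary statistics: count, mean, std, min, quartiles, max
print(df.describe())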
This snippet creates a DataFrame, fills a missing value with the column median, and prints summary statistics. This kind of exploratory check typically comes before test selection, since it shows the sample size and spread of the data.