How hypothesis testing works in practice
Category: Data Science
Short answer
Hypothesis testing uses sample data to assess the strength of evidence against a default assumption about a population parameter.
Steps
- State the null hypothesis representing no effect or no difference.
- State the alternative hypothesis representing the effect of interest.
- Choose a significance level alpha, such as 0.05, before looking at the data; it caps the Type I error rate (rejecting a true null hypothesis).
- Calculate a test statistic and its corresponding p-value from the sample data.
- Compare the p-value to alpha: reject the null hypothesis if the p-value is at or below alpha, otherwise fail to reject it.
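The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the two groups are simulated data invented for the example, the names control and treatment are hypothetical, and Welch's t-test from SciPy is just one reasonable choice of test statistic.

```python
import numpy as np
from scipy import stats

# Hypothetical data: simulated measurements for two groups (not from a real study).
rng = np.random.default_rng(seed=0)
control = rng.normal(loc=50.0, scale=5.0, size=40)
treatment = rng.normal(loc=53.0, scale=5.0, size=40)

# H0: the two population means are equal. H1: they differ (two-sided).
# Fix alpha before looking at the result.
alpha = 0.05

# Test statistic and p-value from Welch's t-test,
# which does not assume equal variances in the two groups.
result = stats.ttest_ind(treatment, control, equal_var=False)
t_stat, p_value = result.statistic, result.pvalue

# Decision rule: reject H0 only when the p-value is at or below alpha.
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"t = {t_stat:.3f}, p = {p_value:.4f} -> {decision}")
```

Note that "fail to reject H0" is the only alternative outcome here; the code never concludes that H0 is true.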
Tips
- Always report effect sizes alongside p-values, because a small p-value alone does not show that an effect is practically significant.
- Use confidence intervals to convey estimation uncertainty.
- Check assumptions such as normality and equal variance before applying parametric tests.
- Pre-register hypotheses to guard against data dredging and to reduce publication bias.
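As a rough sketch of the tips above, the following reports an effect size and a confidence interval and runs two common assumption checks. The data are simulated for illustration, the variable names are hypothetical, and Cohen's d with a pooled standard deviation, the Shapiro-Wilk test, and Levene's test are assumed choices rather than the only options.

```python
import numpy as np
from scipy import stats

# Hypothetical data: two simulated samples (not from a real study).
rng = np.random.default_rng(seed=1)
a = rng.normal(loc=10.0, scale=2.0, size=30)
b = rng.normal(loc=11.0, scale=2.0, size=30)

# Assumption checks: Shapiro-Wilk for normality, Levene for equal variances.
# Small p-values here suggest the assumption may not hold.
shapiro_p_a = stats.shapiro(a).pvalue
shapiro_p_b = stats.shapiro(b).pvalue
levene_p = stats.levene(a, b).pvalue

# Effect size: Cohen's d using the pooled sample variance.
n_a, n_b = len(a), len(b)
pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
diff = b.mean() - a.mean()
cohens_d = diff / np.sqrt(pooled_var)

# 95% confidence interval for the difference in means (pooled-variance t interval).
se = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_crit = stats.t.ppf(0.975, df=n_a + n_b - 2)
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se

print(f"Shapiro p-values: {shapiro_p_a:.3f}, {shapiro_p_b:.3f}; Levene p: {levene_p:.3f}")
print(f"Cohen's d = {cohens_d:.2f}, 95% CI for difference = [{ci_low:.2f}, {ci_high:.2f}]")
```

Reporting the interval together with d shows both how large the estimated effect is and how uncertain that estimate remains.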
Common issues
- Confusing failure to reject the null with acceptance of the null.
- Running many tests without correction, which inflates the familywise error rate.
- Using small samples, which yields underpowered and inconclusive results.
- Violating test assumptions, which leads to invalid p-values.
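To make the multiple-testing issue concrete, here is a small sketch of how the familywise error rate grows and how two standard corrections respond. The p-values are made up for illustration, the helper names bonferroni and holm are hypothetical, and the growth formula assumes independent tests.

```python
import numpy as np

def bonferroni(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at or below alpha / m."""
    p = np.asarray(p_values, dtype=float)
    return p <= alpha / p.size

def holm(p_values, alpha=0.05):
    """Holm step-down procedure: test sorted p-values against alpha / (m - rank)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(np.argsort(p)):
        if p[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first non-rejection
    return reject

# With m independent tests at level alpha, the chance of at least one
# false positive is 1 - (1 - alpha)^m.
alpha, m = 0.05, 20
fwer = 1 - (1 - alpha) ** m
print(f"Uncorrected familywise error rate for {m} tests: {fwer:.2f}")

# Hypothetical p-values from five tests.
p_values = [0.001, 0.012, 0.030, 0.040, 0.200]
print("Uncorrected:", (np.asarray(p_values) <= alpha).tolist())
print("Bonferroni: ", bonferroni(p_values).tolist())
print("Holm:       ", holm(p_values).tolist())
```

Both procedures control the familywise error rate; Holm never rejects fewer hypotheses than Bonferroni, which is why it is often preferred when a simple correction is wanted.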
Example
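import numpy as np
import pandas as pd

# Small example frame with one missing sales value
df = pd.DataFrame({'sales': [100, 150, 200, np.nan]})
# Fill the gap with the median of the observed values (150)
df['sales'] = df['sales'].fillna(df['sales'].median())
# Summary statistics: count, mean, std, min, quartiles, max
print(df.describe())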
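This snippet builds a small DataFrame, fills the missing sales value with the column median (150), and prints summary statistics. It is an exploratory data-preparation step of the kind that typically precedes a test, not a hypothesis test in itself.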