How hypothesis testing works in practice

· Category: Data Science

Short answer

Hypothesis testing uses sample data to assess the strength of evidence against a default assumption about a population parameter.

Steps

  1. State the null hypothesis representing no effect or no difference.
  2. State the alternative hypothesis representing the effect of interest.
  3. Choose a significance level alpha, such as 0.05, before looking at the data; it caps the Type I error (false positive) rate.
  4. Calculate a test statistic and its corresponding p-value from the sample data.
  5. Compare the p-value to alpha: reject the null hypothesis if the p-value is at or below alpha, otherwise fail to reject it.
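
The five steps above can be sketched with a two-sided permutation test for a difference in means. This is a minimal illustration using only the Python standard library, not a recommended default test. The function name permutation_test, the seed, the resample count, and the control/treated measurements are all hypothetical and chosen for illustration.

```python
import random
from statistics import mean

def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative helper)."""
    rng = random.Random(seed)
    observed = mean(a) - mean(b)            # step 4: test statistic
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)                 # relabel observations as if H0 were true
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value

# Steps 1 and 2: H0 says the group means are equal, H1 says they differ.
# Hypothetical measurements, made up for this sketch.
control = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
treated = [12.9, 13.1, 12.6, 13.4, 12.8, 13.0, 12.7, 13.2]

alpha = 0.05                                # step 3: fixed in advance
observed, p_value = permutation_test(treated, control)

# Step 5: compare the p-value to alpha.
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"difference = {observed:.4f}, p = {p_value:.4f}, {decision}")
```

A permutation test is used here because it needs no distribution tables; a parametric test such as a t-test follows the same five steps with a different statistic and reference distribution.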

Tips

  • Always report effect sizes alongside p-values for practical significance.
  • Use confidence intervals to convey estimation uncertainty.
  • Check assumptions such as normality and equal variance before applying parametric tests.
  • Pre-register hypotheses and analysis plans to limit data dredging and selective reporting.
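
As a rough sketch of the first two tips, the code below computes Cohen's d (a standardized effect size using the pooled standard deviation) and a percentile bootstrap confidence interval for the difference in means. The bootstrap is used here only because it needs nothing beyond the standard library; a t-based interval would be the more conventional choice for data like these. The helper names and the data are hypothetical.

```python
import random
from statistics import mean, variance

def cohens_d(a, b):
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5

def bootstrap_ci(a, b, level=0.95, n_resamples=5_000, seed=0):
    """Percentile bootstrap interval for mean(a) - mean(b) (illustrative helper)."""
    rng = random.Random(seed)
    diffs = sorted(
        mean(rng.choices(a, k=len(a))) - mean(rng.choices(b, k=len(b)))
        for _ in range(n_resamples)
    )
    tail = (1 - level) / 2
    lower = diffs[int(tail * n_resamples)]
    upper = diffs[int((1 - tail) * n_resamples) - 1]
    return lower, upper

# Hypothetical measurements, made up for this sketch.
control = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
treated = [12.9, 13.1, 12.6, 13.4, 12.8, 13.0, 12.7, 13.2]

d = cohens_d(treated, control)
lower, upper = bootstrap_ci(treated, control)
print(f"Cohen's d = {d:.2f}, 95% CI for the difference = ({lower:.3f}, {upper:.3f})")
```

Reporting d and the interval together shows both how large the effect is and how precisely it was estimated, which a p-value alone does not.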

Common issues

  • Confusing failure to reject the null with acceptance of the null.
  • Running many tests without correction, which inflates the familywise error rate.
  • Using small samples, which yields underpowered and inconclusive results.
  • Violating test assumptions, which leads to invalid p-values.
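
The multiple-testing issue above can be illustrated with a short sketch. Assuming independent tests, the chance of at least one false positive across m tests at level alpha is 1 - (1 - alpha)^m. The code compares uncorrected decisions with Bonferroni and Holm corrections; the p-values are hypothetical and the function names are illustrative.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def holm(p_values, alpha=0.05):
    """Holm step-down procedure: test sorted p-values against growing thresholds."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break                           # stop at the first non-rejection
    return reject

# Hypothetical p-values from five separate tests.
p_values = [0.001, 0.012, 0.030, 0.047, 0.20]
alpha = 0.05

# Familywise error rate if all five nulls were true and the tests independent.
fwer = 1 - (1 - alpha) ** len(p_values)

uncorrected = [p <= alpha for p in p_values]
print(f"uncorrected FWER = {fwer:.3f}")
print("uncorrected:", uncorrected)
print("bonferroni: ", bonferroni(p_values, alpha))
print("holm:       ", holm(p_values, alpha))
```

Holm controls the familywise error rate as Bonferroni does but never rejects fewer hypotheses, so it is often the better default of the two.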

Example

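import pandas as pd
import numpy as np

# Small example column with one missing value
df = pd.DataFrame({'sales': [100, 150, 200, np.nan]})

# Replace the missing value with the median of the observed values (150)
df['sales'] = df['sales'].fillna(df['sales'].median())

# Count, mean, standard deviation, min, quartiles, and max
print(df.describe())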

This snippet creates a DataFrame, fills the missing value with the column median, and prints summary statistics. It is a data-preparation step typical of exploratory analysis that precedes testing, not a hypothesis test in itself.