How to check assumptions for statistical tests

Question

QA Hub Editorial · Accepted Answer

Short answer

Statistical tests rely on assumptions about data distribution, variance, and independence that must be verified to ensure valid conclusions.

Steps

Test normality with Shapiro-Wilk, Anderson-Darling, or visual Q-Q plots.
Assess homoscedasticity with Levene's test or by plotting residuals versus fitted values.
Check independence by examining data collection methods and autocorrelation functions.
Review sample size to determine whether asymptotic approximations apply.
If assumptions are violated, apply transformations or switch to robust non-parametric tests.

Tips

Use graphical checks in addition to formal tests since large samples often reject trivial deviations.
Log-transform right-skewed data to improve normality and stabilize variance.
Use bootstrap methods when assumptions are uncertain.
Report assumption checks transparently to strengthen credibility.

Common issues

Over-reliance on formal normality tests with large sample sizes.
Ignoring clustered or hierarchical structures that violate independence.
Applying transformations that complicate coefficient interpretation.
Continuing with parametric tests after clear assumption violations.

Example

import pandas as pd
import numpy as np

df = pd.DataFrame({'sales': [100, 150, 200, np.nan]})
df['sales'] = df['sales'].fillna(df['sales'].median())
print(df.describe())

This snippet creates a DataFrame, handles a missing value with the median, and prints summary statistics common in exploratory analysis.

Short answer

Steps

Tips

Common issues

Example

Related Questions

What is the difference between precision and recall

What is feature engineering and why is it important

How to preprocess data for machine learning models

How to document a data pipeline

How to optimize slow data pipelines

How to version datasets for reproducibility