How to understand and use p-values correctly

QA Hub Editorial · Accepted Answer

Short answer

A p-value is the probability, computed assuming the null hypothesis is true, of observing a result at least as extreme as the one in the sample.
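
One way to make this definition concrete is a permutation test, sketched below with the standard library only. The function name permutation_p_value and the two small data lists are hypothetical, chosen for illustration. If the null hypothesis holds, the group labels are interchangeable, so the p-value can be estimated as the share of random relabelings whose difference in means is at least as large as the observed one.

```python
import random
from statistics import mean

def permutation_p_value(a, b, n_shuffles=10_000, seed=0):
    """Estimate a two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        # Under the null hypothesis the group labels are interchangeable.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)

# Hypothetical measurements for two groups (illustration only).
control = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3]
treated = [12.9, 12.7, 13.1, 12.4, 13.0, 12.8]

p = permutation_p_value(control, treated)
print(f"estimated p-value: {p:.4f}")
```

A small value here says only that such a large difference would be rare if the labels did not matter. It does not say how likely the null hypothesis is.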

Steps

Formulate a clear null hypothesis before conducting the analysis.
Choose an appropriate statistical test based on data type and distribution.
Compute the p-value from the test statistic using the relevant distribution.
Compare the p-value to a predetermined significance level alpha.
Report the exact p-value, effect size, and confidence interval rather than just significance.
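
The steps above can be sketched end to end as follows. This is a minimal illustration, not a recommended analysis: the data are simulated, the names group_a and group_b are hypothetical, and the standard normal distribution stands in for the t distribution as a large-sample approximation so that only the standard library is needed. In practice a library routine such as scipy.stats.ttest_ind would usually do this work.

```python
import math
import random
from statistics import NormalDist, mean, variance

# Hypothetical data: two groups of 200 simulated measurements.
rng = random.Random(42)
group_a = [rng.gauss(50.0, 10.0) for _ in range(200)]
group_b = [rng.gauss(53.0, 10.0) for _ in range(200)]

# Steps 1 and 4: the null hypothesis is "equal population means",
# and alpha is fixed before looking at the data.
alpha = 0.05

# Steps 2 and 3: two-sample z statistic with unpooled standard error,
# then a two-sided p-value from the standard normal distribution.
diff = mean(group_b) - mean(group_a)
var_a, var_b = variance(group_a), variance(group_b)
se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
z = diff / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 5: effect size (Cohen's d, pooled SD for equal group sizes)
# and a confidence interval for the difference in means.
cohens_d = diff / math.sqrt((var_a + var_b) / 2)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
ci_low, ci_high = diff - z_crit * se, diff + z_crit * se

print(f"difference = {diff:.2f}, Cohen's d = {cohens_d:.2f}")
print(f"p = {p_value:.4g}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
print("p below alpha" if p_value < alpha else "p not below alpha")
```

Printing the difference, the effect size, and the interval alongside the p-value keeps the size and precision of the effect visible, which a bare significance verdict hides.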

Tips

Treat p-values as continuous measures of evidence, not binary switches.
A non-significant p-value does not prove the null hypothesis is true.
Replicate studies to confirm findings rather than relying on a single p-value.
Combine p-values with domain knowledge and study design quality.
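
A small simulation can illustrate why a non-significant result does not prove the null hypothesis. The sketch below assumes a real effect of 0.3 standard deviations, 20 observations per group, and a known standard deviation of 1. All of these numbers, and the helper two_sample_p, are hypothetical choices for illustration. Under these assumptions only about one study in six should reach p < 0.05, even though the effect exists in every simulated study.

```python
import math
import random
from statistics import NormalDist, fmean

def two_sample_p(a, b, sigma=1.0):
    """Two-sided z-test p-value, assuming a known common standard deviation."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (fmean(b) - fmean(a)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(1)
n_studies, n_per_group, true_effect = 2000, 20, 0.3

significant = 0
for _ in range(n_studies):
    # The effect is real in every simulated study.
    a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    b = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
    if two_sample_p(a, b) < 0.05:
        significant += 1

power = significant / n_studies
print(f"share of studies with p < 0.05: {power:.2f}")
```

Most of these small studies miss an effect that is really there, which is also why replication tells you more than any single p-value.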

Common issues

Treating p < 0.05 as automatic proof of a real effect.
P-hacking by selectively reporting analyses that achieved significance.
Ignoring practical significance when effects are statistically significant but tiny.
Misreading the p-value as the probability that the null hypothesis is true.
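
The cost of selective reporting can be sketched with a simulation in which the null hypothesis is true for every analysis. The setup is hypothetical: each simulated study measures 20 independent outcomes that are pure noise, tests each with a z-test that assumes a known standard deviation, and counts as a "finding" if any of the 20 p-values falls below 0.05. For independent tests the expected rate is 1 - 0.95^20, about 64%.

```python
import math
import random
from statistics import NormalDist, fmean

def one_sample_p(sample, mu0=0.0, sigma=1.0):
    """Two-sided z-test p-value for the mean, assuming a known sigma."""
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(7)
n_studies, n_outcomes, n_obs = 1000, 20, 30

any_hit = 0
for _ in range(n_studies):
    # Every outcome is pure noise, so the null is true for all of them.
    p_values = [
        one_sample_p([rng.gauss(0.0, 1.0) for _ in range(n_obs)])
        for _ in range(n_outcomes)
    ]
    # Selective reporting: keep only the best-looking analysis.
    if min(p_values) < 0.05:
        any_hit += 1

rate = any_hit / n_studies
print(f"studies with at least one p < 0.05: {rate:.2f}")
print(f"expected for independent tests: {1 - 0.95 ** n_outcomes:.2f}")
```

Reporting every analysis that was run, or correcting for the number of tests, keeps the false-positive rate close to the nominal alpha.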

Example

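import pandas as pd
import numpy as np

# Small sales table with one missing value.
df = pd.DataFrame({'sales': [100, 150, 200, np.nan]})
# Fill the gap with the median of the observed values (150).
df['sales'] = df['sales'].fillna(df['sales'].median())
# Print count, mean, standard deviation, and quartiles.
print(df.describe())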

This snippet creates a DataFrame, handles a missing value with the median, and prints summary statistics common in exploratory analysis.
