What is a normal distribution and why it matters
Category: Data Science
Short answer
The normal distribution is a symmetric, bell-shaped probability distribution characterized by its mean and standard deviation.
How it works
In a normal distribution, about 68 percent of values fall within one standard deviation of the mean, about 95 percent within two, and about 99.7 percent within three. This predictable structure makes the distribution mathematically tractable: probabilities and confidence intervals follow directly from the mean and standard deviation. Many natural phenomena are approximately normal, and the central limit theorem states that the means of independent samples from a population with finite variance tend toward a normal distribution as the sample size grows, whatever the shape of the population distribution.
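The two claims above can be checked numerically. The following is a minimal sketch using only the Python standard library; the exponential population, the sample size of 50, and the helper names (`prob_within`, `sample_mean`) are illustrative assumptions, not part of the original text.

```python
import random
from statistics import NormalDist, mean, stdev

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def prob_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

# Coverage for 1, 2, and 3 standard deviations (the 68-95-99.7 rule).
coverage = {k: prob_within(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} sd: {p:.4f}")

# Central limit theorem: draw many samples from a skewed (exponential)
# population and look at the distribution of their means.
rng = random.Random(0)  # fixed seed so the run is repeatable

def sample_mean(n: int) -> float:
    """Mean of n independent draws from an Exponential(1) population."""
    return mean(rng.expovariate(1.0) for _ in range(n))

sample_means = [sample_mean(50) for _ in range(2000)]
center = mean(sample_means)
spread = stdev(sample_means)

# If the sample means are roughly normal, about 68 percent of them
# should lie within one standard deviation of their own mean.
frac_within_one_sd = sum(abs(m - center) <= spread for m in sample_means) / len(sample_means)
print(f"mean of sample means: {center:.3f}")
print(f"fraction within one sd: {frac_within_one_sd:.3f}")
```

The first loop reproduces the 68, 95, and 99.7 percent figures to four decimals (0.6827, 0.9545, 0.9973). The second part shows that means of samples from a clearly non-normal population still cluster in a roughly normal way.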
Example
Heights in a homogeneous population often follow a normal distribution. If the mean height is 170 centimeters with a standard deviation of 10, then roughly 95 percent of individuals fall between 150 and 190 centimeters.
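The height figures above can be reproduced with a short calculation. This is a sketch that assumes heights are exactly normal with mean 170 and standard deviation 10; the variable names are illustrative.

```python
from statistics import NormalDist

# Assumed model from the example: mean 170 cm, standard deviation 10 cm.
heights = NormalDist(mu=170, sigma=10)

# Probability of a height between 150 and 190 cm (two standard deviations each side).
p_150_190 = heights.cdf(190) - heights.cdf(150)
print(f"P(150 <= height <= 190) = {p_150_190:.4f}")

# The interval that contains exactly the central 95 percent is slightly
# narrower, because it spans about 1.96 standard deviations, not 2.
low = heights.inv_cdf(0.025)
high = heights.inv_cdf(0.975)
print(f"central 95% interval: {low:.1f} to {high:.1f} cm")
```

Two standard deviations cover about 95.45 percent, which is why the text says "roughly 95 percent". The exact central 95 percent interval runs from about 150.4 to 189.6 centimeters.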
Why it matters
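Normality assumptions underlie parametric methods such as t-tests and the standard inference for linear regression, where the assumption applies to the errors (residuals) rather than to the raw variables. When those assumptions hold, these methods are generally more powerful than non-parametric alternatives. Understanding how data deviate from normality, for example through skew or heavy tails, helps analysts choose appropriate models and transformations.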
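As one illustration of spotting a deviation from normality and choosing a transformation, the sketch below measures the skewness of simulated right-skewed data before and after a log transform. The lognormal data, the seed, and the `skewness` helper are assumptions made for this example; sample skewness is only one rough diagnostic among several.

```python
import math
import random
from statistics import mean, pstdev

def skewness(values: list[float]) -> float:
    """Sample skewness: the mean cubed z-score. Near 0 for symmetric data."""
    m = mean(values)
    s = pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)

rng = random.Random(42)  # fixed seed so the run is repeatable

# Simulated right-skewed data (lognormal), standing in for something like incomes.
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]

# Taking logs of lognormal data gives normally distributed values.
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_logged = skewness(logged)
print(f"skewness before transform: {skew_raw:.2f}")
print(f"skewness after log transform: {skew_logged:.2f}")
```

A strongly positive skewness on the raw data signals a long right tail. A value near zero after the transform suggests the logged data are much closer to symmetric, so methods that assume normality are more defensible on that scale.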
Example
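import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical placeholder data; the original snippet does not define df.
df = pd.DataFrame({'category': ['A', 'B', 'C'], 'value': [120, 95, 143]})

sns.set_theme(style='whitegrid')  # light grid background
plt.figure(figsize=(10, 6))
sns.barplot(x='category', y='value', data=df)
plt.xlabel('Category')
plt.ylabel('Value')
plt.title('Sales by Category')
plt.show()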
This snippet sets a seaborn theme and draws a bar chart with labeled axes and a clear title.