How linear regression makes predictions
Category: Data Science
Short answer
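Linear regression predicts a continuous outcome by fitting a linear equation (a straight line with one predictor, a hyperplane with several) whose coefficients minimize the sum of squared residuals, that is, the squared differences between observed and predicted values.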
Steps
- Identify the dependent variable and one or more independent variables.
- Verify that the relationship is approximately linear and that the residuals are homoscedastic (roughly constant variance across fitted values).
- Fit the model using ordinary least squares to estimate coefficients.
- Interpret each coefficient as the expected change in the outcome per one-unit change in that predictor, holding the other predictors constant.
- Predict new values by plugging inputs into the fitted equation.
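The steps above can be sketched with NumPy's least-squares solver. This is a minimal illustration, not a full workflow: the data are made-up toy values, and the names x, y, and predict are chosen here for the example only.

```python
import numpy as np

# Hypothetical toy data: one predictor (x) and one continuous outcome (y).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Design matrix with a column of ones so the model includes an intercept.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: find the coefficients that minimize the
# sum of squared residuals.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef


def predict(x_new):
    """Plug new inputs into the fitted equation."""
    return intercept + slope * np.asarray(x_new, dtype=float)


residuals = y - predict(x)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
print("sum of squared residuals:", float(residuals @ residuals))
# 3.5 lies inside the training range of x, so this is not an extrapolation.
print("prediction at x=3.5:", float(predict(3.5)))
```

The slope is read as the expected change in y per one-unit increase in x. With several predictors, each one becomes an extra column in X and the same call applies.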
Tips
- Standardize predictors when comparing coefficient magnitudes across variables.
- Include interaction terms when the effect of one predictor depends on another.
- Check the residual plot for patterns indicating model misspecification.
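- Regularize with Ridge or Lasso when multicollinearity is present: Ridge shrinks the coefficients of correlated predictors toward each other, while Lasso can set some of them to exactly zero.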
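As a rough illustration of the standardization and regularization tips, the sketch below fits two nearly collinear predictors with and without a ridge penalty. It uses the closed-form ridge solution in plain NumPy rather than a library estimator. The data are simulated, and the helper ridge_coefficients and the penalty value alpha=10.0 are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: x2 is almost a copy of x1, so the predictors are collinear.
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)
X = np.column_stack([x1, x2])

# Standardize predictors (zero mean, unit variance) and center the outcome
# so the penalty treats every coefficient on the same scale.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
yc = y - y.mean()


def ridge_coefficients(Xs, yc, alpha):
    """Closed-form ridge solution; alpha=0 reduces to ordinary least squares."""
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + alpha * np.eye(p), Xs.T @ yc)


ols_coef = ridge_coefficients(Xs, yc, alpha=0.0)
ridge_coef = ridge_coefficients(Xs, yc, alpha=10.0)
print("OLS coefficients:  ", ols_coef)
print("Ridge coefficients:", ridge_coef)
```

With collinear predictors the unpenalized coefficients tend to be unstable and can take large offsetting values, while the ridge coefficients are pulled toward each other. In practice the penalty strength is usually chosen by cross-validation rather than fixed by hand.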
Common issues
- Overfitting when too many predictors are included relative to sample size.
- Violation of independence assumptions in time-series or clustered data.
- Extrapolation beyond the range of training data producing unreliable predictions.
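- Heteroscedasticity biasing standard error estimates, which undermines confidence intervals and hypothesis tests even though the coefficient estimates themselves remain unbiased.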
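Two of the issues above, heteroscedasticity and extrapolation, can be screened for with a few lines of NumPy. This is an informal sketch on simulated data whose noise grows with x by construction. The split-in-half comparison is a rough screen rather than a formal test, and is_extrapolation is a hypothetical helper written for this example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data whose noise standard deviation grows with x.
x = np.linspace(1.0, 10.0, 200)
y = 2.0 + 0.5 * x + rng.normal(scale=0.2 * x)

# Fit by ordinary least squares and compute residuals.
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
residuals = y - fitted

# Rough screen: compare residual spread in the lower and upper halves
# of the fitted values. A large gap suggests non-constant variance.
order = np.argsort(fitted)
half = len(order) // 2
spread_low = residuals[order[:half]].std()
spread_high = residuals[order[half:]].std()
print(f"residual std (low fitted values):  {spread_low:.3f}")
print(f"residual std (high fitted values): {spread_high:.3f}")


def is_extrapolation(x_new, x_train):
    """Flag inputs that fall outside the range seen during fitting."""
    x_new = np.asarray(x_new, dtype=float)
    return (x_new < x_train.min()) | (x_new > x_train.max())


print(is_extrapolation([0.5, 5.0, 12.0], x))
```

A residual-versus-fitted plot shows the same pattern visually. Predictions for flagged inputs should be treated with caution, since the linear relationship has only been checked inside the training range.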
Example
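import pandas as pd
import numpy as np

# Small DataFrame with one missing sales value.
df = pd.DataFrame({'sales': [100, 150, 200, np.nan]})
# Fill the missing value with the median of the observed values.
df['sales'] = df['sales'].fillna(df['sales'].median())
# Summary statistics: count, mean, std, min, quartiles, max.
print(df.describe())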
This snippet creates a DataFrame, fills the missing sales value with the median of the observed values (150), and prints summary statistics, a typical exploratory step before fitting a regression.