How linear regression makes predictions

· Category: Data Science

Short answer

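Linear regression predicts a continuous outcome by fitting a straight line (or, with several predictors, a plane or hyperplane) that minimizes the sum of squared residuals, where a residual is the difference between an observed value and the model's prediction for it.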
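
As a small worked illustration of "minimizing the sum of squared residuals", the sketch below fits a one-predictor line with the textbook closed-form formulas. The five data points and the helper name sse are made up for this example; they are not from any real dataset.

```python
# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)

# Closed-form least-squares estimates for a single predictor:
#   slope     = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
#   intercept = y_mean - slope * x_mean
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
sxx = sum((xi - x_mean) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_mean - slope * x_mean


def sse(b0, b1):
    """Sum of squared residuals for the line y = b0 + b1 * x (hypothetical helper)."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# slope is about 0.6 and intercept about 2.2 for this data.
print(slope, intercept, sse(intercept, slope))

# Any other line gives a larger sum of squared residuals.
print(sse(intercept, slope + 0.1), sse(intercept + 0.1, slope))
```

Nudging either the slope or the intercept away from the fitted values increases the sum of squared residuals, which is what "least squares" means in practice.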

Steps

  1. Identify the dependent variable and one or more independent variables.
  2. Verify that the relationship is approximately linear and residuals are homoscedastic.
  3. Fit the model using ordinary least squares to estimate coefficients.
  4. Interpret each coefficient as the expected change in the outcome per one-unit change in that predictor, holding the other predictors constant.
  5. Predict new values by plugging inputs into the fitted equation.
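
The steps above can be sketched with NumPy's least-squares solver. This is a minimal sketch under stated assumptions: the data is synthetic and noise-free (generated from y = 1 + 2*x1 + 3*x2 so the recovered coefficients are easy to check), and add_intercept is a hypothetical helper introduced only for this example.

```python
import numpy as np

# Step 1: outcome y and two predictors (columns of X).
# Synthetic, noise-free data invented for illustration: y = 1 + 2*x1 + 3*x2.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([9.0, 8.0, 19.0, 18.0, 26.0])


def add_intercept(X):
    """Prepend a column of ones so the model can estimate an intercept."""
    return np.column_stack([np.ones(len(X)), X])


# Step 3: ordinary least squares via NumPy's least-squares solver.
coef, *_ = np.linalg.lstsq(add_intercept(X), y, rcond=None)
intercept, b1, b2 = coef

# Step 4: b1 is the expected change in y per unit of x1 with x2 held fixed.
# Step 5: predict by plugging new inputs into the fitted equation.
X_new = np.array([[6.0, 2.0]])
y_pred = add_intercept(X_new) @ coef

print(coef)    # close to [1, 2, 3] because the data has no noise
print(y_pred)  # close to [19], i.e. 1 + 2*6 + 3*2
```

Step 2 (checking linearity and constant residual variance) is not shown here; with real, noisy data it would normally come from inspecting the residuals, y - add_intercept(X) @ coef, before trusting the fit.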

Tips

  • Standardize predictors when comparing coefficient magnitudes across variables.
  • Include interaction terms when the effect of one predictor depends on another.
  • Check the residual plot for patterns indicating model misspecification.
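  • Regularize with Ridge or Lasso when multicollinearity is present: Ridge shrinks and stabilizes correlated coefficients, while Lasso can set some of them exactly to zero.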
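
The standardization, interaction, and regularization tips above can be sketched together in plain NumPy. This is an illustrative sketch, not a recommended pipeline: the data is randomly generated with a fixed seed, the helper names standardize and ridge_fit are hypothetical, the penalty alpha=10.0 is an arbitrary choice, and ridge is computed from its closed-form solution rather than through a library estimator.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; synthetic data for illustration
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)  # nearly collinear with x1
y = 3.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=n)


def standardize(X):
    """Z-score each column: subtract its mean, divide by its standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def ridge_fit(Z, y_centered, alpha):
    """Closed-form ridge: solve (Z'Z + alpha*I) beta = Z'y. alpha=0 gives OLS."""
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + alpha * np.eye(p), Z.T @ y_centered)


# Third column is an interaction term: the effect of x1 may depend on x2.
X = np.column_stack([x1, x2, x1 * x2])
Z = standardize(X)            # comparable coefficient scales
y_centered = y - y.mean()     # centering removes the need for an intercept

beta_ols = ridge_fit(Z, y_centered, alpha=0.0)
beta_ridge = ridge_fit(Z, y_centered, alpha=10.0)  # alpha chosen arbitrarily

print("OLS  :", beta_ols)
print("Ridge:", beta_ridge)
```

With nearly collinear predictors the unpenalized coefficients can be large and unstable, and the ridge penalty pulls them toward zero. In practice the penalty strength would be chosen by cross-validation rather than fixed by hand.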

Common issues

  • Overfitting when too many predictors are included relative to sample size.
  • Violation of independence assumptions in time-series or clustered data.
  • Extrapolation beyond the range of training data producing unreliable predictions.
  • Heteroscedasticity (non-constant residual variance) invalidating the usual standard error estimates, and with them the confidence intervals and p-values.
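
To make the heteroscedasticity issue above concrete, the sketch below compares the usual OLS standard errors with heteroscedasticity-robust ones (White's HC0 "sandwich" estimator, named here as one common remedy, not something the list above prescribes). The data is synthetic, with noise whose spread is deliberately made to grow with x; all variable names are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed; synthetic data for illustration
n = 200
x = rng.uniform(0.0, 10.0, size=n)
noise = rng.normal(scale=0.2 + 0.3 * x)  # noise spread grows with x
y = 2.0 + 0.5 * x + noise

# Fit by ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

XtX_inv = np.linalg.inv(X.T @ X)

# Usual standard errors: assume one common residual variance sigma^2.
sigma2 = resid @ resid / (n - X.shape[1])
se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

# HC0 robust standard errors: (X'X)^-1 X' diag(e_i^2) X (X'X)^-1.
meat = X.T @ (X * resid[:, None] ** 2)
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print("classical SE:", se_classical)
print("robust SE   :", se_robust)
```

When the two sets of standard errors differ noticeably, that is a sign the constant-variance assumption is doing real work, and inference based on the classical values should be treated with caution.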

Example

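import pandas as pd
import numpy as np

# Small sales column with one missing observation.
df = pd.DataFrame({'sales': [100, 150, 200, np.nan]})

# Fill the gap with the median of the observed values (NaN is skipped).
df['sales'] = df['sales'].fillna(df['sales'].median())

# Summary statistics: count, mean, std, min, quartiles, max.
print(df.describe())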

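This snippet creates a DataFrame, fills the missing value with the median of the observed values (150), and prints summary statistics. Cleaning of this kind is a common exploratory step before fitting a regression, since ordinary least squares cannot use rows with missing values.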