How to deploy a machine learning model to production

· Category: AI & Machine Learning

Short answer

Wrap your trained model in a REST API (using Flask, FastAPI, or specialized frameworks like BentoML), containerize it with Docker, and deploy to a cloud platform. Monitor predictions and model drift in production. For API design, see what is a REST API.

Deployment options

  1. REST API: Serve predictions via HTTP endpoints using FastAPI or Flask
  2. Batch prediction: Run predictions on a schedule against a database
  3. Edge deployment: Convert models to ONNX or TensorFlow Lite to run on mobile or embedded devices
  4. Serverless: Deploy with AWS Lambda for sporadic or bursty traffic; cold starts add latency, so it suits small models without strict latency requirements
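
As an illustration of the batch option, here is a minimal sketch that scores unscored rows in a SQLite table. The table name samples, its columns (id, f1, f2, prediction), and the MeanModel stand-in are all hypothetical; a real job would load the trained model and be run by a scheduler such as cron.

```python
import sqlite3


class MeanModel:
    """Stand-in for a trained model with a scikit-learn style predict()."""

    def predict(self, rows):
        return [sum(r) / len(r) for r in rows]


def batch_predict(conn: sqlite3.Connection, model, batch_size: int = 500) -> int:
    """Score every row that has no prediction yet; return the number scored."""
    scored = 0
    while True:
        # Re-query each round so we never update rows under an open cursor.
        rows = conn.execute(
            "SELECT id, f1, f2 FROM samples WHERE prediction IS NULL LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return scored
        preds = model.predict([[f1, f2] for _, f1, f2 in rows])
        conn.executemany(
            "UPDATE samples SET prediction = ? WHERE id = ?",
            [(float(p), row[0]) for p, row in zip(preds, rows)],
        )
        conn.commit()  # commit per batch so a crash loses at most one batch
        scored += len(rows)
```

Selecting only rows where prediction is NULL makes the job safe to re-run: a second run after a failure picks up where the first one stopped.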

Containerization

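FROM python:3.11-slim
WORKDIR /app
# Install dependencies before copying code so this layer stays cached
RUN pip install --no-cache-dir fastapi uvicorn scikit-learn
COPY model.pkl app.py ./
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]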

For Docker basics, see how to write a Dockerfile.

Monitoring

  • Track prediction latency and error rates
  • Detect data drift by comparing live input distributions against the training data
  • Set up automated retraining pipelines that run when drift or accuracy degradation is detected
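
One way to quantify data drift for a single numeric feature is the Population Stability Index (PSI), sketched below with the standard library only. PSI is one common choice rather than the only one, and the bin count and eps floor here are assumptions to tune for your data.

```python
import math


def psi(expected, actual, bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the reference range; fall back to width 1.0
    # if every reference value is identical.
    width = (hi - lo) / bins or 1.0

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp out-of-range live values into the first or last bin.
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A PSI of 0 means the binned distributions match. A commonly cited rule of thumb treats values above roughly 0.25 as significant drift, but the threshold that should trigger an alert or retraining depends on the feature and the model.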

Tips

  • Version your models and keep training metadata
  • Use A/B testing to compare model versions in production
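
Both tips can be sketched with the standard library. The helper names, the metadata fields, and the salt value below are hypothetical; dedicated model registries and experimentation platforms cover the same ground with more features.

```python
import hashlib
from datetime import datetime, timezone


def model_metadata(model_bytes: bytes, version: str, params: dict, metrics: dict) -> dict:
    """Build a metadata record to store next to the model artifact."""
    return {
        "version": version,
        # The hash ties this record to one exact serialized model.
        "sha256": hashlib.sha256(model_bytes).hexdigest(),
        "created_at": datetime.now(timezone.utc).isoformat(),
        "params": params,
        "metrics": metrics,
    }


def assign_variant(user_id: str, treatment_share: float = 0.1,
                   salt: str = "model-ab-v1") -> str:
    """Deterministically map a user to 'control' or 'treatment'."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    # Interpret the first 8 bytes as an integer and scale it to [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "treatment" if bucket < treatment_share else "control"
```

Hashing the user ID keeps each user on the same model version across requests without storing assignments, and changing the salt reshuffles users for a new experiment. Logging the assigned variant together with the model version makes the two versions comparable afterwards.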