How to deploy a machine learning model to production
Category: AI & Machine Learning
Short answer
Wrap your trained model in a REST API (using Flask, FastAPI, or a specialized framework such as BentoML), containerize it with Docker, and deploy it to a cloud platform. Once it is live, monitor prediction latency, error rates, and data drift. For API design, see what is a REST API.
Deployment options
- REST API: Serve predictions via HTTP endpoints using FastAPI or Flask
- Batch prediction: Run predictions on a schedule against a database
- Edge deployment: Convert models to ONNX or TensorFlow Lite for mobile/embedded
- Serverless: Deploy with AWS Lambda for sporadic traffic that can tolerate cold-start latency
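As an illustration of the batch option, here is a minimal sketch that scores unscored database rows in chunks. It uses SQLite from the Python standard library so it runs without extra dependencies. The table name inputs, the columns f1, f2 and prediction, and the function name run_batch are all hypothetical, and predict stands in for whatever your model exposes.

```python
import sqlite3
from typing import Callable, List, Sequence


def run_batch(
    conn: sqlite3.Connection,
    predict: Callable[[List[Sequence[float]]], List[float]],
    batch_size: int = 500,
) -> int:
    """Score every row that has no prediction yet; return how many were scored.

    Assumes a hypothetical table inputs(id, f1, f2, prediction) with
    positive integer ids.
    """
    last_id = 0  # walk forward by id so the loop always terminates
    total = 0
    while True:
        rows = conn.execute(
            "SELECT id, f1, f2 FROM inputs "
            "WHERE prediction IS NULL AND id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return total
        ids = [row[0] for row in rows]
        features = [row[1:] for row in rows]
        predictions = predict(features)
        if len(predictions) != len(rows):
            raise ValueError("predict must return one value per input row")
        # Write the whole chunk back in one transaction.
        conn.executemany(
            "UPDATE inputs SET prediction = ? WHERE id = ?",
            list(zip(predictions, ids)),
        )
        conn.commit()
        last_id = ids[-1]
        total += len(rows)
```

A scheduler such as cron can then call run_batch at whatever interval the use case needs. Because only rows without a prediction are selected, re-running the job after a failure does not score the same row twice.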
Containerization
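FROM python:3.11-slim
WORKDIR /app
# Install dependencies before copying code so this layer stays cached
RUN pip install --no-cache-dir fastapi uvicorn scikit-learn
COPY model.pkl app.py ./
EXPOSE 8000
# Run from /app so uvicorn can import app.py
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]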
For Docker basics, see how to write a Dockerfile.
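The Dockerfile above copies an app.py and starts it with uvicorn, but the file itself is not shown. The following is one possible sketch of it, assuming a pickled scikit-learn-style model with a predict method. The /predict and /health routes, the rows payload field, and the helper names are hypothetical choices, not something the Dockerfile requires.

```python
import pickle
from pathlib import Path
from typing import Any, List, Optional

MODEL_PATH = Path("model.pkl")  # file name taken from the Dockerfile's COPY line


def load_model(path: Path = MODEL_PATH) -> Any:
    """Unpickle the trained model. Only unpickle files you trust."""
    with path.open("rb") as f:
        return pickle.load(f)


def predict_rows(model: Any, rows: List[List[float]]) -> list:
    """Run the model and convert NumPy scalars to plain Python values for JSON."""
    return [p.item() if hasattr(p, "item") else p for p in model.predict(rows)]


def create_app(model: Optional[Any] = None):
    """Build the FastAPI app; the model is loaded once, not on every request."""
    # Imported here so the helpers above stay usable without the web stack.
    from fastapi import FastAPI
    from pydantic import BaseModel

    model = model if model is not None else load_model()

    class PredictRequest(BaseModel):
        rows: List[List[float]]  # hypothetical payload: one feature list per row

    app = FastAPI()

    @app.get("/health")
    def health():
        return {"status": "ok"}

    @app.post("/predict")
    def predict(payload: PredictRequest):
        return {"predictions": predict_rows(model, payload.rows)}

    return app
```

To serve this with the app:app target used in the Dockerfile, end the file with app = create_app(). That line is left out of the sketch so the helpers can be imported and tested without model.pkl being present. Keeping the prediction logic in predict_rows also makes it testable without starting a server.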
Monitoring
- Track prediction latency and error rates
- Detect data drift by comparing live input distributions against the training data
- Set up automated retraining pipelines that run when drift or accuracy degradation is detected
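One common way to quantify drift on a single numeric feature is the Population Stability Index (PSI), which compares binned frequencies of a reference sample (for example, training data) with a recent production sample. The sketch below is a dependency-free illustration, not a replacement for a monitoring library. The function name psi, the bin count, and the smoothing constant are assumptions.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of one numeric feature.

    Bins are equal-width over the reference range; values outside that
    range are counted in the first or last bin.
    """
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width for a constant feature

    def fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        eps = 1e-6  # assumed smoothing so empty bins do not produce log(0)
        return [max(c / len(values), eps) for c in counts]

    ref, cur = fractions(reference), fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))
```

A frequently quoted rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant shift, but these thresholds are conventions and should be tuned per feature. A value crossing the chosen threshold is a natural trigger for an alert or a retraining run.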
Tips
- Version every model artifact and record its training metadata (data snapshot, hyperparameters, evaluation metrics)
- Use A/B testing to compare model versions in production
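For A/B testing, each user should see the same model version on every request, otherwise the comparison is muddied. A simple way to get that is to hash a stable identifier into a bucket. The sketch below assumes a string user ID is available. The function name assign_variant, the salt value, and the variant labels are hypothetical.

```python
import hashlib


def assign_variant(
    user_id: str,
    treatment_share: float = 0.1,
    salt: str = "model-ab-v1",
) -> str:
    """Deterministically route a user to the 'baseline' or 'candidate' model.

    The same user_id and salt always map to the same variant; changing the
    salt reshuffles users for a new experiment.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "candidate" if bucket < treatment_share else "baseline"
```

Logging the returned variant alongside each prediction, together with the model version, lets you compare outcome metrics per version afterwards.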