How to Deploy a Machine Learning Model with FastAPI: A Step-by-Step Tutorial


Deploying a machine learning (ML) model is a critical step towards integrating AI into real-world applications. While model training is essential, it's deployment that bridges your model with users, services, or products. In this hands-on tutorial, you'll learn how to expose a trained ML model via a REST API using FastAPI, a modern, high-performance web framework for Python.

Why Use FastAPI for Model Deployment?

  • Speed: FastAPI is one of the fastest Python web frameworks available.
  • Asynchronous Support: It handles asynchronous code and multiple requests efficiently.
  • Easy Integration: Compatible with standard Python ML libraries (scikit-learn, TensorFlow, and others).
  • Automatic Documentation: Generates interactive Swagger docs from your code without extra effort.

Prerequisites

  • Basic knowledge of Python and scikit-learn (or similar ML libraries).
  • Python 3.7+ installed.
  • A trained and serialized ML model (e.g., saved as a .pkl or .joblib file).

Step 1: Install Required Packages

Install FastAPI and an ASGI server (Uvicorn) to serve your API, plus any ML libraries you need.

pip install fastapi uvicorn scikit-learn joblib

Step 2: Prepare Your Trained Model

Assume you have a trained model saved as model.joblib. If you don't, you can create one with joblib.dump():

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
import joblib

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier()
clf.fit(X, y)
joblib.dump(clf, 'model.joblib')

Step 3: Create the FastAPI App

Build the API in a file, for example main.py:

from fastapi import FastAPI
from pydantic import BaseModel
from typing import List
import joblib
import numpy as np

# Load the serialized model once at startup
model = joblib.load('model.joblib')

app = FastAPI()

class PredictRequest(BaseModel):
    features: List[float]

@app.post('/predict')
def predict(request: PredictRequest):
    X = np.array(request.features).reshape(1, -1)
    prediction = model.predict(X)
    return {"prediction": int(prediction[0])}

  • PredictRequest: Defines the expected input as a list of feature values.
  • The model is loaded once at startup, so it isn't re-read from disk on every request.
  • Accepts POST requests to /predict with feature data.
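A note on the reshape(1, -1) call: scikit-learn estimators expect a 2-D array of shape (n_samples, n_features), so the flat feature list from the request body must become a single-row matrix before it is passed to predict(). A quick sketch:

```python
import numpy as np

# A flat list of 4 iris features, as sent in the request body...
features = [5.1, 3.5, 1.4, 0.2]

# ...becomes a single-row 2-D array that model.predict() accepts.
X = np.array(features).reshape(1, -1)
print(X.shape)  # (1, 4)
```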

Step 4: Run the API Server

Launch your API with Uvicorn:

uvicorn main:app --reload

Your API is now live at http://localhost:8000. FastAPI automatically generates interactive docs at /docs.

Step 5: Test Your API

Visit http://localhost:8000/docs to test your endpoint interactively, or use curl or Python’s requests library:

import requests
url = 'http://localhost:8000/predict'
data = {"features": [5.1, 3.5, 1.4, 0.2]}
response = requests.post(url, json=data)
print(response.json())

Step 6: Secure and Scale Your Deployment

  • Security: Protect your endpoint with authentication if deploying publicly (e.g., OAuth2, API keys).
  • Scaling: Use containerization (Docker), and deploy on scalable platforms (AWS, GCP, Azure).
  • Versioning: Manage multiple model versions with separate endpoints (e.g., /v1/predict, /v2/predict).
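As a starting point for the containerization step, a minimal Dockerfile for this app might look like the following sketch (the requirements.txt file is an assumption; it would list fastapi, uvicorn, scikit-learn, and joblib as installed in Step 1):

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so Docker can cache this layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the app and the serialized model into the image
COPY main.py model.joblib ./

EXPOSE 8000

# Bind to 0.0.0.0 so the API is reachable from outside the container
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```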

Benefits of This Approach

  • Real-time Predictions: Your model is instantly available for integration with web or mobile apps.
  • Flexibility: Can extend with more endpoints (e.g., model explanation, batch predictions).
  • Transparency: Auto-generated docs help users understand inputs and outputs.
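For instance, extending the API to batch predictions mostly means accepting a list of feature rows: scikit-learn models already predict on 2-D input, so the core logic barely changes. A standalone sketch of that core logic, reusing the iris model from Step 2 in place of the one loaded in main.py:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Stand-in for the model loaded at startup in main.py
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

# A batch endpoint would accept a list of feature rows...
batch = [[5.1, 3.5, 1.4, 0.2],
         [6.7, 3.0, 5.2, 2.3]]

# ...and predict() handles the whole 2-D array in one call,
# returning one class label per input row.
predictions = model.predict(np.array(batch))
print([int(p) for p in predictions])
```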

Conclusion

Deploying your ML model with FastAPI is a practical, modern approach to making your AI solution accessible in production environments. By following this tutorial, you’ve created a REST API that can integrate seamlessly with any service or application, making your model’s predictions available at scale. As you refine your deployment workflow, consider adding features such as logging, monitoring, and authentication to ensure reliability and security.