Building Machine Learning Pipelines with the FTI Architecture: A Practical Step-by-Step Guide

In the world of data engineering and machine learning, designing scalable and maintainable pipelines is critical. While the model itself often garners the most attention, it is the infrastructure surrounding it that enables reliable, efficient, and reusable machine learning systems. The FTI (Feature, Training, Inference) architecture provides a structured approach to building modular ML pipelines by separating them into three independent components: Feature, Training, and Inference pipelines.
In this guide, we will dive into the FTI architecture, its benefits, and how to implement it step by step. By the end, you’ll have a practical understanding of how to create scalable pipelines for feature engineering, model training, and real-time inference.
1. What is the FTI Architecture?
The FTI (Feature, Training, Inference) architecture is a powerful framework for designing scalable, modular, and maintainable machine learning systems. It divides the pipeline into three distinct stages: Feature Pipeline, Training Pipeline, and Inference Pipeline, each with its unique responsibilities. This separation ensures that machine learning workflows are not only easier to build and manage but also more efficient and consistent.
The Feature Pipeline is the foundation of the FTI architecture. It is responsible for transforming raw data into engineered features that are ready for both training and inference. This involves:
- Data Extraction: Retrieving raw data from various sources, such as relational databases, APIs, or data lakes. In larger systems, this step is often split out into a separate ingestion pipeline.
- Feature Engineering: Performing transformations like aggregations, scaling, encoding, and computing derived metrics.
- Feature Storage: Saving the processed features in a feature store (e.g., Feast, Hopsworks) for reuse during training and inference.
For example, in a customer churn prediction system, the Feature Pipeline might compute metrics such as total_spending, avg_spending, and transaction_count from raw transaction data. By storing these features in a feature store, you ensure consistency between the data used for training the model and the data used during inference.
The Training Pipeline handles the machine learning model’s lifecycle, from loading historical features for training to saving the trained model. This stage typically includes:
- Feature Retrieval: Fetching historical features from the feature store for training datasets.
- Model Training: Training a machine learning model using frameworks like TensorFlow, PyTorch, or Scikit-learn.
- Evaluation and Versioning: Evaluating the model’s performance on metrics such as accuracy, precision, and recall, and saving it in a model registry like MLflow.
For instance, a fraud detection system could use features from transaction data, train a Random Forest classifier, and register the resulting model along with metadata like hyperparameters and evaluation metrics. This ensures that the model is tracked and versioned for reproducibility.
The Inference Pipeline takes over once the model is trained and serves predictions in real-time or batch settings. The pipeline:
- Retrieves the latest features for incoming data points from the feature store.
- Uses the trained model to make predictions.
- Serves the results through APIs or stores them in a data warehouse for downstream consumption.
In real-time inference, low-latency APIs are critical. Frameworks like FastAPI or Flask can be used to deploy models as REST endpoints, supported by Kubernetes for horizontal scaling. For batch inference, tools like Spark or Airflow can process large datasets in one go and store the predictions for reporting or analytics. For example, a recommendation system might fetch user behavior data from the feature store in real-time and predict product recommendations using a neural network.
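As a minimal illustration of the batch path (the connection string, table names, and model file below are assumptions for the sketch), a scheduled job might score every customer in one pass and write the results back to the warehouse:
import joblib
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical warehouse connection and feature table
engine = create_engine("postgresql://user:password@localhost:5432/analytics")
model = joblib.load("churn_model.pkl")  # assumed model artifact

# Load the latest features for all customers and score them in one pass
features = pd.read_sql(
    "SELECT customer_id, total_spending, avg_spending, transaction_count "
    "FROM customer_features",
    con=engine,
)
features["prediction"] = model.predict(
    features[["total_spending", "avg_spending", "transaction_count"]]
)

# Write predictions back for reporting and analytics
features[["customer_id", "prediction"]].to_sql(
    "churn_predictions", con=engine, if_exists="replace", index=False
)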
2. Benefits of FTI Architecture
The FTI architecture offers several key benefits that make it a widely adopted approach for building machine learning systems:
- Modularity: Each stage — Feature, Training, and Inference — is independent. This allows you to develop, test, and optimize them separately. For example, you can scale the Feature Pipeline for handling massive data transformations without affecting the Training or Inference pipelines.
- Reusability: Features stored in a feature store can be reused across different machine learning models and projects, reducing redundancy. Similarly, trained models saved in a registry can be deployed in multiple environments with minimal effort.
- Consistency: By ensuring that the same feature definitions and transformations are used during training and inference, the architecture minimizes training-serving skew, a common source of silent prediction errors.
- Scalability: Each pipeline can scale independently. The Feature Pipeline can leverage distributed systems like Spark or Dask for processing large datasets, while the Inference Pipeline can use Kubernetes to handle high prediction throughput.
- Reproducibility: The architecture encourages logging and versioning at every stage — datasets, feature definitions, models, and configurations — making it easy to audit and reproduce results.
3. When Should You Use FTI Architecture?
The FTI architecture is particularly valuable for production-grade machine learning systems where scalability and reliability are critical. It is ideal for:
- Collaborative Teams: Data engineers, machine learning engineers, and DevOps teams can work on separate stages without interfering with each other’s workflows.
- Iterative Development: FTI allows you to iterate on individual components — like adding new features to the Feature Pipeline or experimenting with different models in the Training Pipeline — without affecting the entire system.
- Real-Time Applications: Scenarios like fraud detection, recommendation systems, or predictive maintenance benefit greatly from the modular and scalable nature of FTI.
4. Implementing the FTI Architecture: A Simple Guide
The FTI architecture defines clear boundaries for the Feature, Training, and Inference pipelines, allowing each component to be independently developed, tested, and scaled. Below is a detailed guide to implementing the FTI architecture in a real-world machine learning system.
Step 1: Building the Feature Pipeline
The Feature Pipeline is the backbone of the FTI architecture. It transforms raw data into reusable, production-ready features, stored in a feature store for both training and inference.
1.1. Data Extraction
Extract raw data from your source systems (e.g., databases, APIs, or data lakes) using ETL tools or frameworks.
- Example: Extract e-commerce customer transactions from a MySQL database.
import pandas as pd
from sqlalchemy import create_engine
# Connect to the database
engine = create_engine("mysql+pymysql://user:password@localhost:3306/ecommerce")
query = "SELECT customer_id, transaction_date, amount FROM transactions"
raw_data = pd.read_sql(query, con=engine)
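In production you rarely want to re-read the whole table on every run. A common refinement, sketched below with an assumed last_run value that would normally come from pipeline state (for example, an Airflow variable), is to extract only rows added since the previous execution:
from sqlalchemy import text

last_run = "2024-01-01 00:00:00"  # assumed placeholder for the previous run's timestamp
incremental_query = text(
    "SELECT customer_id, transaction_date, amount "
    "FROM transactions WHERE transaction_date > :last_run"
)
new_rows = pd.read_sql(incremental_query, con=engine, params={"last_run": last_run})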
1.2. Feature Engineering
Transform the raw data into meaningful features. Common transformations include aggregations, one-hot encoding, normalization, and derived metrics.
- Example: Aggregate transaction data to calculate total and average spending per customer.
features = raw_data.groupby("customer_id").agg(
    total_spending=("amount", "sum"),
    avg_spending=("amount", "mean"),
    transaction_count=("amount", "count"),
).reset_index()
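Derived metrics work the same way. As a small illustration (assuming transaction_date parses as a datetime), a recency feature can be merged in alongside the aggregates; if you keep it, remember to declare it in the feature view schema in the next step:
# Derived metric: days since each customer's most recent transaction
raw_data["transaction_date"] = pd.to_datetime(raw_data["transaction_date"])
last_seen = raw_data.groupby("customer_id")["transaction_date"].max().reset_index()
last_seen["days_since_last_transaction"] = (
    pd.Timestamp.now() - last_seen["transaction_date"]
).dt.days
features = features.merge(
    last_seen[["customer_id", "days_since_last_transaction"]], on="customer_id"
)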
1.3. Storing Features in a Feature Store
Feature stores like Feast or Hopsworks allow you to share features between training and inference pipelines, ensuring consistency.
- Example: Using Feast to Store Features:
from datetime import timedelta
from feast import Entity, FeatureStore, FeatureView, Field, FileSource
from feast.types import Float32, Int64
# Define the entity (the key used to look up features)
customer_entity = Entity(name="customer_id", join_keys=["customer_id"], description="Customer ID")
# Point Feast at the offline feature data (path and timestamp column are placeholders)
customer_source = FileSource(
    path="data/customer_features.parquet",  # Replace with your source
    timestamp_field="event_timestamp",
)
# Define the feature view
customer_features = FeatureView(
    name="customer_features",
    entities=[customer_entity],
    ttl=timedelta(days=30),  # features expire after 30 days (720h)
    schema=[
        Field(name="total_spending", dtype=Float32),
        Field(name="avg_spending", dtype=Float32),
        Field(name="transaction_count", dtype=Int64),
    ],
    source=customer_source,
)
# Apply the feature store configuration
store = FeatureStore(repo_path="feature_repo")
store.apply([customer_entity, customer_features])
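One easy-to-miss step: before the Inference Pipeline can read these features online (Step 3.1), they must be loaded from the offline store into the online store. With Feast this is a materialization call:
from datetime import datetime
# Push feature values computed so far into the online store
store.materialize_incremental(end_date=datetime.utcnow())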
Step 2: Designing the Training Pipeline
The Training Pipeline consumes features from the Feature Pipeline, trains machine learning models, and registers the trained models for downstream usage.
2.1. Loading Features for Training
Fetch historical features and labels for training from the feature store.
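Feast builds training sets with a point-in-time join, so get_historical_features expects an entity DataFrame containing the entity keys, an event timestamp, and (for supervised learning) the labels. A minimal sketch, with assumed IDs, timestamps, and labels:
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path="feature_repo")
# One row per training example: entity key, point-in-time timestamp, and label
entity_df = pd.DataFrame(
    {
        "customer_id": [101, 102, 103],
        "event_timestamp": pd.to_datetime(["2024-01-01"] * 3),
        "churn_label": [0, 1, 0],
    }
)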
training_data = store.get_historical_features(
    entity_df=entity_df,  # entity keys, event timestamps, and labels (see sketch above)
    features=[
        "customer_features:total_spending",
        "customer_features:avg_spending",
        "customer_features:transaction_count",
    ],
).to_df()
2.2. Model Training
Train your machine learning model using frameworks like Scikit-learn, TensorFlow, or PyTorch.
- Example: Train a classification model to predict customer churn.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
X = training_data[["total_spending", "avg_spending", "transaction_count"]]
y = training_data["churn_label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
2.3. Model Evaluation
Evaluate the trained model and log metrics to ensure reproducibility.
from sklearn.metrics import classification_report
predictions = model.predict(X_test)
print(classification_report(y_test, predictions))
2.4. Registering the Model
Save the model to a model registry like MLflow for deployment and tracking.
import mlflow
import mlflow.sklearn
from sklearn.metrics import accuracy_score
mlflow.set_experiment("customer-churn")
with mlflow.start_run():
    mlflow.sklearn.log_model(model, artifact_path="churn_model")
    mlflow.log_metric("accuracy", accuracy_score(y_test, predictions))
Step 3: Setting Up the Inference Pipeline
The Inference Pipeline combines real-time feature retrieval and model predictions to serve predictions efficiently in production. This pipeline ensures the same feature engineering logic used during training is applied during inference, maintaining consistency and reducing errors.
3.1. Real-Time Feature Retrieval
When serving predictions in production, features must be retrieved on-demand from the feature store. This ensures that the same feature transformations applied during training are used for inference.
- How It Works:
- A client sends a prediction request (e.g., a customer ID).
- The service retrieves the latest features for that customer ID from the feature store.
- The retrieved features are passed to the deployed model for prediction.
- Example: Retrieve Features from Feast in Real Time:
# Import the feature store
from feast import FeatureStore
# Connect to the feature store
store = FeatureStore(repo_path="feature_repo")
# Retrieve features for a specific customer ID
entity_rows = [{"customer_id": 12345}]
features = store.get_online_features(
    entity_rows=entity_rows,
    features=[
        "customer_features:total_spending",
        "customer_features:avg_spending",
        "customer_features:transaction_count",
    ],
).to_dict()
print(features)
This returns a dictionary of feature values for the given customer_id, which will be used as input for the model.
3.2. Deploying the Model
The trained model is deployed as a REST API that:
- Accepts prediction requests (e.g., customer IDs).
- Retrieves real-time features from the feature store.
- Feeds the features into the model to produce predictions.
We use FastAPI for model deployment.
- Example: Deploy a Model with Real-Time Feature Retrieval:
from fastapi import FastAPI, HTTPException
from feast import FeatureStore
import joblib

# Initialize FastAPI app and Feature Store
app = FastAPI()
store = FeatureStore(repo_path="feature_repo")
model = joblib.load("churn_model.pkl")  # Load the trained model

@app.post("/predict")
def predict(customer_id: int):
    # Retrieve features from the feature store
    entity_rows = [{"customer_id": customer_id}]
    features = store.get_online_features(
        entity_rows=entity_rows,
        features=[
            "customer_features:total_spending",
            "customer_features:avg_spending",
            "customer_features:transaction_count",
        ],
    ).to_dict()
    # Prepare the feature vector for prediction
    try:
        feature_vector = [
            features["total_spending"][0],
            features["avg_spending"][0],
            features["transaction_count"][0],
        ]
    except KeyError as e:
        raise HTTPException(status_code=400, detail=f"Missing feature: {str(e)}")
    # Make a prediction using the model
    prediction = model.predict([feature_vector])
    return {"customer_id": customer_id, "prediction": prediction.tolist()}
What Happens Here:
- Client Request: The client sends a POST request with a customer_id.
- Feature Retrieval: The API retrieves real-time features for the customer from the feature store.
- Prediction: The retrieved features are passed to the model for prediction.
- Response: The API returns the prediction to the client.
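To try the endpoint locally (assuming the code above is saved as main.py and served with uvicorn main:app --port 8000), you can call it from Python; since customer_id is declared as a scalar parameter, FastAPI reads it from the query string:
import requests
# Send a prediction request for one customer
response = requests.post("http://localhost:8000/predict", params={"customer_id": 12345})
print(response.json())  # e.g., {"customer_id": 12345, "prediction": [0]}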
3.3. Monitoring and Logging
Once the inference pipeline is live, monitoring is critical to ensure consistent performance and to detect anomalies like data drift or latency issues.
- Metrics to Monitor (a minimal instrumentation sketch follows this list):
- Latency: Measure the time taken for feature retrieval and model inference.
- Data Drift: Compare real-time features against training data statistics to detect shifts.
- Prediction Metrics: Track the distribution of predictions over time.
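There are many ways to instrument this; as one minimal sketch, the Prometheus Python client (prometheus_client) can track request latency and the distribution of predicted classes. The metric names and the helper function here are illustrative assumptions, not part of any framework:
import time
from prometheus_client import Counter, Histogram, start_http_server

# Expose metrics on :9090 for Prometheus to scrape (port is an assumption)
start_http_server(9090)

REQUEST_LATENCY = Histogram(
    "predict_latency_seconds", "End-to-end latency of prediction requests"
)
PREDICTION_COUNT = Counter(
    "predictions_total", "Predictions served, by predicted class", ["predicted_class"]
)

def timed_predict(model, feature_vector):
    # Time the model call and count predictions per class so shifts
    # in the prediction distribution show up on a dashboard
    start = time.perf_counter()
    prediction = model.predict([feature_vector])
    REQUEST_LATENCY.observe(time.perf_counter() - start)
    PREDICTION_COUNT.labels(predicted_class=str(prediction[0])).inc()
    return prediction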
End-to-End Workflow
Here’s a summary of how the FTI architecture ties everything together:
- Feature Pipeline: Extracts and engineers features stored in a feature store.
- Training Pipeline: Consumes historical features for training and registers the trained model.
- Inference Pipeline: Retrieves real-time features and serves predictions using the trained model.
Conclusion
The FTI architecture provides a robust and scalable framework for building machine learning pipelines. By separating Feature, Training, and Inference pipelines, you gain modularity, reproducibility, and flexibility, making it easier to manage complex ML workflows.
As data and ML systems become more complex, adopting architectures like FTI will ensure that your systems remain reliable, scalable, and easy to maintain.
Start implementing the FTI architecture today and unlock the true potential of your machine learning projects!