
ML Feature Store

Problem

Design a Feature Store — a centralised repository for storing, sharing, and serving ML features for training and inference.


Why It Matters for You

Direct relevance: Crest Data ML thresholding engine — KPI features needed consistent computation across training and real-time inference. A feature store solves exactly this.


Functional Requirements

  • Store + version features computed from raw data
  • Serve features for real-time inference (low latency)
  • Batch features for model training
  • Feature discovery — catalog with metadata
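The four requirements above can be sketched as one minimal in-memory interface (a hypothetical API for illustration — real systems like Feast expose similar calls, but all names here are assumptions):

```python
import time

class FeatureStore:
    """Minimal sketch: register, write, and read versioned features."""

    def __init__(self):
        self.registry = {}  # feature name -> metadata (discovery / catalog)
        self.online = {}    # (feature, entity_id) -> latest value (inference)
        self.offline = []   # append-only history of rows (training)

    def register(self, name, version, description):
        # Feature discovery: catalog entry with version + metadata
        self.registry[name] = {"version": version, "description": description}

    def write(self, name, entity_id, value):
        # One write feeds both stores, keeping them consistent
        ts = time.time()
        self.online[(name, entity_id)] = value             # latest value, point read
        self.offline.append((name, entity_id, value, ts))  # full history

    def get_online(self, name, entity_id):
        # Real-time inference path: single low-latency key lookup
        return self.online[(name, entity_id)]

    def get_offline(self, name):
        # Training path: bulk read of all historical values for a feature
        return [(e, v, ts) for (n, e, v, ts) in self.offline if n == name]
```

Usage: `register("kpi_avg_7d", "v1", ...)`, then `write(...)` from the pipeline; inference calls `get_online`, training calls `get_offline`.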

Non-Functional Requirements

  • Low latency online serving (< 10ms p99)
  • High throughput batch reads
  • Consistency between training and serving (training-serving skew problem)

High-Level Design

Data Sources → Feature Pipeline (Spark/Flink)
    ├→ Offline Store (S3/Data Warehouse) → Model Training
    └→ Online Store (Redis/DynamoDB) → Real-time Inference
              ↑
    Feature Registry (metadata, versioning)

Key Concepts

  • Online Store — low-latency KV store (Redis, DynamoDB) for inference
  • Offline Store — historical data lake (S3 + Parquet) for training
  • Feature Pipeline — batch (Spark) or streaming (Flink/Kafka) computation
  • Feature Registry — catalog of all features, versions, ownership
  • Training-serving skew — the biggest problem: the same feature must be computed identically in the batch (training) and streaming (serving) paths
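The standard defense against the skew problem above is to define each transformation exactly once and call that same function from both pipelines. A minimal sketch (all function names are illustrative, not from any specific library):

```python
def kpi_zscore(value, mean, std):
    """Single source of truth for the feature computation.
    Both pipelines call this, so batch and streaming cannot diverge."""
    return (value - mean) / std if std else 0.0

def batch_pipeline(values, mean, std):
    # Offline path: e.g. a Spark job mapping over historical data
    return [kpi_zscore(v, mean, std) for v in values]

def streaming_pipeline(event_value, mean, std):
    # Online path: e.g. a Flink operator applied per event
    return kpi_zscore(event_value, mean, std)

hist = batch_pipeline([1.0, 2.0, 3.0], mean=2.0, std=1.0)
live = streaming_pipeline(3.0, mean=2.0, std=1.0)
assert live == hist[2]  # identical computation → no skew
```

If the batch job and the stream processor each reimplement the formula independently, even a small difference (rounding, null handling, window boundaries) silently degrades the model in production.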

Interview Angle (Your War Story)

At Crest, the ML thresholding engine needed KPI features at training time and at real-time inference time. Without a feature store, we risked skew. Use this to explain WHY feature stores exist.


Notes

<!-- Add as you study Gaurav Sen W7 — Sat May 2 -->