
Vertex AI Feature Store: The Single Source of Truth
Stop duplicating feature engineering code. Learn how Feature Store unifies Online (Serving) and Offline (Training) feature access.
The "Two Pipelines" Problem
Without a Feature Store, you usually build two pipelines:
- Training Pipeline: A massive SQL query that joins tables to calculate Avg_Spend_30d.
- Serving Pipeline: A fast Java/Go function that queries the database to calculate Avg_Spend_30d for the user right now.
Risk: If the SQL logic and the Java logic differ even slightly, the training data no longer matches what the model sees in production (training-serving skew), and model quality silently degrades.
Vertex AI Feature Store provides a centralized repository so you define the logic once.
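To make the skew concrete, here is a minimal, self-contained sketch (plain Python standing in for both the SQL and the Java/Go implementations, with hypothetical transaction data) where the two pipelines disagree on a single boundary condition:

```python
from datetime import date, timedelta

# Hypothetical transactions for one user: (date, amount).
transactions = [(date(2024, 1, 1), 100.0), (date(2024, 1, 31), 200.0)]

def avg_spend_30d_training(txns, as_of):
    """'SQL' version: window inclusive on both ends (31 calendar days)."""
    window = [amt for d, amt in txns if as_of - timedelta(days=30) <= d <= as_of]
    return sum(window) / len(window) if window else 0.0

def avg_spend_30d_serving(txns, as_of):
    """'Java' version: window excludes the oldest day (exactly 30 days)."""
    window = [amt for d, amt in txns if as_of - timedelta(days=30) < d <= as_of]
    return sum(window) / len(window) if window else 0.0

as_of = date(2024, 1, 31)
print(avg_spend_30d_training(transactions, as_of))  # 150.0 (both transactions)
print(avg_spend_30d_serving(transactions, as_of))   # 200.0 (Jan 1 excluded)
```

Both functions look correct in isolation; only the window boundary differs, so the model trains on 150 but serves on 200. Defining the feature once in a store eliminates this class of bug.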
1. Architecture
- EntityType: The "Noun" (e.g., User, Product, Store).
- Feature: The "Adjective" (e.g., age, average_rating, zip_code).
- Ingestion: You stream or batch-write values into the store.
The Two Interfaces
- Offline Store (BigQuery-backed):
  - Used for: Training.
  - Query: "Give me the values of age and spend for these 100k users."
  - Feature: Point-in-Time Lookup (Time Travel). (See below.)
- Online Store (Bigtable/Redis-backed):
  - Used for: Serving.
  - Query: "Give me the latest values for User_123."
  - Latency: < 10 ms.
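As a mental model (not the real service), the two interfaces can be sketched as one append-only log read two ways: the offline store returns full history, while the online store keeps only the latest value per entity. The class and identifiers below are illustrative:

```python
from collections import defaultdict

class ToyFeatureStore:
    """In-memory sketch: one write path, two read paths."""
    def __init__(self):
        self.offline = defaultdict(list)  # entity_id -> [(timestamp, value), ...]
        self.online = {}                  # entity_id -> latest value

    def ingest(self, entity_id, timestamp, value):
        self.offline[entity_id].append((timestamp, value))
        self.online[entity_id] = value    # assumes in-order ingestion

    def read_online(self, entity_id):
        return self.online[entity_id]     # fast key lookup (Bigtable-style)

    def read_offline(self, entity_id):
        return self.offline[entity_id]    # full history (BigQuery-style scan)

store = ToyFeatureStore()
store.ingest("User_123", "2024-01-01", 500)
store.ingest("User_123", "2024-02-01", 1000)
print(store.read_online("User_123"))   # 1000
print(store.read_offline("User_123"))  # [('2024-01-01', 500), ('2024-02-01', 1000)]
```

The single `ingest` path is the point: one write, two consistent views.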
2. Point-in-Time Correctness (Time Travel)
This is the killer feature. Imagine you are training a fraud model on a fraudulent transaction that occurred on Jan 1st.
- User's spend on Jan 1st was $500.
- User's spend today (Feb 1st) is $1000.
If you just query "Current Spend" for your training set, you leak future information ($1000). The model learns the backwards rule that high spend causes past fraud. Feature Store instead lets you ask: "Give me the feature values as they existed at the timestamp of the event."
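A point-in-time lookup can be sketched in a few lines: given a feature's timestamped history, return the latest value written at or before the event's timestamp (the data and identifiers here are illustrative):

```python
# Timestamped history of avg_spend for one user
# (ISO date strings sort lexicographically, so string comparison works).
spend_history = [("2024-01-01", 500), ("2024-02-01", 1000)]

def value_as_of(history, event_ts):
    """Return the latest feature value written at or before event_ts."""
    eligible = [v for ts, v in history if ts <= event_ts]
    if not eligible:
        raise LookupError("no feature value exists before this event")
    return eligible[-1]  # history is assumed sorted by timestamp

# Training row for the Jan 1st fraud event: sees $500, not today's $1000.
print(value_as_of(spend_history, "2024-01-01"))  # 500
# A naive "current value" query would leak the future:
print(spend_history[-1][1])                      # 1000
```

Each training row therefore joins features as of its own event timestamp, not as of query time.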
3. Code Example: Fetching Features
```python
from google.cloud import aiplatform

# NOTE: illustrative sketch. Resource names ("my_featurestore", "users") are
# placeholders, and method signatures can differ between SDK versions.
aiplatform.init(project="my-project", location="us-central1")

fs = aiplatform.Featurestore(featurestore_name="my_featurestore")
users = fs.get_entity_type("users")

# 1. SERVING (Online)
# Get latest values for User 123
features = users.read(entity_ids=["123"], feature_ids=["age", "avg_spend"])
# Returns a one-row DataFrame, e.g. age=25, avg_spend=500

# 2. TRAINING (Offline)
# Get values for a list of users at the timestamps listed in the CSV.
# (Some SDK versions take read_instances_df, a pandas DataFrame, instead of a URI.)
training_df = fs.batch_serve_to_df(
    serving_feature_ids={"users": ["age", "avg_spend"]},
    read_instances_uri="gs://my-bucket/training_ids_and_timestamps.csv",
)
```
4. Summary
- Feature Store prevents skew by unifying logic.
- Offline = High Throughput, Time Travel (Training).
- Online = Low Latency (Serving).
- Point-in-Time prevents data leakage.
In the next module, we enter the Lab: Model Prototyping.
Knowledge Check
Why can't you just use BigQuery for both training and online serving?