Cloud Infrastructure,  Data Engineering,  GCP,  Machine Learning,  Python,  Software Engineering

The Tale of Two Routes: A Love Story Between BigQuery and Vertex AI for Machine Learning predictions

This blog post is the first, and probably the last, where I wanted to see what the GitHub Copilot VSCode extension could do with writing a blog post given one file as context. The post is written around the following Python script: https://github.com/mortie23/ml/blob/master/model/nfl/touchdown/nfltouchdownpredict/predict.py. I gave the file to Claude 3.5 and asked it to write a blog post explaining it, told as a story with some humor. I had to create the images, add some extra context here and there, and fix some mistakes (just like you have to with the code it generates). Here is the result:

Have you ever tried to make two different client applications happy with the same ML prediction service? It’s like trying to serve breakfast to both an early bird and a night owl - they want the same food, but at different times and in different ways! This blog post outlines how I used FastAPI and a Vertex AI trained model so that both Cloud Run and Vertex AI online endpoints can host the same model, and so that BigQuery remote functions can call the service as well.

gcp ml predict pipeline graph

The Characters in Our Story

First, let’s meet the cast:

  • FastAPI: The lightning-fast matchmaker
  • Pydantic: The strict but fair bouncer checking IDs at the door
  • Vertex AI: The enterprise consultant who loves formal meetings at /predict
  • BigQuery: The data engineer who prefers verbose catch-ups at the root path /

Diagram showing the two different routes and their callers

The Plot Thickens: Two Routes, One Model

Pydantic is our bouncer’s checklist, making sure everyone’s properly dressed for the party. It’s like a strict dress code, but instead of “no sneakers,” it’s “no string where an integer should be”!

For demonstrating our use case, we will be creating a prediction service that predicts the number of touchdowns a team scored based on the:

  • Number of first downs
  • Total yards gained
  • Number of interceptions thrown
  • Number of punts

The Python class looks like the following:

from pydantic import BaseModel, Field

class ModelInputItem(BaseModel):
    game_team_id: int = Field(..., example=411)
    total_first_downs: int = Field(..., example=26)
    total_yards: int = Field(..., example=419)
    interceptions: int = Field(..., example=0)
    punts: int = Field(..., example=4)

This bouncer is also polite enough to give a really detailed description of why you’re not allowed in if you don’t meet the checklist.

This example shows the response we get when we arrive at the endpoint with a POST request that contains only a bad field name: we are told we missed the required fields.

Diagram showing Pydantic 422 detailed error message
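
Under the hood, that 422 response itemizes every failed check, something like this (the exact wording varies between Pydantic versions):

{
  "detail": [
    {
      "loc": ["body", "game_team_id"],
      "msg": "field required",
      "type": "value_error.missing"
    }
  ]
}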

The Tale of Two Requests

Vertex AI: The Formal One

Vertex AI shows up in a three-piece suit, very particular about being hosted at /predict, wrapping everything in an “instances” array. Very formal, very proper.

{
  "instances": [
    {
      "game_team_id": 411,
      "total_first_downs": 26,
      "total_yards": 419,
      "interceptions": 0,
      "punts": 4
    }
  ]
}
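
On our side, a thin Pydantic wrapper is enough to accept this envelope. Here is a minimal sketch of what the VertexPredictionRequest model (used by the route later in this post) might look like, reusing ModelInputItem from earlier:

from typing import List
from pydantic import BaseModel

class VertexPredictionRequest(BaseModel):
    # Vertex AI wraps every payload in an "instances" array
    instances: List[ModelInputItem]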

We can host the Vertex AI trained model using the Deploy and test option directly from the model registry.

Diagram showing Vertex AI request flow

BigQuery: The Chatty One

BigQuery comes directly to the root route / (and this cannot be changed). It also sends its life story (and this cannot be changed either) - who it is, where it’s from, and oh, by the way, the actual data is nested three levels deep because… why not? 🤷‍♂️

{
  "requestId": "unique-request-id",
  "caller": "//bigquery.googleapis.com/projects/...",
  "sessionUser": "user@example.com",
  "calls": [
    [
      [
        {
          "game_team_id": 411,
          "total_first_downs": 26,
          "total_yards": 419,
          "interceptions": 0,
          "punts": 4
        }
      ]
    ]
  ]
}
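
Modelling this chatty envelope in Pydantic is still straightforward. A sketch of what the BigQueryRemoteRequest model might look like (mirroring the payload above; the real model in predict.py may declare more of BigQuery’s metadata fields):

from typing import List
from pydantic import BaseModel

class BigQueryRemoteRequest(BaseModel):
    # BigQuery's life story, faithfully transcribed
    requestId: str
    caller: str
    sessionUser: str
    # One entry per row, with the actual arguments nested three levels deep
    calls: List[List[List[ModelInputItem]]]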

Screenshot of Swagger UI showing the ModelInputItem schema

The Magic of FastAPI Routes

Once I started attempting to get BigQuery remote functions to call the /predict route that Vertex AI produced, I realised this wouldn’t work. So we needed another route. It is like having two different entrances to the same restaurant - the formal dining room for Vertex AI and the casual café for BigQuery. Same kitchen (our ML model), different serving styles!

@app.post("/")
async def predict_bigquery(input_data: BigQueryRemoteRequest):
    # BigQuery's casual coffee chat
    # ...existing code...

@app.post(os.environ["AIP_PREDICT_ROUTE"])
async def predict_vertex(input_data: VertexPredictionRequest):
    # Vertex AI's formal business meeting
    # ...existing code...
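
The elided bodies do the same job in different costumes: unwrap the caller’s envelope, hand the features to the model, and wrap the answer the way each caller insists on - BigQuery remote functions expect a “replies” array back, while Vertex AI expects a “predictions” array. Here is a hedged sketch of what they might look like (dropping game_team_id before prediction is my assumption here, since it is an identifier rather than a feature):

import pandas as pd

@app.post("/")
async def predict_bigquery(input_data: BigQueryRemoteRequest):
    # Unwrap the triple-nested payload: one inner item per call
    items = [call[0][0] for call in input_data.calls]
    features = pd.DataFrame([item.dict() for item in items]).drop(columns=["game_team_id"])
    # BigQuery remote functions expect a "replies" array, one reply per call
    return {"replies": [float(p) for p in _model.predict(features)]}

@app.post(os.environ["AIP_PREDICT_ROUTE"])
async def predict_vertex(input_data: VertexPredictionRequest):
    features = pd.DataFrame([item.dict() for item in input_data.instances]).drop(columns=["game_team_id"])
    # Vertex AI expects a "predictions" array in the response
    return {"predictions": [float(p) for p in _model.predict(features)]}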

The Setup: Environment Magic and Config Sorcery

Before our restaurant even opens its doors, we need to set up the kitchen! Let’s peek behind the curtain:

The .env Chef’s Secret Recipe

It’s like having a secret recipe card - keeping our environment spices separate from the main cookbook! Vertex AI is very opinionated about how it uses environment variables: what they need to be called, and, in fact, it sometimes uses different names for the same variable in different parts of the life cycle because… why not? 🤷‍♂️

env=dev
project_id=prj-xyz-dev-nfl-0
# For training
AIP_MODEL_DIR=gs://bkt-xyz-dev-nfl-vertex-0/aiplatform-custom-training-yyyy-mm-dd-hh:mm:ss.ms/model
# For prediction
AIP_STORAGE_URI=gs://bkt-xyz-dev-nfl-vertex-0/aiplatform-custom-training-yyyy-mm-dd-hh:mm:ss.ms/model
AIP_HEALTH_ROUTE=/ping
AIP_PREDICT_ROUTE=/predict
etc=

Vertex AI sets these for us when it hosts the model. When deploying the service to Cloud Run for our BigQuery remote function, we need to be clever and set them ourselves.
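
For example, a Cloud Run deployment might smuggle them in like this (the service name and image below are made up for illustration; only the --set-env-vars pattern matters):

gcloud run deploy nfl-touchdown-predict \
  --image gcr.io/prj-xyz-dev-nfl-0/nfltouchdownpredict \
  --set-env-vars "AIP_STORAGE_URI=gs://bkt-xyz-dev-nfl-vertex-0/aiplatform-custom-training-yyyy-mm-dd-hh:mm:ss.ms/model,AIP_PREDICT_ROUTE=/predict,AIP_HEALTH_ROUTE=/ping"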

Hydra: The Configuration Sous Chef

Our predict.yaml config file keeps the environment-templated settings:

model_bucket: bkt-xyz-<env>-nfl-vertex-0
model_path: nfl-touchdown/model.joblib
etc:

Hydra is our sous chef who knows how to adapt the recipe for different kitchens (environments). Watch the magic:

from hydra import compose, initialize

initialize(version_base=None, config_path=".")
cfg = compose(config_name="predict")

# Season the config with environment flavoring
# (every value here is a string, so .replace is safe)
for k, v in cfg.items():
    cfg[k] = v.replace("<env>", env)
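
With env=dev pulled from the .env, model_bucket resolves to bkt-xyz-dev-nfl-vertex-0, and the very same recipe serves the test and prod kitchens without edits.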

Diagram showing Hydra config templating

The Model: From Training Kitchen to Serving Station

Remember that Vertex AI training job that created our model (well, this is for another blog post)? It’s like a master chef preparing components in the main kitchen. Now we need to get that perfectly trained model to our serving station:

with open("model.joblib", "wb") as model_f:
    client.download_blob_to_file(
        f"{os.environ['AIP_STORAGE_URI']}/model.joblib", model_f
    )

_model = joblib.load("model.joblib")

This is like getting our special sauce from the central kitchen (Cloud Storage) where the training job left it. The AIP_STORAGE_URI environment variable holds our delivery instructions, courtesy of Vertex AI!

Think of it as a relay race:

  1. Training Job: “I’ve created this perfect model! Storing it in GCS!”
  2. Cloud Run or Vertex Endpoint Service: “Got the location from AIP_STORAGE_URI!”
  3. Model Loading: “Model loaded and ready to make predictions!”

Here is the whole opening sequence of the service in one place, from environment settings to a storage client:

import os
from pathlib import Path

import joblib
from dotenv import load_dotenv
from fastapi import FastAPI
from google.cloud import storage

# Get our environment settings
env_path = Path(__file__).parent.parent / ".env"
load_dotenv(env_path)
env = os.getenv("env")

# Set up our prediction server
app = FastAPI()
client = storage.Client(os.environ["project_id"])

# Download our pre-trained model
with open("model.joblib", "wb") as model_f:
    client.download_blob_to_file(
        f"{os.environ['AIP_STORAGE_URI']}/model.joblib",
        model_f
    )
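
One more kitchen essential: Vertex AI knocks on a health route before seating any customers, so the service also needs to answer at AIP_HEALTH_ROUTE (which our .env sets to /ping). A minimal sketch:

@app.get(os.environ["AIP_HEALTH_ROUTE"])
async def health():
    # Vertex AI (and Cloud Run health checks) just want a 200 to know the kitchen is open
    return {"status": "healthy"}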

The Happy Ending

Thanks to FastAPI and Pydantic, we’ve created an ML model serving solution that’s like a restaurant that makes both fine diners and casual brunchers feel at home. The model happily predicts NFL touchdowns while BigQuery and Vertex AI both think they’re getting special treatment!

Meme about different request formats but same predictions

In the end, it’s not about the route you take, it’s about the predictions you make.

gcp ml predict bigquery predictions

A good ML service is like a well-run restaurant - the customers don’t need to know about the complex kitchen operations, they just enjoy the consistent, delicious predictions.