Cloud Infrastructure,  MLOps,  Machine Learning,  Python

GCP Serverless prediction service using Cloud Run from a Vertex AI model

Deploying ML Models on Cloud Run: From Placeholder to Production

This guide walks through deploying a machine learning model on Google Cloud Run using a two-phase strategy: a placeholder service first, followed by the actual model deployment.

Architecture Overview

The architecture connects these components:

  1. Vertex AI for training and model registry
  2. Model artifacts stored in Cloud Storage
  3. Cloud Run service running our prediction container
  4. BigQuery for making predictions via remote functions (not discussed in this blog post)

Prerequisites

I have built this on my Windows development machine with WSL2, VS Code and Docker Desktop. These are some of the other requirements for your local development machine:

  • Google Cloud Project with required APIs enabled
  • Terraform >= 1.0
  • Python >= 3.9
  • gcloud CLI configured

Phase 1: Placeholder Service

First, we deploy a minimal Nginx container that serves a simple status endpoint. This helps us establish our infrastructure and permissions before deploying the actual model.

Placeholder Container Structure

Our placeholder service uses Nginx to serve a static JSON response:

📁devops/serve/placeholder/
├── Dockerfile
├── nginx.conf
└── status.json
The nginx.conf listens on port 8080 and returns the static status.json for every request:

server {
    listen 8080;
    server_name localhost;

    location / {
        root /usr/share/nginx/html;
        try_files /status.json =404;
        default_type application/json;
    }
}
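The Dockerfile itself only needs to copy these two files into a stock Nginx image. A minimal sketch (not necessarily the exact file used here) could be:

# Minimal placeholder image: Nginx serving the static status.json on port 8080
FROM nginx:alpine
COPY nginx.conf /etc/nginx/conf.d/default.conf
COPY status.json /usr/share/nginx/html/status.json
EXPOSE 8080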

Initial Deployment

The placeholder is deployed via Terraform:

resource "google_cloud_run_service" "cr_nfl_predict" {
  name     = "nfl-touchdown"
  location = var.region

  template {
    spec {
      containers {
        image = "${var.region}-docker.pkg.dev/prj-xyz-shr-rep-0/rpo-bld-dkr-0/placeholder:latest"
        // ...configuration...
      }
    }
  }
}

ModelPorter: A Custom MLOps CLI

ModelPorter is our custom command-line tool that orchestrates the ML model lifecycle across Google Cloud Platform services. It simplifies the complex workflow of training, deploying, and serving machine learning models.

Why We Built It

Traditional ML deployment often involves juggling multiple cloud services, configuration files, and deployment steps. For our NFL touchdown prediction model, we needed to:

  1. Build and push Docker containers for both training and prediction
  2. Manage training jobs on Vertex AI
  3. Deploy prediction endpoints to Cloud Run
  4. Handle environment-specific configurations
  5. Maintain consistent model versioning

Rather than managing these steps manually or with complex shell scripts, ModelPorter provides a unified interface.

Key Features

# Build containers
modelporter --env=dev nfl.touchdown build --phase=predict

# Train models
modelporter --env=dev nfl.touchdown train --service=vertex

# Deploy prediction service
modelporter --env=dev nfl.touchdown serve --service=cloudrun

The CLI handles:

  • Environment-specific configuration management
  • Container image building and pushing
  • Vertex AI training job orchestration
  • Cloud Run service updates
  • Model artifact management

[Image: ModelPorter architecture]

Under the Hood

ModelPorter uses several key technologies:

  • Click: For CLI interface and command structure
  • Hydra: For configuration management
  • Cloud Build API: For container image building
  • Vertex AI API: For model training orchestration
  • Cloud Run API: For deployment management

Each command in ModelPorter maps to a specific workflow:

  1. Build Command: Packages model code into containers
# Key build arguments passed during the container build
buildargs = {
    "env": env,
    "project_id": cfg.project_id
}
  2. Train Command: Manages Vertex AI training jobs
# Vertex AI training configuration
job = aiplatform.CustomContainerTrainingJob(
    display_name=f"{display_name}-train-job",
    container_uri=container_uri,
    model_serving_container_image_uri=model_serving_container_image_uri
)
  3. Serve Command: Updates Cloud Run services
# Cloud Run service configuration
service = {
    "template": {
        "containers": [{
            "image": model_serving_container_image_uri,
            "env": [
                {"name": "AIP_STORAGE_URI", "value": model.uri},
                {"name": "AIP_PREDICT_ROUTE", "value": "/predict"}
            ]
        }]
    }
}
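To make the command structure concrete, here is a minimal sketch of how a Click entry point like this could be wired up. It is a simplification, not the actual ModelPorter source: the Hydra configuration loading and the Cloud Build, Vertex AI and Cloud Run calls are reduced to placeholders.

import click

@click.group()
@click.option("--env", default="dev", help="Target environment (dev, test, prod).")
@click.pass_context
def cli(ctx, env):
    """ModelPorter-style CLI skeleton (illustrative only)."""
    ctx.obj = {"env": env}  # in the real tool, this is where the Hydra config would be resolved

@cli.group(name="nfl.touchdown")
def nfl_touchdown():
    """Commands for the nfl.touchdown model."""

@nfl_touchdown.command()
@click.option("--phase", type=click.Choice(["train", "predict"]), default="predict")
@click.pass_obj
def build(obj, phase):
    """Build and push the container image (via Cloud Build in the real tool)."""
    click.echo(f"Building {phase} image for env={obj['env']}")

@nfl_touchdown.command()
@click.option("--service", default="vertex")
@click.pass_obj
def train(obj, service):
    """Submit a custom training job (to Vertex AI in the real tool)."""
    click.echo(f"Training on {service} for env={obj['env']}")

@nfl_touchdown.command()
@click.option("--service", default="cloudrun")
@click.pass_obj
def serve(obj, service):
    """Update the prediction service (Cloud Run in the real tool)."""
    click.echo(f"Deploying to {service} for env={obj['env']}")

if __name__ == "__main__":
    cli()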

This abstraction allows our team to focus on model development rather than deployment mechanics, while maintaining consistency across environments.

Phase 2: Model Training and Deployment

Training on Vertex AI

Using our custom modelporter CLI tool:

modelporter --env=dev nfl.touchdown train --service=vertex

This command:

  1. Initiates a training job on Vertex AI
  2. Saves the trained model to a GCS bucket
  3. Registers the model in the Vertex AI Model Registry
The output shows the job progressing on Vertex AI:

No existing model found with display name 'nfl-touchdown'. Creating a new model.
⠴ Training model nfl.touchdown is training on vertex....Training Output directory:
gs://bkt-xyz-dev-nfl-vertex-0/aiplatform-custom-training-2025-02-28-16:29:02.387
⠼ Training model nfl.touchdown is training on vertex....View Training:
https://console.cloud.google.com/ai/platform/locations/australia-southeast1/training/<training-job-id>?project=<project-number>
⠸ Training model nfl.touchdown is training on vertex....CustomContainerTrainingJob projects/<project-number>/locations/australia-southeast1/trainingPipelines/<training-job-id> current state:
PipelineState.PIPELINE_STATE_RUNNING
View backing custom job:
https://console.cloud.google.com/ai/platform/locations/australia-southeast1/training/<id>?project=<project-number>
⠴ Training model nfl.touchdown is training on vertex....CustomContainerTrainingJob projects/<project-number>/locations/australia-southeast1/trainingPipelines/<training-job-id> current state:
PipelineState.PIPELINE_STATE_RUNNING
⠙ Training model nfl.touchdown is training on vertex....CustomContainerTrainingJob run completed. Resource name: projects/<project-number>/locations/australia-southeast1/trainingPipelines/<training-job-id>
⠴ Training model nfl.touchdown is training on vertex....Model available at projects/<project-number>/locations/australia-southeast1/models/<model-id>
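Under the hood, the train command is essentially driving the Vertex AI SDK. The sketch below shows roughly what that looks like; it is an assumption based on the output above (the display-name lookup, region and staging bucket are taken from the logs), not the exact ModelPorter code, and the machine type and container URIs are placeholders.

from google.cloud import aiplatform

# Region and staging bucket as seen in the training output above.
aiplatform.init(
    project="<project-id>",
    location="australia-southeast1",
    staging_bucket="gs://bkt-xyz-dev-nfl-vertex-0",
)

job = aiplatform.CustomContainerTrainingJob(
    display_name="nfl-touchdown-train-job",
    container_uri="<training-container-uri>",
    model_serving_container_image_uri="<serving-container-uri>",
)

# Re-use an existing registered model (if any) as the parent so the new
# artifact becomes a new version; otherwise a new model is created, which
# matches the "Creating a new model" message in the output.
existing = aiplatform.Model.list(filter='display_name="nfl-touchdown"')
parent_model = existing[0].resource_name if existing else None

model = job.run(
    model_display_name="nfl-touchdown",
    parent_model=parent_model,
    replica_count=1,
    machine_type="n1-standard-4",  # placeholder machine type
    sync=True,
)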

[Image: Vertex AI training pipeline]

The training pipeline results in a registered model in Vertex AI.

[Image: Registered model in Vertex AI]

It also produces the model artifact in Cloud Storage.

[Image: Model artifact (joblib) in Cloud Storage]

Deploying the Model Service

Once training is complete, we deploy our FastAPI prediction service:

modelporter --env=dev nfl.touchdown serve

Behind the scenes, modelporter:

  1. Locates the latest trained model in Vertex AI
  2. Updates the Cloud Run service with our prediction container
  3. Configures environment variables to point to the model in GCS
The output confirms the update:

Waiting for operation to complete...
Cloud Run service updated. URL: https://nfl-touchdown-<>.a.run.app

[Image: Cloud Run revisions]

Once this has finished, if we browse to the Cloud Run service in the web console and look at the Revisions tab, we can see a new revision taking 100% of the traffic.

# Key environment variables set during deployment
env = [
    {"name": "AIP_STORAGE_URI", "value": model.uri},
    {"name": "AIP_PREDICT_ROUTE", "value": "/predict"},
    {"name": "PROJECT_ID", "value": project_id}
]
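The prediction container itself is a FastAPI app that reads these variables at startup. The following is a minimal, hypothetical sketch of such a service, assuming a scikit-learn model saved as model.joblib under AIP_STORAGE_URI; the real service will differ in its details.

import os

import joblib
from fastapi import FastAPI
from google.cloud import storage
from pydantic import BaseModel

app = FastAPI()

class PredictionRequest(BaseModel):
    instances: list[list[float]]

def load_model():
    # Download model.joblib from the GCS path in AIP_STORAGE_URI (e.g. gs://bucket/path).
    uri = os.environ["AIP_STORAGE_URI"]
    bucket_name, prefix = uri.removeprefix("gs://").split("/", 1)
    blob = storage.Client().bucket(bucket_name).blob(f"{prefix}/model.joblib")
    blob.download_to_filename("/tmp/model.joblib")
    return joblib.load("/tmp/model.joblib")

model = load_model()

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post(os.environ.get("AIP_PREDICT_ROUTE", "/predict"))
def predict(request: PredictionRequest):
    # Same request shape as the prediction call shown later: {"instances": [[...]]}.
    return {"predictions": model.predict(request.instances).tolist()}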

Testing the Deployment

You can verify the deployment by first getting an identity token to authenticate to the Cloud Run service:

# Get an identity token to pass as a bearer
gcloud auth print-identity-token

I like to use Thunder Client (a VS Code extension for constructing and testing API calls).

[Image: Prediction request in Thunder Client]

Thunder Client can also help you generate the programmatic calls in many languages.

# Make a prediction
curl -X POST \
  'https://nfl-touchdown-<project-id>.<region>.run.app/predict' \
  --header 'Accept: */*' \
  --header 'User-Agent: Thunder Client (https://www.thunderclient.com)' \
  --header 'Authorization: Bearer <identity-token>' \
  --header 'Content-Type: application/json' \
  --data-raw '{"instances":[[23.0, 150.0, 1.0, 20.0]]}'
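If you would rather script the call, a short Python equivalent might look like the sketch below. It assumes Application Default Credentials that can mint an identity token (for example, a service account); the URL is the same placeholder as above.

import requests
from google.auth.transport.requests import Request
from google.oauth2 import id_token

# The Cloud Run service URL is also the token audience.
audience = "https://nfl-touchdown-<project-id>.<region>.run.app"
token = id_token.fetch_id_token(Request(), audience)

response = requests.post(
    f"{audience}/predict",
    headers={"Authorization": f"Bearer {token}"},
    json={"instances": [[23.0, 150.0, 1.0, 20.0]]},
    timeout=30,
)
print(response.status_code, response.json())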

Inspecting the Google Cloud Logs Explorer, we can see that the service was started and called successfully, confirming that the response we got in Thunder Client was indeed returned from the Cloud Run service.

[Image: Cloud Run request logs in Logs Explorer]

Infrastructure as Code

All base Cloud infrastructure components are managed via Terraform:

  • Cloud Run service configuration
  • IAM permissions
  • BigQuery connections
  • Storage buckets

This ensures consistent deployments across environments and makes it easy to replicate the setup.
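As an example of the IAM piece, the binding that allows a caller to invoke the service can be expressed as a single resource. The member below is a hypothetical service account; the real binding depends on who needs to call the endpoint.

resource "google_cloud_run_service_iam_member" "invoker" {
  service  = google_cloud_run_service.cr_nfl_predict.name
  location = var.region
  role     = "roles/run.invoker"
  member   = "serviceAccount:prediction-caller@prj-xyz.iam.gserviceaccount.com" # hypothetical caller
}

Each environment uses its own backend configuration, initialized with terraform init: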

terraform init -backend-config="vars/backend-dev.hcl"
Initializing the backend...
Initializing provider plugins...
- Reusing previous version of hashicorp/google from the dependency lock file
- Reusing previous version of hashicorp/random from the dependency lock file
- Using previously-installed hashicorp/google v4.51.0
- Using previously-installed hashicorp/random v3.7.1

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

The complete solution demonstrates a production-ready ML serving infrastructure that’s both scalable and maintainable.