Cloud Run AI Cookbook
Blog
Gemma 3
Hands-on with Gemma 3 on Google Cloud
This blog post announces two codelabs that show developers how to deploy Gemma 3 on Google Cloud using either Cloud Run for a serverless approach or Google Kubernetes Engine (GKE) for a platform approach.
2025-11-17
Blog
Tools
Easy AI workflow automation: Deploy n8n on Cloud Run
This blog post explains how to deploy agents using the n8n workflow automation tool on Cloud Run to create AI-powered workflows and integrate with tools like Google Workspace.
2025-11-07
Blog
Extensions
Gemini
Automate app deployment and security analysis with new Gemini CLI extensions
This blog post announces the Cloud Run extension in the Gemini CLI to simplify application deployment with a single /deploy command.
2025-09-10
Blog
Extensions
Gemini
From localhost to launch: Simplify AI app deployment with Cloud Run and Docker Compose
This blog post announces a collaboration between Google Cloud and Docker that simplifies the deployment of complex AI applications by allowing developers to use the gcloud run compose up command to deploy their compose.yaml files directly to Cloud Run.
2025-07-10
Blog
MCP
Build and Deploy a Remote MCP Server to Google Cloud Run in Under 10 Minutes
This blog post provides a step-by-step guide to building and deploying a secure, remote Model Context Protocol (MCP) server on Google Cloud Run in under 10 minutes using FastMCP, and then testing it from a local client.
2025-06-07
Agents
AI Studio
Blog
MCP
AI deployment made easy: Deploy your app to Cloud Run from AI Studio or MCP-compatible AI agents
This blog post introduces ways to simplify AI deployments with one-click deployment from AI Studio to Cloud Run, direct deployment of Gemma 3 models, and an MCP server for agent-based deployments.
2025-05-20
Agents
Blog
Use cases
This article showcases how CodeRabbit, an AI code review tool, utilizes Cloud Run to build a scalable and secure platform for executing untrusted code, ultimately cutting code review time and bugs in half.
2025-04-22
Blog
Vertex AI
Create shareable generative AI apps in less than 60 seconds with Vertex AI and Cloud Run
This article introduces a feature in Vertex AI that allows for one-click deployment of web applications on Cloud Run. Use generative AI prompts to streamline the process of turning a generative AI concept into a shareable prototype.
2025-02-20
Blog
Deployment
How to deploy serverless AI with Gemma 3 on Cloud Run
This blog post announces Gemma 3, a family of lightweight, open AI models, and explains how to deploy them on Cloud Run for scalable and cost-effective serverless AI applications.
2025-03-12
Blog
GPUs
Inference
RAG
Vertex AI
Unlock Inference-as-a-Service with Cloud Run and Vertex AI
This blog post explains how developers can accelerate the development of generative AI applications by adopting an Inference-as-a-Service model on Cloud Run. This enables hosting and scaling of LLMs with GPU support and integrating them with Retrieval-Augmented Generation (RAG) for context-specific responses.
2025-02-20
Frameworks
Gemini
LangChain
Quickstart: Build and deploy a Python (LangChain) web app to Cloud Run
This quickstart shows you how to build and deploy a LangChain application using Cloud Run and Gemini to respond to queries about city capitals.
2026-02-03
Agents
Frameworks
Gemini
Quickstart: Build and deploy a Python (smolagents) web app to Cloud Run
This quickstart shows you how to build and deploy a smolagents application using Cloud Run and Gemini.
2026-01-28
Architecture
RAG
Vertex AI
RAG infrastructure for generative AI using Vertex AI and Vector Search
This document presents a reference architecture for building a generative AI application with Retrieval-Augmented Generation (RAG) on Google Cloud, utilizing Vector Search for large-scale similarity matching and Vertex AI for managing embeddings and models.
2025-03-07
Agents
Antigravity
Video
Stop coding, start architecting: Google Antigravity + Cloud Run
This video introduces Google's agentic IDE, Antigravity, and uses it to build and deploy a full-stack app to Cloud Run from scratch. It shows how to write a spec sheet for the AI, force it to use modern Node.js (no build steps!), and watch it autonomously debug a port mismatch during deployment touching a config file.
2025-12-08
Agents
GPUs
Ollama
Video
This AI agent runs on Cloud Run + NVIDIA GPUs
This video shows how to build a real AI agent application on a serverless NVIDIA GPU. See a demo of a smart health agent that uses open source models like Gemma with Ollama on Cloud Run, and LangGraph to build a multi-agent workflow (RAG + tools).
2025-11-13
MCP
Video
Power your AI agents with MCP tools on Google Cloud Run
This video introduces MCP (Model Context Protocol) and how it makes life easier for AI agent developers. Get a walkthrough of building an MCP server using FastMCP and deploying an ADK agent on Cloud Run. See how the code handles service-to-service authentication using Cloud Run's built-in OIDC tokens.
2025-11-06
Model Armor
Security
Video
We tried to jailbreak our AI (and Model Armor stopped it)
This video shows an example of using Google's Model Armor to block threats with an API call.
2025-10-30
Benchmarking
Vertex AI
Video
Don't guess: How to benchmark your AI prompts
This video shows how to use Vertex AI to build reliable generative AI applications using Google Cloud's tools. Developers will learn how to use Google Cloud tools for rapid prototyping, get hard numbers with data-driven benchmarking, and finally, build an automated CI/CD pipeline for true quality control, all while avoiding common pitfalls.
2025-10-23
ADK
Multi-agent
Video
How to build a multi-agent app with ADK and Gemini
This video shows how to build an app using Google's ADK (Agent Development Kit) that helps you refine and collaborate on content. Explore how stateful multi-agents work better than a single agent.
2025-10-16
Gemini
Video
Build an AI app that watches videos using Gemini
This video shows how to build an app that watches and understands YouTube videos using Gemini 2.5 Pro. Use smart prompts to customize your app's output for blog posts, summaries, quizzes, and more. The video covers how to integrate Gemini to generate both text content and header images from video input, discusses cost considerations, and explains how to handle longer videos with batch requests.
2025-10-06
GenAI
Video
Let's build a GenAI app on Cloud Run
This video walks you through the architecture and code, using AI to help with every step.
2025-07-17
Agents
Firebase
Video
Build AI agents with Cloud Run and Firebase Genkit
This video shows how to build AI agents with Cloud Run and Firebase Genkit, a serverless AI agent builder.
2025-07-10
AI Studio
Firebase
Gemini
LLMs
Video
This video provides a demo of how to quickly build a tech support application using AI Studio, Cloud Functions, and Firebase Hosting. Learn how to leverage Large Language Models (LLMs) and see a practical example of integrating AI into a traditional web application.
2025-06-19
ADK
Agents
Frameworks
LangGraph
Vertex AI
Video
Building AI agents on Google Cloud
This video shows how to build and deploy AI agents using Cloud Run and Vertex AI. Explore key concepts like tool calling, model agnosticism, and the use of frameworks like LangGraph and the Agent Development Kit (ADK).
2025-05-21
AI models
GPUs
Ollama
Video
How to host DeepSeek with Cloud Run GPUs in 3 steps
This video shows how to simplify hosting the DeepSeek AI model with Cloud Run GPUs. Learn how to deploy and manage Large Language Models (LLMs) on Google Cloud with three commands. Watch along and discover the capabilities of Cloud Run and the Ollama command-line tool, allowing developers to operate AI applications rapidly with on-demand resource allocation and scaling.
2025-04-24
Function calling
Gemini
Video
How to use Gemini function calling with Cloud Run
This video explores Gemini function calling and shows how to integrate external APIs into your AI applications. Build a weather app that uses Gemini's natural language understanding to process user requests and fetch weather data from an external API, providing a practical example of function calling in action.
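As a rough illustration of the function calling pattern described above, the sketch below shows a tool declaration and a dispatcher that runs whichever function the model requests. It does not call Gemini or any SDK: the declaration shape, the `get_weather` function, its canned data, and the simulated model output are all assumptions made for illustration, not code from the video.

```python
import json

# Hypothetical tool declaration in the JSON-schema style commonly used for
# function calling. The name and fields are illustrative assumptions.
GET_WEATHER_DECLARATION = {
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
}


def get_weather(city: str) -> dict:
    """Stand-in for a call to an external weather API (canned data, no network)."""
    return {"city": city, "temperature_c": 21, "conditions": "sunny"}


# Maps the function names the model may request to local callables.
TOOLS = {"get_weather": get_weather}


def dispatch(function_call: dict) -> dict:
    """Run the function the model asked for and return its result."""
    name = function_call["name"]
    if name not in TOOLS:
        raise ValueError(f"Unknown function: {name}")
    return TOOLS[name](**function_call.get("args", {}))


if __name__ == "__main__":
    # Simulated model output: a structured request instead of free text.
    simulated_call = {"name": "get_weather", "args": {"city": "Paris"}}
    print(json.dumps(dispatch(simulated_call)))
```

In a real app, the declaration would be sent to the model along with the user prompt, and the result of `dispatch` would be returned to the model so that it can compose the final answer.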
2025-01-23
Image generation
Vertex AI
Video
Text to image with Google Cloud's Vertex AI on Cloud Run
This video shows how to build an image generation app using Vertex AI on Google Cloud. With the Vertex AI image generation model, developers can create stunning visuals without the need for complex infrastructure or model management.
2025-01-16
GPUs
Ollama
Video
Ollama and Cloud Run with GPUs
This video explains how to use Ollama to easily deploy large language models on Cloud Run with GPUs for scalable and efficient AI model deployment in the cloud.
2024-12-02
Data protection
Security
Video
Protecting sensitive data in AI apps
This video shows how to safeguard sensitive data in AI applications. Explore key concepts, best practices, and tools for protecting data throughout the AI lifecycle.
2024-11-21
LangChain
RAG
Video
RAG with LangChain on Google Cloud
This video shows how to enhance the accuracy of your AI applications using Retrieval-Augmented Generation (RAG), a technique that makes AI responses more accurate and precise. Build a web application that implements RAG with LangChain.
2024-11-07
Large prompt window
Model tuning
RAG
Video
RAG vs Model tuning vs Large prompt window
This video discusses the three primary methods for integrating your data into AI applications: prompts with long context windows, Retrieval-Augmented Generation (RAG), and model tuning. Learn the strengths, limitations, and ideal use cases for each approach to make informed decisions for your AI projects in this episode of Serverless Expeditions.
2024-11-14
Prompt engineering
Video
Prompt engineering for developers
This video shows how to use prompt engineering to improve the quality of AI responses. Watch the video to learn how to get more accurate and relevant responses from generative AI with chain-of-thought, few-shot, and multi-shot prompting techniques.
2024-10-31
AI models
GPUs
LLMs
Video
Deploying A GPU-Powered LLM on Cloud Run
This video shows how you can deploy your own GPU-powered large language model (LLM) on Cloud Run. It walks through taking an open-source model like Gemma and deploying it as a scalable, serverless service with GPU acceleration.
2024-10-06
GPUs
LLMs
Ollama
Video
This video shows a demonstration of deploying Google's Gemma 2, an open-source large language model, through Ollama on Cloud Run.
2024-10-03
Gemini
LLMs
Video
Build AI chat apps on Google Cloud
This video shows how to build a large language model (LLM) chat app on Gemini.
2024-08-29
Multimodal
Vertex AI
Video
This video shows a demo of using Vertex AI to build a multimodal application that processes video, audio, and text to create output.
2024-08-15
AI models
Vertex AI
Video
Using Serverless Generative AI | Google Vertex AI
This video shows how to build and deploy blazing-fast generative AI apps using Vertex AI Studio, Cloud Run, and generative AI models.
2024-02-22
Codelab
Tools
Deploying and Running n8n on Google Cloud Run
This codelab shows how to deploy a production-ready instance of the n8n workflow automation tool on Cloud Run, complete with a Cloud SQL database for persistence and Secret Manager for sensitive data.
2025-11-20
Codelab
GPUs
LLMs
How to run LLM inference on Cloud Run GPUs with vLLM and the OpenAI Python SDK
This codelab shows how to deploy Google's Gemma 2 2b instruction-tuned model on Cloud Run with GPUs, using vLLM as an inference engine and the OpenAI Python SDK to perform sentence completion.
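To give a feel for what sentence completion against a vLLM service looks like, here is a minimal sketch that calls vLLM's OpenAI-compatible `/v1/completions` endpoint. It uses only the Python standard library rather than the OpenAI Python SDK that the codelab uses. The service URL and the model name `google/gemma-2-2b-it` are assumptions, and a Cloud Run service that requires authentication would also need an identity token in the `Authorization` header.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=64):
    """Build a POST request for an OpenAI-compatible completions endpoint."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, model, prompt):
    """Send the request and return the text of the first completion choice."""
    request = build_completion_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


if __name__ == "__main__":
    # Placeholder URL and assumed model name; replace with your own service.
    service_url = "https://YOUR-SERVICE-URL"
    print(complete(service_url, "google/gemma-2-2b-it", "Cloud Run is a"))
```

Because the endpoint follows the OpenAI API shape, the same request can be made with the OpenAI Python SDK by pointing its base URL at the service.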
2025-11-13
ADK
Agents
Codelab
Deploy, Manage, and Observe ADK Agent on Cloud Run
This codelab walks you through deploying, managing, and monitoring a powerful agent built with the Agent Development Kit (ADK) on Cloud Run.
2025-11-12
Codelab
Gemini CLI
MCP
How to deploy a secure MCP server on Cloud Run
This codelab walks you through deploying a secure Model Context Protocol (MCP) server on Cloud Run and connecting to it from the Gemini CLI.
2025-10-28
ADK
Agents
Codelab
MCP
Build and deploy an ADK agent that uses an MCP server on Cloud Run
This codelab guides you through building and deploying a tool-using AI agent with the Agent Development Kit (ADK). The agent connects to a remote MCP server for its tools, and is deployed as a container on Cloud Run.
2025-10-27
AI models
Cloud Run jobs
Codelab
Model tuning
How to fine-tune a LLM using Cloud Run Jobs
This codelab provides a step-by-step guide on how to use Cloud Run Jobs with GPUs to fine-tune a Gemma 3 model on the Text2Emoji dataset and then serve the resulting model on a Cloud Run service with vLLM.
2025-10-21
Batch inference
Cloud Run jobs
Codelab
How to run batch inference on Cloud Run jobs
This codelab demonstrates how to use a GPU-powered Cloud Run job to run batch inference on a Llama 3.2-1b model and write the results directly to a Cloud Storage bucket.
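One common way to spread batch work across a Cloud Run job is to split the inputs by task. The sketch below assumes a job configured with several parallel tasks and uses the `CLOUD_RUN_TASK_INDEX` and `CLOUD_RUN_TASK_COUNT` environment variables that Cloud Run sets for each task. The function names and sample prompts are hypothetical, and the codelab itself may run a single task.

```python
import os


def task_shard(items, task_index, task_count):
    """Return the items assigned to one task using round-robin striding."""
    if task_count < 1 or not 0 <= task_index < task_count:
        raise ValueError("task_index must be in [0, task_count)")
    return items[task_index::task_count]


def current_task_shard(items):
    """Pick this task's share, falling back to a single task outside Cloud Run."""
    index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
    count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))
    return task_shard(items, index, count)


if __name__ == "__main__":
    prompts = ["prompt-a", "prompt-b", "prompt-c", "prompt-d", "prompt-e"]
    for prompt in current_task_shard(prompts):
        # A real job would run inference here and write results to a bucket.
        print(f"would run inference on: {prompt}")
```

Striding keeps the shards balanced to within one item and needs no coordination between tasks.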
2025-10-21
ADK
Agents
Codelab
GPUs
LLMs
MCP
Lab 3: Prototype to Production - Deploy Your ADK Agent to Cloud Run with GPU
This codelab demonstrates how to deploy a production-ready Agent Development Kit (ADK) agent with a GPU-accelerated Gemma backend on Cloud Run. The codelab covers deployment, integration, and performance testing.
2025-10-03
Agents
Codelab
How to deploy a Gradio frontend app that calls a backend ADK agent, both running on Cloud Run
This codelab demonstrates how to deploy a two-tier application on Cloud Run, consisting of a Gradio frontend and an ADK agent backend, with a focus on implementing secure, authenticated service-to-service communication.
2025-09-29
Codelab
Gemini
How to deploy a FastAPI chatbot app to Cloud Run using Gemini
This codelab shows you how to deploy a FastAPI chatbot app to Cloud Run.
2025-04-02
Cloud Run functions
Codelab
LLMs
How to host a LLM in a sidecar for a Cloud Run function
This codelab shows you how to host a gemma3:4b model in a sidecar for a Cloud Run function.
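As a sketch of how function code might talk to a model running in a sidecar, the example below posts a prompt to Ollama's `/api/generate` endpoint over localhost, which containers in the same Cloud Run instance share. It assumes the sidecar runs Ollama on its default port 11434 and serves `gemma3:4b`. The helper names are hypothetical and the codelab's actual code may differ.

```python
import json
import urllib.request

# Assumption: the sidecar runs Ollama on its default port, reachable on localhost.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(prompt, model="gemma3:4b", base_url=OLLAMA_URL):
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt):
    """Send the prompt to the sidecar and return the generated text."""
    request = build_generate_request(prompt)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)["response"]


if __name__ == "__main__":
    print(generate("Why is the sky blue?"))
```

Setting `stream` to `False` returns one JSON object instead of a stream of chunks, which keeps the handler simple.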
2025-03-27
Community
Security
Securely call your Cloud Run service from anywhere
This article provides a Python code example that acquires an identity token to securely call an authenticated Cloud Run service from any environment. The example uses application default credentials (ADC) to authenticate the call.
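The general shape of such a call can be sketched as follows. This is not the article's code: it assumes the `google-auth` and `requests` packages are installed, uses `google.oauth2.id_token.fetch_id_token` to obtain the token, and treats the service URL as a placeholder. The helper names are hypothetical. Whether a token can be fetched depends on the kind of credentials ADC resolves to in your environment.

```python
import urllib.request


def fetch_identity_token(audience: str) -> str:
    """Fetch an identity token for the given audience via google-auth.

    The import is deferred so the rest of this sketch loads without the
    google-auth package. For Cloud Run, the audience is the service URL.
    """
    import google.auth.transport.requests
    import google.oauth2.id_token

    auth_request = google.auth.transport.requests.Request()
    return google.oauth2.id_token.fetch_id_token(auth_request, audience)


def build_authorized_request(service_url: str, token: str) -> urllib.request.Request:
    """Attach the identity token as a Bearer token in the Authorization header."""
    return urllib.request.Request(
        service_url, headers={"Authorization": f"Bearer {token}"}
    )


if __name__ == "__main__":
    # Placeholder URL; replace with the URL of your authenticated service.
    url = "https://YOUR-SERVICE-URL"
    request = build_authorized_request(url, fetch_identity_token(url))
    with urllib.request.urlopen(request, timeout=30) as response:
        print(response.status)
```

The calling identity also needs permission to invoke the service, for example through the Cloud Run Invoker role.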
2025-10-15
AI models
Community
RAG
Serverless AI: EmbeddingGemma with Cloud Run
This article provides a step-by-step guide on how to containerize and deploy the EmbeddingGemma model to Cloud Run with GPUs, and then use it to build a RAG application.
2025-09-24
Community
Security
Chain of Trust for AI: Securing MCP Toolbox Architecture on Cloud Run
This article deconstructs a simple hotel booking application built on Google Cloud. It demonstrates a robust, zero-trust security model using service identities, and shows how a secure chain of trust is established from the end-user all the way to the database.
2025-09-03
AI models
Community
Containerization
Docker
Ollama
RAG
Serverless AI: Qwen3 Embeddings with Cloud Run
This article provides a tutorial on how to deploy the Qwen3 Embedding model to Cloud Run with GPUs. The article also covers containerization with Docker and Ollama, and provides an example of how to use it in a RAG application.
2025-08-20
Architecture
Community
LLMs
Still Packaging AI Models in Containers? Do This Instead on Cloud Run
This article advocates for a more efficient and scalable architecture for serving large language models (LLMs) on Cloud Run by decoupling model files from the application container, and instead using Cloud Storage FUSE.
2025-08-11
AI models
Community
Building an AI-Powered Podcast Generator with Gemini and Cloud Run
This article details how to build a serverless AI-powered podcast generator that uses Gemini for content summarization and Cloud Run. The example orchestrates the automated pipeline for generating and delivering daily audio briefings from RSS feeds.
2025-08-11
Community
MCP
Power your MCP servers with Google Cloud Run
This article explains the purpose of the Model Context Protocol (MCP) and provides a tutorial on how to build and deploy an MCP server on Cloud Run to expose resources as tools for AI applications.
2025-07-09
Community
ML models
Monitoring
Deploying & Monitoring ML Models with Cloud Run — Lightweight, Scalable, and Cost-Efficient
This article explains how to deploy, monitor, and automatically scale a machine learning model on Cloud Run, utilizing a lightweight monitoring stack with Google Cloud services to track performance and control costs.
2025-05-29
AI models
AI Studio
Community
LLMs
Deploying Gemma Directly from AI Studio to Cloud Run
This article provides a step-by-step tutorial on how to take a Gemma model from AI Studio, adapt its code for production, and deploy it as a containerized web application on Cloud Run.
2025-05-29
ADK
Agents
Community
MCP
The Triad of Agent Architecture: ADK, MCP, and Cloud Run
This article demonstrates how to build an AI agentic architecture by setting up an Agent Development Kit (ADK) workflow that communicates with a Model Context Protocol (MCP) server hosted on Cloud Run to manage flight bookings.
2025-05-27
A2A
Agents
Community
Frameworks
Use cases
Exploring Agent2Agent (A2A) Protocol with Purchasing Concierge Use Case on Cloud Run
This article explains the Agent2Agent (A2A) protocol and demonstrates its use with a purchasing concierge application. The Cloud Run app contains multiple AI agents, built with different frameworks, that collaborate with one another to fulfill a user's order.
2025-05-15
AI models
Automation
CI/CD
Community
GitHub
Automating ML Models Deployment with GitHub Actions and Cloud Run
This article provides a comprehensive guide on how to create a CI/CD pipeline with GitHub Actions to automate the build and deployment of machine learning models as containerized services on Cloud Run.
2025-05-08
Community
LLMs
Security
Building Sovereign AI Solutions with Google Cloud - Cloud Run
This article provides a step-by-step guide on how to build and deploy a sovereign AI solution on Google Cloud by using Sovereign Controls by Partners. The example runs a Gemma model on Cloud Run, ensuring data residency and compliance with European regulations.
2025-04-03
Community
LLMs
From Zero to Deepseek on Cloud Run during my morning commute
This article shows how to rapidly deploy the DeepSeek R1 model on Cloud Run with GPUs using Ollama during a morning commute. It also explores advanced topics such as embedding the model in the container, A/B testing with traffic splitting, and adding a web UI with a sidecar container.
2025-02-11
Community
LLMs
Ollama
How to run (any) open LLM with Ollama on Google Cloud Run [Step-by-step]
This article shows how to host any open LLM, such as Gemma 2, on Google Cloud Run using Ollama. The article also includes instructions for creating a Cloud Storage bucket for model persistence and testing the deployment.
2025-01-20
Community
ML models
Deployment of Serverless Machine Learning models with GPUs using Google Cloud: Cloud Run
This article provides a step-by-step guide to deploying a machine learning (ML) model with GPU support on Cloud Run. The article covers everything from project setup and containerization to automated deployment with Cloud Build and testing with curl and JavaScript.
2025-01-17