# Google Gemini

llmock supports the `generateContent` (non-streaming) and `streamGenerateContent` (SSE) endpoints, plus Gemini Live over WebSocket. The same fixtures drive all three transports.

## Endpoints

| Method | Path | Format |
| ------ | ---- | ------ |
| POST | `/v1beta/models/:model:generateContent` | JSON |
| POST | `/v1beta/models/:model:streamGenerateContent` | SSE (`data:`) |
| WS | `/ws/google.ai.generativelanguage.*` | WebSocket JSON |

## Unit Test: Streaming Text

```js
const textFixture = {
  match: { userMessage: "hello" },
  response: { content: "Hi there!" },
};

const instance = await createServer([textFixture]);

const res = await post(
  `${instance.url}/v1beta/models/gemini-2.0-flash:streamGenerateContent?alt=sse`,
  {
    contents: [{ role: "user", parts: [{ text: "hello" }] }],
  }
);

// Parse Gemini SSE chunks
const chunks = res.body
  .split("\n")
  .filter((l) => l.startsWith("data: "))
  .map((l) => JSON.parse(l.slice(6)));

// Each chunk follows the Gemini response shape
expect(chunks[0].candidates[0].content.parts[0].text).toBeDefined();

// Reassemble the streamed text
const text = chunks
  .map((c) => c.candidates[0].content.parts[0].text ?? "")
  .join("");
expect(text).toBe("Hi there!");
```

## Unit Test: Tool Call

```js
const toolFixture = {
  match: { userMessage: "weather" },
  response: {
    toolCalls: [{ name: "get_weather", arguments: '{"city":"NYC"}' }],
  },
};

const instance = await createServer([toolFixture]);

const res = await post(
  `${instance.url}/v1beta/models/gemini-2.0-flash:streamGenerateContent?alt=sse`,
  {
    contents: [{ role: "user", parts: [{ text: "what is the weather?" }] }],
  }
);

// parseGeminiSSEChunks applies the same "data: " parsing as above
const chunks = parseGeminiSSEChunks(res.body);
const parts = chunks[0].candidates[0].content.parts;
expect(parts[0].functionCall.name).toBe("get_weather");
```
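The `parseGeminiSSEChunks` helper is not defined in these snippets. A minimal sketch, mirroring the inline `data:` parsing from the streaming-text test above:

```js
// Split an SSE body into parsed JSON chunks.
// Mirrors the inline parsing in the streaming-text test.
function parseGeminiSSEChunks(body) {
  return body
    .split("\n")
    .filter((line) => line.startsWith("data: "))
    .map((line) => JSON.parse(line.slice("data: ".length)));
}

// Example: two Gemini-style chunks in one SSE body
const sseBody =
  'data: {"candidates":[{"content":{"parts":[{"text":"Hi "}]}}]}\n\n' +
  'data: {"candidates":[{"content":{"parts":[{"text":"there!"}]}}]}\n\n';
const chunks = parseGeminiSSEChunks(sseBody);
console.log(chunks.length); // 2
```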

## Request Translation

Gemini requests use a different shape than OpenAI's: `contents` arrays with `parts` instead of `messages`. llmock translates incoming Gemini requests to the unified format via `geminiToCompletionRequest()`, so the same fixture `match.userMessage` works regardless of which provider endpoint receives the request.
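llmock's internal `geminiToCompletionRequest()` is not shown here; the following is a hypothetical sketch of the idea (flattening each `contents` entry's `parts` into a single message string), not the library's actual implementation:

```js
// Hypothetical sketch: map a Gemini request body to a unified,
// OpenAI-style message list. llmock's real translation may differ.
function geminiToCompletionRequest(geminiBody) {
  const messages = (geminiBody.contents ?? []).map((content) => ({
    // Gemini uses role "model" where OpenAI uses "assistant"
    role: content.role === "model" ? "assistant" : content.role,
    content: (content.parts ?? []).map((part) => part.text ?? "").join(""),
  }));
  return { messages };
}

const unified = geminiToCompletionRequest({
  contents: [{ role: "user", parts: [{ text: "hello" }] }],
});
console.log(unified.messages[0]); // { role: 'user', content: 'hello' }
```

With this shape, a fixture's `match.userMessage` only ever needs to inspect `messages`, never the provider-specific body.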

## Gemini Live (WebSocket)

Gemini Live uses WebSocket at /ws/google.ai.generativelanguage.* for bidirectional streaming. See the WebSocket APIs page for details.

Gemini Live text support is unverified against a real model — no text-capable Gemini Live model existed at the time of writing. The implementation follows the API specification.
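For orientation, the message shapes involved look roughly like the following. These are assumptions based on the public BidiGenerateContent specification, not llmock's verified wire format:

```js
// Assumed Gemini Live message shapes (per the public spec;
// llmock's WebSocket handler may differ in detail).

// First message on the socket: session setup.
const setupMessage = {
  setup: { model: "models/gemini-2.0-flash" },
};

// A client turn carrying text input.
const clientContent = {
  clientContent: {
    turns: [{ role: "user", parts: [{ text: "hello" }] }],
    turnComplete: true,
  },
};

// A server reply carrying text would look like:
const serverContent = {
  serverContent: {
    modelTurn: { parts: [{ text: "Hi there!" }] },
  },
};

// In a test you would JSON.stringify these, send them over the
// WebSocket, and JSON.parse each reply frame.
```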

## Vertex AI

Google Cloud's Vertex AI exposes Gemini models through a different URL pattern than the AI Studio API. llmock routes Vertex AI requests to the same Gemini handler — only the URL differs; the request and response formats are identical.

Vertex AI URLs follow the pattern:

```
POST /v1/projects/{project}/locations/{location}/publishers/google/models/{model}:generateContent
POST /v1/projects/{project}/locations/{location}/publishers/google/models/{model}:streamGenerateContent
```
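One way a single route can cover both URL styles is a pattern that accepts either prefix. This is an illustrative sketch, not llmock's actual router:

```js
// Match both AI Studio and Vertex AI generateContent paths.
// Illustrative regex only; llmock's real routing may differ.
// (Query strings like ?alt=sse would be stripped before matching.)
const GEMINI_URL = new RegExp(
  "^(?:/v1beta/models/" +
    "|/v1/projects/[^/]+/locations/[^/]+/publishers/google/models/)" +
    "([^:/]+):(generateContent|streamGenerateContent)$"
);

const studio = "/v1beta/models/gemini-2.0-flash:streamGenerateContent";
const vertex =
  "/v1/projects/my-proj/locations/us-central1" +
  "/publishers/google/models/gemini-2.0-flash:generateContent";

console.log(GEMINI_URL.exec(studio).slice(1));
// [ 'gemini-2.0-flash', 'streamGenerateContent' ]
console.log(GEMINI_URL.exec(vertex).slice(1));
// [ 'gemini-2.0-flash', 'generateContent' ]
```

Both paths yield the same model name and method, which is why one handler and one set of fixtures can serve both endpoints.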

The same fixtures work for both Gemini AI Studio and Vertex AI endpoints. See the Vertex AI page for configuration details.