CloudQuery

Creating a New Source Integration in JavaScript

This guide walks through building a source integration in JavaScript/TypeScript using the CloudQuery SDK. We reference the Airtable integration as an example throughout.

Prerequisites:

  • Node.js 20+ installed (Node.js tutorial)
  • CloudQuery CLI installed
  • Familiarity with TypeScript is helpful (the SDK ships with TypeScript type definitions)

Get Started

Clone the Airtable integration as a starting point, then install dependencies:
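One way to do this, assuming the Airtable integration still lives at plugins/source/airtable in the cloudquery monorepo (adjust the path if it has moved):

```shell
# Clone the monorepo and install the integration's dependencies
git clone --depth 1 https://github.com/cloudquery/cloudquery.git
cd cloudquery/plugins/source/airtable
npm install
```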

Or install the SDK directly if starting from scratch:

npm i @cloudquery/plugin-sdk-javascript

How It All Connects

When a user runs cloudquery sync, here’s what happens with your JavaScript integration:

  1. The CLI starts your integration (or connects to it over gRPC if you’re running it locally)
  2. Your main.ts creates the plugin and starts the gRPC server
  3. The CLI calls your newClient function with the user’s spec configuration
  4. newClient parses the spec, validates it, creates an authenticated API client, and discovers available tables
  5. The CLI calls sync, which runs each table’s resolver through the scheduler
  6. Each resolver is an async function that calls stream.write() for each record. The SDK handles writing them to the destination.

Your main implementation work is in three places: the newClient function (parsing configuration and creating the API client), the table definitions (defining columns and how they map to API data), and the resolvers (fetching data from the API).

Project Structure

Here’s the structure based on the Airtable integration:

plugins/source/airtable/
├── package.json              # SDK dependency (@cloudquery/plugin-sdk-javascript)
├── tsconfig.json
├── Dockerfile
└── src/
    ├── main.ts               # Entry point
    ├── serve.ts              # Creates serve command
    ├── plugin.ts             # Plugin definition (newPlugin, newClient, sync)
    ├── tables.ts             # Table definitions, column mapping, resolvers
    ├── spec.ts               # Spec parsing + JSON schema validation
    └── airtable.ts           # API types and enums

Here’s what each component does:

  • main.ts / serve.ts: the entry point. Creates the plugin and starts the gRPC server. This is boilerplate you rarely modify.
  • plugin.ts: the core wiring. The newClient function is called when a sync starts: it parses the spec, creates the API client, discovers tables, and returns an object with tables(), sync(), and close() methods. This is the equivalent of Go’s Configure function.
  • tables.ts: defines your tables and resolvers. Each table specifies its name, columns (with Apache Arrow types), and a resolver function that fetches data from the API.
  • spec.ts: defines and validates the user’s configuration using JSON Schema (via Ajv). This catches invalid configuration before your code runs.
  • airtable.ts (or <api_name>.ts): API-specific types, enums, and client code. Keeping this separate from CloudQuery-specific code means you can test it independently.

Entry Point

The main.ts is minimal boilerplate. It creates and runs the serve command:

// src/main.ts
import { createMyServeCommand } from './serve.js';
 
createMyServeCommand().parse();

// src/serve.ts
import { createServeCommand } from '@cloudquery/plugin-sdk-javascript/plugin/serve';
import { newMyPlugin } from './plugin.js';
 
export const createMyServeCommand = () => createServeCommand(newMyPlugin());

Plugin Setup

The plugin is created using newPlugin() with a newClient function. Unlike Go (where you define a Configure function) or Python (where you extend a Plugin class), the JavaScript SDK uses a functional approach: newClient is an async function that receives the spec, sets up your API client, and returns an object with tables, sync, and close methods.

The newPlugin() call registers your integration’s name, version, and team, and passes newClient as the initialization callback:

import { newPlugin, newUnimplementedDestination } from '@cloudquery/plugin-sdk-javascript/plugin/plugin';
import { sync } from '@cloudquery/plugin-sdk-javascript/scheduler';
import { Table, filterTables } from '@cloudquery/plugin-sdk-javascript/schema/table';
 
export const newMyPlugin = () => {
  const pluginClient = {
    // ...internal state
  };
 
  const newClient = async (logger, spec, options) => {
    // Parse spec, create API client, discover tables
    return {
      plugin: {
        ...newUnimplementedDestination(),
        tables: () => pluginClient.allTables,
        sync: (options) => sync({ tables: pluginClient.allTables, client: pluginClient, stream: options.stream, /* ... */ }),
        close: async () => { /* cleanup */ },
      },
    };
  };
 
  return newPlugin('my-integration', version, newClient, {
    kind: 'source',
    team: 'my-team',
    jsonSchema: JSON_SCHEMA,
  });
};

Define a Table

In JavaScript/TypeScript, tables are created using factory functions rather than classes. Each table needs a name, a list of columns (with their Arrow types and resolvers), and a table resolver function. Unlike in Go, where the SDK auto-maps struct fields, in JavaScript you explicitly define each column and how it maps to the API response data using pathResolver:

import { createTable } from '@cloudquery/plugin-sdk-javascript/schema/table';
import { createColumn } from '@cloudquery/plugin-sdk-javascript/schema/column';
import { Utf8, Timestamp, TimeUnit, Float64, Bool, Int64 } from 'apache-arrow'; // bundled with SDK
import { JSONType } from '@cloudquery/plugin-sdk-javascript/types/json';
import { pathResolver } from '@cloudquery/plugin-sdk-javascript/schema/resolvers';
 
const myTable = createTable({
  name: 'my_integration_items',
  description: 'Items from the API',
  columns: [
    createColumn({ name: 'id', type: new Utf8(), primaryKey: true, resolver: pathResolver('id') }),
    createColumn({ name: 'name', type: new Utf8(), resolver: pathResolver('name') }),
    createColumn({ name: 'created_at', type: new Timestamp(TimeUnit.MILLISECOND), resolver: pathResolver('created_at') }),
    createColumn({ name: 'active', type: new Bool(), resolver: pathResolver('active') }),
    createColumn({ name: 'metadata', type: new JSONType(), resolver: pathResolver('metadata') }),
  ],
  resolver: myTableResolver,
});

Key details:

  • Column types are Apache Arrow types: Utf8, Timestamp, Float64, Bool, Int64, Uint64
  • For JSON columns, use JSONType() from @cloudquery/plugin-sdk-javascript/types/json
  • Use pathResolver('field_name') for direct field-to-column mappings
  • Custom column resolvers have the signature (client, resource, column) => Promise<void>
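When a column value has to be derived rather than read from a single field, you write a custom column resolver with that signature. The sketch below is self-contained for illustration: FakeResource stands in for the SDK's resource object, and the setColumData setter name should be verified against your SDK version.

```typescript
// Illustrative types standing in for the SDK's column and resource objects.
type Column = { name: string };

class FakeResource {
  data: Record<string, unknown> = {};
  constructor(public item: Record<string, unknown>) {}
  // Stand-in for the SDK resource's column setter.
  setColumData(name: string, value: unknown) {
    this.data[name] = value;
  }
}

// A custom column resolver: (client, resource, column) => Promise<void>.
// Here it combines two API fields into a single column value.
const fullNameResolver = async (
  _client: unknown,
  resource: FakeResource,
  column: Column,
): Promise<void> => {
  const { first_name, last_name } = resource.item as {
    first_name: string;
    last_name: string;
  };
  resource.setColumData(column.name, `${first_name} ${last_name}`);
};
```

Prefer pathResolver whenever the value maps one-to-one from a field; reserve custom resolvers for derived or reshaped values like this.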

Write a Table Resolver

The resolver is an async function that fetches data from the API and writes each record to a stream. The three arguments serve the same purpose as in other SDKs, adapted for JavaScript’s async patterns:

const myTableResolver = async (clientMeta, parent, stream) => {
  const client = clientMeta; // your API client state from newClient
  const items = await client.listItems();
 
  for (const item of items) {
    stream.write(item); // write each record as an object
  }
};

  • clientMeta: the client state you set up in newClient. This is where you’d access your authenticated API client, configuration values, etc.
  • parent: null for top-level tables. For child tables (e.g. fetching comments for a specific post), this contains the parent row so you can extract the parent ID.
  • stream: call stream.write(record) with plain objects whose keys match column names. Write each item as soon as you get it from the API. Don’t collect everything into an array first. This keeps memory usage low and gets data to the destination faster.
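A child-table resolver uses the parent argument to scope its API call. The sketch below is self-contained: the client shape and the parent's item property (carrying the parent row) are illustrative, so check how your SDK version exposes parent data.

```typescript
// Hypothetical API client for a child table of comments under posts.
interface CommentsClient {
  listComments(postId: string): Promise<Record<string, unknown>[]>;
}

// Child-table resolver: extract the parent post's ID, then fetch and
// stream only the records belonging to that parent.
const commentsResolver = async (
  client: CommentsClient,
  parent: { item: { id: string } }, // parent row from the posts table
  stream: { write(record: Record<string, unknown>): void },
): Promise<void> => {
  const comments = await client.listComments(parent.item.id);
  for (const comment of comments) {
    stream.write(comment);
  }
};
```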

For paginated APIs, loop through pages inside the resolver and call stream.write() for each item on each page. The SDK handles batching and delivery to the destination.
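The pagination loop can be sketched like this; the client and its offset-based listItems shape are hypothetical, but the pattern of writing each page as soon as it arrives is the point:

```typescript
// Hypothetical page shape for an offset-paginated API.
type Page = { items: Record<string, unknown>[]; nextOffset: string | null };

interface PaginatedClient {
  listItems(offset: string | null): Promise<Page>;
}

// Resolver that walks every page, streaming each record immediately
// instead of buffering all pages into one array.
const paginatedResolver = async (
  client: PaginatedClient,
  _parent: unknown,
  stream: { write(record: Record<string, unknown>): void },
): Promise<void> => {
  let offset: string | null = null;
  do {
    const page: Page = await client.listItems(offset);
    for (const item of page.items) {
      stream.write(item); // stream as you go; keeps memory flat
    }
    offset = page.nextOffset;
  } while (offset !== null);
};
```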

Configuration & Authentication

The SDK passes the user’s spec block as a JSON string to your newClient function. Use Ajv for JSON Schema validation:

// src/spec.ts
import Ajv from 'ajv';
 
export const JSON_SCHEMA = {
  type: 'object',
  properties: {
    access_token: { type: 'string' },
    endpoint_url: { type: 'string', default: 'https://api.example.com' },
    concurrency: { type: 'number', default: 10000 },
  },
  required: ['access_token'],
};
 
export const parseSpec = (spec: string) => {
  const parsed = JSON.parse(spec);
  const ajv = new Ajv({ useDefaults: true });
  const validate = ajv.compile(JSON_SCHEMA);
  if (!validate(parsed)) {
    throw new Error(`Invalid spec: ${JSON.stringify(validate.errors)}`);
  }
  return parsed;
};

Then in your newClient function, parse the spec and create your API client:

const newClient = async (logger, spec, options) => {
  const parsedSpec = parseSpec(spec);
  const apiClient = new MyApiClient(parsedSpec.access_token, parsedSpec.endpoint_url);
  // ...
};

Users configure authentication in their YAML file. The CLI automatically resolves environment variable references:

spec:
  access_token: "${MY_API_TOKEN}"
  endpoint_url: "https://api.example.com"

The jsonSchema option passed to newPlugin() enables the CLI to validate user configuration before your code runs.

Common Pitfalls

Avoid these common mistakes when building JavaScript integrations:

  • Don’t install Apache Arrow separately. Use the version bundled with @cloudquery/plugin-sdk-javascript. A separate install will cause version conflicts.
  • Stream results immediately. Call stream.write() for each item or page as you get it. Don’t accumulate everything in an array first.
  • Use pathResolver for direct mappings. Don’t write custom column resolvers when a field path will do.
  • Handle async errors properly. Make sure your resolver’s async function either awaits or catches all promises. Unhandled rejections will crash the integration.
  • Pass jsonSchema to newPlugin(). This lets the CLI validate the user’s spec before your code runs, giving better error messages.
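The async-error point can be sketched as follows; the client shape and the table-name prefix in the error are illustrative:

```typescript
// Resolver that awaits every promise and rethrows failures with context,
// so nothing escapes as an unhandled rejection.
const safeResolver = async (
  client: { listItems(): Promise<Record<string, unknown>[]> },
  _parent: unknown,
  stream: { write(record: Record<string, unknown>): void },
): Promise<void> => {
  try {
    const items = await client.listItems(); // awaited, not fire-and-forget
    for (const item of items) {
      stream.write(item);
    }
  } catch (err) {
    // Rethrow with the table name so the CLI output shows what failed.
    throw new Error(`my_integration_items: ${(err as Error).message}`);
  }
};
```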

Test Locally

Start the integration as a gRPC server:

npm run dev
# Or after building: node dist/main.js serve

You can also build and run as a Docker container. See the Airtable Dockerfile. The Dockerfile exposes port 7777 and uses the entry command node dist/main.js serve --address [::]:7777.
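To sync against the locally running server, point the CLI at it with a gRPC registry entry. A minimal sketch (the destination name, table list, and token variable are placeholders; a matching destination block is needed in the same config):

```yaml
# sync-test.yaml — connect to the integration already running on port 7777
kind: source
spec:
  name: my-integration
  registry: grpc
  path: "localhost:7777"
  tables: ["*"]
  destinations: ["sqlite"]
  spec:
    access_token: "${MY_API_TOKEN}"
```

With npm run dev still running, cloudquery sync sync-test.yaml will use the live server instead of downloading a published image.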

See Testing Locally for configuration examples and Running Locally for full details.

Publishing

Visit Publishing an Integration to the Hub for release instructions. JavaScript integrations can also be distributed as Docker images.

Real-World Examples

  • Airtable: dynamic table generation from API schema with Arrow type mapping

Next Steps

Once your integration is working locally:

  1. Publish to the Hub: make your integration available to others
  2. Add tests: see serve.test.ts in the Airtable integration for a testing pattern
  3. Add JSON Schema validation: pass jsonSchema to newPlugin() so the CLI validates user configuration before your code runs
  4. Build a Docker image: see the Airtable Dockerfile for a production-ready example

Resources