Creating a New Source Integration in JavaScript
This guide walks through building a source integration in JavaScript/TypeScript using the CloudQuery SDK. We reference the Airtable integration as an example throughout.
Prerequisites:
- Node.js 20+ installed
- CloudQuery CLI installed
- Familiarity with TypeScript is helpful (the SDK ships with TypeScript type definitions)
Get Started
Clone the Airtable integration as a starting point, then install dependencies:
Or install the SDK directly if starting from scratch:
```shell
npm i @cloudquery/plugin-sdk-javascript
```

How It All Connects
When a user runs cloudquery sync, here’s what happens with your JavaScript integration:
- The CLI starts your integration (or connects to it over gRPC if you're running it locally)
- Your `main.ts` creates the plugin and starts the gRPC server
- The CLI calls your `newClient` function with the user's `spec` configuration
- `newClient` parses the spec, validates it, creates an authenticated API client, and discovers available tables
- The CLI calls `sync`, which runs each table's resolver through the scheduler
- Each resolver is an async function that calls `stream.write()` for each record. The SDK handles writing them to the destination.
Your main implementation work is in three places: the newClient function (parsing configuration and creating the API client), the table definitions (defining columns and how they map to API data), and the resolvers (fetching data from the API).
Project Structure
Here’s the structure based on the Airtable integration:
```
plugins/source/airtable/
├── package.json     # SDK dependency (@cloudquery/plugin-sdk-javascript)
├── tsconfig.json
├── Dockerfile
└── src/
    ├── main.ts      # Entry point
    ├── serve.ts     # Creates serve command
    ├── plugin.ts    # Plugin definition (newPlugin, newClient, sync)
    ├── tables.ts    # Table definitions, column mapping, resolvers
    ├── spec.ts      # Spec parsing + JSON schema validation
    └── airtable.ts  # API types and enums
```

Here's what each component does:

- `main.ts`/`serve.ts`: the entry point. Creates the plugin and starts the gRPC server. This is boilerplate you rarely modify.
- `plugin.ts`: the core wiring. The `newClient` function is called when a sync starts: it parses the spec, creates the API client, discovers tables, and returns an object with `tables()`, `sync()`, and `close()` methods. This is the equivalent of Go's `Configure` function.
- `tables.ts`: defines your tables and resolvers. Each table specifies its name, columns (with Apache Arrow types), and a resolver function that fetches data from the API.
- `spec.ts`: defines and validates the user's configuration using JSON Schema (via Ajv). This catches invalid configuration before your code runs.
- `airtable.ts` (or `<api_name>.ts`): API-specific types, enums, and client code. Keeping this separate from CloudQuery-specific code means you can test it independently.
Entry Point
The main.ts is minimal boilerplate. It creates and runs the serve command:
```ts
// src/main.ts
import { createMyServeCommand } from './serve.js';

createMyServeCommand().parse();
```

```ts
// src/serve.ts
import { createServeCommand } from '@cloudquery/plugin-sdk-javascript/plugin/serve';

import { newMyPlugin } from './plugin.js';

export const createMyServeCommand = () => createServeCommand(newMyPlugin());
```

Plugin Setup
The plugin is created using newPlugin() with a newClient function. Unlike Go (where you define a Configure function) or Python (where you extend a Plugin class), the JavaScript SDK uses a functional approach: newClient is an async function that receives the spec, sets up your API client, and returns an object with tables, sync, and close methods.
The newPlugin() call registers your integration’s name, version, and team, and passes newClient as the initialization callback:
```ts
import { newPlugin, newUnimplementedDestination } from '@cloudquery/plugin-sdk-javascript/plugin/plugin';
import { sync } from '@cloudquery/plugin-sdk-javascript/scheduler';
import { Table, filterTables } from '@cloudquery/plugin-sdk-javascript/schema/table';

export const newMyPlugin = () => {
  const pluginClient = {
    // ...internal state
  };

  const newClient = async (logger, spec, options) => {
    // Parse spec, create API client, discover tables
    return {
      plugin: {
        ...newUnimplementedDestination(),
        tables: () => pluginClient.allTables,
        sync: (options) => sync({ tables, client, stream: options.stream, /* ... */ }),
        close: async () => { /* cleanup */ },
      },
    };
  };

  return newPlugin('my-integration', version, newClient, {
    kind: 'source',
    team: 'my-team',
    jsonSchema: JSON_SCHEMA,
  });
};
```

Define a Table
In JavaScript/TypeScript, tables are created using factory functions rather than classes. Each table needs a name, a list of columns (with their Arrow types and resolvers), and a table resolver function. Unlike Go where the SDK auto-maps struct fields, in JavaScript you explicitly define each column and how it maps to the API response data using pathResolver:
```ts
import { createTable } from '@cloudquery/plugin-sdk-javascript/schema/table';
import { createColumn } from '@cloudquery/plugin-sdk-javascript/schema/column';
import { Utf8, Timestamp, Float64, Bool, Int64 } from 'apache-arrow'; // bundled with SDK
import { JSONType } from '@cloudquery/plugin-sdk-javascript/types/json';
import { pathResolver } from '@cloudquery/plugin-sdk-javascript/schema/resolvers';

const myTable = createTable({
  name: 'my_integration_items',
  description: 'Items from the API',
  columns: [
    createColumn({ name: 'id', type: new Utf8(), primaryKey: true, resolver: pathResolver('id') }),
    createColumn({ name: 'name', type: new Utf8(), resolver: pathResolver('name') }),
    createColumn({ name: 'created_at', type: new Timestamp(), resolver: pathResolver('created_at') }),
    createColumn({ name: 'active', type: new Bool(), resolver: pathResolver('active') }),
    createColumn({ name: 'metadata', type: new JSONType(), resolver: pathResolver('metadata') }),
  ],
  resolver: myTableResolver,
});
```

Key details:

- Column types are Apache Arrow types: `Utf8`, `Timestamp`, `Float64`, `Bool`, `Int64`, `Uint64`
- For JSON columns, use `JSONType()` from `@cloudquery/plugin-sdk-javascript/types/json`
- Use `pathResolver('field_name')` for direct field-to-column mappings
- Custom column resolvers have the signature `(client, resource, column) => Promise<void>`
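To make the path-to-column idea concrete, here is an illustrative sketch of a lodash-style dotted-path lookup — the general idea behind a path resolver like `pathResolver('metadata.owner.name')`. This is not the SDK's implementation, and whether dotted paths are supported is an assumption to verify against the SDK docs:

```typescript
// Illustration only: walk a dotted path into a nested record,
// returning undefined when any segment is missing.
const getPath = (obj: unknown, path: string): unknown =>
  path.split('.').reduce<unknown>(
    (acc, key) =>
      acc && typeof acc === 'object' ? (acc as Record<string, unknown>)[key] : undefined,
    obj,
  );

const record = { id: 'r1', metadata: { owner: { name: 'Ada' } } };

getPath(record, 'metadata.owner.name'); // 'Ada'
getPath(record, 'metadata.missing');    // undefined
```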
Write a Table Resolver
The resolver is an async function that fetches data from the API and writes each record to a stream. The three arguments serve the same purpose as in other SDKs, adapted for JavaScript’s async patterns:
```ts
const myTableResolver = async (clientMeta, parent, stream) => {
  const client = clientMeta; // your API client state from newClient
  const items = await client.listItems();
  for (const item of items) {
    stream.write(item); // write each record as an object
  }
};
```

- `clientMeta`: the client state you set up in `newClient`. This is where you'd access your authenticated API client, configuration values, etc.
- `parent`: `null` for top-level tables. For child tables (e.g. fetching comments for a specific post), this contains the parent row so you can extract the parent ID.
- `stream`: call `stream.write(record)` with plain objects whose keys match column names. Write each item as soon as you get it from the API rather than collecting everything into an array first; this keeps memory usage low and gets data to the destination faster.
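To make the `parent` argument concrete, here is a hedged sketch of a child-table resolver. The `listComments` client method and the record shapes are hypothetical, and the `Resource`/`Stream` types are simplified stand-ins for the SDK's types, not its actual interfaces:

```typescript
// Simplified stand-ins for the SDK's resolver argument types (illustration only).
type Stream = { write: (record: Record<string, unknown>) => void };
type Resource = { getColumnData: (name: string) => unknown };

// Hypothetical API client with a per-post comments endpoint.
const client = {
  listComments: async (postId: string) => [
    { id: 'c1', post_id: postId, body: 'first!' },
    { id: 'c2', post_id: postId, body: 'nice post' },
  ],
};

// Child-table resolver: extract the parent row's ID, then stream its children.
const commentsResolver = async (
  clientMeta: typeof client,
  parent: Resource,
  stream: Stream,
) => {
  const postId = parent.getColumnData('id') as string;
  for (const comment of await clientMeta.listComments(postId)) {
    stream.write(comment);
  }
};
```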
For paginated APIs, loop through pages inside the resolver and call stream.write() for each item on each page. The SDK handles batching and delivery to the destination.
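The page loop above can be sketched as follows. The offset-based `listItems` client is hypothetical (real APIs may use cursors, page tokens, or Link headers), and the `stream` type is a simplified stand-in:

```typescript
// Hypothetical offset-paginated API client (illustration only).
type Item = { id: string };
type Page = { items: Item[]; nextOffset: number | null };

const pagedClient = {
  listItems: async (offset: number): Promise<Page> => {
    const all: Item[] = [{ id: 'a' }, { id: 'b' }, { id: 'c' }];
    const pageSize = 2;
    const items = all.slice(offset, offset + pageSize);
    const nextOffset = offset + pageSize < all.length ? offset + pageSize : null;
    return { items, nextOffset };
  },
};

// Resolver that streams each page as it arrives instead of buffering all pages.
const pagedResolver = async (
  clientMeta: typeof pagedClient,
  _parent: unknown,
  stream: { write: (record: Record<string, unknown>) => void },
) => {
  let offset: number | null = 0;
  while (offset !== null) {
    const page = await clientMeta.listItems(offset);
    for (const item of page.items) {
      stream.write(item); // write immediately; don't accumulate pages
    }
    offset = page.nextOffset;
  }
};
```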
Configuration & Authentication
The SDK passes the user’s spec block as a JSON string to your newClient function. Use Ajv for JSON Schema validation:
```ts
// src/spec.ts
import Ajv from 'ajv';

export const JSON_SCHEMA = {
  type: 'object',
  properties: {
    access_token: { type: 'string' },
    endpoint_url: { type: 'string', default: 'https://api.example.com' },
    concurrency: { type: 'number', default: 10000 },
  },
  required: ['access_token'],
};

export const parseSpec = (spec: string) => {
  const parsed = JSON.parse(spec);
  const ajv = new Ajv({ useDefaults: true });
  const validate = ajv.compile(JSON_SCHEMA);
  if (!validate(parsed)) {
    throw new Error(`Invalid spec: ${JSON.stringify(validate.errors)}`);
  }
  return parsed;
};
```

Then in your `newClient` function, parse the spec and create your API client:
```ts
const newClient = async (logger, spec, options) => {
  const parsedSpec = parseSpec(spec);
  const apiClient = new MyApiClient(parsedSpec.access_token, parsedSpec.endpoint_url);
  // ...
};
```

Users configure authentication in their YAML file. The CLI automatically resolves environment variable references:

```yaml
spec:
  access_token: "${MY_API_TOKEN}"
  endpoint_url: "https://api.example.com"
```

The `jsonSchema` option passed to `newPlugin()` enables the CLI to validate user configuration before your code runs.
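The `${MY_API_TOKEN}` reference is resolved by the CLI before your integration ever sees the spec string. Conceptually it behaves like the following sketch (the CLI's actual implementation differs; this is only to show what your `newClient` receives):

```typescript
// Conceptual sketch of environment-variable expansion in a YAML spec.
// References with no matching variable are left intact here; check the
// CLI docs for its actual behavior on unset variables.
const expandEnv = (yaml: string, env: Record<string, string | undefined>): string =>
  yaml.replace(/\$\{(\w+)\}/g, (match, name: string) => env[name] ?? match);

expandEnv('access_token: "${MY_API_TOKEN}"', { MY_API_TOKEN: 'secret123' });
// → 'access_token: "secret123"'
```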
Common Pitfalls
Avoid these common mistakes when building JavaScript integrations:
- Don't install Apache Arrow separately. Use the version bundled with `@cloudquery/plugin-sdk-javascript`; a separate install will cause version conflicts.
- Stream results immediately. Call `stream.write()` for each item or page as you get it; don't accumulate everything in an array first.
- Use `pathResolver` for direct mappings. Don't write custom column resolvers when a field path will do.
- Handle async errors properly. Make sure your resolver's `async` function either `await`s or catches all promises; unhandled rejections will crash the integration.
- Pass `jsonSchema` to `newPlugin()`. This lets the CLI validate the user's spec before your code runs, giving better error messages.
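The unhandled-rejection pitfall usually comes from firing off promises without awaiting them. Here is a hedged sketch of the failure mode and one way to fix it; the `fetchPage` helper is hypothetical:

```typescript
// Hypothetical flaky fetch used to illustrate error handling in a resolver.
const fetchPage = async (n: number): Promise<string[]> => {
  if (n === 2) throw new Error(`page ${n} failed`);
  return [`item-${n}`];
};

// BAD: promises are created but never awaited; a rejection from fetchPage
// becomes an unhandled rejection and can crash the process:
//   [1, 2, 3].map((n) => fetchPage(n));

// GOOD: await every promise so errors surface inside the resolver,
// where you can log them or let a failed sync be reported cleanly.
const safeResolver = async (stream: { write: (r: unknown) => void }) => {
  const results = await Promise.allSettled([1, 2, 3].map((n) => fetchPage(n)));
  for (const result of results) {
    if (result.status === 'fulfilled') {
      for (const item of result.value) stream.write({ item });
    } else {
      console.error('page failed:', result.reason); // surfaced, not unhandled
    }
  }
};
```

Whether to skip a failed page (as here) or abort the whole sync is a design choice; `Promise.all` with a surrounding `try`/`catch` aborts on the first failure instead.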
Test Locally
Start the integration as a gRPC server:
```shell
npm run dev
# Or after building:
node dist/main.js serve
```

You can also build and run as a Docker container. See the Airtable Dockerfile. The Dockerfile exposes port 7777 and uses the entry command `node dist/main.js serve --address [::]:7777`.
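To point the CLI at the locally running server, a sync config can reference it through the `grpc` registry. A sketch, assuming the server listens on port 7777 as above; the integration name, table list, and `sqlite` destination are placeholders, and a matching destination config is also required:

```yaml
# sync.yaml (example values)
kind: source
spec:
  name: my-integration
  registry: grpc
  path: "localhost:7777"
  tables: ["*"]
  destinations: ["sqlite"]
```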
See Testing Locally for configuration examples and Running Locally for full details.
Publishing
Visit Publishing an Integration to the Hub for release instructions. JavaScript integrations can also be distributed as Docker images.
Real-World Examples
- Airtable: dynamic table generation from API schema with Arrow type mapping
Next Steps
Once your integration is working locally:
- Publish to the Hub: make your integration available to others
- Add tests: see `serve.test.ts` in the Airtable integration for a testing pattern
- Add JSON Schema validation: pass `jsonSchema` to `newPlugin()` so the CLI validates user configuration before your code runs
- Build a Docker image: see the Airtable Dockerfile for a production-ready example