CloudQuery

Creating a New Source Integration in Java

This guide walks through building a source integration in Java using the CloudQuery SDK. We reference the Bitbucket integration as an example throughout.

Prerequisites:

The Java SDK is distributed via GitHub Packages (not Maven Central), so you need a GitHub personal access token (PAT) to download the dependency.

Get Started

Clone the Bitbucket integration as a starting point.

Set up GitHub Packages authentication:

export GITHUB_ACTOR=<your-github-username>
export GITHUB_TOKEN=<personal-access-token>

Build the project:
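The build uses the Gradle wrapper checked into the repository; a typical sequence looks like the following (assuming the credentials from the previous step are exported in the same shell):

```shell
# Resolve the SDK from GitHub Packages and compile; requires
# GITHUB_ACTOR and GITHUB_TOKEN to be exported in this shell.
./gradlew build

# Run the test suite, if the integration ships tests
./gradlew test
```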

How It All Connects

When a user runs cloudquery sync, here’s what happens with your Java integration:

  1. The CLI starts your integration (or connects to it over gRPC if you’re running it locally)
  2. Your MainClass.java creates the plugin and starts the gRPC server via PluginServe
  3. The CLI calls your plugin’s newClient method with the user’s spec configuration
  4. newClient parses the spec (using Jackson), validates it, creates an authenticated API client, and transforms tables
  5. The CLI calls sync, which runs each table’s resolver through the scheduler
  6. Each resolver is a lambda that calls stream.write() for each record. The SDK handles writing them to the destination.

Your main implementation work is in three places: the newClient method (parsing configuration and creating the API client), the table definitions (defining columns and relations), and the resolvers (lambda functions that fetch data from the API).

Project Structure

Here’s the structure based on the Bitbucket integration:

plugins/source/bitbucket/
├── settings.gradle
├── gradlew / gradlew.bat
├── Dockerfile
├── gradle/
└── app/
    ├── build.gradle             # SDK dependency (io.cloudquery:plugin-sdk-java)
    └── src/main/java/bitbucket/
        ├── MainClass.java       # Entry point
        ├── BitbucketPlugin.java # Plugin definition
        ├── client/
        │   ├── BitbucketClient.java        # API client (implements ClientMeta)
        │   └── configuration/
        │       └── Spec.java               # Configuration spec (@Data + Jackson)
        └── resources/
            ├── Workspaces.java             # Table definition
            └── Repositories.java           # Child table definition

Here’s what each component does:

  • MainClass.java: the entry point. Creates the plugin and starts the gRPC server via PluginServe.builder(). This is boilerplate you rarely modify.
  • BitbucketPlugin.java: the Plugin class extends io.cloudquery.plugin.Plugin and implements three key methods: newClient (parses the spec, creates the API client), tables (returns the list of tables), and sync (runs resolvers through the scheduler).
  • client/BitbucketClient.java: the Client implements ClientMeta and wraps your authenticated API client. Every resolver receives this client so it can make API calls. It also provides an id() method used in logs and for multiplexing.
  • client/configuration/Spec.java: the Spec is a Lombok @Data class deserialized from the user’s YAML configuration via Jackson. It defines what settings your integration accepts (credentials, endpoints, concurrency).
  • resources/: one file per table. Each file defines a table using the builder pattern (name, columns, transform, relations) and a resolver lambda that fetches data from the API. The Bitbucket example shows parent-child tables: Workspaces.java defines the parent, Repositories.java defines the child.

Entry Point

The MainClass.java creates and serves the plugin. This is boilerplate that you rarely need to change:

import io.cloudquery.plugin.PluginServe;
 
public class MainClass {
    public static void main(String[] args) {
        BitbucketPlugin plugin = new BitbucketPlugin();
        PluginServe pluginServe = PluginServe.builder()
            .args(args)
            .plugin(plugin)
            .build();
        int exitCode = pluginServe.Serve();
        System.exit(exitCode);
    }
}

Plugin Class

The Plugin class is the central piece of your integration. It extends io.cloudquery.plugin.Plugin and implements three key methods: newClient (called with the user’s spec to set up the API client), tables (returns available tables), and sync (runs the resolvers through a scheduler):

import com.fasterxml.jackson.databind.ObjectMapper;
import io.cloudquery.plugin.Plugin;
import io.cloudquery.plugin.PluginKind;
 
public class BitbucketPlugin extends Plugin {
    private static final String PLUGIN_VERSION = "v1.0.0"; // your release version
 
    private final ObjectMapper objectMapper = new ObjectMapper();
    private Spec spec;
    private BitbucketClient client;
 
    public BitbucketPlugin() {
        super("bitbucket", PLUGIN_VERSION);
        setTeam("my-team");
        setKind(PluginKind.Source);
    }
 
    @Override
    public ClientMeta newClient(String specJson, boolean noConnection) throws Exception {
        // Parse spec, create API client, transform tables
        spec = objectMapper.readValue(specJson, Spec.class);
        client = new BitbucketClient(spec);
        return client;
    }
 
    @Override
    public List<Table> tables() {
        return List.of(Workspaces.getTable());
    }
 
    @Override
    public void sync(List<Table> tables, SyncStream syncStream) {
        Scheduler.builder()
            .client(client)
            .tables(tables)
            .syncStream(syncStream)
            .concurrency(spec.getConcurrency())
            .build()
            .sync();
    }
}

Define a Table

In Java, tables use the builder pattern. Similar to Go’s TransformWithStruct, the Java SDK provides TransformWithClass which auto-maps fields from a Java model class (like a POJO or Lombok @Data class) to table columns. This means you don’t need to define columns manually. The SDK inspects your model class and creates appropriate columns for each field.

import io.cloudquery.schema.Table;
import io.cloudquery.transformers.TransformWithClass;
 
public class Workspaces {
    public static Table getTable() {
        return Table.builder()
            .name("bitbucket_workspaces")
            .description("Bitbucket workspaces")
            .transform(TransformWithClass.builder(Workspace.class)
                .pkField("uuid")
                .build())
            .relations(List.of(Repositories.getTable()))
            .resolver(resolveWorkspaces())
            .build();
    }
}

Key details:

  • Use Table.builder() to construct tables
  • TransformWithClass.builder(ModelClass.class) auto-maps fields to columns
  • Set primary keys via .pkField("fieldName")
  • Define parent-child relations via .relations(List.of(ChildTable.getTable()))

Write a Table Resolver

Resolvers are where the actual API fetching happens. In Java, TableResolver is a functional interface, so you can implement it as a lambda. The lambda receives three arguments:

  • clientMeta: your Client class (cast it to access API methods). This is the same object returned by newClient in your plugin.
  • parent: for top-level tables, this is null. For child tables, it contains the parent row so you can extract the parent’s data (e.g. a workspace name to list its repositories).
  • stream: call stream.write(item) to send each record to the destination. You can also use stream::write as a method reference with forEach.

Here’s a top-level resolver that fetches workspaces, and a child resolver that fetches repositories for each workspace:

// Top-level resolver: called once, fetches all workspaces
private static TableResolver resolveWorkspaces() {
    return (clientMeta, parent, stream) -> {
        BitbucketClient client = (BitbucketClient) clientMeta;
        List<Workspace> workspaces = client.listWorkspaces();
        workspaces.forEach(stream::write);
    };
}
 
// Child resolver: called once per parent workspace row
private static TableResolver resolveRepositories() {
    return (clientMeta, parent, stream) -> {
        BitbucketClient client = (BitbucketClient) clientMeta;
        // Extract the workspace name from the parent row
        String workspaceName = ((Workspace) parent.getItem()).getName();
        client.listRepositoriesForWorkspace(workspaceName)
            .forEach(stream::write);
    };
}

The child resolver demonstrates a key pattern: parent.getItem() returns the model object from the parent row. You cast it to the parent’s type and extract whatever you need to make the child API call. The SDK automatically calls the child resolver once for each row in the parent table, so you don’t need to loop over parents yourself.

Client

The client implements io.cloudquery.schema.ClientMeta:

import java.util.List;
 
import io.cloudquery.schema.ClientMeta;
 
public class BitbucketClient implements ClientMeta {
    private final Spec spec;
 
    public BitbucketClient(Spec spec) {
        this.spec = spec;
    }
 
    @Override
    public String id() {
        return "bitbucket";
    }
 
    public List<Workspace> listWorkspaces() {
        // HTTP calls using Unirest or your preferred HTTP client
    }
}
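If you later multiplex over workspaces or accounts, each client's id() must be unique so log lines are attributable. A minimal, self-contained sketch of the pattern (the ClientMeta interface below is a local stand-in for io.cloudquery.schema.ClientMeta, and MultiplexedClient is an illustrative name, not part of the SDK):

```java
// Local stand-in for io.cloudquery.schema.ClientMeta, so this
// sketch compiles without the SDK on the classpath.
interface ClientMeta {
    String id();
}

public class MultiplexedClient implements ClientMeta {
    private final String workspace;

    public MultiplexedClient(String workspace) {
        this.workspace = workspace;
    }

    @Override
    public String id() {
        // Include the entity name so each multiplexed client is distinguishable
        return "bitbucket:" + workspace;
    }

    public static void main(String[] args) {
        System.out.println(new MultiplexedClient("acme").id()); // prints "bitbucket:acme"
    }
}
```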

Configuration & Authentication

The SDK passes the user’s spec block as a JSON string to your plugin’s newClient method. Use Lombok @Data with Jackson for deserialization:

import lombok.Data;
import com.fasterxml.jackson.annotation.JsonProperty;
 
@Data
public class Spec {
    @JsonProperty("username")
    private String username;
 
    @JsonProperty("password")
    private String password;
 
    @JsonProperty("concurrency")
    private int concurrency = 1000;
}

Then in newClient, parse the spec and create your authenticated API client:

@Override
public ClientMeta newClient(String spec, boolean noConnection) throws Exception {
    ObjectMapper mapper = new ObjectMapper();
    Spec parsedSpec = mapper.readValue(spec, Spec.class);
 
    // Validate required fields
    if (parsedSpec.getUsername() == null || parsedSpec.getUsername().isEmpty()) {
        throw new IllegalArgumentException("username is required");
    }
 
    // Create authenticated API client
    return new BitbucketClient(parsedSpec);
}
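As the spec grows, the per-field null/empty checks can be factored into a small helper so each required field is a single line. A stdlib-only sketch (the class and method names are illustrative, not part of the SDK):

```java
public class SpecValidation {
    // Throws IllegalArgumentException if a required spec field is missing or blank
    static String required(String value, String name) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException(name + " is required");
        }
        return value;
    }

    public static void main(String[] args) {
        // A valid field passes through unchanged
        System.out.println(required("alice", "username"));
        // A blank field fails with a descriptive message
        try {
            required("", "password");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```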

Users configure authentication in their YAML file. The CLI automatically resolves environment variable references:

spec:
  username: "${BITBUCKET_USERNAME}"
  password: "${BITBUCKET_APP_PASSWORD}"
  concurrency: 500

Common Pitfalls

Avoid these common mistakes when building Java integrations:

  • Set up GitHub Packages auth before building. The SDK is distributed via GitHub Packages, not Maven Central. Without GITHUB_ACTOR and GITHUB_TOKEN set, gradle build will fail to resolve the dependency.
  • Stream results immediately. Call stream.write() or stream::write for each item as you get it. Don’t accumulate all results in a list first.
  • Handle pagination in resolvers. Most APIs return paginated results. Loop until all pages are fetched, writing each page’s items to the stream immediately.
  • Make id() unique per multiplexed client. If you’re multiplexing over accounts or workspaces, include the entity name in the id() return value.
  • Pass Docker build args for CI. When building Docker images, remember to pass --build-arg GITHUB_ACTOR and --build-arg GITHUB_TOKEN or the build will fail.
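The pagination and streaming pitfalls above can be sketched together with a stubbed paged API. Page and fetchPage below are stand-ins for whatever your HTTP client returns; the point is that each page's items are written to the stream before the next page is fetched, so nothing accumulates in memory:

```java
import java.util.List;
import java.util.function.Consumer;

public class PaginationSketch {
    // Stand-in for one page of API results plus the cursor for the next page
    record Page(List<String> items, String nextCursor) {}

    // Stand-in for an API client: returns two pages, then signals the end
    // with a null cursor
    static Page fetchPage(String cursor) {
        if (cursor == null) {
            return new Page(List.of("repo-1", "repo-2"), "page-2");
        }
        return new Page(List.of("repo-3"), null);
    }

    // Loop until the API reports no further pages, streaming each page's
    // items immediately instead of collecting everything into one list
    static void syncAll(Consumer<String> stream) {
        String cursor = null;
        do {
            Page page = fetchPage(cursor);
            page.items().forEach(stream);
            cursor = page.nextCursor();
        } while (cursor != null);
    }

    public static void main(String[] args) {
        syncAll(System.out::println);
    }
}
```

In a real resolver, the Consumer would be stream::write and fetchPage would be an authenticated HTTP call.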

Test Locally

Start the integration as a gRPC server:
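One way to do this is to run the built jar with a serve command; the jar path, subcommand, and flag below are assumptions based on the Gradle layout shown earlier, so check your build output and the SDK documentation for the exact invocation:

```shell
# Path, subcommand, and address flag are illustrative; adjust to your build.
java -jar app/build/libs/app.jar serve --address localhost:7777
```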

You can also build and run as a Docker container. See the Bitbucket Dockerfile. Note that Docker builds require GITHUB_ACTOR and GITHUB_TOKEN as build args:

docker build -t my-integration:latest \
  --build-arg GITHUB_ACTOR=<username> \
  --build-arg GITHUB_TOKEN=<token> .

See Testing Locally for configuration examples and Running Locally for full details.

Publishing

Visit Publishing an Integration to the Hub for release instructions. Java integrations can also be distributed as Docker images.

Real-World Examples

  • Bitbucket: parent-child tables, REST API with pagination, Arrow type mapping

Next Steps

Once your integration is working locally:

  1. Publish to the Hub: make your integration available to others
  2. Add parent-child tables: use .relations() on the parent table builder to link hierarchical resources (see the Workspaces → Repositories pattern in the Bitbucket integration)
  3. Build a Docker image: see the Bitbucket Dockerfile for a production-ready example
  4. Add pagination: most APIs require it; loop until all pages are fetched, streaming each page’s items immediately

Resources