CloudQuery
Creating a New Source Integration in Java
This guide walks through building a source integration in Java using the CloudQuery SDK. We reference the Bitbucket integration as an example throughout.
Prerequisites:
- Java 20+ and Gradle 8.4+ installed (Java tutorials)
- CloudQuery CLI installed
- A GitHub personal access token with the `read:packages` scope for SDK authentication (see GitHub Packages docs)
The Java SDK is distributed via GitHub Packages (not Maven Central), so you need a GitHub PAT to download the dependency.
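In practice this means adding the GitHub Packages repository, with credentials read from the environment, to your Gradle build. A minimal sketch (the repository URL is an assumption; copy the exact coordinates from the Bitbucket integration's build.gradle):

```groovy
repositories {
    maven {
        // GitHub Packages repository hosting the CloudQuery Java SDK
        // (URL is illustrative; use the one from the Bitbucket example)
        url = uri("https://maven.pkg.github.com/cloudquery/plugin-sdk-java")
        credentials {
            username = System.getenv("GITHUB_ACTOR")
            password = System.getenv("GITHUB_TOKEN")
        }
    }
}
```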
Get Started
Clone the Bitbucket integration as a starting point.
Set up GitHub Packages authentication:
export GITHUB_ACTOR=<your-github-username>
export GITHUB_TOKEN=<personal-access-token>

Build the project:
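Putting the setup steps together, and assuming the standard Gradle wrapper shipped with the cloned project, the build looks like this:

```shell
# Authenticate to GitHub Packages so Gradle can resolve the SDK dependency
export GITHUB_ACTOR=<your-github-username>
export GITHUB_TOKEN=<personal-access-token>

# Build with the Gradle wrapper from the integration's root directory
./gradlew build
```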
How It All Connects
When a user runs cloudquery sync, here’s what happens with your Java integration:
- The CLI starts your integration (or connects to it over gRPC if you're running it locally)
- Your `MainClass.java` creates the plugin and starts the gRPC server via `PluginServe`
- The CLI calls your plugin's `newClient` method with the user's `spec` configuration
- `newClient` parses the spec (using Jackson), validates it, creates an authenticated API client, and transforms tables
- The CLI calls `sync`, which runs each table's resolver through the scheduler
- Each resolver is a lambda that calls `stream.write()` for each record; the SDK handles writing the records to the destination
Your main implementation work is in three places: the newClient method (parsing configuration and creating the API client), the table definitions (defining columns and relations), and the resolvers (lambda functions that fetch data from the API).
Project Structure
Here’s the structure based on the Bitbucket integration:
plugins/source/bitbucket/
├── settings.gradle
├── gradlew / gradlew.bat
├── Dockerfile
├── gradle/
└── app/
├── build.gradle # SDK dependency (io.cloudquery:plugin-sdk-java)
└── src/main/java/bitbucket/
├── MainClass.java # Entry point
├── BitbucketPlugin.java # Plugin definition
├── client/
│ ├── BitbucketClient.java # API client (implements ClientMeta)
│ └── configuration/
│ └── Spec.java # Configuration spec (@Data + Jackson)
└── resources/
├── Workspaces.java # Table definition
            └── Repositories.java # Child table definition

Here's what each component does:
- `MainClass.java`: the entry point. It creates the plugin and starts the gRPC server via `PluginServe.builder()`. This is boilerplate you rarely modify.
- `BitbucketPlugin.java`: the Plugin class. It extends `io.cloudquery.plugin.Plugin` and implements three key methods: `newClient` (parses the spec, creates the API client), `tables` (returns the list of tables), and `sync` (runs resolvers through the scheduler).
- `client/BitbucketClient.java`: the Client. It implements `ClientMeta` and wraps your authenticated API client. Every resolver receives this client so it can make API calls. It also provides an `id()` method used in logs and for multiplexing.
- `client/configuration/Spec.java`: the Spec, a Lombok `@Data` class deserialized from the user's YAML configuration via Jackson. It defines what settings your integration accepts (credentials, endpoints, concurrency).
- `resources/`: one file per table. Each file defines a table using the builder pattern (name, columns, transform, relations) and a resolver lambda that fetches data from the API. The Bitbucket example shows parent-child tables: `Workspaces.java` defines the parent, `Repositories.java` the child.
Entry Point
The MainClass.java creates and serves the plugin. This is boilerplate that you rarely need to change:
import io.cloudquery.plugin.PluginServe;
public class MainClass {
public static void main(String[] args) {
BitbucketPlugin plugin = new BitbucketPlugin();
PluginServe pluginServe = PluginServe.builder()
.args(args)
.plugin(plugin)
.build();
int exitCode = pluginServe.Serve();
System.exit(exitCode);
}
}

Plugin Class
The Plugin class is the central piece of your integration. It extends io.cloudquery.plugin.Plugin and implements three key methods: newClient (called with the user’s spec to set up the API client), tables (returns available tables), and sync (runs the resolvers through a scheduler):
import io.cloudquery.plugin.Plugin;
import io.cloudquery.plugin.PluginKind;
public class BitbucketPlugin extends Plugin {
public BitbucketPlugin() {
super("bitbucket", PLUGIN_VERSION);
setTeam("my-team");
setKind(PluginKind.Source);
}
@Override
public ClientMeta newClient(String spec, boolean noConnection) throws Exception {
// Parse spec, create API client, transform tables
Spec parsedSpec = objectMapper.readValue(spec, Spec.class);
return new BitbucketClient(parsedSpec);
}
@Override
public List<Table> tables() {
return List.of(Workspaces.getTable());
}
@Override
public void sync(List<Table> tables, SyncStream syncStream) {
Scheduler.builder()
.client(client)
.tables(tables)
.syncStream(syncStream)
.concurrency(spec.getConcurrency())
.build()
.sync();
}
}

Define a Table
In Java, tables use the builder pattern. Similar to Go’s TransformWithStruct, the Java SDK provides TransformWithClass which auto-maps fields from a Java model class (like a POJO or Lombok @Data class) to table columns. This means you don’t need to define columns manually. The SDK inspects your model class and creates appropriate columns for each field.
import io.cloudquery.schema.Table;
import io.cloudquery.transformers.TransformWithClass;
public class Workspaces {
public static Table getTable() {
return Table.builder()
.name("bitbucket_workspaces")
.description("Bitbucket workspaces")
.transform(TransformWithClass.builder(Workspace.class)
.pkField("uuid")
.build())
.relations(List.of(Repositories.getTable()))
.resolver(resolveWorkspaces())
.build();
}
}

Key details:
- Use `Table.builder()` to construct tables
- `TransformWithClass.builder(ModelClass.class)` auto-maps fields to columns
- Set primary keys via `.pkField("fieldName")`
- Define parent-child relations via `.relations(List.of(ChildTable.getTable()))`
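For reference, here is a sketch of the model class that `TransformWithClass` would inspect. The field names are assumptions based on the Bitbucket example; the real project would use Lombok `@Data` to generate the accessors, which are spelled out here as plain Java to keep the sketch self-contained:

```java
// Hypothetical model class backing the bitbucket_workspaces table.
// The SDK inspects these fields and creates one column per field.
public class Workspace {
    private String uuid; // primary key, selected via .pkField("uuid")
    private String name;
    private String slug;

    // With Lombok @Data these getters/setters would be generated.
    public String getUuid() { return uuid; }
    public void setUuid(String uuid) { this.uuid = uuid; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getSlug() { return slug; }
    public void setSlug(String slug) { this.slug = slug; }
}
```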
Write a Table Resolver
Resolvers are where the actual API fetching happens. In Java, TableResolver is a functional interface, so you can implement it as a lambda. The lambda receives three arguments:
- `clientMeta`: your Client class (cast it to access API methods). This is the same object returned by `newClient` in your plugin.
- `parent`: for top-level tables, this is `null`. For child tables, it contains the parent row so you can extract the parent's data (e.g. a workspace name to list its repositories).
- `stream`: call `stream.write(item)` to send each record to the destination. You can also use `stream::write` as a method reference with `forEach`.
Here’s a top-level resolver that fetches workspaces, and a child resolver that fetches repositories for each workspace:
// Top-level resolver: called once, fetches all workspaces
private static TableResolver resolveWorkspaces() {
return (clientMeta, parent, stream) -> {
BitbucketClient client = (BitbucketClient) clientMeta;
List<Workspace> workspaces = client.listWorkspaces();
workspaces.forEach(stream::write);
};
}
// Child resolver: called once per parent workspace row
private static TableResolver resolveRepositories() {
return (clientMeta, parent, stream) -> {
BitbucketClient client = (BitbucketClient) clientMeta;
// Extract the workspace name from the parent row
String workspaceName = ((Workspace) parent.getItem()).getName();
client.listRepositoriesForWorkspace(workspaceName)
.forEach(stream::write);
};
}

The child resolver demonstrates a key pattern: `parent.getItem()` returns the model object from the parent row. You cast it to the parent's type and extract whatever you need to make the child API call. The SDK automatically calls the child resolver once for each row in the parent table, so you don't need to loop over parents yourself.
Client
The client implements io.cloudquery.schema.ClientMeta:
import io.cloudquery.schema.ClientMeta;
public class BitbucketClient implements ClientMeta {
private final Spec spec;
@Override
public String id() {
return "bitbucket";
}
public List<Workspace> listWorkspaces() {
// HTTP calls using Unirest or your preferred HTTP client
}
}

Configuration & Authentication
The SDK passes the user’s spec block as a JSON string to your plugin’s newClient method. Use Lombok @Data with Jackson for deserialization:
import lombok.Data;
import com.fasterxml.jackson.annotation.JsonProperty;
@Data
public class Spec {
@JsonProperty("username")
private String username;
@JsonProperty("password")
private String password;
@JsonProperty("concurrency")
private int concurrency = 1000;
}

Then in `newClient`, parse the spec and create your authenticated API client:
@Override
public ClientMeta newClient(String spec, boolean noConnection) throws Exception {
ObjectMapper mapper = new ObjectMapper();
Spec parsedSpec = mapper.readValue(spec, Spec.class);
// Validate required fields
if (parsedSpec.getUsername() == null || parsedSpec.getUsername().isEmpty()) {
throw new IllegalArgumentException("username is required");
}
// Create authenticated API client
return new BitbucketClient(parsedSpec);
}

Users configure authentication in their YAML file. The CLI automatically resolves environment variable references:
spec:
username: "${BITBUCKET_USERNAME}"
password: "${BITBUCKET_APP_PASSWORD}"
  concurrency: 500

Common Pitfalls
Avoid these common mistakes when building Java integrations:
- Set up GitHub Packages auth before building. The SDK is distributed via GitHub Packages, not Maven Central. Without `GITHUB_ACTOR` and `GITHUB_TOKEN` set, `gradle build` will fail to resolve the dependency.
- Stream results immediately. Call `stream.write()` for each item as you get it; don't accumulate all results in a list first.
- Handle pagination in resolvers. Most APIs return paginated results. Loop until all pages are fetched, writing each page's items to the stream immediately.
- Make `id()` unique per multiplexed client. If you're multiplexing over accounts or workspaces, include the entity name in the `id()` return value.
- Pass Docker build args for CI. When building Docker images, remember to pass `--build-arg GITHUB_ACTOR` and `--build-arg GITHUB_TOKEN` or the build will fail.
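The pagination pitfall above can be sketched in plain Java. This is a minimal, hypothetical sketch: `Page` and `fetchPage` stand in for your API client's real paging types and calls, and the `Consumer` stands in for the SDK's sync stream. It shows the loop-and-stream-immediately pattern:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class PaginationSketch {
    // Hypothetical page of API results: items plus a cursor for the next page.
    record Page(List<String> items, String nextCursor) {}

    // Stand-in for a real API call; returns two pages, then signals the end
    // of results with a null cursor.
    static Page fetchPage(String cursor) {
        if (cursor == null) return new Page(List.of("repo-a", "repo-b"), "page2");
        return new Page(List.of("repo-c"), null);
    }

    // Loop until the API reports no more pages, writing each item to the
    // stream as soon as it arrives instead of buffering all pages in memory.
    static void syncAll(Consumer<String> stream) {
        String cursor = null;
        do {
            Page page = fetchPage(cursor);
            page.items().forEach(stream); // stream::write in a real resolver
            cursor = page.nextCursor();
        } while (cursor != null);
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        syncAll(out::add);
        System.out.println(out); // prints [repo-a, repo-b, repo-c]
    }
}
```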
Test Locally
Start the integration as a gRPC server:
You can also build and run as a Docker container. See the Bitbucket Dockerfile. Note that Docker builds require GITHUB_ACTOR and GITHUB_TOKEN as build args:
docker build -t my-integration:latest \
--build-arg GITHUB_ACTOR=<username> \
  --build-arg GITHUB_TOKEN=<token> .

See Testing Locally for configuration examples and Running Locally for full details.
Publishing
Visit Publishing an Integration to the Hub for release instructions. Java integrations can also be distributed as Docker images.
Real-World Examples
- Bitbucket: parent-child tables, REST API with pagination, Arrow type mapping
Next Steps
Once your integration is working locally:
- Publish to the Hub: make your integration available to others
- Add parent-child tables: use `.relations()` on the parent table builder to link hierarchical resources (see the Workspaces → Repositories pattern in the Bitbucket integration)
- Build a Docker image: see the Bitbucket Dockerfile for a production-ready example
- Add pagination: most APIs require it; loop until all pages are fetched, streaming each page’s items immediately