Read from Apache Iceberg to Dataflow

To read data from Apache Iceberg into a Dataflow pipeline, use the managed I/O connector.

Managed I/O supports the following capabilities for Apache Iceberg:

Catalogs	Hadoop, Hive, REST-based catalogs, BigQuery metastore (requires Apache Beam SDK 2.62.0 or later if not using Runner v2)
Read capabilities	Batch read
Write capabilities	Batch write, Streaming write, Dynamic destinations, Dynamic table creation

For BigQuery tables for Apache Iceberg, use the BigQueryIO connector with the BigQuery Storage API. The table must already exist; dynamic table creation is not supported.

Dependencies

Add the following dependencies to your project:

Java

<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-managed</artifactId>
  <version>${beam.version}</version>
</dependency>

<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-io-iceberg</artifactId>
  <version>${beam.version}</version>
</dependency>

Example

The following example reads from an Apache Iceberg table and writes the data to text files.
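A minimal sketch of such a pipeline is shown here; treat it as an illustration under stated assumptions rather than the official sample. It assumes a Hadoop catalog and a table with an `id` (64-bit integer) column and a `name` (string) column. The class name `IcebergReadSketch`, the helper `buildReadConfig`, and the pipeline option names are hypothetical. Besides the two dependencies listed above, running it likely also requires a runner and the Hadoop client libraries on the classpath.

```java
import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.managed.Managed;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.Row;
import org.apache.beam.sdk.values.TypeDescriptors;

public class IcebergReadSketch {

  // Hypothetical pipeline options; the option names are illustrative only.
  public interface Options extends PipelineOptions {
    @Description("URI of the Apache Iceberg warehouse location")
    String getWarehouseLocation();
    void setWarehouseLocation(String value);

    @Description("Name of the Apache Iceberg catalog")
    String getCatalogName();
    void setCatalogName(String value);

    @Description("Name of the table to read, for example db.users")
    String getTableName();
    void setTableName(String value);

    @Description("Path prefix for the output text files")
    String getOutputPath();
    void setOutputPath(String value);
  }

  // Builds the Managed I/O configuration for an Iceberg read.
  // Assumption: the table lives in a Hadoop catalog.
  static Map<String, Object> buildReadConfig(
      String table, String catalogName, String warehouse) {
    Map<String, Object> catalogProperties =
        Map.of("type", "hadoop", "warehouse", warehouse);
    return Map.of(
        "table", table,
        "catalog_name", catalogName,
        "catalog_properties", catalogProperties);
  }

  public static void main(String[] args) {
    Options options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
    Pipeline pipeline = Pipeline.create(options);

    Map<String, Object> config =
        buildReadConfig(
            options.getTableName(),
            options.getCatalogName(),
            options.getWarehouseLocation());

    pipeline
        // Read the table as a collection of Beam rows.
        .apply(Managed.read(Managed.ICEBERG).withConfig(config))
        .getSinglePCollection()
        // Format each record as 'id:name' (assumed column names and types).
        .apply(
            MapElements.into(TypeDescriptors.strings())
                .via((Row row) ->
                    String.format("%d:%s", row.getInt64("id"), row.getString("name"))))
        // Write the formatted records to a single text file shard.
        .apply(
            TextIO.write()
                .to(options.getOutputPath())
                .withNumShards(1)
                .withSuffix(".txt"));

    pipeline.run().waitUntilFinish();
  }
}
```

To try the sketch, pass the options on the command line, for example `--warehouseLocation`, `--catalogName`, `--tableName`, and `--outputPath`, together with a `--runner` setting. To use a different catalog type, change the entries in `catalog_properties`.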

What's next

Write to Apache Iceberg.
Streaming Write to Apache Iceberg with BigLake REST Catalog.
Learn more about Managed I/O.