bigframes.pandas

The primary entry point for the BigQuery DataFrames (BigFrames) pandas-compatible API.

BigQuery DataFrames provides a Pythonic DataFrame and machine learning (ML) API powered by the BigQuery engine. The bigframes.pandas module implements a large subset of the pandas API, allowing you to perform large-scale data analysis using familiar pandas syntax while the computations are executed in the cloud.

Key Features:

Petabyte-Scale Scalability: Handle datasets that exceed local memory by offloading computation to the BigQuery distributed engine.
Pandas Compatibility: Use common pandas methods like groupby(), merge(), pivot_table(), and more on BigQuery-backed DataFrame objects.
Direct BigQuery Integration: Read from and write to BigQuery tables and queries with bigframes.pandas.read_gbq() and bigframes.pandas.DataFrame.to_gbq().
User-defined Functions (UDFs): Deploy Python functions using the bigframes.pandas.remote_function() and bigframes.pandas.udf() decorators.
Data Ingestion: Load data in various formats, including CSV, Parquet, JSON, and Arrow, via bigframes.pandas.read_csv(), bigframes.pandas.read_parquet(), etc.; the data is automatically uploaded to BigQuery for processing. Convert any pandas DataFrame into a BigQuery DataFrame using bigframes.pandas.read_pandas().

Example usage:

>>> import bigframes.pandas as bpd

Initialize session and set options.

>>> bpd.options.bigquery.project = "your-project-id"

Load data from a BigQuery public dataset.

>>> df = bpd.read_gbq("bigquery-public-data.usa_names.usa_1910_2013")

Perform familiar pandas operations that execute in the cloud.

>>> top_names = (
...     df.groupby("name")
...     .agg({"number": "sum"})
...     .sort_values("number", ascending=False)
...     .head(10)
... )

Bring the final, aggregated results back to local memory if needed.

>>> local_df = top_names.to_pandas()
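
The pipeline above can be wrapped in a small helper. The following is a minimal sketch, not part of the bigframes API: `top_totals` and `top_names_from_bigquery` are hypothetical names. The helper assumes only that its argument exposes the pandas-style groupby(), agg(), sort_values(), and head() methods used above, so the same function should accept either a local pandas DataFrame or a BigQuery DataFrame.

```python
def top_totals(df, key, value, n=10):
    """Return the n groups of `key` with the largest summed `value`.

    Hypothetical helper (not part of bigframes). `df` may be any object
    with pandas-style groupby/agg/sort_values/head methods.
    """
    return (
        df.groupby(key)
        .agg({value: "sum"})
        .sort_values(value, ascending=False)
        .head(n)
    )


def top_names_from_bigquery(project_id, n=10):
    """Run the example query on BigQuery and return a local pandas DataFrame.

    Requires the bigframes package and Google Cloud credentials, so the
    import is deferred until the function is called.
    """
    import bigframes.pandas as bpd

    bpd.options.bigquery.project = project_id
    df = bpd.read_gbq("bigquery-public-data.usa_names.usa_1910_2013")
    # Only the final n aggregated rows are downloaded by to_pandas().
    return top_totals(df, "name", "number", n=n).to_pandas()
```

Calling top_names_from_bigquery("your-project-id") would reproduce local_df from the example; the function definitions themselves run without contacting BigQuery.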

BigQuery DataFrames is designed for data scientists and analysts who need the power of BigQuery with the ease of use of pandas. It avoids the “data movement bottleneck” by keeping your data in BigQuery for processing.
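
For the opposite direction, a local pandas DataFrame can be uploaded and saved as a table. The sketch below combines read_pandas() and DataFrame.to_gbq(), both named in the feature list. `upload_frame` and the example table ID are hypothetical, and if_exists="replace" is an assumption about the desired overwrite behavior rather than a recommended default.

```python
def upload_frame(pandas_df, destination_table):
    """Upload a local pandas DataFrame and save it as a BigQuery table.

    Hypothetical helper. `destination_table` is a table ID such as
    "your-project-id.your_dataset.your_table" (placeholder values).
    Requires bigframes and Google Cloud credentials at call time.
    """
    import bigframes.pandas as bpd

    # read_pandas() uploads the local data and returns a BigQuery DataFrame.
    bf_df = bpd.read_pandas(pandas_df)

    # to_gbq() writes the DataFrame to the destination table;
    # if_exists="replace" overwrites an existing table (an assumption here).
    return bf_df.to_gbq(destination_table, if_exists="replace")
```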

Functions

Classes

Module Attributes

pandas.NA = <NA>

pandas.BooleanDtype = <class 'pandas.core.arrays.boolean.BooleanDtype'>

pandas.Float64Dtype = <class 'pandas.core.arrays.floating.Float64Dtype'>

pandas.Int64Dtype = <class 'pandas.core.arrays.integer.Int64Dtype'>

pandas.StringDtype = <class 'pandas.core.arrays.string_.StringDtype'>

pandas.ArrowDtype = <class 'pandas.core.dtypes.dtypes.ArrowDtype'>

pandas.options = <bigframes._config.global_options.Options object>

pandas.option_context = <class 'bigframes_vendored.pandas._config.config.option_context'>
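
The options object and option_context listed above follow the pandas configuration pattern: options holds global settings, and option_context changes a setting only inside a with block. A minimal sketch, assuming that bigquery.location is a settable option and that the "display.progress_bar" key accepts None to turn off progress output; `configure` and `quiet_preview` are hypothetical helper names.

```python
def configure(project_id, location="US"):
    """Set global session options before any data is read."""
    import bigframes.pandas as bpd

    bpd.options.bigquery.project = project_id
    # Assumed option; "US" is only a placeholder default.
    bpd.options.bigquery.location = location
    return bpd.options


def quiet_preview(df, rows=5):
    """Fetch the first rows of df with progress output temporarily disabled."""
    import bigframes.pandas as bpd

    # option_context restores the previous value when the block exits.
    with bpd.option_context("display.progress_bar", None):
        return df.head(rows).to_pandas()
```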