The primary entry point for the BigQuery DataFrames (BigFrames) pandas-compatible API.
BigQuery DataFrames provides a Pythonic DataFrame and machine learning (ML) API
powered by the BigQuery engine. The bigframes.pandas module implements a large
subset of the pandas API, allowing you to perform large-scale data analysis
using familiar pandas syntax while the computations are executed in the cloud.
Key Features:
- Petabyte-Scale Scalability: Handle datasets that exceed local memory by offloading computation to the BigQuery distributed engine.
- Pandas Compatibility: Use common pandas methods like groupby(), merge(), pivot_table(), and more on BigQuery-backed DataFrame objects.
- Direct BigQuery Integration: Read from and write to BigQuery tables and queries with bigframes.pandas.read_gbq() and bigframes.pandas.DataFrame.to_gbq().
- User-defined Functions (UDFs): Deploy Python functions using the bigframes.pandas.remote_function() and bigframes.pandas.udf() decorators.
- Data Ingestion: Read various formats including CSV, Parquet, JSON, and Arrow via bigframes.pandas.read_csv(), bigframes.pandas.read_parquet(), etc.; the data is automatically uploaded to BigQuery for processing. Convert any pandas DataFrame into a BigQuery DataFrame using bigframes.pandas.read_pandas().
Example usage:
>>> import bigframes.pandas as bpd
Set the Google Cloud project to use for the session.
>>> bpd.options.bigquery.project = "your-project-id"
Load data from a BigQuery public dataset.
>>> df = bpd.read_gbq("bigquery-public-data.usa_names.usa_1910_2013")
Perform familiar pandas operations that execute in the cloud.
>>> top_names = (
...     df.groupby("name")
...     .agg({"number": "sum"})
...     .sort_values("number", ascending=False)
...     .head(10)
... )
Bring the final, aggregated results back to local memory if needed.
>>> local_df = top_names.to_pandas()
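BigQuery DataFrames is designed for data scientists and analysts who want the scale of BigQuery with the familiar pandas interface. Because processing happens inside BigQuery, the data does not have to be downloaded first, which avoids the “data movement bottleneck”; typically only the results you request, for example with to_pandas(), are transferred to local memory.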
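Because the module mirrors pandas syntax, the aggregation from the example above can be prototyped locally before it is run against BigQuery. The following is a minimal sketch that uses plain pandas only; the helper name `top_by_total` and the sample rows are made up for illustration, and the claim that the same function accepts a BigQuery DataFrame is an assumption based on the example above, not something this snippet verifies.

```python
import pandas as pd


def top_by_total(df, n=10):
    """Return the n names with the largest summed "number" column.

    Uses the same groupby/agg/sort_values/head chain as the example
    above. Hypothetical helper: only exercised here with pandas.
    """
    return (
        df.groupby("name")
        .agg({"number": "sum"})
        .sort_values("number", ascending=False)
        .head(n)
    )


# Made-up sample with the two columns the pipeline needs.
sample = pd.DataFrame(
    {
        "name": ["Mary", "John", "Mary", "Anna", "John", "Mary"],
        "number": [10, 7, 5, 3, 2, 1],
    }
)

# Keep the two names with the highest totals.
local_top = top_by_total(sample, n=2)
print(local_top)
```

To try the same helper in BigQuery, one option would be to pass it a DataFrame returned by bpd.read_gbq(), or to convert the sample first with bpd.read_pandas(sample); both require a configured Google Cloud project.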
Functions
Classes
Module Attributes
- pandas.NA = <NA>
- pandas.BooleanDtype = <class 'pandas.core.arrays.boolean.BooleanDtype'>
- pandas.Float64Dtype = <class 'pandas.core.arrays.floating.Float64Dtype'>
- pandas.Int64Dtype = <class 'pandas.core.arrays.integer.Int64Dtype'>
- pandas.StringDtype = <class 'pandas.core.arrays.string_.StringDtype'>
- pandas.ArrowDtype = <class 'pandas.core.dtypes.dtypes.ArrowDtype'>
- pandas.options = <bigframes._config.global_options.Options object>
- pandas.option_context = <class 'bigframes_vendored.pandas._config.config.option_context'>
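According to the listing above, the dtype attributes are the pandas classes themselves, so their behavior can be sketched with plain pandas. The example below is an illustration under that assumption; the variable names and values are made up, and it does not touch BigQuery. ArrowDtype is left out because it additionally requires pyarrow.

```python
import pandas as pd

# Nullable integer column: the missing entry is stored as pd.NA, so the
# dtype stays Int64 instead of falling back to float64.
counts = pd.Series([3, None, 5], dtype=pd.Int64Dtype())

# Nullable boolean and string columns follow the same pattern.
flags = pd.Series([True, None, False], dtype=pd.BooleanDtype())
names = pd.Series(["Mary", None, "John"], dtype=pd.StringDtype())

print(counts.dtype)         # nullable integer dtype
print(counts[1] is pd.NA)   # the missing value is the pd.NA singleton
print(counts.sum())         # missing values are skipped by default
print(names.isna().sum())   # number of missing strings
```

With bigframes.pandas imported as bpd, the same classes would presumably be reachable as bpd.Int64Dtype, bpd.BooleanDtype, and so on.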