# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
In BigQuery, a STRUCT (also known as a record) is a collection of ordered fields, each with a defined data type (required) and an optional field name. BigQuery DataFrames maps BigQuery STRUCT types to the pandas equivalent, pandas.ArrowDtype(pa.struct()).
This notebook illustrates how to work with STRUCT columns in BigQuery DataFrames. First, let’s import the required packages and perform the necessary setup below.
import bigframes.pandas as bpd
import bigframes.bigquery as bbq
import pandas as pd
import pyarrow as pa
REGION = "US"  # @param {type: "string"}
bpd.options.display.progress_bar = None
bpd.options.bigquery.location = REGION
Create DataFrames with struct columns
Example 1: Creating from a list of objects
names = ["Alice", "Bob", "Charlie"]
addresses = [
    {'City': 'New York', 'State': 'NY'},
    {'City': 'San Francisco', 'State': 'CA'},
    {'City': 'Seattle', 'State': 'WA'}
]
df = bpd.DataFrame({'Name': names, 'Address': addresses})
df
| | Name | Address |
|---|---|---|
| 0 | Alice | {'City': 'New York', 'State': 'NY'} |
| 1 | Bob | {'City': 'San Francisco', 'State': 'CA'} |
| 2 | Charlie | {'City': 'Seattle', 'State': 'WA'} |
3 rows × 2 columns
df.dtypes
Name                                    string[pyarrow]
Address    struct<City: string, State: string>[pyarrow]
dtype: object
Example 2: Defining schema explicitly
bpd.Series(
    data=addresses,
    dtype=bpd.ArrowDtype(pa.struct([('City', pa.string()), ('State', pa.string())]))
)
0 {'City': 'New York', 'State': 'NY'}
1 {'City': 'San Francisco', 'State': 'CA'}
2 {'City': 'Seattle', 'State': 'WA'}
dtype: struct<City: string, State: string>[pyarrow]
Example 3: Reading from a source
bpd.read_gbq("bigquery-public-data.ml_datasets.credit_card_default", max_results=5)["predicted_default_payment_next_month"]
0 [{'tables': {'score': 0.8667634129524231, 'val...
1 [{'tables': {'score': 0.9351968765258789, 'val...
2 [{'tables': {'score': 0.8572560548782349, 'val...
3 [{'tables': {'score': 0.9690881371498108, 'val...
4 [{'tables': {'score': 0.9349926710128784, 'val...
Name: predicted_default_payment_next_month, dtype: list<item: struct<tables: struct<score: double, value: string>>>[pyarrow]
Operate on STRUCT data
BigQuery DataFrames provides three main approaches for operating on STRUCT data:
- The Series.struct accessor: provides pandas-like methods for manipulating a single STRUCT column.
- The DataFrame.struct accessor: provides pandas-like methods for manipulating all child STRUCT columns of a DataFrame.
- BigQuery built-in functions: functions that mirror BigQuery SQL operations, available through the bigframes.bigquery module (abbreviated as bbq below), such as struct.
View Data Types of Struct Fields
df['Address'].struct.dtypes
City     string[pyarrow]
State    string[pyarrow]
dtype: object
Access a Struct Field by Name
df['Address'].struct.field("City")
0         New York
1    San Francisco
2          Seattle
Name: City, dtype: string