This notebook provides a comprehensive walkthrough of time series forecasting using the BigFrames library. We will explore two powerful models, TimesFM and ARIMAPlus, to predict bikeshare trip demand based on historical data from San Francisco. The process covers data loading, preprocessing, model training, and visualization of the results.
import bigframes.pandas as bpd
from bigframes.ml import forecasting

bpd.options.display.render_mode = "anywidget"
1. Data Loading and Preprocessing#
The first step is to load the San Francisco bikeshare dataset from BigQuery. We then preprocess the data by filtering for trips made by ‘Subscriber’ type users from 2018 onwards. This ensures we are working with a relevant and consistent subset of the data. Finally, we aggregate the trip data by the hour to create a time series of trip counts.
df = bpd.read_gbq("bigquery-public-data.san_francisco_bikeshare.bikeshare_trips")

# Keep trips by "Subscriber" users from 2018 onwards
df = df[df["start_date"] >= "2018-01-01"]
df = df[df["subscriber_type"] == "Subscriber"]

# Bucket each trip into its starting hour and count trips per hour
df["trip_hour"] = df["start_date"].dt.floor("h")
df_grouped = df[["trip_hour", "trip_id"]].groupby("trip_hour").count().reset_index()
df_grouped = df_grouped.rename(columns={"trip_id": "num_trips"})
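BigFrames mirrors the pandas API, so the hourly bucketing step above can be sanity-checked locally on a small synthetic sample with plain pandas (the trip IDs and timestamps below are made up for illustration):

```python
import pandas as pd

# Synthetic trips: the first two fall in the same hour, the third in the next
trips = pd.DataFrame({
    "trip_id": [1, 2, 3],
    "start_date": pd.to_datetime([
        "2018-01-01 08:15:00",
        "2018-01-01 08:45:00",
        "2018-01-01 09:05:00",
    ]),
})

# Same logic as the cell above: floor each timestamp to the hour, count per hour
trips["trip_hour"] = trips["start_date"].dt.floor("h")
counts = (
    trips[["trip_hour", "trip_id"]]
    .groupby("trip_hour")
    .count()
    .reset_index()
    .rename(columns={"trip_id": "num_trips"})
)
print(counts)  # two rows: 2 trips in the 08:00 bucket, 1 in the 09:00 bucket
```

The same expression run against the full BigQuery table is what produces `df_grouped`.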
2. Forecasting with TimesFM#
In this section, we use TimesFM (Time Series Foundation Model), a pretrained model designed for a wide range of time series forecasting tasks, to forecast future bikeshare demand. We hold out the final 168 hours (one week) of the series and ask the model to forecast that window, so the predictions can later be compared against the actual counts.
# The series has 2,842 hourly rows; hold out the last 168 hours (one week)
result = df_grouped.head(2842 - 168).ai.forecast(
    timestamp_column="trip_hour",
    data_column="num_trips",
    horizon=168,
)
result
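Because the last week was held out, the forecast can be scored against the actual counts. A minimal sketch of the usual error metrics, using made-up actual and forecast values in plain pandas (the notebook's real series would be joined on `trip_hour` first):

```python
import pandas as pd

# Hypothetical held-out actuals and forecasts (values are made up)
actual = pd.Series([120, 95, 130, 110])
forecast = pd.Series([115, 100, 125, 118])

# Mean absolute error and mean absolute percentage error
mae = (actual - forecast).abs().mean()
mape = ((actual - forecast).abs() / actual).mean() * 100
print(f"MAE: {mae:.2f}, MAPE: {mape:.2f}%")
```

The same two metrics can be computed for the ARIMAPlus forecast in the next section, giving a like-for-like comparison on the identical holdout window.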
3. Forecasting with ARIMAPlus#
Next, we will use the ARIMAPlus model, which is a BigQuery ML model available through BigFrames. ARIMAPlus is an advanced forecasting model that can capture complex time series patterns. We will train it on the same historical data and use it to forecast the same period as the TimesFM model.
model = forecasting.ARIMAPlus(
    auto_arima_max_order=5,  # Reduce runtime for large datasets
    data_frequency="hourly",
    horizon=168,
)

# Train on the same historical window that TimesFM saw
X = df_grouped.head(2842 - 168)[["trip_hour"]]
y = df_grouped.head(2842 - 168)[["num_trips"]]
model.fit(X, y)

predictions = model.predict(horizon=168, confidence_level=0.95)
predictions
4. Compare and Visualize Forecasts#
Now we will visualize the forecasts from both TimesFM and ARIMAPlus against the actual historical data. This allows for a direct comparison of the two models’ performance.
timesfm_result = result.sort_values("forecast_timestamp")[["forecast_timestamp", "forecast_value"]]
timesfm_result = timesfm_result.rename(columns={
    "forecast_timestamp": "trip_hour",
    "forecast_value": "timesfm_forecast",
})

arimaplus_result = predictions.sort_values("forecast_timestamp")[["forecast_timestamp", "forecast_value"]]
arimaplus_result = arimaplus_result.rename(columns={
    "forecast_timestamp": "trip_hour",
    "forecast_value": "arimaplus_forecast",
})

# Left-join both forecasts onto the actual hourly counts
df_all = df_grouped.merge(timesfm_result, on="trip_hour", how="left")
df_all = df_all.merge(arimaplus_result, on="trip_hour", how="left")

# Plot the last four weeks (672 hours): three weeks of history plus the forecast week
df_all.tail(672).plot.line(
    x="trip_hour",
    y=["num_trips", "timesfm_forecast", "arimaplus_forecast"],
    rot=45,
    title="Trip Forecasts Comparison",
)
<Axes: title={'center': 'Trip Forecasts Comparison'}, xlabel='trip_hour'>
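The left joins above keep every historical hour and attach forecast values only where a forecast exists, leaving NaN elsewhere. The behavior can be checked locally with plain pandas on a tiny made-up sample:

```python
import pandas as pd

# Three hours of synthetic history; a forecast exists only for the last hour
history = pd.DataFrame({
    "trip_hour": pd.to_datetime([
        "2018-04-01 00:00", "2018-04-01 01:00", "2018-04-01 02:00",
    ]),
    "num_trips": [50, 60, 55],
})
forecast = pd.DataFrame({
    "trip_hour": pd.to_datetime(["2018-04-01 02:00"]),
    "timesfm_forecast": [58.0],
})

# Left join: all historical hours survive; hours without a forecast get NaN
merged = history.merge(forecast, on="trip_hour", how="left")
print(merged)
```

This is why the plotted forecast lines only appear over the final week: outside the forecast horizon the joined columns are NaN and are simply not drawn.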
5. Multiple Time Series Forecasting#
This section demonstrates a more advanced capability of ARIMAPlus: forecasting multiple time series simultaneously. This is useful when you have several independent series that you want to model together, such as trip counts from different bikeshare stations. The id_col parameter is key here, as it is used to differentiate between the individual time series.
df_multi = bpd.read_gbq("bigquery-public-data.san_francisco_bikeshare.bikeshare_trips")
df_multi = df_multi[df_multi["start_station_name"].str.contains("Market|Powell|Embarcadero")]

# Create daily aggregation
features = bpd.DataFrame({
    "start_station_name": df_multi["start_station_name"],
    "date": df_multi["start_date"].dt.date,
})

# Group by station and date
num_trips = features.groupby(
    ["start_station_name", "date"], as_index=False
).size()

# Rename the size column to "num_trips"
num_trips = num_trips.rename(columns={num_trips.columns[-1]: "num_trips"})

# Check data quality
print(f"Number of stations: {num_trips['start_station_name'].nunique()}")
print(f"Date range: {num_trips['date'].min()} to {num_trips['date'].max()}")

# Use daily frequency
model = forecasting.ARIMAPlus(
    data_frequency="daily",
    horizon=30,
    auto_arima_max_order=3,
    min_time_series_length=10,
    time_series_length_fraction=0.8,
)
model.fit(
    num_trips[["date"]],
    num_trips[["num_trips"]],
    id_col=num_trips[["start_station_name"]],
)

predictions_multi = model.predict()
predictions_multi
Date range: 2013-08-29 to 2018-04-30
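The per-series aggregation above hinges on the `groupby(...).size()` pattern: one row per (station, date) pair, so that each station becomes an independent series keyed by `id_col`. The same pattern can be verified locally with plain pandas on synthetic data (the station names below are made up):

```python
import pandas as pd

# Synthetic trips at two stations
trips = pd.DataFrame({
    "start_station_name": ["Market St", "Market St", "Powell St"],
    "start_date": pd.to_datetime([
        "2018-04-01 08:00", "2018-04-01 17:30", "2018-04-01 09:15",
    ]),
})

features = pd.DataFrame({
    "start_station_name": trips["start_station_name"],
    "date": trips["start_date"].dt.date,
})

# One row per (station, date); .size() counts rows in each group
num_trips = features.groupby(["start_station_name", "date"], as_index=False).size()
num_trips = num_trips.rename(columns={num_trips.columns[-1]: "num_trips"})
print(num_trips)  # Market St gets 2 trips on that date, Powell St gets 1
```

Each distinct value in the `start_station_name` column then corresponds to one time series that ARIMAPlus models independently.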