Dataproc is a managed Apache Spark and Apache Hadoop service that lets you take advantage of open source data tools for batch processing, querying, streaming, and machine learning. Dataproc automation helps you create clusters quickly, manage them easily, and save money by turning clusters off when you don't need them. With less time and money spent on administration, you can focus on your jobs and your data.
Resources
Explore self-paced training, use cases, reference architectures, and code samples with examples of how to use and connect Google Cloud services.
Training and tutorials
Machine Learning with Spark on Dataproc
This course combines lectures, demos, and hands-on labs. You implement logistic regression with a machine learning library for Apache Spark, running on a Dataproc cluster, and use it to build a model from a multivariable dataset.