My Data Engineering Journey
Welcome to my personal data engineering portfolio! This repo contains my notes, homework, and projects as I work through the DataTalksClub Data Engineering Zoomcamp.
📚 Week-by-Week Breakdown:
🧱 Week 1: Containerization & Infrastructure Setup
- Introduction to Google Cloud Platform (GCP)
- Working with Docker and Docker Compose
- Running PostgreSQL in Docker containers
- Managing infrastructure using Terraform
🔁 Week 2: Orchestrating Workflows
- Understanding Data Lakes and orchestration concepts
- Building pipelines with Kestra
- Exploring task scheduling and dependency management
⚙️ Workshop 1: Data Ingestion Techniques
- Reading and ingesting data from APIs
- Building scalable pipelines
- Implementing data normalization and incremental loading
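Incremental loading means fetching only the records that changed since the last run instead of reloading everything. The snippet below is a minimal, library-free sketch of cursor-based (high-water mark) incremental loading. It assumes each record carries a sortable `updated_at` field and that pipeline state is a plain dict. Both the function name `incremental_load` and the field name are hypothetical illustrations, not the API of any tool used in the workshop.

```python
from typing import Any


def incremental_load(
    records: list[dict[str, Any]],
    state: dict[str, Any],
    cursor_field: str = "updated_at",  # hypothetical cursor column
) -> list[dict[str, Any]]:
    """Return only records newer than the stored cursor, then advance the cursor."""
    last_value = state.get(cursor_field)

    # First run (no stored cursor): everything is new.
    # Later runs: keep only rows strictly newer than the high-water mark.
    new_records = [
        r for r in records
        if last_value is None or r[cursor_field] > last_value
    ]

    # Persist the new high-water mark so the next run skips these rows.
    if new_records:
        state[cursor_field] = max(r[cursor_field] for r in new_records)
    return new_records


state: dict[str, Any] = {}
batch = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-01-02"},
]
print(len(incremental_load(batch, state)))  # first run loads both rows
batch.append({"id": 3, "updated_at": "2024-01-03"})
print(incremental_load(batch, state))       # second run loads only id 3
```

ISO-8601 date strings compare correctly as plain strings, which is why the sketch does not parse them. A real pipeline would also persist `state` between runs, for example in a file or table.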
🏢 Week 3: Data Warehousing Essentials
- Overview of Google BigQuery
- Implementing table partitioning and clustering
- Learning optimization best practices
- Introduction to machine learning features in BigQuery
🛠️ Week 4: Analytics Engineering
- Building models using dbt (data build tool)
- Testing, documenting, and deploying transformations
- Creating dashboards with Metabase
⏱️ Week 5: Batch Processing Fundamentals
- Introduction to Apache Spark
- Working with DataFrames and Spark SQL
- Exploring how groupBy and join operations work under the hood
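The next snippet is a toy, pure-Python simulation of what groupBy and join do under the hood in a distributed engine. It is not Spark code. It mirrors the general idea: aggregate partially inside each partition, shuffle rows so that equal keys land in the same partition (routed by key hash), then finish the work per partition. The join shown is a shuffle hash join. For large tables Spark often chooses a sort-merge join instead, but the shuffle step is the same idea. All function names and the partition counts are hypothetical.

```python
from collections import defaultdict


def simulate_group_by_sum(partitions, num_reducers=2):
    """Sum values per key across partitions: partial agg, shuffle, final agg."""
    # Stage 1: partial (map-side) aggregation inside each input partition.
    partials = []
    for part in partitions:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        partials.append(local)

    # Shuffle: send each partial result to a reducer chosen by hash(key).
    reducers = [defaultdict(int) for _ in range(num_reducers)]
    for local in partials:
        for key, value in local.items():
            reducers[hash(key) % num_reducers][key] += value

    # Stage 2: each key lives in exactly one reducer, so outputs merge cleanly.
    result = {}
    for reducer in reducers:
        result.update(reducer)
    return result


def simulate_shuffle_join(left, right, num_partitions=2):
    """Inner-join two lists of (key, value) pairs via a hash-partitioned shuffle."""
    # Shuffle both sides with the same hash function so matching keys co-locate.
    left_parts = [defaultdict(list) for _ in range(num_partitions)]
    right_parts = [[] for _ in range(num_partitions)]
    for key, value in left:
        left_parts[hash(key) % num_partitions][key].append(value)
    for key, value in right:
        right_parts[hash(key) % num_partitions].append((key, value))

    # Join each partition independently: the left side acts as the hash table,
    # and every right-side row probes it.
    joined = []
    for p in range(num_partitions):
        for key, rvalue in right_parts[p]:
            for lvalue in left_parts[p].get(key, []):
                joined.append((key, lvalue, rvalue))
    return joined


print(simulate_group_by_sum([[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]))
print(sorted(simulate_shuffle_join([("a", 1), ("b", 2)],
                                   [("a", "x"), ("c", "y"), ("a", "z")])))
```

The shuffle is the expensive step in both operations because rows move between machines. This is why partial aggregation before the shuffle, which shrinks the data sent over the network, matters for groupBy performance.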
📡 Week 6: Real-Time Data Streaming
- Getting started with Kafka
- Using Kafka Streams and KSQL
- Managing data schemas with Avro
🎓 Final Project
A final end-to-end project applying concepts learned throughout the course. Coming soon!