Data Engineering on Google Cloud Platform Training
- Created by shambhvi
- Posted on September 8th, 2025
Description:
This official Google Cloud training provides a comprehensive introduction to data engineering on GCP, equipping participants with the skills to design, build, and manage scalable data pipelines for both batch and streaming workloads.
The training covers the entire data lifecycle on GCP — from ingestion and storage to processing, orchestration, governance, and analytics. Learners explore core services, including Cloud Pub/Sub, Dataflow, BigQuery, Dataproc, Cloud Storage, and Data Fusion, while also gaining exposure to advanced capabilities such as Dataflow streaming, BigQuery ML, Cloud Composer, and AI/ML integration.
Through a blend of demos, labs, and case studies, participants will:
- Build and optimize ETL/ELT workflows.
- Design data lakes and warehouses on GCP.
- Implement streaming pipelines for real-time analytics.
- Use orchestration tools (Cloud Composer, Data Fusion) to manage production pipelines.
- Apply governance and security best practices.
- Explore analytics and machine learning options directly within GCP.
Duration: 5 Days
Course Code: BDT 524
Learning Objectives:
After this training, participants will be able to:
- Ingest and process data with Pub/Sub and Dataflow
- Build warehouses with BigQuery
- Perform ML with BigQuery ML
- Secure and monitor pipelines
Audience:
- Cloud architects
- Systems engineers
- Developers working with GCP
Prerequisites:
- Knowledge of SQL
- Experience with data processing tools
- Basic understanding of cloud services
Course Outline:
Module 1: Introduction to Data Engineering on GCP
- Role of a data engineer
- Data engineering challenges
- Data lakes vs data warehouses
- GCP data ecosystem overview
- Case study: Real-world GCP customer pipeline
Module 2: Data Ingestion
- Introduction to Cloud Pub/Sub
- Batch vs streaming ingestion patterns
- Basics of Cloud Dataflow pipelines
- Lab: Publishing and processing streaming data with Pub/Sub & Dataflow
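The decoupling that Pub/Sub provides between producers and consumers can be sketched with the standard library. This is a toy stand-in for the pattern, not the google-cloud-pubsub client; all names and the message shape are illustrative:

```python
import queue
import threading

# A toy "topic": publishers push messages, a subscriber pulls them.
# Cloud Pub/Sub adds durability, fan-out to many subscriptions, and
# at-least-once delivery on top of this basic decoupling.
topic = queue.Queue()

def publisher(n_messages):
    for i in range(n_messages):
        topic.put({"event_id": i, "payload": f"reading-{i}"})
    topic.put(None)  # sentinel: no more messages

def subscriber(results):
    while True:
        msg = topic.get()
        if msg is None:
            break
        results.append(msg["payload"].upper())  # "process" the message

received = []
t_pub = threading.Thread(target=publisher, args=(3,))
t_sub = threading.Thread(target=subscriber, args=(received,))
t_pub.start(); t_sub.start()
t_pub.join(); t_sub.join()
print(received)  # ['READING-0', 'READING-1', 'READING-2']
```

The lab replaces the in-process queue with a real topic and a Dataflow pipeline as the subscriber.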
Module 3: Data Storage
- Cloud Storage (buckets, lifecycle, security)
- Relational storage: Cloud SQL, Spanner
- NoSQL storage: Datastore/Firestore, Bigtable
- Lab: Loading structured/unstructured data into GCP storage services
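Bucket lifecycle rules, covered above, are expressed as a small JSON policy. The sketch below builds one in Python; field names follow the Cloud Storage lifecycle configuration format, while the ages and storage class chosen are illustrative:

```python
import json

# Sketch of a GCS bucket lifecycle configuration: move objects to
# cheaper storage after 30 days, delete them after 365 days.
# Field names follow the Cloud Storage lifecycle JSON format;
# the thresholds are illustrative.
lifecycle = {
    "rule": [
        {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
         "condition": {"age": 30}},
        {"action": {"type": "Delete"},
         "condition": {"age": 365}},
    ]
}
print(json.dumps(lifecycle, indent=2))
```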
Module 4: Data Warehousing with BigQuery
- Introduction to BigQuery as a modern data warehouse
- Loading and querying data
- Partitioning, clustering, schema design
- Lab: Running federated queries on external datasets
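Partitioning and clustering are declared at table-creation time. The sketch below holds a BigQuery DDL statement as a string (dataset, table, and column names are illustrative); in practice it would be submitted through the BigQuery client library or the bq CLI:

```python
# Sketch of a BigQuery DDL statement combining daily partitioning with
# clustering; table and column names are illustrative.
ddl = """
CREATE TABLE mydataset.events (
  event_ts TIMESTAMP,
  customer_id STRING,
  amount NUMERIC
)
PARTITION BY DATE(event_ts)   -- prunes scanned data for date-filtered queries
CLUSTER BY customer_id;       -- co-locates rows sharing a customer_id
"""
print(ddl.strip())
```

Partitioning limits how much data a date-filtered query scans (and bills for); clustering then sorts within each partition so filters on the clustering column read fewer blocks.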
Module 5: Batch Data Processing
- ETL/ELT concepts on GCP
- Quality considerations and transformations
- Using Dataproc for Spark and Hadoop workloads
- Lab: Running Spark jobs on Dataproc
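The ETL shape this module describes, extract, quality-filter, transform, load, can be sketched in plain Python. Record fields and the cleaning rule are illustrative; on GCP the same shape runs distributed as a Spark job on Dataproc:

```python
# Minimal batch ETL sketch: extract raw records, drop records that fail
# a quality check, transform the survivors, load into a "warehouse"
# (here just a list). Field names and the rule are illustrative.
raw_rows = [
    {"id": "1", "temp_c": "21.5"},
    {"id": "2", "temp_c": ""},        # bad record: missing value
    {"id": "3", "temp_c": "19.0"},
]

def transform(row):
    return {"id": int(row["id"]), "temp_f": float(row["temp_c"]) * 9 / 5 + 32}

warehouse = [transform(r) for r in raw_rows if r["temp_c"]]  # drop bad rows
print(warehouse)
```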
Module 6: Streaming Data Processing
- Streaming concepts and challenges
- Dataflow streaming pipelines
- BigQuery streaming inserts
- Cloud Bigtable for high-throughput streaming
- Lab: Real-time analytics dashboard with Pub/Sub + Dataflow + BigQuery
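A core idea behind Dataflow streaming pipelines is windowing: assigning each event to a time window and aggregating per window. The sketch below shows fixed ("tumbling") one-minute windows with illustrative event data; real pipelines add watermarks and late-data handling on top:

```python
from collections import defaultdict

# Sketch of fixed one-minute windows: each event lands in the window
# containing its timestamp, then values are summed per window.
WINDOW_SECONDS = 60

events = [  # (event_time_seconds, value) -- illustrative data
    (5, 10), (42, 7), (61, 3), (119, 1), (130, 4),
]

windows = defaultdict(int)
for ts, value in events:
    window_start = (ts // WINDOW_SECONDS) * WINDOW_SECONDS
    windows[window_start] += value

print(dict(windows))  # {0: 17, 60: 4, 120: 4}
```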
Module 7: Data Orchestration and Automation
- Cloud Data Fusion for visual ETL design
- Cloud Composer (Apache Airflow) for orchestration
- Scheduling workflows and monitoring pipelines
- Infrastructure automation with Deployment Manager and Terraform (intro)
- Lab: Building a pipeline with Data Fusion and Composer
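At its core, an orchestrator like Cloud Composer (Apache Airflow) runs tasks in dependency order. The sketch below shows only that ordering step, using the standard library; task names are illustrative, and a real DAG would attach operators, schedules, and retries to each node:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Dependency graph: each task maps to the set of tasks it waits on.
deps = {
    "transform": {"extract_from_gcs"},
    "load_to_bigquery": {"transform"},
    "notify": {"load_to_bigquery"},
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # extract_from_gcs first, notify last
```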
Module 8: Monitoring, Security, and Governance
- Cloud Monitoring, Logging, Error Reporting, Tracing
- Data security and access control with IAM and DLP API
- Governance best practices (projects, quotas, billing)
- Lab: Detecting PII with Cloud DLP
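The DLP lab inspects text for PII. The toy scanner below mimics the shape of that inspection; the real DLP API uses managed infoType detectors rather than hand-written regexes, and the patterns here are simplified illustrations, not production-grade:

```python
import re

# Toy PII scanner in the spirit of Cloud DLP inspection. Each "detector"
# is a simplified regex; real DLP infoTypes are far more robust.
DETECTORS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def inspect(text):
    findings = []
    for info_type, pattern in DETECTORS.items():
        for match in pattern.finditer(text):
            findings.append({"infoType": info_type, "quote": match.group()})
    return findings

print(inspect("Contact jane@example.com, SSN 123-45-6789."))
```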
Module 9: Analytics and Machine Learning with BigQuery
- BigQuery ML: SQL-based ML models
- Common supported models
- Performance tuning for large queries
- Lab: Building a predictive model with BigQuery ML
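BigQuery ML models are created and queried with SQL alone. The sketch below holds two such statements as strings; dataset, table, and column names are illustrative, and the strings would be run as ordinary BigQuery queries:

```python
# Sketch of BigQuery ML: a model is trained with CREATE MODEL over a
# SELECT, then queried with ML.PREDICT. All names are illustrative.
create_model_sql = """
CREATE OR REPLACE MODEL mydataset.fare_model
OPTIONS (model_type = 'linear_reg', input_label_cols = ['fare']) AS
SELECT trip_miles, trip_minutes, fare
FROM mydataset.taxi_trips;
"""
predict_sql = """
SELECT * FROM ML.PREDICT(MODEL mydataset.fare_model,
  (SELECT 3.2 AS trip_miles, 14 AS trip_minutes));
"""
print(create_model_sql.strip())
print(predict_sql.strip())
```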
Module 10: AI/ML Services for Data Enrichment
- Pre-built ML APIs (Vision, NLP, Translation) for unstructured data
- Cloud AI Platform Notebooks for data exploration
- AutoML for custom ML models
- Kubeflow pipelines for production ML workflows
- Lab: Classifying text data with Cloud NLP API
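To make the classification lab concrete, the toy below stands in for what the Cloud Natural Language API's text classification does; the real API uses pretrained models rather than the keyword counts sketched here, and the categories and keywords are illustrative:

```python
# Toy text classifier: score each category by keyword overlap and pick
# the best. A stand-in for Cloud NLP classification, not a model.
CATEGORIES = {
    "/Sports": {"match", "goal", "team", "league"},
    "/Finance": {"stock", "market", "earnings", "revenue"},
}

def classify(text):
    tokens = set(text.lower().split())
    scores = {cat: len(tokens & kws) for cat, kws in CATEGORIES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(classify("The team scored a late goal to win the match"))  # /Sports
```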
Module 11: Containers and Modern App Integration
- Using GCP with containerized workloads
- Google Kubernetes Engine (GKE) for scalable data apps
- Cloud Run for serverless data services
- Cloud Pub/Sub + Functions for event-driven data processing
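The event-driven pattern in the last bullet can be sketched as a dispatcher: the platform (Cloud Functions, or Cloud Run with Pub/Sub push) invokes a handler per message. Here a small dispatcher plays the platform's role; the event types and payload shape are illustrative:

```python
import json

# Sketch of event-driven processing: each incoming message carries an
# event type, and a registered handler processes it. In production the
# dispatch is done by the platform, not application code.
def handle_upload(event):
    return f"indexed {event['name']}"

def handle_delete(event):
    return f"removed {event['name']}"

HANDLERS = {"OBJECT_FINALIZE": handle_upload, "OBJECT_DELETE": handle_delete}

def dispatch(message_data):
    event = json.loads(message_data)
    return HANDLERS[event["type"]](event)

print(dispatch('{"type": "OBJECT_FINALIZE", "name": "sales.csv"}'))
```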
Training material provided: Yes (Digital format)