Google Cloud Professional Data Engineer
Validates the ability to design, build, operationalize, and optimize data processing systems on Google Cloud. Covers designing data processing systems for security, reliability, flexibility, and migration; ingesting and processing data through batch and streaming pipelines; selecting storage systems including data warehouses, data lakes, and data platforms; preparing data for analysis and AI/ML; and maintaining and automating data workloads with monitoring and orchestration. 40-50 multiple-choice and multiple-select questions in 2 hours. Recommended: 3+ years of industry experience. Certification is valid for 2 years.
Exam domains
- Ingesting and processing the data (25%)
Planning the data pipelines (defining data sources and sinks; defining data transformation logic; networking fundamentals; data encryption). Building the pipelines (data cleansing; identifying the services - Dataflow, Apache Beam, Dataproc, Cloud Data Fusion, BigQuery, Pub/Sub, Apache Spark, Hadoop ecosystem, Apache Kafka; transformations - batch, streaming (including windowing and late-arriving data), language; ad hoc data ingestion - one-time or automated pipeline; data acquisition and import; integrating with new data sources). Deploying and operationalizing the pipelines (job automation and orchestration - Cloud Composer/Apache Airflow, Cloud Scheduler; CI/CD - continuous integration and continuous deployment).
- Designing data processing systems (22%)
Designing for security and compliance (Identity and Access Management - IAM, Cloud KMS, organization policy; data security; privacy compliance - HIPAA, GDPR, CCPA, region/jurisdiction; data sovereignty; legal compliance). Designing for reliability and fidelity (preparing and cleaning data - Dataprep, Dataflow, BigQuery; monitoring and orchestration of data pipelines; disaster recovery and fault tolerance; making decisions related to ACID vs BASE; data validation). Designing for flexibility and portability (mapping current and future business requirements to architecture; designing for data and application portability - multi-cloud, data residency requirements; data staging, cataloging, and discovery). Designing data migrations (analyzing current stakeholder needs, users, processes, and technologies and creating a plan to get to desired state; planning migration to Google Cloud - BigQuery Data Transfer Service, Database Migration Service, Transfer Appliance, Google Cloud networking; designing the migration validation strategy; designing the project, dataset, and table architecture).
Sources
Questions are grounded in 100 references from official and authoritative materials.