Databricks Certified Data Engineer Professional
Validates the ability to use Databricks for advanced data engineering tasks, including designing and implementing complex ETL pipelines, optimizing performance with Delta Lake, building production-grade lakehouse architectures, and applying security and governance best practices. The exam consists of 60 multiple-choice and multiple-select questions over 120 minutes and requires deep hands-on experience with Spark and Delta Lake.
Exam domains
- Developing Code for Data Processing using Python and SQL (22%)
Write production PySpark and Spark SQL on Databricks using the SparkSession entry point, DataFrame/Dataset API, Spark SQL functions, and Python/SQL UDFs (including Pandas UDFs) to express batch and streaming transformations. Apply Structured Streaming patterns (readStream/writeStream, output modes, processing-time and AvailableNow triggers, checkpointing, watermarks) and idiomatic Delta Lake operations (MERGE INTO, time travel, Change Data Feed) across notebooks, Python files, and Lakeflow Spark Declarative Pipelines.
- Cost and Performance Optimisation (13%)
Tune Spark and Delta workloads using Photon, serverless compute, autoscaling, job vs all-purpose clusters, instance pools, and compute policies, and right-size cluster memory/cores to minimise shuffle and disk spill. Apply Delta optimisations including OPTIMIZE compaction, liquid clustering, Z-ordering, VACUUM retention, predictive optimization, deletion vectors, and AQE/broadcast join hints to balance latency, throughput, and DBU spend.
- Debugging and Deploying (10%)
Debug Spark jobs using the Spark UI (stages, tasks, DAG, SQL tab), driver/executor logs, Ganglia/cluster metrics, and Lakeflow pipeline event log queries to diagnose skew, OOM, and shuffle pressure. Promote code across dev/staging/prod with Databricks Asset Bundles (databricks bundle validate/deploy/run), targets, variables, and Git folders, and integrate with CI/CD via the Databricks CLI/REST API and GitHub Actions.
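As a rough illustration of the bundle promotion flow named in the Debugging and Deploying domain, the sketch below assembles the `databricks bundle validate`, `deploy`, and `run` invocations for a single target. The helper names `bundle_commands` and `render`, the target name `staging`, and the job key `nightly_etl` are hypothetical; the sketch only builds the command lines (for example, for a CI step to execute) and does not call the CLI itself.

```python
import shlex
from typing import List, Optional


def bundle_commands(target: str, job_key: Optional[str] = None) -> List[List[str]]:
    """Build Databricks CLI invocations that promote a bundle to one target.

    `target` must match a target defined in the bundle configuration;
    `job_key` is an optional resource key to run after deployment.
    Both values are supplied by the caller and are illustrative here.
    """
    commands = [
        # Check the bundle configuration before touching the workspace.
        ["databricks", "bundle", "validate", "-t", target],
        # Deploy the bundle's resources to the chosen target.
        ["databricks", "bundle", "deploy", "-t", target],
    ]
    if job_key is not None:
        # Optionally trigger a deployed job by its resource key.
        commands.append(["databricks", "bundle", "run", "-t", target, job_key])
    return commands


def render(commands: List[List[str]]) -> str:
    """Render the commands as shell-safe lines, one per command."""
    return "\n".join(shlex.join(cmd) for cmd in commands)


if __name__ == "__main__":
    # Hypothetical target and job key, for illustration only.
    print(render(bundle_commands("staging", "nightly_etl")))
```

Keeping validate ahead of deploy means a malformed bundle fails before any workspace change is attempted; a CI workflow could run the rendered lines once per target in dev, staging, prod order.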
Sources
Questions are grounded in 50 references from official and authoritative materials.