Databricks Certified Data Engineer Associate
Validates the ability to use the Databricks Data Intelligence Platform to complete introductory data engineering tasks, including building and managing data ingestion pipelines, processing and transforming data with Spark SQL and Python, productionizing data pipelines with Lakeflow Spark Declarative Pipelines (formerly Delta Live Tables) and Lakeflow Jobs, and implementing data governance with Unity Catalog. The exam consists of 45 scored multiple-choice questions over 90 minutes.
Exam domains
- Data Processing & Transformations (31%)
Build batch and streaming transformations with the PySpark DataFrame and Spark SQL APIs on Delta Lake, using ACID transactions, schema enforcement and evolution, time travel, and MERGE/UPDATE/DELETE, and keep tables performant with OPTIMIZE and Z-ORDER. Implement bronze/silver/gold medallion patterns with Structured Streaming for incremental processing.
- Development and Ingestion (30%)
Ingest data into the lakehouse with Auto Loader, which provides incremental, exactly-once Structured Streaming ingestion from cloud storage (S3, ADLS, GCS, and Unity Catalog volumes) in JSON, CSV, XML, Parquet, Avro, ORC, and binary formats, and use COPY INTO for idempotent bulk loads. Develop PySpark and SQL code in notebooks against managed and external Delta tables.
- Productionizing Data Pipelines (18%)
Productionize pipelines with Lakeflow Jobs (formerly Databricks Jobs) for scheduling, task dependencies, conditional logic, retries, and email/webhook alerts, and with Lakeflow Spark Declarative Pipelines (formerly Delta Live Tables) for declarative SQL/Python pipelines with expectations, autoscaling, and built-in observability. Apply CI/CD practices including Databricks Asset Bundles and Repos for code promotion across environments.
- Data Governance & Quality
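The Delta Lake behaviors listed under Data Processing & Transformations can be pictured with a small pure-Python model. These include MERGE upserts, schema enforcement versus evolution, time travel, and DELETE. This is a toy sketch, not the Delta Lake or PySpark API. The class ToyVersionedTable, its methods, and the evolve flag are hypothetical names chosen for illustration. Each write commits a whole new snapshot, which simplifies Delta's real transaction log of added and removed data files.

```python
from copy import deepcopy


class ToyVersionedTable:
    """Hypothetical in-memory toy of a versioned table; NOT the Delta Lake API.

    Every write commits a new snapshot, so older versions stay readable
    (like time travel), and incoming rows must fit the schema unless
    evolve=True is passed (like schema enforcement vs. schema evolution).
    """

    def __init__(self, columns):
        self.columns = list(columns)
        self._snapshots = [[]]  # version 0 is the empty table

    @property
    def version(self):
        return len(self._snapshots) - 1

    def read(self, version=None):
        """Rows as of `version` (latest if None), deep-copied so history is immutable."""
        v = self.version if version is None else version
        return deepcopy(self._snapshots[v])

    def _enforce_schema(self, rows, evolve):
        extra = []
        for row in rows:
            for col in row:
                if col not in self.columns and col not in extra:
                    extra.append(col)
        if extra and not evolve:
            raise ValueError(f"schema mismatch: unexpected columns {extra}")
        self.columns.extend(extra)  # schema evolution: append the new columns

    def merge(self, source, key, evolve=False):
        """Upsert `source` rows on `key` as one commit; returns the new version."""
        keys = [row[key] for row in source]
        if len(keys) != len(set(keys)):
            # toy simplification: reject duplicate keys in the source batch
            raise ValueError("duplicate keys in merge source")
        self._enforce_schema(source, evolve)
        current = self.read()
        position = {row[key]: i for i, row in enumerate(current)}
        for src in source:
            if src[key] in position:  # WHEN MATCHED: update the existing row
                current[position[src[key]]].update(src)
            else:  # WHEN NOT MATCHED: insert a new row
                position[src[key]] = len(current)
                current.append(dict(src))
        # fill missing columns with None so every row carries the full schema
        current = [{c: row.get(c) for c in self.columns} for row in current]
        self._snapshots.append(current)  # commit: nothing is visible before this
        return self.version

    def delete(self, predicate):
        """Remove rows matching `predicate` as a new commit; returns the new version."""
        kept = [row for row in self.read() if not predicate(row)]
        self._snapshots.append(kept)
        return self.version


if __name__ == "__main__":
    t = ToyVersionedTable(["id", "name"])
    t.merge([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}], key="id")  # version 1
    t.merge([{"id": 2, "name": "B"}, {"id": 3, "name": "c"}], key="id")  # version 2
    print(t.read())           # [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'B'}, {'id': 3, 'name': 'c'}]
    print(t.read(version=1))  # [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]
```

The failed merge in the schema-enforcement case raises before anything is appended. This mirrors the all-or-nothing behavior that ACID commits give a real table.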
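Development and Ingestion describes incremental, idempotent ingestion. This rests on one idea: remember which source files have already been loaded, and skip them on the next run. Auto Loader persists its file-discovery state in the stream's checkpoint location. COPY INTO skips files it has already loaded into the target table. The sketch below is a hypothetical pure-Python stand-in for that bookkeeping only. The names ingest_new_files and the source_file column are illustrative, not Databricks APIs, and a plain Python set replaces the real checkpoint.

```python
import json
import tempfile
from pathlib import Path


def ingest_new_files(landing_dir, loaded_files, bronze_rows):
    """Append records from not-yet-loaded *.json files; return the names loaded now.

    `loaded_files` (a set) stands in for the load history. Each file is parsed
    fully before its rows are appended and its name recorded, so a rerun
    skips it instead of loading duplicates.
    """
    newly_loaded = []
    for path in sorted(Path(landing_dir).glob("*.json")):
        if path.name in loaded_files:
            continue  # already loaded: skipping is what makes reruns idempotent
        records = []
        with path.open() as f:
            for line in f:  # newline-delimited JSON: one record per line
                if line.strip():
                    record = json.loads(line)
                    record["source_file"] = path.name  # hypothetical lineage column
                    records.append(record)
        bronze_rows.extend(records)  # append the rows and record the file together
        loaded_files.add(path.name)
        newly_loaded.append(path.name)
    return newly_loaded


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as landing:
        Path(landing, "a.json").write_text('{"id": 1}\n{"id": 2}\n')
        loaded, bronze = set(), []
        print(ingest_new_files(landing, loaded, bronze))  # ['a.json']
        print(ingest_new_files(landing, loaded, bronze))  # [] (rerun is a no-op)
        Path(landing, "b.json").write_text('{"id": 3}\n')
        print(ingest_new_files(landing, loaded, bronze))  # ['b.json']
        print(len(bronze))  # 3
```

The toy keys on file name within a single directory. Real systems track full paths, and they make the checkpoint update and the sink write part of one committed unit, which this in-memory version cannot show.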
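The task dependencies and retries described under Productionizing Data Pipelines can be modeled as a small DAG runner. This is a hypothetical sketch, not the Lakeflow Jobs API. The function run_job, the status strings, and the single max_retries value shared by every task are assumptions for illustration. The model loosely mirrors a job whose tasks run only when all of their upstream tasks succeed. It uses graphlib from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter


def run_job(tasks, depends_on, max_retries=1):
    """Run a toy task graph in dependency order with retries.

    tasks: task name -> zero-argument callable.
    depends_on: task name -> set of upstream task names (omit for root tasks).
    Returns task name -> (status, attempts).
    """
    graph = {name: set(depends_on.get(name, ())) for name in tasks}
    unknown = set().union(*graph.values()) - set(tasks)
    if unknown:
        raise ValueError(f"unknown upstream tasks: {sorted(unknown)}")
    results = {}
    # static_order() yields each task after all of its upstream tasks and
    # raises graphlib.CycleError (a ValueError subclass) on a cycle.
    for name in TopologicalSorter(graph).static_order():
        if any(results[up][0] != "SUCCESS" for up in graph[name]):
            # "all upstream succeeded" rule: skip this task without running it
            results[name] = ("UPSTREAM_FAILED", 0)
            continue
        for attempt in range(1, max_retries + 2):  # first try plus retries
            try:
                tasks[name]()
            except Exception:
                results[name] = ("FAILED", attempt)
            else:
                results[name] = ("SUCCESS", attempt)
                break
    return results


if __name__ == "__main__":
    attempts = {"ingest": 0}

    def ingest():  # fails once, then succeeds: a transient error
        attempts["ingest"] += 1
        if attempts["ingest"] < 2:
            raise RuntimeError("transient error")

    def audit():  # always fails
        raise RuntimeError("bad data")

    def noop():
        pass

    results = run_job(
        {"ingest": ingest, "transform": noop, "audit": audit, "publish": noop},
        {"transform": {"ingest"}, "publish": {"audit"}},
    )
    for name, outcome in sorted(results.items()):
        print(name, outcome)
    # audit ('FAILED', 2)
    # ingest ('SUCCESS', 2)
    # publish ('UPSTREAM_FAILED', 0)
    # transform ('SUCCESS', 1)
```

A skipped task is not SUCCESS, so the skip propagates transitively to everything downstream of a failed task. Meanwhile, independent branches of the graph still run.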
Sources
Questions are grounded in 100 references from official and authoritative materials.