Databricks Certified Associate Developer for Apache Spark
Validates understanding of the Apache Spark architecture and the ability to apply the Spark DataFrame API to complete data manipulation tasks within a Spark session. The exam covers Spark architecture and components, Spark SQL, DataFrame/DataSet API development, troubleshooting and tuning, Structured Streaming, Spark Connect, and the Pandas API on Spark. It consists of 45 scored multiple-choice questions to be completed in 90 minutes, with code presented in Python.
Exam domains
- Developing Apache Spark DataFrame/DataSet API Applications (30%)
Build PySpark DataFrame applications using select, filter, withColumn, drop, sort, groupBy, agg, join, union, and window operations, plus schema handling, missing-data functions (na.fill and na.drop), and partitioned reads and writes across JSON, CSV, Parquet, ORC, and Delta. Author Python and pandas UDFs and apply Spark SQL functions for column-level transformations.
- Using Spark SQL (20%)
Use Spark SQL to query and manipulate data with SELECT, JOIN, GROUP BY, window functions, CTEs, and built-in functions across temporary views and catalog tables. Apply DDL/DML on managed and external tables, register UDFs for SQL use, and leverage the Catalyst optimizer and Adaptive Query Execution (AQE) for efficient query planning.
- Apache Spark Architecture and Components (20%)
Understand the Apache Spark execution model, including the driver, executors, cluster managers, the job/stage/task hierarchy, and partitions. Topics include lazy evaluation, narrow vs. wide transformations and shuffles, fault tolerance via lineage, SparkSession/SparkContext entry points, deployment modes, and the Catalyst optimizer with Adaptive Query Execution (AQE).
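Lazy evaluation, lineage, and stage boundaries can be easier to see in a toy model than in a real cluster. The sketch below is plain Python, not Spark code. `Step` and `LazyDataset` are hypothetical names, and treating each wide step as the start of a new stage is a deliberate simplification of how shuffles split a job.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


# Toy model only: these classes are hypothetical and not part of Spark's API.
@dataclass(frozen=True)
class Step:
    name: str
    wide: bool  # True if the step would need a shuffle (e.g. distinct, groupBy)
    fn: Callable[[list], list]


@dataclass(frozen=True)
class LazyDataset:
    source: Tuple[int, ...]
    lineage: Tuple[Step, ...] = ()

    def _then(self, step: Step) -> "LazyDataset":
        # Transformations only extend the lineage; no data is touched here.
        return LazyDataset(self.source, self.lineage + (step,))

    def map(self, f: Callable) -> "LazyDataset":
        return self._then(Step("map", False, lambda rows: [f(r) for r in rows]))

    def filter(self, pred: Callable) -> "LazyDataset":
        return self._then(Step("filter", False, lambda rows: [r for r in rows if pred(r)]))

    def distinct(self) -> "LazyDataset":
        return self._then(Step("distinct", True, lambda rows: sorted(set(rows))))

    def stages(self) -> List[List[str]]:
        # Simplification: every wide step opens a new stage.
        stages: List[List[str]] = [[]]
        for step in self.lineage:
            if step.wide:
                stages.append([])
            stages[-1].append(step.name)
        return [stage for stage in stages if stage]

    def collect(self) -> list:
        # The action replays the whole lineage from the source data.
        rows = list(self.source)
        for step in self.lineage:
            rows = step.fn(rows)
        return rows


evaluated = []  # records every element the map function actually processes


def double(x: int) -> int:
    evaluated.append(x)
    return x * 2


ds = LazyDataset((1, 2, 2, 3)).map(double).filter(lambda x: x > 2).distinct().map(lambda x: x + 1)
calls_before_action = len(evaluated)  # still 0: nothing has run yet
plan = ds.stages()
result = ds.collect()
```

Because `collect()` recomputes from the source by replaying the recorded steps, the same lineage also stands in for the idea behind lineage-based fault tolerance: lost results can be rebuilt rather than restored from a copy.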
- Structured Streaming (10%)
Sources
Questions are grounded in 50 references from official and authoritative materials.