Databricks Certified Machine Learning Associate practice questions

Question 1

A retail data engineering team is being onboarded to the Databricks Feature Store and the ML engineer must explain what the Feature Store actually is for AI and ML models. Per the Databricks Feature Store documentation, which option matches?

Accepted Answer

The Databricks Feature Store provides a central registry for features used in your AI and ML models, and feature tables and models are registered in Unity Catalog, providing governance, lineage, and cross-workspace sharing.

Answer

The Databricks Feature Store is an on-demand TV streaming bundle marketed to retail customers and explicitly does not provide a registry, lineage, or governance for features used in any ML model on the platform anywhere.

Answer

The Databricks Feature Store is a free per-cluster screenshot service that captures notebook cell outputs and does not register or govern any features used in any AI or ML model across any Databricks workspace at all.

Answer

The Databricks Feature Store is a third-party Stripe billing connector for retail subscriptions and does not provide a registry, governance, lineage, or cross-workspace sharing for features used in any ML model at all.

Question 2

A data scientist is being onboarded to SparkML on Databricks at a healthcare diagnostics team and the ML engineer is being asked which classical-ML algorithms SparkML provides for classification, regression, clustering, and collaborative filtering. Per the Databricks SparkML pipeline documentation, which option matches?

Accepted Answer

Key SparkML algorithms include LogisticRegression, RandomForestClassifier, GBTClassifier, LinearRegression, RandomForestRegressor, KMeans, and ALS for collaborative filtering across DataFrames in the SparkML library.

Answer

SparkML does not provide any classification, regression, clustering, or collaborative-filtering algorithm, so the healthcare diagnostics team must reimplement LogisticRegression, RandomForest, KMeans, and ALS from scratch in each notebook.

Answer

SparkML only provides ALS for collaborative filtering and explicitly does not provide LogisticRegression, RandomForestClassifier, GBTClassifier, LinearRegression, RandomForestRegressor, or KMeans for any healthcare diagnostics ML model.

Answer

SparkML only provides KMeans for clustering and explicitly does not provide LogisticRegression, RandomForestClassifier, GBTClassifier, LinearRegression, RandomForestRegressor, or ALS for any healthcare diagnostics ML model anywhere.

Question 3

A platform team is designing time series feature tables on Databricks and the architect is evaluating which situations actually call for a time series feature table rather than a standard feature table. Per the Databricks point-in-time feature joins documentation, which TWO situations are appropriate use cases?

Accepted Answer

Event-based features whose values change over time as new user interactions keep getting recorded.

Accepted Answer

Time-aggregated features such as a rolling 30-day spend recomputed at successive points in time.

Answer

Static reference attributes such as a country-to-region lookup that never changes after it is first loaded.

Answer

A one-time dimension table joined only at inference with no temporal component to any of its feature values.

Question 4

An MLOps team is hardening machine-to-machine authentication for a credit-risk model serving endpoint and the architect is being asked which authentication option Databricks recommends as the security best practice for production environments. Per the Databricks model serving query documentation, which option matches?

Accepted Answer

For authentication, machine-to-machine OAuth tokens represent the security best practice for production environments, while service principal tokens are recommended for development on Databricks Model Serving query clients.

Answer

For authentication, Databricks recommends embedding a personal access token of the data scientist in a hard-coded string inside the credit-risk client application and rotating it manually once per quarter as the production best practice.

Answer

For authentication, Databricks Model Serving does not support OAuth or service principal tokens and explicitly requires every credit-risk client to call the inference endpoint without any authentication header in production environments.

Answer

For authentication, Databricks Model Serving supports only Stripe-managed API keys for credit-risk clients and explicitly does not recommend any OAuth or service principal token pattern in production or development environments.

Question 5

A developer is wiring up MLflow tracking on Databricks and must record model parameters and results for each individual execution of training code. What is that single execution called in MLflow?

Accepted Answer

A run, a single execution of model code that logs parameters and results.

Answer

An experiment, which is what MLflow names each individual code execution.

Answer

A model, since each execution produces exactly one artifact bundle by itself.

Answer

A registered version, created automatically for every training execution.

Question 6

An MLOps team is rolling out Optuna on Databricks Runtime 17.0 ML, where MLflow 3.0 is pre-installed, and the architect must select between APIs to launch parallel Optuna studies across the cluster's PySpark executors. Which option is correct?

Accepted Answer

Use MlflowSparkStudy, which launches parallel Optuna studies across the cluster's PySpark executors.

Answer

Use MlflowStorage, which fans Optuna trials out to PySpark executors and aggregates them on the driver.

Answer

Use SparkTrials, which is the dedicated Optuna class for distributing its studies over executor slots.

Answer

Use TorchDistributor, which wraps each Optuna study as a Spark job for executor-parallel tuning runs.

Question 7

An ML engineer is building a Lakeflow Job for retraining a fintech credit-risk model and the team is being asked which control-flow constructs jobs support in the visual authoring UI. Per the Databricks Lakeflow Jobs documentation, which TWO options describe supported control-flow constructs?

Accepted Answer

Jobs support custom control flow logic using branching with if / else statements, so the credit-risk retraining job can conditionally run training, evaluation, or rollback tasks based on the result of any earlier task in the job.

Accepted Answer

Jobs support custom control flow logic using looping with for each statements, so the credit-risk retraining job can iterate task execution across a list of segments or feature sets using a visual authoring UI.

Answer

Jobs support custom control flow logic only through a Stripe-managed pay-per-task billing meter and explicitly do not allow branching or looping in any visual authoring UI for any retraining workflow at all.

Answer

Jobs support custom control flow logic only through a deprecated cron-style scheduling string and explicitly do not support if / else branching or for each looping inside any visual authoring UI for the retraining workflow.

Question 8

An organization is running offline batch inference over a large Delta table of customer reviews already stored on Databricks, and the data team needs a built-in SQL approach to apply a model to every row without exporting the data. Per the Databricks batch inference documentation, which option best fits?

Accepted Answer

Run batch inference with task-specific AI Functions or the general-purpose ai_query function from SQL.

Answer

Stream every row through a Structured Streaming job that calls an external REST model one record at a time.

Answer

Export the table to object storage and score it with a separate pandas script running on a single driver.

Answer

Convert the table to a feature table and rely on Feature Store online lookups to return a prediction per row.

Question 9

An organization is being onboarded to the feature store concept at a healthcare ML group and the data scientist is being asked what a feature store actually is and what it ensures for trained ML models. Per the Databricks Feature Store concepts documentation, which option matches?

Accepted Answer

A feature store is a centralized repository that enables data scientists to find and share features, and using a feature store also ensures that the code used to compute feature values is the same during model training and inference.

Answer

A feature store is a per-user Jupyter scratch pad that explicitly cannot be shared between any data scientists and provides no guarantees that training-time feature code matches the code used when the model is used for inference.

Answer

A feature store is a one-time Excel export of feature values from a healthcare data warehouse and explicitly does not enable any data scientist to find or share features for any model used for training or inference.

Answer

A feature store is a per-workspace dashboard that displays only model accuracy graphs to the data scientists and explicitly does not centralize, share, or recompute any features used for training or inference.

Question 10

A data scientist is being onboarded to MLflow Tracking on Databricks at a fintech credit-risk team and the ML engineer is being asked what MLflow Tracking actually records for each training run so that runs are comparable across experiments. Per the Databricks MLflow documentation, which option matches?

Accepted Answer

MLflow Tracking records parameters, metrics, artifacts, and source code for each training run, enabling comparison across experiments in any credit-risk experimentation workflow on any Databricks Runtime in the workspace.

Answer

MLflow Tracking only records a static screenshot of the notebook and explicitly does not record any parameters, metrics, artifacts, or source code for any credit-risk training run on any Databricks Runtime release in the workspace.

Answer

MLflow Tracking only records the cluster's idle CPU temperature for each credit-risk training run and explicitly does not record any parameters, metrics, artifacts, or source code across any of the team's MLflow experiments at all.

Answer

MLflow Tracking only records the timestamp of the credit-risk training notebook and explicitly does not record any parameters, metrics, artifacts, or source code across any of the team's MLflow experiments on Databricks at all.

Question 11

An organization is formalizing an MLOps practice on Databricks and the architect must state which disciplines MLOps brings together. Per the Databricks MLOps workflows documentation, which option matches?

Accepted Answer

It combines DevOps, DataOps, and ModelOps into one set of processes and automated steps.

Answer

It combines DevOps and DataOps only, leaving model concerns to a separate ModelOps team.

Answer

It combines DataOps and ModelOps but treats DevOps as out of scope for ML systems.

Answer

It combines DevOps with feature engineering and AutoML in place of DataOps practices.

Question 12

A startup is integrating an existing application that already uses the OpenAI client with a chat model now hosted on a Databricks Model Serving endpoint, and the team wants minimal code change. Per the Databricks Model Serving query methods documentation, which option fits?

Accepted Answer

Use the OpenAI client and pass the Model Serving endpoint name as the model input to query the chat model.

Answer

Use the MLflow Deployments SDK predict() call after rewriting the app to drop its existing OpenAI client.

Answer

Call the Serving UI Query endpoint button programmatically from the application to stream chat completions.

Answer

Install the feature-engineering client and read the chat responses from the endpoint's online feature table.

Question 13

An organization is rolling out a shared feature platform so that multiple data science teams stop independently re-implementing the same features. Per the Databricks Feature Store concepts documentation, which option matches?

Accepted Answer

A feature store is a centralized repository that lets data scientists find and share features across teams.

Answer

A feature store is a model registry that stores trained model artifacts and their full deployment stage history.

Answer

A feature store is a notebook scheduler that recomputes features on a fixed cron without any human review.

Answer

A feature store is a dashboarding layer that visualizes feature drift for stakeholders after deployment.

Question 14

An MLOps team is scaling a PyTorch model across a Spark cluster with TorchDistributor, and the engineer is being asked which CLI command TorchDistributor invokes under the hood to coordinate training across the worker nodes. Per the Databricks distributed training documentation, which option matches?

Accepted Answer

torch.distributed.run, which TorchDistributor uses to run training across the worker nodes

Answer

spark-submit, which TorchDistributor wraps to broadcast the model to each cluster executor

Answer

mlflow run, which TorchDistributor invokes once per worker to launch the training job

Answer

horovodrun, which TorchDistributor calls to set up the all-reduce communication ring

Question 15

An ML platform engineer at a fintech credit-risk shop is evaluating a Git-based source pattern for Lakeflow Jobs and the architect is being asked about the trade-offs of using a remote Git repository as the source for job tasks. Per the Databricks CI/CD documentation, which option matches?

Accepted Answer

Git with jobs lets you configure some job types to use a remote Git repository as the source for code files, but only code files are source-controlled, not task sequences, compute settings, or schedules, limiting multi-environment use.

Answer

Git with jobs lets you source-control every aspect of every job in every workspace, including task sequences, compute settings, and schedules, with no documented restriction on multi-environment, cross-workspace deployments at all.

Answer

Git with jobs is a deprecated dbutils helper that is no longer supported on Databricks and explicitly cannot use any remote Git repository as the source for any code files of any Lakeflow Job task in the workspace anywhere.

Answer

Git with jobs is only supported for warehouse SQL workloads at fintech credit-risk teams, and explicitly cannot use any remote Git repository as the source for any code files of any Lakeflow Job task in the workspace at all.

Databricks Certified Machine Learning Associate

Sample questions

Sources

Similar exams