AWS Certified Machine Learning Engineer - Associate practice questions

Question 1

A data engineering team is designing a SageMaker Feature Store offline store and must choose a storage format. Select the TWO statements that correctly describe the offline store formats supported per the documentation.

Accepted Answer

Per the documentation, "Amazon SageMaker Feature Store supports the AWS Glue and Apache Iceberg table formats for the offline store" — these two formats are exactly the supported choices.

Accepted Answer

Per the documentation, "Feature Store only supports the Parquet file format when writing your data to your offline store" — Parquet is the documented file format underneath the chosen table format.

Answer

Delta Lake is one of the documented table formats supported by the offline store. Delta Lake is not in the documented list; AWS Glue and Apache Iceberg are. The documentation contradicts this option and supports the accepted answers instead.

Answer

ORC is supported as a file format for the offline store alongside Parquet. Only Parquet is documented as the offline store file format. The documentation contradicts this option and supports the accepted answers instead.
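
A minimal sketch, assuming the boto3 `create_feature_group` request shape in which `OfflineStoreConfig` carries an `S3StorageConfig` and a `TableFormat` field. The hypothetical helper below builds only the offline-store fragment and limits the table format to the two documented choices. The function name and bucket URI are placeholders, and no AWS call is made.

```python
# Hypothetical helper: builds the OfflineStoreConfig fragment of a
# CreateFeatureGroup request. It does not call AWS.
DOCUMENTED_TABLE_FORMATS = {"Glue", "Iceberg"}


def build_offline_store_config(s3_uri: str, table_format: str = "Glue") -> dict:
    """Return an OfflineStoreConfig dict for one of the documented table formats."""
    if table_format not in DOCUMENTED_TABLE_FORMATS:
        # Formats such as Delta Lake are not documented for the offline store.
        raise ValueError(f"unsupported table format: {table_format!r}")
    if not s3_uri.startswith("s3://"):
        raise ValueError("offline store location must be an s3:// URI")
    return {
        "S3StorageConfig": {"S3Uri": s3_uri},
        # The table format is the chosen layer; the files underneath are Parquet.
        "TableFormat": table_format,
    }


config = build_offline_store_config("s3://example-bucket/feature-store", "Iceberg")
```

The resulting dict would be passed as `OfflineStoreConfig=` to `create_feature_group`, together with the other required parameters that are omitted here.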

Question 2

An ML engineer must decide how to scale a SageMaker AI deep learning training job that involves a very large model and a very large dataset. Which approach does the documentation recommend?

Accepted Answer

Use SageMaker AI distributed training strategies for data parallelism, model parallelism, or both.

Answer

Switch to managed Spot instances, which the documentation calls the recommended path for scaling very large training jobs.

Answer

Run automatic model tuning, since hyperparameter search is how very large models are scaled up.

Answer

Enable Experiments tracking, which scales training across instances as the run count increases.

Question 3

A security engineer asks a platform team how to confine SageMaker AI training jobs to private subnets in a customer VPC that has no public internet access. Which option matches the documented configuration?

Accepted Answer

Per the documentation, "To control access to your training jobs, run them in an Amazon VPC with private subnets that don't have internet access. You configure the training job to run in the VPC by specifying its subnets and security group IDs. You don't need to specify the subnet for the container of the training job", which is the documented configuration the security engineer is asking for.

Answer

The customer must specify the subnet for the training container explicitly; SageMaker AI will not provision the ENI automatically. The customer specifies VPC subnets and security group IDs; SageMaker AI provisions the network interfaces and attaches them to the training containers. The documentation contradicts this option and supports the accepted answer instead.

Answer

Training jobs cannot run in private subnets; they always require an internet-routable subnet. Private subnets without internet access are explicitly supported for training jobs. The documentation contradicts this option and supports the accepted answer instead.

Answer

Training jobs require the customer to copy the training container image into a public S3 bucket because SageMaker cannot pull from ECR in a VPC. SageMaker AI automatically pulls the training container image from Amazon ECR even when the job runs in a VPC. The documentation contradicts this option and supports the accepted answer instead.
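
A minimal sketch of the documented configuration, assuming the `VpcConfig` structure of the `CreateTrainingJob` request (`SecurityGroupIds` and `Subnets`). The subnet and security group IDs are placeholders, and the helper name is hypothetical. The point it illustrates is that the request names only subnets and security groups and never a per-container subnet.

```python
# Hypothetical helper: builds the VpcConfig fragment of a CreateTrainingJob
# request. IDs are placeholders, and no AWS call is made.
def training_job_vpc_config(subnet_ids: list[str], security_group_ids: list[str]) -> dict:
    if not subnet_ids:
        raise ValueError("at least one private subnet ID is required")
    if not security_group_ids:
        raise ValueError("at least one security group ID is required")
    # Only subnets and security groups are specified; SageMaker AI creates and
    # attaches the network interfaces for the training containers itself.
    return {
        "VpcConfig": {
            "SecurityGroupIds": list(security_group_ids),
            "Subnets": list(subnet_ids),
        }
    }


fragment = training_job_vpc_config(
    subnet_ids=["subnet-0example1", "subnet-0example2"],
    security_group_ids=["sg-0example"],
)
```

In a real request, this fragment would be merged with the other `create_training_job` parameters on a boto3 SageMaker client.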

Question 4

An architect is designing a way to standardize CI/CD environments across data science and MLOps teams using reproducible templates. Which option matches the documented SageMaker Projects positioning?

Accepted Answer

Per the documentation, "SageMaker Projects help organizations set up and standardize developer environments for data scientists and CI/CD systems for MLOps engineers. Projects also help organizations set up dependency management, code repository management, build reproducibility, and artifact sharing", which is the standardization surface the architect is designing for.

Answer

SageMaker Projects supports only AWS Service Catalog templates and rejects custom templates stored in Amazon S3. Projects can be provisioned using custom templates stored in Amazon S3 buckets or templates from the AWS Service Catalog. The documentation contradicts this option and supports the accepted answer instead.

Answer

SageMaker Projects are limited to model deployment automation and cannot set up developer environments. Projects standardize developer environments for data scientists as well as CI/CD for MLOps engineers. The documentation contradicts this option and supports the accepted answer instead.

Answer

SageMaker Projects do not include source control or automated ML pipelines, so these must be assembled manually. SageMaker AI-provided templates bootstrap source version control, automated ML pipelines, and starter code. The documentation contradicts this option and supports the accepted answer instead.

Question 5

A company is processing very large datasets with Apache Spark and Apache Hadoop and wants a managed cluster platform to transform the data and move it in and out of Amazon S3 and Amazon DynamoDB. Which service fits?

Accepted Answer

Amazon EMR, a managed cluster platform that runs big data frameworks such as Apache Hadoop and Apache Spark and moves data into and out of AWS data stores such as Amazon S3 and Amazon DynamoDB.

Answer

AWS Glue DataBrew, a no-code visual data-preparation tool that applies cleaning recipes but does not give you a managed Apache Spark or Hadoop cluster for your own processing.

Answer

Amazon Athena, a serverless SQL engine for querying S3 that does not provide a managed Spark or Hadoop processing cluster.

Answer

Amazon Kinesis Data Streams, a real-time ingestion service built for streaming records rather than batch big-data cluster processing.

Question 6

A company has many iterations of training runs and needs to compare parameters, configurations, and metrics across them. Which SageMaker AI capability is purpose-built for this?

Accepted Answer

SageMaker Experiments, which tracks the inputs, parameters, and results of runs for comparison.

Answer

SageMaker Clarify, which produces SHAP attributions to explain individual model predictions.

Answer

SageMaker Debugger, which captures training tensors to diagnose non-converging training jobs.

Answer

SageMaker Autopilot, which automatically builds and ranks candidate models from raw tabular data.

Question 7

A team runs a single-model SageMaker real-time endpoint and needs per-instance and per-GPU CloudWatch visibility, including InstanceId and AcceleratorId dimensions, to diagnose one hot instance. Which capability provides this?

Accepted Answer

SageMaker enhanced metrics, which add InstanceId, ContainerId, and AcceleratorId dimensions for per-instance, per-container, and per-GPU visibility.

Answer

Standard SageMaker invocation metrics, which by default already break down latency and errors by InstanceId and AcceleratorId across every endpoint instance.

Answer

AWS CloudTrail logs, which add per-instance and per-accelerator dimensions to the endpoint's CloudWatch metric namespace for analysis.

Answer

SageMaker Model Dashboard, which surfaces per-instance and per-GPU utilization by combining the data-quality and bias-drift report outputs.

Question 8

A production team is rolling out a model update with SageMaker AI blue/green deployment guardrails and must select the traffic-shifting modes that limit the blast radius of a regressive update compared with shifting all traffic at once. Which TWO modes apply?

Accepted Answer

Canary, which shifts a small portion of traffic first and confines the blast radius to the canary fleet

Accepted Answer

Linear, which shifts traffic in equal increments and gives granular control over the course of the update

Answer

All at once, which shifts every request to the new fleet in a single step so regressions hit all traffic

Answer

Auto-rollback, which reverts to the previous model automatically once a configured alarm has fired
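
The difference between the modes can be sketched in Python. The sketch assumes the `DeploymentConfig` shape used by `UpdateEndpoint`: a `BlueGreenUpdatePolicy` whose `TrafficRoutingConfiguration` has a `Type` of `ALL_AT_ONCE`, `CANARY`, or `LINEAR`. The schedule function is a simplified, hypothetical model of how much traffic has moved to the new fleet after each step, and it ignores wait intervals and alarms. The alarm name is a placeholder. Auto-rollback sits in its own `AutoRollbackConfiguration` rather than in the routing `Type`, which is why it is not a traffic-shifting mode.

```python
# Simplified model, not SageMaker's implementation: cumulative percentage of
# traffic routed to the new (green) fleet after each shifting step.
def traffic_schedule(mode: str, step_percent: int = 10) -> list[int]:
    if mode == "ALL_AT_ONCE":
        return [100]  # every request moves in one step
    if not 0 < step_percent < 100:
        raise ValueError("step_percent must be between 1 and 99")
    if mode == "CANARY":
        return [step_percent, 100]  # small canary slice first, then the rest
    if mode == "LINEAR":
        steps, shifted = [], 0
        while shifted < 100:
            shifted = min(100, shifted + step_percent)  # equal increments
            steps.append(shifted)
        return steps
    raise ValueError(f"unknown traffic routing type: {mode!r}")


# Assumed DeploymentConfig shape for a linear rollout with alarm-based rollback.
deployment_config = {
    "BlueGreenUpdatePolicy": {
        "TrafficRoutingConfiguration": {
            "Type": "LINEAR",
            "LinearStepSize": {"Type": "CAPACITY_PERCENT", "Value": 20},
            "WaitIntervalInSeconds": 300,
        },
    },
    "AutoRollbackConfiguration": {
        "Alarms": [{"AlarmName": "example-endpoint-5xx-alarm"}],  # placeholder
    },
}
```

With a 20% linear step, a regression is exposed to one fifth of traffic at the first step. With all at once, it is exposed to all traffic immediately.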

Question 9

A developer who is new to SageMaker Feature Store asks which scalar feature types the Feature Store schema supports for individual feature definitions. Which option matches the documented supported types?

Accepted Answer

Per the documentation, "Feature Store supports the following feature types: String, Fractional (IEEE 64-bit floating point value), and Integral (Int64 - 64 bit signed integral value). The default type is set to String", which is exactly the type set the developer can choose from.

Answer

Feature Store supports only String types; numeric values must be serialized as JSON strings before ingestion. Fractional (IEEE 64-bit) and Integral (Int64) types are explicitly supported alongside String. The documentation contradicts this option and supports the accepted answer instead.

Answer

Feature Store supports nested array and struct types so engineers can store list-of-floats embeddings directly. The documented supported types are scalar String, Fractional, and Integral; nested arrays are not in this list. The documentation contradicts this option and supports the accepted answer instead.

Answer

Feature Store supports Decimal128 and Binary types in addition to the standard scalar feature types. The documented types are String, Fractional, and Integral; Decimal128 and Binary are not on the supported list. The documentation contradicts this option and supports the accepted answer instead.
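
A minimal sketch, assuming the `FeatureDefinitions` list shape used by `create_feature_group`, where each entry has a `FeatureName` and a `FeatureType` of `String`, `Fractional`, or `Integral`. The hypothetical helper maps a sample Python record onto those three types. It rejects anything else, such as lists, which this question rules out. Rejecting booleans and checking the Int64 range are illustrative choices, not documented behavior.

```python
# Hypothetical helper: derives FeatureDefinitions from one sample record,
# using only the three documented scalar types.
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def infer_feature_type(value) -> str:
    if isinstance(value, bool):
        # bool is a subclass of int in Python; refuse it rather than guess a type.
        raise TypeError("booleans have no dedicated feature type; encode them explicitly")
    if isinstance(value, int):
        if not INT64_MIN <= value <= INT64_MAX:
            raise TypeError("Integral features are 64-bit signed integers")
        return "Integral"
    if isinstance(value, float):
        return "Fractional"  # Python floats are IEEE 64-bit values
    if isinstance(value, str):
        return "String"
    raise TypeError(f"unsupported feature value type: {type(value).__name__}")


def feature_definitions(record: dict) -> list[dict]:
    return [
        {"FeatureName": name, "FeatureType": infer_feature_type(value)}
        for name, value in record.items()
    ]
```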

Question 10

A marketing team wants to group customers into segments based on attributes the team selects, and it has no labeled outcomes. Which SageMaker AI algorithm fits this unsupervised task?

Accepted Answer

K-Means, an unsupervised algorithm that groups records by similarity over attributes you define.

Answer

DeepAR, which forecasts scalar time series jointly across many related cross-sectional data units.

Answer

Linear Learner, a supervised algorithm that requires labeled targets (binary, multiclass, or numeric) to fit a model.

Answer

Object Detection, which detects and classifies object instances in an image and draws a bounding box around each one.
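
The sketch below makes the unsupervised idea concrete with a minimal pure-Python version of classic k-means (Lloyd's algorithm). Each record is assigned to its nearest centroid, each centroid moves to the mean of its members, and the loop repeats until the centroids stop moving. It is not the SageMaker AI built-in K-Means implementation. Seeding the centroids with the first `k` points is a simplifying assumption. Note that no labels appear anywhere in the input.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iterations: int = 100) -> Tuple[List[int], List[Point]]:
    """Cluster unlabeled points into k groups; returns (labels, centroids)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Simplifying assumption: seed the centroids with the first k points.
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids
```

For example, with two customer attributes per record, `kmeans([(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)], k=2)` separates the three low-valued records from the three high-valued ones.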

Question 11

Auditors have asked an organization to produce a record showing who called which SageMaker API, from which IP address, and when, for a compliance review. Which service provides this record?

Accepted Answer

AWS CloudTrail, which captures all SageMaker API calls and records the request, the source IP address, the time, and additional details.

Answer

Amazon EventBridge, which delivers near-real-time status-change events that include the caller identity and the source IP for each API request.

Answer

Amazon CloudWatch, which stores fifteen months of metrics that can be queried to reconstruct who invoked each SageMaker API call.

Answer

SageMaker Data Capture, which logs each API caller's identity and IP address alongside the inference request payloads in S3.
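
A minimal sketch of turning CloudTrail records into the view the auditors asked for. It assumes the standard CloudTrail record fields (`eventSource`, `eventName`, `sourceIPAddress`, `eventTime`, `userIdentity.arn`) and a log file that has already been downloaded and parsed from its `{"Records": [...]}` JSON. Fetching logs from S3 or querying CloudTrail is left out. The helper name and output keys are hypothetical.

```python
# Hypothetical helper: extracts who/what/where/when for SageMaker API calls
# from one parsed CloudTrail log file.
def sagemaker_audit_rows(log_file: dict) -> list[dict]:
    rows = []
    for record in log_file.get("Records", []):
        if record.get("eventSource") != "sagemaker.amazonaws.com":
            continue  # keep only SageMaker API calls
        rows.append(
            {
                "who": record.get("userIdentity", {}).get("arn"),
                "api": record.get("eventName"),
                "source_ip": record.get("sourceIPAddress"),
                "when": record.get("eventTime"),
            }
        )
    return rows
```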

Question 12

An MLOps engineer is tasked with picking a purpose-built workflow orchestrator that natively integrates with all SageMaker features for ML automation. Which option matches the documented SageMaker Pipelines positioning?

Accepted Answer

Per the documentation, "Amazon SageMaker Pipelines is a purpose-built workflow orchestration service to automate machine learning (ML) development. Pipelines provide auto-scaling serverless infrastructure: you don't need to manage the underlying orchestration infrastructure to run Pipelines", which is the orchestrator the MLOps engineer needs.

Answer

SageMaker Pipelines requires customers to operate a self-managed Apache Airflow cluster underneath. Pipelines run on auto-scaling serverless infrastructure, so no cluster management is required. The documentation contradicts this option and supports the accepted answer instead.

Answer

SageMaker Pipelines cannot track the lineage or versions of pipeline runs. Pipelines support versioning and ML Lineage Tracking for auditability. The documentation contradicts this option and supports the accepted answer instead.

Answer

SageMaker Pipelines only integrates with the SageMaker training API and cannot orchestrate processing, evaluation, or deployment jobs. Pipelines integrate seamlessly with all SageMaker AI features and other AWS services for data processing, training, evaluation, deployment, and monitoring. The documentation contradicts this option and supports the accepted answer instead.

Question 13

A team is designing a SageMaker Ground Truth bounding-box labeling job and the platform engineer asks what prerequisite the input image S3 bucket must satisfy before the job can run. Which option matches the documented requirement?

Accepted Answer

Per the documentation, "Before you create a labeling job, you must upload your dataset to an Amazon S3 bucket. Ground Truth requires all S3 buckets that contain labeling job input image data have a CORS policy attached", which is the gating prerequisite the platform engineer is looking for.

Answer

Ground Truth requires every input bucket to have Object Lock enabled in compliance mode before a labeling job runs. The documented prerequisite is a CORS policy on the bucket, not Object Lock. The documentation contradicts this option and supports the accepted answer instead.

Answer

Ground Truth requires the input images to be uploaded directly to a SageMaker-managed bucket rather than a customer bucket. Customer S3 buckets are supported; the prerequisite is a CORS policy, not a managed bucket. The documentation contradicts this option and supports the accepted answer instead.

Answer

Ground Truth bypasses S3 entirely and reads input images from an Amazon EFS share mounted on the workforce instance. Input data must live in S3, not EFS, and the bucket needs a CORS policy. The documentation contradicts this option and supports the accepted answer instead.
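
A minimal sketch of the prerequisite. The CORS rule below is modeled on the one shown in the Ground Truth documentation: it allows cross-origin `GET` requests so the labeling UI in a worker's browser can load the images. Verify the rule against the current docs before relying on it. The dict is shaped for the `CORSConfiguration` parameter of the S3 `PutBucketCors` API. The helper names are hypothetical, and no AWS call is made.

```python
# Hypothetical helpers: build and roughly check an S3 CORS configuration for a
# Ground Truth input bucket. Nothing here calls AWS.
def ground_truth_cors_configuration() -> dict:
    # One rule: any origin may GET objects, and responses expose the
    # Access-Control-Allow-Origin header to the browser-based labeling UI.
    return {
        "CORSRules": [
            {
                "AllowedHeaders": [],
                "AllowedMethods": ["GET"],
                "AllowedOrigins": ["*"],
                "ExposeHeaders": ["Access-Control-Allow-Origin"],
            }
        ]
    }


def allows_cross_origin_get(cors_configuration: dict) -> bool:
    """Rough pre-flight check that some rule permits cross-origin GET requests."""
    return any(
        "GET" in rule.get("AllowedMethods", []) and bool(rule.get("AllowedOrigins"))
        for rule in cors_configuration.get("CORSRules", [])
    )
```

With credentials configured, the configuration could be applied through boto3's S3 client with `put_bucket_cors(Bucket="your-input-bucket", CORSConfiguration=ground_truth_cors_configuration())`, where the bucket name is a placeholder.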

Question 14

A junior engineer is tasked with explaining how the SageMaker AI algorithm documentation frames the kind of task ML is well-suited for. Which option matches the documented framing?

Accepted Answer

Per the documentation, "Machine learning can help you accomplish empirical tasks that require some sort of inductive inference. This task involves induction as it uses data to train algorithms to make generalizable inferences. This means that the algorithms can make statistically reliable predictions or decisions, or complete other tasks when applied to new data that was not used to train them", which is how the docs frame the kind of task ML targets.

Answer

Machine learning is presented as a purely deductive system that follows hand-written rules without learning from data. The documentation frames ML as inductive inference learned from data, not deductive rule-following. The documentation contradicts this option and supports the accepted answer instead.

Answer

ML in the SageMaker documentation is restricted to deep learning and cannot include classical statistical methods. The documentation discusses a broad range of algorithms classified at various levels of abstraction, not deep learning only. The documentation contradicts this option and supports the accepted answer instead.

Answer

The documentation states that ML models can only be evaluated on data they were trained on, never on new data. The documentation explicitly emphasizes generalization to new data not used in training. The documentation contradicts this option and supports the accepted answer instead.

Question 15

A platform team must let one group of users manage one set of SageMaker resources and a different group manage another set, partitioning access by attributes rather than by enumerating every resource ARN. Which IAM approach fits?

Accepted Answer

Use tags with ResourceTag conditions in IAM policies to implement attribute-based access control (ABAC) for each group.

Answer

Attach the AmazonSageMakerFullAccess managed policy to each group, which scopes resources automatically by the user's group membership.

Answer

Use a separate SageMaker execution role per group, since the execution role determines which users may call the SageMaker management APIs.

Answer

Enable network isolation on each resource so that only the owning group's VPC subnets are permitted to reach that SageMaker resource.
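
A minimal sketch of the ABAC pattern as an IAM policy document built in Python. It assumes each user or role carries a principal tag and each SageMaker resource carries a resource tag with the same key. The key `team`, the action list, and the helper name are hypothetical choices. The condition compares `aws:ResourceTag/<key>` with the caller's `${aws:PrincipalTag/<key>}` policy variable, so a single policy serves every group without listing resource ARNs. Creating new tagged resources would also need `aws:RequestTag` conditions, which are omitted here.

```python
import json


def sagemaker_abac_policy(tag_key: str = "team") -> dict:
    """IAM policy letting a principal act only on SageMaker resources whose tag matches its own."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ManageOwnGroupsSageMakerResources",
                "Effect": "Allow",
                # Illustrative action list; scope it to what each group actually needs.
                "Action": [
                    "sagemaker:Describe*",
                    "sagemaker:Update*",
                    "sagemaker:Stop*",
                    "sagemaker:Delete*",
                ],
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        # The resource tag must equal the caller's principal tag.
                        f"aws:ResourceTag/{tag_key}": f"${{aws:PrincipalTag/{tag_key}}}"
                    }
                },
            }
        ],
    }


print(json.dumps(sagemaker_abac_policy(), indent=2))
```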
