AWS Certified Machine Learning Engineer - Associate
Validates ability to build, deploy, operationalize, and maintain machine learning solutions and pipelines on AWS. Covers data preparation for ML, model development and training, deployment and orchestration of ML workflows, and ML solution monitoring, maintenance, and security. Emphasizes Amazon SageMaker for end-to-end ML lifecycle management including feature engineering, model training, inference deployment, and MLOps practices.
Exam domains
- Data Preparation for Machine Learning (ML): 28%
Ingest and store data (Amazon S3 as data lake for ML, Amazon FSx for Lustre for HPC training data, EFS for shared notebook storage, EBS for SageMaker training instances; data ingestion - AWS Glue for ETL, Kinesis Data Firehose for streaming, AWS DMS for database to S3, AWS DataSync; data format selection - Parquet/ORC for analytics, RecordIO/Protobuf for SageMaker built-in algorithms, JSON Lines for SageMaker Pipelines).
Transform data and perform feature engineering (Amazon SageMaker Data Wrangler for visual data prep - 300+ built-in transformations, custom transformations in PySpark/Pandas/SQL; SageMaker Processing for distributed data processing - SKLearn/Spark/PyTorch/TF containers; AWS Glue DataBrew for no-code prep; feature engineering techniques - normalization/standardization, one-hot encoding, target encoding, binning, polynomial features, datetime decomposition, missing value imputation, outlier handling; SageMaker Feature Store - online + offline stores, feature groups, point-in-time queries, automatic data refresh).
Ensure data integrity and prepare data for modeling (data validation rules, schema drift detection with AWS Glue Data Quality, SageMaker Clarify for pre-training bias detection across protected attributes; train/validation/test splits, stratified sampling, cross-validation; class balancing - SMOTE, oversampling/undersampling; data labeling with SageMaker Ground Truth - human labeling workflows, automated data labeling, label adjudication).
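Three of the techniques named in this domain (standardization, one-hot encoding, and stratified train/test splitting) can be sketched in plain Python. This is an illustrative sketch using only the standard library, not how SageMaker Data Wrangler or SageMaker Processing implement them; the function names and the `test_fraction` and `seed` parameters are assumptions made for the example.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant feature carries no signal; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(categories):
    """Encode each category as a 0/1 vector over the sorted distinct values."""
    vocab = sorted(set(categories))
    return [[1 if c == v else 0 for v in vocab] for c in categories]


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx), drawing test_fraction from every class.

    Each class contributes at least one test example, so a class with a
    single sample ends up only in the test set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, test_idx = [], []
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)
```

Splitting per class, rather than shuffling the whole dataset once, keeps the class ratio in the test set close to the ratio in the full data, which matters most when one class is rare.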
- ML Model Development: 26%
Choose a modeling approach (problem framing - classification/regression/clustering/anomaly detection/time-series forecasting/recommendation; SageMaker built-in algorithms - XGBoost, Linear Learner, k-NN, BlazingText, DeepAR, Object Detection, Image Classification, IP Insights, Neural Topic Model, Factorization Machines, Random Cut Forest; SageMaker JumpStart - 300+ pre-trained foundation models, including LLMs and text-to-image models; custom algorithms via Bring Your Own Container; framework support - PyTorch/TensorFlow/MXNet/SKLearn/HuggingFace pre-built containers; transfer learning vs training from scratch).
Train and refine models (SageMaker Training Jobs - instance types including GPU - p3/p4d/g5 and AWS Trainium - trn1, Spot training for cost savings, distributed training - data parallel/model parallel/sharded data parallel; SageMaker AMT - Automatic Model Tuning with Hyperband and Bayesian search strategies; SageMaker Debugger for tensor inspection and profiling; SageMaker Experiments for tracking and comparing runs; SageMaker Pipelines for ML workflows; SageMaker Training Compiler for compile-time optimization).
Analyze model performance (evaluation metrics - accuracy/precision/recall/F1/AUC-ROC for classification, MAE/MSE/RMSE/MAPE for regression, BLEU/ROUGE for NLP, recall@K for recommendation; confusion matrix; SageMaker Clarify post-training bias and SHAP feature attributions; SageMaker Experiments comparison; cross-validation results aggregation; model cards for documentation).
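Several of the evaluation metrics listed above follow directly from the confusion matrix or from prediction errors. The sketch below computes binary precision, recall, and F1, plus MAE and RMSE, in plain Python. The function names and the `positive` parameter are assumptions made for illustration, and returning 0.0 when a denominator is zero is a convention chosen here, not a universal rule.

```python
import math


def binary_confusion(y_true, y_pred, positive=1):
    """Count (tp, fp, fn, tn) with `positive` treated as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1); undefined ratios are reported as 0.0."""
    tp, fp, fn, _ = binary_confusion(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large errors more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

Precision and recall ignore true negatives, which is why they are usually preferred over accuracy on imbalanced classification problems.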
Sources
Questions are grounded in 50 references from official and authoritative materials.