AWS ML Associate Exam — Index & Guide

AWS Certified Machine Learning — Associate (MLA-C01)

Comprehensive exam notes. Each domain has a dedicated file.


Exam Overview

| | |
|---|---|
| Exam Code | MLA-C01 |
| Duration | 170 minutes |
| Questions | ~65 (multiple choice + multiple response) |
| Passing Score | 720 / 1000 |
| Cost | $150 USD |
| Level | Associate |

Domain Breakdown

| Domain | Weight | File |
|---|---|---|
| 1. Data Preparation for ML | 28% | [[01 - Data Preparation]] |
| 2. ML Model Development | 26% | [[02 - Model Development]] |
| 3. Deployment & Orchestration | 22% | [[03 - Deployment & MLOps]] |
| 4. Monitoring, Maintenance & Security | 24% | [[04 - Monitoring & Security]] |
| ML Fundamentals (cross-domain) | n/a | [[05 - ML Fundamentals]] |
| AWS AI Services (cross-domain) | n/a | [[06 - AWS AI Services]] |

Core Service Map

```mermaid
graph TD
    Data[Data Layer] --> S3[S3<br/>Storage]
    Data --> Glue[AWS Glue<br/>ETL + Catalog]
    Data --> Kinesis[Kinesis<br/>Streaming]
    Data --> Redshift[Redshift<br/>Data Warehouse]
    Data --> Athena[Athena<br/>SQL on S3]

    SM[SageMaker] --> Studio[Studio<br/>IDE]
    SM --> GT[Ground Truth<br/>Labeling]
    SM --> FS[Feature Store]
    SM --> Train[Training Jobs]
    SM --> HPO[Hyperparameter<br/>Tuning]
    SM --> Deploy[Endpoints<br/>Real-time / Batch]
    SM --> Pipe[Pipelines<br/>MLOps]
    SM --> MR[Model Registry]
    SM --> MM[Model Monitor]
    SM --> Clarify[Clarify<br/>Bias + Explainability]

    AI[AI Services] --> Rek[Rekognition]
    AI --> Comp[Comprehend]
    AI --> Tx[Textract]
    AI --> Bed[Bedrock<br/>GenAI]

    style SM fill:#dbeafe,stroke:#3b82f6,stroke-width:2px
    style Data fill:#dcfce7,stroke:#16a34a
    style AI fill:#fef9c3,stroke:#ca8a04
```

The "When to Use What" Master Table

| Need | Service / Approach |
|---|---|
| Store training data | S3 |
| Catalog and discover data | AWS Glue Data Catalog |
| ETL pipeline for ML data | AWS Glue |
| Query S3 data with SQL | Athena |
| Streaming data ingestion | Kinesis Data Streams / Firehose |
| Data warehousing | Redshift |
| Label training data | SageMaker Ground Truth |
| Interactive ML notebook | SageMaker Studio / Notebook Instance |
| Pre-built ML environment | SageMaker Studio Lab |
| Train ML model | SageMaker Training Job |
| Tune hyperparameters | SageMaker Automatic Model Tuning (HPO) |
| Feature engineering + reuse | SageMaker Feature Store |
| Pre-trained algorithms | SageMaker Built-in Algorithms |
| Pre-trained model (fine-tune) | SageMaker JumpStart |
| Real-time inference | SageMaker Real-time Endpoint |
| Batch inference | SageMaker Batch Transform |
| Low-traffic / serverless inference | SageMaker Serverless Inference |
| Async large-payload inference | SageMaker Asynchronous Inference |
| Version and approve models | SageMaker Model Registry |
| Orchestrate ML pipeline | SageMaker Pipelines |
| Detect data / model drift | SageMaker Model Monitor |
| Detect bias | SageMaker Clarify |
| Explain predictions | SageMaker Clarify (SHAP) |
| No-code ML | SageMaker Canvas |
| Image/video analysis | Rekognition |
| Text analysis / NLP | Comprehend |
| Document extraction | Textract |
| Speech to text | Transcribe |
| Text to speech | Polly |
| Translation | Translate |
| Time series forecasting | Forecast |
| Recommendations | Personalize |
| Enterprise search | Kendra |
| Chatbot | Lex |
| Foundation models / GenAI | Bedrock |
| Anomaly detection | Random Cut Forest (SageMaker) |
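The inference rows of the table can be collapsed into a small decision rule. The sketch below is illustrative only: the function name and thresholds are my own, not an AWS API, though the ~6 MB real-time payload cap is a real SageMaker limit.

```python
def pick_inference_option(latency_sensitive: bool, traffic: str,
                          payload_mb: float = 1.0) -> str:
    """Map a workload profile to the SageMaker inference option the
    table above suggests. traffic: "steady", "spiky", or "offline"."""
    if traffic == "offline":
        return "Batch Transform"          # no persistent endpoint needed
    if payload_mb > 6:                    # real-time payload cap is ~6 MB
        return "Async Inference"          # queues large / slow requests
    if traffic == "spiky" or not latency_sensitive:
        return "Serverless Inference"     # scales to zero, pay per use
    return "Real-time Endpoint"           # steady, low-latency traffic

print(pick_inference_option(True, "steady"))      # Real-time Endpoint
print(pick_inference_option(False, "spiky"))      # Serverless Inference
print(pick_inference_option(True, "steady", 50))  # Async Inference
print(pick_inference_option(True, "offline"))     # Batch Transform
```

Exam scenarios usually hinge on exactly these axes: is traffic intermittent (Serverless), is the payload large or the model slow (Async), is there no endpoint need at all (Batch Transform).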

Key Numbers to Memorise

| Fact | Value |
|---|---|
| SageMaker Training: max runtime | 28 days (2,419,200 sec); default 1 day |
| SageMaker Endpoint: min instances for HA | 2 (across AZs) |
| Batch Transform: max payload per request | 100 MB (use `SplitType` for larger files) |
| Ground Truth: human review threshold | Configurable confidence score |
| HPO: max parallel jobs | 10 (default) |
| Feature Store: online store latency | Single-digit ms |
| Clarify bias metric: Class Imbalance (CI) | 0.02 is a common threshold |
| Model Monitor: baseline stats from | Training/validation data |
| Kinesis shard: reads | 5 transactions/sec, 2 MB/sec |
| Kinesis shard: writes | 1,000 records/sec, 1 MB/sec |
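The Kinesis shard limits above turn into a simple sizing calculation: a stream needs enough shards to cover whichever dimension is the bottleneck. The helper function is my own sketch, not an AWS API.

```python
import math

def shards_needed(write_mb_per_sec: float, records_per_sec: float,
                  read_mb_per_sec: float) -> int:
    """Minimum shard count given the per-shard limits in the table:
    1 MB/s or 1,000 records/s on write, 2 MB/s on read."""
    return max(
        math.ceil(write_mb_per_sec / 1.0),    # 1 MB/s write per shard
        math.ceil(records_per_sec / 1000.0),  # 1,000 records/s per shard
        math.ceil(read_mb_per_sec / 2.0),     # 2 MB/s read per shard
        1,                                    # a stream has at least 1 shard
    )

# 4.5 MB/s in, 3,000 records/s, one consumer reading 4.5 MB/s:
print(shards_needed(4.5, 3000, 4.5))  # 5 -> write bandwidth dominates
```

This is a classic exam calculation: compute all three constraints and take the maximum, rather than stopping at the first one.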

Exam Strategy

  • Domain 1 (28%) is the biggest — know Glue, S3, and feature engineering cold
  • SageMaker questions dominate — know every endpoint type and when to use each
  • Scenario questions: favour the most managed, lowest-operational-overhead AWS-native solution
  • "Custom container" → use when built-in algorithms don't fit
  • "BYOC" (Bring Your Own Container) = a custom Docker image stored in ECR
  • Cost optimization → Spot instances for training, Serverless Inference for low-traffic endpoints
  • Bias/fairness → always SageMaker Clarify
  • Drift → always SageMaker Model Monitor
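The Spot-training bullet maps to three specific fields in the boto3 `create_training_job` request, which is worth recognising on sight. The request below is a sketch: the bucket, role, and image URI are placeholders, and nothing is actually submitted to AWS.

```python
# Managed Spot training: the three pieces are EnableManagedSpotTraining,
# a MaxWaitTimeInSeconds in StoppingCondition, and a CheckpointConfig so
# the job can resume after a Spot interruption.
spot_training_request = {
    "TrainingJobName": "xgb-spot-demo",
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    "AlgorithmSpecification": {
        "TrainingImage": "<built-in-algorithm-image-uri>",      # placeholder
        "TrainingInputMode": "File",
    },
    "ResourceConfig": {
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
    },
    "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/output/"},
    "EnableManagedSpotTraining": True,       # bill for Spot capacity
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 3600,         # actual training time budget
        "MaxWaitTimeInSeconds": 7200,        # training time + Spot wait
    },
    "CheckpointConfig": {                    # survive Spot interruptions
        "S3Uri": "s3://my-bucket/checkpoints/"
    },
}

# When Spot is enabled, MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds.
cond = spot_training_request["StoppingCondition"]
assert cond["MaxWaitTimeInSeconds"] >= cond["MaxRuntimeInSeconds"]
```

Questions that mention "interruptible", "fault-tolerant", or "up to 90% cheaper training" are usually pointing at exactly this combination.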