```mermaid
graph TD
    Data[Data Layer] --> S3[S3<br/>Storage]
    Data --> Glue[AWS Glue<br/>ETL + Catalog]
    Data --> Kinesis[Kinesis<br/>Streaming]
    Data --> Redshift[Redshift<br/>Data Warehouse]
    Data --> Athena[Athena<br/>SQL on S3]
    SM[SageMaker] --> Studio[Studio<br/>IDE]
    SM --> GT[Ground Truth<br/>Labeling]
    SM --> FS[Feature Store]
    SM --> Train[Training Jobs]
    SM --> HPO[Hyperparameter<br/>Tuning]
    SM --> Deploy[Endpoints<br/>Real-time / Batch]
    SM --> Pipe[Pipelines<br/>MLOps]
    SM --> MR[Model Registry]
    SM --> MM[Model Monitor]
    SM --> Clarify[Clarify<br/>Bias + Explainability]
    AI[AI Services] --> Rek[Rekognition]
    AI --> Comp[Comprehend]
    AI --> Tx[Textract]
    AI --> Bed[Bedrock<br/>GenAI]
    style SM fill:#dbeafe,stroke:#3b82f6,stroke-width:2px
    style Data fill:#dcfce7,stroke:#16a34a
    style AI fill:#fef9c3,stroke:#ca8a04
```
## The "When to Use What" Master Table

| Need | Service / Approach |
|---|---|
| Store training data | S3 |
| Catalog and discover data | AWS Glue Data Catalog |
| ETL pipeline for ML data | AWS Glue |
| Query S3 data with SQL | Athena |
| Streaming data ingestion | Kinesis Data Streams / Firehose |
| Data warehousing | Redshift |
| Label training data | SageMaker Ground Truth |
| Interactive ML notebook | SageMaker Studio / Notebook Instance |
| Pre-built ML environment | SageMaker Studio Lab |
| Train ML model | SageMaker Training Job |
| Tune hyperparameters | SageMaker Automatic Model Tuning (HPO) |
| Feature engineering + reuse | SageMaker Feature Store |
| Pre-trained algorithms | SageMaker Built-in Algorithms |
| Pre-trained model (fine-tune) | SageMaker JumpStart |
| Real-time inference | SageMaker Real-time Endpoint |
| Batch inference | SageMaker Batch Transform |
| Low-traffic / serverless inference | SageMaker Serverless Inference |
| Async large-payload inference | SageMaker Asynchronous Inference |
| Version and approve models | SageMaker Model Registry |
| Orchestrate ML pipeline | SageMaker Pipelines |
| Detect data / model drift | SageMaker Model Monitor |
| Detect bias | SageMaker Clarify |
| Explain predictions | SageMaker Clarify (SHAP) |
| No-code ML | SageMaker Canvas |
| Image/video analysis | Rekognition |
| Text analysis / NLP | Comprehend |
| Document extraction | Textract |
| Speech to text | Transcribe |
| Text to speech | Polly |
| Translation | Translate |
| Time series forecasting | Forecast |
| Recommendations | Personalize |
| Enterprise search | Kendra |
| Chatbot | Lex |
| Foundation models / GenAI | Bedrock |
| Anomaly detection | Random Cut Forest (SageMaker) |
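The four SageMaker inference options in the table differ mainly by traffic pattern, payload size, and latency tolerance. A minimal decision sketch — the function name and the simplified thresholds are mine, not official AWS quotas:

```python
def pick_inference_option(payload_mb: float, steady_traffic: bool,
                          needs_immediate_response: bool) -> str:
    """Illustrative heuristic mirroring the table above.

    Thresholds are simplified for the sketch (e.g. real-time endpoints
    have a payload cap of a few MB; async handles much larger payloads).
    """
    if not needs_immediate_response and payload_mb > 6:
        # Large payloads, caller can poll for the queued result
        return "SageMaker Asynchronous Inference"
    if not needs_immediate_response:
        # Offline scoring of a whole dataset at once
        return "SageMaker Batch Transform"
    if steady_traffic:
        # Sustained traffic justifies always-on instances
        return "SageMaker Real-time Endpoint"
    # Spiky or infrequent traffic: pay per request, tolerate cold starts
    return "SageMaker Serverless Inference"
```

Exam scenarios usually hinge on exactly these two axes: "is a response needed now?" and "is traffic steady or spiky?".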
## Key Numbers to Memorise

| Fact | Value |
|---|---|
| SageMaker Training: max runtime | 5 days (432,000 sec) |
| SageMaker Endpoint: min instances for HA | 2 (across AZs) |
| Batch Transform: max payload per record | 100 MB (`SplitType` splits larger files) |
| Ground Truth: human review threshold | Configurable confidence score |
| HPO: max parallel jobs | 10 (default) |
| Feature Store: online store latency | Single-digit ms |
| Clarify bias metric: Class Imbalance (CI) | 0.02 is a common default threshold |
| Model Monitor: baseline stats from | Training/validation data |
| Kinesis shard: reads | 5 transactions/sec, 2 MB/sec |
| Kinesis shard: writes | 1,000 records/sec, 1 MB/sec |
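The Kinesis shard limits in the last two rows translate directly into a sizing calculation: take the max over the write-records, write-throughput, and read-throughput constraints. A quick sketch (function name is mine):

```python
import math

def shards_needed(write_records_per_sec: float, write_mb_per_sec: float,
                  read_mb_per_sec: float) -> int:
    """Minimum shard count for a Kinesis Data Stream, using the per-shard
    limits from the table: writes 1,000 records/sec and 1 MB/sec,
    reads 2 MB/sec."""
    return max(
        math.ceil(write_records_per_sec / 1_000),  # write records constraint
        math.ceil(write_mb_per_sec / 1),           # write throughput constraint
        math.ceil(read_mb_per_sec / 2),            # read throughput constraint
    )

# e.g. 4,500 records/sec at 3 MB/sec in, 5 MB/sec out:
# max(5, 3, 3) = 5 shards
```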
## Exam Strategy

- Domain 1 (28%) is the biggest: know Glue, S3, and feature engineering cold.
- SageMaker questions dominate: know every endpoint type and when to use each.
- Scenario questions: prefer the most managed, lowest-operational-overhead AWS-native solution.
- "Custom container": use when the built-in algorithms don't fit.
- "BYOC" (Bring Your Own Container) means a custom Docker image hosted in ECR.
- Cost optimization: Spot instances for training, Serverless Inference for low-traffic endpoints.
- Bias/fairness questions: always SageMaker Clarify.
- Drift questions: always SageMaker Model Monitor.
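For the Spot-instances-for-training point: managed spot training is a flag on the training job plus a wait-time bound and a checkpoint location for resuming after interruptions. A hedged sketch of the relevant `CreateTrainingJob` fields (values and bucket name are illustrative):

```python
# Fragment of a boto3 CreateTrainingJob request enabling managed spot training.
# MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds: it bounds total wall
# time including periods spent waiting for spot capacity after interruptions.
spot_training_config = {
    "EnableManagedSpotTraining": True,
    "CheckpointConfig": {
        # Checkpoints let the job resume after a spot interruption
        "S3Uri": "s3://my-bucket/checkpoints/",  # bucket name illustrative
    },
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 3600,   # actual compute time allowed
        "MaxWaitTimeInSeconds": 7200,  # runtime + time waiting for capacity
    },
}
```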