```mermaid
graph TD
    Data[Data Layer] --> S3[S3<br/>Storage]
    Data --> Glue[AWS Glue<br/>ETL + Catalog]
    Data --> Kinesis[Kinesis<br/>Streaming]
    Data --> Redshift[Redshift<br/>Data Warehouse]
    Data --> Athena[Athena<br/>SQL on S3]
    SM[SageMaker] --> Studio[Studio<br/>IDE]
    SM --> GT[Ground Truth<br/>Labeling]
    SM --> FS[Feature Store]
    SM --> Train[Training Jobs]
    SM --> HPO[Hyperparameter<br/>Tuning]
    SM --> Deploy[Endpoints<br/>Real-time / Batch]
    SM --> Pipe[Pipelines<br/>MLOps]
    SM --> MR[Model Registry]
    SM --> MM[Model Monitor]
    SM --> Clarify[Clarify<br/>Bias + Explainability]
    AI[AI Services] --> Rek[Rekognition]
    AI --> Comp[Comprehend]
    AI --> Tx[Textract]
    AI --> Bed[Bedrock<br/>GenAI]
    style SM fill:#dbeafe,stroke:#3b82f6,stroke-width:2px
    style Data fill:#dcfce7,stroke:#16a34a
    style AI fill:#fef9c3,stroke:#ca8a04
```
## The "When to Use What" Master Table

| Need | Service / Approach |
|---|---|
| Store training data | S3 |
| Catalog and discover data | AWS Glue Data Catalog |
| ETL pipeline for ML data | AWS Glue |
| Query S3 data with SQL | Athena |
| Streaming data ingestion | Kinesis Data Streams / Firehose |
| Data warehousing | Redshift |
| Label training data | SageMaker Ground Truth |
| Interactive ML notebook | SageMaker Studio / Notebook Instance |
| Pre-built ML environment | SageMaker Studio Lab |
| Train ML model | SageMaker Training Job |
| Tune hyperparameters | SageMaker Automatic Model Tuning (HPO) |
| Feature engineering + reuse | SageMaker Feature Store |
| Pre-trained algorithms | SageMaker Built-in Algorithms |
| Pre-trained model (fine-tune) | SageMaker JumpStart |
| Real-time inference | SageMaker Real-time Endpoint |
| Batch inference | SageMaker Batch Transform |
| Low-traffic / serverless inference | SageMaker Serverless Inference |
| Async large-payload inference | SageMaker Asynchronous Inference |
| Version and approve models | SageMaker Model Registry |
| Orchestrate ML pipeline | SageMaker Pipelines |
| Detect data / model drift | SageMaker Model Monitor |
| Detect bias | SageMaker Clarify |
| Explain predictions | SageMaker Clarify (SHAP) |
| No-code ML | SageMaker Canvas |
| Image/video analysis | Rekognition |
| Text analysis / NLP | Comprehend |
| Document extraction | Textract |
| Speech to text | Transcribe |
| Text to speech | Polly |
| Translation | Translate |
| Time series forecasting | Forecast |
| Recommendations | Personalize |
| Enterprise search | Kendra |
| Chatbot | Lex |
| Foundation models / GenAI | Bedrock |
| Anomaly detection | Random Cut Forest (SageMaker) |
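The four SageMaker inference options in the table differ mainly by traffic pattern, payload size, and latency tolerance. A minimal decision sketch — the function name and the simplified thresholds are mine, not official AWS quotas:

```python
def pick_inference_option(payload_mb: float, steady_traffic: bool,
                          needs_immediate_response: bool) -> str:
    """Illustrative heuristic mirroring the table above.

    Thresholds are simplified for the sketch (e.g. real-time endpoints
    have a payload cap of a few MB; async handles much larger payloads).
    """
    if not needs_immediate_response and payload_mb > 6:
        # Large payloads, caller can poll for the queued result
        return "SageMaker Asynchronous Inference"
    if not needs_immediate_response:
        # Offline scoring of a whole dataset at once
        return "SageMaker Batch Transform"
    if steady_traffic:
        # Sustained traffic justifies always-on instances
        return "SageMaker Real-time Endpoint"
    # Spiky or infrequent traffic: pay per request, tolerate cold starts
    return "SageMaker Serverless Inference"
```

Exam scenarios usually hinge on exactly these two axes: "is a response needed now?" and "is traffic steady or spiky?".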
## Key Numbers to Memorise

| Fact | Value |
|---|---|
| SageMaker Training: max runtime | 5 days (432,000 sec) |
| SageMaker Endpoint: min instances for HA | 2 (across AZs) |
| Batch Transform: max payload per record | 100 MB (`SplitType` splits larger files) |
| Ground Truth: human review threshold | Configurable confidence score |
| HPO: max parallel jobs | 10 (default) |
| Feature Store: online store latency | Single-digit ms |
| Clarify bias metric: Class Imbalance (CI) | 0.02 is a common default threshold |
| Model Monitor: baseline stats from | Training/validation data |
| Kinesis shard: reads | 5 transactions/sec, 2 MB/sec |
| Kinesis shard: writes | 1,000 records/sec, 1 MB/sec |
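The Kinesis shard limits in the last two rows translate directly into a sizing calculation: take the max over the write-records, write-throughput, and read-throughput constraints. A quick sketch (function name is mine):

```python
import math

def shards_needed(write_records_per_sec: float, write_mb_per_sec: float,
                  read_mb_per_sec: float) -> int:
    """Minimum shard count for a Kinesis Data Stream, using the per-shard
    limits from the table: writes 1,000 records/sec and 1 MB/sec,
    reads 2 MB/sec."""
    return max(
        math.ceil(write_records_per_sec / 1_000),  # write records constraint
        math.ceil(write_mb_per_sec / 1),           # write throughput constraint
        math.ceil(read_mb_per_sec / 2),            # read throughput constraint
    )

# e.g. 4,500 records/sec at 3 MB/sec in, 5 MB/sec out:
# max(5, 3, 3) = 5 shards
```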
## Exam Strategy

- Domain 1 (28%) is the biggest: know Glue, S3, and feature engineering cold.
- SageMaker questions dominate: know every endpoint type and when to use each.
- Scenario questions: prefer the most managed, lowest-operational-overhead AWS-native solution.
- "Custom container": use when the built-in algorithms don't fit.
- "BYOC" (Bring Your Own Container) means a custom Docker image hosted in ECR.
- Cost optimization: Spot instances for training, Serverless Inference for low-traffic endpoints.
- Bias/fairness questions: always SageMaker Clarify.
- Drift questions: always SageMaker Model Monitor.
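For the Spot-instances-for-training point: managed spot training is a flag on the training job plus a wait-time bound and a checkpoint location for resuming after interruptions. A hedged sketch of the relevant `CreateTrainingJob` fields (values and bucket name are illustrative):

```python
# Fragment of a boto3 CreateTrainingJob request enabling managed spot training.
# MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds: it bounds total wall
# time including periods spent waiting for spot capacity after interruptions.
spot_training_config = {
    "EnableManagedSpotTraining": True,
    "CheckpointConfig": {
        # Checkpoints let the job resume after a spot interruption
        "S3Uri": "s3://my-bucket/checkpoints/",  # bucket name illustrative
    },
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 3600,   # actual compute time allowed
        "MaxWaitTimeInSeconds": 7200,  # runtime + time waiting for capacity
    },
}
```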