AWS Lambda

What is Lambda?

  • Serverless compute — run code without provisioning or managing servers
  • Event-driven — executes in response to triggers (S3, API Gateway, Kinesis, DynamoDB Streams, SNS, SQS, etc.)
  • FaaS (Function as a Service) — you only write the function logic
  • AWS manages: provisioning, scaling, patching, availability
  • Pay per invocation + duration — free tier: 1M requests/month + 400,000 GB-seconds compute

Key limits:

| Parameter | Limit |
|---|---|
| Max execution timeout | 15 minutes |
| Memory | 128 MB – 10 GB |
| Ephemeral storage (`/tmp`) | 512 MB – 10 GB |
| Deployment package (zip) | 50 MB (250 MB unzipped) |
| Container image size | 10 GB |
| Concurrent executions (default) | 1,000 per region (soft limit) |

Execution Model

```mermaid
graph LR
    Trigger[Event Trigger<br/>S3 / API GW / Kinesis] --> Lambda[Lambda Function<br/>Init + Execute]
    Lambda --> Result[Response /<br/>Downstream Action]

    subgraph Cold[Cold Start]
        CS1[Download code] --> CS2[Start runtime] --> CS3[Run init code]
    end

    subgraph Warm[Warm Start]
        WS1[Reuse existing<br/>execution environment]
    end

    style Cold fill:#fee2e2,stroke:#dc2626
    style Warm fill:#dcfce7,stroke:#16a34a
```

Cold Start vs Warm Start

| | Cold Start | Warm Start |
|---|---|---|
| When | First invocation / after idle | Reused execution environment |
| Latency | Higher (100ms–1s+) | Near-zero |
| Mitigation | Provisioned Concurrency (keeps envs warm), SnapStart (Java) | N/A (already warm) |

Exam tip: Provisioned Concurrency = pre-initialized execution environments. Costs money even when idle. Use for latency-sensitive workloads (e.g., SageMaker endpoint wrapper).


Triggers (Event Sources)

```mermaid
graph TD
    Triggers --> Sync[Synchronous<br/>Waits for response]
    Triggers --> Async[Asynchronous<br/>Fire and forget]
    Triggers --> Stream[Stream / Queue<br/>Polling-based]

    Sync --> APIGW[API Gateway]
    Sync --> ALB[ALB]
    Sync --> Cognito[Cognito]

    Async --> S3[S3 Events]
    Async --> SNS[SNS]
    Async --> EventBridge[EventBridge]

    Stream --> Kinesis[Kinesis Data Streams]
    Stream --> DDB[DynamoDB Streams]
    Stream --> SQS[SQS]

    style Sync fill:#dbeafe,stroke:#3b82f6
    style Async fill:#dcfce7,stroke:#16a34a
    style Stream fill:#f3e8ff,stroke:#9333ea
```

  • Synchronous: caller waits for Lambda to return (e.g., API Gateway → Lambda → response)
  • Asynchronous: Lambda queues the event, retries on failure (2 retries by default), can send to DLQ
  • Stream/Queue: Lambda polls the source (Kinesis, SQS), processes in batches
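The synchronous vs asynchronous choice surfaces in the boto3 `Invoke` API as the `InvocationType` parameter (stream/queue sources are configured as event source mappings, so there is no direct invoke for them). A minimal sketch; `my-func` is a placeholder function name:

```python
import json

def build_invoke_args(function_name, payload, synchronous=True):
    """Build kwargs for lambda.invoke(). InvocationType picks the model:
    RequestResponse = synchronous (caller waits for the return value),
    Event = asynchronous (Lambda queues the event and returns HTTP 202)."""
    return {
        "FunctionName": function_name,
        "InvocationType": "RequestResponse" if synchronous else "Event",
        "Payload": json.dumps(payload),
    }

def invoke(function_name, payload, synchronous=True):
    import boto3  # imported lazily so the helper above works without AWS deps
    client = boto3.client("lambda")
    return client.invoke(**build_invoke_args(function_name, payload, synchronous))
```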

Lambda for ML Pipelines (MLA-C01 focus)

This is where Lambda matters for the exam:

Pattern 1: S3 → Lambda → SageMaker (Trigger-based inference)

S3 (new file uploaded) → Lambda → invoke SageMaker endpoint → store result in S3/DynamoDB

Use case: Event-driven inference — a new data file lands and Lambda triggers the inference job
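A sketch of the Pattern 1 handler, assuming a deployed endpoint named `my-endpoint` and CSV input (both placeholders):

```python
def extract_s3_object(event):
    """Pull bucket and key out of the triggering S3 PUT event."""
    s3 = event["Records"][0]["s3"]
    return s3["bucket"]["name"], s3["object"]["key"]

def lambda_handler(event, context):
    import boto3  # lazy import keeps extract_s3_object testable offline
    bucket, key = extract_s3_object(event)
    s3 = boto3.client("s3")
    data = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    runtime = boto3.client("sagemaker-runtime")
    resp = runtime.invoke_endpoint(
        EndpointName="my-endpoint",
        ContentType="text/csv",
        Body=data,
    )
    prediction = resp["Body"].read().decode()
    # Persist the result next to the input (DynamoDB would also work here)
    s3.put_object(Bucket=bucket, Key=f"predictions/{key}", Body=prediction.encode())
    return {"key": key, "prediction": prediction}
```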

Pattern 2: Kinesis → Lambda → Feature Store

Kinesis Data Stream → Lambda (transform/enrich) → SageMaker Feature Store / S3

Use case: Online feature engineering from streaming data before model serving
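A sketch of the Pattern 2 transform step. Kinesis delivers record payloads base64-encoded; the feature group name `my-feature-group` is a placeholder:

```python
import base64
import json

def decode_records(event):
    """Kinesis delivers record data base64-encoded; decode each to a dict."""
    return [json.loads(base64.b64decode(r["kinesis"]["data"]))
            for r in event["Records"]]

def to_feature_record(item):
    """Feature Store put_record expects {FeatureName, ValueAsString} pairs."""
    return [{"FeatureName": k, "ValueAsString": str(v)} for k, v in item.items()]

def lambda_handler(event, context):
    import boto3  # lazy import keeps the pure helpers testable offline
    fs = boto3.client("sagemaker-featurestore-runtime")
    for item in decode_records(event):
        fs.put_record(FeatureGroupName="my-feature-group",
                      Record=to_feature_record(item))
    return {"ingested": len(event["Records"])}
```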

Pattern 3: API Gateway → Lambda → Bedrock

API Gateway → Lambda → Bedrock InvokeModel API → return LLM response

Use case: Serverless GenAI app — Lambda wraps Bedrock API, handles auth + prompt formatting

Pattern 4: EventBridge → Lambda → SageMaker Pipeline

EventBridge (cron) → Lambda → start SageMaker Pipeline execution

Use case: Scheduled retraining pipeline — trigger nightly model refresh without always-on compute
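A sketch of the Pattern 4 trigger, assuming a pipeline named `nightly-retrain` (placeholder). EventBridge scheduled events carry an ISO `time` field, used here to label the execution:

```python
def execution_display_name(event):
    """Derive a display name from the EventBridge event timestamp
    (colons are not valid in SageMaker display names)."""
    return "nightly-" + event.get("time", "manual").replace(":", "-")

def lambda_handler(event, context):
    import boto3  # lazy import keeps the helper above testable offline
    sm = boto3.client("sagemaker")
    resp = sm.start_pipeline_execution(
        PipelineName="nightly-retrain",
        PipelineExecutionDisplayName=execution_display_name(event),
    )
    return {"executionArn": resp["PipelineExecutionArn"]}
```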

Exam tip: Lambda is the glue layer in ML pipelines. It connects triggers to ML services without needing an always-on server. Know these 4 patterns cold.


Lambda Layers

  • Layers = additional code/libraries bundled separately and shared across functions
  • Use case: Package large ML libraries (numpy, pandas, scikit-learn) once, reuse across functions
  • Up to 5 layers per function
  • Total unzipped size (function + layers) ≤ 250 MB

```
Lambda Function
├── function.py (your code)
└── Layers
    ├── numpy-layer (shared)
    ├── pandas-layer (shared)
    └── custom-utils-layer
```

MLA-C01 tip: when building Lambda-backed inference logic, package the inference dependencies (numpy, scikit-learn, etc.) in Lambda Layers instead of bundling them into every function zip.


Lambda + VPC

  • By default, Lambda runs outside your VPC (can call public AWS services via internet)
  • To access RDS, ElastiCache, or private SageMaker endpoints: deploy Lambda inside a VPC
  • Requires: VPC ID, subnet IDs, security group IDs
  • Cold start penalty increases when Lambda is VPC-attached (ENI provisioning) — mitigated since 2019 with hyperplane ENIs

```mermaid
graph LR
    A[Lambda outside VPC] -->|Public endpoint| B[S3 / DynamoDB / Bedrock]
    C[Lambda inside VPC] -->|Private| D[RDS / ElastiCache / SageMaker Private Endpoint]
    C -->|Via NAT GW| B
```

IAM for Lambda

  • Execution Role — IAM role assumed by Lambda at runtime to call other AWS services
    • Example: allow s3:GetObject, sagemaker:InvokeEndpoint, bedrock:InvokeModel
  • Resource Policy — who can invoke the Lambda function (e.g., allow API Gateway, S3, EventBridge)

```
Execution Role  = what Lambda CAN DO
Resource Policy = who CAN CALL Lambda
```
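As a concrete sketch, a minimal execution-role policy for an ML-glue function might look like this (all resource ARNs are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::my-bucket/*" },
    { "Effect": "Allow", "Action": "sagemaker:InvokeEndpoint", "Resource": "arn:aws:sagemaker:*:*:endpoint/my-endpoint" },
    { "Effect": "Allow", "Action": "bedrock:InvokeModel", "Resource": "*" }
  ]
}
```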

Exam tip: If Lambda can't call SageMaker/Bedrock, check the execution role policy first.


Lambda Concurrency

| Type | Description |
|---|---|
| Unreserved concurrency | Default pool shared across all functions in the account |
| Reserved concurrency | Caps a function's max concurrency — also guarantees it won't be throttled by others |
| Provisioned concurrency | Pre-warms execution environments — eliminates cold starts |

  • Account default: 1,000 concurrent executions per region
  • Throttling → TooManyRequestsException (HTTP 429)

Pricing

  • Requests: $0.20 per 1M invocations (first 1M/month free)
  • Duration: $0.0000166667 per GB-second (400,000 GB-seconds free/month)
  • Provisioned Concurrency: charged per GB-second of provisioned capacity (even when idle)
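A quick back-of-envelope calculator using the rates above (free tier ignored for simplicity):

```python
def monthly_cost(invocations, avg_ms, memory_mb,
                 per_million_requests=0.20, per_gb_second=0.0000166667):
    """Estimate monthly Lambda cost: request charge + duration charge."""
    request_cost = invocations / 1_000_000 * per_million_requests
    # GB-seconds = invocations * seconds per call * GB of memory
    gb_seconds = invocations * (avg_ms / 1000) * (memory_mb / 1024)
    return request_cost + gb_seconds * per_gb_second

# 1M invocations/month at 200 ms each with 512 MB:
# 100,000 GB-seconds of duration + 1M requests ≈ $1.87/month before free tier
cost = monthly_cost(1_000_000, 200, 512)
```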

Exam Cheat Sheet

| Topic | Key Fact |
|---|---|
| Max timeout | 15 minutes |
| Cold start fix | Provisioned Concurrency |
| Shared libraries | Lambda Layers (up to 5) |
| Private resource access | Deploy Lambda inside VPC |
| ML pipeline glue | Lambda → SageMaker / Bedrock / Kinesis |
| Who calls Lambda | Resource Policy |
| What Lambda can call | Execution Role |
| Async retry on failure | 2 retries → DLQ |
| Stream polling | Kinesis, DynamoDB Streams, SQS |
| Concurrency cap | Reserved Concurrency |

Labs

Lab 1 — Hello Lambda (10 min)

Goal: Deploy your first function and test it manually.

  1. Console → Lambda → Create function → Author from scratch
  2. Runtime: Python 3.12 | Execution role: Create new with basic Lambda permissions
  3. Write a simple handler that returns {"statusCode": 200, "body": "Hello from Lambda"}
  4. Test with a sample event → observe logs in CloudWatch
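A minimal handler matching step 3:

```python
def lambda_handler(event, context):
    # API-style response shape; "context" carries runtime metadata and is unused here
    return {"statusCode": 200, "body": "Hello from Lambda"}
```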

Lab 2 — S3 Trigger (20 min)

Goal: Auto-trigger Lambda when a file lands in S3.

  1. Create an S3 bucket
  2. Create a Lambda function that logs the S3 object key from the event
  3. Add S3 trigger: bucket → event type PUT
  4. Upload a test file → verify Lambda fired in CloudWatch Logs
  5. Extend: print the file size from event['Records'][0]['s3']['object']['size']
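A sketch of the Lab 2 handler, including the step-5 extension:

```python
def lambda_handler(event, context):
    # S3 notification events carry the bucket/object details per record
    record = event["Records"][0]["s3"]
    key = record["object"]["key"]
    size = record["object"]["size"]
    print(f"New object: {key} ({size} bytes)")  # lands in CloudWatch Logs
    return {"key": key, "size": size}
```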

Lab 3 — API Gateway → Lambda (20 min)

Goal: Build a serverless HTTP endpoint.

  1. Create Lambda function that parses event['queryStringParameters'] and returns a JSON response
  2. Attach API Gateway trigger (REST API or HTTP API)
  3. Deploy the API → test via curl or browser
  4. Extend: add a POST handler that reads event['body']
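A sketch of the Lab 3 handler; the `name` query parameter is an arbitrary example:

```python
import json

def lambda_handler(event, context):
    # queryStringParameters is None (not {}) when no params are passed
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```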

Lab 4 — Lambda → Bedrock (30 min) ⭐ MLA-C01 relevant

Goal: Call Bedrock from Lambda (the serverless GenAI pattern).

  1. Create Lambda with execution role that includes bedrock:InvokeModel
  2. Use boto3 to call bedrock-runtime.invoke_model() with Claude or Titan
  3. Parse the response and return the LLM output
  4. Test with a sample prompt

```python
import boto3, json

def lambda_handler(event, context):
    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    # Legacy Anthropic text-completion models require the Human/Assistant prompt format
    prompt = event.get("prompt", "What is machine learning?")
    body = json.dumps({
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
        "max_tokens_to_sample": 200
    })
    response = client.invoke_model(
        modelId="anthropic.claude-instant-v1",
        body=body,
        contentType="application/json",
        accept="application/json"
    )
    return json.loads(response["body"].read())
```

Lab 5 — Lambda + SageMaker Endpoint Invoke (30 min) ⭐ MLA-C01 relevant

Goal: Trigger real-time inference from Lambda.

  1. Deploy any SageMaker endpoint (even a built-in XGBoost demo)
  2. Create Lambda with sagemaker:InvokeEndpoint in execution role
  3. Use boto3 sagemaker-runtime to call the endpoint with sample payload
  4. Return the prediction result
  5. Add API Gateway trigger → you now have a serverless ML API
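A sketch of the Lab 5 handler, assuming a CSV-input endpoint named `my-xgboost-endpoint` (placeholder):

```python
def get_payload(event):
    """Take the caller's request body if present, else a sample CSV row."""
    return event.get("body") or "0.5,1.2,3.4"

def lambda_handler(event, context):
    import boto3  # lazy import; in production create the client at module scope
    runtime = boto3.client("sagemaker-runtime")
    resp = runtime.invoke_endpoint(
        EndpointName="my-xgboost-endpoint",
        ContentType="text/csv",
        Body=get_payload(event),
    )
    return {"statusCode": 200, "body": resp["Body"].read().decode()}
```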

Lab 6 — Lambda Layers (20 min)

Goal: Package numpy/pandas as a Lambda Layer.

  1. Create a zip with python/numpy and python/pandas installed via pip
  2. Upload as a Lambda Layer
  3. Attach to a function and confirm import numpy works without bundling it in the function zip

Summary

| Feature | Lambda |
|---|---|
| Type | Serverless FaaS |
| Max runtime | 15 min |
| Trigger model | Event-driven (sync / async / stream) |
| ML role | Pipeline glue — connect S3/Kinesis/API GW to SageMaker/Bedrock |
| Scaling | Automatic (up to concurrency limit) |
| Cold start | Yes — mitigated by Provisioned Concurrency |
| VPC support | Yes (with ENI overhead) |
| Key for MLA-C01 | Trigger patterns + Bedrock/SageMaker invoke patterns |