LangChain
L1 — Models, Prompts & Output Parsers
Source: DeepLearning.AI — LangChain for LLM App Dev | Code: [[L1-Model_prompt_parser.py]]
Why LangChain over raw API calls?
| Raw OpenAI API | LangChain |
|---|---|
| Manual prompt string building | Reusable ChatPromptTemplate with variables |
| response.content is always a string | StructuredOutputParser converts to dict |
| No abstraction across providers | Same code works with OpenAI, Anthropic, etc. |
Model
from langchain_openai import ChatOpenAI
# temperature=0.0 → deterministic output (use for structured tasks)
# temperature=1.0 → creative/random output
chat = ChatOpenAI(temperature=0.0, model="gpt-3.5-turbo")
Prompt Template
Reusable prompts with named variables — avoids manual f-string construction.
from langchain.prompts import ChatPromptTemplate
template_string = """Translate the text delimited by triple backticks \
into a style that is {style}. text: ```{text}```"""
prompt_template = ChatPromptTemplate.from_template(template_string)
# Fill in variables → returns list of LangChain message objects
messages = prompt_template.format_messages(
style="American English in a calm and respectful tone",
text="Arrr, I be fuming that me blender lid flew off!"
)
response = chat.invoke(messages)
print(response.content) # string output
format_messages() returns list[HumanMessage] — LangChain message objects, not raw strings.
Output Parsers
The problem: response.content is always a str. Even if you ask for JSON, you can't call .get() on it.
response.content.get('gift') # ❌ AttributeError: 'str' has no attribute 'get'
The solution: StructuredOutputParser — define a schema, inject format instructions into the prompt, parse the response into a real Python dict.
from langchain.output_parsers import ResponseSchema, StructuredOutputParser
# 1. Define schema
gift_schema = ResponseSchema(name="gift",
description="Was the item purchased as a gift? True or False.")
delivery_schema = ResponseSchema(name="delivery_days",
description="How many days to arrive? -1 if not found.")
price_schema = ResponseSchema(name="price_value",
description="Any sentences about value/price as a comma-separated list.")
output_parser = StructuredOutputParser.from_response_schemas(
[gift_schema, delivery_schema, price_schema]
)
# 2. Get format instructions to inject into prompt
format_instructions = output_parser.get_format_instructions()
# 3. Include {format_instructions} in your prompt template
template = """\
Extract info from the text below.
text: {text}
{format_instructions}
"""
prompt = ChatPromptTemplate.from_template(template)
messages = prompt.format_messages(
text=customer_review,
format_instructions=format_instructions
)
# 4. Parse response → Python dict
response = chat.invoke(messages)
output_dict = output_parser.parse(response.content)
print(type(output_dict)) # <class 'dict'>
print(output_dict.get('gift')) # True
print(output_dict['delivery_days']) # 2
Key Takeaways
- temperature=0.0 → use for extraction/structured tasks; >0 for creative tasks
- ChatPromptTemplate → reusable, variable-driven prompts
- LLM output is always a string — use StructuredOutputParser to get a real dict
- format_instructions tells the LLM exactly what JSON schema to output
L2 — Memory + Chains
Source: DeepLearning.AI — LangChain for LLM App Dev
Why Memory?
LLMs are stateless — each call is independent. Memory gives conversations continuity by injecting prior history into the prompt context.
Memory Types
1. ConversationBufferMemory
Stores the full conversation history verbatim.
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory()
memory.save_context({"input": "Hi, I'm Vatsal"}, {"output": "Hello Vatsal!"})
memory.save_context({"input": "What is my name?"}, {"output": "Your name is Vatsal."})
print(memory.load_memory_variables({}))
# {'history': "Human: Hi, I'm Vatsal\nAI: Hello Vatsal!\nHuman: What is my name?\nAI: Your name is Vatsal."}
Problem: Grows unbounded — will eventually exceed the context window.
2. ConversationBufferWindowMemory
Keeps only the last K exchanges. Older history is dropped.
from langchain.memory import ConversationBufferWindowMemory
memory = ConversationBufferWindowMemory(k=2) # keep last 2 exchanges
Trade-off: Fixed cost, but loses early context (e.g., user's name mentioned at the start).
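A minimal sketch of that trade-off in action (same save_context / load_memory_variables API as above; nothing assumed beyond k=2):
from langchain.memory import ConversationBufferWindowMemory
memory = ConversationBufferWindowMemory(k=2)
memory.save_context({"input": "Hi, I'm Vatsal"}, {"output": "Hello Vatsal!"})
memory.save_context({"input": "What is 1+1?"}, {"output": "2"})
memory.save_context({"input": "What is my name?"}, {"output": "I'm not sure."})
# Only the last k=2 exchanges are returned — the introduction (and the name) is gone
print(memory.load_memory_variables({})['history'])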
3. ConversationTokenBufferMemory
Keeps history up to a token limit — drops oldest messages when limit exceeded.
from langchain.memory import ConversationTokenBufferMemory
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=50)
Best for: Staying within a predictable token budget while keeping as much context as possible.
4. ConversationSummaryMemory
Summarises older history instead of dropping it. Uses an LLM call to compress.
from langchain.memory import ConversationSummaryMemory
memory = ConversationSummaryMemory(llm=llm)
Best for: Long conversations where key facts (name, goal) need to persist without full verbatim history. Costs extra LLM calls.
Memory Comparison
| Type | What it stores | Cost | Best for |
|---|---|---|---|
| BufferMemory | Full history | Grows unbounded | Short conversations |
| BufferWindowMemory | Last K turns | Fixed | Fixed-budget chats |
| TokenBufferMemory | History up to N tokens | Predictable | Token budget management |
| SummaryMemory | LLM-compressed summary | Extra LLM calls | Long multi-turn sessions |
Chains
LLMChain
The simplest chain — wraps a prompt template + LLM call into one reusable object.
from langchain.chains import LLMChain
from langchain.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_template(
"What is the best name for a company that makes {product}?"
)
chain = LLMChain(llm=llm, prompt=prompt)
chain.run("Queen Size Sheets")
SimpleSequentialChain
Chains where output of one step = input of next. Single input → single output.
from langchain.chains import SimpleSequentialChain
chain_one = LLMChain(llm=llm, prompt=first_prompt) # company name
chain_two = LLMChain(llm=llm, prompt=second_prompt) # describe the company
overall_chain = SimpleSequentialChain(
chains=[chain_one, chain_two],
verbose=True
)
overall_chain.run("Queen Size Sheets")
SequentialChain
Multiple inputs/outputs at each step — results from earlier steps can be passed to later ones.
from langchain.chains import SequentialChain
chain = SequentialChain(
chains=[chain_one, chain_two, chain_three],
input_variables=["review"],
output_variables=["english_review", "summary", "followup_message"],
verbose=True
)
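For the named output_variables to resolve, each sub-chain must declare a matching output_key. A minimal sketch of one such sub-chain (prompt wording is illustrative, following the course pattern):
from langchain.chains import LLMChain
from langchain.prompts import ChatPromptTemplate
first_prompt = ChatPromptTemplate.from_template(
    "Translate the following review to English:\n\n{review}"
)
chain_one = LLMChain(llm=llm, prompt=first_prompt, output_key="english_review")
# chain_two / chain_three follow the same pattern: they consume earlier keys
# (e.g. {english_review}) in their prompts and declare their own output_key
# ("summary", "followup_message")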
Router Chain
Routes input to different sub-chains based on content. Uses an LLM to classify the input first.
from langchain.chains.router import MultiPromptChain
from langchain.chains.router.llm_router import LLMRouterChain, RouterOutputParser
# Define multiple prompt templates (physics, math, history...)
# Router LLM picks the right one based on the question
Use case: Single entry point that handles multiple domains (e.g., technical vs. billing queries).
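A condensed sketch of the wiring, following the course pattern (prompt texts and destination names are placeholders; the MULTI_PROMPT_ROUTER_TEMPLATE import assumes the classic langchain router module):
from langchain.chains import LLMChain
from langchain.chains.router import MultiPromptChain
from langchain.chains.router.llm_router import LLMRouterChain, RouterOutputParser
from langchain.chains.router.multi_prompt_prompt import MULTI_PROMPT_ROUTER_TEMPLATE
from langchain.prompts import ChatPromptTemplate, PromptTemplate
prompt_infos = [
    {"name": "physics", "description": "Good for physics questions",
     "prompt_template": "You are a physics professor. Answer:\n{input}"},
    {"name": "math", "description": "Good for maths questions",
     "prompt_template": "You are a mathematician. Answer:\n{input}"},
]
# One destination chain per domain
destination_chains = {
    info["name"]: LLMChain(llm=llm, prompt=ChatPromptTemplate.from_template(info["prompt_template"]))
    for info in prompt_infos
}
default_chain = LLMChain(llm=llm, prompt=ChatPromptTemplate.from_template("{input}"))
# The router prompt lists the destinations so the LLM can pick one
destinations = "\n".join(f"{p['name']}: {p['description']}" for p in prompt_infos)
router_prompt = PromptTemplate(
    template=MULTI_PROMPT_ROUTER_TEMPLATE.format(destinations=destinations),
    input_variables=["input"],
    output_parser=RouterOutputParser(),
)
router_chain = LLMRouterChain.from_llm(llm, router_prompt)
chain = MultiPromptChain(
    router_chain=router_chain,
    destination_chains=destination_chains,
    default_chain=default_chain,
    verbose=True,
)
chain.run("What is black body radiation?")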
Memory + Chain Together
from langchain.chains import ConversationChain
conversation = ConversationChain(
llm=llm,
memory=ConversationBufferMemory(),
verbose=True
)
conversation.predict(input="Hi, my name is Vatsal")
conversation.predict(input="What is 1+1?")
conversation.predict(input="What is my name?") # remembers "Vatsal"
Key Takeaways — L2
- Memory is injected into the prompt — it's not magic, it's context stuffing
- BufferWindowMemory is the cheapest production-safe default
- SummaryMemory preserves semantic content at the cost of extra LLM calls
- LangChain ConversationBufferMemory is not thread-safe — never share it across requests (see: session bleed bug in RAG in Production article)
- Chains = composable LLM pipelines; Router Chain = dynamic dispatch based on input
Installation
uv add langchain
# Installing the OpenAI integration
uv add langchain-openai
# Installing the Anthropic integration
uv add langchain-anthropic
# AWS
uv add langchain-aws
# Gemini
uv add langchain-google-genai
Initialise any Standalone Model
from langchain.chat_models import init_chat_model
from pprint import pprint
import os
## Set API Key for model provider
os.environ["OPENAI_API_KEY"] = "sk-..."
# Initialise model
model = init_chat_model("gpt-5-nano")
response = model.invoke("Tell me a joke")
print(response.content)
pprint(response.response_metadata)
Customising Model Parameters (see the sketch after this list):
- temperature: Controls the randomness of the model’s output. A higher number makes responses more creative; lower ones make them more deterministic.
- max_tokens: Limits the total number of tokens in the response, effectively controlling how long the output can be.
- timeout: The maximum time (in seconds) to wait for a response from the model before canceling the request.
- max_retries: The maximum number of attempts the system will make to resend a request if it fails due to issues like network timeouts or rate limits.
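A minimal sketch combining these parameters (values are illustrative; init_chat_model forwards extra keyword arguments to the underlying chat model constructor):
from langchain.chat_models import init_chat_model
model = init_chat_model(
    "gpt-5-nano",
    temperature=0.2,   # low randomness → more deterministic answers
    max_tokens=500,    # cap the length of the response
    timeout=30,        # give up after 30 seconds
    max_retries=2,     # retry on transient failures (rate limits, network errors)
)
response = model.invoke("Summarise LangChain in one sentence.")
print(response.content)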
[!Note] We can use different model providers with init_chat_model; for some newer models we need to use the provider's LangChain integration library.
[!important] For all providers, the API keys need to be stored inside the .env file.
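One common way to do that (assuming the python-dotenv package) is to load the file at startup so each provider SDK can read its key from the environment:
# .env
# OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...
from dotenv import load_dotenv
load_dotenv()  # reads .env into os.environ so init_chat_model can find the keys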
Invoke
To call any model we can use the invoke method with a single message or a list of messages.
from langchain.messages import AIMessage, HumanMessage, SystemMessage
conversation = [
SystemMessage("You are a helpful assistant that translates English to French."),
HumanMessage("Translate: I love programming."),
AIMessage("J'adore la programmation."),
HumanMessage("Translate: I love building applications.")
]
response = model.invoke(conversation)
print(response) # AIMessage("J'adore créer des applications.")
Creating Agents
Agents combine language models with tools to create systems that can reason about tasks, decide which tools to use, and iteratively work towards solutions.
from langchain.agents import create_agent
from langchain.chat_models import init_chat_model
from langchain.messages import HumanMessage, AIMessage
from pprint import pprint
agent = create_agent(model="claude-sonnet-4-5")
# If we want more control we can use initialised model to create agent
model = init_chat_model(model="gpt-5-nano", temperature=1.0)
agent = create_agent(model=model)
# or we can simply specify the model name
agent = create_agent("gpt-5-nano")
# Invoke agent with invoke function
response = agent.invoke({
"messages": [HumanMessage("What's the capital of the Moon?")],
})
pprint(response)
print(response['messages'][-1].content)
# We can pass prior chat history as a list of human and AI messages
response = agent.invoke({
"messages": [
HumanMessage(content="What's the capital of the Moon?"),
AIMessage(content="The capital of the Moon is Luna City."),
HumanMessage(content="Interesting, tell me more about Luna City")
]
})
pprint(response)
# We can stream the response from the agent instead of waiting for the complete message
for token, metadata in agent.stream({
"messages": [HumanMessage(content="Tell me all about Luna City, the capital of the Moon")]
}, stream_mode="messages"):
# token is a message chunk with token content
# metadata contains which node produced the token
if token.content:
print(token.content, end="", flush=True)
System Prompt
You can shape how your agent approaches tasks by providing a system prompt.
system_prompt = "You are a science fiction writer, create a capital city at the user's request."
scifi_agent = create_agent(
model="gpt-5-nano",
system_prompt=system_prompt
)
question = HumanMessage("What is the capital of the Moon?")
response = scifi_agent.invoke(
    {"messages": [question]}
)
print(response['messages'][-1].content)
Few Shot Example: Give the prompt a few example question-answer pairs for the model to follow.
system_prompt = """
You are a science fiction writer, create a space capital city at the user's request.
User: What is the capital of Mars?
Scifi Writer: Marsialis
User: What is the capital of Venus?
Scifi Writer: Venusovia
"""
Structured Prompt: Specify the answer format in the system prompt so that the model follows the output structure.
system_prompt = """
You are a science fiction writer, create a space capital city at the user's request.
Please keep to the below structure.
Name: The name of the capital city
Location: Where it is based
Vibe: 2-3 words to describe its vibe
Economy: Main industries
"""
# Output
"""
Name: Selene Prime
Location: South Pole region, Shackleton Crater rim, perched on a permanently sunlit annulus with access to near-surface water ice
Vibe: Luminous, austere, resilient
Economy: Helium-3 and water-ice mining; in-situ resource utilization; lunar manufacturing and assembly; spaceport services and logistics; R&D in lunar science and life-support tech
"""
Structured Output:
Instead of describing the output format in the system prompt, we can pass response_format so the agent returns its output in a specific structure.
from langchain.agents import create_agent
from langchain.messages import HumanMessage
from pydantic import BaseModel
class CapitalInfo(BaseModel):
name: str
location: str
vibe: str
economy: str
agent = create_agent(
    model="gpt-5-nano",
    system_prompt="You are a science fiction writer, create a space capital city at the user's request.",
    response_format=CapitalInfo
)
question = HumanMessage("What is the capital of the Moon?")
response = agent.invoke(
{ "messages": [question]}
)
print(response["structured_response"])
capital_info = response["structured_response"]
capital_name = capital_info.name
capital_location = capital_info.location
print(f"{capital_name} is a city located at {capital_location}")
Tools
An LLM Agent runs tools in a loop to achieve a goal. An agent runs until a stop condition is met - i.e., when the model emits a final output or an iteration limit is reached.
%%{
init: {
"fontFamily": "monospace",
"flowchart": {
"curve": "curve"
},
"themeVariables": {"edgeLabelBackground": "transparent"}
}
}%%
graph TD
%% Outside the agent
QUERY([input])
LLM{model}
TOOL(tools)
ANSWER([output])
%% Main flows (no inline labels)
QUERY --> LLM
LLM --"action"--> TOOL
TOOL --"observation"--> LLM
LLM --"finish"--> ANSWER
classDef blueHighlight fill:#0a1c25,stroke:#0a455f,color:#bae6fd;
classDef greenHighlight fill:#0b1e1a,stroke:#0c4c39,color:#9ce4c4;
class QUERY blueHighlight;
class ANSWER blueHighlight;
We can create tools with the @tool decorator applied to plain Python functions. By default the tool name comes from the function name and the tool description comes from the function's docstring.
from langchain.tools import tool
@tool
def square_root(x: float) -> float:
"""Calculate the square root of a number"""
return x ** 0.5
We can also pass the tool name and description explicitly:
@tool("square_root", description="Calculate the square root of a number")
def tool1(x: float) -> float:
return x ** 0.5
# Invoking the tool with invoke function
tool1.invoke({"x": 467})
Tools example with search capability:
from langchain.tools import tool
from langchain.agents import create_agent
from langchain.messages import HumanMessage
from typing import Dict, Any
from pprint import pprint
from tavily import TavilyClient
tavily_client = TavilyClient()  # reads TAVILY_API_KEY from the environment
@tool
def web_search(query: str) -> Dict[str, Any]:
"""Search the web for the information"""
return tavily_client.search(query)
agent = create_agent(
model="gpt-5-nano",
tools=[web_search]
)
question = [HumanMessage("Who is the current mayor of San Francisco?")]
response = agent.invoke({
"messages": question
})
pprint(response['messages'])
Short Term Memory for the Agents
Short term memory lets your application remember previous interactions within a single thread or conversation. A thread organizes multiple interactions in a session, similar to the way email groups messages in a single conversation.
from langgraph.checkpoint.memory import InMemorySaver
from langchain.agents import create_agent
from pprint import pprint
from langchain.messages import HumanMessage
agent = create_agent("gpt-5-nano", checkpointer=InMemorySaver())
question = HumanMessage(content="Hello my name is Seán and my favourite colour is green")
config = {"configurable": {"thread_id": "1"}}
response = agent.invoke({"messages": [question]}, config)
pprint(response)
question = HumanMessage("What's my favourite colour?")
response = agent.invoke({"messages": [question]}, config)
pprint(response['messages'][-1].content)
Messages
Messages are the fundamental unit of context for models in LangChain. They represent the input and output of models, carrying both the content and metadata needed to represent the state of a conversation when interacting with an LLM. Messages are objects that contain:
- Role - Identifies the message type (e.g. system, user)
- Content - Represents the actual content of the message (like text, images, audio, documents, etc.)
- Metadata - Optional fields such as response information, message IDs, and token usage
LangChain provides a standard message type that works across all model providers, ensuring consistent behavior regardless of the model being called.
Message Types
- System message - Tells the model how to behave and provide context for interactions
- Human message - Represents user input and interactions with the model
- AI message - Responses generated by the model, including text content, tool calls, and metadata
- Tool message - Represents the output of a tool call. Tool calls requested by the model live on the AIMessage itself as tool_calls; we can also explicitly create a ToolMessage object and pass it as part of the conversation (see the sketch after this list)
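A minimal sketch of a conversation that includes a tool round-trip (the tool name, result, and call id are made up; model is the chat model initialised earlier):
from langchain.messages import AIMessage, HumanMessage, ToolMessage
conversation = [
    HumanMessage("What is the square root of 467?"),
    # The model requested a tool call — tool_calls lives on the AIMessage itself
    AIMessage(content="", tool_calls=[
        {"name": "square_root", "args": {"x": 467}, "id": "call_1", "type": "tool_call"}
    ]),
    # The tool's result goes back as a ToolMessage tied to that call id
    ToolMessage(content="21.61", tool_call_id="call_1"),
]
response = model.invoke(conversation)
print(response.content)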
Basic Usage
We can directly import the different message classes and use them to invoke the model:
- HumanMessage
- AIMessage
- SystemMessage
- ToolMessage
Text Prompts
Text prompts are strings - ideal for straightforward generation tasks where you don’t need to retain conversation history (see the example after this list). Use text prompts when:
- You have a single, standalone request
- You don’t need conversation history
- You want minimal code complexity
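A minimal example (model is the chat model initialised earlier):
response = model.invoke("Write a two-line poem about the Moon")
print(response.content)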
Multimodal Content Messages
LangChain chat models accept message content in the content attribute. This may contain either:
- A string
- A list of content blocks in a provider-native format
- A list of LangChain’s standard content blocks
from langchain.messages import HumanMessage
# String content
human_message = HumanMessage("Hello, how are you?")
# Provider-native format (e.g., OpenAI)
human_message = HumanMessage(content=[
{"type": "text", "text": "Hello, how are you?"},
{"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
])
# List of standard content blocks
human_message = HumanMessage(content_blocks=[
{"type": "text", "text": "Hello, how are you?"},
{"type": "image", "url": "https://example.com/image.jpg"},
])
Image Input:
# From URL
message = {
"role": "user",
"content": [
{"type": "text", "text": "Describe the content of this image."},
{"type": "image", "url": "https://example.com/path/to/image.jpg"},
]
}
# From base64 data
message = {
"role": "user",
"content": [
{"type": "text", "text": "Describe the content of this image."},
{
"type": "image",
"base64": "AAAAIGZ0eXBtcDQyAAAAAGlzb21tcDQyAAACAGlzb2...",
"mime_type": "image/jpeg", # "image/png"
},
]
}
# From provider-managed File ID
message = {
"role": "user",
"content": [
{"type": "text", "text": "Describe the content of this image."},
{"type": "image", "file_id": "file-abc123"},
]
}
Audio Input:
# From base64 data
message = {
"role": "user",
"content": [
{"type": "text", "text": "Describe the content of this audio."},
{
"type": "audio",
"base64": "AAAAIGZ0eXBtcDQyAAAAAGlzb21tcDQyAAACAGlzb2...",
"mime_type": "audio/wav",
},
]
}
# From provider-managed File ID
message = {
"role": "user",
"content": [
{"type": "text", "text": "Describe the content of this audio."},
{"type": "audio", "file_id": "file-abc123"},
]
}
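Whichever form is used, the message is passed to the model in the usual way (a minimal sketch; the model must support the given modality, and the URLs / file IDs above are placeholders):
response = model.invoke([message])  # a list of messages — dicts or Message objects
print(response.content)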