
LangChain

L1 — Models, Prompts & Output Parsers

Source: DeepLearning.AI — LangChain for LLM App Dev | Code: [[L1-Model_prompt_parser.py]]

Why LangChain over raw API calls?

| Raw OpenAI API | LangChain |
| --- | --- |
| Manual prompt string building | Reusable ChatPromptTemplate with variables |
| response.content is always a string | StructuredOutputParser converts to dict |
| No abstraction across providers | Same code works with OpenAI, Anthropic, etc. |

Model

from langchain_openai import ChatOpenAI

# temperature=0.0 → deterministic output (use for structured tasks)
# temperature=1.0 → creative/random output
chat = ChatOpenAI(temperature=0.0, model="gpt-3.5-turbo")

Prompt Template

Reusable prompts with named variables — avoids manual f-string construction.

from langchain.prompts import ChatPromptTemplate

template_string = """Translate the text delimited by triple backticks \
into a style that is {style}. text: ```{text}```"""

prompt_template = ChatPromptTemplate.from_template(template_string)

# Fill in variables → returns list of LangChain message objects
messages = prompt_template.format_messages(
    style="American English in a calm and respectful tone",
    text="Arrr, I be fuming that me blender lid flew off!"
)

response = chat.invoke(messages)
print(response.content)  # string output

format_messages() returns list[HumanMessage] — LangChain message objects, not raw strings.


Output Parsers

The problem: response.content is always a str. Even if you ask for JSON, you can't call .get() on it.

response.content.get('gift')  # ❌ AttributeError: 'str' has no attribute 'get'

The solution: StructuredOutputParser — define a schema, inject format instructions into the prompt, parse the response into a real Python dict.

from langchain.output_parsers import ResponseSchema, StructuredOutputParser

# 1. Define schema
gift_schema = ResponseSchema(name="gift",
    description="Was the item purchased as a gift? True or False.")
delivery_schema = ResponseSchema(name="delivery_days",
    description="How many days to arrive? -1 if not found.")
price_schema = ResponseSchema(name="price_value",
    description="Any sentences about value/price as a comma-separated list.")

output_parser = StructuredOutputParser.from_response_schemas(
    [gift_schema, delivery_schema, price_schema]
)

# 2. Get format instructions to inject into prompt
format_instructions = output_parser.get_format_instructions()

# 3. Include {format_instructions} in your prompt template
template = """\
Extract info from the text below.
text: {text}
{format_instructions}
"""
prompt = ChatPromptTemplate.from_template(template)
messages = prompt.format_messages(
    text=customer_review,
    format_instructions=format_instructions
)

# 4. Parse response → Python dict
response = chat.invoke(messages)
output_dict = output_parser.parse(response.content)

print(type(output_dict))        # <class 'dict'>
print(output_dict.get('gift'))  # True
print(output_dict['delivery_days'])  # 2

Key Takeaways

  • temperature=0.0 → use for extraction/structured tasks; >0 for creative tasks
  • ChatPromptTemplate → reusable, variable-driven prompts
  • LLM output is always a string — use StructuredOutputParser to get a real dict
  • format_instructions tells the LLM exactly what JSON schema to output

L2 — Memory + Chains

Source: DeepLearning.AI — LangChain for LLM App Dev

Why Memory?

LLMs are stateless — each call is independent. Memory gives conversations continuity by injecting prior history into the prompt context.
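The "injecting prior history" part is plain context stuffing, which can be sketched without LangChain at all (the conversation values are illustrative):

```python
# Conceptual sketch of what memory does: stuff prior turns into the prompt.
# `history` plays the role of ConversationBufferMemory.
history = []

def build_prompt(history, user_input):
    """Concatenate prior turns, then the new input, into one prompt string."""
    lines = [f"Human: {h}\nAI: {a}" for h, a in history]
    lines.append(f"Human: {user_input}\nAI:")
    return "\n".join(lines)

history.append(("Hi, I'm Vatsal", "Hello Vatsal!"))
prompt = build_prompt(history, "What is my name?")
print(prompt)
# Human: Hi, I'm Vatsal
# AI: Hello Vatsal!
# Human: What is my name?
# AI:
```

Every memory type below is a different policy for deciding what goes into that `history` list.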


Memory Types

1. ConversationBufferMemory

Stores the full conversation history verbatim.

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.save_context({"input": "Hi, I'm Vatsal"}, {"output": "Hello Vatsal!"})
memory.save_context({"input": "What is my name?"}, {"output": "Your name is Vatsal."})

print(memory.load_memory_variables({}))
# {'history': "Human: Hi, I'm Vatsal\nAI: Hello Vatsal!\nHuman: What is my name?\nAI: Your name is Vatsal."}

Problem: Grows unbounded — will eventually exceed the context window.


2. ConversationBufferWindowMemory

Keeps only the last K exchanges. Older history is dropped.

from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=2)  # keep last 2 exchanges

Trade-off: Fixed cost, but loses early context (e.g., user's name mentioned at the start).
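Conceptually, a window memory is just a bounded buffer; a LangChain-free sketch with k=2:

```python
from collections import deque

# A window memory is conceptually a bounded deque: maxlen=k means
# appending a new exchange silently evicts the oldest one.
window = deque(maxlen=2)

window.append(("Hi, I'm Vatsal", "Hello Vatsal!"))
window.append(("What is 1+1?", "2"))
window.append(("What is my name?", "I don't have that in context."))

# The first exchange (containing the name) has already been dropped:
print(list(window))
# [('What is 1+1?', '2'), ('What is my name?', "I don't have that in context.")]
```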


3. ConversationTokenBufferMemory

Keeps history up to a token limit — drops oldest messages when limit exceeded.

from langchain.memory import ConversationTokenBufferMemory

memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=50)

Best for: Staying within a predictable token budget while keeping as much context as possible.
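The trimming rule can be sketched without LangChain: evict the oldest messages until an approximate token count fits the budget (the 4-characters-per-token estimate is a crude stand-in for the model's real tokenizer, which ConversationTokenBufferMemory uses via its llm argument):

```python
def trim_to_budget(messages, max_tokens):
    """Drop the oldest messages until the (rough) token count fits the budget.

    Token count is approximated as len(text) // 4 -- a crude heuristic;
    the real memory class counts tokens with the model's tokenizer.
    """
    est = lambda text: max(1, len(text) // 4)
    kept = list(messages)
    while kept and sum(est(m) for m in kept) > max_tokens:
        kept.pop(0)  # evict the oldest message first
    return kept

msgs = ["Hi, I'm Vatsal" * 3, "Hello Vatsal!", "What is my name?"]
print(trim_to_budget(msgs, max_tokens=10))
# ['Hello Vatsal!', 'What is my name?']
```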


4. ConversationSummaryMemory

Summarises older history instead of dropping it. Uses an LLM call to compress.

from langchain.memory import ConversationSummaryMemory

memory = ConversationSummaryMemory(llm=llm)

Best for: Long conversations where key facts (name, goal) need to persist without full verbatim history. Costs extra LLM calls.
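Minus the actual LLM call, the compression idea looks like this (`summarize` is a hypothetical stand-in for that call):

```python
def summarize(turns):
    """Hypothetical stand-in for the LLM summarisation call."""
    return f"(summary of {len(turns)} earlier exchanges)"

def compact_history(history, keep_last=2):
    """Keep the last `keep_last` turns verbatim; compress everything older."""
    old, recent = history[:-keep_last], history[-keep_last:]
    prefix = [summarize(old)] if old else []
    return prefix + recent

history = ["turn 1", "turn 2", "turn 3", "turn 4"]
print(compact_history(history))
# ['(summary of 2 earlier exchanges)', 'turn 3', 'turn 4']
```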


Memory Comparison

| Type | What it stores | Cost | Best for |
| --- | --- | --- | --- |
| BufferMemory | Full history | Grows unbounded | Short conversations |
| BufferWindowMemory | Last K turns | Fixed | Fixed-budget chats |
| TokenBufferMemory | History up to N tokens | Predictable | Token budget management |
| SummaryMemory | LLM-compressed summary | Extra LLM calls | Long multi-turn sessions |

Chains

LLMChain

The simplest chain — wraps a prompt template + LLM call into one reusable object.

from langchain.chains import LLMChain
from langchain.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template(
    "What is the best name for a company that makes {product}?"
)
chain = LLMChain(llm=llm, prompt=prompt)
chain.run("Queen Size Sheets")

SimpleSequentialChain

Chains where output of one step = input of next. Single input → single output.

from langchain.chains import SimpleSequentialChain

chain_one = LLMChain(llm=llm, prompt=first_prompt)   # company name
chain_two = LLMChain(llm=llm, prompt=second_prompt)  # describe the company

overall_chain = SimpleSequentialChain(
    chains=[chain_one, chain_two],
    verbose=True
)
overall_chain.run("Queen Size Sheets")

SequentialChain

Multiple inputs/outputs at each step — results from earlier steps can be passed to later ones.

from langchain.chains import SequentialChain

chain = SequentialChain(
    chains=[chain_one, chain_two, chain_three],
    input_variables=["review"],
    output_variables=["english_review", "summary", "followup_message"],
    verbose=True
)

Router Chain

Routes input to different sub-chains based on content. Uses an LLM to classify the input first.

from langchain.chains.router import MultiPromptChain
from langchain.chains.router.llm_router import LLMRouterChain, RouterOutputParser

# Define multiple prompt templates (physics, math, history...)
# Router LLM picks the right one based on the question

Use case: Single entry point that handles multiple domains (e.g., technical vs. billing queries).
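Stripped of LangChain's classes, routing is classify-then-dispatch; in MultiPromptChain the classifier is itself an LLM call, for which the keyword matcher below is a hypothetical stand-in:

```python
# Classify-then-dispatch: the essence of a router chain.
def classify(question: str) -> str:
    """Stand-in for the router LLM: pick a destination by keyword."""
    q = question.lower()
    if any(w in q for w in ("refund", "invoice", "charge")):
        return "billing"
    if any(w in q for w in ("error", "crash", "bug")):
        return "technical"
    return "default"

# Each handler stands in for a destination sub-chain.
handlers = {
    "billing":   lambda q: f"[billing chain] {q}",
    "technical": lambda q: f"[technical chain] {q}",
    "default":   lambda q: f"[general chain] {q}",
}

def route(question: str) -> str:
    return handlers[classify(question)](question)

print(route("I was charged twice, can I get a refund?"))
# [billing chain] I was charged twice, can I get a refund?
```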


Memory + Chain Together

from langchain.chains import ConversationChain

conversation = ConversationChain(
    llm=llm,
    memory=ConversationBufferMemory(),
    verbose=True
)

conversation.predict(input="Hi, my name is Vatsal")
conversation.predict(input="What is 1+1?")
conversation.predict(input="What is my name?")  # remembers "Vatsal"

Key Takeaways — L2

  • Memory is injected into the prompt — it's not magic, it's context stuffing
  • BufferWindowMemory is the cheapest production-safe default
  • SummaryMemory preserves semantic content at the cost of extra LLM calls
  • LangChain ConversationBufferMemory is not thread-safe — never share across requests (see: session bleed bug in RAG in Production article)
  • Chains = composable LLM pipelines; Router Chain = dynamic dispatch based on input

Installation
uv add langchain

# Installing the OpenAI integration 
uv add langchain-openai 

# Installing the Anthropic integration 
uv add langchain-anthropic

# AWS
uv add langchain-aws

# Gemini
uv add langchain-google-genai

Initialise any Standalone Model

import os
from langchain.chat_models import init_chat_model
from pprint import pprint

## Set API key for model provider
os.environ["OPENAI_API_KEY"] = "sk-..."

# Initialise model
model = init_chat_model("gpt-5-nano")

response = model.invoke("Tell me a joke")

print(response.content)
pprint(response.response_metadata)

Customising Model Parameters:

  1. temperature: Controls the randomness of the model’s output. A higher number makes responses more creative; lower ones make them more deterministic.
  2. max_tokens: Limits the total number of tokens in the response, effectively controlling how long the output can be.
  3. timeout: The maximum time (in seconds) to wait for a response from the model before cancelling the request.
  4. max_retries: The maximum number of attempts the system will make to resend a request if it fails due to issues like network timeouts or rate limits.

[!Note] init_chat_model supports many model providers out of the box; for some newer models we still need to install the provider's own LangChain integration package.

[!important] For all providers, the API keys need to be stored inside a .env file.

Invoke

To call a model, use the invoke method with a single message or a list of messages.

from langchain.messages import AIMessage, HumanMessage, SystemMessage

conversation = [ 
	SystemMessage("You are a helpful assistant that translates English to French."), 
	HumanMessage("Translate: I love programming."), 
	AIMessage("J'adore la programmation."), 
	HumanMessage("Translate: I love building applications.") 
] 

response = model.invoke(conversation)
print(response) # AIMessage("J'adore créer des applications.")

Creating Agents

Agents combine language models with tools to create systems that can reason about tasks, decide which tools to use, and iteratively work towards solutions.

from langchain.agents import create_agent
from langchain.chat_models import init_chat_model
from langchain.messages import HumanMessage, AIMessage
from pprint import pprint

agent = create_agent(model="claude-sonnet-4-5")

# If we want more control we can use initialised model to create agent
model = init_chat_model(model="gpt-5-nano", temperature=1.0)
agent = create_agent(model=model)

# we can simply specify the model name
agent = create_agent("gpt-5-nano")

# Invoke agent with invoke function
response = agent.invoke({
	"messages": [HumanMessage("What's the capital of the Moon?")], 
})
pprint(response)
print(response['messages'][-1].content)

# We can pass chat history with several human messages
response = agent.invoke({
	"messages": [
		HumanMessage(content="What's the capital of the Moon?"),
		AIMessage(content="The capital of the Moon is Luna City."),
		HumanMessage(content="Interesting, tell me more about Luna City")
	]
})
pprint(response)

# We can stream the response from the agent instead of waiting for the full message
for token, metadata in agent.stream({
	"messages": [HumanMessage(content="Tell me all about Luna City, the capital of the Moon")]
}, stream_mode="messages"):
	
	# token is a message chunk with token content
	# metadata contains which node produced the token
	if token.content:
		print(token.content, end="", flush=True)

System Prompt

You can shape how your agent approaches tasks by providing a prompt.

system_prompt = "You are a science fiction writer, create a capital city at the user's request."

scifi_agent = create_agent(
    model="gpt-5-nano",
    system_prompt=system_prompt
)

question = HumanMessage("What is the capital of the Moon?")

response = scifi_agent.invoke(
    {"messages": [question]}
)

print(response['messages'][-1].content)

Few-Shot Example: give the prompt a few question/answer examples that the model will follow.

system_prompt = """

You are a science fiction writer, create a space capital city at the user's request.

User: What is the capital of Mars?
Scifi Writer: Marsialis

User: What is the capital of Venus?
Scifi Writer: Venusovia

"""

Structured Prompt: specify the answer format in the system prompt so that the model follows it.

system_prompt = """

You are a science fiction writer, create a space capital city at the user's request.

Please keep to the below structure.

Name: The name of the capital city

Location: Where it is based

Vibe: 2-3 words to describe its vibe

Economy: Main industries

"""

# Output
"""
Name: Selene Prime
Location: South Pole region, Shackleton Crater rim, perched on a permanently sunlit annulus with access to near-surface water ice
Vibe: Luminous, austere, resilient
Economy: Helium-3 and water-ice mining; in-situ resource utilization; lunar manufacturing and assembly; spaceport services and logistics; R&D in lunar science and life-support tech
"""

Structured Output: instead of describing the output format in the system prompt, we can pass response_format to get the output in a specific structure.

from langchain.agents import create_agent
from langchain.messages import HumanMessage
from pydantic import BaseModel

class CapitalInfo(BaseModel):
    name: str
    location: str
    vibe: str
    economy: str
    
agent = create_agent(
	model="gpt-5-nano",
	system_prompt="You are a science fiction writer, create a space capital city at the user's request.",
	response_format=CapitalInfo
)

question = HumanMessage("What is the capital of the Moon?")

response = agent.invoke(
	{ "messages": [question]}
)
print(response["structured_response"])
capital_info = response["structured_response"]

capital_name = capital_info.name
capital_location = capital_info.location

print(f"{capital_name} is a city located at {capital_location}")

Tools

An LLM Agent runs tools in a loop to achieve a goal. An agent runs until a stop condition is met - i.e., when the model emits a final output or an iteration limit is reached.

%%{
  init: {
    "fontFamily": "monospace",
    "flowchart": {
      "curve": "curve"
    },
    "themeVariables": {"edgeLabelBackground": "transparent"}
  }
}%%
graph TD
  %% Outside the agent
  QUERY([input])
  LLM{model}
  TOOL(tools)
  ANSWER([output])

  %% Main flows (no inline labels)
  QUERY --> LLM
  LLM --"action"--> TOOL
  TOOL --"observation"--> LLM
  LLM --"finish"--> ANSWER

  classDef blueHighlight fill:#0a1c25,stroke:#0a455f,color:#bae6fd;
  classDef greenHighlight fill:#0b1e1a,stroke:#0c4c39,color:#9ce4c4;
  class QUERY blueHighlight;
  class ANSWER blueHighlight;
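The loop in the diagram can be sketched in plain Python (the model and tool interfaces here are simplified, hypothetical stand-ins, not LangChain's actual API):

```python
def run_agent(model, tools, query, max_iters=5):
    """Minimal agent loop: ask the model, run any requested tool, feed the
    observation back, and stop when the model emits a final answer."""
    messages = [query]
    for _ in range(max_iters):
        action = model(messages)  # model decides: call a tool or finish
        if action["type"] == "finish":
            return action["output"]
        observation = tools[action["tool"]](**action["args"])
        messages.append(observation)  # the observation becomes new context
    return "iteration limit reached"

# A scripted stand-in for the LLM: first requests a tool, then finishes.
def scripted_model(messages):
    if len(messages) == 1:
        return {"type": "tool", "tool": "square_root", "args": {"x": 467}}
    return {"type": "finish", "output": f"The square root is {messages[-1]:.2f}"}

tools = {"square_root": lambda x: x ** 0.5}
print(run_agent(scripted_model, tools, "What is the square root of 467?"))
# The square root is 21.61
```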

We can create tools by applying the tool decorator to plain Python functions. By default the tool name comes from the function name and the tool description comes from the function's docstring.

from langchain.tools import tool

@tool
def square_root(x: float) -> float:
    """Calculate the square root of a number"""
    return x ** 0.5

We can also pass the tool name and description explicitly:

@tool("square_root", description="Calculate the square root of a number")
def tool1(x: float) -> float:
    return x ** 0.5

# Invoking the tool with invoke function
tool1.invoke({"x": 467})

Tools example with search capability:

from langchain.tools import tool
from langchain.agents import create_agent
from langchain.messages import HumanMessage

from typing import Dict, Any
from pprint import pprint
from tavily import TavilyClient

tavily_client = TavilyClient()

@tool
def web_search(query: str) -> Dict[str, Any]:
	"""Search the web for the information"""
	return tavily_client.search(query)
	
agent = create_agent(
	model="gpt-5-nano",
	tools=[web_search]
)

question = [HumanMessage("Who is the current mayor of San Francisco?")]
response = agent.invoke({
	"messages": question
})
  

pprint(response['messages'])

Short Term Memory for the Agents

Short term memory lets your application remember previous interactions within a single thread or conversation. A thread organizes multiple interactions in a session, similar to the way email groups messages in a single conversation.

from langgraph.checkpoint.memory import InMemorySaver
from langchain.agents import create_agent
from pprint import pprint
from langchain.messages import HumanMessage

agent = create_agent("gpt-5-nano", checkpointer=InMemorySaver())

question = HumanMessage(content="Hello my name is Seán and my favourite colour is green")

config = {"configurable": {"thread_id": "1"}}
response = agent.invoke({"messages": [question]}, config)

pprint(response)
question = HumanMessage("What's my favourite colour?")
response = agent.invoke({"messages": [question]}, config)

pprint(response["messages"][-1].content)

Messages

Messages are the fundamental unit of context for models in LangChain. They represent the input and output of models, carrying both the content and metadata needed to represent the state of a conversation when interacting with an LLM. Messages are objects that contain:

  •  Role - Identifies the message type (e.g. system, user)
  •  Content - Represents the actual content of the message (like text, images, audio, documents, etc.)
  •  Metadata - Optional fields such as response information, message IDs, and token usage

LangChain provides a standard message type that works across all model providers, ensuring consistent behavior regardless of the model being called.

Message Types
  •  System message - Tells the model how to behave and provides context for interactions
  •  Human message - Represents user input and interactions with the model
  •  AI message - Responses generated by the model, including text content, tool calls, and metadata
  •  Tool message - Carries the output of a tool call back to the model. The call itself appears on the AIMessage as tool_calls; we can also explicitly create a ToolMessage object and pass it as part of the conversation
Basic Usage

We can directly import the different message classes and use them to invoke the model.

  1. HumanMessage
  2. AIMessage
  3. SystemMessage
  4. ToolMessage
Text Prompts

Text prompts are strings - ideal for straightforward generation tasks where you don’t need to retain conversation history. Use text prompts when:

  • You have a single, standalone request
  • You don’t need conversation history
  • You want minimal code complexity
Multimodal Content Messages

LangChain chat models accept message content in the content attribute. This may contain one of:

  1. A string
  2. A list of content blocks in a provider-native format
  3. A list of LangChain’s standard content blocks
from langchain.messages import HumanMessage

# String content
human_message = HumanMessage("Hello, how are you?")

# Provider-native format (e.g., OpenAI)
human_message = HumanMessage(content=[
    {"type": "text", "text": "Hello, how are you?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
])

# List of standard content blocks
human_message = HumanMessage(content_blocks=[
    {"type": "text", "text": "Hello, how are you?"},
    {"type": "image", "url": "https://example.com/image.jpg"},
])

Image Input:

# From URL
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe the content of this image."},
        {"type": "image", "url": "https://example.com/path/to/image.jpg"},
    ]
}

# From base64 data
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe the content of this image."},
        {
            "type": "image",
            "base64": "AAAAIGZ0eXBtcDQyAAAAAGlzb21tcDQyAAACAGlzb2...",
            "mime_type": "image/jpeg", # "image/png"
        },
    ]
}

# From provider-managed File ID
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe the content of this image."},
        {"type": "image", "file_id": "file-abc123"},
    ]
}

Audio Input:

# From base64 data
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe the content of this audio."},
        {
            "type": "audio",
            "base64": "AAAAIGZ0eXBtcDQyAAAAAGlzb21tcDQyAAACAGlzb2...",
            "mime_type": "audio/wav",
        },
    ]
}

# From provider-managed File ID
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe the content of this audio."},
        {"type": "audio", "file_id": "file-abc123"},
    ]
}