RAG & LLM System
Problem
Design a Retrieval-Augmented Generation (RAG) system — an LLM-powered Q&A service that grounds its answers in a private knowledge base.
Why It Matters for You
Direct relevance: the AWS Bedrock LLM chat application at HCLTech. The Flask + Bedrock app you built IS a RAG system; use it as an interview war story.
Functional Requirements
- Ingest documents from various sources (PDF, web, DB)
- Answer user questions using an LLM grounded in ingested docs
- Keep knowledge base up to date (re-indexing)
Non-Functional Requirements
- Low-latency responses (< 3 s end-to-end)
- Accurate retrieval (relevance)
- Scalable to large corpora
High-Level Design
```
Documents  → Chunker → Embedder → Vector DB (Pinecone/Weaviate/pgvector)
                                       ↓
User Query → Embedder → Similarity Search → Top-K Chunks
                                       ↓
                      LLM (Bedrock/GPT) + Chunks → Answer
```
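The query-time half of the pipeline above can be sketched end-to-end. This is a minimal illustration, not production code: the `embed` function below is a toy bag-of-words stand-in for a real embedding model (Bedrock Titan, OpenAI, sentence-transformers), and the vector DB is a plain in-memory list.

```python
import re
import numpy as np

# Toy stand-in for a real embedding model: a normalized bag-of-words
# vector over a tiny fixed vocabulary. Illustration only.
VOCAB = ["flask", "bedrock", "rag", "vector", "chunks", "llm", "index", "query", "context"]

def embed(text: str) -> np.ndarray:
    words = re.findall(r"[a-z]+", text.lower())
    vec = np.array([words.count(w) for w in VOCAB], dtype=float)
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Cosine similarity = dot product of unit vectors; take top-k chunks.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: float(embed(c) @ q), reverse=True)
    return ranked[:k]

chunks = [
    "The Flask app calls Bedrock to answer questions.",
    "Chunk documents before building the vector index.",
    "The LLM receives the top-k chunks as grounding context.",
]
top = retrieve("how does the llm use chunks", chunks)
# Stuff the retrieved chunks into the prompt, then call the LLM.
prompt = "Answer using only this context:\n" + "\n".join(top) + "\nQ: ..."
```

In the real system, `embed` is an API call, the similarity search runs inside the vector DB's ANN index, and `prompt` is sent to Bedrock/GPT.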
Key Components
| Component | Options |
|---|---|
| Embedding Model | OpenAI (text-embedding-3, ada-002), HuggingFace sentence-transformers |
| Vector DB | Pinecone, Weaviate, pgvector, FAISS |
| LLM | AWS Bedrock (Claude), GPT-4, Llama |
| Orchestration | LangChain, LlamaIndex |
| Chunking Strategy | Fixed size, recursive, semantic |
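The simplest chunking strategy from the table, fixed size with overlap, fits in a few lines. The overlap ensures text split at a chunk boundary still appears whole in at least one chunk; sizes here are character counts, chosen arbitrarily for illustration.

```python
def chunk_fixed(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Fixed-size chunking with overlap. Each chunk starts `size - overlap`
    characters after the previous one, so neighbors share `overlap` chars."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Recursive chunking (split on headings, then paragraphs, then sentences) and semantic chunking (split where embedding similarity drops) follow the same interface but respect document structure better.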
Key Tradeoffs
- Chunk size — small chunks give more precise retrieval; large chunks give the LLM more context per hit
- Retrieval — dense (vector similarity) vs sparse (BM25) vs hybrid
- Re-ranking — add a cross-encoder after retrieval for better accuracy
- Caching — cache embeddings + common query results
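For the dense-vs-sparse-vs-hybrid tradeoff, a common way to combine the two result lists without tuning score scales is Reciprocal Rank Fusion (RRF). A minimal sketch, assuming each retriever already returns a ranked list of document IDs:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each document scores sum(1 / (k + rank))
    across the ranked lists it appears in. k=60 is the commonly used
    constant; it dampens the influence of any single list."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d2"]   # ranked output of vector similarity search
sparse = ["d1", "d4", "d3"]  # ranked output of BM25
fused = rrf([dense, sparse])
```

Because RRF uses only ranks, not raw scores, it sidesteps the problem that cosine similarities and BM25 scores live on incomparable scales.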