Build a complete retrieval-augmented generation pipeline in Python. You will chunk documents, generate embeddings, implement cosine similarity search, retrieve relevant context for a query, and format a prompt ready for an LLM. The entire pipeline runs in about 50 lines of core logic.
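You need Python 3.8 or later (recent sentence-transformers releases may require a newer Python, so check the package's current requirements) and two packages: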
pip install sentence-transformers numpy
sentence-transformers provides pre-trained embedding models. numpy handles the vector math. No vector database is needed – you will implement similarity search directly.
Create a file called rag_lab.py.
Start with a small knowledge base of hardcoded documents on distinct topics. In a production system, these would come from files, databases, or APIs.
import numpy as np
from sentence_transformers import SentenceTransformer
# Source documents -- each is a short passage on a different topic
documents = [
    {
        "title": "Photosynthesis",
        "text": (
            "Photosynthesis is the process by which green plants and certain "
            "other organisms convert light energy into chemical energy. During "
            "photosynthesis, plants capture sunlight using chlorophyll in their "
            "leaves and use it to transform water and carbon dioxide into oxygen "
            "and glucose. The glucose provides energy for the plant's metabolic "
            "processes. Photosynthesis is responsible for producing the oxygen "
            "in Earth's atmosphere."
        ),
    },
    {
        "title": "TCP/IP",
        "text": (
            "TCP/IP is the foundational protocol suite for the internet. TCP "
            "(Transmission Control Protocol) provides reliable, ordered delivery "
            "of data between applications. IP (Internet Protocol) handles "
            "addressing and routing packets across networks. Together they enable "
            "applications to communicate across diverse networks without needing "
            "to know the underlying hardware. TCP uses a three-way handshake "
            "(SYN, SYN-ACK, ACK) to establish connections and sequence numbers "
            "to ensure packets arrive in order."
        ),
    },
    {
        "title": "The French Revolution",
        "text": (
            "The French Revolution began in 1789 with the storming of the "
            "Bastille and ended in 1799 with Napoleon's rise to power. It was "
            "driven by widespread discontent with the monarchy, economic hardship, "
            "and Enlightenment ideals of liberty and equality. The revolution "
            "abolished feudalism, established the Declaration of the Rights of Man "
            "and Citizen, and fundamentally transformed French governance. The Reign "
            "of Terror under Robespierre saw thousands executed by guillotine."
        ),
    },
    {
        "title": "Hash Functions",
        "text": (
            "A cryptographic hash function takes an arbitrary-length input and "
            "produces a fixed-size output called a digest. Good hash functions "
            "have three properties: preimage resistance (cannot reverse the hash), "
            "second preimage resistance (cannot find a different input with the "
            "same hash), and collision resistance (cannot find any two inputs with "
            "the same hash). SHA-256 produces a 256-bit digest and is widely used "
            "in TLS certificates, git, and blockchain systems."
        ),
    },
    {
        "title": "Mitochondria",
        "text": (
            "Mitochondria are membrane-bound organelles found in the cytoplasm "
            "of eukaryotic cells. They generate most of the cell's supply of "
            "adenosine triphosphate (ATP), the molecule used as energy currency. "
            "Mitochondria have their own DNA, separate from nuclear DNA, which "
            "supports the endosymbiotic theory that they originated as free-living "
            "bacteria that were engulfed by ancestral eukaryotic cells. Each cell "
            "can contain hundreds to thousands of mitochondria depending on its "
            "energy demands."
        ),
    },
]
print(f"Loaded {len(documents)} documents")
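The hardcoded list can be swapped for documents read from disk. The following is a minimal sketch, assuming a folder of UTF-8 `.txt` files where each file name serves as the document title; the function name `load_documents` and the folder layout are hypothetical and not part of the lab script.

```python
from pathlib import Path


def load_documents(directory):
    """Read every .txt file in `directory` into the lab's document format."""
    docs = []
    # Sort the paths so the document order is stable across runs.
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        # Collapse newlines and repeated spaces so sentence splitting sees plain prose.
        text = " ".join(text.split())
        if text:  # skip empty files
            docs.append({"title": path.stem, "text": text})
    return docs
```

Calling `load_documents("docs")` on such a folder would return a list with the same `title` and `text` keys used above, so the rest of the pipeline should not need to change.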
Split each document into smaller chunks. For this lab, use simple sentence-based chunking. Production systems use more sophisticated strategies (recursive, semantic), but sentence splitting demonstrates the principle.
def chunk_documents(docs, max_chunk_length=200):
    """Split documents into chunks by sentence boundaries."""
    chunks = []
    for doc in docs:
        sentences = doc["text"].replace(". ", ".\n").split("\n")
        current_chunk = ""
        for sentence in sentences:
            if len(current_chunk) + len(sentence) > max_chunk_length and current_chunk:
                chunks.append({
                    "title": doc["title"],
                    "text": current_chunk.strip(),
                })
                current_chunk = sentence
            else:
                current_chunk += (" " if current_chunk else "") + sentence
        if current_chunk.strip():
            chunks.append({
                "title": doc["title"],
                "text": current_chunk.strip(),
            })
    return chunks


chunks = chunk_documents(documents)
print(f"Created {len(chunks)} chunks\n")
for i, chunk in enumerate(chunks):
    print(f" Chunk {i}: [{chunk['title']}] {chunk['text'][:60]}...")
Loaded 5 documents
Created 10 chunks
Chunk 0: [Photosynthesis] Photosynthesis is the process by which green plants an...
Chunk 1: [Photosynthesis] The glucose provides energy for the plant's metabolic p...
Chunk 2: [TCP/IP] TCP/IP is the foundational protocol suite for the internet...
Chunk 3: [TCP/IP] Together they enable applications to communicate across div...
Chunk 4: [The French Revolution] The French Revolution began in 1789 with the storming o...
Chunk 5: [The French Revolution] The revolution abolished feudalism, established the Decla...
Chunk 6: [Hash Functions] A cryptographic hash function takes an arbitrary-length in...
Chunk 7: [Hash Functions] SHA-256 produces a 256-bit digest and is widely used in T...
Chunk 8: [Mitochondria] Mitochondria are membrane-bound organelles found in the c...
Chunk 9: [Mitochondria] Mitochondria have their own DNA, separate from nuclear DN...
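Each document is split at sentence boundaries: the chunker keeps adding whole sentences to a chunk until the next one would push it past max_chunk_length. Treat the listing above as illustrative. With max_chunk_length=200, expect more and shorter chunks than it shows, because several adjacent sentence pairs in these documents exceed 200 characters together. The limit is also a target rather than a hard cap: a single sentence longer than the limit (such as the second sentence of the Hash Functions document) still becomes one oversized chunk.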
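If a hard cap on chunk length is needed, one option is to split any oversized text on word boundaries. The helper below is a minimal sketch of that idea; `split_long_text` is a hypothetical name, and breaking on spaces is an assumption rather than one of the production strategies mentioned above.

```python
def split_long_text(text, max_len=200):
    """Split text into pieces of at most max_len characters, breaking on spaces."""
    pieces = []
    current = ""
    for word in text.split():
        # A single word longer than max_len is cut into fixed-size slices.
        while len(word) > max_len:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(word[:max_len])
            word = word[max_len:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_len:
            current = candidate
        else:
            # The word does not fit, so close the current piece and start a new one.
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces
```

Running each chunk's text through this helper after `chunk_documents` would leave short chunks unchanged and break only the oversized ones.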
Use a pre-trained embedding model to convert each chunk into a dense vector. The first run downloads the model (about 90 MB).
print("\nLoading embedding model...")
model = SentenceTransformer("all-MiniLM-L6-v2")
# Embed all chunks
chunk_texts = [c["text"] for c in chunks]
embeddings = model.encode(chunk_texts, show_progress_bar=False)
print(f"Embedding shape: {embeddings.shape}")
print(f"Each chunk is now a {embeddings.shape[1]}-dimensional vector")
Loading embedding model...
Embedding shape: (10, 384)
Each chunk is now a 384-dimensional vector
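Each chunk is now represented as a 384-dimensional vector. The first dimension of the shape equals the number of chunks, so it will match the chunk count from your own run. Chunks with similar meaning will have vectors that are close together in this space.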
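Embedding is typically the most expensive step in this lab, so it can be worth storing the vectors on disk and reloading them on later runs. This is a minimal sketch using NumPy's `.npz` format; the helper names `save_index` and `load_index` and the file layout are assumptions, not part of the lab script.

```python
import numpy as np


def save_index(path, embeddings, chunks):
    """Store embeddings and their chunk metadata in one .npz file."""
    np.savez(
        path,
        embeddings=np.asarray(embeddings, dtype=np.float32),
        titles=np.array([c["title"] for c in chunks]),
        texts=np.array([c["text"] for c in chunks]),
    )


def load_index(path):
    """Load the arrays back and rebuild the list of chunk dicts."""
    with np.load(path) as data:
        embeddings = data["embeddings"]
        chunks = [
            {"title": str(t), "text": str(x)}
            for t, x in zip(data["titles"], data["texts"])
        ]
    return embeddings, chunks
```

A saved index is only valid for the embedding model and chunking settings that produced it, so it should be rebuilt whenever either changes.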
Build the retrieval function. Cosine similarity is the cosine of the angle between two vectors: 1.0 means they point in the same direction, 0.0 means they are orthogonal, and -1.0 means they point in opposite directions.
def cosine_similarity(a, b):
    """Compute cosine similarity between two vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


def retrieve(query, model, embeddings, chunks, top_k=3):
    """Embed a query and return the top-k most similar chunks."""
    query_embedding = model.encode([query], show_progress_bar=False)[0]
    # Compute similarity against all chunks
    similarities = []
    for i, emb in enumerate(embeddings):
        sim = cosine_similarity(query_embedding, emb)
        similarities.append((sim, i))
    # Sort by similarity (descending) and return top-k
    similarities.sort(reverse=True)
    results = []
    for sim, idx in similarities[:top_k]:
        results.append({
            "score": float(sim),
            "title": chunks[idx]["title"],
            "text": chunks[idx]["text"],
        })
    return results
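The loop above scores one chunk at a time, which is easy to follow. The same ranking can be computed in a single matrix operation, because the dot product of unit-length vectors equals their cosine similarity. The sketch below illustrates this with toy 2-D vectors in place of real embeddings; `top_k_cosine` is a hypothetical helper, not part of the lab script.

```python
import numpy as np


def top_k_cosine(query_vec, matrix, top_k=3):
    """Return (index, score) pairs for the top_k rows most similar to query_vec."""
    # Normalize the rows and the query to unit length.
    matrix_unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    query_unit = query_vec / np.linalg.norm(query_vec)
    scores = matrix_unit @ query_unit          # one cosine score per row
    order = np.argsort(scores)[::-1][:top_k]   # highest scores first
    return [(int(i), float(scores[i])) for i in order]


# Toy 2-D vectors standing in for real embeddings
toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
# Row 0 points the same way as the query (score 1.0); row 2 sits at 45 degrees
# (score about 0.707), so these two rows are returned in that order.
print(top_k_cosine(np.array([2.0, 0.0]), toy, top_k=2))
```

For the ten or so chunks in this lab the difference is negligible, but the vectorized form tends to matter once the index holds thousands of vectors.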
Run a few queries and inspect the results.
queries = [
    "How do plants produce energy from sunlight?",
    "How does TCP establish a connection?",
    "What is SHA-256 used for?",
]

for query in queries:
    print(f"\nQuery: {query}")
    results = retrieve(query, model, embeddings, chunks, top_k=3)
    for rank, r in enumerate(results):
        print(f" {rank + 1}. [{r['title']}] score={r['score']:.4f}")
        print(f" {r['text'][:80]}...")
Query: How do plants produce energy from sunlight?
1. [Photosynthesis] score=0.7234
Photosynthesis is the process by which green plants and certain other organisms ...
2. [Photosynthesis] score=0.5102
The glucose provides energy for the plant's metabolic processes...
3. [Mitochondria] score=0.2841
Mitochondria are membrane-bound organelles found in the cytoplasm of eukaryotic...
Query: How does TCP establish a connection?
1. [TCP/IP] score=0.7518
Together they enable applications to communicate across diverse networks without...
2. [TCP/IP] score=0.6293
TCP/IP is the foundational protocol suite for the internet...
3. [Hash Functions] score=0.1204
A cryptographic hash function takes an arbitrary-length input and produces a fix...
Query: What is SHA-256 used for?
1. [Hash Functions] score=0.6847
SHA-256 produces a 256-bit digest and is widely used in TLS certificates, git, ...
2. [Hash Functions] score=0.5231
A cryptographic hash function takes an arbitrary-length input and produces a fix...
3. [TCP/IP] score=0.1156
TCP/IP is the foundational protocol suite for the internet...
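The retriever ranks chunks from the matching document first for each query. In this sample output, off-topic chunks score well below on-topic ones, which gives a clear separation between relevant and irrelevant results. The exact scores in your run will vary with the model version and the chunk boundaries.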
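A gap between on-topic and off-topic scores can be used to drop weak matches before they reach the prompt. The following sketch filters results by a minimum score; the function name `filter_by_score` and the 0.3 threshold are assumptions chosen for illustration, and a suitable cutoff depends on the embedding model and the data.

```python
def filter_by_score(results, min_score=0.3):
    """Keep only results whose similarity score is at least min_score."""
    return [r for r in results if r["score"] >= min_score]


# Sample results in the shape returned by retrieve(); the scores are made up
sample = [
    {"score": 0.72, "title": "Photosynthesis", "text": "..."},
    {"score": 0.51, "title": "Photosynthesis", "text": "..."},
    {"score": 0.28, "title": "Mitochondria", "text": "..."},
]
kept = filter_by_score(sample, min_score=0.3)
print([r["title"] for r in kept])  # the Mitochondria entry is dropped
```

If the filter removes every result, the prompt's instruction to say when the context is insufficient becomes the fallback.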
Build a prompt that injects the retrieved context, ready to send to an LLM.
def build_rag_prompt(query, results):
    """Format a RAG prompt with retrieved context."""
    context_parts = []
    for i, r in enumerate(results):
        context_parts.append(f"[Source {i + 1}: {r['title']}]\n{r['text']}")
    context = "\n\n".join(context_parts)
    prompt = (
        "Answer the question based only on the provided context. "
        "If the context does not contain enough information, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n\n"
        "Answer:"
    )
    return prompt
# Build and display a RAG prompt
query = "How do plants convert sunlight into energy?"
results = retrieve(query, model, embeddings, chunks, top_k=2)
prompt = build_rag_prompt(query, results)
print("\nFull RAG Prompt:")
print("=" * 60)
print(prompt)
print("=" * 60)
print(f"\nPrompt length: {len(prompt)} characters")
Full RAG Prompt:
============================================================
Answer the question based only on the provided context. If the context does not contain enough information, say so.
Context:
[Source 1: Photosynthesis]
Photosynthesis is the process by which green plants and certain other organisms convert light energy into chemical energy. During photosynthesis, plants capture sunlight using chlorophyll in their leaves and use it to transform water and carbon dioxide into oxygen and glucose.
[Source 2: Photosynthesis]
The glucose provides energy for the plant's metabolic processes. Photosynthesis is responsible for producing the oxygen in Earth's atmosphere.
Question: How do plants convert sunlight into energy?
Answer:
============================================================
Prompt length: 612 characters
This prompt follows the standard RAG pattern: an explicit instruction to use only the context, the retrieved chunks with source attribution, and the user's question. An LLM receiving this prompt is steered toward an answer grounded in the retrieved text rather than in its training data alone. The exact context and prompt length in your run depend on which chunks were retrieved.
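When top_k grows or chunks get longer, the context can outgrow what you want to send to the model. One simple approach is to keep the best-ranked results that fit a size budget. This is a minimal sketch, assuming a budget measured in characters rather than tokens; `fit_to_budget` and the 1000-character default are hypothetical.

```python
def fit_to_budget(results, max_chars=1000):
    """Keep the highest-ranked results whose combined text fits in max_chars."""
    selected = []
    used = 0
    for r in results:  # results are assumed to be sorted best-first
        length = len(r["text"])
        if used + length > max_chars:
            break  # stop at the first result that does not fit
        selected.append(r)
        used += length
    return selected
```

The trimmed list can be passed to `build_rag_prompt` in place of the full result list.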
If you have an OpenAI API key, you can complete the pipeline by sending the prompt to a model. This step is optional – the pipeline is complete without it.
# Uncomment the following to call the OpenAI API:
#
# import os
# from openai import OpenAI
#
# client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
#
# response = client.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=[{"role": "user", "content": prompt}],
#     max_tokens=200,
# )
#
# print("\nLLM Response:")
# print(response.choices[0].message.content)
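The same call can be wrapped in a small function so the rest of the script does not depend on how the client is created. The sketch below assumes a client object that exposes `chat.completions.create` as in the commented block above; the name `ask_llm` and the `temperature=0` setting are assumptions, not part of the lab script.

```python
def ask_llm(client, prompt, model="gpt-4o-mini", max_tokens=200):
    """Send a RAG prompt through an OpenAI-style client and return the answer text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        temperature=0,  # assumption: low randomness suits grounded answers
    )
    content = response.choices[0].message.content
    # content may be None in some responses; fall back to an empty string
    return (content or "").strip()
```

Because the client is passed in as an argument, the function can be exercised with a stub object in tests without an API key.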
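This lab built a complete RAG pipeline from scratch: sentence-based chunking, embedding generation, cosine similarity search, top-k retrieval, and prompt construction.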
Once the embedding model is loaded, the retrieval path, from query to formatted prompt, typically runs in milliseconds on a laptop with no external services. Production systems add vector databases for scale, reranking for precision, and hybrid search for exact-match queries, but the core logic is the same.