How to use retrieval-augmented generation (RAG)
Category: AI & Machine Learning
Short answer
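Retrieval-augmented generation (RAG) grounds a large language model (LLM) in external sources. At query time it fetches relevant documents from a knowledge base and conditions the model's output on that retrieved evidence, so the knowledge can be updated without retraining the model.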
Steps
- Chunk source documents into passages and embed them using a dense retrieval model.
- Store the embeddings in a vector index such as FAISS (a library) or a vector database such as Pinecone or Weaviate.
- Encode the user query with the same embedding model and retrieve the top-k most similar passages, typically by cosine similarity or inner product.
- Concatenate retrieved context with the query into a structured prompt.
- Generate the response using an LLM instructed to ground answers in the provided context.
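The steps above can be sketched end to end in plain Python. This is a minimal, dependency-free sketch under stated assumptions. A hashed bag-of-words vector stands in for a real dense embedding model, and an in-memory list stands in for FAISS or a vector database. The final LLM call is left out. The helper names `chunk`, `embed`, `retrieve`, and `build_prompt` are hypothetical and do not belong to any library.

```python
import math
import re
import zlib
from collections import Counter

DIM = 256  # toy embedding size (assumption; real dense models define their own)


def chunk(text, max_words=50, overlap=10):
    """Step 1: split text into overlapping windows of at most max_words words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks


def embed(text):
    """Toy stand-in for a dense embedding model: hashed word counts, L2-normalized."""
    vec = [0.0] * DIM
    for token, count in Counter(re.findall(r"[a-z0-9]+", text.lower())).items():
        vec[zlib.crc32(token.encode()) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def retrieve(query, index, k=3):
    """Step 3: embed the query and return the k passages with the highest cosine similarity."""
    q = embed(query)
    # Vectors are unit-length, so the dot product equals cosine similarity.
    scored = [(sum(a * b for a, b in zip(q, vec)), passage) for passage, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:k]]


def build_prompt(query, passages):
    """Step 4: number each passage so the model can cite it, then append the question."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the context below and cite passages by number. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


# Step 2: an in-memory list of (passage, vector) pairs stands in for a vector index.
docs = [
    "FAISS is a library for efficient similarity search over dense vectors.",
    "Cross-encoders score a query and a passage jointly and are often used for reranking.",
    "BM25 is a keyword ranking function based on term frequency.",
]
index = [(p, embed(p)) for d in docs for p in chunk(d)]

question = "Which library does similarity search over vectors?"
top = retrieve(question, index, k=1)
prompt = build_prompt(question, top)
print(prompt)  # step 5 would send this prompt to an LLM
```

Swapping in a real embedding model and vector store changes only `embed` and the `index` storage. The chunk, retrieve, and prompt flow stays the same.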
Tips
- Use hybrid retrieval, combining dense vectors with keyword search (e.g., BM25), to catch exact terms such as names, codes, and IDs that embeddings can miss.
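- Rerank the top retrieved candidates with a cross-encoder, which scores the query and passage jointly; it is usually more accurate than the first-stage retriever but too slow to run over the whole corpus.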
- Cite sources in generated answers to improve trust and verifiability.
- Re-embed and re-index documents as the sources change; if you switch embedding models, re-embed the entire corpus so queries and passages share one vector space.
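A common way to implement hybrid retrieval is reciprocal rank fusion (RRF). It merges ranked lists from a dense retriever and a keyword retriever such as BM25 without requiring their raw scores to be on the same scale. The sketch below assumes each retriever has already returned an ordered list of document IDs. The constant `k=60` is the value commonly used for RRF, and the document IDs are made up for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1; a higher fused score means more relevant.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


dense_hits = ["doc3", "doc1", "doc7"]    # hypothetical dense-retriever output, best first
keyword_hits = ["doc1", "doc9", "doc3"]  # hypothetical BM25 output, best first
fused = reciprocal_rank_fusion([dense_hits, keyword_hits])
print(fused)
```

Here `doc1` comes out first because it ranks highly in both lists, ahead of `doc3`, which tops only the dense list. The fused list can then be passed to a cross-encoder for reranking.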
Common issues
- Retrieval failures when the query vocabulary differs from indexed documents.
- Context window limits forcing truncation of retrieved evidence.
- Hallucinations when the model ignores retrieved text and relies on parametric memory.
- Latency added by retrieval and reranking steps.
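One way to soften the context-window limit is to pack retrieved passages greedily, in relevance order, into a fixed token budget before building the prompt. This sketch assumes one token per whitespace-separated word, which is only a rough approximation. A real system would count tokens with the target model's own tokenizer. The budget value and the helper name `pack_context` are illustrative.

```python
def pack_context(passages, budget_tokens):
    """Keep passages in relevance order until the approximate token budget is spent.

    A passage that would overflow the budget is skipped rather than cut
    mid-sentence; a shorter passage later in the list may still fit.
    """
    packed, used = [], 0
    for passage in passages:
        cost = len(passage.split())  # crude token estimate: one token per word
        if used + cost <= budget_tokens:
            packed.append(passage)
            used += cost
    return packed


ranked = [
    "alpha beta gamma delta",        # 4 words
    "one two three four five six",   # 6 words
    "short passage",                 # 2 words
]
print(pack_context(ranked, budget_tokens=7))
```

Skipping whole passages keeps each piece of evidence intact, but it can drop a relevant passage that is long. Chunking documents into smaller passages reduces that risk.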
Example
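import torch
import torch.nn as nn
# Feed-forward classifier: 784 inputs (e.g., a flattened 28x28 image) to 10 classes
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(0.2),     # randomly zeroes 20% of activations during training
    nn.Linear(256, 10),  # raw logits; CrossEntropyLoss applies log-softmax internally
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)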
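This PyTorch snippet defines a small feed-forward classifier with dropout for regularization, a cross-entropy loss, and the Adam optimizer. It is a generic model-training setup rather than a RAG pipeline: it performs no retrieval or generation.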