How to use retrieval-augmented generation (RAG)

· Category: AI & Machine Learning

Short answer

Retrieval-augmented generation (RAG) improves a large language model's answers by retrieving relevant documents from an external knowledge base at query time and conditioning generation on that retrieved evidence.

Steps

  1. Chunk source documents into passages and embed them using a dense retrieval model.
  2. Store embeddings in a vector database such as FAISS, Pinecone, or Weaviate.
  3. Encode the user query with the same embedding model used for the passages and retrieve the top-k most similar passages.
  4. Concatenate retrieved context with the query into a structured prompt.
  5. Generate the response using an LLM instructed to ground answers in the provided context.
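Steps 1 to 4 can be sketched in plain Python as below. This is a minimal illustration under loudly stated assumptions: the `embed` function is a toy hashed bag-of-words stand-in for a real dense embedding model, the in-memory list stands in for a vector database such as FAISS, Pinecone, or Weaviate, and all function names (`chunk`, `embed`, `build_index`, `retrieve`, `build_prompt`) are hypothetical. Step 5, the actual LLM call, is left out because it depends on which model client you use.

```python
import math
import re
import zlib

DIM = 256  # toy vector size; a real dense model fixes its own dimension

def chunk(text, size=50, overlap=10):
    """Step 1a: split text into overlapping windows of `size` words (assumes overlap < size)."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text):
    """Step 1b, toy stand-in for a dense model: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for token in re.findall(r"\w+", text.lower()):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def build_index(docs):
    """Step 2, stand-in for a vector database: in-memory list of (passage, vector)."""
    return [(p, embed(p)) for doc in docs for p in chunk(doc)]

def retrieve(index, query, k=3):
    """Step 3: embed the query the same way and return the top-k passages."""
    q = embed(query)
    # Vectors are unit length, so the dot product equals cosine similarity.
    scored = sorted(index,
                    key=lambda item: sum(a * b for a, b in zip(q, item[1])),
                    reverse=True)
    return [p for p, _ in scored[:k]]

def build_prompt(query, passages):
    """Step 4: number the passages so the model can cite them, then add the question."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return ("Answer using only the context below and cite passages by number, e.g. [1]. "
            "If the context does not contain the answer, say so.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

docs = [
    "FAISS is a library for efficient similarity search over dense vectors.",
    "Cross-encoders score a query and a passage jointly and are often used for reranking.",
    "The Eiffel Tower is located in Paris.",
]
index = build_index(docs)
question = "Which library does similarity search over dense vectors?"
prompt = build_prompt(question, retrieve(index, question, k=1))
print(prompt)  # Step 5: send this prompt to your LLM client (not shown)
```

Numbering the passages in the prompt is what makes the "cite sources" tip below practical: the model can refer to [1], [2], and so on, and those labels map back to stored passages.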

Tips

  • Use hybrid retrieval combining dense vectors with keyword search for better coverage.
  • Rerank retrieved passages with a cross-encoder to improve relevance.
  • Cite sources in generated answers to improve trust and verifiability.
  • Update the knowledge base embeddings periodically as source documents change.
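One common way to combine dense and keyword results for hybrid retrieval is reciprocal rank fusion (RRF), sketched below. The input rankings here are hypothetical passage IDs, assumed to come from a vector search and a keyword search such as BM25; `k=60` is the commonly used default constant.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of passage IDs (each ordered best-first) into one list.

    Each passage scores sum(1 / (k + rank)) over every list it appears in,
    so passages ranked well by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["p1", "p2", "p3"]    # hypothetical vector-search results
keyword_hits = ["p3", "p1", "p4"]  # hypothetical keyword-search results
fused = reciprocal_rank_fusion([dense_hits, keyword_hits])
print(fused)  # ['p1', 'p3', 'p2', 'p4']
```

Because RRF uses only ranks, there is no need to put cosine similarities and keyword scores on a common scale. The top few fused passages can then be passed to a cross-encoder for reranking.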

Common issues

  • Retrieval failures when the query vocabulary differs from indexed documents.
  • Context window limits forcing truncation of retrieved evidence.
  • Hallucinations when the model ignores retrieved text and relies on parametric memory.
  • Latency added by retrieval and reranking steps.
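To limit the impact of context-window truncation, one simple approach is to keep whole passages in ranked order until a token budget is reached, rather than cutting the prompt off mid-passage. The sketch below assumes passages are already ranked best-first; `pack_context` is a hypothetical helper, and its default whitespace word count only approximates a real tokenizer.

```python
def pack_context(passages, budget_tokens, count_tokens=lambda s: len(s.split())):
    """Keep whole passages, best-first, while they fit within the token budget.

    count_tokens defaults to a whitespace word count, which only approximates
    a model tokenizer; pass the model's own tokenizer for accurate budgets.
    """
    kept, used = [], 0
    for passage in passages:
        cost = count_tokens(passage)
        if used + cost <= budget_tokens:  # skip passages that would overflow
            kept.append(passage)
            used += cost
    return kept

ranked = ["alpha beta gamma", "delta epsilon zeta eta theta", "iota kappa"]
print(pack_context(ranked, budget_tokens=6))  # ['alpha beta gamma', 'iota kappa']
```

Skipping an oversized passage and continuing lets a shorter, lower-ranked passage still fill the remaining budget, and every passage that is kept stays intact for citation.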

Example

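import torch
import torch.nn as nn

# Small feed-forward classifier: 784 inputs (e.g. a flattened 28x28 image) -> 10 classes
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(0.2),     # randomly zeroes 20% of activations during training
    nn.Linear(256, 10)   # raw logits; CrossEntropyLoss applies log-softmax internally
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)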

This snippet defines a simple neural network with dropout for regularization, a cross-entropy loss, and the Adam optimizer in PyTorch.