What Is RAG (Retrieval-Augmented Generation)?

RAG (Retrieval-Augmented Generation) is a technique that combines large language model text generation with retrieval from an external knowledge base. The model fetches relevant documents at query time and uses them as context for generation, allowing it to access specific information not present in its training data.

Frequently Asked Questions

Why is RAG useful?

RAG allows AI to answer questions about specific documents, internal company knowledge, recent information, or proprietary data without needing to retrain the model. It also reduces hallucinations because the model can cite specific source documents. It is the standard pattern for "chat with your documents" applications.

How does RAG work?

Three steps: (1) Retrieval — given a user query, the system searches a vector database for the most relevant documents; (2) Augmentation — the retrieved documents are added to the model context along with the user query; (3) Generation — the model produces an answer grounded in the retrieved context.
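The three steps can be sketched in a few lines of Python. This is a toy illustration, not a real system: the retriever is a keyword-overlap scorer standing in for a vector database, and the generation step is a stub where a real LLM API call would go.

```python
# Toy sketch of the three RAG steps: retrieve -> augment -> generate.
# The retriever and the "model" are stand-ins, not real APIs.

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Step 1: rank documents by word overlap with the query (toy scorer)."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def augment(query: str, retrieved: list[str]) -> str:
    """Step 2: place the retrieved documents into the prompt as context."""
    context = "\n".join(f"- {doc}" for doc in retrieved)
    return (
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        f"Answer using only the context above."
    )

def generate(prompt: str) -> str:
    """Step 3: call the LLM (stubbed here; swap in a real API client)."""
    return f"[model answer grounded in a prompt of {len(prompt)} chars]"

docs = [
    "The refund policy allows returns within 30 days.",
    "Shipping is free on orders over $50.",
    "Support is available Monday through Friday.",
]
query = "What is the refund policy?"
prompt = augment(query, retrieve(query, docs))
answer = generate(prompt)
```

In production the `retrieve` step is an embedding lookup in a vector database and `generate` is a chat-completion call, but the data flow is exactly this shape.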

What is a vector database?

A vector database stores documents as numerical vectors (embeddings) that represent semantic meaning. Queries are also embedded and compared against stored vectors to find similar documents. Examples: Pinecone, Weaviate, Chroma, Qdrant.
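At its core, a vector database reduces to "embed everything, then compare by similarity". The sketch below uses hand-made 3-dimensional vectors and cosine similarity; real systems use learned embedding models with hundreds or thousands of dimensions and approximate nearest-neighbor indexes, so treat the numbers here as placeholders.

```python
# Minimal vector search: compare a query embedding against stored
# document embeddings by cosine similarity. The 3-d vectors are
# hand-made toys; real embeddings come from an embedding model.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Stored documents with precomputed embeddings.
index = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping rates": [0.1, 0.8, 0.2],
    "support hours": [0.0, 0.2, 0.9],
}

# Pretend this is the embedding of the query "how do returns work?".
query_embedding = [0.85, 0.15, 0.05]
best = max(index, key=lambda doc: cosine_similarity(query_embedding, index[doc]))
print(best)  # -> refund policy
```

Note that "how do returns work?" shares no keywords with "refund policy"; the match works because the embeddings encode meaning, which is exactly what keyword search cannot do.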

When should I NOT use RAG?

When the data is small enough to fit directly in the model's context window. Modern long-context models (e.g., Gemini 2.5 Pro with a 1M-token context window) can handle documents of 200-1,000 pages directly. For small knowledge bases, direct context inclusion typically outperforms RAG. RAG matters when you have hundreds of thousands of documents.

Is RAG still relevant with long-context models?

Yes, but the threshold has shifted. RAG is now most useful for: (1) very large knowledge bases (millions of documents), (2) frequently updated data, (3) cost-sensitive applications where putting everything in context is too expensive, (4) hybrid systems that need both retrieval and generation.

What is the difference between RAG and fine-tuning?

Fine-tuning permanently changes model weights based on training data — best for teaching the model a domain or style. RAG provides knowledge dynamically at query time without changing the model — best for factual lookup. They are complementary; production systems often use both.
