In Depth
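In a RAG pipeline, the query is first converted to an embedding, and that embedding is used to search a vector database for semantically similar passages. The retrieved passages are then injected into the model prompt as context. Because the model answers from retrieved text rather than from its parameters alone, RAG can reduce hallucination, enables citations back to the source passages, and allows knowledge to be updated by changing the indexed documents instead of retraining the model.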
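The retrieve-then-prompt flow can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not a production design: the `embed` function is a toy bag-of-words counter standing in for a learned embedding model, the in-memory `INDEX` list stands in for a vector database, and `PASSAGES`, `retrieve`, and `build_prompt` are hypothetical names chosen for this example. The final call to a language model is left out.

```python
import math
import re
from collections import Counter

# Hypothetical passages standing in for the indexed knowledge base.
PASSAGES = [
    "The Eiffel Tower is located in Paris, France.",
    "Python is a programming language created by Guido van Rossum.",
    "Mount Everest is the highest mountain above sea level.",
]


def embed(text):
    """Toy embedding: word counts. A real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Stand-in for a vector database: each passage stored with its embedding.
INDEX = [(passage, embed(passage)) for passage in PASSAGES]


def retrieve(query, k=2):
    """Embed the query and return the k most similar passages."""
    query_vec = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [passage for passage, _ in ranked[:k]]


def build_prompt(query, passages):
    """Inject the retrieved passages into the prompt as numbered context."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the context below and cite passages by number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


if __name__ == "__main__":
    question = "Where is the Eiffel Tower located?"
    print(build_prompt(question, retrieve(question, k=2)))
```

Numbering the passages in the prompt is what makes citations possible: the model can refer to `[1]` or `[2]`, and the application can map those numbers back to source documents. Updating knowledge amounts to editing `PASSAGES` and rebuilding `INDEX`, with no change to the model.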