In Depth

Retrieval can be sparse (keyword matching, such as BM25), dense (semantic similarity between embeddings), or hybrid (combining both). Retrieval quality, measured by precision and recall and constrained by latency, directly determines the quality of generated answers in RAG systems, because the LLM can only ground its answer in the passages it receives. Re-ranking models, which re-score retrieved passages by their relevance to the query, are often layered on top of initial retrieval to improve precision before the context is passed to the LLM.
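
The three stages described above (sparse retrieval, dense retrieval, then fusion and re-ranking) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production pipeline: the BM25 scorer uses the common non-negative IDF variant, the hybrid step uses reciprocal rank fusion as one possible way to combine rankings, the three-dimensional vectors are made-up placeholders for the output of a real embedding model, and `overlap_score` is a hypothetical stand-in for a learned re-ranking model.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; real systems use proper analyzers.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Sparse retrieval: score every document against the query with BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine_scores(query_vec, doc_vecs):
    """Dense retrieval: cosine similarity between embedding vectors."""
    def cosine(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        return dot / (math.hypot(*u) * math.hypot(*v))
    return [cosine(query_vec, v) for v in doc_vecs]


def to_ranking(scores):
    # Document indices ordered from best to worst score.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Hybrid retrieval: merge several rankings by summing 1 / (k + rank)."""
    fused = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + position + 1)
    return sorted(fused, key=fused.get, reverse=True)


def rerank(query, docs, candidate_ids, score_fn, top_k=2):
    """Re-score the candidates with score_fn and keep the top_k."""
    ordered = sorted(candidate_ids,
                     key=lambda i: score_fn(query, docs[i]), reverse=True)
    return ordered[:top_k]


def overlap_score(query, doc):
    # Hypothetical stand-in for a re-ranking model: shared-token count.
    return len(set(tokenize(query)) & set(tokenize(doc)))


docs = [
    "BM25 ranks documents by keyword overlap with the query",
    "Dense retrievers embed text into vectors for semantic search",
    "Re-ranking models re-score retrieved passages by relevance",
]
query = "keyword ranking with BM25"

# Toy embeddings (assumed values, not produced by any real model).
doc_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.2, 0.2, 0.9]]
query_vec = [0.8, 0.2, 0.1]

sparse_ranking = to_ranking(bm25_scores(query, docs))
dense_ranking = to_ranking(cosine_scores(query_vec, doc_vecs))
hybrid_ranking = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
context_ids = rerank(query, docs, hybrid_ranking, overlap_score, top_k=2)
print([docs[i] for i in context_ids])
```

Fusing on ranks rather than raw scores sidesteps the fact that BM25 scores and cosine similarities live on different scales. In practice the re-ranker is usually a heavier model than the first-stage retrievers, which is why it is applied only to the short candidate list and not to the whole corpus.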