Large language models are impressive, but they have a fundamental limitation: they only know what they were trained on. Ask a model about something that happened after its training cutoff, or about a document sitting in your company’s internal knowledge base, and it either makes something up or tells you it doesn’t know.
Retrieval-augmented generation, almost always shortened to RAG, is the approach the industry has settled on to fix this.
The idea is pretty straightforward. Instead of relying purely on what the model has memorized, you give it the ability to pull in relevant information from an external source, then use that information to generate a response.
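That retrieve-then-generate loop can be sketched in a few lines. This is a minimal illustration, not a production pipeline: retrieval here is simple word overlap standing in for real embedding search, and the document list and prompt wording are made-up examples.

```python
import re

def tokenize(text):
    # Lowercase and strip punctuation so "purchase." matches "purchase".
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query; return the top k.
    A stand-in for real semantic search over an embedding index."""
    query_tokens = tokenize(query)
    scored = sorted(
        documents,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, context_docs):
    """Assemble the retrieved passages plus the question into one prompt."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# Illustrative knowledge base; in practice these would be chunks of
# your internal documents.
documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The cafeteria is open from 8am to 3pm on weekdays.",
    "Support tickets are answered within one business day.",
]

query = "How many days do customers have to return a purchase?"
top_docs = retrieve(query, documents, k=1)
prompt = build_prompt(query, top_docs)
# `prompt` is what gets sent to the model, in place of the bare question.
```

The point is the shape of the flow: the query selects relevant passages, and those passages are injected into the prompt so the model answers from them rather than from memory alone.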