Large Language Models (LLMs) are powerful but have inherent limitations. They can hallucinate, they lack up-to-date knowledge, and they often fall short on domain-specific questions. To mitigate these issues, two popular approaches have emerged: CAG (Cache-Augmented Generation) and RAG (Retrieval-Augmented Generation). Both improve LLM output by supplying external knowledge, but they suit different use cases and come with distinct advantages and trade-offs.

https://www.youtube.com/watch?v=HdafI0t3sEY

Cache-Augmented Generation (CAG)

Cache-Augmented Generation (CAG) takes advantage of the expanded context windows of modern LLMs to preload all relevant knowledge into the model’s context before inference. It then precomputes and stores the model’s internal states (the key-value, or KV, cache) for this knowledge. When a query arrives, the model processes only the query on top of this preloaded cache, eliminating the need for real-time document retrieval.
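The preload-once, reuse-per-query pattern described above can be illustrated with a toy sketch. This is not a real LLM or any library's API: ToyModel, preload_knowledge, and process_query are hypothetical names invented for this example, and the "cache" holds simple (position, token) pairs rather than real key and value tensors. The only point it tries to show is that the knowledge tokens are encoded once, while each query pays only for its own tokens.

```python
from typing import List, Tuple

# One cache entry per token: (position, token). A real KV cache would hold
# per-layer key and value tensors instead; this is only a stand-in.
KVCache = Tuple[Tuple[int, str], ...]


class ToyModel:
    """Hypothetical stand-in for an LLM that counts the tokens it encodes."""

    def __init__(self) -> None:
        self.tokens_encoded = 0

    def encode(self, tokens: List[str], past: KVCache = ()) -> KVCache:
        """Return a new cache: `past` extended by one entry per new token."""
        entries = list(past)
        for token in tokens:
            self.tokens_encoded += 1  # cost is paid only for new tokens
            entries.append((len(entries), token))
        return tuple(entries)


def preload_knowledge(model: ToyModel, documents: List[str]) -> KVCache:
    """Encode all documents once, before any query arrives."""
    tokens = " ".join(documents).split()
    return model.encode(tokens)


def process_query(model: ToyModel, knowledge: KVCache, query: str) -> KVCache:
    """Encode only the query tokens on top of the preloaded cache."""
    return model.encode(query.split(), past=knowledge)


if __name__ == "__main__":
    model = ToyModel()
    knowledge = preload_knowledge(
        model, ["CAG preloads documents", "RAG retrieves them"]
    )
    print("tokens encoded after preload:", model.tokens_encoded)
    for query in ["what is CAG", "what is RAG"]:
        context = process_query(model, knowledge, query)
        print(query, "->", len(context), "entries in context,",
              model.tokens_encoded, "tokens encoded so far")
```

Because the preloaded cache is an immutable tuple here, every query starts from the same stored state and cannot modify it. A real implementation would need an equivalent way to copy or reset the cache between queries.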

CAG.png

How CAG Works:

Benefits of CAG:

Limitations:

Retrieval-Augmented Generation (RAG)