Knowledge Graphs & RAG
date
Oct 15, 2025
tags
ML
What is a Knowledge Graph (KG)?
A knowledge graph (KG) is a directed graph in which nodes represent entities (any objects in the domain) and edges denote labeled relationships between entities. Conceptually, a knowledge graph can be represented as a set of triplets of the form (head entity, relation, tail entity).
Reasoning on a knowledge graph is typically achieved by traversing along these relations to answer the question of interest.
For example, consider the question: “In which country does Alice work?”
Suppose the KG contains the following triplets:
- (Alice, worksAt, Emory)
- (Emory, locatedIn, Atlanta)
- (Atlanta, locatedIn, Georgia)
- (Georgia, locatedIn, USA)
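The triplets above can be stored in a small lookup table and traversed programmatically. The following is a minimal sketch, not a production KG store: the function names (`build_index`, `follow`, `country_of_workplace`) are hypothetical, and it assumes that each entity has at most one outgoing edge per relation and that the last entity on the `locatedIn` chain is the country.

```python
from collections import defaultdict

# Triplets from the example above, stored as (head, relation, tail).
TRIPLETS = [
    ("Alice", "worksAt", "Emory"),
    ("Emory", "locatedIn", "Atlanta"),
    ("Atlanta", "locatedIn", "Georgia"),
    ("Georgia", "locatedIn", "USA"),
]


def build_index(triplets):
    """Map each (head, relation) pair to the list of tail entities."""
    index = defaultdict(list)
    for head, relation, tail in triplets:
        index[(head, relation)].append(tail)
    return index


def follow(index, start, relation):
    """Follow `relation` edges from `start` until none remain; return the path."""
    path = [start]
    seen = {start}
    while True:
        tails = index.get((path[-1], relation), [])
        # Assumption: at most one outgoing edge per (entity, relation) in this toy KG.
        if not tails or tails[0] in seen:
            return path
        path.append(tails[0])
        seen.add(tails[0])


def country_of_workplace(index, person):
    """Answer 'In which country does <person> work?' by multi-hop traversal."""
    workplaces = index.get((person, "worksAt"), [])
    if not workplaces:
        return None
    # Assumption: the end of the locatedIn chain is the country.
    return follow(index, workplaces[0], "locatedIn")[-1]


index = build_index(TRIPLETS)
print(country_of_workplace(index, "Alice"))  # prints: USA
```

Each loop iteration in `follow` corresponds to one hop, so this query takes one `worksAt` hop followed by three `locatedIn` hops.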
Answering this requires multi-hop reasoning: starting from Alice, follow worksAt to Emory, then traverse locatedIn edges step by step until reaching the entity USA. Thus, the answer is USA.
KGs in the context of RAG
In recent years, one of the most prominent directions in knowledge graph (KG) research has been its integration with large language models (LLMs) as an external knowledge source. While naive RAG approaches rely on unstructured text for grounding and can achieve reasonable results, they struggle with queries that require understanding complex relationships between entities or performing multi-hop reasoning, which are capabilities that go beyond surface-level semantic similarity in text.
To address these challenges, KGs, as a structured form of data that explicitly encodes relationships among entities, have been increasingly adopted. By making relationships explicit in the external knowledge source, KGs reduce the burden on LLMs to perform implicit neural reasoning.
Framework of GraphRAG
![GraphRAG Framework, image source: Graph Retrieval-Augmented Generation: A Survey [1]](https://www.notion.so/image/attachment%3A3eb2e430-3b0e-4a5c-b942-85e91c73e6e3%3Aimage.png?table=block&id=28da7f6f-840c-8024-9b6e-f866ee193cd3&cache=v2)
The figure illustrates the general workflow of GraphRAG. In essence, it performs retrieval over a different type of data compared to vanilla RAG. The key distinction lies in how this data is constructed and how it is retrieved.
Knowledge Graph Construction
![GraphRAG pipeline, image source: From Local to Global: A GraphRAG Approach to Query-Focused Summarization [2]](https://www.notion.so/image/attachment%3A5e919b03-3e57-4571-b12b-5b94a2e7a1a9%3Aimage.png?table=block&id=28da7f6f-840c-802f-b1b0-d03acef06a65&cache=v2)
In the GraphRAG framework, we can either leverage existing knowledge graphs (KGs) or construct a KG directly from textual data. In the latter case, the KG serves as a structured index on top of the text documents. To build a KG from unstructured text, the documents are first chunked, followed by entity recognition to extract entities and relation extraction to identify links among them. After the initial graph is constructed, community detection is applied to group related nodes into communities, thereby forming a hierarchical graph structure.
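The construction steps described above (chunking, entity recognition, relation extraction, community detection) can be sketched end to end with toy stand-ins. Everything here is an illustrative assumption rather than the method of [2]: entity recognition is a lookup in a hypothetical fixed vocabulary, relation extraction links entities that co-occur in a chunk, and community detection is approximated by connected components, which yields a single flat level rather than a hierarchy.

```python
import re
from collections import defaultdict
from itertools import combinations


def chunk(text, max_sentences=1):
    """Split text into chunks of at most `max_sentences` sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [" ".join(sentences[i:i + max_sentences])
            for i in range(0, len(sentences), max_sentences)]


def extract_entities(chunk_text, vocabulary):
    """Toy stand-in for entity recognition: look up known entity names."""
    words = set(re.findall(r"[A-Za-z]+", chunk_text))
    return sorted(words & vocabulary)


def extract_relations(entities):
    """Toy stand-in for relation extraction: link entities that co-occur in a chunk."""
    return [(a, "coOccursWith", b) for a, b in combinations(entities, 2)]


def build_graph(text, vocabulary):
    """Chunk the text, extract entities and relations, return an undirected adjacency map."""
    graph = defaultdict(set)
    for piece in chunk(text):
        entities = extract_entities(piece, vocabulary)
        for entity in entities:
            graph[entity]  # register the entity even if it ends up isolated
        for head, _relation, tail in extract_relations(entities):
            graph[head].add(tail)
            graph[tail].add(head)
    return graph


def communities(graph):
    """Stand-in for community detection: connected components via depth-first search."""
    seen, groups = set(), []
    for node in sorted(graph):
        if node in seen:
            continue
        stack, group = [node], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(graph[current] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


TEXT = ("Alice works at Emory. Emory is located in Atlanta. "
        "Bob visited the Louvre. The Louvre is in Paris.")
VOCABULARY = {"Alice", "Emory", "Atlanta", "Bob", "Louvre", "Paris"}

graph = build_graph(TEXT, VOCABULARY)
print(communities(graph))
# prints: [['Alice', 'Atlanta', 'Emory'], ['Bob', 'Louvre', 'Paris']]
```

In a real pipeline, the two extraction functions would typically be replaced by an LLM prompt or an NLP model, and the component search by a proper community detection algorithm applied recursively to obtain the hierarchy.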
Knowledge Graph Guided Retrieval
The goal of this stage is to retrieve relevant information from knowledge graphs (KGs) given a textual query. The retrieved results may take the form of nodes, triplets, paths, or subgraphs. Since the input query is in natural language, the first step is to project it into the same representation space used to index the KG elements.
Retrieval approaches can generally be grouped into two categories:
- Heuristic-based: rely on graph traversal algorithms or hand-crafted rules (e.g., retrieving the k-hop neighbors of the topic entities [3]).
- Learning-based: use language models or graph neural networks to perform dense retrieval, depending on how the KG elements are indexed [4, 5].
After retrieval, additional pruning or reranking is typically applied to refine the results and improve precision.
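As an illustration of the heuristic-based category, the sketch below retrieves all triplets within k hops of the topic entities using breadth-first search. It is a simplified example under stated assumptions, not the procedure of [3]: the helper names are hypothetical, topic entities are found by verbatim string matching against the query, and edges are treated as undirected during expansion.

```python
from collections import deque

TRIPLETS = [
    ("Alice", "worksAt", "Emory"),
    ("Emory", "locatedIn", "Atlanta"),
    ("Atlanta", "locatedIn", "Georgia"),
    ("Georgia", "locatedIn", "USA"),
]


def link_entities(query, triplets):
    """Toy entity linking: entities whose name appears verbatim in the query."""
    names = {t[0] for t in triplets} | {t[2] for t in triplets}
    return sorted(name for name in names if name in query)


def k_hop_triplets(triplets, topic_entities, k):
    """Return triplets within k hops of the topic entities (edges treated as undirected)."""
    neighbors = {}
    for triplet in triplets:
        head, _relation, tail = triplet
        neighbors.setdefault(head, []).append((tail, triplet))
        neighbors.setdefault(tail, []).append((head, triplet))

    depth = {entity: 0 for entity in topic_entities}
    queue = deque(topic_entities)
    retrieved = []
    while queue:
        entity = queue.popleft()
        if depth[entity] == k:
            continue  # do not expand beyond k hops
        for other, triplet in neighbors.get(entity, []):
            if triplet not in retrieved:
                retrieved.append(triplet)
            if other not in depth:
                depth[other] = depth[entity] + 1
                queue.append(other)
    return retrieved


query = "In which country does Alice work?"
topics = link_entities(query, TRIPLETS)
print(topics)  # prints: ['Alice']
print(k_hop_triplets(TRIPLETS, topics, k=2))
# prints: [('Alice', 'worksAt', 'Emory'), ('Emory', 'locatedIn', 'Atlanta')]
```

With `k=2` the retrieved subgraph stops at Atlanta and never reaches USA, so the choice of k trades recall against the amount of context passed to the LLM. In this toy graph, `k=4` would be needed to cover the full path.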
GraphRAG vs. (vanilla) RAG
Due to the structured nature of knowledge graphs, GraphRAG is well-suited for tasks that require multi-hop reasoning or global understanding. In contrast, vanilla RAG tends to perform better on tasks that focus on retrieving and grounding single, fine-grained details [6].
The overhead incurred by using KGs with RAG
The main overhead of integrating KGs into RAG lies in the cost of KG construction and retrieval. KG construction is often performed offline, but it can still be resource-intensive. To reduce this cost, some approaches replace LLMs with smaller NLP models for entity and relation extraction [7], while others explore alternative indexing strategies to accelerate both index construction and retrieval [8].