Knowledge Graphs & RAG

date

Oct 15, 2025

What is a Knowledge Graph (KG)?

A knowledge graph (KG) is a directed graph in which nodes represent entities (any objects in the domain) and edges denote labeled relationships between them. Conceptually, a knowledge graph can be represented as a set of triplets of the form (head entity, relation, tail entity).

Reasoning on a knowledge graph is typically achieved by traversing along these relations to answer the question of interest.

For example, consider the question: “In which country does Alice work?”

Suppose the KG contains the following triplets:

(Alice, worksAt, Emory)

(Emory, locatedIn, Atlanta)

(Atlanta, locatedIn, Georgia)

(Georgia, locatedIn, USA)

Answering this requires multi-hop reasoning: starting from Alice, follow worksAt to Emory, then traverse locatedIn edges step by step until reaching the entity USA. Thus, the answer is USA.
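The traversal above can be sketched in a few lines of Python. This is a minimal illustration, not a production KG store: the helper names (`build_index`, `follow`, `country_of_employer`) are hypothetical, and the code assumes each entity has at most one outgoing edge per relation and that the last entity on the locatedIn chain is the country, which holds for this toy graph.

```python
from collections import defaultdict

# The four triplets from the example, stored as (head, relation, tail).
TRIPLETS = [
    ("Alice", "worksAt", "Emory"),
    ("Emory", "locatedIn", "Atlanta"),
    ("Atlanta", "locatedIn", "Georgia"),
    ("Georgia", "locatedIn", "USA"),
]


def build_index(triplets):
    """Map each (head, relation) pair to the list of tails it points to."""
    index = defaultdict(list)
    for head, relation, tail in triplets:
        index[(head, relation)].append(tail)
    return index


def follow(index, start, relation):
    """Follow `relation` edges from `start` until no outgoing edge remains.

    Assumption: at most one outgoing edge per relation for each entity.
    """
    path = [start]
    seen = {start}
    while index.get((path[-1], relation)):
        nxt = index[(path[-1], relation)][0]
        if nxt in seen:  # guard against cycles in less tidy graphs
            break
        path.append(nxt)
        seen.add(nxt)
    return path


def country_of_employer(index, person):
    """Hop 1: person -> employer. Remaining hops: follow locatedIn to the end."""
    employer = index[(person, "worksAt")][0]
    return follow(index, employer, "locatedIn")


index = build_index(TRIPLETS)
path = country_of_employer(index, "Alice")
print(" -> ".join(["Alice"] + path))
# prints: Alice -> Emory -> Atlanta -> Georgia -> USA
```

The last element of `path` is the answer. A real KG would need type information (for example, knowing that USA is a country) rather than relying on where the chain happens to end.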

KGs in the context of RAG

In recent years, one of the most prominent directions in knowledge graph (KG) research has been its integration with large language models (LLMs) as an external knowledge source. While naive RAG approaches rely on unstructured text for grounding and can achieve reasonable results, they struggle with queries that require understanding complex relationships between entities or performing multi-hop reasoning, which are capabilities that go beyond surface-level semantic similarity in text.

To address these challenges, KGs, as a structured form of data that explicitly encodes relationships among entities, have been increasingly adopted. By making relationships explicit in the external knowledge source, KGs reduce the burden on LLMs to perform implicit neural reasoning.

Framework of GraphRAG

GraphRAG Framework, image source: Graph Retrieval-Augmented Generation: A Survey [1]

The figure illustrates the general workflow of GraphRAG. In essence, it performs retrieval over a different type of data compared to vanilla RAG. The key distinction lies in how this data is constructed and how it is retrieved.

Knowledge Graph Construction

GraphRAG pipeline, image source: From Local to Global: A GraphRAG Approach to Query-Focused Summarization [2]

In the GraphRAG framework, we can either leverage existing knowledge graphs (KGs) or construct a KG directly from textual data. In the latter case, the KG serves as a structured index on top of the text documents. To build a KG from unstructured text, the documents are first chunked, followed by entity recognition to extract entities and relation extraction to identify links among them. After the initial graph is constructed, community detection is applied to group related nodes into communities, thereby forming a hierarchical graph structure.
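To make the construction steps concrete, here is a deliberately simplified sketch of the chunk, extract, and group stages. Everything in it is an assumption for illustration: the regex `PATTERN` stands in for real entity recognition and relation extraction (which would normally use an LLM or dedicated NLP models), and connected components stand in for a proper community detection algorithm. It also produces a single level of grouping rather than a hierarchy.

```python
import re
from collections import defaultdict


def chunk(text):
    """Split text into sentence-level chunks (a stand-in for real chunking)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


# Hypothetical pattern: "<Entity> <relation phrase> <Entity>."
PATTERN = re.compile(r"^(\w+) (works at|is located in) (\w+)\.?$")


def extract_triplets(chunks):
    """Toy entity and relation extraction based on one fixed sentence pattern."""
    triplets = []
    for c in chunks:
        m = PATTERN.match(c)
        if m:
            head, rel, tail = m.groups()
            triplets.append((head, rel.replace(" ", "_"), tail))
    return triplets


def communities(triplets):
    """Group nodes by connected component (a stand-in for community detection)."""
    adj = defaultdict(set)
    for head, _, tail in triplets:
        adj[head].add(tail)
        adj[tail].add(head)
    seen, groups = set(), []
    for node in sorted(adj):
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        groups.append(sorted(comp))
    return groups


text = "Alice works at Emory. Emory is located in Atlanta. Bob works at Acme."
triplets = extract_triplets(chunk(text))
groups = communities(triplets)
print(triplets)
print(groups)
```

On this input the sketch yields three triplets and two groups, one around Emory and one around Acme. Applying the grouping step recursively within each group is one way a hierarchy could be built, but that is beyond this sketch.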

Knowledge Graph Guided Retrieval

The goal of this stage is to retrieve relevant information from knowledge graphs (KGs) given a textual query. The retrieved results may take the form of nodes, triplets, paths, or subgraphs. Since the input query is in natural language, the first step is to project it into the same representation space used to index the KG elements.

Retrieval approaches can generally be grouped into two categories:

Heuristic-based: rely on graph traversal algorithms or hand-crafted rules (e.g., retrieving the k-hop neighbors of the topic entities [3]).

Learning-based: use language models or graph neural networks to perform dense retrieval, depending on how the KG elements are indexed [4, 5].
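As an illustration of the heuristic category, the following sketch retrieves all triplets within k hops of a set of topic entities using breadth-first search. The function name `k_hop_triplets` is hypothetical, edges are treated as undirected for the purpose of reachability, and the topic entities are assumed to have been linked from the query already.

```python
from collections import defaultdict, deque


def k_hop_triplets(triplets, topic_entities, k):
    """Return triplets reachable within k hops of any topic entity."""
    # Index every triplet under both of its endpoints (undirected view).
    incident = defaultdict(list)
    for trip in triplets:
        head, _, tail = trip
        incident[head].append(trip)
        incident[tail].append(trip)

    depth = {e: 0 for e in topic_entities}
    queue = deque(topic_entities)
    found = []
    while queue:
        node = queue.popleft()
        if depth[node] == k:  # do not expand beyond k hops
            continue
        for trip in incident.get(node, []):
            if trip not in found:
                found.append(trip)
            head, _, tail = trip
            other = tail if head == node else head
            if other not in depth:
                depth[other] = depth[node] + 1
                queue.append(other)
    return found


kg = [
    ("Alice", "worksAt", "Emory"),
    ("Emory", "locatedIn", "Atlanta"),
    ("Atlanta", "locatedIn", "Georgia"),
    ("Georgia", "locatedIn", "USA"),
]
two_hop = k_hop_triplets(kg, ["Alice"], k=2)
print(two_hop)
# prints: [('Alice', 'worksAt', 'Emory'), ('Emory', 'locatedIn', 'Atlanta')]
```

Note that k = 2 is not enough to reach USA from Alice in this graph; choosing k trades recall against the size of the retrieved subgraph.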

After retrieval, additional pruning or reranking is typically applied to refine the results and improve precision.
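A reranking step can be sketched as scoring each retrieved triplet against the query and keeping the top results. The example below uses Jaccard token overlap purely as a stand-in; real systems typically use a learned scorer, and the names `tokens` and `rerank` are hypothetical.

```python
import re


def tokens(text):
    """Lowercase word tokens, splitting camelCase names such as worksAt."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    return set(re.findall(r"[a-z0-9]+", spaced.lower()))


def rerank(query, triplets, top_n):
    """Sort triplets by Jaccard overlap with the query and keep the top_n."""
    q = tokens(query)

    def score(trip):
        t = tokens(" ".join(trip))
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    # sorted() is stable, so ties keep their retrieval order.
    return sorted(triplets, key=score, reverse=True)[:top_n]


candidates = [
    ("Alice", "worksAt", "Emory"),
    ("Emory", "locatedIn", "Atlanta"),
    ("Atlanta", "locatedIn", "Georgia"),
]
top = rerank("In which city is Emory located?", candidates, top_n=2)
print(top)
# prints: [('Emory', 'locatedIn', 'Atlanta'), ('Atlanta', 'locatedIn', 'Georgia')]
```

Lexical overlap misses paraphrases (for instance, "work" does not match "works"), which is one reason embedding-based or model-based rerankers are usually preferred in practice.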

GraphRAG vs. (vanilla) RAG

Due to the structured nature of knowledge graphs, GraphRAG is well-suited for tasks that require multi-hop reasoning or global understanding. In contrast, vanilla RAG tends to perform better on tasks that focus on retrieving and grounding single, fine-grained details [6].

The overhead incurred by using KGs with RAG

The main overhead of integrating KGs into RAG lies in the cost of KG construction and retrieval. KG construction is often performed offline, but it can still be resource-intensive. To reduce this cost, some approaches replace LLMs with smaller NLP models for entity and relation extraction [7], while others explore alternative indexing strategies to accelerate both index construction and retrieval [8].

References

[1] Graph Retrieval-Augmented Generation: A Survey

[2] From Local to Global: A GraphRAG Approach to Query-Focused Summarization

[3] GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering

[4] Subgraph Retrieval Enhanced Model for Multi-hop Knowledge Base Question Answering

[5] GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning

[6] RAG vs. GraphRAG: A Systematic Evaluation and Key Insights

[7] E2GraphRAG: Streamlining Graph-based RAG for High Efficiency and Effectiveness

[8] Scalable Graph-based Retrieval-Augmented Generation via Locality-Sensitive Hashing