What is PageIndex?
PageIndex is a reasoning-based retrieval system that simulates how human experts navigate and extract knowledge from complex documents. Instead of relying on vector-based semantic search, it transforms each document into a hierarchical, tree-structured index and lets an LLM reason over that structure to retrieve the relevant information. The entire retrieval process is traceable and interpretable, and it requires no vector DB and no chunking.
A detailed introduction to the PageIndex framework is available in this blog post.
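To make the idea of reasoning over a tree index concrete, here is a minimal sketch in Python. It is not the PageIndex API: the `Node` class, `keyword_selector`, and `retrieve` are hypothetical names invented for illustration, and a simple keyword-overlap function stands in for the LLM call that would choose which section to open next.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    """One section of a document in a hypothetical tree index."""
    title: str
    summary: str
    pages: Tuple[int, int]  # inclusive page range covered by this section
    children: List["Node"] = field(default_factory=list)


def keyword_selector(query: str, candidates: List[Node]) -> Node:
    """Stand-in for an LLM call: pick the candidate sharing the most words with the query."""
    words = set(query.lower().split())

    def overlap(node: Node) -> int:
        text = f"{node.title} {node.summary}".lower()
        return len(words & set(text.split()))

    return max(candidates, key=overlap)


def retrieve(
    root: Node,
    query: str,
    select: Callable[[str, List[Node]], Node] = keyword_selector,
) -> List[Node]:
    """Descend from the root to a leaf, recording each section chosen along the way."""
    path = [root]
    node = root
    while node.children:
        node = select(query, node.children)
        path.append(node)
    return path


# A toy document tree (assumed structure, for illustration only).
report = Node("Annual Report", "full-year company report", (1, 40), [
    Node("Financial Statements", "balance sheet income cash flow", (1, 20), [
        Node("Income Statement", "revenue expenses net income", (1, 8)),
        Node("Balance Sheet", "assets liabilities equity", (9, 20)),
    ]),
    Node("Risk Factors", "market regulatory operational risk", (21, 40)),
])

path = retrieve(report, "what was net income and revenue")
print(" > ".join(n.title for n in path))  # Annual Report > Financial Statements > Income Statement
print(path[-1].pages)                     # (1, 8)
```

The returned path is what makes the retrieval traceable: every answer can be tied back to the sequence of sections that were opened and to a page range. In a real system the `select` argument would wrap a model prompt rather than keyword matching.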
Services
- PageIndex Chat Platform: A chat platform that lets you analyze multiple long documents directly with reasoning-based retrieval.
- PageIndex Chat API (beta): API service for PageIndex Chat.
- PageIndex MCP: Integrates PageIndex with your own LLM agents.
Tools
- PageIndex Tree Generation: Generates hierarchical tree indexes for documents.
- PageIndex OCR: An OCR model that preserves the global structure of the document; see this blog for details.
Cookbook
- Vectorless RAG with PageIndex: A quick, hands-on introduction to the PageIndex approach.
- Vision RAG with PageIndex: A simple example of building a vision-only RAG system using PageIndex.
- Agentic Retrieval with Chat API: How to create an agentic retriever by prompting the PageIndex Chat API.