PageIndex Roadmap

Welcome to the PageIndex roadmap! Here’s what we’re building to create next-generation RAG infrastructure.

🎯 We’re building a reasoning-native RAG system that enables AI to understand and retrieve information from complex documents accurately and transparently.

🧩 Open Source

The open-source PageIndex takes text as input. However, extracting high-quality text from PDF documents remains non-trivial because of layout complexity, encoding issues, and structural ambiguity.

To address this, we’ve introduced the PageIndex OCR service, which uses vLLM to convert PDFs directly into structured tree representations. This end-to-end pipeline significantly improves both the accuracy and the efficiency of PageIndex tree generation.

🚀 Recently Shipped

PageIndex OCR API
PageIndex Tree Generation API
PageIndex Retrieval API [Beta]
PageIndex Dashboard

🛠 In Progress

Multi-doc Retrieval Support
Extended File Support
Expert Knowledge
Enterprise Features
Dedicated Database

💬 Contact Us

🤝 Join our Discord
📨 Leave us a message