PageIndex Roadmap

Welcome to the PageIndex roadmap! Here’s what we’re building to create the next-generation RAG infrastructure.

We’re building a reasoning-native RAG system that enables AI to accurately and transparently understand and retrieve information from complex documents.

🧩 Open Source

The open-source version of PageIndex takes text as its input. However, extracting high-quality text from PDF documents remains a non-trivial challenge because of layout complexity, encoding issues, and structural ambiguity.

To address this, we’ve introduced the PageIndex OCR service, which leverages the power of vLLM to convert PDFs directly into structured tree representations. This end-to-end solution significantly enhances both the accuracy and efficiency of PageIndex tree generation.
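To make "structured tree representation" more concrete, here is a minimal sketch of how a flat list of headings might be folded into a section tree. It is illustrative only: the `Node` class, its fields, and the `build_tree` and `outline` helpers are hypothetical names chosen for this example, and they are not the PageIndex API or its actual output schema.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: one section of a document."""
    title: str
    level: int          # heading depth, 1 = top-level section
    start_page: int
    children: list["Node"] = field(default_factory=list)


def build_tree(headings):
    """Fold (level, title, page) tuples, in reading order, into a tree.

    Assumes every heading level is >= 1; the synthetic root sits at level 0.
    """
    root = Node("root", 0, 1)
    stack = [root]  # path from the root to the most recently added node
    for level, title, page in headings:
        node = Node(title, level, page)
        # Pop back up until the top of the stack is a valid parent.
        while stack[-1].level >= level:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root


def outline(node, depth=0):
    """Return the tree as indented outline lines, one per section."""
    lines = []
    for child in node.children:
        lines.append("  " * depth + f"{child.title} (p. {child.start_page})")
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    # Made-up headings, purely for demonstration.
    sample = [(1, "Intro", 1), (2, "Background", 2), (1, "Methods", 5)]
    print("\n".join(outline(build_tree(sample))))
```

A tree like this lets a retrieval step navigate from broad sections down to specific pages instead of scanning flat text chunks, which is the general motivation for a tree-shaped index.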

🚀 Recently Shipped

  • PageIndex OCR API
  • PageIndex Tree Generation API
  • PageIndex Retrieval API [Beta]
  • PageIndex Dashboard

🛠 In Progress

  • Multi-doc Retrieval Support
  • Extended File Support
  • Expert Knowledge
  • Enterprise Features
  • Dedicated Database

💬 Contact Us
