📝 Ideas Behind PageIndex
A selection of posts from the PageIndex blog: technical deep-dives, research, and insights.
Framework
The PageIndex framework uses LLMs to reason over document trees for retrieval instead of relying on static vector similarity.
Scaling PageIndex’s vectorless retrieval to millions of documents by unifying document trees with file system hierarchies and query trees.
Insights
Why vector retrieval can’t condition on full context — a fundamental limitation of vector RAG systems.
Why technical manuals break conventional RAG, and how PageIndex’s reasoning-based retrieval solves their unique challenges.
What Claude Code teaches us about skipping the vector DB in RAG and letting the LLM itself drive retrieval.
Rethinking the OCR pipeline from an information-theoretic view, and when a direct vision-based approach wins.