
🌲 PageIndex Document Processing

PageIndex generates a hierarchical “table of contents” tree that preserves the original document’s logical flow and organizational structure. This LLM-optimized tree enables precise navigation and is ready for reasoning-based RAG. See our cookbook for a practical example.

Currently accepts PDF files only (more formats coming soon).
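
Because only PDFs are accepted, it can be useful to check a file locally before uploading it. The sketch below is not part of the SDK: `looksLikePdf` is a hypothetical helper that tests whether the data starts with the `%PDF-` signature. It is a quick sanity check rather than full validation, and it assumes the input is a `Buffer`, `Uint8Array`, or `ArrayBuffer` (a `Blob` would need to be read asynchronously first).

```javascript
// Hypothetical helper, not part of the PageIndex SDK.
// Returns true when the data begins with the PDF signature "%PDF-".
// Accepts a Node.js Buffer, a Uint8Array, or an ArrayBuffer.
function looksLikePdf(data) {
  const bytes = data instanceof ArrayBuffer ? new Uint8Array(data) : data;
  const signature = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  if (!bytes || bytes.length < signature.length) return false;
  return signature.every((byte, i) => bytes[i] === byte);
}

console.log(looksLikePdf(Buffer.from("%PDF-1.7\n"))); // true
console.log(looksLikePdf(Buffer.from("hello world"))); // false
```

A check like this could run on the result of `readFileSync` before the file is passed to `submitDocument`.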

Submit Document for Processing

  • Uploads a PDF document to generate a PageIndex hierarchical tree.
  • Returns a document identifier (doc_id) for subsequent operations.

Parameters

Name              Type                         Required  Description                                             Default
file              Blob | Buffer | ArrayBuffer  yes       File content to upload                                  -
fileName          string                       yes       Name of the file                                        -
options.mode      string                       no        Processing mode                                         -
options.folderId  string                       no        Target folder ID. Falls back to folderScope if not set  -

Example Request

import { readFileSync } from "fs";

const file = readFileSync("./2023-annual-report.pdf");
const result = await client.api.submitDocument(file, "2023-annual-report.pdf");
const doc_id = result.doc_id;

Example Response

{ "doc_id": "pi-abc123def456" }

You can organize documents into Folders for better workspace management (Max plan).


Get Processing Status & Tree Structure

Check processing status and (when complete) get the PageIndex tree for a submitted document.

Parameters

Name                 Type     Required  Description                                     Default
docId                string   yes       Document ID                                     -
options.nodeSummary  boolean  no        Include node summary for each node in response  false

Example Request

const tree = await client.api.getTree(doc_id);
if (tree.status === "completed") {
  console.log("PageIndex Tree Structure:", tree.result);
}

Example Response (Processing):

{ "doc_id": "pi-abc123def456", "status": "processing" }

Example Response (Completed):

{
  "doc_id": "pi-abc123def456",
  "status": "completed",
  "result": [
    {
      "title": "Financial Stability",
      "node_id": "0006",
      "page_index": 21,
      "text": "The Federal Reserve maintains financial stability through comprehensive monitoring and regulatory oversight...",
      "nodes": [
        {
          "title": "Monitoring Financial Vulnerabilities",
          "node_id": "0007",
          "page_index": 22,
          "text": "The Federal Reserve's monitoring focuses on identifying and assessing potential risks..."
        },
        {
          "title": "Domestic and International Cooperation and Coordination",
          "node_id": "0008",
          "page_index": 28,
          "text": "In 2023, the Federal Reserve collaborated internationally with central banks and regulatory authorities..."
        }
      ]
    }
  ]
}
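
Since `result` is an array of nodes that may each carry a nested `nodes` array, a recursive walk is a natural way to consume it. The following is a minimal sketch, not an SDK function: `flattenTree` is a hypothetical helper, and it assumes leaf nodes simply omit `nodes`, as in the example response above. The sample data is trimmed from that response.

```javascript
// Hypothetical helper, not part of the PageIndex SDK.
// Depth-first walk over a PageIndex tree (the `result` array of a completed response).
// Returns one flat entry per node, annotated with its nesting depth.
function flattenTree(nodes, depth = 0, out = []) {
  for (const node of nodes ?? []) {
    out.push({
      depth,
      nodeId: node.node_id,
      title: node.title,
      page: node.page_index,
    });
    // Leaf nodes are assumed to have no `nodes` field; `?? []` above handles that.
    flattenTree(node.nodes, depth + 1, out);
  }
  return out;
}

// Sample tree trimmed from the completed response (text fields omitted).
const sampleTree = [
  {
    title: "Financial Stability",
    node_id: "0006",
    page_index: 21,
    nodes: [
      { title: "Monitoring Financial Vulnerabilities", node_id: "0007", page_index: 22 },
      {
        title: "Domestic and International Cooperation and Coordination",
        node_id: "0008",
        page_index: 28,
      },
    ],
  },
];

// Print an indented outline, one line per node.
for (const entry of flattenTree(sampleTree)) {
  console.log(`${"  ".repeat(entry.depth)}${entry.title} (page ${entry.page})`);
}
```

With a real response, `tree.result` would take the place of `sampleTree`.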

Get Document Metadata

Retrieve document information including processing status, page count, and creation time.

Parameters

Name   Type    Required  Description
docId  string  yes       Document ID

Example Request

const doc = await client.api.getDocument(doc_id);
console.log(`Document: ${doc.name}`);
console.log(`Status: ${doc.status}`);
console.log(`Pages: ${doc.pageNum}`);

Example Response

{
  "id": "pi-abc123def456",
  "name": "2023-annual-report.pdf",
  "description": "Annual Report 2023",
  "status": "completed",
  "createdAt": "2024-01-15T10:30:00.000Z",
  "pageNum": 42,
  "folderId": "folder-123"
}

Field        Type    Description
id           string  Document ID
name         string  Document filename
description  string  Document description
status       string  "queued" | "processing" | "completed" | "failed"
createdAt    string  ISO 8601 timestamp
pageNum      number  Number of pages
folderId     string  Folder ID (if assigned)
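
The four `status` values suggest a simple polling loop: keep checking while the document is "queued" or "processing", and stop on "completed" or "failed". The sketch below is one way to do that under stated assumptions. `waitUntilProcessed` is a hypothetical helper, not an SDK function; the 2-second interval and 30-attempt limit are arbitrary defaults; and treating "failed" as final is an assumption rather than documented behavior. The status source is passed in as a function so the helper does not depend on a particular client.

```javascript
// Hypothetical helper, not part of the PageIndex SDK.
// Polls `fetchStatus` until it reports a terminal status or the attempts run out.
// `fetchStatus` is any async function resolving to an object with a `status` field,
// for example: () => client.api.getDocument(docId)
async function waitUntilProcessed(
  fetchStatus,
  { intervalMs = 2000, maxAttempts = 30 } = {} // arbitrary defaults, tune as needed
) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const doc = await fetchStatus();
    if (doc.status === "completed") return doc;
    if (doc.status === "failed") {
      throw new Error("Document processing failed");
    }
    // "queued" or "processing": wait before asking again,
    // except after the final attempt.
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Document not ready after ${maxAttempts} attempts`);
}
```

A call such as `await waitUntilProcessed(() => client.api.getDocument(doc_id))` would then resolve with the document metadata once processing has completed.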

List Documents

Retrieve a paginated list of all documents, ordered by creation date (newest first).

Parameters

Name              Type    Required  Description                          Default
options.limit     number  no        Maximum documents to return (1–100)  50
options.offset    number  no        Number of documents to skip          0
options.folderId  string  no        Filter by folder ID                  -

Example

const result = await client.api.listDocuments({ limit: 10 });
console.log(`Total: ${result.total}`);
for (const doc of result.documents) {
  console.log(`${doc.name} (${doc.status})`);
}

// Next page
const page2 = await client.api.listDocuments({ limit: 10, offset: 10 });

Response

{
  "documents": [
    {
      "id": "pi-abc123def456",
      "name": "2023-annual-report.pdf",
      "status": "completed",
      "createdAt": "2024-01-15T10:30:00.000Z",
      "pageNum": 42
    }
  ],
  "total": 25,
  "limit": 10,
  "offset": 0
}
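
The `total`, `limit`, and `offset` fields make it possible to walk through every page in a loop. The following is a rough sketch of that pattern: `listAllDocuments` is a hypothetical helper, not an SDK function, and it takes the page-fetching function as an argument so it can be tried without a live client. It assumes `total` reflects the full number of matching documents, as the response above suggests.

```javascript
// Hypothetical helper, not part of the PageIndex SDK.
// Collects every document by requesting pages until `total` is reached.
// `listPage` is any async function taking { limit, offset } and resolving to
// { documents, total }, for example: (opts) => client.api.listDocuments(opts)
async function listAllDocuments(listPage, limit = 50) {
  const all = [];
  let offset = 0;
  while (true) {
    const page = await listPage({ limit, offset });
    all.push(...page.documents);
    offset += page.documents.length;
    // Stop when everything has been fetched, or when a page comes back empty
    // (which guards against looping forever if `total` is inaccurate).
    if (page.documents.length === 0 || offset >= page.total) break;
  }
  return all;
}
```

With the response shown above (`total` of 25 and `limit` of 10), this loop would issue three requests, at offsets 0, 10, and 20.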

Delete a Document

Permanently delete a PageIndex document and all its associated data.

Parameters

Name   Type    Required  Description
docId  string  yes       Document ID

Example Request

await client.api.deleteDocument(doc_id);
