
📑 PageIndex API Endpoints

The API currently accepts PDF files only (more formats coming soon).

Submit Document for Processing

  • Endpoint: POST https://api.pageindex.ai/doc/
  • Description: Upload a PDF document for processing. The system automatically runs both tree generation and OCR on it. The request returns immediately with a document identifier (doc_id) for subsequent operations.

Request Body (multipart/form-data):

  • file (binary, required): PDF document.
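Because the endpoint currently accepts PDF files only, a client may want to reject other files before uploading them. The sketch below is a hypothetical helper, not part of the PageIndex API. It relies only on the fact that PDF files begin with the "%PDF-" signature, and the server remains the final judge of what it accepts.

```python
from __future__ import annotations

import os

# PDF files start with the ASCII signature "%PDF-" (followed by a version such as "1.7").
PDF_SIGNATURE = b"%PDF-"


def has_pdf_signature(header: bytes) -> bool:
    """Return True if the given leading bytes start with the PDF signature."""
    return header.startswith(PDF_SIGNATURE)


def looks_like_pdf(path: str | os.PathLike[str]) -> bool:
    """Cheap client-side check (hypothetical helper, not a PageIndex API call).

    Reads only the first few bytes of the file; it does not validate the
    rest of the document.
    """
    with open(path, "rb") as f:
        return has_pdf_signature(f.read(len(PDF_SIGNATURE)))


# Usage sketch: refuse to upload anything that is not a PDF.
# if not looks_like_pdf("./2023-annual-report.pdf"):
#     raise ValueError("PageIndex currently accepts PDF files only")
```

A file that passes this check can still be rejected by the server (for example, if it is corrupted), so the helper only saves a round trip for obviously wrong inputs.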

Example

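```python
import requests

api_key = "YOUR_API_KEY"
file_path = "./2023-annual-report.pdf"

# Upload the PDF as multipart/form-data under the "file" field
with open(file_path, "rb") as file:
    response = requests.post(
        "https://api.pageindex.ai/doc/",
        headers={"api_key": api_key},
        files={"file": file},
    )

# The response contains the document identifier used by all later calls
doc_id = response.json()["doc_id"]
```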

Example Response:

```json
{
  "doc_id": "abc123def456"
}
```

Get Processing Status & Results

  • Endpoint: GET https://api.pageindex.ai/doc/{doc_id}/
  • Description: Check processing status and (when complete) get the results for a submitted document.

Parameters (URL Path):

  • doc_id (string, required): Document ID.

Query Parameters:

  • type (string, optional): Result type. Use "tree" for tree structure or "ocr" for OCR results. If not specified, returns the default result type based on the original processing type.
  • format (string, optional): For OCR results, specify output format. Use "page" (default) for page-based results or "node" for node-based results.
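The type and format parameters combine into the query string as in the examples below. As a sketch, a small hypothetical helper (doc_result_url is not part of any PageIndex SDK) can build these URLs and catch invalid combinations before a request is sent. It deliberately requires type="ocr" whenever format is given; that is a client-side assumption based on format applying to OCR results, not documented server behavior.

```python
from __future__ import annotations

from urllib.parse import quote, urlencode

BASE_URL = "https://api.pageindex.ai"
RESULT_TYPES = {"tree", "ocr"}
OCR_FORMATS = {"page", "node"}


def doc_result_url(doc_id: str, result_type: str | None = None, fmt: str | None = None) -> str:
    """Build the GET /doc/{doc_id}/ URL with optional type and format parameters.

    Hypothetical helper. Requiring result_type="ocr" whenever fmt is set is a
    client-side choice, based on format applying to OCR results only.
    """
    if result_type is not None and result_type not in RESULT_TYPES:
        raise ValueError(f"type must be one of {sorted(RESULT_TYPES)}, got {result_type!r}")
    if fmt is not None:
        if result_type != "ocr":
            raise ValueError('format is only meaningful together with type="ocr"')
        if fmt not in OCR_FORMATS:
            raise ValueError(f"format must be one of {sorted(OCR_FORMATS)}, got {fmt!r}")

    # Percent-encode the ID so it is always a single, safe path segment.
    url = f"{BASE_URL}/doc/{quote(doc_id, safe='')}/"
    params = {}
    if result_type is not None:
        params["type"] = result_type
    if fmt is not None:
        params["format"] = fmt
    return f"{url}?{urlencode(params)}" if params else url
```

For instance, doc_result_url("abc123def456", "ocr", "node") yields the same URL as the node-format request in the OCR example, while doc_result_url("abc123def456", "tree", "node") raises ValueError before any network call is made.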

Example - Get OCR Results:

```python
import requests

api_key = "YOUR_API_KEY"
doc_id = "abc123def456"

# Get OCR results (default, in page format)
response = requests.get(
    f"https://api.pageindex.ai/doc/{doc_id}/?type=ocr",
    headers={"api_key": api_key},
)

# Get OCR results in node format
response = requests.get(
    f"https://api.pageindex.ai/doc/{doc_id}/?type=ocr&format=node",
    headers={"api_key": api_key},
)
```

Example - Get Tree Structure:

```python
import requests

api_key = "YOUR_API_KEY"
doc_id = "abc123def456"

response = requests.get(
    f"https://api.pageindex.ai/doc/{doc_id}/?type=tree",
    headers={"api_key": api_key},
)
```

Example Response (Tree Processing):

```json
{
  "doc_id": "abc123def456",
  "status": "processing",
  "retrieval_ready": false
}
```

Example Response (Tree Completed):

```json
{
  "doc_id": "abc123def456",
  "status": "completed",
  "retrieval_ready": true,
  "result": [
    ...
    {
      "title": "Financial Stability",
      "node_id": "0006",
      "page_index": 21,
      "text": "The Federal Reserve maintains financial stability through comprehensive monitoring and regulatory oversight...",
      "nodes": [
        {
          "title": "Monitoring Financial Vulnerabilities",
          "node_id": "0007",
          "page_index": 22,
          "text": "The Federal Reserve's monitoring focuses on identifying and assessing potential risks..."
        },
        {
          "title": "Domestic and International Cooperation and Coordination",
          "node_id": "0008",
          "page_index": 28,
          "text": "In 2023, the Federal Reserve collaborated internationally with central banks and regulatory authorities..."
        }
      ]
    }
    ...
  ]
}
```
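Because child nodes are nested under "nodes", a recursive walk recovers the document outline from the result field. The following is a minimal sketch that assumes the shape shown in the example response (leaf nodes may omit "nodes"); walk_tree and outline are hypothetical helper names.

```python
from __future__ import annotations

from typing import Any, Iterator


def walk_tree(nodes: list[dict[str, Any]], depth: int = 0) -> Iterator[tuple[int, dict[str, Any]]]:
    """Yield (depth, node) pairs in depth-first, pre-order (document) order."""
    for node in nodes:
        yield depth, node
        # Children, if any, are nested under "nodes"; leaf nodes may omit the key.
        yield from walk_tree(node.get("nodes", []), depth + 1)


def outline(result: list[dict[str, Any]]) -> list[str]:
    """Render the tree as indented lines of node_id, title, and page_index."""
    return [
        f"{'  ' * depth}{node['node_id']} {node['title']} (p. {node['page_index']})"
        for depth, node in walk_tree(result)
    ]
```

With data = response.json() from a completed tree request, print("\n".join(outline(data["result"]))) would print one indented line per node, children two spaces deeper than their parent.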

Notes:

  • For tree results: the "result" field contains the hierarchical tree structure; child nodes, if any, are listed under "nodes".
  • For OCR results: the format of the "result" field depends on the format parameter:
    • "page" (default): a list of page objects, each containing page_index, markdown, and images.
    • "node": a list of node objects, organized by document structure.
  • page_index is 1-based (the first page is 1).
  • markdown contains the recognized text in Markdown format.
  • images is a list of base64-encoded images detected on that page; it may be empty.
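For page-format OCR results, the fields above are enough to reassemble the document text and recover the images. The sketch below assumes result is the list of page objects described in the notes and that each images entry is a plain base64 string; it strips a data-URI prefix defensively in case one is present. pages_to_markdown and decode_page_images are hypothetical names, and the "---" page separator is an arbitrary choice.

```python
from __future__ import annotations

import base64
from typing import Any


def pages_to_markdown(pages: list[dict[str, Any]], separator: str = "\n\n---\n\n") -> str:
    """Join per-page markdown in page order (page_index is 1-based)."""
    ordered = sorted(pages, key=lambda page: page["page_index"])
    return separator.join(page["markdown"] for page in ordered)


def decode_page_images(page: dict[str, Any]) -> list[bytes]:
    """Decode a page's base64 images into raw bytes; returns [] if there are none."""
    decoded = []
    for encoded in page.get("images", []):
        # Assumption: entries are plain base64; strip a data-URI prefix if one is present.
        if encoded.startswith("data:") and "," in encoded:
            encoded = encoded.split(",", 1)[1]
        decoded.append(base64.b64decode(encoded))
    return decoded
```

Sorting by page_index is defensive: if the API already returns pages in order, it is a no-op.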

Delete a PageIndex Document

  • Endpoint: DELETE https://api.pageindex.ai/doc/{doc_id}/
  • Description: Permanently delete a PageIndex document and all its associated data.

Parameters (URL Path):

  • doc_id (string, required): Document ID.

Example:

```python
import requests

api_key = "YOUR_API_KEY"
doc_id = "abc123def456"

response = requests.delete(
    f"https://api.pageindex.ai/doc/{doc_id}/",
    headers={"api_key": api_key},
)
```

🔍 PageIndex Retrieval API

Retrieval requires that tree generation has completed for the PageIndex document.

Currently, only single-document retrieval is supported. Multi-document retrieval is coming soon. See also the Doc Search page for document search examples.

Retrieve from a PageIndex Document

  • Endpoint: POST https://api.pageindex.ai/retrieval/
  • Description: Submit a query to create a retrieval task for a specific PageIndex document. Returns a retrieval task ID.

Before Retrieval

Before submitting a retrieval query, confirm that the document is ready by checking the retrieval_ready field in the tree endpoint response:

```python
import requests

api_key = "YOUR_API_KEY"
doc_id = "abc123def456"

# Check if document is ready for retrieval
tree_response = requests.get(
    f"https://api.pageindex.ai/doc/{doc_id}/?type=tree",
    headers={"api_key": api_key},
)
retrieval_ready = tree_response.json().get("retrieval_ready")
```
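Because tree generation runs asynchronously, the readiness check usually has to be repeated until retrieval_ready becomes true. A minimal polling sketch follows. The poll interval and timeout defaults are arbitrary assumptions, not documented limits, and fetch_tree stands for any callable that performs the tree GET request and returns the parsed JSON; passing it in (along with sleep and clock) keeps the loop testable without network access.

```python
from __future__ import annotations

import time
from typing import Any, Callable


def wait_until_retrieval_ready(
    fetch_tree: Callable[[], dict[str, Any]],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Call fetch_tree until its response has a truthy retrieval_ready field.

    Returns the last response; raises TimeoutError once `timeout` seconds
    have elapsed. Defaults are illustrative, not documented limits.
    """
    deadline = clock() + timeout
    while True:
        data = fetch_tree()
        if data.get("retrieval_ready"):
            return data
        if clock() >= deadline:
            raise TimeoutError(
                f"document not retrieval-ready after {timeout} s (status={data.get('status')!r})"
            )
        sleep(poll_interval)
```

With requests, a call might look like wait_until_retrieval_ready(lambda: requests.get(f"https://api.pageindex.ai/doc/{doc_id}/?type=tree", headers={"api_key": api_key}).json()).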

Parameters (in JSON body):

  • doc_id (string, required): The PageIndex document ID to retrieve from.
  • query (string, required): The user question or information need.
  • thinking (boolean, optional): If set to true, the model will first plan what information is required before performing retrieval, helping you gather more comprehensive and relevant information. Default is false.
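The three parameters map directly onto the JSON body. A hypothetical helper like the one below can validate them before sending; passing the returned dictionary through the json= argument of requests serializes Python's False as JSON false. The validation rules (non-empty strings, a real boolean for thinking) are client-side assumptions, not documented server constraints.

```python
from __future__ import annotations

from typing import Any


def retrieval_payload(doc_id: str, query: str, thinking: bool = False) -> dict[str, Any]:
    """Build the JSON body for POST /retrieval/ (hypothetical helper).

    The checks below are client-side assumptions, not documented server rules.
    """
    if not doc_id:
        raise ValueError("doc_id is required")
    if not query or not query.strip():
        raise ValueError("query must be a non-empty string")
    if not isinstance(thinking, bool):
        raise TypeError("thinking must be True or False")
    return {"doc_id": doc_id, "query": query, "thinking": thinking}
```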

Example:

```python
import requests

api_key = "YOUR_API_KEY"
payload = {
    "doc_id": "abc123def456",
    "query": "What are the main sources of revenue?",
    "thinking": False,
}

response = requests.post(
    "https://api.pageindex.ai/retrieval/",
    headers={"api_key": api_key},
    json=payload,
)
```

Example Response:

```json
{
  "retrieval_id": "xyz789ghi012"
}
```

Get Retrieval Status & Results

  • Endpoint: GET https://api.pageindex.ai/retrieval/{retrieval_id}/
  • Description: Get the status and, when ready, the result for a specific retrieval query.

Parameters (URL Path):

  • retrieval_id (string, required): Retrieval task ID returned when the query was submitted.

Example:

```python
import requests

api_key = "YOUR_API_KEY"
retrieval_id = "xyz789ghi012"

response = requests.get(
    f"https://api.pageindex.ai/retrieval/{retrieval_id}/",
    headers={"api_key": api_key},
)
```

Example Response (Processing):

```json
{
  "retrieval_id": "xyz789ghi012",
  "status": "processing"
}
```

Example Response (Completed):

```json
{
  "retrieval_id": "xyz789ghi012",
  "doc_id": "abc123def456",
  "status": "completed",
  "query": "What are the recent trends in the labor market?",
  "retrieved_nodes": [
    {
      "title": "March 2024 Summary",
      "node_id": "0005",
      "relevant_contents": [
        {
          "page_index": 10,
          "relevant_content": "The labor market has gained averaging 239,000 per month since June 2023..."
        }
      ]
    }
  ]
}
```
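A completed response nests the retrieved text under each node's relevant_contents. One way to use it, sketched below under the assumption that responses follow the example's shape, is to flatten it into passages labeled with their node and page, for instance as context for a downstream prompt. retrieved_passages and build_context are hypothetical names, and refusing non-completed responses is a client-side choice.

```python
from __future__ import annotations

from typing import Any


def retrieved_passages(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a completed retrieval response into one dict per relevant passage."""
    if response.get("status") != "completed":
        raise ValueError(f"retrieval not completed (status={response.get('status')!r})")
    passages = []
    for node in response.get("retrieved_nodes", []):
        for item in node.get("relevant_contents", []):
            passages.append({
                "node_id": node["node_id"],
                "title": node["title"],
                "page_index": item["page_index"],
                "content": item["relevant_content"],
            })
    return passages


def build_context(response: dict[str, Any]) -> str:
    """Join passages into a single text block, each labeled with its source."""
    return "\n\n".join(
        f"[{p['title']} | node {p['node_id']} | page {p['page_index']}]\n{p['content']}"
        for p in retrieved_passages(response)
    )
```

Keeping node_id and page_index next to each passage makes it possible to cite where an answer came from in the original PDF.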

