📑 PageIndex API Endpoints
The API currently accepts PDF files only (more formats coming soon).
Submit Document for Processing
- Endpoint: POST https://api.pageindex.ai/doc/
- Description: Upload a PDF document for processing. The system automatically runs both tree generation and OCR, and immediately returns a document identifier (doc_id) for subsequent operations.
Request Body (multipart/form-data):
- file (binary, required): The PDF document.
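Example:
import requests
api_key = "YOUR_API_KEY"
file_path = "./2023-annual-report.pdf"
# Upload the PDF as multipart/form-data under the "file" field
with open(file_path, "rb") as file:
    response = requests.post(
        "https://api.pageindex.ai/doc/",
        headers={"api_key": api_key},
        files={"file": file},
    )
response.raise_for_status()
# Keep the returned ID for status checks, retrieval, and deletion
doc_id = response.json()["doc_id"]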
Example Response:
{ "doc_id": "abc123def456" }
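Since the endpoint currently accepts PDF files only, a client may want to reject other files before uploading them. The following is a minimal sketch, assuming a simple local check is enough: it looks at the file extension and at the "%PDF-" signature that PDF files begin with. The function name looks_like_pdf is hypothetical and not part of the PageIndex API, and the server may apply its own, stricter validation.

```python
from pathlib import Path

# PDF files begin with this byte signature (the PDF header, e.g. "%PDF-1.7").
PDF_SIGNATURE = b"%PDF-"


def looks_like_pdf(file_path: str) -> bool:
    """Hypothetical client-side pre-check before POSTing to /doc/.

    Returns True only if the path is an existing file with a .pdf extension
    whose first bytes match the PDF signature. This is a heuristic, not
    full PDF validation.
    """
    path = Path(file_path)
    if path.suffix.lower() != ".pdf" or not path.is_file():
        return False
    with path.open("rb") as f:
        return f.read(len(PDF_SIGNATURE)) == PDF_SIGNATURE
```

For example, call looks_like_pdf(file_path) before the upload request above and skip any file for which it returns False.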
Get Processing Status & Results
- Endpoint: GET https://api.pageindex.ai/doc/{doc_id}/
- Description: Check processing status and (when complete) get the results for a submitted document.
Parameters (URL Path):
- doc_id (string, required): Document ID.
Query Parameters:
- type (string, optional): Result type. Use "tree" for the tree structure or "ocr" for OCR results. If omitted, the endpoint returns the default result type based on the original processing type.
- format (string, optional): Output format for OCR results. Use "page" (default) for page-based results or "node" for node-based results.
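The query parameters can also be assembled programmatically rather than written by hand into an f-string. The following is a minimal sketch using the standard library's urllib.parse.urlencode; the helper name doc_result_url is hypothetical. It only accepts the documented values for type and format, adds a parameter only when a value is given, and does not try to enforce that format is meaningful only for OCR results.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://api.pageindex.ai/doc/"
RESULT_TYPES = {"tree", "ocr"}  # documented values for "type"
OCR_FORMATS = {"page", "node"}  # documented values for "format" (OCR results)


def doc_result_url(doc_id: str, result_type: Optional[str] = None,
                   fmt: Optional[str] = None) -> str:
    """Build the GET /doc/{doc_id}/ URL with optional type/format query params."""
    if result_type is not None and result_type not in RESULT_TYPES:
        raise ValueError(f"unknown result type: {result_type!r}")
    if fmt is not None and fmt not in OCR_FORMATS:
        raise ValueError(f"unknown OCR format: {fmt!r}")

    params = {}
    if result_type is not None:
        params["type"] = result_type
    if fmt is not None:
        params["format"] = fmt

    # Percent-encode the ID so it is safe as a single path segment.
    url = f"{BASE_URL}{quote(doc_id, safe='')}/"
    return f"{url}?{urlencode(params)}" if params else url
```

The returned URL can be passed to requests.get with the same api_key header as in the examples; alternatively, requests can encode the same dictionary through the params argument of requests.get.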
Example - Get OCR Results:
import requests
api_key = "YOUR_API_KEY"
doc_id = "abc123def456"
# Get OCR results in page format (the default format)
response = requests.get(
f"https://api.pageindex.ai/doc/{doc_id}/?type=ocr",
headers={"api_key": api_key}
)
# Get OCR results in node format
response = requests.get(
f"https://api.pageindex.ai/doc/{doc_id}/?type=ocr&format=node",
headers={"api_key": api_key}
)
Example - Get Tree Structure:
import requests
api_key = "YOUR_API_KEY"
doc_id = "abc123def456"
response = requests.get(
f"https://api.pageindex.ai/doc/{doc_id}/?type=tree",
headers={"api_key": api_key}
)
Example Response (Tree Processing):
{
"doc_id": "abc123def456",
"status": "processing",
"retrieval_ready": false
}
Example Response (Tree Completed):
{
"doc_id": "abc123def456",
"status": "completed",
"retrieval_ready": true,
"result": [
...
{
"title": "Financial Stability",
"node_id": "0006",
"page_index": 21,
"text": "The Federal Reserve maintains financial stability through comprehensive monitoring and regulatory oversight...",
"nodes": [
{
"title": "Monitoring Financial Vulnerabilities",
"node_id": "0007",
"page_index": 22,
"text": "The Federal Reserve's monitoring focuses on identifying and assessing potential risks..."
},
{
"title": "Domestic and International Cooperation and Coordination",
"node_id": "0008",
"page_index": 28,
"text": "In 2023, the Federal Reserve collaborated internationally with central banks and regulatory authorities..."
}
]
}
...
]
}
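Each node in the tree result carries a title, node_id, page_index, and text, and may hold child nodes under nodes. The following is a minimal sketch of a depth-first walk that flattens the hierarchy into an indented outline, assuming only the fields shown in the example response (a node without a nodes key is treated as a leaf). The function names iter_nodes and outline are hypothetical.

```python
from typing import Iterator, List, Tuple


def iter_nodes(nodes: list, depth: int = 0) -> Iterator[Tuple[int, dict]]:
    """Yield (depth, node) pairs in document order (pre-order, depth-first)."""
    for node in nodes:
        yield depth, node
        # Child nodes, if any, live under "nodes"; leaves may omit the key.
        yield from iter_nodes(node.get("nodes", []), depth + 1)


def outline(result: list) -> List[str]:
    """Render the tree as indented 'node_id title (p. page_index)' lines."""
    return [
        f"{'  ' * depth}{node['node_id']} {node['title']} (p. {node['page_index']})"
        for depth, node in iter_nodes(result)
    ]
```

With a completed response, outline(response.json()["result"]) gives a quick table of contents of the document.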
Notes:
- For tree generation: the "result" field contains the hierarchical tree structure.
- For OCR processing: the layout of the "result" field depends on the format parameter:
  - "page" (default): a list of page objects, each containing:
    - page_index: the page number, 1-based (the first page is 1).
    - markdown: the recognized text in Markdown format.
    - images: a list of base64-encoded images detected on that page; may be empty.
  - "node": a list of node objects, organized by document structure.
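For OCR results in the default page format, a common next step is to stitch the pages back into a single Markdown document. The following is a minimal sketch, assuming each page object has the page_index, markdown, and images fields described above. Sorting by page_index guards against an out-of-order list; the helper names pages_to_markdown and count_images, and the HTML-comment page markers, are illustrative choices rather than API behavior.

```python
from typing import List


def pages_to_markdown(pages: List[dict]) -> str:
    """Join page-format OCR results into one Markdown string.

    page_index is 1-based; an HTML comment marks where each page starts.
    """
    parts = []
    for page in sorted(pages, key=lambda p: p["page_index"]):
        parts.append(f"<!-- page {page['page_index']} -->\n{page['markdown']}")
    return "\n\n".join(parts)


def count_images(pages: List[dict]) -> int:
    """Total number of base64-encoded images across pages (lists may be empty)."""
    return sum(len(page.get("images") or []) for page in pages)
```

Decoding an individual image, if needed, can be done with base64.b64decode from the standard library.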
Delete a PageIndex Document
- Endpoint: DELETE https://api.pageindex.ai/doc/{doc_id}/
- Description: Permanently delete a PageIndex document and all its associated data.
Parameters (URL Path):
- doc_id (string, required): Document ID.
Example:
import requests
api_key = "YOUR_API_KEY"
doc_id = "abc123def456"
response = requests.delete(
f"https://api.pageindex.ai/doc/{doc_id}/",
headers={"api_key": api_key}
)
🔍 PageIndex Retrieval API
Retrieval requires a document whose PageIndex tree generation has completed.
Currently, only single-document retrieval is supported. Multi-document retrieval is coming soon. See also the Doc Search page for document search examples.
Retrieve from a PageIndex Document
- Endpoint: POST https://api.pageindex.ai/retrieval/
- Description: Submit a query to create a retrieval task for a specific PageIndex document. Returns a retrieval task ID.
Before Retrieval
Before submitting a retrieval query, check that the document is ready for retrieval by reading the retrieval_ready field in the tree endpoint response:
# Check if document is ready for retrieval
tree_response = requests.get(
f"https://api.pageindex.ai/doc/{doc_id}/?type=tree",
headers={"api_key": api_key}
)
retrieval_ready = tree_response.json().get("retrieval_ready")
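Because processing is asynchronous, a client typically polls until retrieval_ready is true. The following is a minimal sketch under stated assumptions: the fetch callable stands in for the tree GET request shown above (in practice it would return tree_response.json()), and the 5-second interval and 10-minute timeout are arbitrary defaults, not documented limits. The helper name wait_until_ready is hypothetical.

```python
import time
from typing import Callable


def wait_until_ready(fetch: Callable[[], dict], interval: float = 5.0,
                     timeout: float = 600.0) -> dict:
    """Call fetch() until its parsed body reports retrieval_ready == True.

    fetch is any zero-argument callable returning the parsed tree response,
    e.g. lambda: requests.get(url, headers=headers).json().
    Raises TimeoutError if the document is not ready within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        body = fetch()
        if body.get("retrieval_ready") is True:
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"document not ready after {timeout} s "
                f"(last status: {body.get('status')!r})"
            )
        time.sleep(interval)
```

The same pattern applies to retrieval tasks, checking for "status": "completed" on the retrieval status endpoint instead of retrieval_ready.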
Parameters (in JSON body):
- doc_id (string, required): The PageIndex document ID to retrieve from.
- query (string, required): The user question or information need.
- thinking (boolean, optional): If set to true, the model first plans what information is required before performing retrieval, which helps gather more comprehensive and relevant information. Default is false.
Example:
import requests
api_key = "YOUR_API_KEY"
payload = {
"doc_id": "abc123def456",
"query": "What are the main sources of revenue?",
"thinking": False
}
response = requests.post(
"https://api.pageindex.ai/retrieval/",
headers={"api_key": api_key},
json=payload
)
Example Response:
{
"retrieval_id": "xyz789ghi012"
}
Get Retrieval Status & Results
- Endpoint: GET https://api.pageindex.ai/retrieval/{retrieval_id}/
- Description: Get the status and, when ready, the result for a specific retrieval query.
Parameters (URL Path):
- retrieval_id (string, required): The retrieval task ID returned when the query was submitted.
Example:
import requests
api_key = "YOUR_API_KEY"
retrieval_id = "xyz789ghi012"
response = requests.get(
f"https://api.pageindex.ai/retrieval/{retrieval_id}/",
headers={"api_key": api_key}
)
Example Response (Processing):
{
"retrieval_id": "xyz789ghi012",
"status": "processing"
}
Example Response (Completed):
{
"retrieval_id": "xyz789ghi012",
"doc_id": "abc123def456",
"status": "completed",
"query": "What are the recent trends in the labor market?",
"retrieved_nodes": [
{
"title": "March 2024 Summary",
"node_id": "0005",
"relevant_contents": [
{
"page_index": 10,
"relevant_content": "The labor market has gained averaging 239,000 per month since June 2023..."
}
]
}
]
}
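A completed retrieval lists the matching nodes, each with one or more relevant_contents entries tied to a page. The following is a minimal sketch that flattens them into (page_index, node title, content) tuples for display or citation, assuming only the fields shown in the example responses. The function name collect_evidence is hypothetical.

```python
from typing import List, Tuple


def collect_evidence(retrieval: dict) -> List[Tuple[int, str, str]]:
    """Return (page_index, node title, relevant_content) tuples, ordered by page."""
    if retrieval.get("status") != "completed":
        return []  # still processing: no retrieved_nodes yet
    evidence = []
    for node in retrieval.get("retrieved_nodes", []):
        for item in node.get("relevant_contents", []):
            evidence.append(
                (item["page_index"], node["title"], item["relevant_content"])
            )
    # sorted() is stable, so entries on the same page keep their original order.
    return sorted(evidence, key=lambda e: e[0])
```

Returning an empty list for a task that is still processing lets the caller keep polling without special-casing the response shape.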