📑 PageIndex OCR

This documentation provides class methods, parameters, and sample responses for the PageIndex OCR SDK.


Submit Document for OCR Processing

  • Upload a PDF file for OCR processing.
  • Return a document identifier (doc_id) for subsequent operations.

Parameters

| Name      | Type   | Required | Description            |
|-----------|--------|----------|------------------------|
| file_path | string | yes      | Local path to the file |

Example Request

result = pi_client.submit_document("./sample.pdf")
doc_id = result["doc_id"]

Example Response

{ "doc_id": "abc123def456" }
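Since the method takes a local path to a PDF, it can help to check the path before submitting. The following is a minimal sketch, not part of the SDK: `submit_pdf` is a hypothetical helper name, and it assumes only that `client` exposes `submit_document` and that the response carries a `doc_id` key, as in the example above.

```python
from pathlib import Path


def submit_pdf(client, file_path):
    """Validate a local PDF path, submit it, and return the doc_id.

    Hypothetical helper: only submit_document and the "doc_id" key
    come from the documentation; the checks are an assumption about
    what is useful to verify on the caller's side.
    """
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"Expected a .pdf file, got: {path.name}")
    result = client.submit_document(str(path))
    return result["doc_id"]
```

With a configured client this would be called as `doc_id = submit_pdf(pi_client, "./sample.pdf")`.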

Get OCR Processing Status & Results

Check processing status and (when complete) get the OCR results for a submitted document.

Parameters

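| Name   | Type   | Required | Description                     | Default |
|--------|--------|----------|---------------------------------|---------|
| doc_id | string | yes      | Document ID                     | -       |
| format | string | no       | Output format: "page" or "node" | "page"  |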

Format Options:

  • "page" (default): Returns results organized by page, with each page containing markdown content and images
  • "node": Returns results organized by document nodes/sections, preserving the hierarchical structure

Example Request

# Get OCR results in page format (default)
ocr_result = pi_client.get_ocr(doc_id)
if ocr_result.get("status") == "completed":
    print("OCR Results:", ocr_result.get("result"))

# Get OCR results in node format
ocr_result = pi_client.get_ocr(doc_id, format="node")
if ocr_result.get("status") == "completed":
    print("OCR Results:", ocr_result.get("result"))

Example Response (Processing):

{ "doc_id": "abc123def456", "status": "processing" }
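Because a submitted document can report "processing" for a while, callers usually need to poll until the status changes. The sketch below is one way to do that under stated assumptions: `wait_for_ocr`, `interval`, and `timeout` are hypothetical names, not SDK features, and since only the "processing" and "completed" statuses are documented here, any other status is treated as still in progress until the timeout expires.

```python
import time


def wait_for_ocr(client, doc_id, format="page", interval=2.0, timeout=300.0):
    """Poll get_ocr until the status is "completed" or the timeout elapses.

    Hypothetical helper: only get_ocr, its format argument, and the
    "status" / "result" keys are taken from the documentation.
    """
    deadline = time.monotonic() + timeout
    while True:
        response = client.get_ocr(doc_id, format=format)
        if response.get("status") == "completed":
            return response.get("result")
        # Any status other than "completed" is assumed to mean "not ready yet".
        if time.monotonic() >= deadline:
            raise TimeoutError(f"OCR for {doc_id} not completed after {timeout}s")
        time.sleep(interval)
```

A monotonic clock is used for the deadline so that system clock adjustments do not shorten or lengthen the wait.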

Example Response (Completed):

{
  "doc_id": "abc123def456",
  "status": "completed",
  "result": [
    {
      "page_index": 1,
      "markdown": "Content from page 1 in markdown format",
      "images": [
        "iVBORw0KGgoAAAANSUhEUgAA...",  // Base64-encoded image(s)
        "iVBORw0KGgoAAAANSUhEUgAB..."   // More images if present
      ]
    },
    {
      "page_index": 2,
      "markdown": "Content from page 2 in markdown format",
      "images": []
    }
  ]
}

Each page object includes:

  • page_index (1-based)
  • markdown (OCR-formatted markdown text)
  • images (array of base64-encoded images; may be empty)
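The page objects described above can be written to disk with the standard library alone. This is a minimal sketch, assuming the "page" format shown in the completed response: `save_pages` and the output file names are hypothetical, and the `.png` extension is an assumption based on the sample strings, whose `iVBORw0KGgo` prefix is the base64 form of the PNG signature. Other image types, if the service returns them, would need a different extension.

```python
import base64
from pathlib import Path


def save_pages(result, out_dir):
    """Write each page's markdown and decoded images under out_dir.

    Hypothetical helper: expects the list found under "result" in a
    completed page-format response. Returns the paths it wrote.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for page in result:
        idx = page["page_index"]  # 1-based, per the documentation
        md_path = out / f"page_{idx}.md"
        md_path.write_text(page["markdown"], encoding="utf-8")
        written.append(md_path)
        # "images" may be an empty list; PNG is assumed for the extension.
        for n, encoded in enumerate(page.get("images", []), start=1):
            img_path = out / f"page_{idx}_img_{n}.png"
            img_path.write_bytes(base64.b64decode(encoded))
            written.append(img_path)
    return written
```

For example, `save_pages(ocr_result["result"], "./ocr_output")` would be called once `ocr_result["status"]` is "completed".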

Delete an OCR Document

  • Remove a previously uploaded OCR document.

Parameters

| Name   | Type   | Required | Description |
|--------|--------|----------|-------------|
| doc_id | string | yes      | Document ID |

Example Request

pi_client.delete_document(doc_id)
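When a document should not outlive the code that processes it, deletion can be tied to a context manager so it runs even if an error occurs. The sketch below is an illustration only: `temporary_document` is a hypothetical name, and because the return value of `delete_document` is not documented here, it is ignored.

```python
from contextlib import contextmanager


@contextmanager
def temporary_document(client, file_path):
    """Submit a document, yield its doc_id, and delete it on exit.

    Hypothetical helper built only on submit_document and
    delete_document as shown in the documentation.
    """
    doc_id = client.submit_document(file_path)["doc_id"]
    try:
        yield doc_id
    finally:
        # Runs on normal exit and on exceptions raised inside the with-block.
        client.delete_document(doc_id)
```

A typical use would be `with temporary_document(pi_client, "./sample.pdf") as doc_id:` followed by the `get_ocr` calls inside the block.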
