Python SDK
The official Python library for the DocuTray API, providing access to document processing capabilities including OCR, document identification, data extraction, and knowledge bases.
Installation
pip install docutray
Requires Python 3.10+.
Quick Start
Synchronous Usage
from pathlib import Path
from docutray import Client
client = Client(api_key="your-api-key")
# Convert a document
result = client.convert.run(
    file=Path("invoice.pdf"),
    document_type_code="invoice"
)
print(result.data)
client.close()
Asynchronous Usage
import asyncio
from pathlib import Path
from docutray import AsyncClient
async def main():
    async with AsyncClient(api_key="your-api-key") as client:
        result = await client.convert.run(
            file=Path("invoice.pdf"),
            document_type_code="invoice"
        )
        print(result.data)
asyncio.run(main())
Configuration
# Via constructor
client = Client(api_key="your-api-key")
# Via environment variable (DOCUTRAY_API_KEY)
client = Client()
Resources
Client
The main entry points for the SDK:
- Client — Synchronous client
- AsyncClient — Asynchronous client
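The synchronous quick start ends with an explicit client.close(). One way to make sure the client is closed even when a request raises is to wrap it in contextlib.closing from the standard library. The sketch below is illustrative only: StubClient is a hypothetical stand-in for Client so that the example runs without an API key or network access, and whether Client also supports a plain with statement is not stated on this page.

```python
from contextlib import closing


class StubClient:
    """Hypothetical stand-in for docutray.Client; only close() is modeled."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        # The real client is assumed to release its connections here.
        self.closed = True


def use_client(client: StubClient) -> None:
    # closing() calls client.close() on exit, even if the body raises.
    with closing(client):
        raise RuntimeError("simulated request failure")


client = StubClient()
try:
    use_client(client)
except RuntimeError:
    pass

print(client.closed)  # True
```

With the real SDK the same pattern would presumably read `with closing(Client(api_key=...)) as client:`, since the only requirement of closing() is a close() method.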
API Resources
- Convert — Document conversion and data extraction
- Identify — Automatic document type identification
- DocumentTypes — Document type catalog and schema validation
- Steps — Workflow step execution
- KnowledgeBases — Knowledge base management and semantic search
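Because the async quick start awaits client.convert.run(...), several documents can in principle be converted concurrently with asyncio.gather. The following is a sketch under stated assumptions, not SDK code: fake_convert is a hypothetical coroutine standing in for the real call, convert_all is an illustrative helper name, and the concurrency limit of 2 is an arbitrary choice to avoid flooding the API.

```python
import asyncio
from pathlib import Path


async def fake_convert(file: Path, document_type_code: str) -> dict:
    """Hypothetical stand-in for `await client.convert.run(...)`."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"file": file.name, "type": document_type_code}


async def convert_all(files: list[Path], max_concurrency: int = 2) -> list[dict]:
    # Bound the number of in-flight requests with a semaphore.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def convert_one(file: Path) -> dict:
        async with semaphore:
            return await fake_convert(file, "invoice")

    # gather() returns results in the same order as the input files.
    return await asyncio.gather(*(convert_one(f) for f in files))


results = asyncio.run(convert_all([Path("a.pdf"), Path("b.pdf"), Path("c.pdf")]))
print([r["file"] for r in results])
```

To adapt this, the body of convert_one would call the real client inside an `async with AsyncClient(...)` block shared by all tasks.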
Error Handling
- Exception Hierarchy — Comprehensive error classes with status-specific exceptions
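The exception class names are not listed in this overview, so the sketch below takes the retryable exception types as a parameter rather than guessing them. It shows one common pattern, retry with exponential backoff, using only the standard library. The names call_with_retry, TransientError, and flaky are hypothetical and exist only for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Call func, retrying on the given exception types with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2**attempt)
    raise ValueError("attempts must be at least 1")


class TransientError(Exception):
    """Hypothetical placeholder for an SDK exception worth retrying."""


calls = {"count": 0}


def flaky() -> str:
    # Fails twice, then succeeds, to exercise the retry loop.
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporary failure")
    return "ok"


# base_delay=0.0 keeps the demonstration instant.
outcome = call_with_retry(flaky, retry_on=(TransientError,), base_delay=0.0)
print(outcome, calls["count"])
```

In real use, retry_on would hold whichever status-specific exceptions from the SDK's hierarchy represent temporary conditions, and func would be a lambda wrapping the client call.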
Types
Response and model types: