Document Processing
DocuTray CLI guide: Document Processing
This guide covers the full document processing workflow: converting documents to structured data, identifying document types, and handling asynchronous operations.
Converting documents
The convert command extracts structured data from a document using a specified document type schema.
Basic usage
# Convert a local file
docutray convert invoice.pdf --type electronic-invoice
# Convert from a URL
docutray convert https://example.com/doc.pdf --type electronic-invoice
Synchronous vs asynchronous processing
By default, convert processes synchronously — the command blocks until the result is ready:
docutray convert invoice.pdf --type electronic-invoice
# Waits and returns the extracted data as JSON
For long-running documents, use --async to enable polling with status updates:
docutray convert large-document.pdf --type electronic-invoice --async
# Status updates are emitted to stderr as JSON:
# {"status":"processing"}
# {"status":"processing"}
# Final result is written to stdout
Webhooks
Instead of polling, you can receive a notification when processing completes:
docutray convert invoice.pdf --type electronic-invoice --webhook-url https://example.com/hooks/docutray
Attaching metadata
Attach custom metadata to a conversion for tracking purposes:
docutray convert invoice.pdf --type electronic-invoice --metadata '{"orderId":"ORD-123","source":"email"}'
Metadata is stored with the conversion result and included in webhook payloads.
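Hand-written JSON inside shell quotes breaks as soon as a value contains a quote or comes from a variable. The sketch below builds the same metadata object with jq, which this guide already uses elsewhere. The helper name build_metadata is hypothetical and not part of the CLI; the orderId and source keys are taken from the example above.

```shell
# Hypothetical helper: builds the --metadata JSON with jq so that values
# containing quotes or spaces are escaped correctly.
build_metadata() {
  # $1 = order ID, $2 = source
  jq -cn --arg orderId "$1" --arg source "$2" \
    '{orderId: $orderId, source: $source}'
}

# Usage (requires the docutray CLI):
#   docutray convert invoice.pdf --type electronic-invoice \
#     --metadata "$(build_metadata "ORD-123" "email")"
```

Passing values through --arg keeps them as JSON strings, so no manual escaping is needed in the calling script.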
Identifying documents
The identify command analyzes a document and returns the best-matching document type with a confidence score.
docutray identify document.pdf
Output:
{
  "document_type": {
    "code": "electronic-invoice",
    "name": "Electronic Invoice",
    "confidence": 0.95
  },
  "alternatives": [
    {
      "code": "receipt",
      "name": "Receipt",
      "confidence": 0.12
    }
  ]
}
Restricting to specific types
Narrow identification to a known set of document types:
docutray identify document.pdf --types invoice,receipt,contract
Table output
For human-readable output:
docutray identify document.pdf --table
code                name                confidence
------------------  ------------------  ----------
electronic-invoice  Electronic Invoice  0.95
receipt             Receipt             0.12
Processing steps
Steps are reusable processing pipelines configured in the DocuTray dashboard.
Running a step
# Run a step and wait for results
docutray steps run extract-fields invoice.pdf
# Run a step on a URL
docutray steps run extract-fields https://example.com/doc.pdf
Async step execution
Start a step and return immediately:
docutray steps run extract-fields invoice.pdf --no-wait
Output:
{
  "id": "exec_abc123",
  "status": "pending"
}
Then check the status later:
docutray steps status exec_abc123
Common workflows
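Polling a step execution
After starting a step with --no-wait, a script usually needs to repeat the status check until the execution finishes. The following is a minimal sketch under stated assumptions: wait_for_step is a hypothetical helper, the status command is assumed to print JSON with a status field like the --no-wait output, and any status other than "pending" or "processing" is treated as final. The actual set of status values is not listed in this guide.

```shell
# Hypothetical helper: polls a step execution until it leaves the
# "pending"/"processing" states, then prints the last status response.
# Assumes `docutray steps status` prints JSON with a "status" field.
wait_for_step() {
  exec_id=$1
  interval=${2:-5}       # seconds between polls
  max_attempts=${3:-60}  # give up after this many polls
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    response=$(docutray steps status "$exec_id") || return 1
    status=$(printf '%s' "$response" | jq -r '.status')
    case $status in
      pending|processing)
        echo "Status: $status" >&2
        sleep "$interval"
        ;;
      *)
        # Any other value is treated as terminal (assumption).
        printf '%s\n' "$response"
        return 0
        ;;
    esac
    attempt=$((attempt + 1))
  done
  echo "Timed out waiting for $exec_id" >&2
  return 1
}

# Usage (requires the docutray CLI and jq):
#   wait_for_step exec_abc123 5 60 > step-result.json
```

Progress messages go to stderr and the final response goes to stdout, mirroring how the CLI separates status updates from results.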
Identify then convert
# Identify the document type, then convert using the detected type
TYPE=$(docutray identify document.pdf | jq -r '.document_type.code')
docutray convert document.pdf --type "$TYPE"
Batch processing
# Process all PDFs in a directory
mkdir -p results  # make sure the output directory exists before redirecting into it
for file in documents/*.pdf; do
  echo "Processing: $file" >&2
  docutray convert "$file" --type electronic-invoice > "results/$(basename "$file" .pdf).json"
done
Error handling in scripts
if result=$(docutray convert invoice.pdf --type electronic-invoice 2>/dev/null); then
  echo "$result" | jq '.extractedData'
else
  echo "Conversion failed" >&2
  exit 1
fi
Output format
All commands output JSON to stdout by default. Errors are written to stderr as JSON with an error field:
{
  "error": "Document type not found: invalid-type",
  "status": 404
}
Exit codes:
0: success
1: error (details on stderr)
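Because failures are signalled by a non-zero exit code with details on stderr, a script can wrap the CLI so that the error output is surfaced only when a command fails. The following is a minimal sketch; run_docutray is a hypothetical helper name, not part of the CLI.

```shell
# Hypothetical helper: runs any docutray command, passes stdout through,
# and on failure prints the captured stderr before returning the exit code.
run_docutray() {
  err_file=$(mktemp) || return 1
  code=0
  # "|| code=$?" records the exit code and keeps the script alive under `set -e`.
  docutray "$@" 2>"$err_file" || code=$?
  if [ "$code" -ne 0 ]; then
    echo "docutray failed with exit code $code:" >&2
    cat "$err_file" >&2
  fi
  rm -f "$err_file"
  return "$code"
}

# Usage (requires the docutray CLI):
#   run_docutray convert invoice.pdf --type electronic-invoice > result.json
```

Note that this wrapper captures everything written to stderr, so the status updates emitted by --async would be hidden when the command succeeds.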