# DocuTray - Full Documentation

> DocuTray is a document processing platform that converts any document
> into structured data using AI-powered OCR, with validation workflows
> and a multi-tenant REST API.

# Getting Started (https://docs.docutray.com/docs/getting-started)















This guide will help you get started with DocuTray quickly.

## Creating an Account [#creating-an-account]

To start using DocuTray, create an account:

1. Visit the registration page at [https://app.docutray.com/register](https://app.docutray.com/register)

<img alt="DocuTray Registration Page" src="__img0" />

2. Complete the required fields:
   * **Full Name**: Enter your first and last name
   * **Email**: Use a valid email address
   * **Password**: Create a secure password (minimum 8 characters)
   * **Confirm Password**: Repeat the password for verification

<img alt="Completed Registration Form" src="__img1" />

3. Click the "Register" button

4. You will receive a confirmation email at the provided address. Open this email and click the verification link.

<img alt="Verification Email" src="__img2" />

5. Done! Once your account is verified, you can log in and start using DocuTray.

If you already have an account, you can go directly to the [login page](https://app.docutray.com/login).

## Creating an API Key [#creating-an-api-key]

After creating your account, generate an API Key to integrate DocuTray with your applications:

1. Log in to your DocuTray account at [https://app.docutray.com/login](https://app.docutray.com/login)

2. Select the organization you want to work with

3. Navigate to "Account" > "API Keys" in the navigation menu

<img alt="API Keys Menu" src="__img3" />

4. Click the "New API Key" button

5. Enter a descriptive name for your API Key and click "Create"

<img alt="Create New API Key" src="__img4" />

6. Copy the generated API Key and store it in a safe place. **Important**: This will be the only time you can see the complete key.

<img alt="Copy API Key" src="__img5" />

You can now use this API Key to authenticate your requests to the DocuTray API.

## Your First Conversion [#your-first-conversion]

Once you have your API Key, you can process your first document with a simple API call.

### Supported File Formats [#supported-file-formats]

DocuTray supports the following file formats:

* **Images**: JPEG, PNG, GIF, BMP, WebP
* **Documents**: PDF (up to 100MB)
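
A client-side check can reject unsupported files before they are uploaded. The following is a minimal sketch: the extension set is inferred from the format list above, the `check_uploadable` helper is hypothetical, and it assumes "100MB" means 100 × 1024 × 1024 bytes, which the documentation does not specify.

```python
from pathlib import Path

# Extensions inferred from the documented formats; whether the API accepts
# both ".jpg" and ".jpeg" is an assumption.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp", ".pdf"}

# Assumption: "100MB" is interpreted as 100 * 1024 * 1024 bytes.
MAX_PDF_BYTES = 100 * 1024 * 1024


def check_uploadable(path: Path) -> None:
    """Raise ValueError if the file is clearly outside the documented limits."""
    suffix = path.suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file format: {suffix or '(none)'}")
    # The documented size limit applies to PDFs only.
    if suffix == ".pdf" and path.stat().st_size > MAX_PDF_BYTES:
        raise ValueError("PDF exceeds the documented 100MB limit")
```

Calling `check_uploadable(Path("invoice.pdf"))` before a conversion request returns silently for acceptable files and raises otherwise.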

### Install the SDK [#install-the-sdk]

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```bash
    pip install docutray
    ```
  </Tab>

  <Tab value="Node.js">
    ```bash
    npm install docutray
    ```
  </Tab>

  <Tab value="cURL">
    No installation required — cURL is available on most systems.
  </Tab>
</Tabs>

### Making the API Call [#making-the-api-call]

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```python
    from pathlib import Path
    from docutray import Client

    client = Client(api_key="YOUR_API_KEY")

    result = client.convert.run(
        file=Path("invoice.pdf"),
        document_type_code="invoice"
    )

    print(result.data)
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    import DocuTray from 'docutray';
    import { readFileSync } from 'fs';

    const client = new DocuTray({ apiKey: 'YOUR_API_KEY' });

    const result = await client.convert.run({
      file: readFileSync('invoice.pdf'),
      documentTypeCode: 'invoice',
    });

    console.log(result.data);
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    curl -X POST https://app.docutray.com/api/convert \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -F "image=@invoice.pdf" \
      -F "document_type_code=invoice"
    ```
  </Tab>
</Tabs>

### Response [#response]

The API returns extracted data in JSON format according to your document type schema:

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```python
    # result.data contains the extracted fields
    {
        "numero_factura": "F-2024-001",
        "fecha_emision": "2024-01-15",
        "rfc_emisor": "XAXX010101000",
        "razon_social_emisor": "Empresa Ejemplo S.A. de C.V.",
        "subtotal": 1000,
        "iva": 160,
        "total": 1160
    }
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    // result.data contains the extracted fields
    {
      numero_factura: 'F-2024-001',
      fecha_emision: '2024-01-15',
      rfc_emisor: 'XAXX010101000',
      razon_social_emisor: 'Empresa Ejemplo S.A. de C.V.',
      subtotal: 1000,
      iva: 160,
      total: 1160
    }
    ```
  </Tab>

  <Tab value="cURL">
    ```json
    {
      "data": {
        "numero_factura": "F-2024-001",
        "fecha_emision": "2024-01-15",
        "rfc_emisor": "XAXX010101000",
        "razon_social_emisor": "Empresa Ejemplo S.A. de C.V.",
        "subtotal": 1000,
        "iva": 160,
        "total": 1160
      }
    }
    ```
  </Tab>
</Tabs>

> **Tip**: For large files or batch processing, use async conversion, which processes documents in the background:

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```python
    status = client.convert.run_async(
        file=Path("large_document.pdf"),
        document_type_code="invoice"
    )
    # Poll for completion
    final = status.wait()
    print(final.data)
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    const status = await client.convert.runAsync({
      file: readFileSync('large_document.pdf'),
      documentTypeCode: 'invoice',
    });
    // Poll for completion
    const final = await status.wait();
    console.log(final.data);
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    # Start async conversion
    curl -X POST https://app.docutray.com/api/convert-async \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -F "image=@large_document.pdf" \
      -F "document_type_code=invoice"

    # Check status with the returned conversion_id
    curl https://app.docutray.com/api/convert-async/CONVERSION_ID \
      -H "Authorization: Bearer YOUR_API_KEY"
    ```
  </Tab>
</Tabs>
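
When calling the REST endpoints directly rather than through an SDK, the status check has to be repeated until the conversion finishes. The sketch below shows one generic way to do that. `poll_until_done` is a hypothetical helper, and because the shape of the status response is not described here, the caller supplies both the fetch function and the completion test.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    is_done: Callable[[Dict[str, Any]], bool],
    interval_seconds: float = 2.0,
    timeout_seconds: float = 300.0,
) -> Dict[str, Any]:
    """Call fetch_status until is_done(payload) is true or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        payload = fetch_status()
        if is_done(payload):
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError("conversion did not finish before the timeout")
        time.sleep(interval_seconds)
```

Here `fetch_status` would wrap the `GET /api/convert-async/CONVERSION_ID` request, and `is_done` would inspect whichever field of the response signals completion.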

## Next Steps [#next-steps]

Now that you've completed your first conversion, explore these resources:

* **[Document Types](/docs/document-types)** — Browse available document types and their schemas
* **[API Reference](/docs/api)** — Complete API documentation with all endpoints
* **[Python SDK](/docs/python-sdk)** — Python SDK reference and guides
* **[Node.js SDK](/docs/node-sdk)** — Node.js SDK reference and guides
* **[Webhooks](/docs/webhooks)** — Set up webhooks to receive conversion results automatically
* **[Guides](/docs/guides)** — Step-by-step tutorials for common use cases


---

# API Reference (https://docs.docutray.com/docs/api)



The DocuTray API provides a complete set of endpoints for document processing, type management, and workflow automation.

## Authentication [#authentication]

All API requests require authentication using an API Key in the `Authorization` header:

```bash
Authorization: Bearer YOUR_API_KEY
```

You can generate API Keys from your organization's dashboard in **Account** > **API Keys**.
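
As an illustration, the header can be attached with Python's standard library alone. This is a sketch rather than an official client: `build_request` is a hypothetical helper, `DOCUTRAY_API_KEY` is an environment variable name chosen for this example, and the base URL is the production URL listed in this reference.

```python
import os
import urllib.request

BASE_URL = "https://app.docutray.com"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Return a Request for BASE_URL + path carrying the Bearer token."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {api_key}"},
    )


# DOCUTRAY_API_KEY is a hypothetical variable name, not a documented setting.
api_key = os.environ.get("DOCUTRAY_API_KEY", "YOUR_API_KEY")
request = build_request("/api/convert-async/CONVERSION_ID", api_key)
print(request.get_header("Authorization"))
```

Reading the key from the environment keeps it out of source control; the request object can then be passed to `urllib.request.urlopen`.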

## Base URL [#base-url]

All API endpoints use the following base URLs:

| Environment | URL                            |
| ----------- | ------------------------------ |
| Production  | `https://app.docutray.com`     |
| Staging     | `https://staging.docutray.com` |

## Available Endpoints [#available-endpoints]

The API endpoints are organized by functionality:

* **Document Conversion** - Convert documents to structured data
* **Document Identification** - Automatically identify document types
* **Document Types** - Manage document type schemas
* **Knowledge Bases** - Manage knowledge bases for RAG operations
* **Steps Execution** - Execute workflow steps asynchronously

## Response Format [#response-format]

All API responses follow a consistent JSON format:

```json
{
  "success": true,
  "data": { ... }
}
```

Error responses include details about the failure:

```json
{
  "success": false,
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Invalid document type"
  }
}
```
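
A client can turn this envelope into either a return value or an exception. The sketch below assumes only the `success`, `data`, and `error` keys shown above; `DocuTrayAPIError` and `unwrap` are hypothetical names for this example and are not part of any SDK.

```python
from typing import Any, Dict


class DocuTrayAPIError(Exception):
    """Hypothetical exception type for this example; not part of any SDK."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def unwrap(payload: Dict[str, Any]) -> Any:
    """Return payload["data"] on success, otherwise raise DocuTrayAPIError."""
    if payload.get("success"):
        return payload.get("data")
    error = payload.get("error") or {}
    raise DocuTrayAPIError(
        error.get("code", "UNKNOWN"),
        error.get("message", "No error message provided"),
    )
```

Callers can then branch on `exc.code` (for example `VALIDATION_ERROR`) instead of inspecting raw JSON.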

## Rate Limits [#rate-limits]

API requests are rate-limited based on your subscription plan. Contact support for custom limits.


---

# CLI (https://docs.docutray.com/docs/cli)



<InstallBar
  pkg="@docutray/cli"
  version="beta"
  commands="[
  { command: 'npm install -g @docutray/cli' },
]"
/>

`@docutray/cli` is the official command-line interface for DocuTray. Convert
documents, manage API keys, batch-process folders and tail webhook events
from your terminal or shell scripts — no runtime, no glue code.

> **Status:** Beta. The command surface is stable; flag names may still
> change between minor versions.

## Why a CLI? [#why-a-cli]

The CLI is the right surface when you:

* Need to **batch-convert a folder** of scans without writing code.
* Want a **CI step** that converts a file and asserts on extracted fields.
* Are **prototyping** and don't want to set up an SDK yet.
* Need to **stream JSONL** straight into a data warehouse (BigQuery, Snowflake, DuckDB).

For everything else, reach for the [Node SDK](/docs/node-sdk) or
[Python SDK](/docs/python-sdk).

## Install [#install]

<CodeBlockTabs defaultValue="npm">
  <CodeBlockTabsList>
    <CodeBlockTabsTrigger value="npm">
      npm
    </CodeBlockTabsTrigger>

    <CodeBlockTabsTrigger value="binary">
      binary
    </CodeBlockTabsTrigger>
  </CodeBlockTabsList>

  <CodeBlockTab value="npm">
    ```bash
    npm install -g @docutray/cli
    ```
  </CodeBlockTab>

  <CodeBlockTab value="binary">
    ```bash
    # Linux / macOS
    curl -fsSL https://app.docutray.com/install.sh | sh
    ```
  </CodeBlockTab>
</CodeBlockTabs>

Verify:

```bash
docutray --version
```

## Quick start [#quick-start]

```bash
# 1. Authenticate (opens browser, stores token in ~/.docutray/config.json)
docutray auth login

# 2. Convert a single document
docutray convert ./invoice.pdf --type invoice --out ./out.json

# 3. Batch-convert a folder, writing JSONL to stdout (redirected to data.jsonl here)
docutray convert ./invoices/*.pdf --type invoice --jsonl > data.jsonl
```
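
The JSONL file from step 3 can be consumed line by line. The sketch below assumes each non-empty line is one JSON object, which is the usual JSONL convention; the exact fields the CLI writes per line are not documented here, and `read_jsonl` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)
```

Passing an open file handle for `data.jsonl` (opened with `encoding="utf-8"`) to `read_jsonl` yields records lazily, so large batches do not need to fit in memory.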

## Available commands [#available-commands]

| Command                      | What it does                     |
| ---------------------------- | -------------------------------- |
| `docutray auth login`        | Authenticate with browser flow   |
| `docutray auth logout`       | Clear stored credentials         |
| `docutray auth status`       | Print active org and key         |
| `docutray convert <file>`    | Convert one or many files        |
| `docutray types list`        | List supported document types    |
| `docutray types show <code>` | Print schema for a document type |
| `docutray webhooks tail`     | Stream webhook events live       |
| `docutray config`            | Inspect / edit config            |

See the [commands reference](/docs/cli/commands/convert) for full flag listings.

## Source [#source]

The CLI is open source: [github.com/docutray/docutray-cli](https://github.com/docutray/docutray-cli)


---

# 8-Column Balance Sheet (https://docs.docutray.com/docs/document-types/balance_ocho_columnas)



8-column balance sheet with company identification, period, and accounts with their values. This document type processes accounting balance sheets and extracts structured information from them.

**Document type code:** `balance_ocho_columnas`

## Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "empresa": "Example Company S.A.",
      "año": "2023",
      "periodo": "January - December",
      "contador": "John Pérez CPA",
      "cuenta": [
        {
          "codigo": "1101",
          "nombre": "Cash",
          "debitos": 1000000,
          "creditos": 500000,
          "deudor": 500000,
          "acreedor": 0,
          "activo": 500000,
          "pasivo": 0,
          "perdida": 0,
          "ganancia": 0
        },
        {
          "codigo": "2101",
          "nombre": "Suppliers",
          "debitos": 200000,
          "creditos": 800000,
          "deudor": 0,
          "acreedor": 600000,
          "activo": 0,
          "pasivo": 600000,
          "perdida": 0,
          "ganancia": 0
        }
      ]
    }
  }
}
```

## Main fields [#main-fields]

| Field      | Type   | Description                        |
| ---------- | ------ | ---------------------------------- |
| `empresa`  | String | Company name                       |
| `año`      | String | Balance sheet year                 |
| `periodo`  | String | Balance sheet period               |
| `contador` | String | Name of the responsible accountant |

## Account fields [#account-fields]

| Field      | Type   | Description             |
| ---------- | ------ | ----------------------- |
| `codigo`   | String | Accounting account code |
| `nombre`   | String | Accounting account name |
| `debitos`  | Number | Debits amount           |
| `creditos` | Number | Credits amount          |
| `deudor`   | Number | Debtor balance          |
| `acreedor` | Number | Creditor balance        |
| `activo`   | Number | Asset value             |
| `pasivo`   | Number | Liability value         |
| `perdida`  | Number | Loss value              |
| `ganancia` | Number | Gain value              |

## Important considerations [#important-considerations]

* Each account contains the 8 characteristic columns of the balance sheet: debits, credits, debtor, creditor, asset, liability, loss, and gain
* Account codes follow the standard chart of accounts
* All amounts are expressed in local currency
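
The eight columns are related arithmetically, which allows a basic sanity check on extracted rows. The sketch below encodes the relations that hold in the sample response: the debtor or creditor balance is the difference between debits and credits, the debtor balance is carried to asset or loss, and the creditor balance to liability or gain. These are assumptions drawn from that sample and from common accounting practice, not rules that DocuTray is documented to enforce, and `check_account` is a hypothetical helper.

```python
from typing import Any, Dict, List


def check_account(account: Dict[str, Any]) -> List[str]:
    """Return a list of inconsistencies found in one account row."""
    problems = []
    net = account["debitos"] - account["creditos"]
    # A positive net is a debtor balance; a negative net is a creditor balance.
    if account["deudor"] != max(net, 0):
        problems.append("deudor does not match debitos - creditos")
    if account["acreedor"] != max(-net, 0):
        problems.append("acreedor does not match creditos - debitos")
    # The debtor balance is carried to asset or loss, the creditor balance
    # to liability or gain.
    if account["deudor"] != account["activo"] + account["perdida"]:
        problems.append("deudor does not match activo + perdida")
    if account["acreedor"] != account["pasivo"] + account["ganancia"]:
        problems.append("acreedor does not match pasivo + ganancia")
    return problems
```

An empty list means the row is internally consistent; a non-empty list is a signal to review the extraction rather than proof of an error.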


---

# Bill of Lading (https://docs.docutray.com/docs/document-types/bl)



Bill of Lading for international maritime transport with shipment details, ports, and cargo information.

**Document type code:** `bl`

## Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "bl_number": "MAEU123456789",
      "shipper": "Global Export Corp",
      "consignee": "International Import Ltd",
      "notify_party": "Local Agent S.A.",
      "vessel": "ATLANTIC QUEEN",
      "voyage": "2023-045N",
      "port_of_loading": "Hamburg, Germany",
      "port_of_discharge": "Valparaíso, Chile",
      "place_of_delivery": "Santiago, Chile",
      "date_of_issue": "2023-11-15T00:00:00Z",
      "freight_payment": "PREPAID",
      "container_details": [
        {
          "container_number": "MAEU987654321",
          "seal_number": "SL123456",
          "type_size": "40'HC",
          "weight": "28500 KGS",
          "packages": 1200,
          "description": "MACHINERY PARTS"
        }
      ]
    }
  }
}
```

## Main fields [#main-fields]

| Field               | Type               | Description                             |
| ------------------- | ------------------ | --------------------------------------- |
| `bl_number`         | String             | Bill of Lading identification number    |
| `shipper`           | String             | Shipper/exporter name                   |
| `consignee`         | String             | Consignee/importer name                 |
| `notify_party`      | String             | Party to be notified upon arrival       |
| `vessel`            | String             | Vessel name                             |
| `voyage`            | String             | Voyage number                           |
| `port_of_loading`   | String             | Port where cargo was loaded             |
| `port_of_discharge` | String             | Port where cargo will be discharged     |
| `place_of_delivery` | String             | Final delivery location                 |
| `date_of_issue`     | String (date-time) | Bill of Lading issue date               |
| `freight_payment`   | String             | Freight payment terms (PREPAID/COLLECT) |

## Container details fields [#container-details-fields]

| Field              | Type   | Description                     |
| ------------------ | ------ | ------------------------------- |
| `container_number` | String | Container identification number |
| `seal_number`      | String | Container seal number           |
| `type_size`        | String | Container type and size         |
| `weight`           | String | Container weight                |
| `packages`         | Number | Number of packages              |
| `description`      | String | Cargo description               |

## Important considerations [#important-considerations]

* It is an official maritime transport document
* Essential for international cargo clearance
* Contains detailed information about containers and cargo
* Used for customs procedures and cargo tracking
* Bill of Lading number is unique for tracking purposes
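
Because `weight` is returned as a string, downstream code usually needs to split it into a number and a unit. The sketch below handles only the layout seen in the sample response (`"28500 KGS"`); other layouts, such as thousands separators, are an open assumption and are rejected. `parse_weight` is a hypothetical helper.

```python
import re
from typing import Tuple

# Matches a plain number followed by a unit, e.g. "28500 KGS".
_WEIGHT_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*$")


def parse_weight(raw: str) -> Tuple[float, str]:
    """Split a weight string into (value, unit); raise ValueError otherwise."""
    match = _WEIGHT_RE.match(raw)
    if match is None:
        raise ValueError(f"Unrecognized weight format: {raw!r}")
    return float(match.group(1)), match.group(2).upper()
```

Raising on unknown layouts, rather than guessing, keeps malformed weights from silently entering customs or tracking systems.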


---

# Professional Fee Receipt (https://docs.docutray.com/docs/document-types/boleta_honorarios)



Professional fee receipt (boleta de honorarios) with professional services details, client information, and tax calculations.

**Document type code:** `boleta_honorarios`

## Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "numero_boleta": 145,
      "fecha_emision": "2023-11-20T00:00:00Z",
      "rut_profesional": "12.345.678-9",
      "nombre_profesional": "María González López",
      "rut_cliente": "98.765.432-1",
      "nombre_cliente": "Tech Solutions S.A.",
      "descripcion_servicios": "Consulting services for software development project - November 2023",
      "honorarios_brutos": 850000,
      "retencion_impuesto": 106250,
      "honorarios_liquidos": 743750,
      "periodo_servicios": "November 2023",
      "direccion_profesional": "Av. Providencia 1234, Santiago",
      "actividad_economica": "Software Consulting"
    }
  }
}
```

## Main fields [#main-fields]

| Field                   | Type               | Description                            |
| ----------------------- | ------------------ | -------------------------------------- |
| `numero_boleta`         | Number             | Receipt number                         |
| `fecha_emision`         | String (date-time) | Receipt issue date                     |
| `rut_profesional`       | String             | Professional's RUT (tax ID)            |
| `nombre_profesional`    | String             | Professional's full name               |
| `rut_cliente`           | String             | Client's RUT (tax ID)                  |
| `nombre_cliente`        | String             | Client's name or company name          |
| `descripcion_servicios` | String             | Description of services provided       |
| `honorarios_brutos`     | Number             | Gross fees before tax retention        |
| `retencion_impuesto`    | Number             | Tax retention amount (typically 12.5%) |
| `honorarios_liquidos`   | Number             | Net fees after tax retention           |
| `periodo_servicios`     | String             | Period when services were provided     |
| `direccion_profesional` | String             | Professional's address                 |
| `actividad_economica`   | String             | Economic activity or professional area |

## Important considerations [#important-considerations]

* It is an official tax document for professional services in Chile
* Tax retention is typically 12.5% of gross fees
* The receipt number must be sequential and unique per professional
* RUT format must be valid Chilean tax identification
* Used for income tax declarations by both professional and client
* Net fees = Gross fees - Tax retention
* Essential document for tax compliance for independent professionals
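
The net-fee relation above can be verified on extracted values. In the sample response, 850000 − 106250 = 743750, and 106250 / 850000 = 0.125. The following sketch runs the same arithmetic; `check_fees` is a hypothetical helper, and the retention rate is reported rather than enforced because the 12.5% figure is described only as typical.

```python
from typing import Any, Dict


def check_fees(gross: float, retention: float, net: float) -> Dict[str, Any]:
    """Check net = gross - retention and report the effective retention rate."""
    rate = retention / gross if gross else None
    return {
        "net_matches": gross - retention == net,
        "retention_rate": rate,
    }
```

A `net_matches` value of `False` suggests that one of the three amounts was misread and the receipt should be reviewed.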


---

# Current Account Statement (https://docs.docutray.com/docs/document-types/cartola_cc)



Current Account Statement with transaction details.

**Document type code:** `cartola_cc`

## Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "titular": "John Doe",
      "fecha_desde": "2023-11-01T00:00:00Z",
      "fecha_hasta": "2023-11-30T23:59:59Z",
      "transacciones": [
        {
          "fecha": "2023-11-15T10:30:00Z",
          "descripcion": "TRANSFER RECEIVED",
          "sucursal": "001 DOWNTOWN BRANCH",
          "numero_documento": "TRF123456",
          "tipo": "abono",
          "monto": 500000
        },
        {
          "fecha": "2023-11-20T14:45:00Z",
          "descripcion": "UTILITY BILL PAYMENT",
          "sucursal": "002 UPTOWN BRANCH",
          "numero_documento": "PSB789012",
          "tipo": "cargo",
          "monto": 35000
        },
        {
          "fecha": "2023-11-25T09:15:00Z",
          "descripcion": "ATM WITHDRAWAL",
          "sucursal": "ATM SHOPPING MALL",
          "numero_documento": "GCA345678",
          "tipo": "cargo",
          "monto": 100000
        },
        {
          "fecha": "2023-11-28T16:20:00Z",
          "descripcion": "CASH DEPOSIT",
          "sucursal": "003 BUSINESS DISTRICT",
          "numero_documento": "DEP901234",
          "tipo": "abono",
          "monto": 250000
        }
      ]
    }
  }
}
```

## Main fields [#main-fields]

| Field         | Type               | Description                  |
| ------------- | ------------------ | ---------------------------- |
| `titular`     | String             | Account holder name          |
| `n_cuenta`    | String             | Account number or identifier |
| `fecha_desde` | String (date-time) | Statement start date         |
| `fecha_hasta` | String (date-time) | Statement end date           |

## Transaction fields [#transaction-fields]

| Field              | Type               | Description                                                                                                                                                          |
| ------------------ | ------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `fecha`            | String (date-time) | Transaction date                                                                                                                                                     |
| `descripcion`      | String             | Transaction description                                                                                                                                              |
| `sucursal`         | String             | Transaction branch. May be absent from the statement                                                                                                                 |
| `numero_documento` | String             | Transaction document number. May be absent from the statement                                                                                                        |
| `tipo`             | String (enum)      | Transaction type: `cargo` (charge) or `abono` (credit)                                                                                                               |
| `monto`            | Number             | Transaction amount, usually found in the Charges or Credits column according to the transaction type. Not to be confused with the Balance or Daily Balance column   |

## Transaction types [#transaction-types]

* **cargo**: Represents a debit or money outflow from the account
* **abono**: Represents a credit or money inflow to the account

## Important considerations [#important-considerations]

* This is a bank current account statement with complete transaction information
* Each transaction is either a charge (debit) or a credit (deposit)
* Verify the statement period dates to confirm the correct timeframe
* The `monto` field holds the actual transaction value, not the account balance
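
Since `monto` is always the transaction value and `tipo` carries the direction, totals per type are a simple sum. The sketch below uses only the `tipo` and `monto` fields described above; `summarize` and its output keys are hypothetical names for this example.

```python
from typing import Any, Dict, List


def summarize(transactions: List[Dict[str, Any]]) -> Dict[str, float]:
    """Total credits (abono), total charges (cargo), and the net movement."""
    credits = sum(t["monto"] for t in transactions if t["tipo"] == "abono")
    charges = sum(t["monto"] for t in transactions if t["tipo"] == "cargo")
    return {"abonos": credits, "cargos": charges, "neto": credits - charges}
```

For the four sample transactions this gives 750000 in credits, 135000 in charges, and a net movement of 615000.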


---

# Credit Card Statement (https://docs.docutray.com/docs/document-types/cartola_tc)



Credit Card Statement with credit limits, billed amounts, and transaction details.

**Document type code:** `cartola_tc`

## Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "titular": "Juan Pérez",
      "numero_tarjeta": "XXXX-XXXX-XXXX-1234",
      "fecha_estado_cuenta": "2023-12-01T00:00:00Z",
      "monto_total_facturado": 125000,
      "tipo_cartola": "nacional",
      "moneda": "CLP",
      "cupo_disponible": 1500000,
      "cupo_utilizado": 500000,
      "cupo_total": 2000000,
      "saldo_periodo_anterior": 75000,
      "transacciones": [
        {
          "fecha": "2023-11-15T00:00:00Z",
          "descripcion": "SUPERMERCADO XYZ",
          "monto_mensual": 45000,
          "compra_en_cuotas": true,
          "numero_cuota": 2,
          "total_cuotas": 6,
          "monto_total": 270000
        },
        {
          "fecha": "2023-11-20T00:00:00Z",
          "descripcion": "FARMACIA ABC",
          "monto_mensual": 15000,
          "compra_en_cuotas": false,
          "numero_cuota": null,
          "total_cuotas": null,
          "monto_total": null
        },
        {
          "fecha": "2023-11-25T00:00:00Z",
          "descripcion": "PAGO POR INTERNET",
          "monto_mensual": -75000,
          "compra_en_cuotas": false,
          "numero_cuota": null,
          "total_cuotas": null,
          "monto_total": null
        }
      ]
    }
  }
}
```

## Main fields [#main-fields]

| Field                                    | Type               | Description                                                                                                                                                                                                                                    |
| ---------------------------------------- | ------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `titular`                                | String             | Cardholder's name                                                                                                                                                                                                                              |
| `numero_tarjeta`                         | String             | Masked card number                                                                                                                                                                                                                             |
| `fecha_estado_cuenta`                    | String (date-time) | Statement date                                                                                                                                                                                                                                 |
| `monto_total_facturado`                  | Number             | Total billed amount                                                                                                                                                                                                                            |
| `tipo_cartola`                           | String (enum)      | Whether the statement is national or international                                                                                                                                                                                             |
| `moneda`                                 | String (enum)      | CLP for national statements, USD for international ones                                                                                                                                                                                        |
| `cupo_disponible`                        | Number             | Available credit limit                                                                                                                                                                                                                         |
| `cupo_utilizado`                         | Number             | Used credit limit                                                                                                                                                                                                                              |
| `cupo_total`                             | Number             | Total credit limit                                                                                                                                                                                                                             |
| `saldo_periodo_anterior`                 | Number (nullable)  | Final owed balance from the previous period. Statements may label it as previous period balance, previous billed balance, or similar. Where a statement also shows a starting owed balance for the previous period, this field holds the final one |
| `monto_total_facturado_periodo_anterior` | Number             | Total billed amount from previous period                                                                                                                                                                                                       |

## Transaction fields [#transaction-fields]

| Field              | Type               | Description                                                                    |
| ------------------ | ------------------ | ------------------------------------------------------------------------------ |
| `fecha`            | String (date-time) | Transaction date                                                               |
| `descripcion`      | String             | Transaction description                                                        |
| `monto_mensual`    | Number             | Monthly amount to pay for the transaction                                      |
| `compra_en_cuotas` | Boolean            | Indicates if it's an installment purchase                                      |
| `numero_cuota`     | Number (nullable)  | Current installment number to pay in the month, only for installment purchases |
| `total_cuotas`     | Number (nullable)  | Total number of installments, only for installment purchases                   |
| `monto_total`      | Number (nullable)  | Total transaction amount, only for installment purchases                       |

## Important considerations [#important-considerations]

* The transaction list may include entries such as PAID AMOUNT or INTERNET PAYMENT; these have negative amounts and must still be included
* National statements use CLP and international statements use USD
* Installment transactions show the detail of each installment and the original total amount
* Verify the statement date to confirm that the data belongs to the correct period
* Credit card statement with complete information about limits and transactions
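
As a minimal sketch of how the transaction fields might be consumed, the snippet below separates charges from payments (the negative entries mentioned above) and counts the installments still pending. The sample values and the `summarize` helper are illustrative assumptions, not part of the DocuTray API; only the field names come from the table above.

```python
from typing import Any

# Illustrative transactions; the values are made up for this sketch.
transactions: list[dict[str, Any]] = [
    {"descripcion": "SUPERMARKET", "monto_mensual": 45000,
     "compra_en_cuotas": False, "numero_cuota": None, "total_cuotas": None},
    {"descripcion": "ELECTRONICS STORE", "monto_mensual": 50000,
     "compra_en_cuotas": True, "numero_cuota": 2, "total_cuotas": 6},
    {"descripcion": "INTERNET PAYMENT", "monto_mensual": -120000,
     "compra_en_cuotas": False, "numero_cuota": None, "total_cuotas": None},
]


def summarize(items: list[dict[str, Any]]) -> dict[str, int]:
    """Split charges from payments and count installments still to be billed."""
    charges = sum(t["monto_mensual"] for t in items if t["monto_mensual"] > 0)
    # Payments appear as negative amounts and are kept in the list.
    payments = sum(t["monto_mensual"] for t in items if t["monto_mensual"] < 0)
    remaining = sum(
        t["total_cuotas"] - t["numero_cuota"]
        for t in items
        # Installment fields are nullable, so guard against None.
        if t["compra_en_cuotas"]
        and t["total_cuotas"] is not None
        and t["numero_cuota"] is not None
    )
    return {
        "charges": charges,
        "payments": payments,
        "net": charges + payments,
        "remaining_installments": remaining,
    }


summary = summarize(transactions)
print(summary)
```

Keeping the negative entries in the same list means the net movement for the period can be computed with a plain sum.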


---

# AFP Contributions Certificate (https://docs.docutray.com/docs/document-types/cotizaciones_afp)



Certificate of AFP (pension fund) contributions with contribution history and employer information.

**Document type code:** `cotizaciones_afp`

## Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "rut_afiliado": "12.345.678-9",
      "nombre_afiliado": "Juan Carlos Pérez",
      "afp": "Habitat",
      "fecha_emision": "2023-11-15T00:00:00Z",
      "periodo_consultado": "January 2023 - October 2023",
      "cotizaciones": [
        {
          "mes": "2023-10",
          "empleador": "Tech Solutions S.A.",
          "rut_empleador": "98.765.432-1",
          "remuneracion": 1200000,
          "cotizacion_obligatoria": 120000,
          "cotizacion_voluntaria": 50000,
          "seguro_cesantia": 36000,
          "estado": "PAGADO"
        },
        {
          "mes": "2023-09",
          "empleador": "Tech Solutions S.A.",
          "rut_empleador": "98.765.432-1",
          "remuneracion": 1200000,
          "cotizacion_obligatoria": 120000,
          "cotizacion_voluntaria": 50000,
          "seguro_cesantia": 36000,
          "estado": "PAGADO"
        }
      ]
    }
  }
}
```

## Main fields [#main-fields]

| Field                | Type               | Description                           |
| -------------------- | ------------------ | ------------------------------------- |
| `rut_afiliado`       | String             | Member's RUT (tax ID)                 |
| `nombre_afiliado`    | String             | Member's full name                    |
| `afp`                | String             | AFP name (pension fund administrator) |
| `fecha_emision`      | String (date-time) | Certificate issue date                |
| `periodo_consultado` | String             | Period covered by the certificate     |

## Contribution fields [#contribution-fields]

| Field                    | Type   | Description                                    |
| ------------------------ | ------ | ---------------------------------------------- |
| `mes`                    | String | Contribution month (YYYY-MM format)            |
| `empleador`              | String | Employer's name or company                     |
| `rut_empleador`          | String | Employer's RUT (tax ID)                        |
| `remuneracion`           | Number | Monthly salary or wage                         |
| `cotizacion_obligatoria` | Number | Mandatory pension contribution (typically 10%) |
| `cotizacion_voluntaria`  | Number | Voluntary additional contribution              |
| `seguro_cesantia`        | Number | Unemployment insurance contribution            |
| `estado`                 | String | Payment status (paid, pending, or overdue); the example above shows `PAGADO` |

## Important considerations [#important-considerations]

* Official document from Chilean pension system (AFP)
* Shows contribution history for tax and benefit purposes
* Mandatory contributions are typically 10% of salary
* Used for pension calculations and employment verification
* Essential for retirement planning and loan applications
* Payment status indicates employer compliance with contributions
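
The considerations above can be turned into a quick check on the extracted data. This is a sketch under assumptions: the helper names are hypothetical, and treating any status other than `PAGADO` (the value shown in the example response) as unpaid is an assumption about the possible values.

```python
# Contribution entries reduced to the fields used here, taken from the example response.
contributions = [
    {"mes": "2023-10", "remuneracion": 1200000,
     "cotizacion_obligatoria": 120000, "estado": "PAGADO"},
    {"mes": "2023-09", "remuneracion": 1200000,
     "cotizacion_obligatoria": 120000, "estado": "PAGADO"},
]


def mandatory_rate(entry: dict) -> float:
    """Mandatory contribution as a fraction of the monthly salary."""
    return entry["cotizacion_obligatoria"] / entry["remuneracion"]


def unpaid_months(entries: list[dict], paid_status: str = "PAGADO") -> list[str]:
    """Months whose status differs from the assumed 'paid' value."""
    return [e["mes"] for e in entries if e["estado"] != paid_status]


rates = {e["mes"]: round(mandatory_rate(e), 4) for e in contributions}
print(rates)
print(unpaid_months(contributions))
```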


---

# Curriculum Vitae (https://docs.docutray.com/docs/document-types/cv)



Extracts all data from a CV, including the personal description, education, and work experience.

**Document type code:** `cv`

## Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "nombre": "Ana María Torres",
      "telefono": "+56 9 8765 4321",
      "correo_electronico": "ana.torres@email.com",
      "descripcion": "Systems Engineer with 8 years of experience in software development and technology project management.",
      "educacion": [
        {
          "institucion": "University of Chile",
          "titulo": "Systems Engineering",
          "ano_ingreso": 2010,
          "ano_salida": 2015,
          "ubicacion": "Santiago, Chile"
        },
        {
          "institucion": "AIEP Professional Institute",
          "titulo": "Programming Technician",
          "ano_ingreso": 2008,
          "ano_salida": 2010,
          "ubicacion": "Santiago, Chile"
        }
      ],
      "experiencia_laboral": [
        {
          "empresa": "TechSolutions S.A.",
          "cargo": "Senior Developer",
          "ano_ingreso": 2020,
          "ano_salida": 2023,
          "descripcion": "Team leadership, implementation of scalable architectures, and mentoring junior developers.",
          "ubicacion": "Santiago, Chile"
        },
        {
          "empresa": "Innovate Corp",
          "cargo": "Full Stack Developer",
          "ano_ingreso": 2017,
          "ano_salida": 2020,
          "descripcion": "Web application development using React, Node.js, and PostgreSQL. Participation in digital transformation projects.",
          "ubicacion": "Valparaíso, Chile"
        }
      ]
    }
  }
}
```

## Main fields [#main-fields]

| Field                | Type   | Description                                  |
| -------------------- | ------ | -------------------------------------------- |
| `nombre`             | String | Person's full name                           |
| `telefono`           | String | Contact phone number                         |
| `correo_electronico` | String | Email address                                |
| `descripcion`        | String | Professional summary or personal description |

## Education fields [#education-fields]

| Field         | Type   | Description                             |
| ------------- | ------ | --------------------------------------- |
| `institucion` | String | Educational institution name            |
| `titulo`      | String | Degree or title obtained                |
| `ano_ingreso` | Number | Year of entry to the institution        |
| `ano_salida`  | Number | Year of graduation from the institution |
| `ubicacion`   | String | Institution location                    |

## Work experience fields [#work-experience-fields]

| Field         | Type   | Description                                      |
| ------------- | ------ | ------------------------------------------------ |
| `empresa`     | String | Company or employer name                         |
| `cargo`       | String | Job title or position held                       |
| `ano_ingreso` | Number | Year started in the position                     |
| `ano_salida`  | Number | Year ended in the position                       |
| `descripcion` | String | Description of responsibilities and achievements |
| `ubicacion`   | String | Work location                                    |

## Important considerations [#important-considerations]

* Education and experience arrays are generally ordered by relevance or chronology
* Years can be in full format (YYYY) or abbreviated according to the original document
* Personal description may include skills, professional objectives, or career summary
* Useful for recruitment processes and professional profile analysis
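
Because the array order is not guaranteed, a consumer may want to sort entries explicitly. The sketch below orders work experience from newest to oldest and computes the years spent in each role. It assumes four-digit numeric years, and the helper names are hypothetical.

```python
# Work experience entries reduced to the fields used here, from the example response.
experience = [
    {"empresa": "Innovate Corp", "cargo": "Full Stack Developer",
     "ano_ingreso": 2017, "ano_salida": 2020},
    {"empresa": "TechSolutions S.A.", "cargo": "Senior Developer",
     "ano_ingreso": 2020, "ano_salida": 2023},
]


def newest_first(entries: list[dict]) -> list[dict]:
    """Return a new list sorted by start year, most recent first."""
    return sorted(entries, key=lambda e: e["ano_ingreso"], reverse=True)


def years_in_role(entry: dict) -> int:
    """Whole years between entry and exit, assuming four-digit years."""
    return entry["ano_salida"] - entry["ano_ingreso"]


for job in newest_first(experience):
    print(job["empresa"], job["cargo"], years_in_role(job))
```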


---

# Electronic Invoice (https://docs.docutray.com/docs/document-types/factura)



Electronic Invoice from SII (Chile) with issuer, recipient, and product/service details.

**Document type code:** `factura`

## Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "folio": 123456,
      "fecha_emision": "2023-11-15T00:00:00Z",
      "rut_emisor": "98.765.432-1",
      "nombre_emisor": "Commercial Company S.A.",
      "rut_receptor": "12.345.678-9",
      "nombre_receptor": "Juan Pérez González",
      "detalle": [
        {
          "descripcion": "HP Pavilion Notebook",
          "unidad": "Unit",
          "cantidad": 2,
          "precio_unitario": 450000,
          "precio_total": 900000
        },
        {
          "descripcion": "Wireless Mouse",
          "unidad": "Unit",
          "cantidad": 2,
          "precio_unitario": 25000,
          "precio_total": 50000
        }
      ],
      "total_neto": 950000,
      "iva": 180500,
      "total": 1130500
    }
  }
}
```

## Main fields [#main-fields]

| Field             | Type               | Description                      |
| ----------------- | ------------------ | -------------------------------- |
| `folio`           | Number             | Invoice folio number             |
| `fecha_emision`   | String (date-time) | Invoice issue date               |
| `rut_emisor`      | String             | Issuer's RUT (tax ID)            |
| `nombre_emisor`   | String             | Issuer's name or company name    |
| `rut_receptor`    | String             | Recipient's RUT (tax ID)         |
| `nombre_receptor` | String             | Recipient's name or company name |
| `total_neto`      | Number             | Net total before taxes           |
| `iva`             | Number             | VAT amount                       |
| `total`           | Number             | Final total including taxes      |

## Detail fields [#detail-fields]

| Field             | Type   | Description                    |
| ----------------- | ------ | ------------------------------ |
| `descripcion`     | String | Product or service description |
| `unidad`          | String | Unit of measurement            |
| `cantidad`        | Number | Quantity of products/services  |
| `precio_unitario` | Number | Price per unit                 |
| `precio_total`    | Number | Total price for the line       |

## Important considerations [#important-considerations]

* It is an official tax document from Chile's SII
* The folio is unique for each issuer
* VAT is calculated on the net total according to the current rate
* RUT must be in valid Chilean format
* Each detail line represents a billed product or service
* The sum of all `precio_total` from detail should match `total_neto`
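
The arithmetic rules above can be checked after extraction. The following is a minimal sketch, not part of the DocuTray API: the `check_invoice` helper is hypothetical, and the 19% default VAT rate is an assumption inferred from the example values (180500 / 950000).

```python
# Amount fields from the example response above.
factura = {
    "detalle": [
        {"cantidad": 2, "precio_unitario": 450000, "precio_total": 900000},
        {"cantidad": 2, "precio_unitario": 25000, "precio_total": 50000},
    ],
    "total_neto": 950000,
    "iva": 180500,
    "total": 1130500,
}


def check_invoice(data: dict, vat_rate: float = 0.19) -> dict[str, bool]:
    """Compare extracted totals against values recomputed from the detail lines."""
    line_sum = sum(line["precio_total"] for line in data["detalle"])
    # Round to whole pesos before comparing with the extracted VAT.
    expected_vat = round(data["total_neto"] * vat_rate)
    return {
        "lines_match_net": line_sum == data["total_neto"],
        "vat_matches_rate": expected_vat == data["iva"],
        "total_matches": data["total_neto"] + data["iva"] == data["total"],
    }


result = check_invoice(factura)
print(result)
```

A failed check does not necessarily mean the extraction is wrong; the source document itself may contain discounts or exempt items, so mismatches are best routed to manual review.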


---

# Document Types (https://docs.docutray.com/docs/document-types)



A **document type** tells DocuTray what to extract from a document and how to
shape the result. Each type defines a JSON schema, a stable API code (the
`document_type_code` you pass to the [convert](/docs/operations/convert)
endpoint), and the extraction hints that guide the OCR pipeline. When you
convert a file against a type, you always get back the same structured fields —
no matter the layout of the underlying scan.

The built-in types below cover common Chilean and international business
documents, grouped by domain. Each page documents the type's API code, its
response structure, and the individual fields it returns. Need something that
isn't listed? You can define your own with the
[Create a document type](/docs/guides/crear-tipo-documento) guide, or validate
extracted data against any schema with the
[Document Types operations](/docs/operations/document-types).

## Financial Documents [#financial-documents]

<Cards>
  <Card title="8-Column Balance Sheet" href="/docs/document-types/balance_ocho_columnas" />

  <Card title="Current Account Statement" href="/docs/document-types/cartola_cc" />

  <Card title="Credit Card Statement" href="/docs/document-types/cartola_tc" />

  <Card title="Promissory Note" href="/docs/document-types/pagare" />

  <Card title="Transbank Voucher" href="/docs/document-types/voucher_transbank" />
</Cards>

## Tax Documents [#tax-documents]

<Cards>
  <Card title="Professional Fee Receipt" href="/docs/document-types/boleta_honorarios" />

  <Card title="Electronic Invoice" href="/docs/document-types/factura" />

  <Card title="Invoice" href="/docs/document-types/invoice" />

  <Card title="Purchase Order" href="/docs/document-types/oc" />
</Cards>

## Labor Documents [#labor-documents]

<Cards>
  <Card title="AFP Contributions Certificate" href="/docs/document-types/cotizaciones_afp" />

  <Card title="Curriculum Vitae" href="/docs/document-types/cv" />

  <Card title="Payroll" href="/docs/document-types/liquidacion_sueldo" />
</Cards>

## Medical Documents [#medical-documents]

<Cards>
  <Card title="Medical Prescription" href="/docs/document-types/receta_medica" />
</Cards>

## International Commerce Documents [#international-commerce-documents]

<Cards>
  <Card title="Bill of Lading" href="/docs/document-types/bl" />
</Cards>


---

# Invoice (https://docs.docutray.com/docs/document-types/invoice)



International services invoice with currency, amount, date, and issuer and recipient information.

**Document type code:** `invoice`

## Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "moneda": "USD",
      "fecha_pago": "2023-12-15T00:00:00Z",
      "invoice_id": "INV-2023-001234",
      "monto_total": 1250.50,
      "fecha_emisión": "2023-12-01T00:00:00Z",
      "tax_id_emisor": "12345678-9",
      "tax_id_receptor": "98765432-1",
      "nombre_emisor": "ABC Company Inc.",
      "nombre_receptor": "XYZ Client Ltd."
    }
  }
}
```

## Main fields [#main-fields]

| Field             | Type               | Description                                                                       |
| ----------------- | ------------------ | --------------------------------------------------------------------------------- |
| `moneda`          | String             | Currency in which the invoice is charged, expressed as an ISO 4217 code           |
| `fecha_pago`      | String (date-time) | The date when payment was made, if available                                      |
| `invoice_id`      | String             | The code or identification number of the invoice                                  |
| `monto_total`     | Number             | Total amount of the Invoice                                                       |
| `fecha_emisión`   | String (date-time) | The date when the Invoice was issued                                              |
| `tax_id_emisor`   | String             | Tax ID, RUT or fiscal identifier of the Invoice issuer, if available              |
| `tax_id_receptor` | String             | Tax ID, RUT or fiscal identifier of the Invoice recipient, if available           |
| `nombre_emisor`   | String             | Name or business name of the Invoice issuer, if available                         |
| `nombre_receptor` | String             | Name or business name of the Invoice recipient, if available                      |

## Important considerations [#important-considerations]

* All listed fields are **required** for document processing
* Dates will be in ISO 8601 format (date-time)
* Currency will follow the ISO 4217 standard (e.g., USD, EUR, CLP)
* Amounts are numeric values without currency formatting
* It is an international billing document
* Used for international services and products
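
As an illustration of working with these formats, the sketch below parses the two ISO 8601 dates and checks that the currency has the three-letter shape of an ISO 4217 code. The `parse_iso` helper is hypothetical, and the regular expression checks only the shape of the code, not whether it is a registered currency.

```python
import re
from datetime import datetime

# Values taken from the example response above.
extracted = {
    "moneda": "USD",
    "fecha_pago": "2023-12-15T00:00:00Z",
    "fecha_emisión": "2023-12-01T00:00:00Z",
    "monto_total": 1250.50,
}


def parse_iso(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; the replace keeps 'Z' working before Python 3.11."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


issued = parse_iso(extracted["fecha_emisión"])
paid = parse_iso(extracted["fecha_pago"])
days_to_payment = (paid - issued).days

# Shape check only: three uppercase letters.
currency_ok = re.fullmatch(r"[A-Z]{3}", extracted["moneda"]) is not None

print(days_to_payment, currency_ok)
```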


---

# Payroll (https://docs.docutray.com/docs/document-types/liquidacion_sueldo)



Detailed payslip with salary information, deductions, bonuses, and net payment calculations.

**Document type code:** `liquidacion_sueldo`

## Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "empleado": "Carlos Mendoza Ruiz",
      "rut": "15.678.432-9",
      "cargo": "Software Developer",
      "empresa": "Innovate Tech S.A.",
      "periodo": "November 2023",
      "fecha_pago": "2023-11-30T00:00:00Z",
      "dias_trabajados": 22,
      "sueldo_base": 1200000,
      "haberes": [
        {
          "concepto": "Overtime Hours",
          "cantidad": 8,
          "valor_unitario": 15000,
          "total": 120000
        },
        {
          "concepto": "Performance Bonus",
          "cantidad": 1,
          "valor_unitario": 100000,
          "total": 100000
        }
      ],
      "descuentos": [
        {
          "concepto": "AFP Contribution",
          "porcentaje": 10,
          "total": 120000
        },
        {
          "concepto": "Health Insurance",
          "porcentaje": 7,
          "total": 84000
        },
        {
          "concepto": "Income Tax",
          "porcentaje": null,
          "total": 45000
        }
      ],
      "total_haberes": 1420000,
      "total_descuentos": 249000,
      "liquido_a_pagar": 1171000
    }
  }
}
```

## Main fields [#main-fields]

| Field              | Type               | Description                     |
| ------------------ | ------------------ | ------------------------------- |
| `empleado`         | String             | Employee's full name            |
| `rut`              | String             | Employee's RUT (tax ID)         |
| `cargo`            | String             | Job position or title           |
| `empresa`          | String             | Company name                    |
| `periodo`          | String             | Payroll period                  |
| `fecha_pago`       | String (date-time) | Payment date                    |
| `dias_trabajados`  | Number             | Days worked in the period       |
| `sueldo_base`      | Number             | Base salary                     |
| `total_haberes`    | Number             | Total earnings (base + bonuses) |
| `total_descuentos` | Number             | Total deductions                |
| `liquido_a_pagar`  | Number             | Net amount to be paid           |

## Earnings (haberes) fields [#earnings-haberes-fields]

| Field            | Type   | Description                              |
| ---------------- | ------ | ---------------------------------------- |
| `concepto`       | String | Earnings concept (overtime, bonus, etc.) |
| `cantidad`       | Number | Quantity (hours, units, etc.)            |
| `valor_unitario` | Number | Unit value                               |
| `total`          | Number | Total amount for this earning            |

## Deductions (descuentos) fields [#deductions-descuentos-fields]

| Field        | Type              | Description                                |
| ------------ | ----------------- | ------------------------------------------ |
| `concepto`   | String            | Deduction concept (AFP, health, tax, etc.) |
| `porcentaje` | Number (nullable) | Percentage applied, if applicable          |
| `total`      | Number            | Total deducted amount                      |

## Important considerations [#important-considerations]

* Official payroll document for employment in Chile
* Net pay = Total earnings - Total deductions
* AFP and health contributions are mandatory in Chile (typically 10% and 7%)
* Income tax varies based on salary brackets
* Days worked affect proportional salary calculations
* Essential document for employment verification and loan applications
* All amounts are in Chilean pesos (CLP)
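
The net pay rule above can be verified against the extracted totals. This is a sketch under the assumption, based on the "base + bonuses" description of `total_haberes`, that total earnings equal the base salary plus the sum of the `haberes` entries; the `check_payslip` helper is hypothetical.

```python
# Amount fields from the example response above.
payslip = {
    "sueldo_base": 1200000,
    "haberes": [{"total": 120000}, {"total": 100000}],
    "descuentos": [{"total": 120000}, {"total": 84000}, {"total": 45000}],
    "total_haberes": 1420000,
    "total_descuentos": 249000,
    "liquido_a_pagar": 1171000,
}


def check_payslip(data: dict) -> dict[str, bool]:
    """Recompute the totals and compare them with the extracted values."""
    earnings = data["sueldo_base"] + sum(h["total"] for h in data["haberes"])
    deductions = sum(d["total"] for d in data["descuentos"])
    return {
        "earnings_match": earnings == data["total_haberes"],
        "deductions_match": deductions == data["total_descuentos"],
        # Net pay = total earnings - total deductions
        "net_matches": data["total_haberes"] - data["total_descuentos"]
        == data["liquido_a_pagar"],
    }


payslip_check = check_payslip(payslip)
print(payslip_check)
```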


---

# Purchase Order (https://docs.docutray.com/docs/document-types/oc)



Corporate purchase order with supplier information, requested products/services, and delivery details.

**Document type code:** `oc`

## Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "numero_orden": "OC-2023-001245",
      "fecha_emision": "2023-11-10T00:00:00Z",
      "empresa_compradora": "Tech Solutions S.A.",
      "rut_comprador": "98.765.432-1",
      "proveedor": "Office Supplies Corp",
      "rut_proveedor": "12.345.678-9",
      "contacto_comprador": "María González - Procurement",
      "telefono_comprador": "+56 2 2345 6789",
      "direccion_entrega": "Av. Providencia 1234, Santiago, Chile",
      "fecha_entrega_solicitada": "2023-11-20T00:00:00Z",
      "detalle": [
        {
          "codigo_producto": "LAP001",
          "descripcion": "Business Laptop HP ProBook 450",
          "cantidad": 5,
          "precio_unitario": 650000,
          "precio_total": 3250000
        },
        {
          "codigo_producto": "MOU002",
          "descripcion": "Wireless Mouse Logitech MX Master 3",
          "cantidad": 5,
          "precio_unitario": 85000,
          "precio_total": 425000
        }
      ],
      "subtotal": 3675000,
      "iva": 698250,
      "total": 4373250,
      "condiciones_pago": "30 days net",
      "observaciones": "Delivery required during business hours. Contact procurement department upon arrival."
    }
  }
}
```

## Main fields [#main-fields]

| Field                      | Type               | Description                             |
| -------------------------- | ------------------ | --------------------------------------- |
| `numero_orden`             | String             | Purchase order number                   |
| `fecha_emision`            | String (date-time) | Purchase order issue date               |
| `empresa_compradora`       | String             | Purchasing company name                 |
| `rut_comprador`            | String             | Purchaser's RUT (tax ID)                |
| `proveedor`                | String             | Supplier name                           |
| `rut_proveedor`            | String             | Supplier's RUT (tax ID)                 |
| `contacto_comprador`       | String             | Buyer contact person                    |
| `telefono_comprador`       | String             | Buyer contact phone                     |
| `direccion_entrega`        | String             | Delivery address                        |
| `fecha_entrega_solicitada` | String (date-time) | Requested delivery date                 |
| `subtotal`                 | Number             | Subtotal before taxes                   |
| `iva`                      | Number             | VAT amount                              |
| `total`                    | Number             | Total amount including taxes            |
| `condiciones_pago`         | String             | Payment terms                           |
| `observaciones`            | String             | Additional observations or instructions |

## Detail fields [#detail-fields]

| Field             | Type   | Description                    |
| ----------------- | ------ | ------------------------------ |
| `codigo_producto` | String | Product or service code        |
| `descripcion`     | String | Product or service description |
| `cantidad`        | Number | Requested quantity             |
| `precio_unitario` | Number | Unit price                     |
| `precio_total`    | Number | Total price for the line       |

## Important considerations [#important-considerations]

* It is a formal procurement document between companies
* Purchase order number is unique and used for tracking
* Serves as authorization for the supplier to deliver goods/services
* Essential for inventory control and accounts payable processes
* Payment terms define when payment is due after delivery
* Delivery address may differ from company's main address
* Total amount = Subtotal + VAT (typically 19% in Chile)
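
A consumer might validate the amounts before sending the order to accounts payable. The sketch below collects human-readable problems instead of returning flags. The `find_problems` helper is hypothetical, and the 19% rate is the typical Chilean rate mentioned above, so orders under a different tax regime would need another value.

```python
# Amount fields from the example response above.
order = {
    "detalle": [
        {"codigo_producto": "LAP001", "cantidad": 5,
         "precio_unitario": 650000, "precio_total": 3250000},
        {"codigo_producto": "MOU002", "cantidad": 5,
         "precio_unitario": 85000, "precio_total": 425000},
    ],
    "subtotal": 3675000,
    "iva": 698250,
    "total": 4373250,
}


def find_problems(data: dict, vat_rate: float = 0.19) -> list[str]:
    """Return a list of inconsistencies; an empty list means the amounts agree."""
    problems = []
    for line in data["detalle"]:
        if line["cantidad"] * line["precio_unitario"] != line["precio_total"]:
            problems.append(f"line {line['codigo_producto']}: quantity x unit price mismatch")
    if sum(line["precio_total"] for line in data["detalle"]) != data["subtotal"]:
        problems.append("subtotal does not equal the sum of the lines")
    if round(data["subtotal"] * vat_rate) != data["iva"]:
        problems.append("VAT does not match the assumed rate")
    if data["subtotal"] + data["iva"] != data["total"]:
        problems.append("total does not equal subtotal + VAT")
    return problems


print(find_problems(order))
```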


---

# Promissory Note (https://docs.docutray.com/docs/document-types/pagare)



Financial promissory note with debtor information, amount, payment terms, and maturity date.

**Document type code:** `pagare`

## Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "numero_pagare": "PN-2023-000789",
      "fecha_emision": "2023-11-01T00:00:00Z",
      "lugar_emision": "Santiago, Chile",
      "deudor": "Carlos Mendoza Ruiz",
      "rut_deudor": "15.678.432-9",
      "beneficiario": "Banco de Chile",
      "rut_beneficiario": "97.004.000-5",
      "monto": 5000000,
      "moneda": "CLP",
      "fecha_vencimiento": "2024-05-01T00:00:00Z",
      "tasa_interes": 2.5,
      "tipo_interes": "monthly",
      "forma_pago": "Monthly installments of CLP 450,000",
      "lugar_pago": "Any branch of Banco de Chile",
      "avalista": "María González López",
      "rut_avalista": "12.345.678-9",
      "clausulas_especiales": "In case of default, the debtor agrees to pay legal collection costs and attorney fees."
    }
  }
}
```

## Main fields [#main-fields]

| Field                  | Type               | Description                           |
| ---------------------- | ------------------ | ------------------------------------- |
| `numero_pagare`        | String             | Promissory note number                |
| `fecha_emision`        | String (date-time) | Issue date                            |
| `lugar_emision`        | String             | Place where the note was issued       |
| `deudor`               | String             | Debtor's full name                    |
| `rut_deudor`           | String             | Debtor's RUT (tax ID)                 |
| `beneficiario`         | String             | Beneficiary's name (creditor)         |
| `rut_beneficiario`     | String             | Beneficiary's RUT (tax ID)            |
| `monto`                | Number             | Principal amount                      |
| `moneda`               | String             | Currency (CLP, USD, etc.)             |
| `fecha_vencimiento`    | String (date-time) | Maturity date                         |
| `tasa_interes`         | Number             | Interest rate percentage              |
| `tipo_interes`         | String             | Interest type (monthly, annual, etc.) |
| `forma_pago`           | String             | Payment method description            |
| `lugar_pago`           | String             | Payment location                      |
| `avalista`             | String             | Guarantor's name (if applicable)      |
| `rut_avalista`         | String             | Guarantor's RUT (if applicable)       |
| `clausulas_especiales` | String             | Special clauses or conditions         |

## Important considerations [#important-considerations]

* It is a legally binding financial document
* The debtor commits to pay the specified amount by the maturity date
* Interest rate and payment terms must be clearly specified
* Guarantor provides additional security for the loan
* Used for personal and commercial loans
* Essential for legal collection processes if payment defaults occur
* Currency should be specified to avoid confusion in international transactions
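
As a small illustration of using the date fields, the sketch below computes the term of the note in days and calendar months, and flags whether a guarantor is present. The helper names are hypothetical, and no interest is calculated because the compounding convention is not stated in the document.

```python
from datetime import datetime

# Fields from the example response above.
note = {
    "fecha_emision": "2023-11-01T00:00:00Z",
    "fecha_vencimiento": "2024-05-01T00:00:00Z",
    "avalista": "María González López",
}


def parse_iso(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; the replace keeps 'Z' working before Python 3.11."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def term(data: dict) -> tuple[int, int]:
    """Return (days, calendar months) between issue and maturity."""
    start = parse_iso(data["fecha_emision"])
    end = parse_iso(data["fecha_vencimiento"])
    months = (end.year - start.year) * 12 + (end.month - start.month)
    return (end - start).days, months


days, months = term(note)
has_guarantor = bool(note.get("avalista"))
print(days, months, has_guarantor)
```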


---

# Medical Prescription (https://docs.docutray.com/docs/document-types/receta_medica)



Medical prescription with doctor information, patient details, and prescribed medications with dosage instructions.

**Document type code:** `receta_medica`

## Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "doctor": "Dr. Ana María Fernández",
      "especialidad": "Internal Medicine",
      "rut_doctor": "12.345.678-9",
      "registro_medico": "RM-12345",
      "paciente": "Juan Carlos Pérez",
      "rut_paciente": "15.678.432-K",
      "fecha_prescripcion": "2023-11-15T00:00:00Z",
      "diagnostico": "Hypertension and Type 2 Diabetes",
      "medicamentos": [
        {
          "nombre": "Losartan",
          "concentracion": "50mg",
          "forma_farmaceutica": "Tablets",
          "cantidad": 30,
          "posologia": "1 tablet daily, preferably in the morning",
          "duracion_tratamiento": "30 days"
        },
        {
          "nombre": "Metformin",
          "concentracion": "850mg",
          "forma_farmaceutica": "Tablets",
          "cantidad": 60,
          "posologia": "1 tablet twice daily with meals",
          "duracion_tratamiento": "30 days"
        }
      ],
      "indicaciones_generales": "Monitor blood pressure and glucose levels weekly. Return for follow-up in 30 days.",
      "hospital_clinica": "Hospital Clínico Universidad de Chile"
    }
  }
}
```

## Main fields [#main-fields]

| Field                    | Type               | Description                          |
| ------------------------ | ------------------ | ------------------------------------ |
| `doctor`                 | String             | Prescribing doctor's full name       |
| `especialidad`           | String             | Doctor's medical specialty           |
| `rut_doctor`             | String             | Doctor's RUT (tax ID)                |
| `registro_medico`        | String             | Doctor's medical license number      |
| `paciente`               | String             | Patient's full name                  |
| `rut_paciente`           | String             | Patient's RUT (tax ID)               |
| `fecha_prescripcion`     | String (date-time) | Prescription date                    |
| `diagnostico`            | String             | Medical diagnosis                    |
| `indicaciones_generales` | String             | General instructions for the patient |
| `hospital_clinica`       | String             | Hospital or clinic name              |

## Medication fields [#medication-fields]

| Field                  | Type   | Description                                          |
| ---------------------- | ------ | ---------------------------------------------------- |
| `nombre`               | String | Medication name (generic or brand)                   |
| `concentracion`        | String | Medication concentration/strength                    |
| `forma_farmaceutica`   | String | Pharmaceutical form (tablets, capsules, syrup, etc.) |
| `cantidad`             | Number | Quantity prescribed                                  |
| `posologia`            | String | Dosage instructions                                  |
| `duracion_tratamiento` | String | Treatment duration                                   |

## Important considerations [#important-considerations]

* It is an official medical document required for controlled medication dispensing
* Doctor must have valid medical license to prescribe
* Patient identification is essential for pharmacy dispensing
* Dosage instructions must be followed exactly as prescribed
* Some medications may require special handling or storage
* Used for insurance reimbursement and medication tracking
* Essential for patient safety and treatment compliance


---

# Transbank Voucher (https://docs.docutray.com/docs/document-types/voucher_transbank)



Transbank transaction voucher with card payment details, merchant information, and transaction amounts.

**Document type code:** `voucher_transbank`

## Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "numero_transaccion": "123456789012",
      "fecha_hora": "2023-11-15T14:32:15Z",
      "comercio": "SuperMarket Plaza S.A.",
      "rut_comercio": "98.765.432-1",
      "terminal": "12345678",
      "numero_tarjeta": "XXXX-XXXX-XXXX-1234",
      "tipo_tarjeta": "VISA CREDIT",
      "banco_emisor": "Banco de Chile",
      "codigo_autorizacion": "AB123456",
      "monto": 45750,
      "moneda": "CLP",
      "tipo_transaccion": "SALE",
      "cuotas": 1,
      "plan_cuotas": "Without Interest",
      "estado": "APPROVED",
      "codigo_respuesta": "00",
      "descripcion_respuesta": "TRANSACTION APPROVED",
      "numero_referencia": "987654321",
      "numero_lote": "000123"
    }
  }
}
```

## Main fields [#main-fields]

| Field                   | Type               | Description                                   |
| ----------------------- | ------------------ | --------------------------------------------- |
| `numero_transaccion`    | String             | Unique transaction number                     |
| `fecha_hora`            | String (date-time) | Transaction date and time                     |
| `comercio`              | String             | Merchant name                                 |
| `rut_comercio`          | String             | Merchant's RUT (tax ID)                       |
| `terminal`              | String             | Terminal identification number                |
| `numero_tarjeta`        | String             | Masked card number                            |
| `tipo_tarjeta`          | String             | Card type (VISA, MASTERCARD, etc.)            |
| `banco_emisor`          | String             | Card issuing bank                             |
| `codigo_autorizacion`   | String             | Transaction authorization code                |
| `monto`                 | Number             | Transaction amount                            |
| `moneda`                | String             | Currency (typically CLP)                      |
| `tipo_transaccion`      | String             | Transaction type (SALE, REFUND, etc.)         |
| `cuotas`                | Number             | Number of installments                        |
| `plan_cuotas`           | String             | Installment plan description                  |
| `estado`                | String             | Transaction status (APPROVED, DECLINED, etc.) |
| `codigo_respuesta`      | String             | Response code from payment processor          |
| `descripcion_respuesta` | String             | Response description                          |
| `numero_referencia`     | String             | Reference number for tracking                 |
| `numero_lote`           | String             | Batch number for settlement                   |

## Important considerations [#important-considerations]

* Official payment voucher from Transbank, Chile's main payment processor
* The authorization code confirms that the transaction was approved
* Used for reconciliation and accounting
* The card number is masked for security (only the last 4 digits are visible)
* Response code "00" typically indicates a successful transaction
* Essential for refunds and dispute resolution
* The batch number groups transactions for daily settlement
* Different transaction types may have different data requirements
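
The considerations above translate into a few checks you may want to run on the extracted data before reconciling. The following is a minimal sketch using only the Python standard library. The helper name `check_voucher` and the specific rules (treating `"00"` together with `APPROVED` as success, expecting exactly four visible digits) are illustrative assumptions, not part of the DocuTray API.

```python
import re
from datetime import datetime

# Sample values copied from the response structure above.
voucher = {
    "numero_transaccion": "123456789012",
    "fecha_hora": "2023-11-15T14:32:15Z",
    "numero_tarjeta": "XXXX-XXXX-XXXX-1234",
    "codigo_autorizacion": "AB123456",
    "monto": 45750,
    "moneda": "CLP",
    "estado": "APPROVED",
    "codigo_respuesta": "00",
    "numero_lote": "000123",
}


def check_voucher(data: dict) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []

    # Response code "00" typically indicates success (assumed rule).
    if data.get("codigo_respuesta") != "00" or data.get("estado") != "APPROVED":
        problems.append("transaction not approved")

    # An approved transaction should carry an authorization code.
    if not data.get("codigo_autorizacion"):
        problems.append("missing authorization code")

    # Only the last four digits of the card should be visible.
    if not re.fullmatch(r"(?:X{4}-){3}\d{4}", data.get("numero_tarjeta", "")):
        problems.append("card number is not masked as expected")

    # fecha_hora is an ISO 8601 date-time; the trailing "Z" is normalized
    # for compatibility with Python versions before 3.11.
    try:
        datetime.fromisoformat(data.get("fecha_hora", "").replace("Z", "+00:00"))
    except ValueError:
        problems.append("invalid fecha_hora")

    return problems


print(check_voucher(voucher))
```

Returning a list of problems rather than raising on the first failure lets you log every issue with a voucher in one pass.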


---

# Create Document Type (https://docs.docutray.com/docs/guides/crear-tipo-documento)



















This guide walks you through creating a custom document type with DocuTray's AI-powered creation wizard.

## Prerequisites [#prerequisites]

Before you begin, make sure you have:

* An active DocuTray account
* At least one sample document of the type you want to create (PDF, JPG, PNG, etc.)
* A clear description of the data you want to extract from the document

## Step 1: Access the Creation Wizard [#step-1-access-the-creation-wizard]

1. Log in to your DocuTray account at [https://app.docutray.com](https://app.docutray.com)

2. In the sidebar menu, navigate to **Document Types**

3. Click the **New Document Type** button

<img alt="New document type button" src="__img0" />

## Step 2: Upload Sample Documents [#step-2-upload-sample-documents]

The wizard will show you an upload zone where you can upload your sample documents.

### Supported Formats [#supported-formats]

* **PDF**: PDF documents
* **Images**: JPG, PNG, GIF, BMP, WebP

### Limits [#limits]

* **Maximum size**: 10MB per file
* **Maximum quantity**: 5 files at a time
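
If you prepare sample files with a script, you can check them against these limits before opening the wizard. This is a rough sketch, not DocuTray code: the function name `preflight` is hypothetical, `.jpeg` is assumed to be accepted alongside `.jpg`, and 10MB is assumed to mean 10 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Limits and formats as listed in this guide.
MAX_FILE_BYTES = 10 * 1024 * 1024  # 10MB, assuming binary megabytes
MAX_FILES = 5
# ".jpeg" is an assumption; the guide only names JPG.
ALLOWED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}


def preflight(files: list[tuple[str, int]]) -> list[str]:
    """Check (filename, size_in_bytes) pairs against the wizard's limits."""
    errors = []
    if len(files) > MAX_FILES:
        errors.append(f"too many files: {len(files)} (max {MAX_FILES})")
    for name, size in files:
        if Path(name).suffix.lower() not in ALLOWED_EXTENSIONS:
            errors.append(f"{name}: unsupported format")
        if size > MAX_FILE_BYTES:
            errors.append(f"{name}: larger than 10MB")
    return errors


print(preflight([("invoice.pdf", 2_000_000), ("scan.tiff", 500_000)]))
```

For files on disk, the size can be read with `Path(name).stat().st_size`.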

### How to Upload [#how-to-upload]

You have two options:

1. **Drag and drop**: Drag files directly to the upload zone
2. **Select files**: Click the upload zone to open the file selector

<img alt="Document upload zone" src="__img1" />

<Callout type="info">
  Tip: Upload multiple examples of the same document type to get better schema generation results.
</Callout>

## Step 3: Describe the Data to Extract [#step-3-describe-the-data-to-extract]

Once at least one document is uploaded, a configuration panel with a text field will appear.

### Describe the Fields [#describe-the-fields]

In the description field, clearly indicate what data you want to extract from the document. Be specific about:

* **Field names** you want to obtain
* **Expected data types** (text, numbers, dates, lists)
* **Approximate location** in the document if relevant

### Description Example [#description-example]

```
Extract the following data from the invoice:
- Invoice number
- Issue date
- Issuer tax ID
- Issuer company name
- Recipient tax ID
- Net total
- Tax (e.g., VAT)
- Total amount due
- List of items with: quantity, description, unit price, and total
```

<img alt="Configuration panel" src="__img2" />

## Step 4: Generate the Schema [#step-4-generate-the-schema]

1. Click the **Generate Schema with AI** button

2. The system will analyze your documents and automatically generate:
   * A JSON schema with detected fields
   * A suggested name for the document type
   * A description of the document type

3. While generating, you'll see progress indicators:
   * Analyzing documents...
   * Generating schema...
   * Extracting test data...

<img alt="Generation progress" src="__img3" />

<Callout type="warning">
  Generation can take 10-30 seconds depending on document complexity.
</Callout>

## Step 5: Review and Edit the Schema [#step-5-review-and-edit-the-schema]

Once generated, you can view and edit the schema in an interactive table.

### Edit Fields [#edit-fields]

For each field you can modify:

* **Name**: The field identifier name
* **Type**: Text, Number, Boolean, Array, or Object
* **Description**: A field description
* **Required**: Whether the field is mandatory

### Available Field Types [#available-field-types]

| Type    | Icon   | Use               |
| ------- | ------ | ----------------- |
| Text    | `A`    | Text strings      |
| Number  | `#`    | Numeric values    |
| Boolean | Toggle | True/False        |
| Array   | `[ ]`  | Arrays of values  |
| Object  | `{ }`  | Nested structures |
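
To make the table concrete, the sketch below builds a small JSON schema from rows like those in the editor. The mapping from wizard types to JSON Schema type names, the use of a standard `required` array for the **Required** flag, and the sample fields are all assumptions for illustration; the schema the wizard actually generates may differ.

```python
import json

# Assumed mapping from the wizard's field types to JSON Schema type names.
WIZARD_TO_JSON_SCHEMA = {
    "Text": "string",
    "Number": "number",
    "Boolean": "boolean",
    "Array": "array",
    "Object": "object",
}

# Hypothetical rows: (name, type, description, required).
fields = [
    ("invoice_number", "Text", "Invoice number", True),
    ("total", "Number", "Total amount due", True),
    ("paid", "Boolean", "Whether the invoice is marked as paid", False),
]


def build_schema(rows):
    """Build a JSON Schema object from (name, type, description, required) rows."""
    properties = {
        name: {"type": WIZARD_TO_JSON_SCHEMA[kind], "description": description}
        for name, kind, description, _ in rows
    }
    required = [name for name, _, _, is_required in rows if is_required]
    return {"type": "object", "properties": properties, "required": required}


schema = build_schema(fields)
print(json.dumps(schema, indent=2))
```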

### Add or Remove Fields [#add-or-remove-fields]

* **Add**: Use the "Add Field" button at the bottom of the table
* **Remove**: Use the trash icon on each row

<img alt="Schema editor" src="__img4" />

## Step 6: Test the Extraction [#step-6-test-the-extraction]

The system automatically runs an extraction test after generating the schema.

### View Results [#view-results]

1. Switch to the **Results** tab

2. You'll see the data extracted from the sample document in a structured format

3. You can toggle between tree view and JSON to review the data

### If Results are Incorrect [#if-results-are-incorrect]

1. Go back to the **Configuration** tab
2. Adjust the schema as needed
3. Click **Regenerate** to test again

<img alt="Extraction results" src="__img5" />

## Step 7: Configure Metadata [#step-7-configure-metadata]

Before creating the document type, configure its information:

### Name (Required) [#name-required]

* Enter a descriptive name for the document type
* Example: "Electronic Invoice", "Fee Receipt"

### Description (Optional) [#description-optional]

* Add a description to help identify the document's purpose

### Save as Draft [#save-as-draft]

* Check this option if you want to save the type without activating it immediately
* Drafts are not available for API use until activated

<img alt="Metadata form" src="__img6" />

## Step 8: Create the Document Type [#step-8-create-the-document-type]

1. Verify that all data is correct

2. Click the **Create Document** button in the top right corner

3. The system will save the document type and redirect you to its detail page

<img alt="Create document button" src="__img7" />

<Callout type="success">
  Congratulations! Your new document type is ready to use.
</Callout>

## Error Handling [#error-handling]

### Common Errors [#common-errors]

| Error              | Solution                                     |
| ------------------ | -------------------------------------------- |
| File too large     | Reduce file size to under 10MB               |
| Unsupported format | Use PDF, JPG, PNG, GIF, BMP, or WebP         |
| Generation error   | Check your connection and retry              |
| Extraction error   | Adjust the description and regenerate schema |

### Retry Operations [#retry-operations]

If an error occurs during generation or extraction:

1. An alert will appear with the error message
2. Use the **Retry** button to re-execute the operation
3. If the error persists, try again with a more detailed description

## Next Steps [#next-steps]

Once your document type is created, you can:

* **Use the API**: Convert documents using the `/api/convert` endpoint
* **Create Flows**: Automate processing with DocFlows
* **Configure Webhooks**: Receive notifications when documents are processed

<Cards>
  <Card title="API Documentation" href="/docs/api/conversion/convertDocument" />

  <Card title="Configure Webhooks" href="/docs/webhooks" />
</Cards>

## Keyboard Shortcuts [#keyboard-shortcuts]

For faster navigation:

| Action                        | Shortcut                          |
| ----------------------------- | --------------------------------- |
| Open file selector            | `Enter` or `Space` on upload zone |
| Navigate between fields       | `Tab`                             |
| Expand/collapse nested fields | `Enter` on expand button          |


---

# User Guides (https://docs.docutray.com/docs/guides)



These guides walk you through DocuTray's core workflows end to end. Each one is
task-oriented: start from a goal — "create a document type", "receive webhook
notifications" — and follow the steps to a working result. If you are new to the
platform, begin with [Getting Started](/docs/getting-started) to create an
account and generate your first API key, then come back here to dig into
specific features.

Looking for reference material rather than a tutorial? See the
[REST API reference](/docs/api), the [Node.js](/docs/node-sdk) and
[Python](/docs/python-sdk) SDKs, or the [CLI](/docs/cli) for the full surface of
every operation.

## Document Management [#document-management]

Define what DocuTray should extract from your documents. A document type pairs a
JSON schema with extraction hints, so every conversion of that type returns the
same structured shape.

<Cards>
  <Card title="Create Document Type" href="/docs/guides/crear-tipo-documento" description="Use the AI wizard to turn a single sample document into a reusable type with schema, prompts, and validation." />
</Cards>

## Configuration [#configuration]

Connect DocuTray to the rest of your stack — authenticate your requests and get
notified the moment a document finishes processing.

<Cards>
  <Card title="Getting Started" href="/docs/getting-started" description="Create your account, generate an API key, and run your first OCR conversion." />

  <Card title="Configure Webhooks" href="/docs/webhooks/configuracion" description="Receive real-time notifications when conversions, identifications, and steps complete." />
</Cards>


---

# Convert Documents (https://docs.docutray.com/docs/operations/convert)





The Convert operation extracts structured data from documents using AI-powered OCR. You provide a document (image or PDF) and a document type code, and DocuTray returns the extracted fields as JSON according to the document type's schema.

## Quick Start [#quick-start]

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```python
    from pathlib import Path
    from docutray import Client

    client = Client(api_key="YOUR_API_KEY")

    result = client.convert.run(
        file=Path("invoice.pdf"),
        document_type_code="invoice"
    )

    print(result.data)
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    import DocuTray from 'docutray';
    import { readFileSync } from 'fs';

    const client = new DocuTray({ apiKey: 'YOUR_API_KEY' });

    const result = await client.convert.run({
      file: readFileSync('invoice.pdf'),
      filename: 'invoice.pdf',
      documentTypeCode: 'invoice',
    });

    console.log(result.data);
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    curl -X POST https://app.docutray.com/api/convert \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -F "image=@invoice.pdf" \
      -F "document_type_code=invoice"
    ```
  </Tab>
</Tabs>

## Response [#response]

<Tabs groupId="lang" items="['Python', 'Node.js', 'JSON']">
  <Tab value="Python">
    ```python
    # result is a ConversionResult
    print(result.data)
    # {
    #     "invoice_number": "F-2024-001",
    #     "issue_date": "2024-01-15",
    #     "vendor_name": "Acme Corporation",
    #     "subtotal": 1000,
    #     "tax": 160,
    #     "total": 1160
    # }
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    // result is a ConversionResult
    console.log(result.data);
    // {
    //   invoice_number: 'F-2024-001',
    //   issue_date: '2024-01-15',
    //   vendor_name: 'Acme Corporation',
    //   subtotal: 1000,
    //   tax: 160,
    //   total: 1160
    // }
    ```
  </Tab>

  <Tab value="JSON">
    ```json
    {
      "data": {
        "invoice_number": "F-2024-001",
        "issue_date": "2024-01-15",
        "vendor_name": "Acme Corporation",
        "subtotal": 1000,
        "tax": 160,
        "total": 1160
      }
    }
    ```
  </Tab>
</Tabs>

## Async Conversion [#async-conversion]

For large documents or batch processing, use async conversion. The document is processed in the background and you can poll for the result.

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```python
    # Start async conversion
    status = client.convert.run_async(
        file=Path("large_document.pdf"),
        document_type_code="invoice"
    )

    print(f"Conversion ID: {status.conversion_id}")
    print(f"Status: {status.status}")  # ENQUEUED

    # Wait for completion (polls automatically)
    result = status.wait()

    if result.is_success():
        print(result.data)
    elif result.is_error():
        print(f"Error: {result.error}")
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    // Start async conversion
    const status = await client.convert.runAsync({
      file: readFileSync('large_document.pdf'),
      filename: 'large_document.pdf',
      documentTypeCode: 'invoice',
    });

    console.log(`Conversion ID: ${status.conversion_id}`);
    console.log(`Status: ${status.status}`); // ENQUEUED

    // Wait for completion (polls automatically)
    const result = await status.wait({
      onStatus: (s) => console.log(`Status: ${s.status}`),
    });

    if (result.isSuccess()) {
      console.log(result.data);
    } else if (result.isFailed()) {
      console.log(`Error: ${result.error}`);
    }
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    # Start async conversion
    curl -X POST https://app.docutray.com/api/convert-async \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -F "image=@large_document.pdf" \
      -F "document_type_code=invoice"

    # Response:
    # {
    #   "conversion_id": "cm5vm9hx30001m5cgh0p9v8qa",
    #   "status": "ENQUEUED",
    #   "status_url": "https://app.docutray.com/api/convert-async/status/cm5vm9hx30001m5cgh0p9v8qa"
    # }

    # Poll for status
    curl https://app.docutray.com/api/convert-async/status/cm5vm9hx30001m5cgh0p9v8qa \
      -H "Authorization: Bearer YOUR_API_KEY"
    ```
  </Tab>
</Tabs>

### Checking Status Manually [#checking-status-manually]

You can also check the status of an async conversion by its ID:

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```python
    status = client.convert.get_status("cm5vm9hx30001m5cgh0p9v8qa")

    if status.is_success():
        print(status.data)
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    const status = await client.convert.getStatus('cm5vm9hx30001m5cgh0p9v8qa');

    if (status.isSuccess()) {
      console.log(status.data);
    }
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    curl https://app.docutray.com/api/convert-async/status/cm5vm9hx30001m5cgh0p9v8qa \
      -H "Authorization: Bearer YOUR_API_KEY"
    ```
  </Tab>
</Tabs>
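
If you call the status endpoint yourself instead of using the SDK's `wait()` helper, you need your own polling loop. The following is a generic sketch with exponential backoff and a timeout. The `fetch_status` callable is a placeholder for whatever performs the status request, and the terminal status names are passed in by the caller because only `ENQUEUED` is shown on this page; `"DONE"` in the demo is a made-up value.

```python
import time
from typing import Callable


def wait_for_conversion(
    fetch_status: Callable[[], dict],
    terminal_statuses: set[str],
    interval: float = 1.0,
    max_interval: float = 10.0,
    timeout: float = 300.0,
) -> dict:
    """Poll fetch_status() until its "status" is terminal or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        response = fetch_status()
        if response.get("status") in terminal_statuses:
            return response
        if time.monotonic() + interval > deadline:
            raise TimeoutError("conversion did not finish in time")
        time.sleep(interval)
        # Back off gradually so long-running jobs are polled less often.
        interval = min(interval * 2, max_interval)


# Simulated responses; a real fetch_status would call the status endpoint.
# "DONE" is a placeholder, not a documented status value.
responses = iter([{"status": "ENQUEUED"}, {"status": "ENQUEUED"}, {"status": "DONE"}])
final = wait_for_conversion(lambda: next(responses), {"DONE"}, interval=0.01)
print(final["status"])
```

Using `time.monotonic()` for the deadline keeps the timeout correct even if the system clock changes while the loop is running.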

## Input Methods [#input-methods]

DocuTray supports three methods for providing documents.

### File Upload [#file-upload]

Upload a file directly from disk or memory.

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```python
    from pathlib import Path

    # From a file path
    result = client.convert.run(
        file=Path("invoice.pdf"),
        document_type_code="invoice"
    )

    # From bytes
    with open("invoice.pdf", "rb") as f:
        result = client.convert.run(
            file=f.read(),
            document_type_code="invoice",
            content_type="application/pdf"
        )

    # From a file object
    with open("invoice.pdf", "rb") as f:
        result = client.convert.run(
            file=f,
            document_type_code="invoice"
        )
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    import { readFileSync } from 'fs';

    const result = await client.convert.run({
      file: readFileSync('invoice.pdf'),
      documentTypeCode: 'invoice',
      filename: 'invoice.pdf',
    });
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    curl -X POST https://app.docutray.com/api/convert \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -F "image=@invoice.pdf" \
      -F "document_type_code=invoice"
    ```
  </Tab>
</Tabs>

### URL [#url]

Provide a publicly accessible URL to the document. DocuTray will download and process it.

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```python
    result = client.convert.run(
        url="https://example.com/invoice.pdf",
        document_type_code="invoice"
    )
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    const result = await client.convert.run({
      url: 'https://example.com/invoice.pdf',
      documentTypeCode: 'invoice',
    });
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    curl -X POST https://app.docutray.com/api/convert \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "image_url": "https://example.com/invoice.pdf",
        "document_type_code": "invoice"
      }'
    ```
  </Tab>
</Tabs>

### Base64 [#base64]

Send a base64-encoded document in the request body.

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```python
    import base64

    with open("invoice.pdf", "rb") as f:
        encoded = base64.b64encode(f.read()).decode()

    result = client.convert.run(
        file_base64=encoded,
        document_type_code="invoice",
        content_type="application/pdf"
    )
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    import { readFileSync } from 'fs';

    const encoded = readFileSync('invoice.pdf').toString('base64');

    const result = await client.convert.run({
      base64: encoded,
      documentTypeCode: 'invoice',
      contentType: 'application/pdf',
    });
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    # Strip line breaks: GNU base64 wraps its output, which would break the JSON string
    BASE64=$(base64 < invoice.pdf | tr -d '\n')

    curl -X POST https://app.docutray.com/api/convert \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d "{
        \"image_base64\": \"$BASE64\",
        \"image_content_type\": \"application/pdf\",
        \"document_type_code\": \"invoice\"
      }"
    ```
  </Tab>
</Tabs>
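
If you build the JSON body by hand rather than through an SDK, letting a JSON library do the encoding avoids shell-quoting problems. The sketch below uses only the Python standard library and the field names from the cURL example (`image_base64`, `image_content_type`, `document_type_code`). The helper name `build_base64_payload` is hypothetical, and the sample bytes stand in for a real file.

```python
import base64
import json


def build_base64_payload(content: bytes, content_type: str, document_type_code: str) -> str:
    """Return a JSON request body using the field names from the cURL example."""
    return json.dumps(
        {
            # b64encode emits no line breaks, so the value is safe inside JSON.
            "image_base64": base64.b64encode(content).decode("ascii"),
            "image_content_type": content_type,
            "document_type_code": document_type_code,
        }
    )


# Placeholder bytes; in practice, read them with open("invoice.pdf", "rb").
body = build_base64_payload(b"%PDF-1.4 sample bytes", "application/pdf", "invoice")
print(body)
```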

## Document Metadata [#document-metadata]

You can attach custom metadata to any conversion. DocuTray returns this metadata in status responses and webhooks, which makes it useful for tracking your own internal references.

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```python
    result = client.convert.run(
        file=Path("invoice.pdf"),
        document_type_code="invoice",
        document_metadata={"customer_id": "cust_123", "batch": "2024-Q1"}
    )
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    const result = await client.convert.run({
      file: readFileSync('invoice.pdf'),
      filename: 'invoice.pdf',
      documentTypeCode: 'invoice',
      documentMetadata: { customer_id: 'cust_123', batch: '2024-Q1' },
    });
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    curl -X POST https://app.docutray.com/api/convert \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -F "image=@invoice.pdf" \
      -F "document_type_code=invoice" \
      -F 'document_metadata={"customer_id": "cust_123", "batch": "2024-Q1"}'
    ```
  </Tab>
</Tabs>

## Parameters [#parameters]

| Parameter                | Type   | Required | Description                                               |
| ------------------------ | ------ | -------- | --------------------------------------------------------- |
| `document_type_code`     | string | Yes      | Code identifying the document type schema to use          |
| `file`                   | File   | No       | File to process (path, bytes, or file object)             |
| `url`                    | string | No       | Public URL of the document to download and process        |
| `file_base64` / `base64` | string | No       | Base64-encoded document content                           |
| `content_type`           | string | No       | MIME type of the document (auto-detected if not provided) |
| `document_metadata`      | object | No       | Custom metadata to attach to the conversion               |

<Callout type="info">
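  You must provide exactly one of `file`, `url`, or `file_base64` (Python) / `base64` (Node.js). These are the SDK parameter names; in the cURL examples on this page the corresponding REST fields are `image`, `image_url`, and `image_base64` (with `image_content_type`).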
</Callout>
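
If your own code wraps the conversion call, you can enforce the "exactly one input" rule before sending a request. This is a minimal sketch of such a guard. The function `pick_input` is hypothetical and only illustrates the rule; it is not how the SDKs validate their arguments.

```python
from typing import Optional


def pick_input(file=None, url: Optional[str] = None, file_base64: Optional[str] = None) -> str:
    """Return the name of the single input provided, or raise ValueError."""
    provided = [
        name
        for name, value in (("file", file), ("url", url), ("file_base64", file_base64))
        if value is not None
    ]
    if len(provided) != 1:
        raise ValueError(
            f"expected exactly one of file, url, or file_base64; got {provided or 'none'}"
        )
    return provided[0]


print(pick_input(url="https://example.com/invoice.pdf"))  # url
```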

**Supported file formats:** JPEG, PNG, GIF, BMP, WebP, PDF (up to 100MB)

## Error Handling [#error-handling]

<Tabs groupId="lang" items="['Python', 'Node.js']">
  <Tab value="Python">
    ```python
    from docutray import (
        AuthenticationError,
        BadRequestError,
        RateLimitError,
        NotFoundError,
        DocuTrayError,
    )

    try:
        result = client.convert.run(
            file=Path("invoice.pdf"),
            document_type_code="invoice"
        )
    except AuthenticationError:
        print("Invalid API key")
    except BadRequestError as e:
        print(f"Invalid request: {e.message}")
    except RateLimitError as e:
        print(f"Rate limited. Retry after {e.retry_after}s")
    except NotFoundError:
        print("Document type not found")
    except DocuTrayError as e:
        print(f"Error: {e.message}")
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    import {
      AuthenticationError,
      BadRequestError,
      RateLimitError,
      NotFoundError,
      DocuTrayError,
    } from 'docutray';

    try {
      const result = await client.convert.run({
        file: readFileSync('invoice.pdf'),
        filename: 'invoice.pdf',
        documentTypeCode: 'invoice',
      });
    } catch (error) {
      if (error instanceof AuthenticationError) {
        console.error('Invalid API key');
      } else if (error instanceof BadRequestError) {
        console.error(`Invalid request: ${error.message}`);
      } else if (error instanceof RateLimitError) {
        console.error(`Rate limited. Retry after ${error.retryAfter}s`);
      } else if (error instanceof NotFoundError) {
        console.error('Document type not found');
      } else if (error instanceof DocuTrayError) {
        console.error(`Error: ${error.message}`);
      }
    }
    ```
  </Tab>
</Tabs>
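
Because the rate-limit error exposes a `retry_after` value, one common pattern is to wait that long and try again. The sketch below shows the idea without depending on the SDK: `RateLimited` is a stand-in class for the SDK's `RateLimitError`, and `call_with_retry` and `flaky` are hypothetical names used only for this illustration.

```python
import time


class RateLimited(Exception):
    """Stand-in for the SDK's RateLimitError; carries retry_after in seconds."""

    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_retry(operation, max_attempts: int = 3, sleep=time.sleep):
    """Run operation(), waiting retry_after seconds between rate-limited attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RateLimited as exc:
            if attempt == max_attempts:
                raise  # out of attempts, surface the error to the caller
            sleep(exc.retry_after)


# Simulated operation: rate limited twice, then succeeds.
attempts = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RateLimited(retry_after=0.01)
    return {"data": {"total": 1160}}


print(call_with_retry(flaky))
```

With the real SDK you would catch `RateLimitError` in place of the stand-in class and pass a function that wraps `client.convert.run(...)`.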

## Complete Code [#complete-code]

End-to-end example that converts a document with error handling and result processing.

<Tabs groupId="lang" items="['Python', 'Node.js']">
  <Tab value="Python">
    ```python
    from pathlib import Path
    from docutray import Client, AuthenticationError, RateLimitError, DocuTrayError

    client = Client(api_key="YOUR_API_KEY")

    try:
        # Sync conversion for small documents
        result = client.convert.run(
            file=Path("invoice.pdf"),
            document_type_code="invoice",
            document_metadata={"source": "email", "batch": "2024-Q1"}
        )

        # Access extracted data
        data = result.data
        print(f"Invoice: {data.get('invoice_number')}")
        print(f"Total: ${data.get('total')}")

    except AuthenticationError:
        print("Check your API key")
    except RateLimitError as e:
        print(f"Rate limited. Retry after {e.retry_after} seconds")
    except DocuTrayError as e:
        print(f"Conversion failed: {e.message}")
    finally:
        client.close()
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    import DocuTray, {
      AuthenticationError,
      RateLimitError,
      DocuTrayError,
    } from 'docutray';
    import { readFileSync } from 'fs';

    const client = new DocuTray({ apiKey: 'YOUR_API_KEY' });

    try {
      // Sync conversion for small documents
      const result = await client.convert.run({
        file: readFileSync('invoice.pdf'),
        filename: 'invoice.pdf',
        documentTypeCode: 'invoice',
        documentMetadata: { source: 'email', batch: '2024-Q1' },
      });

      // Access extracted data
      const data = result.data;
      console.log(`Invoice: ${data.invoice_number}`);
      console.log(`Total: $${data.total}`);
    } catch (error) {
      if (error instanceof AuthenticationError) {
        console.error('Check your API key');
      } else if (error instanceof RateLimitError) {
        console.error(`Rate limited. Retry after ${error.retryAfter}s`);
      } else if (error instanceof DocuTrayError) {
        console.error(`Conversion failed: ${error.message}`);
      }
    }
    ```
  </Tab>
</Tabs>

## SDK Reference [#sdk-reference]

For detailed class and method documentation:

* [Python SDK Convert Reference](/docs/python-sdk/resources/convert)
* [Node.js SDK Convert Reference](/docs/node-sdk/resources/convert)
* [REST API Reference](/docs/api)


---

# Document Types (https://docs.docutray.com/docs/operations/document-types)





Document types define what data DocuTray extracts from your documents. Each type has a JSON schema that describes the fields to extract. Use the Document Types API to list available types, create custom types, update existing ones, inspect their schemas, and validate extracted data.

## List Document Types [#list-document-types]

Retrieve all document types accessible to your organization, including public types and your custom types.

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```python
    from docutray import Client

    client = Client(api_key="YOUR_API_KEY")

    # List all document types
    page = client.document_types.list()

    for doc_type in page.data:
        print(f"{doc_type.codeType}: {doc_type.name}")

    # Auto-paginate through all results
    for doc_type in client.document_types.list().auto_paging_iter():
        print(f"{doc_type.codeType}: {doc_type.name}")
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    import DocuTray from 'docutray';

    const client = new DocuTray({ apiKey: 'YOUR_API_KEY' });

    // List all document types
    const page = await client.documentTypes.list();

    for (const docType of page.data) {
      console.log(`${docType.codeType}: ${docType.name}`);
    }

    // Auto-paginate through all results
    for await (const docType of client.documentTypes.list().autoPagingIter()) {
      console.log(`${docType.codeType}: ${docType.name}`);
    }
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    curl https://app.docutray.com/api/document-types \
      -H "Authorization: Bearer YOUR_API_KEY"

    # With search and pagination
    curl "https://app.docutray.com/api/document-types?search=invoice&page=1&limit=20" \
      -H "Authorization: Bearer YOUR_API_KEY"
    ```
  </Tab>
</Tabs>

### Response [#response]

<Tabs groupId="lang" items="['Python', 'Node.js', 'JSON']">
  <Tab value="Python">
    ```python
    # page is a Page[DocumentType]
    print(f"Total: {page.pagination.total}")
    print(f"Page: {page.pagination.page}")

    for dt in page.data:
        print(f"  {dt.name} ({dt.codeType})")
        print(f"  Public: {dt.isPublic}, Draft: {dt.isDraft}")
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    // page is a Page<DocumentType>
    console.log(`Total: ${page.pagination.total}`);
    console.log(`Page: ${page.pagination.page}`);

    for (const dt of page.data) {
      console.log(`  ${dt.name} (${dt.codeType})`);
      console.log(`  Public: ${dt.isPublic}, Draft: ${dt.isDraft}`);
    }
    ```
  </Tab>

  <Tab value="JSON">
    ```json
    {
      "data": [
        {
          "id": "cm5vm9hx30001m5cgh0p9v8qa",
          "name": "Invoice",
          "codeType": "invoice",
          "description": "Standard invoice document",
          "isPublic": true,
          "isDraft": false,
          "createdAt": "2024-01-15T10:30:00.000Z",
          "updatedAt": "2024-01-15T10:30:00.000Z"
        }
      ],
      "pagination": {
        "total": 50,
        "page": 1,
        "limit": 20
      }
    }
    ```
  </Tab>
</Tabs>

### Search and Pagination [#search-and-pagination]

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```python
    # Search by name or code
    page = client.document_types.list(search="invoice")

    # Manual pagination
    page = client.document_types.list(page=1, limit=10)

    # Iterate through all pages
    for page_chunk in client.document_types.list().iter_pages():
        print(f"Page {page_chunk.pagination.page}: {len(page_chunk.data)} items")
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    // Search by name or code
    const page = await client.documentTypes.list({ search: 'invoice' });

    // Manual pagination
    const page2 = await client.documentTypes.list({ page: 1, limit: 10 });

    // Iterate through all pages
    for await (const pageChunk of client.documentTypes.list().iterPages()) {
      console.log(`Page: ${pageChunk.data.length} items`);
    }
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    # Search by name
    curl "https://app.docutray.com/api/document-types?search=invoice" \
      -H "Authorization: Bearer YOUR_API_KEY"

    # Page 2 with 10 results per page
    curl "https://app.docutray.com/api/document-types?page=2&limit=10" \
      -H "Authorization: Bearer YOUR_API_KEY"
    ```
  </Tab>
</Tabs>
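
When paginating manually, the `pagination` object gives you enough to work out how many pages exist. The sketch below does that arithmetic with the values from the sample response (`total`, `page`, `limit`). The helper `total_pages` is a hypothetical name, and the code does not call the API.

```python
import math


def total_pages(total: int, limit: int) -> int:
    """Number of pages needed to list `total` items at `limit` items per page."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return math.ceil(total / limit)


# Values from the sample response: 50 document types, 20 per page.
pagination = {"total": 50, "page": 1, "limit": 20}
pages = total_pages(pagination["total"], pagination["limit"])
print(pages)  # 3

# More pages remain while the current page number is below the page count.
has_more = pagination["page"] < pages
print(has_more)  # True
```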

## Get Document Type [#get-document-type]

Retrieve a specific document type by ID, including its full JSON schema.

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```python
    doc_type = client.document_types.get("dt_abc123")

    print(f"Name: {doc_type.name}")
    print(f"Code: {doc_type.codeType}")
    print(f"Description: {doc_type.description}")
    print(f"Schema: {doc_type.schema_}")
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    const docType = await client.documentTypes.get('dt_abc123');

    console.log(`Name: ${docType.name}`);
    console.log(`Code: ${docType.codeType}`);
    console.log(`Description: ${docType.description}`);
    console.log('Schema:', JSON.stringify(docType.schema, null, 2));
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    curl https://app.docutray.com/api/document-types/dt_abc123 \
      -H "Authorization: Bearer YOUR_API_KEY"
    ```
  </Tab>
</Tabs>

## Create Document Type [#create-document-type]

Create a new document type with a JSON schema that defines the fields to extract.

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```python
    from docutray import Client

    client = Client(api_key="YOUR_API_KEY")

    doc_type = client.document_types.create(
        name="Purchase Order",
        code_type="myorg_purchase_order",
        description="Standard purchase order document",
        json_schema={
            "type": "object",
            "properties": {
                "po_number": {"type": "string", "description": "Purchase order number"},
                "vendor": {"type": "string", "description": "Vendor name"},
                "total": {"type": "number", "description": "Total amount"},
            },
        },
        is_draft=True,
        conversion_mode="json",
    )

    print(f"Created: {doc_type.name} ({doc_type.codeType})")
    print(f"Status: {doc_type.status}")
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    import DocuTray from 'docutray';

    const client = new DocuTray({ apiKey: 'YOUR_API_KEY' });

    const docType = await client.documentTypes.create({
      name: 'Purchase Order',
      codeType: 'myorg_purchase_order',
      description: 'Standard purchase order document',
      jsonSchema: {
        type: 'object',
        properties: {
          po_number: { type: 'string', description: 'Purchase order number' },
          vendor: { type: 'string', description: 'Vendor name' },
          total: { type: 'number', description: 'Total amount' },
        },
      },
      isDraft: true,
      conversionMode: 'json',
    });

    console.log(`Created: ${docType.name} (${docType.codeType})`);
    console.log(`Status: ${docType.status}`);
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    curl -X POST https://app.docutray.com/api/document-types \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "name": "Purchase Order",
        "codeType": "myorg_purchase_order",
        "description": "Standard purchase order document",
        "jsonSchema": {
          "type": "object",
          "properties": {
            "po_number": {"type": "string", "description": "Purchase order number"},
            "vendor": {"type": "string", "description": "Vendor name"},
            "total": {"type": "number", "description": "Total amount"}
          }
        },
        "isDraft": true,
        "conversionMode": "json"
      }'
    ```
  </Tab>
</Tabs>

### Response [#response-1]

```json
{
  "data": {
    "id": "cm5vm9hx30001m5cgh0p9v8qa",
    "codeType": "myorg_purchase_order",
    "name": "Purchase Order",
    "description": "Standard purchase order document",
    "isPublic": false,
    "isDraft": true,
    "status": "draft",
    "createdAt": "2024-03-15T10:30:00.000Z",
    "updatedAt": "2024-03-15T10:30:00.000Z"
  }
}
```
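
When calling the REST endpoint directly, the payload arrives wrapped in a `data` envelope with ISO 8601 timestamps. The sketch below shows one way to unwrap it. The `DocumentTypeSummary` class and `parse_document_type` function are illustrative names, not part of the DocuTray SDK, and only a few of the response fields are mapped.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DocumentTypeSummary:
    """Hypothetical container for a few fields of the response above."""
    id: str
    code_type: str
    name: str
    status: str
    created_at: datetime


def parse_document_type(raw: str) -> DocumentTypeSummary:
    """Unwrap the `data` envelope and convert the creation timestamp."""
    data = json.loads(raw)["data"]
    # Swap the trailing "Z" for an explicit offset so that
    # datetime.fromisoformat also accepts it before Python 3.11.
    created = datetime.fromisoformat(data["createdAt"].replace("Z", "+00:00"))
    return DocumentTypeSummary(
        id=data["id"],
        code_type=data["codeType"],
        name=data["name"],
        status=data["status"],
        created_at=created,
    )
```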

### Admin vs Non-Admin Behavior [#admin-vs-non-admin-behavior]

| Behavior              | Non-Admin Users                           | Admin Users             |
| --------------------- | ----------------------------------------- | ----------------------- |
| `codeType` prefix     | Must start with org slug (e.g., `myorg_`) | No prefix required      |
| `codeType` min length | At least 3 characters after prefix        | No minimum length       |
| `isPublic`            | Forced to `false`                         | Can set to `true`       |
| `source`              | Set to `USER`                             | Defaults to `ADMIN`     |
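
The `codeType` rules in the table can be checked on the client before a request is sent. The following is a minimal sketch, not the server's validation logic: `check_code_type` is a hypothetical helper, and it assumes the required prefix is the organization slug followed by an underscore, as in the `myorg_` example.

```python
import re

# Character rule taken from the Create Parameters table: ^[a-z0-9_]+$
CODE_TYPE_PATTERN = re.compile(r"[a-z0-9_]+")


def check_code_type(code_type: str, org_slug: str, is_admin: bool = False) -> list:
    """Return a list of problems; an empty list means the code looks acceptable."""
    problems = []
    if not CODE_TYPE_PATTERN.fullmatch(code_type):
        problems.append("codeType may contain only lowercase letters, digits, and underscores")
    if is_admin:
        return problems  # admins are exempt from the prefix rules
    prefix = f"{org_slug}_"  # assumption: org slug plus an underscore
    if not code_type.startswith(prefix):
        problems.append(f"codeType must start with '{prefix}'")
    elif len(code_type) - len(prefix) < 3:
        problems.append("codeType needs at least 3 characters after the prefix")
    return problems


print(check_code_type("myorg_purchase_order", "myorg"))
print(check_code_type("purchase_order", "myorg"))
```

Returning a list of problems rather than raising on the first one lets a form or CLI show every issue at once.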

### Error Handling [#error-handling]

<Tabs groupId="lang" items="['Python', 'Node.js']">
  <Tab value="Python">
    ```python
    from docutray import Client, ConflictError, BadRequestError, PermissionDeniedError

    client = Client(api_key="YOUR_API_KEY")

    try:
        doc_type = client.document_types.create(
            name="Purchase Order",
            code_type="myorg_purchase_order",
            description="Standard purchase order document",
            json_schema={"type": "object", "properties": {}},
        )
    except ConflictError:
        # 409: codeType already exists
        print("A document type with this codeType already exists")
    except BadRequestError as e:
        # 400: Invalid request body
        print(f"Validation error: {e.message}")
    except PermissionDeniedError:
        # 403: Insufficient permissions
        print("You don't have permission to create this document type")
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    import DocuTray, { ConflictError, BadRequestError, PermissionDeniedError } from 'docutray';

    const client = new DocuTray({ apiKey: 'YOUR_API_KEY' });

    try {
      const docType = await client.documentTypes.create({
        name: 'Purchase Order',
        codeType: 'myorg_purchase_order',
        description: 'Standard purchase order document',
        jsonSchema: { type: 'object', properties: {} },
      });
    } catch (error) {
      if (error instanceof ConflictError) {
        // 409: codeType already exists
        console.error('A document type with this codeType already exists');
      } else if (error instanceof BadRequestError) {
        // 400: Invalid request body
        console.error(`Validation error: ${error.message}`);
      } else if (error instanceof PermissionDeniedError) {
        // 403: Insufficient permissions
        console.error("You don't have permission to create this document type");
      } else {
        throw error; // Unrelated error: let it propagate
      }
    }
    ```
  </Tab>
</Tabs>

## Update Document Type [#update-document-type]

Update an existing document type. All fields are optional. The `codeType` field is immutable and will be ignored if provided.

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```python
    from docutray import Client

    client = Client(api_key="YOUR_API_KEY")

    doc_type = client.document_types.update(
        "cm5vm9hx30001m5cgh0p9v8qa",
        name="Updated Purchase Order",
        description="Updated description",
        is_draft=False,  # Publish the document type
        prompt_hints="Focus on extracting line items and totals",
    )

    print(f"Updated: {doc_type.name}")
    print(f"Status: {doc_type.status}")  # "published" when is_draft=False
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    import DocuTray from 'docutray';

    const client = new DocuTray({ apiKey: 'YOUR_API_KEY' });

    const docType = await client.documentTypes.update('cm5vm9hx30001m5cgh0p9v8qa', {
      name: 'Updated Purchase Order',
      description: 'Updated description',
      isDraft: false, // Publish the document type
      promptHints: 'Focus on extracting line items and totals',
    });

    console.log(`Updated: ${docType.name}`);
    console.log(`Status: ${docType.status}`); // "published" when isDraft=false
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    curl -X PUT https://app.docutray.com/api/document-types/cm5vm9hx30001m5cgh0p9v8qa \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "name": "Updated Purchase Order",
        "description": "Updated description",
        "isDraft": false,
        "promptHints": "Focus on extracting line items and totals"
      }'
    ```
  </Tab>
</Tabs>

### Response [#response-2]

```json
{
  "data": {
    "id": "cm5vm9hx30001m5cgh0p9v8qa",
    "codeType": "myorg_purchase_order",
    "name": "Updated Purchase Order",
    "description": "Updated description",
    "isPublic": false,
    "isDraft": false,
    "status": "published",
    "createdAt": "2024-03-15T10:30:00.000Z",
    "updatedAt": "2024-03-16T14:20:00.000Z"
  }
}
```

### Permissions [#permissions]

* Non-admin users can only update document types they created.
* Non-admin users cannot set `isPublic` to `true`.
* When `isDraft` changes, the `status` field is automatically updated.
* A version snapshot is created before each update.
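
To make these rules concrete, the sketch below models them on plain dictionaries. It only illustrates the documented behavior and is not DocuTray's server code: `apply_update` and the `versions` list are hypothetical, and the `status` values are taken from the response examples on this page.

```python
import copy


def apply_update(current: dict, changes: dict, versions: list) -> dict:
    """Apply a partial update following the documented rules."""
    # A snapshot of the current state is stored before anything changes.
    versions.append(copy.deepcopy(current))
    updated = dict(current)
    for key, value in changes.items():
        if key == "codeType":
            continue  # immutable: ignored if provided
        updated[key] = value
    # The status field follows isDraft.
    updated["status"] = "draft" if updated.get("isDraft") else "published"
    return updated


current = {
    "codeType": "myorg_purchase_order",
    "name": "Purchase Order",
    "isDraft": True,
    "status": "draft",
}
versions = []
updated = apply_update(current, {"name": "Updated Purchase Order", "isDraft": False}, versions)
print(updated["status"], len(versions))
```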

### Error Handling [#error-handling-1]

<Tabs groupId="lang" items="['Python', 'Node.js']">
  <Tab value="Python">
    ```python
    from docutray import Client, NotFoundError, PermissionDeniedError, BadRequestError

    client = Client(api_key="YOUR_API_KEY")

    try:
        doc_type = client.document_types.update(
            "cm5vm9hx30001m5cgh0p9v8qa",
            name="Updated Name",
        )
    except NotFoundError:
        # 404: Document type not found
        print("Document type not found")
    except PermissionDeniedError:
        # 403: Can only update your own document types
        print("You don't have permission to update this document type")
    except BadRequestError as e:
        # 400: Invalid request body
        print(f"Validation error: {e.message}")
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    import DocuTray, { NotFoundError, PermissionDeniedError, BadRequestError } from 'docutray';

    const client = new DocuTray({ apiKey: 'YOUR_API_KEY' });

    try {
      const docType = await client.documentTypes.update('cm5vm9hx30001m5cgh0p9v8qa', {
        name: 'Updated Name',
      });
    } catch (error) {
      if (error instanceof NotFoundError) {
        // 404: Document type not found
        console.error('Document type not found');
      } else if (error instanceof PermissionDeniedError) {
        // 403: Can only update your own document types
        console.error("You don't have permission to update this document type");
      } else if (error instanceof BadRequestError) {
        // 400: Invalid request body
        console.error(`Validation error: ${error.message}`);
      } else {
        throw error; // Unrelated error: let it propagate
      }
    }
    ```
  </Tab>
</Tabs>

## Validate Data [#validate-data]

Validate extracted data against a document type's schema. This is useful for checking data quality after conversion or before sending the data to downstream systems.

<Tabs groupId="lang" items="['Python', 'Node.js']">
  <Tab value="Python">
    ```python
    result = client.document_types.validate(
        "dt_invoice",
        {"invoice_number": "INV-001", "total": 100}
    )

    if result.is_valid():
        print("Data is valid!")
    else:
        for error in result.errors.messages:
            print(f"Error: {error}")
        for warning in result.warnings.messages:
            print(f"Warning: {warning}")
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    const result = await client.documentTypes.validate('dt_invoice', {
      invoice_number: 'INV-001',
      total: 100,
    });

    if (result.errors.count === 0) {
      console.log('Data is valid!');
    } else {
      for (const error of result.errors.messages) {
        console.log(`Error: ${error}`);
      }
    }

    if (result.warnings && result.warnings.count > 0) {
      for (const warning of result.warnings.messages) {
        console.log(`Warning: ${warning}`);
      }
    }
    ```
  </Tab>
</Tabs>

## Understanding Schemas [#understanding-schemas]

Each document type has a JSON schema that defines the fields DocuTray extracts. For example, an invoice schema might look like:

```json
{
  "type": "object",
  "properties": {
    "invoice_number": {
      "type": "string",
      "description": "Invoice number or identifier"
    },
    "issue_date": {
      "type": "string",
      "format": "date",
      "description": "Date the invoice was issued"
    },
    "vendor_name": {
      "type": "string",
      "description": "Name of the issuing company"
    },
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string" },
          "quantity": { "type": "number" },
          "unit_price": { "type": "number" },
          "amount": { "type": "number" }
        }
      }
    },
    "subtotal": { "type": "number" },
    "tax": { "type": "number" },
    "total": { "type": "number" }
  }
}
```

The schema determines which fields are extracted during conversion and what data types they should have.
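
As a rough illustration of what "data types they should have" means, the sketch below walks a schema like the one above and reports values whose type does not match. It is a deliberately small, hand-written check covering only `object`, `array`, `string`, and `number`. It is not how DocuTray validates data and ignores keywords such as `format` and `required`. The `check_types` name is hypothetical.

```python
def check_types(schema: dict, value, path: str = "$") -> list:
    """Collect type mismatches for a small subset of JSON Schema."""
    expected = schema.get("type")
    errors = []
    if expected == "object":
        if not isinstance(value, dict):
            return [f"{path}: expected object"]
        for name, sub_schema in schema.get("properties", {}).items():
            if name in value:  # missing fields are not treated as errors here
                errors += check_types(sub_schema, value[name], f"{path}.{name}")
    elif expected == "array":
        if not isinstance(value, list):
            return [f"{path}: expected array"]
        item_schema = schema.get("items", {})
        for index, item in enumerate(value):
            errors += check_types(item_schema, item, f"{path}[{index}]")
    elif expected == "string":
        if not isinstance(value, str):
            errors.append(f"{path}: expected string")
    elif expected == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{path}: expected number")
    return errors


schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "line_items": {
            "type": "array",
            "items": {"type": "object", "properties": {"quantity": {"type": "number"}}},
        },
        "total": {"type": "number"},
    },
}
data = {"invoice_number": "INV-001", "line_items": [{"quantity": "2"}], "total": 100}
print(check_types(schema, data))
```

In this example the only reported mismatch is the string `"2"` in the first line item's `quantity`.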

## Parameters [#parameters]

### List Parameters [#list-parameters]

| Parameter | Type    | Required | Description                          |
| --------- | ------- | -------- | ------------------------------------ |
| `search`  | string  | No       | Search by name, code, or description |
| `page`    | integer | No       | Page number (default: 1)             |
| `limit`   | integer | No       | Items per page, 1-100 (default: 20)  |
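
For REST callers, these parameters translate into a query string. The helper below is a sketch under two assumptions: that the list endpoint is `GET /api/document-types` (the same path used by the other examples on this page) and that out-of-range `limit` values should be rejected on the client. `build_list_url` is a hypothetical name.

```python
from urllib.parse import urlencode

# Assumed list endpoint, inferred from the other examples on this page.
BASE_URL = "https://app.docutray.com/api/document-types"


def build_list_url(search=None, page: int = 1, limit: int = 20) -> str:
    """Build a list URL using the documented defaults and limit range."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    params = {"page": page, "limit": limit}
    if search:
        params["search"] = search  # urlencode escapes spaces and symbols
    return f"{BASE_URL}?{urlencode(params)}"


print(build_list_url(search="purchase order", page=2, limit=50))
```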

### Get Parameters [#get-parameters]

| Parameter | Type   | Required | Description      |
| --------- | ------ | -------- | ---------------- |
| `id`      | string | Yes      | Document type ID |

### Create Parameters (Request Body) [#create-parameters-request-body]

| Parameter              | Type    | Required | Description                                                                         |
| ---------------------- | ------- | -------- | ----------------------------------------------------------------------------------- |
| `name`                 | string  | Yes      | Document type name (min 2 characters)                                               |
| `codeType`             | string  | Yes      | Unique code identifier (`^[a-z0-9_]+$`). Non-admin users must prefix with org slug. |
| `description`          | string  | Yes      | Document type description                                                           |
| `jsonSchema`           | object  | Yes      | JSON Schema for document validation                                                 |
| `isDraft`              | boolean | No       | Whether the document type is a draft (default: `true`)                              |
| `promptHints`          | string  | No       | Hints for the OCR prompt                                                            |
| `identifyPromptHints`  | string  | No       | Hints for the document identification prompt                                        |
| `conversionMode`       | string  | No       | Conversion mode: `json`, `toon`, or `multi_prompt` (default: `json`)                |
| `keepPropertyOrdering` | boolean | No       | Preserve property ordering in schema (default: `false`)                             |
| `isPublic`             | boolean | No       | Whether the document type is public (admin only)                                    |
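
The constraints in this table can be applied before a request is sent. The sketch below assembles a request body and fills in the documented defaults explicitly. `build_create_body` is a hypothetical helper, not an SDK function, and the server remains the authority on what is accepted.

```python
import re

CONVERSION_MODES = {"json", "toon", "multi_prompt"}
OPTIONAL_FIELDS = {
    "isDraft", "promptHints", "identifyPromptHints",
    "conversionMode", "keepPropertyOrdering", "isPublic",
}


def build_create_body(name, code_type, description, json_schema, **optional) -> dict:
    """Assemble a create request body, applying the documented constraints."""
    if len(name) < 2:
        raise ValueError("name must have at least 2 characters")
    if not re.fullmatch(r"[a-z0-9_]+", code_type):
        raise ValueError("codeType must match ^[a-z0-9_]+$")
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    body = {
        "name": name,
        "codeType": code_type,
        "description": description,
        "jsonSchema": json_schema,
        # Documented defaults, written out for clarity.
        "isDraft": True,
        "conversionMode": "json",
        "keepPropertyOrdering": False,
    }
    body.update(optional)
    if body["conversionMode"] not in CONVERSION_MODES:
        raise ValueError("conversionMode must be json, toon, or multi_prompt")
    return body
```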

### Update Parameters [#update-parameters]

#### Path Parameters [#path-parameters]

| Parameter | Type   | Required | Description      |
| --------- | ------ | -------- | ---------------- |
| `id`      | string | Yes      | Document type ID |

#### Request Body [#request-body]

All fields are optional. The `codeType` field is immutable and cannot be changed.

| Parameter              | Type    | Required | Description                                        |
| ---------------------- | ------- | -------- | -------------------------------------------------- |
| `name`                 | string  | No       | Document type name                                 |
| `description`          | string  | No       | Document type description                          |
| `jsonSchema`           | object  | No       | JSON Schema for document validation                |
| `isDraft`              | boolean | No       | Whether the document type is a draft               |
| `promptHints`          | string  | No       | Hints for the OCR prompt                           |
| `identifyPromptHints`  | string  | No       | Hints for the document identification prompt       |
| `conversionMode`       | string  | No       | Conversion mode: `json`, `toon`, or `multi_prompt` |
| `keepPropertyOrdering` | boolean | No       | Preserve property ordering in schema               |
| `isPublic`             | boolean | No       | Whether the document type is public (admin only)   |

## Complete Code [#complete-code]

<Tabs groupId="lang" items="['Python', 'Node.js']">
  <Tab value="Python">
    ```python
    from docutray import Client, NotFoundError, ConflictError, DocuTrayError

    client = Client(api_key="YOUR_API_KEY")

    try:
        # List available document types
        print("Available document types:")
        for doc_type in client.document_types.list().auto_paging_iter():
            status = "published" if not doc_type.isDraft else "draft"
            scope = "public" if doc_type.isPublic else "private"
            print(f"  [{status}/{scope}] {doc_type.name} ({doc_type.codeType})")

        # Get details for a specific type
        invoice_type = client.document_types.get("dt_abc123")
        print(f"\nSchema for {invoice_type.name}:")
        print(invoice_type.schema_)

        # Create a new document type
        new_type = client.document_types.create(
            name="Purchase Order",
            code_type="myorg_purchase_order",
            description="Standard purchase order document",
            json_schema={
                "type": "object",
                "properties": {
                    "po_number": {"type": "string"},
                    "total": {"type": "number"},
                },
            },
        )
        print(f"\nCreated: {new_type.name} ({new_type.codeType})")

        # Update the document type
        updated = client.document_types.update(
            new_type.id,
            name="Updated Purchase Order",
            is_draft=False,
        )
        print(f"Updated: {updated.name}, Status: {updated.status}")

    except ConflictError:
        print("Document type with this codeType already exists")
    except NotFoundError:
        print("Document type not found")
    except DocuTrayError as e:
        print(f"Error: {e.message}")
    finally:
        client.close()
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    import DocuTray, { NotFoundError, ConflictError, DocuTrayError } from 'docutray';

    const client = new DocuTray({ apiKey: 'YOUR_API_KEY' });

    try {
      // List available document types
      console.log('Available document types:');
      for await (const docType of client.documentTypes.list().autoPagingIter()) {
        const status = docType.isDraft ? 'draft' : 'published';
        const scope = docType.isPublic ? 'public' : 'private';
        console.log(`  [${status}/${scope}] ${docType.name} (${docType.codeType})`);
      }

      // Get details for a specific type
      const invoiceType = await client.documentTypes.get('dt_abc123');
      console.log(`\nSchema for ${invoiceType.name}:`);
      console.log(JSON.stringify(invoiceType.schema, null, 2));

      // Create a new document type
      const newType = await client.documentTypes.create({
        name: 'Purchase Order',
        codeType: 'myorg_purchase_order',
        description: 'Standard purchase order document',
        jsonSchema: {
          type: 'object',
          properties: {
            po_number: { type: 'string' },
            total: { type: 'number' },
          },
        },
      });
      console.log(`\nCreated: ${newType.name} (${newType.codeType})`);

      // Update the document type
      const updated = await client.documentTypes.update(newType.id, {
        name: 'Updated Purchase Order',
        isDraft: false,
      });
      console.log(`Updated: ${updated.name}, Status: ${updated.status}`);
    } catch (error) {
      if (error instanceof ConflictError) {
        console.error('Document type with this codeType already exists');
      } else if (error instanceof NotFoundError) {
        console.error('Document type not found');
      } else if (error instanceof DocuTrayError) {
        console.error(`Error: ${error.message}`);
      } else {
        throw error; // Not a DocuTray error: let it propagate
      }
    }
    ```
  </Tab>
</Tabs>

## SDK Reference [#sdk-reference]

For detailed class and method documentation:

* [Python SDK Document Types Reference](/docs/python-sdk/resources/document_types)
* [Node.js SDK Document Types Reference](/docs/node-sdk/resources/document-types)
* [REST API Reference](/docs/api)


---

# Identify Documents (https://docs.docutray.com/docs/operations/identify)





The Identify operation automatically detects which type of document you have. Given a document and a list of possible types, DocuTray returns the best match with a confidence score and ranked alternatives.

## Quick Start [#quick-start]

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```python
    from pathlib import Path
    from docutray import Client

    client = Client(api_key="YOUR_API_KEY")

    result = client.identify.run(
        file=Path("document.pdf"),
        document_type_code_options=["invoice", "receipt", "contract"]
    )

    print(f"Type: {result.document_type.name}")
    print(f"Confidence: {result.document_type.confidence:.0%}")
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    import DocuTray from 'docutray';
    import { readFileSync } from 'fs';

    const client = new DocuTray({ apiKey: 'YOUR_API_KEY' });

    const result = await client.identify.run({
      file: readFileSync('document.pdf'),
      filename: 'document.pdf',
      documentTypeCodeOptions: ['invoice', 'receipt', 'contract'],
    });

    console.log(`Type: ${result.document_type.name}`);
    console.log(`Confidence: ${(result.document_type.confidence * 100).toFixed(0)}%`);
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    curl -X POST https://app.docutray.com/api/identify \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -F "image=@document.pdf" \
      -F 'document_type_code_options=["invoice", "receipt", "contract"]'
    ```
  </Tab>
</Tabs>

## Response [#response]

<Tabs groupId="lang" items="['Python', 'Node.js', 'JSON']">
  <Tab value="Python">
    ```python
    # result is an IdentificationResult
    print(result.document_type.code)        # "invoice"
    print(result.document_type.name)        # "Invoice"
    print(result.document_type.confidence)  # 0.95

    # View alternatives ranked by confidence
    for alt in result.alternatives:
        print(f"  {alt.name}: {alt.confidence:.0%}")
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    // result is an IdentificationResult
    console.log(result.document_type.code);        // "invoice"
    console.log(result.document_type.name);        // "Invoice"
    console.log(result.document_type.confidence);  // 0.95

    // View alternatives ranked by confidence
    for (const alt of result.alternatives) {
      console.log(`  ${alt.name}: ${(alt.confidence * 100).toFixed(0)}%`);
    }
    ```
  </Tab>

  <Tab value="JSON">
    ```json
    {
      "document_type": {
        "code": "invoice",
        "name": "Invoice",
        "confidence": 0.95
      },
      "alternatives": [
        {
          "code": "receipt",
          "name": "Receipt",
          "confidence": 0.04
        },
        {
          "code": "contract",
          "name": "Contract",
          "confidence": 0.01
        }
      ]
    }
    ```
  </Tab>
</Tabs>
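
Because the alternatives carry their own confidence scores, it is possible to check how decisively the best match won. The sketch below computes the gap between the best match and the strongest alternative from the JSON shape shown above. Treating a small gap as a sign of ambiguity is an assumption on our part, not documented DocuTray behavior, and `confidence_margin` is a hypothetical helper.

```python
def confidence_margin(response: dict) -> float:
    """Difference between the best match and the strongest alternative."""
    best = response["document_type"]["confidence"]
    alternatives = response.get("alternatives", [])
    # With no alternatives, the margin is simply the best confidence.
    runner_up = max((alt["confidence"] for alt in alternatives), default=0.0)
    return best - runner_up


response = {
    "document_type": {"code": "invoice", "name": "Invoice", "confidence": 0.95},
    "alternatives": [
        {"code": "receipt", "name": "Receipt", "confidence": 0.04},
        {"code": "contract", "name": "Contract", "confidence": 0.01},
    ],
}
print(f"Margin: {confidence_margin(response):.2f}")
```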

## Async Identification [#async-identification]

For large documents, use async identification to process them in the background.

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```python
    # Start async identification
    status = client.identify.run_async(
        file=Path("document.pdf"),
        document_type_code_options=["invoice", "receipt"]
    )

    # Wait for completion
    result = status.wait()

    if result.is_success():
        print(f"Type: {result.document_type.code}")
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    // Start async identification
    const status = await client.identify.runAsync({
      file: readFileSync('document.pdf'),
      filename: 'document.pdf',
      documentTypeCodeOptions: ['invoice', 'receipt'],
    });

    // Wait for completion
    const result = await status.wait({
      onStatus: (s) => console.log(`Status: ${s.status}`),
    });

    if (result.isSuccess()) {
      console.log(`Type: ${result.document_type.code}`);
    }
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    # Start async identification
    curl -X POST https://app.docutray.com/api/identify-async \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -F "image=@document.pdf" \
      -F 'document_type_code_options=["invoice", "receipt"]'

    # Poll for status
    curl https://app.docutray.com/api/identify-async/status/IDENTIFICATION_ID \
      -H "Authorization: Bearer YOUR_API_KEY"
    ```
  </Tab>
</Tabs>
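
The SDK `wait()` helpers handle polling for you. When calling the REST endpoints directly, a loop along the following lines could be used instead. This is a sketch under assumptions: `fetch_status` is a hypothetical callable that performs the status request and returns the parsed JSON, the `status` field name mirrors the SDK example above, and the terminal state names must be supplied by the caller because they are not listed on this page.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    terminal_states: set,
    interval: float = 1.0,
    max_interval: float = 10.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll with exponential backoff until a terminal state or the timeout."""
    waited = 0.0
    while True:
        status = fetch_status()
        if status.get("status") in terminal_states:
            return status
        if waited >= timeout:
            raise TimeoutError("identification did not finish in time")
        sleep(interval)
        waited += interval
        # Double the delay after each attempt, up to max_interval.
        interval = min(interval * 2, max_interval)
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.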

## Identify Then Convert [#identify-then-convert]

A common pattern is to first identify a document, then convert it using the detected type. This is useful when you receive documents of unknown types.

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```python
    from pathlib import Path
    from docutray import Client

    client = Client(api_key="YOUR_API_KEY")
    document = Path("unknown_document.pdf")

    # Step 1: Identify the document type
    identification = client.identify.run(
        file=document,
        document_type_code_options=["invoice", "receipt", "contract"]
    )

    detected_type = identification.document_type.code
    confidence = identification.document_type.confidence
    print(f"Detected: {detected_type} ({confidence:.0%})")

    # Step 2: Convert using the detected type
    if confidence > 0.8:
        result = client.convert.run(
            file=document,
            document_type_code=detected_type
        )
        print(result.data)
    else:
        print("Low confidence — review manually")
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    import DocuTray from 'docutray';
    import { readFileSync } from 'fs';

    const client = new DocuTray({ apiKey: 'YOUR_API_KEY' });
    const document = readFileSync('unknown_document.pdf');

    // Step 1: Identify the document type
    const identification = await client.identify.run({
      file: document,
      documentTypeCodeOptions: ['invoice', 'receipt', 'contract'],
    });

    const detectedType = identification.document_type.code;
    const confidence = identification.document_type.confidence;
    console.log(`Detected: ${detectedType} (${(confidence * 100).toFixed(0)}%)`);

    // Step 2: Convert using the detected type
    if (confidence > 0.8) {
      const result = await client.convert.run({
        file: document,
        documentTypeCode: detectedType,
      });
      console.log(result.data);
    } else {
      console.log('Low confidence — review manually');
    }
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    # Step 1: Identify the document type
    IDENTIFY_RESULT=$(curl -s -X POST https://app.docutray.com/api/identify \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -F "image=@unknown_document.pdf" \
      -F 'document_type_code_options=["invoice", "receipt", "contract"]')

    # Extract the detected type code
    DOC_TYPE=$(echo "$IDENTIFY_RESULT" | jq -r '.document_type.code')
    echo "Detected type: $DOC_TYPE"

    # Step 2: Convert using the detected type
    curl -X POST https://app.docutray.com/api/convert \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -F "image=@unknown_document.pdf" \
      -F "document_type_code=$DOC_TYPE"
    ```
  </Tab>
</Tabs>
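
When many documents arrive at once, the same threshold check can sort them into a conversion queue per detected type and a manual review list. The sketch below works on plain tuples rather than SDK objects. `split_by_confidence` and its input shape are hypothetical, and the 0.8 threshold simply mirrors the example above.

```python
def split_by_confidence(results: dict, threshold: float = 0.8):
    """Group filenames by detected type, or flag them for manual review.

    `results` maps a filename to a (type_code, confidence) pair.
    """
    to_convert = {}
    to_review = []
    for filename, (code, confidence) in results.items():
        if confidence > threshold:
            to_convert.setdefault(code, []).append(filename)
        else:
            to_review.append(filename)
    return to_convert, to_review


results = {
    "a.pdf": ("invoice", 0.95),
    "b.pdf": ("receipt", 0.55),
    "c.pdf": ("invoice", 0.88),
}
to_convert, to_review = split_by_confidence(results)
print(to_convert, to_review)
```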

## Parameters [#parameters]

| Parameter                    | Type      | Required | Description                                               |
| ---------------------------- | --------- | -------- | --------------------------------------------------------- |
| `file`                       | File      | No       | File to identify (path, bytes, or file object)            |
| `url`                        | string    | No       | Public URL of the document to download and identify       |
| `file_base64` / `base64`     | string    | No       | Base64-encoded document content                           |
| `document_type_code_options` | string\[] | Yes      | List of document type codes to consider                   |
| `content_type`               | string    | No       | MIME type of the document (auto-detected if not provided) |
| `document_metadata`          | object    | No       | Custom metadata to attach to the identification           |

<Callout type="info">
  You must provide exactly one of `file`, `url`, or `file_base64`/`base64`.
</Callout>
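
A client-side guard for this rule might look like the sketch below. `pick_input_source` is a hypothetical helper, and it treats `file_base64` and `base64` as a single option, as the table does.

```python
def pick_input_source(file=None, url=None, file_base64=None) -> str:
    """Return the name of the single input that was provided."""
    candidates = (("file", file), ("url", url), ("file_base64", file_base64))
    provided = [name for name, value in candidates if value is not None]
    if len(provided) != 1:
        raise ValueError(
            "provide exactly one of file, url, or file_base64; "
            f"got {provided or 'none'}"
        )
    return provided[0]


print(pick_input_source(url="https://example.com/document.pdf"))
```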

## Complete Code [#complete-code]

The following end-to-end example combines the identify-then-convert pattern with error handling.

<Tabs groupId="lang" items="['Python', 'Node.js']">
  <Tab value="Python">
    ```python
    from pathlib import Path
    from docutray import Client, NotFoundError, DocuTrayError

    client = Client(api_key="YOUR_API_KEY")
    DOCUMENT_TYPES = ["invoice", "receipt", "contract", "id_card"]

    try:
        document = Path("incoming_document.pdf")

        # Identify document type
        identification = client.identify.run(
            file=document,
            document_type_code_options=DOCUMENT_TYPES
        )

        best_match = identification.document_type
        print(f"Identified as: {best_match.name} ({best_match.confidence:.0%})")

        # Show alternatives if confidence is moderate
        if best_match.confidence < 0.9:
            print("Alternatives:")
            for alt in identification.alternatives:
                print(f"  - {alt.name}: {alt.confidence:.0%}")

        # Convert if confidence is sufficient
        if best_match.confidence >= 0.7:
            result = client.convert.run(
                file=document,
                document_type_code=best_match.code
            )
            print(f"Extracted {len(result.data)} fields")
        else:
            print("Confidence too low for automatic conversion")

    except NotFoundError:
        print("One or more document types not found")
    except DocuTrayError as e:
        print(f"Error: {e.message}")
    finally:
        client.close()
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    import DocuTray, { NotFoundError, DocuTrayError } from 'docutray';
    import { readFileSync } from 'fs';

    const client = new DocuTray({ apiKey: 'YOUR_API_KEY' });
    const DOCUMENT_TYPES = ['invoice', 'receipt', 'contract', 'id_card'];

    try {
      const document = readFileSync('incoming_document.pdf');

      // Identify document type
      const identification = await client.identify.run({
        file: document,
        documentTypeCodeOptions: DOCUMENT_TYPES,
      });

      const bestMatch = identification.document_type;
      console.log(`Identified as: ${bestMatch.name} (${(bestMatch.confidence * 100).toFixed(0)}%)`);

      // Show alternatives if confidence is moderate
      if (bestMatch.confidence < 0.9) {
        console.log('Alternatives:');
        for (const alt of identification.alternatives) {
          console.log(`  - ${alt.name}: ${(alt.confidence * 100).toFixed(0)}%`);
        }
      }

      // Convert if confidence is sufficient
      if (bestMatch.confidence >= 0.7) {
        const result = await client.convert.run({
          file: document,
          documentTypeCode: bestMatch.code,
        });
        console.log(`Extracted ${Object.keys(result.data).length} fields`);
      } else {
        console.log('Confidence too low for automatic conversion');
      }
    } catch (error) {
      if (error instanceof NotFoundError) {
        console.error('One or more document types not found');
      } else if (error instanceof DocuTrayError) {
        console.error(`Error: ${error.message}`);
      } else {
        throw error; // Not a DocuTray error: let it propagate
      }
    }
    ```
  </Tab>
</Tabs>

## SDK Reference [#sdk-reference]

For detailed class and method documentation:

* [Python SDK Identify Reference](/docs/python-sdk/resources/identify)
* [Node.js SDK Identify Reference](/docs/node-sdk/resources/identify)
* [REST API Reference](/docs/api)


---

# Operations (https://docs.docutray.com/docs/operations)





Operations are the core building blocks of DocuTray. Each operation provides a specific document processing capability that you can use via our SDKs or REST API.

The examples in each operation guide include tabs for **Python** and **Node.js**, plus **cURL** where a direct REST call applies, so you can get started quickly in your preferred language.

<Cards>
  <Card title="Convert Documents" description="Extract structured data from documents using AI-powered OCR. Supports sync and async modes with file upload, URL, and base64 input." href="/docs/operations/convert" />

  <Card title="Identify Documents" description="Automatically detect the document type from a set of candidates. Get confidence scores and use the identify-then-convert flow." href="/docs/operations/identify" />

  <Card title="Document Types" description="List and inspect available document types and their JSON schemas. Validate extracted data against schemas." href="/docs/operations/document-types" />

  <Card title="Steps" description="Execute predefined workflow steps to process documents through configurable pipelines." href="/docs/operations/steps" />

  <Card title="Knowledge Bases" description="Manage knowledge bases with semantic search, document storage, and embedding-powered retrieval." href="/docs/operations/knowledge-bases" />
</Cards>

## Common Patterns [#common-patterns]

All document-processing operations (Convert, Identify, Steps) share these features:

* **Sync and Async modes** — Convert and Identify support both sync and async modes; Steps are async-only. Use sync for small documents and real-time responses, and async for large files or batch processing.
* **Multiple input methods** — Upload files directly, provide a URL, or send base64-encoded content.
* **Polling with SDKs** — The Python and Node.js SDKs provide built-in polling helpers (`wait()`) for async operations.
* **Error handling** — Consistent error types across all operations with typed exceptions in both SDKs.
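
The sync/async guidance above can be expressed as a small decision helper. The sketch below is illustrative only: `chooseMode` and the 5 MB cutoff are assumptions made for the example, not part of either SDK or a documented DocuTray limit.

```typescript
type Operation = 'convert' | 'identify' | 'steps';
type Mode = 'sync' | 'async';

// Hypothetical cutoff for illustration only; no specific size limit for
// sync mode is documented here.
const ASSUMED_SYNC_MAX_BYTES = 5 * 1024 * 1024;

function chooseMode(operation: Operation, fileSizeBytes: number, isBatch = false): Mode {
  // Steps are async-only.
  if (operation === 'steps') return 'async';
  // Batch jobs and large files go async; small interactive requests go sync.
  if (isBatch || fileSizeBytes > ASSUMED_SYNC_MAX_BYTES) return 'async';
  return 'sync';
}

console.log(chooseMode('convert', 200_000));     // sync
console.log(chooseMode('identify', 20_000_000)); // async
console.log(chooseMode('steps', 200_000));       // async
```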

## SDK Reference [#sdk-reference]

For detailed class and method documentation, see the full SDK references:

* [Python SDK Reference](/docs/python-sdk)
* [Node.js SDK Reference](/docs/node-sdk)
* [REST API Reference](/docs/api)


---

# Knowledge Bases (https://docs.docutray.com/docs/operations/knowledge-bases)





Knowledge Bases let you store documents with embeddings for semantic search. Upload documents, and DocuTray automatically generates vector embeddings so you can search by meaning rather than exact keywords.

## List Knowledge Bases [#list-knowledge-bases]

Retrieve all knowledge bases in your organization.

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```python
    from docutray import Client

    client = Client(api_key="YOUR_API_KEY")

    for kb in client.knowledge_bases.list().auto_paging_iter():
        print(f"{kb.name}: {kb.documentCount} documents")
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    import DocuTray from 'docutray';

    const client = new DocuTray({ apiKey: 'YOUR_API_KEY' });

    for await (const kb of client.knowledgeBases.list().autoPagingIter()) {
      console.log(`${kb.name}: ${kb.documentCount} documents`);
    }
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    curl https://app.docutray.com/api/knowledge-bases \
      -H "Authorization: Bearer YOUR_API_KEY"

    # With filters
    curl "https://app.docutray.com/api/knowledge-bases?isActive=true&search=docs&limit=10" \
      -H "Authorization: Bearer YOUR_API_KEY"
    ```
  </Tab>
</Tabs>

### Response [#response]

<Tabs groupId="lang" items="['JSON']">
  <Tab value="JSON">
    ```json
    {
      "data": [
        {
          "id": "kb_abc123",
          "name": "Product Documentation",
          "description": "Technical docs and user guides",
          "isActive": true,
          "documentCount": 150,
          "createdAt": "2024-01-15T10:30:00.000Z",
          "updatedAt": "2024-03-10T14:20:00.000Z"
        }
      ],
      "pagination": {
        "total": 3,
        "page": 1,
        "limit": 20
      }
    }
    ```
  </Tab>
</Tabs>
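
If you page through results manually instead of using the SDK iterators, the `pagination` object is enough to tell how many pages remain. A minimal sketch, assuming `total` counts all matching knowledge bases across pages (the sample above does not state this explicitly); `totalPages` and `hasNextPage` are hypothetical helper names.

```typescript
interface Pagination {
  total: number; // assumed: total items across all pages
  page: number;  // current page, 1-based
  limit: number; // items per page
}

function totalPages(p: Pagination): number {
  return Math.ceil(p.total / p.limit);
}

function hasNextPage(p: Pagination): boolean {
  return p.page < totalPages(p);
}

// Values from the sample response above.
const samplePagination: Pagination = { total: 3, page: 1, limit: 20 };
console.log(totalPages(samplePagination));  // 1
console.log(hasNextPage(samplePagination)); // false
```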

## Get Knowledge Base [#get-knowledge-base]

Retrieve details of a specific knowledge base.

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```python
    kb = client.knowledge_bases.get("kb_abc123")

    print(f"Name: {kb.name}")
    print(f"Description: {kb.description}")
    print(f"Documents: {kb.documentCount}")
    print(f"Active: {kb.isActive}")
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    const kb = await client.knowledgeBases.get('kb_abc123');

    console.log(`Name: ${kb.name}`);
    console.log(`Description: ${kb.description}`);
    console.log(`Documents: ${kb.documentCount}`);
    console.log(`Active: ${kb.isActive}`);
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    curl https://app.docutray.com/api/knowledge-bases/kb_abc123 \
      -H "Authorization: Bearer YOUR_API_KEY"
    ```
  </Tab>
</Tabs>

## Create Knowledge Base [#create-knowledge-base]

Create a new knowledge base with an optional JSON schema for document structure.

<Tabs groupId="lang" items="['Python', 'Node.js']">
  <Tab value="Python">
    ```python
    kb = client.knowledge_bases.create(
        name="Product Documentation",
        description="Technical docs and user guides",
        schema={
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "content": {"type": "string"},
                "category": {"type": "string"}
            }
        }
    )

    print(f"Created: {kb.id}")
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    const kb = await client.knowledgeBases.create({
      name: 'Product Documentation',
      description: 'Technical docs and user guides',
      schema: {
        type: 'object',
        properties: {
          title: { type: 'string' },
          content: { type: 'string' },
          category: { type: 'string' },
        },
      },
    });

    console.log(`Created: ${kb.id}`);
    ```
  </Tab>
</Tabs>

## Semantic Search [#semantic-search]

Search a knowledge base using natural language. DocuTray converts your query to an embedding and finds the most semantically similar documents.

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```python
    results = client.knowledge_bases.search(
        "kb_abc123",
        query="how to configure authentication",
        limit=5,
        similarity_threshold=0.7,
        include_metadata=True
    )

    print(f"Found {results.resultsCount} results")

    for item in results.data:
        print(f"  [{item.similarity:.0%}] {item.document.content.get('title')}")
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    const results = await client.knowledgeBases.search('kb_abc123', {
      query: 'how to configure authentication',
      limit: 5,
    });

    console.log(`Found ${results.resultsCount} results`);

    for (const item of results.data) {
      console.log(`  [${(item.similarity * 100).toFixed(0)}%] ${item.document.content.title}`);
    }
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    curl -X POST https://app.docutray.com/api/knowledge-bases/kb_abc123/search \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "query": "how to configure authentication",
        "limit": 5
      }'
    ```
  </Tab>
</Tabs>

### Search Response [#search-response]

<Tabs groupId="lang" items="['JSON']">
  <Tab value="JSON">
    ```json
    {
      "data": [
        {
          "document": {
            "id": "doc_xyz",
            "content": {
              "title": "Authentication Setup Guide",
              "content": "To configure authentication..."
            },
            "metadata": { "category": "security" }
          },
          "similarity": 0.92
        }
      ],
      "query": "how to configure authentication",
      "resultsCount": 1
    }
    ```
  </Tab>
</Tabs>
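
Because each hit carries a `similarity` score, results can also be filtered or re-ranked on the client. The sketch below types the response after the sample JSON above and keeps only hits at or above a chosen score. The `SearchHit` and `SearchResponse` interfaces and the `topHits` helper are assumptions for illustration, not SDK exports, and the second hit is made-up sample data.

```typescript
interface SearchHit {
  document: {
    id: string;
    content: Record<string, unknown>;
    metadata?: Record<string, unknown>;
  };
  similarity: number;
}

interface SearchResponse {
  data: SearchHit[];
  query: string;
  resultsCount: number;
}

// Keep hits at or above minSimilarity, best match first.
// filter() returns a new array, so the response itself is not mutated by sort().
function topHits(response: SearchResponse, minSimilarity: number): SearchHit[] {
  return response.data
    .filter((hit) => hit.similarity >= minSimilarity)
    .sort((a, b) => b.similarity - a.similarity);
}

const sampleSearch: SearchResponse = {
  data: [
    // Invented low-scoring hit, added only to show the filter at work.
    { document: { id: 'doc_low', content: { title: 'Release Notes' } }, similarity: 0.41 },
    {
      document: {
        id: 'doc_xyz',
        content: { title: 'Authentication Setup Guide' },
        metadata: { category: 'security' },
      },
      similarity: 0.92,
    },
  ],
  query: 'how to configure authentication',
  resultsCount: 2,
};

for (const hit of topHits(sampleSearch, 0.7)) {
  console.log(`${Math.round(hit.similarity * 100)}% ${String(hit.document.content['title'])}`);
}
// 92% Authentication Setup Guide
```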

## Manage Documents [#manage-documents]

Add, list, update, and remove documents from a knowledge base.

### Add a Document [#add-a-document]

<Tabs groupId="lang" items="['Python', 'Node.js']">
  <Tab value="Python">
    ```python
    doc = client.knowledge_bases.documents("kb_abc123").create(
        content={
            "title": "Getting Started Guide",
            "content": "Welcome to our product. This guide covers..."
        },
        metadata={"source": "manual", "version": "2.0"},
        generate_embedding=True
    )

    print(f"Created document: {doc.id}")
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    const doc = await client.knowledgeBases.documents('kb_abc123').create({
      content: {
        title: 'Getting Started Guide',
        content: 'Welcome to our product. This guide covers...',
      },
      metadata: { source: 'manual', version: '2.0' },
    });

    console.log(`Created document: ${doc.id}`);
    ```
  </Tab>
</Tabs>

### List Documents [#list-documents]

<Tabs groupId="lang" items="['Python', 'Node.js']">
  <Tab value="Python">
    ```python
    docs = client.knowledge_bases.documents("kb_abc123").list()

    for doc in docs.auto_paging_iter():
        print(f"  {doc.id}: {doc.content.get('title')}")
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    const docs = await client.knowledgeBases.documents('kb_abc123').list();

    for (const doc of docs.data) {
      console.log(`  ${doc.id}: ${doc.content.title}`);
    }
    ```
  </Tab>
</Tabs>

### Update a Document [#update-a-document]

<Tabs groupId="lang" items="['Python', 'Node.js']">
  <Tab value="Python">
    ```python
    doc = client.knowledge_bases.documents("kb_abc123").update(
        "doc_xyz",
        content={"title": "Updated Guide", "content": "New content..."},
        regenerate_embedding=False
    )
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    const doc = await client.knowledgeBases.documents('kb_abc123').update('doc_xyz', {
      content: { title: 'Updated Guide', content: 'New content...' },
    });
    ```
  </Tab>
</Tabs>

### Delete a Document [#delete-a-document]

<Tabs groupId="lang" items="['Python', 'Node.js']">
  <Tab value="Python">
    ```python
    client.knowledge_bases.documents("kb_abc123").delete("doc_xyz")
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    await client.knowledgeBases.documents('kb_abc123').delete('doc_xyz');
    ```
  </Tab>
</Tabs>

## Sync Knowledge Base [#sync-knowledge-base]

Regenerate embeddings for all documents in a knowledge base. Useful after bulk updates.

<Tabs groupId="lang" items="['Python', 'Node.js']">
  <Tab value="Python">
    ```python
    result = client.knowledge_bases.sync(
        "kb_abc123",
        regenerate_embeddings=True
    )

    print(f"Status: {result.status}")
    print(f"Documents processed: {result.documentsProcessed}")
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    const result = await client.knowledgeBases.sync('kb_abc123');

    console.log(`Status: ${result.status}`);
    console.log(`Documents processed: ${result.documentsProcessed}`);
    ```
  </Tab>
</Tabs>

## Parameters [#parameters]

### List Parameters [#list-parameters]

| Parameter  | Type    | Required | Description                             |
| ---------- | ------- | -------- | --------------------------------------- |
| `isActive` | boolean | No       | Filter by active status (default: true) |
| `search`   | string  | No       | Search by name or description           |
| `page`     | integer | No       | Page number (default: 1)                |
| `limit`    | integer | No       | Items per page, 1-100 (default: 20)     |
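
When calling the REST endpoint directly, the table above translates into a query string. A minimal sketch that validates the documented ranges before building it; `buildListQuery` is a hypothetical helper, not an SDK function.

```typescript
interface ListParams {
  isActive?: boolean;
  search?: string;
  page?: number;
  limit?: number;
}

function isWholeNumber(value: number): boolean {
  return isFinite(value) && Math.floor(value) === value;
}

function buildListQuery(params: ListParams): string {
  if (params.page !== undefined && (!isWholeNumber(params.page) || params.page < 1)) {
    throw new RangeError('page must be an integer >= 1');
  }
  if (
    params.limit !== undefined &&
    (!isWholeNumber(params.limit) || params.limit < 1 || params.limit > 100)
  ) {
    throw new RangeError('limit must be an integer between 1 and 100');
  }

  // Only include parameters the caller actually set, so server defaults apply otherwise.
  const pairs: string[] = [];
  if (params.isActive !== undefined) pairs.push(`isActive=${params.isActive}`);
  if (params.search !== undefined) pairs.push(`search=${encodeURIComponent(params.search)}`);
  if (params.page !== undefined) pairs.push(`page=${params.page}`);
  if (params.limit !== undefined) pairs.push(`limit=${params.limit}`);
  return pairs.join('&');
}

console.log(buildListQuery({ isActive: true, search: 'docs', limit: 10 }));
// isActive=true&search=docs&limit=10
```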

### Search Parameters [#search-parameters]

| Parameter              | Type    | Required | Description                          |
| ---------------------- | ------- | -------- | ------------------------------------ |
| `query`                | string  | Yes      | Natural language search query        |
| `limit`                | integer | No       | Maximum results to return            |
| `similarity_threshold` | float   | No       | Minimum similarity score (0-1)       |
| `include_metadata`     | boolean | No       | Include document metadata in results |

## Complete Code [#complete-code]

End-to-end example: create a knowledge base, add documents, and search.

<Tabs groupId="lang" items="['Python', 'Node.js']">
  <Tab value="Python">
    ```python
    from docutray import Client, DocuTrayError

    client = Client(api_key="YOUR_API_KEY")

    try:
        # Create a knowledge base
        kb = client.knowledge_bases.create(
            name="FAQ Database",
            description="Frequently asked questions and answers"
        )
        print(f"Created KB: {kb.id}")

        # Add documents
        docs_client = client.knowledge_bases.documents(kb.id)

        docs_client.create(
            content={
                "question": "How do I reset my password?",
                "answer": "Go to Settings > Security > Change Password"
            },
            generate_embedding=True
        )

        docs_client.create(
            content={
                "question": "What file formats are supported?",
                "answer": "JPEG, PNG, GIF, BMP, WebP, and PDF up to 100MB"
            },
            generate_embedding=True
        )

        # Search the knowledge base
        results = client.knowledge_bases.search(
            kb.id,
            query="how to change my password",
            limit=3
        )

        print(f"\nSearch results ({results.resultsCount} found):")
        for item in results.data:
            content = item.document.content
            print(f"  [{item.similarity:.0%}] {content.get('question')}")
            print(f"    → {content.get('answer')}")

    except DocuTrayError as e:
        print(f"Error: {e.message}")
    finally:
        client.close()
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    import DocuTray, { DocuTrayError } from 'docutray';

    const client = new DocuTray({ apiKey: 'YOUR_API_KEY' });

    try {
      // Create a knowledge base
      const kb = await client.knowledgeBases.create({
        name: 'FAQ Database',
        description: 'Frequently asked questions and answers',
      });
      console.log(`Created KB: ${kb.id}`);

      // Add documents
      const docs = client.knowledgeBases.documents(kb.id);

      await docs.create({
        content: {
          question: 'How do I reset my password?',
          answer: 'Go to Settings > Security > Change Password',
        },
      });

      await docs.create({
        content: {
          question: 'What file formats are supported?',
          answer: 'JPEG, PNG, GIF, BMP, WebP, and PDF up to 100MB',
        },
      });

      // Search the knowledge base
      const results = await client.knowledgeBases.search(kb.id, {
        query: 'how to change my password',
        limit: 3,
      });

      console.log(`\nSearch results (${results.resultsCount} found):`);
      for (const item of results.data) {
        const content = item.document.content;
        console.log(`  [${(item.similarity * 100).toFixed(0)}%] ${content.question}`);
        console.log(`    → ${content.answer}`);
      }
    } catch (error) {
      if (error instanceof DocuTrayError) {
        console.error(`Error: ${error.message}`);
      } else {
        throw error; // Not an SDK error: let it propagate
      }
    }
    ```
  </Tab>
</Tabs>

## SDK Reference [#sdk-reference]

For detailed class and method documentation:

* [Python SDK Knowledge Bases Reference](/docs/python-sdk/resources/knowledge_bases)
* [Node.js SDK Knowledge Bases Reference](/docs/node-sdk/resources/knowledge-bases)
* [REST API Reference](/docs/api)


---

# Steps (https://docs.docutray.com/docs/operations/steps)





Steps are preconfigured document processing pipelines. Each step defines a specific processing workflow — you provide a document and DocuTray executes the step's pipeline, returning structured results. Steps are always executed asynchronously.

## Quick Start [#quick-start]

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```python
    from pathlib import Path
    from docutray import Client

    client = Client(api_key="YOUR_API_KEY")

    # Execute a step
    status = client.steps.run_async(
        step_id="step_abc123",
        file=Path("document.pdf")
    )

    # Wait for completion
    result = status.wait()

    if result.is_success():
        print(result.data)
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    import DocuTray from 'docutray';
    import { readFileSync } from 'fs';

    const client = new DocuTray({ apiKey: 'YOUR_API_KEY' });

    // Execute a step
    const status = await client.steps.runAsync({
      stepId: 'step_abc123',
      file: readFileSync('document.pdf'),
      filename: 'document.pdf',
    });

    // Wait for completion
    const result = await status.wait();

    if (result.isSuccess()) {
      console.log(result.data);
    }
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    # Start step execution
    curl -X POST https://app.docutray.com/api/steps-async/step_abc123 \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -F "image=@document.pdf"

    # Response:
    # {
    #   "execution_id": "exec_abc123",
    #   "status": "ENQUEUED"
    # }

    # Poll for status
    curl https://app.docutray.com/api/steps-async/status/exec_abc123 \
      -H "Authorization: Bearer YOUR_API_KEY"
    ```
  </Tab>
</Tabs>

## Response [#response]

<Tabs groupId="lang" items="['Python', 'Node.js', 'JSON']">
  <Tab value="Python">
    ```python
    # status is a StepExecutionStatus (values shown are for a completed execution)
    print(status.execution_id)        # "exec_abc123"
    print(status.status)              # "SUCCESS"
    print(status.data)                # Extracted data
    print(status.original_filename)   # "document.pdf"
    print(status.request_timestamp)   # When execution started
    print(status.response_timestamp)  # When execution completed
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    // status is a StepExecutionStatus (values shown are for a completed execution)
    console.log(status.id);                   // "exec_abc123"
    console.log(status.status);               // "SUCCESS"
    console.log(status.data);                 // Extracted data
    console.log(status.original_filename);    // "document.pdf"
    console.log(status.request_timestamp);    // When execution started
    console.log(status.response_timestamp);   // When execution completed
    ```
  </Tab>

  <Tab value="JSON">
    ```json
    {
      "execution_id": "exec_abc123",
      "status": "SUCCESS",
      "data": {
        "invoice_number": "INV-2024-001",
        "total": 1160.00
      },
      "original_filename": "document.pdf",
      "request_timestamp": "2024-01-15T10:30:00.000Z",
      "response_timestamp": "2024-01-15T10:30:45.000Z"
    }
    ```
  </Tab>
</Tabs>
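
The two timestamps make it easy to measure how long an execution took. A small sketch using the field names from the JSON response above; `executionSeconds` is a hypothetical helper.

```typescript
interface StepTimestamps {
  request_timestamp: string;  // ISO 8601, when execution started
  response_timestamp: string; // ISO 8601, when execution completed
}

function executionSeconds(t: StepTimestamps): number {
  const startedMs = Date.parse(t.request_timestamp);
  const finishedMs = Date.parse(t.response_timestamp);
  if (isNaN(startedMs) || isNaN(finishedMs)) {
    throw new Error('timestamps must be valid ISO 8601 strings');
  }
  return (finishedMs - startedMs) / 1000;
}

console.log(
  executionSeconds({
    request_timestamp: '2024-01-15T10:30:00.000Z',
    response_timestamp: '2024-01-15T10:30:45.000Z',
  }),
); // 45
```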

## Polling and Status [#polling-and-status]

Steps are always asynchronous. You can poll for status manually or use the SDK's built-in `wait()` method.

### Using wait() (Recommended) [#using-wait-recommended]

<Tabs groupId="lang" items="['Python', 'Node.js']">
  <Tab value="Python">
    ```python
    status = client.steps.run_async(
        step_id="step_abc123",
        file=Path("document.pdf")
    )

    # Wait with automatic polling
    result = status.wait()

    if result.is_success():
        print("Step completed successfully")
        print(result.data)
    elif result.is_error():
        print(f"Step failed: {result.error}")
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    const status = await client.steps.runAsync({
      stepId: 'step_abc123',
      file: readFileSync('document.pdf'),
    });

    // Wait with status callback
    const result = await status.wait({
      onStatus: (s) => console.log(`Status: ${s.status}`),
      pollInterval: 2000,
      timeout: 300_000,
    });

    if (result.isSuccess()) {
      console.log('Step completed successfully');
      console.log(result.data);
    } else if (result.isFailed()) {
      console.log(`Step failed: ${result.error}`);
    }
    ```
  </Tab>
</Tabs>

### Manual Polling [#manual-polling]

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```python
    status = client.steps.get_status("exec_abc123")

    if status.is_success():
        print(status.data)
    elif status.status in ("ENQUEUED", "PROCESSING"):
        print("Still processing...")
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    const status = await client.steps.getStatus('exec_abc123');

    if (status.isSuccess()) {
      console.log(status.data);
    } else if (status.status === 'ENQUEUED' || status.status === 'PROCESSING') {
      console.log('Still processing...');
    }
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    curl https://app.docutray.com/api/steps-async/status/exec_abc123 \
      -H "Authorization: Bearer YOUR_API_KEY"

    # Status transitions: ENQUEUED → PROCESSING → SUCCESS | ERROR
    ```
  </Tab>
</Tabs>
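
For reference, a hand-written polling loop built on the status transitions above might look like the following. This is a sketch, not the SDK's `wait()` implementation: `fetchStepStatus` is a caller-supplied placeholder (for example, a wrapper around `client.steps.getStatus`), and the 2-second interval and 5-minute timeout are borrowed from the `wait()` example earlier on this page rather than documented defaults.

```typescript
type StepStatus = 'ENQUEUED' | 'PROCESSING' | 'SUCCESS' | 'ERROR';
type PollAction = 'done' | 'wait' | 'timeout';

// SUCCESS and ERROR are terminal; ENQUEUED and PROCESSING mean "keep polling".
function nextPollAction(current: StepStatus, elapsedMs: number, timeoutMs: number): PollAction {
  if (current === 'SUCCESS' || current === 'ERROR') return 'done';
  return elapsedMs >= timeoutMs ? 'timeout' : 'wait';
}

// fetchStepStatus is a placeholder supplied by the caller; it should resolve
// to the current status string of one execution.
async function pollStep(
  fetchStepStatus: () => Promise<StepStatus>,
  intervalMs = 2000,
  timeoutMs = 300000,
): Promise<StepStatus> {
  const startedAt = Date.now();
  for (;;) {
    const current = await fetchStepStatus();
    const action = nextPollAction(current, Date.now() - startedAt, timeoutMs);
    if (action === 'done') return current;
    if (action === 'timeout') throw new Error(`Timed out while status was ${current}`);
    // Sleep before the next status check.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Keeping the decision in a pure function (`nextPollAction`) makes the terminal-state logic easy to unit test without timers or network calls.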

## Input Methods [#input-methods]

Steps support the same input methods as Convert and Identify: file upload, URL, and base64.

<Tabs groupId="lang" items="['Python', 'Node.js', 'cURL']">
  <Tab value="Python">
    ```python
    import base64
    from pathlib import Path

    # File upload
    status = client.steps.run_async(
        step_id="step_abc123",
        file=Path("document.pdf")
    )

    # URL
    status = client.steps.run_async(
        step_id="step_abc123",
        url="https://example.com/document.pdf"
    )

    # Base64
    with open("document.pdf", "rb") as f:
        encoded = base64.b64encode(f.read()).decode()

    status = client.steps.run_async(
        step_id="step_abc123",
        file_base64=encoded,
        content_type="application/pdf"
    )
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    // File upload
    const status = await client.steps.runAsync({
      stepId: 'step_abc123',
      file: readFileSync('document.pdf'),
      filename: 'document.pdf',
    });

    // URL
    const status2 = await client.steps.runAsync({
      stepId: 'step_abc123',
      url: 'https://example.com/document.pdf',
    });

    // Base64
    const encoded = readFileSync('document.pdf').toString('base64');
    const status3 = await client.steps.runAsync({
      stepId: 'step_abc123',
      base64: encoded,
      contentType: 'application/pdf',
    });
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    # File upload
    curl -X POST https://app.docutray.com/api/steps-async/step_abc123 \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -F "image=@document.pdf"

    # URL
    curl -X POST https://app.docutray.com/api/steps-async/step_abc123 \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"image_url": "https://example.com/document.pdf"}'

    # Base64 (newlines stripped so the JSON string stays valid)
    curl -X POST https://app.docutray.com/api/steps-async/step_abc123 \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d "{\"image_base64\": \"$(base64 < document.pdf | tr -d '\n')\", \"image_content_type\": \"application/pdf\"}"
    ```
  </Tab>
</Tabs>

## Parameters [#parameters]

Where two names are shown, the first is the Python SDK keyword argument and the second is the Node.js SDK option.

| Parameter                                | Type   | Required | Description                                   |
| ---------------------------------------- | ------ | -------- | --------------------------------------------- |
| `step_id` / `stepId`                     | string | Yes      | ID of the step to execute                     |
| `file`                                   | File   | No       | File to process (path, bytes, or file object) |
| `url`                                    | string | No       | Public URL of the document                    |
| `file_base64` / `base64`                 | string | No       | Base64-encoded document content               |
| `content_type` / `contentType`           | string | No       | MIME type (auto-detected if not provided)     |
| `document_metadata` / `documentMetadata` | object | No       | Custom metadata returned in status responses  |

<Callout type="info">
  You must provide exactly one of `file`, `url`, or `file_base64`/`base64`.
</Callout>
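
A client-side guard can catch a bad combination before the request is sent. The sketch below uses the Node.js option names from the examples above; `resolveInputKind` is a hypothetical helper, and the real SDKs may report this condition differently.

```typescript
interface StepInput {
  file?: unknown; // Blob, Buffer, and so on in the real SDK
  url?: string;
  base64?: string;
}

type InputKind = 'file' | 'url' | 'base64';

function resolveInputKind(input: StepInput): InputKind {
  let found: InputKind | null = null;
  let providedCount = 0;
  if (input.file !== undefined) { found = 'file'; providedCount++; }
  if (input.url !== undefined) { found = 'url'; providedCount++; }
  if (input.base64 !== undefined) { found = 'base64'; providedCount++; }

  // Zero inputs and multiple inputs are both rejected.
  if (providedCount !== 1 || found === null) {
    throw new Error(`Expected exactly one of file, url, or base64; got ${providedCount}`);
  }
  return found;
}

console.log(resolveInputKind({ url: 'https://example.com/document.pdf' })); // url
```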

## Complete Code [#complete-code]

<Tabs groupId="lang" items="['Python', 'Node.js']">
  <Tab value="Python">
    ```python
    from pathlib import Path
    from docutray import Client, NotFoundError, DocuTrayError

    client = Client(api_key="YOUR_API_KEY")

    try:
        # Execute a step with metadata
        status = client.steps.run_async(
            step_id="step_abc123",
            file=Path("invoice.pdf"),
            document_metadata={"source": "email", "customer_id": "cust_456"}
        )

        print(f"Execution started: {status.execution_id}")

        # Wait for completion
        result = status.wait()

        if result.is_success():
            print("Step completed!")
            print(f"Result: {result.data}")
        elif result.is_error():
            print(f"Step failed: {result.error}")

    except NotFoundError:
        print("Step not found — check the step ID")
    except DocuTrayError as e:
        print(f"Error: {e.message}")
    finally:
        client.close()
    ```
  </Tab>

  <Tab value="Node.js">
    ```typescript
    import DocuTray, { NotFoundError, DocuTrayError } from 'docutray';
    import { readFileSync } from 'fs';

    const client = new DocuTray({ apiKey: 'YOUR_API_KEY' });

    try {
      // Execute a step with metadata
      const status = await client.steps.runAsync({
        stepId: 'step_abc123',
        file: readFileSync('invoice.pdf'),
        filename: 'invoice.pdf',
        documentMetadata: { source: 'email', customer_id: 'cust_456' },
      });

      console.log(`Execution started: ${status.id}`);

      // Wait for completion
      const result = await status.wait({
        onStatus: (s) => console.log(`Status: ${s.status}`),
      });

      if (result.isSuccess()) {
        console.log('Step completed!');
        console.log('Result:', result.data);
      } else if (result.isFailed()) {
        console.log(`Step failed: ${result.error}`);
      }
    } catch (error) {
      if (error instanceof NotFoundError) {
        console.error('Step not found — check the step ID');
      } else if (error instanceof DocuTrayError) {
        console.error(`Error: ${error.message}`);
      } else {
        throw error; // Not an SDK error: let it propagate
      }
    }
    ```
  </Tab>
</Tabs>

## SDK Reference [#sdk-reference]

For detailed class and method documentation:

* [Python SDK Steps Reference](/docs/python-sdk/resources/steps)
* [Node.js SDK Steps Reference](/docs/node-sdk/resources/steps)
* [REST API Reference](/docs/api)


---

# Client (https://docs.docutray.com/docs/node-sdk/client)



<InstallBar
  pkg="docutray"
  version="v0.1.1-rc.0"
  commands="[
  { command: 'npm install docutray' },
  { command: 'pnpm add docutray' },
]"
/>

## DocuTray [#docutray]

The main client class for interacting with the DocuTray API. Provides access to all API resources through typed properties.

<Tabs items="['Environment variable', 'Explicit key', 'Custom config']">
  <Tab value="Environment variable">
    ```ts
    import DocuTray from 'docutray';

    // Using environment variable (DOCUTRAY_API_KEY)
    const client = new DocuTray();
    ```
  </Tab>

  <Tab value="Explicit key">
    ```ts
    import DocuTray from 'docutray';

    const client = new DocuTray({ apiKey: 'dt_my-api-key' });
    ```
  </Tab>

  <Tab value="Custom config">
    ```ts
    import DocuTray from 'docutray';

    const client = new DocuTray({
      apiKey: 'dt_my-api-key',
      timeout: 30_000,
      maxRetries: 3,
    });
    ```
  </Tab>
</Tabs>

### Resources [#resources]

Each property on a `DocuTray` instance exposes a typed resource client. Resource methods return typed promises and accept request-scoped overrides.

<ResourceGrid>
  <ResourceCard name="client.convert" type="Convert" href="/docs/node-sdk/resources/convert" description="Document conversion operations — sync, async, status polling." />

  <ResourceCard name="client.identify" type="Identify" href="/docs/node-sdk/resources/identify" description="Detect document type from file or URL using your registered taxonomy." />

  <ResourceCard name="client.documentTypes" type="DocumentTypes" href="/docs/node-sdk/resources/document-types" description="List, get and manage your organization's document type catalog." />

  <ResourceCard name="client.steps" type="Steps" href="/docs/node-sdk/resources/steps" description="Execute individual workflow steps asynchronously." />

  <ResourceCard name="client.knowledgeBases" type="KnowledgeBases" href="/docs/node-sdk/resources/knowledge-bases" description="Manage knowledge bases for retrieval-augmented operations." />
</ResourceGrid>

## ClientOptions [#clientoptions]

Configuration options for the DocuTray client.

<TypeCard path="../../vendor/docutray-node/src/core/types.ts" name="ClientOptions" />

## RequestOptions [#requestoptions]

Per-request options that override client-level defaults.

<TypeCard path="../../vendor/docutray-node/src/core/types.ts" name="RequestOptions" />

## RetryConfig [#retryconfig]

Configuration for the exponential backoff retry strategy.

<TypeCard path="../../vendor/docutray-node/src/core/types.ts" name="RetryConfig" />
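
To make the strategy concrete, the sketch below computes a capped exponential delay schedule. The field names (`initialDelayMs`, `maxDelayMs`, `multiplier`) are assumptions chosen for illustration and may not match the actual `RetryConfig` fields; jitter, which production retry loops usually add, is omitted.

```typescript
// Illustrative shape only; consult the RetryConfig type for the real options.
interface BackoffSketch {
  maxRetries: number;
  initialDelayMs: number;
  maxDelayMs: number;
  multiplier: number;
}

// Delay before retry n (0-based) is initialDelayMs * multiplier^n, capped at maxDelayMs.
function backoffDelays(cfg: BackoffSketch): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < cfg.maxRetries; attempt++) {
    const uncapped = cfg.initialDelayMs * Math.pow(cfg.multiplier, attempt);
    delays.push(Math.min(uncapped, cfg.maxDelayMs));
  }
  return delays;
}

console.log(
  backoffDelays({ maxRetries: 4, initialDelayMs: 500, maxDelayMs: 2500, multiplier: 2 }),
);
// [ 500, 1000, 2000, 2500 ]
```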

## File Input Types [#file-input-types]

### FileInput [#fileinput]

Accepted file input types for document uploads: `Blob | Buffer | ArrayBuffer | FileWithMetadata`.

<Callout type="info">
  When passing a raw `Blob` or `Buffer` without a `filename`, DocuTray
  infers the MIME type from the content. For best accuracy, prefer
  `FileWithMetadata`.
</Callout>

### FileWithMetadata [#filewithmetadata]

A file with explicit filename and optional content type.

<TypeCard path="../../vendor/docutray-node/src/core/types.ts" name="FileWithMetadata" />
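
If you only have a filename, one option is to derive the content type from its extension before building a `FileWithMetadata`. A minimal sketch, assuming a small extension table that covers the formats mentioned elsewhere in these docs; `guessContentType` is a hypothetical helper, not an SDK export.

```typescript
// Assumed extension-to-MIME table; extend it for other formats you upload.
const ASSUMED_MIME_BY_EXTENSION: { [ext: string]: string } = {
  jpg: 'image/jpeg',
  jpeg: 'image/jpeg',
  png: 'image/png',
  gif: 'image/gif',
  bmp: 'image/bmp',
  webp: 'image/webp',
  pdf: 'application/pdf',
};

function guessContentType(filename: string): string | undefined {
  const dot = filename.lastIndexOf('.');
  // No extension, or a trailing dot: nothing to look up.
  if (dot < 0 || dot === filename.length - 1) return undefined;
  const ext = filename.slice(dot + 1).toLowerCase();
  return Object.prototype.hasOwnProperty.call(ASSUMED_MIME_BY_EXTENSION, ext)
    ? ASSUMED_MIME_BY_EXTENSION[ext]
    : undefined;
}

console.log(guessContentType('invoice.PDF')); // application/pdf
console.log(guessContentType('README'));      // undefined
```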


---

# Errors (https://docs.docutray.com/docs/node-sdk/errors)



## Error Hierarchy [#error-hierarchy]

All errors thrown by the SDK extend `DocuTrayError`. API errors include HTTP status code information and response details.

```
DocuTrayError
├── APIConnectionError
│   └── APITimeoutError
└── APIError
    ├── BadRequestError (400)
    ├── AuthenticationError (401)
    ├── PermissionDeniedError (403)
    ├── NotFoundError (404)
    ├── ConflictError (409)
    ├── UnprocessableEntityError (422)
    ├── RateLimitError (429)
    └── InternalServerError (5xx)
```

## Usage [#usage]

```ts
import DocuTray, {
  DocuTrayError,
  APIError,
  AuthenticationError,
  RateLimitError,
} from 'docutray';

const client = new DocuTray();

try {
  await client.convert.run({ documentTypeCode: 'invoice', url: '...' });
} catch (err) {
  if (err instanceof RateLimitError) {
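    // retryAfter is undefined when the response carries no retry hint
    const delay = err.retryAfter !== undefined ? `${err.retryAfter}s` : 'a short delay';
    console.log(`Rate limited. Retry after ${delay}`);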
  } else if (err instanceof AuthenticationError) {
    console.log('Invalid API key');
  } else if (err instanceof APIError) {
    console.log(`API error ${err.statusCode}: ${err.message}`);
    console.log('Request ID:', err.requestId);
  } else if (err instanceof DocuTrayError) {
    console.log('SDK error:', err.message);
  } else {
    throw err; // Not an SDK error: let it propagate
  }
}
```

## DocuTrayError [#docutrayerror]

Base error class for all SDK errors.

| Property  | Type     | Description      |
| --------- | -------- | ---------------- |
| `message` | `string` | Error message    |
| `name`    | `string` | Error class name |

## APIConnectionError [#apiconnectionerror]

Thrown when the SDK cannot establish a connection to the API.

| Property | Type      | Description                                             |
| -------- | --------- | ------------------------------------------------------- |
| `cause`  | `unknown` | The underlying error that caused the connection failure |

## APITimeoutError [#apitimeouterror]

Thrown when a request exceeds the configured timeout or is aborted. Extends `APIConnectionError`.

## APIError [#apierror]

Thrown when the API returns a non-success HTTP status code.

| Property     | Type                  | Description                     |
| ------------ | --------------------- | ------------------------------- |
| `statusCode` | `number`              | The HTTP status code            |
| `requestId`  | `string \| undefined` | The `x-request-id` header value |
| `body`       | `unknown`             | The parsed response body        |
| `headers`    | `Headers`             | The raw response headers        |

### Status-Specific Errors [#status-specific-errors]

| Error Class                | HTTP Status | Description                |
| -------------------------- | ----------- | -------------------------- |
| `BadRequestError`          | 400         | Invalid request parameters |
| `AuthenticationError`      | 401         | Invalid or missing API key |
| `PermissionDeniedError`    | 403         | Insufficient permissions   |
| `NotFoundError`            | 404         | Resource not found         |
| `ConflictError`            | 409         | Resource conflict          |
| `UnprocessableEntityError` | 422         | Validation errors          |
| `RateLimitError`           | 429         | Rate limit exceeded        |
| `InternalServerError`      | 5xx         | Server-side errors         |
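
The mapping in the table can be read as a simple lookup. The sketch below returns class names as strings so it runs without the SDK; falling back to `APIError` for unlisted status codes is an assumption based on the hierarchy above, not confirmed SDK behavior.

```typescript
function errorClassForStatus(statusCode: number): string {
  switch (statusCode) {
    case 400: return 'BadRequestError';
    case 401: return 'AuthenticationError';
    case 403: return 'PermissionDeniedError';
    case 404: return 'NotFoundError';
    case 409: return 'ConflictError';
    case 422: return 'UnprocessableEntityError';
    case 429: return 'RateLimitError';
    default:
      // Any 5xx maps to InternalServerError; everything else is assumed
      // to surface as the base APIError.
      return statusCode >= 500 && statusCode <= 599 ? 'InternalServerError' : 'APIError';
  }
}

console.log(errorClassForStatus(404)); // NotFoundError
console.log(errorClassForStatus(503)); // InternalServerError
```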

## RateLimitError [#ratelimiterror]

Includes additional rate-limit metadata extracted from response headers.

| Property     | Type                  | Description                                |
| ------------ | --------------------- | ------------------------------------------ |
| `retryAfter` | `number \| undefined` | Seconds to wait before retrying            |
| `limitType`  | `string \| undefined` | Type of rate limit hit                     |
| `limit`      | `number \| undefined` | Maximum requests allowed in current window |
| `remaining`  | `number \| undefined` | Requests remaining in current window       |
| `resetTime`  | `Date \| undefined`   | When the rate limit window resets          |


---

# Node.js SDK (https://docs.docutray.com/docs/node-sdk)



The official Node.js library for the [DocuTray API](https://docutray.com), providing access to document processing capabilities including OCR, document identification, data extraction, and knowledge bases.

## Installation [#installation]

```bash
npm install docutray
```

Requires Node.js 20+.

## Quick Start [#quick-start]

```typescript
import DocuTray from 'docutray';
import { readFileSync } from 'fs';

const client = new DocuTray({ apiKey: 'your-api-key' });

// Convert a document
const result = await client.convert.run({
  file: readFileSync('invoice.pdf'),
  documentTypeCode: 'invoice',
});

console.log(result.data);
```

### Async Conversion [#async-conversion]

For large documents, use async conversion with polling:

```typescript
const status = await client.convert.runAsync({
  file: readFileSync('large_document.pdf'),
  documentTypeCode: 'invoice',
});

// Poll for completion
const final = await status.wait();
if (final.isSuccess()) {
  console.log(final.data);
}
```

## Configuration [#configuration]

<CodeBlockTabs defaultValue="API Key">
  <CodeBlockTabsList>
    <CodeBlockTabsTrigger value="API Key">
      API Key
    </CodeBlockTabsTrigger>

    <CodeBlockTabsTrigger value="Timeout">
      Timeout
    </CodeBlockTabsTrigger>

    <CodeBlockTabsTrigger value="Retries">
      Retries
    </CodeBlockTabsTrigger>
  </CodeBlockTabsList>

  <CodeBlockTab value="API Key">
    ```typescript
    // Via constructor
    const client = new DocuTray({ apiKey: 'your-api-key' });

    // Via environment variable (DOCUTRAY_API_KEY)
    const clientFromEnv = new DocuTray();
    ```
  </CodeBlockTab>

  <CodeBlockTab value="Timeout">
    ```typescript
    const client = new DocuTray({
      apiKey: 'your-api-key',
      timeout: 30_000, // 30 seconds
    });
    ```
  </CodeBlockTab>

  <CodeBlockTab value="Retries">
    ```typescript
    // Default: 2 retries with exponential backoff
    const client = new DocuTray({ apiKey: 'your-api-key', maxRetries: 5 });
    ```
  </CodeBlockTab>
</CodeBlockTabs>

## Resources [#resources]

### Client [#client]

The main entry point for the SDK:

* [`DocuTray`](/docs/node-sdk/client) — Client class with resource properties

### API Resources [#api-resources]

* [Convert](/docs/node-sdk/resources/convert) — Document conversion and data extraction
* [Identify](/docs/node-sdk/resources/identify) — Automatic document type identification
* [DocumentTypes](/docs/node-sdk/resources/document-types) — Document type catalog and schema validation
* [Steps](/docs/node-sdk/resources/steps) — Workflow step execution
* [KnowledgeBases](/docs/node-sdk/resources/knowledge-bases) — Knowledge base management and semantic search

### Error Handling [#error-handling]

* [Error Hierarchy](/docs/node-sdk/errors) — Comprehensive error classes with status-specific exceptions

### Types [#types]

Response and model types:

* [Convert Types](/docs/node-sdk/types/convert)
* [Identify Types](/docs/node-sdk/types/identify)
* [Document Type Types](/docs/node-sdk/types/document-type)
* [Step Types](/docs/node-sdk/types/step)
* [Knowledge Base Types](/docs/node-sdk/types/knowledge-base)
* [Shared Types](/docs/node-sdk/types/shared)


---

# Client (https://docs.docutray.com/docs/python-sdk/client)



<InstallBar
  pkg="docutray"
  version="v0.1.0"
  commands="[
  { command: &#x22;pip install docutray&#x22; },
  { command: &#x22;uv add docutray&#x22; },
]"
/>

The main client classes for interacting with the DocuTray API.

## `Client` [#client]

Synchronous client for the DocuTray API.

**Example:**

```python
from pathlib import Path
from docutray import Client

client = Client(api_key="sk_test_123")
# Convert a document
result = client.convert.run(
    file=Path("invoice.pdf"),
    document_type_code="invoice",
)
print(result.data)
client.close()

# Or using a context manager, which closes the client automatically:
with Client(api_key="sk_test_123") as client:
    result = client.identify.run(file=Path("document.pdf"))
    print(f"Type: {result.document_type.name}")
```

**Properties:**

* `convert`
  : Document conversion operations.

* `document_types`
  : Document type catalog operations.

* `identify`
  : Document identification operations.

* `knowledge_bases`
  : Knowledge base operations for semantic search.

* `steps`
  : Step execution operations.

## `AsyncClient` [#asyncclient]

Asynchronous client for the DocuTray API.

**Example:**

```python
import asyncio
from pathlib import Path
from docutray import AsyncClient

async def main():
    async with AsyncClient(api_key="sk_test_123") as client:
        result = await client.convert.run(
            file=Path("invoice.pdf"),
            document_type_code="invoice",
        )
        print(result.data)

asyncio.run(main())
```

**Properties:**

* `convert`
  : Document conversion operations (async).

* `document_types`
  : Document type catalog operations (async).

* `identify`
  : Document identification operations (async).

* `knowledge_bases`
  : Knowledge base operations for semantic search (async).

* `steps`
  : Step execution operations (async).


---

# Exceptions (https://docs.docutray.com/docs/python-sdk/exceptions)



Exception classes for error handling in the DocuTray SDK.

## Exception Hierarchy [#exception-hierarchy]

```
DocuTrayError (base)
├── APIConnectionError (network errors)
│   └── APITimeoutError (request timeout)
└── APIError (HTTP errors)
    ├── BadRequestError (400)
    ├── AuthenticationError (401)
    ├── PermissionDeniedError (403)
    ├── NotFoundError (404)
    ├── ConflictError (409)
    ├── UnprocessableEntityError (422)
    ├── RateLimitError (429)
    └── InternalServerError (5xx)
```
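
Because every status-specific class derives from `APIError`, and everything derives from `DocuTrayError`, the order of `except` clauses matters: the most specific class has to come first, or a broader clause catches the error before it is reached. The sketch below illustrates this with local stand-in classes that mirror part of the tree above, so it runs without the SDK installed. The stand-ins and the `classify` helper are hypothetical; in real code you would catch the SDK's own classes.

```python
# Local stand-ins mirroring part of the hierarchy above so the sketch
# runs on its own. They are NOT the SDK's classes.
class DocuTrayError(Exception): ...
class APIConnectionError(DocuTrayError): ...
class APITimeoutError(APIConnectionError): ...
class APIError(DocuTrayError): ...
class RateLimitError(APIError): ...
class NotFoundError(APIError): ...


def classify(exc: Exception) -> str:
    """Return a coarse label, checking the most specific classes first."""
    try:
        raise exc
    except RateLimitError:
        return "rate-limited"
    except APITimeoutError:
        return "timeout"
    except APIConnectionError:
        return "network"
    except APIError:
        return "http-error"
    except DocuTrayError:
        return "sdk-error"


print(classify(RateLimitError()))   # rate-limited
print(classify(NotFoundError()))    # http-error
print(classify(APITimeoutError()))  # timeout
```

Swapping the `RateLimitError` and `APIError` clauses would make every rate-limit error come back as `http-error`, which is the mistake this ordering avoids.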

### `DocuTrayError` [#docutrayerror]

Base exception for all DocuTray SDK errors.

**Arguments:**

* `message`
  : The error message.

### `APIConnectionError` [#apiconnectionerror]

Raised when the SDK cannot connect to the API server.

This includes network errors, DNS resolution failures, and other
connection-level problems.

**Arguments:**

* `message`
  : The error message.

* `should_retry`
  : Whether this error should be retried.

### `APITimeoutError` [#apitimeouterror]

Raised when a request times out.

**Arguments:**

* `message`
  : The error message.

### `APIError` [#apierror]

Base class for errors returned by the API.

All HTTP error responses from the API are converted to subclasses
of this exception. Contains rich context for debugging.

**Arguments:**

* `message`
  : Human-readable error description.

* `status_code`
  : HTTP status code from the response.

* `request_id`
  : Request ID from the `X-Request-ID` header, for debugging.

* `body`
  : Parsed JSON response body (can be any JSON type).

* `headers`
  : Response headers.

### `BadRequestError` [#badrequesterror]

Raised when the API returns a 400 Bad Request error.

This typically indicates invalid parameters or malformed request data.

### `AuthenticationError` [#authenticationerror]

Raised when authentication fails (401 Unauthorized).

This indicates an invalid, expired, or missing API key.

### `PermissionDeniedError` [#permissiondeniederror]

Raised when access is forbidden (403 Forbidden).

This indicates the API key doesn't have permission for the requested operation.

### `NotFoundError` [#notfounderror]

Raised when a resource is not found (404 Not Found).

### `ConflictError` [#conflicterror]

Raised when there's a conflict with the current state (409 Conflict).

This typically occurs when trying to create a resource that already exists
or when there's a version conflict.

### `UnprocessableEntityError` [#unprocessableentityerror]

Raised when the request is well-formed but contains semantic errors (422).

This indicates validation errors in the request payload.

### `RateLimitError` [#ratelimiterror]

Raised when rate limit is exceeded (429 Too Many Requests).

Check the `retry_after` property for the recommended wait time.
Additional rate limit details are available in `limit_type`, `limit`,
`remaining`, and `reset_time` properties when provided by the API.

**Properties:**

* `limit`
  : Get the maximum limit for this period.

* `limit_type`
  : Get the type of rate limit exceeded (minute, hour, day).

* `remaining`
  : Get the number of remaining requests.

* `reset_time`
  : Get the timestamp when the rate limit resets.

* `retry_after`
  : Get the recommended wait time in seconds from Retry-After header.
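
The client already retries on its own (see `max_retries`), so a helper like the one below only matters if you add your own retry layer on top, for example in a job queue. It is a minimal sketch, not SDK code: `retry_delay`, `base`, and `cap` are hypothetical names, and the function is duck-typed on the documented `status_code` and `retry_after` attributes so it runs without the SDK installed.

```python
import random


def retry_delay(exc, attempt, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based), or None to give up.

    Reads only the documented `status_code` and `retry_after` attributes.
    """
    status = getattr(exc, "status_code", None)
    if status == 429:
        retry_after = getattr(exc, "retry_after", None)
        if retry_after is not None:
            # The API told us how long to wait: honor it.
            return float(retry_after)
    elif status is None or status < 500:
        # Neither rate-limited nor a 5xx: retrying is unlikely to help.
        return None
    # Exponential backoff with full jitter (base and cap are illustrative).
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

A caller would sleep for the returned number of seconds and stop retrying when the result is `None`.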

### `InternalServerError` [#internalservererror]

Raised when the API returns a 5xx server error.

These errors are typically transient and can be retried.


---

# Python SDK (https://docs.docutray.com/docs/python-sdk)



The official Python library for the [DocuTray API](https://docutray.com), providing access to document processing capabilities including OCR, document identification, data extraction, and knowledge bases.

## Installation [#installation]

```bash
pip install docutray
```

Requires Python 3.10+.

## Quick Start [#quick-start]

### Synchronous Usage [#synchronous-usage]

```python
from pathlib import Path
from docutray import Client

client = Client(api_key="your-api-key")

# Convert a document
result = client.convert.run(
    file=Path("invoice.pdf"),
    document_type_code="invoice"
)
print(result.data)

client.close()
```

### Asynchronous Usage [#asynchronous-usage]

```python
import asyncio
from pathlib import Path
from docutray import AsyncClient

async def main():
    async with AsyncClient(api_key="your-api-key") as client:
        result = await client.convert.run(
            file=Path("invoice.pdf"),
            document_type_code="invoice"
        )
        print(result.data)

asyncio.run(main())
```

## Configuration [#configuration]

<CodeBlockTabs defaultValue="API Key">
  <CodeBlockTabsList>
    <CodeBlockTabsTrigger value="API Key">
      API Key
    </CodeBlockTabsTrigger>

    <CodeBlockTabsTrigger value="Timeout">
      Timeout
    </CodeBlockTabsTrigger>

    <CodeBlockTabsTrigger value="Retries">
      Retries
    </CodeBlockTabsTrigger>
  </CodeBlockTabsList>

  <CodeBlockTab value="API Key">
    ```python
    # Via constructor
    client = Client(api_key="your-api-key")

    # Via environment variable (DOCUTRAY_API_KEY)
    client = Client()
    ```
  </CodeBlockTab>

  <CodeBlockTab value="Timeout">
    ```python
    import httpx

    client = Client(
        api_key="your-api-key",
        timeout=httpx.Timeout(connect=5.0, read=60.0, write=60.0, pool=10.0)
    )
    ```
  </CodeBlockTab>

  <CodeBlockTab value="Retries">
    ```python
    # Default: 2 retries with exponential backoff
    client = Client(api_key="your-api-key", max_retries=5)
    ```
  </CodeBlockTab>
</CodeBlockTabs>

## Resources [#resources]

### Client [#client]

The main entry points for the SDK:

* [`Client`](/docs/python-sdk/client#client) — Synchronous client
* [`AsyncClient`](/docs/python-sdk/client#asyncclient) — Asynchronous client

### API Resources [#api-resources]

* [Convert](/docs/python-sdk/resources/convert) — Document conversion and data extraction
* [Identify](/docs/python-sdk/resources/identify) — Automatic document type identification
* [DocumentTypes](/docs/python-sdk/resources/document_types) — Document type catalog and schema validation
* [Steps](/docs/python-sdk/resources/steps) — Workflow step execution
* [KnowledgeBases](/docs/python-sdk/resources/knowledge_bases) — Knowledge base management and semantic search

### Error Handling [#error-handling]

* [Exception Hierarchy](/docs/python-sdk/exceptions) — Comprehensive error classes with status-specific exceptions

### Types [#types]

Response and model types:

* [Convert Types](/docs/python-sdk/types/convert)
* [Identify Types](/docs/python-sdk/types/identify)
* [Document Type Types](/docs/python-sdk/types/document_type)
* [Step Types](/docs/python-sdk/types/step)
* [Knowledge Base Types](/docs/python-sdk/types/knowledge_base)
* [Shared Types](/docs/python-sdk/types/shared)


---

# Skills (https://docs.docutray.com/docs/skills)



<InstallBar
  pkg="@docutray/skills"
  version="beta"
  commands="[
  { command: &#x22;npx @docutray/skills install&#x22; },
  { command: &#x22;npm install -g @docutray/skills&#x22; },
]"
/>

`docutray-skills` is a curated collection of agent skills that expose the
DocuTray API to AI assistants: Claude, Cursor, Cline, and any
MCP-compatible client. Drop them into your agent and it can convert
documents, look up document types, and validate fields without you
writing any tool wiring yourself.

> **Status:** Beta. We add and refine skills as we learn how agents
> actually use the API in production.

## Why skills, not just an SDK? [#why-skills-not-just-an-sdk]

LLMs work best with **focused, well-described tools**. A naive
"here's the OpenAPI spec" approach overwhelms them. `docutray-skills`
ships:

* **Hand-written prompts** that explain when to call each tool.
* **Tight JSON schemas** instead of the full OpenAPI surface.
* **Few-shot examples** showing typical doc-understanding flows.
* **Error recovery patterns** for low-confidence fields and retries.

The result: agents pick the right tool more often, and call it with
better arguments.

## Install [#install]

<CodeBlockTabs defaultValue="Claude (claude.skills.json)">
  <CodeBlockTabsList>
    <CodeBlockTabsTrigger value="Claude (claude.skills.json)">
      Claude (claude.skills.json)
    </CodeBlockTabsTrigger>

    <CodeBlockTabsTrigger value="MCP server">
      MCP server
    </CodeBlockTabsTrigger>

    <CodeBlockTabsTrigger value="manual">
      manual
    </CodeBlockTabsTrigger>
  </CodeBlockTabsList>

  <CodeBlockTab value="Claude (claude.skills.json)">
    ```bash
    npx @docutray/skills install
    ```
  </CodeBlockTab>

  <CodeBlockTab value="MCP server">
    ```bash
    npm install -g @docutray/skills
    docutray-skills serve --port 3333
    ```
  </CodeBlockTab>

  <CodeBlockTab value="manual">
    ```jsonc
    // In your agent's tool definition
    {
      "skills": ["@docutray/skills"]
    }
    ```
  </CodeBlockTab>
</CodeBlockTabs>

You'll need a DocuTray API key in the environment:

```bash
export DOCUTRAY_API_KEY=sk_live_...
```

## Available skills [#available-skills]

| Skill                      | What the agent gets                               |
| -------------------------- | ------------------------------------------------- |
| `docutray.convert`         | Extract structured data from any supported doc    |
| `docutray.types.list`      | Browse supported document types                   |
| `docutray.types.describe`  | Read the schema for a type before calling convert |
| `docutray.validate`        | Check a field's value against the schema          |
| `docutray.webhooks.recent` | List recent webhook events for debugging          |

See the [skill overview](/docs/skills/readme) for full install
instructions, contents, and usage details.

## Quick example [#quick-example]

A Claude session with the skill installed:

```text
User:  Pull the totals out of these three invoices and sum them.
       [attaches 3 PDFs]

Claude: I'll convert each one with docutray.convert, then sum the
        net_total fields...

[uses docutray.convert × 3]
[reads the structured response]

Claude: The three invoices total €4,287.50. Here's the breakdown:
        - INV-1042: €1,200.00
        - INV-1043:   €987.50
        - INV-1044: €2,100.00
```

No glue code. The agent reads the schema, calls the tool, parses the
response.

## Source [#source]

`docutray-skills` is open source:
[github.com/docutray/docutray-skills](https://github.com/docutray/docutray-skills)


---

# Overview (https://docs.docutray.com/docs/skills/readme)



Agent skills for [DocuTray CLI](https://docs.docutray.com/docs/cli) — AI-powered document processing from your coding agent.

## Install [#install]

```bash
npx skills add docutray/docutray-skills
```

This installs skills for AI coding agents like Claude Code, Cursor, Windsurf, Codex, and [40+ others](https://agentskills.io).

## What's included [#whats-included]

| Skill      | Description                                                                                                                                                                                                                                          |
| ---------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `docutray` | One unified skill: install/auth, convert, identify, types (list/get/export), steps, and custom document type creation. CLI is the canonical example; Python/Node/REST equivalents and depth content live in `references/{setup,platform,advanced}/`. |

## What is DocuTray? [#what-is-docutray]

DocuTray converts documents (PDFs, images, scanned files) into structured JSON data using AI-powered extraction schemas called **document types**. The CLI is designed for automation pipelines and AI agents — all output is JSON with clear exit codes.

Key commands:

* `docutray convert` — Extract structured data from a document
* `docutray identify` — Detect document type automatically
* `docutray types list/get/export` — Manage extraction schemas
* `docutray steps run/status` — Execute processing pipelines

Learn more at [docutray.com](https://docutray.com) · [CLI docs](https://docs.docutray.com/docs/cli)

## Development [#development]

This repository uses [OpenSpec](https://github.com/openspec-dev/openspec) for change management and follows the [Agent Skills specification](https://agentskills.io).

## License [#license]

MIT


---

# Webhook Configuration (https://docs.docutray.com/docs/webhooks/configuracion)



















This guide shows how to configure and manage webhooks in your DocuTray account.

## Create a New Webhook [#create-a-new-webhook]

### Step 1: Access webhook settings [#step-1-access-webhook-settings]

1. Sign in to your DocuTray account at [https://app.docutray.com/login](https://app.docutray.com/login)

2. Select the organization you want to work with

3. Navigate to "Settings" > "Organization" > "Webhooks" in the navigation menu

<img alt="Webhook configuration menu" src="__img0" />

### Step 2: Create a new webhook [#step-2-create-a-new-webhook]

1. Click the "Add Webhook" button

<img alt="Webhooks page" src="__img1" />

2. Complete the required fields in the form:

<img alt="Webhook creation form" src="__img2" />

* **Endpoint URL**: The HTTPS URL where you'll receive notifications
* **Events**: Select the types of events you want to receive:
  * **Conversion Events**: `CONVERSION_STARTED`, `CONVERSION_COMPLETED`, `CONVERSION_FAILED`
  * **Identification Events**: `IDENTIFICATION_STARTED`, `IDENTIFICATION_COMPLETED`, `IDENTIFICATION_FAILED`
  * **Steps Events**: `STEP_STARTED`, `STEP_COMPLETED`, `STEP_FAILED`
* **Enabled**: Allows you to activate or deactivate the webhook

3. Click "Create Webhook"

4. **Important**: Copy and save the automatically generated secret. This secret is used to verify the authenticity of requests.

<img alt="Generated webhook secret" src="__img3" />

### Step 3: Configure your endpoint [#step-3-configure-your-endpoint]

Your endpoint must meet the following requirements:

* **Protocol**: Be publicly accessible via HTTPS
* **Response**: Respond with a 200-299 status code to confirm receipt
* **Format**: Process POST requests with Content-Type `application/json`
* **Response time**: Respond in less than 30 seconds

## Webhook Management [#webhook-management]

### View configured webhooks [#view-configured-webhooks]

On the webhooks page you can see all configured webhooks:

<img alt="List of configured webhooks" src="__img4" />

### Edit a webhook [#edit-a-webhook]

1. Click the options menu (⋯) of the webhook you want to edit
2. Select "Edit"
3. Modify the necessary fields
4. Click "Update Webhook"

<img alt="Edit webhook" src="__img5" />

### Enable/Disable a webhook [#enabledisable-a-webhook]

You can enable or disable a webhook using the toggle switch in the webhook list, without needing to delete it.

<img alt="Enable/Disable webhook" src="__img6" />

### Regenerate secret [#regenerate-secret]

If you need to change the secret:

1. Click the options menu (⋯) of the webhook
2. Select "Regenerate secret"
3. Copy and save the new secret

**Note**: The old secret will stop working immediately.

<img alt="Regenerate secret" src="__img7" />

### Delete a webhook [#delete-a-webhook]

1. Click the options menu (⋯) of the webhook
2. Select "Delete"
3. Confirm deletion

**Note**: This action cannot be undone.

## Data Structure [#data-structure]

### HTTP Headers [#http-headers]

Each webhook request includes the following headers:

```http
Content-Type: application/json
User-Agent: Docutray-Webhook/1.0
X-Docutray-Signature: sha256=<hmac_signature_body>
X-Docutray-Auth-Signature: sha256=<hmac_signature_auth>
X-Docutray-Timestamp: <unix_timestamp>
X-Docutray-Request-Id: <uuid>
X-Docutray-Event: <event_type>
```

### Header Description [#header-description]

* **X-Docutray-Signature**: HMAC signature based on message body
* **X-Docutray-Auth-Signature**: HMAC signature based on metadata (for Lambda Authorizers)
* **X-Docutray-Timestamp**: Unix timestamp in seconds
* **X-Docutray-Request-Id**: Unique UUID for each delivery
* **X-Docutray-Event**: Event type (e.g., `CONVERSION_COMPLETED`)
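
As a rough illustration of how these headers can be checked together, the sketch below verifies `X-Docutray-Signature` against the raw body and rejects deliveries whose `X-Docutray-Timestamp` is too old. The `sha256=` plus hex HMAC-SHA256 format follows the implementation examples in this documentation. The `verify_delivery` name and the 300-second `tolerance` are assumptions, not documented values.

```python
import hashlib
import hmac
import time


def verify_delivery(secret, body, signature, timestamp, tolerance=300, now=None):
    """Return True if the body signature matches and the timestamp is recent.

    secret and body are bytes; signature and timestamp are header strings.
    """
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids timing leaks.
    if not hmac.compare_digest(expected.encode(), (signature or "").encode()):
        return False
    try:
        sent_at = int(timestamp)
    except (TypeError, ValueError):
        return False
    current = time.time() if now is None else now
    # tolerance is an assumed window, not a documented DocuTray value.
    return abs(current - sent_at) <= tolerance
```

The header list above does not say whether the timestamp is covered by either signature, so treat the freshness check as a sanity filter, not as replay protection on its own.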

## Recommendations [#recommendations]

### Reliability [#reliability]

* Respond quickly (within 30 seconds)
* Implement idempotent processing
* Log received events for debugging
* Use a message queue for asynchronous processing if needed
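
The first and last points can be combined: acknowledge immediately, skip deliveries already seen, and leave the real work to a background consumer. The sketch below assumes that the event type plus the resource ID from the payload (`conversion_id`, `identification_id`, or `step_execution_id`) identifies an event. That choice of key, like the in-memory `set` and `queue.Queue`, is an illustration and not something DocuTray prescribes.

```python
import queue

ID_FIELDS = ("conversion_id", "identification_id", "step_execution_id")

seen_keys = set()            # in production: a shared store with expiry
work_queue = queue.Queue()   # in production: a durable message queue


def dedup_key(event_type, payload):
    """Build an idempotency key from the event type and the resource ID."""
    for field in ID_FIELDS:
        if field in payload:
            return (event_type, payload[field])
    return None  # no known ID field: cannot deduplicate


def receive(event_type, payload):
    """Acknowledge a delivery at once; enqueue it only the first time it is seen."""
    key = dedup_key(event_type, payload)
    if key is not None and key in seen_keys:
        return 200  # duplicate delivery: acknowledge and do nothing
    if key is not None:
        seen_keys.add(key)
    work_queue.put((event_type, payload))
    return 200
```

A separate worker would then read from `work_queue`, so slow processing never delays the HTTP response.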

### Error handling [#error-handling]

* DocuTray retries a failed delivery up to 5 times with exponential backoff
* If your endpoint repeatedly fails to respond, the webhook may be automatically disabled
* Retries follow this sequence: 30 s, 1 min, 5 min, 15 min, 1 hour
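
To see how long an outage this schedule can absorb, the short calculation below adds up the delays. It assumes each delay is counted from the previous failed attempt, which the list above does not state explicitly.

```python
from itertools import accumulate

# Delays in seconds, taken from the retry sequence above.
RETRY_DELAYS = [30, 60, 300, 900, 3600]

# Time of each retry, measured from the first failed delivery
# (assuming every delay starts at the previous attempt).
offsets = list(accumulate(RETRY_DELAYS))

print(offsets)            # [30, 90, 390, 1290, 4890]
print(offsets[-1] / 60)   # 81.5
```

Under that assumption the last retry lands about 81.5 minutes after the first failure, so an endpoint that stays down longer than that would miss the event.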

### Testing [#testing]

* Use tools like [webhook.site](https://webhook.site) to test webhook reception
* Implement a test endpoint before production
* Verify that your firewall allows connections from DocuTray servers

## Next Steps [#next-steps]

* **[Security](/docs/webhooks/seguridad)**: Implement signature verification to protect your endpoint
* **[Conversion Events](/docs/webhooks/conversion)**: Learn about document conversion webhooks
* **[Identification Events](/docs/webhooks/identificacion)**: Learn about identification webhooks
* **[Steps Events](/docs/webhooks/steps)**: Learn about steps webhooks
* **[Examples](/docs/webhooks/ejemplos)**: Review sample code to implement webhooks


---

# Conversion Events (https://docs.docutray.com/docs/webhooks/conversion)



Conversion webhooks are sent during document processing when using a specific document type to extract structured data.

## Conversion Events [#conversion-events]

### Conversion Started (`CONVERSION_STARTED`) [#conversion-started-conversion_started]

Sent when document processing begins:

```json
{
  "conversion_id": "clm123abc456def",
  "status": "PROCESSING",
  "request_timestamp": "2024-01-15T10:30:00.000Z",
  "document_type_code": "invoice",
  "original_filename": "invoice-001.pdf",
  "document_metadata": {
    "client_id": "ABC123",
    "department": "finance"
  }
}
```

**Fields:**

* `conversion_id` (string): Unique conversion ID
* `status` (string): Current status, always `"PROCESSING"` for this event
* `request_timestamp` (string): ISO 8601 timestamp of when conversion started
* `document_type_code` (string): Code of the document type used
* `original_filename` (string, optional): Original name of the processed file
* `document_metadata` (object, optional): Custom metadata sent with the conversion

### Conversion Completed (`CONVERSION_COMPLETED`) [#conversion-completed-conversion_completed]

Sent when conversion finishes successfully:

```json
{
  "conversion_id": "clm123abc456def",
  "status": "SUCCESS",
  "request_timestamp": "2024-01-15T10:30:00.000Z",
  "response_timestamp": "2024-01-15T10:30:15.000Z",
  "document_type_code": "invoice",
  "original_filename": "invoice-001.pdf",
  "document_metadata": {
    "client_id": "ABC123",
    "department": "finance"
  },
  "data": {
    "invoiceNumber": "INV-2024-001",
    "amount": 1250.00,
    "vendor": "ABC Company Inc.",
    "date": "2024-01-15"
  }
}
```

**Fields:**

* `conversion_id` (string): Unique conversion ID
* `status` (string): Current status, always `"SUCCESS"` for this event
* `request_timestamp` (string): ISO 8601 timestamp of when conversion started
* `response_timestamp` (string): ISO 8601 timestamp of when conversion completed
* `document_type_code` (string): Code of the document type used
* `original_filename` (string, optional): Original name of the processed file
* `document_metadata` (object, optional): Custom metadata sent with the conversion
* `data` (object): Extracted data from the document according to the document type schema

### Conversion Failed (`CONVERSION_FAILED`) [#conversion-failed-conversion_failed]

Sent when conversion fails:

```json
{
  "conversion_id": "clm123abc456def",
  "status": "ERROR",
  "request_timestamp": "2024-01-15T10:30:00.000Z",
  "response_timestamp": "2024-01-15T10:30:10.000Z",
  "document_type_code": "invoice",
  "original_filename": "invoice-001.pdf",
  "document_metadata": {
    "client_id": "ABC123",
    "department": "finance"
  },
  "error": "Error during OCR processing: Unable to process image"
}
```

**Fields:**

* `conversion_id` (string): Unique conversion ID
* `status` (string): Current status, always `"ERROR"` for this event
* `request_timestamp` (string): ISO 8601 timestamp of when conversion started
* `response_timestamp` (string): ISO 8601 timestamp of when conversion failed
* `document_type_code` (string): Code of the document type used
* `original_filename` (string, optional): Original name of the processed file
* `document_metadata` (object, optional): Custom metadata sent with the conversion
* `error` (string): Descriptive error message
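
The three payloads share most fields, so one function can reduce any of them to a common shape. The following sketch uses only the fields documented above; the `parse_ts` and `summarize` helpers are hypothetical names and not part of any DocuTray library.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp such as 2024-01-15T10:30:00.000Z."""
    # fromisoformat() accepts a trailing "Z" only on Python 3.11+,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize(payload):
    """Reduce a conversion payload to (status, elapsed seconds or None, detail)."""
    started = parse_ts(payload["request_timestamp"])
    finished = payload.get("response_timestamp")  # absent on CONVERSION_STARTED
    elapsed = (parse_ts(finished) - started).total_seconds() if finished else None
    if payload["status"] == "SUCCESS":
        detail = payload.get("data")
    else:
        detail = payload.get("error")
    return payload["status"], elapsed, detail
```

Applied to the `CONVERSION_COMPLETED` example above, `summarize` yields the status `"SUCCESS"`, an elapsed time of 15 seconds, and the extracted `data` object.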

## Implementation Example [#implementation-example]

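```javascript
const express = require('express');
const app = express();

// Keep the raw body: signature verification needs the exact bytes that were sent
app.use(express.raw({ type: 'application/json' }));

app.post('/webhooks/docutray', (req, res) => {
  const eventType = req.headers['x-docutray-event'];
  // Verify X-Docutray-Signature before trusting the payload (see Security and Verification)
  // With express.raw(), req.body is a Buffer holding the JSON text
  const data = JSON.parse(req.body.toString('utf8'));

  switch (eventType) {
    case 'CONVERSION_STARTED':
      console.log(`Conversion started: ${data.conversion_id}`);
      // Update database with "processing" status
      break;

    case 'CONVERSION_COMPLETED':
      console.log(`Conversion completed: ${data.conversion_id}`);
      console.log('Extracted data:', data.data);
      // Save extracted data to database
      // Send notification to user
      break;

    case 'CONVERSION_FAILED':
      console.log(`Conversion failed: ${data.conversion_id}`);
      console.log('Error:', data.error);
      // Log error and notify user
      break;
  }

  // Acknowledge with a 2xx status so the delivery is not retried
  res.status(200).send('OK');
});

app.listen(3000);
```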

## Related pages [#related-pages]

* [Webhook Configuration](/docs/webhooks/configuracion)
* [Security and Verification](/docs/webhooks/seguridad)
* [Identification Events](/docs/webhooks/identificacion)
* [Steps Events](/docs/webhooks/steps)
* [Implementation Examples](/docs/webhooks/ejemplos)


---

# Implementation Examples (https://docs.docutray.com/docs/webhooks/ejemplos)



This page provides complete examples of how to implement DocuTray webhooks in different languages and frameworks.

## Node.js/Express [#nodejsexpress]

```javascript
const express = require('express');
const crypto = require('crypto');
const app = express();

app.use(express.raw({ type: 'application/json' }));

app.post('/webhooks/docutray', (req, res) => {
  const signature = req.headers['x-docutray-signature'];
  const eventType = req.headers['x-docutray-event'];
  const payload = req.body;

  // Verify signature
  const secret = process.env.DOCUTRAY_WEBHOOK_SECRET;
  const expectedSignature = crypto
    .createHmac('sha256', secret)
    .update(payload)
    .digest('hex');

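  // Compare in constant time; timingSafeEqual throws on a length mismatch, so check length first
  const expected = Buffer.from(`sha256=${expectedSignature}`);
  const received = Buffer.from(signature || '');
  if (expected.length !== received.length || !crypto.timingSafeEqual(expected, received)) {
    return res.status(401).send('Signature verification failed');
  }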

  const data = JSON.parse(payload);

  // Process based on event type
  switch (eventType) {
    // Conversion Events
    case 'CONVERSION_STARTED':
      console.log(`Conversion started: ${data.conversion_id}`);
      break;
    case 'CONVERSION_COMPLETED':
      console.log(`Conversion completed: ${data.conversion_id}`);
      console.log('Extracted data:', data.data);
      break;
    case 'CONVERSION_FAILED':
      console.log(`Conversion failed: ${data.conversion_id}`);
      console.log('Error:', data.error);
      break;

    // Identification Events
    case 'IDENTIFICATION_STARTED':
      console.log(`Identification started: ${data.identification_id}`);
      break;
    case 'IDENTIFICATION_COMPLETED':
      console.log(`Identification completed: ${data.identification_id}`);
      console.log('Identified type:', data.document_type);
      break;
    case 'IDENTIFICATION_FAILED':
      console.log(`Identification failed: ${data.identification_id}`);
      console.log('Error:', data.error);
      break;

    // Steps Events
    case 'STEP_STARTED':
      console.log(`Step started: ${data.step_name} (${data.step_execution_id})`);
      break;
    case 'STEP_COMPLETED':
      console.log(`Step completed: ${data.step_name} (${data.step_execution_id})`);
      if (data.data) console.log('Processed data:', data.data);
      if (data.validation) console.log('Validation:', data.validation);
      break;
    case 'STEP_FAILED':
      console.log(`Step failed: ${data.step_name} (${data.step_execution_id})`);
      console.log('Error:', data.error);
      break;
  }

  res.status(200).send('OK');
});

app.listen(3000);
```

## Python/Flask [#pythonflask]

```python
import hmac
import hashlib
import json
import os
from flask import Flask, request

app = Flask(__name__)

@app.route('/webhooks/docutray', methods=['POST'])
def handle_webhook():
    signature = request.headers.get('X-Docutray-Signature')
    event_type = request.headers.get('X-Docutray-Event')
    payload = request.get_data()

    # Verify signature
    secret = os.environ['DOCUTRAY_WEBHOOK_SECRET'].encode()
    expected_signature = hmac.new(
        secret,
        payload,
        hashlib.sha256
    ).hexdigest()

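    # Compare in constant time; also reject requests without a signature header
    if signature is None or not hmac.compare_digest(
        f'sha256={expected_signature}'.encode(), signature.encode()
    ):
        return 'Signature verification failed', 401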

    data = json.loads(payload)

    # Process based on event type
    # Conversion Events
    if event_type == 'CONVERSION_STARTED':
        print(f"Conversion started: {data['conversion_id']}")
    elif event_type == 'CONVERSION_COMPLETED':
        print(f"Conversion completed: {data['conversion_id']}")
        print(f"Extracted data: {data['data']}")
    elif event_type == 'CONVERSION_FAILED':
        print(f"Conversion failed: {data['conversion_id']}")
        print(f"Error: {data['error']}")

    # Identification Events
    elif event_type == 'IDENTIFICATION_STARTED':
        print(f"Identification started: {data['identification_id']}")
    elif event_type == 'IDENTIFICATION_COMPLETED':
        print(f"Identification completed: {data['identification_id']}")
        print(f"Identified type: {data['document_type']}")
    elif event_type == 'IDENTIFICATION_FAILED':
        print(f"Identification failed: {data['identification_id']}")
        print(f"Error: {data['error']}")

    # Steps Events
    elif event_type == 'STEP_STARTED':
        print(f"Step started: {data['step_name']} ({data['step_execution_id']})")
    elif event_type == 'STEP_COMPLETED':
        print(f"Step completed: {data['step_name']} ({data['step_execution_id']})")
        if 'data' in data:
            print(f"Processed data: {data['data']}")
        if 'validation' in data:
            print(f"Validation: {data['validation']}")
    elif event_type == 'STEP_FAILED':
        print(f"Step failed: {data['step_name']} ({data['step_execution_id']})")
        print(f"Error: {data['error']}")

    return 'OK', 200

if __name__ == '__main__':
    app.run(port=3000)
```

## Python/FastAPI [#pythonfastapi]

```python
import hmac
import hashlib
import os
from fastapi import FastAPI, Request, Header, HTTPException

app = FastAPI()

@app.post("/webhooks/docutray")
async def handle_webhook(
    request: Request,
    x_docutray_signature: str = Header(...),
    x_docutray_event: str = Header(...)
):
    payload = await request.body()

    # Verify signature
    secret = os.environ['DOCUTRAY_WEBHOOK_SECRET'].encode()
    expected_signature = hmac.new(
        secret,
        payload,
        hashlib.sha256
    ).hexdigest()

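    # Compare in constant time to avoid leaking timing information
    if not hmac.compare_digest(
        f'sha256={expected_signature}'.encode(), x_docutray_signature.encode()
    ):
        raise HTTPException(status_code=401, detail="Signature verification failed")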

    data = await request.json()

    # Process based on event type
    if x_docutray_event == 'CONVERSION_STARTED':
        print(f"Conversion started: {data['conversion_id']}")
    elif x_docutray_event == 'CONVERSION_COMPLETED':
        print(f"Conversion completed: {data['conversion_id']}")
        print(f"Extracted data: {data['data']}")
    elif x_docutray_event == 'CONVERSION_FAILED':
        print(f"Conversion failed: {data['conversion_id']}")
    # ... other events

    return {"status": "ok"}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=3000)
```

## PHP [#php]

```php
<?php

// Default to an empty string so a missing header fails verification cleanly
$signature = $_SERVER['HTTP_X_DOCUTRAY_SIGNATURE'] ?? '';
$eventType = $_SERVER['HTTP_X_DOCUTRAY_EVENT'] ?? '';
$payload = file_get_contents('php://input');

// Verify signature
$secret = getenv('DOCUTRAY_WEBHOOK_SECRET');
$expectedSignature = 'sha256=' . hash_hmac('sha256', $payload, $secret);

// hash_equals compares in constant time
if (!is_string($signature) || !hash_equals($expectedSignature, $signature)) {
    http_response_code(401);
    echo 'Signature verification failed';
    exit;
}

$data = json_decode($payload, true);

// Process based on event type
switch ($eventType) {
    case 'CONVERSION_STARTED':
        error_log("Conversion started: " . $data['conversion_id']);
        break;

    case 'CONVERSION_COMPLETED':
        error_log("Conversion completed: " . $data['conversion_id']);
        error_log("Extracted data: " . json_encode($data['data']));
        break;

    case 'CONVERSION_FAILED':
        error_log("Conversion failed: " . $data['conversion_id']);
        error_log("Error: " . $data['error']);
        break;

    // ... other events
}

http_response_code(200);
echo 'OK';
```

## Go [#go]

```go
package main

import (
    "crypto/hmac"
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
    "fmt"
    "io"
    "log"
    "net/http"
    "os"
)

type WebhookData map[string]interface{}

func verifySignature(payload []byte, signature string, secret string) bool {
    h := hmac.New(sha256.New, []byte(secret))
    h.Write(payload)
    expectedSignature := "sha256=" + hex.EncodeToString(h.Sum(nil))
    // hmac.Equal compares in constant time
    return hmac.Equal([]byte(expectedSignature), []byte(signature))
}

func handleWebhook(w http.ResponseWriter, r *http.Request) {
    signature := r.Header.Get("X-Docutray-Signature")
    eventType := r.Header.Get("X-Docutray-Event")

    // io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+)
    payload, err := io.ReadAll(r.Body)
    if err != nil {
        http.Error(w, "Error reading body", http.StatusBadRequest)
        return
    }

    // Verify signature
    secret := os.Getenv("DOCUTRAY_WEBHOOK_SECRET")
    if !verifySignature(payload, signature, secret) {
        http.Error(w, "Signature verification failed", http.StatusUnauthorized)
        return
    }

    var data WebhookData
    if err := json.Unmarshal(payload, &data); err != nil {
        http.Error(w, "Error parsing JSON", http.StatusBadRequest)
        return
    }

    // Process based on event type
    switch eventType {
    case "CONVERSION_STARTED":
        log.Printf("Conversion started: %v", data["conversion_id"])
    case "CONVERSION_COMPLETED":
        log.Printf("Conversion completed: %v", data["conversion_id"])
        log.Printf("Extracted data: %v", data["data"])
    case "CONVERSION_FAILED":
        log.Printf("Conversion failed: %v", data["conversion_id"])
        log.Printf("Error: %v", data["error"])
    // ... other events
    }

    w.WriteHeader(http.StatusOK)
    fmt.Fprintf(w, "OK")
}

func main() {
    http.HandleFunc("/webhooks/docutray", handleWebhook)
    log.Println("Server started on :3000")
    log.Fatal(http.ListenAndServe(":3000", nil))
}
```

## Ruby/Sinatra [#rubysinatra]

```ruby
require 'sinatra'
require 'json'
require 'openssl'

post '/webhooks/docutray' do
  signature = request.env['HTTP_X_DOCUTRAY_SIGNATURE']
  event_type = request.env['HTTP_X_DOCUTRAY_EVENT']
  payload = request.body.read

  # Verify signature
  secret = ENV['DOCUTRAY_WEBHOOK_SECRET']
  expected_signature = 'sha256=' + OpenSSL::HMAC.hexdigest('sha256', secret, payload)

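  # Rack::Utils.secure_compare compares in constant time
  if signature.nil? || !Rack::Utils.secure_compare(expected_signature, signature)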
    halt 401, 'Signature verification failed'
  end

  data = JSON.parse(payload)

  # Process based on event type
  case event_type
  when 'CONVERSION_STARTED'
    puts "Conversion started: #{data['conversion_id']}"
  when 'CONVERSION_COMPLETED'
    puts "Conversion completed: #{data['conversion_id']}"
    puts "Extracted data: #{data['data']}"
  when 'CONVERSION_FAILED'
    puts "Conversion failed: #{data['conversion_id']}"
    puts "Error: #{data['error']}"
  # ... other events
  end

  status 200
  body 'OK'
end
```

## Implementation recommendations [#implementation-recommendations]

### Asynchronous processing [#asynchronous-processing]

For webhooks that require heavy processing, consider using a task queue:

```javascript
// Example with Bull (Redis)
const Queue = require('bull');
const webhookQueue = new Queue('webhook-processing');

app.post('/webhooks/docutray', async (req, res) => {
  // Verify signature first
  if (!verifySignature(req)) {
    return res.status(401).send('Invalid signature');
  }

  // Add to queue for asynchronous processing
  await webhookQueue.add({
    eventType: req.headers['x-docutray-event'],
    data: JSON.parse(req.body)
  });

  // Respond immediately
  res.status(200).send('OK');
});

// Process in background
webhookQueue.process(async (job) => {
  const { eventType, data } = job.data;
  // Heavy processing here
});
```
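
The same acknowledge-first, process-later pattern can be sketched in Python. This is a minimal illustration that uses an in-process `queue.Queue` and a worker thread as a stand-in for a real broker; the names `enqueue_event`, `worker`, and `processed` are hypothetical and not part of any Docutray SDK.

```python
import json
import queue
import threading

# In-process queue standing in for a real broker (assumption for this sketch).
webhook_queue = queue.Queue()
processed = []  # records handled event types, only to make the sketch observable


def enqueue_event(event_type, raw_body):
    """Call this from the HTTP handler after the signature has been verified."""
    webhook_queue.put({"event_type": event_type, "data": json.loads(raw_body)})


def worker():
    """Drain the queue in the background; None is the shutdown signal."""
    while True:
        job = webhook_queue.get()
        try:
            if job is None:
                return
            # Heavy processing would go here.
            processed.append(job["event_type"])
        finally:
            webhook_queue.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()
```

The handler only parses and enqueues, so it can return `200` quickly. An in-process queue loses pending jobs on restart, which is why a persistent queue is usually preferred in production.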

### Error handling and retries [#error-handling-and-retries]

```javascript
app.post('/webhooks/docutray', async (req, res) => {
  try {
    // Verify signature
    if (!verifySignature(req)) {
      return res.status(401).send('Invalid signature');
    }

    // Process webhook
    await processWebhook(req.body);

    // Respond successfully
    res.status(200).send('OK');
  } catch (error) {
    console.error('Error processing webhook:', error);

    // Return 500 error so Docutray retries
    res.status(500).send('Internal server error');
  }
});
```

### Logging and debugging [#logging-and-debugging]

```javascript
const winston = require('winston');

const logger = winston.createLogger({
  level: 'info',
  format: winston.format.json(),
  transports: [
    new winston.transports.File({ filename: 'webhooks.log' })
  ]
});

app.post('/webhooks/docutray', (req, res) => {
  const startTime = Date.now(); // Used below to report processing duration
  const requestId = req.headers['x-docutray-request-id'];
  const eventType = req.headers['x-docutray-event'];

  logger.info('Webhook received', {
    requestId,
    eventType,
    timestamp: new Date().toISOString()
  });

  // Process webhook...

  logger.info('Webhook processed', {
    requestId,
    eventType,
    duration: Date.now() - startTime
  });

  res.status(200).send('OK');
});
```

## Related pages [#related-pages]

* [Webhook Configuration](/docs/webhooks/configuracion)
* [Security and Verification](/docs/webhooks/seguridad)
* [Conversion Events](/docs/webhooks/conversion)
* [Identification Events](/docs/webhooks/identificacion)
* [Steps Events](/docs/webhooks/steps)


---

# Identification Events (https://docs.docutray.com/docs/webhooks/identificacion)



Identification webhooks are sent during the automatic document type identification process, where Docutray analyzes the document and determines its type among the specified options.

## Identification Events [#identification-events]

### Identification Started (`IDENTIFICATION_STARTED`) [#identification-started-identification_started]

Sent when the document identification process begins:

```json
{
  "identification_id": "idn_abc123xyz789",
  "status": "PROCESSING",
  "request_timestamp": "2024-01-15T10:30:00.000Z",
  "original_filename": "unknown-document.pdf",
  "document_metadata": {
    "source": "email_attachment",
    "received_date": "2024-01-15"
  },
  "document_type_code_options": ["invoice", "receipt", "purchase_order"]
}
```

**Fields:**

* `identification_id` (string): Unique identification ID
* `status` (string): Current status, always `"PROCESSING"` for this event
* `request_timestamp` (string): ISO 8601 timestamp of when identification started
* `original_filename` (string, optional): Original file name
* `document_metadata` (object, optional): Custom metadata sent with the identification
* `document_type_code_options` (array, optional): Candidate document type codes to choose from

### Identification Completed (`IDENTIFICATION_COMPLETED`) [#identification-completed-identification_completed]

Sent when identification finishes successfully:

```json
{
  "identification_id": "idn_abc123xyz789",
  "status": "SUCCESS",
  "request_timestamp": "2024-01-15T10:30:00.000Z",
  "response_timestamp": "2024-01-15T10:30:08.000Z",
  "original_filename": "unknown-document.pdf",
  "document_metadata": {
    "source": "email_attachment",
    "received_date": "2024-01-15"
  },
  "document_type": {
    "code": "invoice",
    "name": "Invoice",
    "confidence": 0.95
  },
  "document_type_code_options": ["invoice", "receipt", "purchase_order"],
  "alternatives": [
    {
      "code": "receipt",
      "name": "Receipt",
      "confidence": 0.78
    },
    {
      "code": "purchase_order",
      "name": "Purchase Order",
      "confidence": 0.45
    }
  ]
}
```

**Fields:**

* `identification_id` (string): Unique identification ID
* `status` (string): Current status, always `"SUCCESS"` for this event
* `request_timestamp` (string): ISO 8601 timestamp of when identification started
* `response_timestamp` (string): ISO 8601 timestamp of when identification completed
* `original_filename` (string, optional): Original file name
* `document_metadata` (object, optional): Custom metadata sent with the identification
* `document_type` (object): Identified document type with highest confidence
  * `code` (string): Document type code
  * `name` (string): Document type name
  * `confidence` (number): Identification confidence level (0-1)
* `document_type_code_options` (array, optional): Candidate document type codes that were considered
* `alternatives` (array, optional): Alternative document types with their confidence levels
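
As a rough illustration of how the `confidence` and `alternatives` fields might be used together, the sketch below decides whether to convert automatically. The function name `choose_document_type`, the `margin` value, and the `0.9` threshold are assumptions for this example, not platform defaults.

```python
def choose_document_type(payload, threshold=0.9):
    """Summarize an IDENTIFICATION_COMPLETED payload into a routing decision."""
    best = payload["document_type"]
    alternatives = payload.get("alternatives", [])  # field is optional
    runner_up = max((alt["confidence"] for alt in alternatives), default=0.0)
    return {
        "code": best["code"],
        "confidence": best["confidence"],
        # Gap between the winner and the next best candidate
        "margin": round(best["confidence"] - runner_up, 4),
        "auto_convert": best["confidence"] > threshold,
    }


payload = {
    "document_type": {"code": "invoice", "name": "Invoice", "confidence": 0.95},
    "alternatives": [
        {"code": "receipt", "name": "Receipt", "confidence": 0.78},
        {"code": "purchase_order", "name": "Purchase Order", "confidence": 0.45},
    ],
}
decision = choose_document_type(payload)
print(decision)
```

A small margin can be a useful signal to route a document to manual review even when the top confidence clears the threshold.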

### Identification Failed (`IDENTIFICATION_FAILED`) [#identification-failed-identification_failed]

Sent when identification fails:

```json
{
  "identification_id": "idn_abc123xyz789",
  "status": "ERROR",
  "request_timestamp": "2024-01-15T10:30:00.000Z",
  "response_timestamp": "2024-01-15T10:30:05.000Z",
  "original_filename": "unknown-document.pdf",
  "document_metadata": {
    "source": "email_attachment",
    "received_date": "2024-01-15"
  },
  "error": "Unable to identify document type: image quality too low"
}
```

**Fields:**

* `identification_id` (string): Unique identification ID
* `status` (string): Current status, always `"ERROR"` for this event
* `request_timestamp` (string): ISO 8601 timestamp of when identification started
* `response_timestamp` (string): ISO 8601 timestamp of when identification failed
* `original_filename` (string, optional): Original file name
* `document_metadata` (object, optional): Custom metadata sent with the identification
* `error` (string): Descriptive error message

## Implementation Example [#implementation-example]

```javascript
app.post('/webhooks/docutray', (req, res) => {
  const eventType = req.headers['x-docutray-event'];
  const data = JSON.parse(req.body);

  switch (eventType) {
    case 'IDENTIFICATION_STARTED':
      console.log(`Identification started: ${data.identification_id}`);
      // Update database with "identifying" status
      break;

    case 'IDENTIFICATION_COMPLETED':
      console.log(`Identification completed: ${data.identification_id}`);
      console.log(`Identified type: ${data.document_type.code}`);
      console.log(`Confidence: ${data.document_type.confidence}`);
      // Save identified type to database
      // If confidence is high, proceed with automatic conversion
      if (data.document_type.confidence > 0.9) {
        // Start automatic conversion
      }
      break;

    case 'IDENTIFICATION_FAILED':
      console.log(`Identification failed: ${data.identification_id}`);
      console.log('Error:', data.error);
      // Log error and request manual intervention
      break;
  }

  res.status(200).send('OK');
});
```

## Related pages [#related-pages]

* [Webhook Configuration](/docs/webhooks/configuracion)
* [Security and Verification](/docs/webhooks/seguridad)
* [Conversion Events](/docs/webhooks/conversion)
* [Steps Events](/docs/webhooks/steps)
* [Implementation Examples](/docs/webhooks/ejemplos)


---

# Webhooks (https://docs.docutray.com/docs/webhooks)



Webhooks allow you to receive real-time notifications about events that occur in your Docutray account. When you configure a webhook, Docutray will send an HTTP POST request to the URL you specify each time an event you've subscribed to occurs.

## Available Webhook Types [#available-webhook-types]

Docutray supports three types of webhooks, each designed for different use cases:

<Cards>
  <Card title="Conversion Webhooks" href="/docs/webhooks/conversion">
    Receive notifications during document processing when using a specific document type to extract structured data.
  </Card>

  <Card title="Identification Webhooks" href="/docs/webhooks/identificacion">
    Receive notifications during the automatic document type identification process.
  </Card>

  <Card title="Steps Webhooks" href="/docs/webhooks/steps">
    Receive notifications during the execution of individual steps in document processing workflows.
  </Card>
</Cards>

## Configuration Guides [#configuration-guides]

<Cards>
  <Card title="Initial Setup" href="/docs/webhooks/configuracion">
    Learn how to configure and manage webhooks in your Docutray account.
  </Card>

  <Card title="Security and Verification" href="/docs/webhooks/seguridad">
    Protect your endpoints with HMAC signature verification and replay attack prevention.
  </Card>

  <Card title="Implementation Examples" href="/docs/webhooks/ejemplos">
    Sample code in Node.js, Python, PHP, Go, and Ruby to implement webhooks.
  </Card>
</Cards>

## Key Features [#key-features]

* **Real-time notifications**: Receive events immediately as they occur
* **Multiple events**: Subscribe to specific events based on your needs
* **Robust security**: HMAC signature verification with two methods available
* **Automatic retries**: Retry system with exponential backoff
* **Flexible management**: Enable, disable, or delete webhooks without affecting your integration

## Next Steps [#next-steps]

1. **[Configuration](/docs/webhooks/configuracion)**: Start by setting up your first webhook
2. **[Security](/docs/webhooks/seguridad)**: Implement signature verification on your endpoint
3. **Select your webhook type**: Choose between Conversion, Identification, or Steps based on your use case
4. **[Examples](/docs/webhooks/ejemplos)**: Review sample code for your preferred language


---

# Security and Signature Verification (https://docs.docutray.com/docs/webhooks/seguridad)



Docutray provides two signature verification methods to adapt to different architectures:

## Method 1: Body-based signature (Traditional) [#method-1-body-based-signature-traditional]

Validates the complete payload content. Ideal for traditional implementations:

```javascript
const crypto = require('crypto');

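function verifyWebhookBody(bodyString, signature, secret) {
  const expectedSignature = crypto
    .createHmac('sha256', secret)
    .update(bodyString)
    .digest('hex');

  // Compare in constant time; timingSafeEqual throws on length mismatch
  const expected = Buffer.from(`sha256=${expectedSignature}`);
  const received = Buffer.from(signature || '');
  return expected.length === received.length &&
    crypto.timingSafeEqual(expected, received);
}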

// Usage in Express.js
app.post('/webhook', express.raw({type: 'application/json'}), (req, res) => {
  const signature = req.headers['x-docutray-signature'];
  const secret = process.env.WEBHOOK_SECRET;

  if (!verifyWebhookBody(req.body.toString(), signature, secret)) {
    return res.status(401).send('Invalid signature');
  }

  // Process webhook
  const payload = JSON.parse(req.body);
  // ...
});
```
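
When testing an endpoint locally, it can help to produce a valid body signature yourself. The following sketch assumes the `sha256=<hex digest>` format shown above; `sign_body`, `verify_body`, and the `test-secret` value are hypothetical helpers for test fixtures only.

```python
import hashlib
import hmac


def sign_body(body, secret):
    """Return the header value for a raw request body (bytes)."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify_body(body, signature, secret):
    """Check a received signature in constant time; None is treated as invalid."""
    expected = sign_body(body, secret)
    return hmac.compare_digest(expected.encode(), (signature or "").encode())


body = b'{"conversion_id": "conv_123"}'
secret = "test-secret"  # placeholder value for local tests
signature = sign_body(body, secret)
```

Sign the exact bytes that will be sent: re-serializing the JSON before verification can change whitespace or key order and invalidate the signature.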

## Method 2: Authentication signature (Lambda Authorizers compatible) [#method-2-authentication-signature-lambda-authorizers-compatible]

Validates using only metadata in headers, **without body access**. Ideal for AWS Lambda Authorizers, Azure Functions, or Google Cloud Functions:

```javascript
const crypto = require('crypto');

function verifyWebhookAuth(headers, webhookUrl, secret) {
  const authSignature = headers['x-docutray-auth-signature'];
  const timestamp = headers['x-docutray-timestamp'];
  const requestId = headers['x-docutray-request-id'];
  const eventType = headers['x-docutray-event'];

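  // Validate timestamp (5-minute window); a missing or malformed value is rejected
  const now = Math.floor(Date.now() / 1000);
  const ts = parseInt(timestamp, 10);
  if (!Number.isFinite(ts) || Math.abs(now - ts) > 300) {
    return false; // Timestamp missing or expired
  }

  // Calculate expected signature
  const authPayload = `${requestId}|${timestamp}|${webhookUrl}|${eventType}`;
  const expectedSignature = crypto
    .createHmac('sha256', secret)
    .update(authPayload)
    .digest('hex');

  // Compare in constant time; timingSafeEqual throws on length mismatch
  const expected = Buffer.from(`sha256=${expectedSignature}`);
  const received = Buffer.from(authSignature || '');
  return expected.length === received.length &&
    crypto.timingSafeEqual(expected, received);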
}
```
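
To exercise an authorizer without waiting for a real delivery, you can build the four headers yourself. This sketch assumes the `requestId|timestamp|webhookUrl|eventType` payload layout used in the function above; `build_auth_headers`, the example URL, and the secret are hypothetical test values.

```python
import hashlib
import hmac
import time
import uuid


def build_auth_headers(webhook_url, event_type, secret):
    """Build Method 2 headers for a simulated delivery (test helper only)."""
    request_id = str(uuid.uuid4())
    timestamp = str(int(time.time()))  # Unix seconds
    auth_payload = f"{request_id}|{timestamp}|{webhook_url}|{event_type}"
    digest = hmac.new(
        secret.encode(), auth_payload.encode(), hashlib.sha256
    ).hexdigest()
    return {
        "X-Docutray-Auth-Signature": f"sha256={digest}",
        "X-Docutray-Timestamp": timestamp,
        "X-Docutray-Request-Id": request_id,
        "X-Docutray-Event": event_type,
    }


headers = build_auth_headers(
    "https://example.com/webhooks/docutray",  # placeholder URL
    "CONVERSION_COMPLETED",
    "test-secret",  # placeholder secret
)
```

The URL passed here must match, character for character, the URL the authorizer reconstructs; otherwise the recomputed signature will differ.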

## Complete example: AWS Lambda Authorizer [#complete-example-aws-lambda-authorizer]

```javascript
// Lambda Authorizer for AWS API Gateway
exports.handler = async (event) => {
  const crypto = require('crypto');

  try {
    // Extract headers
    const authSignature = event.headers['x-docutray-auth-signature'];
    const timestamp = parseInt(event.headers['x-docutray-timestamp']);
    const requestId = event.headers['x-docutray-request-id'];
    const eventType = event.headers['x-docutray-event'];
    const webhookUrl = `https://${event.headers.host}${event.path}`;

    // Validate header presence
    if (!authSignature || !timestamp || !requestId || !eventType) {
      return generatePolicy('user', 'Deny', event.methodArn);
    }

    // Validate timestamp (5-minute window)
    const now = Math.floor(Date.now() / 1000);
    if (Math.abs(now - timestamp) > 300) {
      console.log('Webhook timestamp expired');
      return generatePolicy('user', 'Deny', event.methodArn);
    }

    // Recalculate expected signature
    const secret = process.env.DOCUTRAY_WEBHOOK_SECRET;
    const authPayload = `${requestId}|${timestamp}|${webhookUrl}|${eventType}`;
    const expectedSignature = crypto
      .createHmac('sha256', secret)
      .update(authPayload)
      .digest('hex');

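    // Validate signature (constant-time comparison)
    const expected = Buffer.from(`sha256=${expectedSignature}`);
    const received = Buffer.from(authSignature);
    if (expected.length !== received.length || !crypto.timingSafeEqual(expected, received)) {
      console.log('Signature verification failed');
      return generatePolicy('user', 'Deny', event.methodArn);
    }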

    // Valid signature - allow request
    return generatePolicy('user', 'Allow', event.methodArn);

  } catch (error) {
    console.error('Error in authorizer:', error);
    return generatePolicy('user', 'Deny', event.methodArn);
  }
};

function generatePolicy(principalId, effect, resource) {
  return {
    principalId,
    policyDocument: {
      Version: '2012-10-17',
      Statement: [{
        Action: 'execute-api:Invoke',
        Effect: effect,
        Resource: resource
      }]
    }
  };
}
```

## Example: Python for AWS Lambda Authorizer [#example-python-for-aws-lambda-authorizer]

```python
import hmac
import hashlib
import time
import os

def lambda_handler(event, context):
    try:
        # Extract headers
        headers = {k.lower(): v for k, v in event['headers'].items()}
        auth_signature = headers.get('x-docutray-auth-signature')
        timestamp = int(headers.get('x-docutray-timestamp', 0))
        request_id = headers.get('x-docutray-request-id')
        event_type = headers.get('x-docutray-event')
        webhook_url = f"https://{headers['host']}{event['path']}"

        # Validate header presence
        if not all([auth_signature, timestamp, request_id, event_type]):
            return generate_policy('user', 'Deny', event['methodArn'])

        # Validate timestamp (5-minute window)
        now = int(time.time())
        if abs(now - timestamp) > 300:
            print('Webhook timestamp expired')
            return generate_policy('user', 'Deny', event['methodArn'])

        # Recalculate expected signature
        secret = os.environ['DOCUTRAY_WEBHOOK_SECRET']
        auth_payload = f"{request_id}|{timestamp}|{webhook_url}|{event_type}"
        expected_signature = hmac.new(
            secret.encode(),
            auth_payload.encode(),
            hashlib.sha256
        ).hexdigest()

        # Validate signature (constant-time comparison)
        if not hmac.compare_digest(
            f"sha256={expected_signature}".encode(), auth_signature.encode()
        ):
            print('Signature verification failed')
            return generate_policy('user', 'Deny', event['methodArn'])

        # Valid signature - allow request
        return generate_policy('user', 'Allow', event['methodArn'])

    except Exception as error:
        print(f'Error in authorizer: {error}')
        return generate_policy('user', 'Deny', event['methodArn'])

def generate_policy(principal_id, effect, resource):
    return {
        'principalId': principal_id,
        'policyDocument': {
            'Version': '2012-10-17',
            'Statement': [{
                'Action': 'execute-api:Invoke',
                'Effect': effect,
                'Resource': resource
            }]
        }
    }
```

## Replay attack protection [#replay-attack-protection]

Each delivery carries two headers that let your endpoint detect replayed requests:

1. **Timestamp**: Each delivery includes `X-Docutray-Timestamp` (Unix seconds)
2. **Unique Request ID**: Each delivery has a unique `X-Docutray-Request-Id` (UUID)

### Security recommendations [#security-recommendations]

* **Validate timestamp**: Reject requests with timestamps outside a reasonable window (recommended: 5 minutes)
* **Cache request-id**: Temporarily store processed request-ids to detect duplicates
* **Use HTTPS**: Always use HTTPS endpoints to prevent interception

```javascript
// Example of request-id cache with Redis (node-redis v4+)
const redis = require('redis');
const client = redis.createClient();
client.connect().catch(console.error);

async function isReplayAttack(requestId) {
  const key = `webhook:${requestId}`;

  // SET with NX stores the key only if it does not exist yet, so the
  // check and the write happen in one atomic step (10-minute TTL)
  const stored = await client.set(key, '1', { NX: true, EX: 600 });

  // null means the key already existed: request-id already processed
  return stored === null;
}
```
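
For a single-process service or a test suite, the same duplicate check can be sketched without Redis. The `RequestIdCache` class below is a hypothetical in-memory stand-in; it does not share state across processes, so it is not a replacement for a shared cache in a multi-instance deployment.

```python
import time


class RequestIdCache:
    """Remember request-ids for a limited time to detect duplicates."""

    def __init__(self, ttl_seconds=600.0):
        self.ttl_seconds = ttl_seconds
        self._expires_at = {}  # request_id -> expiry time

    def is_replay(self, request_id, now=None):
        """Return True if the id was seen within the TTL, else record it."""
        now = time.monotonic() if now is None else now
        # Drop entries whose TTL has elapsed
        self._expires_at = {
            rid: exp for rid, exp in self._expires_at.items() if exp > now
        }
        if request_id in self._expires_at:
            return True
        self._expires_at[request_id] = now + self.ttl_seconds
        return False


cache = RequestIdCache(ttl_seconds=600)
```

The TTL should be at least as long as the accepted timestamp window, so that a request cannot be replayed after its id has been forgotten but before its timestamp expires.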

## Related pages [#related-pages]

* [Webhook Configuration](/docs/webhooks/configuracion)
* [Conversion Events](/docs/webhooks/conversion)
* [Identification Events](/docs/webhooks/identificacion)
* [Steps Events](/docs/webhooks/steps)
* [Implementation Examples](/docs/webhooks/ejemplos)


---

# Steps Events (https://docs.docutray.com/docs/webhooks/steps)



Steps webhooks are sent during the execution of individual steps in document processing workflows. Each step can perform operations such as conversion, identification, or validation.

## Steps Events [#steps-events]

### Step Started (`STEP_STARTED`) [#step-started-step_started]

Sent when step execution begins:

```json
{
  "step_execution_id": "step_exec_xyz123",
  "step_id": "step_convert_invoice",
  "step_name": "Convert Invoice",
  "status": "PROCESSING",
  "request_timestamp": "2024-01-15T10:30:00.000Z",
  "document_metadata": {
    "batch_id": "batch_001",
    "priority": "high"
  }
}
```

**Fields:**

* `step_execution_id` (string): Unique step execution ID
* `step_id` (string): Step ID in the workflow
* `step_name` (string): Descriptive name of the step
* `status` (string): Current status, always `"PROCESSING"` for this event
* `request_timestamp` (string): ISO 8601 timestamp of when the step started
* `document_metadata` (object, optional): Metadata of the document being processed

### Step Completed (`STEP_COMPLETED`) [#step-completed-step_completed]

Sent when a step finishes successfully:

```json
{
  "step_execution_id": "step_exec_xyz123",
  "step_id": "step_convert_invoice",
  "step_name": "Convert Invoice",
  "status": "completed",
  "request_timestamp": "2024-01-15T10:30:00.000Z",
  "response_timestamp": "2024-01-15T10:30:12.000Z",
  "document_metadata": {
    "batch_id": "batch_001",
    "priority": "high"
  },
  "data": {
    "invoiceNumber": "INV-2024-001",
    "amount": 1250.00,
    "vendor": "ABC Company Inc.",
    "date": "2024-01-15"
  },
  "identification": {
    "document_type": "invoice",
    "confidence": 0.95
  },
  "validation": {
    "errors": {
      "count": 0,
      "messages": []
    },
    "warnings": {
      "count": 1,
      "messages": ["Amount exceeds historical average"]
    }
  }
}
```

**Fields:**

* `step_execution_id` (string): Unique step execution ID
* `step_id` (string): Step ID in the workflow
* `step_name` (string): Descriptive name of the step
* `status` (string): Current status, always `"completed"` for this event
* `request_timestamp` (string): ISO 8601 timestamp of when the step started
* `response_timestamp` (string): ISO 8601 timestamp of when the step completed
* `document_metadata` (object, optional): Metadata of the document being processed
* `data` (object, optional): Processed document data (if the step performed conversion)
* `identification` (object, optional): Identification result (if the step performed identification)
* `validation` (object, optional): Validation result with errors and warnings

### Step Failed (`STEP_FAILED`) [#step-failed-step_failed]

Sent when a step fails:

```json
{
  "step_execution_id": "step_exec_xyz123",
  "step_id": "step_convert_invoice",
  "step_name": "Convert Invoice",
  "status": "failed",
  "request_timestamp": "2024-01-15T10:30:00.000Z",
  "response_timestamp": "2024-01-15T10:30:08.000Z",
  "document_metadata": {
    "batch_id": "batch_001",
    "priority": "high"
  },
  "error": "Validation failed: required field 'invoiceNumber' is missing"
}
```

**Fields:**

* `step_execution_id` (string): Unique step execution ID
* `step_id` (string): Step ID in the workflow
* `step_name` (string): Descriptive name of the step
* `status` (string): Current status, always `"failed"` for this event
* `request_timestamp` (string): ISO 8601 timestamp of when the step started
* `response_timestamp` (string): ISO 8601 timestamp of when the step failed
* `document_metadata` (object, optional): Metadata of the document being processed
* `error` (string): Descriptive error message

## Implementation Example [#implementation-example]

```javascript
app.post('/webhooks/docutray', (req, res) => {
  const eventType = req.headers['x-docutray-event'];
  const data = JSON.parse(req.body);

  switch (eventType) {
    case 'STEP_STARTED':
      console.log(`Step started: ${data.step_name} (${data.step_execution_id})`);
      // Update UI with workflow progress
      break;

    case 'STEP_COMPLETED':
      console.log(`Step completed: ${data.step_name} (${data.step_execution_id})`);

      // Process data if available
      if (data.data) {
        console.log('Processed data:', data.data);
        // Save data to database
      }

      // Check validation results
      if (data.validation) {
        if (data.validation.errors.count > 0) {
          console.log('Validation errors:', data.validation.errors.messages);
          // Notify validation errors
        }
        if (data.validation.warnings.count > 0) {
          console.log('Warnings:', data.validation.warnings.messages);
          // Log warnings
        }
      }
      break;

    case 'STEP_FAILED':
      console.log(`Step failed: ${data.step_name} (${data.step_execution_id})`);
      console.log('Error:', data.error);
      // Stop workflow and notify error
      // Log failure for analysis
      break;
  }

  res.status(200).send('OK');
});
```

## Use cases [#use-cases]

### Monitoring complex workflows [#monitoring-complex-workflows]

Steps webhooks are ideal for monitoring the execution of multi-step workflows:

```javascript
// Example: Track progress of a multi-step workflow
const flowProgress = {
  stepStates: {},
  totalSteps: 0,
  completedSteps: 0
};

function handleStepEvent(data, eventType) {
  const stepId = data.step_id;

  if (eventType === 'STEP_STARTED') {
    flowProgress.stepStates[stepId] = 'processing';
    flowProgress.totalSteps++;
  } else if (eventType === 'STEP_COMPLETED') {
    flowProgress.stepStates[stepId] = 'completed';
    flowProgress.completedSteps++;
  } else if (eventType === 'STEP_FAILED') {
    flowProgress.stepStates[stepId] = 'failed';
  }

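  // Calculate progress percentage (guard against events that arrive before any STEP_STARTED)
  const progress = flowProgress.totalSteps > 0
    ? (flowProgress.completedSteps / flowProgress.totalSteps) * 100
    : 0;
  console.log(`Workflow progress: ${progress.toFixed(0)}%`);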
}
```

### Validation and quality control [#validation-and-quality-control]

Use validation results to implement quality controls:

```javascript
function handleValidationResults(validation) {
  // Stop processing if there are critical errors
  if (validation.errors.count > 0) {
    // Send to manual review
    sendToManualReview(validation.errors.messages);
    return;
  }

  // Warnings don't block the workflow
  if (validation.warnings.count > 0) {
    // Log for analysis but continue
    logWarnings(validation.warnings.messages);
  }

  // Continue to next step in workflow
  proceedToNextStep();
}
```

## Related pages [#related-pages]

* [Webhook Configuration](/docs/webhooks/configuracion)
* [Security and Verification](/docs/webhooks/seguridad)
* [Conversion Events](/docs/webhooks/conversion)
* [Identification Events](/docs/webhooks/identificacion)
* [Implementation Examples](/docs/webhooks/ejemplos)


---

# AI Agent Usage (https://docs.docutray.com/docs/cli/guides/ai-agent-usage)



This guide covers how to use the DocuTray CLI from AI agents (Claude Code, GitHub Copilot, Codex) and automation scripts. The CLI is designed with AI agents as a primary audience — every design decision reflects this.

## Design principles for agents [#design-principles-for-agents]

* **JSON output by default** — all commands output structured JSON to stdout, parseable without extra flags
* **No interactive prompts** — every command (except `login`) works non-interactively with flags and arguments
* **Clear exit codes** — `0` for success, `1` for errors with JSON details on stderr
* **Composable** — commands can be piped and chained with standard Unix tools

## Authentication for agents [#authentication-for-agents]

Use the `DOCUTRAY_API_KEY` environment variable:

```bash
export DOCUTRAY_API_KEY=dt_live_abc123
```

This is the recommended method for agents. Never use `docutray login` from an agent — it requires interactive input.

### Verifying authentication [#verifying-authentication]

```bash
docutray status | jq -e '.authenticated'
# Exit code 0 if authenticated, 1 if not
```

## Parsing JSON output [#parsing-json-output]

All commands return structured JSON. Use `jq` or your language's JSON parser.

### Extracting fields [#extracting-fields]

```bash
# Get the extracted data from a conversion
docutray convert invoice.pdf -t electronic-invoice | jq '.extractedData'

# Get the detected document type code
docutray identify document.pdf | jq -r '.document_type.code'

# List all document type codes
docutray types list | jq -r '.data[].codeType'
```

### Error handling [#error-handling]

Errors are written to stderr as JSON:

```bash
# Capture stdout in a variable and stderr in a file
result=$(docutray convert invoice.pdf -t bad-type 2>error.json)
exit_code=$?

if [ $exit_code -ne 0 ]; then
  error_message=$(jq -r '.error' error.json)
  echo "Failed: $error_message"
fi
```

## Common agent patterns [#common-agent-patterns]

### Identify then convert [#identify-then-convert]

The most common workflow — detect the document type, then convert:

```bash
TYPE=$(docutray identify document.pdf | jq -r '.document_type.code')
docutray convert document.pdf --type "$TYPE"
```

### Batch processing with error collection [#batch-processing-with-error-collection]

```bash
mkdir -p results  # the output directory must exist before redirecting into it
errors=()
for file in documents/*.pdf; do
  if ! docutray convert "$file" -t electronic-invoice > "results/$(basename "$file" .pdf).json" 2>/dev/null; then
    errors+=("$file")
  fi
done

if [ ${#errors[@]} -gt 0 ]; then
  echo "Failed files: ${errors[*]}" >&2
fi
```
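The shell loop above discards stderr, so the reason for each failure is lost. The following sketch shows one way to keep it from Python, assuming errors follow the documented JSON shape with an `error` field. `convert_file` and `run_batch` are hypothetical helper names, and the `runner` parameter exists only so the logic can be exercised without the CLI installed.

```python
import json
import subprocess
from pathlib import Path
from typing import Callable


def convert_file(path: Path, type_code: str) -> subprocess.CompletedProcess:
    """Run `docutray convert` for one file, capturing stdout and stderr."""
    return subprocess.run(
        ["docutray", "convert", str(path), "-t", type_code],
        capture_output=True, text=True,
    )


def run_batch(files, type_code: str, out_dir: Path, runner: Callable = convert_file) -> dict:
    """Convert each file; write successes to out_dir and return {file: error message}."""
    out_dir.mkdir(parents=True, exist_ok=True)
    failures = {}
    for path in files:
        proc = runner(path, type_code)
        if proc.returncode == 0:
            (out_dir / f"{path.stem}.json").write_text(proc.stdout)
            continue
        try:
            failures[str(path)] = json.loads(proc.stderr)["error"]
        except (ValueError, KeyError, TypeError):
            # stderr was empty or not the documented JSON shape; keep the raw text.
            failures[str(path)] = proc.stderr.strip()
    return failures
```

A call such as `run_batch(sorted(Path("documents").glob("*.pdf")), "electronic-invoice", Path("results"))` mirrors the shell loop and returns a mapping an agent can report or retry from.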

### Checking if a document type exists [#checking-if-a-document-type-exists]

```bash
if docutray types get my-type > /dev/null 2>&1; then
  echo "Type exists"
else
  echo "Type not found"
fi
```

### Async processing with status polling [#async-processing-with-status-polling]

```bash
# Start step execution without waiting
exec_id=$(docutray steps run extract-fields invoice.pdf --no-wait | jq -r '.id')

# Poll until complete
while true; do
  status=$(docutray steps status "$exec_id" | jq -r '.status')
  case "$status" in
    completed) break ;;
    failed) echo "Step failed" >&2; exit 1 ;;
    *) sleep 2 ;;
  esac
done

# Get final result
docutray steps status "$exec_id"
```
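The shell loop above never gives up if the execution stays in a non-terminal state. A Python version with a deadline might look like the sketch below. The `completed` and `failed` status values come from the loop above. The 300-second default, the helper name `poll_until_done`, and the injectable `sleep` and `clock` parameters are assumptions made for illustration and testing.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    timeout: float = 300.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status until the status is completed or failed, or the deadline passes."""
    deadline = clock() + timeout
    while True:
        execution = fetch_status()
        status = execution.get("status")
        if status == "completed":
            return execution
        if status == "failed":
            raise RuntimeError(f"Step failed: {execution}")
        if clock() >= deadline:
            raise TimeoutError(f"Still {status!r} after {timeout} seconds")
        sleep(interval)
```

Here `fetch_status` would typically wrap `subprocess.run(["docutray", "steps", "status", exec_id], ...)` and return the parsed stdout.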

## Integration examples [#integration-examples]

### Claude Code [#claude-code]

When using DocuTray from Claude Code, the CLI's JSON output and clear exit codes make it straightforward:

```
You: Convert invoice.pdf using the electronic-invoice type
Claude Code: $ docutray convert invoice.pdf --type electronic-invoice
```

Claude Code can parse the JSON output directly and present results in a readable format.

### GitHub Actions [#github-actions]

```yaml
- name: Process documents
  env:
    DOCUTRAY_API_KEY: ${{ secrets.DOCUTRAY_API_KEY }}
  run: |
    mkdir -p processed
    for file in uploads/*.pdf; do
      docutray convert "$file" -t electronic-invoice > "processed/$(basename "$file" .pdf).json"
    done
```

### Node.js scripts [#nodejs-scripts]

```javascript
import { execSync } from 'node:child_process';

// DOCUTRAY_API_KEY is inherited from the parent process environment;
// avoid hardcoding keys in source. execSync throws if the CLI exits non-zero.
const result = JSON.parse(
  execSync('docutray convert invoice.pdf -t electronic-invoice', {
    encoding: 'utf-8',
  })
);

console.log(result.extractedData);
```

### Python scripts [#python-scripts]

```python
import json
import subprocess

# DOCUTRAY_API_KEY is inherited from the parent environment; avoid hardcoding keys in source.
result = subprocess.run(
    ["docutray", "convert", "invoice.pdf", "-t", "electronic-invoice"],
    capture_output=True, text=True,
)

if result.returncode == 0:
    data = json.loads(result.stdout)
    print(data["extractedData"])
else:
    error = json.loads(result.stderr)
    print(f"Error: {error['error']}")
```

## Tips for agent developers [#tips-for-agent-developers]

1. **Always check exit codes** — don't assume success
2. **Parse stderr for errors** — error details are always JSON on stderr
3. **Use `--type` explicitly** — don't rely on auto-detection for production workflows
4. **Prefer env vars** — `DOCUTRAY_API_KEY` is the most portable auth method
5. **Use `jq -e`** — the `-e` flag makes `jq` exit with code 1 when the result is `false` or `null`, useful for conditionals
6. **Pipe through `jq -r`** — the `-r` flag outputs raw strings without quotes, better for shell variables


---

# Authentication (https://docs.docutray.com/docs/cli/guides/authentication)



This guide covers how to authenticate the DocuTray CLI, manage API keys, and configure authentication for different environments.

## Getting an API key [#getting-an-api-key]

1. Sign in to the [DocuTray Dashboard](https://app.docutray.com)
2. Navigate to **Account > API Keys**
3. Click **New API Key**, enter a descriptive name, click **Create**, and copy the generated key (it starts with `dt_`). The complete key is shown only once.

## Authentication methods [#authentication-methods]

The CLI supports two authentication methods. When both are present, the environment variable takes precedence.

### 1. Environment variable (recommended for CI/CD) [#1-environment-variable-recommended-for-cicd]

Set `DOCUTRAY_API_KEY` in your environment:

```bash
export DOCUTRAY_API_KEY=dt_live_abc123
docutray convert invoice.pdf --type electronic-invoice
```

This is the recommended method for:

* CI/CD pipelines (GitHub Actions, GitLab CI, etc.)
* AI agents (Claude Code, Copilot, Codex)
* Docker containers
* Serverless functions

### 2. Config file (recommended for local development) [#2-config-file-recommended-for-local-development]

Use the `login` command to store your key locally:

```bash
docutray login
# Or non-interactively:
docutray login dt_live_abc123
```

Credentials are stored in `~/.config/docutray/config.json` with restricted file permissions (`0600`).
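The precedence rule (environment variable first, then config file) can be pictured with a short sketch. This is not the CLI's implementation: `resolve_api_key` is a hypothetical name, and the `apiKey` field inside the config file is an assumption, since the file's layout is not documented here.

```python
import json
from pathlib import Path
from typing import Optional, Tuple


def resolve_api_key(env: dict, config_path: Path) -> Tuple[Optional[str], str]:
    """Return (api_key, source), where source is environment, config, or none."""
    key = env.get("DOCUTRAY_API_KEY")
    if key:
        return key, "environment"
    try:
        # ASSUMPTION: the config file is a JSON object with an "apiKey" field.
        key = json.loads(config_path.read_text()).get("apiKey")
    except (OSError, ValueError, AttributeError):
        # Missing file, invalid JSON, or JSON that is not an object.
        key = None
    return (key, "config") if key else (None, "none")
```

The three `source` strings match the values reported by `docutray status`.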

## Verifying authentication [#verifying-authentication]

```bash
docutray status
```

Output:

```json
{
  "authenticated": true,
  "apiKey": "dt_l****c123",
  "source": "environment",
  "baseUrl": "https://app.docutray.com",
  "configPath": "/home/user/.config/docutray/config.json"
}
```

The `source` field indicates where the API key was found: `environment`, `config`, or `none`.

## CI/CD configuration [#cicd-configuration]

### GitHub Actions [#github-actions]

```yaml
jobs:
  process-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4  # makes invoice.pdf from the repository available
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm install -g @docutray/cli
      - run: docutray convert invoice.pdf --type electronic-invoice
        env:
          DOCUTRAY_API_KEY: ${{ secrets.DOCUTRAY_API_KEY }}
```

### GitLab CI [#gitlab-ci]

```yaml
process-docs:
  image: node:20
  script:
    - npm install -g @docutray/cli
    - docutray convert invoice.pdf --type electronic-invoice
  # DOCUTRAY_API_KEY is injected from the project's CI/CD variables
  # (Settings > CI/CD > Variables); no `variables:` entry is needed.
```

### Docker [#docker]

```dockerfile
FROM node:20-slim
RUN npm install -g @docutray/cli
ENV DOCUTRAY_API_KEY=""
CMD ["docutray", "convert", "invoice.pdf", "--type", "electronic-invoice"]
```

```bash
docker run -e DOCUTRAY_API_KEY=dt_live_abc123 my-processor
```

## Custom API base URL [#custom-api-base-url]

For staging environments or self-hosted deployments:

```bash
# Via login
docutray login --base-url https://staging.docutray.com

# Via environment variable
export DOCUTRAY_BASE_URL=https://staging.docutray.com
```

## Logging out [#logging-out]

To remove stored credentials from the local machine:

```bash
docutray logout
```

This deletes the config file. It does **not** invalidate the API key on the server — revoke keys from the Dashboard.

## Security best practices [#security-best-practices]

* Never commit API keys to version control
* Use environment variables in CI/CD, not `docutray login`
* Rotate keys periodically from the Dashboard
* Use separate keys for production and development
* The config file is created with `0600` permissions (owner-only read/write)
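To confirm that a credentials file is readable only by its owner, a script can inspect the mode bits. The sketch below works on POSIX systems; `is_owner_only` is a hypothetical helper name.

```python
import os
import stat
from pathlib import Path


def is_owner_only(path: Path) -> bool:
    """True if the file mode grants no permissions to group or others (e.g. 0600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

For example, `is_owner_only(Path.home() / ".config/docutray/config.json")` should return `True` for a file created by `docutray login`.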


---

# Document Processing (https://docs.docutray.com/docs/cli/guides/document-processing)



This guide covers the full document processing workflow — from converting documents to structured data, to identifying document types, and handling asynchronous operations.

## Converting documents [#converting-documents]

The `convert` command extracts structured data from a document using a specified document type schema.

### Basic usage [#basic-usage]

```bash
# Convert a local file
docutray convert invoice.pdf --type electronic-invoice

# Convert from a URL
docutray convert https://example.com/doc.pdf --type electronic-invoice
```

### Synchronous vs asynchronous processing [#synchronous-vs-asynchronous-processing]

By default, `convert` processes synchronously — the command blocks until the result is ready:

```bash
docutray convert invoice.pdf --type electronic-invoice
# Waits and returns the extracted data as JSON
```

For long-running documents, use `--async` to enable polling with status updates:

```bash
docutray convert large-document.pdf --type electronic-invoice --async
# Status updates are emitted to stderr as JSON:
# {"status":"processing"}
# {"status":"processing"}
# Final result is written to stdout
```
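A caller that captures stderr during an `--async` run can separate the status updates from any other text. This sketch assumes each update is a single-line JSON object with a `status` field, as in the comments above; `parse_status_updates` is a hypothetical helper name.

```python
import json


def parse_status_updates(stderr_text: str) -> list[str]:
    """Collect the `status` value from each JSON line; skip lines that are not JSON objects."""
    statuses = []
    for line in stderr_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            update = json.loads(line)
        except ValueError:
            continue  # not JSON, e.g. a warning printed by another tool
        if isinstance(update, dict) and "status" in update:
            statuses.append(update["status"])
    return statuses


sample = '{"status":"processing"}\n{"status":"processing"}\n'
print(parse_status_updates(sample))  # ['processing', 'processing']
```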

### Webhooks [#webhooks]

Instead of polling, you can receive a notification when processing completes:

```bash
docutray convert invoice.pdf --type electronic-invoice --webhook-url https://example.com/hooks/docutray
```

### Attaching metadata [#attaching-metadata]

Attach custom metadata to a conversion for tracking purposes:

```bash
docutray convert invoice.pdf --type electronic-invoice --metadata '{"orderId":"ORD-123","source":"email"}'
```

Metadata is stored with the conversion result and included in webhook payloads.
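When metadata comes from program data rather than a hand-written string, serializing it with a JSON library avoids quoting mistakes. The sketch below builds an argument list for `subprocess.run` using only the `--type`, `--metadata`, and `--webhook-url` flags documented for `convert`; `build_convert_args` is a hypothetical helper name.

```python
import json
from typing import Optional


def build_convert_args(
    source: str,
    type_code: str,
    metadata: Optional[dict] = None,
    webhook_url: Optional[str] = None,
) -> list[str]:
    """Build an argv list for `docutray convert`, suitable for subprocess.run."""
    args = ["docutray", "convert", source, "--type", type_code]
    if metadata is not None:
        # json.dumps handles escaping inside the JSON; passing a list to
        # subprocess.run avoids shell quoting entirely.
        args += ["--metadata", json.dumps(metadata, separators=(",", ":"))]
    if webhook_url:
        args += ["--webhook-url", webhook_url]
    return args


print(build_convert_args("invoice.pdf", "electronic-invoice", {"orderId": "ORD-123", "source": "email"}))
```

The printed list ends with `'--metadata', '{"orderId":"ORD-123","source":"email"}'`, the same payload as the shell example above.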

## Identifying documents [#identifying-documents]

The `identify` command analyzes a document and returns the best-matching document type with a confidence score.

```bash
docutray identify document.pdf
```

Output:

```json
{
  "document_type": {
    "code": "electronic-invoice",
    "name": "Electronic Invoice",
    "confidence": 0.95
  },
  "alternatives": [
    {
      "code": "receipt",
      "name": "Receipt",
      "confidence": 0.12
    }
  ]
}
```
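A script can use the confidence score to decide whether to trust the best match before converting. The following is a sketch against the output shape above. The 0.8 threshold is an arbitrary illustration, not a DocuTray recommendation, and `pick_type` is a hypothetical helper name.

```python
from typing import Optional


def pick_type(identification: dict, min_confidence: float = 0.8) -> Optional[str]:
    """Return the best-match code, or None when its confidence is below the threshold."""
    best = identification.get("document_type") or {}
    if best.get("confidence", 0.0) >= min_confidence:
        return best.get("code")
    return None


sample = {
    "document_type": {"code": "electronic-invoice", "name": "Electronic Invoice", "confidence": 0.95},
    "alternatives": [{"code": "receipt", "name": "Receipt", "confidence": 0.12}],
}
print(pick_type(sample))        # electronic-invoice
print(pick_type(sample, 0.99))  # None
```

Returning `None` lets the caller route low-confidence documents to manual review instead of converting them with a possibly wrong type.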

### Restricting to specific types [#restricting-to-specific-types]

Narrow identification to a known set of document types:

```bash
docutray identify document.pdf --types invoice,receipt,contract
```

### Table output [#table-output]

For human-readable output:

```bash
docutray identify document.pdf --table
```

```
code                name                confidence
------------------  ------------------  ----------
electronic-invoice  Electronic Invoice  0.95
receipt             Receipt             0.12
```

## Processing steps [#processing-steps]

Steps are reusable processing pipelines configured in the DocuTray dashboard.

### Running a step [#running-a-step]

```bash
# Run a step and wait for results
docutray steps run extract-fields invoice.pdf

# Run a step on a URL
docutray steps run extract-fields https://example.com/doc.pdf
```

### Async step execution [#async-step-execution]

Start a step and return immediately:

```bash
docutray steps run extract-fields invoice.pdf --no-wait
```

Output:

```json
{
  "id": "exec_abc123",
  "status": "pending"
}
```

Then check the status later:

```bash
docutray steps status exec_abc123
```

## Common workflows [#common-workflows]

### Identify then convert [#identify-then-convert]

```bash
# Identify the document type, then convert using the detected type
TYPE=$(docutray identify document.pdf | jq -r '.document_type.code')
docutray convert document.pdf --type "$TYPE"
```

### Batch processing [#batch-processing]

```bash
# Process all PDFs in a directory
mkdir -p results
for file in documents/*.pdf; do
  echo "Processing: $file" >&2
  docutray convert "$file" --type electronic-invoice > "results/$(basename "$file" .pdf).json"
done
```

### Error handling in scripts [#error-handling-in-scripts]

```bash
if result=$(docutray convert invoice.pdf --type electronic-invoice 2>/dev/null); then
  echo "$result" | jq '.extractedData'
else
  echo "Conversion failed" >&2
  exit 1
fi
```

## Output format [#output-format]

All commands output JSON to stdout by default. Errors are written to stderr as JSON with an `error` field:

```json
{
  "error": "Document type not found: invalid-type",
  "status": 404
}
```

Exit codes:

* `0` — success
* `1` — error (details on stderr)
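In a wrapper script, the error contract above can be handled in one place. This sketch parses the documented `error` and `status` fields and falls back to the raw text when stderr is not JSON, for example when the shell cannot find the binary. `parse_cli_error` is a hypothetical helper name.

```python
import json
from typing import Optional, Tuple


def parse_cli_error(stderr_text: str) -> Tuple[str, Optional[int]]:
    """Return (message, status) from CLI stderr, tolerating non-JSON text."""
    try:
        payload = json.loads(stderr_text)
    except ValueError:
        payload = None
    if isinstance(payload, dict) and "error" in payload:
        return str(payload["error"]), payload.get("status")
    # Fall back to the raw text when the documented shape is absent.
    return stderr_text.strip(), None


sample = '{"error": "Document type not found: invalid-type", "status": 404}'
print(parse_cli_error(sample))  # ('Document type not found: invalid-type', 404)
```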


---

# Document Types (https://docs.docutray.com/docs/cli/guides/document-types)



This guide covers how to list, inspect, and export document types using the DocuTray CLI. Document types define the extraction schema — the fields and structure that DocuTray extracts from documents during conversion.

## Listing document types [#listing-document-types]

View all available document types:

```bash
docutray types list
```

Output:

```json
{
  "data": [
    {
      "codeType": "electronic-invoice",
      "name": "Electronic Invoice",
      "isPublic": true,
      "isDraft": false
    }
  ],
  "total": 42,
  "page": 1,
  "limit": 20
}
```

### Table format [#table-format]

For a quick overview:

```bash
docutray types list --table
```

```
code                name                public  draft
------------------  ------------------  ------  -----
electronic-invoice  Electronic Invoice  yes     no
receipt             Receipt             yes     no
contract            Contract            no      no
```

### Searching [#searching]

Filter document types by name:

```bash
docutray types list --search invoice
```

### Pagination [#pagination]

Navigate through large result sets:

```bash
# First page, 50 results
docutray types list --limit 50

# Second page
docutray types list --limit 50 --page 2
```
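Scripts that need every type, not just one page, can derive the number of pages from the `total` and `limit` fields of the first response. This is a sketch under the assumption that `total` counts all matching types across pages. `iter_type_codes` is a hypothetical helper name, and `fetch_page(page, limit)` stands in for a call to `docutray types list --limit <limit> --page <page>` that returns the parsed JSON.

```python
import math
from typing import Callable, Iterator


def iter_type_codes(fetch_page: Callable[[int, int], dict], limit: int = 50) -> Iterator[str]:
    """Yield every codeType by requesting pages until `total` is covered."""
    first = fetch_page(1, limit)
    yield from (item["codeType"] for item in first["data"])
    # e.g. total=42 and limit=20 gives ceil(42 / 20) = 3 pages
    pages = math.ceil(first["total"] / limit)
    for page in range(2, pages + 1):
        yield from (item["codeType"] for item in fetch_page(page, limit)["data"])
```

Injecting `fetch_page` keeps the paging arithmetic separate from process handling, so it can be tested without the CLI.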

### Extracting type codes [#extracting-type-codes]

Get just the type codes for scripting:

```bash
docutray types list | jq -r '.data[].codeType'
```

## Getting type details [#getting-type-details]

Inspect a specific document type by its code:

```bash
docutray types get electronic-invoice
```

Returns the full type definition including name, description, field schema, and configuration.

### Table format [#table-format-1]

```bash
docutray types get electronic-invoice --table
```

## Exporting document types [#exporting-document-types]

Export a document type definition to JSON for backup, version control, or migration between environments.

### Export to stdout [#export-to-stdout]

```bash
docutray types export electronic-invoice
```

### Export to a file [#export-to-a-file]

```bash
docutray types export electronic-invoice --output electronic-invoice.json
```

Output:

```json
{
  "exported": "electronic-invoice.json",
  "code": "electronic-invoice"
}
```

### Backup all types [#backup-all-types]

```bash
mkdir -p type-backups
# `types list` is paginated; raise --limit (or loop over --page) so no type is skipped
for code in $(docutray types list --limit 100 | jq -r '.data[].codeType'); do
  docutray types export "$code" --output "type-backups/${code}.json"
  echo "Exported: $code" >&2
done
```

## Creating document types [#creating-document-types]

Create a new document type with a name, code, description, and JSON schema:

```bash
docutray types create \
  --name "Invoice" \
  --code invoice \
  --description "Standard commercial invoice" \
  --schema schema.json
```

The `--schema` flag accepts either a file path or an inline JSON string:

```bash
docutray types create \
  --name "Receipt" \
  --code receipt \
  --description "Purchase receipt" \
  --schema '{"type":"object","properties":{"total":{"type":"number"},"date":{"type":"string"},"vendor":{"type":"string"}}}'
```

### Publishing [#publishing]

By default, types are created as drafts. To publish immediately:

```bash
docutray types create \
  --name "Invoice" \
  --code invoice \
  --description "Standard invoice" \
  --schema schema.json \
  --publish
```

### Conversion modes [#conversion-modes]

Choose how DocuTray processes the document:

```bash
# Default JSON extraction
docutray types create --name "Invoice" --code invoice --description "Invoice" --schema schema.json --conversion-mode json

# Toon mode
docutray types create --name "Invoice" --code invoice_toon --description "Invoice toon" --schema schema.json --conversion-mode toon

# Multi-prompt mode
docutray types create --name "Invoice" --code invoice_multi --description "Invoice multi" --schema schema.json --conversion-mode multi_prompt
```

### Prompt hints [#prompt-hints]

Guide the extraction with custom hints:

```bash
docutray types create \
  --name "Invoice" \
  --code invoice \
  --description "Standard invoice" \
  --schema schema.json \
  --prompt-hints "Use dd/mm/yyyy format for dates. Amounts should include currency symbol." \
  --identify-hints "Look for invoice number, date, and total amount"
```

### Example JSON schema [#example-json-schema]

A minimal but complete extraction schema:

```json
{
  "type": "object",
  "properties": {
    "invoiceNumber": {
      "type": "string",
      "description": "The invoice number or identifier"
    },
    "date": {
      "type": "string",
      "description": "Invoice date in ISO 8601 format"
    },
    "total": {
      "type": "number",
      "description": "Total amount including taxes"
    },
    "lineItems": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string" },
          "quantity": { "type": "number" },
          "unitPrice": { "type": "number" }
        }
      }
    }
  }
}
```
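As a quick sanity check, a script can compare extracted data against the top-level `type` declarations of a schema like the one above. The sketch below is deliberately simplified and is not a JSON Schema validator: it looks only at top-level properties and four type names. It also assumes the extracted data mirrors the schema's property names; `check_types` is a hypothetical helper name.

```python
def check_types(data: dict, schema: dict) -> list[str]:
    """Return the names of top-level fields whose values do not match the schema type."""
    checks = {
        "string": lambda v: isinstance(v, str),
        # bool is a subclass of int in Python, so exclude it explicitly.
        "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
        "array": lambda v: isinstance(v, list),
        "object": lambda v: isinstance(v, dict),
    }
    mismatches = []
    for name, spec in schema.get("properties", {}).items():
        if name not in data:
            continue  # the example schema declares no required fields
        check = checks.get(spec.get("type"))
        if check is not None and not check(data[name]):
            mismatches.append(name)
    return mismatches


schema = {
    "type": "object",
    "properties": {
        "invoiceNumber": {"type": "string"},
        "total": {"type": "number"},
        "lineItems": {"type": "array"},
    },
}
print(check_types({"invoiceNumber": "F-001", "total": "119.00", "lineItems": []}, schema))  # ['total']
```

For full validation, including nested items and required fields, a dedicated JSON Schema library is the better tool.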

## Updating document types [#updating-document-types]

Update any field of an existing document type:

```bash
# Update the name
docutray types update invoice --name "Updated Invoice"

# Update the schema
docutray types update invoice --schema new-schema.json

# Update prompt hints
docutray types update invoice --prompt-hints "Use dd/mm/yyyy format for dates"

# Publish a draft
docutray types update invoice --publish

# Update multiple fields at once
docutray types update invoice \
  --name "Commercial Invoice v2" \
  --description "Updated extraction schema" \
  --schema updated-schema.json \
  --prompt-hints "Extract amounts in USD"
```

Note: the `codeType` identifier cannot be changed after creation.

## Common workflows [#common-workflows]

### Find the right type for a document [#find-the-right-type-for-a-document]

```bash
# First, identify what type the document is
docutray identify unknown-document.pdf

# Then inspect that type's schema
docutray types get electronic-invoice

# Finally, convert using the right type
docutray convert unknown-document.pdf --type electronic-invoice
```

### Audit available types [#audit-available-types]

```bash
# List all types with details
docutray types list --table --limit 100

# Check if a specific type exists
docutray types get my-custom-type > /dev/null 2>&1 && echo "exists" || echo "not found"
```

### Version-control type definitions [#version-control-type-definitions]

```bash
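# Export all types to a directory tracked by Git
mkdir -p types
# `types list` is paginated; raise --limit (or loop over --page) so no type is skipped
docutray types list --limit 100 | jq -r '.data[].codeType' | while read -r code; do
  docutray types export "$code" > "types/${code}.json"
done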
git add types/
git commit -m "chore: snapshot document type definitions"
```


---

# Convert (https://docs.docutray.com/docs/cli/commands/convert)



Convert a document to structured data using a specified document type schema. Accepts a local file path or a public URL as the source. By default, processing is synchronous: the command waits and returns the extracted data. Use `--async` for long-running documents to poll for completion with status updates on stderr.

* [`docutray convert SOURCE`](#docutray-convert-source)

## `docutray convert SOURCE` [#docutray-convert-source]

```
USAGE
  $ docutray convert SOURCE -t <value> [--async] [--json] [--metadata <value>] [--timeout <value>]
    [--webhook-url <value>]

ARGUMENTS
  SOURCE  File path or URL to convert

FLAGS
  -t, --type=<value>         (required) Document type code to use for extraction (see: docutray types list)
      --async                Use async processing with polling (default: false). Status updates are emitted to stderr.
      --json                 Output as JSON (default when piped)
      --metadata=<value>     JSON metadata to attach to the conversion (e.g. '{"key":"value"}')
      --timeout=<value>      [default: 300] Polling timeout in seconds for async processing
      --webhook-url=<value>  Webhook URL to receive a POST notification when conversion completes

DESCRIPTION
  Convert a document to structured data using a specified document type schema. Accepts a local file path or a public
  URL as the source. By default, processing is synchronous — the command waits and returns the extracted data. Use
  --async for long-running documents to poll for completion with status updates on stderr.

EXAMPLES
  Convert a local PDF using a document type

    $ docutray convert invoice.pdf --type electronic-invoice

  Convert a document from a URL

    $ docutray convert https://example.com/doc.pdf -t electronic-invoice

  Use async processing with status polling

    $ docutray convert invoice.pdf -t electronic-invoice --async

  Async with 10-minute timeout for large documents

    $ docutray convert large-doc.pdf -t electronic-invoice --async --timeout 600

  Convert with webhook notification on completion

    $ docutray convert receipt.jpg -t receipt --webhook-url https://example.com/hook

  Attach custom metadata to the conversion

    $ docutray convert invoice.pdf -t electronic-invoice --metadata '{"ref":"order-123"}'

DOCUMENTATION
  Learn more: https://docs.docutray.com/cli/convert
```

*See code: [src/commands/convert.ts](https://github.com/docutray/docutray-cli/blob/v0.2.1/src/commands/convert.ts)*


---

# Identify (https://docs.docutray.com/docs/cli/commands/identify)



Identify the type of a document by analyzing its content. Returns the best-matching document type along with alternative matches ranked by confidence score. Use `--types` to restrict identification to a specific set of document types. Accepts a local file path or a public URL.

* [`docutray identify SOURCE`](#docutray-identify-source)

## `docutray identify SOURCE` [#docutray-identify-source]

```
USAGE
  $ docutray identify SOURCE [--async] [--json] [--types <value>]

ARGUMENTS
  SOURCE  File path or URL to identify

FLAGS
  --async          Use async processing with polling (default: false). Status updates are emitted to stderr.
  --json           Output as JSON (default when piped)
  --types=<value>  Comma-separated list of document type codes to restrict identification (e.g. invoice,receipt)

DESCRIPTION
  Identify the type of a document by analyzing its content. Returns the best-matching document type along with
  alternative matches ranked by confidence score. Use --types to restrict identification to a specific set of document
  types. Accepts a local file path or a public URL.

EXAMPLES
  Identify a local document

    $ docutray identify document.pdf

  Identify a document from a URL

    $ docutray identify https://example.com/doc.pdf

  Restrict to specific document types

    $ docutray identify document.pdf --types invoice,receipt,contract

  Force JSON output

    $ docutray identify document.pdf --json

  Use async processing with status polling

    $ docutray identify document.pdf --async

DOCUMENTATION
  Learn more: https://docs.docutray.com/cli/identify
```

*See code: [src/commands/identify.ts](https://github.com/docutray/docutray-cli/blob/v0.2.1/src/commands/identify.ts)*


---

# Login (https://docs.docutray.com/docs/cli/commands/login)



Configure your DocuTray API key for authentication. When called without arguments, prompts to choose between pasting an existing API key or authenticating via OAuth2 in the browser. For non-interactive shells (CI, AI coding agents), use `--oauth` to drive the OAuth flow end-to-end without a TTY: the CLI prints the authorization URL on stderr, opens your browser, waits for the callback, and writes the resulting API key to `~/.config/docutray/config.json`.

* [`docutray login [API-KEY]`](#docutray-login-api-key)

## `docutray login [API-KEY]` [#docutray-login-api-key]

```
USAGE
  $ docutray login [API-KEY] [--api-key <value>] [--base-url <value>] [--json] [--no-browser] [--oauth]
    [--timeout <value>]

ARGUMENTS
  [API-KEY]  API key to save (omit for interactive prompt)

FLAGS
  --api-key=<value>   API key for non-interactive login
  --base-url=<value>  Custom base URL for the DocuTray API (default: https://app.docutray.com)
  --json              Output as JSON (default when piped)
  --no-browser        Skip opening the browser; print the URL only (--oauth only)
  --oauth             Login via OAuth in the browser (works without a TTY)
  --timeout=<value>   [default: 180] OAuth callback timeout in seconds (--oauth only)

DESCRIPTION
  Configure your DocuTray API key for authentication. When called without arguments, prompts to choose between pasting
  an existing API key or authenticating via OAuth2 in the browser. For non-interactive shells (CI, AI coding agents),
  use --oauth to drive the OAuth flow end-to-end without a TTY: the CLI prints the authorization URL on stderr, opens
  your browser, waits for the callback, and writes the resulting API key to ~/.config/docutray/config.json.

EXAMPLES
  Interactive login — choose API key or OAuth2 browser login

    $ docutray login

  Login via OAuth in the browser (works in agents/CI without a TTY)

    $ docutray login --oauth

  Print the OAuth URL but do not open the browser

    $ docutray login --oauth --no-browser

  Non-interactive login with API key as argument

    $ docutray login dt_live_abc123

  Non-interactive login with API key flag

    $ docutray login --api-key dt_live_abc123

  Login with a custom API base URL

    $ docutray login --base-url https://staging.docutray.com

  Alternative: use env var instead of login

    DOCUTRAY_API_KEY=dt_live_abc123 docutray status

DOCUMENTATION
  Learn more: https://docs.docutray.com/cli/login
```

*See code: [src/commands/login.ts](https://github.com/docutray/docutray-cli/blob/v0.2.1/src/commands/login.ts)*


---

# Logout (https://docs.docutray.com/docs/cli/commands/logout)



Clear stored credentials by removing the local configuration file. This does not invalidate the API key itself; it only removes it from this machine. Has no effect if you are authenticating via the `DOCUTRAY_API_KEY` environment variable.

* [`docutray logout`](#docutray-logout)

## `docutray logout` [#docutray-logout]

```
USAGE
  $ docutray logout [--json]

FLAGS
  --json  Output as JSON (default when piped)

DESCRIPTION
  Clear stored credentials by removing the local configuration file. This does not invalidate the API key itself — it
  only removes it from this machine. Has no effect if you are authenticating via the DOCUTRAY_API_KEY environment
  variable.

EXAMPLES
  Remove stored API key from local config

    $ docutray logout

  Logout and verify authentication is cleared

    $ docutray logout && docutray status

DOCUMENTATION
  Learn more: https://docs.docutray.com/cli/logout
```

*See code: [src/commands/logout.ts](https://github.com/docutray/docutray-cli/blob/v0.2.1/src/commands/logout.ts)*


---

# Status (https://docs.docutray.com/docs/cli/commands/status)



Show current authentication status and configuration. Displays whether you are authenticated, the masked API key, the credential source (environment variable or config file), the API base URL, and the config file path. Useful for verifying your setup before running commands.

* [`docutray status`](#docutray-status)

## `docutray status` [#docutray-status]

```
USAGE
  $ docutray status [--json]

FLAGS
  --json  Output as JSON (default when piped)

DESCRIPTION
  Show current authentication status and configuration. Displays whether you are authenticated, the masked API key, the
  credential source (environment variable or config file), the API base URL, and the config file path. Useful for
  verifying your setup before running commands.

EXAMPLES
  Check current authentication status

    $ docutray status

  Output as JSON (default when piped)

    $ docutray status --json

  Verify env var authentication is detected

    DOCUTRAY_API_KEY=dt_live_abc123 docutray status

DOCUMENTATION
  Learn more: https://docs.docutray.com/cli/status
```

*See code: [src/commands/status.ts](https://github.com/docutray/docutray-cli/blob/v0.2.1/src/commands/status.ts)*


---

# Steps (https://docs.docutray.com/docs/cli/commands/steps)



Execute and monitor document processing steps. Steps are reusable processing pipelines that can be applied to documents for extraction, transformation, and validation.

* [`docutray steps run STEP-ID SOURCE`](#docutray-steps-run-step-id-source)
* [`docutray steps status EXECUTION-ID`](#docutray-steps-status-execution-id)

## `docutray steps run STEP-ID SOURCE` [#docutray-steps-run-step-id-source]

Execute a processing step on a document. Steps are reusable processing pipelines configured in the DocuTray dashboard. By default, the command waits for the step to complete and returns the result. Use `--no-wait` to return the execution status immediately without polling. Accepts a local file path or a public URL as the document source.

```
USAGE
  $ docutray steps:run STEP-ID SOURCE [--json] [--metadata <value>] [--no-wait] [--webhook-url <value>]

ARGUMENTS
  STEP-ID  Step ID to execute
  SOURCE   File path or URL to process

FLAGS
  --json                 Output as JSON (default when piped)
  --metadata=<value>     JSON metadata to attach to the execution (e.g. '{"key":"value"}')
  --no-wait              Return immediately with execution status instead of waiting for completion (default: false)
  --webhook-url=<value>  Webhook URL to receive a POST notification when the step completes

DESCRIPTION
  Execute a processing step on a document. Steps are reusable processing pipelines configured in the DocuTray dashboard.
  By default, the command waits for the step to complete and returns the result. Use --no-wait to return the execution
  status immediately without polling. Accepts a local file path or a public URL as the document source.

EXAMPLES
  Run a step on a local file and wait for results

    $ docutray steps run extract-fields invoice.pdf

  Run a step on a document URL

    $ docutray steps run extract-fields https://example.com/doc.pdf

  Start execution and return immediately (async)

    $ docutray steps run extract-fields invoice.pdf --no-wait

  Attach custom metadata to the execution

    $ docutray steps run extract-fields invoice.pdf --metadata '{"ref":"order-123"}'

  Receive a webhook notification on completion

    $ docutray steps run extract-fields invoice.pdf --webhook-url https://example.com/hook

DOCUMENTATION
  Learn more: https://docs.docutray.com/cli/steps/run
```

*See code: [src/commands/steps/run.ts](https://github.com/docutray/docutray-cli/blob/v0.2.1/src/commands/steps/run.ts)*

## `docutray steps status EXECUTION-ID` [#docutray-steps-status-execution-id]

Query the current status of a step execution by its execution ID. Use this to check progress on executions started with `--no-wait`, or to retrieve results after receiving a webhook notification. Returns the execution status, progress, and result data when complete.

```
USAGE
  $ docutray steps:status EXECUTION-ID [--json]

ARGUMENTS
  EXECUTION-ID  Step execution ID to query

FLAGS
  --json  Output as JSON (default when piped)

DESCRIPTION
  Query the current status of a step execution by its execution ID. Use this to check progress on executions started
  with --no-wait, or to retrieve results after receiving a webhook notification. Returns the execution status, progress,
  and result data when complete.

EXAMPLES
  Check the status of an execution

    $ docutray steps status exec_abc123

  Output as JSON

    $ docutray steps status exec_abc123 --json

  Start async execution then check its status

    $ docutray steps run my-step doc.pdf --no-wait | jq -r .id | xargs docutray steps status

DOCUMENTATION
  Learn more: https://docs.docutray.com/cli/steps/status
```

*See code: [src/commands/steps/status.ts](https://github.com/docutray/docutray-cli/blob/v0.2.1/src/commands/steps/status.ts)*


---

# Types (https://docs.docutray.com/docs/cli/commands/types)



Manage document types (extraction schemas). Document types define the fields and structure that DocuTray extracts from documents during conversion.

* [`docutray types create`](#docutray-types-create)
* [`docutray types export CODE`](#docutray-types-export-code)
* [`docutray types get CODE`](#docutray-types-get-code)
* [`docutray types list`](#docutray-types-list)
* [`docutray types update CODE`](#docutray-types-update-code)

## `docutray types create` [#docutray-types-create]

Create a new document type. Defines an extraction schema that DocuTray uses when converting documents. Requires a name, code, description, and JSON schema. The schema can be provided as a file path or inline JSON string.

```
USAGE
  $ docutray types:create --code <value> --description <value> --name <value> --schema <value> [--conversion-mode
    json|toon|multi_prompt] [--identify-hints <value>] [--json] [--keep-ordering] [--prompt-hints <value>] [--publish |
    --draft]

FLAGS
  --code=<value>              (required) Unique code identifier (lowercase, numbers, underscores)
  --conversion-mode=<option>  Conversion mode
                              <options: json|toon|multi_prompt>
  --description=<value>       (required) Description of the document type
  --[no-]draft                Create as draft (default: true)
  --identify-hints=<value>    Hints for automatic document identification
  --json                      Output as JSON (default when piped)
  --keep-ordering             Preserve property ordering in extraction output
  --name=<value>              (required) Document type name
  --prompt-hints=<value>      General extraction prompt hints
  --publish                   Publish immediately (equivalent to --no-draft)
  --schema=<value>            (required) JSON schema: file path or inline JSON string

DESCRIPTION
  Create a new document type. Defines an extraction schema that DocuTray uses when converting documents. Requires a
  name, code, description, and JSON schema. The schema can be provided as a file path or inline JSON string.

EXAMPLES
  Create from a schema file

    $ docutray types create --name "Invoice" --code invoice --description "Standard invoice" --schema schema.json

  Create with inline JSON schema

    $ docutray types create --name "Invoice" --code invoice --description "Standard invoice" --schema \
      '{"type":"object","properties":{"total":{"type":"number"}}}'

  Create and publish immediately

    $ docutray types create --name "Invoice" --code invoice --description "Standard invoice" --schema schema.json \
      --publish

  Create with a specific conversion mode

    $ docutray types create --name "Invoice" --code invoice --description "Standard invoice" --schema schema.json \
      --conversion-mode toon

DOCUMENTATION
  Learn more: https://docs.docutray.com/cli/types/create
```

*See code: [src/commands/types/create.ts](https://github.com/docutray/docutray-cli/blob/v0.2.1/src/commands/types/create.ts)*
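
The inline form of `--schema` takes a JSON string, which can be generated from a typed object instead of being written by hand. The sketch below is illustrative only: `FieldSchema`, `ObjectSchema`, and `toInlineSchema` are hypothetical names, and the schema shape covers just the keywords used in the inline example above.

```ts
// Minimal JSON Schema shape for illustration; real schemas may use more keywords.
interface FieldSchema {
  type: 'string' | 'number' | 'boolean' | 'object' | 'array';
  description?: string;
}

interface ObjectSchema {
  type: 'object';
  properties: Record<string, FieldSchema>;
}

// Hypothetical helper: serializes a field map into a string suitable for --schema.
function toInlineSchema(properties: Record<string, FieldSchema>): string {
  const schema: ObjectSchema = { type: 'object', properties };
  return JSON.stringify(schema);
}

const inlineSchema = toInlineSchema({ total: { type: 'number' } });
console.log(inlineSchema);
```

For the single `total` field this produces the same string as the inline example in the help text.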

## `docutray types export CODE` [#docutray-types-export-code]

Export a document type definition to JSON format. By default, writes to stdout for piping or redirection. Use --output to write directly to a file. Useful for backing up type definitions, version-controlling them in Git, or migrating types between environments.

```
USAGE
  $ docutray types:export CODE [--force] [--json] [-o <value>]

ARGUMENTS
  CODE  Document type code

FLAGS
  -o, --output=<value>  Output file path. If omitted, writes to stdout.
      --force           Overwrite existing file
      --json            Output as JSON (default when piped)

DESCRIPTION
  Export a document type definition to JSON format. By default, writes to stdout for piping or redirection. Use --output
  to write directly to a file. Useful for backing up type definitions, version-controlling them in Git, or migrating
  types between environments.

EXAMPLES
  Export a document type to stdout

    $ docutray types export electronic-invoice

  Export directly to a file

    $ docutray types export electronic-invoice -o invoice-type.json

  Export using shell redirection

    $ docutray types export electronic-invoice > backup.json

  Export and pretty-print with jq

    $ docutray types export electronic-invoice | jq .

DOCUMENTATION
  Learn more: https://docs.docutray.com/cli/types/export
```

*See code: [src/commands/types/export.ts](https://github.com/docutray/docutray-cli/blob/v0.2.1/src/commands/types/export.ts)*

## `docutray types get CODE` [#docutray-types-get-code]

Get the full details of a document type by its code. Returns the type name, description, field schema, and configuration. Use this to inspect a document type before converting documents or to verify type settings.

```
USAGE
  $ docutray types:get CODE [--json]

ARGUMENTS
  CODE  Document type code

FLAGS
  --json  Output as JSON (default when piped)

DESCRIPTION
  Get the full details of a document type by its code. Returns the type name, description, field schema, and
  configuration. Use this to inspect a document type before converting documents or to verify type settings.

EXAMPLES
  Get full details of a document type

    $ docutray types get electronic-invoice

  Output full JSON (includes field schema)

    $ docutray types get electronic-invoice --json

  Extract just the field schema (useful for scripts)

    $ docutray types get electronic-invoice | jq .fields

DOCUMENTATION
  Learn more: https://docs.docutray.com/cli/types/get
```

*See code: [src/commands/types/get.ts](https://github.com/docutray/docutray-cli/blob/v0.2.1/src/commands/types/get.ts)*

## `docutray types list` [#docutray-types-list]

List available document types with pagination and search. Document types define the extraction schema used when converting documents. Results are paginated — use --page and --limit to navigate through large result sets. Use --search to filter by name.

```
USAGE
  $ docutray types:list [--json] [--limit <value>] [--page <value>] [--search <value>]

FLAGS
  --json            Output as JSON (default when piped)
  --limit=<value>   [default: 20] Number of results per page
  --page=<value>    [default: 1] Page number for pagination
  --search=<value>  Filter document types by name (case-insensitive substring match)

DESCRIPTION
  List available document types with pagination and search. Document types define the extraction schema used when
  converting documents. Results are paginated — use --page and --limit to navigate through large result sets. Use
  --search to filter by name.

EXAMPLES
  List all document types (first page, default 20 results)

    $ docutray types list

  Search for document types by name

    $ docutray types list --search invoice

  Paginate through results

    $ docutray types list --limit 50 --page 2

  Force JSON output

    $ docutray types list --json

  Extract just the type codes (useful for scripting)

    $ docutray types list | jq ".data[].codeType"

DOCUMENTATION
  Learn more: https://docs.docutray.com/cli/types/list
```

*See code: [src/commands/types/list.ts](https://github.com/docutray/docutray-cli/blob/v0.2.1/src/commands/types/list.ts)*

## `docutray types update CODE` [#docutray-types-update-code]

Update an existing document type. Allows modifying name, description, schema, prompt hints, and other settings. At least one field to update must be provided. The code identifier cannot be changed.

```
USAGE
  $ docutray types:update CODE [--conversion-mode json|toon|multi_prompt] [--description <value>]
    [--identify-hints <value>] [--json] [--keep-ordering] [--name <value>] [--prompt-hints <value>] [--publish |
    --draft] [--schema <value>]

ARGUMENTS
  CODE  Document type code to update

FLAGS
  --conversion-mode=<option>  Conversion mode
                              <options: json|toon|multi_prompt>
  --description=<value>       New description
  --[no-]draft                Set draft status
  --identify-hints=<value>    Hints for automatic document identification
  --json                      Output as JSON (default when piped)
  --[no-]keep-ordering        Preserve property ordering in extraction output
  --name=<value>              New name
  --prompt-hints=<value>      General extraction prompt hints
  --publish                   Publish immediately (sets draft to false)
  --schema=<value>            New JSON schema: file path or inline JSON string

DESCRIPTION
  Update an existing document type. Allows modifying name, description, schema, prompt hints, and other settings. At
  least one field to update must be provided. The code identifier cannot be changed.

EXAMPLES
  Update the name

    $ docutray types update invoice --name "Updated Invoice"

  Update the schema from a file

    $ docutray types update invoice --schema new-schema.json

  Update prompt hints

    $ docutray types update invoice --prompt-hints "Use dd/mm/yyyy for dates"

  Publish a draft type

    $ docutray types update invoice --publish

DOCUMENTATION
  Learn more: https://docs.docutray.com/cli/types/update
```

*See code: [src/commands/types/update.ts](https://github.com/docutray/docutray-cli/blob/v0.2.1/src/commands/types/update.ts)*


---

# Convert Types (https://docs.docutray.com/docs/node-sdk/types/convert)



## ConversionStatusType [#conversionstatustype]

Possible statuses for a conversion operation.

```ts
type ConversionStatusType = 'ENQUEUED' | 'PROCESSING' | 'SUCCESS' | 'ERROR';
```

## ConversionResult [#conversionresult]

Extracted data from a successful conversion.

<AutoTypeTable path="../../vendor/docutray-node/src/types/convert.ts" name="ConversionResult" />

## ConversionStatus [#conversionstatus]

Status of a conversion operation, as returned by the API.

<AutoTypeTable path="../../vendor/docutray-node/src/types/convert.ts" name="ConversionStatus" />

## ConvertParams [#convertparams]

Parameters for creating a conversion request. Provide exactly one of `file`, `url`, or `base64` as the document source.

<AutoTypeTable path="../../vendor/docutray-node/src/types/convert.ts" name="ConvertParams" />
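
The "exactly one source" rule can be checked before a request is built. This is a sketch under stated assumptions rather than the SDK's own validation: `DocumentSourceLike` is a local stand-in that models only the three source fields named above, and `detectSource` is a hypothetical helper.

```ts
// Local stand-in for the source fields of ConvertParams; the real type has more fields.
interface DocumentSourceLike {
  file?: Uint8Array;
  url?: string;
  base64?: string;
}

type SourceKind = 'file' | 'url' | 'base64';

// Hypothetical pre-flight check: returns the one source that is set,
// and throws when zero or several sources are provided.
function detectSource(params: DocumentSourceLike): SourceKind {
  const keys: SourceKind[] = ['file', 'url', 'base64'];
  const provided = keys.filter((key) => params[key] !== undefined);
  const first = provided[0];
  if (provided.length !== 1 || first === undefined) {
    throw new Error(
      `Expected exactly one of file, url, or base64, got ${provided.length}`,
    );
  }
  return first;
}

console.log(detectSource({ url: 'https://example.com/invoice.pdf' }));
```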

## Type Guards [#type-guards]

### `isConversionComplete(status)` [#isconversioncompletestatus]

Returns `true` if the conversion has reached a terminal state (`SUCCESS` or `ERROR`).

### `isConversionSuccess(status)` [#isconversionsuccessstatus]

Returns `true` if the conversion completed successfully.

### `isConversionError(status)` [#isconversionerrorstatus]

Returns `true` if the conversion failed with an error.

```ts
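import { isConversionSuccess, isConversionError } from 'docutray';

// `status` is a ConversionStatus, for example from client.convert.getStatus()
const status = await client.convert.getStatus('conv_abc123');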

if (isConversionSuccess(status)) {
  console.log('Data:', status.data);
} else if (isConversionError(status)) {
  console.log('Error:', status.error);
}
```
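
To illustrate what such guards do, the sketch below re-implements two of them against a simplified local type. It is not the SDK source: `ConversionStatusLike`, `isTerminal`, and `isSuccess` are hypothetical names, and only fields that appear elsewhere in this reference (`conversion_id`, `status`, `data`, `error`) are modelled. In application code, the guards exported by `docutray` remain the ones to use.

```ts
type ConversionStatusType = 'ENQUEUED' | 'PROCESSING' | 'SUCCESS' | 'ERROR';

// Simplified stand-in for ConversionStatus.
interface ConversionStatusLike {
  conversion_id: string;
  status: ConversionStatusType;
  data?: unknown;
  error?: unknown;
}

// Terminal states are SUCCESS and ERROR, as described above.
function isTerminal(s: ConversionStatusLike): boolean {
  return s.status === 'SUCCESS' || s.status === 'ERROR';
}

// A user-defined type guard: inside a truthy branch, TypeScript
// narrows `status` to the literal 'SUCCESS'.
function isSuccess(
  s: ConversionStatusLike,
): s is ConversionStatusLike & { status: 'SUCCESS' } {
  return s.status === 'SUCCESS';
}

const sample: ConversionStatusLike = {
  conversion_id: 'conv_abc123',
  status: 'PROCESSING',
};
console.log(isTerminal(sample), isSuccess(sample));
```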


---

# Document Type Types (https://docs.docutray.com/docs/node-sdk/types/document-type)



## DocumentType [#documenttype]

A document type definition from the API.

<AutoTypeTable path="../../vendor/docutray-node/src/types/document-type.ts" name="DocumentType" />

## ValidationErrorInfo [#validationerrorinfo]

Validation error details.

<AutoTypeTable path="../../vendor/docutray-node/src/types/document-type.ts" name="ValidationErrorInfo" />

## ValidationWarningInfo [#validationwarninginfo]

Validation warning details.

<AutoTypeTable path="../../vendor/docutray-node/src/types/document-type.ts" name="ValidationWarningInfo" />

## ValidationResult [#validationresult]

Result of validating a document type schema.

<AutoTypeTable path="../../vendor/docutray-node/src/types/document-type.ts" name="ValidationResult" />

## DocumentTypesListParams [#documenttypeslistparams]

Parameters for listing document types.

<AutoTypeTable path="../../vendor/docutray-node/src/types/document-type.ts" name="DocumentTypesListParams" />

## Type Guards [#type-guards]

### `isValidationValid(result)` [#isvalidationvalidresult]

Returns `true` if the validation result has no errors.

### `hasValidationWarnings(result)` [#hasvalidationwarningsresult]

Returns `true` if the validation result has warnings.

```ts
import { isValidationValid, hasValidationWarnings } from 'docutray';

const result = await client.documentTypes.validate('dt_abc123');

if (isValidationValid(result)) {
  console.log('Schema is valid');
  if (hasValidationWarnings(result)) {
    console.log('Warnings:', result.warnings.messages);
  }
}
```
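
A validation result can also be condensed into a single log line. The sketch below assumes a shape inferred from the examples in this reference (`errors.count`, `errors.messages`, `warnings.messages`); `warnings.count`, the `string[]` message type, and the names `MessageGroup`, `ValidationResultLike`, and `summarizeValidation` are assumptions made for illustration.

```ts
// Assumed shape: a count plus a list of human-readable messages.
interface MessageGroup {
  count: number;
  messages: string[];
}

// Simplified stand-in for ValidationResult.
interface ValidationResultLike {
  errors: MessageGroup;
  warnings: MessageGroup;
}

// Hypothetical helper: errors take priority over warnings in the summary.
function summarizeValidation(r: ValidationResultLike): string {
  if (r.errors.count > 0) {
    return `invalid: ${r.errors.messages.join('; ')}`;
  }
  if (r.warnings.count > 0) {
    return `valid with warnings: ${r.warnings.messages.join('; ')}`;
  }
  return 'valid';
}

console.log(
  summarizeValidation({
    errors: { count: 0, messages: [] },
    warnings: { count: 1, messages: ['field "total" has no description'] },
  }),
);
```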


---

# Identify Types (https://docs.docutray.com/docs/node-sdk/types/identify)



## IdentificationStatusType [#identificationstatustype]

Possible statuses for an identification operation.

```ts
type IdentificationStatusType = 'ENQUEUED' | 'PROCESSING' | 'SUCCESS' | 'ERROR';
```

## DocumentTypeMatch [#documenttypematch]

A document type match with confidence score.

<AutoTypeTable path="../../vendor/docutray-node/src/types/identify.ts" name="DocumentTypeMatch" />

## IdentificationResult [#identificationresult]

Result of a successful identification, including primary and alternative matches.

<AutoTypeTable path="../../vendor/docutray-node/src/types/identify.ts" name="IdentificationResult" />

## IdentificationStatus [#identificationstatus]

Status of an identification operation, as returned by the API.

<AutoTypeTable path="../../vendor/docutray-node/src/types/identify.ts" name="IdentificationStatus" />

## IdentifyParams [#identifyparams]

Parameters for creating an identification request. Provide exactly one of `file`, `url`, or `base64` as the document source.

<AutoTypeTable path="../../vendor/docutray-node/src/types/identify.ts" name="IdentifyParams" />

## Type Guards [#type-guards]

### `isIdentificationComplete(status)` [#isidentificationcompletestatus]

Returns `true` if the identification has reached a terminal state (`SUCCESS` or `ERROR`).

### `isIdentificationSuccess(status)` [#isidentificationsuccessstatus]

Returns `true` if the identification completed successfully.

### `isIdentificationError(status)` [#isidentificationerrorstatus]

Returns `true` if the identification failed with an error.

```ts
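import { isIdentificationSuccess } from 'docutray';

// `status` is an IdentificationStatus, for example from client.identify.getStatus()
const status = await client.identify.getStatus('id_abc123');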

if (isIdentificationSuccess(status)) {
  console.log('Document type:', status.document_type);
}
```


---

# Knowledge Base Types (https://docs.docutray.com/docs/node-sdk/types/knowledge-base)



These are the TypeScript models returned and accepted by the knowledge-base
methods of the [Docutray Node.js SDK](/docs/node-sdk). A knowledge base stores
documents with vector embeddings so you can run semantic search over them — see
the [Knowledge Bases operation guide](/docs/operations/knowledge-bases) for the
end-to-end workflow and runnable examples. Every type below is exported from the
package, so you can import it directly for fully typed request and response
handling:

```ts
import type { KnowledgeBase, SearchResult } from 'docutray';
```

All list endpoints return the [`PaginatedResponse`](/docs/node-sdk/types/shared)
wrapper described in the shared types, and every method can throw the typed
errors documented in [Errors](/docs/node-sdk/errors).

## KnowledgeBase [#knowledgebase]

A knowledge base definition as returned by the API — its id, name, description,
and metadata such as the embedding model and document count. This is the shape
you receive from `client.knowledgeBases.create()` and `.list()`.

<AutoTypeTable path="../../vendor/docutray-node/src/types/knowledge-base.ts" name="KnowledgeBase" />

## KnowledgeBaseDocument [#knowledgebasedocument]

A single document stored inside a knowledge base, including its source content
reference and sync status. You add these by uploading files, after which
Docutray generates the embeddings used for search.

<AutoTypeTable path="../../vendor/docutray-node/src/types/knowledge-base.ts" name="KnowledgeBaseDocument" />

## SearchResultItem [#searchresultitem]

One match from a semantic-search query, pairing the matched document chunk with
a similarity score. Higher scores indicate a closer semantic match to the query.

<AutoTypeTable path="../../vendor/docutray-node/src/types/knowledge-base.ts" name="SearchResultItem" />
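
Because higher scores mean closer matches, a common post-processing step is to drop weak matches and rank the rest. The following sketch assumes only the two properties used in the SDK search example (`item.document.content` and `item.similarity`); `SearchItemLike`, `topMatches`, and the threshold value are hypothetical.

```ts
// Simplified stand-in for SearchResultItem.
interface SearchItemLike {
  document: { content: unknown };
  similarity: number;
}

// Keeps items at or above the threshold, ordered from best to worst match.
// `filter` returns a new array, so the caller's input is not reordered.
function topMatches(
  items: SearchItemLike[],
  minSimilarity: number,
): SearchItemLike[] {
  return items
    .filter((item) => item.similarity >= minSimilarity)
    .sort((a, b) => b.similarity - a.similarity);
}

const ranked = topMatches(
  [
    { document: { content: 'shipping terms' }, similarity: 0.41 },
    { document: { content: 'invoice total' }, similarity: 0.92 },
    { document: { content: 'tax summary' }, similarity: 0.78 },
  ],
  0.5,
);
console.log(ranked.map((item) => item.similarity));
```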

## SearchResult [#searchresult]

The full response from a knowledge-base query — the ordered list of
[`SearchResultItem`](#searchresultitem) matches plus query metadata. Returned by
`client.knowledgeBases.search()`.

<AutoTypeTable path="../../vendor/docutray-node/src/types/knowledge-base.ts" name="SearchResult" />

## SyncResult [#syncresult]

The outcome of a knowledge-base sync operation, reporting how many documents
were added, updated, or removed. Use it to confirm a re-index completed before
relying on fresh search results.

<AutoTypeTable path="../../vendor/docutray-node/src/types/knowledge-base.ts" name="SyncResult" />


---

# Shared Types (https://docs.docutray.com/docs/node-sdk/types/shared)



These types are reused across every method of the
[Docutray Node.js SDK](/docs/node-sdk): the MIME types you may upload, the
pagination envelope wrapping list responses, rate-limit metadata read from
response headers, and the error shapes surfaced when a request fails. They are
all exported from the package for direct import:

```ts
import type { ImageContentType, PaginatedResponse } from 'docutray';
```

For the full error class hierarchy that wraps [`ErrorDetail`](#errordetail) and
[`QuotaExceededInfo`](#quotaexceededinfo), see [Errors](/docs/node-sdk/errors).

## ImageContentType [#imagecontenttype]

The set of file MIME types accepted by upload-based operations such as
[Convert](/docs/operations/convert) and [Identify](/docs/operations/identify).
Any other content type is rejected before the request is sent.

```ts
type ImageContentType =
  | 'image/png'
  | 'image/jpeg'
  | 'image/tiff'
  | 'image/webp'
  | 'application/pdf';
```
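
When a content type arrives as an untyped string (from a file upload form, for instance), it can be narrowed to this union with a small guard. This is a sketch, not an SDK export: `ACCEPTED_CONTENT_TYPES`, `AcceptedContentType`, and `isAcceptedContentType` are hypothetical names that mirror the union above.

```ts
// Mirrors the ImageContentType union shown above.
const ACCEPTED_CONTENT_TYPES = [
  'image/png',
  'image/jpeg',
  'image/tiff',
  'image/webp',
  'application/pdf',
] as const;

type AcceptedContentType = (typeof ACCEPTED_CONTENT_TYPES)[number];

// Type guard: narrows an arbitrary string to the accepted union.
function isAcceptedContentType(value: string): value is AcceptedContentType {
  return (ACCEPTED_CONTENT_TYPES as readonly string[]).indexOf(value) !== -1;
}

console.log(isAcceptedContentType('application/pdf'));
console.log(isAcceptedContentType('image/gif'));
```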

## Pagination [#pagination]

Pagination metadata returned by the API — the current page, page size, and total
count. It lets you compute how many more pages remain when iterating list
endpoints.

<AutoTypeTable path="../../vendor/docutray-node/src/types/shared.ts" name="Pagination" />
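
The remaining-page calculation mentioned above is simple arithmetic. The sketch below assumes field names `page`, `limit`, and `total` for the current page, page size, and total count; the actual property names are defined in the table above and may differ. `PaginationLike` and `remainingPages` are hypothetical.

```ts
// Assumed field names; check the Pagination table for the real ones.
interface PaginationLike {
  page: number;  // current page, starting at 1
  limit: number; // page size
  total: number; // total number of items
}

// Number of pages still to fetch after the current one.
function remainingPages(p: PaginationLike): number {
  if (p.limit <= 0) {
    throw new Error('limit must be positive');
  }
  const totalPages = Math.ceil(p.total / p.limit);
  return Math.max(0, totalPages - p.page);
}

// 95 items at 20 per page is 5 pages; on page 2, 3 remain.
console.log(remainingPages({ page: 2, limit: 20, total: 95 }));
```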

## PaginatedResponse [#paginatedresponse]

The generic wrapper around every list response: the array of items plus the
[`Pagination`](#pagination) block above. List methods like
`client.documentTypes.list()` resolve to this shape.

<AutoTypeTable path="../../vendor/docutray-node/src/types/shared.ts" name="PaginatedResponse" />

## RateLimitInfo [#ratelimitinfo]

Rate-limit state parsed from the API response headers — remaining requests and
the reset window. Read it to back off proactively before hitting a 429.

<AutoTypeTable path="../../vendor/docutray-node/src/types/shared.ts" name="RateLimitInfo" />
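
Proactive back-off can be reduced to one question: how long to wait before the next request. The sketch below assumes the rate-limit state exposes a remaining-request count and a reset time; the field names `remaining` and `resetAt`, and the names `RateLimitLike` and `msUntilNextRequest`, are assumptions for illustration rather than the SDK's definitions.

```ts
// Assumed shape; check the RateLimitInfo table for the real field names.
interface RateLimitLike {
  remaining: number; // requests left in the current window
  resetAt: Date;     // when the window resets
}

// Returns 0 while requests remain, otherwise the time left until the reset.
function msUntilNextRequest(
  info: RateLimitLike,
  now: Date = new Date(),
): number {
  if (info.remaining > 0) {
    return 0;
  }
  return Math.max(0, info.resetAt.getTime() - now.getTime());
}

const clock = new Date('2024-01-01T00:00:00Z');
const exhausted: RateLimitLike = {
  remaining: 0,
  resetAt: new Date('2024-01-01T00:00:05Z'),
};
console.log(msUntilNextRequest(exhausted, clock));
```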

## QuotaExceededInfo [#quotaexceededinfo]

The details attached to a `429` quota response — which limit was exceeded and
when it resets. Surfaced on the corresponding SDK error so you can retry after
the reset.

<AutoTypeTable path="../../vendor/docutray-node/src/types/shared.ts" name="QuotaExceededInfo" />

## ErrorDetail [#errordetail]

A single structured error entry from a failed API response, carrying the field,
code, and message. The typed SDK [errors](/docs/node-sdk/errors) expose an array
of these for validation failures.

<AutoTypeTable path="../../vendor/docutray-node/src/types/shared.ts" name="ErrorDetail" />


---

# Step Types (https://docs.docutray.com/docs/node-sdk/types/step)



## StepExecutionStatusType [#stepexecutionstatustype]

Possible statuses for a step execution.

```ts
type StepExecutionStatusType = 'ENQUEUED' | 'PROCESSING' | 'SUCCESS' | 'ERROR';
```

## StepExecutionStatus [#stepexecutionstatus]

Status of a step execution, as returned by the API.

<AutoTypeTable path="../../vendor/docutray-node/src/types/step.ts" name="StepExecutionStatus" />

## StepsRunParams [#stepsrunparams]

Parameters for running a step. Provide exactly one of `file`, `url`, or `base64` as the document source.

<AutoTypeTable path="../../vendor/docutray-node/src/types/step.ts" name="StepsRunParams" />

## Type Guards [#type-guards]

### `isStepExecutionComplete(status)` [#isstepexecutioncompletestatus]

Returns `true` if the step execution has reached a terminal state (`SUCCESS` or `ERROR`).

### `isStepExecutionSuccess(status)` [#isstepexecutionsuccessstatus]

Returns `true` if the step execution completed successfully.

### `isStepExecutionError(status)` [#isstepexecutionerrorstatus]

Returns `true` if the step execution failed with an error.

```ts
import { isStepExecutionSuccess } from 'docutray';

// `status` is the object returned by an earlier client.steps.runAsync() call
const result = await status.wait();

if (isStepExecutionSuccess(result)) {
  console.log('Processed data:', result.data);
}
```


---

# Convert (https://docs.docutray.com/docs/node-sdk/resources/convert)



## Convert [#convert]

Resource for converting documents to structured data using OCR. Access via `client.convert`.

### Methods [#methods]

#### `run(params, options?)` [#runparams-options]

Creates a synchronous conversion request. The API processes the document and returns the result.

```ts
const status = await client.convert.run({
  documentTypeCode: 'invoice',
  url: 'https://example.com/invoice.pdf',
});

console.log(status.conversion_id);
console.log(status.status); // 'SUCCESS' | 'ERROR' | 'ENQUEUED' | 'PROCESSING'
console.log(status.data);   // extracted data (on success)
```

**Parameters**: [`ConvertParams`](/docs/node-sdk/types/convert#convertparams)
**Returns**: `Promise<ConversionStatus>`

#### `runAsync(params, options?)` [#runasyncparams-options]

Creates an asynchronous conversion request. Returns a status object with a `.wait()` method for polling.

```ts
import fs from 'fs';

const status = await client.convert.runAsync({
  documentTypeCode: 'invoice',
  file: fs.readFileSync('invoice.pdf'),
});

// Poll until completion
const result = await status.wait();
console.log(result.data); // extracted data
```

**Parameters**: [`ConvertParams`](/docs/node-sdk/types/convert#convertparams)
**Returns**: `Promise<ConversionStatus & { wait(): Promise<ConversionStatus> }>`

#### `getStatus(conversionId, options?)` [#getstatusconversionid-options]

Retrieves the current status of a conversion operation.

```ts
const status = await client.convert.getStatus('conv_abc123');
```

**Parameters**: `conversionId: string`
**Returns**: `Promise<ConversionStatus>`
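
When `.wait()` is not available, for example when only a conversion ID was stored, `getStatus` can be called in a loop until a terminal state is reached. The following is a generic sketch and does not describe how the SDK polls internally: `pollUntilComplete`, `PollableStatus`, the one-second interval, and the attempt limit are all hypothetical choices. The `fetchStatus` callback stands in for a call such as `() => client.convert.getStatus(id)`.

```ts
type PollStatus = 'ENQUEUED' | 'PROCESSING' | 'SUCCESS' | 'ERROR';

interface PollableStatus {
  status: PollStatus;
}

// Calls fetchStatus until it reports SUCCESS or ERROR, sleeping between
// attempts, and gives up after maxAttempts checks.
async function pollUntilComplete<T extends PollableStatus>(
  fetchStatus: () => Promise<T>,
  intervalMs = 1000,
  maxAttempts = 30,
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const current = await fetchStatus();
    if (current.status === 'SUCCESS' || current.status === 'ERROR') {
      return current;
    }
    // Wait before the next check, but not after the final attempt.
    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Status not terminal after ${maxAttempts} attempts`);
}
```

The attempt limit keeps a stuck conversion from blocking the caller indefinitely; a webhook is the alternative when long waits are expected.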

### Raw Responses [#raw-responses]

Access raw HTTP response details via `withRawResponse`:

```ts
const raw = await client.convert.withRawResponse.run({
  documentTypeCode: 'invoice',
  url: 'https://example.com/invoice.pdf',
});

console.log(raw.statusCode); // 200
console.log(raw.headers);    // Response headers
const data = await raw.parse(); // Parsed body
```


---

# Document Types (https://docs.docutray.com/docs/node-sdk/resources/document-types)



## DocumentTypes [#documenttypes]

Resource for listing and inspecting document type definitions. Access via `client.documentTypes`.

### Methods [#methods]

#### `list(params?, options?)` [#listparams-options]

Lists available document types with pagination.

```ts
const page = await client.documentTypes.list({ limit: 10 });

for (const docType of page.data) {
  console.log(docType.name, docType.codeType);
}

// Auto-pagination
for await (const docType of page.autoPagingIter()) {
  console.log(docType.name);
}
```

**Parameters**: [`DocumentTypesListParams`](/docs/node-sdk/types/document-type#documenttypeslistparams) (optional)
**Returns**: `Promise<Page<DocumentType>>`

#### `get(id, options?)` [#getid-options]

Retrieves a single document type by ID.

```ts
const docType = await client.documentTypes.get('dt_abc123');
console.log(docType.name, docType.schema);
```

**Parameters**: `id: string`
**Returns**: `Promise<DocumentType>`

#### `validate(id, options?)` [#validateid-options]

Validates a document type schema.

```ts
const result = await client.documentTypes.validate('dt_abc123');

if (result.errors.count === 0) {
  console.log('Schema is valid');
} else {
  console.log('Errors:', result.errors.messages);
}
```

**Parameters**: `id: string`
**Returns**: `Promise<ValidationResult>`

### Raw Responses [#raw-responses]

Access raw HTTP response details via `withRawResponse`:

```ts
const raw = await client.documentTypes.withRawResponse.list();
console.log(raw.statusCode);
```


---

# Identify (https://docs.docutray.com/docs/node-sdk/resources/identify)



## Identify [#identify]

Resource for identifying document types from images. Access via `client.identify`.

### Methods [#methods]

#### `run(params, options?)` [#runparams-options]

Creates a synchronous identification request.

```ts
const status = await client.identify.run({
  url: 'https://example.com/document.pdf',
});

console.log(status.document_type); // best match
console.log(status.alternatives);  // other matches
```

**Parameters**: [`IdentifyParams`](/docs/node-sdk/types/identify#identifyparams)
**Returns**: `Promise<IdentificationStatus>`

#### `runAsync(params, options?)` [#runasyncparams-options]

Creates an asynchronous identification request with a `.wait()` method for polling.

```ts
import fs from 'fs';

const status = await client.identify.runAsync({
  file: fs.readFileSync('document.pdf'),
});

const result = await status.wait();
console.log(result.document_type);
```

**Parameters**: [`IdentifyParams`](/docs/node-sdk/types/identify#identifyparams)
**Returns**: `Promise<IdentificationStatus & { wait(): Promise<IdentificationStatus> }>`

#### `getStatus(identificationId, options?)` [#getstatusidentificationid-options]

Retrieves the current status of an identification operation.

```ts
const status = await client.identify.getStatus('id_abc123');
```

**Parameters**: `identificationId: string`
**Returns**: `Promise<IdentificationStatus>`

### Raw Responses [#raw-responses]

Access raw HTTP response details via `withRawResponse`:

```ts
const raw = await client.identify.withRawResponse.run({
  url: 'https://example.com/document.pdf',
});

console.log(raw.statusCode);
const data = await raw.parse();
```


---

# Knowledge Bases (https://docs.docutray.com/docs/node-sdk/resources/knowledge-bases)



## KnowledgeBases [#knowledgebases]

Resource for managing knowledge bases and their documents. Access via `client.knowledgeBases`.

### Methods [#methods]

#### `list(params?, options?)` [#listparams-options]

Lists knowledge bases with pagination.

```ts
const page = await client.knowledgeBases.list({ limit: 10 });

for (const kb of page.data) {
  console.log(kb.name, kb.documentCount);
}
```

**Returns**: `Promise<Page<KnowledgeBase>>`

#### `get(id, options?)` [#getid-options]

Retrieves a single knowledge base by ID.

```ts
const kb = await client.knowledgeBases.get('kb_abc123');
```

**Returns**: `Promise<KnowledgeBase>`

#### `create(params, options?)` [#createparams-options]

Creates a new knowledge base.

```ts
const kb = await client.knowledgeBases.create({
  name: 'Product catalog',
  description: 'Product information database',
});
```

**Returns**: `Promise<KnowledgeBase>`

#### `update(id, params, options?)` [#updateid-params-options]

Updates an existing knowledge base.

```ts
const kb = await client.knowledgeBases.update('kb_abc123', {
  name: 'Updated catalog',
});
```

**Returns**: `Promise<KnowledgeBase>`

#### `delete(id, options?)` [#deleteid-options]

Deletes a knowledge base.

```ts
await client.knowledgeBases.delete('kb_abc123');
```

**Returns**: `Promise<void>`

#### `search(id, params, options?)` [#searchid-params-options]

Searches a knowledge base for matching documents.

```ts
const results = await client.knowledgeBases.search('kb_abc123', {
  query: 'invoice total',
  limit: 5,
});

for (const item of results.data) {
  console.log(item.document.content, item.similarity);
}
```

**Returns**: `Promise<SearchResult>`

#### `sync(id, options?)` [#syncid-options]

Triggers a sync operation for the knowledge base.

```ts
const result = await client.knowledgeBases.sync('kb_abc123');
console.log(result.status, result.documentsProcessed);
```

**Returns**: `Promise<SyncResult>`

### Documents [#documents]

Access knowledge base documents via `client.knowledgeBases.documents(knowledgeBaseId)`.

#### `documents(kbId).list(params?, options?)` [#documentskbidlistparams-options]

Lists documents in a knowledge base.

```ts
const docs = await client.knowledgeBases.documents('kb_abc123').list();
```

#### `documents(kbId).get(docId, options?)` [#documentskbidgetdocid-options]

Gets a single document.

```ts
const doc = await client.knowledgeBases.documents('kb_abc123').get('doc_xyz');
```

#### `documents(kbId).create(params, options?)` [#documentskbidcreateparams-options]

Creates a new document in the knowledge base.

```ts
const doc = await client.knowledgeBases.documents('kb_abc123').create({
  content: { title: 'Invoice', amount: 100 },
  metadata: { source: 'upload' },
});
```

#### `documents(kbId).update(docId, params, options?)` [#documentskbidupdatedocid-params-options]

Updates an existing document.

```ts
const doc = await client.knowledgeBases.documents('kb_abc123').update('doc_xyz', {
  content: { title: 'Updated Invoice' },
});
```

#### `documents(kbId).delete(docId, options?)` [#documentskbiddeletedocid-options]

Deletes a document from the knowledge base.

```ts
await client.knowledgeBases.documents('kb_abc123').delete('doc_xyz');
```


---

# Steps (https://docs.docutray.com/docs/node-sdk/resources/steps)



## Steps [#steps]

Resource for running predefined processing steps. Access via `client.steps`.

### Methods [#methods]

#### `runAsync(params, options?)` [#runasyncparams-options]

Runs a processing step asynchronously with a `.wait()` method for polling.

```ts
const status = await client.steps.runAsync({
  stepId: 'step_abc123',
  url: 'https://example.com/document.pdf',
});

const result = await status.wait();
console.log(result.status); // 'SUCCESS'
console.log(result.data);   // processed result
```

**Parameters**: [`StepsRunParams`](/docs/node-sdk/types/step#stepsrunparams)
**Returns**: `Promise<StepExecutionStatus & { wait(): Promise<StepExecutionStatus> }>`

#### `getStatus(executionId, options?)` [#getstatusexecutionid-options]

Retrieves the current status of a step execution.

```ts
const status = await client.steps.getStatus('exec_abc123');
console.log(status.status);
```

**Parameters**: `executionId: string`
**Returns**: `Promise<StepExecutionStatus>`

### Raw Responses [#raw-responses]

Access raw HTTP response details via `withRawResponse`:

```ts
const raw = await client.steps.withRawResponse.runAsync({
  stepId: 'step_abc123',
  url: 'https://example.com/document.pdf',
});

console.log(raw.statusCode);
```


---

# Convert (https://docs.docutray.com/docs/python-sdk/resources/convert)



Convert resource for document conversion operations.

## `AsyncConvert` [#asyncconvert]

Asynchronous document conversion operations.

**Example:**

```python
>>> async with AsyncClient(api_key="...") as client:
...     result = await client.convert.run(
...         file=Path("invoice.pdf"),
...         document_type_code="invoice"
...     )
...     print(result.data)
```

**Arguments:**

client: The parent async client instance.

**Methods:**

### `get_status` [#get_status]

```python
def get_status(self, conversion_id: str) -> ConversionStatus
```

Get the status of an asynchronous conversion.

**Arguments:**

conversion\_id: The conversion ID returned by run\_async().

**Returns:**

The current conversion status.

### `run` [#run]

```python
def run(self, document_type_code: str, file: FileInput | None = None, url: str | None = None, file_base64: str | None = None, content_type: str | None = None, document_metadata: dict[str, Any] | None = None) -> ConversionResult
```

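Convert a document and await the result. This is the awaitable counterpart of `Convert.run`; use `run_async` to start a conversion job that returns immediately.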

**Arguments:**

document\_type\_code: The document type code to use for conversion.
file: File to convert (Path, bytes, or file-like object).
url: URL of the document to convert (alternative to file).
file\_base64: Base64-encoded document (alternative to file).
content\_type: Content type of the file. Auto-detected if not provided.
document\_metadata: Additional metadata to include with the document.

**Returns:**

The conversion result with extracted data.
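
Because `run` is awaited on the async client (as in the class example above), several conversions can be in flight at once. The following is a minimal sketch under stated assumptions: `client` is an `AsyncClient`, and `convert_many` and `max_concurrency` are hypothetical names that are not part of the SDK.

```python
import asyncio
from pathlib import Path
from typing import Any, List


async def convert_many(
    client: Any,
    paths: List[Path],
    document_type_code: str,
    max_concurrency: int = 4,
) -> List[Any]:
    """Convert several files concurrently, returning results in input order.

    `client` is assumed to be an AsyncClient; this helper is illustrative
    and not part of the SDK.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def convert_one(path: Path) -> Any:
        # The semaphore caps how many requests are in flight at once.
        async with semaphore:
            return await client.convert.run(
                file=path,
                document_type_code=document_type_code,
            )

    # gather() preserves the order of its arguments in the result list.
    return await asyncio.gather(*(convert_one(p) for p in paths))
```

Capping concurrency is a design choice of this sketch; pick a limit that suits your account's rate limits.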

### `run_async` [#run_async]

```python
def run_async(self, document_type_code: str, file: FileInput | None = None, url: str | None = None, file_base64: str | None = None, content_type: str | None = None, document_metadata: dict[str, Any] | None = None) -> ConversionStatus
```

Start an asynchronous document conversion.

**Arguments:**

document\_type\_code: The document type code to use for conversion.
file: File to convert (Path, bytes, or file-like object).
url: URL of the document to convert (alternative to file).
file\_base64: Base64-encoded document (alternative to file).
content\_type: Content type of the file. Auto-detected if not provided.
document\_metadata: Additional metadata to include with the document.

**Returns:**

The initial conversion status with conversion\_id.

**Properties:**

* `with_raw_response`
  : Access methods that return raw HTTP responses.

## `Convert` [#convert]

Synchronous document conversion operations.

**Example:**

```python
>>> client = Client(api_key="...")
>>> result = client.convert.run(
...     file=Path("invoice.pdf"),
...     document_type_code="invoice"
... )
>>> print(result.data)
```

**Arguments:**

client: The parent client instance.

**Methods:**

### `get_status` [#get_status-1]

```python
def get_status(self, conversion_id: str) -> ConversionStatus
```

Get the status of an asynchronous conversion.

**Arguments:**

conversion\_id: The conversion ID returned by run\_async().

**Returns:**

The current conversion status.

**Example:**

```python
>>> status = client.convert.get_status("conv_abc123")
>>> if status.is_success():
...     print(status.data)
```

### `run` [#run-1]

```python
def run(self, document_type_code: str, file: FileInput | None = None, url: str | None = None, file_base64: str | None = None, content_type: str | None = None, document_metadata: dict[str, Any] | None = None) -> ConversionResult
```

Convert a document synchronously.

Sends a document to the API and waits for the conversion result.
This is suitable for small documents that process quickly.

**Arguments:**

document\_type\_code: The document type code to use for conversion.
file: File to convert (Path, bytes, or file-like object).
url: URL of the document to convert (alternative to file).
file\_base64: Base64-encoded document (alternative to file).
content\_type: Content type of the file. Auto-detected if not provided.
document\_metadata: Additional metadata to include with the document.

**Returns:**

The conversion result with extracted data.

**Raises:**

ValueError: If no file input is provided.
BadRequestError: If the request is invalid.
AuthenticationError: If the API key is invalid.

**Example:**

```python
>>> result = client.convert.run(
...     file=Path("invoice.pdf"),
...     document_type_code="invoice"
... )
>>> print(result.data["total"])
```
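
The arguments above describe `url` and `file_base64` as alternatives to `file`. A small sketch of choosing the keyword argument from a single value follows; `build_source_kwargs` is a hypothetical helper, and treating any string that starts with `http://` or `https://` as a URL is an assumption of this example rather than SDK behavior.

```python
from pathlib import Path
from typing import Any, Dict, Union


def build_source_kwargs(source: Union[str, Path, bytes, None]) -> Dict[str, Any]:
    """Map one document source onto the `file` or `url` keyword argument."""
    if source is None:
        # Mirrors the documented ValueError for a missing input.
        raise ValueError("no file input provided")
    if isinstance(source, str) and source.startswith(("http://", "https://")):
        return {"url": source}
    if isinstance(source, str):
        # Any other string is treated as a local filesystem path.
        return {"file": Path(source)}
    # Path and bytes are both documented as valid `file` values.
    return {"file": source}
```

It could then be used as `client.convert.run(document_type_code="invoice", **build_source_kwargs("invoice.pdf"))`.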

### `run_async` [#run_async-1]

```python
def run_async(self, document_type_code: str, file: FileInput | None = None, url: str | None = None, file_base64: str | None = None, content_type: str | None = None, document_metadata: dict[str, Any] | None = None) -> ConversionStatus
```

Start an asynchronous document conversion.

Initiates a conversion job and returns immediately with a conversion ID.
Use get\_status() to poll for completion, or call wait() on the result.

**Arguments:**

document\_type\_code: The document type code to use for conversion.
file: File to convert (Path, bytes, or file-like object).
url: URL of the document to convert (alternative to file).
file\_base64: Base64-encoded document (alternative to file).
content\_type: Content type of the file. Auto-detected if not provided.
document\_metadata: Additional metadata to include with the document.

**Returns:**

The initial conversion status with conversion\_id.

**Example:**

```python
>>> status = client.convert.run_async(
...     file=Path("large_document.pdf"),
...     document_type_code="invoice"
... )
>>> print(f"Conversion ID: {status.conversion_id}")
>>> # Poll for completion
>>> final = status.wait()
```
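
As an alternative to `wait()`, polling can be done by hand with `get_status()`. This is a minimal sketch that relies only on what the examples on this page show (`status.conversion_id` and `status.is_success()`). `poll_conversion`, `interval`, and `timeout` are hypothetical names, and because failure states are not described here, the loop stops only on success or timeout.

```python
import time
from typing import Any


def poll_conversion(
    convert: Any,
    conversion_id: str,
    interval: float = 2.0,
    timeout: float = 120.0,
) -> Any:
    """Poll `convert.get_status()` until it reports success or time runs out.

    `convert` is assumed to be `client.convert`; this helper is not part of
    the SDK.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = convert.get_status(conversion_id)
        if status.is_success():
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"conversion {conversion_id} did not finish in {timeout}s"
            )
        # Wait before asking again to avoid hammering the API.
        time.sleep(interval)
```

A caller would pass the ID from `run_async`, for example `poll_conversion(client.convert, status.conversion_id)`.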

**Properties:**

* `with_raw_response`
  : Access methods that return raw HTTP responses.


---

# Document Types (https://docs.docutray.com/docs/python-sdk/resources/document_types)



Document Types resource for document type catalog operations.

## `AsyncDocumentTypes` [#asyncdocumenttypes]

Asynchronous document type operations.

**Example:**

```python
>>> async with AsyncClient(api_key="...") as client:
...     page = await client.document_types.list()
...     for doc_type in page.data:
...         print(f"{doc_type.codeType}: {doc_type.name}")
...
...     # Iterate through all document types across pages
...     async for doc_type in (await client.document_types.list()).auto_paging_iter_async():
...         print(doc_type.name)
```

**Arguments:**

client: The parent async client instance.

**Methods:**

### `get` [#get]

```python
def get(self, type_id: str) -> DocumentType
```

Get a specific document type by ID.

**Arguments:**

type\_id: The document type ID.

**Returns:**

The document type details including schema.

### `list` [#list]

```python
def list(self, page: int | None = None, limit: int | None = None, search: str | None = None) -> AsyncPage[DocumentType]
```

List available document types.

**Arguments:**

page: Page number (1-indexed). Defaults to 1.
limit: Number of items per page. Defaults to server default.
search: Search term to filter document types by name.

**Returns:**

An AsyncPage of document types with pagination support.
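
A minimal sketch of collecting every document type into one list, using the `auto_paging_iter_async()` method shown in the class example above. `list_all_document_types` is a hypothetical helper, and `client` is assumed to be an `AsyncClient`.

```python
from typing import Any, List, Optional


async def list_all_document_types(
    client: Any, search: Optional[str] = None
) -> List[Any]:
    """Return all document types across pages as a single list.

    Illustrative helper, not part of the SDK.
    """
    first_page = await client.document_types.list(search=search)
    items = []
    # Iterate item by item over the first page and every page after it.
    async for doc_type in first_page.auto_paging_iter_async():
        items.append(doc_type)
    return items
```

Materializing the full list is convenient for small catalogs; for large ones, iterating lazily as in the class example avoids holding everything in memory.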

### `validate` [#validate]

```python
def validate(self, type_id: str, data: dict[str, Any]) -> ValidationResult
```

Validate JSON data against a document type's schema.

**Arguments:**

type\_id: The document type ID to validate against.
data: The JSON data to validate.

**Returns:**

Validation result with errors and warnings.

**Properties:**

* `with_raw_response`
  : Access methods that return raw HTTP responses.

## `DocumentTypes` [#documenttypes]

Synchronous document type operations.

**Example:**

```python
>>> client = Client(api_key="...")
>>> page = client.document_types.list()
>>> for doc_type in page.data:
...     print(f"{doc_type.codeType}: {doc_type.name}")
>>>
>>> # Iterate through all document types across pages
>>> for doc_type in client.document_types.list().auto_paging_iter():
...     print(doc_type.name)
```

**Arguments:**

client: The parent client instance.

**Methods:**

### `get` [#get-1]

```python
def get(self, type_id: str) -> DocumentType
```

Get a specific document type by ID.

**Arguments:**

type\_id: The document type ID.

**Returns:**

The document type details including schema.

**Raises:**

NotFoundError: If the document type doesn't exist.

**Example:**

```python
>>> doc_type = client.document_types.get("dt_abc123")
>>> print(f"Name: {doc_type.name}")
>>> print(f"Schema: {doc_type.schema_}")
```

### `list` [#list-1]

```python
def list(self, page: int | None = None, limit: int | None = None, search: str | None = None) -> Page[DocumentType]
```

List available document types.

**Arguments:**

page: Page number (1-indexed). Defaults to 1.
limit: Number of items per page. Defaults to server default.
search: Search term to filter document types by name.

**Returns:**

A Page of document types with pagination support.

**Example:**

```python
>>> # List all document types
>>> page = client.document_types.list()
>>> for doc_type in page.data:
...     print(doc_type.name)
>>>
>>> # Iterate through all pages
>>> for page in client.document_types.list().iter_pages():
...     print(f"Page {page.page}: {len(page.data)} items")
>>>
>>> # Iterate through all items automatically
>>> for doc_type in client.document_types.list().auto_paging_iter():
...     print(doc_type.name)
>>>
>>> # Search for specific types
>>> page = client.document_types.list(search="invoice")
```

### `validate` [#validate-1]

```python
def validate(self, type_id: str, data: dict[str, Any]) -> ValidationResult
```

Validate JSON data against a document type's schema.

This validates extracted data to check if it conforms to the
document type's expected structure and requirements.

**Arguments:**

type\_id: The document type ID to validate against.
data: The JSON data to validate.

**Returns:**

Validation result with errors and warnings.

**Example:**

```python
>>> result = client.document_types.validate(
...     "dt_invoice",
...     {"invoice_number": "INV-001", "total": 100}
... )
>>> if result.is_valid():
...     print("Data is valid!")
... else:
...     for error in result.errors.messages:
...         print(f"Error: {error}")
```
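
When invalid data should stop a pipeline, the check above can be wrapped so that failures raise. This is a sketch that assumes only the `is_valid()` method and `errors.messages` attribute used in the example; `validate_or_raise` is a hypothetical helper, not part of the SDK.

```python
from typing import Any, Dict


def validate_or_raise(
    document_types: Any, type_id: str, data: Dict[str, Any]
) -> Dict[str, Any]:
    """Validate `data` and return it unchanged, or raise with all messages.

    `document_types` is assumed to be `client.document_types`.
    """
    result = document_types.validate(type_id, data)
    if not result.is_valid():
        # Join every reported message so the caller sees all problems at once.
        details = "; ".join(str(message) for message in result.errors.messages)
        raise ValueError(f"validation against {type_id} failed: {details}")
    return data
```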

**Properties:**

* `with_raw_response`
  : Access methods that return raw HTTP responses.


---

# Identify (https://docs.docutray.com/docs/python-sdk/resources/identify)



Identify resource for document type identification operations.

## `AsyncIdentify` [#asyncidentify]

Asynchronous document identification operations.

**Example:**

```python
>>> async with AsyncClient(api_key="...") as client:
...     result = await client.identify.run(file=Path("document.pdf"))
...     print(f"Type: {result.document_type.code}")
```

**Arguments:**

client: The parent async client instance.

**Methods:**

### `get_status` [#get_status]

```python
def get_status(self, identification_id: str) -> IdentificationStatus
```

Get the status of an asynchronous identification.

**Arguments:**

identification\_id: The identification ID returned by run\_async().

**Returns:**

The current identification status.

### `run` [#run]

```python
def run(self, file: FileInput | None = None, url: str | None = None, file_base64: str | None = None, content_type: str | None = None, document_type_code_options: list[str] | None = None) -> IdentificationResult
```

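Identify the type of a document and await the result. This is the awaitable counterpart of `Identify.run`; use `run_async` to start an identification job that returns immediately.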

**Arguments:**

file: File to identify (Path, bytes, or file-like object).
url: URL of the document to identify (alternative to file).
file\_base64: Base64-encoded document (alternative to file).
content\_type: Content type of the file. Auto-detected if not provided.
document\_type\_code\_options: List of document type codes to limit identification to.

**Returns:**

The identification result with document type and alternatives.

### `run_async` [#run_async]

```python
def run_async(self, file: FileInput | None = None, url: str | None = None, file_base64: str | None = None, content_type: str | None = None, document_type_code_options: list[str] | None = None) -> IdentificationStatus
```

Start an asynchronous document identification.

**Arguments:**

file: File to identify (Path, bytes, or file-like object).
url: URL of the document to identify (alternative to file).
file\_base64: Base64-encoded document (alternative to file).
content\_type: Content type of the file. Auto-detected if not provided.
document\_type\_code\_options: List of document type codes to limit identification to.

**Returns:**

The initial identification status with identification\_id.

**Properties:**

* `with_raw_response`
  : Access methods that return raw HTTP responses.

## `Identify` [#identify]

Synchronous document identification operations.

**Example:**

```python
>>> client = Client(api_key="...")
>>> result = client.identify.run(file=Path("document.pdf"))
>>> print(f"Type: {result.document_type.code}")
>>> print(f"Confidence: {result.document_type.confidence}")
```

**Arguments:**

client: The parent client instance.

**Methods:**

### `get_status` [#get_status-1]

```python
def get_status(self, identification_id: str) -> IdentificationStatus
```

Get the status of an asynchronous identification.

**Arguments:**

identification\_id: The identification ID returned by run\_async().

**Returns:**

The current identification status.

**Example:**

```python
>>> status = client.identify.get_status("id_abc123")
>>> if status.is_success():
...     print(status.document_type.name)
```

### `run` [#run-1]

```python
def run(self, file: FileInput | None = None, url: str | None = None, file_base64: str | None = None, content_type: str | None = None, document_type_code_options: list[str] | None = None) -> IdentificationResult
```

Identify the type of a document synchronously.

Sends a document to the API and returns the identified document type
with confidence scores.

**Arguments:**

file: File to identify (Path, bytes, or file-like object).
url: URL of the document to identify (alternative to file).
file\_base64: Base64-encoded document (alternative to file).
content\_type: Content type of the file. Auto-detected if not provided.
document\_type\_code\_options: List of document type codes to limit identification to. If provided, the API will only consider these document types when identifying.

**Returns:**

The identification result with document type and alternatives.

**Raises:**

ValueError: If no file input is provided.
BadRequestError: If the request is invalid.
AuthenticationError: If the API key is invalid.

**Example:**

```python
>>> result = client.identify.run(file=Path("unknown.pdf"))
>>> print(f"Identified as: {result.document_type.name}")
>>> for alt in result.alternatives:
...     print(f"  Alternative: {alt.name} ({alt.confidence:.2%})")
>>>
>>> # Limit to specific document types
>>> result = client.identify.run(
...     file=Path("statement.pdf"),
...     document_type_code_options=["cartola_cc", "cartola_tc"]
... )
```
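
The primary match and its alternatives can be combined into one ranked list. This is a sketch that assumes only the attributes used in the examples above (`document_type`, `alternatives`, `name`, `confidence`) and that confidence is a fraction between 0 and 1; `ranked_candidates` is a hypothetical helper.

```python
from typing import Any, List, Tuple


def ranked_candidates(
    result: Any, min_confidence: float = 0.5
) -> List[Tuple[str, float]]:
    """Return (name, confidence) pairs at or above `min_confidence`, best first.

    Illustrative helper, not part of the SDK.
    """
    candidates = [result.document_type, *result.alternatives]
    kept = [
        (candidate.name, candidate.confidence)
        for candidate in candidates
        if candidate.confidence >= min_confidence
    ]
    # Sort by confidence, highest first.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```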

### `run_async` [#run_async-1]

```python
def run_async(self, file: FileInput | None = None, url: str | None = None, file_base64: str | None = None, content_type: str | None = None, document_type_code_options: list[str] | None = None) -> IdentificationStatus
```

Start an asynchronous document identification.

Initiates an identification job and returns immediately with an ID.
Use get\_status() to poll for completion, or call wait() on the returned status.

**Arguments:**

file: File to identify (Path, bytes, or file-like object).
url: URL of the document to identify (alternative to file).
file\_base64: Base64-encoded document (alternative to file).
content\_type: Content type of the file. Auto-detected if not provided.
document\_type\_code\_options: List of document type codes to limit identification to.

**Returns:**

The initial identification status with identification\_id.

**Example:**

```python
>>> status = client.identify.run_async(file=Path("document.pdf"))
>>> final = status.wait()
>>> print(f"Type: {final.document_type.code}")
```
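
Identification is often the first step before conversion. The sketch below chains the two synchronous calls. It assumes that the `code` returned by identification is accepted as `document_type_code` by `client.convert.run`, which this page does not state explicitly; `identify_then_convert` and `min_confidence` are hypothetical names.

```python
from pathlib import Path
from typing import Any


def identify_then_convert(
    client: Any, path: Path, min_confidence: float = 0.8
) -> Any:
    """Identify a document's type, then convert it with that type code.

    `client` is assumed to be a synchronous Client; not part of the SDK.
    """
    identification = client.identify.run(file=path)
    doc_type = identification.document_type
    if doc_type.confidence < min_confidence:
        # Refuse to convert when the identification is too uncertain.
        raise ValueError(
            f"low confidence ({doc_type.confidence:.2f}) for type {doc_type.code}"
        )
    # Assumption: the identified code doubles as the conversion type code.
    return client.convert.run(file=path, document_type_code=doc_type.code)
```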

**Properties:**

* `with_raw_response`
  : Access methods that return raw HTTP responses.


---

# Knowledge Bases (https://docs.docutray.com/docs/python-sdk/resources/knowledge_bases)



Knowledge Bases resource for semantic document search operations.

## `AsyncKnowledgeBaseDocuments` [#asyncknowledgebasedocuments]

Asynchronous document operations for a knowledge base.

**Arguments:**

client: The parent async client instance.
knowledge\_base\_id: The knowledge base ID.

**Methods:**

### `create` [#create]

```python
def create(self, content: dict[str, Any], document_id: str | None = None, metadata: dict[str, Any] | None = None, generate_embedding: bool = True) -> KnowledgeBaseDocument
```

Add a document to the knowledge base.

**Arguments:**

content: Document content matching the knowledge base schema.
document\_id: Optional external document reference ID.
metadata: Optional additional metadata.
generate\_embedding: Whether to automatically generate embedding. Defaults to True.

**Returns:**

The created document.
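
This class has no example of its own, so here is a hedged sketch of adding several documents concurrently. It assumes `docs` is the object returned by `documents(...)` on an async client, that `create` is awaitable there, and that created documents expose `id` as in the synchronous example; `add_documents` is a hypothetical helper.

```python
import asyncio
from typing import Any, Dict, List, Tuple


async def add_documents(
    docs: Any, contents: List[Dict[str, Any]]
) -> Tuple[List[str], List[BaseException]]:
    """Create one document per content dict; return (created_ids, errors).

    Illustrative helper, not part of the SDK.
    """
    outcomes = await asyncio.gather(
        *(docs.create(content=content) for content in contents),
        # Collect failures instead of aborting on the first one.
        return_exceptions=True,
    )
    created_ids = [o.id for o in outcomes if not isinstance(o, BaseException)]
    errors = [o for o in outcomes if isinstance(o, BaseException)]
    return created_ids, errors
```

Returning the errors alongside the IDs lets the caller decide whether to retry the failed items.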

### `delete` [#delete]

```python
def delete(self, document_id: str) -> None
```

Delete a document from the knowledge base.

**Arguments:**

document\_id: The document ID to delete.

### `get` [#get]

```python
def get(self, document_id: str) -> KnowledgeBaseDocument
```

Get a specific document by ID.

**Arguments:**

document\_id: The document ID.

**Returns:**

The document details.

### `list` [#list]

```python
def list(self, page: int | None = None, limit: int | None = None, search: str | None = None) -> AsyncPage[KnowledgeBaseDocument]
```

List documents in the knowledge base.

**Arguments:**

page: Page number (1-indexed). Defaults to 1.
limit: Number of items per page.
search: Search term to filter documents.

**Returns:**

An AsyncPage of documents with pagination support.

### `update` [#update]

```python
def update(self, document_id: str, content: dict[str, Any] | None = None, metadata: dict[str, Any] | None = None, regenerate_embedding: bool = False) -> KnowledgeBaseDocument
```

Update a document in the knowledge base.

**Arguments:**

document\_id: The document ID to update.
content: Updated document content.
metadata: Updated metadata.
regenerate\_embedding: Whether to force embedding regeneration. Defaults to False.

**Returns:**

The updated document.

## `AsyncKnowledgeBases` [#asyncknowledgebases]

Asynchronous knowledge base operations.

**Example:**

```python
>>> async with AsyncClient(api_key="...") as client:
...     async for kb in (await client.knowledge_bases.list()).auto_paging_iter_async():
...         print(f"{kb.name}: {kb.documentCount} documents")
```

**Arguments:**

client: The parent async client instance.

**Methods:**

### `create` [#create-1]

```python
def create(self, name: str, description: str, schema: dict[str, Any], indexing_preferences: dict[str, Any] | None = None) -> KnowledgeBase
```

Create a new knowledge base.

**Arguments:**

name: Unique name for the knowledge base.
description: Description of the knowledge base.
schema: JSON schema for documents in this knowledge base.
indexing\_preferences: Optional indexing configuration.

**Returns:**

The created knowledge base.

### `delete` [#delete-1]

```python
def delete(self, knowledge_base_id: str) -> None
```

Delete a knowledge base.

**Arguments:**

knowledge\_base\_id: The knowledge base ID to delete.

### `documents` [#documents]

```python
def documents(self, knowledge_base_id: str) -> AsyncKnowledgeBaseDocuments
```

Access document operations for a knowledge base.

**Arguments:**

knowledge\_base\_id: The knowledge base ID.

**Returns:**

An AsyncKnowledgeBaseDocuments instance for document operations.

### `get` [#get-1]

```python
def get(self, knowledge_base_id: str) -> KnowledgeBase
```

Get a specific knowledge base by ID.

**Arguments:**

knowledge\_base\_id: The knowledge base ID.

**Returns:**

The knowledge base details.

### `list` [#list-1]

```python
def list(self, page: int | None = None, limit: int | None = None, search: str | None = None, is_active: bool | None = None) -> AsyncPage[KnowledgeBase]
```

List knowledge bases.

**Arguments:**

page: Page number (1-indexed). Defaults to 1.
limit: Number of items per page.
search: Search term to filter by name or description.
is\_active: Filter by active status.

**Returns:**

An AsyncPage of knowledge bases with pagination support.

### `search` [#search]

```python
def search(self, knowledge_base_id: str, query: str, limit: int | None = None, similarity_threshold: float | None = None, include_metadata: bool | None = None) -> SearchResult
```

Perform semantic search in a knowledge base.

**Arguments:**

knowledge\_base\_id: The knowledge base ID to search.
query: Search query text.
limit: Maximum number of results (1-50).
similarity\_threshold: Minimum similarity score (0-1).
include\_metadata: Include document metadata in results.

**Returns:**

Search results with similarity scores.

### `sync` [#sync]

```python
def sync(self, knowledge_base_id: str, regenerate_embeddings: bool | None = None) -> SyncResult
```

Trigger manual synchronization of a knowledge base.

**Arguments:**

knowledge\_base\_id: The knowledge base ID to sync.
regenerate\_embeddings: Whether to regenerate all embeddings.

**Returns:**

The sync operation result.

### `update` [#update-1]

```python
def update(self, knowledge_base_id: str, name: str | None = None, description: str | None = None, is_active: bool | None = None) -> KnowledgeBase
```

Update a knowledge base.

**Arguments:**

knowledge\_base\_id: The knowledge base ID to update.
name: New name for the knowledge base.
description: New description.
is\_active: Active status.

**Returns:**

The updated knowledge base.

**Properties:**

* `with_raw_response`
  : Access methods that return raw HTTP responses.

## `KnowledgeBaseDocuments` [#knowledgebasedocuments]

Synchronous document operations for a knowledge base.

**Arguments:**

client: The parent client instance.
knowledge\_base\_id: The knowledge base ID.

**Methods:**

### `create` [#create-2]

```python
def create(self, content: dict[str, Any], document_id: str | None = None, metadata: dict[str, Any] | None = None, generate_embedding: bool = True) -> KnowledgeBaseDocument
```

Add a document to the knowledge base.

**Arguments:**

content: Document content matching the knowledge base schema.
document\_id: Optional external document reference ID.
metadata: Optional additional metadata.
generate\_embedding: Whether to automatically generate embedding. Defaults to True.

**Returns:**

The created document.

**Example:**

```python
>>> doc = client.knowledge_bases.documents("kb_123").create(
...     content={"title": "User Guide", "text": "..."},
...     metadata={"source": "manual"}
... )
>>> print(f"Created: {doc.id}")
```

### `delete` [#delete-2]

```python
def delete(self, document_id: str) -> None
```

Delete a document from the knowledge base.

**Arguments:**

document\_id: The document ID to delete.

**Raises:**

NotFoundError: If the document doesn't exist.

**Example:**

```python
>>> client.knowledge_bases.documents("kb_123").delete("doc_456")
```

### `get` [#get-2]

```python
def get(self, document_id: str) -> KnowledgeBaseDocument
```

Get a specific document by ID.

**Arguments:**

document\_id: The document ID.

**Returns:**

The document details.

**Raises:**

NotFoundError: If the document doesn't exist.

**Example:**

```python
>>> doc = client.knowledge_bases.documents("kb_123").get("doc_456")
>>> print(doc.content)
```

### `list` [#list-2]

```python
def list(self, page: int | None = None, limit: int | None = None, search: str | None = None) -> Page[KnowledgeBaseDocument]
```

List documents in the knowledge base.

**Arguments:**

page: Page number (1-indexed). Defaults to 1.
limit: Number of items per page.
search: Search term to filter documents.

**Returns:**

A Page of documents with pagination support.

**Example:**

```python
>>> docs = client.knowledge_bases.documents("kb_123").list()
>>> for doc in docs.auto_paging_iter():
...     print(doc.id)
```

### `update` [#update-2]

```python
def update(self, document_id: str, content: dict[str, Any] | None = None, metadata: dict[str, Any] | None = None, regenerate_embedding: bool = False) -> KnowledgeBaseDocument
```

Update a document in the knowledge base.

**Arguments:**

document\_id: The document ID to update.
content: Updated document content.
metadata: Updated metadata.
regenerate\_embedding: Whether to force embedding regeneration. Defaults to False.

**Returns:**

The updated document.

**Example:**

```python
>>> doc = client.knowledge_bases.documents("kb_123").update(
...     "doc_456",
...     content={"title": "Updated Guide", "text": "..."}
... )
```

## `KnowledgeBases` [#knowledgebases]

Synchronous knowledge base operations.

**Example:**

```python
>>> client = Client(api_key="...")
>>> # List knowledge bases
>>> for kb in client.knowledge_bases.list().auto_paging_iter():
...     print(f"{kb.name}: {kb.documentCount} documents")
>>>
>>> # Search in a knowledge base
>>> results = client.knowledge_bases.search("kb_123", query="authentication")
>>> for item in results.data:
...     print(f"{item.document.id}: {item.similarity:.2%}")
```

**Arguments:**

client: The parent client instance.

**Methods:**

### `create` [#create-3]

```python
def create(self, name: str, description: str, schema: dict[str, Any], indexing_preferences: dict[str, Any] | None = None) -> KnowledgeBase
```

Create a new knowledge base.

**Arguments:**

name: Unique name for the knowledge base.
description: Description of the knowledge base.
schema: JSON schema for documents in this knowledge base.
indexing\_preferences: Optional indexing configuration.

**Returns:**

The created knowledge base.

**Raises:**

ConflictError: If a knowledge base with that name already exists.

**Example:**

```python
>>> kb = client.knowledge_bases.create(
...     name="User Documentation",
...     description="Product user guides and manuals",
...     schema={"type": "object", "properties": {"title": {"type": "string"}}}
... )
>>> print(f"Created: {kb.id}")
```
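
Since `create` raises `ConflictError` for a duplicate name, one way to make setup code repeatable is to look the name up first. This is a sketch built only from `list(search=...)`, `auto_paging_iter()`, and `create` as documented on this page; `get_or_create_knowledge_base` is a hypothetical helper and `kbs` is assumed to be `client.knowledge_bases`.

```python
from typing import Any, Dict


def get_or_create_knowledge_base(
    kbs: Any, name: str, description: str, schema: Dict[str, Any]
) -> Any:
    """Return the knowledge base called `name`, creating it if none exists.

    Illustrative helper, not part of the SDK.
    """
    # `search` filters by name or description, so compare names exactly.
    for kb in kbs.list(search=name).auto_paging_iter():
        if kb.name == name:
            return kb
    return kbs.create(name=name, description=description, schema=schema)
```

Two processes running this at the same time could still both reach `create`, so callers that need strict guarantees should also handle the conflict error.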

### `delete` [#delete-3]

```python
def delete(self, knowledge_base_id: str) -> None
```

Delete a knowledge base.

**Arguments:**

knowledge\_base\_id: The knowledge base ID to delete.

**Raises:**

NotFoundError: If the knowledge base doesn't exist.

**Example:**

```python
>>> client.knowledge_bases.delete("kb_123")
```

### `documents` [#documents-1]

```python
def documents(self, knowledge_base_id: str) -> KnowledgeBaseDocuments
```

Access document operations for a knowledge base.

**Arguments:**

knowledge\_base\_id: The knowledge base ID.

**Returns:**

A KnowledgeBaseDocuments instance for document operations.

**Example:**

```python
>>> docs = client.knowledge_bases.documents("kb_123")
>>> for doc in docs.list().auto_paging_iter():
...     print(doc.content)
```

### `get` [#get-3]

```python
def get(self, knowledge_base_id: str) -> KnowledgeBase
```

Get a specific knowledge base by ID.

**Arguments:**

knowledge\_base\_id: The knowledge base ID.

**Returns:**

The knowledge base details.

**Raises:**

NotFoundError: If the knowledge base doesn't exist.

**Example:**

```python
>>> kb = client.knowledge_bases.get("kb_123")
>>> print(f"{kb.name}: {kb.description}")
```

### `list` [#list-3]

```python
def list(self, page: int | None = None, limit: int | None = None, search: str | None = None, is_active: bool | None = None) -> Page[KnowledgeBase]
```

List knowledge bases.

**Arguments:**

page: Page number (1-indexed). Defaults to 1.
limit: Number of items per page.
search: Search term to filter by name or description.
is\_active: Filter by active status.

**Returns:**

A Page of knowledge bases with pagination support.

**Example:**

```python
>>> for kb in client.knowledge_bases.list().auto_paging_iter():
...     print(f"{kb.name}: {kb.documentCount} documents")
```

### `search` [#search-1]

```python
def search(self, knowledge_base_id: str, query: str, limit: int | None = None, similarity_threshold: float | None = None, include_metadata: bool | None = None) -> SearchResult
```

Perform semantic search in a knowledge base.

**Arguments:**

knowledge\_base\_id: The knowledge base ID to search.
query: Search query text.
limit: Maximum number of results (1-50).
similarity\_threshold: Minimum similarity score (0-1).
include\_metadata: Include document metadata in results.

**Returns:**

Search results with similarity scores.

**Example:**

```python
>>> results = client.knowledge_bases.search(
...     "kb_123",
...     query="how to configure authentication",
...     limit=5
... )
>>> for item in results.data:
...     print(f"{item.similarity:.2%}: {item.document.content}")
```
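
The documented ranges for `limit` (1-50) and `similarity_threshold` (0-1) can be checked locally before a request is sent. This is a minimal sketch; `search_checked` is a hypothetical wrapper, and `kbs` is assumed to be `client.knowledge_bases`.

```python
from typing import Any, Optional


def search_checked(
    kbs: Any,
    knowledge_base_id: str,
    query: str,
    limit: Optional[int] = None,
    similarity_threshold: Optional[float] = None,
) -> Any:
    """Validate the documented parameter ranges, then run the search.

    Illustrative helper, not part of the SDK.
    """
    if limit is not None and not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    if similarity_threshold is not None and not 0.0 <= similarity_threshold <= 1.0:
        raise ValueError("similarity_threshold must be between 0 and 1")
    # Arguments left as None are passed through unchanged.
    return kbs.search(
        knowledge_base_id,
        query=query,
        limit=limit,
        similarity_threshold=similarity_threshold,
    )
```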

### `sync` [#sync-1]

```python
def sync(self, knowledge_base_id: str, regenerate_embeddings: bool | None = None) -> SyncResult
```

Trigger manual synchronization of a knowledge base.

**Arguments:**

knowledge\_base\_id: The knowledge base ID to sync.
regenerate\_embeddings: Whether to regenerate all embeddings.

**Returns:**

The sync operation result.

**Example:**

```python
>>> result = client.knowledge_bases.sync("kb_123", regenerate_embeddings=True)
>>> print(f"Sync status: {result.status}")
```

### `update` [#update-3]

```python
def update(self, knowledge_base_id: str, name: str | None = None, description: str | None = None, is_active: bool | None = None) -> KnowledgeBase
```

Update a knowledge base.

**Arguments:**

knowledge\_base\_id: The knowledge base ID to update.
name: New name for the knowledge base.
description: New description.
is\_active: Active status.

**Returns:**

The updated knowledge base.

**Example:**

```python
>>> kb = client.knowledge_bases.update(
...     "kb_123",
...     description="Updated documentation"
... )
```

**Properties:**

* `with_raw_response`
  : Access methods that return raw HTTP responses.


---

# Steps (https://docs.docutray.com/docs/python-sdk/resources/steps)



Steps resource for step execution operations.

## `AsyncSteps` [#asyncsteps]

Asynchronous step execution operations.

**Example:**

```python
>>> async with AsyncClient(api_key="...") as client:
...     status = await client.steps.run_async(
...         step_id="step_extraction",
...         file=Path("document.pdf")
...     )
...     result = await status.wait()
...     print(result.data)
```

**Arguments:**

client: The parent async client instance.

**Methods:**

### `get_status` [#get_status]

```python
def get_status(self, execution_id: str) -> StepExecutionStatus
```

Get the status of a step execution.

**Arguments:**

execution\_id: The execution ID returned by run\_async().

**Returns:**

The current execution status.

### `run_async` [#run_async]

```python
def run_async(self, step_id: str, file: FileInput | None = None, url: str | None = None, file_base64: str | None = None, content_type: str | None = None, document_metadata: dict[str, Any] | None = None) -> StepExecutionStatus
```

Execute a step asynchronously.

**Arguments:**

step\_id: The ID of the step to execute.
file: File to process (Path, bytes, or file-like object).
url: URL of the document to process (alternative to file).
file\_base64: Base64-encoded document (alternative to file).
content\_type: Content type of the file. Auto-detected if not provided.
document\_metadata: Additional metadata to include with the document.

**Returns:**

The initial execution status with execution\_id.
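
When only an execution ID is available (for example, after a restart), the status can be polled without the original status object. The sketch below assumes that `get_status` is awaitable on the async client and that the status exposes `is_success()` as in the synchronous example; `wait_for_step`, `interval`, and `timeout` are hypothetical names. Failure states are not described on this page, so the loop ends only on success or timeout.

```python
import asyncio
from typing import Any


async def wait_for_step(
    steps: Any,
    execution_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
) -> Any:
    """Poll `steps.get_status()` until success, or raise on timeout.

    `steps` is assumed to be `client.steps` on an AsyncClient; this helper
    is not part of the SDK.
    """

    async def poll() -> Any:
        while True:
            status = await steps.get_status(execution_id)
            if status.is_success():
                return status
            # Yield to the event loop between polls.
            await asyncio.sleep(interval)

    # wait_for() cancels the polling loop and raises when time runs out.
    return await asyncio.wait_for(poll(), timeout=timeout)
```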

**Properties:**

* `with_raw_response`
  : Access methods that return raw HTTP responses.

## `Steps` [#steps]

Synchronous step execution operations.

Steps allow executing predefined document processing workflows.

**Example:**

```python
>>> client = Client(api_key="...")
>>> status = client.steps.run_async(
...     step_id="step_extraction",
...     file=Path("document.pdf")
... )
>>> result = status.wait()
>>> print(result.data)
```

**Arguments:**

* `client`: The parent client instance.

**Methods:**

### `get_status` [#get_status-1]

```python
def get_status(self, execution_id: str) -> StepExecutionStatus
```

Get the status of a step execution.

**Arguments:**

* `execution_id`: The execution ID returned by `run_async()`.

**Returns:**

The current execution status.

**Example:**

```python
>>> status = client.steps.get_status("exec_abc123")
>>> if status.is_success():
...     print(status.data)
```

### `run_async` [#run_async-1]

```python
def run_async(
    self,
    step_id: str,
    file: FileInput | None = None,
    url: str | None = None,
    file_base64: str | None = None,
    content_type: str | None = None,
    document_metadata: dict[str, Any] | None = None,
) -> StepExecutionStatus
```

Execute a step asynchronously.

Initiates step execution and returns immediately with an execution ID.
Use `get_status()` to poll for completion.

**Arguments:**

* `step_id`: The ID of the step to execute.

* `file`: File to process (`Path`, `bytes`, or file-like object).

* `url`: URL of the document to process (alternative to `file`).

* `file_base64`: Base64-encoded document (alternative to `file`).

* `content_type`: Content type of the file. Auto-detected if not provided.

* `document_metadata`: Additional metadata to include with the document.
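
Because `file`, `url`, and `file_base64` are alternative inputs, a caller typically supplies one of them. The sketch below uses only the standard library to show how raw bytes might be encoded for `file_base64` and how a single input could be selected before calling `run_async`. Both helpers (`to_base64`, `pick_input`) are hypothetical, and rejecting more than one input is an assumption of this sketch rather than documented SDK behavior.

```python
from __future__ import annotations

import base64
from typing import Any


def to_base64(data: bytes) -> str:
    """Encode raw document bytes as an ASCII Base64 string."""
    return base64.b64encode(data).decode("ascii")


def pick_input(
    file: Any = None, url: str | None = None, file_base64: str | None = None
) -> dict[str, Any]:
    """Return the single provided input as keyword arguments."""
    candidates = {"file": file, "url": url, "file_base64": file_base64}
    provided = {k: v for k, v in candidates.items() if v is not None}
    if not provided:
        raise ValueError("Provide one of: file, url, file_base64")
    if len(provided) > 1:
        # Assumption of this sketch; the SDK may behave differently.
        raise ValueError(f"Expected one input, got: {sorted(provided)}")
    return provided


encoded = to_base64(b"%PDF-1.4")
kwargs = pick_input(file_base64=encoded)
print(kwargs)
```

The resulting dictionary could then be unpacked into a call such as `client.steps.run_async(step_id, **kwargs)`.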

**Returns:**

The initial execution status, including its `execution_id`.

**Raises:**

* `ValueError`: If no file input is provided.

* `BadRequestError`: If the request is invalid.

* `NotFoundError`: If the step doesn't exist.

**Example:**

```python
>>> status = client.steps.run_async(
...     "step_invoice_extraction",
...     file=Path("invoice.pdf")
... )
>>> print(f"Execution ID: {status.execution_id}")
```
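
Polling with `get_status()` until the execution finishes can be sketched as a small helper. This is an illustration, not part of the SDK: `poll_until_done` and `StatusStub` are hypothetical, the `"PROCESSING"` value is invented for the demo, and comparing `status` to the strings `"SUCCESS"` and `"ERROR"` assumes those values (taken from the `StepExecutionStatus` field descriptions) compare equal to plain strings. In real code, `client.steps.get_status` would be passed as the `get_status` argument.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class StatusStub:
    """Hypothetical stand-in exposing a subset of StepExecutionStatus fields."""

    execution_id: str
    status: str
    data: dict[str, Any] | None = None
    error: str | None = None


def poll_until_done(
    get_status: Callable[[str], StatusStub],
    execution_id: str,
    interval: float = 1.0,
    timeout: float = 60.0,
) -> StatusStub:
    """Call get_status repeatedly until a terminal status or the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(execution_id)
        # Assumption: terminal states compare equal to these strings.
        if status.status in ("SUCCESS", "ERROR"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{execution_id} still {status.status} after {timeout}s"
            )
        time.sleep(interval)


# Demo with canned responses instead of a real client.
_canned = iter(
    [
        StatusStub("exec_abc123", "PROCESSING"),
        StatusStub("exec_abc123", "PROCESSING"),
        StatusStub("exec_abc123", "SUCCESS", data={"total": 42}),
    ]
)

final = poll_until_done(lambda _id: next(_canned), "exec_abc123", interval=0.01)
print(final.status, final.data)
```

A fixed interval keeps the sketch short; a production loop might back off between attempts to reduce request volume.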

**Properties:**

* `with_raw_response`: Access methods that return raw HTTP responses.


---

# Convert Types (https://docs.docutray.com/docs/python-sdk/types/convert)



Types for document conversion operations.

## `ConversionResult` [#conversionresult]

Result of a synchronous document conversion.

**Fields:**

* `data`: `dict[str, Any]` - Extracted data according to the document type JSON schema.

* `model_config`: `Any`

## `ConversionStatus` [#conversionstatus]

Status of an asynchronous document conversion.

**Fields:**

* `conversion_id`: `str` - Unique conversion ID.

* `data`: `dict[str, Any] | None` - Extracted data (only present when status is SUCCESS).

* `document_type_code`: `str | None` - Document type code used for conversion.

* `error`: `str | None` - Error message (only present when status is ERROR).

* `model_config`: `Any`

* `original_filename`: `str | None` - Original filename of the processed file.

* `request_timestamp`: `datetime | None` - Timestamp when conversion was started.

* `response_timestamp`: `datetime | None` - Timestamp when conversion was completed (only for SUCCESS/ERROR).

* `status`: `ConversionStatusType` - Current conversion status.

* `status_url`: `str | None` - URL to check conversion status.
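
Since `data` and `error` are only populated for specific statuses, callers usually branch on `status` before reading either field. The following sketch shows one way to do that. `ConversionStub`, `ConversionFailed`, and `extract_data` are hypothetical names, and comparing `status` to the strings `"SUCCESS"` and `"ERROR"` is an assumption about how `ConversionStatusType` values behave.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class ConversionStub:
    """Hypothetical stand-in with a subset of ConversionStatus fields."""

    conversion_id: str
    status: str
    data: dict[str, Any] | None = None
    error: str | None = None


class ConversionFailed(RuntimeError):
    """Raised by this sketch when a conversion ended in ERROR."""


def extract_data(status: ConversionStub) -> dict[str, Any] | None:
    """Return data on SUCCESS, raise on ERROR, and return None otherwise."""
    if status.status == "SUCCESS":
        return status.data
    if status.status == "ERROR":
        raise ConversionFailed(f"{status.conversion_id}: {status.error}")
    return None  # Not finished yet; the caller should check again later.


done = ConversionStub("conv_1", "SUCCESS", data={"invoice_number": "A-17"})
pending = ConversionStub("conv_2", "PROCESSING")
print(extract_data(done), extract_data(pending))
```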


---

# Document Type Types (https://docs.docutray.com/docs/python-sdk/types/document_type)



Types for document type operations.

## `DocumentType` [#documenttype]

A document type definition.

**Fields:**

* `codeType`: `str` - Unique document type code.

* `createdAt`: `datetime | None` - Creation timestamp.

* `description`: `str | None` - Document type description.

* `id`: `str` - Unique document type ID.

* `isDraft`: `bool` - Indicates if the document type is a draft.

* `isPublic`: `bool` - Indicates if the document type is public.

* `model_config`: `Any`

* `name`: `str` - Document type name.

* `schema_`: `dict[str, Any] | None` - JSON schema for the document type (when retrieved by ID).

* `updatedAt`: `datetime | None` - Last update timestamp.

## `ValidationErrorInfo` [#validationerrorinfo]

Validation error information.

**Fields:**

* `count`: `int` - Total number of errors found.

* `messages`: `list[str]` - List of descriptive error messages.

* `model_config`: `Any`

## `ValidationResult` [#validationresult]

Result of JSON validation against a document type schema.

**Fields:**

* `errors`: `ValidationErrorInfo` - Validation errors.

* `model_config`: `Any`

* `warnings`: `ValidationWarningInfo` - Validation warnings.

## `ValidationWarningInfo` [#validationwarninginfo]

Validation warning information.

**Fields:**

* `count`: `int` - Total number of warnings found.

* `messages`: `list[str]` - List of descriptive warning messages.

* `model_config`: `Any`
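
A `ValidationResult` pairs an error summary with a warning summary, each carrying a `count` and a list of `messages`. As a rough sketch of how such a result might be turned into a readable report, the code below uses stand-in dataclasses with the same field names. `IssueInfo`, `ResultStub`, and `summarize` are hypothetical and not part of the SDK.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IssueInfo:
    """Stand-in for ValidationErrorInfo and ValidationWarningInfo."""

    count: int = 0
    messages: list[str] = field(default_factory=list)


@dataclass
class ResultStub:
    """Stand-in for ValidationResult."""

    errors: IssueInfo
    warnings: IssueInfo


def summarize(result: ResultStub) -> str:
    """Build a short multi-line report from a validation result."""
    lines = [
        f"{result.errors.count} error(s), {result.warnings.count} warning(s)"
    ]
    lines += [f"  ERROR: {message}" for message in result.errors.messages]
    lines += [f"  WARNING: {message}" for message in result.warnings.messages]
    return "\n".join(lines)


result = ResultStub(
    errors=IssueInfo(1, ["'total' is required"]),
    warnings=IssueInfo(1, ["'currency' is empty"]),
)
report = summarize(result)
print(report)
```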


---

# Identify Types (https://docs.docutray.com/docs/python-sdk/types/identify)



Types for document identification operations.

## `DocumentTypeMatch` [#documenttypematch]

A matched document type with confidence score.

**Fields:**

* `code`: `str` - Document type code.

* `confidence`: `float` - Confidence score (0-1).

* `model_config`: `Any`

* `name`: `str` - Document type name.

## `IdentificationResult` [#identificationresult]

Result of a synchronous document identification.

**Fields:**

* `alternatives`: `list[DocumentTypeMatch]` - Alternative document types with their confidence levels.

* `document_type`: `DocumentTypeMatch` - Primary identified document type.

* `model_config`: `Any`
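
An `IdentificationResult` exposes a primary match plus alternatives, each with a confidence between 0 and 1. One possible way to use these scores is to keep only candidates above a cutoff and send everything else to manual review. The sketch below illustrates that idea; `MatchStub`, `candidates_above`, and the 0.8 threshold are hypothetical choices, not SDK defaults.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MatchStub:
    """Stand-in for DocumentTypeMatch."""

    code: str
    name: str
    confidence: float


def candidates_above(
    primary: MatchStub, alternatives: list[MatchStub], threshold: float = 0.8
) -> list[MatchStub]:
    """Return all matches at or above the threshold, most confident first."""
    ranked = sorted(
        [primary, *alternatives], key=lambda m: m.confidence, reverse=True
    )
    return [match for match in ranked if match.confidence >= threshold]


primary = MatchStub("invoice", "Invoice", 0.92)
alternatives = [
    MatchStub("contract", "Contract", 0.10),
    MatchStub("receipt", "Receipt", 0.85),
]
accepted = candidates_above(primary, alternatives)
print([match.code for match in accepted])
```

An empty list from `candidates_above` would signal that no candidate is confident enough to process automatically.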

## `IdentificationStatus` [#identificationstatus]

Status of an asynchronous document identification.

**Fields:**

* `alternatives`: `list[DocumentTypeMatch] | None` - Alternative document types (only present when status is SUCCESS).

* `document_type`: `DocumentTypeMatch | None` - Primary identified document type (only present when status is SUCCESS).

* `error`: `str | None` - Error message (only present when status is ERROR).

* `identification_id`: `str` - Unique identification ID.

* `model_config`: `Any`

* `original_filename`: `str | None` - Original filename of the processed file.

* `request_timestamp`: `datetime | None` - Timestamp when identification was started.

* `response_timestamp`: `datetime | None` - Timestamp when identification was completed (only for SUCCESS/ERROR).

* `status`: `IdentificationStatusType` - Current identification status.

* `status_url`: `str | None` - URL to check identification status.


---

# Knowledge Base Types (https://docs.docutray.com/docs/python-sdk/types/knowledge_base)



Types for knowledge base operations.

## `KnowledgeBase` [#knowledgebase]

A knowledge base for semantic document search.

**Fields:**

* `createdAt`: `datetime | None` - Timestamp when the knowledge base was created.

* `description`: `str | None` - Description of the knowledge base.

* `documentCount`: `int | None` - Number of documents in the knowledge base.

* `id`: `str` - Unique knowledge base ID.

* `isActive`: `bool` - Whether the knowledge base is active.

* `model_config`: `Any`

* `name`: `str` - Name of the knowledge base.

* `schema_`: `dict[str, Any] | None` - JSON schema for documents in this knowledge base.

* `updatedAt`: `datetime | None` - Timestamp when the knowledge base was last updated.

## `KnowledgeBaseDocument` [#knowledgebasedocument]

A document stored in a knowledge base.

**Fields:**

* `content`: `dict[str, Any]` - Document content matching the knowledge base schema.

* `createdAt`: `datetime | None` - Timestamp when the document was added.

* `documentId`: `str | None` - External document reference ID.

* `id`: `str` - Unique document ID within the knowledge base.

* `metadata`: `dict[str, Any] | None` - Additional metadata for the document.

* `model_config`: `Any`

* `updatedAt`: `datetime | None` - Timestamp when the document was last updated.

## `SearchResult` [#searchresult]

Result of a semantic search operation.

**Fields:**

* `data`: `list[SearchResultItem]` - List of matching documents with similarity scores.

* `model_config`: `Any`

* `query`: `str | None` - The processed search query.

* `resultsCount`: `int` - Total number of results returned.

## `SearchResultItem` [#searchresultitem]

A single search result with similarity score.

**Fields:**

* `document`: `KnowledgeBaseDocument` - The matched document.

* `model_config`: `Any`

* `similarity`: `float` - Similarity score (0-1), higher is more similar.

## `SyncResult` [#syncresult]

Result of a knowledge base synchronization operation.

**Fields:**

* `completedAt`: `datetime | None` - Timestamp when sync completed.

* `documentsProcessed`: `int | None` - Number of documents processed during sync.

* `errors`: `list[str] | None` - Any errors encountered during sync.

* `model_config`: `Any`

* `startedAt`: `datetime | None` - Timestamp when sync started.

* `status`: `str` - Sync status (e.g., 'started', 'completed', 'failed').

* `syncId`: `str | None` - Unique sync operation ID.


---

# Shared Types (https://docs.docutray.com/docs/python-sdk/types/shared)



Shared types used across multiple resources.

## `APIResponse` [#apiresponse]

Base class for API responses with common fields.

**Fields:**

* `model_config`: `Any`

## `ErrorDetail` [#errordetail]

Error detail information.

**Fields:**

* `errors`: `list[str] | None` - List of specific validation errors.

* `message`: `str` - Error message.

* `model_config`: `Any`

## `PaginatedResponse` [#paginatedresponse]

Generic paginated response wrapper.

**Fields:**

* `data`: `list[T]` - List of items in the current page.

* `model_config`: `Any`

* `pagination`: `Pagination` - Pagination metadata.

## `Pagination` [#pagination]

Pagination information for list responses.

**Fields:**

* `limit`: `int` - Number of items per page.

* `model_config`: `Any`

* `page`: `int` - Current page number (1-indexed).

* `total`: `int` - Total number of items matching the query.
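
The three `Pagination` fields are enough to derive the page count and whether another page exists. A minimal sketch of that arithmetic, assuming 1-indexed pages as stated above (`total_pages` and `has_next_page` are hypothetical helpers, not SDK functions):

```python
def total_pages(total: int, limit: int) -> int:
    """Number of pages needed to hold `total` items at `limit` per page."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Integer ceiling division: 45 items at 20 per page need 3 pages.
    return (total + limit - 1) // limit


def has_next_page(page: int, limit: int, total: int) -> bool:
    """True when a 1-indexed `page` is followed by at least one more page."""
    return page < total_pages(total, limit)


print(total_pages(45, 20), has_next_page(2, 20, 45), has_next_page(3, 20, 45))
```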


---

# Step Types (https://docs.docutray.com/docs/python-sdk/types/step)



Types for step execution operations.

## `StepExecutionStatus` [#stepexecutionstatus]

Status of an asynchronous step execution.

**Fields:**

* `data`: `dict[str, Any] | None` - Result data (only present when status is SUCCESS).

* `error`: `str | dict[str, Any] | None` - Error message or details (only present when status is ERROR).

* `execution_id`: `str` - Unique execution ID.

* `model_config`: `Any`

* `original_filename`: `str | None` - Original filename of the processed file.

* `request_timestamp`: `datetime | None` - Timestamp when execution was started.

* `response_timestamp`: `datetime | None` - Timestamp when execution was completed (only for SUCCESS/ERROR).

* `status`: `StepExecutionStatusType` - Current execution status.

* `step_id`: `str | None` - Step ID that was executed.
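
Because both timestamps are optional, any duration calculation has to handle missing values. The sketch below shows one way to compute how long an execution took from `request_timestamp` and `response_timestamp`; `elapsed` is a hypothetical helper, and the example timestamps are made up.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def elapsed(
    request_timestamp: datetime | None, response_timestamp: datetime | None
) -> timedelta | None:
    """Return the processing duration, or None if either timestamp is missing."""
    if request_timestamp is None or response_timestamp is None:
        return None
    return response_timestamp - request_timestamp


started = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
finished = datetime(2024, 1, 1, 12, 0, 7, 500000, tzinfo=timezone.utc)
duration = elapsed(started, finished)
print(duration.total_seconds())
```

Both values should be either timezone-aware or naive; Python raises `TypeError` when subtracting one kind from the other.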


---

# convertDocumentAsync (https://docs.docutray.com/docs/api/conversion/convertDocumentAsync)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# getConversionStatus (https://docs.docutray.com/docs/api/conversion/getConversionStatus)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# convertDocument (https://docs.docutray.com/docs/api/conversion/convertDocument)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# getDocumentType (https://docs.docutray.com/docs/api/document-types/getDocumentType)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# updateDocumentType (https://docs.docutray.com/docs/api/document-types/updateDocumentType)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# validateDocument (https://docs.docutray.com/docs/api/document-types/validateDocument)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# listDocumentTypes (https://docs.docutray.com/docs/api/document-types/listDocumentTypes)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# createDocumentType (https://docs.docutray.com/docs/api/document-types/createDocumentType)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# identifyDocumentAsync (https://docs.docutray.com/docs/api/identification/identifyDocumentAsync)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# getIdentificationStatus (https://docs.docutray.com/docs/api/identification/getIdentificationStatus)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# identifyDocument (https://docs.docutray.com/docs/api/identification/identifyDocument)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# executeStepAsync (https://docs.docutray.com/docs/api/steps/executeStepAsync)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# getStepExecutionStatus (https://docs.docutray.com/docs/api/steps/getStepExecutionStatus)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# getMonthlyUsage (https://docs.docutray.com/docs/api/usage/getMonthlyUsage)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api
