# Docutray - Full Documentation

> Docutray is a document processing platform that converts any document
> into structured data using AI-powered OCR, with validation workflows
> and a multi-tenant REST API.

# Getting Started (https://docs.docutray.com/docs/getting-started)


Getting Started [#getting-started]

This guide will help you get started with DocuTray quickly.

Creating an Account [#creating-an-account]

To start using DocuTray, create an account:

1. Visit the registration page at [https://app.docutray.com/register](https://app.docutray.com/register)

<img alt="DocuTray Registration Page" src={__img0} placeholder="blur" />

2. Complete the required fields:
   * **Full Name**: Enter your first and last name
   * **Email**: Use a valid email address
   * **Password**: Create a secure password (minimum 8 characters)
   * **Confirm Password**: Repeat the password for verification

<img alt="Completed Registration Form" src={__img1} placeholder="blur" />

3. Click the "Register" button

4. You will receive a confirmation email at the provided address. Open this email and click the verification link.

<img alt="Verification Email" src={__img2} placeholder="blur" />

5. Done! Once your account is verified, you can log in and start using DocuTray.

If you already have an account, you can go directly to the [login page](https://app.docutray.com/login).

Creating an API Key [#creating-an-api-key]

After creating your account, generate an API Key to integrate DocuTray with your applications:

1. Log in to your DocuTray account at [https://app.docutray.com/login](https://app.docutray.com/login)

2. Select the organization you want to work with

3. Navigate to "Account" > "API Keys" in the navigation menu

<img alt="API Keys Menu" src={__img3} placeholder="blur" />

4. Click the "New API Key" button

5. Enter a descriptive name for your API Key and click "Create"

<img alt="Create New API Key" src={__img4} placeholder="blur" />

6. Copy the generated API Key and store it in a safe place. **Important**: This is the only time the complete key is shown.

<img alt="Copy API Key" src={__img5} placeholder="blur" />

You can now use this API Key to authenticate your requests to the DocuTray API.

Your First Conversion [#your-first-conversion]

Once you have your API Key, you can process your first document with a simple API call.

Supported File Formats [#supported-file-formats]

DocuTray supports the following file formats:

* **Images**: JPEG, PNG, GIF, BMP, WebP
* **Documents**: PDF (up to 100 MB)

Making the API Call [#making-the-api-call]

```bash
curl -X POST https://app.docutray.com/api/convert \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "image=@invoice.pdf" \
  -F "document_type_code=invoice"
```

The API returns the extracted data as JSON, structured according to the schema of the document type:

```json
{
  "data": {
    "numero_factura": "F-2024-001",
    "fecha_emision": "2024-01-15",
    "rfc_emisor": "XAXX010101000",
    "razon_social_emisor": "Empresa Ejemplo S.A. de C.V.",
    "subtotal": 1000,
    "iva": 160,
    "total": 1160
  }
}
```
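For callers who prefer Python over curl, the same request can be sketched with the standard library alone. This is a minimal sketch, not an official client: the endpoint, the `image` and `document_type_code` form fields, and the `Bearer` header come from the curl example above, while the helper names `build_convert_request` and `send` are hypothetical.

```python
import json
import mimetypes
import urllib.request
import uuid

# Endpoint taken from the curl example above.
CONVERT_URL = "https://app.docutray.com/api/convert"


def build_convert_request(api_key, filename, file_bytes, document_type_code):
    """Build (without sending) a multipart POST equivalent to the curl call."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    # Text part (document_type_code) followed by the header of the file part.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="document_type_code"\r\n\r\n'
        f"{document_type_code}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        CONVERT_URL,
        data=head + file_bytes + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def send(request):
    """Send the request and decode the JSON body (needs network and a real key)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Building the request separately from sending it makes it possible to inspect the headers and body before any network call. With a real key, `send(build_convert_request(key, "invoice.pdf", data, "invoice"))` would perform the conversion.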

> **Tip**: For large files or batch processing, use the [async conversion endpoint](/docs/api/convert-async/post) which processes documents in the background and allows you to check the status later.

Next Steps [#next-steps]

Now that you've completed your first conversion, explore these resources:

* **[Document Types](/docs/document-types)** - Browse available document types and their schemas
* **[API Reference](/docs/api)** - Complete API documentation with all endpoints
* **[Webhooks](/docs/webhooks)** - Set up webhooks to receive conversion results automatically
* **[Guides](/docs/guides)** - Step-by-step tutorials for common use cases


---

# API Reference (https://docs.docutray.com/docs/api)


API Reference [#api-reference]

The DocuTray API provides a complete set of endpoints for document processing, type management, and workflow automation.

Authentication [#authentication]

All API requests require authentication using an API Key in the `Authorization` header:

```text
Authorization: Bearer YOUR_API_KEY
```

You can generate API Keys from your organization's dashboard in **Account** > **API Keys**.

Base URL [#base-url]

All API endpoints use the following base URLs:

| Environment | URL                            |
| ----------- | ------------------------------ |
| Production  | `https://app.docutray.com`     |
| Staging     | `https://staging.docutray.com` |
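A client can keep the two base URLs in one place and switch between them by name. The sketch below is illustrative: the URLs come from the table above, the `/api/convert` path comes from the Getting Started example, and the assumption that staging exposes the same paths as production is not confirmed here. The `endpoint_url` helper is hypothetical.

```python
# Base URLs from the table above.
BASE_URLS = {
    "production": "https://app.docutray.com",
    "staging": "https://staging.docutray.com",
}


def endpoint_url(path, environment="production"):
    """Join an API path to the base URL of the chosen environment."""
    base = BASE_URLS[environment]  # raises KeyError for unknown environments
    return base.rstrip("/") + "/" + path.lstrip("/")


# Assumes staging serves the same path as production.
print(endpoint_url("/api/convert", "staging"))
```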

Available Endpoints [#available-endpoints]

Navigate through the sidebar to explore all available API endpoints organized by functionality:

* **Document Conversion** - Convert documents to structured data
* **Document Identification** - Automatically identify document types
* **Document Types** - Manage document type schemas
* **Knowledge Bases** - Manage knowledge bases for RAG operations
* **Steps Execution** - Execute workflow steps asynchronously

Response Format [#response-format]

All API responses follow a consistent JSON format:

```json
{
  "success": true,
  "data": { ... }
}
```

Error responses include details about the failure:

```json
{
  "success": false,
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Invalid document type"
  }
}
```
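Client code can handle both shapes with one small helper. The following is a sketch under the assumption that every response follows the envelope shown above; `DocutrayApiError` and `unwrap` are hypothetical names, not part of any official SDK.

```python
class DocutrayApiError(Exception):
    """Raised when a response carries "success": false."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def unwrap(payload):
    """Return the "data" object, or raise if the envelope reports a failure."""
    if payload.get("success") is False:
        error = payload.get("error") or {}
        raise DocutrayApiError(
            error.get("code", "UNKNOWN"),
            error.get("message", ""),
        )
    return payload.get("data")


ok = {"success": True, "data": {"total": 1160}}
print(unwrap(ok))
```

A missing `success` key is treated as success here, because the Getting Started response example contains only `data`.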

Rate Limits [#rate-limits]

API requests are rate-limited based on your subscription plan. Contact support for custom limits.


---

# 8-Column Balance Sheet (https://docs.docutray.com/docs/document-types/balance_ocho_columnas)


8-column balance sheet with company identification, period, and accounts with their values. This document type processes accounting balance sheets and extracts structured information from them.

**Document type code:** `balance_ocho_columnas`

Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "empresa": "Example Company S.A.",
      "año": "2023",
      "periodo": "January - December",
      "contador": "John Pérez CPA",
      "cuenta": [
        {
          "codigo": "1101",
          "nombre": "Cash",
          "debitos": 1000000,
          "creditos": 500000,
          "deudor": 500000,
          "acreedor": 0,
          "activo": 500000,
          "pasivo": 0,
          "perdida": 0,
          "ganancia": 0
        },
        {
          "codigo": "2101",
          "nombre": "Suppliers",
          "debitos": 200000,
          "creditos": 800000,
          "deudor": 0,
          "acreedor": 600000,
          "activo": 0,
          "pasivo": 600000,
          "perdida": 0,
          "ganancia": 0
        }
      ]
    }
  }
}
```

Main fields [#main-fields]

| Field      | Type   | Description                        |
| ---------- | ------ | ---------------------------------- |
| `empresa`  | String | Company name                       |
| `año`      | String | Balance sheet year                 |
| `periodo`  | String | Balance sheet period               |
| `contador` | String | Name of the responsible accountant |

Account fields [#account-fields]

| Field      | Type   | Description             |
| ---------- | ------ | ----------------------- |
| `codigo`   | String | Accounting account code |
| `nombre`   | String | Accounting account name |
| `debitos`  | Number | Debits amount           |
| `creditos` | Number | Credits amount          |
| `deudor`   | Number | Debtor balance          |
| `acreedor` | Number | Creditor balance        |
| `activo`   | Number | Asset value             |
| `pasivo`   | Number | Liability value         |
| `perdida`  | Number | Loss value              |
| `ganancia` | Number | Gain value              |

Important considerations [#important-considerations]

* Each account contains the 8 characteristic columns of the balance sheet: debits, credits, debtor, creditor, asset, liability, loss, and gain
* Account codes follow the standard chart of accounts
* All amounts are expressed in local currency
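Because the eight columns are related to each other, extracted rows can be sanity-checked after conversion. The sketch below assumes the usual conventions of an 8-column balance: the debtor or creditor balance is the net of debits and credits, a debtor balance is carried to asset or loss, and a creditor balance to liability or gain. These rules are an assumption about the source documents, not something the API is stated to enforce, and `check_account` is a hypothetical helper.

```python
def check_account(row):
    """Return a list of inconsistencies found in one `cuenta` entry."""
    problems = []
    net = row["debitos"] - row["creditos"]
    if row["deudor"] != max(net, 0):
        problems.append("deudor != max(debitos - creditos, 0)")
    if row["acreedor"] != max(-net, 0):
        problems.append("acreedor != max(creditos - debitos, 0)")
    # Assumed convention: debtor balances go to activo or perdida.
    if row["activo"] + row["perdida"] != row["deudor"]:
        problems.append("activo + perdida != deudor")
    # Assumed convention: creditor balances go to pasivo or ganancia.
    if row["pasivo"] + row["ganancia"] != row["acreedor"]:
        problems.append("pasivo + ganancia != acreedor")
    return problems


# The two accounts from the response example above.
cash = {"codigo": "1101", "debitos": 1000000, "creditos": 500000,
        "deudor": 500000, "acreedor": 0, "activo": 500000,
        "pasivo": 0, "perdida": 0, "ganancia": 0}
suppliers = {"codigo": "2101", "debitos": 200000, "creditos": 800000,
             "deudor": 0, "acreedor": 600000, "activo": 0,
             "pasivo": 600000, "perdida": 0, "ganancia": 0}

for account in (cash, suppliers):
    print(account["codigo"], check_account(account))
```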


---

# Bill of Lading (https://docs.docutray.com/docs/document-types/bl)


Bill of Lading for international maritime transport with shipment details, ports, and cargo information.

**Document type code:** `bl`

Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "bl_number": "MAEU123456789",
      "shipper": "Global Export Corp",
      "consignee": "International Import Ltd",
      "notify_party": "Local Agent S.A.",
      "vessel": "ATLANTIC QUEEN",
      "voyage": "2023-045N",
      "port_of_loading": "Hamburg, Germany",
      "port_of_discharge": "Valparaíso, Chile",
      "place_of_delivery": "Santiago, Chile",
      "date_of_issue": "2023-11-15T00:00:00Z",
      "freight_payment": "PREPAID",
      "container_details": [
        {
          "container_number": "MAEU987654321",
          "seal_number": "SL123456",
          "type_size": "40'HC",
          "weight": "28500 KGS",
          "packages": 1200,
          "description": "MACHINERY PARTS"
        }
      ]
    }
  }
}
```

Main fields [#main-fields]

| Field               | Type               | Description                             |
| ------------------- | ------------------ | --------------------------------------- |
| `bl_number`         | String             | Bill of Lading identification number    |
| `shipper`           | String             | Shipper/exporter name                   |
| `consignee`         | String             | Consignee/importer name                 |
| `notify_party`      | String             | Party to be notified upon arrival       |
| `vessel`            | String             | Vessel name                             |
| `voyage`            | String             | Voyage number                           |
| `port_of_loading`   | String             | Port where cargo was loaded             |
| `port_of_discharge` | String             | Port where cargo will be discharged     |
| `place_of_delivery` | String             | Final delivery location                 |
| `date_of_issue`     | String (date-time) | Bill of Lading issue date               |
| `freight_payment`   | String             | Freight payment terms (PREPAID/COLLECT) |

Container details fields [#container-details-fields]

| Field              | Type   | Description                     |
| ------------------ | ------ | ------------------------------- |
| `container_number` | String | Container identification number |
| `seal_number`      | String | Container seal number           |
| `type_size`        | String | Container type and size         |
| `weight`           | String | Container weight                |
| `packages`         | Number | Number of packages              |
| `description`      | String | Cargo description               |

Important considerations [#important-considerations]

* It is an official maritime transport document
* Essential for international cargo clearance
* Contains detailed information about containers and cargo
* Used for customs procedures and cargo tracking
* The Bill of Lading number is unique and is used for tracking
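Since `weight` is returned as a string, numeric work on container data needs a parsing step. The sketch below assumes weights look like the `"28500 KGS"` value in the example (a plain number followed by a unit); other layouts return `None` rather than a guess. `parse_weight` and `cargo_totals` are hypothetical helpers.

```python
import re

# Assumed layout: digits with an optional decimal part, then a unit, e.g. "28500 KGS".
WEIGHT_PATTERN = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*")


def parse_weight(text):
    """Split a weight string into (value, unit), or return None if it does not match."""
    match = WEIGHT_PATTERN.fullmatch(text)
    if match is None:
        return None
    return float(match.group(1)), match.group(2).upper()


def cargo_totals(container_details):
    """Sum packages and per-unit weights across all containers."""
    total_packages = sum(c["packages"] for c in container_details)
    weights = {}
    for container in container_details:
        parsed = parse_weight(container["weight"])
        if parsed is not None:
            value, unit = parsed
            weights[unit] = weights.get(unit, 0.0) + value
    return total_packages, weights


containers = [{"container_number": "MAEU987654321", "weight": "28500 KGS", "packages": 1200}]
print(cargo_totals(containers))
```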


---

# Professional Fee Receipt (https://docs.docutray.com/docs/document-types/boleta_honorarios)


Professional fee receipt (boleta de honorarios) with professional services details, client information, and tax calculations.

**Document type code:** `boleta_honorarios`

Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "numero_boleta": 145,
      "fecha_emision": "2023-11-20T00:00:00Z",
      "rut_profesional": "12.345.678-9",
      "nombre_profesional": "María González López",
      "rut_cliente": "98.765.432-1",
      "nombre_cliente": "Tech Solutions S.A.",
      "descripcion_servicios": "Consulting services for software development project - November 2023",
      "honorarios_brutos": 850000,
      "retencion_impuesto": 106250,
      "honorarios_liquidos": 743750,
      "periodo_servicios": "November 2023",
      "direccion_profesional": "Av. Providencia 1234, Santiago",
      "actividad_economica": "Software Consulting"
    }
  }
}
```

Main fields [#main-fields]

| Field                   | Type               | Description                            |
| ----------------------- | ------------------ | -------------------------------------- |
| `numero_boleta`         | Number             | Receipt number                         |
| `fecha_emision`         | String (date-time) | Receipt issue date                     |
| `rut_profesional`       | String             | Professional's RUT (tax ID)            |
| `nombre_profesional`    | String             | Professional's full name               |
| `rut_cliente`           | String             | Client's RUT (tax ID)                  |
| `nombre_cliente`        | String             | Client's name or company name          |
| `descripcion_servicios` | String             | Description of services provided       |
| `honorarios_brutos`     | Number             | Gross fees before tax retention        |
| `retencion_impuesto`    | Number             | Tax retention amount (typically 12.5%) |
| `honorarios_liquidos`   | Number             | Net fees after tax retention           |
| `periodo_servicios`     | String             | Period when services were provided     |
| `direccion_profesional` | String             | Professional's address                 |
| `actividad_economica`   | String             | Economic activity or professional area |

Important considerations [#important-considerations]

* It is an official tax document for professional services in Chile
* Tax retention is typically 12.5% of gross fees
* The receipt number must be sequential and unique per professional
* The RUT must be a valid Chilean tax identification number
* Used for income tax declarations by both professional and client
* Net fees = Gross fees - Tax retention
* Essential document for tax compliance for independent professionals
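The relationship between the three amounts can be verified on the extracted data. This is a minimal sketch using the field names from the response above; `check_fee_receipt` is a hypothetical helper, and the 12.5% rate is only the typical value mentioned in this section, so the function reports the observed rate instead of enforcing one.

```python
def check_fee_receipt(data):
    """Check net = gross - retention and report the observed retention rate."""
    gross = data["honorarios_brutos"]
    retention = data["retencion_impuesto"]
    net = data["honorarios_liquidos"]
    return {
        "net_matches": gross - retention == net,
        "retention_rate": retention / gross if gross else None,
    }


# Amounts from the response example above.
receipt = {
    "honorarios_brutos": 850000,
    "retencion_impuesto": 106250,
    "honorarios_liquidos": 743750,
}
print(check_fee_receipt(receipt))
```

For the example receipt the net amount matches and the observed rate is 0.125, in line with the typical 12.5% retention.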


---

# Current Account Statement (https://docs.docutray.com/docs/document-types/cartola_cc)


Current Account Statement with transaction details.

**Document type code:** `cartola_cc`

Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "titular": "John Doe",
      "fecha_desde": "2023-11-01T00:00:00Z",
      "fecha_hasta": "2023-11-30T23:59:59Z",
      "transacciones": [
        {
          "fecha": "2023-11-15T10:30:00Z",
          "descripcion": "TRANSFER RECEIVED",
          "sucursal": "001 DOWNTOWN BRANCH",
          "numero_documento": "TRF123456",
          "tipo": "abono",
          "monto": 500000
        },
        {
          "fecha": "2023-11-20T14:45:00Z",
          "descripcion": "UTILITY BILL PAYMENT",
          "sucursal": "002 UPTOWN BRANCH",
          "numero_documento": "PSB789012",
          "tipo": "cargo",
          "monto": 35000
        },
        {
          "fecha": "2023-11-25T09:15:00Z",
          "descripcion": "ATM WITHDRAWAL",
          "sucursal": "ATM SHOPPING MALL",
          "numero_documento": "GCA345678",
          "tipo": "cargo",
          "monto": 100000
        },
        {
          "fecha": "2023-11-28T16:20:00Z",
          "descripcion": "CASH DEPOSIT",
          "sucursal": "003 BUSINESS DISTRICT",
          "numero_documento": "DEP901234",
          "tipo": "abono",
          "monto": 250000
        }
      ]
    }
  }
}
```

Main fields [#main-fields]

| Field         | Type               | Description                  |
| ------------- | ------------------ | ---------------------------- |
| `titular`     | String             | Account holder name          |
| `n_cuenta`    | String             | Account number or identifier |
| `fecha_desde` | String (date-time) | Statement start date         |
| `fecha_hasta` | String (date-time) | Statement end date           |

Transaction fields [#transaction-fields]

| Field              | Type               | Description                                                                                                                                                                                     |
| ------------------ | ------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `fecha`            | String (date-time) | Transaction date                                                                                                                                                                                |
| `descripcion`      | String             | Transaction description                                                                                                                                                                         |
| `sucursal`         | String             | Transaction branch. This may not appear                                                                                                                                                         |
| `numero_documento` | String             | Transaction document number. This may not appear                                                                                                                                                |
| `tipo`             | String (enum)      | Transaction type: `cargo` (charge) or `abono` (credit) |
| `monto`            | Number             | Transaction amount, usually found in the Charges or Credits column that matches the transaction type. It is not the value in the Balance or Daily Balance column |

Transaction types [#transaction-types]

* **cargo**: Represents a debit or money outflow from the account
* **abono**: Represents a credit or money inflow to the account

Important considerations [#important-considerations]

* This is a bank current account statement with complete transaction information
* Transactions can be charges (debits) or credits (deposits)
* Verify the statement period dates to confirm the correct timeframe
* The amount field corresponds to the actual transaction value, not the account balance
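Because `monto` is always the transaction value and `tipo` carries the direction, totals per type are straightforward to compute. The sketch below uses the four transactions from the response example, reduced to the two relevant fields; `summarize_transactions` is a hypothetical helper and assumes `tipo` is always `cargo` or `abono`.

```python
def summarize_transactions(transacciones):
    """Total credits (abono), debits (cargo), and the net movement."""
    totals = {"abono": 0, "cargo": 0}
    for tx in transacciones:
        # An unexpected `tipo` raises KeyError instead of being silently ignored.
        totals[tx["tipo"]] += tx["monto"]
    totals["neto"] = totals["abono"] - totals["cargo"]
    return totals


transacciones = [
    {"tipo": "abono", "monto": 500000},
    {"tipo": "cargo", "monto": 35000},
    {"tipo": "cargo", "monto": 100000},
    {"tipo": "abono", "monto": 250000},
]
print(summarize_transactions(transacciones))
# {'abono': 750000, 'cargo': 135000, 'neto': 615000}
```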


---

# Credit Card Statement (https://docs.docutray.com/docs/document-types/cartola_tc)


Credit Card Statement with credit limits, billed amounts, and transaction details.

**Document type code:** `cartola_tc`

Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "titular": "Juan Pérez",
      "numero_tarjeta": "XXXX-XXXX-XXXX-1234",
      "fecha_estado_cuenta": "2023-12-01T00:00:00Z",
      "monto_total_facturado": 125000,
      "tipo_cartola": "nacional",
      "moneda": "CLP",
      "cupo_disponible": 1500000,
      "cupo_utilizado": 500000,
      "cupo_total": 2000000,
      "saldo_periodo_anterior": 75000,
      "transacciones": [
        {
          "fecha": "2023-11-15T00:00:00Z",
          "descripcion": "SUPERMERCADO XYZ",
          "monto_mensual": 45000,
          "compra_en_cuotas": true,
          "numero_cuota": 2,
          "total_cuotas": 6,
          "monto_total": 270000
        },
        {
          "fecha": "2023-11-20T00:00:00Z",
          "descripcion": "FARMACIA ABC",
          "monto_mensual": 15000,
          "compra_en_cuotas": false,
          "numero_cuota": null,
          "total_cuotas": null,
          "monto_total": null
        },
        {
          "fecha": "2023-11-25T00:00:00Z",
          "descripcion": "PAGO POR INTERNET",
          "monto_mensual": -75000,
          "compra_en_cuotas": false,
          "numero_cuota": null,
          "total_cuotas": null,
          "monto_total": null
        }
      ]
    }
  }
}
```

Main fields [#main-fields]

| Field                                    | Type               | Description                                                                                                                                                                                                                                                                                                             |
| ---------------------------------------- | ------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `titular`                                | String             | Cardholder's name                                                                                                                                                                                                                                                                                                       |
| `numero_tarjeta`                         | String             | Masked card number format                                                                                                                                                                                                                                                                                               |
| `fecha_estado_cuenta`                    | String (date-time) | Statement date                                                                                                                                                                                                                                                                                                          |
| `monto_total_facturado`                  | Number             | Total billed amount                                                                                                                                                                                                                                                                                                     |
| `tipo_cartola`                           | String (enum)      | Indicates if it is national or international                                                                                                                                                                                                                                                                            |
| `moneda`                                 | String (enum)      | CLP for national statements, USD for international ones                                                                                                                                                                                                                                                                 |
| `cupo_disponible`                        | Number             | Available credit limit                                                                                                                                                                                                                                                                                                  |
| `cupo_utilizado`                         | Number             | Used credit limit                                                                                                                                                                                                                                                                                                       |
| `cupo_total`                             | Number             | Total credit limit                                                                                                                                                                                                                                                                                                      |
| `saldo_periodo_anterior`                 | Number (nullable)  | Final balance owed from the previous period. Statements may label it "final owed balance from previous period", "previous billed balance", or similar. When a statement also shows a starting owed balance for the previous period, the final one is the value extracted |
| `monto_total_facturado_periodo_anterior` | Number             | Total billed amount from previous period                                                                                                                                                                                                                                                                                |

Transaction fields [#transaction-fields]

| Field              | Type               | Description                                                                    |
| ------------------ | ------------------ | ------------------------------------------------------------------------------ |
| `fecha`            | String (date-time) | Transaction date                                                               |
| `descripcion`      | String             | Transaction description                                                        |
| `monto_mensual`    | Number             | Monthly amount to pay for the transaction                                      |
| `compra_en_cuotas` | Boolean            | Indicates if it's an installment purchase                                      |
| `numero_cuota`     | Number (nullable)  | Current installment number to pay in the month, only for installment purchases |
| `total_cuotas`     | Number (nullable)  | Total number of installments, only for installment purchases                   |
| `monto_total`      | Number (nullable)  | Total transaction amount, only for installment purchases                       |

Important considerations [#important-considerations]

* The transaction list may include payment entries labeled PAID AMOUNT, INTERNET PAYMENT, or similar. These have negative amounts and are still included in `transacciones`
* National statements use CLP currency and international statements use USD
* Installment transactions show the detail of each installment and the original total amount
* Verify the statement date to confirm the correct period
* The statement includes complete information about credit limits and transactions
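Since payments appear in the same list as purchases, consumers usually separate them by sign. The sketch below does that and also derives the remaining installments for installment purchases. It is illustrative only: `summarize_card_statement` is a hypothetical helper, and the check that available plus used credit equals the total limit holds for the example above but is an assumption about statements in general.

```python
def summarize_card_statement(data):
    """Split purchases from payments and report remaining installments."""
    txs = data["transacciones"]
    purchases = sum(t["monto_mensual"] for t in txs if t["monto_mensual"] > 0)
    payments = -sum(t["monto_mensual"] for t in txs if t["monto_mensual"] < 0)
    remaining = {
        t["descripcion"]: t["total_cuotas"] - t["numero_cuota"]
        for t in txs
        if t["compra_en_cuotas"]
    }
    # Assumption: the three limit fields are expected to add up.
    limits_consistent = (
        data["cupo_disponible"] + data["cupo_utilizado"] == data["cupo_total"]
    )
    return {
        "purchases": purchases,
        "payments": payments,
        "remaining_installments": remaining,
        "limits_consistent": limits_consistent,
    }


# Values from the response example above, reduced to the fields used here.
statement = {
    "cupo_disponible": 1500000,
    "cupo_utilizado": 500000,
    "cupo_total": 2000000,
    "transacciones": [
        {"descripcion": "SUPERMERCADO XYZ", "monto_mensual": 45000,
         "compra_en_cuotas": True, "numero_cuota": 2, "total_cuotas": 6},
        {"descripcion": "FARMACIA ABC", "monto_mensual": 15000,
         "compra_en_cuotas": False, "numero_cuota": None, "total_cuotas": None},
        {"descripcion": "PAGO POR INTERNET", "monto_mensual": -75000,
         "compra_en_cuotas": False, "numero_cuota": None, "total_cuotas": None},
    ],
}
print(summarize_card_statement(statement))
```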


---

# AFP Contributions Certificate (https://docs.docutray.com/docs/document-types/cotizaciones_afp)


Certificate of AFP (pension fund) contributions with contribution history and employer information.

**Document type code:** `cotizaciones_afp`

Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "rut_afiliado": "12.345.678-9",
      "nombre_afiliado": "Juan Carlos Pérez",
      "afp": "Habitat",
      "fecha_emision": "2023-11-15T00:00:00Z",
      "periodo_consultado": "January 2023 - October 2023",
      "cotizaciones": [
        {
          "mes": "2023-10",
          "empleador": "Tech Solutions S.A.",
          "rut_empleador": "98.765.432-1",
          "remuneracion": 1200000,
          "cotizacion_obligatoria": 120000,
          "cotizacion_voluntaria": 50000,
          "seguro_cesantia": 36000,
          "estado": "PAGADO"
        },
        {
          "mes": "2023-09",
          "empleador": "Tech Solutions S.A.",
          "rut_empleador": "98.765.432-1",
          "remuneracion": 1200000,
          "cotizacion_obligatoria": 120000,
          "cotizacion_voluntaria": 50000,
          "seguro_cesantia": 36000,
          "estado": "PAGADO"
        }
      ]
    }
  }
}
```

Main fields [#main-fields]

| Field                | Type               | Description                           |
| -------------------- | ------------------ | ------------------------------------- |
| `rut_afiliado`       | String             | Member's RUT (tax ID)                 |
| `nombre_afiliado`    | String             | Member's full name                    |
| `afp`                | String             | AFP name (pension fund administrator) |
| `fecha_emision`      | String (date-time) | Certificate issue date                |
| `periodo_consultado` | String             | Period covered by the certificate     |

Contribution fields [#contribution-fields]

| Field                    | Type   | Description                                    |
| ------------------------ | ------ | ---------------------------------------------- |
| `mes`                    | String | Contribution month (YYYY-MM format)            |
| `empleador`              | String | Employer's name or company                     |
| `rut_empleador`          | String | Employer's RUT (tax ID)                        |
| `remuneracion`           | Number | Monthly salary or wage                         |
| `cotizacion_obligatoria` | Number | Mandatory pension contribution (typically 10%) |
| `cotizacion_voluntaria`  | Number | Voluntary additional contribution              |
| `seguro_cesantia`        | Number | Unemployment insurance contribution            |
| `estado`                 | String | Payment status (paid, pending, or overdue); the example above returns `PAGADO` |

Important considerations [#important-considerations]

* Official document from the Chilean pension system (AFP)
* Shows contribution history for tax and benefit purposes
* Mandatory contributions are typically 10% of salary
* Used for pension calculations and employment verification
* Essential for retirement planning and loan applications
* Payment status indicates employer compliance with contributions
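The typical 10% mandatory rate gives a simple plausibility check on extracted contributions. The sketch below computes the observed rate per month and lists the months that deviate from an expected rate. The helper names and the tolerance are hypothetical, and 10% is only the typical value mentioned above, so a flagged month is a prompt for review rather than proof of an error.

```python
def contribution_rates(cotizaciones):
    """Map each month to cotizacion_obligatoria / remuneracion."""
    return {
        c["mes"]: c["cotizacion_obligatoria"] / c["remuneracion"]
        for c in cotizaciones
        if c["remuneracion"]
    }


def off_rate_months(cotizaciones, expected=0.10, tolerance=0.005):
    """Return the months whose mandatory rate differs from `expected`."""
    rates = contribution_rates(cotizaciones)
    return [mes for mes, rate in rates.items() if abs(rate - expected) > tolerance]


# The two months from the response example above, reduced to the fields used here.
cotizaciones = [
    {"mes": "2023-10", "remuneracion": 1200000, "cotizacion_obligatoria": 120000},
    {"mes": "2023-09", "remuneracion": 1200000, "cotizacion_obligatoria": 120000},
]
print(off_rate_months(cotizaciones))  # [] for the example data
```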


---

# Curriculum Vitae (https://docs.docutray.com/docs/document-types/cv)


Extracts all data from a CV, including the personal description, education, and work experience.

**Document type code:** `cv`

Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "nombre": "Ana María Torres",
      "telefono": "+56 9 8765 4321",
      "correo_electronico": "ana.torres@email.com",
      "descripcion": "Systems Engineer with 8 years of experience in software development and technology project management.",
      "educacion": [
        {
          "institucion": "University of Chile",
          "titulo": "Systems Engineering",
          "ano_ingreso": 2010,
          "ano_salida": 2015,
          "ubicacion": "Santiago, Chile"
        },
        {
          "institucion": "AIEP Professional Institute",
          "titulo": "Programming Technician",
          "ano_ingreso": 2008,
          "ano_salida": 2010,
          "ubicacion": "Santiago, Chile"
        }
      ],
      "experiencia_laboral": [
        {
          "empresa": "TechSolutions S.A.",
          "cargo": "Senior Developer",
          "ano_ingreso": 2020,
          "ano_salida": 2023,
          "descripcion": "Team leadership, implementation of scalable architectures, and mentoring junior developers.",
          "ubicacion": "Santiago, Chile"
        },
        {
          "empresa": "Innovate Corp",
          "cargo": "Full Stack Developer",
          "ano_ingreso": 2017,
          "ano_salida": 2020,
          "descripcion": "Web application development using React, Node.js, and PostgreSQL. Participation in digital transformation projects.",
          "ubicacion": "Valparaíso, Chile"
        }
      ]
    }
  }
}
```

Main fields [#main-fields]

| Field                | Type   | Description                                  |
| -------------------- | ------ | -------------------------------------------- |
| `nombre`             | String | Person's full name                           |
| `telefono`           | String | Contact phone number                         |
| `correo_electronico` | String | Email address                                |
| `descripcion`        | String | Professional summary or personal description |

Education fields [#education-fields]

| Field         | Type   | Description                             |
| ------------- | ------ | --------------------------------------- |
| `institucion` | String | Educational institution name            |
| `titulo`      | String | Degree or title obtained                |
| `ano_ingreso` | Number | Year of entry to the institution        |
| `ano_salida`  | Number | Year of graduation from the institution |
| `ubicacion`   | String | Institution location                    |

Work experience fields [#work-experience-fields]

| Field         | Type   | Description                                      |
| ------------- | ------ | ------------------------------------------------ |
| `empresa`     | String | Company or employer name                         |
| `cargo`       | String | Job title or position held                       |
| `ano_ingreso` | Number | Year started in the position                     |
| `ano_salida`  | Number | Year ended in the position                       |
| `descripcion` | String | Description of responsibilities and achievements |
| `ubicacion`   | String | Work location                                    |

Important considerations [#important-considerations]

* Education and experience arrays are generally ordered by relevance or chronology
* Years can be in full format (YYYY) or abbreviated according to the original document
* Personal description may include skills, professional objectives, or career summary
* Useful for recruitment processes and professional profile analysis
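
Because array order is not guaranteed to be chronological, a consumer that needs a timeline may want to sort the entries itself. The following is a minimal sketch, not part of the DocuTray API: `sort_by_recency` is a hypothetical helper, and treating a missing `ano_salida` as "still ongoing" is an assumption.

```python
from typing import Any


def sort_by_recency(entries: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return education or experience entries, most recent first.

    Assumption: an entry without a numeric `ano_salida` is still ongoing,
    so it is placed before every entry that has an end year.
    """
    def key(entry: dict[str, Any]) -> tuple[float, float]:
        end = entry.get("ano_salida")
        start = entry.get("ano_ingreso")
        end_key = float(end) if isinstance(end, (int, float)) else float("inf")
        start_key = float(start) if isinstance(start, (int, float)) else float("-inf")
        return (end_key, start_key)

    return sorted(entries, key=key, reverse=True)


experience = [
    {"empresa": "Innovate Corp", "ano_ingreso": 2017, "ano_salida": 2020},
    {"empresa": "TechSolutions S.A.", "ano_ingreso": 2020, "ano_salida": 2023},
]
ordered = sort_by_recency(experience)
print([entry["empresa"] for entry in ordered])
```

The same helper works for `educacion`, since both arrays share the `ano_ingreso` and `ano_salida` fields.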


---

# Electronic Invoice (https://docs.docutray.com/docs/document-types/factura)


Electronic Invoice from SII (Chile) with issuer, recipient, and product/service details.

**Document type code:** `factura`

Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "folio": 123456,
      "fecha_emision": "2023-11-15T00:00:00Z",
      "rut_emisor": "98.765.432-1",
      "nombre_emisor": "Commercial Company S.A.",
      "rut_receptor": "12.345.678-9",
      "nombre_receptor": "Juan Pérez González",
      "detalle": [
        {
          "descripcion": "HP Pavilion Notebook",
          "unidad": "Unit",
          "cantidad": 2,
          "precio_unitario": 450000,
          "precio_total": 900000
        },
        {
          "descripcion": "Wireless Mouse",
          "unidad": "Unit",
          "cantidad": 2,
          "precio_unitario": 25000,
          "precio_total": 50000
        }
      ],
      "total_neto": 950000,
      "iva": 180500,
      "total": 1130500
    }
  }
}
```

Main fields [#main-fields]

| Field             | Type               | Description                      |
| ----------------- | ------------------ | -------------------------------- |
| `folio`           | Number             | Invoice folio number             |
| `fecha_emision`   | String (date-time) | Invoice issue date               |
| `rut_emisor`      | String             | Issuer's RUT (tax ID)            |
| `nombre_emisor`   | String             | Issuer's name or company name    |
| `rut_receptor`    | String             | Recipient's RUT (tax ID)         |
| `nombre_receptor` | String             | Recipient's name or company name |
| `total_neto`      | Number             | Net total before taxes           |
| `iva`             | Number             | VAT amount                       |
| `total`           | Number             | Final total including taxes      |

Detail fields [#detail-fields]

| Field             | Type   | Description                    |
| ----------------- | ------ | ------------------------------ |
| `descripcion`     | String | Product or service description |
| `unidad`          | String | Unit of measurement            |
| `cantidad`        | Number | Quantity of products/services  |
| `precio_unitario` | Number | Price per unit                 |
| `precio_total`    | Number | Total price for the line       |

Important considerations [#important-considerations]

* It is an official tax document from Chile's SII
* The folio is unique for each issuer
* VAT is calculated on the net total according to the current rate
* RUT must be in valid Chilean format
* Each detail line represents a billed product or service
* The sum of `precio_total` across all `detalle` lines should match `total_neto`
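
Two of the considerations above can be checked on the client side after extraction. The sketch below is illustrative only: `is_valid_rut` and `detail_matches_net_total` are hypothetical helper names, the RUT check uses the standard modulo-11 check digit, and the one-peso tolerance is an assumption. Placeholder RUTs in sample responses are illustrative and may not pass a check-digit test.

```python
import re

# Accepts RUTs with or without thousands separators, e.g. 97.004.000-5 or 97004000-5.
RUT_PATTERN = re.compile(r"^(\d{1,2})\.?(\d{3})\.?(\d{3})-([\dkK])$")


def rut_check_digit(number: int) -> str:
    """Compute the modulo-11 check digit for the numeric part of a RUT."""
    total, factor = 0, 2
    while number > 0:
        total += (number % 10) * factor
        number //= 10
        factor = 2 if factor == 7 else factor + 1  # factors cycle 2..7
    result = 11 - (total % 11)
    return {11: "0", 10: "K"}.get(result, str(result))


def is_valid_rut(rut: str) -> bool:
    """Check both the written format and the check digit."""
    match = RUT_PATTERN.match(rut.strip())
    if not match:
        return False
    number = int("".join(match.group(1, 2, 3)))
    return rut_check_digit(number) == match.group(4).upper()


def detail_matches_net_total(extracted: dict, tolerance: float = 1) -> bool:
    """Compare the sum of line totals with `total_neto` (tolerance is an assumption)."""
    lines_total = sum(line["precio_total"] for line in extracted["detalle"])
    return abs(lines_total - extracted["total_neto"]) <= tolerance


sample = {
    "detalle": [{"precio_total": 900000}, {"precio_total": 50000}],
    "total_neto": 950000,
}
print(detail_matches_net_total(sample), is_valid_rut("97.004.000-5"))
```

A failed check does not necessarily mean the extraction is wrong; the source document itself may contain the discrepancy, so flagging the result for review is usually safer than rejecting it.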


---

# Document Types (https://docs.docutray.com/docs/document-types)


DocuTray supports the processing of multiple document types. Below you will find detailed documentation for each type, including its API code, data structure, and usage examples.

Financial Documents [#financial-documents]

<Cards>
  <Card title="8-Column Balance Sheet" href="/docs/document-types/balance_ocho_columnas" />

  <Card title="Current Account Statement" href="/docs/document-types/cartola_cc" />

  <Card title="Credit Card Statement" href="/docs/document-types/cartola_tc" />

  <Card title="Promissory Note" href="/docs/document-types/pagare" />

  <Card title="Transbank Voucher" href="/docs/document-types/voucher_transbank" />
</Cards>

Tax Documents [#tax-documents]

<Cards>
  <Card title="Professional Fee Receipt" href="/docs/document-types/boleta_honorarios" />

  <Card title="Electronic Invoice" href="/docs/document-types/factura" />

  <Card title="Invoice" href="/docs/document-types/invoice" />

  <Card title="Purchase Order" href="/docs/document-types/oc" />
</Cards>

Labor Documents [#labor-documents]

<Cards>
  <Card title="AFP Contributions Certificate" href="/docs/document-types/cotizaciones_afp" />

  <Card title="Curriculum Vitae" href="/docs/document-types/cv" />

  <Card title="Payroll" href="/docs/document-types/liquidacion_sueldo" />
</Cards>

Medical Documents [#medical-documents]

<Cards>
  <Card title="Medical Prescription" href="/docs/document-types/receta_medica" />
</Cards>

International Commerce Documents [#international-commerce-documents]

<Cards>
  <Card title="Bill of Lading" href="/docs/document-types/bl" />
</Cards>


---

# Invoice (https://docs.docutray.com/docs/document-types/invoice)


International services invoice with currency, amount, date, and issuer and recipient information.

**Document type code:** `invoice`

Response Structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "moneda": "USD",
      "fecha_pago": "2023-12-15T00:00:00Z",
      "invoice_id": "INV-2023-001234",
      "monto_total": 1250.50,
      "fecha_emisión": "2023-12-01T00:00:00Z",
      "tax_id_emisor": "12345678-9",
      "tax_id_receptor": "98765432-1",
      "nombre_emisor": "ABC Company Inc.",
      "nombre_receptor": "XYZ Client Ltd."
    }
  }
}
```

Main Fields [#main-fields]

| Field             | Type               | Description                                                                       |
| ----------------- | ------------------ | --------------------------------------------------------------------------------- |
| `moneda`          | String             | Currency in which the invoice is charged, as an ISO 4217 code                     |
| `fecha_pago`      | String (date-time) | The date when payment was made, if available                                      |
| `invoice_id`      | String             | The code or identification number of the invoice                                  |
| `monto_total`     | Number             | Total amount of the Invoice                                                       |
| `fecha_emisión`   | String (date-time) | The date when the Invoice was issued                                              |
| `tax_id_emisor`   | String             | Tax ID, RUT or fiscal identifier of the Invoice issuer, if available              |
| `tax_id_receptor` | String             | Tax ID, RUT or fiscal identifier of the Invoice recipient, if available           |
| `nombre_emisor`   | String             | Name or business name of the Invoice issuer, if available                         |
| `nombre_receptor` | String             | Name or business name of the Invoice recipient, if available                      |

Important considerations [#important-considerations]

* All listed fields are **required** for document processing
* Dates will be in ISO 8601 format (date-time)
* Currency will follow the ISO 4217 standard (e.g., USD, EUR, CLP)
* Amounts are numeric values without currency formatting
* It is an international billing document
* Used for international services and products
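
As a rough illustration of consuming these formats, the sketch below parses the ISO 8601 timestamps and renders the numeric amount with its currency code. The helper names are hypothetical, and the currency check only verifies the three-letter shape, not that the code is actually assigned in ISO 4217.

```python
import re
from datetime import datetime


def parse_iso_datetime(value: str) -> datetime:
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def looks_like_iso4217(code: str) -> bool:
    # Shape check only: exactly three uppercase letters.
    return re.fullmatch(r"[A-Z]{3}", code) is not None


extracted = {
    "moneda": "USD",
    "fecha_pago": "2023-12-15T00:00:00Z",
    "fecha_emisión": "2023-12-01T00:00:00Z",
    "monto_total": 1250.50,
}

issued = parse_iso_datetime(extracted["fecha_emisión"])
paid = parse_iso_datetime(extracted["fecha_pago"])
days_to_payment = (paid - issued).days

# Amounts arrive as plain numbers, so display formatting is up to the client.
label = f"{extracted['monto_total']:,.2f} {extracted['moneda']}"
print(days_to_payment, label)
```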


---

# Payroll (https://docs.docutray.com/docs/document-types/liquidacion_sueldo)


Detailed payslip with salary information, deductions, bonuses, and net payment calculations.

**Document type code:** `liquidacion_sueldo`

Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "empleado": "Carlos Mendoza Ruiz",
      "rut": "15.678.432-9",
      "cargo": "Software Developer",
      "empresa": "Innovate Tech S.A.",
      "periodo": "November 2023",
      "fecha_pago": "2023-11-30T00:00:00Z",
      "dias_trabajados": 22,
      "sueldo_base": 1200000,
      "haberes": [
        {
          "concepto": "Overtime Hours",
          "cantidad": 8,
          "valor_unitario": 15000,
          "total": 120000
        },
        {
          "concepto": "Performance Bonus",
          "cantidad": 1,
          "valor_unitario": 100000,
          "total": 100000
        }
      ],
      "descuentos": [
        {
          "concepto": "AFP Contribution",
          "porcentaje": 10,
          "total": 120000
        },
        {
          "concepto": "Health Insurance",
          "porcentaje": 7,
          "total": 84000
        },
        {
          "concepto": "Income Tax",
          "porcentaje": null,
          "total": 45000
        }
      ],
      "total_haberes": 1420000,
      "total_descuentos": 249000,
      "liquido_a_pagar": 1171000
    }
  }
}
```

Main fields [#main-fields]

| Field              | Type               | Description                     |
| ------------------ | ------------------ | ------------------------------- |
| `empleado`         | String             | Employee's full name            |
| `rut`              | String             | Employee's RUT (tax ID)         |
| `cargo`            | String             | Job position or title           |
| `empresa`          | String             | Company name                    |
| `periodo`          | String             | Payroll period                  |
| `fecha_pago`       | String (date-time) | Payment date                    |
| `dias_trabajados`  | Number             | Days worked in the period       |
| `sueldo_base`      | Number             | Base salary                     |
| `total_haberes`    | Number             | Total earnings (base + bonuses) |
| `total_descuentos` | Number             | Total deductions                |
| `liquido_a_pagar`  | Number             | Net amount to be paid           |

Earnings (haberes) fields [#earnings-haberes-fields]

| Field            | Type   | Description                              |
| ---------------- | ------ | ---------------------------------------- |
| `concepto`       | String | Earnings concept (overtime, bonus, etc.) |
| `cantidad`       | Number | Quantity (hours, units, etc.)            |
| `valor_unitario` | Number | Unit value                               |
| `total`          | Number | Total amount for this earning            |

Deductions (descuentos) fields [#deductions-descuentos-fields]

| Field        | Type              | Description                                |
| ------------ | ----------------- | ------------------------------------------ |
| `concepto`   | String            | Deduction concept (AFP, health, tax, etc.) |
| `porcentaje` | Number (nullable) | Percentage applied, if applicable          |
| `total`      | Number            | Total deducted amount                      |

Important considerations [#important-considerations]

* Official payroll document for employment in Chile
* Net pay = Total earnings - Total deductions
* AFP and health contributions are mandatory in Chile (typically 10% and 7%)
* Income tax varies based on salary brackets
* Days worked affects proportional salary calculations
* Essential document for employment verification and loan applications
* All amounts are in Chilean pesos (CLP)
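
The arithmetic relationships above lend themselves to a quick consistency check after extraction. This is a minimal sketch, assuming that `total_haberes` equals `sueldo_base` plus the `haberes` lines and that `total_descuentos` equals the sum of the `descuentos` lines; `payslip_issues` and its one-peso tolerance are hypothetical.

```python
def payslip_issues(extracted: dict, tolerance: int = 1) -> list[str]:
    """Return human-readable inconsistencies found in a payslip payload."""
    issues = []

    # Total earnings = base salary + every earnings line.
    earnings = extracted["sueldo_base"] + sum(item["total"] for item in extracted["haberes"])
    if abs(earnings - extracted["total_haberes"]) > tolerance:
        issues.append(f"total_haberes is {extracted['total_haberes']}, lines add up to {earnings}")

    # Total deductions = sum of every deduction line.
    deductions = sum(item["total"] for item in extracted["descuentos"])
    if abs(deductions - extracted["total_descuentos"]) > tolerance:
        issues.append(f"total_descuentos is {extracted['total_descuentos']}, lines add up to {deductions}")

    # Net pay = total earnings - total deductions.
    net = extracted["total_haberes"] - extracted["total_descuentos"]
    if abs(net - extracted["liquido_a_pagar"]) > tolerance:
        issues.append(f"liquido_a_pagar is {extracted['liquido_a_pagar']}, expected {net}")

    return issues


sample = {
    "sueldo_base": 1200000,
    "haberes": [{"total": 120000}, {"total": 100000}],
    "descuentos": [{"total": 120000}, {"total": 84000}, {"total": 45000}],
    "total_haberes": 1420000,
    "total_descuentos": 249000,
    "liquido_a_pagar": 1171000,
}
print(payslip_issues(sample))
```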


---

# Purchase Order (https://docs.docutray.com/docs/document-types/oc)


Corporate purchase order with supplier information, requested products/services, and delivery details.

**Document type code:** `oc`

Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "numero_orden": "OC-2023-001245",
      "fecha_emision": "2023-11-10T00:00:00Z",
      "empresa_compradora": "Tech Solutions S.A.",
      "rut_comprador": "98.765.432-1",
      "proveedor": "Office Supplies Corp",
      "rut_proveedor": "12.345.678-9",
      "contacto_comprador": "María González - Procurement",
      "telefono_comprador": "+56 2 2345 6789",
      "direccion_entrega": "Av. Providencia 1234, Santiago, Chile",
      "fecha_entrega_solicitada": "2023-11-20T00:00:00Z",
      "detalle": [
        {
          "codigo_producto": "LAP001",
          "descripcion": "Business Laptop HP ProBook 450",
          "cantidad": 5,
          "precio_unitario": 650000,
          "precio_total": 3250000
        },
        {
          "codigo_producto": "MOU002",
          "descripcion": "Wireless Mouse Logitech MX Master 3",
          "cantidad": 5,
          "precio_unitario": 85000,
          "precio_total": 425000
        }
      ],
      "subtotal": 3675000,
      "iva": 698250,
      "total": 4373250,
      "condiciones_pago": "30 days net",
      "observaciones": "Delivery required during business hours. Contact procurement department upon arrival."
    }
  }
}
```

Main fields [#main-fields]

| Field                      | Type               | Description                             |
| -------------------------- | ------------------ | --------------------------------------- |
| `numero_orden`             | String             | Purchase order number                   |
| `fecha_emision`            | String (date-time) | Purchase order issue date               |
| `empresa_compradora`       | String             | Purchasing company name                 |
| `rut_comprador`            | String             | Purchaser's RUT (tax ID)                |
| `proveedor`                | String             | Supplier name                           |
| `rut_proveedor`            | String             | Supplier's RUT (tax ID)                 |
| `contacto_comprador`       | String             | Buyer contact person                    |
| `telefono_comprador`       | String             | Buyer contact phone                     |
| `direccion_entrega`        | String             | Delivery address                        |
| `fecha_entrega_solicitada` | String (date-time) | Requested delivery date                 |
| `subtotal`                 | Number             | Subtotal before taxes                   |
| `iva`                      | Number             | VAT amount                              |
| `total`                    | Number             | Total amount including taxes            |
| `condiciones_pago`         | String             | Payment terms                           |
| `observaciones`            | String             | Additional observations or instructions |

Detail fields [#detail-fields]

| Field             | Type   | Description                    |
| ----------------- | ------ | ------------------------------ |
| `codigo_producto` | String | Product or service code        |
| `descripcion`     | String | Product or service description |
| `cantidad`        | Number | Requested quantity             |
| `precio_unitario` | Number | Unit price                     |
| `precio_total`    | Number | Total price for the line       |

Important considerations [#important-considerations]

* It is a formal procurement document between companies
* Purchase order number is unique and used for tracking
* Serves as authorization for the supplier to deliver goods/services
* Essential for inventory control and accounts payable processes
* Payment terms define when payment is due after delivery
* Delivery address may differ from company's main address
* Total amount = Subtotal + VAT (typically 19% in Chile)
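
The last relationship can be verified with exact decimal arithmetic. The sketch below assumes the 19% rate mentioned above and half-up rounding to whole pesos; both the rounding convention and the helper names are assumptions, not documented DocuTray behavior.

```python
from decimal import ROUND_HALF_UP, Decimal

VAT_RATE = Decimal("0.19")  # typical Chilean rate; adjust if the document uses another


def expected_vat(subtotal: int, rate: Decimal = VAT_RATE) -> int:
    """VAT on the subtotal, rounded half-up to whole pesos (assumed convention)."""
    vat = (Decimal(subtotal) * rate).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(vat)


def totals_are_consistent(extracted: dict, rate: Decimal = VAT_RATE) -> bool:
    """Check iva against the rate and total against subtotal + iva."""
    return (
        extracted["iva"] == expected_vat(extracted["subtotal"], rate)
        and extracted["total"] == extracted["subtotal"] + extracted["iva"]
    )


order = {"subtotal": 3675000, "iva": 698250, "total": 4373250}
print(expected_vat(order["subtotal"]), totals_are_consistent(order))
```

Using `Decimal` avoids binary floating-point rounding surprises when the rate is applied to large amounts.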


---

# Promissory Note (https://docs.docutray.com/docs/document-types/pagare)


Financial promissory note with debtor information, amount, payment terms, and maturity date.

**Document type code:** `pagare`

Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "numero_pagare": "PN-2023-000789",
      "fecha_emision": "2023-11-01T00:00:00Z",
      "lugar_emision": "Santiago, Chile",
      "deudor": "Carlos Mendoza Ruiz",
      "rut_deudor": "15.678.432-9",
      "beneficiario": "Banco de Chile",
      "rut_beneficiario": "97.004.000-5",
      "monto": 5000000,
      "moneda": "CLP",
      "fecha_vencimiento": "2024-05-01T00:00:00Z",
      "tasa_interes": 2.5,
      "tipo_interes": "monthly",
      "forma_pago": "Monthly installments of CLP 450,000",
      "lugar_pago": "Any branch of Banco de Chile",
      "avalista": "María González López",
      "rut_avalista": "12.345.678-9",
      "clausulas_especiales": "In case of default, the debtor agrees to pay legal collection costs and attorney fees."
    }
  }
}
```

Main fields [#main-fields]

| Field                  | Type               | Description                           |
| ---------------------- | ------------------ | ------------------------------------- |
| `numero_pagare`        | String             | Promissory note number                |
| `fecha_emision`        | String (date-time) | Issue date                            |
| `lugar_emision`        | String             | Place where the note was issued       |
| `deudor`               | String             | Debtor's full name                    |
| `rut_deudor`           | String             | Debtor's RUT (tax ID)                 |
| `beneficiario`         | String             | Beneficiary's name (creditor)         |
| `rut_beneficiario`     | String             | Beneficiary's RUT (tax ID)            |
| `monto`                | Number             | Principal amount                      |
| `moneda`               | String             | Currency (CLP, USD, etc.)             |
| `fecha_vencimiento`    | String (date-time) | Maturity date                         |
| `tasa_interes`         | Number             | Interest rate percentage              |
| `tipo_interes`         | String             | Interest type (monthly, annual, etc.) |
| `forma_pago`           | String             | Payment method description            |
| `lugar_pago`           | String             | Payment location                      |
| `avalista`             | String             | Guarantor's name (if applicable)      |
| `rut_avalista`         | String             | Guarantor's RUT (if applicable)       |
| `clausulas_especiales` | String             | Special clauses or conditions         |

Important considerations [#important-considerations]

* It is a legally binding financial document
* The debtor commits to pay the specified amount by the maturity date
* Interest rate and payment terms must be clearly specified
* Guarantor provides additional security for the loan
* Used for personal and commercial loans
* Essential for legal collection processes if payment defaults occur
* Currency should be specified to avoid confusion in international transactions


---

# Medical Prescription (https://docs.docutray.com/docs/document-types/receta_medica)


Medical prescription with doctor information, patient details, and prescribed medications with dosage instructions.

**Document type code:** `receta_medica`

Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "doctor": "Dr. Ana María Fernández",
      "especialidad": "Internal Medicine",
      "rut_doctor": "12.345.678-9",
      "registro_medico": "RM-12345",
      "paciente": "Juan Carlos Pérez",
      "rut_paciente": "15.678.432-K",
      "fecha_prescripcion": "2023-11-15T00:00:00Z",
      "diagnostico": "Hypertension and Type 2 Diabetes",
      "medicamentos": [
        {
          "nombre": "Losartan",
          "concentracion": "50mg",
          "forma_farmaceutica": "Tablets",
          "cantidad": 30,
          "posologia": "1 tablet daily, preferably in the morning",
          "duracion_tratamiento": "30 days"
        },
        {
          "nombre": "Metformin",
          "concentracion": "850mg",
          "forma_farmaceutica": "Tablets",
          "cantidad": 60,
          "posologia": "1 tablet twice daily with meals",
          "duracion_tratamiento": "30 days"
        }
      ],
      "indicaciones_generales": "Monitor blood pressure and glucose levels weekly. Return for follow-up in 30 days.",
      "hospital_clinica": "Hospital Clínico Universidad de Chile"
    }
  }
}
```

Main fields [#main-fields]

| Field                    | Type               | Description                          |
| ------------------------ | ------------------ | ------------------------------------ |
| `doctor`                 | String             | Prescribing doctor's full name       |
| `especialidad`           | String             | Doctor's medical specialty           |
| `rut_doctor`             | String             | Doctor's RUT (tax ID)                |
| `registro_medico`        | String             | Doctor's medical license number      |
| `paciente`               | String             | Patient's full name                  |
| `rut_paciente`           | String             | Patient's RUT (tax ID)               |
| `fecha_prescripcion`     | String (date-time) | Prescription date                    |
| `diagnostico`            | String             | Medical diagnosis                    |
| `indicaciones_generales` | String             | General instructions for the patient |
| `hospital_clinica`       | String             | Hospital or clinic name              |

Medication fields [#medication-fields]

| Field                  | Type   | Description                                          |
| ---------------------- | ------ | ---------------------------------------------------- |
| `nombre`               | String | Medication name (generic or brand)                   |
| `concentracion`        | String | Medication concentration/strength                    |
| `forma_farmaceutica`   | String | Pharmaceutical form (tablets, capsules, syrup, etc.) |
| `cantidad`             | Number | Quantity prescribed                                  |
| `posologia`            | String | Dosage instructions                                  |
| `duracion_tratamiento` | String | Treatment duration                                   |

Important considerations [#important-considerations]

* It is an official medical document required for controlled medication dispensing
* Doctor must have valid medical license to prescribe
* Patient identification is essential for pharmacy dispensing
* Dosage instructions must be followed exactly as prescribed
* Some medications may require special handling or storage
* Used for insurance reimbursement and medication tracking
* Essential for patient safety and treatment compliance


---

# Transbank Voucher (https://docs.docutray.com/docs/document-types/voucher_transbank)


Transbank transaction voucher with card payment details, merchant information, and transaction amounts.

**Document type code:** `voucher_transbank`

Response structure [#response-structure]

```json
{
  "data": {
    "extractedData": {
      "numero_transaccion": "123456789012",
      "fecha_hora": "2023-11-15T14:32:15Z",
      "comercio": "SuperMarket Plaza S.A.",
      "rut_comercio": "98.765.432-1",
      "terminal": "12345678",
      "numero_tarjeta": "XXXX-XXXX-XXXX-1234",
      "tipo_tarjeta": "VISA CREDIT",
      "banco_emisor": "Banco de Chile",
      "codigo_autorizacion": "AB123456",
      "monto": 45750,
      "moneda": "CLP",
      "tipo_transaccion": "SALE",
      "cuotas": 1,
      "plan_cuotas": "Without Interest",
      "estado": "APPROVED",
      "codigo_respuesta": "00",
      "descripcion_respuesta": "TRANSACTION APPROVED",
      "numero_referencia": "987654321",
      "numero_lote": "000123"
    }
  }
}
```

Main fields [#main-fields]

| Field                   | Type               | Description                                   |
| ----------------------- | ------------------ | --------------------------------------------- |
| `numero_transaccion`    | String             | Unique transaction number                     |
| `fecha_hora`            | String (date-time) | Transaction date and time                     |
| `comercio`              | String             | Merchant name                                 |
| `rut_comercio`          | String             | Merchant's RUT (tax ID)                       |
| `terminal`              | String             | Terminal identification number                |
| `numero_tarjeta`        | String             | Masked card number                            |
| `tipo_tarjeta`          | String             | Card type (VISA, MASTERCARD, etc.)            |
| `banco_emisor`          | String             | Card issuing bank                             |
| `codigo_autorizacion`   | String             | Transaction authorization code                |
| `monto`                 | Number             | Transaction amount                            |
| `moneda`                | String             | Currency (typically CLP)                      |
| `tipo_transaccion`      | String             | Transaction type (SALE, REFUND, etc.)         |
| `cuotas`                | Number             | Number of installments                        |
| `plan_cuotas`           | String             | Installment plan description                  |
| `estado`                | String             | Transaction status (APPROVED, DECLINED, etc.) |
| `codigo_respuesta`      | String             | Response code from payment processor          |
| `descripcion_respuesta` | String             | Response description                          |
| `numero_referencia`     | String             | Reference number for tracking                 |
| `numero_lote`           | String             | Batch number for settlement                   |

Important considerations [#important-considerations]

* Official payment voucher from Chile's main payment processor
* Authorization code confirms transaction approval
* Used for reconciliation and accounting purposes
* Card number is masked for security (only last 4 digits visible)
* Response code "00" typically indicates a successful transaction
* Essential for refunds and dispute resolution
* Batch number groups transactions for daily settlement
* Different transaction types may have different data requirements
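
For reconciliation, extracted vouchers can be filtered and grouped on the client side. The following is a sketch under stated assumptions: the helper names are hypothetical, a voucher counts as approved only when `estado` is `APPROVED` and `codigo_respuesta` is `"00"`, and only `SALE` transactions are summed.

```python
import re
from collections import defaultdict
from typing import Optional


def is_approved(voucher: dict) -> bool:
    """Treat a voucher as approved when both status and response code agree."""
    return voucher.get("estado") == "APPROVED" and voucher.get("codigo_respuesta") == "00"


def last_four_digits(masked_card: str) -> Optional[str]:
    """Return the visible trailing digits of a masked card number, if present."""
    digits = re.sub(r"\D", "", masked_card)
    return digits[-4:] if len(digits) >= 4 else None


def approved_sales_by_batch(vouchers: list[dict]) -> dict[str, int]:
    """Sum approved SALE amounts per `numero_lote` (refunds are not netted here)."""
    totals: dict[str, int] = defaultdict(int)
    for voucher in vouchers:
        if is_approved(voucher) and voucher.get("tipo_transaccion") == "SALE":
            totals[voucher["numero_lote"]] += voucher["monto"]
    return dict(totals)


vouchers = [
    {"estado": "APPROVED", "codigo_respuesta": "00", "tipo_transaccion": "SALE",
     "numero_lote": "000123", "monto": 45750},
    {"estado": "APPROVED", "codigo_respuesta": "00", "tipo_transaccion": "SALE",
     "numero_lote": "000123", "monto": 1000},
    {"estado": "DECLINED", "codigo_respuesta": "05", "tipo_transaccion": "SALE",
     "numero_lote": "000123", "monto": 500},
]
print(approved_sales_by_batch(vouchers), last_four_digits("XXXX-XXXX-XXXX-1234"))
```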


---

# Create Document Type (https://docs.docutray.com/docs/guides/crear-tipo-documento)


Create Custom Document Type [#create-custom-document-type]

This guide will help you create a custom document type using Docutray's AI-powered creation wizard.

Prerequisites [#prerequisites]

Before you begin, make sure you have:

* An active Docutray account
* At least one sample document of the type you want to create (PDF, JPG, PNG, etc.)
* A clear description of the data you want to extract from the document

Step 1: Access the Creation Wizard [#step-1-access-the-creation-wizard]

1. Log in to your Docutray account at [https://app.docutray.com](https://app.docutray.com)

2. In the sidebar menu, navigate to **Document Types**

3. Click the **New Document Type** button

<img alt="New document type button" src={__img0} placeholder="blur" />

Step 2: Upload Sample Documents [#step-2-upload-sample-documents]

The wizard will show you an upload zone where you can upload your sample documents.

Supported Formats [#supported-formats]

* **PDF**: PDF documents
* **Images**: JPG, PNG, GIF, BMP, WebP

Limits [#limits]

* **Maximum size**: 10MB per file
* **Maximum quantity**: 5 files at a time
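
If sample files are collected by a script before being uploaded through the wizard, the formats and limits above can be checked in advance. This is a minimal sketch: `upload_problems` is a hypothetical helper, "10MB" is read as 10 × 1024 × 1024 bytes, and accepting `.jpeg` alongside `.jpg` is an assumption.

```python
from pathlib import PurePath

MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: "10MB" interpreted as 10 MiB
MAX_FILES = 5
ALLOWED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}


def upload_problems(files: list[tuple[str, int]]) -> list[str]:
    """Check (file name, size in bytes) pairs against the documented limits."""
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"{len(files)} files selected; the limit is {MAX_FILES} at a time")
    for name, size in files:
        if PurePath(name).suffix.lower() not in ALLOWED_EXTENSIONS:
            problems.append(f"{name}: unsupported format")
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: larger than 10 MB")
    return problems


print(upload_problems([("invoice.pdf", 250_000), ("scan.tiff", 11 * 1024 * 1024)]))
```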

How to Upload [#how-to-upload]

You have two options:

1. **Drag and drop**: Drag files directly to the upload zone
2. **Select files**: Click the upload zone to open the file selector

<img alt="Document upload zone" src={__img1} placeholder="blur" />

<Callout type="info">
  Tip: Upload multiple examples of the same document type to get better schema generation results.
</Callout>

Step 3: Describe the Data to Extract [#step-3-describe-the-data-to-extract]

Once at least one document is uploaded, a configuration panel with a text field will appear.

Describe the Fields [#describe-the-fields]

In the description field, clearly indicate what data you want to extract from the document. Be specific about:

* **Field names** you want to obtain
* **Expected data types** (text, numbers, dates, lists)
* **Approximate location** in the document if relevant

Description Example [#description-example]

```
Extract the following data from the invoice:
- Invoice number
- Issue date
- Issuer tax ID
- Issuer company name
- Recipient tax ID
- Net total
- Tax (e.g., VAT)
- Total amount due
- List of items with: quantity, description, unit price, and total
```

<img alt="Configuration panel" src={__img2} placeholder="blur" />

Step 4: Generate the Schema [#step-4-generate-the-schema]

1. Click the **Generate Schema with AI** button

2. The system will analyze your documents and automatically generate:
   * A JSON schema with detected fields
   * A suggested name for the document type
   * A description of the document type

3. While generating, you'll see progress indicators:
   * Analyzing documents...
   * Generating schema...
   * Extracting test data...

<img alt="Generation progress" src={__img3} placeholder="blur" />

<Callout type="warning">
  Generation can take 10-30 seconds depending on document complexity.
</Callout>

Step 5: Review and Edit the Schema [#step-5-review-and-edit-the-schema]

Once generated, you can view and edit the schema in an interactive table.

Edit Fields [#edit-fields]

For each field you can modify:

* **Name**: The field identifier name
* **Type**: Text, Number, Boolean, Array, or Object
* **Description**: A field description
* **Required**: Whether the field is mandatory

Available Field Types [#available-field-types]

| Type    | Icon   | Use               |
| ------- | ------ | ----------------- |
| Text    | `A`    | Text strings      |
| Number  | `#`    | Numeric values    |
| Boolean | Toggle | True/False        |
| Array   | `[ ]`  | Arrays of values  |
| Object  | `{ }`  | Nested structures |
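
To make the field types concrete, the sketch below maps each type in the table to the Python types a decoded JSON response would contain and reports mismatches. It is illustrative only: `type_mismatches` and the `field_types` mapping are hypothetical, and the actual schema format generated by the wizard is not shown here.

```python
from typing import Any

# Editor field type -> Python types produced when the JSON response is decoded.
PYTHON_TYPES = {
    "Text": (str,),
    "Number": (int, float),
    "Boolean": (bool,),
    "Array": (list,),
    "Object": (dict,),
}


def type_mismatches(record: dict[str, Any], field_types: dict[str, str]) -> list[str]:
    """List fields whose extracted value does not match the declared type."""
    mismatches = []
    for field, type_name in field_types.items():
        if field not in record or record[field] is None:
            continue  # presence is a separate concern (the "Required" flag)
        value = record[field]
        # bool is a subclass of int in Python, so exclude it from Number explicitly.
        if type_name == "Number" and isinstance(value, bool):
            matches = False
        else:
            matches = isinstance(value, PYTHON_TYPES[type_name])
        if not matches:
            mismatches.append(f"{field}: expected {type_name}, got {type(value).__name__}")
    return mismatches


field_types = {"folio": "Number", "nombre_emisor": "Text", "detalle": "Array"}
record = {"folio": "123456", "nombre_emisor": "Commercial Company S.A.", "detalle": []}
print(type_mismatches(record, field_types))
```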

Add or Remove Fields [#add-or-remove-fields]

* **Add**: Use the "Add Field" button at the bottom of the table
* **Remove**: Use the trash icon on each row

<img alt="Schema editor" src={__img4} placeholder="blur" />

Step 6: Test the Extraction [#step-6-test-the-extraction]

The system automatically runs an extraction test after generating the schema.

View Results [#view-results]

1. Switch to the **Results** tab

2. You'll see the extracted data from the sample document in structured format

3. You can toggle between tree view and JSON to review the data

If Results Are Incorrect [#if-results-are-incorrect]

1. Go back to the **Configuration** tab
2. Adjust the schema as needed
3. Click **Regenerate** to test again

<img alt="Extraction results" src={__img5} placeholder="blur" />

Step 7: Configure Metadata [#step-7-configure-metadata]

Before creating the document type, configure its information:

Name (Required) [#name-required]

* Enter a descriptive name for the document type
* Example: "Electronic Invoice", "Fee Receipt"

Description (Optional) [#description-optional]

* Add a description to help identify the document's purpose

Save as Draft [#save-as-draft]

* Check this option if you want to save the type without activating it immediately
* Drafts are not available for API use until activated

<img alt="Metadata form" src={__img6} placeholder="blur" />

Step 8: Create the Document Type [#step-8-create-the-document-type]

1. Review that all data is correct

2. Click the **Create Document** button in the top right corner

3. The system will save the document type and redirect you to its detail page

<img alt="Create document button" src={__img7} placeholder="blur" />

<Callout type="success">
  Congratulations! Your new document type is ready to use.
</Callout>

Error Handling [#error-handling]

Common Errors [#common-errors]

| Error              | Solution                                     |
| ------------------ | -------------------------------------------- |
| File too large     | Reduce file size to under 10 MB              |
| Unsupported format | Use PDF, JPG, PNG, GIF, BMP, or WebP         |
| Generation error   | Check your connection and retry              |
| Extraction error   | Adjust the description and regenerate schema |
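The first two errors in the table can often be caught before uploading. The sketch below is one way to do that locally; the `preflight` helper is hypothetical, the limit is assumed to mean 10 MiB, and accepting `.jpeg` alongside `.jpg` is also an assumption.

```python
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024  # assumption: the 10 MB limit is read as 10 MiB
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}


def preflight(filename, size_bytes):
    """Return None if the file looks acceptable, else the matching error name."""
    if Path(filename).suffix.lower() not in SUPPORTED_EXTENSIONS:
        return "Unsupported format"
    if size_bytes >= MAX_BYTES:
        return "File too large"
    return None


print(preflight("invoice.PDF", 2_000_000))        # None
print(preflight("scan.tiff", 2_000_000))          # Unsupported format
print(preflight("photo.png", 11 * 1024 * 1024))   # File too large
```

The check only looks at the file extension and size, so a mislabeled file can still be rejected by the service.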

Retry Operations [#retry-operations]

If an error occurs during generation or extraction:

1. An alert will appear with the error message
2. Use the **Retry** button to re-execute the operation
3. If the error persists, try with a more detailed description

Next Steps [#next-steps]

Once your document type is created, you can:

* **Use the API**: Convert documents using the `/api/convert` endpoint
* **Create Flows**: Automate processing with DocFlows
* **Configure Webhooks**: Receive notifications when documents are processed

<Cards>
  <Card title="API Documentation" href="/docs/api/convert/post" />

  <Card title="Configure Webhooks" href="/docs/webhooks" />
</Cards>

Keyboard Shortcuts [#keyboard-shortcuts]

For faster navigation:

| Action                        | Shortcut                          |
| ----------------------------- | --------------------------------- |
| Open file selector            | `Enter` or `Space` on upload zone |
| Navigate between fields       | `Tab`                             |
| Expand/collapse nested fields | `Enter` on expand button          |


---

# User Guides (https://docs.docutray.com/docs/guides)


User Guides [#user-guides]

Find detailed guides to get the most out of Docutray's features.

Document Management [#document-management]

<Cards>
  <Card title="Create Document Type" href="/docs/guides/crear-tipo-documento" description="Learn to create custom document types using the AI wizard" />
</Cards>

Configuration [#configuration]

<Cards>
  <Card title="Getting Started" href="/docs/getting-started" description="First steps: create account and configure API Key" />

  <Card title="Configure Webhooks" href="/docs/webhooks/configuracion" description="Receive automatic processing notifications" />
</Cards>


---

# Client (https://docs.docutray.com/docs/node-sdk/client)


DocuTray [#docutray]

The main client class for interacting with the DocuTray API. Provides access to all API resources through typed properties.

```ts
import DocuTray from 'docutray';

// Using environment variable (DOCUTRAY_API_KEY)
const client = new DocuTray();
```

```ts
// Explicit API key
const client = new DocuTray({ apiKey: 'dt_my-api-key' });
```

```ts
// Custom configuration
const client = new DocuTray({
  apiKey: 'dt_my-api-key',
  timeout: 30_000,
  maxRetries: 3,
});
```

Resources [#resources]

| Property         | Type                                                         | Description                        |
| ---------------- | ------------------------------------------------------------ | ---------------------------------- |
| `convert`        | [`Convert`](/docs/node-sdk/resources/convert)                | Document conversion operations     |
| `identify`       | [`Identify`](/docs/node-sdk/resources/identify)              | Document identification operations |
| `documentTypes`  | [`DocumentTypes`](/docs/node-sdk/resources/document-types)   | Document type catalog              |
| `steps`          | [`Steps`](/docs/node-sdk/resources/steps)                    | Processing step execution          |
| `knowledgeBases` | [`KnowledgeBases`](/docs/node-sdk/resources/knowledge-bases) | Knowledge base management          |

ClientOptions [#clientoptions]

Configuration options for the DocuTray client.

<AutoTypeTable path="../../vendor/docutray-node/src/core/types.ts" name="ClientOptions" />

RequestOptions [#requestoptions]

Per-request options that override client-level defaults.

<AutoTypeTable path="../../vendor/docutray-node/src/core/types.ts" name="RequestOptions" />

RetryConfig [#retryconfig]

Configuration for the exponential backoff retry strategy.

<AutoTypeTable path="../../vendor/docutray-node/src/core/types.ts" name="RetryConfig" />

File Input Types [#file-input-types]

FileInput [#fileinput]

Accepted file input types for document uploads: `Blob | Buffer | ArrayBuffer | FileWithMetadata`.

FileWithMetadata [#filewithmetadata]

A file with explicit filename and optional content type.

<AutoTypeTable path="../../vendor/docutray-node/src/core/types.ts" name="FileWithMetadata" />


---

# Errors (https://docs.docutray.com/docs/node-sdk/errors)


Error Hierarchy [#error-hierarchy]

All errors thrown by the SDK extend `DocuTrayError`. API errors include HTTP status code information and response details.

```
DocuTrayError
├── APIConnectionError
│   └── APITimeoutError
└── APIError
    ├── BadRequestError (400)
    ├── AuthenticationError (401)
    ├── PermissionDeniedError (403)
    ├── NotFoundError (404)
    ├── ConflictError (409)
    ├── UnprocessableEntityError (422)
    ├── RateLimitError (429)
    └── InternalServerError (5xx)
```

Usage [#usage]

```ts
import DocuTray, {
  DocuTrayError,
  APIError,
  AuthenticationError,
  RateLimitError,
} from 'docutray';

const client = new DocuTray();

try {
  await client.convert.run({ documentTypeCode: 'invoice', url: '...' });
} catch (err) {
  if (err instanceof RateLimitError) {
    console.log(`Rate limited. Retry after ${err.retryAfter}s`);
  } else if (err instanceof AuthenticationError) {
    console.log('Invalid API key');
  } else if (err instanceof APIError) {
    console.log(`API error ${err.statusCode}: ${err.message}`);
    console.log('Request ID:', err.requestId);
  } else if (err instanceof DocuTrayError) {
    console.log('SDK error:', err.message);
  }
}
```

DocuTrayError [#docutrayerror]

Base error class for all SDK errors.

| Property  | Type     | Description      |
| --------- | -------- | ---------------- |
| `message` | `string` | Error message    |
| `name`    | `string` | Error class name |

APIConnectionError [#apiconnectionerror]

Thrown when the SDK cannot establish a connection to the API.

| Property | Type      | Description                                             |
| -------- | --------- | ------------------------------------------------------- |
| `cause`  | `unknown` | The underlying error that caused the connection failure |

APITimeoutError [#apitimeouterror]

Thrown when a request exceeds the configured timeout or is aborted. Extends `APIConnectionError`.

APIError [#apierror]

Thrown when the API returns a non-success HTTP status code.

| Property     | Type                  | Description                     |
| ------------ | --------------------- | ------------------------------- |
| `statusCode` | `number`              | The HTTP status code            |
| `requestId`  | `string \| undefined` | The `x-request-id` header value |
| `body`       | `unknown`             | The parsed response body        |
| `headers`    | `Headers`             | The raw response headers        |

Status-Specific Errors [#status-specific-errors]

| Error Class                | HTTP Status | Description                |
| -------------------------- | ----------- | -------------------------- |
| `BadRequestError`          | 400         | Invalid request parameters |
| `AuthenticationError`      | 401         | Invalid or missing API key |
| `PermissionDeniedError`    | 403         | Insufficient permissions   |
| `NotFoundError`            | 404         | Resource not found         |
| `ConflictError`            | 409         | Resource conflict          |
| `UnprocessableEntityError` | 422         | Validation errors          |
| `RateLimitError`           | 429         | Rate limit exceeded        |
| `InternalServerError`      | 5xx         | Server-side errors         |

RateLimitError [#ratelimiterror]

Includes additional rate-limit metadata extracted from response headers.

| Property     | Type                  | Description                                |
| ------------ | --------------------- | ------------------------------------------ |
| `retryAfter` | `number \| undefined` | Seconds to wait before retrying            |
| `limitType`  | `string \| undefined` | Type of rate limit hit                     |
| `limit`      | `number \| undefined` | Maximum requests allowed in current window |
| `remaining`  | `number \| undefined` | Requests remaining in current window       |
| `resetTime`  | `Date \| undefined`   | When the rate limit window resets          |


---

# Node.js SDK (https://docs.docutray.com/docs/node-sdk)


This is the API reference for the DocuTray Node.js SDK.

Installation [#installation]

```bash
npm install docutray
```

Quick Start [#quick-start]

```ts
import DocuTray from 'docutray';

const client = new DocuTray(); // uses DOCUTRAY_API_KEY env var

const result = await client.convert.run({
  documentTypeCode: 'invoice',
  url: 'https://example.com/invoice.pdf',
});
```

Modules [#modules]

Client [#client]

The main entry point for the SDK:

* [`DocuTray`](/docs/node-sdk/client) - Client class with resource properties

Errors [#errors]

Error handling classes:

* [Error Hierarchy](/docs/node-sdk/errors)

Resources [#resources]

API resource classes:

* [Convert](/docs/node-sdk/resources/convert) - Document conversion
* [Identify](/docs/node-sdk/resources/identify) - Document identification
* [DocumentTypes](/docs/node-sdk/resources/document-types) - Document type catalog
* [Steps](/docs/node-sdk/resources/steps) - Step execution
* [KnowledgeBases](/docs/node-sdk/resources/knowledge-bases) - Knowledge base operations

Types [#types]

Response and model types:

* [Convert Types](/docs/node-sdk/types/convert)
* [Identify Types](/docs/node-sdk/types/identify)
* [Document Type Types](/docs/node-sdk/types/document-type)
* [Step Types](/docs/node-sdk/types/step)
* [Knowledge Base Types](/docs/node-sdk/types/knowledge-base)
* [Shared Types](/docs/node-sdk/types/shared)


---

# Client (https://docs.docutray.com/docs/python-sdk/client)


The main client classes for interacting with the DocuTray API.

Client [#client]

Synchronous client for the DocuTray API.

**Example:**

```python
from pathlib import Path

from docutray import Client

client = Client(api_key="sk_test_123")
# Convert a document
result = client.convert.run(
    file=Path("invoice.pdf"),
    document_type_code="invoice",
)
print(result.data)
client.close()

# Or use a context manager, which closes the client automatically
with Client(api_key="sk_test_123") as client:
    result = client.identify.run(file=Path("document.pdf"))
    print(f"Type: {result.document_type.name}")
```

**Properties:**

* `convert`
  : Document conversion operations.

* `document_types`
  : Document type catalog operations.

* `identify`
  : Document identification operations.

* `knowledge_bases`
  : Knowledge base operations for semantic search.

* `steps`
  : Step execution operations.

AsyncClient [#asyncclient]

Asynchronous client for the DocuTray API.

**Example:**

```python
import asyncio
from pathlib import Path
from docutray import AsyncClient

async def main() -> None:
    async with AsyncClient(api_key="sk_test_123") as client:
        result = await client.convert.run(
            file=Path("invoice.pdf"),
            document_type_code="invoice",
        )
        print(result.data)

asyncio.run(main())
```

**Properties:**

* `convert`
  : Document conversion operations (async).

* `document_types`
  : Document type catalog operations (async).

* `identify`
  : Document identification operations (async).

* `knowledge_bases`
  : Knowledge base operations for semantic search (async).

* `steps`
  : Step execution operations (async).


---

# Exceptions (https://docs.docutray.com/docs/python-sdk/exceptions)


Exception classes for error handling in the DocuTray SDK.

Exception Hierarchy [#exception-hierarchy]

```
DocuTrayError (base)
├── APIConnectionError (network errors)
│   └── APITimeoutError (request timeout)
└── APIError (HTTP errors)
    ├── BadRequestError (400)
    ├── AuthenticationError (401)
    ├── PermissionDeniedError (403)
    ├── NotFoundError (404)
    ├── ConflictError (409)
    ├── UnprocessableEntityError (422)
    ├── RateLimitError (429)
    └── InternalServerError (5xx)
```
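Because the exceptions form a tree, the order of `except` clauses matters: a subclass must be caught before its parent. The sketch below uses local stand-in classes that mirror part of the hierarchy so it runs without the SDK installed; in real code you would import the SDK's own classes instead. The `describe` helper is hypothetical.

```python
# Stand-in classes mirroring part of the hierarchy above (not the SDK's own classes).
class DocuTrayError(Exception): ...
class APIConnectionError(DocuTrayError): ...
class APITimeoutError(APIConnectionError): ...
class APIError(DocuTrayError): ...
class RateLimitError(APIError): ...


def describe(exc):
    """Classify an exception, checking the most specific classes first."""
    try:
        raise exc
    except APITimeoutError:
        return "timeout"
    except APIConnectionError:
        return "connection"
    except RateLimitError:
        return "rate limited"
    except APIError:
        return "api error"
    except DocuTrayError:
        return "sdk error"


print(describe(APITimeoutError("slow")))  # timeout
print(describe(RateLimitError("429")))    # rate limited
```

If `APIConnectionError` were listed before `APITimeoutError`, the timeout branch would never run, since every timeout is also a connection error.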

DocuTrayError [#docutrayerror]

Base exception for all DocuTray SDK errors.

**Arguments:**

* `message`: The error message.

APIConnectionError [#apiconnectionerror]

Raised when the SDK cannot connect to the API server.

This includes network errors, DNS resolution failures, and other
connection-level problems.

**Arguments:**

* `message`: The error message.
* `should_retry`: Whether this error should be retried.

APITimeoutError [#apitimeouterror]

Raised when a request times out.

**Arguments:**

* `message`: The error message.

APIError [#apierror]

Base class for errors returned by the API.

All HTTP error responses from the API are converted to subclasses
of this exception. Contains rich context for debugging.

**Arguments:**

* `message`: Human-readable error description.
* `status_code`: HTTP status code from the response.
* `request_id`: Request ID from the `X-Request-ID` header, for debugging.
* `body`: Parsed JSON response body (can be any JSON type).
* `headers`: Response headers.

BadRequestError [#badrequesterror]

Raised when the API returns a 400 Bad Request error.

This typically indicates invalid parameters or malformed request data.

AuthenticationError [#authenticationerror]

Raised when authentication fails (401 Unauthorized).

This indicates an invalid, expired, or missing API key.

PermissionDeniedError [#permissiondeniederror]

Raised when access is forbidden (403 Forbidden).

This indicates the API key doesn't have permission for the requested operation.

NotFoundError [#notfounderror]

Raised when a resource is not found (404 Not Found).

ConflictError [#conflicterror]

Raised when there's a conflict with the current state (409 Conflict).

This typically occurs when trying to create a resource that already exists
or when there's a version conflict.

UnprocessableEntityError [#unprocessableentityerror]

Raised when the request is well-formed but contains semantic errors (422).

This indicates validation errors in the request payload.

RateLimitError [#ratelimiterror]

Raised when rate limit is exceeded (429 Too Many Requests).

Check the `retry_after` property for the recommended wait time.
Additional rate limit details are available in `limit_type`, `limit`,
`remaining`, and `reset_time` properties when provided by the API.

**Properties:**

* `limit`
  : Get the maximum limit for this period.

* `limit_type`
  : Get the type of rate limit exceeded (minute, hour, day).

* `remaining`
  : Get the number of remaining requests.

* `reset_time`
  : Get the timestamp when the rate limit resets.

* `retry_after`
  : Get the recommended wait time in seconds from Retry-After header.
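The properties above can be combined into a single wait time. The following is a minimal sketch, assuming `retry_after` is a number of seconds and `reset_time` is a timezone-aware `datetime`; the exact types are not stated here, and `seconds_to_wait` is a hypothetical helper rather than part of the SDK.

```python
from datetime import datetime, timezone


def seconds_to_wait(retry_after=None, reset_time=None, now=None, default=1.0):
    """Pick a wait time: prefer retry_after, then reset_time, then a default."""
    if retry_after is not None:
        return max(float(retry_after), 0.0)
    if reset_time is not None:
        now = now or datetime.now(timezone.utc)
        # Never return a negative wait if the window has already reset.
        return max((reset_time - now).total_seconds(), 0.0)
    return default


now = datetime(2024, 1, 15, 10, 30, 0, tzinfo=timezone.utc)
reset = datetime(2024, 1, 15, 10, 30, 45, tzinfo=timezone.utc)
print(seconds_to_wait(retry_after=12))             # 12.0
print(seconds_to_wait(reset_time=reset, now=now))  # 45.0
print(seconds_to_wait())                           # 1.0
```

In an `except RateLimitError as err:` block, you would pass `err.retry_after` and `err.reset_time` to a helper like this before sleeping.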

InternalServerError [#internalservererror]

Raised when the API returns a 5xx server error.

These errors are typically transient and can be retried.
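As an illustration of retrying a transient failure, the sketch below wraps a call in a loop that doubles the delay after each failed attempt. `TransientError` is a stand-in for a retryable SDK error such as `InternalServerError`, and `call_with_retries` and its parameters are hypothetical. The SDK may already retry internally, so treat this as a picture of the idea rather than something you necessarily need to add.

```python
import time


class TransientError(Exception):
    """Stand-in for a retryable SDK error such as InternalServerError."""


def call_with_retries(operation, max_retries=3, initial_delay=0.5, sleep=time.sleep):
    """Call operation(); on TransientError wait and retry, doubling the delay each time."""
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(delay)
            delay *= 2


attempts = []


def flaky():
    """Fail twice, then succeed."""
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("503")
    return "ok"


waits = []
print(call_with_retries(flaky, sleep=waits.append))  # ok
print(waits)                                         # [0.5, 1.0]
```

Passing `sleep` as a parameter keeps the example fast and makes the waits observable; production code would leave the default `time.sleep` and usually add random jitter.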


---

# Python SDK (https://docs.docutray.com/docs/python-sdk)


This is the API reference for the DocuTray Python SDK.

Installation [#installation]

```bash
pip install docutray
```

Modules [#modules]

Client [#client]

The main entry points for the SDK:

* [`Client`](/docs/python-sdk/client#client) - Synchronous client
* [`AsyncClient`](/docs/python-sdk/client#asyncclient) - Asynchronous client

Exceptions [#exceptions]

Error handling classes:

* [Exception Hierarchy](/docs/python-sdk/exceptions)

Resources [#resources]

API resource classes:

* [Convert](/docs/python-sdk/resources/convert) - Document conversion
* [Identify](/docs/python-sdk/resources/identify) - Document identification
* [DocumentTypes](/docs/python-sdk/resources/document_types) - Document type catalog
* [Steps](/docs/python-sdk/resources/steps) - Step execution
* [KnowledgeBases](/docs/python-sdk/resources/knowledge_bases) - Knowledge base operations

Types [#types]

Response and model types:

* [Convert Types](/docs/python-sdk/types/convert)
* [Identify Types](/docs/python-sdk/types/identify)
* [Document Type Types](/docs/python-sdk/types/document_type)
* [Step Types](/docs/python-sdk/types/step)
* [Knowledge Base Types](/docs/python-sdk/types/knowledge_base)
* [Shared Types](/docs/python-sdk/types/shared)


---

# Webhook Configuration (https://docs.docutray.com/docs/webhooks/configuracion)


Webhook Configuration [#webhook-configuration]

This guide will show you how to configure and manage webhooks in your Docutray account.

Create a New Webhook [#create-a-new-webhook]

Step 1: Access webhook settings [#step-1-access-webhook-settings]

1. Sign in to your Docutray account at [https://app.docutray.com/login](https://app.docutray.com/login)

2. Select the organization you want to work with

3. Navigate to "Settings" > "Organization" > "Webhooks" in the navigation menu

<img alt="Webhook configuration menu" src={__img0} placeholder="blur" />

Step 2: Create a new webhook [#step-2-create-a-new-webhook]

1. Click the "Add Webhook" button

<img alt="Webhooks page" src={__img1} placeholder="blur" />

2. Complete the required fields in the form:

<img alt="Webhook creation form" src={__img2} placeholder="blur" />

* **Endpoint URL**: The HTTPS URL where you'll receive notifications
* **Events**: Select the types of events you want to receive:
  * **Conversion Events**: `CONVERSION_STARTED`, `CONVERSION_COMPLETED`, `CONVERSION_FAILED`
  * **Identification Events**: `IDENTIFICATION_STARTED`, `IDENTIFICATION_COMPLETED`, `IDENTIFICATION_FAILED`
  * **Steps Events**: `STEP_STARTED`, `STEP_COMPLETED`, `STEP_FAILED`
* **Enabled**: Allows you to activate or deactivate the webhook

3. Click "Create Webhook"

4. **Important**: Copy and save the automatically generated secret. This secret is used to verify the authenticity of requests.

<img alt="Generated webhook secret" src={__img3} placeholder="blur" />

Step 3: Configure your endpoint [#step-3-configure-your-endpoint]

Your endpoint must meet the following requirements:

* **Protocol**: Be publicly accessible via HTTPS
* **Response**: Respond with a 200-299 status code to confirm receipt
* **Format**: Process POST requests with Content-Type `application/json`
* **Response time**: Respond in less than 30 seconds

Webhook Management [#webhook-management]

View configured webhooks [#view-configured-webhooks]

On the webhooks page you can see all configured webhooks:

<img alt="List of configured webhooks" src={__img4} placeholder="blur" />

Edit a webhook [#edit-a-webhook]

1. Click the options menu (⋯) of the webhook you want to edit
2. Select "Edit"
3. Modify the necessary fields
4. Click "Update Webhook"

<img alt="Edit webhook" src={__img5} placeholder="blur" />

Enable/Disable a webhook [#enabledisable-a-webhook]

You can enable or disable a webhook using the toggle switch in the webhook list, without needing to delete it.

<img alt="Enable/Disable webhook" src={__img6} placeholder="blur" />

Regenerate secret [#regenerate-secret]

If you need to change the secret:

1. Click the options menu (⋯) of the webhook
2. Select "Regenerate secret"
3. Copy and save the new secret

**Note**: The old secret will stop working immediately.

<img alt="Regenerate secret" src={__img7} placeholder="blur" />

Delete a webhook [#delete-a-webhook]

1. Click the options menu (⋯) of the webhook
2. Select "Delete"
3. Confirm deletion

**Note**: This action cannot be undone.

Data Structure [#data-structure]

HTTP Headers [#http-headers]

Each webhook request includes the following headers:

```http
Content-Type: application/json
User-Agent: Docutray-Webhook/1.0
X-Docutray-Signature: sha256=<hmac_signature_body>
X-Docutray-Auth-Signature: sha256=<hmac_signature_auth>
X-Docutray-Timestamp: <unix_timestamp>
X-Docutray-Request-Id: <uuid>
X-Docutray-Event: <event_type>
```

Header Description [#header-description]

* **X-Docutray-Signature**: HMAC signature based on message body
* **X-Docutray-Auth-Signature**: HMAC signature based on metadata (for Lambda Authorizers)
* **X-Docutray-Timestamp**: Unix timestamp in seconds
* **X-Docutray-Request-Id**: Unique UUID for each delivery
* **X-Docutray-Event**: Event type (e.g., `CONVERSION_COMPLETED`)
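The body signature and timestamp headers can be checked together. The sketch below assumes the body signature is `sha256=` followed by the hex HMAC-SHA256 of the raw request body, as in the implementation examples, and adds a freshness check on `X-Docutray-Timestamp`. The `verify_delivery` helper and the 300-second tolerance are assumptions, and `headers` is treated as a plain dictionary with exact-case keys.

```python
import hashlib
import hmac
import time


def verify_delivery(headers, body, secret, tolerance=300, now=None):
    """Check the body signature and reject deliveries with a stale timestamp."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    received = headers.get("X-Docutray-Signature", "")
    # compare_digest runs in constant time, which avoids leaking match position.
    if not hmac.compare_digest(expected.encode(), received.encode()):
        return False
    try:
        sent_at = int(headers.get("X-Docutray-Timestamp", ""))
    except ValueError:
        return False
    now = time.time() if now is None else now
    return abs(now - sent_at) <= tolerance


secret = b"example-secret"  # hypothetical secret value
body = b'{"conversion_id": "clm123abc456def"}'
headers = {
    "X-Docutray-Signature": "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest(),
    "X-Docutray-Timestamp": "1705314600",
}
print(verify_delivery(headers, body, secret, now=1705314630))  # True
print(verify_delivery(headers, body, secret, now=1705320000))  # False (stale timestamp)
```

Most web frameworks expose headers through a case-insensitive mapping, so the exact-case lookup here would need adapting.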

Recommendations [#recommendations]

Reliability [#reliability]

* Respond quickly (within 30 seconds)
* Implement idempotent processing
* Log received events for debugging
* Use a message queue for asynchronous processing if needed
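The reliability points above can be sketched as a handler that acknowledges quickly, enqueues the work, and skips deliveries it has already seen. Whether a retried delivery reuses the same `X-Docutray-Request-Id` is not stated here, so this sketch deduplicates on the event type plus the resource ID in the payload instead. The `accept_delivery` helper, the in-memory set, and the in-process queue are all assumptions standing in for real infrastructure.

```python
import json
import queue

# Payload fields that identify the underlying operation (names from the event examples).
ID_FIELDS = ("conversion_id", "identification_id", "step_execution_id")

seen_keys = set()            # assumption: a persistent store in production
work_queue = queue.Queue()   # stand-in for a real message queue


def accept_delivery(event_type, raw_body):
    """Enqueue a delivery once; return False when it was already seen."""
    payload = json.loads(raw_body)
    resource_id = next((payload[f] for f in ID_FIELDS if f in payload), None)
    key = (event_type, resource_id)
    if resource_id is not None and key in seen_keys:
        return False  # duplicate: still acknowledge with 2xx, but do not reprocess
    seen_keys.add(key)
    work_queue.put((event_type, payload))
    return True


body = b'{"conversion_id": "clm123abc456def", "status": "SUCCESS"}'
print(accept_delivery("CONVERSION_COMPLETED", body))  # True
print(accept_delivery("CONVERSION_COMPLETED", body))  # False
print(work_queue.qsize())                             # 1
```

The HTTP handler would return a 2xx response in both cases, and a separate worker would drain the queue.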

Error handling [#error-handling]

* Docutray retries a failed delivery up to 5 times with exponential backoff, waiting 30 s, 1 min, 5 min, 15 min, and 1 hour
* If your endpoint repeatedly fails to respond, the webhook may be automatically disabled
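To see how long a single event can keep arriving, the retry delays can be accumulated. This small calculation assumes each delay is measured from the previous attempt, which is not stated explicitly here.

```python
from itertools import accumulate

RETRY_DELAYS_S = [30, 60, 300, 900, 3600]  # 30 s, 1 min, 5 min, 15 min, 1 hour

# Seconds after the first failed attempt at which each retry would be sent.
offsets = list(accumulate(RETRY_DELAYS_S))

print(offsets)           # [30, 90, 390, 1290, 4890]
print(offsets[-1] / 60)  # 81.5
```

Under that assumption the last retry arrives about 81.5 minutes after the first failure, which is a useful lower bound for how long to keep deduplication records.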

Testing [#testing]

* Use tools like [webhook.site](https://webhook.site) to test webhook reception
* Implement a test endpoint before production
* Verify that your firewall allows connections from Docutray servers

Next Steps [#next-steps]

* **[Security](/docs/webhooks/seguridad)**: Implement signature verification to protect your endpoint
* **[Conversion Events](/docs/webhooks/conversion)**: Learn about document conversion webhooks
* **[Identification Events](/docs/webhooks/identificacion)**: Learn about identification webhooks
* **[Steps Events](/docs/webhooks/steps)**: Learn about steps webhooks
* **[Examples](/docs/webhooks/ejemplos)**: Review sample code to implement webhooks


---

# Conversion Events (https://docs.docutray.com/docs/webhooks/conversion)


Conversion Webhooks [#conversion-webhooks]

Conversion webhooks are sent during document processing when using a specific document type to extract structured data.

Conversion Events [#conversion-events]

Conversion Started (CONVERSION_STARTED) [#conversion-started-conversion_started]

Sent when document processing begins:

```json
{
  "conversion_id": "clm123abc456def",
  "status": "PROCESSING",
  "request_timestamp": "2024-01-15T10:30:00.000Z",
  "document_type_code": "invoice",
  "original_filename": "invoice-001.pdf",
  "document_metadata": {
    "client_id": "ABC123",
    "department": "finance"
  }
}
```

**Fields:**

* `conversion_id` (string): Unique conversion ID
* `status` (string): Current status, always `"PROCESSING"` for this event
* `request_timestamp` (string): ISO 8601 timestamp of when conversion started
* `document_type_code` (string): Code of the document type used
* `original_filename` (string, optional): Original name of the processed file
* `document_metadata` (object, optional): Custom metadata sent with the conversion

Conversion Completed (CONVERSION_COMPLETED) [#conversion-completed-conversion_completed]

Sent when conversion finishes successfully:

```json
{
  "conversion_id": "clm123abc456def",
  "status": "SUCCESS",
  "request_timestamp": "2024-01-15T10:30:00.000Z",
  "response_timestamp": "2024-01-15T10:30:15.000Z",
  "document_type_code": "invoice",
  "original_filename": "invoice-001.pdf",
  "document_metadata": {
    "client_id": "ABC123",
    "department": "finance"
  },
  "data": {
    "invoiceNumber": "INV-2024-001",
    "amount": 1250.00,
    "vendor": "ABC Company Inc.",
    "date": "2024-01-15"
  }
}
```

**Fields:**

* `conversion_id` (string): Unique conversion ID
* `status` (string): Current status, always `"SUCCESS"` for this event
* `request_timestamp` (string): ISO 8601 timestamp of when conversion started
* `response_timestamp` (string): ISO 8601 timestamp of when conversion completed
* `document_type_code` (string): Code of the document type used
* `original_filename` (string, optional): Original name of the processed file
* `document_metadata` (object, optional): Custom metadata sent with the conversion
* `data` (object): Extracted data from the document according to the document type schema

Conversion Failed (CONVERSION_FAILED) [#conversion-failed-conversion_failed]

Sent when conversion fails:

```json
{
  "conversion_id": "clm123abc456def",
  "status": "ERROR",
  "request_timestamp": "2024-01-15T10:30:00.000Z",
  "response_timestamp": "2024-01-15T10:30:10.000Z",
  "document_type_code": "invoice",
  "original_filename": "invoice-001.pdf",
  "document_metadata": {
    "client_id": "ABC123",
    "department": "finance"
  },
  "error": "Error during OCR processing: Unable to process image"
}
```

**Fields:**

* `conversion_id` (string): Unique conversion ID
* `status` (string): Current status, always `"ERROR"` for this event
* `request_timestamp` (string): ISO 8601 timestamp of when conversion started
* `response_timestamp` (string): ISO 8601 timestamp of when conversion failed
* `document_type_code` (string): Code of the document type used
* `original_filename` (string, optional): Original name of the processed file
* `document_metadata` (object, optional): Custom metadata sent with the conversion
* `error` (string): Descriptive error message
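The three payloads share most of their fields, so one typed container can hold any of them. The sketch below is one possible shape; `ConversionEvent` and `parse_conversion_event` are hypothetical names, and fields that appear only on some events default to `None`.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConversionEvent:
    # Present on every conversion event
    conversion_id: str
    status: str
    request_timestamp: str
    document_type_code: str
    # Optional, or present only on completed/failed events
    response_timestamp: Optional[str] = None
    original_filename: Optional[str] = None
    document_metadata: Optional[dict] = field(default=None)
    data: Optional[dict] = None
    error: Optional[str] = None


def parse_conversion_event(raw_body):
    """Build a ConversionEvent from a raw JSON body, ignoring unknown keys."""
    payload = json.loads(raw_body)
    known = ConversionEvent.__dataclass_fields__
    return ConversionEvent(**{k: v for k, v in payload.items() if k in known})


event = parse_conversion_event(
    '{"conversion_id": "clm123abc456def", "status": "ERROR", '
    '"request_timestamp": "2024-01-15T10:30:00.000Z", '
    '"response_timestamp": "2024-01-15T10:30:10.000Z", '
    '"document_type_code": "invoice", "error": "Unable to process image"}'
)
print(event.status, event.original_filename)  # ERROR None
```

Ignoring unknown keys keeps the parser tolerant if new fields are added to the payload later.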

Implementation Example [#implementation-example]

```javascript
app.post('/webhooks/docutray', (req, res) => {
  const eventType = req.headers['x-docutray-event'];
  // Assumes express.raw({ type: 'application/json' }) is registered, so req.body is a raw Buffer
  const data = JSON.parse(req.body);

  switch (eventType) {
    case 'CONVERSION_STARTED':
      console.log(`Conversion started: ${data.conversion_id}`);
      // Update database with "processing" status
      break;

    case 'CONVERSION_COMPLETED':
      console.log(`Conversion completed: ${data.conversion_id}`);
      console.log('Extracted data:', data.data);
      // Save extracted data to database
      // Send notification to user
      break;

    case 'CONVERSION_FAILED':
      console.log(`Conversion failed: ${data.conversion_id}`);
      console.log('Error:', data.error);
      // Log error and notify user
      break;
  }

  res.status(200).send('OK');
});
```

Related pages [#related-pages]

* [Webhook Configuration](/docs/webhooks/configuracion)
* [Security and Verification](/docs/webhooks/seguridad)
* [Identification Events](/docs/webhooks/identificacion)
* [Steps Events](/docs/webhooks/steps)
* [Implementation Examples](/docs/webhooks/ejemplos)


---

# Implementation Examples (https://docs.docutray.com/docs/webhooks/ejemplos)


Implementation Examples [#implementation-examples]

This page provides complete examples of how to implement Docutray webhooks in different languages and frameworks.

Node.js/Express [#nodejsexpress]

```javascript
const express = require('express');
const crypto = require('crypto');
const app = express();

app.use(express.raw({ type: 'application/json' }));

app.post('/webhooks/docutray', (req, res) => {
  const signature = req.headers['x-docutray-signature'];
  const eventType = req.headers['x-docutray-event'];
  const payload = req.body;

  // Verify signature
  const secret = process.env.DOCUTRAY_WEBHOOK_SECRET;
  const expectedSignature = crypto
    .createHmac('sha256', secret)
    .update(payload)
    .digest('hex');

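  // Compare in constant time; timingSafeEqual throws on a length mismatch, so check lengths first
  const expected = Buffer.from(`sha256=${expectedSignature}`);
  const received = Buffer.from(signature || '');
  if (received.length !== expected.length || !crypto.timingSafeEqual(received, expected)) {
    return res.status(401).send('Signature verification failed');
  }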

  const data = JSON.parse(payload);

  // Process based on event type
  switch (eventType) {
    // Conversion Events
    case 'CONVERSION_STARTED':
      console.log(`Conversion started: ${data.conversion_id}`);
      break;
    case 'CONVERSION_COMPLETED':
      console.log(`Conversion completed: ${data.conversion_id}`);
      console.log('Extracted data:', data.data);
      break;
    case 'CONVERSION_FAILED':
      console.log(`Conversion failed: ${data.conversion_id}`);
      console.log('Error:', data.error);
      break;

    // Identification Events
    case 'IDENTIFICATION_STARTED':
      console.log(`Identification started: ${data.identification_id}`);
      break;
    case 'IDENTIFICATION_COMPLETED':
      console.log(`Identification completed: ${data.identification_id}`);
      console.log('Identified type:', data.document_type);
      break;
    case 'IDENTIFICATION_FAILED':
      console.log(`Identification failed: ${data.identification_id}`);
      console.log('Error:', data.error);
      break;

    // Steps Events
    case 'STEP_STARTED':
      console.log(`Step started: ${data.step_name} (${data.step_execution_id})`);
      break;
    case 'STEP_COMPLETED':
      console.log(`Step completed: ${data.step_name} (${data.step_execution_id})`);
      if (data.data) console.log('Processed data:', data.data);
      if (data.validation) console.log('Validation:', data.validation);
      break;
    case 'STEP_FAILED':
      console.log(`Step failed: ${data.step_name} (${data.step_execution_id})`);
      console.log('Error:', data.error);
      break;
  }

  res.status(200).send('OK');
});

app.listen(3000);
```

Python/Flask [#pythonflask]

```python
import hmac
import hashlib
import json
import os
from flask import Flask, request

app = Flask(__name__)

@app.route('/webhooks/docutray', methods=['POST'])
def handle_webhook():
    signature = request.headers.get('X-Docutray-Signature')
    event_type = request.headers.get('X-Docutray-Event')
    payload = request.get_data()

    # Verify signature
    secret = os.environ['DOCUTRAY_WEBHOOK_SECRET'].encode()
    expected_signature = hmac.new(
        secret,
        payload,
        hashlib.sha256
    ).hexdigest()

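    # Constant-time comparison; a missing header is treated as an empty signature
    expected = f'sha256={expected_signature}'.encode()
    if not hmac.compare_digest(expected, (signature or '').encode()):
        return 'Signature verification failed', 401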

    data = json.loads(payload)

    # Process based on event type
    # Conversion Events
    if event_type == 'CONVERSION_STARTED':
        print(f"Conversion started: {data['conversion_id']}")
    elif event_type == 'CONVERSION_COMPLETED':
        print(f"Conversion completed: {data['conversion_id']}")
        print(f"Extracted data: {data['data']}")
    elif event_type == 'CONVERSION_FAILED':
        print(f"Conversion failed: {data['conversion_id']}")
        print(f"Error: {data['error']}")

    # Identification Events
    elif event_type == 'IDENTIFICATION_STARTED':
        print(f"Identification started: {data['identification_id']}")
    elif event_type == 'IDENTIFICATION_COMPLETED':
        print(f"Identification completed: {data['identification_id']}")
        print(f"Identified type: {data['document_type']}")
    elif event_type == 'IDENTIFICATION_FAILED':
        print(f"Identification failed: {data['identification_id']}")
        print(f"Error: {data['error']}")

    # Steps Events
    elif event_type == 'STEP_STARTED':
        print(f"Step started: {data['step_name']} ({data['step_execution_id']})")
    elif event_type == 'STEP_COMPLETED':
        print(f"Step completed: {data['step_name']} ({data['step_execution_id']})")
        if 'data' in data:
            print(f"Processed data: {data['data']}")
        if 'validation' in data:
            print(f"Validation: {data['validation']}")
    elif event_type == 'STEP_FAILED':
        print(f"Step failed: {data['step_name']} ({data['step_execution_id']})")
        print(f"Error: {data['error']}")

    return 'OK', 200

if __name__ == '__main__':
    app.run(port=3000)
```

Python/FastAPI [#pythonfastapi]

```python
import hmac
import hashlib
import os
from fastapi import FastAPI, Request, Header, HTTPException

app = FastAPI()

@app.post("/webhooks/docutray")
async def handle_webhook(
    request: Request,
    x_docutray_signature: str = Header(...),
    x_docutray_event: str = Header(...)
):
    payload = await request.body()

    # Verify signature
    secret = os.environ['DOCUTRAY_WEBHOOK_SECRET'].encode()
    expected_signature = hmac.new(
        secret,
        payload,
        hashlib.sha256
    ).hexdigest()

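    # Constant-time comparison to avoid leaking how much of the signature matched
    expected = f'sha256={expected_signature}'.encode()
    if not hmac.compare_digest(expected, x_docutray_signature.encode()):
        raise HTTPException(status_code=401, detail="Signature verification failed")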

    data = await request.json()

    # Process based on event type
    if x_docutray_event == 'CONVERSION_STARTED':
        print(f"Conversion started: {data['conversion_id']}")
    elif x_docutray_event == 'CONVERSION_COMPLETED':
        print(f"Conversion completed: {data['conversion_id']}")
        print(f"Extracted data: {data['data']}")
    elif x_docutray_event == 'CONVERSION_FAILED':
        print(f"Conversion failed: {data['conversion_id']}")
    # ... other events

    return {"status": "ok"}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=3000)
```

PHP [#php]

```php
<?php

$signature = $_SERVER['HTTP_X_DOCUTRAY_SIGNATURE'];
$eventType = $_SERVER['HTTP_X_DOCUTRAY_EVENT'];
$payload = file_get_contents('php://input');

// Verify signature
$secret = getenv('DOCUTRAY_WEBHOOK_SECRET');
$expectedSignature = 'sha256=' . hash_hmac('sha256', $payload, $secret);

// hash_equals performs a constant-time comparison
if (!hash_equals($expectedSignature, (string) $signature)) {
    http_response_code(401);
    echo 'Signature verification failed';
    exit;
}

$data = json_decode($payload, true);

// Process based on event type
switch ($eventType) {
    case 'CONVERSION_STARTED':
        error_log("Conversion started: " . $data['conversion_id']);
        break;

    case 'CONVERSION_COMPLETED':
        error_log("Conversion completed: " . $data['conversion_id']);
        error_log("Extracted data: " . json_encode($data['data']));
        break;

    case 'CONVERSION_FAILED':
        error_log("Conversion failed: " . $data['conversion_id']);
        error_log("Error: " . $data['error']);
        break;

    // ... other events
}

http_response_code(200);
echo 'OK';
```

Go [#go]

```go
package main

import (
    "crypto/hmac"
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
    "fmt"
    "io"
    "log"
    "net/http"
    "os"
)

type WebhookData map[string]interface{}

func verifySignature(payload []byte, signature string, secret string) bool {
    h := hmac.New(sha256.New, []byte(secret))
    h.Write(payload)
    expectedSignature := "sha256=" + hex.EncodeToString(h.Sum(nil))
    // Constant-time comparison avoids leaking timing information
    return hmac.Equal([]byte(expectedSignature), []byte(signature))
}

func handleWebhook(w http.ResponseWriter, r *http.Request) {
    signature := r.Header.Get("X-Docutray-Signature")
    eventType := r.Header.Get("X-Docutray-Event")

    // io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+)
    payload, err := io.ReadAll(r.Body)
    if err != nil {
        http.Error(w, "Error reading body", http.StatusBadRequest)
        return
    }

    // Verify signature
    secret := os.Getenv("DOCUTRAY_WEBHOOK_SECRET")
    if !verifySignature(payload, signature, secret) {
        http.Error(w, "Signature verification failed", http.StatusUnauthorized)
        return
    }

    var data WebhookData
    if err := json.Unmarshal(payload, &data); err != nil {
        http.Error(w, "Error parsing JSON", http.StatusBadRequest)
        return
    }

    // Process based on event type
    switch eventType {
    case "CONVERSION_STARTED":
        log.Printf("Conversion started: %v", data["conversion_id"])
    case "CONVERSION_COMPLETED":
        log.Printf("Conversion completed: %v", data["conversion_id"])
        log.Printf("Extracted data: %v", data["data"])
    case "CONVERSION_FAILED":
        log.Printf("Conversion failed: %v", data["conversion_id"])
        log.Printf("Error: %v", data["error"])
    // ... other events
    }

    w.WriteHeader(http.StatusOK)
    fmt.Fprintf(w, "OK")
}

func main() {
    http.HandleFunc("/webhooks/docutray", handleWebhook)
    log.Println("Server started on :3000")
    log.Fatal(http.ListenAndServe(":3000", nil))
}
```

Ruby/Sinatra [#rubysinatra]

```ruby
require 'sinatra'
require 'json'
require 'openssl'

post '/webhooks/docutray' do
  signature = request.env['HTTP_X_DOCUTRAY_SIGNATURE']
  event_type = request.env['HTTP_X_DOCUTRAY_EVENT']
  payload = request.body.read

  # Verify signature
  secret = ENV['DOCUTRAY_WEBHOOK_SECRET']
  expected_signature = 'sha256=' + OpenSSL::HMAC.hexdigest('sha256', secret, payload)

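  # Rack::Utils.secure_compare performs a constant-time comparison
  unless Rack::Utils.secure_compare(expected_signature, signature.to_s)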
    halt 401, 'Signature verification failed'
  end

  data = JSON.parse(payload)

  # Process based on event type
  case event_type
  when 'CONVERSION_STARTED'
    puts "Conversion started: #{data['conversion_id']}"
  when 'CONVERSION_COMPLETED'
    puts "Conversion completed: #{data['conversion_id']}"
    puts "Extracted data: #{data['data']}"
  when 'CONVERSION_FAILED'
    puts "Conversion failed: #{data['conversion_id']}"
    puts "Error: #{data['error']}"
  # ... other events
  end

  status 200
  body 'OK'
end
```

Implementation recommendations [#implementation-recommendations]

Asynchronous processing [#asynchronous-processing]

For webhooks that require heavy processing, consider using a task queue:

```javascript
// Example with Bull (Redis)
const Queue = require('bull');
const webhookQueue = new Queue('webhook-processing');

app.post('/webhooks/docutray', async (req, res) => {
  // Verify signature first
  if (!verifySignature(req)) {
    return res.status(401).send('Invalid signature');
  }

  // Add to queue for asynchronous processing
  await webhookQueue.add({
    eventType: req.headers['x-docutray-event'],
    data: JSON.parse(req.body)
  });

  // Respond immediately
  res.status(200).send('OK');
});

// Process in background
webhookQueue.process(async (job) => {
  const { eventType, data } = job.data;
  // Heavy processing here
});
```
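
The same acknowledge-first pattern can be sketched in Python using only the standard library. This is a minimal illustration, not a production setup: `webhook_queue`, `process_event`, and `enqueue_webhook` are hypothetical names, and the in-process queue is an assumption that stands in for a durable broker such as the Redis-backed queue above.

```python
import queue
import threading

# Hypothetical in-process queue; a production setup would likely use a
# durable broker so events survive restarts.
webhook_queue = queue.Queue()
processed = []  # stands in for real side effects (database writes, etc.)


def process_event(event_type, data):
    """Placeholder for heavy processing."""
    processed.append((event_type, data.get("conversion_id")))


def worker():
    """Drain the queue in the background until a None sentinel arrives."""
    while True:
        job = webhook_queue.get()
        try:
            if job is None:
                return
            process_event(job["event_type"], job["data"])
        finally:
            webhook_queue.task_done()


def enqueue_webhook(event_type, data):
    """Called from the request handler after signature verification."""
    webhook_queue.put({"event_type": event_type, "data": data})
    return "OK", 200  # respond immediately; work happens in the worker


threading.Thread(target=worker, daemon=True).start()
```

The handler only pays the cost of `put`, so the response time stays independent of how long `process_event` takes.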

Error handling and retries [#error-handling-and-retries]

```javascript
app.post('/webhooks/docutray', async (req, res) => {
  try {
    // Verify signature
    if (!verifySignature(req)) {
      return res.status(401).send('Invalid signature');
    }

    // Process webhook
    await processWebhook(req.body);

    // Respond successfully
    res.status(200).send('OK');
  } catch (error) {
    console.error('Error processing webhook:', error);

    // Return 500 error so Docutray retries
    res.status(500).send('Internal server error');
  }
});
```

Logging and debugging [#logging-and-debugging]

```javascript
const winston = require('winston');

const logger = winston.createLogger({
  level: 'info',
  format: winston.format.json(),
  transports: [
    new winston.transports.File({ filename: 'webhooks.log' })
  ]
});

app.post('/webhooks/docutray', (req, res) => {
  const startTime = Date.now(); // used to compute processing duration below
  const requestId = req.headers['x-docutray-request-id'];
  const eventType = req.headers['x-docutray-event'];

  logger.info('Webhook received', {
    requestId,
    eventType,
    timestamp: new Date().toISOString()
  });

  // Process webhook...

  logger.info('Webhook processed', {
    requestId,
    eventType,
    duration: Date.now() - startTime
  });

  res.status(200).send('OK');
});
```

Related pages [#related-pages]

* [Webhook Configuration](/docs/webhooks/configuracion)
* [Security and Verification](/docs/webhooks/seguridad)
* [Conversion Events](/docs/webhooks/conversion)
* [Identification Events](/docs/webhooks/identificacion)
* [Steps Events](/docs/webhooks/steps)


---

# Identification Events (https://docs.docutray.com/docs/webhooks/identificacion)


Identification Webhooks [#identification-webhooks]

Identification webhooks are sent during the automatic document type identification process, where Docutray analyzes the document and determines its type among the specified options.

Identification Events [#identification-events]

Identification Started (IDENTIFICATION_STARTED) [#identification-started-identification_started]

Sent when the document identification process begins:

```json
{
  "identification_id": "idn_abc123xyz789",
  "status": "PROCESSING",
  "request_timestamp": "2024-01-15T10:30:00.000Z",
  "original_filename": "unknown-document.pdf",
  "document_metadata": {
    "source": "email_attachment",
    "received_date": "2024-01-15"
  },
  "document_type_code_options": ["invoice", "receipt", "purchase_order"]
}
```

**Fields:**

* `identification_id` (string): Unique identification ID
* `status` (string): Current status, always `"PROCESSING"` for this event
* `request_timestamp` (string): ISO 8601 timestamp of when identification started
* `original_filename` (string, optional): Original file name
* `document_metadata` (object, optional): Custom metadata sent with the identification
* `document_type_code_options` (array, optional): Candidate document type codes the document is matched against

Identification Completed (IDENTIFICATION_COMPLETED) [#identification-completed-identification_completed]

Sent when identification finishes successfully:

```json
{
  "identification_id": "idn_abc123xyz789",
  "status": "SUCCESS",
  "request_timestamp": "2024-01-15T10:30:00.000Z",
  "response_timestamp": "2024-01-15T10:30:08.000Z",
  "original_filename": "unknown-document.pdf",
  "document_metadata": {
    "source": "email_attachment",
    "received_date": "2024-01-15"
  },
  "document_type": {
    "code": "invoice",
    "name": "Invoice",
    "confidence": 0.95
  },
  "document_type_code_options": ["invoice", "receipt", "purchase_order"],
  "alternatives": [
    {
      "code": "receipt",
      "name": "Receipt",
      "confidence": 0.78
    },
    {
      "code": "purchase_order",
      "name": "Purchase Order",
      "confidence": 0.45
    }
  ]
}
```

**Fields:**

* `identification_id` (string): Unique identification ID
* `status` (string): Current status, always `"SUCCESS"` for this event
* `request_timestamp` (string): ISO 8601 timestamp of when identification started
* `response_timestamp` (string): ISO 8601 timestamp of when identification completed
* `original_filename` (string, optional): Original file name
* `document_metadata` (object, optional): Custom metadata sent with the identification
* `document_type` (object): Identified document type with highest confidence
  * `code` (string): Document type code
  * `name` (string): Document type name
  * `confidence` (number): Identification confidence level (0-1)
* `document_type_code_options` (array, optional): Candidate document type codes the document was matched against
* `alternatives` (array, optional): Alternative document types with their confidence levels
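
As an illustration of working with these fields, the sketch below ranks the best match together with the alternatives and flags results where the runner-up is close to the winner. `Candidate`, `ranked_candidates`, `is_ambiguous`, and the `margin` default of 0.2 are hypothetical choices for this example, not part of the Docutray API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    code: str
    name: str
    confidence: float


def ranked_candidates(payload: dict) -> list[Candidate]:
    """Return the best match and alternatives, highest confidence first."""
    best = payload["document_type"]
    # `alternatives` is optional, so default to an empty list
    items = [best, *payload.get("alternatives", [])]
    candidates = [Candidate(i["code"], i["name"], i["confidence"]) for i in items]
    return sorted(candidates, key=lambda c: c.confidence, reverse=True)


def is_ambiguous(payload: dict, margin: float = 0.2) -> bool:
    """True when the runner-up is within `margin` of the best match."""
    ranked = ranked_candidates(payload)
    return len(ranked) > 1 and ranked[0].confidence - ranked[1].confidence < margin
```

With the sample payload above (0.95 for `invoice`, 0.78 for `receipt`), `is_ambiguous` returns `True` at the default margin, which could be a signal to route the document to manual review.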

Identification Failed (IDENTIFICATION_FAILED) [#identification-failed-identification_failed]

Sent when identification fails:

```json
{
  "identification_id": "idn_abc123xyz789",
  "status": "ERROR",
  "request_timestamp": "2024-01-15T10:30:00.000Z",
  "response_timestamp": "2024-01-15T10:30:05.000Z",
  "original_filename": "unknown-document.pdf",
  "document_metadata": {
    "source": "email_attachment",
    "received_date": "2024-01-15"
  },
  "error": "Unable to identify document type: image quality too low"
}
```

**Fields:**

* `identification_id` (string): Unique identification ID
* `status` (string): Current status, always `"ERROR"` for this event
* `request_timestamp` (string): ISO 8601 timestamp of when identification started
* `response_timestamp` (string): ISO 8601 timestamp of when identification failed
* `original_filename` (string, optional): Original file name
* `document_metadata` (object, optional): Custom metadata sent with the identification
* `error` (string): Descriptive error message

Implementation Example [#implementation-example]

```javascript
app.post('/webhooks/docutray', (req, res) => {
  const eventType = req.headers['x-docutray-event'];
  const data = JSON.parse(req.body);

  switch (eventType) {
    case 'IDENTIFICATION_STARTED':
      console.log(`Identification started: ${data.identification_id}`);
      // Update database with "identifying" status
      break;

    case 'IDENTIFICATION_COMPLETED':
      console.log(`Identification completed: ${data.identification_id}`);
      console.log(`Identified type: ${data.document_type.code}`);
      console.log(`Confidence: ${data.document_type.confidence}`);
      // Save identified type to database
      // If confidence is high, proceed with automatic conversion
      if (data.document_type.confidence > 0.9) {
        // Start automatic conversion
      }
      break;

    case 'IDENTIFICATION_FAILED':
      console.log(`Identification failed: ${data.identification_id}`);
      console.log('Error:', data.error);
      // Log error and request manual intervention
      break;
  }

  res.status(200).send('OK');
});
```

Related pages [#related-pages]

* [Webhook Configuration](/docs/webhooks/configuracion)
* [Security and Verification](/docs/webhooks/seguridad)
* [Conversion Events](/docs/webhooks/conversion)
* [Steps Events](/docs/webhooks/steps)
* [Implementation Examples](/docs/webhooks/ejemplos)


---

# Webhooks (https://docs.docutray.com/docs/webhooks)


Webhooks [#webhooks]

Webhooks allow you to receive real-time notifications about events that occur in your Docutray account. When you configure a webhook, Docutray will send an HTTP POST request to the URL you specify each time an event you've subscribed to occurs.

Available Webhook Types [#available-webhook-types]

Docutray supports three types of webhooks, each designed for different use cases:

<Cards>
  <Card title="Conversion Webhooks" href="/docs/webhooks/conversion">
    Receive notifications during document processing when using a specific document type to extract structured data.
  </Card>

  <Card title="Identification Webhooks" href="/docs/webhooks/identificacion">
    Receive notifications during the automatic document type identification process.
  </Card>

  <Card title="Steps Webhooks" href="/docs/webhooks/steps">
    Receive notifications during the execution of individual steps in document processing workflows.
  </Card>
</Cards>

Configuration Guides [#configuration-guides]

<Cards>
  <Card title="Initial Setup" href="/docs/webhooks/configuracion">
    Learn how to configure and manage webhooks in your Docutray account.
  </Card>

  <Card title="Security and Verification" href="/docs/webhooks/seguridad">
    Protect your endpoints with HMAC signature verification and replay attack prevention.
  </Card>

  <Card title="Implementation Examples" href="/docs/webhooks/ejemplos">
    Sample code in Node.js, Python, PHP, Go, and Ruby for implementing webhook endpoints.
  </Card>
</Cards>

Key Features [#key-features]

* **Real-time notifications**: Receive events immediately as they occur
* **Multiple events**: Subscribe to specific events based on your needs
* **Robust security**: HMAC signature verification with two methods available
* **Automatic retries**: Retry system with exponential backoff
* **Flexible management**: Enable, disable, or delete webhooks without affecting your integration

Next Steps [#next-steps]

1. **[Configuration](/docs/webhooks/configuracion)**: Start by setting up your first webhook
2. **[Security](/docs/webhooks/seguridad)**: Implement signature verification on your endpoint
3. **Select your webhook type**: Choose between Conversion, Identification, or Steps based on your use case
4. **[Examples](/docs/webhooks/ejemplos)**: Review sample code for your preferred language


---

# Security and Signature Verification (https://docs.docutray.com/docs/webhooks/seguridad)


Security and Signature Verification [#security-and-signature-verification]

Docutray provides two signature verification methods to adapt to different architectures:

Method 1: Body-based signature (Traditional) [#method-1-body-based-signature-traditional]

Validates the complete payload content. Ideal for traditional implementations:

```javascript
const crypto = require('crypto');

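function verifyWebhookBody(bodyString, signature, secret) {
  const expectedSignature = crypto
    .createHmac('sha256', secret)
    .update(bodyString)
    .digest('hex');

  // Constant-time comparison to avoid leaking timing information
  const expected = Buffer.from(`sha256=${expectedSignature}`);
  const received = Buffer.from(signature || '');
  return expected.length === received.length &&
    crypto.timingSafeEqual(expected, received);
}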

// Usage in Express.js
app.post('/webhook', express.raw({type: 'application/json'}), (req, res) => {
  const signature = req.headers['x-docutray-signature'];
  const secret = process.env.WEBHOOK_SECRET;

  if (!verifyWebhookBody(req.body.toString(), signature, secret)) {
    return res.status(401).send('Invalid signature');
  }

  // Process webhook
  const payload = JSON.parse(req.body);
  // ...
});
```
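
To exercise an endpoint locally, it can help to produce a correctly signed request yourself. The following is a minimal sketch that assumes the signature is the `sha256=`-prefixed hex HMAC of the raw body, as in the verification code above; `sign_body`, `verify_body`, and the `test-secret` value are hypothetical names used only for this example.

```python
import hashlib
import hmac
import json


def sign_body(body: bytes, secret: str) -> str:
    """Build the `sha256=<hex>` value expected in X-Docutray-Signature."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify_body(body: bytes, signature: str, secret: str) -> bool:
    """Constant-time check of a received signature against the raw body."""
    return hmac.compare_digest(sign_body(body, secret).encode(), signature.encode())


# Test fixture: sign the exact bytes that will be sent
body = json.dumps({"conversion_id": "conv_test"}).encode()
headers = {
    "X-Docutray-Signature": sign_body(body, "test-secret"),
    "X-Docutray-Event": "CONVERSION_COMPLETED",
}
```

The signature covers the raw bytes, so verify against the body exactly as received. Parsing and re-serializing the JSON before verification can change whitespace or key order and make a valid signature fail.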

Method 2: Authentication signature (Lambda Authorizers compatible) [#method-2-authentication-signature-lambda-authorizers-compatible]

Validates using only metadata in headers, **without body access**. Ideal for AWS Lambda Authorizers, Azure Functions, or Google Cloud Functions:

```javascript
const crypto = require('crypto');

function verifyWebhookAuth(headers, webhookUrl, secret) {
  const authSignature = headers['x-docutray-auth-signature'];
  const timestamp = headers['x-docutray-timestamp'];
  const requestId = headers['x-docutray-request-id'];
  const eventType = headers['x-docutray-event'];

  // Validate timestamp (5-minute window); reject missing or non-numeric values
  const now = Math.floor(Date.now() / 1000);
  const ts = parseInt(timestamp, 10);
  if (Number.isNaN(ts) || Math.abs(now - ts) > 300) {
    return false; // Timestamp missing or expired
  }

  // Calculate expected signature
  const authPayload = `${requestId}|${timestamp}|${webhookUrl}|${eventType}`;
  const expectedSignature = crypto
    .createHmac('sha256', secret)
    .update(authPayload)
    .digest('hex');

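  // Constant-time comparison to avoid leaking timing information
  const expected = Buffer.from(`sha256=${expectedSignature}`);
  const received = Buffer.from(authSignature || '');
  return expected.length === received.length &&
    crypto.timingSafeEqual(expected, received);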
}
```
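
The header-based scheme can be tested without a real delivery by generating the headers yourself. This sketch assumes the signed string is `requestId|timestamp|webhookUrl|eventType`, as shown above; `auth_signature`, `build_test_headers`, and `verify_auth` are hypothetical helper names, and the injectable `now` parameter exists only to make the time window testable.

```python
import hashlib
import hmac
import time
import uuid


def auth_signature(request_id, timestamp, webhook_url, event_type, secret):
    """Sign `request_id|timestamp|webhook_url|event_type` with HMAC-SHA256."""
    auth_payload = f"{request_id}|{timestamp}|{webhook_url}|{event_type}"
    digest = hmac.new(secret.encode(), auth_payload.encode(), hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def build_test_headers(webhook_url, event_type, secret, now=None):
    """Produce lowercase headers for a simulated delivery (test use only)."""
    timestamp = int(now if now is not None else time.time())
    request_id = str(uuid.uuid4())
    return {
        "x-docutray-request-id": request_id,
        "x-docutray-timestamp": str(timestamp),
        "x-docutray-event": event_type,
        "x-docutray-auth-signature": auth_signature(
            request_id, timestamp, webhook_url, event_type, secret
        ),
    }


def verify_auth(headers, webhook_url, secret, now=None, window=300):
    """Check the timestamp window and the signature, as in the code above."""
    try:
        timestamp = int(headers["x-docutray-timestamp"])
        expected = auth_signature(
            headers["x-docutray-request-id"], timestamp, webhook_url,
            headers["x-docutray-event"], secret,
        )
        received = headers["x-docutray-auth-signature"]
    except (KeyError, ValueError):
        return False  # missing or malformed header
    current = int(now if now is not None else time.time())
    if abs(current - timestamp) > window:
        return False
    return hmac.compare_digest(expected.encode(), received.encode())
```

Because the URL is part of the signed string, the value passed to `verify_auth` has to match the signed URL character for character; a differing trailing slash or scheme is enough to make verification fail.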

Complete example: AWS Lambda Authorizer [#complete-example-aws-lambda-authorizer]

```javascript
// Lambda Authorizer for AWS API Gateway
exports.handler = async (event) => {
  const crypto = require('crypto');

  try {
    // Extract headers
    const authSignature = event.headers['x-docutray-auth-signature'];
    const timestamp = parseInt(event.headers['x-docutray-timestamp']);
    const requestId = event.headers['x-docutray-request-id'];
    const eventType = event.headers['x-docutray-event'];
    const webhookUrl = `https://${event.headers.host}${event.path}`;

    // Validate header presence
    if (!authSignature || !timestamp || !requestId || !eventType) {
      return generatePolicy('user', 'Deny', event.methodArn);
    }

    // Validate timestamp (5-minute window)
    const now = Math.floor(Date.now() / 1000);
    if (Math.abs(now - timestamp) > 300) {
      console.log('Webhook timestamp expired');
      return generatePolicy('user', 'Deny', event.methodArn);
    }

    // Recalculate expected signature
    const secret = process.env.DOCUTRAY_WEBHOOK_SECRET;
    const authPayload = `${requestId}|${timestamp}|${webhookUrl}|${eventType}`;
    const expectedSignature = crypto
      .createHmac('sha256', secret)
      .update(authPayload)
      .digest('hex');

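    // Validate signature (constant-time comparison)
    const expected = Buffer.from(`sha256=${expectedSignature}`);
    const received = Buffer.from(authSignature);
    if (expected.length !== received.length || !crypto.timingSafeEqual(expected, received)) {
      console.log('Signature verification failed');
      return generatePolicy('user', 'Deny', event.methodArn);
    }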

    // Valid signature - allow request
    return generatePolicy('user', 'Allow', event.methodArn);

  } catch (error) {
    console.error('Error in authorizer:', error);
    return generatePolicy('user', 'Deny', event.methodArn);
  }
};

function generatePolicy(principalId, effect, resource) {
  return {
    principalId,
    policyDocument: {
      Version: '2012-10-17',
      Statement: [{
        Action: 'execute-api:Invoke',
        Effect: effect,
        Resource: resource
      }]
    }
  };
}
```

Example: Python for AWS Lambda Authorizer [#example-python-for-aws-lambda-authorizer]

```python
import hmac
import hashlib
import time
import os

def lambda_handler(event, context):
    try:
        # Extract headers
        headers = {k.lower(): v for k, v in event['headers'].items()}
        auth_signature = headers.get('x-docutray-auth-signature')
        timestamp = int(headers.get('x-docutray-timestamp', 0))
        request_id = headers.get('x-docutray-request-id')
        event_type = headers.get('x-docutray-event')
        webhook_url = f"https://{headers['host']}{event['path']}"

        # Validate header presence
        if not all([auth_signature, timestamp, request_id, event_type]):
            return generate_policy('user', 'Deny', event['methodArn'])

        # Validate timestamp (5-minute window)
        now = int(time.time())
        if abs(now - timestamp) > 300:
            print('Webhook timestamp expired')
            return generate_policy('user', 'Deny', event['methodArn'])

        # Recalculate expected signature
        secret = os.environ['DOCUTRAY_WEBHOOK_SECRET']
        auth_payload = f"{request_id}|{timestamp}|{webhook_url}|{event_type}"
        expected_signature = hmac.new(
            secret.encode(),
            auth_payload.encode(),
            hashlib.sha256
        ).hexdigest()

        # Validate signature (constant-time comparison)
        if not hmac.compare_digest(f"sha256={expected_signature}".encode(), auth_signature.encode()):
            print('Signature verification failed')
            return generate_policy('user', 'Deny', event['methodArn'])

        # Valid signature - allow request
        return generate_policy('user', 'Allow', event['methodArn'])

    except Exception as error:
        print(f'Error in authorizer: {error}')
        return generate_policy('user', 'Deny', event['methodArn'])

def generate_policy(principal_id, effect, resource):
    return {
        'principalId': principal_id,
        'policyDocument': {
            'Version': '2012-10-17',
            'Statement': [{
                'Action': 'execute-api:Invoke',
                'Effect': effect,
                'Resource': resource
            }]
        }
    }
```

Replay attack protection [#replay-attack-protection]

Each delivery carries two headers that your endpoint can use to detect replayed requests:

1. **Timestamp**: `X-Docutray-Timestamp` holds the delivery time in Unix seconds
2. **Unique Request ID**: `X-Docutray-Request-Id` holds a UUID that is unique to each delivery

Security recommendations [#security-recommendations]

* **Validate timestamp**: Reject requests with timestamps outside a reasonable window (recommended: 5 minutes)
* **Cache request-id**: Temporarily store processed request-ids to detect duplicates
* **Use HTTPS**: Always use HTTPS endpoints to prevent interception

```javascript
// Example of request-id cache with Redis (node-redis v4+)
const redis = require('redis');
const client = redis.createClient();
// Call `await client.connect()` once at startup before using the client

async function isReplayAttack(requestId) {
  const key = `webhook:${requestId}`;

  // Atomically mark as processed (10-minute TTL); NX only sets a new key,
  // which avoids a race between checking and writing
  const result = await client.set(key, '1', { NX: true, EX: 600 });

  // null means the key already existed: request-id already processed
  return result === null;
}
```
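
For a single-process service or for tests, the same idea can be sketched without Redis. `ReplayGuard` below is a hypothetical in-memory helper, assuming one process handles all deliveries; with several workers a shared store like the Redis example above is still needed.

```python
import threading
import time


class ReplayGuard:
    """Remember request IDs for `ttl` seconds (single-process sketch)."""

    def __init__(self, ttl=600, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._seen = {}  # request_id -> expiry time
        self._lock = threading.Lock()

    def is_replay(self, request_id):
        """Return True if the ID was already seen; otherwise record it."""
        now = self._clock()
        with self._lock:
            # Drop expired entries so the dict does not grow without bound
            self._seen = {k: exp for k, exp in self._seen.items() if exp > now}
            if request_id in self._seen:
                return True
            self._seen[request_id] = now + self._ttl
            return False
```

Keeping the TTL longer than the 5-minute timestamp window means an ID can only be forgotten after its timestamp would already be rejected.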

Related pages [#related-pages]

* [Webhook Configuration](/docs/webhooks/configuracion)
* [Conversion Events](/docs/webhooks/conversion)
* [Identification Events](/docs/webhooks/identificacion)
* [Steps Events](/docs/webhooks/steps)
* [Implementation Examples](/docs/webhooks/ejemplos)


---

# Steps Events (https://docs.docutray.com/docs/webhooks/steps)


Steps Webhooks [#steps-webhooks]

Steps webhooks are sent during the execution of individual steps in document processing workflows. Each step can perform operations such as conversion, identification, or validation.

Steps Events [#steps-events]

Step Started (STEP_STARTED) [#step-started-step_started]

Sent when step execution begins:

```json
{
  "step_execution_id": "step_exec_xyz123",
  "step_id": "step_convert_invoice",
  "step_name": "Convert Invoice",
  "status": "PROCESSING",
  "request_timestamp": "2024-01-15T10:30:00.000Z",
  "document_metadata": {
    "batch_id": "batch_001",
    "priority": "high"
  }
}
```

**Fields:**

* `step_execution_id` (string): Unique step execution ID
* `step_id` (string): Step ID in the workflow
* `step_name` (string): Descriptive name of the step
* `status` (string): Current status, always `"PROCESSING"` for this event
* `request_timestamp` (string): ISO 8601 timestamp of when the step started
* `document_metadata` (object, optional): Metadata of the document being processed

Step Completed (STEP_COMPLETED) [#step-completed-step_completed]

Sent when a step finishes successfully:

```json
{
  "step_execution_id": "step_exec_xyz123",
  "step_id": "step_convert_invoice",
  "step_name": "Convert Invoice",
  "status": "completed",
  "request_timestamp": "2024-01-15T10:30:00.000Z",
  "response_timestamp": "2024-01-15T10:30:12.000Z",
  "document_metadata": {
    "batch_id": "batch_001",
    "priority": "high"
  },
  "data": {
    "invoiceNumber": "INV-2024-001",
    "amount": 1250.00,
    "vendor": "ABC Company Inc.",
    "date": "2024-01-15"
  },
  "identification": {
    "document_type": "invoice",
    "confidence": 0.95
  },
  "validation": {
    "errors": {
      "count": 0,
      "messages": []
    },
    "warnings": {
      "count": 1,
      "messages": ["Amount exceeds historical average"]
    }
  }
}
```

**Fields:**

* `step_execution_id` (string): Unique step execution ID
* `step_id` (string): Step ID in the workflow
* `step_name` (string): Descriptive name of the step
* `status` (string): Current status, always `"completed"` for this event
* `request_timestamp` (string): ISO 8601 timestamp of when the step started
* `response_timestamp` (string): ISO 8601 timestamp of when the step completed
* `document_metadata` (object, optional): Metadata of the document being processed
* `data` (object, optional): Processed document data (if the step performed conversion)
* `identification` (object, optional): Identification result (if the step performed identification)
* `validation` (object, optional): Validation result with errors and warnings

Step Failed (STEP_FAILED) [#step-failed-step_failed]

Sent when a step fails:

```json
{
  "step_execution_id": "step_exec_xyz123",
  "step_id": "step_convert_invoice",
  "step_name": "Convert Invoice",
  "status": "failed",
  "request_timestamp": "2024-01-15T10:30:00.000Z",
  "response_timestamp": "2024-01-15T10:30:08.000Z",
  "document_metadata": {
    "batch_id": "batch_001",
    "priority": "high"
  },
  "error": "Validation failed: required field 'invoiceNumber' is missing"
}
```

**Fields:**

* `step_execution_id` (string): Unique step execution ID
* `step_id` (string): Step ID in the workflow
* `step_name` (string): Descriptive name of the step
* `status` (string): Current status, always `"failed"` for this event
* `request_timestamp` (string): ISO 8601 timestamp of when the step started
* `response_timestamp` (string): ISO 8601 timestamp of when the step failed
* `document_metadata` (object, optional): Metadata of the document being processed
* `error` (string): Descriptive error message

Implementation Example [#implementation-example]

```javascript
app.post('/webhooks/docutray', (req, res) => {
  const eventType = req.headers['x-docutray-event'];
  const data = JSON.parse(req.body);

  switch (eventType) {
    case 'STEP_STARTED':
      console.log(`Step started: ${data.step_name} (${data.step_execution_id})`);
      // Update UI with workflow progress
      break;

    case 'STEP_COMPLETED':
      console.log(`Step completed: ${data.step_name} (${data.step_execution_id})`);

      // Process data if available
      if (data.data) {
        console.log('Processed data:', data.data);
        // Save data to database
      }

      // Check validation results
      if (data.validation) {
        if (data.validation.errors.count > 0) {
          console.log('Validation errors:', data.validation.errors.messages);
          // Notify validation errors
        }
        if (data.validation.warnings.count > 0) {
          console.log('Warnings:', data.validation.warnings.messages);
          // Log warnings
        }
      }
      break;

    case 'STEP_FAILED':
      console.log(`Step failed: ${data.step_name} (${data.step_execution_id})`);
      console.log('Error:', data.error);
      // Stop workflow and notify error
      // Log failure for analysis
      break;
  }

  res.status(200).send('OK');
});
```

Use cases [#use-cases]

Monitoring complex workflows [#monitoring-complex-workflows]

Steps webhooks are ideal for monitoring the execution of multi-step workflows:

```javascript
// Example: Track progress of a multi-step workflow
const flowProgress = {
  stepStates: {},
  totalSteps: 0,
  completedSteps: 0
};

function handleStepEvent(data, eventType) {
  const stepId = data.step_id;

  if (eventType === 'STEP_STARTED') {
    flowProgress.stepStates[stepId] = 'processing';
    flowProgress.totalSteps++;
  } else if (eventType === 'STEP_COMPLETED') {
    flowProgress.stepStates[stepId] = 'completed';
    flowProgress.completedSteps++;
  } else if (eventType === 'STEP_FAILED') {
    flowProgress.stepStates[stepId] = 'failed';
  }

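  // Calculate progress percentage relative to the steps seen so far;
  // guard against division by zero when no step has started yet
  const progress = flowProgress.totalSteps > 0
    ? (flowProgress.completedSteps / flowProgress.totalSteps) * 100
    : 0;
  console.log(`Workflow progress: ${progress.toFixed(0)}%`);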
}
```
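
Since webhooks are retried automatically, the same event may be delivered more than once, and counters that increment on every event can drift. The sketch below keys state by `step_execution_id` so repeated deliveries do not change the result. `WorkflowProgress` is a hypothetical helper, and the handling of a late `STEP_STARTED` assumes events could arrive out of order, which the documentation does not state.

```python
class WorkflowProgress:
    """Track step states keyed by step_execution_id so repeats are harmless."""

    FINAL = {"completed", "failed"}
    STATE_FOR_EVENT = {
        "STEP_STARTED": "processing",
        "STEP_COMPLETED": "completed",
        "STEP_FAILED": "failed",
    }

    def __init__(self):
        self.states = {}  # step_execution_id -> state

    def handle(self, event_type, data):
        state = self.STATE_FOR_EVENT.get(event_type)
        if state is None:
            return  # not a steps event
        exec_id = data["step_execution_id"]
        # Never move a finished step back to "processing" (late STEP_STARTED)
        if self.states.get(exec_id) in self.FINAL and state == "processing":
            return
        self.states[exec_id] = state

    def percent_complete(self):
        """Share of steps seen so far that completed; 0.0 when none seen."""
        if not self.states:
            return 0.0
        done = sum(1 for s in self.states.values() if s == "completed")
        return 100.0 * done / len(self.states)
```

As in the JavaScript example, the percentage is relative to the steps observed so far, because the payloads shown here do not include the total number of steps in the workflow.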

Validation and quality control [#validation-and-quality-control]

Use validation results to implement quality controls:

```javascript
function handleValidationResults(validation) {
  // Stop processing if there are critical errors
  if (validation.errors.count > 0) {
    // Send to manual review
    sendToManualReview(validation.errors.messages);
    return;
  }

  // Warnings don't block the workflow
  if (validation.warnings.count > 0) {
    // Log for analysis but continue
    logWarnings(validation.warnings.messages);
  }

  // Continue to next step in workflow
  proceedToNextStep();
}
```

Related pages [#related-pages]

* [Webhook Configuration](/docs/webhooks/configuracion)
* [Security and Verification](/docs/webhooks/seguridad)
* [Conversion Events](/docs/webhooks/conversion)
* [Identification Events](/docs/webhooks/identificacion)
* [Implementation Examples](/docs/webhooks/ejemplos)


---

# Convert (https://docs.docutray.com/docs/node-sdk/resources/convert)


Convert [#convert]

Resource for converting documents to structured data using OCR. Access via `client.convert`.

Methods [#methods]

run(params, options?) [#runparams-options]

Creates a synchronous conversion request. The API processes the document and returns the result.

```ts
const status = await client.convert.run({
  documentTypeCode: 'invoice',
  url: 'https://example.com/invoice.pdf',
});

console.log(status.conversion_id);
console.log(status.status); // 'SUCCESS' | 'ERROR' | 'ENQUEUED' | 'PROCESSING'
console.log(status.data);   // extracted data (on success)
```

**Parameters**: [`ConvertParams`](/docs/node-sdk/types/convert#convertparams)
**Returns**: `Promise<ConversionStatus>`

runAsync(params, options?) [#runasyncparams-options]

Creates an asynchronous conversion request. Returns a status object with a `.wait()` method for polling.

```ts
import fs from 'fs';

const status = await client.convert.runAsync({
  documentTypeCode: 'invoice',
  file: fs.readFileSync('invoice.pdf'),
});

// Poll until completion
const result = await status.wait();
console.log(result.data); // extracted data
```

**Parameters**: [`ConvertParams`](/docs/node-sdk/types/convert#convertparams)
**Returns**: `Promise<ConversionStatus & { wait(): Promise<ConversionStatus> }>`
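
Conceptually, polling repeats a status lookup until the conversion reaches a terminal state. The sketch below shows that loop in Python for illustration only; it is not the SDK's implementation. `wait_for_completion` and `get_status` are hypothetical, the interval and timeout defaults are assumptions, and treating `SUCCESS` and `ERROR` as the terminal statuses is inferred from the status values listed above.

```python
import time

TERMINAL_STATUSES = {"SUCCESS", "ERROR"}


def wait_for_completion(get_status, interval=1.0, timeout=60.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll `get_status()` until a terminal status or the timeout is reached.

    `get_status` is a hypothetical callable returning a dict with a
    "status" key; it stands in for a status lookup such as getStatus.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status["status"] in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError("conversion did not finish before the timeout")
        sleep(interval)
```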

getStatus(conversionId, options?) [#getstatusconversionid-options]

Retrieves the current status of a conversion operation.

```ts
const status = await client.convert.getStatus('conv_abc123');
```

**Parameters**: `conversionId: string`
**Returns**: `Promise<ConversionStatus>`

Raw Responses [#raw-responses]

Access raw HTTP response details via `withRawResponse`:

```ts
const raw = await client.convert.withRawResponse.run({
  documentTypeCode: 'invoice',
  url: 'https://example.com/invoice.pdf',
});

console.log(raw.statusCode); // 200
console.log(raw.headers);    // Response headers
const data = await raw.parse(); // Parsed body
```


---

# Document Types (https://docs.docutray.com/docs/node-sdk/resources/document-types)


DocumentTypes [#documenttypes]

Resource for listing and inspecting document type definitions. Access via `client.documentTypes`.

Methods [#methods]

list(params?, options?) [#listparams-options]

Lists available document types with pagination.

```ts
const page = await client.documentTypes.list({ limit: 10 });

for (const docType of page.data) {
  console.log(docType.name, docType.codeType);
}

// Auto-pagination
for await (const docType of page.autoPagingIter()) {
  console.log(docType.name);
}
```

**Parameters**: [`DocumentTypesListParams`](/docs/node-sdk/types/document-type#documenttypeslistparams) (optional)
**Returns**: `Promise<Page<DocumentType>>`

get(id, options?) [#getid-options]

Retrieves a single document type by ID.

```ts
const docType = await client.documentTypes.get('dt_abc123');
console.log(docType.name, docType.schema);
```

**Parameters**: `id: string`
**Returns**: `Promise<DocumentType>`

validate(id, options?) [#validateid-options]

Validates a document type schema.

```ts
const result = await client.documentTypes.validate('dt_abc123');

if (result.errors.count === 0) {
  console.log('Schema is valid');
} else {
  console.log('Errors:', result.errors.messages);
}
```

**Parameters**: `id: string`
**Returns**: `Promise<ValidationResult>`

Raw Responses [#raw-responses]

Access raw HTTP response details via `withRawResponse`:

```ts
const raw = await client.documentTypes.withRawResponse.list();
console.log(raw.statusCode);
```


---

# Identify (https://docs.docutray.com/docs/node-sdk/resources/identify)


Identify [#identify]

Resource for identifying document types from images or PDF files. Access via `client.identify`.

Methods [#methods]

run(params, options?) [#runparams-options]

Creates a synchronous identification request.

```ts
const status = await client.identify.run({
  url: 'https://example.com/document.pdf',
});

console.log(status.document_type); // best match
console.log(status.alternatives);  // other matches
```

**Parameters**: [`IdentifyParams`](/docs/node-sdk/types/identify#identifyparams)
**Returns**: `Promise<IdentificationStatus>`

runAsync(params, options?) [#runasyncparams-options]

Creates an asynchronous identification request with a `.wait()` method for polling.

```ts
import fs from 'fs';

const status = await client.identify.runAsync({
  file: fs.readFileSync('document.pdf'),
});

const result = await status.wait();
console.log(result.document_type);
```

**Parameters**: [`IdentifyParams`](/docs/node-sdk/types/identify#identifyparams)
**Returns**: `Promise<IdentificationStatus & { wait(): Promise<IdentificationStatus> }>`

getStatus(identificationId, options?) [#getstatusidentificationid-options]

Retrieves the current status of an identification operation.

```ts
const status = await client.identify.getStatus('id_abc123');
```

**Parameters**: `identificationId: string`
**Returns**: `Promise<IdentificationStatus>`

Raw Responses [#raw-responses]

Access raw HTTP response details via `withRawResponse`:

```ts
const raw = await client.identify.withRawResponse.run({
  url: 'https://example.com/document.pdf',
});

console.log(raw.statusCode);
const data = await raw.parse();
```


---

# Knowledge Bases (https://docs.docutray.com/docs/node-sdk/resources/knowledge-bases)


KnowledgeBases [#knowledgebases]

Resource for managing knowledge bases and their documents. Access via `client.knowledgeBases`.

Methods [#methods]

list(params?, options?) [#listparams-options]

Lists knowledge bases with pagination.

```ts
const page = await client.knowledgeBases.list({ limit: 10 });

for (const kb of page.data) {
  console.log(kb.name, kb.documentCount);
}
```

**Returns**: `Promise<Page<KnowledgeBase>>`

get(id, options?) [#getid-options]

Retrieves a single knowledge base by ID.

```ts
const kb = await client.knowledgeBases.get('kb_abc123');
```

**Returns**: `Promise<KnowledgeBase>`

create(params, options?) [#createparams-options]

Creates a new knowledge base.

```ts
const kb = await client.knowledgeBases.create({
  name: 'Product catalog',
  description: 'Product information database',
});
```

**Returns**: `Promise<KnowledgeBase>`

update(id, params, options?) [#updateid-params-options]

Updates an existing knowledge base.

```ts
const kb = await client.knowledgeBases.update('kb_abc123', {
  name: 'Updated catalog',
});
```

**Returns**: `Promise<KnowledgeBase>`

delete(id, options?) [#deleteid-options]

Deletes a knowledge base.

```ts
await client.knowledgeBases.delete('kb_abc123');
```

**Returns**: `Promise<void>`

search(id, params, options?) [#searchid-params-options]

Searches a knowledge base for matching documents.

```ts
const results = await client.knowledgeBases.search('kb_abc123', {
  query: 'invoice total',
  limit: 5,
});

for (const item of results.data) {
  console.log(item.document.content, item.similarity);
}
```

**Returns**: `Promise<SearchResult>`

sync(id, options?) [#syncid-options]

Triggers a sync operation for the knowledge base.

```ts
const result = await client.knowledgeBases.sync('kb_abc123');
console.log(result.status, result.documentsProcessed);
```

**Returns**: `Promise<SyncResult>`

Documents [#documents]

Access knowledge base documents via `client.knowledgeBases.documents(knowledgeBaseId)`.

documents(kbId).list(params?, options?) [#documentskbidlistparams-options]

Lists documents in a knowledge base.

```ts
const docs = await client.knowledgeBases.documents('kb_abc123').list();
```

documents(kbId).get(docId, options?) [#documentskbidgetdocid-options]

Gets a single document.

```ts
const doc = await client.knowledgeBases.documents('kb_abc123').get('doc_xyz');
```

documents(kbId).create(params, options?) [#documentskbidcreateparams-options]

Creates a new document in the knowledge base.

```ts
const doc = await client.knowledgeBases.documents('kb_abc123').create({
  content: { title: 'Invoice', amount: 100 },
  metadata: { source: 'upload' },
});
```

documents(kbId).update(docId, params, options?) [#documentskbidupdatedocid-params-options]

Updates an existing document.

```ts
const doc = await client.knowledgeBases.documents('kb_abc123').update('doc_xyz', {
  content: { title: 'Updated Invoice' },
});
```

documents(kbId).delete(docId, options?) [#documentskbiddeletedocid-options]

Deletes a document from the knowledge base.

```ts
await client.knowledgeBases.documents('kb_abc123').delete('doc_xyz');
```


---

# Steps (https://docs.docutray.com/docs/node-sdk/resources/steps)


Steps [#steps]

Resource for running predefined processing steps. Access via `client.steps`.

Methods [#methods]

runAsync(params, options?) [#runasyncparams-options]

Runs a processing step asynchronously with a `.wait()` method for polling.

```ts
const status = await client.steps.runAsync({
  stepId: 'step_abc123',
  url: 'https://example.com/document.pdf',
});

const result = await status.wait();
console.log(result.status); // 'SUCCESS'
console.log(result.data);   // processed result
```

**Parameters**: [`StepsRunParams`](/docs/node-sdk/types/step#stepsrunparams)
**Returns**: `Promise<StepExecutionStatus & { wait(): Promise<StepExecutionStatus> }>`

getStatus(executionId, options?) [#getstatusexecutionid-options]

Retrieves the current status of a step execution.

```ts
const status = await client.steps.getStatus('exec_abc123');
console.log(status.status);
```

**Parameters**: `executionId: string`
**Returns**: `Promise<StepExecutionStatus>`

Raw Responses [#raw-responses]

Access raw HTTP response details via `withRawResponse`:

```ts
const raw = await client.steps.withRawResponse.runAsync({
  stepId: 'step_abc123',
  url: 'https://example.com/document.pdf',
});

console.log(raw.statusCode);
```


---

# Convert Types (https://docs.docutray.com/docs/node-sdk/types/convert)


ConversionStatusType [#conversionstatustype]

Possible statuses for a conversion operation.

```ts
type ConversionStatusType = 'ENQUEUED' | 'PROCESSING' | 'SUCCESS' | 'ERROR';
```

ConversionResult [#conversionresult]

Extracted data from a successful conversion.

<AutoTypeTable path="../../vendor/docutray-node/src/types/convert.ts" name="ConversionResult" />

ConversionStatus [#conversionstatus]

Status of a conversion operation, as returned by the API.

<AutoTypeTable path="../../vendor/docutray-node/src/types/convert.ts" name="ConversionStatus" />

ConvertParams [#convertparams]

Parameters for creating a conversion request. Provide exactly one of `file`, `url`, or `base64` as the document source.

<AutoTypeTable path="../../vendor/docutray-node/src/types/convert.ts" name="ConvertParams" />

Type Guards [#type-guards]

isConversionComplete(status) [#isconversioncompletestatus]

Returns `true` if the conversion has reached a terminal state (`SUCCESS` or `ERROR`).

isConversionSuccess(status) [#isconversionsuccessstatus]

Returns `true` if the conversion completed successfully.

isConversionError(status) [#isconversionerrorstatus]

Returns `true` if the conversion failed with an error.

```ts
import { isConversionSuccess, isConversionError } from 'docutray';

if (isConversionSuccess(status)) {
  console.log('Data:', status.data);
} else if (isConversionError(status)) {
  console.log('Error:', status.error);
}
```


---

# Document Type Types (https://docs.docutray.com/docs/node-sdk/types/document-type)


DocumentType [#documenttype]

A document type definition from the API.

<AutoTypeTable path="../../vendor/docutray-node/src/types/document-type.ts" name="DocumentType" />

ValidationErrorInfo [#validationerrorinfo]

Validation error details.

<AutoTypeTable path="../../vendor/docutray-node/src/types/document-type.ts" name="ValidationErrorInfo" />

ValidationWarningInfo [#validationwarninginfo]

Validation warning details.

<AutoTypeTable path="../../vendor/docutray-node/src/types/document-type.ts" name="ValidationWarningInfo" />

ValidationResult [#validationresult]

Result of validating a document type schema.

<AutoTypeTable path="../../vendor/docutray-node/src/types/document-type.ts" name="ValidationResult" />

DocumentTypesListParams [#documenttypeslistparams]

Parameters for listing document types.

<AutoTypeTable path="../../vendor/docutray-node/src/types/document-type.ts" name="DocumentTypesListParams" />

Type Guards [#type-guards]

isValidationValid(result) [#isvalidationvalidresult]

Returns `true` if the validation result has no errors.

hasValidationWarnings(result) [#hasvalidationwarningsresult]

Returns `true` if the validation result has warnings.

```ts
import { isValidationValid, hasValidationWarnings } from 'docutray';

const result = await client.documentTypes.validate('dt_abc123');

if (isValidationValid(result)) {
  console.log('Schema is valid');
  if (hasValidationWarnings(result)) {
    console.log('Warnings:', result.warnings.messages);
  }
}
```


---

# Identify Types (https://docs.docutray.com/docs/node-sdk/types/identify)


IdentificationStatusType [#identificationstatustype]

Possible statuses for an identification operation.

```ts
type IdentificationStatusType = 'ENQUEUED' | 'PROCESSING' | 'SUCCESS' | 'ERROR';
```

DocumentTypeMatch [#documenttypematch]

A document type match with confidence score.

<AutoTypeTable path="../../vendor/docutray-node/src/types/identify.ts" name="DocumentTypeMatch" />

IdentificationResult [#identificationresult]

Result of a successful identification, including primary and alternative matches.

<AutoTypeTable path="../../vendor/docutray-node/src/types/identify.ts" name="IdentificationResult" />

IdentificationStatus [#identificationstatus]

Status of an identification operation, as returned by the API.

<AutoTypeTable path="../../vendor/docutray-node/src/types/identify.ts" name="IdentificationStatus" />

IdentifyParams [#identifyparams]

Parameters for creating an identification request. Provide exactly one of `file`, `url`, or `base64` as the document source.

<AutoTypeTable path="../../vendor/docutray-node/src/types/identify.ts" name="IdentifyParams" />

Type Guards [#type-guards]

isIdentificationComplete(status) [#isidentificationcompletestatus]

Returns `true` if the identification has reached a terminal state (`SUCCESS` or `ERROR`).

isIdentificationSuccess(status) [#isidentificationsuccessstatus]

Returns `true` if the identification completed successfully.

isIdentificationError(status) [#isidentificationerrorstatus]

Returns `true` if the identification failed with an error.

```ts
import { isIdentificationSuccess } from 'docutray';

if (isIdentificationSuccess(status)) {
  console.log('Document type:', status.document_type);
}
```


---

# Knowledge Base Types (https://docs.docutray.com/docs/node-sdk/types/knowledge-base)


KnowledgeBase [#knowledgebase]

A knowledge base definition from the API.

<AutoTypeTable path="../../vendor/docutray-node/src/types/knowledge-base.ts" name="KnowledgeBase" />

KnowledgeBaseDocument [#knowledgebasedocument]

A document stored in a knowledge base.

<AutoTypeTable path="../../vendor/docutray-node/src/types/knowledge-base.ts" name="KnowledgeBaseDocument" />

SearchResultItem [#searchresultitem]

A single search result item with similarity score.

<AutoTypeTable path="../../vendor/docutray-node/src/types/knowledge-base.ts" name="SearchResultItem" />

SearchResult [#searchresult]

Search results from a knowledge base query.

<AutoTypeTable path="../../vendor/docutray-node/src/types/knowledge-base.ts" name="SearchResult" />

SyncResult [#syncresult]

Result of a knowledge base sync operation.

<AutoTypeTable path="../../vendor/docutray-node/src/types/knowledge-base.ts" name="SyncResult" />


---

# Shared Types (https://docs.docutray.com/docs/node-sdk/types/shared)


ImageContentType [#imagecontenttype]

Accepted image/document MIME types for file uploads.

```ts
type ImageContentType =
  | 'image/png'
  | 'image/jpeg'
  | 'image/tiff'
  | 'image/webp'
  | 'application/pdf';
```

Pagination [#pagination]

Pagination metadata from the API.

<AutoTypeTable path="../../vendor/docutray-node/src/types/shared.ts" name="Pagination" />

PaginatedResponse [#paginatedresponse]

Generic paginated response wrapper.

<AutoTypeTable path="../../vendor/docutray-node/src/types/shared.ts" name="PaginatedResponse" />

RateLimitInfo [#ratelimitinfo]

Rate limit information extracted from API response headers.

<AutoTypeTable path="../../vendor/docutray-node/src/types/shared.ts" name="RateLimitInfo" />

QuotaExceededInfo [#quotaexceededinfo]

Quota exceeded details returned in 429 responses.

<AutoTypeTable path="../../vendor/docutray-node/src/types/shared.ts" name="QuotaExceededInfo" />

ErrorDetail [#errordetail]

Error detail from an API error response.

<AutoTypeTable path="../../vendor/docutray-node/src/types/shared.ts" name="ErrorDetail" />


---

# Step Types (https://docs.docutray.com/docs/node-sdk/types/step)


StepExecutionStatusType [#stepexecutionstatustype]

Possible statuses for a step execution.

```ts
type StepExecutionStatusType = 'ENQUEUED' | 'PROCESSING' | 'SUCCESS' | 'ERROR';
```

StepExecutionStatus [#stepexecutionstatus]

Status of a step execution, as returned by the API.

<AutoTypeTable path="../../vendor/docutray-node/src/types/step.ts" name="StepExecutionStatus" />

StepsRunParams [#stepsrunparams]

Parameters for running a step. Provide exactly one of `file`, `url`, or `base64` as the document source.

<AutoTypeTable path="../../vendor/docutray-node/src/types/step.ts" name="StepsRunParams" />

Type Guards [#type-guards]

isStepExecutionComplete(status) [#isstepexecutioncompletestatus]

Returns `true` if the step execution has reached a terminal state (`SUCCESS` or `ERROR`).

isStepExecutionSuccess(status) [#isstepexecutionsuccessstatus]

Returns `true` if the step execution completed successfully.

isStepExecutionError(status) [#isstepexecutionerrorstatus]

Returns `true` if the step execution failed with an error.

```ts
import { isStepExecutionSuccess } from 'docutray';

const result = await status.wait();

if (isStepExecutionSuccess(result)) {
  console.log('Processed data:', result.data);
}
```


---

# Convert (https://docs.docutray.com/docs/python-sdk/resources/convert)


Convert resource for document conversion operations.

AsyncConvert [#asyncconvert]

Asynchronous document conversion operations.

**Example:**

```python
>>> async with AsyncClient(api_key="...") as client:
...     result = await client.convert.run(
...         file=Path("invoice.pdf"),
...         document_type_code="invoice"
...     )
...     print(result.data)
```

**Arguments:**

client: The parent async client instance.

**Methods:**

get_status [#get_status]

```python
def get_status(self, conversion_id: str) -> ConversionStatus
```

Get the status of an asynchronous conversion.

**Arguments:**

conversion\_id: The conversion ID returned by run\_async().

**Returns:**

The current conversion status.

run [#run]

```python
def run(self, document_type_code: str, file: FileInput | None = None, url: str | None = None, file_base64: str | None = None, content_type: str | None = None, document_metadata: dict[str, Any] | None = None) -> ConversionResult
```

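Convert a document without blocking the event loop. This is the awaitable counterpart of `Convert.run`: it still waits for the conversion result before returning.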

**Arguments:**

document\_type\_code: The document type code to use for conversion.
file: File to convert (Path, bytes, or file-like object).
url: URL of the document to convert (alternative to file).
file\_base64: Base64-encoded document (alternative to file).
content\_type: Content type of the file. Auto-detected if not provided.
document\_metadata: Additional metadata to include with the document.

**Returns:**

The conversion result with extracted data.

run_async [#run_async]

```python
def run_async(self, document_type_code: str, file: FileInput | None = None, url: str | None = None, file_base64: str | None = None, content_type: str | None = None, document_metadata: dict[str, Any] | None = None) -> ConversionStatus
```

Start an asynchronous document conversion.

**Arguments:**

document\_type\_code: The document type code to use for conversion.
file: File to convert (Path, bytes, or file-like object).
url: URL of the document to convert (alternative to file).
file\_base64: Base64-encoded document (alternative to file).
content\_type: Content type of the file. Auto-detected if not provided.
document\_metadata: Additional metadata to include with the document.

**Returns:**

The initial conversion status with conversion\_id.

**Properties:**

* `with_raw_response`
  : Access methods that return raw HTTP responses.
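The polling pattern implied by `run_async` and `get_status` can be sketched as a small coroutine. This is a minimal illustration, not the SDK's own implementation: `poll_conversion` is a hypothetical helper, the `status.status` attribute is assumed, and the terminal states are assumed to match the `'SUCCESS'` and `'ERROR'` values listed for the Node SDK.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Assumed terminal states, mirroring the status values listed for the Node SDK.
TERMINAL_STATES = frozenset({"SUCCESS", "ERROR"})


async def poll_conversion(
    get_status: Callable[[str], Awaitable[Any]],
    conversion_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
) -> Any:
    """Await get_status(conversion_id) repeatedly until a terminal state or timeout."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        status = await get_status(conversion_id)
        # `status.status` is an assumed attribute holding the state string.
        if status.status in TERMINAL_STATES:
            return status
        if loop.time() >= deadline:
            raise TimeoutError(
                f"conversion {conversion_id} still {status.status} after {timeout}s"
            )
        await asyncio.sleep(interval)
```

With a real client this might be called as `final = await poll_conversion(client.convert.get_status, status.conversion_id)`, where `status` is the value returned by `run_async`.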

Convert [#convert]

Synchronous document conversion operations.

**Example:**

```python
>>> client = Client(api_key="...")
>>> result = client.convert.run(
...     file=Path("invoice.pdf"),
...     document_type_code="invoice"
... )
>>> print(result.data)
```

**Arguments:**

client: The parent client instance.

**Methods:**

get_status [#get_status-1]

```python
def get_status(self, conversion_id: str) -> ConversionStatus
```

Get the status of an asynchronous conversion.

**Arguments:**

conversion\_id: The conversion ID returned by run\_async().

**Returns:**

The current conversion status.

**Example:**

```python
>>> status = client.convert.get_status("conv_abc123")
>>> if status.is_success():
...     print(status.data)
```

run [#run-1]

```python
def run(self, document_type_code: str, file: FileInput | None = None, url: str | None = None, file_base64: str | None = None, content_type: str | None = None, document_metadata: dict[str, Any] | None = None) -> ConversionResult
```

Convert a document synchronously.

Sends a document to the API and waits for the conversion result.
This is suitable for small documents that process quickly.

**Arguments:**

document\_type\_code: The document type code to use for conversion.
file: File to convert (Path, bytes, or file-like object).
url: URL of the document to convert (alternative to file).
file\_base64: Base64-encoded document (alternative to file).
content\_type: Content type of the file. Auto-detected if not provided.
document\_metadata: Additional metadata to include with the document.

**Returns:**

The conversion result with extracted data.

**Raises:**

ValueError: If no file input is provided.
BadRequestError: If the request is invalid.
AuthenticationError: If the API key is invalid.
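The input check behind the `ValueError` above can be illustrated with a small pre-flight helper. This is a sketch only: `require_single_source` is a hypothetical function, and rejecting more than one source is an assumption borrowed from the Node SDK's "exactly one of" rule, which the Python reference does not state explicitly.

```python
from typing import Any, Optional


def require_single_source(
    file: Optional[Any] = None,
    url: Optional[str] = None,
    file_base64: Optional[str] = None,
) -> str:
    """Return the name of the single provided source, or raise ValueError."""
    candidates = (("file", file), ("url", url), ("file_base64", file_base64))
    provided = [name for name, value in candidates if value is not None]
    if not provided:
        raise ValueError("provide one of: file, url, file_base64")
    if len(provided) > 1:
        # Assumption: sources are mutually exclusive, as in the Node SDK docs.
        raise ValueError(f"provide only one source, got: {', '.join(provided)}")
    return provided[0]
```

Running this check before `client.convert.run(...)` surfaces a missing or ambiguous source without making a request.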

**Example:**

```python
>>> result = client.convert.run(
...     file=Path("invoice.pdf"),
...     document_type_code="invoice"
... )
>>> print(result.data["total"])
```
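When the document is only available as bytes, the `file_base64` and `content_type` arguments can be prepared with the standard library. The helper below is a hypothetical sketch: it assumes the API accepts plain standard Base64 (not a data URI), which the reference above does not specify.

```python
import base64
import mimetypes
from typing import Any


def build_base64_request(
    data: bytes, filename: str, document_type_code: str
) -> dict[str, Any]:
    """Build keyword arguments for convert.run() from raw bytes."""
    kwargs: dict[str, Any] = {
        "document_type_code": document_type_code,
        # Assumption: plain standard Base64 text is what the API expects.
        "file_base64": base64.b64encode(data).decode("ascii"),
    }
    # Guess the MIME type from the file name; omit it when unknown so the
    # documented auto-detection applies.
    content_type, _ = mimetypes.guess_type(filename)
    if content_type is not None:
        kwargs["content_type"] = content_type
    return kwargs
```

The result could then be passed along as `client.convert.run(**build_base64_request(data, "invoice.pdf", "invoice"))`.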

run_async [#run_async-1]

```python
def run_async(self, document_type_code: str, file: FileInput | None = None, url: str | None = None, file_base64: str | None = None, content_type: str | None = None, document_metadata: dict[str, Any] | None = None) -> ConversionStatus
```

Start an asynchronous document conversion.

Initiates a conversion job and returns immediately with a conversion ID.
Use get\_status() to poll for completion, or call wait() on the result.

**Arguments:**

document\_type\_code: The document type code to use for conversion.
file: File to convert (Path, bytes, or file-like object).
url: URL of the document to convert (alternative to file).
file\_base64: Base64-encoded document (alternative to file).
content\_type: Content type of the file. Auto-detected if not provided.
document\_metadata: Additional metadata to include with the document.

**Returns:**

The initial conversion status with conversion\_id.

**Example:**

```python
>>> status = client.convert.run_async(
...     file=Path("large_document.pdf"),
...     document_type_code="invoice"
... )
>>> print(f"Conversion ID: {status.conversion_id}")
>>> # Poll for completion
>>> final = status.wait()
```
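As an alternative to `wait()`, polling with `get_status()` can be written by hand. The sketch below uses exponential backoff between checks. `backoff_delays` and `wait_with_backoff` are hypothetical helpers, and the `status.status` attribute with `"SUCCESS"` and `"ERROR"` as terminal values is an assumption based on the Node SDK's status type.

```python
import time
from typing import Any, Callable, Iterator


def backoff_delays(
    initial: float = 1.0, factor: float = 2.0, maximum: float = 30.0
) -> Iterator[float]:
    """Yield exponentially growing delays, capped at `maximum` seconds."""
    delay = min(initial, maximum)
    while True:
        yield delay
        delay = min(delay * factor, maximum)


def wait_with_backoff(
    get_status: Callable[[str], Any],
    conversion_id: str,
    max_attempts: int = 12,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll get_status() until a terminal state, sleeping longer each round."""
    delays = backoff_delays()
    for _ in range(max_attempts):
        status = get_status(conversion_id)
        # Assumed attribute and terminal values; see the lead-in.
        if status.status in ("SUCCESS", "ERROR"):
            return status
        sleep(next(delays))
    raise TimeoutError(f"conversion {conversion_id} not finished after {max_attempts} checks")
```

A call might look like `final = wait_with_backoff(client.convert.get_status, status.conversion_id)`. The `sleep` parameter exists so the delay can be replaced in tests.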

**Properties:**

* `with_raw_response`
  : Access methods that return raw HTTP responses.


---

# Document Types (https://docs.docutray.com/docs/python-sdk/resources/document_types)


Document Types resource for document type catalog operations.

AsyncDocumentTypes [#asyncdocumenttypes]

Asynchronous document type operations.

**Example:**

```python
>>> async with AsyncClient(api_key="...") as client:
...     page = await client.document_types.list()
...     for doc_type in page.data:
...         print(f"{doc_type.codeType}: {doc_type.name}")
...
...     # Iterate through all document types across pages
...     async for doc_type in (await client.document_types.list()).auto_paging_iter_async():
...         print(doc_type.name)
```

**Arguments:**

client: The parent async client instance.

**Methods:**

get [#get]

```python
def get(self, type_id: str) -> DocumentType
```

Get a specific document type by ID.

**Arguments:**

type\_id: The document type ID.

**Returns:**

The document type details including schema.

list [#list]

```python
def list(self, page: int | None = None, limit: int | None = None, search: str | None = None) -> AsyncPage[DocumentType]
```

List available document types.

**Arguments:**

page: Page number (1-indexed). Defaults to 1.
limit: Number of items per page. Defaults to server default.
search: Search term to filter document types by name.

**Returns:**

An AsyncPage of document types with pagination support.

validate [#validate]

```python
def validate(self, type_id: str, data: dict[str, Any]) -> ValidationResult
```

Validate JSON data against a document type's schema.

**Arguments:**

type\_id: The document type ID to validate against.
data: The JSON data to validate.

**Returns:**

Validation result with errors and warnings.

**Properties:**

* `with_raw_response`
  : Access methods that return raw HTTP responses.

DocumentTypes [#documenttypes]

Synchronous document type operations.

**Example:**

```python
>>> client = Client(api_key="...")
>>> page = client.document_types.list()
>>> for doc_type in page.data:
...     print(f"{doc_type.codeType}: {doc_type.name}")
>>>
>>> # Iterate through all document types across pages
>>> for doc_type in client.document_types.list().auto_paging_iter():
...     print(doc_type.name)
```

**Arguments:**

client: The parent client instance.

**Methods:**

get [#get-1]

```python
def get(self, type_id: str) -> DocumentType
```

Get a specific document type by ID.

**Arguments:**

type\_id: The document type ID.

**Returns:**

The document type details including schema.

**Raises:**

NotFoundError: If the document type doesn't exist.

**Example:**

```python
>>> doc_type = client.document_types.get("dt_abc123")
>>> print(f"Name: {doc_type.name}")
>>> print(f"Schema: {doc_type.schema_}")
```

list [#list-1]

```python
def list(self, page: int | None = None, limit: int | None = None, search: str | None = None) -> Page[DocumentType]
```

List available document types.

**Arguments:**

page: Page number (1-indexed). Defaults to 1.
limit: Number of items per page. Defaults to server default.
search: Search term to filter document types by name.

**Returns:**

A Page of document types with pagination support.

**Example:**

```python
>>> # List all document types
>>> page = client.document_types.list()
>>> for doc_type in page.data:
...     print(doc_type.name)
>>>
>>> # Iterate through all pages
>>> for page in client.document_types.list().iter_pages():
...     print(f"Page {page.page}: {len(page.data)} items")
>>>
>>> # Iterate through all items automatically
>>> for doc_type in client.document_types.list().auto_paging_iter():
...     print(doc_type.name)
>>>
>>> # Search for specific types
>>> page = client.document_types.list(search="invoice")
```
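For readers who want to see what page-by-page iteration involves, the loop below walks the `page` and `limit` parameters manually. It is a sketch of the general technique, not a description of how `auto_paging_iter()` is implemented. `iter_all_items` is a hypothetical helper, and stopping on a short or empty page is an assumption about the API's behavior.

```python
from typing import Any, Callable, Iterator


def iter_all_items(
    list_page: Callable[..., Any], limit: int = 50, **filters: Any
) -> Iterator[Any]:
    """Yield every item by requesting pages 1, 2, 3, ... until one comes back short."""
    page_number = 1  # the reference above documents pages as 1-indexed
    while True:
        page = list_page(page=page_number, limit=limit, **filters)
        if not page.data:
            return
        yield from page.data
        # Assumption: a page with fewer items than `limit` is the last one.
        if len(page.data) < limit:
            return
        page_number += 1
```

For example, `for doc_type in iter_all_items(client.document_types.list, limit=20, search="invoice"): ...` would visit every matching document type.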

validate [#validate-1]

```python
def validate(self, type_id: str, data: dict[str, Any]) -> ValidationResult
```

Validate JSON data against a document type's schema.

This validates extracted data to check if it conforms to the
document type's expected structure and requirements.

**Arguments:**

type\_id: The document type ID to validate against.
data: The JSON data to validate.

**Returns:**

Validation result with errors and warnings.

**Example:**

```python
>>> result = client.document_types.validate(
...     "dt_invoice",
...     {"invoice_number": "INV-001", "total": 100}
... )
>>> if result.is_valid():
...     print("Data is valid!")
... else:
...     for error in result.errors.messages:
...         print(f"Error: {error}")
```
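A validation result can be flattened into log-friendly lines. The sketch below relies on `is_valid()` and `errors.messages` as shown above. The `warnings.messages` attribute is an assumption carried over from the Node SDK example, so it is read defensively. `summarize_validation` is a hypothetical helper.

```python
from typing import Any


def summarize_validation(result: Any) -> list[str]:
    """Turn a validation result into a list of human-readable lines."""
    if result.is_valid():
        lines = ["valid"]
    else:
        lines = [f"error: {message}" for message in result.errors.messages]
    # Assumption: warnings expose `.messages` like errors do; skip if absent.
    warnings = getattr(result, "warnings", None)
    if warnings is not None:
        lines.extend(f"warning: {message}" for message in warnings.messages)
    return lines
```

Typical use would be `print("\n".join(summarize_validation(result)))` right after calling `validate()`.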

**Properties:**

* `with_raw_response`
  : Access methods that return raw HTTP responses.


---

# Identify (https://docs.docutray.com/docs/python-sdk/resources/identify)


Identify resource for document type identification operations.

AsyncIdentify [#asyncidentify]

Asynchronous document identification operations.

**Example:**

```python
>>> async with AsyncClient(api_key="...") as client:
...     result = await client.identify.run(file=Path("document.pdf"))
...     print(f"Type: {result.document_type.code}")
```

**Arguments:**

client: The parent async client instance.

**Methods:**

get_status [#get_status]

```python
def get_status(self, identification_id: str) -> IdentificationStatus
```

Get the status of an asynchronous identification.

**Arguments:**

identification\_id: The identification ID returned by run\_async().

**Returns:**

The current identification status.

run [#run]

```python
def run(self, file: FileInput | None = None, url: str | None = None, file_base64: str | None = None, content_type: str | None = None, document_type_code_options: list[str] | None = None) -> IdentificationResult
```

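Identify the type of a document without blocking the event loop. This is the awaitable counterpart of `Identify.run`: it still waits for the identification result before returning.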

**Arguments:**

file: File to identify (Path, bytes, or file-like object).
url: URL of the document to identify (alternative to file).
file\_base64: Base64-encoded document (alternative to file).
content\_type: Content type of the file. Auto-detected if not provided.
document\_type\_code\_options: List of document type codes to limit
identification to.

**Returns:**

The identification result with document type and alternatives.

run_async [#run_async]

```python
def run_async(self, file: FileInput | None = None, url: str | None = None, file_base64: str | None = None, content_type: str | None = None, document_type_code_options: list[str] | None = None) -> IdentificationStatus
```

Start an asynchronous document identification.

**Arguments:**

file: File to identify (Path, bytes, or file-like object).
url: URL of the document to identify (alternative to file).
file\_base64: Base64-encoded document (alternative to file).
content\_type: Content type of the file. Auto-detected if not provided.
document\_type\_code\_options: List of document type codes to limit
identification to.

**Returns:**

The initial identification status with identification\_id.

**Properties:**

* `with_raw_response`
  : Access methods that return raw HTTP responses.

Identify [#identify]

Synchronous document identification operations.

**Example:**

```python
>>> client = Client(api_key="...")
>>> result = client.identify.run(file=Path("document.pdf"))
>>> print(f"Type: {result.document_type.code}")
>>> print(f"Confidence: {result.document_type.confidence}")
```

**Arguments:**

client: The parent client instance.

**Methods:**

get_status [#get_status-1]

```python
def get_status(self, identification_id: str) -> IdentificationStatus
```

Get the status of an asynchronous identification.

**Arguments:**

identification\_id: The identification ID returned by run\_async().

**Returns:**

The current identification status.

**Example:**

```python
>>> status = client.identify.get_status("id_abc123")
>>> if status.is_success():
...     print(status.document_type.name)
```

run [#run-1]

```python
def run(self, file: FileInput | None = None, url: str | None = None, file_base64: str | None = None, content_type: str | None = None, document_type_code_options: list[str] | None = None) -> IdentificationResult
```

Identify the type of a document synchronously.

Sends a document to the API and returns the identified document type
with confidence scores.

**Arguments:**

file: File to identify (Path, bytes, or file-like object).
url: URL of the document to identify (alternative to file).
file\_base64: Base64-encoded document (alternative to file).
content\_type: Content type of the file. Auto-detected if not provided.
document\_type\_code\_options: List of document type codes to limit
identification to. If provided, the API will only consider
these document types when identifying.

**Returns:**

The identification result with document type and alternatives.

**Raises:**

ValueError: If no file input is provided.
BadRequestError: If the request is invalid.
AuthenticationError: If the API key is invalid.

**Example:**

```python
>>> result = client.identify.run(file=Path("unknown.pdf"))
>>> print(f"Identified as: {result.document_type.name}")
>>> for alt in result.alternatives:
...     print(f"  Alternative: {alt.name} ({alt.confidence:.2%})")

>>> # Limit to specific document types
>>> result = client.identify.run(
...     file=Path("statement.pdf"),
...     document_type_code_options=["cartola_cc", "cartola_tc"]
... )
```
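The confidence scores returned with an identification can drive a simple acceptance rule. The helper below is a hypothetical sketch. It assumes confidence is a fraction between 0 and 1 (as the `:.2%` formatting above suggests) and that alternatives expose a `code` attribute like the primary match.

```python
from typing import Any, Optional


def pick_document_type(result: Any, min_confidence: float = 0.8) -> Optional[str]:
    """Return the code of the most confident match, or None if it is below the threshold."""
    # Consider the primary match together with the alternatives.
    candidates = [result.document_type, *result.alternatives]
    best = max(candidates, key=lambda match: match.confidence)
    if best.confidence < min_confidence:
        return None  # too uncertain; the caller may route this to manual review
    return best.code  # assumption: every match carries a `code`
```

Returning `None` rather than raising leaves the low-confidence policy to the caller. The threshold of `0.8` is an arbitrary illustration, not a recommended value.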

run_async [#run_async-1]

```python
def run_async(self, file: FileInput | None = None, url: str | None = None, file_base64: str | None = None, content_type: str | None = None, document_type_code_options: list[str] | None = None) -> IdentificationStatus
```

Start an asynchronous document identification.

Initiates an identification job and returns immediately with an ID.
Use get\_status() to poll for completion, or call wait() on the returned status.

**Arguments:**

file: File to identify (Path, bytes, or file-like object).
url: URL of the document to identify (alternative to file).
file\_base64: Base64-encoded document (alternative to file).
content\_type: Content type of the file. Auto-detected if not provided.
document\_type\_code\_options: List of document type codes to limit
identification to.

**Returns:**

The initial identification status with identification\_id.

**Example:**

```python
>>> status = client.identify.run_async(file=Path("document.pdf"))
>>> final = status.wait()
>>> print(f"Type: {final.document_type.code}")
```
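Identification and conversion can be chained when the document type is not known in advance. The sketch below uses only the `identify.run` and `convert.run` calls documented on these pages. `identify_then_convert` itself is a hypothetical helper, and it accepts the top match without checking its confidence.

```python
from typing import Any, Optional


def identify_then_convert(
    client: Any, file: Any, allowed_codes: Optional[list[str]] = None
) -> Any:
    """Identify a document, then convert it using the identified type code."""
    identification = client.identify.run(
        file=file,
        document_type_code_options=allowed_codes,  # None means no restriction
    )
    code = identification.document_type.code
    return client.convert.run(file=file, document_type_code=code)
```

A call such as `identify_then_convert(client, Path("statement.pdf"), ["cartola_cc", "cartola_tc"])` would restrict identification to two candidate types before converting.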

**Properties:**

* `with_raw_response`
  : Access methods that return raw HTTP responses.


---

# Knowledge Bases (https://docs.docutray.com/docs/python-sdk/resources/knowledge_bases)


Knowledge Bases resource for semantic document search operations.

AsyncKnowledgeBaseDocuments [#asyncknowledgebasedocuments]

Asynchronous document operations for a knowledge base.

**Arguments:**

client: The parent async client instance.
knowledge\_base\_id: The knowledge base ID.

**Methods:**

create [#create]

```python
def create(self, content: dict[str, Any], document_id: str | None = None, metadata: dict[str, Any] | None = None, generate_embedding: bool = True) -> KnowledgeBaseDocument
```

Add a document to the knowledge base.

**Arguments:**

content: Document content matching the knowledge base schema.
document\_id: Optional external document reference ID.
metadata: Optional additional metadata.
generate\_embedding: Whether to automatically generate embedding. Defaults to True.

**Returns:**

The created document.

delete [#delete]

```python
def delete(self, document_id: str) -> None
```

Delete a document from the knowledge base.

**Arguments:**

document\_id: The document ID to delete.

get [#get]

```python
def get(self, document_id: str) -> KnowledgeBaseDocument
```

Get a specific document by ID.

**Arguments:**

document\_id: The document ID.

**Returns:**

The document details.

list [#list]

```python
def list(self, page: int | None = None, limit: int | None = None, search: str | None = None) -> AsyncPage[KnowledgeBaseDocument]
```

List documents in the knowledge base.

**Arguments:**

page: Page number (1-indexed). Defaults to 1.
limit: Number of items per page.
search: Search term to filter documents.

**Returns:**

An AsyncPage of documents with pagination support.

update [#update]

```python
def update(self, document_id: str, content: dict[str, Any] | None = None, metadata: dict[str, Any] | None = None, regenerate_embedding: bool = False) -> KnowledgeBaseDocument
```

Update a document in the knowledge base.

**Arguments:**

document\_id: The document ID to update.
content: Updated document content.
metadata: Updated metadata.
regenerate\_embedding: Whether to force embedding regeneration. Defaults to False.

**Returns:**

The updated document.

AsyncKnowledgeBases [#asyncknowledgebases]

Asynchronous knowledge base operations.

**Example:**

```python
>>> async with AsyncClient(api_key="...") as client:
...     async for kb in (await client.knowledge_bases.list()).auto_paging_iter_async():
...         print(f"{kb.name}: {kb.documentCount} documents")
```

**Arguments:**

client: The parent async client instance.

**Methods:**

create [#create-1]

```python
def create(self, name: str, description: str, schema: dict[str, Any], indexing_preferences: dict[str, Any] | None = None) -> KnowledgeBase
```

Create a new knowledge base.

**Arguments:**

name: Unique name for the knowledge base.
description: Description of the knowledge base.
schema: JSON schema for documents in this knowledge base.
indexing\_preferences: Optional indexing configuration.

**Returns:**

The created knowledge base.

delete [#delete-1]

```python
def delete(self, knowledge_base_id: str) -> None
```

Delete a knowledge base.

**Arguments:**

knowledge\_base\_id: The knowledge base ID to delete.

documents [#documents]

```python
def documents(self, knowledge_base_id: str) -> AsyncKnowledgeBaseDocuments
```

Access document operations for a knowledge base.

**Arguments:**

knowledge\_base\_id: The knowledge base ID.

**Returns:**

An AsyncKnowledgeBaseDocuments instance for document operations.

get [#get-1]

```python
def get(self, knowledge_base_id: str) -> KnowledgeBase
```

Get a specific knowledge base by ID.

**Arguments:**

knowledge\_base\_id: The knowledge base ID.

**Returns:**

The knowledge base details.

list [#list-1]

```python
def list(self, page: int | None = None, limit: int | None = None, search: str | None = None, is_active: bool | None = None) -> AsyncPage[KnowledgeBase]
```

List knowledge bases.

**Arguments:**

page: Page number (1-indexed). Defaults to 1.
limit: Number of items per page.
search: Search term to filter by name or description.
is\_active: Filter by active status.

**Returns:**

An AsyncPage of knowledge bases with pagination support.

search [#search]

```python
def search(self, knowledge_base_id: str, query: str, limit: int | None = None, similarity_threshold: float | None = None, include_metadata: bool | None = None) -> SearchResult
```

Perform semantic search in a knowledge base.

**Arguments:**

knowledge\_base\_id: The knowledge base ID to search.
query: Search query text.
limit: Maximum number of results (1-50).
similarity\_threshold: Minimum similarity score (0-1).
include\_metadata: Include document metadata in results.

**Returns:**

Search results with similarity scores.
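
Because each search call is independent, several knowledge bases can be queried concurrently. The sketch below assumes `search` is awaited on the async client (the signature above is printed without `async`, so treat this as an assumption). `FakeAsyncKnowledgeBases`, `FakeSearchResult`, `FakeItem`, and `search_many` are hypothetical stand-ins that let the example run without network access; they are not part of the SDK.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeItem:
    """Stand-in for SearchResultItem (only `similarity` is modelled)."""
    similarity: float


@dataclass
class FakeSearchResult:
    """Stand-in for SearchResult (only `data` is modelled)."""
    data: list


class FakeAsyncKnowledgeBases:
    """Hypothetical stub with an awaitable search(); not the SDK class."""

    async def search(self, knowledge_base_id, query, limit=None):
        await asyncio.sleep(0)  # yield control, as a real network call would
        items = [FakeItem(similarity=0.9), FakeItem(similarity=0.4)]
        return FakeSearchResult(data=items[:limit])


async def search_many(knowledge_bases, kb_ids, query, limit=5):
    """Run one search per knowledge base concurrently; map ID -> result."""
    results = await asyncio.gather(
        *(knowledge_bases.search(kb_id, query=query, limit=limit) for kb_id in kb_ids)
    )
    return dict(zip(kb_ids, results))


results = asyncio.run(
    search_many(FakeAsyncKnowledgeBases(), ["kb_123", "kb_456"], "authentication")
)
for kb_id, result in results.items():
    print(kb_id, [item.similarity for item in result.data])
```

With a real async client, `client.knowledge_bases` would take the place of the stub, provided its `search` method is awaitable as assumed here.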

sync [#sync]

```python
def sync(self, knowledge_base_id: str, regenerate_embeddings: bool | None = None) -> SyncResult
```

Trigger manual synchronization of a knowledge base.

**Arguments:**

knowledge\_base\_id: The knowledge base ID to sync.
regenerate\_embeddings: Whether to regenerate all embeddings.

**Returns:**

The sync operation result.

update [#update-1]

```python
def update(self, knowledge_base_id: str, name: str | None = None, description: str | None = None, is_active: bool | None = None) -> KnowledgeBase
```

Update a knowledge base.

**Arguments:**

knowledge\_base\_id: The knowledge base ID to update.
name: New name for the knowledge base.
description: New description.
is\_active: Active status.

**Returns:**

The updated knowledge base.

**Properties:**

* `with_raw_response`
  : Access methods that return raw HTTP responses.

KnowledgeBaseDocuments [#knowledgebasedocuments]

Synchronous document operations for a knowledge base.

**Arguments:**

client: The parent client instance.
knowledge\_base\_id: The knowledge base ID.

**Methods:**

create [#create-2]

```python
def create(self, content: dict[str, Any], document_id: str | None = None, metadata: dict[str, Any] | None = None, generate_embedding: bool = True) -> KnowledgeBaseDocument
```

Add a document to the knowledge base.

**Arguments:**

content: Document content matching the knowledge base schema.
document\_id: Optional external document reference ID.
metadata: Optional additional metadata.
generate\_embedding: Whether to automatically generate embedding. Defaults to True.

**Returns:**

The created document.

**Example:**

```python
>>> doc = client.knowledge_bases.documents("kb_123").create(
...     content={"title": "User Guide", "text": "..."},
...     metadata={"source": "manual"}
... )
>>> print(f"Created: {doc.id}")
```

delete [#delete-2]

```python
def delete(self, document_id: str) -> None
```

Delete a document from the knowledge base.

**Arguments:**

document\_id: The document ID to delete.

**Raises:**

NotFoundError: If the document doesn't exist.

**Example:**

```python
>>> client.knowledge_bases.documents("kb_123").delete("doc_456")
```

get [#get-2]

```python
def get(self, document_id: str) -> KnowledgeBaseDocument
```

Get a specific document by ID.

**Arguments:**

document\_id: The document ID.

**Returns:**

The document details.

**Raises:**

NotFoundError: If the document doesn't exist.

**Example:**

```python
>>> doc = client.knowledge_bases.documents("kb_123").get("doc_456")
>>> print(doc.content)
```

list [#list-2]

```python
def list(self, page: int | None = None, limit: int | None = None, search: str | None = None) -> Page[KnowledgeBaseDocument]
```

List documents in the knowledge base.

**Arguments:**

page: Page number (1-indexed). Defaults to 1.
limit: Number of items per page.
search: Search term to filter documents.

**Returns:**

A Page of documents with pagination support.

**Example:**

```python
>>> docs = client.knowledge_bases.documents("kb_123").list()
>>> for doc in docs.auto_paging_iter():
...     print(doc.id)
```

update [#update-2]

```python
def update(self, document_id: str, content: dict[str, Any] | None = None, metadata: dict[str, Any] | None = None, regenerate_embedding: bool = False) -> KnowledgeBaseDocument
```

Update a document in the knowledge base.

**Arguments:**

document\_id: The document ID to update.
content: Updated document content.
metadata: Updated metadata.
regenerate\_embedding: Whether to force embedding regeneration. Defaults to False.

**Returns:**

The updated document.

**Example:**

```python
>>> doc = client.knowledge_bases.documents("kb_123").update(
...     "doc_456",
...     content={"title": "Updated Guide", "text": "..."}
... )
```

KnowledgeBases [#knowledgebases]

Synchronous knowledge base operations.

**Example:**

```python
>>> client = Client(api_key="...")
>>> # List knowledge bases
>>> for kb in client.knowledge_bases.list().auto_paging_iter():
...     print(f"{kb.name}: {kb.documentCount} documents")

>>> # Search in a knowledge base
>>> results = client.knowledge_bases.search("kb_123", query="authentication")
>>> for item in results.data:
...     print(f"{item.document.id}: {item.similarity:.2%}")
```

**Arguments:**

client: The parent client instance.

**Methods:**

create [#create-3]

```python
def create(self, name: str, description: str, schema: dict[str, Any], indexing_preferences: dict[str, Any] | None = None) -> KnowledgeBase
```

Create a new knowledge base.

**Arguments:**

name: Unique name for the knowledge base.
description: Description of the knowledge base.
schema: JSON schema for documents in this knowledge base.
indexing\_preferences: Optional indexing configuration.

**Returns:**

The created knowledge base.

**Raises:**

ConflictError: If a knowledge base with that name already exists.

**Example:**

```python
>>> kb = client.knowledge_bases.create(
...     name="User Documentation",
...     description="Product user guides and manuals",
...     schema={"type": "object", "properties": {"title": {"type": "string"}}}
... )
>>> print(f"Created: {kb.id}")
```
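
The `schema` argument is a plain JSON Schema dictionary, and document `content` is expected to match it. This page does not say how strictly the service enforces the schema, so the following is only a minimal local pre-check under that uncertainty. `missing_required_fields` is a hypothetical helper, not an SDK function, and it inspects only the top-level `required` list.

```python
def missing_required_fields(schema, content):
    """Return names listed in schema["required"] that are absent from content."""
    return [name for name in schema.get("required", []) if name not in content]


# A slightly fuller schema than the one-property example: two string fields,
# both marked as required.
kb_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "text": {"type": "string"},
    },
    "required": ["title", "text"],
}

missing = missing_required_fields(kb_schema, {"title": "User Guide"})
print(missing)  # ['text']
```

A check like this can catch obviously incomplete documents before they are sent; full JSON Schema validation would need a dedicated validator library.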

delete [#delete-3]

```python
def delete(self, knowledge_base_id: str) -> None
```

Delete a knowledge base.

**Arguments:**

knowledge\_base\_id: The knowledge base ID to delete.

**Raises:**

NotFoundError: If the knowledge base doesn't exist.

**Example:**

```python
>>> client.knowledge_bases.delete("kb_123")
```

documents [#documents-1]

```python
def documents(self, knowledge_base_id: str) -> KnowledgeBaseDocuments
```

Access document operations for a knowledge base.

**Arguments:**

knowledge\_base\_id: The knowledge base ID.

**Returns:**

A KnowledgeBaseDocuments instance for document operations.

**Example:**

```python
>>> docs = client.knowledge_bases.documents("kb_123")
>>> for doc in docs.list().auto_paging_iter():
...     print(doc.content)
```

get [#get-3]

```python
def get(self, knowledge_base_id: str) -> KnowledgeBase
```

Get a specific knowledge base by ID.

**Arguments:**

knowledge\_base\_id: The knowledge base ID.

**Returns:**

The knowledge base details.

**Raises:**

NotFoundError: If the knowledge base doesn't exist.

**Example:**

```python
>>> kb = client.knowledge_bases.get("kb_123")
>>> print(f"{kb.name}: {kb.description}")
```

list [#list-3]

```python
def list(self, page: int | None = None, limit: int | None = None, search: str | None = None, is_active: bool | None = None) -> Page[KnowledgeBase]
```

List knowledge bases.

**Arguments:**

page: Page number (1-indexed). Defaults to 1.
limit: Number of items per page.
search: Search term to filter by name or description.
is\_active: Filter by active status.

**Returns:**

A Page of knowledge bases with pagination support.

**Example:**

```python
>>> for kb in client.knowledge_bases.list().auto_paging_iter():
...     print(f"{kb.name}: {kb.documentCount} documents")
```

search [#search-1]

```python
def search(self, knowledge_base_id: str, query: str, limit: int | None = None, similarity_threshold: float | None = None, include_metadata: bool | None = None) -> SearchResult
```

Perform semantic search in a knowledge base.

**Arguments:**

knowledge\_base\_id: The knowledge base ID to search.
query: Search query text.
limit: Maximum number of results (1-50).
similarity\_threshold: Minimum similarity score (0-1).
include\_metadata: Include document metadata in results.

**Returns:**

Search results with similarity scores.

**Example:**

```python
>>> results = client.knowledge_bases.search(
...     "kb_123",
...     query="how to configure authentication",
...     limit=5
... )
>>> for item in results.data:
...     print(f"{item.similarity:.2%}: {item.document.content}")
```

sync [#sync-1]

```python
def sync(self, knowledge_base_id: str, regenerate_embeddings: bool | None = None) -> SyncResult
```

Trigger manual synchronization of a knowledge base.

**Arguments:**

knowledge\_base\_id: The knowledge base ID to sync.
regenerate\_embeddings: Whether to regenerate all embeddings.

**Returns:**

The sync operation result.

**Example:**

```python
>>> result = client.knowledge_bases.sync("kb_123", regenerate_embeddings=True)
>>> print(f"Sync status: {result.status}")
```

update [#update-3]

```python
def update(self, knowledge_base_id: str, name: str | None = None, description: str | None = None, is_active: bool | None = None) -> KnowledgeBase
```

Update a knowledge base.

**Arguments:**

knowledge\_base\_id: The knowledge base ID to update.
name: New name for the knowledge base.
description: New description.
is\_active: Active status.

**Returns:**

The updated knowledge base.

**Example:**

```python
>>> kb = client.knowledge_bases.update(
...     "kb_123",
...     description="Updated documentation"
... )
```

**Properties:**

* `with_raw_response`
  : Access methods that return raw HTTP responses.


---

# Steps (https://docs.docutray.com/docs/python-sdk/resources/steps)


Steps resource for step execution operations.

AsyncSteps [#asyncsteps]

Asynchronous step execution operations.

**Example:**

```python
>>> from pathlib import Path
>>> async with AsyncClient(api_key="...") as client:
...     status = await client.steps.run_async(
...         step_id="step_extraction",
...         file=Path("document.pdf")
...     )
...     result = await status.wait()
...     print(result.data)
```

**Arguments:**

client: The parent async client instance.

**Methods:**

get_status [#get_status]

```python
def get_status(self, execution_id: str) -> StepExecutionStatus
```

Get the status of a step execution.

**Arguments:**

execution\_id: The execution ID returned by run\_async().

**Returns:**

The current execution status.

run_async [#run_async]

```python
def run_async(self, step_id: str, file: FileInput | None = None, url: str | None = None, file_base64: str | None = None, content_type: str | None = None, document_metadata: dict[str, Any] | None = None) -> StepExecutionStatus
```

Execute a step asynchronously.

**Arguments:**

step\_id: The ID of the step to execute.
file: File to process (Path, bytes, or file-like object).
url: URL of the document to process (alternative to file).
file\_base64: Base64-encoded document (alternative to file).
content\_type: Content type of the file. Auto-detected if not provided.
document\_metadata: Additional metadata to include with the document.

**Returns:**

The initial execution status with execution\_id.
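
When a document is only available as bytes on disk and must be sent through `file_base64`, it has to be encoded first. The sketch below shows one way to do that with the standard library. `encode_for_upload` is a hypothetical helper, and it assumes the API accepts plain Base64 text (not a `data:` URI), which this page does not confirm.

```python
import base64
import mimetypes
from pathlib import Path


def encode_for_upload(path):
    """Return (file_base64, content_type) for a local file."""
    path = Path(path)
    # Base64 output is ASCII, so decoding the bytes to str is safe.
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    # Guess the MIME type from the file extension; fall back to a generic type.
    content_type, _ = mimetypes.guess_type(path.name)
    return encoded, content_type or "application/octet-stream"
```

The two returned values map onto the documented `file_base64` and `content_type` arguments of `run_async`.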

**Properties:**

* `with_raw_response`
  : Access methods that return raw HTTP responses.

Steps [#steps]

Synchronous step execution operations.

Steps execute predefined document processing workflows.

**Example:**

```python
>>> from pathlib import Path
>>> client = Client(api_key="...")
>>> status = client.steps.run_async(
...     step_id="step_extraction",
...     file=Path("document.pdf")
... )
>>> result = status.wait()
>>> print(result.data)
```

**Arguments:**

client: The parent client instance.

**Methods:**

get_status [#get_status-1]

```python
def get_status(self, execution_id: str) -> StepExecutionStatus
```

Get the status of a step execution.

**Arguments:**

execution\_id: The execution ID returned by run\_async().

**Returns:**

The current execution status.

**Example:**

```python
>>> status = client.steps.get_status("exec_abc123")
>>> if status.is_success():
...     print(status.data)
```

run_async [#run_async-1]

```python
def run_async(self, step_id: str, file: FileInput | None = None, url: str | None = None, file_base64: str | None = None, content_type: str | None = None, document_metadata: dict[str, Any] | None = None) -> StepExecutionStatus
```

Execute a step asynchronously.

Initiates step execution and returns immediately with an execution ID.
Use get\_status() to poll for completion.

**Arguments:**

step\_id: The ID of the step to execute.
file: File to process (Path, bytes, or file-like object).
url: URL of the document to process (alternative to file).
file\_base64: Base64-encoded document (alternative to file).
content\_type: Content type of the file. Auto-detected if not provided.
document\_metadata: Additional metadata to include with the document.

**Returns:**

The initial execution status with execution\_id.

**Raises:**

ValueError: If no file input is provided.
BadRequestError: If the request is invalid.
NotFoundError: If the step doesn't exist.

**Example:**

```python
>>> from pathlib import Path
>>> status = client.steps.run_async(
...     "step_invoice_extraction",
...     file=Path("invoice.pdf")
... )
>>> print(f"Execution ID: {status.execution_id}")
```
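
Polling with `get_status()` can be sketched as a small loop with an interval and a timeout. This is an illustration only: `poll_until_done` is a hypothetical helper, the terminal states `SUCCESS` and `ERROR` are taken from the field descriptions of the status types, and the intermediate state `PROCESSING` used in the stub is an assumption. The stubbed `get_status` stands in for `client.steps.get_status` so the example runs offline.

```python
import time
from types import SimpleNamespace


def poll_until_done(get_status, execution_id, interval=2.0, timeout=300.0,
                    sleep=time.sleep):
    """Call get_status(execution_id) until SUCCESS or ERROR, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(execution_id)
        # Accept either a plain string or an enum member with a .value.
        state = getattr(status.status, "value", status.status)
        if state in ("SUCCESS", "ERROR"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"execution {execution_id} still {state!r} after {timeout}s"
            )
        sleep(interval)


# Stubbed responses: one pending poll, then a finished execution.
_responses = iter([
    SimpleNamespace(status="PROCESSING", data=None),  # assumed state name
    SimpleNamespace(status="SUCCESS", data={"total": 42}),
])
final = poll_until_done(
    lambda _execution_id: next(_responses),
    "exec_abc123",
    sleep=lambda _seconds: None,  # skip real waiting in this demo
)
print(final.status, final.data)
```

With a real client the call would look like `poll_until_done(client.steps.get_status, status.execution_id)`. The `wait()` method used in the class-level example presumably covers the same need.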

**Properties:**

* `with_raw_response`
  : Access methods that return raw HTTP responses.


---

# Convert Types (https://docs.docutray.com/docs/python-sdk/types/convert)


Types for document conversion operations.

ConversionResult [#conversionresult]

Result of a synchronous document conversion.

**Fields:**

* `data`: `dict[str, Any]` - Extracted data according to the document type JSON schema.

* `model_config`: `Any`

ConversionStatus [#conversionstatus]

Status of an asynchronous document conversion.

**Fields:**

* `conversion_id`: `str` - Unique conversion ID.

* `data`: `dict[str, Any] | None` - Extracted data (only present when status is SUCCESS).

* `document_type_code`: `str | None` - Document type code used for conversion.

* `error`: `str | None` - Error message (only present when status is ERROR).

* `model_config`: `Any`

* `original_filename`: `str | None` - Original filename of the processed file.

* `request_timestamp`: `datetime | None` - Timestamp when conversion was started.

* `response_timestamp`: `datetime | None` - Timestamp when conversion was completed (only for SUCCESS/ERROR).

* `status`: `ConversionStatusType` - Current conversion status.

* `status_url`: `str | None` - URL to check conversion status.
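
Since `data` and `error` are only populated for particular states, calling code usually branches on `status` before reading them. The sketch below is one way to do that. `unwrap_conversion` is a hypothetical helper, and `SimpleNamespace` stands in for a real `ConversionStatus` object. It assumes the status compares equal to the strings `SUCCESS` and `ERROR` either directly or through an enum `.value`.

```python
from datetime import datetime
from types import SimpleNamespace


def unwrap_conversion(status):
    """Return (data, elapsed) for a finished conversion; raise otherwise."""
    state = getattr(status.status, "value", status.status)
    if state == "ERROR":
        raise RuntimeError(status.error or "conversion failed")
    if state != "SUCCESS":
        raise ValueError(f"conversion {status.conversion_id} is not finished: {state}")
    elapsed = None
    # Both timestamps are optional, so only subtract when both are present.
    if status.request_timestamp and status.response_timestamp:
        elapsed = status.response_timestamp - status.request_timestamp
    return status.data, elapsed


done = SimpleNamespace(
    conversion_id="conv_1",
    status="SUCCESS",
    data={"invoice_number": "A-001"},
    error=None,
    request_timestamp=datetime(2024, 1, 1, 12, 0, 0),
    response_timestamp=datetime(2024, 1, 1, 12, 0, 7),
)
data, elapsed = unwrap_conversion(done)
print(data, elapsed.total_seconds())  # {'invoice_number': 'A-001'} 7.0
```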


---

# Document Type Types (https://docs.docutray.com/docs/python-sdk/types/document_type)


Types for document type operations.

DocumentType [#documenttype]

A document type definition.

**Fields:**

* `codeType`: `str` - Unique document type code.

* `createdAt`: `datetime | None` - Creation timestamp.

* `description`: `str | None` - Document type description.

* `id`: `str` - Unique document type ID.

* `isDraft`: `bool` - Indicates if the document type is a draft.

* `isPublic`: `bool` - Indicates if the document type is public.

* `model_config`: `Any`

* `name`: `str` - Document type name.

* `schema_`: `dict[str, Any] | None` - JSON schema for the document type (when retrieved by ID).

* `updatedAt`: `datetime | None` - Last update timestamp.

ValidationErrorInfo [#validationerrorinfo]

Validation error information.

**Fields:**

* `count`: `int` - Total number of errors found.

* `messages`: `list[str]` - List of descriptive error messages.

* `model_config`: `Any`

ValidationResult [#validationresult]

Result of JSON validation against a document type schema.

**Fields:**

* `errors`: `ValidationErrorInfo` - Validation errors.

* `model_config`: `Any`

* `warnings`: `ValidationWarningInfo` - Validation warnings.

ValidationWarningInfo [#validationwarninginfo]

Validation warning information.

**Fields:**

* `count`: `int` - Total number of warnings found.

* `messages`: `list[str]` - List of descriptive warning messages.

* `model_config`: `Any`
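
The three validation types nest as `ValidationResult.errors` and `ValidationResult.warnings`, each carrying a `count` and a list of `messages`. A short sketch of turning that into a pass/fail flag and a readable report follows. `summarize_validation` is a hypothetical helper, and `SimpleNamespace` objects stand in for the SDK types. Treating "zero errors" as valid is an assumption about how the result should be interpreted.

```python
from types import SimpleNamespace


def summarize_validation(result):
    """Return (is_valid, report) built from the errors and warnings blocks."""
    lines = []
    for label, info in (("error", result.errors), ("warning", result.warnings)):
        lines.append(f"{info.count} {label}(s)")
        lines.extend(f"  - {message}" for message in info.messages)
    # Warnings do not affect validity in this sketch; only errors do.
    return result.errors.count == 0, "\n".join(lines)


result = SimpleNamespace(
    errors=SimpleNamespace(count=1, messages=["total: expected number"]),
    warnings=SimpleNamespace(count=0, messages=[]),
)
ok, report = summarize_validation(result)
print(ok)
print(report)
```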


---

# Identify Types (https://docs.docutray.com/docs/python-sdk/types/identify)


Types for document identification operations.

DocumentTypeMatch [#documenttypematch]

A matched document type with confidence score.

**Fields:**

* `code`: `str` - Document type code.

* `confidence`: `float` - Confidence score (0-1).

* `model_config`: `Any`

* `name`: `str` - Document type name.

IdentificationResult [#identificationresult]

Result of a synchronous document identification.

**Fields:**

* `alternatives`: `list[DocumentTypeMatch]` - Alternative document types with their confidence levels.

* `document_type`: `DocumentTypeMatch` - Primary identified document type.

* `model_config`: `Any`
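
An application may want to accept an identification only when the confidence is high enough. The sketch below picks the highest-confidence candidate from `document_type` and `alternatives` and applies a cutoff. `best_match` and the `0.8` threshold are hypothetical choices for illustration, and `SimpleNamespace` objects stand in for `IdentificationResult` and `DocumentTypeMatch`.

```python
from types import SimpleNamespace


def best_match(result, threshold=0.8):
    """Return the most confident match, or None if it is below the threshold."""
    candidates = [result.document_type, *result.alternatives]
    best = max(candidates, key=lambda match: match.confidence)
    return best if best.confidence >= threshold else None


result = SimpleNamespace(
    document_type=SimpleNamespace(code="invoice", name="Invoice", confidence=0.93),
    alternatives=[SimpleNamespace(code="receipt", name="Receipt", confidence=0.05)],
)
match = best_match(result)
print(match.code if match else "needs manual review")  # invoice
```

Returning `None` rather than raising leaves the fallback to the caller, for example routing the document to manual review.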

IdentificationStatus [#identificationstatus]

Status of an asynchronous document identification.

**Fields:**

* `alternatives`: `list[DocumentTypeMatch] | None` - Alternative document types (only present when status is SUCCESS).

* `document_type`: `DocumentTypeMatch | None` - Primary identified document type (only present when status is SUCCESS).

* `error`: `str | None` - Error message (only present when status is ERROR).

* `identification_id`: `str` - Unique identification ID.

* `model_config`: `Any`

* `original_filename`: `str | None` - Original filename of the processed file.

* `request_timestamp`: `datetime | None` - Timestamp when identification was started.

* `response_timestamp`: `datetime | None` - Timestamp when identification was completed (only for SUCCESS/ERROR).

* `status`: `IdentificationStatusType` - Current identification status.

* `status_url`: `str | None` - URL to check identification status.


---

# Knowledge Base Types (https://docs.docutray.com/docs/python-sdk/types/knowledge_base)


Types for knowledge base operations.

KnowledgeBase [#knowledgebase]

A knowledge base for semantic document search.

**Fields:**

* `createdAt`: `datetime | None` - Timestamp when the knowledge base was created.

* `description`: `str | None` - Description of the knowledge base.

* `documentCount`: `int | None` - Number of documents in the knowledge base.

* `id`: `str` - Unique knowledge base ID.

* `isActive`: `bool` - Whether the knowledge base is active.

* `model_config`: `Any`

* `name`: `str` - Name of the knowledge base.

* `schema_`: `dict[str, Any] | None` - JSON schema for documents in this knowledge base.

* `updatedAt`: `datetime | None` - Timestamp when the knowledge base was last updated.

KnowledgeBaseDocument [#knowledgebasedocument]

A document stored in a knowledge base.

**Fields:**

* `content`: `dict[str, Any]` - Document content matching the knowledge base schema.

* `createdAt`: `datetime | None` - Timestamp when the document was added.

* `documentId`: `str | None` - External document reference ID.

* `id`: `str` - Unique document ID within the knowledge base.

* `metadata`: `dict[str, Any] | None` - Additional metadata for the document.

* `model_config`: `Any`

* `updatedAt`: `datetime | None` - Timestamp when the document was last updated.

SearchResult [#searchresult]

Result of a semantic search operation.

**Fields:**

* `data`: `list[SearchResultItem]` - List of matching documents with similarity scores.

* `model_config`: `Any`

* `query`: `str | None` - The processed search query.

* `resultsCount`: `int` - Total number of results returned.

SearchResultItem [#searchresultitem]

A single search result with similarity score.

**Fields:**

* `document`: `KnowledgeBaseDocument` - The matched document.

* `model_config`: `Any`

* `similarity`: `float` - Similarity score (0-1), higher is more similar.

SyncResult [#syncresult]

Result of a knowledge base synchronization operation.

**Fields:**

* `completedAt`: `datetime | None` - Timestamp when sync completed.

* `documentsProcessed`: `int | None` - Number of documents processed during sync.

* `errors`: `list[str] | None` - Any errors encountered during sync.

* `model_config`: `Any`

* `startedAt`: `datetime | None` - Timestamp when sync started.

* `status`: `str` - Sync status (e.g., 'started', 'completed', 'failed').

* `syncId`: `str | None` - Unique sync operation ID.


---

# Shared Types (https://docs.docutray.com/docs/python-sdk/types/shared)


Shared types used across multiple resources.

APIResponse [#apiresponse]

Base class for API responses with common fields.

**Fields:**

* `model_config`: `Any`

ErrorDetail [#errordetail]

Error detail information.

**Fields:**

* `errors`: `list[str] | None` - List of specific validation errors.

* `message`: `str` - Error message.

* `model_config`: `Any`

PaginatedResponse [#paginatedresponse]

Generic paginated response wrapper.

**Fields:**

* `data`: `list[T]` - List of items in the current page.

* `model_config`: `Any`

* `pagination`: `Pagination` - Pagination metadata.

Pagination [#pagination]

Pagination information for list responses.

**Fields:**

* `limit`: `int` - Number of items per page.

* `model_config`: `Any`

* `page`: `int` - Current page number (1-indexed).

* `total`: `int` - Total number of items matching the query.
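
The three pagination fields are enough to work out how many pages a query spans and whether another page exists. A minimal sketch of that arithmetic (the helper names `total_pages` and `has_next_page` are illustrative, not SDK functions):

```python
import math


def total_pages(total, limit):
    """Number of pages needed to hold `total` items at `limit` items per page."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return math.ceil(total / limit)


def has_next_page(page, limit, total):
    """True if items remain beyond the current 1-indexed page."""
    return page * limit < total


print(total_pages(45, 20))       # 3
print(has_next_page(2, 20, 45))  # True
print(has_next_page(3, 20, 45))  # False
```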


---

# Step Types (https://docs.docutray.com/docs/python-sdk/types/step)


Types for step execution operations.

StepExecutionStatus [#stepexecutionstatus]

Status of an asynchronous step execution.

**Fields:**

* `data`: `dict[str, Any] | None` - Result data (only present when status is SUCCESS).

* `error`: `str | dict[str, Any] | None` - Error message or details (only present when status is ERROR).

* `execution_id`: `str` - Unique execution ID.

* `model_config`: `Any`

* `original_filename`: `str | None` - Original filename of the processed file.

* `request_timestamp`: `datetime | None` - Timestamp when execution was started.

* `response_timestamp`: `datetime | None` - Timestamp when execution was completed (only for SUCCESS/ERROR).

* `status`: `StepExecutionStatusType` - Current execution status.

* `step_id`: `str | None` - Step ID that was executed.
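
Because `error` may be either a string or a dictionary, code that logs or displays it benefits from normalizing it first. A small sketch follows; `error_text` is a hypothetical helper, and since the structure of the dictionary form is not documented here, it is serialized whole instead of being probed for specific keys.

```python
import json


def error_text(error):
    """Normalize a str | dict | None error value to str | None."""
    if error is None:
        return None
    if isinstance(error, str):
        return error
    # Dict form: emit stable JSON so log lines are comparable across runs.
    return json.dumps(error, sort_keys=True)


print(error_text("step failed"))                         # step failed
print(error_text({"message": "bad input", "code": 422}))
```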


---

# Start asynchronous document conversion using OCR (https://docs.docutray.com/docs/api/conversion/convertDocumentAsync)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# Check asynchronous conversion status (https://docs.docutray.com/docs/api/conversion/getConversionStatus)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# Convert documents to structured data using OCR (https://docs.docutray.com/docs/api/conversion/convertDocument)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# Get document type by ID (https://docs.docutray.com/docs/api/document-types/getDocumentType)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# Validate document against document type (https://docs.docutray.com/docs/api/document-types/validateDocument)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# List accessible document types (https://docs.docutray.com/docs/api/document-types/listDocumentTypes)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# Start asynchronous document type identification (https://docs.docutray.com/docs/api/identification/identifyDocumentAsync)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# Check asynchronous identification status (https://docs.docutray.com/docs/api/identification/getIdentificationStatus)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# Identify document type from an image (https://docs.docutray.com/docs/api/identification/identifyDocument)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# Get specific document from Knowledge Base (https://docs.docutray.com/docs/api/knowledge-base-documents/getKnowledgeBaseDocument)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# Delete document from Knowledge Base (https://docs.docutray.com/docs/api/knowledge-base-documents/deleteKnowledgeBaseDocument)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# Update document in Knowledge Base (https://docs.docutray.com/docs/api/knowledge-base-documents/updateKnowledgeBaseDocument)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# Bulk upload documents with real-time progress (https://docs.docutray.com/docs/api/knowledge-base-documents/bulkUploadKnowledgeBaseDocuments)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# List documents in a Knowledge Base (https://docs.docutray.com/docs/api/knowledge-base-documents/listKnowledgeBaseDocuments)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# Add document to Knowledge Base (https://docs.docutray.com/docs/api/knowledge-base-documents/uploadKnowledgeBaseDocument)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# Get details of a specific Knowledge Base (https://docs.docutray.com/docs/api/knowledge-bases/getKnowledgeBase)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# Delete a Knowledge Base (https://docs.docutray.com/docs/api/knowledge-bases/deleteKnowledgeBase)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# Update a Knowledge Base (https://docs.docutray.com/docs/api/knowledge-bases/updateKnowledgeBase)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# Semantic search in Knowledge Base (https://docs.docutray.com/docs/api/knowledge-base-search/searchKnowledgeBaseGet)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# Advanced semantic search (https://docs.docutray.com/docs/api/knowledge-base-search/searchKnowledgeBase)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# Manual Knowledge Base synchronization (https://docs.docutray.com/docs/api/knowledge-base-sync/syncKnowledgeBase)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# List organization's Knowledge Bases (https://docs.docutray.com/docs/api/knowledge-bases/listKnowledgeBases)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# Create new Knowledge Base (https://docs.docutray.com/docs/api/knowledge-bases/createKnowledgeBase)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# Execute a document processing step asynchronously (https://docs.docutray.com/docs/api/steps/executeStepAsync)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api

---

# Get step execution status (https://docs.docutray.com/docs/api/steps/getStepExecutionStatus)

API endpoint documentation. See the full API reference at https://docs.docutray.com/docs/api