
# Set up an OCR skill

> How to create and configure OCR skills for text extraction and document export.

To create an OCR skill, follow these steps:

<Steps>
  <Step title="Create a new OCR skill">
    In the **Skill Catalog**, click the **Create** button in the toolbar.
  </Step>

  <Step title="Select the OCR skill type">
    Select the **OCR Skill** skill type. The **Create OCR Skill** dialog box opens.
  </Step>

  <Step title="Configure General settings">
    On the **General** tab:

    * Enter a name and description for the new skill.
    * Select the Technology Core version. We recommend selecting the latest version.

    **Image Processing**

    Under **Image Processing**, open the **PDF Processing Mode** dropdown and select the processing mode for PDF documents:

    * **Default** (Recommended) — Uses the embedded PDF text layer when available and supplements with OCR as needed. This is the default setting.
    * **Use Text Layer Only** — Extracts text from the embedded PDF text layer. If no text layer exists, Vantage falls back to OCR automatically.
    * **Use OCR Only** — Ignores any embedded PDF text layer and performs full OCR on the document.

    <Note>
      For more information about each mode and guidance on which to choose, see [PDF Processing Mode](/vantage/documentation/skill-designer/ocr-skill/pdf-processing-mode).
    </Note>
  </Step>

  <Step title="Select recognition languages">
    On the **Languages** tab:

    * In the **Allowed Languages** section, select one or more recognition languages. During processing, Vantage automatically chooses the document language from the languages selected here. Keep in mind that the number of selected languages can affect recognition speed and quality.
    * If the document contains handwritten text, enable the **Handwritten** option in the **Text Appearance** section.
  </Step>

  <Step title="Configure image enhancements">
    On the **Image Enhancements** tab, **Crop Image** and **Correct Page Orientation** are enabled by default. You can turn off either feature if your documents do not require it.

    * **Crop Image** crops the image to the edges of the original document.
    * **Correct Page Orientation** automatically rotates the image to restore its original orientation.
  </Step>

  <Step title="Enable barcode recognition">
    On the **Barcodes** tab, enable the barcode types that may appear on your documents. The number of selected barcode types can affect recognition speed.

    If you don't need to recognize any barcodes, click the button that shows the number of selected options in the header of the **Barcode Types** table, and then click **Deselect all**.

  </Step>

  <Step title="Choose export formats">
    On the **Export** tab, select one or several document export formats.

    <Warning>
      If you process several document files with an OCR skill as part of a single transaction, all files are merged into one document. As a result, the number of output files equals the number of selected export formats.
    </Warning>
  </Step>
</Steps>
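
The three PDF processing modes from the **General** tab can be summarized as a small decision function. The sketch below is only a model of the behavior described in the steps above, not Vantage code or a Vantage API; the names `PdfProcessingMode` and `text_sources` and the source labels are hypothetical and exist only for illustration.

```python
from enum import Enum


class PdfProcessingMode(Enum):
    # Values mirror the labels in the PDF Processing Mode dropdown.
    DEFAULT = "Default"
    TEXT_LAYER_ONLY = "Use Text Layer Only"
    OCR_ONLY = "Use OCR Only"


def text_sources(mode: PdfProcessingMode, has_text_layer: bool) -> list[str]:
    """Return the text sources used for a PDF under the given mode (illustrative)."""
    if mode is PdfProcessingMode.OCR_ONLY:
        # Any embedded text layer is ignored.
        return ["ocr"]
    if not has_text_layer:
        # Default and Use Text Layer Only both end up running OCR
        # when the PDF has no text layer.
        return ["ocr"]
    if mode is PdfProcessingMode.TEXT_LAYER_ONLY:
        return ["text_layer"]
    # Default: the text layer is used and supplemented with OCR as needed.
    return ["text_layer", "ocr_as_needed"]


if __name__ == "__main__":
    for mode in PdfProcessingMode:
        for has_layer in (True, False):
            print(mode.value, has_layer, text_sources(mode, has_layer))
```

Read this way, the modes differ only for PDFs that already contain a text layer; for scanned PDFs without one, every mode results in OCR.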

## Available export formats

* **JSON** (default format):
  * **Text only** (default option). The exported JSON file contains only the recognized text; the document layout is not preserved. If you select **Text only**, you cannot export to DOCX, XLSX, or PPTX.
  * **Preserve document structure**. The exported JSON file contains the recognized text and preserves the document layout.
* **XML**:
  * **Text only**. The exported XML file contains only the recognized text; the document layout is not preserved. If you select **Text only**, you cannot export to DOCX, XLSX, or PPTX.
  * **Preserve document structure**. The exported XML file contains the recognized text and preserves the document layout.
* **ALTOXML**:
  * **Text only**. The exported ALTO XML file contains only the recognized text; the document layout is not preserved. If you select **Text only**, you cannot export to DOCX, XLSX, or PPTX.
  * **Preserve document structure**. The exported ALTO XML file contains the recognized text and preserves the document layout.

<Warning>
  The export option (**Text only** or **Preserve document structure**) must be the same for JSON, XML, and ALTOXML. If you change the option for one of these formats, the new option is also applied to the other two.
</Warning>

* **PDF**:
  * PDF/A-3a (the default PDF export format)
  * PDF/A-3b
  * Image-only: a non-editable PDF that conforms to the PDF/A-3b standard

<Tip>
  For each PDF export option, choose between "smaller file size" (default option) and "maximum quality". Smaller file size is achieved by using Mixed Raster Content (MRC) compression, which determines optimal compression rates separately for the text, the pictures, and the background.
</Tip>

* **TXT**
* **DOCX** (Microsoft Word):
  * **Editable**. The exported DOCX file preserves the original formatting and text flow while remaining easy to edit. The output document may look different from the original image.
  * **Exact**. The exported DOCX file maintains the formatting of the original document. This may limit the changes you can make to the text and formatting of the output document.
* **XLSX** (Microsoft Excel)
* **TIFF**
* **JPEG**:
  * **Maximum quality**. The exported file contains a JPEG with a compression level of 95%.
  * **Reduced size**. The exported file contains a JPEG with a compression level of 75%.
* **PPTX** (Microsoft PowerPoint)
* **HTML**
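
The constraints on export selections described above can be captured in a short check. The following is a minimal sketch that assumes the selection is modeled as a set of format names plus a single flag for the option shared by JSON, XML, and ALTOXML; the function names `blocked_formats` and `output_file_count` are hypothetical and are not part of any Vantage API.

```python
# Formats that cannot be exported when Text only is selected.
LAYOUT_FORMATS = {"DOCX", "XLSX", "PPTX"}


def blocked_formats(selected: set[str], text_only: bool) -> set[str]:
    """Return the selected formats that conflict with the Text only option.

    text_only models the single option shared by JSON, XML, and ALTOXML:
    True for Text only, False for Preserve document structure.
    """
    return selected & LAYOUT_FORMATS if text_only else set()


def output_file_count(selected: set[str]) -> int:
    """Files in one transaction are merged, so each format yields one output file."""
    return len(selected)


if __name__ == "__main__":
    selection = {"JSON", "PDF", "DOCX"}
    print(blocked_formats(selection, text_only=True))
    print(blocked_formats(selection, text_only=False))
    print(output_file_count(selection))
```

In this model, switching the shared option to **Preserve document structure** is what makes DOCX, XLSX, and PPTX selectable, and the number of input files in a transaction has no effect on the output count.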

## Related topics

<CardGroup cols={3}>
  <Card title="OCR skill" icon="magnifying-glass-text" href="/vantage/documentation/skill-designer/ocr-skill/ocr-skill">
    Overview of the OCR skill and what it can extract.
  </Card>

  <Card title="PDF Processing Mode" icon="file-pdf" href="/vantage/documentation/skill-designer/ocr-skill/pdf-processing-mode">
    Control whether Vantage uses the embedded PDF text layer, OCR, or a combination.
  </Card>

  <Card title="OCR activity" icon="magnifying-glass" href="/vantage/documentation/skill-designer/process/ocr-activity">
    Run an OCR skill as part of a Process skill workflow.
  </Card>

  <Card title="Data export formats" icon="file-lines" href="/vantage/documentation/skill-designer/process/output-activity/export-formats">
    Reference for every export format and option.
  </Card>

  <Card title="Technology Core versions" icon="microchip" href="/vantage/documentation/technology-core-versions">
    Choose the engine version that powers a skill.
  </Card>
</CardGroup>
