
Introduction

The number of services and workers that ABBYY Vantage runs depends on the load. ABBYY Vantage automatically scales the services and workers to optimize document processing. This guide describes the resources that ABBYY Vantage requires at different loads and gives the System Administrator recommendations on how to provide these resources.

Reference Configurations

Resource consumption depends on your document processing scenario: the type of documents being processed, the skill being used, and the page load (that is, the number of pages processed within a certain time period). The reference Highly available configuration was tested while processing 3-page and 50-page invoices using the default Process skill with the following loads:
  • 50,000 pages per 8 hours
  • 100,000 pages per 8 hours
  • 150,000 pages per 8 hours
  • 200,000 pages per 8 hours
The reference Without high availability configuration was tested while processing 3-page invoices using the default Process skill with the following loads:
  • 10,000 pages per 8 hours
  • 30,000 pages per 8 hours
  • 50,000 pages per 8 hours
The Without high availability configuration doesn’t support training skills with the Deep Learning activity.
During the tests of the reference configurations, files were fed to the system via the REST API. The default Process skill with the following workflow was used:
  1. Import files.
  2. Recognize documents.
  3. Classify and determine document types.
  4. Extract data from documents.
  5. Export data to JSON.

Node Types

| Node type | CPU cores (for each node) | RAM, GB (for each node) | Disk size, GB |
| --- | --- | --- | --- |
| Service nodes | 12 | 48 | 120* |
| Worker nodes | 12 | 48 | 120 |
*The disk sizes listed above are minimum requirements, so additional disk space may be required. By default, Vantage installs NFS file storage on virtual machines. In this case, the virtual machine used as the first service node requires additional disk space depending on the load.

Storage Requirements

| Configuration | Storage | Storage location | Disk size, GB |
| --- | --- | --- | --- |
| Without high availability | Internal NFS | Service node | 500 (for processing every 10,000 pages per 8 hours) |
| Without high availability | External NFS | NFS server machine | 500 (for processing every 10,000 pages per 8 hours) |
| Highly available | External NFS | NFS server machine | 50 (for processing every 10,000 pages per 8 hours) |
| Highly available | Local persistent volume | First service node (from the inventory file) | 500 (for processing every 10,000 pages per 8 hours) |
You may need additional storage if you use big data catalogs, skills with a large number of activities, or export data to shared folders.
We recommend using external storage if the load is greater than 10,000 pages per 8 hours.
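The per-10,000-page figures in the storage table can be turned into a rough disk estimate. The following is a minimal sketch, not an official sizing tool: the function name `estimate_storage_gb` is hypothetical, and rounding a partial block of 10,000 pages up to a full block is an assumption, not something the guide specifies.

```shell
#!/bin/sh
# Rough disk-size estimate based on the storage table above.
# Assumption: a partial block of 10,000 pages is rounded up to a full block.
# Usage: estimate_storage_gb PAGES_PER_8_HOURS GB_PER_10000_PAGES
estimate_storage_gb() {
    pages_per_8h=$1   # expected load, pages per 8 hours
    gb_per_10k=$2     # value from the "Disk size, GB" column, e.g. 500

    # Integer ceiling of pages_per_8h / 10000.
    blocks=$(( (pages_per_8h + 9999) / 10000 ))
    echo $(( blocks * gb_per_10k ))
}

# Example: 30,000 pages per 8 hours at 500 GB per 10,000 pages.
estimate_storage_gb 30000 500
```

The result covers only the figure from the table. Big data catalogs, skills with many activities, and export to shared folders may require more space, as noted above.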

Performance Results

Depending on the page load, ABBYY Vantage required the following amount of resources to efficiently process documents in each configuration:

Highly Available Configuration

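| Load (pages/8 hours) | Nodes for services (3-page invoices) | Nodes for services (50-page invoices) | Nodes for workers (3-page invoices) | Nodes for workers (50-page invoices) |
| --- | --- | --- | --- | --- |
| 50,000 | 4 | 4 | 4 | 4 |
| 100,000 | 4 | 4 | 5 | 7 |
| 150,000 | 4 | 4 | 7 | 9 |
| 200,000 | 4 | 4 | 8 | 11 |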
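For planning, the worker-node counts measured for the Highly available configuration can be wrapped in a small lookup. This is a sketch under stated assumptions: the function name `ha_worker_nodes` is hypothetical, and rounding a load that falls between two tested values up to the next tested value is a conservative assumption rather than a rule from the guide. The number of service nodes was 4 at every tested load, so it is not looked up here.

```shell
#!/bin/sh
# Look up the worker-node count measured for the Highly available
# reference configuration.
# Assumption: a load between two tested values is rounded up to the
# next tested value.
# Usage: ha_worker_nodes LOAD_PAGES_PER_8_HOURS INVOICE_PAGES   (3 or 50)
ha_worker_nodes() {
    load=$1
    invoice_pages=$2

    # Choose the table row: 1 = 50,000 pages ... 4 = 200,000 pages.
    if   [ "$load" -le 50000 ];  then row=1
    elif [ "$load" -le 100000 ]; then row=2
    elif [ "$load" -le 150000 ]; then row=3
    elif [ "$load" -le 200000 ]; then row=4
    else
        echo "load is above the tested range (200,000 pages per 8 hours)" >&2
        return 1
    fi

    # Worker-node column for the chosen invoice size.
    case "$invoice_pages" in
        3)  set -- 4 5 7 8 ;;
        50) set -- 4 7 9 11 ;;
        *)  echo "invoice size must be 3 or 50" >&2; return 1 ;;
    esac

    # Move the chosen row into position 1 and print it.
    shift $(( row - 1 ))
    echo "$1"
}

# Example: 120,000 pages per 8 hours of 50-page invoices.
ha_worker_nodes 120000 50
```

The values apply only to the tested scenario (invoices processed with the default Process skill). Other document types or skills may need a different number of nodes.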
During testing, statistics on the input-output operations for the disk used for blob storage were also collected. You can expect that the numbers in your case will not exceed these:

Disk I/O Operations

| Load (pages/8 hours) | Disk I/O operations/second (3-page invoices) | Disk I/O operations/second (50-page invoices) |
| --- | --- | --- |
| 50,000 | 100 | 50 |
| 100,000 | 250 | 100 |
| 150,000 | 400 | 170 |
| 200,000 | 600 | 230 |

Without High Availability Configuration

| Load (pages/8 hours) | Nodes for services | Nodes for workers |
| --- | --- | --- |
| 10,000 | 1 | 1* |
| 30,000 | 1 | 3 |
| 50,000 | 1 | 3 |
*The configuration with one worker node is intended for testing purposes only and does not support training skills with any activity.
No increase in document processing time was observed when ABBYY Vantage was scaled.

Managing Nodes

The System Administrator can add worker nodes to the cluster to handle a higher load. For more information on how to prepare a node, see System Requirements.

Adding a Worker Node

To add a worker node, follow these steps:
  1. Open an inventory file from the installation directory.
  2. In the [abbyy_workers] section, add an additional node by specifying its name and IP address.
  3. Run the installer:
docker run -it \
-v $PWD/kube:/root/.kube \
-v $PWD/ssh/ansible:/root/.ssh/ansible \
-v "//var/run/docker.sock:/var/run/docker.sock" \
-v $PWD/inventory:/ansible/inventories/k8s/inventory \
-v $PWD/env_specific.yml:/ansible/inventories/k8s/group_vars/all/env_specific.yml \
-v $PWD/ssl:/ansible/files/ssl:ro \
--privileged \
registry.local/vantage/vantage-k8s:2.7.1
  4. Run the following playbook:
ansible-playbook -i inventories/k8s -v playbooks/4-Kubernetes-k8s.yml
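Step 2 above (adding a node to the [abbyy_workers] section) can also be scripted. The sketch below is illustrative only: the helper `add_worker`, the host names, the IP addresses, and the `[example_other_group]` section are all hypothetical, and the entry format `NAME ansible_host=IP` is the standard Ansible INI inventory form, assumed here rather than confirmed for the Vantage inventory file. Check an existing entry in your inventory and match its format before using anything like this.

```shell
#!/bin/sh
# Append a host line to the end of the [abbyy_workers] section.
# Assumption: entries use the Ansible INI form "NAME ansible_host=IP".
# Usage: add_worker INVENTORY_FILE NODE_NAME NODE_IP
add_worker() {
    awk -v entry="$2 ansible_host=$3" '
        # A new section header ends the workers section: add the entry first.
        /^\[/ && in_workers { print entry; in_workers = 0 }
        { print }
        $0 == "[abbyy_workers]" { in_workers = 1 }
        # The workers section was the last one in the file.
        END { if (in_workers) print entry }
    ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Demonstration on a throwaway sample file (all hosts are hypothetical).
sample=$(mktemp)
cat > "$sample" <<'EOF'
[abbyy_workers]
worker1 ansible_host=10.0.0.21
worker2 ansible_host=10.0.0.22
[example_other_group]
host9 ansible_host=10.0.0.99
EOF

add_worker "$sample" worker3 10.0.0.23
cat "$sample"
```

The new entry is appended after the existing workers, so the order of the nodes already listed in the inventory file is left unchanged. Review the edited file before running the installer.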