Introduction
The number of services and workers that an ABBYY Vantage installation needs depends on the load. ABBYY Vantage automatically scales its services and workers to optimize document processing. This guide describes the resources that ABBYY Vantage requires at different loads and gives the System Administrator recommendations on how to provide those resources.
Reference Configurations
Resource consumption depends on your document processing scenario: the type of documents being processed, the skill being used, and the page load (that is, the number of pages processed within a certain time period).
The reference Highly available configuration was tested while processing 3-page and 50-page invoices using the default Process skill with the following loads:
- 50,000 pages per 8 hours
- 100,000 pages per 8 hours
- 150,000 pages per 8 hours
- 200,000 pages per 8 hours
The reference Without high availability configuration was tested while processing 3-page invoices using the default Process skill with the following loads:
- 10,000 pages per 8 hours
- 30,000 pages per 8 hours
- 50,000 pages per 8 hours
The Without high availability configuration doesn’t support training skills with the Deep Learning activity.
During the tests of the reference configurations, files were fed to the system via the REST API. The default Process skill with the following workflow was used:
- Import files.
- Recognize documents.
- Classify and determine document types.
- Extract data from documents.
- Export data to JSON.
Node Types
| Node type | CPU cores (for each node) | RAM, GB (for each node) | Disk size, GB |
|---|---|---|---|
| Service nodes | 12 | 48 | 120* |
| Worker nodes | 12 | 48 | 120 |
*The disk sizes listed above are minimums, so additional disk space may be required. By default, Vantage installs NFS file storage on virtual machines. In this case, the virtual machine used as the first service node will require additional disk space, depending on the load.
Storage Requirements
| Configuration | Storage | Storage location | Disk size, GB (per 10,000 pages per 8 hours) |
|---|---|---|---|
| Without high availability | Internal NFS | Service node | 500 |
| Without high availability | External NFS | NFS server machine | 500 |
| Highly available | External NFS | NFS server machine | 50 |
| Highly available | Local persistent volume | First service node (from the inventory file) | 500 |
You may need additional storage if you use large data catalogs or skills with a large number of activities, or if you export data to shared folders.
We recommend using external storage if the load is greater than 10,000 pages per 8 hours.
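The per-load storage figures above can be turned into a rough estimate with a little arithmetic. The following is a minimal sketch, not an official sizing tool: the function name `storage_gb` is hypothetical, and it assumes that storage scales linearly with the load and that a partial 10,000-page increment is rounded up to a full one.

```shell
# Estimate the storage size in GB for a given page load.
# Assumptions (not stated in the guide): linear scaling, and any partial
# 10,000-page increment is rounded up to a full increment.
storage_gb() {
  pages=$1        # load in pages per 8 hours
  gb_per_10k=$2   # Disk size value from the Storage Requirements table (500 or 50)
  echo $(( (pages + 9999) / 10000 * gb_per_10k ))
}

storage_gb 30000 500    # Without high availability, internal NFS: prints 1500
storage_gb 150000 50    # Highly available, external NFS: prints 750
```

The result is only the storage table's share of the disk. The minimum disk size from the Node Types table and any extra space for data catalogs or exports still need to be added on top.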
In the tests, ABBYY Vantage required the following resources to process documents efficiently at each page load in each configuration:
Highly Available Configuration
| Load (pages/8 hours) | Nodes for services (3-page invoices) | Nodes for services (50-page invoices) | Nodes for workers (3-page invoices) | Nodes for workers (50-page invoices) |
|---|---|---|---|---|
| 50,000 | 4 | 4 | 4 | 4 |
| 100,000 | 4 | 4 | 5 | 7 |
| 150,000 | 4 | 4 | 7 | 9 |
| 200,000 | 4 | 4 | 8 | 11 |
During testing, statistics were also collected on the input/output operations of the disk used for blob storage. For comparable loads and documents, you can expect your numbers not to exceed these:
Disk I/O Operations
| Load (pages/8 hours) | Disk I/O operations/second (3-page invoices) | Disk I/O operations/second (50-page invoices) |
|---|---|---|
| 50,000 | 100 | 50 |
| 100,000 | 250 | 100 |
| 150,000 | 400 | 170 |
| 200,000 | 600 | 230 |
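The two Highly available tables above can be combined into a small lookup that also totals CPU and RAM using the per-node figures from the Node Types table (12 cores and 48 GB of RAM per node). This is a sketch only: the function name `ha_sizing` is hypothetical, and sizing an untested intermediate load like the next tested load above it is an assumption, not a documented rule.

```shell
# Look up reference figures for the Highly available configuration.
# Usage: ha_sizing <pages per 8 hours> <invoice length: 3 or 50>
ha_sizing() {
  load=$1
  doc_pages=$2
  # Round the load up to the next tested load (assumption: untested
  # intermediate loads need the resources of the next tested load).
  tested_load=$(( (load + 49999) / 50000 * 50000 ))
  case "$doc_pages:$tested_load" in
    3:50000)   workers=4;  iops=100 ;;
    3:100000)  workers=5;  iops=250 ;;
    3:150000)  workers=7;  iops=400 ;;
    3:200000)  workers=8;  iops=600 ;;
    50:50000)  workers=4;  iops=50  ;;
    50:100000) workers=7;  iops=100 ;;
    50:150000) workers=9;  iops=170 ;;
    50:200000) workers=11; iops=230 ;;
    *) echo "no reference data for load=$load, invoice length=$doc_pages" >&2
       return 1 ;;
  esac
  nodes=$(( 4 + workers ))   # 4 service nodes at every tested load
  echo "workers=$workers cores=$(( nodes * 12 )) ram_gb=$(( nodes * 48 )) iops=$iops"
}

ha_sizing 120000 50   # prints: workers=9 cores=156 ram_gb=624 iops=170
```

Loads above 200,000 pages per 8 hours and invoice lengths other than 3 or 50 pages return an error, because the reference tests do not cover them.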
Without High Availability Configuration
| Load (pages/8 hours) | Nodes for services | Nodes for workers |
|---|---|---|
| 10,000 | 1 | 1* |
| 30,000 | 1 | 3 |
| 50,000 | 1 | 3 |
*The configuration with one worker node is intended for testing purposes only and does not support training skills with any activity.
No increase in document processing time was observed when ABBYY Vantage was scaled.
Managing Nodes
The System Administrator can add worker nodes to the cluster to handle a higher load. For more information on how to prepare a node, see System Requirements.
Adding a Worker Node
To add a worker node, follow these steps:
- Open an inventory file from the installation directory.
- In the [abbyy_workers] section, add an additional node by specifying its name and IP address.
- Run the installer:
docker run -it \
-v $PWD/kube:/root/.kube \
-v $PWD/ssh/ansible:/root/.ssh/ansible \
-v "//var/run/docker.sock:/var/run/docker.sock" \
-v $PWD/inventory:/ansible/inventories/k8s/inventory \
-v $PWD/env_specific.yml:/ansible/inventories/k8s/group_vars/all/env_specific.yml \
-v $PWD/ssl:/ansible/files/ssl:ro \
--privileged \
registry.local/vantage/vantage-k8s:2.7.1
- Run the following playbook:
ansible-playbook -i inventories/k8s -v playbooks/4-Kubernetes-k8s.yml
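Before running the installer, it can help to confirm that the new node really appears in the [abbyy_workers] section of the inventory. The sketch below assumes the inventory is a standard INI-style Ansible inventory with one host per line in the form `name ansible_host=IP`. The function name `inventory_section`, the group `other_group`, and all host names and IP addresses are hypothetical, so adapt them to your actual inventory file.

```shell
# Print the host lines that belong to one section of an INI-style inventory.
# Usage: inventory_section <section name> <inventory file>
inventory_section() {
  awk -v want="[$1]" '
    /^\[/ { in_section = ($0 == want); next }           # section header: are we in the wanted one?
    in_section && NF && $0 !~ /^[#;]/ { print }          # skip blank lines and comments
  ' "$2"
}

# Demonstration on a throwaway inventory (hypothetical hosts and addresses).
demo_inventory=$(mktemp)
cat > "$demo_inventory" <<'EOF'
[other_group]
node0 ansible_host=10.0.0.10

[abbyy_workers]
worker1 ansible_host=10.0.0.21
worker2 ansible_host=10.0.0.22
EOF

inventory_section abbyy_workers "$demo_inventory"
rm -f "$demo_inventory"
```

Against a real installation, the call would be `inventory_section abbyy_workers inventory`, run from the installation directory. If the new node is missing from the output, the installer will not see it.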