How to calculate the number of Processing Stations

To make the most of computing resources, each Station runs multiple processing threads at the same time; the more CPU cores available, the more parallel threads are processed. Since the number of CPU cores varies from computer to computer, it makes sense to count the total number of processing CPU cores in the FlexiCapture System. If there are no bottlenecks in the System, each new processing core makes an equal contribution to overall performance. So you can estimate the contribution of one core, then work out how many cores you need to reach your target performance. The number of pages a core processes in a given time depends greatly on the processing workflow (for example, the number of stages), processing settings (image enhancement, recognition mode, export settings), custom stage implementation (custom engines and script rules, access to external resources), and hardware. If you have no data on any of these yet but need a rough estimate, use the graph below as a baseline — though your project will most likely give different results.

Figure: Dependence of performance on the number of processing cores (baseline). The chart plots thousands of pages processed in 24 hours against the number of processing CPU cores; the line rises linearly to about 2,000 thousand pages at 100 cores.

The baseline above was measured on the “SingleEntryPoint” Demo project (unattended processing, export to PDF files) with black-and-white pages, using 10-core Processing Stations at 2.4 GHz with 16 GB RAM, an SSD, and a 1 Gb/s NIC.
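As a rough illustration of how the baseline can be used, the sketch below converts a daily page volume into a core count. It assumes the linear trend read off the chart (about 2,000 thousand pages per 24 hours at 100 cores, which is about 20,000 pages per core per day); the function name and the sample volume are hypothetical, and your project will most likely give different results.

```python
import math

# Read off the baseline chart: roughly 2,000 thousand pages per 24 hours at 100 cores.
# This is approximate and specific to the baseline Demo project and hardware.
BASELINE_PAGES_PER_CORE_PER_DAY = 2_000_000 / 100  # about 20,000 pages per core per day


def rough_cores_from_baseline(pages_per_day: float) -> int:
    """Rough core count for a daily volume, assuming the linear baseline holds."""
    return math.ceil(pages_per_day / BASELINE_PAGES_PER_CORE_PER_DAY)


if __name__ == "__main__":
    # Hypothetical target: 500,000 pages per day.
    print(rough_cores_from_baseline(500_000))  # 25
```

Treat the result only as a starting point until you have measured your own workflow.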

Estimate the number of processing cores

Measure how long one core takes to process one page, then scale that time to your target volume and the time available.

Configure your project workflow, choose the Processing Station closest in hardware to your production setup, and create a typical batch of images.
Measure how long it takes one core to process one batch. Processing a single batch is not enough: during the test, FlexiCapture spreads the work across all available cores, so the batch finishes faster than it would in production, where the other cores are busy with other batches. For a reliable figure, create several copies of your typical batch: at least one copy per core, and ideally k copies per core (k ≥ 3). Process them all at once. Each core then handles k batches, so the time per batch per core is the total processing time divided by k. This figure also accounts for cores competing over the Station’s shared resources.
Calculate the number of cores you need:
N = (P × t) / T
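where N is the number of processing cores, P is the number of pages to process, t is the time one core takes to process one page, and T is the time available (in the same units as t). Round N up to a whole number of cores.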
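The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a FlexiCapture tool: the function names and the sample figures (a 600-second run, 3 batches per core, 50 pages per batch, 100,000 pages in 8 hours) are hypothetical.

```python
import math


def seconds_per_page(run_seconds: float, batches_per_core: int, pages_per_batch: int) -> float:
    """Per-core time for one page, from a test run in which every core
    processed `batches_per_core` copies of the typical batch."""
    seconds_per_batch = run_seconds / batches_per_core
    return seconds_per_batch / pages_per_batch


def cores_needed(pages: int, sec_per_page: float, seconds_available: float) -> int:
    """N = (P * t) / T, rounded up to a whole core."""
    return math.ceil(pages * sec_per_page / seconds_available)


if __name__ == "__main__":
    t = seconds_per_page(run_seconds=600, batches_per_core=3, pages_per_batch=50)
    print(t)                                    # 4.0
    print(cores_needed(100_000, t, 8 * 3600))   # 14
```

Rounding up is deliberate: a fractional core cannot be provisioned, and rounding down would miss the target.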

Worked example

An 8-core Processing Station with Hyper-Threading provides 16 logical cores (16 executive processes).
Create 16 × 3 = 48 copies of a typical batch (×3 to reduce measurement error) and process them all at once.
The run takes 15 minutes. Each core processes 3 batches, so one batch takes about 5 minutes.
The batch has 69 pages, so one page takes about 4.35 seconds.
To process 200,000 pages in 8 hours (28,800 seconds): N = (200,000 × 4.35) / 28,800 ≈ 30.2, which rounds up to 31 cores.
So 2 Processing Stations with 8 cores each and Hyper-Threading (32 logical cores in total) are enough for automatic processing.
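The arithmetic in this example can be checked with a short script. It is only a sketch: the variable names are illustrative, and it assumes, as the example does, that each logical core counts as one processing core.

```python
import math

logical_cores_per_station = 8 * 2        # 8 physical cores with Hyper-Threading
copies = logical_cores_per_station * 3   # 48 copies of the typical batch
run_seconds = 15 * 60                    # the whole run took 15 minutes
batches_per_core = copies // logical_cores_per_station  # 3 batches per core

seconds_per_batch = run_seconds / batches_per_core      # 300 seconds
seconds_per_page = seconds_per_batch / 69               # about 4.35 seconds

pages, seconds_available = 200_000, 8 * 3600
cores = math.ceil(pages * seconds_per_page / seconds_available)
stations = math.ceil(cores / logical_cores_per_station)

print(round(seconds_per_page, 2), cores, stations)  # 4.35 31 2
```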

Limiting factors

Two factors limit the useful number of processing cores in the System.

Infrastructure load

The total load on the infrastructure may create bottlenecks:

On the FlexiCapture server hardware
On the network
On external shared resources (such as databases and external services) that are requested from custom processing scripts

A bottleneck causes performance saturation: adding another processing core has no effect, or even a negative effect, on total performance. This guide describes how to design the System to avoid bottlenecks and how to monitor the hardware and infrastructure for them. Even without a clearly detected bottleneck, competition between processing cores over shared resources grows as you add cores. If you expect to use more than 50% of the read/write capacity of the network or the FileStorage, add 20% to the per-page processing time measured in the worked example above, which in turn means you need about 20% more processing cores. To help cores reach external resources faster, use caching. For example, instead of querying a database directly from your scripts, connect the database to a FlexiCapture Data Set and query the Data Set.
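The 20% allowance can be folded into the core estimate as in the sketch below. The function names are hypothetical, the utilization is expressed as a fraction that you would estimate yourself, and the figures reuse the worked example (4.35 seconds per page, 200,000 pages in 28,800 seconds).

```python
import math


def adjusted_seconds_per_page(sec_per_page: float, peak_io_utilization: float) -> float:
    """Add 20% to the per-page time when the expected network or FileStorage
    read/write utilization (a fraction from 0.0 to 1.0) exceeds 50%."""
    if peak_io_utilization > 0.5:
        return sec_per_page * 1.2
    return sec_per_page


def cores_needed(pages: int, sec_per_page: float, seconds_available: float) -> int:
    """N = (P * t) / T, rounded up to a whole core."""
    return math.ceil(pages * sec_per_page / seconds_available)


if __name__ == "__main__":
    t = adjusted_seconds_per_page(4.35, peak_io_utilization=0.6)
    print(cores_needed(200_000, t, 28_800))  # 37, compared with 31 without the allowance
```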

Processing Server capacity

The Processing Server can serve only a limited number of processing cores. This number depends on the average time a core needs to perform a task, which in turn depends greatly on batch size (in pages) and the customization you implement. Typically, with around 10 pages per batch, the Processing Server can serve 120 processing cores. If you create many custom stages with very fast scripts, or process one page per batch, the average task time drops sharply, which can slightly reduce the maximum number of cores the Server can serve. To detect this, monitor the number of free processing cores on the Processing Server. If you have a queue of documents to process but the number of occupied cores has reached saturation and almost never rises, you have hit this limit. To resolve it:

Process the entire batch without splitting it into small tasks where possible (see the Stage Properties in the Workflow settings dialog).
Process pages in bigger portions: increase the average number of pages per batch, merge several custom stages into one, or move the customization into a standard stage — for example, by adding it to a routing event in that stage’s script.
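The saturation pattern described in this section can be checked against monitoring samples along the following lines. This is a hypothetical sketch: it does not use any FlexiCapture API, the way you collect the samples is up to you, and the tolerance value is an arbitrary assumption.

```python
from typing import Sequence, Tuple


def looks_saturated(
    samples: Sequence[Tuple[int, int]],
    total_cores: int,
    tolerance: int = 2,
) -> bool:
    """Each sample is an (occupied_cores, queued_documents) pair taken over time.

    Flags the pattern described above: documents are always waiting, yet the
    number of occupied cores sits on a plateau below the cores available.
    """
    if not samples:
        return False
    occupied = [o for o, _ in samples]
    always_queued = all(q > 0 for _, q in samples)
    plateau = max(occupied) - min(occupied) <= tolerance
    cores_left_idle = max(occupied) < total_cores
    return always_queued and plateau and cores_left_idle


if __name__ == "__main__":
    # Hypothetical readings: 160 cores registered, about 120 ever occupied.
    readings = [(118, 400), (120, 380), (119, 410), (120, 395)]
    print(looks_saturated(readings, total_cores=160))  # True
```

A positive result only suggests the limit has been reached; confirm it by checking whether the remedies listed above raise the number of occupied cores.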
