Name | Type | Multiplicity | Parent Tag | Description |
document | Complex Type. Type elements: page, documentData. Type attributes: version — XML version; producer — the producer of the XML file; pagesCount — (optional) the number of pages in the document; mainLanguage — (optional) the main language of the document; languages — (optional) all languages of the document. | 1 | no | Document. |
page | Complex Type, a sequence of block tags. Type elements: block. Type attributes: width — the image width in pixels; height — the image height in pixels; resolution — the image resolution in pixels per inch; originalCoords — (optional) if true, all coordinates are relative to the original image before opening, otherwise they are relative to the opened (deskewed) image; rotation — (optional) the type of rotation applied to the original page image before processing. It can be one of the following values: Normal, RotatedClockwise, RotatedUpsideDown, RotatedCounterclockwise. | 0…unbounded | document | Recognized page. |
block | BlockType. BlockType elements (the availability of an element depends on the type of the block, see the blockType attribute): region — always available; text — available only if the blockType attribute is “Text”; row — available only if the blockType attribute is “Table”; separatorsBox — available only if the blockType attribute is “SeparatorsBox”; separator — available only if the blockType attribute is “Separator”; checkmark — available only if the blockType attribute is “Checkmark”; groupCheckmark — available only if the blockType attribute is “GroupCheckmark”. BlockType attributes: blockType — the type of the block. It can be one of the following values: Text, Table, Picture, Barcode, Separator, SeparatorsBox, Checkmark, GroupCheckmark; blockName — (optional) the name of the block; isHidden — (optional) specifies if the block is hidden (the default value is false); l — (optional) the coordinate of the left border of the block; t — (optional) the coordinate of the top border of the block; r — (optional) the coordinate of the right border of the block; b — (optional) the coordinate of the bottom border of the block. | 0…unbounded | page | Recognized block. |
region | Complex Type, a sequence of rect tags. Type elements: rect. Has no type attributes. | 1 | block | Block region, a set of rectangles. |
rect | Complex Type. Type attributes: l — the coordinate of the left border of the rectangle; t — the coordinate of the top border of the rectangle; r — the coordinate of the right border of the rectangle; b — the coordinate of the bottom border of the rectangle. | 1…unbounded | region | Rectangle of a block region. |
text | TextType. TextType elements: par. TextType attributes: orientation — (optional) the text orientation. It can be one of the following values: Normal, RotatedClockwise, RotatedUpsidedown, RotatedCounterclockwise (the default value is Normal); backgroundColor — (optional) the background color of the text (the default value is -1, which means that the color is transparent); mirrored — (optional) specifies if the text is mirrored (the default value is false); inverted — (optional) specifies if the text is inverted (the default value is false). | 0…unbounded | block | Text of a recognized text block (present as an element of the block tag if the blockType attribute is “Text”). |
text | TextType (described in the row above) | 0…unbounded | cell | Text of a table cell. |
par | ParagraphType. ParagraphType elements: line. ParagraphType attributes: dropCapCharsCount — (optional) the number of drop caps in the paragraph (the default value is 0); dropCap-l — (optional) the left coordinate of the drop cap rectangle; dropCap-t — (optional) the top coordinate of the drop cap rectangle; dropCap-r — (optional) the right coordinate of the drop cap rectangle; dropCap-b — (optional) the bottom coordinate of the drop cap rectangle; align — (optional) the paragraph alignment. It can be one of the following values: Left, Center, Right, Justified (the default value is Left); leftIndent — (optional) the left paragraph indent (the default value is 0); rightIndent — (optional) the right paragraph indent (the default value is 0); startIndent — (optional) the indent of the first line of the paragraph (the default value is 0); lineSpacing — (optional) the spacing between lines (the default value is 0); isListItem — (optional) indicates that the paragraph is part of a list (the default value is false); lstLvl — (optional) the list level; lstNum — (optional) the number of the paragraph in the list. | 0…unbounded | text | Paragraph of a recognized text. |
line | LineType. LineType elements: formatting. LineType attributes: baseline — the distance from the base line to the top edge of the page; l — the coordinate of the left border of the surrounding rectangle; t — the coordinate of the top border of the surrounding rectangle; r — the coordinate of the right border of the surrounding rectangle; b — the coordinate of the bottom border of the surrounding rectangle. | 0…unbounded | par | Line of a paragraph. |
formatting | FormattingType. FormattingType group: charParams or wordRecVariants. FormattingType attributes: lang — the name of the language; ff — (optional) the name of the font; fs — (optional) the size of the font; bold — (optional) the bold font style (the default value is false); italic — (optional) the italic font style (the default value is false); subscript — (optional) the subscript font effect (the default value is false); superscript — (optional) the superscript font effect (the default value is false); smallcaps — (optional) the small caps font effect (the default value is false); underline — (optional) the underline font effect (the default value is false); strikeout — (optional) the strikeout font effect (the default value is false); color — (optional) the color of the font (the default value is 0); scaling — (optional) the scaling of the font (the default value is 1000); spacing — (optional) the character spacing (the default value is 0). | 0…unbounded | line | Group of characters with uniform formatting. Character attributes alternate with word recognition variants. The recognition variants of a word are written before the word. |
charParams | CharParamsType. CharParamsType elements: charRecVariants. CharParamsType attributes: l — the coordinate of the left border of the character rectangle; t — the coordinate of the top border of the character rectangle; r — the coordinate of the right border of the character rectangle; b — the coordinate of the bottom border of the character rectangle; suspicious — (optional) this property set to TRUE means that the character was recognized uncertainly (the default value is false); proofed — (optional) specifies whether spell-checking was performed upon this character (the default value is false); wordStart — (optional, deprecated) this property set to TRUE marks the leftmost character in a word; wordFirst — (optional) this property set to TRUE marks the first character in a word; wordLeftmost — (optional) this property set to TRUE marks the leftmost character in a word; wordFromDictionary — (optional) specifies whether the word was found in the dictionary; wordNormal — (optional) specifies whether the word was recognized with either a standard or user-defined language, and that it is not a number or an identifier; wordNumeric — (optional) specifies whether the word is a number; wordIdentifier — (optional) specifies whether the word is an identifier (abbreviation, URL, etc.); wordPenalty — (optional) penalty for discordance of characters in the word; meanStrokeWidth — (optional) the mean width of the stroke in the RLE representation of the word image, expressed in pixels multiplied by 10; charConfidence — (optional) stores the value of character confidence. It is a numerical estimate of the probability that the recognition was correct. However, this number is not guaranteed to be positive, and the only meaningful use of confidence is to compare different recognition variants of the same character; serifProbability — (optional) specifies the probability that the character is written with a Serif font; isTab — (optional) specifies if the character is a tab; tabLeaderCount — (optional) specifies the number of symbols in the tab leader. The number is calculated at the synthesis stage considering font and tab width. This attribute is used if isTab=TRUE. | 0…unbounded | formatting | Attributes of a single character. |
charRecVariants | Complex Type, a sequence of charRecVariant tags. Type elements: charRecVariant. Has no type attributes. | | charParams | Variants of character recognition. |
charRecVariant | CharRecognitionVariant. Type attributes: charConfidence — (optional) a numerical estimate of the probability that the recognition was correct; serifProbability — (optional) the probability that a character is written with a Serif font. | 0…unbounded | charRecVariants | Variant of character recognition. |
wordRecVariants | Complex Type, a sequence of wordRecVariant tags. Type elements: wordRecVariant. Has no type attributes. | | formatting | Variants of recognition of the next word. |
wordRecVariant | WordRecognitionVariant type. WordRecognitionVariant elements: variantText. WordRecognitionVariant attributes: wordFromDictionary — (optional) specifies whether the word was found in the dictionary; wordNormal — (optional) specifies whether the word was recognized with a standard or user-defined language, and that it is not a number or an identifier; wordNumeric — (optional) specifies whether the word is a number; wordIdentifier — (optional) specifies whether the word is an identifier (abbreviation, URL, etc.); wordPenalty — (optional) penalty for discordance of characters in the word; meanStrokeWidth — (optional) the mean width of the stroke in the RLE representation of the word image, expressed in pixels multiplied by 10. | 0…unbounded | wordRecVariants | Variant of recognition of the next word. |
variantText | Complex Type, a sequence of charParams tags. Type elements: charParams. Has no type attributes. | 1 | wordRecVariant | Word. |
row | TableRowType. TableRowType elements: cell. Has no type attributes. | 0…unbounded | block | Table row (present if the blockType attribute is “Table”). |
cell | Complex Type, a sequence of TextType tags. Type elements: text. Type attributes: colSpan — (optional) the column span of the cell (the default value is 1); rowSpan — (optional) the row span of the cell (the default value is 1); align — (optional) this property specifies alignment for a tab stop and can be one of the following values: Top, Center, Bottom (the default value is Top); picture — (optional) specifies if the cell contains only a picture (the default value is false); leftBorder — (optional) the table cell left border type. It can be one of the following values: Absent, Unknown, White, Black (the default value is Black); topBorder — (optional) the table cell top border type. It can be one of the following values: Absent, Unknown, White, Black (the default value is Black); rightBorder — (optional) the table cell right border type. It can be one of the following values: Absent, Unknown, White, Black (the default value is Black); bottomBorder — (optional) the table cell bottom border type. It can be one of the following values: Absent, Unknown, White, Black (the default value is Black); width — the width of the cell; height — the height of the cell. | 0…unbounded | row | Table cell (present if the blockType attribute is “Table”). |
separatorsBox | Complex Type, a sequence of separator tags. Type elements: separator. Has no type attributes. | 0…1 | block | Group of separators, present if the blockType attribute is “SeparatorsBox”. |
separator | SeparatorBlockType type. SeparatorBlockType elements: start, end. SeparatorBlockType attributes: thickness — specifies the precise width of the separator in pixels; type — specifies the type of the separator. It can be one of the following values: Unknown, Black, Dotted. | 0…1 | block | Single separator, present if the blockType attribute is “Separator”. |
separator | SeparatorBlockType type (described in the row above) | 0…unbounded | separatorsBox | Separator in a group of separators. |
groupCheckmark | Complex Type, a sequence of checkmark tags. Type elements: checkmark. Has no type attributes. | 0…1 | block | Group of checkmarks, present if the blockType attribute is “GroupCheckmark”. |
checkmark | CheckmarkBlockType type. CheckmarkBlockType attributes: confidence — specifies the confidence of checkmark recognition; value — specifies the checkmark state. It can be one of the following values: Unknown, Checked, Unchecked, Corrected. | 0…1 | block | Single checkmark, present if the blockType attribute is “Checkmark”. |
checkmark | CheckmarkBlockType type (described in the row above) | 0…unbounded | groupCheckmark | Checkmark in a group of checkmarks. |
barcodeInfo | BarcodeInfoType type. BarcodeInfoType attributes: type — specifies the type of the barcode. It can be one of the following values: CODE39, INTERLEAVED25, EAN13, CODE128, EAN8, PDF417, CODABAR, UPCE, INDUSTRIAL25, IATA25, MATRIX25, CODE93, POSTNET, UCC128, PATCH, AZTEC, DATAMATRIX, QRCODE, UPCA, MAXICODE, CODE32, FULLASCII, ROYAL, KIX, INTELLIGENT, AUSTRALIA_POST, Unknown; supplement — (optional) specifies the type of supplementary barcode. It can be one of the following values: void, 2dig, 5dig. | 0…1 | block | Information about the barcode, present if the blockType attribute is “Barcode”. |
start | Point type. Point attributes: x — the horizontal coordinate of the start point of the separator; y — the vertical coordinate of the start point of the separator. | 1 | separator | Start point of a separator. |
end | Point type. Point attributes: x — the horizontal coordinate of the end point of the separator; y — the vertical coordinate of the end point of the separator. | 1 | separator | End point of a separator. |
documentData | Complex Type. Type elements: paragraphStyles, sections. Has no type attributes. | 0…1 | document | Parameters of paragraph and font styles of the document. |
paragraphStyles | Complex Type, a sequence of paragraphStyle tags. Type elements: paragraphStyle. Has no type attributes. | 0…1 | documentData | Collection of paragraph formatting styles. |
paragraphStyle | ParagraphStyleType type. ParagraphStyleType elements: fontStyle. ParagraphStyleType attributes: id — the identifier of the paragraph; name — the name of the paragraph style; mainFontStyleId — the main font style of the paragraph; role — the paragraph role. It can be one of the following values: text, tableText, heading, tableHeading, pictureCaption, tableCaption, contents (table of contents), footnote, endnote, rt (running title), garb (garbage), other, barcode, headingNumber; roleLevel — (optional) the role level (the default value is -1, which means that the level is not available for this role); align — paragraph alignment. It can be one of the following values: Left, Center, Right, Justified, CjkJustified, ThaiJustified; before — (optional) space before the paragraph of this style (the default value is 0); after — (optional) space after the paragraph of this style (the default value is 0); startIndent — (optional) indent of the first line of the paragraph; leftIndent — (optional) left indent of the whole paragraph; rightIndent — (optional) right indent of the whole paragraph; lineSpacing — (optional) line spacing; lineSpacingRatio — (optional) line spacing (proportional to the letter height); fixedLineSpacing — (optional) if true, the line spacing in the paragraph does not vary. | 0…unbounded | paragraphStyles | Formatting style of a paragraph. |
fontStyle | FontStyleType type. FontStyleType attributes: id — the identifier of the font style; baseFont — (optional); italic — (optional) if true, the font is italic; bold — (optional) if true, the font is bold; underline — (optional) if true, the font is underlined; strikeout — (optional) if true, the font is struck out; smallcaps — (optional) if true, the font is small caps; scaling — (optional) the scaling of the font (the default value is 1000); spacing — (optional) the character spacing (the default value is 0); color — (optional) the color of the font (the default value is 0); backgroundColor — (optional) the background color (the default value is 0); ff — the name of the font; fs — the size of the font. | 0…unbounded | paragraphStyle | The font style. |
sections | Complex Type, a sequence of section tags. Type elements: section. Has no type attributes. | 0…1 | documentData | The collection of document sections. |
section | SectionType type. SectionType elements: stream. Has no type attributes. | 0…unbounded | sections | A document section. |
stream | TextStreamType type. TextStreamType elements: mainText, elemId. TextStreamType attributes: role — (optional) the stream role. It can be one of the following values: garb, text, footnote, incut (the default value is text); vertCjk — (optional) if true, the stream contains vertical CJK text; beginPage — the number of the page on which the stream begins; endPage — (optional) the number of the page on which the stream ends. | 0…unbounded | section | A sequence of paragraphs and blocks. |
mainText | Complex Type. Type attributes: rtl — (optional) if true, the text has right-to-left writing direction; columnCount — the number of columns. | 0…1 | stream | |
elemId | Complex Type. Type attributes: id — string ID of the element. | 0…unbounded | stream | The ID of a page element. |
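
The nesting described in the table (document, page, block, text, par, line, formatting, charParams) can be walked with a standard XML parser. The following is a minimal sketch in Python, not a reference implementation. It rests on several assumptions that the table does not state: the sample document is hand-written for illustration, its attribute values (such as the `version`, `producer` and `lang` values) are hypothetical, it declares no XML namespace (a real file may declare one, in which case the tag names would need to be qualified), and each charParams element is assumed to carry the recognized character as its text content. The helper name `extract_lines` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample following the element hierarchy in the table above.
# Assumptions: no XML namespace, and each charParams element holds the
# recognized character as its text content. Attribute values are made up.
SAMPLE = """\
<document version="1.0" producer="example" pagesCount="1">
  <page width="800" height="600" resolution="300">
    <block blockType="Text" l="10" t="10" r="200" b="40">
      <region><rect l="10" t="10" r="200" b="40"/></region>
      <text>
        <par align="Left">
          <line baseline="35" l="10" t="10" r="200" b="40">
            <formatting lang="EnglishUnitedStates" bold="true">
              <charParams l="10" t="10" r="20" b="40">H</charParams>
              <charParams l="21" t="10" r="30" b="40" suspicious="true">i</charParams>
            </formatting>
          </line>
        </par>
      </text>
    </block>
    <block blockType="Picture" l="0" t="100" r="50" b="150">
      <region><rect l="0" t="100" r="50" b="150"/></region>
    </block>
  </page>
</document>
"""


def extract_lines(xml_text):
    """Return (page_index, line_text, suspicious_count) for every text line."""
    root = ET.fromstring(xml_text)
    results = []
    for page_index, page in enumerate(root.findall("page")):
        for block in page.findall("block"):
            # Only blocks with blockType "Text" contain a text element.
            if block.get("blockType") != "Text":
                continue
            for line in block.iter("line"):
                chars = []
                suspicious = 0
                # Direct charParams children of formatting only; this skips
                # the charParams nested inside wordRecVariants.
                for cp in line.findall("formatting/charParams"):
                    chars.append(cp.text or "")
                    if (cp.get("suspicious") or "false").lower() == "true":
                        suspicious += 1
                results.append((page_index, "".join(chars), suspicious))
    return results


if __name__ == "__main__":
    for page_index, text, suspicious in extract_lines(SAMPLE):
        print(page_index, repr(text), suspicious)
```

The sketch skips non-text blocks by checking the blockType attribute, and it counts characters whose suspicious attribute is set so that uncertain lines can be reviewed separately. Table blocks would need an extra walk through row and cell elements before reaching text.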