Full Page OCR Service.
POST {{baseUrl}}/session/services/fullpageocr
The Full Page OCR Real-Time Service performs full page OCR on submitted images or PDF documents and returns the OCR content in the specified output type.
Service Properties
Env - Metadata environment identifier. Value is one of D, T, or P. Default value is P.
OcrEngineName - String. Specifies the name of the OCR engine to use. The only engine currently supported is “Advanced”, which is assigned to the “OpenText Capture Recognition Engine” for this release. The default OCR engine is “Advanced”.
AutoRotate - Boolean. This is an optional value specifying whether auto rotation should be enabled for the engine. The default is true.
Country - String. This optional value specifies the country for the engine. The default is USA. When passing multiple values in a comma-separated list, the values must be within the country/language groups given below:
- Greek: Greece, Greek
- Latin and Cyrillic languages: Afrikaans, Albanian, Andorra, Argentina, Australia, Austria, AzerbaijaniCyrillic, AzerbaijaniLatin, AzerbaijanCyrillic, AzerbaijanLatin, Baltic, Basque, Belarus, Belarusian, Belgium, BosnianLatin, BosniaLatin, Brazil, Bulgaria, Bulgarian, Canada, Catalan, CentralAmerica, CentralEurope, Chile, Colombia, Croatia, Croatian, Cyrillic, Czech, CzechLanguage, Danish, Denmark, Dutch, English, Estonia, Estonian, Faroese, Finland, Finnish, France, French, Frisian, German, Germany, GreatBritain, Greece, Greek, Guarani, Hani, Hungarian, Hungary, Iceland, Icelandic, Indonesian, Ireland, Irish, Italian, Italy, JapanLatin, KazakhCyrillic, KazakhLatin, KirghizCyrillic, Kirundi, Latin, Latvia, Latvian, Liechtenstein, Lithuania, Lithuanian, Luxembourg, Luxembourgish, Macedonian, Malay, Mexico, Netherlands, NewZealand, Norway, Norwegian, Poland, Polish, Portugal, Portuguese, Quechua, RhaetoRomanic, Romania, Romanian, Russia, Russian, Rwanda, Scandinavia, SerbianCyrillic, SerbianLatin, SerbiaCyrillic, SerbiaLatin, Shona, Slovak, Slovakia, Slovenia, Slovenian, Somali, Sorbian, SouthAfrica, SouthAmerica, SouthAmericaSpanish, Spain, Spanish, Swahili, Sweden, Swedish, Switzerland, TajikCyrillic, Turkey, Turkish, TurkmenCyrillic, TurkmenLatin, Ukraine, Ukrainian, USA, UzbekCyrillic, UzbekLatin, Venezuela, WesternEurope, Wolof, Xhosa, Zulu
- Chinese: ChineseSimplified, ChineseTraditional
- Chinese Hong Kong: ChineseTraditionalHongKong
- Japanese: Japan, Japanese (cannot both be selected)
- Korean: Korean
- Thai: Thai, Thailand (English can be added explicitly)
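The grouping rule for Country can be checked on the client before a request is sent. The following is a minimal sketch, not part of the service: it assumes that every entry in a comma-separated value must fall inside a single group, that names are case-sensitive, and it uses a shortened copy of the Latin and Cyrillic group. The names `COUNTRY_GROUPS` and `check_country_value` are hypothetical.

```python
# Abbreviated copy of the groups listed above. The Latin and Cyrillic set is
# truncated for brevity; real use would need the full list from this page.
COUNTRY_GROUPS = {
    "Greek": {"Greece", "Greek"},
    "LatinCyrillic": {
        "USA", "English", "France", "French", "German", "Germany",
        "Greece", "Greek", "Russia", "Russian", "Spain", "Spanish",
    },
    "Chinese": {"ChineseSimplified", "ChineseTraditional"},
    "ChineseHongKong": {"ChineseTraditionalHongKong"},
    "Japanese": {"Japan", "Japanese"},
    "Korean": {"Korean"},
    # English is included because the Thai group allows adding it explicitly.
    "Thai": {"Thai", "Thailand", "English"},
}


def check_country_value(value):
    """Return the names of all groups that contain every entry in `value`.

    Assumption: all entries must belong to one and the same group.
    Raises ValueError when no single group covers the entries.
    """
    names = [part.strip() for part in value.split(",") if part.strip()]
    if not names:
        raise ValueError("Country value is empty")
    if {"Japan", "Japanese"} <= set(names):
        raise ValueError("Japan and Japanese cannot both be selected")
    matches = [
        group for group, members in COUNTRY_GROUPS.items()
        if set(names) <= members
    ]
    if not matches:
        raise ValueError(f"No single group contains all of: {names}")
    return matches
```

For example, `check_country_value("France,German")` succeeds, while `check_country_value("Korean,USA")` raises `ValueError` because the two names sit in different groups.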
ProcessingMode - String. This optional value specifies the processing mode for the engine. The default is VoteOcrAndEText. This can be one of the following values:
- VoteOcrAndEText - Select this option if your input documents contain mixed content. The data from the file is extracted by running full page OCR. Where possible, electronic text is also extracted and the results are used to refine the OCR results.
- OcrFromImage - Select this option if your input documents are images or contain images only. The data from the file is extracted by running full page OCR.
- ExtractFromEText - Select this option if your input files are PDF files that contain textual data only. Electronic text is extracted natively, as is.
Number of Request Items
This Real-Time Service supports one or more items.
Values Per Request Item
OutputType - Required String. This setting specifies the OCR output type for the request item. It can be one of these values: Pdf, Text. The additional values you can set on the request item depend on the value assigned to OutputType.
- Values for OutputType Pdf
- Version - String. Can be one of these optional values: Pdf, Pdf14, Pdf15, Pdf16, Pdf17, PdfA1A, PdfA1B, PdfA2A, PdfA2B, PdfA2U. If not provided, the default value is “Pdf”. Mapping to PDF version: Pdf -> PDF 1.7, Pdf14 -> PDF 1.4, Pdf15 -> PDF 1.5, Pdf16 -> PDF 1.6, Pdf17 -> PDF 1.7, PdfA1A -> PDF/A-1a, PdfA1B -> PDF/A-1b, PdfA2A -> PDF/A-2a, PdfA2B -> PDF/A-2b, PdfA2U -> PDF/A-2u.
- Compression - String. Sets the compression level to apply to the text in the output PDF file. Can be one of these optional values: “None”, “Low”, “Medium”, “High”. If not provided, the default value is “Medium”.
- ImageSelection - String. Can be one of these optional values. If not provided, the default value is “OriginalImage”.
- NoImage - Only extracted text will be included into the output file.
- OriginalImage - Extracted text will be included and the source image will be set as a background for the page in the output file.
- ResultImage - Extracted text will be included and the processed image will be set as a background for the page in the output file.
- ImageResolutionLimit - Number. Limits the resolution of images (color, grey, binary) to the provided value in DPI. If the image resolution is lower than this value, the image remains unchanged; otherwise it is scaled to this value. Valid values are from 0 to 300. The default value is 150, which is also used when the value is outside the 0 to 300 range.
- JpegCompressionLevel - Number. Controls the compression rate of the JPEG. Higher compression rates produce smaller files with lower image quality. The current version supports compression rates from 10 to 90 in increments of 5 (10, 15, 20, 25, and so on). Other values in the range 10 to 90 are rounded to the nearest valid value; for example, 13 is rounded to 15. The default value is 80, which is also used when the value is outside the 10 to 90 range.
- BinaryImage - Boolean. Specifies whether to convert images in the PDF file to binary format. When the value is ‘true’, the PDF images are converted to binary format. The default value is ‘false’.
- Values for OutputType Text
- None for output type “Text”. Encoding is “Unicode”.
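The range and rounding rules for ImageResolutionLimit and JpegCompressionLevel can be previewed on the client. The sketch below only mirrors the rules as described above for integer inputs; the service applies them itself, and the function names are hypothetical.

```python
def effective_jpeg_level(level, default=80):
    """Return the JPEG compression level the documented rules would yield.

    Values outside 10..90 fall back to the default; values inside the range
    are rounded to the nearest multiple of 5. Assumes an integer input.
    """
    if level < 10 or level > 90:
        return default
    # Adding 2 before integer division rounds to the nearest multiple of 5.
    return 5 * ((level + 2) // 5)


def effective_resolution_limit(dpi, default=150):
    """Return the DPI limit the documented rules would yield.

    Values outside 0..300 fall back to the default.
    """
    if dpi < 0 or dpi > 300:
        return default
    return dpi
```

With these rules, `effective_jpeg_level(13)` gives 15, matching the example in the JpegCompressionLevel description, and `effective_resolution_limit(400)` gives the default of 150.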
Files Per Request Item
Each item can have one or more files. Each file can be either an embedded file or a reference to a file ID previously posted to the Files Resource. The supported file input types for color and grayscale images are JPEG and PNG. The supported file input type for binary images is TIFF G4.
Request Body
{"serviceProps":[{"name":"Env","value":"D"},{"name":"OcrEngineName","value":"Advanced"}],"requestItems":[{"nodeId":1,"values":[{"name":"OutputType","value":"text"}],"files":[{"name":"DoodadPage1","value":"F_5fca44b57c4b4cddae84a7be36864c4bTIF","contentType":"image/tiff"}]}]}
HEADERS
Key | Datatype | Required | Description |
---|---|---|---|
Content-Type | string | | |
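A request like the one above could be assembled and sent as follows. This is a sketch under stated assumptions: the `Content-Type` value `application/json` is assumed (the headers table does not give one), authentication and session handling are not covered in this section and are left out, and the helper names `build_request_body` and `post_full_page_ocr` are hypothetical.

```python
import json
import urllib.request


def build_request_body(file_id, file_name, content_type,
                       output_type="Text", env="P"):
    """Assemble a single-item request body in the shape shown above.

    `file_id` refers to a file previously posted to the Files Resource.
    """
    return {
        "serviceProps": [
            {"name": "Env", "value": env},
            {"name": "OcrEngineName", "value": "Advanced"},
        ],
        "requestItems": [
            {
                "nodeId": 1,
                "values": [{"name": "OutputType", "value": output_type}],
                "files": [
                    {
                        "name": file_name,
                        "value": file_id,
                        "contentType": content_type,
                    }
                ],
            }
        ],
    }


def post_full_page_ocr(base_url, body):
    """POST the body to the service and return the decoded JSON response.

    Assumption: the service accepts application/json. Any authentication
    the deployment requires must be added by the caller.
    """
    request = urllib.request.Request(
        url=f"{base_url}/session/services/fullpageocr",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping body construction separate from the network call makes the payload easy to inspect or log before it is sent.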
RESPONSES
status: OK
{"returnStatus":{"status":200,"code":"OK0000","message":"","server":"WS-S7699312da37548f4a2bf9921c4a66d90"},"id":"REQ1","serviceName":"FullPageOCR","executionMilliSeconds":847,"licenseUsedPercent":0,"resultItems":[{"nodeId":1,"errorCode":"","errorMessage":"","files":[{"name":"Doodad-p1","value":"F_de487b9914be497681d09a927b060c27DAT","contentType":"text/plain","src":"https://{host}/cp-rest/session/files/F_de487b9914be497681d09a927b060c27DAT","fileType":"txt"}]}]}
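The output files can be pulled out of a response shaped like the one above. The sketch below assumes that a `returnStatus.status` other than 200 or a non-empty `errorCode` on a result item signals failure; the section does not document error responses, so that handling is a guess, and `extract_result_files` is a hypothetical helper.

```python
import json


def extract_result_files(response_text):
    """Return (nodeId, file name, src URL) tuples from a response body.

    Assumption: a status other than 200 or a non-empty errorCode on an
    item means the request or that item failed.
    """
    payload = json.loads(response_text)
    status = payload["returnStatus"]
    if status["status"] != 200:
        raise RuntimeError(f"{status['code']}: {status['message']}")

    results = []
    for item in payload.get("resultItems", []):
        if item.get("errorCode"):
            raise RuntimeError(
                f"Item {item['nodeId']} failed: "
                f"{item['errorCode']} {item.get('errorMessage', '')}"
            )
        for entry in item.get("files", []):
            results.append((item["nodeId"], entry["name"], entry["src"]))
    return results
```

Applied to the sample response above, this yields one tuple for node 1 with the file name `Doodad-p1` and its `src` link under `/cp-rest/session/files/`.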