OpenText Developer Cloud public resources API Documentation

Full Page OCR Service

POST {{baseUrl}}/session/services/fullpageocr

The Full Page OCR Real-Time Service performs full-page OCR on submitted images or PDF documents and returns the OCR content in the specified output type.

Service Properties

  • Env - String. Metadata environment identifier. Must be one of D, T, or P. The default is P.

  • OcrEngineName - String. Specifies the OCR engine to use. The only currently supported engine is “Advanced”, which in this release is the “OpenText Capture Recognition Engine”. The default is “Advanced”.

  • AutoRotate - Boolean. This is an optional value specifying whether auto rotation should be enabled for the engine. The default is true.

  • Country - String. Optional; specifies the country or language for the engine. The default is USA. When passing multiple values as a comma-separated list, all values must come from the same one of the country/language groups below:

    • Greek: Greece, Greek
    • Latin and Cyrillic languages: Afrikaans, Albanian, Andorra, Argentina, Australia, Austria, AzerbaijaniCyrillic, AzerbaijaniLatin, AzerbaijanCyrillic, AzerbaijanLatin, Baltic, Basque, Belarus, Belarusian, Belgium, BosnianLatin, BosniaLatin, Brazil, Bulgaria, Bulgarian, Canada, Catalan, CentralAmerica, CentralEurope, Chile, Colombia, Croatia, Croatian, Cyrillic, Czech, CzechLanguage, Danish, Denmark, Dutch, English, Estonia, Estonian, Faroese, Finland, Finnish, France, French, Frisian, German, Germany, GreatBritain, Greece, Greek, Guarani, Hani, Hungarian, Hungary, Iceland, Icelandic, Indonesian, Ireland, Irish, Italian, Italy, JapanLatin, KazakhCyrillic, KazakhLatin, KirghizCyrillic, Kirundi, Latin, Latvia, Latvian, Liechtenstein, Lithuania, Lithuanian, Luxembourg, Luxembourgish, Macedonian, Malay, Mexico, Netherlands, NewZealand, Norway, Norwegian, Poland, Polish, Portugal, Portuguese, Quechua, RhaetoRomanic, Romania, Romanian, Russia, Russian, Rwanda, Scandinavia, SerbianCyrillic, SerbianLatin, SerbiaCyrillic, SerbiaLatin, Shona, Slovak, Slovakia, Slovenia, Slovenian, Somali, Sorbian, SouthAfrica, SouthAmerica, SouthAmericaSpanish, Spain, Spanish, Swahili, Sweden, Swedish, Switzerland, TajikCyrillic, Turkey, Turkish, TurkmenCyrillic, TurkmenLatin, Ukraine, Ukrainian, USA, UzbekCyrillic, UzbekLatin, Venezuela, WesternEurope, Wolof, Xhosa, Zulu
    • Chinese: ChineseSimplified, ChineseTraditional
    • Chinese Hong Kong: ChineseTraditionalHongKong
    • Japanese: Japan, Japanese (Japan and Japanese cannot both be selected.)
    • Korean: Korean
    • Thai: Thai, Thailand (English can be added explicitly)

  • ProcessingMode - String. Optional; specifies the processing mode for the engine. The default is VoteOcrAndEText. Can be one of the following values:

    • VoteOcrAndEText - Select this option if your input documents contain mixed content. Data is extracted from the file by running full-page OCR; where possible, electronic text is also extracted and used to refine the OCR results.
    • OcrFromImage - Select this option if your input documents are images or contain only images. Data is extracted from the file by running full-page OCR.
    • ExtractFromEText - Select this option if your input files are PDF files that contain only textual data. Electronic text is extracted natively, as is.
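
The sketch below assembles the serviceProps array from the properties listed above and checks the documented rules client-side. The {"name": ..., "value": ...} shape comes from the curl example further down. Several details are assumptions rather than documented behavior: AutoRotate is sent as a JSON boolean, Country values are joined with a bare comma, and English is treated as allowed alongside Thai. The function names are hypothetical, and the server remains the final authority on validation.

```python
from __future__ import annotations

# Latin and Cyrillic group, copied from the list above.
_LATIN_AND_CYRILLIC = """
Afrikaans, Albanian, Andorra, Argentina, Australia, Austria,
AzerbaijaniCyrillic, AzerbaijaniLatin, AzerbaijanCyrillic, AzerbaijanLatin,
Baltic, Basque, Belarus, Belarusian, Belgium, BosnianLatin, BosniaLatin,
Brazil, Bulgaria, Bulgarian, Canada, Catalan, CentralAmerica, CentralEurope,
Chile, Colombia, Croatia, Croatian, Cyrillic, Czech, CzechLanguage, Danish,
Denmark, Dutch, English, Estonia, Estonian, Faroese, Finland, Finnish,
France, French, Frisian, German, Germany, GreatBritain, Greece, Greek,
Guarani, Hani, Hungarian, Hungary, Iceland, Icelandic, Indonesian, Ireland,
Irish, Italian, Italy, JapanLatin, KazakhCyrillic, KazakhLatin,
KirghizCyrillic, Kirundi, Latin, Latvia, Latvian, Liechtenstein, Lithuania,
Lithuanian, Luxembourg, Luxembourgish, Macedonian, Malay, Mexico,
Netherlands, NewZealand, Norway, Norwegian, Poland, Polish, Portugal,
Portuguese, Quechua, RhaetoRomanic, Romania, Romanian, Russia, Russian,
Rwanda, Scandinavia, SerbianCyrillic, SerbianLatin, SerbiaCyrillic,
SerbiaLatin, Shona, Slovak, Slovakia, Slovenia, Slovenian, Somali, Sorbian,
SouthAfrica, SouthAmerica, SouthAmericaSpanish, Spain, Spanish, Swahili,
Sweden, Swedish, Switzerland, TajikCyrillic, Turkey, Turkish,
TurkmenCyrillic, TurkmenLatin, Ukraine, Ukrainian, USA, UzbekCyrillic,
UzbekLatin, Venezuela, WesternEurope, Wolof, Xhosa, Zulu
"""

# A multi-value Country setting must draw every entry from one group.
COUNTRY_GROUPS = {
    "Greek": {"Greece", "Greek"},
    "Latin and Cyrillic": {
        name.strip() for name in _LATIN_AND_CYRILLIC.split(",") if name.strip()
    },
    "Chinese": {"ChineseSimplified", "ChineseTraditional"},
    "Chinese Hong Kong": {"ChineseTraditionalHongKong"},
    "Japanese": {"Japan", "Japanese"},
    "Korean": {"Korean"},
    # Assumption: "English can be added explicitly" means English may join Thai.
    "Thai": {"Thai", "Thailand", "English"},
}
ENV_VALUES = {"D", "T", "P"}
PROCESSING_MODES = {"VoteOcrAndEText", "OcrFromImage", "ExtractFromEText"}


def validate_countries(countries: list[str]) -> str:
    """Return the comma-separated Country value, or raise ValueError."""
    chosen = set(countries)
    if not chosen:
        raise ValueError("at least one country/language is required")
    if {"Japan", "Japanese"} <= chosen:
        raise ValueError("Japan and Japanese cannot both be selected")
    if not any(chosen <= group for group in COUNTRY_GROUPS.values()):
        raise ValueError(f"values span more than one group: {sorted(chosen)}")
    # Assumption: entries are joined with "," and no spaces.
    return ",".join(countries)


def build_service_props(env="P", engine="Advanced", auto_rotate=True,
                        countries=("USA",), processing_mode="VoteOcrAndEText"):
    """Build the serviceProps list using the documented defaults."""
    if env not in ENV_VALUES:
        raise ValueError(f"Env must be one of {sorted(ENV_VALUES)}")
    if engine != "Advanced":
        raise ValueError("only the 'Advanced' engine is currently supported")
    if processing_mode not in PROCESSING_MODES:
        raise ValueError(f"unknown ProcessingMode: {processing_mode}")
    return [
        {"name": "Env", "value": env},
        {"name": "OcrEngineName", "value": engine},
        {"name": "AutoRotate", "value": auto_rotate},  # JSON boolean (assumed)
        {"name": "Country", "value": validate_countries(list(countries))},
        {"name": "ProcessingMode", "value": processing_mode},
    ]


props = build_service_props(env="D", countries=["German", "English"])
```

Checking for membership in a single group, instead of building an explicit list of allowed pairs, also handles Greece and Greek. Those two values appear in both the Greek group and the Latin and Cyrillic group.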

Number of Request Items

This Real-Time Service supports one or more items.

Values Per Request Item

  • OutputType - Required String. Specifies the OCR output type for the request item. Must be either Pdf or Text. The additional values that can be set on the request item depend on the OutputType.

    • Values for OutputType Pdf:
      • Version - String. Optional; one of Pdf, Pdf14, Pdf15, Pdf16, Pdf17, PdfA1A, PdfA1B, PdfA2A, PdfA2B, PdfA2U. The default is Pdf. The values map to PDF versions as follows: Pdf -> PDF 1.7, Pdf14 -> PDF 1.4, Pdf15 -> PDF 1.5, Pdf16 -> PDF 1.6, Pdf17 -> PDF 1.7, PdfA1A -> PDF/A-1a, PdfA1B -> PDF/A-1b, PdfA2A -> PDF/A-2a, PdfA2B -> PDF/A-2b, PdfA2U -> PDF/A-2u.
      • Compression - String. Sets the compression level applied to the text in the output PDF file. Optional; one of “None”, “Low”, “Medium”, “High”. The default is “Medium”.
      • ImageSelection - String. Specifies which image, if any, is used as the page background in the output file. Optional; one of the following values (the default is “OriginalImage”):
        • NoImage - Only the extracted text is included in the output file.
        • OriginalImage - The extracted text is included, and the source image is set as the page background in the output file.
        • ResultImage - The extracted text is included, and the processed image is set as the page background in the output file.
      • ImageResolutionLimit - Number. Caps the resolution, in DPI, of images (color, grayscale, binary). Images at or below this resolution are left unchanged; images above it are scaled down to this value. Valid values are 0 to 300. The default is 150, which is also used when the value is outside the 0 to 300 range.
      • JpegCompressionLevel - Number. Controls the JPEG compression rate. Higher compression rates produce smaller files with lower image quality. Supported rates run from 10 to 90 in increments of 5 (10, 15, 20, 25, and so on up to 90). Other values in the 10 to 90 range are rounded to the nearest valid value; for example, 13 is rounded to 15. The default is 80, which is also used when the value is outside the 10 to 90 range.
      • BinaryImage - Boolean. Specifies whether images in the output PDF are converted to binary (black-and-white) format. When ‘true’, PDF images are converted to binary format. The default is ‘false’.
    • Values for OutputType Text:
      • None. The output encoding is Unicode.
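
The service already applies the defaults, range checks, and rounding described above. As a sketch, the same rules can be mirrored client-side so a caller knows which values will take effect. The helper names are hypothetical. Two details are assumptions: BinaryImage is sent as a JSON boolean, and exact halves in JpegCompressionLevel are rounded up, since the documentation does not cover ties.

```python
import math

# Documented Version values and the PDF versions they map to.
PDF_VERSIONS = {
    "Pdf": "PDF 1.7", "Pdf14": "PDF 1.4", "Pdf15": "PDF 1.5",
    "Pdf16": "PDF 1.6", "Pdf17": "PDF 1.7", "PdfA1A": "PDF/A-1a",
    "PdfA1B": "PDF/A-1b", "PdfA2A": "PDF/A-2a", "PdfA2B": "PDF/A-2b",
    "PdfA2U": "PDF/A-2u",
}
COMPRESSIONS = {"None", "Low", "Medium", "High"}
IMAGE_SELECTIONS = {"NoImage", "OriginalImage", "ResultImage"}


def image_resolution_limit(dpi=None):
    """Mirror the documented rule: 0..300 DPI, otherwise the default 150."""
    if dpi is None or not 0 <= dpi <= 300:
        return 150
    return dpi


def jpeg_compression_level(level=None):
    """Mirror the documented rule: 10..90 in steps of 5, default 80."""
    if level is None or not 10 <= level <= 90:
        return 80
    # Nearest multiple of 5 (13 -> 15, 12 -> 10); halves round up (assumed).
    return 5 * math.floor(level / 5 + 0.5)


def pdf_item_values(version="Pdf", compression="Medium",
                    image_selection="OriginalImage", resolution_limit=None,
                    jpeg_level=None, binary_image=False):
    """Build the "values" array for a request item with OutputType Pdf."""
    if version not in PDF_VERSIONS:
        raise ValueError(f"unsupported Version: {version}")
    if compression not in COMPRESSIONS:
        raise ValueError(f"unsupported Compression: {compression}")
    if image_selection not in IMAGE_SELECTIONS:
        raise ValueError(f"unsupported ImageSelection: {image_selection}")
    return [
        {"name": "OutputType", "value": "Pdf"},
        {"name": "Version", "value": version},
        {"name": "Compression", "value": compression},
        {"name": "ImageSelection", "value": image_selection},
        {"name": "ImageResolutionLimit",
         "value": image_resolution_limit(resolution_limit)},
        {"name": "JpegCompressionLevel",
         "value": jpeg_compression_level(jpeg_level)},
        # Sending the flag as a JSON true/false is an assumption.
        {"name": "BinaryImage", "value": binary_image},
    ]


values = pdf_item_values(version="PdfA2B", jpeg_level=13)
```

This sketch always sends every value. Omitting optional values and letting the service apply its own defaults should behave the same way.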

Files Per Request Item

Each item can have one or more files. Each file can be either embedded in the request or a reference to a file ID previously posted to the Files Resource. Supported input file types are JPEG and PNG for color and grayscale images, and TIFF G4 for binary images.
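
The sketch below builds a request body with several request items, each holding file references. The field names (serviceProps, requestItems, nodeId, values, files, contentType) are taken from the curl example. Numbering nodeId from 1 per item is an assumption, because the example shows only one item. Only file-ID references are shown, since this page does not document the format of an embedded file. The second file ID, the page name, and the image/png content type are hypothetical; the example itself only uses image/tiff.

```python
import json


def file_reference(name, file_id, content_type):
    """A file entry referring to a file ID already posted to the Files Resource."""
    return {"name": name, "value": file_id, "contentType": content_type}


def build_request_body(service_props, items):
    """Assemble the JSON body; `items` is a list of (values, files) pairs.

    Numbering nodeId from 1 for each item is an assumption.
    """
    if not items:
        raise ValueError("at least one request item is required")
    request_items = []
    for node_id, (values, files) in enumerate(items, start=1):
        if not files:
            raise ValueError(f"request item {node_id} needs at least one file")
        request_items.append({"nodeId": node_id, "values": values, "files": files})
    return {"serviceProps": service_props, "requestItems": request_items}


text_values = [{"name": "OutputType", "value": "Text"}]
body = build_request_body(
    [{"name": "Env", "value": "D"}, {"name": "OcrEngineName", "value": "Advanced"}],
    [
        (text_values, [file_reference(
            "DoodadPage1", "F_5fca44b57c4b4cddae84a7be36864c4bTIF", "image/tiff")]),
        # Hypothetical second item: file ID and name are placeholders.
        (text_values, [file_reference("DoodadPage2", "F_example_png_id", "image/png")]),
    ],
)
print(json.dumps(body, indent=2))
```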

 

Example Request (curl)
# Assumption: an OAuth access token is available in $ACCESS_TOKEN and is sent as a bearer token.
curl -X POST 'https://capture.ot2.opentext.com/cp-rest/v2/session/services/fullpageocr' \
  -H 'Content-Type: application/hal+json' \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -d '{"serviceProps":[{"name":"Env","value":"D"},{"name":"OcrEngineName","value":"Advanced"}],
       "requestItems":[{"nodeId":1,"values":[{"name":"OutputType","value":"Text"}],
       "files":[{"name":"DoodadPage1","value":"F_5fca44b57c4b4cddae84a7be36864c4bTIF","contentType":"image/tiff"}]}]}'
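
The same call can be prepared from Python's standard library. This sketch builds the urllib.request.Request without sending it. The base URL and Content-Type come from the curl example. The Authorization header assumes bearer-token authentication, which this page does not document, and OT2_ACCESS_TOKEN is a hypothetical environment variable name.

```python
import json
import os
import urllib.request

BASE_URL = "https://capture.ot2.opentext.com/cp-rest/v2"


def build_fullpageocr_request(body: dict, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the Full Page OCR service."""
    return urllib.request.Request(
        f"{BASE_URL}/session/services/fullpageocr",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/hal+json",
            # Assumption: OAuth bearer-token authentication.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Body mirroring the curl example.
body = {
    "serviceProps": [
        {"name": "Env", "value": "D"},
        {"name": "OcrEngineName", "value": "Advanced"},
    ],
    "requestItems": [{
        "nodeId": 1,
        "values": [{"name": "OutputType", "value": "Text"}],
        "files": [{
            "name": "DoodadPage1",
            "value": "F_5fca44b57c4b4cddae84a7be36864c4bTIF",
            "contentType": "image/tiff",
        }],
    }],
}

req = build_fullpageocr_request(body, os.environ.get("OT2_ACCESS_TOKEN", ""))

# Sending requires network access and a valid token:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```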
