Extract Document Service

POST {{baseUrl}}/session/services/extractdocument

The Extract Document Service performs extraction on each submitted item and returns a UIM object containing information from the extraction result.

Service Properties

  • Env - Metadata environment identifier. Value is one of D, T or P. Default value is P.

  • IncludeOcrData - Boolean. If true, the returned UIM data object contains information about the extracted characters; otherwise it does not.

  • Project - Optional string. The recognition project used to classify the document. Valid values are Default (Advanced Recognition) and InformationExtraction (Information Extraction). If omitted, Default is used.
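
As an illustration of the properties above, the following Python sketch assembles a serviceProps list in the name/value layout used by the sample request body. The helper name build_service_props is hypothetical, and the validation is a client-side convenience rather than documented service behavior. Because no default is documented for IncludeOcrData, the sketch requires the caller to pass it explicitly.

```python
VALID_ENVS = {"D", "T", "P"}
VALID_PROJECTS = {"Default", "InformationExtraction"}


def build_service_props(include_ocr_data, env="P", project=None):
    """Return a serviceProps list for the extractdocument request.

    Hypothetical helper: the checks below only mirror the documented
    valid values; the service may validate differently.
    """
    if env not in VALID_ENVS:
        raise ValueError("Env must be one of D, T or P")
    if project is not None and project not in VALID_PROJECTS:
        raise ValueError("Project must be Default or InformationExtraction")

    props = [
        {"name": "Env", "value": env},
        {"name": "IncludeOcrData", "value": bool(include_ocr_data)},
    ]
    # Project is optional; when it is left out the service uses Default.
    if project is not None:
        props.append({"name": "Project", "value": project})
    return props
```

Calling build_service_props(True, env="D", project="InformationExtraction") yields the three properties used in the sample request.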

Values Per Request Item

  • DocumentTypeName - String. The Document Type name to be used for extraction. This is ignored if the TemplateIds property is passed.

  • TemplateIds - Array of Strings. The image template IDs assigned in the recognition project that are used for extraction. If not supplied, the DocumentTypeName must be specified. Unused if the project is InformationExtraction. To skip extraction for a page, set the template ID for that page to -2.

  • RepeatLastTemplate - Boolean. If true and if the TemplateIds array has fewer entries than the request item has files, the last template ID is applied to the remaining files in the request item.
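
The interaction between TemplateIds and RepeatLastTemplate can be sketched on the client side as follows. This is only an illustration of the rule described above, not the service's implementation. The function name expand_template_ids is hypothetical, template IDs are assumed to be strings (as in the sample request), and None marks a file for which this section does not document what the service does.

```python
SKIP_TEMPLATE_ID = "-2"  # documented marker for "skip extraction on this page"


def expand_template_ids(template_ids, file_count, repeat_last_template):
    """Return one template ID (or None) per file in the request item."""
    assigned = list(template_ids[:file_count])
    missing = file_count - len(assigned)

    if repeat_last_template and assigned and missing > 0:
        # The last supplied template ID is applied to the remaining files.
        assigned += [assigned[-1]] * missing
        missing = 0

    # Assumption: without RepeatLastTemplate the behavior for the remaining
    # files is not documented here, so they are marked as None.
    assigned += [None] * missing
    return assigned


def pages_to_extract(assigned):
    """Indexes of files that have a template and are not marked as skipped."""
    return [
        index
        for index, template_id in enumerate(assigned)
        if template_id is not None and template_id != SKIP_TEMPLATE_ID
    ]
```

For example, expand_template_ids(["28"], 3, True) assigns template 28 to all three files, while the same call with RepeatLastTemplate set to false leaves the second and third files without a template.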

Files Per Request Item

Each item can have one or more files. Each file can be either an embedded file or a reference to a file ID previously posted to the Files Resource. The File Type property for the file is ignored for this service.

If the TemplateIds property is not included in the request, more than one image is sent, and the DocumentTypeName is specified, then the images are processed as follows. First, the templates in the specified document type are ordered by name (not ID). Then, the first template in the list is used for the first file in the request item, the second template in the list is used for the second file in the request item, and so forth. If the request item contains more images than there are templates in the document type, then the extra images are not processed for data extraction.
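
The assignment described above can be modeled in a few lines of Python. This is a sketch for reasoning about which file gets which template, not the service's code. The function name and the dictionary layout of a template are hypothetical, and the sort uses plain string comparison because the exact collation the service applies to template names is not documented here.

```python
def assign_templates_by_name(templates, file_names):
    """Pair each file with a template, ordering templates by name (not ID).

    templates: list of {"id": ..., "name": ...} dicts (hypothetical layout).
    Returns (file_name, template_id) pairs; the template_id is None for
    files beyond the number of templates, which are not processed.
    """
    ordered = sorted(templates, key=lambda template: template["name"])
    pairs = []
    for index, file_name in enumerate(file_names):
        template_id = ordered[index]["id"] if index < len(ordered) else None
        pairs.append((file_name, template_id))
    return pairs
```

With templates named "Page A" (ID 28) and "Page B" (ID 30) and three files, the first file is paired with 28, the second with 30, and the third with no template.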

Request Body

{
  "serviceProps": [
    {"name": "Env", "value": "D"},
    {"name": "IncludeOcrData", "value": true},
    {"name": "Project", "value": "InformationExtraction"}
  ],
  "requestItems": [
    {
      "nodeId": 1,
      "values": [
        {"name": "DocumentTypeName", "value": "TestWren"},
        {"name": "TemplateIds", "value": ["28"]},
        {"name": "RepeatLastTemplate", "value": false}
      ],
      "files": [
        {"name": "Wren", "value": "F_113aecccef734e448bec8d254ae4e059TIF", "contentType": "image/tiff", "fileType": "tif"},
        {"name": "Wren_p2", "value": "F_2061b933c8e5412aa563a1b9c7ebf337TIF", "contentType": "image/tiff", "fileType": "tif"}
      ]
    }
  ]
}
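
A request like this one could be prepared from Python with the standard library, roughly as sketched below. The function name build_extract_request and the example base URL are hypothetical, the Content-Type value application/json is an assumption (this page lists the header but not its value), and authentication or session handling is not covered here. The sketch only builds the request object; passing it to urllib.request.urlopen would send it.

```python
import json
import urllib.request


def build_extract_request(base_url, body):
    """Build (but do not send) a POST request for the extractdocument service."""
    url = base_url.rstrip("/") + "/session/services/extractdocument"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed value
        method="POST",
    )


# Body mirroring the sample request: two previously posted files in one item.
example_body = {
    "serviceProps": [
        {"name": "Env", "value": "D"},
        {"name": "IncludeOcrData", "value": True},
    ],
    "requestItems": [
        {
            "nodeId": 1,
            "values": [{"name": "DocumentTypeName", "value": "TestWren"}],
            "files": [
                {"name": "Wren", "value": "F_113aecccef734e448bec8d254ae4e059TIF",
                 "contentType": "image/tiff", "fileType": "tif"},
                {"name": "Wren_p2", "value": "F_2061b933c8e5412aa563a1b9c7ebf337TIF",
                 "contentType": "image/tiff", "fileType": "tif"},
            ],
        }
    ],
}

# "https://example.invalid/api" is a placeholder for {{baseUrl}}.
example_request = build_extract_request("https://example.invalid/api/", example_body)
```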

HEADERS

Key           Datatype  Required  Description
Content-Type  string

RESPONSES

status: OK

{"returnStatus":{"status":200,"code":"OK0000","message":"","server":"WS-Sa3586a2353bb48c0b7131c9875f61e69IS"},"licenseUsedPercent":0,"id":"REQ6","serviceName":"extractdocument","executionMilliSeconds":663,"licensePagesUsed":1,"licensePagesUsed2":0,"resultItems":[{"nodeId":1,"errorCode":"","errorMessage":"","values":[{"name":"ClassificationPageIds,","value":["d907a548196c4e35837dad51954cd3ed","748f4bfce54f4fc084ba19cee31bcccc"]},{"name":"UimData","value":{"docType":"TestWren","locale":"en-US","flaggedReason":null,"extractionId":"3aff08999e844ff6a31aff002b0fcb4a","nodeList":[{"name":"InvoiceNumber","isArray":false,"indexFieldType":"Number","labelText":"Invoice No.","isRequired":true,"controlType":"TextBox","data":[{"arrayIndex":0,"value":227628,"fieldError":{"errorCode":"ER2208","recoverable":false,"message":"Out of Bounds:Valid values: 1000 ➜ 10210"},"mustConfirm":true,"choices":null,"locationRect":{"left":639,"top":144,"width":124,"height":30},"pageId":1,"confidence":100,"extractedCharacters":[{"values":[{"value":"2","confidence":100}]}]}]}]}}]}]}
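
To show how a response of this shape might be consumed, the following Python sketch walks resultItems, locates the UimData value, and flattens each extracted field into a small record. The function names are hypothetical, and the key names are taken only from the sample response above, so a real response may contain fields or cases this sketch does not handle.

```python
def find_value(values, name):
    """Return the value of the first name/value entry with the given name."""
    for entry in values:
        if entry.get("name") == name:
            return entry.get("value")
    return None


def extract_fields(response):
    """Flatten the UimData node list of every result item into records."""
    fields = []
    for item in response.get("resultItems", []):
        uim_data = find_value(item.get("values", []), "UimData")
        if not uim_data:
            continue
        for node in uim_data.get("nodeList", []):
            for datum in node.get("data", []):
                # fieldError may be missing or null when the field has no error.
                error = datum.get("fieldError") or {}
                fields.append({
                    "nodeId": item.get("nodeId"),
                    "field": node.get("name"),
                    "value": datum.get("value"),
                    "confidence": datum.get("confidence"),
                    "errorCode": error.get("errorCode"),
                    "mustConfirm": datum.get("mustConfirm"),
                })
    return fields
```

Applied to the sample response, this yields a single record for InvoiceNumber with value 227628, confidence 100, and error code ER2208.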