Extract Document Service
POST {{baseUrl}}/session/services/extractdocument
The Extract Document Service performs data extraction on each submitted request item and returns a UIM data object containing the extracted information.
Service Properties
Env - Metadata environment identifier. Value is one of D, T, or P. Default value is P.
IncludeOcrData - Boolean. If true, the returned UIM data object contains information about the extracted characters; otherwise it does not.
Project - Optional string. The recognition project used to classify the document. Valid values are Default for Advanced Recognition and InformationExtraction for Information Extraction. If omitted, Default is used.
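As a minimal sketch, the service properties above can be assembled in Python as a list of name/value pairs, matching the serviceProps array in the request body example. The helper name build_service_props is hypothetical, and defaulting IncludeOcrData to False is an assumption, since the text above does not state its default.

```python
from typing import Any

# Allowed values as documented above.
VALID_ENVS = {"D", "T", "P"}
VALID_PROJECTS = {"Default", "InformationExtraction"}


def build_service_props(env: str = "P",
                        include_ocr_data: bool = False,  # assumed default
                        project: str = "Default") -> list[dict[str, Any]]:
    """Return the serviceProps array as name/value pairs (hypothetical helper)."""
    if env not in VALID_ENVS:
        raise ValueError(f"Env must be one of {sorted(VALID_ENVS)}, got {env!r}")
    if project not in VALID_PROJECTS:
        raise ValueError(f"Project must be one of {sorted(VALID_PROJECTS)}, got {project!r}")
    return [
        {"name": "Env", "value": env},
        {"name": "IncludeOcrData", "value": include_ocr_data},
        {"name": "Project", "value": project},
    ]
```

Validating locally catches an invalid Env or Project value before a request is sent; the service itself remains the authority on what it accepts.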
Values Per Request Item
DocumentTypeName - String. The Document Type name used for extraction. Ignored if the TemplateIds property is passed.
TemplateIds - Array of Strings. The image template IDs, assigned in the recognition project, that are used for extraction. If not supplied, DocumentTypeName must be specified. Unused if the project is InformationExtraction. To skip extraction for a page, set the template ID for that page to -2.
RepeatLastTemplate - Boolean. If true and the TemplateIds array has fewer entries than the request item has files, the last template ID is applied to the remaining files in the request item.
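The per-item values can be sketched the same way. This hypothetical helper enforces the rule that DocumentTypeName is required when TemplateIds is absent. Because TemplateIds is documented as an array of strings, the sketch assumes the skip marker is sent as the string "-2".

```python
from typing import Any, Optional

# Assumption: the documented skip value -2 is sent as a string in TemplateIds.
SKIP_PAGE = "-2"


def build_item_values(document_type_name: Optional[str] = None,
                      template_ids: Optional[list[str]] = None,
                      repeat_last_template: bool = False) -> list[dict[str, Any]]:
    """Return the values array for one request item (hypothetical helper)."""
    if not template_ids and not document_type_name:
        raise ValueError("DocumentTypeName is required when TemplateIds is not supplied")
    values: list[dict[str, Any]] = []
    if document_type_name:
        values.append({"name": "DocumentTypeName", "value": document_type_name})
    if template_ids:
        values.append({"name": "TemplateIds", "value": list(template_ids)})
        values.append({"name": "RepeatLastTemplate", "value": repeat_last_template})
    return values
```

For example, build_item_values("TestWren", ["28", SKIP_PAGE]) extracts the first page with template 28 and skips the second.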
Files Per Request Item
Each item can have one or more files. Each file can be either an embedded file or a reference to a file ID previously posted to the Files Resource. The File Type property of a file is ignored by this service.
If a request item specifies DocumentTypeName, omits TemplateIds, and contains more than one image, the images are processed as follows. First, the templates in the specified document type are ordered by name (not ID). Then the first template in that list is applied to the first file in the request item, the second template to the second file, and so on. If the request item contains more images than the document type has templates, the extra images are not processed for data extraction.
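To make the file-to-template mapping concrete, here is a sketch that predicts which template each file receives, covering both explicit TemplateIds (with RepeatLastTemplate and the -2 skip marker) and the ordering by name used when only DocumentTypeName is given. The function is hypothetical. Two behaviors are assumptions: files left without a template when RepeatLastTemplate is false are treated as not extracted, and name ordering uses Python's default string comparison, which may differ from the service's collation.

```python
from typing import Optional


def assign_templates(file_count: int,
                     template_ids: Optional[list[str]] = None,
                     repeat_last_template: bool = False,
                     doc_type_templates: Optional[dict[str, str]] = None) -> list[Optional[str]]:
    """Return the template ID applied to each file; None means no extraction."""
    if template_ids:
        assigned: list[Optional[str]] = []
        for i in range(file_count):
            if i < len(template_ids):
                assigned.append(template_ids[i])
            elif repeat_last_template:
                assigned.append(template_ids[-1])  # reuse the last template ID
            else:
                assigned.append(None)  # assumption: unmatched files are not extracted
        # A template ID of -2 means "skip extraction for this page".
        return [None if t == "-2" else t for t in assigned]
    # No TemplateIds: order the document type's templates by name, not by ID.
    # doc_type_templates maps template name -> template ID (hypothetical shape).
    ordered = [tid for _, tid in sorted((doc_type_templates or {}).items())]
    # Extra files beyond the number of templates are not processed.
    return [ordered[i] if i < len(ordered) else None for i in range(file_count)]
```

For example, with templates named "Page B" (ID 7) and "Page A" (ID 9) and three files, the first file gets template 9, the second gets 7, and the third is not processed.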
Request Body
{
  "serviceProps": [
    {"name": "Env", "value": "D"},
    {"name": "IncludeOcrData", "value": true},
    {"name": "Project", "value": "InformationExtraction"}
  ],
  "requestItems": [
    {
      "nodeId": 1,
      "values": [
        {"name": "DocumentTypeName", "value": "TestWren"},
        {"name": "TemplateIds", "value": ["28"]},
        {"name": "RepeatLastTemplate", "value": false}
      ],
      "files": [
        {"name": "Wren", "value": "F_113aecccef734e448bec8d254ae4e059TIF", "contentType": "image/tiff", "fileType": "tif"},
        {"name": "Wren_p2", "value": "F_2061b933c8e5412aa563a1b9c7ebf337TIF", "contentType": "image/tiff", "fileType": "tif"}
      ]
    }
  ]
}
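A request like the one above could be sent with Python's standard library, as sketched below. The base URL is a placeholder standing in for {{baseUrl}}. The application/json Content-Type is an assumption, since the header table does not give a value. The sketch does not handle session authentication, which this section does not describe.

```python
import json
import urllib.request
from typing import Any

# Path from the endpoint line above; base_url stands in for {{baseUrl}}.
EXTRACT_PATH = "/session/services/extractdocument"


def build_extract_request(base_url: str, body: dict[str, Any]) -> urllib.request.Request:
    """Serialize the body to JSON and wrap it in a POST request."""
    url = base_url.rstrip("/") + EXTRACT_PATH
    data = json.dumps(body).encode("utf-8")
    # Assumption: the service expects a JSON body with this Content-Type.
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_extract_request(req: urllib.request.Request) -> dict[str, Any]:
    """Send the request and decode the JSON response (no auth handling)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical usage; example.invalid is a placeholder host.
body = {
    "serviceProps": [{"name": "Env", "value": "D"}],
    "requestItems": [{
        "nodeId": 1,
        "values": [{"name": "DocumentTypeName", "value": "TestWren"}],
        "files": [{"name": "Wren", "value": "F_113aecccef734e448bec8d254ae4e059TIF",
                   "contentType": "image/tiff", "fileType": "tif"}],
    }],
}
req = build_extract_request("https://example.invalid/api", body)
```

Building the request separately from sending it lets the payload be inspected or logged before any network call is made.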
HEADERS
Key | Datatype | Required | Description |
---|---|---|---|
Content-Type | string | | |
RESPONSES
status: OK
{
  "returnStatus": {
    "status": 200,
    "code": "OK0000",
    "message": "",
    "server": "WS-Sa3586a2353bb48c0b7131c9875f61e69IS"
  },
  "licenseUsedPercent": 0,
  "id": "REQ6",
  "serviceName": "extractdocument",
  "executionMilliSeconds": 663,
  "licensePagesUsed": 1,
  "licensePagesUsed2": 0,
  "resultItems": [
    {
      "nodeId": 1,
      "errorCode": "",
      "errorMessage": "",
      "values": [
        {
          "name": "ClassificationPageIds,",
          "value": [
            "d907a548196c4e35837dad51954cd3ed",
            "748f4bfce54f4fc084ba19cee31bcccc"
          ]
        },
        {
          "name": "UimData",
          "value": {
            "docType": "TestWren",
            "locale": "en-US",
            "flaggedReason": null,
            "extractionId": "3aff08999e844ff6a31aff002b0fcb4a",
            "nodeList": [
              {
                "name": "InvoiceNumber",
                "isArray": false,
                "indexFieldType": "Number",
                "labelText": "Invoice No.",
                "isRequired": true,
                "controlType": "TextBox",
                "data": [
                  {
                    "arrayIndex": 0,
                    "value": 227628,
                    "fieldError": {
                      "errorCode": "ER2208",
                      "recoverable": false,
                      "message": "Out of Bounds:Valid values: 1000 ➜ 10210"
                    },
                    "mustConfirm": true,
                    "choices": null,
                    "locationRect": {"left": 639, "top": 144, "width": 124, "height": 30},
                    "pageId": 1,
                    "confidence": 100,
                    "extractedCharacters": [
                      {"values": [{"value": "2", "confidence": 100}]}
                    ]
                  }
                ]
              }
            ]
          }
        }
      ]
    }
  ]
}
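The extracted field values in a response like the one above sit under the UimData value's nodeList. The sketch below flattens them into one row per field entry, keeping each value, its confidence, and any fieldError message. It assumes the response has already been decoded, for example with json.loads. The function name and the choice to raise on a non-200 status or an item-level errorCode are illustrative, not documented behavior.

```python
from typing import Any


def summarize_fields(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten UimData fields into rows (hypothetical helper)."""
    status = response["returnStatus"]["status"]
    if status != 200:
        raise RuntimeError(f"service returned status {status}")
    rows: list[dict[str, Any]] = []
    for item in response.get("resultItems", []):
        # A non-empty errorCode marks a failed request item.
        if item.get("errorCode"):
            raise RuntimeError(f"item {item.get('nodeId')}: {item.get('errorMessage')}")
        for value in item.get("values", []):
            if value.get("name") != "UimData":
                continue  # skip other values, e.g. the classification page IDs
            for node in value["value"].get("nodeList", []):
                for entry in node.get("data", []):
                    error = entry.get("fieldError")
                    rows.append({
                        "field": node["name"],
                        "value": entry.get("value"),
                        "confidence": entry.get("confidence"),
                        "error": error["message"] if error else None,
                    })
    return rows
```

For the sample response, this yields a single InvoiceNumber row whose error message reports the out-of-bounds value, which a client could use to route the document for confirmation.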