Number of APIs: 27
OpenText Core Capture Services are a set of RESTful web service interfaces that provide capture functionality. Because they are developed in a purely RESTful style, Core Capture Services are easier to consume when writing custom clients. Core Capture Services identify resources by Uniform Resource Identifiers (URIs), define specific media types to represent resources, and drive application state transfers by using link relations. They use a limited number of standard HTTP methods (GET, POST, and DELETE) to manipulate these resources over HTTP. Core Capture Services (hereafter simply called the service) support only the JSON format for resource representation. JavaScript Object Notation (JSON) is a lightweight data interchange format based on a subset of the JavaScript Programming Language standard.
GET {{baseUrl}}/?suppress_response_codes=in adipisicing
The Home Document is the entry point to Core Capture Services and is available to any caller. Its main purpose is to provide discovery of the URIs necessary to interact with the service. It is retrieved by performing an HTTP GET on the base installation path. For example, if the REST service is installed at https://{host}/cp-rest/v2, then performing a GET on this URI returns the Home Document. All clients must start from the Home Document and follow the hrefs given in the link relations to the desired resources. This ensures that client applications continue to work regardless of URI changes that may take place under different deployment configurations of the service.
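A minimal sketch of following a link relation instead of hard-coding a URI. The layout of the Home Document used here (a `links` list with `rel` and `href` keys), the sample relation names, and the host are assumptions for illustration; inspect the document your installation returns and adjust the lookup to match.

```python
from urllib.parse import urljoin


def resolve_link(home_document, base_url, rel):
    """Return the absolute URI for a link relation found in the Home Document."""
    # Assumed layout: {"links": [{"rel": "...", "href": "..."}]}
    for link in home_document.get("links", []):
        if link.get("rel") == rel:
            # urljoin handles both relative and absolute hrefs.
            return urljoin(base_url, link["href"])
    raise KeyError(f"link relation not found: {rel}")


# Hypothetical Home Document, for illustration only.
sample_home = {
    "links": [
        {"rel": "about", "href": "about"},
        {"rel": "session", "href": "session"},
    ]
}
about_uri = resolve_link(sample_home, "https://example.invalid/cp-rest/v2/", "about")
print(about_uri)
```

Resolving every URI this way keeps the client independent of the deployment path.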
GET {{baseUrl}}/about?suppress_response_codes=in adipisicing
This resource provides product information about the Services installation to authenticated users.
GET {{baseUrl}}/session/tables?Env=P&suppress_response_codes=in adipisicing
The server maintains different tables that provide information about key pieces of data to authenticated users.
GET {{baseUrl}}/session/tables/:tableId?view=name,createtime&sort=createtime asc&env=P&suppress_response_codes=in adipisicing
The Table Resource pertains to a specific table from the set of available tables on the server.
GET {{baseUrl}}/session/doctypes?Env=P&suppress_response_codes=in adipisicing
This operation returns a feed listing all of the Document Types. Document Types are created using the Designer.
GET {{baseUrl}}/session/doctypes/:docType?Env=P&suppress_response_codes=in adipisicing
This resource retrieves a specific Document Type. A Document Type is created using the Designer.
GET {{baseUrl}}/session/files/:fileId?suppress_response_codes=in adipisicing
To retrieve a file that was previously POSTed, perform a GET on the files URI with the fileId. This returns the actual file data.
POST {{baseUrl}}/session/files/:fileId?suppress_response_codes=in adipisicing
Chunking a file in pieces to the server requires that the POST be made to the URI represented by the src property or the URI provided by the Location header returned from the first chunk. Additional chunks append to the file and you can always retry/re-post the last chunk. Chunking requires the data for the file to be sent in base64 or binary encoding. The chunks need to be posted without gaps in order to be successful.
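The "no gaps" requirement can be illustrated with a small helper that slices a file into consecutive pieces and base64-encodes each one. This is only a sketch of the client-side splitting; the chunk size and the way each piece is wrapped in a request are assumptions, and no HTTP call is made here.

```python
import base64


def iter_chunks(data, chunk_size):
    """Yield (offset, base64_text) pairs that cover data in order, with no gaps."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(data), chunk_size):
        piece = data[offset:offset + chunk_size]
        yield offset, base64.b64encode(piece).decode("ascii")


# Ten bytes split into chunks of four: offsets 0, 4 and 8.
chunks = list(iter_chunks(b"0123456789", 4))
for offset, text in chunks:
    print(offset, text)
```

Because each chunk is derived from its offset alone, the last chunk can be regenerated and re-posted if a request fails.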
DELETE {{baseUrl}}/session/files/:fileId?suppress_response_codes=in adipisicing
An individual file can be deleted. Once deleted, the file can no longer be accessed.
POST {{baseUrl}}/session/files?suppress_response_codes=in adipisicing
You can only create one stage file at a time. Upon the first POST a unique fileId will be created by the server. File data can be posted either in base64 encoding as a JSON post or as a binary to the server.
If you need to chunk this in pieces to the server, then subsequent requests must be made to the URI represented by the src property or the URI provided by the Location header returned from the first chunk. Additional chunks append to the file and you can always retry/re-post the last chunk. The chunks need to be posted without gaps in order to be successful.
There are two ways to create a stage file:
Create the stage file using a JSON post with base64 encoding.
Post the file as binary using the appropriate Content-Type.
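The two options can be sketched as two request builders. The JSON property names (`data`, `contentType`) and the `application/json` media type are assumptions, since the service defines its own media types; only the base64-versus-binary distinction comes from the description above.

```python
import base64
import json


def json_stage_file_request(data, content_type):
    """Option 1: JSON post carrying the file as base64 text."""
    # "data" and "contentType" are hypothetical property names.
    body = {
        "data": base64.b64encode(data).decode("ascii"),
        "contentType": content_type,
    }
    return {
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body).encode("utf-8"),
    }


def binary_stage_file_request(data, content_type):
    """Option 2: raw bytes posted with the file's own Content-Type."""
    return {"headers": {"Content-Type": content_type}, "body": data}


json_req = json_stage_file_request(b"\x00\x01\x02", "image/tiff")
binary_req = binary_stage_file_request(b"\x00\x01\x02", "image/tiff")
```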
DELETE {{baseUrl}}/session/files?filter=*&suppress_response_codes=in adipisicing
This call deletes all stage files in the session including those returned by service calls. Once called, the deleted files will no longer be available.
Deleting files accepts a query string parameter, filter. Currently, the only value this parameter supports is *, which deletes all the files in the session.
GET {{baseUrl}}/session/services?suppress_response_codes=in adipisicing
This operation returns a feed listing all of the Real-Time Services.
POST {{baseUrl}}/session/services/convertimages
The Convert Images Real-Time Service provides image conversion capability as defined by an image conversion profile.
Capture Services currently supports only the following system-provided image conversion profiles:
SplitPDFProfile - Splits PDF documents, including color documents, into TIFF images at 300 DPI resolution.
SplitPDFtoPDFs - Splits a multipage PDF document into single-page PDF documents.
CombineSearchablePDFs - Merges multipage PDF documents into a single multipage PDF document.
Service Properties
Env - Metadata environment identifier. Value must be S because SplitPDFProfile is a system-provided profile.
Profile - Required String. The image conversion profile name to use for the conversion. Currently the only system-provided profile names are “SplitPDFProfile”, “SplitPDFtoPDFs” and “CombineSearchablePDFs”.
ReturnFileDataInline - Boolean. If true, then the resulting file is returned inline in the result item as a base64 encoded file. If omitted or false, then the resulting file is returned as a fileId and can be retrieved through the Files resource. The file ID referencing the resulting file is returned as part of the URI in the src property of the File object of the Result Item.
Number of Request Items
This Real-Time Service supports one or more items.
Values Per Request Item
No values are necessary or used.
Files Per Request Item
Each item can have one or more files. It can either be an embedded file or a reference to a file ID previously posted to the Files Resource.
The File Type property for the file must specify the file extension for the file, such as tif, png, jpg, pdf, etc. This is used by the Convert Images Real-Time Service for further typing of the file.
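A hedged sketch of assembling a Convert Images request from the properties above. The envelope key names (`serviceProps`, `requestItems`, `values`, `files`, `name`, `value`, `fileType`) are assumptions for illustration and are not taken from this document; the profile names, the Env value S, and the file extension requirement are.

```python
import json

SYSTEM_PROFILES = {"SplitPDFProfile", "SplitPDFtoPDFs", "CombineSearchablePDFs"}


def convert_images_body(profile, files, return_inline=False):
    """Build a request body; files is a list of (file_id, extension) pairs."""
    if profile not in SYSTEM_PROFILES:
        raise ValueError(f"unsupported profile: {profile}")
    if not files:
        raise ValueError("each request item needs at least one file")
    return {
        # Hypothetical envelope: key names below are illustrative only.
        "serviceProps": [
            {"name": "Env", "value": "S"},
            {"name": "Profile", "value": profile},
            {"name": "ReturnFileDataInline", "value": return_inline},
        ],
        "requestItems": [
            {
                "values": [],  # no values are necessary or used
                "files": [{"value": fid, "fileType": ext} for fid, ext in files],
            }
        ],
    }


convert_body = convert_images_body("SplitPDFtoPDFs", [("file-1", "pdf")])
print(json.dumps(convert_body, indent=2))
```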
POST {{baseUrl}}/session/services/processimage
The Process Image Real-Time Service provides image processing capability as defined by an image processor profile defined in the Designer.
Service Properties
Env - Metadata environment identifier. Value is one of D, T or P. Default value is P.
Profile - Required String. The Image processor profile name to use.
ReturnFileDataInline - Boolean. If true, then the resulting file is returned inline in the result item as a base64 encoded file. If omitted or false, then the resulting file is returned as a fileId and can be retrieved through the Files resource. The file ID referencing the resulting file is returned as part of the URI in the src property of the File object of the Result Item.
Redact - Boolean. If true, the image is redacted using the “Rectangles” property passed in the Request Item. Redaction is done before profile filter processing if a profile name is passed in. The “Profile” property is optional when the “Redact” property is true.
Number of Request Items
This Real-Time Service supports one or more items.
Values Per Request Item
left:
top:
width:
height:
Files Per Request Item
There can only be one file per request item object. It can either be an embedded file or a reference to a file ID previously posted to the Files Resource. The File Type property for the file is ignored for this service.
POST {{baseUrl}}/session/services/fullpageocr
The Full Page OCR Real-Time Service will provide full page OCR processing on submitted images or PDF documents and return the OCR content in the specified output type.
Service Properties
Env - Metadata environment identifier. Value is one of D, T or P. Default value is P.
OcrEngineName - String. This specifies the OCR engine name to use. The currently supported engine is “Advanced”, which is assigned to the “OpenText Capture Recognition Engine” for this release. The default OCR engine is “Advanced”.
AutoRotate - Boolean. This is an optional value specifying whether auto rotation should be enabled for the engine. The default is true.
Country - String. This optional value specifies the country for the engine. The default is USA. When passing multiple values in a comma-separated list, the values must be within the countries/languages groups given below:
ProcessingMode - String. This optional value specifies the processing mode for the engine. The default is VoteOcrAndEText. This can be one of the following values:
Number of Request Items
This Real-Time Service supports one or more items.
Values Per Request Item
OutputType - Required String. This setting specifies the OCR output type for the request item. It can be one of these values: Pdf, Text. The additional values you can set on the request item depend on what is assigned to OutputType.
Pdf14, Pdf15, Pdf16, Pdf17, PdfA1A, PdfA1B, PdfA2A, PdfA2B, PdfA2U. If not provided, the default value is “Pdf”. Mapping to Acrobat version:
Pdf -> PDF 1.7
Pdf14 -> PDF 1.4
Pdf15 -> PDF 1.5
Pdf16 -> PDF 1.6
Pdf17 -> PDF 1.7
PdfA1A -> PDF/A-1a
PdfA1B -> PDF/A-1b
PdfA2A -> PDF/A-2a
PdfA2B -> PDF/A-2b
PdfA2U -> PDF/A-2u
Text
Files Per Request Item
Each item can have one or more files. It can either be an embedded file or a reference to a file ID previously posted to the Files Resource. The supported file input types for color and grayscale images are JPEG and PNG. The supported file input type for binary images is TIFF G4.
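As a rough illustration, the service properties and the per-item OutputType value described above could be assembled like this. The envelope key names (`serviceProps`, `requestItems`, `values`, `files`, `name`, `value`) are assumptions, not taken from this document; the property names, allowed values, and defaults are.

```python
def full_page_ocr_body(file_ids, output_type="Pdf", env="P",
                       auto_rotate=True, country="USA"):
    """Build a Full Page OCR request body for one request item."""
    if env not in {"D", "T", "P"}:
        raise ValueError("Env must be one of D, T or P")
    if output_type not in {"Pdf", "Text"}:
        raise ValueError("OutputType must be Pdf or Text")
    if not file_ids:
        raise ValueError("each request item needs at least one file")
    return {
        # Hypothetical envelope: key names below are illustrative only.
        "serviceProps": [
            {"name": "Env", "value": env},
            {"name": "OcrEngineName", "value": "Advanced"},
            {"name": "AutoRotate", "value": auto_rotate},
            {"name": "Country", "value": country},
        ],
        "requestItems": [
            {
                "values": [{"name": "OutputType", "value": output_type}],
                "files": [{"value": fid} for fid in file_ids],
            }
        ],
    }


ocr_body = full_page_ocr_body(["file-1", "file-2"], output_type="Text")
```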
POST {{baseUrl}}/session/services/readbarcodes
The Read BarCodes Real-Time Service will provide barcode extraction processing.
Service Properties
Env - Metadata environment identifier. Value is one of D, T or P. Default value is P.
BarcodeTypes - Required String. Comma separated list of available barcodes. List of barcodes types:
Characters - Number. Exact number of characters to search for in the barcode text. Valid values range from 0 to 100.
Decode - Boolean. If true, then it decodes the results into readable strings; otherwise, if false (the default), then it will not decode into readable strings.
MinHeight - Number. Minimum height of barcode. Valid values range from 0 (default) to 1000.
Mode - String. Barcode detection modes let you switch between normal and enhanced detection types. If omitted, defaults to Normal. Valid values:
Orientation - String. Specifies the orientation of the barcodes detection. If omitted, then it defaults to HorizontalVertical. Valid values are:
ScanDistance - Number. Specifies the scan distance (in pixels) between line sweeps. Useful when searching for 1D type barcodes. Reducing the value improves detection of barcodes which are short relative to their height. Valid values are 1 to 10. If omitted, defaults to 5.
UseChecksum - Boolean. A value that is an indication of whether the checksums are used. If omitted, then it defaults to false.
UseRegion - String. A region to select for barcode detection in order to improve the barcode detection process. It defaults to empty (not used).
Number of Request Items
This Real-Time Service supports one or more items.
Values Per Request Item
No values are necessary or used.
Files Per Request Item
There can only be one file per request item object. It can either be an embedded file or a reference to a file ID previously posted to the Files Resource. The File Type property for the file is ignored for this service.
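The documented ranges and defaults for the barcode properties can be checked on the client before a request is sent. The sketch below validates only the ranges stated above; the barcode type names in the usage line are placeholders, because the list of supported types is not reproduced in this section.

```python
def barcode_props(barcode_types, characters=None, min_height=0,
                  scan_distance=5, use_checksum=False, decode=False):
    """Validate Read BarCodes service properties and return them as a dict."""
    if not barcode_types:
        raise ValueError("BarcodeTypes is required")
    if characters is not None and not 0 <= characters <= 100:
        raise ValueError("Characters must be between 0 and 100")
    if not 0 <= min_height <= 1000:
        raise ValueError("MinHeight must be between 0 and 1000")
    if not 1 <= scan_distance <= 10:
        raise ValueError("ScanDistance must be between 1 and 10")
    props = {
        "BarcodeTypes": ",".join(barcode_types),  # comma separated list
        "MinHeight": min_height,
        "ScanDistance": scan_distance,
        "UseChecksum": use_checksum,
        "Decode": decode,
    }
    if characters is not None:
        props["Characters"] = characters
    return props


# "TypeA" and "TypeB" are placeholder names, not real barcode type identifiers.
props = barcode_props(["TypeA", "TypeB"], scan_distance=3)
```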
POST {{baseUrl}}/session/services/classify
The Classify Service performs classification on the submitted images and, if successful, returns the available Document Type and Template information. In addition to template information, fields extracted as part of pre-index extraction are returned as fields in UimData.
Service Properties
Env - Metadata environment identifier. Value is one of D, T or P. Default value is P.
Project - Optional string. The recognition project to use to classify the document. Valid values are Default for Advanced Recognition and InformationExtraction for Information Extraction. If omitted, Default is used.
Number of Request Items
This Real-Time Service supports one or more items.
Values Per Request Item
No values are needed or used.
Files Per Request Item
There can only be one file per request item object. It can either be an embedded file or a reference to a file ID previously posted to the Files Resource. The File Type property for the file is ignored for this service.
POST {{baseUrl}}/session/services/classifyextractpage
The Classify Extract Page Service will perform classification and extraction on each item submitted and return a UIM object containing information from the result of classification and extraction.
Service Properties
Env - Metadata environment identifier. Value is one of D, T or P. Default value is P.
IncludeOcrData - Boolean. If true, the returned UIM data object contains extracted character information. Otherwise, by default, it does not.
Project - Optional string. The recognition project to use to classify the document. Valid values are Default for Advanced Recognition and InformationExtraction for Information Extraction. If omitted, Default is used.
Number of Request Items
This Real-Time Service supports one or more items.
Values Per Request Item
No values are needed or used.
Files Per Request Item
There can only be one file per request item object. It can either be an embedded file or a reference to a file ID previously posted to the Files Resource. The File Type property for the file is ignored for this service.
POST {{baseUrl}}/session/services/classifyextractdocument
The Classify Extract Document Service performs classification and extraction on each item submitted and returns a UIM object containing information from the result of classification and extraction. Optionally, the service performs document separation as configured in the recognition project.
Service Properties
Env - Metadata environment identifier. Value is one of D, T or P. Default value is P.
IncludeOcrData - Boolean. If true, the returned UIM data object contains extracted character information. Otherwise it does not.
EnableDocumentSeparation - Optional string. Whether to perform automatic document separation as per the Dpp project folder management settings. Default value is “false”.
Project - Optional string. The recognition project to use to classify the document. Valid values are Default for Advanced Recognition and InformationExtraction for Information Extraction. If omitted, Default is used.
Number of Request Items
This Real-Time Service supports one or more items.
Values Per Request Item
No values are needed or used.
Files Per Request Item
Each item can have one or more files. It can either be an embedded file or a reference to a file ID previously posted to the Files Resource. The File Type property for the file is ignored for this service.
If the request item contains more than one image, then the document type associated with the first classified page is used for the document. The extraction results for all pages belonging to the document type are merged into a single document. If a given field has conflicting values from different pages, then the value is set according to the Extract Page visual property for that field in the document type definition.
POST {{baseUrl}}/session/services/extractpage
The Extract Page Service will perform extraction on each item submitted and return a UIM object containing information from the result.
Service Properties
Env - Metadata environment identifier. Value is one of D, T or P. Default value is P.
IncludeOcrData - Boolean. If true, the returned UIM data object contains extracted character information. Otherwise it does not.
Project - Optional string. The recognition project to use to classify the document. Valid values are Default for Advanced Recognition and InformationExtraction for Information Extraction. If omitted, Default is used.
Values Per Request Item
DocumentTypeName - String. The Document Type name to be used for extraction. This is optional if the TemplateId property is passed.
PageIndex - Number. The zero-based page index within the Document Type. If omitted, it defaults to 0. This is optional if the TemplateId property is passed. Unused if the project is InformationExtraction.
TemplateId - String. The image template ID assigned in the recognition project that should be used for extraction. If not supplied, then the DocumentTypeName should be specified. Unused if the project is InformationExtraction.
Files Per Request Item
There can only be one file per request item object. It can either be an embedded file or a reference to a file ID previously posted to the Files Resource. The File Type property for the file is ignored for this service.
If the DocumentTypeName and PageIndex are specified, then the data will be extracted based on the index of the template in the order of the template names (not IDs) in the specified document type. If the PageIndex is greater than the number of templates in the document type, then the image is not processed for data extraction.
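The template lookup described above can be modelled in a few lines. This is a client-side illustration only: the template names are made up, and plain string sorting is an assumption about how the server orders names. An index outside the template list is treated here as "not processed".

```python
def template_for_page(template_names, page_index=0):
    """Return the template name used for page_index, or None if not processed."""
    ordered = sorted(template_names)  # ordered by name, not by ID
    if page_index < 0 or page_index >= len(ordered):
        return None  # no template at this index: no data extraction
    return ordered[page_index]


# Hypothetical template names for one document type.
names = ["Invoice_Page2", "Invoice_Page1"]
print(template_for_page(names, 0))
print(template_for_page(names, 5))
```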
POST {{baseUrl}}/session/services/extractdocument
The Extract Document Service will perform extraction on each item submitted and return a UIM object containing information from the result.
Service Properties
Env - Metadata environment identifier. Value is one of D, T or P. Default value is P.
IncludeOcrData - Boolean. If true, the returned UIM data object contains extracted character information. Otherwise it does not.
Project - Optional string. The recognition project to use to classify the document. Valid values are Default for Advanced Recognition and InformationExtraction for Information Extraction. If omitted, Default is used.
Values Per Request Item
DocumentTypeName - String. The Document Type name to be used for extraction. This is ignored if the TemplateIds property is passed.
TemplateIds - Array of Strings. The image template IDs assigned in the recognition project that are used for extraction. If not supplied, then the DocumentTypeName must be specified. Unused if the project is InformationExtraction. To skip extraction for a page, set the template ID for that page to -2.
RepeatLastTemplate - Boolean. If true and if the TemplateIds array has fewer entries than the request item has files, the last template ID is applied to the remaining files in the request item.
Files Per Request Item
Each item can have one or more files. It can either be an embedded file or a reference to a file ID previously posted to the Files Resource. The File Type property for the file is ignored for this service.
If the TemplateIds property is not included in the request, more than one image is sent, and the DocumentTypeName is specified, then the images are processed as follows. First, the templates in the specified document type are ordered by name (not ID). Then, the first template in the list is used for the first file in the request item, the second template in the list is used for the second file in the request item, and so forth. If the request item contains more images than there are templates in the document type, then the extra images are not processed for data extraction.
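The assignment rules for TemplateIds, RepeatLastTemplate, and the name-ordered fallback can be sketched as a pure function. It is a model of the documented behavior, not the server's implementation: template IDs and names are made up, string sorting is assumed for "ordered by name", and files left without a template are shown as None (the text does not say what happens to extra files when TemplateIds is short and RepeatLastTemplate is false).

```python
def assign_templates(file_count, template_ids=None, repeat_last=False,
                     doc_type_templates=()):
    """Return one entry per file: a template ID/name, "-2" to skip, or None."""
    if template_ids:
        assigned = list(template_ids[:file_count])
        if repeat_last and len(assigned) < file_count:
            # RepeatLastTemplate: reuse the last ID for the remaining files.
            assigned.extend([assigned[-1]] * (file_count - len(assigned)))
    else:
        # No TemplateIds: templates of the document type, ordered by name.
        assigned = sorted(doc_type_templates)[:file_count]
    # Files beyond the available templates are not processed for extraction.
    assigned.extend([None] * (file_count - len(assigned)))
    return assigned


print(assign_templates(3, ["7", "9"], repeat_last=True))
print(assign_templates(3, doc_type_templates=["PageB", "PageA"]))
```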
POST {{baseUrl}}/session/services/uimdata
The UimData Real-Time Service will provide either UIM (Unified Indexing Model) data population or validation or both population and validation. The population and validation rules referenced below are developed in the Designer when constructing a Document Type. Please see the Designer documentation for more information about rules and Document Types.
Service Properties
Env - Metadata environment identifier. Value is one of D, T or P. Default value is P.
Number of Request Items
This Real-Time Service supports one or more items.
Values Per Request Item
Command - String. Valid values:
TriggerReference - String. Name of the field that is used as a population trigger or population target. Used only for Populate or PopulateAndValidate commands. If this is empty or not provided, then the service will run all the rules on the supplied UimData. If it is populated, then it will only run rules that are not one-time rules.
TriggerKind - String. One of the following values. Used only for Populate or PopulateAndValidate commands.
PopulateTriggerRow - Integer. This is a zero-based row index for array-field-based population. This property is ignored if no field name was supplied in the triggerReference property or if the field name supplied is not an array field. The operation also fails if the index supplied for this property is invalid for the supplied array field name.
UimData - Object. This is a UIM data information object that you want the service to use for performing the command.
Files Per Request Item
No files are necessary or used.
POST {{baseUrl}}/session/services/processimagepipeline
The Process Image Pipeline Real-Time Service executes a series of image services on a single image based on service properties. The order of service execution is as follows:
- Image Enhancement
- Classify
- Extract
Outputs of all executed services are combined and returned in the response.
Service Properties
Env - Metadata environment identifier. Value is one of D, T or P. Default value is P.
ImageProfile - If non empty, the incoming image is enhanced.
ReturnFileDataInline - Boolean. If true, then the resulting file is returned inline in the result item as a base64 encoded file. If false, then the resulting file is returned as a fileId and can be retrieved through the Files resource. Default value is true. Applicable only if image enhancement is done. The file ID referencing the resulting file is returned as part of the URI in the src property of the File object of the Result Item.
Classify - Boolean. Classify the image.
Extract - Boolean. Extract the image. If this is set, then ClassifyAndExtract is performed on the image and the “Classify” flag is ignored.
IncludeOcrData - Boolean. If true, the returned UIM data object contains extracted character information. Otherwise it does not.
Project - Optional string. The recognition project to use to classify the document. Valid values are Default for Advanced Recognition and InformationExtraction for Information Extraction. If omitted, Default is used.
Number of Request Items
This Real-Time Service supports one or more items.
Values Per Request Item
No values are necessary or used.
Files Per Request Item
There can only be one file per request item object. It can either be an embedded file or a reference to a file ID previously posted to the Files Resource.
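How the ImageProfile, Classify, and Extract service properties select the pipeline stages can be modelled as follows. The step labels returned by the function are descriptive strings chosen for this sketch, not identifiers used by the service.

```python
def pipeline_steps(image_profile="", classify=False, extract=False):
    """Return the stages the pipeline would run, in execution order."""
    steps = []
    if image_profile:
        steps.append("ImageEnhancement")  # runs only for a non-empty profile
    if extract:
        # Extract implies classify-and-extract; the Classify flag is ignored.
        steps.append("ClassifyAndExtract")
    elif classify:
        steps.append("Classify")
    return steps


# "MyProfile" is a hypothetical image processor profile name.
print(pipeline_steps(image_profile="MyProfile", classify=True, extract=True))
```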
POST {{baseUrl}}/session/services/learning
The Learning service is used to learn to classify and/or extract a document using the Information Extraction recognition project. The document must previously have been processed with one of the services that classify and/or extract the document in order to generate the server-side data needed for learning.
Extraction IDs (extractionId) returned by the following services cannot be used for learning: Classify, ClassifyExtractPage, and ExtractPage. Extraction IDs returned by the following services can be used: ClassifyExtractDocument and ExtractDocument. In other words, only extraction IDs coming from document-level classify or extract services are valid.
There are two situations in which learning IDs become invalid. First, if you change the structure of the batch (for example, by rearranging, deleting, or adding pages) after you have called those APIs for classification and extraction, you must clear the extraction IDs (extractionId) in the UimData, because they are no longer valid for learning. Second, if you modify a page, for example by rotating or cropping it, supply null for that page's classification ID instead of the ID returned by the classification APIs, because the change to the image invalidates its classification.
For more efficient processing, if a document classified or extracted with the Information Extraction project will not be learned, the learning service should still be called with the learning mode set to None. This deletes any temporary server-side data that would otherwise be used for learning.
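The validity rules for learning IDs lend themselves to two small client-side checks. The function names and the way page modifications are tracked (a set of zero-based page positions) are assumptions made for this sketch; the service names and rules come from the description above.

```python
LEARNABLE_SOURCES = {"ClassifyExtractDocument", "ExtractDocument"}


def extraction_id_usable(source_service, batch_structure_changed=False):
    """True if an extractionId from source_service may be sent for learning."""
    return source_service in LEARNABLE_SOURCES and not batch_structure_changed


def classification_ids_for_learning(page_ids, modified_pages):
    """Replace the classification ID of each modified page with None (JSON null)."""
    return [None if index in modified_pages else page_id
            for index, page_id in enumerate(page_ids)]


# Page at position 1 was rotated after classification, so its ID is dropped.
print(classification_ids_for_learning(["id-a", "id-b", "id-c"], {1}))
```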
Service Properties
Env - Metadata environment identifier. Value is one of D, T or P. Default value is P.
Project - String. Must be set to InformationExtraction for the Information Extraction project.
Number of Request Items
This Real-Time Service supports one or more items.
Values Per Request Item
Mode - String. Valid values:
UimData - Object. This is a UIM data information object to be used for learning to extract. Learning works best if the field values have location rectangles. The page IDs for the field values are 0-based page numbers in this call.
DocumentType - String. The Document Type name to be used for learning to classify.
ClassificationPageIds - Array of strings. The array of server-generated IDs, one per page, returned by classification functions. The service uses these values to learn to classify. If the entire document was extracted, this is the ClassificationPageIds returned by the extraction call. If the document was extracted page-by-page, it is an array of the per-page classification IDs. The values in the list must have the same order as the pages in the document.
Files Per Request Item
Each item can have one or more files. It can either be an embedded file or a reference to a file ID previously posted to the Files Resource. The File Type property for the file is ignored for this service.
GET {{baseUrl}}/session?suppress_response_codes=in adipisicing
The Session resource will provide the URI for ending the session. This clears all the files and metadata in your session and allows for more efficient processing for future sessions related to your subscription.
DELETE {{baseUrl}}/session?suppress_response_codes=in adipisicing
As long as the session is still active, deleting your session will return HTTP status code 200. If the session has expired, then a 401 Unauthorized response is sent by the server.
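A client can branch on the two documented status codes when ending a session. The return labels below are arbitrary strings chosen for this sketch, and the handling of any other status code is an assumption.

```python
def describe_session_delete(status_code):
    """Map the status code of DELETE /session to a short outcome label."""
    if status_code == 200:
        return "ended"    # the session was active and has been deleted
    if status_code == 401:
        return "expired"  # the session had already expired
    return "unexpected"   # not covered by the description above


print(describe_session_delete(200))
```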
POST {{baseUrl}}/session/batches?suppress_response_codes=in adipisicing
The Create and Export Batch operation is the last call required to create a batch and submit it to your backend server. First submit all image files to the server; you can then use the returned file reference IDs to reference the files in the Batch nodal information. You can also use the file reference IDs returned by Real-Time Service calls.
The batch name that you use to create a batch must be unique when it is imported into your backend server. To help you create unique names, you can supply any Capture Services Format Expression function (see the Designer Documentation) for the batchName JSON property. There are also two additional format tokens you can use to provide unique names, {NextIndex} and {NextId}:
{NextIndex} - Provides a unique 64-bit integer. Example: batchName: MyBatch_{NextIndex} produces on the server MyBatch_1026000000002
{NextId} - Provides a unique, valid batch name string. Example: batchName: MyBatch_{NextId} produces on the server MyBatch3241
Any supported static function in the Capture Services Expression Language (see the Designer Documentation). Example: batchName: MyBatch_{Tddhhmmss|Now()}_{NextIndex} produces on the server MyBatch090649341026000000003. Or batchName: MyBatch_{S|CreateGuid(0)} produces on the server MyBatch_82fcd238-2fb7-44ac-9acc-a13ce406241d
Document type and UimData values in batch:
To compose an export profile and export UimData values, the Document Type and UimData values must be set at the batch levels listed below, using the given names.
Batch – Level 7:
valueName = Profile
valueType = string
value = // Name of the profile.
Document – Level 1:
valueName = UimDocumentType
valueType = string
value = // This should be a string specifying the document type name.
valueName = UimData
valueType = uimdata
value = // This should be a UIM Data JSON Object
valueName = OutputFile
valueType = file
value = // File id for the original PDF to be available for export.
Page – Level 0:
valueName = OutputImage
valueType = file
value = // file id for the image to be exported
valueName = Backside
valueType = int
value = // 0 (front) or 1 (back) to indicate whether the image is a backside image.
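The per-level values listed above can be generated with small helpers. The valueName, valueType, and value keys and the names and types come from the listing; how these value lists are nested inside the batch node structure is not shown in this section, so the sketch stops at the lists themselves. Treating OutputFile as optional is an assumption, and the file IDs and names in the usage lines are made up.

```python
def make_value(name, value_type, value):
    """Build one value entry in the shape used by the listing above."""
    return {"valueName": name, "valueType": value_type, "value": value}


def batch_level_values(profile_name):
    """Values for the Batch node (level 7)."""
    return [make_value("Profile", "string", profile_name)]


def document_level_values(document_type, uim_data, output_file_id=None):
    """Values for a Document node (level 1)."""
    values = [
        make_value("UimDocumentType", "string", document_type),
        make_value("UimData", "uimdata", uim_data),
    ]
    if output_file_id is not None:
        values.append(make_value("OutputFile", "file", output_file_id))
    return values


def page_level_values(image_file_id, is_backside=False):
    """Values for a Page node (level 0)."""
    return [
        make_value("OutputImage", "file", image_file_id),
        make_value("Backside", "int", 1 if is_backside else 0),
    ]


doc_values = document_level_values("Invoice", {}, output_file_id="file-9")
page_values = page_level_values("file-1", is_backside=True)
```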