Most organisations rely on paper-based forms for their business processes, and extracting the data from these forms and integrating it with business systems has always been a challenge. Form Recognizer is a Cognitive Service that uses machine learning to detect data in these paper-based forms and extract it as key/value pairs.
API Reference – https://bit.ly/2rvTvwr
Pre-Requisites
- Form Recognizer Cognitive Service provisioned on Azure (Submit for Form Recognizer Access Request)
- Postman App
Image Requirements
- A set of 5 forms of the same type for training
- Files must be in PDF, JPEG, or PNG format
Train Model Operation
The Train Model operation allows you to create and train a custom model. The train request must include a source parameter, which must be either an externally accessible Blob storage container URI or a valid path to a data folder on a local drive.
- Launch Postman
- Append /formrecognizer/v1.0-preview/custom/train to the endpoint URL from the pre-requisites step
- In the Headers tab, provide the subscription key (Ocp-Apim-Subscription-Key) and the content type (Content-Type: application/json).

- Provision a new storage account on the Azure Portal, create a container named “formsrecognizer-train”, and upload the training documents to the container.
- The source value in the request is the blob URL, in the form https://<storage account>.blob.core.windows.net/<container name>?<SAS value>
- storage account – name of the storage account that was created in the previous step
- SAS value – can be retrieved from the Azure Portal: navigate to the Storage account > Shared access signature > Generate SAS and Connection String, and note down the “Blob Service SAS Url”.
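The blob URL format described above can be assembled in a few lines. This is only a minimal sketch: the function name `build_container_sas_url` and the sample values are hypothetical, and it assumes you have the SAS token (the query-string part of the “Blob Service SAS Url”) as a separate string.

```python
def build_container_sas_url(storage_account: str, container_name: str, sas_token: str) -> str:
    """Return https://<storage account>.blob.core.windows.net/<container name>?<SAS value>."""
    # The SAS token is sometimes copied with a leading "?"; strip it so exactly one is added.
    token = sas_token.lstrip("?")
    return f"https://{storage_account}.blob.core.windows.net/{container_name}?{token}"


# Placeholder values for illustration only, not real credentials.
url = build_container_sas_url(
    "mystorageaccount", "formsrecognizer-train", "?sv=PLACEHOLDER&sig=PLACEHOLDER"
)
print(url)
```

Note that the container name goes before the `?`, so a Blob Service SAS URL copied from the portal needs the container name inserted into its path rather than appended to the end.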

- prefix – restricts training to files and folders whose names begin with the specified prefix. In this example, the request looks for files with the prefix “formsrecognizer-”.
- includeSubFolders – a boolean indicating whether sub folders within the container should also be searched for files.
- Here is the JSON output of the “Train Model” operation.

{
  "modelId": "d405176c-7072-45a4-ab9b-2b515a22d1b8",
  "trainingDocuments": [
    { "documentName": "Invoice_1.pdf", "pages": 1, "errors": [], "status": "success" },
    { "documentName": "Invoice_2.pdf", "pages": 1, "errors": [], "status": "success" },
    { "documentName": "Invoice_3.pdf", "pages": 1, "errors": [], "status": "success" },
    { "documentName": "Invoice_4.pdf", "pages": 1, "errors": [], "status": "success" },
    { "documentName": "Invoice_5.pdf", "pages": 1, "errors": [], "status": "success" }
  ],
  "errors": []
}
- Note down the Model Id.
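For readers who prefer a script to Postman, the same Train Model call can be sketched in Python using only the standard library. Treat this as an illustration under stated assumptions rather than a definitive client: the helper names `build_train_request` and `extract_model_id`, the example endpoint host, and the placeholder key are all hypothetical, and nesting `prefix` and `includeSubFolders` under a `sourceFilter` object is an assumption about the request body that should be checked against the API reference. The request is built but not sent, so the snippet runs without network access.

```python
import json
import urllib.request


def build_train_request(endpoint, subscription_key, source_url,
                        prefix="", include_sub_folders=False):
    """Build (but do not send) the POST request for the Train Model operation."""
    url = endpoint.rstrip("/") + "/formrecognizer/v1.0-preview/custom/train"
    body = {
        "source": source_url,
        # Assumption: the filter fields are nested under "sourceFilter".
        "sourceFilter": {"prefix": prefix, "includeSubFolders": include_sub_folders},
    }
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


def extract_model_id(response_text):
    """Return the modelId, raising if any training document did not succeed."""
    result = json.loads(response_text)
    failed = [doc["documentName"] for doc in result["trainingDocuments"]
              if doc["status"] != "success"]
    if failed:
        raise ValueError(f"Training failed for: {failed}")
    return result["modelId"]


# Hypothetical endpoint and key; substitute the values from your own resource.
request = build_train_request(
    "https://example.cognitiveservices.azure.com",
    "YOUR_SUBSCRIPTION_KEY",
    "https://<storage account>.blob.core.windows.net/<container name>?<SAS value>",
    prefix="formsrecognizer-",
)

# Shortened copy of the response shown above, used here in place of a live call.
sample = json.dumps({
    "modelId": "d405176c-7072-45a4-ab9b-2b515a22d1b8",
    "trainingDocuments": [
        {"documentName": "Invoice_1.pdf", "pages": 1, "errors": [], "status": "success"},
    ],
    "errors": [],
})
print(extract_model_id(sample))  # d405176c-7072-45a4-ab9b-2b515a22d1b8
```

To actually send the request, pass it to `urllib.request.urlopen(request)` and feed the decoded response body to `extract_model_id`.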