Introduction

Processing incoming invoices is a key task for many companies. Usually, accountants review the documents and enter the necessary information into the system. However, as the number of invoices grows, so does the time spent processing them. This reduces the company's efficiency and increases stress among employees. This is precisely where automation comes to the rescue.

We have already written about how to improve document processing in the article "AI-powered Document Recognition | UDS Blog". In this article, we will explore another approach to the same task.

Model Training

For our purpose, we will use the "Custom extraction model" found in the "Document Intelligence Studio." 

Once we have selected it, we can create a new project by clicking the "Create a project" button: 

Now, we will enter the project’s name: 

As we do not have any resources prepared in Azure, we can create them with a new project: 

A little lower, we will select the API version. We will use the latest available version:

For this project, we will create a new resource group, a storage account, and a container:

! Note. We have specified the path to the folder, as we will need it later: 

After the project has been created, we will see the following window: 

Now, we can upload the data and try to train the model. To do this, we will select at least five documents with the same structure. Since our task is to recognize invoices, we will use the "Auto label" -> "All documents" option.
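Before uploading, it can be useful to check locally that a training folder really contains at least five documents. The following is only a minimal sketch: the function names and the list of file extensions are our own assumptions for illustration and are not taken from the Studio.

```python
from pathlib import Path

# Assumed set of extensions; adjust it to the file types you actually use.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp"}
MIN_TRAINING_DOCUMENTS = 5  # minimum number of documents named in this article


def find_training_documents(folder):
    """Return the sorted list of candidate training files found in `folder`."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )


def has_enough_documents(folder, minimum=MIN_TRAINING_DOCUMENTS):
    """Check whether `folder` holds at least `minimum` candidate documents."""
    return len(find_training_documents(folder)) >= minimum
```

The check only counts files. Whether the documents share the same structure still has to be verified by eye.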

 

 

Next, we will choose the “prebuilt-invoice” model and click on the “Auto label” button: 

After that, the files will be recognized automatically, and the fields will be pre-filled. However, warnings may appear, and the affected fields must then be corrected manually. For example, in our case, the text "z. Hd." was assigned to both "CustomerAddressRecipient" and "CustomerName":

 

 

! Note. This is part of the address, so we will delete the text from the "CustomerName" field. Then, we will fill the "CustomerName" field with the correct text by clicking on the desired text in the document and selecting the "CustomerName" field:

! Note. This operation must be repeated for each file that has a similar warning. In addition, recognition requires the invoice number and the invoice date. Fortunately, these have been recognized correctly.
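The manual review described above can be expressed as a small check. This is a sketch under assumptions: we assume the labels of one document are available as a plain dictionary, and that the number and the date required for recognition correspond to the "InvoiceId" and "InvoiceDate" fields. The function name is hypothetical.

```python
# "InvoiceId" and "InvoiceDate" are assumed names for the required number and date.
REQUIRED_FIELDS = ("InvoiceId", "InvoiceDate")


def review_labels(fields):
    """Return a list of warnings for one labeled document.

    `fields` maps a field name to the text labeled for it.
    """
    warnings = []
    for name in REQUIRED_FIELDS:
        if not (fields.get(name) or "").strip():
            warnings.append(f"missing required field: {name}")

    customer = (fields.get("CustomerName") or "").strip()
    recipient = (fields.get("CustomerAddressRecipient") or "").strip()
    # The same text labeled as both name and address recipient is the
    # situation described above and needs a manual correction.
    if customer and customer == recipient:
        warnings.append("CustomerName duplicates CustomerAddressRecipient")
    return warnings
```

An empty list means that none of the two problems discussed here was found. It does not replace checking the labels in the Studio.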

Thus, we have the following result: 

 

Now, we will click on the “Train” button. There, we need to specify the model ID and select the training mode.

! Note. For our purpose, the “Template” mode is sufficient, but the “Neural” mode is recommended:

After setting the parameters, we click on the "Train" button. We can then see the training results on the "Models" page:

Now, we have to change the folder path in the project settings. We will set the value to MS: 

Similar to the previous steps, we will upload the documents and train another model. Unlike with the previous files, however, here we will also label the data about the car and its last recorded mileage. We need this data, but what if the “prebuilt-invoice” model has no such fields?

In fact, this is not a problem. We will create our own fields as follows: on the “Label data” tab, to the right of the file area, we click on “Add a field”, select “Field”, and enter the name of the new field:

This way, we have created two new fields, "Kennzeichen" (license plate) and "KmStand" (odometer reading). Then, we map the fields to the values:

Now, we need to assign the values in each document selected for training. After editing the fields, we train the second model.

Creating a Classifier

To create a shared model (a "composed model" in Azure terminology), we first need to create a classifier. For this, we go to the main menu in the "Document Intelligence Studio" and select "Custom classification models":

Similar to the Custom extraction model, we create a project, but this time, we choose the existing services:

We leave the folder path field blank:

Since we have used different folders for different file types, we will receive the following message when creating the classification project:

Here, we are asked to create labels according to the folder names. We accept the suggestion, so the labels are created automatically, and the files in each folder are labeled with that folder's name.
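The folder-based labeling can be illustrated with a short sketch. It only mimics the idea on a local copy of the data and makes no claim about how the Studio implements it; the function name and the local folder layout are assumptions.

```python
from pathlib import Path


def labels_from_folders(root):
    """Map each file under `root` to the name of its top-level folder."""
    root = Path(root)
    labels = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root)
        if len(relative.parts) < 2:
            continue  # files directly in the root have no folder to name them
        labels[relative.as_posix()] = relative.parts[0]
    return labels
```

With one folder per document type, every file receives exactly one label, which is why keeping the types in separate folders pays off at this step.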

Now, we can click on the “Train” button, fill in the required fields, and train the model for classification:

We can see the training result on the “Models” tab:

The next step is to return to the custom extraction project and create a shared model. To do this, we go to the “Models” tab and select the previously trained models:

We set the model identifier, select the previously created classifier, and set the parameters that the recognition result must meet.
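Conceptually, a shared model built this way first classifies a document and then passes it to the matching extraction model. The sketch below illustrates only that routing idea. It is not the service's implementation, and the function name, the default threshold of 0.8, and the mapping from document type to model ID are assumptions.

```python
def choose_extraction_model(doc_type, confidence, models_by_type, threshold=0.8):
    """Pick the extraction model for a classified document.

    Returns the model ID registered for `doc_type`, or None when the type is
    unknown or the classifier confidence is below `threshold`.
    """
    if confidence < threshold:
        return None
    return models_by_type.get(doc_type)
```

Returning None for low-confidence or unknown documents leaves room to send them to manual processing instead of extracting fields with the wrong model.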

Checking the Results

The last step is to test our model. We go to the “Test” tab and choose the shared model. Then we upload two files with different structures and click on the “Run analysis” button:

Now, we can see the recognition result. 

The document MS:

The document N.:
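Outside the Studio, an analysis result usually arrives as JSON. The sketch below reduces such a result to the document type and the recognized field texts. The response shape used here ("analyzeResult" -> "documents" -> "docType" and "fields", each field carrying a "content" entry) is an assumption and should be checked against the API version in use; the function name is hypothetical.

```python
def summarize_result(result):
    """Return (doc_type, {field name: text}) for the first analyzed document.

    `result` is assumed to be the decoded JSON of an analysis response shaped as
    {"analyzeResult": {"documents": [{"docType": ..., "fields": {...}}]}}.
    """
    documents = result.get("analyzeResult", {}).get("documents", [])
    if not documents:
        return None, {}
    document = documents[0]
    fields = {
        # A field without a value is kept with None as its text.
        name: (field or {}).get("content")
        for name, field in document.get("fields", {}).items()
    }
    return document.get("docType"), fields
```

A summary like this makes it easy to compare the custom fields, such as "Kennzeichen" and "KmStand", against the values visible in the document.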

Summary

As we can see, the results are quite acceptable: the documents have been correctly classified and analyzed. We can conclude that the solution is effective.

These instructions will help automate document processing, reduce the risk of errors, and free up the accounting team for more critical tasks.

Feel free to contact a UDS Systems representative if you have any questions or require a consultation on the topic.