In real-world web applications, we often need to read the content of many PDF files and use it to prefill forms. To read these PDFs automatically and extract the values, Microsoft offers a cognitive service called Form Recognizer. We pass a PDF file to the service and get the extracted OCR values back as JSON, together with bounding box coordinates.
Ref: https://azure.microsoft.com/en-in/services/form-recognizer/#features
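As a rough sketch of the round trip described above, the snippet below submits a PDF and polls for the JSON result using only the Python standard library. It assumes the v2.1 REST Layout operation (the path /formrecognizer/v2.1/layout/analyze, the Ocp-Apim-Subscription-Key header and the Operation-Location response header); the endpoint, key and function names are placeholders of mine, so please check them against the official API reference before relying on this.

```python
import json
import time
import urllib.request


def build_analyze_request(endpoint, api_key, pdf_bytes):
    """Build the POST request that submits a PDF for analysis.

    Assumption: the v2.1 Layout operation path; verify it for your resource.
    """
    url = endpoint.rstrip("/") + "/formrecognizer/v2.1/layout/analyze"
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/pdf",
    }
    return urllib.request.Request(url, data=pdf_bytes, headers=headers, method="POST")


def analyze_pdf(endpoint, api_key, pdf_bytes, poll_seconds=2.0, max_polls=30):
    """Submit the PDF, then poll the result URL until the analysis finishes."""
    request = build_analyze_request(endpoint, api_key, pdf_bytes)
    with urllib.request.urlopen(request) as resp:
        # The service answers asynchronously and tells us where to poll.
        result_url = resp.headers["Operation-Location"]

    poll = urllib.request.Request(
        result_url, headers={"Ocp-Apim-Subscription-Key": api_key}
    )
    for _ in range(max_polls):
        with urllib.request.urlopen(poll) as resp:
            body = json.load(resp)
        if body.get("status") in ("succeeded", "failed"):
            return body
        time.sleep(poll_seconds)
    raise TimeoutError("analysis did not finish in time")
```

Calling analyze_pdf(endpoint, key, open("invoice.pdf", "rb").read()) with your own resource endpoint and key should return a dictionary shaped like the sample JSON in this post (the file name is only an illustration).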
We can use a custom library to highlight the bounding boxes in the UI on top of the page image. To do this, convert the PDF pages into images and display them in the UI.
For example, an HTML image map can be used:
https://www.w3schools.com/tags/tag_map.asp
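To connect the two ideas, here is a minimal sketch that turns one Form Recognizer bounding box (eight numbers, that is four x,y corner points) into an HTML area tag for an image map. The function name and the page_size / image_size parameters are my own, and the scaling assumes the displayed image is a resized copy of the page whose width and height are reported in the JSON.

```python
from html import escape


def bounding_box_to_area(box, page_size, image_size, title=""):
    """Return an <area shape="poly"> tag for one bounding box.

    box        -- [x1, y1, x2, y2, x3, y3, x4, y4] in page units
    page_size  -- (width, height) reported for the page in the JSON
    image_size -- (width, height) of the image as displayed in the UI
    """
    if len(box) != 8:
        raise ValueError("expected 8 numbers (4 corner points)")
    page_w, page_h = page_size
    img_w, img_h = image_size
    scale_x = img_w / page_w
    scale_y = img_h / page_h

    coords = []
    for i, value in enumerate(box):
        # Even positions are x values, odd positions are y values.
        scale = scale_x if i % 2 == 0 else scale_y
        coords.append(str(round(value * scale)))

    return '<area shape="poly" coords="{}" title="{}">'.format(
        ",".join(coords), escape(title, quote=True)
    )


# Same size image as the page, so the coordinates pass through unchanged.
print(bounding_box_to_area(
    [114, 134, 466, 134, 466, 175, 115, 175],
    (1700, 2200), (1700, 2200), "CONTOSO LTD.",
))
```

The generated tags go inside a map element that the page image references through its usemap attribute; highlighting on hover still needs CSS or a small script on top of this.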
Sample extracted JSON with bounding box coordinates (truncated):
{"status":"succeeded","createdDateTime":"2021-02-23T05:09:00Z","lastUpdatedDateTime":"2021-02-23T05:09:11Z","analyzeResult":{"version":"2.1.0","readResults":[{"page":1,"angle":0,"width":1700,"height":2200,"unit":"pixel","lines":[{"text":"CONTOSO LTD.","boundingBox":[114,134,466,134,466,175,115,175],"words":[{"text":"CONTOSO","boundingBox":[115,135,333,134,333,176,115,176],"confidence":0.994},{"text":"LTD.","boundingBox":[357,134,465,134,465,176,358,176],"confidence":0.994}],"appearance":{"style":{"name":"other","confidence":0.878}}},{"text":"INVOICE","boundingBox":[1410,114,1601,115,1601,155,1410,155],"words":[{"text":"INVOICE","boundingBox":[1411,115,1593,115,1592,156,1411,155],"confidence":0.995}],"appearance":{"style":{"name":"other","confidence":0.878}}},.......}