AI-900: How to use the Read API to extract text from images and documents

Learn how the Read API in Azure Cognitive Services can process single-page and multi-page documents and return a JSON response with the recognized text and its location in the original image.

Question

When you use the Read API to process an image, what hierarchy of information does it return?

A. First pages, then lines, then words
B. First words, then lines, then pages
C. First lines, then pages, then words

Answer

A. First pages, then lines, then words

Explanation

The correct answer is A. The Read API arranges its results into the following hierarchy: first pages, then lines, then words.

The Read API is a part of the Computer Vision service in Azure Cognitive Services that can extract printed and handwritten text from images and documents. The Read API can process both single-page and multi-page documents and return a JSON response that contains the recognized text and its location in the original image. The JSON response has a hierarchy of information that reflects the structure of the document. The hierarchy is as follows:

  • The top level is the operation result, which reports the status of the read operation and wraps the extracted content in analyzeResult.
  • The next level is readResults, a list with one entry per page in the document. Each entry contains information about that page, such as the page number, the angle of rotation of the text, the width and height of the page, and the unit of measurement.
  • The next level is the line (lines), which contains information about each line of text in the page, such as the text content, the bounding box coordinates, and the list of words in the line.
  • The lowest level is the word (words), which contains information about each word in the line, such as the text content, the bounding box coordinates, and the confidence score.
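
As a rough illustration of the pages, lines, words hierarchy, the following Python sketch walks a hand-written sample response. The sample data is made up, and the field names (analyzeResult, readResults, lines, words, boundingBox, confidence) reflect the Read 3.x response shape as understood here, so check them against the API version you actually call. The helper name iter_words is hypothetical.

```python
from typing import Iterator, Tuple

# Hypothetical sample shaped like a Read 3.x result; all values are made up.
sample_result = {
    "status": "succeeded",
    "analyzeResult": {
        "readResults": [
            {
                "page": 1,
                "angle": 0.0,
                "width": 600,
                "height": 400,
                "unit": "pixel",
                "lines": [
                    {
                        "boundingBox": [10, 10, 200, 10, 200, 40, 10, 40],
                        "text": "Hello world",
                        "words": [
                            {"boundingBox": [10, 10, 90, 10, 90, 40, 10, 40],
                             "text": "Hello", "confidence": 0.99},
                            {"boundingBox": [100, 10, 200, 10, 200, 40, 100, 40],
                             "text": "world", "confidence": 0.97},
                        ],
                    }
                ],
            }
        ]
    },
}


def iter_words(result: dict) -> Iterator[Tuple[int, str, str, float]]:
    """Yield (page number, line text, word text, word confidence)."""
    for page in result["analyzeResult"]["readResults"]:  # level 1: pages
        for line in page.get("lines", []):               # level 2: lines
            for word in line.get("words", []):           # level 3: words
                yield page["page"], line["text"], word["text"], word["confidence"]


words = list(iter_words(sample_result))
for page_no, line_text, word_text, conf in words:
    print(f"page {page_no} | line {line_text!r} | word {word_text!r} ({conf:.2f})")
```

The three nested loops mirror the answer to the question: pages contain lines, and lines contain words.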

The Read API returns the information in this hierarchy to preserve the layout of the original document, so that the user can map each piece of extracted text back to its position. For example, the user can use the page number, the angle of rotation, and the bounding box coordinates to place the text in the correct position and orientation on the page. The Read API does not itself label paragraphs, headings, lists, or tables, but the user can use the position of each line and word as a starting point for inferring such elements.
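
One way to use the bounding boxes for layout is sketched below in Python. It assumes each boundingBox is a list of eight numbers (four x, y corner points) and that the page is roughly unrotated, so the angle is ignored. The helper names bbox_rect and lines_top_to_bottom and the sample page are hypothetical.

```python
from typing import List, Tuple


def bbox_rect(bounding_box: List[float]) -> Tuple[float, float, float, float]:
    """Collapse an 8-number polygon into (left, top, right, bottom)."""
    xs = bounding_box[0::2]  # x coordinates of the four corners
    ys = bounding_box[1::2]  # y coordinates of the four corners
    return min(xs), min(ys), max(xs), max(ys)


def lines_top_to_bottom(page: dict) -> List[str]:
    """Return line texts sorted by top edge, then by left edge."""
    def sort_key(line: dict) -> Tuple[float, float]:
        left, top, _, _ = bbox_rect(line["boundingBox"])
        return (top, left)

    return [line["text"] for line in sorted(page["lines"], key=sort_key)]


# Hypothetical page entry with its lines deliberately out of order.
page = {
    "page": 1, "angle": 0.0, "width": 600, "height": 400, "unit": "pixel",
    "lines": [
        {"text": "Second line", "boundingBox": [10, 60, 220, 60, 220, 90, 10, 90]},
        {"text": "First line", "boundingBox": [10, 10, 200, 10, 200, 40, 10, 40]},
    ],
}

ordered = lines_top_to_bottom(page)
print(ordered)
```

For a noticeably rotated page, the coordinates would first need to be rotated by the page angle before sorting. That step is left out here to keep the sketch short.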

Microsoft Azure AI Fundamentals AI-900 certification exam practice question and answer (Q&A) dump with detailed explanations and references, available free to help you pass the Microsoft Azure AI Fundamentals AI-900 exam and earn the Microsoft Azure AI Fundamentals AI-900 certification.
