AI-102: How to Optimize Azure Cognitive Search Indexing?

Discover expert tips on choosing the right Azure Cognitive Search indexer capabilities for computationally intensive AI tasks. Perfect for mastering the AI-102 exam and boosting your career!

Question

Xerigon Corporation has an Azure Storage account for each department. Several departments need computationally intensive AI enrichment, which requires indexers that run skillsets.

Scenario 1:

The Finance department needs to scan 14 MB of text documents. Documents may change due to market fluctuations, so indexing only new and changed documents is essential.

Scenario 2:

The Marketing department needs to scan 9 GB of documents about competitors’ products.

Scenario 3:

The Accounting department scans up to 1,200 documents of sales orders in an 8-hour period. The entire payload of the scan generally exceeds 4.7 GB.

You must index the documents for each department in its respective storage account. You want to minimize the time it takes to build an index for each department’s documents.

Which indexer capability should you choose for each scenario?

Match each scenario with the appropriate capability.

Indexing scenarios:

Scenario 1
Scenario 2
Scenario 3

Answer

Parallel indexing:

Scenario 2
Scenario 3

Scheduled indexing:

Scenario 1

Batching documents:

None of the scenarios

Explanation

Scheduled indexing, batching documents, and parallel indexing are different capabilities for indexing data sets in Azure AI Search (formerly Azure Cognitive Search).

Scheduled indexing automates index updates at regular intervals, which helps when the source data changes frequently. In Scenario 1, documents may change with market fluctuations, so indexing only new and changed documents is essential. A schedule sets how often the indexer runs, from every five minutes to once daily, and each run can pick up the documents that are new or changed since the previous run.
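As a rough illustration of the schedule setting, the sketch below builds an indexer definition as a plain Python dictionary in the shape of the REST API request body, with the interval given as an ISO 8601 duration. The resource names (finance-indexer, finance-blobs, finance-index, finance-skillset) and both helper functions are hypothetical and not part of the original question; sending the definition to the service is left out.

```python
import json
import re

MIN_MINUTES = 5          # shortest schedule interval (five minutes)
MAX_MINUTES = 24 * 60    # longest schedule interval (one day)


def iso_duration_to_minutes(interval: str) -> int:
    """Convert a simple ISO 8601 duration such as 'PT5M', 'PT2H' or 'P1D' to minutes."""
    match = re.fullmatch(r"P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?)?", interval)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {interval!r}")
    days, hours, minutes = (int(g) if g else 0 for g in match.groups())
    return days * 24 * 60 + hours * 60 + minutes


def build_scheduled_indexer(name, data_source, index, skillset, interval="PT5M"):
    """Return an indexer definition (hypothetical helper) with a validated schedule."""
    minutes = iso_duration_to_minutes(interval)
    if not MIN_MINUTES <= minutes <= MAX_MINUTES:
        raise ValueError("interval must be between 5 minutes and 1 day")
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        "skillsetName": skillset,
        "schedule": {"interval": interval},
    }


# Hypothetical names for the Finance department (Scenario 1).
finance_indexer = build_scheduled_indexer(
    "finance-indexer", "finance-blobs", "finance-index", "finance-skillset"
)
print(json.dumps(finance_indexer, indent=2))
```

The interval check mirrors the five-minute to once-daily range mentioned above; any other validation the service performs is not modeled here.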

Parallel indexing breaks a large dataset into smaller partitions, for example multiple containers in blob storage, and indexes those partitions separately and simultaneously. This speeds up computationally intensive AI enrichment and large-scale data ingestion. Scenarios 2 and 3 both involve large volumes of documents (9 GB and more than 4.7 GB), so parallel indexing fits both.
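The partitioning idea can be sketched as one data source and one indexer per blob container, all writing to the same index. The code below only builds the definitions as dictionaries in the shape of the REST API request bodies; the container names, the placeholder connection string, and the helper function are assumptions made for illustration.

```python
def build_parallel_definitions(connection_string, containers, index, skillset):
    """Return (data_sources, indexers): one pair per container, all targeting one index."""
    data_sources, indexers = [], []
    for container in containers:
        data_source_name = f"{container}-ds"
        data_sources.append({
            "name": data_source_name,
            "type": "azureblob",
            "credentials": {"connectionString": connection_string},
            "container": {"name": container},
        })
        indexers.append({
            "name": f"{container}-indexer",
            "dataSourceName": data_source_name,
            "targetIndexName": index,   # every partition feeds the same index
            "skillsetName": skillset,
        })
    return data_sources, indexers


# Hypothetical partitioning of the Marketing documents (Scenario 2) into three containers.
sources, indexers = build_parallel_definitions(
    "<storage-connection-string>",
    ["marketing-part1", "marketing-part2", "marketing-part3"],
    "marketing-index",
    "marketing-skillset",
)
for indexer in indexers:
    print(indexer["name"], "->", indexer["targetIndexName"])
```

Because each indexer reads a different container, the partitions do not overlap and can be processed at the same time.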

Batching documents can decrease the time taken to process large amounts of data, provided you tune the batch size for your data. Batching is not the right fit for any of these scenarios. A single bulk upload is limited to a 16 MB payload and 1,000 documents, which Scenarios 2 and 3 far exceed: the Accounting department, for example, scans up to 1,200 documents of sales orders in an 8-hour period, and the total payload for that period generally exceeds 4.7 GB. Scenario 1, at 14 MB, would fit within the payload limit, but its need to index only new and changed documents is met by scheduled indexing.
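To make the two batch limits concrete, here is a minimal sketch that splits a list of documents into batches of at most 1,000 documents and roughly 16 MB each. The size is only an estimate based on the JSON-encoded length of each document; the real request adds some wrapper overhead, so leaving headroom would be prudent. The function name and the sample documents are hypothetical.

```python
import json

MAX_DOCS_PER_BATCH = 1000
MAX_BATCH_BYTES = 16 * 1024 * 1024  # 16 MB


def make_batches(documents, max_docs=MAX_DOCS_PER_BATCH, max_bytes=MAX_BATCH_BYTES):
    """Yield lists of documents that stay within the document-count and size limits."""
    batch, batch_bytes = [], 0
    for doc in documents:
        doc_bytes = len(json.dumps(doc).encode("utf-8"))  # estimated size of one document
        if doc_bytes > max_bytes:
            raise ValueError("a single document exceeds the batch size limit")
        # Start a new batch when adding this document would break either limit.
        if batch and (len(batch) >= max_docs or batch_bytes + doc_bytes > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += doc_bytes
    if batch:
        yield batch


# Hypothetical sample: 2,500 small documents are split by the document-count limit.
sample_docs = [{"id": str(i), "content": "x" * 100} for i in range(2500)]
batch_sizes = [len(b) for b in make_batches(sample_docs)]
print(batch_sizes)
```

At 16 MB per request, a payload of more than 4.7 GB would need on the order of 300 such requests at minimum, which illustrates why batching alone is a poor fit for the larger scenarios.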
