Learn the best approach for updating data quality rules in a large number of AWS Glue Data Catalog tables. Discover how to minimize operational overhead using AWS Lambda and the Glue Data Quality API.
Question
A data engineer has implemented data quality rules in 1,000 AWS Glue Data Catalog tables. Because of a recent change in business requirements, the data engineer must edit the data quality rules.
How should the data engineer meet this requirement with the LEAST operational overhead?
A. Create a pipeline in AWS Glue ETL to edit the rules for each of the 1,000 Data Catalog tables. Use an AWS Lambda function to call the corresponding AWS Glue job for each Data Catalog table.
B. Create an AWS Lambda function that makes an API call to AWS Glue Data Quality to make the edits.
C. Create an Amazon EMR cluster. Run a pipeline on Amazon EMR that edits the rules for each Data Catalog table. Use an AWS Lambda function to run the EMR pipeline.
D. Use the AWS Management Console to edit the rules within the Data Catalog.
Answer
B. Create an AWS Lambda function that makes an API call to AWS Glue Data Quality to make the edits.
Explanation
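Rulesets in AWS Glue Data Quality are resources that can be read and updated programmatically, so one AWS Lambda function can loop over the rulesets attached to the 1,000 Data Catalog tables and apply the edit through the API. No additional infrastructure is needed, which makes this the option with the least operational overhead.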
Here’s why the other options are not optimal:
A. Building an AWS Glue ETL pipeline to edit the rules for each table would be time-consuming and would add code and jobs that the task does not need. Glue ETL jobs are meant for transforming data, not for managing ruleset definitions, and using Lambda to start a Glue job for each of the 1,000 tables only adds more moving parts.
C. Setting up an Amazon EMR cluster to run a pipeline that edits the rules for each table is overkill for this task. EMR is designed for big data processing workloads, and the cluster would have to be provisioned, configured, and paid for just to change rule definitions. Triggering the EMR pipeline from Lambda adds another component without removing any of that overhead.
D. Manually editing the rules for 1,000 tables via the AWS Management Console would be extremely tedious and error-prone. It’s not a scalable solution.
In contrast, a single Lambda function can call the AWS Glue Data Quality API to list the existing rulesets and update each one in a loop, so all 1,000 tables are covered by one short script. Lambda is serverless, so there is nothing to provision or patch, and you avoid the operational overhead of managing Glue ETL jobs, EMR clusters, or manual edits.
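To make option B concrete, here is a minimal sketch of such a Lambda function in Python. It assumes each table's rules are stored as a named AWS Glue Data Quality ruleset and uses the boto3 Glue client operations list_data_quality_rulesets, get_data_quality_ruleset, and update_data_quality_ruleset. The helper names (iter_ruleset_names, update_all_rulesets, example_transform), the order_id column, and the thresholds are hypothetical and only illustrate one possible business change; treat it as a starting point rather than a definitive implementation.

```python
from typing import Callable, Iterator


def iter_ruleset_names(glue) -> Iterator[str]:
    """Yield the name of every Glue Data Quality ruleset, following NextToken."""
    kwargs = {}
    while True:
        page = glue.list_data_quality_rulesets(**kwargs)
        for item in page.get("Rulesets", []):
            yield item["Name"]
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token


def update_all_rulesets(glue, transform: Callable[[str], str]) -> dict:
    """Apply `transform` to the DQDL text of each ruleset and save any changes."""
    updated, unchanged = [], []
    # Collect the names first so that updates cannot disturb pagination.
    for name in list(iter_ruleset_names(glue)):
        current = glue.get_data_quality_ruleset(Name=name)["Ruleset"]
        revised = transform(current)
        if revised == current:
            unchanged.append(name)
            continue
        glue.update_data_quality_ruleset(Name=name, Ruleset=revised)
        updated.append(name)
    return {"updated": updated, "unchanged": unchanged}


def example_transform(dqdl: str) -> str:
    """Hypothetical business change: tighten one completeness threshold."""
    return dqdl.replace(
        'Completeness "order_id" > 0.90',
        'Completeness "order_id" > 0.99',
    )


def lambda_handler(event, context):
    # boto3 is imported here so the helpers above can be tested without AWS.
    import boto3

    glue = boto3.client("glue")
    return update_all_rulesets(glue, example_transform)
```

The function's execution role would need permission for the three Glue actions used above. Because a Lambda invocation can run for at most 15 minutes, a very large catalog or API throttling might call for processing the rulesets in batches.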
Therefore, choice B provides the simplest and most effective way to implement the data quality rule changes with the least operational burden on the data engineer.
This practice question, with its answer and explanation, is part of a free Q&A set for the Amazon AWS Certified Data Engineer – Associate DEA-C01 certification exam.