Amazon DEA-C01: What is the most operationally efficient way to synchronize an on-premises Oracle data warehouse with Amazon S3?

Learn the most operationally efficient approach to load and synchronize an on-premises Oracle data warehouse with Amazon S3 using AWS services like AWS DMS and AWS Glue. Discover best practices for incremental data loading.

Question

A company maintains a data warehouse in an on-premises Oracle database. The company wants to build a data lake on AWS. The company wants to load data warehouse tables into Amazon S3 and synchronize the tables with incremental data that arrives from the data warehouse every day.

Each table has a column that contains monotonically increasing values. The size of each table is less than 50 GB. The data warehouse tables are refreshed every night between 1 AM and 2 AM. A business intelligence team queries the tables between 10 AM and 8 PM every day.

Which solution will meet these requirements in the MOST operationally efficient way?

A. Use an AWS Database Migration Service (AWS DMS) full load plus CDC job to load tables that contain monotonically increasing data columns from the on-premises data warehouse to Amazon S3. Use custom logic in AWS Glue to append the daily incremental data to a full-load copy that is in Amazon S3.
B. Use an AWS Glue Java Database Connectivity (JDBC) connection. Configure a job bookmark for a column that contains monotonically increasing values. Write custom logic to append the daily incremental data to a full-load copy that is in Amazon S3.
C. Use an AWS Database Migration Service (AWS DMS) full load migration to load the data warehouse tables into Amazon S3 every day. Overwrite the previous day’s full-load copy every day.
D. Use AWS Glue to load a full copy of the data warehouse tables into Amazon S3 every day. Overwrite the previous day’s full-load copy every day.

Answer

The most operationally efficient solution to meet the requirements is Option A:

Use an AWS Database Migration Service (AWS DMS) full load plus CDC (change data capture) job to load tables that contain monotonically increasing data columns from the on-premises data warehouse to Amazon S3. Use custom logic in AWS Glue to append the daily incremental data to a full-load copy that is in Amazon S3.

Explanation

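AWS DMS supports full load plus CDC (change data capture) tasks with Oracle as the source and Amazon S3 as the target. The task first copies the existing tables and then keeps capturing ongoing changes, which DMS reads from the Oracle redo logs. The monotonically increasing column gives the downstream Glue logic a reliable key for ordering and deduplicating the incremental records.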
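As a rough illustration of what a full load plus CDC task looks like in practice, the sketch below assembles the parameters that boto3's DMS `create_replication_task` call accepts. The schema name `DWH`, the task identifier, and the ARNs are hypothetical placeholders that do not come from the question, and the actual API call is left commented out so the snippet runs without AWS credentials.

```python
import json


def build_dms_task_params(schema_name, task_id, source_arn, target_arn, instance_arn):
    """Assemble keyword arguments for a DMS full-load-plus-CDC replication task."""
    # Selection rule that includes every table in the given schema.
    table_mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-warehouse-tables",
                "object-locator": {"schema-name": schema_name, "table-name": "%"},
                "rule-action": "include",
            }
        ]
    }
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        # Initial copy followed by ongoing change capture.
        "MigrationType": "full-load-and-cdc",
        # The API expects the table mappings as a JSON string.
        "TableMappings": json.dumps(table_mappings),
    }


# All identifiers below are placeholders (assumptions), not real resources.
params = build_dms_task_params(
    schema_name="DWH",
    task_id="dwh-to-s3",
    source_arn="arn:aws:dms:placeholder:source-endpoint",
    target_arn="arn:aws:dms:placeholder:target-endpoint",
    instance_arn="arn:aws:dms:placeholder:replication-instance",
)

# With boto3 installed and credentials configured, the task could be created with:
# boto3.client("dms").create_replication_task(**params)
```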

AWS DMS is more operationally efficient than AWS Glue for the initial load because it is a managed service built for database migration and replication. It handles extraction from the source, basic table and column mapping, and delivery to the target without custom ETL code.

Once the full load is in S3, an AWS Glue job can append each day's incremental data to the full copy. This is a routine ETL step that Glue can run on a schedule.
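The append step can be sketched in plain Python. This is only an illustration of the logic, not a Glue script: a real job would work on Spark DataFrames or Glue DynamicFrames read from S3, and the column name `id` and the sample rows are assumptions made for the example. The idea is to keep only the incremental rows whose key is above the highest key already in the full copy, so that rerunning the job does not duplicate data.

```python
def append_increment(full_copy, increment, key="id"):
    """Return full_copy plus the rows in increment that are newer than its highest key.

    Relies on `key` being monotonically increasing, as stated in the question.
    Handles appends only; updates and deletes would need extra merge logic.
    """
    # Highest key already present in the full copy (None if it is empty).
    high_water = max((row[key] for row in full_copy), default=None)
    # Keep only rows that have not been loaded yet.
    new_rows = [
        row for row in increment
        if high_water is None or row[key] > high_water
    ]
    # Preserve key order in the appended data.
    new_rows.sort(key=lambda row: row[key])
    return full_copy + new_rows


# Hypothetical sample data: row 2 was already loaded and must not be duplicated.
full_copy = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
increment = [{"id": 2, "amount": 20}, {"id": 4, "amount": 40}, {"id": 3, "amount": 30}]

merged = append_increment(full_copy, increment)
print([row["id"] for row in merged])
```

Filtering on the high-water mark makes the append idempotent for insert-only tables, which is what makes a monotonically increasing column useful here.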

Appending increments is more efficient than overwriting the entire dataset every day because only new or changed rows are transferred and processed. The data already in S3 is reused rather than reloaded.

The warehouse refresh finishes by 2 AM and the BI team does not start querying until 10 AM, which leaves roughly eight hours to apply the incremental changes to S3 before the data is needed.
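The timing argument can be checked with simple arithmetic. The sketch below uses the times given in the question (refresh ends at 2 AM, queries start at 10 AM); the function names and the estimated job durations are hypothetical.

```python
from datetime import date, datetime, time, timedelta

REFRESH_END = time(2, 0)   # warehouse refresh finishes (from the question)
BI_START = time(10, 0)     # BI team starts querying (from the question)


def sync_window_hours(refresh_end=REFRESH_END, bi_start=BI_START):
    """Hours available between the end of the refresh and the first BI query."""
    day = date(2000, 1, 1)  # arbitrary date; only the times of day matter
    delta = datetime.combine(day, bi_start) - datetime.combine(day, refresh_end)
    return delta / timedelta(hours=1)


def fits_in_window(estimated_hours):
    """True if a sync job of the given duration finishes before BI queries start."""
    return estimated_hours <= sync_window_hours()


print(sync_window_hours())  # 8.0
```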

Therefore, combining AWS DMS for the initial load and change capture with AWS Glue for incremental appends provides the most operationally efficient solution while meeting all stated requirements.
