SAA-C02: Solution to convert CSV files to Apache Parquet format and store them in a transformed data bucket.

Question

A company’s reporting system delivers hundreds of .csv files to an Amazon S3 bucket each day. The company must convert these files to Apache Parquet format and must store the files in a transformed data bucket. Which solution will meet these requirements with the LEAST development effort?

A. Create an Amazon EMR cluster with Apache Spark installed. Write a Spark application to transform the data. Use EMR File System (EMRFS) to write files to the transformed data bucket.
B. Create an AWS Glue crawler to discover the data. Create an AWS Glue extract, transform, and load (ETL) job to transform the data. Specify the transformed data bucket in the output step.
C. Use AWS Batch to create a job definition with Bash syntax to transform the data and output the data to the transformed data bucket. Use the job definition to submit a job. Specify an array job as the job type.
D. Create an AWS Lambda function to transform the data and output the data to the transformed data bucket. Configure an event notification for the S3 bucket. Specify the Lambda function as the destination for the event notification.

Answer

B. Create an AWS Glue crawler to discover the data. Create an AWS Glue extract, transform, and load (ETL) job to transform the data. Specify the transformed data bucket in the output step.

Explanation

Option B meets the requirements with the least development effort. AWS Glue is a fully managed service that can automatically discover, catalog, and transform data: the crawler catalogs the incoming .csv files, and a Glue ETL job converts them to Apache Parquet natively, with little or no custom code. Glue ETL jobs can run on demand or on a schedule and can write their output to any S3 bucket, including the transformed data bucket.
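For illustration, here is a minimal sketch of the PySpark script that a Glue ETL job could run for this conversion. The database, table, and bucket names are placeholders rather than values from the question, and in practice Glue Studio can generate an equivalent script visually without hand-writing one.

    import sys

    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the CSV data that the Glue crawler cataloged (placeholder database/table names).
    source = glue_context.create_dynamic_frame.from_catalog(
        database="reporting_db", table_name="daily_csv"
    )

    # Write the same records to the transformed data bucket in Parquet format.
    glue_context.write_dynamic_frame.from_options(
        frame=source,
        connection_type="s3",
        connection_options={"path": "s3://example-transformed-data-bucket/"},
        format="parquet",
    )

    job.commit()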

The other options can work, but each requires more development and operational effort than the Glue-based approach.

A. Create an Amazon EMR cluster with Apache Spark installed. Write a Spark application to transform the data. Use EMR File System (EMRFS) to write files to the transformed data bucket.

This solution requires more development effort because it involves provisioning and managing an EMR cluster and writing and maintaining a Spark application. Spark can read .csv files and write Apache Parquet natively, so the conversion code itself is short, but you still own the cluster lifecycle, job submission, and scaling that Glue handles for you.
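A minimal sketch of such a Spark application, with placeholder bucket names and prefixes, might look like the following; the effort lies in packaging, submitting, and operating it on EMR rather than in the conversion logic.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("csv-to-parquet").getOrCreate()

    # Read the day's CSV files from the source bucket (placeholder path).
    df = spark.read.option("header", "true").csv("s3://example-reporting-bucket/daily/")

    # EMRFS lets Spark write directly to S3 using the s3:// scheme (placeholder path).
    df.write.mode("append").parquet("s3://example-transformed-data-bucket/daily/")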

C. Use AWS Batch to create a job definition with Bash syntax to transform the data and output the data to the transformed data bucket. Use the job definition to submit a job. Specify an array job as the job type.

This solution requires more development effort because it involves building a container image, creating and managing a job definition and compute environment, writing the transformation script, and submitting the array job. AWS Batch is a job orchestration service with no built-in data transformation, so the job command must invoke a command-line tool or custom code to perform the CSV-to-Parquet conversion; a sketch of such a conversion script follows.
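The option describes a Bash job command, which would in turn have to call a conversion tool. As an illustration only, here is what such a tool might look like in Python (pandas plus PyArrow); the bucket names, manifest file, and overall approach are assumptions, not part of the question. Each child of the array job picks its input file from the index that AWS Batch injects.

    import io
    import os

    import boto3
    import pandas as pd  # pandas and pyarrow are assumed to be in the container image

    SOURCE_BUCKET = "example-reporting-bucket"        # placeholder
    DEST_BUCKET = "example-transformed-data-bucket"   # placeholder

    def main() -> None:
        s3 = boto3.client("s3")

        # AWS Batch sets AWS_BATCH_JOB_ARRAY_INDEX on each child of an array job;
        # we assume a manifest listing one CSV key per line was staged in the image.
        index = int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
        with open("/opt/manifest.txt") as manifest:
            key = manifest.read().splitlines()[index]

        body = s3.get_object(Bucket=SOURCE_BUCKET, Key=key)["Body"].read()
        df = pd.read_csv(io.BytesIO(body))

        out = io.BytesIO()
        df.to_parquet(out, index=False)
        s3.put_object(
            Bucket=DEST_BUCKET,
            Key=key.rsplit(".", 1)[0] + ".parquet",
            Body=out.getvalue(),
        )

    if __name__ == "__main__":
        main()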

D. Create an AWS Lambda function to transform the data and output the data to the transformed data bucket. Configure an event notification for the S3 bucket. Specify the Lambda function as the destination for the event notification.

This solution requires more development effort because it involves writing and maintaining a Lambda handler, packaging a Parquet-capable library (for example, pandas with PyArrow or the AWS SDK for pandas), and configuring the event notification, since Lambda has no built-in CSV-to-Parquet conversion. Additionally, Lambda's limits on execution time, memory, and deployment package size (and, at very high volume, concurrency) can constrain the performance and scalability of this solution.
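A minimal handler sketch, assuming the AWS SDK for pandas (awswrangler) is available as a Lambda layer and using a placeholder destination bucket, would parse the S3 event notification and rewrite each new object as Parquet:

    import urllib.parse

    import awswrangler as wr  # AWS SDK for pandas, assumed to be provided as a Lambda layer

    DEST_BUCKET = "example-transformed-data-bucket"  # placeholder

    def handler(event, context):
        # Each record corresponds to one object-created event from the source bucket.
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

            df = wr.s3.read_csv(f"s3://{bucket}/{key}")
            wr.s3.to_parquet(
                df=df,
                path=f"s3://{DEST_BUCKET}/{key.rsplit('.', 1)[0]}.parquet",
            )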
