Discover the most cost-effective solution for running SQL queries on encrypted data stored in Amazon S3 using Apache Parquet files and client-side encryption with KMS managed keys.
Question
A weather forecasting company collects temperature readings from various sensors on a continuous basis. An existing data ingestion process collects the readings and aggregates them into larger Apache Parquet files. Then the process encrypts the files by using client-side encryption with KMS managed keys (CSE-KMS). Finally, the process writes the files to an Amazon S3 bucket with separate prefixes for each calendar day.
The company wants to run occasional SQL queries on the data to take sample moving averages for a specific calendar day.
Which solution will meet these requirements MOST cost-effectively?
A. Configure Amazon Athena to read the encrypted files. Run SQL queries on the data directly in Amazon S3.
B. Use Amazon S3 Select to run SQL queries on the data directly in Amazon S3.
C. Configure Amazon Redshift to read the encrypted files. Use Redshift Spectrum and Redshift query editor v2 to run SQL queries on the data directly in Amazon S3.
D. Configure Amazon EMR Serverless to read the encrypted files. Use Apache SparkSQL to run SQL queries on the data directly in Amazon S3.
Answer
A. Configure Amazon Athena to read the encrypted files. Run SQL queries on the data directly in Amazon S3.
Explanation
Amazon Athena is a serverless, interactive query service that lets you analyze data stored in Amazon S3 using standard SQL. It supports several data formats, including Apache Parquet, and it can read data encrypted using client-side encryption with KMS managed keys (CSE-KMS), provided that the table is marked as containing encrypted data and the principal running the query has permission to use the KMS key.
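As a rough sketch of what configuring Athena for this scenario might involve, the Python snippet below builds two SQL strings: a table definition that flags the underlying data as encrypted, and a moving-average query limited to one calendar day. The bucket, table, and column names and the `readings/` prefix are assumptions made for illustration; the scenario does not specify them. The `has_encrypted_data` table property is the setting that the Athena documentation describes for tables over CSE-KMS data.

```python
from datetime import date

# Hypothetical names: the bucket, table, columns, and prefix layout are
# assumptions for illustration and do not come from the scenario.
BUCKET = "example-weather-bucket"
TABLE = "sensor_readings"


def create_table_sql(bucket: str = BUCKET, table: str = TABLE) -> str:
    """Build Athena DDL for day-partitioned Parquet data encrypted with CSE-KMS."""
    return f"""
CREATE EXTERNAL TABLE IF NOT EXISTS {table} (
    sensor_id   string,
    reading_ts  timestamp,
    temperature double
)
PARTITIONED BY (day string)
STORED AS PARQUET
LOCATION 's3://{bucket}/readings/'
TBLPROPERTIES ('has_encrypted_data' = 'true')
""".strip()


def moving_average_sql(day: str, window: int = 5, table: str = TABLE) -> str:
    """Build a query averaging each reading with the preceding window - 1 readings."""
    if window < 1:
        raise ValueError("window must be at least 1")
    # Parse and re-serialize the day so only a valid ISO date reaches the SQL.
    safe_day = date.fromisoformat(day).isoformat()
    return f"""
SELECT sensor_id,
       reading_ts,
       AVG(temperature) OVER (
           PARTITION BY sensor_id
           ORDER BY reading_ts
           ROWS BETWEEN {window - 1} PRECEDING AND CURRENT ROW
       ) AS moving_avg
FROM {table}
WHERE day = '{safe_day}'
""".strip()


if __name__ == "__main__":
    print(create_table_sql())
    print(moving_average_sql("2024-01-15"))
```

The `WHERE day = ...` filter is what would keep a query confined to a single day's prefix. In practice the day partitions would also need to be registered with the table (for example with `ALTER TABLE ... ADD PARTITION`) before the filter returns any rows.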
Here’s why Amazon Athena is the most cost-effective solution:
- Serverless: Athena is serverless, so there is no infrastructure to manage, and you only pay for the queries you run.
- Direct querying on S3: Athena can query data directly from Amazon S3, eliminating the need to load data into a separate data warehouse or cluster.
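- Pay per data scanned: Athena charges based on the amount of data each query scans. Because the files are columnar Parquet and are stored under a separate prefix for each calendar day, a query for one day only needs to scan that day's data, which keeps the cost of occasional queries low.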
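To make the pay-per-scan point concrete, here is a minimal back-of-the-envelope estimate in Python. The price per terabyte and the per-query minimum are assumptions based on commonly quoted Athena list pricing, which varies by Region and can change. The data size and query count are invented for illustration, and rounding to the nearest megabyte is ignored.

```python
# Assumed pricing inputs; check current Athena pricing for your Region.
ASSUMED_PRICE_PER_TB_USD = 5.00
ASSUMED_MIN_BYTES_PER_QUERY = 10 * 1024**2  # assumed 10 MB minimum per query
BYTES_PER_TB = 1024**4


def estimate_query_cost(bytes_scanned: int,
                        price_per_tb: float = ASSUMED_PRICE_PER_TB_USD) -> float:
    """Estimate the cost in USD of one query from the bytes it scans."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    # Apply the assumed per-query minimum before pricing the scan.
    billable = max(bytes_scanned, ASSUMED_MIN_BYTES_PER_QUERY)
    return billable / BYTES_PER_TB * price_per_tb


if __name__ == "__main__":
    # Hypothetical workload: one day's prefix holds 2 GiB, queried 20 times a month.
    one_day_bytes = 2 * 1024**3
    per_query = estimate_query_cost(one_day_bytes)
    print(f"per query: ${per_query:.4f}")
    print(f"per month: ${per_query * 20:.4f}")
```

Under these assumptions the monthly total stays well under a dollar, which is the sense in which a pay-per-scan service suits occasional, single-day queries.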
The other options have limitations or additional costs:
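- Amazon S3 Select (Option B) cannot query objects that were encrypted client-side, including with CSE-KMS; it only works with unencrypted or server-side encrypted objects. It also operates on a single object per request, which is a poor fit for a full day of files.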
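- Amazon Redshift (Option C) requires running Redshift compute in addition to paying Redshift Spectrum scan charges, which is more expensive for occasional queries. Redshift Spectrum also does not support Amazon S3 client-side encryption.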
- Amazon EMR Serverless with Apache SparkSQL (Option D) is designed for more complex, large-scale data processing and may be more costly for simple SQL queries.
Therefore, using Amazon Athena to run SQL queries directly on the encrypted data in Amazon S3 is the most cost-effective solution for the given scenario.
This question is part of a free set of practice questions and answers, with detailed explanations and references, for the AWS Certified Solutions Architect – Associate (SAA-C03) certification exam.