Learn how to process and analyze genomic data in near-real-time using Amazon Kinesis Data Streams. Save the results to an Amazon Redshift cluster for flexible, parallel, and durable storage, enabling efficient analytics and insights for researchers.
Question
A company is developing a gene reporting device that will collect genomic information to assist researchers with collecting large samples of data from a diverse population. The device will push 8 KB of genomic data every second to a data platform that will need to process and analyze the data and provide information back to researchers. The data platform must meet the following requirements:
- Provide near-real-time analytics of the inbound genomic data
- Ensure the data is flexible, parallel, and durable
- Deliver results of processing to a data warehouse
Which strategy should a solutions architect use to meet these requirements?
A. Use Amazon Kinesis Data Firehose to collect the inbound sensor data, analyze the data with Kinesis clients, and save the results to an Amazon RDS instance.
B. Use Amazon Kinesis Data Streams to collect the inbound sensor data, analyze the data with Kinesis clients, and save the results to an Amazon Redshift cluster using Amazon EMR.
C. Use Amazon S3 to collect the inbound device data, analyze the data from Amazon SQS with Kinesis, and save the results to an Amazon Redshift cluster.
D. Use an Amazon API Gateway to put requests into an Amazon SQS queue, analyze the data with an AWS Lambda function, and save the results to an Amazon Redshift cluster using Amazon EMR.
Answer
B. Use Amazon Kinesis Data Streams to collect the inbound sensor data, analyze the data with Kinesis clients, and save the results to an Amazon Redshift cluster using Amazon EMR.
Explanation
Option B is the only choice that satisfies all three requirements: near-real-time analytics of the inbound genomic data, a flexible, parallel, and durable data path, and delivery of the processed results to a data warehouse.
Here’s why this option is the correct choice:
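- Amazon Kinesis Data Streams: Kinesis Data Streams is a scalable, durable, real-time data streaming service. A stream is divided into shards that are written and read in parallel, and records are retained (24 hours by default) and replicated across three Availability Zones, so consumers can process them within seconds of arrival or replay them later. At 8 KB per second, a single device uses only a small fraction of one shard's write capacity, and the stream can grow to many devices by adding shards.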
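- Kinesis clients for analysis: Kinesis clients are consumer applications, typically built with the Kinesis Client Library (KCL), that read records from the stream's shards and process them as they arrive. The KCL distributes shards across workers and checkpoints progress, so the analysis runs in parallel and can resume after a failure. Kinesis Data Firehose, by contrast, is a delivery service rather than an analysis client, which is one reason option A does not fit.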
- Amazon Redshift cluster using Amazon EMR: Amazon Redshift is a fully managed data warehouse built on a massively parallel processing (MPP) architecture, which makes it a good fit for analytical queries over large result sets. Saving the results of the analysis to a Redshift cluster satisfies the data warehouse requirement. Amazon EMR (Elastic MapReduce) can run the jobs that transform the processed data and load it into the cluster.
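The parallelism described in the points above can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes the documented per-shard write limits for provisioned streams (1 MB per second, taken conservatively as 1,000,000 bytes, and 1,000 records per second), and it assumes shards that split the 128-bit hash key space evenly. The function names `shards_needed` and `shard_index` are hypothetical helpers, not part of any AWS SDK.

```python
import hashlib
import math

# Assumed per-shard write limits for a provisioned Kinesis data stream.
SHARD_BYTES_PER_SEC = 1_000_000    # "1 MB per second", taken conservatively
SHARD_RECORDS_PER_SEC = 1_000      # 1,000 records per second


def shards_needed(devices: int,
                  bytes_per_record: int = 8 * 1024,
                  records_per_sec_per_device: float = 1.0) -> int:
    """Estimate the minimum shard count for a fleet of devices."""
    records_per_sec = devices * records_per_sec_per_device
    bytes_per_sec = records_per_sec * bytes_per_record
    by_bytes = bytes_per_sec / SHARD_BYTES_PER_SEC
    by_records = records_per_sec / SHARD_RECORDS_PER_SEC
    # Whichever limit is hit first decides the shard count.
    return max(1, math.ceil(max(by_bytes, by_records)))


def shard_index(partition_key: str, shard_count: int) -> int:
    """Approximate which shard a partition key lands on.

    Kinesis maps the MD5 hash of the partition key to a 128-bit integer
    and routes the record to the shard whose hash key range contains it.
    This sketch assumes the shards divide that range evenly and that the
    digest is read as a big-endian integer.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // 2**128


if __name__ == "__main__":
    print(shards_needed(1))       # one device fits easily in one shard
    print(shards_needed(1000))    # 1,000 devices at 8 KB/s each
    print(shard_index("device-42", 4))
```

Using the device ID as the partition key keeps each device's records in order on one shard while spreading different devices across shards.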
In short, Kinesis Data Streams handles durable, parallel ingestion, Kinesis client applications perform the near-real-time analysis, and Amazon EMR loads the results into an Amazon Redshift data warehouse. The other options fall short: option A stores the results in Amazon RDS, which is a relational database rather than a data warehouse, and options C and D collect the data with Amazon S3 or Amazon SQS instead of a streaming service built for parallel, near-real-time consumption.
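As a rough illustration of the ingestion and analysis steps, the sketch below builds the arguments a producer could pass to boto3's `put_record` call and shows a toy consumer-side function that summarizes a batch of records. Everything specific here is an assumption made for the example: the stream name `genomic-readings`, the JSON payload with `device_id` and `sequence` fields, and the helper names. A real consumer would more likely be a KCL application, and the actual genomic payload format is not given in the question.

```python
from __future__ import annotations

import json
from collections import Counter
from typing import Any

STREAM_NAME = "genomic-readings"  # hypothetical stream name


def build_put_record_request(device_id: str,
                             reading: dict[str, Any]) -> dict[str, Any]:
    """Return keyword arguments for boto3's kinesis put_record call."""
    return {
        "StreamName": STREAM_NAME,
        "Data": json.dumps(reading).encode("utf-8"),
        "PartitionKey": device_id,  # keeps one device's records on one shard
    }


def summarize_batch(records: list[dict[str, Any]]) -> dict[str, int]:
    """Count readings per device in one batch of stream records.

    Each record is expected to look like an entry in the "Records" list
    returned by get_records, with the payload bytes under "Data".
    """
    counts: Counter = Counter()
    for record in records:
        reading = json.loads(record["Data"])
        counts[reading["device_id"]] += 1
    return dict(counts)


if __name__ == "__main__":
    request = build_put_record_request(
        "device-42", {"device_id": "device-42", "sequence": "ACGT"}
    )
    # With AWS credentials configured, a producer could send it with:
    #   import boto3
    #   boto3.client("kinesis").put_record(**request)

    # Simulate the record a consumer would read back from the stream.
    batch = [{"Data": request["Data"],
              "PartitionKey": request["PartitionKey"]}]
    print(summarize_batch(batch))
```

Keeping the request builder separate from the network call lets the payload logic be tested without an AWS account.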
This practice question, answer, and explanation are intended to help candidates prepare for the AWS Certified Solutions Architect – Professional (SAP-C02) exam.