Learn the best approach for ingesting transactional data from Amazon Kinesis, calculating rolling averages, and serving results to ML models using SageMaker Feature Store.
Question
A machine learning (ML) engineer at a bank is building a data ingestion solution to provide transaction features to financial ML models. Raw transactional data is available in an Amazon Kinesis data stream.
The solution must compute rolling averages of the ingested data from the data stream and must store the results in Amazon SageMaker Feature Store. The solution also must serve the results to the models in near real time.
Which solution will meet these requirements?
A. Load the data into an Amazon S3 bucket by using Amazon Kinesis Data Firehose. Use a SageMaker Processing job to aggregate the data and to load the results into SageMaker Feature Store as an online feature group.
B. Write the data directly from the data stream into SageMaker Feature Store as an online feature group. Calculate the rolling averages in place within SageMaker Feature Store by using the SageMaker GetRecord API operation.
C. Consume the data stream by using an Amazon Kinesis Data Analytics SQL application that calculates the rolling averages. Generate a result stream. Consume the result stream by using a custom AWS Lambda function that publishes the results to SageMaker Feature Store as an online feature group.
D. Load the data into an Amazon S3 bucket by using Amazon Kinesis Data Firehose. Use a SageMaker Processing job to load the data into SageMaker Feature Store as an offline feature group. Compute the rolling averages at query time.
Answer
C. Consume the data stream by using an Amazon Kinesis Data Analytics SQL application that calculates the rolling averages. Generate a result stream. Consume the result stream by using a custom AWS Lambda function that publishes the results to SageMaker Feature Store as an online feature group.
Explanation
Amazon Kinesis Data Analytics can run SQL queries with sliding (rolling) time windows directly over the Kinesis data stream, so the rolling averages are computed continuously as transactions arrive. The Lambda function that consumes the result stream writes each aggregated record to an online feature group by using the PutRecord API, and the online store then serves the latest feature values to the ML models with low-latency reads in near real time.
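A minimal sketch of the Lambda consumer in option C, assuming the Kinesis Data Analytics result stream emits JSON records with hypothetical account_id, rolling_avg_amount, and event_time fields and that an online feature group named transaction-rolling-averages already exists:

```python
import base64
import json

import boto3

# Runtime client for the online feature store (PutRecord/GetRecord APIs).
featurestore_runtime = boto3.client("sagemaker-featurestore-runtime")

# Hypothetical online feature group created ahead of time for the model features.
FEATURE_GROUP_NAME = "transaction-rolling-averages"


def lambda_handler(event, context):
    """Consume the aggregated result stream and publish each record
    to the SageMaker Feature Store online feature group."""
    for record in event["Records"]:
        # Kinesis delivers record payloads base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))

        featurestore_runtime.put_record(
            FeatureGroupName=FEATURE_GROUP_NAME,
            Record=[
                {"FeatureName": "account_id",
                 "ValueAsString": str(payload["account_id"])},
                {"FeatureName": "rolling_avg_amount",
                 "ValueAsString": str(payload["rolling_avg_amount"])},
                {"FeatureName": "event_time",
                 "ValueAsString": str(payload["event_time"])},
            ],
        )

    return {"records_written": len(event["Records"])}
```

With a Kinesis event source mapping on the result stream triggering this function, newly computed averages land in the online store within seconds of being calculated.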
The other options have drawbacks:
A. Kinesis Data Firehose buffers and delivers data to S3 in batches, and a SageMaker Processing job runs as a batch job, so this path adds latency and cannot keep the features up to date in near real time.
B. The GetRecord API retrieves a single record from the online store; it cannot perform aggregations, so rolling averages cannot be calculated "in place" in Feature Store (see the GetRecord sketch after this list).
D. An offline feature group is backed by Amazon S3 and is queried in batch (for example, through Amazon Athena), so computing the averages at query time does not meet the near-real-time requirement.
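For the serving side, the models (or the inference code in front of them) read the latest values from the online store with GetRecord. The sketch below, reusing the same hypothetical feature group, also illustrates option B's flaw: GetRecord is a single-record lookup keyed by the record identifier, with no aggregation capability.

```python
import boto3

featurestore_runtime = boto3.client("sagemaker-featurestore-runtime")

# Fetch the latest feature values for one entity from the online store.
# "account-12345" is a hypothetical record identifier value.
response = featurestore_runtime.get_record(
    FeatureGroupName="transaction-rolling-averages",
    RecordIdentifierValueAsString="account-12345",
)

# The response holds a single record as a list of name/value pairs;
# there is no way to aggregate across records here, which is why the rolling
# averages must be computed upstream before they reach Feature Store.
features = {f["FeatureName"]: f["ValueAsString"] for f in response.get("Record", [])}
print(features)
```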