AWS SAA-C03: Serverless Data File Processing with AWS Transfer Family and Lambda

Learn how to efficiently process incoming data files using AWS Transfer Family for FTP, Amazon S3 for storage, and AWS Lambda for serverless parallel processing triggered by S3 event notifications.

Question

A data analytics company wants to migrate its batch processing system to AWS. The company receives thousands of small data files periodically during the day through FTP. An on-premises batch job processes the data files overnight. However, the batch job takes hours to finish running.

The company wants the AWS solution to process incoming data files as soon as possible with minimal changes to the FTP clients that send the files. The solution must delete the incoming data files after the files have been processed successfully. Processing for each file needs to take 3-8 minutes.

Which solution will meet these requirements in the MOST operationally efficient way?

A. Use an Amazon EC2 instance that runs an FTP server to store incoming files as objects in Amazon S3 Glacier Flexible Retrieval. Configure a job queue in AWS Batch. Use Amazon EventBridge rules to invoke the job to process the objects nightly from S3 Glacier Flexible Retrieval. Delete the objects after the job has processed the objects.
B. Use an Amazon EC2 instance that runs an FTP server to store incoming files on an Amazon Elastic Block Store (Amazon EBS) volume. Configure a job queue in AWS Batch. Use Amazon EventBridge rules to invoke the job to process the files nightly from the EBS volume. Delete the files after the job has processed the files.
C. Use AWS Transfer Family to create an FTP server to store incoming files on an Amazon Elastic Block Store (Amazon EBS) volume. Configure a job queue in AWS Batch. Use an Amazon S3 event notification when each file arrives to invoke the job in AWS Batch. Delete the files after the job has processed the files.
D. Use AWS Transfer Family to create an FTP server to store incoming files in Amazon S3 Standard. Create an AWS Lambda function to process the files and to delete the files after they are processed. Use an S3 event notification to invoke the Lambda function when the files arrive.

Answer

D. Use AWS Transfer Family to create an FTP server to store incoming files in Amazon S3 Standard. Create an AWS Lambda function to process the files and to delete the files after they are processed. Use an S3 event notification to invoke the Lambda function when the files arrive.

Explanation

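AWS Transfer Family: AWS Transfer Family is a fully managed service that provides FTP, FTPS, and SFTP endpoints backed by Amazon S3 or Amazon EFS. This eliminates the need to run and patch an EC2 instance for the FTP server (options A and B), reducing operational overhead. Transfer Family does not store files on EBS volumes, which rules out option C.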

Amazon S3 Standard: Incoming files are stored directly in Amazon S3 Standard, which provides high durability, availability, and scalability for object storage. Objects are readable immediately, without the retrieval delay of S3 Glacier Flexible Retrieval (option A) and without provisioning or managing storage volumes (options B and C).

AWS Lambda: AWS Lambda is a serverless compute service that can run code in response to events or triggers. In this case, a Lambda function can be created to process the incoming data files as soon as they arrive in S3.

S3 Event Notification: Amazon S3 supports event notifications, which can be configured to trigger a Lambda function whenever a new object is created in the S3 bucket. This ensures that the files are processed as soon as they arrive, meeting the requirement for minimal delay.
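The notification described above could be configured roughly as follows. This is a minimal sketch, not the exam's reference implementation: the helper names, the bucket name, the function ARN, and the "incoming/" prefix are all assumptions for illustration. It uses the boto3 put_bucket_notification_configuration call, and it assumes the Lambda function already has a resource-based policy that allows Amazon S3 to invoke it.

```python
def build_notification_config(function_arn, prefix="", suffix=""):
    """Build an S3 notification configuration that invokes a Lambda
    function for every newly created object (optionally filtered by key)."""
    rules = []
    if prefix:
        rules.append({"Name": "prefix", "Value": prefix})
    if suffix:
        rules.append({"Name": "suffix", "Value": suffix})

    entry = {
        "LambdaFunctionArn": function_arn,
        "Events": ["s3:ObjectCreated:*"],
    }
    if rules:
        entry["Filter"] = {"Key": {"FilterRules": rules}}
    return {"LambdaFunctionConfigurations": [entry]}


def apply_notification_config(bucket, function_arn, prefix="", suffix=""):
    """Attach the configuration to a bucket (requires AWS credentials)."""
    import boto3  # imported lazily so the builder works without AWS access

    s3 = boto3.client("s3")
    s3.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=build_notification_config(
            function_arn, prefix, suffix
        ),
    )
```

Calling apply_notification_config with the bucket that Transfer Family writes to would replace that bucket's existing notification configuration, so any notifications already defined on it would need to be included in the same call.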

Parallel Processing: Each file takes 3-8 minutes to process, which fits within Lambda's maximum execution timeout of 15 minutes. Because each S3 event triggers its own invocation, many Lambda executions can run concurrently, so thousands of small files are processed in parallel rather than queued behind a single overnight job.
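The fit between the 3-8 minute processing time and Lambda can be sanity-checked with simple arithmetic. The sketch below assumes Lambda's 15-minute (900-second) maximum timeout and estimates steady-state concurrency with Little's law (arrival rate multiplied by average duration). The arrival rate used in the usage note is a made-up figure, since the question only says "thousands of small data files periodically during the day".

```python
LAMBDA_MAX_TIMEOUT_SECONDS = 900  # Lambda's maximum configurable timeout (15 min)


def fits_in_lambda(max_processing_seconds):
    """True if the slowest expected file finishes within Lambda's timeout."""
    return max_processing_seconds <= LAMBDA_MAX_TIMEOUT_SECONDS


def estimated_concurrency(files_per_hour, avg_processing_seconds):
    """Little's law: average concurrent executions = arrival rate x duration."""
    return files_per_hour * avg_processing_seconds / 3600
```

For example, a hypothetical 600 files per hour at an average of 5.5 minutes each works out to about 55 concurrent executions, well under the default account-level concurrency quota, and the 8-minute worst case passes the timeout check.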

Deletion after Processing: The Lambda function can delete the processed files from S3 after successful processing, satisfying the requirement to delete the files once they are processed.
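A handler along these lines would cover the process-then-delete requirement. It is a minimal sketch under stated assumptions: process_file is a hypothetical placeholder for the company's real processing logic, and handle_event is an illustrative helper that takes the S3 client as a parameter so it can be exercised without AWS access. The delete runs only if processing returns without raising, so a failed file stays in the bucket.

```python
from urllib.parse import unquote_plus


def process_file(body):
    """Hypothetical placeholder for the real per-file processing logic."""
    return len(body)


def handle_event(event, s3_client, process=process_file):
    """Process each object named in an S3 event; delete it only on success."""
    deleted = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded
        # (for example, spaces arrive as '+').
        key = unquote_plus(record["s3"]["object"]["key"])

        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        # If process() raises, the delete below is skipped and the
        # invocation fails, leaving the file in place.
        process(body)
        s3_client.delete_object(Bucket=bucket, Key=key)
        deleted.append(key)
    return deleted


def lambda_handler(event, context):
    """Lambda entry point: builds a real S3 client and delegates."""
    import boto3  # provided by the Lambda Python runtime

    return handle_event(event, boto3.client("s3"))
```

The function's execution role would need s3:GetObject and s3:DeleteObject on the bucket, and its timeout should be raised above the default so that the 8-minute worst case can complete.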

This solution leverages fully managed AWS services like AWS Transfer Family, Amazon S3, and AWS Lambda, reducing operational overhead and providing a highly scalable and efficient architecture. It minimizes changes to the existing FTP clients, as they can continue to transfer files using the standard FTP protocol. Additionally, it ensures that files are processed as soon as they arrive, and automatically deleted after successful processing, meeting the company’s requirements in the most operationally efficient way.
