Learn the key requirements for configuring an AWS Glue Crawler to create a single table when crawling objects in an Amazon S3 bucket. Discover the importance of consistent object format, compression type, schema, and naming conventions.
Question
A finance company receives data from third-party data providers and stores the data as objects in an Amazon S3 bucket.
The company ran an AWS Glue crawler on the objects to create a data catalog. The AWS Glue crawler created multiple tables. However, the company expected that the crawler would create only one table.
The company needs a solution that will ensure the AWS Glue crawler creates only one table.
Which combination of solutions will meet this requirement? (Choose two.)
A. Ensure that the object format, compression type, and schema are the same for each object.
B. Ensure that the object format and schema are the same for each object. Do not enforce consistency for the compression type of each object.
C. Ensure that the schema is the same for each object. Do not enforce consistency for the file format and compression type of each object.
D. Ensure that the structure of the prefix for each S3 object name is consistent.
E. Ensure that all S3 object names follow a similar pattern.
Answer
The correct combination of solutions to ensure the AWS Glue crawler creates only one table is:
A. Ensure that the object format, compression type, and schema are the same for each object.
D. Ensure that the structure of the prefix for each S3 object name is consistent.
Explanation
For an AWS Glue crawler to create a single table when crawling objects in an Amazon S3 bucket, the following conditions must be met:
- Object Format: All objects should have the same format (e.g., CSV, JSON, Parquet). If the objects have different formats, the crawler will create separate tables for each format.
- Compression Type: The compression type (e.g., GZIP, SNAPPY, BZIP2) should be consistent across all objects. Inconsistent compression types will cause the crawler to create multiple tables.
- Schema: The schema (column names, structure, and data types) of the objects should be the same, or at least closely similar. If the schemas differ substantially, the crawler will generate separate tables to accommodate them.
- Prefix Structure: The prefix structure of the S3 object names should be consistent. The prefix helps the crawler determine the hierarchy and partitioning of the data. Inconsistent prefixes may lead to the creation of multiple tables.
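As a rough pre-check before running the crawler, the conditions above can be approximated from the object keys alone. The sketch below is an illustration, not part of AWS Glue: `describe_key`, `group_keys`, and the `COMPRESSION_SUFFIXES` table are hypothetical names, and the sample keys are made up. It assumes file extensions reflect the real format and compression, which is only a heuristic, and it does not inspect schemas at all.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical lookup table (an assumption for this sketch); extend as needed.
COMPRESSION_SUFFIXES = {".gz": "gzip", ".bz2": "bzip2"}


def describe_key(key: str) -> tuple[str, str, tuple[str, ...]]:
    """Infer (format, compression, prefix shape) from an S3 object key."""
    path = PurePosixPath(key)
    suffixes = [s.lower() for s in path.suffixes]

    # A trailing compression suffix (e.g. ".gz") is peeled off first.
    compression = "none"
    if suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        compression = COMPRESSION_SUFFIXES[suffixes.pop()]

    # Whatever suffix remains last is treated as the file format.
    file_format = suffixes[-1].lstrip(".") if suffixes else "unknown"

    # Prefix shape: the partition column name for Hive-style "name=value"
    # folders, and "*" for any other folder, so only depth and partition
    # names are compared, not individual values.
    shape = tuple(
        part.split("=", 1)[0] if "=" in part else "*"
        for part in path.parent.parts
    )
    return file_format, compression, shape


def group_keys(keys: list[str]) -> Counter:
    """Count how many keys share each (format, compression, shape) combination."""
    return Counter(describe_key(k) for k in keys)


# Made-up sample keys: the third differs in format, compression, and prefix.
keys = [
    "sales/year=2024/part-0001.csv.gz",
    "sales/year=2025/part-0002.csv.gz",
    "sales/region=eu/part-0003.json",
]
groups = group_keys(keys)
for combo, count in groups.items():
    print(count, combo)
```

More than one combination in the result suggests the objects are not uniform and the crawler may split them into several tables. A single combination is necessary but not sufficient, because the schemas still need to match.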
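Options B and C are incorrect because they relax consistency for the compression type (B) or for both the file format and compression type (C), and differences in either can cause the crawler to create multiple tables. Option E (ensuring all S3 object names follow a similar pattern) can be helpful for organization, but it is not a requirement for the AWS Glue crawler to create a single table.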
By ensuring that the object format, compression type, schema, and prefix structure are consistent across all objects in the S3 bucket, the AWS Glue crawler will create a single table representing the data.
This practice question and explanation are for the AWS Certified Data Engineer – Associate (DEA-C01) certification exam.