What Defines a Composite Key for Multi-Field Sorting in Hadoop?
Composite keys in Hadoop combine several fields (such as state, city, and value) into one key. Together with a custom partitioner and comparators, they control the sorting and grouping order, which enables secondary sort without an extra processing step.
Question
Which statement best describes a composite key?
A. It combines multiple fields to define a sorting/grouping order
B. It avoids shuffle phase in Hadoop
C. It represents the final reducer output
D. It compresses mapper outputs to reduce size
Answer
A. It combines multiple fields to define a sorting/grouping order
Explanation
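A composite key in Hadoop MapReduce is a custom WritableComparable class that packs several fields (for example state, city, and total donation amount) into the single key the mapper emits. Like any WritableComparable, it implements write() and readFields() for serialization. During the shuffle/sort phase, the framework orders keys with the key's compareTo() method (unless a separate sort comparator is registered), so the sort runs hierarchically across those fields.

Three pieces work together. compareTo() sets the sort priority, for example first by state, then by city, then by total in descending order. A custom partitioner uses only the natural key (e.g. state), so every record for that state is routed to the same reducer. A grouping comparator also compares only the natural key, so each reduce() call receives all values for one state, already arriving in the secondary order.

This gives secondary sorting without a post-processing step, which a single-field key cannot provide. The other options do not describe a composite key: it does not avoid the shuffle phase (it relies on it), it does not represent the final reducer output, and it does not compress mapper output.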
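The interplay of sort order, natural-key partitioning, and natural-key grouping can be sketched in plain Java. This is a simulation, not Hadoop's API: the names CompositeKeySortDemo, StateCityKey, partition, GROUPING, and sortAndGroup are hypothetical, and the code runs without Hadoop on the classpath. In a real job, the key would implement org.apache.hadoop.io.WritableComparable (adding write() and readFields()), the partitioner would extend org.apache.hadoop.mapreduce.Partitioner, and the grouping comparator would be registered with Job.setGroupingComparatorClass(). The sketch simulates a single reducer; partition() is included to show that routing depends only on the state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class CompositeKeySortDemo {

    /** Hypothetical composite key: natural key (state) plus secondary fields (city, total). */
    static final class StateCityKey implements Comparable<StateCityKey> {
        final String state;
        final String city;
        final double total;

        StateCityKey(String state, String city, double total) {
            this.state = state;
            this.city = city;
            this.total = total;
        }

        /** Sort order: state ascending, then city ascending, then total descending. */
        @Override
        public int compareTo(StateCityKey other) {
            int c = state.compareTo(other.state);
            if (c != 0) return c;
            c = city.compareTo(other.city);
            if (c != 0) return c;
            return Double.compare(other.total, total); // arguments swapped for descending order
        }

        @Override
        public String toString() {
            return state + "/" + city + "/" + total;
        }
    }

    /** Partition on the natural key only, mirroring the hash-modulo formula of Hadoop's HashPartitioner. */
    static int partition(StateCityKey key, int numReducers) {
        return (key.state.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Grouping comparator: keys with the same state fall into one reduce() call. */
    static final Comparator<StateCityKey> GROUPING =
            Comparator.comparing((StateCityKey k) -> k.state);

    /** Sorts keys the way the shuffle would, then splits consecutive keys into reduce groups. */
    static List<List<StateCityKey>> sortAndGroup(List<StateCityKey> keys) {
        List<StateCityKey> sorted = new ArrayList<>(keys);
        Collections.sort(sorted); // uses compareTo() above
        List<List<StateCityKey>> groups = new ArrayList<>();
        for (StateCityKey k : sorted) {
            List<StateCityKey> last = groups.isEmpty() ? null : groups.get(groups.size() - 1);
            // Start a new group whenever the natural key changes.
            if (last == null || GROUPING.compare(last.get(0), k) != 0) {
                last = new ArrayList<>();
                groups.add(last);
            }
            last.add(k);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<StateCityKey> keys = Arrays.asList(
                new StateCityKey("TX", "Austin", 50.0),
                new StateCityKey("CA", "Fresno", 20.0),
                new StateCityKey("TX", "Austin", 90.0),
                new StateCityKey("CA", "Davis", 75.0),
                new StateCityKey("TX", "Dallas", 10.0));
        for (List<StateCityKey> group : sortAndGroup(keys)) {
            System.out.println("reduce() call for " + group.get(0).state + ": " + group);
        }
    }
}
```

Running main prints two lines, one per simulated reduce() call: "reduce() call for CA: [CA/Davis/75.0, CA/Fresno/20.0]" and then "reduce() call for TX: [TX/Austin/90.0, TX/Austin/50.0, TX/Dallas/10.0]". Within each group the values arrive ordered by city and then by descending total, which is the secondary sort the composite key provides.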