How Do Composite Keys Control Sorting and Grouping in MapReduce?

What Defines a Composite Key for Multi-Field Sorting in Hadoop?

Composite keys in Hadoop combine several fields, such as state, city, and a value, into a single key. Together with a custom partitioner and a grouping comparator, they dictate the sorting and grouping order, which is what makes secondary sort possible without an extra processing step.

Question

Which statement best describes a composite key?

A. It combines multiple fields to define a sorting/grouping order
B. It avoids shuffle phase in Hadoop
C. It represents the final reducer output
D. It compresses mapper outputs to reduce size

Answer

A. It combines multiple fields to define a sorting/grouping order

Explanation

A composite key in Hadoop MapReduce is a custom WritableComparable class that encapsulates multiple fields (like state, city, and total donation amount) in a single key emitted by the mapper. Because the framework sorts keys during the shuffle/sort phase, the key's ordering applies across those fields hierarchically. Developers typically supply three pieces:

- compareTo() on the composite key, which establishes sort priority (e.g., first by state, then by city, then by descending total).
- A custom partitioner that uses only the natural key (e.g., state), so every record with the same natural key is routed to the same reducer.
- A grouping comparator that compares only the natural key, so a single reduce() call receives all of that key's values, already sorted by the remaining fields.

This is what powers secondary sorting without post-processing; a single-field key cannot control multi-level order. The other options are incorrect: a composite key does not avoid the shuffle phase (B), does not represent the final reducer output (C), and does not compress mapper output (D).
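The three pieces described in the explanation can be sketched in plain Java, without the Hadoop dependency, to show how they interact. This is only an illustration under stated assumptions: `DonationKey`, `CompositeKeyDemo`, `partitionFor`, and `GROUP_BY_STATE` are hypothetical names, and the natural key is assumed to be the state alone. In a real job the key class would implement `WritableComparable` (adding `write()` and `readFields()`), the routing logic would live in a `Partitioner` subclass, and the grouping logic would be registered through `Job.setGroupingComparatorClass`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical composite key: state is the natural key, city and total refine the order.
// A real Hadoop key would implement WritableComparable<DonationKey> instead.
class DonationKey implements Comparable<DonationKey> {
    final String state;
    final String city;
    final double total;

    DonationKey(String state, String city, double total) {
        this.state = state;
        this.city = city;
        this.total = total;
    }

    // Full sort order: state ascending, then city ascending, then total descending.
    @Override
    public int compareTo(DonationKey other) {
        int c = state.compareTo(other.state);
        if (c != 0) return c;
        c = city.compareTo(other.city);
        if (c != 0) return c;
        return Double.compare(other.total, total); // arguments swapped for descending order
    }

    @Override
    public String toString() {
        return state + "/" + city + "/" + total;
    }
}

class CompositeKeyDemo {
    // Partitioner analogue: hash only the natural key so one state maps to one reducer.
    static int partitionFor(DonationKey key, int numReducers) {
        return (key.state.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Grouping comparator analogue: keys with the same state count as one group.
    static final Comparator<DonationKey> GROUP_BY_STATE =
        (a, b) -> a.state.compareTo(b.state);

    // Stand-in for the framework's sort step: orders keys with compareTo().
    static List<DonationKey> sorted(List<DonationKey> keys) {
        List<DonationKey> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<DonationKey> keys = Arrays.asList(
            new DonationKey("TX", "Austin", 50.0),
            new DonationKey("CA", "Fresno", 75.0),
            new DonationKey("TX", "Austin", 200.0),
            new DonationKey("CA", "Fresno", 300.0));
        for (DonationKey k : sorted(keys)) {
            System.out.println(k + " -> reducer " + partitionFor(k, 4));
        }
    }
}
```

Sorting uses every field, while partitioning and grouping deliberately ignore city and total. That asymmetry is the point of the pattern: records for one state stay together on one reducer, yet arrive in the order that compareTo() defines.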