What Does the Reducer Sum in the Classic Hadoop Word Count Example?
In Hadoop’s Word Count, the reducer sums all the 1s emitted by the mappers for each unique word after shuffle-and-sort grouping, producing the final <word, frequency> output pairs used for text frequency analysis.
Question
In the classic Word Count job, what is the reducer’s main role?
A. Deleting duplicate words from output
B. Summing counts for each unique word
C. Sorting words alphabetically
D. Splitting input text into tokens
Answer
B. Summing counts for each unique word
Explanation
In the classic Word Count job, the reducer receives grouped intermediate key-value pairs from all mappers after the shuffle/sort phase (e.g., <“apple”, [1, 1, 1]>), where each unique word serves as the key and an iterable of 1s represents its occurrences across the dataset. The reducer iterates through these values and sums them into a total count (e.g., sum([1, 1, 1]) → 3), then emits the final output pair <word, total_count>, which is written to HDFS. This aggregation step completes the distributed counting process, leveraging Hadoop’s automatic grouping by key. The other options describe work done elsewhere: token splitting happens in the mapper, sorting is managed by the framework during shuffle, and duplicates aren’t explicitly deleted since identical keys are inherently grouped.
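To make the reducer’s role concrete, here is a minimal sketch of the whole job, closely following the canonical Apache Hadoop MapReduce tutorial’s WordCount. The class names (WordCount, TokenizerMapper, IntSumReducer) mirror that tutorial; treat this as an illustrative sketch rather than production code.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: splits each input line into tokens and emits <word, 1>.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE); // e.g. <"apple", 1>
      }
    }
  }

  // Reducer: receives <word, [1, 1, 1, ...]> after shuffle/sort
  // and sums the grouped 1s into a total count.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get(); // accumulate occurrences of this word
      }
      result.set(sum);
      context.write(key, result); // final <word, total_count>, e.g. <"apple", 3>
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // optional local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Note that IntSumReducer also serves as the combiner here, as in the tutorial: because summing is associative and commutative, the same class can pre-aggregate counts on each mapper node before the shuffle, reducing network traffic without changing the final result.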