
How Does a Combiner Aggregate Mapper Output Locally in MapReduce?

Why Is the Hadoop Combiner Called a Mini-Reducer?

Learn why Hadoop’s combiner is called a “mini-reducer”: it aggregates mapper output locally, cutting the network traffic sent to reducers and improving MapReduce efficiency.

Question

Why is a combiner often called a “mini-reducer”?

A. Because it formats the final output
B. Because it partitions the data
C. Because it executes after reducers
D. Because it aggregates mapper output locally

Answer

D. Because it aggregates mapper output locally

Explanation

A combiner is called a “mini-reducer” because it applies reduce-style aggregation to intermediate key-value pairs on the same node as the map task. This happens before the shuffle sends data across the network to the reducers. Hadoop typically invokes the combiner while the map task sorts and spills its buffered output to local disk, so the combiner sees only that task’s output. The result is less data to transfer: for example, N separate (“word”, 1) pairs become a single (“word”, N) pair, which saves bandwidth and improves job performance.

Hadoop treats the combiner as an optimization and may run it zero, one, or several times on the same data. A combiner is therefore safe only for operations that are commutative and associative, such as summation or taking a maximum. A full reducer receives grouped, sorted input for its keys from all mappers across the cluster. A combiner differs in three ways: it processes only local map output, it is not guaranteed to run, and it does not format final output, partition data, or run after the reducers.
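The sketch below is a minimal pure-Python simulation of this behavior. It is not Hadoop code. The functions `map_words`, `sum_by_key`, `run_job`, `mean_of_means`, and `mean_via_sum_count` and the sample inputs are hypothetical illustrations. Each inner list stands in for one map task's input split, and the length of the shuffled list stands in for the records sent across the network. In Hadoop's Java API, a combiner is registered with `Job.setCombinerClass`, and the bundled WordCount example reuses its summing reducer as the combiner. `sum_by_key` plays both roles here in the same way. The last two functions illustrate the commutative/associative requirement: averaging local averages goes wrong, while combining (sum, count) pairs does not.

```python
from collections import defaultdict

def map_words(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.split()]

def sum_by_key(pairs):
    """Sum values per key. Used as both combiner and reducer, as in WordCount."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

def run_job(splits, use_combiner):
    """Simulate one job; each split is the input of one hypothetical map task.

    Returns the final counts and the number of records that would be
    shuffled across the network to the reducer.
    """
    shuffled = []
    for split in splits:
        map_output = [pair for line in split for pair in map_words(line)]
        if use_combiner:
            # Local aggregation: sees only this map task's output.
            map_output = sum_by_key(map_output)
        shuffled.extend(map_output)
    # Reducer: receives records for each key from every map task.
    return dict(sum_by_key(shuffled)), len(shuffled)

def mean_of_means(per_mapper_values):
    """Unsafe combiner: each map task emits its local mean."""
    local_means = [sum(v) / len(v) for v in per_mapper_values]
    return sum(local_means) / len(local_means)

def mean_via_sum_count(per_mapper_values):
    """Safe combiner: each map task emits (sum, count). Adding these pairs is
    commutative and associative, so partial results merge correctly.
    (In real Hadoop the mapper would also emit (value, 1) pairs so the
    combiner's input and output types match.)"""
    partials = [(sum(v), len(v)) for v in per_mapper_values]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

if __name__ == "__main__":
    splits = [
        ["the cat sat", "the cat ran"],           # map task 1
        ["the dog sat", "the dog ran the race"],  # map task 2
    ]
    for flag in (False, True):
        counts, shuffled = run_job(splits, use_combiner=flag)
        print(f"combiner={flag}: shuffled={shuffled}, counts={counts}")
    latencies = [[10, 20, 30], [40]]  # hypothetical per-map-task values
    print(mean_of_means(latencies), mean_via_sum_count(latencies))
```

With the sample splits, the combiner cuts the shuffled records from 14 to 9 and leaves the final word counts unchanged. The final counts would also be the same if the combiner were skipped, which is why Hadoop can treat it as optional. The two averages print as 30.0 and 25.0, and only the second matches the true mean of the four values.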