Why Use Setup Method for Resource Initialization in Hadoop Mapper?
Hadoop’s setup() method initializes resources such as database connections once per Mapper or Reducer task, before any map() or reduce() call, so the cost of initialization is not repeated for every record.
Question
Why is the setup method useful in a Mapper or Reducer class?
A. To finalize reducer output
B. To compress final HDFS output
C. To initialize resources before task execution
D. To partition mapper outputs
Answer
C. To initialize resources before task execution
Explanation
The setup method in the Mapper and Reducer classes runs exactly once per task attempt, before any map() or reduce() invocation. It is the natural hook for one-time initialization of shared resources such as database connections, lookup tables built from configuration parameters, custom counters, or file handles that must persist across all records processed by that task. Doing this work in setup() rather than inside the per-record methods avoids redundant initialization overhead, which matters most for I/O-heavy operations, and guarantees that resources are ready when core processing begins. The other options describe different parts of the framework: finalization belongs in cleanup(), which runs once after the last record; partitioning of mapper output is the job of the Partitioner; and output compression is controlled through job configuration. Developers typically override protected void setup(Context context) throws IOException, InterruptedException to read job parameters via context.getConfiguration() or to open connections, which makes it a standard part of production MapReduce jobs that need external data or stateful processing.
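The call order described in the explanation can be illustrated with a minimal sketch that runs without a Hadoop dependency. The LookupMapper class below is a simplified stand-in, not Hadoop's org.apache.hadoop.mapreduce.Mapper: a plain Map replaces the job Configuration, and the class names, the runTask helper, and the "lookup.countries" key are all hypothetical. Its run() method follows the same order as Hadoop's Mapper.run(): setup once, map for each record, then cleanup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SetupLifecycleDemo {

    /** Simplified stand-in for a Hadoop Mapper (assumption: not the real class). */
    static class LookupMapper {
        int setupCalls = 0;                       // how often setup() ran
        int mapCalls = 0;                         // how often map() ran
        final List<String> output = new ArrayList<>();
        private Map<String, String> countryNames; // shared resource, built once

        // Plays the role of setup(Context): runs once, reads job parameters.
        void setup(Map<String, String> conf) {
            setupCalls++;
            countryNames = new HashMap<>();
            // "lookup.countries" is a hypothetical key, e.g. "US=United States,DE=Germany"
            for (String pair : conf.getOrDefault("lookup.countries", "").split(",")) {
                String[] kv = pair.split("=", 2);
                if (kv.length == 2) {
                    countryNames.put(kv[0], kv[1]);
                }
            }
        }

        // Plays the role of map(): runs per record and reuses the lookup table.
        void map(String record) {
            mapCalls++;
            output.add(countryNames.getOrDefault(record, "UNKNOWN"));
        }

        // Plays the role of cleanup(Context): releases the shared resource.
        void cleanup() {
            countryNames = null;
        }

        // Same order as Hadoop's Mapper.run(): setup, map per record, cleanup.
        void run(Map<String, String> conf, List<String> records) {
            setup(conf);
            try {
                for (String record : records) {
                    map(record);
                }
            } finally {
                cleanup();
            }
        }
    }

    /** Hypothetical helper: runs one simulated task and returns the mapper for inspection. */
    static LookupMapper runTask(Map<String, String> conf, List<String> records) {
        LookupMapper mapper = new LookupMapper();
        mapper.run(conf, records);
        return mapper;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("lookup.countries", "US=United States,DE=Germany");
        LookupMapper mapper = runTask(conf, Arrays.asList("US", "DE", "FR"));
        System.out.println("setup calls: " + mapper.setupCalls);
        System.out.println("map calls:   " + mapper.mapCalls);
        System.out.println("output:      " + mapper.output);
    }
}
```

In this sketch the lookup table is parsed once even though three records are processed. In a real job the same pattern would live in a class extending Mapper, with setup(Context) reading values from context.getConfiguration().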