How Does HDFS Split and Replicate Files Across DataNodes?

What Happens When Hadoop Writes a File to HDFS?

Learn the exact process of writing a file to HDFS for your Big Data certification. Understand how Hadoop splits files into blocks and replicates them across multiple DataNodes to guarantee fault tolerance and high availability.

Question

What occurs when Hadoop writes a file to HDFS?

A. File is split into blocks and replicated across multiple DataNodes
B. File is permanently stored in NameNode
C. File is converted into SQL tables
D. File is sent to a single DataNode without backup

Answer

A. File is split into blocks and replicated across multiple DataNodes

Explanation

When a file is written to the Hadoop Distributed File System (HDFS), the client first contacts the NameNode to obtain write permission and the DataNode locations for each block. Once approved, the file is not stored as a single unit; instead, it is divided into fixed-size chunks called blocks (128 MB each by default). The client then streams these blocks directly to the DataNodes, and Hadoop automatically replicates each block (three copies by default) across different nodes and racks to ensure high availability and fault tolerance.

The other options are incorrect: the NameNode holds only metadata (file names, block lists, and block locations) and never stores file data permanently; HDFS does not convert files into SQL tables; and a block is never written to a single DataNode without replication.
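To make the write path concrete, here is a minimal sketch using the Hadoop Java FileSystem API. The NameNode address, file path, and contents are illustrative assumptions; the block size and replication values simply make the defaults explicit so you can see where they are controlled.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.nio.charset.StandardCharsets;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // hypothetical NameNode address

        try (FileSystem fs = FileSystem.get(conf)) {
            Path target = new Path("/data/example.txt"); // hypothetical destination path

            // The client asks the NameNode where each block should go; the data
            // itself streams to the DataNodes in 128 MB blocks, and each block
            // is replicated three times across the cluster with these settings.
            try (FSDataOutputStream out = fs.create(
                    target,
                    true,              // overwrite if the file already exists
                    4096,              // client-side buffer size in bytes
                    (short) 3,         // replication factor per block
                    128 * 1024 * 1024  // block size: 128 MB
            )) {
                out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
            }
        }
    }
}
```

After the write completes, you can inspect how the file was actually split and replicated with `hdfs fsck /data/example.txt -files -blocks -locations`, which lists each block and the DataNodes holding its replicas.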