It's important to note that in a Hadoop cluster, "blocks of data" are replicated across the cluster members. The master node, often referred to as the name node, is not illustrated in the drawing; it provides the intelligence for clustering activities, such as tracking which nodes hold each block.
In the example, we are using one-terabyte blocks of data within a Hortonworks Data Platform (HDP) cluster. Verify that every node participating in the cluster has adequate disk space to hold the data blocks replicated between members. Unlike some clustered environments, this is not a shared-disk environment: in most instances, the data blocks reside on each server's local disk.
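As a rough aid for the disk-space check, the sizing arithmetic can be sketched in a few lines of Python. This is only an estimate under stated assumptions: the function name `per_node_storage_tb` and the 25% headroom figure are illustrative, the replication factor of 3 is the usual HDFS default (`dfs.replication`) and may differ in your cluster, and blocks are assumed to spread evenly across nodes.

```python
def per_node_storage_tb(data_tb, replication_factor=3, node_count=3, headroom=0.25):
    """Estimate the local disk (in TB) each node needs for replicated data.

    Assumptions (not taken from any specific cluster):
      - replicas are spread evenly across all nodes
      - `headroom` is the fraction of each disk kept free for temporary
        and non-HDFS files
    """
    if node_count < 1 or replication_factor < 1:
        raise ValueError("node_count and replication_factor must be at least 1")
    if replication_factor > node_count:
        # Each replica of a block is expected to sit on a different node.
        raise ValueError("replication_factor cannot exceed node_count")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in the range [0, 1)")

    raw_tb = data_tb * replication_factor   # total disk consumed cluster-wide
    share_tb = raw_tb / node_count          # even share per node
    return share_tb / (1 - headroom)        # leave the requested free space


if __name__ == "__main__":
    # Hypothetical figures: 10 TB of data, 3 replicas, 5 nodes.
    print(per_node_storage_tb(10, replication_factor=3, node_count=5))
```

Because this is not a shared-disk environment, the result is the space needed on each server's own local disk, not a pooled total.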
The network switch in the environment must also be capable of supporting block copies between nodes, so exercise caution if the cluster spans more than one switch, because the links between switches then carry replication traffic as well.
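To get a feel for what a block copy demands of the network, a back-of-the-envelope transfer-time estimate may help. The sketch below is an assumption-laden illustration, not a measurement: the function name `copy_time_seconds`, the 10 Gbps link speed, and the 80% usable-throughput figure are hypothetical, and real copies also depend on disk speed and competing traffic.

```python
def copy_time_seconds(block_bytes, link_gbps, efficiency=0.8):
    """Estimate how long one block copy takes over a single network link.

    Assumptions (illustrative only):
      - `link_gbps` is the nominal link speed in gigabits per second
      - `efficiency` is the fraction of that speed actually usable
      - the network, not the disks, is the limiting factor
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return block_bytes * 8 / usable_bits_per_second


if __name__ == "__main__":
    one_terabyte = 1e12  # decimal terabyte, in bytes
    seconds = copy_time_seconds(one_terabyte, link_gbps=10)
    print(f"{seconds / 60:.1f} minutes per one-terabyte copy")
```

An uplink between switches is shared by every node behind it, so the per-copy time grows when several copies cross that link at once.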
There are many details involved in configuring a Hadoop cluster, and we hope this information will assist with evaluating configuration alternatives.
For more information, contact us at [email protected].