This section introduces the NameNode in HDFS and explains why it matters in the Hadoop ecosystem.
The NameNode is also known as the Master in the Hadoop ecosystem. It is the heart of the whole system and requires the most reliable hardware in a production environment.
Hadoop uses HDFS as the primary file system for storing the data used by Hadoop applications. In an HDFS cluster, the NameNode manages the file system and the DataNodes. It is also a single point of failure: if the NameNode is down, the system cannot process files, and user applications cannot read files from or save files to the system.
The NameNode plays a major role in managing the file system metadata and the DataNodes. The DataNodes actually store the data, while the NameNode manages and coordinates all the DataNodes in the system.
There are three components of the HDFS cluster architecture:
NameNode: Manages the file system metadata and all the DataNodes. Clients interact with the NameNode to store, update, or request data stored in the file system.
DataNode: Stores the actual data in the HDFS architecture. The actual I/O is performed directly on the DataNodes.
Client Application: Interacts with the NameNode for all activities such as saving, updating, or retrieving files. The actual I/O is performed on the DataNodes.
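The division of labour between the three components can be illustrated with a small in-memory model. This is only a sketch, not the real HDFS protocol: the class names, the round-robin block placement, the tiny block size, and the absence of replication are all simplifying assumptions made for illustration. It shows the client asking the NameNode for metadata, performing the actual I/O against the DataNodes, and failing once the NameNode is unavailable.

```python
class NameNode:
    """Toy NameNode: holds metadata only, never file contents."""

    def __init__(self, datanode_names):
        self.datanode_names = list(datanode_names)
        self.file_blocks = {}      # path -> ordered list of block ids
        self.block_locations = {}  # block id -> name of the DataNode holding it
        self.up = True

    def _check_up(self):
        if not self.up:
            raise ConnectionError("NameNode is down: metadata unavailable")

    def allocate_block(self, path):
        """Register a new block for `path` and pick a DataNode (round-robin, an assumption)."""
        self._check_up()
        block_id = len(self.block_locations)
        target = self.datanode_names[block_id % len(self.datanode_names)]
        self.file_blocks.setdefault(path, []).append(block_id)
        self.block_locations[block_id] = target
        return block_id, target

    def locate(self, path):
        """Return (block id, DataNode name) pairs for a file, in order."""
        self._check_up()
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]


class DataNode:
    """Toy DataNode: stores block contents and serves reads and writes."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes

    def write(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        return self.blocks[block_id]


class Client:
    """Toy client: metadata calls go to the NameNode, data I/O goes to DataNodes."""

    def __init__(self, namenode, datanodes, block_size=4):
        self.namenode = namenode
        self.datanodes = {d.name: d for d in datanodes}
        self.block_size = block_size  # unrealistically small, for the demo only

    def save(self, path, data):
        for start in range(0, len(data), self.block_size):
            block_id, target = self.namenode.allocate_block(path)
            self.datanodes[target].write(block_id, data[start:start + self.block_size])

    def read(self, path):
        locations = self.namenode.locate(path)
        return b"".join(self.datanodes[dn].read(b) for b, dn in locations)


datanodes = [DataNode("dn1"), DataNode("dn2")]
namenode = NameNode([d.name for d in datanodes])
client = Client(namenode, datanodes)

client.save("/logs/a.txt", b"hello hdfs")
print(client.read("/logs/a.txt"))  # b'hello hdfs'

namenode.up = False  # simulate a NameNode outage
try:
    client.read("/logs/a.txt")
except ConnectionError as err:
    print(err)  # NameNode is down: metadata unavailable
```

Note that the blocks are still sitting on the DataNodes after the simulated outage; the client simply has no way to find them without the NameNode's metadata.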
Recommendations for NameNode in Production:
In a production system, use a highly reliable and robust server for the NameNode, because this machine is critical to the functioning of the whole system.
- Use a high-quality server with plenty of RAM and processing power. The NameNode keeps the file system metadata in memory, so more RAM means better performance and the ability to manage more disk space. A rough rule of thumb is 1 GB of RAM for every 100 TB of raw disk space.
- The CPU requirement is low: on a multi-core server the NameNode usually uses 2-5% of the CPU. Reliable hardware matters more here than high performance.
- Always use ECC RAM in your production server.
- Run the system on the latest version of Java. At the time of writing, the latest version was Java 1.8, also known as Java 8.
- Run the server VM with compressed pointers (-XX:+UseCompressedOops); this reduces the JVM heap size significantly.
- Always monitor the free disk space available to the NameNode, and add more space if it runs low.
- Host the DataNode, JobTracker, and TaskTracker services on servers other than the NameNode machine. Don’t host all of these on the same system.
- Configure the NameNode to store a second copy of the transaction logs on a network-mounted disk; this helps with disaster recovery.
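As a worked example of the RAM rule of thumb from the list above (1 GB of RAM per 100 TB of raw disk), the following sketch estimates a minimum NameNode memory size for a cluster. The function names and the example cluster dimensions are hypothetical, and the result is only as good as the rough rule itself; in practice memory needs also depend on the number of files and blocks.

```python
import math

# Rule of thumb from the text: 1 GB of NameNode RAM per 100 TB of raw disk.
TB_OF_DISK_PER_GB_OF_RAM = 100


def cluster_raw_disk_tb(datanodes, disks_per_node, tb_per_disk):
    """Total raw disk capacity of the cluster in TB (before replication)."""
    return datanodes * disks_per_node * tb_per_disk


def namenode_ram_gb(raw_disk_tb):
    """Rough minimum NameNode RAM in GB, rounded up, never below 1 GB."""
    if raw_disk_tb < 0:
        raise ValueError("raw_disk_tb must be non-negative")
    return max(1, math.ceil(raw_disk_tb / TB_OF_DISK_PER_GB_OF_RAM))


# Hypothetical cluster: 40 DataNodes, each with 12 disks of 4 TB.
raw_tb = cluster_raw_disk_tb(datanodes=40, disks_per_node=12, tb_per_disk=4)
print(f"{raw_tb} TB raw -> at least {namenode_ram_gb(raw_tb)} GB of RAM")
# 1920 TB raw -> at least 20 GB of RAM
```

Treat the figure as a lower bound for planning and leave headroom for growth.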
In a production system, the NameNode should run on reliable hardware, and a standby NameNode should be available to take over if the Master (NameNode) fails.