HCIA-Big Data | Introduction to HDFS

Latest reply: Apr 19, 2022 20:13:39

Hello, friend!

In this post, I will share an introduction to HDFS.

HDFS Overview

Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.

HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware.

HDFS provides high-throughput access to application data and is suitable for applications with large data sets.

HDFS relaxes some Portable Operating System Interface (POSIX) requirements to enable streaming access to file system data.

HDFS was originally built as the foundation for the Apache Nutch Web search engine project.

HDFS is a part of the Apache Hadoop Core project.

HDFS Architecture Overview

For the HDFS working principle, see [FI Components] HDFS working principle.

HDFS Namespace Management

The HDFS namespace contains directories, files, and blocks.

HDFS uses a traditional hierarchical file organization. Users can therefore create and delete directories and files, move files between directories, and rename files in the same way as in a common file system.

The NameNode maintains the file system namespace and records any change to the namespace or its properties.
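The following is a minimal sketch of the idea above: a hierarchical namespace in which every change is recorded by a single owner. It is a toy model, not HDFS code. The class `ToyNamespace`, its methods, and the paths are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNamespace:
    """Illustrative only: a namespace tree plus a log of every change."""
    # Maps an absolute path to "dir" or "file"; the root always exists.
    entries: dict = field(default_factory=lambda: {"/": "dir"})
    edit_log: list = field(default_factory=list)

    @staticmethod
    def _parent(path):
        return path.rsplit("/", 1)[0] or "/"

    def _subtree(self, path):
        # The path itself plus everything stored underneath it.
        return [p for p in self.entries if p == path or p.startswith(path + "/")]

    def _add(self, path, kind):
        if path in self.entries:
            raise FileExistsError(path)
        if self.entries.get(self._parent(path)) != "dir":
            raise FileNotFoundError(self._parent(path))
        self.entries[path] = kind
        self.edit_log.append((f"create_{kind}", path))

    def mkdir(self, path):
        self._add(path, "dir")

    def create_file(self, path):
        self._add(path, "file")

    def rename(self, src, dst):
        """Move or rename a file or directory together with its contents."""
        if src not in self.entries:
            raise FileNotFoundError(src)
        if dst in self.entries or dst.startswith(src + "/"):
            raise ValueError(f"invalid destination: {dst}")
        if self.entries.get(self._parent(dst)) != "dir":
            raise FileNotFoundError(self._parent(dst))
        for old in self._subtree(src):
            self.entries[dst + old[len(src):]] = self.entries.pop(old)
        self.edit_log.append(("rename", src, dst))

    def delete(self, path):
        if path not in self.entries:
            raise FileNotFoundError(path)
        for old in self._subtree(path):
            del self.entries[old]
        self.edit_log.append(("delete", path))


ns = ToyNamespace()
ns.mkdir("/logs")
ns.create_file("/logs/app.log")
ns.mkdir("/archive")
ns.rename("/logs/app.log", "/archive/app.log")
ns.delete("/logs")
print(sorted(ns.entries))
print(ns.edit_log)
```

Each user-visible operation changes the tree and appends one record to `edit_log`, which mirrors the statement that every namespace change is recorded in one place.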

Communication Protocol

HDFS is a distributed file system deployed on a cluster. Therefore, a large amount of data needs to be transmitted over the network.

  • All HDFS communication protocols are based on the TCP/IP protocol.

  • The client initiates a TCP connection to the NameNode through a configurable port and uses the client protocol to interact with the NameNode.

  • The NameNode and the DataNode interact with each other by using the DataNode protocol.

  • Both the client protocol and the DataNode protocol are wrapped in a Remote Procedure Call (RPC) abstraction. By design, the NameNode never initiates an RPC request; it only responds to RPC requests from clients and DataNodes.
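The "respond only" design can be sketched as follows. This is a toy model with hypothetical names (`RespondOnlyNameNode`, `schedule`, `handle_heartbeat`, `handle_get_block_locations`), not the real HDFS RPC interfaces. It assumes that work intended for a DataNode is queued and handed over in the reply to that DataNode's next request, so the NameNode never has to open a call itself.

```python
class RespondOnlyNameNode:
    """Illustrative only: exposes request handlers and never calls out."""

    def __init__(self):
        self.last_heartbeat = {}     # DataNode id -> time of last request
        self.pending_commands = {}   # DataNode id -> commands waiting for pickup

    def schedule(self, datanode_id, command):
        # Work is queued, not pushed: nothing is sent to the DataNode here.
        self.pending_commands.setdefault(datanode_id, []).append(command)

    def handle_heartbeat(self, datanode_id, now):
        """DataNode-side request: record liveness, reply with queued commands."""
        self.last_heartbeat[datanode_id] = now
        return self.pending_commands.pop(datanode_id, [])

    def handle_get_block_locations(self, path):
        """Client-side request (simplified: lists every known DataNode)."""
        return {"path": path, "datanodes": sorted(self.last_heartbeat)}


nn = RespondOnlyNameNode()
nn.schedule("dn1", "replicate block 42")
first_reply = nn.handle_heartbeat("dn1", now=100.0)
second_reply = nn.handle_heartbeat("dn1", now=103.0)
client_reply = nn.handle_get_block_locations("/data/events.log")
print(first_reply, second_reply, client_reply)
```

In this sketch the queued command reaches `dn1` only as the return value of its own heartbeat request, and the second heartbeat returns an empty list because nothing else is pending.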


The client is the most common way for users to operate HDFS. A client is provided when HDFS is deployed.

The HDFS client is a library that contains HDFS file system interfaces that hide most of the complexity of HDFS implementation.

Strictly speaking, the client is not a part of HDFS.

The client supports common operations such as opening, reading, and writing files, and provides a shell-like command line mode for accessing data in HDFS.

HDFS also provides Java APIs as client programming interfaces for applications to access the file system.

For more information about HDFS modules, see [FI Components] HA HDFS Architecture.

Disadvantages of the HDFS Single-NameNode Architecture

In this architecture, HDFS has only one NameNode, which greatly simplifies the system design but also brings some obvious limitations. The details are as follows:

  • Namespace limitation: The NameNode keeps the namespace in memory. Therefore, the number of objects (files and blocks) that a NameNode can hold is limited by its memory size.

  • Performance bottleneck: The throughput of the entire distributed file system is limited by the throughput of a single NameNode.

  • Isolation: Because there is only one NameNode and one namespace in the cluster, different applications cannot be isolated.

  • Cluster availability: If the only NameNode fails, the entire cluster becomes unavailable.
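To get a feel for the namespace limitation, here is a back-of-the-envelope sketch. It assumes a commonly quoted rule of thumb of about 150 bytes of NameNode heap per namespace object. That figure is an assumption brought in for illustration, not a value from this post, and real usage depends on the Hadoop version and the data. The function names are hypothetical.

```python
GIB = 1024 ** 3
BYTES_PER_OBJECT = 150  # assumed rule of thumb, not a measured or official value


def max_namespace_objects(heap_bytes, bytes_per_object=BYTES_PER_OBJECT):
    """Rough upper bound on files + blocks that fit in the given heap."""
    return heap_bytes // bytes_per_object


def heap_needed(num_files, blocks_per_file, bytes_per_object=BYTES_PER_OBJECT):
    """Rough heap in bytes for num_files files of blocks_per_file blocks each."""
    objects = num_files * (1 + blocks_per_file)  # one file object plus its blocks
    return objects * bytes_per_object


objects_in_64_gib = max_namespace_objects(64 * GIB)
heap_for_small_files = heap_needed(10_000_000, blocks_per_file=1)
print(objects_in_64_gib)
print(round(heap_for_small_files / GIB, 2), "GiB")
```

Under this assumption, the number of files (not their total size) drives NameNode memory, which is why many small files put more pressure on a single NameNode than a few large ones.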

HDFS-related Concepts

Computer Cluster Structure

The distributed file system stores files on multiple computer nodes. Thousands of computer nodes form a computer cluster.

Currently, the computer clusters used by distributed file systems are built from commodity hardware, which greatly reduces hardware costs.

Basic System Architecture


The default size of an HDFS block is 128 MB. A file is divided into multiple blocks, which are used as storage units.

The block size is much larger than that of a common file system, minimizing the addressing overhead.
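As a small worked example, the sketch below splits a file into 128 MB blocks and compares the number of blocks with what a 4 KB block size would give. The 4 KB figure is an assumption standing in for a typical local file system, and `split_into_blocks` is a hypothetical helper, not an HDFS API.

```python
MIB = 1024 * 1024
HDFS_BLOCK_SIZE = 128 * MIB   # default block size stated above
LOCAL_FS_BLOCK_SIZE = 4096    # assumed typical local file system block size


def split_into_blocks(file_size, block_size=HDFS_BLOCK_SIZE):
    """Return (offset, length) for each block of a file of file_size bytes."""
    return [
        (offset, min(block_size, file_size - offset))
        for offset in range(0, file_size, block_size)
    ]


file_size = 300 * MIB
hdfs_blocks = split_into_blocks(file_size)
local_block_count = len(split_into_blocks(file_size, LOCAL_FS_BLOCK_SIZE))

for offset, length in hdfs_blocks:
    print(f"offset={offset // MIB} MiB length={length // MIB} MiB")
print(len(hdfs_blocks), "blocks vs", local_block_count, "blocks")
```

A 300 MiB file becomes three blocks here, the last one only 44 MiB long, whereas 4 KB blocks would mean tens of thousands of entries to track and locate for the same file.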

The abstract block concept brings the following obvious benefits:

  • Supporting large-scale file storage.

  • Simplifying system design.

  • Facilitating data backup.

The difference between NameNode and DataNode

NameNode:

  • Stores metadata.

  • Metadata is stored in the memory.

  • Saves the mapping between files, blocks, and DataNodes.

DataNode:

  • Stores file content.

  • The file content is stored in the disk.

  • Maintains the mapping between block IDs and local files on DataNodes.
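The two kinds of mapping described above can be pictured with plain dictionaries. This is a toy sketch: the paths, block IDs, DataNode names, and the `locate` helper are all hypothetical, and keeping three replicas per block is an assumption made for the example.

```python
# NameNode side (kept in memory): file -> ordered block IDs, block ID -> DataNodes.
file_to_blocks = {"/data/events.log": ["blk_1", "blk_2"]}
block_to_datanodes = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn2", "dn3", "dn4"],
}

# DataNode side (on disk): block ID -> local file that holds the block's bytes.
datanode_local_files = {
    "dn1": {"blk_1": "/disk1/blk_1"},
    "dn2": {"blk_1": "/disk1/blk_1", "blk_2": "/disk2/blk_2"},
    "dn3": {"blk_1": "/disk1/blk_1", "blk_2": "/disk1/blk_2"},
    "dn4": {"blk_2": "/disk1/blk_2"},
}


def locate(path):
    """Resolve a file to (block ID, DataNode, local file) using the first replica."""
    plan = []
    for block_id in file_to_blocks[path]:
        datanode = block_to_datanodes[block_id][0]
        local_file = datanode_local_files[datanode][block_id]
        plan.append((block_id, datanode, local_file))
    return plan


read_plan = locate("/data/events.log")
print(read_plan)
```

The first two dictionaries are metadata only, while the last one points at where the bytes actually live, which matches the split between metadata on the NameNode and file content on the DataNodes.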

For more information about NameNode and DataNode, see HDFS Architecture and Functionality.

Summary of HDFS-related posts

HDFS RPC Server Summary

HDFS Balancer

HDFS Kernel Permission Check (ACL)

HDFS Data Storage Policy: LAZY_PERSIST

Useful command-line operations on HDFS (Common Fault Locating Methods)

HDFS start fails due to insufficient memory

Failing to reclaim the Delta Table Space of the HDFS tables

FusionStorageHDFS Service Unavailable

The 20 Hadoop Shell Commands to Manage HDFS

[Infographic] The World in the Cloud | FusionInsight (Issue 6) HDFS

[FI Components] HA HDFS Architecture

[FI Components] HDFS working principle

[FI Components] The HDFS High Availability Overview

[FI Components] Relationship between Spark and HDFS

[FI Components Log] HDFS Component Log Introduction

How to plan HDFS Capacity

How to configurate rule of HDFS NameNode and DataNode JVM Parameters

In What Situation Will the HDFS Copy Recovery Be Triggered?

How to use hdfs dfs command?

hdfs dfsadmin command

That's all, thanks!
