
HCIA-Big Data | HDFS Key Features and HDFS Data Read/Write Process

Latest reply: Mar 8, 2022 02:46:28

Hello, friend!

In this post, I will share with you the key features of HDFS and the HDFS data read/write process.

HDFS Key Features

HDFS High Availability (HA)


HDFS achieves high availability by running active and standby NameNodes coordinated through ZooKeeper, which removes the NameNode as a single point of failure. ZooKeeper stores the HA status files.

The active NameNode provides services. The standby NameNode synchronizes metadata from the active NameNode and acts as its hot backup.

Metadata is synchronized as follows. The active NameNode writes each generated EditLog entry to its local disk and to the JournalNodes at the same time. When the standby NameNode detects that the EditLog on the JournalNodes has changed, it loads the EditLog into its own memory and generates new metadata identical to that on the active NameNode. Metadata synchronization is then complete. The FsImage files of the active and standby NameNodes remain on their respective disks and do not interact with each other.
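The EditLog flow described above can be sketched as a small in-memory simulation. This is only an illustration, not HDFS code: the `JournalNode` and `NameNode` classes, the edit tuples, and the majority check are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class JournalNode:
    """Holds a copy of the shared EditLog (hypothetical, in-memory)."""
    edits: list = field(default_factory=list)


@dataclass
class NameNode:
    namespace: dict = field(default_factory=dict)   # path -> replication factor
    local_edits: list = field(default_factory=list)
    applied: int = 0   # shared edits already replayed (used by the standby)

    def apply(self, edit):
        op, path, value = edit
        if op == "create":
            self.namespace[path] = value
        elif op == "delete":
            self.namespace.pop(path, None)

    def write_edit(self, edit, journal_nodes):
        """Active side: log locally and to the JournalNodes, then apply."""
        self.local_edits.append(edit)
        acks = 0
        for jn in journal_nodes:
            jn.edits.append(edit)
            acks += 1
        # Assumption: an edit counts only if a majority of JournalNodes took it.
        if acks <= len(journal_nodes) // 2:
            raise RuntimeError("edit not acknowledged by a majority")
        self.apply(edit)

    def tail_edits(self, journal_node):
        """Standby side: replay any shared edits not yet applied."""
        for edit in journal_node.edits[self.applied:]:
            self.apply(edit)
            self.applied += 1


journal_nodes = [JournalNode() for _ in range(3)]
active, standby = NameNode(), NameNode()

active.write_edit(("create", "/data/a.txt", 3), journal_nodes)
active.write_edit(("create", "/data/b.txt", 3), journal_nodes)
active.write_edit(("delete", "/data/a.txt", None), journal_nodes)

standby.tail_edits(journal_nodes[0])
print(standby.namespace == active.namespace)
```

After the standby replays the shared edits, its in-memory namespace matches the active one, which is what makes it usable as a hot backup.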

The FsImage is a copy of the metadata that is periodically written from memory to the local disk, so it is also called the metadata image.

The ZooKeeper Failover Controller (ZKFC) arbitrates between the active and standby NameNodes. As a lightweight arbitration agent, the ZKFC uses ZooKeeper's distributed lock function to decide which NameNode is active, and it switches the active and standby states of the NameNodes through the command channel.
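To make the lock-based arbitration more concrete, here is a minimal sketch. The `LockService` class only stands in for the ZooKeeper lock and `FailoverController` is a hypothetical, much simplified controller; neither reflects the real ZooKeeper or Hadoop APIs.

```python
class LockService:
    """Stand-in for the ZooKeeper lock: at most one holder at a time."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, owner):
        if self.holder is None:
            self.holder = owner
        return self.holder == owner

    def release(self, owner):
        # Models the lock disappearing when the owner's session ends.
        if self.holder == owner:
            self.holder = None


class FailoverController:
    """Hypothetical failover controller, one per NameNode."""

    def __init__(self, name, lock):
        self.name = name
        self.lock = lock
        self.state = "standby"
        self.healthy = True

    def tick(self):
        """One round of health check plus arbitration."""
        if not self.healthy:
            self.lock.release(self.name)
            self.state = "standby"
        elif self.lock.try_acquire(self.name):
            self.state = "active"
        else:
            self.state = "standby"
        return self.state


lock = LockService()
zkfc1 = FailoverController("nn1", lock)
zkfc2 = FailoverController("nn2", lock)

print(zkfc1.tick(), zkfc2.tick())   # nn1 gets the lock first
zkfc1.healthy = False               # nn1 fails its health check
print(zkfc1.tick(), zkfc2.tick())   # the lock is freed and nn2 takes over
```

Whoever holds the lock is active, so the two NameNodes can never both be active at the same time in this model.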

Metadata Persistence

[Figure: Metadata Persistence]

HDFS Federation

[Figure: HDFS Federation]

Data Replica Mechanism

[Figure: Data Replica Mechanism]

HDFS Data Integrity Assurance

HDFS ensures the integrity of stored data and the reliability of its components through the following mechanisms.

Rebuilding replicas of failed data disks:

  • When a DataNode fails to report to the NameNode periodically, the NameNode initiates replica rebuilding to restore the lost replicas.
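As a rough sketch of the replica rebuilding decision, the NameNode can be modeled as checking report timestamps and planning new copies for blocks that lost a replica. The function names, the timeout value, and the target of three replicas are assumptions for this example only.

```python
def find_dead_nodes(last_report, now, timeout):
    """DataNodes whose last report is older than `timeout` seconds."""
    return {dn for dn, t in last_report.items() if now - t > timeout}


def plan_rebuild(block_locations, dead, live, target=3):
    """Return {block: [new DataNodes]} for blocks that lost replicas."""
    plan = {}
    for block, holders in block_locations.items():
        alive = [dn for dn in holders if dn not in dead]
        missing = target - len(alive)
        if missing > 0 and alive:   # a surviving replica is needed as the source
            candidates = [dn for dn in sorted(live) if dn not in alive]
            plan[block] = candidates[:missing]
    return plan


last_report = {"dn1": 100, "dn2": 98, "dn3": 40, "dn4": 99}
block_locations = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn1", "dn2", "dn4"],
}

dead = find_dead_nodes(last_report, now=100, timeout=30)
live = set(last_report) - dead
rebuild_plan = plan_rebuild(block_locations, dead, live)
print(dead, rebuild_plan)
```

Here only `dn3` has stopped reporting, so only the block that had a replica on it is scheduled for a new copy.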

Cluster data balancing:

  • HDFS includes a data balancing mechanism that keeps data evenly distributed across the DataNodes.
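One way to picture data balancing is a loop that moves blocks from the fullest DataNode to the emptiest one until every node is close to the cluster average. The sketch below is a toy model under that assumption; the 10% threshold mirrors the default of the HDFS Balancer's `-threshold` option, but the selection logic here is deliberately simplified and is not the real Balancer algorithm.

```python
def utilization(used, capacity):
    return {dn: used[dn] / capacity[dn] for dn in used}


def balance(used, capacity, threshold=0.10, block_size=1, max_moves=1000):
    """Move blocks from the fullest to the emptiest node until every node is
    within `threshold` of the cluster-wide average utilization."""
    used = dict(used)
    average = sum(used.values()) / sum(capacity.values())
    moves = []
    while len(moves) < max_moves:   # guard against endless back-and-forth
        util = utilization(used, capacity)
        source = max(util, key=util.get)
        target = min(util, key=util.get)
        if (util[source] - average <= threshold
                and average - util[target] <= threshold):
            break
        used[source] -= block_size
        used[target] += block_size
        moves.append((source, target))
    return used, moves


used = {"dn1": 90, "dn2": 50, "dn3": 10}
capacity = {"dn1": 100, "dn2": 100, "dn3": 100}

balanced, moves = balance(used, capacity)
print(balanced, len(moves))
```

The total amount of data stays the same; only its placement changes.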

Metadata reliability:

  • Metadata operations are recorded through the log mechanism, and metadata is stored on both the active and standby NameNodes.

  • HDFS implements the snapshot mechanism common to file systems, so that data can be restored promptly after a misoperation.

Safe mode:

  • HDFS provides a safe mode mechanism that prevents faults from spreading when DataNodes or disks are faulty.
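The Hadoop documentation calls this mechanism safe mode. As a simplified sketch, the NameNode can be modeled as keeping the namespace read-only while too few blocks have enough reported replicas. The `ToyNameNode` class is hypothetical; the 0.999 default mirrors the `dfs.namenode.safemode.threshold-pct` setting, and the rest is an assumption for illustration.

```python
def in_safe_mode(reported_replicas, min_replicas=1, threshold=0.999):
    """True while too few blocks have reached `min_replicas` reported copies."""
    if not reported_replicas:
        return False
    safe = sum(1 for n in reported_replicas.values() if n >= min_replicas)
    return safe / len(reported_replicas) < threshold


class ToyNameNode:
    """Hypothetical NameNode that rejects writes while in safe mode."""

    def __init__(self):
        self.read_only = True
        self.files = {}

    def update(self, reported_replicas):
        self.read_only = in_safe_mode(reported_replicas)

    def create(self, path):
        if self.read_only:
            raise PermissionError("NameNode is in safe mode; namespace is read-only")
        self.files[path] = []


namenode = ToyNameNode()
namenode.update({"blk_1": 0, "blk_2": 2})   # blk_1 has no reported replica yet
print(namenode.read_only)
namenode.update({"blk_1": 1, "blk_2": 2})   # every block now has a replica
namenode.create("/data/new.txt")
print(namenode.read_only, list(namenode.files))
```

While the flag is set, reads could still be served, but changes to the namespace are refused.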

Other Key Design Points of the HDFS Architecture

Space reclamation mechanism:

  • HDFS supports a recycle bin mechanism and dynamic setting of the number of replicas.
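The recycle bin idea can be sketched as follows: a delete only moves the file into a trash area, and the space is reclaimed once a retention interval has passed. The `TrashFS` class is a hypothetical in-memory model; the interval in minutes is chosen to resemble the `fs.trash.interval` setting, which is an assumption about how you would configure a real cluster.

```python
class TrashFS:
    """Toy file system with a recycle bin (hypothetical, in-memory)."""

    def __init__(self, trash_interval_min):
        self.files = {}   # path -> data
        self.trash = {}   # path -> (data, deletion time in minutes)
        self.trash_interval_min = trash_interval_min

    def delete(self, path, now):
        """Move a file to the trash instead of removing it right away."""
        self.trash[path] = (self.files.pop(path), now)

    def restore(self, path):
        data, _ = self.trash.pop(path)
        self.files[path] = data

    def expunge(self, now):
        """Permanently drop entries older than the retention interval."""
        expired = [p for p, (_, t) in self.trash.items()
                   if now - t >= self.trash_interval_min]
        for p in expired:
            del self.trash[p]
        return expired


fs = TrashFS(trash_interval_min=60)
fs.files["/tmp/a"] = b"x"
fs.files["/tmp/b"] = b"y"

fs.delete("/tmp/a", now=0)
fs.delete("/tmp/b", now=50)
fs.restore("/tmp/a")              # deleted by mistake, so bring it back
expired = fs.expunge(now=120)     # /tmp/b has been in the trash for 70 minutes
print(fs.files, expired)
```

For the number of replicas, the `hdfs dfs -setrep` shell command changes the replication factor of existing files.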

Data organization:

  • Data is stored in units of data blocks, which HDFS keeps on the file system of the underlying operating system.

Access mode:

  • HDFS data can be accessed through Java APIs, HTTP, or shell commands.
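For the HTTP access mode, WebHDFS exposes file operations as REST calls under `/webhdfs/v1`. The helper below only builds such URLs and sends nothing; the host name, the port 9870, and the user `alice` are placeholders assumed for the example.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, params=None):
    """Build a WebHDFS REST URL for the given HDFS path and operation."""
    query = urlencode({"op": op, **(params or {})})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Placeholder NameNode address; adjust it for your own cluster.
list_url = webhdfs_url("namenode.example.com", 9870, "/user/alice", "LISTSTATUS")
open_url = webhdfs_url("namenode.example.com", 9870, "/user/alice/a.txt", "OPEN",
                       {"user.name": "alice"})
print(list_url)
print(open_url)
```

The same listing is available from the shell with `hdfs dfs -ls /user/alice`, and from Java through the `FileSystem` API.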

Common Shell Commands

[Figure: Common Shell Commands]

New Features of HDFS 3.0

Erasure coding (EC) is supported in HDFS.

Router-based federation is supported.

More than two NameNodes are supported.

A disk balancer is added to DataNodes to balance data across the disks within a node.
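To give a feel for the erasure coding feature in the list above, here is a tiny sketch using XOR parity, the simplest erasure code. HDFS policies are typically based on Reed-Solomon, so treat the XOR scheme as a stand-in chosen for brevity; the cell contents and function names are made up for the example.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equally sized cells."""
    return bytes(x ^ y for x, y in zip(a, b))


def storage_overhead(data_units, parity_units):
    """Extra storage as a fraction of the original data size."""
    return parity_units / data_units


cell_a = b"HDFS"
cell_b = b"3.0!"
parity = xor_bytes(cell_a, cell_b)

# Lose cell_a, then rebuild it from the surviving cell and the parity.
recovered = xor_bytes(cell_b, parity)
print(recovered == cell_a)

print(storage_overhead(1, 2))   # 3-way replication: 2 extra copies per block
print(storage_overhead(6, 3))   # 6 data units + 3 parity units
```

The point of EC is visible in the last two lines: it tolerates failures with far less extra storage than keeping three full replicas.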

HDFS Data Read/Write Process

HDFS Data Read Process

[Figure: HDFS Data Read Process]

HDFS Data Write Process

[Figure: HDFS Data Write Process]

Summary of HDFS-related posts

HDFS RPC Server Summary

HDFS Balancer

HDFS Kernel Permission Check (ACL)

HDFS Data Storage Policy: LAZY_PERSIST

Useful command-line operations on HDFS (Common Fault  Locating Methods)

HDFS start fails due to insufficient memory

Failing to reclaim the Delta Table Space of the HDFS tables

FusionStorageHDFS Service Unavailable

The 20 Hadoop Shell Commands to Manage HDFS

[Infographic]The World in the Cloud | FusionInsight (Issue  6) HDFS

[FI Components] HA HDFS Architecture

[FI Components] HDFS working principle

[FI Components] The HDFS High Availability Overview

[FI Components] Relationship between Spark and HDFS

[FI Components Log] HDFS Component Log Introduction

How to plan HDFS Capacity

How to configurate rule of HDFS NameNode and DataNode JVM Parameters

In What Situation Will the HDFS Copy Recovery Be Triggered?

How to use hdfs dfs command?

hdfs dfsadmin command

That's all, thanks!

The post is synchronized to: HCIA-Big Data


Admin Created Mar 8, 2022 02:46:28


olive.zhao Created Mar 9, 2022 01:21:55

