This section describes the FusionInsight components

Latest reply: Nov 24, 2021 05:22:20

Hello everyone, today I want to learn about HDFS with you. Let's get started.

Overview

Huawei FusionInsight is a distributed data processing system that provides large-capacity data storage, query, and analysis capabilities. FusionInsight wraps an additional layer around a Hadoop cluster, similar to open source big data platforms such as CDH and HDP.

HDFS Principle - Distributed file system

When configuring an HBase cluster to connect HDFS to other mirrored disks, many perplexing problems can arise. The three cornerstones of the underlying technology of big data originated from three papers Google published before 2006: GFS, MapReduce, and Bigtable. GFS and MapReduce directly underpin the Apache Hadoop project, and Bigtable gave birth to a new database domain called NoSQL.

Because of the high latency of the MapReduce processing framework, Google launched Dremel after 2009. This promoted the rise of real-time computing systems and triggered the second wave of big data technology. Several big data companies and research groups then released their own query and analysis products, such as Cloudera's open source query engine Impala, Hortonworks' open source Stinger, Facebook's open source Presto, and the Spark computing framework developed by UC Berkeley's AMPLab. All of these technologies build on HDFS data sources, and the most basic operations on HDFS are reads and writes.

Data storage redundancy

To ensure the fault tolerance and availability of the system, HDFS stores data redundantly as multiple copies. The copies of a data block are usually distributed across different slave nodes, which has the following advantages:

Faster data transfer; easier detection of data errors; guaranteed data reliability.

Strategy for data access

Storage of data

To improve data reliability and system availability, and to make full use of network bandwidth, HDFS adopts a rack-based data storage policy. An HDFS cluster usually contains multiple racks. Data exchanged between different racks must pass through switches or routers, whereas data exchanged within the same rack does not. The default replication factor of HDFS is 3, so each file block is stored in three places: two copies on different machines in the same rack, and the third copy on a machine in a different rack.
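
A toy sketch of rack-aware placement with a replication factor of 3 may help here. It is not the HDFS implementation: the function name place_replicas, the rack and host names, and the rule of picking the first eligible machines are all assumptions made for illustration. The sketch puts two copies on different machines in one rack and the third copy on a machine in another rack, so that losing a whole rack still leaves one copy.

```python
from typing import Dict, List, Tuple


def place_replicas(racks: Dict[str, List[str]], local_rack: str) -> List[Tuple[str, str]]:
    """Return three (rack, host) pairs: two in local_rack, one in another rack."""
    local_hosts = racks[local_rack]
    if len(local_hosts) < 2:
        raise ValueError("local rack needs at least two machines")
    # Any other non-empty rack can hold the third copy; sorted() keeps the
    # choice deterministic for this illustration.
    remote_racks = [r for r in sorted(racks) if r != local_rack and racks[r]]
    if not remote_racks:
        raise ValueError("need at least one other rack")
    remote_rack = remote_racks[0]
    return [
        (local_rack, local_hosts[0]),
        (local_rack, local_hosts[1]),
        (remote_rack, racks[remote_rack][0]),
    ]


# Hypothetical two-rack cluster.
cluster = {
    "rack1": ["host-a", "host-b", "host-c"],
    "rack2": ["host-d", "host-e"],
}
placement = place_replicas(cluster, "rack1")
print(placement)
```

A real cluster would also weigh node load and free space when choosing machines; the sketch only shows the rack constraint.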

Data reading

HDFS provides an API to determine the ID of the rack to which a node belongs, and clients can call this API to obtain their own rack ID. When a client reads data, it obtains from the primary node a list of the locations where the copies of the block are stored. The list contains the secondary nodes that hold the copies. The client can call the API to determine the rack IDs of itself and of those secondary nodes. If a secondary node's rack ID matches the client's, the copy on that node is preferred for reading.
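
The rack comparison can be sketched in a few lines of Python. This is an illustration only: choose_replica and the rack_of dictionary are hypothetical stand-ins for the rack-ID API mentioned above, and falling back to the first node in the list when no rack matches is an assumption of this sketch.

```python
from typing import Dict, List


def choose_replica(client_rack: str, replica_nodes: List[str], rack_of: Dict[str, str]) -> str:
    """Prefer a node whose rack ID equals the client's rack ID."""
    for node in replica_nodes:
        if rack_of[node] == client_rack:
            return node
    # No copy in the client's rack: fall back to the first listed node
    # (an assumption of this sketch).
    return replica_nodes[0]


# Hypothetical rack IDs for the nodes holding copies of one block.
rack_of = {"node1": "rack1", "node2": "rack2", "node3": "rack1"}
print(choose_replica("rack2", ["node1", "node2", "node3"], rack_of))
```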

Data replication

HDFS data replication adopts a pipeline replication strategy, which greatly improves the efficiency of the replication process. When a client writes a file to HDFS, the file is first written locally and divided into several blocks. The size of each block is determined by the value configured in HDFS. For each block, the client sends a write request to the primary node in the HDFS cluster, and the primary node returns a list of writable secondary nodes. The client sends the block to the first node in the list, which forwards it to the second, and so on down the list.
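
A minimal sketch of the write path, assuming a pipeline in which each node stores the block and passes it on to the next node in the list. The function names, the tiny block size, and the fixed node list (standing in for the list returned by the primary node) are illustrative assumptions, not HDFS APIs.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut the file content into fixed-size blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def pipeline_write(block: bytes, pipeline: List[str], storage: Dict[str, List[bytes]]) -> None:
    """Store the block on the first node, then forward it down the pipeline."""
    if not pipeline:
        return
    head, rest = pipeline[0], pipeline[1:]
    storage.setdefault(head, []).append(block)
    # The head node forwards the same block to the remaining nodes.
    pipeline_write(block, rest, storage)


storage: Dict[str, List[bytes]] = {}
blocks = split_into_blocks(b"abcdefghij", block_size=4)
for block in blocks:
    # Hypothetical node list; in HDFS the primary node supplies it per block.
    pipeline_write(block, ["node1", "node2", "node3"], storage)
print(storage)
```

The point of the pipeline is that the client uploads each block once, and the nodes share the work of copying it onward.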

Data Errors and Recovery

Primary node error

Store the metadata of the primary node synchronously in other file systems;

Run a second primary node. If the primary node goes down, the second one can be used for data recovery.

Slave node error

Each slave node periodically sends information to the primary node to report its own status. When slave nodes fail, they are marked as down, and the primary node no longer sends I/O requests to them. If the primary node then finds that some data blocks have fewer copies than the redundancy factor, it initiates redundancy replication to create new copies of those blocks.
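
The bookkeeping on the primary node can be sketched as follows. This is a simplified model under stated assumptions: the timeout value, the node and block names, and both function names are made up for illustration and are not HDFS defaults or APIs.

```python
from typing import Dict, Set

REPLICATION_FACTOR = 3
HEARTBEAT_TIMEOUT = 30.0  # seconds; illustrative value, not an HDFS default


def find_dead_nodes(last_heartbeat: Dict[str, float], now: float) -> Set[str]:
    """Nodes whose last status report is older than the timeout are marked down."""
    return {node for node, seen in last_heartbeat.items() if now - seen > HEARTBEAT_TIMEOUT}


def under_replicated(block_locations: Dict[str, Set[str]], dead: Set[str]) -> Dict[str, int]:
    """Map each block ID to the number of new copies it needs."""
    missing: Dict[str, int] = {}
    for block_id, nodes in block_locations.items():
        live = len(nodes - dead)
        if live < REPLICATION_FACTOR:
            missing[block_id] = REPLICATION_FACTOR - live
    return missing


# Hypothetical state: node3 last reported 60 seconds ago.
last_heartbeat = {"node1": 100.0, "node2": 95.0, "node3": 40.0, "node4": 98.0}
block_locations = {
    "blk_1": {"node1", "node2", "node3"},
    "blk_2": {"node1", "node2", "node4"},
}
dead = find_dead_nodes(last_heartbeat, now=100.0)
todo = under_replicated(block_locations, dead)
print(dead, todo)
```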

Data Error

After the client reads the data, it verifies the data against stored checksums (HDFS uses CRC32C by default) to ensure that the correct data was read. If an error is found, the client reads another copy of the data block.
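
The verify-then-fall-back logic can be sketched like this. The checksum function here (CRC32 from Python's zlib module) is an assumption chosen for illustration, and any digest would follow the same pattern. The function names and sample data are hypothetical.

```python
import zlib
from typing import List, Optional


def checksum(data: bytes) -> int:
    """Checksum used by this sketch; stands in for whatever the cluster uses."""
    return zlib.crc32(data)


def read_verified(replicas: List[bytes], expected: int) -> Optional[bytes]:
    """Return the first copy whose checksum matches, or None if every copy is corrupt."""
    for data in replicas:
        if checksum(data) == expected:
            return data
    return None


original = b"hello hdfs"
expected = checksum(original)  # recorded when the block was written
corrupted = b"hellx hdfs"      # simulated bit rot on one copy
result = read_verified([corrupted, original], expected)
print(result)
```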

That's all. Thank you.



little_fish
Admin Created Aug 23, 2021 10:00:06

Thanks.

user_4147187
user_4147187 Created Aug 23, 2021 10:06:42 (0) (0)
Thanks for your support.  
Unicef
MVE Created Aug 23, 2021 14:57:14

GOOD SHARE

user_4147187
user_4147187 Created Aug 30, 2021 11:30:05 (0) (0)
Thank you.  
user_4267501
Created Aug 30, 2021 04:04:57

Nice

user_4147187
user_4147187 Created Aug 30, 2021 11:30:36 (0) (0)
 
Liu_Yingluo
Created Aug 30, 2021 04:08:55

The article is worth our study.

user_4147187
user_4147187 Created Aug 30, 2021 11:30:28 (0) (0)
Thanks.  
andersoncf1
MVE Author Created Aug 30, 2021 05:05:39

Good

user_4147187
user_4147187 Created Aug 30, 2021 11:30:17 (0) (0)
Thanks.  
adrian_alucard
Created Sep 23, 2021 04:47:21

Great work

MahMush
Moderator Author Created Nov 6, 2021 08:30:58

Good sharing

user_4358465
Created Nov 24, 2021 04:57:14

Thank you for the post.

NgTrang
Created Nov 24, 2021 05:22:20

Thanks for sharing