What is SmallFS used for?


The number of files that the HDFS NameNode can manage is limited by the node's heap memory, because the NameNode keeps the metadata of every file and block in memory. A large number of small files generated by services can therefore consume NameNode memory rapidly and slow the NameNode down, even though the files hold little data.
SmallFS is a background small-file merging feature developed to solve this problem. It automatically detects small files in the system based on a file size threshold, merges them, and stores their metadata in a local LevelDB to reduce the NameNode load. It also provides a new FileSystem interface so that users can access these small files transparently.
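
The merge-and-index idea can be illustrated with a minimal Python sketch. This is not the SmallFS implementation: the threshold value, the container name `merged-0000`, the `IndexEntry` type, and both function names are hypothetical, and a plain dictionary stands in for the local LevelDB that holds the metadata.

```python
from dataclasses import dataclass

# Hypothetical threshold in bytes; a real deployment would use a far larger value.
SMALL_FILE_THRESHOLD = 16


@dataclass
class IndexEntry:
    container: str  # name of the merged file that holds the data
    offset: int     # where the small file starts inside the merged file
    length: int     # size of the small file in bytes


def merge_small_files(files, threshold=SMALL_FILE_THRESHOLD):
    """Pack every file smaller than the threshold into one merged byte string.

    Returns the merged bytes, an index (path -> IndexEntry), and the files
    that were left alone because they are not small.
    """
    container_name = "merged-0000"  # hypothetical name
    merged = bytearray()
    index = {}       # stands in for the local LevelDB metadata store
    untouched = {}
    for path, data in sorted(files.items()):
        if len(data) < threshold:
            index[path] = IndexEntry(container_name, len(merged), len(data))
            merged += data
        else:
            untouched[path] = data
    return bytes(merged), index, untouched


def read_small_file(path, merged, index):
    """Read a merged file back by its original path, as a transparent interface would."""
    entry = index[path]
    return merged[entry.offset:entry.offset + entry.length]


files = {
    "/logs/a.txt": b"alpha",
    "/logs/b.txt": b"beta",
    "/data/big.bin": b"x" * 64,
}
merged, index, untouched = merge_small_files(files)
print(read_small_file("/logs/b.txt", merged, index))
```

In this sketch the two small files end up in a single merged object, so a file system tracking it would hold one file entry instead of two, while callers still look files up by their original paths.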

Other related questions:
What is Kafka used for?
Kafka is a distributed, partitioned, and replicated publish-subscribe messaging system that provides features similar to the Java Message Service (JMS). Kafka offers message persistence, high throughput, distributed deployment, multi-client support, and real-time processing, and supports both online and offline message consumption. It is well suited to data collection for Internet services, such as conventional data collection, website activity tracking, aggregation of operation data in statistics systems (monitoring data), and log collection.
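
The partitioned, persistent publish-subscribe model can be illustrated with a toy in-memory sketch in Python. It does not use the Kafka client API: the `ToyTopic` class and its methods are hypothetical, replication is left out, and the CRC32-based partition choice is only a stand-in for a real partitioner.

```python
import zlib
from collections import defaultdict


class ToyTopic:
    """A toy topic: several append-only partitions plus per-group read positions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        # (group, partition) -> offset of the next message that group will read
        self.committed = defaultdict(int)

    def publish(self, key, value):
        # Messages with the same key always land in the same partition.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append(value)
        return partition, len(self.partitions[partition]) - 1

    def poll(self, group, partition):
        # Messages stay in the log after being read, so each consumer
        # group reads the full stream at its own pace.
        start = self.committed[(group, partition)]
        records = self.partitions[partition][start:]
        self.committed[(group, partition)] = len(self.partitions[partition])
        return records


topic = ToyTopic(num_partitions=3)
p1, o1 = topic.publish("user-1", "click")
p2, o2 = topic.publish("user-1", "scroll")

online_first = topic.poll("online", p1)    # a real-time consumer group
online_second = topic.poll("online", p1)   # nothing new since the last poll
offline_first = topic.poll("offline", p1)  # a batch consumer group, reading later
```

Because the messages are retained rather than removed on delivery, the `online` and `offline` groups in this sketch each receive both messages, which is the property that lets one stream serve online and offline consumption.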

What is Yarn used for?
Yarn is the resource management system of Hadoop 2.0. It is a general-purpose resource module that manages and schedules cluster resources for applications. Yarn can run MapReduce as well as other frameworks such as Tez, Spark, and Storm.

What is Loader used for?
Compared with conventional Extract-Transform-Load (ETL) tools, Loader has the following advantage and disadvantage:
1. Advantage: Loader is built on a MapReduce-based parallel computing architecture, which processes data faster than conventional ETL.
2. Disadvantage: Loader focuses on importing data into and exporting data from FusionInsight Hadoop, and is weaker than ETL tools at data conversion.
