Flume working principle

Latest reply: Dec 23, 2021 09:48:17
Hello, everyone!

This post describes the Flume working principle. Please find more information below.

How Flume works

Flume's high-level architecture is built on a streamlined codebase that is easy to use and extend. The project is highly reliable: transactional channel semantics guard against data loss in transit.

Flume also supports dynamic reconfiguration without the need for a restart, which reduces the downtime for its agents.

The following components make up Apache Flume:

Event: A singular unit of data that is transported by Flume (typically a single log entry).

Source: The entity through which data enters Flume. Sources either actively poll for data or passively wait for data to be delivered to them. A variety of sources allow data to be collected, such as log4j logs and syslogs.

Sink: The entity that delivers the data to the destination. A variety of sinks allow data to be streamed to a range of destinations. One example is the HDFS sink, which writes events to HDFS.

Channel: The conduit between the Source and the Sink. Sources ingest events into the channel and sinks drain the channel.

Agent: Any physical Java virtual machine running Flume. It is a collection of sources, sinks, and channels.

Client: The entity that produces and transmits the Event to the Source operating within the Agent.

Flume components interact in the following way:

1. A flow in Flume starts from the Client.

2. The Client transmits the Event to a Source operating within the Agent.

3. The Source receiving this Event then delivers it to one or more Channels.

4. One or more Sinks operating within the same Agent drain these Channels.

5. Channels decouple the ingestion rate from the drain rate using the familiar producer-consumer model of data exchange.

6. When spikes in client-side activity cause data to be generated faster than the provisioned destination capacity can handle, the Channel buffers the backlog. This allows sources to continue their normal operation for the duration of the spike.

7. The Sink of one Agent can be chained to the Source of another Agent. This chaining enables the creation of complex data flow topologies.
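The basic source-channel-sink flow above can be written as a minimal agent configuration. This is an illustrative sketch using standard Flume component types (the names a1, r1, c1, and k1 are arbitrary):

```
# Name the components of this agent (a1)
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listens for newline-terminated text on a TCP port
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# Channel: buffers events in memory between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Sink: logs events to the console (swap for an HDFS/HBase sink in production)
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```

Such a configuration is started with the flume-ng launcher, e.g. bin/flume-ng agent --conf conf --conf-file example.conf --name a1.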

Flume's distributed architecture requires no central coordination point. Each agent runs independently of the others, with no inherent single point of failure, and the topology scales horizontally with ease.
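Chaining one Agent's Sink to another Agent's Source (point 7 above) is typically done over Avro. A sketch of the two sides, with hypothetical agent names and placeholder host/port values:

```
# agent1: forwards its events to agent2 over Avro
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = collector-host
agent1.sinks.k1.port = 4545
agent1.sinks.k1.channel = c1

# agent2: receives Avro-encoded events from agent1
agent2.sources.r1.type = avro
agent2.sources.r1.bind = 0.0.0.0
agent2.sources.r1.port = 4545
agent2.sources.r1.channels = c1
```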

Flume structure

1. The external structure of Flume:

Flume Structure

As shown in the figure above, the data produced by data generators (e.g. Facebook, Twitter) is collected by individual agents running on the servers where the data is generated. A data collector then gathers the data from each agent and stores it in HDFS or HBase.

2. Flume event

The event is the most basic unit of Flume's internal data transfer. It consists of a byte-array payload (the data being carried from the data source toward the destination, i.e. HDFS/HBase) together with an optional set of headers.

A typical Flume event is shown in the following structure:

Flume event

When writing a custom plugin, for example a flume-hbase-sink plugin, the plugin obtains the event, parses it, filters it according to the situation, and then transfers it to HBase or HDFS.
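The headers-plus-body structure described above can be sketched in Python. This is an illustrative model only, not Flume's actual Java Event interface:

```python
# Illustrative model of a Flume event: a byte-array body plus optional string headers.
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class FlumeEvent:
    body: bytes                                             # the payload being transported
    headers: Dict[str, str] = field(default_factory=dict)   # optional metadata

# Example: a single log line wrapped as an event with a timestamp header
event = FlumeEvent(body=b"GET /index.html 200",
                   headers={"timestamp": "1640224097"})
print(event.headers["timestamp"], event.body)
```

A sink plugin would inspect the headers and body of each such event to decide how to parse, filter, and route it.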

3. Flume Agent

After understanding the external structure of Flume, we know that a Flume deployment contains one or more Agents. Each Agent is a separate daemon (JVM) that receives events from clients or from other Agents and quickly forwards them on to the next destination node, which may be a sink or another agent. The basic model of Flume is shown below:

Flume Agent

The Agent is mainly composed of three components: source, channel and sink.


Source: Receives data from the data generator and passes it on to one or more channels in Flume's event format. Flume provides multiple source types for receiving data, such as Avro, Thrift, and an (experimental) Twitter 1% firehose source.


Channel: A transient storage container that buffers events received from the source until they are consumed by the sinks; it acts as a bridge between sources and sinks. Channels are fully transactional, which guarantees the consistency of the data between send and receive, and a channel can be connected to any number of sources and sinks. Supported types include the JDBC channel, File channel, and Memory channel.
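The channel's buffering role can be sketched with a plain producer-consumer queue in Python. This is an illustrative analogy, not Flume's implementation:

```python
# Producer-consumer analogy for a Flume channel: the source "puts" events in,
# the sink "takes" them out, and the bounded queue absorbs bursts whenever
# the source temporarily outpaces the sink.
from queue import Queue

channel = Queue(maxsize=1000)  # like a memory channel with capacity 1000

def source_put(event: bytes) -> None:
    channel.put(event)          # blocks only if the channel is full

def sink_take() -> bytes:
    return channel.get()        # blocks only if the channel is empty

# A burst of events arrives before the sink drains any of them;
# the channel buffers the backlog so the source keeps operating.
for i in range(5):
    source_put(f"event-{i}".encode())

drained = [sink_take() for _ in range(5)]
print(drained)
```

The bounded capacity is what lets the ingestion rate and drain rate differ for the duration of a spike, as described in the flow steps earlier.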


Sink: Consumes the events from the channels and delivers them to the destination, storing the data in centralized stores such as HBase and HDFS. The destination may also be another Flume agent rather than a terminal store.

An example of how these components can be combined:

Flume sink

Flume sink1

That's all, thanks!

Admin Created Dec 23, 2021 09:48:17

Thanks for your sharing!