
Kafka: a complete overview

Latest reply: Aug 9, 2022 08:49:38

Hello, everyone!

Today I will share an overview of Kafka with you. I hope you like this post.

Introduction to Kafka

Kafka was originally developed by LinkedIn. It is a distributed, partitioned, replicated publish-subscribe messaging system. It provides features similar to JMS (Java Message Service), but its design is completely different. Its features include message persistence, high throughput, distribution, multi-client support, and real-time delivery, and it is suitable for both offline and online message consumption. Typical use cases are large-volume data collection scenarios for Internet services, such as conventional message collection, website activity tracking, aggregation of operational statistics (monitoring data), and log collection. The official design motivations for Kafka are as follows:

We designed Kafka to be able to act as a unified platform for handling all the real-time data feeds a large company might have. To do this we had to think through a fairly broad set of use cases.

It would have to have high throughput to support high volume event streams such as real-time log aggregation.

It would need to deal gracefully with large data backlogs to be able to support periodic data loads from offline systems.

It also meant the system would have to handle low-latency delivery to handle more traditional messaging use-cases.

We wanted to support partitioned, distributed, real-time processing of these feeds to create new, derived feeds. This motivated our partitioning and consumer model.

Finally, in cases where the stream is fed into other data systems for serving, we knew the system would have to be able to guarantee fault tolerance in the presence of machine failures.

Supporting these uses led us to a design with a number of unique elements, more akin to a database log than a traditional messaging system.

Kafka structure

Kafka is currently a popular distributed message queue server. It transfers data from one application to another, so the applications do not need to care how the data is moved between them. The following figure shows the architecture of a Kafka server cluster.

[Figure: Kafka cluster architecture]

As the figure above shows, Kafka works like a typical hub into which various "cables/devices" are connected. What does its internal structure look like? The schematic diagram is as follows:


[Figure: internal structure of a Kafka cluster]

In the figure, there are three Brokers (that is, three cluster nodes). Each partition has one leader replica and two follower replicas, and there are several Producers, Consumers, and consumer groups. There is also a ZooKeeper cluster: Kafka uses ZooKeeper to manage the cluster and elect the leader, and it performs a rebalance when the set of Consumers changes. Producers use push mode to publish messages to Brokers, and Consumers use pull mode to subscribe to and consume messages from Brokers.
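To make the consumer-group and rebalance behavior more concrete, here is a minimal Python sketch. It is a toy model and not Kafka's client API. The function name assign_round_robin and the data shapes are hypothetical. Kafka's consumer offers several assignment strategies, such as range and round-robin, and this sketch implements only a simplified round-robin. Within one group, each partition is assigned to exactly one consumer. Calling the function again with a different member list models a rebalance, and a second group receives its own independent assignment.

```python
from typing import Dict, List


def assign_round_robin(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Toy assignment: each partition goes to exactly one consumer in the group.

    Hypothetical helper for illustration only; it is not part of any Kafka client API.
    """
    if not consumers:
        return {}
    members = sorted(consumers)  # deterministic order, so the result does not depend on join order
    assignment: Dict[str, List[int]] = {m: [] for m in members}
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment


partitions = [0, 1, 2, 3, 4, 5]

# Group "billing" starts with two consumers, which split the partitions between them.
before = assign_round_robin(partitions, ["c1", "c2"])
print(before)  # {'c1': [0, 2, 4], 'c2': [1, 3, 5]}

# A third consumer joins; the rebalance recomputes the assignment for the whole group.
after = assign_round_robin(partitions, ["c1", "c2", "c3"])
print(after)  # {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}

# Another group ("analytics") is assigned the same partitions independently,
# so each group receives every message once.
other_group = assign_round_robin(partitions, ["a1"])
print(other_group)  # {'a1': [0, 1, 2, 3, 4, 5]}
```

If a group has more consumers than partitions, the extra consumers receive no partitions and stay idle. For this reason, the partition count caps how many consumers in one group can work in parallel.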

Introduction to Kafka components

The following figure shows the relationships among Kafka's components:

[Figure: relationships among Kafka components]

In the figure above, a topic is configured with 3 partitions. Partition1 holds two messages, at offsets 0 and 1. Partition2 holds 4 messages, and Partition3 holds 1 message. Each replica's ID is the same as the ID of the server (broker) on which that replica is located.
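The partition and offset bookkeeping described above can be modeled in a few lines of Python. This is a hypothetical in-memory sketch and not Kafka's implementation. Topic, send, and fetch are invented names. The key hash uses zlib.crc32 as a stand-in, because Kafka's Java client hashes record keys with murmur2 by default. The sketch shows three things:

- a producer pushes records to the end of a partition;
- offsets are assigned independently per partition, starting at 0;
- a consumer pulls records starting from an offset it chooses.

```python
import zlib
from typing import List, Optional, Tuple


class Topic:
    """Toy partitioned topic: one append-only list per partition (hypothetical, not Kafka's API)."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Tuple[str, str]]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Stand-in hash: Kafka's Java client uses murmur2 for keyed records, not CRC32.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, value: str, key: str, partition: Optional[int] = None) -> Tuple[int, int]:
        """Producer push: append to the end of a partition and return (partition, offset)."""
        if partition is None:
            partition = self.partition_for(key)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1  # the offset counts positions within this partition only

    def fetch(self, partition: int, offset: int, max_records: int = 10) -> List[Tuple[int, str]]:
        """Consumer pull: read up to max_records (offset, value) pairs starting at `offset`."""
        log = self.partitions[partition]
        end = min(offset + max_records, len(log))
        return [(o, log[o][1]) for o in range(offset, end)]


topic = Topic(num_partitions=3)

# Reproduce the figure's layout (Partition1..3 are indices 0..2 here):
# 2 messages in the first partition, 4 in the second, 1 in the third.
for p, count in [(0, 2), (1, 4), (2, 1)]:
    for i in range(count):
        topic.send(f"msg-{p}-{i}", key="demo", partition=p)

print([len(log) for log in topic.partitions])  # [2, 4, 1]
print(topic.fetch(1, offset=2))  # [(2, 'msg-1-2'), (3, 'msg-1-3')]

# Messages with the same key land in the same partition, so their relative order is kept.
p1, o1 = topic.send("order-created", key="order-42")
p2, o2 = topic.send("order-paid", key="order-42")
print(p1 == p2, o2 == o1 + 1)  # True True
```

Because the consumer passes the offset to fetch, it controls its own reading pace. This is the practical benefit of the pull model described earlier.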

If a topic's replication factor is 3, Kafka creates 3 identical replicas of each partition, placed on different brokers in the cluster. Each broker stores one or more partitions. Multiple producers and consumers can produce and consume data at the same time. The following table lists the concepts and functions of each Kafka component.


Component name

Function

Broker

  • A Kafka cluster contains one or more service instances (server nodes), and these service instances are called Brokers.

  • Brokers store topic data. If a topic has N partitions and the cluster has N brokers, each broker stores one partition of the topic. If the cluster has (N+M) brokers, N brokers each store one partition of the topic, and the remaining M brokers store no data for that topic. In either case, partitions should be distributed so that data stays balanced across brokers.

Topic

  • Every message published to a Kafka cluster belongs to a category called a Topic. (Physically, messages of different Topics are stored separately. Logically, a Topic's messages may be stored on one or more brokers, but users only need to specify the Topic to produce or consume data, without worrying about where the data is stored.)

    A Topic is similar to a table name in a database.

Partition

  • Kafka divides each topic into one or more Partitions, so every topic has at least one partition. The data of each partition is stored in multiple segment files. Data within a partition is ordered, but there is no ordering guarantee across different partitions.

  • Note that if a topic has multiple partitions, the overall order of the data cannot be guaranteed when it is consumed. In scenarios where message order must be strictly guaranteed, set the number of partitions to 1. If ordering only matters per key, send related messages with the same key so that they go to the same partition.

  • Each message in a partition is assigned a sequential ID called the offset. The offset is a long integer that uniquely identifies a message within its partition. Consumers track their position with (topic, partition, offset).

  • Every message published to a Partition is appended directly to the end of its log file.

Producer

  • The producer is the publisher of data and publishes messages to Kafka topics. When a broker receives a message from a producer, it appends the message to the segment file currently used for appending data. Each message the producer sends is stored in one partition, and the producer can also specify that partition explicitly.

Consumer

  • Consumers read data from brokers. A consumer can consume data from multiple topics.

Consumer Group

  • Each Consumer belongs to a specific Consumer Group. You can specify a group name for each Consumer; in older Kafka clients, a Consumer without a group name belongs to the default group.

  • Within a consumer group, each message is consumed by only one Consumer, because each partition is assigned to exactly one Consumer in the group. The same message can still be consumed by multiple consumer groups. In other words, data is shared between groups and competed for within a group.

Leader

  • Each partition has one or more replicas, and exactly one of them is the leader. The leader is the replica currently responsible for reading and writing the partition's data.

Follower

  • Followers follow the Leader. All write requests go through the Leader, and Followers replicate the Leader's data changes to stay in sync with it. If the Leader fails, a new Leader is elected from the in-sync Followers. If a Follower crashes, gets stuck, or falls too far behind, the Leader removes it from the "in-sync replicas" (ISR) list. The Follower rejoins the ISR once it has caught up again.

Replica

  • A copy of the partition to ensure the high availability of the partition.

  • Replication is per partition. Each partition has its own primary replica and secondary replicas.

  • The primary replica is called the Leader, and the secondary replicas are called Followers. Followers synchronize data from the Leader by pulling it.

  • By default, both producers and consumers read and write data through the Leader and do not interact with Followers.

  • To improve fault tolerance, Kafka replicates Partitions. The number of replicas per Partition (the replication factor) can be configured in the configuration file, for example through the broker's default.replication.factor setting.

Controller

  • One broker in the Kafka cluster acts as the Controller, which handles partition leader election and various failover tasks.
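The Leader, Follower, and ISR rules in the table can be summarized with a small Python simulation. This is a hypothetical model, not broker code, and the class and method names are invented. The lag check loosely mirrors the time-based criterion behind Kafka's replica.lag.time.max.ms setting: a follower that has not caught up with the leader within that window is dropped from the ISR. The 30-second window is chosen only for the example. Leader election here simply picks the first remaining in-sync replica in assignment order, which is a simplification of what the Controller does.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PartitionState:
    """Toy model of one partition's replicas (hypothetical; not Kafka's internal classes)."""

    replicas: List[int]              # broker ids, in assignment order
    lag_time_max_ms: int = 30_000    # illustrative window, in the spirit of replica.lag.time.max.ms
    leader: Optional[int] = field(default=None, init=False)
    isr: List[int] = field(default_factory=list, init=False)
    last_caught_up_ms: Dict[int, int] = field(default_factory=dict, init=False)

    def __post_init__(self) -> None:
        # Initially the first assigned replica leads and every replica is in sync.
        self.leader = self.replicas[0]
        self.isr = list(self.replicas)
        self.last_caught_up_ms = {r: 0 for r in self.replicas}

    def follower_caught_up(self, broker: int, now_ms: int) -> None:
        """Record that a follower has fetched up to the leader's latest offset."""
        self.last_caught_up_ms[broker] = now_ms
        if broker not in self.isr:
            self.isr.append(broker)  # a dropped follower rejoins the ISR once it catches up

    def shrink_isr(self, now_ms: int) -> None:
        """Leader removes followers that have not caught up within the lag window."""
        self.isr = [
            r for r in self.isr
            if r == self.leader or now_ms - self.last_caught_up_ms[r] <= self.lag_time_max_ms
        ]

    def leader_failed(self) -> None:
        """Elect the first remaining in-sync replica (in assignment order) as the new leader."""
        survivors = [r for r in self.replicas if r in self.isr and r != self.leader]
        self.isr = survivors
        self.leader = survivors[0] if survivors else None  # in this model: no leader, partition offline


state = PartitionState(replicas=[1, 2, 3])
state.follower_caught_up(2, now_ms=40_000)  # broker 2 keeps up with the leader
state.shrink_isr(now_ms=45_000)             # broker 3 last caught up at 0 ms, so it is dropped
print(state.isr)                            # [1, 2]
state.leader_failed()                       # the leader on broker 1 goes down
print(state.leader, state.isr)              # 2 [2]
```

Broker 3 is excluded from the election because it was no longer in sync. Only replicas that have all committed data are eligible to take over.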

Kafka design features

  • Message persistence with O(1) time complexity, providing constant-time access performance even for terabytes of data.

  • High throughput: even on inexpensive commodity machines, a single machine can support the transmission of 100,000 messages per second.

  • Message partitioning across servers and distributed consumption, while guaranteeing ordered delivery within each partition.

  • Both offline and real-time data processing.

  • Data transmission using zero-copy technology.

  • Application decoupling.

  • Replicated logs with quorums, ISRs, and state machines.

  • High availability and durability.

  • Log compaction and quota management.
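The zero-copy feature in the list above refers to sending file data directly from the operating system's page cache to a socket, without copying it through application buffers. Kafka's JVM code does this with FileChannel.transferTo, which on Linux is typically backed by the sendfile system call. The Python sketch below shows the same idea with os.sendfile. It assumes a Linux-like OS, because os.sendfile is unavailable on Windows. The temporary file and the socket pair are stand-ins for a log segment and a consumer connection, and send_segment is a hypothetical helper name.

```python
import os
import socket
import tempfile


def send_segment(path: str, sock: socket.socket, offset: int, count: int) -> int:
    """Send `count` bytes of a file starting at byte `offset` using sendfile (zero-copy on Linux)."""
    sent = 0
    with open(path, "rb") as f:
        while sent < count:
            n = os.sendfile(sock.fileno(), f.fileno(), offset + sent, count - sent)
            if n == 0:  # reached end of file
                break
            sent += n
    return sent


# A temporary file stands in for a log segment; a socket pair stands in for a consumer connection.
with tempfile.NamedTemporaryFile(delete=False) as seg:
    seg.write(b"record-0|record-1|record-2|")
    segment_path = seg.name

server, client = socket.socketpair()
try:
    n = send_segment(segment_path, server, offset=9, count=9)  # the bytes of "record-1|"
    received = client.recv(1024)
    print(n, received)  # 9 b'record-1|'
finally:
    server.close()
    client.close()
    os.remove(segment_path)
```

In this sketch, the bytes never pass through a Python-level buffer on the sending side. The kernel moves them from the file to the socket, which is why this technique suits a broker serving large log segments to many consumers.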

Message queue type

Features

Redis

Redis is a NoSQL database based on key-value pairs, and it is actively developed and maintained. It also supports message queue functionality, so it can serve as a lightweight queue service. Experiments comparing the enqueue and dequeue operations of RabbitMQ and Redis show the following. When enqueuing small messages, Redis outperforms RabbitMQ, but once the message size exceeds 10 KB, Redis becomes unbearably slow. When dequeuing, Redis performs very well regardless of data size, and RabbitMQ's dequeue performance is much lower than that of Redis.

RabbitMQ

RabbitMQ is an open-source message queue written in Erlang. It supports many protocols, including AMQP, XMPP, SMTP, and STOMP. Because of this, it is very heavyweight and better suited to enterprise-level development. It also implements a broker architecture, which means that messages are queued in a central queue before they are sent to clients. It has good support for routing, load balancing, and data persistence.

ActiveMQ

ActiveMQ is a subproject of Apache. Like ZeroMQ, it can implement queues in both broker-based and peer-to-peer modes. Like RabbitMQ, it can efficiently implement advanced application scenarios with a small amount of code.

ZeroMQ

ZeroMQ is known as the fastest message queuing system and is especially suited to high-throughput scenarios. ZeroMQ can implement advanced or complex queues that RabbitMQ is not good at, but developers have to combine multiple technical frameworks themselves. This technical complexity is a challenge to applying ZeroMQ successfully.

Hope you like this post!


olive.zhao
Admin Created Nov 2, 2021 14:38:31


NTan33
Created Nov 8, 2021 01:37:46

A very comprehensive overview of Kafka!

olive.zhao
Created Nov 8, 2021 03:01:46

MahMush
Created Dec 12, 2021 11:34:52

yes it is

Vien
Created Nov 8, 2021 08:28:15

Great one

olive.zhao
Created Nov 9, 2021 00:41:01

kita
Created Nov 8, 2021 09:03:08

Great Kafka overview.

olive.zhao
Created Nov 9, 2021 00:41:07

SamB
Created Nov 8, 2021 09:56:07

Thanks for sharing

olive.zhao
Created Nov 9, 2021 00:41:20

Thanks for your support!

hanhcao
Created Nov 8, 2021 10:09:42

Good share

olive.zhao
Created Nov 9, 2021 00:41:27

mouh1991
Created Nov 8, 2021 10:24:22

Thanks

olive.zhao
Created Nov 9, 2021 00:41:33

faysalji
Author Created Nov 8, 2021 11:11:38

"Kafka was originally developed by Linkedin" Good to know

faysalji
Author Created Nov 8, 2021 11:12:06

Kafka components are well defined, thanks