Apache Kafka
Kafka is an open-source distributed event streaming platform that is optimized for ingesting and processing streaming data in real time. Apache Kafka has the following core capabilities:
· High Throughput – Deliver messages at network-limited throughput using a cluster of machines, with low latency.
· Scalable – Scale production clusters up to a thousand brokers, trillions of messages per day, petabytes of data, and hundreds of thousands of partitions.
· Permanent Storage – Store streams of data safely in a distributed, durable, fault tolerant cluster.
· High Availability – Stretch clusters efficiently over availability zones or connect separate clusters across geographic zones.
Some common use cases for Apache Kafka within an organization:
Apache Kafka Architecture
Apache Kafka has a simple but powerful architecture. In Kafka, Producers publish messages to a Topic hosted on a Broker. The Kafka cluster comprises Brokers that collectively store the messages received from Kafka Producers for each Kafka Topic. Kafka Consumers then subscribe to Kafka Topics and start receiving messages from the Kafka Brokers. The Kafka distributed system has traditionally been managed through ZooKeeper; newer Kafka releases can run without it in KRaft mode.
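The flow described above can be pictured with a small in-memory model. This is only a sketch under simplifying assumptions: the Broker and Consumer classes, the "orders" topic, and the message values are hypothetical names invented for illustration, and none of this is the Kafka client API.

```python
from collections import defaultdict


class Broker:
    """Toy stand-in for a Kafka broker: keeps one list of messages per topic."""

    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, message):
        # Producers push messages to the broker under a topic name.
        self.topics[topic].append(message)

    def fetch(self, topic, position):
        # Consumers pull every message stored at or after `position`.
        return self.topics[topic][position:]


class Consumer:
    """Toy consumer that remembers how far it has read in each topic."""

    def __init__(self, broker):
        self.broker = broker
        self.positions = {}

    def subscribe(self, topic):
        # Start reading the topic from its first message.
        self.positions.setdefault(topic, 0)

    def poll(self):
        received = []
        for topic, position in self.positions.items():
            batch = self.broker.fetch(topic, position)
            # Advance the read position past the messages just fetched.
            self.positions[topic] = position + len(batch)
            received.extend(batch)
        return received


broker = Broker()
consumer = Consumer(broker)
consumer.subscribe("orders")

broker.publish("orders", "order-1")  # sent by one producer
broker.publish("orders", "order-2")  # sent by another producer, same topic
first_poll = consumer.poll()

broker.publish("orders", "order-3")
second_poll = consumer.poll()

print(first_poll)
print(second_poll)
```

The second poll returns only the message published after the first poll, because the consumer keeps track of its own read position.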
Apache Kafka Components
The Apache Kafka distributed system contains various components with varying functionalities. The following is a brief description of each of them.
Kafka Producer – It is responsible for sending, or publishing, messages to a Kafka Topic, which is hosted on a Kafka Broker. Multiple Kafka Producers can send messages to a single Kafka Topic. The producer acts as a data source in the Apache Kafka system architecture.
Kafka Consumer – It is responsible for receiving messages from a Kafka Broker after successfully subscribing to a Kafka Topic. Kafka Consumers request messages from a Kafka Broker rather than having them pushed. Kafka Consumers can be put into groups called Consumer Groups. Consumers in a group share the partitions of the Kafka Topic they are subscribed to.
Kafka Broker – It is an intermediate server that is responsible for exchanging messages between the Kafka Producer and the Kafka Consumer. Multiple Brokers can form a Kafka Cluster. The broker is the storage and working area for Kafka Topics and their respective partitions.
Kafka Cluster – It is a group of Kafka Brokers, typically running on commodity hardware, that are connected together and work as a single system to store and serve messages.
Kafka Topic – It is a name referring to a data stream or message stream. The Kafka Producer sends a message to a unique name, which is the topic for that message stream. Multiple producers can send messages to the same topic. For message consumption, Kafka Consumers subscribe to the topic on a Kafka Broker, and messages are then delivered to them.
Kafka Partition – The Kafka Producer sends a message to a Kafka Broker under the unique name of a topic. Since the Kafka Cluster is a distributed system of brokers, if the data volume of a topic is too large to be stored on a single broker, the topic can be split into partitions that are stored on different brokers.
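As a rough illustration of how messages might be spread across partitions, the sketch below hashes a message key to pick a partition. It assumes messages carry a key, which the text above does not discuss. The choose_partition function, the broker names, and the use of crc32 are all hypothetical choices made for this example; crc32 is a stand-in and not the hash that Kafka's own partitioner uses.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a message key to a partition index with a stable hash.

    crc32 is an illustrative stand-in, not Kafka's own hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical layout: one topic with 3 partitions, each on its own broker.
partition_to_broker = {0: "broker-a", 1: "broker-b", 2: "broker-c"}

placements = {}
for key in ["user-1", "user-2", "user-3", "user-1"]:
    partition = choose_partition(key, len(partition_to_broker))
    placements.setdefault(key, []).append(partition_to_broker[partition])

# Messages with the same key always land on the same partition (and broker).
print(placements)
```

Because the hash is deterministic, both "user-1" messages end up on the same broker, while different keys may be spread across the cluster.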
Kafka Offset – A sequence number is assigned to each message in a partition of a Kafka Topic. This sequence number is called the offset. Every partition of a topic has its own sequence of offsets, so an offset is always local to its partition.
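The idea that offsets are local to a partition can be shown with a toy append-only log. This is a minimal sketch, not how a broker stores data on disk; the PartitionLog class and the message values are hypothetical.

```python
class PartitionLog:
    """Toy append-only log for a single partition."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        # The offset is the position of the message within this partition.
        offset = len(self._messages)
        self._messages.append(message)
        return offset

    def read(self, offset):
        # Look up a message by its offset in this partition only.
        return self._messages[offset]


# Two partitions of one hypothetical topic, each with its own offsets.
partition_0 = PartitionLog()
partition_1 = PartitionLog()

offsets_p0 = [partition_0.append(m) for m in ["a", "b", "c"]]
offsets_p1 = [partition_1.append(m) for m in ["x", "y"]]

print(offsets_p0)  # [0, 1, 2]
print(offsets_p1)  # [0, 1]
```

Offset 1 exists in both partitions but refers to different messages ("b" and "y"), which is why an offset only makes sense together with its partition.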
Kafka Consumer Group – It is a group of consumers sharing the same workload. There can be multiple consumer groups subscribing to the same or different topics. Two or more consumers belonging to the same group do not receive the same message. This is because each partition is read by only one consumer in the group, and the group's offset for that partition moves to the next number once a message has been consumed.
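One way to picture how consumers in a group share a workload is to give each partition to exactly one consumer in the group. The sketch below uses a simple round-robin split, which is an assumption made for illustration; Kafka supports several assignment strategies, and the assign_partitions function and consumer names here are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Round-robin assignment: each partition goes to exactly one consumer."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

# Two independent groups subscribed to the same 4-partition topic.
group_a = assign_partitions(partitions, ["a-1", "a-2"])
group_b = assign_partitions(partitions, ["b-1"])

print(group_a)  # {'a-1': [0, 2], 'a-2': [1, 3]}
print(group_b)  # {'b-1': [0, 1, 2, 3]}
```

Within group A no partition appears under two consumers, so no message is delivered twice inside the group, while group B independently reads every partition.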
Kafka Data Model
The Kafka data model consists of messages and topics. Messages represent information such as lines in a log file or an error message from a system. Messages are grouped into categories called topics. A process that publishes messages to a Kafka topic is called a producer. A process that receives messages from a Kafka topic is known as a consumer. The servers that process messages in a Kafka cluster are called brokers.