Flink is a unified computing framework that supports both batch processing and stream processing. It provides a stream data processing engine that supports data distribution and parallel computing. Stream processing is Flink's core strength, and Flink is one of the leading open-source stream processing engines in the industry.
Flink provides high-concurrency pipeline data processing, millisecond-level latency, and high reliability, making it suitable for low-latency data processing.
The following figure shows the technology stack of Flink.
The following lists the key features of Flink in the current version:
DataStream
Checkpoint
Window
Job Pipeline
Configuration Table
For details about other Flink features, see https://ci.apache.org/projects/flink/flink-docs-stable/ .
Flink architecture:
Client
The Flink client submits jobs (streaming jobs) to Flink.
TaskManager
TaskManager (also called worker) is a service execution node of Flink. It executes the tasks assigned to it. A Flink system can have multiple TaskManagers, and all TaskManagers are equivalent to each other.
JobManager
JobManager (also called master) is a management node of Flink. It manages all TaskManagers and schedules tasks submitted by users to specific TaskManagers. In high-availability (HA) mode, multiple JobManagers are deployed. One of them is elected as the leader, and the others are on standby.
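The roles described above can be sketched in plain Java. This is only an illustrative model, not Flink code: the class name ClusterSketch, the round-robin assignment, and the "first JobManager that is still alive becomes leader" rule are assumptions made for the example. Flink's own scheduler and leader election work differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClusterSketch {

    /**
     * Returns the first JobManager that has not failed.
     * Assumption for this sketch only: election order equals list order.
     */
    public static String electLeader(List<String> jobManagers, List<String> failed) {
        for (String jm : jobManagers) {
            if (!failed.contains(jm)) {
                return jm; // this one leads; the remaining ones stay standby
            }
        }
        throw new IllegalStateException("no JobManager available");
    }

    /**
     * Assigns tasks to TaskManagers in round-robin order.
     * TaskManagers are interchangeable, so any of them may run any task.
     */
    public static Map<String, List<String>> schedule(List<String> tasks, List<String> taskManagers) {
        if (taskManagers.isEmpty()) {
            throw new IllegalArgumentException("at least one TaskManager is required");
        }
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String tm : taskManagers) {
            assignment.put(tm, new ArrayList<>());
        }
        for (int i = 0; i < tasks.size(); i++) {
            String tm = taskManagers.get(i % taskManagers.size());
            assignment.get(tm).add(tasks.get(i));
        }
        return assignment;
    }

    public static void main(String[] args) {
        // jm-1 has failed, so the standby jm-2 takes over as leader.
        System.out.println(electLeader(List.of("jm-1", "jm-2"), List.of("jm-1")));
        // Three hypothetical tasks spread over two TaskManagers.
        System.out.println(schedule(List.of("map-1", "map-2", "sink-1"), List.of("tm-1", "tm-2")));
    }
}
```

With the sample input in main, jm-2 is chosen as leader, tm-1 receives map-1 and sink-1, and tm-2 receives map-2.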
Flink provides the following features:
Low latency
Exactly once
Asynchronous snapshot mechanism, ensuring that all data is processed exactly once.
High availability
Leader/standby JobManagers, preventing a single point of failure (SPOF).
Scale out
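The exactly-once guarantee listed above can be illustrated with a small plain-Java simulation. This is a sketch and not Flink's implementation: the class CheckpointSketch and its parameters are hypothetical, and the snapshot is taken synchronously for simplicity, whereas Flink takes snapshots asynchronously. The idea it shows is that the read position and the operator state are saved together, so that after a failure both are restored and the replayed records are not counted twice.

```java
public class CheckpointSketch {

    /**
     * Sums the input while taking a snapshot every checkpointInterval records.
     * If failAt is a valid index, one crash is simulated just before that
     * record is processed; a negative failAt means no failure.
     */
    public static long run(int[] input, int checkpointInterval, int failAt) {
        if (checkpointInterval <= 0) {
            throw new IllegalArgumentException("checkpointInterval must be positive");
        }
        int offset = 0;      // next record to read (source position)
        long sum = 0;        // operator state
        int snapOffset = 0;  // source position stored in the last snapshot
        long snapSum = 0;    // operator state stored in the last snapshot
        boolean failed = false;

        while (offset < input.length) {
            if (!failed && offset == failAt) {
                // Simulated crash: discard in-flight progress and roll back
                // position and state to the last snapshot, then replay.
                failed = true;
                offset = snapOffset;
                sum = snapSum;
                continue;
            }
            sum += input[offset];
            offset++;
            if (offset % checkpointInterval == 0) {
                // Position and state are saved together as one consistent snapshot.
                snapOffset = offset;
                snapSum = sum;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] input = {1, 2, 3, 4, 5, 6, 7};
        System.out.println(run(input, 3, -1)); // no failure
        System.out.println(run(input, 3, 5));  // one crash, then replay from the snapshot
    }
}
```

Both calls in main return the same sum (28), because the records replayed after the rollback are applied to the restored state rather than to the state that already included them.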