
HCIA-Big Data | Flink Time & Window

Latest reply: May 29, 2022 17:53:11

Hello, everyone!

Introduction to Flink Time 

In stream processing, handling time correctly is critical. Event stream data (such as server log data, web page click data, and transaction data) is generated continuously. A typical task is to group events by key and count the events for each key over a specified interval. This is a familiar "big data" application.

Time Classification in Stream Processing

During stream processing, the system time (processing time) is often used as the time of an event. However, processing time is imposed on the event by the system rather than carried by the event itself. Because of network delay and similar factors, processing time cannot reflect the order in which events actually occurred.

In actual cases, the time of each event can be classified into the following types:

  • Event time: The time when an event occurs.

  • Ingestion time: The time when an event arrives at the stream processing system.

  • Processing time: The time when an event is processed by the system.

For example, suppose a log record enters Flink at 10:00:00.123 on November 12, 2019 and reaches the window operator at 10:00:01.234 on November 12, 2019. The log content is as follows:

2019-11-02 10:37.624 INFO Fail over to rm2

  • 2019-11-02 10:37.624 (the timestamp written in the log) is the event time.

  • 2019-11-12 10:00:00.123 is the ingestion time.

  • 2019-11-12 10:00:01.234 is the processing time.
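To make the gap between ingestion and processing concrete, here is a minimal sketch in plain Java (JDK only, no Flink). It takes the two timestamps from the example sentence (entering Flink at 10:00:00.123 and reaching the window at 10:00:01.234) and computes how long the record waited in between. The class name LogTimes and its members are hypothetical names used only for illustration.

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Hypothetical holder for the two system-side timestamps of the example record.
class LogTimes {
    // When the record entered the stream processor (ingestion time).
    static final LocalDateTime INGESTION = LocalDateTime.parse("2019-11-12T10:00:00.123");
    // When the record reached the window operator (processing time).
    static final LocalDateTime PROCESSING = LocalDateTime.parse("2019-11-12T10:00:01.234");

    // Milliseconds that elapsed between ingestion and processing.
    static long ingestionToProcessingMillis() {
        return Duration.between(INGESTION, PROCESSING).toMillis();
    }

    public static void main(String[] args) {
        System.out.println(ingestionToProcessingMillis() + " ms");
    }
}
```

Neither of these two timestamps says anything about when the event itself happened; only the timestamp inside the log line does.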

Differences Between the Three Times

In practice, the order in which the system sees events differs from the order in which they occurred. These differences are caused by network delay and processing delay. See the following figure.

The horizontal axis indicates the event time, and the vertical axis indicates the processing time. Ideally, every event would be processed the moment it occurs, so the points would fall on a straight line at an angle of 45 degrees. In reality, the processing time lags behind the event time by a varying amount, so events can be processed in a different order from the one in which they occurred.

Figure: Differences between event time and processing time

Time Semantics Supported by Flink

Table: Differences between processing time and event time

Processing Time                                  Event Time (Row Time)
-----------------------------------------------  ------------------------------
Time of the real world                           Time when data is generated
Local time of the node that processes the data   Timestamp carried in a record
Simple processing                                Complex processing
Uncertain results (cannot be reproduced)         Certain results (reproducible)

For most streaming applications, it is valuable to be able to reprocess historical data with the same code that processes real-time data and to obtain consistent, deterministic results.

It is also critical to reason about the order in which events occurred rather than the order in which they are processed, and to be able to infer when a set of events is (or should be) complete. Consider, for example, the series of events involved in an e-commerce or financial transaction. These requirements for stream processing can be met by using the event timestamp recorded in the data stream instead of the clock of the machine that processes the data.
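The reproducibility argument can be illustrated with a small sketch in plain Java (JDK only, no Flink). It counts events per one-minute bucket using only the timestamp carried in each event, so the result is the same no matter in which order the events arrive. The class name, method name, and sample timestamps are assumptions made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class EventTimeReplay {
    // Counts events per one-minute bucket, keyed by the bucket start in milliseconds.
    // Only the timestamp carried in each event is used, never the local clock.
    // Assumes non-negative timestamps.
    static Map<Long, Integer> countPerMinute(List<Long> eventTimestampsMillis) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : eventTimestampsMillis) {
            long bucketStart = ts - (ts % 60_000L);
            counts.merge(bucketStart, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Long> inOrder = List.of(5_000L, 20_000L, 61_000L, 119_000L, 130_000L);
        List<Long> reordered = new ArrayList<>(inOrder);
        Collections.reverse(reordered); // simulate a different arrival order
        System.out.println(countPerMinute(inOrder));
        System.out.println(countPerMinute(reordered)); // same buckets, same counts
    }
}
```

If the buckets were chosen from the clock of the processing machine instead, replaying the same data later would put the events into different buckets.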

Introduction to Flink Windows

A streaming system is a data processing engine designed to process infinite data sets, that is, data sets that grow continuously. A window is a way to split an infinite data set into finite blocks for processing.

Windows are at the heart of processing infinite streams. Windows split the stream into "buckets" of finite size, over which we can apply computations.

Window Types

Windows can be classified into two types:

  • Count Window: A data-driven window that is generated based on a specified number of data records, independent of time.

  • Time Window: A time-driven window that is generated based on time.

Apache Flink is a distributed computing framework that supports infinite stream processing. In Flink, a window divides an infinite stream into finite streams.
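As a rough illustration of a data-driven window, the following plain Java sketch (JDK only, no Flink) sums every fixed number of consecutive records without looking at any timestamp. The names CountWindowSketch and sumPerCountWindow are hypothetical, and the choice not to emit a trailing, incomplete group is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class CountWindowSketch {
    // Sums every `size` consecutive values. A trailing group with fewer than
    // `size` values is not emitted, because that window never fills up.
    static List<Integer> sumPerCountWindow(List<Integer> values, int size) {
        List<Integer> sums = new ArrayList<>();
        int sum = 0;
        int seen = 0;
        for (int v : values) {
            sum += v;
            seen++;
            if (seen == size) { // window is full: emit and start a new one
                sums.add(sum);
                sum = 0;
                seen = 0;
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        // Seven values with a window of three: two full windows, one value left over.
        System.out.println(sumPerCountWindow(List.of(1, 2, 3, 4, 5, 6, 7), 3));
    }
}
```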

Time Window Types

Based on how they are implemented, time windows can be classified into tumbling windows, sliding windows, and session windows.

1. Tumbling Window

The data is sliced according to a fixed window length. Characteristics: The time is aligned, the window length is fixed, and the windows do not overlap.

Application scenario: BI statistics collection (aggregation calculation in each time period)

For example, assume that you want to aggregate the values emitted by a sensor. A tumbling window of 1 minute collects the values of the last minute and emits their sum at the end of the minute.

See the following figure.

Figure: Tumbling Window
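The tumbling behavior can be sketched in plain Java (JDK only, no Flink): each reading falls into exactly one window, whose start is the reading's timestamp rounded down to a multiple of the window length. The class name, method name, and sample sensor readings are assumptions for illustration, not values taken from the figure.

```java
import java.util.Map;
import java.util.TreeMap;

class TumblingWindowSketch {
    // readings: each element is {timestampMillis, value}; timestamps are non-negative.
    // Returns the sum per window, keyed by window start in milliseconds.
    static Map<Long, Long> sumPerWindow(long[][] readings, long windowSizeMillis) {
        Map<Long, Long> sums = new TreeMap<>();
        for (long[] r : readings) {
            // Windows are aligned and do not overlap, so one start per reading.
            long windowStart = r[0] - (r[0] % windowSizeMillis);
            sums.merge(windowStart, r[1], Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // Hypothetical sensor: one reading every 15 seconds for two minutes.
        long[][] readings = {
            {0, 9}, {15_000, 6}, {30_000, 8}, {45_000, 4},
            {60_000, 7}, {75_000, 3}, {90_000, 8}, {105_000, 4}
        };
        System.out.println(sumPerWindow(readings, 60_000));
    }
}
```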

2. Sliding Window

The sliding window is a more general form of the fixed (tumbling) window. It is defined by a fixed window length and a sliding interval. Characteristics: The time is aligned, the window length is fixed, and the windows can overlap.

Application scenario: Collect statistics on the failure rate of an interface over the past 5 minutes to determine whether to report an alarm.

A sliding window of 1 minute that slides every half minute aggregates the values of the last minute and emits the result every half minute. See the following figure.

Figure: Sliding Window

In the first sliding window, the values 9, 6, 8, and 4 are summed, yielding the result 27. Next, the window slides by half a minute and the values 8, 4, 7, and 3 are summed, yielding the result 22, and so on.
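The sums 27 and 22 from the sliding window example can be reproduced with a plain Java sketch (JDK only, no Flink). It assumes one reading every 15 seconds, so that a 1-minute window holds four values and a half-minute slide moves it by two values; the timestamps are an assumption, only the values 9, 6, 8, 4, 7, and 3 come from the text. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class SlidingWindowSketch {
    // readings: each element is {timestampMillis, value}; timestamps are non-negative.
    // Returns the sum per window, keyed by window start in milliseconds.
    static Map<Long, Long> sumPerWindow(long[][] readings, long sizeMillis, long slideMillis) {
        Map<Long, Long> sums = new TreeMap<>();
        for (long[] r : readings) {
            long ts = r[0];
            // Start of the latest window that contains ts.
            long lastStart = ts - (ts % slideMillis);
            // Walk back through every earlier window that still covers ts,
            // because overlapping windows share elements.
            for (long start = lastStart; start > ts - sizeMillis; start -= slideMillis) {
                sums.merge(start, r[1], Long::sum);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        long[][] readings = {
            {0, 9}, {15_000, 6}, {30_000, 8}, {45_000, 4}, {60_000, 7}, {75_000, 3}
        };
        Map<Long, Long> sums = sumPerWindow(readings, 60_000, 30_000);
        System.out.println(sums.get(0L));      // window [0 s, 60 s): 9 + 6 + 8 + 4
        System.out.println(sums.get(30_000L)); // window [30 s, 90 s): 8 + 4 + 7 + 3
    }
}
```

The returned map also contains partially filled windows at the edges of the sample data (for example the one starting at -30 s), which a real job would emit as well once time has passed their end.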

3. Session Window

The session window groups the data from a period of high activity into a single window for calculation. The window is closed by the session gap: if no data is received within the specified time, the window ends and the window calculation is triggered. Unlike the tumbling window and sliding window, the session window does not need a fixed window size or sliding interval. You only need to set the session gap, that is, the maximum period of inactivity allowed within a session.

Figure: Session Window

Code Definition

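A tumbling time window of 1 minute can be defined in Flink simply as:

stream.timeWindow(Time.minutes(1));

A sliding time window of 1 minute that slides every 30 seconds can be defined just as simply:

stream.timeWindow(Time.minutes(1), Time.seconds(30));

Here stream is a keyed stream. Note that timeWindow() has been deprecated since Flink 1.12 in favor of window() with an explicit window assigner such as TumblingEventTimeWindows or SlidingEventTimeWindows.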

Assume that a user performs a series of operations (such as selecting, browsing, searching, purchasing, and switching tabs) after opening an app on a mobile phone. These operations and the times at which they occur are sent to the server for user behavior analysis. User behavior comes in segments, and the behaviors within each segment are continuous. Behaviors within a segment are far more closely correlated than behaviors in different segments. Each segment of user behavior is called a session, and the gap between segments is called a session gap. Therefore, we should divide the user's behaviors according to the session window and calculate the result of each session.

With a session window, Flink automatically places elements in different session windows based on the timestamps of the elements. If the timestamp interval between two elements is less than the session gap, the two elements are in the same session. If the interval between two elements is greater than the session gap and no element can fill in the gap, the two elements are placed in different sessions.
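The grouping rule described above can be sketched in plain Java (JDK only, no Flink). The sketch walks through timestamps that are already sorted and starts a new session whenever the distance to the previous element is not less than the gap. The class and method names are hypothetical, and treating an interval exactly equal to the gap as a session boundary is a choice of this sketch, not a statement about Flink's behavior.

```java
import java.util.ArrayList;
import java.util.List;

class SessionWindowSketch {
    // Splits sorted timestamps (milliseconds) into sessions separated by
    // at least `gapMillis` of inactivity.
    static List<List<Long>> sessions(List<Long> sortedTimestamps, long gapMillis) {
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long ts : sortedTimestamps) {
            // Close the running session if the user was inactive for too long.
            if (!current.isEmpty() && ts - current.get(current.size() - 1) >= gapMillis) {
                result.add(current);
                current = new ArrayList<>();
            }
            current.add(ts);
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical app events: three actions, a long pause, then two more.
        List<Long> events = List.of(1_000L, 2_000L, 4_000L, 20_000L, 21_000L);
        System.out.println(sessions(events, 10_000L));
    }
}
```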

DataStream input

That's all, thanks!



JackJ
Created Apr 28, 2022 05:37:51

Thanks for sharing
 
wissal
MVE Created Apr 28, 2022 05:54:22

Learning together, every day!

olive.zhao
Created May 6, 2022 11:29:26

Learning together!
MahMush
Moderator Author Created Apr 29, 2022 15:26:51

Important post for time classification, Difference in time

MahMush
Moderator Author Created Apr 29, 2022 15:28:07

Inconsistencies caused by time differences
 
azkasaqib
Created Apr 29, 2022 16:52:05

Nice

olive.zhao
Created May 6, 2022 11:29:55

Thanks!

AliBinHussain
Created Apr 29, 2022 16:52:54

Thanks for sharing
 
AliBinHussain
Created Apr 29, 2022 16:53:01

Keep it up

MahMush
Moderator Author Created May 6, 2022 09:54:58

Flink uses a concept called windows to divide a (potentially) infinite DataStream into finite slices based on the timestamps of elements or other criteria