
Stream SQL explanation

Latest reply: Nov 24, 2021 04:55:51

This post explains stream computing SQL (streaming SQL). Please find more details below.

1. Stream computing SQL: principle and architecture

Streaming SQL is usually a SQL-like declarative language, used mainly for continuous queries over streaming data (streams). Its purpose is to build an SQL abstraction layer on top of the APIs of common stream computing platforms and frameworks (such as Storm, Spark Streaming, Flink, and Beam), lowering the barrier to real-time development by letting developers work in simple, familiar SQL.

The principle of stream computing SQL is simple: it builds a bridge between SQL and the underlying stream computing engine. The user submits stream computing SQL, the SQL engine layer translates it into calls to the underlying API, and the result is executed on the underlying stream computing engine. For Storm, for example, the SQL is automatically translated into a Storm task topology and run on the Storm cluster.

The stream computing SQL engine is the core of stream computing SQL. It is mainly responsible for parsing the user's SQL input, semantic analysis, logical plan generation, logical plan optimization, and physical execution plan generation. The actual computation is carried out by the underlying stream computing platform.
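The stages described above can be sketched in a few lines. This is a toy model under stated assumptions, not how any real engine is implemented: the names `LogicalPlan`, `parse`, `to_physical`, and the `readings` stream are hypothetical, only one query shape is supported, and a Python generator stands in for the underlying stream computing engine.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


@dataclass
class LogicalPlan:
    """Engine-independent description of the query (hypothetical structure)."""
    columns: List[str]   # projected columns
    source: str          # name of the input stream
    filter_column: str   # column tested in the WHERE clause
    threshold: float     # WHERE <filter_column> > <threshold>


def parse(sql: str) -> LogicalPlan:
    """Parse the single query shape this toy supports into a logical plan."""
    pattern = r"SELECT\s+(.+?)\s+FROM\s+(\w+)\s+WHERE\s+(\w+)\s*>\s*([\d.]+)\s*$"
    match = re.match(pattern, sql.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unsupported query: {sql!r}")
    columns = [c.strip() for c in match.group(1).split(",")]
    return LogicalPlan(columns, match.group(2), match.group(3), float(match.group(4)))


def to_physical(plan: LogicalPlan) -> Callable[[Iterable[Row]], Iterator[Row]]:
    """Translate the logical plan into an executable filter-and-project operator."""
    def run(stream: Iterable[Row]) -> Iterator[Row]:
        for row in stream:
            if row[plan.filter_column] > plan.threshold:
                yield {c: row[c] for c in plan.columns}
    return run


def sensor_stream() -> Iterator[Row]:
    """Made-up input data standing in for a real stream source."""
    yield {"sensor": "a", "temp": 21.5}
    yield {"sensor": "b", "temp": 35.0}
    yield {"sensor": "c", "temp": 40.2}


plan = parse("SELECT sensor, temp FROM readings WHERE temp > 30")
job = to_physical(plan)
results = list(job(sensor_stream()))
print(results)
```

The point of the split is that `parse` knows nothing about execution and `to_physical` knows nothing about SQL text, so the same logical plan could in principle be translated for a different engine.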

Unlike offline tasks, real-time data flows in continuously. To abstract stream processing with SQL, stream computing SQL therefore also introduces the concept of a "table", but here the table is a dynamic table whose contents change as new data arrives.
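A minimal sketch of the dynamic table idea, assuming a simple stream of words: the result of a continuous `GROUP BY` query is never final, and each arriving event produces an update to the result table. The function name `continuous_count` and the sample events are illustrative assumptions, not part of any framework.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def continuous_count(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Continuous analogue of: SELECT word, COUNT(*) FROM events GROUP BY word.

    Yields one (word, new_count) update each time the result table changes.
    """
    table = Counter()  # the dynamic result table
    for word in events:
        table[word] += 1
        yield word, table[word]


updates = list(continuous_count(["spark", "flink", "flink", "storm", "flink"]))
final_table = dict(updates)  # later updates overwrite earlier ones per key
print(updates)
print(final_table)
```

On a real, unbounded stream the loop never ends, so consumers read the stream of updates rather than waiting for a final result.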

The processing flow of stream computing SQL is as follows:


2. Stream computing SQL: the main real-time development technology now and in the future

Progress on and support for stream computing SQL differ across computing frameworks. Storm SQL is only an experimental feature, whereas Flink SQL is a core API that Flink actively promotes. Flink is a native open-source stream computing engine, and no other open-source stream computing engine currently offers a better stream computing SQL framework and syntax, so Flink SQL is in effect defining the standard for stream computing SQL.

3. DataFrame and SQL Operations

You can easily use DataFrames and SQL operations on streaming data. You have to create a SparkSession using the SparkContext that the StreamingContext is using. Furthermore, this has to be done so that it can be restarted on driver failures. This is done by creating a lazily instantiated singleton instance of SparkSession. This is shown in the following example. It modifies the earlier word count example to generate word counts using DataFrames and SQL. Each RDD is converted to a DataFrame, registered as a temporary table and then queried via SQL.
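The pattern in this paragraph can be sketched without a Spark installation. In the example below, the standard-library `sqlite3` module stands in for `SparkSession`, and plain Python lists stand in for the RDDs of each batch; these substitutions, along with the names `get_session` and `process_batch`, are assumptions made for illustration and are not the Spark API. What carries over is the shape: a lazily created singleton session, and each batch loaded into a temporary table and queried with SQL.

```python
import sqlite3
from typing import List, Optional, Tuple

_session: Optional[sqlite3.Connection] = None  # module-level singleton slot


def get_session() -> sqlite3.Connection:
    """Create the session on first use and reuse it afterwards."""
    global _session
    if _session is None:
        _session = sqlite3.connect(":memory:")
    return _session


def process_batch(words: List[str]) -> List[Tuple[str, int]]:
    """Load one batch into a temporary table and count words with SQL."""
    session = get_session()
    session.execute("DROP TABLE IF EXISTS words")
    session.execute("CREATE TEMP TABLE words (word TEXT)")
    session.executemany("INSERT INTO words VALUES (?)", [(w,) for w in words])
    query = "SELECT word, COUNT(*) AS total FROM words GROUP BY word ORDER BY word"
    return session.execute(query).fetchall()


batches = [["spark", "flink", "flink"], ["storm", "spark", "storm"]]
batch_results = [process_batch(batch) for batch in batches]
for result in batch_results:
    print(result)
```

Because the session is obtained inside the per-batch function rather than captured once at start-up, a restarted process simply creates a fresh session on its first batch.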


You can also run SQL queries on tables defined on streaming data from a different thread (that is, asynchronous to the running StreamingContext). Just make sure that you set the StreamingContext to remember a sufficient amount of streaming data so that the query can run. Otherwise the StreamingContext, which is unaware of any asynchronous SQL queries, will delete old streaming data before the query can complete. For example, if you want to query the last batch, but your query can take 5 minutes to run, then call streamingContext.remember(Minutes(5)) (in Scala, or the equivalent in other languages).
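Why the retention setting matters can be shown with a small simulation. This is a toy model, not Spark's implementation: `BatchStore`, `batch_survives`, and the 60-second batch interval are hypothetical, and time is represented by plain integers in seconds.

```python
from typing import Dict, List, Optional


class BatchStore:
    """Toy model of a streaming context that keeps only recent batches."""

    def __init__(self, remember_seconds: int) -> None:
        self.remember_seconds = remember_seconds
        self.batches: Dict[int, List[str]] = {}  # batch time -> batch data

    def add_batch(self, batch_time: int, data: List[str]) -> None:
        """Store a new batch and evict batches older than the retention period."""
        self.batches[batch_time] = data
        cutoff = batch_time - self.remember_seconds
        for expired in [t for t in self.batches if t < cutoff]:
            del self.batches[expired]

    def read(self, batch_time: int) -> Optional[List[str]]:
        """Return a batch if it is still retained, otherwise None."""
        return self.batches.get(batch_time)


def batch_survives(remember_seconds: int, query_seconds: int, interval: int = 60) -> bool:
    """Check whether the batch from t=0 still exists when a slow query finishes."""
    store = BatchStore(remember_seconds)
    store.add_batch(0, ["a", "b"])
    # New batches keep arriving while the asynchronous query is running.
    for t in range(interval, query_seconds + 1, interval):
        store.add_batch(t, ["x"])
    return store.read(0) is not None


short_retention = batch_survives(remember_seconds=60, query_seconds=300)
long_retention = batch_survives(remember_seconds=300, query_seconds=300)
print(short_retention, long_retention)
```

With a retention shorter than the query time, the batch is evicted while the query is still running; with a retention that covers the full query time, it is still available.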


Created Nov 6, 2021 16:05:04


Created Nov 24, 2021 04:55:51

Nicely done. Thank you for sharing

PanchakS Created Jun 13, 2022 09:22:23 (0) (0)

