Streaming Basic Information

Hello, everyone!

Today I'm going to introduce you to FI Streaming. This introduction helps beginners learn FI faster.

Component Introduction

Streaming is a distributed, reliable, and fault-tolerant real-time computing system based on the open source Apache Storm. It is used to process data streams of a massive scale on a real-time basis. Storm Streaming is applicable to real-time analysis, continuous computation, and distributed Extract, Transform, and Load (ETL). Storm has the following features:

  • Wide applications

  • High scalability

  • Free from data loss

  • Fault-tolerant

  • Easy to construct and control

  • Multi-language support

The Streaming service consists of the active and standby Nimbus processes, UI processes, and multiple Supervisor processes, as shown in Figure 1.

Figure 1 Streaming architecture
en-us_image_0228256682.png

Name

Description

Nimbus

Nimbus is the control node of the Streaming service. In HA mode, the Streaming service has one active Nimbus and one standby Nimbus.

  • The active Nimbus receives tasks submitted by the client and dispatches them to Supervisors. It also monitors the status of other processes.

  • The standby Nimbus takes over services from the active Nimbus if the active Nimbus becomes faulty.

Supervisor

Supervisor monitors and receives tasks allocated by Nimbus and starts or stops Workers as required. A Worker is a process that runs specific service logic. Each Worker is a JVM process.

UI

UI is the monitoring interface of the Streaming service, through which the topologies running in the system can be viewed.

ZooKeeper

ZooKeeper provides distributed coordination services for processes of Streaming. Active and standby Nimbus, Supervisors, and workers are registered with ZooKeeper so that Nimbus can obtain the health status of each process.

Log Introduction

Log path: The default storage path of Streaming log files is /var/log/Bigdata/streaming/<role name>.

  • Nimbus: /var/log/Bigdata/streaming/nimbus (run log) and /var/log/Bigdata/audit/streaming/nimbus (audit log)

  • Supervisor: /var/log/Bigdata/streaming/supervisor (run log)

  • UI: /var/log/Bigdata/streaming/ui (run log) and /var/log/Bigdata/audit/streaming/ui (audit log)

  • Logviewer: /var/log/Bigdata/streaming/logviewer

Log archive rule: Automatic compression of Streaming logs is enabled. By default, when the size of a log file exceeds 10 MB, the file is automatically compressed into an archive named according to the following rule: <Original log file name>_<No.>.log.zip. The size threshold is configurable. For details, see "Configuring the Log Level and Log File Size".

At most the 13 most recent compressed files are retained. Both the number of retained files and the compression threshold can be configured.

Compressed audit logs are named audit.log.[yyyy-MM-dd].[No.].zip. These files are never deleted.
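
As a rough illustration of the two naming rules above, the following Python sketch classifies an archive file name as a run-log archive or an audit-log archive. The function name and the sample file names are hypothetical and only follow the patterns stated in this post. Whether <Original log file name> includes the .log suffix is not stated here, so the pattern accepts both.

```python
import re

# Run-log archives: <Original log file name>_<No.>.log.zip
RUN_ARCHIVE = re.compile(r"^(?P<name>.+)_(?P<no>\d+)\.log\.zip$")
# Audit-log archives: audit.log.[yyyy-MM-dd].[No.].zip
AUDIT_ARCHIVE = re.compile(
    r"^audit\.log\.(?P<date>\d{4}-\d{2}-\d{2})\.(?P<no>\d+)\.zip$"
)


def classify_archive(filename):
    """Return (kind, name_or_date, number), or None if no rule matches."""
    match = AUDIT_ARCHIVE.match(filename)
    if match:
        return ("audit", match.group("date"), int(match.group("no")))
    match = RUN_ARCHIVE.match(filename)
    if match:
        return ("run", match.group("name"), int(match.group("no")))
    return None


if __name__ == "__main__":
    # Sample names are made up for illustration.
    for sample in ("nimbus_3.log.zip", "audit.log.2015-03-10.1.zip", "nimbus.log"):
        print(sample, "->", classify_archive(sample))
```

A helper like this can be handy when checking how many archives of each kind exist in a log directory.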

Streaming Log List

Log Type

Log File Name

Description

Run log

nimbus/access.log

Nimbus user access log file

nimbus/checkavailable.log

Availability check log file of Nimbus

nimbus/checkService.log

Serviceability check log file of Nimbus

nimbus/cleanup.log

Cleanup log file about the Nimbus uninstallation

nimbus/metrics.log

Nimbus monitoring log file

nimbus/nimbus.log

Nimbus process run log file

nimbus/postinstall.log

Work log file after the Nimbus installation

nimbus/prestart.log

Work log file before the Nimbus startup

nimbus/start.log

Work log file about Nimbus startup

nimbus/stop.log

Work log file about Nimbus stop

supervisor/cleanup.log

Cleanup log file about the Supervisor uninstallation.

supervisor/metrics.log

Supervisor monitoring log file

supervisor/postinstall.log

Work log file after the Supervisor installation

supervisor/prestart.log

Work log file before the Supervisor startup.

supervisor/start.log

Work log file about the Supervisor startup

supervisor/stop.log

Work log file about the Supervisor stop

supervisor/supervisor.log

Supervisor process run log file

supervisor/[topologyId]-worker-[port number].log

Worker process run log file. Each port has its own run log file. By default, the system uses four ports: 29100, 29101, 29102, and 29103.

supervisor/metadata/[topologyid]-worker-[port number].yaml

Metadata file of worker logs, which is the log cleaning basis for logviewer. The file will be automatically deleted by the cleaning thread of logviewer based on certain conditions.

ui/cleanup.log

Cleanup log file about the UI uninstallation

ui/postinstall.log

Work log file after the UI installation

ui/prestart.log

Work log file before the UI startup.

ui/start.log

Work log file about UI startup

ui/stop.log

Work log file about the UI stop

ui/ui.log

Run log file of the UI process

logviewer/cleanup.log

Cleanup log file about the logviewer uninstallation

logviewer/logviewer.log

Run log file of the logviewer process

logviewer/postinstall.log

Work log file after the logviewer installation.

logviewer/prestart.log

Work log file before the logviewer startup.

logviewer/start.log

Work log file about the logviewer startup

logviewer/stop.log

Work log file about the logviewer stop

Audit log

nimbus/audit.log

Nimbus audit log file

ui/audit.log

UI audit log file

Log Level

Table 2 lists the log levels provided by Streaming.

The levels of run logs are OFF, ERROR, WARN, INFO, DEBUG, and TRACE, from high priority to low. Run logs whose level is equal to or higher than the specified level are printed. The higher the specified log level, the fewer logs are printed.

Severity

Description

OFF

OFF indicates that log output is disabled.

ERROR

Logs of this level record error information about system running.

WARN

Logs of this level record abnormal information about the current event processing.

INFO

Logs of this level record normal running status information about the system and events.

DEBUG

Logs of this level record the system information and system debugging information.

TRACE

Logs of this level record information about the system and class invoking relationships.
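
The filtering rule described above can be sketched in a few lines of Python. This is only an illustration of the priority ordering stated in this post, not the logging framework's actual implementation. The function names are hypothetical.

```python
# Levels ordered from high priority to low, as listed above.
LEVELS = ["OFF", "ERROR", "WARN", "INFO", "DEBUG", "TRACE"]


def is_printed(message_level, configured_level):
    """Return True if a message at message_level is printed
    when the role is configured with configured_level."""
    if configured_level == "OFF":
        return False  # OFF disables log output entirely
    # A message is printed if its priority is equal to or higher than
    # the configured level (a lower index means a higher priority).
    return LEVELS.index(message_level) <= LEVELS.index(configured_level)


def printed_levels(configured_level):
    """List the message levels that appear in the log for a given setting."""
    return [lvl for lvl in LEVELS[1:] if is_printed(lvl, configured_level)]


if __name__ == "__main__":
    for setting in LEVELS:
        print(setting, "->", printed_levels(setting))
```

For example, with the level set to WARN, only ERROR and WARN messages are printed, while INFO, DEBUG, and TRACE messages are dropped.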

To modify log levels, perform the following operations:

  1. Log in to FusionInsight Manager.

  2. Choose Services > Streaming > Configuration.

  3. Select All from the Type drop-down list box.

  4. In the navigation tree, click Log of the target role.

  5. Select a desired log level.

  6. Click Save Configuration. Then click OK for the configuration to take effect.

Log Format

The following table lists the Streaming log formats.

Log Type

Format

Example

Run log

%d{yyyy-MM-dd HH:mm:ss,SSS} | %-5p | [%t] | %m | %logger (%F:%L) %n

2015-03-11 15:04,241 | INFO | [RMI TCP Connection(2646)-xxx.xxx.xxx.2] | The baseSleepTimeMs [1000] the maxSleepTimeMs [1000] the maxRetries [1] | backtype.storm.utils.StormBoundedExponentialBackoffRetry (StormBoundedExponentialBackoffRetry.java:46)

<yyyy-MM-dd HH:mm:ss,SSS><HostName><RoleName><logLevel><Message>

2015-03-11 15:04 252 10-165-0-84 streaming-Nimbus INFO nimbus start normally

Audit log

<username><user IP address><time><operation><operation object><operation result>

UserName=streaming/hadoop, UserIP=xxx.xxx.xxx.2, Time=Tue Mar 10 01:15:35 CST 2015, Operation=Kill, Resource=test, Result=Success
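
When locating faults, it can help to split log lines into fields. The sketch below parses the first run-log format (fields separated by " | ") and the audit-log format (comma-separated Key=Value pairs) shown above. The function names and dictionary keys are hypothetical, and the code assumes that lines follow the formats exactly as listed here.

```python
def parse_run_log(line):
    """Parse 'time | level | [thread] | message | logger (file:line)'."""
    timestamp, level, thread, rest = line.split(" | ", 3)
    # Split the location off from the right so that a message which itself
    # contains ' | ' stays in one piece.
    message, location = rest.rsplit(" | ", 1)
    return {
        "time": timestamp.strip(),
        "level": level.strip(),
        "thread": thread.strip("[] "),
        "message": message.strip(),
        "location": location.strip(),
    }


def parse_audit_log(line):
    """Parse 'Key=Value, Key=Value, ...' into a dictionary."""
    fields = {}
    for part in line.split(", "):
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    audit = ("UserName=streaming/hadoop, UserIP=xxx.xxx.xxx.2, "
             "Time=Tue Mar 10 01:15:35 CST 2015, Operation=Kill, "
             "Resource=test, Result=Success")
    print(parse_audit_log(audit))
```

With the audit example from the table, the parsed result maps Operation to Kill and Result to Success, which makes it easy to filter for failed operations.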

Client Use

For routine fault locating, the client is used to connect to Streaming for verification so that some possibilities can be excluded.

en-us_image_0228256683.jpg

  • Version

    Displays the version of Storm.

    storm version

  • List

    Lists all the topologies running on the Storm platform. In the security version, only topologies submitted by the user are displayed.

    storm list

  • Activate

    Activates the specified topology spout and starts reading data.

    storm activate topology-name

  • Deactivate

    Deactivates the specified topology spout and stops reading data.

    storm deactivate topology-name
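
If you want to script these checks, the four commands above can be wrapped in a small Python helper. This is a minimal sketch. It assumes that the storm client is installed and on the PATH, and that any required environment setup and authentication have already been done. The helper names are hypothetical.

```python
import subprocess

# Subcommands described above. The last two take a topology name.
NO_ARGUMENT = {"version", "list"}
WITH_TOPOLOGY = {"activate", "deactivate"}


def build_storm_command(action, topology_name=None):
    """Build the argument list for one of the client commands above."""
    if action in NO_ARGUMENT:
        return ["storm", action]
    if action in WITH_TOPOLOGY:
        if not topology_name:
            raise ValueError(f"'{action}' requires a topology name")
        return ["storm", action, topology_name]
    raise ValueError(f"unsupported action: {action}")


def run_storm(action, topology_name=None):
    """Run the command and return its standard output.

    Raises subprocess.CalledProcessError if the client exits with an error.
    """
    result = subprocess.run(
        build_storm_command(action, topology_name),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(build_storm_command("list"))
    print(build_storm_command("deactivate", "topology-name"))
```

Keeping command construction separate from execution makes it possible to review the exact command line before anything is run against a live cluster.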

Streaming WebUI

  • Click to go to Streaming WebUI.

    Figure 1 FusionInsight Manager
    en-us_image_0229229486.png

  • Information in the Cluster Summary column:

    Figure 2 Cluster Summary
    en-us_image_0228256685.png

    • Version: Version information

    • Nimbus uptime: Running duration of the active Nimbus

    • Supervisors: Number of Supervisor nodes in the cluster

    • Used slots: Number of used slots in the cluster

    • Free slots: Number of free slots in the cluster

    • Total slots: Total number of slots in the cluster

    • Executors: Number of running executors in the cluster; also number of processing threads in the worker. The value is equal to the sum of executors configured for all Spout/Bolt instances in all topologies of the cluster.

    • Tasks: Number of running tasks in the cluster; number of Bolt and Spout instances; total number of tasks derived from all executors.

  • Information in the Topology Summary column:

    Figure 3 Topology Summary
    en-us_image_0228256686.png

    • Name: Topology name. You can click it to view topology details.

    • Id: Topology ID

    • Owner: User who submits the topology. Only administrators and the user who submits the topology can view the topology information.

    • Status: Topology status (Active, Inactive, Rebalancing, or Killed)

    • Uptime: Running time of the topology

    • Num workers: Number of running workers in the cluster

    • Num executors: Number of running executors in the cluster

    • Num tasks: Number of running tasks in the cluster

  • Information in the Supervisor Summary column:

    Figure 4 Supervisor Summary
    en-us_image_0228256687.png

    • Slots: Number of slots configured on the Supervisor. (The default value is 4. The value can be modified by configuring supervisor.slots.ports.)

    • Id: Supervisor ID

    • Host: Host where the Supervisor resides

    • Uptime: Running time of the Supervisor

    • Used slots: Number of used slots on the Supervisor

    • Version: version information

  • Information in the Nimbus Configuration column:

    Figure 5 Nimbus Configuration
    en-us_image_0228256688.png

    The configurations of all roles and instances can be displayed on FusionInsight Manager. Information displayed here is to keep open-source compatibility.

  • Topology details

    You can click a topology name in the Topology Summary column to go to the page.

    This page consists of Topology summary, Topology actions, Topology stats, Spouts (All time), Bolts (All time), Topology Visualization, and Topology Configuration.

  • Information in the Topology actions column:

    Figure 6 Topology actions
    en-us_image_0228256689.png

    • Activate: Activate the topology.

    • Deactivate: Suspend the topology.

    • Rebalance: Redistribute resources in the topology.

    • Kill: Close and delete the topology.

  • Information in the Topology stats column:

    Figure 7 Topology stats
    en-us_image_0228256710.png

    • Windows: Time window that displays the running status of 10 minutes, 3 hours, 1 day, and All time. You can click a time window to view information in the time window.

    • Emitted: Number of sent messages; Number of times that the emit method of outputCollector is invoked (sum of Spouts/Bolts Emitted)

    • Transferred: Number of messages transferred to lower-level Bolts (total number of Spouts/Bolts Transferred)

      If Bolt A uses the all group method (received by each Bolt) to transfer tuple to Bolt B and two tasks are started in Bolt B, the value of Transferred is twice that of Emitted.

      If the emit operation is performed in Bolt A but the receiver of tuple is not specified, Transferred will be 0.

    • Complete latency (ms): Average time taken for a topology message to be completely acknowledged. If it is not acknowledged, the value will be 0.

    • Acked: Number of topology messages that are successfully processed. If they are not acknowledged, the value will be 0.

    • Failed: Number of topology messages that fail to be processed or that time out before being acknowledged. If they are not acknowledged, the value will be 0.

      If the ACK feature is disabled, the values of the preceding three parameters are always 0 and carry no meaning.
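
The relationship between Emitted and Transferred in the two cases described above can be shown with a small worked example. This Python sketch only models the all group method for a single downstream Bolt, as in the examples in this post. The function name is hypothetical.

```python
def transferred_with_all_grouping(emitted, receiving_tasks):
    """Estimate Transferred for a component whose output is consumed
    through the all group method by one downstream Bolt.

    emitted: number of emit calls made by the upstream component
    receiving_tasks: number of tasks started in the downstream Bolt
                     (0 if no receiver is specified)
    """
    # Every emitted tuple is delivered once to each receiving task.
    # With no receiver, nothing is transferred.
    return emitted * receiving_tasks


if __name__ == "__main__":
    # Bolt A emits 1000 tuples and Bolt B runs two tasks.
    print(transferred_with_all_grouping(1000, 2))
    # Bolt A emits 1000 tuples but no receiver is specified.
    print(transferred_with_all_grouping(1000, 0))
```

So a Transferred value that is a multiple of Emitted is expected with the all group method, and a Transferred value of 0 next to a non-zero Emitted value suggests that no downstream component subscribes to the stream.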

  • Information in the Spouts column

    Figure 8 Spouts (All time)
    en-us_image_0228256711.png

    • Id: ID of a topology component, which is set in the service code. You can click the ID to view the component details.

    • Executors: Number of running executors allocated by the Spout

    • Tasks: Number of running tasks allocated by the Spout

    • Emitted: Number of sent messages; Number of times that the emit method of outputCollector is invoked

    • Transferred: Number of messages transferred to lower-level Bolts.

      If a Spout uses the all group method (received by each Bolt) to transfer tuple to Bolt B and two tasks are started in Bolt B, the value of Transferred is twice that of Emitted.

    • Complete latency (ms): Average time taken for a topology message to be completely acknowledged. If it is not acknowledged, the value will be 0.

    • Acked: Number of topology messages that are successfully processed. If they are not acknowledged, the value will be 0.

    • Failed: Number of topology messages that fail to be processed or that time out before being acknowledged. If they are not acknowledged, the value will be 0.

      If the ACK feature is disabled, the values of the preceding three parameters are always 0 and carry no meaning.

    • Error Host: Name of the host where an error occurs

    • Error Port: ID of the port where an error occurs

    • Last error: Latest error information

  • Information in the Bolts column

    Figure 9 Bolts (All time)
    en-us_image_0228256712.png

    • Id: ID of a topology component, which is set in the service code. You can click the ID to view the component details.

    • Executors: Number of running executors allocated by the Bolt

    • Tasks: Number of running tasks allocated by the Bolt

    • Emitted: Number of sent messages; Number of times that the emit method of outputCollector is invoked

    • Transferred: Number of messages transferred to lower-level Bolts.

      If Bolt A uses the all group method (received by each Bolt) to transfer tuple to Bolt B and two tasks are started in Bolt B, the value of Transferred is twice that of Emitted.

      If the emit operation is performed in Bolt A but the receiver of tuple is not specified, Transferred will be 0.

    • Capacity (last 10m): Performance indicator. A smaller value indicates better performance. When the value is close to 1, it indicates that the load is heavy and the execute method is continuously invoked. In this case, you need to increase the parallel degree of processing units. The calculation formula is (Number executed x Average execute latency) / Measurement time.

    • Execute latency (ms): Average time spent on processing a message using the execute method

    • Executed: Number of messages processed using the execute method

    • Process latency (ms): Average time taken from the Bolt receiving a message to sending an ACK

    • Acked: Number of messages ACKed by the Bolt

    • Failed: Number of messages that the Bolt fails to ACK

    • Error Host: Name of the host where an error occurs

    • Error Port: ID of the port where an error occurs

    • Last error: Latest error information
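
To make the Capacity formula above concrete, here is a minimal Python sketch. It assumes that Execute latency is given in milliseconds and that the measurement time is the 10-minute window expressed in milliseconds. The function name and the sample numbers are made up for illustration.

```python
TEN_MINUTES_MS = 10 * 60 * 1000  # the "last 10m" window, in milliseconds


def bolt_capacity(executed, execute_latency_ms, window_ms=TEN_MINUTES_MS):
    """Capacity = (Number executed x Average execute latency) / Measurement time."""
    return executed * execute_latency_ms / window_ms


if __name__ == "__main__":
    # A Bolt that executed 120,000 messages at 4.5 ms each in the last
    # 10 minutes spent about 90% of the window inside execute().
    print(round(bolt_capacity(120_000, 4.5), 2))
    # A lightly loaded Bolt: 60,000 messages at 1 ms each.
    print(round(bolt_capacity(60_000, 1.0), 2))
```

A result near 1, as in the first case, means the Bolt was busy in the execute method for almost the whole window, which is the situation where increasing the parallelism of that Bolt is worth considering.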

  • Information in the Topology Visualization column:

    Figure 10 Topology Visualization
    en-us_image_0228256713.png

  • Information in the Topology Configuration column:

    Figure 11 Topology Configuration
    en-us_image_0228256714.png

  • Component-Spout details

    You can click an ID in the Spouts column to go to the page.

    This page consists of Component summary, Spout stats, Output stats (All time), Executors (All time), and Errors.

  • Information in the Component summary column:

    Figure 12 Component summary
    en-us_image_0228256715.png

    • Id: ID of a topology component, which is set in the service code

    • Name: Topology name. You can click it to view topology details.

    • Executors: Number of running executors allocated by the Spout

    • Tasks: Number of running tasks allocated by the Spout

  • Information in the Spout stats column:

    Figure 13 Spout stats
    en-us_image_0228256716.png

    • Windows: Time window that displays the running status of 10 minutes, 3 hours, 1 day, and All time. You can click a time window to view information in the time window.

    • Emitted: Number of sent messages; Number of times that the emit method of outputCollector is invoked

    • Transferred: Number of messages transferred to lower-level Bolts.

      If a Spout uses the all group method (received by each Bolt) to transfer tuple to Bolt B and two tasks are started in Bolt B, the value of Transferred is twice that of Emitted.

    • Complete latency (ms): Average time taken for a topology message to be completely acknowledged. If it is not acknowledged, the value will be 0.

    • Acked: Number of topology messages that are successfully processed. If they are not acknowledged, the value will be 0.

    • Failed: Number of topology messages that fail to be processed or that time out before being acknowledged. If they are not acknowledged, the value will be 0.

      If the ACK feature is disabled, the values of the preceding three parameters are always 0 and carry no meaning.

  • Information in the Output stats column:

    Figure 14 Output stats (All time)
    en-us_image_0228256717.png

    • Stream: Message stream name. The default name is default.

    • Emitted: Number of messages with the stream name; Number of times that the emit method of outputCollector is invoked

    • Transferred: Number of messages with the stream name transferred to lower-level Bolts.

      If a Spout uses the all group method (received by each Bolt) to transfer tuple to Bolt B and two tasks are started in Bolt B, the value of Transferred is twice that of Emitted.

    • Complete latency (ms): Average time taken for messages with the stream name to be completely acknowledged. If they are not acknowledged, the value will be 0.

    • Acked: Number of messages with the stream name successfully processed. If they are not acknowledged, the value will be 0.

    • Failed: Number of messages with the stream name that fail to be processed or that time out before being acknowledged. If they are not acknowledged, the value will be 0.

      If the ACK feature is disabled, the values of the preceding three parameters are always 0 and carry no meaning.

  • Information in the Executors column

    Figure 15 Executors (All time)
    en-us_image_0228256718.png

    • Id: executor ID

    • Uptime: Running time of the executor thread

    • Host: Machine where the executor is running

    • Port: executor running port

    • Emitted: Number of sent messages; Number of times that the emit method of outputCollector is invoked

    • Transferred: Number of messages transferred to lower-level Bolts.

      If a Spout uses the all group method (received by each Bolt) to transfer tuple to Bolt B and two tasks are started in Bolt B, the value of Transferred is twice that of Emitted.

    • Complete latency (ms): Average time taken for a topology message to be completely acknowledged. If it is not acknowledged, the value is 0.

    • Acked: Number of messages successfully processed. If they are not acknowledged, the value will be 0.

    • Failed: Number of messages that fail to be processed or that time out before being acknowledged. If they are not acknowledged, the value will be 0.

      If the ACK feature is disabled, the values of the preceding three parameters are always 0 and carry no meaning.

  • Information in the Errors column

    Figure 16 Errors
    en-us_image_0228256719.png

    • Time: Time when an error occurs

    • Error Host: Name of the host where an error occurs

    • Error Port: ID of the port where an error occurs

    • Error: Error information

  • Component-Bolt details

    You can click an ID in the Bolts column to go to the page.

    This page consists of Component summary, Bolt stats, Input stats (All time), Output stats (All time), Executors (All time), and Errors.

  • Information in the Component summary column:

    Figure 17 Component summary
    en-us_image_0228256720.png

    • Id: ID of a topology component, which is set in the service code

    • Name: Topology name. You can click it to view topology details.

    • Executors: Number of running executors allocated by the Bolt

    • Tasks: Number of running tasks allocated by the Bolt

  • Information in the Bolt stats column:

    Figure 18 Bolt stats
    en-us_image_0228256721.png

    • Windows: Time window that displays the running status of 10 minutes, 3 hours, 1 day, and All time. You can click a time window to view information in the time window.

    • Emitted: Number of sent messages; Number of times that the emit method of outputCollector is invoked

    • Transferred: Number of messages transferred to lower-level Bolts

      If a Spout uses the all group method (received by each Bolt) to transfer tuple to Bolt B and two tasks are started in Bolt B, the value of Transferred is twice that of Emitted.

    • Execute latency (ms): Average time spent on processing a message using the execute method

    • Executed: Number of messages processed using the execute method

    • Process latency (ms): Average time taken from the Bolt receiving a message to sending an ACK

    • Acked: Number of messages ACKed by the Bolt

    • Failed: Number of messages that the Bolt fails to ACK

  • Information in the Input stats column:

    Figure 19 Input stats (All time)
    en-us_image_0228256722.png

    • Component: Component ID.

    • Stream: Message stream name. The default name is default.

    • Execute latency (ms): Average time spent on processing a message using the execute method

    • Executed: Number of messages processed using the execute method

    • Process latency (ms): Average time taken from the Bolt receiving a message to sending an ACK.

    • Acked: Number of messages ACKed by the Bolt

    • Failed: Number of messages that the Bolt fails to ACK

  • Information in the Output stats column:

    Figure 20 Output stats (All time)
    en-us_image_0228256723.png

    • Stream: Message stream name. The default name is default.

    • Emitted: Number of messages with the stream name; Number of times that the emit method of outputCollector is invoked

    • Transferred: Number of messages with the stream name transferred to lower-level Bolts.

  • Information in the Executors column

    Figure 21 Executors (All time)
    en-us_image_0228256724.png

    • Id: executor ID

    • Uptime: Running time of the executor thread

    • Host: Machine where the executor is running

    • Port: executor running port

    • Emitted: Number of sent messages; Number of times that the emit method of outputCollector is invoked

    • Transferred: Number of messages transferred to lower-level Bolts.

    • Capacity (last 10m): Performance indicator. A smaller value indicates better performance. When the value is close to 1, it indicates that the load is heavy and the execute method is continuously invoked. In this case, you need to increase the parallel degree of processing units. The calculation formula is (Number executed x Average execute latency) / Measurement time.

    • Execute latency (ms): Average time spent on processing a message using the execute method

    • Executed: Number of messages processed using the execute method

    • Process latency (ms): Average time taken from the Bolt receiving a message to sending an ACK.

    • Acked: Number of messages ACKed by the Bolt

    • Failed: Number of messages that the Bolt fails to ACK

  • Information in the Errors column

    Figure 22 Errors
    en-us_image_0228256725.png

    • Time: Time when an error occurs

    • Error Host: Name of the host where an error occurs

    • Error Port: ID of the port where an error occurs

    • Error: Error information

