Hello, everyone!
Today I'm going to introduce you to FI Streaming. This introduction should help beginners learn FI faster.
Component Introduction
Log Introduction
Client Use
Streaming WebUI
Component Introduction
Streaming is a distributed, reliable, and fault-tolerant real-time computing system based on open source Apache Storm. It processes massive data streams in real time and is suited to real-time analysis, continuous computation, and distributed Extract, Transform, and Load (ETL). Storm has the following features:
Wide applications
High scalability
Free from data loss
Fault-tolerant
Easy to construct and control
Multi-language support
The Streaming service consists of the active and standby Nimbus processes, UI processes, and multiple Supervisor processes, as shown in Figure 1.
Figure 1 Streaming architecture
Name | Description |
|---|---|
Nimbus | Nimbus is the control node of the Streaming service. In HA mode, the Streaming service has one active Nimbus and one standby Nimbus. |
Supervisor | Supervisor monitors and receives tasks allocated by Nimbus and starts or stops Workers based on actual situations. Worker is a process running specific service logic. Each worker is a JVM process. |
UI | UI is the monitoring interface of the Streaming service, through which the running topologies can be viewed. |
ZooKeeper | ZooKeeper provides distributed coordination services for processes of Streaming. Active and standby Nimbus, Supervisors, and workers are registered with ZooKeeper so that Nimbus can obtain the health status of each process. |
Log Introduction
Log path: The default storage path of Streaming log files is /var/log/Bigdata/streaming/role name.
Nimbus: /var/log/Bigdata/streaming/nimbus (run log) /var/log/Bigdata/audit/streaming/nimbus (audit log)
Supervisor: /var/log/Bigdata/streaming/supervisor (run log) /var/log/Bigdata/audit/streaming/ui (audit log)
UI: /var/log/Bigdata/streaming/ui
Logviewer: /var/log/Bigdata/streaming/logviewer
Log archive rule: Automatic compression of Streaming logs is enabled. By default, when a log file exceeds 10 MB, it is automatically compressed into a file named <Original log file name>_<No.>.log.zip. The size threshold is configurable. For details, see "Configuring the Log Level and Log File Size".
At most the 13 most recent compressed files are retained. Both the number of compressed files and the compression threshold can be configured.
Compressed audit logs are named audit.log.[yyyy-MM-dd].[No.].zip. These files are never deleted.
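The archive rule can be illustrated with a minimal Python sketch. It is not how Streaming implements log rotation. The helper names are hypothetical, and it assumes that a larger <No.> marks a more recent archive and that the original file name is passed without its .log extension.

```python
import re

MAX_ARCHIVES = 13  # default number of compressed files kept, per the rule above


def archive_name(original_name, number):
    """Build a name following <Original log file name>_<No.>.log.zip."""
    return f"{original_name}_{number}.log.zip"


def archives_to_delete(names, max_archives=MAX_ARCHIVES):
    """Return the archives that fall outside the newest `max_archives`.

    Assumption (not stated in the source): a larger <No.> is more recent.
    """
    pattern = re.compile(r"_(\d+)\.log\.zip$")
    numbered = []
    for name in names:
        match = pattern.search(name)
        if match:
            numbered.append((int(match.group(1)), name))
    numbered.sort()  # oldest (smallest number) first
    excess = len(numbered) - max_archives
    return [name for _, name in numbered[:excess]] if excess > 0 else []


# Fifteen archives exist, so the two oldest exceed the default limit of 13.
existing = [archive_name("nimbus", i) for i in range(1, 16)]
stale = archives_to_delete(existing)
```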
Streaming Log List
Log Type | Log File Name | Description |
|---|---|---|
Run log | nimbus/access.log | Nimbus user access log file |
Run log | nimbus/checkavailable.log | Availability check log file of Nimbus |
Run log | nimbus/checkService.log | Serviceability check log file of Nimbus |
Run log | nimbus/cleanup.log | Cleanup log file about the Nimbus uninstallation |
Run log | nimbus/metrics.log | Nimbus monitoring log file |
Run log | nimbus/nimbus.log | Nimbus process run log file |
Run log | nimbus/postinstall.log | Work log file after the Nimbus installation |
Run log | nimbus/prestart.log | Work log file before the Nimbus startup |
Run log | nimbus/start.log | Work log file about the Nimbus startup |
Run log | nimbus/stop.log | Work log file about the Nimbus stop |
Run log | supervisor/cleanup.log | Cleanup log file about the Supervisor uninstallation |
Run log | supervisor/metrics.log | Supervisor monitoring log file |
Run log | supervisor/postinstall.log | Work log file after the Supervisor installation |
Run log | supervisor/prestart.log | Work log file before the Supervisor startup |
Run log | supervisor/start.log | Work log file about the Supervisor startup |
Run log | supervisor/stop.log | Work log file about the Supervisor stop |
Run log | supervisor/supervisor.log | Supervisor process run log file |
Run log | supervisor/[topologyId]-worker-[port number].log | Worker process run log file. One port uses one run log file. By default, four ports are used: 29100, 29101, 29102, and 29103. |
Run log | supervisor/metadata/[topologyid]-worker-[port number].yaml | Metadata file of worker logs, which logviewer uses as the basis for log cleaning. The file is automatically deleted by the cleaning thread of logviewer based on certain conditions. |
Run log | ui/cleanup.log | Cleanup log file about the UI uninstallation |
Run log | ui/postinstall.log | Work log file after the UI installation |
Run log | ui/prestart.log | Work log file before the UI startup |
Run log | ui/start.log | Work log file about the UI startup |
Run log | ui/stop.log | Work log file about the UI stop |
Run log | ui/ui.log | Run log file of the UI process |
Run log | logviewer/cleanup.log | Cleanup log file about the logviewer uninstallation |
Run log | logviewer/logviewer.log | Run log file of logviewer |
Run log | logviewer/postinstall.log | Work log file after the logviewer installation |
Run log | logviewer/prestart.log | Work log file before the logviewer startup |
Run log | logviewer/start.log | Work log file about the logviewer startup |
Run log | logviewer/stop.log | Work log file about the logviewer stop |
Audit log | nimbus/audit.log | Nimbus audit log file |
Audit log | ui/audit.log | UI audit log file |
Log Level
Table 2 lists the log levels provided by Streaming.
The levels of run logs are, from highest to lowest priority, OFF, ERROR, WARN, INFO, DEBUG, and TRACE. Logs at or above the configured level are printed, so the higher the configured level, the fewer logs are printed.
Severity | Description |
|---|---|
OFF | OFF indicates that log output is disabled. |
ERROR | Logs of this level record error information about system running. |
WARN | Logs of this level record abnormal information about the current event processing. |
INFO | Logs of this level record normal running status information about the system and events. |
DEBUG | Logs of this level record the system information and system debugging information. |
TRACE | Logs of this level record information about the system and class invoking relationships. |
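The filtering rule can be sketched in a few lines of Python. This only illustrates the priority order described above and is not the logging framework's own code. The function names are hypothetical.

```python
# Levels in the order given above, from highest to lowest priority.
LEVELS = ["OFF", "ERROR", "WARN", "INFO", "DEBUG", "TRACE"]


def is_printed(message_level, configured_level):
    """Return True if a message at `message_level` passes the configured level."""
    if configured_level == "OFF" or message_level == "OFF":
        return False  # OFF disables log output entirely
    return LEVELS.index(message_level) <= LEVELS.index(configured_level)


def printed_levels(configured_level):
    """List the message levels that are printed for a configured level."""
    return [lvl for lvl in LEVELS[1:] if is_printed(lvl, configured_level)]


warn_output = printed_levels("WARN")    # ERROR and WARN only
trace_output = printed_levels("TRACE")  # every level
```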
To modify log levels, perform the following operations:
Log in to FusionInsight Manager.
Choose Services > Streaming > Configuration.
Select All from the Type drop-down list box.
In the navigation tree, click Log of the target role.
Select a desired log level.
Click Save Configuration. Then click OK for the configuration to take effect.
Log Format
The Streaming log formats are as follows.
Run log
Format: %d{yyyy-MM-dd HH:mm:ss,SSS} | %-5p | [%t] | %m | %logger (%F:%L) %n
Example: 2015-03-11 15:04,241 | INFO | [RMI TCP Connection(2646)-xxx.xxx.xxx.2] | The baseSleepTimeMs [1000] the maxSleepTimeMs [1000] the maxRetries [1] | backtype.storm.utils.StormBoundedExponentialBackoffRetry (StormBoundedExponentialBackoffRetry.java:46)
Format: <yyyy-MM-dd HH:mm:ss,SSS><HostName><RoleName><logLevel><Message>
Example: 2015-03-11 15:04 252 10-165-0-84 streaming-Nimbus INFO nimbus start normally
Audit log
Format: <username><user IP address><time><operation><operation object><operation result>
Example: UserName=streaming/hadoop, UserIP=xxx.xxx.xxx.2, Time=Tue Mar 10 01:15:35 CST 2015, Operation=Kill, Resource=test, Result=Success
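When locating faults, it can help to split these lines into fields. The following Python sketch parses the two examples shown above. The function names are hypothetical, and it assumes that run log fields are separated by " | " and that audit log fields are separated by ", " with no ", " inside a value.

```python
def parse_run_line(line):
    """Split a run log line of the form time | level | [thread] | message | logger."""
    timestamp, level, thread, rest = line.strip().split(" | ", 3)
    # The message may itself contain " | ", so take the logger from the right.
    message, _, logger = rest.rpartition(" | ")
    return {
        "time": timestamp,
        "level": level.strip(),        # %-5p pads the level to five characters
        "thread": thread.strip("[]"),
        "message": message,
        "logger": logger,
    }


def parse_audit_line(line):
    """Split an audit log line into a dict of Key=Value fields."""
    fields = {}
    for part in line.strip().split(", "):
        key, _, value = part.partition("=")
        fields[key] = value
    return fields


run_sample = (
    "2015-03-11 15:04,241 | INFO | [RMI TCP Connection(2646)-xxx.xxx.xxx.2] | "
    "The baseSleepTimeMs [1000] the maxSleepTimeMs [1000] the maxRetries [1] | "
    "backtype.storm.utils.StormBoundedExponentialBackoffRetry "
    "(StormBoundedExponentialBackoffRetry.java:46)"
)
audit_sample = (
    "UserName=streaming/hadoop, UserIP=xxx.xxx.xxx.2, "
    "Time=Tue Mar 10 01:15:35 CST 2015, Operation=Kill, "
    "Resource=test, Result=Success"
)

run_record = parse_run_line(run_sample)
audit_record = parse_audit_line(audit_sample)
```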
Client Use
During routine fault locating, you can use the client to connect to Streaming and verify its behavior, which helps rule out some possible causes.

Version
Displays the version of Storm.
storm version
List
Lists all the topologies running on the Storm platform. In the security version, only topologies submitted by the user are displayed.
storm list
Activate
Activates the specified topology spout and starts reading data.
storm activate topology-name
Deactivate
Deactivates the specified topology spout and stops reading data.
storm deactivate topology-name
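If you want to script these checks, the four commands above can be wrapped as in the following sketch. The helper names are hypothetical. It assumes that the storm client is installed and on the PATH and that any required authentication has already been completed. Only the commands listed above are covered.

```python
import subprocess

# Commands from the list above, grouped by whether they take a topology name.
ACTIONS_WITHOUT_TOPOLOGY = {"version", "list"}
ACTIONS_WITH_TOPOLOGY = {"activate", "deactivate"}


def storm_command(action, topology=None):
    """Build the argument list for one of the client commands shown above."""
    if action in ACTIONS_WITHOUT_TOPOLOGY:
        return ["storm", action]
    if action in ACTIONS_WITH_TOPOLOGY:
        if not topology:
            raise ValueError(f"'{action}' requires a topology name")
        return ["storm", action, topology]
    raise ValueError(f"unsupported action: {action}")


def run_storm(action, topology=None):
    """Run the command and return the completed process (needs a storm client)."""
    return subprocess.run(
        storm_command(action, topology),
        capture_output=True,
        text=True,
        check=False,
    )


# Building the commands does not require a cluster; "word-count" is a made-up name.
list_cmd = storm_command("list")
deactivate_cmd = storm_command("deactivate", "word-count")
```

Calling run_storm("list") on a client node would then return the command output in the stdout attribute of the result.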
Streaming WebUI
Click to go to Streaming WebUI.
Figure 1 FusionInsight Manager

Information in the Cluster Summary column:
Figure 2 Cluster Summary

Version: Version information
Nimbus uptime: Running duration of the primary Nimbus
Supervisors: Number of Supervisor nodes in the cluster
Used slots: Number of used slots in the cluster
Free slots: Number of free slots in the cluster
Total slots: Total number of slots in the cluster
Executors: Number of running executors in the cluster; also number of processing threads in the worker. The value is equal to the sum of executors configured for all Spout/Bolt instances in all topologies of the cluster.
Tasks: Number of running tasks in the cluster; number of Bolt and Spout instances; total number of tasks derived from all executors.
Information in the Topology Summary column:
Figure 3 Topology Summary

Name: Topology name. You can click it to view topology details.
Id: Topology ID
Owner: User who submits the topology. Only administrators and the user who submits the topology can view the topology information.
Status: Topology status (Active, Inactive, Rebalancing, or Killed)
Uptime: Running time of the topology
Num workers: Number of running workers in the cluster
Num executors: Number of running executors in the cluster
Num tasks: Number of running tasks in the cluster
Information in the Supervisor Summary column:
Figure 4 Supervisor Summary

Slots: Number of slots configured on the Supervisor. (The default value is 4. The value can be modified by configuring supervisor.slots.ports.)
Id: Supervisor ID
Host: Host where the Supervisor resides
Uptime: Running time of the Supervisor
Used slots: Number of used slots on the Supervisor
Version: Version information
Information in the Nimbus Configuration column:
Figure 5 Nimbus Configuration

The configurations of all roles and instances can be viewed on FusionInsight Manager. They are also displayed here to maintain compatibility with the open source version.
Topology details
You can click a topology name in the Topology Summary column to go to the page.
This page consists of Topology summary, Topology actions, Topology stats, Spouts (All time), Bolts (All time), Topology Visualization, and Topology Configuration.
Information in the Topology actions column:
Figure 6 Topology actions

Activate: Activate the topology.
Deactivate: Suspend the topology.
Rebalance: Redistribute resources in the topology.
Kill: Close and delete the topology.
Information in the Topology stats column:
Figure 7 Topology stats

Windows: Time window that displays the running status of 10 minutes, 3 hours, 1 day, and All time. You can click a time window to view information in the time window.
Emitted: Number of sent messages; Number of times that the emit method of outputCollector is invoked (sum of Spouts/Bolts Emitted)
Transferred: Number of messages transferred to lower-level Bolts (total number of Spouts/Bolts Transferred)
If Bolt A uses the all group method (received by each Bolt) to transfer tuple to Bolt B and two tasks are started in Bolt B, the value of Transferred is twice that of Emitted.
If the emit operation is performed in Bolt A but the receiver of tuple is not specified, Transferred will be 0.
Complete latency (ms): Average time taken for a topology message to be completely acknowledged. If it is not acknowledged, the value will be 0.
Acked: Number of topology messages that are successfully processed. If they are not acknowledged, the value will be 0.
Failed: Number of topology messages that fail to be processed or time out before they are acknowledged. If they are not acknowledged, the value will be 0.
If the ACK feature is disabled, the preceding three values are always 0 and carry no meaning.
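The relationship between Emitted and Transferred in the two cases above can be written as a small calculation. This is a simplified Python model for the all grouping case only, with a hypothetical function name. It is not how the WebUI computes its statistics.

```python
def transferred_with_all_grouping(emitted, receiver_task_counts):
    """Estimate Transferred when every receiver subscribes with the all grouping.

    emitted: number of messages emitted by the sending component.
    receiver_task_counts: number of tasks of each receiving Bolt.
    Each emitted message is delivered to every task of every receiver,
    so with no receiver nothing is transferred.
    """
    return emitted * sum(receiver_task_counts)


# Bolt A emits 100 messages to Bolt B, which runs two tasks.
two_tasks = transferred_with_all_grouping(100, [2])
# Bolt A emits 100 messages but no receiver is specified.
no_receiver = transferred_with_all_grouping(100, [])
```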
Information in the Spouts column
Figure 8 Spouts (All time)

Id: ID of a topology component, which is set in the service code. You can click the ID to view the component details.
Executors: Number of running executors allocated by the Spout
Tasks: Number of running tasks allocated by the Spout
Emitted: Number of sent messages; Number of times that the emit method of outputCollector is invoked
Transferred: Number of messages transferred to lower-level Bolts.
If a Spout uses the all group method (received by each Bolt) to transfer tuple to Bolt B and two tasks are started in Bolt B, the value of Transferred is twice that of Emitted.
Complete latency (ms): Average time taken for a topology message to be completely acknowledged. If it is not acknowledged, the value will be 0.
Acked: Number of topology messages that are successfully processed. If they are not acknowledged, the value will be 0.
Failed: Number of topology messages that fail to be processed or time out before they are acknowledged. If they are not acknowledged, the value will be 0.
If the ACK feature is disabled, the preceding three values are always 0 and carry no meaning.
Error Host: Name of the host where an error occurs
Error Port: ID of the port where an error occurs
Last error: Latest error information
Information in the Bolts column
Figure 9 Bolts (All time)

Id: ID of a topology component, which is set in the service code. You can click the ID to view the component details.
Executors: Number of running executors allocated by the Bolt
Tasks: Number of running tasks allocated by the Bolt
Emitted: Number of sent messages; Number of times that the emit method of outputCollector is invoked
Transferred: Number of messages transferred to lower-level Bolts.
If Bolt A uses the all group method (received by each Bolt) to transfer tuple to Bolt B and two tasks are started in Bolt B, the value of Transferred is twice that of Emitted.
If the emit operation is performed in Bolt A but the receiver of tuple is not specified, Transferred will be 0.
Capacity (last 10m): Performance indicator. A smaller value indicates better performance. When the value is close to 1, it indicates that the load is heavy and the execute method is continuously invoked. In this case, you need to increase the parallel degree of processing units. The calculation formula is (Number executed x Average execute latency) / Measurement time.
Execute latency (ms): Average time spent on processing a message using the execute method
Executed: Number of messages processed using the execute method
Process latency (ms): Average time taken from the Bolt receiving a message to sending an ACK
Acked: Number of messages ACKed by the Bolt
Failed: Number of messages that the Bolt fails to ACK
Error Host: Name of the host where an error occurs
Error Port: ID of the port where an error occurs
Last error: Latest error information
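The Capacity formula given above can be checked by hand with a short Python sketch. The function names and the 0.8 warning threshold are illustrative assumptions. The source only says that a value close to 1 indicates heavy load.

```python
TEN_MINUTES_MS = 10 * 60 * 1000  # measurement time for "Capacity (last 10m)"


def capacity(executed, execute_latency_ms, window_ms=TEN_MINUTES_MS):
    """(Number executed x Average execute latency) / Measurement time."""
    return executed * execute_latency_ms / window_ms


def needs_more_parallelism(value, threshold=0.8):
    """Flag a Bolt whose capacity approaches 1 (threshold is an assumption)."""
    return value >= threshold


# 150,000 messages at 2 ms each keep the Bolt busy for half of the window.
half_busy = capacity(150_000, 2.0)
# 285,000 messages at 2 ms each keep it busy for 95% of the window.
nearly_saturated = capacity(285_000, 2.0)
```

A Bolt flagged this way is a candidate for more executors, as described for the Capacity indicator.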
Information in the Topology Visualization column:
Figure 10 Topology Visualization

Information in the Topology Configuration column:
Figure 11 Topology Configuration

Component-Spout details
You can click an ID in the Spouts column to go to the page.
This page consists of Component summary, Spout stats, Output stats (All time), Executors (All time), and Errors.
Information in the Component summary column:
Figure 12 Component summary

Id: ID of a topology component, which is set in the service code
Name: Topology name. You can click it to view topology details.
Executors: Number of running executors allocated by the Spout
Tasks: Number of running tasks allocated by the Spout
Information in the Spout stats column:
Figure 13 Spout stats

Windows: Time window that displays the running status of 10 minutes, 3 hours, 1 day, and All time. You can click a time window to view information in the time window.
Emitted: Number of sent messages; Number of times that the emit method of outputCollector is invoked
Transferred: Number of messages transferred to lower-level Bolts.
If a Spout uses the all group method (received by each Bolt) to transfer tuple to Bolt B and two tasks are started in Bolt B, the value of Transferred is twice that of Emitted.
Complete latency (ms): Average time taken for a topology message to be completely acknowledged. If it is not acknowledged, the value will be 0.
Acked: Number of topology messages that are successfully processed. If they are not acknowledged, the value will be 0.
Failed: Number of topology messages that fail to be processed or time out before they are acknowledged. If they are not acknowledged, the value will be 0.
If the ACK feature is disabled, the preceding three values are always 0 and carry no meaning.
Information in the Output stats column:
Figure 14 Output stats (All time)

Stream: Message stream name. The default name is default.
Emitted: Number of messages with the stream name; Number of times that the emit method of outputCollector is invoked
Transferred: Number of messages with the stream name transferred to lower-level Bolts.
If a Spout uses the all group method (received by each Bolt) to transfer tuple to Bolt B and two tasks are started in Bolt B, the value of Transferred is twice that of Emitted.
Complete latency (ms): Average time taken for messages with the stream name to be completely acknowledged. If they are not acknowledged, the value will be 0.
Acked: Number of messages with the stream name successfully processed. If they are not acknowledged, the value will be 0.
Failed: Number of messages with the stream name that fail to be processed or time out before they are acknowledged. If they are not acknowledged, the value will be 0.
If the ACK feature is disabled, the preceding three values are always 0 and carry no meaning.
Information in the Executors column
Figure 15 Executors (All time)

Id: Executor ID
Uptime: Running time of the executor thread
Host: Machine where the executor is running
Port: Port on which the executor is running
Emitted: Number of sent messages; Number of times that the emit method of outputCollector is invoked
Transferred: Number of messages transferred to lower-level Bolts.
If a Spout uses the all group method (received by each Bolt) to transfer tuple to Bolt B and two tasks are started in Bolt B, the value of Transferred is twice that of Emitted.
Complete latency (ms): Average time taken for a topology message to be completely acknowledged. If it is not acknowledged, the value is 0.
Acked: Number of messages successfully processed. If they are not acknowledged, the value will be 0.
Failed: Number of messages that fail to be processed or time out before they are acknowledged. If they are not acknowledged, the value will be 0.
If the ACK feature is disabled, the preceding three values are always 0 and carry no meaning.
Information in the Errors column
Figure 16 Errors

Time: Time when an error occurs
Error Host: Name of the host where an error occurs
Error Port: ID of the port where an error occurs
Error: Error information
Component-Bolt details
You can click an ID in the Bolts column to go to the page.
This page consists of Component summary, Bolt stats, Input stats (All time), Output stats (All time), Executors (All time), and Errors.
Information in the Component summary column:
Figure 17 Component summary

Id: ID of a topology component, which is set in the service code
Name: Topology name. You can click it to view topology details.
Executors: Number of running executors allocated by the Bolt
Tasks: Number of running tasks allocated by the Bolt
Information in the Bolt stats column:
Figure 18 Bolt stats

Windows: Time window that displays the running status of 10 minutes, 3 hours, 1 day, and All time. You can click a time window to view information in the time window.
Emitted: Number of sent messages; Number of times that the emit method of outputCollector is invoked
Transferred: Number of messages transferred to lower-level Bolts
If Bolt A uses the all group method (received by each Bolt) to transfer tuple to Bolt B and two tasks are started in Bolt B, the value of Transferred is twice that of Emitted.
Execute latency (ms): Average time spent on processing a message using the execute method
Executed: Number of messages processed using the execute method
Process latency (ms): Average time taken from the Bolt receiving a message to sending an ACK
Acked: Number of messages ACKed by the Bolt
Failed: Number of messages that the Bolt fails to ACK
Information in the Input stats column:
Figure 19 Input stats (All time)

Component: Component ID.
Stream: Message stream name. The default name is default.
Execute latency (ms): Average time spent on processing a message using the execute method
Executed: Number of messages processed using the execute method
Process latency (ms): Average time taken from the Bolt receiving a message to sending an ACK.
Acked: Number of messages ACKed by the Bolt
Failed: Number of messages that the Bolt fails to ACK
Information in the Output stats column:
Figure 20 Output stats (All time)

Stream: Message stream name. The default name is default.
Emitted: Number of messages with the stream name; Number of times that the emit method of outputCollector is invoked
Transferred: Number of messages with the stream name transferred to lower-level Bolts.
Information in the Executors column
Figure 21 Executors (All time)

Id: Executor ID
Uptime: Running time of the executor thread
Host: Machine where the executor is running
Port: Port on which the executor is running
Emitted: Number of sent messages; Number of times that the emit method of outputCollector is invoked
Transferred: Number of messages transferred to lower-level Bolts.
Capacity (last 10m): Performance indicator. A smaller value indicates better performance. When the value is close to 1, it indicates that the load is heavy and the execute method is continuously invoked. In this case, you need to increase the parallel degree of processing units. The calculation formula is (Number executed x Average execute latency) / Measurement time.
Execute latency (ms): Average time spent on processing a message using the execute method
Executed: Number of messages processed using the execute method
Process latency (ms): Average time taken from the Bolt receiving a message to sending an ACK.
Acked: Number of messages ACKed by the Bolt
Failed: Number of messages that the Bolt fails to ACK
Information in the Errors column
Figure 22 Errors

Time: Time when an error occurs
Error Host: Name of the host where an error occurs
Error Port: ID of the port where an error occurs
Error: Error information





















