Hello, everyone!
We have all come across the term 'Big Data' many times, but not many people know what Big Data really is and how it is useful in the modern world.
Businesses, government agencies, healthcare providers (HCPs), and financial and academic institutions are all harnessing the power of Big Data to boost business prospects and enhance customer experience.
Every day, the world produces almost 2.5 quintillion bytes (about 2.5 exabytes) of data. Nearly 90 percent of the world’s data was generated in the last two years alone.
Since Big Data is now used in nearly every industry, it is worth understanding what it really is. Let’s talk about Big Data, its applications and Huawei’s Big Data solution, FusionInsight.
BIG DATA
The term 'Big Data' refers to data that is so large, fast-moving and complex that it is extremely difficult or impossible to process with conventional methods.
A few simple characteristics make it much easier to define what Big Data is:
it refers to a vast volume of data that tends to grow exponentially over time;
it is so extensive that traditional data analysis methods cannot be used to process or evaluate it;
it encompasses data collection, data mining, data processing, data exchange and data visualization.
Now that we have a decent idea about Big Data, let’s talk about the types of Big Data.
TYPES OF BIG DATA
There are three types of Big Data:
structured;
unstructured;
semi-structured.
Let me break this down in simple words.
Structured
Structured data is information that is stored in an organized fashion, typically in databases. In other words, the data can be interpreted and saved in a fixed format.
Unstructured
Unstructured data is the opposite of structured data: it doesn’t have a clear format. This makes collecting and analyzing it very complicated and time-consuming.
Semi-structured
Semi-structured data is not stored in the conventional database format, but it includes certain organizational properties that make it easier to retrieve.
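To make the three types concrete, here is a minimal Python sketch. The sample records (a small customer table, a JSON event and a support message) are invented for illustration and do not come from any Huawei product.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (like a database table).
structured_csv = "id,name,amount\n1,Alice,120.50\n2,Bob,75.00\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))
total_amount = sum(float(row["amount"]) for row in rows)

# Semi-structured: no fixed table schema, but keys and nesting describe the data.
semi_structured_json = (
    '{"user": "Alice", "tags": ["vip", "mobile"], "device": {"os": "Android"}}'
)
event = json.loads(semi_structured_json)
device_os = event["device"]["os"]

# Unstructured: free text with no predefined fields. Even a simple question
# ("does this message mention a refund?") needs ad hoc text processing.
unstructured_text = (
    "Hi, I ordered last week and would like a refund because the box arrived damaged."
)
mentions_refund = "refund" in unstructured_text.lower()

print(total_amount)
print(device_os)
print(mentions_refund)
```

The structured rows can be summed by column name, the semi-structured event can be navigated by key, and the unstructured message can only be searched as raw text, which is why it is the hardest of the three to analyze.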
Assuming you now have a good idea about Big Data and its types, we’ll see what Huawei is doing in the Big Data market and what kind of solution it provides to meet new challenges. Huawei has been in the data analytics business for over 12 years, and its Big Data business has helped hundreds of enterprise customers across all regions. Huawei’s answer to the modern world’s big data problems is FusionInsight HD.
FusionInsight HD
Huawei FusionInsight HD is a distributed data processing system that provides massive data analysis and query capabilities. It meets the following requirements of enterprises:
swift integration and management of large data sets of various types;
advanced analysis of native information;
visualization of available data for special analysis;
creation of a development environment for new analysis applications;
optimization and scheduling of workloads.
FusionInsight provides a unified, enterprise-level platform for big data storage, query and analysis by enhancing the functions of the open-source Hadoop software.
HADOOP
You must be wondering what Hadoop is, right?
Hadoop is an open-source distributed processing framework that manages data processing and storage for big data applications in scalable clusters of computer servers.
Now, Hadoop is open source and it’s not perfect. It has its own flaws.
So Huawei takes the essence of open-source Hadoop, fixes bugs and improves some functions. Huawei’s FusionInsight is much more stable than open-source Hadoop.
It supports the swift integration and management of large data sets of various types, advanced analysis of native information and visualization of available data for special analysis.
With the help of FusionInsight, enterprises can capture new opportunities and discover risks by analyzing and mining massive, varied data sets.
FusionInsight ARCHITECTURE
I’ll explain the FusionInsight architecture in detail in another article.
ADVANTAGES OF FusionInsight
Agile;
Intelligent;
Convergent.
Agile
Provides a range of data processing capabilities, covering converged data warehouse, offline processing, real-time stream computing, real-time retrieval, interactive query, and relationship analysis.
Supports unified multi-cluster and multi-tenant management.
Supports rolling upgrade with zero downtime.
Uses Elk and Spark SQL, which are compliant with SQL standards.
Intelligent
The graph database responds to correlated data analyses covering tens of billions of records within seconds, promptly returning query results covering hundreds of billions of relationships spanning tens of billions of nodes.
RTD enables millisecond-level risk control, shifting risk control from post-event to real time.
Integrates more than ten algorithms, allowing unified algorithm management and improving the resource utilization of AI clusters by about 100%.
Convergent
Provides DLF for one-stop data integration, development, and management.
Converges Hadoop and MPPDB data.
Supports hybrid deployment of x86 and ARM servers.
Below are some technical terms that will be useful for engineers.
HDFS
Provides data access with high throughput; can process large-scale data sets.
Yarn
As the resource management system of Hadoop 2.0, Yarn implements resource management and scheduling for applications.
Spark
An in-memory distributed computing framework.
Elk
Provides standard SQL engine and enables conventional applications to be smoothly migrated to the Big Data platform.
MapReduce
A distributed computing engine supporting massive offline batch processing.
Flink
A unified computing framework for batch and stream processing. At its core is a stream processing engine that supports data distribution and parallel computing.
Storm
A distributed, reliable, and fault-tolerant real-time stream data processing system. It provides a SQL-like query language (StreamCQL).
Solr
An independent, enterprise-class application search server based on Apache Lucene.
Kafka
A distributed, partitioned, and replicated publish-subscribe messaging system.
Loader
Exchanges data and files between FusionInsight, relational databases, and file systems.
HBase
A column-oriented distributed storage system suitable for mass unstructured or semi-structured data that provides high availability, performance, and scalability. HBase supports real-time data read and write.
Flume
A distributed mass log collection, aggregation, and transmission system that provides high availability and reliability.
Huawei GaussDB integrates AI technology into the database kernel architecture and algorithms, providing users with distributed databases that offer higher performance, higher availability, and more diverse computing power.
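The MapReduce entry in the list of terms above describes a batch computing engine. As a rough illustration of the programming model only, the following single-process Python sketch runs the classic map, shuffle and reduce phases for a word count. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical. This is not the Hadoop or FusionInsight API, and a real job would distribute these phases across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of text lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, standing in for files stored on HDFS.
sample_lines = ["big data is big", "data is everywhere"]
counts = word_count(sample_lines)
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both phases can run in parallel on many machines, which is what makes the model suitable for massive offline batch processing.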
SUCCESS CASES OF FusionInsight
the Huawei Smart Transportation Solution;
Intelligent Marketing with Big Data.
This wraps up my introductory article on Big Data and FusionInsight.
Below you can find all the useful links to learn more about Big Data and FusionInsight.
USEFUL LINKS
1. https://e.huawei.com/ae/solutions/cloud-computing/big-data;
2. https://e.huawei.com/en/publications/global/ict_insights/hw_376150/feature story/HW_376292;
3. https://e.huawei.com/ae/solutions/industries/finance/reshape-data/big-data;
4. https://www.huaweicloud.com/en-us/solution/government/gbd.html;
5. https://www.youtube.com/watch?v=v-UzYYjZ4RE.
FusionInsight DOCUMENTATION