What Are the Differences Between MPPDB and Hadoop?

Created: Feb 23, 2022 07:26:02
Latest reply: Feb 23, 2022 08:16:47
  Rewarded HiCoins: 0 (problem resolved)

Hello, engineers!

What are the differences between MPPDB and Hadoop?

Thanks in advance!


Featured Answers
olive.zhao
Admin Created Feb 23, 2022 07:44:45

Hello, friend!

MPP: An MPPDB processes TB/PB-level structured data. It offers good scalability, low latency, interactive query capability, and high availability.

Hadoop: Hadoop processes PB/EB-level data, including unstructured data. It offers high scalability and high availability but relatively high latency, so it is generally used for batch processing.


Data volume
  • MPP: Processes large volumes of data (TB/PB level).
  • Hadoop: Processes very large volumes of data (PB/EB level).

Historical data
  • MPP: Not suitable for storing large amounts of historical data.
  • Hadoop: Can store as much historical data as needed and access it at any time.

Data distribution
  • MPP: Data is distributed across nodes in hash or round-robin ("loop") mode, so data on different nodes is related. This places high demands on the interconnect between nodes and limits scalability.
  • Hadoop: Data is distributed randomly and is not related between nodes, so high scalability can be achieved.

Data analysis
  • MPP: Provides flexible data analysis capabilities and can be integrated with a wide range of software.
  • Hadoop: Support from data analysis software is gradually improving.

Transaction support
  • MPP: Strict ACID guarantees ensure data integrity.
  • Hadoop: Supported only by some commercial products.

Data structure
  • MPP: Strictly structured data.
  • Hadoop: Flexible; handles unstructured and semi-structured data.

Response time
  • MPP: Low latency; supports interactive analysis.
  • Hadoop: Comparatively high latency; generally used for batch processing. Real-time or near-real-time computation requires specific technologies.

Performance optimization
  • MPP: Requires additional optimization work (indexes, partitions, and so on).
  • Hadoop: Optimized for specific application scenarios and not efficient in others.
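
To make the "Data distribution" point more concrete, here is a minimal Python sketch of the two placement strategies mentioned for MPP. It assumes that "loop mode" means round-robin placement; the function names, the `customer_id` column, and the three-node cluster are all hypothetical and are not taken from any particular MPPDB product.

```python
import zlib
from collections import defaultdict


def hash_distribute(rows, key, num_nodes):
    """Place each row on the node chosen by a stable hash of its key column."""
    nodes = defaultdict(list)
    for row in rows:
        node_id = zlib.crc32(str(row[key]).encode("utf-8")) % num_nodes
        nodes[node_id].append(row)
    return dict(nodes)


def round_robin_distribute(rows, num_nodes):
    """Deal rows out to the nodes in turn, ignoring their content."""
    nodes = defaultdict(list)
    for i, row in enumerate(rows):
        nodes[i % num_nodes].append(row)
    return dict(nodes)


# Hypothetical sample data: (customer_id, amount) pairs.
orders = [
    {"customer_id": c, "amount": a}
    for c, a in [(1, 10), (2, 20), (1, 30), (3, 40), (2, 50), (1, 60)]
]

by_hash = hash_distribute(orders, "customer_id", 3)
by_turn = round_robin_distribute(orders, 3)
```

With hash placement, all rows that share a `customer_id` end up on the same node, so a join or aggregation on that key can run locally. With round-robin placement the rows are spread evenly, but related rows may sit on different nodes and have to be moved over the interconnect at query time, which is one reason the interconnect matters so much for MPP.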

Hope this helps!


lisali Created Feb 23, 2022 08:14:42
Thanks for your reply!  
All Answers
stephen.xu Admin Created Feb 23, 2022 07:34:37

Hello, dear.
Please kindly wait for a while.

lisali Created Feb 23, 2022 08:13:50
Thanks!  
faysalji Moderator Author Created Feb 23, 2022 07:59:30

A detailed overview of the differences between the two:

Platform openness
  • MPP: Closed and proprietary. For some technologies, non-customers cannot even download the documentation.
  • Hadoop: Completely open source, with both vendor and community resources freely available on the internet.

Hardware options
  • MPP: Many solutions are appliance-only, so you cannot deploy the software on your own cluster. All the solutions require specific enterprise-grade hardware such as fast disks, servers with large amounts of ECC RAM, and 10GbE/InfiniBand.
  • Hadoop: Any hardware works; vendors provide some configuration guidelines. Most recommendations are to use cheap commodity hardware with DAS.

Scalability (nodes)
  • MPP: Tens of nodes on average; 100-200 at most.
  • Hadoop: 100 nodes on average; several thousand at most.

Scalability (user data)
  • MPP: Tens of terabytes on average; a petabyte at most.
  • Hadoop: Hundreds of terabytes on average; tens of petabytes at most.

Query latency
  • MPP: 10-20 milliseconds.
  • Hadoop: 10-20 seconds.

Query average runtime
  • MPP: 5-7 seconds.
  • Hadoop: 10-15 minutes.

Query maximum runtime
  • MPP: 1-2 hours.
  • Hadoop: 1-2 weeks.

Query optimization
  • MPP: Complex enterprise query optimizer engines, kept as one of the most valuable corporate secrets.
  • Hadoop: No optimizer, or an optimizer with very limited functionality, sometimes not even cost-based.

Query debugging and profiling
  • MPP: Representative query execution plans and query execution statistics; explanatory error messages.
  • Hadoop: OOM issues and Java heap dump analysis, GC pauses on the cluster components, and separate logs for each task give you lots of "fun" time.

Technology price
  • MPP: Tens to hundreds of thousands of dollars per node.
  • Hadoop: Free, or up to thousands of dollars per node.

Accessibility for end users
  • MPP: Simple, friendly SQL interface and simple, interpretable in-database functions.
  • Hadoop: SQL is not completely ANSI-compliant, and users must take care of the execution logic and the underlying data layout. Functions usually have to be written in Java, compiled, and deployed to the cluster.

Target end-user audience
  • MPP: Business analysts.
  • Hadoop: Java developers and experienced DBAs.

Single job redundancy
  • MPP: Low; the job fails when an MPP node fails.
  • Hadoop: High; the job fails only if the node that manages the job execution fails.

Target systems
  • MPP: General DWH and analytical systems.
  • Hadoop: Purpose-built data processing engines.

Vendor lock-in
  • MPP: Typical.
  • Hadoop: Rare, and usually caused by technology misuse.

Minimal recommended collection size
  • MPP: Any.
  • Hadoop: Gigabytes.

Maximal concurrency
  • MPP: Tens to hundreds of queries.
  • Hadoop: Up to 10-20 jobs.

Technological extensibility
  • MPP: Vendor-provided tools only.
  • Hadoop: Can be combined with any newly introduced open source tools (Spark, Samza, Tachyon, etc.).

DBA skill level requirement
  • MPP: Average RDBMS DBA.
  • Hadoop: Top-notch, with a good Java and RDBMS background.

Solution implementation complexity
  • MPP: Moderate.
  • Hadoop: High.
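
The difference in accessibility for end users can be illustrated with a small, hedged Python sketch. It computes the same per-region total twice: once declaratively through SQL, and once as hand-written map, shuffle, and reduce steps. Note the assumptions: the standard-library `sqlite3` module stands in for an SQL interface here (it is not an MPP database), the map/reduce functions are plain Python rather than real Hadoop jobs (which, as noted above, are usually written in Java), and the `sales` data is made up.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, amount) records.
sales = [("north", 10), ("south", 20), ("north", 30), ("east", 5)]

# Declarative style: state what you want and let the engine plan the execution.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
sql_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
conn.close()


# Map/shuffle/reduce style: the developer spells out every execution step.
def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for region, amount in records:
        yield region, amount


def shuffle_phase(pairs):
    """Group all values that share a key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each group of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


mr_totals = reduce_phase(shuffle_phase(map_phase(sales)))
```

Both approaches yield the same totals. The SQL version is a single statement that a business analyst can write, while the second version requires the author to think about keys, grouping, and the order of the phases, which matches the "user should care about the execution logic" remark above.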

Source


lisali Created Feb 23, 2022 08:15:00
Thanks for your reply!  
Saqibaz Created Feb 23, 2022 08:16:47

Thanks for sharing