Can there ever be too much data in big data?

Hello everyone,

The answer to the question is a resounding YES. There can absolutely be too much data in a big data project.

There are numerous ways in which this can happen, and various reasons why professionals need to limit and curate data in any number of ways to get the right results.

In general, experts talk about separating the "signal" from the "noise" in a model. In a sea of big data, the data that actually carries insight becomes hard to isolate. In some cases, you're looking for a needle in a haystack.

For example, suppose a company is trying to use big data to generate specific insights about one segment of its customer base and that segment's purchases over a specific time frame.

Taking in an enormous amount of data may pull in records that are not relevant to the question, or it might even introduce a bias that skews the results in one direction or another.

Excess data also slows the process down dramatically, as computing systems have to wrestle with larger and larger data sets.

In many kinds of projects, it's important for data engineers to curate the data down to restricted, specific data sets. In the case above, that would mean only the data for the customer segment being studied, only the data for the time frame being studied, and an approach that weeds out additional identifiers or background information that can confuse things or slow down systems.
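As a minimal sketch of that kind of curation, the snippet below filters a small set of purchase records down to one segment and one time window, and keeps only the fields the analysis needs. The record layout, field names, and the `curate` helper are all hypothetical, invented here for illustration; a real pipeline would more likely do this in a query engine or a data-frame library.

```python
from datetime import date

# Hypothetical raw purchase records; every field name here is an assumption.
purchases = [
    {"customer_id": 1, "segment": "students", "date": date(2023, 1, 15), "amount": 20.0, "browser": "Firefox"},
    {"customer_id": 2, "segment": "retirees", "date": date(2023, 2, 3), "amount": 75.0, "browser": "Chrome"},
    {"customer_id": 3, "segment": "students", "date": date(2022, 11, 30), "amount": 12.5, "browser": "Safari"},
    {"customer_id": 4, "segment": "students", "date": date(2023, 3, 9), "amount": 40.0, "browser": "Edge"},
]


def curate(records, segment, start, end, keep_fields):
    """Keep only rows for one segment and time window, and only the listed fields."""
    return [
        {field: row[field] for field in keep_fields}
        for row in records
        if row["segment"] == segment and start <= row["date"] <= end
    ]


# Study only the "students" segment in the first quarter of 2023,
# and drop background fields such as "browser" that the question does not need.
curated = curate(
    purchases,
    segment="students",
    start=date(2023, 1, 1),
    end=date(2023, 3, 31),
    keep_fields=("customer_id", "date", "amount"),
)

for row in curated:
    print(row)
```

Filtering as early as possible means the later, more expensive steps only ever see the rows and columns that matter to the question.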

For more, let's look at how this works at the frontier of machine learning.

Machine learning experts talk about something called "overfitting", where an overly complex model produces less effective results once the machine learning program is turned loose on new production data.

Overfitting happens when a model matches its initial training set too closely, noise included, and so cannot easily adapt to new data.
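A small, self-contained sketch can make this concrete. It assumes a made-up data set in which the true relationship is y = 2x and the training values carry alternating +1/-1 noise. A simple straight-line fit is compared with a polynomial that passes through every training point exactly. The function names and the data are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line: the simple model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolating_polynomial(xs, ys):
    """Lagrange polynomial through every training point: the overly complex model."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Training data: true signal y = 2x plus alternating +1 / -1 noise (assumed for the demo).
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [2 * x + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(train_x)]

# "New" data: noise-free values of the same signal at points not seen in training.
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [2 * x for x in test_x]

line = fit_line(train_x, train_y)
poly = fit_interpolating_polynomial(train_x, train_y)

print("line: train MSE", mse(line, train_x, train_y), "test MSE", mse(line, test_x, test_y))
print("poly: train MSE", mse(poly, train_x, train_y), "test MSE", mse(poly, test_x, test_y))
```

The polynomial reproduces the training set perfectly because it has memorized the noise, yet it does worse than the plain line on the unseen points. That gap between training and new-data error is the pattern the term describes.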

Technically, overfitting is caused not by having too many data samples but by a model that tracks too many data points and features too closely. Still, you could argue that having too much data can be a contributing factor, particularly when each record carries many attributes. Dealing with this curse of dimensionality involves some of the same techniques used in earlier big data projects, where professionals tried to pinpoint exactly what they were feeding into IT systems.
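One very simple way to cut down dimensions is to drop features that never vary and so cannot help distinguish one record from another. The post does not name a specific technique, so the variance filter below is a swapped-in illustration. The records, field names, and helper functions are all hypothetical.

```python
from statistics import pvariance

# Hypothetical customer records; "country_code" is identical in every row.
rows = [
    {"age": 23, "spend": 120.0, "country_code": 1, "visits": 4},
    {"age": 35, "spend": 80.0, "country_code": 1, "visits": 9},
    {"age": 41, "spend": 200.0, "country_code": 1, "visits": 2},
    {"age": 29, "spend": 150.0, "country_code": 1, "visits": 7},
]


def low_variance_features(records, threshold=0.0):
    """Return the feature names whose population variance is at or below the threshold."""
    return [
        name
        for name in records[0]
        if pvariance([record[name] for record in records]) <= threshold
    ]


def drop_features(records, to_drop):
    """Return copies of the records without the listed features."""
    return [
        {name: value for name, value in record.items() if name not in to_drop}
        for record in records
    ]


uninformative = low_variance_features(rows)
reduced = drop_features(rows, uninformative)

print("dropped:", uninformative)
print("remaining features:", list(reduced[0]))
```

A threshold of zero only catches constant columns. Raising it would require putting the features on a comparable scale first, since raw variances depend on units.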

The bottom line is that big data can be enormously helpful to companies, or it can become a major challenge. One aspect of this is whether the company has the right data in play. Experts know that it's not advisable to simply dump all data assets into a hopper and hope insights come out the other end. In new cloud-native and sophisticated data systems, there's an effort to control, manage, and curate data in order to get more accurate and efficient use out of data assets.
