How essential is a Hadoop infrastructure in a big data environment?
Because Hadoop came into vogue at the same time big data did, they became synonymous. [But] they're two different things. Hadoop is a parallel programming model that is implemented on a bunch of low-cost clustered processors, and it's intended to support data-intensive distributed applications. That's what Hadoop is all about. It existed prior to the fascination with big data that we're hearing about today. But since Hadoop was out there, it was seized on as sort of an architecture for building big data infrastructure. It rests on Google's MapReduce algorithms, which are a way to distribute an application across clusters. Google's file system, operating system, MapReduce applications and Hadoop Distributed File System [HDFS] are mostly built on Java, which introduces its own set of problems. Hadoop also claims to provide resiliency through internodal failover. With most clusters, if one node fails, it's supposed to fail over to another cluster.