Four Advantages of Spark
Fast: Compared with Hadoop MapReduce, Spark's in-memory computation can be more than 100 times faster, and even its disk-based computation is more than 10 times faster. Spark achieves this with an efficient DAG (directed acyclic graph) execution engine that processes data flows in memory.
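The DAG idea can be illustrated with a toy sketch (plain Python, not Spark's actual API): transformations such as map and filter are merely recorded as deferred steps, and the whole chain only runs when an action like collect is called, with data kept in memory throughout.

```python
# Toy model of lazy DAG-style execution; class and method names are
# illustrative assumptions, not part of any Spark library.
class LazyDataset:
    def __init__(self, data, steps=None):
        self.data = data          # source data held in memory
        self.steps = steps or []  # deferred transformations (the "DAG")

    def map(self, f):
        # Record the step; do not execute anything yet.
        return LazyDataset(self.data, self.steps + [("map", f)])

    def filter(self, pred):
        return LazyDataset(self.data, self.steps + [("filter", pred)])

    def collect(self):
        # Action: run the whole recorded pipeline in one pass.
        out = self.data
        for kind, f in self.steps:
            if kind == "map":
                out = [f(x) for x in out]
            else:
                out = [x for x in out if f(x)]
        return out

squares_of_evens = (
    LazyDataset(range(10))
    .filter(lambda x: x % 2 == 0)
    .map(lambda x: x * x)
    .collect()
)
print(squares_of_evens)  # [0, 4, 16, 36, 64]
```

In real Spark the recorded DAG is additionally optimized and split into stages before execution; this sketch only shows the deferred-then-executed shape of the model.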
Easy to use: Spark offers Java, Python, and Scala APIs with more than 80 high-level operators, enabling users to quickly build a wide range of applications. Spark also ships interactive Python and Scala shells, so a solution can be verified against a Spark cluster directly from the shell instead of going through a package-upload-run cycle. This matters a great deal for prototyping.
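The interactive shells are launched from the Spark distribution itself; as a sketch (the SPARK_HOME path and master URL here are assumptions, not from the article):

```shell
# Assumes SPARK_HOME points at an unpacked Spark distribution.
# "local[4]" runs Spark locally with 4 worker threads; replace it with a
# cluster URL to verify the same code against a real cluster.
$SPARK_HOME/bin/pyspark --master "local[4]"       # interactive Python shell
$SPARK_HOME/bin/spark-shell --master "local[4]"   # interactive Scala shell
```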
Universality: Spark provides a unified solution. Spark can be used for batch processing, interactive queries (via Spark SQL), real-time stream processing (via Spark Streaming), machine learning (via Spark MLlib), and graph computation (via Spark GraphX).
All of these kinds of processing can be combined seamlessly within the same application. This unified stack is attractive because companies want a single platform for their workloads: it reduces the human cost of development and maintenance as well as the hardware cost of deploying separate platforms. And Spark does not sacrifice performance to be a unified solution; on the contrary, performance is one of its great strengths.
Compatibility: Spark integrates easily with other open-source products. For example, it can use Hadoop YARN or Apache Mesos as its resource manager and scheduler, and it can process any data source Hadoop supports, including HDFS, HBase, and Cassandra. This is especially important for users who have already deployed Hadoop clusters, because they can tap Spark's processing power without any data migration. Spark can also run without a third-party resource manager and scheduler: its built-in Standalone mode handles resource management and scheduling itself, which further lowers the barrier to entry and makes it easy for anyone to deploy and use Spark. In addition, Spark provides tools for deploying a Standalone Spark cluster on EC2.
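Switching between these cluster managers amounts to changing the master URL passed to spark-submit; as a hedged sketch (host names, ports, and the app.py script are placeholders, not from the article):

```shell
# The same application can be submitted to any supported cluster manager
# just by changing --master; everything after the flag is a placeholder.
spark-submit --master yarn app.py                       # Hadoop YARN
spark-submit --master mesos://mesos-master:5050 app.py  # Apache Mesos
spark-submit --master spark://spark-master:7077 app.py  # built-in Standalone
spark-submit --master "local[4]" app.py                 # local testing
```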
----------------
Copyright notice: This is an original article by the CSDN blogger "Explosion of the Small Universe", licensed under the CC 4.0 BY-SA agreement. Please include a link to the original source and this notice when reprinting.
Original link: https://blog.csdn.net/yu0_zhang0/article/details/80056951