Definition of Spark

Spark is an in-memory distributed computing framework. In iterative computing scenarios, it keeps data in memory between processing steps instead of writing intermediate results to disk, which can make it 10 to 100 times faster than MapReduce. Spark can use HDFS as its underlying storage system, so users can switch from MapReduce to Spark quickly. Spark also provides one-stop data analysis capabilities, including micro-batch stream processing, offline batch processing, SQL queries, and data mining, and users can combine all of these seamlessly within a single application.
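The benefit of keeping data in memory for iterative work can be illustrated with a small toy model. The sketch below is plain Python, not the Spark API: the names CountingStorage, iterate_rereading, and iterate_cached are hypothetical and exist only for this illustration. It counts how often the input is read from a slow store when every iteration reloads it, compared with loading it once and reusing the in-memory copy.

```python
# Toy model (NOT the Spark API): compare how many times an iterative job
# reads its input from storage with and without an in-memory copy.


class CountingStorage:
    """Hypothetical stand-in for a slow store such as HDFS; counts full reads."""

    def __init__(self, records):
        self._records = list(records)
        self.reads = 0

    def load(self):
        # Every call models one full scan of the stored dataset.
        self.reads += 1
        return list(self._records)


def refine(estimate, data):
    """One iteration step: move the estimate halfway toward the data mean."""
    mean = sum(data) / len(data)
    return estimate + 0.5 * (mean - estimate)


def iterate_rereading(storage, iterations):
    """Reload the input on every pass, as a chain of separate batch jobs would."""
    estimate = 0.0
    for _ in range(iterations):
        data = storage.load()
        estimate = refine(estimate, data)
    return estimate


def iterate_cached(storage, iterations):
    """Load the input once and reuse the in-memory copy on every pass."""
    data = storage.load()
    estimate = 0.0
    for _ in range(iterations):
        estimate = refine(estimate, data)
    return estimate
```

Both functions compute the same result, but with 10 iterations the first performs 10 storage reads and the second performs 1. The actual speedup in a real cluster depends on the workload, data size, and available memory.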
