Matei Zaharia

It depends a bit on what you mean by "Hadoop". Some people take Hadoop to mean a whole ecosystem (HDFS, Hive, MapReduce, etc.), in which case Spark is designed to fit well within that ecosystem: it reads from any input source that MapReduce supports through the InputFormat interface, is compatible with Hive and YARN, and so on. Others refer to Hadoop MapReduce in particular, in which case I think it's very likely that non-MapReduce engines will take over in a lot of domains, and in many cases they already have.

From this latter point of view, perhaps the most interesting thing about Spark is that it shows that a lot of workloads can be captured efficiently by the same, simple generalization of the MapReduce model. Spark can match (and sometimes beat) state-of-the-art performance not only in simple ETL, but also in machine learning, graph processing, streaming, and relational queries. Importantly, this means that applications can combine these workloads more efficiently. For example, once you ETL data in, you can easily compute a report or run a training algorithm on the same in-memory data. Furthermore, you get a single programming interface for combining these jobs, and only one system to install and manage.
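To make the "same in-memory data" point concrete, here is a toy sketch in plain Python. It is not Spark's API: the `Dataset` class, its method names, and the sample records are all hypothetical, and everything runs on a single machine. It only illustrates the shape of the idea, namely that one cleaned dataset held in memory can feed both a report and a training loop through the same small set of operators.

```python
from functools import reduce as _reduce


class Dataset:
    """Hypothetical stand-in for a distributed in-memory collection (not Spark's API)."""

    def __init__(self, records):
        # Materialize once so later workloads reuse the same in-memory data.
        self.records = list(records)

    def map(self, fn):
        return Dataset(fn(r) for r in self.records)

    def filter(self, pred):
        return Dataset(r for r in self.records if pred(r))

    def reduce(self, fn):
        return _reduce(fn, self.records)

    def reduce_by_key(self, fn):
        # Combine the values of (key, value) pairs that share a key.
        groups = {}
        for key, value in self.records:
            groups[key] = fn(groups[key], value) if key in groups else value
        return Dataset(groups.items())

    def collect(self):
        return list(self.records)


# Made-up input: "user,x,y" lines, one of them malformed.
raw_lines = ["alice,3,9.0", "bob,1,2.5", "bad line", "alice,2,6.5", "bob,4,11.0"]


def parse(line):
    parts = line.split(",")
    if len(parts) != 3:
        return None
    user, x, y = parts
    return (user, float(x), float(y))


# ETL step: parse and clean once, keep the result in memory.
clean = Dataset(raw_lines).map(parse).filter(lambda r: r is not None)

# Workload 1: a report (total y per user) over the cleaned data.
report = dict(
    clean.map(lambda r: (r[0], r[2])).reduce_by_key(lambda a, b: a + b).collect()
)


# Workload 2: iterative training (fit y = w * x by gradient descent) over the
# same cleaned data, with each gradient expressed as a map followed by a reduce.
def train(data, steps=200, lr=0.01):
    points = data.map(lambda r: (r[1], r[2]))
    n = len(points.records)
    w = 0.0
    for _ in range(steps):
        grad = points.map(lambda p: 2 * (w * p[0] - p[1]) * p[0]).reduce(
            lambda a, b: a + b
        ) / n
        w -= lr * grad
    return w


w = train(clean)
print(report)
print(w)
```

The training loop rereads `clean` on every iteration without going back to the raw input, which is the kind of reuse that a one-pass MapReduce job does not express directly.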

How much will this matter? It's hard to predict, but one possibility is that after experimenting with specialized computing models, distributed programmers will want a general model, in the same way that programmers for a single machine settled on general-purpose languages. Having a general platform is even more important in big data, because data is so expensive to move across systems! In this respect, Spark shows that many of the tricks used in specialized systems today (e.g. column-oriented processing, graph partitioning) can be implemented on a general platform.

In any case, it is a first-order goal of the system to stay compatible with the wider Hadoop ecosystem, and just give people better ways to compute on the same data. The Hadoop ecosystem is also moving quickly towards supporting alternative programming models, through efforts like YARN.
