Editor’s note: This article is a repost from August 2018, when Tom Reilly was Cloudera’s CEO and co-founder Mike Olson was its Chief Strategy Officer. Neither man remains at the company.
When Cloudera held its quarterly conference call with investors in June 2018, Deutsche Bank analyst Karl Keirstead presented CEO Tom Reilly and co-founder Mike Olson with an interesting question:
“When I go to Cloudera’s website, I don’t even see the word Hadoop anymore. When I listen to this earnings call, I don’t hear it anymore, and I’m just wondering whether the pivot that Tom (Reilly, Cloudera CEO) laid out at the beginning of the call where you are moving to more ML (machine learning), analytics and cloud. I’m just wondering does that sort of motivate you to pivot even faster away from those core Hadoop elements to either more Cloudera proprietary IP or perhaps different open source software, whether these shifts Tom talked about are almost forcing an accelerated shift away from those core Hadoop roots?”
You can’t blame Keirstead for wanting clarification. Cloudera made its name as one of the leading, if not the leading, Hadoop evangelists; Hadoop “inventor” Doug Cutting is one of Cloudera’s best-known employees; and Cloudera (together with O’Reilly) sponsored Hadoop’s biggest conference, Strata Hadoop World, for almost a decade. (The conference’s name was changed to Strata Data Conference last year.)
Olson provided Keirstead with a thoughtful response:
“Look, we are in no way ashamed of Hadoop. It is still a core foundation element of our platform. But those projects, HDFS, [inaudible] scale-out storage system, MapReduce, the scale-out processing engine, they were all we had 10 years ago when we started the company.
“Today we’ve got a rich suite of analytic engines, Impala for distributed query and Spark for streaming processing and model training and so on. We’ve got a rich collection of storage technologies, not just HDFS but on Amazon S3 native storage, on Microsoft ADLS native storage, even IoT native storage for workloads that demand that in the Apache Kudu project. So it’s just a much more interesting platform than before.”
In other words, according to Olson, Hadoop may not be all that interesting, but other data platform technologies, tools and approaches (like Cloud, Spark, Containers, and Edge, among many others) are.
It’s worth noting that the number of vendors on the “Hadoop” playground is (surprisingly) large. Consider that Gartner analysts Nick Heudecker, Merv Adrian, and Ankush Jain listed 14 (Amazon Web Services (AWS), Cloudera, Cray, Google Cloud Platform, Hortonworks, Huawei, IBM, MapR Technologies, Microsoft, Oracle, Qubole, Seabox, Teradata and Transwarp) in their Market Guide for Hadoop Distributions (fee required) in 2017. Add to that the mere fact that Gartner published a report with the aforementioned title and you have proof positive that enterprise IT managers are still shopping for something they are calling “Hadoop.”
Not only that, but Gartner also wrote that firms were spending close to $800 million on Hadoop distributions, even though a mere 14 percent of enterprises reported relying on the technology. (There’s a clear opportunity for someone here.)
Moreover, Google certainly isn’t shying away from the word “Hadoop.” At its Google Next conference last month, the company spoke from the main stage about Cloud Dataproc, a Google-managed Spark and Hadoop service.
Enterprises are also spending big dollars to pay professionals who have experience with Hadoop. Consider the salaries for jobs in which “Hadoop” is a requirement. (source: Dice)
So when you hear someone say, “Hadoop is important, but it’s not where it’s at these days,” look closely at what they have to offer, what they are trying to sell, and what it means to both you and your organization.