Google Is Adding Another Product In Its Range Of Big Data Services, Google Cloud Dataproc

By Priya Rana

Posted on September 24, 2015

Google is adding another product to its range of big data services on the Google Cloud Platform today. The new Google Cloud Dataproc service, which is now in beta, sits between running and managing the Spark data processing engine or the Hadoop framework yourself on virtual machines and a fully managed service like Cloud Dataflow, which lets you orchestrate your data pipelines on Google's platform.

Greg DeMichillie, director of product management for Google Cloud Platform, told me Dataproc users will be able to spin up a Hadoop cluster in under 90 seconds, which he described as significantly faster than other services, and Google will charge only 1 cent per virtual CPU per hour in the cluster. That's on top of the usual cost of running virtual machines and data storage, but as DeMichillie noted, you can add Google's cheaper preemptible instances to your cluster to save a bit on compute costs. Billing is per minute, with a 10-minute minimum.
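To make the pricing concrete, here is a minimal sketch of the Dataproc surcharge alone, using only the figures quoted above (1 cent per vCPU per hour, per-minute billing, 10-minute minimum). It deliberately excludes the separate VM and storage charges. Rounding a partial minute up to the next whole minute is an assumption, as is the helper name `dataproc_fee`; neither comes from Google.

```python
import math
from decimal import Decimal

# Rate quoted in the article: 1 cent per virtual CPU per hour.
RATE_PER_VCPU_HOUR = Decimal("0.01")
# Per-minute billing with a 10-minute minimum, as quoted in the article.
MINIMUM_MINUTES = 10


def dataproc_fee(vcpus: int, runtime_minutes: float) -> Decimal:
    """Hypothetical helper: Dataproc surcharge in US dollars.

    Covers only the per-vCPU Dataproc fee; VM and storage costs are extra.
    Assumption: partial minutes are rounded up to a whole minute.
    """
    if vcpus < 0 or runtime_minutes < 0:
        raise ValueError("vcpus and runtime_minutes must be non-negative")
    # Apply the 10-minute minimum after rounding up to whole minutes.
    billed_minutes = max(math.ceil(runtime_minutes), MINIMUM_MINUTES)
    # Convert the hourly per-vCPU rate into a per-minute charge.
    return RATE_PER_VCPU_HOUR * vcpus * billed_minutes / 60


if __name__ == "__main__":
    # A 16-vCPU cluster running for 30 minutes.
    print(dataproc_fee(16, 30))
```

With these numbers, a 16-vCPU cluster that runs for 30 minutes adds 8 cents in Dataproc fees, while a 3-minute job is billed the same as a 10-minute one because of the minimum.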

DeMichillie and James Malone, Google's product manager for big data products, told me the service's speed comes partly from Google's network infrastructure, but also from patches the company made to a few Spark issues (related to YARN, the open source resource manager it uses for this product) and from optimized images.

DeMichillie acknowledges that some people simply want full control over their data pipeline and processing architecture and are therefore more likely to run and manage their own virtual machines. In his view, though, Dataproc users won't have to make any real tradeoffs compared with setting up their own infrastructure.