Monday, October 26, 2015

IBM Launches Apache Spark-as-a-Service

IBM is launching a Spark-as-a-Service offering on Bluemix, following a successful 13-week beta program in which more than 4,600 developers used it to build intelligent business and consumer apps fueled by data.

IBM also confirmed that it has redesigned more than 15 of its core analytics and commerce solutions with Apache Spark.

Apache Spark was developed by the AMPLab at UC Berkeley as an open-source cluster computing framework. It offers in-memory processing and is known for its ease of use in creating algorithms that harness insight from complex data.

“For data scientists and engineers who want to do more with their data, the power and appeal of open source innovation for technologies like Spark is undeniable,” said Rob Thomas, Vice President of Product Development, IBM Analytics. “IBM is committed to using Spark as the foundation for its industry-leading analytics platform, and by offering a fully managed Spark service on IBM Bluemix, data professionals can access and analyze their data faster than ever before, with significantly reduced complexity.”

http://www-03.ibm.com/press/us/en/pressrelease/47946.wss

Databricks: Apache Spark Outgrowing Hadoop


Standalone deployments of Spark now outnumber those on YARN as more users run Spark independently of Hadoop, according to a newly published survey of Spark users conducted by Databricks, the company founded by the creators of Apache Spark. Databricks said that users running Spark in standalone mode (48 percent of respondents) outnumber those running Spark on YARN (40 percent of respondents), alongside a majority of users running Spark in...

Google Cloud Dataproc Brings Fast Hadoop & Spark Cluster Provisioning


Google introduced new capabilities for managing clusters of Hadoop and Spark. Google Cloud Dataproc, which is now in beta, is a managed Spark and Hadoop service that leverages open source data tools for batch processing, querying, streaming, and machine learning. The service can be used to create and manage clusters ranging in size from three to hundreds of nodes. Google said Cloud Dataproc can create Spark and Hadoop clusters in 90 seconds...

IBM Backs Apache Spark for Cloud Data Processing


IBM is putting its weight behind Apache Spark, an open source engine for large-scale data processing that is compatible with Hadoop data. Apache Spark can run in Hadoop clusters through YARN or in Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like