Showing posts with label #Hadoop. Show all posts

Thursday, November 5, 2015

Cask Data Raises $20 Million for Enterprise-class Apache Hadoop

Cask Data, a start-up based in Palo Alto, California, raised $20 million in Series B funding for its enterprise-class Apache Hadoop solutions.

Cask's flagship offering, the Cask Data Application Platform ("CDAP"), provides an open source layer on top of the Hadoop ecosystem that adds enterprise-class governance, portability, security, scalability and transactional consistency. From Data Lakes to Data Apps, the CDAP platform is ideal for enterprise environments because it abstracts many layers of the Hadoop ecosystem, allowing developers to use their existing skills to build high-performance, large-scale Big Data applications. The company said its approach dramatically accelerates development of applications and deployment into production, cutting average time to implement by more than 80%, while retaining the operational controls required by today's enterprise customers. Major customers and partners include AT&T, Cloudera, Salesforce, Pet360 and Lotame.

The funding was led by Safeguard Scientifics, with participation from Battery Ventures, Ignition Partners and other existing investors.

"Big data has moved into the mainstream, but enterprises continue to struggle with the complexities and new skill sets required in the Hadoop ecosystem," said Cask Founder and CEO, Jonathan Gray. "Because our platform can layer on top of any distribution, instantly integrate with new and existing data stores, and easily support both Spark or MapReduce, it delivers real value for enterprises in a data-heavy environment, slashing development and deployment timelines. We are excited to be a part of the Safeguard family of partner companies. This financing, along with the operational expertise and guidance from our new board members Phil and Frank, will allow us to take Cask to the next level."

http://www.cask.co

Monday, September 28, 2015

Latest version of Altiscale Data Cloud has Spark as a Service, ODPi support

Altiscale is updating its Big Data-as-a-Service with an expanded Spark-as-a-Service offering that supports all major versions of Apache Spark.  The new Altiscale Data Cloud 4.0 also features major upgrades to core Hadoop components, such as HDFS and YARN.

“Altiscale is dedicated to providing its customers with a full breadth of production-ready big data analytical options. That’s why we’ve been active in the Spark community from the very beginning,” said Raymie Stata, CEO and founder, Altiscale. “It’s also why Altiscale is ensuring that we support all major recent versions of Spark. Spark is evolving so rapidly that we want to ensure anything our customers rely on for Big Data analytics continues to be there for them.”

The latest version of the Altiscale Data Cloud also features the following updated capabilities:

  • Apache Spark 1.5.0, which provides improved performance and stability. It offers enhanced support for Data Science APIs, especially with advances in DataFrames features and improved support for the R Language, making it a compelling release for data scientists.
  • Apache Hadoop 2.7.1, so that customers can utilize YARN’s resource manager to deploy and manage long-running data access applications in Hadoop.
  • Apache Hive 1.2.0, for access to enhanced SQL semantics and major performance improvements.
  • Apache Pig 0.15.0, which now provides the capability to run Hive UDFs inside Pig as well as improved stability for Pig on Tez.

There are also improvements to the workflow manager, Apache Oozie (4.2.0) and to Apache Tez (0.7.0).

https://www.altiscale.com/

Friday, September 25, 2015

Databricks: Apache Spark Outgrowing Hadoop

The number of standalone deployments of Spark eclipses those on YARN as more users run Spark independent of Hadoop, according to a newly published survey of Spark users conducted by Databricks, the company founded by the creators of Apache Spark.

Databricks said that users running Spark in standalone mode (48 percent of respondents) outnumber those running Spark on YARN (40 percent of respondents). The survey also found that 51 percent of respondents run Spark on a public cloud.

Key findings from the survey include:

  • Spark is outgrowing Hadoop: The most common Spark deployments according to the community are: 48 percent standalone, 40 percent YARN within Hadoop and 11 percent Apache Mesos. Spark users who do not use any Hadoop components have more than doubled in 2015 (from 2014). 
  • Streaming and advanced analytics uses rising: Spark is being used for an increasingly diverse set of applications, particularly by data scientists for machine learning, streaming and graph analysis use cases. In 2015, there are 56 percent more Spark Streaming users than in 2014. The production use of advanced analytics, like MLlib for machine learning and GraphX for graph processing, increased from 11 percent in 2014 to 15 percent in 2015. Seventy-five percent of Spark users are also using two or more Spark components (51 percent of Spark users are using three or more Spark components).
  • Spark users are becoming more diverse:  Of those surveyed, 41 percent identified themselves as Data Engineers, while 22 percent of respondents identified themselves as Data Scientists. Spark users are solving a variety of problems in different languages -- Scala (71 percent), Python (58 percent), SQL (36 percent), Java (31 percent) and R (18 percent) -- and all within the same framework.
  • Spark's most popular use cases come to light: Fifty-two percent use Spark for data warehousing, 68 percent use it for business intelligence, 40 percent for processing application and system logs, 48 percent to build recommendation engines, 36 percent for user-facing services and 29 percent for fraud detection and security.
  • Spark is increasing access to big data:  Ninety-one percent of those surveyed cite performance as their reason for adoption, while 77 percent cite ease of programming, 71 percent cite ease of deployment, 64 percent cite advanced analytics capabilities and 52 percent cite real-time streaming capabilities.

"The continued growth of Spark has been highly encouraging, as companies are going into production to obtain real business value, and they are doing so in a wide range of environments beyond Hadoop clusters," said Matei Zaharia, creator of Apache Spark and CTO of Databricks. "Databricks and our partners are 100 percent committed to the long-term growth of Spark and we'll continue to make improvements based on this survey data and our ongoing community feedback, to make the most complete big data analytics toolkit accessible to all businesses."

https://databricks.com

Wednesday, September 23, 2015

Google Cloud Dataproc Brings Fast Hadoop & Spark Cluster Provisioning

Google introduced new capabilities for managing clusters of Hadoop and Spark.

Google Cloud Dataproc, which is now in beta, is a managed Spark and Hadoop service that leverages open source data tools for batch processing, querying, streaming, and machine learning. The service can be used to create and manage clusters ranging in size from three to hundreds of nodes.

Google said its Cloud Dataproc can create Spark and Hadoop clusters in 90 seconds or less, compared to 5 to 30 minutes using on-premises or IaaS providers.

http://googlecloudplatform.blogspot.com/2015/09/Google-Cloud-Dataproc-Making-Spark-and-Hadoop-Easier-Faster-and-Cheaper.html

Wednesday, August 5, 2015

Hortonworks Hits Revenue of $30.7 million, up 154% YoY

Hortonworks, which specializes in Open Enterprise Hadoop, reported Q2 revenue of $30.7 million, an increase of 154 percent over the $12.1 million reported in the second quarter of 2014. Total GAAP gross profit was $17.5 million for the second quarter of 2015, compared to gross profit of $5.5 million in the same period last year.
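As a quick sanity check on the figures above, the year-over-year growth rate can be recomputed from the two revenue numbers. The short Python sketch below does that, and also derives a gross margin for each quarter. The gross margin percentages are not stated in the announcement; they are simply calculated here from the reported gross profit and revenue, and the function names are illustrative.

```python
def yoy_growth_pct(current: float, prior: float) -> float:
    """Percentage change from the prior period to the current one."""
    return (current - prior) / prior * 100.0


def gross_margin_pct(gross_profit: float, revenue: float) -> float:
    """Gross profit expressed as a percentage of revenue."""
    return gross_profit / revenue * 100.0


# Figures in millions of US dollars, taken from the paragraph above.
Q2_2015_REVENUE, Q2_2014_REVENUE = 30.7, 12.1
Q2_2015_GROSS_PROFIT, Q2_2014_GROSS_PROFIT = 17.5, 5.5

if __name__ == "__main__":
    growth = yoy_growth_pct(Q2_2015_REVENUE, Q2_2014_REVENUE)
    print(f"Revenue growth, Q2 2015 vs Q2 2014: {growth:.1f}%")

    # Derived values, not reported in the announcement itself.
    margin_2015 = gross_margin_pct(Q2_2015_GROSS_PROFIT, Q2_2015_REVENUE)
    margin_2014 = gross_margin_pct(Q2_2014_GROSS_PROFIT, Q2_2014_REVENUE)
    print(f"Gross margin Q2 2015: {margin_2015:.1f}%")
    print(f"Gross margin Q2 2014: {margin_2014:.1f}%")
```

Rounded to the nearest whole percent, the computed revenue growth matches the reported 154 percent.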

"We are very pleased with our second quarter performance which was highlighted by support subscription revenue growth of 178% year-over-year and solid customer momentum with the addition of 119 new support subscription logos," said Rob Bearden, chief executive officer and chairman of the board of directors of Hortonworks. "As leading enterprise organizations continue to deploy the Hortonworks Data Platform in production at scale, as evidenced by our 144% dollar-based net expansion rate over the trailing four quarters, we could not be more thrilled to serve as their trusted IT partner during this transformational period in the data management industry."

http://hortonworks.com/

Tuesday, July 14, 2015

MapR Reports Triple Digit Growth for its Apache Hadoop

MapR Technologies, a start-up based in San Jose, California, reported more than 100% growth in bookings and billings during Q2 2015 compared to the same quarter in the prior year for its Apache Hadoop solutions.

MapR processes big and fast data on a single platform, enabling real-time applications for enterprise deployments.

“New customer adoption and expanded deployments of the MapR Distribution for Hadoop have continued to accelerate as enterprise customers are realizing top-line revenue growth and operational efficiencies,” said John Schroeder, cofounder and CEO, MapR Technologies. “Our technology innovations with Apache Hadoop coupled by a proven, subscription-based licensing model, has enabled our business to grow with predictable success.”

https://www.mapr.com

MapR Raises $110 Million for Apache Hadoop


MapR Technologies, a start-up based in San Jose, California, raised $110 million in venture funding for its distribution for Apache Hadoop software. MapR has significant production Hadoop environments in financial services, healthcare, media, retail, telecommunications, and Web 2.0 companies.  The financing will be used to continue growth in the big data and analytics segment, especially to fund additional engineering resources and support...

Monday, June 15, 2015

IBM Backs Apache Spark for Cloud Data Processing

IBM is putting its weight behind Apache Spark, an open source engine for large-scale data processing that is compatible with Hadoop data.

Apache Spark can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.
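In practice, the choice between YARN and standalone mode usually comes down to the `--master` value passed to `spark-submit`. The Python sketch below only builds the command line; it does not launch Spark. The helper name, the host name `master.example.com` and the application file `job.py` are hypothetical and used for illustration only; 7077 is the default port of a standalone Spark master.

```python
from typing import List, Optional


def spark_submit_args(
    app: str,
    cluster_manager: str,
    master_host: Optional[str] = None,
    port: int = 7077,
) -> List[str]:
    """Build a spark-submit command line for the given cluster manager.

    Illustrative helper only: it assembles arguments and runs nothing.
    """
    if cluster_manager == "yarn":
        # On YARN, the ResourceManager address comes from the Hadoop
        # configuration (HADOOP_CONF_DIR), so no host is given here.
        # Spark 1.x releases also accepted "yarn-client" / "yarn-cluster".
        master = "yarn"
    elif cluster_manager == "standalone":
        if not master_host:
            raise ValueError("standalone mode needs the master host name")
        master = f"spark://{master_host}:{port}"
    else:
        raise ValueError(f"unsupported cluster manager: {cluster_manager}")
    return ["spark-submit", "--master", master, app]


if __name__ == "__main__":
    # Hypothetical application file and host name.
    print(" ".join(spark_submit_args("job.py", "yarn")))
    print(" ".join(spark_submit_args("job.py", "standalone", "master.example.com")))
```

The application code itself can stay the same in both cases; only the master URL changes.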

IBM said Spark is potentially the most important new open source project in a decade that is being defined by data. As such, IBM plans to embed Spark into its Analytics and Commerce platforms, and to offer Spark as a service on IBM Cloud. The company said it will put more than 3,500 IBM researchers and developers to work on Spark-related projects at more than a dozen labs worldwide; donate its IBM SystemML machine learning technology to the Spark open source ecosystem; and educate more than one million data scientists and data engineers on Spark.

“IBM has been a decades long leader in open source innovation. We believe strongly in the power of open source as the basis to build value for clients, and are fully committed to Spark as a foundational technology platform for accelerating innovation and driving analytics across every business in a fundamental way,” said Beth Smith, General Manager, Analytics Platform, IBM Analytics. “Our clients will benefit as we help them embrace Spark to advance their own data strategies to drive business transformation and competitive differentiation.”

http://www.ibm.com