Intel Addresses Big Data With Its Own Hadoop Distribution
Intel’s Big Data plans progress with the addition of a Xeon-optimised Hadoop implementation
Intel has made a significant move into the booming area of Big Data by releasing its own distribution of Apache Hadoop.
During a Webcast press conference on 26 February, Boyd Davis, vice president and general manager of Intel’s Data Centre Software division, said the giant chip maker has been working with Hadoop since 2009 – this is actually Intel’s third release of its Hadoop software – and has been an open-source advocate for much longer than that.
Intel Distribution
However, for Intel to be more than an outside influence in the Big Data space, it had to become a player with products to offer, Davis said. The Intel Distribution for Apache Hadoop was a way to do that.
Intel’s legacy in high-end data centre hardware – including its Xeon server chips and recent offerings around solid-state drive (SSD) memory – and newer efforts in software give the company a strong silicon-based foundation for launching a Hadoop distribution, he said. Intel is optimising Hadoop to work with features on its chips, such as incorporating Advanced Encryption Standard New Instructions (AES-NI) for accelerating encryption into the Hadoop Distributed File System.
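Intel has not published the internals of that integration, but the general technique being accelerated – bulk AES encryption of file data, which AES-NI speeds up at the instruction level – can be sketched with the standard Java cryptography API that Hadoop itself runs on. The snippet below is a minimal, generic illustration; the class name, the choice of CTR mode and the sample data are assumptions made for the example, not details of Intel’s distribution.

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

public class AesBlockEncryption {
    public static void main(String[] args) throws Exception {
        // Generate a 128-bit AES key; AES-NI accelerates exactly this kind of
        // block-cipher work when the processor and JVM support it.
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128);
        SecretKey key = keyGen.generateKey();

        // Random IV for CTR mode, a streamable mode suited to large files.
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);

        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));

        // Placeholder data standing in for the contents of a file block.
        byte[] block = "example file block contents".getBytes(StandardCharsets.UTF_8);
        byte[] encrypted = cipher.doFinal(block);
        System.out.println("Encrypted " + block.length + " bytes to "
                + encrypted.length + " bytes");
    }
}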
Intel, which has been building up its software capabilities via in-house development and acquisitions, will keep parts of its Hadoop distribution open – making them interoperable with other Hadoop distributions – but will reserve some features, including management and monitoring capabilities, for itself. Intel will not open source software such as Intel Manager for Apache Hadoop, used for configuration and deployment, or Active Tuner for Apache Hadoop, a tool for improving the performance of compute clusters running the distribution.
Intel’s Hadoop distribution will give organisations the confidence that comes when a major tech player supports an open-source technology, providing a “consistent, stable foundation” for the software, Davis said, adding that Intel wants “to make sure Hadoop stays on the leading edge.” More vendors, from established players like EMC to smaller companies like Cloudera, are coming out with their own Hadoop offerings.
Big Data is a growing trend in the business world, with a staggering amount of data being created by the wide range of devices and machines people use. Davis pointed to numbers indicating that a petabyte of data is created around the world every 11 seconds.
“We’re in an era of generating huge amounts of data,” he said, noting that “the key is how to get value out of the data.”
Big Data
Hadoop, which includes about a dozen open-source projects, is designed to enable businesses to more easily do just that: store huge amounts of data, analyse it and leverage it in ways that benefit both the organisations and their end users. For example, businesses can use it to gain a better understanding of what their customers want, while medical researchers can more quickly discover life-saving drugs and communities can improve their environments by better managing traffic patterns.
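As a concrete illustration of the kind of processing Hadoop was built for, the canonical MapReduce example counts word occurrences across a large set of input files. The sketch below follows the standard Apache Hadoop WordCount pattern; it is generic to any Hadoop distribution rather than specific to Intel’s announcement.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emit (word, 1) for every word in the input split.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sum the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Submitted with the usual hadoop jar command against input and output paths in HDFS, a job like this scales across a cluster without any change to the code.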
“Big Data has the potential to not only transform business models … but has the ability to transform society,” Davis said.
Intel’s move comes in the same week that other players have made significant advances in Big Data and Hadoop. Hewlett-Packard announced a Hadoop plug-in for its ArcSight security software that will make it easier and faster for organisations to run through huge amounts of security data. A new beta of the Hortonworks Data Platform will run on Microsoft’s Windows Server, and on 25 February EMC announced a new Hadoop distribution, Pivotal HD, that works closely with the storage vendor’s Greenplum massively parallel processing (MPP) database.
Davis said Intel will leave much of the application work to its partners, but that the chip maker will create a foundation for Hadoop that lets organisations leverage the capabilities in its data centre hardware. Intel’s AES-NI technology will enable up to 20 times the encryption speed of other technologies, while Intel’s SSD and cache acceleration will make queries in Hive – Hadoop’s data warehouse system – 8.5 times faster. The combination of Intel’s silicon and its Hadoop distribution means that analysing a terabyte of data, which would normally take as long as four hours, can now be done in seven minutes, according to Intel.
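Hive exposes that warehoused data through an SQL-like query language that applications can reach over JDBC, and it is queries of this sort that Intel says its SSD and cache acceleration speeds up. The snippet below is a hypothetical illustration only; the host, database and sales table names are placeholders and do not reflect Intel’s benchmark setup.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // Load the HiveServer2 JDBC driver; connection details are placeholders.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = conn.createStatement()) {

            // A typical warehouse-style aggregation over a large fact table.
            ResultSet rs = stmt.executeQuery(
                "SELECT region, COUNT(*) AS orders FROM sales GROUP BY region");
            while (rs.next()) {
                System.out.println(rs.getString("region") + "\t" + rs.getLong("orders"));
            }
        }
    }
}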
During the Webcast, Intel offered a long list of partners that will help integrate its software into various platforms, including Cisco Systems, Cray, Dell, Infosys, NextBio, Red Hat, SAP, SAS, Savvis and Teradata. SuperMicro announced on 26 February that it is adding Intel’s Hadoop distribution to some of its servers and storage systems aimed at Big Data environments.
Intel’s investment arm, Intel Capital, also is investing in smaller Big Data companies, such as 10gen and Guavus Analytics.
Intel’s strong move into Big Data also will help fuel sales of its Xeon chips by driving organisations to run their Big Data workloads on Intel-based servers from the likes of HP and Dell. Davis said that “one of [Intel’s] biggest motivators is to drive faster growth of the data centre.”