In December 2012 Infochimps launched an enterprise cloud for big data analytics. Jim Kaskade, the company’s CEO, says the launch is significant because it addresses two critical big data issues. The first is that big data sets are too large to move to a cloud service; the second is the need for stringent data security. In his view these two challenges often prevent large organisations from using the cloud to analyse big data.
He therefore asks: “Who would use the public cloud for their big data problems?” No one, he argues, is going to upload that data to the public cloud, so Infochimps’ solution is to bring the cloud to the enterprise data, using a distributed computing model.
Providing cloud benefits
“We provide all of the benefits of cloud – pay per use, a level of abstraction to make it simple for developers to create new applications, infrastructure automation and so on”, he explains. These benefits are provided in a network of trusted third-party Tier 4 data centres, where his customers’ data already resides.
Kaskade adds: “Most people don’t think that the cloud is enterprise ready and this is true for the public cloud, but by developing our cloud services inside secure data centres that already house our customers’ data we overcome the two key issues of having to move the data to the compute and its security.”
This is important because, in his view, Global 2000 enterprises are executing, or will soon be executing, data consolidation plans focused on reducing their capital expenses. For many organisations virtual clouds are the answer. With them, organisations won’t need to spend trillions of dollars on data infrastructure because it’s already available through firms like Infochimps.
Customers also benefit from the economies of scale typically offered by infrastructure-as-a-service (IaaS) and other cloud models for storing, managing and analysing data. Other benefits include the ability to scale elastically with demand, to analyse data even when it resides in different locations, and to deploy easily. It is also an on-demand, fully managed service that offers three data services: batch processing (Hadoop), ad hoc or near real-time queries (NoSQL and NewSQL), and real-time data processing for real-time analytics.
Enterprise-ready
Kaskade thinks it all presents an incredible opportunity, and he expects the industry to explode in terms of growth. Why? He argues that, at a high level, cloud computing is about being fast, simple, flexible and enterprise-ready. In his view the biggest impact of his company’s enterprise cloud for big data analytics is therefore its ability to “provide enterprises with a broad suite of data services that address virtually all application needs.”
Leverage data assets
Infochimps also realised that most companies aren’t fully leveraging their enterprise data assets. The firm conducted a survey of 200 chief information officers (CIOs), which revealed that the Global 2000 companies involved in the study use only 15% of their enterprise data assets, leaving 85% unused. Yet Kaskade claims that 100% of data assets can be captured, analysed and kept secure with his firm’s solution.
No data scientist needed
He also argues that no training is required in order to use Infochimps’ analytical tools. “The new solution allows you to apply analytics without understanding how stream processing, key-value databases or Java MapReduce work – and this is as simple as creating a sentence with nouns and verbs”, he says.
The domain-specific language (DSL) that makes this possible is called Wukong, and he claims that even a University of Texas graduate could use it to work on a data pipeline “that was ten times more complicated than a project we executed with an ETL expert with ten years of Informatica experience.”
“For one customer we provided three PHP developers with access to our cloud, and with the use of our application developer toolkit they were able to create new applications that created $50-100m in annual revenue within 30 days as there was no need to learn anything about stream processing, NoSQL, NewSQL or Hadoop”, he adds.
Simplified data integration
Data integration is simplified by Data Delivery Services (DDS), which offers fault tolerance and stream processing. It allows companies to connect to data whether it is in motion or at rest. Kaskade cites Twitter data streams, which change every second, as an example of data in motion, and accessing a data warehouse such as Teradata as an example of data at rest. “The magic is to first connect, then store, analyse and present the data”, he explains.
He concludes with a prediction that in 2013 every data-driven organisation will be working to prove the value of big data technologies. The question they need to consider, in his opinion, is whether to prove that value within 30 days by deploying cloud solutions from companies like Infochimps, or to take an average of 18 months to achieve the same results by managing their big data analytics in-house. The future, after all, is not just about Hadoop; it is going to be increasingly about real-time big data analytics.
By Graham Jarvis