How Infochimps Extracts Value From Big Data

Clouds, APIs, Big Data. Big buzzwords. Often, it is not clear what exactly companies are offering their customers and how the technology can be put to good use. We interviewed Infochimps' Tim Gasper to get a better perspective.

How can companies around the world make sense of data, whether to gain new insights, understand their markets better or develop new lines of business? To answer some of these questions, we organized a data journalism unconference at SXSW 2012. The key idea was to connect with others who are interested in cloud computing and analytics, and to show and tell about opportunities to work with data in different ways. 
The landscape is changing rapidly: from storage to clouds to analytics, it is potentially enabling a wave of changes in reporting, journalism, healthcare and other domains. Among the attendees at the unconference were several people from Infochimps, a start-up company based in Austin. The company positions itself as a "leading marketplace and big data infrastructure provider".  
What can Big Data do for you?
Founded in 2009, Infochimps has a catchy name and an impressive technology offering, too. By now, the company appears in lists and overviews of Big Data infrastructure and services.
Still, for many people the connections between storage, clouds and big data are not that obvious. We asked Tim Gasper, product manager at the company, where Infochimps is heading.
Q.: How do you describe what Infochimps does to people outside the worlds of Big Data and open source software?
Tim Gasper: "If I need to set the context, I'll start with: almost every company imaginable collects data. Banks track your account information, retailers collect data on the things that you buy at their stores, and so on. As computers get smaller and more powerful, and everything becomes internet-connected, these companies are collecting more and more data. So much that it's starting to break their existing systems and make them think about the world in new ways. They're realizing that the companies that successfully analyze this data and mine it for insights will have a huge competitive advantage over those that don't."
Q.: What is a typical approach to make sense of data?
Tim Gasper: "In order to analyze that data, companies need to take on new strategies and use new technologies. Some companies choose to do this themselves, but the kind of people that can build these new systems quickly and inexpensively are an extremely rare breed. Infochimps exists to make it easier for companies to adopt these new strategies and technologies. We have a cloud platform for companies to build their Big Data systems on, and we help them along the way. It uses all the most popular open source technologies, but leverages our special sauce to glue all the tools together and make working with them a truly simple, joyful experience. Implementing a Big Data system shouldn't feel like rocket science, so our goal is to make it more accessible."

Q.: What is the history of Infochimps as a company, and when was the idea first conceived?
Tim Gasper: "The company started in 2009 as the brainchild of founders Flip Kromer and Dhruv Bansal, PhD candidates in Physics, and Joe Kelly. They wanted to collect the world's data all in one place, a Wikipedia of data, if you will. Over the last three years, they've been aggregating tons of public and crowdsourced data into our Data Marketplace. In the process, however, we pioneered some really great big data technology to make that Data Marketplace possible. We learned that companies not only want lots of external data sources, they also want to leverage the technology we developed. So now we're sharing it as part of our flagship product suite: the Infochimps Platform."
Q.: What types of companies are using your platform?
Tim Gasper: "We have a wide variety of customers, but two groups have adopted our platform most rapidly: companies in the media space and companies in mobile device analytics. Media and communications companies are collecting data from social media, the web and their clients. The companies that develop a strength in analyzing that data for their clients are pulling ahead of the competition. For mobile device analytics companies, data is a critical part of what they offer to their clients. This includes mobile coupons, mobile device identification, mobile advertising and more. Mobile devices are proliferating, and likewise, the analytics industry for those devices is booming. The demand for mobile analytics is outpacing companies' ability to hire people who understand Big Data technologies. They need to upgrade their systems now, and fast. Since the Infochimps Platform is quick to set up and easy to work with, our mobile analytics customers are keeping up with demand and differentiating themselves by developing powerful new features on top of our platform."
Insights from Big Data analytics are going to be relevant to more people
Q.: You recently made the Infochimps platform available as open source and connected the software to popular cloud computing platforms, such as OpenStack. Where do you see the market for big data storage and big data analytics moving right now?
Tim Gasper: "We definitely see it continuing to grow in enterprise adoption. The really interesting thing, though, is that a year ago organizations were just starting to play around with Hadoop and NoSQL databases, and now they are starting to roll out their first production implementations. As more of it is put into production with live systems, from customer data to financial data to social media, that will drive two big trends, both of which Infochimps hopes to play a pivotal role in.
Firstly, as more of an organization's data becomes part of the Big Data system, the insights it provides are going to be relevant to more people. These insights have to be distributed to everyone who can benefit from them: the "democratization" of Big Data consumption.
Secondly, due to the distributed nature of these technologies (that is, lots of machines working in concert with each other), they are extremely well suited to take advantage of the cloud. And since IT departments are increasingly willing to deploy systems into cloud environments, they'll need tools and interfaces that make this simple, fast, flexible and secure."
"Systems don't talk to each other. They're silos"
Q.: How far away are non-specialist companies from using the benefits of big data? What should be high on their agenda to use these new options effectively?
Tim Gasper: "Most companies right now are finally getting a grasp of how much data they have. Before, the issue was 'how can I start collecting data about my customers, my supply chain and my systems?' Now they're collecting that data, and they're starting to master the way those technologies work, but the systems don't talk to each other. They're silos. So the next step is working with Infochimps or another company to make those systems talk to each other, so that when you do your big data analytics, all that data is accessible to work with."

Big Data technologies are displacing data warehouses and business intelligence systems
Tim Gasper: "More and more, Big Data technologies are displacing data warehouses and business intelligence systems, which have been falling far short of their potential. Why lock all your data away in a slow, monolithic data warehouse, when you can put it in a highly scalable big data database and run reports in near real-time? Companies are starting to figure that out."
Q.: There is a lot of talk about costs for IT in general and the potential lower costs using Open Source/Open Data as well as cloud services. What is your current take on that? Is it about lowering costs or effectively being able to do new things?
Tim Gasper: "It's always more about being able to do new things. No one product or technology can solve all your woes. The nice thing about open source is that the developer community (including Infochimps) has worked very hard to make these systems work together effectively. So you can use a few different database types in tandem with each other. You can use Apache Hadoop for batch data processing, and Apache Flume for streaming data processing. And the open source community is adding new features and hardening the tools for the enterprise at a pace far faster than many proprietary vendors. A nice side benefit, though, is that these tools are much less expensive to work with. You can download them and learn to work with them free of cost, and pay an enterprise open-source vendor significantly less than you'd pay an old-school mega vendor."
Thank you for the interview.
The Big Data Landscape

Linda Rath-Wiggins
Edited by: Linda Rath-Wiggins, Tim Gasper