Data Alchemy

I've spent the last 10 years working with other people's data and helping turn it into shareholder value - from multinational Fortune 100 companies to tiny bootstrapped startups, and from full-scale Enterprise Business Intelligence to just a few guys and a machine-learning algorithm in the cloud. There's a common thread running through the most successful cases - these are the companies that really nailed it. (November 2011)

Data - the New Gold Rush

Right now, in Silicon Valley at least, there's another gold rush going on - and this time it's all about the data. The Venture Capitalists are making big bets on companies that are either creating huge quantities of customer and transactional data or supplying the shovels, mules and canvas jeans to those companies.

The greatest data plays are the colossal Google and Facebook; more recently we have seen Zynga and Pandora, and the yet-to-be-monetized Instagram and Quora. The companies providing the tools include names such as Greenplum, Splunk and Palantir. This is not the future - it's very much the present.

But for each successful data play there are plenty of failures - the expected casualties, for all the usual reasons. There is, however, an interesting tenet that I think joins up the winners - the ability to create and monetize a data business.

Google and the other Original Data Alchemists

Let's look back over the last 15 years, when companies like Google, Facebook and LinkedIn were created. Each of them created something with inherent basic value to its users: Google helped you find pages on the web, LinkedIn helped you connect with colleagues and Facebook helped you find your friends.

These companies set out to provide a valuable service and, by keeping that service free, created huge user bases. These user bases created transactional data.


  • Google, through search, built an immense database of who was searching for what and when (620 million visits per day as of October 2011)

  • LinkedIn, through its resume creation, built an immense database of who worked for which company and when (about 135 million users as of November 2011)

  • Facebook, through its wall feature and applications platform, built an immense database of who said what about what and when (500 million users interacting with 900 million objects)

These databases were the key to revenue:


  • Google is able to understand what an individual user might want based on their previous search history; this is used to place ads that result in better conversion

  • LinkedIn, with its intimate knowledge of how people move between jobs, is able to sell premium subscriptions to recruiters and companies that want to target people

  • Facebook, with its knowledge of how individuals behave in a crowd, is able to sell highly viral forms of advertising

The final step, the real magic, is how each of these companies has built a machine around acquiring more of the data that makes them profitable:


  • Google Streetview and Maps, Gmail, Calendar, Docs and Android are creating more touch-points where users expose more of their intentions on the web, on-the-go and through mutual interactions. Currency: User Search and Click-thru

  • LinkedIn CardMunch, Dashboards and Events are finding new ways for users to document their professional life. Currency: User Resume

  • Facebook Timeline, Online Gaming, Messaging and Connect are becoming a vital part of how we interact with each other online. Currency: User Social Interactions

These companies took the currency that they created as part of their core service and transformed it into something that the aftermarket pays real money for. For companies starting out today, following that model is the key to data-based revenue streams.

Who are the New Data Alchemists?

Here are a few more recent organizations that have invested in a data-based value model, along with the currency that each creates to be eventually monetized:

Already Monetized


  • Mint, the winner in the personal finance informatics space, provides a low-effort personal spend analysis. The lower the consumer effort, the more data gets created, and that data is used to recommend alternative financial services (presumably on a sales commission model). Currency: Individual Bank Transactions

  • Foursquare has a powerful Merchant Platform that allows merchants to leverage all of the check-in data and create highly relevant location-based offers. The gamification of check-in has created incentives for users to check in and recruit other users. Currency: User Check-In

  • Zynga's portfolio of online games is played 4 times more frequently than its closest competitors and represents 75% of overall playtime with 90% of players playing more than one game. The social aspect of gaming creates incentives to recruit other players and then to impress by adorning one's online property by purchasing virtual goods. Currency: Virtual Goods

Yet to Monetize


  • Quora - Quora is still building critical mass - a volume of questions and answers - but the hidden data is the brains-trust: a network of people who are answering questions and voting each other up or down. The more people, the more questions and the more votes from smart people voting up other smart people. Quora has yet to collect revenue on this, but its potential as a resource bank will be immense. Currency: Votes

  • Instagram - Simplicity and effectiveness have over 10 million people snapping away on their iPhones and sharing amazingly enhanced photos with each other. There is plenty of potential to float location-based advertisements, but that might not be the real trick to this one. In a day and age where I don't print photos and put them in albums anymore but I do carry around a smartphone, here is the visual timeline of my life. Is that worth paying for? Would someone else pay money for my photostream? Perhaps, if all of the objects in the photos were identifiable and geo-tagged. Currency: Personal Streams of Photos and Shares

  • Apple's Siri - Apple is obviously using Siri to sell iPhones, but the data exhaust that Siri is generating provides a set of search and reaction data that will rival Google's. Having Siri serve up sponsored results may be a potential revenue model, but really understanding the detailed behaviors of the user could be a fantastic key to unlocking huge revenue models. Currency: User Commands

Five Simple Steps to Creating Gold


  1. Understand the basic transaction that you perform for your users and the precise value of that transaction

  2. Identify all possible data that gets created during the end-to-end process of that transaction

  3. Work out who might value that data in the raw, when it is aggregated or combined with other sources

  4. Determine a monetization model for the data with some new customers who gain disproportionate value from it - without abusing the users who are contributing the data

  5. Go back and enhance the process around the user transaction to ensure, increase and enrich the data being created
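As a rough illustration of steps 2 to 4, here is a minimal Python sketch. Everything in it is hypothetical: the `CheckIn` event, its fields and the `venue_traffic` aggregate are invented for the example and are not taken from any of the companies above. It lists the data a single transaction leaves behind, then rolls the raw events up into an aggregate that a merchant might value without exposing any individual user.

```python
from collections import Counter
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CheckIn:
    """Hypothetical record of one user transaction (a check-in)."""
    user_id: str
    venue_id: str
    hour: int  # hour of day, 0-23


def fields_created(event_type):
    """Step 2: name every piece of data the transaction creates."""
    return [f.name for f in fields(event_type)]


def venue_traffic(events):
    """Steps 3 and 4: aggregate raw events into visits per (venue, hour).

    The user_id is deliberately left out of the aggregate, so the
    result can be shared without identifying the contributing users.
    """
    return Counter((e.venue_id, e.hour) for e in events)


events = [
    CheckIn("u1", "cafe", 9),
    CheckIn("u2", "cafe", 9),
    CheckIn("u1", "gym", 18),
]

print(fields_created(CheckIn))
print(venue_traffic(events).most_common(1))
```

Step 5 then amounts to going back to the transaction itself and asking which extra fields (party size, say, or time spent) would make that aggregate more valuable.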

The Companies that Missed the Data so far

  • Color - expensive domain-names, huge amounts of funding, insane levels of hype - and an app that nobody wanted to use - feels like 1999 all over again. The data play may have been huge - photo-based social networks in real-time - but because they didn't create a foolproof motivation for users to create data, it's all for nought. Let's see what unfolds from here as they pivot into the visual status update.

  • MySpace - I never really understood MySpace - perhaps it's because I'm a data geek, but the information seemed too unstructured for me to grok or be able to navigate - just a series of random mini web-pages. There seemed to be huge traction in its time and huge motivation to create pages, but because the data was unstructured, the ability to turn it into value that could be up-cycled was beyond the imagination of the organization

  • Pandora - despite Pandora managing to post a profit in its most recent quarter, the fact that they haven't managed to transcend the radio-station style advertising revenue model means that profit is always going to be very hard work. Pandora's music genome and the immense amount of listening data they hold could provide a high-level data exhaust for monetization. A person's listening habits might provide strong clues as to lifestyle - which could mean personalized ads that are less annoying and more likely to result in conversion for the advertiser

Gilding for Non-Startups

The data gold-rush is not just for start-ups - well established companies have just as much to gain, if not more, than the digital upstarts:


    • Online and Brick and Mortar Retailers - Amazon (US) and Tesco (UK) are my favorite examples of what can be done when customer purchase or browsing data is fed back into the product selection process and combined with external demographic data - a view of the customer, perhaps more intimate than the customers can imagine themselves. This creates a retail proposition that is more responsive to the customers' needs, ultimately raising switching costs for the customer (a.k.a. customer loyalty)

    • In-car Telematics Vendors - These are the businesses that endow passenger vehicles with some sort of GPS tracking. Providing that information back to the customer for journey analysis would allow customers to reduce journey-times, save money and travel more effectively. Uber is mining journey data to increase car availability with lower resource levels

    • Credit Card Providers - Although Yodlee and Mint have stolen a lot of the thunder, there are still rich pickings in the data for the credit card providers. Search engines might know what you've been looking for and retailers might know what you've bought from them, but the credit card companies have the actual transactions and the aggregated data across all merchants. Furthermore, they have a trusted relationship with the customer - one where they can immediately segment their customers into very specific groupings. This presents all manner of opportunities.

    • Airlines and Hotels - What do airlines do with my loyalty card information? They give me points (although they spend a lot of energy restricting my ability to use them) and they give me privileges - huge incentives to create transactions but very few incentives to enrich that data to make it really valuable. What do travel sites like TripIt, Expedia and Tripadvisor know about my travel intentions and experiences? Are the airlines and hotels really leveraging all the data that's out there? What amazing incentives could they create that would not only strengthen loyalty, but also allow them to offer more services around the entire travel experience? Check out AddToTrip for an out-of-the-box application to deliver this capability.

    • Telephony Providers - Not simply the AT&Ts, BTs, Orange Telecoms and Telefonicas of the world, but also Vonage, Skype and other IP-based services - these all classify as telephony for me. If you take the Call Detail Records that they sit on, the aggregated data is huge. It can tell you who is related to whom, what triggers calls and how effects (like sign-ups, upgrades and attritions) spread through a network. However, when you look at the data outside of this box, we can also start to map calls to vendors after marketing events and the ongoing network effects, or understand how non-marketing events drive data traffic through a network. The location data that comes from smartphones enables traffic mapping, urban planning and other logistical services. Far from being a dying industry, telephony might prove to be the biggest data play of them all.
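To make the "who is related to whom" point concrete, here is a small Python sketch. The record layout (caller, callee, duration in seconds) is an assumption made for the example; real Call Detail Records carry many more fields and vary by provider. It counts calls between each pair of numbers, in either direction, and ranks one subscriber's strongest ties.

```python
from collections import Counter


def tie_strength(cdrs):
    """Count calls between each unordered pair of numbers.

    Each record is assumed to be a (caller, callee, seconds) tuple.
    """
    ties = Counter()
    for caller, callee, _seconds in cdrs:
        if caller != callee:
            ties[frozenset((caller, callee))] += 1
    return ties


def closest_contacts(ties, number, top=3):
    """Return the numbers most often in contact with `number`."""
    ranked = sorted(
        (
            (count, next(iter(pair - {number})))
            for pair, count in ties.items()
            if number in pair
        ),
        reverse=True,
    )
    return [other for _count, other in ranked[:top]]


# Invented sample records, not real data.
cdrs = [
    ("A", "B", 60),
    ("B", "A", 30),
    ("A", "C", 10),
]

ties = tie_strength(cdrs)
print(closest_contacts(ties, "A"))
```

The same pair counts could be joined against sign-up or upgrade dates to see how those effects spread from one subscriber to their closest contacts.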

      First Retail is available to help with your Data Alchemy - from following the Steps to Creating Gold to solving the complex technology problems required to create valuable data-streams. Email us at Alchemist@firstretail.com.

      I'd also like to acknowledge the team at Social Data Lab who are shifting the paradigms of data - their contributions inspired this post.

      Things I read along the way