Google Search Privacy - Plain and Simple

The latest announcement from Ask unveiling a privacy control heats up the debate regarding intellectual property online. As of 9am on 12/12/07, there has been no response from Yahoo, Google or MSN, but I am certain that we will see statements and new services announced pretty soon.

In the meantime, this video from Google got me thinking...

...what could I do if I had access to the IP address and search terms of a single user? Let's work through an example:

Take Henry Lieberman, who is researching Software Agents at the M.I.T. Media Lab. Henry's research includes an area called Goal-Oriented User Interface for Personal Semantic Search, where he has jointly published a thesis and a paper with Alex Faaborg.

A Collection of Search Terms

As Henry and Alex went about their research, I am certain that parts of their activity included searches on the web for information. Perhaps those searches were conducted through a search engine such as Google or Yahoo.

If I were to examine the search terms that they entered and the results that they clicked on, would that give me a good idea of what they were looking for? This was most certainly not a random activity; it was guided by the experience, education and analysis that Henry and Alex had built up in the time leading up to these searches. If I were a commercial entity wanting to exploit such technology, that research would be valuable to me.

If I went further and was able to capture the actual click-stream from Henry and Alex's machines, that data would be even more valuable. So my question is, who owns that information? Surely it is Henry and Alex.

Legal Precedents on Authorship of Original Work

A number of legal cases suggest that a work as simple as a set of search terms submitted by an author could be deemed an original work, because the unique compilation and sequence relies on the 'judgement and experience of the author'.

For example, take the case of CCC Information Services, Inc., who loaded major portions of the used car valuation guide 'Red Book' onto their own servers for redistribution to their customers. It may be permissible to copy a phone directory in such a way, but in this case the court ruled that the information contained within Red Book was based not only on research from a multitude of sources, but also on the judgement and expertise of its authors. CCC was thus found to have infringed copyright.

There are a number of other cases in this space where claims of copyright infringement were upheld or denied based on a three-prong test. To qualify as a copyrightable compilation, a work requires:

  1. the collection and assembly of pre-existing material, facts or data,
  2. the selection, coordination, or arrangement of those materials, and
  3. the creation, by virtue of the particular selection, coordination or arrangement, of an original work of authorship.

Is Search History Valuable?

Let's go back to Henry and Alex and their search histories as captured and retained by the search engine company. The entire set of search terms they submitted constitutes a body of work. When the search engine company aggregates this data with that of all other users to improve search, that does not constitute an infringement. However, should that data be uniquely identifiable by, say, an IP address, it does seem to infringe, especially if it is traded on in any way.
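To make the difference between aggregated and identifiable data concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the log format, the sample IP addresses (taken from ranges reserved for documentation), the queries and the function names are all hypothetical, and no real search engine is claimed to store or process its logs this way.

```python
from collections import Counter, defaultdict

# Hypothetical log records: (ip_address, search_query), in the order received.
SAMPLE_LOG = [
    ("192.0.2.10", "goal-oriented user interface"),
    ("198.51.100.7", "used car prices"),
    ("192.0.2.10", "semantic search software agents"),
    ("192.0.2.10", "personal semantic search"),
    ("198.51.100.7", "semantic search"),
]


def aggregate_term_counts(log):
    """Count individual words across all users; no identifier is kept."""
    counts = Counter()
    for _ip, query in log:
        counts.update(query.lower().split())
    return counts


def history_by_ip(log):
    """Group queries by IP address, preserving the order they were issued."""
    histories = defaultdict(list)
    for ip, query in log:
        histories[ip].append(query)
    return dict(histories)
```

The first function throws away who searched for what and keeps only overall word counts. The second keeps each address's queries together and in sequence, which is exactly the selection and arrangement that reflects one person's line of research.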

I'm not a lawyer, but I would like to make sure that those individuals who are creating original works have some rights to their data, wherever it physically lies.

Google has recently launched a new service called Google History where I can get to see my own search history. It falls under the Google Web History Privacy policy, under which you agree to allow your history to be captured and used by Google. I have been looking at my search history on this service. There are no real surprises, because it only reminds me of what I have been looking for, but I believe that this data is far more useful to Google: perhaps for improving their services, but more likely for improving their targeting.