Search Engine Logs
From Blindside
What is it
Search engine logs record the searches that users make on search engines. When a user enters a search term, the search engine captures it and stores it in its logs. The type of data retained and the length of time for which it is stored vary from company to company. Google, for example, states in its revised log retention policy:
- When users search on Google, we collect information about the search, such as the query itself, IP addresses and cookie information. We had previously kept the logs data for as long as it was useful. When we implement this policy change, we will continue to keep server log data so that we can improve Google's services and protect them from security and other abuses, but we will anonymize our server logs after 18-24 months, unless legally required to retain the data for longer. (Source: Google Log Retention Policy FAQ)
Google gives the following example of a typical search engine log entry, for a search on "cars", followed by a breakdown of its parts:
Search Engine Log Entry: 123.45.67.89 - 25/Mar/2003 10:15:32 - http://www.google.com/search?q=cars - Firefox 1.0.7; Windows NT 5.1 - 740674ce2123e969
- 123.45.67.89 is the Internet Protocol address assigned to the user by the user's ISP; depending on the user's service, a different address may be assigned to the user by their service provider each time they connect to the Internet;
- 25/Mar/2003 10:15:32 is the date and time of the query;
- http://www.google.com/search?q=cars is the requested URL, including the search query;
- Firefox 1.0.7; Windows NT 5.1 is the browser and operating system being used; and
- 740674ce2123e969 is the unique cookie ID assigned to this particular computer the first time it visited Google. (Cookies can be deleted by users. If the user has deleted the cookie from the computer since the last time s/he visited Google, then it will be the unique cookie ID assigned to the user the next time s/he visits Google from that particular computer). [1]
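The field breakdown above can be sketched as a small parser. This is a minimal illustration in Python, assuming the five fields are always separated by " - " exactly as in the sample entry. The names `SearchLogEntry` and `parse_log_entry` are hypothetical and introduced here only for illustration; they are not part of any Google tool, and real server logs are likely to use a different format.

```python
from dataclasses import dataclass
from datetime import datetime
from urllib.parse import urlparse, parse_qs


@dataclass
class SearchLogEntry:
    """One parsed log line (hypothetical structure, for illustration only)."""
    ip_address: str
    timestamp: datetime
    url: str
    query: str
    browser: str
    operating_system: str
    cookie_id: str


def parse_log_entry(line: str) -> SearchLogEntry:
    # Assumption: fields are separated by " - ", as in the sample entry.
    parts = line.strip().split(" - ")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    ip_address, raw_time, url, user_agent, cookie_id = parts

    # "%b" matches abbreviated month names such as "Mar" in an English/C locale.
    timestamp = datetime.strptime(raw_time, "%d/%b/%Y %H:%M:%S")

    # The search term is carried in the "q" parameter of the requested URL.
    params = parse_qs(urlparse(url).query)
    query = params.get("q", [""])[0]

    # The sample lists browser and operating system separated by "; ".
    browser, _, operating_system = user_agent.partition("; ")

    return SearchLogEntry(
        ip_address=ip_address,
        timestamp=timestamp,
        url=url,
        query=query,
        browser=browser,
        operating_system=operating_system,
        cookie_id=cookie_id,
    )


SAMPLE = (
    "123.45.67.89 - 25/Mar/2003 10:15:32 - "
    "http://www.google.com/search?q=cars - "
    "Firefox 1.0.7; Windows NT 5.1 - 740674ce2123e969"
)

entry = parse_log_entry(SAMPLE)
print(entry.query, entry.ip_address, entry.cookie_id)
```

Once parsed, a single entry ties a search term to an IP address, a time, and a cookie ID, which is why the combination of fields, rather than any one of them, is what matters for privacy.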
Impact & Maturity assessment
We assign this an Impact Level of 1: until resolution of IP addresses is available and widespread, this data will be useful primarily in aggregate. Information assurance issues become stronger when external search results are combined with hard drive search, now offered by Microsoft (in Windows Vista) as well as Google. However, commercial and competitive pressures, together with the existence of regulatory bodies and the large number of individuals concerned about the issue, will probably keep it in check. We assign this a Maturity Level of 2, as organisations such as DunnHumby have been dealing with this level of data (on behalf of Tesco) for some time.
Information assurance issues related to search engine logs concern the possible exploitation of log data for activities that infringe users’ privacy.
This page from Google shows one way in which the information on search engine logs can be interpreted. [2]
Information Assurance issues
Information assurance issues arise when the company that owns the search engine provides log data to a third party. Because the data is so detailed, it could be exploited for activities that infringe users’ privacy.
This page from Google shows one way in which the information on search engine logs can be interpreted. [3]
AOL faced heavy criticism from privacy advocates when, for research purposes, it released on one of its websites a compressed text file containing twenty million search keywords from over 650,000 users, collected over a three-month period (see AOL Search Data Scandal).
Timescale
…to be added.
Examples
News:
AOL's disturbing glimpse into users' lives
Bush Administration Demands Search Data; Google Says No; AOL, MSN & Yahoo Said Yes
Google much? The Justice Department wants to know
Google silent over US Government subpoena
Google taking steps to improve privacy practices
Update: Lane’s Gifts v. Google
Google adding search privacy protections
Privacy concerns dog Google-DoubleClick deal
Comments (attributed)
John Battelle, journalist and author of “The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed our Culture”, says of search engine logs and their possible impact:
- This information represents, in aggregate form, a place holder for the intentions of humankind - a massive database of desires, needs, wants, and likes that can be discovered, subpoenaed, archived, tracked, and exploited to all sorts of ends. Such a beast has never before existed in the history of culture, but is almost guaranteed to grow exponentially from this day forward. This artifact can tell us extraordinary things about who we are and what we want as a culture. And it has the potential to be abused in equally extraordinary fashion. [4]
Richard M. Smith, an Internet security and privacy consultant at Boston Software Forensics, commenting on search engine logs and Google’s revised log retention policy, said that Google should never archive IP addresses and cookies on its servers. He added:
- Google should not be in the spy business, by logging IP addresses and search strings they are running the largest intelligence operation in the world. [5]
Organisations
Documents & research papers
A Large-Scale Analysis of Query Logs for Assessing Personalization Opportunities
Privacy Protection in Personalized Search
