Monday, August 29, 2011

What if Google search was running on humans?

When discussing process automation with public sector officials and practitioners, I always stress the importance of solving semantic interoperability issues: this allows services to be executed at machine time (milliseconds) rather than at human time (hours or even days). This way, your new building permit might be issued (or rejected) within a couple of seconds.

But sometimes the message does not get through: people tend to take everything already achieved for granted, not realising the difference that technology has made in some cases. So I had to devise this simple benchmark:

What if Google search was running on humans?
Let's tackle this small problem in five steps:

1. How many web sites exist?
According to the Netcraft January 2011 Web Server Survey, there are globally almost 300 million hostnames and almost 100 million active web sites.

2. How many web pages exist?
According to a 2005 comparison between Yahoo-indexed web pages and Netcraft reports, there were around 270 pages per web site (active or not). That ratio gives a total of 270 × 300 million = 81 billion web pages. According to a report by Google in late 2008, almost 1 trillion web pages had been indexed by Google at that time, including a large share of duplicates or automatically generated pages, possibly as much as 90%, which would leave roughly 100 billion web pages. And according to http://www.worldwidewebsize.com/, an estimation algorithm puts the number of web pages indexed by Google, Bing and Yahoo at approximately 50 billion. As these three estimates are of the same order of magnitude, we will adopt the smallest one: approximately 50 billion web pages exist.
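For the record, here is a minimal Python sketch of that cross-check, using only the figures quoted above:

```python
# Cross-check of the three web-size estimates quoted above.
sites = 300e6                      # Netcraft, January 2011: ~300 million hostnames
pages_per_site = 270               # Yahoo vs. Netcraft comparison, 2005

estimate_netcraft = sites * pages_per_site   # ~81 billion pages
estimate_google = 1e12 * (1 - 0.90)          # ~1 trillion minus ~90% duplicates ≈ 100 billion
estimate_wwws = 50e9                         # worldwidewebsize.com: ~50 billion indexed pages

# All three land in the same order of magnitude; we adopt the smallest.
total_pages = min(estimate_netcraft, estimate_google, estimate_wwws)
print(f"{total_pages:.0e} pages")            # 5e+10
```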

3. How many person-hours would be needed for one (manual) search?
Suppose we had an infrastructure able to distribute web pages to humans so that they could search each page for a specific word, and allow 10 seconds to judge whether the word appears in a given page. Not much, you might say (try to locate a specific word in 10 pages and you will see the issue). Even so, one search over the 50 billion pages takes 500 billion person-seconds. So, if you need the answer within 10 seconds (the best this system can do), you still need 50 billion humans working in parallel to complete a single search. And this does not even include ranking...
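The arithmetic for this step, as a short Python sketch (the 10 seconds per page and the 10-second response target are the assumptions stated above):

```python
# One manual search over the whole index, using the assumptions above.
total_pages = 50e9          # pages to scan (from step 2)
seconds_per_page = 10       # time for one human to check one page for the word
target_latency = 10         # seconds we are willing to wait for the answer

person_seconds = total_pages * seconds_per_page       # 500 billion person-seconds
humans_per_search = person_seconds / target_latency   # 50 billion humans in parallel
print(f"{humans_per_search:.0e} humans for one search")   # 5e+10
```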

4. So?
According to a 2010 report by Search Engine Land, Google handles around 34,000 searches per second (or approximately 3 billion searches per day). So, every 10 seconds, we need to serve almost 340,000 searches, bringing the total number of humans needed to 50 billion × 340,000 = 17,000,000,000,000,000. As this number is quite big, we divide it by the population of the earth (6.9 billion people) and reach a conclusion:

If Google Search ran on human power, we would need almost 2.5 million times the global earth population (or 17 quadrillion people) to reach an average response time of 10 seconds.
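The same conclusion, spelled out in a small Python sketch (query volume and population figures as quoted above):

```python
# Scaling the single-search figure to Google's 2010 query volume.
humans_per_search = 50e9            # from step 3
searches_per_second = 34_000        # Search Engine Land, 2010
window = 10                         # seconds of tolerated latency

searches_per_window = searches_per_second * window        # ~340,000 searches every 10 seconds
humans_needed = humans_per_search * searches_per_window   # 1.7e16, i.e. 17 quadrillion people
earth_population = 6.9e9                                  # world population, 2011

print(f"{humans_needed / earth_population:,.0f} times the earth's population")  # ~2,463,768
```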

Think again before saying that "we do not need machines" in the public sector ...


   
