Bad news, everybody — it seems that Google has removed the interface that Scroogle used to get its results from, rendering Scroogle’s search engine scraper powerless.
For those who don’t know, Scroogle acts as a go-between for your computer and Google’s servers. You go to Scroogle.org, click on the “Scroogle Scraper” link (or just type “scroogle.org/scraper.html” to start with 😉 ) and type in your search terms. Scroogle’s servers send these search terms on to Google’s servers, get the results, parse them, and send them back to you.
“Why bother with the middleman to get the same results?” you might ask. Well, you see, Google places a tracking cookie on your computer if it doesn’t have one already. This cookie doesn’t expire for 38 years, and it allows Google to tie your searches to a specific IP address, that is, to your network. If I understand correctly, it also lets them tie in your surfing across any website they control or have AdSense on, which is a LOT. This helps them create a rather thorough database of your surfing habits, which is a rather huge invasion of privacy, whether it is intended for money-making, more specific searches, or some other reason. And that’s not the only issue with Google, but I’ll let Google Watch teach you about all that, for now. 🙂 And Wikipedia too, since I am not without a sense of irony.
By acting as a middleman between your computer and Google, Scroogle prevented this cookie, your search terms, and your IP address from being logged by Google. Moreover, Scroogle didn’t use any cookies of its own, didn’t save your search terms, and didn’t even keep its own logs for more than 48 hours.
Best of all for those searching from work or over insecure WiFi, there was an SSL-encrypted version of the search scraper.
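The go-between arrangement described above can be sketched roughly in Python. This is only an illustration of the idea, not Scroogle’s actual code: the function names, the list of headers to drop, the placeholder upstream address, and the “q” query parameter are all assumptions. The sketch builds the proxy’s own outbound request so that it carries the search terms but none of the user’s cookies or other identifying headers, and then pulls the links out of a results page.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Headers that could identify the user (an assumed, illustrative list).
IDENTIFYING_HEADERS = {"cookie", "referer", "x-forwarded-for", "user-agent"}

# Placeholder upstream address; a hypothetical stand-in for the real search engine.
UPSTREAM = "http://search.example/ie"


def build_upstream_request(search_terms, client_headers, upstream=UPSTREAM):
    """Return (url, headers) for the proxy's own request to the search engine.

    Only the search terms are passed along; identifying headers are dropped.
    """
    url = upstream + "?" + urlencode({"q": search_terms})  # "q" is an assumption
    headers = {
        name: value
        for name, value in client_headers.items()
        if name.lower() not in IDENTIFYING_HEADERS
    }
    return url, headers


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from a results page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def parse_results(html):
    """Parse a results page and return its links as (href, text) pairs."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

A real scraper would also have to fetch the page, cope with changes in the results markup, and serve the cleaned-up results back to the user, which is exactly why a simple, stable interface mattered so much.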
But now, Big Brother Google has removed the interface Scroogle’s scraper used. Here is the announcement from Scroogle:
We regret to announce that our Google scraper may have to be permanently retired, thanks to a change at Google. It depends on whether Google is willing to restore the simple interface that we’ve been scraping since Scroogle started five years ago. Actually, we’ve been using that interface for scraping since Google-Watch.org began in 2002.
This interface (here’s a sample from years ago) was remarkably stable all that time. During those eight years there were only about five changes that required some programming adjustments. Also, this interface was available at every Google data center in exactly the same form, which allowed us to use 700 IP addresses for Google.
That interface was at http://www.google.com/ie but on May 10, 2010 they took it down and inserted a redirect to /toolbar/ie8/sidebar.html. It used to have a search box, and the results it showed were generic during that entire time. It didn’t show the snippets unless you moused-over the links it produced (they were there for our program, so that was okay), and it has never had any ads. Our impression was that these results were from Google’s basic algorithms, and that extra features and ads were added on top of these generic results. Three years ago Google launched “Universal Search,” which meant that they added results from other Google services on their pages. But this simple interface we were using was not affected at all.
Now that interface is gone. It is not possible to continue Scroogle unless we have a simple interface that is stable. Google’s main consumer-oriented interface that they want everyone to use is too complex, and changes too frequently, to make our scraping operation possible.
Over the next few days we will attempt to contact Google and determine whether the old interface is gone as a matter of policy at Google, or if they simply have it hidden somewhere and will tell us where it is so that we can continue to use it.
Thank you for your support during these past five years. Check back in a week or so; if we don’t hear from Google by next week, I think we can all assume that Google would rather have no Scroogle, and no privacy for searchers, at all.
— Daniel Brandt, Public Information Research, scroogle AT lavabit.com
This is an extremely unfortunate event: we have been denied the right to have SECURE, PRIVATE access to Google’s search results. Please make everyone you know who uses Google and/or is concerned about online privacy aware of these issues and of this move by Google to further squelch privacy.
Hopefully Google will reinstate the ability for others to scrape from their engine, whether by resurrecting the http://www.google.com/ie interface used by Scroogle or by creating a standardized API.
Hopefully this was just a mistake on Google’s part.
Hopefully there won’t be any more severe, tornadic thunderstorms here in the Midwestern United States.
Hopefully pigs will start flying soon. 😀