Doesn't Not Compute

My log of experiences with GNU/Linux and computers in general.

Déjà Vu All Over Again, Big Brother Google-Style

Once again, Google has removed the interface that the scraper engine Scroogle.org used to provide its users with Google search results.

The announcement that appears after one tries to use the search function:

July 1, 2010: Here we go again…

We regret to announce that our Google scraper may have to be permanently retired, thanks to a change at Google. It depends on whether Google is willing to restore the simple interface that we’ve been scraping since Scroogle started five years ago. Actually, we’ve been using that interface for scraping since Google-Watch.org began in 2002.

This interface (here’s a sample from years ago) was remarkably stable all that time. During those eight years there were only about five changes that required some programming adjustments. Also, this interface was available at every Google data center in exactly the same form, which allowed us to use 700 IP addresses for Google.

That interface was at www.google.com/ie but on May 10, 2010 they took it down and inserted a redirect to /toolbar/ie8/sidebar.html. It used to have a search box, and the results it showed were generic during that entire time. It didn’t show the snippets unless you moused-over the links it produced (they were there for our program, so that was okay), and it has never had any ads. Our impression was that these results were from Google’s basic algorithms, and that extra features and ads were added on top of these generic results. Three years ago Google launched “Universal Search,” which meant that they added results from other Google services on their pages. But this simple interface we were using was not affected at all.

It is not possible to continue Scroogle unless we have a simple interface that is stable. Google’s main consumer-oriented interface that they want everyone to use is too complex, too bloated, and changes too frequently, to make our scraping operation possible.

After a lot of suggestions from Scroogle users, and a fair amount of publicity, we found a fix and Scroogle was back in 24 hours. This fix was to insert an extra parameter, &output=ie, into the search terms that were relayed to Google. The extra parameter recovered the same interface that we thought was gone forever.

Now it seems like it actually might be gone forever. Late on June 30, 2010, the results produced while using this parameter began to shift to the usual busy Google interface with ads and a left-margin sidebar. Scroogle users saw a Scroogle page that said, “Google returned no results for this search,” when in fact Google returned results but our scraper was unable to deal with them. Over the next few days we will attempt to contact Google and determine whether the old interface is gone as a matter of policy at Google, or if they simply have it hidden somewhere and will tell us where it is so that we can continue to use it.

Thank you for your support during these past five years. Check back in a week or so; if we don’t hear from Google by next week, I think we can all assume that Google would rather have no Scroogle, and no privacy for searchers.

— Daniel Brandt, Public Information Research, scroogle AT lavabit.com

This is not the first time that Google has removed interfaces which were originally designed for Internet Explorer 6, and tapped by Scroogle. And I suspect that, if Scroogle somehow finds a way to scrape the results again, Google will again remove or change it.
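
For the curious, the fix described in the announcement (relaying the search with an extra output=ie parameter) amounts to something like the Python sketch below. This is only my illustration, not Scroogle's actual code: the function name and the base URL are assumptions of mine, since the announcement doesn't say exactly which address the searches were relayed to. It only builds the URL and makes no request, and per the announcement the parameter no longer brings back the old interface anyway.

```python
from urllib.parse import urlencode


def build_search_url(query, base="https://www.google.com/search"):
    """Build a search URL carrying the extra output=ie parameter.

    The base URL is an assumption made for illustration; the
    announcement only names the parameter itself (&output=ie).
    """
    # urlencode escapes the search terms (spaces become '+') and
    # joins the key=value pairs with '&'.
    params = {"q": query, "output": "ie"}
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("gnu linux"))
```

The point is simply that the whole fix was one extra key=value pair in the query string, which is also why it was so easy for Google to stop honoring it.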

Alternatives

Ixquick/Startpage

Having used it a bit before I found Scroogle, I recommend Ixquick, which scrapes multiple search engines and directories and does not record your IP address (unless you abuse it with an automated tool of some kind).

You can read their privacy policy here (United States version). They also have translations in UK English, Danish, German, Spanish, French, Italian, Dutch, Norwegian, Polish, Portuguese, Finnish, Swedish, Turkish, Korean, Japanese, and both Traditional and Simplified Chinese.

Ixquick also has an SSL-encrypted version for additional privacy, and an alternate, easier-to-remember URL: startpage.com

DuckDuckGo

Alternatively, DuckDuckGo appears to be an equally useful option which also has an SSL-encrypted version, but doesn’t appear to have any international options.

Like Google’s search page, the DuckDuckGo search page is very spare and clean. It also has options (which I haven’t tried) to search specifically for information or on shopping sites, and an “I’m Feeling Ducky” option which works the same as Google’s “I’m Feeling Lucky” button, a clever and humorous touch.

Both Ixquick and DuckDuckGo have options on their front pages to add themselves to your browser, if you like using search boxes in your browser. (I don’t, usually.)

Google SSL

For those of you who (unwisely) don’t care whether Google knows everything you search for, only that whoever is in charge of your local network (boss, ISP, etc.) doesn’t, Google currently has an encrypted connection to its search engine in beta testing. Google provides information about this on their site here.
