Hopefully this is the last time (this year, at least) that I’ll have to write about Scroogle’s search-results scraper being disabled by a change at Google. It is back up once again, although the administrators have not said exactly how they managed it. That is probably for the best, since the last time they gave specifics, the workaround only kept working for under two months before Google changed Scroogle’s new target page into an unusable form.
What they do mention about the new method makes me a little uneasy about their future, however. I haven’t been able to find a dedicated page about it on their website yet, but for a day or two, “http://www.scroogle.org/cgi-bin/nbbw.cgi?Gw=” stated:
If you arrived here because you followed a link in recent stories about Scroogle’s downtime on 2010-07-01, please be advised that Scroogle is now operating normally. Google has not restored the old simple interface so we are using a different one. We’re not happy about this because the file we have to fetch from Google is three times more bloated for the same results. Also, our malware problem continues as before.
You can follow that link and read through Scroogle’s recent battle against malware, but it boils down to: “For the last two weeks, an easily-identified and blocked malware script has been eating our bandwidth. This is a bad thing — and completely unlike our own activities, because Google is big and can easily handle us, but we only have six servers. Lawlz.” My wording, of course. 😛
Meanwhile, the administrators have a theory about why they have been, and still are, getting spammed, based on the experience of one Scroogle user.
“When he opened the tab up later, something had filled it in for him. It could be that the empty search terms we see with our current malware problem have to do with an initial pass made by the malware using the victim’s browser, just to see if anything is responsive.”
I assume that nobody else, including a cat, came along and decided to type something in just to mess with this person. 😀
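If that theory holds, the “initial pass” requests should be easy to spot: they carry no search terms at all. As a purely hypothetical sketch (the `Gw=` parameter name comes from the Scroogle URL quoted above; the function and its use are my own assumptions, not anything Scroogle has published), a filter could reject such probes before they cost any bandwidth:

```python
# Hypothetical sketch: flag the empty-query "probe" requests described
# above. Only the "Gw" parameter name is taken from the Scroogle URL
# quoted earlier; everything else here is an illustrative assumption.
from urllib.parse import urlparse, parse_qs

def is_probe(url: str) -> bool:
    """Return True for requests that look like the malware's initial
    empty-search pass, i.e. carry no meaningful search terms."""
    query = parse_qs(urlparse(url).query, keep_blank_values=True)
    terms = query.get("Gw", [""])[0].strip()
    return terms == ""

print(is_probe("http://www.scroogle.org/cgi-bin/nbbw.cgi?Gw="))        # True
print(is_probe("http://www.scroogle.org/cgi-bin/nbbw.cgi?Gw=privacy")) # False
```

A real deployment would of course sit in the web server or a firewall rule rather than in application code, but the idea is the same: an empty search field is cheap to detect and safe to drop.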
But this is rather serious. True, Scroogle is an external service which, by operating at all, may be in violation of section 5.3 of Google’s Terms of Service, but its problems are a symptom of Google’s own weakness at filtering spam sites out of its results index: sites that can deliver malware to vulnerable PCs, if they are designed to do so. And clearly, some are doing so.
Good thing I don’t like using Windows for day-to-day normal usage. 😉
A humorous curiosity you may notice, if you’re attentive to detail: at the top, in the bold print indicating the original official summary, Mr. Brandt states that “This has nothing to do with Google itself,” but in the update from July 5th, currently at the bottom of the page, he makes this sarcastic comment:
“Hello, Google, is anyone home? If you want to shut down Scroogle, why must you steer hundreds of spam pages, any or all of which could be gateways to malware, to our site? Why not just take down your simple interface that we scrape? (Oh, wait, they did that too.)”
Sarcasm, or ironic truth? You be the judge.
Ultimately, I just hope that:
- Scroogle gets more servers up and running to counter the increased load from the new methods they’ve been forced to use.
- Google puts their censorship to good use and removes these sites from their results index.
- Scroogle gets some design artists — I don’t want to be rude, but their site is PAINFULLY ugly. Even a simple background colour tweak with CSS would make things look MUCH better, without increasing the server load at all.
- Whoever created the botnet that’s doing this gets imprisoned, or at least HEAVILY fined, for their stupidity. I get that they would want anonymity for this little project, but putting that load on a tiny privacy-promoting site is just senseless.
- Scroogle can STAY UP for a while, perhaps by designing a few additional scraping methods in case their current method is rendered ineffectual as well.
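On that design point: since a stylesheet is served once and cached, even something this small would cost the servers essentially nothing. A purely illustrative fragment — the selectors and colours are my own picks, not anything from Scroogle’s actual stylesheet:

```css
/* Illustrative only: a few lines like these could soften the look
   without adding any server-side work. Colours are arbitrary. */
body {
    background-color: #f4f6f8;  /* soft grey instead of bare white */
    color: #222;
    font-family: sans-serif;
    max-width: 42em;            /* keep line lengths readable */
    margin: 0 auto;
}
```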