Google’s Spam Rating Guide
I previously posted about how Google uses humans to QA search results. Henk van Ess has since posted the training manual for the people doing the QA.
There’s nothing too shocking in the guide. Some of the techniques it specifically addresses include:
* sneaky redirects
* 100% frame
* hidden text/hidden links
* porn on expired domains (I call this porn-napping)
* secondary search results/PPC
* thin affiliate doorway pages
I especially liked the in-depth discussion about what constitutes search engine spam in the case of hotel booking sites. I guess this is a particularly thorny problem.
Henk also posted the descriptions of how evaluators should rate. It would be fascinating to deconstruct this document from an information science standpoint.
FWIW, I still find a fair amount of search engine spam when I ego-search on my name or my articles (usually what the guide would call “secondary search results/PPC”). These pages are built from snippets of legitimate content that index OK, but each page as a whole is gibberish. I’ve found small portions of my articles used as the snippets for these pages. Google does a pretty good job of keeping these sites off the first page, but any deeper investigation usually turns up a good number of junk pages.
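For anyone who wants to check a suspect page against their own writing, here is a minimal sketch of one way to do it, assuming you already have both texts as plain strings. It compares overlapping runs of words ("shingles") and reports how much of the page was lifted from the article. The function names and the five-word window are my own assumptions for illustration; nothing here comes from the guide or reflects how Google detects this kind of spam.

```python
import re


def shingles(text, n=5):
    """Return the set of n-word shingles (lowercased word runs) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def copied_fraction(article, page, n=5):
    """Fraction of the page's shingles that also occur in the article.

    Hypothetical helper: a high value suggests the page is stitched
    together from pieces of the article.
    """
    page_shingles = shingles(page, n)
    if not page_shingles:
        # Page is shorter than n words, so there is nothing to compare.
        return 0.0
    shared = page_shingles & shingles(article, n)
    return len(shared) / len(page_shingles)
```

A page that mixes a lifted sentence fragment with unrelated keywords will score somewhere between 0 and 1. What counts as suspicious is a judgment call that depends on the window size and how long the page is.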