06 Apr 2007

Surviving Rankings, Re-Rankings, Filters and Google Hell.

I was reading Mr Wall today, who had been reading this interesting thread on WebmasterWorld and made these comments on his blog in a post about different links having different goals.

…Two big things that are happening are more and more pages are getting thrown in Google’s supplemental results, and Google may be getting more aggressive with re-ranking results based on local inter-connectivity and other quality related criteria. …

After reading most of that WebmasterWorld thread, I see the best meat of it comes, of course, from Tedster:

My current idea (this is used in many IR approaches) is that a preliminary set of results is returned, but then one or more factors undergo further testing. The preliminary set of results is now re-ranked according to multipliers determined in testing just those preliminary urls. These test factors could also be pre-scored and updated on a regular (but not necessarily date co-ordinated) basis, and be available in a secondary look-up table somewhere for quick use.

If your url doesn’t get into the preliminary set of urls, then this re-ranking step won’t ever help you — because no new results are "pulled in". If your url is in the preliminary set, the re-ranking may help you. But if you fail one of the tests, then your relevance score, or your trust score, or your aging score, or your whatever score, can be multiplied by 0.2 or a fractional factor like that. That would send your url on a rankings nose dive.

So this type of re-ranking could account for the yo-yo behavior we see, going from page 1 to end-of-results and back again. Note that the url is not thrown out of the result set, the preliminary result set is kept intact, but just re-ranked.

Part of making re-ranking technology like this practical and scalable would be getting very quick preliminary results — often cached preliminary results, I assume. This need for speed might also account for the large numbers of urls being sent to the Supplemental Index, making for a less unwieldy primary index.

Supplemental urls would only be tapped if the total number of preliminary results fell below some threshold or other.

This is my current line of thinking – purely theoretical, although informed by some knowledge of the art of Information Retrieval. I keep looking at the changes and asking myself what kind of math could account for the new signs we are seeing.

As long ago as 2 years, GoogleGuy mentioned in passing that we were tending to think in terms of filters and penalties, but that Google was moving away from that model. I think they’ve moved a giant step further — although some filters are clearly still there (only 2 results per domain for example) and some penalties as well (often manual).

I believe Tedster hit the nail on the head with some great points.
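Just to make the idea concrete for myself, here’s a toy sketch in Python. It’s purely illustrative: the factor names, the threshold, and the 0.2 multiplier are placeholders taken from Tedster’s description, not anything Google has confirmed.

    # Toy sketch of the re-ranking idea above -- not Google's algorithm, just an
    # illustration. Factor names, scores, and the 0.2 multiplier are placeholders.

    PRELIMINARY_THRESHOLD = 1000  # hypothetical cutoff before supplemental results get tapped

    def preliminary_results(query, index):
        """Fast first pass: pull candidate urls whose terms match the query."""
        return [doc for doc in index if query in doc["terms"]]

    def rerank(results, quality_scores):
        """Second pass: multiply each url's base score by pre-computed quality factors.

        A url that fails a test (trust, aging, whatever) gets a fractional
        multiplier like 0.2, which sends it to the end of the results without
        removing it from the preliminary set.
        """
        for doc in results:
            factors = quality_scores.get(doc["url"], {})
            multiplier = 1.0
            for name in ("trust", "aging", "local_interconnectivity"):
                multiplier *= factors.get(name, 1.0)  # 1.0 = pass, 0.2 = fail
            doc["score"] = doc["base_score"] * multiplier
        return sorted(results, key=lambda d: d["score"], reverse=True)

    def search(query, primary_index, supplemental_index, quality_scores):
        results = preliminary_results(query, primary_index)
        if len(results) < PRELIMINARY_THRESHOLD:
            # Supplemental urls only get tapped when the preliminary set is too small.
            results += preliminary_results(query, supplemental_index)
        return rerank(results, quality_scores)

Note the key point: nothing is thrown out of the preliminary set, it just gets re-ordered, which is consistent with the yo-yo behavior Tedster describes.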

I’ve had this picture in my mind of a row of Google servers. Data is gathered and fed to the first computer, which ranks the pages based on PageRank; the next computer recalculates the rankings based on TrustRank; the next reorders the listings based on the interconnectivity of the community; the next reorders them based on the filters that are applied; then more filters and more reordering… then there are different datacenters, each with slightly different weights on each prior reordering… and as more data is fed in, and more reorders and filters are applied, the more things change… phew!
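If I had to sketch that picture in code, it would look something like this. Again, just a toy under my own assumptions: the stage names, datacenter weights, and numbers are all made up.

    # Toy version of the "row of servers" picture: every stage contributes a
    # factor to the same result set, and each datacenter weights the stages a
    # little differently. All names and numbers here are made up.

    STAGES = ["pagerank", "trustrank", "interconnectivity", "filters"]

    DATACENTER_WEIGHTS = {
        "dc1": {"pagerank": 1.0, "trustrank": 1.0, "interconnectivity": 1.0, "filters": 1.0},
        "dc2": {"pagerank": 0.9, "trustrank": 1.2, "interconnectivity": 1.1, "filters": 1.0},
    }

    def rerank_at_datacenter(results, dc):
        """Re-score the same result set stage by stage; nothing gets dropped, only reordered."""
        weights = DATACENTER_WEIGHTS[dc]
        for doc in results:
            score = 1.0
            for stage in STAGES:
                # Each stage's factor is dampened or exaggerated slightly by the
                # datacenter's weight, which is one way results could differ by datacenter.
                score *= doc.get(stage, 1.0) ** weights[stage]
            doc["score"] = score
        return sorted(results, key=lambda d: d["score"], reverse=True)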

So with this picture in my head, like Aaron, I too see obtaining different links with different goals. The goals I see myself working on are these:

  • I see getting trusted links from trusted sites to raise the TrustRank value.
    (Find trusted sites, write to them, and offer them something of value.)
  • I see getting links from subpages that have direct trusted backlinks to them to help trust and power.
    (Get links from pages that have backlinks to them… they are worth sooooo much.)
  • I see boosting my co-citation site neighborhood by getting links on pages that link to other sites in my neighborhood.
    (Common Backlinks: link to our public tool; sorry, only our private tool strips the crap scraper results from this list. There’s a rough sketch of the idea right after this list.)
  • I see boosting my co-citation page neighborhood by putting other trusted links next to my links (mixing neighborhood with trust).
    (Here’s my paragraph ad with links to me and a few other highly trusted related sites.)
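For the common-backlinks point above, the check is simple enough to sketch. This is a rough illustration only: the urls and lists are made-up examples, and in practice you’d want the scraper junk stripped out of your backlink data first.

    # Rough sketch of a common-backlinks check for co-citation: which pages link
    # to both my site and a trusted neighbor? Those pages define the shared
    # neighborhood and are good targets. The urls below are made-up examples.

    def common_backlinks(my_backlinks, neighbor_backlinks):
        """Return pages that link to both sites, i.e. the shared neighborhood."""
        return sorted(set(my_backlinks) & set(neighbor_backlinks))

    mine = ["http://blog.example.org/tools", "http://example-directory.com/widgets"]
    theirs = ["http://blog.example.org/tools", "http://example-hub.net/resources"]

    for url in common_backlinks(mine, theirs):
        print(url)  # pages already linking into the neighborhood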

I’m trying to do all I can so that when Google is "done" ordering, reordering, filtering, and refiltering over and over again, our guys will, with a little luck, come out on top. That’s the goal. The types of links above are what I’m focusing my strategy on.

And yeah… as far as Google Supplemental Hell goes, Google’s cleaning house. Tedster also makes some great notes about that.

Speaking of Supplemental Hell, here’s something else I’ve been experiencing: if you publish 300 pages today and three months go by without anyone linking to any of those 300 pages, guess where they might be going? Supplemental Hell. Moral: don’t publish a bunch of pages at once (especially in a new folder) unless you know they’ll get some backlinks and trust within a few months. I’ve seen many sites have entire folders go to Supplemental Hell after hundreds of pages were published in that folder in one day and nothing there got a link for the first three months of its existence. I’ve seen other new folders/pages survive (so far) that got a small handful of nice backlinks to even a few pages in the new folder. That kinda says something about a "quality rating" check on new pages: no links to any page in a folder, and it’s Supplemental Hell for them all. So if you publish a new folder with 300 new pages, get some of those pages some trusted backlinks, and hurry!
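The check I run on a freshly published folder is basically this. It’s a sketch with made-up data shapes: the paths are hypothetical, and the backlink counts come from whatever backlink tool you happen to use.

    # Sketch: which pages in a newly published folder still have zero backlinks?
    # Those are the ones most likely headed for the supplemental index if nothing
    # changes. The paths and counts below are made-up examples.

    def pages_needing_links(new_pages, backlink_counts, min_links=1):
        """Return pages in the new folder that haven't picked up enough backlinks yet."""
        return [page for page in new_pages if backlink_counts.get(page, 0) < min_links]

    new_folder_pages = ["/guides/page-001.html", "/guides/page-002.html", "/guides/page-003.html"]
    backlinks = {"/guides/page-002.html": 3}  # only one page has picked up links so far

    for page in pages_needing_links(new_folder_pages, backlinks):
        print("needs links:", page)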

In other news: I’ll be away all next week for SES in NYC. I was only going to stay a few days, but I’ll now be speaking on the "Linking Strategies" panel on Friday at 9am so I’ll be staying the whole week. See ya in NYC!