I believe that after they removed the ability to clearly see which pages are in the supplemental results, they went on a binge of putting a far higher percentage of pages into this “Supplemental index”.
So something to understand today with Panda is that Google was already pretty good at tossing the majority of everyone’s pages on their sites into the supplemental results. At least the deep pages, and the pages with little content, and the pages of dup content…I had a call with a client today who had 397,000 pages indexed in Google…I told him that already Google probably had all but perhaps a few thousand pages in the supplemental results…and now, after the April Google Panda Update, he has about 20 quality pages…the rest needs to be redone and updated for 2011 and beyond…or should I say from after being Pandasized.
Google’s been tossing duplicate pages, and poorly linked pages, and pages with little content.
Since then they’ve added some other signals….what used to be the “supplemental index” has probably been rolled into a later update that they did called Caffeine in 2009…
but telling what is “good” and what is “bad” when it comes to original content with “power” has been a weak point…just because a page is original, and powerful, doesn’t mean it’s a quality page.
So how would I, if I were Google, tell if a page were "good" or "bad"?
....Is time on the page important?...maybe a little...if I see 1000 words, and the average time on the page is 15 seconds, I wouldn't give all that content much weight in the content part of their equation... but it can still certainly "solve the question" someone was searching for...in fact, it can solve the question in just a few seconds and still be "good" content..
Does the number of pages visited on your site make a quality page?...is it better that someone engages with your site...but can't they get the answer w/o engaging with your site....does Google care if they engage with your site...or do they care if the searcher quickly finds what they were seeking?
Is "brand loyalty" important?... well...does someone need to come back to a site for it to be a good search result?...maybe...but then again, Google probably doesn't care too much if people go back to your site again and again.
What if it gets high Click Through Rates (CTR's) in the Search Engine Results Pages (SERP's)?...yes, that can be an indicator that that is what people are looking for...and Google, I'm sure, is giving sites with high ranking CTR's in SERPS a ranking boost... and those with low CTR I'm sure are getting a "negative" in that part of the algorithm. I believe that CTR in SERPS has been a signal for years...but that still doesn't tell if the searcher found what they are looking for when they did go there.
So why, if you grab a phrase post-Panda, do the scraper pages rank higher?...why, I'd say it's because you tripped a negative content filter...and a negative content score must be worse than a supplemental results page...thus the supplemental results show up first, above you....making it look like they're ranking those pages higher...but even those scraper pages won't get much traffic being in the supplemental index.
Where in the webmaster tools, or Google analytics, can you find who went back to Google???...answer...you can't...does Google know...yes...do we...no, and that is the main reason why people are looking over their analytics and going insane seeking the "perfect" rules for what you need...that's hard to do with the information that is available to you even with Analytics and Webmaster tools....there must be large signals that are missing to fill the gaps where analytics and usability fail in the analysis of Panda...
Consider, also, what if you were ranked #1, and your CTR was 50% (Average CTR for #1 organic rankings is between 35-45 % of all clicks)…so that would be a great Click Through Rate…but what if 98% of those people returned to the same Google search, and then they clicked on the guy at #2…and then they didn’t return to Google for another hour, where they performed some other search for something else…
I’d say that they must have found what they were looking for at #2…and they did Not find what they were looking for at your site….if 98% of the people that went to your site, just backtracked to that same search and stopped at #2, then I’d say that your content might not be great for that phrase…even if it’s original content…and even if it’s at a level of not being supplemental.
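To put rough numbers on that scenario, here is a minimal sketch of the arithmetic. The 50% CTR and the 98% return rate come from the example above; the function name is made up for illustration, and nothing here is a claim about how Google actually computes anything.

```python
def split_clicks(ctr, return_rate):
    """Split the searchers who clicked a result into 'stayed' and 'went back'.

    ctr         -- share of all searchers who clicked the result (0..1)
    return_rate -- share of those clickers who returned to the same search (0..1)
    Both inputs are illustrative; only Google can see the real return rate.
    """
    stayed = ctr * (1.0 - return_rate)
    went_back = ctr * return_rate
    return stayed, went_back


# Numbers from the example: ranked #1, 50% CTR, 98% go back and click #2.
stayed, went_back = split_clicks(ctr=0.50, return_rate=0.98)
print(f"{stayed:.1%} of all searchers clicked #1 and stayed")
print(f"{went_back:.1%} of all searchers clicked #1, went back, and moved on")
```

So a "great" 50% CTR can hide the fact that only about 1 searcher in 100 was actually satisfied by the page.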
I know of no way to tell who went from your site back to Google, and what they did once they went back to Google….and, if they went back to Google, #1 did the searcher return to the same Google SERP and click on someone else….or, #2, did they return to Google and run another search?
We can’t measure this…Google doesn’t give us this information….so how do you tell if the content is good?
On February 25th, the day after Google Panda started, Amit Singhal, a Google engineer, gave an interview to the Wall Street Journal. Here's the important part:
Singhal did say that the company added numerous "signals" or factors it would incorporate into its algorithm for ranking sites. Among those signals are "how users interact with" a site. Google has said previously that, among other things, it often measures whether users click the "back" button quickly after visiting a search result, which might indicate a lack of satisfaction with the site.
In addition, Google got feedback from the hundreds of people outside the company that it hires to regularly evaluate changes. These "human raters" are asked to look at results for certain search queries and questions such as, "Would you give your credit card number to this site?" and "Would you take medical advice for your children from those sites," Singhal said.
...Singhal said he couldn't discuss such details and how the search algorithm determines quality of site, because spammers could use the answers to game the system
Now looking at this...would I trust the data I get from "hundreds of people"...or from looking at 34,000 searches per second?....yea, I'd use the data from the searches a lot more than I'd put weight in what "hundreds" of people say to your questions about credit cards and medical advice things....
But anyways... a week later, Matt Cutts joins Amit Singhal in an interview with Wired Magazine:
Wired.com: How do you recognize a shallow-content site? Do you have to wind up defining low quality content?
Singhal: That's a very, very hard problem that we haven't solved, and it's an ongoing evolution how to solve that problem. We wanted to keep it strictly scientific, so we used our standard evaluation system that we've developed, where we basically sent out documents to outside testers. Then we asked the raters questions like: "Would you be comfortable giving this site your credit card? Would you be comfortable giving medicine prescribed by this site to your kids?"
Cutts: There was an engineer who came up with a rigorous set of questions, everything from. "Do you consider this site to be authoritative? Would it be okay if this was in a magazine? Does this site have excessive ads?" Questions along those lines.
Singhal: And based on that, we basically formed some definition of what could be considered low quality.
Singhal: You can imagine in a hyperspace a bunch of points, some points are red, some points are green, and in others there's some mixture. Your job is to find a plane which says that most things on this side of the place are red, and most of the things on that side of the plane are the opposite of red.
Cutts: ... But for example, our most recent algorithm does contain signals that can be gamed. If that one were 100 percent transparent, the bad guys would know how to optimize their way back into the rankings.
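Singhal's picture of red and green points split by a plane is, in machine learning terms, a linear classifier. Below is a minimal sketch using a plain perceptron. The two features (share of original content, share of the page taken up by ads) and all the data points are invented for illustration; Google has not said which features or which learning method it uses.

```python
def train_perceptron(points, labels, epochs=100, lr=1.0):
    """Find weights w and bias b so that sign(w.x + b) matches each label.

    labels are +1 ("green", rated good) or -1 ("red", rated low quality).
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # point is on the wrong side of the plane
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # every point is on the correct side
            break
    return w, b


def classify(w, b, x):
    """Report which side of the learned plane a page falls on."""
    return "green" if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else "red"


# Hypothetical rated pages: (share of original content, share of page that is ads)
pages = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.1), (0.2, 0.8), (0.1, 0.9), (0.3, 0.7)]
ratings = [1, 1, 1, -1, -1, -1]

w, b = train_perceptron(pages, ratings)
print(classify(w, b, (0.85, 0.15)))
print(classify(w, b, (0.15, 0.85)))
```

The human raters supply the red and green labels; the algorithm then applies the learned plane to pages no rater ever looked at.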
Keep in mind how these answers differ from the original interview with just Amit Singhal....in the original interview with just Singhal, it was said, "Google has said previously that, among other things, it often measures whether users click the "back" button quickly after visiting a search result, which might indicate a lack of satisfaction with the site."
But now, a week later, with Matt Cutts there, the "click back to Google" isn't mentioned...instead it's just the "hundreds of people outside the company that it hires to regularly evaluate changes." that are talked about...so, things like "clicking back" are not mentioned in the interview a week later when Matt's there as well...but Matt Cutts does say "...But for example, our most recent algorithm does contain signals that can be gamed. If that one were 100 percent transparent, the bad guys would know how to optimize their way back into the rankings."
It's worth noting this link where over 2000 people are crying to Google saying they were collateral damage...and probably 14% are really collateral damage....you can add your site to that list for the Google engineers to check out: Google Webmaster Central Help Forum. In there, Wysz, a Google employee, states:
Our recent Google Panda update is designed to reduce rankings for low-quality sites, so the key thing for webmasters to do is make sure their sites are the highest quality possible. We looked at a variety of signals to detect low quality sites. Bear in mind that people searching on Google typically don't want to see shallow or poorly written content, content that's copied from other websites, or information that are just not that useful. In addition, it's important for webmasters to know that low quality content on part of a site can impact a site's ranking as a whole. For this reason, if you believe you've been impacted by this change you should evaluate all the content on your site and do your best to improve the overall quality of the pages on your domain. Removing low quality pages or moving them to a different domain could help your rankings for the higher quality content.
We've been reading this thread within the Googleplex and appreciate both the concrete feedback as well as the more general suggestions. This is an algorithmic change and it doesn't have any manual exceptions applied to it, but this feedback will be useful as we work on future iterations of the algorithm.
You can add your site to the over 2000 sites above in the Google forum...You can analyze your webmaster tools data and your Google analytics data and get a lot of "partly smoking gun" information (which is helpful, and we do this as well)...
And it certainly can't hurt to look at your pages and ask these questions:
Perhaps the biggest question of all I really think should be "How do I get people to not quickly go back to the same Google search?"
The past few months I’ve been reading a lot of interesting theories about the Google Panda Update…and I’ve read a lot of noise (stuff I don’t believe)… even one of the top results in a search in Google for “Google Panda Update” is a page talking about Panda and low quality backlinks….Panda has nothing to do with backlinks…trust me…if it did, I, of all people, would be shouting that my link builders have the solutions to Panda…my link builders can’t help with Panda (but my other teams can)… I haven’t heard Google talk about social signals…but I keep hearing people mention social being a “help” to Panda…um…I don’t think so….this is a content issue…not an internal/external link thing…not a social thing…I totally believe it all has to do with Google’s analysis of User Behavior in relation to each page of your website (or sets of pages on your site).
I read a nice thread over at WebmasterWorld that was started by Brett Tabke called Panda Metric: Google Usage of User Engagement Metrics
There, Brett nicely outlines all the things that Google knows about searches including, how you got to Google, where you’re from, your history, your browser, your tracking data, and cookies. Brett goes on to say:
At this point, Google knows who 70-75% (my guess) the users are and what they are doing on any given query, and can guess accurately at another 15-25% based on browser/software/system profiles (even if your ip changes and you are not logged in, Google can match all the above metrics to a profile on you)….
Finally, after all that data, the user probably types in a query: (if the search didn’t come from offsite).
Then there’s the query entry, the SERP behavior, and then the click on a result. At that point, Brett says, Google looks at:
We know link text counts forward (to the page the link is pointing to) I think part of what Panda does is reverses the scoring and the quality score of the linked page counts backwards (to the page doing the linking).
Keep in mind this is ‘speculation only’ ATM, but I really think people are looking in the wrong place when they’re simply looking at link based scoring ‘the old way’ … Simple link weight based scoring is soooo 2000, IMO.
Think of it this way…if you have 100 pages on your website…and if Google thinks that 70 of those are “Panda Poop Pages” (Yes, I’m coining a new phrase here…Panda Poop Pages)…and say they score those Panda Poop Pages each a negative 10…then your site can get a negative score overall if all content is added up and scored across your site…beyond that, possibly, if you have an internal page that has 100 links on it going to other pages of your site, and if 80 of those links go to “Panda Poop Pages”, then that page might have a lowered ranking itself because a user has an 80% chance of going to a “Panda Poop Page” from that page…
If this is the case, then improving things on a page-by-page level, will in turn, now tell Google that the page with 100 links is now only linking to 79 “Panda Poop Pages” instead of 80, and that page will increase ever so slightly…
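Here is that speculation as a small sketch. The negative 10 per bad page and the 100/70/80 counts come from the example above; the positive 10 for a good page is my own assumption (the example never gives one), and the function names are made up. This is a toy model of a guess, not Google's formula.

```python
BAD_PAGE_SCORE = -10   # from the example above
GOOD_PAGE_SCORE = 10   # assumption: the example gives no score for good pages


def site_score(total_pages, bad_pages,
               good_score=GOOD_PAGE_SCORE, bad_score=BAD_PAGE_SCORE):
    """Add up per-page scores across the whole site."""
    good_pages = total_pages - bad_pages
    return good_pages * good_score + bad_pages * bad_score


def bad_link_share(links_to_bad_pages, total_links):
    """Chance that a user following a random link on the page lands on a bad page."""
    return links_to_bad_pages / total_links


print(site_score(total_pages=100, bad_pages=70))  # 30 good, 70 bad
print(bad_link_share(80, 100))                    # before fixing anything
print(bad_link_share(79, 100))                    # after fixing one page
```

Fixing a single page moves the linking page from 0.80 to 0.79, which is why a recovery under this model would look slow and page-by-page.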
I have a feeling that sites that have been Panda Pooped on, will not just get clean overnight…nor see any big “Wa-La!….We’re back”…they’ll see slow steady increases…page-by-page….which will, in turn, help the pages above those…and in turn, help the site as a whole… again, I don’t know….no one has yet to this day said “We came back from Panda” and I don’t think you’ll ever hear that story unless it’s a story about a whole year in time slowly bringing trust/rankings back….there are some stories of pages coming back…but there’s also been stories of things bouncing around in rankings… I had one client who I spoke with today who prior to Panda II had ranked #4 for a major phrase. After April 11 he dropped to #15…then he dropped to the 50’s…then last Friday he was #12, and today he’s #8 …and keep in mind, that he hasn’t done a thing with the site since the April Panda Update. …there’s still some bouncing around and threads where people are saying “hey, a page came back”…or is “recovering”…and then the next day it’s “sorry…it fell again”…
So, there have been several iterations of the Panda Update so far, and more tweaks are expected.
Google says that they were targeting “Content Farms”, but none of the sites that I have been analyzing have been “Content Farms”. The clients that I had that were affected by Google Panda update, and the people that came to me later, were all basically ecommerce sites selling products… they didn’t have content about everything under the sun, they were just ecommerce sites with lots of “product” pages.
I can kind of understand Google going after “Content Farms” and it’s a great battle cry of Google…”We’re getting rid of content farms…and no one likes content farms.”… but the sad fact is that the reach of Panda update went way beyond content farms…
If I showed you any of the Panda update affected sites that I’ve been looking at, you’d never think any one of these had any characteristics of a “content farm”. .. so keep in mind that I’m looking at this mostly trying to analyze the ecommerce type of sites that were affected…these are sites with typically 20-100 main categories, and with a few thousand individual products. Most of these product pages either had “manufacturer” content that many other people had, or rewritten content, or a mixture of original content and dup content. Many of these sites had “mash up” pages of their products which was a large problem, and many of these sites had issues with sending people back to Google (another story I’ll write more on later)…
It’s worth noting that Amit also said in that post:
One other specific piece of guidance we’ve offered is that low-quality content on some parts of a website can impact the whole site’s rankings, and thus removing low quality pages, merging or improving the content of individual shallow pages into more useful pages, or moving low quality pages to a different domain could eventually help the rankings of your higher-quality content.
So keep in mind that your crappy pages will affect rankings on your entire site, and that you have to “remove, merge, improve, or move” those lower quality pages to have a chance of helping your rankings.
Even though I wrote some tongue in cheek responses to some of these questions, I do appreciate Amit sharing these tips on how to write better content. I’m not sure how much of this is really figured into the algorithm of the Google Panda Update, but as far as content writing for 2011 and beyond, this is a great guide to how to write content post-Panda.
Jim: My first thoughts are that Big Brands win again. People will trust Brands over Mom and Pops....but then again, ....getting a computer program to guess "trust" in the content itself is pretty hard...They might be able to guess trust based on ad layouts...but in the content itself.... I find that much harder to gauge w/o knowing who clicks on a site in a Google search, but then goes back to the same Google search and clicks on someone else, and never returns to you. (Google knows, but we don't)....
Tip: One thing that might help could be if you put great links on all your pages going to either great pages on your site, or great pages on other sites (link out to trusted places and you will become more trusted yourself)....
Tip: put Trust Seals all over your pages
Jim: Richard Zwicky authored a post in Search Engine Watch called, "Is Author an Authority Signal for Google?". This post references a Patent which Google was granted in 2010 that talks about "Blog Authors".... if you take the word "blog" and replace it with the word "article" then the importance of having a "real", "Expert or Enthusiast" person writing content for your site becomes clear. If you have 100 pages of content...and no author cited for that content, what value is any of that "non-authored" content compared to pages written by a "Google known" "Expert or Enthusiast"?
Tip: Perhaps all your content should have a "written by..."Jim B. Twain"...and Jim B. Twain should only write on this broad topic...and Jim B. Twain needs a Facebook page..and a twitter account, and a bio page, and he should comment in blogs and forums, linking that name to his bio page....the bio page would say, "James B. Twain is not only an Enthusiast, but he's also an Expert in [whatever you sell]." Create a Jim B. Twain...and make him real.....And then make him write 100 pages enthusiastically in an expert style.
Tip: You can check the reading level of pages, by searching with reading levels turned on...like this search....but is Reading Level a measure of "writing "expert" level"?
Jim: This is the main reason I saw most ecommerce sites hit. Many sites created a lot of pages with different mashups of products, and with a little bit of unique content... but very similar to other related pages....
Tip: Consolidate your category pages and consolidate all your current pages wherever possible. Instead of targeting 3 phrases on a page, target 30... oh, G will hate me for saying that....oh well. You need content to get found....if someone searches for "cheap blue widgets new jersey", then you're not going to show up in search results unless you have all those words somewhere on a webpage... You can't kill content...content is what will bring you traffic...you just have to write it better and not do a bunch of cheap crappy articles that are all very, very similar (i.e., articles on 500 pages that are all about 3 paragraphs in length with no authors are not given much weight).
Jim: Tip: Pay companies like Verisign and McAfee to get their "Trust Symbols" like these symbols on your site...in fact I think that so many sites are going to do this now that I just purchased stock in Verisign and in Intel (they own McAfee now)...now, everyone go buy those trust signals so my stock will go up!
Jim: Does this mean that if you can't spell...or your style is wrong....you're screwed by Panda...if so, I'm screwed.... "Factual errors"...now how can Google know this?? But anyways, my style is horrible...who writes with 3 dots all the time?... hopefully Panda won't poop on me for having my own unique writing style
Tip: Send every page through a spell check program....seriously, have your intern do this today!
Jim: Hum..first, can Google really tell what drives the topics I write about?...If I have an ecommerce site, and I have a page about "Pool Skimmers"... am I writing that page because of the readers' genuine interest in pool skimmers, or am I writing so that a relevant person searching in Google can find my pool skimmer page, no matter what combination of words they typed in, so long as it relates to my pool skimmer page? I write for my site visitors....and one of my users is Googlebot, and Googlebot brings many people to my site...so I feed Googlebot content using words that I think people might search for.... Do I write based on what I think will rank higher...always, I'd be a fool if I didn't use keywords people might search for in my content.
Tip: When you write content today, don't write just a bunch of crap for the search engines... pretend that someone will really read the 1000 words of content that may be on your pool skimmer page.
Jim: Hum...because if it's not original, Google might rank it higher than the original....and that can be a problem for Google...all kidding aside, duplicate content wasn't an issue in the past... but now those pages that Google said for years wouldn't hurt you (those pages they used to put in the supplemental results) might make the Panda poop on you today.
Tip: don't include duplicate content, and don't repeat content excessively on your site. Also tell your writer to be sure to do some original analysis in every piece of content.
Jim: I try not to "guess what might rank well in the search engines" (see question #6)...so I don't know what phrases to search for so I can run this comparison...all kidding aside again, I do love this question, because the only way to measure this, I think, might be if someone left your site and went back to Google and didn't come back to your site.
Tip: You need to do an analysis of the top 10 pages in every search result that you're targeting, and make sure that your page has substantial value compared to the other pages.
Jim: Can Google really know how much quality control you have on your content?... don't think so...but in any case, we should pretend that they can.
Tip: If you don't have a content editor, you better hire one. Fix your broken links, and internal links that go to redirects, and spelling errors, etc.
Jim: Are there always 2 sides? Does an article always have to show 2 sides to not be panda pooped on?
Tip: Guess you have to tell people to find both sides...an "expert or enthusiast" isn't writing well unless they cover both sides of anything...this product is good....and it's kinda bad....lots of people like it..but some hate it....
Jim: Brands win again.... what can I say...Google makes SEO a lot about Brands and Branding...the bigger, the better...sorry mom and pop. Hum....Maybe I should give away "Authority Awards" and Authority Seals that people can put on their content...."This Site is an Authority on [insert title tag]" seal for every page...and of course a link back to Jim's Authority Seal.com (which feeds WBP links)...jk,
Tip: Put your Facebook and Twitter link on the bottom of your content and let people know that if they have any suggestions for better content on any page that you'd love to have it...and in fact, if someone can write a better page, you'd pay them to do that...seriously... and back up every article with "We're/I'm an Authority on [something relevant to article]".
Let's break this question down into the 3 parts...
Part 1. Google: Is the content mass-produced by or outsourced to a large number of creators
Jim: For many sites, their approach to content creation was "how few words of original content do I need to be "OK""...and "how cheap" and "how many pages can I afford to put this crap on below the fold?"...but seriously... if you're selling a product line with 10k pages, how do you write 100 pages of "expert, enthusiast" content?..100 pages times $100/page... my writers can do that...anyone out there have ten thousand dollars they'll give me..and I can deliver those 100 pages...seriously. It also looks like a lot of writers in India, and a lot of work at home moms in the US are going to find that the standards of their content writing are toast, and "C" and "D" papers won't cut it anymore.
Tip: By gosh, if you have a swimming pool accessory site..and you have a product page about Pool Skimmers...you'd better have a unique, original research article on there written by an Expert or Enthusiast on pools at least....any experts or enthusiasts you have, you must now make them your writers writing all your content pages (or at least make it look that way).
Part 2. Google: or spread across a large network of sites
Jim: Well, one side effect will be that product descriptions will get slammed if they are not original to your site....it wouldn't surprise me if "mash ups" of product descriptions on multiple pages within your own site is frowned upon... problem is, many, many ecommerce sites do this... they've just found out that this is a "bad thing".... well...they know now...things have changed...back in the day of the Supplemental index, "poor" or "dup" pages wouldn't hurt you...today the panda might bite you in the ass for having those types of pages.
Part 3. Google: so that individual pages or sites don't get as much attention or care?
Jim: Are you starting to feel like you're not updating your company website enough...I know that I am...damn... I'd better rewrite those old blog posts that I did in 2005-2007... if I have pages that I don't pay attention to, the Panda might poop on my site too.
Tip: OK... if you're going to write content on your site, it better be done in house by an expert or enthusiast...because your expert and enthusiast has nothing better to do than to write 100 product pages for your ecommerce site...half kidding, half serious
Jim: Tip: Hire an editor, because your expert and enthusiast probably isn't the best writer, nor editor...but tell your writer that yes, you need to write 100 pages of content...and tell them that the content had better not look sloppy...they should take their time writing it...and then hire an editor because, trust me, you can't trust your expert or enthusiast to edit their own writing....ok....that advice was a touch tongue in cheek.
Jim: I don't have time now...but later remind me to buy stock in WebMD.
Tip: If you're not a health site, don't write about health issues...if you sell pool supplies, don't write about the health benefits of swimming...yesterday we thought this would be ok... now Panda says this isn't cool.
Jim: Ah, again...Big Brands Love Google...and Google Loves Big Brands....and Mom and Pop...the Panda don't like you if he don't know you off the top of his head.
Tip: How is your Branding Campaign going?... Facebook...Twitter....also put on the bottom of every page "We are an Authority on [topic of page]".
Jim: So giving a page where you just "gave the answer" is no good any more... it'd better be "complete or comprehensive"... remember to tell your in-house expert enthusiast to write complete and comprehensive pages for your 100 product pages.
Tip: When you write...the amount of words recommended per page just changed... in the past many people would recommend a minimum of 350 words on a page...not anymore...it's now a minimum of 1000 words per page...and if the paper still isn't complete or comprehensive at 1000 words, then keep writing until it is.
Jim: Tip: Tell the writer when they write to make sure that articles all have a section called "Insightful Analysis on ...." and another paragraph called "Did you know?".
Jim: Did they mean to say, "Is this the sort of page that you'd "Like" on Facebook or Tweet About?" Did they write it in 2005 wording so they wouldn't say "Like" and Tweet?..
Tip: Tell your expert enthusiast writer (either him, or hire a social guy) to get Facebook Likes and Twitter mentions for as many pages of your site as possible...and yea, get people to bookmark those pages too.
Jim: This is actually an interesting approach that people haven't been talking about...yes, with the Google Panda Update, Google had to have taken a cut in income as fewer people were clicking on Made for Adsense (MFA) sites now that they've been cut down...Google created the Content MFA Monster, and now they're cutting them down...my Google Adsense account keeps sending me tips every month on how I can make more money if I put more Adsense banners on my pages....but on the other hand, if I followed this advice that Google Adsense gives me, the Google Panda might poop on me.
Jim: Tip: Hum.... make sure to tell your in-house expert enthusiast writer to write it good enough for a printed magazine, encyclopedia or book....
(I can't wait to see this pool skimmer page).
Jim: Yup... if you have a page with just a 100 word product description, you might get Panda pooped on.
Tip: Add a section on your pages called "Specific Helpful Information".
Jim:...what is "attention to detail" in an algorithm?
Tip: I think your content pages should have a lot of pictures on them...that might show lots of detail?
Jim: Should they have added this to the end of that sentence ",and then click on an Adsense ad on their way out of your site?"?
Tip: Ask everyone to complain about your site...and use that feedback to improve it.
Let’s Learn Agent Rank and Reputational Scores…it’s about content and writers and Panda.
So here we are, almost months since the “Big” Panda Poop of Feb 24th…and still, little light at the end of the tunnel for those who got pooped on by the Panda Update.
Remember the 23 questions that Amit said we should ask ourselves for guidance on building high quality sites?
Let’s look at these specific 4 questions: (this may help with the Google Panda update recovery)
 The name of the writer can be used to influence the ranking of web search results by indicating the writer responsible for a particular content piece. In one implementation, the reputation for a writer is expressed as a numerical score. A high reputational score indicates that the writer has an established positive reputation. The reputational scores of two or more writers can be compared, and the writer having the higher reputational score can be considered to be more authoritative. In an alternative implementation, multiple scores can be computed for different contexts. For example, a writer might have a first score for content that the writer has written, and a second score for content that the writer has reviewed. In another example, a writer that is responsible for an entertainment magazine could have a high reputation score for content related to celebrity news, but a low reputation score for content related to professional medical advice.
 Assuming that a given writer has a high reputational score, representing an established reputation for authoring valuable content, then additional content authored and signed by that writer will be promoted relative to unsigned content or content from less reputable writers in search results. Similarly, if the signer has a large reputational score due to the writer having an established reputation for providing accurate reviews, the rank of the referenced content can be raised accordingly.
 A high reputational score need not give a writer the ability to manipulate web search rankings. In one implementation, reputational scores are relatively difficult to increase and relatively easy to decrease, creating a disincentive for a writer to place its reputation at risk by endorsing content inappropriately. Since the signatures of reputable writers can be used to promote the ranking of signed content in web search results, writers have a powerful incentive to establish and maintain a good reputational score.
 In one implementation, a writer’s reputation can be derived using a relative ranking algorithm, e.g., Google’s PageRank as set forth in U.S. Pat. No. 6,285,999, based on the content bearing the writer’s signature. Using such an algorithm, a writer’s reputation can be determined from the extrinsic relationships between writers as well as content. Intuitively, a writer should have a higher reputational score, regardless of the content signed by the writer, if the content signed by the writer is frequently referenced by other writers or content. Not all references, however, are necessarily of equal significance. For example, a reference by another writer with a high reputational score is of greater significance than a reference by another writer with a low reputational score. Thus, the reputation of a particular writer, and therefore the reputational score assigned to the particular writer, should depend not just on the number of references to the content signed by the particular writer, but on the importance of the referring documents and other writers. This implies a recursive definition: the reputation of a particular writer is a function of the reputation of the content and writers which refer to it.
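The recursive definition in that last paragraph can be sketched in a few lines of Python. This is only a rough illustration of a PageRank-style calculation, not the patent's actual method: the function name `writer_reputation`, the damping value of 0.85, the iteration count, and the four writers are all assumptions made up for the example.

```python
def writer_reputation(references, damping=0.85, iterations=50):
    """Score writers from who references whom (PageRank-style power iteration).

    references maps each writer to the list of writers whose signed
    content they reference. All names and parameters are hypothetical.
    """
    writers = set(references)
    for targets in references.values():
        writers.update(targets)
    writers = sorted(writers)
    n = len(writers)

    # Start every writer with the same reputation.
    score = {w: 1.0 / n for w in writers}

    for _ in range(iterations):
        # Small baseline so nobody ends up at exactly zero.
        new_score = {w: (1.0 - damping) / n for w in writers}
        for w in writers:
            targets = references.get(w, [])
            if targets:
                # A writer passes reputation to the writers they reference.
                share = damping * score[w] / len(targets)
                for t in targets:
                    new_score[t] += share
            else:
                # A writer who references nobody spreads their weight evenly.
                share = damping * score[w] / n
                for t in writers:
                    new_score[t] += share
        score = new_score
    return score


# Hypothetical reference graph: writer -> writers they reference.
references = {
    "alice": ["carol"],
    "bob": ["carol"],
    "carol": ["alice"],
    "dave": ["bob"],
}
scores = writer_reputation(references)
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.3f}")
```

In this made-up graph, alice and bob each have exactly one reference, but alice's comes from carol (the most referenced writer) while bob's comes from dave (referenced by nobody), so alice ends up with the higher score. That is the "not all references are of equal significance" idea in action.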
Here’s an overview look of 5 sites that got hit by the Google Panda Update in April. All sites showed a steady drop in traffic between April 6th and April 11th. If you have seen a drop in traffic like this during this period, then you’ve been pooped on by the Google Panda Update. Most sites we’ve seen that were affected by the February and April Google Panda Updates had a drop of between 25% and 80% in Google referral traffic.
Site #1: Looking at non-paid traffic from Google from April 1st to April 30th. Drop between 6-12th. Worth noting that the bounce rate is nowhere near “accurate”…I understand why Google shows such a low bounce rate in this particular site’s case…but that’s another story for some other time.
Site #2: Looking at non-paid traffic from Google from April 1st to April 30th. Drop between 6-12th. This wasn’t a content farm…but there were many similar pages.
Site #3: Looking at non-paid traffic from Google from April 1st to April 30th. Drop between 6-12th. This wasn’t a content farm…but there were many similar pages.
Site #4: Looking at non-paid traffic from Google from April 1st to April 30th. Drop between 6-12th. This wasn’t a content farm…but there were many similar pages.
Site #5: Looking at non-paid traffic from Google from April 2 – May 1st. Drop between 6-12th. These people don’t think that they got hit by Panda….they think it has to do with their backlinks….
I lost 3 clients…these 3 clients were all hit by the April Google Panda update
This is a big brand company. If you live in the USA, I'm sure you've seen their commercials. Personally I owned 2 of their products before I ever worked with them. This is a big e-commerce site...some of it is "Killer content" and some of it is "OK"... it's those damn mashups of products on internal pages that can get you with Panda... and it got them... and even their great pages, and great rankings, all got hit in the April Panda update. This company has several in-house SEOs and I'm sure they also work with several other internet marketing companies...and now after Panda this company just lost 40% of their internet traffic...and a few days ago I was told that they're "on pause for services"...for "who knows how long" (probably every internet marketing vendor is "on hold" for them) while the company figures out what steps to take.
I started working with them in April of 2005... We took a site with good rankings and over the years we took it to the top for the short tail, and then later, for the long tail. Over the years we went through changes with them...and a few years ago they also hired some in house "SEOs" to help them out more. The original owner, whom I had worked with for years, became much busier as the CEO of the "World's Largest Online Retailer of ...." and for the past 3 years, I've only spoken with their in-house SEOs. The in-house SEOs didn't work great with us...and they only did about 1/2 of what we recommend our clients do for maximum results of our work...but hey, if they'll do half of what we ask, that's better than nothing...and overall, their long tail still greatly improved and things were working OK...even if they didn't do all we asked of them...
This client was hit by the Google Panda Update...and I had a few calls with these new SEOs...and they, like Client #1, are stuck on it being about backlinks...they have over 200,000 backlinks...our team was able to help obtain about 250 links over the past 12 months...small in comparison to their total of links...but (in my biased opinion), ours were the best 250 links they had....the links we got for them are the most trusted links that this site has...these are unpaid pertinent links, and about 25% of them are even edu links....links are the least of their concerns anyway now with being hit by Panda...on top of that, several of the 200,000 backlinks that they do have are the ones that stick out as "unnatural"...the links we get are "natural" (with a ninja influence)....and the last call with them, which was supposed to be about me reviewing my and my team's recommendations for "solutions to Panda", turned into a call about backlinks (them complaining when the rare edu page goes away..."why don't you maintain these?" they ask..."Dude...do you want me to write to a college librarian and say 'Hey, I noticed that the page on "Cars" is missing...or that the link on the cars page going to my client site is missing...would you please put that back up??'"..."No!"...our link ninjas are great, because they don't act like SEOs.). But anyways, today the "SEO Kids" at Client #3 fired my company.... rrrrrrr.
We did have a funny thing happen today when we got a call from the "Prior to Panda" biggest competitor for Client #1 who was also hit by Panda...and he wants our help and advice... advice I wish that Client #1 had asked for (I gave Client #1 advice and several reports for Free to help with Panda) and I wish they had listened to my advice...really...I've been around the block for over 12 years now and I still digest SEO even in my sleep...to these "kids" that work for Client #1 and Client #3, I bet SEO is just a "job" to them...to me, it's my life....if I say "You've been hit by Panda"...please believe me.... if I say, "yes, those 200k of backlinks are not cool...but that's not what caused your traffic to plummet on April 11"...believe me.
I remember back in November of 2003 when the Google Florida Update hit (that was, I believe, 23% of all search results changing…twice as bad as Panda)…and I had 2 huge clients with me at that time. Each of those 2 was 1/3 of my total income, and the remaining 25 clients made up the last 1/3 of my income… When Florida hit, one client lost rankings on 1/2 his sites…so he canceled… the other big client knew of my other client…saw that 1/2 his sites didn’t rank anymore, and quit too (even though his rankings increased with the Florida update)… that put me down to 1/3 of my income … all in one week…. I survived… and grew… and even though Panda has hit 12-16% of all sites…we had a lot less than that percent affected overall for our clients…but it still hurts when you lose a client… losing a client I’ve had for 6 years makes me feel sad…losing anyone makes me feel sad… I don’t ever want to let a client down…and I still feel that when a client leaves, that I’ve somehow let them down…that’s why I’ve been staying at the office several nights past midnight since the April Google Panda update happened (that was the update that affected a handful of our clients)…so I’ve been working on learning and analyzing, and even on reporting on Panda. I feel bad when I know that I can help these companies…but there are “kids” in my way… oh well… I tried with those 3 clients … I even gave them reports from our content team, our analytics team, and our usability team, and from me personally, with suggestions on what to do. I was also personally involved in every “Panda” client call… and in the end, I still lose 3 clients….I’m trying to do everything I can for them…but it’s just sad when you lose a client…especially one whom I’ve had for over 6 years…. sorry for the rant…I don’t like losing clients…nor losing $17.5k in income each month from this loss….that’s $210,000 over 12 months…and that sucks!
At least there are probably other SEO companies who are also losing clients because of Panda, and perhaps some of their old clients will come to me and my company for help…and then they will become clients…and perhaps I can earn more than $210,000 this year in new clients because of Panda and our ability to help with Google Panda recovery…. we’ll see….
OK, Folks, we have some old words, and old signals, that have become more popular in the days of Post-Panda: “Short Clicks – Long Clicks” and “Pogosticking”. Add these to your SEO dictionary if you don’t have them in there already.
On the most basic level, Google could see how satisfied users were. To paraphrase Tolstoy, happy users were all the same. The best sign of their happiness was the “long click”. This occurred when someone went to a search result, ideally the top one, and did not return. That meant Google has successfully fulfilled the query. But unhappy users were unhappy in their own ways, most telling were the “short clicks” where a user followed a link and immediately returned to try again. “If people type something and then go and change their query, you could tell they aren’t happy,” says (Amit) Patel. “If they go to the next page of results, it’s a sign they’re not happy. You can use those signs that someone’s not happy with what we gave them to go back and study those cases and find places to improve search.”
We’ve known that Google has been looking at “Short clicks” and “long clicks” for years…I just think that with the Google Panda Updates, the measurement of those signals became much, much stronger.
There are 2 old articles worth reviewing as well. The first is by everyone’s favorite search patent translator, Bill Slawski (SEObytheSea). Bill wrote about Search Pogosticking and Search Previews in reference to a Yahoo patent back in November of 2008, where Bill says:
Search pogosticking is when a searcher bounces back and forth between a search results page at a search engine for a particular query and the pages listed in those search results.
A search engine could keep track of that kind of pogosticking activity in the data it collects in its log files or through a search toolbar, and use it to re-rank the pages that show up in a search for that query.
And the second article is from Blind Five Year Old, from back in 2009, where he wrote about “Short Clicks vs. Long Clicks”.
They’re not peeking at bounce rates. Instead Google is measuring pogosticking activity by leveraging their current tracking mechanisms. Remember, Google already tracks the user, the search and the result clicked. All Google needed to do was to accurately model the time dimension.
Implicit feedback about how satisfied a searcher is with a web page that they found in a search result might be collected by a search engine. This kind of information isn’t provided explicitly by a searcher, but rather is implicit in the searcher’s actions or inactions.
And also in that same paper, Bill concludes:
Google’s Amit Singhal and Matt Cutts told us in The ‘Panda’ That Hates Farms: A Q&A With Google’s Top Search Engineers that the Panda update looks “for signals that recreate that same intuition, that same experience that you have as an engineer and that users have.” It’s possible that these signals are using some kind of classification system that might either incorporate user behavior signals into page rankings, or use it as feedback to evaluate the signals chosen to rerank pages in search results.
The kind of algorithmic approach that I pointed to in Searching Google for Big Panda and Finding Decision Trees may be in part what’s behind the Panda update, but it’s clear that user behavior plays a role in how a page or site might be evaluated by Google.
I also thought I’d include another paragraph from the In the Plex book worth noting:
In between the major rewrites, Google’s search quality teams constantly produced incremental improvements. “We’re looking at queries all the time and we find failures and say , ‘why, why, why?'” says Singhal, who himself became involved in a perpetual quest to locate poor results that might have indicated bigger problems in the algorithm. He got into the habit of sampling the logs kept by Google on its users’ behavior and extracting random queries. When testing a new version of the search engine, his experimentation intensified. He would compile a list of tens of thousands of queries, simultaneously running them on the current version of Google search and the proposed revision. The secondary benefit of such a test was that it often detected a pattern of failure in certain queries.
I don’t have much to add to all these quotes… except that I still support my original theory that the biggest factor to the Panda update was the Tweaking of the importance of this factor:
Those who search Google…click on a search result listing…then go back to Google, and click on some other result….I think this is what can hurt you the most…. this is not bounce rate (bounce rate is when someone leaves your site and goes anywhere)… I am only concerned with those who leave your site, and go back to the Google search and click on someone else. ….Google can give all sorts of great content advice…and we’ll take it and say “Thanks for the tips”…but I still think that the biggest factor to Panda is “short clicks” and “long clicks” and Pogosticking.
Machine learning is using a computer to recognize patterns in data, to then make predictions about new data, based on the pattern recognized or learned from prior chosen training datasets.
One of the ways that Google uses machine learning algorithms in search is to analyze historical data from the logs they keep, and to predict likely future search behavior and the satisfaction level of a searcher when they click on a search result and land on any given page. One of the most common methods search engines use to measure the satisfaction of a user is to measure the short clicks and long clicks. As Jim explains, the long click is when someone does a search, clicks on a search result link, goes to that page, and does not return to the search engine. This is a good signal to the search engine that the user found what they were looking for. The short click is when a user goes to a page from a search, and then returns to the search engine and clicks on another result or does another search….this says that the user did not find what they were looking for on the first page…yes, there are exceptions, but this is the norm.
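To make the short click / long click idea concrete, here is a minimal sketch of how clicks in a search log could be labeled. Nothing here comes from Google: the `ResultClick` record, the 30 second threshold, the example URLs, and the choice to count a late return as a long click are all assumptions for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResultClick:
    url: str
    clicked_at: float             # seconds, when the search result was clicked
    returned_at: Optional[float]  # seconds, when the searcher came back to the results (None = never)


def classify_click(click, short_threshold=30.0):
    """Label one click as 'short' or 'long' (threshold is an assumption)."""
    if click.returned_at is None:
        return "long"  # never came back to the search results
    dwell = click.returned_at - click.clicked_at
    return "short" if dwell < short_threshold else "long"


def short_click_rate(clicks, short_threshold=30.0):
    """Fraction of clicks per URL that were short clicks."""
    totals = defaultdict(int)
    shorts = defaultdict(int)
    for click in clicks:
        totals[click.url] += 1
        if classify_click(click, short_threshold) == "short":
            shorts[click.url] += 1
    return {url: shorts[url] / totals[url] for url in totals}


# Hypothetical log entries.
log = [
    ResultClick("example.com/thin-page", 0.0, 5.0),
    ResultClick("example.com/thin-page", 0.0, 8.0),
    ResultClick("example.com/thin-page", 0.0, None),
    ResultClick("example.com/deep-guide", 0.0, None),
    ResultClick("example.com/deep-guide", 0.0, 240.0),
]
rates = short_click_rate(log)
for url, rate in rates.items():
    print(f"{url}: {rate:.0%} short clicks")
```

Note that this is not bounce rate: a visitor who leaves the page for anywhere other than the search results never shows up as a short click here.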
Although the Google Panda Update is new, Google's use of machine learning is not. Peter Norvig, Director of Research at Google, has been implementing elements of machine learning in search since 2001. The 25 cent explanation for how classification trees are used in machine learning is simply this: a classification tree is trained on a 'training dataset', which is usually a historical dataset or an artificial dataset mimicking a real one, to 'learn' how to classify objects or phenomena. It's like how a vending machine that accepts coins recognizes the difference between a quarter and a nickel by the diameter of the coin you put in. In the same way, Google wants to classify pages in search results.
The picture does get a little bit more complex because a classification tree has an importantly different decision procedure from a discriminant analysis. I'll first tell you what is the difference and then I'll explain why I think it's important in the understanding of what their goals with the Panda update were.
Discriminant Analysis: A linear discriminant analysis produces a set of coefficients defining a single linear combination of the predictor variables. The score on the linear discriminant function is computed as a composite, considering the values of all predictor variables simultaneously.
A classification tree, although involving a decision tree and coefficients, just like the discriminant analysis, is importantly different. A classification tree has a hierarchical decision procedure, in that, to put it loosely and very simply, things are not all calculated at the same time but different calculations happen as the algorithm works its way hierarchically through the predictor variables.
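The difference between the two decision procedures is easier to see side by side. The sketch below is purely illustrative: the feature names (`short_click_rate`, `duplicate_ratio`, `word_count`), the weights, and the thresholds are invented for the example and are not known Panda variables.

```python
# Hypothetical coefficients for a linear discriminant function.
WEIGHTS = {"short_click_rate": 4.0, "duplicate_ratio": 2.0, "word_count": -0.005}
BIAS = -2.0


def discriminant_classify(page):
    """All predictor variables are combined into ONE score at the same time."""
    score = BIAS + sum(WEIGHTS[name] * page[name] for name in WEIGHTS)
    return "low quality" if score > 0 else "ok"


def tree_classify(page):
    """Variables are checked one after another; later checks may never run."""
    if page["short_click_rate"] < 0.5:
        return "ok"  # stops here, nothing else is looked at
    if page["word_count"] >= 300:
        return "ok"
    if page["duplicate_ratio"] > 0.6:
        return "low quality"
    return "ok"


thin_page = {"short_click_rate": 0.8, "duplicate_ratio": 0.7, "word_count": 100}
liked_page = {"short_click_rate": 0.2, "duplicate_ratio": 0.9, "word_count": 50}

for name, page in [("thin_page", thin_page), ("liked_page", liked_page)]:
    print(name, "| discriminant:", discriminant_classify(page),
          "| tree:", tree_classify(page))
```

Both procedures flag `thin_page`. They disagree on `liked_page`: the linear score still adds up the duplicate content penalty, while the tree stops at its first check because searchers rarely click back, and never looks at the other variables at all.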
Now, there's an academic paper written by Biswanath Panda (yes, an engineer named Panda is who the Panda update was named after, although that engineer is usually reported to be Navneet Panda, not Biswanath Panda) called "PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce". This paper discusses classification and regression tree learning on massive datasets, as part of the common data mining aspect of search broadly, and more specifically discusses PLANET as "a scalable tree learner with accuracy comparable to a traditional in-memory algorithm but capable of handling much more training data."
Of course Google would want to be able to have a tree learner be able to handle its massive search datasets.
There are two important things to note about this document by Biswanath Panda. First, he refers to the classification tree structures specifically as being part of their study. Second, and I thought this was very interesting, to test their heavy-duty tree learner PLANET's effectiveness they tried it on the bounce rate prediction problem. According to Panda and his team:
"We measure the performance of PLANET on bounce rate prediction problem [22, 23]. A click on a sponsored search advertisement is called a bounce if the click is immediately followed by the user returning to the search engine. Ads with high bounce rates are indicative of poor user experience and provide a strong signal of advertising quality. The training dataset (ADCORPUS) for predicting bounce rates is derived from all clicks on search ads from the Google search engine in a particular period. Each record represents a click labeled with whether it was a bounce. A wide variety of features are considered for each click..."
So what this means is that using the method of PLANET, they were able to run these complex tree structure learning calculations surrounding the bounce rate prediction problem successfully and at scale.
In his post discussing the topic, Searching Google for Big Panda and Finding Decision Trees, Bill Slawski connects the previously mentioned document with the Panda update, stating that
"....while the authors are focusing upon problems in sponsored search with their experimentation, they expect to be able to achieve similarly effective results while working on other problems involving large scale learning problems....the Farmer/Panda update does appear to be one where a large number of websites were classified based upon the quality of content on the pages within those sites."
It sounds like the training dataset was put together by the Google quality raters. As stated in the initial interview with Matt and Amit, Matt said that there was an engineer who came up with a rigorous set of questions, everything from "Do you consider this site to be authoritative?" to "Would it be okay if this was in a magazine?" and "Does this site have excessive ads?" Questions along those lines.
The results of these pages/sites were probably mixed with another group of mixed pages/sites into a training dataset. After probably A LOT of testing and tweaking, the algorithm was let loose on the world to see how accurately it predicted low quality pages and sites. Amit later did offer his 23 guidelines for building a high quality site which may very well have been some of the questions asked in the real dataset....but this document is still all "concepts". It is almost as if they are giving us their goals for what they want to algorithmically measure, but they still do not give any specific measurements or tell us any of the real variables involved.
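For anyone wondering what "training" on a labeled dataset actually looks like, here is a toy sketch of the single step a tree learner repeats over and over: picking the threshold on one feature that best separates the labeled examples. It uses Gini impurity, a common choice in classification trees. The feature name `landing_page_words` and the six labeled records are made up, and this says nothing about what PLANET or Panda actually measure.

```python
def gini(labels):
    """Gini impurity of a list of True/False labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(records, feature):
    """Find the threshold on one feature that best separates the labels.

    records is a list of (features_dict, label) pairs. Returns a
    (threshold, weighted_impurity) tuple, or None if no split exists.
    """
    values = sorted({features[feature] for features, _ in records})
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2.0
        left = [label for features, label in records if features[feature] <= threshold]
        right = [label for features, label in records if features[feature] > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(records)
        if best is None or impurity < best[1]:
            best = (threshold, impurity)
    return best


# Hypothetical training data: (features, was the click a bounce?)
training = [
    ({"landing_page_words": 50}, True),
    ({"landing_page_words": 80}, True),
    ({"landing_page_words": 120}, True),
    ({"landing_page_words": 400}, False),
    ({"landing_page_words": 600}, False),
    ({"landing_page_words": 900}, False),
]
threshold, impurity = best_split(training, "landing_page_words")
print(f"split at {threshold} words, impurity {impurity}")
```

A real tree learner repeats this search across many features, then splits each resulting group again, which is where the hierarchical structure comes from.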
Remember how one feature of classification trees, relative to discriminant analysis, is that they process hierarchically instead of simultaneously. I would conclude that Google is probably using classification trees to check for one set of factors first, then split and check for another set of factors; depending on how the math works out for that, it either goes on to do further processing or stops. So, if my prior evaluation is correct, there is a reasonable possibility that the Panda Update includes some kind of classification tree, with a hierarchical decision structure.
Does this mean that there are some Panda factors that are necessarily more important than others? Does this mean that there is an initial combination of factors that exposes your site to further scrutiny, which leads charmingly to getting pooped on by the Google Panda update? What are those factors? It is hard to say exactly without some speculation. My best guess is:
Some % of clickbacks to Google, for phrase a, on a site of at least x size, with y amount of content on page [content sub variables t, w], and on a page level, the page having properties of p, q, l in some combination.
The best first step I think, although by this time I think all those hit by Panda have taken action, is to identify your lowest performing pages that ALSO used to rank well for long tail phrases but then dropped. Once these pages are identified, there are two options: either ax the pages OR change the pages entirely. My point is this though: there are definitely a small number of factors that came together to bring the wrath of the Panda update on your site. Being proactive by getting rid of the really bad stuff or rethinking the content of your pages is the key thing to bring you back. I don't mean that you suddenly have to become the Stanford Encyclopedia of Philosophy, but do just enough to bring you back. That's the game isn't it, the little changes are everything.
A little after I initially finished this post I stumbled on this great post, You&A with Matt Cutts. One of the things that popped out at me was this quote, which was in reply to a question about site usability as a partial ranking factor, with Cutts saying, "Panda is trying to mimic our understanding of whether a site is a good experience or not. Usability can be a key part of that. They haven't written code to detect the usability of a site, but if you make a site more usable, that's good to do anyway." My big lesson from the research and thought that I published here, and from Matt Cutts' concession above, is that Google and search are limited in what they can hope to measure. It is almost philosophical; can a machine really understand what it means to be 'quality' from data mining?
The goal of our Panda offering is to get you “out of Panda” ASAP. We also recommend our Google Panda update recovery solution to ‘future proof’ your site against a Google Panda Update in the future.
A review of your site analytics provides a starting point for our Panda analysis. Representative pages are chosen from each type of page on your site. We make actionable recommendations for these pages based on data from your existing analytics.
We analyze user behavior on your site using Clicktale. Prior to receiving this analysis, you will need to sign up at clicktale.com and put the tracking code on your site. Clicktale offers videos of user behavior on your website and heatmaps of aggregate user actions. We analyze this information and make suggestions to help make the site more user-friendly and see what can be done to keep your users from clicking back to Google.
Making a website more user-friendly decreases its chances of being negatively impacted by Panda, because a more usable site keeps unique visitors on site longer. The usability analysis of your site includes actionable recommendations to improve the user experience as well as suggestions for increasing site conversions. The usability analysis consists of a site-wide usability analysis, a deep-page analysis based on URLs within analytics determined to be troublesome, and a "Shopping Cart Experience" analysis, where applicable.
Content plays a factor in Panda and Panda update recovery. Content needs to be unique and high quality. The 'Panda' starts out targeting deep pages with thin content and works its way up your site architecture. The content analysis involves analyzing the deep pages chosen during the analytics analysis, and includes actionable solutions for instances of duplicate content and ideas for additional content for your site.
The design of your site can impact the user experience and impact clickbacks to Google. We make suggestions to change your design in order to make your site more user-friendly. We then create design mock-ups of representative pages that include suggestions recommended in our usability, clicktale, and content reports.
Jim’s remake of the Hank Williams Jr. song, “Country Boy Can Survive”
Google’s Panda Update hit you hard this time
But Jim’s Ninjas are together, and he’s drawn the line
Your unemployment is up and your traffic is down
But Jim’s ninjas are united from both sides of town
We live in Upstate New York you see
And Google Updates have never stopped me
But Google has changed, and so have I
And your site is going to survive
Your site is going to survive
We can write great content all day long
And we can work on usability from dusk till dawn
Yea, I’ve been through worse things than this a time or two
So believe you me, there ain’t many things my ninja army can’t do
We get followers and tweets, and we get Likes
And your site is going to survive
Your site is going to survive
You can’t Poop us out, and you can’t make us run
Cause my marketing ninjas, can perform under the gun
Hey Mr Panda, Hey Panda Man
I know why you hate me, cause you love the big big brand
I’ve taken a walk into the Google minds
Read the patents and all the papers I could find.
And I can Analyze Back, and you’ll be fine
Cause your site is going to survive
Your site is going to survive
I had a good client in New York City
He never knew me by my name, used to call me Link-ably.
I spend my time reading Aaron and Rand
And my client worked his site like a good business man
He used to send me Hot Fudge, to keep me working up late at night
I’d send him some of my Super Ninja Links , Links I know he’d love
But he was Pooped on by the terrible Panda
So he’s now another client asking me, “Jim, Oh Why!”
An Algorithm for an Algorithm, I’ll reverse engineer
And that’s my slogan, I know for you it’s do or die
And your site is going to survive
Your site is going to survive
Cause you can’t Pandasize me out, and you can’t make us run
We’re them Marketing Ninjas that work well under the gun
Hey Mr Panda, Hey Panda Man
I know why you hate me, cause you love the big big brand
I’m working with clients who were not content farms
And all Google’s done is unite my ninja band
There’s no more White Hat Vs. Black Hat this time
But one united team that you can stand behind
Your site is going to survive
Your site is going to survive