Thoughts, case studies, guidance and solutions
Internet Marketing Ninjas - Full Service Internet Marketing & Tools (formerly We Build Pages)
Jim Boykin founded We Build Pages in 1999 as a web designer. Shortly thereafter, Jim started specializing in Search Engine Optimization, and then Link Building, and the company has grown to employ about 85 ninjas.
Jim has led We Build Pages into several techniques for helping clients achieve higher traffic, via methods such as: on page optimization, link building, content creation, widget services, and internet marketing consulting. For years Jim blogged about link building, and has won several link building awards including "Best Link Building Blog" in 2006 and 2007 by Search Engine Journal.
Prior to We Build Pages, Jim attended Rider College in New Jersey studying marketing and politics. After college, Jim traveled the USA spending several years working in a number of national parks. Upon the completion of his travels, Jim "settled down" and started We Build Pages. Jim is a true entrepreneur and leads his company forward by being active in sales, production, and design of all projects. Jim enjoys traveling - having been in every state - and has been a regular attendee/speaker at several SEM industry conferences for years. He most enjoys speaking on his specialty of link building techniques and internet marketing tools, among other topics, and has given many interviews on his link building techniques.
We Build Pages is currently rebranding as Internet Marketing Ninjas. Internet Marketing Ninjas Internet Marketing Services is based out of Upstate New York, just north of Albany, and services clients world-wide.
I've been in SEO for over 12 years and I've seen several major Google updates over the years...and this year there's the Panda Update that has 14% of search results shot to a new Google Hell, a Hell called Panda.
To understand Panda and the subsequent Google Panda recovery, you need to know some of the filters that Google has already put in place.
One of the biggest "filters" that Google has is what used to be known as the Supplemental Results. I strongly advise that if you are not familiar with the Supplemental Results (2005-2007), that you read about that history to know what Google has already done to take out 95% of the crap content.
It is worth noting that I experienced and wrote in April of 2007 about Supplemental Results:
Speaking of Supplemental Hell, here's something else that I've been experiencing...if you publish 300 pages today and 3 months goes by and no one links to any of these 300 pages...guess where they might be going? (Supplemental Hell?) - (Moral, don't publish a bunch of pages in once (especially in a new folder) unless you know they'll get some backlinks and trust to those pages within a few months...I've seen many sites have entire folders go to supplemental hell after people published hundreds of pages in that folder in 1 day and then nothing there got a link for the first 3 months of existence....and seen other new folders/pages survive (so far) that got a small handful of nice backlinks to even a few pages in new folders.....kinda says something about checking for "quality rating" of new pages...no links to any pages in a folder and supplemental hell for them all. (Moral, publish a new folder with 300 new pages, get some of those pages some trusted backlinks, and hurry!)
Also, in April of 2007, Forbes wrote an interesting article about the Supplemental results that sounds eerily like a lot of the noises that I'm hearing again... deja vu....really worth reading again.
In July of 2007, I gave advice on how to stay out of this Google Hell...this advice is still great and worth reading today.
In August of 2007, Google removed the Supplemental results label on pages in the search results, and started indexing those pages more frequently... making it hard to detect which pages are in the supplemental results...so I wrote about a method that's still good at finding pages that are definitely in the supplemental results...but FYI, keep in mind that today, there's this + Panda.
I believe that after they removed the ability to clearly see which pages are in the supplemental results, that they then went on a binge of putting way more % of pages into this "Supplemental index".
So something to understand today with Panda is that Google was already pretty good at tossing the majority of everyone's pages on their sites into the supplemental results. At least the deep pages, and the pages with little content, and the pages of dup content...I had a call with a client today who had 397,000 pages indexed in Google...I told him that already Google probably had all but perhaps a few thousand pages in the supplemental results...and now, after the April Google Panda Update, he has about 20 quality pages...the rest needs to be redone and updated for 2011 and beyond...or should I say from after being Pandasized.
Google's been tossing duplicate pages, and poorly linked pages, and pages with little content into the supplemental results for years... this hasn't been an issue with Google....the issue comes when you have original content, on a page that is not deep enough to be at the "supplemental level"...
So, first thing, remember that things like this have happened in the past....understand the supplemental result history...and know that this was 4 years ago...and they've gone way beyond this now...
Since then they've added some other signals....what used to be the "supplemental index" has probably been rolled into a later update that they did called Caffeine in 2009...but telling what is "good" and what is "bad" when it comes to original content with "power" has been a weak point...just because a page is original, and powerful, doesn't mean it's a quality page.
So how would I, if I were Google, tell if a page were "good" or "bad"?
....Is time on the page important?...maybe a little...if I see 1000 words, and the average time on the page is 15 seconds, I wouldn't give all that content much weight in the content part of their equation... but it can still certainly "solve the question" someone was searching for...in fact, it can solve the question in just a few seconds and still be "good" content..
Does the number of pages visited on your site make a quality page?...is it better that someone engages with your site...but can't they get the answer w/o engaging with your site....does Google care if they engage with your site...or do they care if the searcher quickly finds what they were seeking?
Is "brand loyalty" important?... well...does someone need to come back to a site for it to be a good search result?...maybe...but then again, Google probably doesn't care too much if people go back to your site again and again.
What if it gets high Click Through Rates (CTR's) in the Search Engine Results Pages (SERP's)?...yes, that can be an indicator that that is what people are looking for...and Google, I'm sure, is giving sites with high ranking CTR's in SERPS a ranking boost... and those with low CTR I'm sure are getting a "negative" in that part of the algorithm. I believe that CTR in SERPS has been a signal for years...but that still doesn't tell if the searcher found what they are looking for when they did go there.
Consider, also, what if you were ranked #1, and your CTR was 50% (Average CTR for #1 organic rankings is between 35-45 % of all clicks)...so that would be a great Click Through Rate...but what if 98% of those people returned to the same Google search, and then they clicked on the guy at #2...and then they didn't return to Google for another hour, where they performed some other search for something else...I'd say that they must have found what they were looking for at #2...and they did Not find what they were looking for at your site....if 98% of the people that went to your site, just backtracked to that same search and stopped at #2, then I'd say that your content might not be great for that phrase...even if it's original content...and even if it's at a level of not being supplemental.
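To make that "went back to the same search and clicked someone else" idea concrete, here's a minimal sketch in Python. Keep in mind this is only an illustration of my theory...the `SerpClick` record, its field names, and the `return_rate` function are all made up for this example, and none of us outside Google can actually get this data.

```python
from dataclasses import dataclass


@dataclass
class SerpClick:
    """One click from a results page (hypothetical log record)."""
    result_url: str
    returned_to_same_serp: bool  # searcher came back to the same results page
    clicked_other_result: bool   # ...and then clicked a different result


def return_rate(clicks, url):
    """Share of clicks on `url` where the searcher came back to the same
    results page and picked a different result."""
    mine = [c for c in clicks if c.result_url == url]
    if not mine:
        return 0.0
    bounced = sum(
        1 for c in mine if c.returned_to_same_serp and c.clicked_other_result
    )
    return bounced / len(mine)


# 50 clicks on the #1 result, 49 of which go back and click the #2 result.
log = [SerpClick("site-1.example", True, True) for _ in range(49)]
log.append(SerpClick("site-1.example", False, False))
# The 49 searchers who moved on to #2 stay there.
log += [SerpClick("site-2.example", False, False) for _ in range(49)]

print(return_rate(log, "site-1.example"))  # 0.98
print(return_rate(log, "site-2.example"))  # 0.0
```

A high CTR with a return rate like 0.98 would be the "great click through rate, bad content for that phrase" case described above.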
So why, if you grab a phrase post-Panda, do the scraper pages rank higher?...I'd say it's because you tripped a negative content filter...and a negative content score must be worse than a supplemental results page...thus the supplemental results show up first, above you....making it look like they're ranking those pages higher...but even those scraper pages won't get much traffic being in the supplemental index.
Where in the webmaster tools, or Google analytics, can you find who went back to Google???...answer...you can't...does Google know...yes...do we...no, and that is the main reason why people are looking over their analytics and going insane seeking the "perfect" rules for what you need...that's hard to do with the information that is available to you even with Analytics and Webmaster tools....there must be large signals that are missing to fill the gaps where analytics and usability fail in the analysis of Panda...
I know of no way to tell who went from your site back to Google, and what they did once they went back to Google....and, if they went back to Google, #1, did the searcher return to the same Google SERP and click on someone else....or, #2, did they return to Google and run another search?
We can't measure this...Google doesn't give us this information....so how do you tell if the content is good?
On February 25th, the day after the Google Panda Update started, Amit Singhal, a Google engineer, gave an interview to the Wall Street Journal. Here's the important part:
Singhal did say that the company added numerous "signals" or factors it would incorporate into its algorithm for ranking sites. Among those signals are "how users interact with" a site. Google has said previously that, among other things, it often measures whether users click the "back" button quickly after visiting a search result, which might indicate a lack of satisfaction with the site.
In addition, Google got feedback from the hundreds of people outside the company that it hires to regularly evaluate changes. These "human raters" are asked to look at results for certain search queries and questions such as, "Would you give your credit card number to this site?" and "Would you take medical advice for your children from those sites," Singhal said.
...Singhal said he couldn't discuss such details and how the search algorithm determines quality of site, because spammers could use the answers to game the system.
Now looking at this...would I trust the data I get from "hundreds of people"...or from looking at 34,000 searches per second?....yea, I'd use the data from the searches a lot more than I'd put weight in what "hundreds" of people say to your questions about credit cards and medical advice things....
But anyways... a week later, Matt Cutts joins Amit Singhal in an interview with Wired Magazine:
Wired.com: How do you recognize a shallow-content site? Do you have to wind up defining low quality content?
Singhal: That's a very, very hard problem that we haven't solved, and it's an ongoing evolution how to solve that problem. We wanted to keep it strictly scientific, so we used our standard evaluation system that we've developed, where we basically sent out documents to outside testers. Then we asked the raters questions like: "Would you be comfortable giving this site your credit card? Would you be comfortable giving medicine prescribed by this site to your kids?"
Cutts: There was an engineer who came up with a rigorous set of questions, everything from. "Do you consider this site to be authoritative? Would it be okay if this was in a magazine? Does this site have excessive ads?" Questions along those lines.
Singhal: And based on that, we basically formed some definition of what could be considered low quality.
Singhal: You can imagine in a hyperspace a bunch of points, some points are red, some points are green, and in others there's some mixture. Your job is to find a plane which says that most things on this side of the place are red, and most of the things on that side of the plane are the opposite of red.
Cutts: ... But for example, our most recent algorithm does contain signals that can be gamed. If that one were 100 percent transparent, the bad guys would know how to optimize their way back into the rankings.
Keep in mind how these answers differ from the original interview with just Amit Singhal....in the original interview with just Singhal, it was said, "Google has said previously that, among other things, it often measures whether users click the "back" button quickly after visiting a search result, which might indicate a lack of satisfaction with the site."
But now, a week later, with Matt Cutts there, the "click back to Google" isn't mentioned...instead it's just the "hundreds of people outside the company that it hires to regularly evaluate changes" that are talked about...so things like "clicking back" are not mentioned in the interview a week later when Matt's there as well...but Matt Cutts does say "...But for example, our most recent algorithm does contain signals that can be gamed. If that one were 100 percent transparent, the bad guys would know how to optimize their way back into the rankings."
It's worth noting this link where over 2000 people are crying to Google saying they were collateral damage...and probably 14% are really collateral damage....you can add your site to that list for the Google engineers to check out: Google Webmaster Central Help Forum. In there, WYSG, a Google employee, states:
Our recent Google Panda update is designed to reduce rankings for low-quality sites, so the key thing for webmasters to do is make sure their sites are the highest quality possible. We looked at a variety of signals to detect low quality sites. Bear in mind that people searching on Google typically don't want to see shallow or poorly written content, content that's copied from other websites, or information that are just not that useful. In addition, it's important for webmasters to know that low quality content on part of a site can impact a site's ranking as a whole. For this reason, if you believe you've been impacted by this change you should evaluate all the content on your site and do your best to improve the overall quality of the pages on your domain. Removing low quality pages or moving them to a different domain could help your rankings for the higher quality content.
We've been reading this thread within the Googleplex and appreciate both the concrete feedback as well as the more general suggestions. This is an algorithmic change and it doesn't have any manual exceptions applied to it, but this feedback will be useful as we work on future iterations of the algorithm.
You can add your site to the over 2000 sites above in the Google forum...You can analyze your webmaster tools data and your Google analytics data and get a lot of "partly smoking gun" information (which is helpful, and we do this as well)...
And, certainly can't hurt to look at your pages and ask these questions:
Perhaps the biggest question of all I really think should be "How do I get people to not quickly go back to the same Google search?"
The past few months I've been reading a lot of interesting theories about the Google Panda Update...and I've read a lot of noise (stuff I don't believe)... even one of the top results in a search in Google for "Google Panda Update" is a page talking about Panda and low quality backlinks....Panda has nothing to do with backlinks...trust me...if it did, I, of all people, would be shouting that my link builders have the solutions to Panda...my link builders can't help with Panda (but my other teams can)... I haven't heard Google talk about social signals...but I keep hearing people mention social being a "help" to Panda...um...I don't think so....this is a content issue...not an internal/external link thing...not a social thing...I totally believe it all has to do with Google's analysis of User Behavior in relation to each page of your website (or sets of pages on your site).
I read a nice thread over at WebmasterWorld that was started by Brett Tabke called Panda Metric: Google Usage of User Engagement Metrics
There, Brett nicely outlines all the things that Google knows about searches including, how you got to Google, where you're from, your history, your browser, your tracking data, and cookies. Brett goes on to say:
At this point, Google knows who 70-75% (my guess) the users are and what they are doing on any given query, and can guess accurately at another 15-25% based on browser/software/system profiles (even if your ip changes and you are not logged in, Google can match all the above metrics to a profile on you)....
Finally, after all that data, the user probably types in a query: (if the search didn't come from offsite).
Then there's the query entry, the SERP behavior, and then the click on a result. At that point, Brett says, Google looks at:
and then Brett sums it up with:
After all that, we can quantify a "metric" (I call, The Panda Metric). It is an amalgamation of the above inputs. This set of inputs would be relative to this query. They could also be weighted to relative queries.
So far this thread has some great comments by the Admins and Senior Members...
Like Tedster with:
I think some of the delay in recalculating is that Panda works at a very basic level - it's what Google calls the "document classifier". I have a feeling that a particular type of routine does not run as often as the rest of the scoring that is built on top of it. My current research - looking through patents, papers, and posts that mention "document classifiers".
And followed up by TheMadScientist
I think this is probably a good time to clarify 'document' can refer to a page or collection of pages, and could easily be both, IMO. E.g. A page is an individual document and can be evaluated individually, but a site (or IMO, even a 'section' of a site) is also a document and can be evaluated as collective whole.
I would guess you're right about classifications not happening as often Tedster and, of course, if only a portion of pages (sub-documents?) are changed you could end up with the same overall evaluation of the document (site) as a whole, even though there have been changes to a portion of it.
There is a lot of noise in there, but there are some great minds in there as well.
I agree with Brett and others that there are a lot of signals at play here....even though last week, when I wrote about the Google Panda Update, I theorized that "people who do a search at Google...go to your site...go back to the same Google search...click on another site...and not return to you" is the biggest factor at play with Panda....but keep in mind, I totally know that there are many additional signals that Google can tweak Panda with from all their data sources, and that they'll continue to tweak their content analysis algorithms with every new signal that they can collect.
This reminds me of another interesting comment in another Webmasterworld Panda forum thread, where TheMadScientist brings up an interesting theory, that I'm inclined to believe:
IMO it has less to do with the weight of the links changing and more to do with a 'reverse scoring' (for lack of a better phrase), meaning I think a page with links pointing to a thin page may have its quality scored lower; when a page where the link(s) are pointing to are determined to be lower quality.
IOW: If Page A links to Page B and Page B's quality score is low, the overall quality score for Page A is lowered by linking to Page B.
We know link text counts forward (to the page the link is pointing to) I think part of what Panda does is reverses the scoring and the quality score of the linked page counts backwards (to the page doing the linking).
Keep in mind this is 'speculation only' ATM, but I really think people are looking in the wrong place when they're simply looking at link based scoring 'the old way' ... Simple link weight based scoring is soooo 2000, IMO.
Think of it this way...if you have 100 pages on your website...and if Google thinks that 70 of those are "Panda Poop Pages" (Yes, I'm coining a new phrase here...Panda Poop Pages)...and say they score those Panda Poop Pages each a negative 10 score...then your site can get a negative score overall if all content is added up and scored across your site...beyond that, possibly, if you have an internal page that has 100 links on it going to other pages of your site, and if 80 of those links go to "Panda Poop Pages", then that page might have a lowered ranking itself because a user has an 80% chance of going to a "Panda Poop Page" from that page...
If this is the case, then improving things on a page-by-page level, will in turn, now tell Google that the page with 100 links is now only linking to 79 "Panda Poop Pages" instead of 80, and that page will increase ever so slightly...
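Here's that arithmetic as a quick sketch in Python. This is speculation on top of speculation...the negative 10 comes from my example above, but the positive 10 for a good page, the function names, and the `/product-N` URLs are assumptions I made up just so the numbers add up to something.

```python
LOW_QUALITY_SCORE = -10  # the "negative 10" from the example above
GOOD_PAGE_SCORE = 10     # assumption: the example gives no score for good pages


def site_score(total_pages, low_quality_pages):
    """Add up page scores across the whole site."""
    good_pages = total_pages - low_quality_pages
    return low_quality_pages * LOW_QUALITY_SCORE + good_pages * GOOD_PAGE_SCORE


def low_quality_link_share(link_targets, low_quality_urls):
    """Chance that a random link on a page leads to a low quality page."""
    if not link_targets:
        return 0.0
    hits = sum(1 for url in link_targets if url in low_quality_urls)
    return hits / len(link_targets)


# 100 pages, 70 of them low quality: the site as a whole goes negative.
print(site_score(100, 70))  # -400

# A page with 100 internal links, 80 of them pointing at low quality pages.
links = [f"/product-{i}" for i in range(100)]
low_quality = set(links[:80])
print(low_quality_link_share(links, low_quality))  # 0.8

# Fix one of those pages and the linking page improves ever so slightly.
low_quality.discard("/product-0")
print(low_quality_link_share(links, low_quality))  # 0.79
```

The point of the last two lines is the page-by-page nature of it...one fixed page moves the linking page from 80% to 79%, not from bad to good.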
I have a feeling that sites that have been Panda Pooped on will not just get clean overnight...nor see any big "Voila!....We're back"...they'll see slow steady increases...page-by-page....which will, in turn, help the pages above those...and in turn, help the site as a whole... again, I don't know....no one has yet to this day said "We came back from Panda" and I don't think you'll ever hear that story unless it's a story about a whole year in time slowly bringing trust/rankings back....there are some stories of pages coming back...but there have also been stories of things bouncing around in rankings... I had one client who I spoke with today who prior to Panda II had ranked #4 for a major phrase. After April 11 he dropped to #15...then he dropped to the 50's...then last Friday he was #12, and today he's #8 ...and keep in mind that he hasn't done a thing with the site since the April Panda Update. ...there's still some bouncing around and threads where people are saying "hey, a page came back"...or is "recovering"...and then the next day it's "sorry...it fell again"...
There's another thread by Bill Slawski called "Just What User Behavior Data Does Google Use to Influence Search Rankings?" where Bill nicely outlines several Google patents that mention some of the user behavior data that they might be looking at. One of the ones noted is from the patent "Information retrieval based on historical data":
If a document is returned for a certain query and over time, or within a given time window, users spend either more or less time on average on the document given the same or similar query, then this may be used as an indication that the document is fresh or stale, respectively.
For example, assume that the query "Riverview swimming schedule" returns a document with the title "Riverview Swimming Schedule." Assume further that users used to spend 30 seconds accessing it, but now every user that selects the document only spends a few seconds accessing it. Search engines may use this information to determine that the document is stale (i.e., contains an outdated swimming schedule) and score the document accordingly.
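As a rough illustration of the swimming schedule example in that patent text, here's a minimal Python sketch that compares average time on a document between an earlier window and a recent one. The function name and the `drop_ratio` cutoff of 0.5 are my own assumptions for illustration...the patent text quoted here doesn't give any threshold.

```python
from statistics import mean


def looks_stale(earlier_dwell_seconds, recent_dwell_seconds, drop_ratio=0.5):
    """Flag a document as possibly stale when the recent average time on it
    falls below `drop_ratio` times the earlier average.

    `drop_ratio` is an illustrative assumption, not a known Google value.
    """
    if not earlier_dwell_seconds or not recent_dwell_seconds:
        return False  # not enough data to compare the two windows
    return mean(recent_dwell_seconds) < drop_ratio * mean(earlier_dwell_seconds)


# Users used to spend about 30 seconds, now only a few seconds.
print(looks_stale([30, 28, 32], [3, 4, 2]))     # True
# Time on the document is holding steady.
print(looks_stale([30, 28, 32], [29, 31, 30]))  # False
```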
The past few months I've been digesting everything I can on Panda...and I've been looking at analytics, usability analyses, on-page analyses, and talking with clients about Panda.... for every site I can find possible reasons...and great possible solutions....at the very least, these clients are getting a great look at things like usability, on-page SEO, analytics analysis...I wish I could tell them..."Just make these changes...and just wait a little bit..and Bam! You'll be back!"... but I don't think it works this way....
So, there have been several iterations of the Panda Update so far, and more tweaks are expected.
Google says that they were targeting "Content Farms", but none of the sites that I have been analyzing have been "Content Farms". The clients that I had that were affected by Google Panda update, and the people that came to me later, were all basically ecommerce sites selling products... they didn't have content about everything under the sun, they were just ecommerce sites with lots of "product" pages.
I can kind of understand Google going after "Content Farms" and it's a great battle cry of Google..."We're getting rid of content farms...and no one likes content farms."... but the sad fact is that the reach of the Panda update went way beyond content farms... If I showed you any of the Panda update affected sites that I've been looking at, you'd never think any one of these had any characteristics of a "content farm"...so keep in mind that I'm looking at this mostly trying to analyze the ecommerce type of sites that were affected...these are sites with typically 20-100 main categories, and with a few thousand individual products. Most of these product pages either had "manufacturer" content that many other people had, or rewritten content, or a mixture of original content and dup content. Many of these sites had "mash up" pages of their products which was a large problem, and many of these sites had issues with sending people back to Google (another story I'll write more on later)...
Google's Amit Singhal wrote, in response to the Panda Update, "More guidance on building high-quality sites" which included 23 questions that will "provide some guidance on how we've been looking at the issue." (issue being Panda and Content)...but I thought I'd add my comments as well to each question that Google is using for "Guidance" in evaluating content...so below are the questions, and my thoughts on each question as well.
It's worth noting that Amit also said in that post:
One other specific piece of guidance we've offered is that low-quality content on some parts of a website can impact the whole site's rankings, and thus removing low quality pages, merging or improving the content of individual shallow pages into more useful pages, or moving low quality pages to a different domain could eventually help the rankings of your higher-quality content.
So keep in mind that your crappy pages will affect rankings on your entire site, and that you have to "remove, merge, improve, or move" those lower quality pages to have a chance of helping your rankings.
Even though I wrote some tongue in cheek responses to some of these questions, I do appreciate Amit sharing these tips on how to write better content. I'm not sure how much of this is really figured into the algorithm of the Google Panda Update, but as far as content writing for 2011 and beyond, this is a great guide to how to write content post-Panda.
Let's Learn Agent Rank and Reputational Scores...it's about content and writers and Panda.
So here we are, almost months since the "Big" Panda Poop of Feb 24th...and still, little light at the end of the tunnel for those who got pooped on by the Panda Update.
Remember the 23 questions that Amit said we should ask ourselves for guidance on building high quality sites?
Let's look at these specific 4 questions: (this may help with the Google Panda update recovery)
These questions remind me of a Google Patent application called Agent Rank.
I hate quoting patents because the average Joe might not be able to follow...but hey, maybe my readers aren't average Joes...there's a lot of goodness, and I had to cut back on the amount that I wanted to quote here....but here's some of the juiciest parts that relate to what I'm going to get at in a moment...Big Disclosure.... I changed the word "Agent" in the below section to be the word "Writer"... I kept replacing the words in my head, thought I'd make reading easier on you if I just replaced those words....maybe I'm wrong to do this...but you can always replace the word back in your head if you'd like...hehe...
 The name of the writer can be used to influence the ranking of web search results by indicating the writer responsible for a particular content piece. In one implementation, the reputation for a writer is expressed as a numerical score. A high reputational score indicates that the writer has an established positive reputation. The reputational scores of two or more writers can be compared, and the writer having the higher reputational score can be considered to be more authoritative. In an alternative implementation, multiple scores can be computed for different contexts. For example, a writer might have a first score for content that the writer has written, and a second score for content that the writer has reviewed. In another example, a writer that is responsible for an entertainment magazine could have a high reputation score for content related to celebrity news, but a low reputation score for content related to professional medical advice.
 Assuming that a given writer has a high reputational score, representing an established reputation for authoring valuable content, then additional content authored and signed by that writer will be promoted relative to unsigned content or content from less reputable writers in search results. Similarly, if the signer has a large reputational score due to the writer having an established reputation for providing accurate reviews, the rank of the referenced content can be raised accordingly.
 A high reputational score need not give a writer the ability to manipulate web search rankings. In one implementation, reputational scores are relatively difficult to increase and relatively easy to decrease, creating a disincentive for a writer to place its reputation at risk by endorsing content inappropriately. Since the signatures of reputable writers can be used to promote the ranking of signed content in web search results, writers have a powerful incentive to establish and maintain a good reputational score.
 In one implementation, a writer's reputation can be derived using a relative ranking algorithm, e.g., Google's PageRank as set forth in U.S. Pat. No. 6,285,999, based on the content bearing the writer's signature. Using such an algorithm, a writer's reputation can be determined from the extrinsic relationships between writers as well as content. Intuitively, a writer should have a higher reputational score, regardless of the content signed by the writer, if the content signed by the writer is frequently referenced by other writers or content. Not all references, however, are necessarily of equal significance. For example, a reference by another writer with a high reputational score is of greater significance than a reference by another writer with a low reputational score. Thus, the reputation of a particular writer, and therefore the reputational score assigned to the particular writer, should depend not just on the number of references to the content signed by the particular writer, but on the importance of the referring documents and other writers. This implies a recursive definition: the reputation of a particular writer is a function of the reputation of the content and writers which refer to it.
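That recursive definition can be sketched with a simple PageRank-style iteration over writers. To be clear, this is just a toy in Python to show the idea...the function name, the damping value of 0.85, the iteration count, and the made-up writers are all assumptions for illustration, not anything the patent application or Google spells out.

```python
def writer_reputation(references, damping=0.85, iterations=50):
    """Toy PageRank-style reputation score for writers.

    `references` maps each writer to the list of writers whose content
    they reference. `damping` and `iterations` are illustrative assumptions.
    """
    writers = set(references)
    for targets in references.values():
        writers.update(targets)
    n = len(writers)
    score = {w: 1.0 / n for w in writers}

    for _ in range(iterations):
        new_score = {w: (1.0 - damping) / n for w in writers}
        for source in writers:
            targets = references.get(source, [])
            if targets:
                # A writer passes reputation to the writers they reference.
                share = damping * score[source] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A writer who references no one spreads reputation evenly.
                share = damping * score[source] / n
                for target in writers:
                    new_score[target] += share
        score = new_score
    return score


# Hypothetical writers: carol is referenced by two writers, dave only by
# carol, and bob only by erin (whom nobody references).
refs = {
    "alice": ["carol"],
    "bob": ["carol"],
    "carol": ["dave"],
    "dave": [],
    "erin": ["bob"],
}
scores = writer_reputation(refs)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 3))
```

In this toy graph dave and bob each have exactly one reference, but dave's comes from a writer with a high score, so dave ends up ahead of bob...which is the "not all references are of equal significance" point in the text above.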
Bill Slawski has talked about Agent Rank before, and in November of 2010 Bill said:
The Agent Rank approach hinges upon every publisher on the Web having a unique digital signature that can follow them around from one site to another.
Write a blog post on your blog - you sign it with your digital signature.
Write a guest blog post on someone else's blog - again, you sign it with your digital signature.
Leave a comment on a blog you've never seen before - you attach your digital signature to it.
Your "reputation" follows you around to different sources, and the ranking of things you write, whether on your own pages or those owned by others, can be influenced by a reputation score for your work.
A lot of the ideas that I've been thinking of the past few days have been derived from Agent Rank. I probably shouldn't go into details about the ideas that I have based off of Agent Rank....so I guess I'll have to break Amit's Rule #7 (Does the article provide original content or information, original reporting, original research, or original analysis?) and leave off any original analysis.
Here's an overview of 5 sites that got hit by the Google Panda Update in April. All sites showed a steady drop in traffic between April 6th and April 11th. If you have seen a drop in traffic like this during this period, then you've been pooped on by the Google Panda Update. Most sites we've seen that were affected by the February and April Google Panda Updates had a drop of between 25% and 80% in Google referral traffic.
Site #1: Looking at non-paid traffic from Google from April 1st to April 30th. Drop between 6-12th. Worth noting that the bounce rate is nowhere near "accurate"...I understand why Google shows such a low bounce rate in this particular site's case...but that's another story for some other time.
Site #2: Looking at non-paid traffic from Google from April 1st to April 30th. Drop between 6-12th. This wasn't a content farm...but there were many similar pages.
Site #3: Looking at non-paid traffic from Google from April 1st to April 30th. Drop between 6-12th. This wasn't a content farm...but there were many similar pages.
Site #4: Looking at non-paid traffic from Google from April 1st to April 30th. Drop between 6-12th. This wasn't a content farm...but there were many similar pages.
Site #5: Looking at non-paid traffic from Google from April 2nd to May 1st. Drop between 6-12th. These people don't think that they got hit by panda....they think it has to do with their backlinks....
I lost 3 clients...these 3 clients were all hit by the April Google Panda update
Client #1 - They're convinced that they were not hit by the Google Panda update. I even did a post, showing analytics of 5 sites that were hit by the April Panda Update...but even when I showed those charts to this client (saying, "Hey, You're site #5 in these charts"), they still won't admit, or believe, that the sudden drop in traffic is because they were pooped on by Panda. They're convinced that it has something to do with some backlinks they were acquiring from other companies. They have 163,000 backlinks....so I can't blame them for being paranoid if they're buying thousands and thousands of backlinks from other people...but...their analytics show it was Panda...and Panda is not about backlinks...So they've canceled all linking services, including my nice non-paid, trusted links service....oh well...that won't get you out of Panda.
Client #2 - This is a big brand company. If you live in the USA, I'm sure you've seen their commercials. Personally I owned 2 of their products before I ever worked with them. This is a big e-commerce site...some of it is "Killer content" and some of it is "OK"... it's those damn mashups of products on internal pages that can get you with Panda... and it got them... and even their great pages, and great rankings, all got hit in the April Panda update. This company has several in-house SEOs and I'm sure they also work with several other internet marketing companies...and now after Panda this company just lost 40% of their internet traffic...and a few days ago I was told that they're "on pause for services"...for "who knows how long" (probably every internet marketing vendor is "on hold" for them) while the company figures out what steps to take.
Client #3 - I started working with them in April of 2005... We took a site with good rankings and over the years we took it to the top for the short tail, and then later, for the long tail. Over the years we went through changes with them...and a few years ago they also hired some in house "SEOs" to help them out more. The original owner, whom I had worked with for years, became much busier as the CEO of the "World's Largest Online Retailer of ...." and for the past 3 years, I've only spoken with their in-house SEOs. The in-house SEOs didn't work great with us...and they only did about 1/2 of what we recommend our clients do for maximum results of our work...but hey, if they'll do half of what we ask, that's better than nothing...and overall, their long tail still greatly improved and things were working OK...even if they didn't do all we asked of them...
This client was hit by the Google Panda Update...and I had a few calls with these new SEOs...and they, like client #1, are stuck on it being about backlinks...they have over 200,000 backlinks...our team was able to help obtain about 250 links over the past 12 months...small in comparison to their total of links...but (in my biased opinion), ours were the best 250 links they had. ....the links we got for them are the most trusted links that this site has..these are unpaid pertinent links, about 25% of which are edu links ....links are the least of their concerns anyway now with being hit by Panda...on top of that, several of the 200,000 backlinks that they do have are the ones that stick out as "unnatural"... the links we get are "natural" (with a ninja influence).... and the last call to them, which was supposed to be about me reviewing my and my team's recommendations for "solutions to Panda", turned into a call about backlinks (them complaining when the rare edu page goes away..."why don't you maintain these?" they ask..."Dude...do you want me to write to a college librarian and say 'Hey, I noticed that the page on "Cars" is missing...or that the link on the cars page going to my client site is missing...would you please put that back up??'"..."No!"...our link ninjas are great, because they don't act like SEOs.). But anyways, today the "SEO Kids" at Client #3 fired my company.... rrrrrrr.
We did have a funny thing happen today when we got a call from the "Prior to Panda" biggest competitor for Client #1 who was also hit by Panda...and he wants our help and advice... advice I wish that Client #1 had asked for (I gave Client #1 advice and several reports for Free to help with Panda) and I wish they had listened to my advice...really...I've been around the block for over 12 years now and I still digest SEO even in my sleep...to these "kids" that work for Client #1 and Client #3, I bet SEO is just a "job" to them...to me, it's my life....if I say "You've been hit by Panda"...please believe me.... if I say, "yes, those 200k of backlinks are not cool...but that's not what caused your traffic to plummet on April 11"...believe me.
When the Google Panda update hit, I knew we'd lose some clients... during any Google shift, people will deny it...many will blame it on "other things"...and basically people in certain positions will do anything to keep their job...and ya know..I can understand this to some degree...Fire the SEO company, save some money...you have to save money somewhere...you just lost a lot of income with Google Panda update... and even though my site wasn't hit, I lost 3 clients in the past week who were hit in the April Panda Update... total "lost" income this month is $17,500, and over 12 months that would total $210,000.
These things happen.
I remember back in November of 2003 when the Google Florida Update hit (that was, I believe, 23% of all search results changing...twice as bad as panda)...and I had 2 huge clients with me at that time. Each of those 2 was 1/3 of my total income, and the remaining 25 clients made up the last 1/3 of my income... When Florida hit, one client lost rankings on 1/2 his sites...so he canceled... the other big client knew of my other client...saw that 1/2 his sites didn't rank anymore, and quit too (even though his rankings increased with the Florida update)... that put me down to 1/3 of my income ... all in one week.... I survived... and grew... and even though Panda has hit 12-16% of all sites...we had a lot less than that percent affected overall for our clients...but it still hurts when you lose a client... losing a client I've had for 6 years makes me feel sad...losing anyone makes me feel sad... I don't ever want to let a client down...and I still feel that when a client leaves, that I've somehow let them down...that's why I've been staying at the office several nights past midnight since the April Google Panda update happened (that was the update that affected a handful of our clients)...so I've been working on learning and analyzing, and even in reporting on Panda. I feel bad when I know that I can help these companies...but there are "kids" in my way... oh well... I tried with those 3 clients ... I even gave them reports from our content team, our analytics team, and our usability team, and from me personally, with suggestions on what to do. I was also personally involved in every "Panda" client call... and in the end, I still lose 3 clients....I'm trying to do everything I can for them...but it's just sad when you lose a client...especially one whom I've had for over 6 years.... sorry for the rant...I don't like losing clients...nor losing $17.5k in income each month from this loss....that's $210,000 over 12 months...and that sucks!
At least there are probably other SEO companies who are also losing clients because of Panda, and perhaps some of their old clients will come to me and my company for help...and then they will become clients...and perhaps I can earn more than $210,000 this year in new clients because of Panda and our ability to help with Google Panda recovery.... we'll see....
OK, Folks, we have some old words, and old signals, that have become more popular in the days of Post-Panda: "Short Clicks - Long Clicks" and "Pogosticking". Add these to your SEO dictionary if you don't have them in there already.
On the most basic level, Google could see how satisfied users were. To paraphrase Tolstoy, happy users were all the same. The best sign of their happiness was the "long click". This occurred when someone went to a search result, ideally the top one, and did not return. That meant Google has successfully fulfilled the query. But unhappy users were unhappy in their own ways, most telling were the "short clicks" where a user followed a link and immediately returned to try again. "If people type something and then go and change their query, you could tell they aren't happy," says (Amit) Patel. "If they go to the next page of results, it's a sign they're not happy. You can use those signs that someone's not happy with what we gave them to go back and study those cases and find places to improve search."
We've known that Google has been looking at "Short clicks" and "long clicks" for years...I just think that with the Google Panda Updates, the measurement of those signals became much, much stronger.
There are 2 old articles worth reviewing as well. The first is by everyone's favorite search patent translator, Bill Slawski (SEObytheSea). Bill wrote about Search Pogosticking and Search Previews in reference to a Yahoo patent back in November of 2008, where Bill says:
Search pogosticking is when a searcher bounces back and forth between a search results page at a search engine for a particular query and the pages listed in those search results.
A search engine could keep track of that kind of pogosticking activity in the data it collects in its log files or through a search toolbar, and use it to re-rank the pages that show up in a search for that query.
And the second article is from Blind Five Year Old, from back in 2009, where he wrote about "Short Clicks vs. Long Clicks".
They're not peeking at bounce rates. Instead Google is measuring pogosticking activity by leveraging their current tracking mechanisms. Remember, Google already tracks the user, the search and the result clicked. All Google needed to do was to accurately model the time dimension.
Implicit feedback about how satisfied a searcher is with a web page that they found in a search result might be collected by a search engine. This kind of information isn't provided explicitly by a searcher, but rather is implicit in the searcher's actions or inactions.
And also in that same paper, Bill concludes:
Google's Amit Singhal and Matt Cutts told us in The 'Panda' That Hates Farms: A Q&A With Google's Top Search Engineers that the Panda update looks "for signals that recreate that same intuition, that same experience that you have as an engineer and that users have." It's possible that these signals are using some kind of classification system that might either incorporate user behavior signals into page rankings, or use it as feedback to evaluate the signals chosen to rerank pages in search results.
The kind of algorithmic approach that I pointed to in Searching Google for Big Panda and Finding Decision Trees may be in part what's behind the Panda update, but it's clear that user behavior plays a role in how a page or site might be evaluated by Google.
I also thought I'd include another paragraph from the In the Plex book worth noting:
In between the major rewrites, Google's search quality teams constantly produced incremental improvements. "We're looking at queries all the time and we find failures and say , 'why, why, why?'" says Singhal, who himself became involved in a perpetual quest to locate poor results that might have indicated bigger problems in the algorithm. He got into the habit of sampling the logs kept by Google on its users' behavior and extracting random queries. When testing a new version of the search engine, his experimentation intensified. He would compile a list of tens of thousands of queries, simultaneously running them on the current version of Google search and the proposed revision. The secondary benefit of such a test was that it often detected a pattern of failure in certain queries.
I don't have much to add to all these quotes... except that I still support my original theory that the biggest factor to the Panda update was the Tweaking of the importance of this factor:
Those who search Google...click on a search result listing...then go back to Google, and click on some other result....I think this is what can hurt you the most.... this is not bounce rate (bounce rate is when someone leaves your site and goes anywhere)... I am only concerned with those who leave your site, and go back to the Google search and click on someone else. ....Google can give all sorts of great content advice...and we'll take it and say "Thanks for the tips"...but I still think that the biggest factor to Panda is "short clicks" and "long clicks" and Pogosticking.
Machine learning is using a computer to recognize patterns in data, to then make predictions about new data, based on the pattern recognized or learned from prior chosen training datasets.
One of the ways that Google uses machine learning algorithms in search is to analyze historical data from the logs they keep and to predict likely future outcomes of search behavior, including the satisfaction level of a searcher when they click on a search result and land on any given page. One of the most common methods search engines use to measure the satisfaction of a user is to measure the short clicks and long clicks. As Jim explains, the long click is when someone does a search, clicks on a search result link, goes to that page, and does not return to the search engine. This is a good signal to the search engine that the user found what they were looking for. The short click is when a user goes to a page from a search, and then returns to the search engine and clicks on another result or does another search....this says that the user did not find what they were looking for on the first page...yes, there are exceptions, but this is the norm.
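The short click / long click distinction described above can be sketched in a few lines of Python. This is a rough illustration only: the log format (a list of `("search", query)` and `("click", url)` tuples per visit), the function names, and the example URLs are all hypothetical, and nobody outside Google knows how their logs are really structured or how timing is modeled.

```python
from collections import Counter


def label_clicks(session):
    """Label each click in one search visit as 'short' or 'long'.

    `session` is an ordered list of (action, target) tuples, where action is
    'search' or 'click'. This log format is an assumption for illustration.
    """
    labels = []
    for i, (action, target) in enumerate(session):
        if action != "click":
            continue
        # Any later search or click means the user came back to the engine.
        returned = any(a in ("search", "click") for a, _ in session[i + 1:])
        labels.append((target, "short" if returned else "long"))
    return labels


def short_click_rate(sessions):
    """Share of clicks on each URL that were short clicks."""
    shorts, totals = Counter(), Counter()
    for session in sessions:
        for url, label in label_clicks(session):
            totals[url] += 1
            if label == "short":
                shorts[url] += 1
    return {url: shorts[url] / totals[url] for url in totals}


# Made-up log: searchers keep leaving a.example/thin and settling on b.example/guide.
sessions = [
    [("search", "panda recovery"), ("click", "a.example/thin"), ("click", "b.example/guide")],
    [("search", "panda recovery"), ("click", "b.example/guide")],
    [("search", "panda recovery"), ("click", "a.example/thin"),
     ("search", "google panda fix"), ("click", "b.example/guide")],
]
rates = short_click_rate(sessions)
```

Here every click on a.example/thin is a short click and every click on b.example/guide is a long click. A real system would also have to deal with the exceptions mentioned above, which this sketch ignores.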
Although the Google Panda Update is new, Google's use of machine learning is not. Peter Norvig, Director of Research and Development at Google, has been implementing elements of machine learning in search since 2001. The 25 cent explanation for how classification trees are used in machine learning is simple: a classification tree is trained on a 'training dataset', which is usually either an artificial dataset mimicking a real one or a historical dataset, to 'learn' how to classify objects or phenomena. It's like how a vending machine that accepts coins recognizes the difference between a quarter and a nickel by the diameter of the coin you put in. Google wants to classify pages in search results in the same way.
The picture does get a little bit more complex because a classification tree has an importantly different decision procedure from a discriminant analysis. I'll first tell you what is the difference and then I'll explain why I think it's important in the understanding of what their goals with the Panda update were.
Discriminant Analysis: A linear discriminant analysis produces a set of coefficients defining a single linear combination of the predictor variables. The score on the linear discriminant function is computed as a composite, considering the scores for all predictor variables simultaneously.
A classification tree, although involving a decision tree and coefficients, just like the discriminant analysis, is importantly different. A classification tree has a hierarchical decision procedure, in that, to put it loosely and very simply, things are not all calculated at the same time but different calculations happen as the algorithm works its way hierarchically through the predictor variables.
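A small Python sketch may help show the difference between the two decision procedures. Everything in it is hypothetical: the predictor names (`clickback_rate`, `word_count`, `duplicate_ratio`), the weights, and the thresholds are invented for illustration and are not known Panda factors.

```python
def linear_discriminant(page, weights, threshold):
    """Score all predictors at once with a single weighted sum."""
    score = sum(weights[name] * page[name] for name in weights)
    return "low quality" if score > threshold else "ok"


def classification_tree(page):
    """Check predictors one at a time, stopping as soon as a branch decides."""
    if page["clickback_rate"] <= 0.5:
        return "ok"  # stops here; the other predictors are never examined
    if page["word_count"] < 200:
        return "low quality"
    if page["duplicate_ratio"] > 0.6:
        return "low quality"
    return "ok"


# Invented weights and threshold for the linear score.
weights = {"clickback_rate": 2.0, "word_count": -0.002, "duplicate_ratio": 1.5}

# A thin, heavily duplicated page that searchers nonetheless stay on.
page = {"clickback_rate": 0.3, "word_count": 120, "duplicate_ratio": 0.9}

linear_result = linear_discriminant(page, weights, threshold=1.5)
tree_result = classification_tree(page)
```

With these made-up numbers the linear score flags the page as low quality because the thin and duplicated content outweighs the low clickback rate, while the tree stops at its first split and never looks at the content variables. That is the hierarchical behavior described above: which factor sits at the top of the tree matters a great deal.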
Now, there's an academic paper written by Biswanath Panda (yes, the same Panda that the Panda update was named after) called, "PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce". This paper discusses classification and regression tree learning on massive datasets, as part of the common data mining aspect of search broadly and more specifically discussing PLANET as "a scalable tree learner with accuracy comparable to a traditional in-memory algorithm but capable of handling much more training data."
Of course Google would want to be able to have a tree learner be able to handle its massive search datasets.
There are two important things to note about this document by Biswanath Panda. First, he refers to the classification tree structures specifically as being part of their study. As well, and I thought this was very interesting, to test their heavy-duty tree learner PLANET's effectiveness they tried it on the bounce rate prediction problem. According to Panda and his team:
"We measure the performance of PLANET on bounce rate prediction problem [22, 23]. A click on an sponsored search advertisement is called a bounce if the click is immediately followed by the user returning to the search engine. Ads with high bounce rates are indicative of poor user experience and provide a strong signal of advertising quality. The training dataset (ADCORPUS) for predicting bounce rates is derived from all clicks on search ads from the Google search engine in a particular period. Each record represents a click labeled with whether it was a bounce. A wide variety of features are considered for each click..."
So what this means is that, using the method of PLANET, they were able to successfully scale these complex tree structure learning calculations surrounding the bounce rate prediction problem.
In his post discussing the topic, Searching Google for Big Panda and Finding Decision Trees, Bill Slawski connects the prior mentioned document with the panda update, stating that
"....while the authors are focusing upon problems in sponsored search with their experimentation, they expect to be able to achieve similarly effective results while working on other problems involving large scale learning problems....the Farmer/Panda update does appear to be one where a large number of websites were classified based upon the quality of content on the pages within those sites."
It sounds like the training dataset was put together by the Google quality raters. As stated in the initial interview with Matt and Amit, Matt said that there was an engineer who came up with a rigorous set of questions, everything from "Do you consider this site to be authoritative?" to "Would it be okay if this was in a magazine?" and "Does this site have excessive ads?" Questions along those lines.
The results of these pages/sites were probably mixed with another group of mixed pages/sites into a training dataset. After probably A LOT of testing and tweaking, the algorithm was let loose on the world to see how accurately it predicted low quality pages and sites. Amit later did offer his 23 guidelines for building a high quality site which may very well have been some of the questions asked in the real dataset....but this document is still all "concepts". It is almost as if they are giving us their goals for what they want to algorithmically measure, but they still do not give any specific measurements or tell us any of the real variables involved.
Remember that one feature of classification trees, relative to discriminant analysis, is that they process hierarchically instead of simultaneously. I would conclude that Google is probably using classification trees to check for one set of factors first, then split and check for another set of factors; depending on how the math works out, the process either goes on to do further processing or stops. So, if my prior evaluation is correct, there is a reasonable possibility that the Panda Update includes some kind of classification tree, with a hierarchical decision structure.
Does this mean that there are some Panda factors that are necessarily more important than others? Does this mean that there is an initial combination of factors that exposes your site to further scrutiny, which leads charmingly to getting pooped on by the Google Panda update? What are those factors? It is hard to say exactly without some speculation. My best guess is:
Some % of clickbacks to Google, for phrase a, on a site of at least x size, with y amount of content on page [content sub variables t, w], and on a page level, the page having properties of p, q, l in some combination.
The best first step, I think, although by this time I think all those hit by Panda took action, is to identify your lowest performing pages that ALSO used to rank well for long tail phrases but then dropped. Once these pages are identified, there are two options: either ax the pages OR change the pages entirely. My point is this, though: there are definitely a small number of factors that came together to bring the wrath of the Panda update on your site. Being proactive by getting rid of the really bad stuff or rethinking the content of your pages is the key thing to bring you back. I don't mean that you suddenly have to become the Stanford Encyclopedia of Philosophy, but do just enough to bring you back. That's the game, isn't it? The little changes are everything.
A little after I initially finished this post I stumbled on this great post, You&A with Matt Cutts. One of the things that popped out at me was this quote, which was in reply to a question about site usability as being a partial ranking factor, with Cutts saying, "Panda is trying to mimic our understanding of whether a site is a good experience or not. Usability can be a key part of that. They haven't written code to detect the usability of a site, but if you make a site more usable, that's good to do anyway." I think my big lesson from the research and thought that I published here, and from Matt Cutts' concession above, is that Google and search are limited in what they can hope to measure. It is almost philosophical: can a machine really understand what it means to be 'quality' from data mining?
The goal of our Panda offering is to get you "out of Panda" ASAP. We also recommend our Google Panda update recovery solution to 'future proof' your site against a Google Panda Update in the future.
A review of your site analytics provides a starting point for our Panda analysis. Representative pages are chosen from each type of page on your site. We make actionable recommendations for these pages based on data from your existing analytics.
We analyze user behavior on your site using Clicktale. Prior to receiving this analysis, you will need to sign up at clicktale.com and put the tracking code on your site. Clicktale offers videos of user behavior on your website and heatmaps of aggregate user actions. We analyze this information and make suggestions to help make the site more user-friendly and see what can be done to keep your users from clicking back to Google.
Making a website more user-friendly decreases its chances of being negatively impacted by Panda because a more usable site keeps unique visitors on site longer. The usability analysis of your site includes actionable recommendations to improve the user experience as well as suggestions for increasing site conversions. The usability analysis consists of a site-wide usability analysis, a deep-page analysis based on URLs within analytics determined to be troublesome, and a "Shopping Cart Experience", where applicable.
Content plays a factor in Panda and Panda update recovery. Content needs to be unique and high quality. The 'Panda' starts out targeting deep pages with thin content and works its way up your site architecture. The content analysis involves analyzing the deep pages, chosen during the analytics analysis, and includes actionable solutions for instances of duplicate content and ideas for additional content for your site.
The design of your site can impact the user experience and impact clickbacks to Google. We make suggestions to change your design in order to make your site more user-friendly. We then create design mock-ups of representative pages that include suggestions recommended in our usability, clicktale, and content reports.
Jim's remake of the Hank Williams Jr. song, "Country Boy Can Survive"
Google's Panda Update hit you hard this time
But Jim's Ninjas are together, and he's drawn the line
Your unemployment is up and your traffic is down
But Jim's ninjas are united from both sides of town
We live in Upstate New York you see
And Google Updates have never stopped me
But Google has changed, and so have I
And your site is going to survive
Your site is going to survive
We can write great content all day long
And we can work on usability from dusk till dawn
Yea, I've been through worse things than this a time or two
So believe you me, there ain't many things my ninja army can't do
We get followers and tweets, and we get Likes
And your site is going to survive
Your site is going to survive
You can't Poop us out, and you can't make us run
Cause my marketing ninjas, can perform under the gun
Hey Mr Panda, Hey Panda Man
I know why you hate me, cause you love the big big brand
I've taken a walk into the Google minds
Read the patents and all the papers I could find.
And I can Analyze Back, and you'll be fine
Cause your site is going to survive
Your site is going to survive
I had a good client in New York City
He never knew me by my name, used to call me Link-ably.
I spend my time reading Aaron and Rand
And my client worked his site like a good business man
He used to send me Hot Fudge, to keep me working up late at night
I'd send him some of my Super Ninja Links , Links I know he'd love
But he was Pooped on by the terrible Panda
So he's now another client asking me, "Jim, Oh Why!"
An Algorithm for an Algorithm, I'll reverse engineer
And that's my slogan, I know for you it's do or die
And your site is going to survive
Your site is going to survive
Cause you can't Pandasize me out, and you can't make us run
We're them Marketing Ninjas that work well under the gun
Hey Mr Panda, Hey Panda Man
I know why you hate me, cause you love the big big brand
I'm working with clients who were not content farms
And all Google's done is unite my ninja band
There's no more White Hat Vs. Black Hat this time
But one united team that you can stand behind
Your site is going to survive
Your site is going to survive
Read Full Interview at http://www.webpronews.com/google-algorithm-bounce-rate-ranking-signal-2011-05
We also picked the brain of SEO vet Jim Boykin. We asked Jim how important he thinks bounce rate is. He says, "I think that some aspects of bounce rate are very important in the post-panda world."
"It's important to note how Google defines Bounce Rate," he adds. This is below:
"Bounce rate is the percentage of single-page visits or visits in which the person left your site from the entrance (landing) page. Use this metric to measure visit quality - a high bounce rate generally indicates that site entrance pages aren't relevant to your visitors. The more compelling your landing pages, the more visitors will stay on your site and convert. You can minimize bounce rates by tailoring landing pages to each keyword and ad that you run. Landing pages should provide the information and services that were promised in the ad copy."
He also points to how it is defined in Google Analytics:
"The percentage of single page visits resulting from this set of pages or page."
"Personally, I don't think that a single page visit is a bad thing. To me, it tells me the visitor found what they were looking for. Isn't that what Google would want? If I were Google, I'd want a searcher to find the answer to their search on the exact page they clicked on in a search result...not 1 or 2 clicks in. If I were Google, I'd look more at 'Who Bounces off that page, and returns to the same Google search, and clicks on someone else, and then never returns to your site,' but I'm not Google, and that's just my 'if I were Google' thoughts".
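The difference between the published bounce rate and the "if I were Google" metric Jim describes can be sketched in Python. This is only an illustration: the visit records and the `returned_to_search` field are hypothetical, and that field in particular is something only the search engine could know, not the site owner's analytics.

```python
def bounce_rate(visits):
    """Share of visits that viewed a single page (the definition quoted above)."""
    bounces = [v for v in visits if v["pages_viewed"] == 1]
    return len(bounces) / len(visits)


def search_return_rate(visits):
    """Share of visits that bounced AND went back to the same search to click another result."""
    returned = [
        v for v in visits
        if v["pages_viewed"] == 1 and v["returned_to_search"]
    ]
    return len(returned) / len(visits)


# Made-up visits to one landing page.
visits = [
    {"pages_viewed": 1, "returned_to_search": False},  # found the answer, left happy
    {"pages_viewed": 1, "returned_to_search": False},  # found the answer, left happy
    {"pages_viewed": 1, "returned_to_search": True},   # went back and clicked someone else
    {"pages_viewed": 3, "returned_to_search": False},  # browsed further into the site
]
published = bounce_rate(visits)
pogostick = search_return_rate(visits)
```

In this toy data the page has a 75% bounce rate but only 25% of its visitors went back to the search and clicked on someone else, so a high bounce rate by itself says little about whether searchers were satisfied.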
Regardless, it can't be a bad thing to strive to make every page of yours the best page of its type - the solution to the searcher's problem. At its heart, that is really what the Panda update is about. Really, that's what search ranking is about in general. Delivering the BEST result for the query - signals aside.
As far as links, while Boykin says it's "kind of" fair to say that making sure your links point to quality pages can have a major impact on how Google ranks your site post-Panda, he says, "The final solution should be to remove or fix the low quality pages, and thus, all your links would point to 'quality pages'."
Again, this should improve bounce rate.
"I think most agree that there's a 'Page Score' or a 'set of pages score,' and when that has a bad score, it affects those pages, and somehow ripples up the site," Boykin adds. "It could quite well be that if you have a page that links out to 100 internal pages, and if 80 of those pages are 'low quality' then it just might affect that page as well. A lot of this is hard to prove, but there are some smoking guns that can point in this direction."
"Bounce rate is important, and yes, many sites that got hit did have a high bounce rate, but comparing this to sites/pages that weren't hit doesn't exactly show any 'ah ha' moments of 'hey, if your bounce rate is over 75%, then you got Panda pooped on,' because the bounce rate Google shows the public is missing many key metrics that they know, but don't share with us."
I think the best advice you can follow in relation to all of this is to simply find ways to keep people from leaving your site, before they complete the task you want them to complete. That means providing content they want.