Crawling Gotcha: Rel=”Canonical” Creating 404’s or Weird Redirects for Google

The issue:

Site crawls tend to be the starting point of a lot of SEO reviews. One thing that I’ve been increasingly running into is Google Search Console flagging server response errors for pages that never existed in the first place, while other crawlers don’t report the same errors.

One of the ways Google finds these pages is not by crawling a site’s internal links but by crawling the URL referenced in rel=”canonical”.

Canonical URLs that point to pages that don’t exist typically show one of the following problems:

+ parameters (including trailing parameters) in the canonical URL
+ double trailing slashes
+ no trailing slash
+ incorrect protocol (http: vs. https:)

So Google is finding these errors, but they may be invisible to you.
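As a rough illustration, the patterns in the list above can be flagged with a few lines of Python. This is only a sketch: the function name `canonical_issues` and the `expected_scheme` and `expect_trailing_slash` settings are my own assumptions about how a site might be configured, and a flagged URL is only a candidate to check, not proof of a server error.

```python
from urllib.parse import urlsplit


def canonical_issues(canonical_url, expected_scheme="https", expect_trailing_slash=True):
    """Return labels for the suspicious patterns found in a canonical URL.

    expected_scheme and expect_trailing_slash are assumptions about the
    site's URL conventions; change them to match your own setup.
    """
    parts = urlsplit(canonical_url)
    issues = []

    # Query strings such as ?color=red or tracking parameters.
    if parts.query:
        issues.append("parameters")

    if parts.path.endswith("//"):
        issues.append("double trailing slash")
    elif expect_trailing_slash and not parts.path.endswith("/"):
        # File-like paths such as /page.html normally have no trailing slash.
        last_segment = parts.path.rsplit("/", 1)[-1]
        if "." not in last_segment:
            issues.append("no trailing slash")

    if parts.scheme != expected_scheme:
        issues.append("incorrect protocol")

    return issues
```

For example, `canonical_issues("http://example.com/shoes//?color=red")` flags parameters, a double trailing slash, and the protocol, while a clean `https://example.com/shoes/` returns an empty list.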

How big of a deal is this:

In general, Google ignores incorrect implementations of rel=”canonical”. For smaller sites, this is not significant. For larger websites that rely on rel=”canonical” for crawl control, it becomes important, especially if you have a lot of duplicate content.

For a large site, fixing this could be a big win over duplicate content from a small project. Although purists may disagree with me, for a smaller site without a ton of duplicate content, this is not a big issue.

How to find this issue on your website:

Before you run the check

Since we’re trying to replicate what Google is seeing, there are a bunch of steps to surface this issue. I’ve included the general steps below. It takes about 30 minutes to generate the report if you’re familiar with your crawl tool, and the method will work with any modern crawl tool. With that said, if you have specific questions about how to crawl for this issue with the crawler you’re using, feel free to drop me a note in the comments.

The steps to check if you’re generating server errors through your rel=”canonical” tags

Step 1: Use your favorite crawler to crawl your site. Some crawlers include the contents of rel=”canonical” in a column; others let you extract it through a custom extraction feature.
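If your crawler has no canonical column and you end up extracting it yourself, the extraction in Step 1 could look something like this minimal sketch. It uses only Python’s standard library; `CanonicalExtractor` and `extract_canonicals` are names I made up for illustration, and a real crawler’s custom extraction will have its own syntax.

```python
from html.parser import HTMLParser


class CanonicalExtractor(HTMLParser):
    """Collect the href of every <link rel="canonical"> found in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated values, so split before matching.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonicals.append(attr_map["href"].strip())


def extract_canonicals(html):
    """Return a list of canonical URLs declared in the given HTML string."""
    parser = CanonicalExtractor()
    parser.feed(html)
    return parser.canonicals
```

A page should normally declare one canonical, so a result with more than one entry is worth a look on its own.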

Step 2: Grab an export, and copy the URLs referenced in rel=”canonical” to your clipboard.

Step 3: Run those rel=”canonical” URLs through your crawler.

Step 4: Your crawler will now tell you which canonical URLs may be generating server errors for Google by sending it to pages that never existed.
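Steps 3 and 4 amount to requesting each unique canonical URL and keeping the ones that don’t return a 200. Here is a hedged sketch of that idea using Python’s standard library. The names `fetch_status` and `find_problem_canonicals` are hypothetical, redirects are deliberately not followed so that a 301 or 302 shows up in the results, and HEAD requests are an assumption (some servers answer HEAD differently from GET).

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so 3xx codes are reported."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request failed."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
    except urllib.error.URLError:
        return None  # DNS failure, refused connection, and similar


def find_problem_canonicals(canonical_urls, fetch=fetch_status):
    """Map each unique canonical URL that does not return 200 to its status."""
    problems = {}
    for url in sorted(set(canonical_urls)):
        status = fetch(url)
        if status != 200:
            problems[url] = status
    return problems
```

The `fetch` parameter is there so you can swap in your own request function, or a lookup built from a crawler export, without touching the rest.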

Step 5: Use VLOOKUP to map the rel=”canonical” errors back to the spreadsheet from your first crawl, generated in steps 1 and 2.

Step 6: Now you have a list of pages with canonical errors that need to be fixed.
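If you’d rather not do the Step 5 lookup in a spreadsheet, the same join is a short loop in Python. This is a sketch under assumptions: `map_errors_to_pages` is a hypothetical name, the first crawl is assumed to be available as (page URL, canonical URL) pairs, and the second crawl as a dictionary of canonical URL to status code.

```python
def map_errors_to_pages(crawl_rows, problem_statuses):
    """Join first-crawl rows to canonical statuses, much like a VLOOKUP.

    crawl_rows: iterable of (page_url, canonical_url) pairs from the first crawl.
    problem_statuses: dict of canonical_url -> status code from the second crawl,
        containing only the canonical URLs that did not return 200.
    """
    report = []
    for page_url, canonical_url in crawl_rows:
        if canonical_url in problem_statuses:
            report.append(
                {
                    "page": page_url,
                    "canonical": canonical_url,
                    "status": problem_statuses[canonical_url],
                }
            )
    return report
```

Each entry in the result is one page whose canonical needs fixing, which is the list described in Step 6.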

The fix:

Update the rel=”canonical” reference to the correct location. Do it yourself in the CMS if you’re able, but if you’re managing a large website, your rel=”canonical” tags are more than likely created dynamically. In cases where you can’t fix the canonical issues yourself and web development support is needed, you’ll have a spreadsheet mapping the errors to fix.

Happy SEO cleanup 😉

2 Responses

Adwait says:

October 5, 2016 at 8:39 am

This article has helped me to solve some doubt. I have one question for you though. I might be off topic but can you suggest me how to handle pagination in blog. Can I use canonical tags in say 2nd or 3rd page of blog to get rid of duplicate meta tags error notification in webmaster?

Melvin Foster says:

October 31, 2016 at 9:19 am

All parametrs which you explained is very useful most of blogger don’t do this that’s what they miss to success their blog
