06 Apr 2016

A “More” Core Tour of the 404: Do’s and Don’ts for Broken Pages

There are some topics that, although already written about extensively, still seem to cause a lot of confusion among clients and digital marketers alike, even in 2016. One of those topics is the 404 page, and how to handle broken pages and links in general.

So even though the best practices haven’t changed (too) much over time, allow us to review the topic again and perhaps offer a fresh perspective or insight.

A Review of HTTP Status Codes

First, since a 404 is a very specific type of HTTP Status Code, let’s do a real quick review of page requests, servers and HTTP Status Codes.

Whenever you navigate to a webpage in a browser, or a search engine spider crawls a page, a request is sent to the server the site is hosted on. For each page request from an agent, the server sends back a status code, numbered according to the outcome of that request.

  1. If a page is properly fetched and returned to a browser (or search engine spider), the server sends along a 200 level status code – Success. The most common is a 200 (OK), which is sent along with every page that properly loads in a web browser.
  2. If the page has been moved elsewhere, the server sends along a 300 level status code – Redirected. These are your 301 (Permanent) and 302 (Temporary) redirects. There are a few other 300 level statuses, but 301s and 302s are by far the most common.
  3. If the request for the page is seemingly valid, but the server is unable to fulfill it, then a 500 level status is returned – Server Error. These include your 500 (Internal Server Error), 502 (Bad Gateway), and 504 (Gateway Timeout) errors.
  4. If the request for the page goes awry not because of the server, but because of the client/agent, then the result is a 400 level status code – Client Error. These include the 401 (Unauthorized) and 403 (Forbidden) status codes, but also the 404 (Not Found) error. The 404 Status Code is returned whenever the specific URL requested does not exist on the server (or no longer exists). There is an implicit understanding with a 404 that the page in question “may” return in the future, but is currently Not Found. By contrast, the 410 (Gone) Status Code is similar to a 404 but implies the page is Permanently Gone.
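The status classes above can be sketched with Python's standard library, whose http.HTTPStatus enum knows the official reason phrase for each code. The describe_status helper and the category labels are illustrative only, chosen to mirror the list above; they are not part of any SEO tool.

```python
from http import HTTPStatus

# Category labels mirror the list above, keyed by the first digit of the code.
STATUS_CLASSES = {
    2: "Success",
    3: "Redirected",
    4: "Client Error",
    5: "Server Error",
}


def describe_status(code: int) -> str:
    """Return a label such as '404 Not Found (Client Error)' (hypothetical helper)."""
    phrase = HTTPStatus(code).phrase  # raises ValueError for unregistered codes
    category = STATUS_CLASSES.get(code // 100, "Other")
    return f"{code} {phrase} ({category})"


if __name__ == "__main__":
    for code in (200, 301, 302, 404, 410, 500, 502, 504):
        print(describe_status(code))
```

Among other lines, this prints "404 Not Found (Client Error)" and "410 Gone (Client Error)": the same class, but different meanings, as described above.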

Why do some fear 404s?

Now, a 404 is a status code that corresponds to a specific event (or lack thereof). Nothing more, nothing less. When a requested page is not found, a 404 is not only a valid code to return, it’s the ONLY status code that should be returned. However, there is still a fear of, or aversion to, the 404 among certain developers, webmasters and site owners. Over the years, we’ve seen quite a few contortions and setups used to avoid sending a 404 status code, as if it were the Boogeyman or the Plague. We can’t speak for everyone, but here are our guesses as to the most common reasons why this happens:

  1. Aesthetics – Web designers and developers like beauty and clean design. The standard 404 message that servers send in the absence of a Custom 404 is as ugly as sin. It probably makes some of their skin crawl just to look at.
  2. Penalties – Some webmasters believe that 404s are “bad” for SEO. So, they go out of their way to avoid a “penalty” they believe would be incurred upon the site if Google were ever to discover that pages that don’t exist don’t actually exist.

Issues with not using 404s

So, using a 404 is a best practice. Got it. But what’s actually “wrong” with avoiding them? If a site isn’t returning 404s properly, why should webmasters go out of their way to address it? A few reasons, actually.

1. Reporting Bugs in Google Search Console and Google Analytics

If a site isn’t returning a 404 properly, it’s sooner or later going to be evident in your data and reporting. When 200 OK status codes are returned for what should be 404s, Google Search Console (Webmaster Tools) will often report these as “soft” 404s.

Googlebot will observe that the status code sent is a 200, but will also note that the page loaded empty – there was no content. IMN has received countless emails over the years from site owners asking about these soft 404s and how to “fix” them. Although there isn’t a penalty for soft 404s, Google does not like them and clearly differentiates between a soft 404 and a genuine one. Moreover, these soft 404s don’t go away until you address them, and as such, they clutter up your reporting dashboards.

The purpose of a dashboard is to give you high level diagnostics to assist you. But if your car’s dashboard always had an “Empty Gas Tank” icon flashing, even after you filled up, you’d question the utility of the dashboard, no? Same goes for SEO dashboards. Webmasters should have dashboards that report “real” errors in close to real-time. Soft 404s are going to clutter that process in Google Search Console.
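One simple way to check whether a site produces soft 404s is to request a URL that cannot possibly exist and look at the final status code. The sketch below uses only Python's standard library; the random path format and the function names are hypothetical, and a real audit would also check pages that used to exist. Note that urllib.request.urlopen follows redirects automatically, so a broken URL that redirects to a page answering 200 (the homepage, or a /404.php page) is reported as 200, which is exactly the soft 404 case.

```python
import urllib.error
import urllib.request
import uuid


def missing_page_url(site: str) -> str:
    """Build a URL that should not exist on the site (hypothetical naming scheme)."""
    return f"{site.rstrip('/')}/soft-404-check-{uuid.uuid4().hex}"


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code for url.

    urlopen follows redirects, so the code reported belongs to the page
    the redirect chain ends on.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; the code is still available.
        return err.code


def diagnose(status: int) -> str:
    """Interpret the status code returned for a page that should not exist."""
    if status in (404, 410):
        return "OK: missing page returns a real 404/410"
    if 200 <= status < 300:
        return "Soft 404: missing page returns a success status"
    if 300 <= status < 400:
        # Only seen if the status came from a client that does not follow redirects.
        return "Redirect: missing page is redirected instead of returning 404"
    return f"Unexpected status {status} for a missing page"


def check_site(site: str) -> str:
    """Probe one random missing URL on the site and describe the result."""
    return diagnose(fetch_status(missing_page_url(site)))
```

Calling check_site("https://www.example.com") makes one live request; a result starting with "Soft 404" means missing pages are being answered with a success code.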

For Analytics, the problem becomes even more dire: traffic counts for the site will be artificially inflated. If a broken page is returning 200 OK status codes, it’s going to creep into Analytics reports and make it seem like there’s more traffic to the site than there actually is. Analysts will later have to segment out that corrupt data, which takes more time and resources. Bad data = bad reporting = bad insights. It’s really as simple as that.

2. Crawling and indexation

Another problem with not properly using 404s involves crawling and indexation. First, let’s talk crawl budget. Every site, no matter how big or authoritative it is, has a limited crawl budget: the time Googlebot spends crawling the site and the number of pages it crawls within a given time period.

If a non-existent page is returning a 200 OK status code, Googlebot is going to keep re-crawling that page indefinitely, until it’s signaled that the page doesn’t exist. That’s a waste of crawl budget, which should be going toward NEW pages and the TOP pages on a site, not imaginary ones.

Moreover, with a 200 OK status code, Google could start caching and indexing these pages. They won’t rank for anything, because there’s no content, but they will add to index “bloat”. Controlling crawling and indexation activity is an often overlooked component of SEO, but it’s still an important one.

3. User Experience

There are some who would claim that a 404 is poor user experience, but we disagree. In the event that a page is not found, that should be communicated to the user as quickly and clearly as possible.

Transparency is the gold standard here. Otherwise, using 200 or 300 level status codes is going to suggest to users that they’re on a page that in reality doesn’t exist. Users want to fulfill their intent, which they can’t do on an empty or irrelevant page.

A 404 signals to them that something went awry in their request. Now, a 404 page can certainly be made MORE user friendly through a Custom 404, but that doesn’t mean a 404 is, in and of itself, a poor user experience.

Dos and Don’ts for 404 Pages

So what are the major Dos and Don’ts for 404 pages? IMN would recommend following 3 basic rules for broken pages. If you can stick to these 3, it should cover almost all the hiccups we’ve encountered over the years.

  1. Don’t use redirects for broken pages – Don’t redirect users when a page is broken. Not to the homepage, not to a category page, and definitely not to a page titled /404.php; the URL that was originally requested should remain the same in the browser’s address bar.
  2. Don’t use 200 OK for a broken page – Whether you first redirect users to a /404.php or you just load empty pages, a broken page request should never, ever, EVER result in a 200 level status code. Make sure your 404s are returning actual 404 status codes to avoid “soft” 404s. Use a header checker tool as needed.
  3. Do use Custom 404 pages – Yes, the standard 404 screen a server sends out to a browser isn’t much to look at. And it doesn’t provide any “Now What?” solutions for a user either. So, DO use Custom 404 pages: ones designed to fit the look, feel and brand of the website. These custom 404s should let users know, in big, bold letters, that the page request couldn’t be fulfilled, and should offer navigational choices elsewhere on the site that might help them: the homepage, some top category pages, perhaps an internal site search page. And, if you’ve got the enthusiasm, have some fun with the Custom 404! It can really be a palette for some genuine creativity.
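To make the three rules concrete, here is a minimal sketch of a custom 404 using Python's built-in http.server. It assumes a hypothetical, hard-coded PAGES table standing in for a real router or CMS, and the /products/ and /search links are placeholders. A missing path is not redirected anywhere: the server answers the originally requested URL directly, with a real 404 status and a branded body that offers the homepage, a category page and site search.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple
from urllib.parse import urlsplit

# Hypothetical site content; a real site would ask its router or CMS instead.
PAGES = {
    "/": "<h1>Welcome</h1>",
    "/products/": "<h1>Products</h1>",
}

# Custom 404 body: says clearly that the page was not found, then offers next steps.
CUSTOM_404 = """<!doctype html>
<title>Page Not Found</title>
<h1>Sorry, we couldn't find that page.</h1>
<p>Try the <a href="/">homepage</a>, browse our <a href="/products/">products</a>,
or use the <a href="/search">site search</a>.</p>
"""


def resolve(raw_path: str) -> Tuple[int, str]:
    """Map a request path to (status, html). Unknown paths get 404, never 200 or a redirect."""
    path = urlsplit(raw_path).path  # ignore any query string
    if path in PAGES:
        return 200, PAGES[path]
    return 404, CUSTOM_404


class CustomErrorHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, html = resolve(self.path)
        body = html.encode("utf-8")
        # send_response writes the real status line, e.g. "HTTP/1.0 404 Not Found".
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the demo server on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), CustomErrorHandler).serve_forever()
```

After calling serve(), requesting http://127.0.0.1:8000/some-old-page returns the custom page with a 404 status, and the address bar keeps showing /some-old-page. Any header checker tool should confirm the 404.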

If you follow these basic principles, you should have a functional website for the purposes of broken page/link handling AND avoid common reporting and user experience headaches. Hope that helps!
