21 Feb 2013

When Size Matters: Classifications for Large Scale Site Analysis

Once you have been doing SEO for a while, you are bound to come across a client or two that exceeds your expectations in scope and size. Usually these are big brands with millions upon millions of pages indexed in the search engines and even more links pointing at them. A project of this magnitude can seem intimidating. I remember the first time I did an audit for a really big client: I wasn’t sure where to begin, and when the audit was finally over, I worried that I had missed something.

In this post I am going to go over a basic strategy I use when planning an audit process for enormous sites. I can’t take full credit for it, though: my buddy Alan Bleiweiss gave me the original concept.

Essentially, we are going to classify the different types of pages on the domain, analyze a small sample of pages of each type, and then make educated assumptions about all the pages of that type. Those assumptions then become recommendations in the audit.


Step One: Map Basic Page Hierarchy

To start off, it’s important to have a very good understanding of the site’s structure, and mapping the site’s information architecture is the best way to get it. When I map a site’s structure, I like to visualize it to make my thought process more concrete, so I often use my favorite mind mapping software. A mind map lets you lay out all the elements of the site as a hierarchy. It’s best to start with the site’s main navigation and work your way outward from each menu item, away from the home page. You don’t need to list every page in this mind map; the goal is to map the main areas of the site and their supporting elements.
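
If you can get a list of the site’s URLs (from an XML sitemap or a crawl export, for example), a few lines of Python can produce a rough first-pass outline to start your mind map from. This is a minimal sketch rather than part of the original process: it assumes the URL paths reflect the site hierarchy, which is not true of every site, and the `build_hierarchy` and `outline` helpers and the example.com URLs are hypothetical.

```python
from urllib.parse import urlparse


def build_hierarchy(urls, max_depth=2):
    """Nest URL path segments into a dict tree, stopping at max_depth levels."""
    tree = {}
    for url in urls:
        # Drop empty segments produced by leading/trailing slashes.
        segments = [s for s in urlparse(url).path.split("/") if s]
        node = tree
        for segment in segments[:max_depth]:
            node = node.setdefault(segment, {})
    return tree


def outline(tree, indent=0):
    """Flatten the tree into indented lines, sorting siblings alphabetically."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * indent + name)
        lines.extend(outline(tree[name], indent + 1))
    return lines


if __name__ == "__main__":
    # Hypothetical URLs standing in for a sitemap or crawl export.
    urls = [
        "https://example.com/blog/seo-tips/",
        "https://example.com/blog/category/audits/",
        "https://example.com/products/widgets/blue-widget",
    ]
    print("\n".join(outline(build_hierarchy(urls))))
```

The `max_depth` cap keeps the outline at the level of main sections, in the same spirit as not listing every page; raise it when a section needs more detail.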


Step Two: Identify Patterns

Once your mind map is complete, look for patterns in the site structure by comparing similarities between pages and URLs. This step matters because it sets up the next one and helps you spot common issues later on. For example, let’s say that a blog’s comment structure follows the same author citation schema as the site’s forum threads. Noticing this early can expose problems in both sets of pages if the citations are implemented incorrectly, because a developer who makes a mistake in one area is likely to have repeated it elsewhere.
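
One rough way to surface URL patterns on a large site is to reduce each URL to a template and count how many URLs share it. This is an illustrative heuristic, not something from the original process: it assumes numeric path segments are IDs or dates and hyphenated segments are slugs, and `url_template` and `count_templates` are hypothetical names.

```python
from collections import Counter
from urllib.parse import urlparse


def url_template(url):
    """Reduce a URL path to a template (heuristic: digits -> {num}, hyphens -> {slug})."""
    parts = []
    for segment in urlparse(url).path.strip("/").split("/"):
        if not segment:
            continue  # the root URL has no segments
        if segment.isdigit():
            parts.append("{num}")
        elif "-" in segment:
            parts.append("{slug}")
        else:
            parts.append(segment)  # keep fixed words like "category" or "thread"
    return "/" + "/".join(parts)


def count_templates(urls):
    """Count how many URLs share each template."""
    return Counter(url_template(u) for u in urls)


if __name__ == "__main__":
    urls = [
        "https://example.com/blog/2012/05/seo-tips/",
        "https://example.com/blog/2013/01/link-building-basics/",
        "https://example.com/forum/thread/12345",
        "https://example.com/forum/thread/678",
    ]
    for template, count in count_templates(urls).most_common():
        print(count, template)
```

Templates with large counts are usually the page families worth classifying in the next step; a template with only a handful of URLs may be a one-off section or a sign that the heuristic needs tuning for that site.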


Step Three: Classify Page Types

Once you have identified patterns, it will become apparent that there are different “types” of pages on the site. For example, a basic blog usually has the following types of pages:

  • Main Blog Archive Page
  • Single Post Page
  • Date Archive Page
  • Category Archive Page
  • Tag Archive Page
  • Author Archive Page
  • Secondary Page

If the site runs on a popular CMS, uncovering these page types will be easy. However, if there is no CMS or a custom CMS is being used, you will probably need to rely heavily on the patterns you identified in step two.
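
When URLs follow predictable patterns, classification can be as simple as an ordered list of regular expressions. The rules below are hypothetical and assume WordPress-style pretty permalinks (such as `/2013/02/post-name/`, `/category/name/`, `/tag/name/`, `/author/name/`); a real site, especially one on a custom CMS, will need its own rules built from the patterns found in step two.

```python
import re
from urllib.parse import urlparse

# Hypothetical rules for a WordPress-style blog; the first matching rule wins.
PAGE_TYPE_RULES = [
    ("Main Blog Archive Page", re.compile(r"^/(page/\d+/)?$")),
    # Year, month, or day archives, optionally paginated.
    ("Date Archive Page", re.compile(r"^/\d{4}/(\d{2}/){0,2}(page/\d+/)?$")),
    ("Single Post Page", re.compile(r"^/\d{4}/\d{2}/[^/]+/$")),
    ("Category Archive Page", re.compile(r"^/category/")),
    ("Tag Archive Page", re.compile(r"^/tag/")),
    ("Author Archive Page", re.compile(r"^/author/")),
]


def classify(url, rules=PAGE_TYPE_RULES, default="Secondary Page"):
    """Return the first page type whose pattern matches the URL path."""
    path = urlparse(url).path or "/"
    for page_type, pattern in rules:
        if pattern.match(path):
            return page_type
    return default


if __name__ == "__main__":
    for url in ["https://example.com/2013/02/when-size-matters/",
                "https://example.com/tag/audits/",
                "https://example.com/about/"]:
        print(classify(url), url)
```

Rule order matters because the first match wins: date archives such as `/2013/02/21/` are checked before single posts so the day number is not mistaken for a post slug, and anything unmatched falls back to Secondary Page.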


Step Four: Run Analysis On Each Page Type

Now that you have mapped the site’s structure, identified patterns, and classified page types, it’s time to begin your analysis. Because the site is so massive, you won’t be analyzing every page. Instead, run a separate analysis on each of the page types you identified in step three. For the best results, analyze at least two pages of each type. If the same issues show up on every page you check within a type, you can start to draw conclusions about all of the pages in that classification. However, if you find irregularities on only one or two pages of a type, investigate further to see whether they point to a larger issue.
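
Once you have a list of URLs per page type, picking the sample and comparing findings can be scripted. This sketch assumes you have already recorded the issues found on each sampled page (by hand or from whatever audit tool you use) as a set of labels; `sample_per_type` and `split_issues` are hypothetical helpers, and the issue labels are made up for illustration.

```python
import random


def sample_per_type(pages_by_type, k=2, seed=0):
    """Pick up to k random URLs from each page type (seeded for repeatability)."""
    rng = random.Random(seed)
    return {
        page_type: rng.sample(urls, min(k, len(urls)))
        for page_type, urls in pages_by_type.items()
    }


def split_issues(issues_by_url):
    """Split issues into those on every sampled page and those on only some."""
    issue_sets = [set(issues) for issues in issues_by_url.values()]
    if not issue_sets:
        return set(), set()
    common = set.intersection(*issue_sets)
    irregular = set.union(*issue_sets) - common
    return common, irregular


if __name__ == "__main__":
    # Hypothetical findings for two sampled Single Post Pages.
    findings = {
        "/2013/02/post-a/": {"missing meta description", "duplicate title"},
        "/2013/01/post-b/": {"missing meta description"},
    }
    common, irregular = split_issues(findings)
    print("Type-wide:", common)
    print("Investigate:", irregular)
```

Issues in the common set are candidates for type-wide recommendations; issues in the irregular set are the ones to investigate further before generalizing. Sampling more than two pages per type makes the common set more trustworthy.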


Going Beyond Page Analysis

This type of analysis is great for on-page technical analysis, but you can also use it to uncover other SEO issues. For example, you may not be able to let your site crawler crawl millions of pages. Once you have identified patterns and page classifications, however, you can set the crawler to crawl only a sample of each page type. Whatever your analysis tasks might be, classifying page types and identifying patterns can keep your process organized and turn a seemingly impossible task into a manageable one.
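
If your crawler can crawl from a supplied list of URLs, one way to keep a crawl manageable is to cap how many URLs of each page type go into that list. This is a sketch under that assumption; `crawl_plan` is a hypothetical helper, and the `classify` argument can be any function that maps a URL to a page type, such as a set of regex rules built from your page classifications.

```python
from collections import defaultdict


def crawl_plan(urls, classify, per_type=50):
    """Keep at most per_type URLs for each page type, in first-seen order."""
    kept = defaultdict(list)
    for url in urls:
        bucket = kept[classify(url)]
        if len(bucket) < per_type:
            bucket.append(url)
    return dict(kept)


if __name__ == "__main__":
    urls = ["/tag/a/", "/tag/b/", "/tag/c/", "/about/"]
    # A toy classifier standing in for real page-type rules.
    plan = crawl_plan(urls, lambda u: "tag" if u.startswith("/tag/") else "other", per_type=2)
    for page_type, sample in plan.items():
        print(page_type, sample)
```

Taking the first URLs seen keeps the plan simple and repeatable; if your URL list is sorted in a way that groups similar pages together, shuffling it first gives a more representative sample.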

Until next time, happy auditing!