Once you have been doing SEO for a while, you are bound to come across a client or two whose scope and size go beyond your expectations. Usually these are big brands that have millions upon millions of pages indexed in the search engines and even more links pointing at them. A project of this magnitude can seem intimidating. I remember the first time I did an audit for a really big client: I wasn’t sure where to begin. And when the audit was finally over, I worried that I had missed something.
In this post I am going to go over a basic strategy I use when planning an audit process for enormous sites. I can’t take full credit for this one, though: my buddy Alan Bleiweiss gave me the original concept.
Essentially what we are going to do is classify different types of pages on the domain, run analysis on one or two of each of these types, and then make educated assumptions about the sum of all the pages. These assumptions then turn into recommendations that are included in the audit.
Step One: Map Basic Page Hierarchy
To start off, it’s important that we have a very good understanding of the structure of the site, and mapping the site’s information architecture is the best way to get it. When I am mapping site structures I like to visualize the site to make my thought process more tangible. To do this I often use my favorite mind mapping software. By using a mind map, you can list all the elements of the site in a hierarchy or pattern. It’s best to start with the site’s main navigation and then work your way out from each menu item, away from the home page. With this mind map, it isn’t important to list every page, but rather to map the main areas of the site and their supporting elements.
Step Two: Identify Patterns
After your mind map is completed, the next task is to identify patterns in the site structure. This can be done by looking for similarities in pages and URLs. This step matters because it feeds directly into the next one and will help you identify common issues later on. For example, let’s say that for some reason a blog’s comment structure follows the same author citation schema as the site’s forum threads. Identifying this early on may help expose problems with both sets of pages if they are using citations incorrectly, because if a developer does something wrong in one area, they are likely to have made the same mistake in other places.
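One rough way to surface URL patterns is to collapse each URL path into a template and count how often each template appears. The sketch below is only an illustration: the function names, the `{num}` and `{slug}` placeholders, and the example.com URLs are my own assumptions, and the heuristics will need tuning for any real site.

```python
import re
from collections import Counter
from urllib.parse import urlparse


def url_pattern(url):
    """Collapse a URL path into a rough template (hypothetical heuristic)."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    template = []
    for seg in segments:
        if seg.isdigit():
            # Purely numeric segments: years, months, numeric IDs.
            template.append("{num}")
        elif "-" in seg or re.search(r"\d", seg):
            # Hyphenated or mixed segments usually vary from page to page.
            template.append("{slug}")
        else:
            # Plain words are treated as fixed parts of the structure.
            template.append(seg)
    return "/" + "/".join(template)


def count_patterns(urls):
    """Count how many URLs share each template."""
    return Counter(url_pattern(u) for u in urls)


# Hypothetical sample URLs, e.g. taken from a sitemap or a partial crawl.
sample_urls = [
    "https://example.com/blog/2013/05/my-first-post",
    "https://example.com/blog/2013/06/another-post",
    "https://example.com/forum/thread/12345",
]

for pattern, count in count_patterns(sample_urls).most_common():
    print(count, pattern)
```

Templates with high counts are good candidates for the page types in the next step, while templates that only show up once or twice are worth a manual look.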
Step Three: Classify Page Types
After you have identified patterns it will become apparent to you that there are different “types” of pages on the site. For example, a basic blog usually has the following types of pages:
- Main Blog Archive Page
- Single Post Page
- Date Archive Page
- Category Archive Page
- Tag Archive Page
- Author Archive Page
- Secondary Page
If the site is using a popular CMS, uncovering these page types will be easy. However, if there is no CMS or a custom CMS is being used, you will probably need to rely heavily on the patterns that you identified in step two.
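Once the patterns are known, the classification itself can be written down as a small set of rules. Here is a minimal sketch that maps URLs to the blog page types listed above. The `/blog/...` URL layout, the rule names, and the `classify` function are assumptions made for illustration; a real site will need its own patterns.

```python
import re
from urllib.parse import urlparse

# Ordered rules: the first match wins, so specific patterns come first.
# The URL layout below is a hypothetical blog structure, not any CMS default.
PAGE_TYPE_RULES = [
    ("Date Archive Page", re.compile(r"^/blog/\d{4}(/\d{2}){0,2}/?$")),
    ("Category Archive Page", re.compile(r"^/blog/category/[^/]+/?$")),
    ("Tag Archive Page", re.compile(r"^/blog/tag/[^/]+/?$")),
    ("Author Archive Page", re.compile(r"^/blog/author/[^/]+/?$")),
    ("Main Blog Archive Page", re.compile(r"^/blog/?$")),
    ("Single Post Page", re.compile(r"^/blog/[^/]+/?$")),
]


def classify(url):
    """Return the page type for a URL, falling back to 'Secondary Page'."""
    path = urlparse(url).path
    for page_type, pattern in PAGE_TYPE_RULES:
        if pattern.match(path):
            return page_type
    return "Secondary Page"


print(classify("https://example.com/blog/category/seo/"))
print(classify("https://example.com/blog/my-first-post/"))
print(classify("https://example.com/about/"))
```

Keeping the rules in an ordered list makes it easy to add a new page type as you discover it, and the catch-all at the end keeps unmatched URLs visible instead of silently dropping them.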
Step Four: Run Analysis On Each Page Type
Now that you have mapped the site’s structure, identified patterns, and classified page types, it’s finally time to begin your analysis. Because the site is so massive, you won’t be analyzing every page. Instead, run a separate analysis on each of the page types that you identified in step three. To get the best results, analyze at least two pages of each type. If you see the same issues on all the pages you look at for a page type, then you can start to draw assumptions about all of the pages in that classification. However, if you find irregularities on only one or two pages of a type, then you should investigate further to see if they point to a larger issue.
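The sampling and comparison logic described above can be sketched in a few lines. This example assumes you already have URLs grouped by page type and a set of issue labels per audited page. The function names, issue labels, and URLs are hypothetical, and the actual page checks are left to whatever audit tools you use.

```python
import random


def sample_pages(urls_by_type, per_type=2, seed=0):
    """Pick up to `per_type` URLs from each page type for manual analysis."""
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    return {
        page_type: rng.sample(urls, min(per_type, len(urls)))
        for page_type, urls in urls_by_type.items()
    }


def split_issues(issues_by_page):
    """Split issues for ONE page type into shared and irregular sets.

    issues_by_page maps each sampled URL to the issues found on it.
    Shared issues appear on every sampled page; irregular ones do not.
    """
    issue_sets = [set(issues) for issues in issues_by_page.values()]
    if not issue_sets:
        return set(), set()
    shared = set.intersection(*issue_sets)
    irregular = set.union(*issue_sets) - shared
    return shared, irregular


# Hypothetical findings for two sampled single post pages.
findings = {
    "https://example.com/blog/post-a/": {"missing meta description", "no canonical tag"},
    "https://example.com/blog/post-b/": {"missing meta description"},
}

shared, irregular = split_issues(findings)
print("Likely affects the whole page type:", shared)
print("Investigate further:", irregular)
```

Shared issues become recommendations for the whole classification, while irregular ones are a cue to pull a few more pages of that type before drawing conclusions.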
Going Beyond Page Analysis
This type of analysis is great for on-page technical analysis. However, you can also use it for other SEO tasks. For example, you may not be able to let your site crawler crawl millions of pages. However, after you have identified patterns and page classifications, you can set the crawler to crawl only certain page types. Whatever your analysis tasks might be, classifying page types and identifying patterns can make a seemingly impossible task much more manageable.
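Most crawlers let you restrict a crawl with include patterns, but the exact settings differ by tool. As a tool-agnostic sketch, the filter below only accepts URLs whose path matches one of the chosen patterns and caps how many pages it accepts per pattern. The `make_crawl_filter` helper, its parameters, and the example patterns are assumptions for illustration, not the interface of any particular crawler.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse


def make_crawl_filter(include_patterns, max_per_pattern=100):
    """Build a should_crawl(url) function limited to the given page types."""
    compiled = [re.compile(p) for p in include_patterns]
    accepted = defaultdict(int)  # pages accepted so far, keyed by pattern

    def should_crawl(url):
        path = urlparse(url).path
        for pattern in compiled:
            if pattern.search(path):
                if accepted[pattern.pattern] >= max_per_pattern:
                    return False  # this page type already hit its cap
                accepted[pattern.pattern] += 1
                return True
        return False  # not one of the page types we want to crawl

    return should_crawl


# Hypothetical usage: crawl only tag archives, at most two of them.
should_crawl = make_crawl_filter([r"^/blog/tag/"], max_per_pattern=2)
for url in [
    "https://example.com/blog/tag/seo/",
    "https://example.com/about/",
    "https://example.com/blog/tag/audits/",
    "https://example.com/blog/tag/links/",
]:
    print(url, should_crawl(url))
```

The cap keeps any single page type from eating the whole crawl budget, which is usually the problem on sites with millions of near-identical pages.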
Until next time, happy auditing!
