I am always surprised at how the vast majority of things I see during audits are simple "SEO 101" infractions. But then again it makes sense; clients come to us because they aren't SEOs themselves. So in today's blog post I am going to cover what many SEOs consider basic SEO, along with a few advanced tips that I often see overlooked.
Today we are talking about the infamous robots.txt file. No other element within an SEO campaign can do so much for your rankings, both positive and negative. One small error can screw up your entire approach, but executed correctly, it lets you control the engines like a dog on a leash. That is why it is incredibly important to get it right the first time.
Top to Bottom
A commonly misunderstood aspect of every robots.txt file is how the search engines actually read it. When a search engine crawls a robots.txt file, it reads the document from top to bottom, which means that when an error in syntax (or anything else) is present, the crawler will ignore all directives below the error. The lesson here: if you are unsure of your syntax, or you are attempting something "unique", put it at the bottom of the file so that if an error is found, the other directives will not be ignored.
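For example, a hypothetical file (the paths and pattern here are made up for illustration) might put the riskier pattern-matching rule last, so that if a crawler trips on it, the plain directives above still apply:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /*?sort=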
Use Wildcards Correctly
The wildcard directive can be very handy. With it you can write simple statements that disallow patterns found in URLs. However, used incorrectly it can screw everything up. One important thing to remember is that not all search engine crawlers support the wildcard directive. Because of this, it's a good idea to put any wildcard statements at the bottom of the file so they don't cause an error that gets the other directives ignored, as we talked about above.
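As a quick sketch (these URL patterns are invented for illustration), a single wildcard line can cover a whole family of URLs, and Google and Bing also support $ to anchor a pattern to the end of a URL:

User-agent: Googlebot
Disallow: /*?sessionid=
Disallow: /*.pdf$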
Only Disallow
I know that for some of you this is going to sound obvious, but it's important to remember that the robots.txt file is only used to "block" or disallow crawlers from sections of a site. It is not intended to point crawlers toward URLs that should be indexed; that's what sitemaps are for. I mention this only because during several audits I have seen robots.txt files contain the directive "Allow: /example/". This "Allow:" directive is not part of the original robots.txt standard and will only cause errors. UPDATE (4/19/2013): @Zen2Seo pointed out that Googlebot does accept the Allow: directive. However, it should only be used to allow subdirectories of directories that have already been blocked. It does not help the crawler find new URLs. Also, Googlebot seems to be the only user agent that supports this directive.
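Here is a minimal sketch of that pattern, with hypothetical directory names, blocking a directory for Googlebot while opening one of its subdirectories back up:

User-agent: Googlebot
Disallow: /private/
Allow: /private/public-reports/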
Use Line Breaks
Search engine crawlers read robots.txt files in segments: first the user agent is defined, and then the block of directives immediately below it contains the Disallow rules associated with that user agent. The proper format is to define the user agent, list each Disallow statement on its own line directly beneath it, and then leave a blank line before defining the next user agent. Without the proper use of line breaks, errors will be created and the remaining directives will be ignored.
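A short example of that grouping, with hypothetical paths, where a blank line separates the rules for one user agent from the next:

User-agent: Googlebot
Disallow: /search/
Disallow: /print/

User-agent: Bingbot
Disallow: /search/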
KISS It
I took a creative writing course in college once. I remember one of my first assignments was handed back to me with the words "KISS this." written in red. I couldn't understand what the heck my professor was talking about, so I asked her after class. She explained that KISS stood for Keep It Super Simple. Apparently, I was way too "wordy" in my assignment and needed to trim some of the adjectives. When optimizing a robots.txt file, my best advice is to KISS it. The more complicated the file, the more likely an error will occur. A few tips for keeping things simple: Don't use the robots.txt file to block individual URLs one at a time. If you need to block just a handful of specific URLs, use the NOINDEX meta robots tag on the pages themselves. If you need to use a wildcard, think about the simplest way to execute it (see the sketch below). Keeping things simple will cut down on the likelihood that mistakes are made, and will create a smaller robots.txt file for faster processing.
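As a hedged illustration (the paths are hypothetical), a pile of one-off rules like these:

User-agent: *
Disallow: /catalog?sort=price
Disallow: /catalog?sort=name
Disallow: /catalog?sort=date

can usually be collapsed into a single prefix rule, with no wildcard at all, since Disallow values match URLs by prefix:

User-agent: *
Disallow: /catalog?sort=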
All of the tips above are geared toward avoiding serious errors. However, no one is perfect and everyone's work should be checked, which is why, whenever I write a robots.txt file, I use a validator to check my work.
Until next time, happy roboting! 🙂
Comments
Awesome! It's easy to overlook the basics sometimes 🙂
Great post Joe!
I would like to add one clarification.
The allow directive doesn’t exist in the official standard, but it is supported by Google (i.e., it’s valid if it’s directed at the googlebot): https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
Big fan of yours Joe, but methinks you got point #3 wrong – Google has explicit instructions on the “allow” directive – https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
Disagreeing with @joehall http://t.co/GBOfIOSyLa allow is accepted and can be used (ie to specify crawlable sub-folders of blocked parents)
.@Zen2Seo I just updated the post to reflect that change: http://t.co/3bK5c91kKQ Thanks for the heads up!
@lookadoo @joehall @ninjasmarketing make sure you use (xml) site map location too #robots
Great post Joe. It’s amazing the damage that a small mistake in a robots file can do.
What's your opinion on including your XML Sitemap URL in the robots.txt file?
I added the noarchive so as not to be included in archive.org, which is one of the best ways to spy on competitors…
Besides Googlebot, what are the names of the Bing and Yahoo bots?
Nice article Joe! It has been a while. Hope you are well. You hit all the advanced stuff in this one. I actually try not to use robots.txt, though. I would rather use a noindex/nofollow, a rel canonical, a 301 redirect, etc. But this is a good post. Internet Marketing Ninjas seems like a pretty rad company. Here is a good one an employee just wrote http://ignitevisibility.com/newbies-guide-blocking-content-robots-txt/ @johnelincoln