Why and How to Create a Robots.txt File
How can you improve your website’s crawlability for better search engine optimization? There are many ways to make it easier for spiders to crawl your site and understand your pages’ themes, such as using robots meta tags. One easy and effective way is to tell bots what they should and should not crawl by creating a robots.txt file and placing it in your root folder.
Having an effective robots.txt file can help your blog rank higher in search engines, receive higher-paying relevant ads, and increase your blog traffic.
The robots.txt file tells search engine robots which pages on your blog or website they may crawl, which in turn shapes what gets indexed. Most websites (CMS, WordPress, etc.) have files and folders that are not relevant to search engines (like images or admin files). You really don’t want robots to crawl them because they hold no relevant content, so creating a simple robots.txt file can actually improve your website’s crawlability.
When I searched for indexed files on one of my websites, I found that my style.css file and some admin files had been indexed, which did not do me any good for SEO. (To see how many and which files are indexed on your website, you can use our free Index Checker Tool.)
Robots.txt samples
Here’s a simple robots.txt file if you use WordPress:
User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
“User-agent: *” means that all search bots (from Google, Yahoo, MSN and so on) should use those instructions when crawling your website. You can use “User-agent: Googlebot” to address Google’s bot only. Keep in mind that a bot which finds a group naming it specifically follows only that group and ignores the “User-agent: *” rules.
“Disallow: /wp-” makes sure that search engines will not crawl the WordPress files. This line excludes every file and folder whose path starts with “/wp-” from being crawled, which keeps duplicate content and admin files out of the way.
Here’s a more comprehensive sample showing the available options for WordPress 2.+:
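To see how these prefix rules play out, here is a minimal sketch using Python's standard urllib.robotparser module. The example.com URLs and the is_allowed helper are my own illustrations, not part of any real site, and real search engines may interpret edge cases differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The simple WordPress rules from this article.
RULES = """\
User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
"""


def is_allowed(rules: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    `is_allowed` is a hypothetical helper written for this example.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse from a string instead of fetching
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain.
    for url in (
        "https://example.com/hello-world/",
        "https://example.com/wp-admin/",
        "https://example.com/wp-content/uploads/photo.png",
        "https://example.com/feed/",
    ):
        print(url, "->", "allowed" if is_allowed(RULES, url) else "blocked")
```

Note that "/wp-" is matched as a plain prefix, so it also blocks /wp-content/, where WordPress keeps uploaded images. If you want those crawled, list the folders to block one by one instead.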
User-agent: *
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /stats/
Disallow: /about/
Disallow: /contact/
Disallow: /tag/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /contact
Disallow: /category/
User-agent: Googlebot
# disallow all files ending with these extensions
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$
# disallow all files with ? in url
Disallow: /*?*
# disable duggmirror
User-agent: duggmirror
Disallow: /
# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow:
Allow: /*
# allow adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow:
Allow: /*
Click Here to see my Robots.txt file.
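The Googlebot rules in the sample above rely on two wildcards: “*” stands for any run of characters and a trailing “$” marks the end of the URL. Python's built-in robots.txt parser does not understand these, so the sketch below translates a pattern into a regular expression to show which paths it would catch. The rule_matches function is a hypothetical helper of mine, it only approximates how Google documents the wildcards, and it does not handle an empty “Disallow:” line.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Assumed semantics: the pattern is matched from the start of the path,
    '*' matches any run of characters, and a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


if __name__ == "__main__":
    checks = [
        ("/*.php$", "/index.php"),
        ("/*.php$", "/index.php?x=1"),
        ("/*?*", "/page?id=3"),
        ("/*?*", "/page"),
        ("/wp-admin/", "/wp-admin/index.php"),
    ]
    for pattern, path in checks:
        print(f"{pattern!r} vs {path!r}: {rule_matches(pattern, path)}")
```

Running a few of your own URLs through a check like this before uploading the file is a cheap way to avoid blocking pages you wanted crawled.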
If you are not using WordPress and want to improve your SEO, just substitute the files or folders on your website that you don’t want crawled, for example:
User-agent: *
Disallow: /images/
Disallow: /cgi-bin/
Disallow: /any other folder or file to be excluded/
A robots.txt file can be created easily with Notepad or any other plain-text editor. After you have created it, just upload it to your root directory and that’s it!
For best SEO, your robots.txt should probably disallow only what you want a robot to ignore completely.
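If you would rather generate the file than type it by hand, a few lines of Python will do. This is only a sketch: build_robots_txt and write_robots_txt are names I made up, and the folder passed as root_dir is assumed to be a local copy of your site's root that you upload afterwards.

```python
from pathlib import Path


def build_robots_txt(disallowed, agent="*"):
    """Return robots.txt text with one Disallow line per path, for one agent."""
    lines = [f"User-agent: {agent}"]
    lines += [f"Disallow: {path}" for path in disallowed]
    return "\n".join(lines) + "\n"


def write_robots_txt(root_dir, disallowed):
    """Write robots.txt into root_dir (assumed to be your local site root)."""
    target = Path(root_dir) / "robots.txt"
    target.write_text(build_robots_txt(disallowed), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Paths taken from the non-WordPress example in this article.
    print(build_robots_txt(["/images/", "/cgi-bin/"]), end="")
```

The file must be named robots.txt in lowercase and sit at the top level of the site, since that is the only place robots look for it.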
Using Robots Meta Tags
Stop all robots from indexing a page on your site, but still let them follow the links on the page:
<meta name="robots" content="noindex,follow" />
Allow other robots to index the page, preventing only Google’s bots from indexing it:
<meta name="googlebot" content="noindex,follow" />
Allow robots to index the page but not to follow its outgoing links:
<meta name="robots" content="nofollow" />
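To check which robots directives a page actually carries, you can read its meta tags with Python's standard html.parser module. The sketch below is illustrative only: RobotsMetaParser and robots_directives are hypothetical names, and it looks at just the "robots" and "googlebot" meta names used in this article. The attribute values must be wrapped in plain straight quotes for any parser (or search bot) to read them correctly.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attr.get("content") or ""
            self.directives[name] = [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


def robots_directives(html):
    """Return a dict mapping meta name to its list of directives."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noindex,follow" /></head>'
    print(robots_directives(page))
```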
Google Sponsored Robots.txt Articles
- Controlling how search engines access and index your website
- The Robots Exclusion Protocol
- robots.txt analysis tool
- Googlebot
- Inside Google Sitemaps: Using a robots.txt file
- All About Googlebot
Do you use a robots.txt file for SEO? If not, maybe it is time to start.