Best Robots.txt Tutorial to Get a Higher Index Rate

In this tutorial, I will show you the best way to create a robots.txt file for proper SEO and get a higher index rate in search engines.

Maybe you have heard about this very small file before, but you may not know how powerful its instructions are: a single wrong one can block all search engine crawler bots, de-index all of your site’s content, and drag you to the bottom of the search results.

It is wonderful when search engines frequently visit your site and keep indexing your newly published content, bringing you fresh organic traffic. But there are some areas of your site you shouldn’t let crawler bots get inside, either to prevent duplicate content issues or because you have sensitive data that you don’t want shown to anyone or indexed.

Robots.txt is the simplest way to tell search engine crawler bots where they are allowed to go and where they are not.

What is Robots.txt File?

Robots.txt is a text file created by webmasters to guide search engine crawler bots (Google, Yahoo, Bing, Ask, AOL, Baidu, Yandex, etc.) on how to crawl and index their pages. It is a very simple text file placed in the root folder (directory) of your site.
It uses the Robots Exclusion Standard, a protocol of simple command lines that websites can use to communicate with web crawlers and other web robots.

Hint: the location (link) of your robots.txt file should be “http://example.com/robots.txt”


Should You Have a Robots.txt?

If you would like search engines to crawl and index your whole website, then you actually don’t need a robots.txt file at all. Or, as Google’s documentation puts it: “You only need a robots.txt file if your site includes content that you don’t want Google or other search engines to index.”
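For reference, a robots.txt file that allows everything (effectively the same as having no file at all) contains just these two lines:

User-agent: *
Disallow:

An empty Disallow value means “disallow nothing”, so every bot may crawl the entire site.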

But there are many reasons why using a robots.txt file makes sense:

  • You need to hide some content from search engines
  • You have sensitive data that you don’t want shown to the world or indexed
  • You have a download page for products or applications and you don’t want Google to find it
  • You have redirect rules added by a WordPress plugin and need to hide these redirect pages from bots
  • Your site is online but still in development, and you don’t want crawler bots to find or index it yet
  • You have two versions of your site (one for viewing or browsing and one for printing), and you need to exclude the printing version from crawling

As you can see, the robots.txt file offers a powerful, endless list of instructions for how bots access your site; it all depends on your needs.

Limitation and Instructions

As we mentioned previously, adding wrong commands to your robots.txt file can greatly hurt your site’s indexing. So you have to learn the basics of robots.txt files to guide search engine bots correctly.


User-agent:

This is the name of the robot that the following rules apply to.

User-agent: *
This means all robots from all search engines should follow the directives below.

User-agent: Googlebot
This means the following directives apply only to Google’s bots.

Google also publishes other user agents, such as Googlebot-Image (for Google Images) and Googlebot-News, which you can target with their own rules.
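For example, to stop Google Images from crawling a hypothetical “private-photos” folder while leaving every other bot unaffected, you could target Googlebot-Image directly:

User-agent: Googlebot-Image
Disallow: /private-photos/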

Disallow:

Anything that follows a “Disallow” command will not be crawled, found, accessed, or even indexed.
So you have to be very careful with this command, because it can be very harmful. At the same time, it is helpful for excluding certain folders on your site from the index.

For example: you have a folder called “log” and you don’t want it to be seen or found by any robots. You would add:
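User-agent: *
Disallow: /log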

These two lines tell all robots (User-agent: *) NOT to access or index the “log” folder (Disallow: /log).

Allow:

Everything that comes after an “Allow” command will be discovered and indexed by all robots.
Let’s complete the above example for the “log” folder: suppose there is also an image file called “round.png” inside a “photos” folder that you are using to display something in your layout. The rules then become:
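User-agent: *
Disallow: /log
Allow: /photos/round.png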

The above three lines give the following instructions:

  • I am talking to all robots (User-agent: *)
  • Do not access or index the “log” folder (Disallow: /log)
  • Access, index, and display the “round.png” image (Allow: /photos/round.png)

See, these are very simple commands, and flexible enough to do anything you need to control access to your site.

Sitemap:

It is common practice to add the link to your XML sitemap (not the HTML one) at the end of your robots.txt file so it is discovered quickly by all search engines, as follows:
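Sitemap: http://example.com/sitemap.xml

(Replace example.com/sitemap.xml with the real URL of your own XML sitemap; “sitemap.xml” is just the common default name.)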

How to Create a Robots.txt File?

Not much experience is needed to make a robots.txt file; it is a very simple .txt file that you can create with any plain text editor, like Windows Notepad, and then upload to the root folder of your site (the same directory as your .htaccess file).
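For illustration, here is what a complete robots.txt combining the directives from this tutorial might look like (the “log” folder, the image path, and the sitemap URL are placeholders; replace them with your own):

User-agent: *
Disallow: /log
Allow: /photos/round.png

Sitemap: http://example.com/sitemap.xml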

Hint: don’t forget to test that your robots.txt file doesn’t block access to your pages, using the robots.txt Tester tool in Google Webmaster Tools.


What Are the Best Instructions for a Robots.txt File?

There are no instructions that fit every webmaster’s needs, because each site has its own theme, layout, web server setup, plugin rules, etc.
If you browse the robots.txt file of each site you visit, you will find a lot of variation, since each one is written for that site’s particular needs. In the same way, you should write down the instructions you need before you upload your robots.txt file to your server.
Here are some basic rules:

  • The file must be named robots.txt, not Robots.TXT.
  • It must be saved as a plain text file.
  • It must be placed in the root of your domain (the highest-level directory of your site).
  • WordPress users should disallow the “cgi-bin”, “wp-admin”, and “trackback” folders, as shown below.
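A typical version of these WordPress rules looks like this (the trailing slashes match the usual folder paths; adjust them if your installation differs):

User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /trackback/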

Google Panda 4 Update and Blocking Your Resources (CSS & JS)

In the past, it was common practice to disallow the resources folder “/wp-content/”, which contains all your images, stylesheets, and JavaScript, to save bandwidth or for other reasons. This is completely wrong.
After the Google Panda 4 update, many webmasters were hit because they had blocked Googlebot from rendering their websites correctly, which showed up as blocked-resource errors in Google Webmaster Tools.
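The harmful pattern looked like this; do not use it, because it hides your stylesheets and scripts from Googlebot:

User-agent: *
Disallow: /wp-content/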

Hint: Google requires that the JavaScript and CSS files responsible for your site’s layout are not blocked.
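If you still want to keep the rest of “/wp-content/” blocked while unblocking your layout assets, Google honors the more specific Allow rules; here is one possible sketch (the subfolder paths are assumptions based on a typical WordPress install):

User-agent: *
Allow: /wp-content/themes/
Allow: /wp-content/plugins/
Disallow: /wp-content/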


How to Check if Googlebot Has Access to Render Your Site Correctly?

  1. On the Google Webmaster Tools home page, choose the site that needs to be checked.
  2. Expand the Crawl heading in the left dashboard and select the “Fetch as Google” tool.
  3. Click the red “FETCH AND RENDER” button.
  4. Wait for the fetch process to complete and show a black check mark.
  5. Once it has completed, click the green check mark.
  6. You will see two separate windows (one for Googlebot’s view and one for the visitor’s view).
  7. Check for any blocked resource errors or any differences in layout between the two windows, then fix them as soon as possible to get a higher index rate.


Conclusion:

  • Check the crawl errors in Google Webmaster Tools, because they may be fixed easily by adding a simple line to your robots.txt file.
  • Be careful while writing your robots.txt file, because a single mistake may make your site invisible to search engines.
  • Check your robots.txt file after each plugin you install, because some plugins add rules that may conflict with your existing ones.
  • It is highly recommended to make sure that Googlebot can access any resource file that meaningfully contributes to your site’s visible content or layout.
