Robots.txt – Control How Search Engines Should Crawl Your Website!


Today I’m going to show you how to optimize your WordPress robots.txt file for SEO. Let’s say there are certain files, pages, or directories that you don’t want search bots to index.

Why so?

It’s because they may contain personal information or thin, low-quality content that adds no SEO value and could drag down your site’s ranking.

Now, if you agree with me here, you’ll definitely want to control the way search engine bots crawl your website!

Well, an optimized robots.txt file, placed on your website’s server, does exactly that. The file tells search bots what to crawl (or cache) on a website and what not to.

So, if you want Google and other search engine bots or spiders to index your website’s blog posts and pages properly, make sure you read everything in this guide.

I am sharing my own experience, which I’ve experimented on TheMaverickSpirit 🙂

Things you will learn in this guide -

1. What is Robots.txt File for Search Engines?

2. Why is the Robots.txt File Important?

3. Do I really need a Robots.txt file for WordPress?

4. How to make a robots.txt file?

5. Where should I put my robots.txt file?

Let’s get started –

What is Robots.txt File for Search Engines?

Robots.txt is a plain text file (encoded in UTF-8) placed in the root folder of a website on the web server. This file is read by search engine spiders, also known as bots.

It uses the Robots Exclusion Protocol, a standard responsible for controlling how crawlers access and index your site’s web pages.

The robots exclusion standard, or robots exclusion protocol or simply robots.txt, is a standard used by websites to communicate with web crawlers and other web robots.

The standard specifies how to inform the web robot about which areas of the website should not be processed or scanned.

– By Wikipedia

What is a Web Crawler or a spider?

A crawler is a service or agent that crawls websites. Generally speaking, a crawler automatically and recursively accesses known URLs of a host that exposes content which can be accessed with standard web-browsers.

As new URLs are found (through various means, such as from links on existing, crawled pages or from Sitemap files), these are also crawled in the same way.

– By Google Webmasters

Why is the Robots.txt File Important?

Robots.txt plays an important role in a website’s SEO performance.

For example –

What if the same post appears on your blog page, category page, and tag page?

– Search engine bots may penalize you for duplicate content on your website. Right!

– I know such archive pages enhance the user experience and simplify website navigation.

But,

Googlebot, Yandex, and every other search bot will crawl and index everything if they cannot locate a robots.txt file!

To avoid this,

I tell Google not to index the category and tag archives by adding a few simple lines to robots.txt, as shown below.
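
Here is a minimal sketch of those lines, assuming the default WordPress /category/ and /tag/ permalink structure (adjust the paths if your permalinks are different):

User-agent: *
Disallow: /category/
Disallow: /tag/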

Do I really need a Robots.txt file for WordPress?

It is not mandatory to have one, as Google will still index your website even in the absence of a robots.txt file.

But search engine bots look for this file to take instructions before they crawl your website.

Hence, if you don’t include one, they will crawl and index each and everything on your website, whereas having it lets you control how your site’s content gets indexed.
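
For reference, a robots.txt that places no restrictions at all behaves the same as having no file. An empty Disallow value means nothing is blocked:

User-agent: *
Disallow: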

How to make a robots.txt file?

The ‘txt’ extension tells you it is just a plain text file. You can create and edit it with any text editor, such as Notepad on Windows or vi on Linux.

Robots.txt is a collection of records (or lines) organized into groups. Each record is a combination of a field, a colon, and a value.

The general format for a record is –

  • Field: Value

The valid field elements you can use in robots.txt are -

  1. User-agent: Identifies the specific web crawler (or set of web crawlers) that the group of rules applies to.
  2. Allow: A directive that specifies paths you want crawlers to access.
  3. Disallow: A directive that specifies paths you don’t want crawlers to access.
  4. Sitemap: Specifies the full URL of your website’s XML sitemap. Multiple Sitemap entries may exist (both Allow and Sitemap appear in the second example further below).

And a set of such records forms a group. “User-agent” will always be at the beginning of the group.

For example – 


User-agent: *
Disallow: /support
Disallow: /CGI-bin
Disallow: /images

The above lines –

  • are part of a single group, and User-agent is always used to start the group.
  • tell all web crawlers not to access the support, CGI-bin, and images folders of our web server.
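
The Allow and Sitemap fields work the same way. Here is a rough sketch of a group that uses them – the paths mirror WordPress’s own default rules, and the sitemap URL is just a placeholder you would swap for your real one:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://example.com/sitemap.xml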

I will release a more detailed guide on these directives later.

Where should I put my robots.txt file?

The robots.txt file should always be at the root of your website.

For example –

My domain is themaverickspirit.com, so the URL for the file will be – themaverickspirit.com/robots.txt.
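
If you want to confirm the file is being served from the right place, you can fetch it directly in a browser or from the command line (assuming curl is installed):

curl themaverickspirit.com/robots.txt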

Conclusion –

I hope you now understand the importance of this file for SEO.

Do you use any other method to control how search engine bots index your website?
