Robots.txt - Optimize for Search Engine Bots and Improve Your SEO

Today I’m going to show you how to optimize the WordPress robots.txt file for SEO. Let’s say there are certain files, pages or folders that you don’t want search bots to crawl.

Why so?

They may contain personal information, or random low-quality content that could drag down how search engines rate your website.

If you agree with me here, you will definitely want to control the way search engine bots crawl your website!

Well, an optimized robots.txt file, placed on your website’s server, does that for you. The file tells search bots what to crawl on a website and what not to.

So, if you want Google & other search engine bots or spiders to index your website’s blog posts and pages properly, then make sure you’ve read everything written in this guide.

I am sharing my own experience from experimenting on TheMaverickSpirit 🙂

Let’s get started –

What is Robots.txt File for Search Engines?

Robots.txt is a plain text file (encoded in UTF-8) placed in the root folder of a website on the web server. This file is read by search engine spiders, also known as bots.
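
Because the file always sits in the root folder, its address can be worked out from any page on the same site. Here is a minimal Python sketch of that idea. The helper name `robots_url` is made up for this illustration, and example.com is just a placeholder domain.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (with port, if any); drop path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/blog/my-post/?ref=home"))
# prints https://example.com/robots.txt
```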

It uses the Robots Exclusion Protocol, a standard for telling crawlers which parts of your site they may and may not crawl.

The robots exclusion standard, or robots exclusion protocol or simply robots.txt, is a standard used by websites to communicate with web crawlers and other web robots.
The standard specifies how to inform the web robot about which areas of the website should not be processed or scanned.
– By Wikipedia

What is a Web Crawler or a Spider? A crawler is a service or agent that crawls websites. Generally speaking, a crawler automatically and recursively accesses known URLs of a host that exposes content which can be accessed with standard web-browsers.

As new URLs are found (through various means, such as from links on existing, crawled pages or from Sitemap files), these are also crawled in the same way.
– By Google Webmasters

Why is the Robots.txt File Important?

Robots.txt plays an important role in a website’s SEO performance.

For example –

What if –

You have the same post that appears on your blog page, category page and tag page?

– Search engines may treat this as duplicate content on your website. Right!

– I know that showing the post in all these places enhances the user experience and simplifies website navigation.

But,

Googlebot, Yandex, and any other search bot will crawl everything they can reach if they cannot find a robots.txt file!

To avoid this,

I will tell Google not to crawl category and tag pages by writing a few simple lines in robots.txt.
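
As a rough sketch of what such rules could look like, the Python snippet below feeds two Disallow lines to the standard library’s `urllib.robotparser` and asks which URLs a crawler may fetch. The paths `/category/` and `/tag/` are an assumption based on WordPress’s default archive URLs, and example.com is a placeholder, so adjust both to match your own site.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules: WordPress's default archive bases are /category/ and /tag/.
# Change the paths if your permalink settings use different bases.
RULES = """\
User-agent: *
Disallow: /category/
Disallow: /tag/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# example.com is a placeholder domain.
for path in ("/my-first-post/", "/category/seo/", "/tag/robots/"):
    allowed = parser.can_fetch("Googlebot", "https://example.com" + path)
    print(path, "->", "crawl" if allowed else "skip")
```

The post itself stays crawlable while its category and tag listings are skipped.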

Do I really need a Robots.txt file for WordPress?

It is not mandatory: Google will still crawl and index your website even in the absence of robots.txt.

But search engine bots look for this file and read its instructions before crawling your website.

Hence, if you don’t include one, they will crawl everything on your website; with one, you can control which parts of your site they crawl.
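
The difference is easy to see with Python’s built-in `urllib.robotparser`. In this small sketch an empty rule set stands in for a site that gives crawlers no instructions. The `/private/` path and the example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# An empty rule set stands in for a site with no robots.txt instructions.
no_rules = RobotFileParser()
no_rules.parse([])

# One rule is enough to change the answer for a path.
with_rules = RobotFileParser()
with_rules.parse(["User-agent: *", "Disallow: /private/"])

url = "https://example.com/private/notes.html"  # placeholder URL
print(no_rules.can_fetch("Googlebot", url))    # True: nothing says otherwise
print(with_rules.can_fetch("Googlebot", url))  # False: the rule blocks it
```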

How To Make Robots.txt File?

The ‘txt’ extension says it is just a text file. You can use any plain text editor, such as Notepad or vi on Linux, to create and open the file. If you use a word processor, make sure you save the file as plain text.

Robots.txt is a collection of records (or lines) and groups. Each record is a combination of a field, a colon, and a value.

The general format of a record is –

Field: Value
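
To make the record format concrete, here is a small Python sketch that splits one line into its field and value. `parse_record` is a hypothetical helper written for this post, not part of any library. It lower-cases the field because field names are not case-sensitive, and it ignores anything after a `#`, which marks a comment in robots.txt.

```python
def parse_record(line: str):
    """Split one robots.txt line into (field, value); None if it is not a record."""
    # Anything after '#' is a comment.
    line = line.split("#", 1)[0].strip()
    if ":" not in line:
        return None
    # Split on the first colon only, so URLs in the value stay intact.
    field, value = line.split(":", 1)
    return field.strip().lower(), value.strip()


print(parse_record("Disallow: /images"))
# prints ('disallow', '/images')
print(parse_record("Sitemap: https://example.com/sitemap.xml"))
# prints ('sitemap', 'https://example.com/sitemap.xml')
```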

The valid fields you can use in robots.txt are –

1. User-agent

It is used to identify the specific web crawler, or set of web crawlers, that the rules apply to.

2. Allow

It is a directive which is used to specify paths that we want crawlers to access.

3. Disallow

It is a directive which is used to specify paths that we don’t want crawlers to access.

4. Sitemap

It is used to specify the URL of your website’s sitemap. Multiple sitemap entries may exist.
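
Since multiple Sitemap entries are allowed, here is a short Python sketch that reads them back with the standard library’s `urllib.robotparser`. The two sitemap URLs are placeholders on example.com, and `site_maps()` needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# Placeholder sitemap URLs on example.com; a real site would list its own.
RULES = """\
User-agent: *
Disallow: /images

Sitemap: https://example.com/post-sitemap.xml
Sitemap: https://example.com/page-sitemap.xml
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# site_maps() returns a list of the Sitemap values, or None when no
# Sitemap lines are present.
print(parser.site_maps())
```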

A set of lines forms a group. The “user-agent” line always comes at the beginning of the group.

For example –

User-agent: *
Disallow: /support
Disallow: /CGI-bin
Disallow: /images

The above lines –

– are part of a single group, and user-agent is always used to start a group.
– explain that we don’t want any web crawler to access the support, CGI-bin and images folders of our web server.
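
If you want to check how a crawler would read this group, one option is Python’s built-in `urllib.robotparser`. The sketch below is only an illustration: the crawler name `AnyBot` and the test paths on example.com are made up.

```python
from urllib.robotparser import RobotFileParser

GROUP = """\
User-agent: *
Disallow: /support
Disallow: /CGI-bin
Disallow: /images
"""

parser = RobotFileParser()
parser.parse(GROUP.splitlines())

# Placeholder paths on example.com, checked for an arbitrary crawler name.
paths = ("/support/faq.html", "/images/logo.png", "/CGI-bin/form",
         "/cgi-bin/form", "/blog/")
for path in paths:
    allowed = parser.can_fetch("AnyBot", "https://example.com" + path)
    print(f"{path:20} {'allowed' if allowed else 'blocked'}")
```

Note that paths are matched with exact letter case, so `/cgi-bin/form` comes back as allowed while `/CGI-bin/form` is blocked. Write the folder name exactly as it appears in your URLs.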

I will publish a brief guide later.