
Robots.txt: what it is and how to create one well optimized for SEO


Frequent visits from search engine crawlers to your website are a good sign. However, the way they index your site can cause some problems.

When a Google robot, for example, starts analyzing your website, it does not know which pages you want to rank, which ones you don't, which parts you want to keep hidden, and so on.

You need to tell it how to treat the different parts of your website through indications called meta tags; and for that you need something that speaks the robots' own language, which is exactly what the robots.txt file does.

What is the robots.txt file?

A robots.txt file is a set of instructions for bots that serves to control the activity of crawlers inside your website.

Think of a robots.txt file as a code of conduct posted on the wall of a gym, a bar, or a community center: the sign itself has no power to enforce the rules it lists, but good users will follow them, while troublemakers are likely to break them and get thrown out.

A bot is an automated piece of software that interacts with websites and applications. There are good bots and bad bots, so to speak, and one kind of good bot is the web crawler.

These bots analyze webpages and index their content so that it can appear in search engine results.

robots.txt helps manage the activity of these web crawlers so that they do not overload the server hosting the site and do not index pages that are not meant to be visited.

How does a robots.txt file work?

A robots.txt file is just a text file, with no HTML markup (hence the .txt extension).

The robots.txt file is hosted on the web server, like any other element of the site.

In fact, you can see the robots.txt of any page by typing the full URL of the domain and then adding /robots.txt.

For example: https://www.amazon.com/robots.txt

The file is not linked from anywhere else on the site, so users are unlikely to find it, but most web crawlers will look for this file first, before crawling the rest of the site's content.

Although a robots.txt file provides instructions for robots, it cannot actually enforce those instructions.

It is important to keep in mind that every subdomain needs its own robots.txt file.

For example, while www.cloudflare.com has its own file, all of its subdomains (blog.cloudflare.com, community.cloudflare.com, etc.) also have their own.

Why should you have a robots.txt file?

Robots.txt files control crawler access to certain areas of a site; this can be dangerous if, by accident, you stop Googlebot from crawling the whole site.

There are some situations in which a robots.txt file can be very useful, for example:

  1. To prevent duplicate content from appearing in the SERPs (keep in mind that meta robots tags are often a better option for this).
  2. To keep entire sections of a website private (for example, your engineering team's staging site).
  3. To prevent internal search results pages from appearing in the SERPs.
  4. To specify the location of sitemaps.
  5. To prevent search engines from indexing certain files on your website (images, PDF files, etc.).
  6. To specify a crawl delay so your servers are not overloaded when crawlers load multiple pieces of content at the same time.

If there are no areas of your website where you want to control access, you may very well not need a robots.txt file.

Parameters you must know

Before starting, you should know that there is a series of commands you need in order to create your own robots.txt. They are the following:

  1. User-agent: Specifies which robot is affected by all the directives placed below it.
  2. Disallow: Tells the bot that the content is blocked and we do not want it crawled.
  3. Allow: Permits crawling, and is used to make exceptions to the previous rule.
  4. Sitemap: Tells the bot where the sitemap of your page is located.
  5. Crawl-delay: Sets a delay of a few seconds between each page scanned.
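A minimal sketch putting these directives together (the paths and the sitemap URL are illustrative placeholders, not values from this guide):

  # Rules for every crawler
  User-agent: *
  # Block this directory...
  Disallow: /private/
  # ...but allow one page inside it as an exception
  Allow: /private/public-page.html
  # Wait 10 seconds between requests (some crawlers, such as Googlebot, ignore this directive)
  Crawl-delay: 10

  # Location of the sitemap
  Sitemap: https://www.example.com/sitemap.xml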

To create a robots.txt file, use Notepad, Notepad++, or any plain text editor. You do not need anything complex. You can also create or edit it through your SEO plugin, such as Yoast or All in One SEO Pack.

How to create a robots.txt optimized for SEO (for WordPress)

Now that you know what a robots.txt file is and how important it is, we are going to create a robots.txt optimized for SEO, specifically for the WordPress CMS, which is the one almost all of us use.

We will not cover everything about robots.txt in depth here, because that would take a lot of space and time. We will simply focus on writing a guide on how to create one optimized for SEO.

Once you have your text editor open, let's begin.

#1 Things to block in your robots.txt

The first thing is to state which robot we want to give instructions to, and what those instructions are.

It would look like this:
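  # Applies to every crawler
  User-agent: *
  # Blocks the entire site
  Disallow: /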

Since the directives are aimed at every agent, * is used. If they were aimed at one specific robot, its name would be used instead, for example, Googlebot.

Below it, the Disallow directive is followed by a /, which refers to the entire site.

There are some files and directories on a WordPress site that should be blocked from the start. The directories that should be denied access in the robots.txt file are the cgi-bin directory and the standard WP directories.

The directives used for the above are these:
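A sketch for a default WordPress installation (the exact directory list is an assumption based on a standard WP setup; adjust it to your own installation):

  User-agent: *
  # The cgi-bin directory
  Disallow: /cgi-bin/
  # The standard WordPress directories
  Disallow: /wp-admin/
  Disallow: /wp-includes/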

#2 Blocking according to your WordPress configuration

You must know how your WordPress installation uses tags or categories to structure the content.

If you are using categories, you must block the tag archives from the search engines, and vice versa.

First of all, check the base; to do so, go to the Administration Panel > Settings > Permalinks.

By default, the base is "tag" if the field is left blank. You must disable tags in robots.txt as indicated below:

  • Disallow: /tag/

If you are not using categories, you must block them in robots.txt as indicated below:

  • Disallow: /category/

If you are using both categories and tags, then you do not have to do anything in the robots.txt file.

If you use neither tags nor categories, block both in robots.txt as indicated below:

  • Disallow: /category/
  • Disallow: /tag/

#3 Blocking WordPress files

WordPress uses different files to display the content, and these do not need to be available to the search engines.

So you should also block them. The different files used to display content are PHP files, INC files, JS files, CSS files, etc.

You should block them in robots.txt as indicated below:
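A sketch of those rules using wildcard patterns (the patterns are illustrative; read the warning below before applying them):

  User-agent: *
  # File types used to build the pages
  Disallow: /*.php$
  Disallow: /*.inc$
  Disallow: /*.js$
  Disallow: /*.css$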

Be very careful when blocking JavaScript, CSS, or PHP files, because Google's robot might then be unable to read your page correctly, and that would hurt you.

#4 Blocking spam and others

By now you have created a robots.txt that is good for SEO, in which you have blocked everything that serves no purpose and left only articles, pages, categories, and images available to the search engines.

But the robots.txt file is used for much more than that; it also protects you from bots that merely crawl your content, steal your strategy or architecture, and use your resources without contributing anything at all.

Take a look at the complete list of those bots here.
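A minimal sketch of how one of those bots is blocked (MJ12bot is just an illustrative user agent; pick the real names from the list above):

  # Deny the entire site to a specific unwanted bot
  User-agent: MJ12bot
  Disallow: /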

#5 Test your robots.txt

Once you have gathered all of the above into a single file and added your own rules, the time comes to test the file. You don't want to end up blocking what you shouldn't.

The first thing you must do is register your site in Google Search Console. Once you have followed all the steps (if you have doubts, visit this link), use the robots.txt tester.

To do this, upload the robots.txt file you have created and run it as if you were Google.

There you will be able to see any possible errors; then, if everything is fine, upload it to your website.

You must keep this file up to date, adding, for example, changes that happen in WordPress, new sites that spam you in your comments or your analytics, changes in Google's own directives, etc.

Example of robots.txt

To finish, and although you have already learned how to make a robots.txt file yourself, we leave you a ready-made example, so that you only have to replace certain data with your own.

Analyze carefully what this file does, because its performance differs according to the characteristics of your server and your webpage.

Here you have the example robots.txt, ready to use.
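The following sketch assembles the rules from this guide (the sitemap URL, the tag/category choice, and the blocked bot are placeholders to replace with your own data):

  User-agent: *
  # Standard WordPress directories (step #1)
  Disallow: /cgi-bin/
  Disallow: /wp-admin/
  Disallow: /wp-includes/
  # Tag archives, assuming you use categories (step #2)
  Disallow: /tag/

  # An unwanted bot (step #4)
  User-agent: MJ12bot
  Disallow: /

  # Location of your sitemap
  Sitemap: https://www.example.com/sitemap.xml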

Remember that having a robots.txt file on your website will help the crawling robots interpret the different parts of the site better and concentrate on what really matters.


2 Comments

  1. Hello Edu, thank you very much for the post. I have a question about robots.txt: I heard something on YouTube along the lines of it being better not to have one at all, because Google will crawl the whole site anyway, and not giving it permissions in robots.txt could be harmful, since it would be like denying Google's bot entry and would produce bad results or rankings for the site. How true is that for this year 2020? Thank you very much.

    • Edu Coromina

      Hello Leonardo!

      Certainly the robots.txt file must be handled with care, and I especially advise using it on medium/large ecommerce sites to block autogenerated filters or URLs. It is a delicate file, and sometimes people use a generic robots.txt when it should always be customized to the needs of each site.

      Another important factor is not to block with robots.txt any URLs or sections that nevertheless appear in sitemaps, as well as to understand the loss of link juice from URLs that point to URLs blocked by this file.

      It also serves as a way to block the bots of tools like Screaming Frog or Netpeak Spider, keeping your site away from prying eyes.

      If the directives are set up properly, the robots.txt file does no harm.

      Cheers!
