The Complete Guide to Robots.txt and Noindex Meta Tag In Technical SEO

The robots.txt file and the noindex meta tag are important tools for technical SEO. Together they let you tell Google and other search engines which pages they should crawl and which pages they should index, that is, display in the search results.
Most major search engines (including Google, Bing and Yahoo) recognize and honor robots.txt requests.

This guide covers the ways to use robots.txt on your website. While the file looks simple, any mistake you make in it can seriously harm your site’s performance in search, so make sure you read and understand the whole of this article before you start on technical SEO.

  • What Is Robots.txt File?
  • What Does Robots.txt File Do?
  • Where Should We Put Our Robots.txt File?
  • How To Create Robots.txt File?
  • Robots.txt Syntax Meaning
  • Proper Usage Of Wildcards
  • What Is X-Robots-Tag

What Is Robots.txt File?

Robots.txt, the file defined by the Robots Exclusion Protocol, tells web crawlers and robots such as Googlebot and Bingbot which pages or sections of your website should not be crawled.
You can do more with robots.txt than the basic standard outlines, such as using wildcards, sitemap links, and the “Allow” directive. All major search engines support these extensions.
Robots.txt is also known as the gatekeeper for your entire site: a crawler’s first objective is to find and read the robots.txt file before accessing your sitemap or any pages or folders.

What Does Robots.txt File Do?

Search engines index the web by crawling pages, following links to go from site A to site B to site C and so on. Before a search engine spiders any page on a domain it hasn’t encountered before, it opens that domain’s robots.txt file, which tells the search engine which URLs on that site it is allowed to crawl.

Search engines typically cache the contents of robots.txt but refresh that cache regularly (Google generally does so within about a day), so changes are reflected fairly quickly.

Where Should We Put Our Robots.txt File?

The robots.txt file should always be at the root of your domain. So if your domain is www.example.com, it should be found at https://www.example.com/robots.txt.
It’s also very important that your robots.txt file is actually called robots.txt, all in lowercase. The name is case sensitive.
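
As a small illustration of the root-location rule, the following Python sketch derives the robots.txt address for any page URL. The function name robots_txt_url is made up for this example, and it assumes the input is an absolute URL with a scheme and a host.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.example.com/blog/post?page=2"))
```

Each host gets its own file, so a page on blog.example.com would map to https://blog.example.com/robots.txt rather than to the file on www.example.com.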

How To Create Robots.txt File?

Create a plain-text file in an editor such as Notepad, name it robots.txt, and add the directives you need. For a WordPress site, a default robots.txt file would look like this:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: http://www.example.com/sitemap.xml.gz
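
To see how a crawler might read these rules, here is a minimal sketch using urllib.robotparser from Python’s standard library. The two test URLs are hypothetical. One caveat: Python’s parser applies rules in file order rather than picking the most specific match the way Google does, so treat it as a rough stand-in for a real search engine bot, not as a reference implementation.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: http://www.example.com/sitemap.xml.gz
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A path under /wp-admin/ is blocked for every user agent.
admin_ok = parser.can_fetch("Googlebot", "http://www.example.com/wp-admin/options.php")
# A path that no rule mentions stays crawlable.
blog_ok = parser.can_fetch("Googlebot", "http://www.example.com/blog/hello-world/")
# site_maps() needs Python 3.8 or newer.
sitemaps = parser.site_maps()

print(admin_ok, blog_ok, sitemaps)
```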

Robots.txt Syntax Meaning

  • User-agent: Specifies the crawl bot that the rules which follow apply to, e.g. Googlebot or Bingbot. Use * to address all bots.
  • Disallow: Specifies that a URL and all other URLs under it should be blocked.
  • Allow: Tells a bot that a page can be crawled even if its parent page is disallowed. Googlebot and the other major search engine bots follow it.
  • Sitemap: Specifies the location of your website’s sitemap.

Proper Usage Of Wildcards

In robots.txt, a wildcard, represented by the (*) symbol, stands for any sequence of characters.

A directive for all types of crawl bots:

User-agent: *

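The wildcard can also be used to disallow all URLs under a parent page while leaving the parent page itself (for example /authors) crawlable. Paths in robots.txt are case sensitive, so write them exactly as they appear in your URLs.

User-agent: *
Disallow: /authors/*
Disallow: /categories/*
Disallow: /images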

The first two rules block every URL under the main authors page and the main categories page, but not those two pages themselves.
The last rule tells all spiders NOT to crawl your images folder.
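
The wildcard behavior described here can be sketched in Python. This is a simplified matcher, not any search engine’s actual implementation: it handles only the * wildcard, and it assumes the precedence Google documents (the longest matching pattern wins, and Allow wins a tie). The names pattern_to_regex and is_allowed are made up for this example.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Turn a robots.txt path pattern into a regex; * matches any characters."""
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


def is_allowed(path: str, rules: list) -> bool:
    """rules is a list of (directive, pattern) pairs for one user agent."""
    best = None  # (pattern length, is_allow) of the strongest match so far
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern restricts nothing
        # re.match anchors at the start, so patterns behave as URL prefixes.
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


rules = [
    ("Disallow", "/authors/*"),
    ("Disallow", "/categories/*"),
    ("Disallow", "/images"),
]
for path in ("/authors", "/authors/jane", "/images/logo.png", "/blog/"):
    print(path, is_allowed(path, rules))
```

With these rules, /authors itself stays crawlable while /authors/jane and everything under /images are blocked.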

A good example of a robots.txt file would look like this:

User-agent: *
Disallow: /testing-page/
Disallow: /account/
Disallow: /checkout/
Disallow: /cart/
Disallow: /products/page/*
Disallow: /wp/wp-admin/
Allow: /wp/wp-admin/admin-ajax.php
Sitemap: https://yourdomainhere.com/sitemap.xml

After editing your robots.txt file, upload it to the top-level directory of your website so that a bot arriving to crawl your site finds the robots.txt file first.

Read more here: https://www.portent.com/blog/seo/robots-txt-mistakes-best-uses-guide.htm

What Is X-Robots-Tag

The X-Robots-Tag allows you to control the indexing of a page through the HTTP response header of the page. It is similar to the meta robots tag, but it also allows you to tell search engines not to show specific file types in the search results, such as images and other media files.
To do this, you need access to your website’s .php files, .htaccess file, or server configuration file.
If you plan on programmatically blocking a particular page that is written in HTML, then using the meta tag (<meta name="robots" content="noindex"> in the page’s head) should suffice. But if you plan on keeping, let’s say, an image out of the search results, then you can use the HTTP response header approach to do this in code.

Imagine you run a website which also has some .doc files, but you don’t want search engines to index that file type for a particular reason. On Apache servers (with mod_headers enabled), you can add the following lines to the server configuration or a .htaccess file:

<FilesMatch "\.doc$">
Header set X-Robots-Tag "noindex, noarchive, nosnippet"
</FilesMatch>

Or, if you want to do this for both .doc and .pdf files:

<FilesMatch "\.(doc|pdf)$">
Header set X-Robots-Tag "noindex, noarchive, nosnippet"
</FilesMatch>
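
The following Python sketch mirrors what the FilesMatch rule does: it decides, from the requested path alone, whether a response should carry the X-Robots-Tag header. The function name x_robots_tag_for and the sample paths are hypothetical, and a real site would normally set the header in its server or framework rather than in a helper like this.

```python
import re
from typing import Optional

# Same idea as the FilesMatch pattern: paths ending in .doc or .pdf.
BLOCKED_FILE_TYPES = re.compile(r"\.(doc|pdf)$")


def x_robots_tag_for(path: str) -> Optional[str]:
    """Return the X-Robots-Tag value to send for path, or None for no header."""
    if BLOCKED_FILE_TYPES.search(path):
        return "noindex, noarchive, nosnippet"
    return None


for path in ("/files/report.pdf", "/files/mydoc", "/about.html"):
    print(path, x_robots_tag_for(path))
```

The backslash before the dot matters: without it, the dot would match any character, and the pattern would also catch a path such as /files/mydoc.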

If you want to prevent search engines from showing files you’ve generated with PHP, you could add the following at the top of the header.php file, before any output is sent:

header("X-Robots-Tag: noindex", true);

This would not prevent search engines from following the links on those pages. If you want to do that, then alter the previous example as follows:

header("X-Robots-Tag: noindex, nofollow", true);
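
To check what a page actually sends, you could inspect its response headers. The sketch below works on a plain dictionary of headers (how you fetch them is up to you). The helper names parse_x_robots_tag and is_indexable are hypothetical, and only the noindex directive is considered.

```python
def parse_x_robots_tag(value: str) -> set:
    """Split a header value such as 'noindex, nofollow' into directives."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


def is_indexable(headers: dict) -> bool:
    """Return False when the response headers carry a noindex directive."""
    # Header names are case-insensitive, so compare them in lowercase.
    value = next(
        (v for k, v in headers.items() if k.lower() == "x-robots-tag"), ""
    )
    return "noindex" not in parse_x_robots_tag(value)


print(is_indexable({"X-Robots-Tag": "noindex, nofollow"}))
print(is_indexable({"Content-Type": "text/html"}))
```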
