Block Dynamic URLs From Googlebot Using Your Robots.txt File

I’ve been looking into how to block some dynamic URLs from Googlebot. The search bots for Yahoo! (Slurp) and MSN (MSNBot) use the same or very similar syntax to block dynamic URLs. As an example, I have one line in my .htaccess file that lets me serve static-looking pages instead of dynamic pages, but I found that Googlebot will sometimes still crawl my dynamic pages. This can lead to duplicate content, which none of the search engines condone.

I’m trying to clean up my personals site because it currently ranks well with Yahoo but not Google. I believe MSN Live has similar algorithms to Google, but this is not scientifically proven in any way. I only state this from my own personal experience with SEO and my client’s sites. I believe I’ve found some answers on ranking well with Google, MSN and possibly Yahoo, and I’m in the middle of testing right now. I’ve already managed to rank a client’s site well on Google for relevant keywords. Anyway, here is how to block the dynamic pages from Google using your robots.txt file. The following is an extract of my .htaccess file:

RewriteRule personals-dating-(.*)\.html$ /index.php?page=view_profile&id=$1

This rule, in case you’re wondering, lets me serve static-looking pages such as personals-dating-4525.html from the dynamic link index.php?page=view_profile&id=4525. However, this has caused problems, because Googlebot can now reach the same profile at both URLs and has “charged” me with duplicate content. Duplicate content is frowned upon: it creates more work for Googlebot, which now has to crawl extra pages, and it may be viewed as spammy by the algorithm. The moral is that duplicate content should be avoided at all costs.
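To make the mapping concrete, here is a minimal Python sketch that imitates what the rewrite rule does to a requested path. It is only an illustration, not how Apache itself is implemented. The function name rewrite is made up for this example, and it assumes the query parameters are named page and id.

```python
import re

# Same pattern as the RewriteRule, with the dot before "html" escaped.
STATIC_PATTERN = re.compile(r"personals-dating-(.*)\.html$")


def rewrite(path):
    """Return the dynamic URL a static path maps to, or None if the rule does not apply."""
    # RewriteRule patterns are unanchored unless they start with ^, so use search().
    match = STATIC_PATTERN.search(path)
    if match is None:
        return None
    # "page" and "id" are assumed parameter names for this sketch.
    return "/index.php?page=view_profile&id=" + match.group(1)


print(rewrite("personals-dating-4525.html"))
print(rewrite("about.html"))
```

The first call returns the dynamic URL for profile 4525, and the second returns None because the rule does not apply. Both the static and the dynamic address end up serving the same profile, which is exactly where the duplicate content comes from.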

What follows is an extract of my robots.txt file:

User-agent: Googlebot
Disallow: /index.php?page=view_profile&id=*

Notice the “*” (asterisk) sign at the end of the second line. This tells Googlebot to match any number of characters in the asterisk’s place. For example, Googlebot will ignore index.php?page=view_profile&id=4525 or any other number or set of characters. In other words, these dynamic pages will not be crawled, so their content should not end up indexed. Because robots.txt rules match from the start of the URL path, the rule would also work without the trailing asterisk, but it does no harm.

You can test whether the rules in your robots.txt file will work correctly by logging into your Google webmaster tools control panel. If you don’t have a Google account, you simply need to create one through Gmail, AdWords or AdSense, and you will then have access to the Google webmaster tools and control panel. If you want to achieve higher rankings, you should have one. They make it pretty simple to set up an account, and it’s free. Click the “Diagnostics” tab and then the “robots.txt analysis tool” link under the Tools section in the left column.

By the way, your robots.txt file should be in your webroot folder. Googlebot checks your site’s robots.txt file about once a day, and the copy shown in your Google webmaster tools control panel under the “robots.txt analysis tool” section is updated accordingly.

To test your robots.txt file and validate that your rules will work correctly with Googlebot, simply type the URL that you want to test into the field “Test URLs against this robots.txt file”. I entered one of my dynamic profile URLs in this field.

Then I clicked the “Check” button at the bottom of the page. The tool reported that Googlebot will be blocked from this URL under the given rules. I believe this is a better way to block Googlebot than the “URL Removal” tool, which you could also use. The “URL Removal” tool is in the left column of your Google webmaster tools control panel. I’ve read in a few cases in the Google Groups that people have had problems with the “URL Removal” tool.
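If you want a rough local check before using Google’s tool, the wildcard matching can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Google’s actual implementation: it handles only Disallow rules with “*” and an optional trailing “$”, it ignores Allow rules and rule precedence, and the names rule_to_regex and is_blocked are made up for this example. It also assumes the query parameters are named page and id.

```python
import re


def rule_to_regex(rule_path):
    """Translate a robots.txt path with '*' wildcards and an optional trailing '$' into a regex."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape each literal piece, then let every '*' stand for any run of characters.
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(url_path, disallow_rules):
    """Return True if any non-empty Disallow rule matches from the start of url_path."""
    return any(rule_to_regex(rule).match(url_path) for rule in disallow_rules if rule)


# Assumed rule, with hypothetical parameter names "page" and "id".
RULES = ["/index.php?page=view_profile&id=*"]

print(is_blocked("/index.php?page=view_profile&id=4525", RULES))
print(is_blocked("/personals-dating-4525.html", RULES))
```

The dynamic URL is reported as blocked while the static page is not, which is the outcome you want: Googlebot keeps crawling the static pages and skips their dynamic twins. Treat Google’s own tool as the final word, since it knows details this sketch leaves out.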
