Full Version: Help on 'robots.txt'
mhormann
Hi everybody.

I've seen a lot of 'bad' (i.e. non-working) 'robots.txt' files lately, so I want to give a few tips.

'robots.txt' is a file that 'well-behaved' (not all!) spiders and search engines respect and check for. It is used to specify what you DON'T want the spiders to see.

Here are some guidelines:

1. The filename MUST be 'robots.txt'.
EXACTLY that. All lowercase, plural (i.e., NOT 'robot.txt' or 'Robots.txt').

2. The file SHOULD have 'Unix-type' (i.e. linefeed, "\n") line endings.
NOT Mac (CR), NOT DOS (CRLF). The standard tolerates those too, but not every spider does. If you work on either of these systems, use an editor that can save in 'UNIX mode'.
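
If you'd rather check this yourself than trust your editor, here is a minimal Python sketch. The function names are made up for illustration; it only looks at the raw bytes of the file and converts everything to plain linefeeds.

```python
def detect_line_endings(data: bytes) -> set:
    """Return the set of line-ending styles found in the raw file bytes."""
    found = set()
    if b"\r\n" in data:
        found.add("CRLF")
    # Remove CRLF pairs first so a lone CR or LF is not counted twice.
    rest = data.replace(b"\r\n", b"")
    if b"\r" in rest:
        found.add("CR")
    if b"\n" in rest:
        found.add("LF")
    return found


def to_unix(data: bytes) -> bytes:
    """Convert DOS (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# A file saved in DOS mode (sample content, not a real site's file).
dos_file = b"User-agent: *\r\nDisallow: /admin/\r\n"
print(detect_line_endings(dos_file))
print(to_unix(dos_file))
```

To use it on a real file, read it in binary mode (open("robots.txt", "rb")) so Python does not translate the line endings before you see them.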

3. The file MUST be in your web root.
If you put it in, say, '/catalog/', no bot will ever see or check it. ALWAYS put it in your web root, i.e. "http://www.mydomain.com/robots.txt".
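
To make the 'web root' idea concrete, here is a small Python sketch (the function name robots_url is just for illustration). Whatever page a spider starts from, it keeps only the scheme and host name and asks for '/robots.txt' there.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the one URL where a spider will look for robots.txt."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.mydomain.com/catalog/index.php?cPath=1"))
print(robots_url("http://shop.mydomain.com/"))
```

Note that the host name is part of the result, so a shop on a subdomain gets its own 'robots.txt' in the root of that subdomain.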

4. Comment lines.
You CAN have comment lines. They start with a '#' in column ONE. Be careful NOT to separate too much using empty lines (see rule 9)!
CODE
  # BAD: Not starting in column 1.

CODE
# GOOD: Start in column 1.


In theory, it IS allowed to put comments on a 'Disallow' or 'User-agent' line like 'User-agent: Googlebot #this is Google'. DON'T DO IT! It is bad practice, and some spiders will misinterpret it and spider what they weren't supposed to.
CODE
# BAD: Comments on the same line.
User-agent: Googlebot # Google

CODE
# GOOD: Comments on separate lines.
# Google
User-agent: Googlebot


5. White space.
In theory, it IS allowed to use white space (i.e., empty lines or indentation by blanks or tabs). DON'T DO IT! Some spiders will misinterpret it. ALWAYS start comments, 'User-agent:' and 'Disallow:' in column 1. And DON'T use tabs; use blanks instead.
CODE
# BAD: Indentation (not starting in column 1).
  User-agent: Googlebot
  Disallow:     /admin/

CODE
# GOOD: Start in column 1, use ONE blank after the ':'.
User-agent: Googlebot
Disallow: /admin/


6. User-agent: [spider's name]
Type it EXACTLY like this. It names the spider that the following 'Disallow:' lines apply to. You CAN put '*' to target ALL spiders.
CODE
# BAD: all lowercase
user-agent: *
# BAD: all uppercase
USER-AGENT: *
# BAD: no '-', 'Agent' has uppercase 'A'
User Agent: *

CODE
# GOOD: (Google)
User-agent: Googlebot
# GOOD: (ALL spiders)
User-agent: *
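
Rules 4 to 6 are easy to check mechanically. The following Python sketch is only an illustration of these tips (the lint function and its messages are my own invention, not an official validator): it flags tabs, indentation, same-line comments and anything that is not spelled exactly 'User-agent: x' or 'Disallow: x'.

```python
import re

# Field names spelled exactly as recommended, followed by at most ONE blank.
FIELD = re.compile(r"^(User-agent|Disallow): ?\S*$")


def lint(text: str) -> list:
    """Return (line_number, message) pairs for lines that break the tips."""
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        if line == "" or line.startswith("#"):
            continue  # blank lines and column-1 comments are not checked here
        if "\t" in line:
            problems.append((number, "contains a tab"))
        elif line[0] == " ":
            problems.append((number, "does not start in column 1"))
        elif "#" in line:
            problems.append((number, "comment on the same line as a rule"))
        elif not FIELD.match(line):
            problems.append((number, "not 'User-agent: x' or 'Disallow: x'"))
    return problems


sample = "  User-agent: Googlebot\nuser-agent: *\nUser-agent: Googlebot # Google\n"
for number, message in lint(sample):
    print(number, message)
```

A clean file produces an empty list. Blank lines in the wrong place (rule 9) are a separate matter and are not covered by this sketch.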


7. Disallow: [path/filename to exclude]
Use ONLY ONE path and/or filename per line, i.e. NOT "Disallow: /cgi-bin /stats"!
CODE
# BAD: Multiple paths/files on one line
Disallow: /cgi-bin /stats

CODE
# GOOD: One definition per line
Disallow: /cgi-bin
Disallow: /stats


Disallow works like an 'automatic wildcard' (without '?' or '*') by matching from the left, i.e. "Disallow: /help" would match the DIRECTORY "/help", the directory "/helpfiles", the FILE "/help.htm", the FILE "/helpfile.php" and so on.

So if you want to exclude a complete directory but NOT files with a similar name (i.e. you want to exclude the '/catalog/elmar/' directory but NOT '/catalog/elmar_start.php'), it is good practice to write it like "Disallow: /catalog/elmar/" (with the trailing '/').
CODE
# GOOD: Disallow directory 'elmar' but not 'elmar_start.php'
Disallow: /catalog/elmar/
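
The 'matching from the left' behaviour described above is nothing more than a prefix test. Here is a minimal Python sketch of it; the function name is_disallowed is made up, and real spiders may add their own refinements.

```python
def is_disallowed(url_path: str, disallow_rules: list) -> bool:
    """True if any non-empty Disallow value is a prefix of the URL path."""
    return any(rule and url_path.startswith(rule) for rule in disallow_rules)


# "Disallow: /help" catches the directory AND similarly named files.
for path in ["/help/", "/helpfiles/", "/help.htm", "/helpfile.php", "/index.php"]:
    print(path, is_disallowed(path, ["/help"]))

# With the trailing '/', only the directory is caught.
print(is_disallowed("/catalog/elmar/index.php", ["/catalog/elmar/"]))
print(is_disallowed("/catalog/elmar_start.php", ["/catalog/elmar/"]))
```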


8. There is NO 'Allow:' in the original standard!
Some spiders understand 'Allow:' as an extension, but don't count on it. If you want to allow anything, you must disallow the rest and put an empty 'Disallow:' at the end!
CODE
# BAD: (intended: disallow 'Jane' but not 'John')
Disallow: /Jane
Allow: /John

CODE
# GOOD: (disallow 'Jane', allow all the rest)
Disallow: /Jane
Disallow:


9. NEVER have a BLANK LINE BETWEEN 'User-agent:' and its corresponding 'Disallow:' lines!
Some spiders will misinterpret this as permission to spider your whole site. You CAN have comment lines in between.
CODE
# BAD: Blank line between 'User-agent:' and 'Disallow:'
# This should exclude Google
User-agent: Googlebot

# And here we say which to exclude
Disallow: /
# Result: Some spiders will instead assume they're ALLOWED your whole site!

CODE
# GOOD: NO blank lines between 'User-agent:' and corresponding 'Disallow:'
# This should exclude Google
User-agent: Googlebot
# And here we say which to exclude
Disallow: /
# Result: Google will be kept from spidering your whole site.
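
A stray blank line is easy to overlook, so here is a rough Python sketch that hunts for exactly this mistake. The function name blank_line_breaks is hypothetical, and the check assumes the file already follows the column-1 rules above.

```python
def blank_line_breaks(text: str) -> list:
    """Return the numbers of blank lines that come after a 'User-agent:'
    line (comments ignored) before any 'Disallow:' line has been seen."""
    flagged = []
    waiting_for_disallow = False
    for number, line in enumerate(text.split("\n"), start=1):
        if line.startswith("#"):
            continue  # comment lines may sit between the two
        if line.startswith("User-agent:"):
            waiting_for_disallow = True
        elif line.startswith("Disallow:"):
            waiting_for_disallow = False
        elif line.strip() == "" and waiting_for_disallow:
            flagged.append(number)
            waiting_for_disallow = False
    return flagged


bad = "User-agent: Googlebot\n\n# And here we say which to exclude\nDisallow: /\n"
good = "User-agent: Googlebot\n# And here we say which to exclude\nDisallow: /\n"
print(blank_line_breaks(bad))
print(blank_line_breaks(good))
```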


10. Always go from 'more specific' to 'less specific'!
Start with the most specific rules, then go to the least specific. This means the part for 'User-agent: *' should come LAST in your 'robots.txt'! The reason: if a spider sees 'User-agent: *' FIRST, it might stop scanning since it's one of 'all spiders', so it won't bother to look through the rest of your file to see whether it's specifically addressed elsewhere!
CODE
# BAD: Spider might not honor this
# Allow everything to all other spiders
User-agent: *
Disallow:

# Disallow Google
User-agent: Googlebot
Disallow: /

CODE
# GOOD: First do the specifics, then the 'rest of them'
# Disallow Google
User-agent: Googlebot
Disallow: /

# Allow everything to all other spiders
User-agent: *
Disallow:
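
If you have Python at hand, its standard library ships a robots.txt parser (urllib.robotparser) that you can use to try out a file before uploading it. The sketch below feeds it the GOOD example; 'SomeOtherBot' and the domain are placeholders. Keep in mind this shows how ONE parser reads the file. This particular parser treats the '*' record as a fallback wherever it appears, but not every spider is that careful, which is the point of this rule.

```python
from urllib.robotparser import RobotFileParser

GOOD = """\
# Disallow Google
User-agent: Googlebot
Disallow: /

# Allow everything to all other spiders
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(GOOD.splitlines())

# Googlebot is shut out, everybody else may fetch the page.
print(parser.can_fetch("Googlebot", "http://www.mydomain.com/catalog/"))
print(parser.can_fetch("SomeOtherBot", "http://www.mydomain.com/catalog/"))
```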


11. Use a 'robots.txt' validator.
Anyone can make mistakes. It's good practice to check using a validator.

Here's a good one (has some examples even):
http://www.searchengineworld.com/cgi-bin/robotcheck.cgi

And here's one that checks on even more potential problems:
http://tool.motoricerca.info/robots-checker.phtml

12. One more tip: Search engines get clever.
If you really have to run a lot of sites... Hey, comparing their 'robots.txt' files is FAST and makes it VERY easy for SEs to find if they're all the same... and so they start assuming they get tricked and rank you down... ;-)

Here's an example 'robots.txt':
CODE
# osCommerce robots.txt

# Currently disallow all shop stuff to the Google Image bot
# Mainly image hunters anyway, they eat up bandwidth...
User-agent: Googlebot-Image
Disallow: /cgi-bin/
Disallow: /usage/
Disallow: /catalog/

# ALL search engine spiders/crawlers (put at end of file)
User-agent: *
Disallow: /cgi-bin/
Disallow: /usage/
Disallow: /catalog/admin/
Disallow: /catalog/download/
Disallow: /catalog/elmar/
Disallow: /catalog/pub/
Disallow: /catalog/account.php
Disallow: /catalog/advanced_search.php
Disallow: /catalog/checkout_shipping.php
Disallow: /catalog/create_account.php
Disallow: /catalog/login.php
Disallow: /catalog/password_forgotten.php
Disallow: /catalog/popup_image.php
Disallow: /catalog/shopping_cart.php


Have fun! And happy 'spidering'...
Matthias
mhormann
Two more tips here, since this darned forum only lets me edit my post twice...

13. Always start from the 'base' path—reduce ambiguity.
Some spiders would probably do what you want when specifying things like
CODE
# BAD: Ambiguous
User-agent: *
Disallow: secret.php

Some would assume it means 'secret.php' in every directory, some would ignore it, some would only compare it to '/secret.php' ...
ALWAYS be specific and start at the web root, i.e. with '/'!
CODE
# GOOD: Always start at your web root
User-agent: *
Disallow: /secret.php
Disallow: /catalog/secret.php
Disallow: /catalog/admin/secret.php
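
Here is a small Python sketch for spotting such ambiguous entries. The function name relative_disallows is made up for this example; it simply lists every 'Disallow:' value that does not begin with '/'.

```python
def relative_disallows(text: str) -> list:
    """Return Disallow values that do not start at the web root ('/')."""
    flagged = []
    for line in text.split("\n"):
        if not line.startswith("Disallow:"):
            continue
        value = line[len("Disallow:"):].strip()
        # An empty value ("Disallow:") means "nothing disallowed"; skip it.
        if value and not value.startswith("/"):
            flagged.append(value)
    return flagged


sample = "User-agent: *\nDisallow: secret.php\nDisallow: /catalog/secret.php\nDisallow:\n"
print(relative_disallows(sample))
```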


14. Google's new 'wildcard' exclusion system
Google now allows 'wildcards' to be specified like '*.cgi'. DO NOT assume this will work with any other spider! Try to keep it as simple as possible, using rules that are easy to understand for every spider.
If you really want to target special functions for special spiders, ALWAYS target them specifically, i.e. use a separate 'User-agent:' part.
CODE
# BAD: Assuming they all understand it
User-agent: *
Disallow: *.cgi

CODE
# GOOD: If you have to... address it specifically
User-agent: Googlebot
Disallow: *.cgi

User-agent: *
Disallow: /cgi-bin/
Disallow: /secret/secret.cgi
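
For the curious, here is roughly how such a wildcard rule can be evaluated. This is a sketch under my own assumptions about the semantics ('*' matches any run of characters, a trailing '$' pins the end of the URL, matching still starts from the left), not Google's actual code, and the function name wildcard_to_regex is invented.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard Disallow value into a compiled regex.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, everything else is taken literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


rule = wildcard_to_regex("/*.cgi")
print(bool(rule.match("/cgi-bin/search.cgi")))
print(bool(rule.match("/catalog/index.php")))
```

A spider that does not know wildcards will most likely treat the '*' as an ordinary character and never match anything, which is why such rules belong in a spider-specific section.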


15. DO NOT put each and every file in 'robots.txt'!
I have seen 'robots.txt' files with more than 4000 entries, specifying each and every .html or .php program file in them. DON'T DO THAT.
It is much better to exclude complete directories or single 'critical' files.
Bots tend to turn away from overly long 'robots.txt' files and probably never come back...
TCwho
Very Good Information mhormann!

Very Much Appreciated :D
miguel_os
Thank you very much!!!!.
mhormann
You're very welcome.

Seeing people happy is actually a wonderful Christmas present for me. And working together with people all over this globe in peace...
wheeloftime
I just found this information about robots.txt and it is really helpful in setting this up for my site. There is one thing, however, which I do not completely understand, and that is where to place this robots.txt and how to refer to the different 'disallow' directories.
On my host I have an httpdocs directory, underneath which I have my old shop and the soon-to-be osC shop. When people access my domain (www.mydomain.nl) they get the index.* file from the httpdocs directory, so I assumed this was my root. When FTP'ing, however, I can go one level up and see, e.g., the httpdocs directory, the cgi-bin directory etc.
With this in mind I changed the robots.txt to:
CODE
# robots.txt for Wheel of Time
# Currently disallow all shop stuff to the Google Image bot
# Mainly image hunters anyway, they eat up bandwidth...
User-agent: Googlebot-Image
Disallow: /cgi-bin/
Disallow: /httpdocs/catalog/

# ALL search engine spiders/crawlers (put at end of file)
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /httpdocs/tmp/
Disallow: /httpdocs/stats/
Disallow: /httpdocs/plesk-stat/
Disallow: /httpdocs/media/
Disallow: /httpdocs/siteadmin/
Disallow: /httpdocs/catalog/admin/
Disallow: /httpdocs/catalog/download/
Disallow: /httpdocs/catalog/pub/
Disallow: /httpdocs/catalog/account.php
Disallow: /httpdocs/catalog/advanced_search.php
Disallow: /httpdocs/catalog/checkout_shipping.php
Disallow: /httpdocs/catalog/create_account.php
Disallow: /httpdocs/catalog/login.php
Disallow: /httpdocs/catalog/password_forgotten.php
Disallow: /httpdocs/catalog/popup_image.php
Disallow: /httpdocs/catalog/shopping_cart.php


I placed the robots.txt in /httpdocs, however, as that is the 'root' everyone gets when accessing the site, but I am not sure if this will also be the case for visiting robots. Did I do wrong by adding /httpdocs in front of everything, making this not work at all, or is it as it should be?

Thanks in advance !
gabrielk
Wheeloftime,

This is incorrect. It's from the WEB root, not the server path.

So even though you FTP your files to /httpdocs/ your *web* root is still /

Example: the administrator of these forums probably FTPs the forum files to something like:

/public_html/forums/

However, the web root is still simple:

/

Which, in robots.txt, is the same thing as: http://forums.oscommerce.com/

So you will want to upload your robots.txt file in the same directory as your index.* page, and you should treat that as your root.
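
A quick way to think about it: strip the document root from the FTP path and what is left is the path to write in robots.txt. Here is a rough Python sketch of that (the function name to_web_path is made up, and '/httpdocs' as document root is taken from the example above; your host may differ).

```python
def to_web_path(server_path: str, doc_root: str = "/httpdocs") -> str:
    """Turn an FTP/server path into the path a spider sees in the URL."""
    if server_path == doc_root or server_path.startswith(doc_root + "/"):
        # Drop the document root; an empty remainder is the web root itself.
        return server_path[len(doc_root):] or "/"
    # Outside the document root: leave it alone and check by hand.
    return server_path


print(to_web_path("/httpdocs/catalog/admin/"))
print(to_web_path("/httpdocs/catalog/login.php"))
print(to_web_path("/public_html/forums/", doc_root="/public_html"))
```

So "Disallow: /httpdocs/catalog/admin/" should simply read "Disallow: /catalog/admin/".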
wheeloftime
Gabriel,

Thanks for coming back on this rather old topic ! I figured out it had to be as you describe and that's where I have placed the file.

regards,
Howard
wheeloftime
QUOTE (PandA.nl @ Apr 5 2005, 12:32 PM)
I would not add the location of your admin (or any other non-public directory) to the robots.txt file! Showing this kind of information in your robots.txt file (which anybody can read) makes your site less safe.

Robots would only get there if there's a link to it (and obviously there shouldn't be), and if a robot finds/tries it anyway, for whatever reason, the .htaccess protection won't allow it in, so the robots.txt file does not add anything useful to that.
*


Helder.
Thanks also for this extra information !
wheeloftime
QUOTE (PandA.nl @ Apr 5 2005, 03:21 PM)
No thanks, no praise, I'd rather have a guilder :D
*


A little euro it is, then :D
But sadly that doesn't rhyme as well.
Panic36
I don't get the disallow stuff. I mean, doesn't that just allow people to read a file and know all your sensitive spots to try and exploit? I completely removed or renamed things, but I don't disallow them for search engines... IF it isn't linked anywhere, how can Google find it?

Robert
PandA.nl
Exactly,

Disallow is only meant for linked pages that you don't want indexed for whatever reason (saving bandwidth, for example). Only nice bots listen to it; others ignore it, or might even search for disallowed files and dirs.
moonbeam
I ran the validator at searchengineworld.com and got this:

Were sorry this robots.txt does not validate
Warnings detected 238
Errors detected 322

This is a stock page from OSC 2.2. I have not modified anything.
I guess I am going to have to learn. This seems like a lot of errors?
Any advice? Thanks, Moon
moonbeam
QUOTE (PandA.nl @ Apr 7 2005, 04:58 PM)
A stock osC install does not have a robots.txt file included (which is logical because the robots.txt must be in the root, while the catalog may be located elsewhere), so I guess it's another robots.txt file you have that produces all those errors.

url?
*


Oh my...
I have no idea? Could I have picked up the robots.txt in a contribution? Now I am more confused...
moonbeam
What do I need to do now? Should I remove the file and find a replacement or should I try to edit it? I surely don't know where it came from.
moonbeam
Ok, It does not exist... Thank you for setting me straight, I will do just what you say. Off I go to build my own robots.txt. Wish me luck!
PandA.nl
QUOTE (moonbeam @ Apr 8 2005, 01:25 PM)
Wish me luck!
*
Good luck! :D
ptrinephi
QUOTE (PandA.nl @ Apr 5 2005, 11:32 AM)
I would not add the location of your admin (or any other non-public directory) to the robots.txt file! Showing this kind of information in your robots.txt file (which anybody can read) makes your site less safe.

Robots would only get there if there's a link to it (and obviously there shouldn't be), and if a robot finds/tries it anyway, for whatever reason, the .htaccess protection won't allow it in, so the robots.txt file does not add anything useful to that.
*


Great tips here.

When a bot gets to your site, does it come in through the index page and try all the links recursively up and down? And if so, is a robots.txt file like the one below overkill?

QUOTE
User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /admin/
Disallow: /downloads/
Disallow: /images/
Disallow: /includes/
Disallow: /pub/
Disallow: /session/
Disallow: /temp/
Disallow: /templates/
Disallow: /webstats/
#
Disallow: /account.php
Disallow: /account_edit.php
Disallow: /account_history.php
Disallow: /account_history_info.php
Disallow: /account_newsletters.php
Disallow: /account_notifications.php
Disallow: /account_password.php
Disallow: /add_checkout_success.php
Disallow: /address_book.php
Disallow: /address_book_process.php
Disallow: /advanced_search.php
Disallow: /advanced_search_result.php
Disallow: /affiliate_affiliate.php
Disallow: /affiliate_banners.php
Disallow: /affiliate_clicks.php
Disallow: /affiliate_contact.php
Disallow: /affiliate_details.php
Disallow: /affiliate_faq.php
Disallow: /affiliate_intro.php
Disallow: /affiliate_logout.php
Disallow: /affiliate_password_forgotten.php
Disallow: /affiliate_payment.php
Disallow: /affiliate_sales.php
Disallow: /affiliate_show_banner.php
Disallow: /affiliate_signup.php
Disallow: /affiliate_summary.php
Disallow: /affiliate_terms.php
Disallow: /checkout_confirmation.php
Disallow: /checkout_payment.php
Disallow: /checkout_payment_address.php
Disallow: /checkout_paypalipn.php
Disallow: /checkout_process.php
Disallow: /checkout_shipping.php
Disallow: /checkout_shipping_address.php
Disallow: /checkout_success.php
Disallow: /configure.php
Disallow: /contact_us.php
Disallow: /create_account.php
Disallow: /create_account_success.php
Disallow: /down_for_maintenance.php
Disallow: /download.php
Disallow: /gv_redeem.php
Disallow: /gv_send.php
Disallow: /info_shopping_cart.php
Disallow: /links_setup.php
Disallow: /login.php
Disallow: /logoff.php
Disallow: /password_forgotten.php
Disallow: /paypal_notify.php
Disallow: /popup_coupon_help.php
Disallow: /popup_image.php
Disallow: /popup_search_help.php
Disallow: /product_notifications.php
Disallow: /product_reviews_write.php
Disallow: /redirect.php
Disallow: /shopping_cart.php
Disallow: /shopping_cart_help.php
Disallow: /shipping_estimator_popup.php
Disallow: /tell_a_friend.php


You mention not to put /admin in the file. Fine, it's password protected (.htaccess). What about the other directories? (includes, temp, download...)

So if I understand this right, the best way to create an efficient robots.txt file is to manually browse your site, follow all possible links, write down the ones you don't want bots to look at, and ignore all the files and directories that are not linked from any page? Is that right?

Thanks
sleahcim
Excellent post Matthias.

I would like to add something for clarification because there are some misconceptions about what the robot actually does when it gets to your site. Yes, when it gets to your site the first thing it does (at least the good bots like Google) is look in your root directory for robots.txt.

If it finds it, it then scans the file for exclusions and for what you dictate you would like to be seen.

Whether or not you have a robots.txt file, the bot then FOLLOWS THE LINKS WITHIN YOUR SITE. It does not view directories and files that you have no links to and that aren't linked from other sites. In other words, it cannot see directories like you can in your FTP program. It only sees files that you have built links to.
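
To picture this, here is a toy Python sketch of a link-following bot. The site map in LINKS is entirely hypothetical and a real spider is far more involved, but the principle is the same: pages are only discovered through links, and disallowed paths are skipped.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
LINKS = {
    "/index.php": ["/product_info.php", "/login.php"],
    "/product_info.php": ["/popup_image.php", "/index.php"],
    "/login.php": [],
    "/popup_image.php": [],
    "/admin/index.php": ["/admin/orders.php"],  # nothing links here
    "/admin/orders.php": [],
}


def crawl(start: str, disallowed: list) -> list:
    """Breadth-first walk over links, skipping disallowed path prefixes."""
    seen, queue, visited = {start}, deque([start]), []
    while queue:
        page = queue.popleft()
        visited.append(page)
        for target in LINKS.get(page, []):
            blocked = any(target.startswith(rule) for rule in disallowed)
            if target not in seen and not blocked:
                seen.add(target)
                queue.append(target)
    return visited


print(crawl("/index.php", ["/popup_image.php"]))
```

The '/admin/' pages never show up because nothing links to them, and the popup is skipped because of the Disallow rule.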

Why would you want to hide a file from the bots? One excellent example is the larger picture popup page you get when you click on "view larger image" on your product page. And why wouldn't you want the bot to index a popup? If it appears in a page of search results on Google (or another index site) and your potential customer clicks on the link, they are taken to your little picture with no means of navigation or understanding of where they are. Those who didn't include robots.txt files excluding these pages are now having to create hacks to redirect customers to the home page of their stores.

My 2cents.
nicedeals
good work guys..keep it up...
I'm a newbie here so hope you guys can help..

The problem!
I have submitted my website " www.nicedeals.co.uk " to a few search engines and have added the meta tag contribution, which allows you to add a meta tag title, description and keywords in the admin panel under "edit". Now I have a few issues with the rankings which I believe could be related to this topic. I'll deal with MSN and Google here, as Yahoo is an altogether different ball game, I'm told.

MSN:
MSN has indexed the first categories from my website, e.g. if I search for " vauxhall astra body panels " it will rank me No. 1 and take me to my website page " http://nicedeals.co.uk/caraccessories/nfos...7661882d85c1a0f ", which is the main category under " body panels and lamps " covering all the Vauxhall models. But if I type " astra body panels " it will rank me lower down, although I have specified a more specific meta tag just for the Vauxhall Astra, and it doesn't index the Astra page either, which should be:
" http://nicedeals.co.uk/caraccessories/nfos...972f78d25951867 "

Google:
Google, on the other hand, is not ranking me anywhere near the first pages and will only show my site if I type something like " vauxhall nicedeals ", and even then the more specific pages for the Astra will be in the omitted search results, if present at all.

Now correct me if I'm wrong, but if this robots.txt file is used in such a way that only the specific Vauxhall model pages (Astra, Calibra etc.) are allowed to be read by the bots, that should increase the website ranking for specific searches like " astra bonnet " etc.

Thank you so much for reading, and any help is greatly appreciated.



FixItPete
Just a thought on this... why isn't cookie_usage.php disallowed too? Doesn't it seem that this is what causes bots to hit a wall?
jashnu
QUOTE (mhormann @ Dec 18 2004, 03:40 PM) *
# ALL search engine spiders/crawlers (put at end of file)
User-agent: *
Disallow: /cgi-bin/
Disallow: /usage/
Disallow: /catalog/admin/
Disallow: /catalog/download/
Disallow: /catalog/elmar/
Disallow: /catalog/pub/
Disallow: /catalog/account.php
Disallow: /catalog/advanced_search.php
Disallow: /catalog/checkout_shipping.php
Disallow: /catalog/create_account.php
Disallow: /catalog/login.php
Disallow: /catalog/password_forgotten.php
Disallow: /catalog/popup_image.php
Disallow: /catalog/shopping_cart.php

Have fun! And happy 'spidering'...
Matthias


Hi, just a simple newbie question. What should the rest of the file look like? I mean how do you put the spiders in the end of the file. I had an error when I was using just a list of spiders. TIA.

-Jani
enigma1
The spiders use a different file. Look into your catalog\includes\spiders.txt
http://www.oscommerce.com/community/contributions,2455
jashnu
I meant robots.txt file, not spiders.txt
enigma1
via the user agent for example

CODE
User-agent: Googlebot-Image
Disallow: /
RC_Nut
OK, here goes... questions from a newbie.

I do not have a catalog directory. I guess better wording would be: I don't have a directory named catalog. Should I have one?

RE: robots.txt

I have installed my store in a subdomain of my site. The subdomain is shop.mydomain.com. Should my robots.txt file be in the root directory of the subdomain with the osC install? Should I have a separate robots.txt file for my main site that does not include my subdomain?
rin67630
QUOTE (wheeloftime @ Apr 5 2005, 01:37 PM) *
QUOTE(PandA.nl @ Apr 5 2005, 12:32 PM)
I would not add the location of your admin (or any other non public directory) to the robots.txt file! Showing this kind of information in your robots.txt file (which anybody can read), makes your site less safe.

Robots would only get there if there's a link to it (and obviously there shouldn't be), and if a robot finds/tries it anyway, for whatever reason, the .htaccess protection won't allow it in, so the robots.txt file does not add anything useful to that.


I have put the "admin" path in the robots.txt.

... and the honeypot is exactly there! :D
owl17sb
I have read through all this and I am now completely confused.

Is there an idiot's guide as to what you need to do on these sorts of things? I stumbled across this thread by accident and I was completely unaware that I would need to write a file called "robots.txt". Is there anywhere that will show me how to start writing it and what needs to be included and excluded?

I don't mean to be thick, but how would I know this needed to be done? Are there any other things that I should be aware of?

Is there a complete idiot's guide as to what to do and what not to do to get a website up and running and safe and secure?

I appreciate all this help is given in people's spare time, but any pointers on this would be greatly appreciated.
:)
WatchPart
I agree with you. Is there a sample robots.txt that we can edit ourselves?
smithveg
Thanks a lot for all the discussion. I understand this topic better now.
Still, I have a couple of questions about things I don't understand.

1. What does Google's new 'wildcard' exclusion system mean? The thread said Google now allows 'wildcards' to be specified like '*.cgi'... Also, what is the purpose of the 'cgi-bin' directory? What is it used for?

2. Will .htaccess block the spiders from viewing our sites? I ask because I saw an .htaccess file in every single folder. However, I'm not using .htaccess to secure my folders. Instead I just use the hosting site's 'password protected directory' function. I'm just afraid that the .htaccess files will keep the spiders from surfing my pages.

3. How can I force the bot to index my 'view large image' popup? I heard it can easily appear in a page of search results on Google (or another index site)... Do I need to add some extra code in popup.php to help my ranking?

4. A post said "Those who didn't include robots.txt files excluding those pages are now having to create hacks to redirect customers to the homepage of their stores"... I'm not sure what that means. Is a robots.txt file NECESSARY for your pages?

5. A 'back link' is a link from another site to ours. How about an anchor, is it <a href.....></a>? Is this the anchor they mean?

I know this post is from long ago. Even so, I hope somebody will come here and give me some ideas on all these questions...

Thank you
smith
spudevo
Hey guys can anyone tell me if my robots.txt looks ok??


# robots.txt for Wheel of Time
# Currently disallow all shop stuff to the Google Image bot
# Mainly image hunters anyway, they eat up bandwidth...
User-agent: Googlebot-Image
Disallow: /cgi-bin/
Disallow: /httpdocs/

# ALL search engine spiders/crawlers (put at end of file)
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /httpdocs/temp/
Disallow: /httpdocs/admin/
Disallow: /httpdocs/download/
Disallow: /httpdocs/pub/
Disallow: /httpdocs/account.php
Disallow: /httpdocs/advanced_search.php
Disallow: /httpdocs/checkout_shipping.php
Disallow: /httpdocs/create_account.php
Disallow: /httpdocs/login.php
Disallow: /httpdocs/password_forgotten.php
Disallow: /httpdocs/popup_image.php
Disallow: /httpdocs/shopping_cart.php