4.3f Metadata, search engines and their robots
Meta tags are collected from Web pages by visiting robots: applications that automatically crawl the Internet to index Web pages. These robots are commonly called spiders, and they submit the meta tags they collect to a Web index that can be queried by a search engine. Using the "robots" value of the meta element, you can give a visiting robot instructions on how it should crawl your Web site. Note that a robot may or may not respect these instructions.
The format is very simple:
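A generic sketch of the tag, with "value" standing in as a placeholder:

```html
<meta name="robots" content="value">
```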
where "value" is replaced by one or more instructions (keywords) which provide the directions to the spider robot. Multiple instructions are separated by commas.
The most common set of instructions would be:
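For example, the default instruction set looks like this:

```html
<meta name="robots" content="index, follow">
```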
As this is the default behaviour for robots, you do not have to include this tag.
You obviously should be careful not to specify conflicting or repeating directives such as:
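The following hypothetical tags illustrate both problems:

```html
<!-- Conflicting: "index" and "noindex" contradict each other -->
<meta name="robots" content="index, noindex">

<!-- Repeating: "follow" appears twice -->
<meta name="robots" content="follow, follow">
```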
In addition to site-wide robot control using the file "robots.txt", it is possible to specify that certain pages should not be indexed by search engine spider robots, or that the links on a page should not be followed. The robots meta tag, placed in the HTML "head" section of a page, can specify either or both of these actions.
Most spider robots will recognize this tag and follow the rules for each page. Included below are the most commonly used values:
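As a sketch, the keywords most widely honoured by spiders are "index", "noindex", "follow" and "nofollow"; for example, to exclude a page from the index and keep its links from being followed:

```html
<!-- index    : allow this page to be indexed (default) -->
<!-- noindex  : do not index this page -->
<!-- follow   : follow the links on this page (default) -->
<!-- nofollow : do not follow the links on this page -->
<meta name="robots" content="noindex, nofollow">
```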
A robots.txt file is a regular text file, uploaded to the root directory of a Web site, that defines a few rules instructing robots not to crawl or index certain files and directories. Each unique domain can have only one robots.txt file, and clients must contact the WebGuide if any changes are required.
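A minimal robots.txt sketch (the directory name is hypothetical), telling all robots to stay out of one directory:

```
User-agent: *
Disallow: /private/
```

Here "User-agent: *" addresses every robot, and each "Disallow" line names a path the robot should not crawl.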