A robots.txt file tells search engine crawlers which directories and files they may crawl and which ones they should ignore. Make sure the contents of your robots.txt give crawlers the appropriate instructions. Here’s an example of a robots.txt file’s instructions:
User-agent: *
# Block the following directories from being indexed
Disallow: /css/
Disallow: /dita/ss/
Disallow: /dita/dtd/
Disallow: /flash/
Disallow: /includes/
Disallow: /scripts/
# Block the following pages from being indexed
Disallow: /directory/file.html
# Block the following file extensions from being indexed
Disallow: /*.js$
Disallow: /*.txt$
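One way to sanity-check rules like these before publishing them is to feed them to a parser and ask whether a given path would be blocked. The following is a minimal sketch using Python's standard urllib.robotparser module. The crawler name ExampleBot and the helper is_allowed are made up for illustration, and only a subset of the directory and page rules is included: the standard-library parser appears to match rules as plain path prefixes, so wildcard rules such as /*.js$ are left out rather than tested.

```python
from urllib.robotparser import RobotFileParser

# Subset of the example rules; the wildcard rules are omitted on purpose
# because this parser matches rules as plain path prefixes.
RULES = """\
User-agent: *
# Block the following directories
Disallow: /css/
Disallow: /scripts/
# Block the following pages
Disallow: /directory/file.html
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def is_allowed(path, agent="ExampleBot"):
    """Return True if the hypothetical crawler `agent` may fetch `path`."""
    return parser.can_fetch(agent, path)


for path in ("/css/site.css", "/directory/file.html", "/index.html"):
    print(path, "allowed" if is_allowed(path) else "blocked")
```

Paths under a disallowed directory and the disallowed page should come back as blocked, while anything not covered by a rule is allowed by default.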
Structure of a Robots.txt File
The structure of a robots.txt file is simple (and not very flexible): it is a list of user agents, each followed by the files and directories that agent is disallowed from crawling. Basically, the syntax is as follows:
User-agent:
Disallow:
“User-agent:” names the search engine crawler that the rules apply to, and “Disallow:” lists the files and directories to be excluded from indexing. In addition to “User-agent:” and “Disallow:” entries, you can include comment lines; just put the # sign at the beginning of the line:
# All user agents are disallowed to see the /temp directory.