1. What is CipinetBot?
CipinetBot is Cipinet's web crawling robot. It crawls the web to collect documents and build the index for the Cipinet Directory.
2. If CipinetBot crawls my site, will my site show up in search results of Cipinet Search Engine?
Your website(s) may or may not be indexed. While indexing, CipinetBot filters out certain keywords and domain patterns. For instance, URLs containing /cgi-bin/, /cgi/, or a query string will not be indexed. We also do not index adult-rated websites. After websites are indexed, the database is passed through additional filter programs, which are used to improve the quality of search results.
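The URL filtering described above can be sketched as a short Python function. This is only an illustration based on the rules listed here; should_index is a hypothetical name, and CipinetBot's actual filter rules are not published:

```python
from urllib.parse import urlparse

def should_index(url):
    """Hypothetical sketch of the URL filter described in this FAQ:
    skip URLs whose path contains /cgi-bin/ or /cgi/, and skip any
    URL that carries a query string."""
    parsed = urlparse(url)
    if parsed.query:                    # e.g. ?id=3 -> not indexed
        return False
    if "/cgi-bin/" in parsed.path or "/cgi/" in parsed.path:
        return False
    return True

print(should_index("http://example.com/page.html"))          # True
print(should_index("http://example.com/cgi-bin/search.pl"))  # False
print(should_index("http://example.com/page.html?id=3"))     # False
```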
3. Why did CipinetBot crawl my site; what is the purpose of its crawling?
CipinetBot indexes documents on the internet so that they can be used by the Cipinet search engine.
4. How can I prevent the CipinetBot crawler from crawling my site?
The Robots Exclusion Protocol lets you prevent Web crawlers from crawling a site. Place a robots.txt file at the root of your site, for example http://www.yourdomainname.com/robots.txt, so that web crawlers visiting your site follow its rules.
Before crawling a website, CipinetBot reads the robots.txt file. If it finds User-agent: CipinetBot, it follows those rules when deciding whether or not to index web pages. If it does not find User-agent: CipinetBot, it looks for User-agent: * and follows those rules instead.
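This lookup order can be demonstrated with Python's standard urllib.robotparser module. This is a sketch of how a crawler evaluates robots.txt in general, not CipinetBot's actual implementation; the robots.txt contents below are a made-up example:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt with a CipinetBot-specific block and a wildcard block.
rules = """\
User-agent: CipinetBot
Disallow: /private/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# CipinetBot matches its own User-agent block, so only /private/ is off-limits.
print(parser.can_fetch("CipinetBot", "http://www.example.com/index.html"))      # True
print(parser.can_fetch("CipinetBot", "http://www.example.com/private/a.html"))  # False

# A crawler with no matching block falls back to the User-agent: * rules.
print(parser.can_fetch("OtherBot", "http://www.example.com/index.html"))        # False
```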
There is also a Robots META tag that can be placed on individual Web pages. The tag keeps Web crawlers from indexing the page and/or prevents them from harvesting links on the page. For example, <META NAME="robots" CONTENT="noindex, nofollow"> tells a crawler that it should neither index the document nor analyze it for links.
5. Can I request that CipinetBot crawls only part of my site?
Yes, you can place the robots.txt file at the root of your site and define in the file which parts of your site should not be crawled. For example:
User-agent: CipinetBot
Disallow: /Directory1/
Here, Directory1 is a placeholder; replace it with the name of any directory whose contents you do not want CipinetBot to index.
6. Why do I need a robots.txt on my server?
Robots.txt implements the standard (the Robots Exclusion Protocol) that instructs Web crawlers which sites or directories should not be crawled.
7. Why did CipinetBot try to access robots.txt on my site?
Robots.txt implements the standard (the Robots Exclusion Protocol) that instructs Web crawlers which sites or directories should not be crawled. CipinetBot needs to read it first to determine whether a site or directory may be crawled.
8. How can I prevent CipinetBot from following links from a particular page?
CipinetBot obeys the noindex and nofollow meta-tags. CipinetBot will not follow any links present on a page if a <META NAME="robots" CONTENT="nofollow"> tag is placed in the HEAD section of that page.
9. Why did CipinetBot fail to obey the robots.txt on my Web server?
CipinetBot makes every effort to obey the rules defined by the Robots Exclusion Protocol. Occasionally it may fail due to bugs in the crawler program. Also check that your syntax is correct against the standard at http://www.robotstxt.org/wc/exclusion.html#robotstxt. A common source of problems is that the robots.txt file must be placed in the top-level directory of the server, e.g. http://www.myhost.com/robots.txt; placing the file in a subdirectory has no effect.
10. Why is CipinetBot downloading the same page on my site multiple times?
CipinetBot should download only one copy of each file from your site during a given crawl. If the first attempt to index a page fails for some reason, CipinetBot visits the same page again to index it.
11. If I have any other questions related to CipinetBot, where do I send them?
You may contact us regarding CipinetBot by filling out our contact form.