
Robots and Spiders


This topic is 7095 days old. Please don't post here. Open a new topic instead.


Hmm... I used to want to stop bots and spiders from visiting my FM site, since I suspected they'd crash it by following long CDML URLs, which they'd truncate to the first so-many characters (a string of 128? 256?). I suppose if you WANTED bots and spiders to crawl your site, you could hard-code some CDML links with relatively short URLs and it would work. Maybe you could even automate it, or have your site generate those CDML strings just for bots, by creating hidden or hard-for-a-human-to-find links to these URLs that a bot/spider would follow with glee.

I was going to read up on robots.txt files, too. Maybe there's something you can put in robots.txt to help them.
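For what it's worth, robots.txt is mainly a way to tell crawlers which paths to stay away from, so it is better suited to keeping bots off the long CDML URLs than to helping them index anything. Here is a minimal sketch in Python of how such a rule gets interpreted, using the standard library's `urllib.robotparser`. The `/FMPro` path, the `example.com` host, and the sample CDML query string are assumptions made up for illustration, not details from this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask every crawler to skip URLs starting with /FMPro
# (assumed here to be the prefix of the site's CDML query URLs).
ROBOTS_TXT = """\
User-agent: *
Disallow: /FMPro
"""


def allowed(url, agent="*"):
    """Return True if the rules above permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


# A plain static page versus a made-up CDML query URL.
print(allowed("http://example.com/index.html"))
print(allowed("http://example.com/FMPro?-db=site.fp5&-format=list.htm&-findall"))
```

Keep in mind that robots.txt is only a request: well-behaved crawlers honor it, but nothing forces a bot to.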

--ST


The best way to ensure that search engine bots pick up your site is to add meta-data to your pages. Here's the basic meta-data format that goes in the head section of each page:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252">

<META NAME="title" CONTENT="">

<META NAME="description" CONTENT="">

<META NAME="originatorJurisdiction" CONTENT="">

<META NAME="originatorDepartment" CONTENT="">

<META NAME="originatorDivision" CONTENT="">

<META NAME="originatorSection" CONTENT="">

<META NAME="originatorOffice" CONTENT="">

<META NAME="createDate" CONTENT="">

<META NAME="dateofLastModification" CONTENT="">

<META NAME="keywords" CONTENT="">

<META NAME="subjects" CONTENT="">

<META NAME="contactName" CONTENT="">

<META NAME="contactOrganization" CONTENT="">

<META NAME="contactStreetAddress1" CONTENT="">

<META NAME="contactStreetAddress2" CONTENT="">

<META NAME="contactCity" CONTENT="">

<META NAME="contactState" CONTENT="">

<META NAME="contactZipcode" CONTENT="">

<META NAME="contactPhoneNumber" CONTENT="">

<META NAME="contactFaxNumber" CONTENT="">

<META NAME="contactNetworkAddress" CONTENT="">

You can leave any meta-data element blank, or delete the ones you do not want to include.
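If you want to check which of these elements a page actually fills in, here is a rough sketch in Python using the standard library's `html.parser`. The `MetaCollector` class, the `filled_meta` helper, and the sample values are hypothetical and only there to illustrate the idea; they are not part of the template above.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect NAME/CONTENT pairs from <META> tags (tag and attribute names are case-insensitive)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            self.meta[name] = attributes.get("content") or ""


def filled_meta(html):
    """Return only the meta elements whose CONTENT is not blank."""
    collector = MetaCollector()
    collector.feed(html)
    return {name: value for name, value in collector.meta.items() if value.strip()}


# Made-up sample values, for illustration only.
SAMPLE_HEAD = """
<META NAME="title" CONTENT="Staff directory">
<META NAME="description" CONTENT="">
<META NAME="keywords" CONTENT="staff, directory">
"""

print(filled_meta(SAMPLE_HEAD))
```

Tags without a NAME attribute, such as the HTTP-EQUIV line, are skipped by this sketch.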


Unfortunately, the meta tag method still doesn't get your actual content searched or indexed. Indexing the content itself is Google's preferred approach, and it's what you ultimately want to be searchable on any given site. Meta tags are notoriously unreliable and are given less and less weight by search engines (though there is some conflicting evidence that Dublin Core is more effective). I don't know whether search engines or bots can execute db queries and index the results, but at a guess I wouldn't think so.

Kevin

