Technical Support Forums

stopping robots from taking data

Thread began 8/13/2009 9:51 pm by aaron322044 | Last modified 8/18/2009 10:52 am by aaron322044 | 1367 views | 4 replies

Danilo Celic

I've not worked with mod_rewrite, so I could easily be wrong, but my understanding is that mod_rewrite can rewrite a request to remove the query string. However, unless your rewrite rules map a specific page to each possible value of the "num" parameter, the page won't "know" what the parameter should be and therefore won't be able to display the proper data. That would stop the robot from getting to your content, but it would also stop the humans who need it. ;-)

Another option is to stop using an auto-increment integer as the ID of the item to display and to generate the ID in some other fashion, such as a unique text ID, perhaps using PHP's uniqid(), which creates a 13-character ID (23 characters with the more_entropy parameter set to true).

This would require changes to your database, and you'd have to make sure the ID column value is generated each time a record is inserted.
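The poster mentions PHP's uniqid(); as a language-agnostic sketch of the same idea (Python here, and the function name is just illustrative), generating a non-sequential, unguessable text ID at insert time might look like:

```python
import uuid

def make_item_id() -> str:
    # Non-sequential, effectively unguessable text ID to use in place of
    # an auto-increment integer for the public-facing "num" parameter.
    return uuid.uuid4().hex  # 32 hexadecimal characters

new_id = make_item_id()
# Then insert it alongside the record, e.g.:
# INSERT INTO items (id, title, ...) VALUES (?, ?, ...)
```

Unlike a 13-character uniqid(), which is derived from the clock and is somewhat guessable, a random UUID can't be enumerated by incrementing.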

However, many robots follow links throughout the site to find all of the available data items rather than simply incrementing query string parameter values. If the robots hitting your pages spider rather than increment, the unique ID won't help: if there is any way to reach the page, the spider will find it. You'd be better off limiting visits per IP address over a certain timeframe.
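Limiting visits per IP over a timeframe can be sketched as a sliding-window counter. This is a minimal in-memory version (the window size and cap are assumptions to tune for your traffic; a real site would likely keep the counts in a shared store):

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60   # assumption: look at the last minute of traffic
MAX_REQUESTS = 30     # assumption: per-IP cap inside the window

_hits = defaultdict(deque)  # ip -> timestamps of recent requests

def allow_request(ip: str, now=None) -> bool:
    """Return True if this IP is still under the cap for the window."""
    now = time.monotonic() if now is None else now
    window = _hits[ip]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()          # drop hits that fell out of the window
    if len(window) >= MAX_REQUESTS:
        return False              # over the limit: refuse, delay, or log
    window.append(now)
    return True
```

On each page request you'd check allow_request(client_ip) before rendering; spiders blow past the cap quickly while normal readers never notice it.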

In addition to limiting the number of requests, you could also "ban" IP addresses: if you find one running through your site, add it to a "bad IP" table in your DB and spit back a 404 or some other HTTP error whenever a request comes in from that address. The problem with this approach is that on some internet connections IP addresses change all the time, so an outright ban could be stopping legitimate traffic. That's why I initially suggested the sleep response, which merely slows down the response to the bad requests. You could even take a progressive approach: if requests come in above a certain rate per minute/hour/day, increase the delay on the response.
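The progressive-delay idea above can be sketched as a simple rate-to-delay mapping (the thresholds and sleep times here are illustrative assumptions, not recommendations):

```python
import time

def progressive_delay(requests_last_minute: int) -> float:
    """Seconds to sleep before responding, growing as the rate climbs."""
    if requests_last_minute <= 30:
        return 0.0            # normal traffic: respond immediately
    if requests_last_minute <= 60:
        return 1.0            # mildly aggressive: small delay
    if requests_last_minute <= 120:
        return 5.0            # likely a scraper: noticeable delay
    return 15.0               # hammering the site: heavy delay

# Before rendering the page, assuming some count_requests(ip) helper:
# time.sleep(progressive_delay(count_requests(client_ip)))
```

Compared with an outright ban, this degrades gracefully: a legitimate user who inherits a "bad" dynamic IP gets a slow page instead of a 404.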

HTH
