    Capturing Web Spider/Crawler


      Hi friends,


      We have been affected by web spiders/crawlers that continuously collect data from our website and reuse the content on their own portals. Since I don't have a WAF deployed in my setup, I thought of writing a correlation rule that might help identify this software. The website I am trying to protect is used by a very large number of users throughout the day.


      The thought/Rule.


      Since a web spider/crawler characteristically goes through a website very rapidly within a short span of time, we can use this to create an alert.


      For example:


      group by Source IP , Destination IP

      command = GET,PUT

      duration = 10 seconds, distinct value : 250
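
      To make the idea concrete, here is a minimal Python sketch of the rule, not tied to any particular SIEM. It assumes that "distinct value" means the number of distinct URLs requested by one Source IP / Destination IP pair, and that events arrive sorted by time as (timestamp, src_ip, dst_ip, method, url) tuples. The function name and the tuple layout are made up for illustration.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 10        # duration from the rule
DISTINCT_THRESHOLD = 250   # distinct value from the rule
METHODS = {"GET", "PUT"}   # command filter from the rule


def find_crawler_pairs(events, window=WINDOW_SECONDS, threshold=DISTINCT_THRESHOLD):
    """Return the (src_ip, dst_ip) pairs that request at least `threshold`
    distinct URLs within any `window`-second span.

    `events` is an iterable of (timestamp, src_ip, dst_ip, method, url)
    tuples, assumed to be sorted by timestamp (hypothetical layout).
    """
    recent = defaultdict(deque)  # (src, dst) -> deque of (timestamp, url)
    flagged = set()
    for ts, src, dst, method, url in events:
        if method not in METHODS:
            continue
        key = (src, dst)
        queue = recent[key]
        queue.append((ts, url))
        # Drop requests that have fallen out of the time window.
        while queue and ts - queue[0][0] > window:
            queue.popleft()
        # Count distinct URLs still inside the window.
        if len({u for _, u in queue}) >= threshold:
            flagged.add(key)
    return flagged
```

      A client that fetches 300 different pages in about 6 seconds would be flagged, while one that fetches a page per second would not, because it never has more than about 11 requests inside the window.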


      From this alert we can create a watch list, which can then be used to monitor the activity of these IP addresses, such as:


      Data bytes sent

      Number of requests
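
      As a rough illustration of the watch-list monitoring, the Python sketch below totals the request count and bytes sent for each watched IP. It assumes each log event is reduced to a (src_ip, bytes_sent) pair; the function name and that layout are hypothetical, not part of any product.

```python
def summarize_watchlist(events, watchlist):
    """Total requests and bytes sent per watched source IP.

    `events` is an iterable of (src_ip, bytes_sent) pairs (hypothetical layout);
    `watchlist` is an iterable of IP address strings.
    """
    stats = {ip: {"requests": 0, "bytes_sent": 0} for ip in watchlist}
    for src, nbytes in events:
        if src in stats:  # ignore traffic from IPs that are not on the watch list
            stats[src]["requests"] += 1
            stats[src]["bytes_sent"] += nbytes
    return stats
```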


      I am a little unsure, though, whether the duration filter will work for such a short window. There is also a chance of false positives: ISPs use NAT to provide internet access to their users, so many users sharing one public IP may together end up hitting the threshold we define.


      Do tell me what you think of this, and whether anything can be added to it.