We have been affected by web spiders/crawlers that continuously collect data from our website and reuse the content on their own portals. Since my setup does not have a WAF deployed, I thought of creating a correlation rule that might help identify this kind of software. The website I am trying to protect is used by a very large number of users over the course of a day.
Since a web spider/crawler characteristically goes through a website at a very rapid pace over a short span of time, we can use this behaviour to create an alert.
For example:
group by: Source IP, Destination IP
command = GET, PUT
duration = 10 seconds, distinct values: 250
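The rule above can be sketched in code. This is only a minimal illustration of the logic (not any particular SIEM's rule engine): events are grouped by the (source IP, destination IP) pair, only GET/PUT commands are counted, and an alert fires when 250 distinct requests are seen inside a sliding 10-second window. The event fields and function name are assumptions for illustration.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 10       # duration filter from the rule
THRESHOLD = 250           # distinct values before we alert
METHODS = {"GET", "PUT"}  # commands the rule matches on

# One sliding window per (source IP, destination IP) pair,
# holding (timestamp, url) entries seen in the last 10 seconds.
windows = defaultdict(deque)

def process_event(ts, src_ip, dst_ip, method, url):
    """Return True if this event pushes the (src, dst) pair over the threshold."""
    if method not in METHODS:
        return False
    win = windows[(src_ip, dst_ip)]
    win.append((ts, url))
    # Expire entries older than the 10-second window.
    while win and ts - win[0][0] > WINDOW_SECONDS:
        win.popleft()
    distinct_urls = {u for _, u in win}
    return len(distinct_urls) >= THRESHOLD
```

A crawler fetching 250 distinct pages in a couple of seconds would trip this immediately, while a human browsing a handful of pages would never approach the threshold.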
From this alert we can create a watch-list, which can then be used to monitor activity from these IP addresses:
Data bytes sent
Number of requests
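The watch-list monitoring step could look something like the sketch below: once an IP has been flagged by the rule, we accumulate the two metrics mentioned above (bytes sent and number of requests) only for watch-listed addresses. The IPs and function name here are made-up examples.

```python
from collections import defaultdict

# IPs flagged by the correlation rule (example values).
watchlist = {"1.2.3.4"}

# Per-IP counters for the two metrics to monitor.
stats = defaultdict(lambda: {"bytes_sent": 0, "requests": 0})

def monitor(src_ip, bytes_sent):
    """Accumulate metrics only for IPs on the watch-list."""
    if src_ip in watchlist:
        s = stats[src_ip]
        s["bytes_sent"] += bytes_sent
        s["requests"] += 1
```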
Though I am a little circumspect about whether the duration filter will work for so short a duration. There is also a chance of false positives: ISPs use NAT to provide internet access to many users, so a single shared IP may end up hitting the threshold we have defined.
Do tell me what you think of this, and whether anything further can be added.