Over the last few days I have seen a lot of Webwasher 6.8.x crashes (or traffic overloads) on different customer installations. Most of the time this was caused by the FIFA World Cup.
Generally the customer just restarts the daemon and the problem is "solved...".
But, on one installation I had the chance to collect some data before the restart that I would like to share with you.
I noticed a lot of Webwasher threads and especially a lot of ESTABLISHED TCP connections:
# netstat -an | grep EST
(more than 1000 established TCP connections)
But more than 90% of them involved the same destination IPs (located in the Czech Republic):
22.214.171.124 -> p1.diretta.it, or p1.livesport.cz
126.96.36.199 -> p2.diretta.it, or p2.livesport.cz
188.8.131.52 -> p3.diretta.it, or p3.livesport.cz
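To see how skewed the distribution is, the established connections can be grouped by remote IP directly from the netstat output. A quick sketch (field 5 is the foreign address as "ip:port" and field 6 the state on most Linux netstat builds; adjust if your output differs):

```shell
# Group ESTABLISHED connections by remote IP to see which destinations
# dominate. Reads netstat output on stdin.
count_established() {
    awk '$6 == "ESTABLISHED" { split($5, a, ":"); print a[1] }' \
        | sort | uniq -c | sort -rn
}
```

Running `netstat -an | count_established | head` should show the p*.livesport.cz addresses at the top of the list.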
I performed a tcpdump (attached to this post). As you can see, it is a simple JavaScript that calls a sys.utime function in an endless loop.
The interesting part is that even when all client browsers are closed, all TCP connections and Webwasher threads remain open forever. This floods the threads and TCP connections, eventually resulting in a Webwasher overload.
Why? When caching is disabled and no client connection is open, shouldn't every thread and TCP connection close automatically?
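To make the leak visible before it reaches overload, a small monitoring sketch can log thread and connection counts over time. The process name "webwasher" is an assumption here; adjust the `ps -C` argument to match your installation:

```shell
# Snapshot Webwasher thread count and system-wide ESTABLISHED connection
# count, e.g. from a loop, to correlate growth with client activity.
# Assumption: the daemon process is named "webwasher".
snapshot() {
    threads=$(ps -o nlwp= -C webwasher 2>/dev/null \
        | awk '{ s += $1 } END { print s + 0 }')
    conns=$(netstat -an 2>/dev/null \
        | awk '/ESTABLISHED/ { n++ } END { print n + 0 }')
    printf '%s threads=%s established=%s\n' "$(date +%H:%M:%S)" "$threads" "$conns"
}

# e.g.:  while true; do snapshot; sleep 60; done >> ww-conns.log
```

If the counts keep climbing after all browsers are closed, that confirms the connections are being held open on the server side.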
For testing you can use the following main URLs:
Thanks for any explanation, best regards
Without taking a deeper look, I am pretty sure this is related to some kind of HTTP live ticker. We have seen this for several URLs in the past; the technique was mainly used for stock tickers, but it also seems to be useful for the World Cup. What happens in general is that the client opens a TCP connection to the server, which is left open with no data going through. When something happens and the server wants to update the client, it uses the existing connection to push data from the server to the client.
In the past things like this worked by doing a META refresh every couple of seconds or so, but because today a delay of a few seconds is not acceptable and everything needs to be a live push stream, some services make use of this technology.
The problem here is the filtering itself: if you put these URLs on the ICAP bypass they work fine, because the "dead" connections are detected by Webwasher and closed. With filtering enabled, the ICAP filter thread waits for data even if the client connection has been closed. I have not yet fully understood the issue, but from what I have seen so far there is no explicit reference between the client connection and the server connection. So when we perform ICAP filtering, we do not correctly recognize that the client has closed its session, and the server session remains active until it is closed from the server side. The server may keep these connections open for a long time (or even forever), and we will continue gathering the data in memory for scanning, which may lead to overload conditions etc.
MultiProcess will take care of this, but on SingleProcess machines the problem may become visible as Webwasher going into Overload protection.
I recommend putting the identified links on the ICAP Bypass. I will check these links. We have some automatic mechanisms coming with the AV updates that help to avoid these problems. We have already seen p*.flashsports.co.uk and p*.ergebnisselive.de causing similar problems, and I bet the problem is the same. I will make sure they are analyzed and handled correctly as well.
Yes, I investigated p*.flashsports.co.uk and p*.ergebnisselive.de. Both cause exactly the same problem. Of course I already added the ICAP bypass, but I am looking for a more complete workaround, as this can happen for a lot of URLs.
I tried to solve the issue with the new 6.8.7 "rate limit" feature by limiting the number of concurrent connections. This works as expected, but as it counts ALL concurrent connections and not only the ones coming from a specific source or destination, it blocks all traffic for the specified group.
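Since the 6.8.7 rate limit counts all connections in the group, a network-level cap per destination may be a tighter stopgap. This is purely my own sketch using the Linux iptables connlimit match, not a Webwasher feature; the IP is one of the ticker hosts identified above, and the limit of 20 is arbitrary (run as root; chain names depend on your setup):

```shell
# Sketch, not a Webwasher feature: cap concurrent TCP connections towards
# one ticker host at 20. New connections above the limit get a TCP reset.
iptables -A OUTPUT -p tcp --syn -d 22.214.171.124 \
    -m connlimit --connlimit-above 20 --connlimit-daddr \
    -j REJECT --reject-with tcp-reset
```

With --connlimit-daddr the limit is applied per destination address, so only the hanging ticker host is capped and the rest of the traffic in the group is unaffected.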
Can you tell me more about other workarounds, and about when and what we can do with the AV pattern update?
There is not yet a full solution, as it is not yet known whether the problem can be solved within the 6.x architecture. We can work around the problem by tweaking the AV updates on our side, which has helped a lot in the past.
At the moment the problem can only be really solved by switching to MWG7, as the architecture is different and allows this issue to be prevented from occurring.
Andre
Message edited by Andre Sabban on 22.06.10 05:22:58 CDT