Jenkins is a hit with the Java developer folks: http://ftp-chi.osuosl.org/pub/jenkins/war/1.534/jenkins.war
Unfortunately, this 60MB file with about 1300 files is causing developers to complain loudly about scan times exceeding 1200 seconds.
The body.size and content-length are such that the file goes through both the composite opener and AV. The composite opener is invoked when Body.NestedArchiveLevel is < 5.
Can anyone more familiar with these things than me hazard a guess as to why this archive is so evil? What tweaks to the web gateway rules would you suggest to improve the situation?
The main problem with this archive is that it contains embedded jar files, which in turn contain many embedded objects. The overall count of embedded objects extracted by MWG is about 45,000, and every object must be scanned by AV individually.
You can work around this problem (if that's OK for you) in the following way: enter the AV ruleset only if Timer.TimeInRuleEngine is less than some threshold, or use a rule like "if Timer.TimeInRuleEngine is greater than threshold, then Stop Cycle" near the beginning of the rules. This will skip all rules if processing of an archive is taking too long.
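To see how nested jar files multiply the object count, here is a minimal Python sketch that counts every member of a ZIP-style archive, descending into nested archives. It is not how MWG works internally; the function name, the suffix list, and the depth limit of 5 (borrowed from the Body.NestedArchiveLevel condition mentioned earlier) are assumptions for illustration, and the tiny in-memory archive stands in for a real .war file.

```python
import io
import zipfile

# Assumption: these suffixes are treated as nested ZIP-style archives.
ARCHIVE_SUFFIXES = (".zip", ".jar", ".war", ".ear")


def count_embedded_objects(data: bytes, level: int = 0, max_level: int = 5) -> int:
    """Count every file member, descending into nested archives up to max_level."""
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            total += 1  # the member itself is one object to scan
            is_nested = info.filename.lower().endswith(ARCHIVE_SUFFIXES)
            if is_nested and level + 1 < max_level:
                total += count_embedded_objects(archive.read(info), level + 1, max_level)
    return total


def make_zip(members: dict) -> bytes:
    """Build a small in-memory ZIP from a {name: payload} mapping (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, payload in members.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


# Demo archive: one plain file plus two jars holding ten class files each.
inner_jar = make_zip({f"Class{i}.class": b"x" for i in range(10)})
war = make_zip({
    "WEB-INF/web.xml": b"<web-app/>",
    "WEB-INF/lib/a.jar": inner_jar,
    "WEB-INF/lib/b.jar": inner_jar,
})
print(count_embedded_objects(war))  # 3 top-level members + 2 * 10 nested = 23
```

Running the same function on the bytes of a downloaded jenkins.war would give a rough idea of the number of objects involved, though the figure need not match what MWG reports.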
Thanks Alex. How does that primitive work? If all files are getting individual scans as they run through the rule engine (which cycle is that considered, anyway?), would TimeInRuleEngine ever get that big for any individual file, or does it somehow magically aggregate for a given URL?
The .war file itself is downloaded in the response cycle. In the response cycle the archive should pass a rule which enables the composite opener. The composite opener starts to extract the file and runs each extracted file through the rule engine as an "embedded" cycle. If another archive hits the composite opener in an embedded cycle, this causes another batch of embedded cycles, until the file is finally completely scanned. The embedded cycles still "belong" to the response cycle, so the transaction is the same. Therefore I assume (unfortunately I don't know for sure) that TimeInRuleEngine will keep increasing until the response cycle or the transaction has finished.
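The cycle behaviour described above, combined with a "Stop Cycle" check on the timer, can be modelled with a small Python sketch. It rests on the same unconfirmed assumption as the post: that the timer aggregates across all embedded cycles of one transaction. The Transaction class, run_cycle function, and the fixed cost of one second per object are all hypothetical and only illustrate the idea; this is not MWG rule syntax.

```python
from dataclasses import dataclass, field

SCAN_COST = 1.0  # assumed cost in seconds for filtering one object


@dataclass
class Transaction:
    threshold: float
    time_in_rule_engine: float = 0.0  # hypothetical stand-in for Timer.TimeInRuleEngine
    scanned: list = field(default_factory=list)
    skipped: list = field(default_factory=list)


def run_cycle(tx: Transaction, name: str, children=()) -> None:
    """Run one (embedded) cycle; children are (name, children) pairs of an archive."""
    # Rule near the top: stop the cycle once the shared timer exceeds the threshold.
    if tx.time_in_rule_engine > tx.threshold:
        tx.skipped.append(name)
        return
    tx.time_in_rule_engine += SCAN_COST  # the timer is shared by the whole transaction
    tx.scanned.append(name)
    # Composite opener: each extracted member gets its own embedded cycle.
    for child_name, grandchildren in children:
        run_cycle(tx, child_name, grandchildren)


war_members = [
    ("a.jar", [("A1.class", []), ("A2.class", [])]),
    ("b.jar", [("B1.class", []), ("B2.class", [])]),
]
tx = Transaction(threshold=3.0)
run_cycle(tx, "jenkins.war", war_members)
print(tx.scanned)  # ['jenkins.war', 'a.jar', 'A1.class', 'A2.class']
print(tx.skipped)  # ['b.jar']
```

In this model no single object costs more than one second, yet the threshold is still reached because the timer belongs to the transaction. If the real timer were reset for every embedded cycle, the check would never fire, which is exactly the open question in this thread.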
We have an additional rule that you MAY want to try. If you send an archive with three members in it, MWG will generally scan the complete archive AND all archive members. With the rule you can skip scanning the archive as a whole if MWG is able to extract and filter its members. If all members are clean, the complete archive should be clean as well. This COULD reduce the amount of time required for scanning. I have provided this to a few customers, but so far I haven't seen much feedback on whether the rule is used or whether it helps at all.
If you want to give it a try, just let me know.
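The effect of the rule described above on the number of AV scans can be estimated with a short Python sketch. The rule itself is not shown in this thread, so this only models the stated idea: an archive that could be extracted is not scanned again as a whole. The function name and the list-based archive representation are hypothetical.

```python
def av_scans(node, skip_extracted_archives: bool) -> int:
    """Count AV scans for a tree where a list is an archive and anything else is a file."""
    if not isinstance(node, list):
        return 1  # a plain file is always scanned once
    member_scans = sum(av_scans(member, skip_extracted_archives) for member in node)
    # Without the optimization the archive is also scanned as one whole object.
    whole_archive_scan = 0 if skip_extracted_archives else 1
    return member_scans + whole_archive_scan


three_members = ["a.txt", "b.txt", "c.txt"]
print(av_scans(three_members, skip_extracted_archives=False))  # 4
print(av_scans(three_members, skip_extracted_archives=True))   # 3

nested = [["x.class", "y.class"], ["z.class"], "web.xml"]
print(av_scans(nested, skip_extracted_archives=False))  # 7
print(av_scans(nested, skip_extracted_archives=True))   # 4
```

The sketch counts scans only. It says nothing about scan time, which also depends on how large each scanned object is.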
I just downloaded the war file mentioned above. It took around 1400 seconds on my (small) VM. I added the "optimization" rule and the file went through in around 700 seconds. So for this specific file the optimization rule works quite well.
I have experimented with the "Rule Engine Timer", but I don't really get how it works. I will try to find out some more details on this.
By using a stop watch and some rules I was able to configure a timeout. If the .war file is scanned for more than X seconds (where X is whatever I like) I can pass the file (with the remaining files unfiltered) or block the file (whichever is the desired behavior).
So there are a couple of possibilities here to influence what the users see.
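The stopwatch timeout described above, with its choice between passing and blocking, can be sketched in Python as follows. The actual rules are not shown in this thread; scan_with_timeout, its parameters, and the fake clock are hypothetical and only model the decision logic. With a real clock one would pass time.monotonic instead.

```python
import itertools


def scan_with_timeout(members, limit_seconds, on_timeout, clock):
    """Scan members until the limit is exceeded, then apply the chosen action.

    Returns (verdict, number_of_members_left_unscanned).
    """
    if on_timeout not in ("pass", "block"):
        raise ValueError("on_timeout must be 'pass' or 'block'")
    started = clock()
    for index, _member in enumerate(members):
        if clock() - started > limit_seconds:
            # Timeout: the remaining members stay unfiltered.
            return on_timeout, len(members) - index
        # Placeholder: the per-object AV scan would happen here.
    return "pass", 0


# Fake clock that advances one second per reading, so the demo is deterministic.
_ticks = itertools.count()


def fake_clock() -> float:
    return float(next(_ticks))


verdict, unscanned = scan_with_timeout(["a", "b", "c", "d", "e"], 2, "block", fake_clock)
print(verdict, unscanned)  # block 3
```

Choosing "pass" keeps users happy but lets the unscanned remainder through, while "block" is the stricter option; which one fits is a policy decision.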
We're facing the same problem. Most of the time, ZIP archives containing jar files take several minutes to scan.
Where can I get the additional ruleset? Do I have to submit a ticket to support?
There have been improvements in 7.4.1 (currently in beta II) to the scan engine that will help with scan time.
Testing this on a virtual machine of mine, scan time was down to 840 seconds (14 minutes), so it shaved off about 5 or so minutes, and this is on a smaller virtual machine.