A temporary "hang" would coincide with the crash, because the on-access scanner is restarted within a short timeframe.
When the OAS is restarting and initializing, it has a noticeable impact on any other activity the system is undertaking, because the OAS threads run at High priority - everybody else takes a back seat. If the hang is not temporary, that is a different issue.
The scanner should not be crashing on its own or other McAfee files.
It's easy to assume it is scanning these files, but that may not be the case. Timeouts occur not only from scanning; they can also occur when the scanner attempts to get access to a file and is blocked until the fatal timeout threshold is reached, at which point the process self-terminates.
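The blocked-access case can be pictured with a minimal sketch (plain Python; this assumes nothing about McAfee's internals, and the function name and "return a string instead of self-terminating" behavior are purely illustrative):

```python
import threading

def scan_with_deadline(file_lock, deadline_s):
    """Illustrative only: wait for access to a contended file up to a
    deadline. A real scanner thread hitting its fatal timeout would
    self-terminate; here we just report what happened."""
    if file_lock.acquire(timeout=deadline_s):
        try:
            return "scanned"          # got the file, scan proceeds
        finally:
            file_lock.release()
    return "fatal timeout"            # never touched the file contents at all
```

The point of the sketch: a "timeout" can be logged against a file that was never actually scanned, because the time was spent waiting for access, not scanning.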
Configuring the scanner to avoid these files may prevent the issue from occurring, but you may be masking a problem that needs investigation and resolution.
I would suggest insisting that McAfee Support work with you more closely to get to the root of the problem - that is, if you are willing to see the investigation through, because it will require your assistance.
We have actually been working with Support for a month, and the issue may be resolved, or at least minimized.
Within ePO under Configuration, Server Settings, there is a Global Updating setting. It was set to Enabled, and we were asked to change it to Disabled.
This seems to have eliminated the crashes we were seeing, as well as much of the slowness associated with them.
Now, just to put a point on the slowness: we are speaking of the system hanging for 8 minutes, not scan timeouts and poor performance for 30-60 seconds. This was flat out no response to keystrokes, with Task Manager and Perfmon unresponsive.
Curious...can you give more details on your environment? If you look at my first response above, you'll see I already discovered the Global Updating fix within our environment. I am wondering if your environment is similar to ours.
When we began to notice our issues, we were in transition from 8.5 to 8.7. Initially there were very few users reporting any issues. We completed the 8.7 deployment except for a few stragglers. Our hardware ranged from P4 3.2 GHz with 512 MB to Core 2 with 8 GB, all running the same basic XP SP3 image.
We use ePO 4.0 and agents mostly at 4.0. When we first noticed the issue around June 2009, we were mostly at 8.7 no SP, agent 4.0, and scan engine 5300.
We got much more interested when we found increasing numbers of people reporting slowness issues and were able to find a way to reproduce it. During the investigation we were able to correlate the application logs with the hanging systems.
In our efforts to prepare for the usual things we are asked to do, we tried hotfixes, the latest apps, patches, agents, etc. In the end we were at 8.7 SP2, Agent 4.0.1494, engine 5400. All to no avail.
So at this point, short of uninstalling AV, we had the issue once a day.
Oddly, it was only once a day that it was hanging, which is why it was difficult to make a change and get immediate results.
McAfee worked with us to set up some high-risk and low-risk process on-access policies, and to exclude some files - again, to little improvement.
We did not have anything that we knew of running on a schedule at the times we were seeing our issue, but apparently global updating was the culprit. We were directed to disable it, and to the best of our knowledge the issue is gone.
There was a KB article that came out a week or so ago. It may not directly apply to you at version 8.5, since the 8.5 policies don't have this option, but it basically says that you should disable the scan-processes-on-enable option unless you need very high security.
What I am hearing from the people having this in my organization is that it happens after our server has gone out and updated the repository with the daily DAT file. When the machines come around to check in and pull the update, that is when they hang. It happens at the top of the hour, within a minute or so, and it appears to be the top of the next hour after the DAT updates reach ePO. I'm not sure if this helps in determining WHY it is happening, but it's fairly consistent among the people experiencing the hard locks. In the meantime, I wondered if setting affinity to a single core via Task Manager might keep the machines from completely flat-lining until I can work out a full resolution for this problem.
I agree with you now that I see what they had us do.
Our lock was a few minutes after the hour (noon), occasionally at 6pm; our global updates were set at about the same time, noon.
We also had jobs to pull product updates from McAfee in the automation on 7-hour intervals. Disabling global updating (which none of us remember ever changing, and which had been operational for two years or more) seems to have stopped our issues.
The question that I have is: why does the issue happen when global updates are done, but not when the automation updates are done?
We are still pulling updates through the server automation, and the clients are getting them when they check in, but there are no reported hangs. The workstations are still getting the updates within 10 minutes of automagically checking into ePO.
We are told that McAfee tends to put updates out around noon and some days around 6pm. We were not told which timezone that is, but I think it was relative to our time zone, PST.
We did not try to set the affinity to a CPU core.
In my scenario I had global updating set to the default of 20 minutes. Running 8.7i, I had already disabled the "Processes on enable" option, which greatly reduces the impact of a DAT update. What was really affecting us was the fact that we have a lot of virtual servers running on a SAN (n-series), and when the DAT updates occurred there was huge I/O on the SAN, causing major hangs. After changing the time from 20 minutes to 1 hour, all was resolved for us. 20 minutes was just not enough time to guarantee that the updates would be spread out enough not to impact the SAN.
"I had global updating set to the default of 20 minutes" --> changed to 1 hour
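Jeff's point about the randomization window can be sanity-checked with a quick back-of-the-envelope simulation (plain Python, not anything from ePO; the client count and the 2-minute update duration in the example below are assumptions for illustration only):

```python
import random
from bisect import bisect_right

def peak_concurrent_updates(n_clients, window_min, update_min, trials=50, seed=1):
    """Monte Carlo estimate of the worst-case number of clients updating
    simultaneously, when each client starts its update at a uniformly
    random offset inside the randomization window."""
    rng = random.Random(seed)
    worst = 0
    for _ in range(trials):
        starts = sorted(rng.uniform(0, window_min) for _ in range(n_clients))
        # Concurrency peaks just as some client starts: count starts in (t - d, t].
        peak = max(
            bisect_right(starts, t) - bisect_right(starts, t - update_min)
            for t in starts
        )
        worst = max(worst, peak)
    return worst
```

Widening the window from 20 minutes to an hour spreads the start times out and lowers the peak simultaneous I/O hitting the SAN, which matches the behavior described above.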
Thanks for your post Jeff,
We had the same issue, and it seems to have improved things a lot.
Still observing... but this parameter seems crucial for configurations with lots of VMs and SANs! (Here: thousands of VMs, VMware 3.5, NetApp SAN, VS 8.7 Patch 4.)
Resolved on my three-year-old Dell Latitude (the age may be relevant), basically by inspecting the Windows Application log.
In the log I saw the McAfee scanning engine repeatedly attempting to scan a file called SMManager.txt in \Device\HarddiskVolume2\Documents and Settings\All Users\Dell\UCM\.
I went to the location of the file, which incidentally belongs to Dell's fairly useless Connection Manager application, and saw it sitting at a size of over 2.5 GB: obviously too large (presumably to lift into isolated memory storage for scanning) for my 32-bit hardware to handle.
Deleting the file helped resolve the unexpected stalls and restarts of McAfee.
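If you want to hunt for similar oversized files proactively, before the scanner trips over them, a small sketch like this walks a tree and flags anything over a size threshold. The 2 GB default is an assumption based on the 32-bit situation above, not a documented McAfee limit:

```python
import os

def find_oversized(root, limit_bytes=2 * 1024**3):
    """Walk `root` and return (path, size) pairs for files bigger than
    `limit_bytes` - candidates for on-access scan stalls like the
    2.5 GB SMManager.txt described above."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # vanished mid-walk or access denied; skip it
            if size > limit_bytes:
                hits.append((path, size))
    return hits
```

Anything it reports is worth checking in the Application log before deciding whether to delete it or exclude it from scanning.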
As a side note: thumbs down to the Dell software devs of the Connection Manager, and a deep sigh for McAfee too, because the engine is not supposed to stop, no matter what. This, for example, prevented me from connecting to my corporate network and resulted in lost productivity.