We recently upgraded the McAfee Agent from 4.6 (P3) to 4.8 (P2) in readiness for the end-of-life of MA4.6 and found some interesting results with our SuperAgents with respect to Lazy Caching. As this is not a particularly well-documented product feature, I have been doing some research and found another thread (here) which suggests that as of MA4.8 SuperAgents can request Lazy Cache content updates from other SuperAgents. As I understand from the same thread, under MA4.6 a SuperAgent was forced to update from the Master Repository only. Therein lies our problem. The following is a summary of our observations since moving to MA4.8; we have been using it for about 4 weeks now and the issue has manifested around half a dozen times.
- SA#1 receives a client request for content which it does not have so it contacts SA#2 requesting that content
- SA#2 also doesn't have that content, so it has to request it from another SA, which coincidentally requests that content from SA#1. SA#1 and SA#2 effectively get stuck in a loop, each waiting for the other to provide the update
- During that time SA#3, then SA#4 etc. also need content and request from SA#1 and also end up stuck in a holding pattern waiting for a response
- In all cases the SuperAgents will accept connections but in effect provide no response/data, instead appearing to hold the connection to the client open indefinitely. This is evident when attempting to access the Agent Log remotely (via HTTP): you just get the animated page-loading icon.
- The only way to recover is to use the agent logs on the SuperAgents to identify which servers are SA#1 & SA#2. Because they cannot be terminated gracefully, you have to kill the `FrameworkService.exe` and `naPrdMgr.exe` processes on SA#1 & SA#2 and then restart the services. By killing the Framework service you break the chain of SuperAgents waiting for content and they all become responsive again.
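To make the loop described above concrete, here is a minimal Python sketch of how SA#1 and SA#2 could be picked out once you know which SuperAgent is waiting on which. It is only an illustration: the `waits_on` map is hypothetical and would have to be built by hand from the agent logs, since I am not aware of any McAfee tool or API that exposes this directly.

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_on: Dict[str, str]) -> Optional[List[str]]:
    """Return the SuperAgents that form a waiting loop, or None if there is none.

    waits_on maps a SuperAgent name to the SuperAgent it is currently
    waiting on for content (hypothetical data, gathered from the logs).
    """
    for start in waits_on:
        path: List[str] = []
        seen = set()
        node = start
        # Follow the chain of "is waiting on" links until it ends or repeats.
        while node in waits_on and node not in seen:
            seen.add(node)
            path.append(node)
            node = waits_on[node]
        if node in seen:
            # The chain came back to a node already visited: that is the loop.
            return path[path.index(node):]
    return None


# The scenario from the list above: SA1 and SA2 wait on each other,
# while SA3 and SA4 are stuck behind SA1.
waits_on = {"SA1": "SA2", "SA2": "SA1", "SA3": "SA1", "SA4": "SA1"}
print(find_wait_cycle(waits_on))
```

Only the SuperAgents returned by the function need their processes killed; the ones queued behind them (SA3 and SA4 in this example) should recover once the loop is broken.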
I have an open case with McAfee support and have provided various MER logs, ProcDumps etc. and am still awaiting feedback. Currently our McAfee Agent Repository policy for SuperAgents is configured to use ping time to determine which repository to use for updates. In the interim, is it possible to use a policy to force all SuperAgents to update their Lazy Cache directly from the Master Repository, or is this behaviour by design in MA4.8 and not controllable? Also, is there a (possibly hidden) configurable parameter to specify a time-out value in the MA, so that the connection is terminated if the content is not provided within a defined period?
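Until there is an answer on an agent-side time-out, a client-side time-out can at least be used to spot SuperAgents that accept a connection but never reply. The following is a rough Python sketch using only the standard library. The function name, the status strings and the request path `/` are my own choices for illustration, and the port must be whatever your agent policy uses for remote log access (8081 in the usage note is an assumption, not something I have confirmed for every setup).

```python
import http.client


def probe_superagent(host: str, port: int, timeout: float = 10.0) -> str:
    """Classify a SuperAgent as 'responsive', 'no_response' or 'unreachable'.

    'no_response' means the TCP connection was accepted but no HTTP reply
    arrived within `timeout` seconds (or the connection broke mid-request),
    which matches the hung behaviour described in this post.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        try:
            conn.connect()
        except OSError:
            # Connection refused, host not found, or connect timed out.
            return "unreachable"
        try:
            conn.request("GET", "/")
            conn.getresponse().read()
            return "responsive"
        except (OSError, http.client.HTTPException):
            # Connected, but the read timed out or the reply was unusable.
            return "no_response"
    finally:
        conn.close()
```

For example, `probe_superagent("sa1.example.local", 8081, timeout=15)` (a hypothetical hostname) would return `"no_response"` for a SuperAgent stuck in the holding pattern, rather than leaving you staring at the page loading icon.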
Also, as a side note, I highly recommend that on any client tasks (e.g. DAT updates) you configure the "Stop the task if it runs for x hours" option to terminate the task after a period of time. Otherwise your end-points will wait for content from a non-responsive SA, and the updater won't process any other scheduled updates until the original task has completed.
I also recommend reading this thread (here) for anybody who is interested in gaining a basic understanding of how Lazy Caching works.