Have you already looked at /var/log/messages on the ESM?
If so, were there any errors or warnings around the time you faced the issue?
Do you have a large number of alarms and cases open, or automated reports running?
If you can keep a few SSH terminal sessions open to the ESM, run the command below in each one after you restart and can log in to the GUI. It might give you a clue.
# tail -f /var/log/messages
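If the raw log scrolls too fast to read, something like the following might help narrow it down. This is only a sketch: scan_log is a made-up helper name, and the search pattern is a guess at what is worth looking for, not something McAfee documents.

```shell
# Hypothetical helper: print only lines that mention errors, warnings,
# or failures (case-insensitive). Defaults to /var/log/messages.
scan_log() {
    grep -iE 'error|warn|fail' "${1:-/var/log/messages}"
}

# Usage:
#   scan_log                      # scan /var/log/messages
#   scan_log /path/to/other.log   # scan a different file
```

For live monitoring, the same pattern can be applied to the tail output by piping it through grep with --line-buffered.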
Also, check the disk usage on all your receivers under /var/log/data/inline/thirdparty.logs/<###>. Could you be overwhelming the system with EPS?
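One way to compare the per-directory usage is sketched below. usage_by_dir is a hypothetical helper name, and it assumes du and sort are available on the receiver.

```shell
# Hypothetical helper: list each subdirectory of the given path with its
# size in KB, largest first.
usage_by_dir() {
    du -sk "$1"/*/ 2>/dev/null | sort -rn
}

# Usage:
#   usage_by_dir /var/log/data/inline/thirdparty.logs
```

A subdirectory that is far larger than the others would be the first place to look for a noisy data source.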
We are running a number of ssh sessions now to monitor some logs and performance.
We are well within the EPS ratings for all devices. We did add a few new devices just before the problem started, but these are not generating huge numbers of events.
The attempt to stop cpservice never completed. We left it for about 60 hours, then killed the two cpservice processes and restarted the service.
Nothing obvious in the logs so far.
Now that we have the ESM functioning again, we will see whether a stop of cpservice completes, because if it doesn't, we may have issues when we attempt to upgrade to the latest version.
Update: the cpservice stops cleanly now.
Actually, we are having the same problem, and it's basically a bug.
- (Bug 37936) ESM is not responding
Per McAfee Engineering, until there is a fix, we have to kill cpservice whenever we encounter this:
Open htop and note the PID of cpservice
kill -9 %PID%
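If htop is not handy, the PID lookup could probably be done from the shell instead. A rough sketch, assuming pgrep is installed on the ESM and that the process name is exactly cpservice (find_pids is a made-up helper name):

```shell
# Hypothetical helper: print the PIDs of all processes whose name exactly
# matches the argument, one per line.
find_pids() {
    pgrep -x "$1"
}

# Usage (destructive; only run this when cpservice is actually hung):
#   for pid in $(find_pids cpservice); do kill -9 "$pid"; done
```

Since there may be more than one cpservice process, the loop handles each PID in turn.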
Ours started on the 10th of this month.
By the way, keep an eye on the error log:
Has anybody found a solution for this?
Previously we have hard-rebooted the ESM. If anybody knows a specific solution or the root cause of the problem, that would be helpful.