We have just hit a problem with our ESM where, without warning, we cannot log in or unlock existing sessions. Authentication attempts simply do nothing.
Executing "service cpservice stop" changes the behaviour and a login attempt generates an error message.
After several hours the service still had not stopped, so we interrupted the script, killed the two cpservice-related processes, and then executed "service cpservice start".
This initiated a database rebuild that went on for about an hour and then we had access to the ESM again.
About 36 hours later we find ourselves in the same position with an unresponsive ESM.
So a couple of questions:
Available disk space is shown below:
|Filesystem|Size|Used|Avail|Use%|Mounted on|
|/dev/sdb3|1.9T|9.1G|1.8T|1%|/|
|/dev/sdb1|975M|136M|790M|15%|/boot|
|/dev/sdc1|7.3T|3.4T|4.0T|46%|/data_hd|
|shm|32G|0|32G|0%|/dev/shm|
|/dev/sda|445G|313G|133G|71%|/index_hd|
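If it helps to keep an eye on this over time, here is a rough sketch that flags any filesystem at or above a chosen fill level. The function name `over_threshold` and the 70% cut-off are illustrative assumptions, not anything ESM provides, and it expects `df -P` style columns on stdin.

```shell
# Hypothetical helper: list mount points whose Use% is at or above a threshold.
# Reads POSIX-format df output (df -P) on stdin; the threshold is arg 1 (default 70).
over_threshold() {
    awk -v limit="${1:-70}" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)          # strip the percent sign
            if (use + 0 >= limit + 0)  # force a numeric comparison
                print $6, $5
        }'
}

# Example: df -P | over_threshold 70
```

With the figures above, only /index_hd (71%) would be reported at a threshold of 70. Mount points containing spaces are not handled by this sketch.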
Do you have a large number of alarms and cases open, or automated reports running?
If you can, keep a few SSH terminal sessions open to the ESM and run the below in each terminal after you restart and can log in to the GUI. It might give you a clue.
# tail -f /var/log/messages
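If the live tail scrolls too fast to read, a filtered look back at the same file may be easier. This is only a sketch: the helper name `recent_suspects` and the keyword list are my own guesses at what might be interesting, not known ESM log markers.

```shell
# Hypothetical helper: show the most recent log lines that mention cpservice
# or contain common trouble keywords. The keyword list is a guess.
recent_suspects() {
    file="${1:-/var/log/messages}"   # log path taken from the command above
    count="${2:-20}"                 # how many matching lines to show
    grep -iE 'cpservice|error|fail|out of memory' "$file" | tail -n "$count"
}

# Example: recent_suspects /var/log/messages 50
```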
Also, check disk usage on all your receivers under /var/log/data/inline/thirdparty.logs/<###>. Could you be overwhelming the system with EPS?
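One way to see which of those directories is largest is sketched below. The helper name `largest_dirs` is made up for illustration, and I am assuming the numbered directories sit directly under the thirdparty.logs path on each receiver.

```shell
# Hypothetical helper: print per-subdirectory usage in KiB, largest first.
# Arg 1 is the parent directory to inspect.
largest_dirs() {
    base="$1"
    for d in "$base"/*/; do
        # Skip the unexpanded pattern if there are no subdirectories.
        [ -d "$d" ] && du -sk "$d"
    done | sort -rn
}

# Example: largest_dirs /var/log/data/inline/thirdparty.logs
```

Running it a few minutes apart and comparing the numbers should show whether any one data source is growing much faster than the rest.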
We are running a number of ssh sessions now to monitor some logs and performance.
We are well within the EPS ratings for all devices, although we added a few new devices just before the problem started. These are not generating huge numbers of events.
The attempt to stop cpservice never completes. We left it for about 60 hours and then killed the two cpservice processes and restarted the service.
Nothing obvious in the logs so far.
Now that the ESM is functioning again, we will see whether a stop of cpservice completes. If it doesn't, we may have issues when we attempt to upgrade to the latest version.
Update: the cpservice stops cleanly now.
Actually, we are having the same problem and it's basically a bug.
- (Bug 37936) ESM is not responding
We've gotten to the point where, until there is a fix, per McAfee Engineering, we have to kill cpservice whenever we encounter this:
Open htop and note the PID of cpservice
kill -9 %PID%
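If you would rather not go straight to `kill -9`, a gentler variant is sketched below: send SIGTERM first, wait a little, and only then force it. To be clear, this is not what McAfee Engineering suggested, just a common pattern; the helper name `stop_pid` and the grace period are my own assumptions, and a hung cpservice may well ignore the SIGTERM anyway.

```shell
# Hypothetical helper: ask a process to exit, then force it if it is still
# alive after a grace period. Arg 1 is the PID, arg 2 the grace in seconds.
stop_pid() {
    pid="$1"
    grace="${2:-10}"
    kill -TERM "$pid" 2>/dev/null || return 0    # nothing to do: already gone
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # exited during the grace period
        sleep 1
        grace=$((grace - 1))
    done
    kill -KILL "$pid" 2>/dev/null                # same effect as kill -9
}

# Example: stop_pid 12345 15
```

To avoid reading the PID out of htop, something like `pgrep -f cpservice` should list the matching PIDs, assuming the process command line actually contains "cpservice".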
Ours started on the 10th of this month.