6 Replies Latest reply on Aug 9, 2016 8:15 AM by kmc

    ESM unresponsive

    acommons

      We have just hit a problem with our ESM where, without warning, we cannot log in or unlock existing sessions. Authentication attempts simply do nothing.

       

      Executing "service cpservice stop" changes the behaviour: a login attempt then generates an error message.

       

      After several hours the service still had not stopped, so we interrupted the script, killed the two cpservice-related processes, and then executed "service cpservice start".

       

      This initiated a database rebuild that went on for about an hour and then we had access to the ESM again.

       

      About 36 hours later we find ourselves in the same position with an unresponsive ESM.

       

      So a couple of questions:

      1. How long should a "service cpservice stop" command actually take to complete in the worst case?
      2. Is there anything else we should be trying to resolve this?

       

      Available disk space is shown below:

       

      Filesystem        Size  Used Avail Use% Mounted on
      /dev/sdb3         1.9T  9.1G  1.8T   1% /
      /dev/sdb1         975M  136M  790M  15% /boot
      /dev/sdc1         7.3T  3.4T  4.0T  46% /data_hd
      shm                32G     0   32G   0% /dev/shm
      /dev/sda          445G  313G  133G  71% /index_hd

       

      cheers,

      Andrew

        • 1. Re: ESM unresponsive
          ksudki

          Hello acommons,

           

          Have you already looked at the log /var/log/messages on the ESM?

           

          If so, were there any errors or warnings around the time you faced the issue?

           

          Regards

          • 2. Re: ESM unresponsive
            rcavey

            Do you have a large number of open alarms and cases, or automated reports that are running?

             

            If you can, keep a few ssh terminal sessions open to the ESM. After you restart and can log into the GUI, run the commands below, one per terminal; they might give you a clue.

             

            # htop

             

            # tail -f /var/log/messages

             

            Also, check disk usage on all your receivers under /var/log/data/inline/thirdparty.logs/<###>. Could you be overwhelming the system with EPS?
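
            To compare the directories under that path quickly, something like the sketch below may help. It assumes a POSIX shell with du, sort and head available on the receiver; top_dirs is a hypothetical helper name, not an ESM tool.

```shell
#!/bin/sh
# List the largest entries directly under a directory, biggest first.
# Sizes are in KiB. Usage: top_dirs <directory> [count]
top_dirs() {
    dir=$1
    count=${2:-10}
    # du -s: one total per entry; -k: report in KiB so sort -n compares correctly
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Example with the path mentioned above (run on each receiver):
# top_dirs /var/log/data/inline/thirdparty.logs 10
```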

            • 3. Re: ESM unresponsive
              acommons

              We are running a number of ssh sessions now to monitor some logs and performance.

               

              We are well within the EPS ratings for all devices, although we added a few new devices just before the problem started. These are not generating huge numbers of events.

               

              The attempt to stop cpservice never completes. We left it for about 60 hours and then killed the two cpservice processes and restarted the service.

               

              Nothing obvious in the logs so far.

               

              Now that we have the ESM functioning again, we will see whether a stop of cpservice completes, because if it doesn't we may have issues when we attempt to upgrade to the latest version.

               

              Update: cpservice stops cleanly now.

              • 4. Re: ESM unresponsive
                pepelepuu

                Actually, we are having the same problem, and it's basically a bug.

                - (Bug 37936) ESM is not responding

                We've gotten to the point where, until there is a fix, per McAfee Engineering, we have to kill cpservice whenever we encounter this.

                Open htop and note the PID of cpservice:


                kill -9 %PID%

                 

                Ours started on the 10th of this month.
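
                If you would rather give cpservice a chance to exit before forcing it, a rough sketch follows. It sends SIGTERM first and only falls back to kill -9 after a timeout. stop_or_kill is a hypothetical helper name, and the PID in the usage comment is a placeholder; given the bug, the process may well ignore SIGTERM and need the forced kill anyway.

```shell
#!/bin/sh
# Ask a process to exit with SIGTERM; fall back to SIGKILL after a timeout.
# Usage: stop_or_kill <pid> [timeout_seconds]
# Returns 0 if the process was already gone or exited after SIGTERM,
# 1 if SIGKILL had to be sent.
stop_or_kill() {
    pid=$1
    timeout=${2:-30}
    kill -TERM "$pid" 2>/dev/null || return 0   # no such process: nothing to do
    waited=0
    # kill -0 sends no signal; it only checks that the PID still exists
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            kill -KILL "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example (12345 is a placeholder for the PID noted in htop):
# stop_or_kill 12345 60
```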

                • 5. Re: ESM unresponsive
                  pepelepuu

                  By the way, keep an eye on the error log:

                  less /usr/local/ess/data/NitroError.Log

                  • 6. Re: ESM unresponsive
                    kmc

                    Hi,

                    Has anybody found a solution for this?

                    Previously we have hard rebooted the ESM. If anybody knows a specific solution or the root cause of the problem, that would be helpful.

                     

                    Regards,

                    KMC