2 Replies Latest reply on Jan 29, 2016 9:47 AM by paul.k

    Failing ESM Database

    siemwarrior


      We have had our McAfee SIEM (v9.4 HF3) for 10 months. For the last couple of months we have been struggling to keep the system afloat. There is a list of issues, but the primary one seems to be loss of data after a database rebuild. The symptoms are normally the same: the system freezes, or logs/events stop appearing in the ESM GUI. We then try to restart the services with cpservices stop and start, but most of the time the services won't stop: cpservices stop hangs forever and never returns to the prompt. So we have to force a reboot. When I say force a reboot, often the reboot command won't reboot the server the first time and you have to enter it twice, which is a little strange.

       

      However, the outcome after a reboot is a database rebuild. Just to make things clear: we are currently forced to reboot the ESM once or twice a week (over the last couple of months), but a rebuild can also occur automatically between reboots. My assumption is that the rebuild fixes errors in the DB, but more often than not, when this occurs, millions of logs and events are wiped from the GUI.

       

      As a current example, this week I have been monitoring the system. Logs and events came in as expected Monday to Friday, and I checked the DB with the command:

       

      DBCheck -d '/usr/local/ess/data/ngcp.dfl|127.0.0.1|1111' -c

      All was fine; no errors were reported.

       

       

      Then on Friday evening logs stopped coming in, so I tried to restart the service. But the cpservice stop command just hung for 3 hours, so I had to reboot. The ESM went into rebuild database mode and, when it finished, I had lost all events from Monday and Tuesday of that week. But they were there before!
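
      One way to avoid sitting on a hung stop command for hours is to run it under a time limit and decide what to do once the limit expires. The following is only a rough Python sketch of that idea, not anything supplied by McAfee: the helper name run_with_timeout, the script name in the comment and the 600-second limit are all made up for illustration, and the command to run is whatever stop command you normally type.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of arguments). Return its exit code, or None on timeout."""
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Best effort: a process blocked in uninterruptible I/O may not exit
        # until that I/O returns, so we do not wait for it again here.
        proc.kill()
        return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python stop_with_timeout.py <your usual stop command>
    code = run_with_timeout(sys.argv[1:], timeout_s=600)
    print("timed out" if code is None else f"exit code {code}")
```

      This only bounds the wait and records that the stop did not finish; it does not address why the services hang in the first place.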

       

      I have been working with McAfee support regarding issues around partitions, but this problem isn't going away. To add to this, I think this issue is also causing file corruption, as other problems are starting to develop, such as an error when trying to enable syslog auto-learn and some other rule-based errors.

       

      Has anybody had similar issues, and if so, how were they resolved?

       

      Justin

        • 1. Re: Failing ESM Database
          alfa42

          Hello siemwarrior,

           

          In our organisation we are currently experiencing the same issues (on v9.5 MR7), with approximately three data losses affecting around 9 TB of information.

           

          McAfee support is still trying to figure out what is causing the crashes/hanging of the "cpservice" service and the data corruption. Did you get a solution to your problem or are you still experiencing the same issues?

           

          Regards.

          • 2. Re: Failing ESM Database
            paul.k

            Is this an all-in-one?

            Are you using a NAS for any of your storage?

             

            We have seen issues per 9.5.0 MR9 with the NFS mount hanging, causing the ELM stuff to cache and eventually causing the ESM to fall over. We would lose most of the events from the point the ESM falls over, and some prior events as well, due to the rebuild.
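
            If you want to catch a hanging NFS mount before it takes the ESM down, a simple probe can help. Below is a minimal Python sketch, assuming the mount is visible as an ordinary directory on the box; the function name mount_responds and the 5-second default are my own choices, and the path you pass in is whatever your mount point is. The stat call runs in a child process because a stat against a hung NFS mount can block indefinitely, and that must not block the checker itself.

```python
import subprocess
import sys


def mount_responds(path, timeout_s=5.0):
    """Return True if a stat of path completes successfully within timeout_s."""
    probe = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # Exit code 0 means os.stat succeeded in the child process.
        return probe.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # Best effort: the child may stay blocked until the mount recovers,
        # so we signal it and return without waiting on it again.
        probe.kill()
        return False
```

            Run from cron or a monitoring agent, a False result gives an early warning that the mount has stopped answering, or that the path is missing altogether.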

             

            Good luck