11 Replies, latest reply on Sep 19, 2017 5:55 AM by jamesmac

    SIEM : Daily Health Check




      Please let me know what we should check on a daily basis so that we don't miss anything.


      Kindly share a health-check checklist for all the devices: SIEM ESM, ELM, Rec, and ePO.


      Thanks in advance.

        • 1. Re: SIEM : Daily Health Check

          As a first draft to start the discussion, I would check:


          - The collection rate per device, to surface any device running below (quiet devices) or above (noisy devices) its baseline.

          - CPU, memory, and disk usage, temperature, etc.

          - DB Usage

          - Device outages / availability


          Each metric should be compared with its own baseline.
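A minimal sketch of the "compare with its own baseline" idea, assuming you can already export a short history of per-device collection rates (e.g. events per hour) from the SIEM. The function name `classify_rate`, the device names, the sample rates, and the tolerance of `k = 3` standard deviations are all hypothetical choices for illustration, not anything the ESM provides.

```python
from statistics import mean, stdev

def classify_rate(history: list[float], current: float, k: float = 3.0) -> str:
    """Compare a device's current collection rate with its own baseline.

    history: past rates for the same device (e.g. events per hour).
    k: how many standard deviations are tolerated (assumed default).
    Returns "quiet", "noisy", "normal", or "no-baseline".
    """
    if len(history) < 2:
        return "no-baseline"  # stdev() needs at least two samples
    mu = mean(history)
    sigma = stdev(history)
    if current < mu - k * sigma:
        return "quiet"  # below baseline: device may have stopped sending
    if current > mu + k * sigma:
        return "noisy"  # above baseline: device may be flooding events
    return "normal"

# Hypothetical device names and rates, for illustration only.
baselines = {"fw01": [1200, 1150, 1230, 1180], "dc02": [300, 310, 295, 305]}
current = {"fw01": 40, "dc02": 302}
for device, history in baselines.items():
    print(device, classify_rate(history, current[device]))  # fw01 quiet, dc02 normal
```

A per-device baseline matters here because a rate that is normal for one source (a busy firewall) would be alarming for another (a quiet domain controller).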

          • 2. Re: SIEM : Daily Health Check

            It would be great to have a comprehensive dashboard for health checks.

            • 3. Re: SIEM : Daily Health Check

              This kind of dashboard would sell me on upgrading to 9.5 and using a content pack to get the health check dashboard.

              • 4. Re: SIEM : Daily Health Check
                LT McGary

                Has anyone created a report or dashboard that provides this information automatically?

                • 5. Re: SIEM : Daily Health Check

                  This is one of my most immediate goals. I will share them as soon as they are stable.

                  • 6. Re: SIEM : Daily Health Check

                    There is an existing view called "Device Status" that you could use as a starting point. You can use it at any level: an individual data source or parent, a Receiver, or the top-level ESM.

                    It shows the following details on the ESM:

                    • Ten Minute Average CPU Load
                    • Current CPU Load
                    • Current Memory Utilization
                    • Current Hard Disk Utilization
                    • Block Device Statistics
                    • Uptime
                    • Ten Minute Event and Flow Rate


                    I built a custom view that serves as my default view. It has a small window showing the event distribution over the last hour from each Receiver, ACE, ePO, APM, etc., as well as an overall EPS gauge for the ESM. At the top of the view I still have my normal Event Summary, Event Count, and Distribution.


                    This lets me quickly check my overall EPS, as well as each appliance, to spot any missing events that have not triggered a Low Event Count alert.

                    • 7. Re: SIEM : Daily Health Check

                      I'm building reports on device and system health for all ESM devices, including the ESM itself. Once I have finalized and verified the content and automation, I will check with McAfee whether they will let me post them here in the community.

                      • 8. Re: SIEM : Daily Health Check

                        You can create a separate dashboard with distribution panels for the major devices in your architecture (ACE, NSM, ATD, ERC):

                        Query Type: Distribution

                        Device ID: ACE / NSM / ATD / ERC

                        This gives an overview of events per device, and you can spot a device outage from the ups and downs in its distribution.
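Reading outages off the "ups and downs" amounts to looking for runs of empty time buckets. Below is a minimal sketch, assuming you have exported one device's event counts per time bucket (for example 10-minute bins) from its Distribution panel; the export step itself, the `find_outages` name, the sample counts, and the `min_gap` threshold are hypothetical.

```python
def find_outages(buckets: list[int], min_gap: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) bucket indexes where the device sent zero events
    for at least `min_gap` consecutive buckets (end is inclusive)."""
    outages = []
    start = None  # index where the current run of zero buckets began
    for i, count in enumerate(buckets):
        if count == 0:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_gap:
                outages.append((start, i - 1))
            start = None
    # A run of zeros that lasts until the final bucket is still an outage.
    if start is not None and len(buckets) - start >= min_gap:
        outages.append((start, len(buckets) - 1))
    return outages

# Hypothetical event counts per bucket for one device.
counts = [120, 115, 0, 0, 0, 130, 0, 125]
print(find_outages(counts))  # [(2, 4)]
```

The `min_gap` parameter keeps a single empty bucket (a short lull) from being reported as an outage, which mirrors eyeballing the panel for a sustained drop rather than a blip.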

                        • 9. Re: SIEM : Daily Health Check

                          I have worked with the McAfee SIEM for nearly three years now, and making sure the appliances are healthy is not easily done. There should be an appliance health dashboard; instead, problems with the appliances are lumped into the event stream from data sources or never make it past /var/log/messages. We gave up and built something to monitor the very thing we paid good money for to monitor all of our things. It's not terribly sophisticated, but it's very effective. It goes like this:


                          1. Simple Perl scripts run hourly on each appliance and dump command-line output to an appliance status file, for example disk free, hardware RAID, logical RAID, dssummary, listings of key directories, etc.

                          2. A collection script on a Windows server gathers the status files from all the appliances.

                          3. A monitoring script parses the data into parameters for each appliance and tests whether each parameter is within green, yellow, or red tolerances.

                          4. The monitoring script sends a green status message every six hours if all is well; if there is a yellow or red condition, it emails or texts immediately.
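Steps 3 and 4 can be sketched roughly as follows. This is not the poster's actual tooling (which uses Perl on the appliances and a script on a Windows server); it is a Python illustration that assumes the status file has already been collected and contains `name=value` lines. The parameter names, the yellow/red tolerances, and the file format are hypothetical, and actually sending the email or text is left out.

```python
from datetime import datetime, timedelta

# Hypothetical tolerances: (yellow_at, red_at); higher values are worse.
TOLERANCES = {
    "disk_used_pct": (80.0, 90.0),
    "cpu_load_10min": (4.0, 8.0),
}
SEVERITY = {"green": 0, "yellow": 1, "red": 2}
HEARTBEAT = timedelta(hours=6)  # send an "all is well" message this often

def parse_status(text: str) -> dict[str, float]:
    """Split 'name=value' lines of a (hypothetical) status file into parameters."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a name=value line
        try:
            params[name.strip()] = float(value)
        except ValueError:
            continue  # non-numeric values are ignored in this sketch
    return params

def grade(name: str, value: float) -> str:
    """Map one parameter value onto green/yellow/red using its tolerances."""
    yellow_at, red_at = TOLERANCES[name]
    if value >= red_at:
        return "red"
    if value >= yellow_at:
        return "yellow"
    return "green"

def evaluate(params: dict[str, float]) -> dict[str, str]:
    """Grade every parameter that has a tolerance defined; ignore the rest."""
    return {n: grade(n, v) for n, v in params.items() if n in TOLERANCES}

def decide(results: dict[str, str], last_green: datetime, now: datetime) -> str:
    """Alert immediately on yellow/red; otherwise send a periodic green heartbeat."""
    worst = max(results.values(), key=SEVERITY.__getitem__, default="green")
    if worst != "green":
        return f"alert now: {worst}"
    if now - last_green >= HEARTBEAT:
        return "send green heartbeat"
    return "no message"

status = "disk_used_pct=85\ncpu_load_10min=1.2\nraid=OK\n"
results = evaluate(parse_status(status))
print(results)
print(decide(results, last_green=datetime(2017, 9, 19, 0, 0), now=datetime(2017, 9, 19, 2, 0)))
```

The periodic green message is what makes this approach trustworthy: if the heartbeat stops arriving, the monitoring itself has failed, which a silent "alert only on problems" setup would never reveal.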


                          It's a shame to have to resort to this, but with 13K+ data sources in play and dummy parent folders, the little red or yellow flags are always showing. We've also tried setting up alerts for health events, but things have gone badly wrong without a peep from them.


