9 Replies Latest reply on Apr 2, 2015 12:19 PM by rth67

    Events per device

    gehinger

      Hi all,

       

      I would like to create a dashboard on which I can see the total event count per device. Ideally, I would also like to have a baseline and the deviation from this baseline.

      The goal is to spot abnormal behaviour: servers that generate too many or too few events compared with their respective baselines.

       

      I created a new dashboard with a table. I created a new query with which I select the fields "Device ID", "Device Name" and "Event count".

       

      • In my table, some devices appear on more than one line. I would like to group the results by device ID, but I can't find out how.
      • I have no idea how to add the baseline and the deviation, if that is possible at all.

       

      Maybe my approach is not a good one? I would appreciate any help or clue.

      Thank you in advance.

       

      regards,
      Guillaume.

       

      ESM version : 9.4.2
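
      Outside ESM, the two steps asked for in this question (one row per device ID, then a deviation from a baseline) can be sketched in a few lines of Python. This is only an illustration, not an ESM feature: it assumes the table has been exported as (device ID, device name, event count) rows, and the function names, device names and baseline value are all hypothetical.

```python
from collections import defaultdict


def total_events_per_device(rows):
    """Sum event counts so that each device ID appears exactly once."""
    totals = defaultdict(int)
    names = {}
    for device_id, device_name, event_count in rows:
        totals[device_id] += event_count
        names[device_id] = device_name
    return {dev: (names[dev], count) for dev, count in totals.items()}


def deviation_from_baseline(total, baseline):
    """Relative deviation: 0.5 means 50% above baseline, -0.5 means 50% below."""
    if baseline == 0:
        return None  # no meaningful ratio without a baseline
    return (total - baseline) / baseline


# Hypothetical export: device 144 appears on two lines, as in the question.
rows = [
    (144, "fw-01", 1200),
    (144, "fw-01", 300),
    (145, "web-01", 50),
]
totals = total_events_per_device(rows)
```

      Here the baseline is simply a number supplied by the caller; how ESM itself derives its baseline is not covered by this sketch.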

        • 1. Re: Events per device
          aszotek

          There are a few ways of doing this. Solutions that work include:

          - Bar charts - Collection rate [by Device] per second

          - Distribution charts

           

          My preference is to use Distribution charts per device type, as they show the number of events over time, so I can quickly see deviations or interruptions (the latter being rather frequent on McAfee SIEM despite the HA setup).

          The downside is the limited space (your screen) if you want a chart for every single device.

          • 2. Re: Events per device
            gehinger

            I created a Correlation rule with the following criteria:

             

            deviation_rule.jpg

             

            I also grouped by "Device ID". So my rule is as follows:

             

            deviation_rule2.jpg

             

            I then created a dashboard with a bar chart. I am not sure which event query I should select. I ran a few tests with my rule's signature ID as a filter, but my chart stays empty.

            And I really doubt that none of our servers are in the red. Any idea?

             

            Bonus question: the threshold cannot be < 1. Doesn't it use a normal distribution?

             

            Thanks in advance,
            GE
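
            On the bonus question above: the thread does not say how ESM computes the deviation, so the following is only a sketch under the assumption that the threshold counts standard deviations from the mean. The helper names are hypothetical; `NormalDist`, `mean` and `pstdev` come from Python's standard `statistics` module.

```python
from statistics import NormalDist, mean, pstdev


def z_score(value, samples):
    """Distance of value from the sample mean, in standard deviations."""
    sigma = pstdev(samples)
    if sigma == 0:
        return 0.0  # all samples identical: treat as no deviation
    return (value - mean(samples)) / sigma


def fraction_outside(k):
    """Share of a normal distribution lying more than k standard deviations from the mean."""
    return 2 * (1 - NormalDist().cdf(k))
```

            If that assumption holds, a threshold of 1 would already flag roughly a third of normally distributed samples (`fraction_outside(1)` is about 0.32), so a lower bound of 1 would not by itself rule out a normal model.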

            • 3. Re: Events per device
              gehinger

              Hi,

               

              Thanks for the answer. I didn't see it while I was typing my post.

              I will try those different approaches and keep you posted.

              • 4. Re: Events per device
                gehinger

                So,

                I now have a nice graph with the collection rate per second for each of my data-sources, ordered by avg_rate, and a distribution chart bound to it. Much easier than my approach ^^


                The problem is that even if I show baseline averages and margins, it still shows all the data-sources, regardless of their deviation.

                What I would like is to see only the sources that are below or above those margins, and/or to order my sources by those deviations.

                 

                I don't even know if this is feasible ^^

                • 5. Re: Events per device
                  aszotek

                  A baseline per data source is perfectly doable. I'm not sure which graph you are referring to; can you post a screenshot?

                  Bar charts can show multiple deviations. If your distribution chart shows all of them, you can bind it to the "collection rate per second" bar chart, and the distribution chart will change when you select each data source device. This is a blind guess, based on my graphs.

                  • 6. Re: Events per device
                    gehinger

                    Sorry for not being clearer. I will do my best ^^


                    Right now, I have something like this :

                    deviation_rule.jpg

                    The problem with this solution is that it requires manual interaction to see which data-source is not behaving normally. With more than 100 sources, it is not usable...

                    This is why I would like to order my data-sources not by average rate but by their deviation from their own baselines. The problem is that the baseline can be shown on the graph but cannot be used in the queries.
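
                    The ordering described here (each source ranked by its deviation from its own baseline, with in-margin sources hidden) can be sketched outside ESM as follows. Everything in it is an assumption for illustration: the rates are taken to be exported per data-source, the baseline is taken to be the plain mean of past rates, and the function name, source names and 25% margin are hypothetical.

```python
from statistics import mean


def rank_by_deviation(history, current, margin=0.25):
    """Return (source, deviation) pairs outside +/- margin, largest deviation first.

    history: source name -> list of past event rates
    current: source name -> latest event rate
    """
    outliers = []
    for source, rate in current.items():
        past = history.get(source)
        if not past:
            continue  # no baseline available for this source
        baseline = mean(past)
        if baseline == 0:
            continue  # relative deviation is undefined
        deviation = (rate - baseline) / baseline
        if abs(deviation) > margin:
            outliers.append((source, deviation))
    return sorted(outliers, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data: only db-01 and web-01 leave the 25% margin.
history = {"fw-01": [100, 110, 90], "web-01": [20, 20, 20], "db-01": [50, 50, 50]}
current = {"fw-01": 105, "web-01": 2, "db-01": 100}
ranked = rank_by_deviation(history, current)
```

                    With 100 or more sources, only the few that left their margins would remain in `ranked`, which removes the need to click through each source by hand.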

                    • 7. Re: Events per device
                      rth67

                      Have you thought about setting up Alarms for Devices that deviate from the Baseline? Again, if you want one for each Data Source, you will need to create 100 separate Alarms.

                      This isn't feasible in our environment, as we have about 2,000 Data Sources on one ESM, and about 1,000 on our other ESM.  We have Deviation from Baseline Alarms for the Devices themselves (Receivers, ACE, APM), as well as for particular groupings like Firewalls, VMware Hosts, or simply for Unknown Events.

                      • 8. Re: Events per device
                        LT McGary

                        RTH,

                         

                        What did you use for a baseline for your Alarms? A day, a week, a month?

                        • 9. Re: Events per device
                          rth67

                          It varies; you have to tweak the alarm to fit your environment.

                          Some examples are:

                          General Deviation from Baseline for a Device:

                               Query - Total Events; Time Frame - Last 8 Hours; Trigger when - 90% below baseline; Check Rate - 1 Hour

                          Others include settings such as:

                               Query - Total Events; Time Frame - Last 1 Week; Trigger when - 50% above, 50% below; Check Rate - 12 Hours

                               Query - Total Events; Time Frame - Last 2 Hours; Trigger when - 25% above; Check Rate - 1 Hour (Unknown Events from Unix/Linux/AIX systems)
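
                          To make the trigger logic of the three examples above concrete, here is a small Python sketch. It is not how ESM stores or evaluates alarms: the `DeviationAlarm` class, its field names and the alarm labels are hypothetical, and treating "N% above/below" as an inclusive comparison against the baseline is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviationAlarm:
    name: str                          # hypothetical label, not an ESM alarm name
    time_frame_hours: int              # window the event total is taken over
    check_rate_hours: int              # how often the alarm is evaluated
    pct_above: Optional[float] = None  # trigger when this many % above baseline
    pct_below: Optional[float] = None  # trigger when this many % below baseline

    def triggers(self, total_events, baseline):
        """Return True if total_events deviates from baseline past a configured limit."""
        if baseline <= 0:
            return False  # no usable baseline yet
        change_pct = (total_events - baseline) / baseline * 100
        if self.pct_above is not None and change_pct >= self.pct_above:
            return True
        if self.pct_below is not None and change_pct <= -self.pct_below:
            return True
        return False


# The three settings quoted above, expressed with the hypothetical class.
ALARMS = [
    DeviationAlarm("General device deviation", 8, 1, pct_below=90),
    DeviationAlarm("Weekly deviation", 168, 12, pct_above=50, pct_below=50),
    DeviationAlarm("Unknown events (Unix/Linux/AIX)", 2, 1, pct_above=25),
]
```

                          For instance, with a baseline of 1,000 events, the first alarm fires only once the total drops to 100 or fewer, while the third fires at 1,250 or more.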

                           

                          The increase in Unknown Events for certain types of systems is usually triggered by one of several causes: either someone enabled Debug mode (typically on a switch or router) or, in the case of some VMware hosts, the local storage for its logs filled up, so the host logs even more events saying it is out of space, over and over.

                           

                          We are currently dealing with a receiver (small older orange Nitro box) which has been getting "lowmem_reserve" messages followed by "IPSDBServer[1541]: Error: Could not send event(s) to correlator through socket - Unable to obtain lock(4)". When this occurs, the receiver replies to a ping but no longer processes events, does not allow ssh connections and, worst of all, does not accept incoming syslog messages. I just had to hard boot it again this morning due to this issue, after following up on the Alarm email and seeing on my dashboard that we had had no events from that receiver in the past 30 minutes.


                          My default dashboard contains a small Distribution view for each "Device" we have, 9 Receivers, ACE, APM, ePO, plus an "EPS" gauge for Total Events per Second of the system, so I can quickly see when an issue may be taking place.


                          We also have "Device Failure" alarms for each device that check every 10 minutes. There are occasional False Positives, but we usually see these before the Deviation from Baseline alarms, as we check more frequently.
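
                          The "no events in the past 30 minutes" check mentioned above can also be sketched outside ESM. This is a minimal illustration, assuming the time of the last event seen per device is available from some export; the function name, device names and the 30-minute default are hypothetical, and this is not how the "Device Failure" alarm is implemented.

```python
from datetime import datetime, timedelta


def silent_devices(last_event_at, now, max_silence=timedelta(minutes=30)):
    """Return the names of devices whose last event is older than max_silence."""
    return sorted(
        name for name, seen in last_event_at.items() if now - seen > max_silence
    )


# Hypothetical snapshot taken at a fixed time.
now = datetime(2015, 4, 2, 12, 0)
last_event_at = {
    "receiver-1": now - timedelta(minutes=45),
    "receiver-2": now - timedelta(minutes=5),
    "ace": now - timedelta(minutes=30),
}
stalled = silent_devices(last_event_at, now)
```

                          A device that answers a ping but has stopped processing events, like the receiver described above, would still show up in `stalled`, because the check looks at event flow rather than reachability.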