
    Session drops occurring during Splunk Health Checks

    sivrat

      I work in a larger organization, and a while back we upgraded the code on our McAfee Enterprise Firewalls (Sidewinders) to 8.3.2P07. Shortly after that upgrade we started seeing a large number of "session drop" notifications on our log collectors. Notably, almost all of the notifications (95+%) are from Splunk Universal Forwarders talking to our main Splunk server. Several different people have looked into it but ended up putting it on the back burner for more pressing issues. It is causing a LOT of noise, though, so I'd like to see if anyone else has seen anything similar or has ideas on where to proceed.

      Here is what we have been able to determine so far:

      * We did tcpdumps on the devices capable of it. The matching sessions all last under 1 second (normally under 0.5 seconds, judging by the timestamps in Wireshark) and comprise a complete session: SYN, SYN/ACK, ACK; FIN/ACK, FIN/ACK, ACK. No RST packets are seen. (A sample capture command is sketched after this list.)

      * These sessions occur every 30 seconds and carry no payload. Based on my reading, that interval lines up with Splunk forwarder health checks (see the outputs.conf sketch after this list).
      * For each complete session (SYN, SYN/ACK, ACK, FIN/ACK, FIN/ACK, ACK), I am seeing six "session drop" events, which matches one event per packet.
      * "Session drops" only seem to be occurring on the health check events, as the TCP session that Splunk is sending actual data do not seem to be triggering any events.

      * The idle session timeout for this type of traffic is currently configured to 30 seconds. Again, the complete sessions last under a full second, so I don't think it's a timeout issue.
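
      In case it helps anyone reproduce the capture, something like the following should isolate the forwarder-to-server traffic (9997 is Splunk's default receiving port; the interface and indexer address are placeholders for your environment):

          tcpdump -i eth0 -nn -w splunk_hc.pcap 'host 192.0.2.10 and tcp port 9997'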
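
      If those 30-second empty sessions are the forwarder's tcpout heartbeat, the interval is controlled by heartbeatFrequency in outputs.conf on the forwarders (it defaults to 30 seconds, which matches what we see, and only applies when sendCookedData is true). A sketch of what I'd expect to tune, with placeholder group and server names:

          [tcpout]
          defaultGroup = primary_indexers
          # heartbeat/health-check interval in seconds; default is 30
          heartbeatFrequency = 300

          [tcpout:primary_indexers]
          server = splunk.example.com:9997

      Raising the interval wouldn't stop the drops from being logged, but it should cut how often they fire.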

       

      I'm more than willing to check on some settings, but I am not terribly experienced with all of the software/hardware involved.

       

      Even if I can't prevent the session drops from being triggered, preventing them from triggering six times per session would be a big improvement. I mainly just want to quiet our logs down, as this is probably around 40-50% of our daily event log. I'd prefer not to filter them out entirely at the log collector. That's a potential solution, but I'd rather fix what looks like a configuration issue than just ignore/filter the events.
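
      For completeness, if the collector happens to be rsyslog, a filter along these lines would suppress the noise at ingest (the source IP and match string are placeholders, not taken from our actual logs):

          # /etc/rsyslog.d/10-sidewinder-noise.conf
          if $fromhost-ip == '192.0.2.1' and $msg contains 'session drop' then stop

      But as I said, I'd treat that as a last resort.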