1 Reply Latest reply on May 15, 2015 5:06 PM by wwarren

    Troubleshooting VSE after an incident

    pwalski

      Pertinent environment details

      ePO: 5.1

      VSE: 8.8 Patch 4

      Affected O/S: Windows 2003

       

      My team and I are often brought in somewhat late to the party when an incident occurs. In nearly every case I've been involved with here, the incident is along the lines of "Server A was unresponsive" and support staff "couldn't access the server". I put that in quotes because there is never a concrete definition of what "couldn't access the server" means: they couldn't log in via RDP, couldn't ping the server, could log in but response time was poor, and so on.

       

      In any case, what tends to happen is that the support staff somehow work through the issue, or the issue goes away, and then they come screaming at me that McAfee was the cause and I should disable/remove McAfee immediately (at which I roll my eyes). My quandary is this: since the reported issue is no longer occurring when I get involved, and cannot be recreated, how can I troubleshoot it and demonstrate whether or not it is VSE related? I have processes in place for troubleshooting while an issue is occurring, and I have been running the Profiler tool post-incident and taking some random samples from the system, but that only tells me what the system looks like when it is healthy.
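      One way to get evidence from before an incident, rather than after it, is to log a few performance counters continuously, for example with the built-in `typeperf` tool (`typeperf "\Memory\Pool Nonpaged Bytes" "\Memory\Available MBytes" -si 60 -o perf.csv`), and then pull out the window leading up to the reported time. The Python below is a minimal sketch of that second step only. It assumes typeperf's CSV layout (a header row, then a quoted timestamp column followed by one column per counter) and a US-style `MM/DD/YYYY HH:MM:SS.fff` timestamp; the function name `window_before` and the choice of counters are illustrative, not part of any McAfee tool.

```python
import csv
import io
from datetime import datetime, timedelta

# Assumed typeperf timestamp format (US locale); adjust for other locales.
TS_FORMAT = "%m/%d/%Y %H:%M:%S.%f"


def window_before(csv_text, incident, window):
    """Return (header, rows) for samples in [incident - window, incident].

    Each row is (timestamp, values), with one float per counter column.
    Blank or non-numeric cells (e.g. a failed sample) become None.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    start = incident - window
    rows = []
    for record in reader:
        if not record:
            continue
        try:
            ts = datetime.strptime(record[0], TS_FORMAT)
        except ValueError:
            continue  # not a sample line
        if start <= ts <= incident:
            values = []
            for cell in record[1:]:
                try:
                    values.append(float(cell))
                except ValueError:
                    values.append(None)
            rows.append((ts, values))
    return header, rows


if __name__ == "__main__":
    log = (
        '"(PDH-CSV 4.0)","Pool Nonpaged Bytes"\n'
        '"05/15/2015 16:00:00.000","120000000.000000"\n'
        '"05/15/2015 16:45:00.000","240000000.000000"\n'
    )
    # Incident reported at 17:00; look at the preceding hour.
    _, rows = window_before(log, datetime(2015, 5, 15, 17, 0), timedelta(hours=1))
    for ts, values in rows:
        print(ts, values)
```

The point of the design is that the logger runs all the time, so when support staff report a problem hours later there is already a record of the minutes before it.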

       

      Are there any recommendations on how to adequately prove to people that VSE is doing its job, and that maybe they need to strengthen their troubleshooting skills? Maybe not phrased exactly like that.

       

      Thanks much.

        • 1. Re: Troubleshooting VSE after an incident
          wwarren

          Based on the environment (Win2k3) and the symptom (which sounds like nonpaged pool memory depletion), the suspicion that comes to mind is the issue I describe here:

          Navigating Mines, second issue in the table.
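          If nonpaged pool depletion is the suspect, a slow leak should show up as a steady upward trend in the `\Memory\Pool Nonpaged Bytes` counter well before the server stops responding. The sketch below fits a least-squares line to (seconds, bytes) samples and estimates how long until an assumed ceiling is reached. The ceiling is a parameter you supply: for 32-bit Windows Server 2003 the nonpaged pool maximum is commonly cited as 256 MB (lower with /3GB), but treat that figure as an assumption and confirm it for your own systems. The function name `leak_trend` is hypothetical.

```python
def leak_trend(samples, limit_bytes):
    """Fit bytes = slope * t + intercept by ordinary least squares.

    samples: list of (seconds_since_start, bytes) pairs.
    Returns (slope_bytes_per_hour, hours_until_limit); hours_until_limit
    is None when the fitted trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_b = sum(b for _, b in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples need at least two distinct times")
    cov = sum((t - mean_t) * (b - mean_b) for t, b in samples)
    slope = cov / var_t  # bytes per second
    if slope <= 0:
        return slope * 3600, None
    intercept = mean_b - slope * mean_t
    last_t = max(t for t, _ in samples)
    fitted_now = slope * last_t + intercept  # trend value at the last sample
    remaining = max(limit_bytes - fitted_now, 0)
    return slope * 3600, remaining / slope / 3600


# Hourly samples growing by about 10 MB/hour, against an assumed 256 MB ceiling.
samples = [(0, 100e6), (3600, 110e6), (7200, 120e6)]
rate, hours_left = leak_trend(samples, limit_bytes=256 * 1024 * 1024)
print("%.1f MB/hour, %.1f hours to ceiling" % (rate / 1e6, hours_left))
```

A positive slope is not proof of a leak on its own, since pool usage also rises with load; it is a cue to capture a dump or check per-tag pool usage (for example with poolmon) while the trend is still climbing.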

           

          If you want to confirm it, configure the system so that a crash dump can be forced from the keyboard (a keyboard-initiated crash dump). The next time they report "unresponsiveness" plus "can't RDP", they should know to force a dump.
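          The keyboard-initiated dump is controlled by documented registry values: `CrashOnCtrlScroll` (DWORD 1) under the keyboard driver's `Parameters` key (`i8042prt` for PS/2 keyboards; `kbdhid` for USB keyboards, which is an assumption to verify on your Server 2003 build), plus the dump type in `CrashDumpEnabled` under `CrashControl` (2 = kernel memory dump, which includes pool memory). The sketch below only builds the text of a `.reg` file from those values; `build_crash_reg` is a hypothetical helper. After importing it and rebooting, the dump is triggered by holding the right Ctrl key and pressing Scroll Lock twice on a keyboard attached to the console; it is generally not expected to fire from an RDP session, and the boot-volume paging file must be large enough to hold the dump.

```python
# Documented registry locations for keyboard-initiated crash dumps.
KEYBOARD_KEYS = {
    "ps2": r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters",
    "usb": r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\kbdhid\Parameters",
}
CRASH_CONTROL = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl"

# CrashDumpEnabled values: 1 = complete, 2 = kernel, 3 = small memory dump.
DUMP_TYPES = {"complete": 1, "kernel": 2, "small": 3}


def build_crash_reg(keyboards=("ps2",), dump_type="kernel"):
    """Return .reg text enabling Right Ctrl + Scroll Lock (x2) crash dumps."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for kb in keyboards:
        lines.append("[%s]" % KEYBOARD_KEYS[kb])
        lines.append('"CrashOnCtrlScroll"=dword:00000001')
        lines.append("")
    lines.append("[%s]" % CRASH_CONTROL)
    lines.append('"CrashDumpEnabled"=dword:%08x' % DUMP_TYPES[dump_type])
    lines.append("")
    # Join with CRLF, the line ending regedit itself writes.
    return "\r\n".join(lines)


if __name__ == "__main__":
    # newline="" keeps the CRLF endings exactly as built.
    with open("crashdump.reg", "w", newline="") as f:
        f.write(build_crash_reg())
```

The resulting file can be imported with `regedit /s crashdump.reg`; the keyboard setting takes effect after a reboot.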

          If it is the issue I'm thinking of, we have no solution for you, but we do have a resolution: the problem can be avoided, or controlled. And while it's true they can say "I told you it was McAfee", you can respond in kind with "Yes, and it only occurs on Server 2003... two-thousand-and-three, what year is it?"... if you wanted to.