4 Replies Latest reply on Jul 14, 2012 3:25 PM by orcusporcus

    ePO fails to deploy agent

      Hello.

      I am deploying the McAfee Agent to a new environment.

       

      I have about 10k active machines, but AD holds about 22k machine accounts (yeah, a lot of garbage).

      I have already imported 7k machines, of which only 2k are active.

       

      What I did was set up a recurring task that tries to deploy the McAfee Agent to all unmanaged machines, the aim being to eventually reach the ones that were offline.

       

      What I am facing now is that when I import new machines, most of them fail to install the agent (error: the network path is unavailable).

      I took about 300 machines and started the deployment task, but most of them failed (and most of them were online).

      I tried to connect to one machine to troubleshoot (browsed to it with "\\" and opened a psexec connection) and both were successful.

      Started the deployment task against 6 machines only (with the chosen one in the middle): all failed.

      Started the deployment task against 3 machines only (with the chosen one in the middle): all failed.

      Started the deployment task against that machine only: success.

       

      Repeated this test with 4 other machines.

      Every time the task was run against more than one machine, it failed.

      Every time the task was run against only one machine, it was successful.

       

      I have imported more machines at different times and the behaviour stayed the same.

       

      I tend to think this problem is related to the number of simultaneous open connections, as my recurring task (which tries to install the McAfee Agent on unmanaged systems) runs every 30 minutes.

      When it starts, it tries to install the McAfee Agent on about 6k machines.

      Is there a problem with this scenario?

      In the end I'll have a task starting 10k connections to unmanaged systems every hour (because of the AD garbage).

      Most of them don't even resolve in DNS, so ePO won't even open a connection to them.

      AFAIK there is no limitation to the maximum simultaneous connections ePO can open to remote systems (nor Windows 2008).

       

      I want opinions on 2 topics:

      1 - ePO failing to install McAfee Agent to more than 1 machine at once;

      2 - A recurring task trying to install the McAfee Agent on about 10k machines at once.

       

      And maybe

      3 - Are 1 and 2 related?

       

      Thanks in advance.

        • 1. Re: ePO fails to deploy agent
          JoeBidgood

          Hi...

          AFAIK there is no limitation to the maximum simultaneous connections ePO can open to remote systems (nor Windows 2008).

           

          Unfortunately this is not correct - there is a very definite limit to the maximum number of connections. The ePO server service, which is essentially an Apache server, maintains a dynamic connection pool, but the maximum size of that pool is 250.

          I would strongly recommend against a task that tries to send agent installs to 10,000 machines in one go - a task like this will put a considerable load on the server. It's entirely possible that it wouldn't complete in an hour, in which case you'd have pending tasks stacking up behind each other.

           

          In the first instance, I'd suggest trying to find out why the agent install tasks are failing. What details are shown in the server task log when the task fails? Also check the server.log for the time that the task ran - what errors if any are shown?

           

          HTH -

           

          Joe

          • 2. Re: ePO fails to deploy agent

            I have read about this limit of 250 simultaneous connections, but I understood that it applied to incoming connections to the Apache server (i.e. agent-to-server communication).

            Does this limit really apply to outgoing connections? That doesn't make sense to me.

            Agent-to-server connections go from the remote agent to port 80 on ePO (which is where Apache listens).

            Agent deployment goes from the ePO machine (from a random port) to port 445 on the remote machine, which makes me think this kind of communication has more to do with Windows than with Apache itself.

            Another point is that at the moment the agent installation task is running against 4k machines and takes about 20 minutes to complete.

            On each run it finds about 5 new online machines and about 50 failures, and the rest just time out, which leads me to think that the task is running fine.

            Oh, and since at least half of those 4k machines are old garbage from AD, they don't even have a DNS entry, which makes me think that ePO wouldn't even "spend" a connection on them, since it can't resolve their names to IPs.

             

            Does what I think make any sense, or am I "in a galaxy far, far away"?

            • 3. Re: ePO fails to deploy agent
              JoeBidgood

              I was overgeneralising a bit: the 250 limit is for the incoming connection pool, true, but the ePO server has a finite amount of resources to devote to this sort of task. That was the point I was aiming for.

               

              I would still recommend (a) avoiding a scenario where the server is trying to send 10,000 agent installs, and (b) trying to troubleshoot the failures first.

               

              HTH -

               

              Joe

              • 4. Re: ePO fails to deploy agent

                What I came up with (based on info from the thread https://community.mcafee.com/thread/27668) was:

                - Connected to a remote machine for which ePO had reported the error message;

                - Confirmed that FramePkg.exe was there;

                - Tried to execute FramePkg.exe /install=agent /silent and it gave me the same error as in the thread above (corrupted file);

                - Copied FramePkg.exe via "\\" to the remote machine, and when I tried to execute it, it gave me the same error again (corrupted file);

                - The copied file had the same size as the original, but when I took a SHA hash of each, I found that the hashes of the original and the copy were different.
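
                A minimal Python sketch of the check in the last step, for anyone who wants to repeat it. This is not the tool used above (the post does not say which one was used); SHA-256 and the function names are assumptions chosen for illustration. It shows why a matching file size proves nothing while a matching digest does.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(original_path, copied_path):
    """True only if both files produce the same digest."""
    return sha256_of(original_path) == sha256_of(copied_path)
```

                Calling same_content() with the path of the original FramePkg.exe and the path of the copy on the remote share would return False for a copy that was damaged in transit, even when both files have the same size.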

                 

                So what was really causing the problem was "in transit" corruption of the FramePkg.exe file, since many remote offices have small and already strained links.

                What I did to solve this problem (since I had about 2.5k machines with this error) was to write a .bat script that uses psexec.

                In short, the script made the remote machine open an FTP connection to the ePO machine, download the FramePkg.exe file and then execute it.
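
                The .bat file itself is not shown here, so the following is only a Python sketch of the same idea, with one addition that the script described above is not stated to have had: the download is checked against a known SHA-256 digest and retried if it does not match. The function names, the anonymous login and the retry count are all assumptions.

```python
import hashlib
import io
from ftplib import FTP


def ftp_fetch(host, remote_name, user="anonymous", password=""):
    """Download one file over FTP in binary mode and return its bytes."""
    buffer = io.BytesIO()
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.retrbinary("RETR " + remote_name, buffer.write)
    return buffer.getvalue()


def fetch_verified(fetch, expected_sha256, attempts=3):
    """Call fetch() until the payload matches the expected digest."""
    for _ in range(attempts):
        payload = fetch()
        if hashlib.sha256(payload).hexdigest() == expected_sha256:
            return payload
    raise IOError("payload still corrupted after %d attempts" % attempts)
```

                It could be used as fetch_verified(lambda: ftp_fetch("epo.example.local", "FramePkg.exe"), known_digest), where the host name is a placeholder and known_digest is the digest taken from the original file on the ePO server. Only a payload that passes the check would then be written to disk and executed.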

                 

                I believe FTP has better error correction than SMB/CIFS as almost all of the 2.5k machines have the McAfee Agent installed now.

                 

                 

                 

                 

                As for the task installing the McAfee Agent against 10k machines, I still don't have a firm opinion.

                When I start it, I can see that the process holding the open connections is neither Apache nor Tomcat.

                It's the "System" process that opens the connections from the ePO machine to port 445 on the remote hosts.

                 

                I watch "netstat" closely when I start the task.

                It stays between 80 and 200 simultaneous connections (open or half-open).

                I can see connections in the "ESTABLISHED" and "SYN_SENT" states.
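
                Counting those states by eye gets tedious, so here is a small Python sketch that tallies them from saved netstat output. It assumes the usual Windows "netstat -an" column layout (Proto, Local Address, Foreign Address, State), where the half-open state is printed as SYN_SENT; the function name is made up for this example.

```python
from collections import Counter


def count_states(netstat_text, remote_port=445):
    """Count TCP connection states for connections to the given remote port."""
    counts = Counter()
    suffix = ":%d" % remote_port
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected layout: Proto, Local Address, Foreign Address, State
        if (len(fields) >= 4 and fields[0].upper() == "TCP"
                and fields[2].endswith(suffix)):
            counts[fields[3]] += 1
    return counts
```

                Feeding it the text captured from "netstat -an" while the task runs gives one number per state, which makes it easy to log how the ESTABLISHED and SYN_SENT counts change over the 20 minutes.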

                 

                My guess:

                - Of these 10k machines, 3k don't even have a DNS entry, so no connection will be opened to them;

                - Of the remaining 7k, 6k are non-existent machines that still have a DNS entry. They are responsible for a lot of the "SYN_SENT" connections in "netstat". Since these machines do not exist, each half-open connection is closed after a few seconds because the 3-way handshake never completes, so the server only allocates resources for it for a little while.

                - The other 1k machines it will actually connect to, one by one, as I can see some "ESTABLISHED" connections in "netstat".
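
                The three groups in this guess could be measured instead of estimated. Below is a rough Python sketch that puts one host name into one of the three buckets: no DNS entry, DNS entry but nothing answering on port 445, or reachable. The bucket names, the 3-second timeout and the injectable resolve/connect parameters are assumptions made for this example, and it says nothing about how ePO itself handles these cases internally.

```python
import socket


def classify_host(name, port=445, timeout=3.0,
                  resolve=socket.gethostbyname,
                  connect=socket.create_connection):
    """Return 'no-dns', 'unreachable' or 'reachable' for one host name."""
    try:
        address = resolve(name)
    except OSError:
        # Name resolution failed: no connection attempt is made at all.
        return "no-dns"
    try:
        connection = connect((address, port), timeout)
    except OSError:
        # Covers timeouts (handshake never completed) and refused connections.
        return "unreachable"
    connection.close()
    return "reachable"
```

                Running this over the list of unmanaged systems exported from ePO would show how many names really fall into each group, and the "no-dns" ones could be cleaned out of AD rather than retried on every run.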

                 

                I think there is some kill switch that ends the task after 20 minutes, as it always ends 21 or 22 minutes after it starts (whether it runs against 10k or 2k machines).

                As I can see 80 to 200 simultaneous connections to port 445 from ePO while the task is running (and the destinations keep changing), I think it can successfully run against 10k machines.