5 Replies. Latest reply on Nov 3, 2016 10:58 AM by falconevo

    McAfee ePO 5.3.2 hanging on installation

    alex.davidson

      Hi all,

       

      I'm trying to upgrade our ePO server from 5.1.1 to 5.3.2 in the hope that it solves some issues we are having. However, I've now been waiting 15 hours for the installer to get past the initial "Computing Space Requirements" screen.

       

      The only thing I could think of that might be affecting this is that the installer warned me of 10,000+ events waiting to be written to the database (funnily enough, this is part of one of the problems I'm hoping 5.3 will solve). So I moved the event XML and PKG files out of the default folder, but then the installer wouldn't even compute space requirements! I've put some (not all) of these XML and PKG files back into the default folder, and now the installer gets further, but is it really meant to take this long?

       

      This is the first time I've had to upgrade ePO on my own, so I'm lost before I start; any help is appreciated.

       

      I also don't really know what these PKG and XML files in the DB\Events folder even contain, so if anyone can educate me there, that'd be grand.

        • 1. Re: McAfee ePO 5.3.2 hanging on installation
          falconevo

          Stop the McAfee Event Parser service, then cut ONLY the .XML and .PKG content out of this folder in the ePO install directory:

           

          ***\McAfee\ePolicy Orchestrator\DB\Events

          Paste the content into an alternate location for safekeeping. Restart the Event Parser service and retry the upgrade/installation.
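
          If you would rather script the "cut only the XML and PKG files" step than do it by hand in Explorer, a minimal Python sketch could look like the one below. It assumes the Event Parser service has already been stopped (the script does not touch services), and the function name and the backup path in the usage comment are hypothetical; point events_dir at the DB\Events folder of your own ePO install.

```python
import shutil
from pathlib import Path

# Only these extensions are moved, per the advice above; .txml and anything else stays put.
EVENT_SUFFIXES = {".xml", ".pkg"}


def move_event_files(events_dir, backup_dir):
    """Move *.xml and *.pkg files (any letter case) from events_dir into backup_dir.

    Assumes the Event Parser service is already stopped, so none of the files are locked.
    Returns the names of the files that were moved.
    """
    events = Path(events_dir)
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    moved = []
    for item in sorted(events.iterdir()):
        if item.is_file() and item.suffix.lower() in EVENT_SUFFIXES:
            shutil.move(str(item), str(backup / item.name))
            moved.append(item.name)
    return moved


# Hypothetical paths: substitute your own ePO install location and backup folder.
# move_event_files(r"<ePO install>\DB\Events", r"D:\EventsBackup")
```

          The suffix check is case-insensitive because agents and Windows tooling are not consistent about .PKG vs .pkg.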

           

          If your problem is the Event Parser service failing because agents have dropped huge numbers of PKG files into that folder, be aware that the same problem still exists in 5.3.2 build 400. In my case the culprits are Linux servers with two AV products installed (ClamAV and McAfee): they end up fighting over file locks and generating masses of agent events for ePO to parse.

           

          Once you have finished the upgrade/install, you can cut/paste the content back into the Events folder you cleared earlier. I wouldn't advise doing this in bulk; maybe 5 PKG files at a time. The Event Parser service automatically extracts the PKG files to disk and drops the .xml content there for the service to parse. If you drop too many in at once and don't have sufficient CPU and disk resources to cope with them all, it will crash the service. They need to add a queuing mechanism to the service, but I doubt they will ever fix it.
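
          The "5 PKG files at a time" drip-feed can be automated. The Python sketch below is one hedged way to do it: it moves a batch back, then waits until the Events folder has drained (the extracted XML files count too) before moving the next batch. The function names, the default 30-second poll and the max_pending threshold are all assumptions for illustration, not anything ePO provides.

```python
import shutil
import time
from pathlib import Path


def pending_count(events_dir):
    """Number of files currently sitting in the Events folder, of any type."""
    return sum(1 for p in Path(events_dir).iterdir() if p.is_file())


def feed_back_in_batches(backup_dir, events_dir, batch_size=5, max_pending=0,
                         suffix=".pkg", poll_seconds=30.0, sleep=time.sleep):
    """Move backed-up files with the given suffix into events_dir, batch_size at a time.

    Before each batch it polls (every poll_seconds) until the Events folder holds at
    most max_pending files, so the parser only ever has about one batch of backlog.
    Returns the number of batches moved.
    """
    events = Path(events_dir)
    todo = sorted(p for p in Path(backup_dir).iterdir()
                  if p.is_file() and p.suffix.lower() == suffix)
    batches = 0
    for start in range(0, len(todo), batch_size):
        # Wait for the parser to work through what it already has.
        while pending_count(events) > max_pending:
            sleep(poll_seconds)
        for item in todo[start:start + batch_size]:
            shutil.move(str(item), str(events / item.name))
        batches += 1
    return batches
```

          If some files never leave the Events folder (for example ones the parser cannot process), raise max_pending or the loop will wait forever. Run it a second time with suffix=".xml" to return the loose XML files the same way.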

          • 2. Re: McAfee ePO 5.3.2 hanging on installation
            alex.davidson

            Thanks for the confirmation. I've managed to get past that initial hang now; however, after however long the install takes, I get hit with "install wizard was interrupted" at the end and I can't get past that.

             

            Could you let me know where the various error logs are, so I can check them and try to find out what is happening?

            • 3. Re: McAfee ePO 5.3.2 hanging on installation
              falconevo

              To troubleshoot the installer, look in the following folder for the logs:

               

              C:\ProgramData\McAfee\ePolicy Orchestrator\InstallLogs\

               

              It contains the install, debug and MSI logs.
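
              A common first step with Windows Installer verbose logs is to search for "Return value 3", which marks an action that failed; the lines just above the first hit usually explain why. The Python sketch below does that search across the InstallLogs folder. The "*.log" pattern is an assumption about the file names, and the ePO install/debug logs may phrase errors differently, so the marker is a parameter. MSI logs are often UTF-16, hence the BOM check.

```python
from pathlib import Path


def _read_log(path):
    """Decode a log file; Windows Installer logs are often UTF-16 with a BOM."""
    raw = path.read_bytes()
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        return raw.decode("utf-16", errors="replace")
    return raw.decode("utf-8-sig", errors="replace")


def find_failures(log_dir, pattern="*.log", marker="Return value 3"):
    """Return (file name, line number, line) for every line containing marker."""
    hits = []
    for log in sorted(Path(log_dir).glob(pattern)):
        if not log.is_file():
            continue
        for lineno, line in enumerate(_read_log(log).splitlines(), start=1):
            if marker in line:
                hits.append((log.name, lineno, line.strip()))
    return hits


# for name, lineno, line in find_failures(r"C:\ProgramData\McAfee\ePolicy Orchestrator\InstallLogs"):
#     print(f"{name}:{lineno}: {line}")
```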


              Just out of curiosity, what problem are you having? Is it agent events locking up the Event Parser service?

              • 4. Re: McAfee ePO 5.3.2 hanging on installation
                alex.davidson

                There are two issues I'm having:

                 

                The first is that I'm trying to upgrade from 5.1 to 5.3 and the upgrade keeps failing. The second is that over the last month or two the ePO server has been dying a severe death, and we recently found 65,000+ XML files waiting in the ePO Events folder and a further 100,000+ in the debug folder. Insane RAM use by SQL Server then became a bit of an issue, but more critical than that was SQL Server chewing up pretty much all the disk I/O, mainly on reads.

                 

                In fact, I've just turned the Event Parser back on, and within 5 minutes the once-empty Events folder had over 20,000 XML files in it. Watching Resource Monitor, sqlserver hit over 19,000,000 B/sec before I had to pull the plug and stop the parser service.

                 

                I've got another thread going for the installation failure here: Failed upgrade installation of either ePO 5.3.1 or 5.3.2. I'm probably going to start another thread for the major overuse of resources once the Event Parser is on.

                • 5. Re: McAfee ePO 5.3.2 hanging on installation
                  falconevo

                  Well, I can tell you now that the Event Parser issue is not yet fixed. It falls on its face due to a lack of queue limiting: it simply has too much work to do for the resources likely to be assigned to a McAfee ePO server.


                  Hard disk IO is an issue for the Event Parser in large implementations; even with log filtering turned down to a minimum, some servers will spam threat events in certain scenarios. I have dealt with the issue for a while, and here's what I used to mitigate it.

                   

                  **\McAfee\ePolicy Orchestrator\DB\Events\


                  This folder receives the XML, TXML and PKG files from the agents. PKG files are compressed bundles of around 10 MB, each containing an enormous number of XML files generated by the agent. From what I can tell, when the agent needs to send data to ePO in bulk, it packages it up as PKG files. When a PKG is dropped in the Events folder, the Event Parser spools through the files, churning through them one by one with seemingly no 'hard limit' on how much it tries to process at any one time. In a small implementation this is unlikely to be an issue, but in large ones it's a problem.
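
                  To make "no hard limit" concrete, here is what the missing queue limiting looks like in general terms. This is a hypothetical Python illustration of back-pressure, not McAfee's code or a description of how the Event Parser is built: the producer blocks once `limit` items are outstanding, so work is admitted only as fast as it is finished instead of piling up on disk and CPU all at once. The class name and default limits are invented for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BoundedProcessor:
    """Process work items with at most `limit` queued or running at any moment."""

    def __init__(self, handler, limit=4, workers=2):
        self._handler = handler
        self._slots = threading.BoundedSemaphore(limit)
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._lock = threading.Lock()
        self._in_flight = 0
        self.peak_in_flight = 0  # highest number of outstanding items observed

    def _run(self, item):
        try:
            return self._handler(item)
        finally:
            with self._lock:
                self._in_flight -= 1
            self._slots.release()  # free a slot so the producer may submit more

    def submit(self, item):
        self._slots.acquire()  # blocks the producer while `limit` items are outstanding
        with self._lock:
            self._in_flight += 1
            self.peak_in_flight = max(self.peak_in_flight, self._in_flight)
        return self._pool.submit(self._run, item)

    def close(self):
        self._pool.shutdown(wait=True)
```

                  Without the semaphore, submit() would accept every item immediately and the backlog would grow without bound, which is the behaviour described above.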

                   

                  The bottleneck is disk IO, plus the CPU time spent waiting for events to be extracted (if in a PKG), processed and cleared. Disk IO requirements are quite high in these scenarios. To get around this you can try faster storage such as SSDs, SAS disks, or hardware HBA/RAID controllers that support write-back caching; however, even with all of these I was still seeing IO performance problems due to the volume of agent events being handled.


                  I then looked into putting the hard-coded *DB\Events folder on a RAM disk, since some freeware RAM disk options are available, but had no real luck. Instead, I was forced to use a product named PrimoCache Server, which places an active memory cache at the Windows file system filter driver level.

                   

                  PrimoCache Overview

                   

                  When configured correctly, what this essentially does is intercept file system IO for a particular volume or partition and serve read and/or write requests from memory. I have mine configured to perform read and write caching with a 3-second delay on writes from memory to disk. This way the IO load on the disk is staged and doesn't overwhelm the HBA controller, and the events can essentially be parsed in memory, which cuts the wait time on disk IO and significantly reduces CPU time.

                   

                  The flow would be:

                  Disk IO request > Primo NTFS intercept filter > Write to memory > Confirm the write to the operating system > Store in memory cache > Defer write to physical disk after 'x' seconds.
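
                  The flow above is a write-back (deferred-write) cache. The toy Python model below illustrates the idea under simplifying assumptions; it is not PrimoCache's implementation, and the class and method names are made up. Writes are acknowledged as soon as they are in memory, reach "disk" only once they are at least `delay` seconds old, and anything not yet flushed disappears on a crash, which is exactly the risk described in the next paragraph.

```python
from dataclasses import dataclass, field


@dataclass
class WriteBackCache:
    """Toy write-back cache: acknowledge writes from memory, persist them after `delay` seconds."""
    delay: float
    disk: dict = field(default_factory=dict)    # stands in for the physical disk
    _dirty: dict = field(default_factory=dict)  # path -> (data, time it first became dirty)

    def write(self, path, data, now):
        # Keep the time the entry first became dirty so repeated writes can't postpone the flush forever.
        since = self._dirty[path][1] if path in self._dirty else now
        self._dirty[path] = (data, since)
        return True  # confirmation to the caller before any disk IO happens

    def read(self, path):
        if path in self._dirty:
            return self._dirty[path][0]  # served from memory
        return self.disk.get(path)

    def flush(self, now):
        """Persist every dirty entry at least `delay` seconds old; return how many were written."""
        due = [p for p, (_, since) in self._dirty.items() if now - since >= self.delay]
        for p in due:
            self.disk[p] = self._dirty.pop(p)[0]
        return len(due)

    def crash(self):
        """Simulate power loss: everything not yet flushed is gone. Returns the number lost."""
        lost = len(self._dirty)
        self._dirty.clear()
        return lost
```

                  As a rough worked example using the numbers from earlier in the thread: 20,000 XML files in 5 minutes is about 67 files per second, so a 3-second write delay leaves on the order of 200 event files in memory at any moment that a hard failure would lose.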

                   

                  Here is the config I use: [screenshot of the PrimoCache configuration not preserved]

                  Bear in mind there are risks to using such software. It affects the whole partition, as it cannot be targeted at a single folder. Use the write delay wisely: if the server has a hard failure, power loss, etc., anything still in memory will be lost. If you don't mind losing a few events in an outage, that's not really a big deal. Use the software at your own risk; it isn't free, but it does come with a lengthy 60-day trial so you can see if it's for you. It also requires a reboot to install and to remove, so be aware of that.

                   

                  Hope this helps. Also, I have no affiliation with PrimoCache; I simply use it because it works for certain use cases.