1 Reply Latest reply on Nov 15, 2012 11:15 AM by Regis

    MQM Quarantine manager - CPU pegged  rpcserv.exe

    Regis

      Greetings,

       

      Has anyone run into the issue of rpcserv.exe monopolizing the CPU? It's been the case since our MEGs went under load. We're not seeing any crazy messaging rates, so it's rather surprising.

      Over the past 14 hours, the quarantine has grown to about 2600 entries.

       

      It's running in a virtual cluster with 8GB of RAM (up from the original 4GB; the product guide says 2GB is the minimum). I'm using MySQL. It's a Windows Server 2008 R2 SP1 machine in a vSphere cluster. With CPU uncapped by VMware, it's consuming 30% of the entire server's CPU. From inside the 2008 R2 machine itself, Resource Monitor shows the CPU brickwalled, thanks largely to rpcserv.exe. And now things have grown to consume all 8GB of RAM as well.

       

      Curious if anyone has had similar experiences, and what hardware you have.  And how much disk you've thrown at it for how many users?    It's becoming clear for this install that 30GB of disk isn't going to be enough for the 800 or so users being supported at this site.     Thanks for any insight!

       

      On hold with MEG gold support. Heh. The triage people supposedly transferred me to the MEG 7 queue, but I reached an ePO/VSE guy who tried to transfer me to MEG support. I ended up back with the automatons in triage... who promptly transferred me back to that same guy's queue. Awesome. I'm also trying to run a MER, as I'm sure someone will want one, but running a MER on a box that's struggling sure is an exercise that will make you more patient with long hold times.

       

       

       

       

       

      Message was edited by: Regis  updated with additional details as the plot thickens  on 10/13/12 12:16:28 PM CDT
        • 1. Re: MQM Quarantine manager - CPU pegged  rpcserv.exe
          Regis

          In case anyone else is victimized by this situation: this got resolved by support. Despite what certain vSphere engineers or VMware zealots in your own company may say about single-core VMs being preferable for vSphere cluster performance, MQM appears to _really_ need 2 CPU cores in a virtual server (and the install guide doesn't put that info in quite the right place for you to notice it). As soon as the server was fed a 2nd CPU core, the problem vanished, never to be seen again (even when dropped back to 1 core). We're running with 2 CPUs now.

           

          Also, check available disk space: consider, say, the possibility that the server group did not allocate all of the virtual disk space to the actual C:\ volume. MQM of course won't complain about it in any overt way, but it certainly couldn't have been helping matters.
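For anyone who wants to script the disk check above, here is a minimal Python sketch. It is an illustration only, not anything shipped with MQM: the function name `volume_shortfall_gb` and the `provisioned_gb` parameter are made up for this example, and it assumes Python is available on the server. It compares the size of the volume as the OS sees it with the virtual disk size you believe was provisioned.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB


def volume_shortfall_gb(path, provisioned_gb):
    """Return (total_gb, free_gb, shortfall_gb) for the volume holding `path`.

    shortfall_gb is how much smaller the volume is than the virtual disk
    size you expected to be provisioned (hypothetical input, never negative).
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    total_gb = usage.total / GIB
    free_gb = usage.free / GIB
    shortfall_gb = max(0.0, provisioned_gb - total_gb)
    return total_gb, free_gb, shortfall_gb
```

On the MQM server you might call `volume_shortfall_gb("C:\\", 30)`, using the 30GB figure from the original post. A shortfall well above zero suggests that part of the virtual disk was never allocated to the volume.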

           

          Finally, and possibly most importantly given that an RPC process was the thing going bananas here, make sure you have your firewall flows set up correctly between MQM and MEG. The network port requirements for MQM are NOT well spelled out in the MQM guide at this time. Make sure your firewall flows for tcp/80 and tcp/49500 are bidirectional. Despite what you may hear from certain SEs or even consultants, MQM will want to reach out to your MEGs (spam is stored on the MEGs and MQM sends comms to get it released, I'm led to believe), and the MEGs will need to initiate connections inbound to the MQMs as well (as you'd expect). The MQM code won't tell you there's a problem in any straightforward way, and the product guide doesn't list the port requirements succinctly (yet), but they're definitely important. I have a suspicion these were the root cause of the issues, whilst all the other adjunct issues surely didn't help.
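A quick way to sanity-check those flows is a plain TCP connect test, run once from the MQM server against each MEG and once from a MEG-side host against the MQM server. The sketch below is a rough illustration in Python, not a vendor tool: the function names are invented for this example, the ports are simply the tcp/80 and tcp/49500 mentioned above, and a successful connect only proves the flow in the direction you ran it from.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_flows(peers, ports=(80, 49500), timeout=3.0):
    """Map each (host, port) pair to whether a TCP connect succeeded."""
    return {
        (host, port): can_connect(host, port, timeout)
        for host in peers
        for port in ports
    }
```

For example, `check_flows(["meg1.example.local"])` (a placeholder hostname) run on the MQM server shows whether MQM can reach that MEG on both ports. Running the same check from the MEG side against the MQM host covers the other direction, assuming you can run Python somewhere on that network segment.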