39 Replies Latest reply on Nov 17, 2009 10:50 PM by peter_eepc

    Extremely slow and at sometimes completely unresponsive DB

    mwilke

      Database Details

       

      Database is on RAID5 DAS

       

      Users = 18,000 *estimate

      Machines = 20,000 *estimate

       

      Name Indexing = on

       

      Connections limited to 250 -- was 200 but still had same issue

       

      Random sync intervals for all machines ranging anywhere from 180 minutes to 240 minutes... some machines maybe even more.

       

       

      We are seeing that, especially on Mondays but usually on other days as well, the Management Console is completely unresponsive.  When you type in a user ID and password, it hangs and never opens the console.  We can run netstat -a and see that there are 250 connections (syncs) happening at that time, but what is weird is that at other times, when the console is working just fine, we have 250 connections as well.

       

      We are running out of things to try, and I was wondering if anyone out there in a large environment like this has any suggestions or has seen the same thing.
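Since the raw connection count is 250 both when the console works and when it hangs, it may help to break that count down by TCP state. Below is a minimal sketch, assuming Windows-style `netstat -an` output (columns: protocol, local address, foreign address, state). The port number 5555 and the sample rows are placeholders for illustration, not values taken from this thread; substitute the port your database server actually listens on.

```python
from collections import Counter


def count_connections(netstat_output: str, server_port: int) -> Counter:
    """Tally TCP states for sockets whose local port equals server_port.

    Assumes Windows-style `netstat -an` rows:
    Proto  Local Address  Foreign Address  State
    """
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and UDP rows (which carry no state).
        if len(fields) < 4 or fields[0].upper() != "TCP":
            continue
        local_addr, state = fields[1], fields[3]
        # The port is whatever follows the last colon of the local address.
        if local_addr.rsplit(":", 1)[-1] == str(server_port):
            states[state] += 1
    return states


# Made-up sample rows; port 5555 is a hypothetical placeholder.
SAMPLE = """\
  TCP    10.0.0.5:5555    10.1.2.3:49152    ESTABLISHED
  TCP    10.0.0.5:5555    10.1.2.4:49153    TIME_WAIT
  TCP    10.0.0.5:3389    10.1.2.9:50000    ESTABLISHED
  TCP    0.0.0.0:5555     0.0.0.0:0         LISTENING
"""

if __name__ == "__main__":
    for state, count in sorted(count_connections(SAMPLE, 5555).items()):
        print(state, count)
```

On the server itself, the text could come from `subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout`. If the "good" and "bad" periods show the same total but a different mix of states, that would at least narrow down where the connections are getting stuck.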

        • 1. Re: Extremely slow and at sometimes completely unresponsive DB

          Where is Management Console located (server and network wise)?

          What database connection is being used for console? And for user PC's?

          Do you see some abnormal activity in names cache? (check time stamps, any pattern?)

          How often do you rebuild names cache?

          • 2. Re: Extremely slow and at sometimes completely unresponsive DB

            Interesting. We're not seeing the issue that you're seeing.

             

            23,000 machines, 140 users.

            Sync is every 720 minutes, w/ a 90 minute randomizer at startup.

            RAID 1 (IIRC)

            Name Indexing = on

            Daily re-indexing (LifeTime=86400)

             

            I don't use the EEM console remotely, I RDP to the server, and run the console from there.

            • 3. Re: Extremely slow and at sometimes completely unresponsive DB
              mwilke

              Peter, the server is located smack dab in the middle of the US in a large data center.  Fiber connections, I believe.

               

              Anyone who connects to the database connects via IP address, not the local file.

               

              What type of abnormal activity in the name cache?  What should I look for?

               

              And we rebuild the cache twice per week via a script that runs the toastcache.bat file.

              • 4. Re: Extremely slow and at sometimes completely unresponsive DB
                mwilke

                But if you only have 140 users then I guess that can't be too bad.  If everyone was syncing at once you'd only have 140 connections.

                 

                And we RDP to the server too.  It doesn't seem to make a difference whether we are on the server or just on the remote console; the behavior is the same.  Also, I am in the same building as the server, so it's not like I am networking across the US to open it up.  All high-speed connections throughout.

                • 5. Re: Extremely slow and at sometimes completely unresponsive DB

                  140 UserIDs in EEPC.

                  23,000+ real users on 23,000 real machines. Yes, there are LOTS of synchronizations!

                   

                  I've retuned the database twice now, including one time of restoring the db from backup because it was so corrupted. We backed way off on the synchs, because we were getting database contention errors, when multiple clients were trying to sync the audit trail on the same user.
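To put the sync figures quoted in this thread side by side, here is a rough back-of-the-envelope sketch. It assumes syncs are spread evenly over the interval and uses 210 minutes as the midpoint of the 180 to 240 minute range; both are simplifying assumptions, and the function names are made up for illustration. The second function applies Little's law (concurrent connections = arrival rate × mean time each connection is held), which only holds in steady state.

```python
def syncs_per_minute(machines: int, interval_minutes: float) -> float:
    """Average sync arrivals per minute if syncs are spread evenly."""
    return machines / interval_minutes


def mean_hold_minutes(concurrent: float, arrivals_per_minute: float) -> float:
    """Little's law rearranged: W = L / lambda."""
    return concurrent / arrivals_per_minute


if __name__ == "__main__":
    # 20,000 machines, 180-240 minute interval (midpoint 210 assumed).
    site_a = syncs_per_minute(20_000, 210)
    # 23,000 machines, 720 minute interval.
    site_b = syncs_per_minute(23_000, 720)
    print(f"site A: {site_a:.1f} syncs/min")
    print(f"site B: {site_b:.1f} syncs/min")
    # If site A really sits at its 250-connection limit all the time,
    # each connection would be held for about this long on average.
    print(f"site A mean hold: {mean_hold_minutes(250, site_a):.1f} min")
```

Under these assumptions the first site takes roughly 95 syncs per minute against roughly 32 for the second, about three times the load. A server pinned at 250 connections at that arrival rate implies each connection is held for around 2.6 minutes on average, which seems long for a routine sync and may be worth comparing against the configured TCP timeout.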

                  • 6. Re: Extremely slow and at sometimes completely unresponsive DB
                    mwilke

                    OK, I gotcha.  Do you have a limit set on how many connections are available?

                    • 7. Re: Extremely slow and at sometimes completely unresponsive DB

                      Do you use the same MEE Database Server service for users and EE Manager connections?

                      I believe best practise is to have them separated.

                      I would drop client synchronisation frequency, increase number of connections drastically on Client facing MEE Database Server service, reduce TCP timeout to 5min.

                       

                      An indication of name cache trouble is a spontaneous cache refresh, which shows up as the cache file sizes dropping and all timestamps jumping to the current time (rather than the majority showing the last cache rebuild time). That can happen to both user and machine objects.
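The timestamp check described above can be sketched as a small script. This is only an illustration: the `names.*` glob matches the file naming mentioned elsewhere in this thread, but the cache directory, the one-hour tolerance, and the function names are assumptions you would need to adapt. It does not look at file sizes, only modification times.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Dict, List


def collect_mtimes(cache_dir: str, pattern: str = "names.*") -> Dict[str, datetime]:
    """Map each matching cache file name to its last-modified time."""
    return {
        p.name: datetime.fromtimestamp(p.stat().st_mtime)
        for p in Path(cache_dir).glob(pattern)
    }


def refreshed_since_rebuild(
    mtimes: Dict[str, datetime],
    rebuild_time: datetime,
    tolerance: timedelta = timedelta(hours=1),  # assumed rebuild duration
) -> List[str]:
    """Return files modified after the scheduled rebuild window.

    A long list here (especially all files at once) would match the
    'spontaneous refresh' symptom: timestamps jumping past the rebuild.
    """
    cutoff = rebuild_time + tolerance
    return sorted(name for name, mtime in mtimes.items() if mtime > cutoff)


if __name__ == "__main__":
    # Made-up example data; real use would call collect_mtimes(<cache dir>).
    rebuild = datetime(2009, 11, 15, 2, 0)
    sample = {
        "names.a": datetime(2009, 11, 15, 2, 10),
        "names.b": datetime(2009, 11, 16, 9, 30),
    }
    print(refreshed_since_rebuild(sample, rebuild))
```

Running something like this on a schedule between rebuilds, and logging the result, would show whether refreshes line up with the times the console becomes unresponsive.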

                      • 8. Re: Extremely slow and at sometimes completely unresponsive DB

                        I don't see a limit in dbcfg.ini.  Where did you set it?

                         

                        When I check the server status, I rarely see more than 15-20 connections, and often only 2-3.

                        • 9. Re: Extremely slow and at sometimes completely unresponsive DB
                          mwilke

                          Do you use the same MEE Database Server service for users and EE Manager connections?

                           

                          >>  Not sure what you mean here.  We have the product installed on the server and have the MEE Database Server service started, of course.  Any computer (PC) that uses the admin console just has a remote connection pointing to the server's database, so is there really a need to have this service running on the PCs?  Is this what you are asking?

                           

                           

                          I would drop client synchronisation frequency, increase number of connections drastically on Client facing MEE Database Server service, reduce TCP timeout to 5min.

                           

                          >>  Dropping the client sync frequency is something we are tinkering with a bit to see what we get.  What do you mean by increasing the number of connections on the MEE DB Server service?  Are you saying that the limit of 250 available connections is too low?  McAfee recommends keeping this at 200, but then they told us 250 last week, so we moved it up to 250.  And we already have the TCP timeout set to 5min.

                           

                           

                          Indication of name cache trouble is spontaneous cache refresh, which manifests in cache file sizes to drop and all timestamps to jump to current time (rather than majority showing last cache rebuild time). That could happen to user and machine objects.

                           

                          >> Right now, we are rebuilding the name cache on Sundays and Wednesdays.  I just checked the names.* files and all of them have Sunday's date and the time that the cache was rebuilt by the script.  So I assume there are no problems there.
