High CPU I/O wait times (vmstat/top wa) and load avg spikes in MWG?
Wondering if anyone running Web Gateway 7.x has run into periodic load spikes (load averages of 20-25 or worse) and poor performance that -- for all our checking -- seems to come down to high I/O wait times (wa values >10 quite consistently).
Support has had trouble figuring out much to do with it, and I'm pressing to get it escalated to someone with ninja I/O wait debugging skills. This environment has an identically sized box that handles way more requests per second on the same policy and never sees these load spikes or wa values. Support had me run the Intel diagnostic test (IDT) and PCT and look at the system event log, but none of that turned up anything that looked remotely relevant. However, neither of those tests the disk. No disk health issues are indicated by fsck or in the messages logs -- the drives are 10k RPM.
On an interesting, possibly related note: months ago, when we started battling some performance issues, we noticed that turning off caching made a HUGE improvement in performance. That may simply have masked this underlying I/O wait issue, since I'd imagine caching adds quite a bit to the gateway's disk I/O load.
Has anyone else had success in tracking down and isolating I/O wait issues on these?
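In case it helps anyone chasing the same thing: this isn't MWG-specific, but assuming the appliance shell has a standard /proc, awk, and a POSIX sh (an assumption on my part -- I haven't verified every tool on the box), here's a rough sampler I've been using to line the load spikes up with I/O wait. It reads the same counters vmstat does, straight from /proc, so it needs nothing installed:

```shell
#!/bin/sh
# Poll /proc/loadavg and /proc/stat together, printing a line whenever
# the 1-minute load crosses a ceiling, along with the wa% over the same
# window -- so load spikes can be matched against I/O wait directly.
# Pure /proc, so it works even where iostat isn't installed.

# Print "cumulative-iowait-jiffies total-jiffies" from the cpu summary line.
snap() { awk '/^cpu /{print $6+0, $2+$3+$4+$5+$6+$7+$8+$9}' /proc/stat; }

watch_load() {   # usage: watch_load LOAD_MAX INTERVAL_SECS SAMPLES
    max=$1; interval=$2; n=$3
    set -- $(snap); p_iow=$1; p_tot=$2
    while [ "$n" -gt 0 ]; do
        sleep "$interval"
        set -- $(snap); iow=$1; tot=$2
        load=$(cut -d' ' -f1 /proc/loadavg)
        ts=$(date +%T)
        awk -v l="$load" -v m="$max" -v ts="$ts" \
            -v di=$((iow - p_iow)) -v dt=$((tot - p_tot)) 'BEGIN {
            if (l + 0 > m) printf "%s load=%s wa=%.1f%%\n",
                                  ts, l, dt ? 100 * di / dt : 0 }'
        p_iow=$iow; p_tot=$tot
        n=$((n - 1))
    done
}

# Short demo run; bump SAMPLES way up (e.g. 720 at 5s = an hour) for real use.
watch_load 10 5 2
```

Leave it running in a screen session and you get timestamps of every spike with the matching wa%, which has been handy evidence to hand back to support.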
These are percentages of total CPU time.
us: Time spent running non-kernel code. (user time, including nice time)
sy: Time spent running kernel code. (system time)
id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle.
st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.
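Those buckets all come from the cumulative jiffie counters on the "cpu" line of /proc/stat, which is where vmstat reads them. A one-shot, since-boot mapping of those counters onto the field names above (assuming a standard /proc layout; this is an average since boot, not an interval sample):

```shell
#!/bin/sh
# Map /proc/stat's cumulative cpu counters onto vmstat's field names.
# Field order on the "cpu" line: user nice system idle iowait irq softirq steal.
line=$(awk '/^cpu / {
    us = $2 + $3          # user + nice            -> us
    sy = $4 + $7 + $8     # system + irq + softirq -> sy
    id = $5               # idle                   -> id
    wa = $6               # iowait                 -> wa (2.5.41+)
    st = $9 + 0           # steal                  -> st (2.6.11+)
    tot = us + sy + id + wa + st
    printf "us=%.1f%% sy=%.1f%% id=%.1f%% wa=%.1f%% st=%.1f%%",
           100*us/tot, 100*sy/tot, 100*id/tot, 100*wa/tot, 100*st/tot
}' /proc/stat)
echo "$line"
```

Since the counters are cumulative, sampling the line twice and differencing gives the interval percentages vmstat prints.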
I've seen tutorials on the internet for other Linux distributions that talk about pinning it down to a specific disk and process using iostat, but that doesn't seem to be available on the MWG.
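As a rough iostat substitute, the per-process I/O counters in /proc/&lt;pid&gt;/io can be scraped directly -- again assuming the MWG kernel exposes a standard /proc (I'd expect so on 2.6+, but that's an assumption). A sketch that ranks processes by actual block-device bytes:

```shell
#!/bin/sh
# iostat substitute: rank processes by block-device bytes moved
# (read_bytes + write_bytes from /proc/<pid>/io). The counters are
# cumulative since process start, so for a rate, run this twice and
# diff. Reading other users' /proc/<pid>/io generally requires root.

top_io_procs() {
    for p in /proc/[0-9]*; do
        [ -r "$p/io" ] || continue
        awk -v pid="${p#/proc/}" '
            $1 == "read_bytes:"  { r = $2 }
            $1 == "write_bytes:" { w = $2 }
            END { if (r + w > 0) print r + w, pid }' "$p/io" 2>/dev/null
    done | sort -rn | head -10 | while read -r bytes pid; do
        printf "%14s bytes  pid %-6s %s\n" "$bytes" "$pid" \
               "$(cat "/proc/$pid/comm" 2>/dev/null)"
    done
}

top_io_procs
```

If the heavy writer turns out to be the cache or logging process, that would fit with caching=off masking the problem. (rchar/wchar in the same file count all read/write syscalls including page-cache hits, if you want the broader picture.)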