I have an issue with the number of read IOPS. We reach 10,000 IOPS on our storage, and it seems to be caused by the ePO infrastructure.
Some months ago the storage team warned me about this issue. We have reviewed the setup, but we cannot bring the IOPS down.
A brief summary:
- Worldwide company
- ePO 5.1.0 server (virtual, dedicated, W2008 R2, 16 GB RAM, 4 CPUs) and database on a SQL 2008 R2 cluster.
- 5 Agent Handlers (3 in DMZ, 2 in LAN), so in total 1 ePO and 5 AGH (virtual, dedicated, W2008 R2).
- Storage TIER 1.
- 10,000 nodes.
- Maintenance plan configured on the database.
- Automatic responses and queries reviewed.
- In recent weeks, the increase occurs on Mondays and Fridays. Tuesdays and Wednesdays are OK, and on Thursdays it increases a little.
- McAfee SIEM is connected to the database, but it pulls events all week during certain hours, whereas the high consumption is on Mondays.
We have reviewed a lot of settings, but we cannot detect why IOPS consumption is so high.
Any idea or setting that we can check? Any log to review?
Thanks in advance.
90 GB reserved, 23 GB used.
During the last month: 3,600,000 access protection events.
I do have DAT reputation events, but the problem appeared several months before we checked in the DAT Reputation feature.
I think McAfee has solved the issue with the DAT reputation events; we do not see them anymore. Regarding the large number of events, do you run the server task "Purge Threat and Client Events Older than 90 Days"? With a database this large, how is ePO performing? Is it slow?
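For a sense of scale, the event count quoted above can be turned into an average rate. This is only a back-of-the-envelope Python sketch: the function names are made up for illustration, and it assumes a 30-day month and a constant event rate, neither of which the thread confirms.

```python
SECONDS_PER_DAY = 86_400


def events_per_day(total_events: int, days: int) -> float:
    """Average number of events per day over the given period."""
    return total_events / days


def events_per_second(total_events: int, days: int) -> float:
    """Average number of events per second over the given period."""
    return total_events / (days * SECONDS_PER_DAY)


def events_retained(total_events: int, days: int, retention_days: int) -> float:
    """Events kept in the database at steady state for a given purge window.

    Assumption: the event rate stays constant over the retention period.
    """
    return events_per_day(total_events, days) * retention_days


if __name__ == "__main__":
    monthly = 3_600_000  # access protection events in the last month (from the thread)
    print(events_per_day(monthly, 30))
    print(round(events_per_second(monthly, 30), 2))
    print(events_retained(monthly, 30, 90))
```

Under those assumptions this works out to 120,000 events per day, and a 90-day purge window would hold about 10.8 million access protection events.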
Which system is hitting 10k IOPS? The ePO app server or the database server?
Is the database server physical or virtual?
I'd be more inclined to believe that your storage team needs to do some additional investigating and help determine why 10k IOPS are available to you. If you are using many Intel Security products, there is definitely the potential for needing all those IOPS, but as others have suggested, you can tune some of that load by managing your ASCI interval, server tasks that may be executing complex queries, etc.
It is the database. It's in a SQL 2008 R2 cluster, and the SQL servers are virtual. We use many Intel Security products (Agent, VSE, HIPS, Site Advisor, MSME, Drive Encryption...).
Server tasks have been reviewed. We have several tasks to sort some systems, and they are executed every hour, but the high IOPS consumption does not occur every day at the same time... it's a very unusual case...
Do you think the ASCI interval is too much? Maybe I could change it to 60 seconds, but I don't understand why it occurs only on some days...
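One way to pin down the "only on some days" pattern is to group the storage team's read-IOPS samples by weekday and hour and compare the averages. The Python sketch below is illustrative only: the input format (ISO 8601 timestamp plus a read-IOPS value) and the function name are assumptions, and the sample values are invented, not taken from this environment.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def mean_iops_by_weekday_hour(samples):
    """Average read IOPS per (weekday name, hour of day).

    `samples` is an iterable of (ISO 8601 timestamp string, read IOPS) pairs.
    This format is hypothetical; adapt it to whatever the storage export uses.
    """
    buckets = defaultdict(list)
    for timestamp, iops in samples:
        moment = datetime.fromisoformat(timestamp)
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        buckets[(WEEKDAYS[moment.weekday()], moment.hour)].append(iops)
    return {key: mean(values) for key, values in buckets.items()}


if __name__ == "__main__":
    # Invented sample values, for illustration only.
    demo = [
        ("2015-06-01T09:00:00", 9000),
        ("2015-06-01T09:30:00", 11000),
        ("2015-06-02T09:00:00", 2000),
    ]
    for key, value in sorted(mean_iops_by_weekday_hour(demo).items()):
        print(key, value)
```

Lining up the busiest (weekday, hour) buckets against the schedules of server tasks, the SIEM collection window, and the database maintenance job may show which of them coincides with the peaks.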
You're probably experiencing check-in storms. The agents use a random check-in time based on the ASCI, so if x clients are online and y of those clients all decide to check in at close to the same time, you get a perfect storm. So you should research increasing the interval. I say "research" because one of my admins complains that large check-in values can interfere with Encryption (I never checked whether that is a valid statement). We have a similar number of endpoints, and McAfee recommended a 2 to 3 hour check-in interval.
I've also seen what happens to the database when someone sets 4000 clients to check in every 15 minutes. It was not pretty.
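To put rough numbers on this: with check-ins spread perfectly evenly, the load is simply the number of nodes divided by the interval. The Python sketch below computes that average and runs a toy simulation of the busiest minute. It is not how the McAfee Agent is documented to behave: the function names are made up, and the simulation assumes each agent picks one fixed random offset within the interval and then checks in every interval after that.

```python
import random
from collections import Counter


def average_checkins_per_minute(nodes: int, asci_minutes: int) -> float:
    """Mean check-in rate if every node checks in once per interval."""
    return nodes / asci_minutes


def simulate_peak_per_minute(nodes: int, asci_minutes: int,
                             window_minutes: int, seed: int = 0) -> int:
    """Busiest single minute in a toy model of randomized check-ins.

    Assumption: each agent starts at a uniformly random minute inside the
    first interval and repeats every `asci_minutes` after that.
    """
    rng = random.Random(seed)
    per_minute = Counter()
    for _ in range(nodes):
        minute = rng.randrange(asci_minutes)
        while minute < window_minutes:
            per_minute[minute] += 1
            minute += asci_minutes
    return max(per_minute.values(), default=0)


if __name__ == "__main__":
    # Intervals are in minutes: 15, 60, and the 2 to 3 hour range from the thread.
    for nodes, interval in [(4000, 15), (10000, 60), (10000, 120), (10000, 180)]:
        avg = average_checkins_per_minute(nodes, interval)
        peak = simulate_peak_per_minute(nodes, interval, window_minutes=interval * 4)
        print(f"{nodes} nodes, {interval} min ASCI: avg {avg:.1f}/min, peak {peak}/min")
```

On averages alone, 4000 clients at 15 minutes is about 267 check-ins per minute, while 10,000 clients at 2 to 3 hours is roughly 56 to 83 per minute. The simulated peak is always at least the average, which is the clustering effect described above.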