cancel
Showing results for 
Search instead for 
Did you mean: 
becke
Level 9
Report Inappropriate Content
Message 1 of 8

ePO purge events by query is inefficient

(I'm on ePO 5.1.1)

This is just an observation on how the purge threat events and purge client events system tasks work - specifically when you use a query.

The first thing I noticed is that it does all of these purge jobs in batches of 1000.

For example - if you use the standard purge task and set it to delete everything older than 90 days, the recurring SQL call would look like follows:

     DELETE  TOP (1000) EPOEvents WHERE '2014-08-01T16:30:38.055' > EPOEvents.DetectedUTC

The 2nd thing I noticed is that if you use purge task for threat events using a query the resultant SQL is very inefficient.

Say you create a query to find all events of type 1095 older than 15 days, the resultant query ePO runs looks like follows:

     DELETE EPOEvents FROM (SELECT TOP 1000 EPOEvents.AutoID from [EPOEvents] where ( ( [EPOEvents].[ThreatEventID] = 1095  ) and ( [EPOEvents].[DetectedUTC] < '2014-10-14T18:05:50.931' ) )) as T1 WHERE EPOEvents.AutoID = T1.AutoID

This causes the server task to take about 3 hours to run if run daily on our server and just clearing out the days worth of events - and causes deadlocks and other issues if something else tries to delete events.

The proper solution to too many events is probably better tuning and filtering up front than clean-up afterwards. However, due to organizational rules outside of my hands that isn't an option for me - so I moved this pruning outside of ePO into a sql agent job and it takes a couple of minutes to run and the database isn't being thrashed for hours:

DECLARE @maxAutoID int = (SELECT MAX(evt.AutoID) FROM ePOEvents evt);

DECLARE @dateThreshold datetime = (DATEADD(day,-15,GETUTCDATE()))

WHILE 1 = 1

  BEGIN

    DELETE TOP(1000000)

      FROM [EPOEvents]

      WHERE

        [EPOEvents].[ThreatEventID] = 1095 AND

        [EPOEvents].[DetectedUTC] < @dateThreshold AND

        AutoID < @maxAutoID

        IF @@ROWCOUNT = 0 BREAK   

  END

7 Replies

Re: ePO purge events by query is inefficient

Good post, like to see other people really utilizing the back end of ePO.

I do have a couple things:

1. Where did you get the SQL logic for the purge job? I didn't see a sproc/trigger for it and was wondering. The "DELETE  TOP (1000) EPOEvents WHERE '2014-08-01T16:30:38.055' > EPOEvents.DetectedUTC"

2. I'm curious though, I have a purge job that runs nightly, that purges all types of data (I know your running it from a query) - looking at my task from last night it deleted 130k+ threat events, along with a bunch of other data and it only took 1 minute to run. See screenshot. It's been running pretty efficiently for our ePO DB.

Now granted we don't have any threat event data > 1 week due to our purges, I took your query, modded it slightly and ran the below; it ran returning 1000 rows in < 1 sec.

SELECT TOP 1000 EPOEvents.AutoID from [EPOEvents] where (([EPOEvents].[ThreatEventID] = 1092) and ([EPOEvents].[DetectedUTC] > sysdatetime()))

Just wondering if maybe something is up with your DB and not ePO itself?

becke
Level 9
Report Inappropriate Content
Message 3 of 8

Re: ePO purge events by query is inefficient

Firstly - I'm dealing with a database with 100,000,000 threat events (not a typo) - keeping all threat events and client events for 90 days apart from event 1095 which is now kept for 15 days for tuning purposes. but it does mean that queries that hit this database are expensive.....

Secondly, the standard tasks are much more efficient than ones based off of custom queries. My standard purge tasks do not take long to run (relatively speaking).

As to how I got the queries - I connected to the db, ran sp_who2 - found the spid of the DELETE command, and ran DBCC INPUTBUFFER(spid) to see the last input to that session.

There are other ways to capture this but this was the easiest way in an ad hoc / quick manner - especially considering the queries are long running for me.

Re: ePO purge events by query is inefficient

We have requirements to keep threat/client events for a year. But we get around that by taking weekly backups and storing them. Then we are able to purge all client/threat events > 1 week.

Also, 1095 is a low event, we have 1095 un-checked so it does not even go to the database. Menu > Configuration > Server Settings > Event Filtering. Just a thought.

becke
Level 9
Report Inappropriate Content
Message 5 of 8

Re: ePO purge events by query is inefficient

we likewise are working around similar requirements - although the current compromise is to keep 90 days "live".

I checked the duration of the standard purge jobs and they're 5 seconds for client tasks and 2 minutes for purging threat events > 90 days. it's really just when you get to custom query driving it that it becomes an absolute dog.

Re: ePO purge events by query is inefficient

Delete ePO Events Older than the mentioned date:


SET rowcount 10000

DELETE FROM epoEvents

WHERE detectedutc < '2015-06-01'

WHILE @@rowcount > 0

BEGIN

DELETE FROM epoEvents

WHERE detectedutc < '2015-06-01'

END

SET rowcount 0

GO

Re: ePO purge events by query is inefficient

I am in the same situation right now. We are trying to purge eventID 1095 by query, but its taking forever.The items are close to 95 Million.

So, we would like to use sql database side to purge items older than 7 days.

What will be the delete statement  to address that?  any query provided by Mcafee?

McAfee Employee
McAfee Employee
Report Inappropriate Content
Message 8 of 8

Re: ePO purge events by query is inefficient

There are many factors that can affect perfomance such as this.  The simpler your query, the more efficient it will be.  It purges in batches to prevent the transaction log from growing out of hand.  The default purge tasks are based on a stored procedure in the database. 

Based on queries, deadlocks can occur if the database indexes are highly fragmented and the tables are large.  Some deadlocks are normal, it just depends on how sustained they are and how frequent.  There are several things you can do to retain data per environmental requirements, yet keep the database itself optimized.  You can set up a rollup server to roll up events to and use that as your longer storage retention while keeping the current production database purged to shorter time frames.  You can also optimize the amount of events being generated.  As was suggested, disable 1095 events.  However, if that is not an option, review your access protection rules for what you can disable as far as reporting them goes.  Look at the 1095 events coming in - which ones do you want to see and which ones are not necessary?  In the rules then, disable reporting of those specific rule violations so the events are not excessive. 

Use the recommended maintenance also for the database per KB67184.  If the database is highly fragmented for the larger tables, we recommend to run the attached reindex script in that KB with epo services off to optimize performance.  Once it is fully reindexed offline, then it can be run with your daily backups.