I am interested in how people are backing up their ESM data.
I have an ESM + DAS and a redundant ESM + DAS.
I understand that this gives me HA and DR capability; however, it doesn't protect against a misconfiguration or malicious act that affects both the Primary and the Redundant.
For example, an administrator misconfigures the Data Allocation/Retention settings, or someone deletes events from both ESMs.
What I need to protect against is a catastrophic failure or event that requires the full restore of configuration and data.
To provide this protection I am performing Full Backups (occasionally) and Daily Backups (every night) to a network share (CIFS).
The problem I have with this method is that it is not sustainable for us. In the short period we have been using ESM, our ESM data has grown to 5TB (uncompressed), and this takes about 8-10 hours to back up. The Daily backups (compressed) are approx 3+GB and take 3-4 hours to back up.
We have a requirement to keep 1-2 years of ESM data online, which we have projected to equate to approx 30-40TB. When we get to that size, performing a Full Backup will take a couple of days.
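As a rough check on that projection, here is a minimal sketch that scales the measured Full Backup time to the projected data size. It assumes backup time grows linearly with data size, which may not hold over a CIFS share, and the function name is just for illustration.

```python
def projected_hours(current_tb, current_hours, target_tb):
    """Scale a measured backup duration linearly to a larger data size."""
    throughput_tb_per_hour = current_tb / current_hours
    return target_tb / throughput_tb_per_hour


# Best case: 5TB in 8 hours, growing to 30TB.
best = projected_hours(5, 8, 30)
# Worst case: 5TB in 10 hours, growing to 40TB.
worst = projected_hours(5, 10, 40)

print(f"{best:.0f}-{worst:.0f} hours ({best / 24:.1f}-{worst / 24:.1f} days)")
# prints "48-80 hours (2.0-3.3 days)"
```

Under that linear assumption, a Full Backup at 30-40TB lands somewhere between two and a bit over three days.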
I am interested in how others are performing Full Backups and/or protecting themselves against catastrophic failures and events.
I feel for you... that is a pretty tough challenge you have there. We have nowhere near that level of retention, nor anywhere near that amount of data, so I can only throw out some ideas.
First, make sure the folks you grant admin rights to are fully trained, and reiterate how important data retention is to your program.
I'm assuming your main ESM storage is out on a huge data store, so maybe you could get your storage folks to allocate 4-5 times what you need and use snapshots, or possibly have a second storage array holding a read-only recovery copy fed by a continuous backup process. I would imagine that if you had to rebuild, as long as the configuration files match the underlying database, you could drop in the backups for that period and restart. You may want to test that theory out for giggles if you have a smaller development/test environment.
As far as relying on an ELM, according to McAfee there is no way to re-ingest events from the ELM into the ESM. I agree it would be very difficult to ensure you got everything, so that is not really an option.
I would think McAfee would have thought of this and could provide some ideas on how to solve this problem.
The database allocation and database storage options only exist under the NGCP admin account; they do not exist anywhere else. So as long as NGCP is restricted to an individual or two who know not to touch these settings, you should be good.
As far as keeping data active on the SIEM, or available to become active, you could investigate archiving data and backing up the archived data server. This would let you re-insert those partitions into the SIEM for viewing at a later date, and the backups would be performed off the ESM, so that it stays up and running. As far as getting a full database backup, the best solution may be to run full backups on the redundant ESM and leave the primary ESM running.
Unfortunately, that is the best answer given the capabilities available in the SIEM at this point. If McAfee develops a way for the full backup to take the oldest partition offline and detach it, back it up, and then proceed partition by partition until it reaches the current active partitions, then we may have a real backup solution that won't stop our SIEM from running for 2-3 days during a backup.
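To make the ordering I have in mind concrete, here is a rough sketch in Python. This is not anything the ESM exposes today: the Partition type, the field names, the step names, and the sample dates are all hypothetical, and the "reattach" step is my own assumption about what would have to follow each backup. It only shows the oldest-first walk that stops once it reaches the active partitions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Partition:
    name: str     # hypothetical partition label
    start: date   # first day of data held in the partition
    active: bool  # True if the SIEM is still writing to it


def rolling_backup_plan(partitions):
    """Yield (step, partition name) pairs, oldest inactive partition first.

    The walk stops at the active partitions, which are never taken offline.
    """
    inactive = [p for p in partitions if not p.active]
    for p in sorted(inactive, key=lambda p: p.start):
        yield ("offline", p.name)   # take only this partition offline and detach it
        yield ("backup", p.name)    # copy the detached partition to the backup target
        yield ("reattach", p.name)  # assumption: bring it back online afterwards


example = [
    Partition("p3", date(2014, 3, 1), active=True),
    Partition("p1", date(2014, 1, 1), active=False),
    Partition("p2", date(2014, 2, 1), active=False),
]

for step, name in rolling_backup_plan(example):
    print(step, name)
```

With something like this, only one old partition is unavailable at any moment, and collection on the active partitions never stops.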