I feel for you... that is a pretty tough challenge you have there. We have nowhere near that level of retention, nor anywhere near that amount of data, so I can only throw out some ideas.
First, make sure the folks you grant admin rights to are fully trained, and reiterate how important data retention is to your program.
I'm assuming your main ESM storage is out on a huge data store, so maybe you could get your storage folks to allocate 4-5 times what you need and use snapshots, or possibly have a second storage array holding a read-only recovery copy fed by a continuous backup process? I would imagine that if you had to rebuild, as long as the configuration files match the underlying database, you could drop in the backups for that period and restart. You may want to test that theory out for giggles if you have a smaller development/test environment.
As far as relying on an ELM goes, according to McAfee there is no way to re-ingest events from the ELM into the ESM. I agree it would be very difficult to ensure you got everything, so that is not really an option.
I would think McAfee has thought of this and could provide some ideas on how to solve this problem.
The database allocation and database storage options actually only exist under the NGCP admin account; they do not exist anywhere else. So as long as NGCP is restricted to an individual or two who know not to touch these settings, you should be good.
As far as keeping data active on the SIEM, or available to become active, you could investigate archiving data and backing up the archive data server; this would allow you to re-insert those partitions into the SIEM for viewing at a later date. It also means the backups are performed off the ESM, so the ESM stays up and running. As far as getting a full database backup, the best solution may be to run full backups on the redundant ESM and leave the primary ESM running.
Unfortunately, that is the best answer given the capabilities the SIEM has at this point. If McAfee develops a way for the full backup to start by taking the oldest partition offline and detaching it, then backing it up, then proceeding until it reaches the current active partitions, then we may have a real backup solution that won't stop our SIEM from running for 2-3 days during a backup.
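For what it's worth, the oldest-first rolling backup I'm wishing for could be sketched roughly like this. It is purely hypothetical Python, not anything McAfee ships: the Partition record and the detach/backup/reattach callbacks are made-up names used only to illustrate the ordering, and bringing each partition back online afterwards is my own assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Partition:
    """Hypothetical stand-in for one SIEM database partition."""
    name: str
    start: date
    active: bool  # True if the SIEM is still writing to it


def rolling_backup(
    partitions: Iterable[Partition],
    detach: Callable[[Partition], None],
    backup: Callable[[Partition], None],
    reattach: Callable[[Partition], None],
) -> List[str]:
    """Back up inactive partitions one at a time, oldest first.

    Active partitions are never detached, so ingestion keeps running.
    Returns the names of the partitions backed up, in order.
    """
    done = []
    inactive = [p for p in partitions if not p.active]
    for part in sorted(inactive, key=lambda p: p.start):
        detach(part)
        try:
            backup(part)
        finally:
            # Assumption: put the partition back online even if the backup failed.
            reattach(part)
        done.append(part.name)
    return done
```

The point is just that only one old partition is ever offline at a time, so the SIEM as a whole never has to stop.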