Good question. Hopefully I can clarify things a little.
Best practice #1: Set the backup day to 0. Do a full backup, then follow up with daily incremental backups.
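To make the full-plus-incremental idea concrete, here's a minimal, generic sketch (not the ESM's actual backup mechanism, which is handled in the product): a full backup copies everything, and each incremental copies only what changed since a cutoff timestamp. The function names and directory layout are illustrative assumptions.

```python
import os
import shutil


def incremental_backup(source_dir, backup_dir, since_epoch):
    """Copy only files modified after `since_epoch` into backup_dir.

    Returns the list of relative paths that were backed up.
    """
    copied = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            src = os.path.join(root, name)
            if os.path.getmtime(src) > since_epoch:
                rel = os.path.relpath(src, source_dir)
                dst = os.path.join(backup_dir, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)  # copy2 preserves timestamps
                copied.append(rel)
    return copied


def full_backup(source_dir, backup_dir):
    # A full backup is just an incremental since the beginning of time.
    return incremental_backup(source_dir, backup_dir, since_epoch=0.0)
```

The point of the pattern is the same as in the ESM: one expensive full snapshot, then cheap daily deltas, so a restore only needs the full backup plus the incrementals taken since it.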
ESM backups are underrated. Being able to restore a backup to quickly work around a corruption issue is a powerful tool, and it's faster to restore a backup than to get someone on the phone to troubleshoot.
Best Practice #2: Calculate/Establish your ESM retention time.
The solution essentially stores two sets of the log data. One set is the digitally signed, compressed raw data that the ELM stores; the other is the aggregated metadata of that raw data, which is stored in the ESM database. The metadata is the day-to-day operational data that is displayed in the UI. Good aggregation practices ensure that every critical part of an event is stored in a field, so nothing is lost in the aggregation process (see my aggregation post for more info). Each set of data can have its own retention time.
At one end of the spectrum, we calculate to ensure that there is enough disk space to support the required retention time for each set of data. At the other end, there's no need to keep data we're not required to, and in some parts of the world it must be removed after a certain point to comply with privacy laws. Regardless of what dictates the timeline, when the retention window or disk space is exceeded, the data is removed in a continuous FIFO process. If ESM Archival is configured, the data being removed is automatically moved to an "inactive partition" on a remote data store. If you need to access that data at a later date, you can mount the partition remotely and run (albeit much slower) queries against it.
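The FIFO removal described above can be sketched in a few lines. This is an illustrative model only, assuming a directory of partition files where the oldest is evicted first; the real appliance manages its partitions internally, and `enforce_retention`, `active_dir`, and `archive_dir` are names I've made up for the sketch.

```python
import os
import shutil


def enforce_retention(active_dir, archive_dir, max_bytes):
    """Illustrative FIFO pruning: when the active store exceeds max_bytes,
    move the oldest partition files into the archive (the "inactive
    partition") until usage falls back under the limit.

    Returns the names of the files that were archived, oldest first.
    """
    entries = sorted(
        (os.path.join(active_dir, name) for name in os.listdir(active_dir)),
        key=os.path.getmtime,  # oldest first -> first to be evicted
    )
    used = sum(os.path.getsize(p) for p in entries)
    moved = []
    while used > max_bytes and entries:
        oldest = entries.pop(0)
        used -= os.path.getsize(oldest)
        shutil.move(oldest, os.path.join(archive_dir, os.path.basename(oldest)))
        moved.append(os.path.basename(oldest))
    return moved
```

Whether the trigger is a retention window or a disk-space ceiling, the shape is the same: evict oldest-first, and if an archive destination is configured, move the evicted data there instead of deleting it.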
Data Retention is where the retention periods are configured, and Data Archival is where you configure the location expired metadata is archived to.