How does the ERC buffer event data (to be consumed by ESM and ACE) and raw log data (to be pushed to ELM) during an outage, when the ERC still receives logs from data sources but ESM, ACE and ELM are not available to consume them?
To give an example of how this might work when connectivity from ESM/ACE to the ERC and from the ERC to ELM is temporarily lost and then re-established:
1. ERC processes incoming logs from data sources
2. Raw logs are stored as files to be pushed to ELM. Is there a defined buffer limit, or are they kept as long as there is enough free space on the file system?
3. Events are stored in the internal event database (EDB) to be published to Kafka. Are events removed from the EDB once published via Kafka, or are they kept in the internal ERC event database for a defined time period?
4. Published events are stored in the ERC Kafka buffer. Is 3 days the default retention? Are we allowed to modify this setting?
5. Once ELM is back online, the ERC pushes all raw logs kept in the buffer. Does it matter whether the events for this raw data are still in the ERC event database or Kafka buffer? Will raw logs for events that no longer exist still be stored on ELM?
6. Once ESM/ACE are back, they consume events from the Kafka buffer. Is it possible that events older than 3 days are still in the internal ERC event database but no longer in the Kafka buffer, and can be re-published via Kafka to be consumed by ESM/ACE? Or is only the Kafka buffer consumed?
If the receiver is unable to connect to the ELM, the raw logs are buffered in /var/log/data/inline/thirdparty.logs/elm.logs. There is no configured limit on this buffer; it will continue to store data until the disk runs out of space. Once free space on the receiver's primary disk falls below 30 GB, active data collection stops.
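As a rough illustration of the 30 GB cutoff described above, the following Python sketch estimates how long raw-log buffering could continue before collection stops. It is not a vendor tool: the function names, the 10 GB warning margin, the raw-log growth rate, and the choice to treat "30 GB" as 30 GiB are all assumptions made for the example.

```python
import shutil

GIB = 1024 ** 3
# From the answer above: collection stops below 30 GB free on the primary disk.
# Treating "GB" as GiB here is an assumption.
STOP_THRESHOLD_GB = 30


def free_gb(path="/"):
    """Return free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / GIB


def collection_at_risk(free, threshold=STOP_THRESHOLD_GB, margin=10):
    """True when free space is within `margin` GB of the stop threshold.

    The 10 GB margin is a hypothetical early-warning value.
    """
    return free < threshold + margin


def hours_until_stop(free, raw_log_gb_per_hour, threshold=STOP_THRESHOLD_GB):
    """Estimate hours of buffering left before the stop threshold is reached.

    `raw_log_gb_per_hour` is the (assumed, measured by you) rate at which
    buffered raw logs grow while the ELM is unreachable.
    """
    if raw_log_gb_per_hour <= 0:
        raise ValueError("raw_log_gb_per_hour must be positive")
    return max(free - threshold, 0.0) / raw_log_gb_per_hour


if __name__ == "__main__":
    free = free_gb("/")
    print(f"free space: {free:.1f} GiB, at risk: {collection_at_risk(free)}")
    # Hypothetical growth rate of 2 GB per hour during an ELM outage.
    print(f"estimated hours until collection stops: {hours_until_stop(free, 2):.1f}")
```

With 130 GB free and 2 GB of raw logs arriving per hour, for example, this gives about 50 hours of headroom before collection would stop.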
If the receiver is unable to connect to its databus (this applies if you have a DSB appliance), it will retry until it is able to connect. Once it reconnects, however, the "restrict insertion of historical data" setting will apply. If you want all events to come through, you will need to disable this setting temporarily (from the receiver properties -> events, flows and logs).
If the ESM is unable to consume from the receiver's databus, data is aged out of the bus after 3 days by default. It is possible to adjust this setting, but doing so is non-trivial, and the retention period applies at all times: even data that has already been consumed currently stays on the bus for 3 days, so increasing the value permanently increases the storage load on the databus. It is also possible to force the receiver to re-publish older data onto the bus, and this is typically the better solution after an extended outage.
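To make the storage trade-off concrete, here is a small Python sketch of the arithmetic. Because retention is time-based and applies whether or not data has been consumed, the bus holds roughly "ingest rate x retention period" at all times. The event rate and average event size below are invented figures, the function names are hypothetical, and the estimate ignores replication, compression and indexing overhead.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def retention_ms(days):
    """Convert a retention period in days to milliseconds.

    Kafka-style buses usually express retention in milliseconds; whether and
    how the appliance exposes such a setting is not stated in this thread.
    """
    return int(days * SECONDS_PER_DAY * 1000)


def steady_state_storage_gb(events_per_second, avg_event_bytes, retention_days):
    """Approximate data held on the bus once retention has filled up.

    Assumes a constant ingest rate; consumed data is not removed early,
    matching the behaviour described in the answer above.
    """
    retained_seconds = retention_days * SECONDS_PER_DAY
    return events_per_second * avg_event_bytes * retained_seconds / 1e9


if __name__ == "__main__":
    # Hypothetical load: 5000 events/s at 500 bytes per event.
    for days in (3, 7):
        gb = steady_state_storage_gb(5000, 500, days)
        print(f"{days} days ({retention_ms(days)} ms): about {gb:.0f} GB on the bus")
```

Under these assumed figures, moving from 3 to 7 days of retention raises the standing storage requirement from about 648 GB to about 1512 GB, which is why re-publishing after an outage is usually preferable to a permanently longer retention.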
- How can the data be re-published after an outage?
- What is the default retention period for data in the ERC database, after which it is removed and can no longer be re-published to the bus?
- I assume events are removed from the ERC database fairly quickly during normal operation. Does this mean that the ERC tracks data consumption and tries to keep unconsumed events in the database longer when there is an outage?