On ESMI, Receiver Properties say Primary HA Status: online, Secondary HA Status: standby.
So I click on High Availability -> Return to Service -> Secondary.
Eventually I get two dialogue boxes saying "The device was successfully brought back online." and then "High Availability Sync Receiver Settings was successful".
Receiver Properties shows secondary HA status = online and ha_status on secondary concurs.
Then, after a minute or so, Receiver Properties shows secondary status = standby and ha_status ...
McAfee1-ERC-4600 ~ # ha_status
McAfee1-ERC-4600 ~ #
McAfee2-ERC-4600 ~ # ha_status
McAfee2-ERC-4600 ~ #
ifconfig on both receivers looks good (Primary eth0 IP = primary management IP, Primary eth1 = shared IP, Secondary eth0 IP = secondary management IP, Secondary eth1 = <not set>).
Have a look at the last 100 lines of these files:
See if there is an ha-info.sh script on the system. For some reason it did not make it into my notes, but we used it on a prior ticket.
Otherwise, try this command.
#> crm status ## HA native status tool
If you see no blockers in the above logs/output, you could try:
#secondary#> ha_statuschange --online
So, after quite a few months of no issues, we encountered this yesterday. Syslogs/events dropping on the floor = BAD (are you listening, McAfee?).
Two HA pairs in one of our facilities both stopped getting events. After investigation, we found that neither receiver in either HA pair held the shared IP, and ha_status showed mode=offline on the primary and mode=standby on the other (this is from memory, as relayed to me by the person on-site).
The fix was executing "ha_statuschange --online" on the primary receiver and we were back in business.
Why McAfee does not parse the ha_status output and show it in the GUI is beyond me (I relayed this in an SR this past year). It CLEARLY gives evidence that neither receiver is primary (never give up unless one system is primary), which could then trigger an event/alarm.
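To make that suggestion concrete, here is a rough Python sketch of the kind of check I mean. It assumes the ha_status output contains a "mode=<value>" token, which is all I can vouch for from the states mentioned above; the function names parse_mode and pair_needs_alarm are made up for illustration and are not part of any McAfee tooling.

```python
import re


def parse_mode(ha_status_output):
    """Return the first 'mode=<value>' found in the text, lowercased, or None.

    Assumption: ha_status prints a 'mode=...' token somewhere in its output.
    """
    match = re.search(r"\bmode\s*=\s*(\w+)", ha_status_output, re.IGNORECASE)
    return match.group(1).lower() if match else None


def pair_needs_alarm(primary_output, secondary_output):
    """True when neither receiver in the HA pair reports mode=online."""
    modes = (parse_mode(primary_output), parse_mode(secondary_output))
    return "online" not in modes


if __name__ == "__main__":
    # The state described in this thread: offline on one node, standby on the other.
    if pair_needs_alarm("mode=offline", "mode=standby"):
        print("ALARM: no receiver in the HA pair is online")
```

Collecting the ha_status text from each receiver (over SSH, for example) is left out on purpose, since that part depends on your environment.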
Interestingly, we had something similar this week too. One of our HA ERC clusters had both nodes in an offline state. We had to "Return to Service" them both. Fingers crossed McAfee starts sending these sorts of events into ESM so we can alert on them.
I am running 9.4.0 20130903 and have been having loads of issues since our upgrade. Prior to that we ran 9.3.2 for 8 months with zero issues.