I spoke with engineering about this. Unplugging the management ports is not sufficient to cause a failover, because the pair has a heartbeat interface that would still see the REC as up. When you power off the primary, the secondary should fail over and come online as the primary, and your old primary should then come up as the secondary.
If you think that unplugging the management interfaces should be enough to trigger a failover, below is a link where you can log a PER (Product Enhancement Request) for this; our PM group will then be in contact with you about it.
McAfee Product Enhancement Requests: https://mcafee.acceptondemand.com/index.jsp
This ERC cluster has two usable interfaces: one dedicated to management (eth0) and the other (eth1) used as a shared data interface for the traffic coming from all the log sources. I think there should be a failover if, at a minimum, the data interface (eth1) goes down.
Thanks for your help,
Below is what I have found:
If we lose link on eth1, it should fail over to the other machine. This is not a fast failover; it may take a couple of minutes before we are convinced that the link is truly lost. If we lose the mgmt link, it will not fail over, because in most cases the receiver is still collecting data and the ESM has a good chance of being able to connect through the other interface.
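The "couple of minutes" delay above suggests a grace period: the monitor only declares the link lost after several consecutive down readings. A minimal sketch of that idea, assuming a Linux-style carrier file (`1` = link up, `0` = down); the function name, file path, and thresholds are illustrative, not the actual ha_nicmon implementation:

```shell
#!/bin/sh
# Illustrative only: NOT the real ha_nicmon logic.
# Declare the link lost only after $2 consecutive "down" readings
# of the carrier file $1 (e.g. /sys/class/net/eth1/carrier).
link_lost_after_grace() {
    carrier_file="$1"
    grace="$2"
    down=0
    i=0
    while [ "$i" -lt "$grace" ]; do
        # A real monitor would sleep between readings; omitted here.
        if [ "$(cat "$carrier_file" 2>/dev/null)" = "1" ]; then
            down=0            # any "up" reading resets the counter
        else
            down=$((down + 1))
        fi
        i=$((i + 1))
    done
    [ "$down" -ge "$grace" ]  # exit 0 (link lost) only if every reading was down
}
```

A brief blip that recovers within the grace window resets the counter, which is why a quick unplug-and-replug would not trigger a failover.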
Let me know if this answers your question or if you have other questions.
OK, I'll try to wait longer next time, but last time I waited 2-3 minutes (maybe 5) and the failover didn't happen. Where did you find this information?
I've tried to search for information on how the ERC cluster should work, but I didn't find much...
Best regards, and thanks for the support,
Today I tried disconnecting eth1 and waiting about 10 minutes, but the failover didn't happen. It seems to work only when rebooting the primary or disconnecting the heartbeat interface :-( ....
Has anyone ever tried triggering the failover by disconnecting this shared interface (eth1)?
Thanks in advance,
I will follow up with our engineers about this, but this may be a case where you need to either call our support number or create a case for further investigation. I will respond with the engineers' comments.
OK, thanks. Please note that we see the same behaviour on both a 1250 and a 2600 ERC cluster...
Speaking with the engineers, they say this has been tested, and at this point we would need to start collecting data from you, so it may be best to log a case with us. Include the versions of both the ESM and the RECs; you can get these by SSHing to each device and typing: cat /etc/buildstamp
Then, if you could SSH to each member of the HA pair and retrieve /var/log/ha_nicmon.log from them, we can investigate further.
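The two collection steps above can be sketched as a small script. The host names are placeholders for your two HA nodes, and the helper function is an assumption, not a McAfee-provided tool; it runs the same `cat` commands support asked for on each node (the `$run` parameter lets you substitute `ssh root@` with something else for a dry run):

```shell
#!/bin/sh
# Hedged sketch: gather build version and HA NIC-monitor log from each node.
# Usage: collect_ha_info <remote-exec command> <host> [<host> ...]
collect_ha_info() {
    run="$1"   # e.g. "ssh" to run remotely, or "echo" for a dry run
    shift
    for host in "$@"; do
        echo "== $host =="
        # Build version of the device, then the NIC-monitor log.
        $run "$host" 'cat /etc/buildstamp; cat /var/log/ha_nicmon.log'
    done
}

# Example dry run with placeholder host names -- prints the commands
# instead of executing them:
collect_ha_info "echo would-run-on" receiver-primary receiver-secondary
```

Redirect each node's output to a file (e.g. `> ha_nicmon.$host.log`) if you want to attach the logs to the case.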
We opened a ticket, and support developed a new fix.
The latest fix here has solved this issue.