My first thought is that you have an explicit rule in the Access Control Rules for the 'entrelayd' service (called the "Enterprise Relay Server" at version 8). Go to your Rules and type the word "enter" in the Search box. Do you have a rule using this service? If you do, you must delete it.
One other thing to check on both firewalls: log in via SSH, run 'cat /secureos/etc/failover.conf', and go to the bottom of the file. There is a line there that says "key(SHA512 some_key)". Make sure both firewalls say "SHA512" and that one doesn't say "SHA1" instead.
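If it helps, here is a quick way to pull just the algorithm name out of that line so the two firewalls are easy to compare. Treat it as a sketch: it assumes the line really has the form key(ALG some_key) as described above, and the function name failover_key_alg is made up for this post, not something that ships with the firewall.

```shell
# Print the hash algorithm named in the key(...) line of a failover.conf-style file.
# Assumption: the line looks like  key(SHA512 some_key)
# failover_key_alg is a hypothetical helper name, not a firewall command.
failover_key_alg() {
    # Capture the word between "key(" and the first space; keep the last match,
    # since the key line is expected near the bottom of the file.
    sed -n 's/.*key(\([A-Za-z0-9]*\) .*/\1/p' "$1" | tail -n 1
}

# Run on each firewall and compare the two results:
# failover_key_alg /secureos/etc/failover.conf
```

If one firewall prints SHA1 and the other SHA512, that is the mismatch to fix.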
Sliedl, thanks for the quick response. I just checked both of your suggestions: I have no rules using the entrelayd service, and a check of failover.conf verified that both firewalls are at SHA512. One additional bit of info I just thought of: when we first tried this, the heartbeat zone was a redundant lagg interface. I read that there was an issue with this in 8.0.0. Even though I'm currently using 8.3.2 (I couldn't confirm it, but assumed this was fixed by now), I thought I would go back, break out the lagg, and just use a single interface. But it's the same problem. I did break up the HA pair before I made these interface changes, then re-created the cluster. Could this have affected something?
There could be any number of things causing this, and if there is some explicit error there will be an audit event on either the primary or the standby (or both).
If the firewalls are connected via a switch, then the switch must be able to pass IGMP and it must not decrement the TTL (the TTL of the heartbeat is 1). If they are connected via a switch, you can try connecting them directly with a cable (straight-through or crossover) to see if that fixes the issue. Also, the heartbeat interface should not be VLANed.
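One way to check whether the heartbeat actually crosses the switch is to capture on the heartbeat interface of each firewall and look at the TTL. The following is only a sketch. It assumes tcpdump is available on the firewall and that the heartbeat is sent as IP multicast (which the IGMP requirement suggests). The interface name em1 is a placeholder, and show_ttls is a made-up helper name.

```shell
# Read `tcpdump -n -v` output on stdin and print the IP TTL of each packet.
# tcpdump's verbose IPv4 header line contains "ttl N".
# show_ttls is a hypothetical helper name.
show_ttls() {
    sed -n 's/.*ttl \([0-9][0-9]*\).*/\1/p'
}

# Hypothetical usage; replace em1 with your heartbeat interface:
# tcpdump -n -v -i em1 -c 20 ip multicast | show_ttls | sort | uniq -c
```

If the primary shows packets leaving with TTL 1 but the standby captures nothing on its heartbeat interface, the switch (or the cable) between them is the first thing to suspect.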
We ended up finding a workaround, if not the correct solution. In the Advanced interface options we removed the "Monitor Interface" check on the Heartbeat interface. This corrected our issue with the cluster not recognizing that the secondary firewall was there. We also tested failover and restoral successfully. We still get occasional errors that the secondary is not responding when making configuration changes, but we believe that is a layer one issue with the heartbeat cable.