fwmonitor
Level 7

MWG7 HA VIP not reachable after vMotion

Hello,

We've noticed that sometimes after a vMotion the VIP isn't reachable and the appliance has to be restarted (or a "service network restart" issued).

Is it mfend- or VRRP-related?

I've attached relevant lines from /var/log/messages
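
In case it helps: this is the quick check I run on the node that should hold the VIP (only a sketch; the VIP and interface below are placeholders, not our real values):

VIP=192.0.2.10; IF=eth0             # placeholders, adjust to your setup
ip addr show "$IF" | grep "$VIP"    # is the VIP still bound to the interface?
pgrep -l keepalived                 # is keepalived still running at all?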

regards

4 Replies
tim.skopnik
Level 7

Re: MWG7 HA VIP not reachable after vMotion

Same problem here - we already opened a case (SR # <3-2400999961> - Proxy-HA-Mode loses vIP) for this half a year ago, but as the issue occurs only very rarely in our environment we closed it again, since the submitted log files gave no clue about the reason for this behavior.

Unlike in your case, iproxy1 does not even register the VRRP change (although its kernel sees the short network outage). For that reason it does not send a gratuitous ARP after iproxy2 (which behaves correctly, as far as we can see) drops its temporarily occupied master role. As a result the vIP resides on iproxy1 (we still have to verify this) while the last gratuitous ARP came from iproxy2 (this is verified), so the connecting router still thinks the vIP is on the iproxy2 side.
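
To test the stale-ARP theory we will also try pushing a gratuitous ARP by hand from whichever node really holds the vIP, and see whether the router recovers (a sketch with our values; arping here is the iputils one):

arping -q -U -c 3 -I eth0 -s 10.47.160.204 10.47.160.204
# -U sends unsolicited (gratuitous) ARP, announcing where the vIP lives now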

We are going to re-open the case now as the issue happened again yesterday.

Relevant logs:

Aug 14 07:33:08 iproxy1 kernel: VMCIUtil: Updating context id from 0xfeab825e to 0xfeab825e on event 0.
Aug 14 07:33:09 iproxy1 kernel: e1000: eth0 NIC Link is Down
Aug 14 07:33:09 iproxy1 kernel: e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Aug 14 07:36:57 iproxy1 snmpd[3845]: truncating integer value > 32 bits
Aug 14 07:36:57 iproxy1 snmpd[3845]: truncating integer value > 32 bits
Aug 14 07:40:28 iproxy1 shutdown[23991]: shutting down for system reboot

Aug 14 07:33:05 iproxy2 Keepalived_vrrp: VRRP_Instance({) Transition to MASTER STATE
Aug 14 07:33:06 iproxy2 Keepalived_vrrp: VRRP_Instance({) Entering MASTER STATE
Aug 14 07:33:06 iproxy2 Keepalived_vrrp: VRRP_Instance({) setting protocol VIPs. 10.47.160.204
Aug 14 07:33:06 iproxy2 mfend: set lbnetwork 1
Aug 14 07:33:06 iproxy2 mfend: state=master previous=backup
Aug 14 07:33:06 iproxy2 kernel: TX Synchronization temporarily suspended
Aug 14 07:33:06 iproxy2 kernel: TX Adding IPV4 Address 10.47.160.204
Aug 14 07:33:10 iproxy2 Keepalived_vrrp: VRRP_Instance({) Received higher prio advert
Aug 14 07:33:10 iproxy2 Keepalived_vrrp: VRRP_Instance({) Entering BACKUP STATE
Aug 14 07:33:10 iproxy2 Keepalived_vrrp: VRRP_Instance({) removing protocol VIPs.
Aug 14 07:33:10 iproxy2 mfend: set lbnetwork 0
Aug 14 07:33:10 iproxy2 mfend: state=backup previous=master
Aug 14 07:33:10 iproxy2 kernel: TX Synchronization from redundant device suspended
Aug 14 07:33:10 iproxy2 kernel: TX Removing IPV4 Address 10.47.160.204
Aug 14 07:41:47 iproxy2 shutdown[20690]: shutting down for system reboot
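
The mfend lines above are written from keepalived's state-change notifications. To trace these transitions on a lab box, a minimal notify script along these lines can be hooked in (a sketch only; the path is made up, and MWG generates its keepalived configuration itself):

#!/bin/sh
# /usr/local/bin/vrrp-notify.sh -- keepalived invokes notify scripts with:
#   $1 = "GROUP" or "INSTANCE", $2 = instance name, $3 = new state
NAME=$2; STATE=$3
logger -t vrrp-notify "instance=$NAME state=$STATE"
if [ "$STATE" = "MASTER" ]; then
    # re-announce the vIP so upstream ARP caches are updated
    arping -q -U -c 3 -I eth0 -s 10.47.160.204 10.47.160.204
fi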

fwmonitor
Level 7

Re: MWG7 HA VIP not reachable after vMotion

tim.skopnik
Level 7

Re: MWG7 HA VIP not reachable after vMotion

Hi!

Do you really think this can be the root cause?

We only lose the vIP! Management (which uses the same network interface, but the node address) is still working fine...

As far as I read the article, it seems the issue should hit non-virtual IPs too, shouldn't it?

Anyway, I will try to check the switch notification setting next week!
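
Until then, this is how I plan to verify from a client in the same segment which node currently answers for the vIP (a sketch; eth0 here is the client's NIC):

arping -c 3 -I eth0 10.47.160.204   # note which MAC answers
ip neigh | grep 10.47.160.204       # what the client's ARP cache holds
# compare that MAC against "ip link show eth0" on iproxy1 and iproxy2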

cu. Tim

tim.skopnik
Level 7

Re: MWG7 HA VIP not reachable after vMotion

Actually we use virtual Nexus switches, so the settings referenced in the document you linked are greyed out in our installation.

thanx anyway!

cu.Tim

Edit: As we encountered the same symptoms today on a (hardware) appliance-based proxy HA cluster, it seems that the issue is NOT vMotion-related.

Symptoms on both nodes (dmesg):

TX Conflicting loadbalancing master [peernode]

TX Scanning partner has failed [peernode]

(repeated roughly every 2 minutes)

We triggered the issue by changing the duplex parameters on both gateways' switch ports (each appliance is connected to one of two linked Cisco 3750s). This triggered a short port flap on both ports (with a 20-second delay between the ports).

Afterwards the vIP was still reachable, but the logging started.
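
We watched the flap from the appliance side while the switch ports were reconfigured, e.g. with:

ethtool eth0 | grep -E 'Speed|Duplex|Link detected'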

A restart of keepalived on the redundant device fixed the symptoms and the error logging stopped. This resulted in the following dmesg output on that node:

TX Synchronization from redundant device suspended

(once)
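
For the record, the sequence on the redundant node was simply (a sketch; the service name may differ between MWG versions):

service keepalived restart
dmesg | tail -n 5   # showed the single "TX Synchronization ..." line quoted above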

As we do NOT use a transparent proxy and do NOT have redundant network connections for the nodes, I do not see STP as the culprit.

Any ideas?

cu. Tim

Message edited by tim.skopnik on 19.09.13 09:18:12 CDT