
    MWG7 HA VIP not reachable after vMotion

    fwmonitor

      Hello,

       

      We've noticed that sometimes after a vMotion the VIP isn't reachable and the appliance has to be restarted (or a service network restart run).
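
      For reference, a quick way to check the state before falling back to the workaround (a sketch; eth0 and the VIP 10.0.0.1 are placeholders, substitute your own values):

      # is the VIP still bound to the interface after the vMotion?
      ip addr show dev eth0 | grep 10.0.0.1

      # if it is unreachable, this brings it back for us
      service network restart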

       

      Is it mfend- or VRRP-related?

       

      I've attached the relevant lines from /var/log/messages.

       

      regards

        • 1. Re: MWG7 HA VIP not reachable after vMotion
          tim.skopnik

          Same problem here - we already opened a case (SR # <3-2400999961> - Proxy-HA mode loses vIP) for this half a year ago, but as the issue occurs very seldom in our environment we closed it again, because the submitted logfiles gave no clue about the reason for this behavior.

           

          Unlike in your case, iproxy1 does not even register a VRRP change (although its kernel sees the short network outage). For that reason it does not send a gratuitous ARP after iproxy2 (which behaves correctly as far as we can see) drops its temporarily occupied master role. As a result the vIP resides on iproxy1 (we still have to verify this) while the last gratuitous ARP came from iproxy2 (this is verified), so the connecting router still thinks the vIP is on the iproxy2 side.
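
          A sketch of how we plan to verify this, using the VIP 10.47.160.204 from the logs below (the interface name is an assumption):

          # on each node: which one currently holds the VIP?
          ip addr show dev eth0 | grep 10.47.160.204

          # from another host in the segment: which MAC answers for the VIP?
          arping -I eth0 -c 3 10.47.160.204

          # compare the answering MAC against each node's interface MAC
          ip link show eth0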

           

          We are going to re-open the case now as the issue happened again yesterday.

           

          Relevant logs:

          Aug 14 07:33:08 iproxy1 kernel: VMCIUtil: Updating context id from 0xfeab825e to 0xfeab825e on event 0.

          Aug 14 07:33:09 iproxy1 kernel: e1000: eth0 NIC Link is Down

          Aug 14 07:33:09 iproxy1 kernel: e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

          Aug 14 07:36:57 iproxy1 snmpd[3845]: truncating integer value > 32 bits

          Aug 14 07:36:57 iproxy1 snmpd[3845]: truncating integer value > 32 bits

          Aug 14 07:40:28 iproxy1 shutdown[23991]: shutting down for system reboot

           

           

          Aug 14 07:33:05 iproxy2 Keepalived_vrrp: VRRP_Instance({) Transition to MASTER STATE

          Aug 14 07:33:06 iproxy2 Keepalived_vrrp: VRRP_Instance({) Entering MASTER STATE

          Aug 14 07:33:06 iproxy2 Keepalived_vrrp: VRRP_Instance({) setting protocol VIPs. 10.47.160.204

          Aug 14 07:33:06 iproxy2 mfend: set lbnetwork 1

          Aug 14 07:33:06 iproxy2 mfend: state=master previous=backup

          Aug 14 07:33:06 iproxy2 kernel: TX Synchronization temporarily suspended

          Aug 14 07:33:06 iproxy2 kernel: TX Adding IPV4 Address 10.47.160.204

          Aug 14 07:33:10 iproxy2 Keepalived_vrrp: VRRP_Instance({) Received higher prio advert

          Aug 14 07:33:10 iproxy2 Keepalived_vrrp: VRRP_Instance({) Entering BACKUP STATE

          Aug 14 07:33:10 iproxy2 Keepalived_vrrp: VRRP_Instance({) removing protocol VIPs.

          Aug 14 07:33:10 iproxy2 mfend: set lbnetwork 0

          Aug 14 07:33:10 iproxy2 mfend: state=backup previous=master

          Aug 14 07:33:10 iproxy2 kernel: TX Synchronization from redundant device suspended

          Aug 14 07:33:10 iproxy2 kernel: TX Removing IPV4 Address 10.47.160.204

          Aug 14 07:41:47 iproxy2 shutdown[20690]: shutting down for system reboot
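
          If the missing gratuitous ARP on iproxy1 really is the root cause, keepalived can repeat GARPs a few seconds after a MASTER transition. A sketch of the relevant vrrp_instance options (MWG generates its own keepalived configuration, so this is for illustration only; garp_master_repeat requires a reasonably recent keepalived, and all values shown are assumptions):

          vrrp_instance VI_1 {
              state BACKUP
              interface eth0
              virtual_router_id 51
              priority 100
              garp_master_delay 5      # send a second GARP burst 5s after becoming MASTER
              garp_master_repeat 3     # gratuitous ARPs per burst
              virtual_ipaddress {
                  10.47.160.204
              }
          }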

          • 3. Re: MWG7 HA VIP not reachable after vMotion
            tim.skopnik

            Hi!

             

            Do you really think this can be the root cause?

            We only lose the vIP! Management (using the same network interface, but the node address) is still working fine...

            As far as I read the article, it seems the issue should hit non-virtual IPs too, shouldn't it?

             

            Anyway, I will try to check the switch notification setting next week!

             

            cu. Tim

            • 4. Re: MWG7 HA VIP not reachable after vMotion
              tim.skopnik

              Actually we use virtual Nexus switches, so the settings referenced in the document you linked are greyed out in our installation.

               

              thanx anyway!

               

              cu. Tim

              Edit: As we encountered the same symptoms today on a (hardware) appliance-based Proxy-HA cluster, it seems that the issue is NOT vMotion-related.

              Symptoms on both nodes (dmesg):

               

              TX Conflicting loadbalancing master [peernode]

              TX Scanning partner has failed [peernode]

              (every 2? minutes)

               

              We triggered the issue by changing the duplex parameters on both gateways' switch ports (each appliance is connected to one of two linked Cisco 3750s). This triggered a short port flap on both ports (20 sec delay between the ports).

              Afterwards the vIP was still reachable, but the logging started.

              A restart of keepalived on the redundant device fixed the symptoms and the error logging stopped. This resulted in the following dmesg output on that node:

               

              TX Synchronization from redundant device suspended

              (once)
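
              For completeness, the fix was simply (a sketch; the init script name is an assumption and may be wrapped by an MWG service on the appliance):

              # on the redundant node
              service keepalived restart

              # confirm the conflict messages have stopped
              dmesg | tail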

               

               

               

              As we do NOT use a transparent proxy and do NOT have redundant network connections for the nodes, I do not see STP as responsible.

               

              Any ideas?

               

              cu. Tim

               

              Message edited by tim.skopnik on 19.09.13 09:18:12 CDT