We recently tried to upgrade one of our SG580 routers to V4.0.5 firmware with somewhat disastrous results. After only a day we had to revert back to the V3.1.6 firmware because the 30 or so IPSec VPN tunnels we rely on would not stay up for more than an hour or so before failing and having to be reset.
We have had a problem with routing over the VPN tunnels on the V3.1.6 firmware for some time; it has persisted across several firmware updates and has so far eluded the developers when reported to technical support. In V4.0.5 this routing problem is chronically bad, to the point where the network wasn't really functioning at all.
The symptom of the problem is that the VPN tunnel says it's up but packets are not being routed over the tunnel to the remote site. A PING won't work (nor will anything else) that goes over the tunnel. Disabling the tunnel at the hub router and then re-enabling it usually restores the routing function over that VPN tunnel. In V3.1.6 this happens to about 1 tunnel per week out of 30 tunnels. In V4.0.5 it was happening to several tunnels every hour, all day. We reverted back after only one day of this mayhem.
I persisted with a single branch SG560 router on V4.0.5 to see if it was stable enough with only a single VPN tunnel back to the hub router. The new status feature and interface information displays are really informative and we would have liked to use them if we could.
On the branch router with only one tunnel, the SG560 would go into a CPU loop at least once a week. The web interface would eventually become unresponsive when this happened, but first the CPU usage on the status display would jump from 2-3% to 100% before it completely locked up and stopped responding altogether. The routing of packets over the VPN tunnel stopped just a few seconds before the CPU loop started. It was then necessary to power cycle the device manually. We could not continue with this level of reliability and we are now back on the V3.1.6 firmware.
A support report is attached for the SG560 V4.0.5 firmware at around the time of a CPU loop starting (I only ever got one report, most times it locked up entirely before completing the download).
Another bug in the V4.0.5 firmware was the uptime reported for interfaces: while the diagnostics page would accurately show the boot time as being only a few minutes ago, the up time of the network interfaces and VPN tunnels would show over 136 days within minutes of a reboot.
The V4.0.5 firmware needs major overhauling with regard to the IPSec VPN tunnels before the next release.
Sadly, McAfee support is so difficult to deal with that I have given up trying to log a support incident.

Message was edited by: mark.emery on 15/02/10 10:22:49 PM
This sounds exactly like a problem we are having, although not as often. The IPSec tunnel would not pass traffic and the only fix is to restart the tunnel at the remote / spoke end.
We are currently running a mixture of 4.0.5 for the hub and 3.22 and 3.1.16 for branch offices (with a plan to migrate all to version 4). I was blaming the version 3 firmware, as the list of fixes for version 4 included many IPSec fixes.
Now reading your detailed report is making me worry!
Yes, the attached TSR is from just prior to a complete lockup; the CPU was already showing 100% on the status display. All the rest I tried to capture locked up before the download of the TSR completed.
We do have a single SG580 running V4.0.5 firmware with no IPSec at all and it runs OK as a firewall and TrustedSource relay, although it does still show the incorrect uptime on connections: 141 days uptime only an hour after booting:
LAN, Static, 22.214.171.124
up for 141 d 3 h
VLAN 2, Unconfigured
VLAN 3, LAN, Static, 192.168.0.1
up for 141 d 3 h
VLAN 4, Internet, PPPoE
up for 141 d 3 h
up for 141 d 3 h
McAfee/SG580 Version 4.0.5 -- Tue, 29 Sep 2009 20:02:52 +1000
Linux version 2.6.26-uc0 (build@sgbuild) (gcc version 4.2.1) #1 Tue Sep 29 22:31:23 EST 2009
Serial Number: 0601450647330517
Uptime 1 hours, 8 minutes, 13 seconds.
Gateways: 126.96.36.199, 188.8.131.52
DNS: 184.108.40.206, 220.127.116.11, 18.104.22.168, 22.214.171.124
Your IPSec MTU is set to 1500.
This is not good.
Can you set it to 1300 and see if the issues persist?
If that works, bump it up to 1400 and I think this will be OK... test, of course.
We've had 1500 as the IPSec MTU for several years (I think it was the default in earlier versions?). We recently put in a Cisco router on a BDSL service; the Cisco router that Telstra supplied has its MTU hardwired to 1500. The traffic for these IPSec tunnels passes through this Cisco router, and I had to increase the reassembly buffer because of the high fragmentation rate. That's how I found the Cisco MTU was fixed, but now I understand the fragmentation is probably coming from the remote ADSL end.
It seems the MTU for Telstra ADSL is 1492. Is there a way to calculate a sensible value for the IPSec MTU based on the MTU of the underlying ADSL service, i.e. 1492 less the IPSec overhead? Or should we be using 1492 to match the ADSL service?
Can't really do suck-it-and-see on the live network; there are 30 routers to do, so I only want to try once. The problem happens at random to one out of 30 per week, so trying only one router at random will not yield a result; doing all of them will produce a result if the problem hasn't recurred for a few weeks. We don't have a V4 router to try it on, and I would like much more confidence in the candidate solution before putting one back in. I also don't want to make the MTU too small and create a whole new problem caused by the increased packet rate.
The UTM help says the MTU field can be left blank. Does it then do a calculation or test to determine a viable value for the network it's running over?
I don't mean to be uncooperative, I appreciate the help, I'd just like to have some confidence that the new MTU I set is chosen better than a random guess.
The IPSec MTU, like all MTUs, should be discoverable, but there are issues at times as per the above, so we can't always rely on path MTU discovery.
Your MTU will be 1500 (Ethernet interface) minus PPPoE headers if applicable, minus IPSec headers, which leaves you in the low 1400s.
But experience over the years has shown me that even this is sometimes not low enough, depending on whose ADSL service is resold on to another provider via some ATM switching and a magic box that further reduces the MTU available.
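The subtraction described above can be sketched as a small helper. This is a rough illustration only: the header sizes below are typical values I've assumed, and the real ESP overhead varies with cipher, mode, and NAT traversal, which is exactly why a conservative value like 1300 is suggested.

```python
# Rough IPSec MTU estimate: start at the Ethernet MTU, then subtract
# PPPoE and IPSec overheads. Header sizes here are assumed typical
# values; actual ESP overhead depends on cipher, mode and NAT-T.
ETHERNET_MTU = 1500
PPPOE_OVERHEAD = 8    # PPPoE header (6 bytes) + PPP protocol field (2 bytes)
ESP_OVERHEAD = 73     # outer IP header (20) + ESP header/IV/padding/auth, worst case

def estimate_ipsec_mtu(link_mtu=ETHERNET_MTU, pppoe=True):
    """Return a conservative IPSec MTU for the given link MTU."""
    mtu = link_mtu
    if pppoe:
        mtu -= PPPOE_OVERHEAD
    mtu -= ESP_OVERHEAD
    return mtu

print(estimate_ipsec_mtu())  # 1419 -- "low 1400s", as described above
```

With these assumed overheads the estimate lands at 1419, consistent with the "low 1400s" figure, and real-world resold ADSL services can push the viable value lower still.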
So, when I hear of an IPSec issue, one thing I consider is the impact of the MTU as currently set in the unit. Seeing yours set at 1500 means that you will be relying on path MTU discovery rather than the fixed 1500 value you have set, as an IPSec packet with a payload of 1500 bytes will certainly need to be fragmented.
So I suggest 1300, as I have never seen IPSec packets between two end points have issues at this value.
As such it is a safe way to eliminate an MTU issue as the cause of the problem. While I say safe, note that the IPSec subsystem does get restarted to apply this change.
If 1300 was set we would know decisively whether the issue is MTU related.

Message was edited by: Ross Camm on 2/18/10 4:37:14 PM GMT+10:00
After some manual testing of various packet sizes, we have discovered that for our IPSec configuration running on Telstra ADSL with an MTU of 1454, an IPSec MTU of 1395 is the largest value that will not cause fragmentation. Phase 2 settings are 3DES-SHA with a 1024-bit key.
We also found that a ping size of 1472 bytes sends a packet that is 1500 bytes in size (the MTU of the Ethernet interface).
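The arithmetic behind these two observations can be checked quickly. The ping payload gains an 8-byte ICMP header and a 20-byte IP header on the wire, and the gap between the ADSL MTU and the working IPSec MTU gives the effective IPSec overhead for this particular configuration:

```python
# A 1472-byte ping payload plus ICMP and IP headers fills the
# 1500-byte Ethernet MTU exactly.
ICMP_HEADER = 8
IP_HEADER = 20
ping_payload = 1472
print(ping_payload + ICMP_HEADER + IP_HEADER)  # 1500

# The measured figures (ADSL MTU 1454, largest non-fragmenting IPSec
# MTU 1395) imply roughly 59 bytes of IPSec overhead for this
# 3DES-SHA configuration.
adsl_mtu = 1454
ipsec_mtu = 1395
print(adsl_mtu - ipsec_mtu)  # 59
```

The 59-byte figure is specific to this cipher and mode; a different Phase 2 proposal would shift it somewhat.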