Hi, occasionally some networks in our enterprise go down. We have noticed that when this happens, once the network is back up, it cannot get to the proxy. The workaround for this has been to add a static route from the affected IP to the web gateway, and then delete it. Once this is done, users in the network are able to access the Internet via the proxy again.
Has anyone come across this or a similar issue before?
Also, can anyone explain why adding a static route and deleting it is effective? Is it just a case of giving the proxy's networking a kick in the guts to get it running again on the affected network?
Any insight would be much appreciated.
I have not heard about this behaviour before... I strongly recommend filing an SR with technical support, since this looks like an issue we would want to replicate and fix.
I have the same issue. The problem lies in the local routing cache on the MLOS2 system. The solution is to run the following command from the Linux console: ip route flush cache
I configured cron to run this command every 5 minutes, because you never know when an enterprise network will go down again.
Of course this is only a temporary fix, and the McAfee team should fix this issue.
Message was edited by: faciulula on 9/11/13 7:23:13 AM CDT
Based on the details in this thread alone, there is not enough information to know what the issue is, so we would not know what to "fix".
We really do want to help you but we need more details. Can you please let us know the SRs you have opened on this?
Previously we had SR #3-3218195529 open. A feedback file was provided but no useful info was gleaned and because the issue hadn't re-occurred in a while the SR was closed on 3 Sept.
If it does re-occur, which is likely, we will take a TCP dump, a connection trace, and a feedback file once the issue has been resolved. We may also try the ip route flush cache command instead of adding a static route, to see whether it is effective.
I looked over the case and you are correct, the feedback did not help in isolating the issue.
However, according to the feedback you are running 7.3.0, which is a bit older now (about a year old). Just a note.
Also, the case seemed to note that this only seemed to affect "remote users". Is this true? Your original description seems to indicate that only "some networks" are impacted by this.
You also said that adding a new route seemed to fix the issue. Do you happen to know what the routing table looked like prior to you updating it or flushing the cache? (ip route show OR netstat -rn)
A feedback file collected while the problem is occurring would contain information about the problem state, unlike one collected afterwards.
It also doesn't make sense that you would have routing issues, because the MWG is only using one interface, so it simply relies on its default gateway.
I wonder whether there is some sort of ICMP redirect in your network that is being sent incorrectly. An ICMP redirect is typically sent by the configured gateway when there is a more optimal route to the internet that should be taken.
# MWG = 10.1.1.73
# Default GW in MWG = 10.1.1.30
# GW for the Default GW= 10.1.1.1
10.1.1.30 might send an ICMP redirect to MWG, informing it to use 10.1.1.1 instead of itself (10.1.1.30).
But this doesn't make sense, because the problem only affects certain networks... the scenario above would impact all networks not local to the MWG itself.
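To make the redirect scenario above concrete, here is a minimal Python sketch of the usual router-side rule for sending an ICMP redirect. It is a simplification and not taken from any MWG or router code. The addresses come from the example above, while the /24 mask, the interface name eth0, and the function name would_send_redirect are assumptions made for illustration.

```python
import ipaddress

def would_send_redirect(src_ip, next_hop_ip, lan, in_iface, out_iface):
    """Simplified router-side check for sending an ICMP redirect.

    A router typically redirects when it forwards a packet back out of the
    interface it arrived on, and both the sender and the better next hop
    sit on that interface's subnet.
    """
    net = ipaddress.ip_network(lan)
    same_iface = in_iface == out_iface
    src_local = ipaddress.ip_address(src_ip) in net
    hop_local = ipaddress.ip_address(next_hop_ip) in net
    return same_iface and src_local and hop_local

# MWG (10.1.1.73) sends to its default GW, whose own next hop is 10.1.1.1.
# The /24 mask and the interface name are assumptions, not from the thread.
print(would_send_redirect("10.1.1.73", "10.1.1.1", "10.1.1.0/24", "eth0", "eth0"))
```

With these assumed values the check passes, which is the situation where 10.1.1.30 could tell the MWG to use 10.1.1.1 directly.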
So to conclude: yes, please gather the requested data if it occurs again. If it does pop up again, please open an SR and post the number here.
I will be out of the office until 9/25 (fyi).
Today I had this routing issue again. To learn more about it I will start monitoring the route cache, and this article may be helpful: http://vincent.bernat.im/en/blog/2011-ipv4-route-cache-linux.html According to the article, this may be the problem:
"rt_cache_entries is the number of entries in the route cache. You should compare it with net.ipv4.route.max_size and ensure that the cache is never full to avoid triggering the garbage collector too often."
After flushing the route cache, the value of rt_cache_entries is about 500-1200. I wonder how big it is when the routing problem appears.
If anybody observes this issue again, please post the output of the following command (before you run ip route flush cache):
lnstat -s1 -i1 -c-1 -f rt_cache
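For anyone who wants to log the comparison with net.ipv4.route.max_size over time, here is a rough Python sketch. It assumes the counters are read from /proc/net/stat/rt_cache, with one header line of field names followed by one line of hexadecimal values per CPU; that layout is my assumption and worth checking on your own system. The function names and the sample text are made up for illustration.

```python
def parse_rt_cache_entries(stat_text):
    """Return the 'entries' counter from /proc/net/stat/rt_cache-style text.

    Assumed layout: one header line of field names, then one line of
    hexadecimal counters per CPU; 'entries' is the same on every CPU line.
    """
    lines = [ln.split() for ln in stat_text.strip().splitlines()]
    header, first_cpu = lines[0], lines[1]
    return int(first_cpu[header.index("entries")], 16)

def cache_fill_ratio(entries, max_size):
    """Fraction of net.ipv4.route.max_size currently in use."""
    return entries / max_size

# Made-up sample, not captured from a real system.
sample = (
    "entries in_hit in_slow_tot\n"
    "000003e8 0000a1b2 00000c3d\n"
    "000003e8 00001f00 00000111\n"
)
entries = parse_rt_cache_entries(sample)
print(entries, cache_fill_ratio(entries, 4000))
```

On a live box the text would come from /proc/net/stat/rt_cache and the limit from /proc/sys/net/ipv4/route/max_size, assuming the kernel still has the IPv4 route cache.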
Yesterday I observed this issue again and did some investigation. It happened when subnet 10.10.10.0/24 was down for a while.
The routing table on my proxy server looks like this:
default via 10.10.9.6 dev eth0
10.0.0.0/8 via 10.10.9.1 dev eth0
172.16.0.0/12 via 10.10.9.1 dev eth0
192.168.0.0/16 via 10.10.9.1 dev eth0
Yesterday I caught this invalid cached route:
10.10.10.129 via 10.10.9.6 dev eth0 src 10.10.9.4
cache <redirected> ipid 0x869c rtt 47ms rttvar 62ms ssthresh 2 cwnd 4
As you can see, the route is cached as "redirected" and goes via the default route (default via 10.10.9.6 dev eth0), but it should match 10.0.0.0/8 via 10.10.9.1 dev eth0. So the proper route should be:
10.10.10.129 via 10.10.9.1 dev eth0 src 10.10.9.4
cache ipid 0x866c rtt 55ms rttvar 57ms ssthresh 7 cwnd 7
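To show why 10.10.9.1 is the expected gateway, here is a small Python sketch of a longest-prefix-match lookup over the routing table quoted above. It only mimics the selection rule and is not how the kernel implements it. The lookup function is my own helper, and 8.8.8.8 is just an illustrative outside address.

```python
import ipaddress

# Routing table from the post, as (prefix, gateway) pairs.
ROUTES = [
    ("0.0.0.0/0", "10.10.9.6"),       # default route
    ("10.0.0.0/8", "10.10.9.1"),
    ("172.16.0.0/12", "10.10.9.1"),
    ("192.168.0.0/16", "10.10.9.1"),
]

def lookup(dest, routes=ROUTES):
    """Return the gateway of the most specific route containing dest."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(prefix), gw)
        for prefix, gw in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    # Longest prefix wins; the default route (/0) always matches, so
    # matches is never empty for this table.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(lookup("10.10.10.129"))  # 10.10.9.1
print(lookup("8.8.8.8"))       # 10.10.9.6
```

So by the table alone 10.10.10.129 should never go via 10.10.9.6; only the redirected cache entry sends it there.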
Again, the solution is to flush the route cache on Linux.
I have written a script to temporarily fix this issue (see the attachment to this post):
- put the flush-routing.sh file in /usr/local/sbin and make it executable
- put the mwg-custom file in /etc/cron.d/
Then restart the crond service. When the issue occurs again, the events will be recorded in the log file /var/log/flush-routing.log.
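For readers who cannot see the attachment, here is a rough Python sketch of the detection half of such a workaround: finding cache entries flagged as redirected in route cache output. This is not the attached flush-routing.sh, whose contents are not shown in this thread. It assumes the two-line layout quoted above (a route line followed by a line starting with "cache"); the function name find_redirected and the second sample entry (10.10.10.130) are made up.

```python
def find_redirected(cache_output):
    """Return (destination, gateway) pairs whose cache line says redirected.

    Assumes the two-line layout shown in the post: a route line starting
    with the destination, followed by a line starting with 'cache'.
    """
    hits = []
    current = None
    for line in cache_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "cache":
            # Flag line belonging to the most recent route line.
            if "redirected" in line and current is not None:
                hits.append(current)
        elif "via" in fields:
            current = (fields[0], fields[fields.index("via") + 1])
        else:
            current = None
    return hits

# First entry is from the post; the second is a hypothetical healthy entry.
sample = """\
10.10.10.129 via 10.10.9.6 dev eth0 src 10.10.9.4
    cache <redirected> ipid 0x869c rtt 47ms rttvar 62ms ssthresh 2 cwnd 4
10.10.10.130 via 10.10.9.1 dev eth0 src 10.10.9.4
    cache ipid 0x866c rtt 55ms rttvar 57ms ssthresh 7 cwnd 7
"""
print(find_redirected(sample))  # [('10.10.10.129', '10.10.9.6')]
```

On the gateway the text would come from ip route show cache, and a non-empty result could be logged before running ip route flush cache.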
I hope this helps somebody, especially McAfee, to better understand and solve this problem.
Karol
Message was edited by: faciulula on 10/23/13 3:31:58 AM CDT
Please open a ticket and let me know the SR #, this way we can communicate directly and post our results. Please include a feedback when you open the case. Do not post the feedback here.