I have 50 SuperAgents across the globe acting as repositories for our various sites, and they are set to replicate at various times of the day so as to spread the load on ePO.
Most days I get between 2 and 5 "Distributed Repository Replication" failures. They seem to be pretty random, which is puzzling; some days there are no failures whatsoever.
There are a few repeat offenders, but there seems to be no pattern whatsoever as to which ones fail.
When I try to view the packages in ePO > Distributed Repositories for the failed SAs, I get "Site Catalog not found", but if I reboot the machine and manually kick off the replication task, it completes. If I don't reboot the SA, it sometimes replicates okay the next day without being rebooted.
I have checked that lazy caching is disabled, and I can't find anything anywhere with a definitive fix or an explanation of what is causing this problem.
I'm running ePO 5.3 with a mixture of Agent versions 4.8, 5.0.1 and 5.0.2; it doesn't seem to matter which version of the Agent is installed.
How are the remote sites connected? I had similar problems with connections over "DSL" that the provider disconnects once a day. When this happened during replication, the replication failed (and I also had no access to the files/catalog on that repository, but could do a fresh replication manually). So maybe your connection is somehow unstable?
Also keep in mind that distributed repositories are not always "better" than syncing directly with ePO (e.g. regarding bandwidth); it depends, for example, on how many clients you have at the remote site. You could, for instance, use the remote repositories only for software updates/installs (new product versions) but pull daily DATs directly from ePO.
Hi - All are connected on our WAN links, and some sites are 24/7 sites, so that would rule out ISP disconnections.
We have a lot of clients across the network (5000+), so SAs are definitely the way to go. Our SAs are there to replicate all applicable packages.
I have similar issues. Some of my SA repos work, some don't. Even after a full 'reset' some repos work, but there are still bad repos where I can't see the packages from ePO, although replication claims it has worked (I can see some of the files locally). In addition, I don't think my clients are doing a very good job with 'ping time', and I can't even find the right log for that action these days. I thought the McAfee Agent Common Service running as Local Service might have been an issue, but even using Local System still fails on the bad ones. Also, I have some DRs and some of those are barely being used, which makes no sense as they are on bigger sites with no SA repos. I can also see clients connected to the bad SAs on port 8190, so I know connectivity works. I have a ticket open with McAfee. I await their attention.
This message may mean that the SA cannot connect to the source repository due to network throttling or network congestion. I wonder if this may be related to the "replicate at various times" setup. For comparison, I manage over 30,000 nodes with 80+ SAs and 5 scheduled replication tasks set for very specific times of the day (non-business hours or low network traffic hours); all 5 tasks run within half an hour of each other, back to back.
As for confirming whether the Agent is accessing the correct SA: under McAfee Agent > Repository, I have configured the client policy with "Use this repository list" and "Subnet distance" (this has to be dialed in; play with the maximum number of hops).
I really hope that the removal of the "Enable Agent Activity log" was just an oversight and that it will be reintroduced in Agent 5.0.5 ... Any Intel employees, PLEASE chime in; would love to hear from you.
Thanks Tao. I have McAfee engaged now. We are at the usual stage of upping logging levels etc. Meanwhile I can reiterate that I see many client connections on my failing SAs. It's as though the connections do not close and the service gets hung. When I restart macmnsvc.exe it kills the connections, of course, and then I can see the packages again from ePO, and maybe the repo works for a while. I have to check that bit. Yesterday I set one repo to auto-restart the macmnsvc.exe service every two hours. I came back to look today and the service is in the 'stopping' state.
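To spot the hung 'stopping' state described above without logging on to each SA interactively, something along these lines might help. It is only a rough sketch: it shells out to the standard Windows `sc query` command and reads the STATE line. The service name `macmnsvc` is my assumption (taken from the executable name); check the real service name on your SA before relying on it.

```python
import re
import subprocess


def parse_sc_state(sc_output: str) -> str:
    """Extract the state name (e.g. RUNNING, STOP_PENDING) from `sc query` output."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else "UNKNOWN"


def query_service_state(service_name: str = "macmnsvc") -> str:
    """Run `sc query <service_name>` (Windows only) and return the parsed state.

    The default service name is an assumption, not confirmed by McAfee docs.
    """
    result = subprocess.run(
        ["sc", "query", service_name],
        capture_output=True,
        text=True,
        check=False,  # sc returns non-zero if the service is missing; parse anyway
    )
    return parse_sc_state(result.stdout)
```

Calling `query_service_state()` from a scheduled task and alerting on anything other than RUNNING would at least show how often the service ends up stuck in STOP_PENDING.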
Repos that were good yesterday are no longer good today. Same issue. However, some repos stay good.
The McAfee guy says this is how to find the ping times in the logs, but I can't find them. We can use subnet hops as our network team are random with this!
Agent logs and McScript.
For MA 5.0.x, the agent log is the masvc log.
Look for something similar to the lines below:
ping ICMP::Ping - Pinging “repo”
ping ICMP::Ping - Avg ping time is “0”
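If the log is large, a small script can pull out the ping lines instead of searching by eye. This is a minimal sketch that assumes the lines really look like the two samples above; the regular expression and the function name `avg_ping_times` are my own, and the real log format may differ in quoting or surrounding fields.

```python
import re
from typing import Iterable, List

# Assumed pattern, based only on the sample lines quoted above.
# Accepts straight or curly quotes around the number, or none at all.
AVG_PING = re.compile(r'ICMP::Ping\s*-\s*Avg ping time is\s*["\u201c\u201d]?(\d+)')


def avg_ping_times(lines: Iterable[str]) -> List[int]:
    """Return every average ping time found in the given log lines, in order."""
    times = []
    for line in lines:
        match = AVG_PING.search(line)
        if match:
            times.append(int(match.group(1)))
    return times
```

To use it, open the agent log from wherever it lives on the client and pass the file object in, e.g. `avg_ping_times(open(path, errors="replace"))`. An empty result would suggest the ping entries are not being written at the current logging level.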
The best approach is to debug first. The issue might not just be related to a ping problem.
What agent version? How many repositories are you replicating to? What are the specific errors in the epoapsvr log? Do they always fail, or randomly?
ePO version is 5.3.1, Agent version is 184.108.40.2068. I have 58 repositories.
I see 403 errors in the epoapsvr log. The replication failures are random. I do have one or two problem repos that may be network related due to their locations.
Sometimes the McAfee Agent Common Services service has stopped or its status is 'stopping'. A reboot gets the service going again and the replication works.