Bit of a weird one here, and I'm not sure if it's been caused by the agent, or some other environmental factor.
I've got a dozen or so distributed repositories set up across a WAN, in ePO 4.6. The agent policy is set up so that clients use the closest subnet to determine the correct repository.
I've been looking through some repository utilization reports, and clients have been successfully connecting to their local repository for updates...except for yesterday, when 100% of clients connected directly to the ePO server for DAT updates, completely ignoring their local repository.
I'm not aware of any network or server issues here that would have caused 200-ish agents to decide that the ePO server was their local repository.
Any ideas? It's not a major issue at the moment with only 200 clients on the server, but I'm in the process of updating another 1,000-odd clients from vShield 8.5 and ePO 4.0, and if they all decide to hit the WAN, it's going to be a big traffic jump!
The most likely things that spring to mind would be:
a) if the replication to the distributed repos failed for some reason, or
b) if the master repo was modified in some way without a subsequent replication task being done.
The second option is possibly more likely: any time you change the master repo in any way, for example by adding or removing a package, you have to replicate the change to the DRs. Otherwise the client machines will know that the DRs are out of sync with the master and will refuse to use them.
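As a rough illustration of that behaviour (this is a sketch of the idea only, not McAfee's actual implementation; the function and field names are hypothetical), the agent effectively compares what the master advertises against each DR and skips any that don't match:

```python
# Hypothetical sketch: an agent skips distributed repos whose catalog
# doesn't match the master's, falling back to the ePO server itself.
# Names and data structures are illustrative, not McAfee's real code.

def pick_repository(master_catalog_version, repos):
    """Return the first repo in sync with the master, else the master."""
    for repo in repos:
        if repo["catalog_version"] == master_catalog_version:
            return repo["name"]
    return "ePO master"

repos = [
    {"name": "Site-A DR", "catalog_version": 41},  # stale: not replicated yet
    {"name": "Site-B DR", "catalog_version": 42},
]
print(pick_repository(42, repos))  # Site-B DR
print(pick_repository(43, repos))  # ePO master - every DR is stale
```

The key point is the fallback: if *no* DR matches, every client heads straight for the master over the WAN, which matches the symptom described above.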
Thanks for that Joe. I haven't made any changes to the master repository for quite some time now, and certainly nothing around the day that everyone came back to the ePO server. I just went and checked the server task log, and replication has reported as being successful to distributed repositories for the past week, so option a seems unlikely too.
The only failures I have in the server task log are 'Download software product list' - I don't think that's related, though; it looks to have failed a few times without affecting where clients download from.
If nothing else, I've at least added some new monitors to my dashboard to keep better track of this stuff! If it happens again, I'll have to log it with McAfee - happy to hear any more suggestions though!
To really track it down we need to look at the client machines... can you post the agent_<machinename>.log and mcscript.log from an affected machine? That might shed some light, assuming the relevant time hasn't aged out of the logs by now...
Thanks again Joe. Looks like we've missed getting the info from the log files by one day, so they won't help unfortunately!
I think at this stage, I'm just going to have to make sure I keep a close eye on where clients are updating from, and grab all of that info before it runs off the logs. At least I've managed to modify some of the queries now to give me good visibility of where clients are updating from.
Any chance you could share what you've done to track where clients are updating from? I have over 4000 endpoints and it can get frustrating sometimes.
No problem, David!
I'm using two queries now, to confirm two things:
Firstly, that the distributed repositories have been updated by the ePO server. The query is under Product Deployment - 'Distributed Repository Status'. I've also set up an automated response, so I get an email if any repository replication fails.
Secondly, I'm checking which distributed repositories are being utilised, using the query under McAfee Agent - 'Repository Usage Based On DAT and Engine Pulling'. I've edited that default query so that it only gives me results for the past day, which lets me quickly see whether the distributed repositories are servicing clients. On the day I had the issue that led to this post, the ePO server was the only repository that showed up as being utilised...a pretty clear indication that something was wrong! This query doesn't give me any kind of 'a client from site A updated from site B today' notification, but I can pretty quickly see whether repository utilisation roughly matches up with the number of systems in each site.
I've got both of those queries on my dashboard now :-)
That's awesome, thank you.
Previously I've been pulling the mcscript.log file from workstations when there have been problems.
Seems I've had this happen again over the past two days. I think I've found the reason this time, though. What I believe has been happening is that the scheduled repository replication is occurring before the latest DAT file has been added to the master repository. As the distributed repository replication is only scheduled once a day, the DRs end up out of date, which causes the clients to fall back to the master repository.
I think with the way the server tasks were scheduled, replication would often succeed but then occasionally fail, hence the inconsistent behaviour I was seeing. This week we've also hit daylight saving time, so the time DAT files become available will have shifted.
Much easier to see what had happened when I was only looking through server task logs from yesterday, instead of a week ago!
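The scheduling race described above can be sketched like this (times are made up for illustration; the logic is just a before/after comparison, not anything ePO-specific):

```python
# Illustrative sketch of the scheduling race: if the daily replication
# fires before the DAT pull task lands the new DAT in the master repo,
# the distributed repos serve yesterday's DAT until the next replication,
# and clients fall back to the ePO server. Times are hypothetical.

from datetime import time

def dr_stale_today(dat_pull, replication):
    """True if replication runs before the new DAT reaches the master."""
    return replication < dat_pull

print(dr_stale_today(dat_pull=time(6, 0), replication=time(7, 0)))  # False: safe ordering
print(dr_stale_today(dat_pull=time(6, 0), replication=time(5, 0)))  # True: clients hit the WAN
# A DST shift that moves the DAT availability an hour later can flip a
# previously safe schedule into the stale case.
```

The fix implied by this is simply to schedule replication a comfortable margin after the DAT pull, or chain it to run on completion of the pull task.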
I'm not sure how your environment would take to using RC software, but we deployed agent 4.6 across the nation to prevent that situation from ever occurring.
The lazy caching is a godsend.
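For anyone unfamiliar with the idea, lazy caching means the local node only fetches a package from the master on the first request and serves everyone else from its cache, so one WAN transfer covers a whole site. A minimal sketch of the concept (illustrative only, not the agent's actual code; the class and package names are made up):

```python
# Minimal sketch of lazy caching: fetch from the master only on a cache
# miss, then serve all subsequent requests locally. Hypothetical names.

class LazyCache:
    def __init__(self, fetch_from_master):
        self.fetch = fetch_from_master   # callable that does the WAN transfer
        self.cache = {}
        self.master_hits = 0             # count of actual WAN transfers

    def get(self, package):
        if package not in self.cache:    # miss: go to the master once
            self.cache[package] = self.fetch(package)
            self.master_hits += 1
        return self.cache[package]       # hit: served locally

repo = LazyCache(lambda pkg: f"bytes-of-{pkg}")
for _ in range(200):                     # 200 clients asking for the same DAT
    repo.get("DAT-example")
print(repo.master_hits)  # 1 - one WAN transfer regardless of client count
```

The appeal over scheduled replication is that there's no schedule to get wrong: the cache can't be "pre-filled stale" by a mistimed task, because content is only pulled when a client actually asks for it.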