I have a question about some work I am doing with a client. The design is a central EPO 4.6 server and around 60 distributed FTP repositories. If, for any reason, the main EPO 4.6 server fails, or key connectivity from this central server is disrupted, replication of the master repository contents to these distributed repositories will stop. Clients, though, will still see a working distributed repository, just with old content. I have fallback sites configured, in fact a couple: an external, internet-facing HTTP repository (again a managed distributed repository) and the NAI Site repository. But to make a client use these, one of the repositories it normally uses has to actually fail (be uncontactable), which in this scenario wouldn't be the case, as locally they would still be 'up'. So, can you think of a way, in the event of the main EPO server failing, to make the latest DATs available to these 60 or so distributed repositories, keeping in mind that the repositories are normally a 'copy' of the Master Repository, which in this instance would be down?
Short of asking the satellite sites to amend their DNS to point at an incorrect IP address, so that clients get a name resolution error and are forced to use NAI Site, how can we keep these clients up to date in the event of a failure? I've been scratching my head but cannot work it out; I've catered for pretty much every other failure scenario.
I considered having a secondary EPO server that was aware of the distributed repository list (SiteMgr.xml) and used the same key pair for content signing, and in a failure scenario using it to distribute content to the repositories. But it seems rather overkill, and would obviously require hardware/software licenses etc. to set up (though, to be honest, it could just be an offline workstation). I guess it could be an option if two EPO servers using the same distributed repositories would work in this scenario, though it seems like a fudge. I am actually testing this in a lab to see what happens.
Jon 21/01/11 09:15:54 CST
One simple way would be to create an FTP script that manually pulls the files down from NAI to the distributed repositories, if the scenario ever came about. Then, once the Master is back online, you can force a full replication from the master to make sure you don't end up with any mismatches. (Not sure if McAfee would support this, but it's easy enough to do.)
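A minimal sketch of that kind of script, in Python. The suggestion is an FTP script; this sketch fetches over HTTP instead because the CommonUpdater URL quoted later in the thread is an HTTP one, and the fetch function is a parameter so an ftplib-based one could be dropped in. The base URL, the function names and the paths passed in are assumptions for illustration, not a McAfee-supported tool. Files are staged into a local directory first, so a failed download never leaves a half-written file in a live repository.

```python
import os
import urllib.request
from typing import Callable, Iterable, List

# Taken from the thread; treat it as an assumption and confirm it before use.
BASE_URL = "http://update.nai.com/Product/CommonUpdater"


def http_fetch(rel_path: str) -> bytes:
    """Download one file (path relative to BASE_URL) and return its bytes."""
    with urllib.request.urlopen(f"{BASE_URL}/{rel_path}", timeout=60) as resp:
        return resp.read()


def stage_files(rel_paths: Iterable[str], staging_dir: str,
                fetch: Callable[[str], bytes] = http_fetch) -> List[str]:
    """Fetch each '/'-separated relative path into staging_dir.

    Sub-folders (e.g. under Current/) are recreated locally. Nothing is
    written into a live repository here; copying from staging into each
    repository is a separate step once every download has succeeded.
    """
    written = []
    for rel in rel_paths:
        data = fetch(rel)  # raises on failure, so a bad run stops early
        dest = os.path.join(staging_dir, *rel.split("/"))
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        with open(dest, "wb") as fh:
            fh.write(data)
        written.append(dest)
    return written
```

For example, `stage_files(["catalog.z"], r"D:\staging\commonupdater")` would fetch catalog.z into the (hypothetical) staging folder.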
It really depends on your environment, though, and the pressures on you or your team. In most cases, restoring a backup of the SQL database and an image of the EPO server would get you back up and running in about the time it takes for another DAT to be released.
Yes, I feel the script would certainly be the best option.
The structure of the NAI site is actually slightly different from the distributed repository structure. I would be intrigued to know which files I would need to replicate to ensure clients trusted the replicated contents.
If you look at http://update.nai.com/Product/CommonUpdater you have:
- many files at the top level, which are basically DATs and GEMs, along with catalog.z, ceu.ini, extradat.mcs, gdeltaavv.ini, oem.ini, replica.log, scmdat.pdb, sitestat.xml, v2datdet.mcs and v2datinstall.mcs
- 1 folder, Current (with structure underneath etc.)

In the distributed repository you have:
- 3 files: SiteStat.xml, Replica.log and catalog.z
- 2 folders: Current and Previous (with structure underneath etc.)
I am assuming that clients validate the repository contents primarily by means of catalog.z, so replicating just this file from the NAI site to the distributed repository, along with any deltas in the contents of the NAI Current folder, should be enough to 'fool' clients into pulling the replicated contents from the distributed repositories.
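If that assumption holds, the file selection can be written as a small filter over the NAI listing plus a delta check against what a repository already holds. This is a sketch based only on the two listings above: the rule "keep catalog.z, sitestat.xml and everything under Current/" and the idea of comparing per-file fingerprints (size or hash) are assumptions to be confirmed by testing, not documented McAfee behaviour. The function names are hypothetical.

```python
from typing import Dict, Iterable, List

# Top-level files that also exist in a distributed repository (lower-cased).
# replica.log is skipped: later testing in this thread found clients ignore it.
REPO_TOP_LEVEL = {"catalog.z", "sitestat.xml"}


def select_for_repository(nai_listing: Iterable[str]) -> List[str]:
    """Keep only NAI files that have a counterpart in a distributed repository.

    Paths are '/'-separated and relative to the CommonUpdater root. Top-level
    DAT/GEM/.ini/.mcs files are dropped; everything under Current/ is kept.
    Matching is case-insensitive (sitestat.xml vs SiteStat.xml).
    """
    keep = []
    for path in nai_listing:
        lower = path.lower()
        if lower in REPO_TOP_LEVEL or lower.startswith("current/"):
            keep.append(path)
    return keep


def changed_files(remote: Dict[str, str], local: Dict[str, str]) -> List[str]:
    """Return paths that are new or whose fingerprint differs from the local copy."""
    return sorted(p for p, fp in remote.items() if local.get(p) != fp)
```

Only the paths returned by `changed_files` would then need copying, which keeps the transfer to each of the 60 sites small.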
After this, I would assume that an incremental replication from the Master Repository would suffice. I could not easily do a FULL replication from the Master Repository due to bandwidth constraints; it would be very, very onerous.
As for a full restore of the EPO server, sure, that would be fine, but in my scenario I would have no comms access to the datacenter where the server is located, or the DC has blown up...
What do you think? Maybe I will experiment.
An experiment is certainly needed for this; I might have a go myself, as I'm sure I have done something similar in the past.
Off the top of my head, you are right about the files required, and if these are the only files you are changing then a full replication, although desirable, will not be necessary (testing will confirm which files are needed).
If you could not get the original server up and running within an adequate timeframe in this scenario, then a script would be a simple method for repository updates (if it works, though I'm sure it will).
Good luck, and let me know if you get it working. If I find a script or get the chance to do it, I will send you the details.
Hi Ian + everyone
OK, I have been doing a lot of testing this afternoon and it's quite interesting.
With two EPO servers (call them eposerver1 and eposerver2), I have a single distributed repository (UNC), dist1. I have client1, which uses eposerver1 for its ASC and policies. Client1 has a policy to use dist1 as its one and only repository. Dist1 is set up as a distributed repository on eposerver1, and I kick off a task to do a full replication.
1. Full replication completes. Client1 can update from dist1, as you would expect.
2. I then set up eposerver2 to also use dist1 (UNC) as a distributed repository and force an incremental replication task. The task completes, but client1 complains that 'catalog.z is corrupt'. In other words, it doesn't trust the content, as you would expect.
3. I exported the repository public key from eposerver2, imported it into eposerver1 and forced a client policy update.
4. Client1 can now update itself again from the repository with no issues.
Now, all the above is well and good, but it actually took a couple of hours of messing around before it eventually just worked.
5. I pulled down the contents of CommonUpdater to a directory and swapped the entire contents of dist1 for them. Client1 updated itself with no issues.
6. I restored dist1 to what it was before, then swapped out the sitelist and catalog.z for those from the NAI CommonUpdater. Client1 updated itself with no issues.
7. I then ran an incremental replication from eposerver1 to the dist1 repository. The task completed with no errors, and client1 updated itself with no issues.
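The manual part of step 6 (back up the repository's files, swap in the NAI copies, later put the originals back) is the piece worth scripting; step 7 stays an EPO replication task. A minimal sketch, assuming the repository is reachable as a plain directory path (a UNC path on Windows) and that the caller passes the file names to swap; the function names are hypothetical, and a fresh backup folder per run is assumed. Each file is copied to a temporary name and renamed into place, so clients are less likely to read a half-written catalog.z.

```python
import os
import shutil
from typing import Iterable


def swap_in(repo_dir: str, source_dir: str, names: Iterable[str],
            backup_dir: str) -> None:
    """Back up each named file in repo_dir, then replace it with source_dir's copy."""
    os.makedirs(backup_dir, exist_ok=True)
    for name in names:
        live = os.path.join(repo_dir, name)
        if os.path.exists(live):
            shutil.copy2(live, os.path.join(backup_dir, name))
        tmp = live + ".swap-tmp"
        shutil.copy2(os.path.join(source_dir, name), tmp)
        os.replace(tmp, live)  # rename into place in one step


def restore(repo_dir: str, backup_dir: str, names: Iterable[str]) -> None:
    """Undo swap_in: put backed-up files back and remove files that were new."""
    for name in names:
        live = os.path.join(repo_dir, name)
        saved = os.path.join(backup_dir, name)
        if os.path.exists(saved):
            shutil.copy2(saved, live)
        elif os.path.exists(live):
            os.remove(live)  # it did not exist before the swap
```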
So, by the end of today, I feel I have a good idea of how it all works and how to script a backup job in case of failure, and I am quite reassured that an incremental replication task for each of the repositories will restore them with no dramas.
I still need to do a bit more testing, but it seems to be working as I expected.
PS: replica.log is not important to the clients; you can delete it if you wish and it makes no difference.
Cheers for being a sounding board
You *may* run into problems related to the catalog version. The first file a client downloads when it connects to a repository is the sitestat.xml. The sitestat.xml contains exactly 2 pieces of information:
The client machines know the catalog version of the master repository, as it's part of the information they exchange during standard agent-to-server communication. If they connect to a repository with a catalog version that is older than the master repository's, they will flag it as "not up to date" and move on. I honestly don't know what they will do if they encounter a catalog version that is more recent than the master repository's. If an update from a repository fails, you can check mcscript.log for details on precisely why.
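To make that check concrete, here is a small model of the behaviour described above. It is an illustration, not the agent's code. It assumes SiteStat.xml has a root element carrying a CatalogVersion attribute with a numeric value (check a real file from one of your repositories before relying on that), and because the behaviour for a repository newer than the master is unknown, it reports that case as "unknown" rather than guessing.

```python
import xml.etree.ElementTree as ET


def catalog_version(sitestat_xml: str) -> int:
    """Read CatalogVersion from the root element of a SiteStat.xml document.

    ASSUMPTION: the attribute name and numeric format are not confirmed here.
    """
    root = ET.fromstring(sitestat_xml)
    return int(root.attrib["CatalogVersion"])


def classify(repo_version: int, master_version: int) -> str:
    """Model of the agent behaviour described in the thread."""
    if repo_version < master_version:
        return "not up to date"  # agent flags the repository and moves on
    if repo_version == master_version:
        return "current"
    return "unknown"  # newer than master: behaviour not known
```

This is also why swapping in NAI content can trip up a 'Global' repository: its catalog version no longer matches what the agent learned from the server.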
Ok, things are a little clearer now.
It seems that if you configure a repository as a distributed repository within EPO, it can appear as a repository on the sitelist for clients to use. It appears in the list of repositories in the McAfee Agent policy, where you can enable or disable it, but the policy considers it a 'Global' repository.
As such, as Jeremy mentioned, it seems to be fussy about catalog version.
Interestingly, though, if you take the 'same' UNC path that clients use to access this repository (which is automatically included in the policy when you configure a distributed repository) and instead define it manually in the McAfee Agent policy as a Local Repository, the agent does not seem to care about catalog version: the 'Invalid Repository' error does not occur if you swap the repository contents for, say, those of CommonUpdater from ftp.nai.
What I will have to do for my 60 or so repositories is configure them all as FTP distributed repositories but have them disabled in the McAfee Agent repository list. I will then specify a 'Local Repository' using a DNS name such as updateav, and make sure that in each of the different DNS zones this name resolves to the FTP repository in that location. This way the McAfee Agent will not treat the distributed repositories as 'Global' repositories, so in a failure scenario I can swap out their contents with NAI's and clients will not complain.
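Since the plan depends on updateav resolving to the right FTP repository in every zone, a quick check across all 60 or so zones is worth having. A minimal sketch, assuming you maintain a table of per-zone FQDNs (hypothetical names such as updateav.site1.example.com) and the repository IP each should resolve to. It queries from wherever the script runs, so with split-horizon DNS the answers may differ from what clients in each site actually see.

```python
import socket
from typing import Callable, Dict, List, Tuple


def check_aliases(expected: Dict[str, str],
                  resolve: Callable[[str], str] = socket.gethostbyname
                  ) -> List[Tuple[str, str, str]]:
    """Return (fqdn, expected_ip, actual) for every alias that resolves wrongly.

    expected maps each zone's updateav FQDN to the IP address of the FTP
    repository in that location. Lookup failures are reported as
    'unresolved' instead of raising, so one bad zone does not stop the run.
    """
    problems = []
    for fqdn, want in sorted(expected.items()):
        try:
            got = resolve(fqdn)
        except OSError:  # socket.gaierror is a subclass of OSError
            got = "unresolved"
        if got != want:
            problems.append((fqdn, want, got))
    return problems
```

Any entry returned is a zone where clients would reach something other than their local repository.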
It sounds like a plan... what do you think?
As this example proves: -
Sounds like a messy plan but then it was always going to be that way.
Nice to get a clearer understanding of it all, though; thanks for sharing.