Hi, I need your help. I came across your blog and have tried everything without success.
I have 13 SA repositories and the problem is with 4 of them. The Super Agent Repository Replication task fails every day at the same point (after copying VSE870.msi), at almost 45% after 1 hour 45 minutes. I created a new test task using just one SA repository and the result is the same: "Failure".
Here are the changes I have tried and the McAfee environment configuration:
* EPO 4.5 Patch 1
* SA Repository Agent 4.0 Patch 3
* VirusScan Enterprise 8.7
* Created a new task that replicates just one SA repository, on a different schedule from the master task that replicates all the working SA repositories without failure. I chose only the necessary packages (Engine, DAT, Agent) and cleared the "Replicate legacy DATs" check box
* Reinstalled the agent on the SA repository (it is now using 4.0 Patch 3)
* Every day the replication fails at the same point (after copying VSE870.msi)
* I am using advanced logging on the ePO server
* It's not a DNS or NTFS permission problem
Please help me.
Here is part of the EPOAPSVR log file:
20100421042134 I #5284 naInet ------------------------------------------------------------
20100421042134 I #5284 SIM_InetMgr Session 1 ended, result=1
20100421042134 I #5284 SiteMgr GeneralInetRequestThreadProc: GeneralInetRequest thread ended
20100421042134 x #5284 SiteMgr SiteMgr main control final release...
20100421043343 I #3084 naInet HTTP Server returned success, HTTP return code: HTTP/1.0 200 OK
20100421043343 I #3084 SIM_InetMgr Uploaded file VSE870.msi successfully in session 1
20100421043343 I #1544 naInet HTTP Session closed
20100421043343 I #1544 naInet ------------------------------------------------------------
20100421043343 I #1544 SIM_InetMgr Session 1 ended, result=1
20100421043343 e #1544 SiteMgr ReplicationThreadProc: Upload data to site ePOSA_NASAMCBO failed
20100421043343 e #2724 SiteMgr ReplicationThreadProc: Replication finished with partial failure
20100421043343 x #2724 SiteMgr SiteMgr main control final release...
So I'm using the epoapsvr.log you linked here:
http://community.mcafee.com/message/127442#127442
The last replication attempt failed to a site named ePOSA_NASAMCBO with these errors in the epoapsvr.log:
20100426025936 I #2192 naInet HTTP Session initialized
20100426025936 I #2192 naInet Connecting to HTTP Server in socket-mode
20100426025936 I #2192 naInet Connecting to Real Server: NASAMCBO.ven.rsa-ins.com on port: 8081
20100426025957 E #2192 naInet Failed to connect to Real Server: NASAMCBO.ven.rsa-ins.com on port: 8081
20100426025957 E #2192 naInet Socket error: 10060
20100426025957 I #2192 naInet HTTP Session closed
20100426025957 I #2192 naInet ------------------------------------------------------------
20100426025957 E #2192 SIM_InetMgr Start session for site upload failed
20100426025957 e #2192 SiteMgr ReplicateSite: Failed to connect to site ePOSA_NASAMCBO
20100426025957 I #2192 SrvEvtInf Generating Event
20100426025957 e #2192 SiteMgr ReplicationThreadProc: Upload data to site ePOSA_NASAMCBO failed
This is a straightforward failure to connect: ePO attempted to contact the super agent via the agent wakeup call port and failed to establish a connection (socket error 10060 is a connection timeout). For this you typically need to make sure there is a route from the ePO server to the machine hosting the super agent repository, confirm the agent service is running on that machine, and confirm that frameworkservice.exe is listening on port 8081 on the machine hosting the SA repository.
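One quick way to check the last point from the ePO server is to attempt a plain TCP connection to the same host and port that appear in the log. The following is only a minimal sketch, not a McAfee tool: the helper name `can_connect` is made up for illustration, and it tests nothing beyond whether the TCP handshake completes.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name resolution failures.
        return False


if __name__ == "__main__":
    # Host name and port are taken from the log excerpt above.
    host, port = "NASAMCBO.ven.rsa-ins.com", 8081
    print(f"{host}:{port} reachable: {can_connect(host, port)}")
```

A result of `False` here would point at routing, a firewall, or the agent not listening, rather than at ePO itself.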
This appears to be an intermittent network issue: a little higher in the log, ePO connected successfully to the same site (this occurred the day before):
20100425030007 I #6208 naInet Connecting to Real Server: NASAMCBO.ven.rsa-ins.com on port: 8081
20100425030007 I #6208 naInet Connected to HTTP Server: NASAMCBO.ven.rsa-ins.com
However, a little further along in the log, on the same thread, the connection is reset while it is trying to send a file:
20100425032454 E #6208 NaiInet Socket send error 10054
20100425032454 E #6208 naInet Failes to upload data in bytes: 65536
20100425032454 I #6208 SIM_InetMgr Upload file avvdat-5962.zip failed in session 1, nainet ret=10054
20100425032454 I #6208 SiteMgr ReplicationUploadFile: Failed to upload file avvdat-5962.zip to site ePOSA_NASAMCBO::Current\VSCANDAT1000\DAT\0000, hr=-2147467259, retry limit remaining: 4
20100425032454 I #6208 SiteMgr ReplicationUploadFile: Uploading file avvdat-5962.zip to site ePOSA_NASAMCBO::Current\VSCANDAT1000\DAT\0000, retry limit remaining: 4
20100425032456 I #6208 SIM_InetMgr Uploading file avvdat-5962.zip from session 1, LocalDir=C:/PROGRA~1/McAfee/EPOLIC~1/DB\Software\Current\VSCANDAT1000\DAT\0000, RemoteDir=Current\VSCANDAT1000\DAT\0000
20100425032456 I #6208 naInet Uploading file C:/PROGRA~1/McAfee/EPOLIC~1/DB\Software\Current\VSCANDAT1000\DAT\0000\avvdat-5962.zip to HTTP Server
20100425032456 I #6208 naInet Connecting to Real Server: NASAMCBO.ven.rsa-ins.com on port: 8081
20100425032517 I #6208 naInet Failed to connect to Real Server: NASAMCBO.ven.rsa-ins.com on port: 8081
20100425032517 I #6208 SIM_InetMgr Upload file avvdat-5962.zip failed in session 1, nainet ret=10054
20100425032517 I #6208 SiteMgr ReplicationUploadFile: Failed to upload file avvdat-5962.zip to site ePOSA_NASAMCBO::Current\VSCANDAT1000\DAT\0000, hr=-2147467259, retry limit remaining: 3
It goes on to retry 5 times and then gives up. Notice the return code Windows is passing back to ePO:
20100425032454 E #6208 NaiInet Socket send error 10054
So 10054 = Connection reset by peer (ref: http://msdn.microsoft.com/en-us/library/ms740668(VS.85).aspx). This indicates something other than EPO is closing the connection. Could be one of several things, perhaps the agent service hosting the repository stopped on the remote machine? It could also indicate that at the time of the replication the WAN was so overloaded it couldn't process these requests in a timely fashion. The files the replication is failing on (avvdat-5962.zip and vse870.msi) are both larger files so maybe you don't have the bandwidth required to send those files over the WAN? To test this you could try manually copying one of those files from the EPO server to the machine hosting the super agent repository.
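To see how often each socket error occurs and on which thread, the log can be scanned for the error lines. The sketch below assumes the line layout seen in the excerpts above (timestamp, level, `#thread`, module, message); the function name and the small error table are illustrative, and only the codes 10054, 10060 and 10061 are mapped.

```python
import re
from datetime import datetime

# Standard Winsock error codes; only a few relevant ones are listed here.
WINSOCK_ERRORS = {
    10054: "WSAECONNRESET: connection reset by peer",
    10060: "WSAETIMEDOUT: connection timed out",
    10061: "WSAECONNREFUSED: connection refused",
}

# Assumed layout: "<YYYYMMDDhhmmss> <level> #<thread> <module> <message>"
LINE_RE = re.compile(r"^(\d{14})\s+(\S)\s+#(\d+)\s+(\S+)\s+(.*)$")
# Matches both "Socket error: 10060" and "Socket send error 10054".
SOCKET_ERR_RE = re.compile(r"Socket (?:send )?error:? (\d+)")


def socket_errors(lines):
    """Return (timestamp, thread id, error code, meaning) for each socket error line."""
    results = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the assumed layout
        stamp, _level, thread, _module, message = match.groups()
        err = SOCKET_ERR_RE.search(message)
        if err:
            code = int(err.group(1))
            when = datetime.strptime(stamp, "%Y%m%d%H%M%S")
            results.append((when, int(thread), code, WINSOCK_ERRORS.get(code, "unknown")))
    return results


if __name__ == "__main__":
    sample = [
        "20100425032454 E #6208 NaiInet Socket send error 10054",
        "20100426025957 E #2192 naInet Socket error: 10060",
    ]
    for when, thread, code, meaning in socket_errors(sample):
        print(when, thread, code, meaning)
```

Grouping the results by thread and time makes it easier to tell a one-off reset from a pattern that repeats at the same point in every replication.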
I hope that helps get you going in the right direction.
Hi Jeremy,
Thank you for your answer; it is a really good explanation. My comments are marked with "R=" below. Please feel free to ask if you have any other questions:
The last replication attempt failed to a site named ePOSA_NASAMCBO with these errors in the epoapsvr.log:
20100426025936 I #2192 naInet HTTP Session initialized
20100426025936 I #2192 naInet Connecting to HTTP Server in socket-mode
20100426025936 I #2192 naInet Connecting to Real Server: NASAMCBO.ven.rsa-ins.com on port: 8081
20100426025957 E #2192 naInet Failed to connect to Real Server: NASAMCBO.ven.rsa-ins.com on port: 8081
20100426025957 E #2192 naInet Socket error: 10060
20100426025957 I #2192 naInet HTTP Session closed
20100426025957 I #2192 naInet ------------------------------------------------------------
20100426025957 E #2192 SIM_InetMgr Start session for site upload failed
20100426025957 e #2192 SiteMgr ReplicateSite: Failed to connect to site ePOSA_NASAMCBO
20100426025957 I #2192 SrvEvtInf Generating Event
20100426025957 e #2192 SiteMgr ReplicationThreadProc: Upload data to site ePOSA_NASAMCBO failed
This is a straight-forward failure to connect. So ePO attempted to contact the super agent via the agent wakeup call port and failed to establish a connection. For this typically you need to make sure we have a route from the EPO server to the machine hosting the super agent repository,
R= Yes, the SA hosting machine is a server in the same domain, at a remote branch connected over a dedicated 256KBps network link
confirm the agent service is running on the client machine and that the frameworkservice.exe is listening on port 8081 on the machine hosting the SA repository.
R= Yes, the agent is running. I checked the agent log and didn't see any events related to disconnections or similar
This appears to be an intermittent network issue as you can see a little higher in the log it was successfully connecting to the same site (this occurred the day before):
20100425030007 I #6208 naInet Connecting to Real Server: NASAMCBO.ven.rsa-ins.com on port: 8081
20100425030007 I #6208 naInet Connected to HTTP Server: NASAMCBO.ven.rsa-ins.com
However, a little further along in the log, on the same thread, the connection is reset while it is trying to send a file:
20100425032454 E #6208 NaiInet Socket send error 10054
20100425032454 E #6208 naInet Failes to upload data in bytes: 65536
20100425032454 I #6208 SIM_InetMgr Upload file avvdat-5962.zip failed in session 1, nainet ret=10054
20100425032454 I #6208 SiteMgr ReplicationUploadFile: Failed to upload file avvdat-5962.zip to site ePOSA_NASAMCBO::Current\VSCANDAT1000\DAT\0000, hr=-2147467259, retry limit remaining: 4
20100425032454 I #6208 SiteMgr ReplicationUploadFile: Uploading file avvdat-5962.zip to site ePOSA_NASAMCBO::Current\VSCANDAT1000\DAT\0000, retry limit remaining: 4
20100425032456 I #6208 SIM_InetMgr Uploading file avvdat-5962.zip from session 1, LocalDir=C:/PROGRA~1/McAfee/EPOLIC~1/DB\Software\Current\VSCANDAT1000\DAT\0000, RemoteDir=Current\VSCANDAT1000\DAT\0000
20100425032456 I #6208 naInet Uploading file C:/PROGRA~1/McAfee/EPOLIC~1/DB\Software\Current\VSCANDAT1000\DAT\0000\avvdat-5962.zip to HTTP Server
20100425032456 I #6208 naInet Connecting to Real Server: NASAMCBO.ven.rsa-ins.com on port: 8081
20100425032517 I #6208 naInet Failed to connect to Real Server: NASAMCBO.ven.rsa-ins.com on port: 8081
20100425032517 I #6208 SIM_InetMgr Upload file avvdat-5962.zip failed in session 1, nainet ret=10054
20100425032517 I #6208 SiteMgr ReplicationUploadFile: Failed to upload file avvdat-5962.zip to site ePOSA_NASAMCBO::Current\VSCANDAT1000\DAT\0000, hr=-2147467259, retry limit remaining: 3
It goes on to retry 5 times and then gives up. Notice the return code Windows is passing back to ePO:
20100425032454 E #6208 NaiInet Socket send error 10054
So 10054 = Connection reset by peer (ref: http://msdn.microsoft.com/en-us/library/ms740668(VS.85).aspx). This indicates something other than EPO is closing the connection.
Could be one of several things, perhaps the agent service hosting the repository stopped on the remote machine?
R= The agent service was not stopped: the agent log shows no events related to the service stopping or similar
It could also indicate that at the time of the replication the WAN was so overloaded it couldn't process these requests in a timely fashion.
R= I have tried running the task on different schedules and at different times of day (peak and off-peak hours) and the result is always the same
The files the replication is failing on (avvdat-5962.zip and vse870.msi) are both larger files so maybe you don't have the bandwidth required to send those files over the WAN?
R= OK, I agree that both are larger files, but my question is: why, when I copy both of them or even larger files using Windows copy (copy and paste from the remote to the local path), do I never get an error, and the copy finishes without problems?
To test this you could try manually copying one of those files from the EPO server to the machine hosting the super agent repository.
R= I did this at different times and copying from the ePO server to the hosting machine works fine, without problems
I think the problem is something to do with HTTP connections or port 8081, but I don't know how to detect or solve it
I really appreciate your help. I'll be waiting for your comments
Regards,
Unfortunately I don't have much beyond that. The log files provided clearly indicate that the connection was reset by the remote host and NOT that ePO itself terminated the connection. If you believe the problem has something to do with an HTTP file transfer or port 8081, then you could switch the repository to a UNC share, which uses neither of the above.
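Before switching, one way to gather evidence about port 8081 is to probe it at regular intervals during the replication window and compare the timestamps of any failures with those in epoapsvr.log. This is a rough sketch with hypothetical helper names (`probe`, `monitor`); it only tests whether a TCP connection can be opened, so it will not reproduce a reset that happens in the middle of a large upload.

```python
import socket
import time
from datetime import datetime


def probe(host, port, timeout=5.0):
    """Return None if a TCP connection succeeds, otherwise the OSError raised."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:
        return exc


def monitor(host, port, attempts, interval_s, timeout=5.0):
    """Probe host:port `attempts` times and return a list of (timestamp, error) pairs."""
    results = []
    for i in range(attempts):
        results.append((datetime.now(), probe(host, port, timeout)))
        if i < attempts - 1:
            time.sleep(interval_s)  # wait between probes, but not after the last one
    return results


if __name__ == "__main__":
    # Host and port come from the log; attempts and interval are arbitrary choices.
    for when, err in monitor("NASAMCBO.ven.rsa-ins.com", 8081, attempts=10, interval_s=60):
        print(when.strftime("%Y%m%d%H%M%S"), "ok" if err is None else f"failed: {err}")
```

If the probes fail at the same moments the replication log shows 10054 or 10060, that points at the network path or the remote host rather than at the size of the files.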
Thank you Jeremy,
I'm going to check the whole server configuration and install all the Windows updates and other patches to try to solve the problem, and as a last resort I'll change the SA repository to UNC.
I'll let you know the result.
Regards, and thank you for your valuable help