Hi,
Users have been complaining about "pauses" and "sporadic errors" accessing the internet.
The dashboard shows nothing relevant on the "Alerts" tab, but I noticed that about 5 times a day the number of connected clients drops abruptly, sometimes to zero, and then recovers. The data is collected via SNMP using the "MCAFEE-MWG-MIB.stClientCount.0" OID.
Looking at the error logs (in /opt/mwg/log/mwg-errors) I found entries like those below; the timestamps match those of the drops in the number of connected clients.
mwg-core.errors.log:[2016-11-25 10:43:45.726 +00:00] [Core] [TermSignalReceived] 'McAfee Web Gateway Core version: 7.7.0.1.0 - build: 22207' - child process exited (termsignal='11').
mwg-core.errors.log:[2016-11-25 11:05:24.644 +00:00] [Core] [TermSignalReceived] 'McAfee Web Gateway Core version: 7.7.0.1.0 - build: 22207' - child process exited (termsignal='11').
mwg-core.errors.log:[2016-11-25 14:56:10.013 +00:00] [Core] [TermSignalReceived] 'McAfee Web Gateway Core version: 7.7.0.1.0 - build: 22207' - child process exited (termsignal='11').
mwg-core.errors.log:[2016-11-25 16:35:20.657 +00:00] [Core] [TermSignalReceived] 'McAfee Web Gateway Core version: 7.7.0.1.0 - build: 22207' - child process exited (termsignal='11').
mwg-core.errors.log:[2016-11-25 18:28:02.600 +00:00] [Core] [TermSignalReceived] 'McAfee Web Gateway Core version: 7.7.0.1.0 - build: 22207' - child process exited (termsignal='11').
mwg-core.errors.log:[2016-11-25 18:30:05.978 +00:00] [Core] [TermSignalReceived] 'McAfee Web Gateway Core version: 7.7.0.1.0 - build: 22207' - child process exited (termsignal='11').
Signal 11 is SIGSEGV.
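For anyone who wants to tally these entries and line them up with the client-count drops, here is a minimal Python sketch. It assumes the log lines have exactly the format shown above; the function name parse_crashes and the regular expression are my own and not part of any McAfee tooling. The UTC offset is ignored because every sample line is +00:00.

```python
import re
import signal
from datetime import datetime

# Matches the timestamp and the termsignal value in lines like the ones above.
# Assumption: the format is stable across entries; the UTC offset is skipped.
LINE_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) [+-]\d{2}:\d{2}\]"
    r".*termsignal='(?P<sig>\d+)'"
)

def parse_crashes(lines):
    """Return a list of (timestamp, signal_name) for each child-exit entry."""
    crashes = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a TermSignalReceived entry
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        # signal.Signals maps the number to its symbolic name (11 -> SIGSEGV).
        sig_name = signal.Signals(int(m.group("sig"))).name
        crashes.append((ts, sig_name))
    return crashes

sample = [
    "mwg-core.errors.log:[2016-11-25 10:43:45.726 +00:00] [Core] [TermSignalReceived] "
    "'McAfee Web Gateway Core version: 7.7.0.1.0 - build: 22207' - child process exited (termsignal='11').",
    "mwg-core.errors.log:[2016-11-25 11:05:24.644 +00:00] [Core] [TermSignalReceived] "
    "'McAfee Web Gateway Core version: 7.7.0.1.0 - build: 22207' - child process exited (termsignal='11').",
]

for ts, sig_name in parse_crashes(sample):
    print(ts.isoformat(), sig_name)
```

The resulting timestamps can then be compared against the times at which the SNMP graph shows the client count dropping.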
dmesg recorded more than a hundred occurrences like this:
[3034797.008397] traps: mwg-core[74787] general protection ip:7f15607e863f sp:7f15203cb280 error:0 in libProxy.so[7f1560501000+4ad000]
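The trap line gives the faulting instruction pointer (ip) and the load address and size of libProxy.so, so the offset of the fault inside the library is ip minus the load address. The sketch below computes it; it assumes the dmesg line has exactly the layout shown above, and the name fault_offset is hypothetical. If the offset is the same across the hundred-odd occurrences, the crash is likely hitting the same code path each time, which would be worth mentioning to support.

```python
import re

# Assumption: the kernel "traps:" line follows the layout of the sample above.
TRAP_RE = re.compile(
    r"traps: (?P<proc>\S+)\[(?P<pid>\d+)\] .*?"
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<err>[0-9a-f]+)"
    r" in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def fault_offset(line):
    """Return (library, offset, inside_mapping) or None if the line does not match."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    size = int(m.group("size"), 16)
    offset = ip - base
    # Sanity check: the offset should fall inside the mapped region.
    return m.group("lib"), offset, 0 <= offset < size

line = ("[3034797.008397] traps: mwg-core[74787] general protection "
        "ip:7f15607e863f sp:7f15203cb280 error:0 in libProxy.so[7f1560501000+4ad000]")

lib, offset, inside = fault_offset(line)
print(lib, hex(offset), inside)  # libProxy.so 0x2e763f True
```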
Has anyone seen these error messages?
Thanks and regards,
Sergio
PS.
I can confirm that the errors began after the upgrade to 7.7.0.1. We upgraded on a Friday; there were no errors during the weekend (very light load), and on Monday the errors began to appear.
I had been suspecting a hardware problem (bad memory), but the timing makes me suspect the upgrade instead ...
Message was edited by: Sergio Veloso
Added confirmation that the errors began only after the upgrade to 7.7.0.1.
Another thing I noticed (from "/opt/mwg/log/debug/binary_invoke.log"):
[2016-11-29 10:07:50.155 +00:00] a parent instance of Core forked a new child...
[2016-11-29 10:07:50.155 +00:00] parent pid : 17703
[2016-11-29 10:07:50.155 +00:00] child pid : 185980
[2016-11-29 10:07:50.155 +00:00] reason : guardian for running instance
[2016-11-29 10:07:50.155 +00:00] a child instance of Core was forked...
[2016-11-29 10:07:50.155 +00:00] parent pid : 17703
[2016-11-29 10:07:50.155 +00:00] child pid : 185980
[2016-11-29 10:07:50.155 +00:00] reason : guardian for running instance
This is logged at the same time as the failure.
Good afternoon, Sergio.
I'm not sure whether my suggestion will help, but check the model and configuration of your appliance: the new releases use a lot of RAM, and the vendor recommends upgrading an appliance with 4 GB to 8 GB. This is what I found out after we upgraded to version 7.6.2.
Hello!
Thanks for the reply! I don't think memory is the problem, though, because the appliance has over 100 GB.
However, the latest release (7.7.0.3) mentions this among the "known issues":
"""
The core process failed on Web Gateway with term signal 11 due to a problem with the PDF opener,
which was caused by a missing root entry in a trailer section. (1168496)
"""
I think that is the issue I am experiencing.
I'll update as soon as possible and come back to report the results.