OK, so you have the following:
Sensor Model: M-4050
Sensor SW: 18.104.22.168
Spikes happen every 15 minutes and last for 6 minutes. Is this trend ongoing, or does it happen on some days and not others?
Do you see an increase in any particular alerts around these times?
What features are you using on the Sensor? (SSL Decryption/HTTP Response Scanning/etc)
When you SSH into the device, can you go to the debug shell, type show sensor health, and post the output here if anything is bad?
Also, to shed some light on the show sensor-load command: just because the PEs only show approximately 20% load across all of them, it does not mean that the traffic being scanned isn't causing high CPU usage. This is why I am asking which features you have enabled, and whether there is an increase in a certain alert or alerts.
Just to add to mjesmer's reply -
Do you have any scheduled tasks configured during that time, e.g. an automated sigset/botnet download to the sensor? Performance monitoring will also increase the load on the sensors. Do you have any of the real-time CPU/mem monitors in use during that time?
Also, what's happening at the network level? Do you see any patterns in specific protocols being monitored during that time, e.g. an increase in HTTP traffic (inbound or outbound; do you have HTTP response scanning enabled?), network backups, FTP file transfers, etc.?
The output for this device is:
IntruDbg#> show sensor health
bootflag = on
sensor health = good
health of control channel = good
health of correlation engine = good
health of snmp master agent = good
health of snmp sub agent = good
health of packet log = good
health of system controller = good
health of CLI = good
health of Log Main = good
health of Log Task = good
health of SGAP = good
health of AuthGw = good
health of ACLDaemon = good
health of TrustedSource = good
health of BCM = good
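For longer outputs, a small script can pick out any component that is not reporting good. This is only a minimal sketch in Python, assuming the output has been pasted as plain text in the `name = value` form shown above; the function name `unhealthy_components` is made up for illustration and is not part of any sensor tooling.

```python
def unhealthy_components(output):
    """Return {component: status} for each health line whose status is not 'good'."""
    problems = {}
    for line in output.splitlines():
        if "=" not in line:
            continue  # skips the prompt line and any blank lines
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if "health" not in key.lower():
            continue  # e.g. 'bootflag = on' is not a health status
        if value.lower() != "good":
            problems[key] = value
    return problems


# Shortened copy of the output above, used as sample input.
sample = """IntruDbg#> show sensor health
bootflag = on
sensor health = good
health of packet log = good
health of BCM = good"""

print(unhealthy_components(sample))  # empty dict when everything is good
```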
This trend has been ongoing up to today.
FYI, we didn't see any increase in any particular alerts during the spike time.
We have enabled HTTP Response Scanning on this device, but no SSL decryption.
We also have the real-time CPU/mem monitors running all the time.
As advised by d_aloy regarding the network level, I checked the IPS Flow Usage and Throughput graphs and found an identical spike on both.
Perhaps I should start checking at the network level to see whether any network backups or FTP transfers are running at these times.
Thanks a lot guys!
What metrics are you collecting under Device > <Device_Name> > Setup > Performance Monitoring > Metrics?
Are you seeing faults generated for CPU utilization?
If you change the time frame from minutes to hours does the problem disappear?
I am collecting both CPU Utilization Data and Port Throughput Utilization Data under Metrics.
I also did not see any faults generated for CPU utilization.
Yes, when I change the time frame from minutes to hours, the problem disappears and I no longer see the spike. Do you have anything in mind?
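One possible reading of this: if the hourly view averages the per-minute samples (an assumption; how the Manager actually aggregates is not confirmed anywhere in this thread), a short recurring spike would be flattened into a steady value. A rough sketch in Python with made-up numbers (20% baseline, 90% during a 6-minute spike every 15 minutes), purely to illustrate the effect:

```python
def minute_samples(baseline=20.0, spike=90.0, period=15, spike_len=6, minutes=60):
    """Simulated per-minute CPU %: a spike_len-minute spike at the start of every period."""
    return [spike if (m % period) < spike_len else baseline for m in range(minutes)]


def hourly_average(samples):
    """Collapse one hour of per-minute samples into a single averaged value."""
    return sum(samples) / len(samples)


samples = minute_samples()
print(max(samples))             # peak seen in the per-minute view
print(hourly_average(samples))  # single value seen in an averaged hourly view
```

With these illustrative values every hour averages out to the same figure, so the hourly graph would look flat even though each hour contains four spikes.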
I tried using these charts in 8.2 to get throughput details and had issues: the values displayed were incorrect or just didn't make sense, and they increased and decreased when the time value was changed.
If you haven't already, I would open an SR with support to have them look into it.
This data is stored in the iv_perf_mon_* tables; you could try querying them to see what values they actually hold for CPU utilization.