I need to generate a report showing the top 10 sites to which the most data has been uploaded. The goal is to identify potentially fraudulent data leakage.
I went through the existing reports and report sections, but I couldn't find how to include this information in a report.
Do you know if it is possible? If not, are you monitoring such information through other means?
Please note that Web Reporter is only as useful as the data in the logs. Regarding bytes, there are 4 different values, but Web Reporter only has one inbound and one outbound byte value:
1) From client to proxy (bytes_from_client)
2) From proxy to the web server (bytes_to_server)
3) From web server to proxy (bytes_from_server)
4) From proxy to the client (bytes_to_client)
(Source: page 38 of the product guide.)
If you are worried about possible data leakage, you are probably most interested in bytes_to_server. Keep in mind that if you block sites like BitTorrent (for HTTP traffic, or for HTTPS with the SSL Scanner enabled), you would not see any bytes_to_server, but there would still be bytes_from_client. Hopefully this helps you decide which 2 values matter most to you. If you put all 4 values in your log file, I don't know which 2 are used, but my guess would be the first one (from left to right).
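To make the trade-off concrete, here is a minimal Python sketch (not Web Reporter's own logic) that prefers bytes_to_server when it was logged and falls back to bytes_from_client otherwise. The dict-per-request layout and the 5000-byte blocked upload are hypothetical; only the field names come from this thread.

```python
# Hypothetical record layout: dict of Web Gateway byte-counter name -> int.

def uploaded_bytes(record: dict) -> int:
    """Bytes that actually left the gateway toward the web server, if known.

    Falls back to bytes_from_client (what the client tried to send) when
    bytes_to_server was not logged.
    """
    if "bytes_to_server" in record:
        return record["bytes_to_server"]
    return record.get("bytes_from_client", 0)


# A blocked upload: the client sent data to the proxy, nothing reached the server.
blocked = {"bytes_from_client": 5000, "bytes_to_server": 0}
# The same request when only bytes_from_client is in the log.
blocked_partial_log = {"bytes_from_client": 5000}

print(uploaded_bytes(blocked))              # 0
print(uploaded_bytes(blocked_partial_log))  # 5000
```

With only bytes_from_client logged, blocked uploads still count as "uploaded", so a top-10 list built on that value can overstate real leakage.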
Once you have the byte values in the log, you just need to create a report with the appropriate byte value (caution: "bytes" is the sum of inbound and outbound bytes). On the Column Properties tab of the query, sort bytes in descending order to push the largest values to the top.
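Outside Web Reporter, the same "sum per site, sort descending, keep the top 10" idea can be sketched in a few lines of Python. This is an illustration under the assumption that each log line has already been reduced to a (site, uploaded_bytes) pair; the sites and byte counts are borrowed from the sample log lines in this thread, and the second secure.evenium.com entry is invented to show the summing.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_upload_sites(records: Iterable[Tuple[str, int]], n: int = 10) -> List[Tuple[str, int]]:
    """Sum uploaded bytes per site and return the n largest, descending."""
    totals = Counter()
    for site, nbytes in records:
        totals[site] += nbytes
    # most_common() orders by total descending, like sorting "bytes"
    # descending in the query's column properties.
    return totals.most_common(n)


records = [
    ("cs-quotes.iitech.dk", 493),
    ("secure.evenium.com", 306),
    ("www.cdiscount.com", 3483),
    ("secure.evenium.com", 306),  # invented repeat visit
]
print(top_upload_sites(records, n=2))
# [('www.cdiscount.com', 3483), ('secure.evenium.com', 612)]
```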
Thanks for the explanation. At first, I didn't notice the "Web - Detailed" combo box in the query definition, where I found the related bandwidth items, but bytes_to_server doesn't exist in Web Reporter. So I decided to go with bytes_from_client as a starting point.
I modified the log format on my Web Gateway as in the example below:
#time_stamp "auth_user" src_ip status_code "req_line" "categories" "rep_level" "media_type" bytes_from_server bytes_from_client "user_agent" "virus_name" "block_res"
[29/Jan/2013:14:20:37 +0400] "" 220.127.116.11 200 "GET http://cs-quotes.iitech.dk/_datafeed/Quotes/IITDCQuoteFeedExt30.dll?MODE=REFRESH&ID=7999&SEQ=5653 HTTP/1.1" "Finance/Banking" "Minimal Risk" "text/html" 1356 493 "Mozilla/5.0 (Windows NT 6.1; rv:18.0) Gecko/20100101 Firefox/18.0" "" "0"
[29/Jan/2013:14:20:37 +0400] "" 18.104.22.168 200 "CONNECT secure.evenium.com:443 HTTP/1.1" "Software/Hardware" "Minimal Risk" "" 6786 306 "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)" "" "0"
[29/Jan/2013:14:20:38 +0400] "" 22.214.171.124 200 "GET http://www.cdiscount.com/Overlayers.mvc/783699.html HTTP/1.1" "Online Shopping" "Minimal Risk" "text/html" 125332 3483 "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.2; Trident/4.0; .NET CLR 1.1.4322; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)" "" "0"
Both values are logged and they differ. The bytes_from_server value is larger than the bytes_from_client value, which seems logical for these requests. Everything looks all right on the gateway.
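One way to double-check a modified format outside Web Reporter is to parse a line by hand against its #-header. The following is a minimal Python sketch, assuming the only token types are a [bracketed timestamp], a "double-quoted string", and a bare word; it is not how Web Reporter's auto-discover works, and field_names, parse_line and site_of are hypothetical helpers.

```python
import re
from urllib.parse import urlsplit

# Header copied from the Web Gateway log format above.
HEADER = ('#time_stamp "auth_user" src_ip status_code "req_line" "categories" '
          '"rep_level" "media_type" bytes_from_server bytes_from_client '
          '"user_agent" "virus_name" "block_res"')

# A token is a [bracketed timestamp], a "quoted string", or a bare word.
TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')


def field_names(header: str) -> list:
    """Turn the #-header into plain field names."""
    return [tok.strip('#"') for tok in TOKEN.findall(header)]


def unwrap(tok: str) -> str:
    """Drop the surrounding brackets or quotes, if any."""
    return tok[1:-1] if tok[0] in '["' else tok


def parse_line(line: str, names: list) -> dict:
    """Map one access-log line onto the header's field names."""
    values = [unwrap(tok) for tok in TOKEN.findall(line)]
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} fields, got {len(values)}")
    return dict(zip(names, values))


def site_of(req_line: str) -> str:
    """Host from 'GET http://host/... HTTP/1.1' or 'CONNECT host:443 HTTP/1.1'."""
    method, target, _ = req_line.split(" ", 2)
    if method == "CONNECT":
        return target.rsplit(":", 1)[0]
    return urlsplit(target).hostname or target


line = ('[29/Jan/2013:14:20:37 +0400] "" 220.127.116.11 200 '
        '"GET http://cs-quotes.iitech.dk/_datafeed/Quotes/IITDCQuoteFeedExt30.dll'
        '?MODE=REFRESH&ID=7999&SEQ=5653 HTTP/1.1" "Finance/Banking" "Minimal Risk" '
        '"text/html" 1356 493 '
        '"Mozilla/5.0 (Windows NT 6.1; rv:18.0) Gecko/20100101 Firefox/18.0" "" "0"')
rec = parse_line(line, field_names(HEADER))
print(site_of(rec["req_line"]), rec["bytes_from_client"])  # cs-quotes.iitech.dk 493
```

If parse_line raises on real lines, the header and the lines disagree, which is worth fixing before looking at the reporting side.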
Web Reporter is configured with the "McAfee Web Gateway - Auto Discover" format, and my report is configured to show Bytes from client (sorted descending) and Bytes from server. Everything seems right on Web Reporter, but the resulting report is inconsistent:
Bytes from client is 0, and Bytes from server equals Bytes (which should be the sum of inbound and outbound bytes, as you said).
Could the issue be that the log format initially uploaded to Web Reporter was different from the current one (which has the additional bytes_from_client field)? Or is there something wrong with my log format?
Thanks for your help.
on 29/01/13 04:49:27 CST
on 29/01/13 06:54:31 CST
This message was edited by: mcyp on 29/01/13 06:56:12 CST
I see bytes in the bytes_from_client field of your log, so you should see data in bytes_from_client for new data. Of course, existing data would still be 0, as you mentioned. The log structure looks fine. If you had made a mistake, the log job would either fail or report success with 100% of the lines as errors.
Is new data still showing 0 bytes_from_client?
Hmm. Something strange is going on.
I generated a report with my query for the whole of today. It gave me this:
I generated a report with custom dates, from 4 PM today (approximately 1 hour after the new log format was configured on the gateway) to the end of the day. It gave me this:
I generated a third report with custom dates, from 1 PM today (before the new log format was applied) to the end of the day. It gave me this:
It seems obvious that there is an issue with Web Reporter's automatic discovery of the log format: the newest data, in the new log format, is not taken into account at all. Do I need to define a custom log format, or rebuild a new database from scratch in Web Reporter?
This message was edited by: mcyp
I would say that your newer logs are failing to import. As I said in my previous post, either the jobs are failing (usually due to a problem with the header) or the status is "Successful" but 100% of the lines are errors (the log lines don't match the header).
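The second failure mode (a "Successful" job where every line is an error) usually means the header and the lines disagree. A rough pre-check, assuming that disagreement can be approximated by a differing field count, might look like this Python sketch; it does not reproduce Web Reporter's actual validation, error_rate is a hypothetical helper, and the 3-field header and 10.0.0.x addresses are made up.

```python
import re

# Same tokenizer idea: [bracketed timestamp], "quoted string", or bare word.
TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')


def error_rate(lines) -> float:
    """Fraction of log lines whose field count differs from the #-header.

    Each line starting with '#' sets the expected field count for the
    lines that follow it.
    """
    expected = None
    total = bad = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            expected = len(TOKEN.findall(line))
            continue
        if expected is None:
            raise ValueError("log line found before any #-header")
        total += 1
        if len(TOKEN.findall(line)) != expected:
            bad += 1
    return bad / total if total else 0.0


sample = [
    "#time_stamp src_ip bytes_from_client",
    "[29/Jan/2013:14:20:37 +0400] 10.0.0.1 493",
    "[29/Jan/2013:14:20:38 +0400] 10.0.0.2 1356 3483",  # one field too many
]
print(error_rate(sample))  # 0.5
```

A result close to 1.0 on a real log would match the "Successful, but 100% errors" symptom.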
Auto-discover works correctly, but it depends on a well-formed log format. Modifying the log format is very error-prone, even for experienced people, since there is no error checking.
So, would it be better to define a custom log format in Web Reporter so that my logs are imported correctly, or to define a log format on the gateway that would be auto-discovered correctly (by putting quotes around bytes_from_client, for example)?
Thanks for your help.
The problem is not auto-detect. Even a custom log parser won't work if your log structure is incorrect.
Find the problem with the access log structure per my suggestions above.
You should never use the custom log format. It does not handle block codes for Web Gateway.
In fact it was working. The only issue was that the logs were not uploaded in real time, so when I generated the report the data was not yet available in Web Reporter. Now it works fine.