After analysing our logs on our Hadoop cluster, we have decided to try to flag newly observed domains being requested by our proxy clients.
So far so good: we have a bunch of web proxies that generate a "domain" field containing the domain part of the destination URLs. However, we could not find a suitable way of doing this:
Initially we planned to use Data Enrichment, but to my surprise it can only match against single-column data sources, and to my even greater surprise there is no support for arbitrary REST calls... (no, using SQL Server CLRs or HBase Coprocessors is out of the question).
We went back to the drawing board and reconsidered using Dynamic Watchlists + Correlation rules, but using ESM strings is just impossible. The watchlists match against whatever fields they can! Using that feature is like Nightmare on Grep Street.
Would anyone have a suggestion?
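For context, the logic we are after is essentially the following (a minimal Python sketch; the event structure and names are made up for illustration, our real data lives in Hadoop):

```python
# Sketch of "newly observed domain" detection, assuming each proxy
# event exposes a "domain" field. All names here are illustrative.

def flag_new_domains(events, known_domains):
    """Yield domains seen for the first time and add them to the baseline."""
    for event in events:
        domain = event["domain"]
        if domain not in known_domains:
            known_domains.add(domain)
            yield domain

# Usage: the first visit to a domain is flagged, repeats are not.
events = [{"domain": "example.com"}, {"domain": "new.example.org"},
          {"domain": "example.com"}]
baseline = {"example.com"}
print(list(flag_new_domains(events, baseline)))  # -> ['new.example.org']
```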
Actually, you could try to achieve your scenario with watchlists and alarms.
Just create a static watchlist, then create an alarm based on a field match for domains not in the watchlist.
- The watchlist should be static and the value type should be Domain.
- The condition should be a Field Match on the Domain field, compared against the previously created watchlist.
- Under Devices, select your proxy servers.
- Under Actions, append the new domains to the same watchlist.
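If it helps, the alarm behaviour in the steps above boils down to this (a Python sketch purely for illustration — ESM does this internally, and the names here are made up):

```python
# Illustration of the alarm logic: fire when the event's domain is not
# in the watchlist, and (as the alarm action) append it to the watchlist.
watchlist = {"example.com"}          # static watchlist of type Domain
alarms = []

def on_event(event):
    domain = event["domain"]
    if domain not in watchlist:      # condition: domain NOT in watchlist
        alarms.append(domain)        # alarm fires
        watchlist.add(domain)        # action: append to the watchlist

on_event({"domain": "example.com"})  # known domain -> no alarm
on_event({"domain": "evil.example"}) # new domain -> alarm fires, appended
print(alarms)                        # -> ['evil.example']
```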
You could even extend the scenario by adding a second watchlist that acts as a baseline.
The setup is the same; the only difference is the second watchlist.
The 1st watchlist will be the baseline and will be updated after each alarm.
The 2nd watchlist will contain only the new domains.
You can set the expiry for the second watchlist to a month, or however long you want.
You can perform all these actions with a single alarm and 1-2 watchlists.
Let me know if you want to discuss this in more detail.
I'll try to go into more detail.
The Alarm itself will be used to monitor the Domain field within the event. The Alarm will check whether the information is already present within watchlist "A".
If the information is not present within watchlist "A" the actions for the alarm will add it to watchlist A and B.
This way watchlist "A" will contain all accessed domains and watchlist "B" can be valid for a month.
Watchlist "A" - will contain all domains
Watchlist "B" accessed during the last month and not in watchlist "A"(This way we ensure that we monitor only for new domain occurrences).
Both watchlists will be static and the alarm Action will update them.
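Spelled out as pseudologic (a Python sketch; the expiry is simulated here with timestamps, whereas in ESM you would just set the watchlist's expiry, and the TTL value is an example):

```python
import time

watchlist_a = set()      # baseline: all domains ever seen, never expires
watchlist_b = {}         # new domains -> time added; expires after TTL
TTL = 30 * 24 * 3600     # e.g. one month, tune to taste

def on_event(domain, now=None):
    now = now if now is not None else time.time()
    # purge expired entries from watchlist "B"
    for d in [d for d, t in watchlist_b.items() if now - t > TTL]:
        del watchlist_b[d]
    if domain not in watchlist_a:    # alarm condition
        watchlist_a.add(domain)      # action: update the baseline
        watchlist_b[domain] = now    # action: record as "new this month"

on_event("example.com")
on_event("example.com")              # no-op, already baselined
print(sorted(watchlist_b))           # -> ['example.com']
```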
Let me know if that makes sense.
Watchlist "A" will be updated by the Alarm initially the process will be noisy but after 1 month let's say it will be populated with most of the domains so you could use it as baseline.
Watchlist "A" will not expire.
Watchlist "B" also will be updated by the Alarm.
Watchlist "B" should be set to expire after a week or a month depends on you actually.
The problem with this scenario is that during the initial feeding of Watchlist "A" there will be a lot of Alarms generated.
Maybe there are better ways, but that was the first that came to mind.
Hope that helps.
Reading your comment, I see that you are already using Hadoop as a database for your logs. I was wondering if you could tell me how it was set up. We need to do the same with NetFlow and other data, and we would rather use a Hadoop cluster.
I use the Data Archival feature to export data, and from there I import it into HDFS. It works for syslog and some other events, but it would not work for NetFlow, as those data sources aren't saved using this method.
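For what it's worth, my import step is essentially a small script along these lines (a Python sketch; the paths, filename layout, and Hive-style partitioning scheme are made up for illustration — adjust them to your own export layout):

```python
import os

def build_put_commands(export_dir, hdfs_base):
    """Build one `hdfs dfs -put` command per exported file, partitioning
    the HDFS target by the file's date prefix (assumed filename layout:
    YYYYMMDD_something.log)."""
    cmds = []
    for name in sorted(os.listdir(export_dir)):
        date = name[:8]                           # YYYYMMDD prefix
        target = f"{hdfs_base}/dt={date}/{name}"  # Hive-style partition dir
        cmds.append(["hdfs", "dfs", "-put",
                     os.path.join(export_dir, name), target])
    return cmds

# Usage (commented out so the sketch stays side-effect free):
# import subprocess
# for cmd in build_put_commands("/var/esm/export", "/data/esm"):
#     subprocess.run(cmd, check=True)
```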
If NetFlow is what you want to analyse, then you may want to have a read of:
and its relevant source files:
Hope this helps.