ianc Newcomer, 4 posts since Feb 6, 2013

Feb 7, 2013 8:26 AM

Anticipated performance of Content Security Reporter with large data

Hello Community,

 

Our customer has a requirement to retain 6 months' worth of Webwasher access logs to query using CSR.  The most common type of query will be the examination of an individual user's web browsing habits.  There will be a small number of users running reports, probably no more than a couple at a time.  Looking at the current rate of log creation, this equates to around 5,000,000,000 individual access log records.  We anticipate using page views to minimise the size of the database, and we expect the number of records to increase in the future.
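For what it's worth, a back-of-the-envelope sketch of what that volume works out to per day (the 182-day window and the page-view compression ratio are assumptions, not measurements):

```python
# Rough sizing sketch; adjust the assumptions to your own log rates.
TOTAL_RECORDS = 5_000_000_000    # ~6 months of access log records
RETENTION_DAYS = 182             # assumed 6-month window

records_per_day = TOTAL_RECORDS / RETENTION_DAYS
print(f"~{records_per_day / 1e6:.1f} million records/day raw")

PAGE_VIEW_COMPRESSION = 10       # assumed ratio; varies with traffic mix
stored_per_day = records_per_day / PAGE_VIEW_COMPRESSION
print(f"~{stored_per_day / 1e6:.1f} million records/day stored with page views")
```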

 

We are planning to use SQL Server 2008 SE on a SAN-attached physical server.  CSR will be installed on a separate physical server, with ePO running on a VM.

 

I'm wondering if anyone has experience of using CSR (or indeed Web Reporter) with this order of magnitude of data.  I believe the main load will be on the SQL server and I'm keen to ensure that it is spec'd highly enough to cope.  Initial thoughts are to start with 32 GB RAM and a quad-core 2.4 GHz Xeon processor.  Any experiences / lessons learnt would be most welcome.

 

Thanks

Ian

  • sroering McAfee SME, 458 posts since Feb 10, 2011

    Regarding log parsing, the performance of CSR and Web Reporter 5.2 (and indeed 5.2.1) should be about the same.  The big difference between the two products is in report generation, but I don't think performance should be too different there either.

     

    Regarding your log data, we have several customers that generate over 250 million records per day and keep at least 6 months.  Are you sure about your log record count?  Maybe you mean 5 million, or you are considering log data other than access logs.  5 billion is quite large, and I would guess the company size would need to be well over 100,000 people.  If you really meant 5 million, then I would put you at an "average" size compared to our customers.

     

    Regarding hardware, CSR and Web Reporter are about the same (assuming CSR is on a different server than ePO).  You shouldn't need more than 12 GB of RAM, and that is for special circumstances where you might run very large reports that are several GB in size.  SQL Server would be my preference for the database too.  Make sure it is on dedicated hardware and give it all the RAM you can.  You should enable DB partitioning for the best performance, but note that it requires a Microsoft SQL Server Enterprise license.

     

    My recommendation is to use page views, not only for compression, but because it makes your reports more closely reflect reality.  See this doc for more information.

    https://community.mcafee.com/docs/DOC-4662

     

    Please consider this KB article for deployment best practices.

    https://kc.mcafee.com/corporate/index?page=content&id=KB73295

  • sroering McAfee SME, 458 posts since Feb 10, 2011

    5 billion records for 6 months makes a lot more sense.  I thought you were saying 5 billion per day.  Don't take this the wrong way, but that's really not very much data.  You shouldn't have any performance issues if you plan well.  I judge performance based on records per second.  At 40 million records a day you need to sustain at least 463 records per second or you are falling behind.  Web Reporter has no trouble running over 3,000 records per second on the output side.  If page views are enabled, you can multiply that number by about 10x, for roughly 30,000/sec.

     

    So if you can achieve 3,000 records per second, you can parse a full day of log files, without page views, in less than 4 hours.  That should leave plenty of time for running reports, performing maintenance, etc.
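    To make the arithmetic explicit, here is a small sketch using the figures above (the 40 million/day and 3,000/sec numbers are just the examples from this post):

    ```python
    # Throughput sanity check using the figures discussed in this thread.
    SECONDS_PER_DAY = 86_400
    records_per_day = 40_000_000

    # Minimum sustained parse rate needed just to keep up:
    min_rate = records_per_day / SECONDS_PER_DAY
    print(f"Must sustain at least {min_rate:.0f} records/sec")   # ~463/sec

    # Time to parse a full day at 3,000 records/sec (no page views):
    PARSE_RATE = 3_000
    hours = records_per_day / PARSE_RATE / 3600
    print(f"Full day parsed in ~{hours:.1f} hours")              # ~3.7 hours
    ```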

     

     

    Make sure the DB has dedicated hardware.  The DB does more work than Web Reporter since it runs the reports and stores data during log parsing; Web Reporter only has to do the log parsing.  I think you would be fine with 32 GB RAM on the DB server.  You want to put your best hardware here, and that includes disks.  If your detail data table is 500 GB and you run a detail report for the full 6 months, the DB needs to scan the entire table, which is a lot of disk IO.  Enable DB partitioning to get the best report performance.
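    As a rough feel for why disk IO dominates, here is a sketch of what a full scan of a 500 GB detail table costs.  The 200 MB/s sequential read rate is an assumed figure, not a measurement; substitute your own storage throughput:

    ```python
    # Very rough full-table-scan time estimate (assumed read throughput).
    TABLE_SIZE_GB = 500
    READ_MBPS = 200   # assumed sequential read rate for the SAN

    scan_minutes = TABLE_SIZE_GB * 1024 / READ_MBPS / 60
    print(f"Full scan of {TABLE_SIZE_GB} GB at {READ_MBPS} MB/s: "
          f"~{scan_minutes:.0f} minutes")   # ~43 minutes
    ```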

     

    Web Reporter should run well on any modern server with 12 GB of RAM.  You would probably run fine with 4-8 GB of RAM dedicated to Web Reporter.  I think the default cache options will work fine for you.

     

    Use summary data for your reports whenever possible.  Detail data is at least 10x larger and the cost of running such reports reflects this.

     

    That is my advice.

  • sroering McAfee SME, 458 posts since Feb 10, 2011

    A small note: you said that the traffic has a lot of HTTP status 407, which means you are doing proxy authentication.  As you seem to have noticed, Web Reporter pre-filters this data (as it should), so it doesn't go into the DB.  This also affects the performance numbers I gave.

     

    In general, there are two places to take a reading: log records in the file need to be read into memory, then we do some processing, and ultimately some records come out the back end.

     

    I ignored your 25% of HTTP status 407 records, which are dropped on the front end.  The front end has NEVER had any performance issues.  The default 2 log-parsing threads tear through data like industrial wood chippers; nobody has ever needed more than 2.

     

    If issues happen, it is in the processing.  That is why I take all my performance measurements on what comes out of the system: Administration > Options > Performance > DB Statistics >> Records per second.  The only gotcha is that you need to multiply that number by your average page-view compression (usually around 10x) to know how many records on the front end are being processed.  But again, this doesn't count 407s, since they are pre-filtered.
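    Roughly, the conversion from that DB-side reading back to raw log lines looks like this (the 10x compression and 25% 407 share are just the example figures from this thread; measure your own):

    ```python
    # Convert the DB-side "records per second" reading to front-end log lines.
    db_records_per_sec = 3_000       # Administration > Options > Performance
    PAGE_VIEW_COMPRESSION = 10       # average page-view compression (example)
    HTTP_407_FRACTION = 0.25         # share of lines dropped before parsing

    parsed_lines_per_sec = db_records_per_sec * PAGE_VIEW_COMPRESSION
    raw_lines_per_sec = parsed_lines_per_sec / (1 - HTTP_407_FRACTION)
    print(f"~{parsed_lines_per_sec:,} parsed lines/sec, "
          f"~{raw_lines_per_sec:,.0f} raw log lines/sec including 407s")
    ```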

  • sroering McAfee SME, 458 posts since Feb 10, 2011

    Run regular index maintenance and use SQL partitioning.  If your reports use summary data, such as top categories by hits for jdoe over 6 months, or top sites by bytes for jdoe over 6 months, then your reports should be fairly fast.  For the full 6 months, I wouldn't expect the reports to take more than a couple of minutes per query on summary data; shorter time periods would be faster still.  Summary data has indexes on pretty much everything, which makes it very fast if you keep them defragmented.

     

    Detail data only has indexes on the record ID and time, so running reports on detail data for all 6 months will do a full table scan regardless of your filters.  Trying to predict performance on detail data is much more difficult since it depends heavily on disk IO.  I think you have a good plan in place and shouldn't have unreasonable report times.

     

    A detail report that says "give me all URLs for site mcafee.com from user jdoe over 6 months" I would expect to finish in less than 1 hour.  A detail report that says "give me all the URLs that contain the word 'mcafee' over 6 months" I wouldn't be surprised if it took over an hour.  Both queries do a full table scan since they cover all 6 months.  The main difference is that the first can filter on the user ID and site ID (a simple number comparison), while the second has to do a text search of the URL, which is very costly.
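    A toy illustration of the difference (not the actual CSR schema; the column names here are made up): both queries touch every row, but the first only compares integers per row while the second searches the URL text.

    ```python
    # Integer-ID filter vs. substring filter over the same rows (toy example).
    rows = [
        {"user_id": 42, "site_id": 7, "url": "https://www.mcafee.com/index.html"},
        {"user_id": 13, "site_id": 9, "url": "https://example.org/docs/page"},
    ]

    # Query 1: filter on pre-resolved numeric IDs (cheap per row).
    hits_by_id = [r for r in rows if r["user_id"] == 42 and r["site_id"] == 7]

    # Query 2: substring search over the URL text (costly per row at scale).
    hits_by_text = [r for r in rows if "mcafee" in r["url"]]

    print(len(hits_by_id), len(hits_by_text))
    ```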

     

    I hope that helps put performance into perspective.
