5 Replies Latest reply: Nov 5, 2013 10:41 AM by Regis

    web categorization question

    Regis

      I've been administering these things for a while now but I still can't say I really understand how their categorization works.

       

      One site came to my attention today which appears entirely benign.

           http://digitaldeconstruction.com/    Visit it from behind a web gateway appliance, and the block page says it's pornography. It actually seems to be an aggregator site of news and interesting stories and the like, as seems to be all the rage.

       

      I checked https://www.trustedsource.org/en/feedback/url?action=checksingle and it shows Uncategorized for the resident gateway appliance database, but if you check it against McAfee Realtime or SiteAdvisor Enterprise, it says Pornography.

       

      So... can someone explain in a paragraph the difference between these databases as it pertains to a web gateway appliance user's experience? And how does the web gateway come to the category determination that shows on error pages for people? I'd like to know which part is the database that gets copied down regularly to the appliances, versus what "GTI" does, etc.

       

      Thanks in advance  for any clarification! 

        • 1. Re: web categorization question
          eelsasser

          There are a limited number of URLs in the on-box database: ~40 million.

          There are roughly 10 times as many in the GTI cloud: ~400 million.

           

          When a URL.Category is looked up, it consults the local database first, then the online database.
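A minimal sketch of that lookup order, for illustration only. It is not the appliance's actual code: the function name, the dict-based resident database, and the `cloud_lookup` callable are all hypothetical stand-ins.

```python
from typing import Callable, Dict, Optional

UNCATEGORIZED = "Uncategorized"


def lookup_category(
    url: str,
    resident_db: Dict[str, str],
    cloud_lookup: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Return a category for url, trying the on-box database before the cloud.

    resident_db stands in for the database downloaded to the appliance;
    cloud_lookup stands in for an online query (both are assumptions).
    """
    # 1. Consult the local (resident) database first.
    category = resident_db.get(url)
    if category is not None and category != UNCATEGORIZED:
        return category

    # 2. Only when the local database has no answer, ask the online database.
    if cloud_lookup is not None:
        cloud_category = cloud_lookup(url)
        if cloud_category is not None:
            return cloud_category

    # 3. Neither database knows the URL.
    return UNCATEGORIZED


# Usage: the site is absent on-box but flagged in the (hypothetical) cloud data.
resident = {"example.com": "Business"}
cloud = {"digitaldeconstruction.com": "Pornography"}
print(lookup_category("digitaldeconstruction.com", resident, cloud.get))
```

Under these assumptions, a URL that the resident checker reports as Uncategorized can still come back with a category from the cloud, which would match what the block page showed.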


           

          At some point in time, that site might have contained adult content and been flagged.

          No one has looked at it since or asked for it to be changed back.

          If you request a site review, they will look at it again and change it.

          • 2. Re: web categorization question
            Jon Scholten

            Hi Regis,

             

            I'm a little confused as to the wording, but I think you're just asking how come there is a difference between resident and cloud databases?

             

            resident = database downloaded to local web gateway

            cloud = database on mcafee servers

             

            The resident database is downloaded so that the MWG does not need to look up every URL in the cloud. In addition, the resident database has a maximum size restriction (due to legacy devices, memory restrictions, trying to keep a lower footprint on the appliance OS, etc.).

             

            The cloud helps with this because it can be whatever size it needs to be, and helps where the resident may not be updated as often. So MWG is configured to check the cloud when the MWG encounters an uncategorized URL (per the resident database).

             

            Does this help?

             

            The example you have given above simply looks like a miscategorization.

             

            Best,

            Jon

             

            Message was edited by: jscholte on 11/1/13 5:44:29 PM CDT (Erik responded at the same time I was typing mine...)
            • 3. Re: web categorization question
              Regis

              Thank you both.     That clears it up nicely. 

               

              When checking categorizations against https://www.trustedsource.org/en/feedback/url?action=checksingle, I'd always used the option entitled "McAfee Web Gateway V7/v6 resident" as that was my closest use case from that list (since, you know, I'm running web gateway). As such I never bothered looking at the others and only now understand how the various lists interact. I knew about uncategorized falling through to IP reputation, but I didn't know about a separate cloud database, which that site calls "McAfee RealTime Database." Now it's clear, despite everyone's different wording for them.

               

               

              I submitted a recategorization request on it for the resident one... so I'm curious what they'll do with it. If they recategorize it in resident, will they also update realtime? I guess we'll see.

               

              Thanks again.

              • 4. Re: web categorization question
                Jon Scholten

                Items in the resident are a subset of what's in the cloud.

                 

                So everything that's in the resident is in the cloud, but not the other way around.

                 

                If you selected resident, I think they'll address it in related resident databases as well as the cloud.
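The subset relationship can be sketched as follows. This is only a toy model under stated assumptions: both databases are plain dicts, and `is_subset` and `recategorize` are hypothetical helpers, not anything McAfee documents about how review requests are processed.

```python
from typing import Dict


def is_subset(resident_db: Dict[str, str], cloud_db: Dict[str, str]) -> bool:
    """True when every URL in the resident database also exists in the cloud."""
    return all(url in cloud_db for url in resident_db)


def recategorize(
    url: str,
    new_category: str,
    cloud_db: Dict[str, str],
    resident_db: Dict[str, str],
) -> None:
    """Apply a review result to the cloud and to any resident copy (assumed behavior)."""
    # The cloud holds the full set, so it is always updated.
    cloud_db[url] = new_category
    # The resident copy is updated only if the URL is already on-box,
    # which keeps the resident database a subset of the cloud.
    if url in resident_db:
        resident_db[url] = new_category


# Usage with made-up entries.
cloud = {"a.example": "News", "b.example": "Pornography"}
resident = {"b.example": "Pornography"}
recategorize("b.example", "Blogs/Wiki", cloud, resident)
print(cloud["b.example"], resident["b.example"], is_subset(resident, cloud))
```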

                 

                Best,

                Jon

                • 5. Re: web categorization question
                  Regis


                  FYI, they did indeed follow the path of least astonishment on this. The recategorization request is now closed and the categorization is now Blogs/Wiki for both the "real time" (what others would call cloud) and resident (on-gateway database) lists.

                   

                  Thanks all for your input.