The MEG appliance is a powerful filtering tool for email.  To do its job, it needs to find content in various parts of messages.  Often a simple string, possibly with wildcards, is enough.  Sometimes, though, the content to filter follows a known format but its exact value cannot be predicted ahead of time.  In such cases, regular expressions are the right tool.

However, as Spider-Man keeps telling us, with great power comes great responsibility.  Regular expressions can pick out seemingly random data as matching a specific pattern, and do so much more quickly than we can.  The more complicated the regular expression, though, the more processing power it takes to run.  Regular expressions can also be written to search for strings or blocks of varying lengths, or even of unbounded length, as long as they still match a particular pattern.

The problem with regular expressions is that, as I saw in a case I recently worked on, some constructs are interpreted as "look for this pattern in a string of (effectively) infinite length".  While they aren't always this recognizable, the simplest example is ".*" (without the quotes).  This means "match any character zero or more times" (".+" is the one-or-more version).  Since there is no upper limit on the length, the system may keep searching for a lot longer than we would normally expect.
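
To make the behavior of ".*" concrete, here is a small sketch in Python.  It assumes Python's built-in re module, which is not necessarily the same regular expression engine the MEG appliance uses, and the sample text and the matched_text helper are made up for illustration.

```python
import re

# Hypothetical sample text; not taken from any real message.
sample = "Subject: invoice 12345 attached"


def matched_text(pattern, text):
    """Return the substring matched by pattern, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


# ".*" accepts zero characters, so it "matches" even an empty string.
empty_star = matched_text(r".*", "")

# ".+" needs at least one character, so it fails on an empty string.
empty_plus = matched_text(r".+", "")

# ".*" is greedy: it keeps consuming characters to the end of the line.
greedy = matched_text(r"invoice.*", sample)

print(repr(empty_star), repr(empty_plus), repr(greedy))
```

The last pattern is the one to watch: nothing in it says where to stop, so the match runs for as long as the line does.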

To prevent this from causing significant, unexpected CPU use, be very careful with * and + in regular expressions.  If the pattern doesn't need to match an unbounded length, use a bounded quantifier such as {1,100} instead, which lets the preceding item repeat at least once and at most 100 times.  This helps prevent the CPU from getting chewed up while the system tries to match an extremely long string against the pattern when it has no business doing so.
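
As a rough illustration of the difference a bound makes, the sketch below compares an unbounded and a bounded version of the same pattern.  Again this assumes Python's re module rather than the appliance's own engine, and the "ID:" prefix, the 5000-character input, and the match_length helper are invented for the example.

```python
import re

# Unbounded: ".*" will consume as much of the line as exists.
UNBOUNDED = re.compile(r"ID:.*")

# Bounded: ".{1,100}" stops after at most 100 characters.
BOUNDED = re.compile(r"ID:.{1,100}")

# Hypothetical oversized input: a 3-character prefix plus 5000 filler characters.
long_line = "ID:" + "x" * 5000


def match_length(pattern, text):
    """Return the length of the first match, or 0 if there is none."""
    m = pattern.search(text)
    return len(m.group(0)) if m else 0


unbounded_len = match_length(UNBOUNDED, long_line)
bounded_len = match_length(BOUNDED, long_line)

print(unbounded_len, bounded_len)
```

The unbounded pattern walks the whole line, while the bounded one gives up after the prefix plus 100 characters, which is the point of picking a sensible upper limit.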