DBO
Level 9
Message 1 of 5

RegEx for http://*.scr URL in e-mail?

Does anybody have a test RegEx to detect a URL ending in *.SCR embedded in an e-mail with IM 6.7.1?

4 Replies
ijahnke
Level 11
Message 2 of 5

Re: RegEx for http://*.scr URL in e-mail?

Our Knowledge Base article KB69857 describes this issue.

A simple RegEx that would fit this query:

https?://.+\.scr$

**Please note that generally our technical support staff does not support setting up custom RegEx dictionaries**

Message was edited by: Ivan Jahnke on 9/15/10 12:58:40 PM CDT
runcmd
Level 10
Message 3 of 5

Re: RegEx for http://*.scr URL in e-mail?

I'm not sure if IronMail's RegEx engine makes certain assumptions but the only downside I see to this regular expression is that it may catch anything with ".scr" in the URL.  Therefore, both of these could be hits...

ht_p://www.bad.xyz/hosting/somethingbad.scr

ht_p://www.screening.xyz/

(One "t" in "http" was intentionally omitted because they are bogus URLs and I didn't want them to automatically hyperlink.)

You'd almost want something like...

https?://.+\.scr($|/|>|"| |\f|\n|\t|\r)

The problem is that there are a lot of possibilities for characters at the end of the "scr" file extension for a URL that is embedded in an email.
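As a rough illustration, here is a minimal sketch that runs the two bogus URLs through the expression without a trailing "$" and through the one with the delimiter group. It uses Python's re module as a stand-in, which is an assumption: IronMail's engine may behave differently. The real "http" spelling is used so the patterns can match, and the names UNANCHORED, DELIMITED, SAMPLES and hits are made up for this example.

```python
import re

# Python's re module stands in for the appliance's engine (an assumption).
UNANCHORED = re.compile(r'https?://.+\.scr')
DELIMITED = re.compile(r'https?://.+\.scr($|/|>|"| |\f|\n|\t|\r)')

# The two bogus URLs from this post, spelled with a real "http".
SAMPLES = [
    "http://www.bad.xyz/hosting/somethingbad.scr",
    "http://www.screening.xyz/",
]

def hits(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None

for url in SAMPLES:
    print(url, hits(UNANCHORED, url), hits(DELIMITED, url))
```

In Python, the unanchored expression hits both URLs (it finds ".scr" inside "www.screening"), while the delimited one hits only the first.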

Message was edited by: runcmd on 9/15/10 1:24:54 PM EDT
ijahnke
Level 11
Message 4 of 5

Re: RegEx for http://*.scr URL in e-mail?

If you select "Word Boundary", then it should only match words of the form http://<stuff>.scr.

However, I have edited the original expression and added "$" to the end.

runcmd
Level 10
Message 5 of 5

Re: RegEx for http://*.scr URL in e-mail?

Thanks for the clarification, Ivan.  The reason I added the range ($|/|>|"| |\f|\n|\t|\r) at the end of my RegEx example is because URLs can potentially be embedded in a lot of different ways...

<ht_p://www.bad.xyz/hosting/somethingbad.scr>

<!a href="ht_p://www.bad.xyz/hosting/somethingbad.scr">

"ht_p://www.bad.xyz/hosting/somethingbad.scr"

ht_p://www.bad.xyz/hosting/somethingbad.scr/

etc.

Even with just a "$" on the end and "word boundary" configured, would IronMail's RegEx engine catch the above examples?  If so, that's good to know.  If not, word boundary would probably allow you to reduce the range to something more reasonable like ($|/|>|").  I'd imagine that the more complex your RegEx is, the more processing power it consumes on the appliance as well.
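I can't test this on the appliance, but here is a quick sketch of how the embedded forms behave in Python's re module. Two assumptions: Python's engine is only a stand-in for IronMail's, and the "Word Boundary" option is approximated with a trailing \b, which may not be what the appliance actually does. The names PATTERNS, EMBEDDED and matched_by are invented for this example, and the anchor tag is written as a normal <a href> with a real "http".

```python
import re

# Three candidate endings for the same base expression.
PATTERNS = {
    "dollar_only": re.compile(r'https?://.+\.scr$'),
    "reduced_group": re.compile(r'https?://.+\.scr($|/|>|")'),
    # \b approximates the "Word Boundary" option (an assumption).
    "word_boundary": re.compile(r'https?://.+\.scr\b'),
}

# The embedded forms listed above.
EMBEDDED = [
    '<http://www.bad.xyz/hosting/somethingbad.scr>',
    '<a href="http://www.bad.xyz/hosting/somethingbad.scr">',
    '"http://www.bad.xyz/hosting/somethingbad.scr"',
    'http://www.bad.xyz/hosting/somethingbad.scr/',
]

def matched_by(name, text):
    """Return True if the named pattern matches anywhere in the text."""
    return PATTERNS[name].search(text) is not None

for text in EMBEDDED:
    print(text, {name: matched_by(name, text) for name in PATTERNS})
```

In Python at least, the "$"-only form misses all four embedded examples, because a character follows ".scr" in each one, while both the reduced group and the \b form catch all four and still skip http://www.screening.xyz/.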