Insane regular expressions
- May 29th, 2010
Most people I respect will agree with me that for things like HTML, regular expressions are a boatload of epic fail. However, a clever regular expression is a wonderful tool to have in your problem-solving arsenal. One of my favorite evil interview questions is “Can you create a regular expression that can validate email addresses?” Mainly, what I’m looking for is insight into how people think about a problem and how well they understand the problem domain. It is also an interesting tell as to whether people will admit they don’t know how things actually work.
I’m not looking for someone to belt out an answer like this (from http://fightingforalostcause.net/misc/2006/compare-email-regex.php):
/^([\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+\.)*[\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+@((((([a-z0-9]{1}[a-z0-9\-]{0,62}[a-z0-9]{1})|[a-z])\.)+[a-z]{2,6})|(\d{1,3}\.){3}\d{1,3}(\:\d{1,5})?)$/i
But if someone did, I’d probably go down a different path of questions since they obviously have a very good understanding of how email addresses work.
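To get a feel for what that monster actually accepts, here is a minimal sketch that drops the pattern into Python's `re` module. The names `EMAIL_RE` and `looks_like_email` are made up for illustration, and the `/.../i` wrapper is swapped for `re.IGNORECASE`. One caveat: Python's `\w` also matches non-ASCII letters, so results may differ a little from whatever engine the pattern was originally written for.

```python
import re

# The pattern quoted above, split across lines for readability.
# Assumption: Python's `re` treats the escaped punctuation the same way
# as the engine the pattern was originally written for.
EMAIL_RE = re.compile(
    r"^([\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+\.)*"
    r"[\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+@"
    r"((((([a-z0-9]{1}[a-z0-9\-]{0,62}[a-z0-9]{1})|[a-z])\.)+[a-z]{2,6})"
    r"|(\d{1,3}\.){3}\d{1,3}(\:\d{1,5})?)$",
    re.IGNORECASE,
)


def looks_like_email(address: str) -> bool:
    """Return True if the address is well formed according to EMAIL_RE."""
    return EMAIL_RE.match(address) is not None


for candidate in ["joe@example.com", "joe@192.168.0.1", "joe@example", "not an email"]:
    print(candidate, looks_like_email(candidate))
```

Note that this only answers "is it shaped like an address?", which is all any regular expression can tell you.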
Along the way, I began to think to myself…. if you really wanted to validate an email address, wouldn’t it be vastly simpler to create a grammar that you could use to tokenize an address into the
Regular expressions are cool and all, but at the end of the day, even the expression above is just an elaborate parser that asserts the validity of the format, not the validity of the address. I mean, if the domain isn’t even registered, then the address is invalid. For instance, joe@example.com is a valid email address in the sense that it is well formed. However, if you’re a big RFC nerd like me, you would instantly be saying “Aha! Example.com is a reserved domain according to RFC 2606! No recipients at that domain are going to be valid!”
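The RFC 2606 objection is easy to turn into code. Here is a rough sketch that flags addresses whose domain is one of the names that RFC reserves (the top-level domains `.test`, `.example`, `.invalid` and `.localhost`, plus `example.com`, `example.net` and `example.org`). The function names are hypothetical, and treating subdomains of the reserved names as reserved too is my own assumption about what you would want in practice.

```python
# Names reserved by RFC 2606: four top-level domains and three
# second-level domains.
RESERVED_TLDS = {"test", "example", "invalid", "localhost"}
RESERVED_DOMAINS = {"example.com", "example.net", "example.org"}


def domain_of(address: str) -> str:
    """Hypothetical helper: return everything after the last '@'."""
    return address.rpartition("@")[2]


def is_reserved_domain(domain: str) -> bool:
    """Return True if `domain` is, or sits under, an RFC 2606 reserved name."""
    labels = domain.lower().rstrip(".").split(".")
    if labels[-1] in RESERVED_TLDS:
        return True
    # Compare the last two labels so that mail.example.com is caught too.
    return ".".join(labels[-2:]) in RESERVED_DOMAINS


print(is_reserved_domain(domain_of("joe@example.com")))
print(is_reserved_domain(domain_of("joe@python.org")))
```

A well-formed address can pass the regular expression and still fail this check, which is exactly the gap between "valid format" and "valid address".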
You would be correct. Maybe this is an obnoxious way for me to take my ‘validate an email address’ question to the next level. In the right circumstances, it would be quite telling if someone did not understand the significance of MX records and their role in mail delivery.
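For the MX angle, the shape of the check looks something like the sketch below. As far as I know, Python's standard library has no MX resolver, so the lookup is passed in as a function. `MxLookup`, `can_receive_mail`, `FAKE_ZONE` and `fake_lookup` are all hypothetical names, and the zone data is a stand-in rather than real DNS answers. A real version would plug in a DNS library. It would also need to handle the SMTP rule that a domain with no MX records can still take mail at its address record, which this sketch ignores.

```python
from typing import Callable, List

# Hypothetical signature: given a domain, return the hostnames from its
# MX records, or an empty list if there are none.
MxLookup = Callable[[str], List[str]]


def can_receive_mail(address: str, lookup_mx: MxLookup) -> bool:
    """Return True if the address's domain advertises at least one MX host."""
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        return False  # not even shaped like local@domain
    return len(lookup_mx(domain)) > 0


# Stand-in data for illustration only; these are not real DNS answers.
FAKE_ZONE = {"mail-enabled.test": ["mx1.mail-enabled.test"]}


def fake_lookup(domain: str) -> List[str]:
    """Pretend resolver backed by FAKE_ZONE."""
    return FAKE_ZONE.get(domain.lower(), [])


print(can_receive_mail("joe@mail-enabled.test", fake_lookup))
print(can_receive_mail("joe@nowhere.test", fake_lookup))
```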