Support escaping in regular expression replacement
Happy new year!! It has been way too quiet on my blog lately. My New Year’s resolution is to change that. I have much in store, though I’m starting off with something small (but handy).
Simple replacement
String replacement is often used as a way to apply templating. For example, you might replace “%a:test” with “~~test~~” using the regexp %a:(\w+) and the replacement string “~~$1~~”.
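As a minimal sketch, here is the simple replacement in Python’s re module. Python is my choice for these examples, not the article’s: the “$1” syntax above is PHP/PCRE style, where Python writes \1. The helper name render is hypothetical.

```python
import re

# Placeholder syntax from the article: "%a:" followed by a word.
PLACEHOLDER = re.compile(r"%a:(\w+)")


def render(template: str) -> str:
    # \1 refers to the captured name (PHP's preg_replace would use $1).
    return PLACEHOLDER.sub(r"~~\1~~", template)


print(render("Hello %a:test!"))
```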
Trying to escape
The only problem now is that I can no longer use a literal “%a:” within my string. This could be solved by allowing it to be escaped with a backslash. In the regexp we can use a negative lookbehind to check that the character before the % isn’t a backslash: (?<!\\)%a:(\w+).
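The lookbehind version, sketched the same way (render is again a hypothetical helper). The last call shows the limitation that remains: any backslash directly in front of the placeholder blocks the replacement, even one that was meant to be a literal backslash.

```python
import re

# Only match "%a:" when the character right before "%" is not a backslash.
ESCAPABLE = re.compile(r"(?<!\\)%a:(\w+)")


def render(template: str) -> str:
    return ESCAPABLE.sub(r"~~\1~~", template)


print(render(r"%a:test"))    # replaced
print(render(r"\%a:test"))   # escaped, left alone
print(render(r"\\%a:test"))  # also left alone: the backslash can't be escaped yet
```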
Escaping the escaping
Now we’re close; however, it’s now impossible to have a backslash directly in front of a placeholder that should still be replaced. We need to be able to escape the backslash as well. We could state the problem as: match “%a:” only if it isn’t preceded by an odd number of backslashes. Unfortunately a negative lookbehind can’t check for an odd number (in PCRE a lookbehind must have a fixed length), so we need to get the backslashes into the match instead. We can say: match zero or more pairs of backslashes followed by “%a:”, provided there is no backslash in front of the pairs. This results in the regexp:
(?<!\\)((?:\\{2})*+)%a:(\w+), replacing it with “$1~~$2~~”. The $1 puts the pairs of backslashes back in front of the replacement.
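A Python port of this pattern, as a sketch. Python’s re only accepts possessive quantifiers from version 3.11 on, so the + is dropped here. For this pattern that doesn’t change what matches: giving back a pair of backslashes always leaves a backslash, not a %, in front of “%a:”, so the shorter attempt fails anyway. The loop shows runs of zero to four backslashes: with an even run the placeholder is replaced and the pairs are kept, with an odd run the text is left untouched.

```python
import re

# Port of (?<!\\)((?:\\{2})*+)%a:(\w+) without the possessive "+".
# Group 1: an even run of backslashes; group 2: the placeholder name.
PATTERN = re.compile(r"(?<!\\)((?:\\{2})*)%a:(\w+)")


def render(template: str) -> str:
    # Put the backslash pairs back, then the replaced placeholder.
    return PATTERN.sub(r"\1~~\2~~", template)


for t in [r"%a:x", r"\%a:x", r"\\%a:x", r"\\\%a:x", r"\\\\%a:x"]:
    print(f"{t:10} -> {render(t)}")
```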
To finish up
The only remaining issue is that the escape sequences \% and \\ are still displayed literally. This can simply be solved with a str_replace, turning \% into % and \\ into \.
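Putting both steps together, as a sketch under the same assumptions (Python, hypothetical helper names). Here the unescaping is done in a single left-to-right pass with re.sub rather than two chained replacements: replacing \\ first and then \% would turn \\%b into %b instead of \%b, because the second replacement sees the backslash produced by the first. In PHP, strtr with an array argument also works in a single pass.

```python
import re

# Step 1 pattern: placeholders preceded by an even run of backslashes.
PATTERN = re.compile(r"(?<!\\)((?:\\{2})*)%a:(\w+)")
# Step 2 pattern: the two escape sequences "\\" and "\%".
ESCAPE = re.compile(r"\\([\\%])")


def render(template: str) -> str:
    # Step 1: replace every placeholder that is not escaped.
    out = PATTERN.sub(r"\1~~\2~~", template)
    # Step 2: drop the escaping backslash. re.sub scans once, left to
    # right, so text produced by one replacement is never rescanned.
    return ESCAPE.sub(r"\1", out)


print(render(r"50\% off, %a:price, \\%a:path, \\\%a:literal"))
```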
09 Jan 2009 Arnold Daniels

Just a small note, although it doesn’t change anything: in the final regex, the + in “*+” is useless.
Hi Jordi,
The + isn’t useless; it’s there to increase performance. When the expression fails because of an odd number of backslashes, the engine doesn’t need to backtrack and retry with fewer pairs of backslashes, because all of those attempts would fail as well. Using a possessive quantifier (*+ or ++) prevents this backtracking.
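To illustrate the point of this exchange: the possessive form only changes how much backtracking the engine does, not what it matches. The sketch below compares the plain and possessive patterns on a few inputs. It assumes Python 3.11 or later, the first version whose re module accepts possessive quantifiers; on older versions the hypothetical helper mismatches returns None instead of comparing.

```python
import re

PLAIN = r"(?<!\\)((?:\\{2})*)%a:(\w+)"
POSSESSIVE = r"(?<!\\)((?:\\{2})*+)%a:(\w+)"  # "*+" needs Python 3.11+

SAMPLES = [r"%a:x", r"\%a:x", r"\\%a:x", r"\\\%a:x", r"\\\\%a:x",
           r"a\\\\\%a:y %a:z"]


def mismatches(samples):
    """Return the samples where the two patterns disagree, or None."""
    try:
        possessive = re.compile(POSSESSIVE)
    except re.error:
        # Older Python: possessive quantifiers are not supported by re.
        return None
    plain = re.compile(PLAIN)
    return [s for s in samples
            if plain.sub(r"\1~~\2~~", s) != possessive.sub(r"\1~~\2~~", s)]


print(mismatches(SAMPLES))
```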
Thanks for the really helpful post, this is going to come in very handy. However, I think the negative lookbehind should have the question mark before the less than symbol.
http://www.regular-expressions.info/lookaround.html
You seem to be right, Dave. I’ve changed it in the article.