Regex for finding quoted strings

(originally posted to the BBEdit-Talk list, but posting here too since the answer might help others)

I’m looking for a regex pattern that will find double-quoted strings but skip any that contain one of the following characters: $, ', ", \ (dollar sign, single quote, double quote, backslash).

At first I tried "[^$'"\\]+?" but it matched the end of one quoted string and the beginning of the next, so I’m clearly missing something.
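To make the failure concrete, here is a small Python sketch of that first attempt. The sample line is hypothetical (not from the original post), and Python's `re` module stands in for BBEdit's grep engine, so treat it as an illustration rather than an exact reproduction.

```python
import re

# The pattern from the post, with the backslash inside the class escaped.
NAIVE = re.compile(r'''"[^$'"\\]+?"''')

# Hypothetical PHP line: the first string contains a single quote.
line = '''f("it's", "ok")'''

# The attempt starting at the first opening quote fails on the apostrophe,
# so the engine retries from the *closing* quote of "it's" and pairs it
# with the opening quote of "ok".
print(NAIVE.findall(line))  # ['", "']
```

The pattern has no notion of which quote opens a string and which closes it, so any quote can start a match once an earlier attempt has failed.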

“Regexes in Depth: Advanced Quoted String Matching” was helpful, but it didn’t explain how to exclude strings containing the characters above.

Strings that should fail to match:

// contains quotes
$str = "`zcol ACOL` NUMBER(32,2) DEFAULT 'The "cow" (and Jim''s dog) jumps over the moon' PRIMARY, INTI INT AUTO DEFAULT 0, zcol2"afs ds";

// contains dollar signs, backslashes and single quotes
ADOConnection::outp( " -- $_SESSION['AVAR']={$_SESSION['AVAR']}",false);

// contain single quotes
if (strncmp($val,"'",1) != 0 && substr($val,strlen($val)-1,1) != "'") {

Strings that should successfully match:

$myvar = "this is my quoeted ".$and_another_var." and another string";

Also, quoted strings should not be preceded by a backslash.
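One way to approach these requirements is to consume every complete double-quoted string first and only then decide whether to keep it, so a closing quote can never be mistaken for an opening one. The sketch below is an assumption on my part, not the solution linked at the end of this post; `QUOTED`, `FORBIDDEN` and `clean_strings` are hypothetical names, and it uses Python's `re` rather than BBEdit's grep.

```python
import re

# Opening quote not preceded by a backslash, then a body in which any
# backslash-escaped character is consumed as a pair, then the closing quote.
QUOTED = re.compile(r'(?<!\\)"((?:[^"\\]|\\.)*)"')

# Characters that disqualify a string: $, ', " and backslash.
FORBIDDEN = set("$'\"\\")

def clean_strings(source):
    """Return bodies of non-empty double-quoted strings with no forbidden characters."""
    return [m.group(1) for m in QUOTED.finditer(source)
            if m.group(1) and not FORBIDDEN & set(m.group(1))]

should_match = r'$myvar = "this is my quoeted ".$and_another_var." and another string";'
should_fail = r'''if (strncmp($val,"'",1) != 0 && substr($val,strlen($val)-1,1) != "'") {'''

print(clean_strings(should_match))  # ['this is my quoeted ', ' and another string']
print(clean_strings(should_fail))   # []
```

This still treats a double quote inside a single-quoted PHP string or a comment as a string delimiter, so it is a starting point rather than a full tokenizer.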

I’ve read and reread the BBEdit docs (which are great) but I’ve been unable to come up with a method that passes all of these tests.

I never had any idea this could be such a complicated problem. Does anyone see what I’m missing?


Matching with negated character classes is error-prone because it’s hard to control what comes before and after the class. That’s why I ended up using the following, which worked reasonably well for me and avoids matching properly quoted attribute values inside HTML.

(?s)(?<!name=|action=|align=|valign=|width=|height=|nowrap=|scope=|class=|id=|style=|type=|value=|method=|border=|cellspacing=|cellpadding=|colspan=|size=|maxlength=|for=|label=|rows=|cols=|wrap=|language=|href=|version=|fuse=|charset=|src=|alt=|title=|xmlns=|http-equiv=|rel=|content=|rowspan=|checked=|accept=|face=)(?<!')(?<!\\)(?<!\?>)"((?!\.|,|, | ,| , |\. | \. |:| :|: | : )[[:alnum:] \-_.,:%@<>?()*/]*?(?<!\\))"
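The lookbehind idea in the pattern above can be tried out on a much smaller scale. The Python sketch below is a simplified stand-in, not a port: it covers only three attribute names, uses a reduced whitelist character class, and gives each name its own lookbehind because Python's `re` module requires fixed-width lookbehinds. `PATTERN` and the sample line are hypothetical.

```python
import re

PATTERN = re.compile(
    r'(?<!href=)(?<!class=)(?<!id=)'  # opening quote must not follow these attributes
    r'(?<!\\)'                        # ...nor a backslash
    r'"([\w .,:-]*?)"'                # whitelist of characters allowed in the body
)

line = '''echo '<a href="index.php" class="nav">' . "Home";'''

print(PATTERN.findall(line))  # ['Home']
```

The closing quotes of the two attribute values are not protected by the lookbehinds; they are rejected only because the text that follows them (`=` and `>`) falls outside the whitelist, which is why the body class matters as much as the lookbehinds.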

Update 2

Give me a break! Here’s the solution to this problem: matching quoted strings.
