Regex for finding quoted strings

(originally posted to the BBEdit-Talk list, but posting here too since the answer might help others)

I’m looking for a regex pattern that will find quoted strings (double quotes) but skip (double-)quoted strings containing any of the following characters: $, ', ", \ (dollar sign, single quote, double quote, backslash).

At first I tried "[^$'"\\]+?" but it was matching the end of one quoted string and the beginning of the next, so I’m clearly missing something.
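To make the failure concrete, here is a small sketch in Python rather than BBEdit (an assumption for illustration; Python's `re` syntax is close to BBEdit's grep for this pattern). The sample line is hypothetical. Once the first string is rejected because of its dollar sign, nothing stops its closing quote from being treated as an opening quote:

```python
import re

# The first attempt, with the backslash inside the class escaped.
NAIVE = re.compile(r'"[^$\'"\\]+?"')

# Hypothetical PHP line: the first string contains a dollar sign.
line = 'echo "cost: $5" . " and " . "done";'

matches = NAIVE.findall(line)
print(matches)  # ['" . "', '" . "']
```

Neither match is a real string: each one runs from the closing quote of one string to the opening quote of the next.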

Regexes in Depth: Advanced Quoted String Matching was helpful, but didn’t explain how to negate strings containing the characters above.

Strings that should fail to match:

// contains quotes
$str = "`zcol ACOL` NUMBER(32,2) DEFAULT 'The "cow"
(and Jim''s dog) jumps over the moon' PRIMARY,
INTI INT AUTO DEFAULT 0, zcol2"afs ds";

// contains dollar signs, backslashes and single quotes
ADOConnection::outp( "
-- $_SESSION['AVAR']={$_SESSION['AVAR']}",false);

// contain single quotes
if (strncmp($val,"'",1) != 0 && substr($val,strlen($val)-1,1) != "'") {

Strings that should successfully match:

$myvar = "this is my quoeted ".$and_another_var." and another string";

Also, quoted strings should not be preceded with a backslash.

I’ve read and reread the BBEdit docs (which are great) but I’ve been unable to come up with a method that passes all of these tests.

I never had any idea this could be such a complicated problem. Does anyone see what I’m missing?

Update

Matching with negated character classes is error-prone because it’s hard to control what comes before and after the class. That’s why I ended up using the following, which worked more or less well for me and avoided matching quoted attribute values inside HTML.

(?s)(?<!name=|action=|align=|valign=|width=|height=
|nowrap=|scope=|class=|id=|style=|type=|value=|method=|border=
|cellspacing=|cellpadding=|colspan=|size=|maxlength=|for=|label=
|rows=|cols=|wrap=|language=|href=|version=|fuse=|charset=|src=
|alt=|title=|xmlns=|http-equiv=|rel=|content=|rowspan=|checked=
|accept=|face=)(?<!')(?<!\\)(?<!\?>)
"((?!\.|,|, | ,| , |\. | \. |:| :|: | : )[[:alnum:]
-_.,:%@<>?()*/]*?(?<!\\))"
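The intent of the long lookbehind list (skip quoted strings that are HTML attribute values) can also be sketched outside BBEdit. The Python below is not the pattern above: Python's `re` module does not accept alternatives of different lengths in one lookbehind, so this sketch swaps in a different technique. It matches every quoted string in order and then checks what precedes it. The attribute list is an abbreviated, illustrative subset, and the function name is hypothetical.

```python
import re

# Illustrative subset of the attribute names listed in the pattern above.
ATTRIBUTES = ("name", "class", "id", "href", "src")
ATTRIBUTE_PREFIXES = tuple(a + "=" for a in ATTRIBUTES)

# Any double-quoted string, with backslash escapes allowed inside it.
QUOTED = re.compile(r'"(?:[^"\\]|\\.)*"', re.DOTALL)

def strings_outside_attributes(source):
    """Return quoted strings that do not directly follow attr= in the source."""
    results = []
    for m in QUOTED.finditer(source):
        # Check the text that ends right where this quoted string begins.
        if not source.endswith(ATTRIBUTE_PREFIXES, 0, m.start()):
            results.append(m.group(0))
    return results

sample = '<a href="index.php" class="nav"><?php echo "Hello"; ?></a>'
print(strings_outside_attributes(sample))  # ['"Hello"']
```

Because each quoted string is consumed whole, a closing quote is never reused as an opening quote, which is the problem the first attempt ran into.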

Update 2

Give me a break! Here’s the solution to this problem: matching quoted strings.
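For reference, here is a minimal sketch of one approach that handles the test cases listed earlier. It is an assumption on my part and not necessarily what the linked post does. It is written in Python rather than BBEdit grep, and the names `QUOTED`, `FORBIDDEN` and `simple_strings` are made up for illustration. The idea is to match every double-quoted string first, then discard the ones containing unwanted characters.

```python
import re

# Any double-quoted string not preceded by a backslash; escapes allowed inside.
QUOTED = re.compile(r'(?<!\\)"(?:[^"\\]|\\.)*"', re.DOTALL)

# An unescaped double quote cannot occur in the body, and an escaped one
# brings a backslash with it, so these three characters cover all four cases.
FORBIDDEN = set("$'\\")

def simple_strings(source):
    """Return double-quoted strings containing none of $, ', " or backslash."""
    found = []
    for m in QUOTED.finditer(source):
        body = m.group(0)[1:-1]  # strip the surrounding quotes
        if not FORBIDDEN.intersection(body):
            found.append(m.group(0))
    return found

line = '$myvar = "this is my quoeted ".$and_another_var." and another string";'
print(simple_strings(line))
# ['"this is my quoeted "', '" and another string"']
```

This sketch knows nothing about PHP comments or single-quoted strings, so a double quote inside either of those would still throw it off.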
