jeudi 16 juin 2016

MySQL Wildcard escaping magic

About a year ago I encountered problem with escaping wildcard % character in MySQL. I found a couple of cases that are incomprehensible to me. I read MySQL documentation and some questions on stackoverflow but I didn't found explanation why this happen. I found only one good answer here, but no one know what's happening.

Problem is that when you escaping wildcard (%) or search string containing (escape), this work not as expected. Below I gave some examples to explain to what the problem. My question is:

Is there anyone who can explain How it works?

Ok, now lets go to examples:

We have database table names magic with records:

+--------------+
| name         |
+--------------+
| %            |
| %%           |
| %kasztan     |
| mleko%1      |
| 1%ololo      |
| magic        |
| nyan%cat     |
|             |
| \           |
| %\%         |
| %\          |
| \%          |
| \\         |
| \\\\     |
| %\\\\    |
| \%\        |
| \\\%      |
| \1\%       |
| \1\%       |
| \1\        |
| %\1\       |
| 1\%        |
| 123          |
| aaaa         |
| abcd         |
+--------------+

As MySQL docs say:

To test for literal instances of a wildcard character, precede it by the escape character. If you do not specify the ESCAPE character, “” is assumed.

  • % matches one “%” character.

  • _ matches one “_” character.

Ok, check it. Let's found all records ending with '%'. If % matches '%' character and without it % is wildcard character, this should be good:

SELECT * FROM `magic` WHERE name LIKE '%%'

It found all records containing %. It seems %% work as %/%%. If we check docs:

Note

Because MySQL uses C escape syntax in strings (for example, “n” to represent a newline character), you must double any “” that you use in LIKE strings. For example, to search for “n”, specify it as “n”. To search for “”, specify it as “\”; this is because the backslashes are stripped once by the parser and again when the pattern match is made, leaving a single backslash to be matched against.

Exception: At the end of the pattern string, backslash can be specified as “”. At the end of the string, backslash stands for itself because there is nothing following to escape.

That's why we need double escaping - change %% to %\% and do it again.

SELECT * FROM `magic` WHERE name LIKE '%\%'

Nothing changed. It's looks like it's not working, we got again the same set of records. But now make one small change:

SELECT * FROM `magic` WHERE name LIKE '%a\%'

This query should return all records ending with a% and it does. We got a%, cocoa% and power of a%. By the way %a% do the same.


Make another test but this time: %a\%% - everything containing 'a%'.

SELECT * FROM `magic` WHERE name LIKE '%a\%%'

Perfect, we got previous records plus boba%why and a%coco. It means this query does what it should.

In short for now logic is simple:

  • %\% == %\%%
  • %a\% <> %a\%%

Complete lack of sense.


This is nothing. Go to next test.

Find single backslash. This '\\' should work.

SELECT * FROM `magic` WHERE name LIKE '\\'

It's working. Only one match ''.

Very good, but what if we need find all records containing backslash? My logic say: % - as wildcard + four to make only one backslash + last wildcard %.

In result: '%\\%'.

SELECT * FROM `magic` WHERE name LIKE '%\\%'

We found nothing. Make another test, but add one more backslash.

SELECT * FROM `magic` WHERE name LIKE '%\\%'

It's working. how? I got all records containing . Now check 6x.

SELECT * FROM `magic` WHERE name LIKE '%\\\%'

Again the same result. But where is logic???


Track my logic of escaping: We have %\\% now parse it => we got % ? %. But what about 5' backslash? He escape %? It disappeared?

If we have 6 backslashes, my logic is is:

%\\\% (parsing)=> %\% (compile)=> (% wildcard)%

In result we should find records ended %. But In real life we find records like: %\1\.


Next test: %\\\\%. Eight backslash combo.

Parse it %\\\\% => %\\% => %\% (wildcard)(literaly two backslashes)(wildcard)

This should find all records containing '\' and we found:

%\%, \%, \\\%, \1\%, \1\%, 1\%, very nice but where is \1\, \ or \ etc.?

It looks like we found everything ending \%.


And last, but the strangest test.

This time do the same query, but more restrictive: \\\\%. Search record starting with \. (Last time containing)

SELECT * FROM `magic` WHERE name LIKE '\\\\%'

Now we got:

  • %
  • \
  • \\
  • %
  • \%
  • 1%
  • 1%
  • 1

I rly cannot understand how it work. How parser and compiler process escaping? Is it a bug or feature?

Aucun commentaire:

Enregistrer un commentaire