Home / Blog / Addslashes(): don't call it a comeback

Addslashes(): don't call it a comeback

still number one for escaping SQL statements

I've seen a lot of people talking about mysql_real_escape_string() vs addslashes() vs addcslashes(). There seems to be a lot of real confusion about what these functions do (even with the php.net manual around), especially when it comes to character sets. I feel that some people are being scared into using some escaping methods with which they are not very familiar. So, I've decided to lay it all out in a few charts so there is no confusion about what each function does and how each can help protect against SQL injection attacks.

The Test

AS = addslashes()

MRES = mysql_real_escape_string()

ACS = addcslashes() //called with "\\\000\n\r'\"\032%_"

Feature AS MRES ACS
escapes quote, double quote, and backslash yes yes yes
escapes LIKE modifiers: underscore, percent no no yes
escapes with single quotes instead of backslash no yes*1 no
character-set aware no yes*2 no
prevents multi-byte attacks no yes*3 no
escape 7 strings 5000 times 0.031 secs 0.047 secs 0.039 secs
escape 1 binary image 200 times 0.514 secs 1.091 secs 0.552 secs

*1 MRES can escape with single quotes instead of backslashes if you set NO_ESCAPE_BACKSLASH during your SQL session. Since this is strictly a MySQL function, its use is dubious.

*2 Although MRES can be character set aware, it requires that you explicitly set the character set of the string which needs to be escaped, a feature that was not available until php 5.2.3.

*3 Although MRES stops the intended SQL injection, it can result in database errors, which, if unexpected could leak more information to the attacker. It also requires you to properly set your character sets, a feature that was not available until php 5.2.3.

As you can see, none of these functions fully protect against "multi-byte" injection attacks. I qualify this with "fully" because if you have mismatched character sets, MRES can present your attacker with an unexpected error message, or fail to work at all.

Speed

As far as speed goes, you can see that MRES can be up to twice as slow as AS or ACS. This is because the data to be escaped is sent to the libmysqlclient library. While this has some benefits when dealing with outdated character sets, it presents a number of problems.

Testability

When performing unit tests (you write unit tests, don't you?), if you are using MRES to escape your data, you are saddled with being required to have a Mysql server running. Hopefully the same version as the live server. too.

In the eyes of a unit testing 'purist', requiring a running database might disqualify your test from being 'true' unit tests. I say this half-jokingly. But, it does raise the question of how isolated you can make your code and could stand in the way of making mock objects for your database layer.

Multi Byte Breakdown

Multi-byte character sets is where the entire SQL escaping debate really takes off. There is so much confusion about how to properly escape text for exotic (or you could call them "outdated") character sets. In the above comparison there is a line that reads "character-set aware", and only MRES has a "yes" for that column. If we're assuming that "character-set aware" is a Good Thing, this begs the question, "When do you need to be aware of your character set?" Answer, all the time. But, for SQL escaping, it turns out that you need to be character set aware only some of the time.

In order to execute a multi-byte SQL injection the database tables must be created with a special character set and/or collation during table creation time, or the mysql client library must be instructed to use a special character set during the connection. Having a different table collation/character set other than the current character set of the SQL client (libmysqlclient for PHP) will result in a collation error even if the SQL is escaped properly by MRES.

The client character set was set with mysql_set_charset()

Client Character Set Table Character Set MRES Vulnerable AS Vulnerable
utf8 utf8 no no
gbk utf8 no (w/errors) yes
utf8 gbk no no
gbk gbk no yes

As you can see from this table, if your database is constructed only to deal with BGK (or SJIS, or other vulnerable charset) and you only deal with that character set at connection time, then you need MRES to properly escape your data.

Here is where PHP specific issues come to light. A lot of people have read that they should send the SQL commands "SET NAMES" or "SET CHARACTER SET" to MySQL in order to change the running connection's character sets. While this is true, it doesn't tell the whole story for PHP. PHP uses the libmysqlclient library to pass commands to mysql either over a network or through a Unix socket. This MySQL library has some utilities for clients wanting to use MySQL, one of these features is the function "mysql_real_escape_string". Unfortunately, for libmysqlclient, it cannot parse commands sent to MySQL, so it is blissfully unaware of any "SET NAMES" commands you might have sent to the server.

As you can see in the graph above, any MySQL commands pass right through libmysqlclient, but the PHP function mysql_real_escape_string() is actually part of the libmysqlclient C code and does not need to round trip to the server. (It does need to round trip "out" of the Zend memory space, which is why you see the slowness on large binary objects.) The "new-ish" PHP function mysql_set_charset() is exactly what you need to instruct libmysqlclient to be aware of any desired character set. This should be used in conjunction with any "SET NAMES" commands so that the library and the server are in sync.

Conclusion

AS is the most basic function. It has its limitations, but if you are aware of the character sets flying around you it should not pose any significant danger.

Since ACS or str_replace must be used for LIKE-type queries, ACS should be sufficient over MRES for all but the most unusual cases. If ACS could take an optional parameter to use a single quote or a backslash it would be the best SQL escaping tool, hands down, for any backend database.

The moral of this story is, go ahead and keep using addslashes(). It's fast, and fast... and it works with UTF-8. If you have a SQL library, which knows when you're adding LIKE or GRANT clauses and parameters, use addcslashes() instead. What else do you need to know? People who say that you "should" be doing something this way or that (parameterized queries) should understand that *knowing* your data and being confident about a solution is more important than just reaching for the most comfy looking security blanket without understanding what's going on. Remember, all data looks like binary data to a computer, it's only how you interpret it with charsets, plugins, protocols, etc. which makes data really data.



Comments on "Addslashes(): don't call it a comeback"

Anonymous (spam rating:2)
I propose this motto for PHP: "there's more than one way to do it, all of which are wrong. "
 
Anonymous
Great article to make programmers more aware. On the topic of mysql injection I would also like to mention mysql prepared statements. It is a smart move to upgrade from the normal mysql library to the mysqli version. This great package supports all new mysql features of wich "real" prepared statements is just one.

For those who don't know what iam talking about, prepared statements are just normal queries. Only where in the old situation you used something like this:

"SELECT * FROM `users` WHERE `uid` = {$uid};"

You will use this:

"SELECT * FROM `users` WHERE `uid` = ?;"

Trough the use of mysqli you can replace the questionmark with your own data. Now how is this different than all the other placeholder functions? Because mysql will treat the data you substetute just as data, it will preform no other functions or what so ever on it. This effectively avoids all mysql injection because whatever data the user submits it won't break the query.

One last tip on mysql: Don't forget output filters! people normaly focus on input so much they forget the fact that data retrieved from mysql is just as tainted as $_GET or $_POST data. Program defensively, security aware programmers need to be a little paranoid.

Post Scriptum: If you choose another library to use prepared statements make sure the library doesn't simulate this! Make sure the library uses genuine prepared statements else you are still vurnable.


Joost Pluijmers, programmer at http://www.digital-expression.nl
 
Anonymous
but the key is finding the way thats less wrong
 
Anonymous
Merci pour cet article.
 

 

Add a comment