10.1 It's My Way or the Highway!
Sometimes your objects hold a lot of references to various utility objects that you don't want to pass over the network, or that can't be passed over the network: references to files, databases, etc.
To make serialization work in such cases, a class can be given the ability to manage its own serialization. Three special methods are used for this: __reduce__(), __getstate__(), and __setstate__(). They let you specify how objects should be serialized and restored.
Main methods of controlled serialization:
- __reduce__(): Specifies how an object should be serialized.
- __getstate__(): Returns the state of the object for serialization.
- __setstate__(self, state): Restores the object from the state.
I'll explain them in more detail below, and how to use them together.
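To see why this matters, here is a minimal sketch of what happens when an object that holds an open file is pickled as is. The class name `Report` and its fields are hypothetical, and `os.devnull` is used only so that the example does not create a real file.

```python
import os
import pickle

class Report:
    def __init__(self, title):
        self.title = title
        # An open file handle: a typical resource that cannot be pickled
        self.log = open(os.devnull, "w")

report = Report("weekly")
try:
    pickle.dumps(report)
    error = None
except TypeError as exc:
    # pickle refuses to serialize the file object stored in self.log
    error = exc
finally:
    report.log.close()

print("Pickling failed:", error)
```

The methods described in the following sections are the usual way around this kind of failure.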
10.2 The __reduce__() Method
The __reduce__() method returns a tuple that specifies how an object should be serialized and deserialized. The tuple normally contains:
- A reference to a function or class that will be used to restore the object.
- A tuple of arguments for this function or class.
- Additional object state (if necessary).
Example:
import pickle

class CustomClass:
    def __init__(self, value):
        self.value = value

    def __reduce__(self):
        return (self.__class__, (self.value,))

    def __repr__(self):
        return f"CustomClass(value={self.value})"

# Create an object
obj = CustomClass(42)

# Serialize the object
serialized_obj = pickle.dumps(obj)
print("Serialized object:", serialized_obj)

# Deserialize the object
deserialized_obj = pickle.loads(serialized_obj)
print("Deserialized object:", deserialized_obj)
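The first element of the tuple does not have to be the class itself; it can also be an ordinary function. The sketch below assumes a hypothetical `Temperature` class and a hypothetical module-level helper `rebuild_temperature()`. Only the Celsius value is stored, and the derived field is recomputed when the object is loaded.

```python
import pickle

def rebuild_temperature(celsius):
    # Hypothetical helper that pickle calls on load.
    # It must live at module level so pickle can find it by name.
    return Temperature(celsius)

class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius
        # Derived value: not stored in the pickle, recomputed on load
        self.fahrenheit = celsius * 9 / 5 + 32

    def __reduce__(self):
        # (function to call, tuple of arguments for it)
        return (rebuild_temperature, (self.celsius,))

restored = pickle.loads(pickle.dumps(Temperature(100)))
print(restored.celsius, restored.fahrenheit)
```

A function is handy when restoring the object takes more than a plain constructor call, for example when it has to be looked up in a cache or rebuilt from a configuration.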
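By default, __reduce__() behaves roughly like the sketch below. The real default implementation recreates the object without calling __init__() and then restores its __dict__, but the idea is the same: class, arguments, state.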
class CustomClass:
    def __init__(self, value):
        self.value = value

    def __reduce__(self):
        # Define the class
        cls = self.__class__
        # Constructor arguments
        args = (self.value,)
        # Object state
        state = self.__dict__
        return (cls, args, state)
It returns a tuple consisting of three objects:
- Reference to the current class
- Constructor arguments (tuple)
- Reference to the current state of the object
If you're cool with this behavior, you can skip overriding __reduce__().
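The three-element form can be checked directly. In the sketch below (the `Tagged` class and its `tag` field are hypothetical), pickle first calls the class with the stored arguments and then applies the state dictionary on top, so a value changed after construction survives the round trip.

```python
import pickle

class Tagged:
    def __init__(self, value):
        self.value = value
        self.tag = "fresh"  # default set by the constructor

    def __reduce__(self):
        # (callable, constructor arguments, state applied after construction)
        return (self.__class__, (self.value,), {"tag": self.tag})

obj = Tagged(1)
obj.tag = "edited"  # changed after construction

copy = pickle.loads(pickle.dumps(obj))
print(copy.value, copy.tag)
```

Because this class does not define __setstate__(), the state dictionary is simply merged into the new object's __dict__.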
10.3 Reading and Writing State
The __getstate__() and __setstate__() Methods
These methods are used to manage the state of an object during serialization and deserialization.
- __getstate__(): Returns the state of the object that should be serialized.
- __setstate__(self, state): Restores the object from the state.
Example:
Let's say we want to save only some fields of an object and exclude the rest. To do this, the __getstate__() method needs to:
- Copy the current state of the object (stored in the special attribute __dict__) into a separate variable, the dictionary state.
- Remove from it all fields that don't need to be serialized.
- Return the resulting dictionary as the result of the __getstate__() method.
import pickle

class CustomClass:
    def __init__(self, value):
        self.value = value
        self.internal_state = "internal"

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['internal_state']  # Exclude the internal state
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.internal_state = "restored internal"  # Restore internal state

    def __repr__(self):
        return f"CustomClass(value={self.value}, internal_state={self.internal_state})"

# Create an object
obj = CustomClass(42)
print("Original object:", obj)

# Serialize the object
serialized_obj = pickle.dumps(obj)
print("Serialized object:", serialized_obj)

# Deserialize the object
deserialized_obj = pickle.loads(serialized_obj)
print("Deserialized object:", deserialized_obj)
During deserialization, in the __setstate__() function, we do two things:
- Update the current state of the object using the update() method.
- Give the internal_state field (and any other unserializable fields) a new value.
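The same pattern applies to fields that truly cannot be serialized. Below is a minimal sketch with a hypothetical `SafeCounter` class that holds a `threading.Lock`, which pickle cannot serialize. The lock is dropped in __getstate__() and a new one is created in __setstate__().

```python
import pickle
import threading

class SafeCounter:
    def __init__(self, count=0):
        self.count = count
        self._lock = threading.Lock()  # locks cannot be pickled

    def increment(self):
        with self._lock:
            self.count += 1

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["_lock"]  # drop the unpicklable field
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._lock = threading.Lock()  # create a fresh lock after loading

counter = SafeCounter()
counter.increment()

restored = pickle.loads(pickle.dumps(counter))
restored.increment()  # works because __setstate__ recreated the lock
print(restored.count)
```

Note that __init__() is not called when the object is loaded, so any field left out of the state has to be recreated in __setstate__().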