10.1 It's My Way or the Highway!
Sometimes your objects hold a lot of references to various utility objects that you don't want to pass over the network, or that can't be passed over the network: references to files, databases, etc.
To make serialization work in such cases, a class can take control of its own serialization. The pickle module supports this through special methods: __reduce__(), __getstate__(), and __setstate__(). These methods let you specify how objects are serialized and restored.
Main methods of controlled serialization:
- __reduce__(): Specifies how an object should be serialized.
- __getstate__(): Returns the state of the object for serialization.
- __setstate__(self, state): Restores the object from the state.
I'll explain them in more detail below, and how to use them together.
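Before looking at the methods, here is a minimal sketch of the problem itself. The Report class and its log_file field are hypothetical, and os.devnull is used only so the example runs anywhere. Without any custom serialization, pickling an object that holds an open file fails:

```python
import os
import pickle


class Report:
    """Hypothetical class that keeps an open file handle next to its data."""

    def __init__(self, title):
        self.title = title
        # Assumption: os.devnull stands in for a real log file
        self.log_file = open(os.devnull, "w")


report = Report("Q1")
error = None
try:
    pickle.dumps(report)  # pickle tries to serialize every field, including the file
except TypeError as exc:
    error = exc
    print("Cannot pickle:", exc)
finally:
    report.log_file.close()
```

The title field alone would pickle fine; it is the file handle that makes the whole object fail.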
10.2 The __reduce__() Method
The __reduce__() method returns a tuple that specifies how an object should be serialized and deserialized. The tuple normally contains:
- A reference to a function or class that will be used to restore the object.
- A tuple of arguments for this function or class.
- Additional object state (if necessary).
Example:
import pickle


class CustomClass:
    def __init__(self, value):
        self.value = value

    def __reduce__(self):
        return (self.__class__, (self.value,))

    def __repr__(self):
        return f"CustomClass(value={self.value})"


# Create an object
obj = CustomClass(42)

# Serialize the object
serialized_obj = pickle.dumps(obj)
print("Serialized object:", serialized_obj)

# Deserialize the object
deserialized_obj = pickle.loads(serialized_obj)
print("Deserialized object:", deserialized_obj)
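The first element of the tuple does not have to be the class itself; it can also be a function. The sketch below assumes a hypothetical Point class and a module-level make_point() factory (both invented for illustration). Only x and y travel in the pickle, and the cache field is rebuilt empty by the constructor:

```python
import pickle


def make_point(x, y):
    """Hypothetical factory; it must live at module level so pickle can find it by name."""
    return Point(x, y)


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.cache = {}  # derived data we do not want in the pickle

    def __reduce__(self):
        # (callable, args): on load, pickle calls make_point(self.x, self.y)
        return (make_point, (self.x, self.y))


p = Point(1, 2)
p.cache["norm"] = 5 ** 0.5

restored = pickle.loads(pickle.dumps(p))
print(restored.x, restored.y, restored.cache)
```

Because no third element (state) is returned here, nothing from the original __dict__ is copied over; the restored object is whatever make_point() builds.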
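If you don't override __reduce__(), the default behavior is conceptually close to the sketch below. One difference: the real default implementation creates the instance without calling __init__() and then restores its state.
class CustomClass:
    def __init__(self, value):
        self.value = value

    def __reduce__(self):
        # The class used to recreate the object
        cls = self.__class__
        # Constructor arguments
        args = (self.value,)
        # Object state
        state = self.__dict__
        return (cls, args, state)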
It returns a tuple consisting of three objects:
- Reference to the current class
- Constructor arguments (tuple)
- Reference to the current state of the object
If you're cool with this behavior, you can skip overriding __reduce__().
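If you are curious what the built-in behavior actually produces, you can inspect it directly. A small sketch, assuming a hypothetical Plain class with no custom methods: pickle calls __reduce_ex__(protocol), and for modern protocols the first element is the helper copyreg.__newobj__, which creates the instance without running __init__().

```python
import copyreg
import pickle


class Plain:
    """Hypothetical class with no custom pickling methods."""

    def __init__(self, value):
        self.value = value


obj = Plain(42)

# __reduce_ex__ is the protocol-aware variant that pickle itself calls
reduced = obj.__reduce_ex__(pickle.DEFAULT_PROTOCOL)

print("Callable:", reduced[0])   # helper that calls cls.__new__(cls)
print("Args:    ", reduced[1])   # the class to instantiate
print("State:   ", reduced[2])   # the instance attributes
```

The returned tuple may carry more than three elements; the extra ones are used for list-like and dict-like objects and are None here.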
10.3 Reading and Writing State
The __getstate__() and __setstate__() Methods
These methods are used to manage the state of an object during serialization and deserialization.
- __getstate__(): Returns the state of the object that should be serialized.
- __setstate__(self, state): Restores the object from the state.
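To see when each method runs, here is a small tracing sketch. The Traced class and the calls list are hypothetical and exist only to record the order of calls during one round trip through pickle:

```python
import pickle

calls = []  # records which special methods were invoked, in order


class Traced:
    def __init__(self, value):
        calls.append("__init__")
        self.value = value

    def __getstate__(self):
        calls.append("__getstate__")
        return {"value": self.value}

    def __setstate__(self, state):
        calls.append("__setstate__")
        self.__dict__.update(state)


obj = Traced(1)
clone = pickle.loads(pickle.dumps(obj))
print(calls)
```

__init__() appears only once, for the original object: dumps() calls __getstate__(), and loads() creates the new instance and hands the saved state to __setstate__().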
Example:
Let's say we want to save only some of an object's fields and exclude the rest. For this, in the __getstate__() method, you need to:
- Copy the current state of the object (stored in the special attribute __dict__) into a separate variable, a dictionary called state.
- Remove all fields from it that don't need to be serialized.
- Return the resulting dictionary as the result of the __getstate__() method.
import pickle


class CustomClass:
    def __init__(self, value):
        self.value = value
        self.internal_state = "internal"

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['internal_state']  # Exclude the internal state
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.internal_state = "restored internal"  # Restore internal state

    def __repr__(self):
        return f"CustomClass(value={self.value}, internal_state={self.internal_state})"


# Create an object
obj = CustomClass(42)
print("Original object:", obj)

# Serialize the object
serialized_obj = pickle.dumps(obj)
print("Serialized object:", serialized_obj)

# Deserialize the object
deserialized_obj = pickle.loads(serialized_obj)
print("Deserialized object:", deserialized_obj)
During deserialization, in the __setstate__() method, we do two things:
- Update the current state of the object using the update() method.
- Give the internal_state field (and any other unserializable fields) new values.
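The same pattern applies to the case this lesson started with, an object that holds an open file. Below is a minimal sketch under stated assumptions: the LogWriter class, its path and handle fields, and the temporary app.log file are all hypothetical. The path is pickled, the file handle is dropped, and the handle is reopened on restore:

```python
import os
import pickle
import tempfile


class LogWriter:
    """Hypothetical class: a path (picklable) plus an open file (not picklable)."""

    def __init__(self, path):
        self.path = path
        self.handle = open(path, "a")

    def write(self, text):
        self.handle.write(text + "\n")
        self.handle.flush()

    def close(self):
        self.handle.close()

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["handle"]  # the open file cannot be pickled
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.handle = open(self.path, "a")  # reopen on the receiving side


path = os.path.join(tempfile.mkdtemp(), "app.log")
writer = LogWriter(path)
writer.write("before pickling")

clone = pickle.loads(pickle.dumps(writer))
clone.write("after unpickling")

writer.close()
clone.close()

with open(path) as f:
    lines = f.read().splitlines()
print(lines)
```

Reopening by path only makes sense if the same path is valid wherever the object is restored; over a network you would more likely restore a connection or leave the field empty until it is needed.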