De-Mystifying Python Descriptors + Django
Sep 16, 2018
At my new job, I've been working a lot with Django. I was wary at first, but after about half a year of working with it, I can honestly say that it's been a dream. It's a significantly different experience from my last job, where we used Flask and SQLite. For me, one of the best reasons to use it is its ORM, which makes it easy to build complex queries and filters through its extensible and easy-to-understand QuerySet API. Maybe I'll write about my experience with that switch in a future blog post.
In any case, after what was probably my first thousand hours working with Django, I finally took the time to learn about a Python feature that's at the heart of Django's Model/Field API. It is so well hidden and abstracted away that many Django developers never have to concern themselves with its nuances until they start doing some metaprogramming that touches this area. This feature, my dear friends, as I'm sure you've figured out by the title of this blog post, is Descriptors.
Descriptors
The name "descriptor" does not refer to a new kind of Python type or keyword; it's better described as a coding pattern or protocol that you may find in a regular Python class. Descriptors are Python objects that define at least one of the following magic methods: __get__, __set__, or __delete__.
For example, check out this simple descriptor:
# defining the descriptor
class MyDescriptor:
    # the third parameter is named owner here so it does not shadow the built-in type
    def __get__(self, instance, owner=None):
        ...  # details omitted

    def __set__(self, instance, value):
        ...  # details omitted

# defining a class that uses the descriptor
class Foo:
    some_attribute = MyDescriptor()

# invoking the descriptor methods
foo = Foo()
foo.some_attribute
Accessing foo.some_attribute as we do in the example above is enough to trigger Python to call the descriptor's __get__ method.
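To make that concrete, here is a minimal runnable sketch of a descriptor with the omitted details filled in. The LoggedAttribute name and the choice to store the value under an underscore-prefixed name on the instance are my own assumptions for illustration, not the only way to do it.

```python
class LoggedAttribute:
    """Hypothetical descriptor that stores a value per instance and prints on each access."""

    def __set_name__(self, owner, name):
        # Called once when the owning class is created; remember where to store the value.
        self.storage_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself (Foo.some_attribute): return the descriptor.
            return self
        print("__get__ called")
        return getattr(instance, self.storage_name, None)

    def __set__(self, instance, value):
        print("__set__ called")
        setattr(instance, self.storage_name, value)


class Foo:
    some_attribute = LoggedAttribute()


foo = Foo()
foo.some_attribute = 42      # triggers LoggedAttribute.__set__
value = foo.some_attribute   # triggers LoggedAttribute.__get__
```

Note that the descriptor object lives on the class and is shared by every Foo instance, which is why the value itself is stored on the instance.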
Theoretically, you could use the descriptor methods by calling them directly, i.e., Foo.__dict__['some_attribute'].__get__(foo, Foo), but you typically don't want to do that because you lose the nice syntactical shorthand that the descriptor gives you. If you were going to call the method directly, you would probably prefer to create a class or instance method on Foo, and call it that way.
Surprisingly, there is a lot of magic happening when you invoke foo.some_attribute. Python objects have a magic method called __getattribute__, which is called any time you attempt to look up an attribute on a given object. There is nothing exotic about this method: it's quite literally how Python finds attributes on any object. The default implementation first searches the instance's class's dict (i.e., type(foo).__dict__['some_attribute'] or Foo.__dict__['some_attribute']) and then the dicts of each of the base classes of Foo, in method resolution order. If what it finds there is a data descriptor (one that defines __set__ or __delete__), that descriptor takes priority. Otherwise, Python checks the instance's dict (i.e., foo.__dict__['some_attribute']) and, only if the name is not there, falls back to whatever it found on the class.
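One subtlety in the lookup order is easy to miss: a descriptor on the class that defines __set__ or __delete__ (a "data descriptor") wins over the instance's dict, while a descriptor that only defines __get__ (a "non-data descriptor") loses to it. Here is a small sketch that shows both cases; the class names are made up for the demonstration.

```python
class DataDescriptor:
    """Defines __get__ and __set__, so it takes priority over the instance dict."""

    def __get__(self, instance, owner=None):
        return "from data descriptor"

    def __set__(self, instance, value):
        raise AttributeError("read-only")


class NonDataDescriptor:
    """Defines only __get__, so an entry in the instance dict shadows it."""

    def __get__(self, instance, owner=None):
        return "from non-data descriptor"


class Foo:
    data = DataDescriptor()
    non_data = NonDataDescriptor()


foo = Foo()
# Write straight into the instance dict, bypassing normal attribute assignment.
foo.__dict__["data"] = "from instance dict"
foo.__dict__["non_data"] = "from instance dict"

data_result = foo.data          # the data descriptor still answers
non_data_result = foo.non_data  # the instance dict answers
```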
Once Python has found the some_attribute object on the class (or one of its base classes), it checks the object to see if it has a __get__ method implemented. If it doesn't, then it just returns the some_attribute object as-is, whatever it may be. If it does have a __get__ method, then Python calls it, passing in the instance and its class as arguments. Values stored in the instance's own dict are always returned as-is; the descriptor protocol only applies to objects found on the class.
In other words, for our example above:
foo.some_attribute
is the same as:
type(foo).__dict__['some_attribute'].__get__(foo, type(foo))
Whatever that __get__ method does or returns is up to you. A similar process happens when you assign a value to some_attribute, except that it checks for the existence of and then calls the __set__ method instead. Same for __delete__, which gets called when you attempt to del the attribute. There is space here to perform a whole host of shenanigans inside of these methods. You could make a counter that increments every time an attribute is accessed. You could raise an Exception inside of a __set__ to render it read-only. You could return different values from the __get__ depending on the state of the instance.
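As a rough sketch of two of those shenanigans, here is a descriptor that counts reads and refuses writes. The CountedAccess and Config names are hypothetical, and the counter lives on the descriptor, so it is shared across all instances of the class.

```python
class CountedAccess:
    """Hypothetical descriptor: counts reads and rejects assignment."""

    def __init__(self, value):
        self.value = value
        self.reads = 0

    def __get__(self, instance, owner=None):
        if instance is None:
            # Class-level access returns the descriptor itself and is not counted.
            return self
        self.reads += 1
        return self.value

    def __set__(self, instance, value):
        # Raising here is what makes the attribute read-only.
        raise AttributeError("this attribute is read-only")


class Config:
    timeout = CountedAccess(30)


config = Config()
config.timeout
config.timeout

try:
    config.timeout = 60
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# Reach into the class dict to inspect the descriptor without triggering __get__.
read_count = Config.__dict__["timeout"].reads
```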
Django's Use of Descriptors
Descriptors are ubiquitous in a Django app. In your models.py, suppose you have the following model:
from django.db import models

class Post(models.Model):
    title = models.CharField(max_length=200, unique=True)
    body = models.TextField()
When Django sets up this model, Django's ModelBase metaclass will iterate over all of the model's fields and call each field's contribute_to_class method. This method registers the field on the model's _meta attribute and then places a descriptor called DeferredAttribute in its original location. DeferredAttribute helps us with performance when a field's value was not loaded along with the rest of the instance (for example, because the queryset used defer() or only()): upon accessing the value for the first time, Django will query the database and then cache the result on the instance. Every subsequent access reads the cached value instead of reaching into the database, which could end up being costly if you're doing it repeatedly!
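The load-once-then-cache idea can be sketched in plain Python. To be clear, this is not Django's actual DeferredAttribute code: LazyAttribute, fake_query, and query_log are hypothetical stand-ins I made up, and the "database" is just a function that records when it was called.

```python
class LazyAttribute:
    """Simplified, hypothetical stand-in for a deferred-loading descriptor."""

    def __init__(self, name, loader):
        self.name = name
        self.loader = loader  # hypothetical callable standing in for a database query

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        # Only __get__ is defined, so this is a non-data descriptor: once the value
        # is in the instance dict, Python finds it there and never calls this method.
        value = self.loader(instance)
        instance.__dict__[self.name] = value
        return value


query_log = []


def fake_query(instance):
    # Pretend to hit the database and record that we did.
    query_log.append("fake query for body")
    return "Lorem ipsum"


class Post:
    body = LazyAttribute("body", fake_query)


post = Post()
first = post.body    # runs fake_query and caches the result on the instance
second = post.body   # served from post.__dict__, no second "query"
```

The caching falls out of the lookup rules for free: because the descriptor has no __set__, the cached entry in the instance's dict takes over after the first access.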
One common pattern in many Django applications is the use of the @property decorator, a Python built-in that wraps the function it decorates in a descriptor. Used on its own, as below, it supplies only a getter, so the attribute can be read but not assigned or deleted.
class Post(models.Model):
    title = models.CharField(max_length=200, unique=True)
    body = models.TextField()

    @property
    def excerpt(self):
        # slicing never raises on short strings, so no length check is needed
        return self.body[:100]

first_post = Post.objects.first()
first_post.excerpt  # calls the excerpt function and returns its result
first_post.excerpt = 'blah blah blah'  # AttributeError
Attempting to assign 'blah blah blah' to the excerpt property will fail with an AttributeError because no setter was defined: a property object always implements __set__, but without a setter function that method simply raises. Two related built-ins are the @classmethod and the @staticmethod decorators, which are also implemented as descriptors, although they only define __get__.
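You can poke at these built-ins yourself to see the descriptor machinery underneath. The sketch below uses a plain class instead of a Django model so that it runs without a database; the describe and shout methods are hypothetical additions for the demonstration.

```python
class Post:
    def __init__(self, body):
        self.body = body

    @property
    def excerpt(self):
        return self.body[:100]

    @classmethod
    def describe(cls):
        return cls.__name__

    @staticmethod
    def shout(text):
        return text.upper()


post = Post("hello")
prop = Post.__dict__["excerpt"]

# A property object carries all three descriptor methods, even without a setter.
has_all = all(hasattr(prop, name) for name in ("__get__", "__set__", "__delete__"))

# Calling the protocol by hand gives the same results as dotted access.
excerpt = prop.__get__(post, Post)                      # same as post.excerpt
bound = Post.__dict__["describe"].__get__(None, Post)   # same as Post.describe
plain = Post.__dict__["shout"].__get__(None, Post)      # same as Post.shout

try:
    post.excerpt = "blah"
    error = None
except AttributeError as exc:
    error = exc
```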
Summary
Descriptors are all over the place once you know how to recognize them, and it helps to know how they work and what they're good for. In general, they're an advanced Python topic that you typically won't need to reach for beyond the built-in @property, @classmethod, or @staticmethod decorators. However, when you do come across a good use-case, like when multiple class attributes share the same getter/setter functionality, a custom descriptor can significantly DRY out your code and encapsulate logic in exactly the place where you want it.
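As one possible illustration of that last point, here is a sketch where two attributes share the same validation through a single descriptor instead of two near-identical property definitions. PositiveNumber and Rectangle are hypothetical names chosen for the example.

```python
class PositiveNumber:
    """Hypothetical descriptor that rejects zero or negative values."""

    def __set_name__(self, owner, name):
        # Remember the attribute name so each descriptor stores its own value.
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError(f"{self.name} must be positive, got {value!r}")
        instance.__dict__[self.name] = value


class Rectangle:
    # One descriptor class, reused for every attribute that needs the same check.
    width = PositiveNumber()
    height = PositiveNumber()

    def __init__(self, width, height):
        self.width = width    # goes through PositiveNumber.__set__
        self.height = height


rect = Rectangle(3, 4)
area = rect.width * rect.height

try:
    rect.height = -1
    message = None
except ValueError as exc:
    message = str(exc)
```

The same validation with @property would need a getter and a setter per attribute, so the savings grow with every attribute you add.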