All You Need To Know About Prefetching in Django

How to use Prefetch to speed up queries in Django


I have recently worked on a ticket ordering system for a conference. It was very important for the customer to see a table of orders including a column with a list of program names in each order:

The column requested by the users
The column requested by the users

The models looked (roughly) like this:

class Program(models.Model):
    name = models.CharField(
        max_length=20,
    )

class Price(models.Model):
    program = models.ForeignKey(
        Program,
    )
    from_date = models.DateTimeField()
    to_date = models.DateTimeField()

class Order(models.Model):
    state = models.CharField(
        max_length=20,
    )
    items = models.ManyToManyField(Price)
  • Program - a session, lecture or a conference day.
  • Price - Prices can change over time. One way to model changes over time is using a type 2 slowly changing dimension (SCD). The Price model represents the price of a program at a certain point in time.
  • Order - An order to one or more programs. Each item in the order is the price of the program at the time the order was made.

Before We Start

Throughout this article we are going to monitor the queries executed by Django. To log the queries add the following to the LOGGING settings in settings.py:

LOGGING = {
    # ...
    'loggers': {
        'django.db.backends': {
            'level': 'DEBUG',
        },
    },
}

What's The Problem?

Let's try to fetch the program names for a single order:

>>> o = Order.objects.filter(state='completed').first()

(0.002) SELECT ... FROM "orders_order"
WHERE "orders_order"."state" = 'completed'
ORDER BY "orders_order"."id" ASC LIMIT 1;

>>> [p.program.name for p in o.items.all()]

(0.002) SELECT ... FROM "events_price"
INNER JOIN "orders_order_items" ON ("events_price"."id" = "orders_order_items"."price_id")
WHERE "orders_order_items"."order_id" = 29; args=(29,)

(0.001) SELECT ... FROM "events_program"
WHERE "events_program"."id" = 8; args=(8,)

['Day 1 Pass']
  • To fetch completed orders we need one query.
  • To fetch the program names for each order we need two more queries.

I previously wrote about the N+1 problem and this is a classic case. If we need two queries for each order, the number of queries for 100 orders will be 1 + 100 * 2 = 201 queries, that's a lot!

Let's use Django to reduce the amount of queries:

>>> o.items.values_list('program__name')

(0.003) SELECT "events_program"."name" FROM "events_price"
INNER JOIN "orders_order_items" ON ("events_price"."id" = "orders_order_items"."price_id")
INNER JOIN "events_program" ON ("events_price"."program_id" = "events_program"."id")
WHERE "orders_order_items"."order_id" = 29 LIMIT 21;

['Day 1 Pass']

Great! Django performed a join between Price and Program and reduced the amount of queries to just one per order.

At this point instead of 201 queries we only need 101 queries for 100 orders. Can we do better?

Why Can't We Join?

The first question that should come to mind is "why can't we join the tables?"

If we have a foreign key we can use select_related or use snake case like we did above to fetch the related fields in a single query.

For example, we fetched the program name for a list of prices in a single query using values_list('program__name'). We were able to do that because each price is related to exactly one program.

If the relation between two models is many to many we can't do that. Every order has one or more related prices - if we join the two tables we get duplicate orders:

SELECT
    o.id AS order_id,
    p.id AS price_id
FROM
    orders_order o
    JOIN orders_order_items op ON (o.id = op.order_id)
    JOIN events_price p ON (op.price_id = p.id)
ORDER BY
    1,
    2;

 order_id | price_id
----------+----------
    45    |    38
    45    |    56
    70    |    38
    70    |    50
    70    |    77
    71    |    38

Orders 70 and 45 have multiple items so they come up more than once in the result - Django can't handle that.


Django has a nice, built-in way, of dealing with this problem called prefetch_related:

>>> o = Order.objects.filter(
...     state='completed',
... ).prefetch_related(
...     'items__program',
... ).first()

(0.002) SELECT ... FROM "orders_order"
WHERE "orders_order"."state" = 'completed'
ORDER BY "orders_order"."id" ASC LIMIT 1;

(0.001) SELECT ("orders_order_items"."order_id") AS "_prefetch_related_val_order_id", "events_price"...
FROM "events_price"
INNER JOIN "orders_order_items" ON ("events_price"."id" = "orders_order_items"."price_id")
WHERE "orders_order_items"."order_id" IN (29);

(0.001) SELECT "events_program"."id", "events_program"."name" FROM "events_program"
WHERE "events_program"."id" IN (8);

We told Django we intend to fetch items__program from the result set. In the second and third query we can see that Django fetched the through table orders_order_items and the relevant programs from events_program. The results of the prefetch are cached on the objects.

What happens when we try to fetch program names from the result?

>>> [p.program.name for p in o.items.all()]

['Day 1 Pass']

No additional queries - exactly what we wanted!

When using prefetch, it's important to work on the object and not on the query. Trying to fetch the program names with a query will produce the same outcome but will result in an additional query:

>>> o.items.values_list('program__name')

(0.002) SELECT "events_program"."name" FROM "events_price"
INNER JOIN "orders_order_items" ON ("events_price"."id" = "orders_order_items"."price_id")
INNER JOIN "events_program" ON ("events_price"."program_id" = "events_program"."id")
WHERE "orders_order_items"."order_id" = 29 LIMIT 21;

['Day 1 Pass']

At this point, fetching 100 orders requires only 3 queries. Can we do even better?

Introducing Prefetch

In version 1.7 Django introduced a new Prefetch object that extends the capabilities of prefetch_related.

The new object allows the developer to override the query used by Django to prefetch the related objects.

In our previous example Django used two queries for the prefetch - one for the through table and one for the program table. What if we could tell Django to join these two together?

>>> prices_and_programs = Price.objects.select_related('program')

>>> o = Order.objects.filter(
...     state='completed'
... ).prefetch_related(
...     Prefetch('items', queryset=prices_and_programs)
... ).first()

(0.001) SELECT ... FROM "orders_order"
WHERE "orders_order"."state" = 'completed'
ORDER BY "orders_order"."id" ASC LIMIT 1;

(0.001) SELECT ("orders_order_items"."order_id") AS "_prefetch_related_val_order_id",
"events_price"..., "events_program"...
INNER JOIN "events_program" ON ("events_price"."program_id" = "events_program"."id")
WHERE "orders_order_items"."order_id" IN (29);

We created a query that joins prices with programs. Than we told Django to use this query to prefetch the values. This is like telling Django that you intend to fetch both items and programs for each order.

Fetching program names for an order:

>>> [p.program.name for p in o.items.all()]

['Day 1 Pass']

No additional queries - it worked!


Taking It To The Next Level

When we talked earlier about the models we mentioned that the prices are modeled as an SCD table. This means we might want to query only active prices at a certain date.

A price is active at a certain date if it's between from_date and end_date:

>>> from django.utils import timezone
>>> now = timezone.now()

>>> active_prices = Price.objects.filter(
...     from_date__lte=now,
...     to_date__gt=now,
... )

Using the Prefetch object we can tell Django to store the prefetched objects in a new attribute of the result set:

>>> from django.utils import timezone
>>> now = timezone.now()
>>> active_prices_and_programs = (
...     Price.objects.filter(
...         from_date__lte=now,
...         to_date__gt=now,
...     ).select_related('program')
... )
>>> o = Order.objects.filter(
...     state='completed'
... ).prefetch_related(
...     Prefetch(
...         'items',
...         queryset=active_prices_and_programs,
...         to_attr='active_prices',
...     ),
... ).first()

(0.001) SELECT ... FROM "orders_order"
WHERE "orders_order"."state" = 'completed'
ORDER BY "orders_order"."id" ASC
LIMIT 1;

(0.001) SELECT ... FROM "events_price"
INNER JOIN "orders_order_items" ON ("events_price"."id" = "orders_order_items"."price_id")
INNER JOIN "events_program" ON ("events_price"."program_id" = "events_program"."id")
WHERE ("orders_order_items"."order_id" IN (29)
AND "events_price"."from_date" <= '2017–04–29T07:53:00.210537+00:00'::timestamptz
AND "events_price"."to_date" > '2017–04–29T07:53:00.210537+00:00'::timestamptz);

We can see in the log that Django performed only two queries, and the prefetch query now include the custom filter we defined.

To fetch the active prices we can use the new attribute defined in to_attr:

>>> [p.program.name for p in o.active_prices]

['Day 1 Pass']

No additional query!


Final Words

Prefetch is a very powerful feature of Django ORM. I strongly recommend going over the documentation, you are bound to strike a gem.



Similar articles