Scaling Django Admin Date Hierarchy

How we got to have the cake and eat it too!


package

We published a package called django-admin-lightweight-date-hierarchy which overrides Django Admin date_hierarchy template tag and eliminates all database queries from it.
For the implementation details and the shocking performance analysis read on.


If you are not familiar with Django Admin date_hierarchy you should, it's great. Set the date_hierarchy attribute of a ModelAdmin to a DateField:

from django.contrib import admin
from .models import Sale

@admin.register(Sale)
class SaleAdmin(admin.ModelAdmin):
    date_hierarchy = 'created'

And you get a nice drill-down menu at the top of the admin change list:

Django date_hierarchy in action
Django date_hierarchy in action

When selecting a year, Django will filter the data to the selected year, and present a list of months for which there is data in that year.

When selecting a month, Django will apply the filter and present the list of days for which there is data in that month.

date_hierarchy Behind the Scenes

To produce a list of dates for which there is data, Django has to perform a query. For example, to produce the list of years, the first level in the hierarchy, Django will execute the following query:

SELECT DISTINCT
    django_datetime_trunc('year',  "sales_sale"."created", 'UTC') AS "datetimefield"
FROM
    "sales_sale"
WHERE
    "sales_sale"."created" IS NOT NULL
ORDER BY
    "datetimefield" ASC;

When a year is selected, Django will execute a query to produce the next level in the hierarchy - months:

SELECT DISTINCT
    django_datetime_trunc('month', "sales_sale"."created", 'UTC') AS "datetimefield"
FROM
    "sales_sale"
WHERE
    "sales_sale"."created" BETWEEN '2017–01–01 00:00:00' AND '2017–12–31 23:59:59.999999'
    AND "sales_sale"."created" BETWEEN '2017–01–01 00:00:00' AND '2017–12–31 23:59:59.999999'
    AND "sales_sale"."created" IS NOT NULL
ORDER BY
    "datetimefield" ASC;

How Expensive is it?

The date hierarchy is a great feature but it comes at a price.

To illustrate the problem we created a simple Sale model with two fields - id and created, and populated it with ~1,000,000 rows.

Using django-admin-toolbar we can see how long it takes Django to produce the date hierarchy:

Breakdown of SQL queries executed by Django Admin
Breakdown of SQL queries executed by Django Admin

WOW! The page took a boggling ~8s to load, out of which 7.6 seconds are spent producing the date hierarchy!

Just for comparison, the exact same page without date_hierarchy:

Breakdown of SQL queries executed by Django Admin without date hierarchy
Breakdown of SQL queries executed by Django Admin without date hierarchy

17ms. That's about 99.8% better.

A Possible Solution

Looking at the chart above it's clear that we have a problem. The date hierarchy is weighing our page making it nearly unusable.

Django need to execute a query because it only wants to show dates for which there is data. In our case, we have sales every day. Once we make this assumption we no longer have to query the data to produce a list of dates - we can just show them all!

The idea we came up with is:

  • if the user selected a month we show all of the days in the month.
  • If the user selected a year we show all of the months in the year.
  • If the user selected nothing we need to make an additional assumption - in our case we decided to show +-3 years from the current year. This is a compromise we were willing to make for the sake of performance and usability.

Now that we have the general idea let's dive into the implementation.

Implementation

Looking at the output above we can see that the queries originate from a template tag called {% date_hierarchy cl %}. The argument cl is the ChangeList created by the ModelAdmin.

The implementation for the date_hierarchy template tag can be found at admin_list.py.The interesting part is where the queries are executed.

Let's take a look at how Django produces a list of months for a given year:

# ...
year_field = '%s__year' % field_name

# ...
year_lookup = cl.params.get(year_field)

def link(filters):
  return cl.get_query_string(filters, [field_generic])

# ...

elif year_lookup:
  months = cl.queryset.filter(**{year_field: year_lookup})
  months = getattr(months, 'dates')(field_name, 'month')
  return {
    'show': True,
    'back': {
      'link': link({}),
      'title': _('All dates'),
    },
    'choices': [{
      'link': link({
        year_field: year_lookup,
        month_field: month.month,
      }),
      'title': capfirst(formats.date_format(month, 'YEAR_MONTH_FORMAT'))
    } for month in months]
  }
    # ...

Our date_hierarchy is set to the created. If we drill-down on year 2017 we get the following URL:

http://localhost:8000/admin/sales/sale/?created__year=2017

In the template tag year_field is created__year and year_lookup is 2017. The generated query applies the filter on created and fetches a list of months there is data for in year 2017 to the variable months.

Let's replace this bit and populate months with a list of all the months in the year instead:

# months = cl.queryset.filter(**{year_field: year_lookup})
# months = getattr(months, 'dates')(field_name, 'month')

# All months of selected year.
months = (
  datetime.date(int(year_lookup), month, 1)
  for month in range(1, 13)
)

After the change:

Queries executed by Django Admin after the change
Queries executed by Django Admin after the change

Awesome! No queries.

And the list view:

All month of year 2017 are shown
All month of year 2017 are shown

Our little change worked! All the months are displayed and no queries are executed by the date hierarchy.

Let's do the same for days:

# days = cl.queryset.filter(**{year_field: year_lookup, month_field: month_lookup})
# days = getattr(days, dates_or_datetimes)(field_name, 'day')
# All days of month.

days_in_month = calendar.monthrange(int(year_lookup), int(month_lookup))[1]
first_day_of_month = datetime.date(int(year_lookup), int(month_lookup), 1)
days = (
  first_day_of_month + datetime.timedelta(days=i)
  for i in range(days_in_month)
)

We use the calendar module to find out how many days there are in a given month.

Let's handle the years. Remember, we fetch +-3 years from today:

# years = getattr(cl.queryset, dates_or_datetimes)(field_name, 'year')

# Three years in each direction.
today = get_today()
years = (
  datetime.date(y, 1, 1)
  for y in range(today.year - 3, today.year + 3 + 1)
)

Integration

Up until now we fiddled with Django's source but we can't really do that. First we override the template tag to make Django use our implementation.

Let's copy the function and register a template with the same name:

# app/templatetags/admin_list.py
from django.contrib.admin.templatetags.admin_list import register

@register.inclusion_tag('admin/date_hierarchy.html')
def date_hierarchy(cl):
  # ... (original implementation) ...

Register the library in our app:

# settings.py

TEMPLATES = [{
  # ...
  'OPTIONS': {
    # ...
    'libraries': {
      # ...
      'admin': 'app.templatetags.admin_list',
    },
  },
}]

Now Django uses our template tag instead of his.

The problem with producing the date hierarchy is really an issue only for very large tables. We don't want to disable the existing behavior - we want to enable it only for very large tables.

Let's add an attribute on the ModelAdmin to turn the default drill-down behavior on and off:

@admin.register(Sale)
class SaleAdmin(admin.ModelAdmin):
    date_hierarchy = 'created'
    date_hierarchy_drilldown = False

When date_hierarchy_drilldown is set to False our new template tag will not execute queries. Otherwise, we preserve the original behavior.

To implement this we add the following at the start of our implementation of the date_hierarchy template tag:

date_hierarchy_drilldown = getattr(
    cl.model_admin,
    'date_hierarchy_drilldown',
    True,
)

Now we can re-enable the default behavior when date_hierarchy_drilldown=True.

For example, producing a list of months:

# ...
if date_hierarchy_drilldown:
    months = cl.queryset.filter(**{year_field: year_lookup})
    months = getattr(months, dates_or_datetimes)(field_name, 'month')
else:
    # All months of selected year.
    months = (
        datetime.date(int(year_lookup), month, 1)
        for month in range(1, 13)
    )
# ...

This is it! We've successfully scaled date hierarchy to handle millions of rows with a little compromise on UX.

Package

We found this approach useful in several projects so we decided to publish it as a package:

Using it is as simple as

Install it

$ pip install django-admin-lightweight-date-hierarchy

Add it to your INSTALLED_APPS:

INSTALLED_APPS = (
    'django_admin_lightweight_date_hierarchy',
)

Set date_hierarchy_drilldown to False on any ModelAdmin with date_hierarchy to prevent the default drill-down behavior:

@admin.register(MyModel)
class MyModelAdmin(admin.ModelAdmin):
  date_hierarchy = 'created'
  date_hierarchy_drilldown = False

To squeeze some more juice out of Django Admin check out this post as well:

Cheers!




Similar articles