Aggregation is a source of confusion in any type of ORM and Django is no different. The documentation provides a variety of examples and cheat-sheets that demonstrate how to group and aggregate data using the ORM, but I decided to approach this from a different angle.
In this article I put QuerySets and SQL side by side. If SQL is where you are most comfortable, this is the Django GROUP BY cheat-sheet for you.
Table of Contents
- How to Group By in Django
- How to Count Rows
- How to Use Aggregate Functions
- How to Group By
- How to Filter a QuerySet With Group By
- How to Sort a QuerySet With Group By
- How to Combine Multiple Aggregations
- How to Group by Multiple Fields
- How to Group by an Expression
- How to Use Conditional Aggregation
- How to Use Having
- How to Group by Distinct
- How to Create Expressions Using Aggregate Fields
- How to Group By Across Relations
- How to Group By a Many to Many Relationship
- Going Further
How to Group By in Django
To demonstrate different GROUP BY queries, I will use models from Django's built-in django.contrib.auth
app.
>>> from django.contrib.auth.models import User
Django ORM produces SQL statements with long aliases. For brevity, I will show a cleaned-up, but equivalent, version of what Django executes.
SQL Logging
To see the SQL actually executed by Django, you can turn on SQL logging in the Django settings.
How to Count Rows
Let's count how many users we have:
SELECT
COUNT(*)
FROM
auth_user;
User.objects.count()
Counting rows is so common that Django includes a function for it right on the QuerySet. Unlike other QuerySets we'll see next, count
returns a number.
How to Use Aggregate Functions
Django offers two more ways to count rows in a table.
We'll start with aggregate
:
SELECT
COUNT(id) AS id__count
FROM
auth_user;
from django.db.models import Count
User.objects.aggregate(Count('id'))
To use aggregate
we imported the aggregate function Count
. The function accepts an expression to count. In this case, we used the name of the primary key column id
to count all the rows in the table.
Aggregate NULL
Aggregations ignore NULL
values. For more on how aggregations handle NULL
, see 12 Common Mistakes and Missed Optimization Opportunities in SQL.
The result of aggregate
is a dict:
>>> from django.db.models import Count
>>> User.objects.aggregate(Count('id'))
{"id__count": 891}
The name of the key is derived from the name of the field and the name of the aggregate. In this case, it's id__count
. It's a good idea not to rely on this naming convention, and instead provide your own name:
SELECT
COUNT(id) as total
FROM
auth_user;
>>> from django.db.models import Count
>>> User.objects.aggregate(total=Count('id'))
{"total": 891}
The name of the argument to aggregate
is also the name of the key in the resulting dictionary.
How to Group By
Using aggregate
we got the result of applying the aggregate function on the entire table. This is useful, but usually we want to apply the aggregation on groups of rows.
Let's count users by their active status:
SELECT
is_active,
COUNT(id) AS total
FROM
auth_user
GROUP BY
is_active
(User.objects
.values('is_active')
.annotate(total=Count('id')))
This time we used the function annotate
. To produce a GROUP BY we use a combination of values
and annotate
:
values('is_active')
: what to group byannotate(total=Count('id'))
: what to aggregate
The order is important: failing to call values
before annotate
will not produce aggregate results.
Just like aggregate
, the name of the argument to annotate
is the key in the result of the evaluated QuerySet. In this case it's total
.
How to Filter a QuerySet With Group By
To apply aggregation on a filtered query you can use filter
anywhere in the query. For example, count only staff users by their active status:
SELECT
is_active,
COUNT(id) AS total
FROM
auth_user
WHERE
is_staff = True
GROUP BY
is_active
(User.objects
.values('is_active')
.filter(is_staff=True)
.annotate(total=Count('id')))
How to Sort a QuerySet With Group By
Like filter, to sort a queryset use order_by
anywhere in the query:
SELECT
is_active,
COUNT(id) AS total
FROM
auth_user
GROUP BY
is_active
ORDER BY
is_active,
total
(User.objects
.values('is_active')
.annotate(total=Count('id'))
.order_by('is_active', 'total'))
Notice that you can sort by both the GROUP BY key and the aggregate field.
How to Combine Multiple Aggregations
To produce multiple aggregations of the same group, add multiple annotations:
SELECT
is_active,
COUNT(id) AS total,
MAX(date_joined) AS last_joined
FROM
auth_user
GROUP BY
is_active
from django.db.models import Max
(User.objects
.values('is_active')
.annotate(
total=Count('id'),
last_joined=Max('date_joined'),
))
The query will produce the number of active and inactive users, and the last date a user joined in each group.
How to Group by Multiple Fields
Just like performing multiple aggregations, we might also want to group by multiple fields. For example, group by active status and staff status:
SELECT
is_active,
is_staff,
COUNT(id) AS total
FROM
auth_user
GROUP BY
is_active,
is_staff
(User.objects
.values('is_active', 'is_staff')
.annotate(total=Count('id')))
The result of this query includes is_active
, is_staff
and the number of users in each group.
How to Group by an Expression
Another common use case for GROUP BY is to group by an expression. For example, count the number of users that joined each year:
SELECT
EXTRACT('year' FROM date_joined),
COUNT(id) AS total
FROM
auth_user
GROUP BY
EXTRACT('year' FROM date_joined)
(User.objects
.values('date_joined__year')
.annotate(total=Count('id')))
Notice that to get the year from the date we used the special expression <field>__year
in the first call to values()
. The result of the query is a dict, and the name of the key will be date_joined__year
.
Sometimes, the built-in expressions are not enough, and you need to aggregate on a more complicated expression. For example, group by users that have logged in since they signed-up:
SELECT
last_login > date_joined AS logged_since_joined,
COUNT(id) AS total
FROM
auth_user
GROUP BY
last_login > date_joined
from django.db.models import (
ExpressionWrapper,
Q, F, BooleanField,
)
(User.objects
.annotate(
logged_since_joined=ExpressionWrapper(
Q(last_login__gt=F('date_joined')),
output_field=BooleanField(),
)
)
.values('logged_since_joined')
.annotate(total=Count('id'))
.values('logged_since_joined', 'total')
The expression here is fairly complicated. We first use annotate
to built the expression, and we mark it as a GROUP BY key by referencing the expression in the following call to values()
. From here on, it's exactly the same.
How to Use Conditional Aggregation
Using conditional aggregation, you can aggregate only a part of the group. Conditions come in handy when you have multiple aggregates. For example, count the number of staff and non-staff users by the year they signed-up:
SELECT
EXTRACT('year' FROM date_joined),
COUNT(id) FILTER (
WHERE is_staff = True
) AS staff_users,
COUNT(id) FILTER (
WHERE is_staff = False
) AS non_staff_users
FROM
auth_user
GROUP BY
EXTRACT('year' FROM date_joined)
from django.db.models import F, Q
(User.objects
.values('date_joined__year')
.annotate(
staff_users=(
Count('id', filter=Q(is_staff=True))
),
non_staff_users=(
Count('id', filter=Q(is_staff=False))
),
))
The SQL above is from PostgreSQL, which along with SQLite is currently the only database backend that supports the FILTER
syntax shortcut (formally called "selective aggregates"). For other database backends, the ORM will use CASE ... WHEN
instead.
tip
I previously wrote about aggregations with filters. Check out my 9 Django tips for working with databases.
How to Use Having
The HAVING
clause is used to filter on the result of an aggregate function. For example, find the years in which more than a 100 users joined:
SELECT
is_active,
COUNT(id) AS total
FROM
auth_user
GROUP BY
is_active
HAVING
COUNT(id) > 100
(User.objects
.annotate(year_joined=F('date_joined__year'))
.values('is_active')
.annotate(total=Count('id'))
.filter(total__gt=100))
The filter on the annotated field total
added an HAVING clause in the generated SQL.
How to Group by Distinct
For some aggregate functions such as COUNT
, it is sometimes desirable to only count distinct occurrences. For example, how many different last names are there per user active status:
SELECT
is_active,
COUNT(id) AS total,
COUNT(DISTINCT last_name) AS unique_names
FROM
auth_user
GROUP BY
is_active
(User.objects
.values('is_active')
.annotate(
total=Count('id'),
unique_names=Count('last_name', distinct=True),
))
Notice the use of distinct=True
in the call to Count
.
How to Create Expressions Using Aggregate Fields
Aggregate fields are often just the first step to a greater question. For example, what is the percent of unique last names by user active status:
SELECT
is_active,
COUNT(id) AS total,
COUNT(DISTINCT last_name) AS unique_names,
(COUNT(DISTINCT last_name)::float
/ COUNT(id)::float) AS pct_unique_names
FROM
auth_user
GROUP BY
is_active
from django.db.models import FloatField
from django.db.models.functions import Cast
(User.objects
.values('is_active')
.annotate(
total=Count('id'),
unique_names=Count('last_name', distinct=True),
)
.annotate(pct_unique_names=(
Cast('unique_names', FloatField())
/ Cast('total', FloatField())
))
The first annotate()
defines the aggregate fields. The second annotate()
uses the aggregate function to construct an expression.
How to Group By Across Relations
So far we've used only data in a single model, but aggregates are often used across relations. The simpler scenario is of a one-to-one or a foreign key relation. For example, say we have a UserProfile
with a one-to-one relationship to the User, and we want to count users by the type of profile:
SELECT
p.type,
COUNT(u.id) AS total
FROM
auth_user u
JOIN user_profile p ON u.id = p.user_id
GROUP BY
p.type
(User.objects
.values('user_profile__type')
.annotate(total=Count('id')))
Just like GROUP BY expressions, using relations in values
will group by that field. Note that the name of the user profile type in the result will be 'user_profile__type'.
How to Group By a Many to Many Relationship
A more complicated type of relation is the many to many relationship. For example, count in how many groups each user is a member:
SELECT
u.id,
COUNT(ug.group_id) AS memberships
FROM
auth_user
LEFT OUTER JOIN auth_user_groups ug ON (
u.id = ug.user_id
)
GROUP BY
u.id
(User.objects
.annotate(memberships=Count('groups'))
.values('id', 'memberships'))
A user can be a member of more than one group. To count the number of groups the user is member of we used the related name "groups" in the User
model. If the related name is not explicitly set (and not explicitly disabled), Django will automatically generate a name in the format {related model model}_set
. For example, group_set
.
Going Further
To dig deeper into the ORM and GROUP BY in particular, check out these links:
- How to use grouping sets in Django: An article about advanced group by techniques such as group by cube, group by rollup and group by grouping sets.
- How to Get the First or Last Value in a Group Using Group By in SQL: A neat little trick using arrays in PostgreSQL.
- 12 Common Mistakes and Missed Optimization Opportunities in SQL: Some SQL do's and dont's you need to know if you are working with data and writing SQL.
- Django Aggregation cheat-sheet page: How to do common aggregate queries.