How "Export to Excel" Almost Killed Our System

And How We Eventually Made It 4 Times Faster

A few weeks ago we had some trouble with an "Export to Excel" functionality in one of our systems. In the process of resolving this issue, we made some interesting discoveries and came up with original solutions.

This article is inspired by the actual issue we used to track this incident over a period of two days. We go through the process of identifying the problem, experimenting and benchmarking different solutions until eventually deploying to production.

These are the main takeaways described in this article:

  • Generating xlsx files can consume significant amount of resources.
  • Under some circumstances better performance can be gained by not using prefetch_related.
  • pyexcelerate is a fast package for creating simple Excel files.
  • tablib (and django-import-export) can be patched to use pyexcelerate and produce excel files faster.

Table of Contents

How a server must feel when asked to produce an Excel file
How a server must feel when asked to produce an Excel file

A few weeks ago we started getting complaints from users about slow response time from one of our systems. A quick glance at the server metrics showed higher than normal CPU usage. This system is mostly IO intensive, so high CPU usage is not something we experience regularly.

The first thing we did was to identify the worker process that is consuming high CPU using htop. After getting the process identifier (PID) of the process, we used py-spy to get a glance at what it's doing:

$ py-spy -p 8187 -r 1000

This command samples the process a 1000 times per second and provides a top-like view of the results:

Total Samples 17974
GIL: 0.00%, Active: 0.00%, Threads: 1

OwnTime  TotalTime  Function (filename:line)
0.000s    173.7s   get_response (django/core/handlers/
0.000s    173.7s   inner (django/core/handlers/
0.000s    173.7s   __call__ (django/utils/
0.000s    173.7s   __call__ (django/core/handlers/
0.000s    173.6s   view (django/views/generic/
0.000s    173.6s   _get_response (django/core/handlers/
0.000s    173.6s   dispatch (django/contrib/auth/
0.000s    173.6s   dispatch (django/contrib/auth/
0.000s    173.6s   dispatch (django/views/generic/
0.050s    173.6s   get (dashboard/views/
0.000s    94.69s   get_resource_to_export (dashboard/views/
0.000s    94.69s   export (dashboard/views/
0.000s    94.68s   export (dashboard/
0.000s    83.81s   __iter__ (django/db/models/
0.040s    82.73s   _fetch_all (django/db/models/
0.000s    78.84s   export (dashboard/views/
0.000s    70.58s   __iter__ (django/db/models/
0.000s    68.98s   execute_sql (django/db/models/sql/
68.81s    68.81s   _execute (django/db/backends/
0.000s    68.81s   _execute_with_wrappers (django/db/backends/
0.000s    68.81s   execute (django/db/backends/
0.000s    50.11s   save (tablib/packages/openpyxl3/
0.000s    50.11s   export_set (tablib/formats/
0.000s    46.41s   save (tablib/packages/openpyxl3/writer/
0.000s    46.41s   save_workbook (tablib/packages/openpyxl3/writer/
0.000s    42.40s   _fetch_all (django/db/models/
0.000s    42.40s   _prefetch_related_objects (django/db/models/
0.000s    42.38s   prefetch_related_objects (django/db/models/
0.000s    41.94s   prefetch_one_level (django/db/models/
0.000s    41.25s   get_prefetch_queryset (django/db/models/fields/
0.000s    32.30s   _write_worksheets (tablib/packages/openpyxl3/writer/

After monitoring this view for a minute or two, we had a few insights:

  1. A lot of time is spent fetching data.
  2. A lot of time is spent on some call to prefetch_related.
  3. The problem is in the dashboard, and more specifically in the view that exports data.

With these insights, we wanted to moved on to identify the exact view. We then turned to the nginx access log:

$ journalctl -u nginx -r | grep dashboard

We managed to identify several endpoints that were taking very long to execute. Some of them finished in just under 60 seconds, others were killed by PostgreSQL after hitting the statement_timeout limit and returned a 500 status code.

At this point we had a pretty good idea where the problem is, but we were still clueless as to why. The next step was to inspect the problematic code, and try to reproduce.

Exporting a QuerySet to Excel

The system is used to report and track violations in public transportation. During an inspection, the inspector documents different types of violations such as dirty bus, bus running late etc. The models for this system look roughly like this:

class ViolationType(models.Model):
    name = model.TextField()

class Inspection(models.Model):
    id = models.AutoField(primary_key=True)

class Violation(models.Model):
    inspection = models.ForeignKey(Inspection, on_delete=models.PROTECT)
    violation_type = models.ForeignKey(ViolationType, on_delete=models.PROTECT)
    comments = models.TextField()

Every once in a while, a back office user would download the inspection information to Excel for further analysis.

Users + Excel = Love
Users + Excel = Love

The report includes a lot of information about the inspection, but most importantly, it includes a list of the violation types for each inspection:

inspection, violations
1, dirty floors | full trash can
2, full trash can | no light | missing signs

Using django-import-export

To produce the Excel report we use a package called django-import-export. Using the package, we define a ModelResource that can produce an Excel file from a queryset:

from import_export import resources, fields, widgets

from .models import Inspection, Violation

class InspectionResource(resources.ModelResource):
    violations = fields.Field(
        widget=widgets.ManyToManyWidget(Violation, field='violation_type')

    class Meta:
        model = Inspection
        fields = (

The query produced by this ModelResource causes an N+1 queries issue, so before we ever deployed it to production we patched it and added prefetch_related:

from django.db.models import Prefetch
from import_export import resources, fields, widgets

from .models import Inspection, Violation

class InspectionResource(resources.ModelResource):
    violations = fields.Field(
        widget=widgets.ManyToManyWidget(Violation, field='violation_type')

    class Meta:
        model = Inspection
        fields = (

    def export(self, queryset=None):
        queryset = (
        return super().export(list(queryset))

    def dehydrate_violations(self, inspection: Inspection) -> str:
        return '\n'.join(
            for v in inspection.prefetched_violations

To use prefetch_related in a ModelResource we had to make the following changes:

  1. Override export and adjust the query to prefetch the violations using prefetch_related. We use the Prefetch object because we needed to customize the prefetch query, and add the violation type name from a related table.

  2. Evaluate the query and have the export function return a list instead of a queryset. django-import-export uses iterator to speed up the query. Using iterator(), the ORM uses a cursor to iterate over the data in chunks and reduce memory. While this is usually useful, Django is unable to use iterator() with prefetch_related.

  3. Add a custom dehydrate_ function for the violations field that will render a comma-delimited list of violation type names.

The resource was used by the view to produce the Excel report:

from django.http import HttpRequest, HttpResponse
from .models import Inspection
from .resources import InspectionResource

LIMIT = 10000

def export_to_excel(request: HttpRequest) -> HttpResponse:
    inspections = Inspection.objects.all()
    # Apply some filter on the queryset based on request
    data = InspectionResource().export(inspections[:LIMIT]).xlsx
    response = HttpResponse(data, content_type='text/xlsx')
    response['Content-Disposition'] = 'attachment; filename=export.xlsx'
    return response

The view takes a request, apply some filter on the inspections and produces the xlsx file using the ModelResource.

Finding the Best File Format

Before we can start improving the export process, we need to establish a baseline. To get the timings and identify the hot spots in the call stack we used cProfile. To identify and time query execution we turned SQL logging on in the Django settings:


    # ...
    'loggers': {
        'django.db.backends': {
            'level': 'DEBUG',
        # ...

The benchmark looked like this:

import cProfile

from .models import Inspection
from .resources import InspectionResource

qs = VehicleInspection.objects.all()[:10000]'resources.VehicleInspectionResource().export(qs).xlsx')

These were the results of exporting 10,000 rows in xlsx format using prefetch_related:

56830808 function calls (47142920 primitive calls) in 41.574 seconds
select    5.009
prefetch  8.009

56660555 function calls (47149065 primitive calls) in 39.927 seconds
select    2.356
prefetch  7.991

We ran the benchmark twice to make sure the results were not effected by caches. The function took 40s to complete, and only 10s of it (25%) were spent in the database.

At this point, we suspected that the problem might be in the file format. This assumption was supported by the application server's high CPU usage.

Next, we wanted to try the same benchmark, only instead of xlsx we produced a csv:'resources.VehicleInspectionResource().export(qs).csv')

These were the results of exporting 10,000 rows in csv format using prefetch_related:

9179705 function calls (9107672 primitive calls) in 17.429 seconds
select    1.970
prefetch  8.343

Wow! That's a big improvement. This confirmed our suspicion that the actual production of the xlsx was the problem.

Before we moved on, we wanted to check another file format that might be more useful to our users, the old xls format:'resources.VehicleInspectionResource().export(qs).xls')

These were the results of exporting 10,000 rows in xls format using prefetch_related:

16317592 function calls (15745704 primitive calls) in 20.694 seconds
select    1.922
prefetch  7.976

OK, so that's surprising. I'm not familiar with the internals of the Microsoft Office file formats, but it seems like the old format is only a little bit slower than the csv format, and much faster than the new xlsx format.

This benchmark results brought up an old dilemma. In the past we used to serve users with only csv files, but they complained a lot about troubles opening the files, and encoding and formatting issues. For this reason we decided to produce xlsx in the first place, so at that time, producing xls files seemed like the best solution.

I should already tell you, using the old xls format was a bad decision, but we didn't know that yet.

Improving the Query

After reducing the overall execution time by half, our next targets were the queries. Two queries are executed to produce the dataset for the export. Before any change is made, it took the "main" query ~2s and the prefetch ~8s to complete.

The "main" query looked like this:

    -- around 50 more fields from joined tables
    INNER JOIN "auth_user" ON ("inspection"."user_id" = "auth_user"."id")
    -- around 11 more joined tables

The resource used a lot of data from related tables, and the query joined ~12 tables and had many fields listed in the SELECT clause. The table is one of the main tables in the database so it is heavily indexed, and the lookup tables were relatively small so the query didn't take long to complete.

The prefetch query looked like this:

FROM "violation"
    INNER JOIN "violation_type" ON (
        "violation"."violation_type_id" = "violation_type"."id"
    "violation"."inspection_id" = ANY([
        2814, 9330, 8848, 8971, 9372, 9084, 78, 3896, 2609, 5177, 2866
        -- another 10,000 inspection IDs
        1399, 9348, 914, 8884, 9082, 3356, 2896, 742, 9432, 8926, 9153

This query seems innocent, but in fact, it took ~8s to complete. The execution plan of this query looked like this:

Nested Loop (cost=1000.28..2040346.39 rows=26741 width=181)
  ->  Gather (cost=1000.00..2032378.29 rows=26741 width=115)
        Workers Planned: 2
        ->  Parallel Seq Scan on violation  (cost=0.00..2028704.19 rows=11142 width=115)
              Filter: (vehicle_inspection_id = ANY ('{2814,9330,....,8926,9153}'::integer[]))
  ->  Index Scan using violationtype_pkey on violationtype  (cost=0.28..0.30 rows=1 width=66)
        Index Cond: (id = violation.violation_type_id)

I trimmed the execution plan for brevity, but the Filter line was three or four pages long, filled with IDs. This got us thinking, is it possible that this huge ANY filter is what's causing us trouble?

To answer this question we decided to try and implement the query without prefetch_related. Instead, we decided to use the new Subquery expression.

Using Subquery the query using the ORM looked like that:

from django.db.models import OuterRef, Subquery, Value
from django.contrib.postgres.aggregates import ArrayAgg

inspections = inspections.annotate(violations_csv=Subquery(
    # Reference the inspection ID of the outer table, inspection.
    # Prevent Django from adding a group by column.
    # Construct an array of violation names.
    .annotate(violations=ArrayAgg('violation_type__name', distinct=True))

If you never experimented with Subquery there is a lot to take in here. Before we break it down, this is what the query looks like:

            ARRAY_AGG(DISTINCT U2."name") AS "violations"
            "violation" U0
            INNER JOIN "violationtype" U2 ON (U0."violation_type_id" = U2."id")
            U0."inspection_id" = ("inspection"."id")

    ) AS "violations_csv",
    -- around 50 more fields from joined tables
    INNER JOIN "auth_user" ON ("inspection"."user_id" = "auth_user"."id")
    -- around 11 more joined tables

Now, let's break it down:

  • Subquery is a query expression that can only exist inside another query. In this case, the outer query is inspection.
  • Subquery in used in annotate so the result of the subquery is stored in a another column for each row.
  • We added a dummy annotation to prevent Django from grouping the results. The subquery is executed for each inspection, this is what the filter using OuterRef does. For this reason, we don't need to group by any other column.
  • The subquery must return at most one row, so we group the names into an array using ARRAY_AGG.

After all this hard work, we were keen to see if this is the silver bullet we were waiting for, but in fact, when we executed this on 10,000 rows it choked. To see it through, we executed the export function with only 1,000 rows.

These were the results of exporting 1,000 rows in xls format using subquery:

1571053 function calls (1514505 primitive calls) in 60.962 seconds
select 59.917

The query is now crazy slow. I won't paste the execution plan because there were so many other tables, but PostgreSQL used a nested loop join on the top level of the query to produce the value for this field. Surprisingly, the database did a significantly worse job than the ORM did in this case.

Using an Iterator

Before we completely abandoned this solution, we wanted to check one last thing. We previously mentioned that django-import-export is using iterator() to create a cursor over the results. We also mentioned that using prefetch_related prevents us from using iterator(). Well, we no longer use prefetch_related so we might as well check if using iterator() makes any difference.

These were the results of exporting 1,000 rows in xls format using subquery and iterator:

1571580 function calls (1514788 primitive calls) in 62.130 seconds
select 60.618

The iterator made no difference.

Simplifying the Query

In a final attempt to get something out of this expedition, we wanted to see if the complexity of the query prevented PostgreSQL from finding an optimal execution plan. To do that, we could have adjusted the database parameters from_collapse_limit and join_collapse_limit and let PostgreSQL take all the time and resources it needs to find an optimal execution plan, but instead, we decided to strip all other fields from the resources besides id and violations.

These were the results of exporting 1,000 rows containing only the id and violations fields in xls format using subquery and iterator:

6937 function calls (6350 primitive calls) in 57.280 seconds
select  57.255

No change, this is officially a dead end!

Manual Prefetch

After a quick lunch break we decided it's time pull out the big guns. If Django's prefetch implementation wasn't working for us, and PostgreSQL was unable to produce a decent execution plan, we will just have to do it ourselves.

To implement our own "prefetch" we needed to adjust some of the other functions in the resource:

from import_export import resources, fields

from .models import Inspection, Violation

class InspectionResource(resources.ModelResource):
    violations = fields.Field()

    class Meta:
        model = Inspection
        fields = (

    def export(self, queryset=None):
        # Manually prefetch the violations.
        self.prefetched_violations = dict(
                # Clean all joins.

        return super().export(queryset)

    def dehydrate_violations(self, inspection: Inspection) -> str:
        return '\n'.join(self.prefetched_violations.get(, []))

This looks like a lot, but it's actually not:

  1. We create our own "prefetch related" dict prefetched_violations:

    • The key is the violation ID, and the value is an array containing the violation names (violations_csv).
    • To fetch only relevant violations, we use filter using queryset to filter only the necessary inspections.
    • We executed select_related(None) to remove all previously set select_related tables, and make the ORM remove any unnecessary joins.
  2. We return the original queryset to the export function which produces the Excel file.

  3. To construct the value for the violations field, we use the prefetched_violations we populated during export. This is the "lookup" part of the prefetch. While using Django's prefetch_related we have access to this value on the instance, when we do it manually we have to look it up ourselves.

  4. Once again, since we no longer use Django's prefetch_related we were able to use an iterator. So, instead of evaluating the query we return a queryset.

We already got disappointed after putting in a lot of effort the last time, let's see if this time the hard work paid off.

These were the results of exporting 10,000 rows in xls format using manual prefetch and iterator:

15281887 function calls (14721333 primitive calls) in 11.411 seconds
select  0.833
manual prefetch 0.107

Compared to the 40 seconds we started with, this is an overall 75% improvement. 20s were reduced by switching to xls format, another 10s were from manually doing the prefetch.

We are ready for production!

Trouble in Paradise

Quickly after rolling out the new version to production we started getting complaints from users not being able to open the file.

Remember I told you using xls was a bad idea? Well, when users started downloading the xls files they got a nasty message saying the file is corrupt, and excel, thank god, managed to salvage some of the data (which is way worse!).

One might ask, "but how come you didn't catch this in QA?". Well, that's just another reason we hate working with Excel. When we tested it locally on our Linux desktops using LibreOffice, it worked just fine.

"But it works on my machine!"
"But it works on my machine!"

So let's recap:

  • xlsx is slow and consumes a lot of CPU.
  • xls is not supported by the excel version used by our users.
  • csv has many encoding and formatting issues, and proved to be unusable in the past.

Using a Different Excel Writer

As always, when all options suck and the future is looking bleak, we turned to Google.

A quick search of "python excel performance" brought up this gist which compares 4 different Excel writers in Python (gotta love the internet!).

These are the benchmark results:

# Source:

    python      : 2.7.2
    openpyxl    : 2.2.1
    pyexcelerate: 0.6.6
    xlsxwriter  : 0.7.2
    xlwt        : 1.0.0

    Rows = 10000
    Cols = 50

    pyexcelerate          :  10.63
    xlwt                  :  16.93
    xlsxwriter (optimised):  20.37
    xlsxwriter            :  24.24
    openpyxl   (optimised):  26.63
    openpyxl              :  35.75

According to the results, there is a big difference between the xlsx libraries.

As mentioned before, we use django-import-export to produce excel files from Django models and querysets. Under the hood, django-import-export is using the popular tablib package to do the actual export.

Tablib offers export and import capabilities to and from many formats, but it doesn't do any of the heavy lifting itself. To produce xlsx files, tablib is using the package openpyxl.

A Faster Excel Writer in Python

Looking back at the benchmark results, openpyxl is the slowest among all packages. It looks like by switching to the fastest implementation, pyexcelerate we might be able to gain some significant improvement for this export process.

The package pyexcelerate looked great from start. The tag line is just what we needed:

PyExcelerate is a Python for writing Excel-compatible XLSX spreadsheet files, with an emphasis on speed.

Even the snarky subtitles on the "Usage" section in the README were just what we wanted: fast, faster and fastest!

With such promising benchmarks and README, we had to try it out!

Patching tablib

We already have an entire system built on top of django-import-export and tablib, and we didn't want to start making changes everywhere. So instead, we looked for a way to patch tablib, and make it use pyexcelerate instead of openpyxl.

After some digging, we found that tablib uses an internal function called _register_formats to add export and import formats such as csv, xls and xlsx. To get a list of available formats, tablib imports a collection called available from the module formats. The contents of the file formats/ where the collection is defined, look like this:

# -*- coding: utf-8 -*-

""" Tablib - formats

from . import _csv as csv
from . import _json as json
from . import _xls as xls
from . import _yaml as yaml
from . import _tsv as tsv
from . import _html as html
from . import _xlsx as xlsx
from . import _ods as ods
from . import _dbf as dbf
from . import _latex as latex
from . import _df as df
from . import _rst as rst
from . import _jira as jira

available = (json, xls, yaml, csv, dbf, tsv, html, jira, latex, xlsx, ods, df, rst)

The interesting part is the contents of the file The file defines some functions to export and import from Excel using openpyxl.

To patch tablib, we first need to implement a similar interface to the one in using pyexcelerate, and then register it in tablib.

Let's start with implementing using pyexcelerate:

import itertools
from io import BytesIO

from tablib.formats._xlsx import *  # noqa
from pyexcelerate import Workbook

# Override the default xlsx implementation
title = 'xlsx'

def export_set(dataset, freeze_panes=True):
    """Returns XLSX representation of Dataset."""
    title = dataset.title or 'Sheet1'
    wb = Workbook()
    wb.new_sheet(title, data=itertools.chain([dataset.headers], dataset))
    stream = BytesIO()
    return stream.getvalue()

def export_book(databook, freeze_panes=True):
    """Returns XLSX representation of DataBook."""
    assert len(databook._datasets) == 1
    return export_set(databook._datasets[0], freeze_panes)

def dset_sheet(*args, **kwargs):
    assert False, 'How did you get here?'

This is a simple implementation of the main functions. It lacks some functionalities such a multiple sheets, but it was fine for our needs.

Next, we need to make tablib register this file instead of the existing xlsx format. To do that, we created a new file called

import tablib

from . import fast_xlsx

# Override default xlsx format with a faster implementation
# using `pyexcelerate` (export only).
tablib.formats.available += (fast_xlsx, )

To apply the patch to tablib, we import our implementation and add it to the available formats list. We then import this file in the module's so every time the system starts up, tablib is patched.

Now for the moment of truth, did all this hard work finally paid off?

These were the results of exporting 10,000 rows in xlsx format with pyexcelerate using manual prefetch and iterator:

13627507 function calls (13566956 primitive calls) in 10.944 seconds
select 0.137
manual prefetch 2.219

The hard work definitely paid off! Just so we have an honest comparison, these are the results of exporting 10,000 rows in xlsx format without patching tablib using manual prefetch and iterator:

55982358 function calls (46155371 primitive calls) in 29.965 seconds
select 0.137
manual prefetch 1.724

That's a 64% improvement compared to the default implementation provided by tablib, and a 75% improvements compared to the 40s we started with.

Results Summary

This a summary of all the results mentioned in the article:

Time Rows Format Method
39.927s 10,000 xlsx prefetch_related (Django)
17.429s 10,000 csv prefetch_related (Django)
20.694s 10,000 xls prefetch_related (Django)
60.962 1,000 xls subquery
62.130 1,000 xls subquery and iterator
57.280s 1,000 xls simplified query, subquery and iterator
29.965s 10,000 xlsx default tablib implementation, manual prefetch and iterator
11.411s 10,000 xls using manual prefetch and iterator
10.944s 10,000 xlsx using pyexcelerate, manual prefetch and iterator


We try to study every incident and take actions to prevent similar incidents from happening in the future. During this incident, some of our users did experience slowness for a short period of time, however, the "Export to Excel" functionality did not really killed our app.

Following this incident, there are a few open questions we haven't had the chance to fully explore yet:

  • Why was the prefetch query so slow? The difference boils down to executing Model.objects.filter(fk__in = [1,2,3,4....9,999, 10,000]) vs executing Model.objects.filter(fk__in = OtherModel.objects.filter( ... ).values_list('pk')). When we tried to compare the two in the database, we found no difference, but the built-in prefetch_related was significantly slower. Is it possible that time is being spent generating the query in Python?

  • Can openpyxl3 performance be improved? When I talked to John, the author of the Excel writers benchmark, he mentioned that openpyxl3 can be faster if lxml is installed.

  • Is xlsx really the best format? Can we eliminate some of the problems we had with csv by switching to a different textual format such as tsv?

If you have the answer to any of these questions feel free to share them with me and i'll be happy to post the response.

UPDATED: Aug 19, 2019

Comments From Readers

A reader from ran a quick benchmark to check how faster openpyxl can get using lxml. These were his results:

python: 3.6.8
    Cols = 50
    Sheets = 1
    Proportion text = 0.10
optimised = True

Rows = 10000
openpyxl: 2.6.3 using LXML True:   3.70
openpyxl: 2.6.3 using LXML False:   6.03

Rows = 1000
openpyxl: 2.6.3 using LXML True:   0.37
openpyxl: 2.6.3 using LXML False:   0.57

This benchmark shows that openpyxl can be made almost twice as fast just by installing lxml. However, pyexcelerate improved the speed by a factor of 3.

Many reader on Reddit and Lobsters suggested that a better approach would be to generate the Excel file on the client side using Javascript. This is definitely something worth considering when designing a new system, even thought I think this approach might be problematic for very large files.

Similar articles