<h1>Fastest Way to Read Excel in Python</h1>
<p>Haki Benita, 2024-01-03, <a href="https://hakibenita.com/">hakibenita.com</a></p><hr>
<p>I don't have any data to support this next claim, but I'm fairly sure that Excel is the most common way to store, manipulate, and yes(!), even pass data around. This is why it's not uncommon to find yourself reading Excel in Python. I recently needed to, so I tested and benchmarked several ways of reading Excel files in Python.</p>
<p><strong>In this article I compare several ways to read Excel from Python.</strong></p>
<div class="dark--invert">
<figure><img alt="image by abstrakt design" src="https://hakibenita.com/images/00-fast-excel-python.png"><figcaption><small>image by <a href="https://www.abstrakt.design">abstrakt design</a></small></figcaption>
</figure>
</div>
<p><details class="toc-container" open>
<summary>Table of Contents</summary></p>
<div class="toc">
<ul>
<li><a href="#what-are-we-testing">What are we Testing?</a><ul>
<li><a href="#speed">Speed</a></li>
<li><a href="#types">Types</a></li>
<li><a href="#correctness">Correctness</a></li>
</ul>
</li>
<li><a href="#reading-excel-in-python">Reading Excel in Python</a><ul>
<li><a href="#reading-excel-using-pandas">Reading Excel using Pandas</a></li>
<li><a href="#reading-excel-using-tablib">Reading Excel using Tablib</a></li>
<li><a href="#reading-excel-using-openpyxl">Reading Excel using Openpyxl</a></li>
<li><a href="#reading-excel-using-libreoffice">Reading Excel using LibreOffice</a></li>
<li><a href="#reading-excel-using-duckdb">Reading Excel using DuckDB</a></li>
<li><a href="#reading-excel-using-calamine">Reading Excel using Calamine</a></li>
</ul>
</li>
<li><a href="#results-summary">Results Summary</a></li>
</ul>
</div>
<p></details></p>
<hr>
<h2 id="what-are-we-testing"><a class="toclink" href="#what-are-we-testing">What are we Testing?</a></h2>
<p>To compare ways to read Excel files with Python, we first need to establish what to measure, and how.</p>
<p>We start by creating a 25MB Excel file containing 500K rows with various column types:</p>
<figure><img alt="Excel file" src="https://hakibenita.com/images/01-excel-file.png"><figcaption>Excel file</figcaption>
</figure>
<p>Excel supports both the xls and the xlsx file formats. We'll use the newer xlsx format.</p>
<p>For the benchmarks, we'll implement functions to import data from Excel and return an <code>Iterator</code> of dicts:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">iter_excel</span><span class="p">(</span><span class="n">file</span><span class="p">:</span> <span class="n">IO</span><span class="p">[</span><span class="nb">bytes</span><span class="p">])</span> <span class="o">-></span> <span class="n">Iterator</span><span class="p">[</span><span class="nb">dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">object</span><span class="p">]]:</span>
    <span class="c1"># TODO...</span>
</pre></div>
<p>We return an <code>Iterator</code> to allow consumers to process the file row by row. This can potentially <a href="/python-django-optimizing-excel-export">reduce the memory footprint</a> by not storing the entire file in memory as we process it. As we'll see in the benchmarks, this is not always possible.</p>
<p>To produce a "clean" timing we iterate the generator without actually doing any processing:</p>
<div class="highlight"><pre><span></span><span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">iter_excel</span><span class="p">(</span><span class="n">file</span><span class="p">):</span>
    <span class="k">pass</span>
</pre></div>
<p>This will cause the generator to fully evaluate with minimal performance or memory overhead.</p>
<h3 id="speed"><a class="toclink" href="#speed">Speed</a></h3>
<p>The most obvious thing to measure is time, and the most accurate way to measure time in Python for performance purposes is using <a href="https://docs.python.org/3/library/time.html#time.perf_counter" rel="noopener"><code>time.perf_counter</code></a>:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">time</span>
<span class="n">start</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span>
<span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">iter_excel</span><span class="p">(</span><span class="n">file</span><span class="p">):</span> <span class="k">pass</span>
<span class="n">elapsed</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span> <span class="o">-</span> <span class="n">start</span>
</pre></div>
<p>We start the timer, iterate the entire generator and calculate the elapsed time.</p>
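<p>The timing pattern above can be wrapped in a small reusable helper. This is only a sketch: <code>time_iteration</code> and <code>fake_rows</code> are hypothetical names that do not appear in the benchmark code, and counting rows adds a small overhead that the bare <code>pass</code> loop does not have.</p>

```python
import time
from typing import Callable, Iterable, Iterator


def time_iteration(make_rows: Callable[[], Iterable[object]]) -> tuple[float, int]:
    """Fully consume the iterable returned by make_rows.

    Returns the elapsed time in seconds and the number of rows seen.
    """
    start = time.perf_counter()
    count = 0
    for _ in make_rows():
        count += 1
    elapsed = time.perf_counter() - start
    return elapsed, count


def fake_rows() -> Iterator[dict[str, object]]:
    # Hypothetical stand-in for iter_excel(file), so the sketch runs without a file.
    for i in range(1000):
        yield {'number': i}


elapsed, count = time_iteration(fake_rows)
print(f'elapsed {elapsed} rows {count}')
```

<p>Passing a callable rather than a generator keeps the creation of the generator inside the timed region, which matters for readers that do most of their work up front.</p>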
<h3 id="types"><a class="toclink" href="#types">Types</a></h3>
<p>Some formats such as <a href="https://parquet.apache.org/" rel="noopener">parquet</a> and <a href="https://avro.apache.org/" rel="noopener">avro</a> are known for being self-describing, keeping the schema inside the file, while other formats such as CSV are notorious for not keeping any information about the data they store.</p>
<p>Excel can be seen as a format that does store type information about its content - there are Date cells, Number cells, Decimal cells and others, so when loading data from Excel, it can be useful to receive the data in its intended type. This is especially useful for types such as date, where the format may be unclear or unknown, or strings that hold digits such as phone numbers or zipcodes. In these situations, trying to sniff the type can cause incorrect results (due to trimming leading zeros, assuming incorrect format and so on).</p>
<p>To be fair, some may argue that when loading data into your system you should have knowledge about its schema, so preserving types may not be a strict requirement for some.</p>
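<p>To make the risk of type sniffing concrete, here is a minimal sketch using only the standard library. The values <code>'01234'</code> and <code>'01/02/2000'</code> are made-up examples of what a reader would see if the cells arrived as plain text:</p>

```python
import datetime

# Made-up raw values, as they would arrive from a format with no type information.
raw_zipcode = '01234'
raw_date = '01/02/2000'

# Guessing "this looks like a number" silently drops the leading zero.
sniffed_zipcode = int(raw_zipcode)
print(sniffed_zipcode)  # 1234

# The same text is a valid date under two different formats.
as_day_first = datetime.datetime.strptime(raw_date, '%d/%m/%Y').date()
as_month_first = datetime.datetime.strptime(raw_date, '%m/%d/%Y').date()
print(as_day_first, as_month_first)
```

<p>Both date interpretations parse without error, so nothing in the data itself tells you which one was intended.</p>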
<h3 id="correctness"><a class="toclink" href="#correctness">Correctness</a></h3>
<p>To test the correctness of the import process, we include a control row at the beginning of the Excel file. We'll use the control row as reference to make sure the data is imported correctly:</p>
<div class="highlight"><pre><span></span><span class="c1"># Test correctness of imported data using a control row</span>
<span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">expected_value</span> <span class="ow">in</span> <span class="p">(</span>
    <span class="p">(</span><span class="s1">'number'</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
    <span class="p">(</span><span class="s1">'decimal'</span><span class="p">,</span> <span class="mf">1.1</span><span class="p">),</span>
    <span class="p">(</span><span class="s1">'date'</span><span class="p">,</span> <span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">(</span><span class="mi">2000</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)),</span>
    <span class="p">(</span><span class="s1">'boolean'</span><span class="p">,</span> <span class="kc">True</span><span class="p">),</span>
    <span class="p">(</span><span class="s1">'text'</span><span class="p">,</span> <span class="s1">'CONTROL ROW'</span><span class="p">),</span>
<span class="p">):</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">value</span> <span class="o">=</span> <span class="n">control_row</span><span class="p">[</span><span class="n">key</span><span class="p">]</span>
    <span class="k">except</span> <span class="ne">KeyError</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'🔴 "</span><span class="si">{</span><span class="n">key</span><span class="si">}</span><span class="s1">" missing'</span><span class="p">)</span>
        <span class="k">continue</span>
    <span class="k">if</span> <span class="nb">type</span><span class="p">(</span><span class="n">expected_value</span><span class="p">)</span> <span class="o">!=</span> <span class="nb">type</span><span class="p">(</span><span class="n">value</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'🔴 "</span><span class="si">{</span><span class="n">key</span><span class="si">}</span><span class="s1">" expected type "</span><span class="si">{</span><span class="nb">type</span><span class="p">(</span><span class="n">expected_value</span><span class="p">)</span><span class="si">}</span><span class="s1">" received type "</span><span class="si">{</span><span class="nb">type</span><span class="p">(</span><span class="n">value</span><span class="p">)</span><span class="si">}</span><span class="s1">"'</span><span class="p">)</span>
    <span class="k">elif</span> <span class="n">expected_value</span> <span class="o">!=</span> <span class="n">value</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'🔴 "</span><span class="si">{</span><span class="n">key</span><span class="si">}</span><span class="s1">" expected value "</span><span class="si">{</span><span class="n">expected_value</span><span class="si">}</span><span class="s1">" received "</span><span class="si">{</span><span class="n">value</span><span class="si">}</span><span class="s1">"'</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'🟢 "</span><span class="si">{</span><span class="n">key</span><span class="si">}</span><span class="s1">"'</span><span class="p">)</span>
</pre></div>
<p>We'll run this test after each benchmark to make sure that all of the expected keys exist in the control row, and that the types and values are as we expect.</p>
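<p>The test compares types before values for a reason: in Python, values of different types can still compare equal. A short sketch of the cases this guards against, where <code>same_type_and_value</code> is a hypothetical helper written only for illustration:</p>

```python
def same_type_and_value(expected: object, value: object) -> bool:
    """Strict comparison: the types must match exactly, then the values."""
    return type(expected) == type(value) and expected == value


# Plain equality treats these pairs as equal...
print(True == 1)   # True
print(1 == 1.0)    # True

# ...while the strict comparison tells them apart.
print(same_type_and_value(True, 1))    # False
print(same_type_and_value(1, 1.0))     # False
print(same_type_and_value(1.1, 1.1))   # True
```

<p>Without the type check, a reader that returned <code>1</code> for the boolean column or <code>1.0</code> for the number column would pass unnoticed.</p>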
<hr>
<h2 id="reading-excel-in-python"><a class="toclink" href="#reading-excel-in-python">Reading Excel in Python</a></h2>
<p>We now have a sample file, a way to test the contents and we've defined what to measure - we are ready to import some data!</p>
<h3 id="reading-excel-using-pandas"><a class="toclink" href="#reading-excel-using-pandas">Reading Excel using Pandas</a></h3>
<p><a href="https://pandas.pydata.org/" rel="noopener">Pandas</a>, the data analysis library for Python, is the go-to for just about anything related to data in Python, so it's a good place to start.</p>
<p>Read an Excel file using <code>pandas</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pandas</span>
<span class="k">def</span> <span class="nf">iter_excel_pandas</span><span class="p">(</span><span class="n">file</span><span class="p">:</span> <span class="n">IO</span><span class="p">[</span><span class="nb">bytes</span><span class="p">])</span> <span class="o">-></span> <span class="n">Iterator</span><span class="p">[</span><span class="nb">dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">object</span><span class="p">]]:</span>
    <span class="k">yield from</span> <span class="n">pandas</span><span class="o">.</span><span class="n">read_excel</span><span class="p">(</span><span class="n">file</span><span class="p">)</span><span class="o">.</span><span class="n">to_dict</span><span class="p">(</span><span class="s1">'records'</span><span class="p">)</span>
</pre></div>
<p>Just two commands chained together to get a list of dictionaries from an Excel file. This is what a single row from the result looks like:</p>
<div class="highlight"><pre><span></span><span class="o">>>></span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'file.xlsx'</span><span class="p">,</span> <span class="s1">'rb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="o">...</span>     <span class="n">rows</span> <span class="o">=</span> <span class="n">iter_excel_pandas</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
<span class="o">...</span>     <span class="n">row</span> <span class="o">=</span> <span class="nb">next</span><span class="p">(</span><span class="n">rows</span><span class="p">)</span>
<span class="o">...</span>     <span class="nb">print</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
<span class="o">...</span>
<span class="p">{</span><span class="s1">'boolean'</span><span class="p">:</span> <span class="kc">True</span><span class="p">,</span>
<span class="s1">'date'</span><span class="p">:</span> <span class="n">Timestamp</span><span class="p">(</span><span class="s1">'2000-01-01 00:00:00'</span><span class="p">),</span>
<span class="s1">'decimal'</span><span class="p">:</span> <span class="mf">1.1</span><span class="p">,</span>
<span class="s1">'number'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="s1">'text'</span><span class="p">:</span> <span class="s1">'CONTROL ROW'</span><span class="p">}</span>
</pre></div>
<p>At a quick glance we can see the date is not a <code>datetime.date</code> but a pandas <code>Timestamp</code>. The rest looks OK. If the <code>Timestamp</code> is an issue and you insist on <code>datetime.date</code>, you can provide a converter function to <code>read_excel</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pandas</span>
<span class="k">def</span> <span class="nf">iter_excel_pandas</span><span class="p">(</span><span class="n">file</span><span class="p">:</span> <span class="n">IO</span><span class="p">[</span><span class="nb">bytes</span><span class="p">])</span> <span class="o">-></span> <span class="n">Iterator</span><span class="p">[</span><span class="nb">dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">object</span><span class="p">]]:</span>
    <span class="k">yield from</span> <span class="n">pandas</span><span class="o">.</span><span class="n">read_excel</span><span class="p">(</span><span class="n">file</span><span class="p">,</span> <span class="n">converters</span><span class="o">=</span><span class="p">{</span>
<span class="hll">        <span class="s1">'date'</span><span class="p">:</span> <span class="k">lambda</span> <span class="n">ts</span><span class="p">:</span> <span class="n">ts</span><span class="o">.</span><span class="n">date</span><span class="p">(),</span>
</span>    <span class="p">})</span><span class="o">.</span><span class="n">to_dict</span><span class="p">(</span><span class="s1">'records'</span><span class="p">)</span>
</pre></div>
<p>The converter accepts a pandas <code>Timestamp</code> and converts it to a <code>datetime.date</code>. This is the control row with the custom converter:</p>
<div class="highlight"><pre><span></span><span class="p">{</span>
    <span class="s1">'number'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
    <span class="s1">'decimal'</span><span class="p">:</span> <span class="mf">1.1</span><span class="p">,</span>
<span class="hll">    <span class="s1">'date'</span><span class="p">:</span> <span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">(</span><span class="mi">2000</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
</span>    <span class="s1">'boolean'</span><span class="p">:</span> <span class="kc">True</span><span class="p">,</span>
    <span class="s1">'text'</span><span class="p">:</span> <span class="s1">'CONTROL ROW'</span><span class="p">,</span>
<span class="p">}</span>
</pre></div>
<p>If you use <code>pandas</code> to read data from Excel it's not unreasonable to assume you also want to continue your analysis with pandas, so we'll accept the <code>Timestamp</code> as a valid type for our benchmark.</p>
<p>Next, run the benchmark on the large Excel file:</p>
<div class="highlight"><pre><span></span>iter_excel_pandas
elapsed 32.98058952600695
🟢 "number"
🟢 "decimal"
🔴 "date" expected type "&lt;class 'datetime.date'&gt;" received type "&lt;class 'pandas._libs.tslibs.timestamps.Timestamp'&gt;"
🟢 "boolean"
🟢 "text"
</pre></div>
<p>The import took ~32s to complete. The type of the date field is a pandas <code>Timestamp</code> and not <code>datetime.date</code>, but that's OK.</p>
<h3 id="reading-excel-using-tablib"><a class="toclink" href="#reading-excel-using-tablib">Reading Excel using Tablib</a></h3>
<p><a href="https://tablib.readthedocs.io/en/stable/" rel="noopener">Tablib</a> is one of the most popular libraries in Python for importing and exporting data in various formats. It was originally developed by the creator of the popular <code>requests</code> library, and is therefore characterized by a similar focus on developer experience and ergonomics.</p>
<p>To install Tablib, execute the following command:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>tablib
</pre></div>
<p>Read an Excel file using <code>tablib</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">tablib</span>
<span class="k">def</span> <span class="nf">iter_excel_tablib</span><span class="p">(</span><span class="n">file</span><span class="p">:</span> <span class="n">IO</span><span class="p">[</span><span class="nb">bytes</span><span class="p">])</span> <span class="o">-></span> <span class="n">Iterator</span><span class="p">[</span><span class="nb">dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">object</span><span class="p">]]:</span>
    <span class="k">yield from</span> <span class="n">tablib</span><span class="o">.</span><span class="n">Dataset</span><span class="p">()</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">file</span><span class="p">)</span><span class="o">.</span><span class="n">dict</span>
</pre></div>
<p>Just a single line of code and the library does all the heavy lifting.</p>
<p>Before we go on to execute the benchmark, this is what the first row of the results looks like:</p>
<div class="highlight"><pre><span></span><span class="o">>>></span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'file.xlsx'</span><span class="p">,</span> <span class="s1">'rb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="o">...</span>     <span class="n">rows</span> <span class="o">=</span> <span class="n">iter_excel_tablib</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
<span class="o">...</span>     <span class="n">row</span> <span class="o">=</span> <span class="nb">next</span><span class="p">(</span><span class="n">rows</span><span class="p">)</span>
<span class="o">...</span>     <span class="nb">print</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
<span class="o">...</span>
<span class="n">OrderedDict</span><span class="p">([(</span><span class="s1">'number'</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
<span class="p">(</span><span class="s1">'decimal'</span><span class="p">,</span> <span class="mf">1.1</span><span class="p">),</span>
<span class="p">(</span><span class="s1">'date'</span><span class="p">,</span> <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="p">(</span><span class="mi">2000</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">)),</span>
<span class="p">(</span><span class="s1">'boolean'</span><span class="p">,</span> <span class="kc">True</span><span class="p">),</span>
<span class="p">(</span><span class="s1">'text'</span><span class="p">,</span> <span class="s1">'CONTROL ROW'</span><span class="p">)])</span>
</pre></div>
<p><a href="https://docs.python.org/3/library/collections.html#collections.OrderedDict" rel="noopener"><code>OrderedDict</code></a> is a subclass of the Python <code>dict</code> with some additional methods for rearranging the order of its keys. It's defined in the <a href="https://docs.python.org/3/library/collections.html" rel="noopener"><code>collections</code> module</a> of the standard library, and it is what tablib returns when you ask for a dict. Since <code>OrderedDict</code> is a subclass of <code>dict</code> and ships with Python, we consider it just fine for our purposes.</p>
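<p>A quick way to convince yourself that an <code>OrderedDict</code> can stand in for a plain <code>dict</code> is to try it with only the standard library. The row below is a shortened, hand-written copy of the control row, not actual tablib output:</p>

```python
from collections import OrderedDict

# Hand-written stand-in for a row returned by tablib.
row = OrderedDict([('number', 1), ('decimal', 1.1), ('text', 'CONTROL ROW')])

print(isinstance(row, dict))                                        # True
print(row == {'number': 1, 'decimal': 1.1, 'text': 'CONTROL ROW'})  # True
print(row['text'])                                                  # CONTROL ROW

# One of the extra methods for rearranging key order.
row.move_to_end('number')
print(list(row))  # ['decimal', 'text', 'number']

# If a consumer insists on the exact dict type, conversion is one call away.
plain = dict(row)
print(type(plain) is dict)  # True
```

<p>Note that the strict <code>type()</code> comparison in the correctness test is applied to the values in the row, not to the row itself, so the container type does not affect the results.</p>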
<p>Now for the benchmark on the large Excel file:</p>
<div class="highlight"><pre><span></span>iter_excel_tablib
elapsed 28.526969947852194
🟢 "number"
🟢 "decimal"
🔴 "date" expected type "&lt;class 'datetime.date'&gt;" received type "&lt;class 'datetime.datetime'&gt;"
🟢 "boolean"
🟢 "text"
</pre></div>
<p>Import using tablib took 28s, faster than pandas (32s). The date cell was returned as a <code>datetime.datetime</code> instead of a <code>datetime.date</code>, not unreasonable.</p>
<p>Let's see if we can bring this timing down even further.</p>
<h3 id="reading-excel-using-openpyxl"><a class="toclink" href="#reading-excel-using-openpyxl">Reading Excel using Openpyxl</a></h3>
<p><a href="https://openpyxl.readthedocs.io/en/stable/" rel="noopener">Openpyxl</a> is a library for reading and writing Excel files in Python. Unlike Tablib, Openpyxl is dedicated just to Excel and does not support any other file types. In fact, both <a href="https://github.com/jazzband/tablib/blob/master/src/tablib/formats/_xlsx.py" rel="noopener">tablib</a> and <a href="https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html#:~:text=%E2%80%9Copenpyxl%E2%80%9D%20supports%20newer%20Excel%20file%20formats" rel="noopener">pandas</a> use Openpyxl under the hood when reading xlsx files. Perhaps this specialization will result in better performance.</p>
<p>To install <code>openpyxl</code>, execute the following command:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>openpyxl
</pre></div>
<p>Read an Excel file using <code>openpyxl</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">openpyxl</span>
<span class="k">def</span> <span class="nf">iter_excel_openpyxl</span><span class="p">(</span><span class="n">file</span><span class="p">:</span> <span class="n">IO</span><span class="p">[</span><span class="nb">bytes</span><span class="p">])</span> <span class="o">-></span> <span class="n">Iterator</span><span class="p">[</span><span class="nb">dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">object</span><span class="p">]]:</span>
    <span class="n">workbook</span> <span class="o">=</span> <span class="n">openpyxl</span><span class="o">.</span><span class="n">load_workbook</span><span class="p">(</span><span class="n">file</span><span class="p">)</span>
    <span class="n">rows</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span><span class="o">.</span><span class="n">rows</span>
    <span class="n">headers</span> <span class="o">=</span> <span class="p">[</span><span class="nb">str</span><span class="p">(</span><span class="n">cell</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="k">for</span> <span class="n">cell</span> <span class="ow">in</span> <span class="nb">next</span><span class="p">(</span><span class="n">rows</span><span class="p">)]</span>
    <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">rows</span><span class="p">:</span>
        <span class="k">yield</span> <span class="nb">dict</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">headers</span><span class="p">,</span> <span class="p">(</span><span class="n">cell</span><span class="o">.</span><span class="n">value</span> <span class="k">for</span> <span class="n">cell</span> <span class="ow">in</span> <span class="n">row</span><span class="p">)))</span>
</pre></div>
<p>This time we have to write a bit more code, so let's break it down:</p>
<ol>
<li>
<p><strong>Load a workbook from the open file</strong>: The function <code>load_workbook</code> supports both a file path and a readable stream. In our case we operate on an open file.</p>
</li>
<li>
<p><strong>Get the active sheet</strong>: An Excel file can contain multiple sheets and we can choose which one to read. In our case we only have one sheet.</p>
</li>
<li>
<p><strong>Construct a list of headers</strong>: The first row in the Excel file includes the headers. To use these headers as keys for our dictionary we read the first row and produce the list of headers.</p>
</li>
<li>
<p><strong>Return the results</strong>: Iterate the rows and construct a dictionary for each row using the headers and the cell values. <code>openpyxl</code> uses a Cell type that includes both the value and some metadata. This can be useful for other purposes, but we only need the values. To access a cell's value we use <code>cell.value</code>.</p>
</li>
</ol>
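<p>The header-and-zip pattern from steps 3 and 4 does not depend on openpyxl, so it can be tried in isolation. The sketch below uses plain tuples in place of worksheet rows; <code>iter_dicts</code> and <code>sheet</code> are hypothetical names used only for this illustration:</p>

```python
from typing import Iterable, Iterator


def iter_dicts(rows: Iterable[tuple[object, ...]]) -> Iterator[dict[str, object]]:
    """Treat the first row as headers and yield one dict per remaining row."""
    iterator = iter(rows)
    try:
        first = next(iterator)
    except StopIteration:
        # Empty input: nothing to yield.
        return
    headers = [str(value) for value in first]
    for row in iterator:
        yield dict(zip(headers, row))


# Plain tuples standing in for the rows of a worksheet.
sheet = [
    ('number', 'decimal', 'text'),
    (1, 1.1, 'CONTROL ROW'),
    (2, 1.2, 'RANDOM TEXT'),
]

records = list(iter_dicts(sheet))
print(records[0])
```

<p>The openpyxl function above does the same thing, except that it first unwraps each <code>Cell</code> with <code>cell.value</code>.</p>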
<p>This is what the first row of the results looks like:</p>
<div class="highlight"><pre><span></span><span class="o">>>></span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'file.xlsx'</span><span class="p">,</span> <span class="s1">'rb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="o">...</span>     <span class="n">rows</span> <span class="o">=</span> <span class="n">iter_excel_openpyxl</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
<span class="o">...</span>     <span class="n">row</span> <span class="o">=</span> <span class="nb">next</span><span class="p">(</span><span class="n">rows</span><span class="p">)</span>
<span class="o">...</span>     <span class="nb">print</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
<span class="p">{</span><span class="s1">'boolean'</span><span class="p">:</span> <span class="kc">True</span><span class="p">,</span>
<span class="s1">'date'</span><span class="p">:</span> <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="p">(</span><span class="mi">2000</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
<span class="s1">'decimal'</span><span class="p">:</span> <span class="mf">1.1</span><span class="p">,</span>
<span class="s1">'number'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="s1">'text'</span><span class="p">:</span> <span class="s1">'CONTROL ROW'</span><span class="p">}</span>
</pre></div>
<p>Looks promising! Run the benchmark on the large file:</p>
<div class="highlight"><pre><span></span>iter_excel_openpyxl
elapsed 35.62
🟢 "number"
🟢 "decimal"
🔴 "date" expected type "&lt;class 'datetime.date'&gt;" received type "&lt;class 'datetime.datetime'&gt;"
🟢 "boolean"
🟢 "text"
</pre></div>
<p>Importing the large Excel file using openpyxl took ~35s, longer than both tablib (28s) and pandas (32s).</p>
<p>A quick search of the documentation revealed a promising section titled <a href="https://openpyxl.readthedocs.io/en/stable/optimized.html" rel="noopener">"performance"</a>. In this section, openpyxl describes "optimized modes" to speed things up when only reading or writing a file:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">openpyxl</span>
<span class="k">def</span> <span class="nf">iter_excel_openpyxl</span><span class="p">(</span><span class="n">file</span><span class="p">:</span> <span class="n">IO</span><span class="p">[</span><span class="nb">bytes</span><span class="p">])</span> <span class="o">-></span> <span class="n">Iterator</span><span class="p">[</span><span class="nb">dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">object</span><span class="p">]]:</span>
<span class="hll">    <span class="n">workbook</span> <span class="o">=</span> <span class="n">openpyxl</span><span class="o">.</span><span class="n">load_workbook</span><span class="p">(</span><span class="n">file</span><span class="p">,</span> <span class="n">read_only</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span>    <span class="n">rows</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span><span class="o">.</span><span class="n">rows</span>
    <span class="n">headers</span> <span class="o">=</span> <span class="p">[</span><span class="nb">str</span><span class="p">(</span><span class="n">cell</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="k">for</span> <span class="n">cell</span> <span class="ow">in</span> <span class="nb">next</span><span class="p">(</span><span class="n">rows</span><span class="p">)]</span>
    <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">rows</span><span class="p">:</span>
        <span class="k">yield</span> <span class="nb">dict</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">headers</span><span class="p">,</span> <span class="p">(</span><span class="n">cell</span><span class="o">.</span><span class="n">value</span> <span class="k">for</span> <span class="n">cell</span> <span class="ow">in</span> <span class="n">row</span><span class="p">)))</span>
</pre></div>
<p>The workbook is now loaded in "read only" mode. Since we only want to read the contents and not write, this is acceptable. Let's run the benchmark again and see if it affected the results:</p>
<div class="highlight"><pre><span></span>iter_excel_openpyxl
elapsed 24.79
🟢 "number"
🟢 "decimal"
🔴 "date" expected type "&lt;class 'datetime.date'&gt;" received type "&lt;class 'datetime.datetime'&gt;"
🟢 "boolean"
🟢 "text"
</pre></div>
<p>Opening the file in "read only" mode brings the timing down from 35s to 24s - faster than tablib (28s) and pandas (32s).</p>
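<p>Both tablib and openpyxl returned the date cell as a <code>datetime.datetime</code>. If a consumer strictly needs <code>datetime.date</code>, one option is to wrap any of the readers in a small generator that converts the relevant columns. This is a sketch rather than part of the benchmark: <code>dates_only</code> is a hypothetical helper, and its cost on a 500K-row file was not measured.</p>

```python
import datetime
from typing import Iterable, Iterator


def dates_only(
    rows: Iterable[dict[str, object]],
    date_keys: Iterable[str],
) -> Iterator[dict[str, object]]:
    """Yield copies of rows with datetime values under date_keys turned into dates."""
    keys = tuple(date_keys)
    for row in rows:
        converted = dict(row)
        for key in keys:
            value = converted.get(key)
            if isinstance(value, datetime.datetime):
                # Drops the time part; only appropriate for date-only columns.
                converted[key] = value.date()
        yield converted


# Hand-written row shaped like the control row returned by the readers above.
sample = [{'number': 1, 'date': datetime.datetime(2000, 1, 1, 0, 0)}]
fixed = list(dates_only(sample, ['date']))
print(fixed[0]['date'])  # 2000-01-01
```

<p>Because the wrapper is itself a generator, it keeps the row-by-row behavior of the reader it wraps.</p>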
<h3 id="reading-excel-using-libreoffice"><a class="toclink" href="#reading-excel-using-libreoffice">Reading Excel using LibreOffice</a></h3>
<p>We have now exhausted the traditional and obvious ways to import Excel into Python. We used the top designated libraries and got decent results. It's now time to think outside the box.</p>
<p><a href="https://www.libreoffice.org/" rel="noopener">LibreOffice</a> is a free and open source alternative to <em>the other</em> office suite. LibreOffice can process both xls and xlsx files and also happens to include a headless mode with some useful command line options:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>libreoffice<span class="w"> </span>--help
LibreOffice<span class="w"> </span><span class="m">7</span>.5.8.2<span class="w"> </span><span class="m">50</span><span class="o">(</span>Build:2<span class="o">)</span>
Usage:<span class="w"> </span>soffice<span class="w"> </span><span class="o">[</span>argument...<span class="o">]</span>
<span class="w"> </span>argument<span class="w"> </span>-<span class="w"> </span>switches,<span class="w"> </span>switch<span class="w"> </span>parameters<span class="w"> </span>and<span class="w"> </span>document<span class="w"> </span>URIs<span class="w"> </span><span class="o">(</span>filenames<span class="o">)</span>.
...
</pre></div>
<p>One of the LibreOffice command line options is to convert files between different formats. For example, we can use <code>libreoffice</code> to convert an xlsx file to a csv file:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>libreoffice<span class="w"> </span>--headless<span class="w"> </span>--convert-to<span class="w"> </span>csv<span class="w"> </span>--outdir<span class="w"> </span>.<span class="w"> </span>file.xlsx
convert<span class="w"> </span>file.xlsx<span class="w"> </span>-><span class="w"> </span>file.csv<span class="w"> </span>using<span class="w"> </span>filter:<span class="w"> </span>Text<span class="w"> </span>-<span class="w"> </span>txt<span class="w"> </span>-<span class="w"> </span>csv<span class="w"> </span><span class="o">(</span>StarCalc<span class="o">)</span>
$<span class="w"> </span>head<span class="w"> </span>file.csv
number,decimal,date,boolean,text
<span class="m">1</span>,1.1,01/01/2000,TRUE,CONTROL<span class="w"> </span>ROW
<span class="m">2</span>,1.2,01/02/2000,FALSE,RANDOM<span class="w"> </span>TEXT:0.716658989024692
<span class="m">3</span>,1.3,01/03/2000,TRUE,RANDOM<span class="w"> </span>TEXT:0.966075283958641
</pre></div>
<p>Nice! Let's stitch it together using Python. We'll first convert the xlsx file to CSV and then import the CSV into Python:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">subprocess</span><span class="o">,</span> <span class="nn">tempfile</span><span class="o">,</span> <span class="nn">csv</span>

<span class="k">def</span> <span class="nf">iter_excel_libreoffice</span><span class="p">(</span><span class="n">file</span><span class="p">:</span> <span class="n">IO</span><span class="p">[</span><span class="nb">bytes</span><span class="p">])</span> <span class="o">-></span> <span class="n">Iterator</span><span class="p">[</span><span class="nb">dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">object</span><span class="p">]]:</span>
    <span class="k">with</span> <span class="n">tempfile</span><span class="o">.</span><span class="n">TemporaryDirectory</span><span class="p">(</span><span class="n">prefix</span><span class="o">=</span><span class="s1">'excelbenchmark'</span><span class="p">)</span> <span class="k">as</span> <span class="n">tempdir</span><span class="p">:</span>
        <span class="n">subprocess</span><span class="o">.</span><span class="n">run</span><span class="p">([</span>
            <span class="s1">'libreoffice'</span><span class="p">,</span> <span class="s1">'--headless'</span><span class="p">,</span> <span class="s1">'--convert-to'</span><span class="p">,</span> <span class="s1">'csv'</span><span class="p">,</span>
            <span class="s1">'--outdir'</span><span class="p">,</span> <span class="n">tempdir</span><span class="p">,</span> <span class="n">file</span><span class="o">.</span><span class="n">name</span><span class="p">,</span>
        <span class="p">])</span>
        <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">tempdir</span><span class="si">}</span><span class="s1">/</span><span class="si">{</span><span class="n">file</span><span class="o">.</span><span class="n">name</span><span class="o">.</span><span class="n">rsplit</span><span class="p">(</span><span class="s2">"."</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span><span class="si">}</span><span class="s1">.csv'</span><span class="p">,</span> <span class="s1">'r'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
            <span class="n">rows</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">reader</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
            <span class="n">headers</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="nb">str</span><span class="p">,</span> <span class="nb">next</span><span class="p">(</span><span class="n">rows</span><span class="p">)))</span>
            <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">rows</span><span class="p">:</span>
                <span class="k">yield</span> <span class="nb">dict</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">headers</span><span class="p">,</span> <span class="n">row</span><span class="p">))</span>
</pre></div>
<p>Let's break it down:</p>
<ol>
<li>
<p><strong>Create a temporary directory for storing our CSV file</strong>: Use the built-in <a href="https://docs.python.org/3/library/tempfile.html" rel="noopener"><code>tempfile</code></a> module to create a temporary directory that will clean up automatically when we are done. Ideally, we would want to convert a specific file into a file-like-object in memory, but the <code>libreoffice</code> command line does not provide any way of converting into a specific file, only to a directory.</p>
</li>
<li>
<p><strong>Convert to CSV using the <code>libreoffice</code> command line</strong>: Use the built-in <a href="https://docs.python.org/3/library/subprocess.html" rel="noopener"><code>subprocess</code></a> module to execute an OS command.</p>
</li>
<li>
<p><strong>Read the generated CSV</strong>: Open the newly created CSV file, parse it using the <a href="https://docs.python.org/3/library/csv.html" rel="noopener">built-in <code>csv</code> module</a> and produce dicts.</p>
</li>
</ol>
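<p>One detail worth a sketch is how the path of the generated CSV is derived. The expression <code>file.name.rsplit(".")[0]</code> works for a plain name like <code>file.xlsx</code>, but it produces a different result when the name includes a directory or more than one dot. A more defensive variant could look like this. The helper name <code>converted_csv_path</code> is hypothetical, and the sketch assumes the converter names its output after the base name of the source file with the last extension replaced by <code>.csv</code>:</p>

```python
from pathlib import Path


def converted_csv_path(outdir: str, source_name: str) -> Path:
    """Return the path where the converted CSV is expected to land.

    Assumption: the output file is written to `outdir` and is named after
    the base name of the source file, with the last extension replaced
    by .csv.
    """
    # with_suffix() replaces only the last extension; .name drops directories.
    return Path(outdir) / Path(source_name).with_suffix('.csv').name


print(converted_csv_path('/tmp/excelbenchmark123', '/data/reports/file.xlsx'))
```

<p>In the same spirit, passing <code>check=True</code> to <code>subprocess.run</code> makes a failed conversion raise an exception instead of surfacing later as a missing file.</p>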
<p>This is what the first row of the results looks like:</p>
<div class="highlight"><pre><span></span><span class="o">>>></span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'file.xlsx'</span><span class="p">,</span> <span class="s1">'rb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="o">...</span>     <span class="n">rows</span> <span class="o">=</span> <span class="n">iter_excel_libreoffice</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
<span class="o">...</span>     <span class="n">row</span> <span class="o">=</span> <span class="nb">next</span><span class="p">(</span><span class="n">rows</span><span class="p">)</span>
<span class="o">...</span>     <span class="nb">print</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
<span class="p">{</span><span class="s1">'number'</span><span class="p">:</span> <span class="s1">'1'</span><span class="p">,</span>
 <span class="s1">'decimal'</span><span class="p">:</span> <span class="s1">'1.1'</span><span class="p">,</span>
 <span class="s1">'date'</span><span class="p">:</span> <span class="s1">'01/01/2000'</span><span class="p">,</span>
 <span class="s1">'boolean'</span><span class="p">:</span> <span class="s1">'TRUE'</span><span class="p">,</span>
 <span class="s1">'text'</span><span class="p">:</span> <span class="s1">'CONTROL ROW'</span><span class="p">}</span>
</pre></div>
<p>We immediately notice that we lost all the type information - all values are strings.</p>
<p>Let's run the benchmark to see if it's worth it:</p>
<div class="highlight"><pre><span></span>iter_excel_libreoffice
convert file.xlsx -> file.csv using filter : Text - txt - csv (StarCalc)
elapsed 15.279242266900837
🔴 "number" expected type "<class 'int'>" received type "<class 'str'>"
🔴 "decimal" expected type "<class 'float'>" received type "<class 'str'>"
🔴 "date" expected type "<class 'datetime.date'>" received type "<class 'str'>"
🔴 "boolean" expected type "<class 'bool'>" received type "<class 'str'>"
🟢 "text"
</pre></div>
<p>To be honest, this was faster than I had anticipated! Using LibreOffice to convert the file to CSV and then loading it took only 15s - faster than pandas (32s), tablib (28s) and openpyxl (24s).</p>
<p>We did lose the type information when we converted the file to CSV and if we had to also convert the types it would most likely take a bit more time (<a href="/django-rest-framework-slow">serialization can be slow you know</a>). But overall, not a bad option!</p>
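<p>To give a sense of what converting the types would involve, here is a minimal sketch that casts one CSV row back to Python types. The function name <code>convert_row</code> is hypothetical, the column names come from the sample output above, and the date format is an assumption: the sample values are consistent with both MM/DD/YYYY and DD/MM/YYYY, so check it against the real file before relying on it.</p>

```python
import datetime


def convert_row(row: dict[str, str]) -> dict[str, object]:
    """Cast the string values of one CSV row to Python types.

    Assumptions: the columns match the benchmark file, dates are written
    as MM/DD/YYYY and booleans as TRUE / FALSE.
    """
    return {
        'number': int(row['number']),
        'decimal': float(row['decimal']),
        'date': datetime.datetime.strptime(row['date'], '%m/%d/%Y').date(),
        'boolean': row['boolean'] == 'TRUE',
        'text': row['text'],
    }


print(convert_row({
    'number': '1',
    'decimal': '1.1',
    'date': '01/01/2000',
    'boolean': 'TRUE',
    'text': 'CONTROL ROW',
}))
```

<p>The sketch is not part of the benchmark, so its cost is not reflected in the timing above.</p>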
<h3 id="reading-excel-using-duckdb"><a class="toclink" href="#reading-excel-using-duckdb">Reading Excel using DuckDB</a></h3>
<p>If we're already down the path of using external tools, why not give the new kid on the block a chance to compete?</p>
<p><a href="https://duckdb.org/" rel="noopener">DuckDB</a> is an "in-process SQL OLAP database management system". This description does not make it immediately clear why DuckDB can be useful in this case, but it is. DuckDB is very good at moving data around and converting between formats.</p>
<p>To install the <a href="https://duckdb.org/docs/api/python/overview.html" rel="noopener">DuckDB Python API</a> execute the following command:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>duckdb
</pre></div>
<p>Read an Excel file using <code>duckdb</code> in Python:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">duckdb</span>

<span class="k">def</span> <span class="nf">iter_excel_duckdb</span><span class="p">(</span><span class="n">file</span><span class="p">:</span> <span class="n">IO</span><span class="p">[</span><span class="nb">bytes</span><span class="p">])</span> <span class="o">-></span> <span class="n">Iterator</span><span class="p">[</span><span class="nb">dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">object</span><span class="p">]]:</span>
    <span class="n">duckdb</span><span class="o">.</span><span class="n">install_extension</span><span class="p">(</span><span class="s1">'spatial'</span><span class="p">)</span>
    <span class="n">duckdb</span><span class="o">.</span><span class="n">load_extension</span><span class="p">(</span><span class="s1">'spatial'</span><span class="p">)</span>
    <span class="n">rows</span> <span class="o">=</span> <span class="n">duckdb</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="sa">f</span><span class="s2">"""</span>
<span class="s2">        SELECT * FROM st_read(</span>
<span class="s2">            '</span><span class="si">{</span><span class="n">file</span><span class="o">.</span><span class="n">name</span><span class="si">}</span><span class="s2">',</span>
<span class="s2">            open_options=['HEADERS=FORCE', 'FIELD_TYPES=AUTO'])</span>
<span class="s2">    """</span><span class="p">)</span>
    <span class="k">while</span> <span class="n">row</span> <span class="o">:=</span> <span class="n">rows</span><span class="o">.</span><span class="n">fetchone</span><span class="p">():</span>
        <span class="k">yield</span> <span class="nb">dict</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">rows</span><span class="o">.</span><span class="n">columns</span><span class="p">,</span> <span class="n">row</span><span class="p">))</span>
</pre></div>
<p>Let's break it down:</p>
<ol>
<li>
<p><strong>Install and load the <code>spatial</code> extension</strong>: To import data from Excel using <code>duckdb</code> you need to install the <a href="https://duckdb.org/docs/extensions/spatial.html" rel="noopener"><code>spatial</code> extension</a>. This is a bit strange because <code>spatial</code> is used for geo manipulations, but <a href="https://duckdb.org/docs/guides/import/excel_import.html" rel="noopener">that's what it wants</a>.</p>
</li>
<li>
<p><strong>Query the file</strong>: When executing queries directly using the <code>duckdb</code> global variable it will use an in-memory database by default, similar to using <code>sqlite</code> with the <code>:memory:</code> option. To actually import the Excel file, we use the <code>st_read</code> function with the path to the file as first argument. In the function options, we set the first row as headers, and activate the option to automatically detect types (this is also the default).</p>
</li>
<li>
<p><strong>Construct the result</strong>: Iterate the rows and construct dicts using the list of headers and values of each row.</p>
</li>
</ol>
<p>This is what the first row looks like using DuckDB to import the Excel file:</p>
<div class="highlight"><pre><span></span><span class="o">>>></span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'file.xlsx'</span><span class="p">,</span> <span class="s1">'rb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="o">...</span>     <span class="n">rows</span> <span class="o">=</span> <span class="n">iter_excel_duckdb</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
<span class="o">...</span>     <span class="n">row</span> <span class="o">=</span> <span class="nb">next</span><span class="p">(</span><span class="n">rows</span><span class="p">)</span>
<span class="o">...</span>     <span class="nb">print</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
<span class="p">{</span><span class="s1">'boolean'</span><span class="p">:</span> <span class="kc">True</span><span class="p">,</span>
 <span class="s1">'date'</span><span class="p">:</span> <span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">(</span><span class="mi">2000</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
 <span class="s1">'decimal'</span><span class="p">:</span> <span class="mf">1.1</span><span class="p">,</span>
 <span class="s1">'number'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
 <span class="s1">'text'</span><span class="p">:</span> <span class="s1">'CONTROL ROW'</span><span class="p">}</span>
</pre></div>
<p>Now that we have our process to read an Excel file using DuckDB to Python, let's see how it performs:</p>
<div class="highlight"><pre><span></span>iter_excel_duckdb
elapsed 11.36
🟢 "number"
🟢 "decimal"
🟢 "date"
🟢 "boolean"
🟢 "text"
</pre></div>
<p>First of all, we have a winner with the types! DuckDB was able to correctly detect all the types. Additionally, DuckDB clocked at only 11s, which brings us closer to a single digit timing!</p>
<p>One thing that bothered me with this implementation was that despite my best efforts, I was unable to use a parameter for the name of the file using the <code>duckdb.sql</code> function. Using string concatenation to generate SQL is dangerous, prone to injection and should be avoided when possible.</p>
<p>In one of my attempts to resolve this, I tried to use <code>duckdb.execute</code> instead of <code>duckdb.sql</code>, which seemed to accept parameters in this case:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">duckdb</span>

<span class="k">def</span> <span class="nf">iter_excel_duckdb_execute</span><span class="p">(</span><span class="n">file</span><span class="p">:</span> <span class="n">IO</span><span class="p">[</span><span class="nb">bytes</span><span class="p">])</span> <span class="o">-></span> <span class="n">Iterator</span><span class="p">[</span><span class="nb">dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">object</span><span class="p">]]:</span>
    <span class="n">duckdb</span><span class="o">.</span><span class="n">install_extension</span><span class="p">(</span><span class="s1">'spatial'</span><span class="p">)</span>
    <span class="n">duckdb</span><span class="o">.</span><span class="n">load_extension</span><span class="p">(</span><span class="s1">'spatial'</span><span class="p">)</span>
<span class="hll">    <span class="n">conn</span> <span class="o">=</span> <span class="n">duckdb</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span>
</span>        <span class="s2">"SELECT * FROM st_read(?, open_options=['HEADERS=FORCE', 'FIELD_TYPES=AUTO'])"</span><span class="p">,</span>
        <span class="p">[</span><span class="n">file</span><span class="o">.</span><span class="n">name</span><span class="p">],</span>
    <span class="p">)</span>
<span class="hll">    <span class="n">headers</span> <span class="o">=</span> <span class="p">[</span><span class="n">header</span> <span class="k">for</span> <span class="n">header</span><span class="p">,</span> <span class="o">*</span><span class="n">rest</span> <span class="ow">in</span> <span class="n">conn</span><span class="o">.</span><span class="n">description</span><span class="p">]</span>
</span>    <span class="k">while</span> <span class="n">row</span> <span class="o">:=</span> <span class="n">conn</span><span class="o">.</span><span class="n">fetchone</span><span class="p">():</span>
        <span class="k">yield</span> <span class="nb">dict</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">headers</span><span class="p">,</span> <span class="n">row</span><span class="p">))</span>
</pre></div>
<p>There are two main differences here:</p>
<ol>
<li>
<p><strong>Use <code>duckdb.execute</code> instead of <code>duckdb.sql</code></strong>: Using <code>execute</code> I was able to provide the file name as parameter rather than using string concatenation. This is safer.</p>
</li>
<li>
<p><strong>Construct the headers</strong>: According to the API reference, <a href="http://duckdb.org/docs/api/python/reference/#duckdb.sql" rel="noopener"><code>duckdb.sql</code></a> returns a <code>DuckDBPyRelation</code> while <a href="http://duckdb.org/docs/api/python/reference/#duckdb.execute" rel="noopener"><code>duckdb.execute</code></a> returns a <code>DuckDBPyConnection</code>. To produce a list of headers from the connection object, I was unable to access <code>.columns</code> like before, so I had to look at the <code>description</code> property of the connection, which I imagine describes the current result set.</p>
</li>
</ol>
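<p>The pattern of binding parameters and reading column names from <code>description</code> is not specific to DuckDB. As an analogy only, here is the same idea using the standard library's <code>sqlite3</code> module with an in-memory database. The table <code>t</code> and its contents are made up for illustration, and DuckDB's behavior may differ in the details:</p>

```python
import sqlite3

# An in-memory database, similar in spirit to DuckDB's default.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (number INTEGER, text TEXT)')

# Values are bound as parameters instead of being formatted into the SQL text.
conn.execute('INSERT INTO t VALUES (?, ?)', [1, 'CONTROL ROW'])
cursor = conn.execute('SELECT * FROM t WHERE number = ?', [1])

# Each entry of `description` is a tuple whose first item is the column name.
headers = [header for header, *rest in cursor.description]
rows = [dict(zip(headers, row)) for row in cursor]
print(rows)
```

<p>The query text never contains user-provided values, which is what makes the parameterized form safer than string formatting.</p>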
<p>Running the benchmark using the new function yielded some interesting results:</p>
<div class="highlight"><pre><span></span>iter_excel_duckdb_execute
elapsed 5.73
🔴 "number" expected type "<class 'int'>" received type "<class 'str'>"
🔴 "decimal" expected type "<class 'float'>" received type "<class 'str'>"
🔴 "date" expected type "<class 'datetime.date'>" received type "<class 'str'>"
🔴 "boolean" expected type "<class 'bool'>" received type "<class 'str'>"
🟢 "text"
</pre></div>
<p>Using <code>execute</code> we gobbled the file in just 5.7s - that's twice as fast as the last attempt, but we lost the types. Without much knowledge and experience using DuckDB I can only assume constructing the relation and casting to the correct types incurs some overhead.</p>
<p>Before we move on to other options, let's check if pre-loading and installing the extensions makes any significant difference:</p>
<div class="highlight"><pre><span></span><span class="w"> </span>import duckdb
<span class="gi">+duckdb.install_extension('spatial')</span>
<span class="gi">+duckdb.load_extension('spatial')</span>
<span class="gi">+</span>
<span class="w"> </span>def iter_excel_duckdb_execute(file: IO[bytes]) -> Iterator[dict[str, object]]:
<span class="gd">-    duckdb.install_extension('spatial')</span>
<span class="gd">-    duckdb.load_extension('spatial')</span>
<span class="w"> </span>    conn = duckdb.execute(
<span class="w"> </span>        "SELECT * FROM st_read(?, open_options=['HEADERS=FORCE', 'FIELD_TYPES=AUTO'])",
</pre></div>
<p>Executing the function several times:</p>
<div class="highlight"><pre><span></span>iter_excel_duckdb_execute
elapsed 5.28
elapsed 5.69
elapsed 5.28
</pre></div>
<p>Pre-loading the extensions did not have a significant effect on the timing.</p>
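<p>For readers who want to repeat a measurement like this, a small helper along these lines could be used. It is a sketch, not the harness used for the numbers in this article: the name <code>time_runs</code> is hypothetical and the generator being timed is a stand-in for one of the <code>iter_excel_*</code> functions.</p>

```python
import statistics
import time
from typing import Callable, Iterable


def time_runs(func: Callable[[], Iterable[object]], repeat: int = 3) -> list[float]:
    """Consume func() `repeat` times and return the elapsed seconds of each run."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        for _row in func():
            pass  # exhaust the iterator so lazy generators do the actual work
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in workload; replace with a call that opens and iterates a real file.
timings = time_runs(lambda: iter(range(100_000)))
print(f'median {statistics.median(timings):.4f}s over {len(timings)} runs')
```

<p>Looking at several runs rather than a single one helps to tell a real difference from noise.</p>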
<p>Let's see if removing the automatic type detection has any effect:</p>
<div class="highlight"><pre><span></span><span class="w"> </span>duckdb.load_extension('spatial')
<span class="w"> </span>def iter_excel_duckdb_execute(file: IO[bytes]) -> Iterator[dict[str, object]]:
<span class="w"> </span>    conn = duckdb.execute(
<span class="gd">-        "SELECT * FROM st_read(?, open_options=['HEADERS=FORCE', 'FIELD_TYPES=AUTO'])",</span>
<span class="gi">+        "SELECT * FROM st_read(?, open_options=['HEADERS=FORCE', 'FIELD_TYPES=STRING'])",</span>
<span class="w"> </span>        [file.name],
<span class="w"> </span>    )
<span class="w"> </span>    headers = [header for header, *rest in conn.description]
</pre></div>
<p>Executing the function several times:</p>
<div class="highlight"><pre><span></span>iter_excel_duckdb_execute
elapsed 5.80
elapsed 7.21
elapsed 6.45
</pre></div>
<p>Removing the automatic type detection also didn't seem to have any significant effect on the timing.</p>
<h3 id="reading-excel-using-calamine"><a class="toclink" href="#reading-excel-using-calamine">Reading Excel using Calamine</a></h3>
<p>In recent years it seems like every performance problem in Python ends up being solved with another language. As a Python developer, I consider this a true blessing. It means I can keep using the language I'm used to and enjoy the performance benefits of all others!</p>
<p><a href="https://docs.rs/calamine/latest/calamine/" rel="noopener">Calamine</a> is a pure Rust library to read Excel and OpenDocument Spreadsheet files. To install <a href="https://github.com/dimastbk/python-calamine" rel="noopener">python-calamine</a>, the Python binding for calamine, execute the following command:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>python-calamine
</pre></div>
<p>Read an Excel file using <code>calamine</code> in Python:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">python_calamine</span>

<span class="k">def</span> <span class="nf">iter_excel_calamine</span><span class="p">(</span><span class="n">file</span><span class="p">:</span> <span class="n">IO</span><span class="p">[</span><span class="nb">bytes</span><span class="p">])</span> <span class="o">-></span> <span class="n">Iterator</span><span class="p">[</span><span class="nb">dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">object</span><span class="p">]]:</span>
    <span class="n">workbook</span> <span class="o">=</span> <span class="n">python_calamine</span><span class="o">.</span><span class="n">CalamineWorkbook</span><span class="o">.</span><span class="n">from_filelike</span><span class="p">(</span><span class="n">file</span><span class="p">)</span> <span class="c1"># type: ignore[arg-type]</span>
    <span class="n">rows</span> <span class="o">=</span> <span class="nb">iter</span><span class="p">(</span><span class="n">workbook</span><span class="o">.</span><span class="n">get_sheet_by_index</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">to_python</span><span class="p">())</span>
    <span class="n">headers</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="nb">str</span><span class="p">,</span> <span class="nb">next</span><span class="p">(</span><span class="n">rows</span><span class="p">)))</span>
    <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">rows</span><span class="p">:</span>
        <span class="k">yield</span> <span class="nb">dict</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">headers</span><span class="p">,</span> <span class="n">row</span><span class="p">))</span>
</pre></div>
<p>Going through the same routine again - load the workbook, pick the sheet, fetch the headers from the first row, iterate the results and construct a dict from every row.</p>
<p>This is what the first row looks like:</p>
<div class="highlight"><pre><span></span><span class="o">>>></span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'file.xlsx'</span><span class="p">,</span> <span class="s1">'rb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="o">...</span>     <span class="n">rows</span> <span class="o">=</span> <span class="n">iter_excel_calamine</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
<span class="o">...</span>     <span class="n">row</span> <span class="o">=</span> <span class="nb">next</span><span class="p">(</span><span class="n">rows</span><span class="p">)</span>
<span class="o">...</span>     <span class="nb">print</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
<span class="p">{</span><span class="s1">'boolean'</span><span class="p">:</span> <span class="kc">True</span><span class="p">,</span>
 <span class="s1">'date'</span><span class="p">:</span> <span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">(</span><span class="mi">2000</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
 <span class="s1">'decimal'</span><span class="p">:</span> <span class="mf">1.1</span><span class="p">,</span>
 <span class="s1">'number'</span><span class="p">:</span> <span class="mf">1.0</span><span class="p">,</span>
 <span class="s1">'text'</span><span class="p">:</span> <span class="s1">'CONTROL ROW'</span><span class="p">}</span>
</pre></div>
<p>Running the benchmark:</p>
<div class="highlight"><pre><span></span>iter_excel_calamine
elapsed 3.58
🔴 "number" expected type "<class 'int'>" received type "<class 'float'>"
🟢 "decimal"
🟢 "date"
🟢 "boolean"
🟢 "text"
</pre></div>
<p>That's a big leap! Using <code>python-calamine</code> we processed the entire file in just 3.5s - the fastest so far! The only red dot here is because our integer was interpreted as float - not entirely unreasonable.</p>
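<p>If the integer type matters, the float can be narrowed back after reading. The following is a minimal sketch with a hypothetical helper, <code>normalize_number</code>. It should only be applied to columns that are known to hold integers, otherwise a decimal value such as <code>2.0</code> would be turned into an int as well:</p>

```python
def normalize_number(value: object) -> object:
    """Return an int for floats with no fractional part, anything else unchanged."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


row = {'number': 1.0, 'decimal': 1.1, 'text': 'CONTROL ROW'}
# Narrow only the column that is expected to contain integers.
row['number'] = normalize_number(row['number'])
print(row)
```

<p>The extra pass adds Python work per row, so it trades some of the speed for the expected type.</p>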
<p>After poking around a bit, the only issue I could find with <code>python-calamine</code> is that it cannot produce results as an iterator. The function <code>CalamineWorkbook.from_filelike</code> will load the entire dataset into memory, which depending on the size of the file, can be an issue. The author of the Python binding library <a href="https://x.com/dima_st_bk/status/1728352773664833597" rel="noopener">pointed me</a> to <a href="https://github.com/PyO3/pyo3/issues/1085" rel="noopener">this issue</a> in the underlying binding library <a href="https://github.com/PyO3/pyo3" rel="noopener"><code>pyo3</code></a>, which prevents iteration on Rust structures from Python.</p>
<hr>
<h2 id="results-summary"><a class="toclink" href="#results-summary">Results Summary</a></h2>
<p>Here is a summary of methods to read Excel files using Python:</p>
<div class="table-container">
<table>
<thead>
<tr>
<th>Method</th>
<th>Timing (seconds)</th>
<th>Types</th>
<th>Version</th>
</tr>
</thead>
<tbody>
<tr>
<td>Pandas</td>
<td>32.98</td>
<td>Yes</td>
<td>2.1.3</td>
</tr>
<tr>
<td>Tablib</td>
<td>28.52</td>
<td>Yes</td>
<td>3.5.0</td>
</tr>
<tr>
<td>Openpyxl</td>
<td>35.62</td>
<td>Yes</td>
<td>3.1.2</td>
</tr>
<tr>
<td>Openpyxl (readonly)</td>
<td>24.79</td>
<td>Yes</td>
<td>3.1.2</td>
</tr>
<tr>
<td>LibreOffice</td>
<td>15.27</td>
<td>No</td>
<td>7.5.8.2</td>
</tr>
<tr>
<td>DuckDB (sql)</td>
<td>11.36</td>
<td>Yes</td>
<td>0.9.2</td>
</tr>
<tr>
<td>DuckDB (execute)</td>
<td>5.73</td>
<td>No</td>
<td>0.9.2</td>
</tr>
<tr>
<td>Calamine (python-calamine)</td>
<td>3.58</td>
<td>Yes</td>
<td>0.22.1 (0.1.7)</td>
</tr>
</tbody>
</table>
</div>
<p>So which one should you use? It depends... There are a few additional considerations other than speed when choosing a library for working with Excel files in Python:</p>
<ul>
<li>
<p><strong>Write capability</strong>: We benchmarked ways to read Excel, but sometimes it's necessary to produce Excel files as well. Some of the libraries we benchmarked do not support writing. Calamine, for example, cannot write Excel files, only read them.</p>
</li>
<li>
<p><strong>Additional formats</strong>: A system may require loading and producing files in formats other than Excel. Some libraries, such as pandas and Tablib, support a variety of additional formats, while calamine and openpyxl are limited to spreadsheet files.</p>
</li>
</ul>
<div class="admonition source">
<p class="admonition-title">Source</p>
<p>The source code for the benchmarks is available in <a href="https://github.com/hakib/fast-excel-python" rel="noopener">this repo</a>.</p>
</div>
<h1>When Good Correlation is Not Enough</h1>
<p>Haki Benita, 2023-07-27</p>
<p>Choosing to use a block range index (BRIN) to query a field with high correlation is a no-brainer for the optimizer. However, under some easily reproducible circumstances, a BRIN index can result in significantly slower execution even when the indexed field has very high correlation. In this article I describe how using a BRIN index in presumably "ideal circumstances" can result in degraded performance, and suggest a recent new feature of PostgreSQL as a remedy.</p><hr>
<p>Choosing to use a block range index (BRIN) to query a field with high correlation is a no-brainer for the optimizer. The small size of the index and the field's correlation makes BRIN an ideal choice. However, a recent event taught us that correlation can be misleading. Under some easily reproducible circumstances, a BRIN index can result in significantly slower execution even when the indexed field has very high correlation.</p>
<p><strong>In this article I describe how using a BRIN index in presumably "ideal circumstances" can result in degraded performance and suggest a recent new feature of PostgreSQL as a remedy.</strong></p>
<div class="dark--invert">
<figure><img alt="<br><small>image by <a href="https://www.abstrakt.design">abstrakt design</a></small>" src="https://hakibenita.com/images/00-postgres-correlation-brin-multi-minmax.png"><figcaption><br><small>image by <a href="https://www.abstrakt.design">abstrakt design</a></small></figcaption>
</figure>
</div>
<p><details class="toc-container" open>
<summary>Table of Contents</summary></p>
<div class="toc">
<ul>
<li><a href="#how-brin-index-works">How BRIN Index Works</a></li>
<li><a href="#correlation">Correlation</a></li>
<li><a href="#brin-index-with-perfect-correlation">BRIN Index with Perfect Correlation</a></li>
<li><a href="#brin-index-with-outliers">BRIN Index with Outliers</a></li>
<li><a href="#multi-minmax-brin-index-with-outliers">Multi-Minmax BRIN Index with Outliers</a></li>
<li><a href="#how-multi-minmax-brin-index-works">How Multi-minmax BRIN Index Works</a></li>
<li><a href="#results-summary">Results Summary</a></li>
<li><a href="#the-back-story">The Back Story</a></li>
</ul>
</div>
<p></details></p>
<hr>
<h2 id="how-brin-index-works"><a class="toclink" href="#how-brin-index-works">How BRIN Index Works</a></h2>
<style>table { width: auto } </style>
<p>A block range index works by keeping the minimum and maximum values for ranges of adjacent table pages. To best understand how a BRIN index works, let's build one.</p>
<p>Imagine you have the following values, each in a single table block:</p>
<table>
<thead>
<tr>
<th>Physical Ordering</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
<th>8</th>
<th>9</th>
</tr>
</thead>
<tbody>
<tr>
<td>Value</td>
<td><code>A</code></td>
<td><code>B</code></td>
<td><code>C</code></td>
<td><code>D</code></td>
<td><code>E</code></td>
<td><code>F</code></td>
<td><code>G</code></td>
<td><code>H</code></td>
<td><code>I</code></td>
</tr>
<tr>
<td>Logical Ordering</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
<td>9</td>
</tr>
</tbody>
</table>
<p>Every group of 3 adjacent pages is a range:</p>
<div class="t1">
<p><style>.t1 table tr td:nth-child(3n + 1), .t1 table tr th:nth-child(3n + 1) { border-right: 1px dashed var(--brand-color); }</style></p>
<table>
<thead>
<tr>
<th>Physical Ordering</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
<th>8</th>
<th>9</th>
</tr>
</thead>
<tbody>
<tr>
<td>Value</td>
<td><code>A</code></td>
<td><code>B</code></td>
<td><code>C</code></td>
<td><code>D</code></td>
<td><code>E</code></td>
<td><code>F</code></td>
<td><code>G</code></td>
<td><code>H</code></td>
<td><code>I</code></td>
</tr>
<tr>
<td>Logical Ordering</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
<td>9</td>
</tr>
</tbody>
</table>
</div>
<p>For each range, keep only the minimum and the maximum value:</p>
<table>
<thead>
<tr>
<th>Range</th>
<th>Minmax</th>
</tr>
</thead>
<tbody>
<tr>
<td>1-3</td>
<td><code>A</code>-<code>C</code></td>
</tr>
<tr>
<td>4-6</td>
<td><code>D</code>-<code>F</code></td>
</tr>
<tr>
<td>7-9</td>
<td><code>G</code>-<code>I</code></td>
</tr>
</tbody>
</table>
<p>This is a single minmax block range index.</p>
<p>Let's use the index to find the value <code>E</code>:</p>
<div class="t1-1">
<p><style>
.t1-1 table tr:nth-child(1), .t1-1 table tr:nth-child(3) { background: color-mix(in srgb, var(--negative) 20%, transparent) }
.t1-1 table tr:nth-child(2) { background: color-mix(in srgb, var(--positive) 20%, transparent)}
</style></p>
<table>
<thead>
<tr>
<th>Range</th>
<th>Minmax</th>
<th>Is <code>E</code> in range?</th>
</tr>
</thead>
<tbody>
<tr>
<td>1-3</td>
<td><code>A</code>-<code>C</code></td>
<td>🟥 Definitely not here</td>
</tr>
<tr>
<td>4-6</td>
<td><code>D</code>-<code>F</code></td>
<td>🟩 Might be here</td>
</tr>
<tr>
<td>7-9</td>
<td><code>G</code>-<code>I</code></td>
<td>🟥 Definitely not here</td>
</tr>
</tbody>
</table>
</div>
<p>Using the BRIN index we can eliminate blocks 1-3 and 7-9, and narrow the search to just blocks 4-6 in the table. That's pretty good.</p>
<p>Using the BRIN index the database still needs to read all the table blocks matched by the index and search for the value. PostgreSQL calls this "index recheck". In this case, the database would read blocks 4, 5 and 6 and search for rows with the value <code>E</code>.</p>
<p>A BRIN index can't say a value is definitely in a range. It can only say a value is definitely <em>not</em> in the range, or that it <em>might</em> be in the range. Indexes that produce inconclusive results, or false positives, are called "lossy indexes". Lossy indexes often sacrifice accuracy for space, and as we'll see later, a BRIN index is very small compared to the alternatives. Other types of lossy indexes in PostgreSQL include GIN and GiST.</p>
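<p>The toy index above is small enough to model in a few lines of Python. This is only a sketch of the idea, not how PostgreSQL implements BRIN; the function names <code>build_minmax_index</code> and <code>candidate_blocks</code> are made up for the illustration:</p>

```python
def build_minmax_index(blocks, pages_per_range):
    """Summarise every `pages_per_range` adjacent blocks as (first, last, min, max)."""
    index = []
    for start in range(0, len(blocks), pages_per_range):
        chunk = blocks[start:start + pages_per_range]
        # Block numbers are 1-based to match the tables above.
        index.append((start + 1, start + len(chunk), min(chunk), max(chunk)))
    return index


def candidate_blocks(index, value):
    """Blocks that still have to be read and rechecked for `value`."""
    blocks = []
    for first, last, lo, hi in index:
        if lo <= value <= hi:  # "might be here"; otherwise "definitely not here"
            blocks.extend(range(first, last + 1))
    return blocks


index = build_minmax_index(list("ABCDEFGHI"), pages_per_range=3)
print(index)                         # [(1, 3, 'A', 'C'), (4, 6, 'D', 'F'), (7, 9, 'G', 'I')]
print(candidate_blocks(index, "E"))  # [4, 5, 6]
```

<p>The lookup never returns a row. It only returns the blocks that survive the min/max test, which is exactly why the recheck step is needed.</p>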
<p>In the index we've just built, the blocks are perfectly ordered by the values they hold. To illustrate why this is important for BRIN indexes, consider another example where the blocks are not perfectly ordered:</p>
<div class="t2">
<p><style>.t2 table tr td:nth-child(3n + 1), .t2 table tr th:nth-child(3n + 1) { border-right: 1px dashed var(--brand-color); }</style></p>
<table>
<thead>
<tr>
<th>Physical Ordering</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
<th>8</th>
<th>9</th>
</tr>
</thead>
<tbody>
<tr>
<td>Value</td>
<td><code>B</code></td>
<td><code>I</code></td>
<td><code>E</code></td>
<td><code>A</code></td>
<td><code>D</code></td>
<td><code>G</code></td>
<td><code>C</code></td>
<td><code>H</code></td>
<td><code>F</code></td>
</tr>
<tr>
<td>Logical Ordering</td>
<td>2</td>
<td>9</td>
<td>5</td>
<td>1</td>
<td>4</td>
<td>7</td>
<td>3</td>
<td>8</td>
<td>6</td>
</tr>
</tbody>
</table>
</div>
<p>Once again we keep just the minimum and maximum for each range of 3 adjacent pages, and look for the value <code>E</code>:</p>
<div class="t2-1">
<p><style>.t2-1 table tr{ background: color-mix(in srgb, var(--positive) 20%, transparent) }</style></p>
<table>
<thead>
<tr>
<th>Range</th>
<th>Minmax</th>
<th>Is <code>E</code> in range?</th>
</tr>
</thead>
<tbody>
<tr>
<td>1-3</td>
<td><code>B</code>-<code>I</code></td>
<td>🟩 Might be here</td>
</tr>
<tr>
<td>4-6</td>
<td><code>A</code>-<code>G</code></td>
<td>🟩 Might be here</td>
</tr>
<tr>
<td>7-9</td>
<td><code>C</code>-<code>H</code></td>
<td>🟩 Might be here</td>
</tr>
</tbody>
</table>
</div>
<p>We're unable to eliminate any of the table blocks. The index is useless in this case! In fact, using this index the database would have to read the entire index <em>and</em> the entire table. That's bad.</p>
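<p>To get a feel for how much the physical layout matters, here is a rough Python sketch that counts how many blocks would have to be read for each possible value, in both layouts. The helper <code>blocks_to_read</code> is hypothetical and only models the min/max test described above:</p>

```python
def blocks_to_read(blocks, pages_per_range, value):
    """Count the blocks in every range whose [min, max] could contain `value`."""
    total = 0
    for start in range(0, len(blocks), pages_per_range):
        chunk = blocks[start:start + pages_per_range]
        if min(chunk) <= value <= max(chunk):
            total += len(chunk)
    return total


ordered = list("ABCDEFGHI")
shuffled = list("BIEADGCHF")  # the physical layout from the second table

for name, layout in (("ordered", ordered), ("shuffled", shuffled)):
    per_value = {v: blocks_to_read(layout, 3, v) for v in sorted(layout)}
    average = sum(per_value.values()) / len(per_value)
    print(name, per_value, "average:", average)
```

<p>In this model every lookup in the ordered layout reads 3 blocks, while the shuffled layout reads 7 of the 9 blocks on average. Only the extreme values <code>A</code> and <code>I</code> still benefit from the index.</p>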
<hr>
<h2 id="correlation"><a class="toclink" href="#correlation">Correlation</a></h2>
<p>This is how the <a href="https://www.postgresql.org/docs/current/view-pg-stats.html" rel="noopener">PostgreSQL documentation defines correlation</a>:</p>
<blockquote>
<p>Statistical correlation between physical row ordering and logical ordering of the column values.</p>
</blockquote>
<p>Correlation is a statistical coefficient that measures how closely the logical order of a column's values matches their physical position in the table. In an append-only table, for example, an auto-incrementing key would have a perfect correlation of 1. A random value, however, should have very poor correlation, a value close to 0.</p>
<p>The database computes correlation when it analyzes the values in table columns. The result is stored in <code>pg_stats.correlation</code>, and the optimizer factors it in when deciding whether to use an index, and which one.</p>
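<p>To make the definition concrete, we can compute a plain Pearson coefficient between the physical and logical ordering of the two toy layouts from the previous section. This is only an illustration of the idea. PostgreSQL estimates the statistic from a sample of rows, and the helper <code>pearson</code> below is our own, not a PostgreSQL function:</p>

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


physical = [1, 2, 3, 4, 5, 6, 7, 8, 9]

ordered_logical = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # first example: A..I in order
shuffled_logical = [2, 9, 5, 1, 4, 7, 3, 8, 6]  # second example: B I E A D G C H F

print(pearson(physical, ordered_logical))   # 1.0
print(pearson(physical, shuffled_logical))  # 0.25
```

<p>The perfectly ordered layout scores 1, and the shuffled layout, where the BRIN index could not eliminate any block, scores only 0.25.</p>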
<p>Consider the following time series:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="nb">time</span><span class="w"> </span><span class="k">zone</span><span class="w"> </span><span class="n">UTC</span><span class="p">;</span>
<span class="go">SET</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">setseed</span><span class="p">(</span><span class="mf">0.4050</span><span class="p">);</span>
<span class="go"> setseed</span>
<span class="go">─────────</span>
<span class="go">(1 row)</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="nb">int</span><span class="p">,</span>
<span class="w"> </span><span class="n">happened_at</span><span class="w"> </span><span class="nb">timestamptz</span><span class="p">,</span>
<span class="w"> </span><span class="n">padding</span><span class="w"> </span><span class="nb">text</span>
<span class="p">);</span>
<span class="go">CREATE TABLE</span>
</pre></div>
<p>The table <code>t</code> contains an identity field <code>id</code>, a timestamp and some data. To get the ball rolling, we populate the table with 1M rows:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">id</span><span class="p">,</span>
<span class="w"> </span><span class="n">happened_at</span><span class="p">,</span>
<span class="w"> </span><span class="n">padding</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="n">n</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'2023-01-01 UTC'</span><span class="p">::</span><span class="n">timestamptz</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="nb">interval</span><span class="w"> </span><span class="s1">'1 second'</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">n</span><span class="p">),</span>
<span class="w"> </span><span class="n">generate_random_string</span><span class="p">(</span><span class="mi">1000</span><span class="p">)</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">1000000</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">n</span><span class="p">;</span>
<span class="go">INSERT 0 1000000</span>
</pre></div>
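<p>We add some padding to the table to make benchmarks more realistic. Note that <code>generate_random_string</code> is not a built-in PostgreSQL function; it is a user-defined helper, assumed here to return a random string of the given length. If you're curious about this approach read about <a href="/sql-medium-text-performance">the surprising impact of medium-size texts on PostgreSQL performance</a>.</p>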
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="o">*</span><span class="p">),</span><span class="w"> </span><span class="n">max</span><span class="p">(</span><span class="n">happened_at</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">t</span><span class="p">;</span>
<span class="go"> count │ max</span>
<span class="go">─────────┼────────────────────────</span>
<span class="go"> 1000000 │ 2023-01-12 13:46:40+00</span>
</pre></div>
<p>The table <code>t</code> now contains 1M rows with the timestamp field <code>happened_at</code> going up until <code>2023-01-12 13:46:40+00</code>.</p>
<p>You may have noticed that we created the table with incrementing timestamps. We did that to make sure we get perfect correlation:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">ANALYZE</span><span class="w"> </span><span class="n">t</span><span class="p">;</span>
<span class="go">ANALYZE</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">correlation</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">pg_stats</span>
<span class="k">WHERE</span><span class="w"> </span><span class="n">tablename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'t'</span>
<span class="k">AND</span><span class="w"> </span><span class="n">attname</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'happened_at'</span><span class="p">;</span>
<span class="go"> correlation</span>
<span class="go">─────────────</span>
<span class="go"> 1</span>
</pre></div>
<p>The correlation of the field <code>happened_at</code> is 1. This means we have perfect correlation between the physical ordering and the logical ordering of the values of the column <code>happened_at</code> in the table.</p>
<p>To establish a baseline, execute a query to find rows that happened within a one-minute window, with no indexes on the table:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">happened_at</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="s1">'2023-01-12 13:45:00 UTC'</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="s1">'2023-01-12 13:46:00 UTC'</span><span class="p">;</span>
<span class="go"> QUERY PLAN</span>
<span class="go">────────────────────────────────────────────────────────────────────────────────────────────────────</span>
<span class="go"> Gather (cost=1000.00..150108.10 rows=1 width=1016) (actual time=1330.979..1334.093 rows=61 loops=1)</span>
<span class="go"> Workers Planned: 2</span>
<span class="go"> Workers Launched: 2</span>
<span class="hll"><span class="go"> -> Parallel Seq Scan on t (cost=0.00..149108.00 rows=1 width=1016) (actual time=1308.808..1308.828 rows=20 loops=3)</span>
</span><span class="go"> Filter: ((happened_at >= '2023-01-12 15:45:00+02'::timestamp with time zone)</span>
<span class="go"> AND (happened_at <= '2023-01-12 15:46:00+02'::timestamp with time zone))</span>
<span class="go"> Rows Removed by Filter: 333313</span>
<span class="go"> Planning Time: 0.193 ms</span>
<span class="go"> JIT:</span>
<span class="go"> Functions: 6</span>
<span class="go"> Options: Inlining false, Optimization false, Expressions true, Deforming true</span>
<span class="go"> Timing: Generation 1.750 ms, Inlining 0.000 ms, Optimization 0.855 ms, Emission 10.032 ms, Total 12.637 ms</span>
<span class="hll"><span class="go"> Execution Time: 1334.676 ms</span>
</span></pre></div>
<p>With no indexes, a cold cache and two parallel workers, the query completed in ~1.3 seconds. After executing the query several times, the timing can drop to ~200ms.</p>
<hr>
<h2 id="brin-index-with-perfect-correlation"><a class="toclink" href="#brin-index-with-perfect-correlation">BRIN Index with Perfect Correlation</a></h2>
<p>Another way of thinking about correlation is that when searching for a range of values, it's very likely that rows with similar values will be in the same block, or an adjacent block. This property is <a href="/sql-tricks-application-dba#index-columns-with-high-correlation-using-brin">extremely important for BRIN indexes</a>, as the <a href="https://www.postgresql.org/docs/current/brin.html" rel="noopener">documentation emphasises</a>:</p>
<blockquote>
<p>BRIN is designed for handling very large tables in which certain columns have some natural correlation with their physical location within the table</p>
</blockquote>
<p>The <code>happened_at</code> column has perfect correlation so it's an ideal candidate for a BRIN index:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">t_happened_at_brin_minmax</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">t</span>
<span class="k">USING</span><span class="w"> </span><span class="n">brin</span><span class="p">(</span><span class="n">happened_at</span><span class="p">)</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="p">(</span><span class="n">pages_per_range</span><span class="o">=</span><span class="mf">10</span><span class="p">);</span>
<span class="go">CREATE INDEX</span>
</pre></div>
<p>We create a BRIN index on the field <code>happened_at</code>. We did not explicitly provide an opclass, so the default <code>minmax</code> is used. We also override the default <code>pages_per_range</code> and set it to 10. This means the BRIN index will keep the minimum and maximum values of the field <code>happened_at</code> for every range of 10 adjacent pages in the table.</p>
<p>Because BRIN is a lossy index, one of its main benefits is its small size:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="kp">\di+</span><span class="w"> </span><span class="ss">t_happened_at_brin_minmax</span>
<span class="go">List of relations</span>
<span class="go">─[ RECORD 1 ]─┬──────────────────────────</span>
<span class="go">Schema │ public</span>
<span class="go">Name │ t_happened_at_brin_minmax</span>
<span class="go">Type │ index</span>
<span class="go">Owner │ haki</span>
<span class="go">Table │ t</span>
<span class="go">Persistence │ permanent</span>
<span class="go">Access method │ brin</span>
<span class="hll"><span class="go">Size │ 520 kB</span>
</span><span class="go">Description │</span>
</pre></div>
<p>The size of the index is just 520 kB.</p>
<p>To see the BRIN index in action, execute the same query to find rows that happened within a one-minute window:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">happened_at</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="s1">'2023-01-12 13:45:00 UTC'</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="s1">'2023-01-12 13:46:00 UTC'</span><span class="p">;</span>
<span class="go"> QUERY PLAN</span>
<span class="go">──────────────────────────────────────────────────────────────────────────────────────────────────</span>
<span class="go">Bitmap Heap Scan on t (cost=218.00..494.40 rows=1 width=1016)</span>
<span class="go"> (actual time=8.400..8.464 rows=61 loops=1)</span>
<span class="go"> Recheck Cond: ((happened_at >= '2023-01-12 13:45:00+00'::timestamp with time zone)</span>
<span class="go"> AND (happened_at <= '2023-01-12 13:46:00+00'::timestamp with time zone))</span>
<span class="hll"><span class="go"> Rows Removed by Index Recheck: 59</span>
</span><span class="go"> Heap Blocks: lossy=18</span>
<span class="go"> -> Bitmap Index Scan on t_happened_at_brin_minmax (cost=0.00..218.00 rows=70 width=0)</span>
<span class="go"> (actual time=8.371..8.371 rows=180 loops=1)</span>
<span class="go"> Index Cond: ((happened_at >= '2023-01-12 13:45:00+00'::timestamp with time zone)</span>
<span class="go"> AND (happened_at <= '2023-01-12 13:46:00+00'::timestamp with time zone))</span>
<span class="go">Planning Time: 0.568 ms</span>
<span class="hll"><span class="go">Execution Time: 8.514 ms</span>
</span></pre></div>
<p>The database used the BRIN index and returned results pretty quickly, in only 8.5 ms. The "Bitmap Index Scan" on the BRIN index produced a list of 18 blocks that <em>may</em> contain relevant values. The database then read those 18 table blocks and rechecked the condition against the rows in these blocks. 59 rows did not match the condition and were "Removed by Index Recheck", leaving 61 rows in the final result.</p>
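<p>The numbers in the plan can be reproduced with some back-of-the-envelope arithmetic. The Python sketch below assumes rows are packed in <code>id</code> order at 7 rows per page (an assumption, based on rows of roughly 1 kB on 8 kB pages), and that the index matches exactly the ranges that contain the queried rows:</p>

```python
ROWS_PER_PAGE = 7      # assumption: ~1 kB rows on 8 kB pages
PAGES_PER_RANGE = 10   # pages_per_range used when creating the index
TOTAL_ROWS = 1_000_000

# happened_at is '2023-01-01' + id seconds, so the one-minute window
# 2023-01-12 13:45:00 .. 13:46:00 corresponds to ids 999900 .. 999960.
first_id, last_id = 999_900, 999_960


def page_of(row_id):
    """0-based heap page holding the row, assuming rows are packed in id order."""
    return (row_id - 1) // ROWS_PER_PAGE


def range_of(row_id):
    """0-based block range holding the row."""
    return page_of(row_id) // PAGES_PER_RANGE


last_page = page_of(TOTAL_ROWS)
matched_ranges = range(range_of(first_id), range_of(last_id) + 1)

first_block = matched_ranges[0] * PAGES_PER_RANGE
# The last range of the table is only partially filled.
last_block = min(matched_ranges[-1] * PAGES_PER_RANGE + PAGES_PER_RANGE - 1, last_page)
heap_blocks = last_block - first_block + 1

rows_read = min((last_block + 1) * ROWS_PER_PAGE, TOTAL_ROWS) - first_block * ROWS_PER_PAGE
rows_matching = last_id - first_id + 1

print(heap_blocks, rows_matching, rows_read - rows_matching)  # 18 61 59
```

<p>Under these assumptions the window straddles two block ranges, a full one with 10 pages and the last, partially filled one with 8 pages. That gives 18 heap blocks holding 120 rows, of which 61 match and 59 are discarded by the recheck.</p>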
<p>To better understand how a BRIN index works, we can use the <a href="https://www.postgresql.org/docs/current/pageinspect.html" rel="noopener"><code>pageinspect</code> extension</a> to view the contents of the actual index blocks:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">EXTENSION</span><span class="w"> </span><span class="n">pageinspect</span><span class="p">;</span>
<span class="go">CREATE EXTENSION</span>
</pre></div>
<p>To find the index blocks, we start by looking at the index metapage info:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">brin_metapage_info</span><span class="p">(</span><span class="n">get_raw_page</span><span class="p">(</span><span class="s1">'t_happened_at_brin_minmax'</span><span class="p">,</span><span class="w"> </span><span class="mf">0</span><span class="p">));</span>
<span class="go"> magic │ version │ pagesperrange │ lastrevmappage</span>
<span class="go">────────────┼─────────┼───────────────┼────────────────</span>
<span class="go"> 0xA8109CFA │ 1 │ 10 │ 11</span>
</pre></div>
<p>The index blocks start after the <code>lastrevmappage</code>. To view the contents of the first index block, we inspect the next block (11 + 1 = 12):</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">blknum</span><span class="p">,</span><span class="w"> </span><span class="k">value</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">brin_page_items</span><span class="p">(</span><span class="n">get_raw_page</span><span class="p">(</span><span class="s1">'t_happened_at_brin_minmax'</span><span class="p">,</span><span class="w"> </span><span class="mf">12</span><span class="p">),</span><span class="w"> </span><span class="s1">'t_happened_at_brin_minmax'</span><span class="p">)</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mf">3</span><span class="p">;</span>
<span class="go"> blknum │ value</span>
<span class="go">────────┼────────────────────────────────────────────────────</span>
<span class="hll"><span class="go"> 2910 │ {2023-01-01 05:39:31+00 .. 2023-01-01 05:40:40+00}</span>
</span><span class="go"> 2920 │ {2023-01-01 05:40:41+00 .. 2023-01-01 05:41:50+00}</span>
<span class="go"> 2930 │ {2023-01-01 05:41:51+00 .. 2023-01-01 05:43:00+00}</span>
</pre></div>
<p>Let's break down the first row:</p>
<ul>
<li>The range consists of 10 table pages (2910 through 2919; the next range starts at 2920). This is the value we provided for <code>pages_per_range</code> when we created the index.</li>
<li>The range contains values from <code>2023-01-01 05:39:31+00</code> to <code>2023-01-01 05:40:40+00</code>, that's 70 seconds. Since we inserted one row per second, the range should contain ~70 rows. Let's verify:</li>
</ul>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">t</span>
<span class="k">WHERE</span><span class="w"> </span><span class="n">happened_at</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="s1">'2023-01-01 05:39:31+00'</span><span class="o">::</span><span class="nb">timestamptz</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="s1">'2023-01-01 05:40:40+00'</span><span class="o">::</span><span class="nb">timestamptz</span><span class="p">;</span>
<span class="go"> count</span>
<span class="go">───────</span>
<span class="go"> 70</span>
</pre></div>
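<p>The figure of 70 rows per range is not a coincidence. A rough estimate of how many rows fit on a page gets us there. The sizes below are the usual PostgreSQL defaults, the row width is the one reported by <code>EXPLAIN</code> above, and the estimate ignores alignment padding, so treat it as a sketch rather than an exact accounting:</p>

```python
import math

PAGE_SIZE = 8192       # default PostgreSQL block size, in bytes
PAGE_HEADER = 24       # page header
TUPLE_HEADER = 24      # heap tuple header (23 bytes, padded to 24)
LINE_POINTER = 4       # item pointer kept on the page for every row
ROW_WIDTH = 1016       # "width=1016" reported by EXPLAIN
PAGES_PER_RANGE = 10   # pages_per_range of the index
TOTAL_ROWS = 1_000_000

rows_per_page = (PAGE_SIZE - PAGE_HEADER) // (TUPLE_HEADER + ROW_WIDTH + LINE_POINTER)
rows_per_range = rows_per_page * PAGES_PER_RANGE
table_pages = math.ceil(TOTAL_ROWS / rows_per_page)
index_ranges = math.ceil(table_pages / PAGES_PER_RANGE)

print(rows_per_page)   # 7
print(rows_per_range)  # 70
print(table_pages)     # 142858
print(index_ranges)    # 14286
```

<p>With 7 rows per page and 10 pages per range, every index entry summarizes about 70 rows, and the whole 1M row table is covered by roughly 14 thousand min/max pairs.</p>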
<hr>
<h2 id="brin-index-with-outliers"><a class="toclink" href="#brin-index-with-outliers">BRIN Index with Outliers</a></h2>
<p>So far we've used a BRIN index on a field with perfect correlation - this is the ideal situation for a BRIN index. Next, we create a similar table and introduce some outliers - extreme values that don't follow the natural correlation of the field:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">t_outliers</span><span class="w"> </span><span class="k">AS</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="n">id</span><span class="p">,</span>
<span class="w"> </span><span class="k">CASE</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">%</span><span class="w"> </span><span class="mf">70</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span>
<span class="hll"><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="n">happened_at</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="s1">'1 year'</span><span class="w"> </span><span class="c1">-- <-- Outlier</span>
</span><span class="w"> </span><span class="k">ELSE</span><span class="w"> </span><span class="n">happened_at</span>
<span class="w"> </span><span class="k">END</span><span class="p">,</span>
<span class="w"> </span><span class="n">padding</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">t</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">id</span><span class="p">;</span>
<span class="go">SELECT 1000000</span>
</pre></div>
<p>The new table is created from the previous table <code>t</code>. To introduce outliers, we add a year to the value of <code>happened_at</code> in every 70th row:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">happened_at</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">t_outliers</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="k">OFFSET</span><span class="w"> </span><span class="mf">65</span><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mf">10</span><span class="p">;</span>
<span class="go"> id │ happened_at</span>
<span class="go">────┼────────────────────────</span>
<span class="go"> 66 │ 2023-01-01 02:01:06+02</span>
<span class="go"> 67 │ 2023-01-01 02:01:07+02</span>
<span class="go"> 68 │ 2023-01-01 02:01:08+02</span>
<span class="go"> 69 │ 2023-01-01 02:01:09+02</span>
<span class="hll"><span class="go"> 70 │ 2024-01-01 02:01:10+02 -- <--- outlier</span>
</span><span class="go"> 71 │ 2023-01-01 02:01:11+02</span>
<span class="go"> 72 │ 2023-01-01 02:01:12+02</span>
<span class="go"> 73 │ 2023-01-01 02:01:13+02</span>
<span class="go"> 74 │ 2023-01-01 02:01:14+02</span>
<span class="go"> 75 │ 2023-01-01 02:01:15+02</span>
<span class="go">(10 rows)</span>
</pre></div>
<p>Without the outliers we had perfect correlation. So next, let's analyze the table and see what the correlation of the field <code>happened_at</code> is with the outliers:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">ANALYZE</span><span class="w"> </span><span class="n">t_outliers</span><span class="p">;</span>
<span class="go">ANALYZE</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">correlation</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">pg_stats</span>
<span class="k">WHERE</span><span class="w"> </span><span class="n">tablename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'t_outliers'</span>
<span class="k">AND</span><span class="w"> </span><span class="n">attname</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'happened_at'</span><span class="p">;</span>
<span class="go"> correlation</span>
<span class="go">─────────────</span>
<span class="hll"><span class="go"> 0.97344303</span>
</span></pre></div>
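<p>A value this high may seem surprising given that more than 14 thousand rows are a full year off. A scaled-down Python model gives a similar figure. It is only a sketch: it uses 70,000 rows instead of 1M, computes the coefficient over all rows rather than a sample as <code>ANALYZE</code> does, and the helper <code>rank_correlation</code> is our own:</p>

```python
def rank_correlation(values):
    """Pearson correlation between physical position and logical (sorted) rank."""
    n = len(values)
    positions = range(n)
    ranks = [0] * n
    for rank, pos in enumerate(sorted(positions, key=values.__getitem__)):
        ranks[pos] = rank
    mean = (n - 1) / 2  # positions and ranks are both permutations of 0..n-1
    cov = sum((p - mean) * (r - mean) for p, r in zip(positions, ranks))
    var = sum((p - mean) ** 2 for p in positions)
    return cov / var


YEAR = 365 * 24 * 60 * 60  # one year, in seconds
N_ROWS = 70_000            # scaled down from the 1M rows in the table

plain = list(range(1, N_ROWS + 1))                              # one row per second
with_outliers = [v + YEAR if v % 70 == 0 else v for v in plain]  # every 70th row is +1 year

print(rank_correlation(plain))                    # 1.0
print(round(rank_correlation(with_outliers), 2))  # roughly 0.97
```

<p>Only one row in 70 is out of place, so the coefficient barely moves. As a single number, correlation says very little about how those few rows are spread across the table.</p>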
<p>The correlation is no longer perfect, but at 0.97 it is still very high. Under normal circumstances, a field with this level of correlation is considered a good candidate for a BRIN index:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">t_outliers_happened_at_brin_minmax</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">t_outliers</span>
<span class="k">USING</span><span class="w"> </span><span class="n">brin</span><span class="p">(</span><span class="n">happened_at</span><span class="p">)</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="p">(</span><span class="n">pages_per_range</span><span class="o">=</span><span class="mf">10</span><span class="p">);</span>
<span class="go">CREATE INDEX</span>
</pre></div>
<p>With the BRIN index in place, execute the same query as before:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">t_outliers</span>
<span class="k">WHERE</span><span class="w"> </span><span class="n">happened_at</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="s1">'2023-01-12 13:45:00 UTC'</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="s1">'2023-01-12 13:46:00 UTC'</span><span class="p">;</span>
<span class="go"> QUERY PLAN</span>
<span class="go">───────────────────────────────────────────────────────────────────────────────────────────────────</span>
<span class="go"> Bitmap Heap Scan on t_outliers (cost=218.00..502.23 rows=1 width=1016)</span>
<span class="go"> (actual time=3257.316..3257.372 rows=60 loops=1)</span>
<span class="go"> Recheck Cond: ((happened_at >= '2023-01-12 13:45:00+00'::timestamp with time zone)</span>
<span class="go"> AND (happened_at <= '2023-01-12 13:46:00+00'::timestamp with time zone))</span>
<span class="hll"><span class="go"> Rows Removed by Index Recheck: 999940</span>
</span><span class="go"> Heap Blocks: lossy=142858</span>
<span class="go"> -> Bitmap Index Scan on t_outliers_happened_at_brin_minmax (cost=0.00..218.00 rows=72 width=0)</span>
<span class="go"> (actual time=28.085..28.085 rows=1428580 loops=1)</span>
<span class="go"> Index Cond: ((happened_at >= '2023-01-12 13:45:00+00'::timestamp with time zone)</span>
<span class="go"> AND (happened_at <= '2023-01-12 13:46:00+00'::timestamp with time zone))</span>
<span class="go"> Planning Time: 0.785 ms</span>
<span class="hll"><span class="go"> Execution Time: 3257.412 ms</span>
</span></pre></div>
<p>That's a big difference! This query completed in more than 3 seconds - that's ~380x slower than the 8.514 ms it took to execute the same query on the table without the outliers, and ~2.4x slower than the full table scan.</p>
<p>To investigate why this query took so much longer than the previous one, we need to take a closer look at the execution plan. The "Rows Removed by Index Recheck" gives us a hint - the database filtered out 999940 rows from the table blocks matched by the BRIN index. Remember, the table contains only 1M rows, so this query effectively scanned the entire table <em>and</em> the index. This is about as bad as it gets!</p>
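<p>A small simulation hints at what went wrong. The Python sketch below is a scaled-down model, not PostgreSQL's implementation: it assumes 70 rows per block range (7 rows per page times 10 pages per range), uses 7,000 rows instead of 1M, and represents timestamps as seconds since the start of the data:</p>

```python
YEAR = 365 * 24 * 60 * 60  # one year, in seconds
ROWS_PER_RANGE = 70        # assumption: 7 rows per page x 10 pages per range
N_ROWS = 7_000             # scaled down from 1M rows; gives 100 ranges


def build_ranges(values):
    """(min, max) summary for each run of ROWS_PER_RANGE adjacent rows."""
    return [
        (min(values[i:i + ROWS_PER_RANGE]), max(values[i:i + ROWS_PER_RANGE]))
        for i in range(0, len(values), ROWS_PER_RANGE)
    ]


def count_matching(ranges, lo, hi):
    """Number of ranges whose [min, max] overlaps the queried window [lo, hi]."""
    return sum(1 for range_min, range_max in ranges if range_min <= hi and range_max >= lo)


plain = list(range(1, N_ROWS + 1))                          # one row per second
outliers = [v + YEAR if v % 70 == 0 else v for v in plain]  # every 70th row is +1 year

lo, hi = N_ROWS - 60, N_ROWS  # a one-minute window at the end of the data

print(count_matching(build_ranges(plain), lo, hi))     # 1
print(count_matching(build_ranges(outliers), lo, hi))  # 100
```

<p>In this model every range holds exactly one outlier, so the maximum of every range is pushed a year ahead and the min/max test can no longer rule out any range that starts before the queried window. For a window at the very end of the data, that is all of them.</p>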
<p>To get a better sense of what happened here, let's examine the contents of the BRIN index using <code>pageinspect</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">brin_metapage_info</span><span class="p">(</span><span class="n">get_raw_page</span><span class="p">(</span><span class="s1">'t_outliers_happened_at_brin_minmax'</span><span class="p">,</span><span class="w"> </span><span class="mf">0</span><span class="p">));</span>
<span class="go"> magic │ version │ pagesperrange │ lastrevmappage</span>
<span class="go">────────────┼─────────┼───────────────┼────────────────</span>
<span class="go"> 0xA8109CFA │ 1 │ 10 │ 11</span>
<span class="go">(1 row)</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">blknum</span><span class="p">,</span><span class="w"> </span><span class="k">value</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">brin_page_items</span><span class="p">(</span><span class="n">get_raw_page</span><span class="p">(</span><span class="s1">'t_outliers_happened_at_brin_minmax'</span><span class="p">,</span><span class="w"> </span><span class="mf">12</span><span class="p">),</span><span class="w"> </span><span class="s1">'t_outliers_happened_at_brin_minmax'</span><span class="p">)</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mf">3</span><span class="p">;</span>
<span class="go"> blknum │ value</span>
<span class="go">────────┼────────────────────────────────────────────────────</span>
<span class="go"> 2910 │ {2023-01-01 05:39:31+00 .. 2024-01-01 05:40:40+00}</span>
<span class="go"> 2920 │ {2023-01-01 05:40:41+00 .. 2024-01-01 05:41:50+00}</span>
<span class="go"> 2930 │ {2023-01-01 05:41:51+00 .. 2024-01-01 05:43:00+00}</span>
<span class="go">(3 rows)</span>
</pre></div>
<p>Notice that with the outliers, each index entry covers a significantly wider range of values. Without the outliers, each range contained values in a ~70 second time span. With the outliers, each range spans an entire year! When the database used the index to identify pages that may satisfy the query, a lot of ranges matched. This caused the index to produce a lot of false positives, and as a result the database had to read and then sift out a lot of irrelevant rows.</p>
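<p>One way to see this effect without <code>pageinspect</code>, as a rough sketch, is to compute the minimum and maximum of <code>happened_at</code> for each group of 10 adjacent table pages, which approximates what a single minmax entry summarizes. The query assumes <code>pages_per_range=10</code> (as reported by the metapage above) and relies on the common trick of casting <code>ctid</code> to <code>point</code> to extract the block number. The column aliases are made up for this illustration:</p>
<div class="highlight"><pre>-- Approximate the summary of each BRIN page range straight from the table.
-- (ctid::text::point)[0] is the block number of the row.
SELECT
    ((ctid::text::point)[0]::bigint / 10) * 10 AS first_block,
    min(happened_at) AS range_min,
    max(happened_at) AS range_max
FROM t_outliers
GROUP BY 1
ORDER BY 1
LIMIT 3;
</pre></div>
<p>A page range that holds even a single outlier row ends up with a <code>range_max</code> far away from its <code>range_min</code>, which is exactly what makes the single minmax summary match so many queries.</p>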
<div class="dark--invert">
<figure><img alt="image by abstrakt design" src="https://hakibenita.com/images/01-postgres-correlation-brin-multi-minmax.png"><figcaption><small>image by <a href="https://www.abstrakt.design">abstrakt design</a></small></figcaption>
</figure>
</div>
<hr>
<h2 id="multi-minmax-brin-index-with-outliers"><a class="toclink" href="#multi-minmax-brin-index-with-outliers">Multi-Minmax BRIN Index with Outliers</a></h2>
<p>The default operator class for BRIN indexes is minmax. Using this operator class, each BRIN index entry contains a single minimum and maximum value for each range of adjacent pages in the table. With perfect correlation this is just fine, but with outliers, the single minmax range can cause the index to return a lot of false positives.</p>
<p>What if instead of keeping a single minimum and maximum value for each range, the database would keep multiple minimum and maximum values? This should make the BRIN index more resilient to outliers. In PostgreSQL 14, a <a href="https://www.postgresql.org/docs/current/brin-builtin-opclasses.html" rel="noopener">new set of operator classes <code>*_minmax_multi_ops</code></a> was added to BRIN that does exactly that.</p>
<p>Let's re-create the BRIN index on the table with outliers, but this time use the new multi-minmax operator class:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">DROP</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">t_outliers_happened_at_brin_minmax</span><span class="p">;</span>
<span class="go">DROP INDEX</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">t_outliers_happened_at_brin_multi_minmax</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">t_outliers</span>
<span class="hll"><span class="k">USING</span><span class="w"> </span><span class="n">brin</span><span class="p">(</span><span class="n">happened_at</span><span class="w"> </span><span class="n">timestamptz_minmax_multi_ops</span><span class="p">)</span>
</span><span class="k">WITH</span><span class="w"> </span><span class="p">(</span><span class="n">pages_per_range</span><span class="o">=</span><span class="mf">10</span><span class="p">);</span>
<span class="go">CREATE INDEX</span>
</pre></div>
<p>To create a multi-minmax BRIN index we explicitly provide a specific operator class. The type of the column <code>happened_at</code> is <code>timestamptz</code>, so we use the corresponding <code>timestamptz_minmax_multi_ops</code> operator class.</p>
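<p>If you are not sure which operator class matches your column type, the system catalogs can list them. The following is a minimal sketch that queries <code>pg_opclass</code> and <code>pg_am</code>; the column alias is made up for this illustration:</p>
<div class="highlight"><pre>-- List the multi-minmax BRIN operator classes and the type each one indexes.
SELECT
    opc.opcname,
    opc.opcintype::regtype AS indexed_type
FROM pg_opclass opc
JOIN pg_am am ON am.oid = opc.opcmethod
WHERE am.amname = 'brin'
  AND opc.opcname LIKE '%minmax_multi_ops'
ORDER BY opc.opcname;
</pre></div>
<p>On PostgreSQL 14 and later the list should include <code>timestamptz_minmax_multi_ops</code> and <code>int4_minmax_multi_ops</code>, the two operator classes used in this article.</p>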
<p>The multi-minmax index holds more information than the single minmax index, so it should be slightly bigger in size:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="kp">\di+</span><span class="w"> </span><span class="ss">t_outliers_happened_at_brin_multi_minmax</span>
<span class="go">List of relations</span>
<span class="go">─[ RECORD 1 ]─┬─────────────────────────────────────────</span>
<span class="go">Schema │ public</span>
<span class="go">Name │ t_outliers_happened_at_brin_multi_minmax</span>
<span class="go">Type │ index</span>
<span class="go">Owner │ haki</span>
<span class="go">Table │ t_outliers</span>
<span class="go">Persistence │ permanent</span>
<span class="go">Access method │ brin</span>
<span class="hll"><span class="go">Size │ 1304 kB</span>
</span><span class="go">Description │</span>
</pre></div>
<p>Indeed, the multi-minmax index is 1304 kB - bigger than the single minmax index, which weighs 520 kB.</p>
<p>To check if we get our money's worth for the multi-minmax index, execute the query on the table with the outliers:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">t_outliers</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">happened_at</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="s1">'2023-01-12 13:45:00 UTC'</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="s1">'2023-01-12 13:46:00 UTC'</span><span class="p">;</span>
<span class="go"> QUERY PLAN</span>
<span class="go">───────────────────────────────────────────────────────────────────────────────────────────────────────────</span>
<span class="go"> Bitmap Heap Scan on t_outliers (cost=1290.00..1574.23 rows=1 width=1016)</span>
<span class="go"> (actual time=29.518..29.566 rows=60 loops=1)</span>
<span class="go"> Recheck Cond: ((happened_at >= '2023-01-12 13:45:00+00'::timestamp with time zone)</span>
<span class="go"> AND (happened_at <= '2023-01-12 13:46:00+00'::timestamp with time zone))</span>
<span class="hll"><span class="go"> Rows Removed by Index Recheck: 60</span>
</span><span class="go"> Heap Blocks: lossy=18</span>
<span class="go"> -> Bitmap Index Scan on t_outliers_happened_at_brin_multi_minmax (cost=0.00..1290.00 rows=72 width=0)</span>
<span class="go"> (actual time=29.491..29.492 rows=180 loops=1)</span>
<span class="go"> Index Cond: ((happened_at >= '2023-01-12 13:45:00+00'::timestamp with time zone)</span>
<span class="go"> AND (happened_at <= '2023-01-12 13:46:00+00'::timestamp with time zone))</span>
<span class="go"> Planning Time: 5.807 ms</span>
<span class="hll"><span class="go"> Execution Time: 29.620 ms</span>
</span></pre></div>
<p>The database used the multi-minmax index and the query completed in 29ms - a big improvement compared to more than 3s with the single range minmax, and not much worse than the query on the table with no outliers which took 8ms.</p>
<p>Notice the number of "Rows Removed by Index Recheck" - it is once again very small, only 60 rows. This means the index did a good job at pointing the database to the most relevant pages in the table. In other words, the index produced very few false positives.</p>
<p>To better understand how the multi-minmax BRIN index achieved this improvement, inspect the contents of the first index block:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">blknum</span><span class="p">,</span><span class="w"> </span><span class="k">value</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">brin_page_items</span><span class="p">(</span><span class="n">get_raw_page</span><span class="p">(</span><span class="s1">'t_outliers_happened_at_brin_multi_minmax'</span><span class="p">,</span><span class="w"> </span><span class="mf">12</span><span class="p">),</span><span class="w"> </span><span class="s1">'t_outliers_happened_at_brin_multi_minmax'</span><span class="p">)</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mf">3</span><span class="p">;</span>
<span class="go">─[ RECORD 1 ]────────────────────────────────────────────────────────────</span>
<span class="go">blknum │ 4500</span>
<span class="go">value │ {{</span>
<span class="go"> nranges: 2</span>
<span class="go"> nvalues: 14</span>
<span class="go"> maxvalues: 32</span>
<span class="go"> ranges: {</span>
<span class="go"> "2023-01-01 08:45:01+00 ... 2023-01-01 08:45:02+00",</span>
<span class="go"> "2023-01-01 08:45:16+00 ... 2023-01-01 08:46:09+00"}</span>
<span class="go"> values: {</span>
<span class="go"> "2023-01-01 08:45:03+00",</span>
<span class="go"> "2023-01-01 08:45:04+00",</span>
<span class="go"> "2023-01-01 08:45:05+00",</span>
<span class="go"> "2023-01-01 08:45:06+00",</span>
<span class="go"> "2023-01-01 08:45:07+00",</span>
<span class="go"> "2023-01-01 08:45:08+00",</span>
<span class="go"> "2023-01-01 08:45:09+00",</span>
<span class="go"> "2023-01-01 08:45:10+00",</span>
<span class="go"> "2023-01-01 08:45:11+00",</span>
<span class="go"> "2023-01-01 08:45:12+00",</span>
<span class="go"> "2023-01-01 08:45:13+00",</span>
<span class="go"> "2023-01-01 08:45:14+00",</span>
<span class="go"> "2023-01-01 08:45:15+00",</span>
<span class="hll"><span class="go"> "2024-01-01 08:46:10+00"}}}</span>
</span></pre></div>
<p>Each entry in the multi-minmax BRIN index now contains both ranges and values. A value is essentially a single-value range. For example, the value <code>2023-01-01 08:45:15+00</code> is equivalent to the range <code>2023-01-01 08:45:15+00 .. 2023-01-01 08:45:15+00</code>. For optimization reasons, single-value ranges are kept as single values and not as ranges.</p>
<p>Potential outliers will usually be in the list of values. Notice that the outlier value <code>2024-01-01 08:46:10+00</code> is not in any of the ranges. This is how the multi-minmax BRIN index prevents outliers from affecting the index.</p>
<hr>
<h2 id="how-multi-minmax-brin-index-works"><a class="toclink" href="#how-multi-minmax-brin-index-works">How Multi-minmax BRIN Index Works</a></h2>
<p>In the previous example you may have noticed that the multi-minmax index entry contained a lot of <code>values</code>, some of which are consecutive:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">blknum</span><span class="p">,</span><span class="w"> </span><span class="k">value</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">brin_page_items</span><span class="p">(</span><span class="n">get_raw_page</span><span class="p">(</span><span class="s1">'t_outliers_happened_at_brin_multi_minmax'</span><span class="p">,</span><span class="w"> </span><span class="mf">12</span><span class="p">),</span><span class="w"> </span><span class="s1">'t_outliers_happened_at_brin_multi_minmax'</span><span class="p">)</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mf">3</span><span class="p">;</span>
<span class="go">─[ RECORD 1 ]────────────────────────────────────────────────────────────</span>
<span class="go">blknum │ 4500</span>
<span class="go">value │ {{</span>
<span class="go"> nranges: 2</span>
<span class="go"> nvalues: 14</span>
<span class="go"> maxvalues: 32</span>
<span class="go"> ranges: {</span>
<span class="go"> "2023-01-01 08:45:01+00 ... 2023-01-01 08:45:02+00",</span>
<span class="go"> "2023-01-01 08:45:16+00 ... 2023-01-01 08:46:09+00"}</span>
<span class="go"> values: {</span>
<span class="hll"><span class="go"> "2023-01-01 08:45:03+00",</span>
</span><span class="hll"><span class="go"> "2023-01-01 08:45:04+00",</span>
</span><span class="hll"><span class="go"> "2023-01-01 08:45:05+00",</span>
</span><span class="hll"><span class="go"> "2023-01-01 08:45:06+00",</span>
</span><span class="hll"><span class="go"> "2023-01-01 08:45:07+00",</span>
</span><span class="hll"><span class="go"> "2023-01-01 08:45:08+00",</span>
</span><span class="hll"><span class="go"> "2023-01-01 08:45:09+00",</span>
</span><span class="hll"><span class="go"> "2023-01-01 08:45:10+00",</span>
</span><span class="hll"><span class="go"> "2023-01-01 08:45:11+00",</span>
</span><span class="hll"><span class="go"> "2023-01-01 08:45:12+00",</span>
</span><span class="hll"><span class="go"> "2023-01-01 08:45:13+00",</span>
</span><span class="hll"><span class="go"> "2023-01-01 08:45:14+00",</span>
</span><span class="hll"><span class="go"> "2023-01-01 08:45:15+00",</span>
</span><span class="go"> "2024-01-01 08:46:10+00"}}}</span>
</pre></div>
<p>These values could potentially be combined into their own range of <code>2023-01-01 08:45:03+00 ... 2023-01-01 08:45:15+00</code>, or even added to the existing range <code>2023-01-01 08:45:01+00 ... 2023-01-01 08:45:02+00</code>. The reason they are not is the way the multi-minmax algorithm works.</p>
<p>A multi-minmax BRIN index defines <code>values_per_range</code> - the maximum number of values stored per range of adjacent pages. A minmax range takes up 2 values. When adding new values to a page range, the algorithm works as follows:</p>
<ul>
<li>Does the range of adjacent pages contain more than <code>values_per_range</code> values?<ul>
<li>No -> continue</li>
<li>Yes -> group values into minmax ranges to reduce the number of values to <code>values_per_range / 2</code></li>
</ul>
</li>
</ul>
<p>Consider the following simulation using a table with a single integer column <code>n</code>:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">t_brin_minmax</span><span class="w"> </span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="nb">int</span><span class="p">);</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">t_brin_minmax_index</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">t_brin_minmax</span>
<span class="hll"><span class="k">USING</span><span class="w"> </span><span class="n">brin</span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="n">int4_minmax_multi_ops</span><span class="p">(</span><span class="n">values_per_range</span><span class="o">=</span><span class="mi">8</span><span class="p">))</span>
</span><span class="k">WITH</span><span class="w"> </span><span class="p">(</span><span class="n">pages_per_range</span><span class="o">=</span><span class="mi">2</span><span class="p">);</span>
</pre></div>
<p>We create a table with a single integer field and a multi-minmax BRIN index with 2 pages per range, and a maximum of 8 values per range.</p>
<p>To visualize the multi-minmax BRIN algorithm we add values from 1 to 10 and for each value we print the contents of the index page:</p>
<div class="highlight"><pre><span></span><span class="k">DO</span><span class="w"> </span><span class="err">$$</span>
<span class="k">DECLARE</span>
<span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="nb">integer</span><span class="p">;</span>
<span class="w"> </span><span class="n">page_items</span><span class="w"> </span><span class="nb">text</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span>
<span class="k">BEGIN</span>
<span class="w"> </span><span class="k">TRUNCATE</span><span class="w"> </span><span class="n">t_brin_minmax</span><span class="p">;</span>
<span class="hll"><span class="w"> </span><span class="n">FOREACH</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="nb">ARRAY</span><span class="w"> </span><span class="nb">array</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="mi">6</span><span class="p">,</span><span class="w"> </span><span class="mi">7</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="p">,</span><span class="w"> </span><span class="mi">9</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">]</span><span class="w"> </span><span class="n">LOOP</span>
</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">t_brin_minmax</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="n">n</span><span class="p">);</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">page_items</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">brin_page_items</span><span class="p">(</span><span class="n">get_raw_page</span><span class="p">(</span><span class="s1">'t_brin_minmax_index'</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">),</span><span class="w"> </span><span class="s1">'t_brin_minmax_index'</span><span class="p">);</span>
<span class="w"> </span><span class="n">RAISE</span><span class="w"> </span><span class="n">NOTICE</span><span class="w"> </span><span class="s1">'inserted=% --> %'</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">page_items</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="n">LOOP</span><span class="p">;</span>
<span class="k">END</span><span class="w"> </span><span class="err">$$</span><span class="p">;</span>
<span class="n">NOTICE</span><span class="p">:</span><span class="w"> </span><span class="n">inserted</span><span class="o">=</span><span class="mi">1</span><span class="w"> </span><span class="c1">--> {{nranges: 0 nvalues: 1 maxvalues: 8 values: {1}}}</span>
<span class="n">NOTICE</span><span class="p">:</span><span class="w"> </span><span class="n">inserted</span><span class="o">=</span><span class="mi">2</span><span class="w"> </span><span class="c1">--> {{nranges: 0 nvalues: 2 maxvalues: 8 values: {1,2}}}</span>
<span class="n">NOTICE</span><span class="p">:</span><span class="w"> </span><span class="n">inserted</span><span class="o">=</span><span class="mi">3</span><span class="w"> </span><span class="c1">--> {{nranges: 0 nvalues: 3 maxvalues: 8 values: {1,2,3}}}</span>
<span class="hll"><span class="n">NOTICE</span><span class="p">:</span><span class="w"> </span><span class="n">inserted</span><span class="o">=</span><span class="mi">4</span><span class="w"> </span><span class="c1">--> {{nranges: 0 nvalues: 4 maxvalues: 8 values: {1,2,3,4}}}</span>
</span><span class="hll"><span class="n">NOTICE</span><span class="p">:</span><span class="w"> </span><span class="n">inserted</span><span class="o">=</span><span class="mi">5</span><span class="w"> </span><span class="c1">--> {{nranges: 1 nvalues: 3 maxvalues: 8 ranges: {"4 ... 5"} values: {1,2,3}}}</span>
</span><span class="n">NOTICE</span><span class="p">:</span><span class="w"> </span><span class="n">inserted</span><span class="o">=</span><span class="mi">6</span><span class="w"> </span><span class="c1">--> {{nranges: 1 nvalues: 3 maxvalues: 8 ranges: {"4 ... 6"} values: {1,2,3}}}</span>
<span class="n">NOTICE</span><span class="p">:</span><span class="w"> </span><span class="n">inserted</span><span class="o">=</span><span class="mi">7</span><span class="w"> </span><span class="c1">--> {{nranges: 1 nvalues: 3 maxvalues: 8 ranges: {"4 ... 7"} values: {1,2,3}}}</span>
<span class="n">NOTICE</span><span class="p">:</span><span class="w"> </span><span class="n">inserted</span><span class="o">=</span><span class="mi">8</span><span class="w"> </span><span class="c1">--> {{nranges: 1 nvalues: 3 maxvalues: 8 ranges: {"4 ... 8"} values: {1,2,3}}}</span>
<span class="n">NOTICE</span><span class="p">:</span><span class="w"> </span><span class="n">inserted</span><span class="o">=</span><span class="mi">9</span><span class="w"> </span><span class="c1">--> {{nranges: 1 nvalues: 3 maxvalues: 8 ranges: {"4 ... 9"} values: {1,2,3}}}</span>
<span class="n">NOTICE</span><span class="p">:</span><span class="w"> </span><span class="n">inserted</span><span class="o">=</span><span class="mi">10</span><span class="w"> </span><span class="c1">--> {{nranges: 1 nvalues: 3 maxvalues: 8 ranges: {"4 ... 10"} values: {1,2,3}}}</span>
</pre></div>
<p>Values <code>1</code>, <code>2</code>, <code>3</code>, <code>4</code> are added to the index one after the other, but they are not grouped into a range. Only when the value <code>5</code> is added does the number of values exceed <code>values_per_range / 2 = 4</code>, and the database starts grouping values to reduce the number of entries back to 4. At this point, the database joined <code>5</code> to <code>4</code> to create the range <code>4..5</code>, and the rest of the values are kept as values. As we add more values into the index, the range <code>4..5</code> is expanded until it reaches <code>4..10</code>. We don't exceed 4 entries, so values <code>1</code>, <code>2</code> and <code>3</code> remain as values.</p>
<p>The order of insertion is significant. If we insert values in a different order the resulting index will be different as well:</p>
<div class="highlight"><pre><span></span><span class="k">DO</span><span class="w"> </span><span class="err">$$</span>
<span class="k">DECLARE</span>
<span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="nb">integer</span><span class="p">;</span>
<span class="w"> </span><span class="n">page_items</span><span class="w"> </span><span class="nb">text</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span>
<span class="k">BEGIN</span>
<span class="w"> </span><span class="k">TRUNCATE</span><span class="w"> </span><span class="n">t_brin_minmax</span><span class="p">;</span>
<span class="hll"><span class="w"> </span><span class="n">FOREACH</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="nb">ARRAY</span><span class="w"> </span><span class="nb">array</span><span class="p">[</span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="mi">7</span><span class="p">,</span><span class="w"> </span><span class="mi">9</span><span class="p">,</span><span class="w"> </span><span class="mi">6</span><span class="p">]</span><span class="w"> </span><span class="n">LOOP</span>
</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">t_brin_minmax</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="n">n</span><span class="p">);</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">page_items</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">brin_page_items</span><span class="p">(</span><span class="n">get_raw_page</span><span class="p">(</span><span class="s1">'t_brin_minmax_index'</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">),</span><span class="w"> </span><span class="s1">'t_brin_minmax_index'</span><span class="p">);</span>
<span class="w"> </span><span class="n">RAISE</span><span class="w"> </span><span class="n">NOTICE</span><span class="w"> </span><span class="s1">'inserted=% --> %'</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">page_items</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="n">LOOP</span><span class="p">;</span>
<span class="k">END</span><span class="w"> </span><span class="err">$$</span><span class="p">;</span>
<span class="n">NOTICE</span><span class="p">:</span><span class="w"> </span><span class="n">inserted</span><span class="o">=</span><span class="mi">5</span><span class="w"> </span><span class="c1">--> {{nranges: 0 nvalues: 1 maxvalues: 8 values: {5}}}</span>
<span class="n">NOTICE</span><span class="p">:</span><span class="w"> </span><span class="n">inserted</span><span class="o">=</span><span class="mi">1</span><span class="w"> </span><span class="c1">--> {{nranges: 0 nvalues: 2 maxvalues: 8 values: {1,5}}}</span>
<span class="n">NOTICE</span><span class="p">:</span><span class="w"> </span><span class="n">inserted</span><span class="o">=</span><span class="mi">10</span><span class="w"> </span><span class="c1">--> {{nranges: 0 nvalues: 3 maxvalues: 8 values: {1,5,10}}}</span>
<span class="n">NOTICE</span><span class="p">:</span><span class="w"> </span><span class="n">inserted</span><span class="o">=</span><span class="mi">4</span><span class="w"> </span><span class="c1">--> {{nranges: 0 nvalues: 4 maxvalues: 8 values: {1,4,5,10}}}</span>
<span class="n">NOTICE</span><span class="p">:</span><span class="w"> </span><span class="n">inserted</span><span class="o">=</span><span class="mi">2</span><span class="w"> </span><span class="c1">--> {{nranges: 1 nvalues: 3 maxvalues: 8 ranges: {"4 ... 5"} values: {1,2,10}}}</span>
<span class="n">NOTICE</span><span class="p">:</span><span class="w"> </span><span class="n">inserted</span><span class="o">=</span><span class="mi">8</span><span class="w"> </span><span class="c1">--> {{nranges: 2 nvalues: 2 maxvalues: 8 ranges: {"1 ... 2","4 ... 5"} values: {8,10}}}</span>
<span class="n">NOTICE</span><span class="p">:</span><span class="w"> </span><span class="n">inserted</span><span class="o">=</span><span class="mi">3</span><span class="w"> </span><span class="c1">--> {{nranges: 2 nvalues: 2 maxvalues: 8 ranges: {"1 ... 2","3 ... 5"} values: {8,10}}}</span>
<span class="n">NOTICE</span><span class="p">:</span><span class="w"> </span><span class="n">inserted</span><span class="o">=</span><span class="mi">7</span><span class="w"> </span><span class="c1">--> {{nranges: 3 nvalues: 1 maxvalues: 8 ranges: {"1 ... 2","3 ... 5","7 ... 8"} values: {10}}}</span>
<span class="n">NOTICE</span><span class="p">:</span><span class="w"> </span><span class="n">inserted</span><span class="o">=</span><span class="mi">9</span><span class="w"> </span><span class="c1">--> {{nranges: 4 nvalues: 0 maxvalues: 8 ranges: {"1 ... 2","3 ... 5","7 ... 8","9 ... 10"}}}</span>
<span class="hll"><span class="n">NOTICE</span><span class="p">:</span><span class="w"> </span><span class="n">inserted</span><span class="o">=</span><span class="mi">6</span><span class="w"> </span><span class="c1">--> {{nranges: 3 nvalues: 1 maxvalues: 8 ranges: {"1 ... 2","3 ... 5","7 ... 10"} values: {6}}}</span>
</span></pre></div>
<p>The index is now different. Instead of 1 minmax range and 3 values, the index now contains 3 minmax ranges and 1 value.</p>
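<p>Because the summary depends on insertion order, it can be interesting to compare the incrementally built entry with one built from scratch. The following is a sketch only: it rebuilds the index, which summarizes the rows that are already in the table in a single pass, and then prints the same index page as before. It assumes the regular index page is still page 2 after the rebuild, and no output is shown because the result may differ from both runs above:</p>
<div class="highlight"><pre>-- Rebuild the index from the current table contents.
REINDEX INDEX t_brin_minmax_index;

-- Inspect the summary produced by the rebuild.
SELECT value
FROM brin_page_items(get_raw_page('t_brin_minmax_index', 2), 't_brin_minmax_index');
</pre></div>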
<hr>
<h2 id="results-summary"><a class="toclink" href="#results-summary">Results Summary</a></h2>
<p>As mentioned before, one of the main benefits of lossy indexes such as BRIN over other types of index (such as B-Tree or <a href="/postgresql-hash-index">Hash indexes</a>) is their size. To get a fair comparison for both types of index, consider a similar B-Tree index on <code>happened_at</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">t_happened_at_btree</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="n">happened_at</span><span class="p">);</span>
<span class="go">CREATE INDEX</span>
<span class="gp">db=#</span><span class="w"> </span><span class="kp">\di+</span><span class="w"> </span><span class="ss">t_happened_at_btree</span>
<span class="go">List of relations</span>
<span class="go">─[ RECORD 1 ]─┬────────────────────</span>
<span class="go">Schema │ public</span>
<span class="go">Name │ t_happened_at_btree</span>
<span class="go">Type │ index</span>
<span class="go">Owner │ haki</span>
<span class="go">Table │ t</span>
<span class="go">Persistence │ permanent</span>
<span class="go">Access method │ btree</span>
<span class="hll"><span class="go">Size │ 21 MB</span>
</span><span class="go">Description │</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">happened_at</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="s1">'2023-01-12 13:45:00 UTC'</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="s1">'2023-01-12 13:46:00 UTC'</span><span class="p">;</span>
<span class="go"> QUERY PLAN</span>
<span class="go">──────────────────────────────────────────────────────────────────────────────────────────────────</span>
<span class="go"> Index Scan using t_happened_at_btree on t (cost=0.42..18.75 rows=66 width=1016)</span>
<span class="go"> (actual time=0.036..0.109 rows=61 loops=1)</span>
<span class="go"> Index Cond: ((happened_at >= '2023-01-12 13:45:00+00'::timestamp with time zone)</span>
<span class="go"> AND (happened_at <= '2023-01-12 13:46:00+00'::timestamp with time zone))</span>
<span class="go"> Planning Time: 0.244 ms</span>
<span class="hll"><span class="go"> Execution Time: 0.145 ms</span>
</span><span class="go">(4 rows)</span>
</pre></div>
<p>The B-Tree index is much faster, but also much bigger!</p>
<p>Here is a recap of the timing and size of the indexes we used:</p>
<table>
<thead>
<tr>
<th>Scenario</th>
<th>Index</th>
<th>Timing</th>
<th>Index Size</th>
</tr>
</thead>
<tbody>
<tr>
<td>Perfect correlation</td>
<td>No index</td>
<td>1334 ms</td>
<td>-</td>
</tr>
<tr>
<td>Perfect correlation</td>
<td>BRIN minmax</td>
<td>8.52 ms</td>
<td>520 kB</td>
</tr>
<tr>
<td>Perfect correlation</td>
<td>B-Tree</td>
<td>0.145 ms</td>
<td>21 MB</td>
</tr>
<tr>
<td>Outliers</td>
<td>BRIN minmax</td>
<td>3257.412 ms</td>
<td>520 kB</td>
</tr>
<tr>
<td>Outliers</td>
<td>BRIN multi-minmax</td>
<td>29.620 ms</td>
<td>1304 kB</td>
</tr>
</tbody>
</table>
<p>The BRIN indexes with the single minmax value are the same size. The multi-minmax BRIN index is larger because it stores more information. This may be a small price to pay if your data contain outliers.</p>
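<p>To put the trade-off in numbers, the ratios can be worked out directly from the recap table. The following is a small Python sketch: the figures are copied from the table, the dictionary layout and variable names are my own, and the B-Tree size is converted assuming 1 MB = 1024 kB.</p>

```python
# Timing (ms) and index size (kB) copied from the recap table.
# The keys and the structure are illustrative, not part of the benchmark.
results = {
    ("perfect", "brin_minmax"): {"ms": 8.52, "kb": 520},
    ("perfect", "btree"): {"ms": 0.145, "kb": 21 * 1024},  # 21 MB, assuming 1 MB = 1024 kB
    ("outliers", "brin_minmax"): {"ms": 3257.412, "kb": 520},
    ("outliers", "brin_multi_minmax"): {"ms": 29.620, "kb": 1304},
}

btree = results[("perfect", "btree")]
brin = results[("perfect", "brin_minmax")]
minmax_outliers = results[("outliers", "brin_minmax")]
multi_outliers = results[("outliers", "brin_multi_minmax")]

# B-Tree vs. BRIN minmax on perfectly correlated data.
btree_size_ratio = btree["kb"] / brin["kb"]
btree_speedup = brin["ms"] / btree["ms"]

# BRIN multi-minmax vs. BRIN minmax on data with outliers.
multi_size_ratio = multi_outliers["kb"] / minmax_outliers["kb"]
multi_speedup = minmax_outliers["ms"] / multi_outliers["ms"]

print(f"B-Tree: {btree_size_ratio:.1f}x larger, {btree_speedup:.1f}x faster")
print(f"multi-minmax: {multi_size_ratio:.1f}x larger, {multi_speedup:.1f}x faster")
```

<p>With these figures the B-Tree is roughly 41 times larger than the BRIN index and about 59 times faster, while the multi-minmax index is about 2.5 times larger than the single minmax index and about 110 times faster on the data with outliers.</p>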
<hr>
<h2 id="the-back-story"><a class="toclink" href="#the-back-story">The Back Story</a></h2>
<p>The example in this article is contrived - it was engineered to inject an outlier into every range so that the single range BRIN would perform as badly as possible. However, this article was motivated by a similar scenario we reached naturally in our systems.</p>
<p>We have a table of payment processes. Payment processes are usually short-lived objects - in less than a minute a normal payment process is either authorized or aborted. However, once in a while a user may abandon a payment process in the middle, leaving the object in a non-terminal state. To overcome that, we've set up a scheduled task to identify these processes and mark them as expired.</p>
<p>When we create a new payment process we keep track of the creation date. It is used by analytical and scheduled queries. The creation date column had a high correlation (~0.97) so we figured a BRIN index would speed things up. At first, queries ran fast, but over time they became slower and slower, to the point where they were constantly timing out. We were puzzled - we had a field with high correlation but the BRIN index performed very poorly. We couldn't figure out why. Only when we inspected the contents of the index using <code>pageinspect</code> did we realize what the problem was.</p>
<p>An update command in PostgreSQL is equivalent to a delete followed by an insert. However, when you delete a row the space does not immediately become available for new data, so the updated tuple is written to the next available space in the table (because some of the updated columns are indexed, <a href="https://www.postgresql.org/docs/current/storage-hot.html" rel="noopener">HOT updates</a> don't help). Previously used space can be reused only after vacuum is executed. Vacuum does not run all the time; it is triggered automatically only after a certain amount of modifications to the table. The actual frequency depends on the storage settings of the table.</p>
<p>We eventually realized that our expire task shuffled the table, moving rows with recent creation date to old blocks, introducing outliers in the BRIN ranges.</p>
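<p>The effect of such outliers on a single minmax summary can be illustrated with a toy model. This is a rough Python sketch, not how PostgreSQL implements BRIN: the table is modeled as a list of blocks, each block is summarized by its minimum and maximum value, and the block count, values and function names are all made up for illustration.</p>

```python
def minmax_summaries(blocks):
    """Summarize every block by its (min, max), like a single minmax BRIN range."""
    return [(min(block), max(block)) for block in blocks]


def blocks_to_scan(summaries, lo, hi):
    """Return the blocks whose [min, max] range overlaps the searched range."""
    return [
        i for i, (block_min, block_max) in enumerate(summaries)
        if block_min <= hi and block_max >= lo
    ]


# 10 blocks of 100 rows each, with values perfectly correlated with position.
blocks = [list(range(start, start + 100)) for start in range(0, 1000, 100)]

# A query for recent values only needs to read the last block.
scanned_before = blocks_to_scan(minmax_summaries(blocks), 950, 960)

# Simulate updated tuples with a recent value landing in freed space
# inside the first five (old) blocks.
for old_block in blocks[:5]:
    old_block[0] = 955

# Every block that received an outlier now overlaps the searched range.
scanned_after = blocks_to_scan(minmax_summaries(blocks), 950, 960)

print(scanned_before)
print(scanned_after)
```

<p>A single outlier is enough to stretch the range of a block so that it matches the query, even though almost none of its rows are relevant.</p>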
<p>After upgrading our PostgreSQL installation we switched the single minmax BRIN index to a multi minmax BRIN index and the problem was solved. The old BRIN index was 1.6MB and by the time we got to benchmark it, the query timed out (i.e. it ran for more than 5 minutes). After we switched to multi minmax, the size of the index was 12MB and the benchmark query finished in 5s - more than acceptable for what we used it for.</p>Future Proofing SQL with Carefully Placed Errors2022-10-06T00:00:00+03:002022-10-06T00:00:00+03:00Haki Benitatag:hakibenita.com,2022-10-06:/future-proof-sql<p>There are many best practices for maintaining backward and forward compatibility in application code, but it's not very commonly mentioned in relation to SQL. SQL is used to produce critical business information for applications and decision-making, so there's no reason it shouldn't benefit from similar practices. In this article, I present a simple way to future-proof SQL.</p><hr>
<p>Backward compatibility is straightforward. You have full control over new code and you have full knowledge of past data and APIs. Forward compatibility is more challenging. You have full control over new code, but you don't know how data is going to change in the future, and what types of API you're going to have to support.</p>
<p>There are many best practices for maintaining backward and forward compatibility in application code, but it's not very commonly mentioned in relation to SQL. SQL is used to produce critical business information for applications and decision-making, so there's no reason it shouldn't benefit from similar practices.</p>
<p><strong>In this article, I present one simple way to future-proof SQL.</strong></p>
<div class="dark--invert">
<figure><img alt="When you make a silly mistake... (image by abstrakt design)" src="https://hakibenita.com/images/00-future-proof-sql.png"><figcaption>When you make a silly mistake...<br><small>image by <a href="https://www.abstrakt.design">abstrakt design</a></small></figcaption>
</figure>
</div>
<p><details class="toc-container" open>
<summary>Table of Contents</summary></p>
<div class="toc">
<ul>
<li><a href="#a-simple-payment-system">A Simple Payment System</a><ul>
<li><a href="#calculating-the-commission">Calculating the commission</a></li>
<li><a href="#adding-a-new-payment-method">Adding a new payment method</a></li>
</ul>
</li>
<li><a href="#future-proofing-sql">Future Proofing SQL</a><ul>
<li><a href="#failing-on-purpose">Failing on purpose</a></li>
<li><a href="#assert-never-in-sql">Assert never in SQL</a></li>
<li><a href="#assert-never">Assert never</a></li>
</ul>
</li>
<li><a href="#failing-without-a-function">Failing Without a Function</a><ul>
<li><a href="#abusing-division-by-zero">Abusing division-by-zero</a></li>
<li><a href="#abusing-cast">Abusing cast</a></li>
<li><a href="#abusing-cast-for-none-text-types">Abusing cast for non-text types</a></li>
</ul>
</li>
</ul>
</div>
<p></details></p>
<hr>
<h2 id="a-simple-payment-system"><a class="toclink" href="#a-simple-payment-system">A Simple Payment System</a></h2>
<p>Say you have a payment system where your customers can charge their customers for products. The table can look like this:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">payment</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="nb">INT</span><span class="w"> </span><span class="k">GENERATED</span><span class="w"> </span><span class="k">ALWAYS</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="k">IDENTITY</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="p">,</span>
<span class="w"> </span><span class="k">method</span><span class="w"> </span><span class="nb">TEXT</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="w"> </span><span class="k">CONSTRAINT</span><span class="w"> </span><span class="n">payment_method_check</span><span class="w"> </span><span class="k">CHECK</span><span class="w"> </span><span class="p">(</span><span class="k">method</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="p">(</span><span class="s1">'credit_card'</span><span class="p">,</span><span class="w"> </span><span class="s1">'cash'</span><span class="p">)),</span>
<span class="w"> </span><span class="n">amount</span><span class="w"> </span><span class="nb">INT</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">);</span>
<span class="go">CREATE TABLE</span>
</pre></div>
<p>You provide your users with two payment options, cash or credit card:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">payment</span><span class="w"> </span><span class="p">(</span><span class="k">method</span><span class="p">,</span><span class="w"> </span><span class="n">amount</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'cash'</span><span class="p">,</span><span class="w"> </span><span class="mf">10000</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'credit_card'</span><span class="p">,</span><span class="w"> </span><span class="mf">12000</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'credit_card'</span><span class="p">,</span><span class="w"> </span><span class="mf">5000</span><span class="p">);</span>
<span class="go">INSERT 0 3</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">payment</span><span class="p">;</span>
<span class="go"> id │ method │ amount</span>
<span class="go">────┼─────────────┼───────</span>
<span class="go"> 1 │ cash │ 10000</span>
<span class="go"> 2 │ credit_card │ 12000</span>
<span class="go"> 3 │ credit_card │ 5000</span>
<span class="go">(3 rows)</span>
</pre></div>
<h3 id="calculating-the-commission"><a class="toclink" href="#calculating-the-commission">Calculating the commission</a></h3>
<p>To charge for your service, you use the following query to calculate the commission for each payment based on the payment method:</p>
<div class="highlight"><pre><span></span><span class="c1">-- calculate_commission.sql</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">payments</span><span class="p">,</span>
<span class="w"> </span><span class="k">SUM</span><span class="p">(</span>
<span class="w"> </span><span class="k">CASE</span><span class="w"> </span><span class="k">method</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="s1">'cash'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="mi">100</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="s1">'credit_card'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="mi">30</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">amount</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">0</span><span class="p">.</span><span class="mi">02</span>
<span class="w"> </span><span class="k">END</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">commission</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">payment</span><span class="p">;</span>
</pre></div>
<p>For cash payments you charge a flat fee of 1$ (100 cents), and for credit card payments you charge a flat fee of 30 cents plus 2% of the charged amount.</p>
<p>This is the commission for the first 3 payment processes:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="kp">\i</span><span class="w"> </span><span class="ss">calculate_commission.sql</span>
<span class="go"> payments │ commission</span>
<span class="go">──────────┼───────────</span>
<span class="go"> 3 │ 500.00</span>
<span class="go">(1 row)</span>
</pre></div>
<p>Congrats! You just made your first 5$.</p>
<h3 id="adding-a-new-payment-method"><a class="toclink" href="#adding-a-new-payment-method">Adding a new payment method</a></h3>
<p>Time goes by and your payment system is becoming a real hit! Demand for your service is skyrocketing and your customers ask for more payment methods. You give it some careful thought and decide to introduce a new payment method - a bank transfer:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">payment</span><span class="w"> </span><span class="k">DROP</span><span class="w"> </span><span class="k">CONSTRAINT</span><span class="w"> </span><span class="n">payment_method_check</span><span class="p">;</span>
<span class="go">ALTER TABLE</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">payment</span><span class="w"> </span><span class="k">ADD</span><span class="w"> </span><span class="k">CONSTRAINT</span><span class="w"> </span><span class="n">payment_method_check</span>
<span class="hll"><span class="w"> </span><span class="k">CHECK</span><span class="w"> </span><span class="p">(</span><span class="k">method</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="p">(</span><span class="s1">'credit_card'</span><span class="p">,</span><span class="w"> </span><span class="s1">'cash'</span><span class="p">,</span><span class="w"> </span><span class="s1">'bank_transfer'</span><span class="p">));</span>
</span><span class="go">ALTER TABLE</span>
</pre></div>
<p>A few more months go by and the new payment method is proving to be a real hit:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">payment</span><span class="w"> </span><span class="p">(</span><span class="k">method</span><span class="p">,</span><span class="w"> </span><span class="n">amount</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'bank_transfer'</span><span class="p">,</span><span class="w"> </span><span class="mi">9000</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'bank_transfer'</span><span class="p">,</span><span class="w"> </span><span class="mi">15000</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'bank_transfer'</span><span class="p">,</span><span class="w"> </span><span class="mi">30000</span><span class="p">);</span>
<span class="go">INSERT 0 3</span>
</pre></div>
<p>You process more payments than you ever imagined, but something is off:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="kp">\i</span><span class="w"> </span><span class="ss">calculate_commission.sql</span>
<span class="go"> payments │ commission</span>
<span class="go">──────────┼───────────</span>
<span class="go"> 6 │ 500.00</span>
<span class="go">(1 row)</span>
</pre></div>
<p>You process all these payments but your revenue stays the same, how come?</p>
<hr>
<h2 id="future-proofing-sql"><a class="toclink" href="#future-proofing-sql">Future Proofing SQL</a></h2>
<p>When you added the new payment method you did not edit the query that calculates the commission. The query never failed, no exception or warning was raised and <strong>you completely forgot about it!</strong></p>
<p>This type of scenario is pretty common. SQL is usually not statically checked, so unless you have automated tests for this specific query, it can easily go unnoticed!</p>
<h3 id="failing-on-purpose"><a class="toclink" href="#failing-on-purpose">Failing on purpose</a></h3>
<p>Errors have a bad reputation, but in fact, they are pretty useful. If the query threw an error when it encountered an unknown payment method, you could have caught this mistake and fixed it immediately.</p>
<p>Recall the query to calculate the commission:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">payments</span><span class="p">,</span>
<span class="w"> </span><span class="k">SUM</span><span class="p">(</span>
<span class="w"> </span><span class="k">CASE</span><span class="w"> </span><span class="k">method</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="s1">'cash'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="mi">100</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="s1">'credit_card'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="mi">30</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">amount</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">0</span><span class="p">.</span><span class="mi">02</span>
<span class="w"> </span><span class="k">END</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">commission</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">payment</span><span class="p">;</span>
</pre></div>
<p>The query uses a <code>CASE</code> expression to calculate the commission for each payment method. The expression does not define what should happen when the method does not match any of the <code>WHEN</code> branches, so it implicitly evaluates to <code>NULL</code>, and the aggregate function silently ignores it.</p>
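<p>The silent <code>NULL</code> is easy to reproduce outside of PostgreSQL. Here is a minimal sketch using Python's built-in <code>sqlite3</code> module as a stand-in database; SQLite is my assumption for the sake of a self-contained example, and the table is a simplified version of the one above without the check constraint:</p>

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment ("
    " id INTEGER PRIMARY KEY,"
    " method TEXT NOT NULL,"
    " amount INT NOT NULL)"
)
conn.executemany(
    "INSERT INTO payment (method, amount) VALUES (?, ?)",
    [
        ("cash", 10000),
        ("credit_card", 12000),
        ("credit_card", 5000),
        ("bank_transfer", 9000),
        ("bank_transfer", 15000),
        ("bank_transfer", 30000),
    ],
)

# Same CASE expression as in the article, with no ELSE branch.
payments, commission = conn.execute(
    """
    SELECT
        COUNT(*),
        SUM(
            CASE method
                WHEN 'cash' THEN 100
                WHEN 'credit_card' THEN 30 + amount * 0.02
            END
        )
    FROM payment
    """
).fetchone()

# Count the rows for which the CASE expression evaluates to NULL.
null_rows = conn.execute(
    """
    SELECT COUNT(*)
    FROM payment
    WHERE (
        CASE method
            WHEN 'cash' THEN 100
            WHEN 'credit_card' THEN 30 + amount * 0.02
        END
    ) IS NULL
    """
).fetchone()[0]

print(payments, commission, null_rows)
```

<p>All six payments are counted, but the three bank transfers contribute nothing to the sum, and nothing in the result hints that they were skipped.</p>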
<p>What if, instead of implicitly evaluating to <code>NULL</code>, we triggered an error?</p>
<h3 id="assert-never-in-sql"><a class="toclink" href="#assert-never-in-sql">Assert never in SQL</a></h3>
<p>To trigger an error in PostgreSQL we can write a simple function:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">assert_never</span><span class="p">(</span><span class="n">v</span><span class="w"> </span><span class="nb">anyelement</span><span class="p">)</span>
<span class="k">RETURNS</span><span class="w"> </span><span class="nb">anyelement</span>
<span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="w"> </span><span class="k">AS</span>
<span class="s">$$</span>
<span class="k">BEGIN</span>
<span class="hll"><span class="w"> </span><span class="k">RAISE</span><span class="w"> </span><span class="k">EXCEPTION</span><span class="w"> </span><span class="s1">'Unhandled value "%"'</span><span class="p">,</span><span class="w"> </span><span class="n">v</span><span class="p">;</span>
</span><span class="k">END</span><span class="p">;</span>
<span class="s">$$</span><span class="p">;</span>
</pre></div>
<p>The function accepts an argument of any type, and raises an exception:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">assert_never</span><span class="p">(</span><span class="mf">1</span><span class="p">);</span>
<span class="gs">ERROR:</span><span class="gr"> Unhandled value "1"</span>
<span class="gs">CONTEXT:</span><span class="gr"> PL/pgSQL function assert_never(anyelement) line 3 at RAISE</span>
</pre></div>
<p>To trigger an error when the query encounters an unknown value, we can call the function when the expression reaches the <code>ELSE</code> part:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">payments</span><span class="p">,</span>
<span class="w"> </span><span class="n">SUM</span><span class="p">(</span>
<span class="w"> </span><span class="k">CASE</span><span class="w"> </span><span class="k">method</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="s1">'cash'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="mf">100</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="s1">'credit_card'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="mf">30</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">amount</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">0.02</span>
<span class="hll"><span class="w"> </span><span class="k">ELSE</span><span class="w"> </span><span class="n">assert_never</span><span class="p">(</span><span class="k">method</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span>
</span><span class="w"> </span><span class="k">END</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">commission</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">payment</span><span class="p">;</span>
<span class="hll"><span class="gs">ERROR:</span><span class="gr"> Unhandled value "bank_transfer"</span>
</span><span class="gs">CONTEXT:</span><span class="gr"> PL/pgSQL function assert_never(anyelement) line 3 at RAISE</span>
</pre></div>
<p>This is great! The query encountered the unhandled payment method <code>bank_transfer</code>, and failed. The error also includes the value we forgot to handle, which makes it especially useful for debugging.</p>
<p>The error forces the developer to handle the exception in one of the following ways:</p>
<ul>
<li>Explicitly exclude the unhandled value:</li>
</ul>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">payments</span><span class="p">,</span>
<span class="w"> </span><span class="n">SUM</span><span class="p">(</span>
<span class="w"> </span><span class="k">CASE</span><span class="w"> </span><span class="k">method</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="s1">'cash'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="mf">100</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="s1">'credit_card'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="mf">30</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">amount</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">0.02</span>
<span class="w"> </span><span class="k">ELSE</span><span class="w"> </span><span class="n">assert_never</span><span class="p">(</span><span class="k">method</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span>
<span class="w"> </span><span class="k">END</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">commission</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">payment</span>
<span class="hll"><span class="k">WHERE</span>
</span><span class="hll"><span class="w"> </span><span class="k">method</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="p">(</span><span class="s1">'cash'</span><span class="p">,</span><span class="w"> </span><span class="s1">'credit_card'</span><span class="p">);</span>
</span>
<span class="go"> payments │ commission</span>
<span class="go">──────────┼───────────</span>
<span class="go"> 3 │ 500.00</span>
</pre></div>
<p>The developer can decide to exclude this value explicitly. Maybe it's not relevant, maybe it's being handled by a different query. Either way, the value is now excluded explicitly and not simply overlooked.</p>
<ul>
<li>Handle the new value:</li>
</ul>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">payments</span><span class="p">,</span>
<span class="w"> </span><span class="n">SUM</span><span class="p">(</span>
<span class="w"> </span><span class="k">CASE</span><span class="w"> </span><span class="k">method</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="s1">'cash'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="mf">100</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="s1">'credit_card'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="mf">30</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">amount</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">0.02</span>
<span class="hll"><span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="s1">'bank_transfer'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="mf">50</span>
</span><span class="w"> </span><span class="k">ELSE</span><span class="w"> </span><span class="n">assert_never</span><span class="p">(</span><span class="k">method</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span>
<span class="w"> </span><span class="k">END</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">commission</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">payment</span><span class="p">;</span>
<span class="go"> payments │ commission</span>
<span class="go">──────────┼───────────</span>
<span class="go"> 6 │ 650.00</span>
</pre></div>
<p>The developer spotted the mistake and added the commission for the unhandled payment method to the query. Mistake averted!</p>
<p>In both cases, the results are now accurate, and the query is safer.</p>
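<p>The same pattern can be sketched in other databases that support user-defined functions. The example below uses Python's <code>sqlite3</code> module, which is my own choice for a self-contained demonstration and not something the PostgreSQL function above depends on. It registers a hypothetical <code>assert_never</code> callback with <code>create_function</code> and relies on SQLite evaluating <code>CASE</code> branches lazily, so the function is only called for rows that reach <code>ELSE</code>:</p>

```python
import sqlite3


def assert_never(value):
    """Hypothetical Python counterpart of the PL/pgSQL function: always raises."""
    raise ValueError(f'Unhandled value "{value}"')


conn = sqlite3.connect(":memory:")
# Expose the Python callback to SQL as a one-argument function.
conn.create_function("assert_never", 1, assert_never)
conn.execute("CREATE TABLE payment (method TEXT NOT NULL, amount INT NOT NULL)")
conn.executemany(
    "INSERT INTO payment (method, amount) VALUES (?, ?)",
    [
        ("cash", 10000),
        ("credit_card", 12000),
        ("credit_card", 5000),
        ("bank_transfer", 9000),
        ("bank_transfer", 15000),
        ("bank_transfer", 30000),
    ],
)

unhandled_query = """
    SELECT COUNT(*), SUM(
        CASE method
            WHEN 'cash' THEN 100
            WHEN 'credit_card' THEN 30 + amount * 0.02
            ELSE assert_never(method)
        END
    )
    FROM payment
"""

handled_query = """
    SELECT COUNT(*), SUM(
        CASE method
            WHEN 'cash' THEN 100
            WHEN 'credit_card' THEN 30 + amount * 0.02
            WHEN 'bank_transfer' THEN 50
            ELSE assert_never(method)
        END
    )
    FROM payment
"""

# The unhandled method reaches ELSE, the callback raises, and the query fails.
try:
    conn.execute(unhandled_query).fetchone()
    query_failed = False
except sqlite3.Error:
    query_failed = True

# Once every method is handled, ELSE is never reached and the query succeeds.
payments, commission = conn.execute(handled_query).fetchone()
print(query_failed, payments, commission)
```

<p>One difference worth noting: the error SQLite surfaces for a failing callback is more generic than the PostgreSQL message, so the unhandled value may not appear in it.</p>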
<h3 id="assert-never"><a class="toclink" href="#assert-never">Assert never</a></h3>
<p>Exhaustive checking is a common pattern in many languages to make sure all possible values are handled. I wrote about <a href="python-mypy-exhaustive-checking">exhaustive checking in Python</a> in the past, where I demonstrated how to implement a similar function named <code>assert_never</code> in Python.</p>
<p>Fortunately, since that article was published, the function <a href="https://docs.python.org/3.11/library/typing.html#typing.assert_never" rel="noopener"><code>assert_never</code></a> found its way into the built-in <a href="https://docs.python.org/3.11/library/typing.html" rel="noopener">typing module</a> in Python 3.11, and it can be used to perform exhaustive checking:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">assert_never</span><span class="p">,</span> <span class="n">Literal</span>
<span class="k">def</span> <span class="nf">calculate_commission</span><span class="p">(</span>
    <span class="n">method</span><span class="p">:</span> <span class="n">Literal</span><span class="p">[</span><span class="s1">'cash'</span><span class="p">,</span> <span class="s1">'credit_card'</span><span class="p">,</span> <span class="s1">'bank_transfer'</span><span class="p">],</span>
    <span class="n">amount</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span>
<span class="p">)</span> <span class="o">-></span> <span class="nb">float</span><span class="p">:</span>
    <span class="k">if</span> <span class="n">method</span> <span class="o">==</span> <span class="s1">'cash'</span><span class="p">:</span>
        <span class="k">return</span> <span class="mi">100</span>
    <span class="k">elif</span> <span class="n">method</span> <span class="o">==</span> <span class="s1">'credit_card'</span><span class="p">:</span>
        <span class="k">return</span> <span class="mi">30</span> <span class="o">+</span> <span class="mf">0.02</span> <span class="o">*</span> <span class="n">amount</span>
    <span class="k">else</span><span class="p">:</span>
<span class="hll">        <span class="n">assert_never</span><span class="p">(</span><span class="n">method</span><span class="p">)</span>
</span></pre></div>
<p>Running this code in <a href="http://mypy-lang.org/" rel="noopener">Mypy</a>, an optional static type checker for Python, will produce the following error:</p>
<div class="highlight"><pre><span></span>error: Argument 1 to "assert_never" has incompatible type "Literal['bank_transfer']";
expected "NoReturn"
</pre></div>
<p>Just like the <code>assert_never</code> function in SQL, the error warns about an unhandled value "bank_transfer". Unlike the function in SQL, the problem is reported during static analysis, before the code ever runs.</p>
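<p>For completeness, here is a sketch of the function with every method handled. The 50 cent flat fee for <code>bank_transfer</code> is taken from the SQL fix earlier in the article; the fallback definition for Python versions before 3.11 and the <code>"cheque"</code> value are my own additions for illustration:</p>

```python
from typing import Literal, NoReturn

try:
    from typing import assert_never  # available since Python 3.11
except ImportError:
    # Hypothetical fallback for older Python versions.
    def assert_never(value: NoReturn) -> NoReturn:
        raise AssertionError(f"Unhandled value: {value!r}")


Method = Literal["cash", "credit_card", "bank_transfer"]


def calculate_commission(method: Method, amount: int) -> float:
    if method == "cash":
        return 100
    elif method == "credit_card":
        return 30 + 0.02 * amount
    elif method == "bank_transfer":
        return 50
    else:
        # All declared methods are handled above, so a type checker
        # can prove this branch is unreachable.
        assert_never(method)


print(calculate_commission("bank_transfer", 9000))

# A value outside the declared type slips past static analysis only if the
# caller ignores the type checker; at run time assert_never still raises.
try:
    calculate_commission("cheque", 1000)  # type: ignore[arg-type]
except AssertionError as error:
    print(f"rejected: {error}")
```

<p>With all three methods handled, a type checker should no longer flag the call to <code>assert_never</code>, and the function still fails loudly if an undeclared value reaches it at run time.</p>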
<hr>
<h2 id="failing-without-a-function"><a class="toclink" href="#failing-without-a-function">Failing Without a Function</a></h2>
<p>If for some reason you can't or don't want to use functions, there are other ways to trigger errors in SQL.</p>
<h3 id="abusing-division-by-zero"><a class="toclink" href="#abusing-division-by-zero">Abusing division-by-zero</a></h3>
<p>The go-to way to trigger errors in any programming language is to divide some number by zero:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">payments</span><span class="p">,</span>
<span class="w"> </span><span class="n">SUM</span><span class="p">(</span>
<span class="w"> </span><span class="k">CASE</span><span class="w"> </span><span class="k">method</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="s1">'cash'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="mf">100</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="s1">'credit_card'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="mf">30</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">amount</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">0.02</span>
<span class="hll"><span class="w"> </span><span class="k">ELSE</span><span class="w"> </span><span class="mf">1</span><span class="o">/</span><span class="mf">0</span><span class="w"> </span><span class="c1">-- intentional</span>
</span><span class="w"> </span><span class="k">END</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">commission</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">payment</span><span class="p">;</span>
<span class="hll"><span class="gs">ERROR:</span><span class="gr"> division by zero</span>
</span></pre></div>
<p>Instead of returning <code>NULL</code> when the method is not handled, we divide 1 by 0 to trigger a zero division error. The query failed as we wanted, but this approach does not work quite as we might expect.</p>
<p>Consider the following scenario where all the possible payment methods are handled:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">payments</span><span class="p">,</span>
<span class="w"> </span><span class="n">SUM</span><span class="p">(</span>
<span class="w"> </span><span class="k">CASE</span><span class="w"> </span><span class="k">method</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="s1">'cash'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="mf">100</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="s1">'credit_card'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="mf">30</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">amount</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">0.02</span>
<span class="hll"><span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="s1">'bank_transfer'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="mf">50</span>
</span><span class="hll"><span class="w"> </span><span class="k">ELSE</span><span class="w"> </span><span class="mf">1</span><span class="o">/</span><span class="mf">0</span><span class="w"> </span><span class="c1">-- fail on purpose</span>
</span><span class="w"> </span><span class="k">END</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">commission</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">payment</span><span class="p">;</span>
<span class="hll"><span class="gs">ERROR:</span><span class="gr"> division by zero</span>
</span></pre></div>
<p>This query handled all possible payment methods but it still failed - this is no good. If we look at the <a href="https://www.postgresql.org/docs/current/functions-conditional.html#FUNCTIONS-CASE" rel="noopener">documentation for <code>CASE</code></a>, it is clear why:</p>
<blockquote>
<p>there are various situations in which subexpressions of an expression are evaluated at different times, so that the principle that “CASE evaluates only necessary subexpressions” is not ironclad. For example a constant 1/0 subexpression will usually result in a division-by-zero failure at planning time, even if it's within a CASE arm that would never be entered at run time.</p>
</blockquote>
<p>The documentation explains it well. While <code>CASE</code> normally evaluates only the necessary expressions, there are cases where expressions using only constants, such as <code>1/0</code>, are evaluated at planning time. This is why the query failed even though the database did not have to evaluate the expression in the <code>ELSE</code> clause.</p>
<h3 id="abusing-cast"><a class="toclink" href="#abusing-cast">Abusing cast</a></h3>
<p>Another popular genre of errors is casting errors. Let's try to trigger an error by converting a value to an incompatible type:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">payments</span><span class="p">,</span>
<span class="w"> </span><span class="n">SUM</span><span class="p">(</span>
<span class="w"> </span><span class="k">CASE</span><span class="w"> </span><span class="k">method</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="s1">'cash'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="mf">100</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="s1">'credit_card'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="mf">30</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">amount</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">0.02</span>
<span class="hll"><span class="w"> </span><span class="k">ELSE</span><span class="w"> </span><span class="k">method</span><span class="o">::</span><span class="nb">int</span>
</span><span class="w"> </span><span class="k">END</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">commission</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">payment</span><span class="p">;</span>
<span class="hll"><span class="gs">ERROR:</span><span class="gr"> invalid input syntax for type integer: "bank_transfer"</span>
</span></pre></div>
<p>We attempt to cast the text value in the column <code>method</code> to an integer, and the query failed. As a bonus, the error message provides us with the bad value, <code>"bank_transfer"</code>, which makes it easy to identify the unhandled value.</p>
<p>Let's also check that the query is not failing when all methods are handled:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">payments</span><span class="p">,</span>
<span class="w"> </span><span class="n">SUM</span><span class="p">(</span>
<span class="w"> </span><span class="k">CASE</span><span class="w"> </span><span class="k">method</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="s1">'cash'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="mf">100</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="s1">'credit_card'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="mf">30</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">amount</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">0.02</span>
<span class="hll"><span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="s1">'bank_transfer'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="mf">50</span>
</span><span class="hll"><span class="w"> </span><span class="k">ELSE</span><span class="w"> </span><span class="k">method</span><span class="o">::</span><span class="nb">int</span>
</span><span class="w"> </span><span class="k">END</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">commission</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">payment</span><span class="p">;</span>
<span class="go"> payments │ commission</span>
<span class="go">──────────┼───────────</span>
<span class="go">        6 │     650.00</span>
</pre></div>
<p>When the query handles all the possible values for <code>method</code>, it does not fail!</p>
<h3 id="abusing-cast-for-none-text-types"><a class="toclink" href="#abusing-cast-for-none-text-types">Abusing cast for non-text types</a></h3>
<p>If you use this technique long enough you'll find that triggering a casting error requires some creativity. Triggering a casting error for textual values like the above is usually easier - just cast to integer and chances are it will fail.</p>
<p>However, if you have an integer type, which type would you cast it to in order to trigger an error? This is what I came up with after some time:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="k">CASE</span><span class="w"> </span><span class="n">n</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="s1">'one'</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="s1">'two'</span>
<span class="hll"><span class="w"> </span><span class="k">ELSE</span><span class="w"> </span><span class="p">(</span><span class="s1">'Unhandled value '</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">n</span><span class="p">)::</span><span class="nb">int</span><span class="p">::</span><span class="nb">text</span>
</span><span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">v</span>
<span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="k">VALUES</span>
<span class="w"> </span><span class="p">(</span><span class="mi">1</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">2</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
<span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="n">n</span><span class="p">);</span>
<span class="n">ERROR</span><span class="p">:</span><span class="w"> </span><span class="n">invalid</span><span class="w"> </span><span class="k">input</span><span class="w"> </span><span class="n">syntax</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="nb">integer</span><span class="p">:</span><span class="w"> </span><span class="ss">"Unhandled value 3"</span>
</pre></div>
<p>It's not as elegant, but it gets the job done. We triggered an error, and we get a useful error message we can act on.</p>Handling Concurrency Without Locks2022-06-09T00:00:00+03:002022-06-09T00:00:00+03:00Haki Benitatag:hakibenita.com,2022-06-09:/django-concurrency<p>Concurrency is not very intuitive - you need to train your brain to consider what happens when multiple processes execute a certain code block at the same time. In this article I present common concurrency challenges and how to overcome them with minimal locking.</p><hr>
<p>Concurrency is not very intuitive. You need to train your brain to consider what happens when multiple processes execute a certain code block at the same time. There are several issues I often encounter:</p>
<ol>
<li>
<p><strong>Failing to recognize potential concurrency issues</strong>: It's not uncommon for both beginner and seasoned developers to completely miss a potential concurrency problem. When this happens, and the concurrency issue ends up causing bugs, it's usually very hard to trace and debug.</p>
</li>
<li>
<p><strong>Dismissing concurrency issues due to low likelihood</strong>: If you recognized a potential concurrency issue, at some point you probably thought to yourself "what are the chances of this happening...". It's very tempting to dismiss concurrency issues when the likelihood is low. However, I personally found that concurrency issues tend to creep up at the worst time - when your system is under significant load and you have very little time (and grace) to come up with a solution.</p>
</li>
<li>
<p><strong>Abusing locks</strong>: If you recognized a potential issue and decided to handle it properly, your next step will usually involve some kind of lock. Sometimes locks are necessary, but more often than not they can be avoided, or replaced by more permissive locks.</p>
</li>
</ol>
<p><strong>In this article I present common concurrency challenges and how to overcome them with minimal locking.</strong></p>
<div class="dark--invert">
<figure><img alt="Other type of race..." src="https://hakibenita.com/images/00-django-concurrency.svg"><figcaption>Other type of race...</figcaption>
</figure>
</div>
<p><details class="toc-container" open>
<summary>Table of Contents</summary></p>
<div class="toc">
<ul>
<li><a href="#a-simple-web-application">A Simple Web Application</a></li>
<li><a href="#naive-implementation">Naive Implementation</a></li>
<li><a href="#handling-possible-collisions">Handling Possible Collisions</a></li>
<li><a href="#time-of-check-to-time-of-use">Time-of-check to Time-of-use</a></li>
<li><a href="#locking">Locking</a></li>
<li><a href="#lock-in-the-database">Lock in the Database</a></li>
<li><a href="#asking-for-forgiveness">Asking for Forgiveness</a></li>
<li><a href="#asking-for-forgiveness-in-postgresql">Asking for Forgiveness in PostgreSQL</a></li>
<li><a href="#identifying-race-conditions">Identifying Race Conditions</a></li>
<li><a href="#select-for-update">Select for Update</a></li>
<li><a href="#increment-in-the-database">Increment in the Database</a></li>
<li><a href="#update-and-immediately-return">Update and Immediately Return</a></li>
<li><a href="#take-away">Take Away</a></li>
</ul>
</div>
<p></details></p>
<hr>
<h2 id="a-simple-web-application"><a class="toclink" href="#a-simple-web-application">A Simple Web Application</a></h2>
<p>A URL shortener provides short URLs that redirect to other URLs. There are several reasons why you would want that:</p>
<ol>
<li><strong>Include links in space-constrained places</strong>: Links in SMS messages, Tweets etc.</li>
<li><strong>Track the number of clicks</strong>: Ads, campaigns, newsletter links, promotional emails etc.</li>
</ol>
<p>To most developers a URL shortener sounds like a straightforward project. This is why it makes a great example to demonstrate common concurrency issues, and how easy they are to miss and get wrong. I'm using Python, Django and PostgreSQL, but the concepts apply to any programming language and RDBMS.</p>
<p><details markdown="1"></p>
<p><summary>⚙️ Django Project setup</summary></p>
<p>To build your URL shortener start by creating a new Django project:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>python<span class="w"> </span>-m<span class="w"> </span>venv<span class="w"> </span>venv
$<span class="w"> </span><span class="nb">source</span><span class="w"> </span>venv/bin/activate
$<span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>django
$<span class="w"> </span>django-admin<span class="w"> </span>startproject<span class="w"> </span>project
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>project
</pre></div>
<p>This will create a Python virtual environment, install the latest Django version and create a Django project named <code>project</code>.</p>
<p>For the URL shortener, add a new app called <code>shorturl</code> in the project:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>python<span class="w"> </span>manage.py<span class="w"> </span>startapp<span class="w"> </span>shorturl
</pre></div>
<p>Next, register the new app in <code>settings.py</code>:</p>
<div class="highlight"><pre><span></span><span class="c1"># settings.py</span>
<span class="n">INSTALLED_APPS</span> <span class="o">=</span> <span class="p">[</span>
<span class="c1"># ...</span>
<span class="s2">"shorturl.apps.ShortUrlConfig"</span><span class="p">,</span>
<span class="p">]</span>
</pre></div>
<p>Django uses SQLite by default. If you want to configure Django to use PostgreSQL instead, install psycopg and create a database:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>psycopg2
$<span class="w"> </span>createdb<span class="w"> </span>-O<span class="w"> </span>yourdbuser<span class="w"> </span>shorturl
</pre></div>
<p>Then edit the following parameters in <code>settings.py</code>:</p>
<div class="highlight"><pre><span></span><span class="c1"># settings.py</span>
<span class="n">DATABASES</span> <span class="o">=</span> <span class="p">{</span>
<span class="s2">"default"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"ENGINE"</span><span class="p">:</span> <span class="s2">"django.db.backends.postgresql"</span><span class="p">,</span>
<span class="s2">"NAME"</span><span class="p">:</span> <span class="s2">"shorturl"</span><span class="p">,</span>
<span class="s2">"USER"</span><span class="p">:</span> <span class="s2">"yourdbuser"</span><span class="p">,</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Finally, run the initial migrations:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>python<span class="w"> </span>manage.py<span class="w"> </span>migrate
</pre></div>
<p>You are now ready for the interesting part!</p>
<p></details></p>
<h2 id="naive-implementation"><a class="toclink" href="#naive-implementation">Naive Implementation</a></h2>
<p>A short URL is composed of a short unique identifier that points to some target URL, and a counter to keep track of the number of hits. A Django model for a short URL can look like this:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>
<span class="k">class</span> <span class="nc">ShortUrl</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
<span class="n">key</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">20</span><span class="p">,</span> <span class="n">unique</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">target_url</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">URLField</span><span class="p">()</span>
<span class="n">hits</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">PositiveIntegerField</span><span class="p">()</span>
</pre></div>
<p>A simple function to create a new short URL can look like this:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">secrets</span><span class="o">,</span> <span class="nn">string</span>
<span class="n">CHARACTERS</span> <span class="o">=</span> <span class="n">string</span><span class="o">.</span><span class="n">ascii_letters</span> <span class="o">+</span> <span class="n">string</span><span class="o">.</span><span class="n">digits</span>
<span class="k">class</span> <span class="nc">ShortUrl</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
<span class="c1"># ...</span>
<span class="nd">@classmethod</span>
<span class="k">def</span> <span class="nf">create</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">target_url</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">ShortUrl</span><span class="p">:</span>
<span class="n">key</span> <span class="o">=</span> <span class="s1">''</span><span class="o">.</span><span class="n">join</span><span class="p">((</span><span class="n">secrets</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">CHARACTERS</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">)))</span>
<span class="k">return</span> <span class="bp">cls</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">key</span><span class="o">=</span><span class="n">key</span><span class="p">,</span> <span class="n">target_url</span><span class="o">=</span><span class="n">target_url</span><span class="p">,</span> <span class="n">hits</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
</pre></div>
<p>The function accepts a target URL, generates a random key from a list of possible characters and saves a new <code>ShortUrl</code> to the database. You can now use this function to create a new short URL:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">shorturl</span> <span class="o">=</span> <span class="n">ShortUrl</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="s1">'https://hakibenita.com'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="nb">vars</span><span class="p">(</span><span class="n">shorturl</span><span class="p">)</span>
<span class="go">{'_state': <django.db.models.base.ModelState at 0x7fd5e05558a0>,</span>
<span class="go"> 'id': 1,</span>
<span class="go"> 'created_at': datetime.datetime(2022, 4, 29, 8, 2, 18, 615165, tzinfo=datetime.timezone.utc),</span>
<span class="go"> 'key': 'c6UFG',</span>
<span class="go"> 'target_url': 'https://hakibenita.com',</span>
<span class="go"> 'hits': 0}</span>
</pre></div>
<p>This seems innocent enough, <strong>so what can possibly go wrong?</strong></p>
<h2 id="handling-possible-collisions"><a class="toclink" href="#handling-possible-collisions">Handling Possible Collisions</a></h2>
<p>Say your URL shortener becomes a wild success and you have millions of new short URLs created every day. At some point, the function that generates the random key may produce a key that already exists:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">shorturl</span> <span class="o">=</span> <span class="n">ShortUrl</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="s1">'https://hakibenita.com/tag/django'</span><span class="p">)</span>
<span class="go">IntegrityError: duplicate key value violates unique constraint "shorturl_shorturl_key_uk"</span>
<span class="go">DETAIL: Key (key)=(c6UFG) already exists.</span>
</pre></div>
<p>Notice that the function generated the random key <code>c6UFG</code>, which is identical to the key of a short URL you previously created. The <code>key</code> column has a unique constraint defined on it, so you got a database error.</p>
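<p>To get a feel for how likely such a collision is, here is a back-of-the-envelope sketch. It assumes keys of length 5 drawn uniformly at random from the 62 characters in <code>CHARACTERS</code>, and it uses the standard birthday-problem approximation. The function names are made up for illustration:</p>

```python
import math
import string

CHARACTERS = string.ascii_letters + string.digits
KEY_LENGTH = 5

# Number of distinct keys that can be generated: 62 ** 5
KEYSPACE = len(CHARACTERS) ** KEY_LENGTH


def next_key_collision_probability(existing_keys: int) -> float:
    """Chance that one new random key matches any of the existing keys."""
    return min(existing_keys / KEYSPACE, 1.0)


def any_collision_probability(total_keys: int) -> float:
    """Birthday-problem approximation of at least one collision among total_keys."""
    return 1 - math.exp(-total_keys * (total_keys - 1) / (2 * KEYSPACE))


print(f"{KEYSPACE:,} possible keys")
for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9,} keys: P(any collision) ~ {any_collision_probability(n):.4f}")
```

<p>Under these assumptions, a collision becomes more likely than not after only a few tens of thousands of keys, long before the keyspace is exhausted.</p>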
<p>In order to avoid a unique constraint violation, you might try to check the key in advance, like this:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">ShortUrl</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
<span class="nd">@classmethod</span>
<span class="k">def</span> <span class="nf">create</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">target_url</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">ShortUrl</span><span class="p">:</span>
<span class="hll"> <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
</span> <span class="n">key</span> <span class="o">=</span> <span class="s1">''</span><span class="o">.</span><span class="n">join</span><span class="p">((</span><span class="n">secrets</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">CHARACTERS</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">)))</span>
<span class="hll"> <span class="k">if</span> <span class="ow">not</span> <span class="bp">cls</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">key</span><span class="o">=</span><span class="n">key</span><span class="p">)</span><span class="o">.</span><span class="n">exists</span><span class="p">():</span>
</span> <span class="k">break</span>
<span class="k">return</span> <span class="bp">cls</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">key</span><span class="o">=</span><span class="n">key</span><span class="p">,</span> <span class="n">target_url</span><span class="o">=</span><span class="n">target_url</span><span class="p">,</span> <span class="n">hits</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
</pre></div>
<p>Instead of creating the short URL immediately, you first check that the random key you generated does not already exist. If you find that it does, you keep iterating on random keys until you find one that doesn't.</p>
<p>Aside from the fact that this function can now potentially go into an infinite loop, there is another problem!</p>
<h2 id="time-of-check-to-time-of-use"><a class="toclink" href="#time-of-check-to-time-of-use">Time-of-check to Time-of-use</a></h2>
<p>The purpose of checking that the key does not exist in advance was to avoid a database error, but is it really impossible to end up with a unique constraint violation this way? Consider the following scenario:</p>
<table>
<thead>
<tr>
<th>Process 1</th>
<th>Process 2</th>
</tr>
</thead>
<tbody>
<tr>
<td>Check key <code>c6UFG</code> -> ✅</td>
<td></td>
</tr>
<tr>
<td></td>
<td>Check key <code>c6UFG</code> -> ✅</td>
</tr>
<tr>
<td></td>
<td>Use key <code>c6UFG</code></td>
</tr>
<tr>
<td>Use key <code>c6UFG</code> -> 💥 Already exists</td>
<td></td>
</tr>
</tbody>
</table>
<p>Let's break it down:</p>
<ol>
<li>Process #1 generates the random key <code>c6UFG</code> and checks that it does not already exist</li>
<li>Before Process #1 has a chance to write the new short URL to the database, Process #2 generates the same key <code>c6UFG</code> and checks that it does not already exist. At this point, it doesn't!</li>
<li>Process #2 writes the short URL to the database and completes successfully</li>
<li>Process #1 now tries to save a short URL with the same key <code>c6UFG</code>, which it checked in advance, and fails with a unique constraint violation</li>
</ol>
<p>This is a very common concurrency issue, usually referred to as "time-of-check to time-of-use", or "TOCTOU". The name describes the issue pretty well - the problem is caused when another process changes the data between the time a process checks a value and the time it uses it. In this case, Process #2 added a short URL with the same key after Process #1 had checked it, but before Process #1 used it.</p>
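<p>The interleaving can be reproduced deterministically in plain Python. The following is only a simulation under stated assumptions: two threads stand in for the two processes, an in-memory set stands in for the table and its unique constraint, and a <code>threading.Barrier</code> forces both checks to finish before either insert. All the names are hypothetical:</p>

```python
import threading

# In-memory stand-in for the table and its unique constraint (assumption).
existing_keys: set[str] = set()
insert_lock = threading.Lock()  # only makes the "insert" itself atomic
both_checked = threading.Barrier(2)
results: dict[str, str] = {}


class UniqueViolation(Exception):
    pass


def insert(key: str) -> None:
    """Mimic the database: reject a key that is already stored."""
    with insert_lock:
        if key in existing_keys:
            raise UniqueViolation(key)
        existing_keys.add(key)


def create(worker: str, key: str) -> None:
    # Time of check: at this point the key does not exist.
    key_is_free = key not in existing_keys
    # Force the unlucky interleaving: both workers finish checking first.
    both_checked.wait()
    if not key_is_free:
        results[worker] = "key taken, would retry"
        return
    # Time of use: only one of the two inserts can succeed.
    try:
        insert(key)
        results[worker] = "created"
    except UniqueViolation:
        results[worker] = "unique violation"


workers = [
    threading.Thread(target=create, args=(name, "c6UFG"))
    for name in ("process 1", "process 2")
]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(results)
```

<p>Both workers pass the check, yet only one of them manages to insert. Which one fails depends on scheduling.</p>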
<h2 id="locking"><a class="toclink" href="#locking">Locking</a></h2>
<p>Whenever you have a problem with multiple processes accessing the same resource at the same time, the most intuitive solution is a lock. But, what exactly should you lock?</p>
<p>A simple architecture for a web application usually contains a web application process and a database:</p>
<figure><img alt="Single process" src="https://hakibenita.com/images/01-python-concurrency-setup.svg"><figcaption>Single process</figcaption>
</figure>
<p>If that's your setup, you might be able to obtain a lock at the process level and make sure you are the only one accessing a certain resource at a time.</p>
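<p>In a single-process setup, such a lock could look roughly like the sketch below. It uses <code>threading.Lock</code> and an in-memory set as a stand-in for the table, both of which are assumptions made for illustration. The lock only coordinates threads inside this one process:</p>

```python
import secrets
import string
import threading

CHARACTERS = string.ascii_letters + string.digits

# Stand-in for the table of short URLs (assumption for illustration).
existing_keys: set[str] = set()
create_lock = threading.Lock()


def create_key(length: int = 5) -> str:
    # Holding the lock makes "check" and "use" a single step
    # for every thread in this process, and only in this process.
    with create_lock:
        while True:
            key = ''.join(secrets.choice(CHARACTERS) for _ in range(length))
            if key not in existing_keys:
                existing_keys.add(key)
                return key


threads = [threading.Thread(target=create_key) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(existing_keys))
```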
<p>However, a common setup for Django (as well as other web applications) is to run with multiple worker processes:</p>
<figure><img alt="Multiple worker processes" src="https://hakibenita.com/images/02-python-concurrency-setup.svg"><figcaption>Multiple worker processes</figcaption>
</figure>
<p>With multiple worker processes on the same server it's no longer enough to lock a resource within a single process. However, since the two processes are running on the same server, you might be tempted to try and find a solution at the operating system level. But, is it enough?</p>
<figure><img alt="Multiple servers" src="https://hakibenita.com/images/03-python-concurrency-setup.svg"><figcaption>Multiple servers</figcaption>
</figure>
<p>If your system is running on multiple servers, with multiple worker processes on each one, even the OS can't save you. You now might think the lowest common denominator is the application itself. You can spend days trying to come up with an original way to coordinate a lock between the servers, but would that solve your problem?</p>
<figure><img alt="Multiple applications" src="https://hakibenita.com/images/04-python-concurrency-setup.svg"><figcaption>Multiple applications</figcaption>
</figure>
<p>Modern systems can run multiple applications on top of the same database. We, for example, <a href="/5-ways-to-make-django-admin-safer#separate-the-django-admin-from-the-main-site">do it with Django admin</a>.</p>
<p>At this point it becomes clearer that the lowest common denominator, the resource that all servers, processes and applications share, is the database. If you want to "lock" a resource, you'd better do it in the database.</p>
<h2 id="lock-in-the-database"><a class="toclink" href="#lock-in-the-database">Lock in the Database</a></h2>
<p>Now that you know where to lock, there is another challenge. If you were updating an existing row in the database you could have locked that specific row, but this is not the case. You want to create a new row, for a new short URL, so what can you possibly lock?</p>
<p>One option is to lock the entire table:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">connection</span><span class="p">,</span> <span class="n">transaction</span>

<span class="k">class</span> <span class="nc">ShortUrl</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="nd">@classmethod</span>
    <span class="k">def</span> <span class="nf">create</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">target_url</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">ShortUrl</span><span class="p">:</span>
<span class="hll">        <span class="k">with</span> <span class="n">transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">(),</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span> <span class="k">as</span> <span class="n">cursor</span><span class="p">:</span>
</span><span class="hll">            <span class="n">cursor</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s1">'LOCK TABLE shorturl_shorturl IN EXCLUSIVE MODE;'</span><span class="p">)</span>
</span>            <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
                <span class="n">key</span> <span class="o">=</span> <span class="s1">''</span><span class="o">.</span><span class="n">join</span><span class="p">((</span><span class="n">secrets</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">CHARACTERS</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">)))</span>
                <span class="k">if</span> <span class="ow">not</span> <span class="bp">cls</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">key</span><span class="o">=</span><span class="n">key</span><span class="p">)</span><span class="o">.</span><span class="n">exists</span><span class="p">():</span>
                    <span class="k">break</span>
            <span class="k">return</span> <span class="bp">cls</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">key</span><span class="o">=</span><span class="n">key</span><span class="p">,</span> <span class="n">target_url</span><span class="o">=</span><span class="n">target_url</span><span class="p">,</span> <span class="n">hits</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
</pre></div>
<p>To prevent other processes from creating short URLs and potentially causing a unique constraint violation, you locked the entire shorturl table.</p>
<p>First, to obtain a lock in the database you need to operate inside a database transaction:</p>
<div class="highlight"><pre><span></span><span class="k">with</span> <span class="n">transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
</pre></div>
<p>In most cases, <a href="https://docs.djangoproject.com/en/4.0/topics/db/transactions/#django-s-default-transaction-behavior" rel="noopener">Django operates in "autocommit" mode</a>, meaning it implicitly opens a transaction for every command, and commits immediately after. In this case you want to execute multiple commands in the same database transaction, so you explicitly control the transaction using the <a href="https://docs.djangoproject.com/en/4.0/topics/db/transactions/#django.db.transaction.atomic" rel="noopener"><code>transaction.atomic</code> context</a>.</p>
<p>The Django ORM does not provide functions for locking tables. To lock a table in the database you need to use a cursor and <a href="https://docs.djangoproject.com/en/4.0/topics/db/sql/#executing-custom-sql-directly" rel="noopener">execute raw SQL</a>:</p>
<div class="highlight"><pre><span></span><span class="k">with</span> <span class="n">transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">(),</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span> <span class="k">as</span> <span class="n">cursor</span><span class="p">:</span>
    <span class="n">cursor</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s1">'LOCK TABLE shorturl_shorturl IN EXCLUSIVE MODE;'</span><span class="p">)</span>
</pre></div>
<p>Once you obtain an exclusive lock on the table, no other transaction can obtain the same lock until your transaction ends and the lock is released. This guarantees that the data cannot change between the time you checked the values and the time you used them.</p>
<p>So, problem solved?</p>
<h2 id="asking-for-forgiveness"><a class="toclink" href="#asking-for-forgiveness">Asking for Forgiveness</a></h2>
<p>Imagine your app makes it to the top page of a very popular news site. In a matter of minutes you start getting thousands of hits, and users are creating thousands of new short URLs. Now imagine your system can only create a single short URL from a single user at a time, and all other users need to wait in line. Is that acceptable?</p>
<p>When you locked the entire table you made sure no other transaction can make changes to the table until you are done. This means adding new short URLs is now safe, but it can also get pretty slow when you have many concurrent requests.</p>
<p>What if you could ditch the lock? What if you could make your function safe without preventing multiple users from creating short URLs at the same time? Have another look at the exception you got in the beginning:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">shorturl</span> <span class="o">=</span> <span class="n">ShortUrl</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="s1">'https://hakibenita.com/tag/django'</span><span class="p">)</span>
<span class="hll"><span class="go">IntegrityError: duplicate key value violates unique constraint "shorturl_shorturl_key_uk"</span>
</span><span class="go">DETAIL: Key (key)=(c6UFG) already exists.</span>
</pre></div>
<p>When the function attempted to create a short URL with a key that already existed, the database returned an error. This is because the <code>key</code> column has a unique constraint defined on it. So, if the database is already making sure that there are no duplicates, why not rely on it?</p>
<p>With this in mind, you can now change your function to handle the unique constraint violation:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">IntegrityError</span>

<span class="k">class</span> <span class="nc">ShortUrl</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="nd">@classmethod</span>
    <span class="k">def</span> <span class="nf">create</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">target_url</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">ShortUrl</span><span class="p">:</span>
        <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
            <span class="n">key</span> <span class="o">=</span> <span class="s1">''</span><span class="o">.</span><span class="n">join</span><span class="p">((</span><span class="n">secrets</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">CHARACTERS</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">)))</span>
            <span class="k">try</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">ShortUrl</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">key</span><span class="o">=</span><span class="n">key</span><span class="p">,</span> <span class="n">target_url</span><span class="o">=</span><span class="n">target_url</span><span class="p">,</span> <span class="n">hits</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="hll">            <span class="k">except</span> <span class="n">IntegrityError</span><span class="p">:</span>
</span><span class="hll">                <span class="c1"># Key exists, try try again!</span>
</span><span class="hll">                <span class="k">continue</span>
</span></pre></div>
<p>The function now generates a random key and attempts to create a new short URL. If the command fails with an <code>IntegrityError</code> it means a short URL with the same <code>key</code> already exists. In this case, the function will generate another random <code>key</code> and try again until it succeeds.</p>
<p>The first thing you might notice is that there is no explicit lock, and there is no explicit database transaction. Multiple processes can now safely create short URLs at the same time: if two of them pick the same key, the one that loses catches the <code>IntegrityError</code> and simply tries another key.</p>
<p>According to this quote from the <a href="https://docs.python.org/3.11/glossary.html#term-EAFP" rel="noopener">Python glossary</a>, this approach is also much more "pythonic":</p>
<blockquote>
<p><strong>EAFP</strong><br>
Common Python coding style [ ... ] This clean and fast style is characterized by the presence of many try and except statements.</p>
</blockquote>
<p>The term EAFP stands for "Easier to ask for forgiveness than permission", and this is exactly what you did. Instead of checking in advance, you tried to write the row to the database and handled the exception. The opposite of EAFP is LBYL, "Look Before You Leap", which is what you did when you checked the values in advance.</p>
<p>Python as a language encourages "asking for forgiveness" (EAFP), as opposed to other languages such as JavaScript or Java that encourage you to check values in advance, the LBYL style.</p>
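<p>The same "insert first, handle the conflict later" idea can be tried without Django. The following is only a sketch, assuming an in-memory SQLite table with a unique <code>key</code> column; the <code>key_factory</code> parameter and the alphabet in <code>CHARACTERS</code> are assumptions added here to force a collision on purpose:</p>

```python
import secrets
import sqlite3
import string

CHARACTERS = string.ascii_letters + string.digits  # assumed alphabet

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shorturl (key TEXT UNIQUE NOT NULL, target_url TEXT NOT NULL)"
)


def generate_key() -> str:
    return "".join(secrets.choice(CHARACTERS) for _ in range(5))


def create(target_url: str, key_factory=generate_key) -> str:
    """Insert first; pick another key only if the database rejects this one."""
    while True:
        key = key_factory()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO shorturl (key, target_url) VALUES (?, ?)",
                    (key, target_url),
                )
            return key
        except sqlite3.IntegrityError:
            # Key exists: ask for forgiveness and try another one
            continue


# Force a collision with a predictable sequence of keys
keys = iter(["aaaaa", "aaaaa", "bbbbb"])
first = create("https://hakibenita.com", key_factory=lambda: next(keys))
second = create("https://hakibenita.com/tag/django", key_factory=lambda: next(keys))
print(first, second)
```

<p>The second call first draws the key that is already taken, gets rejected by the unique constraint, and succeeds with the next key, all without checking for existence in advance.</p>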
<h2 id="asking-for-forgiveness-in-postgresql"><a class="toclink" href="#asking-for-forgiveness-in-postgresql">Asking for Forgiveness in PostgreSQL</a></h2>
<p>Sometimes it's necessary to generate short URLs as part of another transaction. For example, if you include the short URL in a notification you save to the database, your code can look like this:</p>
<div class="highlight"><pre><span></span><span class="o">>>></span> <span class="k">with</span> <span class="n">transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
<span class="o">...</span>     <span class="n">shorturl</span> <span class="o">=</span> <span class="n">ShortUrl</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="s1">'https://hakibenita.com'</span><span class="p">)</span>
<span class="o">...</span>     <span class="n">notification</span> <span class="o">=</span> <span class="n">Notification</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">to</span><span class="o">=</span><span class="s1">'...'</span><span class="p">,</span> <span class="n">message</span><span class="o">=</span><span class="s1">'...'</span><span class="p">)</span>
</pre></div>
<p>To make sure both the notification and the short URL are created together, you execute the code inside a database transaction. This is a perfectly valid use of database transactions - this is exactly what they are for!</p>
<p>If you use PostgreSQL, under some circumstances you might encounter this error:</p>
<div class="highlight"><pre><span></span>InternalError: current transaction is aborted, commands ignored until end of transaction block
</pre></div>
<p>In PostgreSQL, when a command fails inside a transaction, all subsequent commands are rejected until the end of the transaction. Now recall how <code>ShortUrl.create</code> is implemented - you iterate over random keys and attempt to create until you <em>don't</em> get an <code>IntegrityError</code>. This means that if you execute your function inside a transaction and it triggers an <code>IntegrityError</code>, PostgreSQL will abort the transaction, and you won't be able to proceed with the rest of the code.</p>
<p>To accommodate <a href="https://docs.djangoproject.com/en/4.0/topics/db/transactions/#handling-exceptions-within-postgresql-transactions" rel="noopener">possible exceptions within a transaction in PostgreSQL</a>, you can use <em>another</em> transaction. A nested <code>transaction.atomic</code> block is implemented by Django as a savepoint, so a failure inside it only rolls back to that savepoint:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">transaction</span><span class="p">,</span> <span class="n">IntegrityError</span>

<span class="k">class</span> <span class="nc">ShortUrl</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="nd">@classmethod</span>
    <span class="k">def</span> <span class="nf">create</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">target_url</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">ShortUrl</span><span class="p">:</span>
        <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
            <span class="n">key</span> <span class="o">=</span> <span class="s1">''</span><span class="o">.</span><span class="n">join</span><span class="p">((</span><span class="n">secrets</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">CHARACTERS</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">)))</span>
            <span class="k">try</span><span class="p">:</span>
                <span class="c1"># In PostgreSQL, when an SQL command fails (usually due to</span>
                <span class="c1"># IntegrityError) it is not possible to execute other commands</span>
                <span class="c1"># until the end of the atomic block. To be able to retry different</span>
                <span class="c1"># keys multiple times after, we execute the command in its own</span>
                <span class="c1"># atomic block.</span>
                <span class="c1"># https://docs.djangoproject.com/en/4.0/topics/db/transactions/#handling-exceptions-within-postgresql-transactions</span>
<span class="hll">                <span class="k">with</span> <span class="n">transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
</span>                    <span class="k">return</span> <span class="n">ShortUrl</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">key</span><span class="o">=</span><span class="n">key</span><span class="p">,</span> <span class="n">target_url</span><span class="o">=</span><span class="n">target_url</span><span class="p">,</span> <span class="n">hits</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
            <span class="k">except</span> <span class="n">IntegrityError</span><span class="p">:</span>
                <span class="c1"># Key exists, try try again!</span>
                <span class="k">continue</span>
</pre></div>
<p>The additional transaction introduces very little overhead; it simply limits the effect a possible exception can have on any outer transaction.</p>
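<p>To see the savepoint mechanics in isolation, here is a rough sketch using the standard library's <code>sqlite3</code> module. Keep in mind the assumptions: SQLite does not abort the whole transaction on a failed command the way PostgreSQL does, so this only illustrates how a savepoint confines a failure inside an outer transaction. The table and savepoint names are invented for the example:</p>

```python
import sqlite3

# isolation_level=None turns off the module's implicit transaction handling,
# so BEGIN / SAVEPOINT / COMMIT are issued exactly as written below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE shorturl (key TEXT UNIQUE NOT NULL)")
conn.execute("CREATE TABLE notification (message TEXT NOT NULL)")
conn.execute("INSERT INTO shorturl (key) VALUES ('aaaaa')")

conn.execute("BEGIN")  # the outer transaction
for key in ["aaaaa", "bbbbb"]:  # the first key collides on purpose
    conn.execute("SAVEPOINT create_shorturl")
    try:
        conn.execute("INSERT INTO shorturl (key) VALUES (?)", (key,))
    except sqlite3.IntegrityError:
        # Undo only the work done since the savepoint, then try the next key
        conn.execute("ROLLBACK TO SAVEPOINT create_shorturl")
        conn.execute("RELEASE SAVEPOINT create_shorturl")
        continue
    conn.execute("RELEASE SAVEPOINT create_shorturl")
    break

# The outer transaction is still usable after the failed attempt
conn.execute("INSERT INTO notification (message) VALUES (?)", (f"created {key}",))
conn.execute("COMMIT")

keys = [row[0] for row in conn.execute("SELECT key FROM shorturl ORDER BY key")]
messages = [row[0] for row in conn.execute("SELECT message FROM notification")]
print(keys, messages)
```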
<h2 id="identifying-race-conditions"><a class="toclink" href="#identifying-race-conditions">Identifying Race Conditions</a></h2>
<p>Now that you have all of these short URLs in your system, it's time to actually use them. The URL shortener system includes a view that redirects a short URL to its target URL, and increments the counter. The view can look like this:</p>
<div class="highlight"><pre><span></span><span class="c1"># views.py</span>
<span class="kn">from</span> <span class="nn">django.http</span> <span class="kn">import</span> <span class="n">HttpRequest</span><span class="p">,</span> <span class="n">HttpResponse</span><span class="p">,</span> <span class="n">HttpResponseRedirect</span><span class="p">,</span> <span class="n">Http404</span>
<span class="kn">from</span> <span class="nn">django.views.decorators.http</span> <span class="kn">import</span> <span class="n">require_http_methods</span>
<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">ShortUrl</span>
<span class="nd">@require_http_methods</span><span class="p">([</span><span class="s2">"GET"</span><span class="p">])</span>
<span class="k">def</span> <span class="nf">resolve_short_url</span><span class="p">(</span><span class="n">request</span><span class="p">:</span> <span class="n">HttpRequest</span><span class="p">,</span> <span class="n">key</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">HttpResponse</span><span class="p">:</span>
    <span class="k">try</span><span class="p">:</span>
<span class="hll">        <span class="n">shorturl</span> <span class="o">=</span> <span class="n">ShortUrl</span><span class="o">.</span><span class="n">resolve</span><span class="p">(</span><span class="n">key</span><span class="p">)</span>
</span>    <span class="k">except</span> <span class="n">ShortUrl</span><span class="o">.</span><span class="n">DoesNotExist</span><span class="p">:</span>
        <span class="k">raise</span> <span class="n">Http404</span><span class="p">()</span>
<span class="hll">    <span class="k">return</span> <span class="n">HttpResponseRedirect</span><span class="p">(</span><span class="n">shorturl</span><span class="o">.</span><span class="n">target_url</span><span class="p">)</span>
</span></pre></div>
<p>The view attempts to "resolve" a key to a <code>ShortUrl</code> instance, and if it finds one, redirects to its target URL.</p>
<p>A naive implementation of the function <code>ShortUrl.resolve</code> can look like this:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">ShortUrl</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="nd">@classmethod</span>
    <span class="k">def</span> <span class="nf">resolve</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">key</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">ShortUrl</span><span class="p">:</span>
        <span class="n">shorturl</span> <span class="o">=</span> <span class="bp">cls</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">key</span><span class="o">=</span><span class="n">key</span><span class="p">)</span>
        <span class="n">shorturl</span><span class="o">.</span><span class="n">hits</span> <span class="o">+=</span> <span class="mi">1</span>
        <span class="n">shorturl</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">update_fields</span><span class="o">=</span><span class="p">[</span><span class="s1">'hits'</span><span class="p">])</span>
        <span class="k">return</span> <span class="n">shorturl</span>
</pre></div>
<p>The function accepts the key as an argument and attempts to find a short URL with that key. If the key is not found, <code>.get(key=key)</code> raises a <code>ShortUrl.DoesNotExist</code> exception and the view returns a 404 response. If a short URL is found, the hit counter is incremented and saved, the object is returned to the view and the user is redirected to the target URL.</p>
<p>So, where is the problem?</p>
<p>As you already experienced when you created the short URL, concurrency issues often require special attention. Imagine what will happen if multiple users are trying to resolve the same key at the same time. Consider the following scenario:</p>
<table>
<thead>
<tr>
<th>Process 1</th>
<th>Process 2</th>
<th>Hits</th>
</tr>
</thead>
<tbody>
<tr>
<td>Select hits -> 0</td>
<td></td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>Select hits -> 0</td>
<td>0</td>
</tr>
<tr>
<td>Update hits -> 1</td>
<td></td>
<td>1</td>
</tr>
<tr>
<td></td>
<td>💥 Update hits -> 1</td>
<td>1</td>
</tr>
</tbody>
</table>
<p>In this scenario, two processes are resolving the same URL at the same time. The short URL is resolved twice, but the hits counter is 1. This is incorrect!</p>
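<p>The interleaving in the table can be replayed step by step. This is a small sketch, assuming an in-memory SQLite table in place of the real one; the helper names <code>select_hits</code> and <code>update_hits</code> are made up here and simply mirror the "Select" and "Update" steps:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shorturl (key TEXT PRIMARY KEY, hits INTEGER NOT NULL)")
conn.execute("INSERT INTO shorturl (key, hits) VALUES ('c6UFG', 0)")


def select_hits(key: str) -> int:
    row = conn.execute("SELECT hits FROM shorturl WHERE key = ?", (key,)).fetchone()
    return row[0]


def update_hits(key: str, hits: int) -> None:
    # Writes a value computed in application memory, like the naive resolve()
    conn.execute("UPDATE shorturl SET hits = ? WHERE key = ?", (hits, key))


# Replay the steps of the two processes in the same order as the table
seen_by_process_1 = select_hits("c6UFG")        # Process 1 reads 0
seen_by_process_2 = select_hits("c6UFG")        # Process 2 reads 0
update_hits("c6UFG", seen_by_process_1 + 1)     # Process 1 writes 1
update_hits("c6UFG", seen_by_process_2 + 1)     # Process 2 also writes 1

final_hits = select_hits("c6UFG")
print(final_hits)
```

<p>Two resolutions happened, but the second update overwrote the first because it was based on a stale read.</p>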
<h2 id="select-for-update"><a class="toclink" href="#select-for-update">Select for Update</a></h2>
<p>The problem with the naive implementation is that when multiple concurrent processes resolve the same short URL at the same time, the counter can get out of sync. Each of the processes is incrementing the counter based on the value it received when it fetched the row from the database, and that's how the counter gets out of sync.</p>
<p>What if you could lock the row to prevent multiple processes from selecting and updating it at the same time? Consider the following implementation:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">transaction</span>
<span class="k">class</span> <span class="nc">ShortUrl</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="nd">@classmethod</span>
    <span class="k">def</span> <span class="nf">resolve</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">key</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">ShortUrl</span><span class="p">:</span>
<span class="hll">        <span class="k">with</span> <span class="n">transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
</span>            <span class="n">shorturl</span> <span class="o">=</span> <span class="p">(</span>
                <span class="bp">cls</span><span class="o">.</span><span class="n">objects</span>
<span class="hll">                <span class="o">.</span><span class="n">select_for_update</span><span class="p">()</span>
</span>                <span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">key</span><span class="o">=</span><span class="n">key</span><span class="p">)</span>
            <span class="p">)</span>
            <span class="n">shorturl</span><span class="o">.</span><span class="n">hits</span> <span class="o">=</span> <span class="n">shorturl</span><span class="o">.</span><span class="n">hits</span> <span class="o">+</span> <span class="mi">1</span>
            <span class="n">shorturl</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">update_fields</span><span class="o">=</span><span class="p">[</span><span class="s1">'hits'</span><span class="p">])</span>
            <span class="k">return</span> <span class="n">shorturl</span>
</pre></div>
<p>The function now opens a database transaction, and uses <a href="https://docs.djangoproject.com/en/4.0/ref/models/querysets/#select-for-update" rel="noopener"><code>select_for_update</code></a> to lock the row. If the lock is obtained, other processes cannot obtain the same lock on the row until the transaction is finished. This means the counter can no longer get out of sync, because only a single process can fetch and update it at a time. But it also means that any concurrent processes must either wait or fail.</p>
<p>Imagine you launch a big campaign to hundreds of thousands of users. To check how effective your campaign is, you use your URL shortener to keep track of how many users clicked the links. Immediately when you send out the campaign it lands in your users' emails and thousands of them click the links. Now imagine each user needs to wait in line until the previous one fetched the row and updated the hit counter in the database. Sounds like it might be a problem...</p>
<h2 id="increment-in-the-database"><a class="toclink" href="#increment-in-the-database">Increment in the Database</a></h2>
<p>Using <code>select_for_update</code> you solved the problem of the hit counter going out of sync, but your system is now doing a very poor job at the one thing it should be doing very well - redirect short URLs!</p>
<p>The main issue with the previous approach is that you incremented the counter based on what you fetched. With many concurrent processes operating at the same time, it is very much possible that since you fetched the row, the counter was incremented multiple times by other processes and you just don't know about it.</p>
<p>What if instead of incrementing the counter based on what you have stored in memory, you instruct the database to update based on what it currently has stored?</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">F</span>
<span class="k">class</span> <span class="nc">ShortUrl</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="nd">@classmethod</span>
    <span class="k">def</span> <span class="nf">resolve</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">key</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">ShortUrl</span><span class="p">:</span>
        <span class="n">shorturl</span> <span class="o">=</span> <span class="bp">cls</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">key</span><span class="o">=</span><span class="n">key</span><span class="p">)</span>
<span class="hll">        <span class="n">shorturl</span><span class="o">.</span><span class="n">hits</span> <span class="o">=</span> <span class="n">F</span><span class="p">(</span><span class="s1">'hits'</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span>
</span>        <span class="n">shorturl</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">update_fields</span><span class="o">=</span><span class="p">[</span><span class="s1">'hits'</span><span class="p">])</span>
        <span class="k">return</span> <span class="n">shorturl</span>
</pre></div>
<p>The function now uses an <a href="https://docs.djangoproject.com/en/4.0/ref/models/expressions/#f-expressions" rel="noopener">F expression</a> to update the counter relative to what is in the database.</p>
<p>The difference seems subtle, so the best way to understand it is to look at the SQL generated by the update commands. The naive approach will execute the following command:</p>
<div class="highlight"><pre><span></span><span class="k">UPDATE</span><span class="w"> </span><span class="n">shorturl_shorturl</span>
<span class="hll"><span class="k">SET</span><span class="w"> </span><span class="n">hits</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span>
</span><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">154</span><span class="p">;</span>
</pre></div>
<p>This will update hits to 1, regardless of the current value of hits in the database.</p>
<p>The function using the F expression will execute the following command:</p>
<div class="highlight"><pre><span></span><span class="k">UPDATE</span><span class="w"> </span><span class="n">shorturl_shorturl</span>
<span class="hll"><span class="k">SET</span><span class="w"> </span><span class="n">hits</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">hits</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span>
</span><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">154</span><span class="p">;</span>
</pre></div>
<p>The hit counter is now incremented by one and not set to a fixed value. This is how using an F expression solves the problem without obtaining an explicit lock on the row.</p>
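<p>You can check the effect of the relative update with a few lines of plain SQL. This is a sketch against an in-memory SQLite table rather than the real Django model, and <code>increment_hits</code> is a name invented for the example:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shorturl (key TEXT PRIMARY KEY, hits INTEGER NOT NULL)")
conn.execute("INSERT INTO shorturl (key, hits) VALUES ('c6UFG', 0)")


def increment_hits(key: str) -> None:
    # The database computes the new value from the row it currently stores,
    # so no value read earlier by the application is involved.
    conn.execute("UPDATE shorturl SET hits = hits + 1 WHERE key = ?", (key,))


increment_hits("c6UFG")  # Process 1
increment_hits("c6UFG")  # Process 2

final_hits = conn.execute(
    "SELECT hits FROM shorturl WHERE key = ?", ("c6UFG",)
).fetchone()[0]
print(final_hits)
```

<p>Neither call needs to know the current value, so there is no stale read to overwrite with.</p>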
<h2 id="update-and-immediately-return"><a class="toclink" href="#update-and-immediately-return">Update and Immediately Return</a></h2>
<p>Using F expressions solved the problem without an exclusive lock, but there are still two minor downsides to this approach:</p>
<ol>
<li>
<p><strong>There are two round trips to the database</strong>: the function first accesses the database to fetch the short URL by key, and then accesses it again to update the row.</p>
</li>
<li>
<p><strong>The hit counter on the returned instance is not updated</strong>: the function is incrementing the value in the database, but the short URL object it returns stores the hits counter prior to when the hits counter was incremented.</p>
</li>
</ol>
<p>Normally, the F expression solution with two round trips to the database is good enough, and there is no reason to go through the trouble of trying to optimize it. In this case, however, the system should be able to accommodate sudden bursts and redirect very quickly, so it might be worth the trouble.</p>
<p>To solve both issues, you can extend the solution beyond Django's ORM built-in capabilities with some SQL magic:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>

<span class="k">class</span> <span class="nc">ShortUrl</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="nd">@classmethod</span>
    <span class="k">def</span> <span class="nf">resolve</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">key</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">ShortUrl</span><span class="p">:</span>
        <span class="n">short_url</span> <span class="o">=</span> <span class="bp">cls</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">raw</span><span class="p">(</span><span class="s1">'''</span>
<span class="s1">            UPDATE shorturl_shorturl</span>
<span class="s1">            SET hits = hits + 1</span>
<span class="s1">            WHERE key = </span><span class="si">%s</span>
<span class="s1">            RETURNING *</span>
<span class="s1">        '''</span><span class="p">,</span> <span class="p">[</span><span class="n">key</span><span class="p">])</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">short_url</span><span class="p">:</span>
            <span class="k">raise</span> <span class="bp">cls</span><span class="o">.</span><span class="n">DoesNotExist</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">short_url</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
</pre></div>
<p>Let's break it down:</p>
<ol>
<li>
<p><strong>Use <code>RETURNING</code> to get updated results</strong>: In PostgreSQL and SQLite you can <a href="sql-tricks-application-dba#implement-complete-processes-using-with-and-returning">return the rows affected by an UPDATE statement</a>. The rows returned by the <code>RETURNING</code> clause are the updated rows. Even though you only update the column <code>hits</code>, you can use <code>*</code> to return the entire row with the updated values.</p>
</li>
<li>
<p><strong>Use <code>.raw</code> to construct a <code>ShortUrl</code> instance</strong>: Django ORM allows you to <a href="https://docs.djangoproject.com/en/4.0/topics/db/sql/#django.db.models.Manager.raw" rel="noopener">construct ORM objects from raw SQL</a>.</p>
</li>
<li>
<p><strong>If no rows were affected, raise <code>DoesNotExist</code></strong>: if no rows were affected it means there is no short URL for the provided key. To mimic Django's behavior in this case, raise a <code>DoesNotExist</code> exception, otherwise return the object.</p>
</li>
</ol>
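<p>The same update-and-return pattern can be tried outside of Django. The following is a minimal sketch using Python's built-in <code>sqlite3</code> module, assuming SQLite 3.35 or later (the first release with <code>RETURNING</code>). The <code>shorturl</code> table, the <code>ShortUrlNotFound</code> exception and the helper names are made up for the illustration and are not part of the article's project:</p>

```python
import sqlite3


class ShortUrlNotFound(LookupError):
    """Hypothetical stand-in for the model's DoesNotExist exception."""


def resolve(conn: sqlite3.Connection, key: str) -> sqlite3.Row:
    """Increment the hit counter and return the updated row in one statement."""
    rows = conn.execute(
        """
        UPDATE shorturl
        SET hits = hits + 1
        WHERE key = ?
        RETURNING *
        """,
        (key,),
    ).fetchall()  # exhaust the cursor before committing
    conn.commit()
    if not rows:
        # No row was updated, so there is no short URL for this key.
        raise ShortUrlNotFound(key)
    return rows[0]


def make_demo_connection() -> sqlite3.Connection:
    """Create an in-memory database holding one short URL (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(
        "CREATE TABLE shorturl ("
        "key TEXT PRIMARY KEY, target TEXT, hits INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute(
        "INSERT INTO shorturl (key, target) VALUES ('abc', 'https://example.com')"
    )
    conn.commit()
    return conn
```

<p>The row that comes back already carries the incremented counter, so there is no second query and no stale value on the returned object. Resolving a key that does not exist updates zero rows and raises the exception instead.</p>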
<p><strong>Binding the table name</strong></p>
<p>Under some circumstances you might want to avoid explicitly using the name of the table, mainly if you have a reusable model and cannot be sure what the name of the database table is.</p>
<p>Using string concatenation to set the name of the table is not acceptable as it puts the query at risk of SQL Injection. Using psycopg, the Python driver for PostgreSQL, there is a way to safely bind identifiers such as table names:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">psycopg2.sql</span> <span class="kn">import</span> <span class="n">SQL</span><span class="p">,</span> <span class="n">Identifier</span>


<span class="nd">@classmethod</span>
<span class="k">def</span> <span class="nf">resolve</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">key</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">ShortUrl</span><span class="p">:</span>
    <span class="n">short_url</span> <span class="o">=</span> <span class="bp">cls</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">raw</span><span class="p">(</span>
<span class="hll">        <span class="n">raw_query</span><span class="o">=</span><span class="n">SQL</span><span class="p">(</span><span class="s2">"""</span>
</span><span class="hll"><span class="s2">            UPDATE </span><span class="si">{}</span>
</span><span class="s2">            SET hits = hits + 1</span>
<span class="s2">            WHERE key = </span><span class="si">%s</span>
<span class="s2">            RETURNING *</span>
<span class="hll"><span class="s2">        """</span><span class="p">)</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">Identifier</span><span class="p">(</span><span class="bp">cls</span><span class="o">.</span><span class="n">_meta</span><span class="o">.</span><span class="n">db_table</span><span class="p">)),</span>
</span>        <span class="n">params</span><span class="o">=</span><span class="p">(</span><span class="n">key</span><span class="p">,),</span>
    <span class="p">)</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">short_url</span><span class="p">:</span>
        <span class="k">raise</span> <span class="bp">cls</span><span class="o">.</span><span class="n">DoesNotExist</span><span class="p">()</span>
    <span class="k">return</span> <span class="n">short_url</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
</pre></div>
<p>The function now takes the name of the table from the model's metadata and binds it as an identifier, instead of hard-coding it in the query.</p>
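<p>To get a feel for why binding an identifier is safe where plain concatenation is not, it can help to look at the standard SQL quoting rule: wrap the name in double quotes and double any double quote inside it. The sketch below only illustrates that rule. The helpers <code>quote_identifier</code> and <code>build_resolve_query</code> are hypothetical, this is not how psycopg is implemented, and real code should keep using the driver's own facility:</p>

```python
def quote_identifier(name: str) -> str:
    """Quote a SQL identifier using the standard rule.

    Hypothetical helper for illustration only; prefer the driver's own
    facility (such as psycopg2.sql.Identifier) in real code.
    """
    if "\x00" in name:
        raise ValueError("identifier cannot contain NUL characters")
    # Wrap in double quotes and double every embedded double quote.
    return '"' + name.replace('"', '""') + '"'


def build_resolve_query(table_name: str) -> str:
    """Compose the UPDATE with a quoted table name.

    The key is still passed separately as a bound parameter (%s).
    """
    return (
        f"UPDATE {quote_identifier(table_name)} "
        "SET hits = hits + 1 "
        "WHERE key = %s "
        "RETURNING *"
    )


# A hostile "table name" ends up as one (odd-looking) quoted identifier.
hostile = 'shorturl"; DROP TABLE users; --'
quoted = quote_identifier(hostile)
```

<p>Because the embedded quote is doubled, the hostile value cannot close the identifier and start a new statement. The database would simply look for a table with a very strange name.</p>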
<p>Check out <a href="https://realpython.com/prevent-python-sql-injection/#using-sql-composition" rel="noopener">Preventing SQL Injection Attacks With Python</a> for more about how to safely compose SQL for PostgreSQL in Python using Psycopg2.</p>
<hr>
<h2 id="take-away"><a class="toclink" href="#take-away">Take Away</a></h2>
<p>Throughout this article you used several different approaches to solve very common concurrency issues in two seemingly simple tasks:</p>
<ul>
<li>Create short URL<ul>
<li>❌ #1: Failing to recognize possible collisions</li>
<li>❌ #2: Time-of-check time-of-use (TOCTOU)</li>
<li>✅ #1: Lock!</li>
<li>✅ #2: Ask for forgiveness</li>
</ul>
</li>
<li>Increment hit counter<ul>
<li>❌ #3: Ignore race conditions</li>
<li>✅ #3: Select for update</li>
<li>✅ #4: Increment in the database</li>
<li>✅ #5: Update and immediately return</li>
</ul>
</li>
</ul>
<p>Some of these approaches were fragile and broke under even minor loads, while others have their own advantages and disadvantages.</p>
<p>The main takeaways are these:</p>
<ol>
<li>
<p><strong>Keep concurrency in mind</strong><br>Concurrency issues are very hard to spot, and even harder to debug. Recognizing concurrency issues requires special attention and awareness. This article covers the awareness part, so now you just need to pay attention.</p>
</li>
<li>
<p><strong>Don't let probabilities distract you</strong><br>It's very tempting to dismiss concurrency issues when the likelihood is low. Hopefully now that you know a few more ways to handle concurrency you are in a better position to resist the temptation.</p>
</li>
<li>
<p><strong>Avoid locks if possible</strong><br>Locks slow things down and cause contention so it's best to avoid them when possible. Now you know how.</p>
</li>
</ol>
<p>For more advanced approaches for managing concurrency in Django, check out <a href="how-to-manage-concurrency-in-django-models">How to Manage Concurrency in Django Models</a>.</p>2021 Year in Review2021-12-31T00:00:00+02:002021-12-31T00:00:00+02:00Haki Benitatag:hakibenita.com,2021-12-31:/2021-year-in-review<p>What I've been up to in 2021...</p><hr>
<p>Painters like their paintings to be viewed, musicians like their music to be heard, and writers like their articles to be read. This is why every year I look back at what I've done:</p>
<ul>
<li>
<p>This year <strong>I published 6 articles</strong>. 5 articles were about <a href="tag/sql">SQL</a>, and 2 were also about <a href="tag/python">Python</a> and <a href="tag/django">Django</a>.</p>
</li>
<li>
<p>According to my analytics, <strong>my articles were viewed ~600K times this year</strong>. Given the target audience this is most likely an underestimation, but it's still an impressive <strong>+50% increase</strong> compared to last year.</p>
</li>
</ul>
<div class="dark--invert">
<figure><img alt="Daily traffic to hakibenita.com in 2021" src="https://hakibenita.com/images/00-2021-year-in-review-visits.png"><figcaption>Daily traffic to hakibenita.com in 2021</figcaption>
</figure>
</div>
<ul>
<li>
<p><strong>More than 2K readers subscribed to my mailing list</strong>. If you want to get an email when I publish something new you can <a href="subscribe">subscribe here</a>.</p>
</li>
<li>
<p>Hacker News remained a significant source of traffic after <strong>several of my articles reached the front page</strong>:</p>
</li>
</ul>
<figure><img alt="Posts from hakibenita.com on Hacker News (<a href="https://hn.algolia.com/?dateEnd=1640908800&dateRange=custom&dateStart=1609459200&page=0&prefix=false&query=hakibenita.com&sort=byPopularity&type=story">source</a>)" src="https://hakibenita.com/images/00-2021-year-in-review-hn.png"><figcaption>Posts from hakibenita.com on Hacker News (<a href="https://hn.algolia.com/?dateEnd=1640908800&dateRange=custom&dateStart=1609459200&page=0&prefix=false&query=hakibenita.com&sort=byPopularity&type=story">source</a>)</figcaption>
</figure>
<ul>
<li>
<p>Other platforms also brought in a fair amount of readers including <a href="https://www.reddit.com/search/?q=site%3Ahakibenita.com&sort=top" rel="noopener">Reddit</a>, <a href="https://lobste.rs/domain/hakibenita.com" rel="noopener">Lobsters</a> and <a href="https://twitter.com/search?q=hakibenita.com" rel="noopener">Twitter</a>.</p>
</li>
<li>
<p>Many conferences moved online in 2021 and I used this opportunity to <strong>present at 5 conferences</strong>:</p>
<ul>
<li>
<p>In February I presented a talk inspired by my article <a href="sql-tricks-application-dba">Some SQL Tricks of an Application DBA</a> in the <strong>PostgreSQL devroom at <a href="https://fosdem.org/2021/" rel="noopener">FOSDEM</a></strong> (<a href="https://fosdem.org/2021/schedule/event/postgresql_some_sql_tricks_of_an_application_dba/" rel="noopener">watch</a>)</p>
</li>
<li>
<p>On the same day, I also presented a talk about hidden gems in the typing system, inspired by my article <a href="python-mypy-exhaustive-checking">Exhaustiveness Checking with Mypy</a>, in the <strong>Python devroom at <a href="https://fosdem.org/2021/" rel="noopener">FOSDEM</a></strong> (<a href="https://fosdem.org/2021/schedule/event/python_mypy/" rel="noopener">watch</a>)</p>
</li>
<li>
<p>In May I presented another talk inspired by the article <a href="python-mypy-exhaustive-checking">Exhaustiveness Checking with Mypy</a> at <strong><a href="https://pycon.org.il/2021/" rel="noopener">PyCon IL</a></strong> (<a href="https://youtu.be/fyiIJ2Qy8ss" rel="noopener">watch</a>)</p>
</li>
<li>
<p>In June I presented a talk titled "Unlocking the full potential of PostgreSQL indexes in Django" at <strong><a href="https://2021.djangocon.eu/" rel="noopener">DjangoCon Europe</a></strong> (<a href="https://youtu.be/BhxCYK6TCwo" rel="noopener">watch</a>)</p>
</li>
<li>
<p>In July I presented a talk titled "Taming Nondeterminism with Dependency Injection: Take back control of your code!" inspired by my article <a href="python-dependency-injection">Stop using datetime now!</a> at <strong><a href="https://ep2021.europython.eu" rel="noopener">Euro Python</a></strong> (<a href="https://youtu.be/IUzdSBn8S14" rel="noopener">watch</a>)</p>
</li>
<li>
<p>In November I presented a talk inspired by my latest article <a href="postgresql-unknown-features">Lesser Known Features of PostgreSQL</a> at <strong><a href="https://www.postgresbuild.com" rel="noopener">Postgres Build</a></strong>. This is the second year I have presented at this conference, which is organized by <a href="https://www.enterprisedb.com/" rel="noopener">EnterpriseDB</a>.</p>
</li>
</ul>
</li>
<li>
<p><strong>I gave several online live trainings for <a href="https://www.oreilly.com" rel="noopener">O’Reilly</a></strong>. In addition to live sessions of my class <a href="https://www.oreilly.com/live-training/courses/sql-next-steps-optimization/0636920378372/" rel="noopener">SQL Next Steps: Optimization</a>, I also developed a new class titled <a href="https://learning.oreilly.com/live-events/postgresql-fundamentals/0636920060241/0636920060240/" rel="noopener">PostgreSQL Fundamentals</a>. I really enjoy giving these online classes because I can talk to other developers and answer questions in real time. The interactive environment provides a unique experience for learners, and the feedback has been great.</p>
</li>
<li>
<p><strong>Collaborated with <a href="https://hex.tech/" rel="noopener">Hex</a></strong> on a guide for <a href="sql-for-data-analysis">Practical SQL for Data Analysis</a>. The Hex platform was perfect for embedding interactive notebooks in the article.</p>
</li>
<li>
<p><strong>Published an <a href="https://www.educative.io/courses/simple-anomaly-detection-sql" rel="noopener">interactive course on educative</a></strong> inspired by my article <a href="/sql-anomaly-detection">Simple Anomaly Detection using SQL</a>. If, like me, you prefer text-based materials over video or audio, be sure to check out <a href="https://www.educative.io" rel="noopener">educative</a>.</p>
</li>
<li>
<p><strong>I was <a href="https://www.blog.pythonlibrary.org/2021/04/26/pydev-of-the-week-haki-benita/" rel="noopener">PyDev of the week</a></strong>. I've been following Mike's "PyDev of the week" series for years, and I was very flattered when he approached me about it.</p>
</li>
<li>
<p><strong>I gave <a href="https://dataanalysis.substack.com/p/expert-insight-haki-benita-choosing" rel="noopener">an interview</a></strong> for the lovely Olga from the <a href="https://dataanalysis.substack.com/" rel="noopener">Data Analysis Journal</a>.</p>
</li>
<li>
<p>Last year I gave <a href="https://webmonetization.org/specification.html" rel="noopener">web monetization</a> a chance and just like I thought, it turned out to be a dud. To continue my experimentation with web monetization, I set up different ways for readers to <a href="pages/appreciate">show their appreciation</a>.</p>
</li>
</ul>Lesser Known PostgreSQL Features2021-11-08T00:00:00+02:002021-11-08T00:00:00+02:00Haki Benitatag:hakibenita.com,2021-11-08:/postgresql-unknown-features<p>A list of useful features you already have, but may not know about! In this article I share lesser known features of PostgreSQL.</p><hr>
<p>In 2006 Microsoft conducted a customer survey to find what new features users want in new versions of Microsoft Office. To their surprise, more than 90% of what users asked for already existed; they just didn't know about it. To address the "discoverability" issue, they came up with the "Ribbon UI" that we know from Microsoft Office products today.</p>
<p>Office is not unique in this sense. Most of us are not aware of all the features in the tools we use on a daily basis, especially when they are as big and extensive as PostgreSQL. With PostgreSQL 14 released just a few weeks ago, what better opportunity to shed light on some lesser known features that already exist in PostgreSQL, but that you may not know about.</p>
<p><strong>In this article I present lesser known features of PostgreSQL.</strong></p>
<div class="dark--invert">
<figure><img alt="<small>Illustration by <a href="https://www.instagram.com/_wrightdesign/">Eleanor Wright</a></small>" src="https://hakibenita.com/images/00-postgresql-unknown-features.png"><figcaption><small>Illustration by <a href="https://www.instagram.com/_wrightdesign/">Eleanor Wright</a></small></figcaption>
</figure>
</div>
<p><details class="toc-container" open>
<summary>Table of Contents</summary></p>
<div class="toc">
<ul>
<li><a href="#get-the-number-of-updated-and-inserted-rows-in-an-upsert">Get the Number of Updated and Inserted Rows in an Upsert</a></li>
<li><a href="#grant-permissions-on-specific-columns">Grant Permissions on Specific Columns</a></li>
<li><a href="#match-against-multiple-patterns">Match Against Multiple Patterns</a></li>
<li><a href="#find-the-current-value-of-a-sequence-without-advancing-it">Find the Current Value of a Sequence Without Advancing It</a></li>
<li><a href="#use-copy-with-multi-line-sql">Use \copy With Multi-line SQL</a></li>
<li><a href="#prevent-setting-the-value-of-an-auto-generated-key">Prevent Setting the Value of an Auto Generated Key</a></li>
<li><a href="#two-more-ways-to-produce-a-pivot-table">Two More Ways to Produce a Pivot Table</a></li>
<li><a href="#dollar-quoting">Dollar Quoting</a></li>
<li><a href="#comment-on-database-objects">Comment on Database Objects</a></li>
<li><a href="#keep-a-separate-history-file-per-database">Keep a Separate History File Per Database</a></li>
<li><a href="#autocomplete-reserved-words-in-uppercase">Autocomplete Reserved Words in Uppercase</a></li>
<li><a href="#sleep-for-interval">Sleep for Interval</a></li>
<li><a href="#get-the-first-or-last-row-in-a-group-without-sub-queries">Get the First or Last Row in a Group Without Sub-Queries</a></li>
<li><a href="#generate-uuid-without-extensions">Generate UUID Without Extensions</a></li>
<li><a href="#generate-reproducible-random-data">Generate Reproducible Random Data</a></li>
<li><a href="#add-constraints-without-validating-immediately">Add Constraints Without Validating Immediately</a></li>
<li><a href="#synonyms-in-postgresql">Synonyms in PostgreSQL</a></li>
<li><a href="#find-overlapping-ranges">Find Overlapping Ranges</a></li>
</ul>
</div>
<p></details></p>
<hr>
<h2 id="get-the-number-of-updated-and-inserted-rows-in-an-upsert"><a class="toclink" href="#get-the-number-of-updated-and-inserted-rows-in-an-upsert">Get the Number of Updated and Inserted Rows in an Upsert</a></h2>
<p><code>INSERT ON CONFLICT</code>, also known as "merge" (in Oracle) or "upsert" (a mashup of UPDATE and INSERT), is a very useful command, especially in ETL processes. Using the <code>ON CONFLICT</code> clause of an <code>INSERT</code> statement, you can tell the database what to do when a collision is detected in one or more key columns.</p>
<p>For example, here is a query to sync data in an employees table:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="n">new_employees</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="k">VALUES</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'George'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Sales'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Manager'</span><span class="p">,</span><span class="w"> </span><span class="mf">1000</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'Jane'</span><span class="p">,</span><span class="w"> </span><span class="s1">'R&D'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Developer'</span><span class="p">,</span><span class="w"> </span><span class="mf">1200</span><span class="p">)</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">t</span><span class="p">(</span>
<span class="w"> </span><span class="k">name</span><span class="p">,</span><span class="w"> </span><span class="n">department</span><span class="p">,</span><span class="w"> </span><span class="k">role</span><span class="p">,</span><span class="w"> </span><span class="n">salary</span>
<span class="w"> </span><span class="p">)</span>
<span class="p">)</span>
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">employees</span><span class="w"> </span><span class="p">(</span><span class="k">name</span><span class="p">,</span><span class="w"> </span><span class="n">department</span><span class="p">,</span><span class="w"> </span><span class="k">role</span><span class="p">,</span><span class="w"> </span><span class="n">salary</span><span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="k">name</span><span class="p">,</span><span class="w"> </span><span class="n">department</span><span class="p">,</span><span class="w"> </span><span class="k">role</span><span class="p">,</span><span class="w"> </span><span class="n">salary</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">new_employees</span>
<span class="hll"><span class="k">ON</span><span class="w"> </span><span class="k">CONFLICT</span><span class="w"> </span><span class="p">(</span><span class="k">name</span><span class="p">)</span><span class="w"> </span><span class="k">DO</span><span class="w"> </span><span class="k">UPDATE</span><span class="w"> </span><span class="k">SET</span>
</span><span class="w"> </span><span class="n">department</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">EXCLUDED</span><span class="mf">.</span><span class="n">department</span><span class="p">,</span>
<span class="w"> </span><span class="k">role</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">EXCLUDED</span><span class="mf">.</span><span class="k">role</span><span class="p">,</span>
<span class="w"> </span><span class="n">salary</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">EXCLUDED</span><span class="mf">.</span><span class="n">salary</span>
<span class="k">RETURNING</span><span class="w"> </span><span class="o">*</span><span class="p">;</span>
<span class="go">  name  │ department │   role    │ salary</span>
<span class="go">────────┼────────────┼───────────┼────────</span>
<span class="go"> George │ Sales      │ Manager   │   1000</span>
<span class="go"> Jane   │ R&D        │ Developer │   1200</span>
<span class="go">INSERT 0 2</span>
</pre></div>
<p>The query inserts new employee data into the table. If there is an attempt to add an employee with a name that already exists, the query will update that row instead.</p>
<div class="admonition tip">
<p class="admonition-title">RETURNING *</p>
<p>Check out how to <a href="sql-tricks-application-dba#implement-complete-processes-using-with-and-returning">implement complete processes using <code>WITH</code> and <code>RETURNING</code></a>.</p>
</div>
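<p>Upserts are not limited to psql sessions. As a rough sketch of how such a sync might look from application code, here is the same idea using Python's built-in <code>sqlite3</code> module, assuming SQLite 3.24 or later, which accepts a similar <code>ON CONFLICT ... DO UPDATE</code> clause. The table definition, the pre-existing row and the function names are assumptions made for the example:</p>

```python
import sqlite3

# Same shape as the query above, written for SQLite with bound parameters.
UPSERT = """
    INSERT INTO employees (name, department, role, salary)
    VALUES (?, ?, ?, ?)
    ON CONFLICT (name) DO UPDATE SET
        department = excluded.department,
        role = excluded.role,
        salary = excluded.salary
"""


def sync_employees(conn: sqlite3.Connection, employees: list) -> None:
    """Insert new employees and update existing ones, matching on name."""
    conn.executemany(UPSERT, employees)
    conn.commit()


def demo() -> list:
    """Run the sync against an in-memory table that already contains George."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees ("
        "name TEXT PRIMARY KEY, department TEXT, role TEXT, salary INTEGER)"
    )
    conn.execute("INSERT INTO employees VALUES ('George', 'Sales', 'Clerk', 800)")
    sync_employees(
        conn,
        [
            ("George", "Sales", "Manager", 1000),  # exists: gets updated
            ("Jane", "R&D", "Developer", 1200),  # new: gets inserted
        ],
    )
    return conn.execute(
        "SELECT name, role, salary FROM employees ORDER BY name"
    ).fetchall()
```

<p>After the sync, the table holds George with his new role and salary, and a new row for Jane. As with the PostgreSQL version, nothing in the result says which of the two rows was inserted and which was updated.</p>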
<p>You can see from the output of the command above, <code>INSERT 0 2</code>, that two employees were affected. But how many were inserted, and how many were updated? The output is not giving us any clue!</p>
<p>While I was looking for a way to improve the logging of some ETL process that used such a query, I stumbled upon <a href="https://stackoverflow.com/a/39204667/2000875" rel="noopener">this Stack Overflow answer</a> that suggested a pretty clever solution to this exact problem:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="n">new_employees</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="k">VALUES</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'George'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Sales'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Manager'</span><span class="p">,</span><span class="w"> </span><span class="mf">1000</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'Jane'</span><span class="p">,</span><span class="w"> </span><span class="s1">'R&D'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Developer'</span><span class="p">,</span><span class="w"> </span><span class="mf">1200</span><span class="p">)</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">t</span><span class="p">(</span>
<span class="w"> </span><span class="k">name</span><span class="p">,</span><span class="w"> </span><span class="n">department</span><span class="p">,</span><span class="w"> </span><span class="k">role</span><span class="p">,</span><span class="w"> </span><span class="n">salary</span>
<span class="w"> </span><span class="p">)</span>
<span class="p">)</span>
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">employees</span><span class="w"> </span><span class="p">(</span><span class="k">name</span><span class="p">,</span><span class="w"> </span><span class="n">department</span><span class="p">,</span><span class="w"> </span><span class="k">role</span><span class="p">,</span><span class="w"> </span><span class="n">salary</span><span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="k">name</span><span class="p">,</span><span class="w"> </span><span class="n">department</span><span class="p">,</span><span class="w"> </span><span class="k">role</span><span class="p">,</span><span class="w"> </span><span class="n">salary</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">new_employees</span>
<span class="k">ON</span><span class="w"> </span><span class="k">CONFLICT</span><span class="w"> </span><span class="p">(</span><span class="k">name</span><span class="p">)</span><span class="w"> </span><span class="k">DO</span><span class="w"> </span><span class="k">UPDATE</span><span class="w"> </span><span class="k">SET</span>
<span class="w"> </span><span class="n">department</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">EXCLUDED</span><span class="mf">.</span><span class="n">department</span><span class="p">,</span>
<span class="w"> </span><span class="k">role</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">EXCLUDED</span><span class="mf">.</span><span class="k">role</span><span class="p">,</span>
<span class="w"> </span><span class="n">salary</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">EXCLUDED</span><span class="mf">.</span><span class="n">salary</span>
<span class="hll"><span class="k">RETURNING</span><span class="w"> </span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">xmax</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">inserted</span><span class="p">;</span>
</span>
<span class="go">  name  │ department │   role    │ salary │ inserted</span>
<span class="go">────────┼────────────┼───────────┼────────┼──────────</span>
<span class="go"> Jane   │ R&D        │ Developer │   1200 │ t</span>
<span class="go"> George │ Sales      │ Manager   │   1000 │ f</span>
<span class="go">INSERT 0 2</span>
</pre></div>
<p>Notice the difference in the <code>RETURNING</code> clause. It includes the calculated field <code>inserted</code>, which uses the special column <code>xmax</code> to flag whether each row was inserted or updated. From the data returned by the command, you can see that a new row was inserted for "Jane", but "George" was already in the table, so the row was updated.</p>
<p>The <code>xmax</code> column is a <a href="https://www.postgresql.org/docs/current/ddl-system-columns.html" rel="noopener">special system column</a>:</p>
<blockquote>
<p>The identity (transaction ID) of the deleting transaction, or zero for an undeleted row version.</p>
</blockquote>
<p>In PostgreSQL, when a row is updated, the previous version is deleted, and <code>xmax</code> holds the ID of the deleting transaction. When the row is inserted, no previous row is deleted, so <code>xmax</code> is zero. This "trick" is cleverly using this behavior to distinguish between updated and inserted rows.</p>
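<p>For logging purposes, the flag still has to be turned into counts. One way to do that on the client side is sketched below. It assumes the rows returned by the query were fetched as dict-like objects with a boolean <code>inserted</code> key; the function name <code>summarize_upsert</code> and the sample rows are made up for the illustration:</p>

```python
from collections import Counter
from typing import Iterable, Mapping


def summarize_upsert(rows: Iterable[Mapping]) -> dict:
    """Count inserted and updated rows from the output of RETURNING.

    Each row is expected to carry the boolean "inserted" column,
    i.e. the value of (xmax = 0) computed by the query.
    """
    counts = Counter("inserted" if row["inserted"] else "updated" for row in rows)
    return {"inserted": counts["inserted"], "updated": counts["updated"]}


# Rows shaped like the output above (only the relevant columns are shown).
returned_rows = [
    {"name": "Jane", "inserted": True},
    {"name": "George", "inserted": False},
]
summary = summarize_upsert(returned_rows)
```

<p>For the two rows above the summary reports one inserted row and one updated row, which is the breakdown that <code>INSERT 0 2</code> alone does not provide.</p>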
<hr>
<h2 id="grant-permissions-on-specific-columns"><a class="toclink" href="#grant-permissions-on-specific-columns">Grant Permissions on Specific Columns</a></h2>
<p>Say you have a users table that contains sensitive information such as credentials, passwords or PII:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="nb">INT</span><span class="p">,</span>
<span class="w"> </span><span class="n">username</span><span class="w"> </span><span class="nb">VARCHAR</span><span class="p">(</span><span class="mf">20</span><span class="p">),</span>
<span class="w"> </span><span class="n">personal_id</span><span class="w"> </span><span class="nb">VARCHAR</span><span class="p">(</span><span class="mf">10</span><span class="p">),</span>
<span class="w"> </span><span class="n">password_hash</span><span class="w"> </span><span class="nb">VARCHAR</span><span class="p">(</span><span class="mf">256</span><span class="p">)</span>
<span class="p">);</span>
<span class="go">CREATE TABLE</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'haki'</span><span class="p">,</span><span class="w"> </span><span class="s1">'12222227'</span><span class="p">,</span><span class="w"> </span><span class="s1">'super-secret-hash'</span><span class="p">);</span>
<span class="go">INSERT 0 1</span>
</pre></div>
<p>The table is used by different people in your organization, such as analysts, to access data and produce ad-hoc reports. To allow access to analysts, you add a special user in the database:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">USER</span><span class="w"> </span><span class="n">analyst</span><span class="p">;</span>
<span class="go">CREATE ROLE</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">GRANT</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">TO</span><span class="w"> </span><span class="n">analyst</span><span class="p">;</span>
<span class="go">GRANT</span>
</pre></div>
<p>The user <code>analyst</code> can now access the <code>users</code> table:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="kp">\connect</span><span class="w"> </span><span class="ss">db</span><span class="w"> </span><span class="ss">analyst</span>
<span class="go">You are now connected to database "db" as user "analyst".</span>
<span class="gp">db=></span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="p">;</span>
<span class="go"> id │ username │ personal_id │   password_hash</span>
<span class="go">────┼──────────┼─────────────┼───────────────────</span>
<span class="go">  1 │ haki     │ 12222227    │ super-secret-hash</span>
</pre></div>
<p>As mentioned previously, analysts access user data to produce reports and conduct analysis, but they should not have access to sensitive information or PII.</p>
<p>To provide granular control over which data a user can access in a table, PostgreSQL allows you to grant permissions only on specific columns of a table:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="kp">\connect</span><span class="w"> </span><span class="ss">db</span><span class="w"> </span><span class="ss">postgres</span>
<span class="go">You are now connected to database "db" as user "postgres".</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">REVOKE</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">analyst</span><span class="p">;</span>
<span class="go">REVOKE</span>
<span class="hll"><span class="gp">db=#</span><span class="w"> </span><span class="k">GRANT</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">username</span><span class="p">)</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">TO</span><span class="w"> </span><span class="n">analyst</span><span class="p">;</span>
</span><span class="go">GRANT</span>
</pre></div>
<p>After revoking the existing select permission on the table, you granted <code>analyst</code> select permission only on the <code>id</code> and <code>username</code> columns. Now, <code>analyst</code> can no longer access any other column:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="kp">\connect</span><span class="w"> </span><span class="ss">db</span><span class="w"> </span><span class="ss">analyst</span>
<span class="go">You are now connected to database "db" as user "analyst".</span>
<span class="gp">db=></span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="p">;</span>
<span class="hll"><span class="gs">ERROR:</span><span class="gr"> permission denied for table users</span>
</span>
<span class="gp">db=></span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">username</span><span class="p">,</span><span class="w"> </span><span class="n">personal_id</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="p">;</span>
<span class="hll"><span class="gs">ERROR:</span><span class="gr"> permission denied for table users</span>
</span>
<span class="gp">db=></span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">username</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="p">;</span>
<span class="go"> id │ username</span>
<span class="go">────┼──────────</span>
<span class="hll"><span class="go">  1 │ haki</span>
</span></pre></div>
<p>Notice that when the user <code>analyst</code> attempts to access any of the restricted columns, either explicitly or implicitly using <code>*</code>, they get a "permission denied" error.</p>
<hr>
<h2 id="match-against-multiple-patterns"><a class="toclink" href="#match-against-multiple-patterns">Match Against Multiple Patterns</a></h2>
<p>It's not uncommon to use pattern matching in SQL. For example, here is a query to find users with a "gmail.com" email account:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">users</span>
<span class="k">WHERE</span><span class="w"> </span><span class="n">email</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'%@gmail.com'</span><span class="p">;</span>
</pre></div>
<p>This query uses the wildcard '%' to find users with emails that end with "@gmail.com". What if, for example, in the same query you also want to find users with a "yahoo.com" email account?</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">users</span>
<span class="k">WHERE</span>
<span class="w"> </span><span class="n">email</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'%@gmail.com'</span>
<span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="n">email</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'%@yahoo.com'</span>
</pre></div>
<p>To match against either one of these patterns, you can construct an <code>OR</code> condition. In PostgreSQL however, there is another way to match against multiple patterns:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">users</span>
<span class="hll"><span class="k">WHERE</span><span class="w"> </span><span class="n">email</span><span class="w"> </span><span class="k">SIMILAR</span><span class="w"> </span><span class="k">TO</span><span class="w"> </span><span class="s1">'%@gmail.com|%@yahoo.com'</span>
</span></pre></div>
<p>Using <a href="https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-SIMILARTO-REGEXP" rel="noopener"><code>SIMILAR TO</code></a> you can match against multiple patterns and keep the query simple.</p>
<p>Another way to match against multiple patterns is using regexp:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">users</span>
<span class="hll"><span class="k">WHERE</span><span class="w"> </span><span class="n">email</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="s1">'@gmail\.com$|@yahoo\.com$'</span>
</span></pre></div>
<p>When using regexp you need to be a bit more cautious. A period "<code>.</code>" will match any character, so to match the literal period "<code>.</code>" in <code>gmail.com</code> or <code>yahoo.com</code>, you need to escape it as "<code>\.</code>".</p>
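<p>To see why the escape matters, here is a small sketch using Python's <code>re</code> module. Python's regex engine is not the one PostgreSQL uses, so treat this as an illustration of how "<code>.</code>" behaves rather than an exact stand-in for the <code>~</code> operator. The sample addresses are made up for the example:</p>

```python
import re

# Hypothetical addresses, made up for this illustration.
emails = ["haki@gmail.com", "dan@yahoo.com", "jax@gmailxcom", "bill@example.com"]

# Unescaped: "." matches any character, so "gmailxcom" slips through.
loose = re.compile(r"@gmail.com$|@yahoo.com$")
# Escaped: "\." matches only a literal period.
strict = re.compile(r"@gmail\.com$|@yahoo\.com$")

loose_matches = [e for e in emails if loose.search(e)]
strict_matches = [e for e in emails if strict.search(e)]

print(loose_matches)
print(strict_matches)
```

<p>The unescaped pattern also accepts <code>jax@gmailxcom</code>, which is almost certainly not what you meant.</p>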
<p>When I posted this <a href="https://twitter.com/be_haki/status/1435859174538293248?s=20" rel="noopener">on twitter</a> I got some interesting responses. <a href="https://twitter.com/psycopg/status/1435926642388516866?s=20" rel="noopener">One comment</a> from the official account of <a href="https://www.psycopg.org/" rel="noopener">psycopg</a>, a PostgreSQL driver for Python, suggested another way:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">users</span>
<span class="hll"><span class="k">WHERE</span><span class="w"> </span><span class="n">email</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="k">ANY</span><span class="p">(</span><span class="nb">ARRAY</span><span class="p">[</span><span class="s1">'@gmail\.com$'</span><span class="p">,</span><span class="w"> </span><span class="s1">'@yahoo\.com$'</span><span class="p">])</span>
</span></pre></div>
<p>This query uses the <code>ANY</code> operator to match against an array of patterns. If an email matches any of the patterns, the condition will be true. This approach is easier to work with from a host language such as Python:</p>
<div class="highlight"><pre><span></span><span class="k">with</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span> <span class="k">as</span> <span class="n">cursor</span><span class="p">:</span>
    <span class="c1"># Bind the list as a query parameter; the driver adapts it to an array.</span>
    <span class="n">cursor</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s1">'''</span>
<span class="s1">        SELECT *</span>
<span class="s1">        FROM users</span>
<span class="hll"><span class="s1">        WHERE email ~ ANY(</span><span class="si">%(patterns)s</span><span class="s1">)</span>
</span><span class="s1">    '''</span><span class="p">,</span> <span class="p">{</span>
        <span class="s1">'patterns'</span><span class="p">:</span> <span class="p">[</span>
            <span class="sa">r</span><span class="s1">'@gmail\.com$'</span><span class="p">,</span>
            <span class="sa">r</span><span class="s1">'@yahoo\.com$'</span><span class="p">,</span>
        <span class="p">],</span>
    <span class="p">})</span>
</pre></div>
<p>Unlike the previous approach that used <code>SIMILAR TO</code>, using <code>ANY</code> you can bind a list of patterns to the variable.</p>
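<p>If the list of domains comes from configuration or user input, the patterns can be built programmatically and still passed as a single bound parameter. The helper below is a minimal sketch: the name <code>email_domain_query</code> is hypothetical, and it assumes a driver such as psycopg that adapts a Python list to a PostgreSQL array. It uses Python's <code>re.escape</code>, which is written for Python's own regex syntax, so it is only a reasonable fit for plain domain names like these:</p>

```python
import re


def email_domain_query(domains):
    """Return (sql, params) matching emails that end with any given domain.

    Hypothetical helper, not part of any library.
    """
    # For plain domain names such as "gmail.com", re.escape only adds a
    # backslash before the period, which PostgreSQL also reads as a literal.
    patterns = ["@" + re.escape(domain) + "$" for domain in domains]
    sql = "SELECT * FROM users WHERE email ~ ANY(%(patterns)s)"
    return sql, {"patterns": patterns}


sql, params = email_domain_query(["gmail.com", "yahoo.com"])
print(params["patterns"])

# With a DB-API cursor from a driver such as psycopg:
#     cursor.execute(sql, params)
```

<p>Because the patterns travel as a parameter rather than being pasted into the SQL text, adding a third domain does not change the query at all.</p>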
<hr>
<h2 id="find-the-current-value-of-a-sequence-without-advancing-it"><a class="toclink" href="#find-the-current-value-of-a-sequence-without-advancing-it">Find the Current Value of a Sequence Without Advancing It</a></h2>
<p>If you ever needed to find the current value of a sequence, your first attempt was most likely using <a href="https://www.postgresql.org/docs/current/functions-sequence.html" rel="noopener"><code>currval</code></a>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">currval</span><span class="p">(</span><span class="s1">'sale_id_seq'</span><span class="p">);</span>
<span class="gs">ERROR:</span><span class="gr"> currval of sequence "sale_id_seq" is not yet defined in this session</span>
</pre></div>
<p>Just like me, you probably found that <code>currval</code> only works if <code>nextval</code> was already called for the sequence in the current session. Advancing a sequence for no good reason is usually not something you want to do, so this is not an acceptable solution.</p>
<p>In PostgreSQL 10 the view <a href="https://www.postgresql.org/docs/current/view-pg-sequences.html" rel="noopener"><code>pg_sequences</code></a> was added to provide easy access to information about sequences:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_sequences</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">sequencename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'sale_id_seq'</span><span class="p">;</span>
<span class="go">─[ RECORD 1 ]─┬────────────</span>
<span class="go">schemaname    │ public</span>
<span class="go">sequencename  │ sale_id_seq</span>
<span class="go">sequenceowner │ db</span>
<span class="go">data_type     │ integer</span>
<span class="go">start_value   │ 1</span>
<span class="go">min_value     │ 1</span>
<span class="go">max_value     │ 2147483647</span>
<span class="go">increment_by  │ 1</span>
<span class="go">cycle         │ f</span>
<span class="go">cache_size    │ 1</span>
<span class="hll"><span class="go">last_value    │ 155</span>
</span></pre></div>
<p>This view can answer your question, but it's not really a "lesser known feature", it's just another view in the system catalog.</p>
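<p>From application code, the same lookup can be wrapped in a small function. This is a sketch under a few assumptions: <code>sequence_last_value</code> is a hypothetical name, <code>cursor</code> is any DB-API cursor that uses <code>%s</code> placeholders (as psycopg does), and the sequence lives in the <code>public</code> schema unless told otherwise:</p>

```python
def sequence_last_value(cursor, sequence_name, schema_name="public"):
    """Read last_value from pg_sequences without advancing the sequence.

    Hypothetical helper; `cursor` is assumed to be a DB-API cursor.
    """
    cursor.execute(
        "SELECT last_value FROM pg_sequences "
        "WHERE schemaname = %s AND sequencename = %s",
        (schema_name, sequence_name),
    )
    row = cursor.fetchone()
    # No row: there is no such sequence in that schema.
    # A NULL last_value: the sequence has not been used yet, or the
    # current user lacks privileges on it.
    return row[0] if row else None
```

<p>Filtering on <code>schemaname</code> as well as <code>sequencename</code> avoids picking up a sequence with the same name from another schema.</p>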
<p>Another way to get the current value of a sequence is using the undocumented function <code>pg_sequence_last_value</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">pg_sequence_last_value</span><span class="p">(</span><span class="s1">'sale_id_seq'</span><span class="p">);</span>
<span class="go"> pg_sequence_last_value</span>
<span class="go">────────────────────────</span>
<span class="go">                    155</span>
</pre></div>
<p>It's not clear why this function is not documented, but I couldn't find any mention of it in the <a href="https://www.postgresql.org/search/?u=%2Fdocs%2F14%2F&q=pg_sequence_last_value" rel="noopener">official documentation</a>. Take that under consideration if you decide to use it.</p>
<p>Another interesting thing I found while researching this is that you can query a sequence, just like you would a table:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sale_id_seq</span><span class="p">;</span>
</span>
<span class="go"> last_value │ log_cnt │ is_called</span>
<span class="go">────────────┼─────────┼───────────</span>
<span class="go">        155 │      10 │ t</span>
</pre></div>
<p>This really makes you wonder what other types of objects you can query in PostgreSQL, and what you'll get in return.</p>
<p>It's important to note that this feature should not be used for anything except getting a cursory look at a sequence. You should not try to update IDs based on values from this output. For that, you should use <code>nextval</code>.</p>
<hr>
<h2 id="use-copy-with-multi-line-sql"><a class="toclink" href="#use-copy-with-multi-line-sql">Use <code>\copy</code> With Multi-line SQL</a></h2>
<p>If you work with psql a lot you probably use <code>\COPY</code> very often to export data from the database. I know I do. One of the most annoying things about <code>\COPY</code> is that it does not allow multi-line queries:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="kp">\COPY</span><span class="w"> </span><span class="ss">(</span>
<span class="go">\copy: parse error at end of line</span>
</pre></div>
<p>When you try to add a new line to a <code>\copy</code> command you get this error message.</p>
<p>To overcome this restriction, my first idea was to use a view:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">VIEW</span><span class="w"> </span><span class="n">v_department_dbas</span><span class="w"> </span><span class="k">AS</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">department</span><span class="p">,</span><span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">employees</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">emp</span>
<span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="k">role</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'dba'</span>
<span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">department</span>
<span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">employees</span><span class="p">;</span>
<span class="go">CREATE VIEW</span>
<span class="hll"><span class="gp">db=#</span><span class="w"> </span><span class="kp">\COPY</span><span class="w"> </span><span class="ss">(SELECT</span><span class="w"> </span><span class="ss">*</span><span class="w"> </span><span class="ss">FROM</span><span class="w"> </span><span class="ss">v_department_dbas)</span><span class="w"> </span><span class="ss">TO</span><span class="w"> </span><span class="ss">department_dbas.csv</span><span class="w"> </span><span class="ss">WITH</span><span class="w"> </span><span class="ss">CSV</span><span class="w"> </span><span class="ss">HEADER;</span>
</span><span class="go">COPY 5</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">DROP</span><span class="w"> </span><span class="k">VIEW</span><span class="w"> </span><span class="n">v_department_dbas</span><span class="p">;</span>
<span class="go">DROP VIEW</span>
</pre></div>
<p>This works, but if something fails in the middle it can leave views lying around. I like to keep my schema tidy, so I looked for a way to automatically clean up after myself. A quick search brought up <a href="https://www.postgresql.org/docs/current/sql-createview.html#id-1.9.3.97.6" rel="noopener">temporary views</a>:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TEMPORARY</span><span class="w"> </span><span class="k">VIEW</span><span class="w"> </span><span class="n">v_department_dbas</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="o">#</span><span class="w"> </span><span class="mf">...</span>
</span><span class="k">CREATE</span><span class="w"> </span><span class="k">VIEW</span>
<span class="gp">db=#</span><span class="w"> </span><span class="kp">\COPY</span><span class="w"> </span><span class="ss">(SELECT</span><span class="w"> </span><span class="ss">*</span><span class="w"> </span><span class="ss">FROM</span><span class="w"> </span><span class="ss">v_department_dbas)</span><span class="w"> </span><span class="ss">TO</span><span class="w"> </span><span class="ss">department_dbas.csv</span><span class="w"> </span><span class="ss">WITH</span><span class="w"> </span><span class="ss">CSV</span><span class="w"> </span><span class="ss">HEADER;</span>
<span class="go">COPY 5</span>
</pre></div>
<p>Using temporary views I no longer had to clean up after myself, because temporary views are automatically dropped when the session terminates.</p>
<p>I used temporary views for a while, until I struck this little gem in the <a href="https://www.postgresql.org/docs/current/app-psql.html#APP-PSQL-META-COMMANDS-COPY" rel="noopener">psql documentation</a>:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="gp">db=#</span><span class="w"> </span><span class="k">COPY</span><span class="w"> </span><span class="p">(</span>
</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">department</span><span class="p">,</span><span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">employees</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">emp</span>
<span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="k">role</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'dba'</span>
<span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">department</span>
<span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">employees</span>
<span class="hll"><span class="p">)</span><span class="w"> </span><span class="k">TO</span><span class="w"> </span><span class="k">STDOUT</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="k">CSV</span><span class="w"> </span><span class="k">HEADER</span><span class="w"> </span><span class="kp">\g</span><span class="w"> </span><span class="ss">department_dbas.csv</span>
</span><span class="k">COPY</span><span class="w"> </span><span class="mf">5</span>
</pre></div>
<p>Nice, right? Let's break it down:</p>
<ul>
<li>
<p><strong>Use <code>COPY</code> instead of <code>\COPY</code></strong>: the <code>COPY</code> command is a server command executed <em>in the server</em>, and <code>\COPY</code> is a psql command with the same interface. So while <code>\COPY</code> does not support multi-line queries, <code>COPY</code> does!</p>
</li>
<li>
<p><strong>Write results to STDOUT</strong>: Using <code>COPY</code> we can write results to a file on the server, or write results to the standard output, using <code>TO STDOUT</code>.</p>
</li>
<li>
<p><strong>Use <code>\g</code> to write STDOUT to local file</strong>: Finally, the psql meta-command <code>\g</code> executes the query and, when followed by a filename, writes the output to that local file instead of the terminal.</p>
</li>
</ul>
<p>Combining these three features did exactly what I wanted.</p>
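<p>The same <code>COPY ... TO STDOUT</code> idea also works outside of psql. As a rough sketch, assuming the psycopg2 driver and its <code>cursor.copy_expert()</code> method, a multi-line query can be exported to a local file like this. The function name <code>export_query_to_csv</code> is hypothetical:</p>

```python
def export_query_to_csv(cursor, query, path):
    """Stream the result of `query` into a local CSV file at `path`.

    Hypothetical helper; `cursor` is assumed to be a psycopg2 cursor.
    """
    # `query` is interpolated as-is, so it must come from trusted code,
    # never from user input.
    copy_sql = f"COPY ({query}) TO STDOUT WITH CSV HEADER"
    with open(path, "w", newline="") as f:
        # copy_expert sends the output of COPY ... TO STDOUT to a file object.
        cursor.copy_expert(copy_sql, f)
```

<p>Since the query is just a Python string, it can span as many lines as you like, for example in a triple-quoted string. As with <code>\g</code>, the file ends up on the client and not on the database server.</p>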
<div class="admonition tip">
<p class="admonition-title">Copy expert</p>
<p>If you move a lot of data around, don't miss the <a href="fast-load-data-python-postgresql">fastest way to load data into PostgreSQL using Python</a>.</p>
</div>
<hr>
<h2 id="prevent-setting-the-value-of-an-auto-generated-key"><a class="toclink" href="#prevent-setting-the-value-of-an-auto-generated-key">Prevent Setting the Value of an Auto Generated Key</a></h2>
<p>If you are using auto generated primary keys in PostgreSQL, it's possible you are still using the <a href="https://www.postgresql.org/docs/current/datatype-numeric.html#DATATYPE-SERIAL" rel="noopener"><code>SERIAL</code> datatype</a>:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">sale</span><span class="w"> </span><span class="p">(</span>
<span class="hll"><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="nb">SERIAL</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="p">,</span>
</span><span class="w"> </span><span class="n">sold_at</span><span class="w"> </span><span class="n">TIMESTAMPTZ</span><span class="p">,</span>
<span class="w"> </span><span class="n">amount</span><span class="w"> </span><span class="nb">INT</span>
<span class="p">);</span>
</pre></div>
<p>Behind the scenes, PostgreSQL creates a sequence to use when rows are added:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">sale</span><span class="w"> </span><span class="p">(</span><span class="n">sold_at</span><span class="p">,</span><span class="w"> </span><span class="n">amount</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="n">now</span><span class="p">(),</span><span class="w"> </span><span class="mf">1000</span><span class="p">);</span>
<span class="go">INSERT 0 1</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sale</span><span class="p">;</span>
<span class="go"> id │            sold_at            │ amount</span>
<span class="go">────┼───────────────────────────────┼────────</span>
<span class="go">  1 │ 2021-09-25 10:06:56.646298+03 │   1000</span>
</pre></div>
<p>The <code>SERIAL</code> data type is unique to PostgreSQL and <a href="https://www.2ndquadrant.com/en/blog/postgresql-10-identity-columns/" rel="noopener">has some known problems</a>, so starting at version 10, the <code>SERIAL</code> datatype was softly deprecated in favor of <em>identity columns</em>:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">sale</span><span class="w"> </span><span class="p">(</span>
<span class="hll"><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="nb">INT</span><span class="w"> </span><span class="k">GENERATED</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">DEFAULT</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="k">IDENTITY</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="p">,</span>
</span><span class="w"> </span><span class="n">sold_at</span><span class="w"> </span><span class="n">TIMESTAMPTZ</span><span class="p">,</span>
<span class="w"> </span><span class="n">amount</span><span class="w"> </span><span class="nb">INT</span>
<span class="p">);</span>
</pre></div>
<p>Identity columns work very similarly to the <code>SERIAL</code> data type:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">sale</span><span class="w"> </span><span class="p">(</span><span class="n">sold_at</span><span class="p">,</span><span class="w"> </span><span class="n">amount</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="n">now</span><span class="p">(),</span><span class="w"> </span><span class="mf">1000</span><span class="p">);</span>
<span class="go">INSERT 0 1</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sale</span><span class="p">;</span>
<span class="go"> id │            sold_at            │ amount</span>
<span class="go">────┼───────────────────────────────┼────────</span>
<span class="go">  1 │ 2021-09-25 10:11:57.771121+03 │   1000</span>
</pre></div>
<p>But, consider this scenario:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">sale</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">sold_at</span><span class="p">,</span><span class="w"> </span><span class="n">amount</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mf">2</span><span class="p">,</span><span class="w"> </span><span class="n">now</span><span class="p">(),</span><span class="w"> </span><span class="mf">1000</span><span class="p">);</span>
<span class="go">INSERT 0 1</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">sale</span><span class="w"> </span><span class="p">(</span><span class="n">sold_at</span><span class="p">,</span><span class="w"> </span><span class="n">amount</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="n">now</span><span class="p">(),</span><span class="w"> </span><span class="mf">1000</span><span class="p">);</span>
<span class="hll"><span class="gs">ERROR:</span><span class="gr"> duplicate key value violates unique constraint "sale_pkey"</span>
</span><span class="gs">DETAIL:</span><span class="gr"> Key (id)=(2) already exists.</span>
</pre></div>
<p>Why did it fail?</p>
<ul>
<li>The first <code>INSERT</code> command explicitly provides the value 2 for the <code>id</code> column, so the sequence was not used.</li>
<li>The second <code>INSERT</code> command does not provide a value for <code>id</code>, so the sequence is used. The next value of the sequence happened to be 2, so the command failed with a unique constraint violation.</li>
</ul>
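<p>The mechanics are easier to see in a toy model. The class below is not how PostgreSQL is implemented. It is a deliberately simplified sketch in Python, with hypothetical names, that keeps a counter in place of the sequence and a dictionary in place of the primary key index:</p>

```python
class ToyTable:
    """Toy model of a table with a GENERATED BY DEFAULT identity key."""

    def __init__(self):
        self.rows = {}     # id -> row, stands in for the unique index
        self.sequence = 0  # last value handed out by the sequence

    def insert(self, row, explicit_id=None):
        row_id = explicit_id
        if row_id is None:
            # No explicit id: take the next value from the sequence.
            self.sequence += 1
            row_id = self.sequence
        # An explicit id skips the branch above, so the sequence never
        # learns that the value is taken.
        if row_id in self.rows:
            raise ValueError(f"duplicate key: id={row_id} already exists")
        self.rows[row_id] = row


table = ToyTable()
table.insert("first sale")                   # sequence hands out 1
table.insert("manual sale", explicit_id=2)   # explicit id, sequence stays at 1
try:
    table.insert("third sale")               # sequence hands out 2, already taken
except ValueError as error:
    print(error)
```

<p>As in PostgreSQL, the failed insert still consumes a value from the sequence, so simply retrying gets 3 and succeeds. That makes this kind of failure intermittent and hard to track down.</p>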
<p>Auto-incrementing IDs rarely need to be set manually, and doing so can cause a mess. So how can you prevent users from setting them?</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">sale</span><span class="w"> </span><span class="p">(</span>
<span class="hll"><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="nb">INT</span><span class="w"> </span><span class="k">GENERATED</span><span class="w"> </span><span class="n">ALWAYS</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="k">IDENTITY</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="p">,</span>
</span><span class="w"> </span><span class="n">sold_at</span><span class="w"> </span><span class="n">TIMESTAMPTZ</span><span class="p">,</span>
<span class="w"> </span><span class="n">amount</span><span class="w"> </span><span class="nb">INT</span>
<span class="p">);</span>
</pre></div>
<p>Instead of using <code>GENERATED BY DEFAULT</code>, use <code>GENERATED ALWAYS</code>. To understand the difference, try the same scenario again:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">sale</span><span class="w"> </span><span class="p">(</span><span class="n">sold_at</span><span class="p">,</span><span class="w"> </span><span class="n">amount</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="n">now</span><span class="p">(),</span><span class="w"> </span><span class="mf">1000</span><span class="p">);</span>
<span class="go">INSERT 0 1</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">sale</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">sold_at</span><span class="p">,</span><span class="w"> </span><span class="n">amount</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mf">2</span><span class="p">,</span><span class="w"> </span><span class="n">now</span><span class="p">(),</span><span class="w"> </span><span class="mf">1000</span><span class="p">);</span>
<span class="hll"><span class="gs">ERROR:</span><span class="gr"> cannot insert into column "id"</span>
</span><span class="hll"><span class="gs">DETAIL:</span><span class="gr"> Column "id" is an identity column defined as GENERATED ALWAYS.</span>
</span><span class="gs">HINT:</span><span class="gr"> Use OVERRIDING SYSTEM VALUE to override.</span>
</pre></div>
<p>What changed?</p>
<ul>
<li>The first <code>INSERT</code> does not provide a value for <code>id</code> and completes successfully.</li>
<li>The second <code>INSERT</code> command, however, attempts to set the value 2 for <code>id</code> and fails!</li>
</ul>
<p>In the error message, PostgreSQL is kind enough to offer a solution for when you actually <em>do</em> want to set the value for an identity column explicitly:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">sale</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">sold_at</span><span class="p">,</span><span class="w"> </span><span class="n">amount</span><span class="p">)</span>
<span class="hll"><span class="k">OVERRIDING</span><span class="w"> </span><span class="k">SYSTEM</span><span class="w"> </span><span class="k">VALUE</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mf">2</span><span class="p">,</span><span class="w"> </span><span class="n">now</span><span class="p">(),</span><span class="w"> </span><span class="mf">1000</span><span class="p">);</span>
</span>
<span class="go">INSERT 0 1</span>
</pre></div>
<p>By adding <code>OVERRIDING SYSTEM VALUE</code> to the <code>INSERT</code> command you explicitly instruct PostgreSQL to allow you to set the value of an identity column. You still have to handle a possible unique constraint violation, but you can no longer blame PostgreSQL for it!</p>
<hr>
<h2 id="two-more-ways-to-produce-a-pivot-table"><a class="toclink" href="#two-more-ways-to-produce-a-pivot-table">Two More Ways to Produce a Pivot Table</a></h2>
<p>In one of my previous articles I demonstrated <a href="sql-for-data-analysis#pivot-tables">how to produce pivot tables using conditional aggregates</a>. After writing the article, I found two more ways to generate pivot tables in PostgreSQL.</p>
<p>Say you want to get the number of employees in each role, in each department:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="n">employees</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="k">VALUES</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'Haki'</span><span class="p">,</span><span class="w"> </span><span class="s1">'R&D'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Manager'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'Dan'</span><span class="p">,</span><span class="w"> </span><span class="s1">'R&D'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Developer'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'Jax'</span><span class="p">,</span><span class="w"> </span><span class="s1">'R&D'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Developer'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'George'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Sales'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Manager'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'Bill'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Sales'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Developer'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'David'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Sales'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Developer'</span><span class="p">)</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">t</span><span class="p">(</span>
<span class="w"> </span><span class="k">name</span><span class="p">,</span><span class="w"> </span><span class="n">department</span><span class="p">,</span><span class="w"> </span><span class="k">role</span>
<span class="w"> </span><span class="p">)</span>
<span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="k">role</span><span class="p">,</span><span class="w"> </span><span class="n">department</span><span class="p">,</span><span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">employees</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">role</span><span class="p">,</span><span class="w"> </span><span class="n">department</span><span class="p">;</span>
<span class="go">   role    │ department │ count</span>
<span class="go">───────────┼────────────┼───────</span>
<span class="go"> Developer │ Sales      │     2</span>
<span class="go"> Manager   │ Sales      │     1</span>
<span class="go"> Manager   │ R&D        │     1</span>
<span class="go"> Developer │ R&D        │     2</span>
</pre></div>
<p>A better way of viewing this would be as a pivot table. In psql you can use the <code>\crosstabview</code> command to transform the results of the last query to a pivot table:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="gp">db=#</span><span class="w"> </span><span class="kp">\crosstabview</span>
</span>
<span class="go">   role    │ Sales │ R&D</span>
<span class="go">───────────┼───────┼─────</span>
<span class="go"> Developer │     2 │   2</span>
<span class="go"> Manager   │     1 │   1</span>
</pre></div>
<p>Magic!</p>
<p>By default, the command will produce the pivot table from the first two columns, but you can control that with arguments:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="gp">db=#</span><span class="w"> </span><span class="kp">\crosstabview</span><span class="w"> </span><span class="ss">department</span><span class="w"> </span><span class="ss">role</span>
</span>
<span class="go"> department │ Developer │ Manager</span>
<span class="go">────────────┼───────────┼─────────</span>
<span class="go"> Sales      │         2 │       1</span>
<span class="go"> R&D        │         2 │       1</span>
</pre></div>
<p>Another, slightly less magical way to produce a pivot table is using the <a href="https://www.postgresql.org/docs/current/tablefunc.html" rel="noopener"><code>tablefunc</code> extension</a>, which ships with PostgreSQL as an additional supplied module:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">EXTENSION</span><span class="w"> </span><span class="n">tablefunc</span><span class="p">;</span>
<span class="go">CREATE EXTENSION</span>
<span class="hll"><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">crosstab</span><span class="p">(</span><span class="s1">'</span>
</span><span class="s1"> SELECT role, department, count(*) AS employees</span>
<span class="s1"> FROM employees</span>
<span class="s1"> GROUP BY 1, 2</span>
<span class="s1"> ORDER BY role</span>
<span class="s1">'</span><span class="p">,</span><span class="w"> </span><span class="s1">'</span>
<span class="s1"> SELECT DISTINCT department</span>
<span class="s1"> FROM employees</span>
<span class="s1"> ORDER BY 1</span>
<span class="s1">'</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="k">role</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span><span class="w"> </span><span class="n">rnd</span><span class="w"> </span><span class="nb">int</span><span class="p">,</span><span class="w"> </span><span class="n">sales</span><span class="w"> </span><span class="nb">int</span><span class="p">);</span>
<span class="go">   role    │ rnd │ sales</span>
<span class="go">───────────┼─────┼───────</span>
<span class="go"> Developer │   2 │     2</span>
<span class="go"> Manager   │   1 │     1</span>
</pre></div>
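<p>Using the function <a href="https://www.postgresql.org/docs/current/tablefunc.html#id-1.11.7.47.5.5" rel="noopener"><code>crosstab</code></a> you can produce a pivot table. The function runs the two query strings on its own, so <code>employees</code> must be an actual table or view here, not the CTE from the previous example. The downside of this method is that you need to define the output columns in advance, in the same order the second query returns the categories. The advantage, however, is that the <code>crosstab</code> function produces a table, which you can use as a sub-query for further processing.</p>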
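<p>As a rough sketch of using the result as a sub-query, the query below adds a per-role total on top of the pivot. It assumes <code>employees</code> exists as a real table holding the rows from above and that <code>tablefunc</code> is installed; the names <code>pivot</code>, <code>rnd</code>, <code>sales</code> and <code>total</code> are only illustrative:</p>
<div class="highlight"><pre><span></span>-- Pivot in the sub-query, then treat the result like any other table
SELECT
    role,
    rnd,
    sales,
    coalesce(rnd, 0) + coalesce(sales, 0) AS total
FROM (
    SELECT * FROM crosstab('
        SELECT role, department, count(*)
        FROM employees
        GROUP BY 1, 2
        ORDER BY 1, 2
    ', '
        SELECT DISTINCT department
        FROM employees
        ORDER BY 1
    ') AS t(role text, rnd int, sales int)
) AS pivot
ORDER BY total DESC;
</pre></div>
<p>The value columns are matched to categories by position, and the category query returns the R&D department before Sales, which is why <code>rnd</code> is listed first. A role with no employees in a department gets <code>NULL</code> in that column, hence the <code>coalesce</code>.</p>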
<hr>
<h2 id="dollar-quoting"><a class="toclink" href="#dollar-quoting">Dollar Quoting</a></h2>
<p>If you store text fields in your database, especially entire paragraphs, you are probably familiar with escape characters. For example, to include a single quote <code>'</code> in a text literal you need to escape it using another single quote <code>''</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="s1">'John''s Pizza'</span><span class="p">;</span>
<span class="go">   ?column?</span>
<span class="go">──────────────</span>
<span class="go"> John's Pizza</span>
</pre></div>
<p>When text starts to get bigger, and includes characters like backslashes and new lines, it can get pretty annoying to add escape characters. To address this, PostgreSQL provides another way to write string constants:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="s">$$a long</span>
<span class="s">string with new lines</span>
<span class="s">and 'single quotes'</span>
<span class="s">and "double quotes</span>
<span class="s">PostgreSQL doesn't mind ;)$$</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="nb">text</span><span class="p">;</span>
<span class="go">            text</span>
<span class="go">────────────────────────────</span>
<span class="go"> a long                    ↵</span>
<span class="go"> string with new lines     ↵</span>
<span class="go"> and 'single quotes'       ↵</span>
<span class="go"> and "double quotes        ↵</span>
<span class="go">                           ↵</span>
<span class="go"> PostgreSQL doesn't mind ;)</span>
</pre></div>
<p>Notice the dollar signs <code>$$</code> at the beginning and end of the string. Anything in between <code>$$</code> is treated as a string. PostgreSQL calls this <a href="https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING" rel="noopener">"Dollar Quoting"</a>.</p>
<p>But there is more. If you happen to need the sign <code>$$</code> in the text itself, you can add a tag to the quotes, which makes this even more useful. For example:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="s">$</span><span class="dl">JSON</span><span class="s">${</span>
<span class="s"> "name": "John's Pizza",</span>
<span class="s"> "tagline": "Best value for your $$"</span>
<span class="s">}$</span><span class="dl">JSON</span><span class="s">$</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="nb">json</span><span class="p">;</span>
<span class="go">                 json</span>
<span class="go">──────────────────────────────────────</span>
<span class="go"> {                                   ↵</span>
<span class="go">  "name": "John's Pizza",            ↵</span>
<span class="go">  "tagline": "Best value for your $$"↵</span>
<span class="go"> }</span>
</pre></div>
<p>Notice that we chose to tag this block with <code>$JSON$</code>, so the sign <code>$$</code> was included as a whole in the output.</p>
<p>You can also use this to quickly generate jsonb objects that include special characters:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="s">$</span><span class="dl">JSON</span><span class="s">${</span>
<span class="s"> "name": "John's Pizza",</span>
<span class="s"> "tagline": "Best value for your $$"</span>
<span class="hll"><span class="s">}$</span><span class="dl">JSON</span><span class="s">$</span><span class="o">::</span><span class="nb">jsonb</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="nb">json</span><span class="p">;</span>
</span><span class="go">                             json</span>
<span class="go">───────────────────────────────────────────────────────────────</span>
<span class="go"> {"name": "John's Pizza", "tagline": "Best value for your $$"}</span>
</pre></div>
<p>The value is now a jsonb object which you can manipulate as you wish!</p>
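<p>Dollar quoting is also the usual way to write function bodies, and tags let you nest one dollar-quoted string inside another. A minimal sketch, where the function name <code>pizza_tagline</code> and the tag <code>$fn$</code> are made up for illustration:</p>
<div class="highlight"><pre><span></span>-- The function body is quoted with the tag $fn$,
-- so the plain $$ string inside it does not end the body.
CREATE FUNCTION pizza_tagline() RETURNS text AS $fn$
    SELECT $$John's Pizza: "Best value"$$;
$fn$ LANGUAGE sql;

SELECT pizza_tagline();
</pre></div>
<p>Without the tag, the first <code>$$</code> inside the body would close the outer string and the statement would fail to parse.</p>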
<hr>
<h2 id="comment-on-database-objects"><a class="toclink" href="#comment-on-database-objects">Comment on Database Objects</a></h2>
<p>PostgreSQL has this nice little feature where you can <a href="https://www.postgresql.org/docs/current/sql-comment.html" rel="noopener">add comments on just about every database object</a>. For example, adding a comment on a table:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">COMMENT</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">sale</span><span class="w"> </span><span class="k">IS</span><span class="w"> </span><span class="s1">'Sales made in the system'</span><span class="p">;</span>
<span class="go">COMMENT</span>
</pre></div>
<p>You can now view this comment in psql (and probably in other tools and IDEs):</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="kp">\dt+</span><span class="w"> </span><span class="ss">sale</span>
<span class="go">                                  List of relations</span>
<span class="go"> Schema │ Name │ Type  │ Owner │ Persistence │    Size    │       Description</span>
<span class="go">────────┼──────┼───────┼───────┼─────────────┼────────────┼──────────────────────────</span>
<span class="go"> public │ sale │ table │ haki  │ permanent   │ 8192 bytes │ Sales made in the system</span>
</pre></div>
<p>You can also add comments on table columns, and view them when using extended describe:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">COMMENT</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="k">COLUMN</span><span class="w"> </span><span class="n">sale</span><span class="mf">.</span><span class="n">sold_at</span><span class="w"> </span><span class="k">IS</span><span class="w"> </span><span class="s1">'When was the sale finalized'</span><span class="p">;</span>
<span class="go">COMMENT</span>
<span class="gp">db=#</span><span class="w"> </span><span class="kp">\d+</span><span class="w"> </span><span class="ss">sale</span>
<span class="go"> Column  │           Type           │         Description</span>
<span class="go">─────────┼──────────────────────────┼─────────────────────────────</span>
<span class="go"> id      │ integer                  │</span>
<span class="hll"><span class="go"> sold_at │ timestamp with time zone │ When was the sale finalized</span>
</span><span class="go"> amount  │ integer                  │</span>
</pre></div>
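<p>Comments can also be read with plain SQL, which can be handy outside of psql. A small sketch using the built-in functions <code>obj_description</code> and <code>col_description</code>, assuming the <code>sale</code> table and the comments from above, and that <code>sold_at</code> is the second column of the table:</p>
<div class="highlight"><pre><span></span>-- Comment on the table itself
SELECT obj_description('sale'::regclass, 'pg_class');

-- Comment on the second column of the table (sold_at)
SELECT col_description('sale'::regclass, 2);

-- Setting a comment to NULL removes it
COMMENT ON COLUMN sale.sold_at IS NULL;
</pre></div>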
<p>You can also combine the <code>COMMENT</code> command with dollar quoting to include longer and more meaningful descriptions of, for example, functions:</p>
<div class="highlight"><pre><span></span><span class="k">COMMENT</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">generate_random_string</span><span class="w"> </span><span class="k">IS</span><span class="w"> </span><span class="err">$</span><span class="n">docstring</span><span class="err">$</span>
<span class="n">Generate</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">random</span><span class="w"> </span><span class="n">string</span><span class="w"> </span><span class="k">at</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">given</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">list</span><span class="w"> </span><span class="k">of</span><span class="w"> </span><span class="n">possible</span><span class="w"> </span><span class="n">characters</span><span class="mf">.</span>
<span class="n">Parameters</span><span class="p">:</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="p">(</span><span class="nb">int</span><span class="p">):</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="k">of</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">output</span><span class="w"> </span><span class="n">string</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">characters</span><span class="w"> </span><span class="p">(</span><span class="nb">text</span><span class="p">):</span><span class="w"> </span><span class="n">possible</span><span class="w"> </span><span class="n">characters</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="n">choose</span><span class="w"> </span><span class="k">from</span>
<span class="n">Example</span><span class="p">:</span>
<span class="w"> </span><span class="n">db</span><span class="o">=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">generate_random_string</span><span class="p">(</span><span class="mf">10</span><span class="p">);</span>
<span class="go"> generate_random_string</span>
<span class="go"> ────────────────────────</span>
<span class="go"> o0QsrMYRvp</span>
<span class="go"> db=# SELECT generate_random_string(3, 'AB');</span>
<span class="go"> generate_random_string</span>
<span class="go"> ────────────────────────</span>
<span class="go"> ABB</span>
<span class="go">$docstring$;</span>
</pre></div>
<p>This is a function I used in the past to demonstrate the <a href="/sql-medium-text-performance#toast-compression">performance impact of medium-sized texts</a>. Now I no longer have to go back to the article to remember how to use the function; I have the docstring right there in the comments:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="kp">\df+</span><span class="w"> </span><span class="ss">generate_random_string</span>
<span class="go">List of functions</span>
<span class="go">────────────┬────────────────────────────────────────────────────────────────────────────────</span>
<span class="go">Schema      │ public</span>
<span class="go">Name        │ generate_random_string</span>
<span class="go">/* ... */</span>
<span class="go">Description │ Generate a random string at a given length from a list of possible characters.↵</span>
<span class="go">            │                                                                               ↵</span>
<span class="go">            │ Parameters:                                                                   ↵</span>
<span class="go">            │                                                                               ↵</span>
<span class="go">            │  - length (int): length of the output string                                  ↵</span>
<span class="go">            │  - characters (text): possible characters to choose from                      ↵</span>
<span class="go">            │                                                                               ↵</span>
<span class="go">            │ Example:                                                                      ↵</span>
<span class="go">            │                                                                               ↵</span>
<span class="go">            │  db=# SELECT generate_random_string(10);                                      ↵</span>
<span class="go">            │  generate_random_string                                                       ↵</span>
<span class="go">            │  ────────────────────────                                                     ↵</span>
<span class="go">            │  o0QsrMYRvp                                                                   ↵</span>
<span class="go">            │                                                                               ↵</span>
<span class="go">            │  db=# SELECT generate_random_string(3, 'AB');                                 ↵</span>
<span class="go">            │  generate_random_string                                                       ↵</span>
<span class="go">            │  ────────────────────────                                                     ↵</span>
<span class="go">            │  ABB                                                                          ↵</span>
<span class="go">            │</span>
</pre></div>
<hr>
<h2 id="keep-a-separate-history-file-per-database"><a class="toclink" href="#keep-a-separate-history-file-per-database">Keep a Separate History File Per Database</a></h2>
<p>If you are working with CLI tools you probably use the ability to search past commands very often. In bash and psql, a reverse search is usually available by hitting <kbd>CTRL + R</kbd>.</p>
<p>If in addition to working with the terminal, you also work with multiple databases, you might find it useful to keep a separate history file per database:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="kp">\set</span><span class="w"> </span><span class="ss">HISTFILE</span><span class="w"> </span><span class="ss">~/.psql_history-</span><span class="w"> </span><span class="nv">:DBNAME</span>
</pre></div>
<p>This way, you are more likely to find a relevant match for the database you are currently connected to. You can drop this in your <a href="https://www.postgresql.org/docs/current/app-psql.html#id-1.9.4.20.10" rel="noopener"><code>~/.psqlrc</code> file</a> to make it persistent.</p>
<hr>
<h2 id="autocomplete-reserved-words-in-uppercase"><a class="toclink" href="#autocomplete-reserved-words-in-uppercase">Autocomplete Reserved Words in Uppercase</a></h2>
<p>There is always a lot of debate (and jokes!) on whether keywords in SQL should be in lower or upper case. I think my opinion on this subject is pretty clear.</p>
<p>If, like me, you like using uppercase keywords in SQL, there is an option in psql to autocomplete keywords in uppercase:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="n">selec</span><span class="w"> </span><span class="o"><</span><span class="n">tab</span><span class="o">></span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">select</span>
<span class="hll"><span class="gp">db=#</span><span class="w"> </span><span class="kp">\set</span><span class="w"> </span><span class="ss">COMP_KEYWORD_CASE</span><span class="w"> </span><span class="ss">upper</span>
</span><span class="gp">db=#</span><span class="w"> </span><span class="n">selec</span><span class="w"> </span><span class="o"><</span><span class="n">tab</span><span class="o">></span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span>
</pre></div>
<p>After setting <code>COMP_KEYWORD_CASE</code> to <code>upper</code>, when you hit <kbd>TAB</kbd> for autocomplete, keywords will be autocompleted in uppercase.</p>
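<p>Like the history file setting from the previous section, this one only lasts for the current session unless you put it in your <code>~/.psqlrc</code> file. A possible starting point that combines both settings shown so far (the comments are only there for illustration):</p>
<div class="highlight"><pre><span></span>-- ~/.psqlrc

-- Keep a separate history file for each database
\set HISTFILE ~/.psql_history- :DBNAME

-- Autocomplete SQL keywords in uppercase
\set COMP_KEYWORD_CASE upper
</pre></div>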
<hr>
<h2 id="sleep-for-interval"><a class="toclink" href="#sleep-for-interval">Sleep for Interval</a></h2>
<p>Delaying the execution of a program can be pretty useful for things like testing or throttling. To delay the execution of a program in PostgreSQL, the go-to function is usually <code>pg_sleep</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="kp">\timing</span>
<span class="go">Timing is on.</span>
<span class="hll"><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">pg_sleep</span><span class="p">(</span><span class="mf">3</span><span class="p">);</span>
</span><span class="go"> pg_sleep</span>
<span class="go">──────────</span>
<span class="go">(1 row)</span>
<span class="hll"><span class="go">Time: 3014.913 ms (00:03.015)</span>
</span></pre></div>
<p>The function sleeps for the given number of seconds. However, when you need to sleep for longer than just a few seconds, calculating the number of seconds can be annoying, for example:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">pg_sleep</span><span class="p">(</span><span class="mf">255</span><span class="p">);</span>
</pre></div>
<p>How long will this function sleep for? Don't take out the calculator: the function will sleep for 4 minutes and 15 seconds.</p>
<p>To make it more convenient to sleep for longer periods of time, PostgreSQL offers another function:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">pg_sleep_for</span><span class="p">(</span><span class="s1">'4 minutes 15 seconds'</span><span class="p">);</span>
</span></pre></div>
<p>Unlike its sibling <code>pg_sleep</code>, the function <a href="https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-DELAY" rel="noopener"><code>pg_sleep_for</code></a> accepts an interval, which is much more natural to read and understand than the number of seconds.</p>
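<p>Two related one-liners that may come in handy, both using functionality documented alongside <code>pg_sleep_for</code>. The first lets PostgreSQL do the interval arithmetic for you, and the second sleeps until a point in time rather than for a duration (the wake-up time is just an example value):</p>
<div class="highlight"><pre><span></span>-- Check how many seconds an interval adds up to
SELECT extract(epoch FROM interval '4 minutes 15 seconds');

-- Sleep until a specific point in time instead of for a duration
SELECT pg_sleep_until('tomorrow 03:00');
</pre></div>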
<hr>
<h2 id="get-the-first-or-last-row-in-a-group-without-sub-queries"><a class="toclink" href="#get-the-first-or-last-row-in-a-group-without-sub-queries">Get the First or Last Row in a Group Without Sub-Queries</a></h2>
<p>When I initially compiled this list I did not think of this feature as a lesser-known one, mostly because I use it <em>all the time</em>. But to my surprise, I keep running into weird solutions to a problem that can be easily solved with what I'm about to show you, so I figured it deserves a place on the list!</p>
<p>Say you have this table of students:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">students</span><span class="p">;</span>
<span class="go">  name  │ class │ height</span>
<span class="go">────────┼───────┼────────</span>
<span class="go"> Haki   │ A     │    186</span>
<span class="go"> Dan    │ A     │    175</span>
<span class="go"> Jax    │ A     │    182</span>
<span class="go"> George │ B     │    178</span>
<span class="go"> Bill   │ B     │    167</span>
<span class="go"> David  │ B     │    178</span>
</pre></div>
<p><details></p>
<p><summary>Table data</summary></p>
<p>You can use the following CTE to reproduce the queries in this section:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">students</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="k">VALUES</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'Haki'</span><span class="p">,</span><span class="w"> </span><span class="s1">'A'</span><span class="p">,</span><span class="w"> </span><span class="mf">186</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'Dan'</span><span class="p">,</span><span class="w"> </span><span class="s1">'A'</span><span class="p">,</span><span class="w"> </span><span class="mf">175</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'Jax'</span><span class="p">,</span><span class="w"> </span><span class="s1">'A'</span><span class="p">,</span><span class="w"> </span><span class="mf">182</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'George'</span><span class="p">,</span><span class="w"> </span><span class="s1">'B'</span><span class="p">,</span><span class="w"> </span><span class="mf">178</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'Bill'</span><span class="p">,</span><span class="w"> </span><span class="s1">'B'</span><span class="p">,</span><span class="w"> </span><span class="mf">167</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'David'</span><span class="p">,</span><span class="w"> </span><span class="s1">'B'</span><span class="p">,</span><span class="w"> </span><span class="mf">178</span><span class="p">)</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">t</span><span class="p">(</span>
<span class="w"> </span><span class="k">name</span><span class="p">,</span><span class="w"> </span><span class="k">class</span><span class="p">,</span><span class="w"> </span><span class="n">height</span>
<span class="w"> </span><span class="p">)</span>
<span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">students</span><span class="p">;</span>
</pre></div>
<p></details></p>
<p><strong>How would you get the entire row of the tallest student in each class?</strong></p>
<p>On first thought you might try something like this:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="k">class</span><span class="p">,</span><span class="w"> </span><span class="n">max</span><span class="p">(</span><span class="n">height</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">tallest</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">students</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">class</span><span class="p">;</span>
<span class="go"> class │ tallest</span>
<span class="go">───────┼─────────</span>
<span class="go"> A     │     186</span>
<span class="go"> B     │     178</span>
</pre></div>
<p>This gets you the height, but it doesn't get you the name of the student. As a second attempt you might try to find the tallest student based on their height, using a sub-query:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">students</span>
<span class="hll"><span class="k">WHERE</span><span class="w"> </span><span class="p">(</span><span class="k">class</span><span class="p">,</span><span class="w"> </span><span class="n">height</span><span class="p">)</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="p">(</span>
</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="k">class</span><span class="p">,</span><span class="w"> </span><span class="n">max</span><span class="p">(</span><span class="n">height</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">tallest</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">students</span>
<span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">class</span>
<span class="p">);</span>
<span class="go">  name  │ class │ height</span>
<span class="go">────────┼───────┼────────</span>
<span class="go"> Haki   │ A     │    186</span>
<span class="go"> George │ B     │    178</span>
<span class="go"> David  │ B     │    178</span>
</pre></div>
<p>Now you have all the information about the tallest students in each class, but there is another problem.</p>
<div class="admonition tip">
<p class="admonition-title">side note</p>
<p>The ability to match a set of records like in the previous query (<code>(class, height) IN (...)</code>) is another lesser-known but very powerful feature of PostgreSQL.</p>
</div>
<p>In class "B", there are two students with the same height, who also happen to be the tallest. Using the aggregate function <code>MAX</code> you only get the height, so you may encounter this type of situation.</p>
<p>The challenge with using <code>MAX</code> is that you choose the tallest student based only on the height, which makes perfect sense in this case, but you still need to pick just one student. A different approach that lets you "rank" rows based on more than one column is using a window function:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">students</span><span class="mf">.</span><span class="o">*</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="n">ROW_NUMBER</span><span class="p">()</span><span class="w"> </span><span class="k">OVER</span><span class="w"> </span><span class="p">(</span>
</span><span class="hll"><span class="w"> </span><span class="k">PARTITION</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">class</span>
</span><span class="hll"><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">height</span><span class="w"> </span><span class="k">DESC</span><span class="p">,</span><span class="w"> </span><span class="k">name</span>
</span><span class="hll"><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">rn</span>
</span><span class="k">FROM</span>
<span class="w"> </span><span class="n">students</span><span class="p">;</span>
<span class="go">  name  │ class │ height │ rn</span>
<span class="go">────────┼───────┼────────┼────</span>
<span class="go"> Haki   │ A     │    186 │  1</span>
<span class="go"> Jax    │ A     │    182 │  2</span>
<span class="go"> Dan    │ A     │    175 │  3</span>
<span class="go"> David  │ B     │    178 │  1</span>
<span class="go"> George │ B     │    178 │  2</span>
<span class="go"> Bill   │ B     │    167 │  3</span>
</pre></div>
<p>To "rank" students based on their height you can attach a row number to each row. The row number is determined for each class (<code>PARTITION BY class</code>) and ranked first by height in descending order, and then by the student's name (<code>ORDER BY height DESC, name</code>). Adding the student name in addition to the height makes the results deterministic (assuming the name is unique).</p>
<p>To get only the row of the tallest student in each class you can use a sub-query:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="k">name</span><span class="p">,</span><span class="w"> </span><span class="k">class</span><span class="p">,</span><span class="w"> </span><span class="n">height</span>
<span class="k">FROM</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">students</span><span class="mf">.</span><span class="o">*</span><span class="p">,</span>
<span class="w"> </span><span class="n">ROW_NUMBER</span><span class="p">()</span><span class="w"> </span><span class="k">OVER</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">PARTITION</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">class</span>
<span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">height</span><span class="w"> </span><span class="k">DESC</span><span class="p">,</span><span class="w"> </span><span class="k">name</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">rn</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">students</span>
<span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">ranked</span>
<span class="k">WHERE</span>
<span class="hll"><span class="w"> </span><span class="n">rn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span>
</span>
<span class="go"> name  │ class │ height</span>
<span class="go">───────┼───────┼────────</span>
<span class="go"> Haki  │ A     │    186</span>
<span class="go"> David │ B     │    178</span>
</pre></div>
<p>You made it! This is the entire row for the tallest student in each class.</p>
<p><strong>Using <code>DISTINCT ON</code></strong></p>
<p>Now that you went through all of this trouble, let me show you an easier way:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="k">SELECT</span><span class="w"> </span><span class="k">DISTINCT</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="p">(</span><span class="k">class</span><span class="p">)</span>
</span><span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">students</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="hll"><span class="w"> </span><span class="k">class</span><span class="p">,</span><span class="w"> </span><span class="n">height</span><span class="w"> </span><span class="k">DESC</span><span class="p">,</span><span class="w"> </span><span class="k">name</span><span class="p">;</span>
</span>
<span class="go"> name  │ class │ height</span>
<span class="go">───────┼───────┼────────</span>
<span class="go"> Haki  │ A     │    186</span>
<span class="go"> David │ B     │    178</span>
</pre></div>
<p>Pretty nice, right? I was blown away when <a href="/the-many-faces-of-distinct-in-postgre-sql#distinct-on">I first discovered <code>DISTINCT ON</code></a>. Coming from Oracle, I had never seen anything like it, and as far as I know, no database other than PostgreSQL has it.</p>
<p><strong>Intuitively understand <code>DISTINCT ON</code></strong></p>
<p>To understand how <code>DISTINCT ON</code> works, let's go over what it does step by step. This is the raw data in the table:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">students</span><span class="p">;</span>
<span class="go">  name  │ class │ height</span>
<span class="go">────────┼───────┼────────</span>
<span class="go"> Haki   │ A     │    186</span>
<span class="go"> Dan    │ A     │    175</span>
<span class="go"> Jax    │ A     │    182</span>
<span class="go"> George │ B     │    178</span>
<span class="go"> Bill   │ B     │    167</span>
<span class="go"> David  │ B     │    178</span>
</pre></div>
<p>Next, sort the data:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">students</span>
<span class="hll"><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">class</span><span class="p">,</span><span class="w"> </span><span class="n">height</span><span class="w"> </span><span class="k">DESC</span><span class="p">,</span><span class="w"> </span><span class="k">name</span><span class="p">;</span>
</span>
<span class="go">  name  │ class │ height</span>
<span class="go">────────┼───────┼────────</span>
<span class="go"> Haki   │ A     │    186</span>
<span class="go"> Jax    │ A     │    182</span>
<span class="go"> Dan    │ A     │    175</span>
<span class="go"> David  │ B     │    178</span>
<span class="go"> George │ B     │    178</span>
<span class="go"> Bill   │ B     │    167</span>
</pre></div>
<p>Then, add the <code>DISTINCT ON</code> clause:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="k">SELECT</span><span class="w"> </span><span class="k">DISTINCT</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="p">(</span><span class="k">class</span><span class="p">)</span><span class="w"> </span><span class="o">*</span>
</span><span class="k">FROM</span><span class="w"> </span><span class="n">students</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">class</span><span class="p">,</span><span class="w"> </span><span class="n">height</span><span class="w"> </span><span class="k">DESC</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">;</span>
</pre></div>
<p>To understand what <code>DISTINCT ON</code> does at this point, we need to take two steps.</p>
<p>First, split the data into groups based on the columns in the <code>DISTINCT ON</code> clause, in this case by <code>class</code>:</p>
<div class="highlight"><pre><span></span>  name  │ class │ height
─────────────────────────
 Haki   │ A     │    186 ┓
 Jax    │ A     │    182 ┣━━ class=A
 Dan    │ A     │    175 ┛
 David  │ B     │    178 ┓
 George │ B     │    178 ┣━━ class=B
 Bill   │ B     │    167 ┛
</pre></div>
<p>Next, keep only the first row in each group:</p>
<div class="highlight"><pre><span></span>  name  │ class │ height
─────────────────────────
 Haki   │ A     │    186 ┣━━ class=A
 David  │ B     │    178 ┣━━ class=B
</pre></div>
<p>And there you have it! The tallest student in each class.</p>
<p>The only requirement of <code>DISTINCT ON</code> is that the leading columns in the <code>ORDER BY</code> clause match the columns in the <code>DISTINCT ON</code> clause. The remaining columns in the <code>ORDER BY</code> clause determine which row is selected for each group.</p>
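<p>Here is a minimal sketch of that requirement, assuming a throwaway copy of the <code>students</code> table from the examples above (the temporary table and the alias <code>tallest</code> are only for illustration). The commented-out query is rejected because its <code>ORDER BY</code> clause does not start with <code>class</code>. If you want the final result in a different order, one option is to wrap the <code>DISTINCT ON</code> query in a sub-query and sort the outer query:</p>

```sql
-- Throwaway copy of the example data, so the sketch is self-contained.
-- A temporary table shadows any existing "students" table for this session.
CREATE TEMPORARY TABLE students (name text, class text, height int);
INSERT INTO students (name, class, height) VALUES
    ('Haki', 'A', 186), ('Dan', 'A', 175), ('Jax', 'A', 182),
    ('George', 'B', 178), ('Bill', 'B', 167), ('David', 'B', 178);

-- Rejected: the ORDER BY does not start with the DISTINCT ON column.
-- SELECT DISTINCT ON (class) * FROM students ORDER BY height DESC, name;
-- ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY expressions

-- Accepted: pick the tallest student per class in the sub-query,
-- then sort the remaining rows however you like in the outer query.
SELECT *
FROM (
    SELECT DISTINCT ON (class) *
    FROM students
    ORDER BY class, height DESC, name
) AS tallest
ORDER BY height, name;
```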
<p>To illustrate how the <code>ORDER BY</code> affects the results, consider this query to find the <em>shortest</em> student in each class:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="k">DISTINCT</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="p">(</span><span class="k">class</span><span class="p">)</span>
<span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">students</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="hll"><span class="w"> </span><span class="k">class</span><span class="p">,</span><span class="w"> </span><span class="n">height</span><span class="p">,</span><span class="w"> </span><span class="k">name</span><span class="p">;</span>
</span>
<span class="go"> name │ class │ height</span>
<span class="go">──────┼───────┼────────</span>
<span class="go"> Dan  │ A     │    175</span>
<span class="go"> Bill │ B     │    167</span>
</pre></div>
<p>To pick the shortest student in each class, you only have to change the sort order, so that the first row of each group is the shortest student.</p>
<hr>
<h2 id="generate-uuid-without-extensions"><a class="toclink" href="#generate-uuid-without-extensions">Generate UUID Without Extensions</a></h2>
<p>To generate UUIDs in PostgreSQL prior to version 13 you probably used the <a href="https://www.postgresql.org/docs/current/uuid-ossp.html#id-1.11.7.53.5" rel="noopener"><code>uuid-ossp</code> extension</a>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">EXTENSION</span><span class="w"> </span><span class="s s-Name">"uuid-ossp"</span><span class="p">;</span>
<span class="go">CREATE EXTENSION</span>
<span class="hll"><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">uuid_generate_v4</span><span class="p">()</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="nb">uuid</span><span class="p">;</span>
</span><span class="go">                 uuid</span>
<span class="go">──────────────────────────────────────</span>
<span class="go"> 8e55146d-0ce5-40ab-a346-5dbd466ff5f2</span>
</pre></div>
<p>Starting with version 13 there is a <a href="https://www.postgresql.org/docs/current/functions-uuid.html" rel="noopener">built-in function to generate random (version 4) UUIDs</a>:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">gen_random_uuid</span><span class="p">()</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="nb">uuid</span><span class="p">;</span>
</span><span class="go">                 uuid</span>
<span class="go">──────────────────────────────────────</span>
<span class="go"> ba1ac0f5-5d4d-4d80-974d-521dbdcca2b2</span>
</pre></div>
<p>The <code>uuid-ossp</code> extension is still needed if you want to generate UUIDs other than version 4.</p>
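<p>One place where the built-in function comes in handy is column defaults. A small sketch, assuming PostgreSQL 13 or later; the table and column names are made up for illustration:</p>

```sql
-- Hypothetical table: the UUID primary key is produced by the built-in
-- function, so no extension has to be installed.
CREATE TEMPORARY TABLE api_token (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    label text NOT NULL
);

INSERT INTO api_token (label) VALUES ('first'), ('second');

-- Each row received its own random (version 4) UUID.
SELECT id, label FROM api_token;
```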
<hr>
<h2 id="generate-reproducible-random-data"><a class="toclink" href="#generate-reproducible-random-data">Generate Reproducible Random Data</a></h2>
<p>Generating random data is useful for many things, such as demonstrations or testing. In both cases, it's also useful to be able to reproduce the "random" data.</p>
<p>Using PostgreSQL's <code>random</code> function you can <a href="sql-for-data-analysis#random">produce different types of random data</a>. For example:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">random_float</span><span class="p">,</span>
<span class="w"> </span><span class="n">ceil</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">10</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">random_int_0_10</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'2022-01-01'</span><span class="o">::</span><span class="nb">date</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">interval</span><span class="w"> </span><span class="s1">'1 days'</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">ceil</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">365</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">random_day_in_2022</span><span class="p">;</span>
<span class="go">─[ RECORD 1 ]──────┬────────────────────</span>
<span class="go">random_float       │ 0.6031888056092001</span>
<span class="go">random_int_0_10    │ 3</span>
<span class="go">random_day_in_2022 │ 2022-11-10 00:00:00</span>
</pre></div>
<p>If you execute this query again, you will get different results:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">random_float</span><span class="p">,</span>
<span class="w"> </span><span class="n">ceil</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">10</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">random_int_0_10</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'2022-01-01'</span><span class="o">::</span><span class="nb">date</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">interval</span><span class="w"> </span><span class="s1">'1 days'</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">ceil</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">365</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">random_day_in_2022</span><span class="p">;</span>
<span class="go">─[ RECORD 1 ]──────┬────────────────────</span>
<span class="go">random_float       │ 0.7363406030115378</span>
<span class="go">random_int_0_10    │ 2</span>
<span class="go">random_day_in_2022 │ 2022-02-23 00:00:00</span>
</pre></div>
<p>To generate reproducible random data, you can use <a href="https://www.postgresql.org/docs/current/functions-math.html#FUNCTIONS-MATH-RANDOM-TABLE" rel="noopener"><code>setseed</code></a>:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">setseed</span><span class="p">(</span><span class="mf">0.4050</span><span class="p">);</span>
</span><span class="go"> setseed</span>
<span class="go">─────────</span>
<span class="go">(1 row)</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">random_float</span><span class="p">,</span>
<span class="w"> </span><span class="n">ceil</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">10</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">random_int_0_10</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'2022-01-01'</span><span class="o">::</span><span class="nb">date</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">interval</span><span class="w"> </span><span class="s1">'1 days'</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">ceil</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">365</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">random_day_in_2022</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">2</span><span class="p">);</span>
<span class="go">    random_float    │ random_int_0_10 │ random_day_in_2022</span>
<span class="go">────────────────────┼─────────────────┼─────────────────────</span>
<span class="hll"><span class="go"> 0.1924247516794324 │               9 │ 2022-12-17 00:00:00</span>
</span><span class="hll"><span class="go"> 0.9720620908236377 │               5 │ 2022-06-13 00:00:00</span>
</span></pre></div>
<p>If you execute the same block again in a new session, even in a different database, it will produce the exact same results:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="gp">otherdb=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">setseed</span><span class="p">(</span><span class="mf">0.4050</span><span class="p">);</span>
</span><span class="go"> setseed</span>
<span class="go">─────────</span>
<span class="go">(1 row)</span>
<span class="gp">otherdb=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">random_float</span><span class="p">,</span>
<span class="w"> </span><span class="n">ceil</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">10</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">random_int_0_10</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'2022-01-01'</span><span class="o">::</span><span class="nb">date</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">interval</span><span class="w"> </span><span class="s1">'1 days'</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">ceil</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">365</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">random_day_in_2022</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">2</span><span class="p">);</span>
<span class="go">    random_float    │ random_int_0_10 │ random_day_in_2022</span>
<span class="go">────────────────────┼─────────────────┼─────────────────────</span>
<span class="hll"><span class="go"> 0.1924247516794324 │               9 │ 2022-12-17 00:00:00</span>
</span><span class="hll"><span class="go"> 0.9720620908236377 │               5 │ 2022-06-13 00:00:00</span>
</span></pre></div>
<p>Notice how the results are random, but still exactly the same. The next time you do a demonstration or share a script, make sure to include <code>setseed</code> so your results can be easily reproduced.</p>
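<p>Here is a sketch of how this might look in a script that builds a demo table. The table name and columns are hypothetical. It relies on the documented behavior that <code>setseed</code> accepts a value between -1 and 1 and affects subsequent <code>random()</code> calls in the current session, so the seed is set right before the statement that consumes the random values:</p>

```sql
-- Seed the generator for this session (the value must be between -1 and 1).
SELECT setseed(0.4050);

-- Hypothetical demo table filled with pseudo-random, but repeatable, values.
CREATE TEMPORARY TABLE demo_orders AS
SELECT
    id,
    round((random() * 100)::numeric, 2) AS price,
    '2022-01-01'::date + floor(random() * 365)::int AS ordered_on
FROM
    generate_series(1, 5) AS id;

-- Re-seed and generate the same rows again; comparing the two sets with
-- EXCEPT shows whether the second run differs from the stored one.
SELECT setseed(0.4050);

SELECT
    id,
    round((random() * 100)::numeric, 2) AS price,
    '2022-01-01'::date + floor(random() * 365)::int AS ordered_on
FROM
    generate_series(1, 5) AS id
EXCEPT
SELECT id, price, ordered_on FROM demo_orders;
```

<p>If the comparison at the end returns no rows, the second run reproduced the first one. Seeded sequences are not guaranteed to match across PostgreSQL major versions, so it may be worth noting the server version alongside the seed when you share a script.</p>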
<hr>
<h2 id="add-constraints-without-validating-immediately"><a class="toclink" href="#add-constraints-without-validating-immediately">Add Constraints Without Validating Immediately</a></h2>
<p>Constraints are an integral part of any RDBMS. They keep data clean and reliable, and should be used whenever possible. In living, breathing systems, you often need to add new constraints, and adding certain types of constraints may require very restrictive locks that interfere with the operation of the live system.</p>
<p>To illustrate, add a simple check constraint on a large table:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="k">ADD</span><span class="w"> </span><span class="k">CONSTRAINT</span><span class="w"> </span><span class="n">check_price_gt_zero</span><span class="w"> </span><span class="k">CHECK</span><span class="w"> </span><span class="p">(</span><span class="n">price</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mf">0</span><span class="p">);</span>
<span class="go">ALTER TABLE</span>
<span class="go">Time: 10745.662 ms (00:10.746)</span>
</pre></div>
<p>This statement adds a check constraint on the price of an order, to make sure it's greater than or equal to zero. In the process of adding the constraint, the database scanned the entire table to make sure the constraint is valid for all the existing rows. The process took ~10s, and during that time, the table was locked.</p>
<p><strong>In PostgreSQL, you can split the process of adding a constraint into two steps.</strong></p>
<p>First, add the constraint and only validate new data, but don't check that existing data is valid:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="k">ADD</span><span class="w"> </span><span class="k">CONSTRAINT</span><span class="w"> </span><span class="n">check_price_gt_zero</span><span class="w"> </span><span class="k">CHECK</span><span class="w"> </span><span class="p">(</span><span class="n">price</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">VALID</span><span class="p">;</span>
<span class="go">ALTER TABLE</span>
<span class="go">Time: 13.590 ms</span>
</pre></div>
<p>The <code>NOT VALID</code> at the end tells PostgreSQL not to validate the new constraint for existing rows. This means the database does not have to scan the entire table. Notice how this statement took significantly less time than the previous one: it was almost instantaneous.</p>
<p>Next, validate the constraint for the existing data with a much more permissive lock that allows other operations on the table:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="k">VALIDATE</span><span class="w"> </span><span class="k">CONSTRAINT</span><span class="w"> </span><span class="n">check_price_gt_zero</span><span class="p">;</span>
<span class="go">ALTER TABLE</span>
<span class="go">Time: 11231.189 ms (00:11.231)</span>
</pre></div>
<p>Notice how validating the constraint took roughly the same time as the first example, which added and validated the constraint. This reaffirms that when adding a constraint to an existing table, most of the time is spent validating existing rows. Splitting the process into two steps allows you to reduce the time the table is locked.</p>
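<p>To see where a constraint stands between the two steps, you can look it up in the <code>pg_constraint</code> catalog, whose <code>convalidated</code> column records whether the constraint has been validated. A minimal sketch, assuming a small throwaway <code>orders</code> table standing in for the large one:</p>

```sql
-- Throwaway stand-in for the large table in the example.
CREATE TEMPORARY TABLE orders (id int, price numeric);
INSERT INTO orders VALUES (1, 10), (2, 0);

-- Step 1: add the constraint without scanning existing rows.
ALTER TABLE orders
    ADD CONSTRAINT check_price_gt_zero CHECK (price >= 0) NOT VALID;

-- convalidated is false: only new and updated rows are checked so far.
SELECT conname, convalidated
FROM pg_constraint
WHERE conrelid = 'orders'::regclass;

-- Step 2: check the existing rows.
ALTER TABLE orders VALIDATE CONSTRAINT check_price_gt_zero;

-- convalidated is now true.
SELECT conname, convalidated
FROM pg_constraint
WHERE conrelid = 'orders'::regclass;
```

<p>According to the PostgreSQL documentation, <code>VALIDATE CONSTRAINT</code> acquires a <code>SHARE UPDATE EXCLUSIVE</code> lock, which does not block regular reads and writes on the table while the existing rows are being scanned.</p>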
<p>The documentation also mentions <a href="https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-NOTES" rel="noopener">another use case for <code>NOT VALID</code></a> - enforcing a constraint only on future updates, even if there are some existing bad values. That is, you would add <code>NOT VALID</code> and never do the <code>VALIDATE</code>.</p>
<p>Check out this great article from the engineering team at PayPal about <a href="https://medium.com/paypal-tech/postgresql-at-scale-database-schema-changes-without-downtime-20d3749ed680" rel="noopener">making schema changes without downtime</a>, and my own tip to <a href="https://hakibenita.com/sql-tricks-application-dba#disable-constraints-and-indexes-during-bulk-loads" rel="noopener">disable constraints and indexes during bulk loads</a>.</p>
<hr>
<h2 id="synonyms-in-postgresql"><a class="toclink" href="#synonyms-in-postgresql">Synonyms in PostgreSQL</a></h2>
<p>Synonyms are a way to reference objects by another name, similar to symlinks in Linux. If you're coming from Oracle you are probably familiar with synonyms, but otherwise you may have never heard of them. PostgreSQL does not have a feature called "synonyms", but that doesn't mean you can't get the same effect.</p>
<p>To have a name reference a different database object, you first need to understand how PostgreSQL resolves unqualified names. For example, if you are connected to the database with the user <code>haki</code>, and you reference a table <code>foo</code>, PostgreSQL will search for the following objects, in this order:</p>
<ol>
<li><code>haki.foo</code></li>
<li><code>public.foo</code></li>
</ol>
<p>This order is determined by the <a href="https://www.postgresql.org/docs/current/ddl-schemas.html#DDL-SCHEMAS-PATH" rel="noopener"><code>search_path</code> parameter</a>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SHOW</span><span class="w"> </span><span class="n">search_path</span><span class="p">;</span>
<span class="go">   search_path</span>
<span class="go">─────────────────</span>
<span class="go"> "$user", public</span>
</pre></div>
<p>The first value, <code>"$user"</code> is a special value that resolves to the name of the currently connected user. The second value, <code>public</code>, is the name of the default schema.</p>
<p>To demonstrate some of the things you can do with search path, create a table <code>foo</code> in database <code>db</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">foo</span><span class="w"> </span><span class="p">(</span><span class="k">value</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">);</span>
<span class="go">CREATE TABLE</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">foo</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="s1">'A'</span><span class="p">);</span>
<span class="go">INSERT 0 1</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">foo</span><span class="p">;</span>
<span class="go"> value</span>
<span class="go">───────</span>
<span class="go"> A</span>
<span class="go">(1 row)</span>
</pre></div>
<p>If for some reason you want the user <code>haki</code> to view a different object when they reference the name <code>foo</code>, you have two options:</p>
<p><strong>1. Create an object named <code>foo</code> in a schema called <code>haki</code>:</strong></p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">SCHEMA</span><span class="w"> </span><span class="n">haki</span><span class="p">;</span>
<span class="go">CREATE SCHEMA</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">haki</span><span class="mf">.</span><span class="n">foo</span><span class="w"> </span><span class="p">(</span><span class="k">value</span><span class="w"> </span><span class="nb">text</span><span class="p">);</span>
<span class="go">CREATE TABLE</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">haki</span><span class="mf">.</span><span class="n">foo</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="s1">'B'</span><span class="p">);</span>
<span class="go">INSERT 0 1</span>
<span class="gp">db=#</span><span class="w"> </span><span class="kp">\conninfo</span>
<span class="go">You are connected to database "db" as user "haki"</span>
<span class="hll"><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">foo</span><span class="p">;</span>
</span><span class="hll"><span class="go"> value</span>
</span><span class="hll"><span class="go">───────</span>
</span><span class="hll"><span class="go"> B</span>
</span></pre></div>
<p>Notice how when the user <code>haki</code> referenced the name <code>foo</code>, PostgreSQL resolved the name to <code>haki.foo</code> and not <code>public.foo</code>. This is because the schema <code>haki</code> comes before <code>public</code> in the search path.</p>
<p><strong>2. Update the search path:</strong></p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">SCHEMA</span><span class="w"> </span><span class="n">synonyms</span><span class="p">;</span>
<span class="go">CREATE SCHEMA</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">synonyms</span><span class="mf">.</span><span class="n">foo</span><span class="w"> </span><span class="p">(</span><span class="k">value</span><span class="w"> </span><span class="nb">text</span><span class="p">);</span>
<span class="go">CREATE TABLE</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">synonyms</span><span class="mf">.</span><span class="n">foo</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="s1">'C'</span><span class="p">);</span>
<span class="go">INSERT 0 1</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SHOW</span><span class="w"> </span><span class="n">search_path</span><span class="p">;</span>
<span class="go">   search_path</span>
<span class="go">─────────────────</span>
<span class="hll"><span class="go"> "$user", public</span>
</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">foo</span><span class="p">;</span>
<span class="go"> value</span>
<span class="go">───────</span>
<span class="hll"><span class="go"> A</span>
</span>
<span class="hll"><span class="gp">db=#</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">search_path</span><span class="w"> </span><span class="k">TO</span><span class="w"> </span><span class="n">synonyms</span><span class="p">,</span><span class="w"> </span><span class="s s-Name">"$user"</span><span class="p">,</span><span class="w"> </span><span class="n">public</span><span class="p">;</span>
</span><span class="go">SET</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">foo</span><span class="p">;</span>
<span class="go"> value</span>
<span class="go">───────</span>
<span class="hll"><span class="go"> C</span>
</span></pre></div>
<p>Notice how after changing the search path to include the schema <code>synonyms</code>, PostgreSQL resolved the name <code>foo</code> to <code>synonyms.foo</code>.</p>
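<p>Changing the search path with <code>SET</code> only affects the current session. As a minimal sketch, assuming the role <code>haki</code> and the schema <code>synonyms</code> from the examples above, you could check which schema an unqualified name resolves to, and persist the search path for the role so that new sessions pick it up:</p>
<div class="highlight"><pre><span></span>-- Which schema does the unqualified name "foo" resolve to right now?
-- The cast to regclass uses the current search path to look up the name.
SELECT n.nspname AS schema_name, c.relname AS table_name
FROM pg_class AS c
JOIN pg_namespace AS n ON n.oid = c.relnamespace
WHERE c.oid = 'foo'::regclass;

-- Persist the search path for the role (assumed here to be "haki").
-- This applies to new sessions only; existing sessions keep their setting.
ALTER ROLE haki SET search_path TO synonyms, "$user", public;
</pre></div>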
<p><strong>When are synonyms useful?</strong></p>
<p>I used to think that synonyms are a code smell that should be avoided, but over time I found a few valid use cases for them. One of those use cases is zero-downtime migrations.</p>
<p>When you are making changes to a table on a live system, you often need to support both the new and the old version of the application at the same time. This poses a challenge, because each version of the application expects the table to have a different structure.</p>
<p>Take for example a migration to remove a column from a table. While the migration is running, the old version of the application is active, and it expects the column to exist in the table, so you can't simply remove it. One way to deal with this is to release the new version in two stages - the first ignores the field, and the second removes it.</p>
<p>If, however, you need to make the change in a single release, you can provide the old version with a view of the table that includes the column, and only then remove it. For that, you can use a "synonym":</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="kp">\conninfo</span>
<span class="go">You are now connected to database "db" as user "app".</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="p">;</span>
<span class="go"> username │ active</span>
<span class="go">──────────┼────────</span>
<span class="go"> haki     │ t</span>
</pre></div>
<p>The application is connected to database <code>db</code> with the user <code>app</code>. You want to remove the column <code>active</code>, but the application is using this column. To safely apply the migration you need to "fool" the user <code>app</code> into thinking the column is still there while the old version is active:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="kp">\conninfo</span>
<span class="go">You are now connected to database "db" as user "admin".</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">SCHEMA</span><span class="w"> </span><span class="n">app</span><span class="p">;</span>
<span class="go">CREATE SCHEMA</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">GRANT</span><span class="w"> </span><span class="n">USAGE</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="k">SCHEMA</span><span class="w"> </span><span class="n">app</span><span class="w"> </span><span class="k">TO</span><span class="w"> </span><span class="n">app</span><span class="p">;</span>
<span class="go">GRANT</span>
<span class="hll"><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">VIEW</span><span class="w"> </span><span class="n">app</span><span class="mf">.</span><span class="n">users</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">username</span><span class="p">,</span><span class="w"> </span><span class="k">true</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">active</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">public</span><span class="mf">.</span><span class="n">users</span><span class="p">;</span>
</span><span class="go">CREATE VIEW</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">GRANT</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">app</span><span class="mf">.</span><span class="n">users</span><span class="w"> </span><span class="k">TO</span><span class="w"> </span><span class="n">app</span><span class="p">;</span>
<span class="go">GRANT</span>
</pre></div>
<p>To "fool" the user <code>app</code>, you created a schema by the name of the user, and a view with a calculated field <code>active</code>. Now, when the application is connected with user <code>app</code>, it will see the view and not the table, so it's safe to remove the column:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="kp">\conninfo</span>
<span class="go">You are now connected to database "db" as user "admin".</span>
<span class="hll"><span class="gp">db=#</span><span class="w"> </span><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">DROP</span><span class="w"> </span><span class="k">COLUMN</span><span class="w"> </span><span class="n">active</span><span class="p">;</span>
</span><span class="go">ALTER TABLE</span>
<span class="gp">db=#</span><span class="w"> </span><span class="kp">\connect</span><span class="w"> </span><span class="ss">db</span><span class="w"> </span><span class="ss">app</span>
<span class="go">You are now connected to database "db" as user "app".</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="p">;</span>
<span class="go"> username │ active</span>
<span class="go">──────────┼────────</span>
<span class="hll"><span class="go"> haki     │ t</span>
</span></pre></div>
<p>You dropped the column and the application sees the calculated field instead! All that is left is some cleanup and you are done.</p>
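<p>What the cleanup looks like depends on your deployment. As a rough sketch, assuming the new version of the application no longer references <code>active</code> and nothing else was created in the schema <code>app</code>, it could be as simple as dropping the view and the schema:</p>
<div class="highlight"><pre><span></span>-- Run as the admin user, once no deployed version reads the "active" column.
-- Assumption: the view is the only object in the schema "app".
DROP VIEW app.users;
DROP SCHEMA app;
</pre></div>
<p>Once the schema <code>app</code> is gone, the name <code>users</code> resolves to <code>public.users</code> again for the user <code>app</code>, because a schema in the search path that does not exist is ignored.</p>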
<hr>
<h2 id="find-overlapping-ranges"><a class="toclink" href="#find-overlapping-ranges">Find Overlapping Ranges</a></h2>
<p>Say you have a table of meetings:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">meetings</span><span class="p">;</span>
<span class="go">      starts_at      │       ends_at</span>
<span class="go">─────────────────────┼─────────────────────</span>
<span class="go"> 2021-10-01 10:00:00 │ 2021-10-01 10:30:00</span>
<span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00</span>
<span class="go"> 2021-10-01 12:30:00 │ 2021-10-01 12:45:00</span>
</pre></div>
<p><details></p>
<p><summary>Table data</summary></p>
<p>You can use the following CTE to reproduce the queries in this section:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">meetings</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">starts_at</span><span class="o">::</span><span class="nb">timestamptz</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">starts_at</span><span class="p">,</span>
<span class="w"> </span><span class="n">ends_at</span><span class="o">::</span><span class="nb">timestamptz</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">ends_at</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="k">VALUES</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-10-01 10:00 UTC'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 10:30 UTC'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-10-01 11:15 UTC'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 12:00 UTC'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-10-01 12:30 UTC'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 12:45 UTC'</span><span class="p">)</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">t</span><span class="p">(</span>
<span class="w"> </span><span class="n">starts_at</span><span class="p">,</span><span class="w"> </span><span class="n">ends_at</span><span class="p">)</span>
<span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">meetings</span><span class="p">;</span>
</pre></div>
<p></details></p>
<p>You want to schedule a new meeting, but before you do that, you want to make sure it does not overlap with another meeting. There are several scenarios you need to consider:</p>
<ul>
<li><strong>[A]</strong> New meeting ends after an existing meeting starts</li>
</ul>
<div class="highlight"><pre><span></span>|-------NEW MEETING--------|
        |*******EXISTING MEETING*******|
</pre></div>
<ul>
<li><strong>[B]</strong> New meeting starts before an existing meeting ends</li>
</ul>
<div class="highlight"><pre><span></span>        |-------NEW MEETING--------|
|*******EXISTING MEETING*******|
</pre></div>
<ul>
<li><strong>[C]</strong> New meeting takes place during an existing meeting</li>
</ul>
<div class="highlight"><pre><span></span>     |----NEW MEETING----|
|*******EXISTING MEETING*******|
</pre></div>
<ul>
<li><strong>[D]</strong> Existing meeting takes place while the new meeting is scheduled</li>
</ul>
<div class="highlight"><pre><span></span>|--------NEW MEETING--------|
    |**EXISTING MEETING**|
</pre></div>
<ul>
<li><strong>[E]</strong> New meeting is scheduled at exactly the same time as an existing meeting</li>
</ul>
<div class="highlight"><pre><span></span>|--------NEW MEETING--------|
|*****EXISTING MEETING******|
</pre></div>
<p>To test a query that checks for overlaps, you can prepare a table with all the scenarios above, and try a simple condition:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">new_meetings</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">id</span><span class="p">,</span>
<span class="w"> </span><span class="n">starts_at</span><span class="o">::</span><span class="nb">timestamptz</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">starts_at</span><span class="p">,</span>
<span class="w"> </span><span class="n">ends_at</span><span class="o">::</span><span class="nb">timestamptz</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">ends_at</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="k">VALUES</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'A'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:10 UTC'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:55 UTC'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'B'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:20 UTC'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 12:05 UTC'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'C'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:20 UTC'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:55 UTC'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'D'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:10 UTC'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 12:05 UTC'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'E'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:15 UTC'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 12:00 UTC'</span><span class="p">)</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">t</span><span class="p">(</span>
<span class="w"> </span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">starts_at</span><span class="p">,</span><span class="w"> </span><span class="n">ends_at</span>
<span class="w"> </span><span class="p">)</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">meetings</span><span class="p">,</span><span class="w"> </span><span class="n">new_meetings</span>
<span class="k">WHERE</span>
<span class="w"> </span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="n">meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">meetings</span><span class="mf">.</span><span class="n">ends_at</span>
<span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">ends_at</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="n">meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">meetings</span><span class="mf">.</span><span class="n">ends_at</span><span class="p">;</span>
<span class="go">      starts_at      │       ends_at       │ id │      starts_at      │       ends_at</span>
<span class="go">─────────────────────┼─────────────────────┼────┼─────────────────────┼─────────────────────</span>
<span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ A  │ 2021-10-01 11:10:00 │ 2021-10-01 11:55:00</span>
<span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ B  │ 2021-10-01 11:20:00 │ 2021-10-01 12:05:00</span>
<span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ C  │ 2021-10-01 11:20:00 │ 2021-10-01 11:55:00</span>
<span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ E  │ 2021-10-01 11:15:00 │ 2021-10-01 12:00:00</span>
</pre></div>
<p>The first attempt found an overlap in 4 out of 5 scenarios. It did not detect the overlap for scenario <code>D</code>, where the new meeting starts before and ends after an existing meeting. To handle this scenario as well, you need to make the condition a bit longer:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">new_meetings</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="cm">/* ... */</span><span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">meetings</span><span class="p">,</span><span class="w"> </span><span class="n">new_meetings</span>
<span class="k">WHERE</span>
<span class="w"> </span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="n">meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">meetings</span><span class="mf">.</span><span class="n">ends_at</span>
<span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">ends_at</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="n">meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">meetings</span><span class="mf">.</span><span class="n">ends_at</span>
<span class="hll"><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="n">meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">ends_at</span>
</span><span class="hll"><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="n">meetings</span><span class="mf">.</span><span class="n">ends_at</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">ends_at</span><span class="p">;</span>
</span>
<span class="go">      starts_at      │       ends_at       │ id │      starts_at      │       ends_at</span>
<span class="go">─────────────────────┼─────────────────────┼────┼─────────────────────┼─────────────────────</span>
<span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ A  │ 2021-10-01 11:10:00 │ 2021-10-01 11:55:00</span>
<span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ B  │ 2021-10-01 11:20:00 │ 2021-10-01 12:05:00</span>
<span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ C  │ 2021-10-01 11:20:00 │ 2021-10-01 11:55:00</span>
<span class="hll"><span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ D  │ 2021-10-01 11:10:00 │ 2021-10-01 12:05:00</span>
</span><span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ E  │ 2021-10-01 11:15:00 │ 2021-10-01 12:00:00</span>
</pre></div>
<p>The query now detects an overlap in all 5 scenarios, but consider these additional scenarios:</p>
<ul>
<li><strong>[F]</strong> New meeting is scheduled immediately after an existing meeting</li>
</ul>
<div class="highlight"><pre><span></span>                            |--------NEW MEETING--------|
|*****EXISTING MEETING******|
</pre></div>
<ul>
<li><strong>[G]</strong> New meeting is scheduled to end immediately when an existing meeting starts</li>
</ul>
<div class="highlight"><pre><span></span>|--------NEW MEETING--------|
                            |*****EXISTING MEETING******|
</pre></div>
<p>Back-to-back meetings are very common, and they should not be detected as an overlap. Add the two scenarios to the test and try the query again:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">new_meetings</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">id</span><span class="p">,</span>
<span class="w"> </span><span class="n">starts_at</span><span class="o">::</span><span class="nb">timestamptz</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">starts_at</span><span class="p">,</span>
<span class="w"> </span><span class="n">ends_at</span><span class="o">::</span><span class="nb">timestamptz</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">ends_at</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="k">VALUES</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'A'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:10 UTC'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:55 UTC'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'B'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:20 UTC'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 12:05 UTC'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'C'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:20 UTC'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:55 UTC'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'D'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:10 UTC'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 12:05 UTC'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'E'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:15 UTC'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 12:00 UTC'</span><span class="p">),</span>
<span class="hll"><span class="w"> </span><span class="p">(</span><span class="s1">'F'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 12:00 UTC'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 12:10 UTC'</span><span class="p">),</span>
</span><span class="hll"><span class="w"> </span><span class="p">(</span><span class="s1">'G'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:00 UTC'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:15 UTC'</span><span class="p">)</span>
</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">t</span><span class="p">(</span>
<span class="w"> </span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">starts_at</span><span class="p">,</span><span class="w"> </span><span class="n">ends_at</span>
<span class="w"> </span><span class="p">)</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">meetings</span><span class="p">,</span><span class="w"> </span><span class="n">new_meetings</span>
<span class="k">WHERE</span>
<span class="w"> </span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="n">meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">meetings</span><span class="mf">.</span><span class="n">ends_at</span>
<span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">ends_at</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="n">meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">meetings</span><span class="mf">.</span><span class="n">ends_at</span>
<span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="n">meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">ends_at</span>
<span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="n">meetings</span><span class="mf">.</span><span class="n">ends_at</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">ends_at</span><span class="p">;</span>
<span class="go">      starts_at      │       ends_at       │ id │      starts_at      │       ends_at</span>
<span class="go">─────────────────────┼─────────────────────┼────┼─────────────────────┼─────────────────────</span>
<span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ A  │ 2021-10-01 11:10:00 │ 2021-10-01 11:55:00</span>
<span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ B  │ 2021-10-01 11:20:00 │ 2021-10-01 12:05:00</span>
<span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ C  │ 2021-10-01 11:20:00 │ 2021-10-01 11:55:00</span>
<span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ D  │ 2021-10-01 11:10:00 │ 2021-10-01 12:05:00</span>
<span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ E  │ 2021-10-01 11:15:00 │ 2021-10-01 12:00:00</span>
<span class="hll"><span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ F  │ 2021-10-01 12:00:00 │ 2021-10-01 12:10:00</span>
</span><span class="hll"><span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ G  │ 2021-10-01 11:00:00 │ 2021-10-01 11:15:00</span>
</span></pre></div>
<p>The two back-to-back meetings, scenarios <code>F</code> and <code>G</code>, are incorrectly classified as overlaps. This happens because <a href="sql-dos-and-donts#use-between-only-for-inclusive-ranges">the operator <code>BETWEEN</code> is inclusive</a>. To implement this condition without using <code>BETWEEN</code> you would have to do something like this:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">new_meetings</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="cm">/* ... */</span><span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">meetings</span><span class="p">,</span><span class="w"> </span><span class="n">new_meetings</span>
<span class="k">WHERE</span>
<span class="w"> </span><span class="p">(</span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">meetings</span><span class="mf">.</span><span class="n">ends_at</span><span class="p">)</span>
<span class="w"> </span><span class="k">OR</span>
<span class="w"> </span><span class="p">(</span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">ends_at</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">ends_at</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">meetings</span><span class="mf">.</span><span class="n">ends_at</span><span class="p">)</span>
<span class="w"> </span><span class="k">OR</span>
<span class="w"> </span><span class="p">(</span><span class="n">meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">ends_at</span><span class="p">)</span>
<span class="w"> </span><span class="k">OR</span>
<span class="w"> </span><span class="p">(</span><span class="n">meetings</span><span class="mf">.</span><span class="n">ends_at</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">meetings</span><span class="mf">.</span><span class="n">ends_at</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">ends_at</span><span class="p">)</span>
<span class="w"> </span><span class="k">OR</span>
<span class="w"> </span><span class="p">(</span><span class="n">meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">meetings</span><span class="mf">.</span><span class="n">ends_at</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">ends_at</span><span class="p">);</span>
<span class="go">      starts_at      │       ends_at       │ id │      starts_at      │       ends_at</span>
<span class="go">─────────────────────┼─────────────────────┼────┼─────────────────────┼─────────────────────</span>
<span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ A  │ 2021-10-01 11:10:00 │ 2021-10-01 11:55:00</span>
<span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ B  │ 2021-10-01 11:20:00 │ 2021-10-01 12:05:00</span>
<span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ C  │ 2021-10-01 11:20:00 │ 2021-10-01 11:55:00</span>
<span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ D  │ 2021-10-01 11:10:00 │ 2021-10-01 12:05:00</span>
<span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ E  │ 2021-10-01 11:15:00 │ 2021-10-01 12:00:00</span>
</pre></div>
<p>The query correctly identifies scenarios <code>A</code> - <code>E</code> as overlaps, and does not identify the back-to-back scenarios <code>F</code> and <code>G</code> as overlaps. This is what you wanted. However, this condition is pretty crazy! It can easily get out of control.</p>
<p>This is where the following PostgreSQL operator proves extremely valuable:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">new_meetings</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">id</span><span class="p">,</span>
<span class="w"> </span><span class="n">starts_at</span><span class="o">::</span><span class="nb">timestamptz</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">starts_at</span><span class="p">,</span>
<span class="w"> </span><span class="n">ends_at</span><span class="o">::</span><span class="nb">timestamptz</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">ends_at</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="k">VALUES</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'A'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:10 UTC'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:55 UTC'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'B'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:20 UTC'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 12:05 UTC'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'C'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:20 UTC'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:55 UTC'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'D'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:10 UTC'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 12:05 UTC'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'E'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:15 UTC'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 12:00 UTC'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'F'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 12:00 UTC'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 12:10 UTC'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'G'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:00 UTC'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2021-10-01 11:15 UTC'</span><span class="p">)</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">t</span><span class="p">(</span>
<span class="w"> </span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">starts_at</span><span class="p">,</span><span class="w"> </span><span class="n">ends_at</span>
<span class="w"> </span><span class="p">)</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">meetings</span><span class="p">,</span><span class="w"> </span><span class="n">new_meetings</span>
<span class="k">WHERE</span>
<span class="hll"><span class="w"> </span><span class="p">(</span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="p">,</span><span class="w"> </span><span class="n">new_meetings</span><span class="mf">.</span><span class="n">ends_at</span><span class="p">)</span>
</span><span class="hll"><span class="w"> </span><span class="k">OVERLAPS</span><span class="w"> </span><span class="p">(</span><span class="n">meetings</span><span class="mf">.</span><span class="n">starts_at</span><span class="p">,</span><span class="w"> </span><span class="n">meetings</span><span class="mf">.</span><span class="n">ends_at</span><span class="p">);</span>
</span>
<span class="go">      starts_at      │       ends_at       │ id │      starts_at      │       ends_at</span>
<span class="go">─────────────────────┼─────────────────────┼────┼─────────────────────┼─────────────────────</span>
<span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ A  │ 2021-10-01 11:10:00 │ 2021-10-01 11:55:00</span>
<span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ B  │ 2021-10-01 11:20:00 │ 2021-10-01 12:05:00</span>
<span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ C  │ 2021-10-01 11:20:00 │ 2021-10-01 11:55:00</span>
<span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ D  │ 2021-10-01 11:10:00 │ 2021-10-01 12:05:00</span>
<span class="go"> 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ E  │ 2021-10-01 11:15:00 │ 2021-10-01 12:00:00</span>
</pre></div>
<p>This is it! Using the <a href="https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-TABLE" rel="noopener"><code>OVERLAPS</code> operator</a> you can replace those 5 complicated conditions, and keep the query short and simple to read and understand.</p>
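<p>If you need the same check in application code, the logic can be sketched in a few lines of Python. This is only an illustration, not how PostgreSQL implements the operator: it assumes both ranges are non-empty and half-open (the end moment is excluded), and the helper names <code>overlaps</code> and <code>at</code> are made up for this example.</p>

```python
from datetime import datetime


def overlaps(start_a, end_a, start_b, end_b):
    """Return True if two non-empty, half-open ranges [start, end) overlap."""
    # Back-to-back ranges share only an endpoint, so strict comparisons exclude them.
    return start_a < end_b and start_b < end_a


def at(hhmm):
    """Hypothetical helper: build a datetime on 2021-10-01 from 'HH:MM'."""
    return datetime.strptime(f"2021-10-01 {hhmm}", "%Y-%m-%d %H:%M")


# The existing meeting from the examples above.
existing = (at("11:15"), at("12:00"))

# The same seven scenarios used in the SQL queries.
new_meetings = {
    "A": (at("11:10"), at("11:55")),
    "B": (at("11:20"), at("12:05")),
    "C": (at("11:20"), at("11:55")),
    "D": (at("11:10"), at("12:05")),
    "E": (at("11:15"), at("12:00")),
    "F": (at("12:00"), at("12:10")),
    "G": (at("11:00"), at("11:15")),
}

overlapping = [
    meeting_id
    for meeting_id, (starts_at, ends_at) in new_meetings.items()
    if overlaps(starts_at, ends_at, *existing)
]
print(overlapping)  # ['A', 'B', 'C', 'D', 'E']
```

<p>As in the SQL version, the back-to-back scenarios <code>F</code> and <code>G</code> are not reported as overlaps.</p>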
<hr>
<p><em>UPDATES</em></p>
<ul>
<li>
<p>2021-11-09: A <a href="https://www.reddit.com/r/programming/comments/qpj4cy/comment/hjwwqgi/?utm_source=share&utm_medium=web2x&context=3" rel="noopener">commenter on Reddit</a> spotted a mistake in the name of the psql parameter in the "Autocomplete Reserved Words in Uppercase" section. Fixed <code>COMP_KEYWORD_UPPER</code> to <code>COMP_KEYWORD_CASE</code>.</p>
</li>
<li>
<p>2021-11-09: The example for <code>pg_sleep</code> was sleeping for 4 hours (14400 seconds) and not 4 minutes as was previously mentioned in the article. The example was changed to better illustrate the benefit of using an interval with <code>pg_sleep_for</code>.</p>
</li>
</ul>One Database Transaction Too Many2021-06-07T00:00:00+03:002021-06-07T00:00:00+03:00Haki Benitatag:hakibenita.com,2021-06-07:/django-nested-transaction<p>A story about how I ended up sending hundreds of users messages saying they got paid when they didn't! In the process we've learned a valuable lesson about nested transactions and Django signals.</p><hr>
<p>Have you ever wondered how bugs are born? I'm not talking about the trivial kind you can catch with simple unit testing. I'm talking about bugs that may not be apparent at first sight, but are so obvious in retrospect.</p>
<p><strong>This is a story about how I accidentally sent hundreds of users messages saying they got paid when they didn't!</strong></p>
<figure><img alt="What it feels like when you realize you made a mistake. Illustration by Milica Vezmar Basara" src="https://hakibenita.com/images/abstrakt-design-halloween-2020-22.png"><figcaption>What it feels like when you realize you made a mistake<br><small>Illustration by <a href="https://www.abstrakt.design/">Milica Vezmar Basara</a></small></figcaption>
</figure>
<p><details class="toc-container" open>
<summary>Table of Contents</summary></p>
<div class="toc">
<ul>
<li><a href="#the-story">The Story</a><ul>
<li><a href="#creating-a-payout">Creating a Payout</a></li>
<li><a href="#sending-notifications">Sending Notifications</a></li>
<li><a href="#working-in-bulk">Working in Bulk</a></li>
<li><a href="#the-bug">The Bug</a></li>
<li><a href="#nested-transactions">Nested Transactions</a></li>
</ul>
</li>
<li><a href="#remedies">Remedies</a><ul>
<li><a href="#assert-atomic-block">Assert Atomic Block</a></li>
<li><a href="#durable-transaction">Durable Transaction</a></li>
<li><a href="#sending-signal-on-commit">Sending Signal on Commit</a></li>
<li><a href="#using-a-queue">Using a Queue</a></li>
</ul>
</li>
<li><a href="#testing">Testing</a><ul>
<li><a href="#testing-with-django">Testing with Django</a></li>
<li><a href="#testing-with-pytest">Testing with Pytest</a></li>
</ul>
</li>
<li><a href="#thoughts-on-django-signals">Thoughts on Django Signals</a></li>
</ul>
</div>
<p></details></p>
<hr>
<h2 id="the-story"><a class="toclink" href="#the-story">The Story</a></h2>
<p>We have a process in the system where we pay out money to merchants and other types of users. The payout process is a pretty big deal for most users because this is how they get paid.</p>
<h3 id="creating-a-payout"><a class="toclink" href="#creating-a-payout">Creating a Payout</a></h3>
<p>To facilitate the payout process we have a Django model called <code>PayoutProcess</code>. To create a new payout we use a function that looks roughly like this:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">annotations</span>
<span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span><span class="p">,</span> <span class="n">transaction</span> <span class="k">as</span> <span class="n">db_transaction</span>
<span class="k">class</span> <span class="nc">PayoutProcess</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="c1">#... fields</span>
    <span class="nd">@classmethod</span>
    <span class="k">def</span> <span class="nf">create</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">to</span><span class="p">:</span> <span class="n">User</span><span class="p">,</span> <span class="n">amount</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-></span> <span class="n">PayoutProcess</span><span class="p">:</span>
        <span class="c1"># ... Validate input ...</span>
        <span class="k">with</span> <span class="n">db_transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
            <span class="n">payout</span> <span class="o">=</span> <span class="bp">cls</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create</span><span class="p">(</span>
                <span class="n">to</span><span class="o">=</span><span class="n">to</span><span class="p">,</span>
                <span class="n">amount</span><span class="o">=</span><span class="n">amount</span><span class="p">,</span>
                <span class="n">status</span><span class="o">=</span><span class="s1">'pending'</span><span class="p">,</span>
            <span class="p">)</span>
            <span class="c1"># Create related objects etc...</span>
        <span class="k">return</span> <span class="n">payout</span>
</pre></div>
<p>The simplified version of this function creates a new instance of a payout process and returns it. In real life this function validates the input and creates several related objects. To make sure all of the related objects are created along with the payout process instance, we use a database transaction.</p>
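<p>To see why the transaction matters, here is a minimal sketch of the all-or-nothing behavior. It deliberately uses Python's built-in <code>sqlite3</code> module instead of Django's <code>atomic()</code> so it can run on its own, and the tables <code>payout</code> and <code>payout_item</code> as well as the function <code>create_payout</code> are hypothetical stand-ins, not the real models.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payout ("
    "id INTEGER PRIMARY KEY, amount INTEGER NOT NULL, status TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE payout_item ("
    "id INTEGER PRIMARY KEY, payout_id INTEGER NOT NULL, note TEXT NOT NULL)"
)


def create_payout(conn, amount, note):
    """Create a payout and one related row, or nothing at all."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if an exception escapes it.
    with conn:
        cursor = conn.execute(
            "INSERT INTO payout (amount, status) VALUES (?, 'pending')",
            (amount,),
        )
        payout_id = cursor.lastrowid
        conn.execute(
            "INSERT INTO payout_item (payout_id, note) VALUES (?, ?)",
            (payout_id, note),
        )
    return payout_id


create_payout(conn, 100, "commission")

try:
    # The related row violates NOT NULL, so the payout row is rolled back too.
    create_payout(conn, 200, None)
except sqlite3.IntegrityError:
    pass

payout_count = conn.execute("SELECT COUNT(*) FROM payout").fetchone()[0]
print(payout_count)  # 1
```

<p>Without the transaction, the second call would leave behind a payout with no related objects.</p>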
<p>The new instance now represents a payout process in the system, and the payout module is responsible for completing the payout. A payout can be fulfilled in many different ways, such as by bank transfer, credit card and other methods. Not all payout methods are immediate, so <strong>a payout is an asynchronous process that can take some time to complete</strong>.</p>
<p>When the payout is eventually paid, the module updates the instance status:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">PayoutProcess</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="nd">@classmethod</span>
    <span class="k">def</span> <span class="nf">mark_paid</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">pk</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-></span> <span class="n">PayoutProcess</span><span class="p">:</span>
        <span class="k">with</span> <span class="n">db_transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
            <span class="n">payout</span> <span class="o">=</span> <span class="bp">cls</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">select_for_update</span><span class="p">()</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">pk</span><span class="o">=</span><span class="n">pk</span><span class="p">)</span>
            <span class="k">if</span> <span class="n">payout</span><span class="o">.</span><span class="n">status</span> <span class="o">!=</span> <span class="s1">'pending'</span><span class="p">:</span>
                <span class="k">raise</span> <span class="n">StateError</span><span class="p">()</span>
            <span class="n">payout</span><span class="o">.</span><span class="n">status</span> <span class="o">=</span> <span class="s1">'paid'</span>
            <span class="n">payout</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">payout</span>
</pre></div>
<p>The function fetches the payout, checks its state and marks it as paid. So far so good!</p>
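<p>The important property here is that a payout can move from <code>pending</code> to <code>paid</code> only once. The sketch below illustrates that guarantee with a different technique than the one above: instead of locking the row with <code>select_for_update()</code>, it uses a conditional <code>UPDATE</code> in plain <code>sqlite3</code> so the example is runnable on its own. The table layout and the <code>StateError</code> class are assumptions made for the illustration.</p>

```python
import sqlite3


class StateError(Exception):
    """Hypothetical error: the payout is not in the expected state."""


def mark_paid(conn, pk):
    """Move a payout from 'pending' to 'paid', exactly once."""
    with conn:
        cursor = conn.execute(
            "UPDATE payout SET status = 'paid' "
            "WHERE id = ? AND status = 'pending'",
            (pk,),
        )
        # No row was updated: the payout is missing or not pending.
        # (Unlike the Django version, this sketch does not tell the two apart.)
        if cursor.rowcount != 1:
            raise StateError(f"payout {pk} is not pending")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payout (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO payout (id, status) VALUES (1, 'pending')")
conn.commit()

mark_paid(conn, 1)

try:
    mark_paid(conn, 1)  # the second attempt must fail
    second_attempt_failed = False
except StateError:
    second_attempt_failed = True

print(second_attempt_failed)  # True
```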
<h3 id="sending-notifications"><a class="toclink" href="#sending-notifications">Sending Notifications</a></h3>
<p>At some point our staffers came to us with an idea. They said it would be nice if the system would notify users that they got paid. We thought this was a great idea! Who doesn't like to get a message saying they got some $$$?</p>
<p>The payout module is a core module in our system. We have payouts going out to different types of users and top level apps use it to create payouts in different contexts. For example, one app sends out commission payments to merchants, and another issues payments to business partners.</p>
<p>To keep the payout module independent and decoupled from the apps that use it, the top level apps are the ones sending the notification to the users. The problem is that after a top level app creates the payout process, the payout module handles the actual payment internally, and the top level app has no way of knowing when the payout is paid unless it constantly monitors its status.</p>
<figure>
<p><svg version="1.1" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 267.8984092783621 421.3308681218084" width="267.8984092783621" height="421.3308681218084">
<g stroke-linecap="round"><g transform="translate(65.9251034428728 204.6887627435733) rotate(35.06792711455562 59.703688558597264 -1.5004384097272236)" fill-rule="evenodd"><path d="M-0.05 -1.42 L43.49 52.09 L120.45 -11.36 L88.65 -52.85 L-1.83 -5.86" stroke="none" stroke-width="0" fill="#daaeff44" fill-rule="evenodd"></path><path d="M0.42 0.15 C11.25 12.21, 24.27 28.04, 43.47 49.28 M0.72 0.44 C9.71 10.64, 20.01 22.68, 43.98 51.48 M43.23 51.94 C72.86 27.19, 99.56 4.62, 120.68 -10.79 M43.64 50.6 C74.62 25.73, 104.61 0.73, 118.95 -12.57 M121.29 -11.96 C106.87 -25.15, 95.93 -39.37, 86.76 -53.72 M119.82 -11.4 C108.58 -28.01, 96.67 -41.63, 88.17 -52.76 M89.24 -54.94 C60.04 -39.69, 26.15 -22.3, -1.88 -7.64 M88.49 -53.05 C69.15 -42.84, 51.66 -33.6, -1.01 -5.86 M-0.13 -6.78 C-0.53 -4.55, -0.52 -3.53, 0.58 -0.51 M-0.72 -6.75 C-0.66 -4.88, 0.02 -3.27, 0.23 0.23" stroke="transparent" stroke-width="1" fill="none"></path></g></g><g transform="translate(94.78198525753078 189.77899088367008) rotate(0 33.5 13)"><text x="33.5" y="18" font-size="20px" fill="currentColor" text-anchor="middle" >Payout</text></g><g stroke-linecap="round"><g transform="translate(19.71055668610188 14.417519022198235) rotate(0 37.05220775927103 31.470392396558566)" fill-rule="evenodd"><path d="M-1.98 0.55 L70.28 2.77 L82.36 62.57 L-8.44 60.18 L1.16 0.08" stroke="none" stroke-width="0" fill="#f41d9222" fill-rule="evenodd"></path><path d="M0.57 0.73 C23.26 0.68, 49.55 -0.12, 69.59 -0.08 M-0.75 0.91 C25.93 -0.01, 52.26 1.25, 70.96 1.61 M68.77 0.04 C72.46 16.59, 75.09 30.25, 81.89 61.39 M69.88 1.37 C74.6 20.01, 77.03 40.55, 82.93 61.01 M83.81 63.12 C58.02 64.29, 28.68 61.88, -9.4 63.45 M81.31 61.18 C60.22 61.62, 37.7 62.09, -9.6 62.59 M-9.71 61.72 C-6.2 48.2, -4.98 35.99, -0.06 1.37 M-8.92 62.21 C-5.78 45.21, -3.6 29.31, -0.53 -0.51 M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0" stroke="transparent" stroke-width="1" fill="none"></path></g></g><g stroke-linecap="round"><g transform="translate(163.13047010601576 
17.009510364189737) rotate(0 47.01278034673828 27.617796965927198)" fill-rule="evenodd"><path d="M0.47 -0.63 L95.57 -4.95 L81.08 63.8 L4.15 57.68 L-0.51 0.01" stroke="none" stroke-width="0" fill="#f41d9222" fill-rule="evenodd"></path><path d="M1.48 -1.05 C33.45 -1.56, 67.82 -5.86, 91.84 -3.46 M-0.74 0.65 C36.52 -1.58, 75.37 -4.38, 92.82 -5.79 M94.77 -7.01 C90.17 14.9, 86.8 39.33, 83.72 62.25 M94.22 -5.25 C89.7 16.58, 86.16 39.68, 81.98 61.68 M81.42 60.52 C59.39 59.3, 37.18 58.13, 5.12 58.66 M81.98 60.98 C59.41 61.44, 37.76 60.63, 3.29 58.83 M3.69 60.09 C2.68 47.43, 1.87 31.84, 1.72 0.61 M3.45 58.49 C2.87 40.82, 2.51 22.56, 0.58 -0.72 M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0" stroke="transparent" stroke-width="1" fill="none"></path></g></g><g transform="translate(19.163298388844396 35.55676866144802) rotate(0 39 10.5)"><text x="39" y="15" font-size="16px" fill="currentColor" text-anchor="middle" >Merchants</text></g><g transform="translate(174.52693475248088 34.46585957053901) rotate(0 30.5 10.5)"><text x="30.5" y="15" font-size="16px" fill="currentColor" text-anchor="middle" >Refunds</text></g><g stroke-linecap="round"><g transform="translate(59.57238929793516 75.46585957053901) rotate(0 17.979516149667916 45.769515162418486)"><path d="M0.79 1.42 C9.03 20.38, 14.5 37.98, 35.75 90.77 M0.21 0.77 C6.49 18.94, 14.48 37.32, 35.5 89.51" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(59.57238929793516 75.46585957053901) rotate(0 17.979516149667916 45.769515162418486)"><path d="M16.2 68.66 C20.74 73.15, 22.79 76.4, 35.2 90.81 M15.61 68.01 C19.12 72.43, 23.82 76.95, 34.95 89.55" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(59.57238929793516 75.46585957053901) rotate(0 17.979516149667916 45.769515162418486)"><path d="M35.21 60.95 C35.73 66.94, 33.73 71.84, 35.2 90.81 M34.63 60.3 C34.16 66.42, 34.94 72.54, 34.95 89.55" stroke="#000000" stroke-width="1" fill="none"></path></g></g><g stroke-linecap="round"><g 
transform="translate(207.2023819182773 77.29418156135) rotate(0 -31.74047533939165 45.15799520164728)"><path d="M0.41 1.54 C-14.15 20.5, -25.98 39.07, -63.89 90.76 M0.36 -0.44 C-22.99 32.22, -46.65 66.13, -62.22 90.01" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(207.2023819182773 77.29418156135) rotate(0 -31.74047533939165 45.15799520164728)"><path d="M-54.67 62.41 C-57.3 68.35, -57.65 74.23, -63.34 90.09 M-54.73 60.43 C-57.64 70.59, -61.01 82.06, -61.67 89.34" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(207.2023819182773 77.29418156135) rotate(0 -31.74047533939165 45.15799520164728)"><path d="M-37.61 73.82 C-43.67 77.51, -47.54 81.03, -63.34 90.09 M-37.67 71.83 C-46.78 77.85, -56.37 85.17, -61.67 89.34" stroke="#000000" stroke-width="1" fill="none"></path></g></g><g stroke-linecap="round" transform="translate(44.123382014298386 332.3281237465423) rotate(0 44.994461829091165 36.77409946036036)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M0.04 5.72 C1.03 4.87, 3.13 2.47, 4.37 1.1 M-0.55 6.38 C1.49 4.37, 3.21 2.25, 5.34 0.71 M0.95 12.49 C3.65 8.31, 5.32 7.35, 9.83 -0.18 M-0.34 11.51 C3.57 9.14, 6.68 5.42, 10.18 -0.68 M0.79 17.78 C5.11 12.35, 10.44 3.96, 16.64 -0.37 M0.19 18.8 C4.61 11.79, 11.85 5.14, 16.16 0.95 M0.24 25.33 C7.02 17.47, 10.71 12.07, 22.75 -0.82 M-0.29 24.39 C5.5 18.66, 9.63 13.87, 20.66 -0.76 M1.88 28.5 C5.69 24.83, 11.04 15.99, 26.93 -1.06 M-0.12 30.16 C6.46 22.92, 14.02 14.36, 26.5 0.44 M0.36 35.29 C9.35 23.26, 21.06 12.2, 32.2 0.23 M0 36.12 C10.63 25.2, 19.22 14.19, 32.27 -0.19 M-1.23 42.58 C9.88 29.73, 19.8 20.21, 38.5 -0.79 M0.89 41.63 C10.23 31.9, 18.68 21.79, 36.87 0.48 M0.78 50.29 C16.03 32.64, 27.68 16.97, 42.08 0.61 M0.83 48.69 C14.78 31.6, 29.01 14.48, 42.7 0.57 M1.34 55.94 C16.36 38.84, 31.08 20.87, 48.37 -0.29 M0.13 53.78 C11.62 42.86, 22.17 29.28, 48.36 0.22 M-0.41 60.59 C13.36 46.05, 23.48 32.31, 51.32 0.15 M-0.24 60.89 C15.08 42.29, 33.12 22.85, 53.82 0.64 
M1.57 67.43 C12.6 52.28, 22.7 40.33, 58.35 -0.13 M-0.05 67.36 C18.08 45.7, 35.24 24.98, 59.13 -0.78 M0.37 71.9 C26.1 46.15, 49.49 17.17, 65.64 0.9 M0.09 73.16 C24.13 43.41, 50.57 13.55, 63.61 0.54 M2.9 76.82 C26.02 47.68, 50.79 19.85, 67.07 -0.67 M3.38 77.23 C20.56 55.68, 39.26 33.56, 68.48 -0.53 M7.38 77.03 C32.28 46.72, 55.45 21.17, 73.58 -0.91 M8.08 75.29 C27.38 53.52, 47.73 29.92, 74.12 0.2 M12.2 77.29 C32.21 57.26, 48.09 34.92, 77.99 1.91 M12.21 76.92 C33.8 51.75, 55.07 27.05, 78.74 -0.66 M20.56 76.09 C42.09 49.92, 63.87 23.91, 86.48 1.13 M17.97 76.58 C38.7 54.03, 58.32 30.61, 85.53 -1.05 M22.71 74.56 C43.89 53.87, 67.51 29.39, 91.37 -0.38 M23.84 75.69 C45.2 52.18, 65.12 27.5, 89.39 -0.39 M31.3 74.35 C49.34 54.35, 65.46 34.88, 88.62 6.41 M29.72 75.7 C49.47 52.63, 70.92 27.08, 90.18 6.31 M34.33 75.49 C48.05 58.82, 60.33 47.02, 89.77 12.72 M33.66 77.15 C51.89 56.31, 70.24 34.92, 90.47 11.56 M38.42 74.27 C50.69 63.3, 62.63 46.95, 90.28 20.2 M41.01 76.72 C50.98 61.13, 63.79 48.2, 89.73 18.93 M45.79 77.12 C57.75 63.58, 68.45 51.36, 89.57 25.52 M45.14 76.32 C58.14 60.42, 72.61 45.58, 90.13 24.71 M49.56 74.47 C63.71 58.51, 77.99 42.76, 91.85 30.21 M50.6 76.11 C65.59 59.58, 80.52 41.55, 89.21 31.07 M54.64 74.66 C63.86 67.3, 75.13 53.48, 88.83 35.34 M55.84 75.63 C67.69 62, 80.97 47.52, 90.44 37.81 M61.15 77.78 C67.9 65.43, 78.92 57.94, 91.92 43.6 M60.35 76.76 C68.08 66.67, 75.19 58.29, 91.16 41.69 M65.23 77.54 C73.3 67.7, 83.61 54.17, 88.35 49.44 M65.42 76.91 C71.89 69.68, 75.25 64.62, 90.56 49.77 M70.46 75.19 C75.78 72.73, 79.24 65.74, 91.91 53.71 M71.23 75.95 C78.63 67.79, 85.82 61.54, 90.2 55.16 M76.72 77.35 C80.59 73.57, 84.66 66.3, 91.81 61.51 M77.64 75.39 C79.98 73.8, 81.72 70.6, 90.74 60.15 M83.11 74.73 C85.97 73.53, 87.98 69.53, 90.67 68.21 M82.96 75.54 C83.82 73.99, 86.24 72.25, 90.12 67.74 M87.45 76.06 C88.4 75.4, 88.75 74.17, 89.8 73 M87.71 76.03 C88.4 75.08, 89.16 74.33, 90.34 72.9" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M-0.95 -1 
C18.71 -2.39, 35.89 -0.78, 88.7 -0.34 M-0.76 0.01 C27.86 0.74, 56.89 0.34, 89.32 -0.21 M90.61 1.19 C90.38 29.13, 89.11 56.32, 89.83 74.86 M89.88 0.28 C90.07 29.12, 90.85 56.4, 89.23 73.31 M90.46 73.84 C62.7 73.32, 30.79 71.76, 0.68 71.77 M90.68 74.35 C63.49 74.14, 38.68 74.49, -0.04 74.17 M1.12 72.92 C-0.16 56.06, 1.37 35.66, 1.58 -0.58 M0 74.28 C0.46 51.48, -0.87 29.29, 0.39 0.57" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(70.11784384338966 358.6022232069027) rotate(0 19 10.5)"><text x="19" y="15" font-size="16px" fill="currentColor" text-anchor="middle" >Bank</text></g><g stroke-linecap="round" transform="translate(160.03247292338915 337.7826692010877) rotate(0 44.994461829091165 36.77409946036036)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.57 6.81 C0.85 5.19, 2.22 3.97, 5.24 0.65 M-0.49 6.62 C1.15 4.94, 2.07 3.86, 4.98 0.61 M1.43 13.23 C3.13 6.4, 6.3 2.27, 9.74 0.46 M-0.38 11.56 C3.68 6.65, 7.65 3.4, 11.38 -0.83 M-1.59 19.12 C3.06 12.3, 7.86 8.25, 15.36 -0.31 M-0.6 18.69 C3.91 14.15, 8.13 10.26, 15.93 -0.47 M0.19 24.39 C6.2 17.92, 9.47 9.72, 21.77 -1.58 M0.07 24.46 C7.97 15.06, 16.52 6.18, 20.58 -0.04 M-0.13 32.1 C7.6 19.58, 15.78 12, 27.75 -1.5 M-0.23 30.47 C10.79 18.56, 20.71 6.4, 26.7 0.98 M0.2 36.44 C13.26 23.23, 27.2 7.78, 33.81 -0.56 M-0.51 37.05 C11.89 23.86, 21.98 11.46, 32.42 0.59 M-0.19 43.38 C13.02 25.12, 27.26 11.44, 36.59 -1.31 M0.09 43.37 C12.45 27.14, 25.94 12.73, 36.67 0.07 M-0.85 47.09 C16.59 30.57, 33.08 13.19, 42.03 -0.28 M-0.68 49.59 C9.49 37.32, 17.18 27.37, 43.43 -0.1 M0.9 54.77 C15.44 37.7, 28.89 22.2, 47.91 0.22 M-0.44 55.56 C12.89 40.58, 27.28 23.85, 47.62 -0.33 M0.65 62.82 C20.33 40.41, 40.02 17.53, 53.05 0.14 M-0.43 60.78 C11.35 49.29, 21.42 37.29, 53.35 0.05 M-0.66 65.66 C15.08 48.53, 33.92 30.45, 59.89 1.89 M0.15 67.07 C16.12 47.75, 34.13 26.18, 57.75 0.71 M1.15 72.16 C17.39 53.53, 33.96 33.92, 63.64 -1.6 M-0.71 72.71 C13.63 57.62, 26.76 43.42, 63.31 0.26 M2.09 74.98 C29.28 47.37, 
52.66 16.13, 66.91 1.3 M1.81 75.72 C16.59 60.78, 32.38 43.86, 69.66 0.22 M9.15 77.43 C31.65 50.44, 52.43 23.79, 72.8 1.01 M7.83 76.48 C27.94 52.12, 49.64 27.8, 73.88 -1.11 M14.98 77.68 C28.24 57.54, 45.56 36.91, 79.55 -1.13 M12.56 75.71 C35.81 50.56, 58.66 23.52, 80.35 -0.66 M20.22 74.8 C41.86 51.01, 65.49 21.94, 85.67 -0.72 M17.83 76.37 C33.05 58.61, 47.14 42.56, 85.01 -0.67 M25.54 75.92 C43.96 55.37, 63.19 34.46, 88.55 1.83 M23.35 76.62 C43.44 51.35, 65.15 28.02, 89.22 -0.72 M28.74 76.3 C44.43 59.15, 56.85 46.28, 91.7 8.15 M28.95 75.75 C44.71 58.7, 58.83 42.33, 90.15 6.92 M35.38 75.56 C56.68 51.74, 75.15 28.64, 90.38 12.28 M34.43 76.11 C49.41 60.12, 63.97 42.89, 89.6 11.44 M39.91 77.7 C52.61 63.13, 64.19 46.51, 89.76 19.05 M39.88 75.06 C53.65 60.26, 66.98 44.07, 89.39 17.75 M44.29 75.3 C53.41 65.48, 64.19 52.86, 90.65 23 M45.45 76.91 C55.53 64.7, 64.3 53.42, 90.22 25.15 M51.11 77.6 C60.88 64.23, 68.37 55.58, 88.14 32.46 M49.71 76.84 C60.13 65.36, 71.58 52.24, 89.5 31.25 M54.24 74.65 C62.8 66.34, 68.3 61.08, 88.04 38.43 M55.45 76.86 C68.91 61.54, 79.57 48.77, 89.03 36.67 M59.91 77.43 C68.74 66.98, 77.59 56.96, 88.78 42.95 M61.96 75.72 C71.18 64.94, 80.57 53.79, 90.07 43.21 M65.39 77.94 C73.43 66.36, 80.2 58.27, 91.35 47.52 M66.89 76.88 C73.16 67.95, 81.09 58.32, 90.4 48.08 M72.06 77.4 C79.69 66.92, 83 59.78, 91.76 53.9 M72.53 76.41 C76.27 71.2, 80.52 66.69, 90.07 54.54 M76.09 74.3 C80.75 71.59, 87.24 67.94, 91.63 60.31 M77.64 76.98 C80.39 70.55, 85.96 66.37, 89.11 61.84 M81.99 76.19 C85.13 73.57, 86.21 72.13, 90.29 66.79 M82.11 75.74 C84.83 73.67, 86.53 71.46, 89.49 67.73 M87.41 76.42 C88.36 74.56, 89.63 73.36, 89.89 73.05 M87.43 76.14 C88.21 75.31, 88.61 74.64, 90.34 73.21" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M-1.82 0.68 C33.13 0.21, 68.72 1.36, 91.36 1.61 M0.55 -0.04 C20.59 -0.56, 40.2 0.73, 90.55 -0.31 M89.66 1.58 C89.89 22.17, 90.27 45.35, 89.99 75.01 M89.52 0.39 C90.43 24.95, 90.64 49.46, 89.3 73.16 M90.34 73.11 C54.39 72.58, 20.89 
72.85, 0.26 71.77 M89.96 74.38 C66.99 74.43, 44.89 74.71, 0.37 72.7 M-0.36 72 C1.51 55.15, 0.75 36.97, -1.46 -0.07 M-0.9 74.44 C1.11 48.31, 1.14 24.47, 0.51 0.28" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(179.5269347524802 353.5567686614481) rotate(0 25.5 21)"><text x="25.5" y="15" font-size="16px" fill="currentColor" text-anchor="middle" >Credit</text><text x="25.5" y="36" font-size="16px" fill="currentColor" text-anchor="middle" >Card</text></g><g transform="translate(97.07238929793516 87.96585957053901) rotate(0 32.5 21)"><text x="32.5" y="15" font-size="16px" fill="currentColor" text-anchor="middle" >Create </text><text x="32.5" y="36" font-size="16px" fill="currentColor" text-anchor="middle" >Payout</text></g><g stroke-linecap="round"><g transform="translate(139.44000617548727 241.06440285847123) rotate(0 28.42418093826973 45.024129553697975)"><path d="M-3.49 -2.16 C22.23 31, 40.66 58.74, 60.34 92.21 M-1.25 0.92 C18.19 28.25, 37.6 58.49, 60.34 91.59" stroke="#495057" stroke-width="1" fill="none"></path></g><g transform="translate(139.44000617548727 241.06440285847123) rotate(0 28.42418093826973 45.024129553697975)"><path d="M32.54 71.84 C47.19 82.24, 54.42 87.6, 59.4 92.37 M34.79 74.93 C43.21 78.14, 50.52 84.2, 59.4 91.75" stroke="#495057" stroke-width="1" fill="none"></path></g><g transform="translate(139.44000617548727 241.06440285847123) rotate(0 28.42418093826973 45.024129553697975)"><path d="M49.53 60.34 C59.01 74.04, 61.09 82.88, 59.4 92.37 M51.78 63.42 C54.47 70.42, 56.22 80.25, 59.4 91.75" stroke="#495057" stroke-width="1" fill="none"></path></g></g><g stroke-linecap="round"><g transform="translate(139.11175452857128 237.6194530633179) rotate(0 -26.606720757402172 45.86629873359206)"><path d="M0.39 0.55 C-21.68 36.52, -42.26 73.8, -53.6 91.48 M0.08 -0.56 C-14 23.22, -27.07 45.26, -53.34 92.29" stroke="#495057" stroke-width="1" fill="none"></path></g><g transform="translate(139.11175452857128 237.6194530633179) 
rotate(0 -26.606720757402172 45.86629873359206)"><path d="M-48.04 63.24 C-51.64 74.44, -53.33 86.82, -53.85 90.7 M-48.35 62.14 C-50.37 70.78, -51.78 77.46, -53.58 91.51" stroke="#495057" stroke-width="1" fill="none"></path></g><g transform="translate(139.11175452857128 237.6194530633179) rotate(0 -26.606720757402172 45.86629873359206)"><path d="M-30.16 73.32 C-40.84 80.52, -49.54 88.95, -53.85 90.7 M-30.47 72.21 C-36.75 78.33, -42.49 82.58, -53.58 91.51" stroke="#495057" stroke-width="1" fill="none"></path></g></g><g transform="translate(126.07238929793516 275.3749504796299) rotate(0 13.5 10.5)"><text x="13.5" y="15" font-size="16px" fill="currentColor" text-anchor="middle" >Pay</text></g></svg>
<figcaption>Top level app creates a payout</figcaption></p>
</figure>
<p>To have the top level app respond to changes in the payout module, we need a mechanism to let the top level app know that something has changed. The tricky part is that the top level apps depend on the payout module, so the payout module cannot depend back on them. That would cause a circular dependency.</p>
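<p>One generic way to avoid the circular dependency is to let the low level module expose a hook that higher level code can register with. The following is a plain Python sketch of that idea, not the mechanism used in our system: <code>on_payout_paid</code>, <code>mark_paid</code> and <code>notify_merchant</code> are hypothetical names, and both "modules" live in one file only to keep the example runnable.</p>

```python
from typing import Callable, List

# --- payout module (low level): it never imports the apps that use it ---

PaidListener = Callable[[int], None]
_paid_listeners: List[PaidListener] = []


def on_payout_paid(listener: PaidListener) -> PaidListener:
    """Register a callable to be invoked when a payout is marked as paid."""
    _paid_listeners.append(listener)
    return listener


def mark_paid(payout_id: int) -> None:
    # ... update the payout status here ...
    # Tell whoever registered, without knowing who they are.
    for listener in _paid_listeners:
        listener(payout_id)


# --- merchants app (top level): depends on the payout module, not vice versa ---

notified: List[int] = []


@on_payout_paid
def notify_merchant(payout_id: int) -> None:
    # A real app would send a message to the user here.
    notified.append(payout_id)


mark_paid(42)
print(notified)  # [42]
```

<p>The dependency still points in one direction only: the top level app imports the payout module to register its listener, while the payout module only calls whatever was registered.</p>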
<figure>
<p><svg version="1.1" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 400.1237576753888 420.9203548399862" width="400.1237576753888" height="420.9203548399862">
<g stroke-linecap="round"><g transform="translate(65.26044666035023 204.2782494617511) rotate(35.06792711455562 60.22152623642637 -1.1676249167510946)" fill-rule="evenodd"><path d="M-0.61 -0.37 L45.61 51.41 L120.57 -13.49 L89.58 -54.47 L-1.2 -5.9" stroke="none" stroke-width="0" fill="#daaeff44" fill-rule="evenodd"></path><path d="M-1.07 -0.04 C14.32 16.35, 26.23 33.07, 42.72 50.18 M-0.01 0.69 C11.28 12.43, 21.87 25.36, 44.33 50.97 M43.62 52.17 C63.36 34.56, 83.91 16.86, 121.69 -11.08 M44.47 51.23 C72.04 28.57, 99.94 4.58, 120.88 -11.46 M118.63 -12.84 C110.92 -21.72, 105.84 -34.36, 89.87 -54.09 M120.71 -12.9 C109.19 -25.49, 97.29 -39.98, 88.69 -54.51 M87.79 -53.25 C64.65 -41.12, 39.49 -29.72, -0.76 -6.11 M88.26 -54.21 C56.08 -38, 25.12 -19.83, -0.74 -6.82 M-1.25 -6.83 C-0.77 -5.36, -0.2 -3.76, -0.09 -0.2 M-1.01 -6.65 C-0.48 -5.43, -0.61 -3.92, -0.23 0.01" stroke="transparent" stroke-width="1" fill="none"></path></g></g><g transform="translate(94.11732847500798 189.36847760184787) rotate(0 33.5 13)"><text x="33.5" y="18" font-size="20px" fill="currentColor" text-anchor="middle" >Payout</text></g><g stroke-linecap="round"><g transform="translate(19.045899903579084 14.007005740375917) rotate(0 37.13987370549398 30.82963782247333)" fill-rule="evenodd"><path d="M1.95 0.27 L70.67 -0.13 L83.52 61.01 L-9.11 62.62 L0.91 -1.5" stroke="none" stroke-width="0" fill="#f41d9222" fill-rule="evenodd"></path><path d="M1.06 0.3 C13.87 1.63, 27.53 -0.52, 69 1.11 M0.35 -0.65 C16.44 -0.75, 33.53 -0.16, 69.64 0.93 M69.46 1.72 C74.18 22.7, 80.46 46.15, 83.33 62.92 M70.87 0.94 C73.92 12.69, 76.45 27.06, 81.54 61.16 M80.75 62.73 C52.63 60.28, 21.72 61.22, -7.79 63.17 M82.53 62.85 C51.64 62.09, 22.7 60.5, -9.05 61.37 M-8.32 62.91 C-8.12 49.32, -5.07 37.37, 1.15 -1.52 M-8.59 61.24 C-5.74 47.19, -4.43 30.88, -0.56 -0.19 M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0" stroke="transparent" stroke-width="1" fill="none"></path></g></g><g stroke-linecap="round"><g transform="translate(162.46581332349297 
16.598997082367532) rotate(0 46.38528032770205 28.47783512720497)" fill-rule="evenodd"><path d="M0.27 0.67 L92.43 -3.8 L81.13 61.48 L3.72 60.2 L-1.5 -0.1" stroke="none" stroke-width="0" fill="#f41d9222" fill-rule="evenodd"></path><path d="M0.3 -0.36 C24.5 -2.73, 45.24 -4.11, 93.68 -6.6 M-0.65 0.01 C27.41 -1.23, 55.73 -3.84, 93.5 -4.68 M94.29 -5.45 C90.24 16.53, 88.25 36.1, 83.04 60.04 M93.51 -4.36 C88.3 21.43, 85.66 45.61, 81.28 62.33 M82.85 63.55 C61.96 62.57, 44.2 61.07, 4.28 59.9 M82.97 61.49 C52.7 61.92, 21.03 59.96, 2.47 58.31 M4.01 58.02 C1.96 42.4, 1.63 26.62, -1.52 -1.02 M2.34 60.19 C3.06 40.49, 1.19 21.03, -0.19 0.2 M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0" stroke="transparent" stroke-width="1" fill="none"></path></g></g><g transform="translate(18.498641606322053 35.14625537962593) rotate(0 39 10.5)"><text x="39" y="15" font-size="16px" fill="currentColor" text-anchor="middle" >Merchants</text></g><g transform="translate(173.86227796995763 34.05534628871692) rotate(0 30.5 10.5)"><text x="30.5" y="15" font-size="16px" fill="currentColor" text-anchor="middle" >Refunds</text></g><g stroke-linecap="round"><g transform="translate(58.90773251541259 75.05534628871692) rotate(0 18.258686520328638 44.60247732614249)"><path d="M-0.93 -0.97 C10.21 27.98, 21.26 56.11, 37.44 90.17 M0.68 -0.17 C8.12 21.04, 17.55 42.17, 36.57 89.2" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(58.90773251541259 75.05534628871692) rotate(0 18.258686520328638 44.60247732614249)"><path d="M15.48 66.03 C21.72 73.89, 27.77 80.82, 37.96 89.9 M17.09 66.82 C20.79 72.17, 26.49 77.51, 37.09 88.93" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(58.90773251541259 75.05534628871692) rotate(0 18.258686520328638 44.60247732614249)"><path d="M34.47 58.26 C34.72 68.71, 34.78 78.09, 37.96 89.9 M36.08 59.05 C35.32 66.13, 36.57 73.29, 37.09 88.93" stroke="#000000" stroke-width="1" fill="none"></path></g></g><g stroke-linecap="round"><g 
transform="translate(206.5377251357545 76.88366827952768) rotate(0 -31.61287960367781 45.005114175379276)"><path d="M-0.97 -1 C-17.03 23.98, -34.8 52.25, -62.09 89.37 M-0.17 -0.36 C-19.54 28.37, -38.34 55.61, -63.05 91.01" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(206.5377251357545 76.88366827952768) rotate(0 -31.61287960367781 45.005114175379276)"><path d="M-56.34 61.01 C-57.29 68.84, -59.53 79.72, -62.36 89.7 M-55.54 61.65 C-57.98 71.41, -60.17 80.06, -63.32 91.33" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(206.5377251357545 76.88366827952768) rotate(0 -31.61287960367781 45.005114175379276)"><path d="M-39.49 72.74 C-45.28 77.18, -52.27 84.75, -62.36 89.7 M-38.7 73.38 C-46.15 79.72, -53.42 84.83, -63.32 91.33" stroke="#000000" stroke-width="1" fill="none"></path></g></g><g stroke-linecap="round" transform="translate(43.45872523177604 331.91761046472016) rotate(0 44.99446182909128 36.77409946036039)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.86 6.36 C1.31 5.12, 1.91 3.25, 5.42 0.37 M-0.05 6.01 C0.99 4.92, 2.78 2.76, 4.68 0.74 M-1.32 11.02 C3.96 8.98, 6.78 4.82, 11.67 0.34 M-0.42 12.65 C2.89 9.37, 4.18 7.2, 10.56 -0.38 M-0.2 18.4 C4.39 11.99, 9.33 9.33, 16.02 -0.05 M0.45 19.34 C4.63 12.33, 10.82 6.52, 16.24 1.03 M-0.02 23.66 C6.83 15.22, 11.71 10.57, 21.23 -0.36 M0.33 23.17 C6.95 16.23, 12.76 9.23, 20.97 0.81 M0.82 32.02 C6.81 23.88, 12.79 16.65, 26.22 2.08 M0.5 30.22 C9.06 19.42, 19.76 7.94, 26.32 -0.7 M-0.3 38.59 C9.03 25.38, 21.02 10.19, 32.84 1.61 M-0.59 35.97 C9.19 25.39, 19.92 13.48, 30.9 -0.73 M1.51 42.23 C9.29 30.72, 17.49 20.4, 34.99 2.2 M0.4 42.82 C8.2 34.2, 14.14 25.46, 37.11 0.97 M-1.12 47.71 C13.88 35.53, 24.76 21.04, 42 -0.66 M-0.23 49.39 C11.12 37.02, 20.52 23.17, 42.28 -0.82 M-1.14 56.29 C13.74 35.14, 30.64 21.02, 47.51 1.91 M1.23 54.31 C17.58 35.05, 32.79 17.83, 47.36 0.27 M-0.21 59.63 C17.43 43.87, 30.77 24.11, 52.81 -1.97 M0.63 61.31 C17.01 41.42, 33.62 22.8, 
52.15 -0.75 M-1.96 66.39 C18.82 48.43, 36.87 24.37, 59.75 1 M0.5 67.01 C17.91 46.02, 38.23 24.06, 57.93 0.85 M0.16 72.38 C15.32 54.6, 32.68 33.99, 62.12 -0.93 M-0.02 72.25 C15.85 55.17, 32.5 36.88, 64.68 0.57 M1.63 75.65 C18.59 55.44, 36.63 36.44, 70.51 1.44 M3.43 76.37 C23.24 52.62, 44.09 29.5, 69.75 -0.24 M8.39 77.14 C29.63 47.62, 55.36 21.91, 75.66 0.02 M7.74 75.07 C27.22 52.92, 48.35 28.71, 75.08 0.64 M13.66 74.59 C26.45 60.57, 43.07 40.93, 81.38 -0.61 M12.97 75.48 C35.61 50.53, 57.29 25.2, 79.86 -0.55 M18.59 74.35 C35.55 53.1, 55.15 35.18, 83.91 -1.37 M18.83 75.15 C39.35 52, 60.53 28.27, 85.71 -0.62 M22.51 76.28 C42.36 52.69, 62.08 31.19, 88.78 1.21 M24.71 76.18 C39.65 57.26, 57.37 39.03, 89.38 -0.22 M30.3 75.63 C43.66 61.18, 57.13 47.03, 88.3 7.65 M28.88 76.25 C42.78 60.59, 54.22 45.91, 90.06 5.9 M33.27 76.92 C54.69 53.13, 75.27 32.54, 89.45 11.52 M35.19 75.67 C53.75 54.49, 73.67 31.61, 91.05 12.12 M40.51 75.78 C55.1 59.76, 72.49 41.85, 90.28 19.7 M40.23 76.5 C54.29 58.58, 67.94 43.3, 89.99 19.4 M43.8 76.88 C58.27 63.06, 70.3 49.09, 91.91 24.95 M44.05 75.55 C62.42 57.13, 78.44 38.46, 90.67 25.05 M48.99 75.76 C65.63 60.11, 76.69 44.24, 91.38 31.81 M51.33 75.46 C65.97 58.11, 81.16 41.1, 89.69 29.75 M56.82 76.74 C69.74 61.53, 78.84 46.24, 88.37 35.18 M56.04 75.81 C64.52 63.61, 74.92 53.65, 88.99 37.41 M62.35 74.74 C69.06 68.22, 74.57 56.71, 89.36 42.48 M60.7 76.09 C69.07 65.51, 77.36 56.72, 89.86 42.46 M67.82 77.25 C73.9 65.96, 80.5 59.88, 91.8 47.8 M66.99 76.87 C74.22 67.16, 81.94 58.66, 90.84 49.18 M70.01 74.08 C75.77 69.86, 82.23 61.66, 90.8 54.43 M70.99 76.76 C77.22 69.51, 82.84 62.62, 90.32 55.48 M77.13 75.35 C80.56 70.33, 82.94 68.83, 92.03 60.85 M77.82 75.54 C80.73 71.32, 85.74 65.19, 89.16 62.01 M83.46 76.6 C85.3 73.07, 85.8 72.67, 90.13 67.35 M82.75 75.45 C84.72 73.69, 86.57 71.08, 89.51 67.71 M87.21 76.12 C88.65 74.59, 89.37 73.42, 90.2 73.23 M87.35 76.29 C88.41 75.23, 89.06 74.42, 90.06 72.91" stroke="#ced4da" stroke-width="0.5" 
fill="none"></path><path d="M-1 0.04 C32 0.41, 67.99 1.13, 88.68 0.03 M-0.36 -0.14 C22.94 0.14, 45.17 -0.12, 90.31 -0.08 M91.29 1.01 C88.79 22.67, 90.81 43.82, 89.73 75.42 M89.51 -0.74 C91.13 14.39, 90.43 29.41, 90.4 74.37 M90.89 74.7 C69.97 74.93, 47.51 74.33, 1.89 72.71 M89.63 72.89 C55.72 72.56, 23.59 72.9, 0.44 72.91 M1.15 72.03 C-1.03 52.48, -1.26 30.89, -1.57 1.8 M-0.56 73.36 C0.17 46.7, -0.54 21.03, -0.69 0.33" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(69.4531870608671 358.19170992508043) rotate(0 19 10.5)"><text x="19" y="15" font-size="16px" fill="currentColor" text-anchor="middle" >Bank</text></g><g stroke-linecap="round" transform="translate(159.36781614086658 337.3721559192654) rotate(0 44.99446182909105 36.77409946036039)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.14 6.83 C0.85 4.56, 3.84 2.24, 4.81 0.29 M-0.47 6.64 C1.87 3.87, 3.49 2.12, 5.02 0.53 M0.76 12.61 C2.58 9.79, 5.55 7.57, 9.12 1.35 M0.84 11.78 C2.55 9.15, 5.14 5.88, 10.27 -0.29 M0.31 16.67 C6.49 12.59, 10.79 5.98, 16.51 1.34 M-0.71 18.59 C4.77 13.32, 10.05 7.88, 15.55 1.26 M0.9 22.2 C4.93 18.62, 10.8 11.72, 21.53 0.96 M0.49 23.82 C6.21 17.1, 12.99 10.17, 20.62 -0.57 M0.45 30.99 C7.7 21.96, 15.03 14.61, 27.62 -0.08 M-0.67 30.6 C8.55 20.72, 16.62 10.35, 26.56 0.5 M-1.53 37.92 C12.58 20.04, 26.19 8.99, 32.15 0.97 M-0.68 36.86 C8.68 26.97, 16.01 16.83, 31.37 0.38 M0.4 41.28 C11.21 27.67, 21.96 16.91, 34.94 0.13 M0.72 42.64 C8.74 31.28, 19.56 20.92, 36.65 -0.32 M1.77 48.86 C11.07 36.71, 19.61 25.64, 41.58 0.54 M0.5 49.39 C11.6 33.45, 24.53 19.45, 42.76 0.65 M-1.11 52.57 C13.62 40.99, 26.12 26.55, 48.35 -1.64 M0.92 53.74 C14.6 37.79, 29.93 19.85, 47.48 0.39 M1.76 62.38 C18.98 38.59, 38.91 14.45, 53.59 0.63 M0.14 60.63 C10.42 48.67, 20.95 34.52, 53.23 -0.21 M0.78 68.86 C16.07 48.92, 31.26 34.71, 58.55 -0.83 M-1.16 67.07 C12.38 52.52, 23.97 39.54, 57.77 0.65 M-1.1 73.33 C12.95 56.71, 25.4 41.5, 62.8 0.41 M0.11 73.47 C19.28 52.89, 37.9 31.98, 
63.34 0.75 M2.39 75.01 C24.63 51.8, 46.18 30.05, 69.06 -0.68 M2.08 75.95 C14.75 61.42, 29.53 45.22, 69.37 0.87 M6.32 75.63 C33.63 47.61, 60.38 17.11, 74.62 1.23 M7.22 76.14 C23.61 58.34, 40.15 41.79, 73.69 -0.1 M11.45 77.35 C26.68 59.67, 42.73 44.2, 80.18 -0.25 M12.99 76.32 C31.38 55.47, 51.87 32.35, 79.19 -0.14 M16.84 74.54 C39.33 53.91, 60.38 26.32, 85.27 -1.38 M18.29 76.36 C44.85 46.92, 71.54 16.17, 85.02 -1 M22.09 74.73 C35.82 60.66, 49.53 46.92, 89.06 1.6 M23.54 75.89 C38.06 60.6, 50.81 43.42, 89.26 0.85 M30.72 76.44 C46.21 59.69, 59.25 40.8, 87.9 5.11 M29.83 75.26 C50.59 50.1, 72.88 25.89, 90.23 6.03 M35.08 75.11 C50.95 60.17, 62.56 42.69, 91.15 10.26 M33.67 76.7 C48.34 59.62, 64.27 42.68, 90.19 11.53 M39.66 76.91 C54.58 61.72, 65.8 47.73, 89.66 19.1 M40.09 75.33 C56.54 56.4, 74.56 36.72, 89.55 18.07 M46.03 76.67 C57.96 63.03, 70.15 50.2, 91.7 23.16 M45.41 76.23 C57.04 63.5, 69.63 50.15, 90.68 23.88 M52.16 75.71 C64.56 58.74, 81 44.2, 91.82 29.4 M50.39 75.49 C59.32 65.49, 68.74 53.64, 90.58 29.8 M57.35 75.75 C67.48 58.92, 80.71 43.63, 89.62 38.76 M54.9 75.33 C67.85 62.38, 80.08 47.62, 89.39 37.08 M61.55 76.77 C66.66 65.68, 74.96 59, 89.85 44.31 M61.15 75.06 C71.22 64.37, 80.75 53.09, 90.4 42.64 M66.28 74.89 C73.51 67.7, 84.95 55.31, 90.61 49.41 M66.36 76.26 C73.02 68.61, 81.12 59.98, 90 48.47 M69.96 76.23 C74.7 70.41, 80.17 67.05, 90.04 54.3 M71.91 75.76 C79.8 66.31, 86.82 59.55, 90.21 55.38 M76.85 74.82 C80.22 72.2, 88.57 64.85, 91.21 61.96 M76.09 76.72 C80.03 71.74, 83.45 68.49, 89.37 60.08 M83.54 75.01 C84.49 73.95, 87.25 70.52, 89.26 68.18 M82.95 75.72 C84.47 73.5, 85.31 72.54, 90.26 67.19 M87.62 75.77 C88.24 74.75, 89.38 74.21, 90.02 73.4 M87.74 76.04 C88.59 75, 89.18 73.93, 90.15 73.18" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M-1.31 -0.03 C24.03 0.68, 43.99 0.01, 88.95 1.36 M0.61 -0.01 C25.96 -0.14, 51.21 0.05, 89.85 0.89 M88.13 1.63 C90.63 28.24, 90.21 54.85, 88.08 75.5 M90.41 -0.63 C89.35 27.5, 90.05 57.11, 89.28 74.44 M90.6 
75.16 C64.06 75.08, 42.46 73.23, 1.4 74.77 M89.01 73.4 C71.52 73.14, 53.16 72.32, -0.09 73.56 M-1.02 73.91 C-1.67 50.26, -1.99 28.36, 1.11 0.09 M0.2 72.83 C-0.1 53.59, -0.28 34.48, -0.67 -0.14" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(178.86227796995763 353.1462553796259) rotate(0 25.5 21)"><text x="25.5" y="15" font-size="16px" fill="currentColor" text-anchor="middle" >Credit</text><text x="25.5" y="36" font-size="16px" fill="currentColor" text-anchor="middle" >Card</text></g><g transform="translate(96.40773251541259 87.55534628871692) rotate(0 32.5 21)"><text x="32.5" y="15" font-size="16px" fill="currentColor" text-anchor="middle" >Create </text><text x="32.5" y="36" font-size="16px" fill="currentColor" text-anchor="middle" >Payout</text></g><g stroke-linecap="round"><g transform="translate(138.77534939296447 240.6538895766489) rotate(0 31.017570713873738 44.91082421038297)"><path d="M1.39 -2.61 C14.41 18.53, 31.46 41.37, 59.85 90.86 M-0.54 0.65 C21.95 32.1, 46.02 66.6, 62.58 92.43" stroke="#495057" stroke-width="1" fill="none"></path></g><g transform="translate(138.77534939296447 240.6538895766489) rotate(0 31.017570713873738 44.91082421038297)"><path d="M39.88 71.95 C44.19 76.35, 52.51 81.93, 61.15 91.87 M37.94 75.21 C47.02 80.49, 57.67 88.43, 63.88 93.45" stroke="#495057" stroke-width="1" fill="none"></path></g><g transform="translate(138.77534939296447 240.6538895766489) rotate(0 31.017570713873738 44.91082421038297)"><path d="M57 60.64 C57.29 67.93, 61.59 76.16, 61.15 91.87 M55.07 63.9 C57.96 73.44, 62.42 85.46, 63.88 93.45" stroke="#495057" stroke-width="1" fill="none"></path></g></g><g stroke-linecap="round"><g transform="translate(138.44709774604826 237.2089397814957) rotate(0 -26.522568929671934 47.107910371698466)"><path d="M-1.31 0.03 C-16.25 29.35, -31.42 55.07, -53.37 94.29 M0.32 -0.08 C-15.51 28.53, -30.62 56.07, -52.58 92.14" stroke="#495057" stroke-width="1" fill="none"></path></g><g 
transform="translate(138.44709774604826 237.2089397814957) rotate(0 -26.522568929671934 47.107910371698466)"><path d="M-48.27 62.7 C-49.48 72.85, -50.46 79.58, -52.86 93.36 M-46.64 62.59 C-48.63 72.35, -49.57 80.9, -52.08 91.21" stroke="#495057" stroke-width="1" fill="none"></path></g><g transform="translate(138.44709774604826 237.2089397814957) rotate(0 -26.522568929671934 47.107910371698466)"><path d="M-30.65 73.2 C-37.19 80.28, -43.44 83.87, -52.86 93.36 M-29.01 73.1 C-36.32 79.67, -42.52 85.09, -52.08 91.21" stroke="#495057" stroke-width="1" fill="none"></path></g></g><g transform="translate(125.40773251541259 274.9644371978077) rotate(0 13.5 10.5)"><text x="13.5" y="15" font-size="16px" fill="currentColor" text-anchor="middle" >Pay</text></g><g stroke-linecap="round"><g transform="translate(180.77641938409852 203.28817523967632) rotate(0 28.132230309098304 -64.22513856614634)"><path d="M-0.03 1.03 C9.03 -4.06, 47.42 -7.62, 54.58 -29.37 C61.74 -51.12, 44.79 -113.15, 42.93 -129.48" stroke="#f41d92" stroke-width="1.5" fill="none" stroke-dasharray="8 9"></path></g><g transform="translate(180.77641938409852 203.28817523967632) rotate(0 28.132230309098304 -64.22513856614634)"><path d="M59.98 -104.96 C54.15 -111.89, 46.28 -119.29, 44.65 -129.21" stroke="#f41d92" stroke-width="1.5" fill="none"></path></g><g transform="translate(180.77641938409852 203.28817523967632) rotate(0 28.132230309098304 -64.22513856614634)"><path d="M39.82 -101.14 C40.42 -109.28, 39.01 -117.92, 44.65 -129.21" stroke="#f41d92" stroke-width="1.5" fill="none"></path></g></g><g transform="translate(234.13063826463303 182.3951092413472) rotate(0 27 21)"><text x="27" y="15" font-size="16px" fill="#f41d92" text-anchor="middle" >Payout</text><text x="27" y="36" font-size="16px" fill="#f41d92" text-anchor="middle" >Paid</text></g><g stroke-linecap="round" transform="translate(366.4263776130042 20.2779413215477) rotate(0 7.5 7.142857142857167)"><path d="M11.44 0.9 C12.84 1.38, 14.02 3.41, 14.57 4.76 
C15.12 6.12, 15.13 7.83, 14.74 9.05 C14.35 10.26, 13.6 11.22, 12.24 12.05 C10.88 12.87, 8.28 13.88, 6.58 13.99 C4.87 14.11, 2.99 13.93, 1.99 12.74 C1 11.55, 0.68 8.31, 0.61 6.85 C0.54 5.39, 0.89 5, 1.58 3.98 C2.28 2.97, 3.1 1.3, 4.79 0.74 C6.48 0.18, 10.59 0.69, 11.74 0.63 C12.88 0.57, 11.7 0.18, 11.65 0.38 M6.55 1.14 C7.81 1.09, 10.44 -0.42, 12.07 0.56 C13.7 1.55, 15.83 5.49, 16.34 7.06 C16.84 8.64, 16.35 8.87, 15.1 10 C13.85 11.14, 10.72 13.44, 8.84 13.87 C6.96 14.31, 5.23 13, 3.84 12.62 C2.45 12.23, 1.23 12.55, 0.5 11.58 C-0.24 10.6, -0.79 8.29, -0.56 6.77 C-0.34 5.26, 0.5 3.37, 1.87 2.48 C3.23 1.58, 6.91 1.9, 7.63 1.43 C8.36 0.95, 6.2 -0.29, 6.21 -0.37" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M8.22 0.17 C9.57 -0.01, 10.75 0.9, 11.92 1.98 C13.1 3.06, 14.88 5.03, 15.29 6.63 C15.71 8.23, 15.4 10.21, 14.41 11.56 C13.42 12.9, 11.05 14.23, 9.34 14.7 C7.63 15.18, 5.66 15.04, 4.14 14.38 C2.61 13.72, 1 12.04, 0.18 10.77 C-0.64 9.49, -1.21 8.2, -0.78 6.75 C-0.35 5.29, 1.1 2.99, 2.74 2.04 C4.37 1.08, 7.87 1.18, 9.01 1.02 C10.16 0.87, 9.66 1.16, 9.59 1.09 M8.84 1.6 C10.12 1.55, 11.6 0.58, 12.43 1.35 C13.25 2.11, 13.59 4.71, 13.81 6.21 C14.02 7.7, 14.76 8.8, 13.71 10.32 C12.67 11.85, 9.13 14.78, 7.55 15.35 C5.97 15.93, 5.58 14.55, 4.21 13.75 C2.83 12.96, 0.12 11.94, -0.7 10.57 C-1.53 9.2, -1.5 7.05, -0.73 5.54 C0.04 4.03, 2.52 2.32, 3.92 1.51 C5.31 0.7, 6.8 0.65, 7.63 0.68 C8.46 0.71, 8.91 1.45, 8.89 1.7" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g stroke-linecap="round"><g transform="translate(373.56923475586154 33.84936989297648) rotate(0 0.224559781788912 14.131781067302086)"><path d="M0.44 0.38 C0.44 5.02, -0.66 24.02, -0.9 28.73 M-0.79 -0.46 C-0.33 3.76, 0.98 22.54, 1.35 27.06" stroke="#f41d92" stroke-width="1" fill="none"></path></g></g><g stroke-linecap="round"><g transform="translate(374.0412538346477 57.108725622177985) rotate(0 -3.357253714589433 6.988165282189527)"><path d="M0.87 -0.98 C-0.37 1.4, -5.67 11.17, -7.03 13.39 
M-0.13 1.12 C-1.5 3.79, -6.3 12.96, -7.58 14.96" stroke="#f41d92" stroke-width="1" fill="none"></path></g></g><g stroke-linecap="round"><g transform="translate(381.04125383464816 57.77539228884473) rotate(293.5804257326911 -3.3068900323789876 7.172603662639517)"><path d="M0.59 1.15 C-0.43 3.25, -5.45 11.47, -6.78 13.64 M-0.56 0.7 C-1.59 2.95, -5.97 9.75, -7.21 11.68" stroke="#f41d92" stroke-width="1" fill="none"></path></g></g><g stroke-linecap="round"><g transform="translate(360.9301427235364 42.66428117773353) rotate(0 14.200112145559615 -1.6809096463469473)"><path d="M-0.79 -0.96 C4.22 -1.42, 24.23 -2.49, 29.19 -2.79 M0.99 1.15 C5.89 0.35, 23.56 -3.54, 28.06 -4.51" stroke="#f41d92" stroke-width="1" fill="none"></path></g></g><g stroke-linecap="round"><g transform="translate(249.31903161242553 45.0253922888445) rotate(0 52.810719070708046 -1.5714045324673407)"><path d="M0 0.45 C17.57 0, 88.02 -3.25, 105.62 -3.6" stroke="#f41d92" stroke-width="1.5" fill="none" stroke-dasharray="8 9"></path></g><g transform="translate(249.31903161242553 45.0253922888445) rotate(0 52.810719070708046 -1.5714045324673407)"><path d="M78.56 5.93 C85.23 2.35, 97.27 1.26, 106.69 -3.06" stroke="#f41d92" stroke-width="1.5" fill="none"></path></g><g transform="translate(249.31903161242553 45.0253922888445) rotate(0 52.810719070708046 -1.5714045324673407)"><path d="M77.85 -14.58 C84.59 -11.71, 96.86 -6.35, 106.69 -3.06" stroke="#f41d92" stroke-width="1.5" fill="none"></path></g></g><g transform="translate(252.98569827909205 54.5253922888445) rotate(0 45 10.5)"><text x="45" y="15" font-size="16px" fill="#f41d92" text-anchor="middle" >Notification</text></g></svg>
<figcaption>Top level app gets notified when the payout is paid</figcaption></p>
</figure>
<p>One way to avoid circular dependencies and keep modules decoupled in Django is to use signals:</p>
<div class="highlight"><pre><span></span><span class="c1"># payout/signals.py</span>
<span class="kn">from</span> <span class="nn">django.dispatch</span> <span class="kn">import</span> <span class="n">Signal</span>
<span class="n">payout_paid</span> <span class="o">=</span> <span class="n">Signal</span><span class="p">()</span>
</pre></div>
<p>After declaring the signal we can send it when a payout is paid. This is done by the model:</p>
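<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span><span class="p">,</span> <span class="n">transaction</span> <span class="k">as</span> <span class="n">db_transaction</span>

<span class="kn">from</span> <span class="nn">.</span> <span class="kn">import</span> <span class="n">signals</span>
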
<span class="k">class</span> <span class="nc">PayoutProcess</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="nd">@classmethod</span>
    <span class="k">def</span> <span class="nf">mark_paid</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">pk</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-></span> <span class="n">PayoutProcess</span><span class="p">:</span>
        <span class="k">with</span> <span class="n">db_transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
            <span class="n">payout</span> <span class="o">=</span> <span class="bp">cls</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">select_for_update</span><span class="p">()</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">pk</span><span class="o">=</span><span class="n">pk</span><span class="p">)</span>
            <span class="k">if</span> <span class="n">payout</span><span class="o">.</span><span class="n">status</span> <span class="o">!=</span> <span class="s1">'pending'</span><span class="p">:</span>
                <span class="k">raise</span> <span class="n">StateError</span><span class="p">()</span>
            <span class="n">payout</span><span class="o">.</span><span class="n">status</span> <span class="o">=</span> <span class="s1">'paid'</span>
            <span class="n">payout</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
<span class="hll"> <span class="n">signals</span><span class="o">.</span><span class="n">payout_paid</span><span class="o">.</span><span class="n">send_robust</span><span class="p">(</span><span class="n">sender</span><span class="o">=</span><span class="n">PayoutProcess</span><span class="p">,</span> <span class="n">payout</span><span class="o">=</span><span class="n">payout</span><span class="p">)</span>
</span> <span class="k">return</span> <span class="n">payout</span>
</pre></div>
<p>Now a top level app can listen to this signal, and send a notification to the user:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.dispatch</span> <span class="kn">import</span> <span class="n">receiver</span>
<span class="kn">import</span> <span class="nn">payout.signals</span>
<span class="kn">import</span> <span class="nn">payout.models</span>
<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">MerchantPayoutProcess</span>
<span class="hll"><span class="nd">@receiver</span><span class="p">(</span><span class="n">payout</span><span class="o">.</span><span class="n">signals</span><span class="o">.</span><span class="n">payout_paid</span><span class="p">)</span>
</span><span class="k">def</span> <span class="nf">on_merchant_was_paid</span><span class="p">(</span><span class="n">sender</span><span class="p">,</span> <span class="n">payout</span><span class="p">:</span> <span class="n">payout</span><span class="o">.</span><span class="n">models</span><span class="o">.</span><span class="n">PayoutProcess</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">p</span> <span class="o">=</span> <span class="n">MerchantPayoutProcess</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">payout_id</span><span class="o">=</span><span class="n">payout</span><span class="o">.</span><span class="n">id</span><span class="p">)</span>
    <span class="k">except</span> <span class="n">MerchantPayoutProcess</span><span class="o">.</span><span class="n">DoesNotExist</span><span class="p">:</span>
        <span class="c1"># Not a merchant payout</span>
        <span class="k">return</span>
    <span class="c1"># email_user() takes a subject and a message</span>
    <span class="n">p</span><span class="o">.</span><span class="n">user</span><span class="o">.</span><span class="n">email_user</span><span class="p">(</span><span class="s1">'You got paid'</span><span class="p">,</span> <span class="sa">f</span><span class="s1">'Dear merchant, you got paid </span><span class="si">{</span><span class="n">payout</span><span class="o">.</span><span class="n">amount</span><span class="si">}</span><span class="s1">$!'</span><span class="p">)</span>
</pre></div>
<p>When the signal receiver is triggered, it first checks to see if it's one of its own payouts. If it is, it fetches the related object, in this case a payout to a merchant, and sends a notification to the user.</p>
<div class="admonition info">
<p class="admonition-title">N receivers</p>
<p>With this scheme, if you have N receivers, then every dispatch causes N-1 useless queries. This can be avoided by adding a bit of context to the signal.</p>
</div>
<div class="admonition info">
<p class="admonition-title">dispatch_uid</p>
<p>It's usually a good idea to set <code>dispatch_uid</code> on signal receivers. The <a href="https://docs.djangoproject.com/en/3.2/topics/signals/#preventing-duplicate-signals" rel="noopener">documentation</a> explains it well.</p>
</div>
<p>The benefit of using signals this way is that the low level payout module can communicate with apps that depend on it, without forming a dependency back on them. This pattern eliminates the circular dependency and keeps low level modules independent and decoupled.</p>
<h3 id="working-in-bulk"><a class="toclink" href="#working-in-bulk">Working in Bulk</a></h3>
<p>This design was working pretty well - payouts were flying out and the users were happy.</p>
<p>At some point the staffers came back with another idea. They said that work is picking up and they now want to automate and streamline some of it. They asked if it was possible to mark payouts as paid in bulk. After a quick discussion we decided it's best to have the bulk process "all or nothing", meaning, if the operation fails for even one of the payouts in the bulk, it should not be applied to any of them.</p>
<p>We figured this would be a straightforward task: all we have to do is execute the command on all of the given payouts inside a database transaction:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">transaction</span> <span class="k">as</span> <span class="n">db_transaction</span>
<span class="k">def</span> <span class="nf">mark_paid_in_bulk</span><span class="p">(</span><span class="n">payout_ids</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">])</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
<span class="hll"> <span class="k">with</span> <span class="n">db_transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
</span> <span class="k">for</span> <span class="n">pk</span> <span class="ow">in</span> <span class="n">payout_ids</span><span class="p">:</span>
            <span class="n">PayoutProcess</span><span class="o">.</span><span class="n">mark_paid</span><span class="p">(</span><span class="n">pk</span><span class="p">)</span>
</pre></div>
<p>The bulk process simply iterates over payout IDs and marks each one as paid. To make sure the process is atomic, or "all or nothing", we wrap the loop in a database transaction.</p>
<p>Easy, right? This is where it gets hairy.</p>
<h3 id="the-bug"><a class="toclink" href="#the-bug">The Bug</a></h3>
<p>This bulk process was also working great for a while. Staffers would upload Excel files (what else), and the system would go over the payouts and mark them all as paid.</p>
<p>One day, the person that usually does this was on holiday, and asked someone else to do it instead. The other person prepared the Excel file and uploaded it to the system. This new person was not familiar with the process so they made some mistakes with the amounts. As a result, the system rejected some of the payouts.</p>
<p>Now what does a normal person do when a system reports an error? They try again and again...</p>
<p>At some point we started getting complaints from users saying they are getting a ton of messages that they got paid. Some were happy, but others opened the app to check the details and saw that they were in fact not paid, and realized it must be a mistake.</p>
<p>At this point hundreds of users had received these messages, but none of them had actually been paid! So what's causing the issue? How are notifications being sent when all of the payouts are still marked as pending? A closer look at the implementation of the bulk process revealed the problem.</p>
<h3 id="nested-transactions"><a class="toclink" href="#nested-transactions">Nested Transactions</a></h3>
<p>The function that marks a payout as paid is executed inside of a database transaction. To make sure that the signal is sent only when the payout status is committed to the database, the signal is sent <em>after</em> the transaction is completed:</p>
<div class="highlight"><pre><span></span><span class="nd">@classmethod</span>
<span class="k">def</span> <span class="nf">mark_paid</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">pk</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-></span> <span class="n">PayoutProcess</span><span class="p">:</span>
<span class="hll"> <span class="k">with</span> <span class="n">db_transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
</span> <span class="n">payout</span> <span class="o">=</span> <span class="bp">cls</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">select_for_update</span><span class="p">()</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">pk</span><span class="o">=</span><span class="n">pk</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">payout</span><span class="o">.</span><span class="n">status</span> <span class="o">!=</span> <span class="s1">'pending'</span><span class="p">:</span>
            <span class="k">raise</span> <span class="n">StateError</span><span class="p">()</span>
        <span class="n">payout</span><span class="o">.</span><span class="n">status</span> <span class="o">=</span> <span class="s1">'paid'</span>
        <span class="n">payout</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
<span class="hll"> <span class="n">signals</span><span class="o">.</span><span class="n">payout_paid</span><span class="o">.</span><span class="n">send_robust</span><span class="p">(</span><span class="n">sender</span><span class="o">=</span><span class="n">PayoutProcess</span><span class="p">,</span> <span class="n">payout</span><span class="o">=</span><span class="n">payout</span><span class="p">)</span>
</span> <span class="k">return</span> <span class="n">payout</span>
</pre></div>
<p>When this function is executed on a single payout it works as expected. But what happens when it is called from the bulk process? Inlining <code>mark_paid()</code> into the loop makes the problem visible:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="k">with</span> <span class="n">db_transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
</span> <span class="k">for</span> <span class="n">pk</span> <span class="ow">in</span> <span class="n">payout_ids</span><span class="p">:</span>
        <span class="c1"># inline `mark_paid()`</span>
<span class="hll">        <span class="k">with</span> <span class="n">db_transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
</span>            <span class="n">payout</span> <span class="o">=</span> <span class="bp">cls</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">select_for_update</span><span class="p">()</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">pk</span><span class="o">=</span><span class="n">pk</span><span class="p">)</span>
            <span class="k">if</span> <span class="n">payout</span><span class="o">.</span><span class="n">status</span> <span class="o">!=</span> <span class="s1">'pending'</span><span class="p">:</span>
                <span class="k">raise</span> <span class="n">StateError</span><span class="p">()</span>
            <span class="n">payout</span><span class="o">.</span><span class="n">status</span> <span class="o">=</span> <span class="s1">'paid'</span>
            <span class="n">payout</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
<span class="hll"> <span class="n">signals</span><span class="o">.</span><span class="n">payout_paid</span><span class="o">.</span><span class="n">send_robust</span><span class="p">(</span><span class="n">sender</span><span class="o">=</span><span class="n">PayoutProcess</span><span class="p">,</span> <span class="n">payout</span><span class="o">=</span><span class="n">payout</span><span class="p">)</span>
</span></pre></div>
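<p>Ha! The bulk process is using its own database transaction! In Django, a nested <code>atomic</code> block only creates a savepoint; nothing is committed until the outermost block exits successfully. So when the signal is sent, the payout can still be rolled back if a later payout in the bulk fails.</p>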
<p>Just to illustrate, if we mark three payouts in bulk and we fail to mark the third one, all three payouts are rolled back, but notifications for the first two had already been sent:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">transaction</span> <span class="k">as</span> <span class="n">db_transaction</span>
<span class="gp">>>> </span><span class="k">with</span> <span class="n">db_transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
<span class="gp">... </span> <span class="k">for</span> <span class="n">fail</span> <span class="ow">in</span> <span class="p">[</span><span class="kc">False</span><span class="p">,</span> <span class="kc">False</span><span class="p">,</span> <span class="kc">True</span><span class="p">]:</span>
<span class="gp">... </span> <span class="k">with</span> <span class="n">db_transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
<span class="gp">... </span> <span class="k">if</span> <span class="n">fail</span><span class="p">:</span>
<span class="gp">... </span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Failed!'</span><span class="p">)</span>
<span class="gp">... </span> <span class="nb">print</span><span class="p">(</span><span class="s1">'Message sent!'</span><span class="p">)</span>
<span class="gp">...</span>
<span class="go">Message sent!</span>
<span class="go">Message sent!</span>
<span class="go">Exception: Failed!</span>
</pre></div>
<p>Notice how the first two messages were sent even though the third failure caused the outer transaction to roll back all three.</p>
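<p>Nested atomic blocks in Django are implemented with database savepoints, so the same behavior can be reproduced without Django. The following is a minimal sketch using Python's built-in <code>sqlite3</code> module. The <code>payout</code> table and the <code>notified</code> list are stand-ins invented for this illustration, not the real models:</p>

```python
import sqlite3

# In-memory database in autocommit mode, so transactions are controlled explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE payout (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.executemany("INSERT INTO payout (id, status) VALUES (?, 'pending')", [(1,), (2,), (3,)])

notified = []  # hypothetical stand-in for messages sent to users

conn.execute("BEGIN")  # outer transaction (the bulk process)
try:
    for pk, fail in [(1, False), (2, False), (3, True)]:
        conn.execute("SAVEPOINT mark_paid")  # nested block (the single process)
        if fail:
            raise RuntimeError("Failed!")
        conn.execute("UPDATE payout SET status = 'paid' WHERE id = ?", (pk,))
        conn.execute("RELEASE SAVEPOINT mark_paid")
        notified.append(pk)  # side effect fired before the outer commit
    conn.execute("COMMIT")
except RuntimeError:
    # Rolling back the outer transaction discards released savepoints as well.
    conn.execute("ROLLBACK")

statuses = [row[0] for row in conn.execute("SELECT status FROM payout ORDER BY id")]
print(notified, statuses)
```

<p>After the rollback all three payouts are still pending, but the first two "notifications" were already recorded. Releasing a savepoint does not commit anything, it only folds the changes into the enclosing transaction.</p>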
<hr>
<h2 id="remedies"><a class="toclink" href="#remedies">Remedies</a></h2>
<p>The single <code>mark_paid</code> function assumes that it is not executed inside of a database transaction, but it is not checking or enforcing it in any way. This is a problem.</p>
<h3 id="assert-atomic-block"><a class="toclink" href="#assert-atomic-block">Assert Atomic Block</a></h3>
<p>Before Django 3.2 we had some cases where we wanted to make sure a function is executed, or not executed, inside a database transaction. We ended up implementing two functions:</p>
<div class="highlight"><pre><span></span><span class="c1"># common/db.py</span>
<span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">connection</span>
<span class="k">def</span> <span class="nf">assert_is_in_atomic_block</span><span class="p">()</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
<span class="k">assert</span> <span class="n">connection</span><span class="o">.</span><span class="n">in_atomic_block</span><span class="p">,</span> <span class="p">(</span>
<span class="s1">'This function must be run inside of a DB transaction.'</span>
<span class="p">)</span>
<span class="k">def</span> <span class="nf">assert_is_not_in_atomic_block</span><span class="p">()</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
<span class="k">assert</span> <span class="ow">not</span> <span class="n">connection</span><span class="o">.</span><span class="n">in_atomic_block</span><span class="p">,</span> <span class="p">(</span>
<span class="s1">'This function must not be run inside of a DB transaction.'</span>
<span class="p">)</span>
</pre></div>
<p>Using these utility functions we could prevent some code from being executed inside a database transaction:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">common.db</span>
<span class="k">def</span> <span class="nf">do_not_run_inside_a_db_transaction</span><span class="p">():</span>
<span class="hll"> <span class="n">common</span><span class="o">.</span><span class="n">db</span><span class="o">.</span><span class="n">assert_is_not_in_atomic_block</span><span class="p">()</span>
</span> <span class="c1"># Rest of function goes here...</span>
</pre></div>
<p>Running this code block inside of an atomic block will now trigger an assertion error at runtime:</p>
<div class="highlight"><pre><span></span><span class="o">>>></span> <span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">transaction</span> <span class="k">as</span> <span class="n">db_transaction</span>
<span class="o">>>></span> <span class="k">with</span> <span class="n">db_transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
<span class="o">...</span> <span class="n">do_not_run_inside_a_db_transaction</span><span class="p">()</span>
<span class="hll"><span class="ne">AssertionError</span><span class="p">:</span> <span class="n">This</span> <span class="n">function</span> <span class="n">must</span> <span class="ow">not</span> <span class="n">be</span> <span class="n">run</span> <span class="n">inside</span> <span class="n">of</span> <span class="n">a</span> <span class="n">DB</span> <span class="n">transaction</span><span class="o">.</span>
</span></pre></div>
<p>The main downside to this approach is that unless explicitly stated otherwise, tests run inside a database transaction. Any test that reaches a function guarded by <code>assert_is_not_in_atomic_block</code> will therefore fail. To overcome that we ended up patching these functions in tests:</p>
<div class="highlight"><pre><span></span><span class="nd">@pytest</span><span class="o">.</span><span class="n">fixture</span><span class="p">(</span><span class="n">scope</span><span class="o">=</span><span class="s1">'session'</span><span class="p">,</span> <span class="n">autouse</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">patch_is_in_db_transaction</span><span class="p">():</span>
<span class="c1"># Patch atomic transaction check in tests.</span>
<span class="c1"># The checks can't be run in tests because tests are always wrapped in a transaction.</span>
<span class="n">patch_in</span> <span class="o">=</span> <span class="n">mock</span><span class="o">.</span><span class="n">patch</span><span class="p">(</span><span class="s1">'common.db.assert_is_in_atomic_block'</span><span class="p">)</span>
<span class="n">patch_not_in</span> <span class="o">=</span> <span class="n">mock</span><span class="o">.</span><span class="n">patch</span><span class="p">(</span><span class="s1">'common.db.assert_is_not_in_atomic_block'</span><span class="p">)</span>
<span class="k">with</span> <span class="n">patch_in</span><span class="p">,</span> <span class="n">patch_not_in</span><span class="p">:</span>
<span class="k">yield</span>
</pre></div>
<p>This function creates a fixture which is automatically applied for the entire test session. The fixture mocks the two functions and disables their functionality.</p>
<h3 id="durable-transaction"><a class="toclink" href="#durable-transaction">Durable Transaction</a></h3>
<p>Starting with Django 3.2, there is another way to prevent a transaction from being executed inside of another transaction, by marking a transaction as <a href="https://docs.djangoproject.com/en/3.2/topics/db/transactions/#controlling-transactions-explicitly" rel="noopener">"durable"</a>:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="k">with</span> <span class="n">db_transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">(</span><span class="n">durable</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
</span> <span class="n">payout</span> <span class="o">=</span> <span class="bp">cls</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">select_for_update</span><span class="p">()</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">pk</span><span class="o">=</span><span class="n">pk</span><span class="p">)</span>
<span class="k">if</span> <span class="n">payout</span><span class="o">.</span><span class="n">status</span> <span class="o">!=</span> <span class="s1">'pending'</span><span class="p">:</span>
<span class="k">raise</span> <span class="n">StateError</span><span class="p">()</span>
<span class="n">payout</span><span class="o">.</span><span class="n">status</span> <span class="o">=</span> <span class="s1">'paid'</span>
<span class="n">payout</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
<span class="n">signals</span><span class="o">.</span><span class="n">payout_paid</span><span class="o">.</span><span class="n">send_robust</span><span class="p">(</span><span class="n">sender</span><span class="o">=</span><span class="n">PayoutProcess</span><span class="p">,</span> <span class="n">payout</span><span class="o">=</span><span class="n">payout</span><span class="p">)</span>
</pre></div>
<p>If you try to open a durable transaction inside of another transaction, a <code>RuntimeError</code> is raised:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">transaction</span> <span class="k">as</span> <span class="n">db_transaction</span>
<span class="gp">>>> </span><span class="k">with</span> <span class="n">db_transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
<span class="hll"><span class="gp">... </span> <span class="k">with</span> <span class="n">db_transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">(</span><span class="n">durable</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
</span><span class="gp">... </span> <span class="k">pass</span>
<span class="gp">...</span>
<span class="hll"><span class="go">RuntimeError: A durable atomic block cannot be nested within another atomic block.</span>
</span></pre></div>
<p>Using a durable transaction may have prevented this issue from happening, but it would have also made the bulk feature impossible, or at least very complicated to implement!</p>
<h3 id="sending-signal-on-commit"><a class="toclink" href="#sending-signal-on-commit">Sending Signal on Commit</a></h3>
<p>Another way to tackle this issue is to instead try to make sure the signal is only sent when the overall transaction is successfully committed. One way of doing this is using <a href="https://docs.djangoproject.com/en/3.2/topics/db/transactions/#django.db.transaction.on_commit" rel="noopener"><code>on_commit</code></a>.</p>
<p>Using <code>on_commit</code> we can register a function to be executed only when the transaction is actually committed. To illustrate how using <code>on_commit</code> can solve the issue, consider the following example:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">transaction</span> <span class="k">as</span> <span class="n">db_transaction</span>
<span class="gp">>>> </span><span class="k">with</span> <span class="n">db_transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
<span class="gp">... </span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">fail</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">([</span><span class="kc">False</span><span class="p">,</span> <span class="kc">False</span><span class="p">,</span> <span class="kc">True</span><span class="p">],</span> <span class="mi">1</span><span class="p">):</span>
<span class="gp">... </span> <span class="k">with</span> <span class="n">db_transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
<span class="gp">... </span> <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'processing </span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s1">...'</span><span class="p">)</span>
<span class="gp">... </span> <span class="k">if</span> <span class="n">fail</span><span class="p">:</span>
<span class="gp">... </span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Failed!'</span><span class="p">)</span>
<span class="hll"><span class="gp">... </span> <span class="n">db_transaction</span><span class="o">.</span><span class="n">on_commit</span><span class="p">(</span><span class="k">lambda</span><span class="p">:</span> <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'Message sent!'</span><span class="p">))</span>
</span><span class="go">processing 1...</span>
<span class="go">processing 2...</span>
<span class="go">processing 3...</span>
<span class="hll"><span class="go">Exception: Failed!</span>
</span></pre></div>
<p>In the example we loop over three values where the third one is expected to fail. To print the message only when the transaction is committed successfully we use <code>on_commit</code>. Notice in the output that three items were processed, but since the third one failed, the entire process failed and none of the messages were sent.</p>
<p>To illustrate what happens when all items succeed, consider the following example:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">transaction</span> <span class="k">as</span> <span class="n">db_transaction</span>
<span class="gp">>>> </span><span class="k">with</span> <span class="n">db_transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
<span class="gp">... </span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">fail</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">([</span><span class="kc">False</span><span class="p">,</span> <span class="kc">False</span><span class="p">,</span> <span class="kc">False</span><span class="p">],</span> <span class="mi">1</span><span class="p">):</span>
<span class="gp">... </span> <span class="k">with</span> <span class="n">db_transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
<span class="gp">... </span> <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'processing </span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s1">...'</span><span class="p">)</span>
<span class="gp">... </span> <span class="k">if</span> <span class="n">fail</span><span class="p">:</span>
<span class="gp">... </span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Failed!'</span><span class="p">)</span>
<span class="gp">... </span> <span class="n">db_transaction</span><span class="o">.</span><span class="n">on_commit</span><span class="p">(</span><span class="k">lambda</span><span class="p">:</span> <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'Message sent!'</span><span class="p">))</span>
<span class="go">processing 1...</span>
<span class="go">processing 2...</span>
<span class="go">processing 3...</span>
<span class="hll"><span class="go">Message sent!</span>
</span><span class="hll"><span class="go">Message sent!</span>
</span><span class="hll"><span class="go">Message sent!</span>
</span></pre></div>
<p>Amazing! Three items were processed and three messages were sent. We can now apply a similar fix to our payout module:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">.</span> <span class="kn">import</span> <span class="n">signals</span>
<span class="k">class</span> <span class="nc">PayoutProcess</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
<span class="nd">@classmethod</span>
<span class="k">def</span> <span class="nf">mark_paid</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">pk</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-></span> <span class="n">PayoutProcess</span><span class="p">:</span>
<span class="k">with</span> <span class="n">db_transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
<span class="n">payout</span> <span class="o">=</span> <span class="bp">cls</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">select_for_update</span><span class="p">()</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">pk</span><span class="o">=</span><span class="n">pk</span><span class="p">)</span>
<span class="k">if</span> <span class="n">payout</span><span class="o">.</span><span class="n">status</span> <span class="o">!=</span> <span class="s1">'pending'</span><span class="p">:</span>
<span class="k">raise</span> <span class="n">StateError</span><span class="p">()</span>
<span class="n">payout</span><span class="o">.</span><span class="n">status</span> <span class="o">=</span> <span class="s1">'paid'</span>
<span class="n">payout</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
<span class="hll"> <span class="n">db_transaction</span><span class="o">.</span><span class="n">on_commit</span><span class="p">(</span><span class="k">lambda</span><span class="p">:</span> <span class="n">signals</span><span class="o">.</span><span class="n">payout_paid</span><span class="o">.</span><span class="n">send_robust</span><span class="p">(</span><span class="n">sender</span><span class="o">=</span><span class="n">PayoutProcess</span><span class="p">,</span> <span class="n">payout</span><span class="o">=</span><span class="n">payout</span><span class="p">))</span>
</span> <span class="k">return</span> <span class="n">payout</span>
</pre></div>
<p>When a payout is marked as paid the function now sends the signal only when the transaction is committed. This makes the function safe to execute inside of another transaction!</p>
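<p>If it is not obvious why deferring the signal makes nesting safe, a toy model may help. The sketch below is not how Django implements <code>on_commit</code>. <code>ToyConnection</code> is a made-up class that only models the bookkeeping: callbacks registered in a block are handed to the parent block when the block succeeds, discarded when it fails, and executed only when the outermost block exits cleanly:</p>

```python
from contextlib import contextmanager


class ToyConnection:
    """Toy stand-in for a database connection; models callback bookkeeping only."""

    def __init__(self):
        self._pending = []  # one list of callbacks per open block

    @contextmanager
    def atomic(self):
        self._pending.append([])
        try:
            yield
        except BaseException:
            self._pending.pop()  # block rolled back: discard its callbacks
            raise
        callbacks = self._pending.pop()
        if self._pending:
            self._pending[-1].extend(callbacks)  # nested block: hand over to parent
        else:
            for callback in callbacks:  # outermost block committed: run everything
                callback()

    def on_commit(self, callback):
        if self._pending:
            self._pending[-1].append(callback)
        else:
            callback()  # no open block: run immediately


def run_bulk(flags):
    conn = ToyConnection()
    sent = []
    try:
        with conn.atomic():
            for i, fail in enumerate(flags, 1):
                with conn.atomic():
                    if fail:
                        raise Exception("Failed!")
                    # Bind i now so each callback reports its own item.
                    conn.on_commit(lambda i=i: sent.append(i))
    except Exception:
        pass
    return sent


print(run_bulk([False, False, True]))
print(run_bulk([False, False, False]))
```

<p>When the third item fails, the outer block is rolled back and the callbacks collected from the first two items are thrown away with it. When all items succeed, the callbacks run in the order they were registered.</p>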
<h3 id="using-a-queue"><a class="toclink" href="#using-a-queue">Using a Queue</a></h3>
<p>When dealing with such problems it's not uncommon to immediately think about queues. As a thought exercise, let's examine two common patterns often referred to as "queues".</p>
<p><strong>Async Tasks</strong></p>
<p>Async task runners such as <a href="https://celeryproject.org/" rel="noopener">Celery</a> are very popular. They let you execute tasks asynchronously, now, at a later time or at a predetermined time. Using async tasks in this case would not solve the issue:</p>
<ul>
<li>
<p><strong>Fire an async task in <code>on_commit</code></strong><br>If we set aside the fact that the payout module is not the one sending the messages, the outcome in this case is exactly the same as sending a signal in <code>on_commit</code> and firing an async task from the receiver (which is what we do).</p>
</li>
<li>
<p><strong>Fire an async task instead of sending a signal</strong><br>This will suffer from the same issues as the signal. If the bulk process fails, the task was already fired and the message will be sent.</p>
</li>
<li>
<p><strong>Schedule an async task for a later time and check the status before sending</strong><br>This may work in some cases, but there are other problems:</p>
<ul>
<li>
<p><strong>We have a race</strong>: How long after the payout was processed should the task be executed? 1s? 10s? 1m? What if the bulk process takes 2m to run? When the task fires, the transaction will not be committed yet and the message will not be sent. What do you do then?</p>
</li>
<li>
<p><strong>We do extra work</strong>: You now have to fetch the payout <em>again</em> before sending the message.</p>
</li>
<li>
<p><strong>We send the message later</strong>: If we wait, users can receive the message a few minutes or even hours after they were paid. This may not be a big deal in some cases, but in others sending the message close to when the event occurred may be crucial.</p>
</li>
</ul>
</li>
</ul>
<p>Another downside to using an async task runner is that now you need to have an async task runner. If you already have one it may not be so bad, but if you don't, it can be a pain to set up and operate.</p>
<p><strong>Transactional Queue</strong></p>
<p>If you decided to implement a queue in the database you are likely one step closer to a proper solution. Instead of using signals, you can stage tasks to a database table that acts as a queue.</p>
<p>The main benefit of using a queue table in the database is that the task will be added only when the transaction is committed. This plays very nicely with the overall transaction management of the process, and guarantees that the task is only added when it should.</p>
<p>The challenging part is how to make sure the tasks are being picked up shortly after they were added to the queue. If you are using a cron job to process tasks, the sending may be delayed up to the cron job's repeat interval. If you use database triggers, LISTEN/NOTIFY or similar to trigger processing of tasks then the delay can be shorter.</p>
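<p>To make the idea concrete, here is a minimal sketch of staging tasks in a queue table, using Python's built-in <code>sqlite3</code> module. The table names, the <code>mark_paid_bulk</code> function and its <code>fail_on</code> parameter are all invented for this illustration:</p>

```python
import sqlite3

# In-memory database in autocommit mode, so transactions are controlled explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE payout (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE notification_queue (id INTEGER PRIMARY KEY, payout_id INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO payout (id, status) VALUES (?, 'pending')", [(1,), (2,), (3,)])


def mark_paid_bulk(payout_ids, fail_on=None):
    """Mark payouts as paid and stage a notification task in the same transaction."""
    conn.execute("BEGIN")
    try:
        for pk in payout_ids:
            if pk == fail_on:
                raise RuntimeError("Failed!")
            conn.execute("UPDATE payout SET status = 'paid' WHERE id = ?", (pk,))
            # The task is just another row, so it shares the transaction's fate.
            conn.execute("INSERT INTO notification_queue (payout_id) VALUES (?)", (pk,))
        conn.execute("COMMIT")
    except RuntimeError:
        conn.execute("ROLLBACK")


def queued():
    rows = conn.execute("SELECT payout_id FROM notification_queue ORDER BY id")
    return [row[0] for row in rows]


mark_paid_bulk([1, 2, 3], fail_on=3)
after_failure = queued()

mark_paid_bulk([1, 2, 3])
after_success = queued()

print(after_failure, after_success)
```

<p>When the bulk fails, the staged tasks are rolled back together with the status changes, so the queue stays empty. Only a successful commit makes the tasks visible to whatever process is draining the queue.</p>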
<hr>
<h2 id="testing"><a class="toclink" href="#testing">Testing</a></h2>
<p>We ended up implementing the <code>on_commit</code> solution because it required very few changes to existing code. However, after we finished making the changes to the code, we faced yet another challenge - the tests!</p>
<h3 id="testing-with-django"><a class="toclink" href="#testing-with-django">Testing with Django</a></h3>
<p>Our tests included scenarios to make sure a notification is sent when a payout is paid:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">test_should_send_notification</span><span class="p">(</span><span class="n">db</span><span class="p">,</span> <span class="n">mailoutbox</span><span class="p">,</span> <span class="n">merchant_user</span><span class="p">:</span> <span class="n">User</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
<span class="n">comm</span> <span class="o">=</span> <span class="n">MerchantCommission</span><span class="o">.</span><span class="n">create_payout</span><span class="p">(</span><span class="n">merchant_user</span><span class="p">,</span> <span class="n">amount</span><span class="o">=</span><span class="mi">100_00</span><span class="p">)</span>
<span class="n">PayoutProcess</span><span class="o">.</span><span class="n">mark_paid</span><span class="p">(</span><span class="n">comm</span><span class="o">.</span><span class="n">payout_process_id</span><span class="p">)</span>
<span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">mailoutbox</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span>
</pre></div>
<p>After we made the change to send the signal in <code>on_commit</code>, all of these tests failed. After some debugging we found that the receiver function we registered for the signals was not executed, but only in the tests!</p>
<p>The fact that the <code>on_commit</code> handler is not fired is not surprising if you know how tests are executed. To speed things up, Django starts a database transaction at the beginning of every test and then rolls it back immediately after. Executing tests in this manner is a fast way to prevent tests that change data in the database from affecting each other.</p>
<p>To make it possible to test things that are triggered in <code>on_commit</code> without using slow transactional tests, Django 3.2 added a new context manager called <a href="https://docs.djangoproject.com/en/3.2/topics/testing/tools/#django.test.TestCase.captureOnCommitCallbacks" rel="noopener"><code>captureOnCommitCallbacks</code></a> (<a href="https://code.djangoproject.com/ticket/30457" rel="noopener">Ticket #30457</a>):</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.core</span> <span class="kn">import</span> <span class="n">mail</span>
<span class="kn">from</span> <span class="nn">django.test</span> <span class="kn">import</span> <span class="n">TestCase</span>
<span class="k">class</span> <span class="nc">TestPayoutProcess</span><span class="p">(</span><span class="n">TestCase</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">test_should_send_notification</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">merchant_user</span><span class="p">:</span> <span class="n">User</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
<span class="n">comm</span> <span class="o">=</span> <span class="n">MerchantCommission</span><span class="o">.</span><span class="n">create_payout</span><span class="p">(</span><span class="n">merchant_user</span><span class="p">,</span> <span class="n">amount</span><span class="o">=</span><span class="mi">100_00</span><span class="p">)</span>
<span class="hll"> <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">captureOnCommitCallbacks</span><span class="p">(</span><span class="n">execute</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
</span> <span class="n">PayoutProcess</span><span class="o">.</span><span class="n">mark_paid</span><span class="p">(</span><span class="n">comm</span><span class="o">.</span><span class="n">payout_process_id</span><span class="p">)</span>
<span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">mail</span><span class="o">.</span><span class="n">outbox</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span>
</pre></div>
<p>The context manager is available on instances of <code>TestCase</code>, and when <code>execute=True</code> any <code>on_commit</code> handlers will also be executed, not just captured.</p>
<h3 id="testing-with-pytest"><a class="toclink" href="#testing-with-pytest">Testing with Pytest</a></h3>
<p>Unfortunately, we are not using Django's <code>TestCase</code> directly anymore. We are using pytest, and we were not in a position to start rewriting stuff. Lucky for us, <code>pytest-django</code> implemented equivalent functionality. A quick upgrade to <a href="https://pytest-django.readthedocs.io/en/latest/changelog.html#v4-4-0-2021-06-06" rel="noopener"><code>pytest-django</code> version 4.4</a> and we were ready to go:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">test_should_send_notification</span><span class="p">(</span>
<span class="n">db</span><span class="p">,</span>
<span class="n">mailoutbox</span><span class="p">,</span>
<span class="hll"> <span class="n">django_capture_on_commit_callbacks</span><span class="p">,</span>
</span> <span class="n">merchant_user</span><span class="p">:</span> <span class="n">User</span><span class="p">,</span>
<span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
<span class="n">comm</span> <span class="o">=</span> <span class="n">MerchantCommission</span><span class="o">.</span><span class="n">create_payout</span><span class="p">(</span><span class="n">merchant_user</span><span class="p">,</span> <span class="n">amount</span><span class="o">=</span><span class="mi">100_00</span><span class="p">)</span>
<span class="hll"> <span class="k">with</span> <span class="n">django_capture_on_commit_callbacks</span><span class="p">(</span><span class="n">execute</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
</span> <span class="n">PayoutProcess</span><span class="o">.</span><span class="n">mark_paid</span><span class="p">(</span><span class="n">comm</span><span class="o">.</span><span class="n">payout_process_id</span><span class="p">)</span>
<span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">mailoutbox</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span>
</pre></div>
<p>The fixture <a href="https://pytest-django.readthedocs.io/en/stable/helpers.html#django-capture-on-commit-callbacks" rel="noopener"><code>django_capture_on_commit_callbacks</code></a> is based on the Django function. Once you inject it you can use it just like you would the Django one.</p>
<hr>
<p>The "bug" caused by this nested transaction resulted in some users getting multiple messages saying they got paid, but all of these users were eventually paid.</p>
<hr>
<h2 id="thoughts-on-django-signals"><a class="toclink" href="#thoughts-on-django-signals">Thoughts on Django Signals</a></h2>
<p>As described in this story, Django signals are useful for implementing interactions between modules without creating explicit dependencies between them. The <a href="https://docs.djangoproject.com/en/3.2/topics/signals/" rel="noopener">official documentation about Signals</a> also provides this as the main reason for using signals:</p>
<blockquote>
<p>Django includes a “signal dispatcher” which allows decoupled applications get notified when actions occur elsewhere in the framework. In a nutshell, signals allow certain senders to notify a set of receivers that some action has taken place. They’re especially useful when many pieces of code may be interested in the same events.</p>
</blockquote>
<p>If you look at <a href="https://github.com/django/django/blob/main/django/dispatch/dispatcher.py" rel="noopener">how signals are implemented in Django</a> you'll find that there is not a lot of magic under the hood. The function <code>connect</code> adds a function to a list of receivers, and when a signal is sent with <code>send</code> (or <code>send_robust</code>) the signal object iterates over the list of receiver functions, and executes them one by one.</p>
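<p>A stripped-down model of the dispatcher fits in a few lines. The sketch below is not Django's code. <code>ToySignal</code> and the two receivers are made up for illustration, and it leaves out weak references, sender filtering and dispatch uids. It only keeps the core loop and the difference between <code>send</code> and <code>send_robust</code>:</p>

```python
class ToySignal:
    """Toy model of a signal: a list of receivers called one by one."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)

    def send(self, sender, **named):
        # An exception in any receiver propagates and stops the loop.
        return [
            (receiver, receiver(signal=self, sender=sender, **named))
            for receiver in self.receivers
        ]

    def send_robust(self, sender, **named):
        # Exceptions are caught and returned, so remaining receivers still run.
        responses = []
        for receiver in self.receivers:
            try:
                response = receiver(signal=self, sender=sender, **named)
            except Exception as error:
                response = error
            responses.append((receiver, response))
        return responses


def broken_receiver(sender, **kwargs):
    raise ValueError("boom")


def send_message(sender, payout, **kwargs):
    return f"message sent for payout {payout}"


payout_paid = ToySignal()
payout_paid.connect(broken_receiver)
payout_paid.connect(send_message)

robust_responses = payout_paid.send_robust(sender="PayoutProcess", payout=1)

try:
    payout_paid.send(sender="PayoutProcess", payout=1)
    send_error = None
except ValueError as error:
    send_error = error
```

<p>Everything happens in memory and in the calling process. Nothing is persisted, so if the process dies in the middle of the loop there is no record of which receivers already ran.</p>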
<p>This is very similar to a pub-sub pattern, but it lacks some of the guarantees of more advanced implementations. One of the main disadvantages of Django signals is that <strong>there is no guarantee that a "message" ever reaches the destination</strong>. If, for example, the server crashes while a signal is being broadcast, some receivers may not be executed and they will not be attempted when the service starts up again. This can become a problem if you rely on signals exclusively to trigger certain actions in the system.</p>Practical SQL for Data Analysis2021-04-26T00:00:00+03:002021-04-26T00:00:00+03:00Haki Benitatag:hakibenita.com,2021-04-26:/sql-for-data-analysis<p>Pandas is by far the most popular tool for data analysis. It's packed with useful features, it's battle tested and widely accepted. However, pandas comes at a cost which is often overlooked. SQL databases have been around since the 1970s. They contain many features that most developers never heard of, and I want to bring some of them to light.</p><hr>
<p>Pandas is a very popular tool for data analysis. It comes built-in with many useful features, it's battle tested and widely accepted. However, pandas is not always the best tool for the job.</p>
<p>SQL databases have been around since the 1970s. Some of the smartest people in the world worked on making it easy to slice, dice, fetch and manipulate data quickly and efficiently. SQL databases have come such a long way, that many developers and data scientists lost track of what they can do with the database they already have!</p>
<p><strong>In this article I demonstrate how to use SQL to perform fast and efficient data analysis.</strong></p>
<section class="foo">
<style>.dark .foo img { filter:invert(1); }</style>
<figure><img alt="Illustration by Victoria Holmes" src="https://hakibenita.com/images/wrapping-your-head-around-sql.png"><figcaption><small>Illustration by <a href="https://weareskribbl.com/mindfulness/">Victoria Holmes</a></small></figcaption>
</figure>
</section>
<p><details class="toc-container" open>
<summary>Table of Contents</summary></p>
<div class="toc">
<ul>
<li><a href="#sql-vs-pandas-performance">SQL vs Pandas Performance</a><ul>
<li><a href="#pandas-true-cost">Pandas True Cost</a></li>
<li><a href="#removing-unnecessary-data">Removing Unnecessary Data</a></li>
<li><a href="#aggregating-in-the-database">Aggregating in the Database</a></li>
<li><a href="#removing-pandas">Removing Pandas</a></li>
<li><a href="#results-summary">Results Summary</a></li>
<li><a href="#pandas-and-sql-better-together">Pandas and SQL: Better Together!</a></li>
</ul>
</li>
<li><a href="#basics">Basics</a><ul>
<li><a href="#common-table-expressions">Common Table Expressions</a></li>
<li><a href="#generating-data">Generating Data</a></li>
<li><a href="#random">Random</a></li>
<li><a href="#random-choice">Random Choice</a></li>
<li><a href="#sampling">Sampling</a></li>
<li><a href="#example-train-test-split-with-sql">Example: Train / Test Split with SQL</a></li>
</ul>
</li>
<li><a href="#descriptive-statistics">Descriptive Statistics</a><ul>
<li><a href="#describing-a-series">Describing a Series</a></li>
<li><a href="#describing-a-categorical-series">Describing a Categorical Series</a></li>
</ul>
</li>
<li><a href="#subtotals">Subtotals</a><ul>
<li><a href="#rollup">Rollup</a></li>
<li><a href="#cube">Cube</a></li>
<li><a href="#grouping-sets">Grouping Sets</a></li>
</ul>
</li>
<li><a href="#pivot-tables">Pivot Tables</a><ul>
<li><a href="#conditional-expressions">Conditional Expressions</a></li>
<li><a href="#aggregate-expressions">Aggregate Expressions</a></li>
</ul>
</li>
<li><a href="#running-and-cumulative-aggregation">Running and Cumulative Aggregation</a><ul>
<li><a href="#window-functions">Window Functions</a></li>
<li><a href="#sliding-window">Sliding Window</a></li>
</ul>
</li>
<li><a href="#linear-regression">Linear Regression</a></li>
<li><a href="#interpolation">Interpolation</a><ul>
<li><a href="#fill-with-constant">Fill with Constant</a></li>
<li><a href="#back-and-forward-fill">Back and Forward Fill</a></li>
<li><a href="#linear-interpolation">Linear Interpolation</a></li>
</ul>
</li>
<li><a href="#binning">Binning</a><ul>
<li><a href="#custom-binning">Custom Binning</a></li>
<li><a href="#equal-height-binning">Equal Height Binning</a></li>
<li><a href="#equal-width-binning">Equal Width Binning</a></li>
</ul>
</li>
<li><a href="#take-away">Take Away</a></li>
</ul>
</div>
<p></details></p>
<div class="admonition tip">
<p class="admonition-title">Interactive Course</p>
<p><div style="display:flex;align-items:center;"><svg xmlns="http://www.w3.org/2000/svg" version="1.1" viewBox="0 0 109.72693673325944 107.6819131095042" style="height:4em;min-height:3em"><g stroke-linecap="round"><g stroke-opacity="0.6" fill-opacity="0.6" transform="translate(12.09554198756814 105.20243761579223) rotate(0 42.2257148024091 -51.36148106104014)" fill-rule="evenodd"><path d="M-1.66 -5.05 L7.13 -87.78 L83.5 -86.79 L82.01 -15.03 L-0.81 -7" stroke="none" stroke-width="0" fill="#f41d92" fill-rule="evenodd"></path><path d="M-2.1 -10.09 C-1.54 -30.82, 2.14 -59.36, 7.53 -91.25 M-1.53 -7.52 C1.88 -36.65, 3.27 -66.22, 6.29 -89.45 M2.14 -95.2 C30.81 -89.31, 48.3 -87.5, 86.55 -84.03 M5.05 -91.17 C20.77 -91.14, 40.01 -90.01, 86.29 -84.29 M86.01 -81.64 C82.96 -67.53, 83.74 -54.72, 79.6 -17.01 M85.42 -83.3 C84.19 -65.56, 80.76 -42.9, 78.07 -16.48 M81.65 -13.13 C55.9 -16.93, 35.32 -13.22, 1.09 -8.34 M78.72 -14.99 C55.21 -14.55, 33.55 -15.14, 1.98 -8.14 M0 -8.18 C0 -8.18, 0 -8.18, 0 -8.18 M0 -8.18 C0 -8.18, 0 -8.18, 0 -8.18" stroke="transparent" stroke-width="1" fill="none"></path></g></g><g stroke-linecap="round"><g transform="translate(104.40120730265056 21.034223086383463) rotate(0 -46.43324311682974 36.885851424860505)"><path d="M-9.61 -0.25 C-28.67 3.39, -53.55 2.8, -86.14 4.19 M-8.02 -0.71 C-36.35 0.99, -63.74 2.79, -86.4 3.37 M-85.91 5.78 C-84.74 25.27, -84.02 48.38, -84.66 73.9 M-88.19 3.96 C-87.82 21.88, -87.78 38.36, -85.17 74 M-87.22 74.48 C-68.49 72.13, -46.92 70.49, -6.96 71.07 M-84.84 72.51 C-57.62 73.88, -28.98 71.83, -4.67 70.81 M-4.88 70.43 C-6.67 41.96, -8.5 14.45, -8.34 1.17 M-4.8 69.91 C-6.4 54.87, -6.47 37.84, -7.93 -0.48" stroke="#000" stroke-width="4" fill="none"></path></g></g><g stroke-linecap="round"><g transform="translate(36.04266212705488 42.52137844133961) rotate(0 17.1304131627725 6.13138926161713)"><path d="M0.52 -1.09 C5.99 0.95, 27.36 11.25, 32.8 13.36 M-0.66 0.94 C5.16 3.19, 28.87 9.83, 34.92 11.73" stroke="#ced4da" stroke-width="4" 
fill="none"></path></g></g><g stroke-linecap="round"><g transform="translate(40.47640031284823 70.78754917499106) rotate(0 14.673419521024726 -7.19565281114194)"><path d="M-0.01 -0.88 C4.69 -3.45, 23.52 -13.44, 28.47 -15.66 M-1.47 1.27 C3.59 -1.11, 25.7 -11.47, 30.82 -14.34" stroke="#ced4da" stroke-width="4" fill="none"></path></g></g><g stroke-linecap="round"><g transform="translate(64.73291211726252 72.57512139647838) rotate(0 10.217923741740321 -1.1320545942063802)"><path d="M-0.48 0.17 C3.07 -0.23, 17.17 -2.01, 20.91 -2.43 M1.47 -0.79 C4.86 -1.07, 16.39 -1.33, 19.75 -1.46" stroke="#ced4da" stroke-width="4" fill="none"></path></g></g></svg>
<div style="flex-grow:1; margin-left: 1em;" markdown="1">
This article is available as an <strong><a href="https://www.educative.io/courses/practical-data-analysis-with-sql" rel="noopener">interactive course on Educative ≫</a></strong>
</div>
</div></p>
</div>
<hr>
<h2 id="sql-vs-pandas-performance"><a class="toclink" href="#sql-vs-pandas-performance">SQL vs Pandas Performance</a></h2>
<p>Imagine a simple table with 1M users, each with a username and an indication if the user was activated or not. A simple data analysis task would be to answer <em>how many activated and inactivated users are there?</em></p>
<p><details markdown="1"></p>
<p><summary>Benchmark setup</summary></p>
<p>Create the table and populate with random data:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="nb">SERIAL</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="p">,</span>
<span class="w"> </span><span class="n">username</span><span class="w"> </span><span class="nb">TEXT</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="p">,</span>
<span class="w"> </span><span class="n">activated</span><span class="w"> </span><span class="nb">BOOLEAN</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">);</span>
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">username</span><span class="p">,</span>
<span class="w"> </span><span class="n">activated</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="n">md5</span><span class="p">(</span><span class="n">random</span><span class="p">()::</span><span class="nb">text</span><span class="p">)::</span><span class="nb">text</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">username</span><span class="p">,</span>
<span class="w"> </span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="p">.</span><span class="mi">9</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">activated</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">1000000</span><span class="p">);</span>
</pre></div>
<p>Set up a Python virtual environment and install dependencies:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>python<span class="w"> </span>-m<span class="w"> </span>venv<span class="w"> </span>venv
$<span class="w"> </span><span class="nb">source</span><span class="w"> </span>venv/bin/activate
$<span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>psycopg2<span class="w"> </span>pandas<span class="w"> </span>memory-profiler
</pre></div>
<p>To produce benchmark results, create a script with the following pattern:</p>
<div class="highlight"><pre><span></span><span class="c1"># bench.py</span>
<span class="kn">from</span> <span class="nn">memory_profiler</span> <span class="kn">import</span> <span class="n">profile</span>
<span class="nd">@profile</span>
<span class="k">def</span> <span class="nf">run</span><span class="p">():</span>
    <span class="c1"># TODO: Replace with code to benchmark</span>
    <span class="nb">print</span><span class="p">(</span><span class="s1">'do work!'</span><span class="p">)</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">"__main__"</span><span class="p">:</span>
    <span class="n">run</span><span class="p">()</span>
</pre></div>
<p>Execute your script from the command line and view the line-by-line results:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>python<span class="w"> </span>bench.py
<span class="go">do work!</span>
<span class="go">Filename: bench.py</span>
<span class="go">Line # Mem usage Increment Occurences Line Contents</span>
<span class="go">============================================================</span>
<span class="go"> 3 38.9 MiB 38.9 MiB 1 @profile</span>
<span class="go"> 4 def run():</span>
<span class="go"> 5 # TODO: Replace with code to benchmark</span>
<span class="go"> 6 38.9 MiB 0.0 MiB 1 print('do work!')</span>
</pre></div>
<p>You can find more details about this method in <a href="/fast-load-data-python-postgresql#measuring-memory">this article</a> or in the <a href="https://github.com/pythonprofilers/memory_profiler" rel="noopener"><code>memory-profiler</code> documentation</a>.</p>
<hr>
<p></details></p>
<p>Let's start with a naive approach using pandas:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">psycopg2</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
<span class="n">connection</span> <span class="o">=</span> <span class="n">psycopg2</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="n">dbname</span><span class="o">=</span><span class="s1">'db'</span><span class="p">)</span>
<span class="k">with</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span> <span class="k">as</span> <span class="n">cursor</span><span class="p">:</span>
    <span class="n">cursor</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s1">'SELECT * FROM users'</span><span class="p">)</span>
    <span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span>
        <span class="n">cursor</span><span class="o">.</span><span class="n">fetchall</span><span class="p">(),</span>
        <span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="s1">'id'</span><span class="p">,</span> <span class="s1">'username'</span><span class="p">,</span> <span class="s1">'activated'</span><span class="p">],</span>
    <span class="p">)</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">groupby</span><span class="p">(</span><span class="n">by</span><span class="o">=</span><span class="s1">'activated'</span><span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
<span class="nb">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>
</pre></div>
<p>The script uses <a href="https://pypi.org/project/psycopg2" rel="noopener">psycopg2</a> to create a connection to the database. It then fetches data from the users table into a pandas dataframe, and calls <code>groupby</code> to get the counts for activated and inactivated users.</p>
<p>This script produces the following output:</p>
<div class="highlight"><pre><span></span><span class="go"> id username</span>
<span class="go">activated</span>
<span class="go">False 900029 900029</span>
<span class="go">True 99971 99971</span>
</pre></div>
<p>We got an answer to our question, but at what cost?</p>
<h3 id="pandas-true-cost"><a class="toclink" href="#pandas-true-cost">Pandas True Cost</a></h3>
<p>Let's execute this function again, but this time look at the memory usage:</p>
<div class="highlight"><pre><span></span><span class="gp gp-VirtualEnv">(venv)</span> <span class="gp">$ </span>python<span class="w"> </span>test_pandas_naive.py
<span class="go"> id username</span>
<span class="go">activated</span>
<span class="go">False 900029 900029</span>
<span class="go">True 99971 99971</span>
<span class="go">Filename: test_pandas_naive.py</span>
<span class="go">Line # Mem usage Increment Occurences Line Contents</span>
<span class="go">============================================================</span>
<span class="go"> 3 38.8 MiB 38.8 MiB 1 @profile</span>
<span class="go"> 4 def run():</span>
<span class="go"> 5 41.2 MiB 2.3 MiB 1 import psycopg2</span>
<span class="hll"><span class="go"> 6 78.2 MiB 37.1 MiB 1 import pandas as pd</span>
</span><span class="go"> 7</span>
<span class="go"> 8 78.6 MiB 0.4 MiB 1 connection = psycopg2.connect(dbname='db')</span>
<span class="go"> 9 78.6 MiB 0.0 MiB 1 with connection.cursor() as cursor:</span>
<span class="hll"><span class="go"> 10 179.9 MiB 101.3 MiB 1 cursor.execute('SELECT * FROM users')</span>
</span><span class="go"> 11 386.0 MiB 12.7 MiB 2 df = pd.DataFrame(</span>
<span class="hll"><span class="go"> 12 373.4 MiB 193.5 MiB 1 cursor.fetchall(),</span>
</span><span class="go"> 13 373.4 MiB 0.0 MiB 1 columns=['id', 'username', 'activated'],</span>
<span class="go"> 14 )</span>
<span class="go"> 15</span>
<span class="go"> 16 386.0 MiB 0.0 MiB 1 result = df.groupby(by='activated').count()</span>
<span class="go"> 17 386.0 MiB 0.0 MiB 1 print(result)</span>
</pre></div>
<p>To view the memory usage of the program we use the package <a href="https://pypi.org/project/memory-profiler/" rel="noopener">memory-profiler</a>. We used this technique in the past to find the <a href="/fast-load-data-python-postgresql">fastest way to load data into PostgreSQL using Python</a>.</p>
<p>The output shows the overall memory usage at each line of the program, as well as the additional memory added by each line in the column "Increment".
The output for this program reveals some interesting findings:</p>
<ol>
<li>
<p><strong>Pandas alone consumes ~37MB of memory</strong>: Just importing pandas, before even doing anything with it, consumes a significant amount of memory. For comparison, importing psycopg2 only adds 2.3MB of memory to the program.</p>
</li>
<li>
<p><strong>Fetching the data into memory consumed an additional ~300MB</strong>: When we fetched the data into memory, and then into a pandas dataframe, the program occupied an additional 300MB. For reference, the size of the table in the database is only 65MB.</p>
</li>
</ol>
<p>If we ignore the 38MB consumed by the profiler itself, the program consumed 347MB of memory, and executing this script without the profiler took 1.101s to complete.</p>
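<p>The runtimes quoted in this section are taken without the profiler attached. A minimal sketch of one way to take such a measurement, assuming a single wall-clock reading with <code>time.perf_counter</code> from the standard library is good enough. The names <code>run</code> and <code>time_once</code> are hypothetical, and the workload is a placeholder:</p>

```python
import time


def run():
    # Placeholder workload; swap in the code you want to time.
    return sum(range(1_000_000))


def time_once(func):
    """Return (result, elapsed_seconds) for a single call to func."""
    start = time.perf_counter()
    result = func()
    elapsed = time.perf_counter() - start
    return result, elapsed


result, elapsed = time_once(run)
print(f"run() took {elapsed:.3f}s")
```

<p>A single reading is noisy, so it is usually worth repeating the measurement a few times before comparing programs.</p>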
<h3 id="removing-unnecessary-data"><a class="toclink" href="#removing-unnecessary-data">Removing Unnecessary Data</a></h3>
<p>Our quick analysis showed that fetching the data consumed the most memory. To optimize that, we can try to fetch less data. For example, we don't really use the username column, so maybe we can <em>not</em> fetch it from the database:</p>
<div class="highlight"><pre><span></span><span class="gp gp-VirtualEnv">(venv)</span> <span class="gp">$ </span>python<span class="w"> </span>test_pandas.py
<span class="go"> id</span>
<span class="go">activated</span>
<span class="go">False 900029</span>
<span class="go">True 99971</span>
<span class="go">Filename: test_pandas.py</span>
<span class="gp"> # </span>Mem<span class="w"> </span>usage<span class="w"> </span>Increment<span class="w"> </span>Occurences<span class="w"> </span>Line<span class="w"> </span>Contents
<span class="go">========================================================</span>
<span class="go"> 3 38.7 MiB 38.7 MiB 1 @profile</span>
<span class="go"> 4 def run():</span>
<span class="go"> 5 41.0 MiB 2.3 MiB 1 import psycopg2</span>
<span class="go"> 6 78.3 MiB 37.2 MiB 1 import pandas as pd</span>
<span class="go"> 7</span>
<span class="go"> 8 78.6 MiB 0.3 MiB 1 connection = psycopg2.connect(dbname='db')</span>
<span class="go"> 9 78.6 MiB 0.0 MiB 1 with connection.cursor() as cursor:</span>
<span class="hll"><span class="go">10 132.1 MiB 53.6 MiB 1 cursor.execute('SELECT id, activated FROM users')</span>
</span><span class="hll"><span class="go">11 142.1 MiB -90.7 MiB 2 df = pd.DataFrame(</span>
</span><span class="hll"><span class="go">12 232.8 MiB 100.7 MiB 1 cursor.fetchall(),</span>
</span><span class="hll"><span class="go">13 232.8 MiB 0.0 MiB 1 columns=['id', 'activated'],</span>
</span><span class="hll"><span class="go">14 )</span>
</span><span class="go">15</span>
<span class="go">16 142.1 MiB 0.0 MiB 1 result = df.groupby(by='activated').count()</span>
<span class="go">17 142.1 MiB 0.0 MiB 1 print(result)</span>
</pre></div>
<p>By explicitly providing a list of columns to the query and fetching only what we need, the program now consumes only 232MB, or 193MB without the overhead of the profiler. This is an improvement from the previous attempt which consumed 347MB of memory.</p>
<p>Executing the script without the profiler took 0.839s compared to the previous program which took 1.1s.</p>
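<p>The effect of fetching fewer columns can be reproduced without a PostgreSQL server. The sketch below is an approximation under several assumptions: an in-memory SQLite table stands in for the <code>users</code> table, the standard library <code>tracemalloc</code> module replaces memory-profiler (it only tracks allocations made by Python), the table holds 10,000 rows rather than 1M, and <code>peak_bytes</code> is a hypothetical helper:</p>

```python
import sqlite3
import tracemalloc


def peak_bytes(connection, query):
    """Return the peak number of bytes Python allocated while fetching query."""
    tracemalloc.start()
    rows = connection.execute(query).fetchall()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del rows
    return peak


# In-memory SQLite table standing in for the PostgreSQL users table.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, activated INTEGER)"
)
connection.executemany(
    "INSERT INTO users (username, activated) VALUES (?, ?)",
    ((f"user-{i:032d}", int(i % 10 == 0)) for i in range(10_000)),
)

all_columns = peak_bytes(connection, "SELECT * FROM users")
two_columns = peak_bytes(connection, "SELECT id, activated FROM users")
print(f"all columns:   {all_columns:,} bytes")
print(f"id, activated: {two_columns:,} bytes")
```

<p>The absolute numbers will not match the PostgreSQL benchmark, but the query that skips <code>username</code> should allocate noticeably less.</p>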
<h3 id="aggregating-in-the-database"><a class="toclink" href="#aggregating-in-the-database">Aggregating in the Database</a></h3>
<p>Most of the memory in the program is still consumed by the data being fetched. What if, instead of first fetching the data and aggregating using pandas, we aggregate the data <em>in the database</em>, and create a pandas dataframe from the results:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>python<span class="w"> </span>test_db.py
<span class="go"> activated cnt</span>
<span class="go">0 False 900029</span>
<span class="go">1 True 99971</span>
<span class="go">Filename: test_db.py</span>
<span class="gp"> # </span>Mem<span class="w"> </span>usage<span class="w"> </span>Increment<span class="w"> </span>Occurences<span class="w"> </span>Line<span class="w"> </span>Contents
<span class="go">========================================================</span>
<span class="go"> 3 38.6 MiB 38.6 MiB 1 @profile</span>
<span class="go"> 4 def run():</span>
<span class="go"> 5 41.0 MiB 2.3 MiB 1 import psycopg2</span>
<span class="go"> 6 78.0 MiB 37.0 MiB 1 import pandas as pd</span>
<span class="go"> 7</span>
<span class="go"> 8 78.3 MiB 0.4 MiB 1 connection = psycopg2.connect(dbname='db')</span>
<span class="go"> 9 78.3 MiB 0.0 MiB 1 with connection.cursor() as cursor:</span>
<span class="go">10 78.3 MiB 0.0 MiB 1 cursor.execute('''</span>
<span class="hll"><span class="go">11 SELECT activated, count(*) AS cnt</span>
</span><span class="hll"><span class="go">12 FROM users</span>
</span><span class="hll"><span class="go">13 GROUP BY activated</span>
</span><span class="go">14 ''')</span>
<span class="go">15 78.3 MiB 0.0 MiB 2 result = pd.DataFrame(</span>
<span class="go">16 78.3 MiB 0.0 MiB 1 cursor.fetchall(),</span>
<span class="go">17 78.3 MiB 0.0 MiB 1 columns=['activated', 'cnt'],</span>
<span class="go">18 )</span>
<span class="go">19 79.3 MiB 1.0 MiB 1 print(result)</span>
</pre></div>
<p>This is a big leap compared to the previous attempt. Doing the processing in the database and fetching aggregated results consumed only 79MB of memory, or 40MB if we remove the overhead of the profiler. This is a big improvement!</p>
<p>Executing the script without the profiler took 0.380s, which is twice as fast as the previous program which took 0.839s.</p>
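<p>The difference between the two approaches comes down to how many rows cross the wire. A minimal sketch of the same idea, assuming an in-memory SQLite table with 1,000 rows as a stand-in for the PostgreSQL <code>users</code> table, and <code>collections.Counter</code> as a stand-in for the pandas <code>groupby</code>:</p>

```python
import sqlite3
from collections import Counter

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, activated INTEGER)"
)
connection.executemany(
    "INSERT INTO users (username, activated) VALUES (?, ?)",
    ((f"user-{i}", int(i % 10 == 0)) for i in range(1_000)),
)

# Option 1: pull every row to the client, then count in Python.
rows = connection.execute("SELECT activated FROM users").fetchall()
counted_in_python = Counter(activated for (activated,) in rows)

# Option 2: let the database aggregate and return one row per group.
counted_in_db = dict(
    connection.execute(
        "SELECT activated, count(*) AS cnt FROM users GROUP BY activated"
    ).fetchall()
)

print(len(rows), "rows fetched for option 1")
print(len(counted_in_db), "rows fetched for option 2")
```

<p>Both options produce the same counts, but the second one only ever holds one row per group in the client's memory.</p>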
<h3 id="removing-pandas"><a class="toclink" href="#removing-pandas">Removing Pandas</a></h3>
<p>At this point the only significant memory hog is pandas itself. Just for fun and reference, let's see what the program consumes without pandas:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>python<span class="w"> </span>test_db_plain.py
<span class="gp gp-VirtualEnv">([(False, 900029)</span><span class="go">, (True, 99971)],)</span>
<span class="go">Filename: test_db_plain.py</span>
<span class="gp"> # </span>Mem<span class="w"> </span>usage<span class="w"> </span>Increment<span class="w"> </span>Occurences<span class="w"> </span>Line<span class="w"> </span>Contents
<span class="go">========================================================</span>
<span class="go"> 3 38.9 MiB 38.9 MiB 1 @profile</span>
<span class="go"> 4 def run():</span>
<span class="go"> 5 41.2 MiB 2.4 MiB 1 import psycopg2</span>
<span class="go"> 6</span>
<span class="go"> 7 41.7 MiB 0.5 MiB 1 connection = psycopg2.connect(dbname='db')</span>
<span class="go"> 8 41.7 MiB 0.0 MiB 1 with connection.cursor() as cursor:</span>
<span class="go"> 9 41.7 MiB 0.0 MiB 1 cursor.execute('''</span>
<span class="go">10 SELECT activated, count(*) AS cnt</span>
<span class="go">11 FROM users</span>
<span class="go">12 GROUP BY activated</span>
<span class="go">13 ''')</span>
<span class="hll"><span class="go">14 41.7 MiB 0.0 MiB 1 result = cursor.fetchall(),</span>
</span><span class="go">15</span>
<span class="go">16 41.7 MiB 0.0 MiB 1 print(result)</span>
</pre></div>
<p>After removing pandas and keeping the results as a python list of tuples, the program consumes 41MB of memory, or just 2.8MB if we ignore the profiler overhead. This is a huge difference from where we started!</p>
<p>The timing is also much lower. Without the profiler this program completes in just 0.114s. That's 70% less than the previous attempt using pandas, and overall 90% faster than the first program.</p>
<h3 id="results-summary"><a class="toclink" href="#results-summary">Results Summary</a></h3>
<p>This is a summary of the results:</p>
<section class="table-container">
<table>
<thead>
<tr>
<th>Program</th>
<th>Peak Memory</th>
<th>% Memory Diff</th>
<th>Runtime</th>
<th>% Runtime Diff</th>
</tr>
</thead>
<tbody>
<tr>
<td>Pandas with entire table</td>
<td>347 MB</td>
<td></td>
<td>1.101s</td>
<td></td>
</tr>
<tr>
<td>Pandas with only necessary data</td>
<td>193 MB</td>
<td>-44%</td>
<td>0.839s</td>
<td>-23%</td>
</tr>
<tr>
<td>Pandas with aggregation in database</td>
<td>40 MB</td>
<td>-80%</td>
<td>0.380s</td>
<td>-54%</td>
</tr>
<tr>
<td>No Pandas, aggregation in database</td>
<td>2.8 MB</td>
<td>-93%</td>
<td>0.114s</td>
<td>-70%</td>
</tr>
</tbody>
</table>
</section>
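<p>The "% Runtime Diff" column compares each program to the one before it. A small sketch of that arithmetic, using the runtimes from the table; <code>percent_change</code> is a hypothetical helper, and the printed values may differ from the table by a point depending on how the figures are rounded:</p>

```python
def percent_change(previous, current):
    """Relative change from previous to current, in percent."""
    return (current - previous) / previous * 100


# Runtimes in seconds, in the order the programs were introduced.
runtimes = [1.101, 0.839, 0.380, 0.114]

for previous, current in zip(runtimes, runtimes[1:]):
    change = percent_change(previous, current)
    print(f"{previous:.3f}s to {current:.3f}s: {change:.1f}%")

# Overall change from the first program to the last.
overall = percent_change(runtimes[0], runtimes[-1])
print(f"overall: {overall:.1f}%")
```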
<p>This benchmark does not mention the memory consumed by the database itself - this is intentional. Databases usually consume a configurable amount of memory, and then manage allocations between different buffers and system components internally. Over the years, databases have gotten pretty good at managing their memory so you won't have to. Whether you decide to use the database or not, the memory is already paid for, so you might as well use it!</p>
<h3 id="pandas-and-sql-better-together"><a class="toclink" href="#pandas-and-sql-better-together">Pandas and SQL: Better Together!</a></h3>
<p>Programs that consume a lot of memory are a huge pain. Developers need powerful development environments, iterations are slower and the entire process takes more time. From an infrastructure perspective, resources cost money, and the more you scale the more you have to pay. The costs pile up pretty quickly.</p>
<p>All of this is not to say that Pandas is unnecessary, or that it can be replaced. Pandas provides great benefits and it has proven itself to be incredibly valuable. The same thing can be said for databases.</p>
<p><strong>To take advantage of both worlds and create lightweight programs that are also fast, use SQL and Pandas together!</strong></p>
<p>I'm focusing on Pandas and NumPy because they are the most popular, but the concepts described in the article apply to other tools and languages such as R, Julia, Matlab, SAS and so on. To make the argument even more compelling, I include interactive <a href="https://hex.tech/" rel="noopener">Hex Notebooks</a> you can experiment with on your own.</p>
<hr>
<h2 id="basics"><a class="toclink" href="#basics">Basics</a></h2>
<p>The SQL query language was invented more than 40 years ago, and it is the most popular language for querying relational data. SQL is defined in an ANSI standard but there are still subtle differences between popular database engines such as PostgreSQL, MySQL, Oracle, SQL Server and others.</p>
<p>These are the common clauses of an SQL query:</p>
<div class="highlight"><pre><span></span>SELECT <expressions>
FROM <tables>
JOIN <to other table> ON <join condition>
WHERE <predicates>
GROUP BY <expressions>
HAVING <predicate>
ORDER BY <expressions>
LIMIT <number of rows>
</pre></div>
<p>In PostgreSQL only the <code>SELECT</code> clause is really mandatory, so you can mix and match to do what you want.</p>
<h3 id="common-table-expressions"><a class="toclink" href="#common-table-expressions">Common Table Expressions</a></h3>
<p>It's sometimes useful to split a large query into smaller steps. Using SQL, you can define a <a href="https://www.postgresql.org/docs/current/queries-with.html" rel="noopener">common table expression</a>, or "CTE" for short, with the <code>WITH</code> clause:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">emails</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="s1">'ME@hakibenita.com'</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">email</span>
<span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">emails</span><span class="p">;</span>
<span class="go"> email</span>
<span class="go">───────────────────</span>
<span class="go"> ME@hakibenita.com</span>
</pre></div>
<p>You can have multiple CTEs in a single query, and they can even depend on each other:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">emails</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="s1">'ME@hakibenita.com'</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">email</span>
<span class="p">),</span>
<span class="hll"><span class="n">normalized_emails</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
</span><span class="hll"><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">lower</span><span class="p">(</span><span class="n">email</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">email</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">emails</span>
</span><span class="hll"><span class="p">)</span>
</span>
<span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">normalized_emails</span><span class="p">;</span>
<span class="go"> email</span>
<span class="go">───────────────────</span>
<span class="go"> me@hakibenita.com</span>
</pre></div>
<p>Common table expressions are a great way to split a big query into smaller chunks, perform <a href="https://www.postgresql.org/docs/current/queries-with.html#id-1.5.6.12.5.4" rel="noopener">recursive queries</a> and even to cache intermediate results!</p>
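<p>CTEs are plain SQL, so they can be sent from Python like any other query. A minimal sketch, assuming the standard library <code>sqlite3</code> module as a stand-in for a PostgreSQL connection so the example runs without a server. SQLite also supports the <code>WITH</code> clause and <code>lower</code>:</p>

```python
import sqlite3

connection = sqlite3.connect(":memory:")

# The second CTE reads from the first one.
query = """
    WITH emails AS (
        SELECT 'ME@hakibenita.com' AS email
    ),
    normalized_emails AS (
        SELECT lower(email) AS email FROM emails
    )
    SELECT * FROM normalized_emails
"""

rows = connection.execute(query).fetchall()
print(rows)
```

<p>With psycopg2 the query text would be the same; only the connection and cursor handling differ.</p>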
<h3 id="generating-data"><a class="toclink" href="#generating-data">Generating Data</a></h3>
<p>Generating data is very handy. Sometimes you need to generate data for practice, sometimes you need to <a href="sql-anomaly-detection#preparing-the-data">generate a time series or a small table to join to</a>. There are several ways to generate data in SQL:</p>
<p><strong>UNION ALL</strong></p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">dt</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="s1">'haki'</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="k">name</span>
<span class="w"> </span><span class="k">UNION</span><span class="w"> </span><span class="k">ALL</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="mf">2</span><span class="p">,</span><span class="w"> </span><span class="s1">'benita'</span>
<span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">dt</span><span class="p">;</span>
<span class="go"> id │ name</span>
<span class="go">────┼────────</span>
<span class="go"> 1 │ haki</span>
<span class="go"> 2 │ benita</span>
</pre></div>
<p>Using <a href="https://www.postgresql.org/docs/current/queries-union.html" rel="noopener"><code>UNION ALL</code></a> you can combine, or concatenate, the results of multiple queries.</p>
<p>Concatenating query results is very common, but it can be a bit tedious for generating data.</p>
<p><strong>VALUES LIST</strong></p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">dt</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span>
<span class="hll"><span class="w"> </span><span class="k">VALUES</span>
</span><span class="hll"><span class="w"> </span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'haki'</span><span class="p">),</span>
</span><span class="hll"><span class="w"> </span><span class="p">(</span><span class="mf">2</span><span class="p">,</span><span class="w"> </span><span class="s1">'benita'</span><span class="p">)</span>
</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="k">name</span><span class="p">)</span>
<span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">dt</span><span class="p">;</span>
</pre></div>
<p>Using the <a href="https://www.postgresql.org/docs/current/queries-values.html" rel="noopener"><code>VALUES</code> keyword</a> you can provide a list of rows, and then name the table and its columns using a "table alias list" <code>t(..)</code>. The <code>t</code> can be any name, and the column types are inferred from the values. Using a <code>VALUES</code> list is very useful when you need to generate small sets of data, or as the documentation calls it, a "constant table".</p>
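<p>The alias list only assigns names; the type of each column comes from the values themselves. The following is a minimal sketch that makes this visible with <code>pg_typeof</code>. The <code>joined</code> column and the <code>*_type</code> aliases are made up for illustration:</p>
<div class="highlight"><pre><span></span>WITH dt AS (
    SELECT * FROM (
        VALUES
            (1, 'haki', '2021-01-01'::date),
            (2, 'benita', '2021-02-01'::date)
    ) AS t(id, name, joined)
)
SELECT
    pg_typeof(id) AS id_type,
    pg_typeof(name) AS name_type,
    pg_typeof(joined) AS joined_type
FROM dt
LIMIT 1;
 id_type │ name_type │ joined_type
─────────┼───────────┼─────────────
 integer │ text      │ date
</pre></div>
<p>If you need a type other than the inferred one, cast the value inside the <code>VALUES</code> list, as done here for the date.</p>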
<p><strong>UNNEST</strong></p>
<p>To generate small sets of one-dimensional data, you can <code>unnest</code> a PostgreSQL array:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">dt</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="hll"><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">unnest</span><span class="p">(</span><span class="k">array</span><span class="p">[</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">2</span><span class="p">])</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">n</span>
</span><span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">dt</span><span class="p">;</span>
<span class="go"> n</span>
<span class="go">───</span>
<span class="go"> 1</span>
<span class="go"> 2</span>
</pre></div>
<p>This is more restrictive than <code>VALUES</code>, as it can only produce a single column of values that share one data type, but we are going to use it later.</p>
<p><strong>GENERATE_SERIES</strong></p>
<p>To generate large amounts of data, PostgreSQL provides a table function called <a href="https://www.postgresql.org/docs/current/functions-srf.html" rel="noopener"><code>generate_series</code></a>:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">dt</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="hll"><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="mf">5</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>
</span><span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">dt</span><span class="p">;</span>
<span class="go"> n</span>
<span class="go">──</span>
<span class="go"> 0</span>
<span class="go"> 1</span>
<span class="go"> 2</span>
<span class="go"> 3</span>
<span class="go"> 4</span>
<span class="go"> 5</span>
</pre></div>
<p>The function <code>generate_series</code> accepts three arguments: start, stop and step. In the example above we did not specify a step, so the default <code>1</code> was used. We can provide a different step to generate a different series:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">dt</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span>
<span class="w"> </span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="c1">-- start</span>
<span class="w"> </span><span class="mf">10</span><span class="p">,</span><span class="w"> </span><span class="c1">-- stop</span>
<span class="hll"><span class="w"> </span><span class="mf">2</span><span class="w"> </span><span class="c1">-- step</span>
</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>
<span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">dt</span><span class="p">;</span>
<span class="go"> n</span>
<span class="go">────</span>
<span class="go"> 0</span>
<span class="go"> 2</span>
<span class="go"> 4</span>
<span class="go"> 6</span>
<span class="go"> 8</span>
<span class="go"> 10</span>
</pre></div>
<p>To generate a list of even numbers, we set the step to 2.</p>
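<p>The step can also be negative, which produces a descending series. The following is a minimal sketch that reuses the <code>dt</code> and <code>t(n)</code> names from the examples above:</p>
<div class="highlight"><pre><span></span>WITH dt AS (
    SELECT * FROM generate_series(
        10, -- start
        0,  -- stop
        -2  -- step
    ) AS t(n)
)
SELECT * FROM dt;
 n
────
 10
  8
  6
  4
  2
  0
</pre></div>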
<p>The function <code>generate_series</code> is not restricted to integers; it can be used for other types as well. One common example is generating date ranges:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">daterange</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span>
<span class="hll"><span class="w"> </span><span class="s1">'2021-01-01 UTC'</span><span class="o">::</span><span class="nb">timestamptz</span><span class="p">,</span><span class="w"> </span><span class="c1">-- start</span>
</span><span class="hll"><span class="w"> </span><span class="s1">'2021-01-02 UTC'</span><span class="o">::</span><span class="nb">timestamptz</span><span class="p">,</span><span class="w"> </span><span class="c1">-- stop</span>
</span><span class="hll"><span class="w"> </span><span class="nb">interval</span><span class="w"> </span><span class="s1">'1 hour'</span><span class="w"> </span><span class="c1">-- step</span>
</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="n">hh</span><span class="p">)</span>
<span class="p">)</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">daterange</span><span class="p">;</span>
<span class="go"> hh</span>
<span class="go">────────────────────────</span>
<span class="go"> 2021-01-01 00:00:00+00</span>
<span class="go"> 2021-01-01 01:00:00+00</span>
<span class="go"> 2021-01-01 02:00:00+00</span>
<span class="go"> ...</span>
<span class="go"> 2021-01-01 22:00:00+00</span>
<span class="go"> 2021-01-01 23:00:00+00</span>
<span class="go"> 2021-01-02 00:00:00+00</span>
</pre></div>
<p>To generate a 24-hour range we provided <code>generate_series</code> with a start and an end date, and set the step to a 1 hour interval. Both the start and the stop values are included in the result, so the range contains 25 rows.</p>
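<p>Because the stop value is part of the series, the range above ends at midnight of the next day. If you want exactly one row per hour of a single day, one option is to subtract one step from the stop value. This is a sketch of that approach, with <code>count(*)</code> used only to check the number of rows:</p>
<div class="highlight"><pre><span></span>SELECT count(*)
FROM generate_series(
    '2021-01-01 UTC'::timestamptz,                     -- start
    '2021-01-02 UTC'::timestamptz - interval '1 hour', -- stop, last hour of the day
    interval '1 hour'                                  -- step
);
 count
───────
    24
</pre></div>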
<p><strong>GENERATE_SERIES with row numbers</strong></p>
<p>As mentioned above, <code>generate_series</code> is a "table function". There is a little trick with table functions to include row numbers in the result:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">daterange</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span>
<span class="w"> </span><span class="s1">'2021-01-01'</span><span class="o">::</span><span class="nb">timestamptz</span><span class="p">,</span><span class="w"> </span><span class="c1">-- start</span>
<span class="w"> </span><span class="s1">'2021-01-02'</span><span class="o">::</span><span class="nb">timestamptz</span><span class="p">,</span><span class="w"> </span><span class="c1">-- stop</span>
<span class="w"> </span><span class="nb">interval</span><span class="w"> </span><span class="s1">'1 hour'</span><span class="w"> </span><span class="c1">-- step</span>
<span class="hll"><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="k">ORDINALITY</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="n">hh</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="p">)</span>
</span><span class="p">)</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">daterange</span><span class="p">;</span>
<span class="go"> hh │ n</span>
<span class="go">────────────────────────┼────</span>
<span class="go"> 2021-01-01 00:00:00+00 │ 1</span>
<span class="go"> 2021-01-01 01:00:00+00 │ 2</span>
<span class="go"> 2021-01-01 02:00:00+00 │ 3</span>
<span class="go"> ...</span>
<span class="go"> 2021-01-01 22:00:00+00 │ 23</span>
<span class="go"> 2021-01-01 23:00:00+00 │ 24</span>
<span class="go"> 2021-01-02 00:00:00+00 │ 25</span>
</pre></div>
<p>Using <a href="https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-TABLEFUNCTIONS" rel="noopener"><code>WITH ORDINALITY</code></a>, the results now include another column with the row number.</p>
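<p><code>WITH ORDINALITY</code> is not specific to <code>generate_series</code>. As a small sketch, the same clause can be applied to <code>unnest</code> in the <code>FROM</code> clause to number the elements of an array. The alias <code>t(name, n)</code> is chosen here for illustration:</p>
<div class="highlight"><pre><span></span>SELECT *
FROM unnest(array['haki', 'benita']) WITH ORDINALITY AS t(name, n);
  name  │ n
────────┼───
 haki   │ 1
 benita │ 2
</pre></div>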
<h3 id="random"><a class="toclink" href="#random">Random</a></h3>
<p>To generate random numbers PostgreSQL provides a <a href="https://www.postgresql.org/docs/current/functions-math.html#FUNCTIONS-MATH-RANDOM-TABLE" rel="noopener"><code>random</code> function</a> that returns a value greater than or equal to 0 and less than 1:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="n">random</span><span class="p">();</span>
<span class="mi">0</span><span class="p">.</span><span class="mi">5917508391168769</span>
</pre></div>
<p>To generate values in different ranges you can use <code>random</code> in an expression:</p>
<div class="highlight"><pre><span></span><span class="c1">-- Random float between 0 and 100</span>
<span class="k">SELECT</span><span class="w"> </span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">100</span><span class="p">;</span>
<span class="mi">59</span><span class="p">.</span><span class="mi">17508391168769</span>
<span class="c1">-- Random integer between 1 and 100</span>
<span class="k">SELECT</span><span class="w"> </span><span class="n">ceil</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">100</span><span class="p">);</span>
<span class="mi">59</span>
<span class="c1">-- Random integer between 11 and 100</span>
<span class="k">SELECT</span><span class="w"> </span><span class="mi">10</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">ceil</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">90</span><span class="p">);</span>
<span class="mi">59</span>
</pre></div>
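<p>The expressions above follow a pattern that can be generalized. As a sketch, to get a random integer between a lower and an upper bound, both inclusive, multiply <code>random()</code> by the number of possible values, round down with <code>floor</code>, and add the lower bound. The bounds 11 and 100 and the sample size are only illustrative:</p>
<div class="highlight"><pre><span></span>-- Random integer between 11 and 100 (low = 11, high = 100)
SELECT floor(random() * (100 - 11 + 1))::int + 11;

-- Sanity check: generate many values and look at the extremes
SELECT min(n), max(n)
FROM (
    SELECT floor(random() * (100 - 11 + 1))::int + 11 AS n
    FROM generate_series(1, 100000)
) AS t;
</pre></div>
<p>With enough samples the minimum and maximum should settle on the two bounds, which is a quick way to catch off-by-one mistakes in the expression.</p>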
<p>It's a common mistake to use <code>round</code> instead of <code>ceil</code> or <code>floor</code> to generate a range of integers. Using <code>round</code> produces a non-uniform distribution. Consider the following query to generate random integers in the range 0 - 3 using <code>round</code> instead of <code>ceil</code>:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="hll"><span class="w"> </span><span class="n">round</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">3</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">n</span><span class="p">,</span>
</span><span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="mf">1000</span><span class="p">)</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="mf">1</span><span class="p">;</span>
<span class="go">n │ count</span>
<span class="go">──┼───────</span>
<span class="hll"><span class="go">0 │ 150</span>
</span><span class="go">1 │ 328</span>
<span class="go">2 │ 341</span>
<span class="hll"><span class="go">3 │ 182</span>
</span></pre></div>
<p>Notice how the values 0 and 3 come up less often than 1 and 2. Using <code>round</code>, results of <code>random() * 3</code> below 0.5 are rounded down to 0, and results above 2.5 are rounded up to 3, while results between 0.5 and 1.5 are rounded to 1 and results between 1.5 and 2.5 are rounded to 2. The values at the edges each cover a range that is only half as wide, so they are about half as likely to come up.</p>
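<p>For comparison, here is a sketch of the same experiment using <code>floor</code>. Multiplying by 4, the number of possible values, and rounding down gives each of the values 0 to 3 a range of the same width, so the counts should come out roughly equal. The exact counts vary between runs, so no output is shown:</p>
<div class="highlight"><pre><span></span>SELECT
    floor(random() * 4) AS n, -- 0, 1, 2 or 3, each covering a range of width 1
    count(*)
FROM
    generate_series(0, 1000)
GROUP BY
    1
ORDER BY
    1;
</pre></div>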
<figure>
<p><svg version="1.1" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 272.3571428571429 115.00000000000011" width="auto" height="10em">
<g><g transform="translate(16.07142857142867 66.57142857142878) rotate(0 119.54833883970878 -0.7636616346197229)"><path d="M0.4026914169938293 -0.11913903325953333 C57.37934727482316 0.228205005162611, 112.30953642145697 -1.9381348476672953, 238.2602580773018 -0.14734447711966459 M-0.29885097801555593 -0.645290435223627 C75.65518512823405 -1.180871289047221, 151.65757904994356 -1.796951346809558, 239.39552865743298 -1.196456809211996" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(16.07142857142867 59.42857142857156) rotate(0 0.8506242880051218 8.234760755634454)"><path d="M1.2504832937858108 -0.7143213542076756 C-0.3410602948161073 7.027281816724713, -0.10218487010765887 14.555885404410446, 1.430750944548125 15.696149441027945 M0.41653759807249635 0.6639055681855954 C-0.09199479860492238 5.860624806038096, 1.5849681092935644 11.64951143862752, 1.5323112330648936 17.183842865476578" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(86.07142857142867 56.571428571428555) rotate(0 1.0984403418365218 9.236423897602549)"><path d="M1.3001046729628092 0.002179515976680202 C-1.0420634460368188 4.734438492551264, -0.13945321422711687 7.843246623977129, 2.2616466247164375 18.470668279228416 M0.6780294132079738 0.10373980449344877 C0.4674011521763652 4.376865574731494, -0.04158967573518291 7.972342546716609, 0.5971692157613695 17.441145905646316" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(163.21428571428567 55.142857142857224) rotate(0 0.1677613707785781 8.928248710525907)"><path d="M-0.5894102466229652 1.1457471522000464 C1.3662821840422001 4.746034217703965, -0.5557426770785893 7.621266815166414, -0.25656475342050933 16.625823748543567 M-0.18948407645715248 -0.13465817193311158 C0.9381784696205041 4.967987701665235, 0.8409927615566395 9.606706392569784, 0.9249329881800787 17.991155592984907" stroke="currentColor" stroke-width="1" 
fill="none"></path></g></g><g><g transform="translate(253.2142857142859 58) rotate(0 1.225909437616565 8.211356825380506)"><path d="M-0.6075499725882867 -0.601244058902467 C1.686304316956111 6.813157200658946, 2.6199256025560747 11.462357234673307, 3.0593688478213488 16.300636798458168 M0.556061429123863 -0.7932986871244863 C1.1582673920506057 7.039438172370931, 1.8414797255302682 11.602192113426387, 1.1103712495000955 17.21601233788546" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g transform="translate(10 79.00000000000011) rotate(0 7.5 13)"><text x="7.5" y="18" font-size="1em" fill="currentColor" text-anchor="middle" style="white-space: pre;" direction="ltr">0</text></g><g transform="translate(85.50000000000011 75.42857142857156) rotate(0 2 13)"><text x="2" y="18" font-size="1em" fill="currentColor" text-anchor="middle" style="white-space: pre;" direction="ltr">1</text></g><g transform="translate(158.57142857142867 74.7142857142859) rotate(0 7.5 13)"><text x="7.5" y="18" font-size="1em" fill="currentColor" text-anchor="middle" style="white-space: pre;" direction="ltr">2</text></g><g transform="translate(248.3571428571429 74.00000000000011) rotate(0 7 13)"><text x="7" y="18" font-size="1em" fill="currentColor" text-anchor="middle" style="white-space: pre;" direction="ltr">3</text></g><g transform="translate(46.78571428571445 39.428571428571445) rotate(0 42.85714285714289 11.428571428571445)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.05160326358395306 6.639001826246373 C1.9429753369017142 4.270906544749829, 3.0895451710468174 1.9448484092091736, 4.281259346338457 0.009261542916041032 M-0.502429757607157 6.39330883217334 C1.4899956122226445 4.0929071556328545, 3.5874214010605576 2.0826370483860392, 4.789809008645362 0.21001075951729353 M1.0272023343573915 10.66406035911995 C5.518408697186159 8.42678797665713, 9.548798444816052 2.0570229832297717, 11.284298708987368 -1.5184075732633708 M-0.31686872829686535 12.60979497548896 
C3.328645322865256 9.89913936218866, 4.537921401355295 6.475755316942976, 10.651354379159343 -0.7741527018055557 M-0.6607209751894931 18.49812350878554 C3.6409569757964535 11.04140532656048, 9.790453251693679 7.059321097275767, 16.916494572072704 -0.852546141769273 M0.23499469826461272 17.58939681192922 C5.652246669860715 13.119461520921817, 11.334017549185605 4.7877781736536615, 15.075162126220246 0.18896011420758274 M1.021760758627078 21.993838893656086 C4.848237812970538 18.02927054720061, 10.491640512454241 11.498141687230186, 19.65484661636677 0.898240435326052 M0.9204783902129741 22.992633176442634 C7.815406996974833 16.00389094640116, 15.660411587234272 6.181371904816309, 21.08683034525987 0.1848178342148188 M4.558445253422823 25.430441618356994 C11.067380926477032 19.60149748075996, 16.565475383625422 12.386542416142445, 27.48878948823771 1.2939629567798914 M5.9406154733284975 24.169126685168735 C11.741453583108058 16.586691866629295, 18.46842027537388 8.55645152949463, 27.114756086517485 0.32762851975976837 M12.22891285252646 21.331174408220875 C18.679791310912197 13.883733930418074, 23.596856683786445 5.299545672408607, 30.10476985894779 0.041693961365449894 M11.664372277268026 22.393066108362007 C18.614025294465705 17.086843452383878, 24.49460995190888 9.84266017666734, 31.009967320083476 0.8283404420375113 M15.017249111238222 25.25007312291202 C21.44741813145901 17.749289197288917, 27.69688256116904 9.510643277727386, 36.61382640879725 -1.1502338162529746 M17.482603156159918 24.355175538108618 C23.332074903022242 14.93764254002609, 32.335888054810574 5.709783460600786, 36.89671683758621 0.2536542556342205 M24.080603825820372 25.204757176203632 C28.57928281095438 14.007229419313688, 37.35981464593249 7.015073040334624, 41.612562059048344 -0.4224040873021835 M22.55785951759456 22.814254808103644 C25.732535753195037 18.861275525755577, 30.960244918835517 13.51387610390231, 43.514704484116066 -0.10540364325503049 M26.31321880157315 23.90160617776001 
C34.35482376641066 16.749865502117224, 41.006826023020906 8.374766803632234, 46.74717174000132 0.46085963288212284 M27.66966759070032 23.04110613543326 C33.70944612135063 17.40784211053608, 39.057498353113644 10.81062692056572, 47.06435492630142 1.210445524955933 M32.40740008475882 22.885583291354358 C36.5777247488811 19.532390806619983, 41.79941167668194 14.980652168594942, 54.77441462291843 -1.7386826169268481 M32.556750658067585 24.187107478792548 C39.61975722226082 13.791595925089698, 48.764581139180635 5.335132543731369, 53.54903988423264 0.1973062814621933 M36.41754573067761 24.765411876650973 C44.87668917737772 18.883832505143832, 48.19839595942615 11.665066980742678, 59.12564188809992 0.49198939793519614 M37.76230682382471 23.728139180267675 C43.071439388338035 17.24729667304367, 48.48256948907165 10.370188999096356, 57.62035589953811 0.8320355308499181 M43.821118035591795 21.363398847423053 C48.5579863795012 15.30865103043348, 53.703099079122104 10.54041859562507, 61.967768956808044 -0.433020533758544 M43.97623124556209 22.833157323553717 C49.53702993376467 15.108239771983374, 55.48537019843849 9.17586047492149, 62.9203465673082 -0.5591840806858173 M47.08267130480161 23.940611313334898 C55.961804527462895 16.319553651774022, 62.668621109283336 9.095862393662028, 67.76730325934305 1.0266515726562275 M47.88039390240809 22.80812037573685 C56.60121149579934 15.56242068577805, 63.65569342878497 7.191178263669883, 68.92678030417129 0.14308247644115824 M53.7148240845693 23.34669993815972 C61.32501445636684 14.181225937946481, 69.25792399303901 4.7404948674188105, 73.35131646696719 1.6874532559527324 M53.49086182695308 22.464562545512155 C59.10303409995074 16.816520560989094, 62.5548480818875 12.228864924750434, 75.29537258021774 -0.7799040265740587 M57.84228693019513 24.2450755815618 C65.73893844021967 13.693725912740637, 76.59026747444423 4.367410305366601, 77.47147225998071 1.8670922507952383 M60.003176873325444 22.563520424965795 C65.19946091737987 
17.63874517406858, 69.5152098360556 11.030112661979711, 78.70506368707595 0.5588695648546746 M63.60677560895083 22.01744223753712 C72.82811276077969 17.36478225791378, 76.78345548600113 9.408595507746893, 86.86499855964587 -0.4656401540430295 M64.47851892012504 23.938815928015625 C70.82753412177101 16.17763032891887, 75.74999444870467 9.64650653418743, 85.20024096983627 0.02074214692245846 M68.01133401310818 25.215425775081663 C74.99528287574866 18.753626853501128, 81.89260866806178 11.478452337077128, 87.85053820929139 0.7984362667705298 M68.96927878345178 24.1674350310795 C74.94277118457379 18.145095465053416, 80.10304537193478 11.380737165303588, 87.06783950815557 3.1494337398837686 M73.7646570729674 24.01220267885972 C79.77726060324638 19.270456832747715, 84.97797248881304 11.557201167148222, 87.0133573221985 7.039875712454666 M74.644631129934 23.239906513743588 C78.88307910296795 19.21645990377853, 81.16190916802698 16.174437221267027, 87.80380046414321 8.52078052537923 M79.33960002051381 24.442480150779314 C83.1462829654616 19.72019466183345, 84.37572502842035 17.17400385915445, 88.5005173179658 14.520169683059208 M79.81401625243663 22.9884112965144 C83.65024783453602 19.85796801633993, 86.70733962355395 15.95259705499169, 88.61390279842185 14.55069677725201 M0.0032943462950463243 22.86000658868695 C0.0032943462950463243 22.86000658868695, 0.0032943462950463243 22.86000658868695, 0.0032943462950463243 22.86000658868695 M0.0032943462950463243 22.86000658868695 C0.0032943462950463243 22.86000658868695, 0.0032943462950463243 22.86000658868695, 0.0032943462950463243 22.86000658868695 M6.318540051442424 22.914608311610046 C4.532498835398303 20.566005638140297, 1.5778836191192827 19.67271086727341, 0.9166118198851624 17.877646046536263 M6.454167424377839 23.383527276994407 C3.7077341769471612 20.866670990356948, 1.3849064971589942 19.115252135236158, 0.43865544924834127 17.645679036559756 M12.605311995229945 22.44236333731448 C9.164346108619686 19.77540522804084, 
4.2160415630313 16.12912930776096, 1.0680219994005793 13.598381128944268 M12.025802108875737 23.32063913334047 C8.55341131849846 19.346004556493103, 5.939297136423485 16.970451275515956, 0.2206135773179171 12.346438776738754 M19.704991085734587 23.88090964056902 C9.900001754144306 15.540140053037312, 5.999724211351399 12.534651795680599, -1.3457941263055155 7.7500489593346575 M18.06611663410477 23.77225501764916 C11.80653109383955 18.912165523016846, 6.144706364194359 12.42666983235984, 0.9545261311272433 6.509195496035085 M22.689763036644862 24.56700848878251 C19.534281844890323 17.165274809200415, 14.071252114279684 12.81475459340093, 1.7368774309915298 1.5887509387832672 M24.11889679925035 22.282863218308254 C15.88786042381448 15.93667082226684, 5.979394025589272 7.097226202770768, -0.908049216664136 2.33454195213584 M31.696946670439043 23.1573639215285 C22.373252166677197 17.173089722602093, 14.398071110849955 9.147560322885298, 1.8492436011492746 -1.9741533962858586 M30.389993379030326 23.21503433131967 C23.827459615048483 15.431696783199346, 15.10777055790085 9.24950403375781, 1.3365970946087973 -1.933670611664441 M36.63327039517561 21.26904923403523 C29.20057005707486 17.55077027780791, 22.805502589105913 11.087208969226957, 8.981324238117285 -0.48055775968244907 M36.6606770178313 22.214131446833527 C27.920188150129363 15.361483341762895, 19.923806841771583 8.94485499716396, 8.253222467723734 -2.6811164267536127 M42.39179074391303 21.56405959653094 C31.63645048848282 11.923990483295317, 21.77303259506227 4.895670395252658, 11.874871518617404 -3.300524280121305 M41.64092929753986 22.731181717499936 C31.932161682513588 13.058823085359073, 20.777114295959375 5.093522592441126, 13.631949440730644 -1.3921459519246007 M49.92852111537812 23.904989141215406 C40.95178365293614 13.935289165491728, 30.22075577136782 6.152085796122446, 21.17535196637643 -2.979661950402658 M49.32732147821096 23.397303445440553 C38.85140162912659 15.37095878774956, 29.8320753030148 
6.3484069419771245, 19.921497711184447 -1.7784143225557614 M55.79196471240655 20.95011775932636 C44.531293347610315 15.228988744311145, 37.72561205391409 6.660159017840174, 26.433221909230063 -1.580394004288781 M54.813797436508395 21.83754287931498 C45.086661020860014 14.756648307417468, 35.70113523443288 6.75622473276788, 26.511709593248973 -2.901398727487493 M60.973291183725145 21.589383333063555 C51.690260964346976 13.424716672707545, 42.40872739763941 4.653078518548682, 30.954104120905107 -1.2459856845799013 M60.4231003074871 23.325596446244848 C54.4894146164559 17.068249849938002, 49.64119167430103 11.991924903603623, 32.86497665947915 -1.5176761770981475 M66.43586956239238 22.53946853265525 C61.82738009605244 18.778495994271413, 54.555195904666064 10.206935517376285, 37.32832075946003 -2.6054574460477795 M67.51755542303368 22.59764754888029 C60.77996949822904 18.518657552896748, 54.6072436714732 13.354687242999509, 39.32702267926156 -1.2650867062376108 M74.19189700646672 24.499477903329673 C67.04480493914775 16.40115560167922, 60.88998810487035 11.774243052660912, 42.507522107953314 -2.617907868464176 M72.85694905253473 23.22679365237141 C63.53170009453019 14.610615909451427, 53.1468186814901 6.135615857900781, 44.48478235948283 -2.288306287255768 M78.37902753967495 22.306614952286594 C73.05547913127427 16.14383982130139, 66.15958574440597 11.779919018791622, 50.881315315111756 -2.4168365696293215 M79.95640714354516 21.99747337133985 C73.0818452645997 17.01980587671223, 67.88147954665689 13.474082776386862, 51.524280085197866 -1.8644197430309752 M86.78990498985281 22.53126894576376 C77.00025198901234 14.947559878197373, 66.34075071123574 6.27199505130616, 57.80679733377106 -3.71505740830548 M85.89717826022884 21.719303266945236 C76.62467080154451 14.397399929997224, 68.15225082154626 8.738175843791597, 56.0414128580705 -2.589083949903621 M88.70350130983599 18.288751024998888 C81.39146084604907 15.256421128777575, 76.5459875194945 7.733147529987578, 
62.458731562658016 -1.3687262522928911 M87.43922190424956 20.257563962847158 C82.45046743763452 15.828008915964272, 76.41284097761596 10.450334255582796, 62.173376938155656 -1.7218056057520705 M89.69761657746018 15.42679420622973 C82.92233413590549 12.045192059689018, 81.5140166602872 7.92243738642905, 67.3590807848311 -3.7822541813633066 M87.11495851477117 13.939450145205242 C83.00088857089611 10.230259094271974, 80.52446814370941 8.124639840729627, 68.40945078233992 -1.3721044329262 M86.68799647066598 9.924490932855042 C84.12357631494898 3.8280640163135846, 77.72851054329593 1.8429867531103215, 75.38864147016994 -0.8991557309437721 M88.36347289627274 9.247031396641614 C83.21170433820627 4.888046716171953, 78.67102257570428 1.060269117757581, 74.03435555916496 -3.03436585283244 M88.77097263889713 3.8670673126548536 C85.36985841259512 2.177376518883772, 82.95685270200542 -0.28410377387771446, 81.19570166712204 -1.667220103869798 M87.78738612327119 3.762371532964335 C86.64900144301788 2.431730858256455, 84.84094025229763 1.3303514737053692, 81.58723780742423 -1.8261915765596797" stroke="#fc54ee" stroke-width="0.5" fill="none"></path><path d="M0.9689726401120424 1.2783108483999968 C25.239783802468885 -2.128971926867962, 48.60503351768217 -1.5913136526942253, 83.82823049995523 -1.7712509501725435 M-0.49923997465521097 -0.8128165816888213 C26.01100338955545 0.22585560714027764, 51.37995704276758 1.380196082937931, 85.30291380427252 0.6665317500010133 M86.3400899169169 0.1946652065962553 C86.57415624340737 5.6293992849865235, 86.58031072100366 13.921568589817221, 85.6665977832995 20.913026218169534 M85.41687621615301 -0.25388436671346426 C86.39982303160116 6.223950903197494, 86.03667890208646 13.23164094758355, 86.35555051706206 23.350437670813108 M85.36282914850335 21.697255973571146 C60.523523036124494 22.133519957640374, 36.42477364066458 22.541772256472314, -1.9607835840433836 21.872757498973215 M86.10108183763396 21.892812996969724 C66.25742333129585 
21.93644466424098, 46.06562696942797 23.107850188256407, -0.15224506799131632 23.83546584831288 M-0.8072310518473387 21.00703913425764 C1.731711952547942 13.112328161352467, 0.9674310968922719 8.216043625825225, 1.1571895647794008 -1.3025185335427523 M0.45166180189698935 23.02398222671559 C-0.3008174845948812 17.524022947331652, 0.6336400857195261 14.007836899454059, 0.08978182729333639 -0.14141472335904837" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(137.5000000000001 40.14285714285745) rotate(0 42.85714285714289 11.428571428571445)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M0.527113968166535 5.898495654066197 C1.240079288859103 4.1225634112232745, 2.981279212907114 3.093616049207059, 4.591598892542071 -0.39050366246725976 M-0.18765781928318237 6.349543668751709 C1.0583446393988751 5.2812697031807625, 2.2172506080980185 3.594049203441405, 4.940674086617774 0.2181895658222252 M-1.166278729485382 10.73962760460335 C3.988531526144537 8.752120857373395, 6.992398644231275 6.234043366057671, 10.754452899050843 -0.1054990191862224 M0.414597755772959 12.428447509973823 C3.6595816096129337 9.06199704580508, 5.406798547602337 5.410333559247634, 10.114186830026043 -0.7889169153675675 M-1.7352596829226474 17.440162521694475 C1.580172474181114 15.69864499220129, 5.609303440926673 11.646343213011903, 17.000113212972362 -0.49713127436814997 M-0.42077870299576325 18.751544572320455 C5.714215225530588 12.018515058840865, 10.599998660602203 5.380392366638853, 15.368508292830963 0.30176858016522434 M2.6736167042142363 22.73218833136971 C4.592116005426245 20.13324603271238, 8.996248143144792 14.662386028991927, 22.37159819183674 1.2135399462818626 M0.27052621697994184 23.151279044744026 C7.485858814637542 14.575471319371124, 15.087947337494501 6.498688021395086, 21.38184734927293 -1.0493814512923834 M5.698902883580294 24.460706949625305 C9.229094050220736 19.582703073571885, 14.597089525920296 13.912209025656294, 26.929006029527994 
-0.013372419959610582 M5.895849762497809 23.283499177005282 C12.928173164653131 15.279790483787147, 21.219413896189415 5.589966784524147, 27.22174666230789 0.8697124984127225 M9.823304198504243 24.195836101793873 C16.881949373722733 16.978190316644614, 20.0939851550808 13.35199234550942, 33.061908397308386 0.07607916568063544 M11.59759576321468 23.02395623935871 C17.940857702143422 15.969441180908632, 23.855289492254485 9.673009485240085, 31.262671702026225 -0.8784387995243783 M14.734031204286318 22.038486647539706 C23.28332756744966 14.389900651198957, 28.74308260603223 7.123778160376636, 35.08446897547816 -0.7628975621331016 M17.290056073258917 23.91493682389048 C21.647154122199698 18.90326851881848, 25.815555309024774 12.831034791769463, 35.902312760056304 -0.6255721905175129 M20.79135869718878 22.053153000636005 C27.275746821673373 17.11686548011937, 31.06778361077633 12.686133819606672, 41.296415685299564 -1.124011314627623 M22.409219600220904 23.81680326429566 C26.492786412032196 18.15730667066336, 33.25178887350545 12.097730663851314, 41.9038071814885 0.27090122162839236 M28.59265348251187 22.49308977074707 C32.97547049987085 20.71976829602394, 37.4835477066263 12.910830364912192, 47.547068457072854 0.2752893928613709 M27.758050073229008 23.351920277660433 C31.427609271767622 18.238999003866454, 34.65503681680424 14.121012104356819, 47.2410171329512 0.871822716499512 M31.033346606517853 23.950135121646106 C35.88850557988393 17.8506321395752, 44.43564288331402 12.391191010381347, 51.296311462060274 -1.2563604008929126 M32.848708274873616 23.273228322679877 C38.05123603884047 15.758471123206274, 44.99658208751057 9.18583333132212, 53.440751834534765 -0.9751406847090713 M36.870961124506955 23.579887412997408 C44.07965088757223 14.442970352967096, 53.75886097194338 7.412904424212506, 57.17862088056208 -0.15906546122618082 M37.74782980928308 22.90817024144016 C45.39610294877845 15.912252403735332, 52.066968633151234 8.464244522798673, 58.42804532786758 
0.8084304225888097 M42.78608433274145 23.251528558574176 C51.48778761797895 13.556048851577009, 58.149692331585854 5.91854922730462, 63.064037133840515 1.5281530006347666 M44.41874733404781 23.52726461763636 C48.541016999331895 16.15454903385902, 55.0846885533077 8.038203988886105, 62.86837870804001 -1.0997188154117694 M48.94639553652158 23.77300066804071 C52.89045982977553 18.20663808269034, 62.4567054585078 10.76160968546719, 69.62219265219584 -1.6074660306457744 M48.56676832829615 22.788032418354646 C57.19656419048569 13.591194350625948, 65.10781467926286 4.343066129429516, 69.01295789168044 0.021409130875362337 M55.067638282109584 25.040829405170218 C59.95911862511495 15.892474694533739, 63.82510935171243 13.02939900277304, 74.38328410689029 -0.20260001629458202 M54.981397324618484 24.10357035229011 C60.97742163066551 15.505848473171618, 65.89605609074827 9.120516580704233, 75.04093584887924 -0.6953624673546983 M59.977648124805725 24.265300629626985 C64.15613666433526 14.916726088329025, 72.85200116268483 9.468987004478803, 81.30210017822412 2.1073337783526114 M59.38161367714319 23.086477511528905 C65.26143555493178 16.57822144502514, 71.93389037247869 9.290335993223547, 79.90469037126479 -0.26145795551580875 M65.02126583188173 24.874993335406995 C69.46426626913883 17.700646221628965, 78.54222531351712 9.043095437614245, 86.81899378745959 -1.161998357930969 M63.10203476447013 23.672138309988892 C72.47488540431657 14.10908870869374, 81.49966229283827 4.9803482767232055, 85.67204630392746 0.5360240625871313 M68.9046602956094 22.968408391506223 C75.52555593780636 17.13226452936692, 76.31352073690243 15.380091989554899, 87.45573277792542 3.145458800187278 M70.21848890269922 22.789101224087922 C76.54046343100859 17.14208987489917, 82.9682694429788 8.965729364965007, 88.4033762265241 2.8423661072085977 M76.53965144628893 23.765597282588967 C79.34231137485037 18.034065579575238, 84.14959295181937 12.518628422062564, 88.50070826290285 6.957134451926102 
M75.67358825845878 24.029457533412167 C78.79402369761962 17.635155834094974, 84.33240746854283 11.438409629173416, 89.11805880116408 7.0849668599892155 M80.39717171774893 24.085877386649678 C82.15376448465665 20.535957783132563, 83.96742690664131 17.413964877703734, 88.98267092438059 13.356904931624882 M80.7057328757429 23.04153491368664 C82.50160446202334 21.34118494525301, 84.60243713837457 18.98771535360078, 88.42274741245078 14.089060567764097 M0.0032943462950463243 22.86000658868695 C0.0032943462950463243 22.86000658868695, 0.0032943462950463243 22.86000658868695, 0.0032943462950463243 22.86000658868695 M0.0032943462950463243 22.86000658868695 C0.0032943462950463243 22.86000658868695, 0.0032943462950463243 22.86000658868695, 0.0032943462950463243 22.86000658868695 M6.465836384755413 22.65591366286737 C3.488146838824818 21.442929373605434, 1.0397213718223701 18.25492868247609, 0.5369367176512754 18.580786114590463 M6.164923249420563 22.877957914354027 C5.010980133502242 21.643856222080707, 3.2397906028759107 20.914602889532958, 0.20855128893950328 17.487756768645937 M12.519301634512171 24.23624112967288 C5.877353481928279 19.323983521704143, 3.460220051208152 13.431196866725086, 0.17845939258905608 11.121787729896418 M12.351580116383062 22.550839519387836 C7.1313849830304505 19.827853287169813, 4.190518176526224 15.154328692388802, 0.29885130162211626 12.959628664739242 M20.040181381907683 23.851696212071705 C13.317121299014364 19.68280688757035, 7.077215838841552 12.60168960174789, 0.6260691435003927 7.561945760329531 M17.87491232464127 22.149032904993522 C13.331396549707383 17.639570528984613, 7.29288217524304 13.101997680916384, 0.707558577034836 7.138391901445852 M25.58923861113827 21.152464860089882 C20.126294863074953 18.25957222931586, 14.84944079582187 13.594528579813472, -0.2577357396322494 -0.3836082416671722 M23.873943405314193 22.13758452606277 C14.862048870984648 14.789034581790894, 6.160372686454647 6.919824236395769, -0.49416624585419944 
0.879425918581763 M28.985414554502764 22.79059556139056 C22.131009654369418 18.09981900701813, 15.026039225073276 9.073253711486865, 3.040986021345808 -3.3757880894316106 M30.659406134996512 22.038177575109344 C21.17938040436941 14.08601239105082, 9.43436254983795 4.311655104938033, 1.1903833246960775 -1.818558068558362 M38.33856262005964 21.430036611201977 C26.2048060367371 15.864096315927567, 18.94410393788101 7.958893736332019, 7.215379280384131 -0.4096124967735135 M37.78600865807483 23.097737642283356 C27.617083405045868 14.914306101235235, 17.3747526922329 6.356500862360921, 7.8494021914633585 -1.4896717913623774 M41.24486434086737 22.279658179752374 C29.538285428616234 14.463835761246544, 19.698473531317443 3.3807681350016168, 12.210321214204562 -2.808268592407682 M43.48717005643574 22.61885438023492 C35.087556794046435 17.43604756826724, 27.662589643665704 10.618285896571567, 14.546450392497611 -1.516703208623209 M49.863458115652776 23.24892939924153 C41.394792172152066 14.144953698159421, 30.20053227372968 6.716453219061208, 20.3744033465339 -1.5704903691343475 M49.03289909013418 23.334208591085694 C41.635404721836004 17.825476913434528, 35.307826779492856 11.341149120572954, 20.75368512535376 -1.2597449557192135 M56.445972147251524 22.54300516090717 C45.063384208558645 12.794239403398441, 30.911571012154553 4.844384736194595, 25.461207005207847 -1.711826060715783 M54.540577135833956 23.22040902349528 C45.79150494080712 13.818150073074175, 35.58863397015121 7.461687905418861, 26.63438620419563 -3.1787117214907887 M62.18670108439043 22.098698346949053 C53.134074824056206 13.857070996108494, 44.13181689330651 6.237168794121768, 31.479479963953203 -0.8286659099522158 M61.10638482859 22.152714604585302 C55.6275987136796 17.388823225134164, 47.80880097993522 11.767075690935199, 32.174382701578146 -1.1786180163162872 M68.75213896012798 23.179566627595435 C57.98200079484441 17.522586372179383, 51.800979362438085 7.996740043886163, 38.222669380828435 
-3.5031553716154065 M66.61608500028893 23.7188712370059 C56.942423185628165 13.58518119182517, 46.375615685883446 3.771804781056275, 37.9928358868879 -1.7957413750455942 M71.35760793252263 22.531589548074546 C65.05336697504329 16.697275243051664, 54.59060429725231 7.045402860730153, 44.345550538892034 -3.1475180254771153 M72.10290142031732 22.593538885907176 C63.19190792145244 14.533449939647227, 55.583423773177614 7.105872370306303, 44.700117967011515 -1.8231952700285952 M77.84388702530117 22.538845615586155 C67.08823053406452 12.828142023531878, 58.122200147626465 4.729224582504536, 52.44085558687079 -1.8483784893375734 M79.1053975362301 22.912718665910774 C72.51362951288593 17.31604509677889, 66.32245076013122 12.477297592889455, 51.255690827003896 -2.936502024143646 M86.20674506629935 20.818599096368008 C77.8779981571681 16.47571393519112, 69.05955906275399 8.636395514481011, 56.568806477600404 -2.7716756219590444 M84.47825941219112 23.087589638176315 C74.4337385410074 13.912771780590262, 62.960128070364654 4.065651775018448, 56.778888968528875 -1.8797560137311358 M88.42827900835283 21.033036131657823 C82.79502102015549 13.501849523884504, 77.17605848607842 9.258863710909676, 63.107291609807675 -2.726690291141768 M88.50488404986419 20.834994926363393 C79.43647319115682 12.936584145384305, 72.38095981656733 7.0067184132779605, 62.68034615164137 -1.810901436669429 M89.28941392929734 15.076764737693352 C80.09432111755983 9.308009764796582, 75.76654247458914 4.058530798342526, 70.28139876655106 -0.570958576657496 M88.99112938841314 14.583691716631634 C82.72384131136691 10.68217577389461, 78.17421767904781 6.956672756904222, 68.47277833322798 -2.487895515308402 M89.1369181034327 8.620626215371521 C84.71485783504684 3.931994691982935, 77.77137896826824 1.7261590690191553, 73.3964086229562 -1.1709235856343239 M88.39591324871262 9.175284022107007 C83.37225067770811 5.973957647349847, 79.43999337821174 1.4862978903302573, 74.19634531956382 -2.826641639232281 
M87.56298487739964 4.089832066950203 C85.81247934469775 2.030010702847903, 85.25794340629709 0.41522774413516195, 82.0704813921968 -1.3751110197404648 M87.93081081304094 4.215571408319101 C86.0236917793474 2.0179033599828906, 83.07706069047019 0.007482827597152686, 81.30841327539221 -2.36517729385628" stroke="#f41d92" stroke-width="0.5" fill="none"></path><path d="M-1.9110317658632994 0.5856300126761198 C33.89867418605304 0.12436729563134008, 64.99080697207583 -1.5662306557808607, 85.9467154738627 -0.383075462654233 M0.1177145903930068 0.20099286083132029 C28.259280406603878 1.5307578541604543, 55.864781121191214 1.6706864036885762, 85.80429067156683 0.6292929137125611 M84.53853094551187 1.1415565144270658 C85.68313940587859 8.795813886448753, 85.39728643718581 14.61170570366087, 85.47339147302728 22.651223067992532 M86.50034536860358 -0.11442642565816641 C86.42004205108228 6.7120868881632285, 86.06386355877461 14.529636617084721, 85.84272908113371 22.94690397368481 M85.98004764291863 23.451199893706644 C52.365584010790535 22.785019433391977, 20.83591537683138 21.57927331913798, -1.297224147245288 24.5503311798509 M85.83636831186186 21.906210452185178 C59.99819983608493 23.06400313401332, 35.58899311415321 23.531055325509215, -0.5330245727673173 23.22799080596974 M0.272221939638257 23.282391135447824 C-0.7896037174654857 16.90727081282865, -0.4445248319102184 13.178526146337408, -0.950903220102191 -1.0493375528603792 M-0.45249416772276163 22.5111727117162 C-0.3289185814293367 15.894907127400618, -0.3303664676578981 9.801721415216383, -0.7392780119553208 0.31108490470796824" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(227.50000000000034 38.71428571428578) rotate(0 11.785714285714391 11.785714285714334)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.9111688792379448 5.74716859241306 C1.6662362485770146 4.753415218798084, 3.1681698076203837 3.494376473140221, 5.049196935030168 0.32460528051827486 M-0.12073063660877442 
6.591578545098297 C1.5017612903112132 4.908353313013926, 2.375369759305915 3.082521569735203, 4.729063900517768 -0.01710366757239762 M-1.1524132109154357 11.242074494480669 C1.0553835706672272 10.35573621443494, 3.834798904787781 7.624531178325467, 11.736496213030946 -0.6926898856565593 M-0.10082842697392835 12.291180134981452 C4.041140840456979 7.8063860196756325, 7.1800199854203415 3.3813811900473763, 10.431212276917828 -0.05357000202985984 M1.6229554702946682 17.843823474739366 C2.5981404283811256 16.30373957303668, 6.070957951262811 11.904234479407755, 16.728523754983623 1.608142151210921 M-0.7801350169396262 18.262914188113683 C4.8221599271733355 11.561674299420247, 10.819983127626456 5.330387145137813, 15.738772912419815 -0.6547792463633249 M0.055828446727177905 24.855309154554366 C3.755322470528224 19.79114234974015, 9.287844909813272 13.931381680779811, 21.941990621665383 -0.3734797952533242 M0.2527753256446932 23.678101381934344 C7.541352270622834 15.376685575097975, 16.09359447401888 5.386614028369323, 22.23473125444528 0.5096051231190089 M4.180229761651125 24.59043830672293 C10.916563259506011 17.716066035694247, 13.820416527905595 14.44439149112447, 26.106715902474257 1.9801005310552373 M5.954521326361563 23.41855844428777 C11.919229785595807 16.79944886481842, 17.458581439530946 10.93449750798281, 24.307479207192095 1.0255825658502236 M9.090956767433198 22.433088852468764 C15.733178420359614 16.981164696068205, 19.315013740214248 11.87534172096787, 23.536863277710474 6.4240908648009025 M11.646981636405798 24.309539028819536 C14.60873802867262 20.90348072247154, 17.38118123935221 16.437112948988467, 24.354707062288618 6.561416236416491 M15.635194034374605 22.869457839270055 C18.67015948135066 21.052330152228148, 19.93345619823578 19.554553499626728, 24.27265901936449 13.200474532941518 M16.686803621345486 24.015830510648833 C18.148512257116877 21.712423187270637, 21.310636368891707 19.190063987350356, 24.667463491887297 14.107167681507928 
M21.9534030295143 23.675216666713965 C22.801608514843316 23.647315162571886, 23.705324698618167 21.820812115118823, 25.47473272621392 19.500418168083403 M21.703022006729437 23.932865818787974 C22.35377787099974 22.929513765740552, 22.872677126288107 22.211011591394914, 25.38291732897742 19.679378165174846 M-0.35037282111268553 23.266854124744146 C-0.35037282111268553 23.266854124744146, -0.35037282111268553 23.266854124744146, -0.35037282111268553 23.266854124744146 M-0.35037282111268553 23.266854124744146 C-0.35037282111268553 23.266854124744146, -0.35037282111268553 23.266854124744146, -0.35037282111268553 23.266854124744146 M5.675941752671777 23.52681382839539 C3.646074368489086 21.193048205466994, 2.580497471653076 19.812762035724642, -0.37380294617064114 18.14115924154427 M6.026689226582229 23.25812695977249 C4.4555961274292715 21.88453444786226, 2.512383318125363 20.24262228027144, 0.12596683275155907 18.528157595070265 M11.919895343561343 23.8397682124513 C8.97034073521704 18.741070459976935, 4.374865979950275 15.509188727602277, -0.20335961808833525 14.680920891776397 M13.226025744606428 24.060357059701047 C7.950612772732433 20.734519481131805, 4.5783727285065465 16.737410727278107, -0.3598863587287371 12.578623438939168 M18.604940608962536 23.619030851591123 C10.731153098199293 20.02623898413511, 8.483028967672785 14.931919643769307, 0.8298779005845431 5.889144444038401 M18.225313400737107 22.63406260190506 C11.746180019805259 16.61914161935914, 4.539225581340004 10.379087630649217, 0.2206431400691513 7.518019605559537 M24.726183354550546 24.88685958872063 C19.598649793160938 16.479525834065754, 13.42085382838236 14.318780125473642, 0.308002293719595 2.701597255456034 M24.639942397059446 23.94960053584052 C17.550246224870378 16.630161206943406, 9.361715577615549 11.161371584049357, 0.9656540357085519 2.208834804395918 M25.009285164677788 18.764208030021074 C16.877992129259706 10.810510792370941, 13.291337711049033 6.424218648452568, 3.354619912707302 
0.32046729593741663 M24.413250717015256 17.585384911922993 C17.276431131039654 11.781516272203582, 10.915143380026446 5.619997997952574, 1.957210105747971 -2.0483244379310035 M24.671285258962136 13.370718923654252 C17.611916138485206 9.574692106114956, 15.197261281523852 4.15300511544325, 9.626223102165547 -2.292805811355658 M22.752054191550535 12.167863898236149 C17.334617381315518 6.379943936097614, 11.665319931583014 1.3884714517003474, 8.479275618633416 -0.5947833908375579 M23.507285852512386 7.02524197690313 C22.9923891100522 4.24871719366388, 18.350954620274663 4.265316585516753, 13.068345498800845 -1.2433413705209027 M24.426965877475258 6.8997269597103195 C20.94029955113854 4.974645299430335, 17.54615719852665 1.1407910393158225, 13.731695912819923 -1.4555062556059788 M24.128037370360538 1.7217197191220652 C22.55206180174713 0.5011480713022436, 21.478684876868826 -0.6627546902313247, 19.900443771199996 -1.7984658373266331 M23.911521573403 1.7876847818278652 C22.06076808776182 0.4148366159899832, 20.81488626985119 -0.8858271538323818, 20.054781405765304 -1.7665077353108547" stroke="#fa5252" stroke-width="0.5" fill="none"></path><path d="M-1.1757547687739134 1.1415565144270658 C8.556068482343195 0.7249359733358574, 16.751776087097998 -1.683715795825367, 23.33053433017031 -0.20591978915035725 M0.7860596543177962 -0.11442642565816641 C7.64748246402354 -0.010542077019010498, 14.416779381090123 0.8974557397820606, 23.699871938276743 0.08976111654192209 M23.837190500061666 0.5940570365637541 C23.26722055716863 9.344830569079447, 24.813465595678 17.23496071475432, 22.274204424183523 25.264616894136672 M23.693511169004893 -0.9509324049577117 C23.156495980885385 7.049240619409857, 24.00525542345321 14.462233506143148, 23.038403998661494 23.94227652025551 M23.843650511067068 23.996676849733596 C17.54220290834635 22.520540749487957, 12.698414073325827 23.823425387797435, -0.950903220102191 22.522091018568283 M23.11893440370605 23.22545842600197 C16.850448182185488 
23.00035001500519, 10.452909798494515 23.109433876117784, -0.7392780119553208 23.88251347613663 M0.25115700252354145 23.17109738529802 C-0.3193958773211174 13.8996004800845, 0.9377370701237984 8.99829514896766, -0.35494659654796124 -1.627161966636777 M0.6315127080306411 23.321357918504184 C0.9876765767178903 17.483291497520636, -0.7267410894789328 10.623291520934062, -0.3439851636067033 -0.5078324591740966" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(19.285714285714334 39.78571428571445) rotate(0 11.785714285714391 11.785714285714334)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.5321578219206878 6.589916353722208 C0.6525784839199124 3.681351541626241, 2.7939283897993157 2.7382514154649713, 5.144945501121229 -0.1963609378936475 M-0.5494083735765765 6.688537372325651 C0.7527866621969619 4.755844580505631, 2.105835517120445 3.791737006678467, 4.636545988874978 0.450783423319202 M1.0296486490498888 13.55011688065487 C2.986207612210203 8.307226589024141, 6.224415860304375 2.203610723222245, 9.737088632536066 1.1039920459583166 M0.8192623213957655 11.591694688468754 C2.6217581805080465 9.022358259533355, 4.424834540372389 6.032600130662785, 11.133756037753951 0.4363093185320259 M-2.0559142577043037 19.123183236364895 C3.6380608521989846 14.848559670844843, 7.297091984348009 9.699438043416798, 14.21954192733713 -1.4536601726966927 M0.8443668391043087 18.59200728489148 C4.394109463686366 11.794846199323104, 10.480191580266062 5.895304286042897, 16.006656662098546 0.3617014956590703 M1.176959267090938 22.96113893361306 C7.410398601140088 15.25530268562457, 14.199541293049222 10.659277812193185, 19.697483066746855 -0.9976632674992558 M1.229063139501891 24.06716408272749 C6.161860487244961 16.78831700643515, 13.189327737028526 9.262491984003503, 21.87744364077028 -0.12079458272312849 M7.759346817633476 24.149847630802682 C10.722352162770685 19.828383515119956, 15.566638883061968 13.841783076936084, 23.379761080977055 
1.1439120396213784 M6.161762654794719 23.164723241161695 C11.21278772816803 18.07243678809078, 17.37471892356409 10.983513470390553, 25.204492962707313 2.7765750409277334 M9.6711775822275 22.272039788219963 C14.053062946517686 16.701069889972413, 20.840738706767976 12.85286449119389, 23.368015236265457 7.960282272392007 M11.655567180708857 24.742348081808423 C15.964585588062135 17.881154963045795, 20.960636747003583 13.715766921999958, 25.781796189069247 7.580655064166578 M17.467003028153727 22.721396551308157 C18.689617819189067 21.34163455519693, 19.36563972064055 20.369319890403744, 25.18323239812858 14.488040854470967 M16.734576496494288 23.860286818518237 C19.824657966213046 19.876326386936057, 23.194634716923982 16.45078068560185, 25.605863873411415 14.431984232101751 M21.40448612823878 24.107475196302545 C23.41505881981843 21.97060843265719, 24.50602812350188 20.01431806062817, 25.9283023519045 19.759819878042315 M21.825903291644348 23.935369946446876 C23.160883002818743 22.81711734543103, 24.21363677493213 21.12850398871288, 25.168277102576035 19.581009543743555 M-0.35037282111268553 23.266854124744146 C-0.35037282111268553 23.266854124744146, -0.35037282111268553 23.266854124744146, -0.35037282111268553 23.266854124744146 M-0.35037282111268553 23.266854124744146 C-0.35037282111268553 23.266854124744146, -0.35037282111268553 23.266854124744146, -0.35037282111268553 23.266854124744146 M6.433633372962057 23.65901686047175 C4.872367709084076 22.689169715398997, 2.0518484291299366 21.191782967763128, 0.23396205935828585 17.965741128372215 M5.775965901121079 23.7924836246387 C4.21925509831889 22.198843519366935, 2.8311143650715493 20.46760518496852, 0.2457729920392503 18.491272571208142 M13.7708387152149 23.52726130129274 C10.145135030675208 22.264241654754702, 7.050363994067862 18.711379337846523, 0.31120403478470404 14.783008726253822 M12.746582505930833 23.255782950087152 C8.871114562524022 20.2169502908471, 5.790485597441784 17.47561962976756, 
0.11718499953475547 14.090158175989702 M17.29744727618909 21.608364544529213 C13.783179611240211 16.475252297827964, 6.452713486756963 13.531830044732995, 0.3427508312006835 7.7783782157499886 M17.34975881118244 22.73847020855148 C12.402330915910705 16.928935876791765, 4.99152964855892 12.45021981035863, -0.1915966182286497 8.292646812406609 M24.177182849192565 24.0834654536329 C19.015335361704928 17.378095734697013, 13.679015646325809 15.001283850904759, -1.615948262946064 3.68427644743282 M24.4517774793073 23.040855721966157 C19.100924468969062 18.28433808597009, 12.404824109169535 13.731618126274455, 0.8994057100775295 2.335908269311074 M23.452324697428633 18.725493446057918 C16.006196786879524 11.411221969615301, 13.22718649771351 5.67384957056008, 3.4435796028907033 -1.5063278611077093 M23.332963828609575 17.056994681312027 C18.969363135499613 14.235913818402153, 12.739198660435607 9.544176866357455, 2.2534498711089226 -2.2586106994448327 M23.002387626452666 14.01310005902856 C19.082433492039712 7.0570568523493655, 17.815287628326015 5.05514005857943, 6.117646109266367 -0.8128300476473052 M24.158934959566515 11.70941679764151 C17.709128596435946 6.321136762469264, 11.768520988610385 1.7299540525392771, 7.130879527842787 -1.022481945308693 M24.880165986871386 6.922207620294143 C21.782541366519812 6.212593593942036, 18.134872639553684 2.3006872601195214, 13.615642234817715 -0.6825971002110967 M23.75958058970612 6.944501055610677 C20.529178800352494 4.409354416404149, 17.006962475417293 1.1762233121245491, 13.453919550631763 -2.1982854402975867 M24.062579250513657 1.9928165903565418 C23.007863008966176 0.9737524933759709, 21.615720755964276 0.22094153477572404, 19.99850750742268 -1.1638546321287626 M23.846285489705473 1.776688618470816 C22.755671364122122 0.7501139282745761, 22.119832590420156 0.40380994201278697, 19.97821729541544 -1.5926784335847817" stroke="#e08fff" stroke-width="0.5" fill="none"></path><path d="M-1.8274158108979464 0.8113921452313662 
C6.923674503407925 -2.0443330101735375, 14.655613130091727 -0.2128935746915122, 23.213167774092113 -1.6069482397288084 M-0.4628364956006408 0.4195208614692092 C6.586152611366932 -0.536485587612594, 12.455829012340853 0.028990621312653175, 24.340041352037048 -0.29246725980192423 M21.99144660175935 0.6903420854359865 C23.09717733699719 7.761295593543275, 21.650254617518584 15.163687166225758, 23.027204322229778 25.32269548491717 M23.45741362643585 0.38459024485200644 C24.067293650477566 7.011656700034783, 24.00487921294514 12.57473446960965, 24.20446596813545 22.919604997682875 M23.406511592279827 22.86392807082415 C17.15755198649813 21.504602210869038, 9.272619404031866 22.19976158653661, -0.9461625088006258 23.360966256420618 M22.74092851353035 24.50313375477821 C15.92917174679642 23.615608833271303, 8.58342698110009 24.3969606678069, -0.38727214094251394 23.557913135338133 M-1.4497135151177645 24.450340321819787 C2.1671510687962714 18.6085690187424, -1.0844705947741327 9.459197694302713, -1.7147713769227266 -1.729135436937213 M-0.6155572151765227 24.009094785499876 C0.5464930087527525 15.525150437107635, -0.180741146186899 7.069197383576235, 0.2991453493013978 0.04515612777322531" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(22.142857142857338 13.285714285714448) rotate(0 7.5 13)"><text x="7.5" y="18" font-size="1em" fill="#e08fff" text-anchor="middle" style="white-space: pre;" direction="ltr">0</text></g><g transform="translate(85.50000000000011 12.571428571428783) rotate(0 2 13)"><text x="2" y="18" font-size="1em" fill="#fc54ee" text-anchor="middle" style="white-space: pre;" direction="ltr">1</text></g><g transform="translate(172.857142857143 10.71428571428578) rotate(0 7.5 13)"><text x="7.5" y="18" font-size="1em" fill="#f41d92" text-anchor="middle" style="white-space: pre;" direction="ltr">2</text></g><g transform="translate(229.0714285714289 10) rotate(0 7 13)"><text x="7" y="18" font-size="1em" fill="#fa5252" 
text-anchor="middle" style="white-space: pre;" direction="ltr">3</text></g></svg>
<figcaption>Random distribution using round</figcaption></p>
</figure>
<p>This problem can be solved by always rounding in the same direction, either up or down. Consider the same query using <code>ceil</code>:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="hll"><span class="w"> </span><span class="n">ceil</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">3</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">n</span><span class="p">,</span>
</span><span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="mf">1000</span><span class="p">)</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="mf">1</span><span class="p">;</span>
<span class="go">n │ count</span>
<span class="go">───┼───────</span>
<span class="hll"><span class="go">1 │ 328</span>
</span><span class="go">2 │ 339</span>
<span class="go">3 │ 334</span>
</pre></div>
<p>Using <code>ceil</code> produces evenly distributed random numbers: each of the values 1, 2 and 3 now appears in roughly a third of the rows.</p>
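<p>The same effect can be reproduced outside the database. The following is a minimal Python sketch, not part of the original benchmark: the helper <code>bucket_counts</code>, the sample size, and the fixed seed are all assumptions made for illustration. It applies <code>round</code> and <code>math.ceil</code> to a uniform random number multiplied by 3 and counts how often each integer comes up:</p>

```python
import math
import random
from collections import Counter


def bucket_counts(rounding, buckets=3, samples=100_000, seed=0):
    """Count how often each integer appears after applying
    `rounding` to a uniform random number in [0, buckets).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same counts
    return Counter(int(rounding(rng.random() * buckets)) for _ in range(samples))


# round() maps [0, 0.5) to 0 and [2.5, 3) to 3, so the two edge values
# each cover half the range of the values in the middle.
round_counts = bucket_counts(round)

# math.ceil() maps (0, 1] to 1, (1, 2] to 2 and (2, 3] to 3,
# so every value covers a range of the same size.
ceil_counts = bucket_counts(math.ceil)

for name, counts in (("round", round_counts), ("ceil", ceil_counts)):
    print(name, sorted(counts.items()))
```

<p>With <code>round</code>, the values 0 and 3 should show up about half as often as 1 and 2, while with <code>ceil</code> the three values should come out close to equal. Python's <code>round</code> resolves exact ties differently than some databases do, but exact ties are rare enough with random floats that this should not change the overall picture.</p>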
<figure>
<p><svg version="1.1" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 272.35714285714266 114.28571428571422" width="auto" height="10em">
<g><g transform="translate(16.07142857142844 65.85714285714289) rotate(0 119.37370930534331 -0.693674550398498)"><path d="M0.44991617927250005 1.6299532568273012 C92.99367391112976 -3.0396632119699767, 185.06608226299812 -3.0028244478558515, 237.50372916410333 -3.017302357624312 M-0.8189794095741527 0.21570609201316515 C50.452880702289704 -0.9750297869011367, 101.4072071079536 0.09507481247986482, 239.56639802026078 -1.1372182724153928" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(16.07142857142844 58.71428571428578) rotate(0 0.7414076470730038 8.240985897301528)"><path d="M1.6790079814065666 0.581844015967786 C-1.1823856627166833 5.415825379896721, -0.11543840399336236 8.880186425358637, -0.14055217843255896 15.607124056477476 M0.22219793642002394 0.09627482092919515 C-0.34248733236813844 5.98946165484879, 1.5230672502408247 11.146491051690708, 1.796114515264964 16.385696973673767" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(86.07142857142844 55.85714285714266) rotate(0 0.39919836382858875 8.128679283134716)"><path d="M0.581844015967786 -1.6869741506472846 C0.7479851273328118 4.783111893446286, -0.5055284655424006 10.118671881439278, -0.10716165780830322 17.267727145696266 M0.09627482092919515 -0.7736263406698528 C0.9620808292296886 5.714130464320981, 1.2738839154789081 10.493520674060843, 0.6714112593879875 17.94433271691675" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(163.21428571428555 54.42857142857133) rotate(0 0.27153642599182604 8.730682897082147)"><path d="M-1.6869741506472846 0.21422708668219137 C-0.4194339037798882 4.073022100006389, -0.46768394355548826 9.668860985135744, 1.5534414314104887 17.064563602291994 M-0.7736263406698528 0.7081539545590854 C0.9857255771274125 4.057719349447924, 0.9887343106171751 6.164102493687651, 2.2300470026309727 17.247138707482012" stroke="currentColor" stroke-width="1" 
fill="none"></path></g></g><g><g transform="translate(253.21428571428578 57.28571428571422) rotate(0 0.588973593122148 8.23895753062152)"><path d="M0.21422708668219137 -1.0538141496006466 C-0.900870089303165 5.260951375567312, -0.34173876364125766 13.77101041019754, 1.3502778880062136 15.455602665438388 M0.7081539545590854 0.6529869240625705 C1.0825524868435714 5.966110285748652, 0.19298127785401187 11.98576073803211, 1.5328529931962338 17.5317292108437" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g transform="translate(10 75) rotate(0 7.5 13)"><text x="7.5" y="18" font-size="1em" fill="currentColor" text-anchor="middle" style="white-space: pre;" direction="ltr">0</text></g><g transform="translate(85.5 74.71428571428578) rotate(0 2 13)"><text x="2" y="18" font-size="1em" fill="currentColor" text-anchor="middle" style="white-space: pre;" direction="ltr">1</text></g><g transform="translate(158.57142857142844 74) rotate(0 7.5 13)"><text x="7.5" y="18" font-size="1em" fill="currentColor" text-anchor="middle" style="white-space: pre;" direction="ltr">2</text></g><g transform="translate(248.35714285714266 73.28571428571422) rotate(0 7 13)"><text x="7" y="18" font-size="1em" fill="currentColor" text-anchor="middle" style="white-space: pre;" direction="ltr">3</text></g><g transform="translate(13.928571428571331 38.71428571428555) rotate(0 35.00000000000006 10.357142857142776)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.13971435573416446 6.228472643395059 C1.1768973393652016 5.404699703183528, 2.664785758736407 3.0629692495110348, 4.660863088401503 0.661494668185454 M-0.003695902568275733 6.375225414012663 C0.835025577790322 5.415292227674961, 2.2612519212095177 3.2431994307663037, 4.797962805586404 0.20194290055308875 M-0.04769586388956437 12.253132780313074 C2.558271837612338 9.695111359911783, 5.329184057466852 6.479021483365672, 10.255911111784112 -1.2161121715709804 M-0.148626701431383 12.220673799186526 C2.379937271617097 
9.815847596589974, 4.434338904216448 6.628496477333888, 11.003916331333054 0.14384067534354938 M0.7452017319807549 19.793121323828274 C3.427108583678004 13.124291438891882, 9.855904837988652 4.2060775621722435, 16.546140094756563 1.7664793307823112 M-0.1383673642343144 19.029277926218384 C5.050294949559307 12.579855857945152, 9.094315881329015 7.495873784328408, 15.657909885517736 0.006049163353162079 M3.507325883116385 21.327865136021522 C10.367070207398067 15.240084179199897, 13.81492894726769 6.099893511959948, 22.115237240025664 -1.585964926129627 M1.039968600589594 23.52448028528854 C6.354446463843488 17.18115027843605, 8.964261926990844 13.510129113980398, 20.674769251884783 -0.4162204745993492 M8.9699319395183 21.54407017110177 C12.846230458060212 14.591318529155673, 23.73357783705783 3.731595197603845, 27.207625255596252 1.2152891209360899 M7.661709253577737 22.326797134152432 C10.925741189000853 17.57460167181967, 15.87278870751853 11.142518268853165, 27.24565355074814 0.8791243977856578 M12.0188171474717 22.42129772890967 C17.549203390976757 17.910086326036907, 23.490639905577535 10.393541886338788, 33.14292952366809 -0.9531303251848335 M12.505199448437187 22.581347633978844 C16.79217756546496 16.56678866932087, 20.67622047262379 12.628860227438249, 31.741915459229446 0.6969651571720128 M16.30173188917635 21.75805101228094 C24.875759868754017 15.996359849971359, 29.663961716612416 8.195759221164169, 37.16146118547891 1.7280713842344007 M18.652729362289588 22.75707566143161 C24.214236236414063 15.23700444254347, 29.38446637949753 8.650237331911361, 36.966335600988316 0.8209824386146387 M21.887112305869984 21.795499669942355 C30.613504446691426 13.902626358839512, 35.46714176764867 6.95422521929623, 43.86018937322848 -1.204321297970509 M23.368017118794548 22.551724351565635 C26.33300421536385 17.593280549208846, 30.947575266183286 13.032720196156435, 42.807844778146375 -0.9302705062226408 M28.700052529752366 23.992303154729637 C32.138714804880955 
18.45753525222664, 35.72712426184589 10.415069267821826, 48.13176897815454 0.35414252692723736 M28.750931020073704 21.947909456044766 C34.94053750473453 15.799730704991966, 40.88126522869546 7.885796724491049, 47.39930173936325 -0.5719411672849013 M35.461332452061235 21.87478736536957 C40.21444571307814 15.501069096566326, 42.8858736319688 9.556298823500189, 54.60051288245684 -0.14959643823665303 M32.89526731478667 23.34454598492391 C40.37735429033821 16.325484496705972, 46.0089625099268 7.9375036911767545, 53.77221541176057 -0.8968044179083101 M39.123389954996355 23.811768004697868 C45.99883051971325 16.639222278884056, 50.08176988405951 10.762077277552521, 56.561419727373135 -1.0759310447353876 M38.54347243005509 22.61292156486691 C43.46275861384635 17.237399548932565, 47.98874546244266 12.035648072856349, 59.014522668997955 0.6561641269620502 M45.809348679795455 21.86472722144132 C52.046032070745014 16.357846680673937, 57.537511748553534 8.138116341451406, 63.41614614510317 -1.817764339733266 M44.24442073953856 23.228123080928675 C51.072143273568585 14.087661953925387, 59.14028839221011 6.297123202875408, 62.92531056319703 -0.942138531009995 M50.24899025865831 22.562636968435626 C58.39248514402829 15.477516216811111, 62.640867412511895 7.875068509797977, 67.6120345268847 2.049667075496508 M49.00813679535874 21.325110034773033 C54.1503384423823 16.225638236100778, 59.28395702627539 11.229057827985965, 69.17259409301742 -0.17951613176952463 M54.717782082822644 23.752135644407602 C62.287255153856535 16.63630829307737, 66.98175757170436 7.626620482605981, 74.10990816794595 3.807745144457968 M55.46357309617522 21.830141186454924 C60.60171666058808 16.092428804835293, 65.31443644925668 11.274908922879662, 71.97986134860923 2.31772894201562 M60.46275782905421 21.435638446404297 C61.729826472842205 17.75736121456028, 65.10413214175823 17.121917539068292, 73.77070113658701 7.525272703720776 M60.499192335213486 21.565685260805 C63.49878062035834 17.36537621036215, 
67.01319112716196 13.459399761226928, 71.93537815628369 8.849690482290981 M66.0921286147958 22.332191418605298 C67.41427502886641 18.88143682866715, 69.6050581987798 16.145688294778715, 72.04504385826434 13.922831096765577 M64.88182134790667 22.79011141292788 C67.06341993748217 19.676477150687006, 69.6370266209872 16.99800397822061, 72.5047646177772 13.703613541866645 M0.3095862682954813 20.983404951524832 C0.3095862682954813 20.983404951524832, 0.3095862682954813 20.983404951524832, 0.3095862682954813 20.983404951524832 M0.3095862682954813 20.983404951524832 C0.3095862682954813 20.983404951524832, 0.3095862682954813 20.983404951524832, 0.3095862682954813 20.983404951524832 M5.539499527960441 20.88561869701925 C3.329027482618118 18.70261483228152, 1.8463357382689911 17.115064145655612, -0.5058932211902045 15.972927681716591 M6.0199985790992 20.655316216038496 C4.33015312516603 18.96345603125516, 2.418808517609992 16.96419578212362, -0.24167931533419118 15.4423772573345 M12.959218488682524 21.368740266441314 C8.217598555989495 17.960239675690442, 3.711617807537544 11.817318858545525, 0.45853156215857527 10.79151649999615 M11.902414710123555 21.193545446424693 C7.904960658726836 18.271821366081983, 4.417680162312331 13.57295333164016, 0.44422692052434953 9.89216017428037 M18.827437862797577 18.545404632572076 C9.974650527218945 15.675181466547272, 4.388451394365553 10.302371361950913, -1.787460608558014 3.1034035332850394 M18.55574737027933 19.773640284311018 C12.705995935677114 15.558642916856602, 6.126340140005777 10.253557569706755, 0.48206619623933733 5.431797906232553 M23.60429329434414 18.963109370470956 C17.022324486956485 12.690625473333178, 8.959890002995017 5.7142039964160105, -0.6345111431344483 1.396920783522558 M24.944664034154307 19.999090747126356 C18.49382649622726 15.530165776448918, 12.81263161533271 11.358595117526566, 0.5698565494478609 -0.5701905758278252 M30.384229093932696 22.073259863374723 C23.95373198789104 14.81629753834563, 
13.424944626418434 6.981541342275229, 8.472464375082035 1.0413291652202674 M30.713830675141104 20.271186374431444 C21.48762555557218 14.200670410577493, 12.19136965889404 5.122443972764135, 6.7327085313160016 -0.7130052649900249 M35.966918005559215 18.808969445574114 C28.356975523722518 16.125233387500845, 23.37667032262038 6.322175887299352, 11.51225892330524 -2.2767770942551486 M36.51933483215756 20.521606174345145 C27.57145852317176 13.145094312287084, 20.36246680454576 6.167020290479513, 12.433759601479437 -0.5175409475592367 M41.461083388888014 20.495356663331393 C39.90061490728378 14.939313861266353, 34.87157397780981 10.745630015566027, 18.497892736199457 -1.9178263236414228 M42.58705684728987 20.360167179457846 C36.683482250603284 14.72172810741939, 30.60362833730035 9.141924889500771, 18.399866664524335 0.2656678465549316 M49.18903215769227 19.0696320719468 C40.58754494421612 14.148411215695493, 29.596539681512112 5.688855615064325, 24.343469602537894 0.6823451573616772 M48.83595280423309 19.576161243897914 C40.57253290159299 14.477516123882328, 35.22437674037192 8.560877834950512, 23.707212366645777 -0.0906885910191626 M52.911831421636286 19.14207046768259 C46.34482042783892 13.894111028361952, 35.21316717756495 3.7415792808682315, 29.08935350698485 -1.9265347299835813 M55.32198117007339 21.48057727525573 C45.80057356209126 11.540094182320535, 35.00236905803604 4.394494091297741, 31.225231737429937 -0.9484729151911271 M62.733841424716054 22.21609768482952 C56.35680050674827 15.31662437845744, 47.55908721029857 8.546448829850432, 36.80670156853629 -0.9890870263864571 M60.361385733728646 21.407438031650173 C52.5590837527185 14.811589341806915, 46.35830287729594 6.967876666582617, 36.739895128294656 0.19723606965790452 M67.55434428592754 19.468967682678322 C59.971124059741534 15.429824725785629, 53.81802641530941 9.119360120297452, 44.04909278118821 -0.7636046475562104 M67.20107434661669 20.7757272087001 C58.487619234935934 13.501544274577164, 
51.45763931970925 7.700389921836095, 42.6240869531578 -1.2199295033532032 M69.25504399926672 20.56426254551975 C65.22447888261875 12.687218901304137, 60.26269421989457 11.427746231803912, 50.30415675460489 -1.7413752972305687 M71.42486892265538 18.183139439524393 C62.261830628892845 10.585076406584278, 54.3701925827465 4.838552278040023, 48.507984429105534 -0.7756994766340526 M73.37814727328403 15.152274424663517 C67.63288275638482 8.313769059153845, 62.409121631992164 6.21528932539314, 54.898106652559534 -0.42659467010930285 M71.85033627781702 14.187712236760888 C67.70265003325879 9.53007470473961, 61.56682140262131 4.912225610777203, 55.40234051483293 -0.5538288710562895 M70.42285039013407 7.816387586847237 C68.75058950564168 6.952031997610314, 64.85110983987045 2.5289783078431602, 59.9134547531781 -0.9292611575935549 M70.82661387951269 8.283783382030492 C69.45806317599383 6.540437594514192, 67.18901071931792 4.575949380902222, 61.81242631968942 0.02722591024185006 M71.44256023239848 2.900055320782414 C69.76654792806546 2.533208906237001, 69.53529132584804 1.6215209926123677, 67.60574955574427 -0.13767793249513005 M71.36627927973439 3.242189272388643 C70.25267596064948 1.864353308542416, 68.8558087896831 1.0378227399143651, 67.57584565842714 -0.1287053275032619" stroke="#fc54ee" stroke-width="0.5" fill="none"></path><path d="M0.14517845027148724 -0.09102694503962994 C12.82519797701391 1.1415958453901123, 29.797589253634264 -1.1739678929559876, 71.64665157906722 1.5183731485158205 M0.9318249309435487 0.12124157603830099 C26.317998969927473 0.7532553006894896, 50.970661084167745 1.4149381746537992, 69.33290616516035 0.7134984498843551 M68.59314329735946 1.6201068330556154 C70.22777199225645 5.725307600905284, 71.93712979035597 10.866047995830147, 68.12899257056426 21.314659416409313 M69.99703136924666 0.7012248998507857 C70.82430855664435 7.784679784106834, 70.94620503219787 14.725532701504973, 69.53139145020407 20.51097850940573 M69.71557523123931 
19.246272861691295 C49.44619969930507 21.3143903549892, 26.53987462818626 19.509030931828647, -1.7540553081780672 22.710504650803387 M70.03257567528647 21.17470746181357 C49.823178641498174 21.73212654293898, 31.181844389997472 21.55206785501364, -0.04108788724988699 21.360880274990677 M0.23873157612979412 20.526197075577556 C0.6682899587721173 15.959067850706182, 0.8952144377798382 8.100940166972503, 1.440562142059207 1.375159339979291 M0.9883174682036042 19.78679337046492 C-1.167107050925758 13.227401217645188, -0.3867881052794719 7.4611370784362965, -0.33994566556066275 0.4827777212485671" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(87.5 37.28571428571445) rotate(0 37.500000000000114 11.428571428571445)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.43076819774227826 6.3684650697308705 C2.2087986054903475 4.341044789025142, 3.3343072645053935 2.025864920326358, 5.288402700754351 -0.17387744748618061 M-0.2840154271246741 6.271455552105897 C1.7551891833988404 4.614289137500473, 2.578153138435459 2.5241993852208897, 4.828850933121985 0.00274717061195906 M0.3454197072516787 12.334867339253007 C4.191473029081104 8.48206992155823, 7.365151433105268 2.829967759997113, 9.448472502780092 -1.0852074046537519 M0.3129607261251317 11.90738937970954 C4.591439614311288 7.1214739588338505, 8.07381623153581 3.616586744733796, 10.808425349694621 0.39866600661424423 M1.2261674096295376 16.48734710821944 C6.311136723651389 11.14283447911058, 9.13488100582086 7.042153174229005, 17.05797203770228 0.40619726357282904 M0.4623240120196481 19.002701081243032 C5.739334486216654 11.102137131600173, 12.385554806839112 4.16282800819949, 15.297541870273133 0.6815852911141684 M-0.3220828206652193 25.320422875170063 C5.601939624857696 16.91252142765338, 8.4538797301955 13.876991763291597, 19.743204422572518 -1.9820962069392678 M1.8745323286017985 24.130293143388283 C6.6674820276637785 15.301448510824558, 14.165601949480218 7.79849969256686, 
20.912948874102796 0.7523236110969904 M5.2757398272066975 21.95681273976355 C12.042643349984282 19.282962313592705, 14.860485577422601 12.243588537426945, 27.171366502207132 -1.0309957252803503 M6.058466790257361 22.97004615833997 C12.823536138440987 16.052180967781457, 17.878449797426722 9.94529232160824, 26.8352017790567 -0.038346919788724065 M11.435934446574006 23.470540677332508 C19.877696078457163 16.191143813440416, 25.348003611941742 8.596390205885674, 31.040623697868384 -1.3576475026772137 M11.595984351643178 23.239508271352577 C17.351624066669633 15.026465417190977, 25.189819320136362 8.35097701599796, 32.69071918022523 0.40138134465210484 M16.15430534273694 24.091423797541232 C24.939577750290155 14.46427847674416, 31.68137333104633 8.285013222678163, 38.34873343985652 0.54706244553444 M17.15332999188761 24.010262949512274 C22.466474100429824 16.239400540918048, 28.66958143155413 9.990432266884845, 37.441644494236755 0.45656843400564995 M21.474721061957762 24.63786276416292 C26.083625829832403 17.273846754005135, 31.610245447893924 12.661905207381889, 41.45401739943378 -1.5316683418721908 M22.230945743581042 22.359469401514136 C29.87069943592462 13.86732570932561, 37.88418078044687 6.505783591442867, 41.72806819118165 -0.8258998805330151 M29.053142159536712 22.742018436861876 C33.4212436599451 16.32574775016507, 35.292857400983294 13.053158226876137, 47.63938925690043 0.6926178697671936 M27.00874846085184 23.813789219444338 C33.00918823722096 17.67936950870855, 37.274552210555704 11.725087874855578, 46.71330556268829 -0.29338544437771574 M32.21859343173605 24.999635229411304 C36.84306458159294 16.383760893639426, 41.90851565943452 12.08143208869476, 53.17332693351871 0.1574623692258008 M33.688352051290394 23.717959736997962 C38.2857033995995 18.130054195687848, 41.50299177241592 13.748078158901873, 52.42611895384705 -0.31892436133336943 M39.537191683856015 25.195217751475496 C46.1084468101189 15.187253890397267, 54.00532373430534 4.685125425972597, 
56.87390035958887 -1.3233333588320768 M38.33834524402506 23.678720671261175 C44.10258729243504 16.876800219999765, 50.04882188690778 10.230003376676875, 58.60599553128631 0.6426996481861913 M42.87311796215887 24.451854166827655 C52.304477434311984 14.754374599072131, 59.04850436334745 6.221029196875904, 62.16974370637317 1.0901157243667612 M44.23651382164623 22.933387480929053 C49.014971202007565 17.26852766579598, 55.15504213783186 9.259367114989594, 63.04536951509644 0.42352860537604187 M48.95264532194486 25.45639283513208 C57.474359071378586 13.975143180150129, 65.44215126050173 4.80048316989463, 70.66408315417185 -0.5110451465133892 M47.715118388282264 22.79377800808777 C53.71136187230877 16.922278188712696, 59.83783809157397 11.946488981695445, 68.43489994690582 0.5863569148223533 M55.42511105947622 24.609265670161978 C61.133762268099254 16.459529569894944, 64.9997677432614 11.295313816849344, 76.19570912424716 -0.6345871826993061 M53.503116601523544 24.014785836432413 C59.16103922941824 17.129675664088445, 65.69128841176337 9.577501369773943, 74.70569292180481 -1.2066952772797777 M58.42694448005322 22.249664781581636 C61.656018331941986 18.434672055104645, 68.26783727945242 15.501520330073149, 76.43482695992783 2.102107341141597 M58.57144094049845 23.238306933049138 C63.788811067858234 17.002882205820576, 69.29578875221573 10.491499803185725, 77.90640226945028 2.5138732993241106 M63.98658010094759 23.560075413387036 C67.07908655531209 17.9926403165853, 71.48223816217808 12.117328483392516, 77.5682652744951 8.103030124228077 M64.81916190880683 23.116522586855805 C69.06035315105376 17.811545292778867, 74.09415407569743 11.631901171672567, 77.16968790195159 8.37468007627465 M70.39723004541176 24.402801100791244 C73.25873359157711 19.263558169752717, 75.35407146025082 16.995706003181056, 76.50495978182856 13.263731905043791 M69.99539827528226 23.181425179608908 C71.81173754532603 20.908802839649503, 74.86965991676604 18.38849397450232, 77.52135369944129 
14.81452898920394 M0.003294346295044548 22.86000658868695 C0.003294346295044548 22.86000658868695, 0.003294346295044548 22.86000658868695, 0.003294346295044548 22.86000658868695 M0.003294346295044548 22.86000658868695 C0.003294346295044548 22.86000658868695, 0.003294346295044548 22.86000658868695, 0.003294346295044548 22.86000658868695 M6.660318711262979 23.312296509699642 C5.129628986960093 21.812615350226352, 2.262634788458181 20.17365856789253, 0.5825024181822793 18.082716446774302 M6.572721301254668 22.89771799850625 C4.603674710057037 21.221187910999458, 1.8167256566717471 19.434895409393704, 0.13282425532438927 18.068117133559884 M10.515950613699182 21.27911546591556 C9.987193140245783 19.48743428425424, 7.2902965639423885 16.588725015834314, -1.3166701926770596 12.521577349342222 M11.498539135090335 23.00963153827822 C9.311193396065889 20.09666078909196, 6.397114930246354 18.619199418285806, 0.5460453056809514 12.461398588903064 M16.687931997981288 23.42168470122271 C13.379345282893526 19.5709143425639, 9.096964526827033 14.719777729611799, 2.002602413382592 6.19941624410944 M17.723913374636687 23.278717234026423 C11.01728850171541 17.438976930624975, 4.865709599732519 10.929922995612532, 0.03549105403220887 7.766989041758048 M25.179700103676723 24.319171183713543 C17.91210426512833 15.455352382705165, 9.656650734063282 10.784783123455666, 0.9909517660897942 3.010980467210394 M23.377626614733444 22.429786402703094 C15.602674956375665 13.883394041758496, 4.892511323540742 6.307486270129509, -0.7633826641204982 1.5336997861888824 M28.707795907881057 21.860904485111874 C23.603342594106422 16.474857651551858, 11.418670801706828 7.170186132839676, -0.06302575271730482 -1.1281338898313882 M30.420432636652087 22.548438991544586 C19.08053745387375 14.311130783345376, 8.53285925354631 4.148501197088262, 1.696210393978607 -3.0400967314688208 M36.53051031865278 24.937395400645947 C28.53645861354111 19.101139328650483, 22.258577738184805 11.687929971025385, 
6.43225221091086 -2.736383040825635 M36.395320834779234 23.20581934070102 C26.21106425537642 14.464644036489455, 15.926599810895905 5.322401664360655, 8.615746381107215 -1.2892934926029143 M41.14246236905036 23.234076600544007 C36.64783177634095 14.731400303957813, 28.54023848472567 10.594204584652756, 15.070100333696137 -2.4103916144871462 M41.64899154100147 21.903735852822514 C33.2137865213091 15.218391104015923, 23.671277664018994 5.326308461113218, 14.297066585315298 -1.5224597060063232 M47.351227957800596 23.386979717005815 C41.29541321764816 13.062809241177112, 29.892920605792202 5.4458892184246075, 18.597547639365317 -0.17233110361615722 M49.689734765373736 23.449088080030705 C37.65408792660539 12.932334099739547, 28.044644532501717 5.146333732560947, 19.57560945415777 -2.1504692570574075 M55.806872787739195 24.57782214126911 C43.99989754386034 13.512932814771828, 32.48171949519683 4.026765464587367, 24.91661295575411 -1.8111872419749364 M54.99821313455985 21.767703520985208 C47.03315971892089 15.67853028623441, 37.40068965241079 6.923455454450043, 26.102936051798473 -2.809018084596561 M59.85212900759294 21.96997925553365 C53.28846606480274 15.206104942078039, 44.45033272516604 10.593189776129575, 31.93448155658931 -1.4467877723637113 M61.15888853361472 22.559059852807657 C50.14218788778963 14.681608483319765, 41.01926200674828 4.91608116472003, 31.47815670079232 -2.7039019490021357 M69.05192815042034 22.30357123002769 C59.640262971047676 17.265005727671895, 57.12608531631937 13.95985414141576, 37.09303809992939 0.06528435044575431 M66.67080504442498 23.110121728148176 C55.62227177259141 14.452252185580189, 46.444077257722356 3.5649344363312387, 38.0587139205259 -1.7971796351240208 M74.27002987427983 24.051140109979457 C61.788198060764294 15.091418418611852, 53.65655639110433 5.025894940842573, 44.44549536883283 -2.7106319533182983 M73.3054676863772 23.669767862156874 C62.55343415839766 13.241697949776116, 51.567162903891585 5.4696405878516074, 
44.318261167885844 -2.4327458653121 M76.4815219622299 21.6051498156172 C73.33047435335747 14.97649013192948, 64.75950094270887 9.499052847873347, 49.73302301202642 -3.6345988014561 M77.20059241635799 21.85504103248387 C69.73645638158723 15.090121926687104, 62.11811116473274 8.963852428811396, 51.204541577927046 -1.961904212444729 M75.32119158166074 13.497309055197647 C70.84474356825626 11.331008943814346, 64.46805259934932 3.863740374550215, 55.91428048234588 -1.3959925527299148 M76.68972738808566 16.089220992307773 C70.6081624779752 11.111617813504, 66.53358988500315 6.014004449261167, 55.95017090231335 -2.9295303028622968 M76.49397719188555 11.70137200557174 C74.516132801986 7.045200641021115, 70.71550638049848 4.102852471764727, 62.052955648223666 -3.977891912491662 M77.91895597631073 11.152274993634911 C70.51184359073383 5.662490393935645, 65.90472454032579 0.38926394033316825, 62.23200259717627 -2.4026190215196883 M76.02045045462867 3.8921975734370733 C74.10837596575038 3.2207284578289084, 72.34407622624943 -0.19805407046740442, 69.91055248665427 -1.691735474885128 M76.38453677944338 4.703182704160639 C74.21222075882103 3.3263376032493337, 72.56873545947857 1.2464475175491514, 69.50141781043996 -2.031691015724711" stroke="#f41d92" stroke-width="0.5" fill="none"></path><path d="M-0.09102694503962994 -1.9616640079766512 C16.17546825716277 0.16942336480133224, 29.51870181690911 1.5922096602153033, 76.51837314851605 1.3902520071715117 M0.12124157603830099 0.4521169448271394 C25.8419369109907 -1.4599716642405858, 52.208512945100814 -1.3659625330474248, 75.71349844988458 -0.7163256322965026 M76.62010683305584 0.17693842761218548 C75.96472972115258 9.304361175479645, 76.37500178297738 12.965122470100034, 75.60037370212399 23.49611807321867 M75.70122489985101 0.8271406972780824 C75.9130398864461 7.7749288229804066, 75.97781133005152 13.183174750501578, 74.7966927951204 23.01505610333493 M73.53198714740597 24.408735279792158 C48.69053168362021 23.50114364373235, 
19.65787024144089 20.996428260679526, 1.9962189365178347 22.34142750953993 M75.46042174752824 21.87531778203061 C45.68008978990852 22.453411570384336, 15.858321636915257 22.84048672635777, 0.6465945607051253 22.62318265424779 M-0.18808863870799541 23.63046378349623 C1.0613163500492062 15.84128300446487, -1.0671574682529483 8.265494483123941, 1.375159339979291 0.3165001403540373 M-0.9274923438206315 21.87421429143956 C-0.8435333831767949 13.76507606024723, -0.06770809130477037 4.738282837904997, 0.4827777212485671 0.16338238958269358" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(168.214285714286 38) rotate(0 40.35714285714295 10)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.2907757714064667 7.142876725693338 C1.5333555422600607 4.414881286055029, 3.070928264094417 2.4910436763928874, 4.4530305850827165 0.37872696703646713 M-0.38778528903144094 6.796451450561277 C1.078456609316904 4.536528678438601, 2.116860821389377 3.244400574684801, 4.629655203180856 0.625667182340961 M0.42715426619161234 12.14376192879635 C3.1980849442715527 8.929424899054943, 4.167926869637071 5.789086644818406, 9.579377269697321 1.018796456892765 M-0.0003236933518542262 12.555176806121647 C1.8829993378452292 9.589961778405248, 5.041791181569817 7.027813760863969, 11.063250680965316 0.2722317743196847 M-2.0796068059792976 19.460185454134518 C2.630073753581455 12.32194831753366, 8.593926344118247 10.716815153505829, 15.6976899704928 2.230530994657891 M0.43574716704429584 18.111817276012772 C4.668855233739086 12.71941264040407, 9.862980041359142 6.581450682373168, 15.973077998034139 -0.5139915573480556 M4.227883396241566 21.97948322065089 C5.55951139169184 15.149290847349285, 12.278448201820769 9.605519715087748, 19.347073141762877 0.36484783088560135 M3.0377536644597853 21.227200382313768 C9.134551599066242 13.499271713921527, 16.548906266242806 4.284448924369126, 22.081492959799135 -0.45611574089382856 M5.491181293403951 22.77163158534356 
C12.792244190919735 15.87762353952013, 15.741323269813119 14.245927591961244, 24.92508165599069 0.5531925609347166 M6.50441471198037 22.56197968768217 C11.29513533629149 16.17052370337919, 16.90365049599401 12.637031877727303, 25.917730461482318 0.7036824472260417 M13.042585872755083 23.38133790216209 C21.19181791365603 13.359976863214163, 29.0242243666127 3.819859589959723, 30.636106520376003 0.1936777626409416 M12.811553466775152 21.216068844895677 C17.971550834730458 15.727335791331758, 24.673697802898253 8.745032533448827, 32.39513536770532 0.275167196175385 M18.290377025532703 23.64742806983328 C22.088802649324947 17.21718010046244, 29.323106773020257 10.976812025980387, 37.167724501156556 0.06458245973107424 M18.209216177503745 21.9321328640092 C21.040996391739135 17.716938454131633, 25.407078631525934 13.79988024797293, 37.077230489627766 -0.17184804649087582 M24.87449263393657 20.251217791192822 C27.871292643790916 15.625975994639791, 33.61279913318913 10.77106505358395, 41.1266703555321 1.0991754800408131 M22.596099271287784 21.92520937168657 C27.032138260662798 17.42751755889763, 32.60697833896444 10.6619748687951, 41.832438816871274 -0.7514272166089171 M27.605556339204423 23.56668921496752 C32.96691769771911 13.277998822800571, 41.47365137171806 7.806754120236846, 47.977864599740386 -0.7641079027030422 M28.677327121786885 23.014135252982708 C35.447157144806084 14.109225533039673, 42.40129552842541 5.140404628621198, 46.991861285595476 -0.13008499162381426 M35.900849773536024 20.435314293993084 C39.788342350642445 12.897278969486347, 48.00481105310266 7.243547882211052, 53.48038574098116 -1.8068426106647841 M34.61917428112268 22.677620009561448 C39.28395655257472 15.379602141485787, 45.14578763696569 9.00452247282533, 53.00399901042199 0.5292865676282652 M40.723340328169115 23.0162314269963 C46.39929014915582 15.139781253731469, 51.5811654819664 4.486439828867418, 56.626498045492184 0.31956287988237975 M39.206843247954794 22.185672401477703 
C45.65250689756919 14.816171385251899, 52.24449292481253 8.484778464436202, 58.59253105251045 0.6988446587022388 M46.01765338530345 23.561068816812877 C51.53656342510558 15.148626054341927, 58.23285392034471 3.983773879610826, 65.0776237704732 -0.6313101032258501 M44.499186699404845 21.65567380539531 C51.66105727671183 14.719985814826437, 56.480882909055346 6.283628604797148, 64.41103665148248 0.5418690957619319 M51.64910008617677 23.264121112169605 C53.04929824199499 18.824545859148124, 56.76001326169234 14.419900926296952, 68.10337093216195 -0.6507137862626777 M48.986485259132465 22.18380485636918 C56.10280031525403 15.22708170447568, 64.10346272973946 5.972541079580116, 69.2007729934977 0.04418895136226553 M56.83964956298884 23.037172765902206 C62.87637526898496 12.356184524902313, 71.89853555284837 6.22438917305352, 74.0175055377582 -0.6999105913923813 M56.245169729259274 20.90111880606316 C59.93626407830778 17.18224686764557, 62.97167318954213 12.603866320294605, 73.44539744317773 -0.9297440853329135 M59.1069567069774 20.35967467673745 C65.92619874440736 16.16947933504699, 73.62905637184693 7.839446274831415, 79.11697935349969 0.1400035051118067 M60.0955988584449 21.104968164532146 C65.03609781155738 15.630990068081505, 69.7056120155708 11.48835748711606, 79.5287453116822 0.4945709332312873 M65.7003344003422 20.053567547511058 C71.2266693645948 13.159036960374419, 76.48408754638862 8.056504120761872, 83.14972504961466 3.7070510717539236 M65.25678157381097 21.315078058439987 C71.3245903561075 14.911893793919152, 76.46702495548489 8.84849534794847, 83.42137500166123 2.5218863118870303 M72.38600538837527 23.01718533072197 C71.86694563099904 19.180142288866506, 75.6452739055716 14.839729872691851, 81.0712365605931 8.569335800501824 M70.55394150660176 21.461548242024563 C74.63584108846545 18.538848241391072, 78.3607019163709 14.27585363356593, 83.39743218683333 8.758410042337449 M76.14852537641518 21.578784463815857 C79.36414627883723 19.3494477875201, 
80.7063546309826 17.1102200696649, 83.32800008075469 14.201257826102676 M76.98470109895088 21.617086984571536 C78.6968526559093 19.133639045119317, 79.95175819471177 17.649003414344993, 83.5785808958874 13.987785097019525 M-0.09145614451965933 19.92049838647724 C-0.09145614451965933 19.92049838647724, -0.09145614451965933 19.92049838647724, -0.09145614451965933 19.92049838647724 M-0.09145614451965933 19.92049838647724 C-0.09145614451965933 19.92049838647724, -0.09145614451965933 19.92049838647724, -0.09145614451965933 19.92049838647724 M5.5824550016043375 20.51460149181086 C4.4430566237285305 19.2787731051772, 2.7972028556671775 17.030111214033006, 0.41448153345955174 14.308023910346094 M6.447713037785668 20.185266000824157 C3.9294962100468602 18.14649564710097, 2.192487984424308 16.66976776608981, 0.384392153239973 14.66355133106059 M12.296095644882884 19.021349648735644 C9.436836777098284 16.419742657576727, 5.678958376606767 16.142263429224705, -0.9607390477543455 10.835307828091262 M12.181721671125857 19.67526242320907 C7.964252349880944 16.586475146295957, 3.3496024395401287 11.811740827319413, 0.29331919036454157 9.48052006488311 M20.072641683247046 21.504336079154307 C14.298007990664289 17.873736241738758, 12.635862496672791 14.61776779382158, 1.6453099690936046 6.097271884774497 M18.183256902236597 21.01157192377388 C12.123290905917079 14.608180012855314, 7.003785065693304 9.270026840566716, 0.16802928807209305 4.2815182953902395 M23.75070217765982 21.99163043192135 C18.467122170905967 11.668149917027634, 9.149367588204743 5.29822078952064, 1.541200337227131 0.8911038475364295 M24.438236684092534 19.969790445366073 C16.41677888126663 12.286298270650352, 6.513301116884183 4.222549081507781, -0.37076250441030156 -1.4672248329714093 M32.20881070598556 18.107657702109023 C23.52821751692536 10.577773485122956, 12.935184845957032 2.970449293945528, 6.069278379247326 0.575438758164803 M30.477234646040635 19.199668100504383 C24.316094400239233 13.392731402025287, 
17.63483078396829 9.329978184385087, 7.516367927470046 -0.9649789828880033 M37.297878127888566 21.310348935979587 C27.865196962410025 14.103975131227443, 22.97582818653813 9.277842169791292, 12.43294644736799 -0.04398853097369848 M35.96753738016707 20.432115671779197 C31.722631340069544 15.329515150189746, 24.451266442393944 10.296923050307278, 13.320878355848812 -0.6828342531647342 M42.83239885714205 20.91972718883327 C35.88995114094932 14.051965772352363, 31.17464107694883 9.415145849873703, 20.80733415125342 0.8540342763831923 M42.89450722016694 18.95251876331887 C36.13106881320887 15.57380350401121, 32.319536553605104 10.148126154461341, 18.82919599781217 0.14722951378879046 M50.81562750341029 20.178879756932353 C40.58730193229946 12.448813597447316, 32.3016148978476 5.264240777018125, 25.206154654676816 -0.9221885144642936 M48.00550888312639 20.942960032713806 C44.699834112514196 15.555104746910517, 38.27762443825199 11.358962045598602, 24.20832381205519 -1.2147927028990733 M54.34411181068927 20.474044974567555 C47.00962152094437 13.061198573848102, 41.98570955435669 7.253129074324372, 31.706881317302482 0.4837116427461119 M54.933192407963276 19.98355598808285 C46.53664633061828 13.09475904323092, 36.39578339789874 3.3520022984383715, 30.449767140664058 -1.6747088706379323 M60.05932139797498 18.11528930102964 C52.06220474179897 14.256900243034515, 45.3597345840233 3.885979774505314, 39.355280633126384 1.196254332305152 M60.86587189609547 19.11438407504489 C54.41168848920234 14.523259780552063, 45.91410065194139 6.761129568881609, 37.49281664755661 0.19720689369108158 M68.59927649993169 19.11718440758087 C63.20640150250568 17.204683906228066, 57.197678384554884 11.791198451092813, 42.61704097114451 0.38878726350356274 M68.21790425210911 19.89740976369031 C60.904840730994295 14.410411703212244, 56.46878978255758 10.413245277506322, 42.89492705915071 -1.7718585892931706 M73.50308090533264 21.418893974521467 C64.43562061972098 11.790380404911096, 56.30792668248491 
5.921584011220197, 47.829401316021155 -0.9358366423963851 M73.75297212219931 19.917823345647463 C65.4154963842532 12.64724560323064, 57.584173832355084 6.084347201972301, 49.502095905032526 -0.29533047450029315 M77.43609859884208 20.94514592311371 C71.69377698037937 12.965335192279612, 60.70975494710411 3.2481987998187662, 56.10568420652951 -1.7251724550944552 M80.0280105359522 19.12626216600675 C73.90383988390788 15.33752237008535, 67.65196221563467 9.552216795057635, 54.57214645639713 -1.1259090953019246 M84.07564837856495 19.2020830966863 C76.56659738913856 14.19915351140532, 71.19405103453875 6.452438250219332, 59.560472377333895 1.1837889966716126 M83.4976515238946 16.715911304425 C75.70154016808677 12.034258105947242, 68.46400137969488 5.382016969618432, 61.21865436783071 -0.71334773160768 M81.35501881871996 11.962471608520502 C79.89847689072494 9.153603188536726, 72.74390279819893 6.736452510990615, 68.21414169743025 -0.7152920778074332 M82.97698908016709 12.088961923011595 C78.32730159067452 8.319699057329935, 72.5122608227257 3.8752689149440016, 67.53423061575108 -0.7031158127986856 M83.5451850826775 6.878158797404335 C78.44742296158834 4.306019834147732, 74.41414557293645 -0.18507528653329675, 74.10434357858044 -0.8210306186160623 M82.58330161840817 7.401455052091347 C79.90510564251223 5.578209113636242, 78.72662450640534 3.619235439791108, 73.99418977553736 -0.8552573105251966 M82.68965528148735 1.8689973366745538 C82.31055075561922 1.164031083239307, 81.84792640917186 0.7776125371347442, 79.88119955327434 -0.5771076161672042 M83.18839824840003 1.8907531789802736 C82.11898117638177 1.2617649294987068, 80.78794169264746 0.2643759156777159, 79.88562269382251 -0.6676902154718204" stroke="#fa5252" stroke-width="0.5" fill="none"></path><path d="M-1.9616640079766512 0.5166709590703249 C17.23238445159851 -1.7621569050276384, 35.66368991495785 1.555109022763957, 82.10453772145745 0.8546381760388613 M0.4521169448271394 0.13704375084489584 C16.34788630616723 
0.7008673527211481, 33.55088470956054 0.6156391777962976, 79.99796008198943 0.24540341552346945 M80.89122414189812 0.9948392678052183 C82.45989626973358 7.14165279641747, 78.70259261458602 12.073456635698669, 81.35326093036171 19.972655193880186 M81.54142641156402 0.9085983103141182 C81.66272916096683 7.012233545072372, 80.2450834383392 11.874420377425842, 80.87219896047797 20.630306935869143 M82.2658781369352 20.91783370263873 C60.91427068698369 18.520026235395495, 37.85888020175386 21.403770892435137, -0.5157153476029634 21.904455857351408 M79.73246063917365 20.3217992549762 C48.046051464376305 20.51336669184133, 16.19305541366343 21.01822068549558, -0.23396020289510489 20.507046050392077 M0.7733209263533349 20.97443600185214 C0.9763722477480761 13.60616523027419, 1.4166889844462267 10.89556543342768, 0.3165001403540373 1.7782750297337757 M-0.9829285657033318 19.05520493444054 C-0.40027311369776697 14.71706994716077, -0.5376091592013832 10.254857162944964, 0.16338238958269358 0.6313275462016459" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(44.07142857142867 14.71428571428578) rotate(0 2 13)"><text x="2" y="18" font-size="1em" fill="#fc54ee" text-anchor="middle" style="white-space: pre;" direction="ltr">1</text></g><g transform="translate(120 10) rotate(0 7.5 13)"><text x="7.5" y="18" font-size="1em" fill="#f41d92" text-anchor="middle" style="white-space: pre;" direction="ltr">2</text></g><g transform="translate(198.35714285714312 12.14285714285711) rotate(0 7 13)"><text x="7" y="18" font-size="1em" fill="#fa5252" text-anchor="middle" style="white-space: pre;" direction="ltr">3</text></g></svg>
<figcaption>Random distribution using ceil</figcaption></p>
</figure>
<h3 id="random-choice"><a class="toclink" href="#random-choice">Random Choice</a></h3>
<p>You can use the <code>random</code> function to pick a random value from a list of values:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="p">(</span><span class="k">array</span><span class="p">[</span><span class="s1">'red'</span><span class="p">,</span><span class="w"> </span><span class="s1">'green'</span><span class="p">,</span><span class="w"> </span><span class="s1">'blue'</span><span class="p">])[</span><span class="n">ceil</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">3</span><span class="p">)]</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">color</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">5</span><span class="p">);</span>
<span class="go"> color</span>
<span class="go">───────</span>
<span class="go"> green</span>
<span class="go"> green</span>
<span class="go"> blue</span>
<span class="go"> green</span>
<span class="go"> blue</span>
</pre></div>
<p>The expression defines an array of colors, and then uses <code>random</code> to get a random element from the array. Notice that in PostgreSQL, arrays start at 1:</p>
<div class="highlight"><pre><span></span><span class="c1">-- In PostgreSQL arrays start at 1</span>
<span class="k">SELECT</span><span class="w"> </span><span class="p">(</span><span class="k">array</span><span class="p">[</span><span class="s1">'red'</span><span class="p">,</span><span class="w"> </span><span class="s1">'green'</span><span class="p">,</span><span class="w"> </span><span class="s1">'blue'</span><span class="p">])[</span><span class="mf">1</span><span class="p">];</span>
<span class="go"> array</span>
<span class="go">───────</span>
<span class="go"> red</span>
</pre></div>
<h3 id="sampling"><a class="toclink" href="#sampling">Sampling</a></h3>
<p>Sampling a random portion of a table is very common when training a model. A simple way to fetch a random portion of a table is to combine <code>random</code> with <code>LIMIT</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="n">sample</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span>
<span class="hll"><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mf">10000</span>
</span><span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sample</span><span class="p">;</span>
<span class="go"> count</span>
<span class="go">───────</span>
<span class="go"> 10000</span>
<span class="go">(1 row)</span>
<span class="go">Time: 205.643 ms</span>
</pre></div>
<p>To sample 10K random rows from the table you first sort in a random order, and then take the first 10K rows.</p>
<p>Using <code>random</code> to sample data is great, but for very large datasets it can be inefficient. PostgreSQL provides other methods of sampling a proportion of a table, which are more suited for large tables.</p>
<p>PostgreSQL provides two <a href="https://www.postgresql.org/docs/13/sql-select.html#SQL-FROM" rel="noopener">sampling methods</a>, <code>SYSTEM</code> and <code>BERNOULLI</code>. To sample a table, use the <code>TABLESAMPLE</code> keyword in the <code>FROM</code> clause, and provide the sampling method along with its arguments. For example, sampling 10% of the table using the <code>SYSTEM</code> sampling method:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="n">sample</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="hll"><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">TABLESAMPLE</span><span class="w"> </span><span class="k">SYSTEM</span><span class="p">(</span><span class="mf">10</span><span class="p">)</span>
</span><span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sample</span><span class="p">;</span>
<span class="go"> count</span>
<span class="go">───────</span>
<span class="go"> 95400</span>
<span class="go">(1 row)</span>
<span class="go">Time: 13.690 ms</span>
</pre></div>
<p>The <code>SYSTEM</code> sampling method works by sampling blocks rather than rows, which makes it very fast. The table we sampled contains 1M rows, and the sample returned slightly fewer than 100K rows. Because whole blocks are picked, the size of the sample is only approximately 10%. For large datasets it's not uncommon to compromise accuracy for performance.</p>
<p>Another sampling method provided by PostgreSQL is <code>BERNOULLI</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="n">sample</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="hll"><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">TABLESAMPLE</span><span class="w"> </span><span class="n">BERNOULLI</span><span class="p">(</span><span class="mf">10</span><span class="p">)</span>
</span><span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sample</span><span class="p">;</span>
<span class="go"> count</span>
<span class="go">────────</span>
<span class="go"> 100364</span>
<span class="go">(1 row)</span>
<span class="go">Time: 54.593 ms</span>
</pre></div>
<p>Unlike the <code>SYSTEM</code> sampling method, <code>BERNOULLI</code> works at the row level which makes it a bit slower, but the results are better distributed.</p>
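<p>To get a feel for the difference between the two methods, here is a minimal Python simulation. It is only a sketch and not how PostgreSQL implements sampling: the "table" is a hypothetical list of pages holding 100 row ids each, and the function names are made up for the illustration. Row-level sampling decides for every row on its own, while block-level sampling keeps or drops a whole page at once:</p>

```python
import random


def bernoulli_sample(pages, percent, rng):
    """Row-level sampling: each row is kept independently with probability percent/100."""
    p = percent / 100
    return [row for page in pages for row in page if rng.random() < p]


def system_sample(pages, percent, rng):
    """Block-level sampling: each page is kept or dropped as a whole."""
    p = percent / 100
    return [row for page in pages if rng.random() < p for row in page]


# A toy "table": 1,000 pages with 100 row ids each (hypothetical sizes).
pages = [list(range(start, start + 100)) for start in range(0, 100_000, 100)]
rng = random.Random(42)

row_level = bernoulli_sample(pages, 10, rng)
block_level = system_sample(pages, 10, rng)

# Both samples are close to 10% of the rows, but rarely exactly 10%.
print(len(row_level), len(block_level))
```

<p>In the simulation the block-level sample always contains complete pages, so its rows are clustered, and it needs only one random draw per page instead of one per row. This mirrors the trade-off described above: less work, but a less evenly distributed sample.</p>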
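<p>These are the timings from the examples above, all on a table with 1M rows. Note that the <code>random()</code> query fetched 10K rows (1% of the table), while the two <code>TABLESAMPLE</code> queries sampled roughly 10%:</p>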
<table>
<thead>
<tr>
<th>Sampling Method</th>
<th>Timing</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>random()</code></td>
<td>205ms</td>
</tr>
<tr>
<td><code>BERNOULLI</code></td>
<td>54ms</td>
</tr>
<tr>
<td><code>SYSTEM</code></td>
<td>13ms</td>
</tr>
</tbody>
</table>
<p>If you need to sample from a large table consider using <code>TABLESAMPLE</code>.</p>
<h3 id="example-train-test-split-with-sql"><a class="toclink" href="#example-train-test-split-with-sql">Example: Train / Test Split with SQL</a></h3>
<p>A common task when analyzing data is to split a dataset for training and testing. The training dataset is used to train the model, and the test dataset is used to evaluate the model.</p>
<p>To put what you've seen so far into practice, generate a transactions table with some random data:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="k">transaction</span><span class="w"> </span><span class="k">AS</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">id</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'2021-01-01'</span><span class="p">::</span><span class="nb">date</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">interval</span><span class="w"> </span><span class="s1">'1 day'</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">ceil</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">365</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">billed_at</span><span class="p">,</span>
<span class="w"> </span><span class="n">round</span><span class="p">(</span><span class="mi">10</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">90</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">random</span><span class="p">())</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">charged_amount</span><span class="p">,</span>
<span class="w"> </span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="p">.</span><span class="mi">6</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">reported_as_fraud</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">id</span>
<span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="mi">1</span><span class="p">;</span>
</pre></div>
<p>The transaction table includes the date and amount of the transaction, and an indication of whether the transaction was reported as fraudulent.</p>
<p>Before we move on, let's break it down:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="s1">'2021-01-01'</span><span class="p">::</span><span class="nb">date</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">interval</span><span class="w"> </span><span class="s1">'1 day'</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">ceil</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">365</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">billed_at</span><span class="p">;</span>
</pre></div>
<p>Add a random number of days between 1 and 365 to January 1st, 2021 to produce a random date in the year that follows (January 2nd, 2021 through January 1st, 2022).</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="n">round</span><span class="p">(</span><span class="mi">10</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">90</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">random</span><span class="p">())</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">charged_amount</span><span class="p">;</span>
</pre></div>
<p>Produce a random round charged amount between 10 and 100.</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="p">.</span><span class="mi">6</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">reported_as_fraud</span><span class="p">;</span>
</pre></div>
<p>Produce the parameter we want to estimate. In our fake data, we want to have 40% fraudulent transactions. Since <code>random()</code> is uniformly distributed between 0 and 1, the expression <code>random() &gt; 0.6</code> produces a boolean value which evaluates to true ~40% of the time.</p>
<p>This is what the data looks like:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="k">transaction</span><span class="p">;</span>
<span class="go"> id │      billed_at      │ charged_amount │ reported_as_fraud</span>
<span class="go">────┼─────────────────────┼────────────────┼───────────────────</span>
<span class="go">  1 │ 2021-05-22 00:00:00 │             54 │ t</span>
<span class="go">  2 │ 2021-05-31 00:00:00 │             63 │ f</span>
<span class="go">  3 │ 2021-11-11 00:00:00 │             26 │ t</span>
<span class="go">  4 │ 2021-07-04 00:00:00 │             64 │ t</span>
<span class="go">  5 │ 2021-02-27 00:00:00 │             90 │ t</span>
<span class="go">  6 │ 2021-05-21 00:00:00 │             20 │ t</span>
<span class="go">  7 │ 2021-07-29 00:00:00 │             69 │ t</span>
<span class="go">  8 │ 2021-02-24 00:00:00 │             20 │ f</span>
<span class="go">  9 │ 2021-05-07 00:00:00 │             36 │ f</span>
<span class="go"> 10 │ 2021-05-05 00:00:00 │             38 │ f</span>
</pre></div>
<p>To test a model which classifies transactions as fraudulent, we want to split the table into training and test datasets. One way to do that is adding a column, but we are going to create two separate tables instead.</p>
<p>To create a table similar to an existing table in PostgreSQL, you can use the following commands:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">transaction_training</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="k">transaction</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="k">NO</span><span class="w"> </span><span class="k">DATA</span><span class="p">;</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">transaction_test</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="k">transaction</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="k">NO</span><span class="w"> </span><span class="k">DATA</span><span class="p">;</span>
</pre></div>
<p>This is a really handy syntax! We simply tell PostgreSQL to create a table similar to another table, but with no data.</p>
<p>Next, we want to split the data in the <code>transaction</code> table between <code>transaction_training</code> and <code>transaction_test</code>. We want our training set to include 80% of the rows, in this case 8 rows:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span>
<span class="n">training_transaction_ids</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">transaction_training</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="k">transaction</span>
<span class="hll"><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mi">8</span>
</span><span class="hll"><span class="w"> </span><span class="n">RETURNING</span><span class="w"> </span><span class="n">id</span>
</span><span class="p">)</span>
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">transaction_test</span>
<span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="k">transaction</span>
<span class="hll"><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="p">(</span><span class="k">SELECT</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">training_transaction_ids</span><span class="p">);</span>
</span></pre></div>
<p>To populate data for training we select from the <code>transaction</code> table, shuffle the rows using <code>ORDER BY random()</code> and then insert into <code>transaction_training</code> just the first 8 rows.</p>
<p>To insert only the remaining rows into the test table, we keep the ids of the training rows by specifying <code>RETURNING id</code> in a common table expression (the <code>WITH</code> clause). We then insert rows into <code>transaction_test</code> and exclude rows in <code>training_transaction_ids</code>. For more on this technique check out <a href="sql-tricks-application-dba#implement-complete-processes-using-with-and-returning">how to implement complete processes using <code>WITH</code> and <code>RETURNING</code></a>.</p>
<p>This is the result:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">transaction_training</span><span class="p">;</span>
<span class="go"> id │      billed_at      │ charged_amount │ reported_as_fraud</span>
<span class="go">────┼─────────────────────┼────────────────┼───────────────────</span>
<span class="go">  6 │ 2021-05-21 00:00:00 │             20 │ t</span>
<span class="go">  4 │ 2021-07-04 00:00:00 │             64 │ t</span>
<span class="go">  5 │ 2021-02-27 00:00:00 │             90 │ t</span>
<span class="go">  2 │ 2021-05-31 00:00:00 │             63 │ f</span>
<span class="go"> 10 │ 2021-05-05 00:00:00 │             38 │ f</span>
<span class="go">  3 │ 2021-11-11 00:00:00 │             26 │ t</span>
<span class="go">  9 │ 2021-05-07 00:00:00 │             36 │ f</span>
<span class="go">  7 │ 2021-07-29 00:00:00 │             69 │ t</span>
<span class="hll"><span class="go">(8 rows)</span>
</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">transaction_test</span><span class="p">;</span>
<span class="go"> id │      billed_at      │ charged_amount │ reported_as_fraud</span>
<span class="go">────┼─────────────────────┼────────────────┼───────────────────</span>
<span class="go">  1 │ 2021-05-22 00:00:00 │             54 │ t</span>
<span class="go">  8 │ 2021-02-24 00:00:00 │             20 │ f</span>
<span class="hll"><span class="go">(2 rows)</span>
</span></pre></div>
<p>And there you have it, a training dataset and a test dataset with SQL, directly in the database!</p>
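<p>For comparison, the same shuffle-then-cut logic can be sketched in plain Python. This is only an illustration under simplifying assumptions: the rows are hypothetical dictionaries with just an <code>id</code> key, and <code>train_test_split</code> is a made-up helper, not a library function:</p>

```python
import random


def train_test_split(rows, train_fraction, rng):
    """Shuffle the rows, keep the first part for training and the rest for testing."""
    shuffled = list(rows)
    rng.shuffle(shuffled)  # plays the role of ORDER BY random()
    cutoff = round(len(shuffled) * train_fraction)  # plays the role of LIMIT
    training = shuffled[:cutoff]
    # Remember the training ids, like RETURNING id in the CTE.
    training_ids = {row["id"] for row in training}
    # Everything that was not picked for training goes to the test set.
    test = [row for row in rows if row["id"] not in training_ids]
    return training, test


# Hypothetical stand-in for the 10-row transaction table.
transactions = [{"id": i} for i in range(1, 11)]
training, test = train_test_split(transactions, 0.8, random.Random(0))

print(len(training), len(test))  # 8 2
```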
<hr>
<h2 id="descriptive-statistics"><a class="toclink" href="#descriptive-statistics">Descriptive Statistics</a></h2>
<p>When you get a fresh dataset, the first thing you usually want to do is get familiar with it. Some people call this "Exploratory data analysis", or EDA for short. Pandas, like other tools and languages, provides some utility functions to produce descriptive statistics.</p>
<h3 id="describing-a-series"><a class="toclink" href="#describing-a-series">Describing a Series</a></h3>
<p><a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html" rel="noopener">Describing a numeric series using pandas</a>:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
<span class="gp">>>> </span><span class="n">s</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span>
<span class="gp">>>> </span><span class="n">s</span><span class="o">.</span><span class="n">describe</span><span class="p">()</span>
<span class="go">count 3.0</span>
<span class="go">mean 2.0</span>
<span class="go">std 1.0</span>
<span class="go">min 1.0</span>
<span class="go">25% 1.5</span>
<span class="go">50% 2.0</span>
<span class="go">75% 2.5</span>
<span class="go">max 3.0</span>
<span class="go">dtype: float64</span>
</pre></div>
<p>To generate descriptive statistics in SQL, you can use the following query:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mf">1</span><span class="p">),</span><span class="w"> </span><span class="p">(</span><span class="mf">2</span><span class="p">),</span><span class="w"> </span><span class="p">(</span><span class="mf">3</span><span class="p">))</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="hll"><span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="o">*</span><span class="p">),</span>
</span><span class="hll"><span class="w"> </span><span class="n">avg</span><span class="p">(</span><span class="n">n</span><span class="p">),</span>
</span><span class="hll"><span class="w"> </span><span class="n">stddev</span><span class="p">(</span><span class="n">n</span><span class="p">),</span>
</span><span class="hll"><span class="w"> </span><span class="n">min</span><span class="p">(</span><span class="n">n</span><span class="p">),</span>
</span><span class="hll"><span class="w"> </span><span class="n">percentile_cont</span><span class="p">(</span><span class="k">array</span><span class="p">[</span><span class="mf">0.25</span><span class="p">,</span><span class="w"> </span><span class="mf">0.5</span><span class="p">,</span><span class="w"> </span><span class="mf">0.75</span><span class="p">])</span><span class="w"> </span><span class="k">WITHIN</span><span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="p">(</span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">n</span><span class="p">),</span>
</span><span class="hll"><span class="w"> </span><span class="n">max</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>
</span><span class="k">FROM</span>
<span class="w"> </span><span class="n">s</span><span class="p">;</span>
<span class="go"> count │  avg   │  stddev   │ min │ percentile_cont │ max</span>
<span class="go">───────┼────────┼───────────┼─────┼─────────────────┼─────</span>
<span class="go">     3 │ 2.0000 │ 1.0000000 │   1 │ {1.5,2,2.5}     │   3</span>
</pre></div>
<p>Basic aggregate functions in SQL produce output similar to that of Pandas. The interesting part here is the function <code>percentile_cont</code>.</p>
<p>The function <code>percentile_cont</code> is an <a href="https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE" rel="noopener">Ordered-Set Aggregate Function</a>, meaning, it operates with respect to some order. To illustrate, in the query above you can replace both <code>min</code> and <code>max</code> with <code>percentile_cont</code>:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mf">1</span><span class="p">),</span><span class="w"> </span><span class="p">(</span><span class="mf">2</span><span class="p">),</span><span class="w"> </span><span class="p">(</span><span class="mf">3</span><span class="p">))</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="n">percentile_cont</span><span class="p">(</span><span class="k">array</span><span class="p">[</span>
<span class="hll"><span class="w"> </span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="c1">-- <--- min</span>
</span><span class="w"> </span><span class="mf">0.25</span><span class="p">,</span><span class="w"> </span><span class="mf">0.5</span><span class="p">,</span><span class="w"> </span><span class="mf">0.75</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="c1">-- <--- max</span>
</span><span class="w"> </span><span class="p">])</span><span class="w"> </span><span class="k">WITHIN</span><span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="p">(</span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">n</span><span class="p">)</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">s</span><span class="p">;</span>
<span class="go"> percentile_cont</span>
<span class="go">─────────────────</span>
<span class="go"> {1,1.5,2,2.5,3}</span>
</pre></div>
<p>Another common use for <code>percentile_cont</code> is to find the <a href="https://en.wikipedia.org/wiki/Median" rel="noopener">median</a> of a sequence of numbers:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">10</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="n">n</span><span class="p">))</span>
<span class="k">SELECT</span>
<span class="hll"><span class="w"> </span><span class="n">percentile_disc</span><span class="p">(</span><span class="mf">0.5</span><span class="p">)</span><span class="w"> </span><span class="k">WITHIN</span><span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="p">(</span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">n</span><span class="p">),</span>
</span><span class="hll"><span class="w"> </span><span class="n">percentile_cont</span><span class="p">(</span><span class="mf">0.5</span><span class="p">)</span><span class="w"> </span><span class="k">WITHIN</span><span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="p">(</span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">n</span><span class="p">)</span>
</span><span class="k">FROM</span>
<span class="w"> </span><span class="n">s</span><span class="p">;</span>
<span class="go"> percentile_disc │ percentile_cont</span>
<span class="go">─────────────────┼─────────────────</span>
<span class="go">               5 │             5.5</span>
</pre></div>
<p>The query demonstrates two types of medians:</p>
<ul>
<li><code>percentile_disc</code> returns the first value in the ordered set whose position reaches 50% of the rows, so the result is always <em>an existing</em> value. Notice that <code>5</code> is present in the table.</li>
<li><code>percentile_cont</code> returns <em>an interpolated</em> value that splits the ordered values at 50%. The value 5.5 is not present in the table; it is interpolated between 5 and 6, the two values in the middle.</li>
</ul>
<p>Both functions can accept an array of fractions, in which case they return an array with the corresponding results.</p>
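<p>To make the difference between the discrete and the continuous percentile concrete, here is a small Python sketch of both calculations. The function names mirror the SQL functions, but this is only an approximation of the documented behavior for fractions between 0 and 1, not PostgreSQL's actual implementation:</p>

```python
import math


def percentile_disc(values, fraction):
    """Return the first ordered value whose position reaches the fraction."""
    ordered = sorted(values)
    # Smallest 1-based position i such that i / N >= fraction.
    index = max(math.ceil(fraction * len(ordered)), 1) - 1
    return ordered[index]


def percentile_cont(values, fraction):
    """Return a value interpolated between the two nearest ordered values."""
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)  # zero-based, may be fractional
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


numbers = range(1, 11)
print(percentile_disc(numbers, 0.5))  # 5
print(percentile_cont(numbers, 0.5))  # 5.5
```

<p>The discrete version can only return a value that exists in the input, while the continuous version may return a value between two neighbors.</p>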
<h3 id="describing-a-categorical-series"><a class="toclink" href="#describing-a-categorical-series">Describing a Categorical Series</a></h3>
<p>Previously we described a list of numbers. This time we want to describe a list of categorical values. For example, pandas will produce the following output:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">s</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">([</span><span class="s1">'a'</span><span class="p">,</span> <span class="s1">'a'</span><span class="p">,</span> <span class="s1">'b'</span><span class="p">,</span> <span class="s1">'c'</span><span class="p">])</span>
<span class="gp">>>> </span><span class="n">s</span><span class="o">.</span><span class="n">describe</span><span class="p">()</span>
<span class="go">count 4</span>
<span class="go">unique 3</span>
<span class="go">top a</span>
<span class="go">freq 2</span>
<span class="go">dtype: object</span>
</pre></div>
<p>Using SQL we can produce similar output:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="k">SELECT</span><span class="w"> </span><span class="n">unnest</span><span class="p">(</span><span class="k">array</span><span class="p">[</span><span class="s1">'a'</span><span class="p">,</span><span class="w"> </span><span class="s1">'a'</span><span class="p">,</span><span class="w"> </span><span class="s1">'b'</span><span class="p">,</span><span class="w"> </span><span class="s1">'c'</span><span class="p">])</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">v</span><span class="p">)</span>
<span class="k">SELECT</span>
<span class="hll"><span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="o">*</span><span class="p">),</span>
</span><span class="hll"><span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="k">DISTINCT</span><span class="w"> </span><span class="n">V</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="k">unique</span><span class="p">,</span>
</span><span class="hll"><span class="w"> </span><span class="k">mode</span><span class="p">()</span><span class="w"> </span><span class="k">WITHIN</span><span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="p">(</span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">V</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">top</span>
</span><span class="k">FROM</span>
<span class="w"> </span><span class="n">s</span><span class="p">;</span>
<span class="go"> count │ unique │ top</span>
<span class="go">───────┼────────┼─────</span>
<span class="go">     4 │      3 │ a</span>
</pre></div>
<p>To calculate the number of unique values you used <code>COUNT(DISTINCT ...)</code>. To get the value that appears most often in the series, i.e. the one with the highest frequency, you used another ordered set function called <code>mode</code>.</p>
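<p>To make the four statistics concrete, here is a minimal sketch in plain Python that computes the same summary for a list of values. The helper name <code>describe_categorical</code> is made up for this illustration, and skipping <code>None</code> values is an assumption that follows what pandas does; note that <code>count(*)</code> in the SQL query above would count them.</p>

```python
from collections import Counter


def describe_categorical(values):
    """Return count, unique, top and freq for a list of categorical values."""
    # Assumption: ignore missing values, like pandas' describe() does.
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    # most_common(1) returns [(value, frequency)] for the most frequent value.
    top, freq = counts.most_common(1)[0] if counts else (None, 0)
    return {
        "count": len(non_null),
        "unique": len(counts),
        "top": top,
        "freq": freq,
    }


summary = describe_categorical(["a", "a", "b", "c"])
print(summary)
```

<p>The <code>top</code> value plays the role of <code>mode</code> in the SQL query, and <code>unique</code> corresponds to <code>COUNT(DISTINCT ...)</code>.</p>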
<hr>
<h2 id="subtotals"><a class="toclink" href="#subtotals">Subtotals</a></h2>
<p>Another useful technique to analyze data is producing subtotals. We already saw how to apply aggregate functions on the table, but how about multiple aggregation levels in the same query?</p>
<p>Let's imagine a table of employees. For each employee we keep the name, the department they work in and their role:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">emp</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="k">VALUES</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'Haki'</span><span class="p">,</span><span class="w"> </span><span class="s1">'R&D'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Manager'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'Dan'</span><span class="p">,</span><span class="w"> </span><span class="s1">'R&D'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Developer'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'Jax'</span><span class="p">,</span><span class="w"> </span><span class="s1">'R&D'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Developer'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'George'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Sales'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Manager'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'Bill'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Sales'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Developer'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'David'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Sales'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Developer'</span><span class="p">)</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">t</span><span class="p">(</span>
<span class="w"> </span><span class="k">name</span><span class="p">,</span><span class="w"> </span><span class="n">department</span><span class="p">,</span><span class="w"> </span><span class="k">role</span>
<span class="w"> </span><span class="p">)</span>
<span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">emp</span><span class="p">;</span>
<span class="go">  name  │ department │   role</span>
<span class="go">────────┼────────────┼───────────</span>
<span class="go"> Haki   │ R&D        │ Manager</span>
<span class="go"> Dan    │ R&D        │ Developer</span>
<span class="go"> Jax    │ R&D        │ Developer</span>
<span class="go"> George │ Sales      │ Manager</span>
<span class="go"> Bill   │ Sales      │ Developer</span>
<span class="go"> David  │ Sales      │ Developer</span>
</pre></div>
<p>To find the number of employees with each role in each department, you can use <code>GROUP BY</code>:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">emp</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="cm">/* ... */</span><span class="w"> </span><span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="n">department</span><span class="p">,</span><span class="w"> </span><span class="k">role</span><span class="p">,</span><span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">emp</span>
<span class="hll"><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">department</span><span class="p">,</span><span class="w"> </span><span class="k">role</span><span class="p">;</span>
</span>
<span class="go"> department │   role    │ count</span>
<span class="go">────────────┼───────────┼───────</span>
<span class="go"> R&D        │ Developer │     2</span>
<span class="go"> R&D        │ Manager   │     1</span>
<span class="go"> Sales      │ Manager   │     1</span>
<span class="go"> Sales      │ Developer │     2</span>
</pre></div>
<h3 id="rollup"><a class="toclink" href="#rollup">Rollup</a></h3>
<p>What if we want to also get the number of employees in each department, in all roles:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">emp</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="cm">/* ... */</span><span class="w"> </span><span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="n">department</span><span class="p">,</span><span class="w"> </span><span class="k">role</span><span class="p">,</span><span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">emp</span>
<span class="hll"><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">department</span><span class="p">,</span><span class="w"> </span><span class="k">ROLLUP</span><span class="p">(</span><span class="k">role</span><span class="p">);</span>
</span>
<span class="go"> department │   role    │ count</span>
<span class="go">────────────┼───────────┼───────</span>
<span class="go"> R&D        │ Developer │     2</span>
<span class="go"> R&D        │ Manager   │     1</span>
<span class="go"> Sales      │ Manager   │     1</span>
<span class="go"> Sales      │ Developer │     2</span>
<span class="hll"><span class="go"> R&D        │ ¤         │     3 -- <-- Total for R&D</span>
</span><span class="hll"><span class="go"> Sales      │ ¤         │     3 -- <-- Total for Sales</span>
</span></pre></div>
<p>Notice the use of the subclause <code>ROLLUP</code> in the <code>GROUP BY</code> clause.</p>
<p>To add a subtotal for each department, we keep <code>department</code> as a regular grouping column and tell the database to "rollup" the role field. The database then adds two additional aggregate rows, one for each department, with an empty role (displayed here as ¤).</p>
<p>The database can actually produce subtotals at several levels. For example, to add the grand total of the number of employees in all departments, we can tell the database to "rollup" the department field as well:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">emp</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="cm">/* ... */</span><span class="w"> </span><span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="n">department</span><span class="p">,</span><span class="w"> </span><span class="k">role</span><span class="p">,</span><span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">emp</span>
<span class="hll"><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">ROLLUP</span><span class="p">(</span><span class="n">department</span><span class="p">,</span><span class="w"> </span><span class="k">role</span><span class="p">);</span>
</span>
<span class="go"> department │   role    │ count</span>
<span class="go">────────────┼───────────┼───────</span>
<span class="hll"><span class="go"> ¤          │ ¤         │     6 -- <-- Grand total</span>
</span><span class="go"> R&D        │ Developer │     2</span>
<span class="go"> R&D        │ Manager   │     1</span>
<span class="go"> Sales      │ Manager   │     1</span>
<span class="go"> Sales      │ Developer │     2</span>
<span class="go"> R&D        │ ¤         │     3 -- <-- Total for R&D</span>
<span class="go"> Sales      │ ¤         │     3 -- <-- Total for Sales</span>
</pre></div>
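<p>The query now includes several subtotals. To identify the aggregate level for each row, use the function <code>GROUPING</code>. It returns a bit mask in which a bit is 1 when the corresponding column is aggregated away in that row, and 0 when the column is part of the grouping:</p>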
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">emp</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="cm">/* ... */</span><span class="w"> </span><span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="n">department</span><span class="p">,</span><span class="w"> </span><span class="k">role</span><span class="p">,</span><span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">),</span>
<span class="hll"><span class="w"> </span><span class="k">GROUPING</span><span class="p">(</span><span class="n">department</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">department_subtotal</span><span class="p">,</span>
</span><span class="hll"><span class="w"> </span><span class="k">GROUPING</span><span class="p">(</span><span class="n">department</span><span class="p">,</span><span class="w"> </span><span class="k">role</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">grand_total</span>
</span><span class="k">FROM</span><span class="w"> </span><span class="n">emp</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">ROLLUP</span><span class="p">(</span><span class="n">department</span><span class="p">),</span><span class="w"> </span><span class="k">role</span><span class="p">;</span>
<span class="go"> department │   role    │ count │ department_subtotal │ grand_total</span>
<span class="go">────────────┼───────────┼───────┼─────────────────────┼─────────────</span>
<span class="go"> Sales      │ Developer │     2 │                   0 │           0</span>
<span class="go"> Sales      │ Manager   │     1 │                   0 │           0</span>
<span class="go"> R&D        │ Developer │     2 │                   0 │           0</span>
<span class="go"> R&D        │ Manager   │     1 │                   0 │           0</span>
<span class="go"> ¤          │ Manager   │     2 │                   1 │           2</span>
<span class="go"> ¤          │ Developer │     4 │                   1 │           2</span>
</pre></div>
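<p>The values 1 and 2 in the last two columns come from how <code>GROUPING</code> builds its result: each argument contributes one bit, the rightmost argument is the least significant bit, and a bit is 1 when that column is not part of the grouping set that produced the row. The following is a rough Python sketch of that rule; the function <code>grouping</code> and the list of grouping sets are written for this illustration only and are not how the database implements it.</p>

```python
def grouping(args, grouping_set):
    """Mimic GROUPING(args...) for a row produced by `grouping_set`."""
    mask = 0
    for column in args:
        # Shift left so the rightmost argument ends up in the least significant bit.
        # The bit is 1 when the column was aggregated away, 0 when it was grouped by.
        mask = (mask << 1) | (0 if column in grouping_set else 1)
    return mask


# GROUP BY ROLLUP(department), role expands to these two grouping sets.
grouping_sets = [("department", "role"), ("role",)]

for grouping_set in grouping_sets:
    print(
        grouping_set,
        grouping(["department"], grouping_set),
        grouping(["department", "role"], grouping_set),
    )
```

<p>For the rows grouped only by role, the department bit is set, so <code>GROUPING(department)</code> is 1 and <code>GROUPING(department, role)</code> is binary <code>10</code>, which is 2. A grand total row, where both columns are aggregated away, would produce 3.</p>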
<h3 id="cube"><a class="toclink" href="#cube">Cube</a></h3>
<p>When talking about subtotals, or aggregates at multiple levels, OLAP usually comes to mind. An <a href="https://en.wikipedia.org/wiki/OLAP_cube" rel="noopener">OLAP cube</a> is a technique where all the subtotals are pre-calculated to make retrieval faster. Using <code>ROLLUP</code> we can achieve this by providing all possible combinations, but there is an easier way to do that:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">emp</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="cm">/* ... */</span><span class="w"> </span><span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="n">department</span><span class="p">,</span><span class="w"> </span><span class="k">role</span><span class="p">,</span><span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">emp</span>
<span class="hll"><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">CUBE</span><span class="p">(</span><span class="n">department</span><span class="p">,</span><span class="w"> </span><span class="k">role</span><span class="p">);</span>
</span>
<span class="go"> department │   role    │ count</span>
<span class="go">────────────┼───────────┼───────</span>
<span class="go"> ¤          │ ¤         │     6 -- <-- Grand Total</span>
<span class="go"> R&D        │ Developer │     2</span>
<span class="go"> R&D        │ Manager   │     1</span>
<span class="go"> Sales      │ Manager   │     1</span>
<span class="go"> Sales      │ Developer │     2</span>
<span class="go"> R&D        │ ¤         │     3 -- <-- Subtotal for R&D department</span>
<span class="go"> Sales      │ ¤         │     3 -- <-- Subtotal for Sales department</span>
<span class="hll"><span class="go"> ¤          │ Manager   │     2 -- <-- Subtotal for Manager role</span>
</span><span class="hll"><span class="go"> ¤          │ Developer │     4 -- <-- Subtotal for Developer role</span>
</span></pre></div>
<p><code>CUBE</code> generates subtotals for all possible combinations. In the example above, compared with <code>ROLLUP(department, role)</code>, using <code>CUBE</code> added a subtotal for each role.</p>
<h3 id="grouping-sets"><a class="toclink" href="#grouping-sets">Grouping Sets</a></h3>
<p>Both <code>CUBE</code> and <code>ROLLUP</code> are syntactic sugar for <code>GROUPING SETS</code>:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">emp</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="cm">/* ... */</span><span class="w"> </span><span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="n">department</span><span class="p">,</span><span class="w"> </span><span class="k">role</span><span class="p">,</span><span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">emp</span>
<span class="hll"><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">GROUPING</span><span class="w"> </span><span class="k">SETS</span><span class="w"> </span><span class="p">(</span>
</span><span class="hll"><span class="w"> </span><span class="p">(),</span><span class="w"> </span><span class="c1">-- <-- Grand total</span>
</span><span class="hll"><span class="w"> </span><span class="p">(</span><span class="k">role</span><span class="p">),</span><span class="w"> </span><span class="c1">-- <-- Subtotal by role</span>
</span><span class="hll"><span class="w"> </span><span class="p">(</span><span class="n">department</span><span class="p">),</span><span class="w"> </span><span class="c1">-- <-- Subtotal by department</span>
</span><span class="hll"><span class="w"> </span><span class="p">(</span><span class="k">role</span><span class="p">,</span><span class="w"> </span><span class="n">department</span><span class="p">)</span><span class="w"> </span><span class="c1">-- <-- No aggregation, the row itself</span>
</span><span class="hll"><span class="p">);</span>
</span>
<span class="go"> department │   role    │ count</span>
<span class="go">────────────┼───────────┼───────</span>
<span class="go"> ¤          │ ¤         │     6</span>
<span class="go"> Sales      │ Developer │     2</span>
<span class="go"> Sales      │ Manager   │     1</span>
<span class="go"> R&D        │ Developer │     2</span>
<span class="go"> R&D        │ Manager   │     1</span>
<span class="go"> ¤          │ Manager   │     2</span>
<span class="go"> ¤          │ Developer │     4</span>
<span class="go"> R&D        │ ¤         │     3</span>
<span class="go"> Sales      │ ¤         │     3</span>
</pre></div>
<p>Using <code>GROUPING SETS</code> you can tell the database exactly which subtotals to generate. The query above is generating all possible combinations, so it's equivalent to <code>CUBE</code>.</p>
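<p>One way to build intuition for how <code>ROLLUP</code> and <code>CUBE</code> relate to <code>GROUPING SETS</code> is to expand them by hand: <code>ROLLUP</code> produces every prefix of its column list, and <code>CUBE</code> produces every subset. The sketch below does this in plain Python and counts the employees for each grouping set. The helpers <code>rollup</code>, <code>cube</code> and <code>count_by_grouping_sets</code> are hypothetical names used only here, and the sketch assumes the data itself contains no missing values.</p>

```python
from collections import Counter
from itertools import combinations

# Same employees as in the article, as (name, department, role) tuples.
EMP = [
    ("Haki", "R&D", "Manager"),
    ("Dan", "R&D", "Developer"),
    ("Jax", "R&D", "Developer"),
    ("George", "Sales", "Manager"),
    ("Bill", "Sales", "Developer"),
    ("David", "Sales", "Developer"),
]


def rollup(*cols):
    """ROLLUP(a, b) -> (a, b), (a), (): every prefix of the column list."""
    return [cols[:size] for size in range(len(cols), -1, -1)]


def cube(*cols):
    """CUBE(a, b) -> (a, b), (a), (b), (): every subset of the column list."""
    return [
        subset
        for size in range(len(cols), -1, -1)
        for subset in combinations(cols, size)
    ]


def count_by_grouping_sets(rows, grouping_sets):
    """Count employees per (department, role) for every grouping set.

    A column that is not part of the current grouping set is reported as None,
    similar to the NULL the database puts in subtotal rows.
    """
    counts = Counter()
    for grouping_set in grouping_sets:
        for _name, department, role in rows:
            key = (
                department if "department" in grouping_set else None,
                role if "role" in grouping_set else None,
            )
            counts[key] += 1
    return counts


cube_counts = count_by_grouping_sets(EMP, cube("department", "role"))
rollup_counts = count_by_grouping_sets(EMP, rollup("department", "role"))
print(cube_counts[(None, None)], cube_counts[(None, "Developer")])
print(len(rollup_counts), len(cube_counts))
```

<p>With two columns, the rollup expansion yields three grouping sets and the cube expansion yields four, which is why the <code>CUBE</code> result has the extra per-role subtotals.</p>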
<hr>
<h2 id="pivot-tables"><a class="toclink" href="#pivot-tables">Pivot Tables</a></h2>
<p>Pivot tables are a technique to reshape data, and pandas includes a very powerful <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html" rel="noopener"><code>pivot_table</code> function</a>:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
<span class="gp">>>> </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">({</span>
<span class="gp">... </span> <span class="s1">'name'</span><span class="p">:</span> <span class="p">[</span><span class="s1">'Haki'</span><span class="p">,</span> <span class="s1">'Dan'</span><span class="p">,</span> <span class="s1">'Jax'</span><span class="p">,</span> <span class="s1">'George'</span><span class="p">,</span> <span class="s1">'Bill'</span><span class="p">,</span> <span class="s1">'David'</span><span class="p">],</span>
<span class="gp">... </span> <span class="s1">'department'</span><span class="p">:</span> <span class="p">[</span><span class="s1">'R&D'</span><span class="p">,</span> <span class="s1">'R&D'</span><span class="p">,</span> <span class="s1">'R&D'</span><span class="p">,</span> <span class="s1">'Sales'</span><span class="p">,</span> <span class="s1">'Sales'</span><span class="p">,</span> <span class="s1">'Sales'</span><span class="p">,],</span>
<span class="gp">... </span> <span class="s1">'role'</span><span class="p">:</span> <span class="p">[</span><span class="s1">'Manager'</span><span class="p">,</span> <span class="s1">'Developer'</span><span class="p">,</span> <span class="s1">'Developer'</span><span class="p">,</span> <span class="s1">'Manager'</span><span class="p">,</span> <span class="s1">'Developer'</span><span class="p">,</span> <span class="s1">'Developer'</span><span class="p">],</span>
<span class="gp">... </span><span class="p">})</span>
<span class="gp">>>> </span><span class="n">pd</span><span class="o">.</span><span class="n">pivot_table</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">values</span><span class="o">=</span><span class="s1">'name'</span><span class="p">,</span> <span class="n">index</span><span class="o">=</span><span class="s1">'role'</span><span class="p">,</span> <span class="n">columns</span><span class="o">=</span><span class="s1">'department'</span><span class="p">,</span> <span class="n">aggfunc</span><span class="o">=</span><span class="s1">'count'</span><span class="p">)</span>
<span class="go">department  R&D  Sales</span>
<span class="go">role</span>
<span class="go">Developer     2      2</span>
<span class="go">Manager       1      1</span>
</pre></div>
<h3 id="conditional-expressions"><a class="toclink" href="#conditional-expressions">Conditional Expressions</a></h3>
<p>To recreate the "pivot" above in SQL, do the following:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">emp</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="k">VALUES</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'Haki'</span><span class="p">,</span><span class="w"> </span><span class="s1">'R&D'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Manager'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'Dan'</span><span class="p">,</span><span class="w"> </span><span class="s1">'R&D'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Developer'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'Jax'</span><span class="p">,</span><span class="w"> </span><span class="s1">'R&D'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Developer'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'George'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Sales'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Manager'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'Bill'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Sales'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Developer'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'David'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Sales'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Developer'</span><span class="p">)</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">t</span><span class="p">(</span>
<span class="w"> </span><span class="k">name</span><span class="p">,</span><span class="w"> </span><span class="n">department</span><span class="p">,</span><span class="w"> </span><span class="k">role</span>
<span class="w"> </span><span class="p">)</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="k">role</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="n">SUM</span><span class="p">(</span><span class="k">CASE</span><span class="w"> </span><span class="n">department</span><span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="s1">'R&D'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="k">ELSE</span><span class="w"> </span><span class="mf">0</span><span class="w"> </span><span class="k">END</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="s s-Name">"R&D"</span><span class="p">,</span>
</span><span class="hll"><span class="w"> </span><span class="n">SUM</span><span class="p">(</span><span class="k">CASE</span><span class="w"> </span><span class="n">department</span><span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="s1">'Sales'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="k">ELSE</span><span class="w"> </span><span class="mf">0</span><span class="w"> </span><span class="k">END</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="s s-Name">"Sales"</span>
</span><span class="k">FROM</span>
<span class="w"> </span><span class="n">emp</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="k">role</span><span class="p">;</span>
<span class="go">   role    │ R&D │ Sales</span>
<span class="go">───────────┼─────┼───────</span>
<span class="go"> Manager   │   1 │     1</span>
<span class="go"> Developer │   2 │     2</span>
</pre></div>
<p>Using <code>CASE</code>, you constructed a <a href="https://www.postgresql.org/docs/current/functions-conditional.html#FUNCTIONS-CASE" rel="noopener">"Conditional Expression"</a> that returns the value 1 to <code>sum</code> only for a specific department. By adding a conditional expression for every department, you "reshaped" the data to a pivot table.</p>
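<p>Writing one conditional expression per department by hand gets tedious when the list of departments is long or changes over time. As a rough sketch, the expressions can be generated from the data. The example below uses Python's built-in <code>sqlite3</code> module only because it needs no setup; the article's queries target PostgreSQL, and the helper <code>pivot_count_sql</code> is a hypothetical name for this illustration.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, department TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("Haki", "R&D", "Manager"),
        ("Dan", "R&D", "Developer"),
        ("Jax", "R&D", "Developer"),
        ("George", "Sales", "Manager"),
        ("Bill", "Sales", "Developer"),
        ("David", "Sales", "Developer"),
    ],
)


def pivot_count_sql(conn):
    """Build a pivot query with one conditional SUM per department."""
    departments = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT department FROM emp ORDER BY department"
        )
    ]
    # The department value is passed as a bound parameter (?). The column
    # alias is a quoted identifier, so embedded double quotes are doubled.
    columns = ", ".join(
        'SUM(CASE department WHEN ? THEN 1 ELSE 0 END) AS "{}"'.format(
            department.replace('"', '""')
        )
        for department in departments
    )
    sql = f"SELECT role, {columns} FROM emp GROUP BY role ORDER BY role"
    return sql, departments


sql, params = pivot_count_sql(conn)
cursor = conn.execute(sql, params)
headers = [column[0] for column in cursor.description]
rows = cursor.fetchall()
print(headers)
print(rows)
```

<p>The generated query has the same shape as the handwritten one above, with one <code>SUM(CASE ...)</code> column for every distinct department found in the table.</p>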
<figure>
<p><svg version="1.1" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 385.5359422599188 79.18104057052943" width="auto"
height="10em">
<g transform="translate(10 22.169329432965583) rotate(0 16.375420648808927 3.9274449462574808)">
<path
d="M-0.4368722550570965 -0.26062386110424995 L32.17644193436828 1.1683365814387798 L33.52596392895904 7.79942736449482 L-1.2316900603473186 8.943986867096443"
stroke="none" stroke-width="0" fill="#f41d92"></path>
<path
d="M-1.371488381177187 0.884352196007967 C7.760223944166898 -1.898048547422233, 15.79203316282153 1.838062497461495, 31.968381155690814 1.866489041596651 M0.0025178734213113785 -0.45973207987844944 C12.214591764951575 0.7030594047828915, 23.46614968052969 -0.9176273050979373, 32.165179165488745 -0.496780002489686 M33.17431992253075 0.17124689128179016 C32.499750820766856 1.8396612329139987, 33.36442335639901 5.008142184964745, 32.88014092423447 7.781657712512918 M32.88561733518119 0.3543002405216675 C32.70941245070828 1.8408805617858388, 32.46175908872408 3.788729361909405, 32.65053552117902 7.74888516151089 M31.69276057597127 9.2403752920294 C24.018667671607655 8.22098037869831, 16.517865019929587 7.797537895750008, -1.2292835004627705 9.11103905412199 M31.857605429774786 8.466313241522927 C23.636092059745067 7.447658928165478, 15.280356659075697 7.308909626731915, -0.7755557354539633 7.003983794253488 M-0.11059429307389457 7.5409398757176005 C0.6094023991872307 5.189143648697765, 0.7598314125533342 3.200698704023167, 0.2047759205761739 0.14502139166133843 M0.037555551495630946 7.790328945663456 C-0.36718932949990774 5.706542297453959, 0.3364537393442586 2.740053618113184, 0.17562924947138553 0.01684024184891103"
stroke="#000000" stroke-width="1" fill="none"></path>
</g>
<g transform="translate(10.499393115351722 33.69161737603281) rotate(0 16.375420648808927 3.9274449462574808)">
<path
d="M0.38576923683285713 1.4668311588466167 L34.157709982654715 -1.0417045839130878 L32.62887122895446 8.237705443527718 L-1.110537063330412 9.120305870201607"
stroke="none" stroke-width="0" fill="#f41d92"></path>
<path
d="M-0.7049884088337421 1.5045171864330769 C12.489281693905747 2.224593871203693, 25.251495369643283 -1.1649702853285948, 32.470585931501056 -0.20677782222628593 M-0.7034308966249228 0.6871890183538198 C6.879229300942518 -0.33431184130316216, 15.64889549521406 0.5823568717896418, 31.758937808162237 -0.1733395103365183 M33.43466232425595 -0.6645074555353025 C32.48144020183008 2.9518121452945794, 33.283165807752994 6.505706190484923, 32.36405995307544 8.605881995380011 M32.519909780072545 0.3644520294580969 C32.6732361599081 1.629224343175423, 32.59765946304875 2.885788760798209, 32.93529680238145 8.18886717218381 M30.76303099509206 6.263776671614666 C21.955307877678038 8.33765511365824, 10.44965919263706 6.908410476168922, -1.030583668500185 8.55062772008421 M33.18811372149613 7.432062922518869 C22.709317231607287 7.901815042944403, 14.075851933735045 8.148687766046972, -0.3600617107003927 7.316052673857827 M-0.46882027542046883 7.369155406674746 C-0.07360801185760682 5.066238700972, -0.6729493693638934 3.199593733708943, 0.64254281714411 0.3660967787481616 M-0.056430715817756605 7.963621377774218 C-0.10127834745169215 5.46483974441213, -0.2575197031822929 3.5208383253546907, -0.2527697920163717 0.3353286104675181"
stroke="#000000" stroke-width="1" fill="none"></path>
</g>
<g transform="translate(10.499393115351722 47.02495070936618) rotate(0 16.375420648808927 3.9274449462574808)">
<path
d="M-1.1919469945132732 -0.4733721502125263 L34.60377838875976 -0.8736734874546528 L31.657928374073173 7.855157468941231 L0.9161816723644733 8.860396598007698"
stroke="none" stroke-width="0" fill="#f41d92"></path>
<path
d="M1.3147425167262554 -0.06397510692477226 C10.319408291618881 -1.134460561192507, 21.705860412947054 1.0059855056638773, 31.005205620488788 0.42022930458188057 M-0.5615162309259176 -0.949983024969697 C10.65003837301652 0.16378142255538286, 21.689529149444308 -0.19038383346802412, 31.968023570663 0.605502163991332 M32.98998918765364 0.3616138793969922 C33.142414543052496 3.1664764010743274, 32.066409938717584 5.531750166523653, 32.307767050567094 7.5121098466882135 M32.508892873494226 0.3434003106956088 C32.88496361331082 3.186716285638144, 32.995160138315626 6.28366201797717, 32.618635583946435 7.465433940341823 M32.086648334226275 6.596354019370098 C26.811324617330982 7.037266656449731, 19.831605606246235 6.295991226724084, -0.7167357690632343 6.302581441130657 M32.163741918689276 7.980769632857461 C22.41707112404896 7.424154456365103, 11.379317947206495 7.268479343163961, -0.6154801901429892 8.010631500285287 M-0.14644128858115668 8.38835205496855 C-0.7024841569900766 5.823467118163387, 0.7027274400167905 2.903468010033846, -0.2549751612286896 0.7064539008715781 M0.07804720528104209 8.066343563520345 C-0.2924538399252803 5.1945894328389635, -0.20735668213264202 3.2238318699705117, 0.0864111250005199 0.2048819746505272"
stroke="#000000" stroke-width="1" fill="none"></path>
</g>
<g transform="translate(11.166059782018692 60.35828404269955) rotate(0 16.375420648808927 3.9274449462574808)">
<path
d="M1.6934340707957745 -0.24396861717104912 L32.14172183390584 1.1943671591579914 L34.047981013021136 6.086092126097698 L-1.6369827277958393 9.061636459555645"
stroke="none" stroke-width="0" fill="#f41d92"></path>
<path
d="M-1.5656354539096355 1.211004327982664 C11.488749353398255 1.626340479382598, 23.92666763848935 0.5975486941424245, 33.67157700326171 0.8332471363246441 M-0.5640744213014841 -0.4363906439393759 C13.551296596213174 0.19196625648148147, 25.206559810383105 -0.8297487288891545, 33.625201894378925 0.26036625541746616 M32.48642987027505 -0.7789119043462803 C32.1527568981245 2.8207454355416157, 33.55347941569963 4.671251201408914, 32.25655826215116 8.33202238507747 M32.61009428391735 -0.3048302991211035 C32.90728119575209 1.960823212839144, 32.519274318116366 3.5911651915436194, 32.800279872622255 7.988794585537083 M31.51988091733184 8.16637310805561 C23.161108838698333 7.320972588170523, 16.04939562847303 7.368116166699881, 1.3582931645214558 6.0242345077061845 M32.42623437795665 8.754270928453822 C22.237012631422303 6.903288632444322, 10.97427976123073 7.82523438315767, 0.5384000893682241 6.965603786062617 M0.1728222500010398 8.264653841816017 C-0.4152976070480574 5.885636387734535, 0.7805638543665823 3.9450810424660583, -0.4292355338274907 0.00010508916830820336 M0.25986299119261264 8.22766839397423 C-0.41570154451718183 4.785940353100504, -0.040894710329698936 2.2024517117195304, 0.1871846584819145 0.11080622053132522"
stroke="#000000" stroke-width="1" fill="none"></path>
</g>
<g transform="translate(48.583029891008664 22.59714007116588) rotate(0 16.375420648808927 3.9274449462574808)">
<path
d="M-0.3338521011173725 0.6252269633114338 L33.0815873038384 1.4404662735760212 L33.498333085736895 6.030994188513775 L-1.269527841359377 6.476459634032269"
stroke="none" stroke-width="0" fill="#daaeff"></path>
<path
d="M1.8244913704693317 -1.9770560748875141 C8.670833417842378 -0.17420140553788113, 14.156202107061128 1.8975813026301487, 31.640900757572318 -1.9398085363209248 M0.8735440131276846 0.8430576864629984 C8.247574278595437 -0.9213133427182587, 17.774961096187212 0.29107773575387874, 31.83556927595165 0.9042421523481607 M32.34746631092574 -0.6129170471443197 C32.80483786931608 3.135240652784645, 31.931774657024718 5.441222242448953, 32.149322112785256 8.308295510758045 M32.42412336970322 0.04682782601157581 C32.58174280190645 1.4859834409189188, 32.99398893872321 3.275068701379869, 32.87721857496489 8.104954186007802 M32.51646068360534 6.068270896103401 C19.003525466923104 9.120869662810712, 8.000954782711574 6.085238125373274, -1.996350061148405 6.041088197853584 M31.993426634407307 8.691682057928462 C20.48049727385122 8.986350762706213, 6.691114895106658 8.427436756949835, 0.7322231288999319 6.997545021128078 M0.11450087650273277 8.226171182403021 C0.4113266960618521 6.781523339844184, 0.6526825793569896 5.244662702410315, 0.2618688772361525 0.583758031240812 M-0.25132698461321795 7.927454617231808 C0.30283018583463933 5.695118294162205, 0.2391052448463526 3.294981172619908, 0.1990879384337721 -0.2839073554465046"
stroke="#000000" stroke-width="1" fill="none"></path>
</g>
<g transform="translate(49.08242300635993 34.11942801423322) rotate(0 16.375420648808927 3.9274449462574808)">
<path
d="M0.6252269633114338 0.33074600622057915 L34.19130757119384 0.7474917881190777 L30.92694559361663 6.585362051155586 L-1.3784302584826946 9.647882674362679"
stroke="none" stroke-width="0" fill="#daaeff"></path>
<path
d="M-1.9770560748875141 1.5262091048061848 C11.44104875381004 -0.43181465562683063, 25.314182116443142 -1.4087086766610715, 30.811032761296893 -0.4978567473590374 M0.8430576864629984 -0.7624167446047068 C9.136074494263214 -0.5311139636173644, 20.10321888079878 -1.201365035070459, 33.65508344996598 0.6729359980672598 M32.137924250473496 0.10948644340395153 C32.96688740785411 1.037991481810162, 32.38781821281113 3.0016173165625832, 33.2042469158609 7.332291600638933 M32.79766912362939 -0.2113216610315695 C32.37053992439359 2.979623096741713, 32.325756508976085 5.683774519260112, 33.00090559111066 8.099103257643113 M30.964222301206256 5.969314110007305 C25.92644756636661 6.7901026625190255, 14.695019724837508 5.8201590199027535, -1.8138016946613789 9.83328689309599 M33.58763346303132 8.649506567519326 C25.17449698892245 7.677025213232763, 16.09771953886143 8.339053347101935, -0.8573448713868856 6.96060317615809 M0.37128128988805786 8.257255316812754 C0.5028620894382101 6.231756949994017, 0.6200160508266299 3.583050269523367, 0.583758031240812 -0.048074382937242754 M0.07256472471684416 8.131313626877244 C0.12290825137678935 5.975186232858057, 0.031163714651739616 3.8691729248869366, -0.2839073554465046 -0.06555928736654332"
stroke="#000000" stroke-width="1" fill="none"></path>
</g>
<g transform="translate(49.08242300635993 47.45276134756648) rotate(0 16.375420648808927 3.9274449462574808)">
<path
d="M0.33074600622057915 1.4404662735760212 L33.498333085736895 -1.8238957040011883 L31.48131345625844 6.476459634032269 L1.7929927818477154 9.409462463583965"
stroke="none" stroke-width="0" fill="#daaeff"></path>
<path
d="M1.5262091048061848 -0.3603019006550312 C7.831200919102586 2.0352320562386597, 14.81855433618219 -1.616057287595264, 32.25298455025878 -0.043051768094301224 M-0.7624167446047068 -0.6186788138002157 C12.606576728252701 0.3176037797310567, 24.787923335265383 0.6281336674072957, 33.42377729568508 -0.7064372953027487 M32.86032774102177 0.25018986807735744 C31.946927030061197 1.6861500623781938, 32.108984614116046 4.4834643969307155, 32.22824300574179 8.277121545716993 M32.53951963658625 -0.34788523495954 C32.88642218782009 1.6396369347306, 32.81187498938197 4.314675463708137, 32.99505466274597 7.659487307012284 M30.86526551511016 9.226292465355415 C22.953410515247363 5.898094551864055, 13.212442508792765 5.748521821799663, 1.978397000581026 7.056504939224739 M33.54545797262218 8.796409028124232 C21.09636655754343 8.511551496453329, 10.033724956307452 7.271601495213552, -0.8942867163568735 7.740804629873653 M0.40236542429779065 8.435537938666473 C0.6322712199426548 5.418365365167414, 0.2504187895858792 2.1699948539808185, -0.048074382937242754 -0.2640645147235898 M0.27642373436228035 8.003510878979451 C0.2629641902966372 5.298172023622927, 0.24935333535641568 2.3437938998055117, -0.06555928736654332 0.1227772238660701"
stroke="#000000" stroke-width="1" fill="none"></path>
</g>
<g transform="translate(49.74908967302781 60.78609468089985) rotate(0 16.375420648808927 3.9274449462574808)">
<path
d="M1.4404662735760212 0.7474917881190777 L30.92694559361663 -1.269527841359377 L31.372411039135123 9.647882674362679 L1.5545725710690022 8.627467964317818"
stroke="none" stroke-width="0" fill="#daaeff"></path>
<path
d="M-0.3603019006550312 -0.13304651901125908 C14.52441456295406 -1.3598637009379333, 23.6860589745612 -0.747779908251471, 32.70778952952352 1.8481023199856281 M-0.6186788138002157 -0.24502095021307468 C13.279894788000165 -0.6655741236142175, 26.276607199004648 0.9226338961192114, 32.04440400231507 -0.4346815589815378 M33.001031165695174 -0.7635767688874102 C32.36067026593776 1.351572796229851, 33.142955211146464 3.3835856702487943, 33.17307295081985 7.530728509868123 M32.402956062658276 0.20092447578519396 C32.36753821244309 1.8294083892723685, 33.010271156312974 3.6502755889787393, 32.55543871211514 7.7284680246576 M34.12224387045827 6.82848347398283 C21.64516805837353 5.5497688653288595, 12.05415105366184 9.52451592705829, -0.7983849532902241 9.014809620108624 M33.69236043322709 7.925084887545724 C25.788790781314972 8.847356667569981, 17.20418513391946 7.220846822313176, -0.11408526264131069 8.845176933806558 M0.580648046151509 8.498611200107892 C0.660673878551123 5.645790925493894, 0.5466299022520063 2.864879821063277, -0.2640645147235898 0.2571273508769846 M0.1486209864644878 8.067588685888957 C0.042363504796069235 6.200164832420683, -0.29842030038987344 4.081704608092314, 0.1227772238660701 0.06494933653129298"
stroke="#000000" stroke-width="1" fill="none"></path>
</g>
<g transform="translate(197.29151494550524 21.71656808539899) rotate(0 16.375420648808927 3.9274449462574808)">
<path
d="M-0.018450189381837845 1.3909083493053913 L33.28777061816182 -1.884768020361662 L33.11373041983571 8.875708472456951 L-0.06632762029767036 6.154330503668804"
stroke="none" stroke-width="0" fill="#f41d92"></path>
<path
d="M1.1443223841488361 1.5858052484691143 C11.762245674471018 0.683103806193028, 22.229278614451403 0.40946186226144843, 31.2956507470819 0.49693508073687553 M-0.6781304124742746 -0.03314054571092129 C9.366451270189623 0.5398594073560783, 18.673215188571294 -0.25829524476260457, 33.513377501106525 0.38507860340178013 M33.44177472206316 0.1848477825978011 C32.36293343221748 2.0358641542126077, 32.8192287721528 5.214962610504115, 32.67679053785331 7.325566394243047 M33.12675487172356 0.14299495948346813 C32.36251583946239 1.6827791876164508, 32.59915554877432 3.7050299455062734, 33.03538797537623 8.147105917116912 M31.15023365285125 8.923269365456123 C24.768756668313195 8.955077126197617, 16.584289724058344 8.639694764785569, -0.5332733504474163 6.2169904452824785 M32.373099102115894 7.861370819162746 C22.201264844362377 8.448445410082385, 10.061022002014138 8.141070634672687, 0.8031702395528555 7.6855233484024 M0.06805780513065574 8.158248304642164 C0.33566333761817735 4.652967256015906, -0.19206296535719947 3.0003545054690495, -0.06826611071558508 0.6383685472709865 M-0.37730420489118693 8.136809364475065 C-0.10550865590764487 5.1922142145586845, -0.06193166712701288 2.8459624795932896, 0.23009450288338723 0.07744066782835901"
stroke="#000000" stroke-width="1" fill="none"></path>
</g>
<g transform="translate(197.7909080608565 33.23885602846633) rotate(0 16.375420648808927 3.9274449462574808)">
<path
d="M1.3909083493053913 0.5369293205440044 L30.866073277256156 0.3628891222178936 L33.771659877559806 7.788562272217293 L-1.700559388846159 8.152630899574776"
stroke="none" stroke-width="0" fill="#f41d92"></path>
<path
d="M1.5858052484691143 0.40514885261654854 C9.578639825373067 -0.6723026332403869, 17.943373053400066 -1.6425781545187683, 33.24777637835469 -0.4467177502810955 M-0.03314054571092129 0.2727179881185293 C10.993519611220323 0.7078420396094375, 20.81888010079317 0.9846296186690383, 33.1359199010196 0.1292648073285818 M32.93568908021562 -0.36760411854089825 C32.37969853736669 2.2421207205545395, 33.100793943049396 4.2328082385557355, 32.221517799345904 7.2661471815178 M32.893836257101285 -0.3675483156614259 C32.64479237655044 2.0177869364649226, 32.89931887768294 4.581937763387239, 33.04305732221977 7.603344474826833 M33.81922077055898 7.600428235259075 C25.975575222511786 6.881478028285441, 18.00358418496264 7.0595103073000285, -1.6378994472324848 8.810672533240338 M32.7573222242656 8.69570010761561 C22.719402352708915 7.82213018666048, 11.630695509106332 8.534346385104847, -0.16936654411256313 8.362439034979959 M0.30335841212720116 8.183064058242177 C-0.4189979106825804 5.3737625064219445, 0.6835305694353457 3.2234732841096663, 0.6383685472709865 -0.04490301841139244 M0.2819194719601012 7.792982096388069 C-0.36901640477343023 5.044083519606704, -0.4037041903957285 2.500033264274357, 0.07744066782835901 -0.003623105152259609"
stroke="#000000" stroke-width="1" fill="none"></path>
</g>
<g transform="translate(197.7909080608565 46.57218936179959) rotate(0 16.375420648808927 3.9274449462574808)">
<path
d="M0.5369293205440044 -1.884768020361662 L33.11373041983571 1.020818579941988 L32.68451367732015 6.154330503668804 L0.29774100705981255 8.11104167672636"
stroke="none" stroke-width="0" fill="#f41d92"></path>
<path
d="M0.40514885261654854 0.9402646534144878 C11.91194583224241 0.4069404293752015, 23.338531172463973 0.23725280062916876, 32.30412354733672 0.4874761812388897 M0.2727179881185293 0.3700044695287943 C8.612451635441612 -0.41729637505894024, 17.01594222551372 0.39593241093272846, 32.8801061049464 -0.2584854420274496 M32.38323717907692 -0.42213889639619906 C32.90862586145677 3.165384838491714, 32.74588388029784 5.203533160138508, 32.16209858662065 7.647118467574888 M32.383292981956394 -0.08494506914087463 C32.69511540608947 2.315009880481052, 33.11057069019738 4.583072921580808, 32.49929587992969 7.594828518395189 M32.49637964036193 8.736232493546028 C20.02583695235458 8.595801764953622, 8.190170515567939 6.391942078102121, 0.9557826407253742 8.438740347054024 M33.591651512718464 8.604783015798946 C24.57991900660656 8.299530798390277, 17.030258872931178 7.68764590640343, 0.5075491424649954 7.759545820307155 M0.32817416572721403 7.408108486680687 C-0.20635791776122836 6.432004950561079, -0.07507189122865066 4.3359944073108805, -0.04490301841139244 0.43600859780750756 M-0.06190779612689412 7.5037781639017815 C0.05713003363197525 4.933576648922507, 0.30555534386162464 2.8613031016293684, -0.003623105152259609 0.2731357983593397"
stroke="#000000" stroke-width="1" fill="none"></path>
</g>
<g transform="translate(198.4575747275244 59.90552269513307) rotate(0 16.375420648808927 3.9274449462574808)">
<path
d="M-1.884768020361662 0.3628891222178936 L33.771659877559806 -0.06632762029767036 L31.05028190877166 8.152630899574776 L0.25615178421139717 8.557665560867806"
stroke="none" stroke-width="0" fill="#f41d92"></path>
<path
d="M0.9402646534144878 -0.48491502925753593 C12.365751138596764 -1.5215353794056998, 23.895191938964587 -0.5130625791508782, 33.23831747885671 -1.0372554175555706 M0.3700044695287943 0.4857486244291067 C7.17616805489215 0.6732187680099517, 15.593715078365857 0.03994737184982605, 32.49235585559037 0.6492278929799795 M32.32870240122162 0.08869122139441776 C33.068282585164106 2.9733496324353235, 32.240002577605686 5.5060580734025, 32.54306987267774 7.643851373249904 M32.665896228476946 -0.13090860634949136 C32.949021721506135 3.392840793724386, 33.07165631411646 5.9650428142438185, 32.490779923498046 8.117325974297563 M33.63218389864888 7.14358426305296 C26.112611495626677 7.279947375301965, 16.704561767165018 8.769003366474756, 0.5838504545390606 6.900180947508831 M33.5007344209018 7.945843933623452 C24.605244725836478 8.93343255122959, 15.405244914194007 8.63781145414173, -0.0953440722078085 7.501180349391122 M-0.4467814058342759 7.6553377552668005 C0.6254112330709302 5.3620990089142335, 0.6080327060582663 2.9609373283332814, 0.43600859780750756 0.17904676160248767 M-0.35111172861318146 7.836559085168701 C-0.4421834086275 6.483125837426236, 0.02105677343633877 4.647549671418473, 0.2731357983593397 0.10543801732340069"
stroke="#000000" stroke-width="1" fill="none"></path>
</g>
<g transform="translate(340.99935944157005 10.766680953518517) rotate(0 15.47968001294737 3.7126124781994463)">
<path
d="M0.3628891222178936 1.020818579941988 L30.893032405597147 -1.700559388846159 L31.25710103295463 7.681376740610343 L0.7027756683528423 7.109512016451102"
stroke="none" stroke-width="0" fill="#daaeff"></path>
<path
d="M-0.48491502925753593 0.6666227094829082 C10.287393238147974 0.3513851211134638, 23.038449827086687 0.341926221615478, 29.922104608339247 -1.3562608249485493 M0.4857486244291067 -0.4281501825898886 C9.94774320264525 0.2705275875684584, 18.49967880564167 -0.37303645786077133, 31.608587918874797 0.8796220365911722 M31.043199809713922 0.2826034946393512 C30.85461026005351 1.901934700732038, 30.368072119679045 4.6082018129321085, 30.75986537969976 8.135926924241652 M30.83561216129395 0.16030527457373467 C31.15885324975516 2.7741453843663484, 30.652085559754692 4.749991400035879, 31.20744077184755 7.128103160665854 M30.248054396432813 7.9911851960179625 C19.4844614303259 5.923754523623858, 10.031892176377188 8.145504425395403, -0.9547089450061321 6.669740565395099 M31.050314067003306 7.86774330427323 C22.454657723540215 7.488020524133298, 12.851164084544898 7.562042996038053, -0.3537095431238413 7.511868825456244 M-0.18863657287032853 8.045102460435809 C-0.09555422670115264 5.5143424453548135, -0.07346910285988127 2.8087195495400206, 0.16925284769173754 -0.7133310935349857 M-0.01732810644591065 7.060528796367773 C0.229212545603601 4.7378653313579955, 0.008279363893460831 2.1622916128899394, 0.09967052476814175 -0.3498706635453012"
stroke="#000000" stroke-width="1" fill="none"></path>
</g>
<g transform="translate(305.55004749318505 10.315099895649041) rotate(0 15.47968001294737 3.7126124781994463)">
<path
d="M1.020818579941988 -0.06632762029767036 L29.258800637048658 0.29774100705981255 L31.215511810106214 8.128000624751788 L-0.31571293994784355 7.645900734042865"
stroke="none" stroke-width="0" fill="#daaeff"></path>
<path
d="M0.6666227094829082 -1.4551905505359173 C10.411900548117018 -0.37165445541733866, 20.317407115999174 -0.9621921226918138, 29.603099200946268 -0.06628109142184258 M-0.4281501825898886 0.7625362034887075 C10.517204049140153 -0.021119858129256985, 20.005765449449296 0.4988432275221407, 31.83898206248599 0.23532829247415066 M31.241963520534167 -0.07000016002359033 C30.419085912023434 1.3235331283444283, 30.72304898628124 3.5606482112899025, 31.670061993737523 7.695571137268439 M31.11966530046855 0.2689818854068856 C31.219532033018673 1.29489506686068, 30.697464380899024 3.3134424888612695, 30.662238230161726 7.623548904533623 M31.525320265513834 6.89195160595153 C20.427551073882434 8.27089976176952, 13.755391470874077 6.360408176038014, -0.7554843910038471 7.4381868096945105 M31.4018783737691 8.228395195951801 C18.899090857738713 7.960932887908819, 7.082210705599923 7.099674202319982, 0.0866438690572977 7.811428221216541 M0.6198775040368625 7.360693023310402 C0.6456838201990842 4.844987724105903, 0.4543930233206109 2.518896957999142, -0.7133310935349857 0.5329967771763948 M-0.3646961600311723 7.642733217942879 C0.13958580048343278 5.311275389909739, 0.32457634611145597 3.3038461885210904, -0.3498706635453012 0.06736333416745027"
stroke="#000000" stroke-width="1" fill="none"></path>
</g>
<g transform="translate(269.31355949354383 10) rotate(0 15.47968001294737 3.7126124781994463)">
<path
d="M-0.06632762029767036 -1.700559388846159 L31.25710103295463 0.25615178421139717 L31.66213569424766 7.109512016451102 L0.220675777643919 7.665687606012565"
stroke="none" stroke-width="0" fill="#daaeff"></path>
<path
d="M-1.4551905505359173 0.49693508073687553 C10.296589853721466 0.3842851189172955, 20.449359790449552 -1.4594518872701434, 30.893078934472975 0.5454359762370586 M0.7625362034887075 0.38507860340178013 C10.562582973430244 -0.12593288554791998, 21.515864225183304 1.0121745930707018, 31.194688318368968 -0.4679939802736044 M30.889359865871228 -0.5003693372102275 C30.411205703878675 1.6321134573557077, 30.768249358081672 4.367741408458966, 31.229706206764313 6.730338340933529 M31.2283419113017 0.27623171657717127 C30.69682464838106 1.8211344568440577, 31.18269119892738 3.8368189890399753, 31.157683974029492 7.3779890802013695 M30.4260866754474 5.787325509166461 C21.26304451427778 8.09668424503154, 8.700454776203863 6.757349399488633, 0.012961853295564651 9.106845386600238 M31.762530265447673 7.255858412286383 C19.857480042122212 7.466881732690179, 7.386792230295775 7.648869673955285, 0.3862032648175955 7.843020966550451 M-0.06453193308854399 8.028674544291134 C-0.023583151960239766 6.032158583561937, 0.1881165049787704 3.1014435351625145, 0.5329967771763948 -0.11704284059668535 M0.21750826154393244 7.498429595454074 C0.00004070112297441253 6.119664268445459, 0.10313614605732388 3.9478425475803687, 0.06736333416745027 0.18949518989352454"
stroke="#000000" stroke-width="1" fill="none"></path>
</g>
<g transform="translate(233.70727128519957 10.315099895649041) rotate(0 15.47968001294737 3.7126124781994463)">
<path
d="M-1.700559388846159 0.29774100705981255 L31.215511810106214 0.7027756683528423 L30.643647085946974 7.645900734042865 L0.24046264961361885 6.797784455394488"
stroke="none" stroke-width="0" fill="#daaeff"></path>
<path
d="M0.49693508073687553 -0.4467177502810955 C9.024651240369767 -0.8119965767539629, 15.718089293313206 0.15897774937976517, 31.504796002131876 0.7400089390575886 M0.38507860340178013 0.1292648073285818 C10.533169220319385 0.41315156441412065, 22.46293136128484 -0.0007480360917082041, 30.491366045621213 -0.5374217871576548 M30.45899068868459 -0.5565383003980514 C30.79121411240633 2.1119126245437325, 31.698322343201458 4.893160722347073, 30.2644734104294 7.264627861865796 M31.23559174247199 -0.2377858045935869 C30.745585145267086 2.228152503670688, 30.69429928507647 4.158467463570582, 30.91212414969724 7.588829133306781 M29.321460578662332 8.38100759712432 C23.356480826213428 6.38796700314923, 13.83041632645007 7.355637801450927, 1.681620430201292 8.92501120296691 M30.789993481782254 7.932774098863941 C21.29462005546979 7.320171779359115, 11.907212098517675 8.060084587300551, 0.41779601015150547 6.8564309797195016 M0.6034495878921882 7.382778147151673 C0.35017104061002857 5.555546890041327, -0.7753188439386719 3.88035990547663, -0.11704284059668535 -0.6638116142830458 M0.0732046390551282 7.421800036232738 C0.3311344166893858 5.331984207139225, -0.27693143153906674 3.106765742510671, 0.18949518989352454 -0.012312437538320398"
stroke="#000000" stroke-width="1" fill="none"></path>
</g>
<g transform="translate(87.2136852684016 23.39041740738446) rotate(0 14.661128495757907 22.2286449149058)">
<path
d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M0.4017499395040919 5.651468089272372 C1.3871852087065062 4.054675247860221, 3.2471853260842045 3.7424064787899267, 4.802808965327686 -0.2690802280011362 M-0.2442514523537705 6.518281718392961 C0.884715308748695 4.749317661415791, 2.69670586159955 3.1110233876447855, 4.980214293631262 0.06352231389258445 M0.08711027289737339 10.680161622583448 C5.16288510311983 6.746471035650974, 9.308001504702 4.244967588971493, 9.735410575521124 0.10250187768486763 M0.5384873204957354 11.901626243114054 C3.1569231802983886 9.334970815067912, 3.9187544113415678 7.347950387554146, 10.806686085415018 -0.35736003414841877 M1.727081955359294 20.133384029288464 C4.753233092562224 10.026027272855984, 11.106221690098051 4.201523209503778, 14.668762083634217 -1.1474955559778044 M0.2049965140828629 17.547048174683624 C4.228469948060025 13.770685764866986, 9.379749483507664 7.569731685153272, 16.16994945472427 0.6680943320435455 M0.17007752900936346 23.772759997933267 C4.729839374581561 15.445347172017119, 12.704473234837671 8.675762096657982, 20.773670593554044 -0.349556757679629 M0.37458397282280664 23.503309316649197 C7.2147537707358 15.209383200744483, 15.39534677902555 8.63165193956629, 21.815096166262293 0.30426344221229584 M0.2752094166865007 29.774248348314515 C6.61775551872468 25.697610348912765, 10.720937473492764 17.611700432158248, 26.779118737813445 0.7771692270424904 M-0.6972853177646741 30.480921655658832 C5.843167661422801 23.980521749514708, 10.745901545147904 18.829116516403076, 26.368139155816646 0.38176665958135736 M0.6820571551937142 35.90157211739694 C9.402607657828733 29.04603943197939, 15.458014239950629 17.455048953694288, 31.875658274224914 1.3565837653760529 M-0.0699453950953206 35.98646711539601 C11.314871322078687 23.28627932856827, 24.91761418367802 8.756475799734414, 29.978919731981076 1.8904111130569792 M1.955981531334018 43.30517309214525 C9.35031103935028 29.61731321865892, 18.535723734721863 21.430628471087914, 
29.02432706564842 6.54785487562242 M-0.04687092158486905 41.6144182445442 C9.573989805965084 31.45200863184947, 21.508375705778807 18.03957479542847, 31.7103434333968 7.243942457659273 M0.7131245819104137 48.3972118095242 C11.025861709074679 36.42825526917172, 15.083366280731514 30.317617640778902, 28.922534514848994 12.09977352385523 M1.761822815350932 46.109638216786884 C7.804664229613986 40.559164097005365, 14.196665164671861 33.05783058388872, 31.3167654636101 14.021901171180627 M5.7135866806974285 44.46243881583019 C10.2612463953638 41.374836222537255, 15.270377989244853 37.18948661074086, 28.94423515862965 18.79168774703921 M8.34505699026786 46.16575209013923 C15.138498518904967 37.995973373886656, 20.35413447594366 30.160292873767112, 29.996664285205238 20.598056677304516 M10.801600132574016 48.09912878981012 C14.749626308990479 42.10688572860424, 19.70198854410553 36.26892904496983, 29.88916478379339 25.289664332139093 M12.04346283382612 46.053580930066886 C16.411090526497002 40.66032762776661, 20.92530383518608 37.034954944187604, 30.95437773576647 26.18406902471443 M18.15001745881802 47.91114037233096 C22.491212621675036 41.691771390746105, 25.716365312392732 35.49498791508727, 32.06182944385511 30.365608458772197 M17.633552295121465 45.621223575814334 C20.861728139198437 43.143535525154114, 24.501306093080288 38.80858801348593, 31.330103914553202 32.14258352619772 M23.560955004037922 46.616740121590354 C25.171969215839688 43.845608001245964, 27.402442646125856 41.93068491908994, 30.617887933642454 38.4650092899672 M23.407351859675355 46.839358596029136 C24.856406811903135 44.272984362325374, 27.151828846078324 42.02223764064705, 30.946889757827705 37.71206453627565 M-0.1257394441655144 44.347986198578155 C-0.1257394441655144 44.347986198578155, -0.1257394441655144 44.347986198578155, -0.1257394441655144 44.347986198578155 M-0.1257394441655144 44.347986198578155 C-0.1257394441655144 44.347986198578155, -0.1257394441655144 44.347986198578155, 
-0.1257394441655144 44.347986198578155 M6.4429088167157795 44.39735575351224 C5.164784828559348 42.21418838925779, 3.1549992203444317 41.971966590152235, 0.9118121939307984 39.47298053506168 M5.944048142364325 44.693744134486764 C3.8809613987590437 42.97671110108837, 1.9047285940246133 40.560463050038685, 0.33460424193804816 39.46792989301833 M11.9737085546558 43.59076231445947 C8.814090751476495 40.27239587105459, 5.170405246648753 37.23139042502132, -1.4504079086876018 34.14636673595681 M12.123756990880262 44.97882730539718 C7.491593262704607 40.36612446266093, 3.6310568054766015 36.44560928624832, 0.6211579473519221 33.901803619823596 M18.712105958511714 46.4010123970389 C11.584500952654256 38.829648304715526, 5.93927407414408 32.10866107702508, -0.6485348934629283 27.228556435634093 M17.459533230190758 45.46284361616071 C11.154737375563544 38.577895883936975, 4.772653580851095 33.674061798412644, -0.5627617961117348 29.58173867509206 M26.157859912912123 45.219313813194766 C19.12837812168928 37.375110659970225, 11.736658442361751 30.964189200712923, -0.5840457926592588 22.381901023248446 M25.709754207010143 44.708784218698874 C15.172339943539972 36.11638929779963, 6.904957020062454 30.22664184437116, -0.32296941495451925 22.79972920873226 M30.159094151447075 45.4865864963208 C19.430125803284113 33.85079975951891, 9.19807572595839 27.372637114270834, -0.020138096740241807 19.04016251949197 M28.647079689765828 43.036506680858565 C19.880974174595906 36.71880021740512, 10.466351309135966 27.609014387214348, -0.6095438848489483 17.83336472412175 M28.465647111037967 37.37437069764245 C22.44120620841025 32.37039180583455, 15.959674692462077 29.41622887218609, -1.8655229583389161 10.667467983571168 M30.48845016616046 38.100644515583994 C23.313499471785086 31.792331225372017, 15.420472825385762 26.601242164303798, -1.1433911826468108 12.69041241718578 M31.170524700611168 34.72045667929977 C19.826562315338585 23.635657511670537, 8.252141109676035 16.70588277918442, 
-1.334291959197242 5.988899340654543 M30.18655172090166 33.360753121733964 C21.372794977554484 24.76845831669319, 13.239035337006012 17.827800478401947, 0.7542804106911829 7.465766224811604 M29.778385338424716 27.03117636646127 C20.89290762587415 22.334499379477478, 12.347616470120727 15.086562630919495, -1.8585937519482614 3.617021149948582 M30.310801073028717 27.271101436195725 C22.21219779495138 19.59764783592662, 11.806533626100592 12.616618900822878, 0.06178366831324078 2.1554745929244605 M30.575830371845573 22.700344318333432 C21.06656243157407 16.106860644260607, 15.869333070259978 10.339068393963139, 2.5834617421116133 -1.512529105397186 M30.23013599568984 21.687112904894516 C22.687683621109905 17.114894602088114, 16.613751290274152 11.839030942513947, 1.5415482197042008 -1.720169233607373 M30.794080719131777 18.132729960084774 C24.707513128981333 13.835516062935136, 21.17055965148287 7.563330369729165, 8.3098362714632 -2.8584542212237385 M30.088191957924316 18.176337714717604 C23.448786884110305 11.076605071087751, 16.366546663400086 5.181110647515524, 8.39465348912613 -0.6465404139061239 M30.69630022765029 12.370585350921782 C25.27641482506428 9.744577045511278, 22.910926691373295 6.070578991959307, 13.26717703235959 -0.30332853327712606 M28.70312548288036 11.832480561590227 C25.07577480652732 8.587559794978738, 22.23511153577374 5.66166372748231, 13.405942473613372 -1.9567137719990928 M29.780663180605096 7.724559221874858 C26.939987395956084 3.4479993078446007, 24.16153814153067 1.064735504015831, 20.987501735901276 -0.4803953185931209 M29.652300188664867 7.314575275026198 C26.86946679345229 3.99106281511067, 24.974061840662742 1.9452632814279724, 19.50921003409643 -2.099133140339738 M29.762235269134504 1.5908043184294292 C28.941537975237644 0.12401028694778965, 27.44452362143621 -0.19649017206778363, 26.125883223399747 -2.108355821764274 M29.25194522430898 1.3800505455559975 C28.132693712356712 -0.2108085565947063, 26.622004471302695 
-1.3541554865280512, 25.88375526518025 -1.8670270351713434"
stroke="#ced4da" stroke-width="0.5" fill="none"></path>
<path
d="M0.4744492657482624 -1.8594930656254292 C11.481920538455332 -1.6942676379555488, 21.591553218633106 -1.2280070855492378, 27.633552728096927 0.556500505656004 M-0.7842386607080698 0.015608960762619972 C6.550132594391122 0.6071495357194834, 14.765277393801837 -0.9307378587087698, 28.545142695183838 -0.08419824205338955 M27.640681443612063 0.6667271368205547 C28.570285292254034 13.388974946455372, 27.998359532938544 21.815885079088236, 27.51902037660309 44.422651376536656 M29.76259426760205 -0.4793460424989462 C29.20320474667695 13.052394046285766, 29.55752032561448 27.48403075677458, 28.427896961445892 44.80427832338827 M29.340083418290103 44.95673665027933 C22.843194378336378 43.34118293171196, 14.02292963647836 45.886064394762805, -0.7731576077640057 43.36640545349436 M28.823586448903168 45.13152737352865 C17.83071897963157 43.54532205975586, 6.812252984358995 44.54545575297409, 0.16752288304269314 44.95437718365209 M1.575706947594881 45.40735718231516 C0.5063116501777195 25.410006843764034, 0.40618001928019754 10.601449687585955, 0.7366157658398151 1.1796328537166119 M-0.35767080821096897 45.32970667812841 C0.27769004766780647 35.851505623624185, -0.6118870430723974 26.056526386136564, -0.5451398734003305 -0.44682890735566616"
stroke="transparent" stroke-width="1" fill="none"></path>
</g>
<g transform="translate(235.54701860173645 24.723750740717833) rotate(0 69.99446182909116 22.2286449149058)">
<path
d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.8035552702518055 6.76368796234309 C1.8044130826338263 3.43535582909761, 3.036262400026707 2.227133571758367, 5.736610410692639 0.0604889836010748 M0.0557896346179893 6.6015909322100805 C1.3329290501455715 5.09672180166312, 2.1779159306117326 3.1844923283009257, 4.8224673423492375 0.6604369792935663 M1.6608975508343087 10.965854916393338 C1.994100109159878 8.103994046193563, 3.321126016332616 8.317550958657542, 11.895193146121633 -1.0330326651141286 M0.793131204796921 12.616565345644533 C3.580303696315507 8.263945513847274, 8.704194094538806 2.536532619225327, 9.966911730122698 0.49473565026072275 M1.33255090652497 19.725125747640305 C3.143312642736771 13.692106631436335, 7.434326677988869 12.487320347804273, 17.615817863031705 0.18203174014808443 M0.6196292703458806 18.28482574680882 C4.557385954986606 11.160640854513778, 10.264284437348095 5.763523717465571, 14.921839715450485 1.1628698470932415 M1.2587749252281402 24.284772874503492 C6.825738423115688 18.469775655239307, 11.670419442308846 12.535181200585969, 19.810850264781976 -0.6496469141752002 M0.22043308822431307 24.485958862765788 C4.594500030429427 18.83648192595457, 9.880024069571924 12.062457868698413, 20.808097347894574 -0.5356979859829423 M1.7153222086419717 29.45882883085703 C5.791336877414068 21.869799078785363, 12.400405661918137 13.757506458357923, 25.295849485930415 1.8871115813043655 M-0.13403743966022574 31.019245635483614 C7.972724767514389 21.777445669854593, 15.937240392425334 11.993727373880482, 25.293829169731946 0.36442836504548204 M-2.0490968470555266 36.36672954429065 C11.165771940778676 21.76501665019432, 23.036316713200783 10.33318503053983, 31.45605370164553 -0.1359351760462637 M0.7177088445890227 36.84427091833328 C12.195569554729314 22.87148382614264, 23.755092223518016 8.891305576236157, 31.487600992849863 0.6833585483267193 M-1.524053095686341 42.55369219239407 C10.736920264445432 32.15315626429907, 18.773233655871586 
19.62612489933258, 37.189183721133816 -0.07741001796904001 M0.08755372386882243 42.42647893968575 C11.139826816846739 30.284121776200024, 21.615517591491656 18.468637797294445, 37.58663313817357 1.2033999079000672 M0.9376013589275303 48.11571520275702 C14.902586002248675 32.6812512481835, 24.165385966020857 20.423166298379567, 44.02891521956377 -0.6389928470970148 M2.500269606075925 45.67467330291202 C13.55898857400713 33.34541769492296, 25.604997097054465 19.60974568732408, 43.18777266759031 0.5688808419137246 M5.962910249935614 47.56044951230093 C21.17809540793634 28.75998276722019, 36.70768878056407 11.526993900771892, 46.7714065061419 -1.7027439951721384 M6.724196611078607 45.61438824571474 C20.410024334386033 31.081785740989595, 33.32374636420872 15.769237747833575, 47.10152115202534 -0.7386725555930092 M11.848986741591865 46.49813269039768 C21.79385968031164 37.986927603774724, 30.174413979760192 25.875731873480365, 54.5987707560367 -0.24378086482745331 M11.648916164849574 46.891322763247075 C28.742477004713628 28.67513587378053, 44.671504741976726 9.196974832736949, 53.66300690336591 -0.5007131364719264 M19.559194009064797 44.67239733441424 C27.06517174158184 36.08090545760024, 38.20160353873359 22.217967500279677, 57.70560377845467 1.8323987121950829 M18.468931579276088 46.61464900365722 C27.40358432209788 36.2641478661029, 36.30947194399455 24.97388555704827, 58.54601274253727 -0.7864960566837347 M24.069377861560888 47.896495612608646 C36.679079502292694 33.14889691938639, 45.24625858981139 18.177257513887007, 64.42245624856432 1.7772398332624206 M23.21740067528338 46.10573681098447 C35.20076952024363 33.95749202700812, 46.27150349885814 18.892801294146327, 63.07394823142667 -0.053255418080233596 M27.491338974248663 48.14062262234786 C38.262444420682066 31.42551322046851, 50.87424254045724 21.201353969077243, 67.84236147557213 0.16180716663260597 M29.07710801475872 46.797977083377035 C37.02898616385974 35.65955986518743, 46.07606550302341 25.5269427937491, 
68.03387744095936 0.8289723733778729 M33.6993606563346 47.69459303188993 C44.08721823896586 35.364860928845765, 53.4129912265381 25.06785554007112, 74.94430763464662 0.9641083096726106 M34.19449266862734 46.21195841487421 C42.91333228290033 37.436471361209996, 50.87154232179626 26.606055993299663, 73.98749118302258 0.4771719658388065 M37.45368108298722 47.2551644386018 C50.037217033024504 36.336939467688765, 58.4741927890329 23.587792873257257, 81.21976553499428 -0.2682295693595478 M39.99062044877651 45.8552904536242 C53.8078659635306 29.04299656596663, 68.14401300085277 13.708009207230752, 79.0851347785416 0.9705257033418349 M43.027021063412995 46.08871401074152 C53.67775388551293 34.64178435941994, 63.759901142946156 24.47917892123328, 83.79616085178361 -0.330636373096894 M44.97521167613146 47.19778054366629 C58.261237069280355 31.279226497703434, 70.65354411004168 15.409261720534158, 85.37678028940365 -0.24976301596481676 M49.32805627278045 46.11856184567605 C65.29337428228222 28.919723123087326, 76.02221906907161 14.160422415590837, 88.0634018942211 -0.19892243328139614 M49.05761341257945 45.5712570831487 C62.15029102663554 33.229958675110275, 73.28957504854179 19.820524641979972, 90.62605428653399 -0.40529309804186653 M56.52794502029608 48.08764761166342 C68.51447318045768 33.12514409989587, 82.29390305517097 15.938866827670228, 96.7973518374753 -0.9419551525460861 M55.36082722904573 46.144746166110714 C68.94937771288052 31.038822278250713, 82.16675823164019 15.718175530033417, 96.6212100818199 0.5193576052589037 M59.18380048112839 47.64585390607061 C69.91382656888298 33.48376586402321, 81.13018760526157 20.263577949404635, 100.78635649075264 0.4410176373670751 M60.53248143741756 45.824294940700554 C71.61212356517133 34.03722830780341, 82.82175138187779 20.115240173655412, 100.98915549618656 -0.24819106202321706 M64.86521846448386 47.51268289715564 C79.70707281090363 30.65838961273568, 94.5851383477152 10.130005042930108, 107.95471652922166 -0.8923089661749586 
M66.55885885862017 47.22814825095749 C75.35451758045565 37.47469336980239, 84.2163358377197 25.678651165521302, 105.50642509152127 -0.3682982017400249 M71.277749645531 45.56457968228548 C85.10496831431138 29.69291800110021, 102.37296546245956 8.49917970726812, 109.98921526208885 -1.4646202949141482 M71.07586905454082 46.58657199502974 C86.87934008286958 29.1146565794596, 100.55091963190274 10.451833215672174, 111.91320793874927 -0.41412521985223094 M76.88942733347632 46.997608929718176 C89.57988008262991 31.65008482897742, 100.87285447834086 18.627611207105492, 114.80653183553221 -0.6369491712607456 M77.18307609610476 45.71702976695661 C87.66723765552652 33.50172249741951, 98.83012521739273 22.317207458610554, 115.5682608207673 0.7828971120111632 M82.05423970348832 44.41477823682067 C98.44849119744782 29.817568644560083, 110.3127927062406 10.431521118411567, 120.46006414572975 -0.219547167625052 M81.75467079519925 45.92400621488628 C92.526348285128 33.95105242296518, 103.54323208116026 21.51225321516387, 122.54323114547213 0.33469739722110603 M87.63865756477347 45.988714771851015 C95.26269789902805 35.59392903234107, 108.6149972325639 22.112795866860306, 125.04675175188794 -0.08275586574900018 M87.2534984727495 45.97418368285806 C99.66237523526084 32.89572517286636, 111.17066083236364 18.895467051915947, 126.71697985878899 -0.5018076451830851 M92.1451065925003 47.548988632176176 C105.762476298733 31.895009218482876, 115.37997531548342 17.71034792933724, 131.17448907917532 -1.3149739182656361 M92.0420311906428 45.99626556902017 C103.51111372020063 34.63729021045849, 114.63534307330676 21.526544881172754, 133.43049464760514 0.32981975794679386 M98.3384950958753 44.74478948605417 C107.83507926783005 38.05096509253584, 114.07036277960029 27.557881539863608, 136.58865318679838 1.9400195930830506 M98.1963955198044 45.45615296741186 C106.41693846086204 36.432121503891395, 114.73108645058684 26.04046884667104, 137.30091770784585 -0.3392849243499967 M102.36968625006698 
45.132249930851984 C115.61748772161864 31.811414822478795, 127.97958175704206 19.826744251903538, 141.98345794194458 0.07980767610960982 M103.81734477212153 45.05676802618139 C111.29284758376977 35.87006567460118, 122.21890612640911 24.674632439583633, 141.22039991841731 0.505953991520343 M108.40202586428182 44.337122134827645 C115.84915047161053 38.79206587065812, 121.33900746803705 27.72089496804603, 141.62848178785842 8.206012069963165 M108.84428676851945 46.71323058936802 C116.85338884217794 36.10456675076113, 126.61564878427663 26.213023741627172, 142.17012860928278 8.039729045488187 M113.07875558469783 46.89681077557087 C123.86924985436889 32.853155086004804, 134.90362815277211 22.50968504676915, 140.36484832365144 13.155101640673998 M112.7101970418949 46.20739113534272 C120.8632999953051 37.232714519561945, 129.49650960610606 27.80061176904143, 141.40400814127256 14.438658029365065 M119.42983580749303 46.64950178067142 C125.30500332395341 36.6109315374342, 135.51661431798257 26.602309721839333, 141.7348096085301 21.736422306500195 M119.43594175014466 45.3894595174765 C128.32356177390903 35.38224467882609, 136.35158448971407 27.010867503844462, 141.76574321031183 18.932313912190317 M122.77876859187388 47.1232910683873 C131.62682638178094 39.289947372175526, 134.36965982235634 33.671937556875044, 140.38551353867246 25.045972495031457 M124.72460058919394 46.379633704808626 C131.05850727579246 38.279777531903775, 136.19536422096635 31.688988663317332, 141.2801062242974 25.92950029733489 M131.1197992057605 46.37131044676355 C133.44225625553713 41.65453200195487, 135.31412748914002 38.385313605061896, 140.07058038568667 30.907785023435352 M129.60584988203553 46.22350491944528 C134.19928260509775 41.08859662715386, 139.2243094840982 35.525361849409094, 142.22844216876274 31.25391588878199 M135.38428310571382 45.812999532336256 C136.16001504438518 45.07456607138024, 136.88785908767264 42.92723302004416, 142.41998577052806 39.235254989300365 M134.34320941060926 
46.74466313967656 C137.26878138774535 43.07360057375775, 139.92442461685624 39.98343002348078, 142.0096440663569 38.23085294367417 M-0.1257394441655144 44.347986198578155 C-0.1257394441655144 44.347986198578155, -0.1257394441655144 44.347986198578155, -0.1257394441655144 44.347986198578155 M-0.1257394441655144 44.347986198578155 C-0.1257394441655144 44.347986198578155, -0.1257394441655144 44.347986198578155, -0.1257394441655144 44.347986198578155 M7.022208299418988 44.56268532011017 C4.604353392752843 42.46128644795872, 2.420324837643724 41.38873618636521, 0.22325574630563805 38.964697220952836 M6.242376406421962 44.49737846064672 C4.976095005735076 43.61086873152783, 3.34882888525874 42.38108741086478, 0.1378908971852596 39.11806533116254 M12.276193132550013 43.22382184590974 C6.695431711682969 39.92521658396217, 3.3181886996289567 34.88789744112052, -1.6085680527782542 34.79805133607164 M11.66630060777117 44.70175824638286 C8.746887461166361 42.19727852092828, 6.206600059408037 39.05000896979498, 0.3634773554110423 33.80195004161372 M19.303905061235312 45.135499967861136 C13.648885173875879 39.60077744361788, 7.44192961409073 36.50692135675027, -1.2948112855055651 26.820836145509677 M19.23339024630714 45.31486856997307 C13.445157001617547 38.83320947561736, 7.222778036983602 34.42784441976494, 0.0859561027637401 27.843435293142395 M25.17272587011265 43.81393383984281 C15.040876598281333 36.069433066993504, 10.158179529770548 30.876958054625064, -0.8271909381351232 21.52258375779749 M24.648979422474973 44.04915992152704 C16.868921148449715 36.44707569337318, 7.228664609443165 29.994612582199288, -1.1326435401192665 23.020033381218326 M31.447584626843398 44.28023644772655 C18.647643074819115 35.381801122546705, 8.84980367064366 24.253255397613216, -0.47353242777824 17.242408984280885 M29.564476709460084 44.77795643781609 C20.056066555366716 34.880950904868826, 8.608574613108573 25.45737200401835, -0.33013612658321456 17.125263969354094 M35.00398935906058 
44.32223385730589 C25.748423557960326 33.49471985657513, 12.438542804051067 22.316257036221835, 0.8533027023070456 11.100933024791356 M37.17051464661425 44.66438936041499 C27.792074035323992 37.641434241695606, 18.46037753786428 30.042383129498596, -0.918711813167393 12.188256558162447 M42.71032852350616 43.77929726841229 C27.307439993443143 31.62780701279341, 10.258404111912476 14.126666052677145, -0.8539900202462265 6.643794445122412 M41.91427807739818 43.62821318993646 C28.786689403971728 32.71322834650257, 14.406922195282338 20.32498404973959, 0.5461681894799879 7.680736139218142 M49.340547647813196 43.91581524683928 C29.326228975173432 30.351157705365242, 15.372984502188203 14.352724872980957, -0.8598346386960986 2.7181769451055375 M48.59968079116368 43.531777917955274 C35.159510466580684 31.8498386210041, 20.011630208373244 19.015194417595637, -0.6094050251803639 1.2060509631339158 M56.1661822689464 44.953780521408845 C40.813558692018226 29.692439199206206, 22.33981673843322 19.028595732678024, 1.7363210350360174 -1.8416003514847372 M56.166898461594215 45.525115824074675 C34.72777621398389 26.681083062974658, 14.654015752631452 8.196801926841033, 1.4318077426488895 -1.0027371705344532 M61.0954866738705 46.165967070942976 C41.0037491606271 28.908970921918616, 23.337096683154364 10.638959904324093, 6.592038358805938 -2.68149609986439 M61.01899738737326 43.9516185628979 C49.1399069259005 33.61450205800868, 38.614453600986636 24.258572304670704, 8.757726737542315 -0.7377661162216929 M68.81256264021854 45.969107211235205 C49.31615151168892 29.641943141109735, 35.49416773820771 15.327914680365677, 13.204501638241759 -1.707277995154472 M66.34856323285263 44.25371425099973 C48.33225725688461 28.242581019448615, 30.469817432885165 12.358972972233566, 13.997067026846757 -1.7834046432676036 M73.36939237473248 43.45647839211699 C56.797207314028334 32.26686276519965, 41.06127485014807 16.064818899421326, 19.12500015097652 -0.4944012215614677 M71.917233966493 
44.317954263992874 C60.683886363027234 34.4465767360514, 47.446883317564726 23.062783370272975, 19.721320524914006 -1.5136308850974203 M80.09851515504538 45.77137542938188 C68.20810471622342 33.63272902891128, 54.602449009400104 20.729963741575833, 26.022703603752184 -2.6917758292538956 M79.1558765026462 43.660985468452125 C61.27253904724328 28.091895741487537, 41.04124746278585 11.861141407066981, 25.696031450204778 -1.4832997160984114 M84.20506026812193 45.639697301851825 C73.00852262819735 34.54571075097088, 61.09036480182684 22.394111773384672, 33.597338774201795 -2.648294502438235 M84.94411963999566 44.7746370189807 C71.1629304483247 33.133584500208904, 57.654752382457836 21.63992474504749, 31.980972967070862 -2.242170519695806 M92.9058757509269 45.463124238348854 C81.28103335180872 35.25976255176301, 69.25742228534693 22.480747367012636, 38.17122244929007 0.11975423770750382 M90.93698280222492 45.03159744518186 C79.24373518532582 34.2678475347243, 65.30283377444279 21.590626435064507, 38.70968012419573 -1.989820391286365 M99.5686571310933 45.41229480466648 C82.66453983184104 34.13671305376472, 70.42435933024431 19.42645419096157, 42.82839016598175 -2.978136879994686 M96.89150477868573 44.591091738630375 C80.81128332457605 30.0705548065043, 64.08419750639922 15.191230951177644, 43.41607859526287 -2.6699670512668447 M102.3039803653631 44.95939270063616 C87.58864674376545 28.59831303895595, 73.08265014637412 16.713373547860797, 51.55372495190987 -1.513516789751293 M102.4462766628388 43.74396814993895 C92.98168529291533 34.930992970272044, 81.7279660660275 25.995911352021675, 50.36111341962406 -2.0647917580624835 M109.87797848136935 46.251618992051434 C96.16408594612177 29.829384353598815, 83.35232021405227 18.241996787744235, 55.824990263816396 -0.5063892917308266 M110.22161849403594 44.09867042068238 C90.52476481648492 28.60932151065301, 71.56257875283758 10.62975334279998, 56.31228194074003 -2.4498525110605414 M116.30437451607676 43.64377890773427 
C99.46285476718313 30.368902368077055, 80.92860830860306 16.33163368424413, 64.52550662933436 -1.0517408201397807 M115.5991121959606 43.926324580463884 C100.47578112415376 30.782443742983496, 85.70636043027025 18.335042831932125, 61.64381044796256 -1.4081118186862618 M122.98361025645009 43.791458926506685 C110.45566147349021 33.7832821082706, 96.93151558706523 23.742327230508174, 70.5361845854285 -2.7869536461561495 M122.0482303172963 44.06641603388672 C109.99350526760381 35.03365433787074, 97.9104661350798 23.319813470168008, 68.11618711497512 -2.484849901336389 M128.84197498012236 45.7442038641622 C116.93637951858332 33.608658561770156, 102.53648531161895 22.635996231479275, 72.87465341906355 -3.798628790957949 M128.4529647723948 43.83246664752566 C112.48920792757656 32.37064326079934, 96.85106257996262 17.797042432252088, 74.20279074046121 -1.2819876044076892 M132.92363839383552 42.47810367053226 C114.2218850428832 27.988498289201118, 96.6277536318209 13.075819402918384, 82.83438990787252 -0.529408938751665 M132.89135977971682 44.35712154699341 C119.2370277763531 31.635199733678967, 104.51707038440644 19.501556026364124, 80.4165194887726 -1.5524093113253983 M140.3033272752226 44.26926566141044 C122.83360535818932 30.635593274515976, 105.43661640107182 13.689178517319498, 86.5767512585974 -2.245391639705808 M139.73568850361414 43.77812134886478 C126.07154000102186 32.54110493528937, 112.38313812917335 20.71615029436611, 87.08932382837 -1.8548761464767765 M137.88597218924346 38.3218700547041 C129.11126697234593 28.3994282781295, 114.64450398642474 14.973928693891779, 95.04365174887123 -3.662657078026868 M140.63959977076533 38.678857696828395 C122.50946848957 23.761957115810837, 105.02567058014672 8.697706307303157, 93.47238317360522 -2.3680537432870636 M138.43241405531325 31.88705889813086 C130.33455032112653 27.313346859314567, 120.01355703696765 16.565837271454882, 98.54959566565873 -3.4146812748928497 M139.75132463730432 33.019762189383115 C130.80376088369303 
26.159490456562047, 121.7603898278134 16.798460498310543, 99.51563882316174 -1.5454254471530149 M140.2836975801739 29.39404714152218 C131.00070393209955 19.64426204557921, 123.63155840104358 13.73703118550969, 104.92080604592462 -3.9591590014817655 M140.6254698216918 29.081917452141653 C128.35628508138188 19.142249415479466, 119.3740516105417 10.68336791397926, 105.19442270072301 -2.2096152284713924 M139.31142454145854 25.072067100671596 C132.48200544180813 12.834356529015015, 119.38396718593575 6.1765079696032, 109.63435341730197 -3.484826692000084 M139.36087098827963 23.901501601678795 C131.23438608950133 16.892904347933666, 122.96547572509395 8.820263305175992, 111.83608609564085 -1.6467105956674146 M138.75507714590904 15.826896225410593 C135.11043310846804 14.034786479608686, 127.02673047465206 6.513530005339149, 118.4101887365327 -2.3788756466163363 M139.62541610552844 17.958816013607276 C132.70017862769453 9.507118560319721, 123.01547617384827 2.624360336193435, 117.89953260137595 -2.261175798128363 M139.1346946711585 12.614329331532659 C133.90209000526713 7.098412407177522, 126.97809054669494 2.5910560135717535, 125.1238200312499 -3.8534797995370056 M139.7562220877901 12.364257554501833 C135.6139762442948 8.989158961517155, 130.59073501465235 3.7511334110028685, 124.07539475728014 -2.3898795942794706 M139.96381494702572 5.879125411791849 C138.23027307253062 6.349020928085839, 134.72428813332638 2.423242349930285, 130.23148289141722 -0.8303881091410685 M139.76638481125036 7.329041399510697 C136.767659168501 4.622118820336794, 134.29905297718344 1.2705773430096337, 129.3651769630193 -1.183857805316375 M139.55026730019748 1.990537154514004 C138.03785644077456 0.6914205664014152, 137.43022681175614 -1.3093019983645626, 135.61834338218864 -1.9283816237114195 M140.22570592574922 2.0884311635304584 C138.70290762637828 1.0706778617648378, 137.32947616657802 -0.23780607597970288, 135.36821925048676 -1.87034300548209"
stroke="#ced4da" stroke-width="0.5" fill="none"></path>
<path
d="M-1.1135634668171406 -0.8221067301928997 C38.859346253322244 1.1533425512993882, 76.26934017145967 0.06925935179835063, 140.93150192548842 -0.8044588677585125 M0.17723297514021397 -0.7870570067316294 C41.72988498974261 -2.049094549504052, 84.1443476885491 -1.6489358461768728, 140.67672578853876 0.8966344352811575 M138.4621692304811 -1.9629795663058758 C141.0403441467718 10.503167948449322, 141.83186867622888 21.034977847237723, 138.17426723291487 45.176985744348336 M139.11115257305414 -0.8880502227693796 C140.7371989577964 15.165931510038217, 139.24438221156151 27.727334736376086, 139.6739010648182 44.0016855503978 M138.91492599298567 45.51599312197285 C105.29673944923489 42.393301599136485, 68.97476796285716 45.14844416867842, 1.1904189102351665 43.16850579153614 M140.17984579366953 44.45969193238633 C90.20493346345711 44.216453046314555, 39.68765701276129 43.6011126346515, 0.073489123955369 43.85079227942841 M-1.4946730621159077 43.09390843283253 C0.278952686228546 31.044731492463583, -0.5633042174198304 18.205992326752842, 1.573893141001463 1.395809281617403 M0.5860103908926249 43.764868607706 C-0.030750473618688767 28.30046832837778, 0.2104987896679018 13.839461222267332, -0.5904493387788534 0.4199678059667349"
stroke="transparent" stroke-width="1" fill="none"></path>
</g>
<g>
<g
transform="translate(160.14739927070377 22.238037042632186) rotate(270.07194166149554 -0.5550582851521995 24.125604218213198)">
<path
d="M1.0177691169083116 0.7330422811210155 C0.5069009876730279 8.469579850583138, -1.7516145733583774 39.07294884696037, -2.127885687212802 47.142015865885384 M0.09295917394571007 0.07228553337045018 C-0.12810021497175214 7.866526984394328, 0.4553478915683867 40.29809332680323, 0.10314364524430841 48.17892290305599"
stroke="#000000" stroke-width="1" fill="none"></path>
</g>
<g
transform="translate(160.14739927070377 22.238037042632186) rotate(270.07194166149554 -0.5550582851521995 24.125604218213198)">
<path
d="M-6.699121066316145 23.974565680760115 C-6.339254506956795 33.96119486370708, 0.037083777608089896 42.603353713599176, 1.1961396353120288 48.18835946198806 M-7.880602559953111 24.681386330786317 C-6.548438292518609 30.943060442841695, -2.902697102559192 38.919191067985395, -0.0764298552846232 47.98647887099787"
stroke="#000000" stroke-width="1" fill="none"></path>
</g>
<g
transform="translate(160.14739927070377 22.238037042632186) rotate(270.07194166149554 -0.5550582851521995 24.125604218213198)">
<path
d="M9.75502582862224 24.12238971692443 C3.8925202893841684 34.19267947013765, 4.047376857030704 42.77894453567315, 1.1961396353120288 48.18835946198806 M8.573544334985275 24.829210366950633 C5.372070219411432 31.19054093563014, 4.482936464359031 39.12593024965316, -0.0764298552846232 47.98647887099787"
stroke="#000000" stroke-width="1" fill="none"></path>
</g>
</g>
</svg>
<figcaption>Pivot Table</figcaption></p>
</figure>
<h3 id="aggregate-expressions"><a class="toclink" href="#aggregate-expressions">Aggregate Expressions</a></h3>
<p>Using <code>CASE</code> is flexible but it's a bit tedious. Applying conditions on aggregates is so useful that SQL added <a href="https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES" rel="noopener">special syntax</a> for it:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">emp</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="cm">/* ... */</span><span class="w"> </span><span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="k">role</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">FILTER</span><span class="w"> </span><span class="p">(</span><span class="k">WHERE</span><span class="w"> </span><span class="n">department</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'R&D'</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="s s-Name">"R&D"</span><span class="p">,</span>
</span><span class="hll"><span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">FILTER</span><span class="w"> </span><span class="p">(</span><span class="k">WHERE</span><span class="w"> </span><span class="n">department</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'Sales'</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="s s-Name">"Sales"</span>
</span><span class="k">FROM</span>
<span class="w"> </span><span class="n">emp</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="k">role</span><span class="p">;</span>
</pre></div>
<p>This way of reshaping data into "pivot tables" is very common for visualizing and analyzing data. There is no doubt Pandas is much more flexible and comfortable when it comes to pivot tables. However, the process of turning rows into columns is <strong>also very common in ETL processes</strong>, where data is <a href="https://en.wikipedia.org/wiki/Denormalization" rel="noopener">denormalized</a>.</p>
<p>The bottom line is that Pandas may be better for quickly analyzing and visualizing small sets of data, but ETL processes may benefit from doing this reshaping in the database using conditional or aggregate expressions.</p>
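<p>As a rough sanity check, the sketch below runs a <code>CASE</code>-based count and the <code>FILTER</code> form side by side. It is not part of the original setup: it uses Python's built-in <code>sqlite3</code> module instead of PostgreSQL (assuming the bundled SQLite is 3.30 or newer, which is when <code>FILTER</code> on aggregates was added), and the <code>emp</code> rows are hypothetical because the real CTE is elided above.</p>

```python
import sqlite3

# Hypothetical sample rows; the article's actual `emp` CTE is elided.
rows = [
    ("Haki", "R&D", "Manager"),
    ("Dan", "R&D", "Developer"),
    ("Jax", "R&D", "Developer"),
    ("George", "Sales", "Manager"),
    ("Bill", "Sales", "Developer"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, department TEXT, role TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)

# Conditional aggregate written with CASE.
with_case = conn.execute("""
    SELECT
        role,
        SUM(CASE WHEN department = 'R&D' THEN 1 ELSE 0 END) AS "R&D",
        SUM(CASE WHEN department = 'Sales' THEN 1 ELSE 0 END) AS "Sales"
    FROM emp
    GROUP BY role
    ORDER BY role
""").fetchall()

# Same pivot written with the FILTER clause.
with_filter = conn.execute("""
    SELECT
        role,
        COUNT(*) FILTER (WHERE department = 'R&D') AS "R&D",
        COUNT(*) FILTER (WHERE department = 'Sales') AS "Sales"
    FROM emp
    GROUP BY role
    ORDER BY role
""").fetchall()

print(with_filter)  # [('Developer', 2, 1), ('Manager', 1, 1)]
```

<p>Both queries return one row per role with one column per department, so under these assumptions the choice between them is mostly about readability.</p>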
<hr>
<h2 id="running-and-cumulative-aggregation"><a class="toclink" href="#running-and-cumulative-aggregation">Running and Cumulative Aggregation</a></h2>
<p>Aggregations over a sliding window are very common, usually on a time series. For example, traders use moving averages as an indication of a stock's trend, and running sums can be used to <a href="/sql-anomaly-detection#backtesting">backtest an anomaly detection strategy</a>.</p>
<p>To illustrate, take this table listing daily temperatures:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">temperatures</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="k">VALUES</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-01-01'</span><span class="o">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="mf">10</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-01-02'</span><span class="o">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="mf">12</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-01-03'</span><span class="o">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="mf">13</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-01-04'</span><span class="o">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="mf">14</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-01-05'</span><span class="o">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="mf">18</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-01-06'</span><span class="o">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="mf">15</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-01-07'</span><span class="o">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="mf">16</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-01-08'</span><span class="o">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="mf">17</span><span class="p">)</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="n">t</span><span class="p">,</span><span class="w"> </span><span class="n">c</span><span class="p">)</span>
<span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">temperatures</span><span class="p">;</span>
<span class="go">     t      │ c</span>
<span class="go">────────────┼────</span>
<span class="go"> 2021-01-01 │ 10</span>
<span class="go"> 2021-01-02 │ 12</span>
<span class="go"> 2021-01-03 │ 13</span>
<span class="go"> 2021-01-04 │ 14</span>
<span class="go"> 2021-01-05 │ 18</span>
<span class="go"> 2021-01-06 │ 15</span>
<span class="go"> 2021-01-07 │ 16</span>
<span class="go"> 2021-01-08 │ 17</span>
</pre></div>
<h3 id="window-functions"><a class="toclink" href="#window-functions">Window Functions</a></h3>
<p>Say you want to compare each day to the hottest day ever:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">temperatures</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="cm">/* ... */</span><span class="w"> </span><span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="n">MAX</span><span class="p">(</span><span class="n">c</span><span class="p">)</span><span class="w"> </span><span class="k">OVER</span><span class="w"> </span><span class="p">(</span><span class="k">PARTITION</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">hottest_temperature</span>
</span><span class="k">FROM</span>
<span class="w"> </span><span class="n">temperatures</span><span class="p">;</span>
<span class="go">     t      │ c  │ hottest_temperature</span>
<span class="go">────────────┼────┼─────────────────────</span>
<span class="go"> 2021-01-01 │ 10 │                  18</span>
<span class="go"> 2021-01-02 │ 12 │                  18</span>
<span class="go"> 2021-01-03 │ 13 │                  18</span>
<span class="go"> 2021-01-04 │ 14 │                  18</span>
<span class="go"> 2021-01-05 │ 18 │                  18</span>
<span class="go"> 2021-01-06 │ 15 │                  18</span>
<span class="go"> 2021-01-07 │ 16 │                  18</span>
<span class="go"> 2021-01-08 │ 17 │                  18</span>
</pre></div>
<p>By adding the <code>OVER (PARTITION ...)</code> clause to the aggregate function <code>MAX</code>, you turn it into a <a href="https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS" rel="noopener">window function</a>. Window functions operate on a set of rows determined by the <code>PARTITION BY</code> clause. Since you partitioned by a constant value, <code>PARTITION BY 1</code>, every row lands in the same partition and the function operates on all the rows.</p>
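<p>To see the difference between an aggregate and a window function in isolation, here is a small sketch using Python's built-in <code>sqlite3</code> module rather than PostgreSQL (an assumption of this example; window functions need SQLite 3.25 or newer). It also compares <code>PARTITION BY 1</code> with an empty <code>OVER ()</code>, which is another common way to make the window span every row.</p>

```python
import sqlite3

# Same daily temperatures as the table above.
temperatures = [
    ("2021-01-01", 10), ("2021-01-02", 12), ("2021-01-03", 13),
    ("2021-01-04", 14), ("2021-01-05", 18), ("2021-01-06", 15),
    ("2021-01-07", 16), ("2021-01-08", 17),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temperatures (t TEXT, c INTEGER)")
conn.executemany("INSERT INTO temperatures VALUES (?, ?)", temperatures)

# A plain aggregate collapses the table into a single row.
plain_aggregate = conn.execute("SELECT MAX(c) FROM temperatures").fetchall()

# A window function keeps every row and attaches the aggregate to each one.
partition_by_1 = conn.execute("""
    SELECT t, c, MAX(c) OVER (PARTITION BY 1) AS hottest_temperature
    FROM temperatures
    ORDER BY t
""").fetchall()

# An empty OVER () also treats the whole result set as one partition.
empty_over = conn.execute("""
    SELECT t, c, MAX(c) OVER () AS hottest_temperature
    FROM temperatures
    ORDER BY t
""").fetchall()

print(plain_aggregate)    # [(18,)]
print(partition_by_1[0])  # ('2021-01-01', 10, 18)
```

<p>The plain aggregate returns one row, while both window variants return all eight rows with the same maximum repeated on each.</p>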
<p>To complete the query, use the result of the window function in an expression:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">temperatures</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="cm">/* ... */</span><span class="w"> </span><span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="p">(</span><span class="n">c</span><span class="o">::</span><span class="k">float</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">MAX</span><span class="p">(</span><span class="n">c</span><span class="p">)</span><span class="w"> </span><span class="k">OVER</span><span class="w"> </span><span class="p">(</span><span class="k">PARTITION</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">100</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">compared_to_hottest_day</span>
</span><span class="k">FROM</span>
<span class="w"> </span><span class="n">temperatures</span><span class="p">;</span>
<span class="go">     t      │ c  │ compared_to_hottest_day</span>
<span class="go">────────────┼────┼─────────────────────────</span>
<span class="go"> 2021-01-01 │ 10 │      -44.44444444444444</span>
<span class="go"> 2021-01-02 │ 12 │     -33.333333333333336</span>
<span class="go"> 2021-01-03 │ 13 │      -27.77777777777778</span>
<span class="go"> 2021-01-04 │ 14 │      -22.22222222222222</span>
<span class="go"> 2021-01-05 │ 18 │                       0</span>
<span class="go"> 2021-01-06 │ 15 │     -16.666666666666664</span>
<span class="go"> 2021-01-07 │ 16 │     -11.111111111111116</span>
<span class="go"> 2021-01-08 │ 17 │      -5.555555555555558</span>
</pre></div>
<p>If you're not sure why the temperature is cast to float, make sure to read my tip about <a href="/sql-dos-and-donts#be-careful-when-dividing-integers">dividing integers in SQL</a>.</p>
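<p>The effect of the cast is easy to reproduce. The sketch below is an illustration only: it uses Python's built-in <code>sqlite3</code> module, which, like PostgreSQL, truncates when both operands are integers. SQLite does not support the <code>::float</code> shorthand, so the example uses the standard <code>CAST</code> spelling instead.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# First day's temperature (10) divided by the hottest temperature (18),
# once with integer operands and once with the numerator cast to a float.
integer_ratio, float_ratio = conn.execute(
    "SELECT 10 / 18, CAST(10 AS REAL) / 18"
).fetchone()

print(integer_ratio)  # 0, the fractional part is discarded
print(float_ratio)

# Without the cast, every day except the hottest would come out as -100%.
compared_without_cast = (integer_ratio - 1) * 100
compared_with_cast = (float_ratio - 1) * 100
print(compared_without_cast, compared_with_cast)
```

<p>With integer division the ratio is zero for any day cooler than the maximum, which is why the query casts <code>c</code> before dividing.</p>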
<h3 id="sliding-window"><a class="toclink" href="#sliding-window">Sliding Window</a></h3>
<p>Comparing each day's temperature against the hottest temperature <em>ever</em> can be useful, but more often than not, you want to compare a value to a limited period, or in other words, a sliding window.</p>
<figure>
<p><svg version="1.1" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 225.47702292506506 87.35177153348172" width="auto" height="10em">
<g transform="translate(13.88959401038312 53.636774791360494) rotate(271.5481576989779 8.872507993546549 12.52696474344043)"><path d="M-0.24147923849523067 -0.44432140327990055 L17.906558263013594 1.8072009589523077 L17.142505573938124 25.27377608382226 L0.2150809559971094 23.22675642334939" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M1.2081591442314812 -1.719491982290383 C2.8937599421319544 -0.8871284732484628, 7.011981255466572 -0.964178155679026, 16.940209427505373 -1.4284389142148284 M-0.48783951316909396 -0.8649253128131389 C6.395639008024929 -0.8048800743401883, 14.06736643908072 0.0052715164662442415, 16.935489532137733 -0.35180197206810926 M17.795771716991894 -1.9701620545238256 C16.102611971351614 7.627304670553231, 19.229429972621908 14.191181078857827, 15.867937146584026 24.58121927418829 M17.744015959920816 -0.2723116325214505 C17.084413556418202 10.17799668516404, 18.524467824063084 18.815250918071698, 17.936584768714837 24.27058715057642 M17.31489399544385 24.303975485697183 C12.498476374250103 23.608876427832705, 8.35133328896253 26.355467620730252, -0.7035320215307561 25.27131427448903 M16.988052714022402 24.442194115340428 C13.845867817261363 25.445616828790953, 9.82709105127517 25.459700253299424, 0.8732125289221866 24.384937348838125 M-1.8817004058510065 25.493638652945766 C1.2630745989904615 20.019540878791922, -1.3221702354326037 13.67310767067812, -0.47306723333895206 0.5715794954448938 M-0.09406284522265196 24.54632774424822 C-0.3219884346815302 18.40731427001747, -0.7975397120805934 13.166611457102634, 0.9591112034395337 -0.7430987702682614" stroke="#000000" stroke-width="1" fill="none"></path></g><g><g transform="translate(54.554969835078396 15.587598017623918) rotate(271.5481576989779 3.896384374871843 32.45260051974486)"><path d="M0 -0.11004032739924836 C1.6454524547526415 -0.2341022188290171, 3.290904909505283 -0.35816411025878586, 7.382623026073015 -0.6666666666666856 M0 -0.11004032739924836 C2.3625043278074735 
-0.28816565370220154, 4.725008655614947 -0.46629098000515473, 7.382623026073015 -0.6666666666666856 M7.382623026073015 -0.6666666666666856 C7.521089570887774 21.695680320922538, 7.659556115702534 44.05802730851176, 7.792768749743645 65.57186770615644 M7.382623026073015 -0.6666666666666856 C7.484693805484187 15.817764208241748, 7.586764584895359 32.30219508315018, 7.792768749743645 65.57186770615644 M7.792768749743645 65.57186770615644 C5.599831301540484 65.57186770615644, 3.4068938533373236 65.57186770615644, 0.6152185855060845 65.57186770615644 M7.792768749743645 65.57186770615644 C5.816893115546164 65.57186770615644, 3.841017481348684 65.57186770615644, 0.6152185855060845 65.57186770615644" stroke="#000000" stroke-width="2" fill="none"></path></g></g><g><g transform="translate(73.2606024117091 14.50976948040693) rotate(271.5481576989779 8.111319967999236 32.45260051974492)"><path d="M0 -0.11004032739924836 C5.045925042641416 -0.292793158969819, 10.091850085282832 -0.4755459905403896, 15.368816781472365 -0.6666666666666856 M0 -0.11004032739924836 C5.298168847097074 -0.30192890100340924, 10.596337694194148 -0.4938174746075701, 15.368816781472365 -0.6666666666666856 M15.368816781472365 -0.6666666666666856 C15.580828333482476 15.7809248525076, 15.792839885492588 32.228516371681884, 16.22263993599842 65.57186770615644 M15.368816781472365 -0.6666666666666856 C15.664577608649529 22.278089063205364, 15.96033843582669 45.222844793077414, 16.22263993599842 65.57186770615644 M16.22263993599842 65.57186770615644 C11.623521240381427 65.57186770615644, 7.024402544764435 65.57186770615644, 1.2807347317893643 65.57186770615644 M16.22263993599842 65.57186770615644 C11.629962429919907 65.57186770615644, 7.037284923841396 65.57186770615644, 1.2807347317893643 65.57186770615644" stroke="#333" stroke-width="2" fill="none"></path></g></g><g><g transform="translate(96.17956926128761 9.0601776629818) rotate(271.5481576989779 13.169242679752074 32.45260051974486)"><path d="M0 
-0.11004032739924836 C6.113718496916701 -0.2464230922145052, 12.227436993833402 -0.382805857029762, 24.95224928795159 -0.6666666666666856 M0 -0.11004032739924836 C7.548998267646831 -0.2784408279860006, 15.097996535293662 -0.4468413285727529, 24.95224928795159 -0.6666666666666856 M24.95224928795159 -0.6666666666666856 C25.367981482043483 19.198269611757855, 25.783713676135374 39.063205890182395, 26.33848535950414 65.57186770615644 M24.95224928795159 -0.6666666666666856 C25.334519090380468 17.599335760696654, 25.716788892809348 35.865338188059994, 26.33848535950414 65.57186770615644 M26.33848535950414 65.57186770615644 C21.130428130980118 65.57186770615644, 15.922370902456093 65.57186770615644, 2.0793541073292996 65.57186770615644 M26.33848535950414 65.57186770615644 C21.3431668399922 65.57186770615644, 16.34784832048026 65.57186770615644, 2.0793541073292996 65.57186770615644" stroke="#666" stroke-width="2" fill="none"></path></g></g><g><g transform="translate(120.74239127330486 3.66720342308372) rotate(271.5481576989779 18.508161097713526 32.45260051974492)"><path d="M0 -0.11004032739924836 C7.10185918699874 -0.222766208681882, 14.20371837399748 -0.3354920899645156, 35.06809471145765 -0.6666666666666856 M0 -0.11004032739924836 C7.065936937189219 -0.22219602457081902, 14.131873874378439 -0.3343517217423897, 35.06809471145765 -0.6666666666666856 M35.06809471145765 -0.6666666666666856 C35.59951045598019 17.401141302091084, 36.130926200502735 35.46894927084885, 37.01632219542708 65.57186770615644 M35.06809471145765 -0.6666666666666856 C35.5702254147786 16.405467940431592, 36.07235611809955 33.47760254752987, 37.01632219542708 65.57186770615644 M37.01632219542708 65.57186770615644 C29.13881545886252 65.57186770615644, 21.26130872229796 65.57186770615644, 2.922341225954805 65.57186770615644 M37.01632219542708 65.57186770615644 C26.038556107589507 65.57186770615644, 15.06079001975193 65.57186770615644, 2.922341225954805 65.57186770615644" stroke="#999" stroke-width="2" 
fill="none"></path></g></g><g><g transform="translate(146.1717339584817 -0.07080076045792794) rotate(271.5481576989779 23.285088103257806 32.45260051974486)"><path d="M0 -0.11004032739924836 C11.564155316553745 -0.2559388465007664, 23.12831063310749 -0.4018373656022845, 44.11911430091004 -0.6666666666666856 M0 -0.11004032739924836 C9.502079417450092 -0.22992278119217524, 19.004158834900185 -0.34980523498510213, 44.11911430091004 -0.6666666666666856 M44.11911430091004 -0.6666666666666856 C44.66942342500927 14.205120026668288, 45.2197325491085 29.07690672000326, 46.570176206515605 65.57186770615644 M44.11911430091004 -0.6666666666666856 C44.689098229905724 14.736820281742153, 45.25908215890141 30.14030723015099, 46.570176206515605 65.57186770615644 M46.570176206515605 65.57186770615644 C31.753897957772878 65.57186770615644, 16.937619709030155 65.57186770615644, 3.6765928584091707 65.57186770615644 M46.570176206515605 65.57186770615644 C31.12600884629989 65.57186770615644, 15.68184148608417 65.57186770615644, 3.6765928584091707 65.57186770615644" stroke="#ccc" stroke-width="2" fill="none"></path></g></g><g transform="translate(43.50907959425501 54.95042783487554) rotate(271.5481576989779 8.872507993546549 12.52696474344043)"><path d="M-1.5256249513477087 0.5579734947532415 L19.683581220815412 0.4823970105499029 L19.53111224145674 25.706223874921807 L-1.097594877704978 27.05158778988839" stroke="none" stroke-width="0" fill="#e08fff"></path><path d="M0.8996284070701903 0.3597140709696265 C3.032922568366642 -0.6140530104045832, 7.732596368290962 -1.5038281345581548, 16.083993972787425 -0.11341646702893038 M-0.1898431636844622 -0.39309847780712215 C6.198131230595502 -0.36289539852311425, 12.871647676191746 0.4279995492820353, 17.86097761449649 0.8056740607370796 M16.927339075962536 1.9178228173404932 C16.84861132134688 9.930678832685103, 18.220962312119127 20.222078320319753, 19.286884724060528 24.5997316423905 M18.146777240457467 -0.5825388478115201 C17.81536165007541 
6.526485508820458, 17.06010913976619 14.516974823643354, 17.696785090389184 24.90031469297678 M17.024552062129803 24.222139089451765 C9.413834921348943 25.508833039747046, 2.599219792798026 26.773578254005926, -0.8824023778613684 25.745123159002485 M18.043605636099027 25.77289193573401 C11.324924174064604 24.93766641038683, 5.725802680879483 25.539120599997705, -0.33629739356416466 24.78306974384266 M-0.016009816899895668 24.244057912017116 C-1.5018006311098235 16.785040784926693, -1.2631433354059356 9.898634512938806, -0.33488900400698185 -1.4271124210208654 M0.9639146523550153 24.178113315823385 C-0.6049877802384698 17.279549422135773, 0.40957896909586555 10.09490898273283, 0.30580215621739626 -0.624117230065167" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(73.67574626092153 54.78376116820914) rotate(271.5481576989779 8.872507993546549 12.52696474344043)"><path d="M0.33776656724512577 0.32996748946607113 L17.605700003812544 -0.9218289349228144 L16.140498327920668 25.382003575201043 L0.4643173012882471 26.11437997185708" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M1.3680242693743159 -0.4029874005892762 C5.154752813374069 -1.4967254176557339, 10.463240151302516 0.6110136552978536, 16.711299870341264 0.4742839786046895 M-0.04279290165412608 -0.136294848684136 C5.629096657797269 0.49062873345214425, 10.009416145175932 -0.5284806078047617, 17.329120788378514 -0.7405509526204549 M16.750480531612865 0.7790285144001245 C18.026327090799228 7.721653644100263, 19.46696497732343 16.83308103515294, 19.365668116966717 23.55289061703802 M17.365982917966775 -0.30527979601174593 C17.439736690320117 7.467722620166373, 17.261394198931796 13.40264653750416, 17.340080199661188 24.398534361841985 M17.447885450592803 23.787722850561188 C13.319840567037422 25.556424809508346, 8.049094404788196 26.6818973982536, -1.5541371957170382 24.126942064440637 M18.016339194641343 24.500180975614576 C12.522260772502324 25.250598980232063, 
9.513114292120864 25.21707133222115, 0.7923576647042536 25.34330384548283 M-1.4362841937690973 26.17961205878378 C1.5731823721802496 17.621307176392598, -1.2602542957389093 14.658684438166459, 1.3293359782546759 0.3770063314586878 M-0.008831421844661236 24.752365623238394 C-0.8045491102320408 18.99050177095478, -0.918272537289307 13.844111665924586, 0.5165449278429151 0.14021190535277128" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(103.00907959425501 54.78376116820925) rotate(271.5481576989779 8.872507993546549 12.52696474344043)"><path d="M0.32996748946607113 -0.1393159832805395 L16.82318705217027 -1.6045176591724157 L18.07309007541322 25.518246788169154 L1.0604504849761724 26.059289771701106" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M-0.4029874005892762 -0.5182887935135048 C4.458883360454448 -1.0196454894100622, 12.420318880421515 -1.1705185436892522, 18.219299965697772 -0.4576953205183083 M-0.136294848684136 0.7632871821876519 C5.116469366534954 0.2863557742875231, 8.554952013037735 -0.40733661838374513, 17.004465034472627 0.19967205703739954 M18.524044501493208 0.48541860841214657 C17.62161299713038 7.401160293940588, 18.653033406809794 12.570859727882116, 16.243977117250196 24.406643300886163 M17.439736191081337 -0.16103328298777342 C18.59616445312454 5.7343429338362375, 17.825762647485274 11.742501628992768, 17.08962086205416 24.475848733085105 M16.478809350773364 24.92645364115944 C13.957757514625964 23.923693999594377, 10.710976128026052 23.47007923751384, -0.9269874224402695 24.27637392654441 M17.191267475826752 24.377123508081578 C11.795679509333606 25.968190905001197, 5.565284168943953 25.900551793094813, 0.28937435860192173 24.56700852057525 M1.1256825719028711 26.877357810373315 C-1.4121496824169288 14.025282783268299, 1.7664531918620934 6.345988746763869, 0.3770063314586878 -1.5273741576820612 M-0.30156386364251375 24.264667599099585 C-0.3013373273296977 16.48189310147086, 0.15511519137185564 
10.232372273580902, 0.14021190535277128 0.16888328362256289" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(163.00907959425479 55.61709450154251) rotate(271.5481576989779 8.872507993546549 12.52696474344043)"><path d="M0.8107856418937445 1.433719852939248 L16.836037217537395 -1.3141852226108313 L16.7101353396573 26.330197114135036 L-1.2853648159652948 23.208898026133784" stroke="none" stroke-width="0" fill="#e08fff"></path><path d="M-1.331797937130508 -0.5743051859350623 C6.010744726288128 0.4848107035160743, 13.392205134596518 -0.6855119758471151, 17.45926216898587 1.3528745129134627 M-0.5814998485839269 -0.5129026108968668 C4.2456565315429575 -0.34064498103578683, 9.293998615618534 -0.5612473979426081, 17.681278064232348 0.29249775143902434 M16.700229751775495 -0.876365015283227 C18.970861747758903 5.389116479249032, 18.598118796842613 10.676500058131877, 16.219391035745375 25.61190298163415 M18.071163181113533 -0.548797438852489 C18.804274842031823 10.426584916751764, 17.087303593643533 20.452066645219787, 18.656730148839287 24.408231137889334 M18.07951515604157 23.698765544563855 C13.42591713551556 24.043966375618343, 8.874596444550017 24.63674936966605, -1.4005464816682824 24.23662050444031 M17.86941911219637 25.203771315272665 C14.170242035152171 25.08378635814288, 9.89375743473218 24.43378567906746, 0.145541998554795 25.259912435242004 M0.06029830314218998 23.71332046353341 C-0.6641071354510089 14.398002855305847, 1.3952444876073102 8.51559807361637, 1.7155870590358973 1.1029267217963934 M0.8902841163799167 25.958511261838385 C-1.0151326174739461 18.66915684072247, 0.5964350287910838 11.91677923387657, 0.6977958930656314 -0.6944458289071918" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(132.17574626092153 54.78376116820914) rotate(271.5481576989779 8.872507993546549 12.52696474344043)"><path d="M1.433719852939248 -0.9089787695556879 L16.430830764482252 -1.0348806474357843 L19.02128361434721 
23.768564670915612 L-1.8450314607471228 23.540287762517938" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M-0.5743051859350623 -1.0216552345742953 C6.549287287426505 -0.5394149934152785, 11.299648702203125 -0.15257402439416154, 19.097890500006546 -0.6022228541935888 M-0.5129026108968668 -0.47562541039137957 C4.896799172032933 -0.03226529853335389, 9.985496828412083 0.052562046856051176, 18.03751373853211 -0.5502612168182517 M16.868650971809856 0.9843472633510828 C16.95233884127956 10.542768131646453, 16.2773551610313 18.33670268707561, 18.302989481846325 26.992494720603236 M17.196218548240594 0.9988291515037417 C18.450731872278716 7.402333783529695, 18.531600921112563 17.15266592257469, 17.09931763810151 24.54892531466753 M16.389852044776035 25.162204720483803 C9.619177450305298 25.071708031593527, 3.214030459759522 23.686832922107474, -0.8173089824405986 23.451580602088804 M17.89485781548484 25.20031144627582 C12.367040340470691 24.669863382212306, 6.400868123537145 25.22435211045636, 0.20598294836109643 25.524372257116475 M-1.340609023347497 24.515763479853877 C-2.0300846588647783 17.799524972608136, 0.932111989256102 9.218336285308684, 1.1029267217963934 -0.6242121662944555 M0.904581774957478 24.120788459303686 C0.5532022525912847 17.396261360537196, 0.5350551118499365 9.079535889883498, -0.6944458289071918 0.40539282094687223" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(193.84241292758873 54.78376116820925) rotate(271.5481576989779 8.872507993546549 12.52696474344043)"><path d="M0.3654832150787115 -1.759724935516715 L18.062653659740917 0.6880963835865259 L16.845548092285625 26.839179236079463 L-0.20935643650591373 23.209382909919032" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M1.6744251884655568 1.1794544093173625 C7.275842202660968 -1.2655508712222998, 13.990961301027859 -1.038295481099552, 17.729344614910854 -0.5351255581465966 M-0.0036666291349009006 0.4583049001312185 C4.047361439562928 
0.31007671687096805, 7.9922985883141955 0.09843077971494116, 17.336069257403825 -0.7118047878396921 M18.750376271913282 1.7463085558265448 C17.863479154865345 6.5865470764790475, 17.265014844696125 14.042986997888386, 19.266201603124372 26.204800933237085 M18.07227638726168 -0.8132234616205096 C18.773463345058417 10.569870481306289, 16.95003820110128 19.680035232643732, 18.42344260578089 25.38524016608567 M19.017085072664067 24.247437346996215 C10.017259373885869 24.207667826094394, 5.753896011673419 23.985425547340903, -1.63700563838237 23.710949657003304 M17.693890962322598 25.52958259540803 C13.28349117723188 25.615982045501124, 8.656473196713472 24.785888008496617, 0.3358790657775048 24.651206225861248 M-0.4165212716907263 23.155623703832635 C-0.9187578456662611 17.202892669794256, -0.3276812927983716 10.07815502870021, -1.867229601368308 -1.0400876495987177 M0.4027136219665408 24.443175433772513 C0.04613510068324839 14.925393759892023, -0.7893883556518193 5.439919710828427, 0.14141302835196257 0.14829157758504152" stroke="#000000" stroke-width="1" fill="none"></path></g></svg>
<figcaption>Sliding window</figcaption></p>
</figure>
<p>For example, to find the highest temperature <em>in the last three days</em>, you can add a frame clause to the window:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">temperatures</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="cm">/* ... */</span><span class="w"> </span><span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span><span class="p">,</span>
<span class="w"> </span><span class="n">MAX</span><span class="p">(</span><span class="n">c</span><span class="p">)</span><span class="w"> </span><span class="k">OVER</span><span class="w"> </span><span class="p">(</span>
<span class="hll"><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">t</span>
</span><span class="hll"><span class="w"> </span><span class="k">ROWS</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="mf">2</span><span class="w"> </span><span class="k">PRECEDING</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="k">CURRENT</span><span class="w"> </span><span class="k">ROW</span>
</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">hottest_temperature_last_three_days</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">temperatures</span><span class="p">;</span>
<span class="go">     t      │ c  │ hottest_temperature_last_three_days</span>
<span class="go">────────────┼────┼─────────────────────────────────────</span>
<span class="go"> 2021-01-01 │ 10 │                                  10</span>
<span class="go"> 2021-01-02 │ 12 │                                  12</span>
<span class="go"> 2021-01-03 │ 13 │                                  13</span>
<span class="go"> 2021-01-04 │ 14 │                                  14</span>
<span class="go"> 2021-01-05 │ 18 │                                  18</span>
<span class="go"> 2021-01-06 │ 15 │                                  18</span>
<span class="go"> 2021-01-07 │ 16 │                                  18</span>
<span class="go"> 2021-01-08 │ 17 │                                  17</span>
</pre></div>
<p>Once again you used a window function, but this time you added a frame clause that limits the window to the two preceding rows and the current one. Because the data has exactly one row per day, three rows cover the last three days.</p>
<div class="admonition info">
<p class="admonition-title">Try it!</p>
<p><embed loading="lazy" width="100%" height="520" frameborder="0" src="https://app.hex.tech/global/app/d1f8949c-3a73-49d8-8597-d73a9b13cc44/latest?embedded=true" /></p>
<p><a style="font-weight:bold;font-size:0.8em;color:var(--stable-color);" href="https://app.hex.tech/global/hex/d1f8949c-3a73-49d8-8597-d73a9b13cc44/draft/logic" target="_blank">Try it on Hex »</a></p>
</div>
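<p>If you want to experiment with the frame clause locally, the following is a minimal sketch that runs the same window function from Python using the standard library <code>sqlite3</code> module. It is not the setup used in this article: the <code>temperatures</code> table is an assumption that stands in for the elided CTE, with values copied from the output above, and the helper name <code>hottest_last_three_rows</code> is made up for illustration. It assumes SQLite 3.25 or later, which added window functions.</p>

```python
import sqlite3

# Assumed sample data, copied from the query output shown above.
# The article's CTE body is elided, so this table only stands in for it.
TEMPERATURES = [
    ("2021-01-01", 10), ("2021-01-02", 12), ("2021-01-03", 13),
    ("2021-01-04", 14), ("2021-01-05", 18), ("2021-01-06", 15),
    ("2021-01-07", 16), ("2021-01-08", 17),
]

QUERY = """
SELECT
    t,
    c,
    MAX(c) OVER (
        ORDER BY t
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS hottest_temperature_last_three_days
FROM temperatures
ORDER BY t
"""


def hottest_last_three_rows():
    """Return (t, c, max of c over the current row and the two rows before it)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE temperatures (t TEXT, c INTEGER)")
        conn.executemany("INSERT INTO temperatures VALUES (?, ?)", TEMPERATURES)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in hottest_last_three_rows():
        print(row)
```

<p>Changing <code>2 PRECEDING</code> to another offset is a quick way to see how the frame size affects the result.</p>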
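<p>The frame syntax is flexible, and it is not restricted to <code>ROWS</code>. The query above can also be expressed using a <code>RANGE</code> frame, which bounds the window by the values of the <code>ORDER BY</code> column rather than by a number of rows:</p>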
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">temperatures</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="cm">/* ... */</span><span class="w"> </span><span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span><span class="p">,</span>
<span class="w"> </span><span class="n">MAX</span><span class="p">(</span><span class="n">c</span><span class="p">)</span><span class="w"> </span><span class="k">OVER</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">t</span>
<span class="hll"><span class="w"> </span><span class="k">RANGE</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="s1">'2 days'</span><span class="w"> </span><span class="k">PRECEDING</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="s1">'0 days'</span><span class="w"> </span><span class="k">FOLLOWING</span>
</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">hottest_temperature_last_three_days</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">temperatures</span><span class="p">;</span>
<span class="go">     t      │ c  │ hottest_temperature_last_three_days</span>
<span class="go">────────────┼────┼─────────────────────────────────────</span>
<span class="go"> 2021-01-01 │ 10 │                                  10</span>
<span class="go"> 2021-01-02 │ 12 │                                  12</span>
<span class="go"> 2021-01-03 │ 13 │                                  13</span>
<span class="go"> 2021-01-04 │ 14 │                                  14</span>
<span class="go"> 2021-01-05 │ 18 │                                  18</span>
<span class="go"> 2021-01-06 │ 15 │                                  18</span>
<span class="go"> 2021-01-07 │ 16 │                                  18</span>
<span class="go"> 2021-01-08 │ 17 │                                  17</span>
</pre></div>
<p>Notice how nice the <code>RANGE</code> syntax is... it reads like an actual sentence!</p>
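<p>The two frames return the same result here only because the data has exactly one row per day. The sketch below shows how they can differ when days are missing. It is an illustration under assumptions, not part of the original queries: the data is hypothetical (the readings for 2021-01-06 and 2021-01-07 are removed), the helper name <code>compare_frames</code> is made up, and it runs on SQLite through Python's <code>sqlite3</code> module. SQLite does not accept interval offsets such as <code>'2 days'</code>, so the sketch orders by <code>julianday(t)</code> and uses a numeric offset of 2 instead, which assumes SQLite 3.28 or later.</p>

```python
import sqlite3

# Hypothetical data: the readings from above, minus 2021-01-06 and 2021-01-07,
# so the gap exposes the difference between counting rows and counting days.
READINGS = [
    ("2021-01-01", 10), ("2021-01-02", 12), ("2021-01-03", 13),
    ("2021-01-04", 14), ("2021-01-05", 18), ("2021-01-08", 17),
]

QUERY = """
SELECT
    t,
    MAX(c) OVER (
        ORDER BY t
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS max_last_three_rows,
    MAX(c) OVER (
        -- SQLite needs a numeric sort key for RANGE offsets, so the date is
        -- converted to a Julian day number and the offset is counted in days.
        ORDER BY julianday(t)
        RANGE BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS max_last_three_days
FROM temperatures
ORDER BY t
"""


def compare_frames():
    """Return (t, max over last three rows, max over last three days)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE temperatures (t TEXT, c INTEGER)")
        conn.executemany("INSERT INTO temperatures VALUES (?, ?)", READINGS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in compare_frames():
        print(row)
```

<p>For 2021-01-08 the <code>ROWS</code> frame still reaches back to the reading from 2021-01-05, while the <code>RANGE</code> frame only considers readings taken on or after 2021-01-06. Which one you want depends on whether "last three" means rows or days.</p>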
<hr>
<h2 id="linear-regression"><a class="toclink" href="#linear-regression">Linear Regression</a></h2>
<p>Another common tool for analyzing data is linear regression.</p>
<figure>
<p><svg version="1.1" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 222.42911531172103 127.92075252892323" width="auto" height="10em">
<g><g transform="translate(110.67622217317125 14.954809705273874) rotate(270.07194166149554 1.8402040890621265 99.34177735880428)"><path d="M0.53911277577281 0.3127993293106557 C1.3064013414354425 33.32151526644653, 3.7852332764721424 165.10730276777477, 4.317399924410165 198.1459282202915 M-0.6369917462859302 -0.5685849681403488 C0.061766735183949484 32.56443690062328, 3.1421854159000278 165.88402509123154, 3.755977666977451 199.25213968574894" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(110.67622217317125 14.954809705273874) rotate(270.07194166149554 1.8402040890621265 99.34177735880428)"><path d="M-6.574855462930456 172.51113087657498 C-5.446590828587766 179.21006044418047, -1.7855112223711975 184.88424674775683, 4.904828417155549 199.42670153275026 M-7.534901651836291 172.12154620300637 C-3.7174812387736385 177.1567587280794, -1.7049247682753181 183.59100641439727, 3.9138790676776627 199.31065288565947" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(110.67622217317125 14.954809705273874) rotate(270.07194166149554 1.8402040890621265 99.34177735880428)"><path d="M13.941799665003296 172.07885232370762 C9.404952056776388 178.9893241351946, 7.400582125725538 184.78287942518915, 4.904828417155549 199.42670153275026 M12.981753476097461 171.68926765013902 C12.41490341929704 177.00475660284636, 10.04211041020417 183.53140202724745, 3.9138790676776627 199.31065288565947" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(12.191909039370898 11.918274662560634) rotate(178.1590126331561 0.7558961721176161 52.76398613550376)"><path d="M-0.4820705719292163 0.3714224047958852 C-0.24649167600890295 17.90869585394906, 1.8612282035936762 88.48855806887381, 1.9938629161643802 106.00715704903284 M1.4657036484684798 -0.47918477802537374 C1.5739310872102004 16.61589920328045, 1.3097086979047556 86.2104989589867, 1.5345409511758952 104.07658508987443" 
stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(12.191909039370898 11.918274662560634) rotate(178.1590126331561 0.7558961721176161 52.76398613550376)"><path d="M-8.20382391247568 77.47272963050413 C-4.42132906832877 84.52249316013025, -1.6867323533069127 94.33288301604247, 3.065711390150991 103.20484485672132 M-8.024861460138535 76.1692663968693 C-6.541724488089862 84.09862591561752, -1.8348384695070656 92.60472803851545, 2.409590536652413 103.59512562685923" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(12.191909039370898 11.918274662560634) rotate(178.1590126331561 0.7558961721176161 52.76398613550376)"><path d="M12.317263561651934 77.4022224319893 C8.95137381537043 84.39410982142253, 4.537125577104544 94.22906197360237, 3.065711390150991 103.20484485672132 M12.49622601398908 76.09875919835447 C7.589712423880254 83.88388636833831, 5.9071470376139255 92.41194163183712, 2.409590536652413 103.59512562685923" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g transform="translate(34.41213414124206 68.28422152560086) rotate(0 3.1156739503036306 5.622584308845205)"><path d="M1.0429479964077473 0.1427345983684063 L8.173145741947565 0.5235993377864361 L4.894982189663324 11.53593564289293 L-0.3849264495074749 10.460524442369554" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M-0.4508036507775584 0.37011181715651276 C2.21618971865858 -0.642317674775598, 3.2284983148470126 -0.2531542648496409, 6.0992359853797025 -0.012818468008578665 M0.241927600761984 -0.08188414634071192 C1.1731922110012265 -0.1025369140050721, 2.441713812293948 0.17854063238757928, 6.535825903742837 0.2005309850955162 M6.0321260925103815 0.23230320437875207 C6.032005053463589 3.24641531312416, 7.124238569116807 8.175233322096325, 6.720255262418073 12.181569502257801 M6.269134850618092 0.04751418342768032 C5.80558232540947 3.4141269364733984, 5.700664081320349 7.030385787370517, 5.908360309035777 
11.737424298331192 M5.648319624547072 11.146913218911994 C5.322709577123628 10.972131085287868, 3.894883992318084 11.51118068062062, -0.08739929190571993 11.016477747799637 M5.99147126257284 11.250642966549465 C4.086126101726836 11.243922577952594, 2.0744575451863705 11.069060159117841, -0.20324937886840716 11.451670766940564 M0.9846041928844533 10.244483812824042 C0.7841423051727794 7.727412079903335, 1.1964811827412654 5.216579703120638, -0.9248163383552905 -0.7390902408115679 M-0.03234836489293369 11.711167868089552 C0.5901045981367444 8.523093076414176, -0.22185988573910284 4.9256991488350375, -0.44312998394439784 0.23228959068283428" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(65.60962630425195 90.66666666666674) rotate(0 3.2515151206274595 6.573472501112718)"><path d="M1.209281075745821 1.2068073265254498 L8.299488951239937 0.8583896867930889 L5.831601430449837 14.60681863009669 L1.5598909743130207 14.644166066047923" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M0.5309137244470274 0.5948180450810778 C2.3679818866395825 0.5604217957327734, 4.128469316462653 0.2009770764154019, 7.090245171950929 0.043704045176992556 M-0.3250992251817407 0.24787944856421218 C1.8190812819269944 0.27157049014434853, 3.727294872447576 0.006797210467585968, 6.403250903337314 -0.30422394730654084 M7.161601942560347 0.6015783332689895 C6.4063323990561205 3.254289786460472, 7.263672771191716 7.714698654715676, 5.862480067375579 12.134757906316377 M6.9668478509760225 -0.30749994250360696 C6.092417758873493 4.402862944758717, 6.052878343031047 7.384623783002833, 6.990860796939014 13.722504943249717 M7.14056476821867 12.635782354704492 C4.595995560526343 12.664859335667185, 3.6411677370927293 12.796910081670061, -0.1661095419878199 13.109531175486888 M6.345193490661199 13.159387747025045 C3.756890574663551 13.333140565328364, 1.6928291051805062 13.222013729032906, 0.31568925211598786 13.232069560423572 M-0.5157836909629956 
13.436843741956588 C0.07469170430037325 10.34401456554858, -0.9279860111343581 6.185716832831713, 0.3340058853401786 -0.6152594734894273 M0.06222236080698085 12.605590507490101 C0.34620441696842 9.429399301684725, -0.7153471806542708 5.102272410820007, 0.509788264949617 -0.462317782570959" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(68.775770504879 41.9811912225706) rotate(0 3.418704253333999 5.925614611875517)"><path d="M0.29470355436205864 1.6352726109325886 L7.081610833651212 -0.10947419330477715 L6.408623491770413 9.969775102429573 L0.3281096927821636 10.034209511571113" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M0.3128663588859836 -0.0958997429584525 C2.2563845302514602 0.31573155065068087, 5.140042458887256 -0.3673853950319513, 6.310994449353806 0.01201356754729388 M-0.15992329186120327 -0.22301740397360964 C2.3724868006289763 -0.2918249658921083, 4.005892725032944 0.2054648206666923, 7.136743328270336 -0.30422357502594827 M5.905857294133443 -0.9746594994117287 C5.9588476622755735 4.068658639241818, 6.199499112751594 8.262599653217896, 6.769224933723329 12.833457969683423 M6.860084369353912 -0.46701256283347475 C7.176734361986892 5.000536526568082, 6.974214988558053 9.958379547539883, 6.99254090100578 11.455289814572073 M6.988177838157156 11.900015039498031 C4.823403535987208 11.351721917456464, 2.104504776062223 11.998112952385775, -0.3199815894211615 11.725742837870026 M6.555863111384179 12.090342182474838 C4.187912359466557 11.544726909662725, 1.221387760748751 12.122829854816171, -0.2404403105664078 12.05793794160744 M-0.397862837214079 12.716294073831586 C0.817880055217013 8.46794844514739, 0.10566905899931713 3.1062059977751595, 0.04660410327777531 0.2633626535482394 M-0.348977136124503 11.71310999755543 C0.3589207619122897 7.558649816387259, -0.30356929195428944 1.9675212712426435, -0.5571353954998797 0.5118012834790197" stroke="currentColor" stroke-width="1" fill="none"></path></g><g 
transform="translate(123.745467474576 73.61755485893423) rotate(0 3.1156739503036306 5.622584308845205)"><path d="M-0.6146116741001606 0.0798795111477375 L6.095230513044271 1.4775849469006062 L6.734319739766988 10.198816696877095 L-0.45356957986950874 12.98797873663387" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M-0.4898073932197621 -0.5124736270643077 C1.842256274220604 -0.1736114841654074, 4.220602694384196 -0.22025982291545498, 6.195497141491644 0.5164535187300011 M0.011922914212067426 -0.2455540854055402 C2.5955157409597955 0.19272283066261828, 4.9558273568123985 0.3327489830171595, 6.312916141464132 -0.20818399168335083 M6.479311226671315 0.08023576822882439 C6.434401327458657 2.7596375799184663, 5.4997914497925295 7.359763728865893, 5.70508902283384 11.038786979518571 M5.768303248468476 0.3932579919573359 C6.1841026709857125 3.8301552146550777, 5.650086931512842 9.125823979021598, 5.835906874643651 11.585132857764012 M6.022152575076388 11.700017640999796 C4.471952882794074 11.738398508193692, 1.8520678640136843 11.164613374746544, 0.024504325723246234 11.383644069249252 M6.047856559456653 11.172545859836534 C3.9403640420822836 11.47485075015508, 1.1178599854012221 11.009217146816004, -0.2929404546614067 11.514272504985383 M-0.061552788149892734 11.004080628027305 C-1.1498080232645467 7.8904009153532915, -1.113579233485653 3.402825276348345, -0.8092260919267571 -0.934972987056576 M0.40047659373685895 11.081454218014525 C0.4149474012050854 8.370492958199884, -0.051117481681891286 5.384645056349544, -0.23954083852198876 0.5310732761378696" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(112.276292970918 32) rotate(0 3.2515151206274595 6.57347250111269)"><path d="M0.4401153214275837 0.8066806308925152 L5.783764053855293 0.3018680326640606 L5.974834968123787 12.632915689346568 L-0.7089619748294353 14.843458010551707" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M0.046410370483156105 
0.6313785042319757 C1.608638050681861 -0.46843294338570673, 2.971321206473681 -0.15907102889377867, 6.2479020012181135 0.14339607194328374 M-0.20526992546129863 0.0826066577732179 C1.845142488374232 -0.01471813906655512, 4.275478612169171 -0.055179494003528486, 6.533808029854596 -0.26777663175035027 M5.224879937368238 1.019576529899234 C5.5754452799510705 5.658532200663987, 7.293372322598627 10.908126414768837, 7.067290341392231 12.705583119797213 M6.620686122631429 -0.11319103588529067 C6.63141895151247 5.049805693773558, 7.243537085042378 9.749064229822396, 6.635333254464913 12.759813580746519 M6.158326965377133 12.628290380450771 C4.092274385204775 13.753939770690573, 3.0001916367836334 13.723977881643204, 0.07940275586783296 13.111349302740518 M6.2076268878233645 12.926081507302769 C4.457555367112639 12.887268209926862, 2.671481934424616 13.249476807487607, -0.08446413721407825 13.378538808287919 M-0.045238199215180463 11.846701055667106 C-0.6645569730860242 10.827441029476617, -0.5084665749651569 6.0768933132862735, -0.08947639040686162 0.9712864016509215 M0.5728157195767876 12.879816012852032 C0.008993795913171634 8.606878725777293, 0.15614376747148284 4.114060458604278, -0.11157794395704135 0.3244346614105581" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(168.77577050487855 48.64785788923723) rotate(0 3.418704253333999 5.925614611875517)"><path d="M1.5809017308056355 -0.29255228117108345 L5.046244097298768 -1.2972046621143818 L5.471163583344605 13.84253548025388 L-1.6556923501193523 9.925796612494175" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M0.2934580473257572 -0.22954165313103236 C2.9637276058066213 0.47186498053447706, 5.441135577333636 0.06096435308546451, 6.719672659821503 0.026887615299726608 M0.0688076011443493 -0.20133770042928023 C2.1234800756224326 0.20707454259686572, 4.644591724755695 -0.1751402508536145, 6.564746886516875 -0.3214318296132848 M6.982113394370571 -0.06487018794700705 
C6.600790028583954 3.4640941307957718, 7.049296662531789 8.081246812510782, 6.032397978227974 10.998389730159328 M6.683479674389151 0.4220603596513529 C6.588850278353531 3.0407989249132914, 6.526945903659608 5.1196206065875005, 6.251360224924643 11.598778283948505 M6.7908739974865675 12.356371818014152 C4.312319367743173 11.496243291779292, 1.2882658104615525 12.449775304634715, -0.27785466874776166 11.789040780492673 M6.7793795133893875 12.019959853825984 C5.137427256779882 11.938831839389525, 3.1640356760524506 11.852541590554516, -0.09028717134221576 11.763363513837149 M-0.46205681508605 11.920107304435358 C-0.7416162931681592 8.712426754282184, -0.18923328063674844 6.762051839593487, -0.6510926131717314 0.5683709750400638 M-0.07972408265772302 11.58416420475717 C0.21395895504267998 8.301437512624391, -0.479027082941825 4.656952979738956, 0.44429881676334715 -0.42134223867501575" stroke="currentColor" stroke-width="1" fill="none"></path></g><g><g transform="translate(71.19697168765208 -13.742960400147354) rotate(270.07194166149554 34.064883528928476 76.32405013942409)"><path d="M-0.1751625631004572 0.7279165778309107 C18.78753580275023 37.446244107898984, 36.12281802453832 76.06729258002743, 68.30492962095741 151.92018370101727" stroke="#f41d92" stroke-width="1.5" fill="none" stroke-dasharray="12 8"></path></g></g><g transform="translate(146.27629297091755 10) rotate(0 3.2515151206274595 6.57347250111269)"><path d="M-1.1418089978396893 -0.26213471964001656 L4.997978498015755 1.6473021022975445 L7.422810246024483 11.847555233833567 L1.1564899571239948 14.073665334579722" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M0.15279988563104874 0.06637966502617698 C1.1385814941684074 -0.15040462877832447, 3.7048564031863283 -0.6164144121783025, 6.193088832859099 -0.31035164356163353 M-0.07383139134831695 -0.2545195669124534 C1.60820683547545 0.0904902096676623, 2.647104465718915 0.21461956678373678, 6.411894837530366 0.10288502106129649 M7.536770268041878 
-0.879617413484642 C5.934616642877319 3.0265530260201365, 7.815639245577698 7.746295504413279, 6.378877848703961 12.564336189246958 M6.290595753628718 0.19194953147483174 C6.217599967334683 4.825437907261962, 6.077402498512418 10.179528511560333, 5.8547171348638045 13.412869694105456 M6.02707198436204 13.37176887968589 C4.992516842784791 13.162400696221097, 3.6509727013180067 13.454056549026717, 0.06121229695539954 12.948789710079817 M6.556659586249301 13.03271818248206 C5.125875219889348 13.458319126758704, 3.861693420717658 12.949531509952928, -0.07701702891132442 13.240331349121515 M0.8269948517633949 13.631629715814032 C-0.1613604368539349 10.325435899715774, -0.6245396750407051 6.145954516568489, -1.113568688787406 -0.4748403580367635 M0.264804599144442 12.755124639815566 C-0.47028860871015343 10.107957354624308, -0.418508202313955 7.056631994622795, -0.6248132349596902 0.04686486611797047" stroke="currentColor" stroke-width="1" fill="none"></path></g></svg>
<figcaption>Linear Regression</figcaption></p>
</figure>
<p>For example, performing <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html" rel="noopener">linear regression using Pandas and Scipy</a>:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
<span class="gp">>>> </span><span class="kn">import</span> <span class="nn">scipy.stats</span>
<span class="gp">>>> </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">([[</span><span class="mf">1.2</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mf">1.8</span><span class="p">],</span> <span class="p">[</span><span class="mf">3.1</span><span class="p">,</span> <span class="mf">2.9</span><span class="p">]])</span>
<span class="gp">>>> </span><span class="n">slope</span><span class="p">,</span> <span class="n">intercept</span><span class="p">,</span> <span class="n">r_value</span><span class="p">,</span> <span class="n">p_value</span><span class="p">,</span> <span class="n">std_err</span> <span class="o">=</span> <span class="n">scipy</span><span class="o">.</span><span class="n">stats</span><span class="o">.</span><span class="n">linregress</span><span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">df</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
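<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">slope</span><span class="p">,</span> <span class="n">intercept</span><span class="p">,</span> <span class="n">r_value</span><span class="p">,</span> <span class="n">p_value</span><span class="p">,</span> <span class="n">std_err</span><span class="p">)</span>
<span class="go">1.0 -0.2000000000000004 1.0 9.003163161571059e-11 0.0</span>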
</pre></div>
<p>Most developers probably don't expect the database to have statistical functions, but PostgreSQL does:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="k">VALUES</span>
<span class="w"> </span><span class="p">(</span><span class="mf">1.2</span><span class="p">,</span><span class="w"> </span><span class="mf">1.0</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mf">2.0</span><span class="p">,</span><span class="w"> </span><span class="mf">1.8</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mf">3.1</span><span class="p">,</span><span class="w"> </span><span class="mf">2.9</span><span class="p">)</span>
<span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">))</span>
<span class="k">SELECT</span>
<span class="hll"><span class="w"> </span><span class="n">regr_slope</span><span class="p">(</span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">slope</span><span class="p">,</span>
</span><span class="hll"><span class="w"> </span><span class="n">regr_intercept</span><span class="p">(</span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">intercept</span><span class="p">,</span>
</span><span class="hll"><span class="w"> </span><span class="n">sqrt</span><span class="p">(</span><span class="n">regr_r2</span><span class="p">(</span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">))</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">r</span>
</span><span class="k">FROM</span>
<span class="w"> </span><span class="n">t</span><span class="p">;</span>
<span class="go">       slope        │      intercept       │ r</span>
<span class="go">────────────────────┼──────────────────────┼───</span>
<span class="go"> 1.0000000000000002 │ -0.20000000000000048 │ 1</span>
</pre></div>
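<p>Using <a href="https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE" rel="noopener">statistical aggregate functions in PostgreSQL</a> we get the same slope, intercept, and correlation coefficient as SciPy, up to floating-point rounding. Note that <code>sqrt(regr_r2(y, x))</code> only gives the magnitude of r; if the sign matters, use <code>corr(y, x)</code> instead.</p>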
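<p>To see why the two approaches agree, it can help to compute the least-squares fit by hand. The following is a minimal sketch in plain Python, not how SciPy or PostgreSQL are implemented internally; the helper name <code>least_squares</code> is hypothetical and exists only for this illustration. One thing to watch for: <code>regr_slope(y, x)</code> takes the dependent variable first, while <code>linregress(x, y)</code> takes it second.</p>

```python
import math


def least_squares(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = slope * x + intercept.

    Hypothetical helper for illustration only; it assumes at least two
    points and that neither xs nor ys is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squared deviations and of cross products
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # signed correlation coefficient
    return slope, intercept, r


# Same three points used in the pandas and PostgreSQL examples
slope, intercept, r = least_squares([1.2, 2.0, 3.1], [1.0, 1.8, 2.9])
```

<p>With the three sample points, the slope, intercept, and r should land within floating-point rounding of the values shown in the examples above.</p>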
<hr>
<h2 id="interpolation"><a class="toclink" href="#interpolation">Interpolation</a></h2>
<p>Data cleaning is an important part of any data job, and handling missing values is a big part of that.</p>
<figure>
<p><svg version="1.1" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 330.74143176335497 109.28031153954504" width="auto" height="10em">
<g transform="translate(10 18.685786920406258) rotate(0 18.38701636885878 4.409901648534486)"><path d="M-0.02556205913424492 -1.9061564691364765 L36.695111050859545 -1.6287463195621967 L35.560441150919054 7.540316950079077 L1.914544451981783 9.795044909711951" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M0.6051103584468365 1.2821125872433186 C12.212825879825226 1.6828978201563467, 22.507747669514163 0.24341072050882467, 35.08773920650825 0.9249549992382526 M0.09150427021086216 -0.9973726514726877 C11.781391350117305 0.1940111434201791, 23.7578327665716 -0.48291946294502475, 36.501264473081086 -0.7969022635370493 M37.160376079344964 0.5873523776664785 C36.74067289094627 3.474988188011666, 36.558621131271494 6.9970267998668545, 37.25745706494878 8.759939693991093 M37.03094135536447 -0.28953357104453814 C36.54838781987791 2.5427005403039513, 36.541664810682875 4.922601956790585, 37.181392240804634 8.718926001834161 M36.70024118538246 6.827780533131236 C25.098829129460828 8.54083056044184, 13.332816701224743 8.625383682962287, -0.488707710057497 10.409931111630076 M37.60363336432025 8.471650039326423 C24.087378858394878 9.263484881893884, 10.746472634329855 8.132944808975946, -0.3659211453050375 9.440198277603859 M-0.5768625613584838 8.970073467778555 C0.33864472470203144 6.897439653514966, 0.8997982601752001 5.119960591578465, -0.4516712789151425 -0.07055415840530166 M0.18351657115268732 8.737777832386026 C-0.25946286773762506 5.982133713330276, -0.14977627603508412 2.1195082916510986, -0.33831611476947715 -0.10939118456761854" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(10.56073975645586 46.59473198317471) rotate(0 18.38701636885878 4.409901648534486)"><path d="M-0.07892168685793877 -1.6287463195621967 L35.560441150919054 -1.27948634698987 L38.68857718969927 9.795044909711951 L-0.11211610957980156 8.863077770467871" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M0.8566988222301006 
1.7088478691875935 C11.087203865951626 0.033618518256444185, 20.897127092046198 0.6892127479546253, 37.27680096741066 1.125215519219637 M-0.17525923810899258 0.061317240819334984 C9.54980238542581 -0.2488392201762618, 19.137773325977214 -0.43012811816744223, 37.504869540811036 0.22332212887704372 M36.72097631846049 0.18446351433659702 C36.57755889294846 3.046468813173304, 37.29609139911153 5.201119118235322, 36.465608686401445 8.71038767314341 M36.55932938227046 0.20518360249930162 C36.60398809660482 2.590140845545532, 37.23277396433403 4.584268233493708, 36.739514771109015 8.663879956156185 M35.843194700435255 8.33690564661848 C21.750088883624752 8.055185217462308, 10.142407108562296 10.043657559953457, 1.059736680239439 7.369095135029429 M37.06812824833438 9.104196285378212 C24.082875627736904 7.845626219480607, 11.157920019468524 9.312168284624192, -0.9138945993036032 8.213600312840217 M0.28517881391209554 8.737343582624508 C0.9040186055211126 6.175216987480088, -0.39398502277929415 3.5001462140287245, 0.48825294571388145 0.17957212712565707 M-0.24150565502505933 9.116327648164841 C-0.16852237755305588 5.7184870489684965, -0.3750194290000146 3.3715938877978626, -0.005636308335802143 -0.42029812778048" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(11.30930134613618 61.565963776800686) rotate(0 18.38701636885878 4.409901648534486)"><path d="M-1.6287463195621967 -1.2135915867984295 L35.49454639072761 1.914544451981783 L37.74927435036049 8.707687187489146 L0.043274473398923874 9.721908736523265" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M1.7088478691875935 -0.20450644567608833 C12.414187997428115 -1.843814883884499, 25.21460945501434 0.34524687701790574, 37.89924825693712 -0.721671748906374 M0.061317240819334984 -0.15546840988099575 C10.753094558606897 -0.20831837503832218, 21.940513584768482 0.7952866926916278, 36.99735486659453 -0.017516976222395897 M36.95849625205408 -0.23510817893177693 C37.248745428000845 
3.5157086579794647, 36.77290437240627 5.756144610180301, 36.66461711379195 8.516435072312511 M36.97921634021679 -0.2214263646420558 C36.963694104689864 3.5671177275018073, 36.61524872827379 6.284998482220843, 36.61810939680472 9.064347815768015 M36.291135087267016 6.867318402525061 C25.479159036538974 8.159942244728803, 16.571102205781635 9.70838663502574, -1.4507081620395184 9.68611348747837 M37.05842572602675 8.753741973560327 C21.844899771601447 8.562032931118326, 9.22845595523767 8.01405947711976, -0.6062029842287302 8.79555159203338 M-0.08245971444443945 9.666135646454212 C-0.035061864342359014 6.299053590391605, -0.0855811754762623 5.169899387258428, 0.17957212712565707 0.023468123231048854 M0.2965243510958934 8.687984233746429 C-0.41550204281213343 5.468513050873411, -0.09298609163103436 2.788218725881028, -0.42029812778048 -0.01740184384899729" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(13.153967362993626 88.7543629638883) rotate(0 18.38701636885878 4.409901648534486)"><path d="M-0.8449347130954266 -1.8435358293354511 L35.456014886156176 1.9602872841060162 L37.8015238192235 6.841798673864478 L-0.2611667029559612 10.041884909864539" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M0.36498136445879936 -1.984556209295988 C8.295426417320744 -0.6887375741342714, 19.083487894743236 0.9466807217260191, 37.51766941662178 0.0861273892223835 M0.5826892200857401 0.9913427587598562 C12.814206016399973 0.7701972591233648, 26.774415821137058 0.11721532219175462, 36.04157509434268 -0.4629033450037241 M37.576656813836934 -0.1651450023635911 C36.362850293467254 1.1159181613190836, 36.625900102923346 3.738303419096655, 35.94278276683675 8.365578598506847 M37.035717561433174 0.06584351795984328 C36.52476209108579 2.506541442020841, 36.649970747955194 5.920911244824453, 36.965266261404416 9.04709447677017 M36.66668567772255 7.069870281513804 C28.91642819362217 11.046753316209658, 20.70323049345734 8.207188396737918, 
0.9729912988841534 8.082791734036082 M36.27351399290607 8.27947250522303 C25.95849233536574 8.378766649741813, 16.245332631560764 8.670124465007468, 0.6201749984174967 9.287151908051246 M-0.48075995833710655 8.722289654414098 C-0.8089456741382076 7.3218967245719515, -0.04428772039625832 5.1458362375763285, 0.275615992890082 0.4682994983514367 M0.24879299957547868 8.380536186214709 C-0.08873773035165537 5.818764587262459, 0.48383847828434307 3.2521641069770024, -0.1625532664887271 0.37026672489820006" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(57.11242469161607 16.68028410992281) rotate(0 18.38701636885878 4.409901648534486)"><path d="M-0.47695304080843925 0.9997671358287334 L36.53344632650957 -1.3466554172337055 L36.3703874495183 8.45809663414585 L0.05767403915524483 8.803347359892005" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M-1.8912106044590473 1.3729121573269367 C13.883969221624008 1.2486842020592548, 28.70743930039261 1.1025854691109516, 37.06708810444221 -1.924392830580473 M-0.3936631139367819 -0.5121728423982859 C11.26337212844346 0.3271004656423049, 20.348685340184055 1.2059292892087417, 36.35156538116977 -0.9217679146677256 M35.90175215284966 -0.11517194739078063 C37.39940545086638 1.6684921999443545, 35.98522773582504 3.4742761733203604, 36.798487812461225 9.201035104233291 M37.13039104562679 -0.3724695133161189 C36.50696279327662 3.170579021403525, 36.91720582271329 7.238395994349072, 37.02789468009856 8.754923026424205 M35.25767693634376 8.808915186222666 C23.523005060150787 8.679117420586186, 13.031691796018361 10.738211253555852, -1.8818716295063496 8.994375396069163 M37.524326881051515 8.268396174084419 C27.030576927525114 7.9851976138989595, 18.455461388671566 7.979997049037662, 0.18540667928755283 8.585619186531776 M-0.26853005598538326 8.188396199160945 C0.7547672756901297 5.790889846471393, 0.4690152683567116 4.04781172758255, -0.6134687685483496 0.6206236413141054 M-0.17703662485557786 
8.592299423573456 C-0.3661766451988306 5.388711689329568, 0.14529541549568625 1.9553319513254064, 0.2153646826816517 -0.033548529161233576" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(58.421726037752705 59.56046096631724) rotate(0 18.38701636885878 4.409901648534486)"><path d="M-1.3466554172337055 -0.4036452881991863 L36.412326074794386 0.05767403915524483 L36.75757680054054 10.475259828861827 L-1.4577538259327412 9.584871697720164" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M1.2080260403454304 0.6249935142695904 C8.616765094597598 0.32196962503257437, 13.18528274382047 -0.3374106917017396, 37.902370825067614 -1.9921855218708515 M-0.039203012362122536 -0.36860973201692104 C10.424177608565259 -0.7551710743220992, 18.247335680054164 -0.9917126435550399, 37.75417637977049 0.5137455407530069 M36.377816858180964 -0.8752517431502417 C36.107375557337015 1.7981154789966025, 37.143747350142746 4.35028543389008, 37.171895366539275 8.043099456269621 M36.49570614461597 0.10338682211362821 C37.040123600393265 3.612519547320367, 36.63465856440952 6.793921706726109, 36.502719603747494 8.789429249359758 M36.658675685182665 8.419520388837928 C30.473685590924337 6.647417055167551, 20.46043698226171 7.299079524077769, 0.5736088640987873 9.49328220963108 M35.815594376653884 8.041676396053308 C22.019049851489005 8.647312738761224, 8.957160706573148 8.160706319674768, 0.7873601671308279 9.48243086926269 M-0.6486247072923376 9.35826218340607 C0.05172662691483647 5.828923595385296, 0.6837643511049056 4.844892479594878, 0.6378432483631619 -0.6018683295103648 M-0.005239569822161283 9.02888303740095 C0.03273350478999281 6.74330024034107, 0.011952245537864885 4.130902018203156, 0.22044373702208603 -0.053048120570038715" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(58.56911286906006 31.79367972965457) rotate(0 16.462136479959668 10.736594456999967)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 
M-1.010108491000707 5.833169192125194 C1.4123596660643636 3.99276949179078, 3.9704614858402825 2.1370208022346633, 4.5832386474632685 0.852106138676148 M-0.6177728205593178 6.668317068417661 C1.2980211349488353 4.195444517121244, 3.7096787871389494 1.9324371016255775, 5.020348826380915 0.5975526723859126 M-0.7402063271869315 11.280558953105986 C3.8734243447475563 9.244640604858274, 6.054854182223378 3.077639131355851, 10.801331089389455 0.35162681593445555 M-0.46294908885847463 12.364710306047975 C2.839544899503199 9.880856351136, 4.244632175963929 6.499781055343717, 10.811873611760271 0.4497241775586964 M-1.665422416343377 16.55485756965416 C2.9723066667829983 14.79088787867478, 5.87528986813017 11.418879251336532, 13.991276699587186 -0.044468890230638713 M-0.8123122985055904 19.033355717454725 C3.6658682510642455 13.059629501256179, 9.823621815405293 8.788012577630541, 14.913503111808975 0.7466754677635601 M3.27176075531151 22.36358086688235 C6.615012703920345 18.44497171293716, 9.184598729240157 11.87762808595206, 19.48781132912066 -2.0621102215558764 M2.128355777915379 23.061916978264 C6.093670426164373 17.837484255315754, 8.697944285412625 12.593816916125698, 20.911918565908337 -0.9816832436085221 M4.703087185932604 22.06106781695176 C10.842741799416562 18.07458643403306, 15.485560846004027 13.225930840521158, 26.290059848364802 1.4155166516092361 M6.242425459860174 23.458888574653752 C12.297282274545331 16.458399579035877, 19.020258322449898 8.945094861151707, 26.926739361744733 0.5723070142802538 M14.054003376745376 22.895770998565972 C16.010950779731676 18.523173685620986, 21.341850369896743 13.210673119095949, 31.36529168248812 -1.3093619233683462 M12.151327135273013 22.810918886108816 C16.93244565963427 16.573939070592054, 22.700551320966326 9.932979863942988, 31.744265686205424 0.14477151457809256 M16.448792636956313 21.90750507998304 C21.97730497869263 15.982037239403283, 27.763875475795942 13.78366218760286, 35.00614083271609 1.0195191322149668 
M17.376499875079134 23.26359650666849 C23.37599982277827 14.871302608056197, 28.697056147033067 7.982779235517869, 35.364834177234634 1.5059523867698346 M23.84473791010462 22.5410416800485 C26.924017080478194 19.09292898789664, 30.16499211767931 15.780177740808991, 37.198303628299826 7.489963041693445 M22.720238442064897 22.927219934664016 C26.14992146473754 18.446057241402794, 30.743874197770705 14.655814841296054, 34.6900300734271 8.117195780557122 M26.779405897811227 22.838830704967513 C29.493899222797022 20.584603513462344, 33.52048534574749 16.135501719035368, 35.1524374734531 13.66494072278367 M27.948238730106677 22.368478438518032 C29.542011138439314 21.009506418873645, 31.154099573462513 18.07052051549852, 35.388704548116294 14.812901789726334 M-0.06617253467599049 21.415666007198396 C-0.06617253467599049 21.415666007198396, -0.06617253467599049 21.415666007198396, -0.06617253467599049 21.415666007198396 M-0.06617253467599049 21.415666007198396 C-0.06617253467599049 21.415666007198396, -0.06617253467599049 21.415666007198396, -0.06617253467599049 21.415666007198396 M6.790362903076606 22.0669503166424 C4.379916626866119 20.52274365492075, 3.508019576217559 18.124058375885962, -0.4499516495484641 17.083883787309286 M6.573303689408498 22.001647343401785 C4.87953277561205 20.004188985610824, 2.7652009644761266 18.851499861650627, 0.11380337914640395 16.50287391728675 M12.283077642289786 21.259383334064893 C9.441087958057858 20.345996159366372, 7.49189556937933 17.98681309861081, 1.4500651141880772 11.793045807625225 M11.301539619235186 20.637472465479785 C7.854631740498074 17.6711121286352, 3.121605990162796 12.745772414238854, -0.6137058133007267 10.70660499557393 M19.43600474374272 23.43266612272619 C12.166825404357482 16.16338478997187, 4.698636020303008 10.972850761732676, 1.5270962888475488 4.834749584310276 M17.90474966058411 22.218546973354208 C13.126569568961088 17.52377272828698, 7.289836418103606 12.632844347158231, -0.3959685558352639 
6.571631900258882 M24.83999100291487 22.012639079731027 C13.248669859759952 14.044603423201876, 4.578578674391018 2.9960708368919065, -1.7582550129519323 -0.15797988470073887 M23.439704192442264 20.76801075648896 C15.162903276814326 13.0919008996785, 7.152146128109418 7.453653481823336, -0.2005152354614357 -0.5011806409376497 M31.111704878198463 20.742281447806455 C22.524831416848123 13.801212672820085, 15.092825115099021 8.441076672814992, 4.265862346715846 -3.509303285357147 M31.034217929156977 22.369764266246058 C23.422718422691997 15.381131553274823, 16.219754572901728 7.813356950364829, 1.6956401905935192 -2.7030014661965662 M35.52417816588991 17.859783868802936 C26.361887280961632 11.005525738902044, 19.176181085423188 3.4918058849075866, 7.001496815261158 -2.86870358085594 M34.60652530374341 19.838475195257217 C25.536516541029943 11.21173993209863, 16.51566243887858 4.717727642807828, 8.837795681458624 -2.88613336293361 M35.241870992508396 12.810177263561453 C29.067671261203728 8.089975935222647, 25.904442416581116 5.021218300622586, 14.579448528534869 -4.802295687327662 M35.042460954439505 13.088229640144672 C27.489438526010723 7.836016346448545, 21.41685689368564 3.684935568286896, 15.263672759703496 -2.6144329464980753 M34.04050111682376 10.15477987973513 C31.29225628698274 6.222398365007134, 25.294057371192213 1.385493328009491, 19.24219335205504 -1.017750991073949 M34.096560235742835 8.984509222382027 C28.40554362895231 3.9494198088971357, 24.313936242842573 -1.05382254732307, 21.790675832513955 -2.419936758776708 M35.6769491426983 3.2073086587998976 C32.51549453738708 2.2061218543188312, 29.61972944120955 -1.5812410768878868, 26.93113729934273 -2.0452150089745427 M35.293961726628346 3.721747546019774 C33.12339363921916 2.288507216079229, 30.46578584142361 0.8021496005929479, 26.532570400359756 -3.3679958160114496" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M0.6249935142695904 1.0619273073971272 C12.430443985591733 
-2.1232597260555903, 23.908452287724423 0.9294711918750125, 30.93208743804852 -0.7873262278735638 M-0.36860973201692104 0.8396258112043142 C7.768522684850206 -0.9088606078691094, 15.722971157015184 0.9930509488516243, 33.43801850067238 -0.9890023116022348 M30.93953086608807 -1.485611330717802 C33.11093782759615 7.0260430745263145, 34.876304977902855 13.225308105991134, 31.163000481070675 23.089362408607126 M33.15871542678932 0.7723182793706656 C33.35051858685015 4.481162166806904, 32.948569836831076 8.641336239179665, 32.85539603220085 20.715011013313056 M32.52399005168835 23.416925694434763 C21.183043599113248 21.62877077757259, 11.975348336723695 22.027807542671162, 0.6734789125621319 22.97377720066798 M32.14614605890373 20.50954998807502 C25.047865614149757 20.923445941313588, 16.499445170006133 21.944990218981587, 0.6626275721937418 21.168726278110267 M1.2210224382579327 21.44730606266749 C-1.5917836850294629 17.97583268570012, 1.56787820404677 13.110182825578454, -1.3648112304508686 -0.8029050938785076 M0.47411429323256016 21.444237542911292 C0.6356104494703926 16.06294422007395, -0.09123225196469653 11.304734011860628, -0.12029320560395718 -0.6733277086168528" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(59.00404616677724 73.57096492570201) rotate(0 18.38701636885878 4.409901648534486)"><path d="M1.820042122155428 -0.7467214353382587 L35.783627524629686 0.1499590389430523 L35.446801557794664 10.043517242666358 L1.895867932587862 8.260777245756262" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M-0.1809731014072895 0.2474219687283039 C9.528488990538504 1.151624975068497, 18.819444460169844 -0.02370959772355624, 36.57304093952522 1.9249094612896442 M-0.5968806203454733 -0.02442469634115696 C14.426473763980098 0.7371387909218792, 26.15837955516602 0.20236287816019088, 35.902345677495454 0.7939160224050283 M37.227132461905676 0.22857021657410737 C36.58770750812201 1.3275467665896232, 36.0671108772463 
2.861210113280387, 35.92615767532285 8.45788616425854 M36.954791269871464 -0.03629618556040165 C36.38229858941304 2.4972183933355363, 36.66895724739435 5.218298969949322, 36.49532273262883 8.937097011637869 M36.45094666595802 9.132033396061534 C28.976854387806554 9.484600056492207, 21.904323846715634 9.783989299618122, -0.13898031786084175 8.10087983637678 M36.65674169886157 9.064066683422798 C25.024886308278703 8.379422718493466, 12.279438769868687 9.294601792304043, -0.7113838251680136 9.61117861188578 M-0.23053741690864415 9.37595194150447 C0.05255498586237507 6.250909555181737, -0.3128950120796279 2.153282860846897, -0.4011418217472588 -0.5207191767637898 M-0.1848157415617529 8.810177354934464 C0.15482339829635866 4.972880744820691, -0.059343844559326966 2.0293696115156314, -0.19159074260967057 0.1690007950901538" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(59.56478592323219 86.50867819484438) rotate(0 18.38701636885878 4.409901648534486)"><path d="M-0.7467214353382587 -0.9904052130877972 L36.923991776660536 -1.3272311799228191 L37.997746683314894 10.71567122965681 L-0.559026051312685 8.073280382450694" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M0.2474219687283039 -0.6941475160419941 C9.011174495812998 -1.592552999212162, 16.041761281930896 -0.16771624441327299, 38.69894219900713 1.3046059124171734 M-0.02442469634115696 0.9954829160124063 C9.472701671596017 -0.4841050660384269, 17.454788595563464 -0.6521899854910942, 37.56794876012251 -0.8796824868768454 M37.00260295429159 -0.2610003845187795 C36.22229695584418 2.611095522959258, 35.88588196322728 5.937510014312859, 36.41211560490707 8.984256240031835 M36.73773655215708 -0.4402383882558197 C36.50404065203518 2.9774036583599632, 36.41472729656539 5.983256772180285, 36.89132645228641 8.39098253524403 M37.08626283671007 8.478911768194312 C29.99111477577068 8.80326215923977, 22.682576364489492 8.62141709030819, -0.7189234606921673 9.665432225462073 
M37.018296124071334 9.657725983779901 C26.29562986668178 9.456214916492335, 17.36639584501 8.065286826873653, 0.7913753148168325 9.297624820392603 M0.5561486444355216 8.953743225663413 C0.48269039161958704 5.128502811279819, -0.5475661289125205 1.4990806424385077, -0.5207191767637898 0.804015081607136 M-0.009625942134483478 8.96958427895156 C-0.42105477270017666 6.159900211112648, 0.08539016472258389 3.437178904492825, 0.1690007950901538 0.40131033774477015" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(12.178511809778229 31.793679729654457) rotate(0 16.462136479959668 4.373820944708854)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.9941690973611423 5.643639567543857 C1.3728248553317846 4.764402974844172, 3.255753710777812 2.6507380415571364, 5.034648717524952 0.46759808620784815 M-0.13125897493677835 6.378159114022844 C1.1756874735655742 4.506387165969838, 2.401240414561179 2.9343224607181964, 5.204913314017004 0.10993535405371724 M1.1910245513448816 10.56610060029682 C2.8993232560890743 7.64922879655988, 7.370572479180254 4.831531634364006, 11.87620113450004 -1.3964256510988113 M1.6744841638857824 11.045057562114861 C4.194434627985389 8.25508537298072, 6.023227172867638 5.121069321964296, 10.446539064635468 0.3857999161898853 M7.724052542433397 8.919514892301759 C11.59854788303567 7.4739516125582846, 12.501114324657514 1.5112379545036243, 15.01631949516334 0.1963163663563301 M6.555138299320148 10.760085572633198 C9.092376974878796 8.375975015155344, 10.806760946229716 5.681133432329506, 16.092404711435965 -0.19420893789036364 M11.208291458095216 10.955567243172768 C16.569425791836608 6.313468032484163, 19.01949910438136 3.513617972276565, 22.332746453115607 -0.9987398614943501 M12.730789287703836 10.448300280830804 C15.315639000138898 6.98188762995379, 19.53695712225894 2.1991969974235333, 20.958797023536526 -0.7116550732240554 M18.519223757025987 11.127825388651367 C20.600565818281886 6.099502921965959, 
23.006222597916977 3.506243493378824, 27.207246096108467 0.5128814861190261 M17.414480118856968 10.047667331866947 C20.37774890176319 6.678026915481389, 24.1282705016999 2.740832961045637, 25.997066469632447 0.7321318621759059 M21.61064595029477 10.707871105286994 C24.612070853738597 7.369082857520261, 25.281468339078433 6.211582838807599, 32.004408654055396 -0.8940273995266075 M23.21028548207853 9.907709839146355 C25.970612295669124 5.949101043923858, 29.250543692537 3.358390085930478, 31.53118452955673 -0.09574107460088688 M28.538362381870073 11.066996135071669 C29.884360510845156 6.5979493557326485, 33.11860419116775 4.097450735684295, 36.6069957426949 2.0868787192063367 M28.35511681119181 9.620053924594824 C29.549116872300146 8.258570511580345, 32.0140838240428 6.896484378027164, 36.11346634125473 1.6291418005967473 M0.19700226139836108 8.918893342571115 C0.19700226139836108 8.918893342571115, 0.19700226139836108 8.918893342571115, 0.19700226139836108 8.918893342571115 M0.19700226139836108 8.918893342571115 C0.19700226139836108 8.918893342571115, 0.19700226139836108 8.918893342571115, 0.19700226139836108 8.918893342571115 M5.258115538843386 7.877165349609047 C3.3110862802774355 7.567906643978159, 1.9341492669981282 5.697544168501756, -0.31311932857050384 3.039583649224907 M5.576353458203785 8.73977079544106 C3.9284014715838107 7.172800137346602, 2.354702787977381 5.232824115815924, -0.28942512850179153 3.5823151660331782 M13.387883124252365 9.51744918022659 C10.726931681435264 6.867938850653157, 7.113452890226113 5.511186629367841, 0.004888998766848451 -0.0406635645732647 M12.698225262419836 8.54280066220444 C7.990527351460395 5.7211512969784195, 4.121950687109741 1.8930190328864622, 0.9639251604242712 -0.9061041991182035 M17.789500047812858 9.79713492260202 C14.180314742772737 5.34289386645572, 12.906618965103995 4.019589606427387, 6.658145288477244 -0.8771344002618009 M18.886434296971807 8.410276074212781 C15.059151301901544 5.772401057755143, 
11.189188401356992 1.8299553014697016, 7.2576759010614476 -0.434934348175537 M24.305821720679386 8.919862321106567 C22.56418738187492 5.190728176752833, 19.053682049379645 2.8449694081821297, 13.082447880228052 -0.6074715388055918 M24.168773034290506 8.442721643479391 C19.65384574782431 5.257941789104743, 15.249828463771943 1.404460960365994, 13.104739083723167 -1.588787626107412 M30.05988424665687 10.10122233411188 C25.649242811786134 4.1754576111847985, 22.581174216599976 -0.05593942492155257, 20.212356315324186 0.0317675406670066 M30.603565053766793 8.617462337164978 C27.46521973474841 5.418227132132048, 23.70137664211925 2.7403028122068704, 19.75923906905028 -0.8908273470031229 M35.62213302265043 7.0313205863386425 C31.220647249832766 4.6084314606859005, 28.12928610796435 -0.43285022842113396, 24.486728574101395 -2.585672566495059 M35.156380981776216 6.9145555472038716 C31.814723906884975 4.348891642291617, 27.410603013798436 0.6861937000365501, 24.586766760282732 -1.8423874071940463 M34.44120408653874 1.2106052325196268 C33.238201258268994 1.1367645611145853, 32.433358187524405 -0.12076148443326007, 31.31389811523414 -1.2555839941372877 M34.225396073829714 1.5420237553395033 C33.50840000124163 0.9341020967808661, 32.65211505889352 -0.05446507128650256, 31.35442770943201 -1.2515424687882666" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M-1.4614542834460735 -1.8597162254154682 C10.53811852874392 -0.3565361121425203, 17.524618373404618 -0.6389938214549593, 34.13273399496237 1.622622560709715 M0.7100650426000357 -0.450328653678298 C12.719216649939318 -0.9076680692560167, 24.651133989090123 0.4517528859251051, 33.11701778681735 -0.014460807666182518 M33.48563575217423 0.847828274262626 C33.54057770714543 3.229523161541376, 33.796017188963745 5.464103137108313, 32.188217613451556 8.042310600348705 M32.51648724444099 -0.2804922892038433 C32.910031300085855 3.5773794621799584, 32.64840308599967 6.641852968187412, 33.15685850072404 9.160681193072145 
M34.7397163453118 10.015298167282886 C20.61656448766185 8.722184509630278, 11.179802553107606 8.05350396096237, -0.9354405514895916 10.096780935341663 M33.329513054208554 8.092233322600073 C23.341202051996994 9.541706390893413, 11.806784711284454 9.01532030872102, 0.4122233148664236 9.179273806551642 M-0.7872680451564114 8.037730152720544 C-0.18192314275861737 7.0262794921177285, -0.2548973083367034 4.275262949382126, -0.5037243686648996 -0.3336039978887211 M0.19031602676589066 8.71905763923995 C-0.20575842521533572 6.8755288339702485, -0.23833599800873684 5.2646237205125574, 0.15272868551232943 -0.14620603559345263" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(14.42419657882192 76.70737511053278) rotate(0 16.462136479959668 2.876697765346222)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.265274188411469 5.549691496225483 C1.6643587374164333 4.6064494871786215, 2.519663718618075 3.4214044548983438, 5.081069779912478 0.2587005944770986 M0.37743041475764433 5.543591072498298 C1.229904709850103 4.1628115731843245, 2.342387469224913 3.5647967708497266, 4.768114889277613 0.3131747589295023 M5.401424073827874 5.280862490812468 C6.6070447497047615 5.020401147950606, 7.875452949717965 4.08311888733532, 9.851843661022318 -0.3560278941795467 M5.675113766295325 6.156007587144303 C6.752220952259871 4.554267262525939, 7.630494113222094 3.3491469306459565, 10.870258270901573 0.33560558735278867 M10.369210867047208 6.2764144395508 C13.404153788025157 2.9334499036292287, 14.009093988187256 1.2548882438492068, 15.547484232954163 -0.3506257328274677 M11.360287387225675 5.731119974333061 C12.330358484370166 4.397818611240997, 13.136073766631293 3.1821192736495783, 15.337201376821328 0.4856258790870181 M16.273777617121482 6.757361714064086 C17.585176849087926 4.212862934590127, 19.952476288316916 2.288979220948478, 20.728893859589363 0.6327180331736331 M15.983910781497501 5.795339308946335 C18.00482243735073 4.212533115215811, 
19.273900753730853 1.9254424297728139, 20.892942310029532 0.21546923054153955 M22.206965118062527 5.410476097475726 C22.734165996087764 3.500508784515355, 24.57859808176605 3.1173068160505206, 26.36660635811499 0.5920888029050767 M21.625341549024764 5.459175075508642 C23.243072423893047 4.056823656592417, 24.55904659127912 1.7245568911977225, 26.484664252914847 -0.035271313133394544 M26.782038142836964 5.899809423643827 C27.355215999443637 3.8147985335712686, 29.179453055391264 3.193525275053102, 31.438530728078284 0.41260775935493743 M26.32480313361374 5.908679209459304 C28.02117110040861 3.8025943960569064, 30.505655087679802 1.6324475257867666, 31.89469434232155 -0.11356910380877211 M32.73903024170713 5.224670443610069 C32.49452658743429 4.953352275720675, 33.14110072505411 4.487588823684856, 35.711001735107466 1.3601384662346871 M32.081329236944924 5.339469567333129 C33.03054205692947 4.890214334358759, 33.977342814640046 4.08540472049142, 35.502939499375834 1.9879574646497105 M0.1701363804484819 5.90129282983639 C0.1701363804484819 5.90129282983639, 0.1701363804484819 5.90129282983639, 0.1701363804484819 5.90129282983639 M0.1701363804484819 5.90129282983639 C0.1701363804484819 5.90129282983639, 0.1701363804484819 5.90129282983639, 0.1701363804484819 5.90129282983639 M5.166085029268591 4.897776483550385 C4.855617433668202 3.9805870776716894, 3.020920574636244 2.485683401013559, -0.4607010809736686 0.14615221495945685 M6.028690475100605 5.325252240702227 C4.182501908052468 3.828817889707075, 1.891850016695627 2.1557498475948087, 0.08203043583460293 -0.060799932971403736 M12.84513589043083 6.591064199764917 C9.808227751472316 3.4217372309964653, 7.895155125111909 0.5051971063982599, 3.2789810122515135 -2.179917636292509 M12.000440508144967 5.500265878864268 C9.01136763688752 2.715604038493778, 5.327389819093977 0.12135595795771081, 2.528932462312566 -2.7734717392013284 M19.258718303378835 5.056888166742091 C15.571234658954067 4.616999181689881, 14.429986276049998 
0.9956982603751726, 8.725227724210749 -1.9757160982093775 M18.056773968108164 6.301880316083943 C15.293600118837196 3.425648957791967, 11.450278141185034 0.22575243785272736, 9.108467769352178 -2.895497076304164 M24.48347839654108 6.410601659721435 C21.176940711351595 3.992500793065939, 19.200954175097415 2.2994264002577984, 14.9439985519311 -1.7298347509209546 M24.069956475930862 5.4939718838284906 C21.26324244768006 2.4719754268838114, 17.780201586972545 -0.2774634796177494, 14.093524609602856 -2.25750487760869 M31.678511844034983 5.499314191762085 C26.238629095673513 3.5844779107371174, 22.376596156595546 0.35989896487765805, 21.669193857029985 -2.3980937688680948 M30.39258651334767 6.297676584414865 C26.83099366351835 2.945653336421591, 23.540527448253037 -0.16471867555758468, 20.869611621049206 -2.032206797047949 M34.63435380393343 4.365992716752816 C33.29192007259832 3.3286673275444674, 29.866879195334334 0.34738086428992965, 25.411741995673076 -2.297636869905902 M34.53555261697324 4.733896961395521 C31.974136725248787 1.5492369611596504, 28.59270091020744 -1.201293208310521, 26.040675592004703 -3.04400834597872" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M-1.8597162254154682 1.6380829699337482 C11.021065377839086 -0.03486614548261735, 21.856631374404188 1.4490476337761107, 34.546895520629086 1.6136280186474323 M-0.450328653678298 0.18555829487740993 C8.036706926404754 -0.6499664738522959, 18.34838278053566 -0.04103892592062469, 32.90981215225319 -0.03764685429632664 M33.48189654186301 0.36227320960446896 C33.36225123284687 2.687051075478305, 33.12759380231552 3.829436849709214, 32.46037080920418 6.098762918801241 M32.73979088995131 -0.028397022421644935 C33.22379086623568 1.7158482872062149, 33.16793407769206 3.9856140567712752, 33.195932284076264 5.8815383537437596 M34.19192923778454 4.789584360382686 C21.697059628209317 6.995580495595881, 9.930167771319638 5.116815747022578, 1.3491390459239483 6.044281325600276 M32.26886439310173 
6.5264666763312915 C21.74482709461934 5.4816137439667605, 9.1217240154973 6.299041200345697, 0.43163191713392735 5.057667502499352 M-0.4669147485381925 5.6632207782774335 C0.2108753948631303 3.594430567052506, -0.2727518521073641 1.3805028856787565, -0.21941407464290574 -0.4863041890084477 M-0.018800094848407334 5.564492293424248 C0.0543336093759933 4.352789499971246, 0.23454262687161337 3.4732939194067267, -0.09616090397587451 0.056518343245082925" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(105.32095381040563 13.417840078636914) rotate(0 18.38701636885878 4.409901648534486)"><path d="M1.1703398935496807 1.4770015366375446 L37.11520776863441 0.8599173910915852 L37.84641811962471 9.934573340710276 L-0.9352233894169331 8.651572752293223" stroke="none" stroke-width="0" fill="#f938c5"></path><path d="M-0.2897336594760418 -1.7334765680134296 C11.253432130266836 0.6015784090963886, 23.703568897792916 -0.6178872042687847, 35.85128547216997 -1.9332552440464497 M-0.13409318961203098 -0.812355762347579 C11.505869250287864 0.04839664711869757, 21.486505136843032 -0.6729032991894384, 36.929082573980544 0.41064625419676304 M36.41955043986953 -0.796603104791832 C36.60247915283617 3.2040533001927036, 37.49968058374859 6.167456574238433, 37.33185200813158 8.454339504397778 M37.1820593474673 0.349736548773577 C36.898360784878605 2.9294089223029682, 36.337961919824124 5.848246073938459, 37.18679420713328 8.411065058689674 M34.8419233706151 7.968543897863501 C29.130484954468347 7.142614277967775, 21.6895429067298 9.483459028372133, 0.8346344120800495 9.457508813139075 M37.23121369037077 9.401568823974603 C28.56720434436971 7.906382156290929, 19.88046512471142 9.511555446066778, -0.35254153050482273 9.287584298770899 M0.773615477424887 8.965354283866496 C-0.03119293976304001 6.773259662551842, -0.12886050281324718 2.6928310379432796, 0.737444340583323 0.4496573273119827 M0.313608931780811 8.726170199455101 C0.42390819887508474 
6.572578314989692, 0.38584104164721983 3.6789849690286913, -0.09739420908240809 -0.3607092791713911" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(106.06951540008595 28.38907187226289) rotate(0 18.38701636885878 4.409901648534486)"><path d="M-1.6764583699405193 -0.3219753988087177 L38.69955684210405 -1.0259571559727192 L36.79615677858934 8.769380222555274 L-1.9722298495471478 7.312735806699866" stroke="none" stroke-width="0" fill="#f938c5"></path><path d="M-0.524430800229311 1.2008421309292316 C15.208530682594699 1.5578652415882632, 27.384801949354255 0.9497456346165225, 34.90122935886726 1.9080995209515095 M-0.4621428903192282 -0.0994585994631052 C13.508160197947051 -0.17251759118655113, 29.37720023994868 -0.5671949321708956, 37.73287655937717 0.35011533461511135 M37.418963158383264 -0.705342036075441 C37.205379531140316 3.527112307328581, 37.158253831807315 5.481776344547072, 37.062958644945944 9.689297796776772 M37.111883733384204 -0.2928647821079552 C37.202097840641095 2.4077817872094425, 36.8334787316138 5.082928061216129, 36.743387585877535 8.628691749212805 M35.912939406589125 6.983616400059336 C24.08935465143916 8.342173369046325, 10.31039803959974 9.998816640492553, 1.5776663534343243 8.35234992533552 M37.512533506036256 8.990390812527412 C26.98394889337727 9.571186575820823, 16.891332679766137 8.567382190158744, -0.0841152723878622 8.491489862572426 M-0.339397737683269 9.41821222326791 C-0.9054947972809485 5.623008626639107, 0.43204461352346285 2.8696069486603086, 0.1546120785537105 -0.037906113580670975 M0.03595302093182179 8.570787833870169 C0.3657386900649605 6.871928434296718, 0.44222619690047105 4.923574589515681, -0.34947827668158077 -0.028127392808215634" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(105.32095381040563 44.85742684525178) rotate(0 18.38701636885878 4.409901648534486)"><path d="M0.5833394043147564 -1.6236143223941326 L35.28707645054207 -0.8669382445514202 
L36.79802999611244 7.191463280018443 L-1.600963044911623 8.732662368115061" stroke="none" stroke-width="0" fill="#f938c5"></path><path d="M0.8575409539043903 -1.6406140811741352 C12.58414808536424 0.8567518743035767, 22.16755529616979 0.7942267211434815, 35.45699895884142 1.4624598734080791 M0.40374926291406155 -0.6193298753350973 C8.41888486640364 0.7719047691197105, 15.830052803267243 0.2538423563809105, 37.030765892595504 0.7661191169172525 M36.81040999708779 0.8258309471895869 C36.79010398178007 1.6779883291924904, 37.5888386260433 4.570080177121532, 36.646965327751424 8.440069607020448 M37.15996756758772 0.2716485970753671 C37.18462219671602 1.733129456975664, 36.73936031499124 3.3116919742419606, 37.032086929013005 9.14547487263424 M37.88880278135881 7.884579907652014 C24.946273283814968 8.360356103784433, 13.419381756241823 10.899401080018869, -1.7955453507602215 8.050176750417823 M37.358011068433974 8.037805312793726 C23.422067252027592 8.5406027786957, 9.108596198682733 7.662368094228384, 0.12264261208474636 8.901331239860529 M0.7219540277715052 9.455890476573463 C-0.7558544547732099 6.615638448729388, -0.7961995543400596 4.38199518742328, 0.8491371922227192 -0.4524370153449835 M-0.33230097051156626 8.89591211255019 C0.4027824738838018 5.886353015250667, -0.42059233960220216 3.66661167030903, -0.4351986037982008 0.4176693455747746" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(106.81807698976763 73.30276725314172) rotate(0 18.38701636885878 4.409901648534486)"><path d="M0.061309803277254105 -0.5144860036671162 L38.020149722353075 -0.6870346553623676 L36.924183740869616 8.758876453634375 L-0.9996594302356243 10.259446393247718" stroke="none" stroke-width="0" fill="#f938c5"></path><path d="M0.5134663097560406 1.532238233834505 C13.329339046400756 1.5822472924041129, 29.64516219198102 -0.08953198253637779, 38.64670703525886 -0.1389833800494671 M-0.14407057128846645 -0.4305466655641794 C8.52044311713745 0.5236633009285554, 
18.66474844679872 0.6876712626785859, 37.390029688954804 0.7888331767171621 M37.290141120308526 0.6513431511305859 C36.90500872902894 2.8190137617164854, 37.2274653022547 5.371199924636396, 36.36160842104371 8.745615281394915 M36.37812381760069 -0.1696988688416345 C37.071667577325265 1.9451339928475397, 36.48918815611273 4.970278696165985, 36.42917931771689 8.897109336345803 M37.019317961886976 8.98285918265211 C24.77052485253013 9.771266712211371, 15.74800051148383 10.11815656568885, 1.4424067251384258 7.234832454022044 M37.73679478991077 8.306824719082588 C25.987438390949713 8.735390090529453, 14.19260507853651 8.00706788260175, 0.1725861970335245 9.728120302330726 M-0.8703972075964016 9.655141988218496 C0.34181012959356066 6.013369976789837, -0.5711699086920727 4.2566213125102506, 0.010582554935622657 -0.7180819325565674 M0.34521128619298364 9.25942996621645 C-0.10592382920351413 7.271321331793779, 0.14343106860924654 4.6741217798465025, -0.22444591372385042 0.017668755512350975" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(107.2458336993036 59.116177753022214) rotate(0 16.462136479959668 4.373820944708854)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M0.2449534841207911 6.529770281960361 C1.185646538634646 5.113874252949957, 2.3744802405010654 3.2679452433219964, 5.344370664241261 0.655688029139977 M-0.42630702104883883 6.513925238748185 C1.6365560611675871 3.9029905206229594, 3.890717702348587 1.7037630891094842, 4.979106600912756 0.19408709889624653 M0.3977230166921717 9.907721379334298 C4.335310654938768 9.15773774765489, 4.72755824187896 5.995094334511212, 11.193802004632484 0.5151765122160079 M1.7384837929064731 10.571150019886161 C4.694661962533621 6.634209498288924, 7.372939677533379 4.023066399513674, 10.20326864360733 -0.7206935330212353 M7.863937467322008 9.10472419043561 C10.33559851367779 5.447613184023158, 13.53583149011688 3.329976220513009, 16.78103581043281 1.017570738953071 M6.947307691429065 
10.568812062177113 C9.075212318017732 8.14963530623357, 11.446908995908887 4.962404472130094, 16.253365683745077 0.7528853952814808 M11.647658990825319 9.66573965795243 C16.33492388633732 5.498222212157882, 19.524827946649303 3.7188655517744897, 21.512410921880864 0.9848319063126567 M12.50743387522062 10.25918430354351 C14.215536920116152 8.588158385684794, 16.16332883754225 6.309107896618928, 21.906443045379483 -0.4905346918687698 M17.926728005645067 10.765410480242249 C21.912598940972824 5.2977580441122, 23.723622330098326 1.2817396288480656, 27.109064156983408 0.972497980297798 M18.3615239311319 10.299529655151096 C20.652474829438273 6.171625250166113, 23.56100723581545 2.3644191140154582, 26.226988776170078 -0.08820925997451529 M21.668034112907684 11.691868593308845 C24.74734743674719 6.916590972240632, 27.941008147847207 3.4949343971084303, 31.837738283953463 0.9605374102725865 M22.73033674684141 10.902311208050468 C26.376160346914233 6.248207989871874, 29.482339331509674 1.9515320667847522, 31.447383713020777 -0.6449560814291949 M29.414285753185077 10.964052881554519 C31.060117299393998 7.710743329560357, 34.84309208365039 3.1542970531867742, 35.535217850582164 1.49624854552317 M28.292328122055306 10.34724061635584 C30.52951144781941 7.4876482775678035, 33.47945602801789 3.864799394305277, 36.012412805316465 2.173908524706397 M0.19700226139836108 8.918893342571115 C0.19700226139836108 8.918893342571115, 0.19700226139836108 8.918893342571115, 0.19700226139836108 8.918893342571115 M0.19700226139836108 8.918893342571115 C0.19700226139836108 8.918893342571115, 0.19700226139836108 8.918893342571115, 0.19700226139836108 8.918893342571115 M5.89914241720032 8.485610517150551 C3.878747986213887 7.016969416068147, 1.8005782419495762 5.237053109533727, -0.6343055849791952 3.057800545282036 M5.557306388654225 8.518676568603903 C4.75821170798615 7.464162093131023, 3.664666675788516 6.240859400635324, -0.010324615043539598 3.243200309313208 M13.065901813407944 
10.157593409161832 C10.533904657962418 4.912628887880215, 6.9852086579255115 2.194941759534853, 0.3964582555493802 -1.6010694149837261 M11.92052859331812 8.846886453315433 C10.016586326298977 7.1295815020871265, 6.694942987429357 3.5823390923818934, 0.7080847993059546 -1.0761757413812345 M18.534554412017265 10.316349433519683 C15.132936511344667 5.962040166223838, 13.166982253989453 3.5010080055026096, 7.528981665621104 -0.7343732890023404 M18.675775210267552 9.685501773160688 C14.38174683906674 5.628846461352538, 10.095944799864863 1.6463312257563105, 8.053187087184341 -0.31446900994540394 M23.82213539274512 9.897126606194153 C19.524831873620005 5.478486010625007, 15.822878132931296 1.0060142095258602, 13.15630626054574 -2.3895952236887092 M24.996281901077463 8.123230316670462 C20.518469308886885 5.2384346659662215, 14.994911814956021 1.1752257426644714, 12.381716563181023 -0.6088767159778651 M31.289962109396434 8.189734635556887 C29.093190683172267 6.809138645497436, 25.892385109040944 3.8947054439287028, 18.576272831628202 -0.3540739481711892 M30.03998162013871 9.381537676005463 C27.434612084449963 5.777278956388183, 24.36191117136448 4.0744125650961855, 18.984906962936847 -1.6759215485679118 M35.83847745071287 5.9932075883299145 C31.97461377608805 4.960388389328459, 29.562782495215973 4.07822585178825, 24.685448337857196 -1.5839563103533965 M34.855797917220094 6.769446358647201 C32.34894553358026 4.290621840598437, 29.610966884133823 2.7767078539877046, 25.554132990770555 -0.8070755294665379 M34.72473632454533 1.1952660487122537 C33.73411826740095 0.633714586636178, 32.59987296389491 0.3190690916341616, 31.12276909835792 -0.8443576261868484 M34.346033061515875 1.6100563538349868 C33.6173033874569 0.7456701621411121, 32.59989656861223 0.1566722258825603, 31.450603332372072 -1.1580475289841896" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M-0.46745337173342705 -0.44170694425702095 C11.122760480856702 0.9106223052489191, 26.994334959667533 
0.08145744261616766, 33.784190351010956 1.0723853819072247 M-0.32831343449652195 -0.017796581611037254 C12.653640601815926 -0.8700783334981512, 23.040049823064898 0.7061776436556269, 31.991732876615323 -0.642361169680953 M32.88667698460624 -0.5343964750624957 C32.05002234990079 3.2424192627441086, 32.92718430142468 5.776315632427711, 33.488596731130805 9.063558389776414 M32.89637569834994 -0.36662643656891564 C32.91694811022184 3.761434631566813, 32.762993602513 6.685517827070199, 32.91324588494914 8.316332878231249 M31.66459494257132 6.830066481644458 C26.245437520185227 7.0729648283744435, 20.025853614644838 9.630038063886223, -1.6236143223941326 7.260685602242297 M32.12379143746356 8.704071424940771 C25.284265875577447 9.26239182379103, 17.020948428315354 8.230672794841292, 0.8876616712659597 9.064175568560309 M0.026815810169003362 8.522614923557835 C0.5895511674609908 6.350137869605828, 0.11019527101091534 4.526978852475981, -0.43723313535402963 0.6296741126972212 M-0.08452720288651827 8.435208578717223 C0.28791193709533813 5.870166541006606, -0.4521713505098585 2.737625139963869, 0.08921183114027587 -0.11980118808524448" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(108.74295687866561 86.8129565712307) rotate(0 16.462136479959668 4.373820944708854)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-1.045683913645322 6.971942713906161 C1.258543705122976 4.8071699054610155, 3.255806824656196 2.2643300641148896, 5.035196889522022 -0.27159144321109707 M-0.04854661074000094 6.186691256662003 C1.274586244988918 4.377491953903223, 2.864495725037701 2.8938948529052153, 4.903293402823157 0.6152021158335024 M2.3655434226522547 11.366537478501167 C3.861233760033265 8.173152579957664, 6.944109048701572 2.857530386656564, 11.5429044540496 -1.1604876742616532 M1.9721742216402989 9.841835430385736 C4.589564508237585 6.46497466178038, 8.369620260192308 2.730945587283953, 10.909391781081391 0.1492754773318531 M6.305468529448167 
11.250624082288942 C10.912944187063182 6.411024384622328, 13.32661087932091 3.7790574394118415, 16.583216744385144 -0.7065664480510185 M7.429028007839129 10.590205492410114 C9.752567396910113 7.23070871453363, 12.27203435987963 5.163878624153364, 15.44990876932018 -0.015830808884931546 M11.258341335261964 10.513187959267738 C14.422839101068172 6.762200720579746, 15.468697712943777 6.729636990065673, 20.54275421602454 -0.5140164842665669 M12.607364233153664 10.997492089531365 C14.026739881826611 7.471534457735071, 17.28247064479512 5.0382263139229, 21.418210408927763 -0.36812885290240016 M16.71283762001732 11.262605843286988 C20.674174534872055 7.932923780796353, 23.82852618950739 5.743639351212962, 26.168470127556834 1.3581035301101148 M17.559004287825168 10.573827595882145 C20.986856468100438 6.096090133504377, 24.213827782807833 2.92967433692129, 25.642734284840884 0.863239788544619 M23.743648708320894 11.28400378427259 C25.892642120926897 7.162336742546795, 26.050962504074466 5.715372472432877, 32.0526427353237 0.6133095030918008 M22.090159377723673 11.100526525776603 C26.53122952607234 6.302784640637069, 28.94059305888012 3.4789846074473667, 32.067822646393275 -0.23720325759954408 M28.72060873311214 10.480616940812027 C30.63938291439985 7.5442754310773585, 34.17028616425743 3.437394837552146, 35.63084823407368 2.0704900644974744 M28.198629942333472 9.542407263790075 C30.775913069832384 8.014939501185115, 32.281058839150376 5.10295090706532, 35.81148157810928 1.722640006735358 M0.19700226139836108 8.918893342571115 C0.19700226139836108 8.918893342571115, 0.19700226139836108 8.918893342571115, 0.19700226139836108 8.918893342571115 M0.19700226139836108 8.918893342571115 C0.19700226139836108 8.918893342571115, 0.19700226139836108 8.918893342571115, 0.19700226139836108 8.918893342571115 M5.817635878809207 8.483495839721352 C4.196889794606446 6.611659402143362, 1.738587504917648 5.647703216938473, -0.2609110003234335 3.1877964635559617 M5.855891994275563 
8.744445516205221 C3.386453769259867 6.599608511121326, 1.2198760085413118 4.846839838361413, -0.3395313678206392 3.3040370059379636 M13.196283993621872 9.835814515691068 C9.465469952420012 5.437491958690856, 6.654069766028941 5.634984381006892, 0.33644508053869904 -0.4970403555736309 M11.637204530493872 9.511049863979038 C8.478686933700661 6.4395691149664875, 6.322902643097259 2.4901931085876363, 1.5886534824533851 -0.6452715317674345 M19.79445700849096 8.074583815195403 C16.364815659414 7.862522594840005, 13.514068487272485 3.2790773924348633, 8.434060195932688 -0.6886366185082609 M18.922020773297795 8.705074806016675 C14.877654871351998 5.394317719887161, 10.363158926869442 2.7897618784285516, 6.9586256891550144 -1.387713283607971 M22.78052813204108 7.473045161453857 C20.2683711547556 6.409801814410404, 20.304015927696977 2.732652718363341, 12.09935247274209 -1.6058862936731453 M24.127260664180948 8.599104085000082 C22.264968357407895 6.661676458183762, 19.204403572556448 3.705599238118177, 13.508985533193687 -1.7084252346355493 M31.86347096807716 8.035852012838076 C27.183200396694488 3.885186816364926, 22.828075828467078 -0.13323773652527393, 19.602818952271818 -1.9462843601966409 M31.08077903253419 8.713431418089966 C27.715954015229567 6.090509131601592, 25.533389799766034 4.774118389686345, 19.388388625210254 -1.2126019489394633 M34.172663694723795 8.297392239704731 C31.89958129655714 6.179894303621502, 31.43476050200533 4.377751336690576, 26.20491824275679 -1.2521769018985016 M35.14450485167215 6.941879293885085 C32.588750794519726 5.713587427272981, 30.32006081070187 2.8440456043909963, 24.4959317373564 -1.6397790840968174 M34.2098680651154 1.589515106482029 C33.0682802825236 0.6823627199873885, 32.29437022181974 -0.24824287690460362, 31.481640462951425 -0.9647173330655104 M34.49346734381873 1.6102973950184585 C33.84811406410855 0.8704036223490801, 33.33641980898703 0.4991284961678448, 31.44693440422571 -1.0608780820389991" stroke="#ced4da" 
stroke-width="0.5" fill="none"></path><path d="M-1.9722298495471478 -1.5070674903690815 C8.533509609621442 1.8249347258673905, 15.116996413760734 -1.9092746924294235, 33.64176433706442 -1.9737338311970234 M-0.7434781435877085 -0.4334691222757101 C13.00759485537735 -0.5468884028645575, 25.190710929903933 0.22371114118375174, 33.73438219340304 0.7828094903379679 M33.201165287027294 -0.5272852035499044 C32.457280137199696 3.531400464327517, 32.929315055022016 6.76767718596337, 33.46930221661503 8.447145232881239 M33.23911001626798 0.2212925183876584 C33.18725205156899 2.419512336240589, 32.83708092599376 5.067498543441305, 33.26531368155245 8.937044144607237 M32.376462751866974 9.401088992173023 C24.595911725136453 7.196982310146851, 17.655070934001635 10.578503177494568, -0.12849761173129082 10.039426008278674 M32.991173725442685 8.114494465330786 C22.863106942475902 9.24021981597984, 13.666961164758142 9.170910760174634, -0.07102873362600803 8.119641028383917 M-0.2921978278909757 8.76407426125162 C0.6036376250633512 6.750873463210936, 0.48670901499391295 4.04500474548682, -0.09951291351687486 0.6170772951446799 M0.16160210683044096 9.170667907856961 C0.20931191671377178 6.5137161740126865, -0.2394180074701945 4.690147815450441, -0.16083486171364558 -0.1274329211651194" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(182.60251373491406 10.583493612782604) rotate(0 20.673131540005443 4.958198493424135)"><path d="M-1.4674116857349873 -1.429482113569975 L42.81515894375252 1.0722386725246906 L39.37922451935219 10.9980354310633 L-0.23065929487347603 11.761574149286872" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M0.9197800047695637 -1.2993897683918476 C13.01845432709521 0.5861714018850608, 25.44109181470406 0.8382328881292624, 42.32342864863085 0.7591628544032574 M-0.44477500952780247 0.26551508344709873 C16.287106940531565 0.6427122499951463, 31.783978743496203 -0.43129460549267723, 40.52829283952224 
-0.24147862754762173 M41.92131537558268 0.3709139603258824 C40.660284654690244 2.259123576682857, 42.120629232025564 5.416152271994974, 41.156462949026476 10.825090086061229 M41.3058128517753 -0.03440044949619864 C41.82733230327208 2.1073742560812323, 41.76736387588163 4.276775734079241, 41.495260673926694 9.816518335906583 M42.650461810747856 8.893328391170627 C27.48212143005525 9.786315944831102, 13.438769605270846 10.924859669844835, 1.37657605484128 10.619140230274326 M41.41149390220153 10.67341495134545 C32.84495064082539 9.753210438590031, 23.461243912042022 9.420715881686192, -0.6988221053034067 9.074551886102682 M-0.37158892508398844 9.726264068744022 C0.7312707571286096 7.7827061009893885, -0.9015126427558654 4.942507914121927, 0.7982252154995118 -0.022754720593310185 M-0.3665394480540746 9.526203903245005 C0.28566199225787525 6.900519569751855, 0.39800980088403803 3.4454901767897317, -0.401935895579948 -0.4169107710847153" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(183.23297201658465 41.96244497993234) rotate(0 20.673131540005443 4.958198493424135)"><path d="M-1.429482113569975 1.4688958637416363 L42.418501752535576 -1.9670385606586933 L42.427901524225945 9.685737691974765 L1.845177162438631 10.46320506200803" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M-1.2993897683918476 1.1564899571239948 C10.219185990107798 1.5270157444969488, 19.76371313410549 0.7870182383552862, 42.10542593441414 1.550144899636507 M0.26551508344709873 0.6785930749028921 C10.794777208682811 0.7508488577923689, 20.149228924372625 -0.6340723949351397, 41.104784452463264 -0.4148303512483835 M41.717177040336765 -0.6586205625139383 C41.11618497141888 3.5030644828088593, 41.70965681519379 5.2128808049905455, 42.254956179223875 9.25055747247728 M41.31186263051469 0.45239766607510923 C41.216931199508586 3.654063906600004, 41.07725309582308 6.6722669297465105, 41.246384429069224 10.026252387661275 M40.32319448433327 9.372215032732612 
C32.5633300888113 8.91844162293655, 25.126047938162372 11.562591760837856, 0.7027432434260845 8.03550040737403 M42.103281044508094 9.930561231604344 C30.233294777097562 9.089301694114774, 19.065573295453632 9.24454538889703, -0.8418451007455587 9.211538897982365 M-0.19013291810421806 10.627358204867999 C0.16901915090569603 6.22275110001757, -0.39522520276581247 5.170974792415844, -0.022754720593310185 0.34366838311100656 M-0.39019308360323723 10.110495194534767 C0.24169012767610787 8.163561752819952, 0.10524821387946381 5.395898590586335, -0.4169107710847153 -0.36378592047220815" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(184.0746046066031 58.795096780347876) rotate(0 20.673131540005443 4.958198493424135)"><path d="M1.4688958637416363 1.0722386725246906 L39.37922451935219 1.0816384442150593 L41.11560378513741 11.761574149286872 L0.5468080751597881 8.888993024981147" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M1.1564899571239948 0.9267203323543072 C17.52653130345736 1.4474063746355275, 32.59590202601143 1.027787410440276, 42.89640797964739 -0.9555496461689472 M0.6785930749028921 0.3663186375051737 C10.2068938113424 -0.8174687829017977, 18.461915357718432 -0.35125919246676957, 40.9314327287625 -0.2758851107209921 M40.68764251749695 -0.3044332748543488 C42.205492324389446 4.007509327598434, 41.213968178583656 8.345634616568733, 40.68042356563993 10.203112558709119 M41.79866074608599 -0.20170532560227888 C41.730602273048426 3.357270117766461, 41.48717062827958 7.297428443166826, 41.45611848082392 9.457341066357905 M40.802081125895256 9.709289871311313 C26.904549337749703 10.636333279796126, 16.374559816449853 10.407639913745406, -1.8808965794742107 9.157609187221652 M41.36042732476699 9.638655608675009 C26.942770056771742 9.61915969093513, 13.548586528400516 9.387550525266642, -0.7048580888658762 9.711589342138296 M0.7109612180197578 10.058659934185343 C-0.8846632893563183 6.787058520145306, 0.8353841080079106 
4.478928145517875, 0.34366838311100656 -0.9156378457134937 M0.19409820768652586 10.219107048917333 C0.37975390592350355 6.497546695136366, -0.3286280059691331 2.3292493357948767, -0.36378592047220815 -0.35438280309396886" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(186.14862371056392 89.36391455269677) rotate(0 20.673131540005443 4.958198493424135)"><path d="M1.0722386725246906 -1.9670385606586933 L42.427901524225945 -0.23065929487347603 L43.19144024244952 10.46320506200803 L-1.027403961867094 8.09975370035184" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M0.9267203323543072 1.717163074761629 C15.48413404524858 0.7380828785367706, 29.369867307703398 1.311062209553311, 40.39071343384194 -1.3369702212512493 M0.3663186375051737 0.5669510122388601 C9.857212250485532 -1.0985431184807335, 20.88832230938874 -0.6954032292404686, 41.07037796928989 0.7498203869909048 M41.04182980515654 0.8017240148213807 C41.67239304496403 3.3601712606418515, 42.29204757525646 6.434103268881566, 41.63297865187176 10.64013143784801 M41.14455775440861 0.3924292386846535 C40.997999560823864 3.043016233499113, 41.23950433916998 5.897892679979639, 40.88720715952055 10.008765808896595 M41.13915596447396 8.648822903788215 C27.17763357400107 11.180473590498226, 11.848874092463728 7.923000956182735, -0.7587877996265888 10.470521211779243 M41.06852170183765 9.062331186762577 C28.062443147408576 9.223431267444989, 15.15724998421485 9.217395283882519, -0.20480764470994473 9.646579193106419 M0.14226294733710154 8.994574804983523 C-0.4029681145239851 8.11129203137727, -0.0037414287828883702 5.053405028018279, -0.9156378457134937 0.6101993439298525 M0.3027100620690931 10.22284300316093 C0.2443067649930965 7.27262089132197, -0.33887215436406787 5.068930666483387, -0.35438280309396886 0.36415386293003515" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(185.0518867218941 25.32113167334535) rotate(0 18.508925322768505 
4.917631763005261)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.7017153791757176 5.877070096184604 C1.5786462206632117 5.710745157749052, 2.452122762955626 3.276706583866802, 4.525520099284596 0.3200670059618764 M-0.6282438143761704 6.0229745351272745 C1.706646419318922 4.34702357796536, 2.9509931488381484 2.418894985461018, 4.963126381071753 0.012895153820257743 M1.1183325504789452 10.050526800686683 C4.186329535476249 9.068712838059776, 6.003693831218697 6.721825795796477, 10.005537408170236 -0.20515931704195722 M1.5017125610167485 10.98701547098365 C3.7046520182848806 7.865901403243819, 6.189449184972929 5.0610999445497225, 10.675289335240555 0.5901122943579913 M7.325744465127901 10.970413259709188 C8.081597772482523 6.394273744278169, 12.24974250249687 3.977354914139238, 15.509608792423684 -0.2360155795770713 M6.488585734315531 11.481318215759405 C8.555687286806672 7.985750250945322, 11.552323233610862 5.694141787247412, 15.175896372369905 0.9317560044047406 M11.43541406294217 9.590825810505988 C15.197312627037856 5.191770693725728, 18.45835626495255 3.2905689857812193, 21.155476434061192 -0.1998296370566366 M11.882469512837732 9.804447719118539 C15.874864140617001 7.045154453333083, 19.61368899158949 2.596780347353099, 21.295483008863293 0.257542635144246 M16.690093891409486 11.180180195553882 C18.04838837359042 9.84389230489882, 20.285104892316607 4.792965283715631, 26.655814755570503 1.570642377996141 M17.48101756711796 10.456930319240971 C19.094617151875944 8.78585881605631, 21.472099090103534 6.312718577035917, 26.72486697775534 -0.009540537743574218 M21.00407305718193 10.007845636463191 C23.188813848318865 10.100981928682735, 26.603255217516015 6.539929609686501, 32.12363914441953 -0.11902703239924861 M21.42503922558183 11.465760468045648 C25.02678814370945 8.57832402815931, 27.655844964970935 5.296323183186748, 32.00993113028521 -0.4185773100458796 M28.030937153140286 11.523878100636862 C29.37403345993102 7.348954571576211, 32.88577344950302 
3.3976418684669056, 37.03354385104154 -0.16957924206736485 M28.194687345477025 11.50941238732215 C31.662760342471238 6.409858440682764, 35.08993901020348 2.090807608709503, 36.619514005259916 0.5957350331452211 M32.33274554367803 11.256052187378705 C34.8706931078065 9.1263174178731, 36.52080928073624 7.132614696114324, 36.768334437977785 5.706883063450908 M32.43038762705095 11.437058681636305 C33.76532365241139 9.469927253858536, 35.24639585086906 7.866638467903386, 37.42666068670925 5.873235579956775 M-0.3415162273967791 9.538387998785446 C-0.3415162273967791 9.538387998785446, -0.3415162273967791 9.538387998785446, -0.3415162273967791 9.538387998785446 M-0.3415162273967791 9.538387998785446 C-0.3415162273967791 9.538387998785446, -0.3415162273967791 9.538387998785446, -0.3415162273967791 9.538387998785446 M6.065318440573752 9.883814158104903 C5.037958472152734 9.137073019906763, 2.493869653200533 7.602210820622136, 0.612418534097362 5.193131871783261 M5.679857247995675 9.686587221734767 C3.7597633794426395 8.030969321743255, 1.420878134225712 5.717063040520566, 0.03557524332134776 4.390642822795634 M11.720139043755466 10.558318896373331 C10.610982587096967 5.985547004792593, 5.366587340368838 5.516482592007974, -0.47619726822075265 0.08960996337848681 M12.099249321944647 9.79055736639124 C7.9462075764826725 5.851024436563165, 3.317075312322359 2.6176110089611613, 0.3170619605452119 0.3058417252512551 M18.23672269747597 10.127124097096377 C15.573012045997192 6.757969972735476, 12.000017904442606 2.6248426632812993, 7.359472552398168 -1.6633103869002535 M17.756179729914336 9.837536929539048 C13.007351905901027 5.123171305951185, 9.158920503013626 1.9568835178689703, 6.175051644062572 -0.8449766544703617 M23.492024910099026 9.249398178286484 C22.483279881253765 7.453090568831913, 20.57523294927073 4.338450632176252, 11.13365856882593 -1.7708549441614885 M24.421278334803098 9.241462068684033 C20.141013715184815 7.148225628532769, 16.750215878783457 3.005794385060106, 
11.8118540349271 -0.5975107746043937 M30.77727901546061 10.909574545715614 C24.603966780061647 6.506041201647312, 20.06231195814391 0.8915835497737286, 19.528582260885738 0.019853028637928816 M30.068449036514014 9.97766081456156 C27.386056808319136 6.999178291451726, 25.283065597601805 6.012017536593747, 18.247827580606096 -0.7893187499776353 M37.65088587409883 10.415416425043382 C31.00166869388715 6.802794578586556, 27.492946917748334 1.3544321041523233, 25.69338012852886 -1.0630215118577127 M35.83565964258342 10.029994944804944 C33.15644246086417 5.789981699972598, 28.783417280908694 2.5996545955271277, 24.57692809173086 -0.519289788131917 M38.16255626085257 5.936263490757855 C36.82239319755305 4.822461715178103, 32.95290151089323 1.1407092269826602, 30.48357384855398 0.007206374287278772 M38.82892409444829 7.077958989898624 C37.33793360256542 4.913113376182234, 35.52434631583503 3.1862883754353537, 31.185682173897558 -0.6218209658082112" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M1.717163074761629 1.1787818185985088 C9.677489943447657 0.5876368801368864, 18.95079364929189 -1.1270756204353183, 35.68088042428578 -0.8895500190556049 M0.5669510122388601 -0.7076882179826498 C11.270264723490662 -0.37708191366605315, 23.761639576710238 -0.4114883968394235, 37.767671032527936 0.5799004379659891 M37.81301515653404 0.28667372819835124 C36.76428022188061 4.219716694890448, 36.29213567274893 6.921286324495223, 37.73566368366729 9.755024976684185 M37.40706912709784 -0.33859031611093793 C37.255368431171014 2.7832796014330006, 37.216546489626886 5.309382100203325, 37.109463729163835 10.155941981187658 M35.750276562477005 10.7667001354873 C29.18266153046386 10.49967346808371, 16.713423726233923 9.038142425031037, 0.5541242249310017 9.96572517039178 M36.16378484545137 9.225027590933424 C22.4113555519979 8.890604536711914, 8.49764658019971 9.527641992747528, -0.26981779374182224 9.460541814509016 M-0.9142800651068989 9.416734665187375 C0.7211799067170492 
7.050693219491883, 0.27034214376556176 3.4031136797468324, 0.6052068466912272 -0.727081029343915 M0.30393875224325256 10.000171310199667 C-0.43714938639969414 7.245186000247642, -0.3993115447730433 5.130624566633904, 0.3611744478041449 0.26364374767650006" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(183.661513821766 74.84026940704484) rotate(0 20.466560657864875 3.723775416737709)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.7821707449527332 7.0161086738604235 C1.8909527528918657 5.034656659078127, 2.3033790562452 2.6139649441418102, 4.946975038530773 0.37143935820434626 M-0.6362663060100624 6.769280875523679 C0.9616532110517559 4.81477793307335, 2.3007426777179925 3.0920211015676835, 4.6398031863891545 0.07896463012669619 M1.6547608673849807 10.401714190907338 C5.904037797978697 6.8177483846625195, 8.875249314058108 2.5521947678613683, 10.471615677838159 -0.5410041767090795 M2.524357489803593 9.70027495973886 C5.186804664249329 6.312293852065039, 8.152990514065396 2.887653243868364, 11.210082174138112 -0.024761281713121974 M7.81199683977683 8.49859397615919 C9.63415933255007 6.166828795764039, 13.318684467537345 2.2785124554863683, 15.135709716662248 -0.31830165992258297 M8.24991537353416 9.264136976186727 C9.867131611029885 7.156909461963036, 12.507098440137366 3.5374694784694483, 16.136656788646658 0.7978100995463304 M11.89001642016574 9.321113368477958 C13.286715571391369 7.117432515619359, 17.010980205871668 5.602121336447779, 21.138685424344505 0.881340328620327 M12.08837962102025 10.011181816533492 C15.063125368642936 8.151653566889758, 16.48177535437982 5.45313286755, 21.563388248531037 -0.5886333632119474 M18.651453912046883 8.429540421166788 C21.58621660392805 6.226524584310523, 21.343638585188515 4.901395642243429, 27.343930521842452 -0.7932039087626395 M18.031525446635815 9.462586563551199 C21.061176038456125 6.85135447978576, 23.40569655311525 3.806869545911494, 25.98948802263698 -0.28708305600911954 
M22.313430389694545 9.035447391019275 C27.629990994519616 6.604539207106852, 29.769111712624305 2.543858009223775, 31.876799329786397 -0.3237191882670636 M23.57695657706601 9.908037998889647 C26.70727799748739 6.559798295748731, 29.494430821874044 3.3152234783975723, 31.617189089159318 0.03035550194450065 M29.60570895794974 8.173472095666575 C30.78582604841234 6.494609699038682, 32.16788819405557 5.136092676999407, 36.51196883190547 1.282672040639492 M29.59330977510856 9.652517168982078 C30.81471363178487 7.069450758778014, 32.71566720228449 4.177302012582048, 37.1679524963734 0.6701395721268377 M34.11739966354947 10.534876539556173 C36.25487136417191 7.684784803513712, 38.623925577068675 3.222082208697163, 41.53376161880403 0.38164286026151184 M34.427696510848214 9.273992497099426 C35.48472627265007 7.475746521818877, 37.17574015250995 5.623251990617581, 41.818937361385515 0.8999130112061025 M0.08601201075366216 7.5223199337165045 C0.08601201075366216 7.5223199337165045, 0.08601201075366216 7.5223199337165045, 0.08601201075366216 7.5223199337165045 M0.08601201075366216 7.5223199337165045 C0.08601201075366216 7.5223199337165045, 0.08601201075366216 7.5223199337165045, 0.08601201075366216 7.5223199337165045 M5.813055782864786 7.6114999680457345 C4.549388522741588 4.965841568183827, 2.475449252360402 4.518958420016457, 0.33316908668502676 2.641491910392307 M5.615828846494652 7.27005444529035 C3.678192896451231 5.163754431190528, 1.0263815624331953 2.89353308819638, -0.4693199623026001 1.69260822864797 M12.561760557964533 8.368867361760099 C8.671763796584885 4.49691500994463, 8.389383130038919 2.769831241989709, 2.0205325243423378 -1.0283129531226587 M11.889969219230204 7.129155316977163 C9.342969300357295 4.670792577504482, 7.345426259427637 3.068562277955804, 2.20973531598101 -1.6203577627695633 M18.954384159867846 8.7319233903618 C15.549171591186692 6.058292187338102, 11.475516651353196 2.6385039425788053, 6.568870924219282 -0.15624393929146496 M18.684102803481004 
7.180569241191581 C13.982925957243108 4.288345015844334, 10.704549585593973 0.11715203642234506, 7.332649074487181 -2.2211523575370786 M23.57381559182671 7.665831934772799 C20.07399711511676 4.936432381625223, 15.606133926249623 -0.18836864771585016, 12.549986392433025 -1.3722213354322252 M23.566871495924563 7.067378642335901 C21.256677086695827 4.613335656899096, 17.224230421472956 0.5801574171299326, 13.576662540795484 -1.2878637312340673 M31.84979872131791 6.47664415844286 C28.227724804714285 3.7943275039499285, 23.563484176957697 2.97496683307173, 20.304950921460364 -3.050272103095172 M30.980012572240792 7.029442056906765 C27.306878117729987 4.883938139156354, 25.58901211222426 2.3781751455842106, 19.549723928085836 -1.2124692848638874 M36.751442219927156 6.069221983859717 C34.47090507507364 4.850610202786778, 30.664850360232254 4.1183855002452905, 25.326701307886996 -1.7781124447925722 M36.41419842471852 7.57398346793878 C32.190847245439876 4.043456134963394, 28.89486308590611 1.1410589266349798, 25.802466566147068 -1.7133309728919304 M40.85904199522886 7.377232030532768 C39.84069737813699 3.792586350320865, 35.7642234658205 1.4873221962294674, 32.56951732792601 -2.376840587690184 M42.20831849421341 7.003114513527115 C39.77503202665571 5.552451039262506, 37.893940369513906 3.5429269201608653, 31.82612138054043 -1.6426168369908727 M41.39455382504937 1.4687289900528673 C40.80518485769781 0.5704444299654801, 39.78727082363315 -0.1101443855832302, 38.10788577603176 -1.5643135214398605 M41.31901113332686 0.9474568526642807 C40.17420664497604 -0.02786682855493694, 39.20558171175643 -0.6738170924581113, 38.43612110087648 -1.7169479420582752" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M1.1787818185985088 0.9771655686199665 C15.406036434891156 1.1987006773459608, 28.33819751480685 -1.6884144435417956, 40.04357129667413 0.5310301668941975 M-0.7076882179826498 -0.8179702404886484 C14.816151033072716 -0.6469010968888739, 29.839374210519683 
0.5177496413504145, 41.513021753695725 0.3740410562604666 M41.150199094859175 -0.1425463427474154 C41.59271671394786 1.2395507622512174, 41.125590472594496 4.022786836263256, 40.87236232606651 7.395879022301052 M40.67673076917347 0.11190225202008763 C40.82781154801742 2.996302401438352, 40.558057223262125 5.896966652508357, 41.17594847433268 7.257066949164407 M41.86455792520654 8.82412688831672 C30.469681461719198 5.307225435408375, 17.841967321229898 7.742246239813587, 0.1304616443812847 8.961586762469857 M40.322885380652664 6.748728728172034 C27.753361145370533 6.393096954544341, 16.0524835317925 6.828137249668395, -0.37472171150147915 7.2558149455754455 M-0.31692236390142886 8.047045072852796 C-0.027773998145374276 4.937436193540276, -0.6983595416148781 2.3693887303784855, -0.550567141569103 -0.5860965084112186 M0.12487302473763362 7.145683328765793 C-0.35858325159890997 4.648521316783256, -0.31162262459470425 2.6694148716058734, 0.19963880048114646 -0.36624049179779955" stroke="transparent" stroke-width="1" fill="none"></path></g><g><g transform="translate(229.51267155793334 16.71315914537547) rotate(0 1.6078565315667674 8.902164203595532)"><path d="M0 0 C0.5359521771889122 1.244388544588623, 3.1085226276956908 4.498943199666563, 3.215713063133473 7.4663312675317375 C3.3229034985712556 10.433719335396912, 1.0719043543778244 16.08132888391449, 0.6431426126266947 17.804328407191043 M0 0 C0.5359521771889122 1.244388544588623, 3.1085226276956908 4.498943199666563, 3.215713063133473 7.4663312675317375 C3.3229034985712556 10.433719335396912, 1.0719043543778244 16.08132888391449, 0.6431426126266947 17.804328407191043" stroke="#495057" stroke-width="2" fill="none"></path></g><g transform="translate(229.51267155793334 16.71315914537547) rotate(0 1.6078565315667674 8.902164203595532)"><path d="M0.2272060114637523 8.422212505749702 C0.32933506651411637 10.72589683818918, 0.43146412156448044 13.02958117062866, 0.6431426126266947 17.804328407191043 M0.2272060114637523 
8.422212505749702 C0.36876308190758855 11.615258844685105, 0.5103201523514248 14.808305183620508, 0.6431426126266947 17.804328407191043" stroke="#495057" stroke-width="2" fill="none"></path></g><g transform="translate(229.51267155793334 16.71315914537547) rotate(0 1.6078565315667674 8.902164203595532)"><path d="M6.355224544705579 10.34985176255103 C4.952680194286799 12.180223635291913, 3.5501358438680186 14.010595508032798, 0.6431426126266947 17.804328407191043 M6.355224544705579 10.34985176255103 C4.411212928080638 12.886858298058716, 2.467201311455697 15.423864833566402, 0.6431426126266947 17.804328407191043" stroke="#495057" stroke-width="2" fill="none"></path></g></g><g><g transform="translate(231.12052808950034 65.15438216253358) rotate(0 1.625365556812426 7.675218932235623)"><path d="M0 0 C0.5359521771889122 1.2443885445886225, 2.9277252832505862 4.907924956786525, 3.215713063133473 7.466331267531735 C3.50370084301636 10.024737578276945, 1.9758910766033457 14.036420098314668, 1.7279266792973202 15.350437864471255 M0 0 C0.5359521771889122 1.2443885445886225, 2.9277252832505862 4.907924956786525, 3.215713063133473 7.466331267531735 C3.50370084301636 10.024737578276945, 1.9758910766033457 14.036420098314668, 1.7279266792973202 15.350437864471255" stroke="#495057" stroke-width="2" fill="none"></path></g><g transform="translate(231.12052808950034 65.15438216253358) rotate(0 1.625365556812426 7.675218932235623)"><path d="M1.0370742013603782 7.303718897883263 C1.2787355471154154 10.118474575079704, 1.5203968928704525 12.933230252276143, 1.7279266792973202 15.350437864471255 M1.0370742013603782 7.303718897883263 C1.2137010109201491 9.360983428129707, 1.39032782047992 11.41824795837615, 1.7279266792973202 15.350437864471255" stroke="#495057" stroke-width="2" fill="none"></path></g><g transform="translate(231.12052808950034 65.15438216253358) rotate(0 1.625365556812426 7.675218932235623)"><path d="M6.371034227911183 8.742222101837328 C4.746867485346768 
11.05378696588277, 3.1227007427823525 13.365351829928212, 1.7279266792973202 15.350437864471255 M6.371034227911183 8.742222101837328 C5.183954063990267 10.431711687981876, 3.9968739000693523 12.121201274126424, 1.7279266792973202 15.350437864471255" stroke="#495057" stroke-width="2" fill="none"></path></g></g><g transform="translate(275.84905870769467 10) rotate(0 20.673131540005443 4.958198493424135)"><path d="M1.4232905898243189 -0.3399385903030634 L40.17057056083786 0.1464069988578558 L40.55850494756805 8.043585838905102 L1.5330776367336512 9.106999756923443" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M-0.49302185513079166 1.342030981555581 C15.867674397354829 -0.6257826808783074, 30.942106230284786 0.37991522513899606, 40.95166321575152 0.2699523363262415 M0.8549465732648969 -0.8739619301632047 C11.049885713675055 0.7234329406181172, 20.717073885152107 -0.77888910733072, 41.34332718338328 0.2813338888809085 M41.62766716940167 0.17620639775682712 C40.707744037688656 3.780634576251402, 42.039113840572355 7.387624659599548, 42.10180063956019 9.945095760653135 M41.181722513312806 -0.49534831611125757 C41.83584884838954 4.093154236870321, 41.39014140124929 7.928286063815499, 41.34868893178536 10.001203547607368 M40.54354967653262 10.137697687527663 C25.607177901048477 11.695086923956538, 11.557271198196169 7.968328622698451, 0.1514634396880865 9.208094170472151 M41.884716201680476 9.587033099649972 C25.086721141034253 9.80975666091118, 11.148243177861179 10.081530148240663, -0.3493270380422473 9.550943649529046 M0.8268301690848165 10.320896213222019 C0.24266668646437536 8.420194459746908, 0.4863910848955415 6.243700669825837, -0.30213815658909504 0.5257315441859762 M-0.46067437238159864 10.224010761882512 C-0.04947906714191558 7.276842551386886, 0.41994121977434623 5.177277582735816, -0.43899320933150154 -0.29248644490504894" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(276.4795169893648 41.378951367149966) 
rotate(0 20.673131540005443 4.958198493424135)"><path d="M-0.3399385903030634 -1.175692519173026 L41.49267007886874 -0.787758132442832 L39.473451932067746 11.449474623581892 L-0.809397229924798 11.502711286923414" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M1.342030981555581 1.1775106694549322 C11.829759912192182 1.6637018311171556, 25.047692471368823 -0.2926768076271986, 41.61621541633713 -1.1307731959968805 M-0.8739619301632047 0.9836690919473767 C14.0188211366876 0.23115218232110885, 25.844582833604324 -0.35642435600324723, 41.627596968891794 0.2681501703336835 M41.522469477767714 -0.6263091986156664 C41.935341544865175 3.5578812632568826, 42.30150417804687 6.46117887752699, 41.374961853815776 10.905887958875557 M40.85091476389963 0.449487380650758 C41.383589429254336 3.3796829068439314, 41.254311188117725 6.754231798441323, 41.43106964077001 10.125215219697902 M41.56756378069031 8.32251948319984 C32.15409268539915 8.251874457871224, 17.29614260146053 9.90191442874601, -0.70830281637609 11.43114769660545 M41.016899192812616 9.192198039187616 C32.170808896581754 11.011693311578584, 23.074530726510563 9.789700193054033, -0.36545333731919527 9.118351251973337 M0.4044992263737772 10.112886171264527 C0.7138293964626172 7.843466494122115, 0.7208808705279597 4.587970679111989, 0.5257315441859762 -0.4557993407640162 M0.30761377503427156 9.928083559611837 C-0.1389996948753146 6.81092835141852, 0.2805550917640721 2.3042527872975835, -0.29248644490504894 0.3528478629085833" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(277.3211495793837 58.2116031675655) rotate(0 20.673131540005443 4.958198493424135)"><path d="M-1.175692519173026 0.1464069988578558 L40.55850494756805 -1.872811147943139 L42.87934071674454 9.106999756923443 L1.5863143000751734 10.89397591554237" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M1.1775106694549322 -0.3824747409671545 C16.398737097599785 0.34578333492553753, 29.279316781970227 
-0.0074874937983699374, 40.215489884014005 0.4470560345798731 M0.9836690919473767 0.6907373918220401 C12.661564077739246 -0.4502340572623776, 24.150910975457386 0.6426844877453282, 41.61441325034457 -0.12312782276421785 M40.71995388139522 0.5398071260849517 C42.079464093218 4.37360224301448, 42.12994104849927 6.96396374130056, 42.3357540520382 10.317380864127193 M41.795750460661644 0.12874416878853356 C41.31713490018178 2.0602068282555432, 41.315780818445745 4.206288361966368, 41.555081312860544 9.763777741561299 M39.752385576362485 11.855418374917036 C29.047122134425905 8.17440985860936, 19.89659769090241 9.253843955422562, 1.5147507097572088 8.447907677075392 M40.62206413235026 10.108994460819787 C32.80742823557599 10.244588288857724, 22.17393419213424 9.4147639902375, -0.7980457348749042 9.45072884034974 M0.19648918441628593 10.603739723733321 C0.35753056593897037 7.900738697519564, -0.3848211734975767 5.022023530476018, -0.4557993407640162 0.6116563990551056 M0.011686572763594971 9.795962306677342 C0.5732417341561765 7.054947850218096, -0.3468583348551818 3.302772055745926, 0.3528478629085833 -0.0842741503148684" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(279.3951686833441 88.78042093991428) rotate(0 20.673131540005443 4.958198493424135)"><path d="M0.1464069988578558 -0.787758132442832 L39.473451932067746 1.5330776367336512 L40.53686585008609 11.502711286923414 L0.977578928694129 10.628863981148726" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M-0.3824747409671545 1.561778774484992 C12.00787152273178 -0.6380281938533789, 23.039249051689502 -1.3742015255908973, 41.79331911459076 -0.1581547949463129 M0.6907373918220401 0.5846406416967511 C15.127178861632892 -0.4096463072209215, 32.15886092440029 -0.13856024025963304, 41.22313525724667 0.4968673484399915 M41.88607020609584 0.7050606042680374 C42.26985347905322 4.011860744994555, 41.39258249359146 7.502137342917824, 41.74724695728984 9.00221879139739 
M41.47500724879942 0.0037799335105064835 C41.24665934444056 2.6482638144327915, 41.33199997754776 5.500494158166298, 41.19364383472394 9.724129270401292 M43.28528446807968 8.417820455661541 C28.81410722072149 9.738356033620539, 19.14912237143459 11.101643303689661, -1.468489309772849 8.468924941650158 M41.53886055398243 10.789063067330545 C27.180452674828246 9.267640746683561, 11.720447009724356 8.818922049850904, -0.4656681464985013 9.64929735769195 M0.68734273688508 10.356610569695693 C0.6539442883130446 5.812509565119857, 0.4852816215485985 1.8570991158056473, 0.6116563990551056 0.18565014704416072 M-0.12043468017089853 10.397503846528098 C0.23759735146163388 7.423833426225464, -0.35400919991190327 6.162104147174274, -0.0842741503148684 -0.2914658438646853" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(278.2984316946747 24.73763806056286) rotate(0 18.508925322768505 4.917631763005261)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M0.12973539385480193 6.437335052390687 C1.2604964169191981 5.2336987336829015, 2.4621630128617795 3.9240182283548277, 4.34790555420751 1.0883615899679473 M0.07110049243551991 6.472008642231218 C0.984993897345254 4.712970250750595, 3.062002440998285 2.5004308300127835, 5.035541059034413 0.3418150894192272 M0.8955995667631131 10.506669853748944 C4.583024726313701 8.007775091839687, 8.17051116096063 3.047613429301469, 11.568381882597638 -0.5395342319190739 M1.3229948413463632 10.903969261823413 C4.098842936584043 6.777542993248491, 8.152116328464086 2.662106379402511, 11.024463933938128 -0.40283397509830576 M5.522086988337652 9.820351117582925 C8.909626129495917 8.06609011510362, 10.364556953628648 3.6461839231615083, 14.247774005405088 0.9369802457210592 M6.485090211820172 10.748951842276313 C9.407457931519502 8.925674828856689, 11.609146602754475 6.019103763048109, 16.282725056822827 0.4591832313616744 M12.427531731444203 11.702802072129192 C14.270096164371797 7.657778311686174, 19.23397078322059 
2.30781100258617, 21.62839517826017 -0.3391284081921455 M11.87179882194733 9.919527748542817 C14.868617516377029 8.083583121384123, 17.295085497163605 3.839986112253592, 21.84132878947806 -0.6207813096069272 M17.345280825992376 11.708699446192666 C19.37974842656012 9.575088950014019, 20.92806480667191 6.774789250411928, 27.260193993572504 -0.43916002735376936 M17.598025929176508 10.967575247513786 C20.05723073781684 6.519882295182738, 23.929157617556164 4.002535253255292, 25.680317108102503 0.9237014501367451 M23.045826976511425 11.511285590479549 C23.77965860866572 7.876798455444895, 29.01911195406103 2.893289736252826, 32.30370669993276 0.9219451468683069 M21.655245971491357 11.77559226682077 C23.663204478081596 8.329438330491955, 25.61119303591192 7.305631194371278, 31.69767118356826 -0.018465283907648455 M27.54273384943811 10.262205092917643 C29.745776704575317 6.798162785811488, 33.390609396544306 3.705996489509507, 37.65002258159344 -0.5247767713758629 M27.37157985637171 11.287805784709267 C31.01974639494834 8.187419917044583, 33.46786255390959 4.324101805230656, 36.3219135996898 0.5265577579010746 M32.08723909081586 11.41406607741043 C33.636611020965134 8.969223373230577, 35.1565700230256 6.942207705506863, 37.387272941299045 6.487376884002691 M32.82860403298095 11.041189328548684 C33.87295907305821 9.959550491781796, 34.89954289705467 8.235176021785058, 37.47623579988841 5.61352385129266 M-0.3415162273967791 9.538387998785446 C-0.3415162273967791 9.538387998785446, -0.3415162273967791 9.538387998785446, -0.3415162273967791 9.538387998785446 M-0.3415162273967791 9.538387998785446 C-0.3415162273967791 9.538387998785446, -0.3415162273967791 9.538387998785446, -0.3415162273967791 9.538387998785446 M6.216864204334557 9.949376655929333 C5.129483176167226 8.320460673487156, 3.3034944880022756 8.057641099003167, -0.7339613412943597 4.490166837983851 M6.035097777398765 9.502182038315182 C4.024901823758896 7.703994373558267, 1.7291409985717765 5.3625896413571335, 
-0.2966946819302554 4.589081002661113 M11.71852271033042 9.336288136025487 C9.424426390610336 6.407000094168254, 4.2041879649404095 1.1927198919600448, -0.5306457759469452 -0.9943064876584433 M13.100652044988806 9.30140143089515 C9.946577966562081 6.715219483657586, 6.255939087273605 4.995432555526236, -0.0762713441282214 -0.6861081660477543 M17.282297423547178 10.06472628041228 C16.683928369096126 9.043838137438549, 12.988931333548436 5.975402725192067, 7.915851797526651 -0.6807687899176255 M18.471099598072513 9.234492792878875 C15.438748298790125 6.397567391776031, 11.264360117206703 4.13304445367815, 6.075039590559848 -0.026811284440033223 M25.37612682694864 9.533026425964763 C20.5559211190869 8.127397445372269, 17.717594259393977 4.641979019004044, 12.154042691257208 -1.208881063930108 M24.871618044770475 9.989601903833083 C19.53839120071966 5.800947119261395, 14.70621171872195 1.7998774134429518, 13.031567519422978 -0.6079006740668552 M31.25703063357398 8.999932496138527 C26.285938427109073 5.530662221694951, 24.69778960746632 4.9686570990304535, 19.924429955368392 0.9839134008452426 M30.603030536852405 10.261263649007915 C28.074362913603185 7.7860663875756675, 25.420209111592364 5.32142436901701, 18.93628676392836 -0.6760806008878966 M37.34917894161373 9.92007843660716 C30.608832303050296 4.501382754838786, 26.78812627686987 0.6799861471693245, 22.964921344008236 -0.8970091069986466 M36.987643248653704 9.43064356329445 C32.602389367375935 6.0893338019055445, 28.156525754141562 1.7402276939412755, 24.56190699332574 -0.802730901812518 M38.13078393445392 7.854295901964368 C37.813506325632055 4.474670416072694, 35.06885768551862 2.782547671148271, 29.91113894440704 0.001948297369898766 M39.12084293270526 7.292200492099808 C35.631233633514775 3.6474210989924316, 33.03192261415495 1.2103877886692067, 30.90623339222283 -0.5939611865937248" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M1.561778774484992 0.623223165050149 C13.194710451247147 
0.34074430051030447, 26.047847435016138 0.5178479987639361, 36.85969585059072 1.7098931465297937 M0.5846406416967511 -0.8115846561267972 C7.867200076966827 0.025637752823347792, 16.00842211752255 -0.37882395882177855, 37.51471799397702 0.2837765468284488 M37.717142621643 0.8985573430351131 C37.70508809773763 3.1538077468950374, 37.937127362359895 6.64839068670669, 36.11115202567373 9.508874851506846 M37.02159965258424 -0.0005297027027316714 C37.10941860724173 2.0521007505857716, 37.31412233787368 3.7846055583767533, 36.82715601508728 9.637891079533757 M35.51927411435033 8.047526612821203 C25.654355077288834 9.485852730344568, 15.502683339421669 8.72566623694781, -1.447472045198083 10.912169769349676 M37.890516726019335 10.299634487311524 C26.782402732267578 9.541105633953164, 16.447563160207714 9.440890824773858, -0.2670996291562915 10.66906452266423 M0.43661186626313564 10.523976378540635 C-0.319154294285386 7.0049248937503545, -0.471558261379842 3.738377959657368, 0.18413120433193308 -0.9138105015484195 M0.4771705646111563 10.131936307157485 C-0.43771966876985274 5.984397160362228, 0.34764275939118655 2.630132943317739, -0.2890811437906464 0.035998785395483235" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(276.90805879454615 74.25677579426247) rotate(0 20.466560657864875 3.723775416737709)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.22190578874665 6.765809697707288 C1.0721787371253746 5.41984456033503, 2.2678321152729106 2.916337001928983, 5.7152696225368445 0.31930391526196533 M-0.18723219890611864 6.094664889996045 C1.3869990974513642 4.786939376648256, 2.5138827803310457 2.8160213826936626, 4.968723121988124 0.5731759279814774 M2.0783222737999365 9.734363247241387 C6.27656681129308 6.573472478348052, 8.190212046553752 3.6561256380290934, 10.161124685452265 0.7483023637987434 M2.447243152726229 9.214109898285502 C5.09957077526117 6.521867571872281, 7.766692362903283 3.360427454508457, 10.288060638214407 
-0.12632185485328118 M6.8262292893828915 9.31316891032884 C10.789447517471787 5.061950670922328, 12.457650789715885 1.6618138241281624, 16.14113470977493 -0.24845911758719186 M7.622172767691509 9.900810345631633 C10.311476638152266 7.542720994565312, 12.063307138288613 5.43847139303136, 15.731594411752601 0.8318150465306811 M13.851137234530144 8.583943552598642 C17.13916673817426 6.131150473371238, 19.213119728225333 2.537748329122995, 21.00933656543296 -0.3174281905241785 M12.195239648342795 9.85156070313445 C15.025402093021123 7.6782999692829454, 15.627667912066311 6.195532949593999, 20.747801728404948 -0.5655463696759668 M19.10447041259441 9.58903835090376 C21.802023880030866 6.2682109601358675, 23.92315698596795 4.456695227043586, 25.621242745828244 0.9451601093646824 M18.469221099441082 9.06944312667973 C19.754293144461354 7.363248694882014, 22.696825743639234 4.453809205778706, 26.789409726534398 0.3334857761871757 M23.61641168317539 8.64546500480648 C26.532851842403293 6.997361470180589, 28.280656086356704 2.9824917956708026, 32.778975218484945 -1.3913958206527643 M23.84547746933778 9.380028849597162 C25.281100016210285 6.582765820413662, 28.813650189413572 3.9523304701267556, 31.963952845145783 -0.13637927530675786 M28.52427495133327 8.672538711888839 C29.902019817063472 7.552328648189809, 31.601133233770312 6.122301698090209, 36.2075138067839 0.13415439735171408 M29.40336125858323 9.8476759748665 C31.43108049844547 7.343298662000388, 32.79343483052502 4.158423632127747, 37.10865768902128 0.45049387766984317 M34.38828061788957 8.629423490537194 C34.96371360870894 6.6528660899580565, 36.26534134183709 5.872427868460862, 42.87175102546423 0.06829175473542226 M33.74906333412658 9.771918967177955 C37.024090767259956 6.2322364275877415, 39.20840750567637 3.4620295993584587, 41.373717255104175 0.4801627046370626 M0.08601201075366216 7.5223199337165045 C0.08601201075366216 7.5223199337165045, 0.08601201075366216 7.5223199337165045, 0.08601201075366216 
7.5223199337165045 M0.08601201075366216 7.5223199337165045 C0.08601201075366216 7.5223199337165045, 0.08601201075366216 7.5223199337165045, 0.08601201075366216 7.5223199337165045 M5.878618280689216 7.578832451128744 C3.755504423575922 5.613227764687636, 2.929682407825628 3.3198494842177735, -0.3697959471143829 2.6163453790786493 M5.4314236630750665 7.447445598491811 C4.110542344995836 6.106601992724861, 2.205831375673408 3.9902631095696997, -0.27088178243712063 2.1019885152074327 M11.49248364266017 8.502888704864642 C9.001309324255525 4.824470815718261, 4.461354327231967 0.9043474546232866, 1.0721056296850238 -2.5429449563182427 M11.461957775671124 7.788763276608993 C8.452919398913625 4.61682288924478, 6.18342028712583 1.808832043675251, 1.3417791610943768 -2.33191029543872 M18.896146197629353 8.72724817618294 C16.23100988195955 4.473481507794345, 11.843550077591972 2.4405503928366254, 7.485909748069735 -1.7751778540795324 M18.12126160926484 8.073283232837305 C16.021027796140693 5.303368759999104, 14.302542859979681 3.2198121991431172, 8.096270086515489 -2.07128073914327 M23.821990308545203 6.107582653239736 C22.291681509013934 3.895122430824865, 18.823736302168165 2.7894772049563223, 13.041713537635484 -0.811195267586702 M24.22149385167998 6.943113312144199 C20.678568972255633 3.9145751632492587, 17.372352309155207 2.114532526268973, 13.56757137876583 -1.9273460646520495 M30.06746614171263 7.100694031906669 C26.368259923969323 6.048964022188449, 25.35674326163699 2.9890870469304116, 21.20474060218719 -2.9025937384681084 M31.244708551057393 7.769035512940382 C28.460234402167288 5.438677346843627, 25.624542515996648 3.381645465041949, 19.655412867236258 -1.101687554804353 M36.31802148004546 6.016687154593348 C31.100833522807687 3.4325719060139788, 27.43749500765828 -0.7685587182766094, 25.471962162138677 -1.091416861814145 M35.88976596589684 7.494763686424523 C34.329396640174956 5.697696344483978, 31.766217996385354 3.5336439493391474, 25.55445559167654 
-2.051079639651511 M43.12580757210929 8.018393625538309 C39.54757881481012 5.854526371140593, 38.06397158156685 1.9432584406019284, 32.56330323702365 -0.7372225354273714 M42.461512996814804 6.354949585707372 C37.75143636816383 3.951126429562429, 34.59500996863839 0.6365699827301263, 31.859046574157553 -2.21059574497276 M41.59417692662653 1.4754652212756159 C40.287540372754506 0.19557656124804879, 39.80500898485996 -0.39432512120034796, 38.75116602549329 -1.7833917638461092 M41.29845725258318 1.1910794743718986 C40.437857045574795 -0.08115245927498321, 39.15800863308185 -0.9226217620265614, 38.35976123301312 -1.6841295741672615" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M0.623223165050149 -0.3945998642593622 C14.959848085213407 -1.4504155962185101, 29.826847532354204 -0.4777971951679424, 42.64301446225953 -1.7479238603264093 M-0.8115846561267972 -0.0029358966276049614 C16.587734911121398 0.028838507206891517, 32.48967422171676 0.2575556853131995, 41.216897862558184 0.177691956050694 M41.61353536578022 0.5674343603635625 C40.99521641728386 2.5070270931659264, 41.27481517037391 2.841192688659975, 40.685970211094045 6.703504025061495 M40.932720209262705 0.0018218970487791641 C41.016152170754104 3.0662554020959414, 40.83783732897718 5.674452055020397, 40.78366509380182 7.488754538920284 M39.145384402540444 7.599014273163527 C31.207001165597347 8.655594468472795, 21.428997338444287 5.693371713517505, 1.076906243339181 6.788823059078902 M41.397492277030764 7.098223795433193 C31.252103921823235 7.0067145640055815, 21.836325056056626 7.537660669724194, 0.8338009966537356 7.855460305948406 M0.5215136295353202 7.220634819230126 C0.3432204667989164 4.723394753546112, 0.407752950254909 2.823309799308984, -0.6919641903287299 0.4620567792280694 M0.22464935613138381 7.117852027912996 C-0.20157714426386267 5.460732283004965, -0.045202641491178926 2.725011536581349, 0.027259339159261542 -0.14667171839629176" stroke="transparent" stroke-width="1" 
fill="none"></path></g><g><g transform="translate(266.5284472999449 25.533511686439113) rotate(180.5394402074999 1.6078565315667674 8.902164203595532)"><path d="M0 0 C0.5359521771889122 1.244388544588623, 3.1085226276956908 4.498943199666563, 3.215713063133473 7.4663312675317375 C3.3229034985712556 10.433719335396912, 1.0719043543778244 16.08132888391449, 0.6431426126266947 17.804328407191043 M0 0 C0.5359521771889122 1.244388544588623, 3.1085226276956908 4.498943199666563, 3.215713063133473 7.4663312675317375 C3.3229034985712556 10.433719335396912, 1.0719043543778244 16.08132888391449, 0.6431426126266947 17.804328407191043" stroke="#495057" stroke-width="2" fill="none"></path></g><g transform="translate(266.5284472999449 25.533511686439113) rotate(180.5394402074999 1.6078565315667674 8.902164203595532)"><path d="M0.2272060114637523 8.422212505749702 C0.3440327296232073 11.057426158725725, 0.4608594477826623 13.692639811701747, 0.6431426126266947 17.804328407191043 M0.2272060114637523 8.422212505749702 C0.38071728811769656 11.884905102935473, 0.5342285647716407 15.347597700121243, 0.6431426126266947 17.804328407191043" stroke="#495057" stroke-width="2" fill="none"></path></g><g transform="translate(266.5284472999449 25.533511686439113) rotate(180.5394402074999 1.6078565315667674 8.902164203595532)"><path d="M6.355224544705579 10.34985176255103 C4.750836317949665 12.443637304757814, 3.1464480911937516 14.5374228469646, 0.6431426126266947 17.804328407191043 M6.355224544705579 10.34985176255103 C4.247045107518117 13.101103331434729, 2.138865670330654 15.85235490031843, 0.6431426126266947 17.804328407191043" stroke="#495057" stroke-width="2" fill="none"></path></g></g><g><g transform="translate(268.4439961392029 77.80165778052037) rotate(180.04179857414044 1.625365556812426 7.675218932235623)"><path d="M0 0 C0.5359521771889122 1.2443885445886225, 2.9277252832505862 4.907924956786525, 3.215713063133473 7.466331267531735 C3.50370084301636 10.024737578276945, 
1.9758910766033457 14.036420098314668, 1.7279266792973202 15.350437864471255 M0 0 C0.5359521771889122 1.2443885445886225, 2.9277252832505862 4.907924956786525, 3.215713063133473 7.466331267531735 C3.50370084301636 10.024737578276945, 1.9758910766033457 14.036420098314668, 1.7279266792973202 15.350437864471255" stroke="#495057" stroke-width="2" fill="none"></path></g><g transform="translate(268.4439961392029 77.80165778052037) rotate(180.04179857414044 1.625365556812426 7.675218932235623)"><path d="M1.0370742013603782 7.303718897883263 C1.2982778815585747 10.346094332172807, 1.5594815617567712 13.388469766462352, 1.7279266792973202 15.350437864471255 M1.0370742013603782 7.303718897883263 C1.2847199883433453 10.188178481877603, 1.5323657753263125 13.072638065871942, 1.7279266792973202 15.350437864471255" stroke="#495057" stroke-width="2" fill="none"></path></g><g transform="translate(268.4439961392029 77.80165778052037) rotate(180.04179857414044 1.625365556812426 7.675218932235623)"><path d="M6.371034227911183 8.742222101837328 C4.615526622473215 11.240715386392447, 2.8600190170352477 13.739208670947566, 1.7279266792973202 15.350437864471255 M6.371034227911183 8.742222101837328 C4.70664702565816 11.111029981021195, 3.042259823405138 13.479837860205063, 1.7279266792973202 15.350437864471255" stroke="#495057" stroke-width="2" fill="none"></path></g></g></svg>
<figcaption>Fillna, forward fill and backward fill</figcaption></p>
</figure>
<h3 id="fill-with-constant"><a class="toclink" href="#fill-with-constant">Fill with Constant</a></h3>
<p>The simplest way to fill in missing data is with some constant value. Using Pandas for example, this is done using the <code>fillna</code> function:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
<span class="gp">>>> </span><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="gp">>>> </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">([</span><span class="s1">'A'</span><span class="p">,</span> <span class="s1">'B'</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">nan</span><span class="p">,</span> <span class="s1">'D'</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">nan</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">nan</span><span class="p">,</span> <span class="s1">'G'</span><span class="p">])</span>
<span class="gp">>>> </span><span class="n">df</span><span class="o">.</span><span class="n">fillna</span><span class="p">(</span><span class="s1">'X'</span><span class="p">)</span>
<span class="go"> 0</span>
<span class="go">0 A</span>
<span class="go">1 B</span>
<span class="go">2 X</span>
<span class="go">3 D</span>
<span class="go">4 X</span>
<span class="go">5 X</span>
<span class="go">6 G</span>
</pre></div>
<p>In SQL, if your missing values are NULL, you can use a conditional expression such as <code>CASE</code>, or the shorter <code>COALESCE</code> function:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">tb</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="k">VALUES</span>
<span class="w"> </span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'A'</span><span class="w"> </span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mf">2</span><span class="p">,</span><span class="w"> </span><span class="s1">'B'</span><span class="w"> </span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="k">null</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mf">4</span><span class="p">,</span><span class="w"> </span><span class="s1">'D'</span><span class="w"> </span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mf">5</span><span class="p">,</span><span class="w"> </span><span class="k">null</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mf">6</span><span class="p">,</span><span class="w"> </span><span class="k">null</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mf">7</span><span class="p">,</span><span class="w"> </span><span class="s1">'G'</span><span class="w"> </span><span class="p">)</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">v</span><span class="p">)</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="n">n</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="k">coalesce</span><span class="p">(</span><span class="n">v</span><span class="p">,</span><span class="w"> </span><span class="s1">'X'</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">v</span>
</span><span class="k">FROM</span>
<span class="w"> </span><span class="n">tb</span><span class="p">;</span>
<span class="go"> n │ v</span>
<span class="go">───┼───</span>
<span class="go"> 1 │ A</span>
<span class="go"> 2 │ B</span>
<span class="go"> 3 │ X</span>
<span class="go"> 4 │ D</span>
<span class="go"> 5 │ X</span>
<span class="go"> 6 │ X</span>
<span class="go"> 7 │ G</span>
</pre></div>
<p>The function <a href="https://www.postgresql.org/docs/current/functions-conditional.html#FUNCTIONS-COALESCE-NVL-IFNULL" rel="noopener"><code>COALESCE</code></a> accepts any number of arguments and returns the first one that is not NULL.</p>
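<p>For readers more at home in Python, the behavior of <code>COALESCE</code> can be sketched in a few lines. This is only an illustration: the helper name <code>coalesce</code> and the sample rows are assumptions made for the example, with <code>None</code> standing in for NULL.</p>

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None


# Hypothetical sample rows: (n, v) pairs with None playing the role of NULL.
rows = [(1, "A"), (2, "B"), (3, None), (4, "D")]

# Equivalent in spirit to: SELECT n, coalesce(v, 'X') AS v FROM tb
filled = [(n, coalesce(v, "X")) for n, v in rows]
print(filled)
```

<p>Like the SQL function, the sketch checks only for missing values, so a falsy but present value such as <code>0</code> or an empty string is returned as-is.</p>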
<h3 id="back-and-forward-fill"><a class="toclink" href="#back-and-forward-fill">Back and Forward Fill</a></h3>
<p>Filling values with constants is easy, but not always appropriate. Another common technique is to fill missing values with the previous or the next non-missing value.</p>
<p>Pandas offers several variations on back and forward filling, for example:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
<span class="gp">>>> </span><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="gp">>>> </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">([</span><span class="s1">'A'</span><span class="p">,</span> <span class="s1">'B'</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">nan</span><span class="p">,</span> <span class="s1">'D'</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">nan</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">nan</span><span class="p">,</span> <span class="s1">'G'</span><span class="p">])</span>
<span class="gp">>>> </span><span class="n">df</span><span class="o">.</span><span class="n">fillna</span><span class="p">(</span><span class="n">method</span><span class="o">=</span><span class="s1">'ffill'</span><span class="p">)</span> <span class="c1"># or df.ffill()</span>
<span class="go"> 0</span>
<span class="go">0 A</span>
<span class="go">1 B</span>
<span class="go">2 B</span>
<span class="go">3 D</span>
<span class="go">4 D</span>
<span class="go">5 D</span>
<span class="go">6 G</span>
<span class="gp">>>> </span><span class="n">df</span><span class="o">.</span><span class="n">fillna</span><span class="p">(</span><span class="n">method</span><span class="o">=</span><span class="s1">'bfill'</span><span class="p">)</span> <span class="c1"># or df.bfill() or df.backfill()</span>
<span class="go"> 0</span>
<span class="go">0 A</span>
<span class="go">1 B</span>
<span class="go">2 D</span>
<span class="go">3 D</span>
<span class="go">4 G</span>
<span class="go">5 G</span>
<span class="go">6 G</span>
</pre></div>
<p>To achieve the same using SQL, you can use a correlated subquery:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">tb</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="k">VALUES</span>
<span class="w"> </span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'A'</span><span class="w"> </span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mf">2</span><span class="p">,</span><span class="w"> </span><span class="s1">'B'</span><span class="w"> </span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="k">null</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mf">4</span><span class="p">,</span><span class="w"> </span><span class="s1">'D'</span><span class="w"> </span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mf">5</span><span class="p">,</span><span class="w"> </span><span class="k">null</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mf">6</span><span class="p">,</span><span class="w"> </span><span class="k">null</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mf">7</span><span class="p">,</span><span class="w"> </span><span class="s1">'G'</span><span class="w"> </span><span class="p">)</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">v</span><span class="p">)</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span><span class="p">,</span>
<span class="w"> </span><span class="c1">-- Find the previous not null value</span>
<span class="w"> </span><span class="k">coalesce</span><span class="p">(</span><span class="n">v</span><span class="p">,</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">v</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">tb</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">tb_</span>
<span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">tb_</span><span class="mf">.</span><span class="n">n</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">tb</span><span class="mf">.</span><span class="n">n</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">v</span><span class="w"> </span><span class="k">IS</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="k">DESC</span>
<span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mf">1</span>
<span class="w"> </span><span class="p">))</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">ffill_v</span><span class="p">,</span>
<span class="w"> </span><span class="c1">-- Find the next not null value</span>
<span class="w"> </span><span class="k">coalesce</span><span class="p">(</span><span class="n">v</span><span class="p">,</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">v</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">tb</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">tb_</span>
<span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">tb_</span><span class="mf">.</span><span class="n">n</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">tb</span><span class="mf">.</span><span class="n">n</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">v</span><span class="w"> </span><span class="k">IS</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="k">ASC</span>
<span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mf">1</span>
<span class="w"> </span><span class="p">))</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">bfill_v</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">tb</span><span class="p">;</span>
<span class="go"> n │ v │ ffill_v │ bfill_v</span>
<span class="go">───┼───┼─────────┼─────────</span>
<span class="go"> 1 │ A │ A       │ A</span>
<span class="go"> 2 │ B │ B       │ B</span>
<span class="go"> 3 │ ¤ │ B       │ D</span>
<span class="go"> 4 │ D │ D       │ D</span>
<span class="go"> 5 │ ¤ │ D       │ G</span>
<span class="go"> 6 │ ¤ │ D       │ G</span>
<span class="go"> 7 │ G │ G       │ G</span>
</pre></div>
<p>The SQL version is a bit longer, but it is fairly expressive, and it gives great flexibility.</p>
<p>NOTE: It's tempting to use the window functions <code>LEAD</code> and <code>LAG</code> here, but these functions look at a fixed offset, so they can only fill single-row gaps. Once you have more than one consecutive missing row, <code>LEAD</code> and <code>LAG</code> may leave you with missing values.</p>
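<p>The difference is easy to see in a small pure-Python sketch. The helper names <code>lag_fill</code> and <code>forward_fill</code> are made up for this illustration: the first mimics <code>coalesce(v, LAG(v))</code> by looking only at the row directly before, and the second carries the last seen value forward.</p>

```python
values = ["A", "B", None, "D", None, None, "G"]


def lag_fill(values):
    """Fill each gap from the row directly before it, like coalesce(v, LAG(v))."""
    previous = [None] + values[:-1]
    return [v if v is not None else p for v, p in zip(values, previous)]


def forward_fill(values):
    """Fill each gap with the most recent non-missing value."""
    filled = []
    last_seen = None
    for v in values:
        if v is not None:
            last_seen = v
        filled.append(last_seen)
    return filled


print(lag_fill(values))      # the second row of the two-row gap stays None
print(forward_fill(values))  # every gap after the first known value is filled
```

<p>With the sample data, the lag-based version still leaves the sixth value missing, because the row before it is missing as well.</p>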
<h3 id="linear-interpolation"><a class="toclink" href="#linear-interpolation">Linear Interpolation</a></h3>
<p>Another common interpolation technique for discrete data is <a href="https://en.wikipedia.org/wiki/Linear_interpolation" rel="noopener">linear interpolation</a>.</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
<span class="gp">... </span><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="gp">...</span>
<span class="gp">... </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">([</span>
<span class="gp">... </span> <span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">datetime64</span><span class="p">(</span><span class="s1">'2021-01-01'</span><span class="p">),</span> <span class="mi">10</span><span class="p">),</span>
<span class="gp">... </span> <span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">datetime64</span><span class="p">(</span><span class="s1">'2021-01-02'</span><span class="p">),</span> <span class="mi">12</span><span class="p">),</span>
<span class="gp">... </span> <span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">datetime64</span><span class="p">(</span><span class="s1">'2021-01-03'</span><span class="p">),</span> <span class="n">np</span><span class="o">.</span><span class="n">nan</span><span class="p">),</span>
<span class="gp">... </span> <span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">datetime64</span><span class="p">(</span><span class="s1">'2021-01-04'</span><span class="p">),</span> <span class="mi">14</span><span class="p">),</span>
<span class="gp">... </span> <span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">datetime64</span><span class="p">(</span><span class="s1">'2021-01-05'</span><span class="p">),</span> <span class="n">np</span><span class="o">.</span><span class="n">nan</span><span class="p">),</span>
<span class="gp">... </span> <span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">datetime64</span><span class="p">(</span><span class="s1">'2021-01-06'</span><span class="p">),</span> <span class="n">np</span><span class="o">.</span><span class="n">nan</span><span class="p">),</span>
<span class="gp">... </span> <span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">datetime64</span><span class="p">(</span><span class="s1">'2021-01-07'</span><span class="p">),</span> <span class="mi">18</span><span class="p">),</span>
<span class="gp">... </span> <span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">datetime64</span><span class="p">(</span><span class="s1">'2021-01-08'</span><span class="p">),</span> <span class="mi">15</span><span class="p">)</span>
<span class="gp">... </span><span class="p">],</span> <span class="n">columns</span><span class="o">=</span><span class="p">(</span><span class="s1">'t'</span><span class="p">,</span> <span class="s1">'c'</span><span class="p">))</span>
<span class="gp">>>> </span><span class="c1"># Assume data is evenly distributed</span>
<span class="gp">>>> </span><span class="n">df</span><span class="p">[</span><span class="s1">'c'</span><span class="p">]</span><span class="o">.</span><span class="n">interpolate</span><span class="p">(</span><span class="s1">'linear'</span><span class="p">)</span>
<span class="go">0 10.000000</span>
<span class="go">1 12.000000</span>
<span class="go">2 13.000000</span>
<span class="go">3 14.000000</span>
<span class="go">4 15.333333</span>
<span class="go">5 16.666667</span>
<span class="go">6 18.000000</span>
<span class="go">7 15.000000</span>
</pre></div>
<p>Linear interpolation fills missing values along a straight line between two known coordinates (x0, y0) and (x1, y1). In this case, the two known coordinates are the previous and the next known dates and their temperatures.</p>
<p>You already know how to find the previous and next known values for each row from the back and forward fill implementation:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span>
<span class="n">temperatures</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="k">VALUES</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-01-01'</span><span class="o">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="mf">10</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-01-02'</span><span class="o">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="mf">12</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-01-03'</span><span class="o">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="k">null</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-01-04'</span><span class="o">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="mf">14</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-01-05'</span><span class="o">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="k">null</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-01-06'</span><span class="o">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="k">null</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-01-07'</span><span class="o">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="mf">18</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-01-08'</span><span class="o">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="mf">15</span><span class="p">)</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="n">t</span><span class="p">,</span><span class="w"> </span><span class="n">c</span><span class="p">)</span>
<span class="p">),</span>
<span class="n">temperatures_with_previous_values</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span><span class="p">,</span>
<span class="w"> </span><span class="c1">-- Last known temperature</span>
<span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="k">array</span><span class="p">[</span><span class="k">extract</span><span class="p">(</span><span class="s1">'epoch'</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">t</span><span class="p">),</span><span class="w"> </span><span class="n">c</span><span class="p">]</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">temperatures</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">temperatures_</span>
<span class="w"> </span><span class="k">WHERE</span>
<span class="w"> </span><span class="n">temperatures_</span><span class="mf">.</span><span class="n">t</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">temperatures</span><span class="mf">.</span><span class="n">t</span>
<span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="k">IS</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">temperatures_</span><span class="mf">.</span><span class="n">t</span><span class="w"> </span><span class="k">DESC</span>
<span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mf">1</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">last_known_temperature</span><span class="p">,</span>
<span class="w"> </span><span class="c1">-- Next known temperature</span>
<span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="k">array</span><span class="p">[</span><span class="k">extract</span><span class="p">(</span><span class="s1">'epoch'</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">t</span><span class="p">),</span><span class="w"> </span><span class="n">c</span><span class="p">]</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">temperatures</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">temperatures_</span>
<span class="w"> </span><span class="k">WHERE</span>
<span class="w"> </span><span class="n">temperatures_</span><span class="mf">.</span><span class="n">t</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">temperatures</span><span class="mf">.</span><span class="n">t</span>
<span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="k">IS</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">temperatures_</span><span class="mf">.</span><span class="n">t</span><span class="w"> </span><span class="k">ASC</span>
<span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mf">1</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">next_known_temperature</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">temperatures</span>
<span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">temperatures_with_previous_values</span><span class="p">;</span>
<span class="go">     t      │ c  │ last_known_temperature │ next_known_temperature</span>
<span class="go">────────────┼────┼────────────────────────┼────────────────────────</span>
<span class="go"> 2021-01-01 │ 10 │ ¤                      │ {1609545600,12}</span>
<span class="go"> 2021-01-02 │ 12 │ {1609459200,10}        │ {1609718400,14}</span>
<span class="go"> 2021-01-03 │  ¤ │ {1609545600,12}        │ {1609718400,14}</span>
<span class="go"> 2021-01-04 │ 14 │ {1609545600,12}        │ {1609977600,18}</span>
<span class="go"> 2021-01-05 │  ¤ │ {1609718400,14}        │ {1609977600,18}</span>
<span class="go"> 2021-01-06 │  ¤ │ {1609718400,14}        │ {1609977600,18}</span>
<span class="go"> 2021-01-07 │ 18 │ {1609718400,14}        │ {1610064000,15}</span>
<span class="go"> 2021-01-08 │ 15 │ {1609977600,18}        │ ¤</span>
</pre></div>
<p>There are two main differences here from what you've done before:</p>
<ol>
<li>
<p><strong>You converted the date to a number</strong>: The number is the "epoch" time, the number of seconds since 1970-01-01 00:00:00 UTC. To convert the date you used the function <code>extract('epoch' FROM t)</code>.</p>
</li>
<li>
<p><strong>You kept two values from the previous and next rows</strong>: To implement linear interpolation you need coordinates, which consist of both the date and the temperature. To return multiple values from a single subquery, you constructed an array <code>array[extract('epoch' FROM t), c]</code>.</p>
</li>
</ol>
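<p>To see where the numbers in the arrays come from, here is a rough Python counterpart of the conversion. The helper name <code>to_epoch</code> is made up for this sketch, and it assumes the dates are interpreted as midnight UTC, which is what the values in the output above correspond to.</p>

```python
from datetime import datetime, timezone


def to_epoch(date_string):
    """Seconds since 1970-01-01 00:00:00 UTC for a 'YYYY-MM-DD' date at midnight UTC."""
    moment = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(moment.timestamp())


# A coordinate is the pair (epoch seconds, temperature), like the SQL array.
last_known = (to_epoch("2021-01-02"), 12)
next_known = (to_epoch("2021-01-04"), 14)
print(last_known, next_known)
```

<p>Working with seconds rather than dates turns the x axis into plain numbers, so the distance between two dates can be subtracted and divided like any other value.</p>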
<p>To calculate a missing value with two known coordinates using linear interpolation, use the following formula:</p>
<div class="highlight"><pre><span></span>y = y0 + (x - x0) * ((y1 - y0) / (x1 - x0))
</pre></div>
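<p>As a quick sanity check, the formula can be evaluated by hand in Python with the coordinates found above. The function name <code>interpolate</code> is just a label chosen for this sketch; the input values are taken from the query output for 2021-01-03 and 2021-01-05.</p>

```python
def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation: the value at x on the line through (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * ((y1 - y0) / (x1 - x0))


# 2021-01-03 lies halfway between 2021-01-02 (12) and 2021-01-04 (14).
jan_03 = interpolate(1609632000, 1609545600, 12, 1609718400, 14)

# 2021-01-05 lies one third of the way from 2021-01-04 (14) to 2021-01-07 (18).
jan_05 = interpolate(1609804800, 1609718400, 14, 1609977600, 18)

print(round(jan_03, 6), round(jan_05, 6))
```

<p>Rounded to six decimal places, the two results are 13.0 and 15.333333, the same values Pandas produced for these rows with <code>interpolate('linear')</code>.</p>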
<p>You already have all the data available, so just organize it a bit:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span>
<span class="n">temperatures</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="cm">/* ... */</span><span class="w"> </span><span class="p">),</span>
<span class="n">temperatures_with_previous_values</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="cm">/* ... */</span><span class="w"> </span><span class="p">),</span>
<span class="c1">-- This step is just for convenience</span>
<span class="n">temperatures_prep</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">t</span><span class="p">,</span>
<span class="w"> </span><span class="n">c</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="k">extract</span><span class="p">(</span><span class="s1">'epoch'</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">t</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">x</span><span class="p">,</span>
</span><span class="hll"><span class="w"> </span><span class="n">last_known_temperature</span><span class="p">[</span><span class="mf">1</span><span class="p">]</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">x0</span><span class="p">,</span>
</span><span class="hll"><span class="w"> </span><span class="n">last_known_temperature</span><span class="p">[</span><span class="mf">2</span><span class="p">]</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">y0</span><span class="p">,</span>
</span><span class="hll"><span class="w"> </span><span class="n">next_known_temperature</span><span class="p">[</span><span class="mf">1</span><span class="p">]</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">x1</span><span class="p">,</span>
</span><span class="hll"><span class="w"> </span><span class="n">next_known_temperature</span><span class="p">[</span><span class="mf">2</span><span class="p">]</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">y1</span>
</span><span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">temperatures_with_previous_values</span>
<span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="n">t</span><span class="p">,</span><span class="w"> </span><span class="n">c</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">x0</span><span class="p">,</span><span class="w"> </span><span class="n">y0</span><span class="p">,</span><span class="w"> </span><span class="n">x1</span><span class="p">,</span><span class="w"> </span><span class="n">y1</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">temperatures_prep</span><span class="p">;</span>
<span class="go">     t      │ c  │     x      │     x0     │ y0 │     x1     │ y1</span>
<span class="go">────────────┼────┼────────────┼────────────┼────┼────────────┼────</span>
<span class="go"> 2021-01-01 │ 10 │ 1609459200 │          ¤ │  ¤ │ 1609545600 │ 12</span>
<span class="go"> 2021-01-02 │ 12 │ 1609545600 │ 1609459200 │ 10 │ 1609718400 │ 14</span>
<span class="go"> 2021-01-03 │  ¤ │ 1609632000 │ 1609545600 │ 12 │ 1609718400 │ 14</span>
<span class="go"> 2021-01-04 │ 14 │ 1609718400 │ 1609545600 │ 12 │ 1609977600 │ 18</span>
<span class="go"> 2021-01-05 │  ¤ │ 1609804800 │ 1609718400 │ 14 │ 1609977600 │ 18</span>
<span class="go"> 2021-01-06 │  ¤ │ 1609891200 │ 1609718400 │ 14 │ 1609977600 │ 18</span>
<span class="go"> 2021-01-07 │ 18 │ 1609977600 │ 1609718400 │ 14 │ 1610064000 │ 15</span>
<span class="go"> 2021-01-08 │ 15 │ 1610064000 │ 1609977600 │ 18 │          ¤ │  ¤</span>
</pre></div>
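<p>To make the roles of <code>x0</code>, <code>y0</code>, <code>x1</code> and <code>y1</code> concrete, here is a minimal Python sketch that derives the same columns for the sample data. It is not part of the original query: the names <code>ROWS</code>, <code>epoch</code> and <code>known_neighbors</code> are made up for illustration, and it assumes each date is treated as midnight UTC when converted to epoch seconds.</p>

```python
from datetime import date, datetime, timezone

# Sample data from the query: (date, temperature or None for a missing value).
ROWS = [
    (date(2021, 1, 1), 10),
    (date(2021, 1, 2), 12),
    (date(2021, 1, 3), None),
    (date(2021, 1, 4), 14),
    (date(2021, 1, 5), None),
    (date(2021, 1, 6), None),
    (date(2021, 1, 7), 18),
    (date(2021, 1, 8), 15),
]


def epoch(d):
    """Seconds since 1970-01-01 for midnight UTC of the given date (an assumption)."""
    return int(datetime(d.year, d.month, d.day, tzinfo=timezone.utc).timestamp())


def known_neighbors(rows):
    """For each row return (t, c, x, x0, y0, x1, y1).

    (x0, y0) is the closest earlier row with a known temperature and
    (x1, y1) is the closest later one. Both are (None, None) when no
    such row exists. Assumes rows are sorted by date.
    """
    result = []
    for i, (t, c) in enumerate(rows):
        earlier = ((epoch(pt), pc) for pt, pc in reversed(rows[:i]) if pc is not None)
        later = ((epoch(nt), nc) for nt, nc in rows[i + 1:] if nc is not None)
        x0, y0 = next(earlier, (None, None))
        x1, y1 = next(later, (None, None))
        result.append((t, c, epoch(t), x0, y0, x1, y1))
    return result


for row in known_neighbors(ROWS):
    print(row)
```

<p>The two generators play the role of the two correlated subqueries: one scans backwards for the last known temperature, the other scans forwards for the next one.</p>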
<p>Now that you have all the data neatly organized, you can use the formula to calculate missing values:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span>
<span class="n">temperatures</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="cm">/* ... */</span><span class="w"> </span><span class="p">),</span>
<span class="n">temperatures_with_previous_values</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="cm">/* ... */</span><span class="w"> </span><span class="p">),</span>
<span class="n">temperatures_prep</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="cm">/* ... */</span><span class="w"> </span><span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="n">t</span><span class="p">,</span>
<span class="w"> </span><span class="n">c</span><span class="p">,</span>
<span class="w"> </span><span class="k">CASE</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="k">IS</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="n">c</span>
<span class="hll"><span class="w"> </span><span class="k">ELSE</span><span class="w"> </span><span class="n">y0</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">x0</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">((</span><span class="n">y1</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">y0</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="p">(</span><span class="n">x1</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">x0</span><span class="p">))</span>
</span><span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">interpolated_c</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">temperatures_prep</span>
<span class="p">;</span>
<span class="go">     t      │ c  │   interpolated_c</span>
<span class="go">────────────┼────┼────────────────────</span>
<span class="go"> 2021-01-01 │ 10 │                 10</span>
<span class="go"> 2021-01-02 │ 12 │                 12</span>
<span class="go"> 2021-01-03 │  ¤ │                 13</span>
<span class="go"> 2021-01-04 │ 14 │                 14</span>
<span class="go"> 2021-01-05 │  ¤ │ 15.333333333333334</span>
<span class="go"> 2021-01-06 │  ¤ │ 16.666666666666668</span>
<span class="go"> 2021-01-07 │ 18 │                 18</span>
<span class="go"> 2021-01-08 │ 15 │                 15</span>
</pre></div>
<p>And there it is: the missing temperatures are filled in using linear interpolation.</p>
<p><details markdown="1"></p>
<p><summary> The complete query </summary></p>
<div class="highlight"><pre><span></span><span class="k">WITH</span>
<span class="n">temperatures</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="k">VALUES</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-01-01'</span><span class="p">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-01-02'</span><span class="p">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="mi">12</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-01-03'</span><span class="p">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="k">null</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-01-04'</span><span class="p">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="mi">14</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-01-05'</span><span class="p">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="k">null</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-01-06'</span><span class="p">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="k">null</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-01-07'</span><span class="p">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="mi">18</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'2021-01-08'</span><span class="p">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="mi">15</span><span class="p">)</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="n">t</span><span class="p">,</span><span class="w"> </span><span class="k">c</span><span class="p">)</span>
<span class="p">),</span>
<span class="n">temperatures_with_previous_values</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span><span class="p">,</span>
<span class="w"> </span><span class="c1">-- Last known temperature</span>
<span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="nb">ARRAY</span><span class="p">[</span><span class="k">EXTRACT</span><span class="p">(</span><span class="s1">'epoch'</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">t</span><span class="p">),</span><span class="w"> </span><span class="k">c</span><span class="p">]</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">temperatures</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">temperatures_</span>
<span class="w"> </span><span class="k">WHERE</span>
<span class="w"> </span><span class="n">temperatures_</span><span class="p">.</span><span class="n">t</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">temperatures</span><span class="p">.</span><span class="n">t</span>
<span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="k">IS</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">temperatures_</span><span class="p">.</span><span class="n">t</span><span class="w"> </span><span class="k">DESC</span>
<span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">last_known_temperature</span><span class="p">,</span>
<span class="w"> </span><span class="c1">-- Next known temperature</span>
<span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="nb">ARRAY</span><span class="p">[</span><span class="k">EXTRACT</span><span class="p">(</span><span class="s1">'epoch'</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">t</span><span class="p">),</span><span class="w"> </span><span class="k">c</span><span class="p">]</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">temperatures</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">temperatures_</span>
<span class="w"> </span><span class="k">WHERE</span>
<span class="w"> </span><span class="n">temperatures_</span><span class="p">.</span><span class="n">t</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">temperatures</span><span class="p">.</span><span class="n">t</span>
<span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="k">IS</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">temperatures_</span><span class="p">.</span><span class="n">t</span><span class="w"> </span><span class="k">ASC</span>
<span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">next_known_temperature</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">temperatures</span>
<span class="p">),</span>
<span class="c1">-- This step is just for convenience</span>
<span class="n">temperatures_prep</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">t</span><span class="p">,</span>
<span class="w"> </span><span class="k">c</span><span class="p">,</span>
<span class="w"> </span><span class="k">extract</span><span class="p">(</span><span class="s1">'epoch'</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">t</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">x</span><span class="p">,</span>
<span class="w"> </span><span class="n">last_known_temperature</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">x0</span><span class="p">,</span>
<span class="w"> </span><span class="n">last_known_temperature</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">y0</span><span class="p">,</span>
<span class="w"> </span><span class="n">next_known_temperature</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">x1</span><span class="p">,</span>
<span class="w"> </span><span class="n">next_known_temperature</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">y1</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">temperatures_with_previous_values</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="n">t</span><span class="p">,</span>
<span class="w"> </span><span class="k">c</span><span class="p">,</span>
<span class="w"> </span><span class="k">CASE</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="k">IS</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="k">c</span>
<span class="w"> </span><span class="k">ELSE</span><span class="w"> </span><span class="n">y0</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">x0</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">((</span><span class="n">y1</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">y0</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="p">(</span><span class="n">x1</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">x0</span><span class="p">))</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">interpolated_c</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">temperatures_prep</span><span class="p">;</span>
</pre></div>
<p></details></p>
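<p>The interpolation expression from the query can also be checked outside the database. The following is a minimal Python sketch, not the article's method: the function names <code>interpolate</code> and <code>fill_missing</code> are hypothetical, and day numbers are used for <code>x</code> instead of epoch seconds, on the assumption that only the relative distances between points matter.</p>

```python
def interpolate(x, x0, y0, x1, y1):
    """Value at x on the straight line through (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * ((y1 - y0) / (x1 - x0))


def fill_missing(points):
    """Fill None values in a list of (x, y) pairs sorted by x.

    A missing value without a known point on both sides stays None,
    the same way the SQL expression evaluates to NULL in that case.
    """
    known = [(x, y) for x, y in points if y is not None]
    filled = []
    for x, y in points:
        if y is not None:
            filled.append(y)
            continue
        before = [p for p in known if p[0] < x]
        after = [p for p in known if p[0] > x]
        if before and after:
            (x0, y0), (x1, y1) = before[-1], after[0]
            filled.append(interpolate(x, x0, y0, x1, y1))
        else:
            filled.append(None)
    return filled


# Day of month stands in for the epoch value of each date.
temperatures = [(1, 10), (2, 12), (3, None), (4, 14), (5, None), (6, None), (7, 18), (8, 15)]
print(fill_missing(temperatures))
```

<p>Known values pass through unchanged, just like the first branch of the <code>CASE</code> expression, and only the gaps are computed from their two nearest known neighbors.</p>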
<h2 id="binning"><a class="toclink" href="#binning">Binning</a></h2>
<p>Binning, or "bucketing", is a technique for grouping values into a smaller number of ranges, or "bins".</p>
<figure>
<p><svg version="1.1" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 354.6376470465739 124.44993896436506" width="auto" height="10em">
<g transform="translate(260.5880871908248 59.638840955241676) rotate(0 3.418704253333999 5.925614611875517)"><path d="M0.5413520447909832 -0.39544589444994926 L6.26863751316705 0.855372715741396 L8.533770059591916 12.919750414185707 L-1.4136165268719196 11.167860589318458" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M0.35073682901796166 -0.3734706852702168 C1.216740465996737 0.37998135642491593, 3.237658780165063 0.5314287997125366, 7.019585966566325 -0.3866545242318275 M-0.043768854286268255 -0.007136692042380988 C2.718552018051451 -0.16866268236878879, 5.267560554706127 0.038873387001724324, 6.690779955946089 0.17851420546694102 M6.224623388834969 0.9177195643526146 C6.218548579405479 3.5228732935074394, 7.037162422629263 5.209028266345023, 5.982906987345757 11.226580362848757 M7.030287131099407 0.1184017497519323 C6.284969085939742 2.698506865620281, 6.725575579008778 6.026164079229053, 6.521075781721182 11.277995196769245 M6.7027604623417805 11.89600799149974 C4.6939748158465004 11.949675928572411, 3.0843331674895174 11.258927853457717, -0.2055618208463153 11.62380002786288 M6.808736897771368 11.735477924124854 C4.977049759497223 11.702353058150958, 2.6388813722139695 11.78218720064513, -0.16885687236209834 11.835824231822917 M0.13575606481630187 11.202473237144009 C-0.3036097236343241 6.881351705788765, -1.0565324210418943 3.3192393254195984, -0.6772691166696445 0.3434058791323369 M-0.473492008924767 12.199147342524112 C-0.11350163714764831 9.816441097197913, 0.18862336179322597 6.968460042906614, 0.5670266927040463 0.5315912640600987" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(174.8028953711496 37.51479258981385) rotate(0 3.2515151206274595 6.57347250111269)"><path d="M-0.568770993500948 0.855372715741396 L8.199391794178837 1.0685211904346943 L5.089413714382999 12.46357636779286 L1.1126473061740398 13.745061329302565" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M-0.27789115710753975 
0.36673692259476187 C2.213359555528252 0.5462985823253836, 4.181851665554589 -0.33222484011725945, 7.032982993720306 -0.27104879191893483 M0.10474147333351697 -0.1303532935008165 C2.2997506187157724 0.06839091766884298, 4.363581723828221 0.17114163945868743, 6.382498960798491 0.09571793855256511 M5.7778073568609924 0.4512993466085353 C6.617943494869029 2.799551172851278, 5.487126365313331 7.176022081139574, 5.917554283946831 12.309986567281335 M5.973887488494677 -0.3650681660822894 C6.505459542883567 3.653262285889667, 6.194905578839252 6.735017593862774, 6.137256022221531 13.206756030276877 M6.053474087246765 13.233749942303414 C4.092423174458367 12.58104126633011, 2.269157050631342 12.934896947892069, -0.0871359790881382 13.166058482387584 M6.597424017710505 12.955638242519754 C3.7893533007813422 13.009550042901152, 1.4463876448663857 13.110275265655359, 0.2812201507107731 12.849211342309479 M-0.45718742049394456 12.714851074608132 C-1.3074883348779878 10.727878632564057, -0.766375994326837 7.294781860051154, -1.0585760469526726 1.0214721972829106 M-0.02376375145199494 13.775197357257204 C0.41644505106471863 8.375366770671922, 0.7340726653404296 3.4846916803776797, 0.17792813899273308 -0.12997263564223294" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(52.70064130337937 76.98949030589074) rotate(0 3.1156739503036306 5.622584308845205)"><path d="M-0.39544589444994926 -0.568770993500948 L7.086720616348657 1.6963615529239178 L7.2998690910419555 9.831552090818498 L-0.6833686344325542 12.357815923864457" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M-0.3403666415905027 -0.26628147405096936 C2.437822728240463 0.1683686432938971, 4.662253301416302 0.13406465491491237, 5.878966021495809 0.5078124887912198 M-0.006504103262544858 0.10036560430285985 C1.4847719835123816 0.054193726717284854, 3.2835916790937816 -0.12565370625704853, 6.394038797074859 -0.11549574791531775 M7.102136158100116 -0.6203154891906103 
C6.616543146125512 3.8610205130031563, 5.259769321929372 6.598367463526009, 5.6386429891524426 10.744384910112824 M6.343694697336305 -0.4525994043947522 C5.896967772803858 2.6231375857076866, 6.181684879428546 5.015169619341351, 5.687428517300673 10.932305608958105 M6.272157526086081 10.81439393380348 C4.388423862226458 10.85782523587991, 1.8328005021799307 11.04555627914461, -0.20727011424756092 11.161672988696779 M6.125856672868769 11.335618829195724 C4.160314186672768 10.985385900466344, 2.3453525847604997 11.086053001003673, -0.014039509854705035 11.514640007293224 M-0.6155792216482681 10.854115819304855 C-0.2988071889004178 7.795840164657227, 0.6673769622987722 5.914820982668148, 0.3258444286446007 -0.9054473218391023 M0.33012591663591484 11.224842420842304 C0.43627920037135404 7.264255705181803, 0.2020444556173999 3.288675445663353, 0.5044061917279303 0.15218987563015252" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(286.30237290511013 34.87693619333663) rotate(0 3.418704253333999 5.925614611875517)"><path d="M0.855372715741396 1.6963615529239178 L7.905929697102692 -1.4136165268719196 L6.154039872235444 12.963876529925052 L0.5981163270771503 11.524453564875309" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M0.38559410939703354 0.21981785258644082 C2.939000933703016 0.21139545967758455, 4.417264237804895 0.5864203075359734, 6.552422700538489 0.3160677384779764 M-0.13705590852099114 0.05071083519189562 C2.034165004277385 -0.1686539823807755, 4.105883892322667 -0.14875429062744083, 6.938048150498678 -0.0024966368795839577 M7.244229407210403 0.16486632699035852 C5.906103943518213 2.0880821299700325, 6.3045316022010045 5.357393753180753, 6.082937695266932 11.618096669761554 M6.50832015056677 -0.03638584651911325 C6.83206249240442 3.624793291041305, 6.287854829636618 7.552527208972161, 6.8913247741678685 11.913440174281817 M6.9286768559899565 11.43967764713153 C4.321354273201081 11.598400221950497, 
2.7768302954315005 11.712345632823126, 0.02009627312260931 12.373902879701523 M6.636264975472892 11.571914373350413 C4.722219783759924 11.702221830360102, 2.834244846325624 12.166758867104344, -0.31304277905953215 11.636040977350373 M-0.3895082988113564 10.68617785937389 C0.6936241386310691 7.482835275139762, 0.4082814777930372 3.5147329739815176, 0.9207995586533018 1.0654149977289065 M0.5663340546859216 12.131932541409569 C0.2498915578790219 9.809871167179288, 0.46200764030547886 6.79485190453858, -0.11716299851793993 -0.1685158854950083" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(144.5171810854349 26.943364018385182) rotate(0 3.2515151206274595 6.57347250111269)"><path d="M1.0685211904346943 -1.4136165268719196 L5.819661606822365 1.1126473061740398 L7.101146568332069 12.820169343349711 L-1.787829589098692 12.824849619266764" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M0.5107779437944855 0.17326820970817736 C1.3052724366207784 0.509045969704882, 3.074987038618937 0.279703903233006, 6.41977349777642 -0.013575355086321572 M0.06703338216021082 -0.1394577929117512 C2.3301761229935654 -0.13017745252558488, 4.416501976639585 -0.012020712820018153, 6.334905902529241 0.2517864589107759 M5.36809896264147 -0.9479256764634485 C5.8552173541258945 2.6333846589656105, 5.711201740971692 6.179099885541026, 6.930963010021674 13.409638645689215 M6.470830373483331 -0.3509179392923429 C5.7941098945787255 2.520023776113754, 6.489827636089923 5.840608573691585, 6.373579848486399 13.189995230706414 M5.932867615744461 12.951436025965277 C4.785370378938097 13.020666527175793, 3.5194378846804444 13.604915204391986, -0.054538891317781 12.926763861607775 M6.3876533959924195 12.986345946845942 C4.348681993458977 13.375034118942379, 1.9259033307636855 12.88914933944114, 0.037246169923239036 12.968951659912642 M0.6974907259960728 12.39562885936345 C0.4241604419023294 8.601472875099061, 1.0646815900446387 7.355051539071148, 
-1.0505194495524037 0.7719132397879549 M0.3544039663072258 13.775965722610604 C0.46406060286734396 9.428317143454514, -0.25562310664859067 5.166892982437693, 0.28113845125640813 0.5575493010045112" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(131.57041621012468 56.1811497768241) rotate(0 3.2515151206274595 6.57347250111269)"><path d="M-1.9218490831553936 -1.577092994004488 L6.647116650615089 -0.804933775216341 L7.5447667732576065 14.811081303951518 L1.5234206207096577 14.183727278110759" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M-0.4893377873654036 -0.5178916292806333 C2.269789526087484 -0.023915123021799628, 5.157758671951252 0.30812233379715126, 7.136988225156306 -0.17386839355020312 M0.2090641374290303 -0.31765356678566364 C2.000196918323486 -0.24945013537424998, 4.026270842565144 0.04934936523293648, 6.237893041664345 0.17605637721941747 M6.2354250698869835 0.6189320850826714 C5.297420527229066 5.190346577494738, 6.438760719433888 10.22013930964752, 7.576749372497783 12.12715245305963 M6.808035258447596 -0.36652308451806553 C5.804046496778173 4.485887944527506, 6.376496583408962 9.933823073896948, 6.489912005162773 13.599319705496239 M7.060217989863038 12.623680234409143 C5.599605176743898 13.72820002345237, 3.1843398141644426 13.25346868679514, 0.5232897493328144 13.796371964150499 M6.618190892955687 13.27651302530106 C4.334378922095977 13.223812289692974, 2.317300863280036 13.329931388161018, -0.2832339342097553 13.132485977470946 M-1.269331181543272 13.913358098707345 C-0.02921351295479027 8.747424105512408, 0.3759059839902933 6.273982382526693, -0.5703337203059251 -0.8761782272946914 M0.07806402183337602 13.4609052833164 C-0.6837670612220339 9.199648816762743, -0.6995778672508262 6.166789712450386, -0.035367921792127 0.5432029613190863" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(198.71327335298201 69.03829263396676) rotate(0 3.2515151206274595 
6.57347250111269)"><path d="M-0.06175107881426811 -0.7863254435360432 L5.787545312915199 -1.1729758866131306 L5.784007538829201 13.202076210376994 L1.2385486848652363 15.130512132046" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M-0.1323687382754939 0.30614976071583666 C1.8344132837420999 0.06442699648659747, 4.864808044004281 0.08653333234281593, 7.0341367439370535 -0.5044321540790633 M0.15086827017247828 -0.18129768568557564 C2.160044805155502 -0.17719428687261837, 4.896037497308674 0.06474797978223559, 6.496541411644336 0.22376349602519208 M7.629476899982304 -1.0578657747030922 C7.592237484962479 4.08944355422452, 5.66077066378114 6.081085371212973, 7.560946520201707 14.45986839864865 M6.735846435884558 0.26194306503699427 C5.99687975481018 4.053034251775203, 5.825299285911793 8.17384597853533, 5.930426267772395 13.117713706017245 M5.875165816603681 13.52604505496559 C4.91654617106522 12.522826396211464, 3.5189042608817678 12.897389409942155, -0.28211097179831834 12.713550513402193 M6.541643979629157 13.302242921410592 C4.381799906603661 12.962842158108852, 2.5601540153324995 13.276488082908982, -0.01749445707315339 13.415635963053143 M-0.5291210036601455 13.831727646877834 C1.1975851960278536 11.467600067285154, 0.7851957522951956 6.810733726692844, 1.0888220550599859 -0.3273100961841877 M-0.537635407189728 13.017638939830205 C0.3785324458701703 8.217211313679348, -0.31879994256320354 4.366538675989105, -0.0469848773657896 -0.1590024401452243" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(240.68894136313293 87.54329338034711) rotate(0 3.418704253333999 5.925614611875517)"><path d="M-0.5280839912593365 0.8576579205691814 L6.8428923016253975 0.7082663886249065 L7.5642538195315865 13.401326462023441 L0.7437886483967304 11.27307598232526" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M0.6245096570229581 -0.42845174398930097 C1.7205163335593763 -0.2733019403896382, 3.0432581262077005 
-0.6756285508428223, 6.70517684351305 0.44996967273895516 M0.14051240514935448 -0.03492965016313049 C1.473746071063161 0.02275343198942499, 2.6401218935607744 0.18581814583843975, 6.633757027411739 -0.31834514233311473 M6.267815126660882 0.009752778871888301 C5.861639144736693 3.2690328404425655, 6.67852897665589 8.326063234409215, 7.334081081687452 11.696378339502495 M6.246758829145822 0.43668282720384466 C7.297368891054656 3.9465679995049907, 7.211357461294815 9.263291613617943, 7.147915664898171 12.092664874518391 M7.421856445846114 11.222721077097173 C5.201377123488215 12.229764304214495, 3.155467630173527 12.069482017495702, 0.2798831521047187 11.371655294663737 M6.59782119660979 11.524933582241998 C4.238919728716749 11.890230904500907, 1.651772307585075 11.714869406095232, -0.07716757107884104 11.915560109740566 M-0.6017302335257723 10.723708270179705 C-1.0505609661554558 9.279805754843665, -0.6877081528865189 5.689561014199449, -0.5225950347922365 0.34160122035760776 M0.025825076418398263 11.704178677646954 C-0.5551721894146369 8.787239171134098, 0.10674936314163697 5.115857737879583, -0.4609727838624198 0.25074069203261096" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(78.13482880902075 57.132037969091584) rotate(0 3.1156739503036306 5.622584308845205)"><path d="M-0.6962600462138653 -1.1686907894909382 L4.358248383530054 -0.38678883388638496 L7.5475473709186645 9.509800019437883 L0.02239375188946724 12.21396607416353" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M0.3424927896212612 -0.5428020832879004 C1.3885623806989784 -0.27655735853917185, 3.0816551326287245 -0.3788408113581463, 5.629714274548423 0.36326208400143 M-0.09754390081406883 -0.13516249706955763 C1.6728247855650895 -0.08598658237852658, 3.908574567246816 -0.237967616946328, 6.268348441682167 0.14880991273023608 M6.8364812000580635 -0.06050352293136285 C7.112640138380135 1.2298326605489223, 5.296654850055974 4.701830819320092, 5.778767099188167 
11.830893765569332 M5.814500537481281 0.4656590410067576 C6.177022045202987 4.485790091615716, 6.565125004204596 8.46203223711155, 5.771484349975099 11.134567373471505 M6.168273153373367 11.200629127050723 C3.7973881418733493 11.201207952674123, 1.4198889314851493 10.997525800134305, -0.36546104142548363 11.021144587327944 M6.35113365736597 11.355716143829587 C4.058058627291379 11.457660351883305, 2.574830744373604 11.26838902259551, 0.20260157722622568 11.548507691795638 M0.7194899035245867 12.163542777936296 C0.13840150967860376 8.398310162704156, 0.851279383841272 3.6741363391186277, -0.12688397125287398 -0.697347153280522 M-0.12378415918878094 11.742623856733642 C-0.32509866202486704 7.679048889467648, 0.2361684144749934 5.7250100987537955, -0.35232735480149974 -0.0007262282064878223" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(25.277685951878084 51.41775225480603) rotate(0 3.1156739503036306 5.622584308845205)"><path d="M0.8220214135944843 -0.20434438064694405 L6.323749691971216 0.3268709294497967 L4.617983371742639 12.525995793516252 L-1.1913957111537457 9.382795515234086" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M0.011064524196265801 -0.554002394158998 C1.6320671138190603 -0.23141401002828066, 2.5876456392319005 -0.4516144260268524, 6.099991095276918 0.295273556979855 M-0.2945624188259882 -0.13287954362806456 C1.1320291204663193 0.13790095979473058, 2.8514074144059474 0.2277036592472644, 5.931955144367152 -0.24568537793130718 M7.167022566393749 0.856556087777335 C6.7690293960735275 2.620869657493798, 7.117407902478867 6.629165839127506, 6.584477581651465 11.74141160732962 M6.604747888931432 -0.48994089990636286 C5.916315675157368 3.003801314968836, 6.0991869286169225 5.985426229284842, 6.213987868267493 11.024109562666842 M6.248524987515926 11.631060005052161 C5.202880149237429 11.462337728171587, 3.159477763133827 10.649350582204091, 0.3013804996845644 11.478460133586474 M6.290497186797227 
11.236754235279818 C4.552335466844971 11.505183905735784, 2.9013153715437983 11.343341593867846, 0.20507870762070296 11.480189271947667 M-0.3566958095832603 11.783931631701869 C1.065192472997072 8.021014645507371, 1.0075988752117364 4.749552588832005, -1.098529904997808 0.5800894317522769 M-0.19573904053589591 10.916615492946233 C-0.45611752672089234 6.735778325948769, 0.4404875954144636 1.9560005045018625, 0.006295537899494552 0.2723572688607435" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(83.84911452330653 22.84632368337759) rotate(0 3.1156739503036306 5.622584308845205)"><path d="M-1.8711885921657085 -0.1445324309170246 L5.506375104912195 0.8381790034472942 L5.97002330494729 12.845611515218827 L-1.020893406122923 11.69956165807924" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M0.33532588468155555 -0.03352715405303763 C1.7952108161168252 -0.5383742374302348, 2.0691883091977896 0.10530390227360961, 5.98055678109115 0.32457113758404177 M-0.23098995038655495 0.2580382372400753 C2.3103262745052016 0.06553810292220699, 4.9132829108683715 -0.11897967947371069, 5.976521150718466 -0.061288083300401086 M6.117522420233387 -0.08037652372781268 C5.861355958572582 4.085630497275931, 5.691241610578838 7.8384136907811195, 5.571832319134783 10.840892041254186 M6.447514780427606 0.19949545291514703 C5.783589821606611 3.676486087750032, 6.511486743183117 6.739473243674824, 6.596965249286655 11.792578117377456 M6.630042847055905 11.754072404728115 C4.31865528075792 11.677971988437596, 2.7259833239975393 11.047842185353463, -0.07031092149596396 10.858743756728083 M6.162754695230509 11.520826259193594 C4.409307408623743 10.999124365590706, 3.0485789238374257 11.588068505254856, -0.19523711891834233 11.244766188810628 M-0.21747528282461237 11.985212866598745 C-0.9100150001109041 8.275687189254073, 0.6104251626703238 5.743212079380861, -0.11489435082262012 0.051953686223215945 M-0.33493614155792645 10.7216011285361 
C0.4809120720257012 7.340382076820721, 0.4998066545381119 3.8602631175076065, 0.004627015511844079 -0.4281897198130523" stroke="#000000" stroke-width="1" fill="none"></path></g><g><g transform="translate(11.250502759324718 57.754622277936846) rotate(0 45.70308969435939 -1.1054186984630974)" fill-rule="evenodd"><path d="M1.9708588235080242 -0.673730444163084 L38.35772980377078 46.975357819348574 L91.92586920357166 -1.796245816030705 L79.27535724054474 -45.659213085791976 L0.2679928823240516 -6.144148761140514" stroke="none" stroke-width="0" fill="#daaeff44" fill-rule="evenodd"></path><path d="M-0.22275335714221 -0.5273026116192341 C9.694347552675755 13.06960125397891, 23.899904563743622 24.10069997273386, 39.198193836957216 45.01470376178622 M0.25803626142442226 -0.33162478171288967 C15.688969958480447 17.095524273626506, 30.75242855763063 34.615343016386035, 39.16120535694063 45.54378488101065 M39.54686002805829 43.480414401739836 C50.853325995816164 34.19589769528912, 63.12440551371709 24.132350338303603, 92.6566821480435 -4.7246032160308005 M40.55517823062837 44.008370662108064 C53.96414005387838 31.730560406377265, 70.69759613439169 18.339341151394994, 90.71059391035556 -2.349924650841558 M92.20746625906645 -4.824780027249062 C88.12745912947813 -17.600452023092704, 84.82013362835015 -28.143779217598222, 82.2649349288215 -47.754622277936846 M90.96761915220736 -3.4844752222832085 C87.81699245417745 -19.53223362092732, 84.13581231297644 -36.00721173234557, 81.11043243456083 -46.7445227346783 M80.59107914726633 -46.44705372782698 C55.611396156224686 -35.62679574604021, 33.18193570441955 -22.735963894772812, -1.2505027593247178 -5.447266788887191 M80.3267014508267 -46.22283856490901 C51.04336029778089 -31.426486006459037, 21.839361764836433 -15.775694014158706, -0.2864522707782271 -5.766134161236096 M-0.3234736817810592 -5.358044590886488 C-0.12828180044280368 -4.227381602434128, -0.6529641594215109 -3.7101722514716307, -0.14288427756129446 -0.4813340452006054 
M-0.9042778165982728 -5.428992197496362 C-0.6125349833146894 -4.420666162947862, -0.435277130319901 -2.889400318382176, -0.028520986121137737 0.18192724528001325" stroke="transparent" stroke-width="1" fill="none"></path></g></g><g transform="translate(279.26036993456114 88.97186480891867) rotate(0 3.418704253333999 5.925614611875517)"><path d="M0.412126075476408 1.7377893216907978 L7.665755843168881 1.1542802341282368 L5.098590110785153 13.348441562943641 L-0.06317483261227608 10.338884196572486" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M-0.6815348945335168 0.5038766563481208 C3.198905306843421 -0.6584281918588478, 5.698025660359611 0.27966395563163393, 7.19569442705328 0.2785861519008428 M0.292223969589058 -0.31425407332691946 C1.9844493728539645 0.13254063066168464, 3.7639596906404766 0.052399487302288314, 6.977350082720357 -0.23978696454363738 M6.0068587076227375 -1.1311316088436243 C6.8688962067705335 4.71570517276577, 7.037601160407382 8.676435442189081, 6.569900554611726 12.074237714551488 M6.536543389905112 -0.5637604767856538 C6.419675443161047 2.8266902777997784, 6.601101849795516 5.143969919731123, 6.57611098927188 12.022029833929816 M6.867207372904528 11.681551524356411 C4.334084707177831 11.931867751023727, 3.258852705935664 11.373569948040803, -0.5319041888763245 12.140552214354509 M6.958476172432272 11.975472681881143 C5.231688454449349 11.978556369275237, 2.9972092527973238 11.84541399012187, 0.14938540845592774 12.103269667766929 M1.081712435577699 10.6762471479096 C-0.060007406640324654 9.368738206215472, 0.027534095514131898 6.21432786854176, 0.5077923499226578 0.9322425547716271 M-0.20751455971848604 11.574856967065793 C0.17931716911505222 7.013753583956493, -0.4187632686037479 1.94896595146576, -0.39315259472311126 -0.495543886809975" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(292.11751279170426 63.97186480891844) rotate(0 3.418704253333999 5.925614611875517)"><path 
d="M1.5492554195225239 0.10835577175021172 L5.278866661137727 -0.37342559173703194 L5.210669768399384 11.545594259493534 L-1.305359672755003 12.834465667002384" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M0.584447939178116 -0.6285081466538389 C2.100588702745358 0.30289920103674045, 3.791299295355812 0.14261691431794782, 7.117291658772717 -0.47957392908727475 M-0.23958731005820805 -0.3262956415090154 C2.6731381353855275 0.04584801527717325, 5.3576176275906295 -0.12951348312850128, 6.760240935589157 0.06433088598955294 M6.235678273142226 -1.1275209535713075 C5.9302440999402455 2.9117792140972556, 6.293096913209182 4.804737156457642, 6.314813471875762 12.19283044410862 M6.863233583086396 -0.1470505461040585 C6.241122927027646 3.3110793914770813, 6.90304447958392 6.014767402316563, 6.376435722805578 12.101969915783624 M7.079543838196546 12.099716140011276 C5.496654655527512 12.10569604006801, 2.8983825055202708 11.839411281761276, 0.2987708169118555 12.355310111782847 M7.149448282000543 11.512284111439797 C5.282332531191533 11.887113197214292, 3.7372102751444842 11.762353331397428, 0.14648200905708247 12.120151672803768 M-0.41502911943697207 11.298484710380576 C0.3428518934395225 6.894645509912968, -0.8533089819980778 1.4834378106824944, -0.7863051894462225 -0.99108777361995 M0.341990991077468 11.336050839048946 C0.3406204677402852 8.032932941586413, -0.5510533875320388 4.72410403593109, 0.3029348168457163 -0.265195226802185" stroke="#000000" stroke-width="1" fill="none"></path></g><g><g transform="translate(315.53621704503826 11.326050849365629) rotate(0 -30.39847914395591 51.06622245189334)" fill-rule="evenodd"><path d="M0.9502568729221821 1.8495128266513348 L-63.166344713419676 33.23148137597093 L-86.87678621762575 101.65248899215032 L-26.84671705375831 96.12109803568035 L29.09470889983436 32.836161045623726 L0.6155475862324238 1.0975350253283978" stroke="none" stroke-width="0" fill="#fc54ee33" fill-rule="evenodd"></path><path 
d="M0.7082663886249065 0.7268453128635883 C-16.09272884604122 9.735147181831811, -36.004706354146556 17.945817971003333, -64.12607001140714 34.33162222323665 M0.912742817774415 -0.9914432112127542 C-14.570651418715718 7.433000357994525, -29.42559672724456 14.614499772406546, -64.5715276952833 33.64376383727142 M-65.70039843395352 31.92433748660335 C-73.92678028495453 60.53449265162588, -85.32997559972912 87.51896687300341, -89.89838828944744 101.89888008737137 M-64.42285988293588 31.987733659201353 C-71.19410976633168 55.594627858073, -80.25603703847162 79.09425757392972, -88.06019921095253 103.12388811499943 M-88.9448541631657 101.94468983315994 C-65.15508905107696 99.79011742102777, -40.139986325068726 99.9360503659184, -27.28786954443376 97.24922064106374 M-87.80754275114418 103.10385952384343 C-67.14691149128078 100.61654282661951, -45.58663098837389 99.48953260023676, -25.5070695464633 96.24592267322748 M-24.832840056823898 95.97761705677419 C-7.198299468042624 74.19187542962442, 14.10955888610334 49.93867406016218, 29.10143000153562 33.27046126712651 M-25.395053345444012 95.37996207523554 C-8.290402104918478 75.39310934733864, 10.312455344439616 54.19515919678713, 27.792093650304878 34.295451008049554 M26.312837662973607 33.12968654025883 C20.9141849089148 24.658711090085262, 11.566476403336452 12.790220736858586, 1.3111014477908611 0.17798631265759468 M27.128215686285102 34.24224980947167 C20.055147063226595 25.560598801621936, 12.470001689131783 14.906929314469142, -0.14290807209908962 -0.3155482951551676 M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0" stroke="transparent" stroke-width="1" fill="none"></path></g></g><g transform="translate(317.83179850599004 39.68615052320433) rotate(0 3.418704253333999 5.925614611875517)"><path d="M1.0632739178836346 1.2952901609241962 L7.788766478544858 -0.9993395321071148 L7.71885416412988 12.11456056623954 L-0.7327667362987995 12.468101344399635" stroke="none" stroke-width="0" fill="#fc54ee"></path><path 
d="M-0.49225641073519866 -0.347160226146249 C1.5293761603246636 -0.6061071262016439, 4.457950499964328 -0.3967640390608321, 6.44597687750026 -0.301504229558725 M-0.309581842778075 0.014899433118265137 C2.1634659665867373 -0.3202991840679029, 4.56788708048224 0.06158762014013292, 6.629537053090701 -0.26595209443816226 M6.840657992220808 0.41969136615360414 C7.23681680516673 4.975378035684024, 7.246856772827639 7.771108972851491, 6.816599607309313 12.369086454767219 M6.549824055206787 0.5408562177888495 C6.166572556618153 4.455606458137363, 6.873777908688047 8.928465624003374, 6.740877474897288 12.10512539871234 M6.621761220387169 11.611783713232475 C3.929442487782493 12.030822105469207, 1.521837979602224 11.340713055121105, -0.537160493614584 11.397580940551647 M6.979002234794671 12.04853636104869 C5.060755524013958 12.166565410812739, 3.867754322490165 11.652126082359452, 0.2723327954490025 12.026003422659912 M-0.9235318333093594 11.629951609466492 C-1.055745589233342 8.645143409219662, -0.8653087405775074 6.383900620215027, 1.0703746913172705 -0.9324451336935904 M-0.4245097015079453 12.303878521599238 C-0.3270667268464912 7.371209237357901, -0.4745957483821913 2.9365391893949453, 0.5007784184292873 0.12278831518657363" stroke="#000000" stroke-width="1" fill="none"></path></g><g><g transform="translate(137.67907418789514 18.468907992222398) rotate(0 26.335409401516813 39.153543299330124)" fill-rule="evenodd"><path d="M0.46835439279675484 -0.06510530784726143 L66.30168490484357 25.203470275338304 L75.28508942201734 68.69434734167794 L22.84053013739822 76.65883378737738 L-21.71996379164716 48.627813401499 L1.037846613675356 1.8938887231051922" stroke="none" stroke-width="0" fill="#f41d9222" fill-rule="evenodd"></path><path d="M-0.9449572451412678 1.968819785863161 C18.929446789209862 4.3085919783983115, 38.253960232955535 10.853047389177926, 65.43688906356692 24.64341601037563 M-0.9243254419416189 -0.11340796388685703 C25.4217782312605 10.40899187519352, 
50.57796431179823 19.26609216224676, 65.43585785664618 22.86602673919083 M63.92521322891116 22.539201342101705 C68.4895258864521 34.837266745684474, 71.43661498670866 46.837348833121396, 75.25836492702365 66.6762495014284 M65.11545523442328 22.711049417566073 C69.18142029250572 37.77121771531451, 70.77384719418893 51.9064839798159, 75.88258471526206 66.38964755753318 M75.62709070369601 65.43821525307635 C59.56954965982581 70.70734875219472, 47.32809690348793 72.22866627889285, 23.63966232804296 79.568808399673 M75.82583870925009 67.20319135407249 C58.30166454780841 71.30816852449038, 40.71604869642415 74.39676537725275, 23.778016702138757 78.072800377916 M21.768672528543448 77.02019675874283 C6.559061809190013 68.11226499466491, -11.0655021487736 54.78834862980462, -21.803966919758523 49.71281633792182 M23.03717340608773 78.82605669886937 C10.168173815269933 69.75535326890116, -1.4445864390449934 63.35518380680256, -21.94844641988834 47.54286557143291 M-22.75353948292991 48.89561949191352 C-16.32039541305444 37.031139456613815, -9.703885732817344 23.599212336699892, 0.5633763186633587 -1.2617218010127544 M-23.21176591222843 47.847650823050344 C-17.31884750440453 34.310390782063585, -10.066214218030565 21.77936838545971, 0.2606643084436655 0.5268328841775656 M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0" stroke="transparent" stroke-width="1" fill="none"></path></g></g></svg>
<figcaption>Binning</figcaption></p>
</figure>
<h3 id="custom-binning"><a class="toclink" href="#custom-binning">Custom Binning</a></h3>
<p>Custom binning is most common for categorical data or for discrete data when ranges are pre-determined.</p>
<p>Imagine you have a table of student grades, and you want to classify them into letter grades A-F:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">grades</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">round</span><span class="p">(</span><span class="mf">70</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">sin</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">30</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">grade</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">100</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">n</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="hll"><span class="w"> </span><span class="k">CASE</span>
</span><span class="hll"><span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="n">grade</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">60</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="s1">'F'</span>
</span><span class="hll"><span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="n">grade</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">70</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="s1">'D'</span>
</span><span class="hll"><span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="n">grade</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">80</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="s1">'C'</span>
</span><span class="hll"><span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="n">grade</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">90</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="s1">'B'</span>
</span><span class="hll"><span class="w"> </span><span class="k">ELSE</span><span class="w"> </span><span class="s1">'A'</span>
</span><span class="hll"><span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">letter_grade</span><span class="p">,</span>
</span><span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">grades</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">letter_grade</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">letter_grade</span><span class="p">;</span>
<span class="go"> letter_grade │ count</span>
<span class="go">──────────────┼───────</span>
<span class="go"> A            │    29</span>
<span class="go"> B            │    10</span>
<span class="go"> C            │    12</span>
<span class="go"> D            │    10</span>
<span class="go"> F            │    39</span>
</pre></div>
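<p>When the ranges are pre-determined, they can also be kept as data instead of being spelled out in a <code>CASE</code> expression. The following is a minimal sketch of that idea, not part of the original query: the <code>letter_grades</code> lookup and its <code>from_grade</code> and <code>to_grade</code> columns are hypothetical names, and the bounds assume integer grades between 0 and 100:</p>
<div class="highlight"><pre><span></span>WITH grades AS (
    SELECT round(70 + sin(n) * 30)::int AS grade
    FROM generate_series(1, 100) AS n
),
-- Hypothetical lookup of bins; both bounds are inclusive
letter_grades (letter_grade, from_grade, to_grade) AS (
    VALUES
        ('F',  0,  59),
        ('D', 60,  69),
        ('C', 70,  79),
        ('B', 80,  89),
        ('A', 90, 100)
)
SELECT
    l.letter_grade,
    -- Count matched grades only, so an empty bin shows 0
    COUNT(g.grade)
FROM
    letter_grades AS l
    LEFT JOIN grades AS g
        ON g.grade BETWEEN l.from_grade AND l.to_grade
GROUP BY
    l.letter_grade
ORDER BY
    l.letter_grade;
</pre></div>
<p>Because the bounds match the ones in the <code>CASE</code> expression, this should produce the same counts. The <code>LEFT JOIN</code> also keeps a bin in the result when no grade falls into it, which the <code>CASE</code> version does not.</p>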
<p>Custom binning can also use expressions to categorize data into custom groups. In the American grading system, for example, the letter grade can also be calculated based on the percentile rather than the absolute grade:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">grades</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">round</span><span class="p">(</span><span class="mf">70</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">sin</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">30</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">grade</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">100</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">n</span>
<span class="p">),</span>
<span class="hll"><span class="n">percent_grades</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
</span><span class="hll"><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">percent_rank</span><span class="p">()</span><span class="w"> </span><span class="k">OVER</span><span class="w"> </span><span class="p">(</span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">grade</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">percent_grade</span>
</span><span class="hll"><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">grades</span>
</span><span class="hll"><span class="p">)</span>
</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="k">CASE</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="n">percent_grade</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">0.6</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="s1">'F'</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="n">percent_grade</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">0.7</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="s1">'D'</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="n">percent_grade</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">0.8</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="s1">'C'</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="n">percent_grade</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">0.9</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="s1">'B'</span>
<span class="w"> </span><span class="k">ELSE</span><span class="w"> </span><span class="s1">'A'</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">letter_grade</span><span class="p">,</span>
<span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">percent_grades</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">letter_grade</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">letter_grade</span><span class="p">;</span>
</pre></div>
<p>To find the relative grade of every student based on the grades of all other students, you used the window function <a href="https://www.postgresql.org/docs/current/functions-window.html#FUNCTIONS-WINDOW-TABLE" rel="noopener"><code>percent_rank</code></a>. The function returns a value between 0 and 1 that represents the rank of the current row relative to all other rows.</p>
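<p>To see how the value is derived, it can help to run <code>percent_rank</code> on a handful of rows. According to the PostgreSQL documentation, the result is <code>(rank - 1) / (total rows - 1)</code>. The sketch below uses a made-up list of five grades; the alias <code>t</code> and the column <code>rnk</code> are names chosen only for this illustration:</p>
<div class="highlight"><pre><span></span>SELECT
    grade,
    -- Tied grades share the same rank
    rank() OVER (ORDER BY grade) AS rnk,
    -- (rank - 1) / (total rows - 1)
    percent_rank() OVER (ORDER BY grade) AS percent_grade
FROM
    (VALUES (50), (60), (60), (80), (100)) AS t(grade);
</pre></div>
<p>With five rows the denominator is 4, so the lowest grade gets 0, the two tied grades of 60 both get 0.25, the grade of 80 gets 0.75, and the highest grade gets 1.</p>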
<p>Custom binning is mostly useful when the data is familiar, or falls within a known set of values. When exploring unknown or unbounded sets of data, there are other binning techniques you can use.</p>
<h3 id="equal-height-binning"><a class="toclink" href="#equal-height-binning">Equal Height Binning</a></h3>
<p>Say you need to divide students into groups based on their grade, and you want every group to have roughly the same number of students. To achieve this, PostgreSQL provides a function called <a href="https://www.postgresql.org/docs/current/functions-window.html#FUNCTIONS-WINDOW-TABLE" rel="noopener"><code>NTILE</code></a>:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">grades</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">round</span><span class="p">(</span><span class="mf">70</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">sin</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">30</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">grade</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">100</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">n</span>
<span class="p">),</span>
<span class="n">grades_with_tiles</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="n">ntile</span><span class="p">(</span><span class="mf">10</span><span class="p">)</span><span class="w"> </span><span class="k">OVER</span><span class="w"> </span><span class="p">(</span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">grade</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">bucket</span>
</span><span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">grades</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="n">min</span><span class="p">(</span><span class="n">grade</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">from_grade</span><span class="p">,</span>
<span class="w"> </span><span class="n">max</span><span class="p">(</span><span class="n">grade</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">to_grade</span><span class="p">,</span>
<span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">cnt</span><span class="p">,</span>
<span class="w"> </span><span class="n">bucket</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">grades_with_tiles</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">bucket</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">from_grade</span><span class="p">;</span>
<span class="go"> from_grade │ to_grade │ cnt │ bucket</span>
<span class="go">────────────┼──────────┼─────┼────────</span>
<span class="go">         40 │       41 │  10 │      1</span>
<span class="go">         41 │       45 │  10 │      2</span>
<span class="go">         47 │       53 │  10 │      3</span>
<span class="go">         53 │       61 │  10 │      4</span>
<span class="go">         61 │       70 │  10 │      5</span>
<span class="go">         71 │       79 │  10 │      6</span>
<span class="go">         79 │       87 │  10 │      7</span>
<span class="go">         89 │       95 │  10 │      8</span>
<span class="go">         95 │       99 │  10 │      9</span>
<span class="go">         99 │      100 │  10 │     10</span>
</pre></div>
<p>Dividing values into bins or buckets with roughly the same frequency is called "Equal Height Binning". Notice how each group holds exactly 10 rows, since 100 rows divide evenly into 10 buckets.</p>
<div class="admonition info">
<p class="admonition-title">Try it!</p>
<p><embed loading="lazy" width="100%" height="410" frameborder="0" src="https://app.hex.tech/global/app/649076ce-fe1f-414d-aed2-a2edb6de25cd/latest?embedded=true" /></p>
<p><a style="font-weight:bold;font-size:0.8em;color:var(--stable-color);" href="https://app.hex.tech/global/hex/649076ce-fe1f-414d-aed2-a2edb6de25cd/draft/logic" target="_blank">Try it on Hex »</a></p>
</div>
<p><code>NTILE</code> is a window function. It accepts the number of buckets, in this case 10, and an <code>ORDER BY</code> clause that determines the order in which rows are assigned to buckets. Window functions can't be used directly as a <code>GROUP BY</code> key, so you need either a subquery or a CTE to add the "bucket" field.</p>
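<p>Two details of <code>NTILE</code> are easy to miss. First, rows are assigned by their position in the ordering, so equal values can end up in different buckets: in the output above, the grade 41 appears in both bucket 1 and bucket 2. Second, when the number of rows does not divide evenly by the number of buckets, the buckets cannot all be the same size. A small sketch of the second case, using 10 generated rows and 3 buckets (the alias <code>t</code> is only for illustration):</p>
<div class="highlight"><pre><span></span>SELECT
    bucket,
    count(*) AS cnt
FROM (
    -- 10 rows cannot be split evenly into 3 buckets
    SELECT ntile(3) OVER (ORDER BY n) AS bucket
    FROM generate_series(1, 10) AS n
) AS t
GROUP BY
    bucket
ORDER BY
    bucket;
</pre></div>
<p>The PostgreSQL documentation describes <code>NTILE</code> as dividing the partition "as equally as possible", so the bucket sizes in this result should differ by at most one row.</p>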
<h3 id="equal-width-binning"><a class="toclink" href="#equal-width-binning">Equal Width Binning</a></h3>
<p>So far you divided students into groups based on arbitrary letter grades (custom binning) and into equally sized groups based on their grades (equal height binning). Neither of these grouping techniques gives you a good sense of the data distribution. One way to visualize how grades are distributed is a histogram.</p>
<figure>
<p><svg version="1.1" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 221.66296785886914 125.58217717603782" width="auto" height="10em">
<g><g transform="translate(110.24619182371998 13.264808317175266) rotate(270.07194166149554 1.8692533998441263 100.21933745864263)"><path d="M0.724831211194396 1.127395564690232 C1.2389690872551302 34.29917372859841, 3.5833627579218654 165.41550182736546, 4.092277931956232 198.48223047650313 M-0.3537711322680115 0.6736742908135058 C-0.04106745244102655 34.0553660554049, 2.8343328751108556 166.96402865710735, 3.4126666284852036 199.7650006264717" stroke="#000000" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(11.759117350885731 10.244060571972682) rotate(178.1590126331561 -0.23861488552535093 52.14465767453527)"><path d="M1.127395564690232 0.511303162202239 C1.151814520704397 17.816361622345042, -0.3212942121046353 87.29525099909102, -0.4019247708609782 104.55518197205122 M0.2601395068131387 -0.2658666229806841 C0.13208302649746198 16.58740475937758, -1.5899630395377744 84.95228197403422, -1.6046253357409024 102.53621465323697" stroke="#000000" stroke-width="1" fill="none"></path></g></g><g transform="translate(28.201695548355247 87.59002920525631) rotate(0 8.00916300231495 11.239850083654687)"><path d="M1.6081505920737982 -0.9627700056880713 L14.147381435742513 -0.36528476141393185 L15.357607792725698 20.950893341209653 L-1.0343026611953974 23.65594160428262" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M1.3900309258708285 0.6188545171768298 C2.9891757075923087 0.4510172709253094, 7.966533834352828 0.18821226572367764, 15.757098018801603 -0.10987307151261994 M-0.6092806775766115 0.6607898002845388 C6.083912064654459 -0.3510863053323164, 12.123350124711957 0.48873126908985853, 16.614120047087994 0.27171478372418967 M15.80971592630996 -1.817090580239892 C15.02367005549235 8.776491492449114, 15.201639690025278 14.682747133863705, 17.660702466627495 21.633897254820827 M15.570632175205187 -0.6288401586934924 C15.324853460400245 7.173750372632924, 16.523260782568595 14.51582015963561, 15.874991104600863 21.56074086727566 
M16.4567380692054 21.656151975911374 C10.532876625283977 22.53777849394676, 3.4016111887846803 21.900598035372685, 0.979324835855256 22.18514687271462 M15.66204965278354 21.938718763973025 C11.090482176071918 22.46557067110958, 8.16657227906407 22.721258965186568, -0.6649772515802921 22.486269812818396 M0.9635673556476831 22.839524636622432 C1.1868977362802442 16.406356087571996, 1.821864288537782 8.243746849434647, 0.5253883991390467 1.0234148409217596 M0.456798012368381 22.576555201429787 C-0.5526703944286677 14.633651566251855, 1.1219809005180028 5.5686671281564415, 0.6762169850990176 -0.3299122853204608" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(50.59277716619454 72.98714988927475) rotate(0 8.495925646180979 18.054527097779555)"><path d="M0.09687112830579281 0.06623444892466068 L18.19493533465773 -1.9301943387836218 L16.580923868154585 36.23136028354476 L-0.16282684542238712 36.294398811510305" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M-1.3981130562907007 -0.08037720536161275 C6.945839864241723 0.44943843153437846, 12.316667678583801 1.7086648012003731, 18.383996682749505 0.8766483665025016 M0.43379765569307205 0.7882903826903058 C5.297675060326536 0.23444817278737595, 10.450666049530865 0.2252409452169606, 17.618146486558263 -0.006500318407318573 M16.58897906709224 0.9548140075057745 C19.05674836354543 14.273519518000358, 19.220271739533437 24.975821448783027, 18.72865808177501 37.50958595504473 M16.73389508090675 0.19571684766560793 C16.54137136813022 12.933393185706143, 16.602928333242907 25.111610791141697, 16.930437314135432 35.594911225004594 M17.728091562116447 35.00695241247355 C10.75905532194773 37.47598319465319, 6.300354384408565 37.35719171099226, 1.1050931439395297 35.842258059604085 M16.335613136028797 36.74537418993784 C10.220162432721903 36.85816194988186, 4.475496962501106 35.82537399837923, 0.6772431008430124 36.688073613525084 M1.5563459191471338 34.4829173468561 C0.469634931285792 
24.70433998267639, -1.712335009376592 16.21255728097227, -0.7506112661212683 -0.756426939740777 M-0.1343748616054654 35.700109638138215 C0.04993802937555117 28.7562097772782, 0.5729684594540576 19.80613487521946, 0.807915891520679 0.9079995946958661" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(78.33824786655964 50.59606827143625) rotate(0 8.495925646180979 28.276542618966744)"><path d="M0.11350165866315365 -1.1614346709102392 L17.378851784204542 0.9007417354732752 L16.696164322828352 55.44737888333035 L-1.551443049684167 54.845633933512694" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M1.4755781344635919 1.1898813693704668 C6.8618720108466125 0.6100545902806033, 11.083432571718701 -0.30620220423973266, 17.324410449456302 -0.9380289204300142 M-0.052176859264837505 -0.4368120449337897 C5.222908889458741 -0.6550579665454523, 10.26041683610609 0.4381772847100386, 16.44080040081913 -0.7335725984178283 M18.29258436131984 -0.31402833200991154 C18.14409512020599 21.26861048401575, 17.822888729348293 42.174665056266996, 18.4897928254655 55.6888296810968 M17.788989978653788 0.6815259950235486 C17.12803148172635 22.03539425300498, 16.912208457215012 43.30618317291268, 16.1787828680104 56.92717336603088 M16.35413754174671 55.91043053425294 C11.298841112657989 55.66660158401578, 8.338728736666804 56.439032519146686, -0.6948725106516851 56.27657469286311 M17.678250626634682 57.32451494226338 C10.577592214281978 56.58139319410546, 4.592097774339881 55.73331766719222, -0.17456044210210675 56.60504040941358 M0.5740457568317652 58.31581326407028 C0.5921405980247432 41.103604735436726, -0.570897000108344 23.610197705216258, 0.5856043491512537 -0.2924621198326349 M-0.08575350511819124 57.14563967772407 C-0.4329419078275458 43.02973937909537, 0.5550181029394372 29.95044968379919, -0.5982833849266171 0.2627262072637677" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(103.64990534759454 
26.744698721999157) rotate(0 8.495925646180979 38.98532078402013)"><path d="M0.36738758347928524 -1.8339578714221716 L18.011439872716956 0.5603623185306787 L18.241329086755805 79.51325475584508 L-0.5188114736229181 78.42199831615926" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M1.2913942360371176 0.5855165888017202 C3.0108974080741966 1.4279127111878593, 8.728196986016586 -0.02630583338082848, 18.30258425866926 -0.46645561667910584 M-0.3330136187939161 -0.6090702267600594 C6.991190547571411 -0.38242137373628493, 12.518303265339092 -0.6943055132202747, 16.49147194150487 0.10496687227318746 M18.59009289908915 0.7205983307212591 C16.675016843182085 23.959294836453964, 18.488993786674975 44.783797235131004, 17.05808574128657 79.17372561033608 M16.910437869650714 0.09267230797559023 C16.216628207561115 16.15329080153005, 17.713013930913547 32.05481066990141, 17.519864815098636 77.71139759264679 M15.769299111399672 76.85385271284996 C11.668996489615463 77.80008312311934, 7.687838866979709 76.57856719898881, -1.064329516016484 78.28214152180833 M17.040066374997927 77.47726843769276 C13.432817677236631 78.39791535503423, 9.419384596353225 77.54558366572367, -0.6590472397200453 77.24532260169856 M-0.22344311140477657 76.1482109480703 C1.2381572189585555 55.521262779449955, -1.1286863026364458 35.715349995961446, 0.9169320855289698 1.2287005688995123 M0.5742514384910464 77.66182896934242 C-1.372639985091313 60.54366227148008, -1.3105937850543103 41.62241501484088, 0.9294879389926791 -0.6876968843862414" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(180.80178440036457 80.77535219113224) rotate(0 8.00916300231495 11.239850083654687)"><path d="M0.5873582754284143 0.37131320498883724 L15.678044020792381 -1.7516418192535639 L14.51606881584777 23.022940466757778 L-1.2475053276866674 24.1500274041853" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M0.9122288584760498 -0.178959230098527 C1.9783578302518623 
1.1187266279481234, 6.437765600696937 -0.7769169285946235, 17.25916333931043 0.7343858535054097 M0.7648048698019095 0.45992733751886183 C5.542251071110341 -0.6946318250224225, 12.00076251846197 -0.6449380120453633, 15.664731014915702 0.7444420412078124 M17.037914584984915 0.5603623185306787 C17.309097612632797 6.740601261298085, 15.540808344615982 10.84733289510553, 15.459914456715719 23.39587080350137 M16.3245250383745 0.5335578871890903 C15.564770840760488 6.37687562432125, 15.9986128739083 13.640964960269327, 16.32462901867731 22.032491251472297 M17.607394943136832 23.070525066415573 C10.647009971425337 23.12401233266235, 4.806111025618966 22.51389406591873, 1.1679274401634905 21.404958403884937 M15.98525345317304 23.061671456515576 C9.314181021737353 22.806181457174354, 3.612862002563826 23.108399586957763, 0.2765850244418072 22.240690138796015 M0.688595762476325 21.685750661995176 C-0.6471045149867396 14.618908451621555, 0.28977970942808584 6.116376315100474, -1.0512276384979486 0.1906620655208826 M-0.6905617518350482 22.373377337697807 C-0.4969528395750375 15.740365062904445, 0.04508735946632908 10.438707883020747, 0.6165586011484265 -0.09976396430283785" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(155.4901269193292 66.17247287515022) rotate(0 8.495925646180979 18.054527097779555)"><path d="M0.37131320498883724 -0.3402819838374853 L15.24020947310835 -1.502257188782096 L17.535091591810286 34.86154886787246 L1.6703272368758917 36.475105431726675" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M-0.1898356060646309 -1.548323504258828 C6.0063031557577995 -0.5615764955210516, 8.768533148719207 1.2194830940807546, 17.77086997448709 1.0438948674790498 M0.4878797523632388 -0.26236488771407773 C3.3237841915223343 0.2849596972019504, 7.289009022168007 -0.5373422668217099, 17.781537334232294 -0.5842621596855809 M17.552213610892593 1.249477794393897 C18.203352059308955 10.977205693176272, 17.112095619623133 
22.433622785684232, 17.90802192855388 36.581833817860584 M17.525409179551005 -0.6272274954244494 C16.426453188075193 13.429022645355742, 16.41523526610292 27.551119229312203, 16.544642376524806 36.887477822466295 M17.618583999795575 36.831533804699106 C13.843614627078127 37.22445583652441, 9.310558965933184 37.51652784084845, -1.1400599673474665 37.48233606776918 M17.609192305383946 35.46829723777872 C11.103102683987496 35.88429720125821, 5.247141261106688 36.39543236194436, -0.2535360349582634 36.90159661351422 M-0.7939495053142309 35.37248317231844 C0.7196341272696056 26.913748150563553, 1.0580308553083935 16.266588064721752, 0.1906620655208826 -0.5514352414757013 M-0.10632282961159945 35.7997460140423 C-0.3339735185974795 22.494941172936677, 0.48847440720856605 9.031922716413579, -0.09976396430283785 0.2936791377142072" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(130.66523208215972 47.67549240824019) rotate(0 8.495925646180979 28.276542618966744)"><path d="M-0.3402819838374853 -1.7516418192535639 L15.489594103579819 0.5432402994483709 L15.744345964675247 58.223412474809415 L0.3660512361675501 58.21230628174377" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M-1.548323504258828 1.2332205052347698 C5.097770530316906 -0.7586485917203893, 12.441410123389002 0.7980022427316397, 18.035746159840965 -1.1850445472771463 M-0.26236488771407773 -0.5887266489090943 C7.025991622614826 -0.6335886091674977, 12.782464296799471 0.6921100913977306, 16.407589132676335 0.15606487962850535 M18.24132908675581 1.5426131878048182 C16.76134421820121 12.231483528233195, 16.72174414390998 24.47642419642036, 17.464630914663374 58.098233351675994 M16.364623796937465 -0.8984316335991025 C16.97163083063661 21.272162720647728, 17.47131930696069 43.9164156366011, 17.770274919269085 55.83997967392785 M17.714330901501896 57.29070559042428 C13.382629488178173 56.706180217797254, 9.118642622007343 55.47569854132942, 1.3732818722100544 
57.632076927165436 M16.35109433458151 56.84088370649959 C11.091305856407098 57.28493756316367, 5.919636063526048 56.423017405579664, 0.7925424179550962 56.214957595530926 M-0.7365710232406855 57.53322238680554 C0.005602126925122852 36.82576126231571, -1.2459387127469657 14.652031598397166, -0.5514352414757013 1.6694587264209986 M-0.30930818151682615 55.9378552392183 C0.6232659936603822 44.19934528060631, 1.0070925772842683 31.331405504794617, 0.2936791377142072 0.18565660249441862" stroke="#000000" stroke-width="1" fill="none"></path></g></svg>
<figcaption>Histogram</figcaption></p>
</figure>
<p>To draw a histogram, you need to divide the grades into equal-width ranges. Grades range from 0 to 100, so you can split the range into 10 buckets of width 10:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">grades</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">round</span><span class="p">(</span><span class="mf">70</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">sin</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">30</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">grade</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">100</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">n</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="hll"><span class="w"> </span><span class="n">floor</span><span class="p">((</span><span class="n">grade</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="mf">10</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">bucket</span><span class="p">,</span>
</span><span class="w"> </span><span class="n">min</span><span class="p">(</span><span class="n">grade</span><span class="p">)</span><span class="w"> </span><span class="n">from_grade</span><span class="p">,</span>
<span class="w"> </span><span class="n">max</span><span class="p">(</span><span class="n">grade</span><span class="p">)</span><span class="w"> </span><span class="n">to_grade</span><span class="p">,</span>
<span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">grades</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">bucket</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">bucket</span><span class="p">;</span>
<span class="go"> bucket │ from_grade │ to_grade │ count</span>
<span class="go">────────┼────────────┼──────────┼───────</span>
<span class="go"> 3 │ 40 │ 40 │ 7</span>
<span class="go"> 4 │ 41 │ 50 │ 20</span>
<span class="go"> 5 │ 51 │ 58 │ 12</span>
<span class="go"> 6 │ 61 │ 70 │ 11</span>
<span class="go"> 7 │ 71 │ 79 │ 11</span>
<span class="go"> 8 │ 81 │ 90 │ 14</span>
<span class="go"> 9 │ 92 │ 100 │ 25</span>
</pre></div>
<p>To assign each grade to the right bucket we used a little arithmetic. This worked out nicely because the arithmetic here is fairly simple, but what if you wanted smaller buckets? Say, 20 buckets of width 5? Or 25 buckets of width 4? That would have made the calculation more complicated.</p>
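<p>As a rough illustration of that arithmetic outside the database, here is a minimal Python sketch that generalizes the formula to any bucket width. The helper names <code>bucket_index</code> and <code>bucket_bounds</code> are made up for this example, and the sketch assumes integer grades between 1 and 100:</p>

```python
def bucket_index(grade: int, width: int) -> int:
    """Zero-based bucket of an integer grade, mirroring floor((grade - 1) / width).

    Assumes grade is at least 1; Python and PostgreSQL round negative
    integer division differently, so grade 0 is out of scope here.
    """
    return (grade - 1) // width


def bucket_bounds(index: int, width: int) -> tuple:
    """Inclusive (low, high) grades covered by a zero-based bucket index."""
    return index * width + 1, (index + 1) * width


# The same grade lands in a different bucket for each width.
for width in (10, 5, 4):
    index = bucket_index(87, width)
    print(width, index, bucket_bounds(index, width))
```

<p>The formula itself stays short, but the width has to be repeated in every expression that computes an index or a bound, which is where mistakes tend to creep in.</p>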
<p>To simplify the task of assigning values into equal width buckets within a predefined range, PostgreSQL provides the function <a href="https://www.postgresql.org/docs/current/functions-math.html#FUNCTIONS-MATH-FUNC-TABLE" rel="noopener"><code>width_bucket</code></a>:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">grades</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">round</span><span class="p">(</span><span class="mf">70</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">sin</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">30</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">grade</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">100</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">n</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="hll"><span class="w"> </span><span class="n">width_bucket</span><span class="p">(</span><span class="n">grade</span><span class="p">,</span><span class="w"> </span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="mf">101</span><span class="p">,</span><span class="w"> </span><span class="mf">20</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">bucket</span><span class="p">,</span>
</span><span class="hll"><span class="w"> </span><span class="p">(</span><span class="n">width_bucket</span><span class="p">(</span><span class="n">grade</span><span class="p">,</span><span class="w"> </span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="mf">101</span><span class="p">,</span><span class="w"> </span><span class="mf">20</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">5</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">low_bound</span><span class="p">,</span>
</span><span class="hll"><span class="w"> </span><span class="n">width_bucket</span><span class="p">(</span><span class="n">grade</span><span class="p">,</span><span class="w"> </span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="mf">101</span><span class="p">,</span><span class="w"> </span><span class="mf">20</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">5</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">high_bound</span><span class="p">,</span>
</span><span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">grades</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">bucket</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">bucket</span><span class="p">;</span>
<span class="go">bucket │ low_bound │ high_bound │ count</span>
<span class="go">───────┼───────────┼────────────┼───────</span>
<span class="go"> 8 │ 35 │ 40 │ 7</span>
<span class="go"> 9 │ 40 │ 45 │ 13</span>
<span class="go"> 10 │ 45 │ 50 │ 7</span>
<span class="go"> 11 │ 50 │ 55 │ 8</span>
<span class="go"> 12 │ 55 │ 60 │ 4</span>
<span class="go"> 13 │ 60 │ 65 │ 7</span>
<span class="go"> 14 │ 65 │ 70 │ 4</span>
<span class="go"> 15 │ 70 │ 75 │ 7</span>
<span class="go"> 16 │ 75 │ 80 │ 4</span>
<span class="go"> 17 │ 80 │ 85 │ 6</span>
<span class="go"> 18 │ 85 │ 90 │ 8</span>
<span class="go"> 19 │ 90 │ 95 │ 7</span>
<span class="go"> 20 │ 95 │ 100 │ 18</span>
</pre></div>
<p>The function <code>width_bucket</code> accepts four arguments: the value to assign, the lower and upper bounds of the range, and the number of buckets to divide the range into.</p>
<p>Be careful with the values you set for the lower and upper bounds. The upper bound of the range is exclusive: if you set it to 100, grades equal to 100 are considered out of range and end up in an additional bucket (bucket 21 in this case). This is why the query above uses 101 as the upper bound.</p>
<p>To calculate the upper bound of each bucket, we multiply the index of the bucket by the width. To get the lower bound, we multiply the width by the index minus one, or in other words, we take the upper bound of the previous bucket. Because the range 0 to 101 is split into 20 buckets, each bucket is actually 5.05 wide, so with integer grades the bucket labeled 35 to 40 holds the grades 36 through 40.</p>
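<p>To make the exclusive upper bound concrete, the following Python sketch imitates the documented behavior of <code>width_bucket</code> for an ascending range. It is an approximation written for this explanation, not PostgreSQL's implementation:</p>

```python
import math


def width_bucket(value, low, high, count):
    """Approximate width_bucket(value, low, high, count); assumes high is greater than low."""
    if low > value:
        return 0                # below the range
    if value >= high:
        return count + 1        # the upper bound is exclusive
    # Scale the offset into [0, count) and shift to one-based bucket numbers.
    return math.floor((value - low) * count / (high - low)) + 1


print(width_bucket(100, 0, 100, 20))  # a grade of 100 falls outside 0..100
print(width_bucket(100, 0, 101, 20))  # widening the range keeps it in the last bucket
print(width_bucket(40, 0, 101, 20))   # the lowest grade in the generated data
```

<p>With an upper bound of 100 the grade 100 is assigned to bucket 21, one past the requested 20 buckets, while an upper bound of 101 places it in bucket 20.</p>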
<p>Histograms are great for visualization, but if you try to draw a histogram from the result above you won't be able to get a real sense of the distribution because you might have gaps. Notice, for example, how the result above starts with bucket 8, which is not the first bucket. This is because in our grades table, no one got a grade lower than 40.</p>
<p>The function <code>width_bucket</code> is useful, but we already do most of the hard work ourselves, so we might as well generate the buckets on our own using <code>generate_series</code>:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">bucket</span><span class="p">,</span>
<span class="w"> </span><span class="p">(</span><span class="n">bucket</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">5</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">low_bound</span><span class="p">,</span>
<span class="w"> </span><span class="n">bucket</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">5</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">high_bound</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">20</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">bucket</span><span class="p">;</span>
<span class="go">bucket │ low_bound │ high_bound</span>
<span class="go">───────┼───────────┼────────────</span>
<span class="go"> 1 │ 1 │ 5</span>
<span class="go"> 2 │ 6 │ 10</span>
<span class="go"> 3 │ 11 │ 15</span>
<span class="go"> 4 │ 16 │ 20</span>
<span class="go"> 5 │ 21 │ 25</span>
<span class="go"> 6 │ 26 │ 30</span>
<span class="go"> 7 │ 31 │ 35</span>
<span class="go"> 8 │ 36 │ 40</span>
<span class="go"> 9 │ 41 │ 45</span>
<span class="go"> 10 │ 46 │ 50</span>
<span class="go"> 11 │ 51 │ 55</span>
<span class="go"> 12 │ 56 │ 60</span>
<span class="go"> 13 │ 61 │ 65</span>
<span class="go"> 14 │ 66 │ 70</span>
<span class="go"> 15 │ 71 │ 75</span>
<span class="go"> 16 │ 76 │ 80</span>
<span class="go"> 17 │ 81 │ 85</span>
<span class="go"> 18 │ 86 │ 90</span>
<span class="go"> 19 │ 91 │ 95</span>
<span class="go"> 20 │ 96 │ 100</span>
</pre></div>
<p>The query generates 20 buckets of width 5 that cover the range 1 to 100. To create the histogram, use this table as an axis and join it to the grades:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">grades</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">round</span><span class="p">(</span><span class="mf">70</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">sin</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">30</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">grade</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">100</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">n</span>
<span class="p">),</span>
<span class="n">buckets</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">bucket</span><span class="p">,</span>
<span class="w"> </span><span class="p">(</span><span class="n">bucket</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">5</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">low_bound</span><span class="p">,</span>
<span class="w"> </span><span class="n">bucket</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">5</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">high_bound</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">20</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">bucket</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="n">bucket</span><span class="p">,</span>
<span class="w"> </span><span class="n">low_bound</span><span class="p">,</span>
<span class="w"> </span><span class="n">high_bound</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="n">grade</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">cnt</span>
</span><span class="k">FROM</span>
<span class="hll"><span class="w"> </span><span class="n">buckets</span>
</span><span class="hll"><span class="w"> </span><span class="k">LEFT</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">grades</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">grade</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="n">low_bound</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">high_bound</span>
</span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">bucket</span><span class="p">,</span><span class="w"> </span><span class="n">low_bound</span><span class="p">,</span><span class="w"> </span><span class="n">high_bound</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">bucket</span><span class="p">;</span>
<span class="go">bucket │ low_bound │ high_bound │ cnt</span>
<span class="go">───────┼───────────┼────────────┼─────</span>
<span class="go"> 1 │ 1 │ 5 │ 0</span>
<span class="go"> 2 │ 6 │ 10 │ 0</span>
<span class="go"> 3 │ 11 │ 15 │ 0</span>
<span class="go"> 4 │ 16 │ 20 │ 0</span>
<span class="go"> 5 │ 21 │ 25 │ 0</span>
<span class="go"> 6 │ 26 │ 30 │ 0</span>
<span class="go"> 7 │ 31 │ 35 │ 0</span>
<span class="go"> 8 │ 36 │ 40 │ 7</span>
<span class="go"> 9 │ 41 │ 45 │ 13</span>
<span class="go"> 10 │ 46 │ 50 │ 7</span>
<span class="go"> 11 │ 51 │ 55 │ 8</span>
<span class="go"> 12 │ 56 │ 60 │ 4</span>
<span class="go"> 13 │ 61 │ 65 │ 7</span>
<span class="go"> 14 │ 66 │ 70 │ 4</span>
<span class="go"> 15 │ 71 │ 75 │ 7</span>
<span class="go"> 16 │ 76 │ 80 │ 4</span>
<span class="go"> 17 │ 81 │ 85 │ 6</span>
<span class="go"> 18 │ 86 │ 90 │ 8</span>
<span class="go"> 19 │ 91 │ 95 │ 7</span>
<span class="go"> 20 │ 96 │ 100 │ 18</span>
</pre></div>
<p>To make sure there are no gaps in the data, the query <code>LEFT JOIN</code>s the grades to the generated axis table <code>buckets</code>. As a result, buckets with no matching grades still produce a row, with <code>NULL</code> in <code>grade</code>. <code>COUNT(*)</code> counts rows, so buckets with no grades would return 1. To overcome that, count only rows with grades using <code>COUNT(grade)</code>.</p>
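<p>The difference between the two counts can be sketched in plain Python. The <code>left_join</code> helper and the five sample grades below are hypothetical and stand in for the SQL join and the generated grades:</p>

```python
grades = [40, 41, 44, 96, 100]  # hypothetical sample, not the article's data

# Axis of buckets as (bucket, low_bound, high_bound), like generate_series(1, 20).
buckets = [(b, (b - 1) * 5 + 1, b * 5) for b in range(1, 21)]


def left_join(buckets, grades):
    """Yield (bucket, grade) rows; a bucket without matches yields one row with None."""
    for bucket, low, high in buckets:
        matches = [g for g in grades if g >= low and high >= g]
        for grade in matches or [None]:
            yield bucket, grade


count_star = {b: 0 for b, _, _ in buckets}
count_grade = {b: 0 for b, _, _ in buckets}
for bucket, grade in left_join(buckets, grades):
    count_star[bucket] += 1       # like COUNT(*): every joined row counts
    if grade is not None:
        count_grade[bucket] += 1  # like COUNT(grade): NULLs are skipped

print(count_star[1], count_grade[1])  # an empty bucket
print(count_star[9], count_grade[9])  # a bucket with two grades
```

<p>For the empty bucket the row count is 1 while the grade count is 0, which is the gap between <code>COUNT(*)</code> and <code>COUNT(grade)</code> described above.</p>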
<div class="admonition info">
<p class="admonition-title">Try it!</p>
<p><embed loading="lazy" width="100%" height="700" frameborder="0" src="https://app.hex.tech/global/app/15d28489-1d89-4aac-a3d1-1b20d55cb60b/latest?embedded=true" /></p>
<p><a style="font-weight:bold;font-size:0.8em;color:var(--stable-color);" href="https://app.hex.tech/global/hex/15d28489-1d89-4aac-a3d1-1b20d55cb60b/draft/logic" target="_blank">Try it on Hex »</a></p>
</div>
<p>To finish off with a bang, you can enhance your query with a little ASCII chart to display the histogram straight in the terminal:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">grades</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">round</span><span class="p">(</span><span class="mf">70</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">sin</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">30</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">grade</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">100</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">n</span>
<span class="p">),</span>
<span class="n">buckets</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">bucket</span><span class="p">,</span>
<span class="w"> </span><span class="p">(</span><span class="n">bucket</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">5</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">low_bound</span><span class="p">,</span>
<span class="w"> </span><span class="n">bucket</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">5</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">high_bound</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">20</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">bucket</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="n">low_bound</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="s1">' - '</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">high_bound</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">bounds</span><span class="p">,</span>
<span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="n">grade</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">cnt</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="n">repeat</span><span class="p">(</span><span class="s1">'■'</span><span class="p">,</span><span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="n">grade</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">chart</span>
</span><span class="k">FROM</span>
<span class="w"> </span><span class="n">buckets</span>
<span class="w"> </span><span class="k">LEFT</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">grades</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">grade</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="n">low_bound</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">high_bound</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">bucket</span><span class="p">,</span><span class="w"> </span><span class="n">low_bound</span><span class="p">,</span><span class="w"> </span><span class="n">high_bound</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">bucket</span><span class="p">;</span>
<span class="go"> bounds │ cnt │ chart</span>
<span class="go">─────────┼─────┼────────────────────</span>
<span class="go">1 - 5 │ 0 │</span>
<span class="go">6 - 10 │ 0 │</span>
<span class="go">11 - 15 │ 0 │</span>
<span class="go">16 - 20 │ 0 │</span>
<span class="go">21 - 25 │ 0 │</span>
<span class="go">26 - 30 │ 0 │</span>
<span class="go">31 - 35 │ 0 │</span>
<span class="go">36 - 40 │ 7 │ ■■■■■■■</span>
<span class="go">41 - 45 │ 13 │ ■■■■■■■■■■■■■</span>
<span class="go">46 - 50 │ 7 │ ■■■■■■■</span>
<span class="go">51 - 55 │ 8 │ ■■■■■■■■</span>
<span class="go">56 - 60 │ 4 │ ■■■■</span>
<span class="go">61 - 65 │ 7 │ ■■■■■■■</span>
<span class="go">66 - 70 │ 4 │ ■■■■</span>
<span class="go">71 - 75 │ 7 │ ■■■■■■■</span>
<span class="go">76 - 80 │ 4 │ ■■■■</span>
<span class="go">81 - 85 │ 6 │ ■■■■■■</span>
<span class="go">86 - 90 │ 8 │ ■■■■■■■■</span>
<span class="go">91 - 95 │ 7 │ ■■■■■■■</span>
<span class="go">96 - 100 │ 18 │ ■■■■■■■■■■■■■■■■■■</span>
</pre></div>
<p>And there you have it! A histogram with SQL, right there in your terminal...</p>
<hr>
<h2 id="take-away"><a class="toclink" href="#take-away">Take Away</a></h2>
<p>If there is one thing you take away from this article, it should be this: <strong>use the best tool for the job!</strong> Pandas is great, and SQL is also great. Each has its strengths and weaknesses, and you have a better chance of creating an optimal data pipeline if you know both!</p>Exciting New Features in Django 3.22021-03-03T00:00:00+02:002021-03-03T00:00:00+02:00Haki Benitatag:hakibenita.com,2021-03-03:/django-32-exciting-features<p>Django 3.2 is just around the corner and it's packed with new features. Django versions are usually not that exciting (it's a good thing!), but this time many features were added to the ORM, so I find it especially interesting!</p><hr>
<p>Django 3.2 is just around the corner and it's <a href="https://docs.djangoproject.com/en/dev/releases/3.2/" rel="noopener">packed with new features</a>. Django versions are usually not that exciting (it's a good thing!), but this time many features were added to the ORM, so I find it especially interesting!</p>
<p><strong>This is a list of my favorite features in Django 3.2</strong></p>
<figure>
<p><svg viewBox="0 0 508 268" aria-hidden="true" width="20em" height="auto">
<path d="M305.2 156.6c0 4.6-.5 9-1.6 13.2-2.5-4.4-5.6-8.4-9.2-12-4.6-4.6-10-8.4-16-11.2 2.8-11.2 4.5-22.9 5-34.6 1.8 1.4 3.5 2.9 5 4.5 10.5 10.3 16.8 24.5 16.8 40.1zm-75-10c-6 2.8-11.4 6.6-16 11.2-3.5 3.6-6.6 7.6-9.1 12-1-4.3-1.6-8.7-1.6-13.2 0-15.7 6.3-29.9 16.6-40.1 1.6-1.6 3.3-3.1 5.1-4.5.6 11.8 2.2 23.4 5 34.6z" fill="#2E3B39" fill-rule="nonzero"></path>
<path d="M282.981 152.6c16.125-48.1 6.375-104-29.25-142.6-35.625 38.5-45.25 94.5-29.25 142.6h58.5z" stroke="var(--bg-color)" stroke-width="3.396" fill="#6DDCBD"></path>
<path d="M271 29.7c-4.4-10.6-9.9-20.6-16.6-29.7-6.7 9-12.2 19-16.6 29.7H271z" stroke="var(--bg-color)" stroke-width="3" fill="#2E3B39"></path>
<circle fill="#FFF" cx="254.3" cy="76.8" r="15.5"></circle>
<circle stroke="#FFF" stroke-width="7" fill="#6DDCBD" cx="254.3" cy="76.8" r="12.2"></circle>
<path class="smoke" d="M507.812 234.24c0-2.16-.632-4.32-1.58-6.24-3.318-6.72-11.85-11.52-21.804-11.52-1.106 0-2.212.12-3.318.24-.474-11.52-12.956-20.76-28.282-20.76-3.318 0-6.636.48-9.638 1.32-4.74-6.72-14.062-11.28-24.806-11.28-.79 0-1.58 0-2.37.12-.79 0-1.58-.12-2.37-.12-10.744 0-20.066 4.56-24.806 11.28a35.326 35.326 0 00-9.638-1.32c-15.642 0-28.282 9.6-28.282 21.48 0 1.32.158 2.76.474 3.96a26.09 26.09 0 00-4.424-.36c-8.058 0-15.01 3.12-19.118 7.8-3.476-1.68-7.742-2.76-12.324-2.76-12.008 0-21.804 7.08-22.752 15.96h-.158c-9.322 0-17.38 4.32-20.856 10.44-4.108-3.6-10.27-6-17.222-6h-1.264c-6.794 0-12.956 2.28-17.222 6-3.476-6.12-11.534-10.44-20.856-10.44h-.158c-.948-9-10.744-15.96-22.752-15.96-4.582 0-8.69.96-12.324 2.76-4.108-4.68-11.06-7.8-19.118-7.8-1.422 0-3.002.12-4.424.36.316-1.32.474-2.64.474-3.96 0-11.88-12.64-21.48-28.282-21.48-3.318 0-6.636.48-9.638 1.32-4.74-6.72-14.062-11.28-24.806-11.28-.79 0-1.58 0-2.37.12-.79 0-1.58-.12-2.37-.12-10.744 0-20.066 4.56-24.806 11.28a35.326 35.326 0 00-9.638-1.32c-15.326 0-27.808 9.24-28.282 20.76-1.106-.12-2.212-.24-3.318-.24-9.954 0-18.486 4.8-21.804 11.52-.948 1.92-1.58 4.08-1.58 6.24 0 4.8 2.528 9.12 6.636 12.36-.79 1.44-1.264 3.12-1.264 4.8 0 7.2 7.742 13.08 17.222 13.08h462.15c9.48 0 17.222-5.88 17.222-13.08 0-1.68-.474-3.36-1.264-4.8 4.582-3.24 7.11-7.56 7.11-12.36z" fill="#E6E9EE"></path>
<path fill="#6DDCBD" d="M239 152h30v8h-30z"></path>
<path class="exhaust__line" fill="#E6E9EE" d="M250 172h7v90h-7z"></path>
<path class="flame" d="M250.27 178.834l-5.32-8.93s-2.47-5.7 3.458-6.118h10.26s6.232.266 3.306 6.194l-5.244 8.93s-3.23 4.37-6.46 0v-.076z" fill="#AA2247"></path>
</svg>
<figcaption>Image from the Django welcome page</figcaption></p>
</figure>
<p>A lot of great people worked on this release and none of them is me. I included links to the tickets of each new feature to show my appreciation to the people behind it.</p>
<p><details class="toc-container" open>
<summary>Table of Contents</summary></p>
<div class="toc">
<ul>
<li><a href="#covering-indexes">Covering Indexes</a></li>
<li><a href="#provide-timezone-to-truncdate">Provide Timezone to TruncDate</a></li>
<li><a href="#building-json-objects">Building JSON Objects</a></li>
<li><a href="#loud-signal-receiver">Loud Signal Receiver</a></li>
<li><a href="#queryset-alias">QuerySet Alias</a></li>
<li><a href="#new-admin-decorators">New Admin Decorators</a></li>
<li><a href="#value-expression-detects-type">Value Expression Detects Type</a></li>
<li><a href="#more-mentionable-features">More Mentionable Features</a></li>
<li><a href="#wishlist">Wishlist</a></li>
</ul>
</div>
<p></details></p>
<hr>
<p><details markdown="1"></p>
<p><summary>Set up a local environment with the latest version of Django</summary></p>
<p>To set up an environment with the latest version of Django, start by creating a new directory and a virtual environment:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>mkdir<span class="w"> </span>django32
<span class="gp">$ </span><span class="nb">cd</span><span class="w"> </span>django32
<span class="gp">$ </span>python3.9<span class="w"> </span>-m<span class="w"> </span>venv<span class="w"> </span>venv
<span class="gp">$ </span><span class="nb">source</span><span class="w"> </span>venv/bin/activate
</pre></div>
<p>To install the latest version of Django you can either install using <code>pip</code>, or if it hasn't been released yet, install directly from git:</p>
<div class="highlight"><pre><span></span><span class="gp gp-VirtualEnv">(venv)</span> <span class="gp">$ </span>pip<span class="w"> </span>install<span class="w"> </span>git+https://github.com/django/django@3.2a1
</pre></div>
<p>Start a new project and app:</p>
<div class="highlight"><pre><span></span><span class="gp gp-VirtualEnv">(venv)</span> <span class="gp">$ </span>django-admin<span class="w"> </span>startproject<span class="w"> </span>project
<span class="gp gp-VirtualEnv">(venv)</span> <span class="gp">$ </span>./manage.py<span class="w"> </span>startapp<span class="w"> </span>store
</pre></div>
<p>Add the new app to the list of <code>INSTALLED_APPS</code>, and configure a PostgreSQL database:</p>
<div class="highlight"><pre><span></span><span class="c1"># settings.py</span>
<span class="n">INSTALLED_APPS</span> <span class="o">=</span> <span class="p">[</span>
    <span class="c1"># ...</span>
    <span class="s1">'store'</span><span class="p">,</span>
<span class="p">]</span>
<span class="n">DATABASES</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s1">'default'</span><span class="p">:</span> <span class="p">{</span>
        <span class="s1">'ENGINE'</span><span class="p">:</span> <span class="s1">'django.db.backends.postgresql'</span><span class="p">,</span>
        <span class="s1">'NAME'</span><span class="p">:</span> <span class="s1">'django32'</span><span class="p">,</span>
        <span class="s1">'USER'</span><span class="p">:</span> <span class="s1">'postgres'</span><span class="p">,</span>
    <span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>To try out some of the new features, create a <code>Customer</code> model:</p>
<div class="highlight"><pre><span></span><span class="c1"># store/models.py</span>
<span class="kn">import</span> <span class="nn">datetime</span>
<span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>
<span class="k">class</span> <span class="nc">Customer</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">joined_at</span><span class="p">:</span> <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">()</span>
    <span class="n">name</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">1000</span><span class="p">)</span>
</pre></div>
<p>Finally, create the DB, generate and apply the migrations:</p>
<div class="highlight"><pre><span></span><span class="gp gp-VirtualEnv">(venv)</span> <span class="gp">$ </span>createdb<span class="w"> </span>django32<span class="w"> </span>-O<span class="w"> </span>postgres
<span class="gp gp-VirtualEnv">(venv)</span> <span class="gp">$ </span>./manage.py<span class="w"> </span>makemigrations
<span class="gp gp-VirtualEnv">(venv)</span> <span class="gp">$ </span>./manage.py<span class="w"> </span>migrate
</pre></div>
<p>Great! Now add some random customer data:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">string</span>
<span class="kn">import</span> <span class="nn">datetime</span>
<span class="kn">import</span> <span class="nn">random</span>
<span class="kn">import</span> <span class="nn">pytz</span>
<span class="kn">from</span> <span class="nn">store.models</span> <span class="kn">import</span> <span class="n">Customer</span>
<span class="n">starting_at</span> <span class="o">=</span> <span class="n">pytz</span><span class="o">.</span><span class="n">UTC</span><span class="o">.</span><span class="n">localize</span><span class="p">(</span><span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="p">(</span><span class="mi">2021</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="n">DAY</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">timedelta</span><span class="p">(</span><span class="n">days</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">Customer</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">bulk_create</span><span class="p">((</span><span class="n">Customer</span><span class="p">(</span>
    <span class="n">joined_at</span> <span class="o">=</span> <span class="n">starting_at</span> <span class="o">+</span> <span class="p">(</span><span class="n">DAY</span> <span class="o">*</span> <span class="n">random</span><span class="o">.</span><span class="n">random</span><span class="p">()</span> <span class="o">*</span> <span class="mi">365</span><span class="p">),</span>
    <span class="n">name</span> <span class="o">=</span> <span class="s1">''</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">random</span><span class="o">.</span><span class="n">choices</span><span class="p">(</span><span class="n">string</span><span class="o">.</span><span class="n">ascii_letters</span> <span class="o">+</span> <span class="s1">' '</span> <span class="o">*</span> <span class="mi">10</span><span class="p">,</span> <span class="n">k</span><span class="o">=</span><span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">))),</span>
<span class="p">)</span> <span class="k">for</span> <span class="n">__</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10_000</span><span class="p">)))</span>
</pre></div>
<p>Congratulations! You now have 10K new customers and you are ready to go!</p>
<hr>
<p></details></p>
<h2 id="covering-indexes"><a class="toclink" href="#covering-indexes">Covering Indexes</a></h2>
<p><a href="https://code.djangoproject.com/ticket/30913" rel="noopener">Ticket #30913</a></p>
<p>Covering indexes let you store additional columns in an index. The main benefit of a covering index is that when a query only uses fields that are present in the index, the <a href="https://www.postgresql.org/docs/current/indexes-index-only-scans.html" rel="noopener">database can use an index-only scan</a>, meaning the actual table is not accessed at all. This can make queries faster.</p>
<p>Django 3.2 added support for PostgreSQL covering indexes:</p>
<blockquote>
<p>The new Index.include and UniqueConstraint.include attributes allow creating covering indexes and covering unique constraints on PostgreSQL 11+.</p>
</blockquote>
<p>If for example you are searching for names of customers that have joined during a certain period of time, you can create an index on <code>joined_at</code>, and include the field <code>name</code> in the index:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Customer</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
        <span class="n">indexes</span> <span class="o">=</span> <span class="p">(</span>
            <span class="n">models</span><span class="o">.</span><span class="n">Index</span><span class="p">(</span>
                <span class="n">name</span><span class="o">=</span><span class="s1">'</span><span class="si">%(app_label)s</span><span class="s1">_</span><span class="si">%(class)s</span><span class="s1">_joined_at_ix'</span><span class="p">,</span>
<span class="hll">                <span class="n">fields</span><span class="o">=</span><span class="p">(</span><span class="s1">'joined_at'</span><span class="p">,),</span>
</span><span class="hll">                <span class="n">include</span><span class="o">=</span><span class="p">(</span><span class="s1">'name'</span><span class="p">,),</span>
</span>            <span class="p">),</span>
        <span class="p">)</span>
</pre></div>
<p>The <code>include</code> argument makes this a covering index.</p>
<p>For queries that only use the fields <code>joined_at</code> and <code>name</code>, the database will be able to satisfy queries using just the index:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span>
<span class="gp">... </span> <span class="n">Customer</span><span class="o">.</span><span class="n">objects</span>
<span class="gp">... </span> <span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">joined_at__lt</span><span class="o">=</span><span class="n">pytz</span><span class="o">.</span><span class="n">UTC</span><span class="o">.</span><span class="n">localize</span><span class="p">(</span><span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="p">(</span><span class="mi">2021</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">)))</span>
<span class="gp">... </span> <span class="o">.</span><span class="n">values_list</span><span class="p">(</span><span class="s1">'name'</span><span class="p">)</span>
<span class="gp">... </span> <span class="o">.</span><span class="n">explain</span><span class="p">()</span>
<span class="gp">... </span><span class="p">)</span>
<span class="hll"><span class="go">Index Only Scan using store_customer_joined_at_ix on store_customer</span>
</span><span class="go"> Index Cond: (joined_at < '2021-02-01 00:00:00+00'::timestamp with time zone)</span>
</pre></div>
<p>The query above finds names of customers that joined before February 2021. According to the execution plan, the database was able to satisfy the query using just the index, without even accessing the table. This is called an "index only scan".</p>
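<p>If you don't have PostgreSQL at hand, the same idea can be sketched with the <code>sqlite3</code> module from the standard library. This is only an illustration under a few assumptions: SQLite has no <code>INCLUDE</code> clause, so the extra column is added as a trailing key column instead, and the table and index names are made up for the example. The plan text comes from SQLite, not PostgreSQL:</p>

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, joined_at TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO customer (joined_at, name) VALUES (?, ?)",
    [(f"2021-{month:02d}-01", f"customer {month}") for month in range(1, 13)],
)

# SQLite has no INCLUDE clause, so "name" is appended as a trailing key column.
conn.execute("CREATE INDEX customer_joined_at_ix ON customer (joined_at, name)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM customer WHERE joined_at < ?",
    ("2021-02-01",),
).fetchall()

# The last column of each plan row is a human-readable description of the step.
plan_details = [row[-1] for row in plan]
print(plan_details)
```

<p>When the plan mentions a covering index, the query was answered from the index alone, which is the SQLite counterpart of the "Index Only Scan" shown above.</p>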
<p>Index only scans can be a bit confusing at first. As described in the <a href="https://www.postgresql.org/docs/current/indexes-index-only-scans.html" rel="noopener">official documentation of PostgreSQL</a>, it might take some time before PostgreSQL can actually use <em>just</em> the index:</p>
<blockquote>
<p>But there is an additional requirement for any table scan in PostgreSQL: it must verify that each retrieved row be “visible” to the query's MVCC snapshot [...]. Visibility information is not stored in index entries, only in heap entries; so at first glance it would seem that every row retrieval would require a heap access anyway.</p>
</blockquote>
<p>Another way to check if a table page can be viewed by the current transaction is to check the table's visibility map, which is significantly smaller and faster to access than the table itself. It may take some time for PostgreSQL to update the visibility map, so until then you might see an execution plan like this one:</p>
<div class="highlight"><pre><span></span><span class="n">Bitmap</span><span class="w"> </span><span class="n">Heap</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="n">store_customer</span><span class="w"> </span><span class="p">(</span><span class="k">cost</span><span class="o">=</span><span class="mf">27.07..117.02</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">876</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mf">16</span><span class="p">)</span>
<span class="w"> </span><span class="k">Recheck</span><span class="w"> </span><span class="n">Cond</span><span class="p">:</span><span class="w"> </span><span class="p">(</span><span class="n">joined_at</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="s1">'2021-02-01 00:00:00+00'</span><span class="o">::</span><span class="nb">timestamp</span><span class="w"> </span><span class="nb">with time zone</span><span class="p">)</span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Bitmap</span><span class="w"> </span><span class="k">Index</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="n">store_customer_joined_at_ix</span><span class="w"> </span><span class="p">(</span><span class="k">cost</span><span class="o">=</span><span class="mf">0.00..26.86</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">876</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mf">0</span><span class="p">)</span>
<span class="w"> </span><span class="k">Index</span><span class="w"> </span><span class="n">Cond</span><span class="p">:</span><span class="w"> </span><span class="p">(</span><span class="n">joined_at</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="s1">'2021-02-01 00:00:00+00'</span><span class="o">::</span><span class="nb">timestamp</span><span class="w"> </span><span class="nb">with time zone</span><span class="p">)</span>
</pre></div>
<p>To check if your index can really be used for index only scans, you can speed up the process by manually issuing <a href="https://www.postgresql.org/docs/current/sql-vacuum.html" rel="noopener">vacuum analyze</a> on the table:</p>
<div class="highlight"><pre><span></span><span class="k">VACUUM</span><span class="w"> </span><span class="k">ANALYZE</span><span class="w"> </span><span class="n">store_customer</span><span class="p">;</span>
</pre></div>
<p>Executing <code>VACUUM</code> will also <a href="/postgresql-unused-index-size#index-and-table-bloat">reclaim some unused space</a> and make it available for re-use.</p>
<p><em>UPDATE 2021-03-04</em>: I originally suggested using <code>VACUUM FULL</code> instead of plain <code>VACUUM</code>. <a href="https://twitter.com/Adrien_nayrat/status/1367153917239427078?s=20" rel="noopener">A commenter on Twitter</a> pointed out that plain <code>VACUUM</code> is sufficient for this purpose and is much less intrusive and disruptive, so use that instead!</p>
<p>It is also important to keep in mind that covering indexes are not free. Additional fields in the index will make the index bigger.</p>
<hr>
<h2 id="provide-timezone-to-truncdate"><a class="toclink" href="#provide-timezone-to-truncdate">Provide Timezone to TruncDate</a></h2>
<p><a href="https://code.djangoproject.com/ticket/31948" rel="noopener">Ticket #31948</a></p>
<p>I write a lot about <a href="sql-dos-and-donts">mistakes in SQL</a> and timezones are usually at the top of the list. One of the most dangerous mistakes when working with timestamps is <a href="sql-dos-and-donts#be-aware-of-timezones">truncating without explicitly specifying a timezone</a>, which can lead to incorrect and inconsistent results.</p>
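<p>To see why the timezone matters, here is a minimal sketch using only the standard library. The timestamp is made up, and a fixed UTC-5 offset stands in for New York winter time (an assumption to keep the example free of timezone data; real code would use a named timezone). The same instant falls on different dates depending on the timezone it is truncated in:</p>

```python
import datetime

# Fixed UTC-5 offset standing in for New York winter time (an assumption;
# real code would use a named timezone so daylight saving time is handled).
NEW_YORK_WINTER = datetime.timezone(datetime.timedelta(hours=-5))

# A made-up customer who joined at 03:00 UTC on January 1st.
joined_at = datetime.datetime(2021, 1, 1, 3, 0, tzinfo=datetime.timezone.utc)

# Truncate the same instant to a date in two different timezones.
day_in_utc = joined_at.astimezone(datetime.timezone.utc).date()
day_in_new_york = joined_at.astimezone(NEW_YORK_WINTER).date()

print(day_in_utc)       # 2021-01-01
print(day_in_new_york)  # 2020-12-31
```

<p>A report that counts customers per day would place this customer in a different day, and even a different year, depending on which timezone was used for truncation.</p>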
<p>In Django 3.2 it becomes easier to avoid this mistake:</p>
<blockquote>
<p>The new tzinfo parameter of the TruncDate and TruncTime database functions allows truncating datetimes in a specific timezone.</p>
</blockquote>
<p>In previous Django versions, the timezone was set internally according to the current timezone:</p>
<div class="highlight"><pre><span></span><span class="c1"># django/db/models/functions/datetime.py</span>
<span class="n">tzname</span> <span class="o">=</span> <span class="n">timezone</span><span class="o">.</span><span class="n">get_current_timezone_name</span><span class="p">()</span> <span class="k">if</span> <span class="n">settings</span><span class="o">.</span><span class="n">USE_TZ</span> <span class="k">else</span> <span class="kc">None</span>
</pre></div>
<p>As of Django 3.2, you can explicitly provide a timezone to the <a href="https://docs.djangoproject.com/en/dev/ref/models/database-functions/#django.db.models.functions.TruncDate" rel="noopener"><code>TruncDate</code></a> family of functions:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pytz</span>
<span class="kn">from</span> <span class="nn">django.db.models.functions</span> <span class="kn">import</span> <span class="n">TruncDay</span>
<span class="p">(</span>
    <span class="n">Customer</span><span class="o">.</span><span class="n">objects</span>
<span class="hll">    <span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">joined_at_day</span><span class="o">=</span><span class="n">TruncDay</span><span class="p">(</span><span class="s1">'joined_at'</span><span class="p">,</span> <span class="n">tzinfo</span><span class="o">=</span><span class="n">pytz</span><span class="o">.</span><span class="n">UTC</span><span class="p">))</span>
</span>    <span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'joined_at_day'</span><span class="p">)</span>
<span class="p">)</span>
<span class="c1"># SELECT DATE_TRUNC('day', "store_customer"."joined_at" AT TIME ZONE 'UTC') AS "joined_at_day"</span>
<span class="c1"># FROM "store_customer"</span>
<span class="p">(</span>
    <span class="n">Customer</span><span class="o">.</span><span class="n">objects</span>
<span class="hll">    <span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">joined_at_day</span><span class="o">=</span><span class="n">TruncDay</span><span class="p">(</span><span class="s1">'joined_at'</span><span class="p">,</span> <span class="n">tzinfo</span><span class="o">=</span><span class="n">pytz</span><span class="o">.</span><span class="n">timezone</span><span class="p">(</span><span class="s1">'America/New_York'</span><span class="p">)))</span>
</span>    <span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'joined_at_day'</span><span class="p">)</span>
<span class="p">)</span>
<span class="c1"># SELECT DATE_TRUNC(</span>
<span class="c1"># 'day',</span>
<span class="c1"># "store_customer"."joined_at" AT TIME ZONE 'America/New_York'</span>
<span class="c1"># ) AS "joined_at_day"</span>
<span class="c1"># FROM "store_customer"</span>
</pre></div>
<p>A step in the right direction!</p>
<hr>
<h2 id="building-json-objects"><a class="toclink" href="#building-json-objects">Building JSON Objects</a></h2>
<p><a href="https://code.djangoproject.com/ticket/32179" rel="noopener">Ticket #32179</a></p>
<p>Building JSON objects in PostgreSQL is very handy, especially if you are working with unstructured data.</p>
<p>As of Django 3.2, the function <a href="https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-JSON-CREATION-TABLE" rel="noopener"><code>json_build_object</code> from PostgreSQL</a> that accepts arbitrary key-value pairs was added to the ORM:</p>
<blockquote>
<p>Added the JSONObject database function.</p>
</blockquote>
<p>One interesting use case is serializing objects directly in the DB, bypassing the need to create ORM objects:</p>
<div class="highlight"><pre><span></span><span class="o">>>></span> <span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">F</span>
<span class="o">>>></span> <span class="kn">from</span> <span class="nn">django.db.models.functions</span> <span class="kn">import</span> <span class="n">JSONObject</span>
<span class="o">>>></span> <span class="n">Customer</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">obj</span><span class="o">=</span><span class="n">JSONObject</span><span class="p">(</span>
    <span class="nb">id</span><span class="o">=</span><span class="n">F</span><span class="p">(</span><span class="s1">'id'</span><span class="p">),</span>
    <span class="n">name</span><span class="o">=</span><span class="n">F</span><span class="p">(</span><span class="s1">'name'</span><span class="p">),</span>
    <span class="n">joined_at_day</span><span class="o">=</span><span class="n">TruncDay</span><span class="p">(</span><span class="s1">'joined_at'</span><span class="p">,</span> <span class="n">tzinfo</span><span class="o">=</span><span class="n">pytz</span><span class="o">.</span><span class="n">UTC</span><span class="p">),</span>
<span class="p">))</span><span class="o">.</span><span class="n">values_list</span><span class="p">(</span><span class="s1">'obj'</span><span class="p">)</span><span class="o">.</span><span class="n">first</span><span class="p">()</span>
<span class="p">({</span>
    <span class="s1">'id'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
    <span class="s1">'name'</span><span class="p">:</span> <span class="s1">'Haki Benita'</span><span class="p">,</span>
    <span class="s1">'joined_at_day'</span><span class="p">:</span> <span class="s1">'2021-04-25T00:00:00'</span><span class="p">,</span>
<span class="p">},)</span>
</pre></div>
<p>We already showed how crucial <a href="/django-rest-framework-slow">serialization performance</a> can be, so this is something to consider.</p>
<hr>
<h2 id="loud-signal-receiver"><a class="toclink" href="#loud-signal-receiver">Loud Signal Receiver</a></h2>
<p><a href="https://code.djangoproject.com/ticket/32261" rel="noopener">Ticket #32261</a></p>
<p>A while back I <a href="https://twitter.com/be_haki/status/1335921247306264579?s=20" rel="noopener">tweeted about a mysterious bug</a> I had that went unnoticed for a long time because it happened inside a signal receiver.</p>
<p>When you use <code>send_robust</code> to broadcast signals, if a receiver raises an exception, Django keeps the error and moves on to the next receiver. After all of the receivers processed the signal, Django returns a list with the receivers' return values and exceptions. To check if any of the receivers failed, you need to go over the list and check for instances of <code>Exception</code>. Signals are often used to decouple modules, and handling exceptions from receivers this way defeats that purpose.</p>
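<p>To illustrate what "going over the list" looks like, here is a minimal sketch. <code>send_robust</code> returns a list of <code>(receiver, response)</code> pairs, where the response is the exception instance if the receiver raised one. The helper <code>failed_receivers</code> and the two receivers are hypothetical names made up for this example, and the list is built by hand so the snippet runs without Django:</p>

```python
def failed_receivers(responses):
    """Return the (receiver, exception) pairs from a send_robust() result."""
    return [
        (receiver, response)
        for receiver, response in responses
        if isinstance(response, Exception)
    ]


# Hypothetical receivers, used only as labels in the hand-built result below.
def ok_receiver(**kwargs):
    return "done"


def broken_receiver(**kwargs):
    raise ValueError("boom")


# Hand-built stand-in for the list send_robust() returns (an assumption:
# in a real project this would be `responses = some_signal.send_robust(...)`).
error = ValueError("boom")
responses = [(ok_receiver, "done"), (broken_receiver, error)]

failures = failed_receivers(responses)
print(failures)
```

<p>Every caller of <code>send_robust</code> would have to remember to do something like this, which is exactly the kind of coupling signals are supposed to avoid.</p>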
<p>To make sure I don't miss exceptions in signal receivers again, I created a "loud signal receiver" that logs exceptions:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.dispatch</span> <span class="kn">import</span> <span class="n">receiver</span>
<span class="k">def</span> <span class="nf">loud_receiver</span><span class="p">(</span><span class="n">signal</span><span class="p">,</span> <span class="n">logger</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="w">    </span><span class="sd">"""Subscribe to Django signal and log errors from the receiver function.</span>
<span class="sd">    When using `send_robust` to send Django Signals, errors happening in the</span>
<span class="sd">    receivers are kept and returned. Because signals are mostly used for decoupling</span>
<span class="sd">    modules, the return value from `send_robust` is often dismissed.</span>
<span class="sd">    To make it easier not to miss errors from Django signal receivers, use this decorator</span>
<span class="sd">    instead to log the exceptions to a specific logger.</span>
<span class="sd">    NOTE: Not necessary as of Django 3.2</span>
<span class="sd">    Example:</span>
<span class="sd">        logger = logging.getLogger('some.logger')</span>
<span class="sd">        @loud_receiver(signals.SomeSignal, logger=logger, dispatch_uid='uid')</span>
<span class="sd">        def receiver_func(sender, **kwargs):</span>
<span class="sd">            pass</span>
<span class="sd">    """</span>
    <span class="k">def</span> <span class="nf">_decorator</span><span class="p">(</span><span class="n">func</span><span class="p">):</span>
        <span class="k">def</span> <span class="nf">loud_func</span><span class="p">(</span><span class="o">*</span><span class="n">func_args</span><span class="p">,</span> <span class="o">**</span><span class="n">func_kwargs</span><span class="p">):</span>
            <span class="k">try</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">func</span><span class="p">(</span><span class="o">*</span><span class="n">func_args</span><span class="p">,</span> <span class="o">**</span><span class="n">func_kwargs</span><span class="p">)</span>
            <span class="k">except</span> <span class="ne">Exception</span><span class="p">:</span>
                <span class="n">logger</span><span class="o">.</span><span class="n">exception</span><span class="p">(</span><span class="s1">'exception from signal receiver'</span><span class="p">)</span>
                <span class="k">raise</span>
        <span class="k">return</span> <span class="n">receiver</span><span class="p">(</span><span class="n">signal</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)(</span><span class="n">loud_func</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">_decorator</span>
</pre></div>
<p>As of Django 3.2 this is no longer necessary:</p>
<blockquote>
<p>Signal.send_robust() now logs exceptions.</p>
</blockquote>
<p>Great!</p>
<hr>
<h2 id="queryset-alias"><a class="toclink" href="#queryset-alias">QuerySet Alias</a></h2>
<p><a href="https://code.djangoproject.com/ticket/27719" rel="noopener">Ticket #27719</a></p>
<p>The <code>alias</code> function is an entirely new feature in Django 3.2:</p>
<blockquote>
<p>The new QuerySet.alias() method allows creating reusable aliases for expressions that don’t need to be selected but are used for filtering, ordering, or as a part of complex expressions.</p>
</blockquote>
<p>I often use <code>Subquery</code> and <code>OuterRef</code> to write complex queries, and there is a little gotcha when combined with <code>annotate</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Subquery</span><span class="p">,</span> <span class="n">OuterRef</span>
<span class="n">Customer</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">annotate</span><span class="p">(</span>
    <span class="n">id_of_previous_customer</span><span class="o">=</span><span class="n">Subquery</span><span class="p">(</span>
        <span class="n">Customer</span><span class="o">.</span><span class="n">objects</span>
        <span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">joined_at__lt</span><span class="o">=</span><span class="n">OuterRef</span><span class="p">(</span><span class="s1">'joined_at'</span><span class="p">))</span>
        <span class="o">.</span><span class="n">order_by</span><span class="p">(</span><span class="s1">'-joined_at'</span><span class="p">)</span>
        <span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'id'</span><span class="p">)[:</span><span class="mi">1</span><span class="p">],</span>
    <span class="p">)</span>
<span class="p">)</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">id_of_previous_customer__isnull</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</pre></div>
<p>The query above is a convoluted way to find the first customer that joined. The queryset uses <code>Subquery</code> to find, for every customer, the previous customer by <code>joined_at</code>, and then keeps only the customer that has no previous customer. This is very inefficient, but I'm using it to illustrate my point.</p>
<p>To understand the problem, inspect the query this queryset is producing:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="ss">"store_customer"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span>
<span class="w"> </span><span class="ss">"store_customer"</span><span class="p">.</span><span class="ss">"joined_at"</span><span class="p">,</span>
<span class="w"> </span><span class="ss">"store_customer"</span><span class="p">.</span><span class="ss">"name"</span><span class="p">,</span>
<span class="w"> </span><span class="p">(</span>
<span class="hll"><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">U0</span><span class="p">.</span><span class="ss">"id"</span>
</span><span class="hll"><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="ss">"store_customer"</span><span class="w"> </span><span class="n">U0</span>
</span><span class="hll"><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">U0</span><span class="p">.</span><span class="ss">"joined_at"</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="ss">"store_customer"</span><span class="p">.</span><span class="ss">"joined_at"</span>
</span><span class="hll"><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">U0</span><span class="p">.</span><span class="ss">"joined_at"</span>
</span><span class="hll"><span class="w"> </span><span class="k">DESC</span><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mi">1</span>
</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="ss">"id_of_previous_customer"</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="ss">"store_customer"</span>
<span class="k">WHERE</span>
<span class="w"> </span><span class="p">(</span>
<span class="hll"><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">U0</span><span class="p">.</span><span class="ss">"id"</span>
</span><span class="hll"><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="ss">"store_customer"</span><span class="w"> </span><span class="n">U0</span>
</span><span class="hll"><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">U0</span><span class="p">.</span><span class="ss">"joined_at"</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="ss">"store_customer"</span><span class="p">.</span><span class="ss">"joined_at"</span>
</span><span class="hll"><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">U0</span><span class="p">.</span><span class="ss">"joined_at"</span><span class="w"> </span><span class="k">DESC</span>
</span><span class="hll"><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mi">1</span>
</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">IS</span><span class="w"> </span><span class="k">NULL</span>
</pre></div>
<p>The annotated subquery appears in both the <code>SELECT</code> and the <code>WHERE</code> clauses. This affects the execution plan:</p>
<div class="highlight"><pre><span></span><span class="n">Seq</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="n">store_customer</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">4401</span><span class="p">.</span><span class="mi">05</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mi">50</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">32</span><span class="p">)</span>
<span class="w"> </span><span class="n">Filter</span><span class="p">:</span><span class="w"> </span><span class="p">((</span><span class="n">SubPlan</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="k">IS</span><span class="w"> </span><span class="k">NULL</span><span class="p">)</span>
<span class="hll"><span class="w"> </span><span class="n">SubPlan</span><span class="w"> </span><span class="mi">1</span>
</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="k">Limit</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">29</span><span class="p">..</span><span class="mi">0</span><span class="p">.</span><span class="mi">42</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mi">1</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">12</span><span class="p">)</span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="k">Index</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="k">Backward</span><span class="w"> </span><span class="k">using</span><span class="w"> </span><span class="n">store_customer_joined_at_ix</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="n">store_customer</span><span class="w"> </span><span class="n">u0</span>
<span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">29</span><span class="p">..</span><span class="mi">450</span><span class="p">.</span><span class="mi">59</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mi">3333</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">12</span><span class="p">)</span>
<span class="w"> </span><span class="k">Index</span><span class="w"> </span><span class="n">Cond</span><span class="p">:</span><span class="w"> </span><span class="p">(</span><span class="n">joined_at</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">store_customer</span><span class="p">.</span><span class="n">joined_at</span><span class="p">)</span>
<span class="hll"><span class="w"> </span><span class="n">SubPlan</span><span class="w"> </span><span class="mi">2</span>
</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="k">Limit</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">29</span><span class="p">..</span><span class="mi">0</span><span class="p">.</span><span class="mi">42</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mi">1</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">12</span><span class="p">)</span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="k">Index</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="k">Backward</span><span class="w"> </span><span class="k">using</span><span class="w"> </span><span class="n">store_customer_joined_at_ix</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="n">store_customer</span><span class="w"> </span><span class="n">u0_1</span>
<span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">29</span><span class="p">..</span><span class="mi">450</span><span class="p">.</span><span class="mi">59</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mi">3333</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">12</span><span class="p">)</span>
<span class="w"> </span><span class="k">Index</span><span class="w"> </span><span class="n">Cond</span><span class="p">:</span><span class="w"> </span><span class="p">(</span><span class="n">joined_at</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">store_customer</span><span class="p">.</span><span class="n">joined_at</span><span class="p">)</span>
</pre></div>
<p>The subquery is executed twice!</p>
<p>To solve this in Django versions prior to 3.2, you can add a <code>values</code> call that excludes the annotated subquery from the <code>SELECT</code> clause:</p>
<div class="highlight"><pre><span></span><span class="c1"># Django 3.1</span>
<span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Subquery</span><span class="p">,</span> <span class="n">OuterRef</span>
<span class="n">Customer</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">annotate</span><span class="p">(</span>
    <span class="n">id_of_previous_customer</span><span class="o">=</span><span class="n">Subquery</span><span class="p">(</span>
        <span class="n">Customer</span><span class="o">.</span><span class="n">objects</span>
        <span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">joined_at__lt</span><span class="o">=</span><span class="n">OuterRef</span><span class="p">(</span><span class="s1">'joined_at'</span><span class="p">))</span>
        <span class="o">.</span><span class="n">order_by</span><span class="p">(</span><span class="s1">'-joined_at'</span><span class="p">)</span>
        <span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'id'</span><span class="p">)[:</span><span class="mi">1</span><span class="p">],</span>
    <span class="p">)</span>
<span class="hll"><span class="p">)</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">id_of_previous_customer__isnull</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span><span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'id'</span><span class="p">)</span>
</span></pre></div>
<p><em>Side note</em>: You might think that instead of using <code>values</code> in this case you could omit the annotated field using <code>.defer('id_of_previous_customer')</code>. This won't work. Django will throw a <code>KeyError: 'id_of_previous_customer'</code> at you!</p>
<p>Starting with Django 3.2, you can replace <code>annotate</code> with <code>alias</code> and the field will not be added to the select clause:</p>
<div class="highlight"><pre><span></span><span class="c1"># Django 3.2</span>
<span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Subquery</span><span class="p">,</span> <span class="n">OuterRef</span>
<span class="hll"><span class="n">Customer</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">alias</span><span class="p">(</span>
</span> <span class="n">id_of_previous_customer</span><span class="o">=</span><span class="n">Subquery</span><span class="p">(</span>
        <span class="n">Customer</span><span class="o">.</span><span class="n">objects</span>
        <span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">joined_at__lt</span><span class="o">=</span><span class="n">OuterRef</span><span class="p">(</span><span class="s1">'joined_at'</span><span class="p">))</span>
        <span class="o">.</span><span class="n">order_by</span><span class="p">(</span><span class="s1">'-joined_at'</span><span class="p">)</span>
        <span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'id'</span><span class="p">)[:</span><span class="mi">1</span><span class="p">],</span>
    <span class="p">)</span>
<span class="p">)</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">id_of_previous_customer__isnull</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</pre></div>
<p>The generated SQL now uses the subquery only once:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="ss">"store_customer"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span><span class="w"> </span><span class="ss">"store_customer"</span><span class="p">.</span><span class="ss">"joined_at"</span><span class="p">,</span><span class="w"> </span><span class="ss">"store_customer"</span><span class="p">.</span><span class="ss">"name"</span>
<span class="k">FROM</span><span class="w"> </span><span class="ss">"store_customer"</span>
<span class="k">WHERE</span><span class="w"> </span><span class="p">(</span>
<span class="hll"><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">U0</span><span class="p">.</span><span class="ss">"id"</span>
</span><span class="hll"><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="ss">"store_customer"</span><span class="w"> </span><span class="n">U0</span>
</span><span class="hll"><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">U0</span><span class="p">.</span><span class="ss">"joined_at"</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="ss">"store_customer"</span><span class="p">.</span><span class="ss">"joined_at"</span>
</span><span class="hll"><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">U0</span><span class="p">.</span><span class="ss">"joined_at"</span><span class="w"> </span><span class="k">DESC</span>
</span><span class="hll"><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mi">1</span>
</span><span class="p">)</span><span class="w"> </span><span class="k">IS</span><span class="w"> </span><span class="k">NULL</span>
</pre></div>
<p>The execution plan is simpler:</p>
<div class="highlight"><pre><span></span><span class="n">Seq</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="n">store_customer</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">4380</span><span class="p">.</span><span class="mi">04</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mi">50</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">28</span><span class="p">)</span>
<span class="w"> </span><span class="n">Filter</span><span class="p">:</span><span class="w"> </span><span class="p">((</span><span class="n">SubPlan</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="k">IS</span><span class="w"> </span><span class="k">NULL</span><span class="p">)</span>
<span class="w"> </span><span class="n">SubPlan</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="k">Limit</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">29</span><span class="p">..</span><span class="mi">0</span><span class="p">.</span><span class="mi">42</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mi">1</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">12</span><span class="p">)</span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="k">Index</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="k">Backward</span><span class="w"> </span><span class="k">using</span><span class="w"> </span><span class="n">store_customer_joined_at_ix</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="n">store_customer</span><span class="w"> </span><span class="n">u0</span>
<span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">29</span><span class="p">..</span><span class="mi">450</span><span class="p">.</span><span class="mi">59</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mi">3333</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">12</span><span class="p">)</span>
<span class="w"> </span><span class="k">Index</span><span class="w"> </span><span class="n">Cond</span><span class="p">:</span><span class="w"> </span><span class="p">(</span><span class="n">joined_at</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">store_customer</span><span class="p">.</span><span class="n">joined_at</span><span class="p">)</span>
</pre></div>
<p>One less way to shoot yourself in the foot!</p>
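<p>If you want to convince yourself that dropping the subquery from the <code>SELECT</code> clause does not change which rows match, you can run both statements side by side. The following is a minimal sketch using Python's built-in <code>sqlite3</code> module rather than Django. The table contents and customer names are made up for illustration, and the SQL is a trimmed version of the queries shown above:</p>

```python
import sqlite3

# In-memory database with a hypothetical store_customer table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store_customer (
        id INTEGER PRIMARY KEY,
        joined_at TEXT NOT NULL,
        name TEXT NOT NULL
    );
    CREATE INDEX store_customer_joined_at_ix ON store_customer (joined_at);
    INSERT INTO store_customer (id, joined_at, name) VALUES
        (1, '2021-03-01', 'Bob'),
        (2, '2021-01-15', 'Alice'),
        (3, '2021-02-10', 'Carol');
""")

# The correlated subquery: the previous customer by joined_at.
PREVIOUS_CUSTOMER = """
    SELECT u0.id
    FROM store_customer u0
    WHERE u0.joined_at < store_customer.joined_at
    ORDER BY u0.joined_at DESC
    LIMIT 1
"""

# Shape produced by annotate(): the subquery is selected AND filtered on.
annotate_sql = f"""
    SELECT id, joined_at, name, ({PREVIOUS_CUSTOMER}) AS id_of_previous_customer
    FROM store_customer
    WHERE ({PREVIOUS_CUSTOMER}) IS NULL
"""

# Shape produced by alias(): the subquery is only filtered on.
alias_sql = f"""
    SELECT id, joined_at, name
    FROM store_customer
    WHERE ({PREVIOUS_CUSTOMER}) IS NULL
"""

annotate_rows = conn.execute(annotate_sql).fetchall()
alias_rows = conn.execute(alias_sql).fetchall()
print(annotate_rows)
print(alias_rows)
```

<p>Both statements return the same customer. The only difference is the extra column in the first result, which is always <code>NULL</code> here because of the filter.</p>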
<hr>
<h2 id="new-admin-decorators"><a class="toclink" href="#new-admin-decorators">New Admin Decorators</a></h2>
<p><a href="https://code.djangoproject.com/ticket/16117" rel="noopener">Ticket #16117</a></p>
<p>Before Django 3.2, to <a href="/things-you-must-know-about-django-admin-as-your-app-gets-bigger">customize a calculated field in Django admin</a> you first added a function, and then assigned some attributes to it:</p>
<div class="highlight"><pre><span></span><span class="c1"># Django 3.1</span>
<span class="kn">from</span> <span class="nn">django.contrib</span> <span class="kn">import</span> <span class="n">admin</span>
<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">Customer</span>

<span class="nd">@admin</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">Customer</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">CustomerAdmin</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
    <span class="n">list_display</span> <span class="o">=</span> <span class="p">(</span>
        <span class="s1">'id'</span><span class="p">,</span>
        <span class="s1">'joined_at'</span><span class="p">,</span>
        <span class="s1">'joined_at_year'</span><span class="p">,</span>
        <span class="s1">'name'</span><span class="p">,</span>
    <span class="p">)</span>

    <span class="k">def</span> <span class="nf">joined_at_year</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">obj</span><span class="p">:</span> <span class="n">Customer</span><span class="p">)</span> <span class="o">-></span> <span class="nb">int</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">obj</span><span class="o">.</span><span class="n">joined_at</span><span class="o">.</span><span class="n">year</span>
<span class="hll">    <span class="n">joined_at_year</span><span class="o">.</span><span class="n">admin_order_field</span> <span class="o">=</span> <span class="s1">'joined_at__year'</span>
</span><span class="hll">    <span class="n">joined_at_year</span><span class="o">.</span><span class="n">short_description</span> <span class="o">=</span> <span class="s1">'Year joined'</span>
</span></pre></div>
<p>This is the kind of weird API that is mostly only possible in dynamic languages such as Python.</p>
<p>If you are using Mypy (<a href="/python-mypy-exhaustive-checking">and you should</a>), this code will trigger an annoying warning, and the only way to silence it is to add a <code>type: ignore</code> comment:</p>
<div class="highlight"><pre><span></span><span class="o">...</span>
<span class="n">joined_at_year</span><span class="o">.</span><span class="n">admin_order_field</span> <span class="o">=</span> <span class="s1">'joined_at__year'</span> <span class="c1"># type: ignore[attr-defined]</span>
<span class="n">joined_at_year</span><span class="o">.</span><span class="n">short_description</span> <span class="o">=</span> <span class="s1">'Year joined'</span> <span class="c1"># type: ignore[attr-defined]</span>
</pre></div>
<p>If you are using <a href="tag/django-admin">Django Admin</a> and <a href="python-mypy-exhaustive-checking">Mypy</a> as much as I do, this can be pretty annoying.</p>
<p>The new <a href="https://docs.djangoproject.com/en/dev/ref/contrib/admin/#the-display-decorator" rel="noopener"><code>display</code> decorator</a> solves this problem:</p>
<blockquote>
<p>The new display() decorator allows for easily adding options to custom display functions that can be used with list_display or readonly_fields.
Likewise, the new action() decorator allows for easily adding options to action functions that can be used with actions.</p>
</blockquote>
<p>Adjusting the code to use the new <code>display</code> decorator:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="nd">@admin</span><span class="o">.</span><span class="n">display</span><span class="p">(</span><span class="n">ordering</span><span class="o">=</span><span class="s1">'joined_at__year'</span><span class="p">,</span> <span class="n">description</span><span class="o">=</span><span class="s1">'Year joined'</span><span class="p">)</span>
</span><span class="k">def</span> <span class="nf">joined_at_year</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">obj</span><span class="p">:</span> <span class="n">Customer</span><span class="p">)</span> <span class="o">-></span> <span class="nb">int</span><span class="p">:</span>
    <span class="k">return</span> <span class="n">obj</span><span class="o">.</span><span class="n">joined_at</span><span class="o">.</span><span class="n">year</span>
</pre></div>
<p>No type errors!</p>
<p>Another useful decorator is <a href="https://docs.djangoproject.com/en/dev/ref/contrib/admin/actions/#django.contrib.admin.action" rel="noopener"><code>action</code></a>, which takes a similar approach to configuring <a href="https://docs.djangoproject.com/en/dev/ref/contrib/admin/#django.contrib.admin.ModelAdmin.actions" rel="noopener">custom admin actions</a>.</p>
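<p>To see why a decorator makes the type checker happy, here is a simplified stand-in for <code>display</code>. This is not Django's actual code, just a sketch of the idea: the dynamic attribute assignment moves into one place, and the decorated function stays clean. The attribute names are the ones from the Django 3.1 example above, and the customer object is a made-up placeholder:</p>

```python
import datetime
from types import SimpleNamespace
from typing import Any, Callable, Optional


def display(
    *,
    ordering: Optional[str] = None,
    description: Optional[str] = None,
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Simplified stand-in for admin.display (not Django's implementation)."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        # The dynamic attributes are set here, once, so the decorated
        # functions no longer need "type: ignore" comments.
        if ordering is not None:
            setattr(func, "admin_order_field", ordering)
        if description is not None:
            setattr(func, "short_description", description)
        return func

    return decorator


@display(ordering="joined_at__year", description="Year joined")
def joined_at_year(obj: Any) -> int:
    return obj.joined_at.year


# A hypothetical customer, standing in for a model instance.
customer = SimpleNamespace(joined_at=datetime.date(2021, 4, 6))
year = joined_at_year(customer)
print(year, getattr(joined_at_year, "short_description"))
```

<p>The function behaves exactly as before, and the options are still available as attributes for whoever needs to read them.</p>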
<hr>
<h2 id="value-expression-detects-type"><a class="toclink" href="#value-expression-detects-type">Value Expression Detects Type</a></h2>
<p><a href="https://code.djangoproject.com/ticket/30446" rel="noopener">Ticket #30446</a></p>
<p>This is a small feature that addresses a small nuisance in the ORM:</p>
<blockquote>
<p>Value() expression now automatically resolves its output_field to the appropriate Field subclass based on the type of its provided value for bool, bytes, float, int, str, datetime.date, datetime.datetime, datetime.time, datetime.timedelta, decimal.Decimal, and uuid.UUID instances. As a consequence, resolving an output_field for database functions and combined expressions may now crash with mixed types when using Value(). You will need to explicitly set the output_field in such cases.</p>
</blockquote>
<p>In previous Django versions, if you wanted to use a constant value in a query you had to explicitly set an <code>output_field</code>, otherwise the query would fail:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="c1"># Django 3.1</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Value</span>
<span class="gp">>>> </span><span class="n">Customer</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">annotate</span><span class="p">(</span>
<span class="go"> number=Value(1),</span>
<span class="go"> text=Value('text'),</span>
<span class="go"> boolean=Value(True),</span>
<span class="go"> date_=Value(datetime.date(2020, 1, 1)),</span>
<span class="go"> datetime_=Value(pytz.UTC.localize(datetime.datetime(2020, 1, 1))),</span>
<span class="go">).values_list('number', 'text', 'boolean', 'date_', 'datetime_').first()</span>
<span class="hll"><span class="go">FieldError: Cannot resolve expression type, unknown output_field</span>
</span></pre></div>
<p>In Django 3.2, the ORM figures it out on its own:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="c1"># Django 3.2</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Value</span>
<span class="gp">>>> </span><span class="n">Customer</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">annotate</span><span class="p">(</span>
<span class="go"> number=Value(1),</span>
<span class="go"> text=Value('text'),</span>
<span class="go"> boolean=Value(True),</span>
<span class="go"> date_=Value(datetime.date(2020, 1, 1)),</span>
<span class="go"> datetime_=Value(pytz.UTC.localize(datetime.datetime(2020, 1, 1))),</span>
<span class="go">).values_list('number', 'text', 'boolean', 'date_', 'datetime_').first()</span>
<span class="hll"><span class="go">(1, 'text', True, datetime.date(2020, 1, 1), datetime.datetime(2020, 1, 1, 0, 0, tzinfo=<UTC>))</span>
</span></pre></div>
<p>Very cool!</p>
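<p>Resolving a field from a Python value is essentially a type lookup. The sketch below is not Django's code, just an illustration of the idea using the types listed in the release note. The field names on the right are my assumption of the natural mapping, and <code>guess_output_field</code> is a made-up helper:</p>

```python
import datetime
import decimal
import uuid

# Order matters: bool is a subclass of int, and datetime.datetime is a
# subclass of datetime.date, so the more specific type must come first.
FIELD_NAME_BY_TYPE = (
    (bool, "BooleanField"),
    (int, "IntegerField"),
    (float, "FloatField"),
    (str, "CharField"),
    (bytes, "BinaryField"),
    (datetime.datetime, "DateTimeField"),
    (datetime.date, "DateField"),
    (datetime.time, "TimeField"),
    (datetime.timedelta, "DurationField"),
    (decimal.Decimal, "DecimalField"),
    (uuid.UUID, "UUIDField"),
)


def guess_output_field(value: object) -> str:
    """Return the name of the field a constant value would map to."""
    for python_type, field_name in FIELD_NAME_BY_TYPE:
        if isinstance(value, python_type):
            return field_name
    # Unknown types still need an explicit output_field.
    raise TypeError(f"Cannot resolve output field for {type(value).__name__}")


print(guess_output_field(1))
print(guess_output_field(True))
print(guess_output_field(datetime.datetime(2020, 1, 1)))
```

<p>The ordering of the checks is the subtle part: test for <code>int</code> before <code>bool</code> and every boolean turns into an integer.</p>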
<hr>
<h2 id="more-mentionable-features"><a class="toclink" href="#more-mentionable-features">More Mentionable Features</a></h2>
<p>There are plenty more features in Django 3.2 that the documentation explains better than I can. To name a few:</p>
<ul>
<li>
<p><strong>Navigable links in the admin</strong> (<a href="https://code.djangoproject.com/ticket/31181" rel="noopener">Ticket #31181</a>): Read-only related fields are now rendered as navigable links if target models are registered in the admin. I'm still using a <a href="things-you-must-know-about-django-admin-as-your-app-gets-bigger#admin_link">decorator to add links to Django Admin</a>, but I guess I'll have less use for it now.</p>
</li>
<li>
<p><strong><a href="https://docs.djangoproject.com/en/dev/topics/db/transactions/#controlling-transactions-explicitly" rel="noopener">Durable argument for <code>atomic()</code></a></strong> (<a href="https://code.djangoproject.com/ticket/32220" rel="noopener">Ticket #32220</a>): When you execute code inside a database transaction and it finishes without errors, you expect the changes to be committed to the database. However, if the caller ran your code inside a transaction of their own and that outer transaction is rolled back, your changes are rolled back with it. To prevent this from happening, you can now mark your transaction as <code>durable</code>. An attempt to open a durable transaction inside another transaction raises a <code>RuntimeError</code>.</p>
</li>
<li>
<p><strong>Cached templates are reloaded on Django's development server</strong> (<a href="https://code.djangoproject.com/ticket/25791" rel="noopener">Ticket #25791</a>): If you are using Django's <code>runserver</code> command to develop locally, you probably got used to it reloading when a Python file changes. However, if you are using Django's <a href="https://docs.djangoproject.com/en/dev/ref/templates/api/#django.template.loaders.cached.Loader" rel="noopener"><code>django.template.loaders.cached.Loader</code> loader</a>, the dev server does not pick up changes to HTML files, and you have to restart it to see them. This is pretty annoying, and so far I had to disable the cached loader in dev. Starting with Django 3.2 this is no longer necessary because cached templates are correctly reloaded in development.</p>
</li>
<li>
<p><strong>Support for <a href="https://docs.djangoproject.com/en/dev/releases/3.2/#functional-indexes" rel="noopener">function based indexes</a></strong> (<a href="https://code.djangoproject.com/ticket/26167" rel="noopener">Ticket #26167</a>): Function based indexes are useful when you often query an expression and want to index it. A classic example is indexing lowercased text.</p>
</li>
</ul>
<hr>
<h2 id="wishlist"><a class="toclink" href="#wishlist">Wishlist</a></h2>
<p>Django ORM is comprehensive and feature rich, but there are still some things on my wishlist for future versions:</p>
<ul>
<li>
<p><strong>Custom joins</strong>: Django is currently able to perform joins only between tables that are connected via <code>ForeignKey</code>. There are situations where you want to join tables that are not necessarily connected with a foreign key, or using more complex conditions. One common example is <a href="https://en.wikipedia.org/wiki/Slowly_changing_dimension" rel="noopener">slowly changing dimensions</a>, where a join condition requires a <code>BETWEEN</code> operator.</p>
</li>
<li>
<p><strong>Update returning</strong>: When updating many rows it's sometimes useful to fetch them immediately. This is a well known (<a href="sql-tricks-application-dba#implement-complete-processes-using-with-and-returning">and a very useful</a>) feature in SQL. Django currently <a href="https://code.djangoproject.com/ticket/28682" rel="noopener">has no support for it</a>, but I hear <a href="https://groups.google.com/g/django-developers/c/qQ5DT91nBLM" rel="noopener">it might soon</a>.</p>
</li>
<li>
<p><strong>Database views</strong>: There are many hacks for getting database views to work with the ORM. These hacks usually involve creating a view directly in the database or in a manual migration, and then setting <a href="https://docs.djangoproject.com/en/3.2/ref/models/options/#managed" rel="noopener"><code>managed=False</code></a> on the model. These hacks get the job done, but not in a very graceful way. I wish there was a way to define database views so that migrations can detect changes. Maybe even an option to create views using a Django queryset.</p>
</li>
<li>
<p><strong>Database partitions</strong>: Database partitions are extremely useful in data modeling. When used correctly they can make queries much faster and maintenance a lot easier. Some database engines such as Oracle already provide very mature implementations for database partitioning, and other engines such as PostgreSQL are getting there. At the moment, there is no native support for database partitioning in Django, and most implementations I've seen resort to manually managing tables. As a result, I often avoid partitions altogether, and that's unfortunate.</p>
</li>
<li>
<p><strong>Require authentication by default</strong>: Django currently permits access to any view unless explicitly marked otherwise, usually using the <a href="https://docs.djangoproject.com/en/3.2/topics/auth/default/#the-login-required-decorator" rel="noopener"><code>login_required</code> decorator</a>. This makes it easier to get started with Django, but it can potentially cause security issues down the road if you are not careful. I know there are solutions for this, usually using custom middleware and decorators. I really wish Django had an option to flip the condition so that access is <em>restricted</em> by default unless marked otherwise.</p>
</li>
<li>
<p><strong>Typing</strong>: If you are following this blog you know I'm a <a href="/python-mypy-exhaustive-checking">big fan of type hinting in Python</a>. At the moment, Django does not come with type hinting or official stubs. Shiny new frameworks such as <a href="https://www.starlette.io/" rel="noopener">Starlette</a> and <a href="https://fastapi.tiangolo.com/" rel="noopener">FastAPI</a> advertise themselves as being 100% type annotated, but Django is still lagging behind. There is a project called <a href="https://github.com/typeddjango/django-stubs" rel="noopener">django-stubs</a> that is making some progress in this regard.</p>
</li>
<li>
<p><strong>Database connection pooling</strong>: Django currently supports two modes for managing database connections: creating a new connection per request, or a new connection per thread (persistent connections). Creating database connections in common deployments is a relatively heavy operation. It requires setting up a TCP connection, often a TLS connection, and initializing the connection, which adds significant latency. In PostgreSQL in particular, it also consumes many database server resources, so creating a new connection per request is a really bad idea. <br><br>
Persistent connections are much better. They work well with the way Django is usually deployed, with a small number of worker processes and/or threads. But such deployments tend to break down under real world conditions. Whenever your database or one of your upstreams starts taking longer to process requests for some reason, the workers get tied up, requests back up, and the entire system chokes. Even with strict timeouts, this will still happen. <br><br>
To improve upon this catastrophic failure mode, a common solution is to use async workers such as <a href="http://www.gevent.org/api/gevent.greenlet.html" rel="noopener">gevent greenlets</a>, or in the future, <a href="https://docs.python.org/3/library/asyncio-task.html#creating-tasks" rel="noopener">asyncio tasks</a>. But now, each request gets its own lightweight thread, hence its own connection, which renders Django's persistent connections feature useless.<br><br>
It would be great if Django included a high-quality connection pool, which maintains a certain number of connections and hands them out to requests as needed. External solutions like <a href="https://www.pgbouncer.org/" rel="noopener">PgBouncer</a> exist, but they add operational overhead. A built-in solution would often be sufficient.</p>
</li>
</ul>The Unexpected Find That Freed 20GB of Unused Index Space2021-02-01T00:00:00+02:002021-02-01T00:00:00+02:00Haki Benitatag:hakibenita.com,2021-02-01:/postgresql-unused-index-size<p>In this article I describe the process we took to identify potential free space, and one surprising find that helped us clear up ~20GB of unused indexed values!</p><hr>
<p>Every few months we get an alert from our database monitoring to warn us that we are about to run out of space. Usually we just provision more storage and forget about it, but this time we were under quarantine, and the system in question was under less load than usual. We thought this was a good opportunity to do some cleanups that would otherwise be much more challenging.</p>
<p>To start from the end, <strong>we ended up freeing more than 70GB of un-optimized and un-utilized space</strong> without dropping a single index or deleting any data!</p>
<p>Using conventional techniques such as rebuilding indexes and tables we cleared up a lot of space, but then <strong>one surprising find helped us clear an additional ~20GB of unused indexed values!</strong></p>
<p>This is what the free storage chart of one of our databases looked like in the process:</p>
<figure><img alt="Free space over time (higher means more free space)" src="https://hakibenita.com/images/00-postgresql-unused-index-size.png"><figcaption>Free space over time (higher means more free space)</figcaption>
</figure>
<p><details class="toc-container" open>
<summary>Table of Contents</summary></p>
<div class="toc">
<ul>
<li><a href="#the-usual-suspects">The Usual Suspects</a><ul>
<li><a href="#unused-indexes">Unused Indexes</a></li>
<li><a href="#index-and-table-bloat">Index and Table Bloat</a><ul>
<li><a href="#clearing-bloat-in-indexes">Clearing Bloat in Indexes</a></li>
<li><a href="#activating-b-tree-index-deduplication">Activating B-Tree Index Deduplication</a></li>
<li><a href="#clearing-bloat-in-tables">Clearing Bloat in Tables</a></li>
<li><a href="#using-pg_repack">Using pg_repack</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#the-find">The "Find"</a><ul>
<li><a href="#the-aha-moment">The "Aha Moment"</a></li>
<li><a href="#utilizing-partial-indexes">Utilizing Partial Indexes</a></li>
</ul>
</li>
<li><a href="#bonus-migrating-with-django-orm">Bonus: Migrating with Django ORM</a><ul>
<li><a href="#prevent-implicit-creation-of-indexes-on-foreign-keys">Prevent Implicit Creation of Indexes on Foreign Keys</a></li>
<li><a href="#migrate-exiting-full-indexes-to-partial-indexes">Migrate Existing Full Indexes to Partial Indexes</a></li>
</ul>
</li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</div>
<p></details></p>
<hr>
<h2 id="the-usual-suspects"><a class="toclink" href="#the-usual-suspects">The Usual Suspects</a></h2>
<p>Provisioning storage is something we do from time to time, but before we throw money at the problem we like to make sure we make good use of the storage we already have. To do that, we start with the usual suspects.</p>
<h3 id="unused-indexes"><a class="toclink" href="#unused-indexes">Unused Indexes</a></h3>
<p>Unused indexes are double-edged swords; you create them to make things faster, but they end up taking space and slowing down inserts and updates. Unused indexes are the first thing we always check when we need to clear up storage.</p>
<p>To find unused indexes we use the following query:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">relname</span><span class="p">,</span>
<span class="w"> </span><span class="n">indexrelname</span><span class="p">,</span>
<span class="w"> </span><span class="n">idx_scan</span><span class="p">,</span>
<span class="w"> </span><span class="n">idx_tup_read</span><span class="p">,</span>
<span class="w"> </span><span class="n">idx_tup_fetch</span><span class="p">,</span>
<span class="w"> </span><span class="n">pg_size_pretty</span><span class="p">(</span><span class="n">pg_relation_size</span><span class="p">(</span><span class="n">indexrelname</span><span class="p">::</span><span class="n">regclass</span><span class="p">))</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="k">size</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">pg_stat_all_indexes</span>
<span class="k">WHERE</span>
<span class="w"> </span><span class="n">schemaname</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'public'</span>
<span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">indexrelname</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'pg_toast_%'</span>
<span class="hll"><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">idx_scan</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span>
</span><span class="hll"><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">idx_tup_read</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span>
</span><span class="hll"><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">idx_tup_fetch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span>
</span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">pg_relation_size</span><span class="p">(</span><span class="n">indexrelname</span><span class="p">::</span><span class="n">regclass</span><span class="p">)</span><span class="w"> </span><span class="k">DESC</span><span class="p">;</span>
</pre></div>
<p>The query is looking for <strong>indexes that were not scanned or fetched</strong> since the last time the statistics were reset.</p>
<p>Some indexes may seem like they were not used, but they were in fact used:</p>
<ul>
<li>
<p><a href="https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ALL-INDEXES-VIEW" rel="noopener">The documentation</a> lists a few scenarios when this is possible. For example, when the optimizer uses metadata from the index, but not the index itself.</p>
</li>
<li>
<p>Indexes used to enforce unique or primary key constraints for tables that were not updated in a while. The indexes will look like they were not used, but it doesn't mean we can dispose of them.</p>
</li>
</ul>
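<p>The caveats above can be folded into the review step. Here is a minimal sketch in Python, assuming each candidate row was fetched as a dictionary and extended with the <code>indisunique</code> and <code>indisprimary</code> flags from <code>pg_index</code>. The query above does not select these flags, and the function name and row shape are hypothetical:</p>

```python
def split_candidates(rows):
    """Split unused-index candidates into droppable ones and ones to keep.

    Assumes each row is a dict with 'indexrelname', 'idx_scan',
    'indisunique' and 'indisprimary' keys (a hypothetical shape).
    """
    droppable, keep = [], []
    for row in rows:
        enforces_constraint = row["indisunique"] or row["indisprimary"]
        if row["idx_scan"] == 0 and not enforces_constraint:
            droppable.append(row["indexrelname"])
        else:
            # Unique and primary key indexes enforce constraints even
            # when no query ever scans them, so never drop them blindly.
            keep.append(row["indexrelname"])
    return droppable, keep


# Example rows with made-up index names.
rows = [
    {"indexrelname": "orders_pkey", "idx_scan": 0,
     "indisunique": True, "indisprimary": True},
    {"indexrelname": "orders_note_ix", "idx_scan": 0,
     "indisunique": False, "indisprimary": False},
]
droppable, keep = split_candidates(rows)
```

<p>Even the "droppable" list is only a starting point. Each index on it still deserves a manual look before it is dropped.</p>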
<p>To find the unused indexes you can actually drop, you usually have to go over the list one by one and make a decision. This can be time consuming the first couple of times, but after you get rid of most unused indexes it becomes easier.</p>
<p>It's also a good idea to <strong>reset the statistics counters from time to time</strong>, usually right after you finish inspecting the list. PostgreSQL provides a few <a href="https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-FUNCS-TABLE" rel="noopener">functions to reset statistics</a> at different levels. When we find an index we suspect is not being used, or when we add new indexes in place of old ones, we usually reset the counters for the table and wait for a while:</p>
<div class="highlight"><pre><span></span><span class="c1">-- Find table oid by name</span>
<span class="k">SELECT</span><span class="w"> </span><span class="n">oid</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_class</span><span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">relname</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'table_name'</span><span class="p">;</span>
<span class="c1">-- Reset counts for all indexes of table</span>
<span class="k">SELECT</span><span class="w"> </span><span class="n">pg_stat_reset_single_table_counters</span><span class="p">(</span><span class="mi">14662536</span><span class="p">);</span>
</pre></div>
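<p>If you run this from a script, the two steps can be collapsed into one by casting the name to <code>regclass</code> and letting PostgreSQL resolve the oid. This is a minimal sketch, assuming a DB-API cursor that accepts <code>%s</code> placeholders (as psycopg2 does). The helper name is hypothetical:</p>

```python
RESET_SQL = "SELECT pg_stat_reset_single_table_counters(%s::regclass)"


def reset_relation_counters(cursor, relation_name):
    """Reset statistics counters for a single table or index, by name.

    The ::regclass cast makes PostgreSQL look up the oid, so there is
    no need to query pg_class first. `cursor` is assumed to be a DB-API
    cursor that accepts %s placeholders (psycopg2 style).
    """
    cursor.execute(RESET_SQL, (relation_name,))
```

<p>The name is passed as a query parameter rather than formatted into the SQL string, so it can also be schema-qualified, for example <code>public.table_name</code>.</p>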
<p>We do this every once in a while, so in our case there were no unused indexes to drop.</p>
<h3 id="index-and-table-bloat"><a class="toclink" href="#index-and-table-bloat">Index and Table Bloat</a></h3>
<p>The next suspect is bloat. When you update rows in a table, PostgreSQL marks the tuple as dead and adds the updated tuple in the next available space. This process creates what's called "bloat", which can cause tables to consume more space than they really need. Bloat also affects indexes, so to free up space, bloat is a good place to look.</p>
<p>Estimating bloat in tables and indexes is apparently not a simple task. Lucky for us, some good people on the world wide web already <a href="https://wiki.postgresql.org/wiki/Show_database_bloat" rel="noopener">did the hard work</a> and wrote queries to estimate <a href="https://github.com/ioguix/pgsql-bloat-estimation/blob/master/table/table_bloat.sql" rel="noopener">table bloat</a> and <a href="https://github.com/ioguix/pgsql-bloat-estimation/blob/master/btree/btree_bloat.sql" rel="noopener">index bloat</a>. After running these queries you will most likely find <em>some</em> bloat, so the next thing to do is to clear up that space.</p>
<h4 id="clearing-bloat-in-indexes"><a class="toclink" href="#clearing-bloat-in-indexes">Clearing Bloat in Indexes</a></h4>
<p>To clear bloat in an index, you need to rebuild it. There are several ways to rebuild an index:</p>
<ol>
<li>
<p><strong>Re-create the index</strong>: If you re-create the index, it will be built in an optimal way.</p>
</li>
<li>
<p><strong>Rebuild the index</strong>: Instead of dropping and creating the index yourself, PostgreSQL provides a way to re-build an existing index in-place using the <a href="https://www.postgresql.org/docs/current/sql-reindex.html" rel="noopener"><code>REINDEX</code></a> command:</p>
</li>
</ol>
<div class="highlight"><pre><span></span><span class="k">REINDEX</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">index_name</span><span class="p">;</span>
</pre></div>
<ol start="3">
<li><strong>Rebuild the index concurrently</strong>: The previous methods will obtain a lock on the table and prevent it from being changed while the operation is in progress, which is usually unacceptable. To rebuild the index without locking it for updates, you can <a href="https://www.postgresql.org/docs/current/sql-reindex.html#SQL-REINDEX-CONCURRENTLY" rel="noopener">rebuild the index concurrently</a>:</li>
</ol>
<div class="highlight"><pre><span></span><span class="k">REINDEX</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">CONCURRENTLY</span><span class="w"> </span><span class="n">index_name</span><span class="p">;</span>
</pre></div>
<p>When using <code>REINDEX CONCURRENTLY</code>, PostgreSQL creates a new index with a name suffixed with <code>_ccnew</code>, and syncs any changes made to the table in the meantime. When the rebuild is done, it will switch the old index with the new index, and drop the old one.</p>
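<p>Two details from the PostgreSQL documentation matter when scripting this: <code>REINDEX CONCURRENTLY</code> cannot run inside a transaction block, and a failed rebuild can leave behind an invalid index with the <code>_ccnew</code> suffix. The following is a minimal sketch in Python, assuming a DB-API cursor on an autocommit connection. All helper names are hypothetical:</p>

```python
def quote_ident(name):
    """Quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def reindex_statement(index_name):
    """Build a concurrent REINDEX statement for an unqualified index name."""
    return f"REINDEX INDEX CONCURRENTLY {quote_ident(index_name)};"


# Lists invalid indexes, such as a "_ccnew" copy left by a failed rebuild.
INVALID_INDEXES_SQL = """
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;
"""


def rebuild_index(cursor, index_name):
    """Rebuild one index without blocking writes to its table.

    `cursor` is assumed to belong to an autocommit connection, because
    REINDEX CONCURRENTLY cannot run inside a transaction block.
    """
    cursor.execute(reindex_statement(index_name))


def find_invalid_indexes(cursor):
    """Return the names of indexes that are currently marked invalid."""
    cursor.execute(INVALID_INDEXES_SQL)
    return [row[0] for row in cursor.fetchall()]
```

<p>Running <code>find_invalid_indexes</code> after a failed rebuild shows which leftovers need to be dropped before trying again.</p>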
<figure>
<p><svg version="1.1" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 314.6178987276779 320.25169897053615" width="auto" height="50vh"><g transform="translate(10 107.64129082580851) rotate(0 6.281818150930462 6.5734725011127075)"><path d="M-1.3842925820499659 -1.1872281339019537 L11.874384720654007 1.136923560872674 L13.000843186707016 13.56048340789713 L1.9123801793903112 13.650584351939337" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M0.05795299176248214 0.783424572535047 C4.617292622409793 0.4092872833196779, 8.339113611975323 -0.6279773110008966, 13.269594132837716 -0.3166460868843761 M-0.12968918729727869 0.2332091903590222 C4.088457091815665 0.6669499351364089, 7.727869091193958 0.1664303611183769, 13.108608599698075 0.6161000606459712 M13.09428494502322 -0.5669882488247653 C12.66378901352661 5.217510340309369, 13.376305351980335 8.800848271500545, 12.96301501223118 12.876522021465087 M12.746510372692322 0.6553986082474832 C12.581113803029872 2.5919560008029765, 12.622839104445353 5.245383714807264, 12.558198560950945 13.600653581123265 M11.415610236150687 12.46772735150659 C9.355425389381876 12.362106375017214, 6.354124046039259 13.711371459097675, -0.20865465424804808 12.188989366410045 M11.995184746102135 13.659943864136133 C8.592181550162307 13.001284919018138, 5.1106813878129635 13.685477705799183, -0.04532038597203758 12.815673899989063 M-1.1000040886275568 12.731057849432949 C0.21902566548406913 9.489100552062574, 1.1305611659235009 5.642408810245789, -0.3697369778005192 0.7188796123611438 M-0.14602182774982642 12.604513383399677 C-0.36849946392273375 7.913082986757547, -0.5731878546013287 3.708427301249994, 0.21159051344801982 -0.1758975795591194" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(37.36399155052743 110.03810659595638) rotate(0 7.661128495758078 6.228644914905818)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.7823861285718032 6.21932599772989 C1.4500921559493123 5.108182356875764, 
3.441469844423457 1.8062618040133345, 5.504833485516256 0.7565337944624239 M-0.5410761464061091 6.493475117956392 C1.1042147590433127 4.450646080947254, 2.799193682642527 2.947971151017162, 4.936016817348546 0.20714467091832695 M0.749740474773537 12.50263638687092 C3.429416267581061 8.328572483740857, 8.430646119535286 3.310563269361819, 9.894020706367623 -1.2278720308944822 M-0.2530843212176455 11.5469090965088 C3.6198122759283624 8.115291381178743, 7.263796162362432 3.2204682975414873, 11.011310499113929 0.26771362601665283 M1.9822030287651877 16.18222850012341 C7.9079710780281545 7.660818137719537, 11.162632307947874 3.256727180213902, 17.40162397288107 -1.3712397212232479 M2.8965779627623567 15.074667477474824 C6.039740370888001 9.748772210846312, 10.362500897020059 6.389892315776484, 16.156475719300513 0.5774116917909107 M7.991257522999459 15.791074121828208 C11.157692524742462 13.017885557118333, 13.823949476039282 10.31235320142318, 15.124491205474788 6.933083097492914 M8.542597238951512 15.010502843410725 C10.065679500528379 12.58774874205957, 11.700320943807514 10.608037233401689, 15.43983565409524 7.124962455402698 M-0.13035152897861035 12.343976974416462 C-0.13035152897861035 12.343976974416462, -0.13035152897861035 12.343976974416462, -0.13035152897861035 12.343976974416462 M-0.13035152897861035 12.343976974416462 C-0.13035152897861035 12.343976974416462, -0.13035152897861035 12.343976974416462, -0.13035152897861035 12.343976974416462 M6.2573820087746075 12.258746413783095 C3.6406931770093367 10.838096683894566, 2.6110627295665267 8.509710568718162, 0.7097642420763544 6.642233357922574 M5.953746929733484 12.289846019715926 C4.156915574139798 11.156445197377138, 2.099160544225261 9.062240973186897, 0.6041243846500686 7.300675533683439 M12.639922596477781 12.234711891984539 C9.479159855375592 8.829157312289606, 5.428766041175559 6.918530999021034, -0.8353254491257551 1.817709034658029 M11.564287603610072 11.857971174841717 C8.586089238068022 
9.110229739716594, 4.070770943018075 5.774034595831673, 0.3099437890780087 1.9091063685482288 M17.79961197254866 9.889928571972144 C13.390884911628714 7.16659965384633, 10.133121394450043 5.370337843963153, 2.6919010213791985 -3.8236372402865806 M15.813024015069601 11.302633100281549 C12.24047516447084 6.120951890525337, 5.277437527884766 1.260175445221373, 0.9323548746511001 -2.8756859523240843 M17.219884106443903 4.846063522998845 C13.325119344161799 3.2295918377387403, 12.60798064011254 0.9736529774247229, 8.90229585811149 -0.8253047820503379 M16.033102332215464 5.703039742032667 C13.261186573292777 3.347397258867739, 10.59688428133295 0.8245542573304023, 8.640990040382627 -1.7184323719559724" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M0.9554425442952907 0.17351121434821248 C6.020914674081412 -0.7521373383153069, 10.362654422614952 0.8538985044074764, 14.936084341971874 0.11788972649868734 M0.28441536045858595 0.2716002979256231 C5.26031749302167 0.027190904072255995, 9.21531618367666 0.6962659218955699, 16.073635273638825 0.4665317254535257 M14.785011498779108 0.20878779183966767 C15.893441702951487 3.482225933567059, 14.79302138157144 5.575025505643894, 15.066019709972592 11.224815477452086 M15.9432750619205 -0.05896997058487752 C15.213790548045978 4.47305994358339, 15.223313621437892 8.979834340188118, 15.752165176622714 11.956165625267571 M14.493902274184343 11.908574566482287 C10.239204099332579 11.472190360597354, 7.782717539140194 12.06281046436649, -1.1682957135793768 12.753982614078424 M15.947895955403313 12.50718534840714 C9.09735388014193 13.015962183585783, 3.859093964666937 12.263364861976484, -0.40400890636864245 12.712069310836586 M-0.3940707744615911 12.548505916712244 C0.5718840886765646 10.83823815603984, 1.0784481677227495 6.958975438401683, 0.6811690231007677 0.939137292326574 M-0.5139770408579577 12.209975628186587 C-0.26907735731890764 7.260290507492793, 0.4231567017782914 3.1463125241907344, -0.16667044157858651 
-0.4311133475963682" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(137.17690806005203 10) rotate(0 14.902507806102847 12.090713880423056)"><path d="M-0.6892515812069178 1.136923560872674 L30.242222497051753 0.41353840567171574 L31.717395791595973 24.685067110560034 L-0.8249499592930079 23.021942728064154" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M0.22648257948458195 0.5405943002551794 C6.315562732056079 -1.2964765519575752, 15.70000154246449 -0.6899212688879646, 29.958895979261197 -0.04080186225473881 M0.35451735090464354 0.9070455180481076 C6.630746084515305 -0.00037904585653902867, 14.140627886668447 0.8701175550097787, 30.413975148273803 -0.9142344547435641 M30.140221414899624 0.7193018402904272 C31.3217514927654 8.597681899834354, 30.510180073526357 18.878792901150742, 27.826295731401242 25.394061450711582 M29.710340170933105 -0.07822566758841276 C29.841088208187912 6.031803132773185, 29.864016118277405 12.879690705412616, 29.000467895342208 23.85858704508761 M29.088782666063107 22.90088451360831 C19.296032693008062 24.887641517194606, 10.66106813560243 22.495329765352107, 0.3872703406959772 26.108043496362065 M29.87014377005229 23.98273517550448 C22.268235174484772 25.315298706557957, 12.838883456053484 23.897478580739705, 0.3325612945482135 23.247676897659098 M0.14644611813127995 25.281996075860356 C1.7354599115129878 16.635301946417464, -0.36014242203269686 6.3550838759345325, 1.5077714417129755 1.6352629270404577 M-0.39705940056592226 23.727103043212686 C-0.6547287341754147 17.853970332811464, 0.3756015959580234 9.944831648594322, -0.6921462910249829 -0.5936140669509768" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(140.75580855110945 59.579564489112144) rotate(0 11.799059530240754 8.987265604560992)"><path d="M1.136923560872674 0.43720688484609127 L24.011657466153224 1.9123801793903112 L24.10175841019543 17.149581249828977 L-1.1594850327819586 16.47251379119937" 
stroke="none" stroke-width="0" fill="#e08fff"></path><path d="M0.5405943002551794 -0.972532382234931 C6.077665226765294 1.0970884995034595, 13.87250859895412 0.12715755267652468, 23.55731719822677 -1.5466928984969854 M0.9070455180481076 -0.0057982997968792915 C6.702519104674329 0.7838796914875356, 14.165264305375258 0.5253001095115987, 22.683884605737944 0.988635073415935 M24.244574729335465 1.275412213644298 C22.579267645625464 6.663894741379943, 23.150877431059946 10.457398523244194, 24.68794516566749 17.112511783432044 M23.52781557531039 -0.028040412727310615 C23.72076882198848 5.884976510081964, 24.39764287054829 11.061753599772697, 23.307973534432683 18.11456145418283 M22.317575813243707 16.871443793203277 C19.542352360203246 17.55786598738598, 12.226927720029 18.277292798127405, 1.926615735515952 17.6427002996463 M23.399426475139876 18.884744183186037 C19.54458511392089 17.917754896770237, 13.182759995682854 18.322461523900746, -0.9337508631870151 18.886614308718187 M0.9891099762997153 19.35239950287033 C1.7768877133294114 13.580594595726321, 1.039713917294642 11.55274259122118, 1.469654225860444 -0.20063965317028365 M-0.4083136908088776 17.337832976046787 C0.7388284377620029 13.270741554876434, -0.09216481250280911 7.3663276438471055, -0.533496728629208 -0.3097243514335105" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(76.03758884250306 58.6756194766938) rotate(0 10.764576771620057 9.332093190767882)"><path d="M0.41353840567171574 1.9123801793903112 L22.032792892954035 -0.8249499592930079 L20.369668510458155 17.162168963613148 L0.11721945740282536 16.9646146733185" stroke="none" stroke-width="0" fill="#f938c5"></path><path d="M-1.110622862353921 1.1238113138824701 C6.198477395117356 0.25856936197007985, 13.364288086548594 -1.4420039035824408, 21.116250195256157 0.7424894664436579 M0.11027050483971834 0.8675391180440784 C8.20275901681495 0.610207860934309, 13.529749367474263 0.9898833982821026, 21.932782403840406 
-0.4312699632719159 M20.551703393238842 0.5669818110523825 C21.21237088680448 4.065513260734155, 22.72792052130785 10.929052398793285, 22.04839139534486 20.525072203520956 M21.470420605635834 -0.007719740960441812 C22.15866508807139 4.671740479275288, 21.213275394319957 10.990506803212263, 20.67641585064616 18.159673017647346 M22.396499731293602 18.332029911490316 C14.316436369990583 18.93271403048709, 12.080266059835024 18.21361278031543, -1.8098313007503748 20.297467860194082 M22.41962458909927 18.59204104895347 C14.402842989282139 19.192708824840718, 7.397479732055336 19.772230629888707, -0.8366993917152286 18.34784889573806 M1.786022349504847 18.13928574000676 C0.9178582656589649 14.335331945336698, 1.4233365418697481 6.984006899559073, -0.41460257245070076 -1.5401364851359807 M0.6131625023375992 18.964572880368017 C-0.16491867132689514 13.160092163796408, -0.4691701671372353 8.626224684569598, 0.5304938310421726 0.20400276965145225" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(208.8363304496985 59.652533409828656) rotate(0 11.454231944033836 8.642438018354088)"><path d="M1.9123801793903112 0.5036393497139215 L22.083513928774664 -1.1594850327819586 L21.406446470145056 17.402095494111002 L-1.6995717082172632 17.2589486811952" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M1.1238113138824701 -0.5040675792843103 C7.645628783750388 0.08641145642274695, 13.436803934892719 -0.2856900293064706, 23.65095335451133 0.7090347018092871 M0.8675391180440784 0.9807671057060361 C9.559400651992314 -0.9394957335239609, 18.88951730526428 0.37836758181989594, 22.477193924795756 0.16760290134698153 M23.433544923949523 -0.35553717606091517 C21.321015155984547 6.855126118495292, 22.202166485051805 12.095095360758775, 24.631827558680307 17.121230710096498 M22.901314647119847 0.596510941645596 C22.207745601461262 4.771237626722614, 23.03772847853919 9.310782283756463, 22.4412347908325 16.975376094521167 M22.576307418022225 
15.759910472919167 C17.88995899916615 19.310839616361513, 11.765082519396993 15.574392580095186, 1.6332814786583185 17.415132352401436 M22.836318555485377 16.757526956749082 C16.823660207808143 16.555118411495506, 10.985817038048388 16.652169882967293, -0.31633748579770327 17.358099095773817 M-0.48610972559688204 18.23001918463158 C1.1705819628566982 12.561519730014167, -0.32544157787897565 4.627668881975331, -1.4263181732648549 -0.6863122517991622 M0.2781875024861955 17.05361562926586 C-0.4757293221467767 12.371680590121654, -0.175386883646735 8.975829269946939, 0.1889266701740009 0.1786990019613386" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(65.4665685716327 108.59569422986644) rotate(0 6.281818150930462 6.5734725011127075)"><path d="M1.3343888614326715 0.28473021648824215 L12.775916405797716 -1.027102867141366 L13.2811365209825 11.600022219789402 L0.49036903120577335 11.750450336110966" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M-0.2980506168941339 0.6749647943930426 C5.399263025606478 0.5862991074577972, 9.511757369439357 -0.2663333328507295, 12.224003444817319 1.1277315766505231 M-0.600791329632999 -0.15659712891818633 C5.216611504887707 -0.5157867739841867, 9.499124237113 -0.11518759758012044, 13.139655175623721 0.42355575557897907 M11.256037798479353 -0.604401047626477 C13.110714932369563 3.5578859460965324, 11.152806729985327 4.184410098019729, 13.395859230540172 13.57478647815581 M12.385253984037703 0.5561128129199817 C12.067887480056767 5.339097537128964, 12.7816545723657 10.450401690611576, 13.030733187461486 13.370031010304153 M11.999712923962498 12.375982079214948 C9.795735347523918 13.674365335329437, 4.529039531708586 13.32486257075303, -1.1915241219968304 13.41304201253567 M11.985965636960055 13.39705155265722 C9.582689183077557 13.44765344698502, 5.1988831471128725 12.987005587482129, 0.03455255334209162 12.582346491201404 M1.142761063620029 13.41440666724438 C0.4320339571337861 
8.704371412304816, 0.7430876350059443 2.0166435737942647, 0.9918849054178738 0.731960321674658 M0.3308142792836719 13.603620601365968 C-0.7247183180055076 10.541206419275358, 0.19644450980978093 7.723298287217505, 0.4571101709669103 0.6472655088192294" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(151.9938275604818 110.5153082979854) rotate(0 7.661128495758078 6.228644914905818)"><path d="M-0.04964844323694706 -0.5800034906715155 L13.973758786793155 -1.3568401839584112 L15.289737135048313 10.691298270559493 L1.8214433398097754 11.348745787477675" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M-0.4890231598455168 -0.16782777315389708 C5.319297083654991 1.3825177738088672, 9.156932125724856 -0.2895407393278089, 15.8151542439081 0.24261378532555367 M-0.590077745967338 0.4328716391619536 C3.6584988883845524 0.5145905795509805, 6.494188914634845 -0.00536263128786485, 15.597254842254081 0.7500212199783822 M14.97713741243248 1.1122164419827383 C14.740783466711616 2.3677191647192926, 15.179779146011029 5.89868242110735, 15.911943253250962 12.295568911069552 M15.198159927491313 0.42130161325275217 C15.43830569993609 3.404264047443032, 14.781056149027693 6.581456422326292, 15.695154303370934 13.04535527703237 M14.776376176810397 13.462057177628509 C10.579013840452832 11.954694213834003, 6.867259514836083 11.309227036028453, 0.067446568787771 13.723226786272054 M15.97654671359191 11.908127536319864 C11.331840038725502 12.277938636190681, 7.2025748687736515 13.137036191152346, 0.5181835605168363 12.111400887887688 M-0.9296305611670861 11.909896680869137 C1.1071458288424063 7.310317839340089, 0.6812797724121024 3.192346754639145, 0.23051173510905376 0.18068946138513353 M0.006337664729383907 11.933374168533385 C-0.2866798420101678 9.541982932091097, 0.5676605867442337 6.346999107767979, 0.6111412676164819 0.2920831382663428" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(207.46039613211508 
111.46971170204333) rotate(0 7.661128495758135 6.228644914905818)"><path d="M-1.3484982047230005 -1.3568401839584112 L15.289737135048313 -1.765991559252143 L17.14370033132593 11.348745787477675 L1.4705324973911047 12.531471395826522" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M1.1950592019825934 1.4696172417327977 C6.376128139074211 -0.18565849408848867, 11.428240805780248 0.25939656264094335, 14.199044126368346 0.7303766291005642 M0.18503465607751624 0.4642831630159041 C3.6408633821248553 -0.142244375655332, 8.462203714946092 0.6634468921459915, 14.635521314481398 0.2851350004425315 M14.6944194180848 -0.2528091392177003 C15.098903139475848 4.017817172631002, 15.877431295342594 6.558661800053189, 14.629595752310603 12.372515352986369 M15.446783147618616 0.35973833770509445 C14.764714183905252 3.054801229467944, 15.670334890565968 5.712857900455789, 15.403124591767547 11.90258890173052 M15.374622514561498 12.021049151365606 C12.525349140914596 11.476058708341743, 7.523448116026293 13.82370352105374, -0.18250826386560304 12.898654895079002 M15.746008872314746 12.171244184492295 C10.367290921513428 12.920509280755605, 4.5131973231555635 12.001568429189334, 0.25832804430249157 12.957576659537894 M1.1884943930831493 11.46755613200225 C0.8555913927695539 7.873193274247615, 0.3234747912257624 5.370719159765075, -0.5333483123301106 -1.0212221980448937 M-0.38880206181465715 12.745821568704477 C0.4642708339938675 8.44689672651287, 0.6098737346706051 4.133813659114243, -0.015462126175040058 -0.18063178963993792" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(289.29564173616166 110.9925100000143) rotate(0 7.661128495758135 6.228644914905818)"><path d="M-1.97082363627851 -1.6277467999607325 L14.356476086999578 0.7899580802768469 L15.388750035192174 14.143001114911023 L-1.0305569674819708 14.441912507599774" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M-0.32488331471691567 1.419771815704825 C4.815157983724492 
0.4566664162766926, 9.302120985124073 -1.2559954181357016, 16.692964981359903 0.8752369825164386 M-0.05770336926112929 0.18698345366594482 C3.1063299484751568 0.316207911040457, 6.460936555457816 0.13481025420082446, 15.067507408919422 0.6263598639089083 M14.37529464013135 -0.04036218510101208 C15.437921363369561 2.266106634734104, 15.272207646034403 6.269377711116824, 14.649016726282278 13.054154737928483 M15.4076945475191 -0.4523968325485316 C15.408371749049355 3.4886535177004077, 15.039909117060152 7.730612040860872, 15.751101472014158 12.930697750861295 M14.807440321226842 11.378187135258814 C10.359905709361051 12.15886589244586, 6.162224875972772 11.82666784706842, -1.2000628449319772 11.548484244314745 M14.86703786550885 12.223609864507239 C12.338222649412208 12.268908535153864, 8.67840800115318 12.491201533809038, 0.012970510243453193 11.876478838532073 M0.2803017605941174 11.795362327052889 C1.1664129989227872 9.246297649534021, 0.9408064546032111 4.867992651307419, 1.0969654984976633 0.6654273833779427 M0.5073910179343025 12.45158214387215 C-0.31551810317729656 7.827619104442077, -0.331444194565933 2.9677313836548227, -0.10738630764373325 -0.1661580860239843" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(90.52787469172438 109.79410211494044) rotate(0 7.661128495758078 6.228644914905818)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.9855415314109871 6.104875547488325 C1.7837569587222952 4.932180392868143, 3.1944827736231227 2.487677039629971, 5.551522109242624 0.06308968996975961 M-0.4430160846235106 6.350717425768368 C1.4988444926306697 4.1860347776332985, 3.3916741988599117 2.2148890309990126, 5.092392704593486 0.6275978714076096 M0.2962450110446322 13.023106728910982 C3.1488409544488674 9.460784958652166, 6.920621684249594 4.6286478673248155, 10.762616249871385 0.34760843916682016 M0.6590376467536794 11.786425040334045 C3.0548091519057943 9.423868256438187, 6.172882898979238 6.001518376199493, 10.169267914396656 
0.4301067384972932 M3.5758409941984874 15.480457930925578 C8.013590079766397 11.7734095533626, 8.51322905212178 6.343551420478306, 15.58034339044564 0.3937655381788483 M2.33218608386317 15.414281134826407 C5.761220127324565 11.405054314196358, 8.539724125972308 6.954283024158126, 15.874103708894406 0.7980973786138912 M8.39934875499379 15.962933720174263 C10.373039802720829 11.845152632003154, 11.430754881852373 11.524488008558258, 16.02887313909685 7.753400968057672 M7.776602889922597 15.097280868646706 C9.922554730482974 12.9795449061011, 11.748116921537898 10.624812347374696, 15.53925534336684 6.91278328476065 M-0.13035152897861035 12.343976974416462 C-0.13035152897861035 12.343976974416462, -0.13035152897861035 12.343976974416462, -0.13035152897861035 12.343976974416462 M-0.13035152897861035 12.343976974416462 C-0.13035152897861035 12.343976974416462, -0.13035152897861035 12.343976974416462, -0.13035152897861035 12.343976974416462 M6.543177192254394 12.6780481665292 C4.268047541074683 10.655356012896254, 2.686386669487847 9.4967170875931, -0.2715383813744019 7.783296766476174 M6.158545916496246 12.541736045003793 C3.7494885913637783 10.335128716741936, 1.854284278965847 8.973623352139883, 0.2134420472169688 7.073565300525566 M12.917822568082128 12.344241205429629 C9.134862759794474 10.109097269591013, 2.754645868802969 4.810150916117756, 0.9859576439690705 0.17389581499154816 M11.238799469881569 12.2202679662164 C8.540765792435549 10.096439147123853, 6.485162503370937 7.191588513956833, 0.2068276627194997 1.9905709169572137 M16.299668406971392 10.11407459163953 C11.046811758573647 7.0780266954063435, 5.687104394536872 2.964908445194016, 1.2905474721669812 -0.5608112632293618 M16.053550339207646 11.748825567866755 C12.430041167557496 5.7421700983916, 6.887212240406566 1.649515262977829, 2.2155003675475378 -2.1595983483825245 M17.29174834463902 6.604705439698552 C13.19769496484621 2.8878448872661613, 10.917674795492632 1.280137029873109, 7.315838638205338 
-0.9017668718794847 M16.523748635008637 5.590721871382227 C14.5688033293149 4.348919797214966, 12.717747446420107 2.2139241360542186, 8.555095291907225 -2.0438360281145687" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M0.32863426032948206 -1.16050865139706 C4.03329250966049 0.4574570621736611, 9.362066805710816 -1.1333591905818587, 13.866439489765483 0.23283888293376842 M0.5971500470663431 -0.7321561999014279 C3.9607174064863937 0.21148493090412762, 7.517364089120832 -0.7196843435550484, 15.09912584011393 -0.7493700001900138 M15.88564038491305 0.8160841826622598 C16.109894335929912 3.0865936196991637, 15.08971903039006 8.010372874837852, 14.138779171873637 11.439159416004774 M15.35862133532118 0.11110279187302885 C15.591138662261516 4.2549237868635705, 15.624450531475691 8.079864246372264, 15.113460969971412 12.857735144021456 M15.467799624723671 11.15868108051764 C12.199375767919626 11.133291998971648, 5.559716656554162 12.930811942776192, 0.24127098458311913 13.933673089274768 M15.137709607612562 12.530736286944132 C10.874645695005233 11.666211306068734, 7.526960006210287 12.287384680987214, -0.3696972877764938 12.632229589547448 M0.16319129735661364 11.874316158028892 C0.2915149107468349 9.0830452492131, -0.3044541875623684 3.0101157390312787, -0.4497359659243041 -1.131553702329326 M-0.40129058097077375 12.845341146685607 C-0.4546232719119787 10.10395521211974, -0.5578944677177882 6.520313775375806, -0.26667073712778694 -0.34557075104177193" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(120.87270227793113 109.79410211494042) rotate(0 7.661128495758078 6.228644914905818)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.17990700897293777 6.888908154566876 C1.2463909627291798 5.107747269437469, 3.132281327629543 2.6916787237737942, 5.053278610440439 0.5511590096947961 M0.0014893088815858824 6.270567310278408 C1.1993750614576435 5.08928891833048, 2.758411934994365 3.3781139782111325, 4.7566044427030745 
0.5924081593600325 M0.49007783765901203 12.727480231642305 C4.227129699189883 9.605755282115588, 4.647878307489152 5.033243170213026, 10.599132487130296 0.022897164066112197 M-0.5572105078865184 12.671752403348266 C2.3303971076599193 9.295561396501908, 4.67018994862644 5.547543468048661, 10.846509597402942 0.36338713495878006 M3.60167854376571 16.28481309032747 C7.0107812625669625 9.172282119850104, 8.837743671975991 8.618406861172552, 16.77994793448918 2.104711063944267 M2.5260265950063774 14.789594528598055 C6.232670683247028 11.131686956928373, 9.385914467796443 7.064421628219126, 15.934244469137345 0.6527350655221364 M8.384125356189125 15.914250020208787 C9.824293061508781 13.441837364723604, 11.870725616326723 9.859611843594644, 14.830805864407175 7.247984422785799 M8.445919439521768 14.476446570214833 C10.729554174606625 12.08795102809125, 13.276549308889726 9.405689313978215, 15.105942469824532 6.967064523972208 M-0.13035152897861035 12.343976974416462 C-0.13035152897861035 12.343976974416462, -0.13035152897861035 12.343976974416462, -0.13035152897861035 12.343976974416462 M-0.13035152897861035 12.343976974416462 C-0.13035152897861035 12.343976974416462, -0.13035152897861035 12.343976974416462, -0.13035152897861035 12.343976974416462 M6.771090309663146 12.672138604418299 C4.879610405519318 11.55456663649899, 1.6895019600235655 8.905093459762362, 0.8051578476066164 6.586965909199257 M5.931578760562866 12.610151984811685 C4.582561921839856 11.54823757526541, 3.5547602773075497 10.095812258681901, 0.4155928569818308 7.4953034601820905 M11.511144225262914 11.373954979157046 C7.308858906544717 8.945116662170495, 3.021093015315299 5.654622062000635, -0.49615252258061204 2.8340462952619334 M11.314249771051918 12.681755760138826 C8.415442433731798 7.876431384558703, 3.981179292011053 4.602307516227686, 0.24380979372383316 1.5550166271394037 M18.412469541557726 13.137642208579075 C10.968736123752617 6.379713931429274, 6.8232449067461065 3.45660873616918, 
0.27445189349647947 -0.5104892670173562 M17.016106433138845 11.29403572073121 C13.461660422786595 9.036213767699827, 10.09610427206879 5.154403474680287, 2.5276458093180914 -2.5869786419902363 M16.521333428958137 4.506310607683515 C14.000953745478322 4.055717757096405, 12.1557081715277 2.1355430865347023, 8.23977406050663 -2.214996966193106 M16.833772057547872 5.231351262754431 C12.799186656019536 2.1771499802033367, 9.73942584662509 0.17833581534983434, 7.619131442183885 -1.6548260234153385" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M0.6929520992376381 1.003769820265283 C5.41866534890859 -0.6389968058993493, 8.650596775617139 0.9304389769527437, 13.866599176368728 -1.252283286682054 M0.04472753132668006 0.13665456554439193 C5.219327143213944 0.24133451703068487, 10.258728779704331 -0.052481640880421176, 15.065441396421095 0.49254100203142515 M15.440585953461115 -1.05579390650096 C16.478413435982205 2.5744875399601206, 14.655696346018257 7.611359405752805, 15.518414945539515 13.657617834569526 M15.172216416560815 0.05971338321625397 C15.176754207722166 2.8261689845081666, 15.878869509598957 6.7550430907653, 15.021685981274189 12.59951922140801 M15.522979542812433 11.740242017616833 C10.246552908093726 13.856743542312778, 4.074215476699009 11.826447838759469, -0.5531676746999261 11.065497805731091 M14.828676325583942 12.934586423554704 C11.456407713018752 12.933178743708217, 7.957498171655342 11.8972477512981, -0.3280005219603038 12.032243617629756 M-0.06962340821032598 12.642548192348771 C0.7055795318103719 7.440017518008115, 0.17225115013558298 2.6215059072792326, 0.2340742721675828 0.23731616223522178 M0.41184134132211103 11.861837393048491 C-0.3615948073035772 8.912785868461592, 0.22450442722757052 5.2560767257373024, 0.5860050160967799 0.2801154543877312" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(180.18304710551718 111.86306763218175) rotate(0 7.661128495758021 6.228644914905818)"><path d="M0 0 C0 
0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M0.36741164031905427 5.70742812832176 C1.9941659641708465 4.095148712769747, 3.2892458185013913 1.9981303918740876, 4.566991302701181 0.5765271279809289 M-0.6411611046315977 6.492456615393154 C1.933201686907118 4.399261303496065, 3.061696058164395 1.9636502463191765, 5.3195265389555155 0.6049163609591537 M-1.4502702153195037 10.986614905714571 C4.546974056741427 7.386263956875403, 8.112494362703098 2.900097586198785, 9.22516627008928 0.499634377200878 M-0.19904209663163552 12.219132087796508 C4.3371437349716775 7.836328935604952, 7.604652859491832 3.9154690970177044, 10.005276153189076 -0.41619851262422786 M1.8376595789982537 15.725364800456255 C5.540665366158851 10.195988750455157, 11.047001735337934 5.667365046228224, 15.223712478022506 -0.5370080389152296 M2.6601734522680354 14.992541406077608 C7.583140852165204 9.262623883770514, 11.148868072396276 4.332465756773966, 15.240170958967362 -0.2696743497229197 M7.582646445742958 14.725763873281906 C10.469161735529571 11.07602843976743, 12.133900636452223 8.797385242711304, 15.65084337241804 6.678533615214388 M7.890138432708719 15.37146669751128 C9.801442690995792 13.589765561938973, 11.949423537318514 10.925059864090898, 15.318101984251717 7.118985438622172 M-0.13035152897861035 12.343976974416462 C-0.13035152897861035 12.343976974416462, -0.13035152897861035 12.343976974416462, -0.13035152897861035 12.343976974416462 M-0.13035152897861035 12.343976974416462 C-0.13035152897861035 12.343976974416462, -0.13035152897861035 12.343976974416462, -0.13035152897861035 12.343976974416462 M6.655164330048888 12.009176644579432 C4.478012814696061 11.470817385642604, 1.0561147846853483 9.32326606913686, -0.08538971403889156 7.76561771889164 M6.216742067752808 12.928153350473305 C4.974019891585312 10.805768054131272, 2.7096541436186743 9.397657587678037, 0.24246148643739723 7.070915909828087 M13.443525664471899 12.558407036041812 C7.726189962252176 8.059476762409973, 3.6844192502108886 
3.7113895882687733, -1.4044204974341277 3.030632141209199 M12.20968624871305 12.295968940680817 C8.321436974437704 9.69883460563286, 6.615100183290415 7.602661073840639, 0.37461004091621597 1.8137488506501458 M15.239292776115832 12.721997562888157 C11.81675593396629 6.947692396233367, 7.4392804101145416 4.36498607599588, 3.15495554797646 -3.7185838638801494 M16.759074515566823 12.106342601682138 C12.891379073417461 7.818396522288394, 8.451106851925806 4.164195325532596, 1.1453850099101324 -1.6980268635783708 M17.298469325640877 5.433179380190229 C12.96114924135062 3.013980272925081, 11.398811447044116 0.6831175753889231, 7.8492173912714005 -0.7759776199559236 M16.06352495576412 5.733501059240331 C15.163797580258239 4.461884145945868, 13.306182794361252 2.6708795083398114, 8.414163728827868 -1.4513174317634219" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M-0.5594314353673195 -0.369094767807187 C6.21995327118297 -0.530040805382757, 10.675849953423619 1.1971392217347163, 15.094332980940775 -0.7393945755529876 M0.5520616875747457 0.1003612756481389 C4.5880327108290695 0.1767577478479005, 10.537633921346039 -0.18975823628866867, 14.951629827232598 -0.27658383734996306 M15.310575607632055 -0.8025811619415475 C15.998781890584402 1.8024327180082351, 15.99882922574577 4.201523573010781, 14.413943291230698 11.923948355556062 M15.019695175763939 -0.03481170410516299 C15.517103685462132 3.4279345424874914, 15.295519386150598 6.207315798510043, 15.16372710596754 12.574326965895427 M16.469756984061956 13.470405677267763 C8.968892039030631 11.500784550936933, 5.440374414696131 12.94256895455997, -0.48549818196419525 13.898842459255146 M15.142074138837888 11.763876581709361 C9.756004120015684 12.679712623206022, 5.358421299634263 12.272104988215492, 0.2897218734012925 12.997884626810238 M-0.588049678695103 11.89185574728466 C-0.1869595591162626 9.333873044151947, -0.7551682002479978 6.520621728306203, -0.6031024456379208 0.32817888171464604 M0.18741164192816273 
12.520782934944457 C0.3718726584062983 7.772787374954364, 0.44728497702461295 3.4104925205127388, 0.026582263547794138 0.051591244972721384" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(235.3554608986209 109.10444694252658) rotate(0 7.661128495758021 6.228644914905818)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.5323107883008072 5.606076218982332 C1.0506465119020651 4.353168144563857, 3.5051128751055476 1.8891143054474946, 5.152168347271627 -0.34235441201950967 M-0.07287498597520087 6.2316902783262655 C1.5566417016122425 4.728592552443554, 3.1589401478755677 2.594220610280873, 4.987472982662267 0.0477005295303875 M-0.764476520465721 10.934059192657053 C3.616660441408613 7.3617053322105, 5.748174750345383 5.302484458508978, 11.391119629097116 -0.3657729555770992 M-0.1715098812152041 11.626702454357922 C3.027903014675699 9.078005131490022, 4.9928292024835805 5.392218668496865, 10.204121511381995 -0.3519131821498512 M4.2578534584908345 13.704121786828429 C5.516804442595644 10.892778457924608, 7.309740179758776 5.9611029218001494, 15.064564594257106 0.8354947214876285 M3.410065575963498 14.235244309769287 C7.540369349496765 10.445212424218505, 11.471352743355446 7.055605169261649, 15.140716430035834 0.26075959647307 M7.56453540176075 16.087559808130393 C9.979359888317914 11.992055580534126, 12.903836165910057 9.323181838782183, 15.871613230964595 6.6603755159728335 M8.1267703341778 15.429982511551136 C10.473280924457287 11.80688428728262, 13.086412208058738 9.419702213516878, 15.415498541191646 7.148185485969867 M-0.13035152897861035 12.343976974416462 C-0.13035152897861035 12.343976974416462, -0.13035152897861035 12.343976974416462, -0.13035152897861035 12.343976974416462 M-0.13035152897861035 12.343976974416462 C-0.13035152897861035 12.343976974416462, -0.13035152897861035 12.343976974416462, -0.13035152897861035 12.343976974416462 M6.633647534592479 13.37194316823866 C5.023614917110692 11.45585948764098, 
3.3221635991854024 10.48494995004123, -0.13721852312139965 6.737174497143766 M6.244448700515955 12.755023460359237 C4.034671910479897 10.224055407653893, 1.677394811493194 8.797777616313343, 0.4211475665653517 7.626689766318937 M13.277695347332273 10.963168389176921 C9.435816621394908 8.211808570665102, 3.5970624880976834 4.486487411377835, -0.389752405302513 2.8741074263816615 M12.581683576815639 12.178993780737713 C10.097468489777464 10.47026327196649, 7.238810196222479 8.412225573839287, 0.6059591470492002 1.2664509959285999 M16.931407378890214 12.785988085064185 C11.659468726341824 5.962333188153747, 6.878246127659667 4.07739025064636, 2.0106928876816887 -2.421458952780477 M17.42590209925234 10.540634685288262 C11.524823911534295 7.127350790789602, 5.978157700384616 1.7667571167768479, 1.7462903929158706 -1.3942837935869015 M15.668211576269647 5.261359481517827 C13.046147850562134 3.0351585610070524, 10.025919934117846 1.0203475479969917, 7.19861298247825 -1.1013944741661215 M16.4499837862064 5.5877053385178055 C12.759607859827291 2.676591374245515, 9.904639962057052 -0.18832885707109137, 8.16365571983743 -1.4234194091378267" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M0.6521740765027317 -0.014367905727007857 C3.056348090890494 1.077072126504154, 7.099095703064359 1.0771303479565812, 14.080290180893172 -1.117210575881672 M0.357916861836062 -0.37214594507393983 C3.897858242927035 -0.011793486382127877, 8.351069272310946 -0.284338448064834, 15.46398168662541 -0.19498909316523905 M15.140698370715334 0.9329395789905266 C16.1696712903839 2.7803563475847843, 14.586487268972359 7.857367940622688, 15.734954121811132 12.06257046386422 M15.388423066804354 -0.14649212706837195 C14.814521766094481 4.681144124933373, 15.545880879166326 10.312439493921179, 15.158464936130473 12.692839288941205 M16.15807650672912 11.733998614479841 C8.948915887683713 12.32666373948781, 4.152577924241559 11.62777666668441, 0.7536069429552348 11.715483964288488 
M14.868031488798337 12.68780299760784 C10.400020840190674 12.940533995072634, 5.181051175981136 13.03328987873448, 0.11522471015014157 12.48998556693102 M-1.2398805346856494 11.987258365848346 C0.7891492271474708 8.220726543980104, -0.2715888590016168 5.9363170602361715, 0.009313864740602806 1.1271310204189275 M0.13851265328863815 12.774125065758799 C0.06660439087724673 9.70497212687148, 0.5193099107472823 6.6556894120831815, -0.257097009312332 -0.3776080572264844" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(260.1830471055174 109.10444694252658) rotate(0 7.661128495758135 6.228644914905818)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.5660293310554574 6.778298909266583 C1.5005700697896844 3.8740094959367646, 3.888061078483078 1.7924914670934593, 4.6823601457797945 -0.05404862427737178 M-0.530856939745123 6.184799850409023 C1.329800196194094 4.156057980507952, 3.4571234448483166 2.6005126571272275, 5.219830093490428 0.1577973157015854 M-1.4086926854122772 11.575561057447969 C3.4450884320537836 8.774673877878202, 4.745468169500141 6.261404238795016, 11.064754053902757 -0.7584531624719738 M0.9188013405631887 11.639688919156372 C3.878405681683727 8.14056674278562, 6.972964896043772 2.7219518266729446, 10.212298057180774 0.7381290706887604 M3.8977376992298822 15.556306501391619 C5.73435983873332 12.527384854568659, 9.193260449775455 9.979156380407193, 15.574860391955305 1.723945411287712 M3.4998199466566158 14.768472037238345 C7.389117309106338 10.529988369256705, 10.74889041979753 7.259601915433032, 16.04075838658348 0.7860441079759022 M7.15316062401902 15.815663437071274 C9.2839274020957 13.644911380718113, 11.125928054283278 11.158928847965003, 14.661932959629649 6.5839979067869585 M7.894978580203988 14.710226441678609 C10.821619687037325 11.524109719036767, 13.950275948433275 8.233747946626105, 15.607421272982338 7.250371754444774 M-0.13035152897861035 12.343976974416462 C-0.13035152897861035 12.343976974416462, 
-0.13035152897861035 12.343976974416462, -0.13035152897861035 12.343976974416462 M-0.13035152897861035 12.343976974416462 C-0.13035152897861035 12.343976974416462, -0.13035152897861035 12.343976974416462, -0.13035152897861035 12.343976974416462 M5.629200165315002 12.492980775133631 C4.5920169520594625 9.887211951553695, 1.794171426664012 8.220396201681634, 0.9009324981803468 7.325123753742715 M6.62384493869397 12.990836551309487 C3.6254380705323883 10.674185057531117, 1.6544275639062012 8.649133956521277, 0.25626296577699686 7.682125111640831 M13.048884932660375 12.45564176199396 C8.062756210212067 10.484691498035609, 3.3388711114235736 7.238739457159812, 0.8384021496605989 0.7226300507550979 M12.36858803551725 12.244119766181306 C8.172195278801519 9.08737216747763, 4.054883597707195 5.778521976792255, -0.7001612679827097 1.5278969906037592 M17.56418338967174 9.51680452013303 C13.987377469178806 9.371215018238136, 7.327485118167129 6.199560007345894, 1.8709136664151806 -1.7042487083930886 M16.370749539599416 11.271427678967905 C14.214905597243503 7.424003480531246, 10.714030395892983 6.08861716021963, 1.117068273688819 -2.575201002601198 M17.0745033708603 5.172576166879987 C13.134525068497116 3.3023025038347287, 10.959671749697593 1.7301966993347677, 8.072638368863869 -2.6219682539204623 M16.767710882678333 5.292379135316839 C13.78784838497269 3.3063549707083486, 11.385884543842904 0.9838796626769983, 8.278524597758684 -1.687813272312891" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M-0.10473860489052567 0.5076105297725482 C5.124294762946779 1.5349700168285583, 11.908660485991804 0.8577719191306236, 15.4850233112432 -0.3603657053565359 M0.6076676953675075 -0.20146147356884891 C3.499572474651589 0.49936556471426696, 6.135264870903162 0.20709832575715237, 15.740166749122638 -0.3616456076658977 M14.607467808079651 0.6126969487164788 C14.704496491647758 3.499931748131162, 16.140650596488406 6.448576475733482, 14.583668839838397 12.832113113667962 
M15.780049042830814 0.09367990699354001 C15.425315201603219 3.9157644041818527, 15.547482036942519 7.614240793899875, 14.70231672417333 12.22227409782999 M16.748749800361438 12.468745726807644 C11.104700037800413 11.544940029237884, 5.004281762865903 11.627521748335944, 0.34073646824630077 13.236693853555618 M15.614532229501496 12.141064832059188 C9.919315157083043 12.588012164334614, 5.472756907558488 12.204900828725892, -0.7078376429081014 13.059520093880272 M0.5179209595159548 11.239238192513948 C-0.7522856114352774 10.13209875287417, 0.5213260091281064 6.7596753566437915, -0.5214804406721543 -1.1825215681706165 M-0.5700580102239778 12.094583379098811 C0.41157710960632443 8.806468942745342, 0.47703382283467965 4.021015417475526, -0.21088164588019565 -0.6164083438358398" stroke="transparent" stroke-width="1" fill="none"></path></g><g><g transform="translate(148.0210102608686 35.884815995363454) rotate(0 -23.971252877089796 9.31638414407385)"><path d="M0 0 C-14.643344386807778 5.691109353397764, -29.286688773615555 11.382218706795529, -47.94250575417959 18.6327682881477 M0 0 C-11.275560284562957 4.382226143509122, -22.551120569125914 8.764452287018244, -47.94250575417959 18.6327682881477" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(151.4333980754019 36.28321350685066) rotate(0 0.08027251050077666 11.80897408051031)"><path d="M0 0 C0.051162149233997305 7.526517987769038, 0.10232429846799461 15.053035975538076, 0.16054502100155332 23.61794816102062 M0 0 C0.03292207247257026 4.843200967303777, 0.06584414494514051 9.686401934607554, 0.16054502100155332 23.61794816102062" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(154.36003610032083 34.93365429834115) rotate(0 26.90922922078005 10.624632932971565)"><path d="M0 0 C12.516916978464527 4.942083151380956, 25.033833956929055 9.884166302761912, 53.8184584415601 21.24926586594313 M0 0 C17.911966441767866 7.072222953325145, 35.82393288353573 
14.14444590665029, 53.8184584415601 21.24926586594313" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(71.20483722401627 80.15212228048075) rotate(0 -23.100814592801328 11.698068685788046)"><path d="M0 0 C-13.340347275402854 6.755445704869569, -26.680694550805708 13.510891409739138, -46.201629185602656 23.39613737157609 M0 0 C-14.285602182183645 7.23411526782713, -28.57120436436729 14.46823053565426, -46.201629185602656 23.39613737157609" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(81.4575686941929 83.8873990919624) rotate(0 -14.554766232255702 10.28985587741582)"><path d="M0 0 C-10.98264786990617 7.764457493195205, -21.96529573981234 15.52891498639041, -29.109532464511403 20.579711754831635 M0 0 C-10.938602576674775 7.733318571951305, -21.87720515334955 15.46663714390261, -29.109532464511403 20.579711754831635" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(92.71052560552914 84.93456877733809) rotate(0 3.1552117382235565 10.07488288030119)"><path d="M0 0 C2.2683405328688284 7.2430211020196875, 4.536681065737657 14.486042204039375, 6.310423476447113 20.149765760602378 M0 0 C1.5530035148384072 4.958883847681873, 3.1060070296768143 9.917767695363747, 6.310423476447113 20.149765760602378" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(143.81412547848242 81.49718680857619) rotate(0 -5.565999043464103 11.056965635904577)"><path d="M0 0 C-2.3605525059325987 4.689283583419557, -4.721105011865197 9.378567166839114, -11.131998086928206 22.11393127180915 M0 0 C-3.976737750394021 7.89986708688782, -7.953475500788042 15.79973417377564, -11.131998086928206 22.11393127180915" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(157.79351462670945 82.56881322977169) rotate(0 0.4877464437154231 10.995352828391818)"><path d="M0 0 C0.3523996010162537 7.944205436419236, 0.7047992020325075 
15.888410872838472, 0.9754928874308462 21.990705656783632 M0 0 C0.23704197662084248 5.343678465867512, 0.47408395324168495 10.687356931735025, 0.9754928874308462 21.990705656783632" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(171.0109003680185 84.63672563239413) rotate(0 7.832070467882659 11.116733664246953)"><path d="M0 0 C3.250898556179442 4.6142809320639095, 6.501797112358884 9.228561864127819, 15.664140935765317 22.2334673284939 M0 0 C4.5253930646618254 6.423280999854955, 9.050786129323651 12.84656199970991, 15.664140935765317 22.2334673284939" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(218.44693905993915 81.66042333333068) rotate(0 -2.497168498125802 13.759874370826923)"><path d="M0 0 C-1.8079588683835757 9.962197951500483, -3.6159177367671513 19.924395903000967, -4.994336996251604 27.519748741653842 M0 0 C-1.5365421491108837 8.466640097820521, -3.0730842982217674 16.933280195641043, -4.994336996251604 27.519748741653842" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(224.8630249115887 83.77721257796638) rotate(0 8.393870813360422 10.293591016739073)"><path d="M0 0 C6.589048878704789 8.08029761891666, 13.178097757409578 16.16059523783332, 16.787741626720845 20.587182033478143 M0 0 C6.510040000865417 7.9834072696066025, 13.020080001730834 15.966814539213205, 16.787741626720845 20.587182033478143" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(231.67339697531736 82.31858859402979) rotate(0 15.20624662486216 11.55343980935841)"><path d="M0 0 C8.612502377846084 6.543628436731532, 17.22500475569217 13.087256873463064, 30.41249324972432 23.106879618716818 M0 0 C9.289055824504032 7.0576618939422415, 18.578111649008065 14.115323787884483, 30.41249324972432 23.106879618716818" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(233.9060627600553 81.27525822827378) rotate(0 
27.449387833183778 13.390851929034325)"><path d="M0 0 C13.340245188556029 6.507877301402344, 26.680490377112058 13.015754602804687, 54.898775666367555 26.78170385806865 M0 0 C16.504549297784564 8.051544797472674, 33.00909859556913 16.10308959494535, 54.898775666367555 26.78170385806865" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g transform="translate(93.48756125873672 296.15035056425285) rotate(0 6.281818150930462 6.573472501112704)"><path d="M-1.0481440220028162 1.0399139020591974 L14.247602601379867 -1.2627559211105108 L14.072568376392837 14.80711473218836 L-1.9469649586826563 11.20142443172373" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M-0.9302226919813625 -0.3434217814047571 C3.126674595896559 0.8471740464244658, 8.923615430391688 0.3069009453128152, 13.605127155423105 -0.8420332267100237 M-0.46818925146349366 -0.4943776068716876 C4.385062233064431 -0.1613314331863579, 7.439812459435915 0.6084334649972717, 12.651749926654212 0.5418942168523657 M11.491539432112859 0.4463071512517389 C13.554224330711994 3.5212727741481733, 13.44494251160216 10.102772877534235, 13.091883152828864 12.33616175923646 M12.119324715333018 0.561483341972385 C12.1188182127935 5.289246859829869, 12.771397082599039 9.610384216516465, 12.312145744990739 13.335289376396702 M11.691406509554149 13.000615358163635 C7.715706545076113 13.197389991424775, 4.058742761156982 13.636908849246149, -0.6670739142189769 13.42187340754191 M13.029577927124652 13.193430516230412 C9.987684687483556 13.158171384218846, 7.660899297088668 13.512479694659788, 0.2611278410700292 12.95265963653631 M1.1558128573929323 12.38418377224846 C0.058841857271697495 9.010535033433078, 0.8336101418223658 2.6282846327386196, -0.16117039798512778 0.5057711768986115 M-0.6192909244193764 12.878278414654526 C0.6716345808381846 8.441358257382207, 0.19797994362715898 4.362724373226808, -0.26037401433240664 -0.03462371210619508" stroke="#000" stroke-width="1" fill="none"></path></g><g 
transform="translate(138.61318726750665 201.58598281536746) rotate(0 14.902507806102847 12.09071388042306)"><path d="M1.6839662995189428 -1.2627559211105108 L31.313947686737574 1.660169729962945 L27.858050653523005 22.235907190344427 L-0.22345868684351444 25.60715513756046" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M-1.3548203613609076 1.4663367476314306 C12.23703141501568 0.6877490337264924, 23.049717923447933 -1.258958393775186, 29.94872760567645 1.1216368284076452 M0.8368502063676715 -0.40368842612951994 C6.308821332682734 1.0562618606503382, 13.645945771680902 1.097203903716363, 30.258644162489272 -0.8962492598220706 M31.49121331963519 -1.5504646692425013 C31.127773776666686 6.758283132989625, 30.411426737920806 9.174168615275462, 31.5352264026163 23.186491672746037 M29.227822600675918 0.316140447743237 C30.05712246369953 8.155130397104713, 29.258985424753376 18.277724968353397, 30.502997665478087 24.473121274127756 M29.483580646848477 24.194215004197453 C21.11037816198577 24.859169100537464, 8.84778091699123 24.584371005311176, -1.8407668378204107 24.52539933179984 M29.863602190090514 24.214128840579782 C21.598332696959524 24.800206495650094, 12.779698450105386 23.8942014409378, 0.6456922991201282 24.39439858855227 M0.1321082804352045 25.180232648602818 C1.0981460312018734 14.746798717522147, -0.4577739140381473 8.919403833682686, 0.30974639393389225 -0.2318184170871973 M0.9985529882833362 23.332725185766016 C0.5058130405034518 13.894355941444005, -0.1682833948050046 5.04189844791545, -0.5240720110014081 0.5199569510295987" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(171.42285698933347 249.62708576601804) rotate(0 11.799059530240754 8.987265604560989)"><path d="M-1.2627559211105108 1.5089320745319128 L25.258288790444453 -1.9469649586826563 L21.652598489979823 17.75107252227847 L1.4257273767143488 19.260732587452573" stroke="none" stroke-width="0" fill="#e08fff"></path><path d="M1.4663367476314306 
1.5411449167877436 C5.938488469858327 1.817800987956855, 9.323991251045184 0.3035680544355844, 24.719755888889154 -1.4686559345573187 M-0.40368842612951994 -0.44401769805699587 C6.142344917649696 -0.057213197855086106, 11.503934787612994 0.2561476422270825, 22.701869800659438 -0.8480208711698651 M22.204675281184493 1.366020329997454 C25.05620003327256 4.6488064586976545, 22.550633692534493 9.408158870380177, 22.70394357214968 19.268779529007034 M23.88224287770284 0.37346853077435904 C22.99629756443374 3.8335934428187404, 24.009156134221953 8.982158745655758, 23.860271768380475 18.277922978274244 M23.61090630383285 19.22755502166931 C18.86943699426957 16.87469273652848, 13.153501360853754 16.09583806838807, 0.34397157095372677 17.826232716466826 M23.630820140215178 18.488351033094865 C15.365061821711071 18.404044855862477, 5.629276533297297 18.63404883864484, 0.21297082770615816 18.289355413082582 M0.8976524813403239 19.152525278445573 C-1.1709279814747136 10.781589534549676, 0.7957763081385391 4.307730194025288, -0.2083413686391542 1.701459044047568 M-0.7627515461519696 18.224375672868646 C-0.5948674996519207 13.919579681237186, 0.026126693215926047 10.105598531421434, 0.46729912218407166 0.7567126201453225" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(106.70463728072718 248.7231407535997) rotate(0 10.764576771620057 9.332093190767878)"><path d="M1.5089320745319128 1.660169729962945 L19.582188584557457 -1.9455205705016851 L21.3056948563966 20.089913758250113 L1.2862013783305883 16.89091977736556" stroke="none" stroke-width="0" fill="#f938c5"></path><path d="M1.5411449167877436 0.6062782611697912 C7.528200517594965 -1.4982741112611493, 11.8842231746766 0.9637918834783833, 20.060497608682795 -1.4906170163303614 M-0.44401769805699587 0.8216970907524228 C4.904674467107502 0.9495498349161614, 9.98244206409573 -0.8093385587243568, 20.681132672070248 -0.8154722405597568 M22.947585976521076 1.450664879471047 C22.436441785708272 
5.081823940215335, 23.30115991825645 11.53723664279354, 22.8730601845474 17.402643983506568 M21.916951488283235 -0.7843355038412898 C21.064394186185222 7.460871216624397, 22.072788520449183 14.65918374900089, 21.844185988882305 18.016306300482693 M22.78217735578744 19.376641853513355 C14.259247027626172 19.103220250914596, 7.272398013551818 19.00953444511702, -0.14829849265515804 20.147647424401875 M22.042973367212994 19.260909376466806 C15.32677985582176 18.36194458742067, 8.938721834848007 18.88419747490808, 0.3148242039605975 19.543335763538416 M1.223191893596719 17.340045562217615 C-0.24113556688164858 14.865814647311284, 0.276729753526604 9.415062015753866, 1.7667414159061516 -1.7583645836808892 M0.25943061232083775 17.970211710200108 C-0.5560142539170538 13.778606770753926, -0.6754406753854238 9.476563601161759, 0.7857465218611658 -0.5892077966498611" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(128.44130931754898 297.1047539683108) rotate(0 6.281818150930462 6.573472501112704)"><path d="M-1.9469649586826563 -1.9455205705016851 L12.34017761501741 1.4257273767143488 L13.849837680191513 11.373678398055212 L-1.352249899879098 14.692027938288824" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M1.0414908535621796 -0.8420332267100237 C4.780494231132144 0.6665065608890127, 8.457850989728657 -0.9744638007822969, 11.57488108811755 1.0513881631940682 M0.08811362479328744 0.5418942168523657 C3.5468726401372694 -0.666242520795934, 5.991111577995895 -0.6154998662179277, 12.77688891401901 0.5296193682294073 M13.091883152828864 -0.8107832429889554 C13.800899167751494 2.270871312397336, 13.610189932491613 4.961156454836958, 13.686602985805695 12.388112524400375 M12.312145744990739 0.18834437417128747 C13.077085411385177 4.551455537444534, 12.840175861762242 8.26306023308607, 12.487074559048038 13.041297806941836 M11.896562387641948 13.42187340754191 C7.630176444219054 13.40777684516352, 4.916230676975727 14.123583409908859, 
0.09297102800999335 13.22055104789701 M12.824764142930954 12.95265963653631 C9.159073124510334 13.355699510148446, 5.141052466133164 13.774180763562478, -0.3644593735262613 13.18843901192172 M-0.16117039798512778 13.652716179124027 C0.11745001034315858 8.393914656195808, 1.1583215110963072 2.70707285932228, -0.53733317514178 1.3127921218768859 M-0.26037401433240664 13.11232129011922 C-0.38104558003754274 10.66043838127585, 0.5169270233889314 7.07531265030333, 0.49594617489912385 0.5456540033545563" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(158.5583118961415 297.4859064979682) rotate(0 7.661128495758078 6.228644914905814)"><path d="M-1.9455205705016851 -0.22345868684351444 L16.747984368230505 1.2862013783305883 L13.548990387345953 11.105039929932538 L1.5450829360634089 11.155696546620312" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M-1.0269200082730467 0.11009960483612535 C6.63666840806493 -1.2521734743158925, 10.412755192521713 -1.3328753026268696, 16.604500384053036 -0.618541780965719 M0.6608789249652576 0.3475306613067197 C3.229713695241281 -0.6711718651448846, 7.207939777262348 0.23858470070857954, 15.968165856809671 -0.5939154529599925 M14.55400536763256 1.0776868641406396 C14.79937031496967 5.5808922011800455, 14.577072405615612 10.431879623835457, 14.603230928279373 12.851115148258028 M15.500721300636569 0.43474823668108187 C15.399242597014057 4.4651197394247815, 14.785133718165758 8.43716125514713, 15.222151778393908 12.461272189725435 M15.65755193155697 11.047054702284392 C10.02914329084076 12.262321993184518, 5.345486076157367 12.489320221305457, 0.0897678602605192 12.50739526456957 M15.085312031374464 12.951962997039711 C10.101451642726984 12.762173529936303, 5.227854556805708 12.076498274605221, 0.05060492558838725 12.839888459176404 M0.47923971212509286 12.65021985996031 C-0.05601254256168478 9.082891474908488, -1.0852300890680031 2.8409469331335435, 1.243926398547002 -1.0572533957080235 
M-0.03280744055883855 12.130863983174796 C0.41650049821296165 8.898233736980236, -0.3006251360266924 4.760224935960268, 0.517030387320713 -0.6063476694699271" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(182.8197522626466 296.9274893892056) rotate(0 7.661128495758135 6.228644914905814)"><path d="M-0.22345868684351444 1.4257273767143488 L16.608458369846744 -1.7732666041702032 L13.970007091637058 14.002372765875045 L-1.3015932831913233 11.24791690144557" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M0.11009960483612535 0.8593003868005527 C3.935998480685373 -1.039288849179737, 8.916451316129898 1.384935393364184, 14.703715210550437 -0.6803353278410714 M0.3475306613067197 -0.6866280743724951 C5.206022810712122 -0.7569195469582554, 11.971481873389413 0.5137330803004635, 14.728341538556164 0.5822269940824538 M16.399943855656794 -0.6197103605800823 C16.269292376994336 3.863415519345308, 16.436369346976868 8.691821515322115, 15.716082309962548 12.974956263060147 M15.757005228197238 0.18168553182127212 C15.484863747893188 2.257764338682755, 15.202051881423534 5.2802701582973155, 15.326239351429955 12.847521849725576 M13.912021863988912 12.720810870208062 C10.066145221432839 13.627317890600573, 5.150645060266301 12.580587184076236, 0.05010543475793483 13.244577768836443 M15.816930158744231 12.620449517502118 C10.052456453964936 13.083428476032159, 3.855789793070943 12.460505763371685, 0.3825986293647672 12.959376022065092 M0.19293003014867516 12.312898369334468 C1.194471650214235 8.4302830530395, -0.4938744674786677 7.994420688781336, -1.0572533957080235 0.34631055030647984 M-0.3264258466368386 12.78115255171168 C0.5737844980667266 9.253217731657708, 0.5192731962967294 7.352701845503714, -0.6063476694699271 -0.6058978404149993" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(209.19345940515484 296.4246466615356) rotate(0 7.661128495758021 6.228644914905814)"><path d="M1.4257273767143488 
1.2862013783305883 L13.548990387345953 -1.352249899879098 L16.867339927579565 11.155696546620312 L-1.209372928366065 10.816664671486798" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M0.8593003868005527 -1.1251561830701284 C4.724869159905446 -1.216867971864718, 13.015943412361855 -0.6295517414493315, 14.641921663675085 1.2590253993689795 M-0.6866280743724951 -0.6496796861117045 C4.186855125821445 0.1905707474804157, 10.269106640866813 -0.6634215852213364, 15.90448398559861 0.5954575151963637 M14.702546630936073 0.8969817485071647 C14.403423559193655 5.301782037143338, 14.526405765891013 8.869271955753678, 15.839923424764667 11.410290566577928 M15.503942523337427 0.21026635723277942 C14.925526506313211 3.188045431149141, 15.257844640217991 6.525164240685831, 15.712489011430096 12.679171437443138 M15.585778031912582 12.343676449015796 C12.675623019158937 12.42965116438401, 7.845759773492527 12.366371751816846, 0.7872879390248073 13.371604137919698 M15.485416679206638 12.698480697823324 C10.762179211180383 12.111740075937858, 4.905651079935272 12.93882309262202, 0.5020861922534563 11.91376691614686 M-0.14439146047716878 13.636489898990527 C-1.2215329781959963 8.363672642707565, 1.1960024273344927 4.23109415692349, 0.34631055030647984 -0.9263777631291734 M0.3238627219000443 12.981731236230207 C-0.38670142422784204 9.68346390199827, 0.5235918754324808 5.3634300954178284, -0.6058978404149993 -0.06959224067496939" stroke="#000" stroke-width="1" fill="none"></path></g><g><g transform="translate(149.45728946832332 227.4707988107309) rotate(0 -11.697563280880559 8.772680379480363)"><path d="M-0.8880353961139917 1.6433941815048456 C-4.543679509754321 5.494981965602701, -8.887153174860638 5.837573503404779, -24.155102059371984 15.813087198943379 M0.759975497610867 0.7772451741620898 C-6.951276746958386 5.0345740345273144, -13.988122486323556 10.979206876689743, -21.739014656409836 16.768115584798636" stroke="#ccc" stroke-width="1" 
fill="none"></path></g></g><g><g transform="translate(152.86967728285663 227.8691963222181) rotate(0 13.692874450714157 10.239304725707782)"><path d="M1.6433941815048456 0.2805354204028845 C12.115812330402305 8.450372697654618, 19.066658725791093 13.269041799010857, 25.653475341410967 20.02286041187736 M0.7772451741620898 0.40180197823792696 C9.719003543677257 8.09180576042503, 20.32391623107886 15.380801296378355, 26.608503727266225 20.19807403101268" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(115.20521899557366 271.7381050958482) rotate(0 -3.5845992397433974 11.677531061903167)"><path d="M1.7252782676368952 0.9072571005672216 C-3.7361799191971983 3.9832129191479577, -5.700508883514224 12.02579473949178, -6.631086028164418 20.623351611615277 M-0.6167084770277143 0.8651053952053189 C-3.452401432330565 9.090765261662712, -6.7581277887958455 17.56218569059391, -8.89447674712369 22.489956728601015" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(119.30410431190398 274.44774088168884) rotate(0 7.987897081870074 10.214256913514816)"><path d="M0.9072571005672216 -1.7924985196441412 C3.987023390061884 6.152750549626826, 12.016045240021429 17.381211332409038, 13.24408365154909 22.221012346673774 M0.8651053952053189 -0.49746804405003786 C6.391486559510588 7.234621547348584, 12.113870022848557 15.152643040063653, 15.110688768534828 21.116614365246164" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(174.48117391670644 271.54470808548206) rotate(0 -5.626211785070041 10.643295373582745)"><path d="M-1.6960417423397303 -1.6309444811195135 C-3.520622159013798 10.128240890278185, -10.049927853035678 18.50380781137655, -9.577507738604027 22.917535228285004 M0.7200456606224179 -0.675916095264256 C-1.6076053558089534 4.637823349292976, -4.798710074596285 10.900963868586292, -11.9724692307625 21.731347246657123" stroke="#ccc" stroke-width="1" 
fill="none"></path></g></g><g><g transform="translate(189.46056306493358 271.6163345066776) rotate(0 0.10445925753458596 11.478089711611148)"><path d="M-1.6309444811195135 0.6789518799632788 C1.8260051794139314 6.052580603754815, 1.9966574758691138 16.752308511036624, 1.7790968439067 20.757288702728204 M-0.675916095264256 0.854165499098599 C-0.30955146261400834 8.808755961378266, 1.0602328449291973 16.151783877641392, 0.5929088622788186 22.277227543259016" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(196.1394872677812 271.658605883659) rotate(0 8.31480735110199 11.693242553687398)"><path d="M0.6789518799632788 1.6861977074295282 C1.5563356962139068 6.3250550511791435, 7.956635522567357 10.28607534733735, 14.430723981709889 23.96367811890454 M0.854165499098599 -0.5771930115297437 C4.396232889873626 6.085425711676383, 7.159622623721761 10.851284920842774, 15.950662822240702 22.931449381766328" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(150.48844094030744 146.0666644239527) rotate(0 0.4820803153887141 20.65912277933048)"><path d="M-0.8983009401708841 -1.0846829887479543 C-0.4825426286831657 6.038177052176947, 1.3350375337525406 35.46382285523889, 1.862461570948267 42.40292854740891 M0.8309523368999363 0.9602544968202711 C1.114898973554357 7.801758659630474, 0.6219958722590605 34.099780903455894, 0.7398274347557663 40.850540399212875" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(150.48844094030744 146.0666644239527) rotate(0 0.4820803153887141 20.65912277933048)"><path d="M-8.103164856855276 20.841693136183807 C-2.7994453142706845 26.378954251456772, -4.147756390803014 32.07181256180475, 1.3637297761066793 41.240459749136605 M-5.728209069531861 22.332503024762758 C-2.8488482554373027 29.000476914302446, -1.1374554925382698 36.92471136188208, 0.5753684493153333 40.25085420407031" stroke="currentColor" stroke-width="1" fill="none"></path></g><g 
transform="translate(150.48844094030744 146.0666644239527) rotate(0 0.4820803153887141 20.65912277933048)"><path d="M6.047930538232997 20.871073095337536 C7.851798443471957 26.295134931080266, 3.0035538986470023 31.980726814380134, 1.3637297761066793 41.240459749136605 M8.422886325556412 22.361882983916487 C5.785470410464322 29.047978763015742, 1.9801009677463268 36.960759521411845, 0.5753684493153333 40.25085420407031" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(86.91304841660292 85.70572291836558) rotate(0 -6.069917747407203 7.8656134531734105)"><path d="M0 0 C-3.301496364709221 4.278195406017231, -6.602992729418442 8.556390812034461, -12.139835494814406 15.73122690634682 M0 0 C-2.4435118269815015 3.166388787970618, -4.887023653963003 6.332777575941236, -12.139835494814406 15.73122690634682" stroke="#ccc" stroke-width="1" fill="none"></path></g></g></svg>
<figcaption>Clearing bloat in Indexes</figcaption></p>
</figure>
<p>If for some reason you have to stop the rebuild midway, the new index is not dropped. Instead, it is left in an invalid state and keeps consuming space. To identify invalid indexes left behind by <code>REINDEX CONCURRENTLY</code>, use the following query:</p>
<div class="highlight"><pre><span></span><span class="c1">-- Identify invalid indexes that were created during index rebuild</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="k">c</span><span class="p">.</span><span class="n">relname</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">index_name</span><span class="p">,</span>
<span class="w"> </span><span class="n">pg_size_pretty</span><span class="p">(</span><span class="n">pg_relation_size</span><span class="p">(</span><span class="k">c</span><span class="p">.</span><span class="n">oid</span><span class="p">))</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">pg_index</span><span class="w"> </span><span class="n">i</span>
<span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">pg_class</span><span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">i</span><span class="p">.</span><span class="n">indexrelid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">c</span><span class="p">.</span><span class="n">oid</span>
<span class="k">WHERE</span>
<span class="w"> </span><span class="c1">-- New index built using REINDEX CONCURRENTLY</span>
<span class="w"> </span><span class="k">c</span><span class="p">.</span><span class="n">relname</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'%_ccnew'</span>
<span class="w"> </span><span class="c1">-- In INVALID state</span>
<span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="n">indisvalid</span>
<span class="k">LIMIT</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span>
</pre></div>
<p>Once the rebuild process is no longer active, it should be safe to drop any remaining invalid indexes.</p>
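<p>A minimal sketch of that cleanup, assuming PostgreSQL 12 or later and a hypothetical leftover index named <code>sale_sold_at_ix_ccnew</code> (substitute a name returned by the query above). The first statement checks the <code>pg_stat_progress_create_index</code> view, which also reports running <code>REINDEX</code> commands, and the second drops the invalid index without blocking writes to the table:</p>
<div class="highlight"><pre><span></span>-- Make sure no index build or rebuild is still in progress
SELECT pid, command, phase
FROM pg_stat_progress_create_index;

-- Hypothetical index name; replace it with an invalid index found above.
-- DROP INDEX CONCURRENTLY cannot run inside a transaction block.
DROP INDEX CONCURRENTLY IF EXISTS sale_sold_at_ix_ccnew;
</pre></div>
<p>If the first query returns rows, a build or rebuild may still be running, so it is probably wiser to wait before dropping anything.</p>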
<h4 id="activating-b-tree-index-deduplication"><a class="toclink" href="#activating-b-tree-index-deduplication">Activating B-Tree Index Deduplication</a></h4>
<p>PostgreSQL 13 introduced a new, more efficient way of storing duplicate values in B-Tree indexes called <a href="https://www.postgresql.org/docs/current/btree-implementation.html#BTREE-DEDUPLICATION" rel="noopener">"B-Tree Deduplication"</a>.</p>
<p>For each indexed value, a B-Tree index holds in its leaves both the value and a pointer to the row (TID). The larger the indexed values, the larger the index. Up to and including PostgreSQL 12, when an index contained many duplicate values, every one of those duplicates was stored separately in the index leaves. This is not very efficient and can take up a lot of space.</p>
<figure>
<p><svg version="1.1" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 307.18219907171897 313.43858782190574" width="auto" height="50vh">
<g transform="translate(10 107.6412908258085) rotate(0 3.2515151206274595 6.5734725011127075)"><path d="M-0.4485612418502569 -0.4997053537517786 L5.225899289149993 -0.3881890568882227 L8.229065189857238 14.57994898475446 L0.5352406594902277 13.748819255006687" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M0.40972277036861016 -0.5883272895292948 C1.0568773124552022 0.25752631693984657, 3.3622637583959856 -0.21277034508830905, 6.012274169592565 -0.04777237005700674 M-0.2872641045593696 -0.11420775746779133 C1.6014323836369901 0.11785649128669029, 4.083158833132231 0.10432876751219083, 6.682955778322728 -0.2615904198902685 M6.021882515627821 -0.08882281421537641 C7.463708399526013 1.9044728441130412, 7.35920955562554 6.18011459499136, 6.990542116238773 12.801108364339493 M6.123002482002408 -0.3876052728191664 C7.155053783548244 4.374456628539132, 6.115607548598652 8.635350305214427, 6.178960106540743 13.507330362256337 M6.90553007803129 13.763417261198372 C4.295392697003441 12.787946379765202, 2.6280506957903556 13.335979590829707, 0.056463214848035626 12.612511797688947 M6.7703396987202975 13.129161615514041 C4.634502191435737 12.982269663225896, 2.88706590760344 13.126123710662355, 0.12038776119571343 13.276742095112123 M-0.9162939349060555 14.304178639203204 C-0.587151479793689 11.006612650585243, -0.33109782446759545 7.2176473450205645, -0.6505719101596745 -0.9638430020250367 M-0.21321498499324226 13.789646635703976 C-0.42551721426394473 10.579754929318598, -0.6352463516175878 7.204710819764004, -0.4939289143367984 -0.4578624158084681" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(137.17690806005203 10) rotate(0 14.90250780610279 12.090713880423053)"><path d="M-1.277130952104926 -0.3881890568882227 L31.53105056080798 1.4330039825290442 L30.34025627169589 24.783302013627384 L1.0720560047775507 25.396834377995823" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M-1.2718049678951502 0.7023947332054377 
C8.902038422181352 -0.47435249873357027, 13.702428671999067 0.12272171191019804, 29.664922544544734 1.5675309393554926 M-0.9347187532112002 0.19106374215334654 C7.22507303045762 0.254147921091956, 13.327710158669897 -0.6998299915724575, 28.890552454440094 -0.05108850169926882 M31.276680299824275 -1.2738639619201422 C31.02921134225174 8.088833235979468, 30.45815241328522 13.53265569327849, 29.972988376205958 24.39471881916294 M30.775587730853058 0.4867392284795642 C29.054515081910058 7.780296401193272, 29.172220278052254 14.50739677916821, 29.993785106389023 24.273681485890542 M29.39298892933706 23.241419010660742 C24.146448604055045 25.20943396320054, 16.986294005701737 22.820328822037666, -0.3206170592457056 23.675360911390875 M29.737760507313705 23.710259521245156 C21.627311828867033 24.13637996390211, 13.344046873870067 24.564316190031644, -0.7201772285625339 24.506427818774377 M-0.8282135520130396 25.485058539412115 C-0.2701521419483307 16.83812933247224, -0.8211572073894623 7.8300646427442775, -1.9115876350551844 1.75326825119555 M-0.4719980312511325 24.364461237429772 C-0.7126177489960982 16.049894760950014, -0.6729631662572219 8.947619048325235, -0.22428062092512846 -0.2498526768758893" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(140.75580855110957 59.579564489112144) rotate(0 11.799059530240811 8.987265604560996)"><path d="M-0.3881890568882227 1.7260349486023188 L25.031123043010552 0.5352406594902277 L24.19999331326278 19.046587213899535 L1.2154066171497107 18.867347642805022" stroke="none" stroke-width="0" fill="#e08fff"></path><path d="M0.7023947332054377 1.2961665596812963 C7.822235226760403 -1.359254027730538, 16.985542288308533 0.009967957609580302, 25.165649999837 -1.7140263710170984 M0.19106374215334654 0.837897484190762 C5.1440539349252 0.7739351103996087, 9.184670584911409 -0.6938871135021399, 23.54703055878224 -0.09306552540510893 M22.45326368349604 1.1797531008921722 C24.53813161218691 5.401060441292348, 
23.113146598477698 9.620027919336295, 23.78980939969863 17.770229102870857 M24.035564533132003 -0.5488541645047786 C24.21175954519893 3.9101404605429932, 23.984851741397428 8.87607183616862, 23.68102993347995 17.237181445481024 M22.658110310296138 18.81215101110522 C15.80355823823092 18.09956794929363, 4.874430741971366 17.605298876284138, -0.5060668494552374 17.621641155358 M23.126950820880552 18.21243743135367 C14.549917906276974 18.328911542200363, 5.958399632481502 17.238483040832794, 0.32500005792826414 18.05232746436034 M1.1716076057253306 17.580270795136506 C-0.06901650395889575 12.94546756458809, -1.4759854346175836 7.977335942277191, 1.575708744953857 0.010563147031878994 M0.16449704685835442 17.263591918655212 C-0.9512462743014427 12.489887145901578, -0.8017310785488132 8.154277459091253, -0.2245492369094172 -0.5738957539186418" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(76.03758884250328 58.6756194766938) rotate(0 10.764576771620114 9.332093190767885)"><path d="M1.7260349486023188 1.4330039825290442 L22.06439420273034 0.6018742527812719 L22.601209548017664 19.879592998685474 L0.8928164336830378 19.806256695451374" stroke="none" stroke-width="0" fill="#f938c5"></path><path d="M1.2961665596812963 -0.7439976241439581 C3.0016946347653546 -0.22253323379590498, 8.881926307871874 1.4919211190597774, 19.815127172223015 -1.766955366358161 M0.837897484190762 0.14945937227457762 C6.255904940846837 -0.8456529468034902, 10.490628591655744 -0.09222290811292319, 21.436088017835004 -0.36597683001309633 M22.754171959746667 0.9886892063244848 C22.358591565597294 4.982187460523971, 21.82324441206047 11.1453607144933, 21.317012678766485 17.585166285590184 M20.959240686909233 0.7472384357416592 C21.25023366204446 5.010345366906043, 21.88646438356101 9.083533714140383, 20.763512800966286 19.241788798673088 M22.36677334522335 19.409647914381857 C13.20314542379133 16.919354809645334, 4.209216217569271 18.056931568506876, 
-0.35289005376398563 20.308401140185232 M21.7670597654718 18.635439727256234 C15.558407325299987 18.961937419635095, 8.126981333187256 18.887747905713358, 0.0777962552383542 17.967222992131646 M-0.4093875808984646 18.756809601820677 C-0.8463009823342184 10.053723702829119, -1.7066217347730308 5.816085192200483, 0.010968438769546562 -0.6053853908381921 M-0.738216939782809 17.85892575364117 C-0.6131163213568973 10.590086645170809, -0.12120617326576756 3.5828395895087928, -0.5959152530928642 -0.18113082272585945" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(208.83633044969838 59.65253340982865) rotate(0 11.454231944033836 8.642438018354092)"><path d="M1.4330039825290442 0.5352406594902277 L23.510338140848944 1.0720560047775507 L24.123870505217383 18.177692470391214 L1.1420703139156103 18.160999057133616" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M-0.7439976241439581 -1.5093150530010462 C5.268860843935556 -0.2885589917878272, 12.399099454226985 -1.862492295143998, 21.14150852170951 -0.7024894747883081 M0.14945937227457762 0.5533590661361814 C3.926723545686624 -1.1064126003504355, 9.41139567045365 -0.28501496798997716, 22.542487058054576 -0.06756156217306852 M23.824087533310326 0.6409536454357037 C22.343684103943463 6.602509315831301, 22.982706459857805 12.718216443305142, 21.909184936482234 16.26567209378156 M23.600480323589117 -0.42606948646896065 C23.466410896191142 7.026227551886576, 23.07232587039827 13.017110588117907, 23.443380651897165 18.104159215557484 M23.653925420913765 17.458528050471962 C14.864226032601373 17.060201892390786, 9.601208544313394 17.027928897872506, 1.6442147586494684 17.175490805198372 M22.879717233788142 17.65512730951321 C16.58918540766793 16.619199469153056, 9.796527841496447 17.417172952953944, -0.6969633894041181 18.16510611052525 M0.08577823045794819 16.429540077048276 C-1.3798729439898347 11.163113889357431, 1.402585737024807 8.355665127824196, -0.5606465356252652 
1.6899771104825243 M-0.7457507038276036 16.635485622903886 C-0.6336092861702043 12.721422835163999, -0.2475707368951354 7.993759651888079, -0.16774499317798974 0.7458575030404262" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(124.72110028775455 108.09106587374295) rotate(0 3.418704253333999 5.925614611875506)"><path d="M0.6018742527812719 1.0720560047775507 L8.052815123817709 0.8928164336830378 L7.9794788205836085 12.727352244176451 L-0.6656810436397791 10.761571687974516" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M-0.05022876986474489 -0.0478936766275021 C3.0302360340223 -0.5343759065791224, 4.384617347481149 -0.18856135762495851, 6.1983031112131615 0.13063808559151124 M-0.2750410955916792 -0.3126279086970385 C2.6066616428069875 -0.06965177061453479, 5.123137926686378 -0.060932719856709755, 7.088967823020158 -0.21774820723926447 M6.5256562882732565 0.09953418647572732 C7.026575529516304 4.1985172976505485, 6.215040297569077 7.967632892071869, 7.987655772295194 12.428075040641316 M7.162275577298287 0.11185752730092757 C6.986969553286865 2.3454515880968025, 7.2990647110641085 6.224958097032192, 6.7153329400435595 11.572722744481549 M6.275495367196217 11.74161973333754 C4.326819227625516 11.736523199755334, 2.7243569256027715 11.819770296910106, -0.04598506263051072 11.529072250799075 M6.973879615385016 11.60502192830691 C4.734627659937296 11.903141054641512, 2.1713598267545695 12.177469445500924, -0.1415708596467919 12.074065628124048 M-0.8688500903269519 10.718496061534658 C1.0330378848803383 8.562774612047857, -0.39028387244587615 6.419109233091924, -0.5593756861516365 0.2169171686613014 M-0.41273713716517735 11.718329171299567 C-0.0457895333663373 8.85496482178756, -0.01274940345849064 7.126869898533532, 0.4245714668874948 0.15858149363725882" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(208.6725173442362 107.83334806567969) rotate(0 3.721734556364254 
6.228644914905818)"><path d="M1.0720560047775507 1.2154066171497107 L8.336285546411546 1.1420703139156103 L8.319592133153947 11.791608786171857 L-1.089657535776496 13.598380362576428" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M-0.052138921102075075 0.5833934065169456 C1.31822944950221 -0.6521457852012162, 3.6508411987310283 -0.6902869174099404, 7.585686819056583 0.6236864043206861 M-0.34033891347780887 -0.019013784220704266 C2.3096538377412745 -0.07632277244874705, 4.663436091801682 0.33374135522465154, 7.206419936369244 0.24427495952384054 M7.54809338296182 0.1328514265779983 C7.348070896914918 3.6768961840747947, 6.755117957962409 10.003232079071989, 8.049814276799392 11.696520570019727 M7.561046927731962 0.05746156953791037 C6.871033129599475 5.120182795696993, 7.972516227447658 9.341013241737045, 7.150720076638054 12.718151645824001 M7.324143953853045 12.268945181666842 C4.801651073417531 13.008050866192422, 2.3817953760380943 12.346056648063778, -0.35071262383683843 12.634374591502384 M7.175438264903725 12.578246224452837 C5.260779563379984 12.227612109206918, 3.3477808800638025 12.332883832464718, 0.24258838586645654 12.3756558450437 M-1.1906600602483313 13.549338367539134 C0.06896526998816997 8.25422157242456, 1.2796187498938543 4.271292772357941, 0.2280101066360689 -0.9854361918769434 M-0.13969643490372208 12.301665469281774 C-0.30413602763974856 9.674140230293965, 0.631146673163188 7.579063458738153, 0.16669120059923215 0.18744305019994045" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(251.7198841604038 107.96220696971129) rotate(0 3.1156739503036306 5.622584308845209)"><path d="M1.2154066171497107 0.8928164336830378 L7.3734182145228715 0.8761230204254389 L5.565666856967482 10.155511081913922 L1.1410905327647924 12.8262757069864" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M0.48839153140448877 -0.534034731431144 C0.8486253302073951 -0.2145079724807483, 2.2158462476289715 
0.12342330807023508, 6.753470973510883 0.09313333456492678 M-0.01591751139044545 -0.028996183317601842 C2.1001952081184965 0.03593340085280666, 4.6577046779140465 -0.1414638573360066, 6.435844249439673 0.16504513736205512 M6.351272596378179 -0.12781482904974384 C5.551657554781466 3.1274690764815114, 7.293191662983952 8.128428954825841, 5.544603133294988 12.145590638319936 M6.283218335293994 -0.4612983963748014 C6.641003978574488 3.3521753122759543, 6.177165897951327 5.374070669100698, 6.46682729837771 11.454739633561825 M6.07367397061127 11.135219582907048 C4.820122934077816 11.201098363858968, 2.3424215534352237 10.941577913428246, 0.1482476438444822 11.227255577310995 M6.332607322040688 11.269407394278149 C3.9140760136078265 11.474542860204304, 1.9020840924799665 11.403376291957342, -0.06834041384439987 11.260630516148549 M0.9857898558368579 11.251777099888088 C-0.2823027865128511 9.527543975836796, -0.44832795266848535 5.816818038440428, -0.8895511215539019 -0.9703387393109475 M-0.14048177405253476 10.886129795108133 C-0.030805254607593854 9.201815757927404, 0.481185315732688 6.338456982986692, 0.16920443647929573 0.301386263533277" stroke="#000000" stroke-width="1" fill="none"></path></g><g><g transform="translate(148.02101026086848 35.884815995363454) rotate(0 -23.971252877089796 9.31638414407385)"><path d="M0 0 C-12.698796441889426 4.935364306014655, -25.397592883778852 9.87072861202931, -47.94250575417959 18.6327682881477 M0 0 C-14.05884466792951 5.463944593128704, -28.11768933585902 10.927889186257408, -47.94250575417959 18.6327682881477" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(151.43339807540178 36.28321350685064) rotate(0 0.08027251050077666 11.80897408051031)"><path d="M0 0 C0.033157062113982214 4.877770542458591, 0.06631412422796443 9.755541084917182, 0.16054502100155332 23.61794816102062 M0 0 C0.05997692820951473 8.823269463435237, 0.11995385641902946 17.646538926870473, 0.16054502100155332 
23.61794816102062" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(154.36003610032094 34.93365429834114) rotate(0 26.90922922078005 10.624632932971565)"><path d="M0 0 C17.17381313914491 6.780776185218786, 34.34762627828982 13.561552370437571, 53.8184584415601 21.24926586594313 M0 0 C12.717667797728016 5.021346059588701, 25.43533559545603 10.042692119177403, 53.8184584415601 21.24926586594313" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(71.2048372240165 80.15212228048075) rotate(0 -23.10081459280127 11.698068685788044)"><path d="M0 0 C-17.73171164169389 8.979197675815334, -35.46342328338778 17.958395351630667, -46.201629185602656 23.39613737157609 M0 0 C-16.89291917980909 8.554439847781541, -33.78583835961818 17.108879695563083, -46.201629185602656 23.39613737157609" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(81.45756869419301 83.8873990919624) rotate(0 -14.554766232255702 10.289855877415818)"><path d="M0 0 C-9.167928984288652 6.481496606496379, -18.335857968577304 12.962993212992759, -29.109532464511403 20.579711754831635 M0 0 C-10.274865373056617 7.264072961494574, -20.549730746113234 14.528145922989149, -29.109532464511403 20.579711754831635" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(92.71052560552926 84.93456877733809) rotate(0 3.1552117382235565 10.074882880301189)"><path d="M0 0 C2.2423200471191946 7.1599352845956865, 4.484640094238389 14.319870569191373, 6.310423476447113 20.149765760602378 M0 0 C2.1271286593114835 6.7921185302007965, 4.254257318622967 13.584237060401593, 6.310423476447113 20.149765760602378" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(143.8141254784823 81.49718680857619) rotate(0 -5.565999043464103 11.056965635904575)"><path d="M0 0 C-2.4440095408266695 4.855072610635041, -4.888019081653339 9.710145221270082, -11.131998086928206 
22.11393127180915 M0 0 C-3.0467669246354205 6.052461907240674, -6.093533849270841 12.104923814481348, -11.131998086928206 22.11393127180915" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(157.79351462670957 82.56881322977169) rotate(0 0.4877464437154231 10.995352828391816)"><path d="M0 0 C0.2034426356074676 4.586242683395541, 0.4068852712149352 9.172485366791083, 0.9754928874308462 21.990705656783632 M0 0 C0.30084067805747333 6.781903677609435, 0.6016813561149467 13.56380735521887, 0.9754928874308462 21.990705656783632" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(171.0109003680186 84.63672563239413) rotate(0 7.832070467882659 11.11673366424695)"><path d="M0 0 C4.619216531648152 6.556452745208931, 9.238433063296304 13.112905490417862, 15.664140935765317 22.2334673284939 M0 0 C4.8662933406202615 6.9071501873755246, 9.732586681240523 13.814300374751049, 15.664140935765317 22.2334673284939" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(218.44693905993927 81.66042333333068) rotate(0 -2.497168498125802 13.759874370826921)"><path d="M0 0 C-1.4518210392148485 7.999810634930292, -2.903642078429697 15.999621269860585, -4.994336996251604 27.519748741653842 M0 0 C-1.4415344685922782 7.943129670237908, -2.8830689371845564 15.886259340475815, -4.994336996251604 27.519748741653842" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(224.86302491158904 83.77721257796638) rotate(0 8.393870813360422 10.293591016739072)"><path d="M0 0 C4.421930041653634 5.422711448091953, 8.843860083307268 10.845422896183907, 16.787741626720845 20.587182033478143 M0 0 C4.065784034826878 4.9859616555309065, 8.131568069653756 9.971923311061813, 16.787741626720845 20.587182033478143" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(231.67339697531725 82.31858859402979) rotate(0 15.20624662486216 
11.553439809358409)"><path d="M0 0 C8.91827641956437 6.775950197216492, 17.83655283912874 13.551900394432984, 30.41249324972432 23.106879618716818 M0 0 C7.330473150815566 5.569565088047047, 14.660946301631132 11.139130176094094, 30.41249324972432 23.106879618716818" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(233.9060627600552 81.27525822827378) rotate(0 27.449387833183778 13.390851929034326)"><path d="M0 0 C20.509262176994724 10.005195549529535, 41.01852435398945 20.01039109905907, 54.898775666367555 26.78170385806865 M0 0 C21.79795480087572 10.63386866287188, 43.59590960175144 21.26773732574376, 54.898775666367555 26.78170385806865" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(86.91304841660303 85.70572291836558) rotate(0 -6.069917747407203 7.86561345317341)"><path d="M0 0 C-2.8687257014201695 3.7173959202648605, -5.737451402840339 7.434791840529721, -12.139835494814406 15.73122690634682 M0 0 C-4.232844064705801 5.485068596647267, -8.465688129411602 10.970137193294534, -12.139835494814406 15.73122690634682" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g transform="translate(20.511979697757397 107.85943350261125) rotate(0 3.2515151206274595 6.5734725011127075)"><path d="M-0.8196858074516058 0.946388503536582 L5.622484455604308 -0.8256191406399012 L5.041492412585967 11.254417323243992 L1.796407887712121 11.55209275401868" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M-0.38481820062735256 -0.30378136238499065 C2.5249011509447694 0.43741566127245457, 3.417532931489477 0.30527252823361095, 6.668353311829706 -0.20877235249213555 M-0.26895931014899577 -0.2852865608539624 C1.1456036729190713 -0.019992757351968753, 3.108375164039332 0.021733284437424623, 6.337712334080037 -0.19228819481254883 M6.8365775197051315 0.9118283449767994 C5.339975719230488 3.110564883706303, 7.151402729680647 5.435810125445369, 7.651012127193148 13.007376746390577 M6.610484859853435 
0.4720909910286373 C5.76867946831169 3.54177053573575, 6.923369530707925 7.137166266952054, 6.185578141102774 13.495713152006502 M6.423926847509626 13.326383188048796 C4.76059252886008 13.610518734491542, 4.2779833677426 13.567967653408127, 0.18231681988016468 12.561028923246296 M6.770093129795381 12.93248747511016 C4.4225939772703375 13.338190399506399, 1.87442684917397 12.902693115996104, 0.17785706168388893 13.184645040481374 M-0.42952780439938343 12.142567349063825 C1.1096744941103647 8.651832580272302, -0.8942503227559575 2.12099341254441, -0.43831440655484255 0.8185677727746472 M0.5777536229940521 13.277647327338636 C-0.14769565002710763 7.58747469919055, 0.055687231419579736 3.058178433119986, -0.5557089063085043 0.40196612292429723" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(31.42107060684839 108.46549410867185) rotate(0 3.2515151206274595 6.5734725011127075)"><path d="M1.171240458264947 0.948160907253623 L7.178184280891173 -1.6393527183681726 L5.307961891193145 14.002619170320408 L0.24776811338961124 13.161546432149784" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M0.012468147842925492 -0.3306358143497638 C1.064852482341545 -0.3999144303560199, 2.6157867988567975 -0.01358438587305845, 6.668016710604478 0.4510285318158571 M-0.26974236150229436 0.2839199874725833 C1.3694024784719003 -0.18444659627667442, 2.5008866219214863 0.1117679569261476, 6.556181801118475 0.23351599864178052 M7.419661825338314 -0.6349042003042887 C7.241035600473209 1.5427821383082359, 6.905441875326629 4.487313002182269, 6.343109722900069 13.509708801992684 M6.976608419904325 0.18429171594030747 C6.01408542638882 3.4112232171923313, 7.187015631215586 6.476512928335731, 7.0429417867222766 12.713383845395635 M5.9933410096118935 13.502659125593192 C4.438421898951283 13.632588999286014, 1.6314659246618708 13.308651077280992, -0.21246246188724466 12.650137702046942 M6.268360630482741 13.03854066036039 C4.264833355256135 13.4075425624225, 
1.7719232486544427 13.08539136150543, 0.28578116677972965 13.211595852492891 M0.624600531380977 12.035527189608406 C0.8757196229759756 8.83798534539094, 0.6938932574640739 5.027837760729067, -0.5427184717388694 -0.9607378726091331 M-0.365772064693377 13.310748176957489 C0.25223677506953385 10.414736113596145, -0.49900129240444874 7.920345080213589, 0.34617934324233235 0.6559798221231616" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(44.14834333412091 107.85943350261122) rotate(0 3.2515151206274595 6.5734725011127075)"><path d="M-1.9521041419357061 1.9809646215289831 L5.646276066798919 -0.380755165591836 L7.070431957740539 14.195201479566471 L0.3884177301079035 14.459195040834324" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M-0.299089329926429 -0.6139092927352279 C1.7047641258511024 0.11320675086164175, 3.2898806960273355 -0.5967958766480737, 6.703207178736831 0.5386167327992468 M-0.292042415549828 0.0895159866654896 C2.166910503606255 -0.07405476697492555, 4.898346068214204 -0.22230980566128825, 6.758557405792482 0.25430080452066006 M6.956038471090809 1.161334706944889 C6.754038032016898 5.971062948901153, 5.672766270783421 10.164775140582723, 7.233468390394264 12.020346138978308 M7.050264784515254 -0.5979552519055283 C6.794731354856184 2.6988990025236808, 6.0727548741374235 6.322367531311375, 6.77475032387251 12.640024315656884 M7.124523052226328 13.447314445914403 C4.998650799560066 13.758997130029615, 3.669192240125631 13.101395762629284, -0.35509046726879073 13.2621093701448 M6.369769201400288 13.300804828687275 C4.10001216012668 13.052485205959043, 1.7456942551789116 12.879031743721843, 0.2920523704855198 12.88766069221487 M-0.06434650699909761 12.38511517442967 C0.31418206154881456 10.62676431742921, 1.2791990092999086 7.491777954123201, 0.6232709650461767 0.44381065135640774 M0.08143484400095924 13.151744107129653 C0.34991828267298297 10.83603375756053, -0.19005924045204664 7.847716902075161, 
-0.50186384019652 0.40612547455353565" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(54.952959545848444 108.05339246724017) rotate(0 3.2515151206274595 6.5734725011127075)"><path d="M-0.3225600626319647 1.7032166924327612 L6.475990663070434 -1.2274785432964563 L6.886266778011077 12.357810758722202 L-0.29906814359128475 12.828585707319156" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M-0.02280959086273382 0.5715623335594593 C2.485108715358828 -0.11607597820976925, 4.356523562378963 0.08512742355048628, 6.8119838531842944 -0.5497538511718403 M-0.14315539682238737 -0.1342256559835044 C2.1109570520834855 -0.33845108351307635, 4.989187709581103 -0.2900562753029167, 6.322104041464424 0.08102391838681372 M6.849972937750149 0.6923586864846647 C7.808555234801332 5.454641502279372, 7.119866555601186 9.813270266996097, 5.425406239873165 12.361370108617294 M6.77439103456709 0.5555482546209853 C6.250700480242252 4.940249460477603, 6.0174533698585915 9.41924259866577, 5.86142509643879 13.798035825490311 M6.843872419891931 13.27323961448198 C5.257376168426965 13.042361274196018, 2.8053970787212665 13.13843098106358, 0.3672933409371284 12.902155620934488 M6.606573173217494 13.03877033914437 C5.008519245653804 12.783082829948274, 3.068515993985178 12.990784439683727, 0.26543262655495425 12.924777091856686 M-0.01799375915264978 12.268567091541499 C-1.3653842653456747 8.733391644412123, -0.9590600717513276 6.337225828259107, 0.1400514940734865 0.2665502769564383 M0.2897059798678753 13.138900337260827 C-0.5375612354487048 8.552646333329626, -0.3988350700454671 2.9574690922774844, -0.40220543036761 0.5500500115910041" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(65.46493924360584 108.27153514404293) rotate(0 3.2515151206274595 6.5734725011127075)"><path d="M1.7032166924327612 -0.027039578184485435 L5.275551697958463 0.3832365367561579 L5.713895997751706 12.84787685863413 L-0.3183592949062586 
13.625420582217352" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M0.5715623335594593 0.12930170053495127 C2.4863628495070142 -0.35280950818251733, 5.259968134615154 0.31123457101564833, 5.9532763900830785 0.39765859717131535 M-0.1342256559835044 -0.23761061746430612 C1.3767078744465089 0.32068344984999725, 3.1094896753238817 -0.15229512042601723, 6.584054159641733 0.18317735917619538 M7.195388927739584 1.3119596442463233 C7.1967589529348714 4.503007958714704, 6.870657909831753 6.6818499859553, 5.717455347646799 13.709420063613909 M7.058578495875905 -0.2686167298128587 C6.937680960657253 3.605089348473023, 6.910115536789302 7.572301232522957, 7.154121064519814 12.865352501925413 M6.629324853511483 13.5736250864835 C4.552891406060703 12.828064371727539, 2.821855158369675 13.548322561026243, -0.24478938129092742 12.924669068055668 M6.394855578173875 13.381067144831029 C3.727530844759222 13.190405617761646, 1.4660436193611248 13.433086215172885, -0.22216791036872896 13.026914446797557 M-0.8783779106839158 11.864450555883971 C-0.9880040447525349 7.497793058768385, 0.0745906281034045 3.7410331148790332, 0.2665502769564383 1.0480397864456452 M-0.008044664964588777 12.657800062212186 C0.29577124800493504 8.368279772593052, -0.3072827059339366 3.888246661813062, 0.5500500115910041 -0.10601698508342061" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(76.37403015269683 108.87759575010352) rotate(0 3.2515151206274595 6.5734725011127075)"><path d="M-0.027039578184485435 -1.2274785432964563 L6.886266778011077 -0.7891342435032129 L6.203962097663634 12.828585707319156 L0.4784755799919367 13.641666793001072" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M0.12930170053495127 -0.08603903384087053 C1.8961880487966722 0.05800813456343909, 4.8115106440603 -0.6069100845277862, 6.900688838426234 -0.26652207970926245 M-0.23761061746430612 -0.30767911822070443 C2.5506810957857144 -0.23243917881384393, 4.336331250809895 
0.10786904958351459, 6.686207600431114 -0.015914240185207318 M7.814989885501243 0.7699116944595272 C7.160995475950619 3.6260643242282864, 5.460100509522688 5.578932452135637, 7.065505302643412 13.309814690227334 M6.23441351144206 0.433690898421486 C6.055438559720834 4.359807185029636, 5.915697255101224 8.558454244217097, 6.221437740954917 13.021800821691688 M6.929710325513002 13.023912121589788 C4.01466479445457 13.132611289683007, 2.5995223853145033 12.914785082160146, -0.22227593416974706 13.66522476498249 M6.737152383860533 12.840634440863269 C5.150389307172106 13.026701117982514, 4.017676941356815 12.903142159240478, -0.12003055542785823 13.351948902213365 M-1.2824944463414438 12.192152133421079 C-0.8743708768318317 8.978551546127415, 0.1418508699887515 4.86110475888265, 1.0480397864456452 0.48416295435474765 M-0.48914494001323006 13.544769620442896 C-0.4083599705835674 7.977098463710935, -0.460146626341185 3.7677606082302386, -0.10601698508342061 0.5598024045571448" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(89.10130287996935 108.2715351440429) rotate(0 3.2515151206274595 6.5734725011127075)"><path d="M-1.2274785432964563 0.3832365367561579 L5.713895997751706 -0.29906814359128475 L6.18467094634866 13.625420582217352 L0.4947217907756567 13.862507533949987" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M-0.08603903384087053 -0.35509046726879073 C2.043263849433127 0.29602344187587987, 3.306445111855644 0.38472842711782024, 6.236508161545657 0.30771965292371906 M-0.30767911822070443 0.2920523704855198 C1.4053139687211562 -0.13340407629763348, 3.410220475850216 0.2306994826690566, 6.487116001069712 -0.18841648793720334 M7.272941935714446 0.6232709650461767 C7.016076761259794 3.2134021957811, 5.7866912162952655 9.1445274557142, 6.665899929256837 13.156543212033892 M6.936721139676405 -0.50186384019652 C7.020265366732652 3.845199942475838, 7.2652307154440114 8.692017674283932, 6.377886060721192 13.333434981245533 
M6.379997360619292 12.793980153863838 C4.184315816473365 13.556906351588353, 1.6747383579370159 12.967337076481476, 0.5182797627570752 12.571428893075469 M6.196719679892773 13.16969703136913 C4.246426353326602 13.459202057301493, 1.9648724582829389 13.07373887531868, 0.20500389998795032 13.229083202964146 M-0.9547928688043372 12.270774749478319 C0.19563959511999382 9.360906754783926, 0.3543880680248299 6.342805305641109, 0.48416295435474765 1.0205337021254657 M0.3978246182174816 12.796526227615423 C-0.20362040272847226 8.22124411145443, 0.5516589445839218 3.9939370963352436, 0.5598024045571448 -0.00888719618187006" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(100.81501000078788 108.16246380564154) rotate(0 3.2515151206274595 6.5734725011127075)"><path d="M-1.2321522738784552 -1.2224123869091272 L7.434701750774138 0.7154360022395849 L5.314294348258727 11.676654183042423 L-0.4081327822059393 12.169415139329807" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M0.14693217811379866 0.35838861855894333 C2.8496969913305596 -0.07978627489817579, 4.483342512127156 0.13735878415267744, 6.343898704618293 -0.05908401625209314 M0.3167664452615834 0.05807295920052624 C1.9189009456011623 -0.17439104941656802, 4.086321155905166 -0.17344507555060967, 6.460395428060921 0.184003669712536 M6.763130126687492 -0.05090740954199813 C7.525842128862617 6.074786832804691, 7.4424721604253925 9.450814709879284, 7.245675136798296 12.269065708302762 M6.921213472979419 0.303422156427853 C5.869715499909763 2.458682561186232, 6.538139890065207 5.805813026185644, 6.418324127144416 12.998006466245046 M6.706677283871458 13.412826464229012 C3.8905265416704635 13.334889098212228, 1.6827318920966476 12.833946995477689, 0.26325350312372153 13.735702875276633 M6.441447471037673 13.020167123173307 C4.470351766498922 13.42348910748538, 1.8666767745738087 12.871262364165414, 0.07665925414291669 13.341594460907384 M0.62279566094977 14.311852619940343 
C-0.7701963023042613 10.166397566147745, -1.1988275425084185 7.656453133329493, 1.2430244077162267 -0.6258052868016647 M0.31871600533787947 13.332197062824447 C-0.20937246680492447 8.654102561018863, 0.2730583896099861 4.142979712222109, 0.21176989975134508 0.5950916669202637" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(136.10236632262672 109.11335199790902) rotate(0 3.418704253333999 5.925614611875506)"><path d="M-0.7134236600250006 -1.6734930668026209 L5.653580877355125 -0.5434945616871119 L7.811421308091667 12.42316674567062 L-0.003879418596625328 12.587814146049801" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M0.5800189464733781 0.27421211232041465 C2.4412431776519585 -0.350892726748624, 3.898047543971332 -0.5321672011976656, 7.3431936210012045 -0.26504163187949253 M0.26735502449740695 -0.11416882151384633 C2.2934808537051405 0.1326495748581104, 4.8555962046515235 0.3169758306745296, 7.015193950275245 0.2497316464262192 M5.787056185544101 0.37197207992781456 C7.656839089408684 3.301658856827218, 7.35969900211877 5.138237417466598, 5.71116744151604 11.02746846423648 M6.489439138871285 -0.08217409282387866 C6.70933928546969 3.539849045695814, 6.477200785389395 6.363580496864367, 6.2876988940865 11.720752280764101 M6.176643130120872 11.188939822295236 C4.989375987002836 11.83309388777104, 4.437614409797442 12.307108012289357, -0.40820063892022085 11.642022421606583 M6.515908586772752 11.594683544648724 C4.753179250264562 12.020574736100368, 2.009243360274268 11.97780006672931, -0.1427801690606661 11.817463217680114 M0.5138778333209741 12.625356149630127 C0.9506866756736718 7.699465885242861, 0.4833646474336429 3.2585818331513323, 0.8371176970937675 0.55719511180147 M-0.2194035523066621 11.280790172769361 C-0.5313797901641131 9.231162790299498, -0.3385471725824598 7.128249429646121, -0.5824975795230847 0.4857951128286738" stroke="#000000" stroke-width="1" fill="none"></path></g><g 
transform="translate(146.40539662565675 107.90123078578783) rotate(0 3.418704253333999 5.925614611875506)"><path d="M-0.21320042945444584 0.6020698044449091 L5.34893886687496 -1.9179824497550726 L7.906576379558828 13.653621537007872 L-0.7206467781215906 13.510602516450469" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M-0.4772669326391108 -0.18788305536461442 C2.4030384421414355 -0.6559330805209902, 5.349325732336909 0.44160151294354316, 7.168300893161164 0.3201168351872167 M-0.14750279654938453 -0.31401424195501954 C1.9371938259941406 -0.15323050450643172, 3.688088362948572 0.036440798167459446, 7.123460203650199 -0.14501418823351347 M5.818797666904361 -0.984042732344001 C5.72746240093523 2.9610789746271706, 5.579525298873979 7.546334287223387, 6.905975082249551 12.765199849593891 M7.037446096805332 -0.3670579930864536 C7.283083547984413 3.888287077300943, 6.321986679804827 9.233945891801783, 7.027648190757878 11.393494441664604 M7.076026389308503 11.876671830670121 C4.7937335954425855 12.14095097906669, 2.913659291321575 12.395506270690701, -0.10546400445449788 12.13264426567479 M6.495687642307391 11.891408009427739 C4.861288359532921 11.756793901129685, 3.161320897276587 11.584839132191172, -0.14383153202100094 11.66394470870019 M-0.4227473664301846 10.859581726799217 C-0.5849911139152131 7.936708870854959, 0.6936619647316058 5.005204514897179, -0.0022987939521743606 0.4364718378460888 M0.5780657787471678 11.945227806528823 C-0.5395985307125286 9.046313617749064, 0.45783456351872587 4.778098470264037, -0.4712881982623274 0.06699959163851277" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(157.38143027488786 108.69712647980357) rotate(0 3.418704253333999 5.925614611875506)"><path d="M-0.06811509467661381 0.016264865174889565 L7.958715363762167 -1.3967012073844671 L6.67342685105541 12.292731148519103 L-0.2605895195156336 12.934532684602324" stroke="none" stroke-width="0" fill="#fc54ee"></path><path 
d="M-0.20920680214442844 0.2308180349287804 C2.154987285601084 0.6367127238268169, 4.514051569429478 -0.4722712943125542, 7.233031167719488 0.21951249271175222 M-0.03376600607089825 0.1193089413202526 C1.5357759103892146 0.0993114311844464, 3.1943466921242485 0.251867132108961, 7.112923630744932 -0.05273200222724894 M7.394603618469468 -1.184604456535371 C6.911219620411112 2.994050920403584, 6.33312933606018 6.945408623358075, 5.802505562287294 11.35262497802785 M7.323203619496672 -0.2113736832150923 C6.3452945236070954 2.555385742997991, 6.680091306272592 6.100843340229306, 7.006862573516333 11.850079826774925 M6.5386336213477065 12.518243893435809 C4.642831809208529 11.4181557711533, 2.8858346532321777 12.5690671904787, -0.37945337189062583 11.307422353585496 M6.582976632242049 11.523378485811131 C4.891635352268613 12.179904523138996, 2.457161412135446 12.155457536444633, -0.0669703829713752 11.850614765128551 M-0.0914145177704988 12.578804848878377 C0.7937777650713524 10.120350659269885, -0.569411698589887 7.7117109731439095, -0.6894490402492769 0.22728426474766117 M-0.21147827835260918 12.232108204074613 C-0.5003163334138555 9.076048892298068, 0.037894286875527206 5.850099521324699, -0.3145476965927946 0.26301923900922797" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(168.76269630976003 109.71941260396964) rotate(0 3.418704253333999 5.925614611875506)"><path d="M0.016264865174889565 1.1213068570941687 L5.440707299283531 -0.16398165561258793 L7.278910431436088 11.590639704235379 L1.0833034608513117 11.992587977417294" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M0.2308180349287804 -0.4235384190089021 C3.1606879069868383 -0.6220791595760508, 4.61649913589951 0.41654334126593195, 7.05692099937975 -0.5281679450707394 M0.1193089413202526 0.012721303459554578 C2.2559411033423524 0.154990679479108, 4.593682510236906 0.28226832529111395, 6.784676504440749 0.14070752096188854 M5.652804050132627 0.13928317973760773 
C5.87202783412027 2.4618410025670414, 5.780242174414072 4.766393269980027, 6.338804260944836 11.201991514533614 M6.626034823452906 -0.49582374847589816 C6.370163675853466 3.4314394554648504, 7.009490215176876 7.354386909398838, 6.836259109691911 12.069465142674057 M7.504423176352795 11.9596916789724 C4.3956392479543585 12.475856187144874, 3.6010398233391703 11.441898363328356, -0.5438068701655165 11.928538263878997 M6.509557768728117 12.03398766148001 C5.16716518954184 11.677158938590036, 3.1643825723459322 11.733372465894448, -0.0006144586224609005 11.591553637613135 M0.7275756251273644 12.72154808300476 C0.9483682032884603 7.037578910394265, 1.194909766712771 2.520220765603224, 0.22728426474766117 -0.6292717143296647 M0.38087898032360146 11.26347069699709 C0.1392311085442275 9.120734483561836, -0.16062120963350324 6.174787859708945, 0.26301923900922797 -0.020181190015251316" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(179.06572661279006 108.50729139184845) rotate(0 3.418704253333999 5.925614611875506)"><path d="M1.1213068570941687 -1.3967012073844671 L6.67342685105541 0.44150192476809025 L6.576818987152365 12.934532684602324 L0.1413587536662817 11.37962744883496" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M-0.4235384190089021 0.5958926599347948 C1.000022073289685 -0.5361731616974543, 3.6816664872118463 0.1964306892188742, 6.309240561597258 0.13396315396963954 M0.012721303459554578 0.07075539737231318 C1.9138996921439955 0.19944931003277724, 3.806839551835069 -0.07659379049129947, 6.978116027629887 -0.18886617266786468 M6.976691686405606 -1.0490924422571044 C6.454320913234821 3.2516083622557685, 5.858225073467441 8.28636868271498, 6.1881707974506 11.88832687790096 M6.3415847581921 -0.350745314909915 C6.773540563423332 2.6789548324926287, 7.104021596082068 4.7795978191821815, 7.055644425591042 11.759614565753811 M6.945870961889385 11.354970808977958 C4.808840784840071 12.572583695271838, 1.0918107872697351 
11.374123820553976, 0.07730904012798356 11.778342302252158 M7.020166944396996 12.159321537126399 C4.4571721842658345 12.06930440284406, 2.2563332994438676 11.785044617540574, -0.2596755861378772 12.07202603671939 M0.8703188592537476 12.775531908820172 C-0.48372915263044197 9.707359887312183, -0.6803075884722269 5.449086318634075, -0.6292717143296647 -0.8744219334006675 M-0.5877585267539223 12.002145945093709 C-0.0072522507940177144 8.030068731805988, -0.27225204092227273 4.769798672540261, -0.020181190015251316 0.004818966137025482" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(189.43569965595998 108.50729139184845) rotate(0 3.418704253333999 5.925614611875506)"><path d="M-0.7246999647468328 -1.9919982943683863 L7.087741050294426 -0.19778660871088505 L7.48001942353585 11.32279735184509 L0.06511122919619083 10.835373753078763" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M0.1955285238827872 -0.0013262584856745807 C1.8983234975262335 -0.13901032513010173, 3.9248634154449733 -0.3320735023905801, 7.504423176352795 0.1084624552213872 M-0.18972668594531292 -0.27190343508275827 C1.5936186572913378 -0.10190876120466198, 3.212843204517681 -0.3198971748811838, 6.509557768728117 0.1827584377289973 M6.605249974960512 -0.0021300731048006227 C5.894518238308598 5.38880793172595, 6.554050573690611 9.155379462033498, 7.564984131795363 12.72154808300476 M6.492683986543359 0.11364213237383058 C6.473485589062417 4.04246048549413, 6.846549510219592 8.747864626036318, 7.218287486991599 11.26347069699709 M6.474460316934868 12.15472010191493 C4.588760642490287 11.839604419215485, 2.7700275032540556 11.356553107007954, -0.05606047835115058 12.002165674556995 M6.75679515697873 11.733407517821648 C4.860102734636713 11.819797535506346, 2.7744379735248628 11.80427890259564, 0.14134196916697905 11.519474079000911 M0.3453364438968507 11.647686440460154 C-0.4323155283999318 8.397493946373032, 0.9003788694481824 5.385756670873423, 
0.3085115384136843 -0.09417179857266111 M-0.10965230110254248 11.58397403012427 C0.5570818461889648 7.65870142843556, 0.14702786596220474 3.946061314345894, 0.5708745835228232 0.20883016434479484" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(218.82963904989947 108.20426108881814) rotate(0 3.721734556364254 6.228644914905818)"><path d="M-0.36202881671488285 0.5069883558899164 L8.278396274884017 -1.2309555914252996 L7.9861154220893695 10.537288987970534 L-0.3606365118175745 12.172227883672896" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M0.3303919862920167 -0.025350630166797616 C2.1768544126383516 0.45683260089537225, 3.82178700827766 -0.021517666368502962, 7.6077844097395815 -0.09698450198076902 M-0.12826529642611406 -0.01651581737959462 C2.5967387691567083 -0.13344368297029066, 5.216062107503459 0.17142213696617983, 7.082307545370359 -0.2810090237684325 M7.229517344827572 -0.5725194322341011 C6.717232008340659 3.3845576569505065, 6.348078450060772 5.436748456739996, 7.344481457822363 12.840578571052468 M7.1625467423419344 0.6019678026770555 C7.135852912835255 4.072104200290424, 7.353827599806211 8.40238886532985, 7.66297865332846 12.231594892298997 M7.682631838288908 12.260621513975984 C5.976029938917584 12.071854296300122, 4.326357045401201 11.781693375597584, 0.6286047229123713 11.789684583850125 M7.586326436461866 12.693377731053042 C5.675948776805768 12.743143751590324, 3.2846048881698637 12.230046158985985, 0.2814949204946174 12.445731739596873 M0.6863837173952303 13.477680847946784 C-0.7794858207804899 6.431975643662891, 0.6776962333921731 2.351694819225727, -0.6319813491458556 -0.9603693768623351 M-0.032689544682295346 12.838899907866432 C0.030531035684077085 8.357416899132083, -0.30994790279408857 3.6956810019019963, -0.14558589076329698 -0.41934003836482237" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(230.11471456070376 108.01880457724889) rotate(0 3.721734556364254 
6.228644914905818)"><path d="M1.9124569986015558 0.2117794957011938 L6.251506105054887 -1.2463434133678675 L9.200562432397874 13.108923589772168 L-0.986772945150733 11.940454458779278" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M-0.34232348916682825 0.20428468532138977 C2.0134654479408503 0.6451674526323373, 4.452923987815517 -0.34165154699865724, 6.70210022389156 0.09316712781970382 M0.01211633558534081 -0.1890372204736293 C2.092545619493964 -0.31445084924133326, 4.633161977808211 -0.3141357913971906, 7.334643532066865 -0.26345126563467297 M8.273130356193052 0.8825383985634379 C7.062562997856755 3.2736196917942175, 8.433498160481578 8.178328587285897, 8.58020490842008 11.960641303415805 M7.028893776018471 -0.57569572075684 C7.826123174844326 2.6458430246774185, 7.196116809895103 5.169250882239449, 7.727560182349631 12.832666854423756 M7.569672851984621 12.702865604651711 C5.586338299401819 12.322597490901263, 4.116590025695319 12.001766296338356, -0.1347375157567441 12.645977438190616 M7.0861824386620835 12.39018016139573 C4.846644406983376 12.435578570006035, 2.592467903862625 12.232344942414775, -0.3059380796702856 12.362965718817087 M0.12460175023914521 12.72884579052805 C0.8391682179128391 7.904146608717549, -0.6619557162551849 4.806662988703214, 0.47964141129841975 -0.34831290188354114 M-0.4252059971895157 11.969323092349251 C-0.4871743917598916 9.566103293316102, -0.6517348203732559 7.399923726172492, -0.5795698380345411 0.165848025634748" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(240.27183626636725 108.38971760038734) rotate(0 3.721734556364254 6.228644914905818)"><path d="M0.2117794957011938 -1.1919630076736212 L6.19712569936064 1.7570933196693659 L8.09510287268904 11.470516884660903 L-0.5168353710323572 12.297094726896468" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M0.20428468532138977 -0.08183539392444406 C2.6144326566391367 0.29655465300161077, 3.5249414041957086 
-0.7071365399202648, 7.536636240548211 -0.07361092564253957 M-0.1890372204736293 -0.06688418868204882 C2.138085778554054 0.31187909433911754, 4.610604295777149 -0.11124884777871061, 7.1800178470938345 0.11093564535685241 M8.326007511291946 -0.4287229341291733 C6.687603753958948 5.15756737001335, 7.483804891422478 9.567446078726974, 6.946820586332677 11.544820666203453 M6.867773391971668 0.31401569037629506 C7.366633453884132 4.107579991765793, 7.065323526110855 9.131232402298565, 7.818846137340629 11.907858437521986 M7.689044887568583 12.176608203676016 C5.069197432113718 12.295929491086193, 2.5543907610195276 12.335172687980503, 0.18868760837898002 12.768027556955765 M7.376359444312601 12.404243585029185 C5.626650260085769 12.622388208057831, 3.648504894871692 12.079894415328065, -0.09432411099454985 12.50818426792337 M0.27155596071641397 13.17471064402926 C-0.6277301176262975 9.153873797869903, 0.18232570600163656 7.897505417127391, -0.34831290188354114 -1.0537325066304444 M-0.48796673746238384 12.058515990920395 C-0.2707613131096651 7.272763084677038, 0.2465135620597722 2.635335036517112, 0.165848025634748 0.5956007779657811" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(260.647820868081 108.20426108881813) rotate(0 3.1156739503036306 5.622584308845209)"><path d="M0.4510419461876154 0.24578442238271236 L6.491200736515793 -0.6437578592449427 L7.395724287979874 12.85776348109028 L1.5666511747986078 11.064027321288798" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M-0.11236257853983844 -0.08881600798073275 C1.4157131230086593 0.4558892303816591, 2.561315196935474 -0.4524150845078648, 6.073419831820775 0.08521321047408659 M0.06791845131748053 0.17943303662360632 C1.0989321979075237 -0.20224359515569768, 2.5619372207677076 0.11372959805449995, 6.144231886628351 -0.2635474462221986 M5.350374861117218 -0.7199445657793659 C5.913205615066607 3.1544992137283505, 6.847091439743296 7.296728051162809, 6.530769209520002 
12.320463688858249 M6.725318167024747 0.1831932876933947 C5.88691807168069 3.3473278698824576, 6.119293251970332 6.76658487720356, 5.724601011433939 11.681375394225832 M5.96132815865984 11.855743563977033 C4.884541535182534 11.541708924113218, 3.390861302088614 10.90356224621934, -0.012086803246702527 11.057252592952063 M5.92772236483275 11.067193474550495 C5.027585611644389 11.261593670385134, 3.619128063066844 11.449837947330126, 0.225720606744997 11.492772593646299 M0.7833472603041991 11.196776915887884 C0.723114948189205 7.901542416875713, -0.6849455215015297 7.265407183608593, 0.7911008439923994 0.8852484039887782 M-0.3587348242176842 11.191623187329284 C-0.5140448645915481 7.513589172397766, -0.05053360483175305 3.5950981148002303, 0.5364425512163262 0.36861380756230466" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(270.9508511711117 109.41638230093935) rotate(0 3.1156739503036306 5.622584308845209)"><path d="M-0.5410631205886602 -1.6578939352184534 L6.033200970642838 -0.7504563126713037 L4.954678944103989 13.157969248244974 L0.7992374990135431 11.238483500430796" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M0.06232791123688797 0.13583690263496107 C2.3134836539034778 -0.31400126384293375, 3.517213553248512 0.09120241407574399, 6.471272678963639 -0.17423202795782156 M-0.2126952598928631 -0.2440892478696049 C1.142836032250031 -0.1354394064087605, 2.402829589320305 0.12331026960104108, 5.941437226389092 0.08295999856130609 M5.530580808663786 0.9879405328349713 C6.609641956688697 3.894024248695208, 5.952660336559635 8.807620265749446, 5.794016086000006 10.231674839343773 M5.938932593498803 -0.24364050737787424 C6.694088208776425 4.2765811127119076, 6.439974313551244 8.420637013076783, 5.952355858956074 11.234262619283173 M6.548046988844056 10.637917546141395 C4.193909268883592 11.49741746755678, 2.8378814974616837 11.330459068391825, 0.34599884163205796 11.696609831180412 M6.15907438961183 11.462208907253704 
C3.853327796927916 11.534530797438828, 1.2535295509144257 11.144402603276744, 0.25297111173557213 11.46435717550473 M-0.7757686595794064 10.527698969255049 C-0.125587032595487 5.8829283943443595, -0.36463045437987895 2.5511519245680456, -1.0660917896230862 1.0728851024326524 M0.07305222388941213 11.064189475786097 C0.3019800065689012 7.587737946032505, 0.4150712019501732 2.972688106285366, 0.036119874451467915 0.5413239047932843" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(281.25388147414174 108.20426108881816) rotate(0 3.1156739503036306 5.622584308845209)"><path d="M1.95912523008883 0.9339816179126501 L6.458026162140641 1.9863624777644873 L5.934512070172104 10.682797682235453 L1.7925746534019709 12.616262984225962" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M0.3096771103189987 -0.5798213484363376 C1.6363774763292902 0.652217118392925, 3.006898544209821 -0.3150189360423753, 5.843027929990984 0.5474529884346374 M-0.024955785455949897 -0.12117055677291338 C1.7916257279601 0.25782059495593684, 3.900531358661885 -0.11228137383931165, 6.069310178274739 -0.13500987097371075 M6.088765373249016 -0.5579840833023741 C6.215659298659563 4.136286179016825, 5.254993265718927 9.12795752867276, 6.80286704944896 10.14931571367001 M6.29944831252905 0.31219692574851143 C6.6824357446847475 3.1780274740748284, 5.811519992452787 5.997826952646276, 6.100922215715407 11.636842247842518 M5.911135440537594 11.751110841161562 C4.827136209224514 11.685997503864078, 2.6486641536752598 10.98667513179713, -0.4298810069083596 10.847593166443392 M6.182467958305264 10.949788927933179 C4.718548283256062 11.430250107633702, 2.6814898522487685 11.26427729207245, 0.040480835587636876 11.144881638072782 M-0.10184822108316283 11.317408366593353 C0.9919783038318041 6.912720290668518, -1.0228343483436557 3.0772967841035612, -0.4219503888099506 -0.7178178842425084 M0.3662283673579432 10.925105481861468 C-0.07540355654382347 7.051757494386256, 
0.03832680285571177 2.5895780464518, 0.0017652106283896352 -0.2547992514489215" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(290.9508511711117 108.20426108881816) rotate(0 3.1156739503036306 5.622584308845209)"><path d="M0.7010932061821222 -1.52984438277781 L7.113146832936081 1.3162624444812536 L7.5358054552009435 10.715786409328196 L-1.8085798528045416 9.287093889663431" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M0.4834353798694532 -0.4886660958239201 C1.5627242415492348 -0.2679141495996812, 3.368965856593136 0.6667362413519787, 6.265279809540792 0.32894785088930456 M-0.09395801236917731 -0.2722130105935556 C1.8394779693899088 0.1595582244430675, 3.277163104985585 -0.17676646281525224, 6.352563960639456 -0.015244856128964979 M7.125006384917892 -0.9271550255314837 C7.364648263371464 3.2489000393669936, 7.077134546061163 6.9711111164988715, 7.061548946557917 10.347937976651355 M6.673972102601651 0.09248085936609829 C6.134272106986565 2.4078068317773424, 5.963916044936522 5.537836892772957, 5.679627283582184 11.072101476437087 M6.639871943950593 11.385698581913518 C4.5362818272498995 11.385582668442456, 2.4874849702156197 11.667402715115733, 0.5024319808288397 11.733286043143679 M6.147059087092857 10.986895770374092 C4.387513190876219 11.188253124383152, 2.4065303934005273 11.603145138431858, 0.12450817278911602 11.244127185405391 M0.5112554862669827 11.411697827821726 C0.5489869504391666 7.142691718924185, -0.4455858757320532 4.134688655420415, 0.5251390389625528 0.12745176364538913 M0.5039451059325779 11.630623301251859 C0.04372520856237548 6.214735431718187, -0.5174657091585682 2.088746505044885, 0.24240505112041466 0.5362759266579591" stroke="#000000" stroke-width="1" fill="none"></path></g><g><g transform="translate(153.48917286035044 133.46298868842794) rotate(0 -0.4981727561221305 24.042046276215796)"><path d="M1.0755447920411825 0.8226566199213268 C0.9128721321853238 8.702696603609981, 
-0.5829034724671449 39.12649996764392, -1.1541374573962457 46.942817340858944 M0.1810670785233377 0.2089474000409246 C-0.11899421959050127 8.222030032760264, -2.032367679572484 39.769758785845646, -2.0718903042854437 47.875145152390665" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(153.48917286035044 133.46298868842794) rotate(0 -0.4981727561221305 24.042046276215796)"><path d="M-8.104929504262003 25.083817094841447 C-9.349974970753527 29.547809114928846, -5.942386310339137 38.83665686928682, -0.16431223270610662 48.57623835857279 M-8.823795093500706 24.722958953396823 C-8.62452149085801 29.075417016937948, -5.177948397128917 34.93767405594667, -2.0798954716430673 47.457711632542974" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(153.48917286035044 133.46298868842794) rotate(0 -0.4981727561221305 24.042046276215796)"><path d="M8.339106951219872 25.67918843834183 C2.4065836066291695 30.120791247485947, 1.125020291247444 39.23986397615207, -0.16431223270610662 48.57623835857279 M7.6202413619811695 25.318330296897205 C4.141519878984737 29.64594157077442, 3.912722876149891 35.375128482009075, -2.0798954716430673 47.457711632542974" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g transform="translate(86.76456876456905 288.6038096144189) rotate(0 12.039393908506213 5.967411895052081)"><path d="M-1.4324995186179876 -0.18426320888102055 L25.509431921269652 -1.3784433994442225 L25.23745324495053 13.873695790306698 L-0.7096782233566046 11.057304143444668" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M0.9559627380222082 1.2773270700126886 C8.720613084813934 -1.2283543716678411, 15.689691150721872 0.9647815455188959, 22.90110448407745 0.04783589579164982 M0.7712464081123471 0.8353659911081195 C8.483504285246484 -0.9636778766569958, 17.625057702236578 -1.0225103730616436, 24.73691656007236 0.532556246034801 M24.027245875149845 -0.7637977908691629 
C23.808436589507238 1.3483213435562094, 25.12735319104351 4.061448754888094, 24.205011175540356 10.941536064699763 M24.353946523291643 -0.07948520286848215 C24.339557154491594 3.4241639944578384, 23.73099402936093 7.398576280117092, 24.571305360394167 12.0801704826894 M25.949628327365872 11.277099061296354 C17.65146696957668 10.223084487789746, 14.439447198091198 10.341662146919843, 1.016716381534934 13.851276862905394 M23.170538950104174 11.853769261860322 C14.940127929343364 11.437696138237051, 6.9374133273447605 12.246373483751348, -0.3661751104518771 12.296067167543839 M0.04219220473927443 12.513800167850306 C1.256549936826116 8.629821220982548, -0.0498245549369435 6.770594026226293, 1.0924033041889916 0.9614270686075603 M-0.46087712813711224 11.446516237714256 C0.2620707966023077 8.520818042514438, 0.2518958269596929 6.208912882619414, 0.36632236736942425 0.1120678711508124" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(140.3447402278848 194.0394418655335) rotate(0 14.90250780610279 12.090713880423053)"><path d="M-0.18426320888102055 1.430644104257226 L28.42657221276144 1.1586654279381037 L31.743887612408198 23.471749537489508 L-0.8775196466594934 25.430563860437964" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M1.2773270700126886 1.854996582493186 C9.682447489627005 1.8159957244231808, 23.017997367859486 -1.3201457545921695, 29.85285150799731 1.086525758728385 M0.8353659911081195 -0.04824321996420622 C7.723455141414892 0.33169257946873704, 16.13654867862352 0.42825863071347275, 30.337571858240462 -0.9774476541206241 M28.525067446565426 -0.3779036197811365 C28.020186462953806 8.178198013238495, 28.521654823377848 12.903326974749337, 28.1404954770563 24.129983966103886 M29.67181682474742 0.36133060324937105 C29.72347220130277 7.510712170524872, 30.43511032409978 17.16373009796883, 30.048582999779082 23.422793317451273 M29.147290883397854 23.151043479195927 C19.312660063147675 25.614762426591163, 
10.61854170932816 25.476946295476203, 1.9164530728012323 25.287704949132298 M29.72396108396182 23.598294902457987 C18.80885418685277 23.880403613160635, 9.02757736023932 23.54454893381455, 0.3612433774396777 23.760500151529108 M0.9702302906662226 26.1677885100759 C-0.405682363841079 18.166914083286205, 1.6368283244638222 14.38865585217522, 1.611129054799676 -1.1893957648426294 M-0.8182903425768018 24.48830123604754 C-0.8101341061531185 18.816380804285867, 0.12625262809408913 13.4855601115989, 0.18779979180544615 -0.7162497593089938" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(173.1544099497114 242.0805448161841) rotate(0 11.799059530240811 8.987265604560996)"><path d="M1.430644104257226 -1.3784433994442225 L24.756784488419612 1.9388720002025366 L22.888440837124904 17.09701156246249 L1.2491360995918512 19.023194607372922" stroke="none" stroke-width="0" fill="#e08fff"></path><path d="M1.854996582493186 -1.4599664714187384 C6.7680080430174305 0.5824568643813215, 8.441416460939136 -0.10287668559502733, 24.684644819209893 -0.315100422129035 M-0.04824321996420622 -0.7484708921983838 C9.75010792182362 -1.004434049775497, 19.035219203076778 0.3354255848623356, 22.620671406360884 -0.3757120566442609 M23.258487040091698 -1.6047935172840495 C25.237088803099212 4.244205045326619, 23.78045618679301 9.300682799865715, 23.551885155776016 18.654493343322457 M23.922856470727343 -0.1891374985970542 C23.089206363305045 6.800982080875894, 24.422759782316984 12.920003474637445, 22.916314136525738 17.935690842422567 M22.567734778831323 16.29487167301361 C17.98760964761288 16.33584399984656, 11.084751590979403 19.81337894962607, 1.1062771882861853 19.08068697157089 M23.014986202093382 17.568323955896837 C17.37941706581939 18.35787395733224, 10.855180822916976 18.31664724248277, -0.4209276093170047 17.377901867989046 M1.7851951639802883 17.632768929001305 C-0.18863436189972185 13.887202273636134, 1.638867830643985 8.260045161072476, -1.0689415647580678 
-1.1889835311210035 M0.27579534286298935 17.156259951828943 C0.39929274686180416 13.351336233490205, 0.6905240525099907 8.850353175534758, -0.6437126826112809 -0.08280111996812167" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(108.43619024110512 241.17659980376573) rotate(0 10.764576771620114 9.332093190767885)"><path d="M-1.3784433994442225 1.1586654279381037 L23.46802554344265 -0.7096782233566046 L20.65163389658062 19.913322481127615 L1.0486633982509375 18.695083352537985" stroke="none" stroke-width="0" fill="#f938c5"></path><path d="M-1.4599664714187384 1.9584581460803747 C8.22096893716485 -1.377365864168383, 15.023434878585352 0.8868432274949776, 21.21405312111108 1.7875234093517065 M-0.7484708921983838 0.5615626918151975 C7.311870908361318 0.6685150999602302, 16.7709048399635 -0.9670612972203271, 21.153441486595852 0.003313724882900715 M19.86278656198225 1.7099148445582164 C20.211315553541056 5.716045898493422, 19.856361885716705 10.989297027067646, 22.2352047808192 19.659142877295224 M21.33275912913927 -0.6144998799125952 C21.972005987194603 7.2300856883352695, 21.73159242919767 12.211766770634737, 21.488822931288617 18.539856465076554 M19.84949400713174 19.818718894185658 C11.495444973174148 19.569986256029757, 6.500353229865972 19.65954706278101, 1.106155762448907 17.90899555272066 M21.122946290014966 18.63386595042997 C17.42826050367139 18.17246770264042, 12.883670656964046 18.117715203775294, -0.5966293411329389 18.769260553443964 M-0.3548751742194991 18.474912605613046 C1.4606720395305495 14.707295368711847, 1.4129778532844048 6.223755402832478, -1.2346030041749803 -0.7744982867731891 M-0.8496670694265589 18.934651516951263 C0.1769049919134569 15.079279618783335, 0.32798375051416473 9.680220827387803, -0.08597807184538053 0.6675452051875538" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(122.11225621732115 287.9521524124162) rotate(0 13.251515120627346 5.967411895052109)"><path 
d="M1.1586654279381037 1.9388720002025366 L25.793352017898087 -0.8775196466594934 L27.752166340846543 12.983487188355156 L0.030896971002221107 11.362511038319298" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M1.9584581460803747 0.7331694457679987 C8.46587763766741 0.24130357350779347, 20.373647699933155 -0.12163274441289135, 28.290553650606398 1.5424928162246943 M0.5615626918151975 -0.8073033886030316 C10.82301082792176 0.7309237129411275, 19.35231651560303 -0.1773445897379344, 26.506343966137592 -0.043186177499592304 M27.596436187204453 -0.7663124159783408 C26.60670253422397 1.8233616207576957, 26.4497805444289 6.084783527239248, 27.139255695339106 12.485141202662653 M26.110087970111685 0.2990792685888721 C27.03867195543397 3.5668234238795407, 26.093445991901163 6.817164031581614, 26.423527410836805 12.493027585865644 M27.657562753904585 10.37374191312594 C20.440422469313933 13.62869652550299, 13.450659122589494 12.818399215150665, -0.7551908288151026 10.118326056287714 M26.4727098101489 12.337293882393368 C18.40076511975012 12.188409936379465, 10.609922609832296 11.230537217806848, 0.10507417190819979 11.970175970816143 M-0.12103121548121965 12.9267489311712 C0.9738689765608123 8.736827343934369, -0.8282957711328355 2.82650922071928, -0.49525333649261516 -0.9217542562742289 M0.17294907297668305 11.950284269672673 C0.3296975588455664 9.44305962900358, -0.4640405418767296 6.672181002376462, 0.426862132266537 -0.41128697692497596" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(160.28986485651967 289.9393655481342) rotate(0 12.206583041212525 5.622584308845219)"><path d="M1.9388720002025366 -0.7096782233566046 L23.535646435765557 1.2491360995918512 L25.461829480675988 11.27606558869266 L-0.5723127517849207 13.136327207783694" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M0.7331694457679987 -1.1776833329349756 C9.636102703056018 0.8474649387996928, 18.8614331923997 1.5484625894230142, 
25.955658898649745 1.670731982216239 M-0.8073033886030316 0.6581287430599332 C7.738729096628604 -1.114542886730405, 14.036633644543345 -0.13378150772688016, 24.369979904925458 -0.6399740828201175 M23.691135117992108 0.11892952716358418 C23.37687702526996 3.1321463817128996, 24.738163236171168 6.9216032707173305, 24.931683339680397 11.095384335226456 M24.694963020904023 0.4640573602008764 C24.52644838041623 3.3533243295760946, 24.365201377509646 7.484837132497045, 24.939114007318313 11.060262980703726 M22.852084205446772 12.261884999225373 C20.26341646033108 12.492374775871031, 13.386916455083547 10.630906758769743, -1.8164977338165045 11.08305956120276 M24.8156361747142 10.878993507238562 C18.87407156713207 10.816838831170669, 12.015860955826794 11.342840612395873, 0.03535218071192503 11.73028376302355 M0.9346066321208069 12.274447270661668 C0.9272521165591658 6.08602215790965, -0.7224668445210969 1.7977362258205167, -0.8684905800177749 -0.920181288050381 M0.014567093969312328 11.590323004713422 C0.2079768646468909 7.123816169299145, 0.05058307703832611 4.210094507095764, -0.3875207114173175 0.32573470271631066" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(195.77349721401788 289.4841663177622) rotate(0 15.236886071515642 6.228644914905814)"><path d="M-0.8775196466594934 1.2491360995918512 L31.52243554128222 0.030896971002221107 L29.901459391246362 14.348448419904884 L0.11630239151418209 10.490030610895332" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M0.04783589579164982 1.086525758728385 C11.811111585546191 1.966965677074237, 25.794916831575147 1.8501742499387694, 30.37728570310287 -1.4969417843967676 M0.532556246034801 -0.9774476541206241 C10.477712937705487 -0.19724292868497917, 21.663663811199903 -0.8405307363879974, 30.284820333140715 -0.8928152276203036 M29.437001655475644 -0.03204251305246286 C30.865035553184917 5.542535735645636, 30.96819468531654 9.590987395187915, 30.923892147937092 12.195125480678188 
M30.62548162003589 -0.47252645681236716 C30.50928812693139 2.833915857017503, 31.118846593184834 5.628961413142117, 30.152877252216317 11.934189688393904 M32.390225215832515 13.563567018097814 C20.67864822646009 11.85701491772144, 6.854715051174583 12.450096690048863, -1.1662657167762518 11.644875323361333 M30.83501552047096 12.036362220494624 C20.827408949868598 12.45077663291623, 12.409656619683716 12.830817606341142, 0.9931803746148944 12.267152865378733 M1.0035150794435013 11.716457441561873 C-0.9275625720796831 8.329498586732592, -1.0656403515559816 4.216207037521594, 0.3822811821665699 -1.1342095204497615 M0.11697382182493632 12.01116328769938 C-0.060612962318156285 8.804611195445549, -0.43251918088979346 4.621680924867485, 0.6038272612357398 -0.2210166828564754" stroke="#000000" stroke-width="1" fill="none"></path></g><g><g transform="translate(151.18884242870126 219.92425786089694) rotate(0 -11.434738171350318 9.095722503524513)"><path d="M1.086525758728385 -0.315100422129035 C-5.876227252857398 7.420545618529638, -13.554051615357972 11.654247418336954, -23.95600210142902 18.567157063693287 M-0.9774476541206241 -0.3757120566442609 C-4.806472633913799 3.9013678664170133, -10.374350986848492 7.580549149423564, -23.351875544652557 18.36017922038633" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(154.60123024323457 220.32265537238413) rotate(0 14.015916574758307 9.35257060469435)"><path d="M-0.315100422129035 1.7875234093517065 C8.125825105257778 6.1866284834520755, 13.065551472174075 7.683077342820599, 28.407545206160876 17.72930175470802 M-0.3757120566442609 0.003313724882900715 C10.04082305466567 6.702967619047348, 19.87604946132599 13.526377254618879, 28.200567362853917 18.7018274845058" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(116.93677195595183 264.19156414601423) rotate(0 -4.072183551724152 11.723443794739033)"><path d="M1.7875234093517065 1.5424928162246943 
C-0.7238842615719037 6.595589764494374, -6.406636384765367 14.52038169055623, -9.93189051280001 23.490073766977645 M0.003313724882900715 -0.043186177499592304 C-3.7948387909079853 7.612824472466724, -6.985807196729778 16.550949142817963, -8.95936478300223 22.279576835640825" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(121.03565727228215 266.9011999318549) rotate(0 8.033809814705933 10.563099880350755)"><path d="M1.5424928162246943 1.670731982216239 C5.11299638242185 5.70576043333631, 11.574705594182726 12.92302566553109, 16.110805806911458 21.766173843521642 M-0.043186177499592304 -0.6399740828201175 C2.7255466480029824 3.2935925416078984, 6.822920839461268 7.73122474167714, 14.900308875574638 19.86880128387736" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(176.2127268770846 263.99816713564815) rotate(0 -5.146763086143324 10.949628782273727)"><path d="M1.670731982216239 -0.09648643992841244 C-4.463510496552928 7.494754446635733, -7.745390050624151 13.973660476733974, -10.066885594858604 20.159035963567902 M-0.6399740828201175 -0.18895180989056826 C-3.8134392692756465 6.7146723775509845, -6.48406192556211 11.707617713691214, -11.964258154502886 22.088209374438037" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(197.87104022815902 264.112064933825) rotate(0 3.53116285350643 9.672422305658294)"><path d="M-1.4737507421190548 1.1057256099541988 C1.63880030090153 8.057148159825726, 7.355716316570035 11.594017081564703, 7.423862215947794 17.924590495964555 M-0.8789834835232732 0.9019543254892166 C1.667033129987356 5.417585877157989, 3.893638270926999 10.586178827544083, 8.536076449131933 18.44289028582738" stroke="#ccc" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(200.89985845474848 289.00108782190574) rotate(0 0.5 6.75)"><path d="M0 0 C0.24032276617363096 3.244357343344018, 0.48064553234726193 6.488714686688036, 1 13.5 M0 0 
C0.23496605260297657 3.1720417101401837, 0.46993210520595313 6.344083420280367, 1 13.5" stroke="#000" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(206.39985845474848 288.75108782190574) rotate(0 0.5 6.75)"><path d="M0 0 C0.21157085103914142 2.856206489028409, 0.42314170207828283 5.712412978056818, 1 13.5 M0 0 C0.3721991633065045 5.024688704637811, 0.744398326613009 10.049377409275621, 1 13.5" stroke="#000" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(210.89985845474848 288.75108782190574) rotate(0 0.5 6.75)"><path d="M0 0 C0.35065756542608145 4.7338771332521, 0.7013151308521629 9.4677542665042, 1 13.5 M0 0 C0.2696682636626065 3.6405215594451876, 0.539336527325213 7.281043118890375, 1 13.5" stroke="#000" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(214.89985845474848 288.87608782190574) rotate(0 0.5 6.75)"><path d="M0 0 C0.33321742760017514 4.498435272602364, 0.6664348552003503 8.996870545204729, 1 13.5 M0 0 C0.267552615236491 3.6119603056926284, 0.535105230472982 7.223920611385257, 1 13.5" stroke="#000" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(220.39985845474848 288.62608782190574) rotate(0 0.5 6.75)"><path d="M0 0 C0.33844768805429337 4.5690437887329605, 0.6768953761085867 9.138087577465921, 1 13.5 M0 0 C0.2322900806553662 3.1359160888474436, 0.4645801613107324 6.271832177694887, 1 13.5" stroke="#000" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(164.28933362396356 289.93858782190574) rotate(0 0.5 6.75)"><path d="M0 0 C0.32418523179367187 4.37650062921457, 0.6483704635873437 8.75300125842914, 1 13.5 M0 0 C0.2805317829363048 3.787179069640115, 0.5610635658726096 7.57435813928023, 1 13.5" stroke="#000" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(169.7893336239622 289.68858782190574) rotate(0 0.5 6.75)"><path d="M0 0 C0.3453239123336971 4.661872816504911, 0.6906478246673942 9.323745633009821, 1 13.5 M0 0 
C0.3496941183693707 4.720870597986504, 0.6993882367387414 9.441741195973009, 1 13.5" stroke="#000" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(174.289333623964 289.68858782190574) rotate(0 0.5 6.75)"><path d="M0 0 C0.33057225989177824 4.462725508539006, 0.6611445197835565 8.925451017078013, 1 13.5 M0 0 C0.284787807893008 3.844635406555608, 0.569575615786016 7.689270813111216, 1 13.5" stroke="#000" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(178.28933362396538 289.81358782190574) rotate(0 0.5 6.75)"><path d="M0 0 C0.25355723602697255 3.4230226863641295, 0.5071144720539451 6.846045372728259, 1 13.5 M0 0 C0.39227480338886384 5.295709845749662, 0.7845496067777277 10.591419691499324, 1 13.5" stroke="#000" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(183.78933362396356 289.56358782190574) rotate(0 0.5 6.75)"><path d="M0 0 C0.26134025799110533 3.528093482879922, 0.5226805159822107 7.056186965759844, 1 13.5 M0 0 C0.29703438384458425 4.009964181901887, 0.5940687676891685 8.019928363803775, 1 13.5" stroke="#000" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(126.14985845474848 287.43858782190574) rotate(0 0.5 6.75)"><path d="M0 0 C0.2823092435486615 3.81117478790693, 0.564618487097323 7.62234957581386, 1 13.5 M0 0 C0.3148826132528484 4.250915278913453, 0.6297652265056968 8.501830557826906, 1 13.5" stroke="#000" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(131.64985845474712 287.18858782190574) rotate(0 0.5 6.75)"><path d="M0 0 C0.34949533743783834 4.718187055410818, 0.6989906748756767 9.436374110821635, 1 13.5 M0 0 C0.29862432824447754 4.031428431300447, 0.5972486564889551 8.062856862600894, 1 13.5" stroke="#000" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(136.14985845474894 287.18858782190574) rotate(0 0.5 6.75)"><path d="M0 0 C0.2894334618933499 3.9073517355602236, 0.5788669237866998 7.814703471120447, 1 13.5 M0 0 
C0.29494868917390704 3.981807303847745, 0.5898973783478141 7.96361460769549, 1 13.5" stroke="#000" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(140.1498584547503 287.31358782190574) rotate(0 0.5 6.75)"><path d="M0 0 C0.24263905389234425 3.2756272275466474, 0.4852781077846885 6.551254455093295, 1 13.5 M0 0 C0.2681751136668027 3.6203640345018364, 0.5363502273336054 7.240728069003673, 1 13.5" stroke="#000" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(145.64985845474848 287.06358782190574) rotate(0 0.5 6.75)"><path d="M0 0 C0.229770437348634 3.101900904206559, 0.459540874697268 6.203801808413118, 1 13.5 M0 0 C0.28091181023046374 3.7923094381112605, 0.5618236204609275 7.584618876222521, 1 13.5" stroke="#000" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(89.14985845474848 288.93858782190574) rotate(0 0.5 6.75)"><path d="M0 0 C0.37955497251823545 5.1239921289961785, 0.7591099450364709 10.247984257992357, 1 13.5 M0 0 C0.2154642234556377 2.908767016651109, 0.4309284469112754 5.817534033302218, 1 13.5" stroke="#000" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(94.64985845474712 288.68858782190574) rotate(0 0.5 6.75)"><path d="M0 0 C0.29807842774316673 4.024058774532751, 0.5961568554863335 8.048117549065502, 1 13.5 M0 0 C0.2735304270870984 3.6926607656758286, 0.5470608541741968 7.385321531351657, 1 13.5" stroke="#000" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(99.14985845474894 288.68858782190574) rotate(0 0.5 6.75)"><path d="M0 0 C0.34378559039905665 4.641105470387265, 0.6875711807981133 9.28221094077453, 1 13.5 M0 0 C0.38724592132493857 5.227819937886671, 0.7744918426498771 10.455639875773342, 1 13.5" stroke="#000" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(103.1498584547503 288.81358782190574) rotate(0 0.5 6.75)"><path d="M0 0 C0.2742341528646648 3.7021610636729747, 0.5484683057293296 7.404322127345949, 1 13.5 M0 0 
C0.3478682761080563 4.69622172745876, 0.6957365522161126 9.39244345491752, 1 13.5" stroke="#000" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(108.64985845474848 288.56358782190574) rotate(0 0.5 6.75)"><path d="M0 0 C0.35679293023422365 4.816704558162019, 0.7135858604684473 9.633409116324039, 1 13.5 M0 0 C0.3495560119859874 4.71900616181083, 0.6991120239719748 9.43801232362166, 1 13.5" stroke="#000" stroke-width="1" fill="none"></path></g></g>
</svg>
<figcaption>B-Tree Index Deduplication</figcaption></p>
</figure>
<p>Starting with PostgreSQL 13, when B-Tree deduplication is activated, duplicate values are only stored once. This can have a huge impact on the size of indexes with many duplicate values.</p>
<p>In PostgreSQL 13, index deduplication is enabled by default unless you deactivate it:</p>
<div class="highlight"><pre><span></span><span class="c1">-- Activating deduplication for a B-Tree index (this is the default):</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">index_name</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="k">table_name</span><span class="p">(</span><span class="k">column_name</span><span class="p">)</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="p">(</span><span class="n">deduplicate_items</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">ON</span><span class="p">)</span>
</pre></div>
<p>If you are migrating from PostgreSQL versions prior to 13, you need to rebuild the indexes using the <code>REINDEX</code> command in order to get the full benefits of index deduplication.</p>
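<p>As a minimal sketch of such a rebuild, the statements below use the <code>CONCURRENTLY</code> option (available since PostgreSQL 12) so that writes to the table are not blocked while the index is rebuilt. The names <code>index_name</code> and <code>table_name</code> are placeholders for illustration:</p>
<div class="highlight"><pre><span></span>-- Rebuild a single index without blocking writes to the table
REINDEX INDEX CONCURRENTLY index_name;

-- Or rebuild all the indexes of a table
REINDEX TABLE CONCURRENTLY table_name;
</pre></div>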
<p>To illustrate the effect of B-Tree deduplication on the size of the index, create a table with a unique column and a non-unique column, and populate it with 1M rows. On each column, create two B-Tree indexes, one with deduplication enabled and another with deduplication disabled:</p>
<div class="highlight"><pre><span></span><span class="n">db</span><span class="o">=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">test_btree_dedup</span><span class="w"> </span><span class="p">(</span><span class="n">n_unique</span><span class="w"> </span><span class="nb">serial</span><span class="p">,</span><span class="w"> </span><span class="n">n_not_unique</span><span class="w"> </span><span class="nb">integer</span><span class="p">);</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span>
<span class="n">db</span><span class="o">=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">test_btree_dedup</span><span class="w"> </span><span class="p">(</span><span class="n">n_not_unique</span><span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">100</span><span class="p">)::</span><span class="nb">int</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">1000000</span><span class="p">);</span>
<span class="k">INSERT</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="mi">1000000</span>
<span class="n">db</span><span class="o">=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">ix1</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">test_btree_dedup</span><span class="w"> </span><span class="p">(</span><span class="n">n_unique</span><span class="p">)</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="p">(</span><span class="n">deduplicate_items</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">OFF</span><span class="p">);</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span>
<span class="n">db</span><span class="o">=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">ix2</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">test_btree_dedup</span><span class="w"> </span><span class="p">(</span><span class="n">n_unique</span><span class="p">)</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="p">(</span><span class="n">deduplicate_items</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">ON</span><span class="p">);</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span>
<span class="n">db</span><span class="o">=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">ix3</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">test_btree_dedup</span><span class="w"> </span><span class="p">(</span><span class="n">n_not_unique</span><span class="p">)</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="p">(</span><span class="n">deduplicate_items</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">OFF</span><span class="p">);</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span>
<span class="n">db</span><span class="o">=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">ix4</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">test_btree_dedup</span><span class="w"> </span><span class="p">(</span><span class="n">n_not_unique</span><span class="p">)</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="p">(</span><span class="n">deduplicate_items</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">ON</span><span class="p">);</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span>
</pre></div>
<p>Next, compare the sizes of the four indexes:</p>
<table>
<thead>
<tr>
<th>Column</th>
<th>Deduplication</th>
<th>Size</th>
</tr>
</thead>
<tbody>
<tr>
<td>Not unique</td>
<td>Yes</td>
<td>6840 kB</td>
</tr>
<tr>
<td>Not unique</td>
<td>No</td>
<td>21 MB</td>
</tr>
<tr>
<td>Unique</td>
<td>Yes</td>
<td>21 MB</td>
</tr>
<tr>
<td>Unique</td>
<td>No</td>
<td>21 MB</td>
</tr>
</tbody>
</table>
<p>As expected, deduplication had no effect on the unique index, but it had a significant effect on the index that had many duplicate values.</p>
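<p>One way to produce a comparison like the one in the table is to query the size of every index on the test table. This is a sketch using the built-in <code>pg_relation_size</code> and <code>pg_size_pretty</code> functions, assuming the indexes <code>ix1</code> through <code>ix4</code> were created as shown; it is not necessarily the exact query used to produce the numbers above:</p>
<div class="highlight"><pre><span></span>-- List every index on test_btree_dedup along with its on-disk size
SELECT
    c.relname AS index_name,
    pg_size_pretty(pg_relation_size(c.oid)) AS index_size
FROM
    pg_index i
    JOIN pg_class c ON c.oid = i.indexrelid
WHERE
    i.indrelid = 'test_btree_dedup'::regclass
ORDER BY
    c.relname;
</pre></div>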
<p>Unfortunately for us, PostgreSQL 13 was still fresh at the time, and our cloud provider did not have support for it yet, so we were unable to use deduplication to clear space.</p>
<h4 id="clearing-bloat-in-tables"><a class="toclink" href="#clearing-bloat-in-tables">Clearing Bloat in Tables</a></h4>
<p>Just like indexes, tables can also contain dead tuples that cause bloat and fragmentation. However, unlike indexes that contain data from an associated table, a table cannot simply be re-created. To re-create a table you would have to create a new table, migrate the data over while keeping it synced with new data, and create all the indexes, constraints and any referential constraints in other tables. Only after all of this is done can you switch the old table with the new one.</p>
<figure>
<p><svg version="1.1" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 285.30256543554844 135.73045230480716" width="30em" height="auto">
<g transform="translate(22.673209503165367 14.920893655339711) rotate(0 4.375420648808699 9.594111612924156)"><path d="M0.8274188842624426 0.33696223236620426 L10.254759846663319 -0.34771900437772274 L8.006780980563008 18.652658693344826 L-0.23554847575724125 21.027749948055977" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M-0.7212407404954484 -0.6647660147306157 C2.4220520806157584 -0.2533850251695737, 3.494795551723251 -0.414854797894422, 9.17304400743558 -0.5623765174982176 M0.43047284510851164 -0.39303908930895154 C3.317387033400377 0.5146720902907246, 6.200620459826327 0.4342160202835075, 8.863374918634467 -0.41185744595629076 M7.172910073302858 1.2527884402652683 C8.601072085210854 7.576502704249869, 7.563271859464784 15.456118112834455, 8.620167323001736 20.512591212095018 M8.980682284833168 0.023624706921751892 C9.28682093416313 7.543541007398155, 8.752749602079486 13.603205771305223, 8.426565253327865 19.811796972814452 M8.450619929846667 18.59576841930935 C5.523127368689286 19.01469574965876, 3.384117341196169 19.39274446522339, -0.3994007866634297 19.56639710608713 M8.734565974374416 19.38764348988301 C5.8393005699118685 19.57561329060369, 2.38378615564748 19.59343911214239, 0.2988339317948095 19.314149900576005 M-0.40782254301627785 20.303279007470596 C1.7986539441708702 10.926477409796798, 0.3782626060284646 3.625453195123502, 1.8367175910200106 -0.9094012277727832 M0.813918030037509 19.938396242027007 C-0.5329160369557114 12.522799365047204, 0.6104175717327626 6.344884937962021, -0.32766010242479304 0.37160091957861485" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(34.55481513997495 14.107190599746815) rotate(0 3.340937890188002 10.628594371544882)"><path d="M-0.34771900437772274 -0.7440603170543909 L6.146311247872518 -0.23554847575724125 L8.521402502583669 21.051596429308383 L-0.14657854102551937 21.764434900244204" stroke="none" stroke-width="0" fill="#f071ff"></path><path d="M-0.15069683449311344 
-0.3969232997634849 C1.1798550475458567 0.2907127427973598, 2.4782769877663666 0.2607766957722901, 7.167455133221267 -0.6192963653022694 M0.3310582897718325 0.14058922887522624 C2.931579116683631 0.03871767468797886, 5.009427464405206 0.1555718204502795, 6.862517860394668 -0.07404880973683409 M7.364990985172199 -1.3459492828696966 C8.502829382261076 4.492883080472155, 8.201059470018187 10.386534550869671, 5.93361923672478 22.165570154895512 M7.416189898887978 0.07681469153612852 C6.609085327879419 6.548408831524864, 7.337064838901987 13.733789096696706, 7.229858656565057 20.938605264869956 M6.5706628391389374 21.116304242753586 C5.084006474054464 20.903609190613057, 3.4199921419200914 21.648720927954397, 0.37648464801170856 21.181885449617358 M6.985205359556748 21.284907838862893 C5.01910856147761 21.52967731088082, 2.8186134477433304 21.509302621424716, 0.16163390757211182 21.061192411440533 M-1.367270389571786 21.74821372341129 C-1.8567035396866145 17.24624977651266, -1.0714029868416133 10.563875444482191, 1.2477255072444677 0.9579601977020502 M0.20999691355973482 22.018203185049323 C0.7619157819622576 12.735152944227638, 0.22110981630025503 5.309870762182808, 0.16848111618310213 0.7519592745229602" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(44.34431667403794 14.107190599746389) rotate(0 3.68576547639492 10.973421957751771)"><path d="M-0.7440603170543909 -0.5355645325034857 L7.135982477032599 1.8395267222076654 L7.165938639008459 21.800265374478023 L0.5072461571544409 23.226095917519253" stroke="none" stroke-width="0" fill="#f938c5"></path><path d="M-0.43789086871132477 -0.30226953477171975 C2.0071288849805393 -0.4571094869379356, 3.625576952891625 0.5523224299333185, 6.688315269675325 0.725243161001419 M0.1550998381811084 0.29745318692135014 C1.6440790322618437 -0.419985895587877, 3.3222770185355657 0.12624110742452477, 7.289839357355601 -0.3030965588762574 M6.025581669920143 1.6821665968745947 C7.20633208914226 
9.397347112667386, 8.470898212627826 15.285644312420139, 8.279912364595589 22.425973190810602 M7.448345644325968 -0.07802485954016447 C7.172565052889436 7.28805011543337, 7.471541623257249 13.824173804090986, 7.052947474570033 21.603766767192703 M7.2161053673507105 22.142612932809314 C5.322696451779057 22.253140756213647, 4.432325873590679 22.34991744765606, -0.08307555796070243 21.919423912106847 M7.402111014316135 22.296497102741046 C5.075828141690694 22.119849403792685, 2.5059160782790246 22.19208792825014, -0.2162256637019158 21.868507415461156 M0.4910249803215265 20.213669015910547 C1.6980799227952434 13.139553120063661, 0.941066303849168 7.475736525364237, 0.9579601977020502 1.6967032756656408 M0.7610144419595599 22.874971745420318 C-0.24699943891357967 14.034067309832816, 0.5082320905606788 5.515128091835646, 0.7519592745229602 -0.17385950218886137" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(55.66031502161354 12.28632084886354) rotate(0 3.6857654763949768 11.318249543958633)"><path d="M-0.5355645325034857 -0.23554847575724125 L9.211057674997619 -0.20559231378138065 L7.224952411764434 23.143745245071706 L1.2792520020157099 21.40988838824874" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M-0.3022695347717244 0.35565498652385585 C2.0870943787608054 0.3664182428493844, 5.657355229148095 -0.6394265962433922, 8.096774113791383 -0.6621762900182322 M0.2974531869213547 0.09479603644311518 C2.385340957156624 0.1926910592618869, 5.663849263716174 -0.11271156307046264, 7.068434393913692 0.24064158145378767 M9.053697549664548 -0.1362022664397955 C8.829244320973217 7.620235753016812, 6.700591092690288 13.892125203747833, 7.850660228097013 22.685747434976157 M7.293506093249789 -0.33799486327916384 C8.187708613011768 8.185909373451002, 8.085736837718418 16.1312718545049, 7.028453804479113 21.959472975353357 M7.567299970095729 22.300051978282248 C6.014901480351943 23.106526232925738, 4.436482704510095 22.537880971412648, 
-0.027420003396695702 22.972474247093235 M7.721184140027463 22.88823072183104 C4.777337294472156 22.871869115737546, 2.149304087776706 22.426389436520605, -0.07833650004238724 22.8506843230891 M-1.7331748995929956 24.550920834424552 C-0.7931232721156043 14.771556238886827, 1.4024765818768579 5.549757332436016, 1.6967032756656408 1.563819656148553 M0.9281278299167752 22.294977000574228 C0.3397859148321689 17.234187979457644, 0.12094516676049835 11.756417261258541, -0.17385950218886137 -0.37203015852719545" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(66.9310344827586 13.835594199804149) rotate(0 3.6857654763949768 11.318249543958633)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M0.34322934899014734 6.404043390651338 C0.6624300982644543 4.481709919376768, 2.102082363092603 2.929648813544574, 4.934326984318441 1.0372144747373873 M-0.40120653991120553 6.139250233446353 C1.488549329046423 4.5581762812239965, 2.909793415498857 2.6493006605083895, 4.956180079238957 0.7079593829659039 M1.4327498647477137 11.248899830363273 C2.649045165702563 7.630088356332252, 6.7936722034483 6.388321735390163, 10.096647083078576 1.7101541454215734 M0.30706777685318465 11.648835073548883 C3.8360702587056545 8.791321417211156, 6.8719991901069255 4.273260110664813, 9.73160963670264 1.3875801326018957 M0.27039420860351515 17.159160381090516 C4.032425946573632 15.149862703097563, 6.696312015421077 9.839038391031536, 8.317191855706351 8.807011948040904 M0.21332125407175628 18.790572178550683 C3.1513909416209462 15.529562547607389, 5.456526753549495 11.987816070841976, 8.50728783850007 7.930197383630308 M-0.3209813630582181 22.28888302271853 C4.952756580839749 21.226999079212543, 7.876987736267955 15.747448815615574, 9.832110296822183 14.635123356593468 M1.0704283054078578 23.824989548685426 C3.922561173065596 19.568054720140402, 7.659986147368928 15.979765264129192, 9.062317568835832 13.870350727460973 M5.8215113547764386 23.894220058731875 
C6.8687038848770605 22.574245062889702, 7.64673834736125 21.87058670165667, 8.94044955886346 20.259779638263197 M6.019476823754948 23.88537312035463 C7.085885682792281 22.58684324144952, 8.049396438575107 21.310415294368173, 9.164982628509446 19.69558451965152 M0.11254258591050537 22.73433086528881 C0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881 M0.11254258591050537 22.73433086528881 C0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881 M5.154914029936921 22.60556192271389 C3.6969444563240272 19.791904554876552, 2.0943971435021576 18.196808337256012, 0.2550059190003878 16.98184214077915 M5.529157211508285 22.098179554044975 C4.051250120862574 21.065030629808405, 2.1310395191277633 19.401826510498633, -0.17878061067473822 16.87226819276835 M8.664201152777373 19.153788725701162 C6.820648636185988 17.290046791397568, 5.978489345500897 17.816305474268862, -0.623087389970201 12.40959661078245 M8.99208290268302 19.533738421131858 C7.689198470699609 18.231427825152608, 5.523383263083713 16.67110611556701, 0.303927809728307 12.360301257451777 M9.307349779801259 15.651035585059317 C5.7507843663399605 11.091575325869439, 2.066915540322647 9.813402668997822, 0.4000587232476538 7.695587443626791 M9.63923595372569 14.928537091278002 C6.928233149053461 12.087873349512567, 3.3331132466270876 9.526868840173425, -0.3868663421300731 6.735756595359489 M9.58058611154077 8.441964676693136 C6.280102485794384 5.954771907290144, 3.852731651767772 4.828067304363285, -0.32648926592360117 2.3674739472724733 M8.545789345535313 8.967272827304955 C5.491572577894129 6.783750429819175, 2.691651516175046 3.555525097142441, 0.044499105485874546 1.547150254078978 M8.823332010748404 5.29913711765926 C7.156662293843425 1.3238206331626325, 3.083964013598174 0.051264217982659455, 2.546328189216408 -2.651453008929086 M9.154374024345147 4.028175446057279 C7.335729850575792 
1.9776862374470778, 5.192528646920037 0.7706968467230366, 2.281088480625155 -2.0374264172816696" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M-0.3318240854747156 -0.617975727987058 C2.1380129566140633 0.01073319755071734, 5.759681939247164 0.18444185471795393, 6.7537409977222485 -0.5476046566960793 M-0.07251587977784135 0.3260167237690098 C1.2946083662070693 0.23138485495592137, 3.0656635811973376 -0.21407443714576596, 7.1819619345208725 0.3324439266189426 M6.051644527279905 -0.23764579556882381 C8.182432136010792 9.772904270303998, 9.204192420911458 15.889481939326235, 8.07328524526315 21.017951061608848 M7.1287744715075405 -0.09810798335820436 C7.391380596729638 5.6355844980710845, 7.582915443750741 10.950525262886604, 6.420023299680224 22.410575828174707 M6.69882546880465 22.06392867305334 C5.093871544329115 21.944800785638876, 1.688852744714333 22.043337227842024, -0.24321155493599284 22.471726185885817 M7.161794870496661 22.526681270104667 C4.8829520859719295 22.73687869683377, 2.40678627943932 22.71131594549412, -0.22882631997998196 22.463186994674015 M-0.4307693038135767 20.97143470275646 C1.7581314163380628 17.38230765908363, -0.0291559977358814 8.425697476372715, 0.323884567245841 -1.7680544760078192 M-0.16971688997000456 22.23250334582722 C0.9092588839477871 16.98188137279433, 0.16385639881556965 12.85688024174142, 0.07439181674271822 0.9673859877511859" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(78.01698592322589 13.502516930389078) rotate(0 3.68576547639492 10.973421957751771)"><path d="M1.2554447744041681 1.5747052635997534 L7.969310176485237 -0.5990929994732141 L8.55335338126963 21.697289063479822 L1.737737962976098 20.296054734733026" stroke="none" stroke-width="0" fill="#f938c5"></path><path d="M0.12701584375485553 -0.45765263995995686 C1.5688841382110517 0.06155833299447492, 3.567021902373667 0.6689852920610172, 7.212759489961164 -0.6137036826800424 M-0.11040909015224959 
0.059688127814591485 C2.5037540019137823 -0.31984458593317544, 5.35350954333268 0.25603505880549593, 7.308977287408584 -0.1489033558806039 M7.018858515583929 0.14878363348543644 C9.119101528854813 9.545621116704847, 6.045116727085556 17.392846554636172, 5.835295223557409 22.3119581368529 M8.120037205443737 -0.8546781437471509 C7.194632533556559 6.855634245385087, 7.120044562584498 13.474038179622305, 7.540651954636928 21.586724076304655 M8.101307632638928 21.924839718791883 C4.012935709200087 21.45805445436708, 2.202371883681761 22.21554042374826, 0.5571828551717339 21.952611561260603 M7.048484799775023 21.922569268940446 C5.705870589631688 21.870750695170127, 3.618940025213984 21.76696117659838, -0.12877116950375875 21.708620191957667 M-1.9885263238102198 21.79266727238528 C1.8363539481168487 13.451780908127846, -1.3647937870020173 6.533084256394826, 1.8565778341144323 -1.131369462236762 M0.3958599613979459 22.50304055574916 C0.3685855455714068 16.286728846068964, 0.22510223463352536 9.987724990087498, 0.24846056569367647 -0.5600334005430341" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(89.33298427080138 11.681647179506228) rotate(0 3.6857654763949768 11.318249543958633)"><path d="M1.5747052635997534 0.5977792236953974 L6.772437953316739 1.1818224284797907 L7.121976100766233 24.374237050893363 L-1.6507891807705164 21.391954114320335" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M-0.4576526399599639 -0.3466241864864994 C2.3944513850948166 -0.27669633139691435, 5.391478976432508 -0.1714730472041638, 6.757827270109901 0.6181264234795958 M0.05968812781459243 -0.3258317074027615 C2.4594553907375336 -0.2835800572699514, 5.818697592270858 -0.04047184735080653, 7.222627596909347 0.2659403527415318 M7.52031458627539 1.9347719755023718 C8.744157341332441 3.7208969470214512, 8.423791137547498 8.183983817344242, 7.736645174139312 23.06507784280425 M6.516852809042803 -0.16867681872099638 C7.1048020336275135 6.462885384995293, 
6.668855241263393 13.58142135122379, 7.011411113591066 23.29174111842608 M7.349526756078293 21.972376821110913 C5.2264661482068195 22.929263640360926, 4.3710208294268975 23.26665961107392, 0.005767645757062234 22.402657411607386 M7.347256306226857 22.948456334286195 C5.56824302446088 22.7745341487926, 3.7363564622913086 22.44311868781054, -0.2382237235458787 22.927426157132626 M-0.1541766431182623 24.37575912627822 C-0.33120365819393405 15.385111444120803, 1.0664744190269826 12.914077027954594, -1.131369462236762 -0.33531163074076176 M0.5561966402456164 22.804520384068248 C0.35605415069101565 17.938754926728762, 0.2731518682145905 13.440395360807733, -0.5600334005430341 0.6277223872020841" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(100.60370373194644 13.230920530446838) rotate(0 3.6857654763949768 11.318249543958633)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.1957550689251969 6.687206199069612 C0.6824230711572109 5.422426855034799, 1.8719805533338358 2.4575357315330746, 5.664970709713644 1.1404830265638641 M0.0829556595860173 6.732782163892977 C2.1188497557710617 3.886008973449713, 3.9567827235703823 1.9899367041124933, 4.8638106777840315 0.344581752255302 M0.8968301805760368 11.06982517936039 C2.4902429966498163 7.364390327024551, 6.120375579574652 5.476856066649113, 9.829335682903482 0.72155629342992 M0.6178433518949555 12.096617914873688 C2.650085016481502 7.868621539301731, 6.723822717589006 5.43961653470112, 9.687760940351588 1.567158571003118 M-1.3490122233098607 17.191285490524642 C3.0063263846061985 14.342482621456398, 5.157437449657136 10.735572155399584, 9.485223053492682 7.2409602920331375 M0.09074712437109544 18.414757397957764 C3.1211115806126584 15.874891965141751, 5.025176208363602 12.086848239319288, 9.653047811430024 7.909850685465368 M-0.09294758498681466 23.907384307437045 C3.301273797188709 20.689821873596124, 3.894663824061067 19.24669730333069, 9.511903699374788 12.34245797337375 
M0.8334506820047857 23.378130258979674 C4.448132782443175 20.21254896068377, 6.833544402347294 15.725559554509545, 9.748636921522237 13.99182425292551 M5.635629904004153 23.210709829740367 C7.430430604364139 22.774134488627883, 7.857945747593954 20.841972334551706, 9.028249271879016 19.439726612183065 M5.972980946548622 23.48288449381492 C7.136270914802845 22.58801297492195, 8.128127685902022 20.864720486174257, 9.424810994848652 19.87065607949792 M0.11254258591050537 22.73433086528881 C0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881 M0.11254258591050537 22.73433086528881 C0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881 M6.093573960166108 22.25905401580013 C4.500946948496016 21.23148074017254, 3.2211458580362833 19.21610642600191, -0.40150092933487824 16.339766149148595 M6.053391279901993 22.600894986688864 C4.293200214913044 20.862442315061188, 1.7430336152780552 18.902267893840204, -0.43500880729567937 17.280989103317513 M9.28056087539166 21.053959040013144 C5.539345979974032 18.392080809253983, 2.1033110312715473 15.648656791086532, 0.8130343718286579 11.443160403487985 M8.864138618005514 19.935172753264197 C6.230648858602779 18.134327929612382, 4.2155070040628075 15.729422825920341, 0.10401449276320385 11.544578034957652 M9.91930146177281 13.970676033748777 C5.532784824774005 11.746392058067856, 3.599893450570887 10.763628705674154, -0.13580577596365728 5.780346014417082 M8.95505416963231 15.28963711973326 C7.075222996895217 12.8049450929475, 3.3516328339312222 10.302960568257399, -0.07489195823944428 6.851241379903003 M9.914210391039823 8.819560375526601 C7.351242665736041 7.821970066860937, 4.44496284258774 5.585628139576228, -0.11096915817747699 1.7385949604369935 M8.553151991399801 9.169551351942177 C6.432862980987809 6.823314034426687, 4.842541479864386 4.870687586403165, 0.5994117620315136 1.3359943686209854 
M8.835978060009497 3.736580999886204 C8.347659821152826 3.4720363895725868, 7.108715680154763 1.2058362280737867, 1.1894054137094339 -2.670749506035638 M9.083029723240601 4.167416828536947 C7.1289149785426495 1.612810758842918, 3.7447169526498083 -0.736433303491734, 1.6729480865335487 -2.58238961320166" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M-0.3466241864864994 0.004850752823676752 C1.380388737107566 0.6580429758862247, 3.129995505429311 -0.5679384186840555, 7.989657376269549 0.5719032709576586 M-0.3258317074027615 -0.32390716605711073 C1.8651716092972808 0.24003485311865097, 4.279113303814108 -0.16084098144352466, 7.637471305531485 -0.31438709722256647 M9.306302928292325 1.3780294749885798 C6.1924017800073425 7.794582133117894, 5.795378877041797 13.838951945177872, 7.800109707676938 22.56157623756176 M7.202854134068957 -0.19871648121625185 C7.297701774151899 5.476162073332922, 7.710087565692045 11.861530854627702, 8.02677298329877 21.824553778986093 M6.707408685983602 22.104365571495244 C4.946417998556207 22.859651121205268, 2.6389141305767048 22.64006631400316, -0.23384167630987995 22.153136845678098 M7.683488199158883 22.561308203186012 C5.371739314452907 22.452663755301643, 2.8378879236555026 22.39342043505877, 0.2909270692153608 22.473439022458976 M1.7392600383609533 22.457810809495506 C-1.236821325181765 15.539385362619496, 2.08164420569058 4.872922577486651, -0.33531163074076176 -1.8277274873107672 M0.16802129615098238 23.192485591987726 C-0.06507085133817472 14.31931004934167, 0.15885172914240084 4.49591916703497, 0.6277223872020841 0.7873526317998767" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(113.44827586206873 13.145939027390327) rotate(0 3.6857654763949768 11.318249543958633)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.7288802548916885 6.601354410548799 C0.1307399341667509 5.446333101498398, 2.5339963734626307 3.685793048281024, 4.3960674780928555 1.112589459478598 
M-0.11087012175935002 6.568490841661684 C1.3743267155048193 4.559271304353472, 2.97511008037005 2.919962652303248, 5.190431292312688 0.24826566500936087 M0.9048826429154384 13.370234764597893 C2.149936808269546 9.450704941687233, 4.549994473829592 7.6092131995589956, 9.185732375894737 2.146323674382755 M0.057424880200962924 12.336570774156183 C2.3031140783307813 9.736671879327615, 4.686117832241708 7.3781105394577535, 8.790116459288699 1.3704131842834144 M-0.5600273345178227 19.480852370989197 C2.67674904016384 17.518290666756872, 4.090111348584494 12.7886346118244, 8.17081267098345 7.337386188916149 M-0.12720756787343424 18.523808062263452 C3.6419393606286765 14.506146586167686, 6.871419965449026 10.661317912947352, 9.714020063918161 8.301450540292295 M1.7455731356142672 22.741088442258935 C2.971506825795825 19.563762877401658, 5.266644716714533 17.663629320639323, 8.588778626061075 13.572044047129967 M1.4007615144456387 23.539323011400576 C2.15814810817845 20.85661566186247, 4.12908009898633 19.421967979493218, 10.01403673368416 13.606157121084326 M5.946273174615916 23.272025530152163 C6.949001223554608 23.20588902768108, 7.8682662822926 21.735310773533232, 9.626406752238093 20.09946102521083 M6.094293408342045 23.856727724122024 C6.974621503419702 22.552231523011216, 8.451177561837145 21.183436602212385, 9.426913928672715 19.88137470081149 M0.11254258591050537 22.73433086528881 C0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881 M0.11254258591050537 22.73433086528881 C0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881 M5.005240949492585 21.73629579417385 C3.895579427011634 21.610770598672232, 2.0355012261312817 19.057325867840902, -1.039723313787698 16.70704622708286 M5.7123769304413905 22.297232173979058 C4.289855913299945 21.38813618960238, 2.855769281299908 20.034441500902005, -0.2059429718197578 16.917933361530554 
M10.311413944220245 19.76244446453173 C5.842689971237482 18.167559276547543, 5.208536332339324 16.029726454267834, 0.23993961045704526 12.385934234378642 M8.79379541565299 20.398451994587667 C7.424459454318667 18.600533953461728, 4.965966874032738 15.801559337332986, -0.2500695161689098 12.355382872899776 M10.245945599274526 15.00983737911144 C4.557634894486528 10.988886122897988, 1.5329844528330594 9.278923550139261, -0.185635923587673 5.429290055550009 M9.22640738908183 14.980893167518543 C6.194439806060802 11.718346821497732, 2.1031870027601163 8.305777312483393, -0.5480599275258861 6.260195954070732 M10.107376539702392 9.142722466304953 C7.574923475598925 8.634021473684244, 3.8486310924248137 4.950124268563841, -0.385837753777605 0.7566167466499149 M8.97538882139591 8.96357209412014 C5.8070628410215 6.730006974433245, 2.428176578440914 3.743225226414562, 0.42249030497928275 1.5603410003863512 M10.425120054724234 5.209807874952961 C7.659912855922 2.5665881224313796, 5.607550261954416 -0.5914069596914173, 2.04687416776255 -1.7965226080806973 M9.351751944495904 4.657759043240507 C6.586356361689889 1.8169320144231618, 4.823675997639515 -0.1229408694693293, 2.2223247381782554 -2.622038071563286" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M-0.3892816547942453 0.4919462791317373 C2.6911196114404436 -0.07522285232662045, 4.709373966138839 0.3850977131170454, 7.742139628844251 0.42008558635712 M-0.28918891042532224 0.030270157602468983 C2.6887961815893053 -0.23988074860606007, 5.412565545969817 -0.36117647214401133, 7.077201512538528 0.3123480307626846 M7.957674228512815 1.7220624182373285 C9.24828715885782 10.476796354456308, 6.83862044180536 18.4537688313247, 8.903783940636686 22.020475437524375 M6.773377723918429 0.5454891426488757 C6.7699815981347795 6.169376358632517, 6.655013149257445 13.23046888925239, 7.7857416643958 23.0017590137521 M7.004117950186225 23.146922815943345 C5.615467452018722 21.952637206089783, 3.092277505064931 
22.946080835388454, 0.5845872242818122 23.025915115330523 M7.33514844475029 22.315126459891548 C4.734551637783485 22.62072071746868, 2.603057080248773 22.379054544451666, -0.17140091309227312 22.749364622245366 M-1.6635945830494165 23.262380709531364 C-0.01805405426005205 18.705986644907053, 0.8940909047128789 11.085820378744659, -1.5918935928493738 1.7043795678764582 M0.3914818214252591 21.855501106599924 C0.6088518867555469 16.015586707932236, -0.117858581775108 8.063500182494533, 0.3322915779426694 0.04675887059420347" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(126.41533860447021 12.425543420240587) rotate(0 3.340937890188002 10.628594371544882)"><path d="M-0.9847520384937525 1.0343498680740595 L7.784357583301471 1.8991090152412653 L6.573150491493152 20.97876907896015 L0.40439279936254025 21.702006772269932" stroke="none" stroke-width="0" fill="#f071ff"></path><path d="M0.4626702333435717 0.053225398801870716 C1.6711114256439332 -0.5556020172239233, 4.848922866846407 0.5631991571878138, 7.034859378861481 0.08804253366663639 M-0.2913061063479764 -0.29766729990549823 C2.240153419433423 -0.13054552680806225, 4.2538057425122515 -0.14876949402900078, 6.784182002122344 -0.1507593583465518 M7.307757401990102 -0.06824306584894657 C8.412244828896377 6.749750082697048, 6.5615594112442475 10.219802786296754, 8.386255348252462 21.363309707601992 M5.900877799058662 0.4464438306167722 C6.852842973721117 4.5717695636387825, 5.81181653047809 10.036363981048, 6.728634650970207 20.354631195885567 M6.590489195536083 21.309742139516256 C4.611526009299493 21.835193217584685, 1.4329768112072243 20.89854602741276, 0.3058446212155237 21.74655027489418 M6.498969256864388 20.93886092470857 C4.781681152824069 21.5031845001482, 2.1353103559249176 21.04526379894313, -0.1078528730421916 20.94057461440266 M0.5089259836822748 19.623347070177523 C0.6513333412343466 15.455195141635713, -0.6522062329119241 4.813004135335113, 1.8812052104622126 
-0.3432857785373926 M0.4267670614644885 21.73001269541445 C-0.33461315425288407 14.312780228341484, -0.16370836885821546 6.349623574561974, -0.27960427571088076 -0.7779928399249911" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(136.2048401385332 12.42554342024016) rotate(0 3.68576547639492 10.973421957751771)"><path d="M1.0343498680740595 1.1024818029254675 L9.270639968031105 -0.1087252888828516 L7.093111288660225 22.351236714866083 L0.4448180291801691 21.757930471446436" stroke="none" stroke-width="0" fill="#f938c5"></path><path d="M0.05871894175208603 -0.6687671280275931 C1.4108231851597772 0.2736341284684559, 4.7056107199581065 0.3383736546106434, 7.468660603716863 0.25169392399795165 M-0.3283903782424522 0.00821273786043547 C1.8202522037335573 -0.1691789095514301, 3.771695492401156 0.1771400599332419, 7.205211297501096 -0.08171975953095667 M7.303287886940893 1.8389684055000544 C8.032879592649511 5.451109284494298, 5.597084106676152 12.630032161169206, 7.477651917302069 20.51192188530795 M7.817974783406612 0.29014770220965147 C7.177182219181729 4.60270315285127, 7.789740435038281 10.753922658363495, 6.468973405585643 22.591482821736555 M7.4295085329133554 22.349464378689202 C6.216601712667501 21.51942315334391, 3.440176794530306 22.27417373426137, 0.5398699103918245 22.171064136624466 M7.0203476259128825 22.286204368691784 C4.959875384975806 21.762016749730883, 1.7906832737158787 21.78750228047774, -0.34929276245483043 22.138655774584677 M-1.63384167291224 22.775450768765133 C1.8096446536970872 13.057609370517483, -1.3491389609383804 6.999445477762997, -0.3432857785373926 1.25218422152102 M0.47282395232468843 21.63184658053897 C0.6810213537988495 15.01617596249763, 0.42591713984583135 8.094088832920646, -0.7779928399249911 -0.49237601924687624" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(147.5208384861088 10.604673669357297) rotate(0 3.6857654763949768 11.318249543958633)"><path 
d="M1.1024818029254675 1.8991090152412653 L7.262805663907102 -0.27841966412961483 L7.775923752152494 23.081317117097434 L-0.18891344405710697 23.595641007306632" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M-0.6687671280276033 -0.6496885626783846 C2.1687227845043253 0.5787153301066035, 4.077508593879771 0.09125775675181594, 7.623224876787909 -0.2399110830902425 M0.008212737860435582 -0.15129599901315827 C1.9052408346010832 -0.10572283744378166, 4.39025407384342 -0.10064157964025477, 7.289811193258996 -0.18829948096712432 M9.210499358290008 0.8439018931239843 C7.4306527844219366 3.3161582704326955, 9.146749364450471 9.922224691076368, 5.93660892259436 21.515179516407308 M7.661678654999605 -0.28026663791388273 C6.443861610839026 8.009419093680757, 7.241499222466604 14.451697484271978, 8.016169859022966 22.00114186167216 M7.774151415975618 23.22467358769744 C4.24785838524712 22.197150167475083, 2.29627527567968 23.18216675632145, 0.22422022112092765 23.08904793000878 M7.710891405978202 22.888851648031693 C4.825270363423301 22.313430999324822, 2.4489655066381966 22.216969660087035, 0.19181185908113718 22.94387208930801 M0.82860685326159 24.317912878128347 C-0.5325342337118868 15.690783295325186, 1.8236036976350065 11.356521373044487, 1.25218422152102 0.18455704115331173 M-0.3149973349645734 23.40014301299548 C-0.24069767154584507 15.5019001353388, -0.37620939768682105 6.572228920584957, -0.49237601924687624 0.5171749340370297" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(158.79155794725386 12.15394702029792) rotate(0 3.6857654763949768 11.318249543958633)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.013008253243739332 5.658745958705537 C1.1582632845551535 3.7839145756650368, 4.009100879140464 2.2009422307441544, 4.89662575903768 0.36136894708846146 M0.11947054977995658 6.143046715532534 C1.6984690175865513 4.778406975115222, 2.74078888659107 2.4411618675461995, 5.280430777328557 0.613406866028648 
M0.2578982087876307 11.184614981196402 C1.5533114393351806 9.499575112159896, 3.5175849329780453 6.99746953015816, 10.015513383661462 2.5053495818992344 M-0.19185702688064105 12.24963403804931 C3.8776687282704296 8.679757956655553, 5.960312610568196 5.12691821954405, 9.02275212422858 1.5265954972488074 M0.8167141224676511 18.167722516786927 C3.0388044264607164 16.58581435285724, 4.6490191043578335 13.98912383333935, 9.673196522988455 8.908306322506848 M0.3918111537070711 19.07651747483822 C3.961021379795617 14.374404927294238, 6.82956459985793 10.20803774523026, 8.93231475595765 7.404529978501828 M-0.1981439864636747 23.771888617925743 C1.5973173768516802 22.545256650451222, 4.214424483280778 18.062823025858112, 10.006333934403054 13.584333022845605 M1.2188467676843167 23.832582095625277 C2.7093921191758845 20.48591076875468, 4.7305312437861655 18.930299741316357, 10.000637383041003 12.906102225206489 M5.509095397325572 23.263537947945437 C6.1870068669120535 22.5770900206534, 7.015562042794304 21.659339845951955, 9.335419618258296 19.859516686412306 M5.764661662526768 23.70272547488115 C6.920128527209017 22.470023918098068, 7.773192300422957 21.437998780339203, 9.419969331905289 20.15874414041339 M0.11254258591050537 22.73433086528881 C0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881 M0.11254258591050537 22.73433086528881 C0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881 M5.412531408171784 22.854026697799338 C4.065781921735993 20.09063008743878, 2.83427516438454 18.463323104440946, -0.3614184508116238 16.761086945161473 M5.795738842215073 22.260075401365533 C3.778604451883542 20.64980105885142, 1.8392325528012485 18.900505555495435, -0.4842113090695136 17.427025978565467 M7.993788230871855 19.251635424462147 C6.109472205258845 17.11799980910361, 4.671540317263059 15.357419985632335, 0.5363550348039692 11.854882884738748 
M8.930067226184606 19.975018166233788 C7.383945809262959 18.229407751154696, 4.935615171334181 16.740971247865225, 0.10796590520255012 12.68320789464355 M10.663626029686817 13.861893810881766 C5.840092333575545 12.256077127518786, 2.2473504831014885 9.497972046610403, 0.5296206481820414 6.1792659514316615 M9.963995282091016 14.686076613925861 C6.428192114589025 11.288537221705399, 2.2626819040911696 8.72444633242534, -0.09819443136728623 6.517561025844261 M8.955967867369628 8.694700291469747 C6.718783011369739 6.049542821471571, 3.7815261016505857 5.340587563411138, 1.1512087740849433 1.929563199744397 M9.145104845794712 8.92578584877407 C6.933457071454239 7.18972351906894, 5.321359347715506 6.143196194647029, -0.06229959103206972 1.467270636387938 M10.174803196432608 4.164741279875447 C7.044376627742389 2.0726424514207276, 4.0998592576806745 -1.0833575550212828, 1.7593487523420177 -2.9358503725918057 M8.988257190658318 4.4024965491532875 C7.175815235363304 2.9619738370298974, 5.4804752947498745 0.9763005661916502, 2.196342954311373 -1.7887271098399582" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M-0.6496885626783846 0.32467650127106795 C3.306470238144878 0.45629274021601685, 5.540895678653157 0.31857063680071585, 7.131619869699711 -0.07276501607932684 M-0.15129599901315827 -0.23345343515657627 C1.3973131166932775 0.11122298675601391, 2.9711084042823552 -0.08336230710304507, 7.183231471822829 -0.30658097404606177 M8.215432845913938 -0.011717012152075768 C5.771913439451348 9.203717732092223, 7.469927996812951 13.563754298235937, 6.250211381279996 23.419462730767783 M7.091264314876071 -0.7508787410333753 C7.804881784378251 5.069802587793861, 6.570032659231385 10.690726340632327, 6.736173726544848 23.307831010440943 M7.959705452570127 22.219160210859346 C4.815743839515644 23.014172838095387, 3.690119993542236 22.90098135535676, 0.45254884209151336 22.826830189612238 M7.623883512904381 22.492028886082572 C5.310809883115598 22.585386507467074, 
3.4064588974205035 22.896183037636053, 0.30737300139074375 22.421147444890742 M1.6814137902110815 22.161566367032584 C-1.2924475467309904 17.15073944373224, -0.15836350084399692 9.154753600143806, 0.18455704115331173 0.7529335115104914 M0.7636439250782132 22.49240653834736 C0.4015089214697626 14.599893866071803, -0.8850236295804236 6.630121176401225, 0.5171749340370297 0.5512409014627337" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(169.87750938772115 11.82086975088285) rotate(0 3.68576547639492 10.973421957751771)"><path d="M-0.1087252888828516 -0.27841966412961483 L7.77592375215238 0.4448180291801691 L7.182617508732733 22.90598583489291 L-1.2604091558605433 20.736481371220272" stroke="none" stroke-width="0" fill="#f938c5"></path><path d="M0.32467650127106307 0.5845872242818031 C2.659594255002289 0.16209850719486066, 4.792050379176028 -0.17494222682240132, 7.298765936710514 -0.6427452560514255 M-0.2334534351565727 -0.17140091309227048 C1.667544063680147 -0.13606045548611248, 3.027637299173139 -0.15804028116449023, 7.064949978743782 0.11534264366276564 M7.359813940637764 -1.5918935928493738 C9.277708283471558 6.615288143438178, 6.138406685399507 11.897014786341941, 8.154494595640358 20.38484795286886 M6.620652211756465 0.3322915779426694 C7.481968086712415 6.661298934205447, 8.079848122351224 14.492355736574181, 8.042862875313517 21.81007605593286 M6.954192075731926 21.501697237049015 C5.439505790688921 22.545531275873383, 3.056877441989829 22.458210207573064, 0.19033110169496958 21.54327427612387 M7.227060750955148 21.694012492286497 C5.7434536496189645 21.648021665071596, 4.5451575477049 22.304687428917166, -0.21535164302652104 22.040633006538364 M-0.47493272088468075 20.469474091079157 C1.7902786840829692 16.812388371217594, 1.1612576951417766 12.54906142515974, 0.7529335115104914 0.853534122928977 M-0.1440925495699048 22.455383626628738 C-0.21047230226820998 13.547098552809345, -0.4232440458042055 7.220513804002161, 
0.5512409014627337 0.9495545076206326" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(181.19350773529663 10) rotate(0 3.6857654763949768 11.318249543958633)"><path d="M-0.27841966412961483 0.40439279936254025 L7.816348981970123 -0.18891344405710697 L8.33067287217932 21.376089932056722 L-1.2103625442832708 21.226123990150747" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M0.5845872242818122 0.38941602741325654 C1.6398218087364076 0.21922627387084875, 2.8454732325285232 -0.10523266620643364, 6.728785696738518 -0.6567807564849145 M-0.17140091309227312 0.11286553432810104 C2.061565106269111 -0.03502907249964269, 4.26747004214859 -0.25989028701474653, 7.486873596452721 -0.012576396805469792 M5.77963735994058 1.7043795678764582 C7.5731668394721465 7.43741802787012, 6.34572630344996 18.527643758981945, 5.809534990155271 23.52938674915081 M7.703822530732623 0.04675887059420347 C6.405530850650645 8.0923814522512, 6.672731171609736 15.56681701456005, 7.2347630932192715 22.715149751524088 M6.926384274335419 22.97391079177682 C5.297321114159136 22.8189854213324, 2.595920296836166 22.785096301906446, -0.403569639379679 21.934132434163338 M7.118699529572905 22.517514416829425 C4.132057998121635 22.799416906631464, 1.8985435697537305 22.392253404523803, 0.09378909103482408 22.335401226321498 M-1.477369824424386 24.517704298379478 C-0.5335923871800159 16.159895669158207, -0.0057495674893115745 7.931857318867602, 0.853534122928977 0.9456479046493769 M0.508539711125195 22.356894812206384 C-0.9508550900602565 16.066217030857402, 0.3443126839017644 11.031927913754025, 0.9495545076206326 -0.0543626444414258" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(192.4642271964417 11.54927335094061) rotate(0 3.6857654763949768 11.318249543958633)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.567355200615699 6.777249366839521 C1.9037561241922645 4.341789029725279, 3.27065785341736 2.5307012737222676, 
5.705021855565256 0.04934442973111208 M-0.016461285543615534 6.177423811380856 C1.6438679629538473 4.255618784818018, 4.179019914204079 1.395638612041059, 4.959164331065959 -0.039220675841230734 M0.19373443267304769 10.734978154159664 C3.287109871059163 9.535394042660874, 6.902778428962417 6.031215373286549, 8.220309403975559 0.6140461636950462 M0.23047158923928013 11.674262762765913 C2.4900306541884705 8.98688005447295, 4.811316347948462 6.0217594996020845, 9.640729035035772 2.0900570712295563 M0.39219238231421905 19.27880357047605 C3.4840081940074823 15.09209514747888, 7.6786140733975285 11.542246420949079, 8.441332623086947 9.205994176752151 M-0.21844932600392508 18.592901345343016 C2.8434398304496673 14.27241369406649, 6.183048459181963 9.803682102738042, 8.849655428905116 7.356153825651141 M2.126710600741079 23.0276675786231 C1.4832697620247632 21.85898232796814, 4.604364718868101 18.39458922054696, 8.640875116809301 14.121748954778234 M0.4519616590904655 23.02150516182783 C4.09866092766867 20.525034623427164, 6.131695308343058 16.094319623308156, 9.200961694055266 13.960919194783711 M5.669233822742816 23.305471263751528 C6.3523683349978235 23.033964067742314, 7.330258679350298 22.00510726840174, 9.004792740302204 20.16949086787811 M5.801038281772625 23.638513091644413 C6.652850983537542 22.997535484168356, 7.740627058888571 21.80889189373379, 9.030031701516403 19.979387638510005 M0.11254258591050537 22.73433086528881 C0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881 M0.11254258591050537 22.73433086528881 C0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881 M5.120616082351338 23.009334019049778 C3.489772369413227 20.916992910472807, 2.286960567652815 19.63576735791621, 0.20025094207215377 17.104350837990893 M5.631942789133322 22.0607970921693 C3.5884767136729607 20.488886810414424, 2.3581107497105074 18.995806581254037, 
-0.11535949018013408 17.473175587803375 M8.41702828974098 20.672517095920632 C7.075918833355789 18.789453019549875, 5.220787351741552 15.951173780353281, -0.05698293691315004 12.437650127677028 M9.220752543477415 20.045969207440503 C7.183178509479742 17.970574775799328, 4.897212841951202 16.357144606780516, 0.43841625860146927 12.252216813867694 M10.020160729968737 15.348411119347901 C6.794016666434738 12.523701647400676, 1.635596342067878 9.12657105269637, -1.222762144164138 5.309496609001727 M8.946990627441371 14.754392020907716 C5.8377932512083 12.09453520757226, 2.287382619752878 8.717850704892236, 0.2083445430333235 6.860394317676619 M8.128458099807476 9.57071029888099 C7.2354192272202935 8.219067008508931, 4.381863765005492 4.037941828394149, 0.20718048521389387 1.5400311052955922 M8.7708608923432 9.795908790729994 C6.827550300438322 6.812951417491322, 3.5785689194234074 4.76737689850458, 0.07747152214010378 1.9586745703749593 M9.374862127917426 4.20942298354726 C5.6910310011291605 1.7835134050983985, 3.130940389529428 -0.7145401340147588, 2.7960190143360224 -1.8256336600126746 M9.86127079415934 4.6421511469795895 C8.145470372041883 3.2753665534918586, 6.76298471061984 1.2153854405988382, 1.8595674903056048 -2.277150132281524" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M0.38941602741325654 0.09712965092702464 C1.813464647156557 -0.29836980551842374, 3.050776430237876 -0.7012039784796167, 6.714750196305039 0.016425475720871163 M0.11286553432810104 -0.16631965528874623 C1.8271475282797118 -0.15401929834866968, 3.511153601575278 0.14962282628122203, 7.358954555984484 0.33890031305866086 M9.075910520666412 0.10612096451222897 C5.935282761919892 6.624779376564859, 8.153168434966004 13.93020193351495, 8.264418614023498 23.216794492336568 M7.418289823384157 -0.9025575472041965 C7.846197199657581 5.521166950778951, 7.872890215948246 12.17628049447743, 7.450181616396776 23.182682052845713 M7.708942656649507 23.176368998309098 C4.796117388293722 
23.044533262163323, 1.9625944832506073 22.18841478069213, -0.7023666537539254 23.315219994293763 M7.252546281702114 22.287206325462428 C4.847178405805248 22.8933058046646, 1.7238504976317437 22.67972189430868, -0.3010978615957674 22.789201614580058 M1.8812052104622126 22.293213309379873 C1.3059382920366607 14.127040316269163, 0.8066875820261321 6.102001585243414, 0.9456479046493769 -0.6299946699291468 M-0.27960427571088076 21.858506247992274 C-0.459758510577728 14.634101794226462, 0.583858410131882 6.5469091400822315, -0.0543626444414258 -0.13920983206480742" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(203.2398338093226 11.464291847884084) rotate(0 3.6857654763949768 11.318249543958633)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M0.11800852570218356 6.769682457347505 C0.7760396345729886 5.16557043832652, 2.0534504663091457 4.628771424501106, 4.676252462300009 1.121962122976523 M-0.4818170297564814 6.1902949121344015 C1.2155433274190546 5.1921395720845025, 1.982141480171994 3.484191375847998, 4.587687356727666 0.3957510404137188 M-1.1727349189017309 12.292908288023948 C3.094476424575242 10.543743620507419, 5.0479318069874 5.892090901598098, 8.457093619619561 0.8249195153136938 M-0.23345031029548174 12.100250320273965 C2.775807684083158 8.594849168975648, 5.519288640451704 5.381025146998058, 9.933104527154072 1.947437113640943 M0.7118496562773127 18.655075042974822 C2.973785881127019 16.474777970176962, 5.884645679154769 11.858683303956061, 10.389800791539328 8.912953541687907 M0.02594743114428122 18.06876564640614 C3.2396313530760206 13.849690266304034, 6.318178245250688 9.978408112229207, 8.539960440438318 7.269365544784101 M0.6230100417135866 22.097982762508494 C4.413742535625726 20.037420852671428, 5.897568003235523 15.879505029059082, 10.057083337641352 14.108776738656381 M0.6168476249183179 23.631961638095255 C4.739213082321844 19.072555639569597, 6.921407658909388 15.725589120844472, 9.89625357764683 
12.995432043097608 M5.527721759410914 23.376662169508776 C6.797060165278413 22.836431687864476, 7.304377471493449 21.893227264640167, 9.445584409603892 19.624159729691954 M5.860763587303799 23.638736386941886 C7.137065160859093 22.79087824327103, 7.866639370106391 21.23044469402011, 9.255481180235787 20.15117490186988 M0.11254258591050537 22.73433086528881 C0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881 M0.11254258591050537 22.73433086528881 C0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881 M6.425222381453644 21.65999798648037 C3.471523986771593 20.049683507136013, 1.340307040142407 18.950212010869386, -0.2689652094633572 17.1715000673386 M5.476685454573166 22.03400053888537 C3.881085573609633 21.186803394626164, 2.324118740630918 19.547706783260953, 0.09985954034912653 16.710998265743505 M9.83013044640988 19.72672166046221 C7.451487454164194 17.78747447928472, 4.057052209790512 16.015838179376882, 0.41145686337910337 11.69525003504545 M9.203582557929753 20.43359683625866 C6.400720315019209 18.209647360735307, 4.166740926374808 15.631770449938866, 0.22602354956976822 12.643759144624477 M9.853147252993473 16.04086404646793 C8.23740949689022 12.929123322239496, 6.073788556186298 10.486246971238785, -1.4682244233721424 7.3998169594177305 M9.259128154553288 14.944963240288661 C7.175561229373053 12.803727113177535, 4.488597732035945 11.112298161294314, 0.08267328530275009 6.3831124051672665 M9.323918664450622 10.04762618811434 C7.9552941302264735 7.7949856742995225, 3.8166796656294966 5.569243657293415, 0.10943285607804665 1.56422508882063 M9.549117156299626 9.804026231445427 C6.919052330380272 7.480973717678037, 5.213006303791429 5.712458720949668, 0.5280763211574138 1.9029279349528487 M9.309754132273214 3.381365241800953 C7.177030857031283 1.778123143326903, 4.999109750726371 1.5559887840228923, 2.288191976390632 -3.144582321921182 
M9.742482295705544 4.602542283797224 C7.43125151233723 2.8082631929275466, 4.391726119010941 0.31859004178466765, 1.8366755041217826 -2.296215211560768" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M0.09712965092702464 0.25169392399795554 C2.2962247040178116 -0.1117066188206525, 4.4295263181646725 -0.6957223592262402, 7.387956428510825 -0.30259199802631653 M-0.16631965528874623 -0.08171975953095795 C1.556252934556709 -0.32915408091168197, 3.6044474747104345 -0.03514950367109, 7.710431265848614 0.1555212231570372 M7.4776519173021825 -1.4349220301955938 C6.070036891967015 7.56065176248102, 5.62936050084229 14.448263900494547, 7.9518263572092565 22.0759658120895 M6.468973405585757 0.6446389062330127 C6.811392906555066 5.762555055245011, 7.3099822732294895 10.26109692904949, 7.917713917718402 23.43439914904034 M7.911400863181786 22.860719309038192 C6.029959386102401 22.860571359998193, 3.379720495852142 21.967873604549293, 0.6787209063764972 23.14120420814612 M7.022238190335118 22.828310946998403 C5.7563850720130985 22.397350510673174, 3.620282279489581 22.31160429210393, 0.15270252666279238 22.946363932891988 M-0.3432857785373926 23.888683309438285 C0.3517694445688072 17.136023578208984, 1.0207465263444726 11.075328950129096, -0.6299946699291468 1.5272878501564264 M-0.7779928399249911 22.14412306867039 C0.40205934696647794 16.722965349696302, 0.8344389205500808 9.652587164108445, -0.13920983206480742 0.20219639968127012" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(215.51724137931035 10.387318337735152) rotate(0 3.6857654763949768 11.318249543958633)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M0.061922028693382924 6.618385345538251 C2.484142354807133 4.557275987195287, 3.7104214876341066 2.1590077460211443, 4.4293491337024635 -0.3486126091482824 M-0.5984903420926878 6.252835131113522 C1.6925950878817 4.252451465079096, 3.7084753522121745 1.8807468315635174, 5.310030171977824 0.6057859808054977 
M-1.0003993735555898 12.302017691589475 C3.6532821995576628 10.257945722745205, 6.012188566867276 4.982218835132635, 9.50887060507114 1.551784248488137 M-0.25092944893058067 12.564749265413313 C3.5650478446472214 8.675524597507952, 5.848050082562786 5.864321862154854, 9.357543481485052 2.040201624414065 M-0.24774983455894994 18.22805288573789 C1.9342538418395172 14.281024764131303, 5.686799659177284 10.270719679036546, 10.292630335412996 8.397865867502274 M0.4332222981797299 18.83387231454315 C2.09100354024471 16.685532834026738, 4.216603599378313 13.517580899569154, 8.981598201770412 7.765742806325885 M1.818886775858572 23.06381667511205 C2.5178705892813795 19.609037181245764, 5.488617726436562 18.51932973859877, 8.186311332209996 13.273302561765233 M1.4184405194687306 22.82257448107603 C3.259880875765517 20.69272143683649, 6.476597996694614 16.67688192974677, 9.520906969838805 14.039326541986012 M5.592759752193805 23.825611792430422 C7.307942866480669 22.4480183445804, 8.212307620526063 20.989759635203818, 9.10149448152077 20.059241229320676 M6.0050420330505485 23.603591028293735 C6.997184120310262 22.826767431007063, 7.422964857937555 21.837638021920853, 9.053178748082413 19.810472426355616 M0.11254258591050537 22.73433086528881 C0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881 M0.11254258591050537 22.73433086528881 C0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881 M5.929861573379171 22.098431423529757 C4.256652082614477 21.011830189662597, 1.574092713930904 19.304944365128065, 0.4650933569281107 17.312558349892747 M5.5869803829748745 22.08048941776134 C4.357565699133809 20.956594082376572, 3.1516408444050854 20.236826496550613, -0.00457543637764396 17.13023668780948 M10.121445232337415 20.182753168460913 C7.062717129171834 17.198955974588607, 3.4216690828510186 14.707999372327537, -0.3524713898245271 12.114539642600125 
M9.251906651524902 20.28591876760591 C5.579113179342314 17.57830213233177, 3.6714818775702374 14.709627436250415, -0.2918783027846392 12.603444145029192 M10.473748190486619 15.868717778092714 C7.538174725392173 13.909576462265296, 5.004208184830361 11.334146893947477, 0.13938320819944816 6.027800551682085 M10.138568288478815 14.899667959418611 C6.538828202620438 12.459409775321086, 2.6625674854352317 8.949545898549559, -0.8127368688052079 6.439563459922865 M9.432038860763287 9.117622396811898 C7.605199956174584 6.99650064732268, 3.572165479234237 5.115933526413237, -0.09012071183810799 2.5139482081093125 M8.661270981835997 9.381338697575087 C5.689245326046724 6.923768970958996, 2.1051725573445337 4.108438762764942, -0.1915162765506921 1.648643966966146 M8.546483688967841 3.55379644813161 C6.324612367895975 2.6493646259805126, 5.35246386323143 0.4417941854987504, 2.4521757926715937 -1.48178662365098 M9.031431922203469 4.489962965102453 C6.431945841591261 1.3809518556959768, 4.00655561273762 -1.008657331300717, 1.977829662050905 -2.6438578148910823" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M0.6829217684119491 -0.16973786348026032 C0.8791864254884287 0.37920312296655917, 3.1910578855279725 -0.3957563616960291, 6.912090617661279 0.34360756759894406 M-0.2667292824749523 -0.17323220549182303 C2.72063262621087 0.3456653111718335, 4.749708376006726 -0.24394909796231215, 7.229682582308265 0.25241052498899064 M6.411005449407867 -1.5265737567096949 C5.937954922552307 5.561845427003775, 6.9635645946218565 9.745134224364707, 6.6320605398352654 23.660195352169332 M6.938223285527101 -0.19440644513815641 C7.192152617843996 6.40195309500238, 8.36637523618544 10.782680196483248, 6.733016384692064 22.89978243469691 M7.544573473186498 22.153125447951556 C3.9002544672952384 22.429798976618194, 1.1287212224425012 23.16429728723289, 0.4340183562157014 23.24560414699097 M7.5258059476205545 22.92420604742868 C5.076450937305314 22.22806534097259, 3.1177178310772287 
22.601830003019696, -0.35215642511539313 22.44407947640819 M-1.5875023510307074 24.310512484642324 C-1.8448505228911976 16.617934432282617, -0.9709475463782885 10.968346110757873, 1.2057764027267694 0.032736023887991905 M-0.3091855840757489 21.939170167441127 C-0.7192435305716547 15.113115754060097, 0.8845006484387365 8.430376655925649, 0.4167503220960498 0.9547978984192014" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(254.65193321074992 11.704604740973252) rotate(0 4.375420648808699 9.594111612924156)"><path d="M0.37308462522923946 1.1679444406181574 L10.596934376692616 0.7590200398117304 L7.407183049654805 19.46093582347178 L0.10979988239705563 19.33834641412043" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M-0.4970000984258989 0.6147504312965698 C2.5666620831383358 0.7487948395769995, 4.128921428782176 -0.6417130746241304, 8.272290934589876 0.3662837488167566 M0.25285741212791574 0.006698909873221304 C2.8902842951674472 -0.14102344353873647, 6.429024648194269 0.36868418834627936, 8.794904652839426 -0.34729952833660566 M7.9589205791888045 -0.026688731649193276 C9.744641337618305 7.393941411420018, 8.780272605784631 14.342967080889228, 8.067429173524694 18.594951025302336 M9.556193932658989 -0.16320947611656744 C9.004016722076814 5.33446042948282, 7.69212199793707 8.256938093495318, 8.407539454827885 18.614547775853975 M8.569000174871066 18.629208647847438 C5.519885994902364 19.46477885996846, 2.1876181193779862 19.218828070649888, 0.552670143263983 19.31392880516876 M9.173045233581423 19.560118124066634 C6.287719263156525 19.300496098985526, 3.6924457566728823 18.812921311523343, 0.0585408116636153 19.365088039564757 M-0.41609017383506286 20.33151485154626 C-1.4926972045136928 11.202753158025413, 1.2303449870514662 5.720055084151639, 1.1867734767147577 -1.6167685946440988 M0.08323916493677708 19.212010396381977 C0.7880354359612445 14.716351143686706, -0.10193379675387909 8.544537708445223, 
0.3371522058606785 -0.7768110637272067" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(242.91751404262448 11.828652624152255) rotate(0 3.68576547639492 9.938939199131074)"><path d="M1.1679444406181574 1.8460930790752172 L8.13055099260157 -1.343658247962594 L7.644243550413307 19.987678280659203 L0.15012318827211857 18.474299481697564" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M0.5178532758647338 0.43401835621569473 C2.261361330927914 0.06381452758670983, 2.742281981323233 -0.3612548262620555, 7.680080942451037 0.5754139190228198 M0.005643025601875018 -0.3521564251153877 C1.8956947127412969 -0.2006670287477837, 4.413176392372298 -0.005482314156328258, 7.078972884833602 0.3085010392335906 M7.3438829858008035 1.1984138354448346 C7.38434852279161 4.473797221653989, 6.9559779386091 7.840851079598597, 6.7569356085145005 18.49173644978054 M7.202455469275614 0.41420561125309296 C8.220040110953592 3.499216224336667, 6.670334685829903 8.532473172713628, 6.777236697358505 20.453978844271916 M6.90062842757593 19.79492982039429 C5.518596404261556 19.611319662354457, 3.2536666107433927 20.32811857372328, 0.10589182655516516 19.96267537816339 M7.684807660727328 20.099250695704555 C4.68994052143129 20.28043248139598, 1.5067263148703065 19.998323192745673, 0.14898732640999424 19.979513993323827 M1.184383339816526 18.492225600191233 C-1.8903670984794712 13.742831615289422, -1.0586227186870658 5.9666931334308515, -1.674877821890868 -0.7203190822808749 M0.024642119165571907 20.680484423502843 C0.5436687365131456 13.932691305689548, -0.6069250239741115 8.575474430044567, -0.8047308852647257 0.18540327031420079" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(230.1756529961466 11.828652624152113) rotate(0 4.720248235015617 9.594111612924184)"><path d="M1.8460930790752172 0.7590200398117304 L8.09683822206864 0.27271259762346745 L9.55029635242829 19.338346414120487 L-1.4035789165645838 19.030341736824802" 
stroke="none" stroke-width="0" fill="#e08fff"></path><path d="M0.5558341660672721 0.7800624045249582 C2.5828066630286814 -0.7863515569485454, 4.59313141865066 0.3288307274932898, 10.177411651712172 -0.15796415723026758 M-0.45099606981130963 -0.2464259697001864 C1.686833317646383 0.3020064300352456, 3.8260649471210657 -0.37611562892857087, 9.835584409095432 -0.4010621892081429 M10.597331808830319 0.03140730669447467 C9.553771005153783 5.1840214667691535, 8.677892274095669 10.53897299011605, 8.102446171237634 17.721628365530922 M9.840331380522395 0.9160437605219266 C8.689507183390967 4.206838784782955, 9.258265154250038 8.999792608424523, 9.996609337665673 18.98816384591761 M9.334266738165393 19.45051885073929 C5.926037012331108 19.124877580776985, 3.6513167109184614 19.263528621112027, 0.10859692437756219 19.906719693816658 M9.724001267874732 19.304636972035688 C6.985717077934134 18.858877137926424, 3.7452056170936077 19.459267581889787, 0.13016162891457073 19.569811755306763 M-1.3375781192640064 17.48833856758677 C1.427073089684667 11.285789962731954, 1.2283824940982677 3.560582750119568, -0.6953278950459845 -0.13275511075310864 M0.7747599248859359 19.777490216395275 C-0.13185714081654573 12.265742616062306, 0.3205043728733239 5.609265686521168, 0.17897077677576567 0.5602694660492452" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(59.352226529371364 104.59731144489695) rotate(0 4.375420648808699 9.594111612924156)"><path d="M-0.369867792353034 -1.8882046733051538 L7.223056184536063 0.2848064508289099 L8.643029259925925 18.993351820708746 L1.3624025080353022 19.71968859791898" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M0.257932677524917 -0.028721444771974358 C2.015914188556263 0.8165451309521037, 4.702480022283992 0.0017425785039629493, 9.059581898991084 0.5348390666981043 M-0.37868762312197013 0.002130872364445424 C2.709257334890388 0.061260596374975884, 5.994891460276675 -0.10025076017536844, 8.335863278300463 
0.14684392818677044 M8.170504025536683 1.3454491302861096 C10.477795174083466 7.091496019151587, 10.09845522104235 11.66586883463413, 7.196738823848957 19.38523572593781 M8.857216882438347 0.08731965499660588 C8.122543677918351 5.120618531274985, 9.037327240162123 12.762092335700187, 9.457736861131576 19.326607092623682 M8.690144125729535 19.056705469109595 C5.029676284001398 19.989292894577815, 3.3228012814648547 18.538004371077598, -0.4158204075901513 19.801096916444155 M8.921443670668673 18.918627596176115 C6.6693829215346465 18.91386832889657, 4.890851915223433 19.393695824080837, -0.21989156389074815 19.573036665448676 M-1.5059652661234475 17.403730984649304 C0.35625614458301097 13.536888628688969, -0.6449348133499031 7.107155746099529, -0.5938359338392722 -1.6865075833037624 M0.2089025903478724 19.71385672471351 C0.20810779298953008 15.111003901532284, 0.7633006527000255 10.329384148624959, -0.8558645648956917 0.7250154940643584" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(71.23383216618095 103.78360838930405) rotate(0 3.340937890188002 10.628594371544882)"><path d="M0.2848064508289099 -0.10781203769147396 L6.487004375236438 1.3624025080353022 L7.213341152446674 19.622163966407506 L-0.28099522925913334 21.33647717546436" stroke="none" stroke-width="0" fill="#f071ff"></path><path d="M0.5895505386496986 0.08510253820951419 C2.168904403863962 0.2570942471899623, 4.8114113501468125 0.3913692868669061, 7.261195876415207 -0.6261601212656447 M-0.01583928731945461 -0.17164892237791307 C1.2870896349911798 -0.25568930357922115, 2.964633695267779 0.15418294899674684, 6.7033197308718275 0.09837600365191917 M7.516608679818319 1.3917889799922705 C5.8767811729200465 4.114394308167641, 7.036275274921237 11.788436632802972, 6.861973310040639 22.745043184717623 M5.6991797640569075 0.2805962609127164 C7.363333273234086 7.349649115798138, 6.8454610887190865 13.75427802811764, 5.900110329652534 20.65711687050524 M7.333644228278222 
21.828013108689525 C4.267268308812738 20.89131964411412, 3.2334150103557633 20.583967770045163, -0.6137339157226358 21.808177926859653 M6.417287822165951 21.06922459632513 C4.518573195474599 21.047359219281873, 2.4455150340090306 21.153199165262752, 0.3316425848200567 21.538405967216207 M-0.26194762252271175 20.783501949270693 C-1.1375339718747903 15.929418975252847, -1.5601664873052408 10.32645561488246, 0.6143809054046869 0.7806847896426916 M0.2371787829324603 22.114219675881294 C-0.37630518069159147 13.52510443479062, 0.6392260313998545 7.392231755040276, -0.9441023366525769 -0.763892556540668" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(81.02333370024394 103.78360838930362) rotate(0 3.68576547639492 10.973421957751771)"><path d="M-0.10781203769147396 -0.19487140513956547 L8.733933460825142 0.5314653720706701 L5.736506176107582 21.66584868624441 L0.0792884323745966 21.278763069655817" stroke="none" stroke-width="0" fill="#f938c5"></path><path d="M0.09388621027868205 -0.03597341293252998 C2.447341735313624 0.3854975949589043, 4.782740603270547 0.5740733801730895, 6.680743086174846 -0.6379975228306478 M-0.18936528991423496 -0.15352799063399614 C1.8654802390263747 0.12719316318872426, 4.532705851378314 0.02715206180755199, 7.480060620171816 -0.1114739523781908 M8.76331993278211 -0.9541467931121588 C5.9477951404094345 8.328279915527409, 7.887592839860881 16.42596274294139, 8.8593853944177 22.168595733936947 M7.652127213702556 0.662110517732799 C7.960654726148933 6.9280380728749, 7.752436002613395 12.785834028590731, 6.771459080205318 21.87748238805316 M8.00127170492763 21.44334724400279 C4.920580888931103 22.535020193183502, 2.4808349653286825 21.341668728076986, 0.6078583254631535 22.23426786270946 M7.164166498341353 22.05914402593206 C5.429043244450673 22.212623575335762, 3.788551707032669 22.254336898321682, 0.3102424439247914 21.657570921271827 M-0.47368679381906986 20.611619475182216 C-0.6814919521964778 15.645975950654844, 
-1.5756442954696406 13.475202802714385, 0.7806847896426916 0.4354808423668146 M0.8570309327915311 21.687000695500593 C-0.9531618220826636 14.916074923002547, -0.2460229558965217 5.429827609671982, -0.763892556540668 0.14240322541445494" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(92.33933204781965 101.96273863842077) rotate(0 3.6857654763949768 11.318249543958633)"><path d="M-0.19487140513956547 1.3624025080353022 L7.902996324860624 -1.6350247766822577 L7.09053572353082 22.71578752029186 L-0.6680808458477259 21.70598917234188" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M-0.035973412932530535 0.26007681113229186 C2.4264682950930347 0.3988221338800127, 4.590974809829297 -0.7001764876428732, 6.733533429959296 0.0035900071905613506 M-0.1535279906339985 -0.3495690540011441 C1.6602642378383106 0.14048032330613575, 3.0967890320397413 0.14640272536482518, 7.260057000411761 0.258440288935869 M6.417384159677795 -1.619850317016244 C7.671355414263387 9.133668958105988, 7.6461056350389915 19.11529759296943, 7.5932827712233575 22.818526686760244 M8.033641470522753 0.7368014799430966 C7.2933213112366815 5.796349339336301, 6.367317527295413 9.935461293272962, 7.302169425339571 22.486207530493495 M6.868034281289193 22.286220423146244 C6.209872132979358 22.01011997384532, 3.33858950044144 23.30733148859421, 0.2874239472059208 22.1822953236674 M7.483831063218471 22.45126690116519 C5.946775009099274 22.526557994444584, 4.2395733636118385 22.905268640952364, -0.28927299423172015 22.29372530328431 M-1.3352244403213263 22.01754033717757 C-1.7282090701601893 13.68745588956595, 0.6440287909962742 8.545319668883039, 0.4354808423668146 1.0957418885082006 M-0.25984322000294924 21.744426325554606 C0.701999019053257 15.030528310256788, -0.9977913096878166 7.0305327684086745, 0.14240322541445494 -0.05390601884573698" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(104.35117536322502 103.17893471994631) rotate(0 
3.68576547639492 10.973421957751771)"><path d="M0.5314653720706701 -1.6350247766822577 L7.0905357235307065 0.0792884323745966 L6.703450106942114 21.016333999928158 L-0.6441347394138575 20.91883766919009" stroke="none" stroke-width="0" fill="#f938c5"></path><path d="M0.45053756557095337 0.40821075490787484 C2.9444588469007726 -0.7167955477282232, 3.97269346940062 -0.02241767392266744, 6.696637562535469 -0.03494820933020126 M0.12369834225570414 0.10260726532328662 C1.856385946883014 0.17844347818220369, 3.5539834596387734 0.32835409973609275, 7.700886769685429 0.15383148513877587 M7.576878261678871 0.3199473824352026 C7.446925928749922 9.599176391639336, 7.48858021751869 16.40467149886593, 6.025754702203926 19.98145188286535 M7.5157692860076395 0.5285827564075589 C6.735077863941321 6.59980153621907, 7.447481787214407 14.249455260183414, 6.647762549967524 22.92227127950273 M7.887803300410619 21.257490726217824 C3.8532517973293583 22.519674374489597, 1.9765548240462727 21.457612284776584, -0.46254431736362145 21.363050265998996 M7.6956899775725125 21.87837571292056 C5.940672923618456 22.22003992207792, 3.4887972046781663 21.567023693520177, 0.10501861495221537 21.898570040317647 M-1.7578569557517767 20.433732821490686 C0.7609007173241755 16.663272036934295, 0.5820006542863032 10.91407246758786, 0.5567001793533564 0.4743575658649206 M0.7556879920884967 21.761910019327026 C-0.8870749507335112 12.717985773038397, 0.1994306113335207 4.963006724808849, -0.09743570256978273 0.6812012540176511" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(115.66717371080051 101.35806496906346) rotate(0 3.6857654763949768 11.318249543958633)"><path d="M-1.6350247766822577 -0.28099522925913334 L7.45081938516455 -0.6680808458477259 L6.441021037214568 21.992364348503408 L-1.0280062463134527 23.746983291240987" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M0.40821075490788106 0.6391133507851483 C1.4846980062894506 -0.6830512793877537, 
4.354561752999472 -0.7199471468114775, 7.3365827434597515 -0.37873057982847574 M0.10260726532328818 0.023657240874532248 C2.012932971950967 -0.13621362082933372, 4.067246898073847 0.30461614844445306, 7.525362437928732 0.25649038864412443 M7.691478335225156 0.18009752966463566 C8.676046693299309 5.932774394207455, 7.370219850514428 10.076268900962189, 5.406138920151761 23.197691609742698 M7.900113709197512 -0.7817654507234693 C6.938245140492512 8.220374612000692, 7.388025455653263 15.855703876079833, 8.346958316789141 23.490786692718622 M6.682177763504226 21.959419970431515 C6.271426663916891 22.975550269885264, 3.5014019598676382 22.225582005315715, -0.583793649504554 22.221770179020286 M7.303062750206969 23.002371435685866 C5.245713890020728 22.412057898642082, 2.156638154768955 22.80634950782602, -0.04827387518589771 22.54920416635314 M-1.5131110940128565 23.250879993321952 C0.850740085625579 17.792270545183825, 1.1657971844910882 12.633780497070116, 0.4743575658649206 1.7140618655830622 M-0.184933896176517 21.69239675126469 C-0.5619585138248864 16.576146668216364, 0.14802802387004466 10.133552095116771, 0.6812012540176511 0.26573268603533506" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(127.92194183757294 102.7916163822116) rotate(0 3.340937890188002 10.628594371544882)"><path d="M-0.6680808458477259 -0.9305099155753851 L6.037741040962146 -1.0280062463134527 L7.7923599836997255 21.440167382468907 L0.5619014706462622 20.803078308810917" stroke="none" stroke-width="0" fill="#f071ff"></path><path d="M-0.5783086611239967 0.0032541384213965463 C1.7628300121611689 0.03093743748765526, 4.4058675348468395 -0.21571286386868482, 6.048146637689529 0.22425109858033998 M-0.10104483143446913 0.23426150122321343 C2.4891205120052375 0.11976433786628224, 4.6136502200766385 -0.179062459425873, 6.411284815348373 0.034302630245775234 M6.903627598809408 0.18202759884297848 C5.355241365691317 4.751823685102868, 7.262210138102663 14.758652470947718, 
8.155478740262197 21.545665409525363 M6.612514252925621 -0.15029155742377043 C5.791926020165134 8.47620232885593, 7.369982027788806 14.31852129600721, 6.206698144937263 21.95754847965899 M6.942409328133602 20.845478726387178 C4.882902457442042 20.783111745046646, 3.546541353481942 21.51587505540474, -0.33580499636546657 21.844852377837416 M6.4196661760953315 20.946483699001206 C4.896631583921409 21.26967465104851, 2.8418726130802345 21.09038709168632, -0.10339513714048365 20.96354419763968 M0.4354808423668146 22.352930631597964 C0.4914001250216673 16.96261409176247, 1.648761811251373 11.199637868699439, -1.784145524725318 1.5113759841769934 M0.14240322541445494 21.203282724244026 C0.10325391306037626 15.954350847782404, 0.46642230166549403 8.471598056098614, -0.14049761462956667 0.0396442161872983" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(137.7114433716357 102.79161638221117) rotate(0 3.68576547639492 10.973421957751771)"><path d="M-0.9305099155753851 -0.6441347394138575 L6.343524706476387 1.1104842033237219 L7.5545095921689835 22.508745386149805 L-0.4541104342788458 21.582070841338556" stroke="none" stroke-width="0" fill="#f938c5"></path><path d="M0.0035900071905613506 -0.6748933902543711 C2.8156244272918896 -0.31493082754540513, 5.394089291976189 -0.6353383557192126, 7.6189276373012484 0.20521453064657325 M0.258440288935865 0.32935581689558924 C2.4126052527247914 0.2787851791197587, 4.3417094696987455 -0.27622462829365374, 7.40937405387853 0.05896255082212959 M7.553558551632818 -1.3457762505859137 C5.569407466148625 7.342874315132373, 8.859020534252416 15.036966546500073, 7.660007619225439 23.00400942831866 M7.221239395366069 -0.7237684028223157 C8.215261479532064 5.722612346896295, 6.556805481826604 9.261471848751135, 8.071890689359066 21.01168784860156 M6.917327188539982 21.48429959813992 C5.253357134934082 21.560857401327162, 4.5273708369444465 21.60512193671999, 0.648318049565346 21.809907510337574 M7.02875716815689 
22.051862530455757 C4.380053096229801 21.801694395093072, 1.239056782772804 21.774922476917755, -0.32395242399752855 21.667995283890406 M1.0957418885082006 22.5035440948569 C0.3886897297374413 16.402666514656424, -0.6053542761333778 5.64621985791786, 1.5113759841769934 -0.369867792353034 M-0.05390601884573698 21.84940821293376 C0.8488515610837811 17.700530089165444, -0.6498620812749988 12.782253276127106, 0.0396442161872983 -0.33404042292386293" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(149.02744171921142 100.97074663132831) rotate(0 3.6857654763949768 11.318249543958633)"><path d="M-0.6441347394138575 -1.0280062463134527 L8.482015156113675 0.18297863937914371 L7.933432423436216 22.18238865363842 L-0.3647730741649866 22.675436069848594" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M-0.6748933902543814 -0.03494820933020182 C1.1419408393935055 -0.30741498198705314, 2.3422047304416744 0.2470376837923559, 7.57674548343653 0.047314481749064496 M0.32935581689559423 0.1538314851387782 C2.6850090092450656 -0.22752612326046256, 4.558517822432586 -0.01384495669848343, 7.430493503612084 0.03318986286109671 M6.02575470220404 -1.9653920326381922 C7.957821898793945 9.799161547383914, 8.870232336854706 17.23835769027223, 8.428696465605071 21.072968186470327 M6.647762549967638 0.975427363999188 C8.21011753606264 4.749568453024475, 6.880652295822542 11.565553910030719, 6.436374885887972 21.71799366235232 M6.908986635426325 22.052705438412712 C4.137484432592825 22.815678932349314, 1.3624313566855046 23.239396761057634, -0.13693640516596906 23.368243783454467 M7.47654956774217 22.588225212731366 C4.452291857539247 22.321878123721916, 1.5935757656773504 22.24399190699314, -0.278848631613141 22.74972228444211 M0.5567001793533564 23.110856653782186 C1.664773057127853 14.442264395101303, -1.8334343331805272 8.79877856647413, -0.369867792353034 -1.8882046733051538 M-0.09743570256978273 23.317700341934916 C0.2334975812132272 
16.75722228914092, -0.17273271945167454 12.552614483234128, -0.33404042292386293 -0.46525495778769255" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(161.72894020703052 101.49728754044006) rotate(0 3.68576547639492 10.973421957751771)"><path d="M1.1104842033237219 0.18297863937914371 L7.933432423436102 -0.4541104342788458 L7.006757878624853 21.98578089743487 L-0.47294519282877445 20.40944087773196" stroke="none" stroke-width="0" fill="#f938c5"></path><path d="M-0.3787305798284699 -0.3070559812679923 C1.515911185025236 0.2508915054444284, 4.6353131167016 0.05080930268208385, 7.588590287553791 -0.2229479047563816 M0.25649038864412055 -0.17583806547328604 C2.42982044493939 0.007076804060934597, 5.515642278239493 0.0024235658333410376, 7.64572507953248 0.040866259835481755 M7.932723474615273 1.324221035465598 C8.69745620609203 7.0722764060926995, 8.281018759020954 12.004068577867262, 6.171387207620796 21.808120860602777 M8.225818557591197 -0.6830286337062716 C6.737508995673953 6.954570375960547, 6.2775305642107115 11.589915853217697, 8.196133507342097 22.336754707504134 M6.956802043892867 22.171444136360574 C5.243811057698601 22.536782600118435, 3.7200835044496854 22.620209246090276, 0.6204848878495828 21.36829792704011 M7.2842360312257135 21.70077770823479 C5.731552396479343 21.613236716468833, 4.040858888034598 22.050412336991204, 0.14387105228058317 22.02709792822489 M1.7140618655830622 21.427157475497644 C-1.8452345844453226 15.671762915091044, -0.4309568520730387 4.485725273019405, -1.527785113081336 0.2848064508289099 M0.26573268603533506 21.129331527162414 C-0.11911355306860219 15.520741655211136, -0.3126563613628984 8.55009600475644, -0.32206736970692873 -0.5140031231567264" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(173.04493855460623 99.6764177895572) rotate(0 3.6857654763949768 11.318249543958633)"><path d="M0.18297863937914371 0.5619014706462622 L6.917420518511108 -0.3647730741649866 
L7.410467934721282 22.16355389508849 L-1.5374030377715826 22.654462815853414" stroke="none" stroke-width="0" fill="#daaeff"></path><path d="M-0.307055981267997 -0.6991381080022882 C1.7839625800940166 0.24308758862942392, 3.1204462729142737 0.2549323927468028, 7.148583048033569 0.516880577871738 M-0.17583806547328876 -0.2985194187692965 C2.908014020716865 0.007664473093305646, 5.773531702117446 0.22289604901381965, 7.412397212625436 0.033545551978326205 M8.695751988255552 1.4736029598861933 C7.437559644454481 6.3249320957438595, 5.585552076571943 9.33538942068844, 7.232807897889188 22.335915973069724 M6.688502319083682 -0.47517763543874025 C8.292693295714152 4.21743538043813, 6.67382813357998 11.12978544923248, 7.761441744790545 22.020339748481867 M7.596131173646989 22.266034714413113 C6.270934033881952 22.4580897918616, 4.605445711380439 23.21551108487716, -0.5785459884634403 21.950951518651355 M7.125464745521196 22.522432248177818 C4.6260820130510005 22.3401914719913, 2.641761117831958 22.762911155885025, 0.08025401272135041 22.83843146910243 M-0.5196864400058985 20.852353563191947 C1.4576870111417537 14.84559441408027, -1.9418936463403935 6.2666402118680065, 0.2848064508289099 -0.10781203769147396 M-0.8175123883411288 22.4960014732877 C-0.11455589946528824 15.732069257853368, -0.6194550734402791 9.17365290393027, -0.5140031231567264 0.5552421016618609" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(210.6412950645422 100.69136735811665) rotate(0 4.375420648808699 9.594111612924156)"><path d="M0.038936981931328773 -0.47294519282877445 L7.213438259845816 0.017963727936148643 L7.877952503448569 19.973239902259344 L1.5399870369583368 19.90248424172544" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M0.05616764326389667 0.25767398968417565 C1.5306319621200837 0.589234442021449, 4.372557940968321 0.3408694697451067, 9.359807521801656 -0.41747935805775416 M0.039400122504704205 0.3254994523160138 C1.9650126110442203 
0.02582332189513896, 3.538582613785503 -0.44396988796685516, 8.873613965015 0.2897012031081508 M7.250772299743614 -1.151431304270461 C8.645425623772098 6.881875299502077, 7.389735103255443 16.212187989667417, 10.39006742361777 17.8776126365281 M7.869616940622751 0.7911328944676733 C9.259312653008 4.253535366292679, 8.283222437486616 8.929558517840503, 8.211068017609948 19.48054236565661 M9.619505044948728 19.924808992891272 C5.878572487846078 18.358144793735804, 4.629240731305262 19.05735422153004, -0.20725789787439475 18.604006367150713 M8.885250042604472 19.359014443288988 C6.491679991699526 19.36173483483863, 4.163767371793327 19.225793945031658, 0.37498708400038455 19.074530886822927 M-1.8115646383734638 17.722449136301684 C0.12824338892046394 15.040049315602323, -0.33196489992050693 12.40585172230394, 0.5098938098050272 -1.5686610197385973 M-0.44637079934814383 18.879228196663394 C-0.50074735399388 14.2340516868688, 0.08016885375106861 8.484016480023914, -0.21783930955323527 -0.17498367934641712" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(198.90687589641675 100.81541524129565) rotate(0 3.68576547639492 9.938939199131074)"><path d="M-0.47294519282877445 -1.5374030377715826 L7.3894946807259885 -0.8728887941688299 L8.156547629200873 21.417865435220484 L0.7142610158771276 17.971375803090815" stroke="none" stroke-width="0" fill="#fc54ee"></path><path d="M0.2170593347639519 -0.2229479047563816 C2.97573654822009 0.6539801856162721, 5.225374910974272 0.5082493291133348, 7.019854821843268 -0.5970388375385838 M0.2741941267426403 0.040866259835481755 C2.3206910399926577 -0.25464875474030724, 4.212092270954689 0.09678322855553992, 7.615569361571591 0.27156774577309495 M6.178715381444581 -0.13787600081764295 C7.177854260067202 4.811380694305147, 9.41554464313404 14.198299961109976, 6.013814940455399 18.933326072779646 M8.191098418104145 0.3875299654778923 C6.5732447458145735 6.731887607011164, 6.398520390912251 14.151886586978382, 
7.674356506057661 19.378386107273613 M7.992015840639422 19.299332409798716 C3.826206048121642 20.161090097720297, 1.5554285316134495 19.776463024687615, -0.49213241453750733 19.649744718783257 M7.515402005070423 19.958132410983495 C6.009908896133109 20.00311635265529, 4.331840614092632 20.216404539995672, -0.09577211695621601 19.549081299267293 M-1.518456334825299 20.16094579809303 C-0.07039279375586244 12.218685972674162, 1.3908445042097917 5.475076758497851, -1.625041184451782 -0.2792794498852422 M-0.3201008005541234 19.36701381934033 C0.6182518979913452 13.926641266503403, 0.3456353767026228 7.56787397779233, -0.1812728702802966 0.01934961480065711" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(186.16501484993887 100.81541524129551) rotate(0 4.720248235015617 9.594111612924184)"><path d="M-1.5374030377715826 0.017963727936148643 L8.567607675862405 0.7850166764110327 L10.980483506989572 19.902484241725496 L-1.9065025951713324 18.401451710464052" stroke="none" stroke-width="0" fill="#e08fff"></path><path d="M-0.28552263041870807 0.6619533041476315 C3.9385530188021294 0.3662165178157768, 6.846880900030825 -0.4781791033277768, 8.675886910042694 0.09692902723486574 M0.05233618147858898 0.04296077260813502 C2.665552149155429 -0.5340873366792497, 6.09879249599879 0.24230217725081157, 9.788285058557065 0.06808407377929138 M9.307404022830859 -0.2883827952807716 C7.901706602526823 8.642395160184485, 10.929715696244234 12.230836130845299, 8.5287150159583 20.532089122217034 M9.814581235784969 -0.5911501473888852 C8.662246148864654 3.202256121959725, 8.724478318624852 8.216456402772998, 8.958333867612039 20.032014802935784 M8.69957014186155 18.31026321331612 C7.052706156565347 19.144041066224396, 3.9033472776704894 18.63742776573469, -0.29216389507265106 18.358471106568643 M9.543275353909488 19.446831911611543 C7.453961678062816 19.33702775900452, 5.609190456978142 19.102421035792542, -0.4210804882047847 19.54492671693619 M0.2732464877333365 
19.08478715356549 C-0.005805337270289007 15.0937157186618, 0.6910502736423759 6.81633884697292, -0.2695899592211344 0.07607020698156708 M-0.4931403332957248 19.72092869540222 C0.17705003431885458 14.342267749488169, -0.12856514968360688 8.782234051296903, 0.018678287525978998 -0.22687444833976067" stroke="#000" stroke-width="1" fill="none"></path></g><g><g transform="translate(128.78551472372624 43.63110606137064) rotate(0 0.6689726410712069 25.499127641102167)"><path d="M-0.11396733485162258 -0.5172206226736307 C0.49421996988353767 8.064764385073092, 2.668021065369131 41.90783711745692, 2.9708841971306583 50.58666851497415 M-1.6329389149881899 1.8256346050836147 C-1.0846361757256324 10.589759684818441, 1.2893051975033 43.02417936594513, 2.2555832671186407 51.51547590487796" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(128.78551472372624 43.63110606137064) rotate(0 0.6689726410712069 25.499127641102167)"><path d="M-9.627772883528586 28.812116302754166 C-8.191227618343932 32.4175108136179, -3.892590440391108 39.77484918645406, 2.9008493324926574 51.15371757404107 M-8.310003533975895 28.856388560700598 C-5.331996795898999 36.29735609692032, 0.12924019606037973 46.58343227924906, 2.276089867967971 51.38960550290692" stroke="#000" stroke-width="1" fill="none"></path></g><g transform="translate(128.78551472372624 43.63110606137064) rotate(0 0.6689726410712069 25.499127641102167)"><path d="M8.019102309500205 27.212187460793825 C5.146916510163693 31.19037296747343, 5.148891973888661 38.93726206391053, 2.9008493324926574 51.15371757404107 M9.336871659052896 27.256459718740256 C5.650157380467125 35.36265801452498, 4.4376883717032625 46.25379626005339, 2.276089867967971 51.38960550290692" stroke="#000" stroke-width="1" fill="none"></path></g></g><g transform="translate(10 12.76662868256274) rotate(0 3.6857654763949768 11.318249543958633)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.7552016228111336 6.4426035710173775 
C1.621243728538996 4.299708212135282, 3.1074143241110512 2.366726119987497, 4.6126878235065405 -0.006717076540250166 M-0.485753330183203 6.581137013857357 C1.6515754652648869 4.646728176401884, 3.1990555128822553 2.313813745920484, 5.211171267139262 0.5875905185786301 M-1.2102604017500174 12.500325718380093 C3.1818136753930677 8.98340670422732, 7.280621028042143 4.961200982196949, 10.171657053073002 2.1164157463480917 M0.314295311840217 11.929176432590515 C3.5107046363204786 9.217293830248776, 5.5524839798492165 5.273127431200929, 8.701614792243642 2.1064432523456604 M-0.27357727815865074 17.511522884574408 C2.648758499697518 15.267172215390914, 5.445603818982638 10.325406455700012, 10.236679371915834 6.788906896001877 M0.5671327813016017 19.026803090423964 C2.453525090925607 16.14856697525979, 4.016786112156146 13.240677970246768, 9.615174326676952 8.118868169914217 M1.244936765320471 23.522857990429205 C4.5931753897498275 19.797064317171383, 8.025440202348898 14.587438966480832, 9.567000565751485 14.723294842963734 M1.1935768501924933 23.428394847816254 C3.697228728853348 19.39297638067927, 7.539100467117044 16.33404521544999, 9.07403526299543 13.99516550537541 M6.3197550713218416 23.616661421880266 C7.212091287681375 22.125320225137845, 8.331885681693405 20.97085948034352, 9.521527787404178 19.486949527288587 M5.687275202725422 23.680924404105312 C6.901522153293149 22.781146819126423, 7.511698712191703 21.559421149433785, 9.210891520571007 19.763713223279254 M0.11254258591050537 22.73433086528881 C0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881 M0.11254258591050537 22.73433086528881 C0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881 M5.2340910712887165 22.16052501521982 C3.3012735679612963 21.093576691015166, 1.9948683828300124 19.890107289090835, -1.0519921476037495 16.478169092461716 M6.151125781114828 21.943003031462204 
C4.556681157354208 21.528870094411054, 3.047172754751848 20.038069597303174, -0.42629353311073714 17.17598928466464 M10.235068684523998 20.900413667079995 C5.951159826143114 18.3130208611814, 3.8782560626094917 17.122240424273592, 1.0249070977624481 12.351513953475063 M9.039400917558073 20.021634454880843 C6.349555132820052 17.433826776694623, 2.054701411776085 13.472411219182206, -0.17420711109228687 11.909538326566821 M10.142008377397248 15.856200141078325 C6.416396580109765 13.306236332723314, 4.189216435004601 9.992989587090843, 0.12006462161154219 6.900307366877168 M9.326860079560245 14.932715251055079 C6.998328656360223 12.539625192417406, 3.030417247934273 10.133785220410727, -0.2502952783768556 6.85757502691944 M9.901504903281904 9.912960812911445 C5.756876564643497 5.371202885859453, 3.7366096251139393 2.9818122735653803, -1.1537258057941016 2.0324578583397206 M8.554620808171904 9.270805758141035 C5.667376011511837 6.564894041267278, 3.1793301560033473 4.550190141688032, 0.3253454363818945 1.8803889358572896 M9.578435295241492 4.889782883544971 C7.498976932491038 3.0040753157202316, 5.091224478759875 0.005368196122827129, 2.1711066927684866 -2.968308046513987 M9.164992728716301 4.603654146046896 C7.915394289161332 2.9513231775263526, 5.99789631016448 1.9416086000214126, 2.180452701534929 -1.8271188815662684" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M0.057201613398925155 -0.19641453560150834 C1.699812103655209 -0.4514639959478138, 3.411916645775749 -0.25025606413398693, 7.436843496510396 0.6351625165069053 M0.3515048248741449 -0.3005456320647199 C2.7416175364991604 0.23454588431571824, 4.8398018722926555 -0.1687970767873844, 7.19130043266978 -0.026606238194261833 M7.298265290373138 -0.6067905221134424 C8.815628573386467 5.154582130712671, 6.723947748629845 11.259481623748746, 7.186980140322021 22.18423044833785 M8.223170978875032 -0.48430084716528654 C7.506573572113407 7.155510875369227, 7.134918733789814 14.407509983462823, 
6.839316083044878 22.09252162336802 M7.825286958430421 22.12557423888637 C4.7484641760252515 22.869828597752953, 3.9278039991741847 22.44021278538961, 0.08954056512396147 22.183367470634717 M7.649182903439767 22.993383637660163 C5.37599036369493 22.489570033500236, 3.3973659797048343 22.93699053093902, 0.15289052736329456 22.438355508205664 M1.8339905831962824 23.195940555664357 C0.9457828427045663 16.86841038289031, 1.8647488380163033 9.652839667513723, 0.03198964335024357 0.17207415960729122 M0.1757476134225726 22.149546608919856 C0.5013853302599612 18.311524545820145, -0.7816903302071866 12.221981372185764, 0.5752083761617541 -0.11647429596632719" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(267.9310344827585 10.697663165321401) rotate(0 3.6857654763949768 11.318249543958633)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.7972335740597793 6.318284847636812 C0.49068469613281374 5.3648339337173985, 3.2000245763489477 2.7698239832317504, 5.306576158436483 0.2770983267423921 M-0.5867226827563593 6.289518955026858 C1.3281313765770475 4.649887215737377, 1.9379477138079473 3.6692545056039827, 4.900843532341069 0.7575040750053936 M0.037014898946474695 11.326365638754844 C4.701987058877135 8.459820346404223, 8.273390662117684 2.522310757923557, 8.688209556852533 1.822325688066074 M0.3880379628482865 11.360683243114083 C3.0643856694391785 9.34873852964501, 6.442111163959109 5.557584174075994, 9.711286449589831 1.7670150102359443 M0.29085988061965895 19.457614272367827 C2.0269877429963734 16.161587532958286, 6.34807921242555 10.830144019280512, 9.47945936182894 9.021799991539897 M0.07105135065035417 18.280064124055826 C2.9307024961103068 14.867385362106889, 4.752673628626287 12.832374226729666, 9.724265333455133 7.250856359469922 M0.7832014226912172 22.29613393371592 C3.496449920176392 19.60331272600894, 5.496694346870157 17.2428686376453, 10.029319625235193 12.926615180981972 M0.6612416233027936 22.874657655122157 
C2.260034854258274 21.431048029384865, 5.440014436673795 19.307922346044947, 9.073947022137261 13.635691040776365 M5.914031323524531 23.92563149791225 C7.551943756234243 22.749352745895976, 8.289229393781728 20.50665394277791, 9.227315955529033 19.587613082143278 M5.817067213602703 23.871081461357093 C6.614235159747115 22.436888721428723, 7.534574137560055 21.59226030851202, 8.971242394730492 20.160759775784598 M0.11254258591050537 22.73433086528881 C0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881 M0.11254258591050537 22.73433086528881 C0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881, 0.11254258591050537 22.73433086528881 M5.360962947707307 21.59287161700405 C3.5110655605176966 20.51090267080492, 1.8661468722127266 19.37254289004061, -1.0507447848112577 17.489618448838844 M5.8472162029404755 22.32375497728564 C3.687039546571605 20.58939550305942, 1.3282183817453599 18.882939125452378, -0.21906741392028267 16.987988727092993 M8.10906020448178 20.345073549118826 C6.559621191883568 18.425987346349658, 4.41291781624305 15.996725982437372, -0.43369632532634095 12.90567803454191 M9.651833972228765 19.577616463829763 C5.399905923023596 17.434355210786602, 3.0652837871700904 14.467938945006898, 0.31047415210120133 11.558793939431911 M9.832623835997472 15.955002232445867 C7.341841132438334 13.390410505706186, 4.568598166506716 9.867589909630823, 0.24965204671681596 6.614904623545081 M10.087241978682393 15.127367776297497 C6.141580058795535 11.569772010599072, 1.874114652538208 8.796423327554828, -0.16625890344656324 6.077429287062332 M8.047968661780333 9.865047290638143 C6.590443559389407 7.479719150788045, 3.6614616319392113 3.8990644890152044, 0.09348193788041481 1.9974527709342773 M9.583866068157231 9.56875050989544 C5.892769389664476 6.604074022135377, 2.881959730850925 3.8507324537128147, -0.48754784437755916 1.508833396264189 M8.61973440551998 3.4391351364008678 
C7.4305235156137766 2.9108461156711467, 5.064664852023305 0.24333181629985212, 2.341408791385444 -2.3157057073505642 M9.129756143762338 4.698461833262403 C7.065833912001313 1.6483850815105323, 4.223482234137086 -0.571609781163823, 2.3428401145076774 -2.2243072740340653" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M-0.04264309311222214 -0.2933056011221431 C1.462131578716766 -0.3827730617278492, 3.4610864092484577 -0.1941846224421251, 6.6355325930281746 0.45375600564046814 M0.24177138367184803 -0.06134323327573826 C1.7453047193680715 -0.2572587432224599, 3.246241509885488 0.2928656479164419, 7.068083924463665 0.27765195064981374 M6.4460469120379 1.9598688576370478 C8.143589025426405 7.907609268842981, 7.1656888183837095 16.669507166010828, 8.441362404191068 24.470489671113548 M8.23055876039075 0.1313006980344653 C7.477968731724079 7.510176619898833, 6.557853111826237 14.033228071243181, 8.044278062806598 22.812246701339838 M6.659301486929901 21.954010179098844 C5.943046829017729 22.58155311611282, 3.4227229853291132 23.27257990178649, 0.13260342918638413 22.099149941072707 M7.137803651588086 22.514066017569284 C4.387248407812705 22.663080054972276, 2.0190728432087806 22.383261153269995, -0.08668012467937397 22.581169493434746 M-1.0456213746219873 21.44712470996624 C-0.21843305159612503 17.056528479543445, 0.3298241205115715 11.858141492895061, -0.06167749501764774 -1.2343619968742132 M0.1608767556026578 22.318368783811685 C-0.5484638136969665 15.26019907275996, 0.1905525701893708 9.402296199781967, -0.9093360556289554 -0.5607412653043866" stroke="transparent" stroke-width="1" fill="none"></path></g></svg>
<figcaption>Clearing bloat in Tables</figcaption></p>
</figure>
<p>There are several ways to rebuild a table and reduce bloat:</p>
<ol>
<li>
<p><strong>Re-create the table</strong>: As described above, this method often requires a lot of development effort, especially if the table is actively used while it's being rebuilt.</p>
</li>
<li>
<p><strong>Vacuum the table</strong>: PostgreSQL provides a way to reclaim space occupied by bloat and dead tuples in a table using the <a href="https://www.postgresql.org/docs/current/sql-vacuum.html" rel="noopener"><code>VACUUM FULL</code> command</a>. <code>VACUUM FULL</code> requires an exclusive lock on the table, so it is not an ideal solution for tables that need to remain available while being vacuumed:</p>
</li>
</ol>
<div class="highlight"><pre><span></span><span class="c1">-- Will lock the table</span>
<span class="k">VACUUM</span><span class="w"> </span><span class="k">FULL</span><span class="w"> </span><span class="k">table_name</span><span class="p">;</span>
</pre></div>
<p>The two options above require either significant effort or some downtime.</p>
<h4 id="using-pg_repack"><a class="toclink" href="#using-pg_repack">Using pg_repack</a></h4>
<p>Neither of the built-in options for rebuilding tables is ideal unless you can afford downtime. One popular solution for rebuilding tables and indexes without downtime is the <a href="https://reorg.github.io/pg_repack/" rel="noopener">pg_repack extension</a>.</p>
<p>Being a popular extension, <code>pg_repack</code> is likely available from your package manager or already installed by your cloud provider. To use <code>pg_repack</code>, you first need to create the extension:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="n">EXTENSION</span><span class="w"> </span><span class="n">pg_repack</span><span class="p">;</span>
</pre></div>
<p>To "repack" a table along with its indexes, issue the following command from the console:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>pg_repack<span class="w"> </span>-k<span class="w"> </span>--table<span class="w"> </span>table_name<span class="w"> </span>db_name
</pre></div>
<p>To rebuild a table with no downtime, the extension creates a new table, loads the data from the original table into it while keeping it up to date with new data, and then also rebuilds the indexes. When the process is finished, the two tables are switched and the original table is dropped. See <a href="https://reorg.github.io/pg_repack/#details" rel="noopener">here</a> for full details.</p>
<div class="admonition info">
<p class="admonition-title">pg_repack on RDS</p>
<p>pg_repack is one of the <a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html#PostgreSQL.Concepts.General.FeatureSupport.Extensions" rel="noopener">supported extensions for PostgreSQL on Amazon RDS</a>. You can find <a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.html#Appendix.PostgreSQL.CommonDBATasks.pg_repack" rel="noopener">more details on how to use <code>pg_repack</code> on AWS RDS</a>.</p>
</div>
<p>There are two caveats to be aware of when using <code>pg_repack</code> to rebuild tables:</p>
<ul>
<li>
<p><strong>Requires additional storage roughly the size of the table being rebuilt</strong>: the extension creates another table to copy the data to, so it needs free space roughly equal to the size of the table and its indexes.</p>
</li>
<li>
<p><strong>May require some manual cleanup</strong>: if the "repack" process fails or is stopped manually, it may leave intermediate objects lying around, so you may need to clean them up yourself.</p>
</li>
</ul>
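<p>As a rough sketch of what that cleanup might look like: <code>pg_repack</code> keeps its working objects in a schema named <code>repack</code>, so listing the tables in that schema is one way to spot leftovers from an interrupted run. The pg_repack documentation suggests dropping and re-creating the extension to remove them. This assumes no repack process is still running, so check before executing it:</p>
<div class="highlight"><pre><span></span>-- List tables left behind in the "repack" schema by an interrupted run.
-- Filtering on relkind = 'r' skips the views the extension itself installs.
SELECT relname
FROM pg_class
WHERE relnamespace = 'repack'::regnamespace
AND relkind = 'r';

-- Remove leftover objects by re-creating the extension
DROP EXTENSION pg_repack CASCADE;
CREATE EXTENSION pg_repack;
</pre></div>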
<p>Despite these caveats, <code>pg_repack</code> is a great option for rebuilding tables and indexes with no downtime. However, because it requires some additional storage to operate, it's not a good option when you are <em>already</em> out of storage. It's a good idea to monitor the free storage space and plan rebuilds in advance.</p>
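<p>To get a rough idea of how much free space a rebuild might need, you can check the current size of the table and its indexes using PostgreSQL's built-in size functions. This is only an estimate, and <code>table_name</code> is a placeholder. The free space itself has to come from your operating system or cloud provider's monitoring, since PostgreSQL does not report it:</p>
<div class="highlight"><pre><span></span>-- Approximate extra storage needed to repack a table
SELECT
    pg_size_pretty(pg_relation_size('table_name')) AS table_size,
    pg_size_pretty(pg_indexes_size('table_name')) AS indexes_size,
    -- Table, indexes and TOAST data combined
    pg_size_pretty(pg_total_relation_size('table_name')) AS total_size;
</pre></div>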
<hr>
<h2 id="the-find"><a class="toclink" href="#the-find">The "Find"</a></h2>
<p>At this point we had already used all the conventional techniques we could think of and cleared up <em>a lot</em> of space. We dropped unused indexes and cleared bloat from tables and indexes, but... there was still more space to shave off!</p>
<h3 id="the-aha-moment"><a class="toclink" href="#the-aha-moment">The "Aha Moment"</a></h3>
<p>While we were looking at the sizes of the indexes after we finished rebuilding them, an interesting thing caught our eye.</p>
<p>One of our largest tables stores transaction data. In our system, after a payment is made, the user can choose to cancel and get a refund. This doesn't happen very often, and only a fraction of the transactions end up being cancelled.</p>
<p>In our transactions table, there are foreign keys to both the purchasing user and the cancelling user, and each field has a B-Tree index defined on it. The purchasing user has a NOT NULL constraint on it, so all the rows hold a value. The cancelling user, on the other hand, is nullable, and only a fraction of the rows hold any data. Most of the values in the cancelling user field are NULL.</p>
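<p>One way to check how sparse a column like this is, without scanning the table, is to look at the planner statistics in <code>pg_stats</code>. The table and column names below are hypothetical, and <code>null_frac</code> is an estimate from the last <code>ANALYZE</code>, not an exact count:</p>
<div class="highlight"><pre><span></span>-- Estimated fraction of NULL values per column (hypothetical names)
SELECT attname, null_frac
FROM pg_stats
WHERE tablename = 'transactions'
AND attname IN ('purchased_by_user_id', 'cancelled_by_user_id');
</pre></div>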
<figure>
<p><svg version="1.1" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 309.6594202898557 217.01811594202906" width="auto" height="20vh">
<g transform="translate(10.659420289855689 10) rotate(0 144.5 19)"><path d="M0 0 L289 0 L289 38 L0 38" stroke="none" stroke-width="0" fill="var(--light-color)"></path><path d="M0 0 C59.032957619894304 0, 118.06591523978861 0, 289 0 M0 0 C68.55619995994493 0, 137.11239991988987 0, 289 0 M289 0 C289 11.035891681723298, 289 22.071783363446595, 289 38 M289 0 C289 7.6242059560492645, 289 15.248411912098529, 289 38 M289 38 C228.8924437279813 38, 168.78488745596258 38, 0 38 M289 38 C201.03417980866504 38, 113.06835961733012 38, 0 38 M0 38 C0 24.59847422260791, 0 11.196948445215817, 0 0 M0 38 C0 28.538693573884665, 0 19.077387147769333, 0 0" stroke="#000000" stroke-width="2" fill="none"></path></g><g transform="translate(10 47.981884057971) rotate(0 144.49275362318838 79.51811594202903)"><path d="M-0.7780494858853485 1.3681983066036707 C90.47544329404681 -1.1169060822291295, 178.1603350345365 -1.8922651697643396, 290.2772367165998 -1.0494040810178824 M0.16908271668100122 0.3440149955665091 C79.6610272954566 0.8815468357837666, 158.92673607316198 1.2364868645741653, 289.584298538843 0.19168855099288123 M289.5521144329475 -1.304497042670846 C288.58678669248303 49.94471266770779, 291.2888890555926 102.01565966751514, 287.09366322149504 157.83330681852573 M288.7845519849004 -0.31142672430723906 C286.90449401060096 60.513345319849975, 287.6244882277583 121.8210628062069, 289.80552706801313 158.21304209369742 M287.6530878453379 159.25950799031165 C206.964409212794 156.05033231346448, 128.32039527825032 157.79604598420383, -0.47952031432682846 160.01558805720134 M289.5844737818952 158.66724233004297 C230.07805262921983 160.39482576930365, 171.22288994423585 159.6377570341142, 0.3015134409762986 158.38369116401205 M1.8084924910217524 160.7772659950661 C2.4549477136940228 110.32020957105988, 1.6431607640594708 65.35566680366985, 0.9607170913368464 -1.2252840790897608 M-0.2655061976984143 158.78656278390014 C-0.641924654866326 124.72435196602278, -1.5818253809462668 
90.48645621925138, -0.1441378192976117 0.323324684984982" stroke="#000000" stroke-width="1" fill="none"></path></g><g transform="translate(103.82608695652243 18.3333333333336) rotate(0 50.5 10.5)"><text x="50.5" y="15" font-family="inherit" font-size="16px" fill="currentColor" text-anchor="middle" style="white-space: pre;" direction="ltr">Transactions</text></g><g><g transform="translate(157.24275362318895 49.41666666666663) rotate(0 0.33333333333337123 78.50000000000003)"><path d="M0 0 C0.11111111111112375 26.166666666666675, 0.5555555555556188 130.83333333333337, 0.6666666666667425 157.00000000000006 M0 0 C0.11111111111112375 26.166666666666675, 0.5555555555556188 130.83333333333337, 0.6666666666667425 157.00000000000006" stroke="#000000" stroke-width="2" fill="none"></path></g></g><g><g transform="translate(297.15942028985523 78.5) rotate(0 -140 0.8333333333333428)"><path d="M0 0 C-46.666666666666664 0.27777777777778095, -233.33333333333334 1.3888888888889046, -280 1.6666666666666856 M0 0 C-46.666666666666664 0.27777777777778095, -233.33333333333334 1.3888888888889046, -280 1.6666666666666856" stroke="#000000" stroke-width="2" fill="none"></path></g></g><g transform="translate(15.826086956522431 50.166666666666686) rotate(0 61 10.5)"><text x="67" y="20" font-family="inherit" font-size="16px" fill="currentColor" text-anchor="middle" style="white-space: pre;" direction="ltr">Purchasing user</text></g><g transform="translate(165.82608695652198 54.166666666666686) rotate(0 58 10.5)"><text x="58" y="15" font-family="inherit" font-size="16px" fill="currentColor" text-anchor="middle" style="white-space: pre;" direction="ltr">Cancelling user</text></g><g transform="translate(163.94739488946334 90.51508378937461) rotate(0 58.68576547639498 1.7349162106253004)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M1.5393925315839105 4.436834412194569 C1.9237084554699764 3.132450287200631, 3.778133759205122 2.1872322643159405, 5.080774208302146 -0.20883660783644364 
M0.8868998954211229 4.964794198270668 C1.9596842954022202 3.9039372495152618, 2.7741885449311208 2.9279456469240532, 4.923914575255012 0.6198165739734881 M6.885581066747498 4.662871751990929 C7.4039111006188385 3.642816521709601, 9.350239195970248 1.1055912744158745, 10.782153592769934 -0.16531250503959638 M6.822797518791003 4.638228343949481 C7.392992553833969 3.619222613762915, 8.654782288089557 1.9002519271853764, 10.512583290086074 0.2066066423512758 M11.914234246931809 4.919886868116016 C12.32949991612133 3.9266963078914663, 14.2049832459887 2.2784635893868415, 16.00268585468032 0.6868566104141047 M11.863533272746789 4.9331619830285565 C13.225729981317622 3.116236615655646, 14.381562697457163 1.3367225215603455, 15.856544847417105 0.5142950078402629 M17.03631656566473 4.114750883327416 C17.749728137831507 2.8522892297948705, 19.81229661078348 2.5761368115383636, 20.75782767506119 0.4969246737822417 M17.301147700621062 4.178466360335788 C18.866523611677422 2.7752782567982677, 20.22743592815966 0.857787462185694, 21.278406196830684 0.14274900901834398 M21.813361088012133 5.3702435537737925 C23.439859250434058 3.5702344166074114, 24.44307254223033 2.335981449700328, 26.032354306478414 0.11130425371348279 M22.036355386034373 4.928003385340156 C23.58869322089371 3.7972214031158695, 24.295230381536484 2.351130637421441, 26.05007332819736 0.42620164910449987 M27.674691682501887 4.4186266272127295 C28.640927427765888 3.826075385910765, 30.004581904342828 1.655098752274374, 32.29758180495987 0.4671399104654078 M27.742642184819285 4.257323990369555 C29.237954559331786 3.4323347580553722, 30.239292189814627 1.9995653974566763, 32.05684850359971 -0.3700500458233769 M33.137700789199876 5.070570484792897 C34.56648173777899 3.058781135669507, 36.22377825188175 1.310547544617977, 36.53164479807143 0.6612348376866898 M33.23341799309637 4.835719772392968 C34.058023009457614 3.766674485238033, 35.136804039297836 2.330611550034633, 36.664583025408284 0.12240661495369878 
M38.925071651633964 4.008727499271872 C39.605544314777816 3.4756035643728964, 41.571368954478736 2.025687631517144, 42.73671871684097 0.1436242206349021 M38.42611397919818 4.622921351238283 C39.24833729063001 3.3694178569572246, 40.233942797180994 1.8515816488200425, 42.38377446820565 -0.42820165498638685 M43.18505608487004 5.056658676798326 C44.97843790353661 3.4556300770812474, 46.14328695355458 1.2728321635587418, 46.95929756383437 0.3865018779587196 M43.50933535174528 4.941150011374486 C45.00430136931589 3.2835743045501222, 45.793649091604664 2.1865219280414, 47.33671944686237 0.5182175225753531 M49.008696707435966 3.9724070151298427 C50.76520236195961 2.9146710815812273, 51.83567854728847 2.2409241081513613, 52.6074161450795 -0.12220638008427542 M49.30695859157023 4.371816060129472 C50.325924824727934 2.944835467612754, 51.34361826016059 1.405420976644909, 53.121610848461 -0.2296794512831859 M53.650104709397624 5.1174241454150895 C55.18688053562107 3.635023920882298, 56.077291808381716 2.5757029348823703, 57.87092783814223 0.5900838901545845 M54.0452842197936 4.427570469472221 C55.432606428617106 3.348254086625624, 56.67315326910066 1.9258155226560236, 58.27969060037212 0.3773794891569757 M59.64368208428289 3.8501436904116417 C61.07358324920668 2.3715731338661437, 63.19319619696277 0.6659878659635972, 63.83872682405044 -0.6258108760671248 M59.717134029747946 4.049003234230622 C60.97791504644552 3.4678071385071476, 61.67456703216618 2.237060697404451, 63.75934690217827 -0.174019567526274 M65.22439939715763 4.394907906268174 C65.62215017232198 3.65800919842936, 67.48992096088756 2.294208349838591, 69.11902585537001 0.4280843898992672 M64.64900385505383 4.676695013873661 C65.79926639797489 3.198196471337359, 67.10038684101566 2.1883605545184337, 68.78326699375003 0.21207513188604854 M70.64501801654973 4.635541040742273 C70.90341520557372 3.2095495686981783, 71.76927098680481 2.479747614322322, 74.3565784881288 -0.5095232516886425 M70.46516080154383 
4.147915490729237 C71.74779030508007 2.6467744407845037, 72.8198430306947 1.1854008350229113, 74.51253438716995 0.037679363471852356 M75.79636788252506 4.38427597498259 C75.9170211256814 3.74046041070576, 77.13607731018217 2.9520791252776917, 79.25295684989271 -0.09026810718221157 M75.25479601533848 4.768675011766315 C76.84793134197857 2.7335117981247943, 78.37326813449343 1.430895243362516, 79.3137816348086 0.18292456795009288 M80.52456973861149 4.575327581746724 C82.05289594724225 2.8950351209436205, 82.92986976811221 1.850625102428568, 84.6837738716896 -0.7625524913238393 M81.13612191477037 4.210422946812805 C82.5612721304312 2.682923326669072, 84.26254159680353 0.7212182124998098, 84.82328756124855 -0.09724379110131737 M85.85131289987491 4.73381936718333 C87.10743197806742 3.4737529174772117, 88.01761692829487 2.572620855870631, 89.71896105608532 0.4973799145351301 M86.22352495846005 4.782099330337676 C87.03055860406319 3.929687226049853, 87.6026427072787 2.8982541724729005, 90.01413003136034 -0.20101458546340845 M91.2611222985768 5.382658545011359 C91.89683656176052 3.3213078022435534, 94.71856462771281 1.951256639797427, 96.21818772688553 0.0272625685473602 M90.91180525637319 5.099714334066661 C92.30631290378709 3.2125195379028098, 93.96024414461517 1.12751538330858, 95.42519561219075 -0.3951058636149697 M96.67141218358219 5.044954098495407 C98.50345053448078 2.5253629646571896, 99.89886609072077 0.9490942784100664, 100.66015370832288 -0.2934930293236917 M96.84261744223103 4.293994336775256 C97.87325012228705 3.3235045451929346, 98.60811025483213 2.7282028706106978, 100.40377019121775 0.01775278883305731 M101.80836616336356 4.6898665640488835 C102.48567703510314 3.9830100101377073, 103.00832824367748 2.438094167138435, 106.6985956160253 0.09509013256779442 M101.59817390503349 4.999364713624549 C103.459938840738 2.8058829421764564, 104.77148014763277 1.673137061442095, 106.43840418470968 -0.2054909035211936 M107.33571846802188 4.272221184543804 
C108.27536606662882 3.933949229222387, 109.08634893788913 2.4869710685921387, 111.51652504098035 0.10571657686630687 M107.42606272446392 4.47982389826654 C108.27597002689917 3.5115009450030756, 109.30048178177178 2.5366333061012742, 111.47872262219332 -0.054840691973022604 M112.14823795832582 4.339892085275835 C114.4046094462894 3.4164997686057372, 114.93563530844334 1.3513171412356706, 116.64651565402599 0.8219533421613264 M112.6298198521685 4.979489764671063 C113.52883657026821 3.5234356522366697, 114.94135116190736 1.3578223217392895, 116.56446226714033 0.6229149838492263 M-0.20861296487881575 3.2884879375449243 C-0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243 M-0.20861296487881575 3.2884879375449243 C-0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243 M5.988668010136191 3.2676014407784413 C4.7865542212842325 1.9092823058157844, 3.288144786789191 1.1306316193886807, 0.28973050091729513 -0.4342907335172237 M6.079426773715843 3.6838781941462977 C4.984077860232479 2.584834521021346, 4.074900602522962 1.3388099060441387, 1.2011976034132485 -0.8057861513486893 M12.318044095140438 2.739857139020713 C9.958344335882046 2.0081206555914424, 8.529024581909855 0.919165579217171, 6.77503414831728 -0.3127585350424943 M12.09528733410735 3.2482453108313694 C10.686559119667749 1.9541384333996044, 9.364684052488684 0.908479703961911, 7.407353292011174 -0.8353732870780823 M18.156861529668454 3.725407794692096 C16.600196023826015 1.8079381686068314, 15.883278585425394 1.7147139785837888, 12.653160798708836 -0.5019610260269096 M17.900340085971763 3.4787991116442796 C16.83353187162474 1.7859136668840638, 14.577450899318947 0.4786571852771641, 13.043598056324685 -1.205098651361694 M24.03946638650703 3.9736208034506952 C23.26002587121453 2.25399710250924, 22.05722991934483 1.6966462920558756, 19.87166071704033 -0.6632346925307594 
M24.57999911884288 3.5291433872948352 C23.704787169253166 2.9745996488270428, 22.064568604704256 1.5330920841020976, 19.631744647560673 -0.8518245451599888 M30.072800944586458 3.4253303137668842 C28.126238615773037 2.217213233704073, 26.442229927547356 0.4106217342479104, 24.877215409854866 -1.3734013838442123 M30.35270576791762 3.3110573323038843 C28.257274514619464 1.5832834785922787, 26.705396051441454 -0.07815645045956021, 25.315406201191685 -1.3657127642287024 M36.942575257906974 3.530377161941987 C35.49684729404519 1.968621884838873, 34.710274741932885 2.1427141911098504, 31.940984149902675 -1.563381982415934 M36.36641831717668 3.558691170179781 C35.46897310296466 1.9881937046475007, 33.92954601586112 1.1792697593525228, 31.413097641823427 -1.0730134756844687 M42.00865651199291 3.0978496023421926 C40.65019044778788 1.693115067690257, 39.993303387437734 0.4382998224048471, 37.22708665153449 -0.4803571590623198 M42.09686184085602 3.25727915372381 C41.33112468447788 2.0606797690729826, 40.28856919692957 1.2983491997402403, 38.05826299239387 -0.7672075357688903 M48.4370457473489 3.303798025438158 C47.24807870896469 1.1528568432371005, 44.84026923050446 0.29681579673744696, 43.03993824377576 -0.5618916227649764 M48.6576968676223 3.154024742554469 C46.59181085250396 1.8032168816884084, 44.88119028777237 0.05959306472420678, 43.093944997963625 -1.1018602202348275 M54.66893563455247 3.6897117826403636 C52.69439060313656 1.4967970822850853, 50.7644606429255 0.5686225330128047, 50.36259074618318 -0.5737793601611627 M55.17725577422694 3.7111740089734604 C54.09998681947207 2.5696384731240234, 52.78346619688113 1.6647008998505306, 49.73051960014016 -1.0800700026592238 M60.92656437827033 3.8460562455017153 C58.8114592304109 1.9629532892125412, 57.817065951836646 -0.3233942372140323, 55.293303272097084 -1.9901845120153836 M60.95609307096098 3.325581116137657 C59.24395022991165 2.2150055104968587, 58.377589869390974 1.0407529222195646, 55.322248141762515 -1.2923808920923137 
M67.3767943762527 3.12745740657355 C66.18989472917826 2.157739169281465, 64.3744687664649 0.6665526914433428, 61.6937624380177 -1.3966178045263165 M67.11174711993013 3.6313474594304274 C66.11136967283844 2.7236864538716508, 64.99902724205685 1.30188561489862, 61.62753812556137 -1.210404383775743 M72.23689691193863 2.8962139605001003 C71.6260139558229 2.3703436976559042, 69.4994303017901 1.1838567218759046, 68.60350419839185 -0.22293423004260027 M72.89433945810885 3.1053601424597588 C71.19281157334838 1.8477696125550043, 69.97941259847057 1.0164837827947104, 68.1004279761323 -1.0152589208427598 M78.59714510708866 3.6660106019178653 C77.36244994269165 1.7185303992522196, 75.1562061831845 0.27442845123210136, 73.3346795369567 -1.0989099079490872 M78.91172403844824 3.605606259924012 C77.28313496532839 1.9106094130627889, 76.27212267078046 0.6238955897293295, 73.57081004606314 -0.9431048308712665 M85.00863923852442 3.5283976905714836 C84.30741979840266 2.3519337947935677, 82.05292364844678 1.2391841095113856, 80.01553430605722 -0.7291821019797948 M85.42626246966532 3.3193880927950254 C83.6027023986341 2.220513846974194, 82.06330713304568 0.4693661859870144, 80.08105832568413 -1.1192306918631676 M90.87039310049789 3.0083254702358175 C89.80391605692935 1.9990284371923779, 88.04334696159377 0.6919392446058854, 85.62184106643768 -0.736295318866975 M91.19233545556257 3.4877467971729716 C88.91214838590952 1.6826067135851825, 87.24307715235483 -0.5821521119774251, 86.18368741242672 -1.351328082227043 M98.30229643450932 2.8433300897121905 C96.47036759000258 1.5573324399020663, 93.98827481889623 0.7675130184679588, 92.57768877834829 -0.9822251285593042 M97.28612393637069 3.680672440316202 C95.72394037224687 2.14847033900861, 94.02209378151151 0.5060814481356448, 92.3160598871466 -1.3704323253295394 M104.57134071014063 3.308727642080407 C102.6068121682459 2.9101604103581002, 100.78416245577088 1.0291362528804546, 98.47614313012093 -1.463042156726976 M103.8960961272771 
4.077468236886557 C102.09868886708948 2.2005583230664536, 100.72142959599354 0.8834722801235716, 98.54188810565614 -0.9873280971338192 M109.10262286572372 3.188916251670278 C108.06717746231293 2.0352571609878978, 106.62739268421242 1.2877641212123985, 103.79025027883756 -1.2376099977901873 M109.33869875909414 3.5083651359208745 C107.85416890575675 1.3628753234822375, 105.48863859072928 0.1694358837547872, 104.61692543939273 -1.0009348392280482 M115.59034904900882 3.648678161033125 C115.16284503379971 3.1368210935920553, 113.83812662615615 2.104019833839385, 110.48404059099843 -0.8562831279475972 M115.72404831892756 3.4461364078025083 C114.21465780903996 1.5333897907544762, 112.01265884416503 0.23626961845643102, 111.10465751608712 -0.9566921718538538" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M1.8840626571327448 1.5885224547237158 C23.185989588825077 -1.037407684731364, 46.69996219007307 1.5737156983138325, 118.37116087686377 1.1360649671405554 M0.7441802425310016 0.32448721397668123 C28.618409254206945 -0.5781302627167648, 58.276248609815994 0.7788670782008225, 116.6742162238072 -0.07928272616118193 M117.45986536600778 0.2603812047707857 C117.14356146944424 1.3035390284538402, 117.23517688520677 2.391206429838955, 117.296524554507 3.2591292380369614 M117.34395924389491 0.11375671914366522 C117.5219226165778 1.0372537750854383, 117.22717611693898 1.9902157962151619, 117.28882946737619 3.4286306624873117 M116.70615429889517 5.073372266225476 C78.71845773555732 5.64181185072903, 41.02007734958764 6.618203941049617, 1.3209982607513666 5.376877150468488 M117.33637945160376 2.671718662574847 C88.53689239222314 4.37715907351875, 60.00874761036936 3.331154629158017, -0.8725300328806043 3.5726152416020227 M0.2106256626084752 3.278231446444655 C-0.29379012048615827 2.681823241631754, 0.17704635460715612 1.4899836582652406, -0.08980876472267635 0.04974518218981927 M-0.009024683072564349 3.3033246924475934 C0.0076270465839475485 2.2287565173201003, 
-0.11870761614710032 0.9921284691140803, 0.06461778745923352 0.1328549824955503" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(17.78399964104733 88.07255505374252) rotate(0 66.45875398214207 3.3441116129241664)"><path d="M0.6559000704437494 0.9523003902286291 L133.40964469044002 -0.06809172965586185 L134.06162574617656 6.397672954799191 L-1.1521338131278753 8.036929730178372" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M1.4141949620097876 0.6050111744552851 C45.54198800061405 2.3952677612043276, 91.29931862161577 2.4841651682592287, 133.28839079393538 -1.1149299051612616 M-0.2999035669490695 -0.645080198533833 C47.30730687647925 0.3461694973924405, 96.5760821468729 -0.8148558777353518, 133.69056671122553 0.3187736077234149 M132.47131344992718 0.49867399661324896 C133.45620521775257 1.1260058934356882, 133.5099314254555 3.874435210500552, 132.42391372481526 6.52710958566867 M132.92331153035045 -0.09261557688349709 C132.61679828274876 1.976226030071012, 133.09093178218777 3.730407469108883, 132.95282545185486 6.670565482890404 M132.45620975746306 7.362481765301482 C103.19684351390127 6.3407837355781815, 74.60888496992445 6.471455892388942, -0.9236967954784632 6.920208684952513 M133.6119505987549 6.128631763943986 C105.16410918772317 6.878552972829952, 77.93990535756302 6.990738123214855, -0.26032599341124296 6.49219527173932 M0.5632868783599667 6.806426175983673 C-0.6167651730584072 5.447040537466737, 0.6367623645620537 3.086424866339067, 0.615730591819506 -0.5326179506165992 M0.21356087638979293 6.926829437508684 C0.008557033575723058 5.3384565330347735, -0.2598104480066321 3.855866301920013, -0.11969403288501654 0.2053834177420627" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(18.200666307713618 110.82255505374252) rotate(0 66.45875398214207 3.3441116129241664)"><path d="M-0.5297946836799383 -1.7191759143024683 L134.57694866983684 -1.137702265754342 L134.89143773643764 
6.252254847289578 L-0.6296014096587896 7.198577586413876" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M-1.314145127311349 0.9005595538765192 C48.01583600254428 -0.4682059484947716, 97.81916574995363 -0.8782995540131127, 132.8970200802246 -0.970651438459754 M0.6788537846878171 0.9510406656190753 C50.75362877650677 2.147700681190269, 101.34296542146294 0.5228376824662853, 132.35934462408068 0.897405038587749 M133.28888040842756 -0.5942833163285821 C133.44232701830066 2.1942006547640527, 133.37599604210035 5.086517745317043, 133.23596784367973 6.852799239956767 M132.72486476107795 0.22551125417782203 C132.8573965667957 2.1086732600627216, 132.93964740822386 4.293429308682917, 132.97417884546635 6.765029400074435 M131.49949518694075 5.793448082955138 C84.26117665850788 6.1386112280017375, 35.18260811481622 7.800199598801756, -1.27183554507792 5.9146267695739425 M133.27810686091425 7.157562457808808 C81.04913483137051 8.425556216257974, 30.54328158387021 7.261098567265434, 0.7754772948101163 5.752721004971818 M-0.30450345725276157 6.041199302091185 C0.23295076599769463 4.225749784874816, -0.14777726966736746 1.7470532928993467, 0.34140385329490863 0.12337390824347516 M-0.30547190097492133 6.779837293241168 C0.050252284409389336 4.476978219014427, -0.32283778266043356 2.2305994402109324, 0.2126458968158299 0.213900487486005" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(19.033999641046876 97.65588838707589) rotate(0 66.45875398214207 3.3441116129241664)"><path d="M-1.7191759143024683 1.6594407055526972 L131.7798056985298 1.9739297721534967 L132.4815395857254 6.058621816189543 L0.5103543605655432 6.003562085182921" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M0.9005595538765192 -1.089774826541543 C31.916650143021116 0.48130856891013085, 63.943858667346376 0.8528765930686086, 131.9468565258244 1.6844141092151403 M0.9510406656190753 -0.3160299016162753 C43.14496195874078 -1.6986674681114344, 
83.74444183798309 -1.4604788735317378, 133.8149130028719 0.6386176692321897 M132.32322464795556 0.4770978116844562 C132.6723650631977 2.894446226559329, 133.13109343321662 5.285818661545984, 133.08208397839257 6.665452591459706 M133.14301921846197 -0.09837197309824988 C133.03276856562547 2.3729515921823605, 133.23382828087978 4.834816328887101, 132.99431413851025 6.916925206313231 M132.02273282139095 6.997300628425137 C88.85916933178966 5.941747169956559, 46.987119285181535 4.473994057640428, -0.7735964562743902 8.413682404757992 M133.38684719624462 6.162289189958528 C82.7810063535888 7.478174106629518, 30.84189353064731 7.942999587805894, -0.9355022208765149 7.060519295597032 M-0.6470239237571478 6.861219611115741 C-0.14328414408556756 5.073568414999042, -0.3665137462480076 4.215780758239967, 0.12337390824347516 0.18796452543407316 M0.0916140673928354 6.7376734144873005 C0.04322117496016088 4.366137337367191, -0.00872611065476063 2.90397710256359, 0.213900487486005 -0.08858462770797837" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(17.90899964104733 123.78088838707583) rotate(0 66.45875398214207 3.3441116129241664)"><path d="M-0.03133909963071346 -0.7696782741695642 L131.77753552524837 0.3903953041881323 L133.68923642961772 6.7929733431353725 L0.3929115626960993 4.922266129256741" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M-1.7191759143024683 1.6594407055526972 C47.1895295892747 2.3260247175908146, 96.21849533149933 -0.2775064642214716, 133.42786232484968 -0.684661140665412 M0.6952143358066678 0.6912037236616015 C30.477293124273622 0.4632572892361635, 62.89041916158091 1.1375819507294649, 133.36896555880548 0.3095451397821307 M132.25551173821384 -0.2269001812091314 C132.8344357277318 1.9146765077589212, 132.54829597782543 4.2236252532663885, 133.4614650021835 6.074014985511295 M132.88199768312325 0.07483086759095647 C132.77124227428723 2.678364500243415, 132.86180766171498 4.72271613195414, 
133.00701136184765 6.514523115479583 M134.39513502372893 5.224009685070769 C106.7646979215627 7.611390648801965, 80.32764117523143 8.003735779245538, 0.5038911905139685 6.019880525620238 M133.56774660209658 6.357509069928483 C103.57646757613686 5.510940227991234, 73.51268355292373 5.25486213374723, 0.3255282985046506 6.264720343837098 M-0.35300457029031695 7.05974555834218 C-0.07350080291591098 5.362248815860351, -0.24700207050477846 3.8762523605523493, 0.032181718497548806 0.43830055319750216 M0.30981097107670386 6.706475072549202 C0.22497843745014043 5.154347191111585, -0.016617679420900887 3.635910905049422, 0.1326351846661491 -0.2030950849271993" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(18.82566630771339 146.53088838707583) rotate(0 66.45875398214207 3.3441116129241664)"><path d="M-0.7696782741695642 -1.1399724390357733 L133.30790326847227 0.771728465333581 L133.02225808157118 7.081134788544432 L-1.7659570965915918 6.173213653118864" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M1.6594407055526972 -1.137702265754342 C50.35102750628491 0.7065745128477641, 96.12459405860403 1.6528972519720622, 132.23284682361873 -1.2779210601001978 M0.6912037236616015 -0.9050551308318973 C49.74513605133823 -0.7983389187948031, 100.08081715068234 -0.4726039756433291, 133.22705310406627 0.05344242323189974 M132.69060778307502 -0.06429249675248605 C132.79493017095234 1.4682373230970247, 133.0616684630748 4.181296177411104, 132.3032997239471 7.13581410235507 M132.9923388318751 -0.14855479762380314 C133.14988118723517 1.9713686338274163, 132.74017497468463 4.148219485610455, 132.7438078539154 6.366831539375414 M131.45329442350658 7.636396352054135 C90.15381864936644 9.282335306860082, 46.51749938462572 8.17412694857227, -0.668342700228095 5.117740515471951 M132.5867938083643 6.785202818775133 C97.46757468306734 7.023719769837276, 62.459633138031634 7.87803251755418, -0.42350288201123476 5.780605661535219 M0.3715223324938475 
6.640977246640446 C0.011399294083353412 4.198460774216835, -0.07516508085283488 2.182374534676395, 0.43830055319750216 -0.3193492519892899 M0.018251846700870045 6.88411647932113 C0.13233788127196794 4.926905234469019, 0.30101931097501583 3.389628154552342, -0.2030950849271993 -0.005240072350682801" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(19.65899964104642 133.36422172040932) rotate(0 66.45875398214207 3.3441116129241664)"><path d="M-1.1399724390357733 0.3903953041881323 L133.68923642961772 0.10475011728703976 L133.31041952698024 4.922266129256741 L-0.5150095727294683 6.66113800168182" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M-1.137702265754342 1.9739297721534967 C35.918334556353315 -1.7324450256415793, 73.21896023038968 -1.7875047566482016, 131.63958690418394 1.4725079033523798 M-0.9050551308318973 0.08377961348742247 C36.2061982831677 -0.16062648946661406, 72.612408858117 -0.6091856246652072, 132.97095038751604 -0.28078817296773195 M132.85321546753167 -0.12753394562609988 C132.50087609501827 2.195232837193865, 133.39526537957647 3.4978487410579993, 133.3650988407909 6.711114302117311 M132.76895316666034 0.22430665598217714 C132.8524164978559 1.90173308087452, 132.99990930561546 4.00056516451696, 132.59611627781123 6.792720169365494 M133.86568109048994 7.953198306114928 C98.0724316969303 7.372445361719928, 60.50704752248687 5.046782450781665, -1.5704827103763819 5.917310647518889 M133.01448755721094 5.990153395423249 C97.30560280010678 5.294498286645465, 63.076794832858596 5.825143329303317, -0.9076175643131137 7.080776267537431 M-0.047245979207886646 6.761680896232166 C-0.18544678976772241 4.0480657905366835, 0.0674821755265805 1.8724223175272476, -0.3193492519892899 -0.2908518397734303 M0.1958932534727974 6.841464906876794 C-0.10766505761358469 4.794641371260408, 0.07067299045080833 2.054118210282265, -0.005240072350682801 -0.12869450274329353" stroke="transparent" stroke-width="1" 
fill="none"></path></g><g transform="translate(19.242332974381043 159.6142217204092) rotate(0 66.45875398214207 3.3441116129241664)"><path d="M-1.9010888244956732 0.5413527693599463 L132.55703773824962 -0.25828091241419315 L133.43958481876643 7.860065940619961 L-1.9803152587264776 6.890369240046994" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M-0.02708522416651249 0.5691442582756281 C43.64898490032468 -1.043913175145103, 84.57586946268208 -0.9355093136224284, 133.28318963302763 -0.18016808293759823 M0.9604518162086606 -0.030379791744053364 C33.47402622627954 0.4512371903518153, 67.31948406683138 -0.13590798915830127, 132.2781248913193 0.3396849138662219 M133.50471484923773 0.3735124489456425 C132.56233612850247 2.2108252021626784, 132.92638679575396 5.291671418116801, 133.1531879051075 6.526939541865397 M133.24716882362935 -0.23017980781824202 C133.1005729190086 1.7189137078666354, 132.74667541755784 3.5659769550955533, 133.11877206852492 6.594741165062349 M133.47506979955824 4.75557374194409 C91.2709945855587 6.197360604637216, 46.04319767661288 6.4606983956032495, -1.8492508921772242 5.498406939060942 M133.62944900254251 6.794080995807008 C103.68214299960529 7.109692471905201, 75.19324630605527 7.90812354056403, -0.4665891481563449 5.963452570924119 M-0.5291221325140583 6.420619509358146 C0.14775073595237542 4.209812747834006, 0.12906885985196825 1.584567120446917, 0.532795040543945 -0.34419058041599665 M-0.19063576544796426 6.835525731886468 C0.24688670486100045 4.283717546552303, 0.3502984820478245 1.5167009989287776, 0.22143943972508795 -0.015164581085301054" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(19.658999641047103 182.3642217204092) rotate(0 66.45875398214207 3.3441116129241664)"><path d="M0.5413527693599463 -0.36047022603452206 L132.65922705186995 0.5220768544822931 L134.08935067905577 4.707907967121855 L0.20214601419866085 8.478474609406248" stroke="none" stroke-width="0" 
fill="#f41d92"></path><path d="M0.5691442582756281 1.1624912228435278 C26.75273944667553 -0.3791173897858868, 55.0000648749436 0.38368217123782533, 132.73733988134654 -0.8935314808040857 M-0.030379791744053364 -0.462927277199924 C42.18481371596396 -1.8310315594298099, 83.74507996994883 -1.9159916922671054, 133.25719287815036 0.9304772363975644 M133.29102041322977 -0.29274918682613393 C132.47418222957418 1.453541958624628, 132.92623753141407 3.0001608972219342, 132.7562242803012 6.455348573002247 M132.6873281564659 0.13574190747563902 C132.89788254950784 2.2615418462677304, 132.97135348182834 5.1606589847264726, 132.82402590349815 6.759877505494167 M130.9848584803799 7.764985684157864 C106.3732725627737 5.631013260650136, 79.69174989284818 5.209520982074239, -1.1898162867873907 5.066243711711422 M133.02336573424282 6.548636901044802 C83.58795017071571 8.101813133598185, 35.02830803705923 7.167928363920069, -0.7247706549242139 7.283939379119829 M-0.26760371649018694 6.823562931505138 C-0.11170842302893397 4.922524390605937, -0.3320588214291784 3.456304936794691, -0.34419058041599665 0.46063039185601107 M0.1473025060381355 6.877380793323804 C0.12445759975875215 5.551616976611422, -0.17546158825426267 4.051311477775011, -0.015164581085301054 -0.3178726607598167" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(20.492332974380133 169.1975550537427) rotate(0 66.45875398214207 3.3441116129241664)"><path d="M-0.36047022603452206 -0.25828091241419315 L133.43958481876643 1.1718427147716284 L130.93719270555766 6.890369240046994 L1.7902513835579157 7.912758949996487" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M1.1624912228435278 -1.3861821200698614 C36.05544504775744 -1.6560244409580738, 73.27080754881881 -0.5584142653484852, 132.02397648348006 0.34189010597765446 M-0.462927277199924 0.03740228246897459 C35.18248037032929 -0.5093628349302068, 70.83442354794883 0.38006497597715705, 133.8479852006817 -0.9333218531683087 
M132.624758777458 -0.417965711951558 C133.04753013320484 1.7869563300588167, 133.21190859360286 3.34444979635804, 132.68463331143806 7.161453698336681 M133.0532498717598 -0.05467860704177929 C132.76528454217805 2.375791215688023, 133.18470424039396 4.620515720032614, 132.98916224392997 6.362903601145308 M133.99427042259367 7.08884828522946 C100.52937376732001 6.067771310350436, 69.14750590538132 4.213992173215884, -1.6219795141369104 8.115096323044554 M132.7779216394806 6.716738634595231 C89.58001282342096 7.655873213069695, 44.8413372913579 6.104156080739754, 0.5957161532714963 6.5026577942460335 M0.13533970565680475 6.614707976820851 C0.1695700428078034 4.005958870651584, 0.5857072537953508 1.567235229894292, 0.46063039185601107 -0.5772913738552948 M0.18915756747547097 6.750814906162814 C0.33069649775188786 5.063087944838003, 0.2595665928146808 3.897443098139437, -0.3178726607598167 0.09051720413526271" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(165.97365481346037 99.26508378937467) rotate(0 58.68576547639498 1.7349162106253004)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M1.5563645701062732 5.294105925164898 C2.5009033194576897 3.1945714626228, 3.564872493871019 2.8035990248601816, 4.57152999020601 0.06351167353216947 M0.7861843042214269 5.122491397741963 C2.313091094111671 3.4240555501965857, 3.6127121955890837 2.053847861867834, 5.079918162016666 0.4689531457458847 M6.900466239367915 4.385039312076583 C7.147230083144773 3.2547477890967147, 8.378352559046226 2.0430507759308, 10.894611342390013 -0.5172805941581962 M6.567777822547895 4.782470992916092 C7.566642930913744 3.7339659906655385, 8.236274636832917 2.315981062028338, 10.683232471206171 0.21322849058431992 M11.84624069879188 4.80332460556602 C12.430193664608687 3.360137761186192, 13.467806161986804 2.0827256468343496, 15.870274908564468 0.27807665215106514 M11.906528616662907 4.961408157967533 C12.494916435106193 3.7134642640124342, 13.84645410975747 
2.384828361892253, 15.489294266145158 0.5614864297141304 M17.815650977062308 4.566123455734542 C19.308255924047074 3.229486313910824, 20.308962568170518 0.7737350788711064, 21.381616529913508 -0.19126274993871423 M17.276380306070916 4.431564030302307 C18.970777427452337 2.786567502252413, 20.38567719062707 0.9429778636009201, 21.28366826008808 -0.25219549320866436 M22.872244299401316 4.721740521074565 C22.797533879416168 4.197815324993202, 24.69822736573689 2.755877433016659, 26.234566244159254 0.2837480455648841 M22.54881773806435 4.5981982244675175 C23.08165185777833 3.9119443745519162, 23.79114167296591 2.6312720538919825, 26.258835394077362 0.5798837426130414 M28.00087431949303 4.380696826687421 C28.86263288441952 3.2975122811469086, 30.068303739441212 1.8982218147428511, 31.837610517485835 -0.4068777487368102 M27.675714219647652 4.577517190369185 C28.625758797353864 3.1047462473587997, 29.60396460752825 1.964087228288125, 31.997040068867452 -0.1830030066647346 M33.490478887559576 4.822156406065533 C34.38992224632274 4.352793470150357, 34.8580999147883 2.5793616242334854, 36.78457001261537 0.8142660881486258 M32.77523428561325 5.049975855054418 C34.06012413854988 3.5901935317970044, 34.61468394511081 2.6083147984266213, 36.65619291300078 0.12329059225511757 M38.013802282285084 4.5249996593102235 C40.244860450696166 3.3479304681058624, 40.833029974374575 0.9320906909783255, 42.53432182073199 -0.5569478561774843 M38.8146809304656 4.538225980929228 C39.306248437592494 3.397769344669816, 40.38184142689621 2.3710032171212205, 42.55271801473178 0.04573270802860918 M43.410172978891666 4.474249994913918 C44.950474181355865 3.198394260105821, 45.72193720851398 2.283383263779157, 47.993577229557815 0.10024601427728397 M43.49639837649712 4.460417807540727 C44.62788615848755 3.4603370480678306, 46.318145615052416 1.8843100040337009, 47.54745569010291 -0.032585013547147657 M48.96522588106346 4.392147187977948 C50.04138923026813 3.2496114816610424, 51.821371451530496 
2.023948403245727, 52.796603955276964 0.16257966211426333 M49.14061930571225 4.208148409794935 C49.97250288485278 3.338637546595381, 51.24667747384562 1.9686592770975366, 53.22850971486858 -0.11449909131613295 M53.79368008902259 4.444574021994732 C55.5989186197301 3.739564788057161, 56.889121534623605 1.1396662204920354, 57.88800391061158 0.46917805832587123 M54.37136762790344 4.999763339738957 C55.72599435997549 2.9319359374592207, 57.47488903989303 1.3896624147988845, 58.09715009257124 -0.03547256698124174 M59.51906113145317 3.872129647065594 C61.10269538571836 2.776896161471627, 61.50974032166194 1.4887468166575677, 64.00243543674577 -0.24736745948656702 M59.599305712575116 4.3583049847788065 C61.02722437507495 2.7307762890315948, 61.98753550576292 1.4777905502735278, 63.95066028646533 -0.48564664688646103 M64.25647873357734 5.0965562216829605 C66.11439594508293 3.02140097341658, 67.04760825058229 1.611499740818761, 68.64739107140169 0.5454274546693698 M64.70763715701973 4.764814209461594 C66.22180667595117 3.1652357140522613, 67.3942784263958 1.511599664166543, 68.4682399875933 -0.05653906023468247 M70.0923490490792 3.951812270948166 C71.85198498862152 3.3336476925731184, 73.38598713975459 1.1016153152968222, 74.18291978076341 0.011013513395027985 M70.3349540182417 4.536278713693345 C72.02052278827549 2.545663069318533, 72.95376857452116 1.3197355035889, 74.59385234670954 -0.4630957008698715 M75.86717242818091 4.590499328353614 C77.12390741497808 2.8398505269803036, 78.54328146129676 1.9059641695826497, 78.8044047275531 0.6502344376828005 M75.56113958103347 4.678934674521149 C76.4945857347116 3.7122660974354185, 77.5136476183966 2.1880977285945407, 79.52212674235655 0.009611014505051696 M81.05525564956656 3.7149352940006524 C82.536841176963 2.336897143940687, 83.80741260505158 1.7343320889139306, 84.62228550295488 -0.06759968486596224 M80.99021558989949 3.9662151043510407 C82.6543885540998 2.3213813187190104, 83.86585751463838 0.667815549479045, 
85.28120601278872 -0.515669495453875 M85.74748968845832 4.2835810763816875 C86.84261838066308 2.9479644745611715, 88.1775384823243 1.6556142068192494, 89.84483589469085 -0.1982857582588532 M85.83624501713219 4.592051492730114 C87.43102595436058 3.1608329732679725, 88.38746260864566 1.8990471399839992, 90.1186492240485 0.24493566505044245 M91.54111583955672 5.350875186817425 C91.58052103751953 3.1420314863292877, 92.9250456386334 2.185368809705369, 95.65573237169579 -0.05518779542393293 M90.92874107678507 5.008701436444821 C93.11663163755198 3.265686885460024, 94.56222832395663 0.6261506756484904, 95.45319061846517 -0.048157990510605764 M97.27228403003454 4.5298542001421875 C97.33969560284686 3.186539970130988, 97.80311674698753 2.668038491448401, 100.80760058843994 -0.08514289878057896 M96.78576108250334 4.780653921901001 C97.6023590306505 3.5380106756878154, 98.49624993967784 2.533874008755417, 100.4160837998567 -0.19204460078599508 M101.04543321403662 5.553469603693541 C103.42893239334248 3.8158356675575, 104.50485492711263 1.3214443263229314, 106.04341709389297 -0.027351717476038706 M101.65928367540228 5.029090185229164 C102.71274919841372 3.960707750422711, 103.73875900879145 3.0346178645042294, 106.09342741187606 -0.2792923549769953 M107.4695339685346 4.080246434010104 C107.71870705437958 3.378355740580109, 109.54904568729425 2.3898752829694643, 111.28473982442536 0.40152825049261054 M107.51093289433447 4.447793719846294 C108.52734503206287 3.1222711251574653, 109.86503572065315 2.033888924759041, 111.57114433696859 0.23375322480822408 M112.02416367545612 4.35018368215043 C114.27697041617691 3.129949033081702, 115.26360011311692 1.3614599519271071, 116.67677206583174 0.9292076174683855 M112.42069704370462 4.812402405130899 C113.34690058795799 3.8325563178848574, 114.53020564078255 2.565061404869489, 116.20994128892052 0.14077533553838606 M-0.20861296487881575 3.2884879375449243 C-0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243, 
-0.20861296487881575 3.2884879375449243 M-0.20861296487881575 3.2884879375449243 C-0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243 M6.353717987177973 4.195043224586613 C3.513899728720805 2.2150873477227706, 1.574454549864193 0.5529044419488525, 1.549538231569687 -1.4553244119073092 M6.51701653933533 3.379454412382535 C4.722898586242825 1.954393382828246, 3.1874571834925813 0.7630959164700497, 0.9865563403567021 -1.0541951539023608 M11.479445817348104 3.562389825239332 C9.925170355389906 1.7349504910373441, 7.958236722881505 0.756987163574414, 7.871825448867384 -1.0998896810115275 M11.55591784466852 3.1436206089812533 C10.599502738157325 2.301156356021367, 9.622551057219152 0.9129985454299542, 7.029080727115724 -0.6303533553966282 M18.619481079635804 3.433704268575188 C16.802810422139686 2.7181047899535846, 16.21718973736226 1.6083914861038526, 13.387963105338553 -1.2598177029407074 M17.984249568760056 3.6734427854106837 C16.812358190484627 2.4637724871024225, 15.62036594029017 1.0210774528706763, 13.050965254028254 -1.18309830232807 M24.313803214239652 3.8866322848773516 C22.694407794209493 1.9593061741533047, 21.99270504854148 1.141089197016397, 19.62660410667214 -1.1908737755362473 M24.78830866598035 3.3392932600635055 C22.75756697200444 2.0090159883382745, 21.657382975127668 1.055943189757858, 19.633712543616102 -1.0595140534900658 M29.982523911473514 3.862664918887007 C28.123236497344024 1.366316247143404, 25.7537153500438 0.28909283092649196, 24.619578363018547 -1.571289219682113 M29.967976395487028 3.0395207351600257 C28.49890528363874 1.5901232395876455, 26.63896038587766 0.4013450466190116, 25.115455611486425 -1.5806284267669626 M36.587140578853244 3.342984559639853 C34.52908936435557 1.4811116978766845, 32.607313366873164 0.8453695986911696, 31.069496563143645 -0.5406824275925952 M36.454557598877976 3.5299581734664107 C35.276793938737455 2.653614842282024, 33.73612364961614 
1.3212289434697226, 31.43116140861212 -1.3169268187675822 M41.951873202786544 2.9107309018229883 C40.59369515983288 1.3044055313916205, 38.88372055530594 0.08936863686743812, 37.51011072157388 -0.6254722875943719 M42.17756080019916 2.876745606210623 C41.026884350095536 1.649276815559669, 39.189540255303065 0.5776435207326975, 37.975312744383416 -1.058629248517597 M48.40457164769047 3.746533413381843 C46.59578897230201 1.7078828878445198, 45.866833680885456 0.9649090810018621, 42.94575274337003 -1.190181707700701 M48.71399100290598 3.128229021012379 C46.69909824422348 2.0453380444896103, 44.43750631844519 -0.1717419354280964, 43.59932770372288 -1.4732705151364072 M55.09666227976137 3.2826026612208743 C52.852790636337105 1.9128928791429622, 50.701425819215956 0.18439243195181487, 50.07421851658472 -0.3896743075484408 M55.10246281794549 3.9803813469377864 C53.36867392891694 1.7198402487919247, 51.174629630635685 -0.2540156928497732, 49.87941953486041 -1.1313211215687915 M60.17061302231383 2.9181528695408288 C59.76511401718823 1.788124747686856, 57.5313595770064 0.9611175828604219, 55.1512954989138 -1.047403842423063 M60.5071240147831 3.1402703534826673 C58.981449423333046 2.1075285020658807, 57.46175278640539 0.6182143961051868, 55.726773440667145 -1.6516774668152892 M67.32708143911373 3.0342687711274188 C65.66866550725548 1.4667261253142978, 63.22284909318633 0.48394094059327286, 61.75937953537312 -0.793711360788869 M66.8670431068917 3.3239340815382015 C65.44128614121396 1.6668289401582825, 63.32989356942262 0.24902268876549993, 61.788807543990956 -1.274307369359734 M72.61454430923328 3.1137287700936582 C70.98067644268967 1.575368494372889, 68.88775823665443 -0.15296105629811985, 68.27934406099182 -0.9569266669433345 M72.81102006661236 2.995013215897271 C71.83722942187278 2.538412372631693, 70.5662630074305 1.692450897622408, 68.27038194655714 -0.9671525438297321 M78.52284674409596 4.009730581107201 C77.36845883117056 1.7201135270910757, 74.61063590887537 
0.05899353387658324, 73.47142234432265 -1.3868555459873046 M79.28517774001514 3.4892345959848043 C77.54762799958532 2.2373161651356286, 76.46167572536018 1.0894452140853192, 73.62752504378311 -0.9534498900388603 M85.29786517539172 3.9535962118502828 C84.08471790319551 2.894877047104037, 83.41608721585456 1.5915057276983169, 79.91520422242624 -1.162649052970055 M85.56047773918247 3.5246629553116975 C84.57409358845291 2.788439147637685, 83.18582092854274 2.08106843412792, 80.5926796171539 -0.6478701539877232 M91.3142560462706 3.1165808645977306 C88.82996439000713 2.1769071178486015, 86.48877340354325 -0.1606644290307777, 86.45705104499581 -1.5154203397062487 M91.10324122903461 3.620338066058852 C89.77705751980939 2.171931816545441, 88.70697124513566 0.5937375059534686, 85.92749342095564 -1.1914903828249195 M97.38996207674846 3.351546615885245 C96.16785343949465 1.9094643308461183, 93.61627136326784 -0.286146744817524, 92.63726881329717 -0.5943581109101908 M97.74623789491945 3.5415655472424348 C96.65677610975808 2.776264327754183, 95.44197398868339 1.628075563443002, 92.32777324991116 -0.8546983169092816 M103.43162927742289 4.0981589240270075 C102.34990725474225 2.457319327577726, 99.884387587561 -0.31843773100707806, 98.11299181917786 -0.33997268951769477 M104.03716141589159 3.8400149465320963 C102.84886651070687 2.5529881346876495, 101.15557566095895 1.2594284034268182, 98.86518739245005 -0.8074552679461575 M109.80186069014137 3.4979387160871998 C108.14146504888164 2.6438793174630657, 108.2733433848893 1.334856837650165, 104.72543638573546 -1.8690439297145915 M109.52619570538056 3.173459327222488 C107.78080685152757 1.7607428236444014, 106.61358248382133 0.5757221844168452, 104.22483187137689 -1.1495565320159045 M116.58647012866977 3.00931447485152 C115.08037492283525 1.80424049841849, 113.04906460030851 0.6531201794373869, 110.86715460798581 -1.5595691299651548 M115.90358115185929 3.675514855204777 C114.84523494651779 2.7827119361894113, 113.69814914447178 
1.5555982297440254, 110.99279668453958 -0.7908918026023344" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M-1.6936921272426844 -0.21267413161695004 C24.051720587243658 -1.207226524681973, 52.94247898959621 -0.46735293659679833, 116.66252466376977 -0.44257729314267635 M0.5681388126686215 0.6286263270303607 C37.69559455642062 1.5574767827047875, 74.17641810565358 0.29424517741342104, 117.54680509112882 0.657931755296886 M117.5078514860092 0.13825536658070625 C117.21773296013258 0.8091293467644585, 117.40581788552923 1.3952605086317045, 117.21469720158798 3.4282048351086987 M117.43763449486546 0.02618494519093037 C117.26941211593444 1.2819701346386, 117.46046356301522 2.768440430514467, 117.33475497272377 3.422765358196188 M118.64358940061288 3.0031672818287802 C78.89149186682405 2.725733525654066, 36.260203888088924 3.9408535431826, -1.878772834315896 5.226347158633132 M117.99084160469579 4.21431056779619 C78.6223588779912 3.5445168761791384, 37.608769835673826 4.194205269056717, 0.25321943033486605 2.624954115569551 M0.3348770017319666 3.756265386961949 C-0.14793677586804385 1.8180595906010613, -0.10555174773600279 0.6265026468284213, 0.23962946667152746 0.12062622029609393 M-0.08474007716841493 3.5670088370266515 C-0.07250837336885896 2.9170132122322023, 0.03659643649813502 1.9401977892442988, -0.17085012360099192 -0.008617515051748098" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(160.7006663077134 107.65588838707583) rotate(0 66.45875398214207 3.3441116129241664)"><path d="M-0.9886523392051458 0.7629342284053564 L132.51564731924327 1.784803232178092 L131.1543284329632 8.249066833259121 L-0.5182266738265753 7.368450943233029" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M0.9030838813632727 0.7620372865349054 C38.582499120441135 -0.3969555008647584, 75.63089285988521 1.8054730188610413, 133.41419492019804 -0.4239510800689459 M0.9418623195961118 0.6360292239114642 C36.03596081701502 
0.9878805006070936, 71.61379821366488 -0.28421646112958543, 132.60027941326143 -0.939386417157948 M132.7770764660154 0.41420878860918764 C133.43960575535294 3.0651252698101166, 133.15645938233095 4.615757924103919, 133.5668112942292 6.857582033368506 M132.6675047067349 0.3227435606211416 C133.25237271844333 1.2654631937306808, 132.68167640397635 2.7477060020941746, 132.77176598536957 6.919170310608771 M132.17849329961928 5.711345188172118 C89.54171765522013 6.292091978959988, 45.811884378374685 7.549845053129147, -1.114913085475564 4.718674234898344 M132.21202527741434 6.33344533372815 C94.21884057167999 6.683815724795013, 54.42967064408285 6.70647469872728, -0.2864329172298312 6.284875624665574 M0.04751241015146568 6.09920718126725 C0.6191206897195044 4.051700195285685, 0.2478109798746857 2.987161920755868, -0.20971263778346594 0.25593619835174775 M-0.3240067153966796 6.970077959054409 C0.2341349065768699 5.053788944680436, -0.23031853392962234 3.1980828040634353, 0.27076079336146164 -0.23157160169768087" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(165.9605248514622 120.39008378937473) rotate(0 58.68576547639498 1.7349162106253004)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M0.5092985491406972 5.170207895837505 C2.831891926953226 3.829138366920982, 3.2791244248714366 1.7502041719398083, 5.130399638185746 0.06028930338446442 M0.8755307607491368 5.1652944468294875 C2.1581225642337865 3.9065296934859015, 2.7088058147422087 2.915326788522704, 4.98804971375922 0.48708731055606536 M6.397690379535151 4.891509324741378 C7.30942560087489 2.827532367633543, 8.9673210638573 1.6845596445227224, 10.211540682752561 -0.6210919535637137 M6.455364940696699 4.300251153820023 C7.383057618614786 3.744844127319895, 8.494162187655567 2.7117670997526924, 10.900688668944818 0.041349532835906866 M11.275743281341928 5.122486878973003 C13.119488413869533 3.3036838187348816, 14.273295778759513 2.9014060494073792, 15.38637722255983 
0.052875522237735006 M11.797581005807805 4.95128457131287 C12.934202363612288 3.2948477270526806, 14.026704806431821 2.459212715864648, 15.869737219254901 0.12327599003212358 M16.893545283245167 4.002177208633712 C18.436160697042457 3.8732134297069765, 19.4874472479388 1.9152276166677795, 21.366130098791682 -0.13677943160985673 M17.287060075586908 4.55471833671405 C18.67030116661469 2.9057723359303336, 19.700008162847894 1.6678273692202519, 21.141697577943916 -0.11899327419926495 M22.54276861046699 5.400759972215922 C23.178589975262362 3.2056828053346482, 24.254163556274047 1.4178405895169426, 26.68485434917024 0.5408650308859289 M22.256224497326436 4.829874564570667 C23.532001796053542 3.103747153709851, 25.14748519880095 1.463384877669281, 26.319988186054292 0.48630526093884713 M27.736391916691517 4.350631194820966 C28.79137639577091 2.7494689094554228, 30.304287371851043 0.8192778092629859, 32.179161439879636 -0.2023028776735778 M28.06788507803083 4.457022204788741 C28.78380167250358 2.902100786308512, 30.08635276147767 1.8305799321288918, 32.14779094434047 -0.0874930483975227 M33.391478052140876 4.352829909941265 C34.26932981170784 4.166820039317687, 35.16194236890409 2.9177715814032026, 36.77099703364198 0.8329980637040456 M32.80114385249253 4.73067868441538 C34.49373619989534 3.1375061360043017, 35.354787060396255 1.9124014877783198, 36.6939539813449 0.08782063108873572 M38.50889629549919 4.008111517678143 C39.44919914838182 2.9275805529514876, 40.33742568132049 2.4868825391317744, 41.94736322164141 -0.43548455737987696 M38.534241315215965 4.637558265856596 C39.48520796112954 3.139782772929509, 40.503959562724205 2.5298983622843734, 42.33583888532199 -0.12672303194636642 M43.756100121988595 4.723374426524026 C44.98249992694415 3.493064155046104, 45.45847975699909 2.9282087985572254, 47.54309629160005 0.13488985506707896 M43.630459188493575 4.709873688659501 C44.469052703282266 3.4361790025537524, 45.449382608678725 2.53273531547676, 47.74522347231421 
0.416030108294283 M49.346849435293194 4.542733411580243 C50.47726410322301 2.9346568822227743, 51.68774695349536 2.6720900875774074, 52.62058895970202 -0.2572717363537542 M49.42096765432981 4.5032845277301154 C50.15527195029708 3.2779153177969276, 51.43557788913361 1.881351224460259, 53.07114901807597 -0.07269396703329606 M53.823988335635136 4.653946256339214 C56.08550584389658 3.08582756619483, 57.43049214924518 1.281499589577535, 58.45256813866944 0.5490306393500411 M54.29072812251709 4.650198835115544 C54.96667738158475 3.789058454441019, 55.50718795004841 2.959061640020806, 57.92582302614973 0.23570420036800144 M60.33916735169853 4.419487110902264 C61.50861719138427 2.3677153838559164, 62.279348991323914 0.9879901184573024, 63.40181716968645 -0.6917366802202096 M59.76134046255986 4.337721498221799 C60.88038898898223 2.9896856198083652, 61.71430605786238 1.8112772644317021, 63.56746645978991 -0.23485409584581163 M64.6356212087055 4.282973133761818 C65.84335202954755 3.197261418849712, 66.5296440353523 3.229101075669192, 68.30395359047516 0.4747856509279147 M64.57694577175056 4.560955881595993 C65.86832301332511 3.9268648550870524, 66.59953541837615 2.579273288644567, 68.48621651003897 0.08817591512237324 M70.92789106034019 4.400237850019211 C71.85232856778512 2.965311968686901, 72.36379148203669 2.7255173897396934, 74.94715136721972 -0.1660766546042396 M70.18707360709362 4.400952185144151 C71.4438935035328 3.6170460256437647, 72.07033494863775 2.341680142928155, 74.18697382765131 -0.49608916057140956 M75.08460225682471 5.145757793166847 C76.51798716396009 3.4566730810592903, 78.1636953307809 1.9699599219232962, 79.45491866264221 0.008395124175720525 M75.43254475338149 4.8780090859133605 C76.82421617462471 2.8870111073356286, 78.3823360868451 1.846919765511096, 79.57530529116941 0.19134076036991987 M80.50818657684317 4.543725863306072 C81.86876477732567 3.1445483675490467, 83.02535775463443 2.409375919781316, 85.1658200679268 0.08938607486914518 M80.8283184551186 
4.029622096309659 C82.71544555658052 2.677811653553928, 83.72669914527555 0.7501518075555438, 84.94029123884768 -0.012722403397600612 M86.11765999561773 4.774477968929293 C87.67554750093062 3.734138729296535, 87.9374009721142 2.2234656345856525, 90.01625437079558 0.34297429346844077 M86.24939518459739 4.363785205880139 C87.61747356183459 2.639080938720554, 88.87954983839829 1.265162781380567, 89.94042823672855 0.13965674392312327 M90.72126763589766 5.6582363604471855 C92.47944058883776 3.841592778936123, 93.05601378752633 2.5392446243601454, 95.84507148493127 0.3331471941237969 M90.93747308571909 5.111504690572323 C92.10669637404102 3.794517606454439, 93.51314492814657 2.370768744532949, 95.80575426192068 -0.548524749810838 M96.53464794070959 4.49818639542417 C97.84560555409014 3.8300265012356665, 98.81629086407413 2.3265874918286396, 100.4518767305603 -0.431486606818909 M96.96780468258575 4.607273499308837 C97.83368171629192 3.5742589501138884, 98.9336713426783 2.1177290578575265, 100.82736171390394 -0.19585562640550191 M101.73792937072913 5.235449307596739 C103.62766870867017 3.6674414781795157, 104.71500103416855 1.379101686186477, 105.86316697373 0.11909789609512811 M101.94834252759802 5.0889445257824475 C102.79425163642944 3.1686743568025975, 104.6548132296482 1.6802996572928324, 106.3383784454683 -0.6479498247264457 M107.18854460772528 4.042810807026461 C108.50156989958424 2.247959023990582, 110.351621760441 0.9766173089252826, 111.77882675756133 0.4718776576886555 M107.17651836453406 4.3240257240119435 C108.26870518983776 3.4358404150506945, 109.58171530851612 1.9352913860473133, 111.48031958988301 0.23238665719652968 M111.99098010077961 4.545850472158975 C114.08769829194688 2.8379872032383644, 115.61680789869335 1.3254527405335759, 116.49493431635054 0.23920241040661805 M112.2671403630253 4.877869333612955 C113.30479342279077 3.6735436849614262, 114.32034080557493 2.506913970344653, 116.07980837533104 0.494903457868098 M-0.20861296487881575 
3.2884879375449243 C-0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243 M-0.20861296487881575 3.2884879375449243 C-0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243 M6.4027388020736025 3.0787990515366865 C4.594051670622626 1.2337941060369446, 1.8356418626524715 0.035461359011204174, 0.5056955821098481 -1.3838564877892674 M6.389873609304507 3.6630788945373816 C4.09085743375749 1.9214117797323973, 2.5127533177003922 0.29699482008569966, 0.7175516854723637 -1.177891075800249 M11.705109288570029 2.9638091787244396 C10.577598554813907 1.2869960530295956, 8.294568430855914 0.23382764634872877, 7.89639333903462 -0.7832634183467326 M11.88253236760248 2.953257154162528 C10.024338192985137 1.4294149341007065, 8.230583988071782 -0.056848805973118643, 7.009245970264527 -1.0739013194926974 M18.437682192382994 3.0196176882986987 C17.799580305075846 2.1669094953274293, 15.424377860949104 1.9621871335516583, 12.375117159980276 -1.5464460922992278 M18.298078013240954 3.0862863227413113 C16.85354460212473 1.8677604452624155, 15.265353346643934 0.4369647845968923, 12.813117520530636 -0.931041965925177 M25.160417718100128 3.415348133800081 C22.89471292152181 2.0094587557731374, 20.233585954938103 0.241895846951094, 20.041048505552144 -1.545295438820549 M24.880593989123355 3.9441266960757737 C23.268827824981283 2.4371114672272953, 21.66813231859477 1.1242197920633834, 19.316680087218092 -1.1332168418430322 M30.91385063457227 3.134915618883955 C28.17614500366939 1.9856622765372927, 27.812490493987287 1.310612188237684, 24.762924434261834 -0.9130842350171959 M30.280491058229643 3.4348059839164344 C29.26910819039192 1.9603195160263733, 27.779377240810543 1.565782057843676, 24.925854643125525 -1.3089721758132882 M36.692290849315285 4.05169003258602 C35.458045632478424 1.4128156067716673, 32.579618298952525 -0.03248842633060889, 31.72616813487521 
-1.1472433731034595 M36.8001678121577 3.38056339902946 C34.96910080001915 2.108533750502394, 32.988050558266934 0.6852164818891477, 31.12170188922583 -1.2373871144447672 M42.68121404217253 3.3223889027690334 C40.966864401308634 1.4821870739470002, 39.178802723311996 1.1321188972526781, 37.41345363615518 -0.7596425337186395 M42.552702849988584 3.3586309917239294 C40.8021702319875 1.82230341045103, 39.51792161518852 0.6690048134130935, 37.75630043626269 -1.001754544156635 M48.10595299725407 3.5256581978514476 C47.43785666632186 2.2843670630098174, 45.689958641994025 1.1267388449079896, 43.96359680045315 -1.8397022479381278 M49.023765950072416 3.5747036923762097 C46.864106339822115 1.3245875861204839, 45.045048366524576 0.06446299488435647, 43.46765596509831 -1.3997656618878112 M54.93607616106325 3.4300669842646974 C53.78875873713033 2.3342819044948415, 51.81161216778193 0.22234106372697243, 49.955534558531134 -0.5830135123366489 M54.96049444846154 3.3252179872957575 C53.51379931624571 1.9459620547621337, 51.56382390093146 0.38020451065388705, 50.01258162802295 -0.6639220727635545 M60.068420763479 3.5956266530338707 C59.367094517353166 1.6610178798536466, 56.92855951814884 0.3231202058381486, 55.511757353848495 -1.793204289887052 M61.06065353958626 3.563962126897279 C58.75023258759267 1.663319530739352, 56.724061479369894 -0.23285131205398021, 55.81185432655429 -1.6122348083909117 M66.88645686826717 3.8456094369446796 C64.93463748160497 1.8050849010936343, 64.4321650581127 0.5004942056785358, 62.1474735314211 -0.5084281290536394 M67.40940548691947 3.6983804020719537 C65.62937664397539 2.3660058188339606, 64.58645923147908 1.680348557455836, 61.933985933540214 -1.0682654372537157 M72.23476954803333 3.4062308872506404 C71.74329910066396 2.3877688626889255, 70.50506655896166 1.4402197822267606, 68.57679403123718 -0.7494129531066648 M73.02155487763225 2.920875957367608 C71.19273150554056 1.75144520803659, 70.23983102556532 0.9602546465699636, 68.44999482154188 
-0.874064155386739 M78.79612903360096 3.4191272771979 C77.55305901479706 2.4868215872372588, 77.06621641232005 1.7104345491689332, 73.74148960697218 -1.770020190228637 M78.84458571330464 3.0715132529443387 C78.1456237471479 2.430027417205129, 76.82346368244559 1.0683339219889292, 73.97680957346908 -0.8399174183820611 M85.63911163726002 3.8615570558901995 C84.75548114205105 2.1135615380505572, 82.30198025190445 0.6777065540336904, 79.75523761056407 -0.5109702523871683 M85.51528595050908 3.9718319015778722 C84.2648697688124 2.619116817304935, 82.25367398109448 1.3087544052639646, 80.00316028331449 -0.9356339164617836 M90.75789050862717 2.9957132784618663 C90.48462695307049 1.8306237425255738, 88.78548764228789 1.0644774135267812, 86.2467868867469 -1.1766031724241444 M91.38115963933582 3.519921715694167 C89.8882006026147 1.881644355885855, 88.10661650909476 0.71990098576239, 85.84172393452833 -1.5703697153673388 M97.1032208677183 3.8666455214485986 C95.14506568541348 1.6501760027048642, 93.35399070601416 0.6226725545711913, 91.73925593678655 -0.3739201550538678 M97.29607021513276 3.59749456430939 C95.94964254290284 1.862978414114294, 94.54825453166669 0.935044124867541, 92.68440107183346 -1.17310630545756 M103.56456122513102 3.2562067431645794 C102.36130434675846 2.4635861892721547, 99.95045400255191 0.19831590330654691, 98.72259155253053 -0.9512440232121536 M103.80380510695451 3.8976262450931927 C102.92900445215223 3.1891784646712438, 101.7368008240956 1.7648553228049664, 98.97395373365947 -0.5473987484377953 M109.13732712271278 3.9981053706739926 C108.47809348706706 2.306545653019478, 106.58475484952585 0.8639404527302139, 104.5208225570764 -1.7831671310848305 M109.82564197823059 3.4444960713402253 C108.07146533523547 2.202392645755058, 106.6428117515164 0.797277071924211, 104.34157169281092 -1.0016360903884016 M116.39014487422948 4.192957386762531 C113.51070661310561 2.6143489139081604, 111.84293422939767 0.2558443499857662, 111.05931354544003 -0.8554992771361754 
M115.89791496743392 3.8631264589416547 C114.2330404660206 1.6910775691546807, 112.07620390567769 0.09931748903811721, 111.09443869713571 -0.7293017747931912" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M-1.1372508201748133 -0.23434065841138363 C31.991630108883935 -1.7706760088648623, 65.4367551633089 -1.8775126258578128, 117.99884267982202 0.964375564828515 M-0.11952855717390776 0.23701665829867125 C30.562328027096314 1.175764466152922, 63.73866569809749 0.8781242915256181, 116.6588966324191 0.428721378557384 M117.50389364884009 -0.06971945474937447 C117.6571516126058 1.0159928101970888, 117.61829649425238 2.553871384900442, 117.4895447621707 3.343555829495461 M117.31033785380102 0.016258915912562538 C117.3403125141565 1.4049634695314255, 117.35083853833815 2.521413796449462, 117.35713774063296 3.355539125397641 M115.5327333443995 2.8704778058155966 C83.94026392008487 1.6856916946810063, 51.32700248832397 2.7943358821314153, 1.2882946934551 2.7429801923855734 M116.87949869774388 2.5808490924238754 C93.1018492956443 4.714837299741258, 68.97645685918192 3.975624935782899, -0.6042630029842257 3.090415369689424 M-0.11861939677549724 3.510392218965438 C0.16790248531686403 2.9816574310483093, 0.10973882644943883 1.7087973026245038, 0.09712929404272419 0.09058514252828276 M0.1551346205447676 3.3807691946805556 C-0.033857076349634564 2.6184976331600676, 0.04568035199197799 1.6065795482449354, 0.1511032440194743 -0.030040646842850433" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(167.98678477545945 129.14008378937473) rotate(0 58.68576547639498 1.7349162106253004)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M1.3325042731267267 5.33371042547353 C2.699792038494034 3.1325813706239245, 3.3291424355913812 2.3693734187784097, 4.687197335953361 -0.15804175605650594 M1.3275908241187089 5.156613359447349 C2.226733512014481 3.513258627994754, 3.399135433830692 2.1602836853639777, 5.113995343124962 0.20856242429811894 
M7.091482343812776 4.123416969981804 C7.30658959848112 3.4999383861675577, 8.439708095914234 1.457165848285649, 10.04349272078736 -0.4642639758914945 M6.5002241728914205 4.3002484252311435 C8.30151420680029 3.0240389895357955, 9.62356533052007 1.7563610593154908, 10.70593420718698 -0.15127397979723792 M11.949367930613297 5.257931857225056 C12.427352032688278 4.1221275049204635, 14.327362606598342 2.072704101284397, 15.344368229157705 0.23675500833662055 M11.778165622953164 4.951338794962074 C12.30658245770941 3.8811446099439193, 13.653170464675167 3.041659128769993, 15.414768696952095 0.13773591126547957 M16.86673490205618 4.638266834960869 C19.006863361734343 3.414377548426943, 19.313836639581368 1.8554174884763257, 21.19238991709229 0.05090975688542165 M17.41927603013652 4.551060400968036 C18.4390364558902 2.9187041208515034, 19.873422574255343 1.5309697915641336, 21.21017607450288 0.006514123564166141 M22.89222569820729 4.44384228433791 C23.670678760011885 2.4988050250084224, 24.8551643998125 1.7909808520468289, 26.496942412156972 0.6437534634035766 M22.321340290562038 4.634329840218009 C23.458350347222122 3.4257473167268215, 24.676100021744237 2.0613823501512463, 26.44238264220989 0.3890983478178025 M27.879773562594508 3.9260605781137605 C28.730439909353823 3.05234713711679, 29.255382765594014 2.5450820590819525, 31.79145114537964 -0.1659286868513341 M27.98616457256228 4.126502732539889 C29.024780142220315 2.7978572310909873, 30.547940130442363 1.7147337388859023, 31.906260974655694 -0.38065970239571884 M32.50888031028371 5.277232722437093 C34.277441944643805 4.152860220423028, 34.98574244699253 2.308620200982974, 37.45366011932616 0.4628071247693619 M32.88672908475782 5.015425362402658 C33.94005372039191 3.281016926128194, 35.35704435429691 2.0246396867067022, 36.70848268671085 0.002957351142587328 M38.20183855980276 4.424151715065482 C39.423779121687765 2.943294153634138, 41.28515410123109 0.7703204744162271, 42.22285414002441 -0.6133632914767617 
M38.83128530798121 4.448133037134203 C39.74283393652125 2.9891264733525915, 41.53677530620919 0.9817643376727434, 42.53161566545792 -0.38576909048939284 M43.544009501217545 5.297019187147599 C44.39635985696542 3.6372911194260493, 45.91803508087573 2.238387191373943, 47.42013658504027 -0.05310549194103015 M43.53050876335302 4.602901728703511 C44.19987719173076 3.6684095734013127, 45.235359659780315 3.1670627400965223, 47.701276838267475 0.06805803842732772 M49.40104512805593 4.62301876144103 C49.94791452269091 3.659420960643076, 51.842784354725474 1.4439676363679865, 53.06565163540161 -0.23685304136586116 M49.3615962442058 4.185946431938477 C50.87180676109624 2.6654001388435864, 52.20644957289185 0.9792625878860053, 53.25022940472206 0.11664982647064409 M54.1391660053838 4.995242694094442 C55.15239571411431 3.8939457564828728, 55.93466030960586 3.234228373079385, 58.4988620436743 -0.39788087110258696 M54.13541858416013 4.703472806308083 C55.537255458215725 2.889076641141112, 56.96808467422777 1.3997434144488403, 58.18553560469226 0.42158302685953614 M59.942383501729026 4.466265232865408 C60.50655119420899 2.673965594343884, 61.74482082088573 1.8247295172000533, 63.295771365886225 -0.42873192089363243 M59.860617889048555 4.3583755341402135 C61.204876081282244 2.4462727844013905, 62.71403227488977 0.6761394324439924, 63.75265395026062 -0.44950158432810805 M64.43277755715748 4.817069258902604 C65.40316014538845 3.4330754010527316, 67.48830787347462 1.7897418018574243, 69.08920172960326 -0.4122904970903266 M64.71076030499165 4.922289962403858 C66.30543711755352 3.381205867268815, 67.1897857339746 1.4420067575473627, 68.70259199379771 -0.05084756637330523 M70.58771891519704 4.750619644963616 C71.37363983052494 2.9703455431481034, 73.35685051388226 2.0961071751599487, 74.48601606585326 -0.828208914435072 M70.58843325032198 4.4322904609269145 C71.65190362957345 3.2235235177608175, 72.22835010160632 1.9469712868902052, 74.15600355988609 -0.2618844556981458 
M75.96014689091358 4.560789897197099 C76.87921441200385 3.605583999488882, 77.99932370813067 1.7858582144678057, 79.28739587720213 0.31601914572077305 M75.69239818366009 4.662942064178348 C76.07732198395802 3.920847490343411, 77.40838368277302 2.841153876672555, 79.47034151339633 -0.11517612408367484 M81.39579160283498 4.125935513360647 C82.03558904540473 3.2556199540727566, 83.34207705689884 2.032684709007777, 85.40606346967772 0.07863199363480933 M80.88168783583856 4.422998829486069 C82.10242650070181 2.9668805527944397, 82.74842610428958 2.0079843576980587, 85.30395499141098 -0.3997224204519796 M86.2534517410271 5.177820624602916 C87.5597906673864 3.0633546421942226, 88.39960729184287 1.8870387551591874, 90.28655972084591 0.6525265729556926 M85.84275897797795 4.627455478819046 C87.45371734780876 2.52901187335013, 89.41977471405637 0.9323853105237693, 90.0832421713006 -0.08890752908409041 M91.7641181651139 5.296132157081442 C92.90653440514463 2.8446158794118652, 94.56962150890584 1.2927851658088008, 96.31440926328347 -0.4629539667824589 M91.21738649523903 4.711695627625735 C92.65383365042436 3.3885132122585255, 93.97812140291181 2.1850559058800054, 95.43273731934883 -0.34935963551610033 M96.64174484187305 4.572702456745072 C98.53057965012616 2.9819686076636875, 99.58403475584892 1.5111682715778871, 100.17668349490964 -0.2841646997605508 M96.75083194575771 4.7162940735134615 C97.90644647097919 3.5828594372621, 98.63367844014132 2.236725688518533, 100.41231447532306 0.2588908358129615 M102.00591578661452 5.207730511201974 C104.1951240524274 2.5246857248692507, 105.65992750875675 0.3575143165050997, 106.76494463960586 -0.3711926106477662 M101.85941100480022 4.675407553509307 C102.98761121814346 3.4378154031945387, 104.54747553162483 1.5316756667905054, 105.99789691878428 -0.03172438886226581 M106.85095392782641 4.203101409248546 C108.37641155772742 2.7436378147091807, 110.42223350592256 1.470965011393794, 111.74463243376829 -0.12232615680297415 M107.1321688448119 
4.328359215449081 C108.31592229662087 3.6301020717189965, 108.89028453250391 2.8717359269315947, 111.50514143327617 0.2075890877949675 M111.98090162552782 5.247771592817491 C113.35152585444035 3.7225951371248183, 114.91676688163285 1.8589468645162235, 116.13886521905513 0.02780686398520943 M112.3129204869818 4.930232837896085 C113.50273047243114 3.5092956122219094, 114.73010981795683 1.7319652533003203, 116.39456626651662 0.37306887708157177 M-0.20861296487881575 3.2884879375449243 C-0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243 M-0.20861296487881575 3.2884879375449243 C-0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243 M5.718025342127387 3.8753739877850486 C4.293672963699714 1.941726880150926, 3.478955873103708 0.8934154863263211, 0.5648159441755807 -0.784859212816769 M6.302305185128082 3.247675474338673 C4.406684346963436 1.9341114971701878, 2.5516047506343598 0.301457763336606, 0.770781356164599 -0.7284570909133942 M11.640712111097317 3.6476553935106115 C10.282779207859775 2.1166806796907167, 9.609753040000331 1.7319711693302353, 7.301736206632557 -0.277701598301619 M11.630160086535405 3.0908777654650823 C10.265344503107869 1.8110141705295728, 8.87289979465699 0.4041310663127683, 7.011098305486592 -1.0598278151714204 M17.832847813686016 4.05932360646921 C16.56047263106247 1.6252110881643784, 15.96247906295596 0.5235664902028234, 12.576230174462237 -1.220124033524697 M17.89951644812863 3.620976785818588 C16.196618217496656 1.9229860418472637, 14.191655446876839 -0.04961156344502207, 13.191634300836288 -0.9475398317100903 M24.36490545220184 3.8427609575740016 C23.41535553867607 2.088695701158106, 22.102758859550672 2.2900867808003613, 18.713708020955355 -1.1978274104167306 M24.893684014477532 3.7127405675421423 C22.815245326727606 1.9165904061240235, 20.8483091624232 0.4794226710641176, 19.12578661793287 
-0.9518727086931996 M30.122149579067887 2.733265366877007 C28.45298815025178 2.404442996706189, 27.33270430383092 0.023874201043605103, 25.383595866540887 -1.8823766929777421 M30.422039944100366 3.4410758496788767 C28.340843181522235 1.6547847924460737, 27.27968567647927 0.3230670122689556, 24.987707925744793 -1.08916686114204 M37.1752511857844 4.230091733179551 C34.81837209466615 2.099467514034439, 33.69261654687685 1.5467037193875957, 31.285763921469066 -0.8990287011563975 M36.504124552227836 3.5763633244871462 C34.82612648177355 1.5939310984336694, 32.9014490403332 -0.13509177157041008, 31.195620180127758 -1.027566987423246 M42.48362669774959 3.1836862632041614 C40.52525859822373 1.6861230371458296, 40.0882736778771 0.565564841980387, 37.809691953868324 -0.7398082026681244 M42.519868786704485 3.0255622049761755 C40.98874781268643 2.0759229097269554, 39.78692463198837 0.9697241891679245, 37.567579943430324 -0.9286640551342011 M48.823223185846444 3.47493174035248 C47.169101486195494 1.624627952394874, 45.57128013554424 1.1229632985322764, 42.76730888143101 -0.6538413657974318 M48.87226868037121 3.5499979451406434 C46.8055451914638 2.371656489035398, 45.6707219346694 0.7224097435585588, 43.20724546748133 -1.0748961309604212 M54.86395916527413 4.070563248104002 C54.05596997575476 2.7321317456070005, 52.203387619454936 1.6065395512433254, 50.16032481004694 -0.5527825928786767 M54.75911016830519 3.90463056040607 C53.767962642594384 2.7815173360168455, 52.65572571263757 1.8150000560206174, 50.07941624962003 -0.7899029600461944 M61.067195475825486 3.6497663306552948 C59.42180519200481 1.8167500292718, 58.33943344179441 1.0261858107999904, 54.98781067427871 -0.8233738027773314 M61.0355309496889 3.2714251912213896 C58.735505860756945 1.490244640955061, 56.51781417635464 0.15667504131750387, 55.168780155774854 -1.2455325025859736 M67.45350545275073 3.047831043057696 C64.93791082998172 2.5008305736995746, 63.13045885672232 0.3779961457154427, 62.40891402812657 
-1.5142116591920727 M67.30627641787801 3.255088701719971 C65.45709831448613 2.135529277266349, 64.29045319405184 0.9440271471171828, 61.849076719926494 -1.220915785071405 M73.05180354483888 3.2542906878903892 C71.42075181007513 1.768748137960133, 69.91655347857123 0.7266319134171768, 68.30425639708798 -0.8135509688388339 M72.56644861495585 2.8864703040026516 C71.17743154305337 2.147687216491256, 70.2155733275634 1.0437195246854878, 68.17960519480792 -1.051736230308168 M79.20102712780056 2.8550323859850204 C78.07275111792543 2.4989736562398353, 77.16466001767213 1.2834331387639328, 73.32132580174819 -1.6970621580005845 M78.853413103547 3.5784676422892385 C77.48717173825935 1.8642443958668222, 75.39735327955356 0.3484527022879838, 74.25142857359477 -1.3061623518568692 M85.77978409950732 4.26198219589108 C83.50528978197305 1.614159189353454, 81.44715492568304 -0.07747453887166911, 80.71670293260408 -1.0184100585132272 M85.890058945195 3.839058385530126 C84.49929433194346 2.1637487504334065, 83.16232629284643 1.0234578482405214, 80.29203926852946 -1.1581386042367103 M90.95161696386116 3.765433878448479 C89.29767212350363 1.829179580362958, 87.95366364278294 0.6728175603368073, 86.0887466543493 -1.3366612377417155 M91.47582540109346 3.530907518832632 C89.2634037333601 1.690741267212872, 87.56638752952173 -0.11346721086815315, 85.6949801114061 -1.0414556486838378 M97.95887639986232 3.0148376449585435 C96.3800776272248 2.2189464731332444, 95.97994064025819 0.9540284995700095, 93.02775686473402 -0.9456272031347284 M97.68972544272312 3.5289855782051105 C95.82416763666252 2.342221229397511, 94.73160556349382 1.0165938925584095, 92.22857071433032 -0.7602933398148997 M103.48476481459276 4.203673651001077 C102.3376613492646 2.0307405948192105, 99.81474367365783 0.22221955263861215, 98.58676018959017 -1.4669645178624586 M104.12618431652137 3.7644512470594007 C102.83495267284236 2.3805533561919923, 100.92555276046987 1.5221517869592631, 98.99060546436452 -1.0517058848430554 
M110.26434008388435 3.4407540834058894 C107.65500233594562 1.0207193340894731, 105.31583271173358 0.12787660647309607, 103.79251372349967 -1.1788727138451036 M109.71073078455058 3.3662659431823414 C108.70116324444629 2.5003774740494245, 107.46828596022984 1.2733693137197764, 104.57404476419609 -0.9940072566064739 M116.59551929298733 3.0215252472243024 C114.63981720039678 1.5565616414006276, 111.95781636931865 0.5590968783301888, 110.85650877046277 -0.24479220478132224 M116.26568836516645 3.952808405707286 C114.03866849966818 2.280649382683252, 112.44315153056631 0.7675961508242987, 110.98270627280576 -0.6651566065967813" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M-0.23434065841138363 0.142077824100852 C28.715953912368914 2.4050549429488974, 59.086424035517375 1.2947160762382346, 118.33590651761847 -0.6271101627498865 M0.23701665829867125 -0.9688872648403049 C40.54571875747384 0.8165276117050515, 79.95095844667587 -0.5723418228900565, 117.80025233134734 0.809664343483746 M117.30181149804058 0.30964840602822036 C117.04848178797722 1.0714655519263452, 117.26447067737034 1.7193583378838562, 117.24525436103481 3.774069461958904 M117.38778986870251 -0.04196367157969905 C117.44389687467893 1.1169644084395502, 117.21516043451348 2.2824108995181795, 117.257237656937 3.451698091471317 M116.77217633735495 4.023190758161206 C76.14580382012839 4.315442243970393, 36.8684398630825 3.6790146626601214, -0.7268522288650274 1.5858928775116397 M116.48254762396323 3.355566626384814 C80.35674389053537 2.8080842616731956, 42.32692843072813 2.2880636277372672, -0.3794170515611768 2.629336511686404 M0.04055979771483731 3.652375844574424 C0.26388576688719817 2.4068530792622296, -0.2596626308548632 1.192243502403786, 0.09058514252828276 -0.04740227727084212 M-0.0890632265700454 3.4211925880837044 C0.11493123267303818 2.6076078582227575, 0.09656059686224477 1.8346913488649563, -0.030040646842850433 -0.049094910607272294" stroke="transparent" stroke-width="1" 
fill="none"></path></g><g transform="translate(167.4605248514622 137.39008378937473) rotate(0 58.68576547639498 1.7349162106253004)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M0.6636327809792811 5.437912976123173 C2.2004859760752415 4.226952148857306, 3.7809761552800425 2.588543634480981, 4.753457107538826 0.26739244589928923 M1.2526921490345948 4.722668374176849 C2.893076157371256 3.315675006092145, 4.0593697646852505 1.1757016715799575, 4.942586639201736 0.13901534628469886 M6.888673206706834 3.923559729066506 C7.8167811755684395 3.665116062906882, 9.193282153746134 1.761861752315569, 10.254336575929951 -0.02053238776626576 M6.783235177220148 4.72443837724702 C7.780195310455909 3.2706019642173962, 8.591260857205842 2.4053002574624074, 10.690039552793783 -0.0021361937664686192 M11.864336236439485 4.693022393104191 C12.832123101116894 3.466765492856924, 14.293252541030547 1.4697871681264814, 15.768112818248053 0.8118149884906583 M11.492984622124421 4.77924779070964 C12.474223981093884 3.3169147608756475, 14.08396577532556 2.4076180127643365, 15.793423126268614 0.3656934490357509 M16.953734801205123 4.2103986534938125 C18.215125911337292 3.2035176473623537, 19.171040867078883 2.9002483596160022, 21.456140953021013 -0.4228349275723574 M17.07446930832999 4.385792078142604 C18.593328624419634 2.5674006463956744, 20.28613083027282 1.1920316374214188, 21.228957590458805 0.009070832019251907 M21.714300403569613 4.411944828884037 C23.279297800514495 3.4413857570461186, 25.32263367856215 1.9551547888790175, 25.692844670314702 0.04165699519335647 M22.557017100707416 4.989632367764892 C23.926326600236063 3.2925432065914566, 24.954926295529926 1.993984194845566, 26.350287216484904 0.25080317715301526 M28.415005349560833 4.099649229532447 C28.41402131952634 3.663007678639202, 29.384683365561937 2.0508462146115596, 31.382938463657 0.11841187954537913 M27.970852136339253 4.179893810654398 C29.151002846331227 3.1444238808792417, 29.981890749989084 1.645116514312564, 
31.652577547679492 0.0666367292649333 M33.05664051674466 4.210158799087713 C34.178850818027534 4.019508889309605, 34.20989773831537 2.899275209830106, 36.38170539811377 0.13645948163238375 M33.21424759259607 4.661317222530107 C34.4486520009651 3.2183493860011745, 35.82136545288414 1.4339502470899435, 36.73966816766311 -0.042691602176008536 M38.289037586148964 4.00835247280741 C39.145836703035805 3.4663063561696146, 40.86230015071601 2.699425754378073, 42.212219342212904 -0.36568845078805623 M38.717622919060425 4.250957441969904 C40.17001742219369 3.121345222071163, 41.00214852858819 1.2375657455553983, 42.48816993226834 0.04524411515807569 M43.23134681033963 5.156267819340211 C44.37349218722228 3.939623670560652, 45.161392539645064 2.882708366472267, 48.08562273602027 -0.37111153656726614 M43.71069660560289 4.850234972192776 C45.161177248017175 3.089904194708075, 46.81635661836112 1.4158002326649208, 47.21461773761573 0.34661047823617325 M49.189259245344516 4.3066743989437 C49.701047107349076 3.2945671903931624, 50.85297011705536 2.072741990438038, 53.618348823063734 -0.5909074029476635 M49.286583414306016 4.2416343392766205 C50.208256533528306 2.8854043338700746, 51.62258134818292 1.0755015901270104, 53.039567752037854 0.06801310688617906 M54.27528451872952 4.3720004052665455 C54.751167317820595 3.398595663233414, 55.520224550197646 2.664658105859668, 57.65239941840918 0.004734956219419217 M54.26894714754037 4.460755733940415 C55.19112882291247 3.8536901626917057, 55.97529565756454 2.6122107708420255, 57.854750184155264 0.27854828557707306 M60.3259730615225 4.71790602356632 C61.5987501786983 2.300839209151037, 62.10364199285903 1.0048224632885137, 63.37256714058448 -0.21989146130844262 M60.076097879888515 4.193013369762047 C61.16541959394812 3.067354877079675, 62.727114079099 1.3043253590358614, 63.487166514800535 -0.3934986783632567 M65.12219387628933 5.2322100724917 C66.30239659451297 2.799977599565531, 67.37904640648433 0.761502337955084, 68.72671317965552 
0.3029149756174221 M64.78814472622102 4.745687124960506 C66.15484341978211 3.376684759111635, 67.0732096483159 2.080054104864959, 68.86624226509421 -0.08860181296582692 M70.50635676905789 3.718463362300271 C71.86991218744134 3.6291806603476133, 72.56147411136503 2.4145581894636736, 73.94298952915015 -0.46216211024539006 M70.41109307362044 4.24462090061369 C71.95683242216171 2.7967594625068632, 73.30082420069581 1.3224896128209231, 74.58792422836888 -0.419296123402728 M75.80269011739406 4.764875336640687 C76.29453464410345 2.747531740019447, 77.81159742447727 2.315747127254188, 79.15011675413427 0.11546953725176368 M75.74399734945733 4.806274262440563 C76.79807116512586 2.9760640011653474, 78.32234134386346 1.465533444010473, 79.68183323824508 0.4018740497949949 M81.13446310232571 3.937887430770543 C81.99490079271949 3.10912057465615, 83.84135676676516 1.0190819947702123, 84.69460622313575 0.12588416586648343 M81.24305825331426 4.3344207990190355 C82.58066978530059 2.5098003610886312, 84.29079976208823 0.9364831235448969, 84.97536140055539 -0.3409466110447454 M85.64426347641871 5.0510997133384965 C86.75647735203347 4.109874768157185, 88.08740020747075 2.1795883581900215, 89.96103787622044 0.18756465674742606 M86.0887755250618 4.892199763568076 C87.19385989368864 3.4489991670746076, 88.35093970094168 1.6755024321737275, 89.86082486142034 0.0051272410470205865 M91.40128832791086 5.137060529822188 C92.68020500257708 3.1654529074144793, 93.43078495070915 2.0942183646073267, 96.15902809932874 0.3328807742139024 M91.21325127250601 5.300359081979545 C92.012548947557 3.8400492494909555, 93.07103963377553 2.638415967344628, 95.6576864452944 -0.23010111699908287 M96.2712791877383 4.225111718210144 C98.26520220059713 2.9963325739213125, 99.09012389732493 1.3548952590822787, 101.19477928028905 0.6174913497294242 M96.55131226084694 4.301583745530559 C97.69375421173844 3.4627999310720625, 98.9226360300207 2.6034795421865886, 100.58908139243255 -0.22525337202223594 
M101.20668477295885 5.327470338715669 C103.36456700691004 2.937246299648309, 105.4049611249472 1.7780722332996386, 106.74990180115161 0.09595236441841837 M101.90929637761774 4.6922388278399225 C103.31821544142318 3.277841446186783, 104.5123895625816 1.8433431926146149, 106.09821368617882 -0.24104548689187916 M107.4049774857278 4.274264866880611 C108.71343890640962 2.788070035169462, 109.31707772496281 2.088468924340156, 111.28351870071508 0.25666563182274305 M107.53700306462524 4.6809838255154945 C108.0860312841616 3.12222539537236, 109.12650269543003 2.3610878484570605, 111.37902958909415 0.2627585777747114 M112.2427111712832 4.656354751072115 C113.81741926793998 3.599521478843223, 114.18645041046885 2.1053450067537076, 116.68359045237156 0.059544280967860186 M112.47804475730175 4.643885451655126 C113.14877852810976 3.6926663947829215, 114.2196652425751 2.4064126642711727, 116.38533321795839 0.4845819225117552 M-0.20861296487881575 3.2884879375449243 C-0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243 M-0.20861296487881575 3.2884879375449243 C-0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243 M6.038753206035201 3.1300926818758916 C4.723818621693496 2.1285291867907636, 3.13791404806339 0.7830587097280205, 0.25563286219106074 -1.3614096875796855 M6.510433165062298 3.393394878857277 C4.13005972867639 2.2360394515574655, 2.4373793680172717 0.3336049185055383, 0.8839488849727158 -0.8186739943018944 M12.182263897758466 2.8939493137891605 C11.164632495092897 1.887166621124267, 9.701507768673842 1.8858835342337947, 7.694971991401616 -1.1931350110919174 M11.902747525953245 3.159165903973879 C10.20691216110827 1.9279064921942974, 9.39630670044131 0.5169822903619603, 7.18241497387782 -0.6329279022180436 M18.298998231609943 3.406147212364253 C16.663144277764403 2.210227778792071, 15.85473329376894 1.0317988228814723, 12.871631023446955 
-0.9257426921865394 M17.95483518962901 3.411947750548374 C16.51043599035948 2.0023286122392463, 14.106608229645348 0.15343760856561472, 12.660253699609004 -1.1205416739108462 M24.942657222817417 3.0984803421250495 C23.001491422954494 3.2358929608261473, 21.973631237132544 1.502402049608751, 19.784031145483286 -1.2302833226491294 M24.970303977825445 3.434991334594316 C22.701928544994644 2.0311390018294455, 21.22492413130367 0.5617980190221387, 19.418983300564843 -0.6548053808957868 M30.33339296116418 3.462562536919999 C28.673046880117205 2.606973976570341, 27.57646978371944 1.01041260236581, 25.027391681651093 -1.4145855081947603 M30.290241585838782 3.002524204697969 C28.24634011332845 1.9679953086904836, 26.46690308665172 0.21954108720015225, 25.014776020653983 -1.385157499576927 M36.37713034440046 3.3389572893004233 C35.34497515543884 2.4242901434088577, 34.06926468566778 0.9055546823716396, 31.05246174845061 -1.0282224750220972 M36.70015512577472 3.5681790062426892 C35.093988115850074 2.327015776613409, 34.09976614687362 0.7231080653761941, 31.649008112607138 -1.0386782751959014 M42.54837356039951 2.579493237376753 C41.700813259380475 2.4450540523519404, 40.162638775913145 0.8968843998136137, 37.96443458220428 -1.158395797892505 M42.48194969154591 3.232919805307482 C41.163669496073105 1.8943144784125137, 39.921438145895706 1.143648522031354, 37.940435365331076 -1.024593484069252 M49.171947017619246 3.1230152453869264 C47.07898134002786 1.4929833755779627, 45.235620449808906 0.505715224140042, 43.68473932102242 -1.5690918489526822 M48.89500356528894 3.38562780917768 C47.20569896906054 2.293098779975278, 45.96714488619246 0.7539871622378518, 43.67167303848 -0.8916164542250201 M55.0925247061839 3.757788503474142 C52.90777734311879 1.6673511434334716, 51.75262614980788 -0.3604394416166038, 49.96046866287687 -0.4088626391747965 M54.86109152653531 3.5467736862381507 C54.0346190894434 2.4025511021072354, 52.39318357643779 1.471350594739946, 49.593309715317794 
-0.9384202632149681 M60.18001251591128 3.041108311947066 C58.755724954149606 2.2985288661187715, 57.30069067397136 0.1818664858943746, 55.12986288663623 -1.0210310928783826 M60.81374822214936 3.397384130118046 C58.865076481164905 1.7644968491946185, 57.10797838209198 -0.036615627871190104, 55.454080128964684 -1.3305266562643907 M66.54336268830161 2.9464483196070352 C65.52397321760216 2.13805774372614, 63.76885572567046 0.019868619414590394, 61.25607345255074 -1.6816352800121181 M66.87212334236179 3.5519804580757333 C65.28837370785398 1.977155273453115, 63.23428011052899 -0.10435818156351012, 61.609315610276354 -0.9294397067399345 M73.08496643074318 3.259956174941399 C71.56893728456264 1.50095402520942, 70.26520973868138 1.3593546608890896, 68.05576587875646 -0.4993613500129217 M73.04555899596303 3.02367190228928 C71.67287681224579 2.144145835278109, 70.71462234912832 1.7368549659512011, 68.22245334971038 -0.9284509337488344 M79.5299714681488 3.9272853360573072 C76.89786613779968 2.552731377637286, 75.30481919236661 0.5674010000250406, 73.43743683292284 -1.1014763260008067 M79.43475523295632 3.2443963592468084 C78.12302170314885 2.411650468483611, 77.15245725231453 1.5294228406179258, 73.8969472147686 -0.9758342494470302 M86.10065816168785 3.573837711728748 C83.8670605537796 2.4605049722698684, 81.77089279621147 0.08484396370151315, 80.85779117096259 -1.5480562554746307 M85.32651272125653 3.599282651880395 C83.93709452737835 2.651293425811132, 82.25896523907227 1.3754990242531546, 80.35951983711725 -0.903612303508389 M91.21697506916301 2.866534904387467 C90.39033630334995 2.700039490653049, 89.14235548366706 0.608007862275713, 86.54027657743069 -0.7357441755783627 M91.43327674910019 3.4649271514701327 C89.17151436063847 1.34694036322608, 87.04432828565803 -0.5007749393561702, 85.61119720457579 -1.4261512546175041 M97.56277677582457 4.089885719084785 C95.25614476593243 1.2987704096684802, 93.90450856573715 0.3826807647656454, 91.83232185632738 -0.5889180000106826 
M97.64690208484299 3.502677347149374 C95.63580196991317 1.8707505467866168, 94.33340239070559 0.33113339958950927, 92.35032822612489 -1.303441504266021 M103.95339459500448 4.231803905150515 C102.99990325244588 3.100824167613683, 101.39221605206542 1.9855922357296154, 99.13470753973455 -0.3004256981917798 M103.70434327179738 4.075381375454959 C101.91804560380528 2.1266179395915157, 100.39896224975898 0.6194764690564218, 99.08442408570193 -0.6330353065421019 M110.24664030878459 3.054910243164456 C108.29821828541148 1.9383571245340734, 106.59507396268434 0.5993288313569899, 104.91256962848414 -1.560283208291859 M109.95334586890068 3.6390026072987243 C108.13350928557453 2.2655465992861474, 106.85884151434145 0.7884743144311759, 104.04584948286931 -1.4084321811589153 M115.95380990866516 3.4678346319279383 C114.03705975899312 1.9378389784554808, 112.44010205978047 -0.06739445709964764, 110.2064033328264 -0.4843723502595709 M116.36813769138362 3.784234234939025 C114.92119691659654 2.4273365895296304, 113.28339144448272 1.7972876661618962, 110.85751387348276 -0.7274643264830843" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M-0.48375444673001766 0.6891018953174353 C39.10751305899041 -2.0577524826356917, 78.77392437879776 -1.4275613353082686, 117.16247959148245 0.8817383255809546 M0.2766791684553027 -0.4198594940826297 C26.72960303917632 0.48619844362871045, 52.17863115031482 -0.011690270446378048, 116.42956118092047 0.17714208830147982 M117.33188263682342 0.0957035415666283 C117.32522756774468 1.1666784346092876, 117.14478912220966 2.523287032747493, 117.07989295709652 3.6162418013201902 M117.46280266445187 0.13056837023972614 C117.4218588953406 0.8822536336664173, 117.40823366013377 2.0722114958595848, 117.34782981415454 3.3959186371642907 M116.81081415426092 5.10925062649693 C73.2879641513797 4.294939576213515, 30.23393472039787 2.5209902511288287, -0.5659629013389349 3.8746218895241213 M118.15323926314818 3.3116817231922937 C84.45962233541337 
4.064319416637466, 52.18538845668979 3.6291881946278073, 0.2337651653215289 3.5481276567726923 M-0.2629101199759744 3.794495027187076 C-0.04156435476283983 2.3380613986824215, 0.31137420504295416 2.0230080490371862, 0.1357712119890644 -0.007303334008396556 M0.12877287445269817 3.607455699116977 C-0.09660054197418294 2.41367973642349, 0.07276723417762318 1.3097636034644708, -0.14982422125015224 0.006731646528611218" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(169.48678477545945 146.14008378937473) rotate(0 58.68576547639498 1.7349162106253004)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M1.600209353412395 4.925640894971568 C2.4996527121755583 4.456277959056393, 2.967830380641111 2.682846113139521, 4.894300478468186 0.9177505770546608 M0.884964751466071 5.153460343960454 C2.1698546044026967 3.69367802070304, 2.7244144109636297 2.711799287332657, 4.7659233788535955 0.22677508116115272 M6.123532748137904 4.628484148216262 C8.354590916548995 3.4514149570118997, 8.942760440227394 1.035575179884363, 10.644052286584806 -0.4534633672714469 M6.924411396318418 4.6417104698352665 C7.415978903445314 3.501253833575854, 8.491571892749027 2.4744877060272583, 10.662448480584604 0.14921719693464647 M11.519903444744486 4.577734483819957 C13.060204647208685 3.30187874901186, 13.831667674366798 2.3868677526851956, 16.103307695410628 0.2037305031833222 M11.606128842349934 4.563902296446766 C12.737616624340363 3.5638215369738697, 14.427876080905238 1.9877944929397395, 15.65718615595572 0.07089947535889052 M17.07495634691628 4.495631676883987 C18.15111969612095 3.3530959705670806, 19.931101917383312 2.1274328921517647, 20.906334421129788 0.2660641510203008 M17.250349771565073 4.3116328987009735 C18.082233350705597 3.4421220355014186, 19.356407939698435 2.0721437660035744, 21.338240180721396 -0.011014602410095609 M21.903410554875407 4.548058510900767 C23.708649085582916 3.843049276963196, 24.998852000476425 1.243150709398071, 
25.997734376464397 0.5726625472319067 M22.481098093756263 5.103247828644992 C23.835724825828304 3.035420426365256, 25.58461950574584 1.49314690370492, 26.206880558424057 0.06801192192479394 M27.628791597305987 3.9756141359716315 C29.212425851571183 2.8803806503776643, 29.619470787514757 1.592231305563605, 32.1121659025986 -0.14388297058052968 M27.70903617842794 4.461789473684844 C29.136954840927775 2.834260777937632, 30.09726597161574 1.5812750391795651, 32.06039075231815 -0.3821621579804237 M32.36620919943015 5.200040710589001 C34.22412641093575 3.1248854623226228, 35.15733871643511 1.7149842297248044, 36.7571215372545 0.648911943575414 M32.817367622872545 4.868298698367635 C34.33153714180399 3.268720202958304, 35.504008892248606 1.6150841530725868, 36.577970453446106 0.04694542867136209 M38.202079514932024 4.055296759854199 C39.96171545447436 3.4371321814791522, 41.4957176056074 1.205099804202856, 42.29265024661623 0.11449800230106177 M38.444684484094516 4.6397632025993785 C40.130253254128306 2.6491475582245667, 41.063499040373976 1.4232199924949338, 42.70358281256237 -0.3596112119638377 M43.97690289403373 4.693983817259651 C45.23363788083089 2.943335015886341, 46.653011927149585 2.009448658488687, 46.914135193405926 0.7537189265888379 M43.67087004688629 4.782419163427186 C44.604316200564426 3.815750586341456, 45.623378084249424 2.291582217500578, 47.63185720820937 0.11309550341108904 M49.16498611541939 3.8184197829066897 C50.64657164281581 2.4403816328467243, 51.917143070904395 1.837816577819968, 52.7320159688077 0.0358848040400751 M49.099946055752305 4.069699593257078 C50.7641190199526 2.4248658076250478, 51.97558798049119 0.7713000383850823, 53.39093647864154 -0.41218500654783763 M53.85722015431113 4.387065565287731 C54.952348846515896 3.0514489634672155, 56.287268948177115 1.7590986957252932, 57.95456636054368 -0.09480126935280919 M53.945975482985 4.695535981636159 C55.54075642021339 3.2643174621740165, 56.49719307449849 2.002531628890043, 58.22837968990133 
0.3484201539564871 M60.24080241439308 4.645598439504101 C60.274578298361206 2.752303839085699, 61.427027956458815 1.9323072591223405, 63.76761658479799 0.01183016901151035 M59.71590976058881 4.352306653470441 C61.591244526960445 2.8582941811977594, 62.83032740102157 0.5958345727878738, 63.594009367743176 0.017855716080076467 M65.38201449588736 4.633338689048225 C65.44942606869968 3.2900244590370247, 65.91284721284035 2.7715229803544377, 68.91733105429276 0.018341590125454776 M64.89549154835616 4.884138410807038 C65.71208949650332 3.641495164593852, 66.60598040553066 2.6373584976614524, 68.52581426570951 -0.08856011187996143 M69.9059444274781 4.8143229640214384 C71.94894372402597 3.3249224473334036, 72.8711630386861 1.186872726275202, 74.18993061021212 0.030761831590371047 M70.43210196579152 4.364854891051972 C71.33507241408704 3.449098518360727, 72.2145093944108 2.6553071875734577, 74.23279659705477 -0.18518728626759184 M75.57926443438743 4.183730922916145 C75.8284375202324 3.4818402294861497, 77.65877615314707 2.493359771875505, 79.39447029027818 0.5050127393986514 M75.6206633601873 4.5512782087523345 C76.63707549791569 3.225755614063506, 77.97476618650597 2.137373413665082, 79.6808748028214 0.337237713714265 M80.78995317029944 3.6989585908336906 C83.04275991102023 2.478723941764963, 84.02938960796024 0.7102348606103679, 85.44256156067506 0.2779825261516463 M81.18648653854794 4.16117731381416 C82.11269008280131 3.181331226568118, 83.29599513562587 1.9138363135527499, 84.97573078376384 -0.5104497557783532 M86.5300734854363 4.386045001266483 C87.6134446301795 3.6969214517081457, 87.71010486125863 2.22273607931836, 90.1311500841249 0.0739524128417628 M86.37117353566589 4.526569633284542 C87.33788147132714 3.292293346481812, 87.97350677711567 2.0108767122282525, 89.94871266842449 -0.16254086144835142 M91.2429423344889 5.617612057821529 C92.17594571106946 3.3282348097590164, 94.00999803784863 1.4021832535391994, 96.31414284337357 -0.7233094372982467 M91.40624088664626 
4.802023245617451 C92.61334169360515 3.1288412926863653, 94.08205321960232 1.7274729894301026, 95.75116095216059 -0.3221801792932979 M96.36867016465902 4.98495865847425 C97.93131117378991 3.002896295768975, 99.08384639685221 1.8067984562789157, 101.22566145145798 -0.2692241551701969 M96.44514219197944 4.566189442216171 C97.80307253708999 3.6102463171145533, 99.13552780442272 2.0605990411503643, 100.38291672970632 0.2003121704447025 M102.09793681773344 4.954923653042373 C102.67273042491709 4.143242270980425, 104.47352396520175 2.8666546283954575, 106.74179910792914 -0.42915217709937525 M101.4627053068577 5.194662169877868 C102.68543910674988 3.7611716153594172, 103.88395548821825 2.1513159335920458, 106.40480125661884 -0.352432776486738 M107.08240798768057 4.722024628914431 C108.03268869736583 2.899017961483962, 109.769056959808 2.034212281897078, 111.52942040790238 -0.22202673026224173 M107.48912694631545 4.252876893359707 C108.29302192799366 2.8823694334387504, 109.89475711485191 1.8875036342154852, 111.53551335385434 -0.10943268279408613 M112.09140590444096 5.399819009012765 C113.69012623719256 3.0504545325059396, 114.8504622991196 1.9039589515862847, 115.95920708961638 0.15024072570279118 M112.07893660502397 4.694266851532495 C113.62135276790966 3.216944483167724, 114.8247503613836 2.002361974475887, 116.38424473116028 0.14223569105863448 M-0.20861296487881575 3.2884879375449243 C-0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243 M-0.20861296487881575 3.2884879375449243 C-0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243 M5.769318972466593 3.2702040915623973 C4.184777922353994 1.396157826059134, 2.189807550405893 -0.02138521755241296, 0.5872627443851625 -0.8553662960911931 M6.032621169447978 3.2305545800146382 C4.690165310993749 1.7985076575885242, 2.5465972004025397 0.5482688136237224, 1.1299984376629535 
-1.360716083834955 M11.570852246162039 3.4421151026631907 C10.02046709582907 1.6947003664883418, 9.395648274614883 1.0578656749089206, 6.891864613887372 -0.7893550011218486 M11.836068836346756 2.91213990920365 C10.10901790033319 1.983947643612705, 8.170510535380371 0.08359337511181311, 7.4520717227612465 -1.032002550352454 M18.21937733775157 3.025979774833199 C15.97550569432731 1.6562699927552875, 13.824140877206172 -0.07223045443585918, 13.196933574574926 -0.6462971939361146 M18.225177875935692 3.723758460550111 C16.491388986907143 1.4632173624042502, 14.297344688625898 -0.5106385792374468, 13.00213459285062 -1.387944007956465 M24.04803766052681 3.317589012143663 C23.642538655401196 2.1875608902896904, 21.40878421521937 1.360553725463256, 19.028720137126776 -0.6479676998202287 M24.384548652996074 3.5397064960855014 C22.85887406154602 2.506964644668715, 21.33917742461837 1.017650538708021, 19.604198078880117 -1.252241324212455 M30.449796497103932 2.7776458847397443 C28.791380565245685 1.2101032389266233, 26.345564151176525 0.22731805420559836, 24.88209459336332 -1.0503342471765436 M29.989758164881902 3.067311195150527 C28.564001199204174 1.410206053770608, 26.452608627412815 -0.007600197622174565, 24.911522601981154 -1.5309302557474085 M36.4625184424988 3.507840594157141 C34.5563392648646 1.713086939149577, 32.11460135782348 -0.3032975366332665, 31.40478481955043 -1.24125741571935 M36.69174015944107 3.3693391142613556 C35.55565107391157 2.8366381304515156, 34.07285692372891 1.8496830762740162, 31.394329019376624 -1.2531876054201472 M41.74073103235731 3.6670221124743865 C40.75125567842124 1.7044932090319929, 38.38740745931107 0.2806760719910003, 37.410938689694454 -0.9586231393209041 M42.39415760028804 3.2208826966551882 C40.904829251348175 2.147809755927324, 39.97401301629805 1.1639203693127738, 37.54474100351771 -0.5871325770793794 M48.42058023338192 3.696973325462608 C47.20743296118572 2.638254160716362, 46.53880227384477 1.334882841310642, 43.03791928041646 
-1.419271939357729 M48.68319279717267 3.268040068924023 C47.696808646443124 2.5318162612500106, 46.308535986532945 1.8244455477402453, 43.71539467514412 -0.904493040375398 M55.19168068448358 3.5160170072005648 C52.70738902822011 2.5763432604514342, 50.36619804175621 0.23877171357205373, 50.33447568320879 -1.1159841971034183 M54.98066586724759 4.019774208661686 C53.65448215802237 2.571367959148274, 52.58439588334864 0.9931736485563009, 49.80491805916862 -0.7920542402220889 M60.51267713473868 3.0949237294975704 C59.29056849748488 1.6528414444584443, 56.73898642125805 -0.542769631205199, 55.75998387128738 -0.8509809972978644 M60.868952952909666 3.2849426608547603 C59.77949116774831 2.5196414413665087, 58.564689046673614 1.3714526770553275, 55.450488307901374 -1.1113212032969557 M66.55434433541309 3.8415360376393357 C65.47262231273247 2.200696441190053, 63.007102645551214 -0.575060617394751, 61.23570687716809 -0.5965955759053698 M67.1598764738818 3.5833920601444254 C65.97158156869708 2.2963652482999777, 64.27829071894915 1.0028055170391457, 61.987902450440274 -1.064078154333832 M72.90552883252963 3.227652808503845 C71.48233256859272 2.4956018953974444, 71.5953711423136 1.3735826269863889, 68.55430800018173 -1.372618030754827 M72.66924455987751 2.9495276180483785 C71.17319697086066 1.7386277578385914, 70.17271894139819 0.7228957813578295, 68.12521841644582 -0.7559145470130972 M79.70918518665997 2.7526915884638465 C78.20308998082545 1.5476176120308165, 76.1717796582987 0.3964972930497131, 73.98986966597602 -1.8161920163528285 M79.02629620984948 3.4188919688171024 C77.96795000450798 2.5260890498017368, 76.82086420246199 1.298975343356351, 74.11551174252979 -1.047514688990009 M85.49206475534587 3.761520178496477 C84.05461121671922 1.9207630736317183, 81.42422105136544 1.1088957250737872, 79.67961692951663 -0.9968650933429033 M85.51750969549752 3.514226238635939 C84.66352026955705 2.292920472446747, 83.52845068743653 1.6176655507167197, 80.32406088148286 -1.042859730923492 
M90.82243858978676 3.796138933111601 C90.7756099835855 3.0097791757950727, 88.82075325706427 1.9811235184151597, 86.52960565119507 -0.6320244950082967 M91.42083083686943 3.2882978604609745 C89.97012087253111 2.219387551238216, 88.71340311853322 0.8182630498519998, 85.83919857215594 -0.9721367934093692 M98.1821165974985 3.2424177413449162 C95.94435540603256 2.9065816063447434, 95.610040228169 0.8443813930405555, 92.8127590197772 -0.9066011135633121 M97.5949082255631 3.2070960470583763 C96.17131722752133 2.5767519469334967, 94.92409057719877 1.4855227181782171, 92.09823551552186 -1.2706750230564747 M104.4603619765787 4.054279874010604 C102.59275934310172 2.1196036052910467, 100.75281098013083 1.0159369838302492, 99.23757851461053 -0.415597459443766 M104.30393944688313 3.582348939847793 C102.0782023302114 2.2962376561768787, 100.36874547619048 0.9100141827075392, 98.90496890626021 -0.6140021378201579 M109.32114495637481 3.868754070672848 C108.43352454895503 2.9113453217515817, 107.37968253777 1.6594092754330125, 104.01539764629263 -1.3490355743354856 M109.90523732050907 3.2912631958738783 C108.7348073136849 2.556444434534085, 107.41753450708173 1.1693861064687556, 104.16724867342558 -1.048104840030774 M115.87039653815273 3.4210266266173432 C114.559126491048 2.364249533864955, 112.67933300867229 0.5771727986471796, 111.22763569733937 -1.3769049442027494 M116.18679614116382 3.746263574777798 C114.45999141945708 2.2125451402883427, 113.54687817371676 1.407346669528906, 110.98454372111586 -0.8696892294221357" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M0.6891018953174353 -0.3624111320823431 C29.123832458821493 0.11797099674185574, 60.82561281175426 0.07484363878210842, 118.25326927837091 -1.609285881742835 M-0.4198594940826297 0.9623611373826861 C24.554797365023767 0.31940477088411257, 48.47724341632987 -1.266712347712918, 117.54867304109143 0.8257444007322192 M117.46723449435657 -0.029230690336674026 C117.17989395992824 0.6605536434174992, 
117.20903336235858 1.4488075628849448, 117.51794033285954 3.442342157173142 M117.50209932302968 0.06218988222819888 C117.24438110393012 0.9708949766210452, 117.42087950382557 1.820959520563945, 117.29761716870364 3.6062395629544333 M119.01094915803628 3.826017866812606 C73.09245945795222 4.882679737431558, 25.61179591448436 2.57480857454017, 0.40478946827352047 3.062255452357192 M117.21338025473165 3.377486448705156 C71.65823756449531 4.840318226265352, 25.14225369536493 4.621100598024767, 0.07829523552209139 2.859146308124025 M0.3246626059364753 3.413369760381883 C-0.3040527352174651 2.8743416548712752, 0.2099457940590694 1.8216703026068612, -0.007303334008396556 -0.004261920248263329 M0.13762327786637604 3.341924582688068 C-0.035764013952607664 2.783415598348192, -0.10299234078470978 1.8642546790170842, 0.006731646528611218 0.16699495353629376" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(168.4605248514622 154.89008378937473) rotate(0 58.68576547639498 1.7349162106253004)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M0.5412350933206136 4.7791320045811725 C2.3898014437187407 2.966051119073953, 3.464688773789443 1.825541752612348, 4.706651329765565 0.5056935528141671 M1.2181881390650624 4.779603134874035 C2.43581630537634 3.009125923208155, 3.836426814817765 1.4577501424243429, 4.701629657535351 0.5045533281599396 M6.809685784493876 4.721702087354674 C7.879022374932515 2.603081614200287, 8.52316913774228 1.644407387826863, 10.702317787845564 0.48054799075124266 M6.834508086377126 4.71903629373835 C7.721436709732617 3.3736898937778896, 9.309608019076641 2.0287713165991708, 10.645874619343264 -0.08763878381925208 M11.234539490438836 4.574296195846554 C13.115829181364996 3.526776647272496, 14.458151176432029 0.8230546889325857, 15.86994512202539 -0.04104937567142941 M11.660551837798046 4.8163155722846644 C12.487374282778353 3.752565205601494, 13.392996188337959 2.668961024794281, 15.639671876470972 0.03911446704761884 
M16.766470177339635 4.359740042658141 C18.839988659547917 3.418708468019949, 18.756432138859566 2.5842118717236175, 21.465090110989177 -0.4380306463525173 M17.519970044082278 4.551807576668998 C18.543397651750066 3.365000376196681, 19.62901589142837 1.7996341832477132, 21.346418173422926 -0.051128906833427856 M21.89148600791839 4.938252348781855 C23.374262206386273 3.271043435307007, 24.197033057551764 1.9383919133091316, 26.359893729959044 0.7668150872003973 M22.437079583772245 4.885752390188458 C23.862900221822866 3.316775258471505, 25.065433097278024 1.7814858678869565, 26.539551482491554 0.026187584084713666 M27.510651573359226 4.838518201819029 C29.37905051491272 2.7665784201635066, 30.615204435846298 1.7553708095530443, 32.12280338261912 -0.2620848999113555 M28.21971079512239 4.23621193149906 C29.19113183712907 3.094281642754589, 30.633783126979917 1.4775519136207895, 31.985398235681302 0.0540672647080605 M32.72286566138427 4.9641156977620815 C34.27174061885099 3.11248366643923, 35.772019718843154 0.9687285867960934, 37.001350669191524 -0.16666032419535315 M33.04511024417515 5.032371596397916 C34.19869828497451 3.453783931389634, 35.27943103165267 1.6641753343290475, 36.9290760927846 0.06726193708372322 M38.104583488177646 4.224589299212338 C39.496477880137626 3.440279041807342, 39.98259161161114 2.044661422199152, 42.20187361836039 -0.5641245355502719 M38.44278795668497 4.2924886160410365 C39.56571985398436 3.1537271460887446, 40.65411393726129 1.7491488162795363, 42.777341173305885 -0.36977976138228713 M43.892594466461475 4.241876477162224 C44.764739915661096 3.6190170682361127, 46.79416326710375 1.860311329015238, 47.07229445534784 -0.34054787548368703 M43.68042018619427 4.977193698725534 C44.75420053906599 3.722724686005822, 45.66487112322653 2.2614082215966897, 47.784490824537606 0.4903892838571416 M49.66374443741989 3.781381069213071 C50.71705896196027 3.255879155202149, 52.2699208075994 1.5014557404792517, 52.98184183383799 -0.6197843188942663 
M49.1466966459575 4.167439388107427 C50.31341348173052 3.3287606718648393, 50.975084084209925 2.600417922186099, 53.13521085348418 -0.14383697729197065 M53.717713438775014 4.6583999213572955 C55.75700717228737 3.504420220233135, 55.83503755251824 2.724595792616348, 58.50382860405297 0.542955167710408 M54.337223277017095 4.417801704983822 C55.65841071059085 3.1989803747236047, 56.74681797904691 1.8720154776262194, 58.23047829493807 0.2501838822174811 M59.956737936710965 3.7691962461962247 C60.79266405300983 3.051815774699893, 62.77658578550891 0.6130392793738813, 63.29537404467691 -0.40782401556833886 M60.037369480127495 4.334428505510732 C61.47020517955698 2.8483928819463125, 63.07484578603989 0.7142517797391535, 63.77040864340369 -0.21270887520849013 M65.26838859394988 4.997426264722282 C65.64177334981532 3.120515843807354, 67.17531505830993 0.9832617012839308, 69.28018163093193 -0.021473460238085884 M65.10817088383114 4.528341122985268 C66.404942240883 3.351464196657714, 67.44692231922889 1.6793869049054375, 68.8401586211735 0.2278224899772241 M70.00529250167587 4.045212564536804 C72.26523662688743 2.0733857454730464, 74.00848594413058 0.5193224398104023, 74.92826126650216 -0.010128391197486097 M70.46917657638245 4.193382726063454 C71.3525998728902 3.226941002125219, 72.58540991236679 2.093245371975073, 74.59277341025191 -0.344437447319342 M74.96791094761107 5.139570741632194 C76.41045764874487 3.9386790817256996, 77.35802333278049 3.2194993662211564, 78.83870705160973 0.4577189750458529 M75.28435187382388 4.659519960542381 C76.41594651460606 3.738414328888021, 77.42825545762763 2.4887885473265663, 79.53508942043138 -0.0689556350511083 M81.0842616479396 4.663482758609798 C82.80664846378282 2.1344052792184054, 83.63697519278213 1.3329772887678044, 84.81941951493874 -0.21170721140671844 M80.94492555190988 4.170791307578083 C81.65366590757898 3.078430232800901, 82.46776986888483 2.270408312706995, 85.01058152121634 -0.030502185812036953 M86.39269651538359 
4.262323586462025 C87.14351506104948 2.51519547935812, 89.39665296749476 1.2566329416253768, 89.86988338171133 0.46896020793720583 M86.05529261418285 4.438608725467656 C87.34943198600303 3.073371889882478, 88.78353468027099 1.4348634092081944, 89.88526580989921 0.19415126494496615 M91.06342643924144 4.80583256770827 C92.32270853010913 3.556114616843055, 93.34213920137518 2.2370834499933343, 95.19257000614003 -0.7657355376715148 M90.94003701080655 4.974732586964668 C92.65261599068532 3.37480967596482, 94.36100715833608 1.0239265487729021, 95.39050321699165 -0.63171451890006 M96.47228631289244 5.040050033586783 C98.0167425974201 2.8222976620836366, 99.28966172882186 1.787194820366464, 100.5173141368618 -0.10244702718901094 M96.60895697209551 4.6694404255736 C97.36502572907686 3.730006290453081, 98.18362436941412 2.3731915319888692, 100.60912040676453 -0.12199255222203789 M101.1647467968146 5.3394238794208135 C103.06906474542387 2.989504841447742, 103.9150183318933 2.0646758624610486, 106.06399477197733 -0.722252906684069 M101.96179521992791 4.8638559815925175 C102.74939344225527 3.8654657885878034, 103.8470403445367 2.182650050732702, 106.36730779375826 0.000621480250494022 M106.88934635576261 4.643990317381457 C108.08883252448608 3.0525968219825046, 109.31255905607033 1.5849708986740145, 110.97772783268609 -0.5743249959578136 M107.4317463147494 4.632410568672702 C108.02002439177575 3.920335393997924, 108.8526598882165 2.4548209399295433, 111.47352332821481 -0.22996032873963512 M111.79300770178327 5.020178519196099 C112.74723441533912 4.109784694205149, 114.02480878773547 2.9279507301419723, 115.87249654717597 0.14564658445790285 M112.33440339701139 5.131782876241243 C113.65236178837512 3.326162358894096, 115.22482954607061 1.3375563335800293, 116.2399330616914 0.6736387587666499 M-0.20861296487881575 3.2884879375449243 C-0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243 M-0.20861296487881575 
3.2884879375449243 C-0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243 M5.764510680673723 3.569150957955487 C4.645597906126344 2.2357910605928373, 1.8800082678547427 -0.07702992789081109, 1.540147266465584 -1.1666009520674885 M6.322265701353629 3.5617541976985256 C4.829078895484 2.4911323441748117, 2.747955767634416 0.6326476653488589, 0.7732146229346936 -0.8722368316753597 M11.25267235862885 3.380763988151701 C10.248000199553816 2.117767066801332, 9.38960565163392 0.5057643077384779, 7.545937572118345 -1.233588233858436 M11.942999041572701 3.340076370771529 C10.803418122606299 2.197962384843349, 9.363961661991647 1.0288137184473378, 7.489536576882455 -0.9660292655118702 M17.785687400756064 3.979031064749237 C17.104162931928656 1.7159462317678331, 14.804428792810555 0.5724378237409167, 12.875800953405836 -1.2390536470925857 M18.13305198776688 3.1150106638358053 C17.1040618433583 2.402512978038966, 15.865606889250234 1.148048589748175, 12.970864903946334 -0.9070971067185004 M24.21604307036358 4.306727242825082 C23.831397093352102 3.010408846124513, 21.75239154231488 2.3917288161096746, 19.604830348141522 -0.8319718069378815 M24.527026598979486 3.5926370612638956 C23.238438787005876 2.754070869715282, 22.202415337526766 1.547650488617991, 19.143132504889774 -0.848563412156009 M30.73455157506204 2.990014000522482 C27.711998472707197 2.105882355158652, 26.281001335990947 0.3303286679121743, 24.80341211410097 -0.9322907201694406 M30.151475870310502 2.942669939116153 C28.442457441488084 1.7273913702590287, 26.2878205646037 -0.31441156021910377, 25.06618574007272 -1.2865914378529495 M37.220581416117305 4.165640133127135 C35.98387183482342 2.5108470965929954, 34.27503095234932 1.3051608425439767, 31.076403033800037 -0.35843737275155374 M36.57017534648283 3.414252807033744 C35.6782932300361 2.3515169377084293, 34.26486371060253 1.9582209649595241, 31.340274763593065 -0.9061685335431445 
M42.529352177844146 3.5983359073256898 C40.89856641841421 1.172447737155778, 38.71358012265343 0.01706323133260429, 38.24645257352357 -0.6679368240228235 M42.510008761171946 2.8722697235612014 C41.12339709752043 2.0609760115899367, 39.75713575435648 0.8491496781510808, 37.90688303888281 -0.6289434681115984 M49.114074305938416 3.949349700064926 C46.56552558301061 1.3952897909975293, 44.875002292309695 0.8920496951163146, 43.70705965805718 -0.7441028439607338 M48.93730080144466 3.2155142818685305 C47.317440354416135 1.8383286413923878, 45.88509539748569 0.9074150228486848, 43.53912200123923 -1.2627351035997136 M55.083430772817664 2.984530981747951 C53.32607138632413 3.040513609163299, 52.987053924319014 1.7993814905820886, 49.31822034322867 -0.7775003777379168 M54.767850300886636 3.6995703992215505 C53.113470201527505 2.1648869347276585, 50.7684596839957 -0.191530907134363, 49.57364045476751 -1.193296286280267 M60.83150709138942 3.135761217075962 C59.46660591945784 1.9469178121251616, 57.483797149627016 -0.6970431796661916, 55.99819182247547 -1.9149691971279195 M60.43493372558375 3.042820411549512 C59.07493186696795 2.168639708500107, 57.484267821128476 0.5724471926261987, 55.456751854547015 -1.1636610715444646 M67.42370691916955 3.8428165450846716 C65.94931437387267 2.61496525907428, 63.6707665947877 0.7232855045712855, 62.17849103456345 -1.2625339443450443 M67.11165279213388 3.278050406618596 C66.07282919213665 2.1857174166660185, 64.63331915322146 1.2445125003096342, 62.15272002655064 -1.354861731001141 M72.58718758330211 2.5772113467023168 C71.0910221085163 1.5722098563338647, 68.83506580243423 0.46582905939618313, 68.81003844437461 -0.20834639593770476 M73.08541987466157 2.9481475589253394 C71.31793348222585 2.3074085313639907, 70.50510097135341 1.095052856269266, 68.21310880004701 -0.7992077340353063 M79.88161192599503 2.963826031150881 C77.96717670768673 2.0504188613544225, 76.8528077852396 0.5467352059869754, 73.48628560590312 -0.7223557952539303 
M79.26421592638647 3.1415289262837054 C77.56115155882736 1.7584043998901109, 75.27108288267448 -0.292450474865476, 73.6727807003235 -1.1176530286883235 M85.15940583565383 3.9828077240582678 C84.53316630533732 2.5706750653109465, 83.7463649576255 1.9114712433861807, 80.6987212582682 -1.01181192659581 M85.77677335849143 3.8305026175655543 C83.95630038128927 1.7598071274131228, 82.10079573182306 0.24133949168315938, 80.2205982852452 -0.7858427189830122 M91.59291239485668 2.937183463384085 C89.94366393839859 2.4508979779641553, 87.55742815984387 0.9135255193331335, 86.06877690881016 -0.6962481570270727 M91.15042984036322 2.961795508998134 C89.15630973098664 1.891558882852161, 87.66041335156943 -0.14359300523982305, 86.10675948780487 -1.2391775035605648 M97.91169541736946 3.3416317319140307 C95.49218946488479 2.0696940967731483, 94.43869048126861 0.6191833762255833, 92.23275465254488 -1.0974117850120195 M97.40355076047473 3.3908987664726102 C96.80587000430478 2.855954844592082, 95.45112446706774 1.6218024119980632, 92.14547930585036 -1.3656720759048135 M104.14339700753973 4.438259394212287 C101.51717560173175 2.209979649798188, 100.0448151296213 1.0340402507068345, 99.25315987506985 -1.4877467272364693 M104.08973800100559 3.5643549845336335 C102.19909478690894 1.8503158479906923, 100.1752362781833 0.43224536683387615, 98.79714673093625 -0.7982272008461095 M109.480991300965 3.0116537835930353 C108.05322830218664 1.9572440335076786, 106.082227774392 0.9941373532204522, 104.58595499323268 -0.7753414316362429 M109.4353961785103 3.5894977733508573 C107.98926338931805 1.9115361097006265, 106.44065432985599 0.6612125582705999, 104.15589098899211 -1.1948901275898067 M116.45003820516114 4.303934789091976 C114.68081168289665 1.997124654739301, 113.76310261068413 1.4123410255586692, 110.74781319579918 -1.2888767904256269 M116.02729888458681 3.9541913456904454 C114.62279345163391 2.0058469454812387, 113.16898797309653 1.24124329562122, 110.68714778099087 -0.8691150027447179" 
stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M-1.221372226253152 -0.8587334658950567 C34.49522458871958 -1.6073016045615034, 70.70805820483977 -0.41734523754566477, 115.62666776117044 1.7088773343712091 M-0.012282784096896648 -0.9022711412981153 C42.82799382322792 0.7669470785154882, 85.50976689814307 -0.712552112530273, 117.25933603189992 0.23897371720522642 M117.70552085986255 -0.20255779791857156 C117.66186586536547 1.1778224326527134, 117.36872272844138 1.7096371458044965, 117.37118594234201 3.4710287106284503 M117.28380397201082 -0.08560229580788073 C117.46901262511183 1.0807296373240165, 117.19311787712661 2.4639649785923092, 117.20681725233472 3.348807800245013 M116.8414075010653 1.880694220744033 C92.34496209574054 2.5974649488085273, 71.73037360875074 4.011813253080225, 1.8388225641101599 5.273824582777877 M118.3009362890582 2.794819425761659 C79.64044344111988 3.96190058488064, 40.461644814678166 4.172115825597574, 0.7475630389526486 3.085285704553087 M-0.1542135723269599 3.60495917760911 C0.11780939412302584 2.699341419649306, -0.32461275187714356 1.7812937266881816, 0.19740933463365734 0.29219686287288 M-0.12114588461447473 3.345415729578865 C-0.1104389712543098 2.714665389973682, 0.15513103312274523 1.9415412849800786, -0.12194775469338784 0.03063584712024159" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(170.48678477545945 163.64008378937473) rotate(0 58.68576547639498 1.7349162106253004)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M0.9414283818703943 4.909363591052372 C2.0867767494804412 3.0483022286944785, 3.9054874367785164 1.39219249789145, 5.132601585383064 0.7504824651253633 M0.941899512163257 4.851293957825531 C2.13747847664476 3.2865365175785097, 3.548668168931442 1.5682554237899962, 5.131461360728836 0.31069818090994294 M6.921675106426072 4.55826107641758 C7.22535172318142 2.767375261341396, 8.690514728816776 1.9356835629645026, 11.145132665102315 -0.002511948618316495 
M6.919009312809748 4.277737422809764 C7.775958537057377 3.7054451528894248, 8.630874640324684 2.6108690907526424, 10.576945890531821 0.25369275902283794 M11.401177247486848 4.74546666037714 C13.46410150514223 3.043933913825917, 13.87002794273532 2.107811081772535, 15.25044333124854 -0.011991964939517763 M11.643196623924958 4.662786704436447 C12.529008245962126 3.636051598927434, 13.39715507378167 2.9279077794388897, 15.33060717396759 0.17685992623912 M17.224297736080608 4.953828028188637 C18.282883146037037 2.921411647881903, 19.4513600447251 2.7013432323339277, 20.89113870234963 -0.04647708608065815 M17.416365270091468 4.445762266283473 C18.461988005120762 3.325509584217287, 19.1278663290617 2.5508525985427313, 21.27804044186872 -0.1126087601133165 M22.429718074773227 4.7383514985007364 C23.06138920446536 3.3142058168305266, 24.021712217776745 2.658074291079943, 26.72289246847144 0.6789218191069305 M22.377218116179826 4.912850236569824 C23.440392262797875 3.5375694672478697, 24.53500778740666 2.8433255665796677, 25.982264965355757 0.2487189219345094 M28.36766056959257 4.71977538157102 C28.725255359644194 3.549846117693245, 30.147197339287505 2.0666078079669203, 31.731669123141863 0.09607587338082835 M27.7653542992726 4.393800766373392 C29.6986347537821 2.694860416067989, 31.15126342661289 0.8952531720203889, 32.04782128776128 0.07721421721048999 M33.12016609810452 4.653496621048162 C33.69407979000776 3.83349364660361, 33.978326485490356 3.0900841628823636, 36.454001731426764 0.35276483288970795 M33.18842199674036 4.833979671413163 C34.532874700429275 2.945456241938951, 35.66217860494218 1.3997908318822485, 36.68792399270584 0.014239035183610582 M38.418316341336954 4.4231441521897334 C39.65729633016644 2.8867301274855204, 40.289713796847984 1.721024043451135, 42.09421416185402 -0.193749554608857 M38.48621565816565 4.191066790267023 C39.52832024221131 3.1612786773398955, 40.302803664875704 2.681001565203191, 42.288558936022 0.022844308181757367 M43.06251155185574 
4.50950554364441 C44.849037407479244 4.194163274378626, 45.50151069693447 2.4307227822264177, 46.9446988544895 0.09220309203380805 M43.79782877341905 4.807107626153958 C44.71716145596336 3.5571789666491553, 45.42951920580623 2.911993216127355, 47.775636013830336 0.2598216331739174 M48.63969278568876 4.584100355669775 C50.32921060113907 3.9614427817511686, 50.78612666432129 2.32636610732852, 52.7031390528611 -0.6817248544247142 M49.025751104583115 4.605211705151865 C50.31384764265893 3.1012641188940315, 51.71375770149806 1.6940805759947242, 53.17908639446339 -0.22759837001201433 M54.14361967040188 5.178106781914495 C55.5800775001887 2.671835610811852, 57.38592104247975 2.0990914110876537, 58.49278657203467 0.3330276121642435 M53.903021454028405 4.861604217979721 C55.02869231648769 3.6503130050243238, 56.0440424111871 2.779954120484246, 58.20001528654174 0.3703055830929422 M59.29209263702298 3.8759350076361443 C61.62596585238128 2.8057884565535014, 62.23493439271071 0.5765589550175122, 63.5796840305381 0.3195034895582566 M59.85732489633749 4.4023243782154164 C61.14564520110887 3.283647800015878, 61.78582871525072 1.942478343453303, 63.77479917089794 -0.0775562001833251 M65.14723068811794 4.184153851483399 C66.3723431316746 2.550864891249585, 67.34270002225676 1.5904660180387409, 68.59294261843725 0.2641544089059963 M64.67814554638093 4.975277064077939 C65.89340250925079 3.6299132247058123, 66.61742330809791 2.5231326162464804, 68.84223856865256 0.42896725212816744 M70.23269362971463 4.525064123713845 C71.13211731676772 3.4964295735959476, 72.44802229733097 2.3155428360852492, 74.64196432926002 -0.6495860070349408 M70.38086379124128 4.246931176645184 C71.60101492337965 3.2591152411921462, 72.6549323530431 1.853325278563798, 74.30765527313817 -0.4820340651446713 M75.95395983937892 4.705625267976136 C76.71188471046169 3.7137687243763935, 77.94996242309551 1.9072500824967333, 79.73671972807226 -0.3723755021159121 M75.47390905828911 4.774371281636119 C76.99071379674261 
3.2768243856131924, 78.18443826621711 1.4589555961789835, 79.2100451179753 0.317520507467114 M81.5155484981387 4.511028719527867 C81.80992616707626 2.559318086914238, 83.83331220557048 1.3493185589348085, 85.10497018340186 -0.7953898285793173 M81.02285704710698 4.0019959676286545 C82.42453237780434 2.2331689523492373, 84.116148610843 0.7156368294117381, 85.28617520899654 -0.04759934770856372 M85.74129735855983 4.173624547017697 C86.7934681555785 3.671737891539838, 88.32878057101458 1.9253556533794678, 90.41254563531467 0.5186038947234997 M85.91758249756546 4.433569487841758 C87.66837936602937 2.75753800495764, 89.14963274025192 1.176803380199743, 90.13773669232243 0.016116716297264716 M90.91171437237497 4.912652856940108 C92.23681050838093 3.38053148131609, 93.49464344228797 1.9373493372602493, 95.21552653148815 -0.03130045823849048 M91.08061439163137 4.985482382531913 C92.647841394351 3.570142782417806, 93.46688773557817 1.548322044958825, 95.34954755025962 -0.10270334134905845 M97.18360848003566 4.472380204233093 C97.88869362816708 2.8510810362277255, 99.78131347149342 1.4617312680531167, 100.50572307453955 -0.4146163849507516 M96.81299887202248 4.375991799882366 C98.33377947495389 2.6661381629189362, 99.4340648378987 1.2101752841211768, 100.48617754950652 -0.09665525419386681 M102.10989035843859 5.164287031067964 C102.6314271569429 3.1128495475329956, 104.58057028838115 1.1781669994523725, 105.92359383682665 -0.6157935652995986 M101.63432246061029 4.826290961400103 C103.80927620383784 2.6318022602343625, 105.2966849893879 0.8534611177085872, 106.64646822376122 -0.34626184698404217 M107.4521334381814 4.139245755247311 C108.72785632730991 2.475537134083229, 110.12811552705952 0.7868941854116739, 110.69842978012181 -0.5706203255136827 M107.44055368947265 4.373195834945426 C108.46262580593526 3.461082780447329, 108.72856175188255 2.8432511343876303, 111.04279444734 -0.1437690368187944 M112.45522967256494 4.336930121860689 C113.8155157881226 3.348239274658195, 
114.90814646956146 1.8775159427889057, 116.04530939310642 0.7945324935533095 M112.56683402961009 4.79084904799912 C113.87759529807009 3.2543380554942543, 115.00130543856552 1.5619619044025614, 116.57330156741517 0.10477855108231182 M-0.20861296487881575 3.2884879375449243 C-0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243 M-0.20861296487881575 3.2884879375449243 C-0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243 M6.208377248546188 3.8522487296969365 C5.34481313234163 1.9367223464240146, 3.517326736624261 2.149044457492854, 0.7820714798973595 -1.042894386828953 M6.200980488289226 3.8032343659330827 C4.636054252921042 1.6609144146513342, 2.238667082733335 0.10533487904613992, 1.0764356002894884 -1.0924389383784323 M12.057666920524579 2.8732612568433598 C10.402258052001098 2.0006658667580712, 8.460566509253264 0.8249793314138901, 6.851411391120854 -1.1068711637408433 M12.016979303144407 3.2412826985600605 C10.470774654274424 1.7279894074835025, 8.884738488138627 0.9514042657575552, 7.118970359467419 -0.7679101066552995 M18.792261190136557 3.9024182504196725 C16.323086477431463 1.7641498701114129, 14.95193212597685 0.4860808619694872, 12.88362261966888 -1.6794014637014318 M17.928240789223125 3.473645758131992 C16.926872592744214 2.2661328183735026, 15.38622942673137 1.1077703232021063, 13.215579160042966 -1.5052410662295719 M25.256284561226842 4.003732703422657 C22.92042193403976 1.25778604190675, 21.264630996524563 0.46074569986905933, 19.42703165283802 -1.3527928021718345 M24.542194379665656 3.55657856336934 C23.110687321707633 2.2429694581473485, 21.25000815225443 0.42888895815353967, 19.410440047619897 -0.7461682863981061 M29.977247960706414 2.5872454402097933 C28.908581121444545 1.4117186886678192, 26.86990135404948 -0.10181814879875328, 25.36438938138864 -0.7986214769752775 M29.929903899300086 3.4420572952893997 
C28.7021204581903 1.6921894514147278, 26.59298396735589 0.2496893019348847, 25.010088663705133 -1.5485647776378402 M37.289201286325515 4.115394953332406 C34.66112080875867 1.9717934106484278, 32.56364194154894 -0.12121149120452496, 32.074569921820974 -1.6445857143104274 M36.53781396023212 3.697777099275526 C34.70135947827161 1.936567419655694, 33.60631219164353 0.5507582953308177, 31.52683876102938 -0.7754482851904083 M42.75957370230625 3.421543217540328 C40.76547403458837 2.0421140968178073, 40.03717912518207 1.787766157610463, 37.90139766356414 -0.8134954759320557 M42.03350751854176 3.280786721963976 C41.20949017244975 2.290976456430697, 40.02031769232142 1.3216596136651806, 37.94039101947536 -0.7652122851861953 M49.246914688059924 2.887435087153284 C46.542190185238 1.4884501075637193, 45.883507641172095 1.0025568801235214, 43.86290828540841 -1.3750119730066372 M48.513079269863525 3.4179633289683293 C47.349607649566245 2.4799519530941256, 46.543600313500534 1.6027098551040586, 43.344276025769425 -1.4728170062117143 M54.418423162757385 3.0021857622326866 C54.23524297618792 2.762982383547139, 52.789567348772465 1.0815611620502477, 49.96583794464567 -0.22826237748656397 M55.13346258023099 3.7699066303452065 C53.97928328389795 2.2607320512715665, 52.05131756746552 0.9332474235164292, 49.55004203610332 -0.7125636252602561 M60.60733003986758 3.8853072064477985 C59.59891333680909 2.4894254653826975, 57.090027534550025 1.1843711375316188, 54.866045767037846 -1.3372112266979628 M60.51438923434113 3.1396836391665075 C59.75940215117415 1.9981928263437516, 58.294226253928095 1.0799813200831299, 55.617353892621296 -1.2426811877432726 M67.45071256089074 4.032148068599955 C66.06340508906572 2.100559808759122, 64.07136552575867 1.1428343445265974, 61.654808212835164 -1.5560215905895107 M66.88594642242465 3.705878172690601 C64.92586741229286 1.7243156179509043, 63.028851871896975 0.33999230619981125, 61.562480426179064 -0.71061126702741 M72.22278400429055 3.2520488797143483 
C71.34739992079582 1.5375117002964869, 70.37552486088536 1.1902856972542353, 68.84532295425694 -0.989769865943097 M72.59372021651357 2.913949097266087 C71.19049839335213 1.839170729626159, 69.28611970109503 0.11105527942988747, 68.25446161615935 -0.8574211752954652 M78.74572588175354 3.6085844502846025 C78.06012943848432 3.0766315826658257, 76.75386173239474 1.1651977112839977, 74.36899019672289 -0.689052333843825 M78.92342877688637 3.705035406869488 C77.97936430428892 2.344145192128929, 76.43360082611862 1.1020579845149723, 73.9736929632885 -1.4818074454766923 M85.90103476767538 3.752985336698231 C83.95371680284237 2.6861354737422887, 82.68978489434045 1.261004307038347, 80.21586125839545 -0.5468657838254595 M85.74872966118268 3.9331898269444823 C83.352320260449 2.340701248773473, 81.57798413726246 0.3439283139246983, 80.44183046600824 -0.8872451064716458 M90.89308714878338 3.672730258849137 C89.68566799186821 1.1276156396229045, 87.47300566907461 -0.22952369799383066, 86.56910166974636 -0.9197130056870051 M90.91769919439743 3.1207840946006256 C89.55636318090424 1.7752273622524384, 87.17788982624067 -0.025110899685733323, 86.02617232321288 -1.3621995238893547 M97.43386261032775 2.8742030217886447 C96.47169646919774 2.4289817618190543, 95.26935841668345 1.5809726003361648, 92.30426523477587 -1.610096913127136 M97.48312964488633 3.843228198971273 C96.75088388822874 2.6228798350818052, 95.3821081601106 1.3039591407601867, 92.03600494388307 -0.8025985992214661 M104.66681746564046 3.267192374846558 C102.39642390893162 2.297539732580114, 101.23513847020983 1.3663964060524152, 98.05025748556585 -0.3889904755660475 M103.79291305596182 3.827432776593264 C102.46545458103559 2.6752580226518665, 101.50725870341593 1.8018268866326959, 98.7397770119562 -0.8919320865561404 M109.27788849680339 3.236194098602689 C107.80136095025306 1.2142040802170229, 106.3913037558402 0.6361543552841995, 104.80033942294824 -1.024534631173938 M109.85573248656121 3.443423698378436 
C107.96520933817462 2.086363019065811, 106.49510937610354 0.17119873059535545, 104.38079072699469 -1.359637005606547 M116.70649669531677 3.611713091858358 C113.96700687961939 2.629731219698785, 112.90141062520749 0.8126902635013277, 110.42313125717331 -1.0107361998994633 M116.35675325191524 3.745956656585143 C114.15197457159678 2.403687537754133, 113.09438169239156 0.7506239651535789, 110.84289304485422 -0.8894830547393099" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M-0.8587334658950567 0.07686777971684933 C36.439791531592846 2.5881162271270606, 75.58494471816766 -1.0280915513267663, 119.08040828716116 1.2178074326366186 M-0.9022711412981153 0.4697383986786008 C40.54633724309996 0.8076725407825838, 78.87093268041552 -0.09777842915187973, 117.61050466999518 -0.4996967865154147 M117.16897315487138 0.3202997568255426 C117.6831363994378 0.7684579037763417, 117.347387797279 1.4822575573018093, 117.3727272421678 3.616698786657805 M117.28592865698208 0.11500952512992277 C117.26139919894354 0.7047859754394339, 117.45681844842944 1.5666286961342282, 117.25050633178436 3.3002162780156836 M115.78239275228339 2.1797557687088442 C89.2770872658783 4.9905524806122, 63.88708672637262 4.997547979746264, 1.803992161527276 3.9754615043922854 M116.69651795730101 3.917527174547274 C87.40235726329207 4.008368085007249, 56.96996243079436 5.308252732138215, -0.38454671669751406 3.0152707155495477 M0.13512675635850935 3.582503135975769 C0.13830517339502163 2.1505180575789993, 0.10348191241902917 1.688115891091058, 0.29219686287288 -0.2340246469053716 M-0.124416691671736 3.3376634354225616 C0.05095059263493708 2.7795838591617317, 0.08475156711114255 1.7005855052814434, 0.03063584712024159 -0.01960159582430271" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(162.70066630771385 174.65588838707578) rotate(0 66.45875398214207 3.3441116129241664)"><path d="M0.2149947490543127 -1.988468399271369 L131.5594067360321 -0.9043889548629522 
L133.15826777471693 8.405032627136961 L0.10660960339009762 4.840388469250456" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M0.12425346113741398 1.8388225641101599 C33.10555586387472 -0.7117803593490728, 61.82459973694894 -1.5264592071388372, 134.77631863682063 -1.3500259909778833 M0.8836516244336963 0.7475630389526486 C45.44135276627561 0.2795704118378244, 91.50370307041 -0.002841312491170367, 132.47306698384227 0.3894330905750394 M133.0676825379529 0.38051338987149763 C133.50245738924718 1.7725530234427085, 132.52038410362763 4.462851402070203, 132.45048211356456 6.208588185362891 M133.08242596986037 -0.23505855795376052 C132.9965127192642 1.459163530755903, 133.00428722520002 3.029962879813268, 133.12229247144631 6.647240647672041 M131.99705615370067 7.558874551059262 C88.02335998477027 5.998541348679216, 45.21233204398928 7.952573590977342, -0.40627347491681576 7.46131551623489 M131.9509488624567 5.913818913364366 C80.89231842108057 6.773848694427449, 27.995990950590667 6.491206180913884, -0.8490435676649213 6.506168472433046 M0.04053492972923878 6.375700302711297 C0.20353963082480114 5.093501511315473, 0.2596929342667195 2.2340418589457673, 0.3734857793161992 -0.2424742245112672 M-0.1645692205659853 6.370102625237511 C0.21660612868082718 4.709325786648978, 0.02078827128944527 2.927595919743373, 0.0646250203024859 0.1270424856353129" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(169.96052485146174 184.89008378937461) rotate(0 58.68576547639498 1.7349162106253004)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M0.7720197922688776 5.387334936238606 C2.8892656683254994 2.882304120121355, 3.4877562752851685 1.8891172397354645, 5.353462275261528 -0.08315641155119591 M1.253602374545192 4.86016399621569 C2.597011577732011 3.1606145399303847, 4.032872367143238 1.8583040023826989, 4.6903956112028125 0.6259028102119677 M6.893434538980317 4.5578299023777396 C7.822159960088756 3.7166794535678798, 8.926791848356157 
1.7422022820031176, 10.879054687683416 -0.25255993631782514 M6.669181203664583 4.22508087657468 C8.064179400362042 2.7900025566796236, 9.348712718790477 1.7418460966188016, 10.420873217330259 0.06968464647305622 M11.45245309086873 4.281702547189351 C13.28256619972712 2.7335915521561898, 14.199860158286834 1.1585052606793997, 16.098079762704153 -0.15380917108385067 M11.508805927041756 4.689457455412056 C12.591232347904251 3.4025008905130405, 13.794794687347942 2.1396106207827073, 15.525656117129554 0.18439529742346908 M16.813555566462004 4.418558368431023 C18.920453272646583 2.7758941329454356, 19.780731759211832 1.4631942952163854, 21.442009806351628 0.2525841944083098 M17.52213427550929 4.621422790055713 C17.97175908518714 3.171621123745091, 18.97163105995517 2.274272541518663, 21.25371844768327 0.040409914141100955 M22.054290655826005 4.881411497251423 C24.33501659377223 3.6246421021376687, 25.334677006442863 0.8005487131291432, 25.8741356961917 0.7407671038073241 M22.58624802018774 4.795346602474631 C23.013370384707095 3.6516588629371793, 23.64711318710143 2.9299229471465527, 26.322083386361957 0.2237193123449327 M28.08249785248861 4.830258056408133 C28.81956618439908 3.4135094490909363, 29.67357124168558 2.4794234002110205, 31.40678911325226 -0.5868815076292271 M27.669750204409663 4.688537637496706 C29.24540235885721 3.2852407955780727, 30.278106467299576 1.5586717367465477, 31.779920382748195 0.03262833061285664 M32.8565852215157 4.941002802049825 C33.894742106583585 4.009735867296317, 34.86383777938648 3.33068919676734, 36.52162031009917 0.3691759287473282 M32.999917911795634 4.55329530459092 C33.550660653178205 3.720247066905668, 34.875180540599004 2.755102873786105, 36.87569792671253 0.4498074721638581 M38.902751631435144 4.199032481934073 C39.74966013101522 2.9633928302794166, 41.22689402212494 2.164939444649783, 43.08682034276031 0.2992089731945641 M38.4267994981782 4.113608036085161 C40.26608695476985 2.7040091891662055, 41.45352222338514 
1.0212704464467839, 42.72128078033753 0.13899126307582316 M43.73526452231282 4.56814247720955 C44.651950256817535 3.754102012898804, 46.03848001927212 2.346565650217412, 47.37739883880156 -0.24685418063884512 M43.79036735967794 4.637204928948415 C44.83426907457057 3.7192330621245673, 45.3938065821895 2.394063542543734, 47.55908274599376 0.2170298940677391 M49.773682137676595 4.184110160633841 C50.642741026559186 2.9326968202222155, 50.85565157740864 1.587344521672506, 52.99240622925079 -0.6658533474953167 M49.284615164627404 4.632254103008576 C50.43432109417691 2.7768638137753165, 51.97240684605152 1.0862632562350365, 53.42616389826378 -0.3494124212824993 M54.763131292592355 4.423621538355611 C55.654223809140156 2.948000960851998, 56.97455124291416 1.9231797691178978, 58.27521151667388 0.1675302912738168 M54.02471062164716 4.919414342734627 C55.521601416320536 3.1087499014723377, 56.82320261373758 1.6763001398278519, 58.42138335201816 0.028194195244088144 M59.29152960558797 4.297844608132436 C60.89565035857071 3.5990798819976084, 62.48660215178714 1.7352335368910776, 63.230724507098685 0.0943475459261307 M60.011092711569766 4.328533455454453 C60.749606155455595 3.4476457736109776, 61.996990505504776 2.35340352026085, 63.96949649433945 -0.24305635527461078 M64.40986326672932 5.121356001112993 C65.52612097154717 3.7826722771124106, 66.627857060867 2.1190636565962566, 68.9456735780911 0.14030786011029617 M64.69662445551579 4.770699913137043 C65.71728871691786 3.2321181675768043, 67.12906057731097 2.174389177590864, 68.88830775259822 0.03454549288038741 M70.74940926954427 3.7057143400548638 C71.17547522702681 2.3503799308951305, 73.19554370887275 1.7219964819020825, 74.90582791067901 -0.49064733091609014 M70.3506901468178 4.534941210610294 C71.89608082404133 2.509774738706855, 73.21829709601037 1.0828980951603615, 74.25276782637702 -0.35397667171300784 M74.9420131790987 4.228386881106705 C76.69536422236025 2.7886751359824045, 77.80115625660433 1.8561430431640955, 
79.61627525742827 -0.3474179585043856 M75.42634811909635 4.5602755073431736 C76.7517537668946 2.901050697872805, 78.14383829271796 1.7137614851838558, 79.26212645554645 0.3357664041641584 M80.73731136392048 4.226322872026112 C82.3301617854296 3.549879211500196, 83.31156279012183 2.19825756323776, 85.11864417436675 -0.73817196239699 M80.92442797818399 4.13240998566359 C82.22666330408696 2.5316341260633766, 83.33575518663774 1.1131787431999083, 84.90402965218902 -0.195772003410201 M85.82300876407551 4.064880193961744 C87.78596927811844 2.59322942809483, 88.39457099786168 1.5525101279047582, 90.41138178906033 -0.46141864894524554 M85.96068337020131 4.755792492540511 C87.19036355676583 3.028115434684894, 88.8586514793223 1.8918997851769075, 89.82167246867532 0.07997704628288321 M91.08731654319999 4.405040750736074 C92.74372327254017 3.671071549089661, 93.9213863864534 0.7043474860723924, 96.01633727088615 -0.8960989772538032 M91.07306931532074 4.777103791888507 C92.62106783934846 2.973380667793735, 94.23800810426621 0.997389534542433, 95.79574416744568 -0.045795152588777754 M96.71315275603473 5.104126955303116 C98.82511612568673 2.6416017359072423, 100.15517662529386 0.5683778207645924, 100.75223340736234 -0.314340695587272 M96.47299446664319 4.423339735916568 C98.16531054542027 3.1361053218485386, 99.0851446157711 2.006625360510006, 100.52234129979739 0.163735036424076 M101.22195550381194 4.660919347776529 C102.79749382585769 3.6308276859104494, 104.35675090239228 1.8359226368719916, 106.81960168163238 -1.021116948511688 M101.79793951274814 4.685014707640242 C103.42472161428952 2.886542590450785, 104.64376165351082 1.386908556998287, 106.24798529459109 -0.21573581841052925 M106.86962288935138 4.1725431807558095 C109.13298367530935 2.5338866115037737, 110.21138838376807 1.3126412572904465, 111.15694534670926 -0.3657055828054766 M107.26320816623448 4.527443604658828 C108.78667419534891 2.8104410027658027, 110.09837538825187 0.9726294932170371, 111.29609095672511 
-0.06796450822477737 M112.37812054432817 4.6022010047056146 C113.21351733468701 4.392270812769404, 113.44913058803665 3.41530668392683, 116.289713440834 0.022320351801066396 M112.05306739812606 5.161780682306028 C113.96326021458519 3.1581906812710128, 115.19817011278512 1.4484099397447308, 116.088654141406 0.28887766204327225 M-0.20861296487881575 3.2884879375449243 C-0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243 M-0.20861296487881575 3.2884879375449243 C-0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243 M6.136495784422461 3.2620940133244467 C5.254714109157254 2.6919633450008478, 3.5483710330946017 2.2075345352610523, 1.4838666019247566 -0.49055900234084726 M6.231354170370926 3.4712421316434368 C3.9438965731670774 2.1084332518066717, 1.7466963134721512 0.047233569901790506, 1.1612681389538293 -1.1409650719753295 M11.615133636098843 3.749088742166312 C10.54802053868561 2.3316964459455796, 9.902761656784454 0.65995430825262, 7.270680589300099 -0.563405853405684 M11.7303690246572 3.0765298426631107 C10.349489050751458 1.7026922331570318, 8.407725480634221 0.024362363815115362, 7.429371742509756 -0.582749270077884 M18.467157641967418 3.4290445780535306 C17.40309157949331 2.840673194540449, 16.068401815856223 1.4960903011318052, 12.913195413510241 -0.7710699473163536 M17.933115222800183 3.567428956429016 C16.414230772515356 1.8590464777160731, 14.186241356181448 -0.3939719289374438, 12.63231296289798 -0.9478434518101116 M24.65122066683578 3.7644865525714075 C23.5637808215728 2.352057223110212, 21.465562093743085 0.8848946053514064, 19.59023968151215 -0.9380406734515511 M24.694477727808756 3.7260767789850897 C23.353809812829727 2.849092675511236, 22.887260348003206 1.7316961290474706, 19.514782044826106 -1.2536211453825765 M29.986214808960618 2.825977518881249 C28.132455598845898 1.3677939053249646, 27.4223654734955 
0.4097860240543908, 24.666454202512718 -1.2276409966619766 M30.530964875388133 2.9564340414074683 C29.019561585363725 1.7248281347066265, 27.265445305340805 0.5073335182854706, 25.029823584266445 -1.6242143624676473 M36.343201684842164 3.568631750784379 C35.43692065676252 2.814000437534832, 34.371416776591985 1.4278375822318523, 31.804599811762188 -0.7717683618962962 M36.62129666826902 3.591576748454508 C34.97982522375143 1.9024843819468158, 32.75802029238404 0.7084467898276151, 31.073589224574704 -1.0838224889319588 M42.19293668100307 2.885887891889159 C41.76679134186904 1.7118858150044005, 40.84016958313902 1.4402932591786346, 37.818139052600316 -0.9899053105553925 M42.137124462497184 3.3742883941126807 C41.64883941229826 2.632878971755583, 40.34631741248446 1.7880461727389307, 38.033210176718136 -0.491673019195938 M49.09141248495304 3.4405889149397297 C47.39156692654475 2.347417254787234, 46.115178443207746 0.62051682460568, 43.21111501708546 -0.4878671898674085 M48.98675181630985 3.2654528394449187 C47.236101354359455 2.5516411496804765, 46.443782150380954 0.9770963420764935, 43.66058949666949 -1.10526318947597 M55.04178324526452 3.0167284640730876 C53.41767425454746 2.7823599343288956, 52.1072054504015 1.629123497405109, 49.331404886741424 -1.3464004732230679 M55.32476651438788 3.80211231780849 C53.47717947378399 2.040119626328446, 51.64618803936432 0.6963510997005429, 50.071891252624106 -0.7290329503854736 M61.30735209060592 2.849718396599967 C58.71988334486056 1.9161401002256337, 57.13025628411405 0.7338677754450866, 55.79237807091949 -0.9505705558023863 M60.5766952227125 3.5120483136750793 C59.4544540727929 2.4767783555400804, 58.26218210546973 1.2724295653550701, 55.282338036891716 -1.3930531102958583 M67.62158298122745 3.3182081425717036 C66.37428467426338 2.7736733279248544, 64.85340268273941 0.9676496885110881, 61.428572748619885 -0.7681147263040414 M67.05021302213942 3.4189130036780546 C65.69695915155646 2.260992793676653, 64.03775294212281 
0.7421682449360609, 62.087349070953024 -1.27625938319877 M72.99778421290681 3.2871189950421833 C71.24363104908747 1.637179475861029, 70.84976179392382 1.2751806082913382, 68.60789468844693 -0.6895887810718052 M72.77700835263818 3.4155679683588827 C71.26652400390431 1.3135310053114162, 69.44852105692 -0.1587004925545989, 68.48564642357913 -0.7355822152439289 M78.98577412038861 3.1771116034189495 C78.06045440262304 2.277579707748488, 76.52442263361438 1.3604959734276907, 73.73004168166533 -1.3728226775051322 M78.8401563331746 3.460904996615487 C78.0100560988573 2.479453170067207, 76.70258420181159 1.2309505490876296, 73.88049338983126 -1.4184177999598202 M85.9915608925104 4.1395572476755955 C83.25461698275748 2.0356040689962867, 81.5074690070389 0.4896600896922163, 79.96266589809252 -0.5401029663234377 M85.76644584395102 3.7110482377201435 C83.50442117182573 1.6541114225249378, 81.41311010038503 0.05768427128901066, 80.17429145506625 -0.9628422868977666 M91.53257443618553 3.7635804481773305 C89.00341976980678 1.5867946683922547, 87.25089297194094 -0.9852286303055177, 86.08012320931526 -1.7989764752638409 M91.46833912327477 3.360790933339544 C89.57941996461578 1.7920909294497345, 87.72280730382127 0.24400663949492407, 85.70827001004005 -1.424758688949875 M97.07655605541423 3.3294907912254805 C96.437439046144 2.832506199373149, 95.24145899918214 1.640342276698816, 92.06060413808378 -0.5748953367750183 M97.9715288301266 3.3602989319820424 C96.72862242303539 2.3662909713129103, 95.3576893318294 1.7991812994872742, 92.41617835435832 -1.351901548232791 M103.39459878227846 4.399127807399851 C103.07242955623639 2.676764161196718, 101.17406217474526 1.272779421067152, 98.60752593522996 -1.483656739027162 M103.98586989844391 3.8471762295594774 C102.12008968731034 2.1605885001914995, 99.92673472561818 0.49091082013403564, 98.72999056258344 -0.7936297594596781 M109.17215111761008 3.5871630585302485 C108.27511148598452 1.8563311306507106, 107.262252513863 0.7089899374075188, 
105.06053632033576 -1.8601008389404725 M109.52107077778902 3.5880795563586876 C107.8831195144891 1.7899684642616736, 105.71568090555184 0.11213265816360568, 104.5455967249028 -1.5093119829687032 M115.92002077165935 3.4324022246679973 C115.01087050788067 2.5995815493707886, 113.76666127475535 0.9542807294108469, 110.8065172224014 -1.1868086752136038 M115.9523846740177 3.9703329640130263 C114.46428607183394 1.8480508231312003, 112.82566180745371 0.886592009377907, 110.93849418779713 -1.2267120577514903" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M-0.49748157523572445 -1.9331182036548853 C38.19960194539251 2.1934787216626335, 80.3293121902639 1.9100670400105644, 117.63873697530585 -1.6980871353298426 M-0.41478194762021303 0.06060642469674349 C32.60738130978343 0.9666121913147736, 66.79993469627642 1.0505706680966187, 116.78178883895384 0.5584230171516538 M117.28654811330972 -0.17075614786687024 C117.0371630495552 1.0822224937193772, 117.3100960843778 1.7941976700980449, 117.46181656062215 3.5368869985568487 M117.25372136096172 -0.07845195292511045 C117.39614583644264 1.3216943567362736, 117.38450886799978 2.1852447436986657, 117.29939871137634 3.551241869864277 M118.50657875549155 5.3623166298195315 C73.45173274806956 4.078675220705402, 31.242721472703224 5.1202490432037235, -0.268763592466712 1.982460460595746 M116.82965801462638 2.721234327152331 C88.05047287873637 3.7607098516642923, 57.32394105552946 3.526261269730317, 0.3698272807523608 3.4025016184597803 M-0.2832299080721211 3.512646083509471 C0.04820170466607624 2.140951809117664, -0.1499479836098199 1.199887624538762, 0.19676403945076032 -0.3223216037442839 M-0.166182294337272 3.589877156058053 C0.03064971805667819 2.3797309490305603, -0.0220911125960356 1.3597497097084106, -0.03474557313898158 0.11343144076098646" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(171.98678477545945 193.64008378937461) rotate(0 58.68576547639498 1.7349162106253004)"><path 
d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M1.5496313135278277 5.473077627025326 C1.8041889953541403 3.3222775322096867, 3.573226905142686 2.27395172817741, 4.543751621017701 0.773852584682148 M1.0224603735049123 5.0046317808663145 C2.3714417688546003 3.380398390567686, 4.114311182156894 1.4028800390905802, 5.252810842780864 0.17154631436217965 M6.757802921449137 4.6840200547695305 C8.12802325109653 3.5303314664290615, 8.36767268072615 2.106171724215097, 10.412024738033248 0.1447405004024268 M6.425053895646077 4.555795515292153 C7.578679206337695 3.2857347888230253, 9.118981370285185 1.7691265160876033, 10.73426932082413 0.21299639903826129 M11.108583598829645 4.957452328544612 C12.536417982560444 2.9514913460508447, 13.938707273274149 2.4846448844313156, 15.13768353583612 0.15992368207545782 M11.51633850705235 4.69462714674374 C12.664548763941191 3.46526462133957, 13.837557047201473 2.018419082464015, 15.475888004343439 0.22782299890415575 M17.28311606185349 4.973231410951904 C17.769502642766962 3.75660801451212, 18.58719204681026 2.5590070962577807, 21.581753543110455 -0.5774987201974271 M17.48598048347818 4.241727656377098 C18.02900129175458 3.1916020842890767, 19.129228050942032 2.073420519990473, 21.369579262843246 0.1578185013658825 M22.372877223242792 5.373607272983821 C24.067690315387434 3.4674989111960253, 24.19975508240954 1.5749577063037066, 26.696844485078365 -0.28328454792380897 M22.286812328466002 4.73507150248933 C23.395919970451203 2.997822893112974, 24.929850103831985 1.7965462466877322, 26.179796693615973 0.1027737709705474 M28.359400424181676 4.394296252241697 C29.5258743163568 2.5711195700122937, 31.174161275708137 0.46291838454354783, 31.40687251542399 -0.16097527600235895 M28.217680005270246 4.607954978736457 C29.613302349483597 2.8274398650160477, 30.68596896777838 0.9370773779185669, 32.02638235366607 -0.40157349237583234 M33.09705320239227 4.970907694717953 C34.44766841923304 3.577338637797762, 36.049257319073526 1.2779092863197734, 
36.989837984369444 -0.2954693709406534 M32.70934570493336 4.542297855915586 C33.70346553646172 4.018012097293633, 34.568857342083476 2.730824153842902, 37.070469527785974 0.2697628883738538 M38.39275952405869 4.370017313013913 C39.817087930006984 3.147397724602528, 41.68418044625365 1.9742387868995972, 42.957547670598856 0.178051067362631 M38.307335078209775 4.64659174018273 C39.349113842586846 3.4832094904887994, 40.118049634708 2.645418963569272, 42.79732996048011 -0.29103407437438306 M43.38877755190307 4.895593828122002 C45.26596717181432 3.489200741075238, 46.55239733474526 1.0054999475318303, 47.03839254933435 -0.019453052600081122 M43.457840003641934 5.009201714835 C45.05765836479117 2.9792413199714773, 46.24586824411422 1.6967081434658535, 47.50227662404093 0.12871710892656907 M49.042421877109525 4.7200723276435514 C50.13714284278094 2.53962257432617, 51.13787346390144 1.564412068115193, 52.657070024260044 0.3201955442725427 M49.49056581948426 4.076314698528417 C50.582177811967824 2.698223751630309, 51.84130984174802 1.8166509863812361, 52.973510950472864 -0.15985523681726976 M53.90884128740019 4.99078090757098 C55.15662680872018 3.52588944770854, 56.8574923340168 1.7149991980472297, 58.11736169559808 0.5988171414729201 M54.404634091779215 4.605241087060086 C55.61786081280648 2.925577693749725, 57.20486105788691 1.9154688501147907, 57.97802559956835 0.10612569044120548 M59.820740998959195 4.261014030505612 C61.21326103516876 3.833599244975956, 61.43734222468245 1.846400818523619, 64.08185559203257 -0.5570516108976236 M59.85142984628121 4.242363111586019 C60.96209137106634 3.489291293165262, 61.855155219222056 2.4336966284375525, 63.74445169083182 -0.3807664718919923 M65.27116042450865 4.396459167439873 C65.97147600227404 3.4844363766559163, 66.35026270563864 2.7100818674446314, 68.75472393878563 0.010268945653332584 M64.9205043365327 4.476436480403328 C65.77929230745399 3.4989371256619526, 67.11946000353701 2.180299701816563, 68.64896157155573 
0.15504039073024528 M69.89319540523269 3.8062701559750978 C71.09104221975232 3.2496301922919617, 73.01949881284052 2.013430013531814, 74.16144538954141 0.22067483622713546 M70.72242227578812 4.2165387371680385 C71.51713077786806 2.792042437087208, 72.90918340547294 1.5874846496606514, 74.29811604874449 -0.14993477178604736 M75.04277597885344 4.549493842103691 C76.01290931820627 3.334654411632619, 77.49297406179794 2.7922077273865153, 78.93158279452203 0.4627050943161777 M75.3746646050899 4.545370902677239 C76.64684051826323 3.0305605007068785, 78.38584019419348 1.5062703680024443, 79.61476715719057 0.05507546760620982 M81.07838861155501 4.366143438426998 C82.71486364388524 3.060202731946405, 83.68101738175217 1.9264547805408179, 84.57850543241159 -0.17538487997819063 M80.9844757251925 4.0972058330869645 C81.73509933172562 2.9295732831778722, 82.66677151755786 2.0323282140692283, 85.12090539139838 -0.1869646286869454 M85.54385396605956 4.983625165937426 C86.5171332390313 3.21061390464558, 87.9267770093126 2.810574939554832, 89.48216677843223 0.20080332183643723 M86.23476626463832 4.41118986169431 C86.75934109705506 3.9083247138194688, 87.87883028871546 2.561766532147548, 90.02356247366035 0.31240767888158094 M90.51092255540279 4.890855792062061 C92.96742285923249 2.954318995901138, 93.19279115014547 2.5445152264668165, 95.08516309190587 0.19510746512465893 M90.88298559655522 4.845915261134209 C92.7716013228647 2.767952464059411, 94.48862942029632 1.1257477405770018, 95.93546691657089 0.011027870540476825 M97.247685401752 5.129468373316092 C97.66148529029721 3.6553477228953093, 98.4599325280639 2.1483478581519293, 100.29382940614128 0.06611636258912623 M96.56689818236545 4.849601365267051 C97.9988058914795 3.056295898661257, 99.59113029656756 1.3669598910908543, 100.77190513815263 0.059776282368874 M101.43138582679431 4.672040029799994 C103.70732212269553 2.9027866936301665, 105.22520240585246 1.5542206218275518, 105.62472979499904 -0.08195662630000577 
M101.45548118665802 5.079156011108738 C102.60158321883347 3.399677491135645, 104.04713995002732 2.2003299914962327, 106.4301109251002 -0.12942551324353968 M106.98068630155576 4.914513698614672 C108.27269030024453 3.1130405386489484, 109.98243281326091 1.4676720679072521, 110.90704919327415 0.49707195094922973 M107.33558672545878 4.51287592120795 C108.79721821152745 2.666479191364677, 110.13557558341671 1.2179441407844966, 111.20479026785485 -0.24351696411942642 M112.03725215807447 4.9443281516500175 C114.03965602324949 2.90382984788429, 115.2755583795127 2.0516649044322177, 115.92198316044959 0.9139559153981495 M112.59683183567488 5.0149456012130145 C113.71987234189677 3.183973127707367, 115.13905644747044 1.616658765005467, 116.18854047069179 0.3018786169171321 M-0.20861296487881575 3.2884879375449243 C-0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243 M-0.20861296487881575 3.2884879375449243 C-0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243, -0.20861296487881575 3.2884879375449243 M5.901320303915147 3.706474444760449 C4.385183230096921 1.4807903742904154, 3.00707994998702 0.9524963494863312, 1.4581134296240008 -0.42193913213264356 M6.110468422234137 3.3952938207206054 C4.519284835321317 1.5058062815707307, 2.2276239559719597 0.3076480844788293, 0.8077073599895184 -1.173326458226034 M12.42599167453919 3.0774622961427287 C10.545886225161693 2.252386674296111, 8.351882893969766 0.40043783426267265, 7.521593771573606 -0.3331843289435802 M11.75343277503599 3.3868243771153703 C10.360244536321957 1.7674192197239227, 8.5935924026065 0.5479069672946043, 7.502250354901406 -1.0592505127080676 M18.24227470344085 3.7047189619545904 C16.773512454952904 1.9056857101728453, 14.566415769362681 0.2465201287824128, 13.351606319445112 -0.6382295651948502 M18.380659081816333 3.5570475250170643 C17.298859050492904 2.241099378438929, 15.587098364818017 0.9900309184346809, 
13.174832814951355 -1.3720649833912453 M24.714043870973168 3.9033734682775054 C22.90551635458559 1.7096903763849656, 20.96436713921715 1.154228193657299, 19.320962786324355 -1.6030482835118256 M24.675634097386848 3.425192683126307 C23.59777598068645 3.096329715478931, 22.327848549708058 1.9835478282457073, 19.005382314393326 -0.8880088660382264 M29.81321147906518 2.7822401362532254 C27.952821653732713 2.1707012293563404, 26.67632170465306 -0.0679429238925558, 25.069039104896106 -1.4518180481838145 M29.9436680015914 3.5490389993940914 C27.905220471451223 1.5354861369773607, 25.928982992375705 -0.1864174621032042, 24.672465739090434 -1.5447588537102648 M36.69219290398276 3.5237708861674486 C35.56141930626127 2.467210627453197, 33.86844607291691 1.5491210699596303, 31.661238932676227 -0.7447627201751073 M36.71513790165289 3.701753420961585 C34.57365203118419 1.5855967759271823, 32.99971226272929 -0.17499676926749524, 31.349184805640565 -1.3095288586411822 M42.04712568686972 3.643390080795233 C40.24933391039259 2.3598599632581334, 39.29099848753336 0.4418850844738067, 37.57942917703157 -1.3543088895669522 M42.53552618909324 3.4240329124001607 C41.1648803340786 1.7739116549130098, 39.68924941290213 0.8724840801638527, 38.07766146839102 -0.9833726773439289 M48.73815390293473 3.29557630881813 C47.486120524351904 2.262058437622486, 45.65774475760308 1.1793432611562502, 44.11914393950173 -1.6237532341088965 M48.563017827439914 3.2020760987993016 C47.208322425899 2.1920354676782825, 44.98373843425585 0.6680579287038343, 43.50174793989317 -1.4460503389760717 M54.45062064508252 3.2068550789044754 C53.20549683607919 1.3068160190439624, 51.04693044441157 -0.3616840344750778, 49.396937849160516 -0.6047715412015093 M55.23600449881793 3.8708587405053487 C53.41946334772132 2.461631063978777, 52.010979904510236 1.0876020056953983, 50.01430537199811 -0.7570766476942224 M60.321287219391586 3.1123200096670574 C59.08420252821678 1.6691017790190281, 57.50782013692981 0.19598612858253883, 
55.83044440836338 -1.650395801875692 M60.9836171364667 3.3397471424207126 C59.13772490983876 1.812635793908702, 57.204311517004264 -0.21112867945197447, 55.3879618538699 -1.625783756261643 M66.92610415837777 3.8599503651415867 C65.91565710748698 2.311755896613032, 63.72408244570716 0.7859628847874712, 62.14922743087617 -1.2459475333457464 M67.02680901948412 3.4843008298115743 C66.04948720909883 2.3050157683218013, 64.69538901173289 1.8598107915824047, 61.64108277398144 -1.1966804987871669 M72.93269165263041 2.8041418934628757 C70.47141312817243 2.0702640151012237, 69.37106956630353 0.45203925498441044, 68.36408056912285 -0.2409426741283176 M73.06114062594712 3.364532863853165 C71.473465926698 2.434716996484003, 70.52698773809334 1.60011689060388, 68.31808713495073 -0.9900035967100193 M78.9590114540216 3.516443402233186 C77.5974179082453 1.781167307833925, 76.12195469938129 0.38019514087992246, 73.7185233144717 -1.57592548166674 M79.24280484721815 3.507326392323362 C77.9487338656398 2.168511862787188, 76.36213958482777 0.7914001759421838, 73.672928192017 -0.9980814919089201 M86.05778429129272 3.2092398687001427 C83.9303714551308 1.9284058212039636, 82.36081351889364 0.49457984899173374, 80.68757021866782 -0.2836444761678002 M85.62927528133727 3.650820309130114 C83.89294980219273 2.31108160634826, 82.5486286908263 0.8789816480897568, 80.26483089809349 -0.633387919569331 M91.71948413357663 3.0815832136167813 C89.6812725378638 1.8895984991376726, 87.15909297998006 0.45886441031945313, 85.4663733515096 -1.3183626127269932 M91.31669461873884 3.2289124788219707 C89.21987915121984 1.4109323420291395, 87.2293352052325 -0.09327110470597802, 85.84059113782357 -1.6515994512782313 M97.4217216696392 3.4188550447687973 C96.27860333680236 1.9690157867595879, 94.5381270167516 0.6212285750806978, 92.82678168301287 -1.618624064808619 M97.45252981039576 3.758617504790099 C96.36432061704319 2.610653660427744, 95.66766403537534 1.67894168428773, 92.0497754715551 -1.0854573430194534 
M104.62768587882803 4.346365351720374 C103.17522492820228 2.958526573484984, 101.97990963153643 1.646821347638308, 98.05434747377515 -0.36116962409849773 M104.07573430098766 3.7917514190791506 C102.5309047293229 2.252128035367595, 101.0825101110835 1.272451088824185, 98.74437445334263 -0.8688389221233466 M109.8533977717406 3.4444045927856934 C108.28785705481556 2.8774044752804344, 107.28213956994992 2.0580263075750826, 103.71558001564402 -0.6458846135324524 M109.85431426956904 3.5846712693587763 C108.59833255067122 2.291277225170618, 107.39064101477724 1.3414233558028104, 104.0663688716158 -1.4170180002952666 M115.8349641308928 3.8632957820004536 C114.43959227990644 2.3468424727580683, 112.1445960855381 0.6686766068955579, 110.52519937238534 -0.6807468003247664 M116.37289487023783 3.818014705895142 C114.3399007294772 2.7134725891800655, 113.51454171518402 1.3602773198976832, 110.48529598984746 -1.1569252866440252" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M-1.9331182036548853 -1.5488086249679327 C48.04332697501148 1.1244419200741358, 94.34437809545031 0.5591568930470057, 115.67344381746011 -0.3641095068305731 M0.06060642469674349 -0.4672734634950757 C35.86487851623516 1.1374610717570603, 71.5710718800514 -0.10290619179846627, 117.92995396994161 -0.3625390725210309 M117.20077480492309 -0.3300802430161512 C117.50975789451323 0.9409259159956732, 117.30657832729707 2.0864318562106643, 117.4385855300962 3.6016510285692163 M117.29307899986485 0.020884904899346246 C117.48595871523031 1.0874974503650356, 117.1767407684989 2.08436678640348, 117.45294040140362 3.6001093857161686 M119.26401516135888 3.5750642521008444 C90.2575159188685 6.057194503843591, 64.5393677353381 3.86718032032232, -1.4873719606548548 2.537919650755782 M116.62293285869168 3.891232203662355 C83.66954664813241 2.2711536735793363, 50.344216963757205 3.203655138661505, -0.0673308027908206 3.3446509055494857 M0.042813662258870344 3.4959332223242123 C-0.33332569976136206 
2.4778428435977857, -0.2768760461371855 2.0267150798003004, -0.3223216037442839 -0.05755745263404327 M0.12004473480745206 3.5290828232650275 C-0.11340075576919306 2.149385426659648, -0.1499897213502056 0.781173716206288, 0.11343144076098646 0.053516240129377585" stroke="transparent" stroke-width="1" fill="none"></path></g><g transform="translate(21.200666307713618 196.15588838707583) rotate(0 66.45875398214207 3.3441116129241664)"><path d="M-1.0960838105529547 -1.0616192016750574 L131.49702390758785 -0.18590078689157963 L131.30062392084392 6.078562322856442 L-0.9414483215659857 6.0362929141535915" stroke="none" stroke-width="0" fill="#f41d92"></path><path d="M0.07504191063344479 -1.6519318129867315 C48.6717207366297 2.1251636629523842, 98.76146253564842 0.7274964337768167, 134.73962012781294 -0.8237543012946844 M-0.14539700280874968 -0.45872258115559816 C42.57345630329015 -0.7780584158653236, 87.34047111604188 -0.21417305490913163, 132.42589526633265 -0.6365428166463971 M133.34979469917167 -0.34348489165605 C132.676242273321 1.9144993480679706, 133.07805235207408 3.4745177784773227, 133.2153963306507 6.377620532226333 M132.870245864389 -0.035881700723496945 C133.18692599393896 2.315616917464927, 133.21328781639215 5.2897618465496015, 132.72819011305828 6.914442500883744 M133.58270917190703 8.615716390164152 C92.31425296018838 8.108846602087727, 54.408854838679716 6.13039552891802, 0.9311732854694128 7.353886119873778 M133.51983698467257 5.712366395482377 C92.0379204296055 7.022827596903826, 50.62717563509753 7.167296530247714, -0.04139300715178251 6.606375002154664 M0.10716979992885733 7.209792720621895 C-0.48025882347219 5.5130160182701164, 0.537929206635043 3.982754117283542, 0.16160612139750152 -0.25448465069912857 M-0.2701862171989351 6.564623220302453 C0.2565257148885263 5.052346940938092, -0.10389336805681824 3.9692018223104717, 0.04134105659023152 -0.19166792607665015" stroke="transparent" stroke-width="1" fill="none"></path></g></svg>
<figcaption>Transactions table with FK to purchasing and cancelling user</figcaption></p>
</figure>
<p>We expected the index on the cancelling user to be significantly smaller than the index on the purchasing user, but the two were exactly the same size. Coming from Oracle, I was always taught that <a href="https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/CREATE-INDEX.html#GUID-1F89BBC0-825F-4215-AF71-7588E31D8BFE" rel="noopener">NULLs are not indexed</a>, but in PostgreSQL they are! This "Aha" moment led us to the realization that <strong>we were indexing a lot of unnecessary values for no reason</strong>.</p>
<p>This was the original index we had for the cancelling user:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">transaction_cancelled_by_ix</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">transactions</span><span class="p">(</span><span class="n">cancelled_by_user_id</span><span class="p">);</span>
</pre></div>
<p>To test our hypothesis, we replaced the index with a partial index that excludes null values:</p>
<div class="highlight"><pre><span></span><span class="k">DROP</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">transaction_cancelled_by_ix</span><span class="p">;</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">transaction_cancelled_by_part_ix</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">transactions</span><span class="p">(</span><span class="n">cancelled_by_user_id</span><span class="p">)</span>
<span class="hll"><span class="k">WHERE</span><span class="w"> </span><span class="n">cancelled_by_user_id</span><span class="w"> </span><span class="k">IS</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="p">;</span>
</span></pre></div>
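<p>A possible variation on the swap, not the exact statements shown above: dropping the old index first leaves the column unindexed while the new index builds, and a plain <code>CREATE INDEX</code> blocks writes to the table for the duration. The sketch below assumes the same index names and assumes the statements run outside a transaction block, which <code>CONCURRENTLY</code> requires:</p>
<div class="highlight"><pre><span></span>-- Sketch only: build the partial index first, without blocking writes
CREATE INDEX CONCURRENTLY transaction_cancelled_by_part_ix ON transactions(cancelled_by_user_id)
WHERE cancelled_by_user_id IS NOT NULL;

-- Once the new index is built and valid, drop the full index
DROP INDEX CONCURRENTLY transaction_cancelled_by_ix;
</pre></div>
<p>A concurrent build takes longer than a regular one and can leave an invalid index behind if it fails, so it is worth checking the result before dropping the old index.</p>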
<p>After a reindex, the full index was 769MB in size, with more than 99% null values. The partial index that excluded null values was less than 5MB. That's more than 99% of dead weight shaved off the index!</p>
<table>
<thead>
<tr>
<th>Index</th>
<th>Size</th>
</tr>
</thead>
<tbody>
<tr>
<td>Full index</td>
<td>769MB</td>
</tr>
<tr>
<td>Partial Index</td>
<td>5MB</td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td>Difference</td>
<td>-99%</td>
</tr>
</tbody>
</table>
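<p>One way to get numbers like these is to ask PostgreSQL for the on-disk size of each index. A minimal sketch, assuming the index names used above (the alias <code>index_name</code> is only illustrative, and once the old index is dropped only the partial index will be listed):</p>
<div class="highlight"><pre><span></span>-- Report the on-disk size of the full and partial indexes, if they exist
SELECT
    c.relname AS index_name,
    pg_size_pretty(pg_relation_size(c.oid)) AS index_size
FROM
    pg_class c
WHERE
    c.relkind = 'i'
    AND c.relname IN (
        'transaction_cancelled_by_ix',
        'transaction_cancelled_by_part_ix'
    );
</pre></div>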
<p>To make sure those NULL values were indeed unnecessary, we reset the stats on the table and waited a while. Not long after, we observed that the new index was being used just like the old one! <strong>We just shaved off more than 760MB of unused indexed tuples without compromising performance!</strong></p>
<figure><img alt="Clearing space, play by play" src="https://hakibenita.com/images/01-postgresql-unused-index-size.png"><figcaption>Clearing space, play by play</figcaption>
</figure>
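<p>A sketch of how such a check might look, assuming the cumulative statistics views are enabled and the index name from above. <code>pg_stat_reset_single_table_counters</code> resets the counters of a single table or index (it may require elevated privileges), and <code>idx_scan</code> in <code>pg_stat_user_indexes</code> counts the scans started on each index since the last reset:</p>
<div class="highlight"><pre><span></span>-- Reset the usage counters of the new partial index
SELECT pg_stat_reset_single_table_counters('transaction_cancelled_by_part_ix'::regclass);

-- After the application has run for a while, check how often each
-- index on the table was scanned
SELECT
    indexrelname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch
FROM
    pg_stat_user_indexes
WHERE
    relname = 'transactions';
</pre></div>
<p>An <code>idx_scan</code> value that keeps growing suggests that queries are still using the index.</p>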
<h3 id="utilizing-partial-indexes"><a class="toclink" href="#utilizing-partial-indexes">Utilizing Partial Indexes</a></h3>
<p>After a good experience with one partial index, we figured we might have more indexes like it. To find good candidates for a partial index, we wrote a query to search for indexes on columns with a high <code>null_frac</code>, the fraction of a column's values that PostgreSQL estimates to be NULL:</p>
<div class="highlight"><pre><span></span><span class="c1">-- Find indexed columns with high null_frac</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="k">c</span><span class="p">.</span><span class="n">oid</span><span class="p">,</span>
<span class="w"> </span><span class="k">c</span><span class="p">.</span><span class="n">relname</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="k">index</span><span class="p">,</span>
<span class="w"> </span><span class="n">pg_size_pretty</span><span class="p">(</span><span class="n">pg_relation_size</span><span class="p">(</span><span class="k">c</span><span class="p">.</span><span class="n">oid</span><span class="p">))</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">index_size</span><span class="p">,</span>
<span class="w"> </span><span class="n">i</span><span class="p">.</span><span class="n">indisunique</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="k">unique</span><span class="p">,</span>
<span class="w"> </span><span class="n">a</span><span class="p">.</span><span class="n">attname</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">indexed_column</span><span class="p">,</span>
<span class="w"> </span><span class="k">CASE</span><span class="w"> </span><span class="n">s</span><span class="p">.</span><span class="n">null_frac</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="s1">''</span>
<span class="w"> </span><span class="k">ELSE</span><span class="w"> </span><span class="n">to_char</span><span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">null_frac</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">100</span><span class="p">,</span><span class="w"> </span><span class="s1">'999.00%'</span><span class="p">)</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">null_frac</span><span class="p">,</span>
<span class="w"> </span><span class="n">pg_size_pretty</span><span class="p">((</span><span class="n">pg_relation_size</span><span class="p">(</span><span class="k">c</span><span class="p">.</span><span class="n">oid</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">s</span><span class="p">.</span><span class="n">null_frac</span><span class="p">)::</span><span class="nb">bigint</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">expected_saving</span>
<span class="w"> </span><span class="c1">-- Uncomment to include the index definition</span>
<span class="w"> </span><span class="c1">--, ixs.indexdef</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">pg_class</span><span class="w"> </span><span class="k">c</span>
<span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">pg_index</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">i</span><span class="p">.</span><span class="n">indexrelid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">c</span><span class="p">.</span><span class="n">oid</span>
<span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">pg_attribute</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">a</span><span class="p">.</span><span class="n">attrelid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">c</span><span class="p">.</span><span class="n">oid</span>
<span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">pg_class</span><span class="w"> </span><span class="n">c_table</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">c_table</span><span class="p">.</span><span class="n">oid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="p">.</span><span class="n">indrelid</span>
<span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">pg_indexes</span><span class="w"> </span><span class="n">ixs</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="k">c</span><span class="p">.</span><span class="n">relname</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ixs</span><span class="p">.</span><span class="n">indexname</span>
<span class="w"> </span><span class="k">LEFT</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">pg_stats</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">s</span><span class="p">.</span><span class="n">tablename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c_table</span><span class="p">.</span><span class="n">relname</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">a</span><span class="p">.</span><span class="n">attname</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">s</span><span class="p">.</span><span class="n">attname</span>
<span class="k">WHERE</span>
<span class="w"> </span><span class="c1">-- Primary key cannot be partial</span>
<span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="n">i</span><span class="p">.</span><span class="n">indisprimary</span>
<span class="w"> </span><span class="c1">-- Exclude already partial indexes</span>
<span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">i</span><span class="p">.</span><span class="n">indpred</span><span class="w"> </span><span class="k">IS</span><span class="w"> </span><span class="k">NULL</span>
<span class="w"> </span><span class="c1">-- Exclude composite indexes</span>
<span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">array_length</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">indkey</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="c1">-- Larger than 10MB</span>
<span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">pg_relation_size</span><span class="p">(</span><span class="k">c</span><span class="p">.</span><span class="n">oid</span><span class="p">)</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">10</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">1024</span><span class="w"> </span><span class="o">^</span><span class="w"> </span><span class="mi">2</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">pg_relation_size</span><span class="p">(</span><span class="k">c</span><span class="p">.</span><span class="n">oid</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">s</span><span class="p">.</span><span class="n">null_frac</span><span class="w"> </span><span class="k">DESC</span><span class="p">;</span>
</pre></div>
<p>The results of this query can look like this:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">oid</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">index</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">index_size</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">unique</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">indexed_column</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">null_frac</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">expected_saving</span>
<span class="c1">---------+--------------------+------------+--------+----------------+-----------+-----------------</span>
<span class="w"> </span><span class="mf">138247</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">tx_cancelled_by_ix</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">1418</span><span class="w"> </span><span class="n">MB</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">f</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">cancelled_by</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">96.15</span><span class="o">%</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">1363</span><span class="w"> </span><span class="n">MB</span>
<span class="w"> </span><span class="mf">16988</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">tx_op1_ix</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">1651</span><span class="w"> </span><span class="n">MB</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">op1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">6.11</span><span class="o">%</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">101</span><span class="w"> </span><span class="n">MB</span>
<span class="w"> </span><span class="mf">1473377</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">tx_token_ix</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">22</span><span class="w"> </span><span class="n">MB</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">11.21</span><span class="o">%</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">2494</span><span class="w"> </span><span class="n">kB</span>
<span class="w"> </span><span class="mf">138529</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">tx_op_name_ix</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">1160</span><span class="w"> </span><span class="n">MB</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">0</span><span class="w"> </span><span class="n">bytes</span>
</pre></div>
<p>In the table above we can identify several types of results:</p>
<ul>
<li><code>tx_cancelled_by_ix</code> is a large index with many null values: great potential here!</li>
<li><code>tx_op1_ix</code> is a large index with few null values: there's not much potential</li>
<li><code>tx_token_ix</code> is a small index with few null values: I wouldn't bother with this index</li>
<li><code>tx_op_name_ix</code> is a large index with no null values: nothing to do here</li>
</ul>
<p>The results show that by turning <code>tx_cancelled_by_ix</code> into a partial index that excludes null we can potentially save ~1.3GB.</p>
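<p>The <code>expected_saving</code> column is simply the index size multiplied by <code>null_frac</code>. As a rough sketch, the same arithmetic can be reproduced in Python using the rounded figures from the sample output above. The helper name and the list of candidates are only illustrative, and the result is an estimate: it assumes index entries for NULL take about as much space as any other entry.</p>

```python
def expected_saving(index_size, null_frac):
    """Estimate the space freed by excluding NULLs from an index.

    Mirrors the query's expression: index size times the NULL fraction.
    """
    return index_size * null_frac


# (index name, size in MB, null_frac) copied from the sample output above.
candidates = [
    ("tx_cancelled_by_ix", 1418, 0.9615),
    ("tx_op1_ix", 1651, 0.0611),
    ("tx_op_name_ix", 1160, 0.0),
]

# Largest expected saving first, like the ORDER BY in the query.
ranked = sorted(
    candidates,
    key=lambda c: expected_saving(c[1], c[2]),
    reverse=True,
)

for name, size_mb, null_frac in ranked:
    print(f"{name}: ~{expected_saving(size_mb, null_frac):.0f} MB")
```

<p>Sorting by the estimated saving rather than by <code>null_frac</code> alone keeps small indexes with many nulls from crowding out large indexes where the absolute gain is bigger.</p>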
<p><strong>Is it always beneficial to exclude nulls from indexes?</strong></p>
<p>No. <code>NULL</code> is as meaningful as any other value. If your queries are searching for null values using <code>IS NULL</code>, these queries might benefit from an index that includes <code>NULL</code> values.</p>
<p><strong>So is this method beneficial only for null values?</strong></p>
<p>Using partial indexes to exclude values that are not queried very often or not at all can be beneficial for any value, not just null values. <code>NULL</code> usually indicates a lack of value, and in our case not many queries were searching for null values, so it made sense to exclude them from the index.</p>
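<p>To see the trade-off in action, here is a small, self-contained sketch. Note that it uses SQLite through Python's built-in <code>sqlite3</code> module and not PostgreSQL, purely so it can run anywhere. SQLite also supports partial indexes, but its planner and its <code>EXPLAIN QUERY PLAN</code> output are different from PostgreSQL's. The table and index names are made up for the example.</p>

```python
import sqlite3


def query_plan(conn, sql):
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of every plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (id INTEGER PRIMARY KEY, cancelled_by INTEGER)")

# Mostly NULL column: only 1 in every 25 transactions is cancelled.
conn.executemany(
    "INSERT INTO tx (cancelled_by) VALUES (?)",
    [(i if i % 25 == 0 else None,) for i in range(1, 10001)],
)

# Partial index that leaves the NULL rows out.
conn.execute(
    "CREATE INDEX tx_cancelled_by_part_ix ON tx (cancelled_by) "
    "WHERE cancelled_by IS NOT NULL"
)

# A lookup by value can be served by the partial index...
plan_value = query_plan(conn, "SELECT id FROM tx WHERE cancelled_by = 50")
# ...but a search for NULLs cannot, because those rows are not in the index.
plan_null = query_plan(conn, "SELECT id FROM tx WHERE cancelled_by IS NULL")

print(plan_value)
print(plan_null)
```

<p>The first plan should mention <code>tx_cancelled_by_part_ix</code>, while the second falls back to scanning the table. If <code>IS NULL</code> queries matter to you, excluding nulls from the index is probably the wrong call.</p>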
<p><strong>So how did you end up clearing more than 20GB?</strong></p>
<p>You may have noticed that the title mentions more than 20GB of free space, but the charts only show half of that. Well... indexes are also dropped from replicas! When you release 10GB from your primary database, you also release roughly the same amount of storage from each replica.</p>
<hr>
<h2 id="bonus-migrating-with-django-orm"><a class="toclink" href="#bonus-migrating-with-django-orm">Bonus: Migrating with Django ORM</a></h2>
<p>This story is taken from a large application built with Django. To put the above techniques to practice with Django, there are several things to note.</p>
<h3 id="prevent-implicit-creation-of-indexes-on-foreign-keys"><a class="toclink" href="#prevent-implicit-creation-of-indexes-on-foreign-keys">Prevent Implicit Creation of Indexes on Foreign Keys</a></h3>
<p>Unless you explicitly set <code>db_index=False</code>, <a href="9-django-tips-for-working-with-databases#fk-indexes">Django will implicitly create a B-Tree index on a <code>models.ForeignKey</code> field</a>. Consider the following example:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>
<span class="kn">from</span> <span class="nn">django.contrib.auth.models</span> <span class="kn">import</span> <span class="n">User</span>
<span class="k">class</span> <span class="nc">Transaction</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
<span class="c1"># ...</span>
<span class="hll"> <span class="n">cancelled_by_user</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span>
</span> <span class="n">to</span><span class="o">=</span><span class="n">User</span><span class="p">,</span>
<span class="n">null</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
<span class="n">on_delete</span><span class="o">=</span><span class="n">models</span><span class="o">.</span><span class="n">CASCADE</span><span class="p">,</span>
<span class="p">)</span>
</pre></div>
<p>The model is used to keep track of transaction data. If a transaction is cancelled, we keep a reference to the user that cancelled it. As previously described, most transactions don't end up being cancelled, so we set <code>null=True</code> on the field.</p>
<p>In the <code>ForeignKey</code> definition above we did not explicitly set <code>db_index</code>, so Django will implicitly create a full index on the field. To create a partial index instead, make the following changes:</p>
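<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>
<span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Q</span>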
<span class="kn">from</span> <span class="nn">django.contrib.auth.models</span> <span class="kn">import</span> <span class="n">User</span>
<span class="k">class</span> <span class="nc">Transaction</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
<span class="c1"># ...</span>
<span class="n">cancelled_by_user</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span>
<span class="n">to</span><span class="o">=</span><span class="n">User</span><span class="p">,</span>
<span class="n">null</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
<span class="n">on_delete</span><span class="o">=</span><span class="n">models</span><span class="o">.</span><span class="n">CASCADE</span><span class="p">,</span>
<span class="hll"> <span class="n">db_index</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
</span> <span class="p">)</span>
<span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
<span class="n">indexes</span> <span class="o">=</span> <span class="p">(</span>
<span class="hll"> <span class="n">models</span><span class="o">.</span><span class="n">Index</span><span class="p">(</span>
</span><span class="hll"> <span class="n">fields</span><span class="o">=</span><span class="p">(</span><span class="s1">'cancelled_by_user_id'</span><span class="p">,</span> <span class="p">),</span>
</span><span class="hll"> <span class="n">name</span><span class="o">=</span><span class="s1">'</span><span class="si">%(class)s</span><span class="s1">_cancelled_by_part_ix'</span><span class="p">,</span>
</span><span class="hll"> <span class="n">condition</span><span class="o">=</span><span class="n">Q</span><span class="p">(</span><span class="n">cancelled_by_user_id__isnull</span><span class="o">=</span><span class="kc">False</span><span class="p">),</span>
</span><span class="hll"> <span class="p">),</span>
</span> <span class="p">)</span>
</pre></div>
<p>We first tell Django not to create the index on the FK field, and then add a partial index using <a href="https://docs.djangoproject.com/en/3.1/ref/models/indexes/" rel="noopener"><code>models.Index</code></a>.</p>
<div class="admonition tip">
<p class="admonition-title">take away</p>
<p>Nullable foreign keys are good candidates for a partial index!</p>
</div>
<p>To prevent implicit features such as this one from sneaking in indexes without us noticing, <a href="/automating-the-boring-stuff-in-django-using-the-check-framework">we create Django checks</a> to force ourselves to <a href="/automating-the-boring-stuff-in-django-using-the-check-framework#h008-must-set-db_index-explicitly-on-a-foreignkey-field">always explicitly set <code>db_index</code> in foreign keys</a>.</p>
<h3 id="migrate-exiting-full-indexes-to-partial-indexes"><a class="toclink" href="#migrate-exiting-full-indexes-to-partial-indexes">Migrate Existing Full Indexes to Partial Indexes</a></h3>
<p>One of the challenges we faced during this migration was replacing the existing full indexes with partial indexes without causing downtime or degraded performance. After we identified the full indexes we wanted to replace, we took the following steps:</p>
<ol>
<li>
<p><strong>Replace full indexes with partial indexes</strong>: Adjust the relevant Django models and replace full indexes with partial indexes, as demonstrated above. The migration Django generates will first disable the FK constraint (if the field is a foreign key), drop the existing full index and create the new partial index. Executing this migration may cause both downtime and degraded performance, so we won't actually run it.</p>
</li>
<li>
<p><strong>Create the partial indexes manually</strong>: Use Django's <a href="https://docs.djangoproject.com/en/3.1/ref/django-admin/#django-admin-sqlmigrate" rel="noopener"><code>./manage.py sqlmigrate</code></a> utility to produce a script for the migration, extract only the <code>CREATE INDEX</code> statements and adjust them to create the indexes <code>CONCURRENTLY</code>. Then, create the indexes manually and concurrently in the database. Since the full indexes are not dropped yet, they can still be used by queries so performance should not be impacted in the process. It is possible to <a href="how-to-create-django-index-without-downtime">create indexes concurrently in Django migrations</a>, but this time we decided it's best to do it manually.</p>
</li>
<li>
<p><strong>Reset full index statistics counters</strong>: To make sure it's safe to drop the full indexes, we wanted to first make sure the new partial indexes are being used. To keep track of their use we reset the counters for the full indexes using <a href="https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-FUNCTIONS" rel="noopener"><code>pg_stat_reset_single_table_counters(&lt;full index oid&gt;)</code></a>.</p>
</li>
<li>
<p><strong>Monitor use of partial indexes</strong>: After resetting the stats we monitored both overall query performance and the partial index usage by observing the values of <code>idx_scan</code>, <code>idx_tup_read</code> and <code>idx_tup_fetch</code> in the <a href="https://www.postgresql.org/docs/13/monitoring-stats.html#MONITORING-PG-STAT-ALL-INDEXES-VIEW" rel="noopener"><code>pg_stat_all_indexes</code></a> view, for both the partial and the full indexes.</p>
</li>
<li>
<p><strong>Drop the full indexes</strong>: Once we were convinced the partial indexes were being used, we dropped the full indexes. Right before dropping is a good point to check the sizes of both the partial and the full indexes, to find out exactly how much storage you are about to free.</p>
</li>
<li>
<p><strong>Fake the Django migration</strong>: Once the database state was effectively in sync with the model state, we faked the migration using <code>./manage.py migrate --fake</code>. When faking a migration, Django will register the migration as executed, but it won't actually execute anything. This is useful for situations like this when you need better control over a migration process. Note that in other environments such as dev, QA or staging, where there are no downtime considerations, the Django migrations will execute normally and the full indexes will be replaced with the partial ones. For more advanced Django migration operations such as "fake", check out <a href="move-django-model">How to Move a Django Model to Another App</a>.</p>
</li>
</ol>
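<p>The second step, extracting the <code>CREATE INDEX</code> statements from the migration script and making them <code>CONCURRENTLY</code>, can be sketched with a few lines of Python. This is not the script we used, just a minimal illustration. The function name is made up, it assumes one statement per line, and the sample script below is hypothetical and only resembles what <code>sqlmigrate</code> might print.</p>

```python
import re

# Start of a CREATE INDEX statement, with an optional UNIQUE and an
# optional CONCURRENTLY that may already be there.
CREATE_INDEX = re.compile(
    r"^CREATE\s+(UNIQUE\s+)?INDEX\s+(?:CONCURRENTLY\s+)?",
    re.IGNORECASE,
)


def concurrent_index_statements(script):
    """Keep only CREATE INDEX statements and make each one CONCURRENTLY.

    Assumes one SQL statement per line, which is a simplification.
    """
    statements = []
    for line in script.splitlines():
        line = line.strip()
        match = CREATE_INDEX.match(line)
        if match is None:
            # Anything else (BEGIN, COMMIT, DROP INDEX, ...) is skipped.
            continue
        unique = "UNIQUE " if match.group(1) else ""
        rest = line[match.end():]
        statements.append(f"CREATE {unique}INDEX CONCURRENTLY {rest}")
    return statements


# Hypothetical script, made up for illustration only.
sample_script = """
BEGIN;
DROP INDEX IF EXISTS "app_transaction_cancelled_by_user_id_ix";
CREATE INDEX "tx_cancelled_by_part_ix" ON "app_transaction" ("cancelled_by_user_id") WHERE "cancelled_by_user_id" IS NOT NULL;
COMMIT;
"""

for statement in concurrent_index_statements(sample_script):
    print(statement)
```

<p>Dropping <code>BEGIN</code> and <code>COMMIT</code> is deliberate: PostgreSQL does not allow <code>CREATE INDEX CONCURRENTLY</code> to run inside a transaction block, so the extracted statements need to be executed on their own.</p>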
<hr>
<h2 id="conclusion"><a class="toclink" href="#conclusion">Conclusion</a></h2>
<p>Optimizing disks, storage parameters and configuration can only affect performance so much. At some point, to squeeze that final drop of performance you need to make changes to the underlying objects. In this case, it was the index definition.</p>
<p>To sum up the process we took to clear as much storage as we could:</p>
<ul>
<li>Remove unused indexes</li>
<li>Repack tables and indexes (and activate B-Tree deduplication when possible)</li>
<li>Utilize partial indexes to index only what's necessary</li>
</ul>
<p>Hopefully, after applying these techniques you can gain a few more days before you need to reach into your pocket and provision more storage.</p>Re-Introducing Hash Indexes in PostgreSQL2021-01-11T00:00:00+02:002021-01-11T00:00:00+02:00Haki Benitatag:hakibenita.com,2021-01-11:/postgresql-hash-index<p>There is a type of index you are probably not using, and may have never even heard of. It is wildly unpopular, and until a few PostgreSQL versions ago it was highly discouraged and borderline unusable, but under some circumstances it can out-perform even a B-Tree index.</p><hr>
<p>If you work with databases you are probably familiar with B-Tree indexes. They are used to enforce unique and primary-key constraints, and are the default index type in most database engines. If you work with text, geography or other complex types in PostgreSQL, you may have dipped your toes into GIN or GIST indexes. If you manage a read intensive database (or just follow this blog), you may also be familiar with BRIN indexes.</p>
<p>There is another type of index you are probably not using, and may have never even heard of. It is wildly unpopular, and until a few PostgreSQL versions ago it was highly discouraged and borderline unusable, but under some circumstances it can out-perform even a B-Tree index.</p>
<p><strong>In this article we re-introduce the Hash index in PostgreSQL!</strong></p>
<p><em>This article was co-authored by Michael from <a href="https://www.pgmustard.com/" rel="noopener">pgMustard</a></em></p>
<figure><img alt="<small>Photo by <a href="https://unsplash.com/photos/lRoX0shwjUQ">Jan Antonin Kolar</a></small>" src="https://hakibenita.com/images/00-postgresql-hash-index.png"><figcaption><small>Photo by <a href="https://unsplash.com/photos/lRoX0shwjUQ">Jan Antonin Kolar</a></small></figcaption>
</figure>
<p><details class="toc-container" open>
<summary>Table of Contents</summary></p>
<div class="toc">
<ul>
<li><a href="#hash-index">Hash Index</a><ul>
<li><a href="#hash-function">Hash Function</a></li>
<li><a href="#hash-collision">Hash Collision</a></li>
<li><a href="#index-split">Index Split</a></li>
</ul>
</li>
<li><a href="#hash-index-in-postgresql">Hash Index in PostgreSQL</a><ul>
<li><a href="#creating-hash-indexes">Creating Hash Indexes</a></li>
<li><a href="#hash-index-size">Hash Index Size</a></li>
<li><a href="#hash-index-fillfactor">Hash Index fillfactor</a></li>
</ul>
</li>
<li><a href="#hash-index-performance">Hash Index Performance</a><ul>
<li><a href="#hash-index-insert-performance">Hash Index Insert Performance</a></li>
<li><a href="#hash-index-select-performance">Hash Index Select Performance</a></li>
<li><a href="#hash-index-limitations">Hash Index Limitations</a></li>
</ul>
</li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</div>
<p></details></p>
<hr>
<h2 id="hash-index"><a class="toclink" href="#hash-index">Hash Index</a></h2>
<p>Just like the name suggests, Hash indexes in PostgreSQL use a form of the hash table data structure. The hash table is a common data structure in many programming languages. For example, a <a href="https://docs.python.org/3/library/stdtypes.html#typesmapping" rel="noopener">dict</a> in Python, a <a href="https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html" rel="noopener">HashMap</a> in Java or the new <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map" rel="noopener">Map</a> type in JavaScript.</p>
<p>To understand how you can benefit from Hash indexes, it's best to understand how they work.</p>
<h3 id="hash-function"><a class="toclink" href="#hash-function">Hash Function</a></h3>
<p>Hash indexes use a <em>hash function</em>. PostgreSQL's hash function maps any database value to a 32-bit integer, the <em>hash code</em> (about 4 billion possible hash codes). A good hash function can be computed quickly and "jumbles" the input uniformly across its entire range.</p>
<p>The hash codes are divided into a limited number of <em>buckets</em>. The buckets map the hash codes to the actual table rows.</p>
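<p>To make the idea more concrete, here is a toy model in Python. It is only a sketch and not how PostgreSQL implements Hash indexes: <code>zlib.crc32</code> stands in for PostgreSQL's hash function simply because it also produces a 32-bit integer, the bucket is picked with a plain modulo, and the class and function names are made up.</p>

```python
import zlib


def hash_code(value):
    """Map a value to a 32-bit integer.

    Stand-in only: this is NOT PostgreSQL's actual hash function.
    """
    return zlib.crc32(repr(value).encode("utf-8")) & 0xFFFFFFFF


class ToyHashIndex:
    """Toy hash index: buckets hold (hash code, row id) pairs."""

    def __init__(self, n_buckets=4):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, code):
        # Simplification: choose the bucket with a modulo on the hash code.
        return self.buckets[code % len(self.buckets)]

    def insert(self, value, row_id):
        code = hash_code(value)
        self._bucket(code).append((code, row_id))

    def lookup(self, value):
        """Return the ids of rows whose hash code matches the value's."""
        code = hash_code(value)
        return [row_id for c, row_id in self._bucket(code) if c == code]


index = ToyHashIndex(n_buckets=4)
index.insert("https://hakibenita.com", 1)
index.insert("https://www.pgmustard.com", 2)
index.insert("https://hakibenita.com", 3)

print(index.lookup("https://hakibenita.com"))
```

<p>A lookup only has to compute one hash code and search a single bucket, no matter how many values are indexed in the other buckets.</p>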
<figure>
<svg version="1.1" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 264 328.2462667685287" width="20em" height="328.2462667685287"><g transform="translate(11.666666666666572 132.59615384615387) rotate(0 25.615384615384613 24.730769230769226)"><path d="M0.19221067428588867 -1.79854154586792 C9.969303963734554 -1.6432035889992347, 22.75762772193322 -1.9653624024757972, 52.3237825173598 0.8443570137023926 M-0.6429002285003662 0.5630700588226318 C20.33801717024583 1.0528750944137573, 40.39535058461703 0.7731586027145385, 50.664933516429016 0.5442330837249756 M52.2408967751723 -0.13330411911010742 C52.423765991834486 17.79776027202606, 51.24861035218605 38.03790669441223, 51.85605485622699 48.62396493324866 M50.888993575022766 0.14732146263122559 C50.99830201937602 19.028628809635453, 51.12237886263774 36.86086389101468, 51.06591160480792 49.61907568344702 M53.07395036403949 49.65802255043616 C33.88981666931739 50.1807713996447, 16.31066344334529 48.861542273668135, 0.21956968307495117 48.30971017250647 M51.35833103840167 48.99755468735327 C34.6531761096074 50.69362970829009, 19.977562196438125 50.10776087284087, -0.49410414695739746 48.560260681005616 M0.8632464408874512 51.23048653969397 C0.8073489297353305 31.131840854424695, 0.988638602586893 17.248527200405412, 1.5477404594421387 -1.02028226852417 M-0.19582533836364746 48.776630309911866 C0.9785774012712332 29.629223884068995, -0.1458809117170481 9.229620192601125, -0.2645127773284912 -0.29627442359924316" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(106.58974358974342 135.13461538461547) rotate(0 24.84615384615384 22.03846153846152)"><path d="M-1.79854154586792 -1.3989605903625488 C12.821326706959649 -0.026658346102787833, 26.547151991037218 1.0452563267487747, 50.53666470601007 1.9574084281921387 M0.5630700588226318 -0.04519057273864746 C10.698960531674896 -0.05159294165097761, 20.51234123890216 -0.24636375463925886, 50.236540776032655 0.6751844882965088 M49.55900357319757 
1.2768664360046387 C49.711678131727055 12.57739543731395, 51.77379666200049 25.57665490737326, 48.854734164017884 45.565139000232364 M49.839629154938905 -0.6456773281097412 C50.38517004288158 16.383737664956296, 49.698471129307364 33.645818673647334, 49.84984491421625 44.55616182547345 M49.888791781205384 44.560378258044864 C38.96100327785197 45.263592394682036, 26.045421805748564 44.7544523800336, -1.1518282890319824 44.17358321409955 M49.2283239181225 43.1161593840672 C31.08563929337721 44.120763305150504, 10.91731990300692 43.64620733591222, -0.9012777805328369 44.49717897635236 M1.7689480781555176 44.96960372191205 C-1.3148316979408265 31.167395432178765, 1.718966519832611 17.657667611195475, -1.02028226852417 1.9546160697937012 M-0.6849081516265869 44.87553590994611 C-0.4762380246015693 32.13172847491042, -1.203464281925788 20.57371205549972, -0.29627442359924316 0.5372984409332275" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(200.05128205128187 138.98076923076928) rotate(0 24.846153846153868 18.961538461538453)"><path d="M-1.3989605903625488 -1.2266573905944824 C18.969163957008966 -1.1019478045977076, 38.98914397313048 1.29122541317573, 51.649716120499875 -1.93776273727417 M-0.04519057273864746 0.6058633327484131 C10.200747555952816 0.046344334895793926, 20.57778920026928 0.2644305775715752, 50.367492180604245 -0.16956543922424316 M50.969174128312375 -0.3802676200866699 C49.98555421150652 16.49719568582681, 50.50912904060808 28.79311602299029, 51.18052361561706 39.59391098756055 M49.046630364197995 0.5096948146820068 C49.041095362626635 12.775863047746506, 49.397838221513354 26.063271515186006, 50.17154644085815 37.256708200161256 M50.17576287342956 38.78812485474805 C36.92768673163197 37.51771446778222, 22.92521605858438 36.82006737305566, 0.09666013717651367 39.80455857056836 M48.7315439994519 38.898859079067506 C32.27894599621115 38.08242599890781, 14.410576508595405 36.79123489783359, 0.4202558994293213 
38.09559827584485 M0.8926806449890137 36.51049118775586 C0.9102855119338403 29.00842637098751, 1.3840556535354027 15.831068585469161, 1.9546160697937012 -0.7276949882507324 M0.7986128330230713 37.7631397797511 C-0.2515824862626883 22.937588447790866, -0.19024978362596956 9.430152350205635, 0.5372984409332275 -0.06695771217346191" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(10 284.2692307692306) rotate(0 122 15.5)"><path d="M-1.4425529179052352 -1.062887403612709 C84.43206217809087 1.87951077746899, 168.00396656412605 1.340752656552242, 244.53920865744266 -0.935478985112953 M0.749495891116619 0.4175032903210639 C86.42419033096954 0.8697627262504117, 172.8014032617404 0.38405045267558263, 244.69513824538603 0.2584414289715767 M244.8311266899109 -0.6835513114929199 C244.5253163933754 6.7104063749313365, 245.25006309747695 14.960321092605593, 243.64599180221558 30.670284748077393 M243.71537613868713 0.9215905666351318 C244.24680605649948 10.030092978477478, 244.5810879778862 19.941085624694825, 243.7729094028473 31.109784841537476 M244.8271357903101 31.21089405867977 C184.08521485743654 31.30906240437859, 127.31784122710208 32.865145402602465, 0.644631839486122 30.183112634601404 M243.81666848886442 31.356794774506856 C193.52231486539944 31.716934098833367, 140.99850942660245 31.791864275063222, 0.6700753280947684 31.63970806257143 M1.4352221488952637 30.608349323272705 C-1.6727817940711975 24.878663182258606, -0.622839777469635 14.911184072494507, -1.7743268013000488 -0.5290255546569824 M-0.11572432518005371 30.87109923362732 C-0.3354560923576355 21.956659626960754, 0.5159156727790832 14.141585254669188, 0.9930784702301025 0.8908364772796631" stroke="currentColor" stroke-width="1" fill="none"></path></g><g><g transform="translate(41.5 283.2692307692306) rotate(0 -0.24344904067797302 16.591623114454535)"><path d="M-1.0470545768737793 -0.7714802742004394 C-1.2968480745951336 4.829924551645915, -0.45333102544148773 
26.97473758061727, -0.32431592941284193 32.445771312713624 M0.6041030409280213 1.4378886365052312 C0.1526466680907957 7.340648744121815, -0.7074486827508856 28.671301598663753, -0.917627255981788 33.95472650310956" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(78.5 285.0192307692306) rotate(0 -0.5031951796170233 16.605925130802234)"><path d="M-0.7714802742004394 0.6756840705871581 C-0.7534087816874185 6.388032499949136, -0.9419290860493977 27.857753976186117, -1.0542286872863769 33.227035999298096 M1.0243538525048645 -0.015185737693682366 C0.9813915897750609 5.446763365284228, -1.4525607251779484 26.357901601906242, -2.030744211738929 31.48615515015088" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(115 282.5192307692306) rotate(0 -0.07653179319780179 16.115036894148204)"><path d="M0.6756840705871581 -0.054228687286376864 C0.8046991666158039 5.416805044809978, -0.058912690480550145 26.74218448003133, -0.2729640007019043 32.05472211837768 M-0.4287205216940493 -1.1283026934508236 C-0.3024937890625249 4.575641496197011, -0.7159607219354559 27.706658120270202, -0.8393155646976083 33.35837648174726" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(149 283.0192307692306) rotate(0 -1.3406614601938145 16.47725108142481)"><path d="M-0.054228687286376864 0.7270359992980957 C-0.16652828852335605 6.096318022410075, -1.1744821866353352 27.487035592397056, -1.4452778816223144 32.891376209259036 M-1.5418374774511905 0.06312595359049733 C-1.7836156581497442 5.00189878703716, -2.4172042035715036 25.792556066627927, -2.627094233101234 30.97427397034131" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(183 284.5192307692306) rotate(0 -0.3120803726045409 15.638646589582777)"><path d="M0.7270359992980957 -0.4452778816223144 C0.5129846890767416 4.867259756724039, -0.42963107426961267 26.195080598195393, 
-0.6086237907409668 31.820997142791747 M-0.3504088304098696 -1.724652714813128 C-0.7473583673095952 3.7375849318659546, -1.2813062572137763 27.482324700470393, -1.3511967445071786 33.0019458939787" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(217 284.0192307692306) rotate(0 -0.07044146727900369 16.35216223712547)"><path d="M-0.4452778816223144 0.39137620925903316 C-0.7160735766092936 5.795716826121013, -1.721586068471273 27.835555489857995, -1.679002857208252 33.15307970046997 M1.521812501186505 -0.44875522621907304 C1.6483277775192011 4.5434819626963385, 1.018462376628692 25.714048910255855, 0.6764751791302115 31.37337179443799" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(40.05825821257167 195.5068732198265) rotate(0 42.66547450719139 37.72219758300872)"><path d="M0.39137620925903316 -0.6790028572082519 C14.635743380143058 12.08244312693308, 72.03568825996823 63.27669813283435, 86.19323902460225 76.12339802322568 M-0.8622900102194399 1.5789166974183173 C13.217584695704966 14.044586615673627, 71.36698602319106 62.25102666198922, 85.58806040372178 74.42618950207375" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(40.05825821257167 195.5068732198265) rotate(0 42.66547450719139 37.72219758300872)"><path d="M56.27913662900037 65.21696416960359 C68.88329802068695 67.41067472443576, 79.50662756242923 72.47313996643902, 85.45475628461168 75.70305593807839 M57.72345087040967 63.70971123800874 C66.07758382065266 67.62047886489466, 73.66647007325521 69.33679969492935, 85.73538186635301 73.78051217396401" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(40.05825821257167 195.5068732198265) rotate(0 42.66547450719139 37.72219758300872)"><path d="M69.49058989751853 49.51417763601024 C77.27261426851784 57.413187146680364, 83.13207431893325 68.1378626070296, 85.45475628461168 75.70305593807839 M70.93490413892783 
48.00692470441539 C75.58478121784738 56.44106370170914, 79.46142588450977 62.5696574286289, 85.73538186635301 73.78051217396401" stroke="#f41d92" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(116.42434244501254 184.75866006214244) rotate(0 59.45567138190653 47.41568588349551)"><path d="M-0.6790028572082519 0.6530797004699707 C19.32693439323235 16.16292202302737, 99.4991544643307 77.84927922829603, 119.5903456210213 93.45043014460343 M1.1653819134179502 -0.049657402122393224 C21.12315406095948 15.61250287920719, 99.31628733631055 79.50014256751714, 119.0676663850209 94.88102916911345" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(116.42434244501254 184.75866006214244) rotate(0 59.45567138190653 47.41568588349551)"><path d="M91.73378960634192 86.63085537705872 C97.80955392309608 89.77859696588061, 106.63086184163672 90.35584681871104, 120.34453282102554 94.50076154902678 M90.22653667474707 86.02459436212037 C100.23271971970914 88.5361761061078, 107.72677287232753 91.19692099690045, 118.42198905691116 95.39072398379545" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(116.42434244501254 184.75866006214244) rotate(0 59.45567138190653 47.41568588349551)"><path d="M104.55446461611936 70.60742582282448 C107.31201659003598 77.89507908920568, 112.76285225642022 82.68478457762023, 120.34453282102554 94.50076154902678 M103.04721168452451 70.00116480788613 C109.27673727120045 77.44134612612115, 112.85938786763218 84.9906076722, 118.42198905691116 95.39072398379545" stroke="#f41d92" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(223.39278132660553 180.04746221468122) rotate(0 -94.601415491361 51.59332348246434)"><path d="M0.6530797004699707 0.8102213859558105 C-30.878989499882103 18.089499078271373, -157.36027838625134 86.16025669811958, -188.80103899285342 102.99666329668514 M-0.46319218612276014 0.18998366824351232 C-32.20533246471571 17.21824359545431, 
-158.5266107042053 84.2929367439981, -189.85591068319192 101.44529896803812" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(223.39278132660553 180.04746221468122) rotate(0 -94.601415491361 51.59332348246434)"><path d="M-168.54181239031473 78.71309970256111 C-174.8457524523537 87.61927071278855, -183.43265250467223 95.33427297208146, -190.2361783032786 101.54700976439432 M-169.1480734052531 79.88764761325142 C-177.0745273140418 87.22786010779531, -185.010688807317 95.52496907397051, -189.34621586850992 100.9236984831901" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(223.39278132660553 180.04746221468122) rotate(0 -94.601415491361 51.59332348246434)"><path d="M-158.8184508978827 96.78452033496656 C-168.2542692191381 99.73278923031133, -180.07530208269986 101.43697162462958, -190.2361783032786 101.54700976439432 M-159.42471191282107 97.95906824565687 C-171.1423354859107 98.3174626375699, -182.8131676457216 99.6734736520745, -189.34621586850992 100.9236984831901" stroke="#f41d92" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(219.19289863647785 185.5522747426461) rotate(0 -24.301407397243516 45.009824274286075)"><path d="M0.8102213859558105 -0.20347852706909175 C-7.249311660667366 14.916934397467859, -40.53379699657413 76.08571072940472, -49.03620113694731 91.37555724769167 M-0.22355111575685438 -1.3559086991194635 C-8.286490964497855 13.35475919590842, -40.948312607870655 74.41263783510533, -49.41303618044283 89.57117267539994" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(219.19289863647785 185.5522747426461) rotate(0 -24.301407397243516 45.009824274286075)"><path d="M-45.26910895639891 61.72139377232179 C-44.84966387542568 67.02548904590631, -45.67417497618659 73.99902243783148, -49.31132538408663 91.2530235855867 M-44.0945610457086 60.32359070416078 C-46.88105608033692 69.91504812756526, -48.67003477292695 80.10376199307848, 
-49.93463666529085 89.39416857650772" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(219.19289863647785 185.5522747426461) rotate(0 -24.301407397243516 45.009824274286075)"><path d="M-27.250843037716315 71.54290565540174 C-31.270791812610604 74.57794011983158, -36.48124485224933 79.1607561660591, -49.31132538408663 91.2530235855867 M-26.076295127026007 70.14510258724073 C-34.79046687918523 76.61711928693661, -42.54825348111142 83.55231707748823, -49.93463666529085 89.39416857650772" stroke="#f41d92" stroke-width="1" fill="none"></path></g></g><g transform="translate(35.78205128205116 146.8269230769231) rotate(0 1.5 10.5)"><text x="1.5" y="15" font-size="16px" fill="#f41d92" text-anchor="middle" style="white-space: pre;" direction="ltr">1</text></g><g transform="translate(125.43589743589726 146.67307692307702) rotate(0 6 10.5)"><text x="6" y="15" font-size="16px" fill="#f41d92" text-anchor="middle" style="white-space: pre;" direction="ltr">2</text></g><g transform="translate(218.89743589743574 147.44230769230774) rotate(0 6 10.5)"><text x="6" y="15" font-size="16px" fill="#f41d92" text-anchor="middle" style="white-space: pre;" direction="ltr">3</text></g><g transform="translate(106.84615384615387 10) rotate(0 30 25.769230769230774)"><path d="M31 0 C39.68177935369313 7.783664248138667, 48.36355870738626 15.567328496277334, 60 26 M31 0 C38.24992911927402 6.499936451762915, 45.49985823854804 12.99987290352583, 60 26 M60 26 C50.50674240104854 34.36011014019067, 41.01348480209708 42.72022028038134, 31 51.53846153846152 M60 26 C49.55936349518597 35.1944066832845, 39.11872699037194 44.388813366568996, 31 51.53846153846152 M31 51.53846153846152 C21.15606831945479 43.42882055101485, 11.312136638909578 35.319179563568184, 0 26 M31 51.53846153846152 C24.131178402528167 45.87977972615222, 17.262356805056335 40.221097913842925, 0 26 M0 26 C7.858693736419082 19.40883751139045, 15.717387472838164 12.817675022780895, 31 0 M0 26 C7.245209634676576 
19.923372564464806, 14.490419269353152 13.846745128929614, 31 0" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(126.84615384615387 27.576923076923094) rotate(0 10 10.5)"><text x="10" y="15" font-size="16px" fill="currentColor" text-anchor="middle" style="white-space: pre;" direction="ltr">f()</text></g><g><g transform="translate(137.3970474355813 65.33645052431393) rotate(0 -52.383310467614095 31.177928583996874)"><path d="M0 0 C-25.991002644063574 15.469538237070433, -51.98200528812715 30.939076474140865, -104.76662093522819 62.35585716799375 M0 0 C-29.585556689429794 17.608974410869195, -59.17111337885959 35.21794882173839, -104.76662093522819 62.35585716799375" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(137.3970474355813 65.33645052431393) rotate(0 -52.383310467614095 31.177928583996874)"><path d="M-85.78976298951312 39.12056370076698 C-90.49763235324909 44.884886044575794, -95.20550171698504 50.64920838838461, -104.76662093522819 62.35585716799375 M-85.78976298951312 39.12056370076698 C-91.14873039276 45.68209146537034, -96.50769779600687 52.243619229973696, -104.76662093522819 62.35585716799375" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(137.3970474355813 65.33645052431393) rotate(0 -52.383310467614095 31.177928583996874)"><path d="M-75.29414560989099 56.75467888147318 C-82.60581775575058 58.14424588626216, -89.91748990161017 59.53381289105114, -104.76662093522819 62.35585716799375 M-75.29414560989099 56.75467888147318 C-83.61702155770524 58.33642291639734, -91.93989750551951 59.918166951321496, -104.76662093522819 62.35585716799375" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(136.4225195153905 65.3421809033423) rotate(0 -0.6857927390579732 31.377114064155194)"><path d="M0 0 C-0.5008379759127923 22.914868301846877, -1.0016759518255847 45.82973660369375, -1.3715854781159464 
62.75422812831039 M0 0 C-0.38426774981563094 17.581424139461546, -0.7685354996312619 35.16284827892309, -1.3715854781159464 62.75422812831039" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(136.4225195153905 65.3421809033423) rotate(0 -0.6857927390579732 31.377114064155194)"><path d="M-11.013736285378604 34.345973556619356 C-7.49288004214336 44.71932100445397, -3.9720237989081166 55.09266845228858, -1.3715854781159464 62.75422812831039 M-11.013736285378604 34.345973556619356 C-8.312360648783699 42.30492050861567, -5.610985012188795 50.26386746061199, -1.3715854781159464 62.75422812831039" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(136.4225195153905 65.3421809033423) rotate(0 -0.6857927390579732 31.377114064155194)"><path d="M9.502572527769305 34.794387505319676 C5.531845677478852 45.00399544291492, 1.5611188271884 55.213603380510165, -1.3715854781159464 62.75422812831039 M9.502572527769305 34.794387505319676 C6.456033836064143 42.62770537928748, 3.4094951443589814 50.46102325325527, -1.3715854781159464 62.75422812831039" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(137.45579120803774 67.48422409514626) rotate(0 45.52114500138972 32.22916939133012)"><path d="M0 0 C23.15651321601319 16.39491245065737, 46.31302643202638 32.78982490131474, 91.04229000277944 64.45833878266023 M0 0 C20.505074218864813 14.51768206536577, 41.010148437729626 29.03536413073154, 91.04229000277944 64.45833878266023" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(137.45579120803774 67.48422409514626) rotate(0 45.52114500138972 32.22916939133012)"><path d="M62.10538237018054 56.54282384916159 C69.46545518913504 58.556126965367426, 76.82552800808953 60.56943008157326, 91.04229000277944 64.45833878266023 M62.10538237018054 56.54282384916159 C68.622721055892 58.32560217324557, 75.14005974160347 60.10838049732954, 91.04229000277944 
64.45833878266023" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(137.45579120803774 67.48422409514626) rotate(0 45.52114500138972 32.22916939133012)"><path d="M73.96332763332161 39.794416864549895 C78.30734421959232 46.06766014344812, 82.65136080586304 52.34090342234635, 91.04229000277944 64.45833878266023 M73.96332763332161 39.794416864549895 C77.80995089989631 45.34936884833347, 81.656574166471 50.904320832117044, 91.04229000277944 64.45833878266023" stroke="currentColor" stroke-width="1" fill="none"></path></g></g></svg>
<figcaption>Hash Index</figcaption>
</figure>
<p>A simple hash function for an integer type is modulo: divide a number by another number, and the remainder is the hash code. For example, to divide values across 3 buckets you can use the hash function <code>mod(3)</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">mod</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">bucket</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">10</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">n</span><span class="p">;</span>
<span class="go"> n  │ bucket</span>
<span class="go">────┼────────</span>
<span class="go">  1 │      1</span>
<span class="go">  2 │      2</span>
<span class="go">  3 │      0</span>
<span class="go">  4 │      1</span>
<span class="go">  5 │      2</span>
<span class="go">  6 │      0</span>
<span class="go">  7 │      1</span>
<span class="go">  8 │      2</span>
<span class="go">  9 │      0</span>
<span class="go"> 10 │      1</span>
</pre></div>
<p>When a new value is added to the index, PostgreSQL applies the hash function to the value and puts the hash code and a pointer to the tuple in the appropriate bucket. In the example above using the hash function <code>mod(3)</code>, if you insert the value <code>5</code> the index entry will be added to bucket <code>2</code>, because <code>5 % 3 = 2</code>.</p>
<p>When you query a value using a Hash index, PostgreSQL follows the same path in order to read. It takes the value and applies the same hash function to determine which bucket may hold matching tuples. Once the bucket is identified, PostgreSQL fetches the tuples referenced in that bucket and matches them against your query.</p>
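<p>To make the insert and lookup steps concrete, here is a minimal Python sketch of a toy index that uses the same <code>mod(3)</code> hash function. The names <code>ToyHashIndex</code>, <code>bucket_for</code> and the integer tuple pointers are made up for illustration; this is not how PostgreSQL lays out a Hash index on disk.</p>

```python
from collections import defaultdict

N_BUCKETS = 3  # same number of buckets as the mod(3) example


def bucket_for(value):
    # Toy hash function from the text: the remainder is the bucket number.
    return value % N_BUCKETS


class ToyHashIndex:
    """Illustrative only: maps a bucket number to (value, tuple pointer) entries."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def insert(self, value, tuple_id):
        # Hash the value and store the entry in the matching bucket.
        self.buckets[bucket_for(value)].append((value, tuple_id))

    def lookup(self, value):
        # Hash the searched value, visit a single bucket,
        # and keep only the entries that really match.
        entries = self.buckets[bucket_for(value)]
        return [tuple_id for v, tuple_id in entries if v == value]


index = ToyHashIndex()
# Hypothetical tuple pointers 0..9 for the values 1..10.
for tuple_id, value in enumerate(range(1, 11)):
    index.insert(value, tuple_id)

print(bucket_for(5))    # 2
print(index.lookup(5))  # [4]
```

<p>Looking up <code>5</code> only visits bucket <code>2</code>, which also holds the entries for <code>2</code> and <code>8</code>, so the lookup still has to compare each entry against the searched value.</p>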
<h3 id="hash-collision"><a class="toclink" href="#hash-collision">Hash Collision</a></h3>
<p>You may have noticed that multiple values can map to the same bucket; this is called a <em>collision</em>. For example, the hash function <code>mod(3)</code> returned the hash code <code>2</code> for the values <code>2</code>, <code>5</code> and <code>8</code>.</p>
<p>What PostgreSQL actually does is first apply a hash function that matches the data type to produce an integer hash code:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">hashtext</span><span class="p">(</span><span class="s1">'text'</span><span class="p">),</span>
<span class="w"> </span><span class="n">hashchar</span><span class="p">(</span><span class="s1">'c'</span><span class="p">),</span>
<span class="w"> </span><span class="n">hash_array</span><span class="p">(</span><span class="k">array</span><span class="p">[</span><span class="mf">1</span><span class="p">,</span><span class="mf">2</span><span class="p">,</span><span class="mf">3</span><span class="p">]),</span>
<span class="w"> </span><span class="n">jsonb_hash</span><span class="p">(</span><span class="s1">'{"me": "haki"}'</span><span class="o">::</span><span class="nb">jsonb</span><span class="p">),</span>
<span class="w"> </span><span class="n">timestamp_hash</span><span class="p">(</span><span class="n">now</span><span class="p">()</span><span class="o">::</span><span class="nb">timestamp</span><span class="p">);</span>
<span class="go">─[ RECORD 1 ]──┬────────────</span>
<span class="go">hashtext       │ -451854347</span>
<span class="go">hashchar       │ 203891234</span>
<span class="go">hash_array     │ -325393530</span>
<span class="go">jsonb_hash     │ -1784498999</span>
<span class="go">timestamp_hash │ 1082344883</span>
</pre></div>
<p>It then uses <code>mod(n_buckets)</code> to determine which bucket the tuple should be put in. This can cause multiple values to end up in the same bucket. This is why even after the bucket is identified, the database still needs to sift through the hash codes in the bucket and recheck the condition to filter only the matching tuples.</p>
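<p>The two steps and the recheck can be sketched in Python. This is a rough illustration under stated assumptions: <code>zlib.crc32</code> stands in for PostgreSQL's own hash functions, and the bucket count, the <code>table</code> dictionary and its tuple pointers are all hypothetical.</p>

```python
import zlib

N_BUCKETS = 4  # hypothetical bucket count


def hash_code(value):
    # Stand-in for a function like hashtext(); PostgreSQL uses its own hash functions.
    return zlib.crc32(value.encode("utf-8"))


# Hypothetical table: tuple pointer mapped to the indexed column value.
table = {0: "haki", 1: "python", 2: "excel", 3: "postgres", 4: "index", 5: "hash"}

# Build the index: each bucket stores (hash code, tuple pointer) pairs.
buckets = [[] for _ in range(N_BUCKETS)]
for tuple_id, value in table.items():
    code = hash_code(value)
    buckets[code % N_BUCKETS].append((code, tuple_id))


def lookup(value):
    code = hash_code(value)
    matches = []
    for stored_code, tuple_id in buckets[code % N_BUCKETS]:
        if stored_code != code:
            continue  # landed in the same bucket with a different hash code
        if table[tuple_id] == value:  # recheck the condition against the actual tuple
            matches.append(tuple_id)
    return matches


print(lookup("postgres"))  # [3]
print(lookup("missing"))   # []
```

<p>With six values spread over four buckets, at least one bucket must hold more than one entry, which is why the lookup compares the stored hash codes and then rechecks the value itself.</p>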
<figure>
<svg version="1.1" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 264 404.84182842513394" width="20em" height="404.84182842513394"><g transform="translate(15.512820512820554 205.0482045158265) rotate(0 25.615384615384613 24.730769230769226)"><path d="M-0.19955050200223923 -0.5022821500897408 C12.515398778651768 1.3839953097529136, 27.426766821512807 -1.110534175949601, 52.87060018514211 0.2799985334277153 M-0.22747409716248512 -0.4021441303193569 C14.981192744465973 -0.13306419310088335, 30.730505185459666 -0.04445461687560259, 51.473113606468985 0.20535940304398537 M52.097404918418476 1.3712785169482231 C52.5340261001627 20.319714284659575, 50.905008985523416 37.343093594851396, 52.55776568387563 48.9073263607346 M51.521042416588614 -0.22304731234908104 C50.4750079852142 14.56966808911126, 50.79115876970669 27.401877051362618, 51.53992934037859 48.92918941521873 M52.43172617887075 48.85438027003636 C30.712035871411743 50.321871338985275, 10.87089323240977 51.26784783472807, -0.3571072742342949 47.53630389788975 M51.303831050888846 50.22865745568504 C38.01795174353397 49.79344410992012, 25.740247332419337 48.88824112511024, -0.9162337817251682 49.94066080594291 M0.5526808574795723 47.91920985797276 C-1.5679373418683042 34.313584183700954, 0.46043995267210097 20.68671064565961, 0.6570998504757881 -1.1331176832318306 M0.9474298916757107 48.849839539768595 C0.9555796319217635 35.72578146867453, 0.6599521571369125 22.443218911496487, 0.9200445376336575 -0.5301238857209682" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(106.58974358974342 210.6635891312111) rotate(0 24.84615384615384 22.03846153846152)"><path d="M-0.5022821500897408 -1.6616669818758965 C12.121727174119306 -0.8777840748142738, 20.07580699393382 1.6894115791011315, 49.972306225735394 -0.19079291075468063 M-0.4021441303193569 0.10068535432219505 C10.70609081238508 1.0825275334543907, 21.318052714318032 0.35538150398203955, 49.897667095351665 0.9037443362176418 
M51.0635862092559 0.9852916076779366 C50.3439581250084 15.412852115384648, 48.05881737221604 33.44015169779839, 49.13809559150383 43.70460517417922 M49.4692603799586 -0.7168144024908543 C49.93026358338788 12.475344237026103, 48.84478182050183 26.061175819658295, 49.159958645987956 45.056108177567864 M49.085149500805585 43.94386107933059 C34.53647095119723 44.78678141304503, 19.36229649094434 44.01820741364012, -1.9252345636487007 43.07930119049087 M50.459426686454265 43.67788952674999 C35.54814443886279 44.250275157157006, 20.247278603624828 43.19699539965731, 0.479122344404459 43.79160982456341 M-1.542328603565693 42.332900357360984 C-1.411919892691076 32.707359693342646, -1.611535132788122 21.426187010281343, -1.1331176832318306 -0.7236872836947441 M-0.6116989217698574 44.75827032413616 C-1.0565939008072016 33.683347519401146, -1.2904292403534052 23.43837695465636, -0.5301238857209682 0.38991236314177513" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(190.82051282051282 126.04820451582646) rotate(0 24.846153846153868 18.961538461538453)"><path d="M-1.6616669818758965 1.6731178686022758 C16.78472960539737 -0.6966138213471723, 37.06401956425267 0.4047963291807818, 49.501514781553055 -1.7645950391888618 M0.10068535432219505 0.18273848667740822 C18.94037387107429 0.47118276150753885, 36.18411130750412 0.40519410164883524, 50.59605202852538 0.6428535617887974 M50.67759929998567 1.011194221675396 C49.21290964458142 7.0095195132665875, 51.18363160464917 16.012719987963248, 49.31998978956392 37.76559357631663 M48.97549328981688 0.651977363973856 C49.401724597854724 7.381244399885715, 50.11154832546245 15.09714855127609, 50.67149279295256 38.167070149515666 M49.55924569471529 38.88739113796214 C36.204669857311735 39.6825121753548, 21.53699129774022 35.84698690107807, -0.9976218864321709 37.71699695575694 M49.29327414213469 38.1745765200601 C34.14174680778616 37.650572666187685, 17.40086018999037 38.783398398418825, 
-0.2853132523596287 37.56707227144102 M-1.7440227195620537 36.2023809431837 C0.179557781617802 27.28250250383637, 0.5523030571152385 19.349432841344512, -0.7236872836947441 -1.108871228992939 M0.6813472472131252 37.23604714784483 C0.026425205815870156 22.445618331575613, 0.5607499710212535 8.279766083909912, 0.38991236314177513 -0.5403187833726406" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(10 359.79820451582634) rotate(0 122 15.5)"><path d="M1.3830574546884684 -0.766591932933393 C70.08647644775337 3.0313232719419307, 141.76242428282572 1.5180658793169806, 242.54132433269913 1.0191277193675368 M0.151057992386882 0.8014145071813656 C67.87393778094331 -0.0027509487309640146, 135.4690212875341 0.5439852753517442, 244.53140512547856 -0.6743016239396497 M245.0111942216754 -0.6437255069613457 C242.93860163901002 11.97028176896274, 243.65833606932313 20.914252728968858, 243.84251665323973 29.121368534862995 M244.65197736397386 -0.4006636179983616 C243.4973509342596 10.509006985649467, 243.39850628685207 21.37887885272503, 244.24399322643876 30.797033425420523 M244.79713568815882 31.340132714068066 C157.7274380126833 30.57061041290852, 66.70518457784806 30.041138744557248, -0.1703528725695461 31.25947859544004 M244.20789831905228 31.11328697277518 C168.47691761982585 29.43776302487535, 94.43064238150487 29.959303890483323, -0.29428583400409963 30.42090998810497 M-1.7206959798932076 31.28435457497835 C-1.6812502864375711 21.500444521382448, -0.8940567496791482 9.96300205811858, -1.108871228992939 1.676905281841755 M-0.6870297752320766 31.385719772428274 C-0.7312384490296244 19.683011873438954, -0.34049721997231247 6.9158470347523675, -0.5403187833726406 0.2720078192651272" stroke="currentColor" stroke-width="1" fill="none"></path></g><g><g transform="translate(41.5 358.79820451582634) rotate(0 -0.8478831860562792 16.463739876449097)"><path d="M1.0038707211613656 -0.5564188197255134 C0.8052911077936491 5.190397640566031, 
-1.1868683700760205 27.99357472707828, -1.4928469702601432 33.48389857262373 M0.07176412043161684 1.7658573545794933 C-0.30159107876631125 7.280370204724991, -2.4360931333185487 26.56502824701679, -2.699637093273923 31.87787057447247" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(78.5 360.54820451582634) rotate(0 0.40188434262697115 16.24821416716088)"><path d="M-0.5564188197255134 -0.49284697026014324 C-0.3929356927673022 4.784507762889067, 0.07690806041161197 27.34565079559882, -0.016101427376270516 32.66799912005663 M1.3523225705791266 -1.797195574985817 C1.5311130503782384 4.221388141267623, 0.10116592317509154 28.626944251510622, -0.4476001403760166 34.29362390930764" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(115 358.04820451582634) rotate(0 -0.12128858226351724 17.158845557896427)"><path d="M-0.49284697026014324 0.9838985726237295 C-0.7988255704442659 6.47422241816918, -0.5710158710678419 27.009554350872833, -0.8320008799433709 32.38552425354719 M1.449269641013816 0.45484137791208923 C0.9121309869208684 6.1882029905697955, -1.496918072331076 28.72439717330349, -1.6918468055408449 33.86284973788075" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(149 358.54820451582634) rotate(0 -0.5693612021720043 15.816707751555384)"><path d="M0.9838985726237295 0.1679991200566291 C0.8908890848358471 5.490347444514434, -0.9071123157938321 26.098070982595285, -1.1144757464528083 31.441242976486684 M0.04130659391172231 -0.7894052872527391 C-0.171054163776959 4.687793655746305, -2.0094651505382113 26.724385036680225, -2.1226209769677373 32.422820790363474" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(183 360.04820451582634) rotate(0 -0.945378951728344 16.009769264995157)"><path d="M0.1679991200566291 -0.11447574645280834 C-0.09298588881889985 5.2614941562215485, -1.8185956840713817 
27.88440085699161, -2.058757023513317 33.239717988669874 M-1.202940071253106 -1.2201794586796315 C-1.061463498600448 3.7287923910996565, 0.2605227128385257 25.788538095134623, 0.0973500755149872 31.50549518394284" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(217 359.54820451582634) rotate(0 -0.9077400482707816 16.28119531495031)"><path d="M-0.11447574645280834 -1.058757023513317 C-0.32183917711178456 4.284414970378082, -0.03226580967505785 26.613767841955024, -0.26028201133012785 32.227031083405016 M-1.6337142426799984 0.999791593803093 C-2.0204647632470976 6.508746632688369, -0.6753242287070802 28.12082274720383, -0.8199755309056491 33.621147653413935" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(39.50643336181476 255.73369756918925) rotate(0 47.1660320627946 45.74964675731104)"><path d="M-1.058757023513317 0.7397179886698722 C14.811713314660995 15.78092279276526, 79.2502595633696 75.11693737314522, 95.3908211491025 90.13275307811952 M0.5862568098027259 0.0824659873824567 C16.26012115595786 15.295830033809825, 78.55011881144337 76.05270114217393, 94.29946700426294 91.4168275272396" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(39.50643336181476 255.73369756918925) rotate(0 47.1660320627946 45.74964675731104)"><path d="M68.18713547028958 78.62330896589998 C74.72460251364633 83.39520403521757, 84.48075692485948 89.00712432738975, 94.84216313006831 91.90151627863912 M66.13855326287805 79.51157500404958 C76.51653828872233 83.35006865960578, 85.10161388463223 86.49051535688916, 93.66249373870446 92.0803257537928" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(39.50643336181476 255.73369756918925) rotate(0 47.1660320627946 45.74964675731104)"><path d="M82.51577591223497 63.93283363419998 C84.5400367147757 73.19111564122828, 89.79699907791733 83.41584411032204, 94.84216313006831 91.90151627863912 
M80.46719370482344 64.82109967234958 C86.38509883209882 73.46053082730776, 90.37733019066611 81.30980284855619, 93.66249373870446 92.0803257537928" stroke="#f41d92" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(116.42434244501248 260.287633808738) rotate(0 58.983241241032175 46.95685328907851)"><path d="M0.7397179886698722 -0.2729689165949821 C20.475055826192385 15.53261243680043, 98.58760254028084 77.92635341778015, 118.29755127868226 93.77473109685916 M-0.3310687966179102 -1.4618815431464465 C19.21403924622353 14.651280760211105, 97.31617065213452 79.61768070648041, 117.09615501295386 95.37558812130342" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(116.42434244501248 260.287633808738) rotate(0 58.983241241032175 46.95685328907851)"><path d="M88.31627940736169 84.68704535430743 C97.11265823398568 88.55319083636373, 107.07753679168015 91.60139405632607, 117.58084376435338 95.7863069273914 M89.20454544551129 86.17697287342025 C100.38474218322703 90.08654276478784, 110.6275611214584 92.80253264308425, 117.75965323950706 95.0984820709015" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(116.42434244501248 260.287633808738) rotate(0 58.983241241032175 46.95685328907851)"><path d="M101.29652990115403 68.79260923646282 C106.18242435272083 77.35547264051694, 112.37705475097935 85.02037940924626, 117.58084376435338 95.7863069273914 M102.18479593930363 70.28253675557565 C108.40001269094637 80.29100578855251, 113.57567369439494 89.21177716408982, 117.75965323950706 95.0984820709015" stroke="#f41d92" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(210.75609523023303 270.1277797646135) rotate(0 -23.21603481470501 41.01249409247748)"><path d="M-0.4825729563832283 0.12082242518663411 C-8.449873775255824 13.980706899933212, -39.31822317844701 69.2375145533293, -47.0640869245657 82.88633793188448 M1.4649375121761112 -0.8613497469294816 C-6.705922378416131 
12.718583748763123, -39.72743348108118 67.31699609956553, -47.89700714158614 81.3745421383642" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(210.75609523023303 270.1277797646135) rotate(0 -23.21603481470501 41.01249409247748)"><path d="M-42.10851234209221 52.26487628944504 C-42.458495798913916 63.07911111068027, -45.933333101357825 73.60661775133254, -46.08951846915085 82.6602492619418 M-41.81723724689763 52.40499642692792 C-43.72440971030766 57.73893332611301, -43.95943368970588 64.53700030066432, -48.083166092958045 81.29580046498407" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(210.75609523023303 270.1277797646135) rotate(0 -23.21603481470501 41.01249409247748)"><path d="M-24.469394470231443 62.75208388195706 C-31.17458867399639 69.70275707268551, -41.02849240417217 76.43763675999281, -46.08951846915085 82.6602492619418 M-24.178119375036864 62.892204019439944 C-29.98361397149682 66.05638069461182, -34.071517142127675 70.56374672817189, -48.083166092958045 81.29580046498407" stroke="#f41d92" stroke-width="1" fill="none"></path></g></g><g transform="translate(39.62820512820508 216.20205066967264) rotate(0 1.5 10.5)"><text x="1.5" y="15" font-size="16px" fill="#f41d92" text-anchor="middle" style="white-space: pre;" direction="ltr">1</text></g><g transform="translate(125.43589743589723 222.2020506696726) rotate(0 6 10.5)"><text x="6" y="15" font-size="16px" fill="#f41d92" text-anchor="middle" style="white-space: pre;" direction="ltr">2</text></g><g transform="translate(208.89743589743574 222.20205066967264) rotate(0 6 10.5)"><text x="6" y="15" font-size="16px" fill="#f41d92" text-anchor="middle" style="white-space: pre;" direction="ltr">3</text></g><g transform="translate(191.76923076923043 208.93281990044176) rotate(0 25.615384615384613 24.730769230769226)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.8833133674831936 6.360576506008498 C2.4877938214388613 4.244359478950132, 
2.9130609833291805 1.6766131467793663, 4.488451119842476 -0.23663964459325726 M-0.28850608092265817 6.303106069614284 C1.0343724624951793 5.088796225112576, 1.888014702162439 4.282290327798344, 4.970310227158016 0.38432958385322624 M-0.5029007366765632 13.458718479096472 C2.353221118850686 11.092114433712052, 4.956941893578481 8.698890616321265, 11.4407319977404 0.8728738243773342 M-0.6591631433833731 11.31608433467108 C3.9865724479397984 6.668010527512484, 7.219039421112019 3.1231247390868266, 10.500751577270162 -0.12200269446822387 M0.37998445792587 18.390085420478755 C4.482960960318687 14.556425820871183, 9.369435253632687 8.388453495554378, 15.583549394741137 0.5507988861692121 M-0.9552512681624155 19.226351412851983 C5.8000587369344 11.326027374332568, 13.478120582025662 5.072244249357301, 15.40744865266927 -0.00812166492222044 M1.5975401254556392 24.819753056108116 C9.012704787803802 16.08196464042257, 15.121256719053424 7.865628445076538, 22.419335900151992 1.5604580318718675 M-0.6602380534448411 23.609279351024984 C6.579393143173927 17.14423667885636, 10.002655516387541 10.809855341916405, 21.127566026801134 0.5369015125721219 M-1.5877207433293208 32.27249422459697 C12.03821191842475 18.014435009911747, 22.677173939928053 8.467875584083604, 25.392589072373124 -0.3963112426909863 M-0.32496810233691775 30.60522756888365 C10.012474822088501 19.477456448849097, 18.497309168309286 9.813292301640764, 27.05334541076085 -0.02710092868537295 M-0.7782840413135013 35.771675517597544 C6.147948382364462 26.70279687017547, 16.004970283356485 21.689271690024167, 30.62719454751054 -1.9961644260898943 M-0.6504380499106368 37.02552652526771 C6.812786442359539 28.29680120828104, 14.54967392142883 20.172678234300996, 31.292192948466933 0.14164813992224978 M0.6811768386186117 43.20785102676206 C15.96499601411529 27.856445482615456, 26.895655491491077 10.981070706839972, 37.13061434570824 0.7202688139956912 M0.7209492187620157 43.06041292901688 C13.223832637152727 
29.18218176830856, 24.049749136934004 15.108359493210564, 36.76087241787587 0.9076704179030379 M-1.0143760833890028 48.35289108441107 C14.046826659574279 32.04593134494648, 31.86824917990042 12.516923527451716, 43.292359103072215 1.060707988995219 M0.6122074816375189 49.16815847963922 C17.46468029476411 29.97640987576258, 33.66591877306046 10.506942918369262, 41.80897746290617 0.7455555338739401 M2.3363039447317 53.308176881082936 C12.928061036636041 41.196284006174814, 22.17704874972543 29.41155637127906, 46.885515839746475 -1.5311015687707616 M2.9034164985964175 52.39826448486058 C20.342490193804384 31.369287138664443, 37.544321517093195 11.495909910963782, 48.34545936151433 -0.597362312656994 M9.693803943124706 51.32995374909821 C22.606799840305026 34.573881319768866, 38.25442919195592 18.802927776744745, 50.90916116284651 -0.008355767650019885 M7.409777547595198 51.09181514418903 C19.95502137514061 36.840040139474226, 32.11059270482506 22.873365436026603, 51.80604022862834 1.241028587149966 M13.276523431933999 51.54165452077743 C25.35359757323413 35.931560307685956, 41.26148723906369 19.76772342148547, 52.96565291857327 8.085323206014614 M14.505085656590342 51.46880090713259 C20.651686604505123 42.42554858136944, 28.914315990614792 33.807456230059316, 52.655504668261656 6.91913241413949 M19.810674431303564 53.68389952844113 C31.03742462798113 36.72207744339456, 45.68264438700518 21.451833886038244, 50.912105916975804 12.498147382315967 M18.496729603538338 52.25311847319931 C25.991329925828488 44.49947737034908, 33.07377906448256 36.01008788241084, 52.164175498275824 13.148906315591905 M22.404517577339426 51.84565828351879 C36.84623495178536 38.61102487237085, 44.40461980577457 25.433867055348827, 53.02103426516964 18.378483352906265 M23.624928032357708 52.27873247100615 C31.578880433937453 42.79037914344116, 36.92843365929103 36.11728645936361, 52.69355608359887 18.679180750346937 M28.549889890185554 51.251926118330935 C34.74415579811608 42.233448110521365, 
41.51978242028503 35.73023594477074, 53.854587483298744 26.00601395010606 M28.189257221004922 51.15235210006116 C36.65995396770647 43.45364069744954, 43.41559877015189 36.109598009732366, 51.695566452187265 25.901821167912637 M34.620677005947975 52.97604754430703 C41.10065624890303 43.215905935815854, 48.21417976160118 35.617223404005344, 52.106963707469134 32.2587711853448 M35.17212331339851 51.84075448898128 C38.74243348634677 47.66508979693839, 42.06338637453563 42.9296166428115, 51.67995649795952 30.19376094690595 M37.96925618540117 50.78097336899365 C43.79053359549952 48.10613638695969, 46.797709473805014 41.6554014576446, 50.687528950343584 39.30374826509866 M40.33565787098924 51.35673839032526 C43.504435060770554 46.73001007672866, 46.78342345833629 43.87109225379813, 52.33568042683855 36.623670153695976 M45.603730846932564 52.2467145902127 C46.64466075348626 51.331873224640454, 48.09339524929117 48.625204865430554, 50.97473978067723 44.23787758821341 M44.52735821138177 52.0632239176914 C47.565088069366716 49.156750178839886, 50.34645790109832 45.727992784487675, 52.27758183128176 43.426502319105886" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M1.7881132438778877 -1.9856047704815865 C12.554057232519755 -1.2585733573654525, 25.196595764618653 0.12386554072109546, 52.77154823278005 -1.056793935596943 M52.15927392874772 -0.14972387999296188 C52.24389845523983 13.764610193068018, 52.41853553447872 25.27282396583603, 50.4256168070894 47.94889501252999 M51.174012145743916 49.74028720477452 C37.42190567249288 48.33493883444139, 22.680426945823886 49.37452037168809, -0.3310612365603447 48.80458845713963 M-0.5470633581280708 50.166178261431355 C1.7950452980485099 35.15590045225734, 0.6715027508225578 16.88916054591536, -1.2611968591809273 0.7664104774594307" stroke="#f41d92" stroke-width="1.5" fill="none" stroke-dasharray="3 6"></path></g><g transform="translate(112.12805383672253 10) rotate(0 30 25.76923076923076)"><path d="M31 0 
C39.60545709244907 7.7152373932302005, 48.210914184898144 15.430474786460401, 60 26 M31 0 C39.98605573661625 8.05646376386285, 48.97211147323251 16.1129275277257, 60 26 M60 26 C53.66638129837811 31.577616469332803, 47.33276259675622 37.155232938665606, 31 51.53846153846152 M60 26 C48.87317028753459 35.79869353988995, 37.74634057506918 45.59738707977991, 31 51.53846153846152 M31 51.53846153846152 C23.677215790376067 45.50579563872171, 16.354431580752134 39.473129738981896, 0 26 M31 51.53846153846152 C22.449277359619735 44.49419375531947, 13.89855471923947 37.449925972177425, 0 26 M0 26 C10.536266363039614 17.163131437450645, 21.072532726079228 8.326262874901293, 31 0 M0 26 C11.768365174159408 16.129758241027595, 23.536730348318816 6.2595164820551865, 31 0" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(132.12805383672253 27.576923076923094) rotate(0 10 10.5)"><text x="10" y="15" font-size="16px" fill="currentColor" text-anchor="middle" style="white-space: pre;" direction="ltr">f()</text></g><g><g transform="translate(136.9313105462623 67.30769230769232) rotate(0 -46.45611715214591 64.42307692307693)"><path d="M0 0 C-28.495658624052027 39.51638923026622, -56.99131724810405 79.03277846053244, -92.91223430429181 128.84615384615387 M0 0 C-35.64949572963878 49.436981531146635, -71.29899145927756 98.87396306229327, -92.91223430429181 128.84615384615387" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(136.9313105462623 67.30769230769232) rotate(0 -46.45611715214591 64.42307692307693)"><path d="M-84.74597029705222 99.97900760403179 C-87.2505174444394 108.83239866631195, -89.75506459182657 117.68578972859211, -92.91223430429181 128.84615384615387 M-84.74597029705222 99.97900760403179 C-87.87928418228017 111.05504311637392, -91.01259806750814 122.13107862871605, -92.91223430429181 128.84615384615387" stroke="currentColor" stroke-width="1" fill="none"></path></g><g 
transform="translate(136.9313105462623 67.30769230769232) rotate(0 -46.45611715214591 64.42307692307693)"><path d="M-68.101069209058 111.98181019995612 C-75.71051371238666 117.15400935175882, -83.31995821571532 122.32620850356152, -92.91223430429181 128.84615384615387 M-68.101069209058 111.98181019995612 C-77.62086529116603 118.45249031329095, -87.14066137327406 124.92317042662579, -92.91223430429181 128.84615384615387" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(141.70441950595915 65.3421809033423) rotate(0 -1.4550235082887468 69.06942175646287)"><path d="M0 0 C-0.709041979940326 33.657957604501625, -1.418083959880652 67.31591520900325, -2.9100470165774937 138.13884351292575 M0 0 C-0.8738156617443605 41.479702654018425, -1.747631323488721 82.95940530803685, -2.9100470165774937 138.13884351292575" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(141.70441950595915 65.3421809033423) rotate(0 -1.4550235082887468 69.06942175646287)"><path d="M-12.574637247120437 109.73821507488421 C-10.219829866573217 116.65811626704499, -7.865022486025997 123.57801745920575, -2.9100470165774937 138.13884351292575 M-12.574637247120437 109.73821507488421 C-9.672598118652262 118.2662264063747, -6.770558990184086 126.79423773786519, -2.9100470165774937 138.13884351292575" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(141.70441950595915 65.3421809033423) rotate(0 -1.4550235082887468 69.06942175646287)"><path d="M7.942019409577396 110.17042106398662 C5.297879771485093 116.98501392658794, 2.65374013339279 123.79960678918927, -2.9100470165774937 138.13884351292575 M7.942019409577396 110.17042106398662 C4.683410338505787 118.56865154993545, 1.4248012674341792 126.9668820358843, -2.9100470165774937 138.13884351292575" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(149.95302058214017 61.2864411631366) rotate(0 
28.805171578798095 27.635753165027324)"><path d="M0 0 C21.82529086598449 20.93923827106187, 43.65058173196898 41.87847654212374, 57.61034315759619 55.27150633005465 M0 0 C22.81293896902497 21.886790314446475, 45.62587793804994 43.77358062889295, 57.61034315759619 55.27150633005465" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(149.95302058214017 61.2864411631366) rotate(0 28.805171578798095 27.635753165027324)"><path d="M30.164309002621188 43.158900947068474 C40.56205436774476 47.747679995049296, 50.95979973286833 52.33645904303012, 57.61034315759619 55.27150633005465 M30.164309002621188 43.158900947068474 C41.0325779245129 47.95533353560639, 51.90084684640462 52.75176612414431, 57.61034315759619 55.27150633005465" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(149.95302058214017 61.2864411631366) rotate(0 28.805171578798095 27.635753165027324)"><path d="M44.3712938687299 28.350741594873572 C49.38681861101766 38.549492174657146, 54.40234335330541 48.748242754440724, 57.61034315759619 55.27150633005465 M44.3712938687299 28.350741594873572 C49.613783444058825 39.011010727851094, 54.85627301938775 49.67127986082862, 57.61034315759619 55.27150633005465" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(215.84615384615375 165.89435836198027) rotate(0 0.3846153846154152 21.153846153846132)"><path d="M0 0 C0.23940691552485024 13.167380353865703, 0.4788138310497005 26.334760707731405, 0.7692307692308304 42.307692307692264 M0 0 C0.16685400129512135 9.176970071230935, 0.3337080025902427 18.35394014246187, 0.7692307692308304 42.307692307692264" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g transform="translate(207.5384615384616 137.70205066967247) rotate(0 6 10.5)"><text x="6" y="15" font-size="16px" fill="#f41d92" text-anchor="middle" style="white-space: pre;" direction="ltr">3</text></g><g><g transform="translate(209.91295929286983 
266.7312869756914) rotate(0 -76.96539960588973 41.044413912251656)"><path d="M0.537255983054638 -0.5164419695734978 C-25.195717696318376 13.007344287378437, -127.84770122949952 67.49526718727175, -153.32559525799363 81.16888101990727 M-0.6398233551811427 1.8268220510613173 C-26.657042332164917 15.535847339030076, -129.1134273795782 69.08827673839266, -154.46805519483408 82.60526979407688" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(209.91295929286983 266.7312869756914) rotate(0 -76.96539960588973 41.044413912251656)"><path d="M-135.17579685561074 59.76519201654494 C-142.47761293964737 70.16713330094196, -149.7303268088794 76.5761446111615, -152.94518513850292 81.06575895538946 M-133.49052421040548 59.56104871961772 C-139.31195774937274 65.53611584887621, -142.4797328959443 70.67432462693765, -154.99567520863732 81.85958349770327" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(209.91295929286983 266.7312869756914) rotate(0 -76.96539960588973 41.044413912251656)"><path d="M-125.60610432944877 77.91845583715194 C-136.63787547258147 81.29519781571429, -147.61658646322502 80.63616503205189, -152.94518513850292 81.06575895538946 M-123.92083168424351 77.71431254022473 C-131.6475698672048 79.80475481530144, -136.8595776194713 81.0651489508949, -154.99567520863732 81.85958349770327" stroke="#f41d92" stroke-width="1" fill="none"></path></g></g></svg>
<figcaption>Hash Index overflow pages</figcaption>
</figure>
<p>As rows are added to the index, a bucket's primary page can fill up. When that happens, index entries that do not fit in the bucket's primary page are written to overflow pages.</p>
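<p>To make the idea concrete, here is a minimal sketch of a bucket with a primary page and overflow pages. This is a toy model, not PostgreSQL's on-disk format: the <code>Bucket</code> class and the tiny <code>PAGE_CAPACITY</code> are assumptions made up for illustration.</p>

```python
# Toy model of a hash index bucket. The names and the page size are
# hypothetical; real pages hold far more entries and live on disk.
PAGE_CAPACITY = 4  # assumed number of index entries that fit in one page


class Bucket:
    def __init__(self):
        self.primary = []   # the bucket's primary page
        self.overflow = []  # list of overflow pages, in the order they were added

    def insert(self, entry):
        # Entries go to the primary page while it still has room.
        if len(self.primary) < PAGE_CAPACITY:
            self.primary.append(entry)
            return
        # Otherwise use the last overflow page, adding a new one when needed.
        if not self.overflow or len(self.overflow[-1]) >= PAGE_CAPACITY:
            self.overflow.append([])
        self.overflow[-1].append(entry)

    def pages(self):
        # A lookup has to scan the primary page and every overflow page.
        return [self.primary, *self.overflow]


bucket = Bucket()
for hash_code in range(10):
    bucket.insert(hash_code)

print(len(bucket.pages()))  # primary page plus overflow pages
```

<p>In this sketch a lookup in a bucket has to go through all of its pages, which hints at why a bucket with many overflow pages is more expensive to search.</p>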
<h3 id="index-split"><a class="toclink" href="#index-split">Index Split</a></h3>
<p>At some point, the database can decide it needs to split a bucket into two buckets. PostgreSQL uses special hash functions that ensure values in a bucket can be split into exactly two buckets. When a bucket is split, additional storage is allocated to the index.</p>
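<p>One way to picture how a bucket can be split into exactly two buckets is to map hash codes to buckets using their low bits, and look at one more bit when splitting. The sketch below is a simplified illustration under that assumption, not PostgreSQL's actual implementation; the function <code>split_bucket</code> and the sample hash codes are made up.</p>

```python
def split_bucket(hash_codes, bucket_no, n_buckets):
    """Split one bucket in two by looking at one more bit of the hash code.

    Assumes n_buckets is a power of two and that every hash code in
    `hash_codes` currently maps to `bucket_no`, i.e.
    hash_code & (n_buckets - 1) == bucket_no.
    """
    wider_mask = 2 * n_buckets - 1
    stays, moves = [], []
    for hash_code in hash_codes:
        if hash_code & wider_mask == bucket_no:
            stays.append(hash_code)   # the extra bit is 0: same bucket
        else:
            moves.append(hash_code)   # the extra bit is 1: the new bucket
    return stays, moves


# With 4 buckets, all of these hash codes land in bucket 1 (code & 3 == 1).
entries = [1, 5, 9, 13, 17, 21]
stays, moves = split_bucket(entries, bucket_no=1, n_buckets=4)

print(stays)  # entries that remain in bucket 1
print(moves)  # entries that move to the new bucket, 1 + 4 = 5
```

<p>Because only one extra bit is examined, every entry in the old bucket ends up in one of exactly two places: the original bucket or the single new one.</p>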
<figure>
<svg version="1.1" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 306.71794871794884 331.53266336844274" width="20em" height="331.53266336844274"><g transform="translate(10 135.66358913121098) rotate(0 25.615384615384613 24.730769230769226)"><path d="M0.46364592760801315 0.6525715664029121 C12.807024691368525 0.3073646277131943, 26.215825513234506 1.0152157992067246, 53.08060819541032 1.576662190258503 M0.6510475315153599 0.715393777936697 C13.7359403777581 0.18703549221731147, 25.480321901979355 -0.2229871241691021, 50.583604353265116 0.7041979990899563 M52.42945653830583 1.8350220993161201 C49.60865819716969 18.146282168277178, 52.728272083049205 33.55630474944527, 52.18697983656938 50.703691040667195 M52.11430408318455 -0.8891390599310398 C51.78191803017201 15.581770259457135, 51.13395152607502 30.298403303210545, 51.633173125581095 49.89994492525091 M49.477539605246136 51.31428483644356 C36.15056328045634 50.31966970325089, 17.24594260752201 51.014016846424106, 0.8470594808459282 49.869738375338216 M50.411278861359904 49.841916704120536 C34.54117078402867 50.51754322387277, 20.280990837800957 49.60462906695901, -0.3735545240342617 49.611108803691764 M-0.590591199696064 49.03373793283333 C1.1766373284820182 36.278281890916126, -1.4092388979431527 19.053717267455955, 0.1285867616534233 -0.9884282276034355 M0.6587931551039219 50.06592848295202 C0.5911258164764597 34.8007444097732, -0.04406145178068144 21.873813357834624, -0.9480625949800014 0.07047772035002708" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(93.3846153846153 138.20205066967264) rotate(0 24.84615384615384 22.03846153846149)"><path d="M0.6525715664029121 0.28208183497190475 C17.064400093314735 1.050741619966351, 34.46439542930859 1.7346407841661802, 51.26896988256618 -0.9394140318036079 M0.715393777936697 0.7730547823011875 C18.874492391370804 -0.7689698315182557, 37.211565705732646 -0.9706854200878968, 50.396505691397635 0.3416140712797642 M51.5273297916238 
-1.6482439115643501 C51.48058943185954 14.927326198695932, 50.362474722005416 27.868123054561682, 50.93446027143642 44.02407020043864 M48.80316863237664 0.36843806877732277 C50.252770382575676 14.571748311215854, 50.26165111701934 30.1049573878829, 50.13071415602013 44.39533294078245 M51.54505406721279 45.99718612145915 C35.98819373232813 43.65742431334001, 21.961004112775505 44.48837242773515, 0.4081999137997627 44.294962105269576 M50.07268593488976 43.315070756639386 C33.02121871870297 44.57541326099286, 14.81056800031891 43.29614487701306, 0.14957034215331078 43.986909159387494 M-0.4278005287051201 45.717601952071334 C1.559342373870313 25.773410758404754, -1.2719833955541255 9.488879904093643, -0.9884282276034355 -0.4189746454358101 M0.6043900214135647 44.5876467311038 C-0.5550855552892275 31.445874592575862, 0.32028840425209326 18.115690741401437, 0.07047772035002708 0.030039016157388687" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(173.76923076923066 140.5097429773648) rotate(0 24.846153846153868 18.961538461538453)"><path d="M0.28208183497190475 0.37225592881441116 C16.703058241938184 0.9179681649632178, 32.92407584763493 1.4145232549137794, 48.75289366050413 1.5452708080410957 M0.7730547823011875 0.12739646062254906 C13.883762321907755 -0.6381218331077927, 28.011258344180337 0.3487023217460281, 50.0339217635875 0.05283474549651146 M48.044063780743386 1.618144877254963 C51.39096722147504 13.340783490613097, 50.87580785296002 26.923659225553262, 49.63945481582334 38.661876144890584 M50.06074576108506 0.8740179501473904 C49.74784285871173 9.562042649462816, 50.42977518884326 17.796694393455976, 50.01071755616715 37.88561528090099 M51.61257073684385 38.9404997258232 C36.34580705779106 40.09518308368438, 23.814143171218742 38.79161305156463, 0.21803902834653854 38.88501423883895 M48.930455372024085 38.5497245109425 C31.525693588302712 37.82582908072722, 11.174097240372355 38.261665991928695, -0.09001391753554344 
38.86126356485943 M1.6406788751482964 39.133059206490316 C-0.8332121737960441 27.930648485714414, 0.24057193913425384 20.19247879277054, -0.4189746454358101 -0.3251098319888115 M0.5107236541807652 37.06458788279156 C-0.37429824905040154 29.173473636318846, -1.1978972299540271 20.460578231398866, 0.030039016157388687 0.013348933309316635" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(27.564102564102598 287.33666605428766) rotate(0 122 15.5)"><path d="M0.30771970526431497 0.9638064137498049 C79.7493344336594 1.244663295860825, 159.01628446794743 -1.0610302970046417, 245.2773746253509 -0.19246128720987735 M0.1053103477474287 -0.36822388386848715 C88.85193115323015 -2.094211933352703, 178.7532341100877 -1.276852844780535, 244.04367503927656 0.31908091881090844 M245.61814487725496 1.4713699743151665 C244.78445859279483 12.114175932481885, 245.52658100452274 22.263077776879072, 244.73879922181368 29.57723616808653 M244.8740179501474 -0.2795284353196621 C244.60726557601242 12.441361937299371, 244.16277335513382 24.39632594883442, 243.9625383578241 30.689070526510477 M244.84103709506363 31.013318216183773 C193.60441003827287 33.63678347901353, 140.71623511686727 33.11681328306579, 0.7951708606274932 32.38557974972726 M244.51800870346577 31.748694595473307 C190.55892710553564 29.154559148142248, 137.71477007704675 29.388943680687195, 0.7755377269925667 30.344797258196035 M1.2099822834134102 30.054802648723125 C-1.5550344040617348 22.744003121182324, -0.9221191456541419 13.811858073621988, -0.3251098319888115 -1.3766999319195747 M-0.8584890402853489 30.875536385923624 C-0.17047598738223313 18.709678624197842, -0.11688318628817795 8.345521454513072, 0.013348933309316635 0.36635977402329445" stroke="currentColor" stroke-width="1" fill="none"></path></g><g><g transform="translate(59.0641025641026 286.33666605428766) rotate(0 -0.5001835371035099 16.81556717542469)"><path d="M0.2233535572886467 0.699563880264759 C0.09944044401248293 
6.273006259898345, -0.3057096992929777 28.40377209534248, -0.3519357398152352 33.60990337878466 M-1.1185245544742792 0.02123097206465907 C-1.3780133410325894 5.271348349206771, -1.092326160374408 26.580579233619698, -0.959747466845438 32.07002790386789" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(96.0641025641026 288.08666605428766) rotate(0 0.15016923773380597 16.694345682798883)"><path d="M0.699563880264759 0.6480642601847648 C0.6896729265650112 6.185171552995841, 0.4871054286758103 27.982814276715118, 0.10990337878465639 33.4459973141551 M-0.39230381193570785 -0.057305948557332176 C-0.47790880513998363 5.137400421180452, 0.11671690977799393 26.548618560212976, -0.25544281098060306 31.820071155307815" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(132.5641025641026 285.58666605428766) rotate(0 0.04921238874437961 16.91243005971421)"><path d="M0.6480642601847648 1.1099033787846564 C0.6018382196625074 6.316034662226836, 0.06614761004845311 26.727951315542064, -0.05400268584489809 31.936351580917837 M-0.4708407325576991 0.6469987073075028 C-0.6118567331663022 5.946966662757719, -0.525243763628726 27.68495254442406, -0.505399559540674 33.17786141212098" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(166.5641025641026 286.08666605428766) rotate(0 -0.8488529619714313 16.91210222178603)"><path d="M1.1099033787846564 0.9459973141551019 C0.7327013288935025 6.409180351595084, -1.1887153511246045 28.037952530880766, -1.5636484190821647 33.427162484824656 M0.2334639233071356 0.39704195874743187 C-0.41229049158903475 5.479013839044297, -2.438909779417639 26.632704397815587, -2.8076093027275055 31.791348040578885" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(200.5641025641026 287.58666605428766) rotate(0 0.20593731994275544 16.630370956487013)"><path d="M0.9459973141551019 
-0.5636484190821648 C0.8258470182617507 4.644751846293608, 0.12128586421410259 26.70661814560493, -0.07283751517534243 32.3603049710393 M-0.016492825252935273 1.7548322155606004 C-0.2702433153024564 7.668260368459547, -0.44115792602611065 28.872419460269935, -0.5341226742696015 33.82439033205621" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(234.5641025641026 287.08666605428766) rotate(0 -0.40989147561602124 16.824787940918497)"><path d="M-0.5636484190821648 0.9271624848246576 C-0.938581487039725 6.316372438768547, -1.2100485210617384 27.86450649549564, -1.1396950289607048 33.281257037818435 M1.3412974315602333 0.3683188440185039 C0.6990032141127934 5.337481771983827, -1.8614428635717681 26.368199193853265, -2.1610803827922793 31.568842233894394" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(41.49566809309596 198.57430850488367) rotate(0 49.13159908892089 37.45650515314486)"><path d="M0.9271624848246576 -0.13969502896070485 C17.154231952102748 12.64952123601864, 82.05380406216666 63.57069126763096, 98.30841411782366 76.1716491707939 M-0.045215939981862796 -1.2586388645041735 C16.00941746430461 11.249380732029106, 81.40030110334925 61.479366192554075, 97.77052859905113 74.4997725021153" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(41.49566809309596 198.57430850488367) rotate(0 49.13159908892089 37.45650515314486)"><path d="M68.99805450635274 66.55587208489976 C81.4300020129425 70.08206837209508, 91.06577137022342 71.1477494479479, 97.2052762875234 73.20544274710709 M69.61687941672524 65.85312067563734 C80.75162731437997 67.83947601090094, 91.21483979186033 72.824182749593, 98.02054368275155 74.97787780501538" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(41.49566809309596 198.57430850488367) rotate(0 49.13159908892089 37.45650515314486)"><path d="M81.61827532565145 50.374091315837944 C89.1932583769564 
60.127990223348874, 94.04807023852263 67.32386562540185, 97.2052762875234 73.20544274710709 M82.23710023602395 49.671339906575525 C88.75234770307748 57.65841509803721, 94.54078050986584 68.63717383137637, 98.02054368275155 74.97787780501538" stroke="#f41d92" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(105.14193582029839 187.8260953471996) rotate(0 72.390013425589 47.363693723883955)"><path d="M-0.13969502896070485 0.7812570378184318 C24.36294645750008 16.390325692086627, 122.13781737503815 79.19188899591173, 146.45220049968253 94.58157441043383 M-1.6721736485045404 0.14581303733401008 C22.796882132497128 15.349293474522568, 121.49929664278622 77.27762246313114, 145.95485311615545 92.94602417450487" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(105.14193582029839 187.8260953471996) rotate(0 72.390013425589 47.363693723883955)"><path d="M117.94899271465034 87.9702723768126 C125.6582778164579 88.44943803078024, 130.63704108300433 89.25043495587504, 144.66052336114723 94.35442017268478 M117.24624130538793 87.45699587059727 C125.6197482126734 89.37624679709614, 136.97314279368652 90.95800764737733, 146.43295841905552 93.56710046406924" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(105.14193582029839 187.8260953471996) rotate(0 72.390013425589 47.363693723883955)"><path d="M128.92379587384124 70.6303272211077 C133.99010878740356 75.48423098691177, 136.19192531039405 80.67274206921931, 144.66052336114723 94.35442017268478 M128.22104446457882 70.11705071489237 C132.93635722112876 77.87300483920427, 140.6223961355141 85.24910616223775, 146.43295841905552 93.56710046406924" stroke="#f41d92" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(200.1213545109003 183.1148974997384) rotate(0 -74.10642478281548 51.97138544563046)"><path d="M0.7812570378184318 0.8584725335240362 C-24.128784318431002 17.892008238611208, -123.40366105667641 85.27634928016657, 
-148.53308565267193 102.06151920156233 M-0.26772174666635684 0.26356666828505704 C-25.33574035700848 17.527070064972552, -123.86512324663204 86.60497793161974, -148.99410660344938 103.67920422297581" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(200.1213545109003 183.1148974997384) rotate(0 -74.10642478281548 51.97138544563046)"><path d="M-130.21363307644265 80.75107433661547 C-137.96352274531202 87.66025912030273, -145.32897193241845 96.49006616708128, -147.58571060526947 104.36243236553534 M-130.72690958265798 78.38084281623092 C-136.0374500787696 87.51621485753745, -141.8930798493096 94.73021602982998, -148.373030313885 103.65277778473362" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(200.1213545109003 183.1148974997384) rotate(0 -74.10642478281548 51.97138544563046)"><path d="M-118.54095064394969 97.62912183080181 C-130.6748696521777 98.24682937259666, -142.44399633975533 100.70916446664445, -147.58571060526947 104.36243236553534 M-119.05422715016502 95.25889031041726 C-127.9933939366363 99.31163285046821, -137.4125007574824 101.37304504012151, -148.373030313885 103.65277778473362" stroke="#f41d92" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(270.22507838109743 187.4503347955134) rotate(0 -46.35928103175789 46.51509765599144)"><path d="M0.8584725335240362 0.9276657387614249 C-14.569786445583313 16.03150903354474, -77.03262414080605 76.36521807960611, -92.7092489036048 91.40431680651832 M-0.14996811571530988 0.3690863062720746 C-15.710648440235394 15.562165183109052, -77.91119114652791 77.37728488007936, -93.57703459703981 92.66110900571088" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(270.22507838109743 187.4503347955134) rotate(0 -46.35928103175789 46.51509765599144)"><path d="M-79.15756357266065 65.8183639442179 C-85.13030025076485 72.90422605372251, -89.24587781567507 82.69168647881304, -92.89380645448028 92.7667784967039 
M-81.5277950930452 66.37264346160028 C-84.16450078864119 75.10692660782176, -88.78100048566147 84.65321380097839, -93.60346103528201 93.03050861661772" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(270.22507838109743 187.4503347955134) rotate(0 -46.35928103175789 46.51509765599144)"><path d="M-64.75139635359899 80.43282077443477 C-74.97342442895257 83.40753723964382, -83.24314633067466 88.98079119651447, -92.89380645448028 92.7667784967039 M-67.12162787398354 80.98710029181716 C-74.72385241084409 84.90510673292088, -84.21828012958429 89.50293904489264, -93.60346103528201 93.03050861661772" stroke="#f41d92" stroke-width="1" fill="none"></path></g></g><g transform="translate(34.11538461538453 149.89435836198027) rotate(0 1.5 10.5)"><text x="1.5" y="15" font-size="16px" fill="#f41d92" text-anchor="middle" style="white-space: pre;" direction="ltr">1</text></g><g transform="translate(112.23076923076911 149.74051220813408) rotate(0 6 10.5)"><text x="6" y="15" font-size="16px" fill="#f41d92" text-anchor="middle" style="white-space: pre;" direction="ltr">2</text></g><g transform="translate(193.38461538461524 150.5097429773648) rotate(0 6 10.5)"><text x="6" y="15" font-size="16px" fill="#f41d92" text-anchor="middle" style="white-space: pre;" direction="ltr">3</text></g><g transform="translate(247.0256410256411 140.7789737465955) rotate(0 24.846153846153868 18.961538461538453)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0 M-0.8109265841217587 6.142996145834319 C1.252880061914266 4.531640957937469, 2.322047558470259 4.001243663879113, 5.588092198553032 0.53787079265688 M-0.43425123613792155 6.23871867363631 C1.5273840243874537 4.186059419470398, 2.94653180463719 2.6556828064695917, 4.793190426709598 0.23444213828418786 M0.6163307615648614 11.082370090901433 C0.7532618923567131 8.453697135479507, 3.003135913108353 8.481731246191256, 9.628620035898816 0.41636376931217933 M0.2659536399971353 11.62833489519316 C4.08636624871612 
8.796940217394742, 6.503761546467216 3.9203962759054374, 10.828408376824987 0.1993686164811017 M1.3034774707117815 20.32130560921567 C1.2711472334565994 14.042767392317817, 6.696228160365458 10.206048064238841, 17.369825362696727 1.8780480704200437 M0.5722530045965453 19.385101760645085 C6.39600763243108 10.612871941881739, 11.498111626821661 5.279453034294253, 15.925257157071908 1.1261943547916582 M2.115051597561619 25.616023083864807 C6.886270844357069 14.744759359349793, 18.515364468390047 6.781765501989629, 21.320244951451087 -0.658726709505963 M0.5860383922419761 24.619589099376558 C7.243418370946743 15.856408897320264, 15.148404440342752 8.26774432000693, 22.01460599368241 0.843142838606191 M1.215932601182665 31.423087162662497 C6.5132129651595365 23.94699760483287, 14.921461256086971 16.878506942970912, 24.828281963851662 -0.06474668215841639 M0.6803713487972871 31.345873268086663 C6.099204513702615 25.291245088137746, 9.847510902718515 18.129290729275382, 27.021050250765295 1.0532309258368553 M-0.014431817559572124 35.68236834351097 C9.818730206662899 24.859506958270828, 18.43399194648397 15.86194645474048, 32.23413062975446 -1.5845384298578615 M0.292194692315487 36.85062652249013 C8.883630683147477 25.70785196509413, 19.027132395135578 14.458557640806934, 32.19774802536408 0.19076504197560595 M1.888770629171944 38.82742088364414 C13.114525548368238 31.080007877206896, 20.46234223289191 17.94787613065388, 38.120936563053476 0.5640222859662494 M2.580534577821112 40.96498612763337 C10.399335261478175 31.403118066908753, 17.26250628196795 21.345303658223706, 37.55252754856025 0.8924836834054908 M7.289362642242711 41.563740280964225 C15.926929844920835 28.94632154505149, 25.863525820478376 18.35806082337675, 43.31326899360471 -1.597966117006635 M7.11514632336716 39.9678118392311 C19.27942272803937 24.671395785258703, 31.916685623386172 12.14780868094751, 41.72135941739731 -0.6651192243815416 M13.575941219584692 41.13760159493286 C26.6298974953873 
24.586136482216187, 37.11234166300646 11.255755131447657, 48.37864853530216 1.4772455997940348 M11.877278294073093 40.979524224557466 C25.813894652976032 25.357660640361892, 40.05521221090385 9.011837461986815, 47.34488436295676 0.6468823149975584 M19.7527630910539 41.26541574322951 C25.3744519847253 29.450445922632852, 35.47233316001168 23.106468797453378, 52.31390136002082 1.585800014493259 M17.725557336982426 40.01646095910442 C28.043584255130156 27.332081830295245, 40.48230011620848 14.900592876939168, 51.62976742430109 1.7565888874802909 M25.317036370544315 38.440244464424715 C32.138450379449324 28.639625449537036, 41.90458891205789 18.60257351016614, 50.04668696569265 8.059909686552672 M23.148280433176637 40.42239178925501 C33.95241041808278 29.580403599865804, 42.44954172453388 18.391724022721842, 51.725225716549765 6.80395231034079 M30.257392787911957 41.14412367619792 C38.518635466379315 31.05315751673234, 43.93623175632945 23.093148788362637, 51.65061972712619 14.678375132537859 M28.136886929699248 40.205841890501176 C37.00345620071903 31.069617590600483, 45.44960096623876 20.351804432366734, 51.81314998169882 14.176293607957142 M32.55541545705081 41.13574011711969 C36.19607964027359 36.36445970502528, 40.941068764086964 30.606286594304777, 52.63014734518697 19.25606685035106 M33.562486307631744 39.45276850922598 C37.23056065536866 35.68682524075329, 42.22612045260735 32.06695150961929, 51.26798303780129 20.222785036044087 M40.12965003468653 40.572293628818464 C42.630256120595284 35.83613708169783, 47.407493986168895 29.367832282559032, 50.71971143022628 26.584154070741498 M39.10834741026719 39.67331028326519 C43.2766285502787 36.34465283481751, 46.14621919772218 32.38313144542754, 51.269763469179 26.346126356921157 M44.61712292272642 39.91542664175373 C45.88873797469415 37.26137735047093, 48.671376310203286 35.82483841452909, 52.381108068972615 32.26134330091014 M44.47516977943525 39.595246833427666 C46.496346122583525 38.191231731041796, 
47.80595211993141 36.07696887283977, 51.886637493707035 31.472204655790176" stroke="#ced4da" stroke-width="0.5" fill="none"></path><path d="M-1.672042615711689 0.8308969810605049 C16.326145670104502 -2.349320541889622, 34.10093765465116 -0.6152166238140605, 50.527528975445534 0.9665583446621895 M-0.03265887126326561 -0.4763747490942478 C17.65546513234195 -0.7798206117004156, 32.80959636861318 0.5081510040909052, 48.69781260221055 0.7275059185922146 M47.73080847068479 0.47107303887605667 C48.54675598642865 11.071272986697458, 50.266704039788635 19.93014755610089, 48.78152177138975 37.37588829088668 M49.909206912208106 -0.05775618925690651 C50.10324640379793 8.091488444375301, 49.59145314799196 18.150641372685243, 49.9030019494777 38.345564774309196 M49.604247067410256 39.148652496819295 C34.98506492897871 37.78430204707268, 19.577545083027633 38.23390584307793, 1.6416001245379448 39.60268849421004 M50.03856007783463 37.8719786918507 C32.90467660174923 38.832399427266054, 17.023177311454837 37.68657129940503, -0.48886721953749657 37.81352262858013 M0.8513493463397026 37.40737408686141 C-1.1794472935423257 26.30673083270397, 0.8638192412629722 18.568766106378565, -1.5498569086194038 -1.1428359672427177 M0.358940664678812 38.34790163401226 C0.7263785985542031 29.25189445141988, 0.9668665317130776 21.947387360380233, 0.6955167688429356 -0.7100511826574802" stroke="#f41d92" stroke-width="1" fill="none"></path></g><g transform="translate(267.37179487179486 150.0097429773648) rotate(0 4.5 10.5)"><text x="4.5" y="15" font-size="16px" fill="#f41d92" text-anchor="middle" style="white-space: pre;" direction="ltr">4</text></g><g transform="translate(130.46138717005596 10) rotate(0 30 25.769230769230774)"><path d="M31 0 C41.07880149371922 9.036166856437921, 51.15760298743844 18.072333712875842, 60 26 M31 0 C39.524535740539434 7.642687215656043, 48.04907148107887 15.285374431312086, 60 26 M60 26 C50.12890856973827 34.6928444425647, 40.257817139476536 43.3856888851294, 31 
51.53846153846152 M60 26 C53.873228477314115 31.395459271967407, 47.74645695462823 36.790918543934815, 31 51.53846153846152 M31 51.53846153846152 C24.017179042473437 45.785864620598446, 17.03435808494687 40.03326770273537, 0 26 M31 51.53846153846152 C22.16923016272485 44.26348489832418, 13.338460325449702 36.98850825818684, 0 26 M0 26 C10.104321676120163 17.52540762647986, 20.208643352240326 9.050815252959726, 31 0 M0 26 C6.349167991057039 20.67489136233926, 12.698335982114077 15.349782724678516, 31 0" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(150.46138717005596 27.576923076923094) rotate(0 10 10.5)"><text x="10" y="15" font-size="16px" fill="currentColor" text-anchor="middle" style="white-space: pre;" direction="ltr">f()</text></g><g><g transform="translate(160.46138717005596 65.6643358915278) rotate(0 -65.59274113836435 31.01398590038997)"><path d="M0 0 C-32.79719711426486 15.507383762609525, -65.59439422852972 31.01476752521905, -131.1854822767287 62.02797180077994 M0 0 C-34.642218837772205 16.379758917629562, -69.28443767554441 32.759517835259125, -131.1854822767287 62.02797180077994" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(160.46138717005596 65.6643358915278) rotate(0 -65.59274113836435 31.01398590038997)"><path d="M-110.08588368739868 40.701768684211274 C-115.36091627451353 46.033453830876525, -120.63594886162838 51.365138977541775, -131.1854822767287 62.02797180077994 M-110.08588368739868 40.701768684211274 C-115.65766570958043 46.33339028228201, -121.22944773176218 51.96501188035275, -131.1854822767287 62.02797180077994" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(160.46138717005596 65.6643358915278) rotate(0 -65.59274113836435 31.01398590038997)"><path d="M-101.31403290034261 59.2537129530868 C-108.78208345199965 59.94729514445955, -116.25013400365668 60.64087733583231, -131.1854822767287 62.02797180077994 M-101.31403290034261 
59.2537129530868 C-109.20220214238815 59.98631293630506, -117.09037138443368 60.718912919523326, -131.1854822767287 62.02797180077994" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(160.48928675551116 68.13634859666206) rotate(0 -19.355471583288264 29.98003021749537)"><path d="M0 0 C-9.837823551881751 15.237977855029664, -19.675647103763502 30.475955710059328, -38.71094316657653 59.96006043499074 M0 0 C-13.47866193275869 20.877336431544588, -26.95732386551738 41.754672863089176, -38.71094316657653 59.96006043499074" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(160.48928675551116 68.13634859666206) rotate(0 -19.355471583288264 29.98003021749537)"><path d="M-32.0406062237209 30.711015919506558 C-33.7357754060639 38.1442354555501, -35.43094458840691 45.57745499159365, -38.71094316657653 59.96006043499074 M-32.0406062237209 30.711015919506558 C-34.363133376920544 40.89516413785352, -36.68566053012018 51.07931235620049, -38.71094316657653 59.96006043499074" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(160.48928675551116 68.13634859666206) rotate(0 -19.355471583288264 29.98003021749537)"><path d="M-14.800245208047485 41.84160235666644 C-20.876801558176602 46.44614513000349, -26.95335790830572 51.05068790334053, -38.71094316657653 59.96006043499074 M-14.800245208047485 41.84160235666644 C-23.125648517713685 48.1502208299939, -31.45105182737988 54.45883930332135, -38.71094316657653 59.96006043499074" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(160.52824769907662 66.6655149121101) rotate(0 57.93333743067065 32.11362180521547)"><path d="M0 0 C25.16407317378604 13.948955209922183, 50.32814634757208 27.897910419844365, 115.8666748613413 64.22724361043095 M0 0 C26.33727004966099 14.59928198973174, 52.67454009932198 29.19856397946348, 115.8666748613413 64.22724361043095" stroke="currentColor" 
stroke-width="1" fill="none"></path></g><g transform="translate(160.52824769907662 66.6655149121101) rotate(0 57.93333743067065 32.11362180521547)"><path d="M86.23606482944287 59.533955942115654 C92.671278750261 60.55325018226234, 99.10649267107911 61.572544422409024, 115.8666748613413 64.22724361043095 M86.23606482944287 59.533955942115654 C92.97130064319347 60.60077161529422, 99.70653645694408 61.66758728847278, 115.8666748613413 64.22724361043095" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(160.52824769907662 66.6655149121101) rotate(0 57.93333743067065 32.11362180521547)"><path d="M96.18509786206775 41.58578767620121 C100.45956801820122 46.50308813875298, 104.7340381743347 51.42038860130475, 115.8666748613413 64.22724361043095 M96.18509786206775 41.58578767620121 C100.65885193374875 46.7323420238678, 105.13260600542975 51.8788963715344, 115.8666748613413 64.22724361043095" stroke="currentColor" stroke-width="1" fill="none"></path></g></g><g><g transform="translate(158.919574147391 65.60663583679252) rotate(0 21.79837457055784 33.44156867903382)"><path d="M0 0 C13.031323839052332 19.9917617587414, 26.062647678104664 39.9835235174828, 43.59674914111568 66.88313735806764 M0 0 C15.400439257044772 23.626295870581124, 30.800878514089543 47.25259174116225, 43.59674914111568 66.88313735806764" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(158.919574147391 65.60663583679252) rotate(0 21.79837457055784 33.44156867903382)"><path d="M19.60693131555252 48.86956911920136 C26.77762846806006 54.253930243415375, 33.9483256205676 59.63829136762939, 43.59674914111568 66.88313735806764 M19.60693131555252 48.86956911920136 C28.08127262513716 55.232815666950536, 36.555613934721805 61.59606221469971, 43.59674914111568 66.88313735806764" stroke="currentColor" stroke-width="1" fill="none"></path></g><g transform="translate(158.919574147391 65.60663583679252) rotate(0 21.79837457055784 
33.44156867903382)"><path d="M36.79838097459263 37.663585851030334 C38.83045306483746 46.397481052426734, 40.86252515508229 55.131376253823134, 43.59674914111568 66.88313735806764 M36.79838097459263 37.663585851030334 C39.199887005619736 47.98531710318542, 41.60139303664684 58.307048355340505, 43.59674914111568 66.88313735806764" stroke="currentColor" stroke-width="1" fill="none"></path></g></g></svg>
<figcaption>Hash Index split</figcaption>
</figure>
<p>Luckily, PostgreSQL does all the heavy lifting for you, so you don't have to decide what hash function to use, or how many buckets there are.</p>
<p>If you want to learn more about the internals of Hash indexes in PostgreSQL, check out the <a href="https://github.com/postgres/postgres/blob/master/src/backend/access/hash/README" rel="noopener">readme on "Hash Indexing"</a>.</p>
<hr>
<h2 id="hash-index-in-postgresql"><a class="toclink" href="#hash-index-in-postgresql">Hash Index in PostgreSQL</a></h2>
<p>Hash indexes were discouraged prior to PostgreSQL 10. If you read the <a href="https://www.postgresql.org/docs/9.6/indexes-types.html" rel="noopener">index type documentation for PostgreSQL 9.6</a> you'll find a nasty warning about Hash indexes. According to the warning, Hash indexes are not written to the WAL so they cannot be maintained in replicas, and they are not automatically available after a crash, so you need to manually rebuild them.</p>
<p>Starting with PostgreSQL 10, these limitations were resolved, and <a href="https://www.postgresql.org/docs/10/release-10.html#id-1.11.6.20.5.3.3" rel="noopener">Hash indexes are no longer discouraged</a>.</p>
<h3 id="creating-hash-indexes"><a class="toclink" href="#creating-hash-indexes">Creating Hash Indexes</a></h3>
<p>Now that you know how Hash indexes work in PostgreSQL, you are ready to see them in action. To illustrate, we'll create an imaginary URL shortening service.</p>
<p>A URL shortener service, such as "bit.ly" or the late "goo.gl", provides a short random URL that points to a longer URL. For example, you can have the short URL <code>https://short.url/hb9837</code> point to the longer URL <code>https://hakibenita.com/automating-the-boring-stuff-in-django-using-the-check-framework</code>. We'll call the <code>hb9837</code> part of the short URL the <code>key</code>.</p>
<p>To implement your URL shortening service, create a table to store the mapping between the keys and the full URLs:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">shorturl</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="nb">serial</span><span class="w"> </span><span class="k">primary</span><span class="w"> </span><span class="k">key</span><span class="p">,</span>
<span class="w"> </span><span class="k">key</span><span class="w"> </span><span class="nb">text</span><span class="w"> </span><span class="k">not</span><span class="w"> </span><span class="k">null</span><span class="p">,</span>
<span class="w"> </span><span class="n">url</span><span class="w"> </span><span class="nb">text</span><span class="w"> </span><span class="k">not</span><span class="w"> </span><span class="k">null</span>
<span class="p">);</span>
</pre></div>
<p>The table includes an auto incrementing primary key <code>id</code>, a text field for <code>key</code>, and a <code>url</code> field containing the full URL.</p>
<p>Next, create both a B-Tree index and a Hash index on each of the two fields:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">shorturl_key_hash_index</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">shorturl</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">hash</span><span class="p">(</span><span class="k">key</span><span class="p">);</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">UNIQUE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">shorturl_key_btree_index</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">shorturl</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">btree</span><span class="p">(</span><span class="k">key</span><span class="p">);</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">shorturl_url_hash_index</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">shorturl</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">hash</span><span class="p">(</span><span class="n">url</span><span class="p">);</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">shorturl_url_btree_index</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">shorturl</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">btree</span><span class="p">(</span><span class="n">url</span><span class="p">);</span>
</pre></div>
<p>You might have noticed that the B-Tree on <code>key</code> is a unique index, but the Hash index is not. This is because Hash indexes currently cannot be used to enforce unique constraints.</p>
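<p>A minimal sketch of what this limitation looks like in practice, assuming the <code>shorturl</code> table from above. The index name <code>shorturl_key_hash_unique_index</code> is made up for this illustration:</p>

```sql
-- Hypothetical index name, used only for this illustration.
-- PostgreSQL is expected to reject this statement with an error along the lines of:
--   ERROR:  access method "hash" does not support unique indexes
CREATE UNIQUE INDEX shorturl_key_hash_unique_index ON shorturl USING hash(key);
```

<p>Because the statement fails, no index is created, and uniqueness of <code>key</code> in this example is left to the B-Tree index.</p>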
<h3 id="hash-index-size"><a class="toclink" href="#hash-index-size">Hash Index Size</a></h3>
<p>The first thing you want to do is to compare the size of a Hash index to the size of a similar B-Tree index:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">EXTENSION</span><span class="w"> </span><span class="s s-Name">"uuid-ossp"</span><span class="p">;</span>
<span class="k">DO</span><span class="w"> </span><span class="s">$$</span>
<span class="k">BEGIN</span>
<span class="w"> </span><span class="k">FOR</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="mf">0..1000000</span><span class="w"> </span><span class="k">loop</span>
<span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">shorturl</span><span class="w"> </span><span class="p">(</span><span class="k">key</span><span class="p">,</span><span class="w"> </span><span class="n">url</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">uuid_generate_v4</span><span class="p">(),</span>
<span class="w"> </span><span class="s1">'https://www.supercool-url.com/'</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">round</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">10</span><span class="w"> </span><span class="o">^</span><span class="w"> </span><span class="mf">6</span><span class="p">)</span><span class="o">::</span><span class="nb">text</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">mod</span><span class="p">(</span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="mf">10000</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="k">RAISE</span><span class="w"> </span><span class="k">NOTICE</span><span class="w"> </span><span class="s1">'rows:% Hash key:% B-Tree key:% Hash url:% B-Tree url:%'</span><span class="p">,</span>
<span class="w"> </span><span class="n">to_char</span><span class="p">(</span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s1">'9999999999'</span><span class="p">),</span>
<span class="w"> </span><span class="n">to_char</span><span class="p">(</span><span class="n">pg_relation_size</span><span class="p">(</span><span class="s1">'shorturl_key_hash_index'</span><span class="p">),</span><span class="w"> </span><span class="s1">'99999999999'</span><span class="p">),</span>
<span class="w"> </span><span class="n">to_char</span><span class="p">(</span><span class="n">pg_relation_size</span><span class="p">(</span><span class="s1">'shorturl_key_btree_index'</span><span class="p">),</span><span class="w"> </span><span class="s1">'99999999999'</span><span class="p">),</span>
<span class="w"> </span><span class="n">to_char</span><span class="p">(</span><span class="n">pg_relation_size</span><span class="p">(</span><span class="s1">'shorturl_url_hash_index'</span><span class="p">),</span><span class="w"> </span><span class="s1">'99999999999'</span><span class="p">),</span>
<span class="w"> </span><span class="n">to_char</span><span class="p">(</span><span class="n">pg_relation_size</span><span class="p">(</span><span class="s1">'shorturl_url_btree_index'</span><span class="p">),</span><span class="w"> </span><span class="s1">'99999999999'</span><span class="p">);</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">LOOP</span><span class="p">;</span>
<span class="k">END</span><span class="p">;</span>
<span class="s">$$</span><span class="p">;</span>
</pre></div>
<p>To get a better sense of how both index types behave, the script keeps track of the size of the indexes as rows are inserted into the table:</p>
<ul>
<li>Generate 1,000,000 short URLs one by one</li>
<li>Every 10,000 rows, log the size of all four indexes</li>
<li>Use <code>uuid_generate_v4</code> from the <a href="https://www.postgresql.org/docs/current/uuid-ossp.html" rel="noopener">uuid-ossp</a> extension to generate UUIDs as keys</li>
<li>Generate a long URL by appending a random number between 0 and 1,000,000 to a fixed prefix, so it's not unique, but almost unique</li>
</ul>
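<p>To see how the two columns differ once the script has finished, you can compare the number of distinct values in each. This is only a quick sanity check, assuming the <code>shorturl</code> table populated by the script above. The exact counts depend on the random values generated in your run:</p>

```sql
-- Compare the number of rows with the number of distinct values per column.
-- The UUID keys are expected to be distinct, while the random URLs repeat.
SELECT
    count(*) AS total_rows,
    count(DISTINCT key) AS distinct_keys,
    count(DISTINCT url) AS distinct_urls
FROM shorturl;
```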
<p>Running the script on PostgreSQL 13 and plotting the results gives the following chart:</p>
<figure><img alt="Hash vs. B-Tree index size" src="https://hakibenita.com/images/01-postgresql-hash-index.png"><figcaption>Hash vs. B-Tree index size</figcaption>
</figure>
<p>The chart provides several interesting observations:</p>
<ol>
<li>
<p><strong>The Hash index is smaller than the B-Tree index</strong>: Almost all along the way, the Hash index on each field is smaller than the corresponding B-Tree index.</p>
</li>
<li>
<p><strong>Hash index grows in increments</strong>: Unlike the B-Tree index, which seems to grow linearly as rows are inserted into the table, the Hash index grows in sudden increments. These sudden increments occur when a split is triggered to allocate space for more buckets. When that space runs out, another split is triggered, and so on.</p>
</li>
<li>
<p><strong>Hash index size is not affected by the size of the indexed value</strong>: A Hash index stores the hash codes of the indexed field values. A B-Tree, on the other hand, stores the actual value of the indexed field in its leaves, so the size of the indexed value affects the size of a B-Tree index, but not the size of a Hash index.</p>
</li>
<li>
<p><strong>Hash index size is not affected by the selectivity of the indexed value</strong>: The URL field contains duplicate values, so its selectivity is different from that of the key field, which is almost unique, yet the Hash indexes on both fields are the same size. The size of a B-Tree index, on the other hand, can vary based on the selectivity of the data (see the note below about "B-Tree deduplication").</p>
</li>
</ol>
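<p>The third observation can be illustrated with a short query. This is a sketch that assumes the built-in <code>hashtext</code> function is a reasonable stand-in for the hash code a Hash index stores for a <code>text</code> value. The function is internal and not part of the documented user-facing API, so treat it as an illustration only:</p>

```sql
-- hashtext returns a 4-byte integer regardless of the length of its input.
-- Both columns are expected to report the same size in bytes.
SELECT
    pg_column_size(hashtext('hb9837')) AS short_value_hash_bytes,
    pg_column_size(hashtext(repeat('x', 1000))) AS long_value_hash_bytes;
```

<p>A six-character key and a 1,000-character string both produce a hash code of the same size, which is consistent with the Hash index size not depending on the size of the indexed value.</p>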
<div class="admonition info">
<p class="admonition-title">B-Tree deduplication</p>
<p>PostgreSQL 13 introduced a new <a href="https://www.postgresql.org/docs/current/btree-implementation.html#BTREE-DEDUPLICATION" rel="noopener">B-Tree deduplication</a> mechanism that reduces the size of B-Tree indexes with many duplicate values. In the chart above, the size of the B-Tree index on the <code>url</code> field is smaller than the size of the index on the <code>key</code> field because it has fewer unique values.</p>
<figure><img alt="Hash vs. B-Tree index size on PostgreSQL 12" src="https://hakibenita.com/images/02-postgresql-hash-index.png"><figcaption>Hash vs. B-Tree index size on PostgreSQL 12</figcaption>
</figure>
<p>If you run the same script with deduplication disabled, or on PostgreSQL versions prior to 13, you'll see that the two B-Tree indexes are the same size (as long as the indexed values are the same size).</p>
</div>
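<p>One way to try this without rerunning the whole script is to build a second B-Tree index on <code>url</code> with deduplication turned off, using the <code>deduplicate_items</code> storage parameter available from PostgreSQL 13. This is a rough sketch: the index name <code>shorturl_url_btree_nodedup_index</code> is made up, and an index built in one go on an existing table will not be exactly the same size as one that grew row by row:</p>

```sql
-- Hypothetical index name; deduplicate_items requires PostgreSQL 13 or later.
CREATE INDEX shorturl_url_btree_nodedup_index
    ON shorturl USING btree(url)
    WITH (deduplicate_items = off);

-- Compare the existing B-Tree index with the one built without deduplication.
SELECT
    pg_size_pretty(pg_relation_size('shorturl_url_btree_index')) AS with_deduplication,
    pg_size_pretty(pg_relation_size('shorturl_url_btree_nodedup_index')) AS without_deduplication;
```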
<p>The script you ran before imitates a workload of an OLTP-like system, where single rows are constantly being added to the table. This is the size of the indexes after the script finished:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="kp">\di+</span><span class="w"> </span><span class="ss">shorturl*</span>
<span class="go">           Name           │ Type  │  Table   │ Size</span>
<span class="go">──────────────────────────┼───────┼──────────┼───────</span>
<span class="go"> shorturl_key_btree_index │ index │ shorturl │ 73 MB</span>
<span class="go"> shorturl_key_hash_index  │ index │ shorturl │ 37 MB</span>
<span class="go"> shorturl_url_btree_index │ index │ shorturl │ 53 MB</span>
<span class="go"> shorturl_url_hash_index  │ index │ shorturl │ 36 MB</span>
</pre></div>
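<p>At this point the Hash index on <code>key</code> is roughly half the size of the corresponding B-Tree index, and the Hash index on <code>url</code> is about two thirds the size of its B-Tree counterpart. To make the comparison even stricter, reindex the table and let the database rebuild the indexes from scratch:</p>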
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">REINDEX</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">shorturl</span><span class="p">;</span>
<span class="go">REINDEX</span>
<span class="gp">db=#</span><span class="w"> </span><span class="kp">\di+</span><span class="w"> </span><span class="ss">shorturl*</span>
<span class="go">           Name           │ Type  │  Table   │ Size</span>
<span class="go">──────────────────────────┼───────┼──────────┼───────</span>
<span class="go"> shorturl_key_btree_index │ index │ shorturl │ 56 MB</span>
<span class="go"> shorturl_key_hash_index  │ index │ shorturl │ 32 MB</span>
<span class="go"> shorturl_url_btree_index │ index │ shorturl │ 41 MB</span>
<span class="go"> shorturl_url_hash_index  │ index │ shorturl │ 32 MB</span>
</pre></div>
<p>Reindexing reduced the size of all the indexes, some more than others. Reindexed, both Hash indexes are still smaller than the corresponding B-Tree indexes.</p>
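<p>If you prefer plain SQL over the <code>\di+</code> meta-command, for example to log index sizes from a script, a query along these lines should report the same information. It is a sketch that relies on the <code>pg_stat_user_indexes</code> statistics view, and it also lists the primary key index of the table:</p>

```sql
-- List every index on the shorturl table together with its on-disk size.
SELECT
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'shorturl'
ORDER BY indexrelname;
```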
<h3 id="hash-index-fillfactor"><a class="toclink" href="#hash-index-fillfactor">Hash Index <code>fillfactor</code></a></h3>
<p>A useful storage parameter that affects when a Hash index is split is <a href="https://www.postgresql.org/docs/current/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS" rel="noopener"><code>fillfactor</code></a>. In a Hash index, the <code>fillfactor</code> parameter controls the ratio between tuples and buckets that triggers a split. The lower the <code>fillfactor</code>, the more "empty" space in the index.</p>
<p>The default <code>fillfactor</code> for a B-Tree index is 90; for a Hash index, the default is 75. To illustrate how <code>fillfactor</code> affects the size of a Hash index, compare the size of Hash indexes with different <code>fillfactor</code> values while inserting rows into the table:</p>
<p><details></p>
<p><summary>Code</summary></p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">shorturl_ff</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="nb">serial</span><span class="w"> </span><span class="k">primary</span><span class="w"> </span><span class="k">key</span><span class="p">,</span>
<span class="w"> </span><span class="k">key</span><span class="w"> </span><span class="nb">text</span><span class="w"> </span><span class="k">not</span><span class="w"> </span><span class="k">null</span><span class="p">,</span>
<span class="w"> </span><span class="n">url</span><span class="w"> </span><span class="nb">text</span><span class="w"> </span><span class="k">not</span><span class="w"> </span><span class="k">null</span>
<span class="p">);</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">shorturl_key_hash_index_025</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">shorturl_ff</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">hash</span><span class="p">(</span><span class="k">key</span><span class="p">)</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="p">(</span><span class="n">fillfactor</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">25</span><span class="p">);</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">shorturl_key_hash_index_050</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">shorturl_ff</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">hash</span><span class="p">(</span><span class="k">key</span><span class="p">)</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="p">(</span><span class="n">fillfactor</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">50</span><span class="p">);</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">shorturl_key_hash_index_075</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">shorturl_ff</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">hash</span><span class="p">(</span><span class="k">key</span><span class="p">)</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="p">(</span><span class="n">fillfactor</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">75</span><span class="p">);</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">shorturl_key_hash_index_100</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">shorturl_ff</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">hash</span><span class="p">(</span><span class="k">key</span><span class="p">)</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="p">(</span><span class="n">fillfactor</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">100</span><span class="p">);</span>
<span class="k">DO</span><span class="w"> </span><span class="s">$$</span>
<span class="k">BEGIN</span>
<span class="w"> </span><span class="k">RAISE</span><span class="w"> </span><span class="k">NOTICE</span><span class="w"> </span><span class="s1">'rows,25,50,75,100'</span><span class="p">;</span>
<span class="w"> </span><span class="k">FOR</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="mf">0..1000000</span><span class="w"> </span><span class="k">LOOP</span>
<span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">shorturl_ff</span><span class="w"> </span><span class="p">(</span><span class="k">key</span><span class="p">,</span><span class="w"> </span><span class="n">url</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">uuid_generate_v4</span><span class="p">(),</span>
<span class="w"> </span><span class="s1">'https://www.supercool-url.com/'</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">round</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">10</span><span class="w"> </span><span class="o">^</span><span class="w"> </span><span class="mf">6</span><span class="p">)</span><span class="o">::</span><span class="nb">text</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">mod</span><span class="p">(</span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="mf">10000</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="k">RAISE</span><span class="w"> </span><span class="k">NOTICE</span><span class="w"> </span><span class="s1">'%,%,%,%,%'</span><span class="p">,</span>
<span class="w"> </span><span class="n">to_char</span><span class="p">(</span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s1">'9999999999'</span><span class="p">),</span>
<span class="w"> </span><span class="n">pg_relation_size</span><span class="p">(</span><span class="s1">'shorturl_key_hash_index_025'</span><span class="p">),</span>
<span class="w"> </span><span class="n">pg_relation_size</span><span class="p">(</span><span class="s1">'shorturl_key_hash_index_050'</span><span class="p">),</span>
<span class="w"> </span><span class="n">pg_relation_size</span><span class="p">(</span><span class="s1">'shorturl_key_hash_index_075'</span><span class="p">),</span>
<span class="w"> </span><span class="n">pg_relation_size</span><span class="p">(</span><span class="s1">'shorturl_key_hash_index_100'</span><span class="p">);</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">LOOP</span><span class="p">;</span>
<span class="k">END</span><span class="p">;</span>
<span class="s">$$</span><span class="p">;</span>
</pre></div>
<p></details></p>
<figure><img alt="Hash index size with different fillfactor values" src="https://hakibenita.com/images/03-postgresql-hash-index.png"><figcaption>Hash index size with different fillfactor values</figcaption>
</figure>
<p>The chart confirms that low <code>fillfactor</code> values trigger bucket splits sooner, which leads to a larger index size. Unlike B-Tree indexes that reserve space for updated rows, new values in a Hash index can go to any bucket, so keep that in mind when you adjust the <code>fillfactor</code> for a Hash index.</p>
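<p>You can also change the <code>fillfactor</code> of an existing Hash index. The following is a minimal sketch, assuming the <code>shorturl_key_hash_index</code> index created earlier. The value 50 is arbitrary and chosen only for illustration. The new setting takes full effect only after the index is rebuilt:</p>

```sql
-- Change the storage parameter; existing pages are not rewritten by this statement.
ALTER INDEX shorturl_key_hash_index SET (fillfactor = 50);

-- Rebuild the index so the new fillfactor applies to the whole index.
REINDEX INDEX shorturl_key_hash_index;

-- Check the stored option and the resulting index size.
SELECT
    relname,
    reloptions,
    pg_size_pretty(pg_relation_size(oid)) AS index_size
FROM pg_class
WHERE relname = 'shorturl_key_hash_index';
```

<p>Keep in mind that <code>REINDEX</code> takes locks that block writes to the table, so on a busy system you may want to schedule it accordingly.</p>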
<hr>
<h2 id="hash-index-performance"><a class="toclink" href="#hash-index-performance">Hash Index Performance</a></h2>
<p>Striking a good balance between index size and speed is key to a healthy database. So far you've seen that Hash indexes can be smaller in size than similar B-Tree indexes under some circumstances, but can a Hash index be faster than a B-Tree?</p>
<h3 id="hash-index-insert-performance"><a class="toclink" href="#hash-index-insert-performance">Hash Index Insert Performance</a></h3>
<p>When you insert a row into a table, the row is also inserted into all the indexes on the table. Having many indexes on a table can speed up queries, but it might have the opposite effect on inserts.</p>
<p>To determine the effect of Hash indexes on insert performance, create the shorturl table, and add a Hash index on the <code>key</code> column:</p>
<div class="highlight"><pre><span></span><span class="k">DROP</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">EXISTS</span><span class="w"> </span><span class="n">shorturl_hash</span><span class="p">;</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">shorturl_hash</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="nb">serial</span><span class="w"> </span><span class="k">primary</span><span class="w"> </span><span class="k">key</span><span class="p">,</span>
<span class="w"> </span><span class="k">key</span><span class="w"> </span><span class="nb">text</span><span class="w"> </span><span class="k">not</span><span class="w"> </span><span class="k">null</span><span class="p">,</span>
<span class="w"> </span><span class="n">url</span><span class="w"> </span><span class="nb">text</span><span class="w"> </span><span class="k">not</span><span class="w"> </span><span class="k">null</span>
<span class="p">);</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">shorturl_hash_hash_ix</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">shorturl_hash</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">hash</span><span class="p">(</span><span class="k">key</span><span class="p">);</span>
</pre></div>
<p>Next, insert 1M rows one by one, and get the total and average timing:</p>
<div class="highlight"><pre><span></span><span class="k">DO</span><span class="w"> </span><span class="s">$$</span>
<span class="k">DECLARE</span>
<span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="nb">INTEGER</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mf">1000000</span><span class="p">;</span>
<span class="w"> </span><span class="n">duration</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">start</span><span class="w"> </span><span class="nb">TIMESTAMP</span><span class="p">;</span>
<span class="w"> </span><span class="n">uid</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">;</span>
<span class="w"> </span><span class="n">url</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">;</span>
<span class="k">BEGIN</span>
<span class="w"> </span><span class="k">FOR</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="mf">1..</span><span class="n">n</span><span class="w"> </span><span class="k">LOOP</span>
<span class="w"> </span><span class="n">uid</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="n">uuid_generate_v4</span><span class="p">()</span><span class="o">::</span><span class="nb">text</span><span class="p">;</span>
<span class="w"> </span><span class="n">url</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s1">'https://www.supercool-url.com/'</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">round</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">10</span><span class="w"> </span><span class="o">^</span><span class="w"> </span><span class="mf">6</span><span class="p">)</span><span class="o">::</span><span class="nb">text</span><span class="p">;</span>
<span class="w"> </span><span class="k">start</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="n">clock_timestamp</span><span class="p">();</span>
<span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">shorturl_hash</span><span class="w"> </span><span class="p">(</span><span class="k">key</span><span class="p">,</span><span class="w"> </span><span class="n">url</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="n">uid</span><span class="p">,</span><span class="w"> </span><span class="n">url</span><span class="p">);</span>
<span class="w"> </span><span class="n">duration</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="n">duration</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">clock_timestamp</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="k">start</span><span class="p">);</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">LOOP</span><span class="p">;</span>
<span class="w"> </span><span class="k">RAISE</span><span class="w"> </span><span class="k">NOTICE</span><span class="w"> </span><span class="s1">'Hash: total=% mean=%'</span><span class="p">,</span><span class="w"> </span><span class="n">duration</span><span class="p">,</span><span class="w"> </span><span class="k">extract</span><span class="p">(</span><span class="s1">'epoch'</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">duration</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">n</span><span class="p">;</span>
<span class="k">END</span><span class="p">;</span>
<span class="s">$$</span><span class="p">;</span>
<span class="hll"><span class="n">Hash</span><span class="p">:</span><span class="w"> </span><span class="n">total</span><span class="o">=</span><span class="mf">00</span><span class="p">:</span><span class="mf">00</span><span class="p">:</span><span class="mf">09.764746</span><span class="w"> </span><span class="n">mean</span><span class="o">=</span><span class="mf">9.764746e-06</span>
</span></pre></div>
<p>Inserting 1M rows one by one into a table with a Hash index took just under 10s.</p>
<p>Executing the same script on a table with a B-Tree index produces the following results:</p>
<p><details markdown="1"></p>
<p><summary>Code</summary></p>
<div class="highlight"><pre><span></span><span class="k">DROP</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">EXISTS</span><span class="w"> </span><span class="n">shorturl_btree</span><span class="p">;</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">shorturl_btree</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="nb">serial</span><span class="w"> </span><span class="k">primary</span><span class="w"> </span><span class="k">key</span><span class="p">,</span>
<span class="w"> </span><span class="k">key</span><span class="w"> </span><span class="nb">text</span><span class="w"> </span><span class="k">not</span><span class="w"> </span><span class="k">null</span><span class="p">,</span>
<span class="w"> </span><span class="n">url</span><span class="w"> </span><span class="nb">text</span><span class="w"> </span><span class="k">not</span><span class="w"> </span><span class="k">null</span>
<span class="p">);</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">shorturl_btree_btree_ix</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">shorturl_btree</span><span class="p">(</span><span class="k">key</span><span class="p">);</span>
<span class="k">DO</span><span class="w"> </span><span class="s">$$</span>
<span class="k">DECLARE</span>
<span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="nb">INTEGER</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mf">1000000</span><span class="p">;</span>
<span class="w"> </span><span class="n">duration</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">start</span><span class="w"> </span><span class="nb">TIMESTAMP</span><span class="p">;</span>
<span class="w"> </span><span class="n">uid</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">;</span>
<span class="w"> </span><span class="n">url</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">;</span>
<span class="k">BEGIN</span>
<span class="w"> </span><span class="k">FOR</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="mf">1..</span><span class="n">n</span><span class="w"> </span><span class="k">LOOP</span>
<span class="w"> </span><span class="n">uid</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="n">uuid_generate_v4</span><span class="p">()</span><span class="o">::</span><span class="nb">text</span><span class="p">;</span>
<span class="w"> </span><span class="n">url</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s1">'https://www.supercool-url.com/'</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">round</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">10</span><span class="w"> </span><span class="o">^</span><span class="w"> </span><span class="mf">6</span><span class="p">)</span><span class="o">::</span><span class="nb">text</span><span class="p">;</span>
<span class="w"> </span><span class="k">start</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="n">clock_timestamp</span><span class="p">();</span>
<span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">shorturl_btree</span><span class="w"> </span><span class="p">(</span><span class="k">key</span><span class="p">,</span><span class="w"> </span><span class="n">url</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="n">uid</span><span class="p">,</span><span class="w"> </span><span class="n">url</span><span class="p">);</span>
<span class="w"> </span><span class="n">duration</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="n">duration</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">clock_timestamp</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="k">start</span><span class="p">);</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">LOOP</span><span class="p">;</span>
<span class="w"> </span><span class="k">RAISE</span><span class="w"> </span><span class="k">NOTICE</span><span class="w"> </span><span class="s1">'B-Tree: total=% mean=%'</span><span class="p">,</span><span class="w"> </span><span class="n">duration</span><span class="p">,</span><span class="w"> </span><span class="k">extract</span><span class="p">(</span><span class="s1">'epoch'</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">duration</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">n</span><span class="p">;</span>
<span class="k">END</span><span class="p">;</span>
<span class="s">$$</span><span class="p">;</span>
</pre></div>
<p></details></p>
<div class="highlight"><pre><span></span>B-Tree: total=00:00:10.822158 mean=1.0822158e-05
</pre></div>
<p>Inserting 1M rows one by one into a table with a B-Tree index took almost 11s, about one second longer than with the Hash index. While far from scientific, this test suggests that <strong>a Hash index has less impact on insert performance than a B-Tree index.</strong></p>
<h3 id="hash-index-select-performance"><a class="toclink" href="#hash-index-select-performance">Hash Index Select Performance</a></h3>
<p>In your URL shortening service, you want to be able to quickly look up a URL based on its key. To see how a Hash index performs in relation to a B-Tree, run a benchmark to query the table many times using the index, and get the total and average timing:</p>
<div class="highlight"><pre><span></span><span class="k">DO</span><span class="w"> </span><span class="s">$$</span>
<span class="k">DECLARE</span>
<span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="nb">INTEGER</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mf">100000</span><span class="p">;</span>
<span class="w"> </span><span class="n">duration</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">start</span><span class="w"> </span><span class="nb">TIMESTAMP</span><span class="p">;</span>
<span class="w"> </span><span class="n">keys</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">[];</span>
<span class="k">BEGIN</span>
<span class="w"> </span><span class="c1">-- Fetch random keys from the table</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">ARRAY_AGG</span><span class="p">(</span><span class="k">key</span><span class="p">)</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">keys</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="k">key</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">shorturl_hash</span>
<span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">random</span><span class="p">()</span>
<span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="n">n</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">foo</span><span class="p">;</span>
<span class="w"> </span><span class="k">FOR</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="n">array_lower</span><span class="p">(</span><span class="n">keys</span><span class="p">,</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="mf">..</span><span class="n">array_upper</span><span class="p">(</span><span class="n">keys</span><span class="p">,</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="k">LOOP</span>
<span class="w"> </span><span class="k">start</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="n">clock_timestamp</span><span class="p">();</span>
<span class="w"> </span><span class="k">PERFORM</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">shorturl_hash</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="k">key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">keys</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="w"> </span><span class="n">duration</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="n">duration</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">clock_timestamp</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="k">start</span><span class="p">);</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">LOOP</span><span class="p">;</span>
<span class="w"> </span><span class="k">RAISE</span><span class="w"> </span><span class="k">NOTICE</span><span class="w"> </span><span class="s1">'Hash: total=% mean=%'</span><span class="p">,</span><span class="w"> </span><span class="n">duration</span><span class="p">,</span><span class="w"> </span><span class="k">extract</span><span class="p">(</span><span class="s1">'epoch'</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">duration</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">n</span><span class="p">;</span>
<span class="k">END</span><span class="p">;</span>
<span class="s">$$</span><span class="p">;</span>
<span class="hll"><span class="n">Hash</span><span class="p">:</span><span class="w"> </span><span class="n">total</span><span class="o">=</span><span class="mf">00</span><span class="p">:</span><span class="mf">00</span><span class="p">:</span><span class="mf">00.590032</span><span class="w"> </span><span class="n">mean</span><span class="o">=</span><span class="mf">5.90032e-06</span>
</span></pre></div>
<p>Selecting 100K random keys from the table one by one took just over half a second.</p>
<p>Next, execute the same on a table with a B-Tree index:</p>
<p><details markdown="1">
<summary>Code</summary></p>
<div class="highlight"><pre><span></span><span class="k">DO</span><span class="w"> </span><span class="s">$$</span>
<span class="k">DECLARE</span>
<span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="nb">INTEGER</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mf">100000</span><span class="p">;</span>
<span class="w"> </span><span class="n">duration</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">start</span><span class="w"> </span><span class="nb">TIMESTAMP</span><span class="p">;</span>
<span class="w"> </span><span class="n">keys</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">[];</span>
<span class="k">BEGIN</span>
<span class="w"> </span><span class="c1">-- Fetch random keys from the table</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">ARRAY_AGG</span><span class="p">(</span><span class="k">key</span><span class="p">)</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">keys</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="k">key</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">shorturl_btree</span>
<span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">random</span><span class="p">()</span>
<span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="n">n</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">foo</span><span class="p">;</span>
<span class="w"> </span><span class="k">FOR</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="n">array_lower</span><span class="p">(</span><span class="n">keys</span><span class="p">,</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="mf">..</span><span class="n">array_upper</span><span class="p">(</span><span class="n">keys</span><span class="p">,</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="k">LOOP</span>
<span class="w"> </span><span class="k">start</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="n">clock_timestamp</span><span class="p">();</span>
<span class="w"> </span><span class="k">PERFORM</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">shorturl_btree</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="k">key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">keys</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="w"> </span><span class="n">duration</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="n">duration</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">clock_timestamp</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="k">start</span><span class="p">);</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">LOOP</span><span class="p">;</span>
<span class="w"> </span><span class="k">RAISE</span><span class="w"> </span><span class="k">NOTICE</span><span class="w"> </span><span class="s1">'B-Tree: total=% mean=%'</span><span class="p">,</span><span class="w"> </span><span class="n">duration</span><span class="p">,</span><span class="w"> </span><span class="k">extract</span><span class="p">(</span><span class="s1">'epoch'</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">duration</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">n</span><span class="p">;</span>
<span class="k">END</span><span class="p">;</span>
<span class="s">$$</span><span class="p">;</span>
</pre></div>
<p></details></p>
<div class="highlight"><pre><span></span>B-Tree: total=00:00:00.923244 mean=9.23244e-06
</pre></div>
<p>Both are fast, but <strong>the Hash index outperforms the B-Tree index</strong>: the 100K lookups took roughly a third less time, although the difference amounts to only a few microseconds per lookup.</p>
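<p>To put the two benchmarks side by side, the following Python sketch does the arithmetic on the totals reported above. The numbers are copied from the <code>RAISE NOTICE</code> output; the helper names <code>time_saved</code> and <code>mean_gap_us</code> are made up for this example.</p>

```python
# Totals reported by the benchmarks above, in seconds.
INSERT_TOTALS = {"hash": 9.764746, "btree": 10.822158}  # 1M single-row inserts
SELECT_TOTALS = {"hash": 0.590032, "btree": 0.923244}   # 100K single-key lookups


def time_saved(totals: dict) -> float:
    """Fraction of the B-Tree time that the Hash index saved."""
    return 1 - totals["hash"] / totals["btree"]


def mean_gap_us(totals: dict, n: int) -> float:
    """Average difference per operation, in microseconds."""
    return (totals["btree"] - totals["hash"]) / n * 1e6


if __name__ == "__main__":
    print(f"insert: {time_saved(INSERT_TOTALS):.1%} less time, "
          f"{mean_gap_us(INSERT_TOTALS, 1_000_000):.2f}us per row")
    print(f"select: {time_saved(SELECT_TOTALS):.1%} less time, "
          f"{mean_gap_us(SELECT_TOTALS, 100_000):.2f}us per lookup")
```

<p>These are single runs on one machine, so treat the percentages as a rough indication rather than a measurement you can rely on.</p>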
<h3 id="hash-index-limitations"><a class="toclink" href="#hash-index-limitations">Hash Index Limitations</a></h3>
<p>Hash index benefits do not come without a cost. While it can outperform a B-Tree under some circumstances, it has many limitations.</p>
<p>A Hash index cannot be used to enforce a unique constraint:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">UNIQUE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">shorturl_unique_key</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">shorturl</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">hash</span><span class="p">(</span><span class="k">key</span><span class="p">);</span>
<span class="gs">ERROR:</span><span class="gr"> access method "hash" does not support unique indexes</span>
</pre></div>
<p>A Hash index cannot be created on multiple columns:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">shorturl_key_and_url</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">shorturl</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">hash</span><span class="p">(</span><span class="k">key</span><span class="p">,</span><span class="w"> </span><span class="n">url</span><span class="p">);</span>
<span class="gs">ERROR:</span><span class="gr"> access method "hash" does not support multicolumn indexes</span>
</pre></div>
<p>A Hash index cannot be created as a sorted index:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">shorturl_sorted_key</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">shorturl</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">hash</span><span class="p">(</span><span class="k">key</span><span class="w"> </span><span class="k">desc</span><span class="p">);</span>
<span class="gs">ERROR:</span><span class="gr"> access method "hash" does not support ASC/DESC options</span>
</pre></div>
<p>If you don't mind these limitations and you decide to create a Hash index, there are some limitations on <em>using</em> Hash indexes as well.</p>
<p>You can't use a Hash index to cluster a table:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CLUSTER</span><span class="w"> </span><span class="n">shorturl</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">shorturl_url_hash_index</span><span class="p">;</span>
<span class="gs">ERROR:</span><span class="gr"> cannot cluster on index "shorturl_url_hash_index" because access method does not support clustering</span>
</pre></div>
<div class="admonition tip">
<p class="admonition-title">Clustering</p>
<p>Not sure what the <code>CLUSTER</code> command does, or why you would want to use it? Check out my tip on <a href="/sql-tricks-application-dba#always-load-sorted-data">always loading sorted data into tables</a>.</p>
</div>
<p>A Hash index cannot be used for range lookups:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="n">COSTS</span><span class="w"> </span><span class="k">OFF</span><span class="p">)</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">shorturl</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="k">key</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="s1">'1'</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="s1">'2'</span><span class="p">;</span>
<span class="go"> QUERY PLAN</span>
<span class="go">────────────────────────────────────────────────────────────────────────────────</span>
<span class="go">Gather</span>
<span class="go"> -> Seq Scan on shorturl</span>
<span class="go"> Filter: ((key >= '1'::text) AND (key <= '2'::text))</span>
</pre></div>
<p>A Hash index cannot be used to satisfy <code>ORDER BY</code> queries:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="n">COSTS</span><span class="w"> </span><span class="k">OFF</span><span class="p">)</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">shorturl</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">key</span><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mf">10</span><span class="p">;</span>
<span class="go"> QUERY PLAN</span>
<span class="go">────────────────────────────────────────────────────────────────────────────────</span>
<span class="go">Limit</span>
<span class="go"> -> Index Scan using shorturl_key_btree_index on shorturl</span>
</pre></div>
<p>The database is able to use the B-Tree index on the <code>key</code> field to satisfy the query, but not the Hash index. To make sure, drop the B-Tree and check again:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">BEGIN</span><span class="p">;</span>
<span class="go">BEGIN</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">DROP</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">shorturl_key_btree_index</span><span class="p">;</span>
<span class="go">DROP INDEX</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="n">COSTS</span><span class="w"> </span><span class="k">OFF</span><span class="p">)</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">shorturl</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">key</span><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mf">10</span><span class="p">;</span>
<span class="go"> QUERY PLAN</span>
<span class="go">─────────────────────────────────────────────────</span>
<span class="go"> Limit</span>
<span class="go"> -> Gather Merge</span>
<span class="go"> Workers Planned: 2</span>
<span class="go"> -> Sort</span>
<span class="go"> Sort Key: key</span>
<span class="go"> -> Parallel Seq Scan on shorturl</span>
<span class="go">(6 rows)</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">rollback</span><span class="p">;</span>
<span class="go">ROLLBACK</span>
</pre></div>
<p>Even without the B-Tree the Hash index is not being used.</p>
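<p>The reason behind the last two limitations is that hashing scatters keys across buckets with no regard to their order. The following Python sketch illustrates the idea with two toy structures. It is an illustration under stated assumptions, not how PostgreSQL stores either index: <code>ToyHashIndex</code>, <code>ToySortedIndex</code>, the four buckets and the use of <code>zlib.crc32</code> as the hash function are all hypothetical.</p>

```python
import zlib
from bisect import bisect_left, bisect_right

NUM_BUCKETS = 4  # arbitrary, for illustration only


def bucket_of(key: str) -> int:
    """Map a key to a bucket; neighbouring keys can land in any bucket."""
    return zlib.crc32(key.encode()) % NUM_BUCKETS


class ToyHashIndex:
    def __init__(self, keys):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]
        for key in keys:
            self.buckets[bucket_of(key)].append(key)

    def equals(self, key):
        # Equality lookup: only the one bucket the key hashes to is inspected.
        return [k for k in self.buckets[bucket_of(key)] if k == key]

    def between(self, low, high):
        # No bucket corresponds to a key range, so every bucket is scanned
        # and the matches are sorted afterwards.
        return sorted(k for b in self.buckets for k in b if low <= k <= high)


class ToySortedIndex:
    """Stand-in for an ordered structure such as a B-Tree."""

    def __init__(self, keys):
        self.keys = sorted(keys)

    def between(self, low, high):
        # Binary search to the start of the range, then read keys in order.
        return self.keys[bisect_left(self.keys, low):bisect_right(self.keys, high)]

    def first(self, n):
        # ORDER BY key LIMIT n: the first n entries are already in order.
        return self.keys[:n]


keys = ["fig", "apple", "date", "cherry", "banana", "elderberry"]
hash_index = ToyHashIndex(keys)
sorted_index = ToySortedIndex(keys)
```

<p>Both structures return the same rows for a range, but the hash-based one has to visit everything to do so. At that point it offers no advantage over scanning the table.</p>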
<div class="admonition tip">
<p class="admonition-title">Invisible indexes</p>
<p>Generating a query execution plan as if an existing index were absent, without actually dropping it, is extremely useful. Check out my tip on <a href="/sql-tricks-application-dba#make-indexes-invisible">"Invisible Indexes"</a>.</p>
</div>
<hr>
<h2 id="conclusion"><a class="toclink" href="#conclusion">Conclusion</a></h2>
<p>Hash maps are usually the go-to data structure for storing data for fast retrieval by key. However, in the main data store, the database, we rarely use this data structure. In PostgreSQL at least, this is most likely because the implementation prior to version 10 was very restrictive, and DBAs and developers just got used to dismissing it.</p>
<p>This is a quick recap of our re-introduction to Hash indexes in PostgreSQL:</p>
<ul>
<li>Hash index is usually smaller than a corresponding B-Tree index</li>
<li>Hash index select and insert performance can be better than a B-Tree index</li>
<li>Hash index removed many of its restrictions in PostgreSQL 10, and is now safe to use</li>
<li>Hash index has many restrictions that limit its use to very specific use cases</li>
</ul>
<p>If you are interested in the internals of a Hash index, don't miss the <a href="https://github.com/postgres/postgres/blob/master/src/backend/access/hash/README" rel="noopener">readme in the source</a>.</p>2020 Year in Review2021-01-04T00:00:00+02:002021-01-04T00:00:00+02:00Haki Benitatag:hakibenita.com,2021-01-04:/2020-year-in-review<p>What I've been up to in 2020...</p><hr>
<p>The year 2020 has been a turbulent year, but it was also a year of personal and professional growth:</p>
<ul>
<li>
<p>This year <strong>I published 11 articles</strong>. This is consistent with my personal goal of publishing every month. 7 of these articles were about <a href="tag/python">Python</a>, and the other 4 about <a href="tag/sql">SQL</a>.</p>
</li>
<li>
<p>According to my analytics, <strong>my articles were viewed ~400K times this year</strong>. Given the target audience of my content this is most likely an underestimate, but it's still double the previous year!</p>
</li>
<li>
<p>My articles appear on other sites such as <a href="https://medium.com/@hakibenita" rel="noopener">Medium</a> and <a href="https://realpython.com/team/hbenita/" rel="noopener">Real Python</a>. From the sources I have data on, <strong>my articles were viewed an additional 380K times on other sites</strong>.</p>
</li>
<li>
<p>At the beginning of the year I started a mailing list. Since then, <strong>more than 1.1K readers subscribed to the mailing list</strong>. If you want to get an email when I publish something new you can <a href="subscribe">subscribe to my mailing list</a>.</p>
</li>
<li>
<p>Four of the articles I published this year spent a fair amount of time on the <strong>front page of <a href="https://news.ycombinator.com/from?site=hakibenita.com" rel="noopener">Hacker News</a></strong>, which brought in a fair amount of traffic and new readers.</p>
</li>
<li>
<p>Some articles <strong>did very well this year on social platforms</strong> such as <a href="https://news.ycombinator.com/from?site=hakibenita.com" rel="noopener">Hacker News</a>, <a href="https://www.reddit.com/search/?q=site%3Ahakibenita.com&sort=top" rel="noopener">Reddit</a>, <a href="https://lobste.rs/domain/hakibenita.com" rel="noopener">Lobsters</a> and <a href="https://twitter.com/search?q=hakibenita.com]" rel="noopener">Twitter</a>.</p>
</li>
<li>
<p><strong>Gave two online live training sessions for <a href="https://www.oreilly.com" rel="noopener">O’Reilly</a></strong>. My collaboration with O’Reilly started back in 2019 and eventually came to light at the beginning of this year, when I did two live online training sessions titled <a href="https://www.oreilly.com/live-training/courses/sql-next-steps-optimization/0636920378372/" rel="noopener">SQL Next Steps: Optimization</a>. The class was inspired by my article <a href="sql-dos-and-donts">12 Common Mistakes and Missed Optimization Opportunities in SQL</a> along with some additional content (and many examples). You can find the slides <a href="https://github.com/hakib/oreilly-sql-next-steps" rel="noopener">here</a>.</p>
</li>
<li>
<p><strong>Wrote two tutorials for the <a href="https://www.twilio.com/blog" rel="noopener">Twilio Blog</a></strong>. The tutorials were inspired by the work my team and I did on an Interactive Voice Response (IVR) system. Developing an IVR system was a unique experience for us, and I ended up writing two tutorials about it: <a href="python-django-twilio-ivr">Building an IVR System with Python, Django and Twilio</a> and <a href="python-django-pytest-twilio-ivr">Testing an IVR System With Python and Pytest</a>.</p>
</li>
<li>
<p><strong>Published two articles on <a href="https://realpython.com/" rel="noopener">Real Python</a></strong>. I've been writing for Real Python for several years now and it's been a pleasure. They have a very strict publishing process which can be intimidating at first, but they helped me polish my writing and improve my style. This year I published <a href="django-pytest-fixtures">How to Provide Test Fixtures for Django Models in Pytest</a>, and a month later <a href="move-django-model">How to Move a Django Model to Another App</a>.</p>
</li>
<li>
<p><strong>Collaborated with <a href="https://popsql.com/" rel="noopener">PopSQL</a></strong>. PopSQL approached me about their product earlier in the year, and I thought it could fit nicely with an article I was working on at the time. The article was <a href="sql-anomaly-detection">Simple Anomaly Detection Using Plain SQL</a> and I thought readers could benefit from an interactive editor to experiment with.</p>
</li>
<li>
<p><strong>Gave a talk at <a href="https://www.postgresbuild.com/" rel="noopener">PostgreSQL Build 2020</a></strong>. Late this year I gave a talk at a conference organized by <a href="https://www.enterprisedb.com/" rel="noopener">EnterpriseDB</a>. The talk was inspired by my article <a href="sql-dos-and-donts">12 Common Mistakes and Missed Optimization Opportunities in SQL</a>.</p>
</li>
<li>
<p><strong>Collaborated with <a href="https://www.pgmustard.com" rel="noopener">pgMustard</a></strong>. I met Michael from pgMustard back in 2019 when he pointed out some things about my article <a href="be-careful-with-cte-in-postgre-sql">Be careful with CTE in PostgreSQL</a>. I was impressed with the work they've done on the <a href="https://www.pgmustard.com/docs/explain" rel="noopener">Explain Glossary</a>, so before I published <a href="sql-tricks-application-dba">Some SQL Tricks of an Application DBA</a> I reached out to get a quick review on the article.</p>
</li>
<li>
<p>I decided to <strong>give <a href="https://webmonetization.org/specification.html" rel="noopener">web monetization</a> a chance</strong>. I signed up for <a href="https://coil.com/" rel="noopener">coil</a> and added the meta tag to the site. I think it's an interesting idea, and I'm curious to see how it evolves.</p>
</li>
<li>
<p>I started to <strong>self-host my analytics</strong>. I really want to stop relying on Google Analytics and experiment with different, self-hosted analytics solutions. I initially went with <a href="https://usefathom.com/" rel="noopener">Fathom</a>. It worked fine for a while, but then they <a href="https://usefathom.com/blog/moved-to-vapor" rel="noopener">switched from Go to PHP</a> and that was the end of it for me. Mid-year I started using <a href="https://www.goatcounter.com/" rel="noopener">Goat Counter</a> instead. It's a privacy-focused analytics solution written in Go. I was able to get started quickly, and I've been using it since.</p>
</li>
</ul>Exhaustiveness Checking with Mypy2020-12-08T00:00:00+02:002020-12-08T00:00:00+02:00Haki Benitatag:hakibenita.com,2020-12-08:/python-mypy-exhaustive-checking<p>What if mypy could warn you about possible problems at "compile time"? In this article I share a little trick to get mypy to fail when a value in an enumeration type is left unhandled.</p><hr>
<p><a href="https://mypy-lang.org/" rel="noopener">Mypy</a> is an optional static type checker for Python. It's been around since 2012 and has been gaining traction ever since. One of the main benefits of using a type checker is getting errors at "compile time" rather than at run time.</p>
<p>Exhaustiveness checking is a common feature of type checkers, and a very useful one! In this article I'm going to show you <strong>how you can get mypy to perform exhaustiveness checking!</strong></p>
<figure><img alt="Playing cards are also useful for explaining enumeration types...<br><small>Photo by <a href="https://unsplash.com/photos/G6wlppP4EN8">Daniel Rykhev</a></small>" src="https://hakibenita.com/images/01-python-mypy-exhaustive-checking.png"><figcaption>Playing cards are also useful for explaining enumeration types...<br><small>Photo by <a href="https://unsplash.com/photos/G6wlppP4EN8">Daniel Rykhev</a></small></figcaption>
</figure>
<p><details class="toc-container" open>
<summary>Table of Contents</summary></p>
<div class="toc">
<ul>
<li><a href="#exhaustiveness-checking">Exhaustiveness Checking</a></li>
<li><a href="#enumeration-types">Enumeration types</a></li>
<li><a href="#type-narrowing-in-mypy">Type Narrowing in Mypy</a></li>
<li><a href="#the-future">The Future</a></li>
<li><a href="#bonus-exhaustiveness-checking-in-django">Bonus: Exhaustiveness Checking in Django</a></li>
<li><a href="#updates">Updates</a></li>
</ul>
</div>
<p></details></p>
<hr>
<h2 id="exhaustiveness-checking"><a class="toclink" href="#exhaustiveness-checking">Exhaustiveness Checking</a></h2>
<p>Say you have a system to manage orders. To represent the status of an order, you have the following enum:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">enum</span>
<span class="k">class</span> <span class="nc">OrderStatus</span><span class="p">(</span><span class="n">enum</span><span class="o">.</span><span class="n">Enum</span><span class="p">):</span>
    <span class="n">Ready</span> <span class="o">=</span> <span class="s1">'ready'</span>
    <span class="n">Shipped</span> <span class="o">=</span> <span class="s1">'shipped'</span>
</pre></div>
<p>You also have the following code to process an <code>Order</code>:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">handle_order</span><span class="p">(</span><span class="n">status</span><span class="p">:</span> <span class="n">OrderStatus</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
    <span class="k">if</span> <span class="n">status</span> <span class="ow">is</span> <span class="n">OrderStatus</span><span class="o">.</span><span class="n">Ready</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'ship order'</span><span class="p">)</span>
    <span class="k">elif</span> <span class="n">status</span> <span class="ow">is</span> <span class="n">OrderStatus</span><span class="o">.</span><span class="n">Shipped</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'charge order'</span><span class="p">)</span>
</pre></div>
<p>When the order is ready, you ship it; and when it's shipped, you charge it.</p>
<p>A few months go by and your system becomes big. So big in fact, that you can no longer ship orders immediately, and you add a new status:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">enum</span>
<span class="k">class</span> <span class="nc">OrderStatus</span><span class="p">(</span><span class="n">enum</span><span class="o">.</span><span class="n">Enum</span><span class="p">):</span>
    <span class="n">Ready</span> <span class="o">=</span> <span class="s1">'ready'</span>
<span class="hll">    <span class="n">Scheduled</span> <span class="o">=</span> <span class="s1">'scheduled'</span>
</span>    <span class="n">Shipped</span> <span class="o">=</span> <span class="s1">'shipped'</span>
</pre></div>
<p>Before you push this change to production, you run a quick check with mypy to make sure everything is OK:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>mypy<span class="w"> </span>main.py
Success:<span class="w"> </span>no<span class="w"> </span>issues<span class="w"> </span>found<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="nb">source</span><span class="w"> </span>file
</pre></div>
<p>Mypy does not see anything wrong with this code. Do you? The problem is that <strong>you forgot to handle the new status in your function</strong>.</p>
<p>One way to make sure you always handle all possible order statuses is to add an assert, or throw an exception:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">handle_order</span><span class="p">(</span><span class="n">status</span><span class="p">:</span> <span class="n">OrderStatus</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
    <span class="k">if</span> <span class="n">status</span> <span class="ow">is</span> <span class="n">OrderStatus</span><span class="o">.</span><span class="n">Ready</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'ship order'</span><span class="p">)</span>
    <span class="k">elif</span> <span class="n">status</span> <span class="ow">is</span> <span class="n">OrderStatus</span><span class="o">.</span><span class="n">Shipped</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'charge order'</span><span class="p">)</span>
<span class="hll">    <span class="k">else</span><span class="p">:</span>
</span><span class="hll">        <span class="k">assert</span> <span class="kc">False</span><span class="p">,</span> <span class="sa">f</span><span class="s1">'Unhandled status "</span><span class="si">{</span><span class="n">status</span><span class="si">}</span><span class="s1">"'</span>
</span></pre></div>
<p>Now, when you execute the function with the new status <code>OrderStatus.Scheduled</code>, you will get a runtime error:</p>
<div class="highlight"><pre><span></span><span class="o">>>></span> <span class="n">handle_order</span><span class="p">(</span><span class="n">OrderStatus</span><span class="o">.</span><span class="n">Scheduled</span><span class="p">)</span>
<span class="ne">AssertionError</span><span class="p">:</span> <span class="n">Unhandled</span> <span class="n">status</span> <span class="s2">"OrderStatus.Scheduled"</span>
</pre></div>
<p>Another way to deal with cases like this is to go over your test suite and add scenarios in all the places that use the order status. But... if you forgot to change the function when you added the status, what are the chances you'll remember to update the tests? That's not a good solution...</p>
<p><strong>Exhaustiveness Checking in Mypy</strong></p>
<p>What if mypy could warn you at "compile time" about such cases? Well... it can, using this little magic function:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">NoReturn</span>
<span class="kn">import</span> <span class="nn">enum</span>
<span class="k">def</span> <span class="nf">assert_never</span><span class="p">(</span><span class="n">value</span><span class="p">:</span> <span class="n">NoReturn</span><span class="p">)</span> <span class="o">-></span> <span class="n">NoReturn</span><span class="p">:</span>
    <span class="k">assert</span> <span class="kc">False</span><span class="p">,</span> <span class="sa">f</span><span class="s1">'Unhandled value: </span><span class="si">{</span><span class="n">value</span><span class="si">}</span><span class="s1"> (</span><span class="si">{</span><span class="nb">type</span><span class="p">(</span><span class="n">value</span><span class="p">)</span><span class="o">.</span><span class="vm">__name__</span><span class="si">}</span><span class="s1">)'</span>
</pre></div>
<p>Before you dig into the implementation, try using it to see how it works. In the function above, place <code>assert_never</code> after you have handled all the possible order statuses, where you previously used <code>assert</code> or raised an exception:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">handle_order</span><span class="p">(</span><span class="n">status</span><span class="p">:</span> <span class="n">OrderStatus</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
    <span class="k">if</span> <span class="n">status</span> <span class="ow">is</span> <span class="n">OrderStatus</span><span class="o">.</span><span class="n">Ready</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'ship order'</span><span class="p">)</span>
    <span class="k">elif</span> <span class="n">status</span> <span class="ow">is</span> <span class="n">OrderStatus</span><span class="o">.</span><span class="n">Shipped</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'charge order'</span><span class="p">)</span>
<span class="hll">    <span class="k">else</span><span class="p">:</span>
</span><span class="hll">        <span class="n">assert_never</span><span class="p">(</span><span class="n">status</span><span class="p">)</span>
</span></pre></div>
<p>Now, check the code with Mypy:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>mypy<span class="w"> </span>main.py
error:<span class="w"> </span>Argument<span class="w"> </span><span class="m">1</span><span class="w"> </span>to<span class="w"> </span><span class="s2">"assert_never"</span><span class="w"> </span>has<span class="w"> </span>incompatible<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="s2">"Literal[OrderStatus.Scheduled]"</span><span class="p">;</span>
expected<span class="w"> </span><span class="s2">"NoReturn"</span>
Found<span class="w"> </span><span class="m">1</span><span class="w"> </span>error<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="w"> </span>file<span class="w"> </span><span class="o">(</span>checked<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="nb">source</span><span class="w"> </span>file<span class="o">)</span>
</pre></div>
<p>Amazing! <strong>Mypy warns you about a status you forgot to handle!</strong> The message also includes the value, <code>OrderStatus.Scheduled</code>. If you use a modern editor such as VSCode you can get these warnings immediately as you type:</p>
<figure><img alt="mypy Error in VSCode" src="https://hakibenita.com/images/00-python-mypy-exhaustive-checking.png"><figcaption>mypy Error in VSCode</figcaption>
</figure>
<p>You can now go ahead and fix your function to handle the missing status:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">handle_order</span><span class="p">(</span><span class="n">status</span><span class="p">:</span> <span class="n">OrderStatus</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
    <span class="k">if</span> <span class="n">status</span> <span class="ow">is</span> <span class="n">OrderStatus</span><span class="o">.</span><span class="n">Ready</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'schedule order'</span><span class="p">)</span>
<span class="hll">    <span class="k">elif</span> <span class="n">status</span> <span class="ow">is</span> <span class="n">OrderStatus</span><span class="o">.</span><span class="n">Scheduled</span><span class="p">:</span>
</span><span class="hll">        <span class="nb">print</span><span class="p">(</span><span class="s1">'ship order'</span><span class="p">)</span>
</span>
    <span class="k">elif</span> <span class="n">status</span> <span class="ow">is</span> <span class="n">OrderStatus</span><span class="o">.</span><span class="n">Shipped</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'charge order'</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">assert_never</span><span class="p">(</span><span class="n">status</span><span class="p">)</span>
</pre></div>
<p>Check with mypy again:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>mypy<span class="w"> </span>main.py
Success:<span class="w"> </span>no<span class="w"> </span>issues<span class="w"> </span>found<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="nb">source</span><span class="w"> </span>file
</pre></div>
<p>Great! You can now rest assured that you handled all order statuses. The best part is that you did that with <strong>no unit tests</strong>, and there were <strong>no runtime errors</strong>. If you include mypy in your CI, the <strong>bad code will never make it into production</strong>.</p>
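<p>To tie the pieces of this section together, here is a minimal self-contained sketch. It assumes the three-member <code>OrderStatus</code> enum from above; the function name <code>next_action</code> is hypothetical, and it returns the action as a string instead of printing it so the result is easy to check:</p>

```python
import enum
from typing import NoReturn


class OrderStatus(enum.Enum):
    Ready = 'ready'
    Scheduled = 'scheduled'
    Shipped = 'shipped'


def assert_never(value: NoReturn) -> NoReturn:
    # Only reachable at runtime if a case was left unhandled.
    assert False, f'Unhandled value: {value} ({type(value).__name__})'


# Hypothetical helper, not part of the original example.
def next_action(status: OrderStatus) -> str:
    if status is OrderStatus.Ready:
        return 'schedule order'
    elif status is OrderStatus.Scheduled:
        return 'ship order'
    elif status is OrderStatus.Shipped:
        return 'charge order'
    else:
        # Every member is handled above, so mypy accepts this call.
        assert_never(status)


for status in OrderStatus:
    print(status.name, '->', next_action(status))
```

<p>If you remove one of the branches, mypy should report the missing member at the <code>assert_never</code> call, just like in the example above.</p>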
<hr>
<h2 id="enumeration-types"><a class="toclink" href="#enumeration-types">Enumeration types</a></h2>
<p>In the previous section you used mypy to perform an exhaustiveness check on an <code>Enum</code>. You can use mypy and <code>assert_never</code> to perform exhaustiveness checks on other enumeration types as well.</p>
<p><strong>Exhaustiveness Checking of a Union</strong></p>
<p>A <code>Union</code> type represents several possible types. For example, a function that casts an argument to <code>float</code> can look like this:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Union</span>
</span>
<span class="hll"><span class="k">def</span> <span class="nf">get_float</span><span class="p">(</span><span class="n">num</span><span class="p">:</span> <span class="n">Union</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">float</span><span class="p">])</span> <span class="o">-></span> <span class="nb">float</span><span class="p">:</span>
</span>    <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">num</span><span class="p">,</span> <span class="nb">str</span><span class="p">):</span>
        <span class="k">return</span> <span class="nb">float</span><span class="p">(</span><span class="n">num</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">assert_never</span><span class="p">(</span><span class="n">num</span><span class="p">)</span>
</pre></div>
<p>Check the function with mypy:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>mypy<span class="w"> </span>main.py
error:<span class="w"> </span>Argument<span class="w"> </span><span class="m">1</span><span class="w"> </span>to<span class="w"> </span><span class="s2">"assert_never"</span><span class="w"> </span>has<span class="w"> </span>incompatible<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="s2">"float"</span><span class="p">;</span><span class="w"> </span>expected<span class="w"> </span><span class="s2">"NoReturn"</span>
</pre></div>
<p>Whoops... you forgot to handle the <code>float</code> type in the code:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Union</span>
<span class="k">def</span> <span class="nf">get_float</span><span class="p">(</span><span class="n">num</span><span class="p">:</span> <span class="n">Union</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">float</span><span class="p">])</span> <span class="o">-></span> <span class="nb">float</span><span class="p">:</span>
    <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">num</span><span class="p">,</span> <span class="nb">str</span><span class="p">):</span>
        <span class="k">return</span> <span class="nb">float</span><span class="p">(</span><span class="n">num</span><span class="p">)</span>
<span class="hll">    <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">num</span><span class="p">,</span> <span class="nb">float</span><span class="p">):</span>
</span><span class="hll">        <span class="k">return</span> <span class="n">num</span>
</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">assert_never</span><span class="p">(</span><span class="n">num</span><span class="p">)</span>
</pre></div>
<p>Check again:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>mypy<span class="w"> </span>main.py
Success:<span class="w"> </span>no<span class="w"> </span>issues<span class="w"> </span>found<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="nb">source</span><span class="w"> </span>file
</pre></div>
<p>Great! mypy is happy...</p>
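<p>The same pattern should extend to unions with more members, including <code>None</code>. The following is a sketch under that assumption; the function <code>to_text</code> is hypothetical and not part of the original example:</p>

```python
from typing import NoReturn, Union


def assert_never(value: NoReturn) -> NoReturn:
    assert False, f'Unhandled value: {value} ({type(value).__name__})'


# Hypothetical helper: one branch per member of the union.
def to_text(value: Union[int, str, None]) -> str:
    if value is None:
        return ''
    elif isinstance(value, int):
        return str(value)
    elif isinstance(value, str):
        return value
    else:
        # All three members were eliminated above, so nothing is left here.
        assert_never(value)


print(repr(to_text(None)), repr(to_text(42)), repr(to_text('abc')))
```

<p>Each branch removes one member of the union, so by the time the <code>else</code> branch is reached there are no types left for mypy to complain about.</p>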
<p><strong>Exhaustiveness Checking of a Literal</strong></p>
<p>Another useful type is <code>Literal</code>. It has been part of the standard <code>typing</code> module since Python 3.8, and for earlier versions it is available in the complementary <a href="https://pypi.org/project/typing-extensions/" rel="noopener"><code>typing_extensions</code> package</a>.</p>
<p>A <code>Literal</code> is used to type primitive values such as strings and numbers. <code>Literal</code> is also an enumeration type, so you can use exhaustiveness checking on it as well:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">typing_extensions</span> <span class="kn">import</span> <span class="n">Literal</span>
<span class="n">Color</span> <span class="o">=</span> <span class="n">Literal</span><span class="p">[</span><span class="s1">'R'</span><span class="p">,</span> <span class="s1">'G'</span><span class="p">,</span> <span class="s1">'B'</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">get_color_name</span><span class="p">(</span><span class="n">color</span><span class="p">:</span> <span class="n">Color</span><span class="p">)</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
    <span class="k">if</span> <span class="n">color</span> <span class="o">==</span> <span class="s1">'R'</span><span class="p">:</span>
        <span class="k">return</span> <span class="s1">'Red'</span>
    <span class="k">elif</span> <span class="n">color</span> <span class="o">==</span> <span class="s1">'G'</span><span class="p">:</span>
        <span class="k">return</span> <span class="s1">'Green'</span>
    <span class="c1"># elif color == 'B':</span>
    <span class="c1">#     return 'Blue'</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">assert_never</span><span class="p">(</span><span class="n">color</span><span class="p">)</span>
</pre></div>
<p>Checking the code with the <code>'B'</code> branch commented out produces the following error:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>mypy<span class="w"> </span>main.py
error:<span class="w"> </span>Argument<span class="w"> </span><span class="m">1</span><span class="w"> </span>to<span class="w"> </span><span class="s2">"assert_never"</span><span class="w"> </span>has<span class="w"> </span>incompatible<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="s2">"Literal['B']"</span><span class="p">;</span><span class="w"> </span>expected<span class="w"> </span><span class="s2">"NoReturn"</span>
</pre></div>
<p>Very handy indeed!</p>
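<p>For completeness, here is a sketch of the function with all three values handled. It assumes Python 3.8 or later, where <code>Literal</code> and <code>get_args</code> can be imported from <code>typing</code>. The loop at the end is an addition for illustration: it uses <code>get_args</code> to list the values of the <code>Literal</code> at runtime:</p>

```python
from typing import Literal, NoReturn, get_args

Color = Literal['R', 'G', 'B']


def assert_never(value: NoReturn) -> NoReturn:
    assert False, f'Unhandled value: {value} ({type(value).__name__})'


def get_color_name(color: Color) -> str:
    if color == 'R':
        return 'Red'
    elif color == 'G':
        return 'Green'
    elif color == 'B':
        return 'Blue'
    else:
        # All literal values are handled, so mypy accepts this call.
        assert_never(color)


# get_args returns the values the Literal was declared with.
for color in get_args(Color):
    print(color, '->', get_color_name(color))
```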
<hr>
<h2 id="type-narrowing-in-mypy"><a class="toclink" href="#type-narrowing-in-mypy">Type Narrowing in Mypy</a></h2>
<p>Now that you've seen what <code>assert_never</code> can do, you can try and understand how it works. <code>assert_never</code> works alongside <strong>"type narrowing"</strong>, which is a mypy feature where the type of a variable is narrowed based on the control flow of the program. In other words, mypy is gradually eliminating possible types for a variable.</p>
<p>First, it's important to understand how various things translate to a <code>Union</code> type in mypy:</p>
<div class="highlight"><pre><span></span><span class="n">Optional</span><span class="p">[</span><span class="nb">int</span><span class="p">]</span>
<span class="c1"># Equivalent to Union[int, None]</span>
<span class="n">Literal</span><span class="p">[</span><span class="s1">'string'</span><span class="p">,</span> <span class="mi">42</span><span class="p">,</span> <span class="kc">True</span><span class="p">]</span>
<span class="c1"># Equivalent to Union[Literal['string'], Literal[42], Literal[True]]</span>
<span class="k">class</span> <span class="nc">Suit</span><span class="p">(</span><span class="n">Enum</span><span class="p">):</span>
    <span class="n">Clubs</span> <span class="o">=</span> <span class="s2">"♣"</span>
    <span class="n">Diamonds</span> <span class="o">=</span> <span class="s2">"♦"</span>
    <span class="n">Hearts</span> <span class="o">=</span> <span class="s2">"♥"</span>
    <span class="n">Spades</span> <span class="o">=</span> <span class="s2">"♠"</span>
<span class="n">Suit</span>
<span class="c1"># ~Equivalent to Union[</span>
<span class="c1"># Literal[Suit.Clubs],</span>
<span class="c1"># Literal[Suit.Diamonds],</span>
<span class="c1"># Literal[Suit.Hearts],</span>
<span class="c1"># Literal[Suit.Spades]</span>
<span class="c1"># ]</span>
</pre></div>
<p>To display the type of an expression, mypy provides a useful utility called <a href="https://mypy.readthedocs.io/en/stable/common_issues.html#reveal-type" rel="noopener"><code>reveal_type</code></a>. Using <code>reveal_type</code> you can ask mypy to show you the inferred type for a variable at the point it's called:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">describe_suit</span><span class="p">(</span><span class="n">suit</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">Suit</span><span class="p">])</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
<span class="hll">    <span class="c1"># Revealed type is Union[Suit, None]</span>
</span>    <span class="n">reveal_type</span><span class="p">(</span><span class="n">suit</span><span class="p">)</span>
</pre></div>
<p>In the function above, the revealed type of <code>suit</code> is <code>Union[Suit, None]</code>, which is the type of the argument <code>suit</code>.</p>
<p>At this point you haven't done anything in the function, so mypy is unable to narrow down the type. Next, add some logic and see how mypy narrows down the type of the variable <code>suit</code>:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">describe_suit</span><span class="p">(</span><span class="n">suit</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">Suit</span><span class="p">])</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
<span class="hll">    <span class="k">assert</span> <span class="n">suit</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span>
</span><span class="hll">    <span class="c1"># Revealed type is Suit</span>
</span>    <span class="n">reveal_type</span><span class="p">(</span><span class="n">suit</span><span class="p">)</span>
</pre></div>
<p>After eliminating the option of suit being <code>None</code>, the revealed type is <code>Suit</code>. Mypy used your program's logic to narrow the type of the variable.</p>
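<p>An <code>assert</code> is not the only way to eliminate <code>None</code>. As a small sketch, an early return on an <code>is None</code> check narrows the type in the same way. The function <code>suit_symbol</code> and its <code>'?'</code> fallback are hypothetical, added here only for illustration:</p>

```python
from enum import Enum
from typing import Optional


class Suit(Enum):
    Clubs = "♣"
    Diamonds = "♦"
    Hearts = "♥"
    Spades = "♠"


# Hypothetical helper, not part of the original example.
def suit_symbol(suit: Optional[Suit]) -> str:
    # At this point mypy sees Union[Suit, None].
    if suit is None:
        return '?'

    # The None branch returned, so mypy treats suit as Suit from here on
    # and accessing .value type checks.
    return suit.value


print(suit_symbol(Suit.Hearts), suit_symbol(None))
```

<p>Without the <code>is None</code> check, mypy should reject <code>suit.value</code> because <code>None</code> has no such attribute.</p>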
<p>Keep in mind, the type <code>Suit</code> is equivalent to the type <code>Union[Literal[Suit.Clubs], Literal[Suit.Diamonds], Literal[Suit.Hearts], Literal[Suit.Spades]]</code>, so next, try to narrow down the type even more:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">describe_suit</span><span class="p">(</span><span class="n">suit</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">Suit</span><span class="p">])</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
    <span class="k">assert</span> <span class="n">suit</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span>
<span class="hll">    <span class="k">if</span> <span class="n">suit</span> <span class="ow">is</span> <span class="n">Suit</span><span class="o">.</span><span class="n">Clubs</span><span class="p">:</span>
</span><span class="hll">        <span class="c1"># Revealed type is Literal[Suit.Clubs]</span>
</span>        <span class="n">reveal_type</span><span class="p">(</span><span class="n">suit</span><span class="p">)</span>
        <span class="k">return</span> <span class="s2">"Clubs"</span>
<span class="hll">    <span class="c1"># Revealed type is Literal[Suit.Diamonds, Suit.Hearts, Suit.Spades]</span>
</span>    <span class="n">reveal_type</span><span class="p">(</span><span class="n">suit</span><span class="p">)</span>
</pre></div>
<p>After checking if <code>suit</code> is <code>Suit.Clubs</code>, mypy is able to narrow down the type to <code>Suit.Clubs</code>. Mypy is also smart enough to understand that if the condition does not hold, the variable <em>is definitely not</em> <code>Clubs</code>, and narrows down the type to <code>Diamonds</code>, <code>Hearts</code> or <code>Spades</code>.</p>
<p>Mypy can also use other conditional statements to further narrow the type, for example:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">describe_suit</span><span class="p">(</span><span class="n">suit</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">Suit</span><span class="p">])</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
    <span class="k">assert</span> <span class="n">suit</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span>
    <span class="k">if</span> <span class="n">suit</span> <span class="ow">is</span> <span class="n">Suit</span><span class="o">.</span><span class="n">Clubs</span><span class="p">:</span>
        <span class="c1"># Revealed type is Literal[Suit.Clubs]</span>
        <span class="n">reveal_type</span><span class="p">(</span><span class="n">suit</span><span class="p">)</span>
        <span class="k">return</span> <span class="s2">"Clubs"</span>
    <span class="c1"># Revealed type is Literal[Suit.Diamonds, Suit.Hearts, Suit.Spades]</span>
    <span class="n">reveal_type</span><span class="p">(</span><span class="n">suit</span><span class="p">)</span>
<span class="hll">    <span class="c1"># `and`, `or` and `not` also work.</span>
</span><span class="hll">    <span class="k">if</span> <span class="n">suit</span> <span class="ow">is</span> <span class="n">Suit</span><span class="o">.</span><span class="n">Diamonds</span> <span class="ow">or</span> <span class="n">suit</span> <span class="ow">is</span> <span class="n">Suit</span><span class="o">.</span><span class="n">Spades</span><span class="p">:</span>
</span><span class="hll">        <span class="c1"># Revealed type is Literal[Suit.Diamonds, Suit.Spades]</span>
</span>        <span class="n">reveal_type</span><span class="p">(</span><span class="n">suit</span><span class="p">)</span>
        <span class="k">return</span> <span class="s2">"Diamonds or Spades"</span>
<span class="hll">    <span class="c1"># Revealed type is Literal[Suit.Hearts]</span>
</span><span class="hll">    <span class="n">reveal_type</span><span class="p">(</span><span class="n">suit</span><span class="p">)</span>
</span></pre></div>
<p>By the end of the function, mypy narrowed down the type of <code>suit</code> to <code>Suit.Hearts</code>. If, for example, you add a condition that implies a different type for <code>suit</code>, mypy will issue an error:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">describe_suit</span><span class="p">(</span><span class="n">suit</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">Suit</span><span class="p">])</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
    <span class="k">assert</span> <span class="n">suit</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span>
    <span class="k">if</span> <span class="n">suit</span> <span class="ow">is</span> <span class="n">Suit</span><span class="o">.</span><span class="n">Clubs</span><span class="p">:</span>
        <span class="c1"># Revealed type is Literal[Suit.Clubs]</span>
        <span class="n">reveal_type</span><span class="p">(</span><span class="n">suit</span><span class="p">)</span>
        <span class="k">return</span> <span class="s2">"Clubs"</span>
    <span class="c1"># Revealed type is Literal[Suit.Diamonds, Suit.Hearts, Suit.Spades]</span>
    <span class="n">reveal_type</span><span class="p">(</span><span class="n">suit</span><span class="p">)</span>
    <span class="c1"># `and`, `or` and `not` also work.</span>
    <span class="k">if</span> <span class="n">suit</span> <span class="ow">is</span> <span class="n">Suit</span><span class="o">.</span><span class="n">Diamonds</span> <span class="ow">or</span> <span class="n">suit</span> <span class="ow">is</span> <span class="n">Suit</span><span class="o">.</span><span class="n">Spades</span><span class="p">:</span>
        <span class="c1"># Revealed type is Literal[Suit.Diamonds, Suit.Spades]</span>
        <span class="n">reveal_type</span><span class="p">(</span><span class="n">suit</span><span class="p">)</span>
        <span class="k">return</span> <span class="s2">"Diamonds or Spades"</span>
    <span class="c1"># Revealed type is Literal[Suit.Hearts]</span>
    <span class="n">reveal_type</span><span class="p">(</span><span class="n">suit</span><span class="p">)</span>
<span class="hll">    <span class="c1"># mypy error [comparison-overlap]: Non-overlapping identity check</span>
</span><span class="hll">    <span class="c1"># left operand type: "Literal[Suit.Hearts]"</span>
</span><span class="hll">    <span class="c1"># right operand type: "Literal[Suit.Diamonds]"</span>
</span><span class="hll">    <span class="k">if</span> <span class="n">suit</span> <span class="ow">is</span> <span class="n">Suit</span><span class="o">.</span><span class="n">Diamonds</span><span class="p">:</span>
</span><span class="hll">        <span class="c1"># mypy error [unreachable]: Statement is unreachable</span>
</span><span class="hll">        <span class="k">return</span> <span class="s2">"Diamonds"</span>
</span></pre></div>
<p>After mypy narrowed down the type of <code>suit</code> to <code>Literal[Suit.Hearts]</code>, it knows the next condition <code>suit is Suit.Diamonds</code> will always evaluate to False, and issues an error.</p>
<p>Once all the possibilities have been narrowed-out, the rest of the function becomes unreachable:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">describe_suit</span><span class="p">(</span><span class="n">suit</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">Suit</span><span class="p">])</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
    <span class="k">assert</span> <span class="n">suit</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span>
    <span class="k">if</span> <span class="n">suit</span> <span class="ow">is</span> <span class="n">Suit</span><span class="o">.</span><span class="n">Clubs</span><span class="p">:</span>
        <span class="k">return</span> <span class="s2">"Clubs"</span>
    <span class="k">if</span> <span class="n">suit</span> <span class="ow">is</span> <span class="n">Suit</span><span class="o">.</span><span class="n">Diamonds</span> <span class="ow">or</span> <span class="n">suit</span> <span class="ow">is</span> <span class="n">Suit</span><span class="o">.</span><span class="n">Spades</span><span class="p">:</span>
        <span class="k">return</span> <span class="s2">"Diamonds or Spades"</span>
    <span class="k">if</span> <span class="n">suit</span> <span class="o">==</span> <span class="n">Suit</span><span class="o">.</span><span class="n">Hearts</span><span class="p">:</span>
        <span class="k">return</span> <span class="s1">'Hearts'</span>
<span class="hll">    <span class="c1"># This is currently unreachable</span>
</span><span class="hll">    <span class="n">assert_never</span><span class="p">(</span><span class="n">suit</span><span class="p">)</span>
</span></pre></div>
<p><code>assert_never</code> works by taking an argument of type <code>NoReturn</code>, which is only possible when the argument type is "empty". That is, when all possibilities have been narrowed-out and the statement is unreachable. If the statement does become reachable, then the <code>NoReturn</code> is not allowed and mypy issues an error. To illustrate, remove the last condition and check the code with mypy:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">describe_suit</span><span class="p">(</span><span class="n">suit</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">Suit</span><span class="p">])</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
    <span class="k">assert</span> <span class="n">suit</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span>
    <span class="k">if</span> <span class="n">suit</span> <span class="ow">is</span> <span class="n">Suit</span><span class="o">.</span><span class="n">Clubs</span><span class="p">:</span>
        <span class="k">return</span> <span class="s2">"Clubs"</span>
    <span class="k">if</span> <span class="n">suit</span> <span class="ow">is</span> <span class="n">Suit</span><span class="o">.</span><span class="n">Diamonds</span> <span class="ow">or</span> <span class="n">suit</span> <span class="ow">is</span> <span class="n">Suit</span><span class="o">.</span><span class="n">Spades</span><span class="p">:</span>
        <span class="k">return</span> <span class="s2">"Diamonds or Spades"</span>
<span class="hll">    <span class="c1"># if suit == Suit.Hearts:</span>
</span><span class="hll">    <span class="c1">#     return 'Hearts'</span>
</span>
<span class="hll">    <span class="c1"># mypy error: Argument 1 to "assert_never" has</span>
</span><span class="hll">    <span class="c1"># incompatible type "Literal[Suit.Hearts]"; expected "NoReturn"</span>
</span>    <span class="n">assert_never</span><span class="p">(</span><span class="n">suit</span><span class="p">)</span>
</pre></div>
<p>Mypy narrowed down the type of <code>suit</code> to <code>Suit.Hearts</code>, but <code>assert_never</code> expects <code>NoReturn</code>. This mismatch triggers the error, which <strong>effectively performs exhaustiveness checking</strong> for <code>suit</code>.</p>
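<p>The static check has a runtime counterpart. The sketch below is a minimal, self-contained version of the pattern, assuming <code>assert_never</code> raises an <code>AssertionError</code> and takes a <code>NoReturn</code> argument. The enum values and the error message are placeholders chosen for illustration, not taken from the definitions earlier in the article:</p>

```python
import enum
from typing import NoReturn


class Suit(enum.Enum):
    # The member values are assumptions made for this sketch.
    Clubs = "clubs"
    Diamonds = "diamonds"
    Hearts = "hearts"
    Spades = "spades"


def assert_never(value: NoReturn) -> NoReturn:
    # mypy rejects any call where `value` can still hold a real value.
    # At runtime this line only executes if a case slipped through.
    raise AssertionError(f"Unhandled value: {value!r}")


def describe_suit(suit: Suit) -> str:
    if suit is Suit.Clubs:
        return "Clubs"
    if suit is Suit.Diamonds or suit is Suit.Spades:
        return "Diamonds or Spades"
    if suit is Suit.Hearts:
        return "Hearts"
    # All four members are handled above, so mypy sees this as unreachable.
    assert_never(suit)


print(describe_suit(Suit.Hearts))
```

<p>If a new member is added to <code>Suit</code> and the function is not updated, mypy reports the mismatch at the <code>assert_never</code> call, and code that runs without type checking fails loudly with an <code>AssertionError</code> instead of silently returning <code>None</code>.</p>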
<hr>
<h2 id="the-future"><a class="toclink" href="#the-future">The Future</a></h2>
<p>In 2018 <a href="https://github.com/python/mypy/issues/5818#issuecomment-431863917" rel="noopener">Guido thought <code>assert_never</code> is a pretty clever trick</a>, but it never made it into mypy. Instead, exhaustiveness checking will become officially available as part of mypy if/when <a href="https://www.python.org/dev/peps/pep-0622/" rel="noopener">PEP 622 - Structural Pattern Matching</a> is implemented. Until then, you can use <code>assert_never</code> instead.</p>
<hr>
<h2 id="bonus-exhaustiveness-checking-in-django"><a class="toclink" href="#bonus-exhaustiveness-checking-in-django">Bonus: Exhaustiveness Checking in Django</a></h2>
<p>Django provides a very useful attribute to most model field types called <a href="https://docs.djangoproject.com/en/3.1/ref/models/fields/#choices" rel="noopener"><code>choices</code></a>:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>
<span class="kn">from</span> <span class="nn">django.utils.translation</span> <span class="kn">import</span> <span class="n">gettext_lazy</span> <span class="k">as</span> <span class="n">_</span>

<span class="k">class</span> <span class="nc">Order</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">status</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span>
        <span class="n">max_length</span> <span class="o">=</span> <span class="mi">20</span><span class="p">,</span>
        <span class="n">choices</span> <span class="o">=</span> <span class="p">(</span>
            <span class="p">(</span><span class="s1">'ready'</span><span class="p">,</span> <span class="n">_</span><span class="p">(</span><span class="s1">'Ready'</span><span class="p">)),</span>
            <span class="p">(</span><span class="s1">'scheduled'</span><span class="p">,</span> <span class="n">_</span><span class="p">(</span><span class="s1">'Scheduled'</span><span class="p">)),</span>
            <span class="p">(</span><span class="s1">'shipped'</span><span class="p">,</span> <span class="n">_</span><span class="p">(</span><span class="s1">'Shipped'</span><span class="p">)),</span>
        <span class="p">),</span>
    <span class="p">)</span>
</pre></div>
<p>When you provide choices to a field, Django adds all sorts of nice things to it:</p>
<ul>
<li>Add a validation check to <code>ModelForm</code> (which are used by Django admin, among others)</li>
<li>Render the field as a <code><select></code> html element in forms</li>
<li>Add a <code>get_{field}_display</code> method to get the description</li>
</ul>
<p>However, mypy can't know that a Django field with choices has a limited set of values, so it cannot perform exhaustiveness checking on it. To adapt our example from before:</p>
<div class="highlight"><pre><span></span><span class="c1"># Will not perform exhaustiveness checking!</span>
<span class="k">def</span> <span class="nf">handle_order</span><span class="p">(</span><span class="n">order</span><span class="p">:</span> <span class="n">Order</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
    <span class="k">if</span> <span class="n">order</span><span class="o">.</span><span class="n">status</span> <span class="o">==</span> <span class="s1">'ready'</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'ship order'</span><span class="p">)</span>
    <span class="k">elif</span> <span class="n">order</span><span class="o">.</span><span class="n">status</span> <span class="o">==</span> <span class="s1">'shipped'</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'charge order'</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">assert_never</span><span class="p">(</span><span class="n">order</span><span class="o">.</span><span class="n">status</span><span class="p">)</span>
</pre></div>
<p>The function is not handling the status "scheduled", but mypy can't know that.</p>
<p>One way to overcome this is to use an enum to generate the choices:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">enum</span>
<span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>

<span class="hll"><span class="k">class</span> <span class="nc">OrderStatus</span><span class="p">(</span><span class="n">enum</span><span class="o">.</span><span class="n">Enum</span><span class="p">):</span>
</span>    <span class="n">Ready</span> <span class="o">=</span> <span class="s1">'ready'</span>
    <span class="n">Scheduled</span> <span class="o">=</span> <span class="s1">'scheduled'</span>
    <span class="n">Shipped</span> <span class="o">=</span> <span class="s1">'shipped'</span>

<span class="k">class</span> <span class="nc">Order</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">status</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span>
        <span class="n">max_length</span> <span class="o">=</span> <span class="mi">20</span><span class="p">,</span>
<span class="hll">        <span class="n">choices</span> <span class="o">=</span> <span class="p">((</span><span class="n">e</span><span class="o">.</span><span class="n">value</span><span class="p">,</span> <span class="n">e</span><span class="o">.</span><span class="n">name</span><span class="p">)</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">OrderStatus</span><span class="p">),</span>
</span>    <span class="p">)</span>
</pre></div>
<p>Now, you can achieve exhaustiveness checking with a slight change to the code:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">handle_order</span><span class="p">(</span><span class="n">order</span><span class="p">:</span> <span class="n">Order</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
<span class="hll">    <span class="n">status</span> <span class="o">=</span> <span class="n">OrderStatus</span><span class="p">(</span><span class="n">order</span><span class="o">.</span><span class="n">status</span><span class="p">)</span>
</span>
    <span class="k">if</span> <span class="n">status</span> <span class="ow">is</span> <span class="n">OrderStatus</span><span class="o">.</span><span class="n">Ready</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'ship order'</span><span class="p">)</span>
    <span class="k">elif</span> <span class="n">status</span> <span class="ow">is</span> <span class="n">OrderStatus</span><span class="o">.</span><span class="n">Shipped</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'charge order'</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">assert_never</span><span class="p">(</span><span class="n">status</span><span class="p">)</span>
</pre></div>
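<p>The conversion from the stored string to an enum member can be tried without Django. The following is a minimal sketch, assuming a plain string stands in for <code>order.status</code>. The function name <code>next_action</code> and the action for the "scheduled" status are hypothetical, added only so that every member is handled:</p>

```python
import enum
from typing import NoReturn


class OrderStatus(enum.Enum):
    Ready = "ready"
    Scheduled = "scheduled"
    Shipped = "shipped"


def assert_never(value: NoReturn) -> NoReturn:
    raise AssertionError(f"Unhandled value: {value!r}")


def next_action(raw_status: str) -> str:
    # Enum lookup by value: returns the matching member,
    # or raises ValueError if the string is not a known status.
    status = OrderStatus(raw_status)

    if status is OrderStatus.Ready:
        return "ship order"
    elif status is OrderStatus.Scheduled:
        # Hypothetical action, not part of the original example.
        return "wait for scheduled date"
    elif status is OrderStatus.Shipped:
        return "charge order"
    else:
        # With all three members handled, mypy accepts this call.
        assert_never(status)


print(next_action("ready"))
```

<p>Removing any one of the branches makes mypy reject the <code>assert_never</code> call, because the type of <code>status</code> in the <code>else</code> branch is no longer empty.</p>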
<p>The tricky part here is that the model field <code>status</code> is actually a string, so to achieve exhaustiveness checking you have to turn the value into an instance of the <code>OrderStatus</code> enum. There are two downsides to this approach:</p>
<ol>
<li>
<p><strong>You have to cast the value every time</strong>: This is not very convenient. This can possibly be solved by implementing a custom "Enum field" in Django.</p>
</li>
<li>
<p><strong>The status descriptions are not translated</strong>: Previously you used gettext (<code>_</code>) to translate the descriptions of the choices, but now the description is just the name of the enum member.</p>
</li>
</ol>
<p>While the first is still a pain, the second issue was addressed in Django 3.0 with the addition of <a href="https://docs.djangoproject.com/en/3.1/ref/models/fields/#enumeration-types" rel="noopener">Django enumeration types</a>:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>
<span class="kn">from</span> <span class="nn">django.utils.translation</span> <span class="kn">import</span> <span class="n">gettext_lazy</span> <span class="k">as</span> <span class="n">_</span>

<span class="hll"><span class="k">class</span> <span class="nc">OrderStatus</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">TextChoices</span><span class="p">):</span>
</span>    <span class="n">Ready</span> <span class="o">=</span> <span class="s1">'ready'</span><span class="p">,</span> <span class="n">_</span><span class="p">(</span><span class="s1">'Ready'</span><span class="p">)</span>
    <span class="n">Scheduled</span> <span class="o">=</span> <span class="s1">'scheduled'</span><span class="p">,</span> <span class="n">_</span><span class="p">(</span><span class="s1">'Scheduled'</span><span class="p">)</span>
    <span class="n">Shipped</span> <span class="o">=</span> <span class="s1">'shipped'</span><span class="p">,</span> <span class="n">_</span><span class="p">(</span><span class="s1">'Shipped'</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">Order</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">status</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span>
        <span class="n">max_length</span> <span class="o">=</span> <span class="mi">20</span><span class="p">,</span>
<span class="hll">        <span class="n">choices</span> <span class="o">=</span> <span class="n">OrderStatus</span><span class="o">.</span><span class="n">choices</span><span class="p">,</span>
</span>    <span class="p">)</span>
</pre></div>
<p>Notice how you replaced the enum with a <code>TextChoices</code>. The new enumeration type looks a lot like an Enum (it actually extends Enum under the hood), but it lets you provide a tuple with a value and a description instead of just the value.</p>
<hr>
<h2 id="updates"><a class="toclink" href="#updates">Updates</a></h2>
<p>After publishing this article a few readers suggested ways to improve the implementation, so I made the following edits:</p>
<ol>
<li>
<p><strong>2020-12-09</strong>: The initial version of the article had <code>assert_never</code> take a value of type <code>NoReturn</code>. <a href="https://lobste.rs/s/1un01t/exhaustiveness_checking_with_mypy#c_ws1qku" rel="noopener">A commenter on Lobsters</a> made an excellent suggestion to use the more intuitive <code>Union[()]</code> type instead. This also results in a better error message.</p>
</li>
<li>
<p><strong>2020-12-09</strong>: The initial version of the article used <code>assert False, ...</code> in <code>assert_never</code> instead of <code>raise AssertionError(...)</code>. <a href="https://lobste.rs/s/1un01t/exhaustiveness_checking_with_mypy#c_l3obsb" rel="noopener">A commenter on Lobsters</a> mentioned that <code>assert</code> statements are removed when python is run with the <code>-O</code> flag. Since the <code>assert</code> in <code>assert_never</code> should not be removed, I changed it to <code>raise AssertionError</code> instead.</p>
</li>
<li>
<p><strong>2020-12-10</strong>: After looking some more, <a href="https://lobste.rs/s/1un01t/exhaustiveness_checking_with_mypy#c_oeezlr" rel="noopener">tmcb found</a> that <code>Union[()]</code> is not currently accepted by Python <em>at runtime</em>, so I reverted the argument to <code>NoReturn</code> again.</p>
</li>
</ol>The Surprising Impact of Medium-Size Texts on PostgreSQL Performance2020-10-20T00:00:00+03:002020-10-20T00:00:00+03:00Haki Benitatag:hakibenita.com,2020-10-20:/sql-medium-text-performance<p>Any database schema is likely to have plenty of text fields. In this article I demonstrate the surprising impact of medium-size texts on query performance.</p><hr>
<p>Any database schema is likely to have plenty of text fields. In this article, I divide text fields into three categories:</p>
<ol>
<li>
<p><strong>Small texts</strong>: names, slugs, usernames, emails, etc. These are text fields that usually have some low size limit, maybe even using <code>varchar(n)</code> and not <code>text</code>.</p>
</li>
<li>
<p><strong>Large texts</strong>: blog post content, articles, HTML content etc. These are large pieces of free, unrestricted text that are stored in the database.</p>
</li>
<li>
<p><strong>Medium texts</strong>: descriptions, comments, product reviews, stack traces etc. These are text fields that fall between the small and the large. These types of texts would normally be unrestricted, but naturally smaller than the large texts.</p>
</li>
</ol>
<p><strong>In this article I demonstrate the surprising impact of medium-size texts on query performance in PostgreSQL.</strong></p>
<figure><img alt="Sliced bread... it gets better. Photo by Louise Lyshøj" src="https://hakibenita.com/images/00-sql-medium-text-performance.jpg"><figcaption>Sliced bread... it gets better<br><small>Photo by <a href="https://unsplash.com/photos/WHJTaLqonkU">Louise Lyshøj</a></small></figcaption>
</figure>
<p><details class="toc-container" open>
<summary>Table of Contents</summary></p>
<div class="toc">
<ul>
<li><a href="#understanding-toast">Understanding TOAST</a><ul>
<li><a href="#finding-the-toast">Finding the TOAST</a></li>
<li><a href="#toast-in-action">TOAST in Action</a></li>
<li><a href="#toast-compression">TOAST Compression</a></li>
<li><a href="#configuring-toast">Configuring TOAST</a></li>
</ul>
</li>
<li><a href="#toast-performance">TOAST Performance</a><ul>
<li><a href="#set-up-test-data">Set Up Test Data</a></li>
<li><a href="#comparing-performance">Comparing Performance</a></li>
<li><a href="#making-sense-of-the-results">Making Sense of the Results</a></li>
</ul>
</li>
<li><a href="#possible-solutions">Possible Solutions</a><ul>
<li><a href="#adjusting-toast_tuple_target">Adjusting toast_tuple_target</a></li>
<li><a href="#create-a-separate-table">Create a Separate Table</a></li>
</ul>
</li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</div>
<p></details></p>
<hr>
<h2 id="understanding-toast"><a class="toclink" href="#understanding-toast">Understanding TOAST</a></h2>
<p>When talking about large chunks of text, or any other field that may contain large amounts of data, we first need to understand how the database handles the data. Intuitively, you might think that the database is storing large pieces of data inline like it does smaller pieces of data, but in fact, <a href="https://www.postgresql.org/docs/current/storage-toast.html" rel="noopener">it does not</a>:</p>
<blockquote>
<p>PostgreSQL uses a fixed page size (commonly 8 kB), and does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly.</p>
</blockquote>
<p>As the documentation explains, PostgreSQL can't store rows (tuples) in multiple pages. So how does the database store large chunks of data?</p>
<blockquote>
<p>[...] large field values are compressed and/or broken up into multiple physical rows. [...] The technique is affectionately known as TOAST (or “the best thing since sliced bread”).</p>
</blockquote>
<p>OK, so how exactly does TOAST work?</p>
<blockquote>
<p>If any of the columns of a table are TOAST-able, the table will have an associated TOAST table</p>
</blockquote>
<p>So TOAST is a separate table associated with our table. It is used to store large pieces of data of TOAST-able columns (the <code>text</code> datatype for example, is TOAST-able).</p>
<p>What constitutes a large value?</p>
<blockquote>
<p>The TOAST management code is triggered only when a row value to be stored in a table is wider than TOAST_TUPLE_THRESHOLD bytes (normally 2 kB). The TOAST code will compress and/or move field values out-of-line until the row value is shorter than TOAST_TUPLE_TARGET bytes (also normally 2 kB, adjustable) or no more gains can be had</p>
</blockquote>
<p>PostgreSQL will try to compress the large values in the row, and if the row still can't fit within the limit, the values will be stored out-of-line in the TOAST table.</p>
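<p>To make the decision described in the documentation more concrete, here is a toy model of it in Python. This is a rough sketch and not how PostgreSQL is implemented: it considers a single value rather than a whole row, ignores row overhead, uses <code>zlib</code> as a stand-in for PostgreSQL's own compression, and rounds both limits to 2000 bytes. The function name <code>toast_strategy</code> is made up for the illustration:</p>

```python
import random
import zlib

# Both limits are "normally 2 kB" according to the documentation;
# the exact numbers here are rounded assumptions.
TOAST_TUPLE_THRESHOLD = 2000
TOAST_TUPLE_TARGET = 2000


def toast_strategy(value: bytes) -> str:
    """Classify how a single value might end up being stored (toy model)."""
    if len(value) <= TOAST_TUPLE_THRESHOLD:
        # Small enough: the TOAST code is never triggered.
        return "inline"

    # zlib is only a stand-in for PostgreSQL's compression method.
    compressed = zlib.compress(value)
    if len(compressed) <= TOAST_TUPLE_TARGET:
        return "inline, compressed"

    # Compression was not enough, the value moves to the TOAST table.
    return "out-of-line"


rng = random.Random(0)
incompressible = bytes(rng.getrandbits(8) for _ in range(10_000))

print(toast_strategy(b"small value"))
print(toast_strategy(b"a" * 10_000))
print(toast_strategy(incompressible))
```

<p>The three calls cover the three cases: a short value that stays inline, a long but highly repetitive value that fits after compression, and a long random value that does not compress and has to be stored out-of-line.</p>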
<h3 id="finding-the-toast"><a class="toclink" href="#finding-the-toast">Finding the TOAST</a></h3>
<p>Now that we have <em>some</em> understanding of what TOAST is, let's see it in action. First, create a table with a text field:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">toast_test</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="w"> </span><span class="nb">SERIAL</span><span class="p">,</span><span class="w"> </span><span class="k">value</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">);</span>
<span class="go">CREATE TABLE</span>
</pre></div>
<p>The table contains an id column, and a value field of type <code>TEXT</code>. Notice that we did not change any of the default storage parameters.</p>
<p>The text field we added supports TOAST, or is TOAST-able, so PostgreSQL should create a TOAST table. Let's try to locate the TOAST table associated with the table <code>toast_test</code> in <a href="https://www.postgresql.org/docs/current/catalog-pg-class.html" rel="noopener"><code>pg_class</code></a>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">relname</span><span class="p">,</span><span class="w"> </span><span class="n">reltoastrelid</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_class</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">relname</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'toast_test'</span><span class="p">;</span>
<span class="go">  relname   │ reltoastrelid</span>
<span class="go">────────────┼───────────────</span>
<span class="go"> toast_test │        340488</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">relname</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_class</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">oid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">340488</span><span class="p">;</span>
<span class="go">     relname</span>
<span class="go">─────────────────</span>
<span class="go"> pg_toast_340484</span>
</pre></div>
<p>As promised, PostgreSQL created a TOAST table called <code>pg_toast_340484</code>.</p>
<h3 id="toast-in-action"><a class="toclink" href="#toast-in-action">TOAST in Action</a></h3>
<p>Let's see what the TOAST table looks like:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="kp">\d</span><span class="w"> </span><span class="ss">pg_toast.pg_toast_340484</span>
<span class="go">TOAST table "pg_toast.pg_toast_340484"</span>
<span class="go">   Column   │  Type</span>
<span class="go">────────────┼─────────</span>
<span class="go"> chunk_id   │ oid</span>
<span class="go"> chunk_seq  │ integer</span>
<span class="go"> chunk_data │ bytea</span>
</pre></div>
<p>The TOAST table contains three columns:</p>
<ul>
<li><code>chunk_id</code>: A reference to a toasted value.</li>
<li><code>chunk_seq</code>: The sequence number of the chunk within its value.</li>
<li><code>chunk_data</code>: The actual chunk data.</li>
</ul>
<p>Similar to "regular" tables, the TOAST table also has the same restrictions on inline values. To overcome this restriction, large values are split into chunks that can fit within the limit.</p>
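<p>The chunking idea can be sketched in a few lines of Python. This is only an illustration of how one large value maps to several <code>(chunk_id, chunk_seq, chunk_data)</code> rows and back. The chunk size of 2000 bytes and the function names are assumptions made for the sketch, not PostgreSQL internals:</p>

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical chunk size in bytes, chosen only for illustration.
CHUNK_SIZE = 2000

ToastRow = Tuple[int, int, bytes]  # (chunk_id, chunk_seq, chunk_data)


def toast_chunks(chunk_id: int, value: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[ToastRow]:
    """Split one large value into rows shaped like the TOAST table."""
    for seq, start in enumerate(range(0, len(value), chunk_size)):
        yield chunk_id, seq, value[start:start + chunk_size]


def reassemble(rows: Iterable[ToastRow]) -> bytes:
    """Rebuild the original value by concatenating chunks in chunk_seq order."""
    ordered = sorted(rows, key=lambda row: row[1])
    return b"".join(data for _, _, data in ordered)


value = b"x" * 4500
rows = list(toast_chunks(chunk_id=1, value=value))

for chunk_id, chunk_seq, chunk_data in rows:
    print(chunk_id, chunk_seq, len(chunk_data))
```

<p>All the rows of a single value share the same <code>chunk_id</code>, and <code>chunk_seq</code> is what allows the chunks to be put back together in the right order.</p>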
<p>At this point the table is empty:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_toast</span><span class="mf">.</span><span class="n">pg_toast_340484</span><span class="p">;</span>
<span class="go"> chunk_id │ chunk_seq │ chunk_data</span>
<span class="go">──────────┼───────────┼────────────</span>
<span class="go">(0 rows)</span>
</pre></div>
<p>This makes sense because we did not insert any data yet. So next, insert a small value into the table:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">toast_test</span><span class="w"> </span><span class="p">(</span><span class="k">value</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="s1">'small value'</span><span class="p">);</span>
<span class="go">INSERT 0 1</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_toast</span><span class="mf">.</span><span class="n">pg_toast_340484</span><span class="p">;</span>
<span class="go"> chunk_id │ chunk_seq │ chunk_data</span>
<span class="go">──────────┼───────────┼────────────</span>
<span class="go">(0 rows)</span>
</pre></div>
<p>After inserting the small value into the table, the TOAST table remained empty. This means the small value was small enough to be stored inline, and there was no need to move it out-of-line to the TOAST table.</p>
<figure>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 176.3 111" height="10em"><path d="M9 47c62-3 116 2 153-2M13 47c49 0 93 0 150 2m-3-4c3 17 2 22 4 53m-2-52c-3 14-2 31 0 56m2-4c-37 1-78 7-150 5m149-4H13m0 4c-3-15 1-28-4-56m3 55c2-11-1-25 1-54" stroke="currentColor" fill="none"/><text y="15" font-size="16" fill="currentColor" transform="translate(33 60)">1</text><text y="15" font-size="16" fill="currentColor" transform="translate(61 61)">"small value"</text><path d="M10 12l155-1v37L12 46" stroke-width="0" fill="#f2f2f2"/><path d="M10 9c43-1 84 0 156 2M11 11c36 1 75 0 155-1m1-2l-2 38m1-36v39m1 0c-39 2-78 1-159 0m158-2c-39 2-77 2-155 0m-3 1c0-8 3-20 3-36m-1 37V10" stroke="currentColor" fill="none"/><path d="M52 16l4 81m0-84c-1 19-5 36-3 88" stroke="currentColor" fill="none"/><text y="15" font-size="16" transform="translate(23 18)">id</text><g><text x="20" y="15" font-size="16" text-anchor="middle" transform="translate(68 19)">value</text></g></svg>
<figcaption>Small text stored inline</figcaption>
</figure>
<p>Let's insert a large value and see what happens:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">toast_test</span><span class="w"> </span><span class="p">(</span><span class="k">value</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="s1">'n0cfPGZOCwzbHSMRaX8 ... WVIlRkylYishNyXf'</span><span class="p">);</span>
<span class="go">INSERT 0 1</span>
</pre></div>
<p>I shortened the value for brevity, but that's a random string with 4096 characters. Let's see what the TOAST table stores now:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_toast</span><span class="mf">.</span><span class="n">pg_toast_340484</span><span class="p">;</span>
<span class="go"> chunk_id │ chunk_seq │      chunk_data</span>
<span class="go">──────────┼───────────┼──────────────────────</span>
<span class="go">   995899 │         0 │ \x30636650475a4f43...</span>
<span class="go">   995899 │         1 │ \x50714c3756303567...</span>
<span class="go">   995899 │         2 │ \x6c78426358574534...</span>
<span class="go">(3 rows)</span>
</pre></div>
<p>The large value is stored out-of-line in the TOAST table. Because the value was too large to fit inline in a single row, PostgreSQL split it into three chunks. The <code>\x3063...</code> notation is how psql displays binary data.</p>
<figure>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 435.8 254" height="20em"><path d="M9 43c58 5 116 4 154 6M14 48c34-1 70 1 148-1m2 0c-5 34-4 68-3 92m0-94c1 34 4 69 2 96m-4 0c-27-2-54-2-143-5m145 3c-47 2-96 2-147 1m-2 3c5-29 2-60 0-94m2 89c1-36-2-72-3-90" stroke="currentColor" fill="none"/><path d="M10 95c45-2 79 3 150 0M12 96c53 0 106-3 147-3" stroke="currentColor" fill="none"/><text y="15" font-size="16" fill="currentColor" transform="translate(33 60)">1</text><text y="15" font-size="16" fill="currentColor" transform="translate(61 61)">"small value"</text><text y="15" font-size="16" fill="currentColor" transform="translate(34 106)">2</text><path d="M108 109l4 3 3 5-2 4-5 4h-4l-4-3v-7c0-2 1-2 3-3l6-3 1 1m-4-2l3 3 3 5 1 4c0 2-1 2-2 3l-6 3-4-5c-1-2-3-4-2-5l2-4 4-4 1 3" stroke-width="0" fill="#f41d92"/><path d="M105 110h5l2 4 2 5c0 2-3 3-4 4l-4 2c-2 0-2-1-3-2l-4-6 2-4 4-5 1 2m-1 0l6 1 3 3-1 4-1 4-6 3-3-1-3-5v-7l6-2-2-1M115 115c30 2 64 1 141 0m-141 1c34-1 70-3 142-1" stroke="currentColor" fill="none"/><path d="M229 125c5-1 13-4 26-11m-26 11l27-10" stroke="currentColor" fill="none"/><path d="M229 104c5 4 13 6 26 10m-26-9c6 3 14 4 27 10" stroke="currentColor" fill="none"/><path d="M275 104h152v137l-151 3" stroke-width="0" fill="#f41d92"/><path d="M276 99c36 2 66 5 146 7m-148-3c47-2 87-3 150 0m2 4c-5 40-4 85-2 135m2-139c-2 41-1 78 0 140m1-3c-53 5-105-2-149 0m146 4c-45 1-90 0-146-2m1-1c-4-56-3-110 0-135m-5 135c5-48 3-100 2-138" stroke="currentColor" fill="none"/><path d="M314 107c1 30 5 66 8 136m-3-136v137" stroke="currentColor" fill="none"/><path d="M279 153c36 6 77 8 145 2m-145 0c50-2 102-2 146-2M271 194c46 0 91 2 151 5m-148-2c29-3 59-1 146-1" stroke="currentColor" fill="none"/><text y="15" font-size="16" fill="currentColor" transform="translate(294 165)">2</text><text y="15" font-size="16" fill="currentColor" transform="translate(295 119)">1</text><text y="15" font-size="16" fill="currentColor" transform="translate(292 207)">3</text><text y="15" font-size="16" 
fill="currentColor" transform="translate(329 118)">\x.....</text><text y="15" font-size="16" fill="currentColor" transform="translate(331 164)">\x.....</text><text y="15" font-size="16" fill="currentColor" transform="translate(332 206)">\x.....</text><g><path d="M12 9l153 1-1 36-153 1" stroke-width="0" fill="#f2f2f2"/><path d="M11 12c50 0 100-4 157-2M9 11c48-3 97-3 158-1m1 0c-3 13 0 29-2 39m0-39v39m-1-1c-41-2-80 1-154-2m156 2c-62 2-122 2-158 0m2 1c1-12-2-25-3-40m1 39c1-12 2-23 1-38" stroke="currentColor" fill="none"/></g><g><path d="M55 11c-1 39 2 67 0 130M54 16c4 36 3 74 1 120" stroke="currentColor" fill="none"/></g><g><text y="15" font-size="16" transform="translate(23 18)">id</text></g><g><text x="20" y="15" font-size="16" text-anchor="middle" transform="translate(68 19)">value</text></g></svg>
<figcaption>Large text stored out-of-line, in the associated TOAST table</figcaption>
</figure>
<p>Finally, execute the following query to summarize the data in the TOAST table:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">chunk_id</span><span class="p">,</span><span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">chunks</span><span class="p">,</span><span class="w"> </span><span class="n">pg_size_pretty</span><span class="p">(</span><span class="n">sum</span><span class="p">(</span><span class="n">octet_length</span><span class="p">(</span><span class="n">chunk_data</span><span class="p">)</span><span class="o">::</span><span class="nb">bigint</span><span class="p">))</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">pg_toast</span><span class="mf">.</span><span class="n">pg_toast_340484</span><span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span>
<span class="go"> chunk_id │ chunks │ pg_size_pretty</span>
<span class="go">──────────┼────────┼────────────────</span>
<span class="go">   995899 │      3 │ 4096 bytes</span>
<span class="go">(1 row)</span>
</pre></div>
<p>As we've already seen, the text is stored in three chunks.</p>
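<p>The chunking itself is easy to picture in code. The following Python sketch is only an illustration, not PostgreSQL's implementation: the helper <code>split_into_chunks</code> is hypothetical, and the 1996-byte chunk size is an assumption that should hold for the default 8 kB page size. It models how a single value turns into <code>(chunk_id, chunk_seq, chunk_data)</code> rows:</p>

```python
# Illustrative model of TOAST chunking; not PostgreSQL source code.
# Assumption: default 8 kB pages, where a chunk holds at most 1996 bytes.
TOAST_MAX_CHUNK_SIZE = 1996


def split_into_chunks(chunk_id: int, value: bytes, chunk_size: int = TOAST_MAX_CHUNK_SIZE):
    """Return (chunk_id, chunk_seq, chunk_data) rows for a single value."""
    return [
        (chunk_id, seq, value[offset:offset + chunk_size])
        for seq, offset in enumerate(range(0, len(value), chunk_size))
    ]


# A 4096-byte value, like the one inserted above (the content does not matter here).
rows = split_into_chunks(995899, b"x" * 4096)
for chunk_id, chunk_seq, chunk_data in rows:
    print(chunk_id, chunk_seq, len(chunk_data))
```

<p>Under these assumptions the 4096-byte value yields three chunks of 1996, 1996 and 104 bytes, which matches the three rows in the TOAST table.</p>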
<div class="admonition tip">
<p class="admonition-title">size of database objects</p>
<p>There are several ways to get the <a href="https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADMIN-DBSIZE" rel="noopener">size of database objects in PostgreSQL</a>:</p>
<ul>
<li><code>pg_table_size</code>: Get the size of the table including TOAST, but excluding indexes</li>
<li><code>pg_relation_size</code>: Get the size of just the table</li>
<li><code>pg_total_relation_size</code>: Get the size of the table, including indexes and TOAST</li>
</ul>
<p>Another useful function is <code>pg_size_pretty</code>: used to display sizes in a friendly format.</p>
</div>
<h3 id="toast-compression"><a class="toclink" href="#toast-compression">TOAST Compression</a></h3>
<p>So far I have refrained from categorizing texts by their size. The reason is that the size of the text itself does not matter. What matters is its size after compression.</p>
<p>To create long strings for testing, we'll implement a function to generate random strings of a given length:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">generate_random_string</span><span class="p">(</span>
<span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="nb">INTEGER</span><span class="p">,</span>
<span class="w"> </span><span class="n">characters</span><span class="w"> </span><span class="nb">TEXT</span><span class="w"> </span><span class="k">default</span><span class="w"> </span><span class="s1">'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'</span>
<span class="p">)</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="nb">TEXT</span><span class="w"> </span><span class="k">AS</span>
<span class="s">$$</span>
<span class="k">DECLARE</span>
<span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="nb">TEXT</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span>
<span class="k">BEGIN</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="k">RAISE</span><span class="w"> </span><span class="k">EXCEPTION</span><span class="w"> </span><span class="s1">'Invalid length'</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="k">FOR</span><span class="w"> </span><span class="n">__</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="mf">1..</span><span class="n">length</span><span class="w"> </span><span class="k">LOOP</span>
<span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">substr</span><span class="p">(</span><span class="n">characters</span><span class="p">,</span><span class="w"> </span><span class="n">floor</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">length</span><span class="p">(</span><span class="n">characters</span><span class="p">))</span><span class="o">::</span><span class="nb">int</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">1</span><span class="p">);</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">LOOP</span><span class="p">;</span>
<span class="w"> </span><span class="k">RETURN</span><span class="w"> </span><span class="n">result</span><span class="p">;</span>
<span class="k">END</span><span class="p">;</span>
<span class="s">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="p">;</span>
</pre></div>
<p>Generate a string made out of 10 random characters:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">generate_random_string</span><span class="p">(</span><span class="mf">10</span><span class="p">);</span>
<span class="go"> generate_random_string</span>
<span class="go">────────────────────────</span>
<span class="go"> o0QsrMYRvp</span>
</pre></div>
<p>We can also provide a set of characters to generate the random string from. For example, generate a string made of 10 random digits:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">generate_random_string</span><span class="p">(</span><span class="mf">10</span><span class="p">,</span><span class="w"> </span><span class="s1">'1234567890'</span><span class="p">);</span>
<span class="go"> generate_random_string</span>
<span class="go">────────────────────────</span>
<span class="go"> 4519991669</span>
</pre></div>
<p>PostgreSQL TOAST uses the <a href="https://doxygen.postgresql.org/pg__lzcompress_8c_source.html" rel="noopener">LZ family of compression</a> techniques. Compression algorithms usually work by identifying and eliminating repetition in the value. Once encoded into bytes, a long string made of only a few distinct characters should compress much better than a string of the same length made of many different characters.</p>
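<p>To get a feel for this outside the database, here is a small Python sketch. It rests on a loud assumption: it uses <code>zlib</code> from the standard library as a stand-in, which is a different compressor than the one TOAST uses, so the absolute sizes are not comparable to what PostgreSQL produces. <code>random_string</code> is a hypothetical Python analogue of <code>generate_random_string</code>:</p>

```python
import random
import string
import zlib

# Same default character set as generate_random_string.
ALPHANUMERIC = string.digits + string.ascii_uppercase + string.ascii_lowercase


def random_string(length: int, characters: str = ALPHANUMERIC, seed: int = 0) -> str:
    """Python analogue of generate_random_string; seeded so runs are repeatable."""
    rng = random.Random(seed)
    return "".join(rng.choice(characters) for _ in range(length))


def compressed_size(text: str) -> int:
    # zlib is only a stand-in for PostgreSQL's compressor; sizes will differ from TOAST.
    return len(zlib.compress(text.encode("ascii")))


sizes = {
    characters: compressed_size(random_string(1024 * 10, characters))
    for characters in (ALPHANUMERIC, "123", "0")
}
for characters, size in sizes.items():
    print(f"{len(characters):>2} distinct characters -> {size} bytes")
```

<p>The exact byte counts depend on the compressor, but the ordering should hold: the fewer distinct characters a string is made of, the smaller its compressed size.</p>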
<p>To illustrate how TOAST uses compression, we'll clean out the <code>toast_test</code> table, and insert a random string made of many possible characters:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">TRUNCATE</span><span class="w"> </span><span class="n">toast_test</span><span class="p">;</span>
<span class="go">TRUNCATE TABLE</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">toast_test</span><span class="w"> </span><span class="p">(</span><span class="k">value</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="n">generate_random_string</span><span class="p">(</span><span class="mf">1024</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">10</span><span class="p">));</span>
<span class="go">INSERT 0 1</span>
</pre></div>
<p>We inserted a 10 kB value made of random characters. Let's check the TOAST table:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">chunk_id</span><span class="p">,</span><span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">chunks</span><span class="p">,</span><span class="w"> </span><span class="n">pg_size_pretty</span><span class="p">(</span><span class="n">sum</span><span class="p">(</span><span class="n">octet_length</span><span class="p">(</span><span class="n">chunk_data</span><span class="p">)</span><span class="o">::</span><span class="nb">bigint</span><span class="p">))</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">pg_toast</span><span class="mf">.</span><span class="n">pg_toast_340484</span><span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span>
<span class="go"> chunk_id │ chunks │ pg_size_pretty</span>
<span class="go">──────────┼────────┼────────────────</span>
<span class="go">  1495960 │      6 │ 10 kB</span>
</pre></div>
<p>The value is stored out-of-line in the TOAST table, and we can see it is not compressed: the chunks add up to the full 10 kB.</p>
<p>Next, insert a value with a similar length, but made out of fewer possible characters:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">toast_test</span><span class="w"> </span><span class="p">(</span><span class="k">value</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="n">generate_random_string</span><span class="p">(</span><span class="mf">1024</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">10</span><span class="p">,</span><span class="w"> </span><span class="s1">'123'</span><span class="p">));</span>
<span class="go">INSERT 0 1</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">chunk_id</span><span class="p">,</span><span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">chunks</span><span class="p">,</span><span class="w"> </span><span class="n">pg_size_pretty</span><span class="p">(</span><span class="n">sum</span><span class="p">(</span><span class="n">octet_length</span><span class="p">(</span><span class="n">chunk_data</span><span class="p">)</span><span class="o">::</span><span class="nb">bigint</span><span class="p">))</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">pg_toast</span><span class="mf">.</span><span class="n">pg_toast_340484</span><span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span>
<span class="go"> chunk_id │ chunks │ pg_size_pretty</span>
<span class="go">──────────┼────────┼────────────────</span>
<span class="go">  1495960 │      6 │ 10 kB</span>
<span class="go">  1495961 │      2 │ 3067 bytes</span>
</pre></div>
<p>We inserted a 10 kB value, but this time it only contained 3 possible digits: <code>1</code>, <code>2</code> and <code>3</code>. This text is more likely to contain repeating binary patterns, and should compress better than the previous value. Looking at the TOAST table, we can see PostgreSQL compressed the value to ~3 kB, which is roughly 30% of the size of the uncompressed value. Not a bad compression rate!</p>
<p>Finally, insert a 10 kB string made of a single digit:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">toast_test</span><span class="w"> </span><span class="p">(</span><span class="k">value</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="n">generate_random_string</span><span class="p">(</span><span class="mf">1024</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">10</span><span class="p">,</span><span class="w"> </span><span class="s1">'0'</span><span class="p">));</span>
<span class="go">INSERT 0 1</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">chunk_id</span><span class="p">,</span><span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">chunks</span><span class="p">,</span><span class="w"> </span><span class="n">pg_size_pretty</span><span class="p">(</span><span class="n">sum</span><span class="p">(</span><span class="n">octet_length</span><span class="p">(</span><span class="n">chunk_data</span><span class="p">)</span><span class="o">::</span><span class="nb">bigint</span><span class="p">))</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">pg_toast</span><span class="mf">.</span><span class="n">pg_toast_340484</span><span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span>
<span class="go"> chunk_id │ chunks │ pg_size_pretty</span>
<span class="go">──────────┼────────┼────────────────</span>
<span class="go">  1495960 │      6 │ 10 kB</span>
<span class="go">  1495961 │      2 │ 3067 bytes</span>
</pre></div>
<p>No new chunks were added to the TOAST table. The string compressed so well that the database was able to store it inline.</p>
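<p>The behavior we observed can be summarized as "compress first, move out-of-line only if the value is still too large". The Python sketch below is a simplified model of that decision, not PostgreSQL's actual logic: the function <code>extended_storage</code> is hypothetical, the threshold of roughly 2 kB is an assumption for default 8 kB pages, <code>zlib</code> stands in for the real compressor, and the model looks at a single value rather than the whole row:</p>

```python
import os
import zlib

# Assumption: roughly 2 kB, the usual threshold with default 8 kB pages.
TOAST_THRESHOLD = 2000


def extended_storage(value: bytes, threshold: int = TOAST_THRESHOLD) -> str:
    """Simplified model: compress first, move out-of-line only if still too large."""
    if len(value) <= threshold:
        return "inline"
    # zlib stands in for PostgreSQL's compressor, so borderline values may differ.
    if len(zlib.compress(value)) <= threshold:
        return "inline, compressed"
    return "out-of-line"


print(extended_storage(b"small value"))          # short enough as-is
print(extended_storage(b"0" * 1024 * 10))        # highly repetitive, compresses well
print(extended_storage(os.urandom(1024 * 10)))   # random bytes barely compress
```

<p>In this model the small value stays inline, the string of zeros is compressed and kept inline, and the random value ends up out-of-line.</p>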
<h3 id="configuring-toast"><a class="toclink" href="#configuring-toast">Configuring TOAST</a></h3>
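<p>If you are interested in configuring TOAST for a table, there are two relevant parameters. <code>toast_tuple_target</code> is a table storage parameter, set with <code>CREATE TABLE ... WITH (...)</code> or <code>ALTER TABLE ... SET (...)</code>, while <code>storage</code> is set per column with <code>ALTER TABLE ... ALTER COLUMN ... SET STORAGE</code>:</p>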
<ul>
<li><code>toast_tuple_target</code>: The minimum tuple length after which PostgreSQL tries to compress long values or move them to TOAST.</li>
<li><code>storage</code>: The TOAST strategy. PostgreSQL supports <a href="https://www.postgresql.org/docs/current/storage-toast.html#STORAGE-TOAST-ONDISK" rel="noopener">4 different TOAST strategies</a>. The default for most TOAST-able types is <code>EXTENDED</code>, which means PostgreSQL will first try to compress the value, and store it out-of-line only if it is still too large.</li>
</ul>
<p>I personally never had to change the default TOAST storage parameters.</p>
<hr>
<h2 id="toast-performance"><a class="toclink" href="#toast-performance">TOAST Performance</a></h2>
<p>To understand the effect of different text sizes and out-of-line storage on performance, we'll create three tables, one for each text size:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">toast_test_small</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="w"> </span><span class="nb">SERIAL</span><span class="p">,</span><span class="w"> </span><span class="k">value</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">);</span>
<span class="go">CREATE TABLE</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">toast_test_medium</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="w"> </span><span class="nb">SERIAL</span><span class="p">,</span><span class="w"> </span><span class="k">value</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">);</span>
<span class="go">CREATE TABLE</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">toast_test_large</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="w"> </span><span class="nb">SERIAL</span><span class="p">,</span><span class="w"> </span><span class="k">value</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">);</span>
<span class="go">CREATE TABLE</span>
</pre></div>
<p>Like in the previous section, for each table PostgreSQL created a TOAST table:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">c1</span><span class="mf">.</span><span class="n">relname</span><span class="p">,</span>
<span class="w"> </span><span class="n">c2</span><span class="mf">.</span><span class="n">relname</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">toast_relname</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">pg_class</span><span class="w"> </span><span class="n">c1</span>
<span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">pg_class</span><span class="w"> </span><span class="n">c2</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">c1</span><span class="mf">.</span><span class="n">reltoastrelid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c2</span><span class="mf">.</span><span class="n">oid</span>
<span class="k">WHERE</span>
<span class="w"> </span><span class="n">c1</span><span class="mf">.</span><span class="n">relname</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'toast_test%'</span>
<span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">c1</span><span class="mf">.</span><span class="n">relkind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'r'</span><span class="p">;</span>
<span class="go">      relname      │  toast_relname</span>
<span class="go">───────────────────┼─────────────────</span>
<span class="go"> toast_test_small  │ pg_toast_471571</span>
<span class="go"> toast_test_medium │ pg_toast_471580</span>
<span class="go"> toast_test_large  │ pg_toast_471589</span>
</pre></div>
<h3 id="set-up-test-data"><a class="toclink" href="#set-up-test-data">Set Up Test Data</a></h3>
<p>First, let's populate <code>toast_test_small</code> with 500K rows containing a small text that can be stored inline:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">toast_test_small</span><span class="w"> </span><span class="p">(</span><span class="k">value</span><span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="s1">'small value'</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">500000</span><span class="p">);</span>
<span class="go">INSERT 0 500000</span>
</pre></div>
<p>Next, populate the <code>toast_test_medium</code> with 500K rows containing texts that are at the border of being stored out-of-line, but still small enough to be stored inline:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="n">str</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="k">SELECT</span><span class="w"> </span><span class="n">generate_random_string</span><span class="p">(</span><span class="mf">1800</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="k">value</span><span class="p">)</span>
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">toast_test_medium</span><span class="w"> </span><span class="p">(</span><span class="k">value</span><span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="k">value</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">500000</span><span class="p">),</span><span class="w"> </span><span class="n">str</span><span class="p">;</span>
<span class="go">INSERT 0 500000</span>
</pre></div>
<p>I experimented with different values until I found one just short of being stored out-of-line. The trick is to find a string which is roughly 2 kB and compresses very poorly.</p>
<p>Next, insert 500K rows with large texts into <code>toast_test_large</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="n">str</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="k">SELECT</span><span class="w"> </span><span class="n">generate_random_string</span><span class="p">(</span><span class="mf">4096</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="k">value</span><span class="p">)</span>
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">toast_test_large</span><span class="w"> </span><span class="p">(</span><span class="k">value</span><span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="k">value</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">500000</span><span class="p">),</span><span class="w"> </span><span class="n">str</span><span class="p">;</span>
<span class="go">INSERT 0 500000</span>
</pre></div>
<p>We are now ready for the next step.</p>
<h3 id="comparing-performance"><a class="toclink" href="#comparing-performance">Comparing Performance</a></h3>
<p>We usually expect queries on large tables to be slower than queries on smaller tables. In this case, it's not unreasonable to expect a query on the small table to run faster than on the medium table, and a query on the medium table to be faster than the same query on the large table.</p>
<p>To compare performance, we are going to execute a simple query to fetch one row from the table. Since we don't have an index, the database is going to perform a full table scan. We'll also disable parallel query execution to get a clean, simple timing, and execute the query multiple times to account for caching.</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">max_parallel_workers_per_gather</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span>
<span class="go">SET</span>
</pre></div>
<p>Starting with the small table:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span><span class="w"> </span><span class="n">TIMING</span><span class="p">)</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">toast_test_small</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">6000</span><span class="p">;</span>
<span class="go"> QUERY PLAN</span>
<span class="go">─────────────────────────────────────────────────────────────────────────────────────</span>
<span class="go">Seq Scan on toast_test_small (cost=0.00..8953.00 rows=1 width=16)</span>
<span class="go"> Filter: (id = 6000)</span>
<span class="go"> Rows Removed by Filter: 499999</span>
<span class="go">Execution Time: 41.513 ms</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span><span class="w"> </span><span class="n">TIMING</span><span class="p">)</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">toast_test_small</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">6000</span><span class="p">;</span>
<span class="go">Execution Time: 25.865 ms</span>
</pre></div>
<p>I ran the query multiple times and trimmed the output for brevity. As expected, the database performed a full table scan, and the timing eventually settled at ~25ms.</p>
<p>Next, execute the same query on the medium table:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span><span class="w"> </span><span class="n">TIMING</span><span class="p">)</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">toast_test_medium</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">6000</span><span class="p">;</span>
<span class="go">Execution Time: 321.965 ms</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span><span class="w"> </span><span class="n">TIMING</span><span class="p">)</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">toast_test_medium</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">6000</span><span class="p">;</span>
<span class="go">Execution Time: 173.058 ms</span>
</pre></div>
<p>Running the exact same query on the medium table took significantly more time, 173ms, which is roughly 6x slower than on the smaller table. This makes sense.</p>
<p>To complete the test, run the query again on the large table:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span><span class="w"> </span><span class="n">TIMING</span><span class="p">)</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">toast_test_large</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">6000</span><span class="p">;</span>
<span class="go">Execution Time: 49.867 ms</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span><span class="w"> </span><span class="n">TIMING</span><span class="p">)</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">toast_test_large</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">6000</span><span class="p">;</span>
<span class="go">Execution Time: 37.291 ms</span>
</pre></div>
<p>Well, this is surprising! <strong>The timing of the query on the large table is similar to the timing on the small table, and more than 4 times faster than on the medium table.</strong></p>
<table>
<thead>
<tr>
<th>Table</th>
<th>Timing</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>toast_test_small</code></td>
<td>25.865 ms</td>
</tr>
<tr>
<td><code>toast_test_medium</code></td>
<td>173.058 ms</td>
</tr>
<tr>
<td><code>toast_test_large</code></td>
<td>37.291 ms</td>
</tr>
</tbody>
</table>
<p>Large tables are supposed to be slower, so what is going on?</p>
<h3 id="making-sense-of-the-results"><a class="toclink" href="#making-sense-of-the-results">Making Sense of the Results</a></h3>
<p>To make sense of the results, have a look at the size of each table, and the size of its associated TOAST table:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">c1</span><span class="p">.</span><span class="n">relname</span><span class="p">,</span>
<span class="w"> </span><span class="n">pg_size_pretty</span><span class="p">(</span><span class="n">pg_relation_size</span><span class="p">(</span><span class="n">c1</span><span class="p">.</span><span class="n">relname</span><span class="p">::</span><span class="n">regclass</span><span class="p">))</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="k">size</span><span class="p">,</span>
<span class="w"> </span><span class="n">c2</span><span class="p">.</span><span class="n">relname</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">toast_relname</span><span class="p">,</span>
<span class="w"> </span><span class="n">pg_size_pretty</span><span class="p">(</span><span class="n">pg_relation_size</span><span class="p">((</span><span class="s1">'pg_toast.'</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c2</span><span class="p">.</span><span class="n">relname</span><span class="p">)::</span><span class="n">regclass</span><span class="p">))</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">toast_size</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">pg_class</span><span class="w"> </span><span class="n">c1</span>
<span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">pg_class</span><span class="w"> </span><span class="n">c2</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">c1</span><span class="p">.</span><span class="n">reltoastrelid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c2</span><span class="p">.</span><span class="n">oid</span>
<span class="k">WHERE</span>
<span class="w"> </span><span class="n">c1</span><span class="p">.</span><span class="n">relname</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'toast_test_%'</span>
<span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">c1</span><span class="p">.</span><span class="n">relkind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'r'</span><span class="p">;</span>
</pre></div>
<div class="table-container">
<table>
<thead>
<tr>
<th>relname</th>
<th>size</th>
<th>toast_relname</th>
<th>toast_size</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>toast_test_small</code></td>
<td>21 MB</td>
<td>pg_toast_471571</td>
<td>0 bytes</td>
</tr>
<tr>
<td><code>toast_test_medium</code></td>
<td>977 MB</td>
<td>pg_toast_471580</td>
<td>0 bytes</td>
</tr>
<tr>
<td><code>toast_test_large</code></td>
<td>25 MB</td>
<td>pg_toast_471589</td>
<td>1953 MB</td>
</tr>
</tbody>
</table>
</div>
<p>Let's break it down:</p>
<ul>
<li><code>toast_test_small</code>: The size of the table is 21MB, and there is no TOAST. This makes sense because the texts we inserted into that table were small enough to be stored inline.</li>
</ul>
<figure>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 173.9 240.1" height="15em"><path d="M11 49c52-4 100 1 148 1M12 49c31 0 65-2 148-1m-3-1c-2 54-2 118 5 183m-4-182c2 48-1 92 0 181m-2 0c-49-3-98 0-145 3m149-3c-44 4-86 3-148 3m-2 1c0-67 4-131 2-184m0 179c-1-71 1-145-1-179" stroke="currentColor" fill="none"/><path d="M15 95c54-5 115-2 147-4M15 95c43-2 84-2 145-1" stroke="currentColor" fill="none"/><text y="15" font-size="16" fill="currentColor" transform="translate(31 60)">1</text><text y="15" font-size="16" fill="currentColor" transform="translate(82 66)">.....</text><text y="15" font-size="16" fill="currentColor" transform="translate(32 106)">2</text><path d="M9 10l148 2 2 35L9 48" stroke-width="0" fill="#f2f2f2"/><path d="M10 11c40-3 83-1 150-1M11 10l147-1m1 1c-2 8-1 16 0 39m-1-39v37m1 2L9 48m150 1L11 48m0 1c1-13-2-21-3-40m3 40l-1-38M72 14c1 30 5 57 1 130M70 18l1 121" stroke="currentColor" fill="none"/><text y="15" font-size="16" transform="translate(21 18)">id</text><text x="20" y="15" font-size="16" text-anchor="middle" transform="translate(83 19)">value</text><path d="M14 144c57-3 108 3 150 5m-150-3l146-1" stroke="currentColor" fill="none"/><text y="15" font-size="16" fill="currentColor" transform="translate(21 200)">500K</text><path d="M10 190c47-1 90 4 149-3m-149 2c50 4 103 2 148-2" stroke="currentColor" fill="none"/><path d="M77 194c3 7-2 11 2 33m-3-36c-2 16 1 28-1 38" stroke="currentColor" fill="none"/><text y="15" font-size="16" fill="currentColor" transform="translate(83 114)">.....</text><text y="15" font-size="16" fill="currentColor" transform="translate(84 202)">.....</text><path d="M42 151h3l-1 3v1l-3-1v-1h-1v-3h3m-1-1l2 2c1 0 0 0 0 0l-1 3-1 1-1 1v-1l-1-2v-1l4-1-2-1" stroke-width="0" fill="#f2f2f2"/><path d="M42 150l1 1c1 1 0 0 0 0v3h-1l1 2-3-1-1-2 2-2h2m0-2l-1 1 1 2 1 3h-1v-2h-3c-1 0 0 1 0 0l1-2 2-1s0 0 0 0" stroke="currentColor" fill="none"/><g><path d="M43 179v2h1v1l-2 2h-1l-1-1v-2h2v-2h1m-2 0l3 1h1v5l-3-2-1 1-2-2 2-1 1-1s0 1 0 0" stroke-width="0" 
fill="#f2f2f2"/><path d="M41 180l3-1h1l-1 3v2h-1-1l-2-2v-1l2-2h-1m2 2a13 13 0 000 0l2 2-2 2-2-1-1-2 1-1-1-2h2l1 1" stroke="currentColor" fill="none"/></g><g><path d="M42 163l2 1v2h-1v2h-1l-1-2s-1 1 0 0l1-2h1m-1 1l2 1v1l-2 2h2l-3-1v-2s0 1 0 0v-2h2c1 0 1 0 0 0" stroke-width="0" fill="#f2f2f2"/><path d="M43 165v2l1 1-1 1h-1l-1-3v-1-1l1-1s0 0 0 0m3 0l-2 1 1 4c0 1 0 0 0 0h-1l-1-1h-2l1-3c0-1 0-1 0 0h1l1-1" stroke="currentColor" fill="none"/></g></svg>
<figcaption>Small texts stored inline</figcaption>
</figure>
<ul>
<li><code>toast_test_medium</code>: The table is significantly larger, 977MB. We inserted text values that were just small enough to be stored inline. As a result, the table got very big, and the TOAST was not used at all.</li>
</ul>
<figure>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 412.6 240.1" height="15em"><path d="M11 49c100 0 197-3 391-1M13 48h390m-2 0c-1 49 0 106 3 185m-2-186l2 185m-1-2c-132 3-267 1-391 2m392-2c-112 4-220 4-390 0m-3-1c7-58 5-120-1-179m5 179c0-54 2-105-3-180" stroke="currentColor" fill="none"/><path d="M15 94c149 4 294 5 384-1M14 94c131 3 261 3 384 1" stroke="currentColor" fill="none"/><text y="15" font-size="16" fill="currentColor" transform="translate(32 60)">1</text><text y="15" font-size="16" fill="currentColor" transform="translate(83 66)">.................................................................</text><text y="15" font-size="16" fill="currentColor" transform="translate(33 106)">2</text><path d="M12 9l390 2-3 35-386 3" stroke-width="0" fill="#f2f2f2"/><path d="M10 10c108 0 215 2 391 0M11 10h390m0 0c1 9 0 22-1 37m1-36v37m-1 0c-153 2-309 3-388 0m389 0c-87 2-174 1-390 0m1-1c0-13-2-29 0-38m-1 38c1-11 1-22-1-38" stroke="currentColor" fill="none"/><path d="M70 18c-3 49 3 94 5 120M72 17l2 122" stroke="currentColor" fill="none"/><text y="15" font-size="16" transform="translate(22 18)">id</text><text x="20" y="15" font-size="16" text-anchor="middle" transform="translate(84 19)">value</text><path d="M15 143c100 5 200 5 387 0m-388 0l386-1" stroke="currentColor" fill="none"/><text y="15" font-size="16" fill="currentColor" transform="translate(22 200)">500K</text><path d="M10 191c77-1 158-3 388-3m-388 2c111-2 222-2 387-1" stroke="currentColor" fill="none"/><path d="M79 188c2 18 0 31-3 42m2-40c-3 12 0 24 0 39" stroke="currentColor" fill="none"/><path d="M42 150h2l1 1 1 2-3 2v-1h-3l1-2 2-2-1 1m1-1v2l1-2 1 4v1h-4l1-2v-1h-1l2-2v-1" stroke-width="0" fill="#f2f2f2"/><path d="M43 150l1 1 1 1-2 2v0h-2v-3h3l1 1m0-1h1l-2 3v2l-2-3v1l-1-2v-1l2-2v1" stroke="currentColor" fill="none"/><path d="M43 180l1 1v2l1-1-1 2h-1l-2-2c-1-1 0-2 1-2h1v-1l1 1m-2-1h3l-2 2 2 2v-1l-2 3-3-1v-3h2v-1l-1-2" stroke-width="0" fill="#f2f2f2"/><path d="M43 180h1v3l-1 2-2-1 1-2-1-1 1-1h3m-2 1l1-1 2 1-1 
3h-1-3l1-1v-3-1s0 0 0 0l1 2" stroke="currentColor" fill="none"/><g><path d="M44 163l2 3-2 2-1 1v-1h-2v-2l1-2v1l2-1m-1 1l1-2 1 3 1 2-4 1 1-2-1-1h-1l1-2 2-2-2 1" stroke-width="0" fill="#f2f2f2"/><path d="M43 163l1 1v1l1 2h-1l-2 2-1-2v-1l1-2 2 1s0 0 0 0m0-2l-1 2 1 1 2 2-2-1-1 2-1-1-1-2v-3l1 1 2 1" stroke="currentColor" fill="none"/></g><g><text y="15" font-size="16" fill="currentColor" transform="translate(87 114)">.................................................................</text></g><g><text y="15" font-size="16" fill="currentColor" transform="translate(90 198)">.................................................................</text></g></svg>
<figcaption>Medium texts stored inline</figcaption>
</figure>
<ul>
<li><code>toast_test_large</code>: The size of the table is roughly similar to the size of the small table. This is because we inserted large texts into the table, and PostgreSQL stored them out-of-line in the TOAST table. This is why the TOAST table is so big for the large table, but the table itself remained small.</li>
</ul>
<figure>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 406.8 300" height="20em"><path d="M8 50l150-4M11 48c47 0 93 2 148 1m-1-5c9 44 5 90 5 188m-3-183c2 51 1 103 2 183m-2-2c-59 3-115 3-146-1m146 2c-47-1-96 0-148-2m0 2c5-46 1-94 1-183m-3 181c2-72 2-142 1-183" stroke="currentColor" fill="none"/><path d="M17 94c48 3 98 0 143 0M15 94l148-2" stroke="currentColor" fill="none"/><text y="15" font-size="16" fill="currentColor" transform="translate(33 60)">1</text><text y="15" font-size="16" fill="currentColor" transform="translate(34 106)">2</text><path d="M10 10l151-2-3 39-148 2" stroke-width="0" fill="#f2f2f2"/><path d="M13 9c38 1 79 0 147 2M12 9c37 1 74 3 147 0m1 2c0 14 1 30-1 38m1-38c1 10-1 19-1 37m1 1H11m149-1H12m-1 1c1-13 0-31-1-40m2 39c-1-12 1-23-1-38M69 16c4 47 2 90 7 122M70 19c-1 30 3 57 2 123" stroke="currentColor" fill="none"/><text y="15" font-size="16" transform="translate(23 18)">id</text><text x="20" y="15" font-size="16" text-anchor="middle" transform="translate(84 19)">value</text><path d="M16 140c33 3 65-1 143 2m-143 2l146-2" stroke="currentColor" fill="none"/><text y="15" font-size="16" fill="currentColor" transform="translate(23 200)">500K</text><path d="M11 191c45-2 90 0 144 0m-145-3c38-2 70-1 149 1" stroke="currentColor" fill="none"/><path d="M80 189c-5 12-2 25 0 39m-4-35l2 37" stroke="currentColor" fill="none"/><path d="M44 150c0-1 0 0 0 0l1 2v2h-1l-2 1v-2l-1-2 1 1 1-2c0-1 0 0 0 0m1 1l2 1-1 1 1 1-3 2-1-2-1-1 1-2h2l1-1-2-1" stroke-width="0" fill="#f2f2f2"/><path d="M43 151l3-1-1 3-1 1v1l-1-1s0 0 0 0l-2-2 1-1h3m-1 0l2 1s0 0 0 0l-1 1-1 1-1 1-2-3h1v-2l1 2 1-1" stroke="currentColor" fill="none"/><path d="M43 179l1 1 1 1v2h-2l1 1-3-2v-1l2-2c1 0 0 0 0 0v1m1 0v-1l2 3h-1v2l-2-1h-1-1l2-3v-2l-1 2" stroke-width="0" fill="#f2f2f2"/><path d="M43 180v-1l3 1-1 2v2h-2v-1l-1-1v-2h2m0 0l1 1s0 0 0 0v2l-1 1-1-1h-2l1-2-1-2 4 1-1 1" stroke="currentColor" fill="none"/><path d="M44 163l1 1v3l-2 2-1-2-1-2 1 1v-3h2l2 2m-1-1l-1 1 3 1-2 3-1-1h-1l-1-1 1-1v-2-1h2" stroke-width="0" 
fill="#f2f2f2"/><path d="M45 163l-1 3h1v2h-1-1l-2-1 1-2v-1l3 1v-1m-1 0h2v3l-1 1c-1 1-2 0-3-1v1l-1-3v1l3-2h1" stroke="currentColor" fill="none"/><path d="M109 65h5l2 4c1 2 2 3 1 5 0 2-2 4-3 5l-5 2-4-4c-1-1-3-3-3-5l2-4 5-2h2m-4 0c1-1 3 0 4 1 2 0 4 1 5 3 1 1 2 3 1 4l-4 6-3 1-6-1-2-6 2-6 2-2h2" stroke-width="0" fill="#f41d92"/><path d="M111 64l3 4 3 3-1 6-3 4-6-2-3-3-1-5 2-4 7-1h2m-5 0l8 2c1 1-1 3-1 5s2 3 1 4c-1 2-5 4-7 5l-2-2-5-5v-6l4-3 4-1v1" stroke="currentColor" fill="none"/><path d="M108 109l6-1 4 3 2 5-4 5-3 3c-2 1-4 0-6-1-1-1-3-3-3-5l2-5 4-4-1 1m5-1c1 1 5 2 6 4l-1 5-4 5h-4l-4-2c-1-1-4-3-4-5s3-5 4-6l4-2 4 1-1 2" stroke-width="0" fill="#f41d92"/><path d="M115 108l3 5v4l-2 4-4 3-5-2-3-6c0-2 2-3 3-4l3-4 6 1v1m-2 0l4 3-1 6-2 5-5-1-3-3-3-4 1-4 4-3 5-1h1" stroke="currentColor" fill="none"/><path d="M114 202l5 4-1 5-4 5h-4l-4-2c-1-1-3-3-3-5l3-5c1-2 2-3 4-3l5 1v2m-2-3l4 6 1 3c0 2-2 4-3 5l-3 2-7-1-2-6c0-2 2-3 3-5l4-2 4-1v1" stroke-width="0" fill="#f41d92"/><path d="M114 203l3 3v6c0 2-1 4-3 4h-5l-3-2-2-5c-1-1 0-3 1-4 0-2 2-2 4-3l5-1 1 1m-6 0l3-1 6 4 1 4c0 2 0 4-2 5-1 2-5 2-6 2h-4l-4-4v-6c1-2 5-4 6-4v1" stroke="currentColor" fill="none"/><path d="M208 56l191-6-2 238-186 5" stroke-width="0" fill="#f41d92"/><path d="M213 49c52 8 108 8 186 0m-188 2c69 1 139 5 186 1m0-2c-1 72-5 144 2 238m-2-237l-1 238m4 1c-51-3-99 1-191-3m189 3c-67 1-137-1-188 1m1 1c-3-81-2-157 1-242m-1 242c2-76 2-150-1-241" stroke="currentColor" fill="none"/><path d="M273 54c5 49 7 99 2 138m1-137c-2 49 2 98 3 134" stroke="currentColor" fill="none"/><path d="M210 95c44 3 88-4 183-2m-182-2c44 4 87 2 179-2" stroke="currentColor" fill="none"/><text y="15" font-size="16" fill="currentColor" transform="translate(228 65)">1</text><text y="15" font-size="16" fill="currentColor" transform="translate(285 63)">\x.....</text><path d="M214 128c61-1 128 3 172-1m-169 1c42-2 88-2 172 0" stroke="currentColor" fill="none"/><text y="15" font-size="16" fill="currentColor" transform="translate(229 102)">1</text><text y="15" 
font-size="16" fill="currentColor" transform="translate(290 102)">\x.....</text><path d="M213 165c67-1 134 0 182-5m-180 3l180-2" stroke="currentColor" fill="none"/><text y="15" font-size="16" fill="currentColor" transform="translate(227 135)">2</text><text y="15" font-size="16" fill="currentColor" transform="translate(286 132)">\x.....</text><text y="15" font-size="16" fill="currentColor" transform="translate(229 170)">2</text><text y="15" font-size="16" fill="currentColor" transform="translate(287 171)">\x.....</text><path d="M217 200c33-4 72-5 175-2m-176-1l180-3M213 231c60 6 122 7 182 4m-183-2l179 1M219 264c41 0 86-4 176-3m-178 1c44 1 83-2 178-2" stroke="currentColor" fill="none"/><path d="M236 203v3l-1 1s1 0 0 0l-1 1v-2-2-1h2l-2 1m1 1l2 1v2l-2-1-2 2-1-3h2l1-3-1 2 2-1" stroke-width="0" fill="#f2f2f2"/><path d="M236 204h-1l1 2c0 1 0 0 0 0l-1 1h-1l-1-1 1-2s0-1 0 0h1v1m1 0l-1-1s0-1 0 0l2 2-1 2s0 0 0 0l-2-2-2 1 2-2v-1h2" stroke="currentColor" fill="none"/><path d="M234 223l2-1 1 1v2l-1 1-1 2-2-1-1-1 1-1 1-3 1 1m0-1l1 1 1 1v1h-2l1 1-3 1 1-1-2-3 3 1-1-2" stroke-width="0" fill="#f2f2f2"/><path d="M234 222c1-1 0-1 0 0l3 2-1 1v1l-1 1-2 1-1-2 1-2 2-2s0 0 0 0m1 0c0-1 0-1 0 0l-1 3v3l-1-1-2-2h2l1-3 1 2-1-1" stroke="currentColor" fill="none"/><g><path d="M235 214h1l1 1-1 2h-1l-1-1v-1-2s-1 0 0 0h1m-2 1h3l2 1h-1l-2 3-1-1v-3l-2 1 4-2-2 1" stroke-width="0" fill="#f2f2f2"/><path d="M233 212l1 1h3v2l-1 1-1 1v-1h-2v-1-1l1-1m0 0v-1l3 1v4l-2-1-2 2v-2l-1-1 4-2-1-1" stroke="currentColor" fill="none"/></g><g><text y="15" font-size="16" fill="currentColor" transform="translate(218 237)">500K</text></g><g><text y="15" font-size="16" fill="currentColor" transform="translate(220 267)">500K</text></g><g><path d="M278 239c-2 12-3 28-2 41m-2-36c2 5 3 17 1 37" stroke="currentColor" fill="none"/></g><g><text y="15" font-size="16" fill="currentColor" transform="translate(286 240)">\x.....</text></g><g><text y="15" font-size="16" fill="currentColor" transform="translate(286 
266)">\x.....</text></g><g><path d="M125 72l83-1m-83 1l83-1M180 82l28-11m-28 11l28-11M180 61l28 10m-28-10l28 10" stroke="currentColor" fill="none"/></g><g><path d="M129 120l78 25m-78-25l78 25M177 146l30-1m-30 1l30-1M183 127l24 18m-24-18l24 18" stroke="currentColor" fill="none"/></g><g><path d="M127 217l78 32m-78-32l78 32M175 248l30 1m-30-1l30 1M183 229l22 20m-22-20l22 20" stroke="currentColor" fill="none"/></g></svg>
<figcaption>Large texts stored out-of-line in TOAST</figcaption>
</figure>
<p>When we executed our query, the database did a full table scan. To scan the small and large tables, the database only had to read 21MB and 25MB respectively, so the query was pretty fast. However, when we executed the query against the medium table, where all the texts are stored inline, the database had to read 977MB from disk, and the query took a lot longer.</p>
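<p>One way to check this explanation is to ask PostgreSQL how many pages each scan touched. The following is a minimal sketch, assuming the three test tables from above and no index on <code>id</code>. It uses the <code>BUFFERS</code> option of <code>EXPLAIN</code>, which adds a <code>Buffers:</code> line to each plan node with the number of pages found in shared buffers (<code>hit</code>) and the number fetched from outside them (<code>read</code>). The actual numbers depend on your data and on the state of the cache, so no output is shown here:</p>
<div class="highlight"><pre><span></span>-- Size of a single page in bytes (8192 unless PostgreSQL was built differently)
SHOW block_size;

-- Pages touched by the sequential scan on each table
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM toast_test_small WHERE id = 6000;
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM toast_test_medium WHERE id = 6000;
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM toast_test_large WHERE id = 6000;
</pre></div>
<p>Multiplying the sum of <code>hit</code> and <code>read</code> by the block size gives a rough estimate of how much data the scan went through. If the explanation holds, the estimate for each table should be close to the size of the table itself, without the TOAST table.</p>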
<div class="admonition tip">
<p class="admonition-title">TAKE AWAY</p>
<p>TOAST is a great way of keeping tables compact by storing large values out-of-line!</p>
</div>
<p><strong>Using the Text Values</strong></p>
<p>In the previous comparison we executed a query that only used the ID, not the text value. What will happen when we actually need to access the text value itself?</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="kp">\timing</span>
<span class="go">Timing is on.</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">toast_test_large</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="k">value</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'foo%'</span><span class="p">;</span>
<span class="go">Time: 7509.900 ms (00:07.510)</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">toast_test_large</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="k">value</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'foo%'</span><span class="p">;</span>
<span class="go">Time: 7290.925 ms (00:07.291)</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">toast_test_medium</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="k">value</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'foo%'</span><span class="p">;</span>
<span class="go">Time: 5869.631 ms (00:05.870)</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">toast_test_medium</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="k">value</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'foo%'</span><span class="p">;</span>
<span class="go">Time: 259.970 ms</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">toast_test_small</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="k">value</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'foo%'</span><span class="p">;</span>
<span class="go">Time: 78.897 ms</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">toast_test_small</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="k">value</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'foo%'</span><span class="p">;</span>
<span class="go">Time: 50.035 ms</span>
</pre></div>
<p>We executed a query against all three tables to search for text values that start with a given string. The query is not expected to return any results, and is forced to scan the entire table. This time, the results are more consistent with what we would expect:</p>
<div class="table-container">
<table>
<thead>
<tr>
<th>Table</th>
<th>Cold cache</th>
<th>Warm cache</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>toast_test_small</code></td>
<td>78.897 ms</td>
<td>50.035 ms</td>
</tr>
<tr>
<td><code>toast_test_medium</code></td>
<td>5869.631 ms</td>
<td>259.970 ms</td>
</tr>
<tr>
<td><code>toast_test_large</code></td>
<td>7509.900 ms</td>
<td>7290.925 ms</td>
</tr>
</tbody>
</table>
</div>
<p>The larger the table, the longer it took the query to complete. This makes sense because to satisfy the query, the database was forced to read the texts as well. In the case of the large table, this also means accessing the TOAST table.</p>
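<p>To see how much data each table holds once the out-of-line texts are counted, you can compare the size of the table alone with the size of the table plus its TOAST. This is a small sketch using <code>pg_table_size</code>, which includes the TOAST table (along with the free space map and visibility map) but not indexes. The column aliases are my own:</p>
<div class="highlight"><pre><span></span>SELECT
    relname,
    -- main table only, same as the "size" column shown earlier
    pg_size_pretty(pg_relation_size(oid)) AS table_only,
    -- main table plus TOAST, excluding indexes
    pg_size_pretty(pg_table_size(oid)) AS table_and_toast
FROM
    pg_class
WHERE
    relname LIKE 'toast_test_%'
    AND relkind = 'r';
</pre></div>
<p>When a query has to read the text values, the second column is the better indication of how much data the database may need to go through.</p>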
<p><strong>What About Indexes?</strong></p>
<p>Indexes help the database minimize the number of pages it needs to fetch to satisfy a query. For example, let's take the first example, where we searched for a single row by ID, but this time we'll have an index on the field:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">toast_test_small_id_ix</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">toast_test_small</span><span class="p">(</span><span class="n">id</span><span class="p">);</span>
<span class="go">CREATE INDEX</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">toast_test_medium_id_ix</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">toast_test_medium</span><span class="p">(</span><span class="n">id</span><span class="p">);</span>
<span class="go">CREATE INDEX</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">toast_test_large_id_ix</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">toast_test_large</span><span class="p">(</span><span class="n">id</span><span class="p">);</span>
<span class="go">CREATE INDEX</span>
</pre></div>
<p>Executing the exact same query as before with indexes on the tables:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span><span class="w"> </span><span class="n">TIMING</span><span class="p">)</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">toast_test_small</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">6000</span><span class="p">;</span>
<span class="go"> QUERY PLAN</span>
<span class="go">─────────────────────────────────────────────────────────────────────────────────────────────</span>
<span class="go">Index Scan using toast_test_small_id_ix on toast_test_small (cost=0.42..8.44 rows=1 width=16)</span>
<span class="go"> Index Cond: (id = 6000)</span>
<span class="go">Time: 0.772 ms</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span><span class="w"> </span><span class="n">TIMING</span><span class="p">)</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">toast_test_medium</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">6000</span><span class="p">;</span>
<span class="go"> QUERY PLAN</span>
<span class="go">─────────────────────────────────────────────────────────────────────────────────────────────</span>
<span class="go">Index Scan using toast_test_medium_id_ix on toast_test_medium (cost=0.42..8.44 rows=1 width=1808)</span>
<span class="go"> Index Cond: (id = 6000)</span>
<span class="go">Time: 0.831 ms</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span><span class="w"> </span><span class="n">TIMING</span><span class="p">)</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">toast_test_large</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">6000</span><span class="p">;</span>
<span class="go"> QUERY PLAN</span>
<span class="go">─────────────────────────────────────────────────────────────────────────────────────────────</span>
<span class="go">Index Scan using toast_test_large_id_ix on toast_test_large (cost=0.42..8.44 rows=1 width=22)</span>
<span class="go"> Index Cond: (id = 6000)</span>
<span class="go">Time: 0.618 ms</span>
</pre></div>
<p>The index was used in all three cases, and the performance is almost identical.</p>
<p>By now, we know that the trouble begins when the database has to do a lot of IO. So next, let's craft a query for which the database will choose to use the index, but will still have to read a lot of data:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span><span class="w"> </span><span class="n">TIMING</span><span class="p">)</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">toast_test_small</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="mf">0</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="mf">250000</span><span class="p">;</span>
<span class="go"> QUERY PLAN</span>
<span class="go">───────────────────────────────────────────────────────────────────────────────────────────────</span>
<span class="go">Index Scan using toast_test_small_id_ix on toast_test_small(cost=0.4..9086 rows=249513 width=16</span>
<span class="go"> Index Cond: ((id >= 0) AND (id <= 250000))</span>
<span class="go">Time: 60.766 ms</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span><span class="w"> </span><span class="n">TIMING</span><span class="p">)</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">toast_test_small</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="mf">0</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="mf">250000</span><span class="p">;</span>
<span class="go">Time: 59.705 ms</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span><span class="w"> </span><span class="n">TIMING</span><span class="p">)</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">toast_test_medium</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="mf">0</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="mf">250000</span><span class="p">;</span>
<span class="go">Time: 3198.539 ms (00:03.199)</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span><span class="w"> </span><span class="n">TIMING</span><span class="p">)</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">toast_test_medium</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="mf">0</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="mf">250000</span><span class="p">;</span>
<span class="go">Time: 284.339 ms</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span><span class="w"> </span><span class="n">TIMING</span><span class="p">)</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">toast_test_large</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="mf">0</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="mf">250000</span><span class="p">;</span>
<span class="go">Time: 85.747 ms</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span><span class="w"> </span><span class="n">TIMING</span><span class="p">)</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">toast_test_large</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="mf">0</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="mf">250000</span><span class="p">;</span>
<span class="go">Time: 70.364 ms</span>
</pre></div>
<p>We executed a query that fetches half the data in the table. This was a low enough portion of the table for PostgreSQL to decide to use the index, but still high enough to require lots of IO.</p>
<div class="admonition info">
<p class="admonition-title">see also</p>
<p>To understand why using an index is not always the fastest plan, see my tip on <a href="/sql-tricks-application-dba#avoid-indexes-on-columns-with-low-selectivity">avoiding indexes on columns with low selectivity</a>.</p>
</div>
<p>We ran each query twice on each table. In all cases the database used the index to access the table. Keep in mind that the index only helps reduce the number of pages the database has to access, but in this case, the database still had to read half the table.</p>
<div class="table-container">
<table>
<thead>
<tr>
<th>Table</th>
<th>Cold cache</th>
<th>Warm cache</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>toast_test_small</code></td>
<td>60.766 ms</td>
<td>59.705 ms</td>
</tr>
<tr>
<td><code>toast_test_medium</code></td>
<td>3198.539 ms</td>
<td>284.339 ms</td>
</tr>
<tr>
<td><code>toast_test_large</code></td>
<td>85.747 ms</td>
<td>70.364 ms</td>
</tr>
</tbody>
</table>
</div>
<p>The results here are similar to the first test we ran. When the database had to read a large portion of the table, the medium table, where the texts are stored inline, was the slowest.</p>
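<p>To put the gap in perspective, the timings from the table can be expressed relative to the small table. The snippet below is a minimal Python sketch that only re-uses the numbers reported above; the names <code>timings_ms</code> and <code>slowdown</code> are made up for the illustration and are not part of the benchmark.</p>

```python
# Timings copied from the table above, in milliseconds: (cold cache, warm cache).
timings_ms = {
    "toast_test_small": (60.766, 59.705),
    "toast_test_medium": (3198.539, 284.339),
    "toast_test_large": (85.747, 70.364),
}

baseline_cold, baseline_warm = timings_ms["toast_test_small"]

# Express every run as a multiple of the small table's run
# under the same cache state.
slowdown = {
    table: (cold / baseline_cold, warm / baseline_warm)
    for table, (cold, warm) in timings_ms.items()
}

for table, (cold_ratio, warm_ratio) in slowdown.items():
    print(f"{table:<18} cold: {cold_ratio:5.1f}x  warm: {warm_ratio:4.1f}x")
```

<p>With these numbers, the cold-cache run on the medium table comes out roughly 53 times slower than on the small table, and the warm-cache run a little under 5 times slower, while the large table stays within about 1.5x in both cases.</p>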
<h2 id="possible-solutions"><a class="toclink" href="#possible-solutions">Possible Solutions</a></h2>
<p>If, after reading this far, you are convinced that medium-size texts are what's causing your performance issues, there are things you can do.</p>
<h3 id="adjusting-toast_tuple_target"><a class="toclink" href="#adjusting-toast_tuple_target">Adjusting <code>toast_tuple_target</code></a></h3>
<p><code>toast_tuple_target</code> is a storage parameter that controls the minimum tuple length after which PostgreSQL tries to move long values to TOAST. The default is 2K, but it can be decreased to a minimum of 128 bytes. The lower the target, the higher the chance that a medium-size string is moved out-of-line to the TOAST table.</p>
<p>To demonstrate, create a table with the default storage params, and another with <code>toast_tuple_target = 128</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">toast_test_default_threshold</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="w"> </span><span class="nb">SERIAL</span><span class="p">,</span><span class="w"> </span><span class="k">value</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">);</span>
<span class="go">CREATE TABLE</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">toast_test_128_threshold</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="w"> </span><span class="nb">SERIAL</span><span class="p">,</span><span class="w"> </span><span class="k">value</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">)</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="p">(</span><span class="n">toast_tuple_target</span><span class="o">=</span><span class="mf">128</span><span class="p">);</span>
<span class="go">CREATE TABLE</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">c1</span><span class="mf">.</span><span class="n">relname</span><span class="p">,</span><span class="w"> </span><span class="n">c2</span><span class="mf">.</span><span class="n">relname</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">toast_relname</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">pg_class</span><span class="w"> </span><span class="n">c1</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">pg_class</span><span class="w"> </span><span class="n">c2</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">c1</span><span class="mf">.</span><span class="n">reltoastrelid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c2</span><span class="mf">.</span><span class="n">oid</span>
<span class="k">WHERE</span><span class="w"> </span><span class="n">c1</span><span class="mf">.</span><span class="n">relname</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'toast%threshold'</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">c1</span><span class="mf">.</span><span class="n">relkind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'r'</span><span class="p">;</span>
<span class="go"> relname │ toast_relname</span>
<span class="go">──────────────────────────────┼─────────────────</span>
<span class="go"> toast_test_default_threshold │ pg_toast_3250167</span>
<span class="go"> toast_test_128_threshold │ pg_toast_3250176</span>
</pre></div>
<p>Next, generate a value larger than 2KB that compresses to less than 128 bytes, insert it into both tables, and check whether it was stored out-of-line:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">toast_test_default_threshold</span><span class="w"> </span><span class="p">(</span><span class="k">value</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="n">generate_random_string</span><span class="p">(</span><span class="mf">2100</span><span class="p">,</span><span class="w"> </span><span class="s1">'123'</span><span class="p">));</span>
<span class="go">INSERT 0 1</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_toast</span><span class="mf">.</span><span class="n">pg_toast_3250167</span><span class="p">;</span>
<span class="go"> chunk_id │ chunk_seq │ chunk_data</span>
<span class="go">──────────┼───────────┼───────────</span>
<span class="go">(0 rows)</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">toast_test_128_threshold</span><span class="w"> </span><span class="p">(</span><span class="k">value</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="n">generate_random_string</span><span class="p">(</span><span class="mf">2100</span><span class="p">,</span><span class="w"> </span><span class="s1">'123'</span><span class="p">));</span>
<span class="go">INSERT 0 1</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_toast</span><span class="mf">.</span><span class="n">pg_toast_3250176</span><span class="p">;</span>
<span class="go">─[ RECORD 1 ]─────────────</span>
<span class="go">chunk_id │ 3250185</span>
<span class="go">chunk_seq │ 0</span>
<span class="go">chunk_data │ \x3408.......</span>
</pre></div>
<p>The (roughly) similar medium-size text was stored inline with the default params, and out-of-line with a lower <code>toast_tuple_target</code>.</p>
<h3 id="create-a-separate-table"><a class="toclink" href="#create-a-separate-table">Create a Separate Table</a></h3>
<p>If you have a critical table that stores medium-size text fields, and you notice that most texts are being stored inline and perhaps slowing down queries, you can move the column with the medium text field into its own table:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">toast_test_value</span><span class="w"> </span><span class="p">(</span><span class="n">fk</span><span class="w"> </span><span class="nb">INT</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">);</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">toast_test</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="w"> </span><span class="nb">SERIAL</span><span class="p">,</span><span class="w"> </span><span class="n">value_id</span><span class="w"> </span><span class="nb">INT</span><span class="p">);</span>
</pre></div>
<p>In my previous article I demonstrated <a href="/sql-anomaly-detection">how we use SQL to find anomalies</a>. In one of those use cases, we actually had a table of errors that contained a Python traceback. The error messages were medium-size texts, many of them stored inline, and as a result the table got big very quickly! So big in fact, that we noticed queries were getting slower and slower. Eventually we moved the errors into a separate table, and things got much faster!</p>
<hr>
<h2 id="conclusion"><a class="toclink" href="#conclusion">Conclusion</a></h2>
<p>The main problem with medium-size texts is that they make the rows very wide. This is a problem because PostgreSQL, like other OLTP-oriented databases, stores values in rows. When we ask the database to execute a query with only a few columns, the values of these columns are most likely spread across many blocks. If the rows are wide, this translates into a lot of IO, which affects query performance and resource usage.</p>
<p>To overcome this challenge, some non-OLTP-oriented databases use a different type of storage: columnar storage. With columnar storage, data is stored on disk by columns, not by rows. This way, when the database has to scan a specific column, the values are stored in consecutive blocks, which usually translates to less IO. Additionally, the values of a specific column are more likely to have repeating patterns and values, so they compress better.</p>
<figure>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 436.4 261.5" height="15em"><path d="M14 134c51 4 101-2 148-5m-149 4c34 2 65 0 148-2m-2 1c1 12 4 17 2 39m-2-39c3 16 1 32 2 42m0-3c-51 8-96 8-153 2m154-1H13m-3 0l5-37m-3 37c-3-8 1-17 0-40" stroke="currentColor" fill="none"/><text y="15" font-size="16" transform="translate(32 142)" style="fill: currentColor">2</text><text y="15" font-size="16" transform="translate(82 147)" style="fill: currentColor">.....</text><path d="M12 14l145-2v37L10 51" stroke-width="0" fill="#f2f2f2"/><path d="M9 13c46-1 95 0 151 1M10 13h147m0 0c0 12 3 21 0 38m1-38v37m2-1c-44 4-88 2-149 0m147 2c-50-2-98-2-149-1m2 3l-1-42m0 40L9 13" stroke="currentColor" fill="none"/><path d="M71 17c-4 14-2 27 2 33m-3-31v27" stroke="currentColor" fill="none"/><text y="15" font-size="16" transform="translate(21 20)">id</text><text x="20" y="15" font-size="16" text-anchor="middle" transform="translate(83 21)">value</text><path d="M72 137l-3 26m2-25v29M13 71c49-2 102-5 147-5M11 68c56 3 111 2 152 1m-3 2c3 6-2 13 4 39m-1-43v43m-5-2c-25-1-60-3-144-1m147 1c-37 1-79 0-149 2m2 0c-4-17 0-26 0-40m-1 40c0-12 0-23-2-40" stroke="currentColor" fill="none"/><text style="fill: currentColor" y="15" font-size="16" transform="translate(32 78)">1</text><text style="fill: currentColor" y="15" font-size="16" transform="translate(83 83)">.....</text><path d="M73 78c-2 4-5 17-1 20m-3-24c0 10 2 13-1 27M10 202c51 5 104 4 156 5m-151-2c33-1 66-3 147 0m5-1c-3 15-7 19-4 43m-2-41c2 12 5 26 2 41m-1-4c-53 4-113 3-146 5m148-3c-30-3-64 0-148 2m-1 2c2-11-1-28-1-45m-2 42c1-8 2-18 0-42" stroke="currentColor" fill="none"/><text style="fill: currentColor" y="15" font-size="16" transform="translate(34 214)">3</text><text style="fill: currentColor" y="15" font-size="16" transform="translate(84 219)">.....</text><path d="M72 210c-1 9 2 9-1 31m2-30v28" stroke="currentColor" fill="none"/><text style="fill: currentColor" y="15" font-size="16" transform="translate(296 139)">2</text><text style="fill: 
currentColor" y="15" font-size="16" transform="translate(347 144)">.....</text><path d="M275 13l54 1v36l-56 1" stroke-width="0" fill="#f2f2f2"/><path d="M273 13l56-1m-55 1l54-1m2 0c0 8-2 17-2 40m0-40v38m0-1l-56 2m56-1h-54m-1-2V12m1 39V13" stroke="currentColor" fill="none"/><text y="15" font-size="16" transform="translate(285 20)">id</text><path d="M275 52c12-4 25 0 49 1m-49-4c14-1 28-1 51 1m1-1c1 79-2 152-3 202m-1-202c1 56 2 111 1 199m-2-2c-14 6-30 6-50 1m51 2c-11 2-23 1-50-1m-1-1V54m3 196c5-39 4-81 2-199" stroke="currentColor" fill="none"/><text style="fill: currentColor" y="15" font-size="16" transform="translate(296 76)">1</text><text style="fill: currentColor" y="15" font-size="16" transform="translate(347 81)">.....</text><path d="M341 49c29 3 56 4 87-2m-84 0c20 3 39 3 79 3m1 0c2 51 2 101 0 200m1-202c-3 46-1 95 0 203m2 2c-22-4-38-3-86-5m83 4c-23-1-44 2-83-2m1 4c-1-45-2-97-1-206m2 204c-2-65-4-133 1-203" stroke="currentColor" fill="none"/><text style="fill: currentColor" y="15" font-size="16" transform="translate(298 212)">3</text><text style="fill: currentColor" y="15" font-size="16" transform="translate(349 217)">.....</text><path d="M341 9h87l-1 38-84 1" stroke-width="0" fill="#f2f2f2"/><path d="M343 11c14-3 34-1 81 1m-82-2c23 1 46 2 84 0m0 1c2 12-1 30-1 36m1-37l1 39m-2-2c-23 1-47 2-82-1m83 3l-84-1m-1 1c0-12 1-23-1-38m2 36V10" stroke="currentColor" fill="none"/><g><text x="20" y="15" font-size="16" text-anchor="middle" transform="translate(352 18)">value</text></g><g><path d="M184 127h67m-67 0h67M223 137l28-10m-28 10l28-10M223 116l28 11m-28-11l28 11" stroke="currentColor" fill="none"/></g></svg>
<figcaption>Row vs Column Storage</figcaption>
</figure>
<p>For non-OLTP workloads such as data warehouse systems, this makes sense. The tables are usually very wide, and queries often use a small subset of the columns and read a lot of rows. In OLTP workloads, the system will usually read one or very few rows, so storing data in rows makes more sense.</p>
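<p>The difference between the two layouts can be sketched in a few lines of Python. This is only an in-memory illustration under simplifying assumptions: the sample rows and the names <code>row_store</code> and <code>column_store</code> are hypothetical, and real databases lay data out in disk blocks rather than Python lists.</p>

```python
# Hypothetical three-row table with the same two columns as the figure.
rows = [
    (1, "a medium-size text"),
    (2, "another medium-size text"),
    (3, "yet another medium-size text"),
]

# Row-oriented layout: each record is kept together, so reading only the
# ids still walks past every (wide) value.
row_store = list(rows)
ids_from_rows = [record[0] for record in row_store]

# Column-oriented layout: each column is kept together, so reading only
# the ids never touches the values at all.
column_store = {
    "id": [record[0] for record in rows],
    "value": [record[1] for record in rows],
}
ids_from_columns = column_store["id"]

# Fetching one full record is the opposite trade-off: a single lookup in
# the row store, one lookup per column in the column store.
second_record_from_rows = row_store[1]
second_record_from_columns = tuple(
    column_store[name][1] for name in ("id", "value")
)

print(ids_from_rows, ids_from_columns)
print(second_record_from_rows == second_record_from_columns)
```

<p>Both layouts hold the same data. What changes is which access pattern is cheap: scanning one column across many rows, or fetching all the columns of a single row.</p>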
<p>There has been <a href="https://www.postgresql.org/message-id/CALfoeiuF-m5jg51mJUPm5GN8u396o5sA2AF5N97vTRAEDYac7w@mail.gmail.com" rel="noopener">chatter</a> <a href="https://wiki.postgresql.org/wiki/ColumnOrientedSTorage" rel="noopener">about</a> <a href="https://www.pgcon.org/2019/schedule/events/1374.en.html" rel="noopener">pluggable storage</a> in PostgreSQL, so this is something to look out for!</p>Simple Anomaly Detection Using Plain SQL2020-09-21T00:00:00+03:002020-09-21T00:00:00+03:00Haki Benitatag:hakibenita.com,2020-09-21:/sql-anomaly-detection<p>Many developers think that having a critical bug in their code is the worst thing that can happen. Well, there is something much worse than that: Having a critical bug in your code and not knowing about it! Using some high school level statistics and a fair knowledge of SQL, I implemented a very simple anomaly detection system.</p><hr>
<p>Many developers think that having a critical bug in their code is the worst thing that can happen. Well, there is something much worse than that: Having a critical bug in your code and <strong>not knowing about it!</strong></p>
<p>To make sure I get notified about critical bugs as soon as possible, I started looking for ways to find anomalies in my data. I quickly found that information about these subjects tends to get very complicated, and involves a lot of ad-hoc tools and dependencies.</p>
<p>I'm not a statistician and not a data scientist, I'm just a developer. Before I introduce dependencies into my system I make sure I really can't do without them. So, <strong>using some high school level statistics and a fair knowledge of SQL, I implemented a simple anomaly detection system <em>that works</em>.</strong></p>
<figure><img alt="Can you spot the anomaly?<br><small>Photo by <a href="https://unsplash.com/photos/KmKZV8pso-s">Ricardo Gomez Angel</a></small>" src="https://hakibenita.com/images/00-sql-anomaly-detection.png"><figcaption>Can you spot the anomaly?<br><small>Photo by <a href="https://unsplash.com/photos/KmKZV8pso-s">Ricardo Gomez Angel</a></small></figcaption>
</figure>
<p><details class="toc-container" open>
<summary>Table of Contents</summary></p>
<div class="toc">
<ul>
<li><a href="#detecting-anomalies">Detecting Anomalies</a><ul>
<li><a href="#understanding-z-score">Understanding Z-Score</a></li>
<li><a href="#optimizing-z-score">Optimizing Z-Score</a></li>
</ul>
</li>
<li><a href="#analyzing-a-server-log">Analyzing a Server Log</a><ul>
<li><a href="#preparing-the-data">Preparing the Data</a></li>
<li><a href="#getting-a-sense-of-the-data">Getting a Sense of the Data</a></li>
<li><a href="#identifying-anomalies">Identifying Anomalies</a></li>
</ul>
</li>
<li><a href="#backtesting">Backtesting</a><ul>
<li><a href="#finding-past-anomalies">Finding Past Anomalies</a></li>
<li><a href="#adding-thresholds">Adding Thresholds</a></li>
<li><a href="#eliminating-repeating-alerts">Eliminating Repeating Alerts</a></li>
<li><a href="#experiment-with-different-values">Experiment With Different Values</a></li>
</ul>
</li>
<li><a href="#improving-accuracy">Improving Accuracy</a><ul>
<li><a href="#use-weighted-mean">Use Weighted Mean</a></li>
<li><a href="#use-median">Use Median</a></li>
<li><a href="#use-mad">Use MAD</a></li>
<li><a href="#use-different-measures">Use Different Measures</a></li>
</ul>
</li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</div>
<p></details></p>
<hr>
<style>p:empty{display: none;}</style>
<div class="admonition tip">
<p class="admonition-title">Interactive Course</p>
<p><div style="display:flex;align-items:center;"><svg xmlns="http://www.w3.org/2000/svg" version="1.1" viewBox="0 0 109.72693673325944 107.6819131095042" style="height:4em;min-height:3em"><g stroke-linecap="round"><g stroke-opacity="0.6" fill-opacity="0.6" transform="translate(12.09554198756814 105.20243761579223) rotate(0 42.2257148024091 -51.36148106104014)" fill-rule="evenodd"><path d="M-1.66 -5.05 L7.13 -87.78 L83.5 -86.79 L82.01 -15.03 L-0.81 -7" stroke="none" stroke-width="0" fill="#f41d92" fill-rule="evenodd"></path><path d="M-2.1 -10.09 C-1.54 -30.82, 2.14 -59.36, 7.53 -91.25 M-1.53 -7.52 C1.88 -36.65, 3.27 -66.22, 6.29 -89.45 M2.14 -95.2 C30.81 -89.31, 48.3 -87.5, 86.55 -84.03 M5.05 -91.17 C20.77 -91.14, 40.01 -90.01, 86.29 -84.29 M86.01 -81.64 C82.96 -67.53, 83.74 -54.72, 79.6 -17.01 M85.42 -83.3 C84.19 -65.56, 80.76 -42.9, 78.07 -16.48 M81.65 -13.13 C55.9 -16.93, 35.32 -13.22, 1.09 -8.34 M78.72 -14.99 C55.21 -14.55, 33.55 -15.14, 1.98 -8.14 M0 -8.18 C0 -8.18, 0 -8.18, 0 -8.18 M0 -8.18 C0 -8.18, 0 -8.18, 0 -8.18" stroke="transparent" stroke-width="1" fill="none"></path></g></g><g stroke-linecap="round"><g transform="translate(104.40120730265056 21.034223086383463) rotate(0 -46.43324311682974 36.885851424860505)"><path d="M-9.61 -0.25 C-28.67 3.39, -53.55 2.8, -86.14 4.19 M-8.02 -0.71 C-36.35 0.99, -63.74 2.79, -86.4 3.37 M-85.91 5.78 C-84.74 25.27, -84.02 48.38, -84.66 73.9 M-88.19 3.96 C-87.82 21.88, -87.78 38.36, -85.17 74 M-87.22 74.48 C-68.49 72.13, -46.92 70.49, -6.96 71.07 M-84.84 72.51 C-57.62 73.88, -28.98 71.83, -4.67 70.81 M-4.88 70.43 C-6.67 41.96, -8.5 14.45, -8.34 1.17 M-4.8 69.91 C-6.4 54.87, -6.47 37.84, -7.93 -0.48" stroke="#000" stroke-width="4" fill="none"></path></g></g><g stroke-linecap="round"><g transform="translate(36.04266212705488 42.52137844133961) rotate(0 17.1304131627725 6.13138926161713)"><path d="M0.52 -1.09 C5.99 0.95, 27.36 11.25, 32.8 13.36 M-0.66 0.94 C5.16 3.19, 28.87 9.83, 34.92 11.73" stroke="#ced4da" stroke-width="4" 
fill="none"></path></g></g><g stroke-linecap="round"><g transform="translate(40.47640031284823 70.78754917499106) rotate(0 14.673419521024726 -7.19565281114194)"><path d="M-0.01 -0.88 C4.69 -3.45, 23.52 -13.44, 28.47 -15.66 M-1.47 1.27 C3.59 -1.11, 25.7 -11.47, 30.82 -14.34" stroke="#ced4da" stroke-width="4" fill="none"></path></g></g><g stroke-linecap="round"><g transform="translate(64.73291211726252 72.57512139647838) rotate(0 10.217923741740321 -1.1320545942063802)"><path d="M-0.48 0.17 C3.07 -0.23, 17.17 -2.01, 20.91 -2.43 M1.47 -0.79 C4.86 -1.07, 16.39 -1.33, 19.75 -1.46" stroke="#ced4da" stroke-width="4" fill="none"></path></g></g></svg>
<div style="flex-grow:1; margin-left: 1em;" markdown="1">
This article is also available as an <strong><a href="https://www.educative.io/courses/simple-anomaly-detection-sql" rel="noopener">interactive course on educative ⟫</a></strong>
</div>
</div></p>
</div>
<div class="admonition tip">
<p class="admonition-title">Interactive Editor</p>
<p><div style="display:flex;align-items:center;">
<svg xmlns="http://www.w3.org/2000/svg" style="height:6em;min-height:3em" viewBox="0 0 110.7 144.7"><g stroke-opacity=".6" fill-opacity=".6" fill-rule="evenodd"><path d="M10 130l25-41 10-5 6 1 14 2 8 8 17 17 6 12 4 1-22 7-66 1" fill="#f41d92"/><path d="M15 134c4-8 14-41 22-49 9-8 22-6 31 2 10 7 37 37 28 44-10 8-72 1-86 1m3 0c5-7 21-37 30-44s15-5 24 2c9 6 38 30 29 37s-68 5-82 6" stroke="transparent" fill="none"/></g><path d="M41 48c2 10 2 16 3 38m-2-37c1 11 0 23 2 35m-2 1l-30 39m30-40c-9 13-17 24-31 40m1 1c34-5 68-5 88-3m-88 2c32-3 65-2 87-3m2-2L65 87m35 33c-11-8-20-19-36-35m2 0l-3-39m2 40l-1-38m3-3l-25 6m22-5l-22 3" stroke="#000" stroke-width="4" fill="none"/><path d="M42 30l4 2 1 3v4l-3 2h-4l-3-2v-4l2-3 3-2c1 0 0 0 0 0m0 0l4 2 1 3v4l-3 2h-4l-3-2v-4l2-3 3-2c1 0 0 0 0 0" fill="#ced4da"/><path d="M42 30l4 2 1 3v4l-3 2h-4l-3-2v-4l2-3 3-2c1 0 0 0 0 0m0 0l4 2 1 3v4l-3 2h-4l-3-2v-4l2-3 3-2c1 0 0 0 0 0" stroke="transparent" fill="none"/><path d="M61 23l3-2 4 1 1 4c0 2-2 3-2 3l-2 3-5-1-2-2 1-5 3-2h1m2 1l4 1-1 2-2 4c-1 1-2 3-4 3l-2-3-2-3 2-5 3-1c1-1 3 1 3 1l-1-2" fill="#ced4da"/><path d="M62 22h3l4 2v4l-1 2-4 2-5-2-1-3v-4l4-1 1-1m0 0l2 1 3 4 1 1-2 5c-1 1-2 2-3 1-2 0-3-3-4-4l-3-2 2-4h3l1-2" stroke="transparent" fill="none"/><g><path d="M49 10l3 3v5c0 2-1 3-2 3l-3 2-5-2-2-4 2-4 3-3 5 1h1m-6-1l2-1 4 2-1 7-1 2-2 2-4 1-3-4 1-4 3-5h-1" fill="#ced4da"/><path d="M47 9l4 3v7l-4 3-4-1-1-2v-5l1-2 4-1c1 0 0 0 0 0m1-1l4 3-2 2c0 1 2 4 1 6l-5 2-3-1-2-4-1-3 2-4h5l2-1" stroke="transparent" fill="none"/></g></svg>
<div style="flex-grow:1; margin-left: 1em;" markdown="1">
To follow along with the article and experiment with actual data online check out the <strong><a href="https://popsql.com/queries/-MECQV6GiKr04WdCWM0K/simple-anomaly-detection-with-sql?access_token=2d2c0729f9a1cfa7b6a2dbb5b0adb45c" rel="noopener">interactive editor on PopSQL ⟫</a></strong>
</div>
</div></p>
</div>
<hr>
<h2 id="detecting-anomalies"><a class="toclink" href="#detecting-anomalies">Detecting Anomalies</a></h2>
<p>An anomaly in a data series is a significant deviation from some reasonable value. Looking at this series of numbers for example, which number stands out?</p>
<div class="highlight"><pre><span></span>2, 3, 5, 2, 3, 12, 5, 3, 4
</pre></div>
<p>The number that stands out in this series is 12.</p>
<figure><img alt="Scatter plot" src="https://hakibenita.com/images/00-sql-anomaly-detection-scatter-plot.png"><figcaption>Scatter plot</figcaption>
</figure>
<p>This is intuitive to a human, but computer programs don't have intuition...</p>
<p>To find the anomaly in the series we first need to define what a reasonable value is, and then define how far away from this value we consider a significant deviation. A good place to start looking for a reasonable value is the mean:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="n">avg</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">unnest</span><span class="p">(</span><span class="k">array</span><span class="p">[</span><span class="mf">2</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="mf">5</span><span class="p">,</span><span class="w"> </span><span class="mf">2</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="mf">12</span><span class="p">,</span><span class="w"> </span><span class="mf">5</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="mf">4</span><span class="p">])</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">n</span><span class="p">;</span>
<span class="go"> avg</span>
<span class="go">────────────────────</span>
<span class="go">4.3333333333333333</span>
</pre></div>
<p>The mean is ~4.33.</p>
<p>Next, we need to define the deviation. Let's use <a href="https://en.wikipedia.org/wiki/Standard_deviation" rel="noopener">Standard Deviation</a>:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="n">stddev</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">unnest</span><span class="p">(</span><span class="k">array</span><span class="p">[</span><span class="mf">2</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="mf">5</span><span class="p">,</span><span class="w"> </span><span class="mf">2</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="mf">12</span><span class="p">,</span><span class="w"> </span><span class="mf">5</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="mf">4</span><span class="p">])</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">n</span><span class="p">;</span>
<span class="go"> stddev</span>
<span class="go">────────────────────</span>
<span class="go">3.0822070014844882</span>
</pre></div>
<p>Standard deviation is the square root of the <a href="https://en.wikipedia.org/wiki/Variance" rel="noopener">variance</a>, which is the average squared distance from the mean. In this case it's 3.08.</p>
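<p>The same number can be reproduced by hand. The sketch below is in Python rather than SQL, purely as a cross-check, and the variable names are made up for the illustration. It relies on the fact that PostgreSQL's <code>stddev</code> is an alias for <code>stddev_samp</code>, the sample standard deviation, so the sum of squared distances is divided by <code>n - 1</code> rather than <code>n</code>.</p>

```python
import math

series = [2, 3, 5, 2, 3, 12, 5, 3, 4]

mean = sum(series) / len(series)

# Sum of squared distances from the mean.
squared_distances = sum((n - mean) ** 2 for n in series)

# Sample variance: divide by n - 1 rather than n,
# matching PostgreSQL's stddev() / stddev_samp().
sample_variance = squared_distances / (len(series) - 1)
sample_stddev = math.sqrt(sample_variance)

print(round(mean, 2), round(sample_variance, 2), round(sample_stddev, 2))
```

<p>For this series the squared distances add up to 76, so the sample variance is 76 / 8 = 9.5, and its square root is the ~3.08 reported by <code>stddev</code>.</p>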
<p>Now that we've defined a "reasonable" value and a deviation, we can define a <em>range</em> of acceptable values:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">avg</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">stddev</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">lower_bound</span><span class="p">,</span>
<span class="w"> </span><span class="n">avg</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">stddev</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">upper_bound</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">unnest</span><span class="p">(</span><span class="k">array</span><span class="p">[</span><span class="mf">2</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="mf">5</span><span class="p">,</span><span class="w"> </span><span class="mf">2</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="mf">12</span><span class="p">,</span><span class="w"> </span><span class="mf">5</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="mf">4</span><span class="p">])</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">n</span><span class="p">;</span>
<span class="go"> lower_bound │ upper_bound</span>
<span class="go">───────────────────┼───────────────────</span>
<span class="go">1.2511263318488451 │ 7.4155403348178215</span>
</pre></div>
<p>The range we defined is one standard deviation from the mean. Any value outside this range is considered an anomaly:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">series</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">unnest</span><span class="p">(</span><span class="k">array</span><span class="p">[</span><span class="mf">2</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="mf">5</span><span class="p">,</span><span class="w"> </span><span class="mf">2</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="mf">12</span><span class="p">,</span><span class="w"> </span><span class="mf">5</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="mf">4</span><span class="p">])</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">n</span>
<span class="p">),</span>
<span class="n">bounds</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">avg</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">stddev</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">lower_bound</span><span class="p">,</span>
<span class="w"> </span><span class="n">avg</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">stddev</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">upper_bound</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">series</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="n">n</span><span class="p">,</span>
<span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="n">lower_bound</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">upper_bound</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">is_anomaly</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">series</span><span class="p">,</span>
<span class="w"> </span><span class="n">bounds</span><span class="p">;</span>
<span class="go">n │ is_anomaly</span>
<span class="go">───┼───────────</span>
<span class="go"> 2 │ f</span>
<span class="go"> 3 │ f</span>
<span class="go"> 5 │ f</span>
<span class="go"> 2 │ f</span>
<span class="go"> 3 │ f</span>
<span class="hll"><span class="go">12 │ t</span>
</span><span class="go"> 5 │ f</span>
<span class="go"> 3 │ f</span>
<span class="go"> 4 │ f</span>
</pre></div>
<p>Using the query we found that the value 12 is outside the range of acceptable values, and identified it as an anomaly.</p>
<h3 id="understanding-z-score"><a class="toclink" href="#understanding-z-score">Understanding Z-Score</a></h3>
<p>Another way to represent a range of acceptable values is using a z-score. The <a href="https://en.wikipedia.org/wiki/Standard_score" rel="noopener">z-score, or standard score</a>, is the number of standard deviations a value lies from the mean. In the previous section, our acceptable range was one standard deviation from the mean, or in other words, a z-score in the range ±1:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">series</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">unnest</span><span class="p">(</span><span class="k">array</span><span class="p">[</span><span class="mf">2</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="mf">5</span><span class="p">,</span><span class="w"> </span><span class="mf">2</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="mf">12</span><span class="p">,</span><span class="w"> </span><span class="mf">5</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="mf">4</span><span class="p">])</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">n</span>
<span class="p">),</span>
<span class="n">stats</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">avg</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="n">series_mean</span><span class="p">,</span>
<span class="w"> </span><span class="n">stddev</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">series_stddev</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">series</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="n">n</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">series_mean</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">series_stddev</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">zscore</span>
</span><span class="k">FROM</span>
<span class="w"> </span><span class="n">series</span><span class="p">,</span>
<span class="w"> </span><span class="n">stats</span><span class="p">;</span>
<span class="go">n │ zscore</span>
<span class="go">───┼────────────────────────</span>
<span class="go"> 2 │ -0.75703329861022517346</span>
<span class="go"> 3 │ -0.43259045634870009448</span>
<span class="go"> 5 │ 0.21629522817435006346</span>
<span class="go"> 2 │ -0.75703329861022517346</span>
<span class="go"> 3 │ -0.43259045634870009448</span>
<span class="go">12 │ 2.4873951240050256</span>
<span class="go"> 5 │ 0.21629522817435006346</span>
<span class="go"> 3 │ -0.43259045634870009448</span>
<span class="go"> 4 │ -0.10814761408717501551</span>
</pre></div>
<p>Like before, we can detect anomalies by searching for values which are outside the acceptable range using the z-score:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">series</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">unnest</span><span class="p">(</span><span class="k">array</span><span class="p">[</span><span class="mf">2</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="mf">5</span><span class="p">,</span><span class="w"> </span><span class="mf">2</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="mf">12</span><span class="p">,</span><span class="w"> </span><span class="mf">5</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="mf">4</span><span class="p">])</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">n</span>
<span class="p">),</span>
<span class="n">stats</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">avg</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="n">series_avg</span><span class="p">,</span>
<span class="w"> </span><span class="n">stddev</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">series_stddev</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">series</span>
<span class="p">),</span>
<span class="n">zscores</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">n</span><span class="p">,</span>
<span class="w"> </span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">series_avg</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">series_stddev</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">zscore</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">series</span><span class="p">,</span>
<span class="w"> </span><span class="n">stats</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span><span class="p">,</span>
<span class="w"> </span><span class="n">zscore</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="o">-</span><span class="mf">1</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">is_anomaly</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">zscores</span><span class="p">;</span>
<span class="go">n │ zscore │ is_anomaly</span>
<span class="go">───┼────────────────────────┼───────────</span>
<span class="go"> 2 │ -0.75703329861022517346 │ f</span>
<span class="go"> 3 │ -0.43259045634870009448 │ f</span>
<span class="go"> 5 │ 0.21629522817435006346 │ f</span>
<span class="go"> 2 │ -0.75703329861022517346 │ f</span>
<span class="go"> 3 │ -0.43259045634870009448 │ f</span>
<span class="hll"><span class="go">12 │ 2.4873951240050256 │ t</span>
</span><span class="go"> 5 │ 0.21629522817435006346 │ f</span>
<span class="go"> 3 │ -0.43259045634870009448 │ f</span>
<span class="go"> 4 │ -0.10814761408717501551 │ f</span>
</pre></div>
<p>Using z-score, we also identified 12 as an anomaly in this series.</p>
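<p>The same calculation can be sketched more compactly with window functions instead of a separate <code>stats</code> CTE. This is an alternative formulation under the same assumptions (PostgreSQL, the same sample series), not a replacement for the query above. An empty <code>OVER ()</code> clause makes <code>avg</code> and <code>stddev</code> run over the entire series, and <code>abs(zscore) > 1</code> is equivalent to <code>zscore NOT BETWEEN -1 AND 1</code>:</p>

```sql
WITH zscores AS (
    SELECT
        n,
        -- avg() and stddev() are evaluated over all rows of the series
        (n - avg(n) OVER ()) / stddev(n) OVER () AS zscore
    FROM
        unnest(array[2, 3, 5, 2, 3, 12, 5, 3, 4]) AS n
)
SELECT
    *,
    abs(zscore) > 1 AS is_anomaly
FROM
    zscores;
```

<p>As before, only the value 12 should be flagged, since it is the only value more than one standard deviation away from the mean.</p>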
<h3 id="optimizing-z-score"><a class="toclink" href="#optimizing-z-score">Optimizing Z-Score</a></h3>
<p>So far we used one standard deviation from the mean, or a z-score of ±1, to identify anomalies. Changing the z-score threshold can affect our results. For example, let's see what anomalies we identify when the z-score is greater than 0.5 and when it's greater than 3:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">series</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">unnest</span><span class="p">(</span><span class="k">array</span><span class="p">[</span><span class="mf">2</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="mf">5</span><span class="p">,</span><span class="w"> </span><span class="mf">2</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="mf">12</span><span class="p">,</span><span class="w"> </span><span class="mf">5</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="mf">4</span><span class="p">])</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">n</span>
<span class="p">),</span>
<span class="n">stats</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">avg</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="n">series_avg</span><span class="p">,</span>
<span class="w"> </span><span class="n">stddev</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">series_stddev</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">series</span>
<span class="p">),</span>
<span class="n">zscores</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">n</span><span class="p">,</span>
<span class="w"> </span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">series_avg</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">series_stddev</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">zscore</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">series</span><span class="p">,</span>
<span class="w"> </span><span class="n">stats</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="n">zscore</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="o">-</span><span class="mf">0.5</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="mf">0.5</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">is_anomaly_0_5</span><span class="p">,</span>
</span><span class="hll"><span class="w"> </span><span class="n">zscore</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="o">-</span><span class="mf">1</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">is_anomaly_1</span><span class="p">,</span>
</span><span class="hll"><span class="w"> </span><span class="n">zscore</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="o">-</span><span class="mf">3</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="mf">3</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">is_anomaly_3</span>
</span><span class="k">FROM</span>
<span class="w"> </span><span class="n">zscores</span><span class="p">;</span>
<span class="go">n │ zscore │ is_anomaly_0_5 │ is_anomaly_1 │ is_anomaly_3</span>
<span class="go">───┼────────────────────────┼───────────────┼─────────────┼─────────────</span>
<span class="go"> 2 │ -0.75703329861022517346 │ t │ f │ f</span>
<span class="go"> 3 │ -0.43259045634870009448 │ f │ f │ f</span>
<span class="go"> 5 │ 0.21629522817435006346 │ f │ f │ f</span>
<span class="go"> 2 │ -0.75703329861022517346 │ t │ f │ f</span>
<span class="go"> 3 │ -0.43259045634870009448 │ f │ f │ f</span>
<span class="go">12 │ 2.4873951240050256 │ t │ t │ f</span>
<span class="go"> 5 │ 0.21629522817435006346 │ f │ f │ f</span>
<span class="go"> 3 │ -0.43259045634870009448 │ f │ f │ f</span>
<span class="go"> 4 │ -0.10814761408717501551 │ f │ f │ f</span>
</pre></div>
<p>Let's see what we got:</p>
<ul>
<li>When we decreased the z-score threshold to 0.5, we identified the value 2 as an anomaly in addition to the value 12.</li>
<li>When we increased the z-score threshold to 3 we did not identify any anomaly.</li>
</ul>
<p>The quality of our results is directly related to the parameters we set for the query. Later we'll see how using backtesting can help us identify ideal values.</p>
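<p>To compare thresholds side by side, one option is to treat the threshold as data and count the anomalies for each candidate value. The sketch below assumes PostgreSQL (the <code>FILTER</code> clause) and uses a hypothetical <code>thresholds</code> CTE holding the three values we just tried:</p>

```sql
WITH series AS (
    SELECT * FROM unnest(array[2, 3, 5, 2, 3, 12, 5, 3, 4]) AS n
),
stats AS (
    SELECT
        avg(n) AS series_avg,
        stddev(n) AS series_stddev
    FROM
        series
),
zscores AS (
    SELECT
        n,
        (n - series_avg) / series_stddev AS zscore
    FROM
        series,
        stats
),
-- Hypothetical list of candidate thresholds to compare
thresholds AS (
    SELECT * FROM (VALUES (0.5), (1), (3)) AS t(threshold)
)
SELECT
    threshold,
    -- Count only the values whose z-score falls outside ±threshold
    count(*) FILTER (WHERE abs(zscore) > threshold) AS anomalies,
    array_agg(n ORDER BY n) FILTER (WHERE abs(zscore) > threshold) AS anomalous_values
FROM
    zscores,
    thresholds
GROUP BY
    threshold
ORDER BY
    threshold;
```

<p>For this series, the query should report three anomalies for a threshold of 0.5 (the two 2s and the 12), one anomaly for a threshold of 1, and none for a threshold of 3, in which case <code>anomalous_values</code> is <code>NULL</code>.</p>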
<hr>
<h2 id="analyzing-a-server-log"><a class="toclink" href="#analyzing-a-server-log">Analyzing a Server Log</a></h2>
<p>Web servers such as nginx, Apache and IIS write a lot of useful information to access logs. The data in these logs can be extremely useful in identifying anomalies.</p>
<p>We are going to analyze logs of a web application, so the data we are most interested in is the timestamp and the status code of every response from the server. To illustrate the type of insight we can draw from just this data:</p>
<ul>
<li><strong>A sudden increase in 500 status codes</strong>: You may have a problem in the server. Did you just push a new version? Is there an external service you're using that started failing in unexpected ways?</li>
<li><strong>A sudden increase in 400 status codes</strong>: You may have a problem in the client. Did you change some validation logic and forget to update the client? Did you make a change and forget to handle backward compatibility?</li>
<li><strong>A sudden increase in 404 status codes</strong>: You may have an SEO problem. Did you move some pages and forget to set up redirects? Is there some script kiddie running a scan on your site?</li>
<li><strong>A sudden increase in 200 status codes</strong>: You either have some significant legit traffic coming in, or you are under a DoS attack. Either way, you probably want to check where it's coming from.</li>
</ul>
<h3 id="preparing-the-data"><a class="toclink" href="#preparing-the-data">Preparing the Data</a></h3>
<p>Parsing and processing logs is outside the scope of this article, so let's assume we did that and we have a table that looks like this:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">server_log_summary</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">period</span><span class="w"> </span><span class="n">timestamptz</span><span class="p">,</span>
<span class="w"> </span><span class="n">status_code</span><span class="w"> </span><span class="nb">int</span><span class="p">,</span>
<span class="w"> </span><span class="n">entries</span><span class="w"> </span><span class="nb">int</span>
<span class="p">);</span>
</pre></div>
<p>The table stores the number of entries for each status code at a given period. For example, our table stores how many responses returned each status code every minute:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">server_log_summary</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">period</span><span class="w"> </span><span class="k">DESC</span><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mf">10</span><span class="p">;</span>
<span class="go"> period │ status_code │ entries</span>
<span class="go">───────────────────────┼────────────┼────────</span>
<span class="go">2020-08-01 18:00:00+00 │ 200 │ 4084</span>
<span class="go">2020-08-01 18:00:00+00 │ 404 │ 0</span>
<span class="go">2020-08-01 18:00:00+00 │ 400 │ 24</span>
<span class="go">2020-08-01 18:00:00+00 │ 500 │ 0</span>
<span class="go">2020-08-01 17:59:00+00 │ 400 │ 12</span>
<span class="go">2020-08-01 17:59:00+00 │ 200 │ 3927</span>
<span class="go">2020-08-01 17:59:00+00 │ 500 │ 0</span>
<span class="go">2020-08-01 17:59:00+00 │ 404 │ 0</span>
<span class="go">2020-08-01 17:58:00+00 │ 400 │ 2</span>
<span class="go">2020-08-01 17:58:00+00 │ 200 │ 3850</span>
</pre></div>
<p>Note that the table has a row for every minute, even if the status code was never returned in that minute. Given a table of statuses, it's very tempting to do something like this:</p>
<div class="highlight"><pre><span></span><span class="c1">-- Wrong!</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="n">date_trunc</span><span class="p">(</span><span class="s1">'minute'</span><span class="p">,</span><span class="w"> </span><span class="k">timestamp</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="k">period</span><span class="p">,</span>
<span class="w"> </span><span class="n">status_code</span><span class="p">,</span>
<span class="w"> </span><span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">entries</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">server_log</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="k">period</span><span class="p">,</span>
<span class="w"> </span><span class="n">status_code</span><span class="p">;</span>
</pre></div>
<p>This is a common mistake and it can leave you with gaps in the data. Zero is a value, and it holds a significant meaning. A better approach is to create an "axis", and join to it:</p>
<div class="highlight"><pre><span></span><span class="c1">-- Correct!</span>
<span class="k">WITH</span><span class="w"> </span><span class="n">axis</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">status_code</span><span class="p">,</span>
<span class="w"> </span><span class="n">generate_series</span><span class="p">(</span>
<span class="w"> </span><span class="n">date_trunc</span><span class="p">(</span><span class="s1">'minute'</span><span class="p">,</span><span class="w"> </span><span class="n">now</span><span class="p">()),</span>
<span class="w"> </span><span class="n">date_trunc</span><span class="p">(</span><span class="s1">'minute'</span><span class="p">,</span><span class="w"> </span><span class="n">now</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nb">interval</span><span class="w"> </span><span class="s1">'1 hour'</span><span class="p">),</span>
<span class="w"> </span><span class="nb">interval</span><span class="w"> </span><span class="s1">'1 minute'</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="o">-</span><span class="mi">1</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="k">period</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mi">200</span><span class="p">),</span><span class="w"> </span><span class="p">(</span><span class="mi">400</span><span class="p">),</span><span class="w"> </span><span class="p">(</span><span class="mi">404</span><span class="p">),</span><span class="w"> </span><span class="p">(</span><span class="mi">500</span><span class="p">)</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="n">status_code</span><span class="p">)</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="n">a</span><span class="p">.</span><span class="k">period</span><span class="p">,</span>
<span class="w"> </span><span class="n">a</span><span class="p">.</span><span class="n">status_code</span><span class="p">,</span>
<span class="w"> </span><span class="k">count</span><span class="p">(</span><span class="n">l</span><span class="p">.</span><span class="n">status_code</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">entries</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">axis</span><span class="w"> </span><span class="n">a</span>
<span class="w"> </span><span class="k">LEFT</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">server_log</span><span class="w"> </span><span class="n">l</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">date_trunc</span><span class="p">(</span><span class="s1">'minute'</span><span class="p">,</span><span class="w"> </span><span class="n">l</span><span class="p">.</span><span class="k">timestamp</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">a</span><span class="p">.</span><span class="k">period</span>
<span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">l</span><span class="p">.</span><span class="n">status_code</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">a</span><span class="p">.</span><span class="n">status_code</span>
<span class="w"> </span><span class="p">)</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">a</span><span class="p">.</span><span class="k">period</span><span class="p">,</span>
<span class="w"> </span><span class="n">a</span><span class="p">.</span><span class="n">status_code</span><span class="p">;</span>
</pre></div>
<p>First we generate an axis using a Cartesian join between the status codes we want to track and the times we want to monitor. To generate the axis we used two nice features of PostgreSQL:</p>
<ul>
<li><a href="https://www.postgresql.org/docs/current/functions-srf.html" rel="noopener"><code>generate_series</code></a>: a function that generates a series of values.</li>
<li><a href="https://www.postgresql.org/docs/current/queries-values.html" rel="noopener"><code>VALUES</code> list</a>: a special clause that can generate "constant tables", as the documentation calls them. You might be familiar with the <code>VALUES</code> clause from <code>INSERT</code> statements. In the old days, we had to chain a bunch of <code>SELECT ... UNION ALL</code> statements to generate data; using <code>VALUES</code> is much nicer.</li>
</ul>
<p>After generating the axis, we left join the actual data into it to get a complete series for each status code. The resulting data has no gaps, and is ready for analysis.</p>
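<p>To see the gap filling at work on a small scale, here is a self-contained sketch. The three log rows and the <code>logged_at</code> column name are made up for the illustration, and the axis covers only two minutes and two status codes. The query also shows why the choice of count matters after a <code>LEFT JOIN</code>: <code>count(l.status_code)</code> ignores the NULL-extended row produced for an empty minute, while <code>count(*)</code> counts it:</p>

```sql
WITH server_log (logged_at, status_code) AS (
    -- Hypothetical raw log entries
    VALUES
        ('2020-08-01 17:58:10 UTC'::timestamptz, 200),
        ('2020-08-01 17:58:40 UTC'::timestamptz, 200),
        ('2020-08-01 17:59:05 UTC'::timestamptz, 500)
),
axis AS (
    -- Every combination of status code and minute we want to report on
    SELECT
        status_code,
        period
    FROM
        (VALUES (200), (500)) AS t(status_code)
        CROSS JOIN generate_series(
            '2020-08-01 17:58 UTC'::timestamptz,
            '2020-08-01 17:59 UTC'::timestamptz,
            interval '1 minute'
        ) AS period
)
SELECT
    a.period,
    a.status_code,
    count(l.status_code) AS entries,  -- 0 when no log row matched
    count(*) AS joined_rows           -- 1 even when no log row matched
FROM
    axis a
    LEFT JOIN server_log l ON (
        date_trunc('minute', l.logged_at) = a.period
        AND l.status_code = a.status_code
    )
GROUP BY
    a.period,
    a.status_code
ORDER BY
    a.period,
    a.status_code;
```

<p>The result should have four rows, one per combination of minute and status code. The combinations with no log entries (status 500 at 17:58 and status 200 at 17:59) get <code>entries = 0</code> but <code>joined_rows = 1</code>.</p>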
<h3 id="getting-a-sense-of-the-data"><a class="toclink" href="#getting-a-sense-of-the-data">Getting a Sense of the Data</a></h3>
<p>To get a sense of the data, let's draw a stacked bar chart by status:</p>
<figure><img alt="stacked bar chart by status, over time" src="https://hakibenita.com/images/00-sql-anomaly-detection-chart-by-status-over-time.png"><figcaption>stacked bar chart by status, over time</figcaption>
</figure>
<p>The chart shows a period of 12 hours. It looks like we have a nice trend with two peaks at around 09:30 and again at 18:00.</p>
<p>We also spot right away that at ~11:30 there was a significant increase in 500 errors. The burst died down after around 10 minutes. This is the type of anomaly we want to identify early on.</p>
<p>It's entirely possible that there were other problems during that time that we just can't spot with the naked eye.</p>
<h3 id="identifying-anomalies"><a class="toclink" href="#identifying-anomalies">Identifying Anomalies</a></h3>
<p>In anomaly detection systems, we usually want to identify if we have an anomaly <em>right now</em>, and send an alert.</p>
<p>To identify if the last datapoint is an anomaly, we start by calculating the mean and standard deviation for each status code in the past hour:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="n">stats</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">status_code</span><span class="p">,</span>
<span class="w"> </span><span class="p">(</span><span class="n">MAX</span><span class="p">(</span><span class="k">ARRAY</span><span class="p">[</span><span class="k">EXTRACT</span><span class="p">(</span><span class="s1">'epoch'</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">period</span><span class="p">),</span><span class="w"> </span><span class="n">entries</span><span class="p">]))[</span><span class="mf">2</span><span class="p">]</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">last_value</span><span class="p">,</span>
<span class="w"> </span><span class="n">AVG</span><span class="p">(</span><span class="n">entries</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">mean_entries</span><span class="p">,</span>
<span class="w"> </span><span class="n">STDDEV</span><span class="p">(</span><span class="n">entries</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">stddev_entries</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">server_log_summary</span>
<span class="w"> </span><span class="k">WHERE</span>
<span class="w"> </span><span class="c1">-- In the demo data use:</span>
<span class="w"> </span><span class="c1">-- period > '2020-08-01 17:00 UTC'::timestamptz</span>
<span class="w"> </span><span class="n">period</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">now</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nb">interval</span><span class="w"> </span><span class="s1">'1 hour'</span>
<span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">status_code</span>
<span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">stats</span><span class="p">;</span>
<span class="go">status_code │ last_value │ mean_entries │ stddev_entries</span>
<span class="go">─────────────┼────────────┼────────────────────────┼───────────────────────</span>
<span class="go"> 404 │ 0 │ 0.13333333333333333333 │ 0.34280333180088158345</span>
<span class="go"> 500 │ 0 │ 0.15000000000000000000 │ 0.36008473579027553993</span>
<span class="go"> 200 │ 4084 │ 2779.1000000000000000 │ 689.219644702665</span>
<span class="go"> 400 │ 24 │ 0.73333333333333333333 │ 3.4388935285299212</span>
</pre></div>
<p>To get the last value in a GROUP BY in addition to the mean and standard deviation <a href="/sql-group-by-first-last-value">we used a little array trick</a>.</p>
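<p>The trick works because PostgreSQL compares arrays element by element, so the maximum of <code>[epoch, entries]</code> pairs is the pair with the latest period. As a rough illustration outside the database, here is a minimal Python sketch of the same idea. The variable names are hypothetical, and the three rows are taken from the demo data for status code 400:</p>
<div class="highlight"><pre><span></span>from datetime import datetime, timezone

# (period, entries) for status code 400, deliberately out of order.
rows = [
    (datetime(2020, 8, 1, 17, 59, tzinfo=timezone.utc), 12),
    (datetime(2020, 8, 1, 18, 0, tzinfo=timezone.utc), 24),
    (datetime(2020, 8, 1, 17, 58, tzinfo=timezone.utc), 2),
]

# Python lists compare element by element, much like PostgreSQL arrays,
# so the maximum pair is the one with the largest epoch (the latest period).
latest_pair = max([period.timestamp(), entries] for period, entries in rows)

# Python is 0-based, so index 1 here plays the role of [2] in the SQL query.
last_value = latest_pair[1]
print(last_value)
</pre></div>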
<p>Next, we calculate the z-score for the last value for each status code:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="n">stats</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">status_code</span><span class="p">,</span>
<span class="w"> </span><span class="p">(</span><span class="n">MAX</span><span class="p">(</span><span class="k">ARRAY</span><span class="p">[</span><span class="k">EXTRACT</span><span class="p">(</span><span class="s1">'epoch'</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">period</span><span class="p">),</span><span class="w"> </span><span class="n">entries</span><span class="p">]))[</span><span class="mf">2</span><span class="p">]</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">last_value</span><span class="p">,</span>
<span class="w"> </span><span class="n">AVG</span><span class="p">(</span><span class="n">entries</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">mean_entries</span><span class="p">,</span>
<span class="w"> </span><span class="n">STDDEV</span><span class="p">(</span><span class="n">entries</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">stddev_entries</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">server_log_summary</span>
<span class="w"> </span><span class="k">WHERE</span>
<span class="w"> </span><span class="c1">-- In the demo data use:</span>
<span class="w"> </span><span class="c1">-- period > '2020-08-01 17:00 UTC'::timestamptz</span>
<span class="w"> </span><span class="n">period</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">now</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nb">interval</span><span class="w"> </span><span class="s1">'1 hour'</span>
<span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">status_code</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="p">(</span><span class="n">last_value</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">mean_entries</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="k">NULLIF</span><span class="p">(</span><span class="n">stddev_entries</span><span class="o">::</span><span class="k">float</span><span class="p">,</span><span class="w"> </span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">zscore</span>
</span><span class="k">FROM</span>
<span class="w"> </span><span class="n">stats</span><span class="p">;</span>
<span class="go">status_code │ last_value │ mean_entries │ stddev_entries │ zscore</span>
<span class="go">─────────────┼────────────┼──────────────┼────────────────┼───────</span>
<span class="go"> 404 │ 0 │ 0.133 │ 0.3428 │ -0.388</span>
<span class="go"> 500 │ 0 │ 0.150 │ 0.3600 │ -0.416</span>
<span class="go"> 200 │ 4084 │ 2779.100 │ 689.2196 │ 1.893</span>
<span class="hll"><span class="go"> 400 │ 24 │ 0.733 │ 3.4388 │ 6.765</span>
</span></pre></div>
<p>We calculated the z-score by finding the number of standard deviations between the last value and the mean. To <a href="/sql-dos-and-donts#guard-against-division-by-zero-errors">avoid a "division by zero" error</a> we transform the denominator to NULL if it's zero.</p>
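<p>To sanity-check the arithmetic outside the database, here is a minimal Python sketch of the same calculation. The function name <code>zscore</code> is hypothetical, and the inputs are the values for status code 400 copied from the first query's output. Returning <code>None</code> for a zero standard deviation is meant to mirror what <code>NULLIF</code> does in the query:</p>
<div class="highlight"><pre><span></span>from typing import Optional


def zscore(last_value: float, mean: float, stddev: float) -> Optional[float]:
    """Number of standard deviations between last_value and the mean."""
    if stddev == 0:
        # Same idea as NULLIF(stddev, 0) in SQL: no score instead of an error.
        return None
    return (last_value - mean) / stddev


# Values for status code 400, copied from the query output.
score_400 = zscore(24, 0.73333333333333333333, 3.4388935285299212)
print(score_400)
</pre></div>
<p>The printed value should agree with the <code>zscore</code> column for status code 400 up to rounding.</p>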
<p>Looking at the z-scores we got, we can spot that status code 400 got a very high z-score of almost 6.8. In the past minute we returned a 400 status code 24 times, which is significantly higher than the average of 0.73 in the past hour.</p>
<p>Let's take a look at the raw data:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">server_log_summary</span>
<span class="k">WHERE</span><span class="w"> </span><span class="n">status_code</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">400</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">period</span><span class="w"> </span><span class="k">DESC</span>
<span class="k">LIMIT</span><span class="w"> </span><span class="mf">20</span><span class="p">;</span>
<span class="go"> period │ status_code │ entries</span>
<span class="go">────────────────────────┼─────────────┼────────</span>
<span class="go">2020-08-01 18:00:00+00 │ 400 │ 24</span>
<span class="go">2020-08-01 17:59:00+00 │ 400 │ 12</span>
<span class="go">2020-08-01 17:58:00+00 │ 400 │ 2</span>
<span class="go">2020-08-01 17:57:00+00 │ 400 │ 0</span>
<span class="go">2020-08-01 17:56:00+00 │ 400 │ 1</span>
<span class="go">2020-08-01 17:55:00+00 │ 400 │ 0</span>
<span class="go">2020-08-01 17:54:00+00 │ 400 │ 0</span>
<span class="go">2020-08-01 17:53:00+00 │ 400 │ 0</span>
<span class="go">2020-08-01 17:52:00+00 │ 400 │ 0</span>
<span class="go">2020-08-01 17:51:00+00 │ 400 │ 0</span>
<span class="go">2020-08-01 17:50:00+00 │ 400 │ 0</span>
<span class="go">2020-08-01 17:49:00+00 │ 400 │ 0</span>
<span class="go">2020-08-01 17:48:00+00 │ 400 │ 0</span>
<span class="go">2020-08-01 17:47:00+00 │ 400 │ 0</span>
<span class="go">2020-08-01 17:46:00+00 │ 400 │ 0</span>
<span class="go">2020-08-01 17:45:00+00 │ 400 │ 0</span>
<span class="go">2020-08-01 17:44:00+00 │ 400 │ 0</span>
<span class="go">2020-08-01 17:43:00+00 │ 400 │ 0</span>
<span class="go">2020-08-01 17:42:00+00 │ 400 │ 0</span>
<span class="go">2020-08-01 17:41:00+00 │ 400 │ 0</span>
</pre></div>
<p>It does look like in the last couple of minutes we are getting more errors than expected.</p>
<figure><img alt="Status 400 in the past hour" src="https://hakibenita.com/images/00-sql-anomaly-detection-400.png"><figcaption>Status 400 in the past hour</figcaption>
</figure>
<p>What our naked eye missed in the chart and in the raw data was found by the query and classified as an anomaly. We are off to a great start!</p>
<hr>
<h2 id="backtesting"><a class="toclink" href="#backtesting">Backtesting</a></h2>
<p>In the previous section we identified an anomaly. We found an increase in 400 status codes because the z-score was almost 6.8. But how do we set the threshold for the z-score? Is a z-score of 3 an anomaly? What about 2, or 1?</p>
<p>To find thresholds that fit our needs, we can run simulations on past data with different values, and evaluate the results. This is often called backtesting.</p>
<h3 id="finding-past-anomalies"><a class="toclink" href="#finding-past-anomalies">Finding Past Anomalies</a></h3>
<p>The first thing we need to do is to calculate the mean and the standard deviation for each status code up to and including every row, just as if it were the current value. This is a classic job for a <a href="https://www.postgresql.org/docs/current/tutorial-window.html" rel="noopener">window function</a>:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">calculations_over_window</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">status_code</span><span class="p">,</span>
<span class="w"> </span><span class="n">period</span><span class="p">,</span>
<span class="w"> </span><span class="n">entries</span><span class="p">,</span>
<span class="w"> </span><span class="n">AVG</span><span class="p">(</span><span class="n">entries</span><span class="p">)</span><span class="w"> </span><span class="k">OVER</span><span class="w"> </span><span class="n">status_window</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">mean_entries</span><span class="p">,</span>
<span class="w"> </span><span class="n">STDDEV</span><span class="p">(</span><span class="n">entries</span><span class="p">)</span><span class="w"> </span><span class="k">OVER</span><span class="w"> </span><span class="n">status_window</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">stddev_entries</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">server_log_summary</span>
<span class="w"> </span><span class="k">WINDOW</span><span class="w"> </span><span class="n">status_window</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">PARTITION</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">status_code</span>
<span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">period</span>
<span class="w"> </span><span class="k">ROWS</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="mf">60</span><span class="w"> </span><span class="k">PRECEDING</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="k">CURRENT</span><span class="w"> </span><span class="k">ROW</span>
<span class="w"> </span><span class="p">)</span>
<span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">calculations_over_window</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">period</span><span class="w"> </span><span class="k">DESC</span>
<span class="k">LIMIT</span><span class="w"> </span><span class="mf">20</span><span class="p">;</span>
<span class="go">status_code │ period │ entries │ mean_entries │ stddev_entries</span>
<span class="go">─────────────┼────────────────────────┼─────────┼────────────────────────┼───────────────────────</span>
<span class="go"> 200 │ 2020-08-01 18:00:00+00 │ 4084 │ 2759.9672131147540984 │ 699.597407256800</span>
<span class="go"> 400 │ 2020-08-01 18:00:00+00 │ 24 │ 0.72131147540983606557 │ 3.4114080550460080</span>
<span class="go"> 404 │ 2020-08-01 18:00:00+00 │ 0 │ 0.13114754098360655738 │ 0.34036303344446665347</span>
<span class="go"> 500 │ 2020-08-01 18:00:00+00 │ 0 │ 0.14754098360655737705 │ 0.35758754516763638735</span>
<span class="go"> 500 │ 2020-08-01 17:59:00+00 │ 0 │ 0.16393442622950819672 │ 0.37328844382740000274</span>
<span class="go"> 400 │ 2020-08-01 17:59:00+00 │ 12 │ 0.32786885245901639344 │ 1.5676023249473471</span>
<span class="go"> 200 │ 2020-08-01 17:59:00+00 │ 3927 │ 2718.6721311475409836 │ 694.466863171826</span>
<span class="go"> 404 │ 2020-08-01 17:59:00+00 │ 0 │ 0.13114754098360655738 │ 0.34036303344446665347</span>
<span class="go"> 500 │ 2020-08-01 17:58:00+00 │ 0 │ 0.16393442622950819672 │ 0.37328844382740000274</span>
<span class="go"> 404 │ 2020-08-01 17:58:00+00 │ 0 │ 0.13114754098360655738 │ 0.34036303344446665347</span>
<span class="go"> 200 │ 2020-08-01 17:58:00+00 │ 3850 │ 2680.4754098360655738 │ 690.967283512936</span>
<span class="go"> 400 │ 2020-08-01 17:58:00+00 │ 2 │ 0.13114754098360655738 │ 0.38623869286861001780</span>
<span class="go"> 404 │ 2020-08-01 17:57:00+00 │ 0 │ 0.13114754098360655738 │ 0.34036303344446665347</span>
<span class="go"> 400 │ 2020-08-01 17:57:00+00 │ 0 │ 0.09836065573770491803 │ 0.30027309973793774423</span>
<span class="go"> 500 │ 2020-08-01 17:57:00+00 │ 1 │ 0.16393442622950819672 │ 0.37328844382740000274</span>
<span class="go"> 200 │ 2020-08-01 17:57:00+00 │ 3702 │ 2643.0327868852459016 │ 688.414796645480</span>
<span class="go"> 200 │ 2020-08-01 17:56:00+00 │ 3739 │ 2607.5081967213114754 │ 688.769908918569</span>
<span class="go"> 404 │ 2020-08-01 17:56:00+00 │ 0 │ 0.14754098360655737705 │ 0.35758754516763638735</span>
<span class="go"> 400 │ 2020-08-01 17:56:00+00 │ 1 │ 0.11475409836065573770 │ 0.32137001808599097120</span>
<span class="go"> 500 │ 2020-08-01 17:56:00+00 │ 0 │ 0.14754098360655737705 │ 0.35758754516763638735</span>
</pre></div>
<p>To calculate the mean and standard deviation over a sliding window of 60 minutes, we use a <a href="https://www.postgresql.org/docs/current/tutorial-window.html" rel="noopener">window function</a>. To avoid repeating the window definition in the <code>OVER</code> clause of every aggregate, we define a <a href="https://www.postgresql.org/docs/current/sql-select.html#SQL-WINDOW" rel="noopener">named window</a> called "status_window". This is another nice feature of PostgreSQL.</p>
<p>In the results we can now see that for every entry, we have the mean and standard deviation computed over that row and the 60 rows before it. The frame <code>ROWS BETWEEN 60 PRECEDING AND CURRENT ROW</code> includes the current row, so it covers up to 61 rows. This is similar to the calculation we did in the previous section, only this time we do it for every row, which is also why the numbers for 18:00 differ slightly from the ones we got there.</p>
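<p>The sliding-window logic can also be sketched in plain Python, which may help when reasoning about what each row "sees". This is an illustration under stated assumptions rather than how PostgreSQL executes the query: the helper name <code>rolling_stats</code> and the sample values are made up, the frame is shortened to two preceding rows to keep the output small, and <code>statistics.stdev</code> is used because PostgreSQL's <code>STDDEV</code> is the sample standard deviation:</p>
<div class="highlight"><pre><span></span>from collections import deque
from statistics import mean, stdev


def rolling_stats(values, preceding=60):
    """Yield (value, mean, stddev) over the current value and up to `preceding` earlier ones."""
    # maxlen is preceding + 1 because the frame includes the current row.
    window = deque(maxlen=preceding + 1)
    for value in values:
        window.append(value)
        # The sample standard deviation needs at least two values.
        # PostgreSQL returns NULL in that case, so the sketch uses None.
        spread = stdev(window) if len(window) > 1 else None
        yield value, mean(window), spread


# Hypothetical per-minute counts for one status code, oldest first.
sample = [0, 0, 1, 0, 12, 24]
results = list(rolling_stats(sample, preceding=2))
for row in results:
    print(row)
</pre></div>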
<p>Now we can calculate the z-score for every row:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">calculations_over_window</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">status_code</span><span class="p">,</span>
<span class="w"> </span><span class="n">period</span><span class="p">,</span>
<span class="w"> </span><span class="n">entries</span><span class="p">,</span>
<span class="w"> </span><span class="n">AVG</span><span class="p">(</span><span class="n">entries</span><span class="p">)</span><span class="w"> </span><span class="k">OVER</span><span class="w"> </span><span class="n">status_window</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">mean_entries</span><span class="p">,</span>
<span class="w"> </span><span class="n">STDDEV</span><span class="p">(</span><span class="n">entries</span><span class="p">)</span><span class="w"> </span><span class="k">OVER</span><span class="w"> </span><span class="n">status_window</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">stddev_entries</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">server_log_summary</span>
<span class="w"> </span><span class="k">WINDOW</span><span class="w"> </span><span class="n">status_window</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">PARTITION</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">status_code</span>
<span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">period</span>
<span class="w"> </span><span class="k">ROWS</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="mf">60</span><span class="w"> </span><span class="k">PRECEDING</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="k">CURRENT</span><span class="w"> </span><span class="k">ROW</span>
<span class="w"> </span><span class="p">)</span>
<span class="p">),</span>
<span class="hll"><span class="n">with_zscore</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="p">(</span><span class="n">entries</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">mean_entries</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="k">NULLIF</span><span class="p">(</span><span class="n">stddev_entries</span><span class="o">::</span><span class="k">float</span><span class="p">,</span><span class="w"> </span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">zscore</span>
</span><span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">calculations_over_window</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="n">status_code</span><span class="p">,</span>
<span class="w"> </span><span class="n">period</span><span class="p">,</span>
<span class="w"> </span><span class="n">zscore</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">with_zscore</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">period</span><span class="w"> </span><span class="k">DESC</span>
<span class="k">LIMIT</span>
<span class="w"> </span><span class="mf">20</span><span class="p">;</span>
<span class="go">status_code │ period │ zscore</span>
<span class="go">─────────────┼────────────────────────┼─────────────────────</span>
<span class="go"> 200 │ 2020-08-01 18:00:00+00 │ 1.8925638848161648</span>
<span class="go"> 400 │ 2020-08-01 18:00:00+00 │ 6.823777205473068</span>
<span class="go"> 404 │ 2020-08-01 18:00:00+00 │ -0.38531664163524526</span>
<span class="go"> 500 │ 2020-08-01 18:00:00+00 │ -0.41260101365496504</span>
<span class="go"> 500 │ 2020-08-01 17:59:00+00 │ -0.4391628750910588</span>
<span class="go"> 400 │ 2020-08-01 17:59:00+00 │ 7.445849602151508</span>
<span class="go"> 200 │ 2020-08-01 17:59:00+00 │ 1.7399359608515874</span>
<span class="go"> 404 │ 2020-08-01 17:59:00+00 │ -0.38531664163524526</span>
<span class="go"> 500 │ 2020-08-01 17:58:00+00 │ -0.4391628750910588</span>
<span class="go"> 404 │ 2020-08-01 17:58:00+00 │ -0.38531664163524526</span>
<span class="go"> 200 │ 2020-08-01 17:58:00+00 │ 1.6925903990967166</span>
<span class="go"> 400 │ 2020-08-01 17:58:00+00 │ 4.838594613958412</span>
<span class="go"> 404 │ 2020-08-01 17:57:00+00 │ -0.38531664163524526</span>
<span class="go"> 400 │ 2020-08-01 17:57:00+00 │ -0.32757065425956844</span>
<span class="go"> 500 │ 2020-08-01 17:57:00+00 │ 2.2397306629644</span>
<span class="go"> 200 │ 2020-08-01 17:57:00+00 │ 1.5382691050147506</span>
<span class="go"> 200 │ 2020-08-01 17:56:00+00 │ 1.6427718293547886</span>
<span class="go"> 404 │ 2020-08-01 17:56:00+00 │ -0.41260101365496504</span>
<span class="go"> 400 │ 2020-08-01 17:56:00+00 │ 2.75460015502278</span>
<span class="go"> 500 │ 2020-08-01 17:56:00+00 │ -0.41260101365496504</span>
</pre></div>
<p>We now have z-scores for every row, and we can try to identify anomalies:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">calculations_over_window</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">status_code</span><span class="p">,</span>
<span class="w"> </span><span class="n">period</span><span class="p">,</span>
<span class="w"> </span><span class="n">entries</span><span class="p">,</span>
<span class="w"> </span><span class="n">AVG</span><span class="p">(</span><span class="n">entries</span><span class="p">)</span><span class="w"> </span><span class="k">OVER</span><span class="w"> </span><span class="n">status_window</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">mean_entries</span><span class="p">,</span>
<span class="w"> </span><span class="n">STDDEV</span><span class="p">(</span><span class="n">entries</span><span class="p">)</span><span class="w"> </span><span class="k">OVER</span><span class="w"> </span><span class="n">status_window</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">stddev_entries</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">server_log_summary</span>
<span class="w"> </span><span class="k">WINDOW</span><span class="w"> </span><span class="n">status_window</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">PARTITION</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">status_code</span>
<span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">period</span>
<span class="w"> </span><span class="k">ROWS</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="mf">60</span><span class="w"> </span><span class="k">PRECEDING</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="k">CURRENT</span><span class="w"> </span><span class="k">ROW</span>
<span class="w"> </span><span class="p">)</span>
<span class="p">),</span>
<span class="n">with_zscore</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span><span class="p">,</span>
<span class="w"> </span><span class="p">(</span><span class="n">entries</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">mean_entries</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="k">NULLIF</span><span class="p">(</span><span class="n">stddev_entries</span><span class="o">::</span><span class="k">float</span><span class="p">,</span><span class="w"> </span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">zscore</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">calculations_over_window</span>
<span class="p">),</span>
<span class="n">with_alert</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="n">zscore</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">3</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">alert</span>
</span><span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">with_zscore</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="n">status_code</span><span class="p">,</span>
<span class="w"> </span><span class="n">period</span><span class="p">,</span>
<span class="w"> </span><span class="n">entries</span><span class="p">,</span>
<span class="w"> </span><span class="n">zscore</span><span class="p">,</span>
<span class="w"> </span><span class="n">alert</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">with_alert</span>
<span class="k">WHERE</span>
<span class="w"> </span><span class="n">alert</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">period</span><span class="w"> </span><span class="k">DESC</span>
<span class="k">LIMIT</span>
<span class="w"> </span><span class="mf">20</span><span class="p">;</span>
<span class="go">status_code │ period │ entries │ zscore │ alert</span>
<span class="go">─────────────┼────────────────────────┼─────────┼────────────────────┼──────</span>
<span class="go"> 400 │ 2020-08-01 18:00:00+00 │ 24 │ 6.823777205473068 │ t</span>
<span class="go"> 400 │ 2020-08-01 17:59:00+00 │ 12 │ 7.445849602151508 │ t</span>
<span class="go"> 400 │ 2020-08-01 17:58:00+00 │ 2 │ 4.838594613958412 │ t</span>
<span class="go"> 500 │ 2020-08-01 17:29:00+00 │ 1 │ 3.0027309973793774 │ t</span>
<span class="go"> 500 │ 2020-08-01 17:20:00+00 │ 1 │ 3.3190952747131184 │ t</span>
<span class="go"> 500 │ 2020-08-01 17:18:00+00 │ 1 │ 3.7438474117708043 │ t</span>
<span class="go"> 500 │ 2020-08-01 17:13:00+00 │ 1 │ 3.7438474117708043 │ t</span>
<span class="go"> 500 │ 2020-08-01 17:09:00+00 │ 1 │ 4.360778994930029 │ t</span>
<span class="go"> 500 │ 2020-08-01 16:59:00+00 │ 1 │ 3.7438474117708043 │ t</span>
<span class="go"> 400 │ 2020-08-01 16:29:00+00 │ 1 │ 3.0027309973793774 │ t</span>
<span class="go"> 404 │ 2020-08-01 16:13:00+00 │ 1 │ 3.0027309973793774 │ t</span>
<span class="go"> 500 │ 2020-08-01 15:13:00+00 │ 1 │ 3.0027309973793774 │ t</span>
<span class="go"> 500 │ 2020-08-01 15:11:00+00 │ 1 │ 3.0027309973793774 │ t</span>
<span class="go"> 500 │ 2020-08-01 14:58:00+00 │ 1 │ 3.0027309973793774 │ t</span>
<span class="go"> 400 │ 2020-08-01 14:56:00+00 │ 1 │ 3.0027309973793774 │ t</span>
<span class="go"> 400 │ 2020-08-01 14:55:00+00 │ 1 │ 3.3190952747131184 │ t</span>
<span class="go"> 400 │ 2020-08-01 14:50:00+00 │ 1 │ 3.3190952747131184 │ t</span>
<span class="go"> 500 │ 2020-08-01 14:37:00+00 │ 1 │ 3.0027309973793774 │ t</span>
<span class="go"> 400 │ 2020-08-01 14:35:00+00 │ 1 │ 3.3190952747131184 │ t</span>
<span class="go"> 400 │ 2020-08-01 14:32:00+00 │ 1 │ 3.3190952747131184 │ t</span>
</pre></div>
<p>We decided to classify values with a z-score greater than 3 as anomalies. 3 is usually the magic number you'll see in textbooks, but don't get sentimental about it because you can definitely change it to get better results.</p>
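<p>One way to get a feel for how sensitive the results are to this number is to count how many rows would be flagged at different thresholds. The following is a minimal Python sketch, not part of the original queries: the helper name <code>count_alerts</code> is hypothetical, and the z-scores are the 20 values from the output of the last query, rounded to three decimals. Because that query used <code>LIMIT 20</code>, the counts only describe these rows and not the full history:</p>
<div class="highlight"><pre><span></span># z-scores of the 20 rows returned by the last query, rounded to three decimals.
zscores = [
    6.824, 7.446, 4.839, 3.003, 3.319, 3.744, 3.744, 4.361, 3.744, 3.003,
    3.003, 3.003, 3.003, 3.003, 3.003, 3.319, 3.319, 3.003, 3.319, 3.319,
]


def count_alerts(scores, threshold):
    """Count how many z-scores would trigger an alert at the given threshold."""
    return sum(1 for z in scores if z > threshold)


# Try a few candidate thresholds and compare how many rows each one flags.
counts = {t: count_alerts(zscores, t) for t in (3, 3.5, 4, 5)}
print(counts)
</pre></div>
<p>Raising the threshold trims the rows that sit just above 3 while keeping the burst of 400 status codes at the end of the period.</p>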
<h3 id="adding-thresholds"><a class="toclink" href="#adding-thresholds">Adding Thresholds</a></h3>
<p>In the last query we detected a large number of "anomalies" with just one entry. This is very common in errors that don't happen very often. In our case, every once in a while we get a 400 status code, but because it doesn't happen very often, the standard deviation is very low so that even a single error can be considered way above the acceptable value.</p>
<p>We don't really want to receive an alert in the middle of the night just because of one 400 status code. We can't have every curious developer fiddling with the devtools in their browser wake us up.</p>
<p>To eliminate rows with only a few entries we set a threshold:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">calculations_over_window</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">status_code</span><span class="p">,</span>
<span class="w"> </span><span class="n">period</span><span class="p">,</span>
<span class="w"> </span><span class="n">entries</span><span class="p">,</span>
<span class="w"> </span><span class="n">AVG</span><span class="p">(</span><span class="n">entries</span><span class="p">)</span><span class="w"> </span><span class="k">OVER</span><span class="w"> </span><span class="n">status_window</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">mean_entries</span><span class="p">,</span>
<span class="w"> </span><span class="n">STDDEV</span><span class="p">(</span><span class="n">entries</span><span class="p">)</span><span class="w"> </span><span class="k">OVER</span><span class="w"> </span><span class="n">status_window</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">stddev_entries</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">server_log_summary</span>
<span class="w"> </span><span class="k">WINDOW</span><span class="w"> </span><span class="n">status_window</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">PARTITION</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">status_code</span>
<span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">period</span>
<span class="w"> </span><span class="k">ROWS</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="mf">60</span><span class="w"> </span><span class="k">PRECEDING</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="k">CURRENT</span><span class="w"> </span><span class="k">ROW</span>
<span class="w"> </span><span class="p">)</span>
<span class="p">),</span>
<span class="n">with_zscore</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span><span class="p">,</span>
<span class="w"> </span><span class="p">(</span><span class="n">entries</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">mean_entries</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="k">NULLIF</span><span class="p">(</span><span class="n">stddev_entries</span><span class="o">::</span><span class="k">float</span><span class="p">,</span><span class="w"> </span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">zscore</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">calculations_over_window</span>
<span class="p">),</span>
<span class="n">with_alert</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="n">entries</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">10</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">zscore</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">3</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">alert</span>
</span><span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">with_zscore</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="n">status_code</span><span class="p">,</span>
<span class="w"> </span><span class="n">period</span><span class="p">,</span>
<span class="w"> </span><span class="n">entries</span><span class="p">,</span>
<span class="w"> </span><span class="n">zscore</span><span class="p">,</span>
<span class="w"> </span><span class="n">alert</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">with_alert</span>
<span class="k">WHERE</span>
<span class="w"> </span><span class="n">alert</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">period</span><span class="w"> </span><span class="k">DESC</span><span class="p">;</span>
<span class="go"> status_code │         period         │ entries │       zscore       │ alert</span>
<span class="go">─────────────┼────────────────────────┼─────────┼────────────────────┼───────</span>
<span class="go">         400 │ 2020-08-01 18:00:00+00 │      24 │  6.823777205473068 │ t</span>
<span class="go">         400 │ 2020-08-01 17:59:00+00 │      12 │  7.445849602151508 │ t</span>
<span class="go">         500 │ 2020-08-01 11:29:00+00 │    5001 │  3.172198441961645 │ t</span>
<span class="go">         500 │ 2020-08-01 11:28:00+00 │    4812 │ 3.3971646910263917 │ t</span>
<span class="go">         500 │ 2020-08-01 11:27:00+00 │    4443 │ 3.5349400089601586 │ t</span>
<span class="go">         500 │ 2020-08-01 11:26:00+00 │    4522 │ 4.1264785335553595 │ t</span>
<span class="go">         500 │ 2020-08-01 11:25:00+00 │    5567 │   6.17629336121081 │ t</span>
<span class="go">         500 │ 2020-08-01 11:24:00+00 │    3657 │ 6.8689992361141154 │ t</span>
<span class="go">         500 │ 2020-08-01 11:23:00+00 │    1512 │  6.342260662589681 │ t</span>
<span class="go">         500 │ 2020-08-01 11:22:00+00 │    1022 │  7.682189672504754 │ t</span>
<span class="go">         404 │ 2020-08-01 07:20:00+00 │      23 │  5.142126410098476 │ t</span>
<span class="go">         404 │ 2020-08-01 07:19:00+00 │      20 │  6.091200697920824 │ t</span>
<span class="go">         404 │ 2020-08-01 07:18:00+00 │      15 │   7.57547172423804 │ t</span>
</pre></div>
<p>After eliminating potential anomalies with fewer than 10 entries we get far fewer, and probably more relevant, results.</p>
<h3 id="eliminating-repeating-alerts"><a class="toclink" href="#eliminating-repeating-alerts">Eliminating Repeating Alerts</a></h3>
<p>In the previous section we eliminated potential anomalies with fewer than 10 entries. Using thresholds we were able to remove some uninteresting anomalies.</p>
<p>Let's have a look at the data for status code 400 after applying the threshold:</p>
<div class="highlight"><pre><span></span> status_code │         period         │ entries │       zscore       │ alert
─────────────┼────────────────────────┼─────────┼────────────────────┼───────
         400 │ 2020-08-01 18:00:00+00 │      24 │  6.823777205473068 │ t
         400 │ 2020-08-01 17:59:00+00 │      12 │  7.445849602151508 │ t
</pre></div>
<p>The first alert happened at 17:59. A minute later the z-score was still high and the number of entries was still large, so we classified the next row at 18:00 as an anomaly as well.</p>
<p>If you think of an alerting system, we want to send an alert only when an anomaly first happens. We don't want to send an alert every minute until the z-score comes back below the threshold. In this case, we only want to send one alert at 17:59. We don't want to send <em>another</em> alert a minute later at 18:00.</p>
<p>Let's remove alerts where the previous period was also classified as an alert:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">calculations_over_window</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">status_code</span><span class="p">,</span>
<span class="w"> </span><span class="n">period</span><span class="p">,</span>
<span class="w"> </span><span class="n">entries</span><span class="p">,</span>
<span class="w"> </span><span class="n">AVG</span><span class="p">(</span><span class="n">entries</span><span class="p">)</span><span class="w"> </span><span class="k">OVER</span><span class="w"> </span><span class="n">status_window</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">mean_entries</span><span class="p">,</span>
<span class="w"> </span><span class="n">STDDEV</span><span class="p">(</span><span class="n">entries</span><span class="p">)</span><span class="w"> </span><span class="k">OVER</span><span class="w"> </span><span class="n">status_window</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">stddev_entries</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">server_log_summary</span>
<span class="w"> </span><span class="k">WINDOW</span><span class="w"> </span><span class="n">status_window</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">PARTITION</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">status_code</span>
<span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">period</span>
<span class="w"> </span><span class="k">ROWS</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="mf">60</span><span class="w"> </span><span class="k">PRECEDING</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="k">CURRENT</span><span class="w"> </span><span class="k">ROW</span>
<span class="w"> </span><span class="p">)</span>
<span class="p">),</span>
<span class="n">with_zscore</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span><span class="p">,</span>
<span class="w"> </span><span class="p">(</span><span class="n">entries</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">mean_entries</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="k">NULLIF</span><span class="p">(</span><span class="n">stddev_entries</span><span class="o">::</span><span class="k">float</span><span class="p">,</span><span class="w"> </span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">zscore</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">calculations_over_window</span>
<span class="p">),</span>
<span class="n">with_alert</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span><span class="p">,</span>
<span class="w"> </span><span class="n">entries</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">10</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">zscore</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">3</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">alert</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">with_zscore</span>
<span class="p">),</span>
<span class="n">with_previous_alert</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="n">LAG</span><span class="p">(</span><span class="n">alert</span><span class="p">)</span><span class="w"> </span><span class="k">OVER</span><span class="w"> </span><span class="p">(</span><span class="k">PARTITION</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">status_code</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">period</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">previous_alert</span>
</span><span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">with_alert</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="n">status_code</span><span class="p">,</span>
<span class="w"> </span><span class="n">period</span><span class="p">,</span>
<span class="w"> </span><span class="n">entries</span><span class="p">,</span>
<span class="w"> </span><span class="n">zscore</span><span class="p">,</span>
<span class="w"> </span><span class="n">alert</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">with_previous_alert</span>
<span class="k">WHERE</span>
<span class="hll"><span class="w"> </span><span class="n">alert</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="n">previous_alert</span>
</span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">period</span><span class="w"> </span><span class="k">DESC</span><span class="p">;</span>
<span class="go"> status_code │         period         │ entries │      zscore       │ alert</span>
<span class="go">─────────────┼────────────────────────┼─────────┼───────────────────┼───────</span>
<span class="go">         400 │ 2020-08-01 17:59:00+00 │      12 │ 7.445849602151508 │ t</span>
<span class="go">         500 │ 2020-08-01 11:22:00+00 │    1022 │ 7.682189672504754 │ t</span>
<span class="go">         404 │ 2020-08-01 07:18:00+00 │      15 │  7.57547172423804 │ t</span>
</pre></div>
<p>By eliminating alerts that were already triggered we get a very small list of anomalies that may have happened during the day. Looking at the results we can see what anomalies we would have discovered:</p>
<ul>
<li>Anomaly in status code 400 at 17:59: we also found that one earlier.</li>
</ul>
<figure><img alt="Anomaly in status code 400" src="https://hakibenita.com/images/00-sql-anomaly-detection-400.png"><figcaption>Anomaly in status code 400</figcaption>
</figure>
<ul>
<li>Anomaly in status code 500: we spotted this one on the chart when we started.</li>
</ul>
<figure><img alt="Anomaly in status code 500" src="https://hakibenita.com/images/00-sql-anomaly-detection-500.png"><figcaption>Anomaly in status code 500</figcaption>
</figure>
<ul>
<li>Anomaly in status code 404: this is a hidden anomaly which we did not know about until now.</li>
</ul>
<figure><img alt="A hidden anomaly in status code 404" src="https://hakibenita.com/images/00-sql-anomaly-detection-404.png"><figcaption>A hidden anomaly in status code 404</figcaption>
</figure>
<p>The query can now be used to fire alerts when it encounters an anomaly.</p>
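<p>The <code>LAG</code> step boils down to remembering, for each status code, whether the previous period was already an alert. Here is a minimal Python sketch of that rule. It assumes the rows arrive sorted by period; the <code>first_alerts</code> helper and the sample rows are hypothetical and only illustrate the logic:</p>

```python
def first_alerts(rows):
    """Keep only alerts whose previous period (same status code) was not an alert.

    `rows` is an iterable of (status_code, period, alert) tuples,
    assumed to be sorted by period within each status code.
    """
    previous_alert = {}  # status_code -> alert flag of the previous period
    result = []
    for status_code, period, alert in rows:
        # Assumption: a status code with no previous period counts as "no alert".
        if alert and not previous_alert.get(status_code, False):
            result.append((status_code, period))
        previous_alert[status_code] = alert
    return result


# Made-up sample data: one anomaly lasting two minutes, then a new one later
rows = [
    (400, "17:58", False),
    (400, "17:59", True),
    (400, "18:00", True),
    (400, "18:01", False),
    (400, "18:02", True),
]

print(first_alerts(rows))  # [(400, '17:59'), (400, '18:02')]
```

<p>The alert at 18:00 is dropped because 17:59 already fired, while 18:02 fires again because the alert condition cleared in between.</p>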
<h3 id="experiment-with-different-values"><a class="toclink" href="#experiment-with-different-values">Experiment With Different Values</a></h3>
<p>In the process so far we've used several constants in our calculations:</p>
<ul>
<li><strong>Lookback period</strong>: How far back we calculate the mean and standard deviation for each status code. The value we used is 60 minutes.</li>
<li><strong>Entries Threshold</strong>: The minimum number of entries we want to get an alert for. The value we used is 10.</li>
<li><strong>Z-Score Threshold</strong>: The z-score after which we classify the value as an anomaly. The value we used is 3.</li>
</ul>
<p>Now that we have a working query to backtest, we can experiment with different values.</p>
<figure><img alt="Experimenting with parameter values" src="https://hakibenita.com/images/00-sql-anomaly-detection-parameters.png"><figcaption>Experimenting with parameter values</figcaption>
</figure>
<p>This is a chart showing the alerts our system identified in the past 12 hours:</p>
<figure><img alt="Backtesting with default parameters. <a href="https://popsql.com/queries/-MECQV6GiKr04WdCWM0K/simple-anomaly-detection-with-sql?access_token=2d2c0729f9a1cfa7b6a2dbb5b0adb45c">View in editor</a>" src="https://hakibenita.com/images/00-sql-anomaly-detection-backtest-10-3-60.png"><figcaption>Backtesting with default parameters. <a href="https://popsql.com/queries/-MECQV6GiKr04WdCWM0K/simple-anomaly-detection-with-sql?access_token=2d2c0729f9a1cfa7b6a2dbb5b0adb45c">View in editor</a></figcaption>
</figure>
<p>To get a sense of each parameter, let's adjust the values and see how they affect the number and quality of alerts we get.</p>
<p>If we decrease the value of the z-score threshold from 3 to 1, we should get more alerts. With a lower threshold, more values are likely to be considered an anomaly:</p>
<figure><img alt="Backtesting with lower z-score threshold" src="https://hakibenita.com/images/00-sql-anomaly-detection-backtest-10-1-60.png"><figcaption>Backtesting with lower z-score threshold</figcaption>
</figure>
<p>If we increase the entries threshold from 10 to 30, we should get fewer alerts:</p>
<figure><img alt="Backtesting with higher entries threshold" src="https://hakibenita.com/images/00-sql-anomaly-detection-backtest-30-3-60.png"><figcaption>Backtesting with higher entries threshold</figcaption>
</figure>
<p>If we increase the lookback period from 60 minutes to 360 minutes, we get more alerts:</p>
<figure><img alt="Backtesting with longer lookback period" src="https://hakibenita.com/images/00-sql-anomaly-detection-backtest-30-3-360.png"><figcaption>Backtesting with longer lookback period</figcaption>
</figure>
<p>A good alerting system is a system that produces true alerts, at a reasonable time. Using the backtesting query you can experiment with different values until you find ones that produce quality alerts you can act on.</p>
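<p>To make the role of the three parameters concrete, here is a rough Python sketch of the same backtest on a plain list of per-minute counts. The <code>backtest</code> function, its parameter names and the sample series are assumptions made for illustration; they mirror the SQL (a window of the current row plus the preceding <code>lookback</code> rows, and a sample standard deviation) but are not part of it:</p>

```python
from statistics import mean, stdev


def backtest(entries, lookback=60, min_entries=10, z_threshold=3.0):
    """Return the indexes of values that would have triggered an alert."""
    alerts = []
    for i, value in enumerate(entries):
        # Current row plus up to `lookback` preceding rows
        window = entries[max(0, i - lookback): i + 1]
        if len(window) < 2:
            continue  # a standard deviation needs at least two values
        deviation = stdev(window)
        if deviation == 0:
            continue  # same role as NULLIF(stddev_entries, 0) in the query
        zscore = (value - mean(window)) / deviation
        if value > min_entries and zscore > z_threshold:
            alerts.append(i)
    return alerts


# Made-up series: a steady 5 entries per minute with a single spike of 40
series = [5] * 30 + [40] + [5] * 30

print(backtest(series))                   # default values
print(backtest(series, z_threshold=6.0))  # stricter z-score threshold
print(backtest(series, min_entries=50))   # higher entries threshold
print(backtest(series, lookback=5))       # shorter lookback period
```

<p>With this series only the default values flag the spike. A short lookback period caps how large the z-score of a single spike can get, which is one reason to try several values when backtesting.</p>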
<hr>
<h2 id="improving-accuracy"><a class="toclink" href="#improving-accuracy">Improving Accuracy</a></h2>
<p>Using a z-score for detecting anomalies is an easy way to get started with anomaly detection and see results right away. However, this method is not always the best choice. If you don't get good alerts using it, there are some improvements and other methods you can try using just SQL.</p>
<h3 id="use-weighted-mean"><a class="toclink" href="#use-weighted-mean">Use Weighted Mean</a></h3>
<p>Our system uses a mean to determine a reasonable value, and a lookback period to determine how far back to calculate that mean. In our case, we calculated the mean based on data from the past hour.</p>
<p>Calculating the mean this way gives the same weight to entries that happened 1 hour ago and to entries that just happened. If you give more weight to recent entries at the expense of older entries, the new weighted mean should become more sensitive to recent entries, and you may be able to identify anomalies more quickly.</p>
<p>To give more weight to recent entries, you can use a <a href="https://en.wikipedia.org/wiki/Weighted_arithmetic_mean" rel="noopener">weighted average</a>:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">status_code</span><span class="p">,</span>
<span class="w"> </span><span class="n">avg</span><span class="p">(</span><span class="n">entries</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">mean</span><span class="p">,</span>
<span class="w"> </span><span class="n">sum</span><span class="p">(</span>
<span class="w"> </span><span class="n">entries</span><span class="w"> </span><span class="o">*</span>
<span class="w"> </span><span class="c1">-- Weight = minutes since the start of the window: 1 for the oldest period, 60 for the most recent</span>
<span class="w"> </span><span class="p">(</span><span class="k">extract</span><span class="p">(</span><span class="s1">'epoch'</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">period</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="s1">'2020-08-01 17:00 UTC'</span><span class="o">::</span><span class="nb">timestamptz</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="mf">60</span><span class="p">)</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="p">(</span><span class="mf">60</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">61</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="mf">2</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">weighted_mean</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">server_log_summary</span>
<span class="k">WHERE</span>
<span class="w"> </span><span class="c1">-- Last 60 periods</span>
<span class="w"> </span><span class="n">period</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="s1">'2020-08-01 17:00 UTC'</span><span class="o">::</span><span class="nb">timestamptz</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">status_code</span><span class="p">;</span>
</pre></div>
<p>The query returns the plain mean next to the weighted mean for each status code. Each period is weighted by the number of minutes since the start of the window, and the sum is divided by the total of the weights (1 + 2 + ... + 60 = 60 * 61 / 2), so the more recent a period is, the more it pulls the weighted mean towards its value.</p>
<p>A weighted average is a very <a href="https://www.investopedia.com/ask/answers/071414/whats-difference-between-moving-average-and-weighted-moving-average.asp" rel="noopener">common indicator used by stock traders</a>. We used a linear weighted average, but there are also exponential weighted averages and others you can try.</p>
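<p>As a small illustration of a linear weighted average, here is a Python sketch in which the oldest value gets weight 1 and the most recent gets the highest weight. The <code>linear_weighted_mean</code> function and the sample values are hypothetical:</p>

```python
def linear_weighted_mean(values):
    """Weighted mean where the i-th value (oldest first) gets weight i."""
    if not values:
        raise ValueError("linear_weighted_mean requires at least one value")
    weights = range(1, len(values) + 1)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Made-up series, oldest first: the most recent value jumps to 40
entries = [10, 10, 40]

plain_mean = sum(entries) / len(entries)
weighted = linear_weighted_mean(entries)

print(plain_mean)  # 20.0
print(weighted)    # 25.0 = (1*10 + 2*10 + 3*40) / (1 + 2 + 3)
```

<p>The recent jump moves the weighted mean further than the plain mean, which is the sensitivity to recent entries described above.</p>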
<h3 id="use-median"><a class="toclink" href="#use-median">Use Median</a></h3>
<p>In statistics, a mean is considered not robust because it is influenced by extreme values. In our use case, this means that the measure we use to identify extreme values is itself affected by the values we are trying to identify.</p>
<p>For example, in the beginning of the article we used this series of values:</p>
<div class="highlight"><pre><span></span>2, 3, 5, 2, 3, 12, 5, 3, 4
</pre></div>
<p>The mean of this series is 4.33, and we detected 12 as an anomaly.</p>
<p>If the 12 were a 120, the mean of the series would have been 16.33. Hence, our "reasonable" value is heavily affected by the values it is supposed to identify.</p>
<p>A measure that is considered more robust is a <a href="https://en.wikipedia.org/wiki/Median" rel="noopener">median</a>. The median of a series is the value that half the series is greater than, and half the series is less than:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="n">percentile_disc</span><span class="p">(</span><span class="mf">0.5</span><span class="p">)</span><span class="w"> </span><span class="k">within</span><span class="w"> </span><span class="k">group</span><span class="p">(</span><span class="k">order</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">median</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">unnest</span><span class="p">(</span><span class="k">ARRAY</span><span class="p">[</span><span class="mf">2</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="mf">5</span><span class="p">,</span><span class="w"> </span><span class="mf">2</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="mf">120</span><span class="p">,</span><span class="w"> </span><span class="mf">5</span><span class="p">,</span><span class="w"> </span><span class="mf">3</span><span class="p">,</span><span class="w"> </span><span class="mf">4</span><span class="p">])</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">n</span><span class="p">;</span>
<span class="go"> median</span>
<span class="go">────────</span>
<span class="go">      3</span>
</pre></div>
<p>To calculate the median in PostgreSQL we use the function <a href="https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE" rel="noopener"><code>percentile_disc</code></a>. In the series above, the median is 3. If we sort the list and cut it in the middle, this becomes clearer:</p>
<div class="highlight"><pre><span></span>2, 2, 3, 3, 3
4, 5, 5, 12
</pre></div>
<p>If we change the value of 12 to 120, the median will not be affected at all:</p>
<div class="highlight"><pre><span></span>2, 2, 3, 3, 3
4, 5, 5, 120
</pre></div>
<p>This is why the median is considered more robust than the mean.</p>
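<p>The same comparison can be reproduced with Python's standard <code>statistics</code> module. This is only a quick check of the numbers above; the variable names are arbitrary:</p>

```python
from statistics import mean, median

series = [2, 3, 5, 2, 3, 12, 5, 3, 4]
# Same series with the 12 replaced by a far more extreme 120
extreme = [120 if n == 12 else n for n in series]

print(round(mean(series), 2), median(series))    # 4.33 3
print(round(mean(extreme), 2), median(extreme))  # 16.33 3
```

<p>The mean moves from 4.33 to 16.33, while the median stays at 3.</p>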
<h3 id="use-mad"><a class="toclink" href="#use-mad">Use MAD</a></h3>
<p><a href="https://en.wikipedia.org/wiki/Median_absolute_deviation" rel="noopener">Median absolute deviation (MAD)</a> is another way of finding anomalies in a series. MAD is often considered a better fit than the z-score for real-life data, because it is built on medians and is therefore less affected by the extreme values it is meant to detect.</p>
<p>MAD is calculated by finding the median of the absolute deviations from the series median. Just for comparison, the standard deviation is the square root of the average squared distance from the mean.</p>
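<p>To see how this works on the series from the previous section, here is a minimal Python sketch. The <code>mad</code> and <code>mad_outliers</code> helpers and the threshold of 3 are assumptions chosen for illustration, not a recommended setting:</p>

```python
from statistics import median


def mad(values):
    """Median of the absolute deviations from the series median."""
    center = median(values)
    return median([abs(v - center) for v in values])


def mad_outliers(values, threshold=3.0):
    """Values whose distance from the median exceeds `threshold` MADs."""
    center = median(values)
    spread = mad(values)
    if spread == 0:
        return []  # no spread to compare against
    return [v for v in values if abs(v - center) / spread > threshold]


series = [2, 3, 5, 2, 3, 12, 5, 3, 4]

print(mad(series))           # 1
print(mad_outliers(series))  # [12]
```

<p>The median is 3 and the absolute deviations are 1, 0, 2, 1, 0, 9, 2, 0, 1, so the MAD is 1 and the value 12 sits 9 MADs away from the median. Replacing 12 with 120 leaves both the median and the MAD unchanged.</p>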
<h3 id="use-different-measures"><a class="toclink" href="#use-different-measures">Use Different Measures</a></h3>
<p>We used the number of entries per minute as an indicator. However, depending on the use case, there might be other things you can measure that can yield better results. For example:</p>
<ul>
<li>To try to identify DoS attacks, you can monitor the ratio of unique IP addresses to HTTP requests.</li>
<li>To reduce the number of false positives, you can normalize the number of responses for each status code to its proportion of the total responses. This way, for example, if you're using a flaky remote service that fails once every certain number of requests, the proportion will not trigger an alert when the increase in errors merely follows an increase in overall traffic.</li>
</ul>
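<p>The second idea can be illustrated with a few lines of Python. The <code>error_proportion</code> helper and the request counts are made up for the example:</p>

```python
def error_proportion(errors, total):
    """Share of responses that were errors; 0.0 when there was no traffic."""
    return errors / total if total else 0.0


# Made-up counts: errors grow tenfold, but so does overall traffic
quiet_minute = error_proportion(errors=5, total=1_000)
busy_minute = error_proportion(errors=50, total=10_000)

print(quiet_minute, busy_minute)  # 0.005 0.005
```

<p>Measured as a raw count the busy minute looks like a spike in errors, but as a proportion of total responses nothing changed.</p>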
<hr>
<h2 id="conclusion"><a class="toclink" href="#conclusion">Conclusion</a></h2>
<p>The method presented above is a very simple method to detect anomalies and produce actionable alerts that can potentially save you a lot of grief. There are many tools out there that provide similar functionality, but they require either tight integration or $$$. The main appeal of this approach is that you can get started with tools you probably already have: some SQL and a scheduled task!</p>
<hr>
<p><strong>UPDATE:</strong> Many readers asked me how I created the charts in this article... well, I used <a href="https://popsql.com" rel="noopener">PopSQL</a>. It's a modern SQL editor focused on collaborative editing. If you're in the market for one, go check it out...</p>
Some SQL Tricks of an Application DBA2020-07-27T00:00:00+03:002020-07-27T00:00:00+03:00Haki Benitatag:hakibenita.com,2020-07-27:/sql-tricks-application-dba<p>Some tips and misconceptions about database development I gathered along the way.</p><hr>
<p>When I started my career in development, my first job was as a DBA. Back then, before AWS RDS, Azure, Google Cloud and the rest of the cloud services, there were two types of DBAs:</p>
<p><strong>The Infrastructure DBA</strong> was in charge of setting up the database, configuring the storage and taking care of backups and replication. After setting up the database, the infrastructure DBA would pop up from time to time and do some "instance tuning", things like sizing caches.</p>
<p><strong>The Application DBA</strong> got a clean database from the infrastructure DBA, and was in charge of schema design: creating tables, indexes, constraints, and tuning SQL. The application DBA was also the one who implemented ETL processes and data migrations. In teams that used stored procedures, the application DBA would maintain those as well.</p>
<p>Application DBAs were usually part of the development team. They would possess deep domain knowledge so normally they would work on just one or two projects. Infrastructure DBAs would usually be part of some IT team, and would work on many projects simultaneously.</p>
<p><strong>I'm an Application DBA</strong></p>
<p>I never had any desire to fiddle with backups or tune storage (I'm sure it's fascinating!). To this day I like to say I'm a DBA that knows how to develop applications, and not a developer that knows his way around the database.</p>
<p><strong>In this article I share some non-trivial tips about database development I gathered along the way.</strong></p>
<figure><img alt="Be that guy...<br>Image by <a href="https://www.commitstrip.com/en/2014/08/01/when-i-help-a-rookie-coder-fix-his-queries">CommitStrip</a>" src="https://hakibenita.com/images/00-sql-tricks-dba.jpg"><figcaption>Be that guy...<br>Image by <a href="https://www.commitstrip.com/en/2014/08/01/when-i-help-a-rookie-coder-fix-his-queries">CommitStrip</a></figcaption>
</figure>
<p><details class="toc-container" open>
<summary>Table of Contents</summary></p>
<div class="toc">
<ul>
<li><a href="#update-only-what-needs-updating">Update Only What Needs Updating</a></li>
<li><a href="#disable-constraints-and-indexes-during-bulk-loads">Disable Constraints and Indexes During Bulk Loads</a></li>
<li><a href="#use-unlogged-tables-for-intermediate-data">Use UNLOGGED Tables for Intermediate Data</a></li>
<li><a href="#implement-complete-processes-using-with-and-returning">Implement Complete Processes Using WITH and RETURNING</a></li>
<li><a href="#avoid-indexes-on-columns-with-low-selectivity">Avoid Indexes on Columns With Low Selectivity</a></li>
<li><a href="#use-partial-indexes">Use Partial Indexes</a></li>
<li><a href="#always-load-sorted-data">Always Load Sorted Data</a></li>
<li><a href="#index-columns-with-high-correlation-using-brin">Index Columns With High Correlation Using BRIN</a></li>
<li><a href="#make-indexes-invisible">Make Indexes "Invisible"</a></li>
<li><a href="#dont-schedule-long-running-processes-at-round-hours">Don't Schedule Long Running Processes at Round Hours</a></li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</div>
<p></details></p>
<hr>
<h2 id="update-only-what-needs-updating"><a class="toclink" href="#update-only-what-needs-updating">Update Only What Needs Updating</a></h2>
<p><code>UPDATE</code> is a relatively expensive operation. To speed up an <code>UPDATE</code> command it's best to make sure you only update what needs updating.</p>
<p>Take, for example, this query that normalizes an email column:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">UPDATE</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">email</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lower</span><span class="p">(</span><span class="n">email</span><span class="p">);</span>
<span class="go">UPDATE 1010000</span>
<span class="go">Time: 1583.935 ms (00:01.584)</span>
</pre></div>
<p>Looks innocent, right? The query updated the emails of 1,010,000 users. But did all of these rows really need to be updated?</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">UPDATE</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">email</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lower</span><span class="p">(</span><span class="n">email</span><span class="p">)</span>
<span class="hll"><span class="gp">db-#</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">email</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">lower</span><span class="p">(</span><span class="n">email</span><span class="p">);</span>
</span><span class="go">UPDATE 10000</span>
<span class="go">Time: 299.470 ms</span>
</pre></div>
<p>Only 10,000 rows needed to be updated. By reducing the number of affected rows, the execution time went down from roughly 1.6 seconds to just under 300ms. Updating fewer rows also spares the database some maintenance work later on.</p>
<figure>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 248.6 124" width="20em"><path d="M10 11l32-2-2 27H9" fill="#f41d92"/><path d="M10 9c12-1 21 1 32 3m-31-2h29m1 2c-2 7 0 13 2 23m-2-25l1 24m1 2H9m32-1H9m2 0c-2-6-1-14-1-27M9 35V9" stroke="currentColor" fill="none"/><path d="M166 10l29 2v25l-29-1" fill="#e08fff"/><path d="M166 10c10 1 24 3 30 0m-32 2l32-1m-2 1c2 4 3 10 0 23m3-23l-1 24m1 1l-33-1m33 0c-10-1-20 0-32 1m-1 0l2-24m-1 24V11" stroke="currentColor" fill="none"/><path d="M204 9l31 3-1 25-29-3" fill="#daaeff"/><path d="M203 11l33-1m-33-1c9 1 16 0 31 2m-1 1l2 24m-1-25l1 25m-1-1h-31m30 0l-29 1m0 0c-3-6 1-14-2-27m2 26c-2-7-1-13-1-25" stroke="currentColor" fill="none"/><path d="M45 11l33-2-2 28-29-1" fill="#f41d92"/><path d="M47 12h32m-34-1c12-2 23 0 33-1m0 0c-2 10-1 19-2 28m1-29l1 26m1 2c-12-3-23 0-33-1m32 0l-31-1m0-1c0-7-1-11 1-22m-2 24V10" stroke="currentColor" fill="none"/><path d="M83 9l34 3 1 24-35-1" fill="#f41d92"/><path d="M86 12l31-2m-32 0c8 1 14 1 31-1m-1 2c1 10 0 19 3 26m-1-27v26m0 1l-31-3m31 2H85m0-1c-1-10 1-16-1-26m0 26l2-25" stroke="currentColor" fill="none"/><path d="M124 10l34 1-1 25-34-1" fill="#f41d92"/><path d="M123 13h33m-30-1l30-1m1 0c-1 9 0 16-3 25m2-24c-1 6 1 15 1 25m1 1c-14-2-24-1-34-3m32 2h-31m0-2c-1-8 2-16 0-25m1 27c-2-6-1-14 0-25" stroke="currentColor" fill="none"/><path d="M13 89l31-1 1 26-29-1" fill="#f41d92"/><path d="M16 87c10 0 17 2 30-1m-33 2c13 0 26-2 32 0m1 1c-2 4 2 9 2 23m-2-24v25m1-2c-13 2-20 3-31 3m29-2c-8-1-20 0-32 1m0-1c3-9 1-15 3-25m-2 26c1-8 0-15-1-25" stroke="currentColor" fill="none"/><g><path d="M52 87l31 2-1 23-31 2" fill="#f41d92"/><path d="M50 86c14 2 24 3 32 1m-31 0l32 1m0 1l-1 25m0-26v26m0-1c-10 1-21 0-32-2m31 1l-31 1m0 0c2-5 1-13 2-24m-1 24l1-26" stroke="currentColor" fill="none"/></g><g><path d="M89 89l33-1-2 25H91" fill="#f41d92"/><path d="M88 89c12-3 22-1 32-3m-31 1l32 1m2 0c-1 9-2 20-1 25m0-26v26m-1 1c-6 0-15-2-33-2m32 0H90m-1 2c2-8 0-13 2-28m-2 27V88" stroke="currentColor" fill="none"/></g><g><path d="M131 
90l30-2v27l-31-1" fill="#f41d92"/><path d="M131 89c9 2 23-1 28 1m-30-2l32 2m0 1v24m-1-26l1 24m1 2c-6-1-14-3-34-2m32 0l-31 1m2 1c-2-11 0-19-3-25m2 23V89" stroke="currentColor" fill="none"/></g><g><path d="M168 90l32-2-2 26-27 1" fill="#f41d92"/><path d="M167 90h33m-31-2h30m2 1v25m-2-26v26m1 1l-32-3m31 1l-31 1m3 1c-2-8 0-20-2-28m-1 27c2-10 0-17 0-26" stroke="currentColor" fill="none"/></g><g><path d="M209 88l31 2v22l-34 2" fill="#f41d92"/><path d="M208 89c7-1 15-3 30-2m-30 2c10-1 22-2 30-1m3 1c-2 6-3 10-4 24m1-26c-1 7 0 13 1 26m-2 1c-4 1-14 1-28-1m29 0l-31 1m2 0c-2-8-2-11-1-25m-1 25V87" stroke="currentColor" fill="none"/></g><g><path d="M183 46l1 36m-2-37l1 36M177 64l6 16m-7-17c1 3 2 8 7 17M190 64c-1 4-5 8-7 16m5-18l-5 18" stroke="currentColor" fill="none"/></g><g><path d="M221 44l3 37m-1-35v37M215 67c3 1 4 7 7 16m-4-16c1 6 3 13 5 16M228 67l-6 16m8-16l-7 16" stroke="currentColor" fill="none"/></g></svg>
<figcaption>Update Only What Needs Updating</figcaption>
</figure>
<p>This type of large update is very common in data migration scripts. So the next time you write a migration script, make sure to only update what needs updating.</p>
<hr>
<h2 id="disable-constraints-and-indexes-during-bulk-loads"><a class="toclink" href="#disable-constraints-and-indexes-during-bulk-loads">Disable Constraints and Indexes During Bulk Loads</a></h2>
<p>Constraints are an important part of relational databases: they keep the data consistent and reliable. Their benefits come at a cost though, and it's most noticeable when loading or updating a lot of rows.</p>
<p>To demonstrate, set up a small schema for a store:</p>
<div class="highlight"><pre><span></span><span class="k">DROP</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">EXISTS</span><span class="w"> </span><span class="n">product</span><span class="w"> </span><span class="k">CASCADE</span><span class="p">;</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">product</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="nb">serial</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="p">,</span>
<span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="nb">TEXT</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="p">,</span>
<span class="w"> </span><span class="n">price</span><span class="w"> </span><span class="nb">INT</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">);</span>
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">product</span><span class="w"> </span><span class="p">(</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">price</span><span class="p">)</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">random</span><span class="p">()::</span><span class="nb">text</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">1000</span><span class="p">)::</span><span class="nb">int</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">10000</span><span class="p">);</span>
<span class="k">DROP</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">EXISTS</span><span class="w"> </span><span class="n">customer</span><span class="w"> </span><span class="k">CASCADE</span><span class="p">;</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">customer</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="nb">serial</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="p">,</span>
<span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="nb">TEXT</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">);</span>
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">customer</span><span class="w"> </span><span class="p">(</span><span class="n">name</span><span class="p">)</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">random</span><span class="p">()::</span><span class="nb">text</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">100000</span><span class="p">);</span>
<span class="k">DROP</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">EXISTS</span><span class="w"> </span><span class="n">sale</span><span class="p">;</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">sale</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="nb">serial</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="p">,</span>
<span class="w"> </span><span class="n">created</span><span class="w"> </span><span class="n">timestamptz</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="p">,</span>
<span class="w"> </span><span class="n">product_id</span><span class="w"> </span><span class="nb">int</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="p">,</span>
<span class="w"> </span><span class="n">customer_id</span><span class="w"> </span><span class="nb">int</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">);</span>
</pre></div>
<p>The schema defines several types of constraints, such as "not null" and primary keys, which also enforce uniqueness.</p>
<p>To set a baseline, start by adding foreign keys and an index to the <code>sale</code> table, and then load some data into it:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">sale</span><span class="w"> </span><span class="k">ADD</span><span class="w"> </span><span class="k">CONSTRAINT</span><span class="w"> </span><span class="n">sale_product_fk</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">FOREIGN</span><span class="w"> </span><span class="k">KEY</span><span class="w"> </span><span class="p">(</span><span class="n">product_id</span><span class="p">)</span><span class="w"> </span><span class="k">REFERENCES</span><span class="w"> </span><span class="n">product</span><span class="p">(</span><span class="n">id</span><span class="p">);</span>
<span class="go">ALTER TABLE</span>
<span class="go">Time: 18.413 ms</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">sale</span><span class="w"> </span><span class="k">ADD</span><span class="w"> </span><span class="k">CONSTRAINT</span><span class="w"> </span><span class="n">sale_customer_fk</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">FOREIGN</span><span class="w"> </span><span class="k">KEY</span><span class="w"> </span><span class="p">(</span><span class="n">customer_id</span><span class="p">)</span><span class="w"> </span><span class="k">REFERENCES</span><span class="w"> </span><span class="n">customer</span><span class="p">(</span><span class="n">id</span><span class="p">);</span>
<span class="go">ALTER TABLE</span>
<span class="go">Time: 5.464 ms</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">sale_created_ix</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">sale</span><span class="p">(</span><span class="n">created</span><span class="p">);</span>
<span class="go">CREATE INDEX</span>
<span class="go">Time: 12.605 ms</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">SALE</span><span class="w"> </span><span class="p">(</span><span class="n">created</span><span class="p">,</span><span class="w"> </span><span class="n">product_id</span><span class="p">,</span><span class="w"> </span><span class="n">customer_id</span><span class="p">)</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">now</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nb">interval</span><span class="w"> </span><span class="s1">'1 hour'</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">1000</span><span class="p">,</span>
<span class="gp">db-#</span><span class="w"> </span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">10000</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span><span class="p">,</span>
<span class="gp">db-#</span><span class="w"> </span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">100000</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">1000000</span><span class="p">);</span>
<span class="go">INSERT 0 1000000</span>
<span class="go">Time: 15410.234 ms (00:15.410)</span>
</pre></div>
<p>With the constraints and the index already in place, loading a million rows into the table took ~15.4s.</p>
<p>Next, try to load the data into the table first, and only then add constraints and indexes:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">SALE</span><span class="w"> </span><span class="p">(</span><span class="n">created</span><span class="p">,</span><span class="w"> </span><span class="n">product_id</span><span class="p">,</span><span class="w"> </span><span class="n">customer_id</span><span class="p">)</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">now</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nb">interval</span><span class="w"> </span><span class="s1">'1 hour'</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">1000</span><span class="p">,</span>
<span class="gp">db-#</span><span class="w"> </span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">10000</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span><span class="p">,</span>
<span class="gp">db-#</span><span class="w"> </span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">100000</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">1000000</span><span class="p">);</span>
<span class="go">INSERT 0 1000000</span>
<span class="go">Time: 2277.824 ms (00:02.278)</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">sale</span><span class="w"> </span><span class="k">ADD</span><span class="w"> </span><span class="k">CONSTRAINT</span><span class="w"> </span><span class="n">sale_product_fk</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">FOREIGN</span><span class="w"> </span><span class="k">KEY</span><span class="w"> </span><span class="p">(</span><span class="n">product_id</span><span class="p">)</span><span class="w"> </span><span class="k">REFERENCES</span><span class="w"> </span><span class="n">product</span><span class="p">(</span><span class="n">id</span><span class="p">);</span>
<span class="go">ALTER TABLE</span>
<span class="go">Time: 169.193 ms</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">sale</span><span class="w"> </span><span class="k">ADD</span><span class="w"> </span><span class="k">CONSTRAINT</span><span class="w"> </span><span class="n">sale_customer_fk</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">FOREIGN</span><span class="w"> </span><span class="k">KEY</span><span class="w"> </span><span class="p">(</span><span class="n">customer_id</span><span class="p">)</span><span class="w"> </span><span class="k">REFERENCES</span><span class="w"> </span><span class="n">customer</span><span class="p">(</span><span class="n">id</span><span class="p">);</span>
<span class="go">ALTER TABLE</span>
<span class="go">Time: 185.633 ms</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">sale_created_ix</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">sale</span><span class="p">(</span><span class="n">created</span><span class="p">);</span>
<span class="go">CREATE INDEX</span>
<span class="go">Time: 484.244 ms</span>
</pre></div>
<p>Loading data into a table without indexes and constraints was much faster, 2.28s compared to 15.4s before. Creating the constraints and the index after the data was loaded took longer than it did on the empty table, but overall the entire process was much faster, 3.1s compared to 15.4s.</p>
<p>Unfortunately, PostgreSQL does not provide an easy way of doing this for indexes other than dropping and re-creating them. In other databases such as Oracle, you can <a href="https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/ALTER-INDEX.html#GUID-D8F648E7-8C07-4C89-BB71-862512536558" rel="noopener">disable and enable indexes</a> without having to re-create them.</p>
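<p>For a table that already has constraints and indexes, the drop and re-create approach could be sketched roughly like this. The sketch reuses the names from the example above; running everything in a single transaction is an assumption about how you might want to do it, not a requirement:</p>
<div class="highlight"><pre><span></span>BEGIN;

-- Drop the foreign keys and the index before the bulk load
ALTER TABLE sale DROP CONSTRAINT sale_product_fk;
ALTER TABLE sale DROP CONSTRAINT sale_customer_fk;
DROP INDEX sale_created_ix;

-- Bulk load without the per-row overhead of constraints and indexes
INSERT INTO sale (created, product_id, customer_id)
SELECT
    now() - interval '1 hour' * random() * 1000,
    (random() * 10000)::int + 1,
    (random() * 100000)::int + 1
FROM generate_series(1, 1000000);

-- Re-create the constraints and the index.
-- The foreign keys are validated against all the rows at this point
ALTER TABLE sale ADD CONSTRAINT sale_product_fk
    FOREIGN KEY (product_id) REFERENCES product(id);
ALTER TABLE sale ADD CONSTRAINT sale_customer_fk
    FOREIGN KEY (customer_id) REFERENCES customer(id);
CREATE INDEX sale_created_ix ON sale(created);

COMMIT;
</pre></div>
<p>If the loaded data violates one of the foreign keys, adding the constraint fails and the entire transaction is rolled back. Keep in mind that dropping a constraint locks the table, so other sessions will likely be blocked until the transaction ends.</p>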
<hr>
<h2 id="use-unlogged-tables-for-intermediate-data"><a class="toclink" href="#use-unlogged-tables-for-intermediate-data">Use UNLOGGED Tables for Intermediate Data</a></h2>
<p>When you modify data in PostgreSQL, the changes are written to the <a href="https://www.postgresql.org/docs/current/wal-intro.html" rel="noopener">write-ahead log (WAL)</a>. The WAL is used to maintain integrity, to fast forward the database during recovery and to maintain replication.</p>
<p>Writing to the WAL is often needed, but there are some circumstances where you might be willing to give up some of its uses to make things a bit faster. One example is intermediate tables.</p>
<p>Intermediate tables are disposable tables that store temporary data used to implement some process. For example, a very common pattern in ETL processes is to load data from a CSV file to an intermediate table, clean the data, and then load it to the target table. In this use-case, the intermediate table is disposable and there is no use for it in backups or replicas.</p>
<figure>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 321.1 91.4" width="25em"><path d="M100 53l128-1" stroke="currentColor" stroke-width="1.5" fill="none" stroke-dasharray="12 8"/><path d="M200 61c8 0 18-6 27-10M200 41c7 6 18 7 27 10" stroke="currentColor" stroke-width="1.5" fill="none"/><path d="M13 24h69l-4 59-65-6" fill="#f41d92"/><path d="M11 25c20-1 35 1 69-3m-68 3c26 0 46 2 70-1m-2 4c4 19 3 44 0 50m1-53c-2 19-3 39-1 57m0 2c-13-4-32-2-70 0m69-5c-14 1-29 4-67 3m0-2c-2-10 2-23 4-55m-4 58l3-57" stroke="currentColor" fill="none"/><path d="M236 25l68 1-5 47-58 6" fill="#e08fff"/><path d="M239 30c24-6 40-6 59-7m-59 5l61-2m6-2c-5 11 0 26-6 48m2-44c-1 17 1 37 1 48m-3 2c-13-3-39 0-62-6m65 4l-67-1m4-2c0-15-4-26 0-48m-1 50V27" stroke="currentColor" fill="none"/><g><g stroke-opacity=".7" fill-opacity=".7" fill-rule="evenodd"><path d="M13 24l14-9 13-6 22 6 25 5-71 10" fill="#f41d92"/><path d="M15 27c5-3 13-16 25-17 11-1 48 9 44 11-3 3-55 3-67 4m-5 0c5-3 15-12 27-12s49 9 44 11-62 1-73 2" stroke="currentColor" fill="none"/></g></g><g><g stroke-opacity=".8" fill-opacity=".8" fill-rule="evenodd"><path d="M236 24l13-12 9 6 29-2 23 7-76 7" fill="#e08fff"/><path d="M240 29c3-3 7-15 18-15 11-1 51 9 47 11-3 2-56 3-67 3m-1-1c5-2 13-9 26-10 12-1 52 3 48 4l-72 2" stroke="currentColor" fill="none"/></g></g><g><path d="M134 30l32 43m-35-38l33 35" stroke="#d30101" stroke-width="4" fill="none"/></g><g><path d="M131 67l31-32m-27 37c6-6 28-34 32-41" stroke="#d30101" stroke-width="4" fill="none"/></g></svg>
<figcaption>UNLOGGED table</figcaption>
</figure>
<p>Intermediate tables that don't need to be restored in case of disaster, and are not needed in replicas, can be set as <a href="https://www.postgresql.org/docs/current/sql-createtable.html#SQL-CREATETABLE-UNLOGGED" rel="noopener">UNLOGGED</a>:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="n">UNLOGGED</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">staging_table</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="cm">/* table definition */</span><span class="w"> </span><span class="p">);</span>
</pre></div>
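<p><strong>BEWARE</strong>: Before using <code>UNLOGGED</code> make sure you understand its full implications. An unlogged table is not crash-safe: it is automatically truncated after a crash or an unclean shutdown, and its contents are not replicated to standby servers.</p>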
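<p>To make the ETL pattern described above more concrete, here is a minimal sketch. The table <code>sale_staging</code>, the file path and the cleaning rule are all hypothetical and only there for illustration; the target is the <code>sale</code> table from the previous section:</p>
<div class="highlight"><pre><span></span>-- Hypothetical staging table: raw values are loaded as text
CREATE UNLOGGED TABLE sale_staging (
    created     text,
    product_id  text,
    customer_id text
);

-- Load the raw file. The path is a placeholder, and COPY reads it on the server;
-- from psql you can use \copy to read a file on the client instead
COPY sale_staging FROM '/path/to/sales.csv' WITH (FORMAT csv, HEADER true);

-- Clean the data and load it into the regular (logged) target table
INSERT INTO sale (created, product_id, customer_id)
SELECT
    created::timestamptz,
    product_id::int,
    customer_id::int
FROM sale_staging
WHERE created IS NOT NULL;

-- The staging data is disposable
DROP TABLE sale_staging;
</pre></div>
<p>Only the staging table skips the WAL. The final <code>INSERT</code> into the target table is logged as usual, so the data you actually care about is still protected.</p>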
<hr>
<h2 id="implement-complete-processes-using-with-and-returning"><a class="toclink" href="#implement-complete-processes-using-with-and-returning">Implement Complete Processes Using <code>WITH</code> and <code>RETURNING</code></a></h2>
<p>Say you have a users table, and you find that you have some duplicates in the table:</p>
<p><details>
<summary>Table setup</summary></p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="nb">SERIAL</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="p">,</span>
<span class="w"> </span><span class="n">email</span><span class="w"> </span><span class="nb">TEXT</span><span class="w"> </span><span class="k">UNIQUE</span>
<span class="p">);</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="nb">SERIAL</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="p">,</span>
<span class="w"> </span><span class="n">user_id</span><span class="w"> </span><span class="nb">INT</span><span class="p">,</span>
<span class="w"> </span><span class="k">CONSTRAINT</span><span class="w"> </span><span class="n">orders_user_fk</span>
<span class="w"> </span><span class="k">FOREIGN</span><span class="w"> </span><span class="k">KEY</span><span class="w"> </span><span class="p">(</span><span class="n">user_id</span><span class="p">)</span>
<span class="w"> </span><span class="k">REFERENCES</span><span class="w"> </span><span class="n">USERS</span><span class="p">(</span><span class="n">id</span><span class="p">)</span>
<span class="p">);</span>
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="p">(</span><span class="n">email</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'foo@bar.baz'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'me@hakibenita.com'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="s1">'ME@hakibenita.com'</span><span class="p">);</span>
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="p">(</span><span class="n">user_id</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span>
<span class="w"> </span><span class="p">(</span><span class="mi">1</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">1</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">2</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">3</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">3</span><span class="p">);</span>
</pre></div>
<p></details></p>
<div class="highlight"><pre><span></span><span class="n">db</span><span class="o">=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">u</span><span class="p">.</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">u</span><span class="p">.</span><span class="n">email</span><span class="p">,</span><span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="n">id</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">order_id</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="n">o</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="n">u</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="n">user_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">u</span><span class="p">.</span><span class="n">id</span><span class="p">;</span>
<span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">email</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">order_id</span>
<span class="c1">----+-------------------+----------</span>
<span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">foo</span><span class="o">@</span><span class="n">bar</span><span class="p">.</span><span class="n">baz</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">foo</span><span class="o">@</span><span class="n">bar</span><span class="p">.</span><span class="n">baz</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2</span>
<span class="hll"><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">me</span><span class="o">@</span><span class="n">hakibenita</span><span class="p">.</span><span class="n">com</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">3</span>
</span><span class="hll"><span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">ME</span><span class="o">@</span><span class="n">hakibenita</span><span class="p">.</span><span class="n">com</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">4</span>
</span><span class="hll"><span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">ME</span><span class="o">@</span><span class="n">hakibenita</span><span class="p">.</span><span class="n">com</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">5</span>
</span></pre></div>
<p>The user <em>haki benita</em> registered twice, once with the email <code>ME@hakibenita.com</code> and again with <code>me@hakibenita.com</code>. Because we didn't normalize the emails when we inserted them into the table, we now have to deal with duplication.</p>
<p>To consolidate the duplicate users, we want to:</p>
<ol>
<li>Identify duplicate users by lowercase email</li>
<li>Update orders to reference one of the duplicate users</li>
<li>Remove the duplicate users from the users table</li>
</ol>
<p>One way to consolidate duplicate users is to use an intermediate table:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">UNLOGGED</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">duplicate_users</span><span class="w"> </span><span class="k">AS</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">lower</span><span class="p">(</span><span class="n">email</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">normalized_email</span><span class="p">,</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">min</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">convert_to_user</span><span class="p">,</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">array_remove</span><span class="p">(</span><span class="n">ARRAY_AGG</span><span class="p">(</span><span class="n">id</span><span class="p">),</span><span class="w"> </span><span class="n">min</span><span class="p">(</span><span class="n">id</span><span class="p">))</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">convert_from_users</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">FROM</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">users</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">normalized_email</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">HAVING</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">1</span><span class="p">;</span>
<span class="go">CREATE TABLE</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">duplicate_users</span><span class="p">;</span>
<span class="go"> normalized_email | convert_to_user | convert_from_users</span>
<span class="go">-------------------+-----------------+--------------------</span>
<span class="go"> me@hakibenita.com | 2 | {3}</span>
</pre></div>
<p>The intermediate table holds a mapping of duplicate users. For each normalized email address that appears more than once, we pick the user with the lowest ID as the user we convert all duplicates to. The other users are kept in an array column, and all the references to these users will be updated.</p>
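<p>To see how the array column is built, consider a hypothetical group of three users with the IDs 2, 3 and 7. <code>array_agg</code> collects all the IDs into an array, and <code>array_remove</code> drops the one we chose to keep:</p>
<div class="highlight"><pre><span></span>db=# SELECT array_remove(ARRAY[2, 3, 7], 2) AS convert_from_users;
 convert_from_users
--------------------
 {3,7}
</pre></div>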
<p>Using the intermediate table, we update references of duplicate users in the <code>orders</code> table:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">UPDATE</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="n">o</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">SET</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">user_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">du</span><span class="mf">.</span><span class="n">convert_to_user</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">FROM</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">duplicate_users</span><span class="w"> </span><span class="n">du</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">WHERE</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">o</span><span class="mf">.</span><span class="n">user_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">ANY</span><span class="p">(</span><span class="n">du</span><span class="mf">.</span><span class="n">convert_from_users</span><span class="p">);</span>
<span class="go">UPDATE 2</span>
</pre></div>
<p>Now that there are no more references, we can safely delete the duplicate users from the <code>users</code> table:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">DELETE</span><span class="w"> </span><span class="k">FROM</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">users</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">WHERE</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="p">(</span>
<span class="gp">db(#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">unnest</span><span class="p">(</span><span class="n">convert_from_users</span><span class="p">)</span>
<span class="gp">db(#</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">duplicate_users</span>
<span class="gp">db(#</span><span class="w"> </span><span class="p">);</span>
<span class="go">DELETE 1</span>
</pre></div>
<p>Notice that we used the function <a href="https://www.postgresql.org/docs/current/functions-array.html#ARRAY-FUNCTIONS-TABLE" rel="noopener"><code>unnest</code></a> to "transpose" the array, that is, turn each array element into a row.</p>
<p>This is the result:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">u</span><span class="mf">.</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">u</span><span class="mf">.</span><span class="n">email</span><span class="p">,</span><span class="w"> </span><span class="n">o</span><span class="mf">.</span><span class="n">id</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">order_id</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="n">o</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="n">u</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">o</span><span class="mf">.</span><span class="n">user_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">u</span><span class="mf">.</span><span class="n">id</span><span class="p">;</span>
<span class="go"> id | email | order_id</span>
<span class="go">----+-------------------+----------</span>
<span class="go"> 1 | foo@bar.baz | 1</span>
<span class="go"> 1 | foo@bar.baz | 2</span>
<span class="hll"><span class="go"> 2 | me@hakibenita.com | 3</span>
</span><span class="hll"><span class="go"> 2 | me@hakibenita.com | 4</span>
</span><span class="hll"><span class="go"> 2 | me@hakibenita.com | 5</span>
</span></pre></div>
<p>Nice, all occurrences of user 3 (ME@hakibenita.com) are converted to user 2 (me@hakibenita.com).</p>
<p>We can also verify that the duplicate users were deleted from the <code>users</code> table:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="p">;</span>
<span class="go"> id | email</span>
<span class="go">----+-------------------</span>
<span class="go"> 1 | foo@bar.baz</span>
<span class="go"> 2 | me@hakibenita.com</span>
</pre></div>
<p>Now we can get rid of the intermediate table:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">DROP</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">duplicate_users</span><span class="p">;</span>
<span class="go">DROP TABLE</span>
</pre></div>
<p>This is fine, but very long and needs cleaning up! Is there a better way?</p>
<p><strong>Using Common Table Expressions (CTE)</strong></p>
<p>Using <a href="https://www.postgresql.org/docs/current/queries-with.html" rel="noopener">Common Table Expressions</a>, also known as the <code>WITH</code> clause, we can perform the entire process with just one SQL statement:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">duplicate_users</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="k">min</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">convert_to_user</span><span class="p">,</span>
<span class="w"> </span><span class="n">array_remove</span><span class="p">(</span><span class="n">ARRAY_AGG</span><span class="p">(</span><span class="n">id</span><span class="p">),</span><span class="w"> </span><span class="k">min</span><span class="p">(</span><span class="n">id</span><span class="p">))</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">convert_from_users</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">users</span>
<span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="k">lower</span><span class="p">(</span><span class="n">email</span><span class="p">)</span>
<span class="w"> </span><span class="k">HAVING</span>
<span class="w"> </span><span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">1</span>
<span class="p">),</span>
<span class="n">update_orders_of_duplicate_users</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">UPDATE</span>
<span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="n">o</span>
<span class="w"> </span><span class="k">SET</span>
<span class="w"> </span><span class="n">user_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">du</span><span class="p">.</span><span class="n">convert_to_user</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">duplicate_users</span><span class="w"> </span><span class="n">du</span>
<span class="w"> </span><span class="k">WHERE</span>
<span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="n">user_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">ANY</span><span class="p">(</span><span class="n">du</span><span class="p">.</span><span class="n">convert_from_users</span><span class="p">)</span>
<span class="p">)</span>
<span class="k">DELETE</span><span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">users</span>
<span class="k">WHERE</span>
<span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="k">unnest</span><span class="p">(</span><span class="n">convert_from_users</span><span class="p">)</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">duplicate_users</span>
<span class="w"> </span><span class="p">);</span>
</pre></div>
<p>Instead of creating the intermediate table, we create a common table expression and reuse it multiple times.</p>
<p><strong>Returning Results From CTE</strong></p>
<p>A nice feature of executing DML inside a <code>WITH</code> clause is that you can return data from it using the <a href="https://www.postgresql.org/docs/current/dml-returning.html" rel="noopener"><code>RETURNING</code> keyword</a>. For example, let's report the number of updated and deleted rows:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">duplicate_users</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="k">min</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">convert_to_user</span><span class="p">,</span>
<span class="w"> </span><span class="n">array_remove</span><span class="p">(</span><span class="n">ARRAY_AGG</span><span class="p">(</span><span class="n">id</span><span class="p">),</span><span class="w"> </span><span class="k">min</span><span class="p">(</span><span class="n">id</span><span class="p">))</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">convert_from_users</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">users</span>
<span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="k">lower</span><span class="p">(</span><span class="n">email</span><span class="p">)</span>
<span class="w"> </span><span class="k">HAVING</span>
<span class="w"> </span><span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">1</span>
<span class="p">),</span>
<span class="n">update_orders_of_duplicate_users</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">UPDATE</span>
<span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="n">o</span>
<span class="w"> </span><span class="k">SET</span>
<span class="w"> </span><span class="n">user_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">du</span><span class="p">.</span><span class="n">convert_to_user</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">duplicate_users</span><span class="w"> </span><span class="n">du</span>
<span class="w"> </span><span class="k">WHERE</span>
<span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="n">user_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">ANY</span><span class="p">(</span><span class="n">du</span><span class="p">.</span><span class="n">convert_from_users</span><span class="p">)</span>
<span class="hll"><span class="w"> </span><span class="n">RETURNING</span><span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="n">id</span>
</span><span class="p">),</span>
<span class="n">delete_duplicate_user</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">DELETE</span><span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">users</span>
<span class="w"> </span><span class="k">WHERE</span>
<span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="k">unnest</span><span class="p">(</span><span class="n">convert_from_users</span><span class="p">)</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">duplicate_users</span>
<span class="w"> </span><span class="p">)</span>
<span class="hll"><span class="w"> </span><span class="n">RETURNING</span><span class="w"> </span><span class="n">id</span>
</span><span class="p">)</span>
<span class="k">SELECT</span>
<span class="hll"><span class="w"> </span><span class="p">(</span><span class="k">SELECT</span><span class="w"> </span><span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">update_orders_of_duplicate_users</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">orders_updated</span><span class="p">,</span>
</span><span class="hll"><span class="w"> </span><span class="p">(</span><span class="k">SELECT</span><span class="w"> </span><span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">delete_duplicate_user</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">users_deleted</span>
</span><span class="p">;</span>
</pre></div>
<p>This is the result:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">orders_updated</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">users_deleted</span>
<span class="c1">----------------+---------------</span>
<span class="w"> </span><span class="mf">2</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">1</span>
</pre></div>
<p>The main appeal of this approach is that the entire process is executed in a single command, so no need to manage a transaction or worry about cleaning up the intermediate table if the process fails.</p>
<p><strong>CAUTION</strong>: <a href="https://www.reddit.com/r/programming/comments/hyv0xh/some_sql_tricks_of_an_application_dba/fzhqzw5?utm_source=share&utm_medium=web2x" rel="noopener">A reader on Reddit</a> pointed me to a possibly <a href="https://www.postgresql.org/docs/current/queries-with.html#QUERIES-WITH-MODIFYING" rel="noopener">unpredictable behavior of executing DML in common table expressions</a>:</p>
<blockquote>
<p>The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable</p>
</blockquote>
<p>This means you cannot rely on the order in which independent sub-statements are executed. It seems that when there is a dependency between sub-statements, like in the example above, you can rely on a sub-statement being executed before the statement that depends on it.</p>
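<p>If the execution order is a concern, one alternative is to keep the separate steps from before and wrap them in an explicit transaction. The following is only a sketch under assumptions: it uses a temporary table with <code>ON COMMIT DROP</code> (not part of the original process) so the intermediate table is cleaned up automatically, and it assumes the intermediate table has the same columns as the one shown above:</p>
<div class="highlight"><pre><span></span>BEGIN;

-- Assumption: a temporary table stands in for the intermediate table.
-- ON COMMIT DROP removes it automatically when the transaction ends.
CREATE TEMPORARY TABLE duplicate_users ON COMMIT DROP AS
    SELECT
        lower(email) AS normalized_email,
        min(id) AS convert_to_user,
        array_remove(ARRAY_AGG(id), min(id)) AS convert_from_users
    FROM
        users
    GROUP BY
        lower(email)
    HAVING
        count(*) &gt; 1;

-- Step 1: point the orders of duplicate users to the user we keep.
UPDATE
    orders o
SET
    user_id = du.convert_to_user
FROM
    duplicate_users du
WHERE
    o.user_id = ANY(du.convert_from_users);

-- Step 2: delete the duplicate users, which are no longer referenced.
DELETE FROM
    users
WHERE
    id IN (
        SELECT unnest(convert_from_users)
        FROM duplicate_users
    );

COMMIT;
</pre></div>
<p>The statements run in the order they are written, and if any of them fails the entire transaction is rolled back. The price is that you are back to managing a transaction yourself.</p>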
<hr>
<h2 id="avoid-indexes-on-columns-with-low-selectivity"><a class="toclink" href="#avoid-indexes-on-columns-with-low-selectivity">Avoid Indexes on Columns With Low Selectivity</a></h2>
<p>Say you have a registration process where users sign up with an email address. To activate the account, they have to verify their email. Your table can look like this:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="p">(</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="nb">serial</span><span class="p">,</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">username</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">activated</span><span class="w"> </span><span class="nb">boolean</span>
<span class="gp">db-#</span><span class="p">);</span>
<span class="go">CREATE TABLE</span>
</pre></div>
<p>Most of your users are good citizens: they sign up with a valid email and immediately activate the account. Let's populate the table with user data, where roughly 90% of the users are activated:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="p">(</span><span class="n">username</span><span class="p">,</span><span class="w"> </span><span class="n">activated</span><span class="p">)</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">md5</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="o">::</span><span class="nb">text</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">username</span><span class="p">,</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">0.9</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">activated</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">FROM</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">1000000</span><span class="p">);</span>
<span class="go">INSERT 0 1000000</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">activated</span><span class="p">,</span><span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">activated</span><span class="p">;</span>
<span class="go"> activated | count</span>
<span class="go">-----------+--------</span>
<span class="go"> f | 102567</span>
<span class="go"> t | 897433</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">VACUUM</span><span class="w"> </span><span class="k">ANALYZE</span><span class="w"> </span><span class="n">users</span><span class="p">;</span>
<span class="go">VACUUM</span>
</pre></div>
<p>To query for activated and unactivated users, you might be tempted to create an index on the column <code>activated</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">users_activated_ix</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">users</span><span class="p">(</span><span class="n">activated</span><span class="p">);</span>
<span class="go">CREATE INDEX</span>
</pre></div>
<p>When you try to query <em>unactivated users</em>, the database uses the index:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="n">activated</span><span class="p">;</span>
<span class="go"> QUERY PLAN</span>
<span class="go">--------------------------------------------------------------------------------------</span>
<span class="go"> Bitmap Heap Scan on users (cost=1923.32..11282.99 rows=102567 width=38)</span>
<span class="go"> Filter: (NOT activated)</span>
<span class="hll"><span class="go"> -> Bitmap Index Scan on users_activated_ix (cost=0.00..1897.68 rows=102567 width=0)</span>
</span><span class="go"> Index Cond: (activated = false)</span>
</pre></div>
<p>The database estimated that the filter will result in 102,567 rows, which is roughly 10% of the table. This is consistent with the data we populated, so the database has a good sense of the data.</p>
<p>However, when you try to query for <em>activated users</em> you find that the database decided <em>not to use the index</em>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">activated</span><span class="p">;</span>
<span class="go"> QUERY PLAN</span>
<span class="go">---------------------------------------------------------------</span>
<span class="go"> Seq Scan on users (cost=0.00..18334.00 rows=897433 width=38)</span>
<span class="go"> Filter: activated</span>
</pre></div>
<p>Many developers are often baffled when the database is not using an index. One way of explaining why an index is not always the best choice is this: <strong>if you had to read the entire table, would you use the index?</strong></p>
<p>The answer is probably no, because why would you? Reading from disk is expensive and you want to read as little as possible. If for example, a table is 10MB and the index is 1MB, to read the entire table you would have to read 10MB from disk. To read the table using the index you would have to read 11MB from disk. This is wasteful.</p>
<p>With this understanding, let's have a look at the statistics PostgreSQL gathers on the table:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">attname</span><span class="p">,</span><span class="w"> </span><span class="n">n_distinct</span><span class="p">,</span><span class="w"> </span><span class="n">most_common_vals</span><span class="p">,</span><span class="w"> </span><span class="n">most_common_freqs</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_stats</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">tablename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'users'</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">attname</span><span class="o">=</span><span class="s1">'activated'</span><span class="p">;</span>
<span class="go">------------------+------------------------</span>
<span class="go">attname | activated</span>
<span class="go">n_distinct | 2</span>
<span class="go">most_common_vals | {t,f}</span>
<span class="go">most_common_freqs | {0.89743334,0.10256667}</span>
</pre></div>
<p>When PostgreSQL analyzed the table it found that the column <code>activated</code> has two distinct values. The value <code>t</code> in the <code>most_common_vals</code> column corresponds to the frequency <code>0.89743334</code> in the column <code>most_common_freqs</code>, and the value <code>f</code> corresponds to the frequency <code>0.10256667</code>. This means that after analyzing the table, the database estimates that 89.74% of the rows in the table are activated users, and the remaining 10.26% are unactivated users.</p>
<p>With these stats, PostgreSQL decided it's best to scan the entire table if it expects 90% of the rows to satisfy the condition. The threshold after which the database may decide to use or not to use the index depends on many factors, and there is no rule of thumb you can use.</p>
<figure>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 457.6 174.8" width="40em"><path d="M108 56l27 28m-28-26c4 5 22 20 27 24M156 105c4 5 24 21 29 26m-27-27l26 28M49 107l-19 24m18-25l-16 24M144 105l-18 24m17-22l-17 23M66 105l15 23m-13-23l15 24M97 61L70 78m26-15L73 76" stroke="currentColor" fill="none"/><path d="M10 133l30 1v25H10" fill="#f41d92"/><path d="M8 133c10-1 13 2 34-1m-31 1l30 1m-2 0c2 7 3 11 4 27m-1-27l-1 25m1 2c-13-3-26-3-34-2m32 0l-30 1m-2 0l4-27m-2 26c0-8-1-16 1-25" stroke="currentColor" fill="none"/><path d="M69 132h29l1 26H68" fill="#f41d92"/><path d="M69 133c11-2 22 0 31 1m-31-2h30m0 0c0 6 2 18 0 25m-1-26l2 26m-3-2c-11 1-22 2-28 4m30-1c-10-2-18 0-30 0m0-2c-1-8 0-19-3-25m2 25l-1-25" stroke="currentColor" fill="none"/><path d="M112 131l30-1 3 28h-32" fill="#f41d92"/><path d="M114 131c12-1 23 1 31 1m-33 0h31m1 1v25m-1-26c1 6 2 11 0 25m0 1c-5 0-13 1-32-3m33 3c-11-2-22-1-31 0m1 0c-2-9-2-14-1-28m0 27l-1-26" stroke="currentColor" fill="none"/><path d="M168 134l30-1 1 26-33 1" fill="#f41d92"/><path d="M168 131c7 2 19 2 30 3m-32-1l33 1m-3 0c0 8 3 19 1 23m1-24l-1 25m-1 1h-31m32-1c-7-1-15 1-30 1m-2-3c3-5 2-17 1-22m1 25v-25" stroke="currentColor" fill="none"/><path d="M43 79l32-2 1 28-33-2" fill="#f41d92"/><path d="M44 80c7-3 13-1 32-3m-32 0h32m-1 1c-1 7-2 15 2 27m-2-28v26m-1 2l-28-1m30-1H43m3-1l-3-26m1 26l1-23" stroke="currentColor" fill="none"/><path d="M129 79l34 2v25l-32-1" fill="#f41d92"/><path d="M132 80c4-2 12-1 31-2m-32 2h32m-1 1c-2 5 1 15 1 25m-1-26v25m0-1c-5 2-14 1-32-1m32 2h-31m0 0V80m-1 25l1-26" stroke="currentColor" fill="none"/><path d="M85 36l29-1-1 25-29 1" fill="#f41d92"/><path d="M85 35c10-2 20-2 29-1m-31 0h32m-1 1c1 5 3 12 2 26m-2-26l1 25m0-1c-10-1-19 1-31 2m31-1H84m-2 1c4-9 2-18 1-27m1 26V34M354 62l29 27m-27-27l27 26M404 111l28 27m-29-27l29 25M297 112l-18 24m17-24l-18 22M392 112l-20 23m19-24l-19 25M316 109l13 26m-14-26l13 24M344 67l-25 15m23-16l-24 17" stroke="currentColor" fill="none"/><path d="M256 134l30-1v27h-32" fill="#f41d92"/><path 
d="M257 133c11-1 21-1 32 2m-33-2l31 1m1-2c0 12-1 22-3 27m2-25v24m1 1c-8 0-16 2-32 0m30 0c-9 0-20-1-29 1m-2 0c0-10 3-19 2-27m0 27l-1-26" stroke="currentColor" fill="none"/><path d="M315 134h33l-1 25-31 3" fill="#fc54ee"/><path d="M314 136l34-2m-31 1l29 1m1 1v22m-1-24c2 5 2 11 1 25m2 0h-32m31 0l-33 1m3 1c-2-8-2-13-2-26m-1 25c0-10 0-20 2-27" stroke="currentColor" fill="none"/><path d="M334 41l29-3 2 26-33 2" fill="#fc54ee"/><path d="M334 40l30-3m-32 2h32m0 1l-1 27m1-28v25m0 1c-10-1-17 0-31-2m31 2l-32-1m2 0c-2-6 0-9-1-24m1 24c-2-8 0-19-1-26" stroke="currentColor" fill="none"/><g><path d="M359 134l28 2 1 23-29 1" fill="#e08fff"/><path d="M359 134l28 1m-30-1h32m0-1c-1 10-1 18 2 27m-3-26v26m1-1c-8 2-14 0-33 2m33 0c-11-2-21-2-32-2m0 0c0-7 2-15 1-23m-1 23v-24" stroke="currentColor" fill="none"/></g><g><path d="M380 85h34l-2 27-31-1" fill="#f071ff"/><path d="M381 85c8 0 16-1 32 2m-32-1h31m0 0c0 7-1 18 1 27m-1-28v26m0 0c-6 1-16-1-29 1m30-1h-32m0 0l2-25m-2 26l1-25" stroke="currentColor" fill="none"/></g><g><path d="M289 86l32-2 2 26-33-1" fill="#f938c5"/><path d="M290 84l32 2m-31-2l31 1m-1-1c0 7 3 16 2 27m-2-27v25m0 2c-9-2-20 0-30-1m30-1l-30 1m-1 0c2-8 1-16 1-24m0 24V84" stroke="currentColor" fill="none"/></g><g><path d="M418 138l29 3 1 23-32 1" fill="#daaeff"/><path d="M416 138c11 0 23 2 33 1m-33 1h32m-1-1c-1 11 1 18 2 26m-2-25v25m2 0c-12-2-21 0-32-1m30 1h-30m0 1l1-25m-1 23l-1-25" stroke="currentColor" fill="none"/></g><g><path d="M180 11c5 0 12 4 16 8 3 4 6 10 5 15 0 5-2 10-6 14-3 4-9 7-14 8s-11 1-16-2c-4-2-9-8-11-12-2-5-1-11 1-16 2-4 6-11 10-13 5-3 13-2 16-2s0 1 0 1m-9-2c5-1 13 2 18 5 4 2 8 7 9 11 2 5 1 13-1 17-2 5-5 10-10 12s-12 4-17 3c-6-1-11-6-14-10s-4-10-3-15c0-5 1-11 4-15 4-4 12-8 15-9 2-1 0 2 0 3" fill="#D30101"/><path d="M169 12c4-2 12-2 17 0s8 6 11 11c2 4 4 9 4 14-1 5-5 12-9 15-3 4-9 6-15 6-5 0-11-2-15-6-5-3-8-10-9-15s-1-11 2-15 12-9 15-11c4-2 5-1 5-1m9 2c5 1 10 3 12 7 3 4 5 12 4 17 0 5-4 10-7 13-4 3-10 7-15 7-4 1-10 0-14-3s-9-10-11-15c-1-4 0-9 2-13 2-5 7-11 12-14 
4-2 13-1 15-1 3 0 2 2 1 3" stroke="currentColor" fill="none"/></g><g><path d="M167 21l20 25m-22-26l21 28" stroke="#fff" fill="none"/></g><g><path d="M189 26l-21 19m20-20l-18 21" stroke="#fff" stroke-width="2" fill="none"/></g><g><path d="M426 18c5 0 12 4 15 8 4 4 6 10 6 15-1 5-3 12-7 16-3 3-9 6-14 7-6 1-12 0-16-2-4-3-9-9-10-13-2-5-1-12 1-16 1-5 4-9 9-11s16-2 21-2l4 2m-1-2c4 2 7 7 9 12 2 4 3 12 1 17-1 5-5 10-9 12-4 3-11 5-17 5-5-1-10-4-14-8l-7-17c0-5 2-10 6-14 3-4 9-9 14-10s13 4 15 4l2 1" fill="#51950F"/><path d="M431 20c5 1 9 7 11 11s5 9 4 14-5 11-9 15c-4 3-10 6-15 5-5 0-12-3-16-7-4-3-6-8-7-13s-1-12 2-16c3-5 9-9 14-11 6-1 15 2 18 2 4 1 3 3 3 3m-4-2c4 2 10 7 12 12 2 4 3 11 2 15-1 5-5 10-10 13-4 3-12 6-17 5s-10-5-13-9c-4-4-7-10-8-15 0-5 3-9 6-13s8-11 12-12c5-1 14 4 16 5 3 0 0-2 0-1" stroke="currentColor" fill="none"/></g><g><path d="M439 32l-21 18m20-19l-20 17" stroke="#fff" stroke-width="2" fill="none"/></g><g><path d="M408 42l12 9m-11-11l10 9" stroke="#fff" stroke-width="2" fill="none"/></g></svg>
<figcaption>Index for a column with low selectivity vs. high selectivity</figcaption>
</figure>
<hr>
<h2 id="use-partial-indexes"><a class="toclink" href="#use-partial-indexes">Use Partial Indexes</a></h2>
<p>In the previous section we created an index on a boolean column where ~90% of the values were true (activated users). When we tried to query for activated users, the database did not use the index. However, when we queried unactivated users the database did use the index.</p>
<p>This brings us to the next question... if the database is not going to use the index to filter activated users, why should we index them in the first place?</p>
<p>Before we answer this question let's look at how much the full index on the <code>activated</code> column weighs:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="kp">\di+</span><span class="w"> </span><span class="ss">users_activated_ix</span>
<span class="go"> Schema | Name | Type | Owner | Table | Size</span>
<span class="go">--------+--------------------+-------+-------+-------+------</span>
<span class="go"> public | users_activated_ix | index | haki | users | 21 MB</span>
</pre></div>
<p>The index is 21MB. Just for reference, the <code>users</code> table is 65MB. This means the index weighs ~32% the size of the table. We also know that ~90% of the index is likely not going to be used.</p>
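<p>As a rough sketch, the same comparison can be made in a single query using the built-in functions <code>pg_relation_size</code> and <code>pg_size_pretty</code>. The column aliases are made up for illustration, and the exact figures may differ slightly from the ones quoted above depending on which size function is used:</p>
<div class="highlight"><pre><span></span>SELECT
    -- Size of the table's main data, without its indexes
    pg_size_pretty(pg_relation_size('users')) AS table_size,
    -- Size of the full index on the activated column
    pg_size_pretty(pg_relation_size('users_activated_ix')) AS index_size,
    -- Index size as a percentage of the table size
    round(
        100.0 * pg_relation_size('users_activated_ix') / pg_relation_size('users'),
        1
    ) AS index_pct_of_table;
</pre></div>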
<p>In PostgreSQL, there is a way to create an index on only a part of the table, using what's called a <a href="https://www.postgresql.org/docs/current/indexes-partial.html" rel="noopener">partial index</a>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">users_unactivated_partial_ix</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">users</span><span class="p">(</span><span class="n">id</span><span class="p">)</span>
<span class="hll"><span class="gp">db-#</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="k">not</span><span class="w"> </span><span class="n">activated</span><span class="p">;</span>
</span><span class="go">CREATE INDEX</span>
</pre></div>
<p>Using a <code>WHERE</code> clause, we restrict the rows indexed by the index. Let's first make sure it works:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="k">not</span><span class="w"> </span><span class="n">activated</span><span class="p">;</span>
<span class="go"> QUERY PLAN</span>
<span class="go">------------------------------------------------------------------------------------------------</span>
<span class="go"> Index Scan using users_unactivated_partial_ix on users (cost=0.29..3493.60 rows=102567 width=38)</span>
</pre></div>
<p>Amazing, the database was smart enough to understand that the predicate we used in the query can be satisfied by the partial index.</p>
<p>There is another benefit to using partial indexes:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="kp">\di+</span><span class="w"> </span><span class="ss">users_unactivated_partial_ix</span>
<span class="go"> List of relations</span>
<span class="go"> Schema | Name | Type | Owner | Table | Size</span>
<span class="go">--------+------------------------------+-------+-------+-------+---------</span>
<span class="go"> public | users_unactivated_partial_ix | index | haki | users | 2216 kB</span>
</pre></div>
<p>The partial index weighs only 2.2MB. The full index on the column weighed 21MB. The partial index is roughly 10% of the size of the full index, which matches the ratio of inactive users in the table.</p>
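<p>As a quick sanity check of that ratio, here is the arithmetic on the two sizes reported by <code>\di+</code>. Keep in mind that <code>psql</code> rounds the sizes it prints, so the result is only approximate:</p>

```python
# Sizes as reported by \di+ in the listings above (psql rounds these values).
full_index_kb = 21 * 1024      # 21 MB, converted to kB
partial_index_kb = 2216        # 2216 kB

ratio = partial_index_kb / full_index_kb
print(f"partial / full = {ratio:.1%}")
```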
<hr>
<h2 id="always-load-sorted-data"><a class="toclink" href="#always-load-sorted-data">Always Load Sorted Data</a></h2>
<p>This is one of the things I comment most about in code reviews. It's not as intuitive as the other tips and it can have a huge impact on performance.</p>
<p>Say you have a large sales fact table:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">sale_fact</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="w"> </span><span class="nb">serial</span><span class="p">,</span><span class="w"> </span><span class="n">username</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span><span class="w"> </span><span class="n">sold_at</span><span class="w"> </span><span class="nb">date</span><span class="p">);</span>
<span class="go">CREATE TABLE</span>
</pre></div>
<p>Every night, during some ETL process, you load data into the table:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">sale_fact</span><span class="w"> </span><span class="p">(</span><span class="n">username</span><span class="p">,</span><span class="w"> </span><span class="n">sold_at</span><span class="p">)</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">md5</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="o">::</span><span class="nb">text</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">username</span><span class="p">,</span>
<span class="gp">db-#</span><span class="w"> </span><span class="s1">'2020-01-01'</span><span class="o">::</span><span class="nb">date</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="nb">interval</span><span class="w"> </span><span class="s1">'1 day'</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">round</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">365</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">2</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">sold_at</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">FROM</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">100000</span><span class="p">);</span>
<span class="go">INSERT 0 100000</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">VACUUM</span><span class="w"> </span><span class="k">ANALYZE</span><span class="w"> </span><span class="n">sale_fact</span><span class="p">;</span>
<span class="go">VACUUM</span>
</pre></div>
<p>To fake a loading process we used random data. We inserted 100K rows with random usernames and sale dates spread over the two years starting at 2020-01-01.</p>
<p>The table is used mostly to produce summary sales reports. Most reports filter by date to get the sales at a specific period. To speed up range scans you create an index on <code>sold_at</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">sale_fact_sold_at_ix</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">sale_fact</span><span class="p">(</span><span class="n">sold_at</span><span class="p">);</span>
<span class="go">CREATE INDEX</span>
</pre></div>
<p>Let's look at the execution plan of a query to fetch all sales made in July 2020:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">)</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sale_fact</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">sold_at</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="s1">'2020-07-01'</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="s1">'2020-07-31'</span><span class="p">;</span>
<span class="go"> QUERY PLAN</span>
<span class="go">-----------------------------------------------------------------------------------------------</span>
<span class="go"> Bitmap Heap Scan on sale_fact (cost=108.30..1107.69 rows=4293 width=41)</span>
<span class="go"> Recheck Cond: ((sold_at >= '2020-07-01'::date) AND (sold_at <= '2020-07-31'::date))</span>
<span class="go"> Heap Blocks: exact=927</span>
<span class="go"> -> Bitmap Index Scan on sale_fact_sold_at_ix (cost=0.00..107.22 rows=4293 width=0)</span>
<span class="go"> Index Cond: ((sold_at >= '2020-07-01'::date) AND (sold_at <= '2020-07-31'::date))</span>
<span class="go"> Planning Time: 0.191 ms</span>
<span class="go"> Execution Time: 5.906 ms</span>
</pre></div>
<p>After executing the query several times to warm up the cache, the timing settled at ~6ms.</p>
<p><strong>Bitmap Scan</strong></p>
<p>Looking at the execution plan, we can see that the database used a bitmap scan. A bitmap scan works in two stages:</p>
<ul>
<li><code>Bitmap Index Scan</code>: Scan the index <code>sale_fact_sold_at_ix</code> for entries that match the condition, and map all the table pages that contain relevant rows.</li>
<li><code>Bitmap Heap Scan</code>: Read the pages that contain relevant rows, and find the rows inside these pages that satisfy the condition.</li>
</ul>
<p>Pages can contain multiple rows. The first step uses the index to find <em>pages</em>. The second step checks for <em>rows</em> inside these pages, hence the "Recheck Cond" operation in the execution plan.</p>
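<p>To make the two stages concrete, here is a toy model in Python. This is only a sketch of the simplified description above, not how PostgreSQL implements it: the <code>heap</code>, the <code>index</code> and both functions are made up for illustration.</p>

```python
from bisect import bisect_left, bisect_right

# Toy heap: a list of pages, each page a list of values (one value per row).
heap = [
    [5, 42, 17],   # page 0
    [8, 40, 3],    # page 1
    [41, 9, 30],   # page 2
    [1, 2, 7],     # page 3
]

# Toy index: sorted (value, page_number) entries, one per row.
index = sorted(
    (value, page_no) for page_no, page in enumerate(heap) for value in page
)
index_keys = [value for value, _ in index]


def bitmap_index_scan(low, high):
    """Stage 1: use the index to collect the pages that hold matching rows."""
    start = bisect_left(index_keys, low)
    end = bisect_right(index_keys, high)
    return {page_no for _, page_no in index[start:end]}


def bitmap_heap_scan(pages, low, high):
    """Stage 2: read each marked page once, in physical order, and recheck rows."""
    return [
        value
        for page_no in sorted(pages)
        for value in heap[page_no]
        if low <= value <= high
    ]


pages = bitmap_index_scan(40, 42)
rows = bitmap_heap_scan(pages, 40, 42)
print(pages, rows)
```

<p>Only three matching rows exist, but they live on three different pages, so the second stage has to read all three pages and discard the rows that don't satisfy the condition.</p>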
<p>At this point many DBAs and developers will call it a day and move on to the next query. BUT, there's a way to make this query better.</p>
<p><strong>Index Scan</strong></p>
<p>To make things better, we'll make a small change in how we load the data.</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">TRUNCATE</span><span class="w"> </span><span class="n">sale_fact</span><span class="p">;</span>
<span class="go">TRUNCATE TABLE</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">sale_fact</span><span class="w"> </span><span class="p">(</span><span class="n">username</span><span class="p">,</span><span class="w"> </span><span class="n">sold_at</span><span class="p">)</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">md5</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="o">::</span><span class="nb">text</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">username</span><span class="p">,</span>
<span class="gp">db-#</span><span class="w"> </span><span class="s1">'2020-01-01'</span><span class="o">::</span><span class="nb">date</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="nb">interval</span><span class="w"> </span><span class="s1">'1 day'</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">round</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">365</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">2</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">sold_at</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">FROM</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">100000</span><span class="p">)</span>
<span class="hll"><span class="gp">db-#</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">sold_at</span><span class="p">;</span>
</span><span class="go">INSERT 0 100000</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">VACUUM</span><span class="w"> </span><span class="k">ANALYZE</span><span class="w"> </span><span class="n">sale_fact</span><span class="p">;</span>
<span class="go">VACUUM</span>
</pre></div>
<p>This time, we loaded the data sorted by <code>sold_at</code>.</p>
<p>Let's see what the execution plan for the exact same query looks like now:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">)</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sale_fact</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">sold_at</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="s1">'2020-07-01'</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="s1">'2020-07-31'</span><span class="p">;</span>
<span class="go"> QUERY PLAN</span>
<span class="go">---------------------------------------------------------------------------------------------</span>
<span class="go"> Index Scan using sale_fact_sold_at_ix on sale_fact (cost=0.29..184.73 rows=4272 width=41)</span>
<span class="go"> Index Cond: ((sold_at >= '2020-07-01'::date) AND (sold_at <= '2020-07-31'::date))</span>
<span class="go"> Planning Time: 0.145 ms</span>
<span class="go"> Execution Time: 2.294 ms</span>
</pre></div>
<p>After running the query several times we get a stable timing at around 2.3ms. Compared to the previous query that took ~6ms, we get a consistent saving of ~60%.</p>
<p>Another thing we can see right away is that the database did not use a bitmap scan this time, but a "regular" index scan. Why is that?</p>
<p><strong>Correlation</strong></p>
<p>When the database is analyzing a table it collects all sorts of statistics. One of those statistics is <a href="https://www.postgresql.org/docs/current/view-pg-stats.html" rel="noopener"><strong>correlation</strong></a>:</p>
<blockquote>
<p>Statistical correlation between physical row ordering and logical ordering of the column values. This ranges from -1 to +1. When the value is near -1 or +1, an index scan on the column will be estimated to be cheaper than when it is near zero, due to reduction of random access to the disk.</p>
</blockquote>
<p>As the official documentation explains, the correlation measures how "sorted" values of a specific column are on disk.</p>
<figure>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 228.4 231.6" width="20em"><path d="M117 33l28 26m-26-28l25 26M166 81l28 28m-29-29l28 27M59 82l-18 25m20-26l-20 24M154 82l-18 24m16-24l-17 22M78 81l14 23M77 80l14 25M107 38L80 54m26-17L82 52" stroke="currentColor" fill="none"/><path d="M19 103h30l1 27H18" fill="#f41d92"/><path d="M17 105h34m-33-2c11 1 22 2 32 1m1-1c-3 12 0 20-3 28m2-27l-1 25m1 1c-8 1-19 0-31-2m31 1H19m-1-1c0-6 1-13-1-25m1 27l1-26" stroke="currentColor" fill="none"/><path d="M77 105l32 1 1 25-31-2" fill="#fc54ee"/><path d="M79 105c11 2 23 3 32 2m-33-2l32 1m-1-2c3 12 2 20 2 27m-2-26c0 6-1 10 1 26m0 2c-8-2-19-1-33-1m32-1H78m-1-1l1-22m1 23l-1-25" stroke="currentColor" fill="none"/><path d="M95 9l32 1-1 26-32-2" fill="#fc54ee"/><path d="M94 10h34m-33 0l32 1m-2 1c2 8 0 18 1 21m0-23l1 25m1 2l-32-3m31 1l-32 1m0-1c1-7-1-11 2-24m-2 25l1-27" stroke="currentColor" fill="none"/><path d="M119 105l33-1v24l-34 1" fill="#e08fff"/><path d="M120 106h31m-31-2c10 2 18 2 32 2m1 0c-1 7 0 16-4 23m2-24v26m2-1l-34 1m32-1l-30-1m-1 2c-1-8 0-11 1-25m0 25l-2-26" stroke="currentColor" fill="none"/><path d="M144 57h31l-2 23-31 3" fill="#f071ff"/><path d="M145 58l31-2m-33 2l33-1m0 1c-3 7-2 10-2 24m1-26c-1 8 0 15 1 26m-1 0h-31m31 0c-8-1-16 1-32-1m2-1c-2-4 1-7 0-25m0 26l-1-25" stroke="currentColor" fill="none"/><path d="M53 54l32-1-2 26H54" fill="#f938c5"/><path d="M54 57l30-3m-30 2l31-1m0-1c1 6-2 11-1 26m-1-26l1 26m0 1c-13 0-25 0-31-2m31 1l-31-1m-1 0c1-4 3-11 0-23m1 24V55" stroke="currentColor" fill="none"/><path d="M179 106l29-2 1 29-31-1" fill="#daaeff"/><path d="M181 107l28-2m-29 1c10 1 21 0 30-1m-1 3l1 22m-1-24c1 8 2 16 1 26m1-2c-12 0-24 2-33 1m32 0h-32m0 0c3-5 2-15 2-25m-1 25v-25" stroke="currentColor" fill="none"/><path d="M11 196l30-1-1 28-28-1" fill="#f41d92"/><path d="M10 197l31-2m-31 0h32m0 2l-1 23m0-23l-1 24m0 2c-4-2-12 0-31-2m31 0l-30 1m1-1c-3-8 0-17-3-23m1 23l1-24" stroke="currentColor" fill="none"/><path d="M81 196l30-1 2 28-31-3" fill="#fc54ee"/><path d="M82 
195h29m-31 1h31m1 1l-2 23m2-24v25m2 2c-10-2-15-2-34-1m31-1l-30 1m-1-2c0-3-1-10 3-25m-3 27l1-25" stroke="currentColor" fill="none"/><g><path d="M148 196l31 2 3 25-34-3" fill="#e08fff"/><path d="M147 197c13 0 26-2 32 1m-30-1l31-1m1 0l-3 25m2-24l-1 24m2 0c-13 3-25 0-31 0m30 0h-31m-2 0c3-9 0-13 1-24m1 25v-26" stroke="currentColor" fill="none"/></g><g><path d="M114 194l34 3-1 22-34 2" fill="#f071ff"/><path d="M115 194l33 1m-33 1l31-1m0 1c-1 8-2 15-1 26m2-27l-2 26m0-1l-30 2m30-2l-31 1m1 0c-2-11 2-19 1-25m0 25c-2-6 0-11-1-26" stroke="currentColor" fill="none"/></g><g><path d="M44 198l34-1-3 23-30 3" fill="#f938c5"/><path d="M44 197c12-2 22 1 32-1m-30 1c7-2 13-1 30 0m1-1c-1 7-3 16 1 24m-2-23c-1 8-1 16 1 24m-1 2l-30-3m30 1c-7 1-14-1-31 1m1-2c-2-3-2-11 0-25m-1 26v-25" stroke="currentColor" fill="none"/></g><g><path d="M189 197l31-3-3 27-28 1" fill="#daaeff"/><path d="M188 195c5 0 15 1 30-1m-31 3c7-2 16-2 32-1m-1 0l-1 26m2-27l-1 27m2-2c-13 2-24 2-34 0m32 1h-30m-3-2c4-4 3-7 1-24m0 26c2-7 1-15 1-26" stroke="currentColor" fill="none"/></g><g><path d="M38 139l-4 44m3-45l-4 46M29 164l3 19m-4-22l6 23M44 165c-4 4-7 9-12 18m11-20l-9 21" stroke="currentColor" fill="none"/></g><g><path d="M88 142l-23 44m25-45l-26 46M70 162c-5 8-3 16-7 25m5-25l-4 25M85 171c-10 5-13 10-22 16m21-16l-20 16" stroke="currentColor" fill="none"/></g><g><path d="M141 142l27 43m-28-44l27 46M149 172c3 1 7 5 16 13m-17-16l20 17M164 163l1 22m-2-24l5 25" stroke="currentColor" fill="none"/></g><g><path d="M198 144l6 39m-7-37l9 38M194 166c3 7 10 14 13 17m-12-17l10 19M207 163v20m2-20l-4 22" stroke="currentColor" fill="none"/></g></svg>
<figcaption>correlation = 1</figcaption>
</figure>
<p>When the correlation is 1, or close to 1, it means the rows of the table are stored on disk in roughly the same order as the values in the column. This is actually very common. For example, auto incrementing IDs will usually have a correlation close to 1. Date and timestamp columns that keep track of when rows were created will also usually have a correlation close to 1.</p>
<p>When the correlation is -1, the rows of the table are stored on disk in reverse order relative to the values in the column.</p>
<figure>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 238.1 229.4" width="20em"><path d="M158 192l32-4v29l-31-1" fill="#f41d92"/><path d="M158 191c6-2 16 1 31 1m-31-2h31m0 2c-2 5-1 12 1 24m-1-26l-1 26m2-2c-9 0-18 3-34 4m34-3h-33m0 1c2-7-1-14 2-26m-2 25c2-5 1-9 0-25" stroke="currentColor" fill="none"/><path d="M47 195h28l3 25-33-1" fill="#fc54ee"/><path d="M46 193c9 2 23 1 32 0m-33 0c10 2 20 1 31 1m3-2c-2 10-2 21-2 28m0-27v26m-2 0c-12-1-20-1-27 1m28-1H45m1 2l-1-29m0 28v-26" stroke="currentColor" fill="none"/><path d="M11 192l31 3v24H10" fill="#e08fff"/><path d="M9 193c10 3 18 2 31 1m-31-1l32 2m-2-1l3 25m-2-26l1 27m-1-2c-8 3-17 3-29 1m30 1l-31-2m1 2c-3-8 0-16-3-26m3 25c0-8-2-17-1-24" stroke="currentColor" fill="none"/><path d="M120 191h35l-2 24-31-1" fill="#f071ff"/><path d="M121 191l33-2m-32 2h32m-1-1v26m0-26l1 25m-2-1c-9 0-20 2-31 1m33 0l-33 1m2-2l-1-23m0 25l1-25" stroke="currentColor" fill="none"/><path d="M198 191l31-1-1 25-32 2" fill="#f938c5"/><path d="M198 192l29-1m-29-1h31m-2 0c2 11 2 20 2 25m-1-24v25m-1 1c-7 0-12-3-31 0m32-1l-31-1m-2 1c3-9 0-16 3-28m-1 28v-27" stroke="currentColor" fill="none"/><path d="M86 193l29-1v24l-29 3" fill="#daaeff"/><path d="M86 192c7-2 14-1 30 0m-32 0c7-1 16 1 32 1m0 0l-1 24m1-26v26m1-1c-8 2-19 1-31 0m30 1H84m1 1l-2-28m2 26c-1-9 0-18-2-24M118 32l27 28m-29-29l28 27M165 81l29 27m-27-27l26 28M59 83l-19 24m18-25l-19 24M153 81l-18 24m17-25l-17 26M76 81l15 24M75 81l15 26M106 37L81 52m25-16L80 53" stroke="currentColor" fill="none"/><path d="M18 105l31-1 1 27-32-3" fill="#f41d92"/><path d="M18 103c8 2 19 1 31 0m-31 2l31-2m0 1c-2 11-1 21 2 25m-3-26c2 9 1 17 1 26m-1 2l-31-3m32 1H17m1-1c-1-8 2-13-1-22m0 24l1-25" stroke="currentColor" fill="none"/><path d="M78 106h31l2 25-35-1" fill="#fc54ee"/><path d="M76 105c15 2 26 1 31 1m-29 0l30 1m0-3c2 6 2 14 0 28m0-26l1 25m1-1c-10 2-20 0-34 1m33 0H77m-1 0c1-6 4-16 4-25m-2 25v-25" stroke="currentColor" fill="none"/><path d="M95 10l31 2-1 22-32 1" fill="#fc54ee"/><path d="M94 11l32-2m-31 1l32-1m-3 2c1 9 3 
17 2 26m1-27c-2 8-1 16-1 25m-1 2c-7-1-18-3-31-3m32 1H95m-1-1c3-2 0-7 1-23m-1 24l1-25" stroke="currentColor" fill="none"/><g><path d="M120 106l33-1-4 24-30 2" fill="#e08fff"/><path d="M120 106c9-1 17-1 29 1m-29-2l30 1m2-2c-1 10-2 16 0 27m-2-26l1 25m1 1c-8-2-18-1-34 0m33 0c-9-1-17 0-32-2m-1 3c4-10 4-16 3-26m-2 24l1-25" stroke="currentColor" fill="none"/></g><g><path d="M144 59l30-4-1 27-29-2" fill="#f071ff"/><path d="M144 57c7-2 14-2 32-1m-32 1h31m-2 0c3 6 2 15 2 24m-1-25v25m1 2c-13-2-24-2-31-2m31 1h-33m3 2c-3-5-2-11 0-29m-3 27l1-25" stroke="currentColor" fill="none"/></g><g><path d="M54 55l28-2 1 28-32-2" fill="#f938c5"/><path d="M52 55c7 0 16 2 31 1m-30-1h31m0 1c-1 5 1 10-2 23m1-24v26m1-2c-10 1-18 2-32 1m31 0c-10 1-20 0-31-1m2 0c1-6 0-11-3-23m1 24l1-25" stroke="currentColor" fill="none"/></g><g><path d="M178 104l30 2 2 24-33 3" fill="#daaeff"/><path d="M179 105l32 1m-33-1h31m2 0l-3 28m1-27l1 25m-2-1l-30 2m31-1h-32m1 2l1-28m0 27v-25" stroke="currentColor" fill="none"/></g><g><path d="M37 140l145 43M36 139l145 43M151 184c13-1 23-4 32-4m-33 4l31-2M157 164l26 16m-27-16l25 18" stroke="currentColor" fill="none"/></g><g><path d="M88 141l-24 46m23-47l-23 46M67 161l-5 26m5-28c-1 9-3 16-3 27M83 169l-21 18m21-20l-19 19" stroke="currentColor" fill="none"/></g><g><path d="M140 142L20 185m119-44L19 186M43 167c-10 5-16 12-23 19m21-19l-22 18" stroke="currentColor" fill="none"/><path d="M50 187c-13-3-21-3-30-1m28 0c-7 0-15 1-29-1" stroke="currentColor" fill="none"/></g><g><path d="M197 145L91 185m105-40L90 186M113 166l-23 20m23-19l-24 19M121 185c-12 3-21 3-31 1m30 0H89" stroke="currentColor" fill="none"/></g></svg>
<figcaption>correlation ~ 0</figcaption>
</figure>
<p>When the correlation is close to 0, it means the values in the column have no or very little correlation to how the pages of the table are stored.</p>
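<p>To get a feel for the three cases, we can compute a plain Pearson correlation between each row's physical position and its column value. This is a simplified stand-in: PostgreSQL estimates the statistic from a sample during <code>ANALYZE</code>, and the helper names below are made up for illustration.</p>

```python
import random


def correlation(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def physical_correlation(column_values):
    """Correlate each row's position in the table with its column value."""
    positions = range(len(column_values))
    return correlation(positions, column_values)


random.seed(0)
values = list(range(1000))
shuffled = random.sample(values, len(values))

sorted_corr = physical_correlation(values)          # stored in column order
reversed_corr = physical_correlation(values[::-1])  # stored in reverse order
shuffled_corr = physical_correlation(shuffled)      # stored in random order
print(sorted_corr, reversed_corr, shuffled_corr)
```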
<p>Going back to our <code>sale_fact</code> table, when we loaded the data into the table without sorting it first, these were the correlations:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">tablename</span><span class="p">,</span><span class="w"> </span><span class="n">attname</span><span class="p">,</span><span class="w"> </span><span class="n">correlation</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_stats</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">tablename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'sale_fact'</span><span class="p">;</span>
<span class="go"> tablename | attname | correlation</span>
<span class="go">-----------+----------+--------------</span>
<span class="go"> sale_fact | id | 1</span>
<span class="go"> sale_fact | username | -0.005344716</span>
<span class="hll"><span class="go"> sale_fact | sold_at | -0.011389783</span>
</span></pre></div>
<p>The auto generated column <code>id</code> has a correlation of 1. The <code>sold_at</code> column has a very low correlation: consecutive values are scattered across the entire table.</p>
<p>When we loaded sorted data into the table, these were the correlations calculated by the database:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">tablename</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">attname</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">correlation</span>
<span class="c1">-----------+----------+----------------</span>
<span class="w"> </span><span class="n">sale_fact</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">1</span>
<span class="w"> </span><span class="n">sale_fact</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">username</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">-</span><span class="mf">0.00041992788</span>
<span class="hll"><span class="w"> </span><span class="n">sale_fact</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">sold_at</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">1</span>
</span></pre></div>
<p>The correlation for <code>sold_at</code> is now 1.</p>
<p>So why did the database use a bitmap scan when the correlation was low, and an index scan when the correlation was close to 1?</p>
<ul>
<li>When the correlation was 1, the database estimated that rows in the requested range are likely to be in consecutive pages. In this case, an index scan is likely to read very few pages.</li>
<li>When the correlation was close to 0, the database estimated that rows in the requested range are likely to be scattered across the entire table. In this case, it makes sense to use a bitmap scan to map the table pages in which rows exist, and only then fetch them and apply the condition.</li>
</ul>
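<p>A rough simulation shows why the estimates differ so much. The sketch below counts how many pages hold at least one row in the July 2020 range used by the query, once for a random load and once for a sorted load. The number of rows per page is a made-up figure (the real one depends on row width), and <code>pages_touched</code> is a hypothetical helper:</p>

```python
import random

ROWS = 100_000
ROWS_PER_PAGE = 107      # hypothetical; the real figure depends on row width
DAYS = 365 * 2           # sale dates span two years, as in the INSERT above


def pages_touched(day_offsets, first_day, last_day):
    """Count distinct pages holding at least one row in [first_day, last_day]."""
    return len({
        position // ROWS_PER_PAGE
        for position, day in enumerate(day_offsets)
        if first_day <= day <= last_day
    })


random.seed(0)
random_load = [random.randint(0, DAYS) for _ in range(ROWS)]
sorted_load = sorted(random_load)

# July 2020 expressed as day offsets from 2020-01-01 (2020 is a leap year).
JULY_FIRST, JULY_LAST = 182, 212

random_pages = pages_touched(random_load, JULY_FIRST, JULY_LAST)
sorted_pages = pages_touched(sorted_load, JULY_FIRST, JULY_LAST)
total_pages = -(-ROWS // ROWS_PER_PAGE)  # ceiling division

print(f"{random_pages} of {total_pages} pages (random load)")
print(f"{sorted_pages} of {total_pages} pages (sorted load)")
```

<p>With the random load, almost every page contains at least one matching row. With the sorted load, the matching rows sit next to each other in a small number of consecutive pages.</p>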
<p>The next time you load data into a table, think about how the data is going to be queried, and make sure you sort it in a way that indexes used for range scans can benefit from.</p>
<p><strong><code>CLUSTER</code> Command</strong></p>
<p>Another way of "sorting a table on disk" by a specific index is to use the <a href="https://www.postgresql.org/docs/current/sql-cluster.html" rel="noopener"><code>CLUSTER</code> command</a>.</p>
<p>For example:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">TRUNCATE</span><span class="w"> </span><span class="n">sale_fact</span><span class="p">;</span>
<span class="go">TRUNCATE TABLE</span>
<span class="go">-- Insert rows without sorting</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">sale_fact</span><span class="w"> </span><span class="p">(</span><span class="n">username</span><span class="p">,</span><span class="w"> </span><span class="n">sold_at</span><span class="p">)</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">md5</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="o">::</span><span class="nb">text</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">username</span><span class="p">,</span>
<span class="gp">db-#</span><span class="w"> </span><span class="s1">'2020-01-01'</span><span class="o">::</span><span class="nb">date</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="nb">interval</span><span class="w"> </span><span class="s1">'1 day'</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">round</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">365</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">2</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">sold_at</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">FROM</span>
<span class="gp">db-#</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">100000</span><span class="p">);</span>
<span class="go">INSERT 0 100000</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">ANALYZE</span><span class="w"> </span><span class="n">sale_fact</span><span class="p">;</span>
<span class="go">ANALYZE</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">tablename</span><span class="p">,</span><span class="w"> </span><span class="n">attname</span><span class="p">,</span><span class="w"> </span><span class="n">correlation</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_stats</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">tablename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'sale_fact'</span><span class="p">;</span>
<span class="go"> tablename | attname | correlation</span>
<span class="go">-----------+-----------+----------------</span>
<span class="hll"><span class="go"> sale_fact | sold_at | -5.9702674e-05</span>
</span><span class="go"> sale_fact | id | 1</span>
<span class="go"> sale_fact | username | 0.010033822</span>
</pre></div>
<p>We loaded data into the table in random order and as a result the correlation of <code>sold_at</code> is close to zero.</p>
<p>To "rearrange" the table by <code>sold_at</code>, we used the <code>CLUSTER</code> command to sort the table on disk according to the index <code>sale_fact_sold_at_ix</code>:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="gp">db=#</span><span class="w"> </span><span class="k">CLUSTER</span><span class="w"> </span><span class="n">sale_fact</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">sale_fact_sold_at_ix</span><span class="p">;</span>
</span><span class="go">CLUSTER</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">ANALYZE</span><span class="w"> </span><span class="n">sale_fact</span><span class="p">;</span>
<span class="go">ANALYZE</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">tablename</span><span class="p">,</span><span class="w"> </span><span class="n">attname</span><span class="p">,</span><span class="w"> </span><span class="n">correlation</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_stats</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">tablename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'sale_fact'</span><span class="p">;</span>
<span class="go"> tablename | attname | correlation</span>
<span class="go">-----------+----------+--------------</span>
<span class="hll"><span class="go"> sale_fact | sold_at | 1</span>
</span><span class="go"> sale_fact | id | -0.002239401</span>
<span class="go"> sale_fact | username | 0.013389298</span>
</pre></div>
<p>After the table was clustered we can see that the correlation for <code>sold_at</code> is 1.</p>
<figure>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 231 130.3" width="20em"><path d="M83 10l29 1 2 27-30-3" fill="#f41d92"/><path d="M81 10c10 1 22 2 31-1m-31 2l32-1m1 0c-1 8-2 15-1 27m-1-27c1 9 0 14 2 26m1 2l-31-2m29 0c-8-1-19 0-31 1m0-3l-1-22m2 25c-2-11-1-21 0-27" stroke="currentColor" fill="none"/><path d="M46 11l31-1v28l-29 1" fill="#fc54ee"/><path d="M49 13c11-2 23-1 31-3m-32 1c10 1 19 0 31 2m-2-1c2 7 0 14 3 24m-2-23v25m-1 1c-10-3-22-4-30-3m32 0l-31 1m-2 0c2-7 2-9 2-25m0 25V12" stroke="currentColor" fill="none"/><path d="M9 10h31l3 27-31 2" fill="#e08fff"/><path d="M12 12h28M9 13c14-2 26-2 33-2m-1 1l-1 24m2-25v27m1 0L9 36m32 1l-31-1m0 1c-2-7 2-13 0-24m0 24V11" stroke="currentColor" fill="none"/><path d="M114 9l32 2 1 26-29-2" fill="#f071ff"/><path d="M116 9c11 1 24-1 31 3m-30-2h30m0 0c-2 6 2 11-1 26m1-27c0 10-1 19 1 26m0 0c-6-1-15 0-33 1m32-2l-32 1m1-2l1-21m-2 22c2-5 2-11 0-24" stroke="currentColor" fill="none"/><path d="M151 11l29-2 3 27h-30" fill="#f938c5"/><path d="M153 10c7-1 21 0 28 1m-31-1c8-1 15 0 32 1m0-3v29m1-27v25m0 1c-12-1-26 1-34-1m34 0l-31-1m0 0c-3-6 1-14-2-24m1 26V10" stroke="currentColor" fill="none"/><path d="M187 13l31 1-1 24-30-2" fill="#daaeff"/><path d="M185 10c13 1 25 3 34 2m-33 0h30m0-1c2 6 2 14 3 27m-1-27c-2 8-2 16-2 25m1 0c-13 1-22 2-32 0m32 0l-31 2m-2-2c1-6 3-9 3-24m0 25c-2-6-1-14-1-26" stroke="currentColor" fill="none"/><path d="M13 92l35 4-3 23H15" fill="#f41d92"/><path d="M14 92c10 0 20 0 31 3m-30-2l30 2m0-2c0 12 0 20 3 25m-2-24v25m0-1c-6 2-15 1-33 3m33-1H14m1 1l-1-28m1 27V94" stroke="currentColor" fill="none"/><g><path d="M84 96l35-3-2 26-32 2" fill="#fc54ee"/><path d="M84 94c10 1 20-1 34-1m-33 2h33m-2-1c2 7 1 18-1 24m1-25l1 26m-2 2c-6-1-15-3-28 0m30-1c-11 1-21-1-31-1m1 1c-3-6-2-12-2-27m1 27l-1-27" stroke="currentColor" fill="none"/></g><g><path d="M156 96l28-2 1 25-30 1" fill="#e08fff"/><path d="M153 93c12 0 22 3 31 1m-30 1h32m-2 0c0 7 2 12 0 24m0-24l1 24m2 1h-31m30-1h-33m1-1l-2-24m2 26c0-9 1-16-1-26" stroke="currentColor" 
fill="none"/></g><g><path d="M122 92l29 1-1 27-30-3" fill="#f071ff"/><path d="M118 94c7-2 18-2 32-1m-30 1c7-2 16 0 31-1m1-1c-1 9-3 12-1 25m1-24l-1 26m0 0c-9-2-16 1-30 1m29-1l-31-1m-1 0l1-25m1 25l-1-24" stroke="currentColor" fill="none"/></g><g><path d="M48 94h33l1 26-34-2" fill="#f938c5"/><path d="M50 93c5 3 11 1 31 0m-30 1h30m-1 0c2 5-1 13-1 25m2-24c0 6-1 12 1 25m-1 0H52m30 0H49m1-1c-1-10-2-19 0-26m0 26c0-9-1-18 1-24" stroke="currentColor" fill="none"/></g><g><path d="M190 95l30 1 1 23-32 2" fill="#daaeff"/><path d="M188 94c11-1 17 1 32-1m-31 1c11 2 21 1 31 1m1 1c-2 4 0 10-1 23m2-24l-1 25m0-2c-10 4-23 4-31 3m31-1l-31 1m-1-2V96m0 25c0-9 0-14 2-27" stroke="currentColor" fill="none"/></g><g><path d="M115 50l2 33m-3-34l2 35M110 68c0 6 3 10 6 18m-8-17l7 15M121 67c-3 6-3 11-5 19m4-17l-5 15" stroke="currentColor" fill="none"/></g></svg>
<figcaption>CLUSTER command</figcaption>
</figure>
<p>Some things to note about the <code>CLUSTER</code> command:</p>
<ul>
<li>Clustering the table by a specific column may affect the correlation of other columns. See for example the correlation of the column <code>id</code> after we clustered the table by <code>sold_at</code>.</li>
<li><code>CLUSTER</code> is a heavy, blocking operation, so make sure you don't execute it on a live table.</li>
</ul>
<p>For these two reasons it's best to insert the data sorted and not rely on <code>CLUSTER</code>.</p>
<hr>
<h2 id="index-columns-with-high-correlation-using-brin"><a class="toclink" href="#index-columns-with-high-correlation-using-brin">Index Columns With High Correlation Using BRIN</a></h2>
<p>When talking about indexes, most developers will think about B-Tree indexes. But, PostgreSQL provides other types of indexes such as <a href="https://www.postgresql.org/docs/current/brin.html" rel="noopener">BRIN</a>:</p>
<blockquote>
<p>BRIN is designed for handling very large tables in which certain columns have some natural correlation with their physical location within the table</p>
</blockquote>
<p>BRIN stands for Block Range Index. According to the documentation, a BRIN index works best for columns with high correlation. As we've already seen in previous sections, some fields such as auto incrementing IDs and timestamps are naturally correlated with the physical structure of the table, hence they are good candidates for a BRIN index.</p>
<p>Under some circumstances, a BRIN index can provide a better "value for money" in terms of size and performance compared to a similar B-Tree index.</p>
<figure>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 408.4 132.1" width="35em"><path d="M9 96l29 1 1 22-29 2" fill="#f41d92"/><path d="M12 97c9 0 18 2 26 1m-28 0c11-2 22-1 28-1m-1 0c-1 9 1 18-1 21m2-21v22m1 2c-9-3-18-1-31-2m30 1l-29-1m0 1c-1-6 1-10 1-22m0 22c1-8 0-14-1-23" stroke="currentColor" fill="none"/><path d="M101 99l28 1v21l-26-1" fill="#fc54ee"/><path d="M101 100c10-2 23-1 29 1m-28-2l27 1m0-3c1 8 1 11-1 23m1-21l-1 22m2 0l-29-1m29 1h-29m1-2c-1-5 0-13 1-20m-1 23l-1-23" stroke="currentColor" fill="none"/><path d="M234 99l29-1-2 25h-29" fill="#e08fff"/><path d="M234 98c5 2 12 2 29 0m-29 0h28m-3 1c3 5 0 8 1 23m1-24v23m0-1c-9 2-15 1-29 0m29 1h-28m-2 0c3-6 2-13 3-24m0 24l-1-23" stroke="currentColor" fill="none"/><path d="M199 99l27-1 2 20-27 3" fill="#f071ff"/><path d="M198 97c14 1 24 3 28 0m-28 1c9 0 16 1 28-1m1 2v20m-1-21l1 22m-2-1c-8 0-18 0-27 2m29-2l-28 2m0 0c1-7 1-12-2-24m2 23c1-8 0-15-1-22" stroke="currentColor" fill="none"/><path d="M73 99l26 2-2 22-25-3" fill="#f938c5"/><path d="M70 101l28-3m-26 1c10 1 21 0 26-1m2 2c-3 5-3 14-2 22m1-23c1 7 0 15-1 23m-1 0c-10-1-20-2-25 0m26-1H71m1 2c-1-8 0-19-2-23m1 22V99" stroke="currentColor" fill="none"/><path d="M267 98l30-2v25l-31-2" fill="#daaeff"/><path d="M269 98c9 2 17-1 25 1m-26-1h26m2-2c-2 5 0 10 0 25m-1-22c-1 6 0 13 1 22m0-2c-9-1-18 2-28 2m27-1c-7-1-15 0-28 1m2-1c0-4-3-10-1-20m0 20V99" stroke="currentColor" fill="none"/><path d="M72 9l91 1 1 39-93 3" fill="#ced4da"/><path d="M74 12l90-2m-92 0l92-1m1 1c1 17-1 31-3 41m2-41v41m1 0c-21-2-43-1-92-2m91 1c-26 2-53 0-93 1m2-2c-1-10-3-24-2-37m1 39V11" stroke="transparent" fill="none"/><path d="M258 12l90 2 1 38-94 1" fill="#ced4da"/><path d="M257 11c32 1 67 2 93 0m-3 1l4 43m-2-1c-35 0-70-3-92 1m2-2c-4-11-2-17-1-42" stroke="transparent" stroke-width="1.5" fill="none" stroke-dasharray="3 6"/><path d="M79 20l34-1v27H80" fill="#f41d92"/><path d="M78 20l34-1m-32-1c12 2 21 1 31 0m0 1c0 10 1 15-1 26m1-25v24m0-2c-10 2-20 1-31 1m31 1c-12 1-22-1-32 1m0 0c3-10 0-22-1-27m3 
26l-1-25" stroke="currentColor" fill="none"/><path d="M120 21l32 1 2 22-31 2" fill="#fc54ee"/><path d="M121 20c7-1 16 1 31 1m-30-1h30m2-2c-1 7-3 13 0 28m-1-27v25m-2-1c-4 3-12 2-31 2m32 0l-31 1m2-2l-2-23m0 23l1-25" stroke="currentColor" fill="none"/><g><path d="M264 18l29-1 1 28-31-3" fill="#f071ff"/><path d="M264 17c5 3 15 1 29 1m-31 0c10 0 21 0 32 2m0 1c0 3-1 10 1 23m-1-26v27m0-3c-9 3-19 0-31 0m30 1l-31 2m0 0l3-25m-3 25c2-10 0-21 0-26" stroke="currentColor" fill="none"/></g><g><path d="M308 21h29l1 24-28 1" fill="#daaeff"/><path d="M309 19c8 2 14 1 30 2m-32-1h33m1 1c-3 6-1 10-1 24m0-24v24m-2-1c-12 2-22 1-31 0m31 3c-9-2-20-1-30-2m1 2c-1-8-2-17-1-26m-1 24c2-8 1-16 1-25" stroke="currentColor" fill="none"/></g><g><path d="M42 101h26l-2 21-25 1" fill="#fc54ee"/><path d="M41 102c6-2 12-1 28-4m-30 3c10-2 20-1 29-1m-2 1l3 23m-2-25v22m2 1c-11-1-22 1-31-1m30 1l-29-1m0 3c-1-9 0-15 1-26m0 25l-1-22" stroke="currentColor" fill="none"/></g><g><path d="M131 97h28v23l-24-2" fill="#f41d92"/><path d="M133 97c8-1 12-1 28 1m-28 0l27-1m2 2c-3 8 0 17 1 21m-3-22l1 21m-1 0c-7-1-18-1-25 2m26 0h-28m-1 0c0-6 1-12-1-21m2 20V99" stroke="currentColor" fill="none"/></g><g><path d="M166 100l26-2 1 24-27-1" fill="#fc54ee"/><path d="M164 98l30-1m-29 1l26 1m0 0c2 7 1 10-1 21m2-22l1 24m0 1c-9 0-14-3-27-1m27-1l-29-1m0 1c0-5 2-15 0-22m1 22l-1-22" stroke="currentColor" fill="none"/></g><g><path d="M337 99l26 1 1 20-25-1" fill="#e08fff"/><path d="M337 98c7 1 14-1 26 1m-26-2c8 2 17 0 27 1m0 0c2 6-1 13 1 20m-1-21v24m0 0l-27 1m27-1c-9 0-18 0-27-2m1 0c-1-8 0-13-3-20m2 22l-1-23" stroke="currentColor" fill="none"/></g><g><path d="M303 96l29 1-3 25-29-1" fill="#f071ff"/><path d="M302 96l29 1m-29 1h28m0 1l-2 21m1-23l1 23m0 2c-10-1-21-2-26-1m26-2l-29 1m-1 1c0-9 2-20 3-23m0 22l-1-22" stroke="currentColor" fill="none"/></g><g><path d="M369 99l29-2 2 21-28 1" fill="#daaeff"/><path d="M370 99c8-2 18 0 28-1m-27 0l28-1m1-1c-3 8-1 17-1 25m-1-23l1 23m1 0c-9 0-19 0-28-2m26 1c-8 0-17-1-27 1m0-3c1-4-2-10 0-22m0 25V98" 
stroke="currentColor" fill="none"/></g><g><path d="M20 88c0-4-14-18-1-20 14-1 67 12 82 11 16-1 7-17 9-16 2 0-8 16 3 18 11 1 49-11 61-9 12 3 9 19 11 22M19 87c0-3-15-20-1-21 15-1 70 15 86 15 15-1 5-18 7-18 1 0-7 17 4 18 11 2 48-11 60-9s8 17 10 21" stroke="#ced4da" fill="none"/></g><g><path d="M214 88c0-3-15-16 0-17s73 11 90 10c16-1 5-16 7-16 1 0-8 17 4 19 12 1 54-12 66-10s7 18 8 22m-174-5c0-4-17-20-2-22 14-1 74 14 90 14 17-1 6-18 8-17 1 0-9 17 3 18 11 1 53-12 65-11 13 2 8 18 10 22" stroke="#ced4da" fill="none"/></g></svg>
<figcaption>BRIN Index</figcaption>
</figure>
<p>A BRIN index works by keeping the range of values within a number of adjacent pages in the table. Say we have these values in a column, each in a single table page:</p>
<div class="highlight"><pre><span></span>1, 2, 3, 4, 5, 6, 7, 8, 9
</pre></div>
<p>A BRIN index works on ranges of adjacent pages in the table. If the number of adjacent pages is set to 3, the index will divide the table into the following ranges:</p>
<div class="highlight"><pre><span></span>[1,2,3], [4,5,6], [7,8,9]
</pre></div>
<p>For each range, the BRIN index <strong>keeps the minimum and maximum value</strong>:</p>
<div class="highlight"><pre><span></span>[1–3], [4–6], [7–9]
</pre></div>
<p>Using the index above, try to search for the value 5:</p>
<ul>
<li><code>[1–3]</code> - Definitely not here</li>
<li><code>[4–6]</code> - Might be here</li>
<li><code>[7–9]</code> - Definitely not here</li>
</ul>
<p>Using the BRIN index we managed to limit our search to blocks 4–6.</p>
<p>Let's take another example, this time the values in the column will have a correlation close to zero, meaning they are <em>not</em> sorted:</p>
<div class="highlight"><pre><span></span>[2,9,5], [1,4,7], [3,8,6]
</pre></div>
<p>Indexing 3 adjacent blocks produces the following ranges:</p>
<div class="highlight"><pre><span></span>[2–9], [1–7], [3–8]
</pre></div>
<p>Let's try to search for the value 5:</p>
<ul>
<li><code>[2–9]</code> - Might be here</li>
<li><code>[1–7]</code> - Might be here</li>
<li><code>[3–8]</code> - Might be here</li>
</ul>
<p>In this case the index is not limiting the search <em>at all</em>, hence it is useless.</p>
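<p>The two examples above can be replayed with a few lines of Python. This is only a toy model of the idea, not how PostgreSQL implements BRIN: the helper names <code>build_summary</code> and <code>ranges_to_scan</code> are made up for illustration, and each "page" is assumed to hold a single value, as in the examples.</p>

```python
def build_summary(page_values, pages_per_range):
    """Keep only (min, max) for every group of adjacent pages."""
    summary = []
    for start in range(0, len(page_values), pages_per_range):
        block = page_values[start:start + pages_per_range]
        summary.append((min(block), max(block)))
    return summary


def ranges_to_scan(summary, value):
    """Return the ranges that might contain the value; all others are skipped."""
    return [(low, high) for low, high in summary if low <= value <= high]


# One value per page, 3 pages per range (hypothetical toy data from the text).
sorted_pages = [1, 2, 3, 4, 5, 6, 7, 8, 9]
shuffled_pages = [2, 9, 5, 1, 4, 7, 3, 8, 6]

sorted_summary = build_summary(sorted_pages, 3)
shuffled_summary = build_summary(shuffled_pages, 3)

print(ranges_to_scan(sorted_summary, 5))    # highly correlated column
print(ranges_to_scan(shuffled_summary, 5))  # correlation close to zero
```

<p>On the sorted values only one range survives the check. On the shuffled values every range might contain the value, so the summary saves nothing.</p>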
<p><strong>Understanding <code>pages_per_range</code></strong></p>
<p>The number of adjacent pages is determined by the parameter <code>pages_per_range</code>. The number of pages per range affects the size and accuracy of the BRIN index:</p>
<ul>
<li>A large <code>pages_per_range</code> will produce a small and less accurate index</li>
<li>A small <code>pages_per_range</code> will produce a bigger and more accurate index</li>
</ul>
<p>The default <code>pages_per_range</code> is 128.</p>
<figure>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 408.4 135.6" width="35em"><path d="M10 100l27 2 1 21-27-1" fill="#f41d92"/><path d="M8 102c12-1 21 0 28-4m-26 2l29 1m-3-1c3 7 1 12 0 23m2-22c0 5-1 10 1 22m0 1l-29-1m28 0H10m1-1c1-5 0-8-2-22m1 24V99" stroke="currentColor" fill="none"/><path d="M101 102l29 1-1 23h-29" fill="#fc54ee"/><path d="M103 102c10-1 21 0 24 2m-26-2h28m0 1v21m1-21c-2 6-2 14-1 22m1 1c-10-2-17-1-28-1m27-1h-27m-1 3c0-7 2-16 0-26m1 24l-2-22" stroke="currentColor" fill="none"/><path d="M233 103l29-1v21l-27 1" fill="#e08fff"/><path d="M233 103c9 0 20-3 29-1m-29 0h28m1 1c-2 5-3 8-2 21m1-23l1 23m0 0c-12 1-22 1-28-1m26 0c-7 2-15 2-27 1m2 0l-3-20m2 21v-23" stroke="currentColor" fill="none"/><path d="M200 102h26l-1 23h-27" fill="#f071ff"/><path d="M200 100c6 0 9 3 26 3m-28-2c10 2 20 0 28 0m1 0c1 3 0 9-1 22m0-22v24m0-2l-29 1m28 0h-26m0 1c2-10 0-19 1-25m0 23c-2-5-2-11-2-22" stroke="currentColor" fill="none"/><path d="M72 102l28-1v23l-28-1" fill="#f938c5"/><path d="M69 104c11-4 24-2 31-1m-29-1h27m0 2v22m1-24l1 24m-2 0c-9-1-19-2-27-1m28 1H71m2 1c-3-8-3-13-3-25m1 23v-22" stroke="currentColor" fill="none"/><path d="M267 102l27 1v22l-28-3" fill="#daaeff"/><path d="M269 100c4 3 12 3 27 0m-29 2l27-1m2-1v26m-2-24l2 23m0-1c-10 0-22-1-29 1m29-1h-28m1-1c-2-9 0-18-2-20m0 21c-1-8 0-16 1-22" stroke="currentColor" fill="none"/><path d="M26 15h78l-1 33-79 1" fill="#ced4da"/><path d="M24 14c30 1 58 2 77 1m-75 1h76m-1 0c0 11 0 24 3 34m-1-34v34m-1 0c-22-1-46 0-75-2m75 2l-77 1m-1-1c0-9 0-21 3-33m-1 32V16" stroke="transparent" fill="none"/><path d="M216 12h85v35l-85 2" fill="#ced4da"/><path d="M215 13c18 1 37-2 85-2m2-1c-2 9-3 20 0 40m-1-2c-22-1-42 2-84 0m-1 2l2-38" stroke="transparent" stroke-width="1.5" fill="none" stroke-dasharray="3 6"/><path d="M34 23l25-1-2 21-25 1" fill="#f41d92"/><path d="M33 23c10 1 17 0 24-1m-25 1h27m-2 0c0 9 3 16 0 21m2-21l-1 21m0-1l-28 1m28 0l-26 1m-1 1c0-11 3-17 3-24m-2 22l1-21" stroke="currentColor" fill="none"/><path d="M66 25l25-3 1 23-26 1" 
fill="#fc54ee"/><path d="M66 25c7-3 12-2 26-2m-26 0c10 1 22 2 26 1m0-1v20m1-18l-1 20m-1 0c-7 1-16-2-24 2m25-2l-25 1m1-3c-2-2-1-6-3-20m1 22c2-6 1-13 0-20" stroke="currentColor" fill="none"/><path d="M223 16h26l1 24-27 2" fill="#f071ff"/><path d="M224 18c3-1 9-2 25 1m-28-3c11 2 21 1 30 1m-2 1c3 7 0 13 0 24m2-24l-1 23m0-1h-26m26 1l-27-1m-2 0c3-6 3-15 0-22m1 23l1-24" stroke="currentColor" fill="none"/><path d="M262 17l29 2 1 23h-27" fill="#daaeff"/><path d="M264 17l30 3m-32-1h29m2 2c-1 3-2 8 1 20m-2-22v23m0 1c-11-3-18 0-29-1m30 0h-30m0 1c1-5-1-13 1-26m0 25l-1-23" stroke="currentColor" fill="none"/><path d="M38 102l30 1v24H40" fill="#fc54ee"/><path d="M38 102c9 0 16 3 30 0m-28 2l28-1m1 1c-3 5-3 14-2 22m1-22v22m1-2c-10 2-13 1-29 1m27 2l-27-2m1 1c1-10-2-17-3-23m2 23v-24" stroke="currentColor" fill="none"/><path d="M132 101l28 1 2 22-27-1" fill="#f41d92"/><path d="M132 100c10 1 23 2 27 0m-26 2l27-1m1-1c-1 6 2 12 1 24m-1-23v22m-2-1c-6 3-15 2-27 2m30 0l-30 1m2 1c-2-8-1-14-1-25m0 23l-1-24" stroke="currentColor" fill="none"/><g><path d="M165 102l28 1v23l-29-3" fill="#fc54ee"/><path d="M164 101c9 3 20 0 28-1m-27 1l27 1m-1-1l1 25m0-25v23m-2 0c-4 0-14-1-25 1m27 0l-26-1m1-1c-3-4-4-10-3-22m1 23l-1-23" stroke="currentColor" fill="none"/></g><g><path d="M337 102h29v21h-30" fill="#e08fff"/><path d="M336 101c9 2 13 0 27-1m-27 2h28m0-2c2 10 1 20 2 25m-2-23v22m0 2c-10-2-20-2-27-1m28 0l-29-1m-1-1c2-9 2-16 1-20m0 22v-23" stroke="currentColor" fill="none"/></g><g><path d="M303 103h27l-1 20-28 1" fill="#f071ff"/><path d="M301 103c9-3 15-4 27-2m-25-1l27 2m-2 1l3 22m-1-24v23m1-1h-28m28 0l-29 1m-1 0c-1-8 0-16 3-22m-1 21c0-7-2-13-1-22" stroke="currentColor" fill="none"/></g><g><path d="M372 101l28-1-3 22h-25" fill="#daaeff"/><path d="M373 102c4-2 10-3 25 0m-28-1h29m1-1v24m-1-23v22m-1 1l-28 1m28-2c-8 2-17 0-27 1m0-3c0-7-1-14 1-20m-1 23v-23" stroke="currentColor" fill="none"/></g><g><path d="M17 91c0-4-8-19-2-21 6-1 32 12 39 11 8 0 4-14 5-13 1 0-3 15 2 16s24-12 30-10c5 2 3 18 4 22m-79-6c1-3-5-17 
1-18 7-2 33 11 39 10 7-1 3-17 3-16 1 0-4 16 1 18s23-10 29-8c6 3 5 19 6 22" stroke="#ced4da" fill="none"/></g><g><path d="M213 90c0-3-8-19-1-20s35 13 43 13c9-1 5-16 6-15 1 0-5 16 1 18 5 1 26-12 32-10 7 2 5 18 6 21m-88-5c0-3-9-20-2-21 8-2 39 14 48 13 8-1 3-19 4-18 1 0-4 19 2 20 5 2 25-13 31-11 6 1 5 17 5 21" stroke="#ced4da" fill="none"/></g><g><path d="M123 14l76-1v35h-78" fill="#ced4da"/><path d="M122 16c18-3 40-1 79-2m-78 1h77m2-1c-4 10-4 19-1 34m-1-33v35m0-2c-16-1-33 0-76 2m76-1l-77-1m1 1c1-10-1-19-1-32m0 32V15" stroke="transparent" fill="none"/></g><g><path d="M129 21l26 2 2 20h-28" fill="#fc54ee"/><path d="M131 23c5-1 15 2 25 0m-25 1h23m2-2l-3 18m2-18v20m1-1c-8-2-13 2-25-2m23 2h-23m-2-1c0-5 3-9 2-17m0 17l-1-17" stroke="currentColor" fill="none"/></g><g><path d="M163 25l27-3v19l-26-1" fill="#fc54ee"/><path d="M166 24c4-1 12 0 27 1m-29-1l27-1m-2-1c1 6 3 14 1 18m1-16v18m0-2c-7 3-12 0-26 3m26-1l-27-1m1 1l-2-19m1 19c-1-5 0-8 1-18" stroke="currentColor" fill="none"/></g><g><path d="M312 11h87l-3 38-81-2" fill="#ced4da"/><path d="M314 11c25-2 49-2 83-1m-1 1c2 8 3 16 1 36m2 2c-31-4-58-3-86-3m2 1c-1-12 0-24-3-36" stroke="transparent" stroke-width="1.5" fill="none" stroke-dasharray="3 6"/></g><g><path d="M318 17h28l3 22h-30" fill="#f071ff"/><path d="M320 15h27m-27 1h27m2-1c-4 10-3 16-3 25m1-24l1 22m-2 3c-7-2-18-2-27-1m29-2c-9 2-17 1-30 2m1-2l2-22m-2 22c0-4-1-9 1-23" stroke="currentColor" fill="none"/></g><g><path d="M361 16l29 3-1 21-29-1" fill="#daaeff"/><path d="M360 16h28m-27 1l28 1m-1-3c2 8 0 15 2 24m-1-22c1 8-1 15-1 24m3 0c-13-1-23-2-30-1m27 0h-27m-2 0c2-6 1-11 1-25m0 25V17" stroke="currentColor" fill="none"/></g><g><path d="M111 87c0-3-8-16-2-17 7-2 33 11 41 10 7 0 2-14 3-14s-2 14 3 16c5 1 23-11 28-9 6 2 4 19 5 22m-79-6c0-4-5-20 1-21 7-1 31 14 38 14 7-1 3-18 4-18 2 0-3 16 2 18 6 1 25-12 30-10 6 2 4 19 4 22" stroke="#ced4da" fill="none"/></g><g><path d="M319 90c0-4-6-21 0-22 6-2 30 13 37 12 6-1 2-18 3-18s-2 18 4 20c5 1 22-13 28-10 5 2 3 19 4 23m-77-6c0-4-6-21 0-23 
7-2 33 13 40 12 7 0 0-16 1-16 1 1-2 18 3 20 5 1 22-14 27-11 5 2 2 22 3 26" stroke="#ced4da" fill="none"/></g></svg>
<figcaption>BRIN index with lower <code>pages_per_range</code></figcaption>
</figure>
<p>To demonstrate, let's create a BRIN index on ranges of 2 adjacent pages and search for the value 5:</p>
<ul>
<li><code>[1–2]</code> - Definitely not here</li>
<li><code>[3–4]</code> - Definitely not here</li>
<li><code>[5–6]</code> - Might be here</li>
<li><code>[7–8]</code> - Definitely not here</li>
<li><code>[9]</code> - Definitely not here</li>
</ul>
<p>Using the index with 2 pages per range we were able to limit the search to blocks 5 and 6. When the range was 3 pages, the index limited the search to blocks 4, 5 and 6.</p>
<p>Another difference between the two indexes is that when the range was 3 we only had to keep 3 ranges. When the range was 2 we had to keep 5 ranges so the index was bigger.</p>
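<p>To make the trade-off concrete, here is a rough Python sketch that counts, for a given range size, how many ranges the index has to keep and how many pages still need to be scanned. The function name <code>brin_tradeoff</code> is hypothetical, and the model assumes one value per page like the examples above. It is an illustration of the idea, not of PostgreSQL internals.</p>

```python
def brin_tradeoff(page_values, pages_per_range, value):
    """Return (ranges kept in the index, pages that must still be scanned)."""
    ranges_kept = 0
    pages_to_scan = 0
    for start in range(0, len(page_values), pages_per_range):
        block = page_values[start:start + pages_per_range]
        ranges_kept += 1  # one (min, max) entry per range
        if min(block) <= value <= max(block):
            pages_to_scan += len(block)  # the whole range must be rechecked
    return ranges_kept, pages_to_scan


pages = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # toy data: one value per page

for pages_per_range in (3, 2, 1):
    kept, scanned = brin_tradeoff(pages, pages_per_range, 5)
    print(f"pages_per_range={pages_per_range}: {kept} ranges kept, {scanned} pages scanned")
```

<p>Smaller ranges mean fewer pages to recheck but more entries to store, which is the size versus accuracy trade-off described above.</p>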
<p><strong>Creating a BRIN Index</strong></p>
<p>Using the <code>sale_fact</code> table from before, let's create a BRIN index on the column <code>sold_at</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">sale_fact_sold_at_bix</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">sale_fact</span>
<span class="hll"><span class="gp">db-#</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">BRIN</span><span class="p">(</span><span class="n">sold_at</span><span class="p">)</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="p">(</span><span class="n">pages_per_range</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">128</span><span class="p">);</span>
</span><span class="go">CREATE INDEX</span>
</pre></div>
<p>This creates a BRIN index with the default <code>pages_per_range = 128</code>.</p>
<p>Let's try to query for a range of sale dates:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">)</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sale_fact</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">sold_at</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="s1">'2020-07-01'</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="s1">'2020-07-31'</span><span class="p">;</span>
<span class="go"> QUERY PLAN</span>
<span class="go">--------------------------------------------------------------------------------------------</span>
<span class="go"> Bitmap Heap Scan on sale_fact (cost=13.11..1135.61 rows=4319 width=41)</span>
<span class="go"> Recheck Cond: ((sold_at >= '2020-07-01'::date) AND (sold_at <= '2020-07-31'::date))</span>
<span class="hll"><span class="go"> Rows Removed by Index Recheck: 23130</span>
</span><span class="go"> Heap Blocks: lossy=256</span>
<span class="go"> -> Bitmap Index Scan on sale_fact_sold_at_bix (cost=0.00..12.03 rows=12500 width=0)</span>
<span class="go"> Index Cond: ((sold_at >= '2020-07-01'::date) AND (sold_at <= '2020-07-31'::date))</span>
<span class="go"> Execution Time: 8.877 ms</span>
</pre></div>
<p>The database used our BRIN index to get a range of sale dates, but that's not the interesting part...</p>
<p><strong>Optimizing <code>pages_per_range</code></strong></p>
<p>According to the execution plan, the database removed 23,130 rows from the pages it found using the index. This may indicate that the range we set for the index is too large for this particular query. Let's try to create an index with fewer pages per range:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">sale_fact_sold_at_bix64</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">sale_fact</span>
<span class="hll"><span class="gp">db-#</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">BRIN</span><span class="p">(</span><span class="n">sold_at</span><span class="p">)</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="p">(</span><span class="n">pages_per_range</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">64</span><span class="p">);</span>
</span><span class="go">CREATE INDEX</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">)</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sale_fact</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">sold_at</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="s1">'2020-07-01'</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="s1">'2020-07-31'</span><span class="p">;</span>
<span class="go"> QUERY PLAN</span>
<span class="go">---------------------------------------------------------------------------------------------</span>
<span class="go"> Bitmap Heap Scan on sale_fact (cost=13.10..1048.10 rows=4319 width=41)</span>
<span class="go"> Recheck Cond: ((sold_at >= '2020-07-01'::date) AND (sold_at <= '2020-07-31'::date))</span>
<span class="hll"><span class="go"> Rows Removed by Index Recheck: 9434</span>
</span><span class="go"> Heap Blocks: lossy=128</span>
<span class="go"> -> Bitmap Index Scan on sale_fact_sold_at_bix64 (cost=0.00..12.02 rows=6667 width=0)</span>
<span class="go"> Index Cond: ((sold_at >= '2020-07-01'::date) AND (sold_at <= '2020-07-31'::date))</span>
<span class="go"> Execution Time: 5.491 ms</span>
</pre></div>
<p>With 64 pages per range the database removed fewer rows from the pages it found using the index: only 9,434 rows were removed, compared with 23,130 when the range was 128 pages. This means the database had to do less IO and the query was slightly faster, ~5.5ms compared to ~8.9ms.</p>
<p>Testing the index with different values for <code>pages_per_range</code> produced the following results:</p>
<table>
<thead>
<tr>
<th>pages_per_range</th>
<th>Rows Removed by Index Recheck</th>
</tr>
</thead>
<tbody>
<tr>
<td>128</td>
<td>23,130</td>
</tr>
<tr>
<td>64</td>
<td>9,434</td>
</tr>
<tr>
<td>8</td>
<td>874</td>
</tr>
<tr>
<td>4</td>
<td>446</td>
</tr>
<tr>
<td>2</td>
<td>446</td>
</tr>
</tbody>
</table>
<p>We can see that as we decrease <code>pages_per_range</code>, the index is more accurate and fewer rows are removed from the pages found using the index.</p>
<p>Note that we optimized the index for a very specific query. This is fine for demonstration purposes, but in real life it's best to use values that meet the needs of most queries.</p>
<p><strong>Evaluating Index Size</strong></p>
<p>Another big selling point for BRIN indexes is their size. In previous sections we created a B-Tree index on the <code>sold_at</code> field. The size of the index was 2224 kB. The size of a BRIN index with <code>pages_per_range=128</code> is only 48 kB. That's 46 times smaller than the B-Tree index.</p>
<div class="highlight"><pre><span></span> Schema | Name | Type | Owner | Table | Size
--------+-----------------------+-------+-------+-----------+-------
public | sale_fact_sold_at_bix | index | haki | sale_fact | 48 kB
public | sale_fact_sold_at_ix | index | haki | sale_fact | 2224 kB
</pre></div>
<p>The size of a BRIN index is also affected by <code>pages_per_range</code>. For example, a BRIN index with <code>pages_per_range=2</code> weighs 56 kB, which is only slightly bigger than 48 kB.</p>
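<p>The size comparison is simple arithmetic on the numbers reported above. A quick sketch, using only the sizes quoted in this section (the variable names are mine):</p>

```python
# Index sizes in kB, as reported in this section.
btree_kb = 2224
brin_128_kb = 48  # BRIN with pages_per_range=128
brin_2_kb = 56    # BRIN with pages_per_range=2

# How many times larger the B-Tree index is than the default BRIN index.
btree_vs_brin = btree_kb / brin_128_kb

# How much the BRIN index grows when the range shrinks from 128 pages to 2.
brin_growth = brin_2_kb / brin_128_kb

print(f"B-Tree / BRIN(128): {btree_vs_brin:.1f}x")
print(f"BRIN(2) / BRIN(128): {brin_growth:.2f}x")
```

<p>Even at the most accurate setting tested here, the BRIN index stays a small fraction of the size of the B-Tree index.</p>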
<hr>
<h2 id="make-indexes-invisible"><a class="toclink" href="#make-indexes-invisible">Make Indexes "Invisible"</a></h2>
<p>PostgreSQL has a nice feature called <a href="https://wiki.postgresql.org/wiki/Transactional_DDL_in_PostgreSQL:_A_Competitive_Analysis#Transactional_DDL" rel="noopener">transactional DDL</a>. After years of using Oracle, I got used to DDL commands such as <code>CREATE</code>, <code>DROP</code> and <code>ALTER</code> ending a transaction. However, in PostgreSQL you can perform DDL commands inside a transaction, and changes will take effect only when the transaction is committed.</p>
<p>As I <a href="https://twitter.com/be_haki/status/1282585977668751360?s=20" rel="noopener">recently discovered</a>, using transactional DDL you can make indexes invisible! This comes in handy when you want to see what an execution plan looks like without some index.</p>
<p>For example, in the <code>sale_fact</code> table from the previous section we created an index on <code>sold_at</code>. The execution plan for fetching sales made in July looked like this:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sale_fact</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">sold_at</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="s1">'2020-07-01'</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="s1">'2020-07-31'</span><span class="p">;</span>
<span class="go"> QUERY PLAN</span>
<span class="go">--------------------------------------------------------------------------------------------</span>
<span class="hll"><span class="go"> Index Scan using sale_fact_sold_at_ix on sale_fact (cost=0.42..182.80 rows=4319 width=41)</span>
</span><span class="go"> Index Cond: ((sold_at >= '2020-07-01'::date) AND (sold_at <= '2020-07-31'::date))</span>
</pre></div>
<p>To see what the execution plan would be if the index <code>sale_fact_sold_at_ix</code> did not exist, we can drop the index inside a transaction and immediately roll back:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="gp">db=#</span><span class="w"> </span><span class="k">BEGIN</span><span class="p">;</span>
</span><span class="go">BEGIN</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">DROP</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">sale_fact_sold_at_ix</span><span class="p">;</span>
<span class="go">DROP INDEX</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">EXPLAIN</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sale_fact</span>
<span class="gp">db-#</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">sold_at</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="s1">'2020-07-01'</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="s1">'2020-07-31'</span><span class="p">;</span>
<span class="go"> QUERY PLAN</span>
<span class="go">---------------------------------------------------------------------------------</span>
<span class="hll"><span class="go"> Seq Scan on sale_fact (cost=0.00..2435.00 rows=4319 width=41)</span>
</span><span class="go"> Filter: ((sold_at >= '2020-07-01'::date) AND (sold_at <= '2020-07-31'::date))</span>
<span class="hll"><span class="gp">db=#</span><span class="w"> </span><span class="k">ROLLBACK</span><span class="p">;</span>
</span><span class="go">ROLLBACK</span>
</pre></div>
<p>We first start a transaction using <code>BEGIN</code>. Then we drop the index and generate an execution plan. Notice that the execution plan now uses a full table scan, as if the index does not exist. At this point the transaction is still in progress, so the index is not dropped yet. To finish the transaction without dropping the index we roll back the transaction using the <code>ROLLBACK</code> command.</p>
<p>Now, make sure the index still exists:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="kp">\di+</span><span class="w"> </span><span class="ss">sale_fact_sold_at_ix</span>
<span class="go"> List of relations</span>
<span class="go"> Schema | Name | Type | Owner | Table | Size</span>
<span class="go">--------+----------------------+-------+-------+-----------+---------</span>
<span class="go"> public | sale_fact_sold_at_ix | index | haki | sale_fact | 2224 kB</span>
</pre></div>
<p>Other databases that don't support transactional DDL provide other ways to achieve the same goal. For example, Oracle lets you mark an index as <a href="https://docs.oracle.com/cd/B28359_01/server.111/b28310/indexes003.htm#ADMIN12317" rel="noopener">invisible</a>, which will cause the optimizer to ignore it.</p>
<p><strong>CAUTION</strong>: Dropping an index inside a transaction will lock out concurrent selects, inserts, updates, and deletes on the table while the transaction is active. Use with caution in test environments, and avoid on production databases.</p>
<hr>
<h2 id="dont-schedule-long-running-processes-at-round-hours"><a class="toclink" href="#dont-schedule-long-running-processes-at-round-hours">Don't Schedule Long Running Processes at Round Hours</a></h2>
<p>It's a known fact among investors that weird things can happen when a stock's price reaches a nice round number such as $10, $100 or $1000. As <a href="https://www.investopedia.com/trading/support-and-resistance-basics/#mntl-sc-block_1-0-38" rel="noopener">the following article</a> explains:</p>
<blockquote>
<p>[...] asset's price may have a difficult time moving beyond a round number, such as $50 or $100 per share. Most inexperienced <strong>traders tend to buy or sell assets when the price is at a whole number</strong> because they are more likely to feel that a stock is fairly valued at such levels.</p>
</blockquote>
<p>Developers in this sense are not all that different from investors. When they need to schedule a long running process, they will usually schedule it at a round hour.</p>
<figure>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 370 138.4" width="30em"><path d="M42 120l318-1m-316 0l315-1M332 129l27-12m-28 12l29-11M332 108c9 3 15 8 27 9m-28-9l29 10" stroke="currentColor" fill="none"/><path d="M48 90l15 4-1 18-14 2" fill="#f41d92"/><path d="M48 91h15m-14 2h14m-1-2c1 5-1 9 1 20m-1-19l1 20m-1 0c-2 2-8 1-12-1m12 2H48m0 1c0-11-2-16 2-22m-2 20V92" stroke="currentColor" fill="none"/><path d="M65 90l18-1-2 24H69" fill="#f41d92"/><path d="M66 90h14m-12 0l13-1m-2 1v23m2-24l-1 24m0 1l-14-1m15 0c-3-1-7 0-14 1m2-2c-4-6-1-8-2-21m0 22V89" stroke="currentColor" fill="none"/><path d="M88 32l13-2v84l-14-1" fill="#f41d92"/><path d="M88 31c4 0 11-1 14 1m-15-1l14-1m2 0c-2 16-1 33-2 83m1-83l-1 82m2 1c-7-1-11-2-16-1m15 1l-14-1m-2-1c-1-25 3-53 2-79m-1 81V30" stroke="currentColor" fill="none"/><path d="M108 89l12 1 2 20-14 2" fill="#f41d92"/><path d="M106 89h16m-15 1l13-1m0 0c-1 8-1 12 1 20m-1-18v20m2-1c-6 2-11 0-16 1m15-1l-15 1m-1-3c2-1 1-8 3-19m-1 22c0-7 0-13-2-21" stroke="currentColor" fill="none"/><path d="M125 87l14 2v23l-12-1" fill="#f41d92"/><path d="M125 87c3-1 7 1 13 1m-12-1h12m0-1c2 5 1 12 0 25m2-24v24m-1 0l-13-1m12 1h-12m-3 0c4-10 4-17 2-24m1 24V88" stroke="currentColor" fill="none"/><path d="M143 94h14v18l-13-3" fill="#f41d92"/><path d="M144 94c3 0 10-1 12 1m-12-1h11m-1 1c0 5 2 10 1 15m1-16l-1 16m0 2h-12m13-1h-12m0 2V93m0 17V94" stroke="currentColor" fill="none"/><path d="M159 93h12l1 19h-14" fill="#f41d92"/><path d="M160 91l12 1m-12 0h10m2-2c-2 8-2 12-2 22m1-21c0 5 1 11-1 21m2-2l-12 2m11-1h-11m1 0l-3-20m1 20V91" stroke="currentColor" fill="none"/><path d="M197 87l13 2-1 20h-13" fill="#f41d92"/><path d="M194 88c6-2 9 0 16 0m-15-1h14m-2 0c2 5 1 12 2 23m-1-23l2 25m-2-1l-12 1m13 0l-14-2m0 2c1-6 0-12-1-23m1 22V87" stroke="currentColor" fill="none"/><path d="M213 95h14l-3 17-12 1" fill="#f41d92"/><path d="M214 95h12m-13-1h12m0 1v17m0-18l1 17m-1 0l-11-1m12 1h-13m2 1c-2-6-3-13 0-16m-2 14V95" stroke="currentColor" fill="none"/><path d="M230 93l11-3v20l-10 2" 
fill="#f41d92"/><path d="M230 91c3 2 7 1 11 1m-12-1h11m1 0c-1 7 0 14 1 21m-2-19v19m0 0h-12m12 0h-11m2-1c-2-9-3-14 0-19m-3 20c2-5 1-10 2-20" stroke="currentColor" fill="none"/><path d="M247 88h11v25l-11-3" fill="#f41d92"/><path d="M245 88h15m-14 0h13m0 0l1 24m1-24c-2 6-1 10-1 25m0-2c-3 1-6-1-13 1m13-1l-14 1m-1-1c-1-7 2-11 2-25m-1 26V88" stroke="currentColor" fill="none"/><path d="M265 93h12l1 19-15 1" fill="#f41d92"/><path d="M265 96l12-1m-13-1h13m-1 0v16m0-16l1 17m-2 1h-10m11 0h-11m-1-2c0-3 2-10-1-15m2 16V95" stroke="currentColor" fill="none"/><g><path d="M279 93l11 1 2 17-10 3" fill="#f41d92"/><path d="M281 93c4 0 8 0 11-2m-12 2l11-1m0 0c0 8 1 14-1 19m1-18c2 4 1 10 0 19m1 0c-3-1-6 0-11 1m11-1h-12m-1-1c3-5 2-15 2-19m-1 20l1-19" stroke="currentColor" fill="none"/></g><g><path d="M298 34l16 3-3 76-12-1" fill="#f41d92"/><path d="M298 37l13-2m-13 1l14 1m0-1c2 20 0 36-1 74m2-73l-1 74m1 1c-7 0-11 0-14-2m13 1l-14 1m-1 1c2-21-1-41 0-79m0 77l1-76" stroke="currentColor" fill="none"/></g><g><path d="M177 9l15 2-1 100-14-1" fill="#f41d92"/><path d="M176 10l14-1m-15 1h15m1 0c-2 31-2 65-1 100m1-99l-1 101m1-1l-15 2m15-1l-15-1m0-1c0-25-3-50 2-101m-2 103c1-29-1-58-1-101" stroke="currentColor" fill="none"/></g><g><path d="M12 115l-1 12m3-13l-4 11" stroke="currentColor" fill="none"/></g><g><path d="M15 115l7 13m-9-11l11 10" stroke="currentColor" fill="none"/></g><g><path d="M21 123H10m11 0H10" stroke="currentColor" fill="none"/></g><g><path d="M27 114l-1 14m3-15l-1 13" stroke="currentColor" fill="none"/></g><g><path d="M27 114l5 7m-4-5l3 7" stroke="currentColor" fill="none"/></g><g><path d="M36 115l-3 4m2-4l-2 5" stroke="currentColor" fill="none"/></g><g><path d="M34 118l3 7m-2-8l4 6" stroke="currentColor" fill="none"/></g><g><path d="M94 115l1 11m-1-11l1 11" stroke="currentColor" fill="none"/></g><g><path d="M184 114v11m0-11v11" stroke="currentColor" fill="none"/></g><g><path d="M305 114l1 11m-1-11l1 11" stroke="currentColor" fill="none"/></g></svg>
<figcaption>Typical load on a system during the night</figcaption>
</figure>
<p>This tendency to schedule tasks at round hours can cause some unusual loads during these times. So, if you need to schedule some long running process, you have a better chance of finding a system at rest if you schedule at another time.</p>
<p>Another good idea is to apply a random delay to the task's schedule, so it doesn't run at the same time every time. This way, even if another task is scheduled to run at the same time, it won't be a big problem. If you use <code>systemd</code> timer units to schedule your tasks, you can use the <a href="https://www.freedesktop.org/software/systemd/man/systemd.timer.html#RandomizedDelaySec=" rel="noopener">RandomizedDelaySec</a> option for this.</p>
<hr>
<h2 id="conclusion"><a class="toclink" href="#conclusion">Conclusion</a></h2>
<p>This article covers some trivial and non-trivial tips from my own experience. Some of these tips are easy to implement, and some require a deeper understanding of how the database works. Databases are the backbone of most modern systems, so taking some time to understand how they work is a good investment for any developer!</p>
<p><em>This article was reviewed by the great team at <a href="https://www.pgmustard.com/" rel="noopener">pgMustard</a></em></p>Stop Using datetime.now!2020-06-01T00:00:00+03:002020-06-01T00:00:00+03:00Haki Benitatag:hakibenita.com,2020-06-01:/python-dependency-injection<p>If you ever had a test that one day just started to fail, unprovoked, or a test that fails once every blue moon for no apparent reason, it's possible your code is relying on something that is not deterministic. In this article I describe a practical approach to dependency injection in Python that when used correctly, can eliminate nondeterminism and make your code easier to maintain and to test.</p><hr>
<p>One of my favorite job interview questions is this:</p>
<blockquote>
<p>Write a function that returns tomorrow's date</p>
</blockquote>
<p>This looks innocent enough for someone to suggest this as a solution:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">datetime</span>
<span class="k">def</span> <span class="nf">tomorrow</span><span class="p">()</span> <span class="o">-></span> <span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">:</span>
    <span class="k">return</span> <span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="o">.</span><span class="n">today</span><span class="p">()</span> <span class="o">+</span> <span class="n">datetime</span><span class="o">.</span><span class="n">timedelta</span><span class="p">(</span><span class="n">days</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</pre></div>
<p>This will work, but there is a followup question:</p>
<blockquote>
<p>How would you test this function?</p>
</blockquote>
<p>Before you move on... take a second to think about <em>your</em> answer.</p>
<figure><img alt="One of these pigeons is a mock<br><small>Photo by <a href="https://www.pexels.com/photo/two-pigeon-perched-on-white-track-light-681447/">Pedro Figueras</a></small>" src="https://hakibenita.com/images/00-python-dependency-injection.jpg"><figcaption>One of these pigeons is a mock<br><small>Photo by <a href="https://www.pexels.com/photo/two-pigeon-perched-on-white-track-light-681447/">Pedro Figueras</a></small></figcaption>
</figure>
<p><details class="toc-container" open>
<summary>Table of Contents</summary></p>
<div class="toc">
<ul>
<li><a href="#naive-approach">Naive Approach</a></li>
<li><a href="#dependency-injection">Dependency Injection</a><ul>
<li><a href="#dependency-injection-in-the-wild">Dependency Injection in The Wild</a></li>
<li><a href="#injecting-functions">Injecting Functions</a></li>
<li><a href="#injecting-values">Injecting Values</a></li>
<li><a href="#when-to-instantiate-injected-values">When to Instantiate Injected Values</a></li>
</ul>
</li>
<li><a href="#dependency-injection-in-practice">Dependency Injection in Practice</a><ul>
<li><a href="#ip-lookup">IP Lookup</a></li>
<li><a href="#assigning-responsibility">Assigning Responsibility</a></li>
<li><a href="#using-a-service">Using a Service</a></li>
<li><a href="#changing-implementations">Changing Implementations</a></li>
<li><a href="#typing-services">Typing Services</a></li>
<li><a href="#using-a-protocol">Using a Protocol</a></li>
<li><a href="#nondeterminism-and-side-effects">Nondeterminism and Side-Effects</a></li>
</ul>
</li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</div>
<p></details></p>
<hr>
<h2 id="naive-approach"><a class="toclink" href="#naive-approach">Naive Approach</a></h2>
<p>The most naive approach to test a function that returns tomorrow's date is this:</p>
<div class="highlight"><pre><span></span><span class="c1"># Bad</span>
<span class="k">assert</span> <span class="n">tomorrow</span><span class="p">()</span> <span class="o">==</span> <span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">(</span><span class="mi">2020</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">16</span><span class="p">)</span>
</pre></div>
<p>This test will pass <em>today</em>, but it will fail on any other day.</p>
<p>Another way to test the function is this:</p>
<div class="highlight"><pre><span></span><span class="c1"># Bad</span>
<span class="k">assert</span> <span class="n">tomorrow</span><span class="p">()</span> <span class="o">==</span> <span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="o">.</span><span class="n">today</span><span class="p">()</span> <span class="o">+</span> <span class="n">datetime</span><span class="o">.</span><span class="n">timedelta</span><span class="p">(</span><span class="n">days</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</pre></div>
<p>This will also work, but there is an inherent problem with this approach. The same way you can't define a word in the dictionary using itself, <strong>you should not test a function by repeating its implementation.</strong></p>
<p>Another problem with this approach is that it's only testing one scenario, for the day it is executed. What about getting the next day across a month or a year? What about the day after 2020-02-28?</p>
<p>The problem with both implementations is that <code>today</code> is set inside the function, and to simulate different test scenarios you need to control this value. One solution that comes to mind is to mock <code>datetime.date</code>, and try to set the value returned by <code>today()</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">unittest</span> <span class="kn">import</span> <span class="n">mock</span>
<span class="gp">>>> </span><span class="k">with</span> <span class="n">mock</span><span class="o">.</span><span class="n">patch</span><span class="p">(</span><span class="s1">'datetime.date.today'</span><span class="p">,</span> <span class="n">return_value</span><span class="o">=</span><span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">(</span><span class="mi">2020</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)):</span>
<span class="gp">... </span>    <span class="k">assert</span> <span class="n">tomorrow</span><span class="p">()</span> <span class="o">==</span> <span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">(</span><span class="mi">2020</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
<span class="gp">...</span>
<span class="gt">Traceback (most recent call last):</span>
File <span class="nb">"<stdin>"</span>, line <span class="m">1</span>, in <span class="n"><module></span>
File <span class="nb">"/usr/lib/python3.7/unittest/mock.py"</span>, line <span class="m">1410</span>, in <span class="n">__enter__</span>
<span class="w"> </span><span class="nb">setattr</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">target</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">attribute</span><span class="p">,</span> <span class="n">new_attr</span><span class="p">)</span>
<span class="gr">TypeError</span>: <span class="n">can't set attributes of built-in/extension type 'datetime.date'</span>
</pre></div>
<p>As the exception suggests, attributes of built-in types implemented in C, such as <code>datetime.date</code>, cannot be set, so they cannot be patched this way. The <code>unittest.mock</code> documentation specifically addresses this attempt to <a href="https://docs.python.org/3/library/unittest.mock-examples.html#partial-mocking" rel="noopener">mock the datetime module</a>. Apparently, this is a very common issue and the writers of the official documentation felt it was worth mentioning. They even go the extra mile and link to a <a href="https://williambert.online/2011/07/how-to-unit-testing-in-django-with-mocking-and-patching/" rel="noopener">blog post</a> on this exact problem. The article is worth a read, and we are going to address the solution it presents later on.</p>
<p>Like every other problem in Python, there are libraries that provide a solution. Two libraries that stand out are <a href="https://pypi.org/project/freezegun/" rel="noopener"><code>freezegun</code></a> and <a href="https://pypi.org/project/libfaketime/" rel="noopener"><code>libfaketime</code></a>. Both provide the ability to mock time at different levels. However, resorting to external libraries is a luxury only developers of legacy systems can afford. For new projects, or projects that are small enough to change, there are other alternatives that can keep the project free of these dependencies.</p>
<hr>
<h2 id="dependency-injection"><a class="toclink" href="#dependency-injection">Dependency Injection</a></h2>
<p>The problem we were trying to solve with a mock can also be solved by changing the function's API:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">datetime</span>
<span class="k">def</span> <span class="nf">tomorrow</span><span class="p">(</span><span class="n">asof</span><span class="p">:</span> <span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">)</span> <span class="o">-></span> <span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">:</span>
    <span class="k">return</span> <span class="n">asof</span> <span class="o">+</span> <span class="n">datetime</span><span class="o">.</span><span class="n">timedelta</span><span class="p">(</span><span class="n">days</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</pre></div>
<p>To control the reference time of the function, the time can be provided as an argument. This makes it easier to test the function in different scenarios:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">datetime</span>
<span class="k">assert</span> <span class="n">tomorrow</span><span class="p">(</span><span class="n">asof</span><span class="o">=</span><span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">(</span><span class="mi">2020</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span> <span class="o">==</span> <span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">(</span><span class="mi">2020</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
<span class="k">assert</span> <span class="n">tomorrow</span><span class="p">(</span><span class="n">asof</span><span class="o">=</span><span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">(</span><span class="mi">2019</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">31</span><span class="p">))</span> <span class="o">==</span> <span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">(</span><span class="mi">2020</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="k">assert</span> <span class="n">tomorrow</span><span class="p">(</span><span class="n">asof</span><span class="o">=</span><span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">(</span><span class="mi">2020</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">28</span><span class="p">))</span> <span class="o">==</span> <span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">(</span><span class="mi">2020</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">29</span><span class="p">)</span>
<span class="k">assert</span> <span class="n">tomorrow</span><span class="p">(</span><span class="n">asof</span><span class="o">=</span><span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">(</span><span class="mi">2021</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">28</span><span class="p">))</span> <span class="o">==</span> <span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">(</span><span class="mi">2021</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
</pre></div>
<p>To remove the function's dependency on <code>datetime.date.today</code>, we provide today's date as an argument. This pattern of providing, or "injecting" dependencies into functions and objects is often called "dependency injection", or in short "DI".</p>
<h3 id="dependency-injection-in-the-wild"><a class="toclink" href="#dependency-injection-in-the-wild">Dependency Injection in The Wild</a></h3>
<p>Dependency injection is a way to decouple modules from each other. As our previous example shows, the function <code>tomorrow</code> no longer depends on <code>today</code>.</p>
<p>Using dependency injection is very common and often very intuitive. It's very likely that you already use it without even knowing. For example, <a href="https://python-patterns.guide/gang-of-four/factory-method/#dodge-use-dependency-injection" rel="noopener">this article</a> suggests that providing an open file to <code>json.load</code> is a form of dependency injection:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">json</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'path/to/file.json'</span><span class="p">,</span> <span class="s1">'r'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
    <span class="n">data</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
</pre></div>
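<p>Because <code>json.load</code> accepts any file-like object rather than opening a path itself, a test can hand it an in-memory buffer instead of a real file. A minimal sketch of that idea, where the <code>load_config</code> helper and the <code>debug</code> key are hypothetical names used only for illustration:</p>

```python
import io
import json
from typing import IO, Any, Dict


def load_config(f: IO[str]) -> Dict[str, Any]:
    """Parse configuration from an already-open, file-like object (hypothetical helper)."""
    return json.load(f)


# In production the caller would inject a real file:
#     with open('path/to/file.json', 'r') as f:
#         config = load_config(f)

# In a test, an in-memory buffer is injected instead, so nothing touches the disk.
fake_file = io.StringIO('{"debug": true}')
config = load_config(fake_file)
assert config == {"debug": True}
```

<p>The function never needs to know where the data came from, which is what makes it easy to test.</p>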
<p>The popular test framework pytest builds its entire fixture infrastructure around the <a href="https://docs.pytest.org/en/latest/fixture.html#fixtures-a-prime-example-of-dependency-injection" rel="noopener">concept of dependency injection</a>:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pytest</span>
<span class="nd">@pytest</span><span class="o">.</span><span class="n">fixture</span>
<span class="k">def</span> <span class="nf">one</span><span class="p">()</span> <span class="o">-></span> <span class="nb">int</span><span class="p">:</span>
    <span class="k">return</span> <span class="mi">1</span>
<span class="nd">@pytest</span><span class="o">.</span><span class="n">fixture</span>
<span class="k">def</span> <span class="nf">two</span><span class="p">()</span> <span class="o">-></span> <span class="nb">int</span><span class="p">:</span>
    <span class="k">return</span> <span class="mi">2</span>
<span class="k">def</span> <span class="nf">test_one_is_less_than_two</span><span class="p">(</span><span class="n">one</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">two</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
    <span class="k">assert</span> <span class="n">one</span> <span class="o"><</span> <span class="n">two</span>
</pre></div>
<p>The functions <code>one</code> and <code>two</code> are declared as fixtures. When pytest executes the test function <code>test_one_is_less_than_two</code>, it will provide it with the values returned by the fixture functions matching the argument names. In pytest, the injection is magically happening simply by using the name of a known fixture as an argument.</p>
<p>Dependency injection is not limited just to Python. The popular JavaScript framework <a href="https://angular.io/" rel="noopener">Angular</a> is also built around <a href="https://angular.io/guide/dependency-injection" rel="noopener">dependency injection</a>:</p>
<div class="highlight"><pre><span></span><span class="kd">@Component</span><span class="p">({</span>
<span class="w"> </span><span class="nx">selector</span><span class="o">:</span><span class="w"> </span><span class="s1">'order-list'</span><span class="p">,</span>
<span class="w"> </span><span class="nx">template</span><span class="o">:</span><span class="w"> </span><span class="sb">`...`</span>
<span class="p">})</span>
<span class="k">export</span><span class="w"> </span><span class="kd">class</span><span class="w"> </span><span class="nx">OrderListComponent</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">orders</span><span class="o">:</span><span class="w"> </span><span class="kt">Order</span><span class="p">[];</span>
<span class="w"> </span><span class="kr">constructor</span><span class="p">(</span><span class="nx">orderService</span><span class="o">:</span><span class="w"> </span><span class="kt">OrderService</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">orders</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">orderService</span><span class="p">.</span><span class="nx">getOrders</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Notice how the <code>orderService</code> is provided, or injected, to the constructor. The component is using the order service, but is not instantiating it.</p>
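<p>The same constructor injection can be sketched in Python. The classes below are hypothetical stand-ins that mirror the Angular example and are not taken from any real project:</p>

```python
from typing import List


class OrderService:
    """Hypothetical service; a real one might query a database or an API."""

    def get_orders(self) -> List[str]:
        return ["order-1", "order-2"]


class OrderList:
    """Uses an order service, but does not create it."""

    def __init__(self, order_service: OrderService) -> None:
        self.orders = order_service.get_orders()


class FakeOrderService(OrderService):
    """Test double that returns a fixed, known list of orders."""

    def get_orders(self) -> List[str]:
        return ["test-order"]


# Production code injects the real service, a test injects the fake one.
assert OrderList(OrderService()).orders == ["order-1", "order-2"]
assert OrderList(FakeOrderService()).orders == ["test-order"]
```

<p>Because <code>OrderList</code> receives the service instead of instantiating it, swapping the implementation requires no changes to the class itself.</p>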
<h3 id="injecting-functions"><a class="toclink" href="#injecting-functions">Injecting Functions</a></h3>
<p>Sometimes injecting a value is not enough. For example, what if we need to get the current date before and after some operation:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Tuple</span>
<span class="kn">import</span> <span class="nn">datetime</span>
<span class="k">def</span> <span class="nf">go</span><span class="p">()</span> <span class="o">-></span> <span class="n">Tuple</span><span class="p">[</span><span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="p">,</span> <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="p">]:</span>
    <span class="n">started_at</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">()</span>
    <span class="c1"># Do something ...</span>
    <span class="n">ended_at</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">()</span>
    <span class="k">return</span> <span class="n">started_at</span><span class="p">,</span> <span class="n">ended_at</span>
</pre></div>
<p>To test this function, we can provide the start time like we did before, but we can't provide the end time. One way to approach this is to make the calls to start and end outside the function. This is a valid solution, but for the sake of discussion we'll assume they need to be called inside.</p>
<p>Since we can't mock <code>datetime.datetime</code> itself, one way to make this function testable is to create a separate function that returns the current date and time:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Tuple</span>
<span class="kn">import</span> <span class="nn">datetime</span>
<span class="hll"><span class="k">def</span> <span class="nf">now</span><span class="p">()</span> <span class="o">-></span> <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="p">:</span>
</span><span class="hll">    <span class="k">return</span> <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">()</span>
</span>
<span class="k">def</span> <span class="nf">go</span><span class="p">()</span> <span class="o">-></span> <span class="n">Tuple</span><span class="p">[</span><span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="p">,</span> <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="p">]:</span>
<span class="hll">    <span class="n">started_at</span> <span class="o">=</span> <span class="n">now</span><span class="p">()</span>
</span>    <span class="c1"># Do something ...</span>
<span class="hll">    <span class="n">ended_at</span> <span class="o">=</span> <span class="n">now</span><span class="p">()</span>
</span>    <span class="k">return</span> <span class="n">started_at</span><span class="p">,</span> <span class="n">ended_at</span>
</pre></div>
<p>To control the values returned by the function <code>now</code> in tests, we can use a mock:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">unittest</span> <span class="kn">import</span> <span class="n">mock</span>
<span class="gp">>>> </span><span class="n">fake_start</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="p">(</span><span class="mi">2020</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">15</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">fake_end</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="p">(</span><span class="mi">2020</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">15</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">30</span><span class="p">)</span>
<span class="gp">>>> </span><span class="k">with</span> <span class="n">mock</span><span class="o">.</span><span class="n">patch</span><span class="p">(</span><span class="s1">'__main__.now'</span><span class="p">,</span> <span class="n">side_effect</span><span class="o">=</span><span class="p">[</span><span class="n">fake_start</span><span class="p">,</span> <span class="n">fake_end</span><span class="p">]):</span>
<span class="gp">... </span>    <span class="n">go</span><span class="p">()</span>
<span class="go">(datetime.datetime(2020, 1, 1, 15, 0),</span>
<span class="go"> datetime.datetime(2020, 1, 1, 15, 1, 30))</span>
</pre></div>
<p>Another way to approach this, without mocking, is to rewrite the function once again:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Callable</span><span class="p">,</span> <span class="n">Tuple</span>
<span class="kn">import</span> <span class="nn">datetime</span>
<span class="k">def</span> <span class="nf">go</span><span class="p">(</span>
    <span class="n">now</span><span class="p">:</span> <span class="n">Callable</span><span class="p">[[],</span> <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="p">],</span>
<span class="p">)</span> <span class="o">-></span> <span class="n">Tuple</span><span class="p">[</span><span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="p">,</span> <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="p">]:</span>
    <span class="n">started_at</span> <span class="o">=</span> <span class="n">now</span><span class="p">()</span>
    <span class="c1"># Do something ...</span>
    <span class="n">ended_at</span> <span class="o">=</span> <span class="n">now</span><span class="p">()</span>
    <span class="k">return</span> <span class="n">started_at</span><span class="p">,</span> <span class="n">ended_at</span>
</pre></div>
<p>This time we provide the function with another function that returns a datetime. This is very similar to the first solution we suggested, when we injected the datetime itself to the function.</p>
<p>The function can now be used like this:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">go</span><span class="p">(</span><span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">)</span>
<span class="go">(datetime.datetime(2020, 4, 18, 14, 14, 5, 687471),</span>
<span class="go"> datetime.datetime(2020, 4, 18, 14, 14, 5, 687475))</span>
</pre></div>
<p>To test it, we provide a different function that returns known datetimes:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">fake_start</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="p">(</span><span class="mi">2020</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">15</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">fake_end</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="p">(</span><span class="mi">2020</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">15</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">30</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">gen</span> <span class="o">=</span> <span class="nb">iter</span><span class="p">([</span><span class="n">fake_start</span><span class="p">,</span> <span class="n">fake_end</span><span class="p">])</span>
<span class="gp">>>> </span><span class="n">go</span><span class="p">(</span><span class="k">lambda</span><span class="p">:</span> <span class="nb">next</span><span class="p">(</span><span class="n">gen</span><span class="p">))</span>
<span class="go">(datetime.datetime(2020, 1, 1, 15, 0),</span>
<span class="go"> datetime.datetime(2020, 1, 1, 15, 1, 30))</span>
</pre></div>
<p>This pattern can be generalized even more using a utility object:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Iterator</span>
<span class="kn">import</span> <span class="nn">datetime</span>
<span class="k">def</span> <span class="nf">ticker</span><span class="p">(</span>
    <span class="n">start</span><span class="p">:</span> <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="p">,</span>
    <span class="n">interval</span><span class="p">:</span> <span class="n">datetime</span><span class="o">.</span><span class="n">timedelta</span><span class="p">,</span>
<span class="p">)</span> <span class="o">-></span> <span class="n">Iterator</span><span class="p">[</span><span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="p">]:</span>
<span class="w">    </span><span class="sd">"""Generate an unending stream of datetimes in fixed intervals.</span>
<span class="sd">    Useful to test processes which require datetime for each step.</span>
<span class="sd">    """</span>
    <span class="n">current</span> <span class="o">=</span> <span class="n">start</span>
    <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
        <span class="k">yield</span> <span class="n">current</span>
        <span class="n">current</span> <span class="o">+=</span> <span class="n">interval</span>
</pre></div>
<p>Using <code>ticker</code>, the test will now look like this:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">gen</span> <span class="o">=</span> <span class="n">ticker</span><span class="p">(</span><span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="p">(</span><span class="mi">2020</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">15</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span> <span class="n">datetime</span><span class="o">.</span><span class="n">timedelta</span><span class="p">(</span><span class="n">seconds</span><span class="o">=</span><span class="mi">90</span><span class="p">))</span>
<span class="gp">>>> </span><span class="n">go</span><span class="p">(</span><span class="k">lambda</span><span class="p">:</span> <span class="nb">next</span><span class="p">(</span><span class="n">gen</span><span class="p">))</span>
<span class="go">(datetime.datetime(2020, 1, 1, 15, 0),</span>
<span class="go"> datetime.datetime(2020, 1, 1, 15, 1, 30))</span>
</pre></div>
<p>Fun fact: the name "ticker" was <a href="https://gobyexample.com/tickers" rel="noopener">stolen from Go</a>.</p>
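<p>Because <code>ticker</code> is a plain generator, it can also be consumed on its own, without a process under test. A minimal sketch, assuming the standard library's <code>itertools.islice</code> is used to take the first few values (the <code>ticker</code> definition is repeated only so the snippet runs by itself):</p>

```python
import datetime
import itertools
from typing import Iterator


def ticker(
    start: datetime.datetime,
    interval: datetime.timedelta,
) -> Iterator[datetime.datetime]:
    """Generate an unending stream of datetimes in fixed intervals."""
    current = start
    while True:
        yield current
        current += interval


gen = ticker(datetime.datetime(2020, 1, 1, 15, 0, 0), datetime.timedelta(seconds=90))

# islice stops after three values, so the infinite generator is safe to consume.
first_three = list(itertools.islice(gen, 3))
```

<p>The generator keeps its position, so a later <code>next(gen)</code> continues from the fourth tick.</p>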
<h3 id="injecting-values"><a class="toclink" href="#injecting-values">Injecting Values</a></h3>
<p>The previous sections demonstrate injection of both values and functions. It's clear from the examples that injecting values is much simpler. This is why it's usually favorable to inject values rather than functions.</p>
<p>Another reason is consistency. Take this common pattern that is often used in Django models:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>
<span class="k">class</span> <span class="nc">Order</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">created</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">(</span><span class="n">auto_now_add</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="n">modified</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">(</span><span class="n">auto_now</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</pre></div>
<p>The model <code>Order</code> includes two datetime fields, <code>created</code> and <code>modified</code>. It uses Django's <a href="https://docs.djangoproject.com/en/3.0/ref/models/fields/#django.db.models.DateField.auto_now_add" rel="noopener"><code>auto_now_add</code></a> attribute to automatically set <code>created</code> when the object is saved for the first time, and <a href="https://docs.djangoproject.com/en/3.0/ref/models/fields/#django.db.models.DateField.auto_now" rel="noopener"><code>auto_now</code></a> to set <code>modified</code> every time the object is saved.</p>
<p>Say we create a new order and save it to the database:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">o</span> <span class="o">=</span> <span class="n">Order</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create</span><span class="p">()</span>
</pre></div>
<p>Would you expect this test to fail?</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="k">assert</span> <span class="n">o</span><span class="o">.</span><span class="n">created</span> <span class="o">==</span> <span class="n">o</span><span class="o">.</span><span class="n">modified</span>
<span class="go">Traceback (most recent call last):</span>
<span class="go">  ...</span>
<span class="go">AssertionError</span>
</pre></div>
<p>This is very unexpected. How can an object that was just created have two different values for <code>created</code> and <code>modified</code>? Can you imagine what would happen if you rely on <code>modified</code> and <code>created</code> to be equal when an object was never changed, and actually use it to identify unchanged objects:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">F</span>
<span class="c1"># Wrong!</span>
<span class="k">def</span> <span class="nf">get_unchanged_objects</span><span class="p">():</span>
    <span class="k">return</span> <span class="n">Order</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">created</span><span class="o">=</span><span class="n">F</span><span class="p">(</span><span class="s1">'modified'</span><span class="p">))</span>
</pre></div>
<p>For the <code>Order</code> model above, this function will always return an empty queryset.</p>
<p>The reason for this unexpected behavior is that each individual <code>DateTimeField</code> is using <code>django.utils.timezone.now</code> internally during <code>save()</code> to get the current time. The time that passes between populating the two fields causes the values to end up slightly different:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">o</span><span class="o">.</span><span class="n">created</span>
<span class="go">datetime.datetime(2020, 4, 18, 11, 41, 35, 740909, tzinfo=<UTC>)</span>
<span class="gp">>>> </span><span class="n">o</span><span class="o">.</span><span class="n">modified</span>
<span class="go">datetime.datetime(2020, 4, 18, 11, 41, 35, 741015, tzinfo=<UTC>)</span>
</pre></div>
<p>If we treat <code>timezone.now</code> like an injected function, we understand the inconsistencies it may cause.</p>
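<p>The effect can be reproduced without Django. The sketch below is only an illustration: <code>fake_clock</code>, <code>save_with_injected_function</code> and <code>save_with_injected_value</code> are hypothetical names, and the fake clock advances by 106 microseconds on every call to mimic the gap seen in the output above:</p>

```python
import datetime
import itertools
from typing import Callable, Dict

START = datetime.datetime(2020, 4, 18, 11, 41, 35, 740909)
STEP = datetime.timedelta(microseconds=106)

# Hypothetical stand-in for timezone.now: every call returns a slightly later time.
_ticks = (START + STEP * i for i in itertools.count())


def fake_clock() -> datetime.datetime:
    return next(_ticks)


def save_with_injected_function(
    now: Callable[[], datetime.datetime],
) -> Dict[str, datetime.datetime]:
    # Each field asks the clock separately, so the two values drift apart.
    return {'created': now(), 'modified': now()}


def save_with_injected_value(asof: datetime.datetime) -> Dict[str, datetime.datetime]:
    # Both fields share the single value provided by the caller.
    return {'created': asof, 'modified': asof}


by_function = save_with_injected_function(fake_clock)
by_value = save_with_injected_value(fake_clock())
```

<p>With the injected function the two fields differ by one clock step. With the injected value they are equal by construction.</p>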
<p><strong>So, can this be avoided?</strong> Can <code>created</code> and <code>modified</code> be equal when the object is first created? I'm sure there are a lot of hacks, libraries and other such exotic solutions, but the truth is much simpler. If you want to make sure these two fields are equal when the object is first created, you'd better avoid <code>auto_now</code> and <code>auto_now_add</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>
<span class="k">class</span> <span class="nc">Order</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">created</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">()</span>
    <span class="n">modified</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">()</span>
</pre></div>
<p>Then, when you create a new instance, explicitly provide the values for both fields:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.utils</span> <span class="kn">import</span> <span class="n">timezone</span>
<span class="gp">>>> </span><span class="n">asof</span> <span class="o">=</span> <span class="n">timezone</span><span class="o">.</span><span class="n">now</span><span class="p">()</span>
<span class="hll"><span class="gp">>>> </span><span class="n">o</span> <span class="o">=</span> <span class="n">Order</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">created</span><span class="o">=</span><span class="n">asof</span><span class="p">,</span> <span class="n">modified</span><span class="o">=</span><span class="n">asof</span><span class="p">)</span>
</span><span class="gp">>>> </span><span class="k">assert</span> <span class="n">o</span><span class="o">.</span><span class="n">created</span> <span class="o">==</span> <span class="n">o</span><span class="o">.</span><span class="n">modified</span>
<span class="gp">>>> </span><span class="n">Order</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">created</span><span class="o">=</span><span class="n">F</span><span class="p">(</span><span class="s1">'modified'</span><span class="p">))</span>
<span class="go"><QuerySet [<Order: Order object (2)>]></span>
</pre></div>
<p>To quote the "Zen of Python", explicit is better than implicit. Explicitly providing the values for the fields requires a bit more work, but this is a small price to pay for reliable and predictable data.</p>
<div class="admonition info">
<p class="admonition-title">using auto_now and auto_now_add</p>
<p>When is it OK to use <code>auto_now</code> and <code>auto_now_add</code>? Usually when a date is used for audit purposes and not for business logic, it's fine to make this shortcut and use <code>auto_now</code> or <code>auto_now_add</code>.</p>
</div>
<h3 id="when-to-instantiate-injected-values"><a class="toclink" href="#when-to-instantiate-injected-values">When to Instantiate Injected Values</a></h3>
<p>Injecting values poses another interesting question: at what point should the value be set? The answer to this is "it depends", but there is a rule of thumb that is usually correct: <strong>values should be instantiated at the topmost level</strong>.</p>
<p>For example, if <code>asof</code> represents when an order is created, a website backend serving a store front may set this value when the request is received. In a normal Django setup, this means that the value should be set by the view. Another common example is a scheduled job. If you have jobs that use management commands, <code>asof</code> should be set by the management command.</p>
<p>Setting the values at the topmost level guarantees that the <strong>lower levels remain decoupled and easier to test</strong>. The level at which injected values are set is the level at which you will usually need to use mock in tests. In the example above, setting <code>asof</code> in the view will make the models easier to test.</p>
<p>Other than testing and correctness, another benefit of setting values explicitly rather than implicitly, is that it gives you more control over your data. For example, in the website scenario, an order's creation date is set by the view immediately when the request is received. However, if you process a batch file from a large customer, the time in which the order was created may well be in the past, when the customer first created the files. By avoiding "auto-magically" generated dates, we can implement this by passing the past date as an argument.</p>
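<p>A rough sketch of this rule of thumb, using plain Python instead of Django. All names here (<code>Order</code>, <code>create_order</code>, <code>handle_request</code>, <code>import_batch</code>) are hypothetical and only illustrate where <code>asof</code> gets set:</p>

```python
import datetime
from dataclasses import dataclass
from typing import List


@dataclass
class Order:
    created: datetime.datetime
    modified: datetime.datetime


def create_order(asof: datetime.datetime) -> Order:
    # Lower level: never reads the clock, so it is trivial to test.
    return Order(created=asof, modified=asof)


def handle_request() -> Order:
    # Topmost level for a website: the "view" decides what time it is.
    asof = datetime.datetime.now(datetime.timezone.utc)
    return create_order(asof)


def import_batch(created_dates: List[datetime.datetime]) -> List[Order]:
    # Topmost level for a batch file: the dates come from the customer's file.
    return [create_order(asof) for asof in created_dates]


past = datetime.datetime(2020, 1, 1, 15, 0, tzinfo=datetime.timezone.utc)
orders = import_batch([past])
```

<p>Only <code>handle_request</code> touches the real clock, so it is the only function in this sketch that would need a mock in a test.</p>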
<hr>
<h2 id="dependency-injection-in-practice"><a class="toclink" href="#dependency-injection-in-practice">Dependency Injection in Practice</a></h2>
<p>The best way to understand the benefits of DI and the motivation for it is to use a real-life example.</p>
<h3 id="ip-lookup"><a class="toclink" href="#ip-lookup">IP Lookup</a></h3>
<p>Say we want to try and guess where visitors to our Django site are coming from, and we decide to try and use the IP address from the request to do that. An initial implementation can look like this:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Optional</span>
<span class="kn">from</span> <span class="nn">django.http</span> <span class="kn">import</span> <span class="n">HttpRequest</span>
<span class="kn">import</span> <span class="nn">requests</span>
<span class="k">def</span> <span class="nf">get_country_from_request</span><span class="p">(</span><span class="n">request</span><span class="p">:</span> <span class="n">HttpRequest</span><span class="p">)</span> <span class="o">-></span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
    <span class="n">ip</span> <span class="o">=</span> <span class="n">request</span><span class="o">.</span><span class="n">META</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'REMOTE_ADDR'</span><span class="p">,</span> <span class="n">request</span><span class="o">.</span><span class="n">META</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'HTTP_X_FORWARDED_FOR'</span><span class="p">))</span>
    <span class="k">if</span> <span class="n">ip</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">or</span> <span class="n">ip</span> <span class="o">==</span> <span class="s1">''</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">None</span>
    <span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="sa">f</span><span class="s1">'http://ip-api.com/json/</span><span class="si">{</span><span class="n">ip</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">response</span><span class="o">.</span><span class="n">ok</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">None</span>
    <span class="n">data</span> <span class="o">=</span> <span class="n">response</span><span class="o">.</span><span class="n">json</span><span class="p">()</span>
    <span class="k">if</span> <span class="n">data</span><span class="p">[</span><span class="s1">'status'</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">'success'</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">None</span>
    <span class="k">return</span> <span class="n">data</span><span class="p">[</span><span class="s1">'countryCode'</span><span class="p">]</span>
</pre></div>
<p>This single function accepts an <code>HttpRequest</code>, tries to extract an IP address from the request headers, and then uses the <code>requests</code> library to call an external service to get the country code.</p>
<div class="admonition info">
<p class="admonition-title">ip lookup</p>
<p>I'm using the free service <a href="https://ip-api.com" rel="noopener">https://ip-api.com</a> to lookup a country from an IP. I'm using this service just for demonstration purposes. I'm not familiar with it, so don't see this as a recommendation to use it.</p>
</div>
<p>Let's try to use this function:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.test</span> <span class="kn">import</span> <span class="n">RequestFactory</span>
<span class="gp">>>> </span><span class="n">rf</span> <span class="o">=</span> <span class="n">RequestFactory</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">request</span> <span class="o">=</span> <span class="n">rf</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'/'</span><span class="p">,</span> <span class="n">REMOTE_ADDR</span><span class="o">=</span><span class="s1">'216.58.210.46'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">get_country_from_request</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
<span class="go">'US'</span>
</pre></div>
<p>OK, so it works. Notice that to use it we created an <a href="https://docs.djangoproject.com/en/3.0/ref/request-response/#django.http.HttpRequest" rel="noopener"><code>HttpRequest</code> object</a> using <a href="https://docs.djangoproject.com/en/3.0/topics/testing/advanced/#django.test.RequestFactory" rel="noopener">Django's <code>RequestFactory</code></a>.</p>
<p>Let's try to write a test for a scenario when a country code is found:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">re</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="kn">import</span> <span class="nn">responses</span>
<span class="kn">from</span> <span class="nn">django.test</span> <span class="kn">import</span> <span class="n">RequestFactory</span>
<span class="n">rf</span> <span class="o">=</span> <span class="n">RequestFactory</span><span class="p">()</span>
<span class="k">with</span> <span class="n">responses</span><span class="o">.</span><span class="n">RequestsMock</span><span class="p">()</span> <span class="k">as</span> <span class="n">rsps</span><span class="p">:</span>
    <span class="n">url_pattern</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s1">'http://ip-api.com/json/[0-9\.]+'</span><span class="p">)</span>
    <span class="n">rsps</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">responses</span><span class="o">.</span><span class="n">GET</span><span class="p">,</span> <span class="n">url_pattern</span><span class="p">,</span> <span class="n">status</span><span class="o">=</span><span class="mi">200</span><span class="p">,</span> <span class="n">content_type</span><span class="o">=</span><span class="s1">'application/json'</span><span class="p">,</span> <span class="n">body</span><span class="o">=</span><span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">({</span>
        <span class="s1">'status'</span><span class="p">:</span> <span class="s1">'success'</span><span class="p">,</span>
        <span class="s1">'countryCode'</span><span class="p">:</span> <span class="s1">'US'</span>
    <span class="p">}))</span>
    <span class="n">request</span> <span class="o">=</span> <span class="n">rf</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'/'</span><span class="p">,</span> <span class="n">REMOTE_ADDR</span><span class="o">=</span><span class="s1">'216.58.210.46'</span><span class="p">)</span>
    <span class="n">country_code</span> <span class="o">=</span> <span class="n">get_country_from_request</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">country_code</span> <span class="o">==</span> <span class="s1">'US'</span>
</pre></div>
<p>The function is using the <code>requests</code> library internally to make a request to the external API. To mock the response, we used the <a href="https://github.com/getsentry/responses" rel="noopener"><code>responses</code></a> library.</p>
<p>If you look at this test and feel like it's very complicated, then you are right. To test the function we had to do the following:</p>
<ul>
<li>Generate a Django request using a <code>RequestFactory</code>.</li>
<li>Mock a <code>requests</code> response using <code>responses</code>.</li>
<li>Have knowledge of the inner workings of the function (what URL it uses).</li>
</ul>
<p>That last point is where it gets hairy. To test the function we used our knowledge of how the function is implemented: what endpoint it uses, how the URL is structured, what method it uses and what the response looks like. This creates an implicit dependency between the test and the implementation. In other words, <strong>the implementation of the function cannot change without changing the test as well</strong>. This type of unhealthy dependency is both unexpected, and prevents us from treating the function as a "black box".</p>
<p>Also, notice that we only tested one scenario. If you look at the coverage of this test you'll find that it's very low. So next, we'll try to simplify this function.</p>
<h3 id="assigning-responsibility"><a class="toclink" href="#assigning-responsibility">Assigning Responsibility</a></h3>
<p>One of the techniques to make functions easier to test is to remove dependencies. Our IP function currently depends on Django's <code>HttpRequest</code>, the <code>requests</code> library and implicitly on the external service. Let's start by moving the part of the function that handles the external service to a separate function:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">get_country_from_ip</span><span class="p">(</span><span class="n">ip</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
    <span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="sa">f</span><span class="s1">'http://ip-api.com/json/</span><span class="si">{</span><span class="n">ip</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">response</span><span class="o">.</span><span class="n">ok</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">None</span>
    <span class="n">data</span> <span class="o">=</span> <span class="n">response</span><span class="o">.</span><span class="n">json</span><span class="p">()</span>
    <span class="k">if</span> <span class="n">data</span><span class="p">[</span><span class="s1">'status'</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">'success'</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">None</span>
    <span class="k">return</span> <span class="n">data</span><span class="p">[</span><span class="s1">'countryCode'</span><span class="p">]</span>


<span class="k">def</span> <span class="nf">get_country_from_request</span><span class="p">(</span><span class="n">request</span><span class="p">:</span> <span class="n">HttpRequest</span><span class="p">)</span> <span class="o">-></span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
    <span class="n">ip</span> <span class="o">=</span> <span class="n">request</span><span class="o">.</span><span class="n">META</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'REMOTE_ADDR'</span><span class="p">,</span> <span class="n">request</span><span class="o">.</span><span class="n">META</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'HTTP_X_FORWARDED_FOR'</span><span class="p">))</span>
    <span class="k">if</span> <span class="n">ip</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">or</span> <span class="n">ip</span> <span class="o">==</span> <span class="s1">''</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">None</span>
    <span class="k">return</span> <span class="n">get_country_from_ip</span><span class="p">(</span><span class="n">ip</span><span class="p">)</span>
</pre></div>
<p>We now have two functions:</p>
<ul>
<li><code>get_country_from_ip</code>: receives an IP address and returns the country code.</li>
<li><code>get_country_from_request</code>: accepts a Django <code>HttpRequest</code>, extracts the IP from the header, and then uses the first function to find the country code.</li>
</ul>
<p>After splitting the function we can now look up an IP directly, without creating a request:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">get_country_from_ip</span><span class="p">(</span><span class="s1">'216.58.210.46'</span><span class="p">)</span>
<span class="go">'US'</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.test</span> <span class="kn">import</span> <span class="n">RequestFactory</span>
<span class="gp">>>> </span><span class="n">request</span> <span class="o">=</span> <span class="n">RequestFactory</span><span class="p">()</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'/'</span><span class="p">,</span> <span class="n">REMOTE_ADDR</span><span class="o">=</span><span class="s1">'216.58.210.46'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">get_country_from_request</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
<span class="go">'US'</span>
</pre></div>
<p>Now, let's write a test for <code>get_country_from_ip</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">re</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="kn">import</span> <span class="nn">responses</span>
<span class="k">with</span> <span class="n">responses</span><span class="o">.</span><span class="n">RequestsMock</span><span class="p">()</span> <span class="k">as</span> <span class="n">rsps</span><span class="p">:</span>
    <span class="n">url_pattern</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s1">'http://ip-api.com/json/[0-9\.]+'</span><span class="p">)</span>
    <span class="n">rsps</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">responses</span><span class="o">.</span><span class="n">GET</span><span class="p">,</span> <span class="n">url_pattern</span><span class="p">,</span> <span class="n">status</span><span class="o">=</span><span class="mi">200</span><span class="p">,</span> <span class="n">content_type</span><span class="o">=</span><span class="s1">'application/json'</span><span class="p">,</span> <span class="n">body</span><span class="o">=</span><span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">({</span>
        <span class="s1">'status'</span><span class="p">:</span> <span class="s1">'success'</span><span class="p">,</span>
        <span class="s1">'countryCode'</span><span class="p">:</span> <span class="s1">'US'</span>
    <span class="p">}))</span>
    <span class="n">country_code</span> <span class="o">=</span> <span class="n">get_country_from_ip</span><span class="p">(</span><span class="s1">'216.58.210.46'</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">country_code</span> <span class="o">==</span> <span class="s1">'US'</span>
</pre></div>
<p>This test looks similar to the previous one, but we no longer need to use <code>RequestFactory</code>. Because we have a separate function that retrieves the country code for an IP directly, we don't need to "fake" a Django <code>HttpRequest</code>.</p>
<p>Having said that, we still want to make sure the top level function works, and that the IP is being extracted from the request correctly:</p>
<div class="highlight"><pre><span></span><span class="c1"># BAD EXAMPLE!</span>
<span class="kn">import</span> <span class="nn">re</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="kn">import</span> <span class="nn">responses</span>
<span class="kn">from</span> <span class="nn">django.test</span> <span class="kn">import</span> <span class="n">RequestFactory</span>
<span class="n">rf</span> <span class="o">=</span> <span class="n">RequestFactory</span><span class="p">()</span>
<span class="n">request_with_no_ip</span> <span class="o">=</span> <span class="n">rf</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'/'</span><span class="p">)</span>
<span class="n">country_code</span> <span class="o">=</span> <span class="n">get_country_from_request</span><span class="p">(</span><span class="n">request_with_no_ip</span><span class="p">)</span>
<span class="k">assert</span> <span class="n">country_code</span> <span class="ow">is</span> <span class="kc">None</span>
</pre></div>
<p>We created a request with no IP and the function returned <code>None</code>. With this outcome, can we really say for sure that the function works as expected? Can we tell that the function returned <code>None</code> because it couldn't extract the IP from the request, or because the country lookup returned nothing?</p>
<p>Someone once told me that if to describe what a function does you need to use the words "and" or "or", you can probably benefit from splitting it. This is the layman's version of the <a href="https://en.wikipedia.org/wiki/Single-responsibility_principle" rel="noopener">Single-responsibility principle</a> that dictates that <strong>every class or function should have just one reason to change</strong>.</p>
<p>The function <code>get_country_from_request</code> extracts the IP from a request <em>and</em> tries to find the country code for it. So, if the rule is correct, we need to split it up:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">get_ip_from_request</span><span class="p">(</span><span class="n">request</span><span class="p">:</span> <span class="n">HttpRequest</span><span class="p">)</span> <span class="o">-></span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
    <span class="n">ip</span> <span class="o">=</span> <span class="n">request</span><span class="o">.</span><span class="n">META</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'REMOTE_ADDR'</span><span class="p">,</span> <span class="n">request</span><span class="o">.</span><span class="n">META</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'HTTP_X_FORWARDED_FOR'</span><span class="p">))</span>
    <span class="k">if</span> <span class="n">ip</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">or</span> <span class="n">ip</span> <span class="o">==</span> <span class="s1">''</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">None</span>
    <span class="k">return</span> <span class="n">ip</span>


<span class="c1"># Maintain backward compatibility</span>
<span class="k">def</span> <span class="nf">get_country_from_request</span><span class="p">(</span><span class="n">request</span><span class="p">:</span> <span class="n">HttpRequest</span><span class="p">)</span> <span class="o">-></span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
    <span class="n">ip</span> <span class="o">=</span> <span class="n">get_ip_from_request</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">ip</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">None</span>
    <span class="k">return</span> <span class="n">get_country_from_ip</span><span class="p">(</span><span class="n">ip</span><span class="p">)</span>
</pre></div>
<p>To be able to test if we extract an IP from a request correctly, we yanked this part to a separate function. We can now test this function separately:</p>
<div class="highlight"><pre><span></span><span class="n">rf</span> <span class="o">=</span> <span class="n">RequestFactory</span><span class="p">()</span>
<span class="k">assert</span> <span class="n">get_ip_from_request</span><span class="p">(</span><span class="n">rf</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'/'</span><span class="p">))</span> <span class="ow">is</span> <span class="kc">None</span>
<span class="k">assert</span> <span class="n">get_ip_from_request</span><span class="p">(</span><span class="n">rf</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'/'</span><span class="p">,</span> <span class="n">REMOTE_ADDR</span><span class="o">=</span><span class="s1">'0.0.0.0'</span><span class="p">))</span> <span class="o">==</span> <span class="s1">'0.0.0.0'</span>
<span class="k">assert</span> <span class="n">get_ip_from_request</span><span class="p">(</span><span class="n">rf</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'/'</span><span class="p">,</span> <span class="n">HTTP_X_FORWARDED_FOR</span><span class="o">=</span><span class="s1">'0.0.0.0'</span><span class="p">))</span> <span class="o">==</span> <span class="s1">'0.0.0.0'</span>
<span class="k">assert</span> <span class="n">get_ip_from_request</span><span class="p">(</span><span class="n">rf</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'/'</span><span class="p">,</span> <span class="n">REMOTE_ADDR</span><span class="o">=</span><span class="s1">'0.0.0.0'</span><span class="p">,</span> <span class="n">HTTP_X_FORWARDED_FOR</span><span class="o">=</span><span class="s1">'1.1.1.1'</span><span class="p">))</span> <span class="o">==</span> <span class="s1">'0.0.0.0'</span>
</pre></div>
<p>With just these 5 lines of code we covered a lot more possible scenarios.</p>
<h3 id="using-a-service"><a class="toclink" href="#using-a-service">Using a Service</a></h3>
<p>So far we've implemented unit tests for the function that extracts the IP from the request, and made it possible to do a country lookup using just an IP address. The tests for the top level function are still very messy. Because we use <code>requests</code> inside the function, we were forced to use <code>responses</code> as well to test it. There is nothing wrong with <code>responses</code>, but the fewer dependencies the better.</p>
<p>Invoking a request inside the function creates an implicit dependency between this function and the <code>requests</code> library. One way to eliminate this dependency is to extract the part making the request to a separate service:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">requests</span>
<span class="k">class</span> <span class="nc">IpLookupService</span><span class="p">:</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">base_url</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">base_url</span> <span class="o">=</span> <span class="n">base_url</span>
<span class="k">def</span> <span class="nf">get_country_from_ip</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">ip</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">base_url</span><span class="si">}</span><span class="s1">/json/</span><span class="si">{</span><span class="n">ip</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">response</span><span class="o">.</span><span class="n">ok</span><span class="p">:</span>
<span class="k">return</span> <span class="kc">None</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">response</span><span class="o">.</span><span class="n">json</span><span class="p">()</span>
<span class="k">if</span> <span class="n">data</span><span class="p">[</span><span class="s1">'status'</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">'success'</span><span class="p">:</span>
<span class="k">return</span> <span class="kc">None</span>
<span class="k">return</span> <span class="n">data</span><span class="p">[</span><span class="s1">'countryCode'</span><span class="p">]</span>
</pre></div>
<p>The new <code>IpLookupService</code> is instantiated with the base URL for the service, and provides a single method to get a country from an IP:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">ip_lookup_service</span> <span class="o">=</span> <span class="n">IpLookupService</span><span class="p">(</span><span class="s1">'http://ip-api.com'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">ip_lookup_service</span><span class="o">.</span><span class="n">get_country_from_ip</span><span class="p">(</span><span class="s1">'216.58.210.46'</span><span class="p">)</span>
<span class="go">'US'</span>
</pre></div>
<p>Constructing services this way has many benefits:</p>
<ul>
<li>Encapsulates all the logic related to IP lookup</li>
<li>Provides a single interface with type annotations</li>
<li>Can be reused</li>
<li>Can be tested separately</li>
<li>Can be developed separately (as long as the API it provides remains unchanged)</li>
<li>Can be adjusted for different environments (for example, use a different URL for test and production)</li>
</ul>
<p>The top level function should also change. Instead of making requests on its own, it uses the service:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">get_country_from_request</span><span class="p">(</span>
<span class="n">request</span><span class="p">:</span> <span class="n">HttpRequest</span><span class="p">,</span>
<span class="n">ip_lookup_service</span><span class="p">:</span> <span class="n">IpLookupService</span><span class="p">,</span>
<span class="p">)</span> <span class="o">-></span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
<span class="n">ip</span> <span class="o">=</span> <span class="n">get_ip_from_request</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
<span class="k">if</span> <span class="n">ip</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
<span class="k">return</span> <span class="kc">None</span>
<span class="k">return</span> <span class="n">ip_lookup_service</span><span class="o">.</span><span class="n">get_country_from_ip</span><span class="p">(</span><span class="n">ip</span><span class="p">)</span>
</pre></div>
<p>To use the function, we pass an instance of the service to it:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">ip_lookup_service</span> <span class="o">=</span> <span class="n">IpLookupService</span><span class="p">(</span><span class="s1">'http://ip-api.com'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">request</span> <span class="o">=</span> <span class="n">RequestFactory</span><span class="p">()</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'/'</span><span class="p">,</span> <span class="n">REMOTE_ADDR</span><span class="o">=</span><span class="s1">'216.58.210.46'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">get_country_from_request</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">ip_lookup_service</span><span class="p">)</span>
<span class="go">'US'</span>
</pre></div>
<p>Now that we have full control of the service, we can test the top level function without using <code>responses</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">unittest</span> <span class="kn">import</span> <span class="n">mock</span>
<span class="kn">from</span> <span class="nn">django.test</span> <span class="kn">import</span> <span class="n">RequestFactory</span>
<span class="n">fake_ip_lookup_service</span> <span class="o">=</span> <span class="n">mock</span><span class="o">.</span><span class="n">create_autospec</span><span class="p">(</span><span class="n">IpLookupService</span><span class="p">)</span>
<span class="n">fake_ip_lookup_service</span><span class="o">.</span><span class="n">get_country_from_ip</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="s1">'US'</span>
<span class="n">request</span> <span class="o">=</span> <span class="n">RequestFactory</span><span class="p">()</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'/'</span><span class="p">,</span> <span class="n">REMOTE_ADDR</span><span class="o">=</span><span class="s1">'216.58.210.46'</span><span class="p">)</span>
<span class="n">country_code</span> <span class="o">=</span> <span class="n">get_country_from_request</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">fake_ip_lookup_service</span><span class="p">)</span>
<span class="k">assert</span> <span class="n">country_code</span> <span class="o">==</span> <span class="s1">'US'</span>
</pre></div>
<p>To test the function without actually making HTTP requests, we created a mock of the service. We then set the return value of <code>get_country_from_ip</code>, and passed the mock service to the function.</p>
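<p>A mock can also tell us <em>how</em> the service was used, not just what came back. The following is a minimal sketch, not the code from the article: to keep it runnable without Django, it uses a hypothetical <code>get_country</code> helper that takes the IP directly instead of a request, and a stand-in <code>IpLookupService</code> that never touches the network.</p>

```python
from typing import Optional
from unittest import mock


class IpLookupService:
    """Stand-in for the service above; the real one performs an HTTP request."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        raise RuntimeError('network access is not expected in tests')


def get_country(ip: Optional[str], ip_lookup_service: IpLookupService) -> Optional[str]:
    # Hypothetical simplification of get_country_from_request:
    # it receives the IP directly instead of extracting it from a request.
    if ip is None:
        return None
    return ip_lookup_service.get_country_from_ip(ip)


# instance=True makes the mock behave like an instance of the class.
fake_ip_lookup_service = mock.create_autospec(IpLookupService, instance=True)
fake_ip_lookup_service.get_country_from_ip.return_value = 'US'

assert get_country('216.58.210.46', fake_ip_lookup_service) == 'US'
# The function passed the IP on to the service exactly once.
fake_ip_lookup_service.get_country_from_ip.assert_called_once_with('216.58.210.46')

# When there is no IP, the service should not be called at all.
fake_ip_lookup_service.reset_mock()
assert get_country(None, fake_ip_lookup_service) is None
fake_ip_lookup_service.get_country_from_ip.assert_not_called()
```

<p>Asserting on the calls is optional, but it catches a function that returns the right value for the wrong reason, for example by calling the service with a different IP than the one in the request.</p>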
<h3 id="changing-implementations"><a class="toclink" href="#changing-implementations">Changing Implementations</a></h3>
<p>Another often-mentioned benefit of DI is the ability to completely change the underlying implementation of an injected service. For example, one day you discover that you don't have to use a remote service to look up an IP. Instead, you can use a <a href="https://github.com/maxmind/geoip-api-python" rel="noopener">local IP database</a>.</p>
<p>Because our <code>IpLookupService</code> does not leak its internal implementation, it's an easy switch:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Optional</span>
<span class="kn">import</span> <span class="nn">GeoIP</span>
<span class="k">class</span> <span class="nc">LocalIpLookupService</span><span class="p">:</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">path_to_db_file</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">db</span> <span class="o">=</span> <span class="n">GeoIP</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">path_to_db_file</span><span class="p">,</span> <span class="n">GeoIP</span><span class="o">.</span><span class="n">GEOIP_STANDARD</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">get_country_from_ip</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">ip</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">db</span><span class="o">.</span><span class="n">country_code_by_addr</span><span class="p">(</span><span class="n">ip</span><span class="p">)</span>
</pre></div>
<p>The service API remained unchanged, so you can use it the same way as the old service:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">ip_lookup_service</span> <span class="o">=</span> <span class="n">LocalIpLookupService</span><span class="p">(</span><span class="s1">'/usr/share/GeoIP/GeoIP.dat'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">ip_lookup_service</span><span class="o">.</span><span class="n">get_country_from_ip</span><span class="p">(</span><span class="s1">'216.58.210.46'</span><span class="p">)</span>
<span class="go">'US'</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.test</span> <span class="kn">import</span> <span class="n">RequestFactory</span>
<span class="gp">>>> </span><span class="n">request</span> <span class="o">=</span> <span class="n">RequestFactory</span><span class="p">()</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'/'</span><span class="p">,</span> <span class="n">REMOTE_ADDR</span><span class="o">=</span><span class="s1">'216.58.210.46'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">get_country_from_request</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">ip_lookup_service</span><span class="p">)</span>
<span class="go">'US'</span>
</pre></div>
<p>The best part here is that the tests are unaffected. All the tests should pass without making any changes.</p>
<div class="admonition info">
<p class="admonition-title">GeoIP</p>
<p>In the example I use the <a href="https://github.com/maxmind/geoip-api-python" rel="noopener">MaxMind GeoIP Legacy Python Extension API</a> because it uses files I already have in my OS as part of <a href="https://linux.die.net/man/1/geoiplookup" rel="noopener"><code>geoiplookup</code></a>. If you really need to look up IP addresses, check out <a href="https://geoip2.readthedocs.io/en/latest/" rel="noopener">GeoIP2</a> and make sure to check the license and usage restrictions.</p>
<p>Also, Django users might be delighted to know that <a href="https://docs.djangoproject.com/en/3.0/ref/contrib/gis/geoip2/" rel="noopener">Django provides a wrapper around <code>geoip2</code></a>.</p>
</div>
<h3 id="typing-services"><a class="toclink" href="#typing-services">Typing Services</a></h3>
<p>In the last section we cheated a bit. We injected the new service <code>LocalIpLookupService</code> into a function that expects an instance of <code>IpLookupService</code>. We made sure that the two provide the same API, but the type annotations are now wrong. We also tested the function with a mock, which is not of type <code>IpLookupService</code> either. So, how can we use type annotations and still be able to inject different services?</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="kn">from</span> <span class="nn">abc</span> <span class="kn">import</span> <span class="n">ABCMeta</span>
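</span><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Optional</span>
<span class="kn">import</span> <span class="nn">GeoIP</span>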
<span class="kn">import</span> <span class="nn">requests</span>
<span class="hll"><span class="k">class</span> <span class="nc">IpLookupService</span><span class="p">(</span><span class="n">metaclass</span><span class="o">=</span><span class="n">ABCMeta</span><span class="p">):</span>
</span> <span class="k">def</span> <span class="nf">get_country_from_ip</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">ip</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
<span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">()</span>
<span class="k">class</span> <span class="nc">RemoteIpLookupService</span><span class="p">(</span><span class="n">IpLookupService</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">base_url</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">base_url</span> <span class="o">=</span> <span class="n">base_url</span>
<span class="k">def</span> <span class="nf">get_country_from_ip</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">ip</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">base_url</span><span class="si">}</span><span class="s1">/json/</span><span class="si">{</span><span class="n">ip</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">response</span><span class="o">.</span><span class="n">ok</span><span class="p">:</span>
<span class="k">return</span> <span class="kc">None</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">response</span><span class="o">.</span><span class="n">json</span><span class="p">()</span>
<span class="k">if</span> <span class="n">data</span><span class="p">[</span><span class="s1">'status'</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">'success'</span><span class="p">:</span>
<span class="k">return</span> <span class="kc">None</span>
<span class="k">return</span> <span class="n">data</span><span class="p">[</span><span class="s1">'countryCode'</span><span class="p">]</span>
<span class="k">class</span> <span class="nc">LocalIpLookupService</span><span class="p">(</span><span class="n">IpLookupService</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">path_to_db_file</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">db</span> <span class="o">=</span> <span class="n">GeoIP</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">path_to_db_file</span><span class="p">,</span> <span class="n">GeoIP</span><span class="o">.</span><span class="n">GEOIP_STANDARD</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">get_country_from_ip</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">ip</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">db</span><span class="o">.</span><span class="n">country_code_by_addr</span><span class="p">(</span><span class="n">ip</span><span class="p">)</span>
</pre></div>
<p>We defined a base class called <code>IpLookupService</code> that acts as an interface. The base class defines the public API for users of <code>IpLookupService</code>. Using the base class, we can provide two implementations:</p>
<ol>
<li><code>RemoteIpLookupService</code>: uses the <code>requests</code> library to look up the IP with an external service.</li>
<li><code>LocalIpLookupService</code>: uses the local GeoIP database.</li>
</ol>
<p>Now, any function that needs an instance of <code>IpLookupService</code> can use this type, and the function will be able to accept any subclass of it.</p>
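<p>The base class above signals a missing implementation only when the method is called. As a possible variation (not what the code above does), marking the method with <code>abstractmethod</code> makes Python refuse to instantiate a subclass that forgot to implement it. A minimal sketch, where <code>StaticIpLookupService</code>, <code>IncompleteIpLookupService</code> and <code>describe</code> are hypothetical names used only for illustration:</p>

```python
from abc import ABC, abstractmethod
from typing import Optional


class IpLookupService(ABC):
    @abstractmethod
    def get_country_from_ip(self, ip: str) -> Optional[str]:
        ...


class StaticIpLookupService(IpLookupService):
    # Hypothetical implementation that always returns the same answer.
    def __init__(self, country_code: Optional[str]) -> None:
        self.country_code = country_code

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        return self.country_code


class IncompleteIpLookupService(IpLookupService):
    # Forgot to implement get_country_from_ip.
    pass


def describe(ip: str, ip_lookup_service: IpLookupService) -> str:
    # Accepts any subclass of IpLookupService.
    country = ip_lookup_service.get_country_from_ip(ip)
    return f'{ip} -> {country or "unknown"}'


assert describe('216.58.210.46', StaticIpLookupService('US')) == '216.58.210.46 -> US'

try:
    IncompleteIpLookupService()
except TypeError:
    # Abstract classes with unimplemented abstract methods cannot be instantiated.
    pass
else:
    raise AssertionError('expected TypeError')
```

<p>The trade-off is that the error moves from the first call to the moment the object is created, which is usually closer to where the mistake was made.</p>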
<p>Before we wrap things up, we still need to handle the tests. Previously we removed the test's dependency on <code>responses</code>, now we can ditch <code>mock</code> as well. Instead, we subclass <code>IpLookupService</code> with a simple implementation for testing:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Iterable</span>
<span class="k">class</span> <span class="nc">FakeIpLookupService</span><span class="p">(</span><span class="n">IpLookupService</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">results</span><span class="p">:</span> <span class="n">Iterable</span><span class="p">[</span><span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]]):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">results</span> <span class="o">=</span> <span class="nb">iter</span><span class="p">(</span><span class="n">results</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">get_country_from_ip</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">ip</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
<span class="k">return</span> <span class="nb">next</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">results</span><span class="p">)</span>
</pre></div>
<p>The <code>FakeIpLookupService</code> implements <code>IpLookupService</code>, and produces results from a list of predefined results we provide to it:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.test</span> <span class="kn">import</span> <span class="n">RequestFactory</span>
<span class="hll"><span class="n">fake_ip_lookup_service</span> <span class="o">=</span> <span class="n">FakeIpLookupService</span><span class="p">(</span><span class="n">results</span><span class="o">=</span><span class="p">[</span><span class="s1">'US'</span><span class="p">])</span>
</span><span class="n">request</span> <span class="o">=</span> <span class="n">RequestFactory</span><span class="p">()</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'/'</span><span class="p">,</span> <span class="n">REMOTE_ADDR</span><span class="o">=</span><span class="s1">'216.58.210.46'</span><span class="p">)</span>
<span class="n">country_code</span> <span class="o">=</span> <span class="n">get_country_from_request</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">fake_ip_lookup_service</span><span class="p">)</span>
<span class="k">assert</span> <span class="n">country_code</span> <span class="o">==</span> <span class="s1">'US'</span>
</pre></div>
<p>The test no longer uses <code>mock</code>.</p>
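<p>Because the fake draws its results from an iterable, a single instance can script several lookups in a row, including a failed one. The sketch below repeats the fake with a minimal stand-in base class so it runs on its own; the IP addresses are arbitrary placeholders, since the fake ignores them.</p>

```python
from typing import Iterable, Optional


class IpLookupService:
    # Minimal stand-in for the base class defined earlier.
    def get_country_from_ip(self, ip: str) -> Optional[str]:
        raise NotImplementedError()


class FakeIpLookupService(IpLookupService):
    def __init__(self, results: Iterable[Optional[str]]) -> None:
        self.results = iter(results)

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        # Ignores the IP and returns the next scripted result.
        return next(self.results)


fake_ip_lookup_service = FakeIpLookupService(results=['US', None, 'IL'])

assert fake_ip_lookup_service.get_country_from_ip('216.58.210.46') == 'US'
assert fake_ip_lookup_service.get_country_from_ip('10.0.0.1') is None  # lookup "failed"
assert fake_ip_lookup_service.get_country_from_ip('192.0.2.1') == 'IL'
```

<p>Once the scripted results run out, <code>next</code> raises <code>StopIteration</code>, so a test that makes more lookups than expected fails loudly instead of silently reusing a value.</p>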
<h3 id="using-a-protocol"><a class="toclink" href="#using-a-protocol">Using a Protocol</a></h3>
<p>The form of class hierarchy demonstrated in the previous section is called <a href="https://en.wikipedia.org/wiki/Nominal_type_system" rel="noopener">"nominal subtyping"</a>. There is another way to utilize typing without classes, using <a href="https://mypy.readthedocs.io/en/latest/protocols.html" rel="noopener"><code>Protocols</code></a>:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Iterable</span><span class="p">,</span> <span class="n">Optional</span>
<span class="hll"><span class="kn">from</span> <span class="nn">typing_extensions</span> <span class="kn">import</span> <span class="n">Protocol</span>
</span><span class="kn">import</span> <span class="nn">GeoIP</span>
<span class="kn">import</span> <span class="nn">requests</span>
<span class="hll"><span class="k">class</span> <span class="nc">IpLookupService</span><span class="p">(</span><span class="n">Protocol</span><span class="p">):</span>
</span> <span class="k">def</span> <span class="nf">get_country_from_ip</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">ip</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
<span class="k">pass</span>
<span class="k">class</span> <span class="nc">RemoteIpLookupService</span><span class="p">:</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">base_url</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">base_url</span> <span class="o">=</span> <span class="n">base_url</span>
<span class="k">def</span> <span class="nf">get_country_from_ip</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">ip</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">base_url</span><span class="si">}</span><span class="s1">/json/</span><span class="si">{</span><span class="n">ip</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">response</span><span class="o">.</span><span class="n">ok</span><span class="p">:</span>
<span class="k">return</span> <span class="kc">None</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">response</span><span class="o">.</span><span class="n">json</span><span class="p">()</span>
<span class="k">if</span> <span class="n">data</span><span class="p">[</span><span class="s1">'status'</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">'success'</span><span class="p">:</span>
<span class="k">return</span> <span class="kc">None</span>
<span class="k">return</span> <span class="n">data</span><span class="p">[</span><span class="s1">'countryCode'</span><span class="p">]</span>
<span class="k">class</span> <span class="nc">LocalIpLookupService</span><span class="p">:</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">path_to_db_file</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">db</span> <span class="o">=</span> <span class="n">GeoIP</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">path_to_db_file</span><span class="p">,</span> <span class="n">GeoIP</span><span class="o">.</span><span class="n">GEOIP_STANDARD</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">get_country_from_ip</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">ip</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">db</span><span class="o">.</span><span class="n">country_code_by_addr</span><span class="p">(</span><span class="n">ip</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">FakeIpLookupService</span><span class="p">:</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">results</span><span class="p">:</span> <span class="n">Iterable</span><span class="p">[</span><span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]]):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">results</span> <span class="o">=</span> <span class="nb">iter</span><span class="p">(</span><span class="n">results</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">get_country_from_ip</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">ip</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
<span class="k">return</span> <span class="nb">next</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">results</span><span class="p">)</span>
</pre></div>
<p>The switch from classes to protocols is mild. Instead of creating <code>IpLookupService</code> as a base class, we declare it a <code>Protocol</code>. A protocol defines an interface and cannot be instantiated; it is used only for typing purposes. When a class implements the interface defined by the protocol, it means "structural subtyping" exists and the type check will pass.</p>
<p>In our case, we use a protocol to make sure an argument of type <code>IpLookupService</code> implements the functions we expect an IP service to provide.</p>
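<p>Structural subtyping is normally verified by a static type checker such as mypy, but it can also be observed at runtime. The sketch below is an illustration under a few assumptions: it imports <code>Protocol</code> from the standard <code>typing</code> module (available since Python 3.8) rather than <code>typing_extensions</code>, it adds the <code>runtime_checkable</code> decorator, which the code above does not use, and <code>StaticIpLookupService</code>, <code>NotAService</code> and <code>get_country</code> are hypothetical names.</p>

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class IpLookupService(Protocol):
    def get_country_from_ip(self, ip: str) -> Optional[str]:
        ...


class StaticIpLookupService:
    """Hypothetical service. Note that it does not inherit from IpLookupService."""

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        return 'US'


class NotAService:
    # Has a lookup method, but under a different name.
    def lookup(self, ip: str) -> Optional[str]:
        return None


def get_country(ip: str, ip_lookup_service: IpLookupService) -> Optional[str]:
    return ip_lookup_service.get_country_from_ip(ip)


# The class matches the protocol by structure alone.
assert isinstance(StaticIpLookupService(), IpLookupService)
assert not isinstance(NotAService(), IpLookupService)
assert get_country('216.58.210.46', StaticIpLookupService()) == 'US'
```

<p>The runtime <code>isinstance</code> check only verifies that a method with the right name exists. Argument and return types are checked only by the static type checker.</p>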
<div class="admonition info">
<p class="admonition-title">structural and nominal subtyping</p>
<p>I've written about protocols, structural and nominal subtyping in the past. Check out <a href="https://realpython.com/modeling-polymorphism-django-python/#generic-foreign-key" rel="noopener">Modeling Polymorphism in Django With Python</a>.</p>
</div>
<p><strong>So which to use?</strong> Some languages, like Java, use nominal typing exclusively, while other languages, like Go, use structural typing for interfaces. There are advantages and disadvantages to both ways, but we won't get into that here. In Python, nominal typing is easier to use and understand, so my recommendation is to stick to it, unless you need the flexibility afforded by protocols.</p>
<h3 id="nondeterminism-and-side-effects"><a class="toclink" href="#nondeterminism-and-side-effects">Nondeterminism and Side-Effects</a></h3>
<p>If you've ever had a test that one day just started to fail, unprovoked, or a test that fails once in a blue moon for no apparent reason, it's possible your code is relying on something that is not deterministic. In the <code>datetime.date.today</code> example, the result of <code>datetime.date.today</code> relies on the current time which is always changing, hence it's not deterministic.</p>
<p>There are many sources of nondeterminism. Common examples include:</p>
<ul>
<li>Randomness</li>
<li>Network access</li>
<li>Filesystem access</li>
<li>Database access</li>
<li>Environment variables</li>
<li>Mutable global variables</li>
</ul>
<p>Dependency injection provides a good way to control nondeterminism in tests. The basic recipe is this:</p>
<ol>
<li><strong>Identify the source of nondeterminism and encapsulate it in a service</strong>: For example, TimeService, RandomnessService, HttpService, FilesystemService and DatabaseService.</li>
<li><strong>Use dependency injection to access these services</strong>: Never bypass them by using datetime.now() and similar directly.</li>
<li><strong>Provide deterministic implementations of these services in tests</strong>: Use a mock, or a custom implementation suited for tests instead.</li>
</ol>
<p>If you follow the recipe diligently, your tests will not be affected by external circumstances and you will not have flaky tests!</p>
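<p>The recipe above can be sketched with the current date as the source of nondeterminism. This is only an illustration: <code>FixedTimeService</code> and <code>days_until_new_year</code> are hypothetical names, and a real project would likely define a protocol or base class for the service.</p>

```python
import datetime


class TimeService:
    """Production implementation: reads the real clock."""

    def today(self) -> datetime.date:
        return datetime.date.today()


class FixedTimeService:
    """Test implementation: always returns the same date."""

    def __init__(self, fixed: datetime.date) -> None:
        self.fixed = fixed

    def today(self) -> datetime.date:
        return self.fixed


def days_until_new_year(time_service) -> int:
    # The clock is injected, never read directly via datetime.date.today().
    today = time_service.today()
    next_new_year = datetime.date(today.year + 1, 1, 1)
    return (next_new_year - today).days


# In a test the result no longer depends on when the test runs.
print(days_until_new_year(FixedTimeService(datetime.date(2020, 12, 31))))  # 1
```

<p>Production code passes <code>TimeService()</code>, tests pass <code>FixedTimeService(...)</code>, and the function under test never needs to know the difference.</p>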
<hr>
<h2 id="conclusion"><a class="toclink" href="#conclusion">Conclusion</a></h2>
<p>Dependency injection is a design pattern just like any other. Developers can decide to what degree they want to take advantage of it. The main benefits of DI are:</p>
<ul>
<li>Decouple modules, functions and objects.</li>
<li>Switch implementations, or support several different implementations.</li>
<li>Eliminate nondeterminism from tests.</li>
</ul>
<p>In the use-case above we took several twists and turns to illustrate a point, which might have caused the implementation to seem more complicated than it really is. In addition to that, searching for information about dependency injection in Python often results in libraries and packages that seem to completely change the way you structure your application. This can be very intimidating.</p>
<p>In reality, DI can be used sparingly and in appropriate places to achieve the benefits listed above. When implemented correctly, DI can make your code easier to maintain and to test.</p>How to Move a Django Model to Another App2020-05-06T00:00:00+03:002020-05-06T00:00:00+03:00Haki Benitatag:hakibenita.com,2020-05-06:/move-django-model<p>In my latest article for RealPython I cover some exotic migration operations, many of the built-in migration CLI commands and demonstrate important migrations concepts such as reversible migrations, migration plans and introspection.</p><hr>
<p>In my latest article for <a href="https://realpython.com" rel="noopener">RealPython</a> I share three ways to tackle one of the most challenging tasks involving Django migrations: moving a model from one Django app to another.</p>
<p>The article covers some exotic migration operations and many of the built-in migration CLI commands such as <code>sqlmigrate</code>, <code>showmigrations</code> and <code>sqlsequencereset</code>. In the article I also demonstrate important migration concepts such as reversible migrations, migration plans and introspection.</p>
<p><a href="https://realpython.com/move-django-model/" rel="noopener"><strong>Read "How to Move a Django Model to Another App" on RealPython »</strong></a></p>
<figure><img alt="How to Move a Django Model to Another App" src="https://hakibenita.com/images/00-move-django-model.png"><figcaption>How to Move a Django Model to Another App</figcaption>
</figure>Testing an Interactive Voice Response System With Python and Pytest2020-05-01T00:00:00+03:002020-05-01T00:00:00+03:00Haki Benitatag:hakibenita.com,2020-05-01:/python-django-pytest-twilio-ivr<p>It can be very challenging to test a system that relies heavily on a third party service such as Twilio. In this article, I show how to organize your code in a way that would isolate your business logic and make it easier for you to test it separately.</p><hr>
<p>Following my previous article on <a href="/python-django-twilio-ivr">how to build an Interactive Voice Response (IVR) system with Twilio, Python and Django</a>, in this follow-up tutorial I show how to write automated tests for this system.</p>
<p>It can be very challenging to test a system that relies heavily on a third party service such as Twilio. In this article, I show how to organize your code in a way that would isolate your business logic and make it easier to test separately. The article demonstrates useful testing patterns using Django's <code>RequestFactory</code>, <code>unittest.mock</code>, Pytest fixtures, built-in <code>pytest-django</code> fixtures and many more.</p>
<div class="admonition info">
<p class="admonition-title">Source Code</p>
<p>The source code for this article and the previous one can be found <a href="https://github.com/hakib/twilio-ivr-test" rel="noopener">here</a>.</p>
</div>
<p><a href="https://www.twilio.com/blog/testing-twilio-ivr-system-python-pytest" rel="noopener"><strong>Read "Testing a Twilio Interactive Voice Response System With Python and Pytest" on the Twilio blog »</strong></a></p>
<figure><img alt="Building an IVR System with Django and Twilio" src="https://hakibenita.com/images/00-python-django-pytest-twilio-ivr.png"><figcaption>Building an IVR System with Django and Twilio</figcaption>
</figure>How to Provide Test Fixtures for Django Models in Pytest2020-04-08T00:00:00+03:002020-04-08T00:00:00+03:00Haki Benitatag:hakibenita.com,2020-04-08:/django-pytest-fixtures<p>One of the most challenging aspects of writing good tests is maintaining test fixtures. Good test fixtures motivate developers to write better tests, and bad fixtures can cripple a system to a point where developers fear and avoid them altogether. The article covers everything from setting up Pytest for a Django project to creating test fixtures and dependencies between fixtures.</p><hr>
<p>One of the most challenging aspects of writing good tests is maintaining test fixtures. Good test fixtures motivate developers to write better tests, and bad fixtures can cripple a system to a point where developers fear and avoid them altogether. The key to maintaining good fixtures is to find a good balance between flexibility and usability. Good fixtures are ones that are easy to use and easy to modify.</p>
<p>In my latest article for <a href="https://realpython.com" rel="noopener">RealPython</a> I share some insights on how to maintain good test fixtures for Django models using Pytest. The article covers everything from setting up Pytest for a Django project to creating test fixtures and dependencies between fixtures.</p>
<p>The article focuses on a pattern called <strong>"factory as a service"</strong>. Using this pattern, you can create fixtures for Django models that depend on other fixtures. This makes it easier to set up data for tests and focus on the scenario at hand rather than on setting up the data.</p>
<p><a href="https://realpython.com/django-pytest-fixtures/" rel="noopener"><strong>Read "How to Provide Test Fixtures for Django Models in Pytest" on RealPython »</strong></a></p>
<figure><img alt="How to Provide Test Fixtures for Django Models in Pytest" src="https://hakibenita.com/images/00-django-pytest-fixtures.png"><figcaption>How to Provide Test Fixtures for Django Models in Pytest</figcaption>
</figure>Using Markdown in Django2020-03-30T00:00:00+03:002020-03-30T00:00:00+03:00Haki Benitatag:hakibenita.com,2020-03-30:/django-markdown<p>How we developed a Markdown extension to manage content in Django sites.</p><hr>
<p>As developers, we rely on static analysis tools to check, lint and transform our code. We use these tools to help us be more productive and produce better code. However, when we write content using <a href="https://wikipedia.org/wiki/Markdown" rel="noopener">markdown</a> the tools at our disposal are scarce.</p>
<p><strong>In this article we describe how we developed a Markdown extension to address challenges in managing content using Markdown in Django sites.</strong></p>
<figure><img alt="Do you think they had a linter?<br><small>Photo by <a href="https://www.pexels.com/photo/typing-writing-typography-vintage-102100/">mali maeder from Pexels</a></small>" src="https://hakibenita.com/images/00-django-markdown.jpg"><figcaption>Do you think they had a linter?<br><small>Photo by <a href="https://www.pexels.com/photo/typing-writing-typography-vintage-102100/">mali maeder from Pexels</a></small></figcaption>
</figure>
<p><details class="toc-container" open>
<summary>Table of Contents</summary></p>
<div class="toc">
<ul>
<li><a href="#the-problem">The Problem</a><ul>
<li><a href="#prior-work">Prior Work</a></li>
</ul>
</li>
<li><a href="#using-markdown">Using Markdown</a><ul>
<li><a href="#converting-markdown-to-html">Converting Markdown to HTML</a></li>
<li><a href="#using-markdown-extensions">Using Markdown Extensions</a></li>
<li><a href="#creating-a-markdown-extension-to-process-inline-links">Creating a Markdown Extension to Process Inline Links</a></li>
</ul>
</li>
<li><a href="#validate-and-transform-django-links">Validate and Transform Django Links</a><ul>
<li><a href="#validating-mailto-links">Validating mailto Links</a></li>
<li><a href="#handling-internal-and-external-links">Handling Internal and External Links</a><ul>
<li><a href="#resolving-url-names">Resolving URL Names</a></li>
<li><a href="#handling-external-links">Handling External Links</a></li>
<li><a href="#requiring-scheme">Requiring Scheme</a></li>
</ul>
</li>
<li><a href="#putting-it-all-together">Putting it All Together</a></li>
</ul>
</li>
<li><a href="#conclusion">Conclusion</a><ul>
<li><a href="#taking-it-further">Taking it Further</a></li>
</ul>
</li>
</ul>
</div>
<p></details></p>
<h2 id="the-problem"><a class="toclink" href="#the-problem">The Problem</a></h2>
<p>Like every website, we have different types of (mostly) static content in places like our home page, FAQ section and "About" page. For a very long time, we managed all of this content directly in Django templates.</p>
<p>When we finally decided it's time to move this content out of templates and into the database, we thought it's best to use Markdown. It's safer to produce HTML from Markdown, it provides a certain level of control and uniformity, and is easier for non-technical users to handle. As we progressed with the move, we noticed we are missing a few things:</p>
<p><strong>Internal Links</strong></p>
<p>Links to internal pages can get broken when the URL changes. In Django templates and views we use <code>reverse</code> and <code>{% url %}</code>, but this is not available in plain Markdown.</p>
<p><strong>Copy Between Environments</strong></p>
<p>Absolute internal links cannot be copied between environments. This can be resolved using relative links, but there is no way to enforce this out of the box.</p>
<p><strong>Invalid Links</strong></p>
<p>Invalid links can harm user experience and cause the user to question the reliability of the entire content. This is not something that is unique to Markdown, but HTML templates are maintained by developers who know a thing or two about URLs. Markdown documents on the other hand, are intended for non-technical writers.</p>
<h3 id="prior-work"><a class="toclink" href="#prior-work">Prior Work</a></h3>
<p>When I was researching this issue I searched for Python linters, Markdown preprocessors and extensions to help produce better Markdown. I found very few results. One approach that stood out was to use Django templates to produce Markdown documents.</p>
<p><strong>Preprocess Markdown using Django Template</strong></p>
<p>Using Django templates, you can use template tags such as <a href="https://docs.djangoproject.com/en/3.0/ref/templates/builtins/#url" rel="noopener"><code>url</code></a> to reverse URL names, as well as conditions, variables, date formats and all the other Django template features. This approach essentially uses Django templates as a preprocessor for Markdown documents.</p>
<p>I personally felt like this may not be the best solution for non-technical writers. In addition, I was worried that providing access to Django template tags might be dangerous.</p>
<hr>
<h2 id="using-markdown"><a class="toclink" href="#using-markdown">Using Markdown</a></h2>
<p>With a better understanding of the problem, we were ready to dig a bit deeper into Markdown in Python.</p>
<h3 id="converting-markdown-to-html"><a class="toclink" href="#converting-markdown-to-html">Converting Markdown to HTML</a></h3>
<p>To start using Markdown in Python, install the <a href="https://python-markdown.github.io/" rel="noopener"><code>markdown</code></a> package:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>pip<span class="w"> </span>install<span class="w"> </span>markdown
<span class="go">Collecting markdown</span>
<span class="go">Installing collected packages: markdown</span>
<span class="go">Successfully installed markdown-3.2.1</span>
</pre></div>
<p>Next, create a <code>Markdown</code> object and use its <code>convert</code> method to turn some Markdown into HTML:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">markdown</span>
<span class="gp">>>> </span><span class="n">md</span> <span class="o">=</span> <span class="n">markdown</span><span class="o">.</span><span class="n">Markdown</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">md</span><span class="o">.</span><span class="n">convert</span><span class="p">(</span><span class="s2">"My name is **Haki**"</span><span class="p">)</span>
<span class="go"><p>My name is <strong>Haki</strong></p></span>
</pre></div>
<p>You can now use this HTML snippet in your template.</p>
<h3 id="using-markdown-extensions"><a class="toclink" href="#using-markdown-extensions">Using Markdown Extensions</a></h3>
<p>The basic Markdown processor provides the essentials for producing HTML content. For more "exotic" options, the Python <code>markdown</code> package includes some <a href="https://python-markdown.github.io/extensions/" rel="noopener">built-in extensions</a>. A popular extension is the <a href="https://python-markdown.github.io/extensions/extra/" rel="noopener">"extra" extension</a> that adds, among other things, support for <a href="https://python-markdown.github.io/extensions/fenced_code_blocks/" rel="noopener">fenced code blocks</a>:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">markdown</span>
<span class="gp">>>> </span><span class="n">md</span> <span class="o">=</span> <span class="n">markdown</span><span class="o">.</span><span class="n">Markdown</span><span class="p">(</span><span class="n">extensions</span><span class="o">=</span><span class="p">[</span><span class="s1">'extra'</span><span class="p">])</span>
<span class="gp">>>> </span><span class="n">md</span><span class="o">.</span><span class="n">convert</span><span class="p">(</span><span class="s2">"""```python</span>
<span class="gp">... </span><span class="s2">print('this is Python code!')</span>
<span class="gp">... </span><span class="s2">```"""</span><span class="p">)</span>
<span class="go"><pre><code class="python">print(\'this is Python code!\')\n</code></pre></span>
</pre></div>
<p>To extend Markdown with our unique Django capabilities, we are going to develop an extension of our own.</p>
<h3 id="creating-a-markdown-extension-to-process-inline-links"><a class="toclink" href="#creating-a-markdown-extension-to-process-inline-links">Creating a Markdown Extension to Process Inline Links</a></h3>
<p>If you look at the source, you'll see that to convert Markdown to HTML, <code>Markdown</code> uses different processors. One type of processor is an <a href="https://github.com/Python-Markdown/markdown/blob/c116cfcca8bf610b643cbc7eafe9228f7a832fc3/markdown/inlinepatterns.py#L73" rel="noopener">inline processor</a>. Inline processors match specific inline patterns such as links, backticks, bold text and underlined text, and convert them to HTML.</p>
<p>The main purpose of our Markdown extension is to validate and transform links. So, the inline processor we are most interested in is the <a href="https://github.com/Python-Markdown/markdown/blob/c116cfcca8bf610b643cbc7eafe9228f7a832fc3/markdown/inlinepatterns.py#L593" rel="noopener"><code>LinkInlineProcessor</code></a>. This processor takes markdown in the form of <code>[Haki's website](https://hakibenita.com)</code>, parses it and returns a tuple containing the link and the text.</p>
<p>To extend the functionality, we extend <code>LinkInlineProcessor</code> and create a <code>markdown.Extension</code> that uses it to handle links:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">markdown</span>
<span class="kn">from</span> <span class="nn">markdown.inlinepatterns</span> <span class="kn">import</span> <span class="n">LinkInlineProcessor</span><span class="p">,</span> <span class="n">LINK_RE</span>

<span class="k">def</span> <span class="nf">get_site_domain</span><span class="p">()</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
    <span class="c1"># TODO: Get your site domain here</span>
    <span class="k">return</span> <span class="s1">'example.com'</span>

<span class="k">def</span> <span class="nf">clean_link</span><span class="p">(</span><span class="n">href</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">site_domain</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
    <span class="c1"># TODO: This is where the magic happens!</span>
    <span class="k">return</span> <span class="n">href</span>

<span class="k">class</span> <span class="nc">DjangoLinkInlineProcessor</span><span class="p">(</span><span class="n">LinkInlineProcessor</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">getLink</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">index</span><span class="p">):</span>
        <span class="n">href</span><span class="p">,</span> <span class="n">title</span><span class="p">,</span> <span class="n">index</span><span class="p">,</span> <span class="n">handled</span> <span class="o">=</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">getLink</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">index</span><span class="p">)</span>
        <span class="n">site_domain</span> <span class="o">=</span> <span class="n">get_site_domain</span><span class="p">()</span>
        <span class="n">href</span> <span class="o">=</span> <span class="n">clean_link</span><span class="p">(</span><span class="n">href</span><span class="p">,</span> <span class="n">site_domain</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">href</span><span class="p">,</span> <span class="n">title</span><span class="p">,</span> <span class="n">index</span><span class="p">,</span> <span class="n">handled</span>

<span class="k">class</span> <span class="nc">DjangoUrlExtension</span><span class="p">(</span><span class="n">markdown</span><span class="o">.</span><span class="n">Extension</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">extendMarkdown</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">md</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="n">md</span><span class="o">.</span><span class="n">inlinePatterns</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">DjangoLinkInlineProcessor</span><span class="p">(</span><span class="n">LINK_RE</span><span class="p">,</span> <span class="n">md</span><span class="p">),</span> <span class="s1">'link'</span><span class="p">,</span> <span class="mi">160</span><span class="p">)</span>
</pre></div>
<p>Let's break it down:</p>
<ul>
<li>The extension <code>DjangoUrlExtension</code> registers an inline link processor called <code>DjangoLinkInlineProcessor</code>. This processor will replace any other existing link processor.</li>
<li>The inline processor <code>DjangoLinkInlineProcessor</code> extends the built-in <code>LinkInlineProcessor</code>, and calls the function <code>clean_link</code> on every link it processes.</li>
<li>The function <code>clean_link</code> receives a link and a domain, and returns a transformed link. This is where we are going to plug in our implementation.</li>
</ul>
<div class="admonition info">
<p class="admonition-title">How to get the site domain</p>
<p>To identify links to your own site you must know the domain of your site. If you are using Django's <a href="https://docs.djangoproject.com/en/3.0/ref/contrib/sites" rel="noopener">sites framework</a> you can use it to <a href="https://docs.djangoproject.com/en/3.0/ref/contrib/sites/#getting-the-current-domain-for-full-urls" rel="noopener">get the current domain</a>.</p>
<p>I did not include this in my implementation because we don't use the sites framework. Instead, we set a variable in Django settings.</p>
<p>Another way to get the current domain is from an <a href="https://docs.djangoproject.com/en/3.0/ref/request-response/#django.http.HttpRequest" rel="noopener"><code>HttpRequest</code> object</a>. If content is only edited in your own site, you can try to plug the site domain from the request object. This may require some changes to the implementation.</p>
</div>
<p>To use the extension, add it when you initialize a new <code>Markdown</code> instance:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">md</span> <span class="o">=</span> <span class="n">markdown</span><span class="o">.</span><span class="n">Markdown</span><span class="p">(</span><span class="n">extensions</span><span class="o">=</span><span class="p">[</span><span class="n">DjangoUrlExtension</span><span class="p">()])</span>
<span class="gp">>>> </span><span class="n">md</span><span class="o">.</span><span class="n">convert</span><span class="p">(</span><span class="s2">"[haki's site](https://hakibenita.com)"</span><span class="p">)</span>
<span class="go"><p><a href="https://hakibenita.com">haki\'s site</a></p></span>
</pre></div>
<p>Great, the extension is being used and we are ready for the interesting part!</p>
<hr>
<h2 id="validate-and-transform-django-links"><a class="toclink" href="#validate-and-transform-django-links">Validate and Transform Django Links</a></h2>
<p>Now that we got the extension to call <code>clean_link</code> on all links, we can implement our validation and transformation logic.</p>
<h3 id="validating-mailto-links"><a class="toclink" href="#validating-mailto-links">Validating <code>mailto</code> Links</a></h3>
<p>To get the ball rolling, we'll start with a simple validation. <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a#Linking_to_an_email_address" rel="noopener"><code>mailto</code> links</a> are useful for opening the user's email client with a predefined recipient address, subject and even message body.</p>
<p>A common <code>mailto</code> link can look like this:</p>
<div class="highlight"><pre><span></span><span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"mailto:support@service.com?subject=I need help!"</span><span class="p">></span>Help!<span class="p"></</span><span class="nt">a</span><span class="p">></span>
</pre></div>
<p>This link will open your email client set to compose a new email to "support@service.com" with subject line "I need help!".</p>
<p><code>mailto</code> links do not have to include an email address. If you look at the "share" buttons at the bottom of this article, you'll find a <code>mailto</code> link that looks like this:</p>
<div class="highlight"><pre><span></span><span class="p"><</span><span class="nt">a</span>
<span class="na">href</span><span class="o">=</span><span class="s">"mailto:?subject=Django Markdown by Haki Benita&body=http://hakibenita.com/django-markdown"</span>
<span class="na">title</span><span class="o">=</span><span class="s">"Email"</span><span class="p">></span>
Share via Email
<span class="p"></</span><span class="nt">a</span><span class="p">></span>
</pre></div>
<p>This <code>mailto</code> link does not include a recipient, just a subject line and message body.</p>
<p>Now that we have a good understanding of what <code>mailto</code> links look like, we can add the first validation to the <code>clean_link</code> function:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Optional</span>
<span class="kn">import</span> <span class="nn">re</span>
<span class="kn">from</span> <span class="nn">django.core.exceptions</span> <span class="kn">import</span> <span class="n">ValidationError</span>
<span class="kn">from</span> <span class="nn">django.core.validators</span> <span class="kn">import</span> <span class="n">EmailValidator</span>

<span class="k">class</span> <span class="nc">Error</span><span class="p">(</span><span class="ne">Exception</span><span class="p">):</span>
    <span class="k">pass</span>

<span class="k">class</span> <span class="nc">InvalidMarkdown</span><span class="p">(</span><span class="n">Error</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">error</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">value</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">error</span> <span class="o">=</span> <span class="n">error</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">value</span>

    <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">error</span>
        <span class="k">return</span> <span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="si">}</span><span class="s1"> "</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="si">}</span><span class="s1">"'</span>

<span class="k">def</span> <span class="nf">clean_link</span><span class="p">(</span><span class="n">href</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">site_domain</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
    <span class="k">if</span> <span class="n">href</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s1">'mailto:'</span><span class="p">):</span>
        <span class="n">email_match</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s1">'^(mailto:)?([^?]*)'</span><span class="p">,</span> <span class="n">href</span><span class="p">)</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">email_match</span><span class="p">:</span>
            <span class="k">raise</span> <span class="n">InvalidMarkdown</span><span class="p">(</span><span class="s1">'Invalid mailto link'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="n">href</span><span class="p">)</span>

        <span class="n">email</span> <span class="o">=</span> <span class="n">email_match</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">email</span><span class="p">:</span>
            <span class="k">try</span><span class="p">:</span>
                <span class="n">EmailValidator</span><span class="p">()(</span><span class="n">email</span><span class="p">)</span>
            <span class="k">except</span> <span class="n">ValidationError</span><span class="p">:</span>
                <span class="k">raise</span> <span class="n">InvalidMarkdown</span><span class="p">(</span><span class="s1">'Invalid email address'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="n">email</span><span class="p">)</span>

        <span class="k">return</span> <span class="n">href</span>

    <span class="c1"># More validations to come...</span>

    <span class="k">return</span> <span class="n">href</span>
</pre></div>
<p>To validate a <code>mailto</code> link we added the following code to <code>clean_link</code>:</p>
<ul>
<li>Check if the link starts with <code>mailto:</code> to identify relevant links.</li>
<li>Split the link into its components using a regular expression.</li>
<li>Yank the actual email address from the <code>mailto</code> link, and validate it using <a href="https://docs.djangoproject.com/en/3.0/ref/validators/#emailvalidator" rel="noopener">Django's <code>EmailValidator</code></a>.</li>
</ul>
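<p>To see what the regular expression actually captures, here is a small standalone sketch that uses only the standard library. The helper name <code>extract_recipient</code> is hypothetical, and the comparison with <code>urllib.parse</code> is just another way to look at the same link, not part of the extension.</p>

```python
import re
from urllib.parse import parse_qs, urlsplit

# Same pattern as in clean_link: an optional "mailto:" prefix,
# followed by everything up to the first "?".
MAILTO_RE = re.compile(r'^(mailto:)?([^?]*)')


def extract_recipient(href: str) -> str:
    """Return the recipient part of a mailto link, or '' if there is none."""
    match = MAILTO_RE.match(href)
    # Both groups may be empty, so the pattern matches any string.
    return match.group(2) if match else ''


print(extract_recipient('mailto:support@service.com?subject=I need help!'))
# support@service.com
print(repr(extract_recipient('mailto:?subject=I need help!')))
# ''

# The standard library splits the link the same way.
parts = urlsplit('mailto:support@service.com?subject=I need help!')
print(parts.scheme, parts.path)  # mailto support@service.com
print(parse_qs(parts.query))     # {'subject': ['I need help!']}
```

<p>An empty recipient is why the validation only runs <code>EmailValidator</code> when an address is actually present.</p>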
<p>Notice that we also added a new type of exception called <code>InvalidMarkdown</code>. We defined our own custom <code>Exception</code> type to distinguish it from other errors raised by <code>markdown</code> itself.</p>
<div class="admonition tip">
<p class="admonition-title">Custom error class</p>
<p>I wrote about <a href="bullet-proofing-django-models#a-better-approach">custom error classes</a> in the past, why they are useful and when you should use them.</p>
</div>
<p>Before we move on, let's add some tests and see this in action:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">md</span> <span class="o">=</span> <span class="n">markdown</span><span class="o">.</span><span class="n">Markdown</span><span class="p">(</span><span class="n">extensions</span><span class="o">=</span><span class="p">[</span><span class="n">DjangoUrlExtension</span><span class="p">()])</span>
<span class="gp">>>> </span><span class="n">md</span><span class="o">.</span><span class="n">convert</span><span class="p">(</span><span class="s2">"[Help](mailto:support@service.com?subject=I need help!)"</span><span class="p">)</span>
<span class="go">'<p><a href="mailto:support@service.com?subject=I need help!">Help</a></p>'</span>
<span class="gp">>>> </span><span class="n">md</span><span class="o">.</span><span class="n">convert</span><span class="p">(</span><span class="s2">"[Help](mailto:?subject=I need help!)"</span><span class="p">)</span>
<span class="go"><p><a href="mailto:?subject=I need help!">Help</a></p></span>
<span class="gp">>>> </span><span class="n">md</span><span class="o">.</span><span class="n">convert</span><span class="p">(</span><span class="s2">"[Help](mailto:invalidemail?subject=I need help!)"</span><span class="p">)</span>
<span class="go">InvalidMarkdown: Invalid email address "invalidemail"</span>
</pre></div>
<p>Great! Worked as expected.</p>
<h3 id="handling-internal-and-external-links"><a class="toclink" href="#handling-internal-and-external-links">Handling Internal and External Links</a></h3>
<p>Now that we got our toes wet with <code>mailto</code> links, we can handle other types of links:</p>
<p><strong>External Links</strong></p>
<ul>
<li>Links outside our Django app.</li>
<li>Must contain a scheme: either http or https.</li>
<li>Ideally, we also want to make sure these links are not broken, but we won't do that now.</li>
</ul>
<p><strong>Internal Links</strong></p>
<ul>
<li>Links to pages inside our Django app.</li>
<li>Link must be relative: this will allow us to move content between environments.</li>
<li>Use Django's URL names instead of a URL path: this will allow us to safely move views around without worrying about broken links in markdown content.</li>
<li>Links may contain query parameters (<code>?</code>) and a fragment (<code>#</code>).</li>
</ul>
<div class="admonition info">
<p class="admonition-title">SEO</p>
<p>From an SEO standpoint, public URLs should not change. When they do, you should handle it properly with redirects, otherwise you might get penalized by search engines.</p>
</div>
<p>With this list of requirements we can start working.</p>
<h4 id="resolving-url-names"><a class="toclink" href="#resolving-url-names">Resolving URL Names</a></h4>
<p>To link to internal pages we want writers to provide a <strong>URL name</strong>, not a <strong>URL path</strong>. For example, say we have this view:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.urls</span> <span class="kn">import</span> <span class="n">path</span>
<span class="kn">from</span> <span class="nn">app.views</span> <span class="kn">import</span> <span class="n">home</span>
<span class="n">urlpatterns</span> <span class="o">=</span> <span class="p">[</span>
    <span class="n">path</span><span class="p">(</span><span class="s1">''</span><span class="p">,</span> <span class="n">home</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">'home'</span><span class="p">),</span>
<span class="p">]</span>
</pre></div>
<p>The full URL of this page is <code>https://example.com/</code>, its URL path is <code>/</code>, and its URL name is <code>home</code>. We want to use the URL name <code>home</code> in our markdown links, like this:</p>
<div class="highlight"><pre><span></span>Go back to [<span class="nt">homepage</span>](<span class="na">home</span>)
</pre></div>
<p>This should render to:</p>
<div class="highlight"><pre><span></span><span class="p"><</span><span class="nt">p</span><span class="p">></span>Go back to <span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"/"</span><span class="p">></span>homepage<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">p</span><span class="p">></span>
</pre></div>
<p>We also want to support query params and hash:</p>
<div class="highlight"><pre><span></span>Go back to [homepage](home#top)
Go back to [homepage](home?utm_source=faq)
</pre></div>
<p>This should render to the following HTML:</p>
<div class="highlight"><pre><span></span><span class="p"><</span><span class="nt">p</span><span class="p">></span>Go back to <span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"/#top"</span><span class="p">></span>homepage<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">p</span><span class="p">></span>
<span class="p"><</span><span class="nt">p</span><span class="p">></span>Go back to <span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"/?utm_source=faq"</span><span class="p">></span>homepage<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">p</span><span class="p">></span>
</pre></div>
<p>Using URL names, if we change the URL path, the links in the content will not be broken. To check if the href provided by the writer is a valid <code>url_name</code>, we can try to <a href="https://docs.djangoproject.com/en/3.0/ref/urlresolvers/#reverse" rel="noopener"><code>reverse</code></a> it:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.urls</span> <span class="kn">import</span> <span class="n">reverse</span>
<span class="gp">>>> </span><span class="n">reverse</span><span class="p">(</span><span class="s1">'home'</span><span class="p">)</span>
<span class="go">'/'</span>
</pre></div>
<p>The URL name "home" points to the URL path "/". When there is no match, an exception is raised:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.urls</span> <span class="kn">import</span> <span class="n">reverse</span>
<span class="gp">>>> </span><span class="n">reverse</span><span class="p">(</span><span class="s1">'foo'</span><span class="p">)</span>
<span class="go">NoReverseMatch: Reverse for 'foo' not found.</span>
<span class="go">'foo' is not a valid view function or pattern name.</span>
</pre></div>
<p>Before we move forward, what happens when the URL name includes query params or a hash?</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.urls</span> <span class="kn">import</span> <span class="n">reverse</span>
<span class="gp">>>> </span><span class="n">reverse</span><span class="p">(</span><span class="s1">'home#top'</span><span class="p">)</span>
<span class="go">NoReverseMatch: Reverse for 'home#top' not found.</span>
<span class="go">'home#top' is not a valid view function or pattern name.</span>
<span class="gp">>>> </span><span class="n">reverse</span><span class="p">(</span><span class="s1">'home?utm_source=faq'</span><span class="p">)</span>
<span class="go">NoReverseMatch: Reverse for 'home?utm_source=faq' not found.</span>
<span class="go">'home?utm_source=faq' is not a valid view function or pattern name.</span>
</pre></div>
<p>This makes sense because query parameters and hash are not part of the URL name.</p>
<p>To use <code>reverse</code> <em>and</em> support query params and hashes, we first need to clean the value. Then, check that it is a valid URL name and return the URL path including the query params and hash, if provided:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">re</span>
<span class="kn">from</span> <span class="nn">django.urls</span> <span class="kn">import</span> <span class="n">NoReverseMatch</span><span class="p">,</span> <span class="n">reverse</span>

<span class="k">def</span> <span class="nf">clean_link</span><span class="p">(</span><span class="n">href</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">site_domain</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
    <span class="c1"># ... Same as before ...</span>

    <span class="c1"># Remove fragments or query params before trying to match the URL name.</span>
    <span class="n">href_parts</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="sa">r</span><span class="s1">'#|\?'</span><span class="p">,</span> <span class="n">href</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">href_parts</span><span class="p">:</span>
        <span class="n">start_ix</span> <span class="o">=</span> <span class="n">href_parts</span><span class="o">.</span><span class="n">start</span><span class="p">()</span>
        <span class="n">url_name</span><span class="p">,</span> <span class="n">url_extra</span> <span class="o">=</span> <span class="n">href</span><span class="p">[:</span><span class="n">start_ix</span><span class="p">],</span> <span class="n">href</span><span class="p">[</span><span class="n">start_ix</span><span class="p">:]</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">url_name</span><span class="p">,</span> <span class="n">url_extra</span> <span class="o">=</span> <span class="n">href</span><span class="p">,</span> <span class="s1">''</span>

    <span class="k">try</span><span class="p">:</span>
        <span class="n">url</span> <span class="o">=</span> <span class="n">reverse</span><span class="p">(</span><span class="n">url_name</span><span class="p">)</span>
    <span class="k">except</span> <span class="n">NoReverseMatch</span><span class="p">:</span>
        <span class="k">pass</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">url</span> <span class="o">+</span> <span class="n">url_extra</span>

    <span class="k">return</span> <span class="n">href</span>
</pre></div>
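<p>This snippet uses a regular expression to find the first occurrence of either <code>?</code> or <code>#</code> in <code>href</code> and splits the value there: everything before it is the URL name, and the rest (query params and fragment) is appended to the reversed URL path.</p>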
<p>Make sure that it works:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">md</span> <span class="o">=</span> <span class="n">markdown</span><span class="o">.</span><span class="n">Markdown</span><span class="p">(</span><span class="n">extensions</span><span class="o">=</span><span class="p">[</span><span class="n">DjangoUrlExtension</span><span class="p">()])</span>
<span class="gp">>>> </span><span class="n">md</span><span class="o">.</span><span class="n">convert</span><span class="p">(</span><span class="s2">"Go back to [homepage](home)"</span><span class="p">)</span>
<span class="go"><p>Go back to <a href="/">homepage</a></p></span>
<span class="gp">>>> </span><span class="n">md</span><span class="o">.</span><span class="n">convert</span><span class="p">(</span><span class="s2">"Go back to [homepage](home#top)"</span><span class="p">)</span>
<span class="go"><p>Go back to <a href="/#top">homepage</a></p></span>
<span class="gp">>>> </span><span class="n">md</span><span class="o">.</span><span class="n">convert</span><span class="p">(</span><span class="s2">"Go back to [homepage](home?utm_source=faq)"</span><span class="p">)</span>
<span class="go"><p>Go back to <a href="/?utm_source=faq">homepage</a></p></span>
<span class="gp">>>> </span><span class="n">md</span><span class="o">.</span><span class="n">convert</span><span class="p">(</span><span class="s2">"Go back to [homepage](home?utm_source=faq#top)"</span><span class="p">)</span>
<span class="go"><p>Go back to <a href="/?utm_source=faq#top">homepage</a></p></span>
</pre></div>
<p>Amazing! Writers can now use URL names in Markdown. They can also include query parameters and a fragment to be added to the URL.</p>
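<p>To experiment with the splitting logic outside of Django, here is a minimal, standard-library-only sketch. The <code>URL_NAMES</code> dictionary and the helpers <code>split_href</code> and <code>resolve_url_name</code> are hypothetical stand-ins for Django's URLconf and <code>reverse</code>; they are not part of the extension itself.</p>

```python
import re
from typing import Optional, Tuple

# Hypothetical stand-in for Django's URLconf: maps URL names to URL paths.
URL_NAMES = {"home": "/", "signup": "/signup/"}


def split_href(href: str) -> Tuple[str, str]:
    """Split href at the first '?' or '#' into (url_name, url_extra)."""
    match = re.search(r"#|\?", href)
    if match is None:
        return href, ""
    return href[:match.start()], href[match.start():]


def resolve_url_name(href: str) -> Optional[str]:
    """Return the path for a known URL name, keeping any query string or fragment."""
    url_name, url_extra = split_href(href)
    path = URL_NAMES.get(url_name)
    if path is None:
        # Not a known URL name; a caller would fall through to other checks.
        return None
    return path + url_extra


print(resolve_url_name("home?utm_source=faq#top"))  # /?utm_source=faq#top
```

<p>Because the split happens at the <em>first</em> <code>?</code> or <code>#</code>, a value that carries both a query string and a fragment keeps them together in the second part.</p>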
<h4 id="handling-external-links"><a class="toclink" href="#handling-external-links">Handling External Links</a></h4>
<p>To handle external links properly we want to check two things:</p>
<ol>
<li>External links always provide a scheme, either <code>http:</code> or <code>https:</code>.</li>
<li>Prevent absolute links to our own site. Internal links should use URL names.</li>
</ol>
<p>So far, we handled URL names and <code>mailto</code> links. If we passed these two checks it means <code>href</code> is a URL. Let's start by checking if the link is to our own site:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">urllib.parse</span> <span class="kn">import</span> <span class="n">urlparse</span>
<span class="k">def</span> <span class="nf">clean_link</span><span class="p">(</span><span class="n">href</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">site_domain</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
    <span class="n">parsed_url</span> <span class="o">=</span> <span class="n">urlparse</span><span class="p">(</span><span class="n">href</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">parsed_url</span><span class="o">.</span><span class="n">netloc</span> <span class="o">==</span> <span class="n">site_domain</span><span class="p">:</span>
        <span class="c1"># TODO: URL is internal.</span>
        <span class="k">pass</span>
</pre></div>
<p>The function <a href="https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlparse" rel="noopener"><code>urlparse</code></a> returns a named tuple that contains the different parts of the URL. If the <code>netloc</code> property equals the <code>site_domain</code>, the link is really an internal link.</p>
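<p>As a quick illustration of what <code>urlparse</code> returns, the sketch below prints the parts of a sample URL. It also shows that <code>netloc</code> keeps the port and the original letter case, while <code>hostname</code> is lowercased and drops the port. The helper <code>is_same_site</code> is a hypothetical variation for sites where that difference matters; the article's code compares <code>netloc</code> directly.</p>

```python
from urllib.parse import urlparse

parsed = urlparse("https://example.com/pricing?plan=pro#faq")
print(parsed.scheme)    # https
print(parsed.netloc)    # example.com
print(parsed.path)      # /pricing
print(parsed.query)     # plan=pro
print(parsed.fragment)  # faq

# netloc keeps the port and the original letter case,
# while hostname is lowercased and has the port stripped.
with_port = urlparse("https://Example.com:8000/")
print(with_port.netloc)    # Example.com:8000
print(with_port.hostname)  # example.com


def is_same_site(href: str, site_domain: str) -> bool:
    """Hypothetical helper: compare hostnames case-insensitively, ignoring the port."""
    return urlparse(href).hostname == site_domain.lower()
```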
<p>If the URL is in fact internal, we need to fail. But keep in mind that writers are not necessarily technical people, so we want to help them out a bit and provide a useful error message. We require that internal links use a URL name and not a URL path, so it's best to let writers know what the URL name is for the path they provided.</p>
<p>To get the URL name of a URL path, Django provides a function called <a href="https://docs.djangoproject.com/en/3.0/ref/urlresolvers/#resolve" rel="noopener"><code>resolve</code></a>:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.urls</span> <span class="kn">import</span> <span class="n">resolve</span>
<span class="gp">>>> </span><span class="n">resolve</span><span class="p">(</span><span class="s1">'/'</span><span class="p">)</span>
<span class="go">ResolverMatch(</span>
<span class="go"> func=app.views.home,</span>
<span class="go"> args=(),</span>
<span class="go"> kwargs={},</span>
<span class="go"> url_name=home,</span>
<span class="go"> app_names=[],</span>
<span class="go"> namespaces=[],</span>
<span class="go"> route=,</span>
<span class="go">)</span>
<span class="gp">>>> </span><span class="n">resolve</span><span class="p">(</span><span class="s1">'/'</span><span class="p">)</span><span class="o">.</span><span class="n">url_name</span>
<span class="go">'home'</span>
</pre></div>
<p>When a match is found, <code>resolve</code> returns a <code>ResolverMatch</code> object that contains, among other information, the URL name. When a match is not found, it raises an error:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">resolve</span><span class="p">(</span><span class="s1">'/foo'</span><span class="p">)</span>
<span class="go">Resolver404: {'tried': [[<URLPattern '' [name='home']>]], 'path': 'foo'}</span>
</pre></div>
<p>This is actually what Django does under the hood to determine which view function to execute when a new request comes in.</p>
<p>To provide writers with better error messages we can use the URL name from the <code>ResolverMatch</code> object:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">urllib.parse</span> <span class="kn">import</span> <span class="n">urlparse</span>
<span class="kn">from</span> <span class="nn">django.urls</span> <span class="kn">import</span> <span class="n">Resolver404</span><span class="p">,</span> <span class="n">resolve</span>

<span class="k">def</span> <span class="nf">clean_link</span><span class="p">(</span><span class="n">href</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">site_domain</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
    <span class="c1"># ...</span>

    <span class="n">parsed_url</span> <span class="o">=</span> <span class="n">urlparse</span><span class="p">(</span><span class="n">href</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">parsed_url</span><span class="o">.</span><span class="n">netloc</span> <span class="o">==</span> <span class="n">site_domain</span><span class="p">:</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">resolver_match</span> <span class="o">=</span> <span class="n">resolve</span><span class="p">(</span><span class="n">parsed_url</span><span class="o">.</span><span class="n">path</span><span class="p">)</span>
        <span class="k">except</span> <span class="n">Resolver404</span><span class="p">:</span>
            <span class="k">raise</span> <span class="n">InvalidMarkdown</span><span class="p">(</span>
                <span class="s2">"Should not use absolute links to the current site.</span><span class="se">\n</span><span class="s2">"</span>
                <span class="s2">"We couldn't find a match to this URL. Are you sure it exists?"</span><span class="p">,</span>
                <span class="n">value</span><span class="o">=</span><span class="n">href</span><span class="p">,</span>
            <span class="p">)</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">raise</span> <span class="n">InvalidMarkdown</span><span class="p">(</span>
                <span class="s2">"Should not use absolute links to the current site.</span><span class="se">\n</span><span class="s2">"</span>
                <span class="s1">'Try using the url name "</span><span class="si">{}</span><span class="s1">".'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">resolver_match</span><span class="o">.</span><span class="n">url_name</span><span class="p">),</span>
                <span class="n">value</span><span class="o">=</span><span class="n">href</span><span class="p">,</span>
            <span class="p">)</span>

    <span class="k">return</span> <span class="n">href</span>
</pre></div>
<p>When we identify that the link is internal, we handle two cases:</p>
<ul>
<li>We don't recognize the URL: The URL is most likely incorrect. Ask the writer to check the URL for mistakes.</li>
<li>We recognize the URL: The URL is correct, so tell the writer what URL name to use instead.</li>
</ul>
<p>Let's see it in action:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">clean_link</span><span class="p">(</span><span class="s1">'https://example.com/'</span><span class="p">,</span> <span class="s1">'example.com'</span><span class="p">)</span>
<span class="go">InvalidMarkdown: Should not use absolute links to the current site.</span>
<span class="go">Try using the url name "home". "https://example.com/"</span>
<span class="gp">>>> </span><span class="n">clean_link</span><span class="p">(</span><span class="s1">'https://example.com/foo'</span><span class="p">,</span> <span class="s1">'example.com'</span><span class="p">)</span>
<span class="go">InvalidMarkdown: Should not use absolute links to the current site.</span>
<span class="go">We couldn't find a match to this URL.</span>
<span class="go">Are you sure it exists? "https://example.com/foo"</span>
<span class="gp">>>> </span><span class="n">clean_link</span><span class="p">(</span><span class="s1">'https://external.com'</span><span class="p">,</span> <span class="s1">'example.com'</span><span class="p">)</span>
<span class="go">'https://external.com'</span>
</pre></div>
<p>Nice! External links are accepted and internal links are rejected with a helpful message.</p>
<h4 id="requiring-scheme"><a class="toclink" href="#requiring-scheme">Requiring Scheme</a></h4>
<p>The last thing we want to do is to make sure external links include a scheme, either <code>http:</code> or <code>https:</code>. Let's add that last piece to the function <code>clean_link</code>:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">clean_link</span><span class="p">(</span><span class="n">href</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">site_domain</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
    <span class="c1"># ...</span>
    <span class="n">parsed_url</span> <span class="o">=</span> <span class="n">urlparse</span><span class="p">(</span><span class="n">href</span><span class="p">)</span>

    <span class="c1">#...</span>

    <span class="k">if</span> <span class="n">parsed_url</span><span class="o">.</span><span class="n">scheme</span> <span class="ow">not</span> <span class="ow">in</span> <span class="p">(</span><span class="s1">'http'</span><span class="p">,</span> <span class="s1">'https'</span><span class="p">):</span>
        <span class="k">raise</span> <span class="n">InvalidMarkdown</span><span class="p">(</span>
            <span class="s1">'Must provide an absolute URL '</span>
            <span class="s1">'(be sure to include https:// or http://)'</span><span class="p">,</span>
            <span class="n">href</span><span class="p">,</span>
        <span class="p">)</span>

    <span class="k">return</span> <span class="n">href</span>
</pre></div>
<p>Using the parsed URL we can easily check the scheme. Let's make sure it's working:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">clean_link</span><span class="p">(</span><span class="s1">'external.com'</span><span class="p">,</span> <span class="s1">'example.com'</span><span class="p">)</span>
<span class="go">InvalidMarkdown: Must provide an absolute URL (be sure to include https:// or http://) "external.com"</span>
</pre></div>
<p>We provided the function with a link that has no scheme, and it failed with a helpful message. Cool!</p>
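<p>To see how the scheme check behaves on a few edge cases, here is a small standard-library sketch. The helper name <code>has_allowed_scheme</code> and the sample URLs are made up for illustration; the check itself is the same <code>parsed_url.scheme</code> comparison used above.</p>

```python
from urllib.parse import urlparse

ALLOWED_SCHEMES = ("http", "https")


def has_allowed_scheme(href: str) -> bool:
    """Return True when href carries an explicit http or https scheme."""
    return urlparse(href).scheme in ALLOWED_SCHEMES


for candidate in (
    "https://external.com",    # explicit scheme
    "HTTPS://external.com",    # urlparse lowercases the scheme
    "external.com",            # no scheme at all
    "//external.com/page",     # protocol-relative: has a netloc but no scheme
    "ftp://external.com/file", # a scheme, but not one we accept
):
    print(candidate, has_allowed_scheme(candidate))
```

<p>Only the first two candidates pass. Protocol-relative links such as <code>//external.com/page</code> are rejected as well, because they have no scheme of their own.</p>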
<h3 id="putting-it-all-together"><a class="toclink" href="#putting-it-all-together">Putting it All Together</a></h3>
<p>This is the complete code for the <code>clean_link</code> function:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">re</span>
<span class="kn">from</span> <span class="nn">urllib.parse</span> <span class="kn">import</span> <span class="n">urlparse</span>

<span class="kn">from</span> <span class="nn">django.core.exceptions</span> <span class="kn">import</span> <span class="n">ValidationError</span>
<span class="kn">from</span> <span class="nn">django.core.validators</span> <span class="kn">import</span> <span class="n">EmailValidator</span>
<span class="kn">from</span> <span class="nn">django.urls</span> <span class="kn">import</span> <span class="n">NoReverseMatch</span><span class="p">,</span> <span class="n">Resolver404</span><span class="p">,</span> <span class="n">resolve</span><span class="p">,</span> <span class="n">reverse</span>

<span class="k">def</span> <span class="nf">clean_link</span><span class="p">(</span><span class="n">href</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">site_domain</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
    <span class="k">if</span> <span class="n">href</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s1">'mailto:'</span><span class="p">):</span>
        <span class="n">email_match</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="sa">r</span><span class="s1">'^(mailto:)?([^?]*)'</span><span class="p">,</span> <span class="n">href</span><span class="p">)</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">email_match</span><span class="p">:</span>
            <span class="k">raise</span> <span class="n">InvalidMarkdown</span><span class="p">(</span><span class="s1">'Invalid mailto link'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="n">href</span><span class="p">)</span>

        <span class="n">email</span> <span class="o">=</span> <span class="n">email_match</span><span class="o">.</span><span class="n">groups</span><span class="p">()[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
        <span class="k">if</span> <span class="n">email</span><span class="p">:</span>
            <span class="k">try</span><span class="p">:</span>
                <span class="n">EmailValidator</span><span class="p">()(</span><span class="n">email</span><span class="p">)</span>
            <span class="k">except</span> <span class="n">ValidationError</span><span class="p">:</span>
                <span class="k">raise</span> <span class="n">InvalidMarkdown</span><span class="p">(</span><span class="s1">'Invalid email address'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="n">email</span><span class="p">)</span>

        <span class="k">return</span> <span class="n">href</span>

    <span class="c1"># Remove fragments or query params before trying to match the url name</span>
    <span class="n">href_parts</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="sa">r</span><span class="s1">'#|\?'</span><span class="p">,</span> <span class="n">href</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">href_parts</span><span class="p">:</span>
        <span class="n">start_ix</span> <span class="o">=</span> <span class="n">href_parts</span><span class="o">.</span><span class="n">start</span><span class="p">()</span>
        <span class="n">url_name</span><span class="p">,</span> <span class="n">url_extra</span> <span class="o">=</span> <span class="n">href</span><span class="p">[:</span><span class="n">start_ix</span><span class="p">],</span> <span class="n">href</span><span class="p">[</span><span class="n">start_ix</span><span class="p">:]</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">url_name</span><span class="p">,</span> <span class="n">url_extra</span> <span class="o">=</span> <span class="n">href</span><span class="p">,</span> <span class="s1">''</span>

    <span class="k">try</span><span class="p">:</span>
        <span class="n">url</span> <span class="o">=</span> <span class="n">reverse</span><span class="p">(</span><span class="n">url_name</span><span class="p">)</span>
    <span class="k">except</span> <span class="n">NoReverseMatch</span><span class="p">:</span>
        <span class="k">pass</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">url</span> <span class="o">+</span> <span class="n">url_extra</span>

    <span class="n">parsed_url</span> <span class="o">=</span> <span class="n">urlparse</span><span class="p">(</span><span class="n">href</span><span class="p">)</span>

    <span class="k">if</span> <span class="n">parsed_url</span><span class="o">.</span><span class="n">netloc</span> <span class="o">==</span> <span class="n">site_domain</span><span class="p">:</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">resolver_match</span> <span class="o">=</span> <span class="n">resolve</span><span class="p">(</span><span class="n">parsed_url</span><span class="o">.</span><span class="n">path</span><span class="p">)</span>
        <span class="k">except</span> <span class="n">Resolver404</span><span class="p">:</span>
            <span class="k">raise</span> <span class="n">InvalidMarkdown</span><span class="p">(</span>
                <span class="s2">"Should not use absolute links to the current site.</span><span class="se">\n</span><span class="s2">"</span>
                <span class="s2">"We couldn't find a match to this URL. Are you sure it exists?"</span><span class="p">,</span>
                <span class="n">value</span><span class="o">=</span><span class="n">href</span><span class="p">,</span>
            <span class="p">)</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">raise</span> <span class="n">InvalidMarkdown</span><span class="p">(</span>
                <span class="s2">"Should not use absolute links to the current site.</span><span class="se">\n</span><span class="s2">"</span>
                <span class="s1">'Try using the url name "</span><span class="si">{}</span><span class="s1">".'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">resolver_match</span><span class="o">.</span><span class="n">url_name</span><span class="p">),</span>
                <span class="n">value</span><span class="o">=</span><span class="n">href</span><span class="p">,</span>
            <span class="p">)</span>

    <span class="k">if</span> <span class="n">parsed_url</span><span class="o">.</span><span class="n">scheme</span> <span class="ow">not</span> <span class="ow">in</span> <span class="p">(</span><span class="s1">'http'</span><span class="p">,</span> <span class="s1">'https'</span><span class="p">):</span>
        <span class="k">raise</span> <span class="n">InvalidMarkdown</span><span class="p">(</span>
            <span class="s1">'Must provide an absolute URL '</span>
            <span class="s1">'(be sure to include https:// or http://)'</span><span class="p">,</span>
            <span class="n">href</span><span class="p">,</span>
        <span class="p">)</span>

    <span class="k">return</span> <span class="n">href</span>
</pre></div>
<p>To get a sense of what a real use case for all of these features looks like, take a look at the following content:</p>
<div class="highlight"><pre><span></span><span class="gh"># How to Get Started?</span>
Download the [<span class="nt">mobile app</span>](<span class="na">https://some-app-store.com/our-app</span>) and log in to your account.
If you don't have an account yet, [<span class="nt">sign up now</span>](<span class="na">signup?utm_source=getting_started</span>).
For more information about pricing, check our [<span class="nt">pricing plans</span>](<span class="na">home#pricing-plans</span>)
</pre></div>
<p>This will produce the following HTML:</p>
<div class="highlight"><pre><span></span><span class="p"><</span><span class="nt">h1</span><span class="p">></span>How to Get Started?<span class="p"></</span><span class="nt">h1</span><span class="p">></span>
<span class="p"><</span><span class="nt">p</span><span class="p">></span>Download the <span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"https://some-app-store.com/our-app"</span><span class="p">></span>mobile app<span class="p"></</span><span class="nt">a</span><span class="p">></span> and log in to your account.
If you don't have an account yet, <span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"/signup/?utm_source=getting_started"</span><span class="p">></span>sign up now<span class="p"></</span><span class="nt">a</span><span class="p">></span>.
For more information about pricing, check our <span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"/#pricing-plans"</span><span class="p">></span>pricing plans<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">p</span><span class="p">></span>
</pre></div>
<p>Nice!</p>
<h2 id="conclusion"><a class="toclink" href="#conclusion">Conclusion</a></h2>
<p>We now have a pretty sweet extension that can validate and transform links in Markdown documents! It is now much easier to move documents between environments and to keep our content tidy and, most importantly, correct and up to date!</p>
<div class="admonition source">
<p class="admonition-title">Source</p>
<p>The full source code can be found in <a href="https://gist.github.com/hakib/73fccc340e855bb65f42197e298c0c7d" rel="noopener">this gist</a>.</p>
</div>
<h3 id="taking-it-further"><a class="toclink" href="#taking-it-further">Taking it Further</a></h3>
<p>The capabilities described in this article worked well for us, but you might want to adjust them to fit your own needs.</p>
<p>If you need some ideas, then in addition to this extension we also created a markdown <a href="https://python-markdown.github.io/extensions/api/#preprocessors" rel="noopener">Preprocessor</a> that lets writers use constants in Markdown. For example, we defined a constant called <code>SUPPORT_EMAIL</code>, and we use it like this:</p>
<div class="highlight"><pre><span></span>Contact our support at [<span class="nt">$SUPPORT_EMAIL</span>](<span class="na">mailto:$SUPPORT_EMAIL</span>)
</pre></div>
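<p>The substitution step itself can be sketched with the standard library alone. The function below is a hypothetical illustration, not the code we run in production: it takes and returns a list of lines, which is the shape Python-Markdown passes to a preprocessor's <code>run</code> method, and it uses <code>string.Template</code> to replace <code>$NAME</code> placeholders. The constant value is made up.</p>

```python
from string import Template
from typing import Dict, List

# Hypothetical constants; only the name SUPPORT_EMAIL comes from the article.
CONSTANTS = {"SUPPORT_EMAIL": "support@example.com"}


def replace_constants(lines: List[str], constants: Dict[str, str]) -> List[str]:
    """Substitute $NAME placeholders line by line, leaving unknown names untouched."""
    return [Template(line).safe_substitute(constants) for line in lines]


source = ["Contact our support at [$SUPPORT_EMAIL](mailto:$SUPPORT_EMAIL)"]
print(replace_constants(source, CONSTANTS))
```

<p>Using <code>safe_substitute</code> means a placeholder that is not defined is left as-is instead of raising an error, which may or may not be what you want for your writers.</p>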
<p>The preprocessor will replace the string <code>$SUPPORT_EMAIL</code> with the text we defined, and only then render the Markdown.</p>
<p><strong>Building an IVR System with Python, Django and Twilio</strong> (Haki Benita, 2020-02-12)</p>
<hr>
<p>Last year my team and I worked on a very challenging IVR system. After almost a year in production and thousands of processed transactions, I teamed up with the great people over at the <a href="https://www.twilio.com/blog" rel="noopener">Twilio blog</a> to write an introductory tutorial for developing IVR systems using Django and Twilio IVR.</p>
<p>Aside from "making your server talk" and diving into the cool speech features, I found that the most challenging part of working on IVR is designing the views. Unlike APIs and forms, IVR is very limited in the type of input it takes (DTMF tones, transcribed speech) and in the amount of data it can communicate and process.</p>
<p><a href="https://www.twilio.com/blog/building-interactive-voice-response-ivr-system-python-django-twilio" rel="noopener"><strong>Read "Building an Interactive Voice Response (IVR) System with Python, Django and Twilio" on the Twilio blog »</strong></a></p>
<figure><img alt="Building an IVR System with Django and Twilio" src="https://hakibenita.com/images/00-python-django-twilio-ivr.png"><figcaption>Building an IVR System with Django and Twilio</figcaption>
</figure>Understand Group by in Django with SQL2020-02-11T00:00:00+02:002020-02-11T00:00:00+02:00Haki Benitatag:hakibenita.com,2020-02-11:/django-group-by-sql<p>Understand GROUP BY in Django ORM by comparing QuerySets and SQL side by side. If SQL is where you are most comfortable, this is the Django GROUP BY tutorial for you.</p><hr>
<p>Aggregation is a source of confusion in any type of ORM and Django is no different. The documentation provides a variety of examples and cheat-sheets that demonstrate how to group and aggregate data using the ORM, but I decided to approach this from a different angle.</p>
<p>In this article I put <strong>QuerySets and SQL side by side</strong>. If SQL is where you are most comfortable, this is the Django GROUP BY cheat-sheet for you.</p>
<figure><img alt="Image by Jason Leung" src="https://hakibenita.com/images/00-django-group-by-sql.jpg"><figcaption>Image by <a href="https://unsplash.com/photos/D4YrzSwyIEc">Jason Leung</a></figcaption>
</figure>
<p><details class="toc-container" open>
<summary>Table of Contents</summary></p>
<div class="toc">
<ul>
<li><a href="#how-to-group-by-in-django">How to Group By in Django</a><ul>
<li><a href="#how-to-count-rows">How to Count Rows</a></li>
<li><a href="#how-to-use-aggregate-functions">How to Use Aggregate Functions</a></li>
<li><a href="#how-to-group-by">How to Group By</a></li>
<li><a href="#how-to-filter-a-queryset-with-group-by">How to Filter a QuerySet With Group By</a></li>
<li><a href="#how-to-sort-a-queryset-with-group-by">How to Sort a QuerySet With Group By</a></li>
<li><a href="#how-to-combine-multiple-aggregations">How to Combine Multiple Aggregations</a></li>
<li><a href="#how-to-group-by-multiple-fields">How to Group by Multiple Fields</a></li>
<li><a href="#how-to-group-by-an-expression">How to Group by an Expression</a></li>
<li><a href="#how-to-use-conditional-aggregation">How to Use Conditional Aggregation</a></li>
<li><a href="#how-to-use-having">How to Use Having</a></li>
<li><a href="#how-to-group-by-distinct">How to Group by Distinct</a></li>
<li><a href="#how-to-create-expressions-using-aggregate-fields">How to Create Expressions Using Aggregate Fields</a></li>
<li><a href="#how-to-group-by-across-relations">How to Group By Across Relations</a></li>
<li><a href="#how-to-group-by-a-many-to-many-relationship">How to Group By a Many to Many Relationship</a></li>
</ul>
</li>
<li><a href="#going-further">Going Further</a></li>
</ul>
</div>
<p></details></p>
<style>
.side-by-side {
display: flex;
}
.side-by-side .highlight {
flex-grow: 1;
flex-shrink: 0;
width: 50%;
}
.side-by-side .highlight pre {
height: 100%;
}
/* Desktop */
@media only screen and (min-width: 52rem) {
.side-by-side .highlight:first-child {
margin-right: 1em;
}
}
/* Mobile */
@media only screen and (max-width: 52rem) {
.side-by-side {
flex-direction: column;
}
.side-by-side .highlight {
width: 100%;
}
.side-by-side .highlight:first-child {
margin-bottom: 0;
border-bottom: 1px solid rgba(0,0,0,0.2);
}
}
</style>
<hr>
<h2 id="how-to-group-by-in-django"><a class="toclink" href="#how-to-group-by-in-django">How to Group By in Django</a></h2>
<p>To demonstrate different GROUP BY queries, I will use models from Django's built-in <code>django.contrib.auth</code> app.</p>
<div class="highlight"><pre><span></span><span class="o">>>></span> <span class="kn">from</span> <span class="nn">django.contrib.auth.models</span> <span class="kn">import</span> <span class="n">User</span>
</pre></div>
<p>Django ORM produces SQL statements with long aliases. For brevity, I will show a cleaned-up, but equivalent, version of what Django executes.</p>
<div class="admonition tip">
<p class="admonition-title">SQL Logging</p>
<p>To see the SQL actually executed by Django, you can <a href="all-you-need-to-know-about-prefetching-in-django#before-we-start">turn on SQL logging in the Django settings</a>.</p>
</div>
<h3 id="how-to-count-rows"><a class="toclink" href="#how-to-count-rows">How to Count Rows</a></h3>
<p>Let's count how many users we have:</p>
<section class="side-by-side">
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">auth_user</span><span class="p">;</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="n">User</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
</pre></div>
</section>
<p>Counting rows is so common that Django includes a function for it right on the QuerySet. Unlike other QuerySets we'll see next, <code>count</code> returns a number.</p>
<h3 id="how-to-use-aggregate-functions"><a class="toclink" href="#how-to-use-aggregate-functions">How to Use Aggregate Functions</a></h3>
<p>Django offers two more ways to count rows in a table.</p>
<p>We'll start with <a href="https://docs.djangoproject.com/en/3.0/ref/models/querysets/#aggregate" rel="noopener"><code>aggregate</code></a>:</p>
<section class="side-by-side">
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">id__count</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">auth_user</span><span class="p">;</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Count</span>
<span class="n">User</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span><span class="n">Count</span><span class="p">(</span><span class="s1">'id'</span><span class="p">))</span>
</pre></div>
</section>
<p>To use <code>aggregate</code> we imported the aggregate function <code>Count</code>. The function accepts an expression to count. In this case, we used the name of the primary key column <code>id</code> to count all the rows in the table.</p>
<div class="admonition warning">
<p class="admonition-title">Aggregate NULL</p>
<p>Aggregations ignore <code>NULL</code> values. For more on how aggregations handle <code>NULL</code>, see <a href="sql-dos-and-donts#be-careful-when-counting-nullable-columns">12 Common Mistakes and Missed Optimization Opportunities in SQL</a>.</p>
</div>
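<p>The effect of <code>NULL</code> on counting can be checked directly in SQL. The following is a minimal sketch using Python's built-in <code>sqlite3</code> module rather than Django; the table <code>demo_user</code> and its rows are made up for illustration:</p>

```python
import sqlite3

# Hypothetical stand-in for auth_user, with a nullable last_login column
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_user (id INTEGER PRIMARY KEY, last_login TEXT)")
conn.executemany(
    "INSERT INTO demo_user (last_login) VALUES (?)",
    [("2020-01-01",), (None,), ("2020-02-01",)],
)

# COUNT(*) counts rows, COUNT(column) counts only non-NULL values
count_all, count_id, count_last_login = conn.execute(
    "SELECT COUNT(*), COUNT(id), COUNT(last_login) FROM demo_user"
).fetchone()

print(count_all, count_id, count_last_login)  # 3 3 2
```

<p>Counting the primary key gives the same number as <code>COUNT(*)</code> because a primary key is never <code>NULL</code>, while counting a nullable column skips the rows where it is missing.</p>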
<p>The result of <code>aggregate</code> is a dict:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Count</span>
<span class="gp">>>> </span><span class="n">User</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span><span class="n">Count</span><span class="p">(</span><span class="s1">'id'</span><span class="p">))</span>
<span class="go">{'id__count': 891}</span>
</pre></div>
<p>The name of the key is derived from the name of the field and the name of the aggregate. In this case, it's <code>id__count</code>. It's a good idea not to rely on this naming convention, and instead provide your own name:</p>
<section class="side-by-side">
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">total</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">auth_user</span><span class="p">;</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Count</span>
<span class="gp">>>> </span><span class="n">User</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span><span class="n">total</span><span class="o">=</span><span class="n">Count</span><span class="p">(</span><span class="s1">'id'</span><span class="p">))</span>
<span class="go">{'total': 891}</span>
</pre></div>
</section>
<p>The name of the argument to <code>aggregate</code> is also the name of the key in the resulting dictionary.</p>
<h3 id="how-to-group-by"><a class="toclink" href="#how-to-group-by">How to Group By</a></h3>
<p>Using <code>aggregate</code> we got the result of applying the aggregate function on the entire table. This is useful, but usually we want to apply the aggregation on groups of rows.</p>
<p>Let's count users by their active status:</p>
<section class="side-by-side">
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">is_active</span><span class="p">,</span>
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">total</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">auth_user</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">is_active</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="n">User</span><span class="o">.</span><span class="n">objects</span>
<span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'is_active'</span><span class="p">)</span>
<span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">total</span><span class="o">=</span><span class="n">Count</span><span class="p">(</span><span class="s1">'id'</span><span class="p">)))</span>
</pre></div>
</section>
<p>This time we used the function <a href="https://docs.djangoproject.com/en/3.0/ref/models/querysets/#django.db.models.query.QuerySet.annotate" rel="noopener"><code>annotate</code></a>. To produce a GROUP BY we use a combination of <code>values</code> and <code>annotate</code>:</p>
<ul>
<li><code>values('is_active')</code>: what to group by</li>
<li><code>annotate(total=Count('id'))</code>: what to aggregate</li>
</ul>
<p><strong>The order is important</strong>: calling <code>annotate</code> without a preceding <code>values</code> computes the aggregate for each row instead of for each group.</p>
<p>Just like <code>aggregate</code>, the name of the argument to <code>annotate</code> is the key in the result of the evaluated QuerySet. In this case it's <code>total</code>.</p>
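<p>To get a feel for the shape of the result, here is a rough sketch that runs the SQL side of the query with Python's built-in <code>sqlite3</code> module and returns each group as a dict, similar to what the evaluated QuerySet contains. The table <code>demo_user</code>, its rows and the helper <code>dict_factory</code> are assumptions made for this example, not part of Django:</p>

```python
import sqlite3


def dict_factory(cursor, row):
    """Return each row as a dict keyed by column name (illustrative helper)."""
    return {col[0]: value for col, value in zip(cursor.description, row)}


conn = sqlite3.connect(":memory:")
conn.row_factory = dict_factory
conn.execute(
    "CREATE TABLE demo_user (id INTEGER PRIMARY KEY, is_active INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO demo_user (is_active) VALUES (?)", [(1,), (1,), (0,)])

# One row per group: the GROUP BY key plus the named aggregate
grouped = conn.execute(
    """
    SELECT is_active, COUNT(id) AS total
    FROM demo_user
    GROUP BY is_active
    ORDER BY is_active
    """
).fetchall()

print(grouped)  # [{'is_active': 0, 'total': 1}, {'is_active': 1, 'total': 2}]
```

<p>SQLite stores booleans as <code>0</code> and <code>1</code>; the ORM converts them to <code>False</code> and <code>True</code>, but the keys of each dict are the same: the field passed to <code>values</code> and the name given to the annotation.</p>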
<h3 id="how-to-filter-a-queryset-with-group-by"><a class="toclink" href="#how-to-filter-a-queryset-with-group-by">How to Filter a QuerySet With Group By</a></h3>
<p>To apply aggregation on a filtered query you can use <a href="https://docs.djangoproject.com/en/3.0/ref/models/querysets/#filter" rel="noopener"><code>filter</code></a> anywhere in the query. For example, count only staff users by their active status:</p>
<section class="side-by-side">
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">is_active</span><span class="p">,</span>
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">total</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">auth_user</span>
<span class="hll"><span class="k">WHERE</span>
</span><span class="hll"><span class="w"> </span><span class="n">is_staff</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">True</span>
</span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">is_active</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="n">User</span><span class="o">.</span><span class="n">objects</span>
<span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'is_active'</span><span class="p">)</span>
<span class="hll"><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">is_staff</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span><span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">total</span><span class="o">=</span><span class="n">Count</span><span class="p">(</span><span class="s1">'id'</span><span class="p">)))</span>
</pre></div>
</section>
<h3 id="how-to-sort-a-queryset-with-group-by"><a class="toclink" href="#how-to-sort-a-queryset-with-group-by">How to Sort a QuerySet With Group By</a></h3>
<p>Like <code>filter</code>, you can use <a href="https://docs.djangoproject.com/en/3.0/ref/models/querysets/#order-by" rel="noopener"><code>order_by</code></a> anywhere in the query to sort the results, as long as it comes after any annotation it refers to:</p>
<section class="side-by-side">
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">is_active</span><span class="p">,</span>
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">total</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">auth_user</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">is_active</span>
<span class="hll"><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
</span><span class="hll"><span class="w"> </span><span class="n">is_active</span><span class="p">,</span>
</span><span class="hll"><span class="w"> </span><span class="n">total</span>
</span></pre></div>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="n">User</span><span class="o">.</span><span class="n">objects</span>
<span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'is_active'</span><span class="p">)</span>
<span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">total</span><span class="o">=</span><span class="n">Count</span><span class="p">(</span><span class="s1">'id'</span><span class="p">))</span>
<span class="hll"><span class="o">.</span><span class="n">order_by</span><span class="p">(</span><span class="s1">'is_active'</span><span class="p">,</span> <span class="s1">'total'</span><span class="p">))</span>
</span></pre></div>
</section>
<p>Notice that you can sort by both the GROUP BY key and the aggregate field.</p>
<h3 id="how-to-combine-multiple-aggregations"><a class="toclink" href="#how-to-combine-multiple-aggregations">How to Combine Multiple Aggregations</a></h3>
<p>To produce multiple aggregations of the same group, add multiple annotations:</p>
<section class="side-by-side">
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">is_active</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">total</span><span class="p">,</span>
</span><span class="hll"><span class="w"> </span><span class="k">MAX</span><span class="p">(</span><span class="n">date_joined</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">last_joined</span>
</span><span class="k">FROM</span>
<span class="w"> </span><span class="n">auth_user</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">is_active</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Max</span>
<span class="p">(</span><span class="n">User</span><span class="o">.</span><span class="n">objects</span>
<span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'is_active'</span><span class="p">)</span>
<span class="o">.</span><span class="n">annotate</span><span class="p">(</span>
<span class="hll"> <span class="n">total</span><span class="o">=</span><span class="n">Count</span><span class="p">(</span><span class="s1">'id'</span><span class="p">),</span>
</span><span class="hll"> <span class="n">last_joined</span><span class="o">=</span><span class="n">Max</span><span class="p">(</span><span class="s1">'date_joined'</span><span class="p">),</span>
</span><span class="p">))</span>
</pre></div>
</section>
<p>The query will produce the number of active and inactive users, and the last date a user joined in each group.</p>
<h3 id="how-to-group-by-multiple-fields"><a class="toclink" href="#how-to-group-by-multiple-fields">How to Group by Multiple Fields</a></h3>
<p>Just like performing multiple aggregations, we might also want to group by multiple fields. For example, group by active status and staff status:</p>
<section class="side-by-side">
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">is_active</span><span class="p">,</span>
<span class="w"> </span><span class="n">is_staff</span><span class="p">,</span>
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">total</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">auth_user</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="hll"><span class="w"> </span><span class="n">is_active</span><span class="p">,</span>
</span><span class="hll"><span class="w"> </span><span class="n">is_staff</span>
</span></pre></div>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="n">User</span><span class="o">.</span><span class="n">objects</span>
<span class="hll"><span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'is_active'</span><span class="p">,</span> <span class="s1">'is_staff'</span><span class="p">)</span>
</span><span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">total</span><span class="o">=</span><span class="n">Count</span><span class="p">(</span><span class="s1">'id'</span><span class="p">)))</span>
</pre></div>
</section>
<p>The result of this query includes <code>is_active</code>, <code>is_staff</code> and the number of users in each group.</p>
<h3 id="how-to-group-by-an-expression"><a class="toclink" href="#how-to-group-by-an-expression">How to Group by an Expression</a></h3>
<p>Another common use case for GROUP BY is to group by an expression. For example, count the number of users that joined each year:</p>
<section class="side-by-side">
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="hll"><span class="w"> </span><span class="k">EXTRACT</span><span class="p">(</span><span class="s1">'year'</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">date_joined</span><span class="p">),</span>
</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">total</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">auth_user</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="k">EXTRACT</span><span class="p">(</span><span class="s1">'year'</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">date_joined</span><span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="n">User</span><span class="o">.</span><span class="n">objects</span>
<span class="hll"><span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'date_joined__year'</span><span class="p">)</span>
</span><span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">total</span><span class="o">=</span><span class="n">Count</span><span class="p">(</span><span class="s1">'id'</span><span class="p">)))</span>
</pre></div>
</section>
<p>Notice that to get the year from the date we used the special expression <code><field>__year</code> in the call to <code>values()</code>. Each row in the result is a dict, and the name of the key will be <code>date_joined__year</code>.</p>
<p>Sometimes, the built-in expressions are not enough, and you need to aggregate on a more complicated expression. For example, group users by whether they have logged in since they signed up:</p>
<section class="side-by-side">
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="hll"><span class="w"> </span><span class="n">last_login</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">date_joined</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">logged_since_joined</span><span class="p">,</span>
</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">total</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">auth_user</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">last_login</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">date_joined</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="p">(</span>
<span class="n">ExpressionWrapper</span><span class="p">,</span>
<span class="n">Q</span><span class="p">,</span> <span class="n">F</span><span class="p">,</span> <span class="n">BooleanField</span><span class="p">,</span>
<span class="p">)</span>
<span class="p">(</span><span class="n">User</span><span class="o">.</span><span class="n">objects</span>
<span class="o">.</span><span class="n">annotate</span><span class="p">(</span>
<span class="hll"> <span class="n">logged_since_joined</span><span class="o">=</span><span class="n">ExpressionWrapper</span><span class="p">(</span>
</span><span class="hll"> <span class="n">Q</span><span class="p">(</span><span class="n">last_login__gt</span><span class="o">=</span><span class="n">F</span><span class="p">(</span><span class="s1">'date_joined'</span><span class="p">)),</span>
</span><span class="hll"> <span class="n">output_field</span><span class="o">=</span><span class="n">BooleanField</span><span class="p">(),</span>
</span><span class="hll"> <span class="p">)</span>
</span><span class="p">)</span>
<span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'logged_since_joined'</span><span class="p">)</span>
<span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">total</span><span class="o">=</span><span class="n">Count</span><span class="p">(</span><span class="s1">'id'</span><span class="p">))</span>
<span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'logged_since_joined'</span><span class="p">,</span> <span class="s1">'total'</span><span class="p">))</span>
</pre></div>
</section>
<p>The expression here is fairly complicated. We first use <code>annotate</code> to build the expression, and we mark it as a GROUP BY key by referencing the expression in the following call to <code>values()</code>. From here on, it's exactly the same.</p>
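<p>One thing worth checking with an expression like this is what happens when one side is <code>NULL</code>. The sketch below runs the SQL side with Python's built-in <code>sqlite3</code> module on a made-up <code>demo_user</code> table, assuming <code>last_login</code> is <code>NULL</code> for users who never logged in:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_user ("
    "id INTEGER PRIMARY KEY, date_joined TEXT NOT NULL, last_login TEXT)"
)
conn.executemany(
    "INSERT INTO demo_user (date_joined, last_login) VALUES (?, ?)",
    [
        ("2020-01-01", "2020-01-05"),  # logged in after joining
        ("2020-01-02", "2020-02-01"),  # logged in after joining
        ("2020-01-03", "2020-01-03"),  # last login equals the join date
        ("2020-01-04", None),          # never logged in
    ],
)

# Group by the result of the comparison rather than by a plain column
rows = conn.execute(
    """
    SELECT last_login > date_joined AS logged_since_joined, COUNT(id) AS total
    FROM demo_user
    GROUP BY last_login > date_joined
    """
).fetchall()

# Map each value of the expression to the size of its group
groups = {key: total for key, total in rows}
print(groups == {1: 2, 0: 1, None: 1})  # True
```

<p>Comparing with <code>NULL</code> yields <code>NULL</code>, so in plain SQL users without a <code>last_login</code> end up in a third group next to true and false. If that is not what you want, filter those users out or handle the missing value explicitly in the expression.</p>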
<h3 id="how-to-use-conditional-aggregation"><a class="toclink" href="#how-to-use-conditional-aggregation">How to Use Conditional Aggregation</a></h3>
<p>Using conditional aggregation, you can aggregate only a part of the group. Conditions come in handy when you have multiple aggregates. For example, count the number of staff and non-staff users by the year they signed up:</p>
<section class="side-by-side">
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="k">EXTRACT</span><span class="p">(</span><span class="s1">'year'</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">date_joined</span><span class="p">),</span>
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="n">FILTER</span><span class="w"> </span><span class="p">(</span>
<span class="hll"><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">is_staff</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">True</span>
</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">staff_users</span><span class="p">,</span>
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="n">FILTER</span><span class="w"> </span><span class="p">(</span>
<span class="hll"><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">is_staff</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">False</span>
</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">non_staff_users</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">auth_user</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="k">EXTRACT</span><span class="p">(</span><span class="s1">'year'</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">date_joined</span><span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Count</span><span class="p">,</span> <span class="n">Q</span>
<span class="p">(</span><span class="n">User</span><span class="o">.</span><span class="n">objects</span>
<span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'date_joined__year'</span><span class="p">)</span>
<span class="o">.</span><span class="n">annotate</span><span class="p">(</span>
<span class="n">staff_users</span><span class="o">=</span><span class="p">(</span>
<span class="hll"> <span class="n">Count</span><span class="p">(</span><span class="s1">'id'</span><span class="p">,</span> <span class="nb">filter</span><span class="o">=</span><span class="n">Q</span><span class="p">(</span><span class="n">is_staff</span><span class="o">=</span><span class="kc">True</span><span class="p">))</span>
</span> <span class="p">),</span>
<span class="n">non_staff_users</span><span class="o">=</span><span class="p">(</span>
<span class="hll"> <span class="n">Count</span><span class="p">(</span><span class="s1">'id'</span><span class="p">,</span> <span class="nb">filter</span><span class="o">=</span><span class="n">Q</span><span class="p">(</span><span class="n">is_staff</span><span class="o">=</span><span class="kc">False</span><span class="p">))</span>
</span> <span class="p">),</span>
<span class="p">))</span>
</pre></div>
</section>
<p>The SQL above is from PostgreSQL. PostgreSQL and SQLite are currently the only database backends for which the ORM uses the <code>FILTER</code> syntax shortcut (formally called <a href="https://modern-sql.com/feature/filter" rel="noopener">"selective aggregates"</a>). For other database backends, the ORM will use <code>CASE ... WHEN</code> instead.</p>
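<p>The two forms are interchangeable as far as the result goes. As a rough illustration, the sketch below runs both against a made-up <code>demo_user</code> table using Python's built-in <code>sqlite3</code> module. The <code>CASE ... WHEN</code> query is written by hand to resemble the fallback and is not copied from Django's output, and the <code>FILTER</code> query only runs when the bundled SQLite is version 3.30 or newer:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_user ("
    "id INTEGER PRIMARY KEY, year_joined INTEGER, is_staff INTEGER)"
)
conn.executemany(
    "INSERT INTO demo_user (year_joined, is_staff) VALUES (?, ?)",
    [(2019, 1), (2019, 0), (2019, 0), (2020, 1)],
)

# Conditional aggregation with CASE: non-matching rows become NULL,
# and COUNT ignores NULL values
case_when_rows = conn.execute(
    """
    SELECT
        year_joined,
        COUNT(CASE WHEN is_staff = 1 THEN id ELSE NULL END) AS staff_users,
        COUNT(CASE WHEN is_staff = 0 THEN id ELSE NULL END) AS non_staff_users
    FROM demo_user
    GROUP BY year_joined
    ORDER BY year_joined
    """
).fetchall()
print(case_when_rows)  # [(2019, 1, 2), (2020, 1, 0)]

# FILTER on aggregate functions requires SQLite 3.30 or newer
if sqlite3.sqlite_version_info >= (3, 30, 0):
    filter_rows = conn.execute(
        """
        SELECT
            year_joined,
            COUNT(id) FILTER (WHERE is_staff = 1) AS staff_users,
            COUNT(id) FILTER (WHERE is_staff = 0) AS non_staff_users
        FROM demo_user
        GROUP BY year_joined
        ORDER BY year_joined
        """
    ).fetchall()
    assert filter_rows == case_when_rows
```

<p>The <code>CASE</code> version relies on the fact that aggregates ignore <code>NULL</code>: rows that don't match the condition contribute nothing to the count.</p>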
<div class="admonition tip">
<p class="admonition-title">tip</p>
<p>I previously wrote about aggregations with filters. Check out my <a href="9-django-tips-for-working-with-databases#aggregation-with-filter">9 Django tips for working with databases</a>.</p>
</div>
<h3 id="how-to-use-having"><a class="toclink" href="#how-to-use-having">How to Use Having</a></h3>
<p>The <code>HAVING</code> clause is used to filter on the result of an aggregate function. For example, count users by their active status and keep only the groups with more than 100 users:</p>
<section class="side-by-side">
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">is_active</span><span class="p">,</span>
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">total</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">auth_user</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">is_active</span>
<span class="hll"><span class="k">HAVING</span>
</span><span class="hll"><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">100</span>
</span></pre></div>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="n">User</span><span class="o">.</span><span class="n">objects</span>
<span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'is_active'</span><span class="p">)</span>
<span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">total</span><span class="o">=</span><span class="n">Count</span><span class="p">(</span><span class="s1">'id'</span><span class="p">))</span>
<span class="hll"><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">total__gt</span><span class="o">=</span><span class="mi">100</span><span class="p">))</span>
</span></pre></div>
</section>
<p>The filter on the annotated field <code>total</code> added a HAVING clause in the generated SQL.</p>
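<p>The difference between filtering rows and filtering groups is easy to see on a small table. This is a minimal sketch using Python's built-in <code>sqlite3</code> module, with a made-up <code>demo_user</code> table and a threshold of 2 instead of 100 to keep the data small:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_user (id INTEGER PRIMARY KEY, is_active INTEGER)")
conn.executemany(
    "INSERT INTO demo_user (is_active) VALUES (?)",
    [(1,), (1,), (1,), (0,)],  # three active users, one inactive
)

# Without HAVING, every group is returned
all_groups = conn.execute(
    """
    SELECT is_active, COUNT(id) AS total
    FROM demo_user
    GROUP BY is_active
    ORDER BY is_active
    """
).fetchall()
print(all_groups)  # [(0, 1), (1, 3)]

# HAVING is applied after grouping, so it can refer to the aggregate
large_groups = conn.execute(
    """
    SELECT is_active, COUNT(id) AS total
    FROM demo_user
    GROUP BY is_active
    HAVING COUNT(id) > 2
    ORDER BY is_active
    """
).fetchall()
print(large_groups)  # [(1, 3)]
```

<p>A <code>WHERE</code> clause is evaluated before the rows are grouped, which is why a condition on an aggregate has to go in <code>HAVING</code>.</p>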
<h3 id="how-to-group-by-distinct"><a class="toclink" href="#how-to-group-by-distinct">How to Group by Distinct</a></h3>
<p>For some aggregate functions such as <code>COUNT</code>, it is sometimes desirable to only count distinct occurrences. For example, how many different last names are there per user active status:</p>
<section class="side-by-side">
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">is_active</span><span class="p">,</span>
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">total</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="k">DISTINCT</span><span class="w"> </span><span class="n">last_name</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">unique_names</span>
</span><span class="k">FROM</span>
<span class="w"> </span><span class="n">auth_user</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">is_active</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="n">User</span><span class="o">.</span><span class="n">objects</span>
<span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'is_active'</span><span class="p">)</span>
<span class="o">.</span><span class="n">annotate</span><span class="p">(</span>
<span class="n">total</span><span class="o">=</span><span class="n">Count</span><span class="p">(</span><span class="s1">'id'</span><span class="p">),</span>
<span class="hll"> <span class="n">unique_names</span><span class="o">=</span><span class="n">Count</span><span class="p">(</span><span class="s1">'last_name'</span><span class="p">,</span> <span class="n">distinct</span><span class="o">=</span><span class="kc">True</span><span class="p">),</span>
</span><span class="p">))</span>
</pre></div>
</section>
<p>Notice the use of <code>distinct=True</code> in the call to <code>Count</code>.</p>
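<p>To see what the distinct count does to actual rows, here is a minimal sketch that runs the same SQL against an in-memory SQLite database using Python's standard <code>sqlite3</code> module. SQLite is only a stand-in for the real database here, and the sample rows and the helper name <code>count_last_names</code> are made up for illustration:</p>

```python
import sqlite3


def count_last_names(rows):
    """Return {is_active: (total, unique_names)} for (is_active, last_name) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE auth_user (id INTEGER PRIMARY KEY, is_active INTEGER, last_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO auth_user (is_active, last_name) VALUES (?, ?)", rows
    )
    result = conn.execute("""
        SELECT
            is_active,
            COUNT(id) AS total,
            COUNT(DISTINCT last_name) AS unique_names
        FROM auth_user
        GROUP BY is_active
    """).fetchall()
    conn.close()
    return {is_active: (total, unique) for is_active, total, unique in result}


# Hypothetical data: three active users (two share a last name), one inactive.
sample = [(1, "Smith"), (1, "Smith"), (1, "Cohen"), (0, "Levi")]
counts = count_last_names(sample)
print(counts)
```

<p>The active group has three users but only two distinct last names, which is exactly the difference between <code>COUNT(id)</code> and <code>COUNT(DISTINCT last_name)</code>.</p>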
<h3 id="how-to-create-expressions-using-aggregate-fields"><a class="toclink" href="#how-to-create-expressions-using-aggregate-fields">How to Create Expressions Using Aggregate Fields</a></h3>
<p>Aggregate fields are often just the first step to a greater question. For example, what is the <em>percent</em> of unique last names by user active status:</p>
<section class="side-by-side">
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">is_active</span><span class="p">,</span>
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">total</span><span class="p">,</span>
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="k">DISTINCT</span><span class="w"> </span><span class="n">last_name</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">unique_names</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="p">(</span><span class="k">COUNT</span><span class="p">(</span><span class="k">DISTINCT</span><span class="w"> </span><span class="n">last_name</span><span class="p">)::</span><span class="nb">float</span>
</span><span class="hll"><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)::</span><span class="nb">float</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">pct_unique_names</span>
</span><span class="k">FROM</span>
<span class="w"> </span><span class="n">auth_user</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">is_active</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">FloatField</span>
<span class="kn">from</span> <span class="nn">django.db.models.functions</span> <span class="kn">import</span> <span class="n">Cast</span>
<span class="p">(</span><span class="n">User</span><span class="o">.</span><span class="n">objects</span>
<span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'is_active'</span><span class="p">)</span>
<span class="o">.</span><span class="n">annotate</span><span class="p">(</span>
<span class="n">total</span><span class="o">=</span><span class="n">Count</span><span class="p">(</span><span class="s1">'id'</span><span class="p">),</span>
<span class="n">unique_names</span><span class="o">=</span><span class="n">Count</span><span class="p">(</span><span class="s1">'last_name'</span><span class="p">,</span> <span class="n">distinct</span><span class="o">=</span><span class="kc">True</span><span class="p">),</span>
<span class="p">)</span>
<span class="hll"><span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">pct_unique_names</span><span class="o">=</span><span class="p">(</span>
</span><span class="hll"> <span class="n">Cast</span><span class="p">(</span><span class="s1">'unique_names'</span><span class="p">,</span> <span class="n">FloatField</span><span class="p">())</span>
</span><span class="hll"> <span class="o">/</span> <span class="n">Cast</span><span class="p">(</span><span class="s1">'total'</span><span class="p">,</span> <span class="n">FloatField</span><span class="p">())</span>
</span><span class="hll"><span class="p">))</span>
</span></pre></div>
</section>
<p>The first <code>annotate()</code> defines the aggregate fields. The second <code>annotate()</code> uses the aggregate fields to construct an expression.</p>
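<p>The cast to float matters. As a rough illustration, the sketch below runs the SQL version of this query on an in-memory SQLite database (a stand-in, not the database used in this article) with made-up rows, once without the cast and once with it. The column name <code>naive_pct</code> is my own addition:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE auth_user (id INTEGER PRIMARY KEY, is_active INTEGER, last_name TEXT)"
)
# Hypothetical data: four active users with three distinct last names, one inactive user.
conn.executemany(
    "INSERT INTO auth_user (is_active, last_name) VALUES (?, ?)",
    [(1, "Smith"), (1, "Smith"), (1, "Cohen"), (1, "Levi"), (0, "Levi")],
)

rows = conn.execute("""
    SELECT
        is_active,
        -- Integer divided by integer: the fraction is truncated.
        COUNT(DISTINCT last_name) / COUNT(id) AS naive_pct,
        -- Casting both sides to a float keeps the fraction.
        CAST(COUNT(DISTINCT last_name) AS REAL)
            / CAST(COUNT(id) AS REAL) AS pct_unique_names
    FROM auth_user
    GROUP BY is_active
    ORDER BY is_active
""").fetchall()
conn.close()

for is_active, naive_pct, pct_unique_names in rows:
    print(is_active, naive_pct, pct_unique_names)
```

<p>For the active users the naive expression comes out as 0, while the cast version gives 0.75.</p>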
<h3 id="how-to-group-by-across-relations"><a class="toclink" href="#how-to-group-by-across-relations">How to Group By Across Relations</a></h3>
<p>So far we've used only data in a single model, but aggregates are often used across relations. The simpler scenario is of a one-to-one or a foreign key relation. For example, say we have a <code>UserProfile</code> with a one-to-one relationship to the User, and we want to count users by the type of profile:</p>
<section class="side-by-side">
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">p</span><span class="p">.</span><span class="k">type</span><span class="p">,</span>
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">u</span><span class="p">.</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">total</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">auth_user</span><span class="w"> </span><span class="n">u</span>
<span class="hll"><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">user_profile</span><span class="w"> </span><span class="n">p</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">u</span><span class="p">.</span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">p</span><span class="p">.</span><span class="n">user_id</span>
</span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">p</span><span class="p">.</span><span class="k">type</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="n">User</span><span class="o">.</span><span class="n">objects</span>
<span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'user_profile__type'</span><span class="p">)</span>
<span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">total</span><span class="o">=</span><span class="n">Count</span><span class="p">(</span><span class="s1">'id'</span><span class="p">)))</span>
</pre></div>
</section>
<p>Just like GROUP BY expressions, using relations in <code>values</code> will group by that field. Note that the name of the user profile type in the result will be 'user_profile__type'.</p>
<h3 id="how-to-group-by-a-many-to-many-relationship"><a class="toclink" href="#how-to-group-by-a-many-to-many-relationship">How to Group By a Many to Many Relationship</a></h3>
<p>A more complicated type of relation is the many to many relationship. For example, count how many groups each user is a member of:</p>
<section class="side-by-side">
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">u</span><span class="p">.</span><span class="n">id</span><span class="p">,</span>
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">ug</span><span class="p">.</span><span class="n">group_id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">memberships</span>
<span class="k">FROM</span>
<span class="w">    </span><span class="n">auth_user</span><span class="w"> </span><span class="n">u</span>
<span class="hll"><span class="w"> </span><span class="k">LEFT</span><span class="w"> </span><span class="k">OUTER</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">auth_user_groups</span><span class="w"> </span><span class="n">ug</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="p">(</span>
</span><span class="hll"><span class="w"> </span><span class="n">u</span><span class="p">.</span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ug</span><span class="p">.</span><span class="n">user_id</span>
</span><span class="hll"><span class="w"> </span><span class="p">)</span>
</span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">u</span><span class="p">.</span><span class="n">id</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="n">User</span><span class="o">.</span><span class="n">objects</span>
<span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">memberships</span><span class="o">=</span><span class="n">Count</span><span class="p">(</span><span class="s1">'groups'</span><span class="p">))</span>
<span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'id'</span><span class="p">,</span> <span class="s1">'memberships'</span><span class="p">))</span>
</pre></div>
</section>
<p>A user can be a member of more than one group. To count the number of groups the user is a member of we used the related name "groups" in the <code>User</code> model. If the related name is not explicitly set (and not explicitly disabled), Django will automatically generate a name in the format <code>{related model name}_set</code>. For example, <code>group_set</code>.</p>
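<p>The reason the SQL uses a <code>LEFT OUTER JOIN</code> is that users without any group should still show up, with a count of zero. Here is a small sketch of that behavior, assuming an in-memory SQLite database as a stand-in and a simplified, made-up version of the two tables:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical versions of the user table and the many to many "through" table.
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY);
    CREATE TABLE auth_user_groups (user_id INTEGER, group_id INTEGER);
    INSERT INTO auth_user (id) VALUES (1), (2), (3);
    INSERT INTO auth_user_groups (user_id, group_id) VALUES (1, 10), (1, 20), (2, 10);
""")

memberships = dict(conn.execute("""
    SELECT
        u.id,
        COUNT(ug.group_id) AS memberships
    FROM
        auth_user u
        LEFT OUTER JOIN auth_user_groups ug ON u.id = ug.user_id
    GROUP BY
        u.id
""").fetchall())
conn.close()

print(memberships)  # {1: 2, 2: 1, 3: 0}
```

<p>User 3 is not a member of any group. The outer join produces a single row for that user with a null <code>group_id</code>, and <code>COUNT</code> skips the null, so the result is 0 rather than a missing row.</p>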
<hr>
<h2 id="going-further"><a class="toclink" href="#going-further">Going Further</a></h2>
<p>To dig deeper into the ORM and GROUP BY in particular, check out these links:</p>
<ul>
<li><a href="/how-to-use-grouping-sets-in-django">How to use grouping sets in Django</a>: An article about advanced group by techniques such as group by cube, group by rollup and group by grouping sets.</li>
<li><a href="/sql-group-by-first-last-value">How to Get the First or Last Value in a Group Using Group By in SQL</a>: A neat little trick using arrays in PostgreSQL.</li>
<li><a href="/sql-dos-and-donts">12 Common Mistakes and Missed Optimization Opportunities in SQL</a>: Some SQL do's and dont's you need to know if you are working with data and writing SQL.</li>
<li><a href="https://docs.djangoproject.com/en/3.0/topics/db/aggregation/#cheat-sheet" rel="noopener">Django Aggregation cheat-sheet page</a>: How to do common aggregate queries.</li>
</ul>
<h1>12 Common Mistakes and Missed Optimization Opportunities in SQL</h1>
<p>Haki Benita, 2019-11-21</p><hr>
<p>Most programming languages are designed for professional developers with knowledge of algorithms and data structures. <strong>SQL is different</strong>.</p>
<p>SQL is used by analysts, data scientists, product managers, designers and many others. These professionals have access to databases, but they don't always have the intuition and understanding to write efficient queries.</p>
<p>In an effort to make my team write better SQL, I went over reports written by non-developers and code reviews, and gathered <strong>common mistakes and missed optimization opportunities in SQL</strong>.</p>
<figure><img alt="Avoid painful mistakes..." src="https://hakibenita.com/images/00-sql-dos-and-donts.jpg"><figcaption>Avoid painful mistakes...</figcaption>
</figure>
<p><details class="toc-container" open>
<summary>Table of Contents</summary></p>
<div class="toc">
<ul>
<li><a href="#be-careful-when-dividing-integers">Be Careful When Dividing Integers</a></li>
<li><a href="#guard-against-division-by-zero-errors">Guard Against "division by zero" Errors</a></li>
<li><a href="#know-the-difference-between-union-and-union-all">Know the Difference Between UNION and UNION ALL</a></li>
<li><a href="#be-careful-when-counting-nullable-columns">Be Careful When Counting Nullable Columns</a></li>
<li><a href="#be-aware-of-timezones">Be Aware of Timezones</a></li>
<li><a href="#avoid-transformations-on-indexed-fields">Avoid Transformations on Indexed Fields</a></li>
<li><a href="#use-between-only-for-inclusive-ranges">Use BETWEEN Only For Inclusive Ranges</a></li>
<li><a href="#add-faux-predicates">Add "Faux" Predicates</a></li>
<li><a href="#inline-cte">Inline CTE*</a></li>
<li><a href="#fetch-only-what-you-need">Fetch Only What You Need!</a></li>
<li><a href="#reference-column-position-in-group-by-and-order-by">Reference Column Position in GROUP BY and ORDER BY</a></li>
<li><a href="#format-your-query">Format Your Query</a></li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</div>
<p></details></p>
<hr>
<h3 id="be-careful-when-dividing-integers"><a class="toclink" href="#be-careful-when-dividing-integers">Be Careful When Dividing Integers</a></h3>
<p>In PostgreSQL, <strong>dividing an integer by an integer results in an integer</strong>:</p>
<div class="dont">
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="p">(</span>
<span class="hll"><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">tax</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">price</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">tax_ratio</span>
</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sale</span>
<span class="p">);</span>
<span class="go"> tax_ratio</span>
<span class="go">----------</span>
<span class="go"> 0</span>
</pre></div>
</div>
<p>To get the expected result of the division, you need to cast one of the values to float:</p>
<div class="do">
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="p">(</span>
<span class="hll"><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">tax</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">price</span><span class="o">::</span><span class="k">float</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">tax_ratio</span>
</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sale</span>
<span class="p">);</span>
<span class="go"> tax_ratio</span>
<span class="go">----------</span>
<span class="go"> 0.17</span>
</pre></div>
</div>
<p>Failing to recognize this pitfall might lead to horribly incorrect results.</p>
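<p>If you want to try this without a PostgreSQL server at hand, here is a minimal sketch using Python's standard <code>sqlite3</code> module. SQLite is only a stand-in, but it also truncates integer division. The values 17 and 100 are made-up stand-ins for <code>tax</code> and <code>price</code>:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical values: tax = 17, price = 100.
int_ratio, float_ratio = conn.execute(
    "SELECT 17 / 100, 17 / CAST(100 AS REAL)"
).fetchone()
conn.close()

print(int_ratio)    # 0, the fraction is truncated
print(float_ratio)  # 0.17
```

<p>Casting just one side of the division is enough to get a float result.</p>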
<hr>
<h3 id="guard-against-division-by-zero-errors"><a class="toclink" href="#guard-against-division-by-zero-errors">Guard Against "division by zero" Errors</a></h3>
<p>Zero division is a notorious error in production:</p>
<div class="dont">
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="mf">0</span>
<span class="n">ERROR</span><span class="p">:</span><span class="w"> </span><span class="n">division</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="n">zero</span>
</pre></div>
</div>
<p>Division by zero is a logic error. It shouldn't just be "worked around", but fixed so that you don't have a zero divisor in the first place. However, there are situations where a zero denominator is possible. One easy way to protect against zero division errors in such cases is to make the entire expression null by setting the denominator to null if it equals zero:</p>
<div class="do">
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="k">NULLIF</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="mf">0</span><span class="p">);</span>
<span class="go"> ?column?</span>
<span class="go">----------</span>
<span class="go"> -</span>
</pre></div>
</div>
<p>The function <a href="https://www.postgresql.org/docs/current/functions-conditional.html#FUNCTIONS-NULLIF" rel="noopener">NULLIF</a> returns null if the first argument equals the second argument. In this case, it returns null if the denominator equals zero.</p>
<p>When dividing any number by NULL, the result is NULL. To force some value, you can wrap the entire expression with <a href="https://www.postgresql.org/docs/current/functions-conditional.html#FUNCTIONS-COALESCE-NVL-IFNULL" rel="noopener">COALESCE</a> and provide a fallback value:</p>
<div class="do">
<div class="highlight"><pre><span></span><span class="n">db</span><span class="o">=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="k">COALESCE</span><span class="p">(</span><span class="mi">1</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="k">NULLIF</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">),</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="o">?</span><span class="k">column</span><span class="o">?</span>
<span class="c1">----------</span>
<span class="w"> </span><span class="mi">1</span>
</pre></div>
</div>
<p>The function <a href="https://www.postgresql.org/docs/current/functions-conditional.html#FUNCTIONS-COALESCE-NVL-IFNULL" rel="noopener">COALESCE</a> is very useful. It accepts any number of arguments, and returns the first value which is not null.</p>
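<p>If the way the two functions combine is not obvious, it may help to see their semantics spelled out. The following is a plain Python sketch of the logic, not how PostgreSQL implements it. The helpers <code>nullif</code>, <code>coalesce</code> and <code>divide</code> are my own illustrative names, with <code>None</code> playing the role of NULL:</p>

```python
def nullif(value, other):
    """Like SQL NULLIF: None if both arguments are equal, otherwise the first."""
    return None if value == other else value


def coalesce(*values):
    """Like SQL COALESCE: the first argument that is not None, or None if all are."""
    return next((v for v in values if v is not None), None)


def divide(numerator, denominator):
    """SQL-style division: if either side is NULL (None), the result is NULL."""
    if numerator is None or denominator is None:
        return None
    return numerator / denominator


# 1 / NULLIF(0, 0): the zero denominator becomes NULL, so the result is NULL.
print(divide(1, nullif(0, 0)))               # None

# COALESCE(1 / NULLIF(0, 0), 1): the NULL is replaced with the fallback.
print(coalesce(divide(1, nullif(0, 0)), 1))  # 1

# A non-zero denominator passes through NULLIF untouched.
print(coalesce(divide(1, nullif(4, 0)), 1))  # 0.25
```

<p>Note that the Python <code>/</code> operator always produces a float, so unlike the SQL examples this sketch does not truncate integer division.</p>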
<hr>
<h3 id="know-the-difference-between-union-and-union-all"><a class="toclink" href="#know-the-difference-between-union-and-union-all">Know the Difference Between UNION and UNION ALL</a></h3>
<p>A classic entry-level interview question for developers and DBAs is "what is the difference between UNION and UNION ALL".</p>
<p><code>UNION ALL</code> concatenates the results of two or more queries. <code>UNION</code> does the same, but it also <strong>eliminates duplicate rows</strong>:</p>
<div class="dont">
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">created_by_id</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sale</span>
<span class="hll"><span class="w"> </span><span class="k">UNION</span>
</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">created_by_id</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">past_sale</span>
<span class="p">);</span>
<span class="go"> QUERY PLAN</span>
<span class="go">----------------------------------------------------------------------------------</span>
<span class="hll"><span class="go">Unique (cost=2654611.00..2723233.86 rows=13724572 width=4)</span>
</span><span class="go"> -> Sort (cost=2654611.00..2688922.43 rows=13724572 width=4)</span>
<span class="go"> Sort Key: sale.created_by_id</span>
<span class="go"> -> Append (cost=0.00..652261.30 rows=13724572 width=4)</span>
<span class="go"> -> Seq Scan on sale (cost=0.00..442374.57 rows=13570157 width=4)</span>
<span class="go"> -> Seq Scan on past_sale (cost=0.00..4018.15 rows=154415 width=4)</span>
</pre></div>
</div>
<p>You can see in the execution plan that after appending the two queries, the database sorted the results and eliminated duplicate rows.</p>
<p>If you don't need to eliminate duplicate rows, it's best to use <code>UNION ALL</code>:</p>
<div class="do">
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">created_by_id</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sale</span>
<span class="hll"><span class="w"> </span><span class="k">UNION</span><span class="w"> </span><span class="k">ALL</span>
</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">created_by_id</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">past_sale</span>
<span class="p">);</span>
<span class="go"> QUERY PLAN</span>
<span class="go">----------------------------------------------------------------------------------</span>
<span class="go"> Append (cost=0.00..515015.58 rows=13724572 width=4)</span>
<span class="go"> -> Seq Scan on sale (cost=0.00..442374.57 rows=13570157 width=4)</span>
<span class="go"> -> Seq Scan on past_sale (cost=0.00..4018.15 rows=154415 width=4)</span>
</pre></div>
</div>
<p>The execution plan is much simpler. The results are appended and a sort is not necessary.</p>
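<p>The difference in the results is easy to reproduce on a tiny data set. Here is a minimal sketch using an in-memory SQLite database as a stand-in, with made-up rows. It only shows the difference in output, not the difference in execution plans:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: user 1 appears twice in sale, user 2 appears in both tables.
conn.executescript("""
    CREATE TABLE sale (created_by_id INTEGER);
    CREATE TABLE past_sale (created_by_id INTEGER);
    INSERT INTO sale VALUES (1), (1), (2);
    INSERT INTO past_sale VALUES (2), (3);
""")

union_rows = conn.execute("""
    SELECT created_by_id FROM sale
    UNION
    SELECT created_by_id FROM past_sale
""").fetchall()

union_all_rows = conn.execute("""
    SELECT created_by_id FROM sale
    UNION ALL
    SELECT created_by_id FROM past_sale
""").fetchall()
conn.close()

print(len(union_rows))      # 3, duplicates were eliminated
print(len(union_all_rows))  # 5, every row from both queries
```

<p>Notice that <code>UNION</code> removes duplicates within each query as well as between them: user 1 appears twice in <code>sale</code> alone and is still returned only once.</p>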
<hr>
<h3 id="be-careful-when-counting-nullable-columns"><a class="toclink" href="#be-careful-when-counting-nullable-columns">Be Careful When Counting Nullable Columns</a></h3>
<p>When using aggregate functions such as <code>COUNT</code>, it's important to understand how they handle null values.</p>
<p>For example, take the following table:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="kp">\pset</span><span class="w"> </span><span class="ss">null</span><span class="w"> </span><span class="ss">NULL</span>
<span class="go">Null display is "NULL".</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="n">tb</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">id</span>
<span class="w"> </span><span class="k">UNION</span><span class="w"> </span><span class="k">ALL</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="k">null</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">id</span>
<span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">tb</span><span class="p">;</span>
<span class="go"> id</span>
<span class="go">------</span>
<span class="go"> 1</span>
<span class="go"> NULL</span>
</pre></div>
<p>The <code>id</code> column contains a null value. Counting the <code>id</code> column:</p>
<div class="dont">
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="n">tb</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">id</span>
<span class="w"> </span><span class="k">UNION</span><span class="w"> </span><span class="k">ALL</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="k">null</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">id</span>
<span class="p">)</span>
<span class="hll"><span class="k">SELECT</span><span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)</span>
</span><span class="k">FROM</span><span class="w"> </span><span class="n">tb</span><span class="p">;</span>
<span class="go"> count</span>
<span class="go">-------</span>
<span class="go"> 1</span>
</pre></div>
</div>
<p>There are two rows in the table, but <code>COUNT</code> returned 1. This is because null values are ignored by <code>COUNT</code>.</p>
<p>To count rows, use <code>COUNT(*)</code>:</p>
<div class="do">
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="n">tb</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">id</span>
<span class="w"> </span><span class="k">UNION</span><span class="w"> </span><span class="k">ALL</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="k">null</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">id</span>
<span class="p">)</span>
<span class="hll"><span class="k">SELECT</span><span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
</span><span class="k">FROM</span><span class="w"> </span><span class="n">tb</span><span class="p">;</span>
<span class="go"> count</span>
<span class="go">-------</span>
<span class="go"> 2</span>
</pre></div>
</div>
<p>This feature can also be useful. For example, if a field called <code>modified</code> contains null if a row was not changed, you can calculate the percentage of changed rows like this:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="n">modified</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="o">::</span><span class="k">float</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">modified_pct</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sale</span>
<span class="p">);</span>
<span class="go"> modified_pct</span>
<span class="go">---------------</span>
<span class="go"> 0.98</span>
</pre></div>
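<p>Other aggregate functions, such as <code>SUM</code>, also ignore null values. When every value is null there is nothing left to add up, and the result is null. To demonstrate, <code>SUM</code> a field that contains only null values:</p>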
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="n">tb</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="k">null</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">id</span>
<span class="w"> </span><span class="k">UNION</span><span class="w"> </span><span class="k">ALL</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="k">null</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">id</span>
<span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="n">SUM</span><span class="p">(</span><span class="n">id</span><span class="o">::</span><span class="nb">int</span><span class="p">)</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">tb</span><span class="p">;</span>
<span class="go"> sum</span>
<span class="go">-------</span>
<span class="go"> NULL</span>
</pre></div>
<p>These are all documented behaviors, so just be aware!</p>
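<p>These rules are not unique to PostgreSQL. As a quick sanity check, here is a sketch that runs the same aggregates on an in-memory SQLite database (a stand-in, using the same two-row table as above) from Python, where null comes back as <code>None</code>:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb (id INTEGER)")
conn.executemany("INSERT INTO tb VALUES (?)", [(1,), (None,)])

# COUNT(id) skips the null, COUNT(*) counts rows, SUM(id) adds only non-null values.
count_id, count_rows, sum_id = conn.execute(
    "SELECT COUNT(id), COUNT(*), SUM(id) FROM tb"
).fetchone()

# Summing only null values leaves nothing to add up.
(sum_of_nulls,) = conn.execute(
    "SELECT SUM(id) FROM tb WHERE id IS NULL"
).fetchone()
conn.close()

print(count_id, count_rows, sum_id)  # 1 2 1
print(sum_of_nulls)                  # None
```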
<hr>
<h3 id="be-aware-of-timezones"><a class="toclink" href="#be-aware-of-timezones">Be Aware of Timezones</a></h3>
<p>Timezones are always a source of confusion and pitfalls. PostgreSQL does a fair job with timezones, but you still have to pay attention to some things.</p>
<p>A common mistake I've seen countless times is truncating timestamps without specifying the time zone. Say we want to find out how many sales were made each day:</p>
<div class="dont">
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="n">created_at</span><span class="p">::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">sale</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="mi">1</span>
</pre></div>
</div>
<p>Without explicitly setting the time zone, you might get different results, depending on the time zone set by the client application:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">now</span><span class="p">()</span><span class="o">::</span><span class="nb">date</span><span class="p">;</span>
<span class="go"> now</span>
<span class="go">------------</span>
<span class="go"> 2019-11-08</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="nb">TIME</span><span class="w"> </span><span class="k">ZONE</span><span class="w"> </span><span class="s1">'australia/perth'</span><span class="p">;</span>
<span class="go">SET</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">now</span><span class="p">()</span><span class="o">::</span><span class="nb">date</span><span class="p">;</span>
<span class="go"> now</span>
<span class="go">------------</span>
<span class="go"> 2019-11-09</span>
</pre></div>
<p>If you are not sure what time zone you are working with, you might be doing it wrong.</p>
<p>When truncating a timestamp, convert to the desired time zone first:</p>
<div class="do">
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="p">(</span><span class="n">created_at</span><span class="w"> </span><span class="k">at</span><span class="w"> </span><span class="k">time</span><span class="w"> </span><span class="k">zone</span><span class="w"> </span><span class="s1">'asia/tel_aviv'</span><span class="p">)::</span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">sale</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
</pre></div>
</div>
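<p>The same pitfall is easy to reproduce outside the database. The following is a minimal Python sketch, not PostgreSQL itself: the sale timestamps are made up, and fixed UTC offsets stand in for named time zones to keep the example self-contained. It shows that the per-day count depends on the time zone you convert to before truncating:</p>

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC offsets stand in for named zones such as
# 'australia/perth' so the sketch needs no time zone database.
UTC = timezone.utc
PERTH = timezone(timedelta(hours=8))

# Hypothetical sale timestamps, stored as timezone-aware values in UTC.
sales = [
    datetime(2019, 11, 8, 15, 0, tzinfo=UTC),
    datetime(2019, 11, 8, 17, 30, tzinfo=UTC),
    datetime(2019, 11, 8, 23, 0, tzinfo=UTC),
]


def sales_per_day(timestamps, tz):
    """Convert each timestamp to tz first, and only then truncate to a date."""
    return Counter(ts.astimezone(tz).date() for ts in timestamps)


per_day_utc = sales_per_day(sales, UTC)
per_day_perth = sales_per_day(sales, PERTH)

print(per_day_utc)    # all three sales fall on the same UTC day
print(per_day_perth)  # the same sales are split across two Perth days
```

<p>The data is identical in both calls. Only the time zone used for truncation differs, which is why it should be stated explicitly.</p>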
<p>Setting the time zone is usually the responsibility of the client application. For example, to get the time zone used by <code>psql</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SHOW</span><span class="w"> </span><span class="n">timezone</span><span class="p">;</span>
<span class="go"> TimeZone</span>
<span class="go">----------</span>
<span class="go"> Israel</span>
<span class="go">(1 row)</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">now</span><span class="p">();</span>
<span class="go"> now</span>
<span class="go">-------------------------------</span>
<span class="go"> 2019-11-09 11:41:45.233529+02</span>
<span class="go">(1 row)</span>
</pre></div>
<p>To set the time zone in <code>psql</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">timezone</span><span class="w"> </span><span class="k">TO</span><span class="w"> </span><span class="s1">'UTC'</span><span class="p">;</span>
<span class="go">SET</span>
<span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">now</span><span class="p">();</span>
<span class="go"> now</span>
<span class="go">-------------------------------</span>
<span class="go"> 2019-11-09 09:41:55.904474+00</span>
<span class="go">(1 row)</span>
</pre></div>
<p>Another important thing to keep in mind is that the time zone of your server can be different from the time zone of your local machine, so queries you run on your local machine might yield different results in production. To avoid mistakes, always explicitly set a time zone.</p>
<div class="admonition tip">
<p class="admonition-title">Timezones in PostgreSQL</p>
<p>To get a complete list of time zone names in PostgreSQL query the <a href="https://www.postgresql.org/docs/current/view-pg-timezone-names.html" rel="noopener">view <code>pg_timezone_names</code></a>.</p>
</div>
<hr>
<h3 id="avoid-transformations-on-indexed-fields"><a class="toclink" href="#avoid-transformations-on-indexed-fields">Avoid Transformations on Indexed Fields</a></h3>
<p>Using functions on an indexed field might prevent the database from using the index on the field:</p>
<div class="dont">
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sale</span>
<span class="hll"><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">created</span><span class="w"> </span><span class="k">at</span><span class="w"> </span><span class="nb">time</span><span class="w"> </span><span class="k">ZONE</span><span class="w"> </span><span class="s1">'asia/tel_aviv'</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="s1">'2019-10-01'</span>
</span><span class="p">);</span>
<span class="go"> QUERY PLAN</span>
<span class="go">----------------------------------------------------------------------------------</span>
<span class="go">Seq Scan on sale (cost=0.00..510225.35 rows=4523386 width=276)</span>
<span class="go"> Filter: (timezone('asia/tel_aviv', created) > '2019-10-01 00:00:00'::timestamp without time zone)</span>
</pre></div>
</div>
<p>The field <code>created</code> is indexed, but because we transformed it with <code>timezone</code>, the index was not used.</p>
<p>One way to utilize the index in this case is to <strong>apply the transformation on the right-hand side instead</strong>:</p>
<div class="do">
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sale</span>
<span class="hll"><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">created</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="s1">'2019-10-01'</span><span class="w"> </span><span class="k">AT</span><span class="w"> </span><span class="nb">TIME</span><span class="w"> </span><span class="k">ZONE</span><span class="w"> </span><span class="s1">'asia/tel_aviv'</span>
</span><span class="p">);</span>
<span class="go"> QUERY PLAN</span>
<span class="go">----------------------------------------------------------------------------------</span>
<span class="go">Index Scan using sale_created_ix on sale (cost=0.43..4.51 rows=1 width=276)</span>
<span class="go"> Index Cond: (created > '2019-10-01 00:00:00'::timestamp with time zone)</span>
</pre></div>
</div>
<p>Another common use-case involving dates is filtering a specific period:</p>
<div class="dont">
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sale</span>
<span class="hll"><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">created</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="s1">'1 day'</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="s1">'2019-10-01'</span>
</span><span class="p">);</span>
<span class="go"> QUERY PLAN</span>
<span class="go">----------------------------------------------------------------------------------</span>
<span class="go">Seq Scan on sale (cost=0.00..510225.35 rows=4523386 width=276)</span>
<span class="go"> Filter: ((created + '1 day'::interval) > '2019-10-01 00:00:00+03'::timestamp with time zone)</span>
</pre></div>
</div>
<p>Like before, the interval arithmetic on the field <code>created</code> prevented the database from utilizing the index. To make the database use the index, apply the transformation on the right-hand side instead of the field:</p>
<div class="do">
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sale</span>
<span class="hll"><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">created</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="s1">'2019-10-01'</span><span class="o">::</span><span class="nb">date</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="s1">'1 day'</span>
</span><span class="p">);</span>
<span class="go"> QUERY PLAN</span>
<span class="go">----------------------------------------------------------------------------------</span>
<span class="go"> Index Scan using sale_created_ix on sale (cost=0.43..4.51 rows=1 width=276)</span>
<span class="go"> Index Cond: (created > '2019-10-01 00:00:00'::timestamp without time zone)</span>
</pre></div>
</div>
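<p>If you don't have a PostgreSQL instance at hand, the effect can be sketched with Python's built-in <code>sqlite3</code> module. This is only an illustration under assumptions: SQLite's planner and date functions differ from PostgreSQL's, and the table and index below are a made-up miniature of <code>sale</code>. The principle is the same: wrapping the indexed column in a function hides it from the index, while transforming the constant does not.</p>

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for the PostgreSQL table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sale (id INTEGER PRIMARY KEY, created TEXT, amount INTEGER);
    CREATE INDEX sale_created_ix ON sale (created);
""")


def query_plan(sql):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the human readable detail.
    return " | ".join(row[3] for row in rows)


# Transformation applied to the indexed column.
transformed_column = query_plan(
    "SELECT * FROM sale WHERE date(created, '+1 day') > '2019-10-01'"
)

# Equivalent transformation moved to the constant on the right-hand side.
transformed_constant = query_plan(
    "SELECT * FROM sale WHERE created > date('2019-10-01', '-1 day')"
)

print(transformed_column)
print(transformed_constant)
```

<p>The exact wording of the plan varies between SQLite versions, so the sketch only prints it. What to look for is whether <code>sale_created_ix</code> is mentioned.</p>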
<hr>
<h3 id="use-between-only-for-inclusive-ranges"><a class="toclink" href="#use-between-only-for-inclusive-ranges">Use BETWEEN Only For Inclusive Ranges</a></h3>
<p>A common mistake I see very often is when filtering a date range using BETWEEN:</p>
<div class="dont">
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">sales</span>
<span class="k">WHERE</span><span class="w"> </span><span class="n">created</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="s1">'2019-01-01'</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="s1">'2020-01-01'</span><span class="p">;</span>
</pre></div>
</div>
<p>Intuitively, you might think this query is fetching all the sales in 2019, but in fact, it's fetching all the sales made in 2019 <em>and</em> the first day of 2020. <code>BETWEEN</code> is inclusive, so the query above is equivalent to this query:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">sales</span>
<span class="k">WHERE</span><span class="w"> </span><span class="n">created</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="s1">'2019-01-01'</span>
<span class="k">AND</span><span class="w"> </span><span class="n">created</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="s1">'2020-01-01'</span><span class="p">;</span>
</pre></div>
<p>To filter results in 2019 you can either write this:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">sales</span>
<span class="k">WHERE</span><span class="w"> </span><span class="n">created</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="s1">'2019-01-01'</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="s1">'2019-12-31'</span><span class="p">;</span>
</pre></div>
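<p>Or better yet, use a half-open range, which stays correct even when <code>created</code> is a timestamp rather than a date:</p>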
<div class="do">
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">sales</span>
<span class="k">WHERE</span><span class="w"> </span><span class="n">created</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="s1">'2019-01-01'</span>
<span class="k">AND</span><span class="w"> </span><span class="n">created</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="s1">'2020-01-01'</span><span class="p">;</span>
</pre></div>
</div>
<p>Using <code>BETWEEN</code> incorrectly might produce overlapping results, for example, counting sales twice in two different periods.</p>
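<p>The double counting is easy to demonstrate. The sketch below uses Python's built-in <code>sqlite3</code> module as a stand-in for PostgreSQL (<code>BETWEEN</code> is inclusive in both), with a hypothetical four-row <code>sales</code> table and dates stored as ISO strings:</p>

```python
import sqlite3

# Assumption: SQLite with ISO date strings stands in for PostgreSQL dates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (created TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?)",
    [("2019-06-15",), ("2019-12-31",), ("2020-01-01",), ("2020-03-01",)],
)


def count(where, params):
    """Count the sales that match a WHERE clause."""
    sql = f"SELECT COUNT(*) FROM sales WHERE {where}"
    return conn.execute(sql, params).fetchone()[0]


# Two consecutive yearly periods that share the boundary value 2020-01-01.
periods = [("2019-01-01", "2020-01-01"), ("2020-01-01", "2021-01-01")]

# Inclusive on both ends: the sale made on 2020-01-01 lands in both periods.
inclusive_total = sum(count("created BETWEEN ? AND ?", p) for p in periods)

# Half-open range: every sale belongs to exactly one period.
half_open_total = sum(count("created >= ? AND created < ?", p) for p in periods)

print(inclusive_total)
print(half_open_total)
```

<p>With <code>BETWEEN</code> the per-period counts add up to more than the number of rows in the table. With the half-open range they add up to exactly the number of rows.</p>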
<hr>
<h3 id="add-faux-predicates"><a class="toclink" href="#add-faux-predicates">Add "Faux" Predicates</a></h3>
<p>One of the most important components in database implementations, and usually the one that makes one database better than the other, is <strong>the query optimizer</strong>.</p>
<p>The query optimizer looks at your SQL and generates an <strong>execution plan</strong>. The execution plan describes how the database is going to access the data necessary to satisfy the query. For example, the optimizer decides whether to use a specific index or not, in what order to execute a join, which table to filter first, and so on.</p>
<p>To generate a good execution plan, the optimizer utilizes metadata and statistics it has on your data. For example, if you apply a filter on a column with a unique constraint, the optimizer knows it can expect exactly one row for each value. In this case, it might conclude that it makes more sense to use an index rather than scan the entire table.</p>
<p>In some circumstances, you have knowledge of your data that the optimizer does not have, or cannot have. You might be able to improve the performance of a query by providing additional information to the optimizer, using what I like to call, a <strong>"Faux Predicate"</strong>.</p>
<p>Take this query for example:</p>
<div class="dont">
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sale</span>
<span class="hll"><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">modified</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="s1">'2019-01-01 asia/tel_aviv'</span>
</span><span class="p">);</span>
<span class="go"> QUERY PLAN</span>
<span class="go">----------------------------------------------------------------------------------</span>
<span class="go"> Seq Scan on sale (cost=0.00..510225.35 rows=1357 width=276)</span>
<span class="go"> Filter: (modified < '2019-01-01 00:00:00+02'::timestamp with time zone)</span>
</pre></div>
</div>
<p>The query fetches sales that were modified before 2019. There is no index on this field, so the optimizer generates an execution plan to scan the entire table.</p>
<p>Let's say you have another field in this table with the time the sale was created. Since it's not possible for a sale to be modified before it was created, adding a similar condition on the <code>created</code> field won't change the result of the query. However, the optimizer might use this information to generate a better execution plan:</p>
<div class="do">
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sale</span>
<span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">modified</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="s1">'2019-01-01 asia/tel_aviv'</span>
<span class="hll"><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">created</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="s1">'2019-01-01 asia/tel_aviv'</span>
</span><span class="p">);</span>
<span class="go"> QUERY PLAN</span>
<span class="go">----------------------------------------------------------------------------------</span>
<span class="go">Index Scan using sale_created_ix on sale (cost=0.44..4.52 rows=1 width=276)</span>
<span class="hll"><span class="go"> Index Cond: (created < '2019-01-01 00:00:00+02'::timestamp with time zone)</span>
</span><span class="go"> Filter: (modified < '2019-01-01 00:00:00+02'::timestamp with time zone)</span>
</pre></div>
</div>
<p>After we added the "Faux Predicate" the optimizer decided to use the index on the <code>created</code> field, and the query got much faster! Note that the previous predicate on the <code>modified</code> field is still being evaluated, but it's now being applied to far fewer rows.</p>
<p><strong>A "Faux Predicate" should not change the result of the query</strong>. It should only be used to provide more information to the optimizer that can improve the query performance. Keep in mind that the database has to evaluate all the predicates, so adding <em>too many</em> might make a query slower.</p>
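<p>Both properties of a faux predicate, the same result and a different plan, can be checked in a small experiment. The sketch below is an approximation using Python's built-in <code>sqlite3</code> module rather than PostgreSQL. The three rows are made up, and they respect the invariant the tip relies on: a sale is never modified before it was created.</p>

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL, and only `created` is indexed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sale (id INTEGER PRIMARY KEY, created TEXT, modified TEXT);
    CREATE INDEX sale_created_ix ON sale (created);
    INSERT INTO sale (created, modified) VALUES
        ('2018-03-01', '2018-03-05'),
        ('2018-11-20', '2019-02-01'),
        ('2019-04-10', '2019-04-11');
""")

original = "SELECT * FROM sale WHERE modified < '2019-01-01'"
# The faux predicate: implied by the data, so it cannot change the result.
with_faux = original + " AND created < '2019-01-01'"


def query_plan(sql):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


def result(sql):
    """Return the rows of a query in a stable order for comparison."""
    return sorted(conn.execute(sql).fetchall())


plan_original = query_plan(original)
plan_with_faux = query_plan(with_faux)

print(plan_original)
print(plan_with_faux)
print(result(original) == result(with_faux))
```

<p>Comparing the results of both queries, as done in the last line, is a cheap sanity check to run before relying on a faux predicate in a real report.</p>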
<hr>
<h3 id="inline-cte"><a class="toclink" href="#inline-cte">Inline CTE*</a></h3>
<p>Before PostgreSQL 12, Common Table Expressions (aka CTE) were always materialized. This changed in PostgreSQL 12, where a CTE that is non-recursive, has no side effects, and is referenced only once is inlined by default and treated like a sub-query.</p>
<p>In versions prior to PostgreSQL 12, when CTEs are used incorrectly they can cause increased memory usage and degraded performance:</p>
<div class="dont">
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="n">cte</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sale</span>
<span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">cte</span>
<span class="k">WHERE</span><span class="w"> </span><span class="n">created_by_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span>
<span class="go"> QUERY PLAN</span>
<span class="go">---------------------------------------------------------------------------</span>
<span class="go"> CTE Scan on cte (cost=442906.19..748632.12 rows=67939 width=1148)</span>
<span class="go"> Filter: (created_by_id = 1)</span>
<span class="go"> CTE cte</span>
<span class="go"> -> Seq Scan on sale (cost=0.00..442906.19 rows=1999999999 width=276)</span>
<span class="go">(4 rows)</span>
</pre></div>
</div>
<p>The overall cost of the execution plan seems very high. This is because the database first materialized the result of the common table expression, and only then applied the predicate. The database was unable to utilize the index on the field, and the query ended up being inefficient.</p>
<p>For better performance, inline the CTE (or upgrade to PostgreSQL 12):</p>
<div class="do">
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sale</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">inlined</span>
<span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">created_by_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">1</span>
<span class="p">);</span>
<span class="go"> QUERY PLAN</span>
<span class="go">-------------------------------------------------------------------------------------</span>
<span class="go"> Index Scan using sale_created_by_ix on sale (cost=0.43..714.70 rows=277 width=276)</span>
<span class="go"> Index Cond: (created_by_id = 1)</span>
</pre></div>
</div>
<p>For more about CTE in PostgreSQL and how it affects a query execution plan check out <a href="/be-careful-with-cte-in-postgre-sql">Be Careful With CTE in PostgreSQL</a>.</p>
<hr>
<h3 id="fetch-only-what-you-need"><a class="toclink" href="#fetch-only-what-you-need">Fetch Only What You Need!</a></h3>
<p>Databases are really good at storing and retrieving data. Other applications, not so much. If you fetch data to Excel, SAS, R, Pandas or any other reporting tool - it's best to fetch only what you need.</p>
<p>For example, you sometimes want to get a sense of the data and you might do this:</p>
<div class="dont">
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sale</span><span class="p">;</span>
<span class="go"> QUERY PLAN</span>
<span class="go">----------------------------------------------------------------------------------</span>
<span class="go"> Seq Scan on sale (cost=0.00..442374.57 rows=13570157 width=276)</span>
</pre></div>
</div>
<p>This query will fetch the entire table and will most likely cause unnecessary load if you only need several rows.</p>
<p>Some client applications will automatically fetch the data in pages or limit the result set, but to be on the safe side, it's best to set <code>LIMIT</code> yourself:</p>
<div class="do">
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">sale</span>
<span class="hll"><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mf">10</span>
</span><span class="p">);</span>
<span class="go"> QUERY PLAN</span>
<span class="go">----------------------------------------------------------------------------------</span>
<span class="go"> Limit (cost=0.00..0.33 rows=10 width=276)</span>
<span class="go"> -> Seq Scan on sale (cost=0.00..442374.57 rows=13570157 width=276)</span>
</pre></div>
</div>
<p>Another common case where unnecessary data is fetched from the database is when a user that is less familiar with SQL is fetching data into some other tool such as Excel or Pandas, only to immediately apply some filter or aggregation to it. This can usually be solved by some sort of training.</p>
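<p>To make the difference concrete, here is a small sketch of both approaches using Python's built-in <code>sqlite3</code> module. The table, the <code>sold_by</code> column and the data are hypothetical, and SQLite is only a stand-in for a real database server, where the cost of shipping every row to the client is much higher:</p>

```python
import sqlite3

# Assumption: a tiny in-memory table stands in for a large `sale` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale (id INTEGER PRIMARY KEY, sold_by TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO sale (sold_by, amount) VALUES (?, ?)",
    [("haki", 10), ("haki", 15), ("dana", 7), ("dana", 3), ("omer", 20)],
)

# Wasteful: pull every row into the client, then filter and aggregate there.
all_rows = conn.execute("SELECT * FROM sale").fetchall()
client_side_total = sum(
    amount for _, sold_by, amount in all_rows if sold_by == "haki"
)

# Better: let the database filter and aggregate, and fetch a single row.
(db_side_total,) = conn.execute(
    "SELECT SUM(amount) FROM sale WHERE sold_by = ?", ("haki",)
).fetchone()

# To get a sense of the data, ask for a handful of rows only.
sample = conn.execute("SELECT * FROM sale LIMIT 2").fetchall()

print(len(all_rows), client_side_total)
print(db_side_total)
print(sample)
```

<p>Both totals are the same, but the second approach transfers one row instead of the entire table.</p>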
<hr>
<h3 id="reference-column-position-in-group-by-and-order-by"><a class="toclink" href="#reference-column-position-in-group-by-and-order-by">Reference Column Position in GROUP BY and ORDER BY</a></h3>
<p>A nice feature in PostgreSQL is that columns can be referenced in GROUP BY and ORDER BY by their position in the SELECT clause:</p>
<div class="do">
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">first_name</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="s1">' '</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">last_name</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">full_name</span><span class="p">,</span>
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">sales</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">sale</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="hll"><span class="w"> </span><span class="mi">1</span>
</span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="hll"><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="k">DESC</span>
</span></pre></div>
</div>
<p>The <code>GROUP BY</code> clause references the expression in the first position in the <code>SELECT</code> clause, <code>full_name</code>. The <code>ORDER BY</code> clause references the second expression, the <code>sales</code> count. By referencing the position we avoided repeating the expression. Aside from saving a few keystrokes, if the expression changes in the future we can edit it in only one place.</p>
<p>I realize this tip can be controversial because the column position in the <code>SELECT</code> clause has no significance and might itself change when the query is edited. However, I found that it improves productivity when writing ad-hoc queries.</p>
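<p>A quick way to convince yourself that the positional form is only a shorthand is to run it next to the spelled-out version. The sketch below uses Python's built-in <code>sqlite3</code> module, which also accepts column positions in <code>GROUP BY</code> and <code>ORDER BY</code>. The rows are made up for the illustration:</p>

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the names are sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?)",
    [("Ada", "Lovelace"), ("Ada", "Lovelace"), ("Alan", "Turing")],
)

# Columns referenced by their position in the SELECT clause.
by_position = conn.execute("""
    SELECT first_name || ' ' || last_name AS full_name, COUNT(*) AS sales
    FROM sale
    GROUP BY 1
    ORDER BY 2 DESC
""").fetchall()

# The same query with every expression repeated in full.
by_expression = conn.execute("""
    SELECT first_name || ' ' || last_name AS full_name, COUNT(*) AS sales
    FROM sale
    GROUP BY first_name || ' ' || last_name
    ORDER BY COUNT(*) DESC
""").fetchall()

print(by_position)
print(by_expression)
```

<p>The two result sets are identical. In the second query a change to the name expression has to be made in two places.</p>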
<hr>
<h3 id="format-your-query"><a class="toclink" href="#format-your-query">Format Your Query</a></h3>
<p>Readability counts. Pick whatever style you and your team feel most comfortable with, and stick with it.</p>
<p>When I got started years ago I wrote queries like this:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="n">col1</span><span class="p">,</span><span class="w"> </span><span class="n">col2</span><span class="p">,</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">col3</span><span class="p">)</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">t1</span>
<span class="k">JOIN</span><span class="w"> </span><span class="n">t2</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">t1</span><span class="p">.</span><span class="n">pk</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">t2</span><span class="p">.</span><span class="n">fk</span>
<span class="k">WHERE</span><span class="w"> </span><span class="n">col1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col2</span>
<span class="k">AND</span><span class="w"> </span><span class="n">col3</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">col4</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">col1</span><span class="p">,</span><span class="w"> </span><span class="n">col2</span>
<span class="k">HAVING</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">col3</span><span class="p">)</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">1</span>
</pre></div>
<p>I started this way because this is roughly the format Oracle used in their documentation.</p>
<p>Over the years, I encountered many different styles. For example:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">col1</span><span class="p">,</span><span class="w"> </span><span class="n">col2</span><span class="p">,</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">col3</span><span class="p">)</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">t1</span>
<span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">t2</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">t1</span><span class="p">.</span><span class="n">pk</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">t2</span><span class="p">.</span><span class="n">fk</span>
<span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">col1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col2</span>
<span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">col3</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">col4</span>
<span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">col1</span><span class="p">,</span><span class="w"> </span><span class="n">col2</span>
<span class="w"> </span><span class="k">HAVING</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">col3</span><span class="p">)</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">1</span>
</pre></div>
<p>I can't think of a reason anyone would write like this. It's exhausting to format manually (but it does look good...)</p>
<p>Nowadays, my team and I use this format:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">col1</span><span class="p">,</span>
<span class="w"> </span><span class="n">col2</span><span class="p">,</span>
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">col3</span><span class="p">)</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">t1</span>
<span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">t2</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">t1</span><span class="p">.</span><span class="n">pk</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">t2</span><span class="p">.</span><span class="n">fk</span>
<span class="k">WHERE</span>
<span class="w"> </span><span class="n">col1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col2</span>
<span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">col3</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">col4</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">col1</span><span class="p">,</span>
<span class="w"> </span><span class="n">col2</span>
<span class="k">HAVING</span>
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">col3</span><span class="p">)</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">1</span>
</pre></div>
<p>It's readable, it's flexible, and most importantly, <strong>it plays very nicely with <code>git diff</code></strong> which makes code reviews easier.</p>
<hr>
<h2 id="conclusion"><a class="toclink" href="#conclusion">Conclusion</a></h2>
<p>Applying the tips above in our day-to-day helps us sustain a healthy database with very little waste. We found that educating developers and non-developers about how to write better SQL can go a long way.</p>
<p>If you have any more SQL tips I might have missed, let me know and I'll be happy to add them here.</p>
<hr>
<p><em>UPDATES</em></p>
<ul>
<li>2019-11-22: Fixed the examples in the "Faux Predicate" section after several keen-eyed readers noticed it was backwards.</li>
</ul>Preventing SQL Injection Attacks With Python2019-10-01T00:00:00+03:002019-10-01T00:00:00+03:00Haki Benitatag:hakibenita.com,2019-10-01:/python-sql-injection<p>SQL injection attacks are constantly ranked among the most common attacks against systems. While binding <em>values</em> is very common, I often find myself needing to bind table and column names as well. This article will walk you through everything you need to know about SQL injections in Python.</p><hr>
<p>SQL injection attacks are constantly ranked among the most common attacks against systems. For this reason, ORMs offer many ways of dealing with injections. A common solution is bind variables, placeholders in the query that are sanitized by the ORM for safe execution in the database.</p>
<p>However, while binding <em>values</em> is very common, I often find myself needing to use table and column names as variables as well. A stroll through <code>psycopg2</code>'s documentation led me to the discovery of <code>psycopg2.sql.Identifier</code> and <code>psycopg2.sql.Literal</code>, two low-level classes for safely binding any type of variable in a query.</p>
<p>This discovery sparked my fourth article for <a href="https://realpython.com" rel="noopener">RealPython</a>, "Preventing SQL Injection Attacks With Python". If you're not sure what SQL injection is, this article will walk you through everything you need to know. If you are an ORM veteran, check your knowledge and get yourself familiar with the low-level <code>psycopg2.sql</code> module.</p>
<p><a href="https://realpython.com/prevent-python-sql-injection/" rel="noopener"><strong>Read "Preventing SQL Injection Attacks With Python" on RealPython »</strong></a></p>
<figure><img alt="Preventing SQL Injection Attacks With Python" src="https://hakibenita.com/images/00-prevent-python-sql-injection.png"><figcaption>Preventing SQL Injection Attacks With Python</figcaption>
</figure>How "Export to Excel" Almost Killed Our System2019-09-17T00:00:00+03:002019-09-17T00:00:00+03:00Haki Benitatag:hakibenita.com,2019-09-17:/python-django-optimizing-excel-export<p>Inspired by an actual incident we had in one of our systems caused by an "Export to Excel" functionality implemented in Python, we go through the process of identifying the problem, experimenting and benchmarking different solutions.</p><hr>
<p>A few weeks ago we had some trouble with an "Export to Excel" functionality in one of our systems. In the process of resolving this issue, we made some interesting discoveries and came up with original solutions.</p>
<p>This article is inspired by the actual issue we used to track this incident over a period of two days. We go through the process of identifying the problem, experimenting and benchmarking different solutions until eventually deploying to production.</p>
<p>These are the main takeaways described in this article:</p>
<ul>
<li>Generating xlsx files can consume a significant amount of resources.</li>
<li>Under some circumstances better performance can be gained by not using <code>prefetch_related</code>.</li>
<li><code>pyexcelerate</code> is a fast package for creating simple Excel files.</li>
<li><code>tablib</code> (and <code>django-import-export</code>) can be patched to use <code>pyexcelerate</code> and produce excel files faster.</li>
</ul>
<p><details class="toc-container">
<summary>Table of Contents</summary></p>
<div class="toc">
<ul>
<li><a href="#exporting-a-queryset-to-excel">Exporting a QuerySet to Excel</a><ul>
<li><a href="#using-django-import-export">Using django-import-export</a></li>
<li><a href="#finding-the-best-file-format">Finding the Best File Format</a></li>
</ul>
</li>
<li><a href="#improving-the-query">Improving the Query</a><ul>
<li><a href="#replacing-prefetch_related-with-subquery-and-outerref">Replacing prefetch_related with Subquery and OuterRef</a></li>
<li><a href="#using-an-iterator">Using an Iterator</a></li>
<li><a href="#simplifying-the-query">Simplifying the Query</a></li>
<li><a href="#manual-prefetch">Manual Prefetch</a></li>
<li><a href="#trouble-in-paradise">Trouble in Paradise</a></li>
</ul>
</li>
<li><a href="#using-a-different-excel-writer">Using a Different Excel Writer</a><ul>
<li><a href="#a-faster-excel-writer-in-python">A Faster Excel Writer in Python</a></li>
<li><a href="#patching-tablib">Patching tablib</a></li>
</ul>
</li>
<li><a href="#results-summary">Results Summary</a></li>
<li><a href="#seifa">Seifa</a></li>
<li><a href="#comments-from-readers">Comments From Readers</a></li>
</ul>
</div>
<p></details></p>
<figure><img alt="How a server must feel when asked to produce an Excel file" src="https://hakibenita.com/images/00-python-django-optimizing-excel-export.jpg"><figcaption>How a server must feel when asked to produce an Excel file</figcaption>
</figure>
<hr>
<p>A few weeks ago we started getting complaints from users about slow response time from one of our systems. A quick glance at the server metrics showed higher than normal CPU usage. This system is mostly IO intensive, so high CPU usage is not something we experience regularly.</p>
<p>The first thing we did was to identify the worker process that is consuming high CPU using <code>htop</code>. After getting the process identifier (PID) of the process, we used <a href="https://github.com/benfred/py-spy" rel="noopener">py-spy</a> to get a glance at what it's doing:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>py-spy<span class="w"> </span>-p<span class="w"> </span><span class="m">8187</span><span class="w"> </span>-r<span class="w"> </span><span class="m">1000</span>
</pre></div>
<p>This command samples the process 1,000 times per second and provides a <code>top</code>-like view of the results:</p>
<div class="highlight"><pre><span></span><span class="go">Total Samples 17974</span>
<span class="go">GIL: 0.00%, Active: 0.00%, Threads: 1</span>
<span class="go">OwnTime TotalTime Function (filename:line)</span>
<span class="go">0.000s 173.7s get_response (django/core/handlers/base.py:75)</span>
<span class="go">0.000s 173.7s inner (django/core/handlers/exception.py:34)</span>
<span class="go">0.000s 173.7s __call__ (django/utils/deprecation.py:94)</span>
<span class="go">0.000s 173.7s __call__ (django/core/handlers/wsgi.py:141)</span>
<span class="go">0.000s 173.6s view (django/views/generic/base.py:71)</span>
<span class="go">0.000s 173.6s _get_response (django/core/handlers/base.py:113)</span>
<span class="go">0.000s 173.6s dispatch (django/contrib/auth/mixins.py:52)</span>
<span class="go">0.000s 173.6s dispatch (django/contrib/auth/mixins.py:109)</span>
<span class="go">0.000s 173.6s dispatch (django/views/generic/base.py:97)</span>
<span class="hll"><span class="go">0.050s 173.6s get (dashboard/views/list_views.py:100)</span>
</span><span class="hll"><span class="go">0.000s 94.69s get_resource_to_export (dashboard/views/list_views.py:70)</span>
</span><span class="hll"><span class="go">0.000s 94.69s export (dashboard/views/list_views.py:73)</span>
</span><span class="go">0.000s 94.68s export (dashboard/resources.py:215)</span>
<span class="go">0.000s 83.81s __iter__ (django/db/models/query.py:274)</span>
<span class="go">0.040s 82.73s _fetch_all (django/db/models/query.py:1242)</span>
<span class="go">0.000s 78.84s export (dashboard/views/list_views.py:74)</span>
<span class="go">0.000s 70.58s __iter__ (django/db/models/query.py:55)</span>
<span class="go">0.000s 68.98s execute_sql (django/db/models/sql/compiler.py:1100)</span>
<span class="go">68.81s 68.81s _execute (django/db/backends/utils.py:84)</span>
<span class="go">0.000s 68.81s _execute_with_wrappers (django/db/backends/utils.py:76)</span>
<span class="go">0.000s 68.81s execute (django/db/backends/utils.py:67)</span>
<span class="hll"><span class="go">0.000s 50.11s save (tablib/packages/openpyxl3/workbook.py:186)</span>
</span><span class="hll"><span class="go">0.000s 50.11s export_set (tablib/formats/_xlsx.py:46)</span>
</span><span class="hll"><span class="go">0.000s 46.41s save (tablib/packages/openpyxl3/writer/excel.py:124)</span>
</span><span class="hll"><span class="go">0.000s 46.41s save_workbook (tablib/packages/openpyxl3/writer/excel.py:141)</span>
</span><span class="go">0.000s 42.40s _fetch_all (django/db/models/query.py:1244)</span>
<span class="go">0.000s 42.40s _prefetch_related_objects (django/db/models/query.py:771)</span>
<span class="go">0.000s 42.38s prefetch_related_objects (django/db/models/query.py:1625)</span>
<span class="go">0.000s 41.94s prefetch_one_level (django/db/models/query.py:1738)</span>
<span class="go">0.000s 41.25s get_prefetch_queryset (django/db/models/fields/related_descriptors.py:627)</span>
<span class="go">0.000s 32.30s _write_worksheets (tablib/packages/openpyxl3/writer/excel.py:91)</span>
</pre></div>
<p>After monitoring this view for a minute or two, we had a few insights:</p>
<ol>
<li>A lot of time is spent fetching data.</li>
<li>A lot of time is spent on <em>some</em> call to <code>prefetch_related</code>.</li>
<li>The problem is in the dashboard, and more specifically in the view that exports data.</li>
</ol>
<p>With these insights, we wanted to move on and identify the exact view. We turned to the <a href="https://www.nginx.com/" rel="noopener">nginx</a> access log:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>journalctl<span class="w"> </span>-u<span class="w"> </span>nginx<span class="w"> </span>-r<span class="w"> </span><span class="p">|</span><span class="w"> </span>grep<span class="w"> </span>dashboard
</pre></div>
<p>We managed to identify several endpoints that were taking very long to execute. Some of them finished in just under 60 seconds, others were killed by PostgreSQL after hitting the <a href="https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-STATEMENT-TIMEOUT" rel="noopener"><code>statement_timeout</code> limit</a> and returned a 500 status code.</p>
<p>At this point we had a pretty good idea where the problem is, but we were still clueless as to why. The next step was to inspect the problematic code, and try to reproduce.</p>
<h2 id="exporting-a-queryset-to-excel"><a class="toclink" href="#exporting-a-queryset-to-excel">Exporting a QuerySet to Excel</a></h2>
<p>The system is used to report and track violations in public transportation. During an inspection, the inspector documents different types of violations, such as a dirty bus or a bus running late. The models for this system look roughly like this:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>

<span class="k">class</span> <span class="nc">ViolationType</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">name</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">TextField</span><span class="p">()</span>

<span class="k">class</span> <span class="nc">Inspection</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="nb">id</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">AutoField</span><span class="p">(</span><span class="n">primary_key</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">Violation</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">inspection</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="n">Inspection</span><span class="p">,</span> <span class="n">on_delete</span><span class="o">=</span><span class="n">models</span><span class="o">.</span><span class="n">PROTECT</span><span class="p">)</span>
    <span class="n">violation_type</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="n">ViolationType</span><span class="p">,</span> <span class="n">on_delete</span><span class="o">=</span><span class="n">models</span><span class="o">.</span><span class="n">PROTECT</span><span class="p">)</span>
    <span class="n">comments</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">TextField</span><span class="p">()</span>
</pre></div>
<p>Every once in a while, a back office user would download the inspection information to Excel for further analysis.</p>
<figure><img alt="Users + Excel = Love" src="https://hakibenita.com/images/01-python-django-optimizing-excel-export.jpg"><figcaption>Users + Excel = Love</figcaption>
</figure>
<p>The report includes a lot of information about the inspection, but most importantly, it includes a list of the violation types for each inspection:</p>
<div class="highlight"><pre><span></span>inspection, violations
1, dirty floors | full trash can
2, full trash can | no light | missing signs
</pre></div>
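<p>To make the shape of the report concrete, here is a minimal sketch in plain Python that groups violation type names by inspection and joins them into a single cell. The sample data, the <code>build_report_rows</code> helper and the <code>" | "</code> separator are assumptions made for illustration; they are not taken from the actual system.</p>

```python
from collections import defaultdict

# Hypothetical sample data: (inspection_id, violation type name) pairs.
violations = [
    (1, "dirty floors"),
    (1, "full trash can"),
    (2, "full trash can"),
    (2, "no light"),
    (2, "missing signs"),
]


def build_report_rows(pairs):
    """Group violation type names by inspection and join them with ' | '."""
    names_by_inspection = defaultdict(list)
    for inspection_id, name in pairs:
        names_by_inspection[inspection_id].append(name)
    return [
        (inspection_id, " | ".join(names))
        for inspection_id, names in sorted(names_by_inspection.items())
    ]


rows = build_report_rows(violations)
for inspection_id, names in rows:
    print(f"{inspection_id}, {names}")
```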
<h3 id="using-django-import-export"><a class="toclink" href="#using-django-import-export">Using <code>django-import-export</code></a></h3>
<p>To produce the Excel report we use a package called <a href="https://github.com/django-import-export/django-import-export" rel="noopener"><code>django-import-export</code></a>. Using the package, we define a <a href="https://django-import-export.readthedocs.io/en/latest/api_resources.html#modelresource" rel="noopener"><code>ModelResource</code></a> that can produce an Excel file from a queryset:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">import_export</span> <span class="kn">import</span> <span class="n">resources</span><span class="p">,</span> <span class="n">fields</span><span class="p">,</span> <span class="n">widgets</span>
<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">Inspection</span><span class="p">,</span> <span class="n">Violation</span>

<span class="k">class</span> <span class="nc">InspectionResource</span><span class="p">(</span><span class="n">resources</span><span class="o">.</span><span class="n">ModelResource</span><span class="p">):</span>
    <span class="n">violations</span> <span class="o">=</span> <span class="n">fields</span><span class="o">.</span><span class="n">Field</span><span class="p">(</span>
        <span class="n">widget</span><span class="o">=</span><span class="n">widgets</span><span class="o">.</span><span class="n">ManyToManyWidget</span><span class="p">(</span><span class="n">Violation</span><span class="p">,</span> <span class="n">field</span><span class="o">=</span><span class="s1">'violation_type'</span><span class="p">)</span>
    <span class="p">)</span>

    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
        <span class="n">model</span> <span class="o">=</span> <span class="n">Inspection</span>
        <span class="n">fields</span> <span class="o">=</span> <span class="p">(</span>
            <span class="s1">'id'</span><span class="p">,</span>
            <span class="s1">'violations'</span><span class="p">,</span>
        <span class="p">)</span>
</pre></div>
<p>The query produced by this <code>ModelResource</code> causes an <a href="/things-you-must-know-about-django-admin-as-your-app-gets-bigger#the-n1-problem">N+1 queries issue</a>, so before we ever deployed it to production we patched it and added <a href="https://docs.djangoproject.com/en/2.2/ref/models/querysets/#prefetch-related" rel="noopener"><code>prefetch_related</code></a>:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Prefetch</span>
<span class="kn">from</span> <span class="nn">import_export</span> <span class="kn">import</span> <span class="n">resources</span><span class="p">,</span> <span class="n">fields</span><span class="p">,</span> <span class="n">widgets</span>
<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">Inspection</span><span class="p">,</span> <span class="n">Violation</span>

<span class="k">class</span> <span class="nc">InspectionResource</span><span class="p">(</span><span class="n">resources</span><span class="o">.</span><span class="n">ModelResource</span><span class="p">):</span>
    <span class="n">violations</span> <span class="o">=</span> <span class="n">fields</span><span class="o">.</span><span class="n">Field</span><span class="p">(</span>
        <span class="n">widget</span><span class="o">=</span><span class="n">widgets</span><span class="o">.</span><span class="n">ManyToManyWidget</span><span class="p">(</span><span class="n">Violation</span><span class="p">,</span> <span class="n">field</span><span class="o">=</span><span class="s1">'violation_type'</span><span class="p">)</span>
    <span class="p">)</span>

    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
        <span class="n">model</span> <span class="o">=</span> <span class="n">Inspection</span>
        <span class="n">fields</span> <span class="o">=</span> <span class="p">(</span>
            <span class="s1">'id'</span><span class="p">,</span>
            <span class="s1">'violations'</span><span class="p">,</span>
        <span class="p">)</span>

    <span class="k">def</span> <span class="nf">export</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">queryset</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="n">queryset</span> <span class="o">=</span> <span class="p">(</span>
<span class="hll">            <span class="n">queryset</span>
</span><span class="hll">            <span class="o">.</span><span class="n">prefetch_related</span><span class="p">(</span><span class="n">Prefetch</span><span class="p">(</span>
</span><span class="hll">                <span class="s1">'violations'</span><span class="p">,</span>
</span><span class="hll">                <span class="n">queryset</span><span class="o">=</span><span class="n">Violation</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">select_related</span><span class="p">(</span><span class="s1">'violation_type'</span><span class="p">),</span>
</span><span class="hll">                <span class="n">to_attr</span><span class="o">=</span><span class="s1">'prefetched_violations'</span><span class="p">,</span>
</span><span class="hll">            <span class="p">))</span>
</span>        <span class="p">)</span>
        <span class="k">return</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">export</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">queryset</span><span class="p">))</span>

    <span class="k">def</span> <span class="nf">dehydrate_violations</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">inspection</span><span class="p">:</span> <span class="n">Inspection</span><span class="p">)</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
        <span class="k">return</span> <span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span>
            <span class="n">v</span><span class="o">.</span><span class="n">violation_type</span><span class="o">.</span><span class="n">name</span>
            <span class="k">for</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">inspection</span><span class="o">.</span><span class="n">prefetched_violations</span>
        <span class="p">)</span>
</pre></div>
<p>To use <code>prefetch_related</code> in a <code>ModelResource</code> we had to make the following changes:</p>
<ol>
<li>
<p>Override <code>export</code> and adjust the query to prefetch the violations using <code>prefetch_related</code>. We use the <a href="https://docs.djangoproject.com/en/2.2/ref/models/querysets/#django.db.models.Prefetch" rel="noopener"><code>Prefetch</code> object</a> because we needed to customize the prefetch query, and add the violation type name from a related table.</p>
</li>
<li>
<p>Evaluate the query and have the export function return a list instead of a queryset. <code>django-import-export</code> uses <code>iterator</code> to speed up the query. Using <a href="https://docs.djangoproject.com/en/2.2/ref/models/querysets/#iterator" rel="noopener"><code>iterator()</code></a>, the ORM uses a cursor to iterate over the data in chunks and reduce memory. While this is usually useful, <code>iterator()</code> ignores <code>prefetch_related</code> in the Django version documented here (2.2), so the two cannot be combined.</p>
</li>
<li>
<p>Add a custom <code>dehydrate_</code> function for the violations field that will render a newline-delimited list of violation type names.</p>
</li>
</ol>
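<p>The difference between the N+1 pattern and a prefetch can be sketched without Django at all. The example below uses the standard library's <code>sqlite3</code> module with a tiny, made-up dataset; the table layout loosely mirrors the models above, and the <code>export_n_plus_one</code> and <code>export_prefetch</code> helpers are hypothetical names used only for illustration. It is a rough approximation of what the ORM does, not its actual implementation.</p>

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inspection (id INTEGER PRIMARY KEY);
    CREATE TABLE violation_type (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE violation (
        id INTEGER PRIMARY KEY,
        inspection_id INTEGER REFERENCES inspection (id),
        violation_type_id INTEGER REFERENCES violation_type (id)
    );
    INSERT INTO inspection (id) VALUES (1), (2);
    INSERT INTO violation_type (id, name) VALUES
        (1, 'dirty floors'), (2, 'full trash can'), (3, 'no light');
    INSERT INTO violation (inspection_id, violation_type_id) VALUES
        (1, 1), (1, 2), (2, 2), (2, 3);
""")

VIOLATIONS_SQL = """
    SELECT v.inspection_id, t.name
    FROM violation v
    JOIN violation_type t ON t.id = v.violation_type_id
    WHERE v.inspection_id IN ({placeholders})
    ORDER BY v.id
"""


def inspection_ids(conn):
    return [row[0] for row in conn.execute("SELECT id FROM inspection ORDER BY id")]


def export_n_plus_one(conn):
    """One query for the inspections, plus one query per inspection."""
    ids = inspection_ids(conn)
    queries = 1
    result = {}
    for inspection_id in ids:
        sql = VIOLATIONS_SQL.format(placeholders="?")
        result[inspection_id] = [name for _, name in conn.execute(sql, (inspection_id,))]
        queries += 1
    return result, queries


def export_prefetch(conn):
    """One query for the inspections, plus a single query for all violations."""
    ids = inspection_ids(conn)
    queries = 1
    sql = VIOLATIONS_SQL.format(placeholders=", ".join("?" for _ in ids))
    names_by_inspection = defaultdict(list)
    for inspection_id, name in conn.execute(sql, ids):
        names_by_inspection[inspection_id].append(name)
    queries += 1
    return {i: names_by_inspection[i] for i in ids}, queries


slow, slow_queries = export_n_plus_one(conn)
fast, fast_queries = export_prefetch(conn)
print(slow_queries, fast_queries)
```

<p>Both functions return the same data. The first issues one extra query per inspection, while the second always issues two queries regardless of how many inspections are exported.</p>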
<div class="admonition tip">
<p class="admonition-title">Prefetch Related</p>
<p>This is <a href="all-you-need-to-know-about-prefetching-in-django">all you need to know about prefetching in Django</a></p>
</div>
<p>The resource was used by the view to produce the Excel report:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.http</span> <span class="kn">import</span> <span class="n">HttpRequest</span><span class="p">,</span> <span class="n">HttpResponse</span>
<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">Inspection</span>
<span class="kn">from</span> <span class="nn">.resources</span> <span class="kn">import</span> <span class="n">InspectionResource</span>

<span class="n">LIMIT</span> <span class="o">=</span> <span class="mi">10000</span>

<span class="k">def</span> <span class="nf">export_to_excel</span><span class="p">(</span><span class="n">request</span><span class="p">:</span> <span class="n">HttpRequest</span><span class="p">)</span> <span class="o">-></span> <span class="n">HttpResponse</span><span class="p">:</span>
    <span class="n">inspections</span> <span class="o">=</span> <span class="n">Inspection</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">()</span>
    <span class="c1"># Apply some filter on the queryset based on request</span>
    <span class="n">data</span> <span class="o">=</span> <span class="n">InspectionResource</span><span class="p">()</span><span class="o">.</span><span class="n">export</span><span class="p">(</span><span class="n">inspections</span><span class="p">[:</span><span class="n">LIMIT</span><span class="p">])</span><span class="o">.</span><span class="n">xlsx</span>
    <span class="n">response</span> <span class="o">=</span> <span class="n">HttpResponse</span><span class="p">(</span>
        <span class="n">data</span><span class="p">,</span>
        <span class="n">content_type</span><span class="o">=</span><span class="s1">'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="n">response</span><span class="p">[</span><span class="s1">'Content-Disposition'</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'attachment; filename=export.xlsx'</span>
    <span class="k">return</span> <span class="n">response</span>
</pre></div>
<p>The view takes a request, applies some filters to the inspections and produces the xlsx file using the <code>ModelResource</code>.</p>
<h3 id="finding-the-best-file-format"><a class="toclink" href="#finding-the-best-file-format">Finding the Best File Format</a></h3>
<p>Before we can start improving the export process, we need to establish a baseline. To get the timings and identify the hot spots in the call stack we used <a href="https://docs.python.org/3.7/library/profile.html" rel="noopener"><code>cProfile</code></a>. To identify and time query execution we turned SQL logging on in the Django settings:</p>
<div class="highlight"><pre><span></span><span class="c1"># settings.py</span>
<span class="n">LOGGING</span> <span class="o">=</span> <span class="p">{</span>
    <span class="c1"># ...</span>
    <span class="s1">'loggers'</span><span class="p">:</span> <span class="p">{</span>
        <span class="s1">'django.db.backends'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'level'</span><span class="p">:</span> <span class="s1">'DEBUG'</span><span class="p">,</span>
        <span class="p">},</span>
        <span class="c1"># ...</span>
    <span class="p">},</span>
<span class="p">}</span>
</pre></div>
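<p>For reference, a fuller version of such a configuration might look like the sketch below. The <code>console</code> handler and the overall layout are assumptions, not the project's actual settings. The call to <code>logging.config.dictConfig</code> is only there to check that the dictionary is valid; Django applies the <code>LOGGING</code> setting on its own. Keep in mind that Django only logs SQL statements when <code>DEBUG</code> is <code>True</code>.</p>

```python
import logging
import logging.config

# Hypothetical, minimal logging configuration for printing SQL to the console.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        # Assumed handler name; any configured handler would do.
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        "django.db.backends": {
            "handlers": ["console"],
            "level": "DEBUG",
        },
    },
}

# Outside of Django we apply the configuration manually to validate it.
logging.config.dictConfig(LOGGING)
sql_logger = logging.getLogger("django.db.backends")
print(sql_logger.level == logging.DEBUG)
```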
<p>The benchmark looked like this:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">cProfile</span>
<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">Inspection</span>
<span class="kn">from</span> <span class="nn">.resources</span> <span class="kn">import</span> <span class="n">InspectionResource</span>
<span class="n">qs</span> <span class="o">=</span> <span class="n">Inspection</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">()[:</span><span class="mi">10000</span><span class="p">]</span>
<span class="n">cProfile</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="s1">'InspectionResource().export(qs).xlsx'</span><span class="p">)</span>
</pre></div>
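<p>To dig into the hot spots rather than just the total time, the profiler output can be sorted with <code>pstats</code>. The sketch below profiles a made-up <code>export_rows</code> function that stands in for the real export call, so it runs without a Django project; only the <code>cProfile</code> and <code>pstats</code> usage is the point here.</p>

```python
import cProfile
import io
import pstats


def export_rows(n):
    """Hypothetical stand-in for the export call being profiled."""
    return "\n".join(f"{i},violation {i}" for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
data = export_rows(10_000)
profiler.disable()

# Sort by cumulative time and print the ten most expensive entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(10)
report = buffer.getvalue()
print(report)
```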
<p>These were the results of exporting 10,000 rows in xlsx format using <code>prefetch_related</code>:</p>
<div class="highlight"><pre><span></span><span class="go">56830808 function calls (47142920 primitive calls) in 41.574 seconds</span>
<span class="go">select 5.009</span>
<span class="go">prefetch 8.009</span>
<span class="go">56660555 function calls (47149065 primitive calls) in 39.927 seconds</span>
<span class="go">select 2.356</span>
<span class="go">prefetch 7.991</span>
</pre></div>
<p>We ran the benchmark twice to make sure the results were not affected by caches. The function took 40s to complete, and only 10s of it (25%) were spent in the database.</p>
<p>At this point, we suspected that <strong>the problem might be in the file format</strong>. This assumption was supported by the application server's high CPU usage.</p>
<p>Next, we wanted to try the same benchmark, only instead of xlsx we produced a csv:</p>
<div class="highlight"><pre><span></span><span class="n">cProfile</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="s1">'InspectionResource().export(qs).csv'</span><span class="p">)</span>
</pre></div>
<p>These were the results of exporting 10,000 rows in csv format using <code>prefetch_related</code>:</p>
<div class="highlight"><pre><span></span><span class="go">9179705 function calls (9107672 primitive calls) in 17.429 seconds</span>
<span class="go">select 1.970</span>
<span class="go">prefetch 8.343</span>
</pre></div>
<p>Wow! That's a big improvement. This confirmed our suspicion that the actual production of the xlsx was the problem.</p>
<p>Before we moved on, we wanted to check another file format that might be more useful to our users, the old xls format:</p>
<div class="highlight"><pre><span></span><span class="n">cProfile</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="s1">'InspectionResource().export(qs).xls'</span><span class="p">)</span>
</pre></div>
<p>These were the results of exporting 10,000 rows in xls format using <code>prefetch_related</code>:</p>
<div class="highlight"><pre><span></span><span class="go">16317592 function calls (15745704 primitive calls) in 20.694 seconds</span>
<span class="go">select 1.922</span>
<span class="go">prefetch 7.976</span>
</pre></div>
<p>OK, so that's surprising. I'm not familiar with the internals of the Microsoft Office file formats, but it seems like the old format is only a little bit slower than the csv format, and much faster than the new xlsx format.</p>
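<p>To compare the formats fairly, it helps to subtract the time spent in the database from each total. The short calculation below uses the timings reported above (the second run for xlsx); the <code>time_outside_database</code> helper is just an illustrative name.</p>

```python
# Timings in seconds, copied from the benchmark output above.
# For xlsx we use the second run.
benchmarks = {
    "xlsx": {"total": 39.927, "select": 2.356, "prefetch": 7.991},
    "csv": {"total": 17.429, "select": 1.970, "prefetch": 8.343},
    "xls": {"total": 20.694, "select": 1.922, "prefetch": 7.976},
}


def time_outside_database(timing):
    """Total time minus the time spent in the two queries."""
    return round(timing["total"] - timing["select"] - timing["prefetch"], 3)


outside = {fmt: time_outside_database(timing) for fmt, timing in benchmarks.items()}
for fmt, seconds in outside.items():
    print(f"{fmt}: {seconds:.3f}s outside the database")
```

<p>With the database time removed, producing the xlsx took roughly 29.6s, compared to about 7.1s for csv and 10.8s for xls.</p>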
<p>These benchmark results brought up an old dilemma. In the past we used to serve users with only csv files, but they complained a lot about trouble opening the files, and about encoding and formatting issues. This is why we decided to produce xlsx in the first place, so at that time, producing xls files seemed like the best solution.</p>
<p>I should already tell you, using the old xls format was a bad decision, but we didn't know that yet.</p>
<hr>
<h2 id="improving-the-query"><a class="toclink" href="#improving-the-query">Improving the Query</a></h2>
<p>After reducing the overall execution time by half, our next targets were the queries. Two queries are executed to produce the dataset for the export. Before any changes were made, the "main" query took ~2s and the prefetch ~8s to complete.</p>
<p>The "main" query looked like this:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="ss">"inspection"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span>
<span class="w"> </span><span class="c1">-- around 50 more fields from joined tables</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="ss">"inspection"</span>
<span class="w"> </span><span class="k">INNER</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="ss">"auth_user"</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="p">(</span><span class="ss">"inspection"</span><span class="p">.</span><span class="ss">"user_id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ss">"auth_user"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">-- around 11 more joined tables</span>
</pre></div>
<p>The resource used a lot of data from related tables, and the query joined ~12 tables and had many fields listed in the SELECT clause. The table is one of the main tables in the database so it is heavily indexed, and the lookup tables were relatively small so the query didn't take long to complete.</p>
<p>The prefetch query looked like this:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="ss">"violation"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span>
<span class="w"> </span><span class="ss">"violation"</span><span class="p">.</span><span class="ss">"inspection_id"</span><span class="p">,</span>
<span class="w"> </span><span class="ss">"violation"</span><span class="p">.</span><span class="ss">"violation_type_id"</span><span class="p">,</span>
<span class="w"> </span><span class="ss">"violation_type"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span>
<span class="w"> </span><span class="ss">"violation_type"</span><span class="p">.</span><span class="ss">"name"</span>
<span class="k">FROM</span><span class="w"> </span><span class="ss">"violation"</span>
<span class="w"> </span><span class="k">INNER</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="ss">"violation_type"</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="ss">"violation"</span><span class="p">.</span><span class="ss">"violation_type_id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ss">"violation_type"</span><span class="p">.</span><span class="ss">"id"</span>
<span class="w"> </span><span class="p">)</span>
<span class="k">WHERE</span>
<span class="w"> </span><span class="ss">"violation"</span><span class="p">.</span><span class="ss">"inspection_id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">ANY</span><span class="p">([</span>
<span class="w"> </span><span class="mi">2814</span><span class="p">,</span><span class="w"> </span><span class="mi">9330</span><span class="p">,</span><span class="w"> </span><span class="mi">8848</span><span class="p">,</span><span class="w"> </span><span class="mi">8971</span><span class="p">,</span><span class="w"> </span><span class="mi">9372</span><span class="p">,</span><span class="w"> </span><span class="mi">9084</span><span class="p">,</span><span class="w"> </span><span class="mi">78</span><span class="p">,</span><span class="w"> </span><span class="mi">3896</span><span class="p">,</span><span class="w"> </span><span class="mi">2609</span><span class="p">,</span><span class="w"> </span><span class="mi">5177</span><span class="p">,</span><span class="w"> </span><span class="mi">2866</span>
<span class="w"> </span><span class="c1">-- another 10,000 inspection IDs</span>
<span class="w"> </span><span class="mi">1399</span><span class="p">,</span><span class="w"> </span><span class="mi">9348</span><span class="p">,</span><span class="w"> </span><span class="mi">914</span><span class="p">,</span><span class="w"> </span><span class="mi">8884</span><span class="p">,</span><span class="w"> </span><span class="mi">9082</span><span class="p">,</span><span class="w"> </span><span class="mi">3356</span><span class="p">,</span><span class="w"> </span><span class="mi">2896</span><span class="p">,</span><span class="w"> </span><span class="mi">742</span><span class="p">,</span><span class="w"> </span><span class="mi">9432</span><span class="p">,</span><span class="w"> </span><span class="mi">8926</span><span class="p">,</span><span class="w"> </span><span class="mi">9153</span>
<span class="w"> </span><span class="p">])</span>
</pre></div>
<p>This query seems innocent, but in fact, it took ~8s to complete. The execution plan of this query looked like this:</p>
<div class="highlight"><pre><span></span><span class="n">Nested</span><span class="w"> </span><span class="n">Loop</span><span class="w"> </span><span class="p">(</span><span class="k">cost</span><span class="o">=</span><span class="mf">1000.28..2040346.39</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">26741</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mf">181</span><span class="p">)</span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Gather</span><span class="w"> </span><span class="p">(</span><span class="k">cost</span><span class="o">=</span><span class="mf">1000.00..2032378.29</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">26741</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mf">115</span><span class="p">)</span>
<span class="w"> </span><span class="n">Workers</span><span class="w"> </span><span class="n">Planned</span><span class="p">:</span><span class="w"> </span><span class="mf">2</span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="k">Parallel</span><span class="w"> </span><span class="n">Seq</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="n">violation</span><span class="w"> </span><span class="p">(</span><span class="k">cost</span><span class="o">=</span><span class="mf">0.00..2028704.19</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">11142</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mf">115</span><span class="p">)</span>
<span class="hll"><span class="w"> </span><span class="k">Filter</span><span class="p">:</span><span class="w"> </span><span class="p">(</span><span class="n">vehicle_inspection_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">ANY</span><span class="w"> </span><span class="p">(</span><span class="s1">'{2814,9330,....,8926,9153}'</span><span class="o">::</span><span class="nb">integer</span><span class="p">[]))</span>
</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="k">Index</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="k">using</span><span class="w"> </span><span class="n">violationtype_pkey</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="n">violationtype</span><span class="w"> </span><span class="p">(</span><span class="k">cost</span><span class="o">=</span><span class="mf">0.28..0.30</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">1</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mf">66</span><span class="p">)</span>
<span class="w"> </span><span class="k">Index</span><span class="w"> </span><span class="n">Cond</span><span class="p">:</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">violation</span><span class="mf">.</span><span class="n">violation_type_id</span><span class="p">)</span>
</pre></div>
<p>I trimmed the execution plan for brevity, but the <code>Filter</code> line was three or four pages long, filled with IDs. This got us thinking: is it possible that this huge <code>ANY</code> filter is what's causing us trouble?</p>
<h3 id="replacing-prefetch_related-with-subquery-and-outerref"><a class="toclink" href="#replacing-prefetch_related-with-subquery-and-outerref">Replacing <code>prefetch_related</code> with <code>Subquery</code> and <code>OuterRef</code></a></h3>
<p>To answer this question we decided to try and implement the query without <code>prefetch_related</code>. Instead, we decided to use the new <a href="https://docs.djangoproject.com/en/2.2/ref/models/expressions/#subquery-expressions" rel="noopener"><code>Subquery</code> expression</a>.</p>
<p>Using <code>Subquery</code>, the query written with the ORM looked like this:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">OuterRef</span><span class="p">,</span> <span class="n">Subquery</span><span class="p">,</span> <span class="n">Value</span>
<span class="kn">from</span> <span class="nn">django.contrib.postgres.aggregates</span> <span class="kn">import</span> <span class="n">ArrayAgg</span>
<span class="n">inspections</span> <span class="o">=</span> <span class="n">inspections</span><span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">violations_csv</span><span class="o">=</span><span class="n">Subquery</span><span class="p">(</span>
<span class="n">Violation</span><span class="o">.</span><span class="n">objects</span>
<span class="c1"># Reference the inspection ID of the outer table, inspection.</span>
<span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">inspection_id</span><span class="o">=</span><span class="n">OuterRef</span><span class="p">(</span><span class="s1">'id'</span><span class="p">))</span>
<span class="c1"># Prevent Django from adding a group by column.</span>
<span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">dummy</span><span class="o">=</span><span class="n">Value</span><span class="p">(</span><span class="s1">'1'</span><span class="p">))</span><span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'dummy'</span><span class="p">)</span>
<span class="c1"># Construct an array of violation names.</span>
<span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">violations</span><span class="o">=</span><span class="n">ArrayAgg</span><span class="p">(</span><span class="s1">'violation_type__name'</span><span class="p">,</span> <span class="n">distinct</span><span class="o">=</span><span class="kc">True</span><span class="p">))</span>
<span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'violations'</span><span class="p">)</span>
<span class="p">))</span>
</pre></div>
<p>If you have never experimented with <code>Subquery</code>, there is a lot to take in here. Before we break it down, this is what the generated SQL looks like:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="ss">"inspection"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span>
<span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">ARRAY_AGG</span><span class="p">(</span><span class="k">DISTINCT</span><span class="w"> </span><span class="n">U2</span><span class="p">.</span><span class="ss">"name"</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="ss">"violations"</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="ss">"violation"</span><span class="w"> </span><span class="n">U0</span>
<span class="w"> </span><span class="k">INNER</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="ss">"violationtype"</span><span class="w"> </span><span class="n">U2</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="p">(</span><span class="n">U0</span><span class="p">.</span><span class="ss">"violation_type_id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">U2</span><span class="p">.</span><span class="ss">"id"</span><span class="p">)</span>
<span class="w"> </span><span class="k">WHERE</span>
<span class="w"> </span><span class="n">U0</span><span class="p">.</span><span class="ss">"inspection_id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="ss">"inspection"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">)</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="ss">"violations_csv"</span><span class="p">,</span>
<span class="w"> </span><span class="c1">-- around 50 more fields from joined tables</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="ss">"inspection"</span>
<span class="w"> </span><span class="k">INNER</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="ss">"auth_user"</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="p">(</span><span class="ss">"inspection"</span><span class="p">.</span><span class="ss">"user_id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ss">"auth_user"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">-- around 11 more joined tables</span>
</pre></div>
<p>Now, let's break it down:</p>
<ul>
<li><code>Subquery</code> is a query expression that can only exist inside another query. In this case, the outer query is <code>inspection</code>.</li>
<li><code>Subquery</code> is used in <code>annotate</code>, so the result of the subquery is stored in another column for each row.</li>
<li>We added a dummy annotation to prevent Django from grouping the results. The subquery is executed once for each inspection; this is what the filter using <code>OuterRef</code> does. For this reason, we don't need to group by any other column.</li>
<li>The subquery must return at most one row, so we group the names into an array using <code>ARRAY_AGG</code>.</li>
</ul>
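<p>To make the shape of this query easier to see in isolation, here is a minimal sketch of a correlated subquery that runs once per outer row. It is not the code we ran: it uses an in-memory SQLite database from the standard library instead of PostgreSQL, <code>GROUP_CONCAT</code> stands in for <code>ARRAY_AGG</code>, and the three small tables and their rows are made up for illustration.</p>

```python
import sqlite3

# Hypothetical miniature schema, loosely mirroring the tables in the article.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inspection (id INTEGER PRIMARY KEY);
    CREATE TABLE violation_type (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE violation (
        id INTEGER PRIMARY KEY,
        inspection_id INTEGER REFERENCES inspection (id),
        violation_type_id INTEGER REFERENCES violation_type (id)
    );
    INSERT INTO inspection (id) VALUES (1), (2), (3);
    INSERT INTO violation_type (id, name) VALUES (1, 'brakes'), (2, 'lights');
    INSERT INTO violation (inspection_id, violation_type_id)
    VALUES (1, 1), (1, 2), (1, 1), (2, 2);
""")

# The inner SELECT references the outer row (inspection.id), which is the
# role OuterRef plays in the ORM version. Because it is an aggregate without
# GROUP BY, it always returns exactly one row per inspection.
rows = conn.execute("""
    SELECT
        inspection.id,
        (
            SELECT GROUP_CONCAT(DISTINCT vt.name)
            FROM violation v
            JOIN violation_type vt ON v.violation_type_id = vt.id
            WHERE v.inspection_id = inspection.id
        ) AS violations_csv
    FROM inspection
    ORDER BY inspection.id
""").fetchall()

for inspection_id, violations_csv in rows:
    print(inspection_id, violations_csv)
```

<p>An inspection without violations gets <code>NULL</code> for the aggregated column, and the order of names inside the aggregated value is not guaranteed.</p>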
<p>After all this hard work, we were keen to see if this is the silver bullet we were waiting for, but in fact, when we executed this on 10,000 rows it choked. To see it through, we executed the export function with only 1,000 rows.</p>
<p>These were the results of exporting 1,000 rows in xls format using subquery:</p>
<div class="highlight"><pre><span></span>1571053 function calls (1514505 primitive calls) in 60.962 seconds
select 59.917
</pre></div>
<p>The query is now crazy slow. I won't paste the execution plan because there were so many other tables, but PostgreSQL used a nested loop join on the top level of the query to produce the value for this field. Surprisingly, the database did a significantly worse job than the ORM did in this case.</p>
<h3 id="using-an-iterator"><a class="toclink" href="#using-an-iterator">Using an Iterator</a></h3>
<p>Before we completely abandoned this solution, we wanted to check one last thing. We previously mentioned that <code>django-import-export</code> is using <code>iterator()</code> to create a cursor over the results. We also mentioned that using <code>prefetch_related</code> prevents us from using <code>iterator()</code>. Well, we no longer use <code>prefetch_related</code> so we might as well check if using <code>iterator()</code> makes any difference.</p>
<p>These were the results of exporting 1,000 rows in xls format using subquery and iterator:</p>
<div class="highlight"><pre><span></span>1571580 function calls (1514788 primitive calls) in 62.130 seconds
select 60.618
</pre></div>
<p>The iterator made no difference.</p>
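<p>This result is less surprising when you consider what an iterator changes. The sketch below, which uses SQLite and a hypothetical helper named <code>iter_rows</code>, streams rows from a cursor in chunks instead of loading them all at once. That mostly affects how much memory the client holds, not how long the database takes to execute the query, and in our measurements almost all of the time was spent in the select itself.</p>

```python
import sqlite3
from typing import Iterator


def iter_rows(cursor: sqlite3.Cursor, chunk_size: int = 2000) -> Iterator[tuple]:
    """Yield rows from an already executed cursor, one chunk at a time."""
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            return
        yield from chunk


# Hypothetical table with 10,000 rows, just to have something to iterate over.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inspection (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO inspection (id) VALUES (?)",
    [(i,) for i in range(1, 10001)],
)

cursor = conn.execute("SELECT id FROM inspection ORDER BY id")

# Only one chunk of rows is held in memory at any time.
total = sum(1 for _ in iter_rows(cursor, chunk_size=2000))
print(total)
```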
<h3 id="simplifying-the-query"><a class="toclink" href="#simplifying-the-query">Simplifying the Query</a></h3>
<p>In a final attempt to get something out of this expedition, we wanted to see if the complexity of the query prevented PostgreSQL from finding an optimal execution plan. To do that, we could have adjusted the database parameters <a href="https://www.postgresql.org/docs/current/runtime-config-query.html#GUC-FROM-COLLAPSE-LIMIT" rel="noopener"><code>from_collapse_limit</code></a> and <a href="https://www.postgresql.org/docs/current/runtime-config-query.html#GUC-JOIN-COLLAPSE-LIMIT" rel="noopener"><code>join_collapse_limit</code></a> and let PostgreSQL take all the time and resources it needs to find an optimal execution plan, but instead, we decided to strip all other fields from the resources besides <code>id</code> and <code>violations</code>.</p>
<p>These were the results of exporting 1,000 rows containing only the id and violations fields in xls format using subquery and iterator:</p>
<div class="highlight"><pre><span></span>6937 function calls (6350 primitive calls) in 57.280 seconds
select 57.255
</pre></div>
<p>No change, this is officially a dead end!</p>
<h3 id="manual-prefetch"><a class="toclink" href="#manual-prefetch">Manual Prefetch</a></h3>
<p>After a quick lunch break we decided it's time to pull out the big guns. If Django's prefetch implementation wasn't working for us, and PostgreSQL was unable to produce a decent execution plan, we would just have to do it ourselves.</p>
<p>To implement our own "prefetch" we needed to adjust some of the other functions in the resource:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.contrib.postgres.aggregates</span> <span class="kn">import</span> <span class="n">ArrayAgg</span>
<span class="kn">from</span> <span class="nn">import_export</span> <span class="kn">import</span> <span class="n">resources</span><span class="p">,</span> <span class="n">fields</span>
<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">Inspection</span><span class="p">,</span> <span class="n">Violation</span>

<span class="k">class</span> <span class="nc">InspectionResource</span><span class="p">(</span><span class="n">resources</span><span class="o">.</span><span class="n">ModelResource</span><span class="p">):</span>
    <span class="n">violations</span> <span class="o">=</span> <span class="n">fields</span><span class="o">.</span><span class="n">Field</span><span class="p">()</span>

    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
        <span class="n">model</span> <span class="o">=</span> <span class="n">Inspection</span>
        <span class="n">fields</span> <span class="o">=</span> <span class="p">(</span>
            <span class="s1">'id'</span><span class="p">,</span>
            <span class="s1">'violations'</span><span class="p">,</span>
        <span class="p">)</span>

    <span class="k">def</span> <span class="nf">export</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">queryset</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="c1"># Manually prefetch the violations.</span>
<span class="hll">        <span class="bp">self</span><span class="o">.</span><span class="n">prefetched_violations</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span>
</span>            <span class="n">Violation</span><span class="o">.</span><span class="n">objects</span>
            <span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">inspection_id__in</span><span class="o">=</span><span class="p">(</span>
                <span class="n">queryset</span>
                <span class="c1"># Clean all joins.</span>
                <span class="o">.</span><span class="n">select_related</span><span class="p">(</span><span class="kc">None</span><span class="p">)</span>
                <span class="o">.</span><span class="n">values_list</span><span class="p">(</span><span class="s1">'pk'</span><span class="p">)</span>
            <span class="p">))</span>
            <span class="c1"># Group by inspection so each inspection yields a single row.</span>
            <span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'inspection_id'</span><span class="p">)</span>
            <span class="o">.</span><span class="n">annotate</span><span class="p">(</span>
                <span class="n">violations_csv</span><span class="o">=</span><span class="n">ArrayAgg</span><span class="p">(</span><span class="s1">'violation_type__name'</span><span class="p">),</span>
            <span class="p">)</span>
            <span class="o">.</span><span class="n">values_list</span><span class="p">(</span>
                <span class="s1">'inspection_id'</span><span class="p">,</span>
                <span class="s1">'violations_csv'</span><span class="p">,</span>
            <span class="p">)</span>
        <span class="p">)</span>
        <span class="k">return</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">export</span><span class="p">(</span><span class="n">queryset</span><span class="p">)</span>

<span class="hll">    <span class="k">def</span> <span class="nf">dehydrate_violations</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">inspection</span><span class="p">:</span> <span class="n">Inspection</span><span class="p">)</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
</span>        <span class="k">return</span> <span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">prefetched_violations</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">inspection</span><span class="o">.</span><span class="n">id</span><span class="p">,</span> <span class="p">[]))</span>
</pre></div>
<p>This looks like a lot, but it's actually not:</p>
<ol>
<li>
<p>We create our own "prefetch related" dict <code>prefetched_violations</code>:</p>
<ul>
<li>The key is the inspection ID, and the value is an array containing the violation names (<code>violations_csv</code>).</li>
<li>To fetch only relevant violations, we filter on the IDs of the inspections in <code>queryset</code>.</li>
<li>We execute <code>select_related(None)</code> to remove all previously set <code>select_related</code> tables, and make the ORM remove any unnecessary joins.</li>
</ul>
</li>
<li>
<p>We return the original queryset to the <code>export</code> function which produces the Excel file.</p>
</li>
<li>
<p>To construct the value for the <code>violations</code> field, we use the <code>prefetched_violations</code> we populated during <code>export</code>. This is the "lookup" part of the prefetch. With Django's <code>prefetch_related</code> this value is available on the instance; when we do it manually we have to look it up ourselves.</p>
</li>
<li>
<p>Once again, since we no longer use Django's <code>prefetch_related</code> we were able to use an iterator. So, instead of evaluating the query we return a queryset.</p>
</li>
</ol>
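<p>Stripped of Django, the idea behind the manual prefetch is a dictionary that is built once and then consulted once per row. The sketch below shows that pattern in plain Python. The names <code>build_prefetch_map</code> and <code>violation_rows</code> are made up for illustration, and the grouping is done in Python here, whereas in the resource above the database does it with <code>ArrayAgg</code>.</p>

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_prefetch_map(rows: Iterable[Tuple[int, str]]) -> Dict[int, List[str]]:
    """Group (inspection_id, violation_name) pairs by inspection ID."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for inspection_id, violation_name in rows:
        grouped[inspection_id].append(violation_name)
    return dict(grouped)


def dehydrate_violations(prefetched: Dict[int, List[str]], inspection_id: int) -> str:
    """Look up the names for one inspection; no violations gives an empty string."""
    return "\n".join(prefetched.get(inspection_id, []))


# Hypothetical rows, as if fetched by a single query over all violations.
violation_rows = [(1, "brakes"), (1, "lights"), (2, "lights")]

# Build the map once, before iterating over the inspections.
prefetched = build_prefetch_map(violation_rows)

for inspection_id in (1, 2, 3):
    print(inspection_id, repr(dehydrate_violations(prefetched, inspection_id)))
```

<p>The lookup is a dictionary access per exported row, so the cost of the "prefetch" is dominated by the single query that fills the map.</p>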
<p>We were already disappointed once after putting in a lot of effort, so let's see if the hard work paid off this time.</p>
<p>These were the results of exporting 10,000 rows in xls format using manual prefetch and iterator:</p>
<div class="highlight"><pre><span></span>15281887 function calls (14721333 primitive calls) in 11.411 seconds
select 0.833
manual prefetch 0.107
</pre></div>
<p>Compared to the 40 seconds we started with, this is an overall 75% improvement. 20s were reduced by switching to xls format, another 10s were from manually doing the prefetch.</p>
<p>We are ready for production!</p>
<h3 id="trouble-in-paradise"><a class="toclink" href="#trouble-in-paradise">Trouble in Paradise</a></h3>
<p>Shortly after rolling out the new version to production, we started getting complaints from users who were unable to open the file.</p>
<p>Remember I told you using xls was a bad idea? Well, when users started downloading the xls files they got a nasty message saying the file is corrupt, and Excel, thank god, managed to salvage some of the data (which is way worse!).</p>
<p>One might ask, <em>"but how come you didn't catch this in QA?"</em>. Well, that's just another reason we hate working with Excel. When we tested it locally on our Linux desktops using LibreOffice, it worked just fine.</p>
<figure><img alt="&quot;But it works on my machine!&quot;" src="https://hakibenita.com/images/02-python-django-optimizing-excel-export.jpg"><figcaption>"But it works on my machine!"</figcaption>
</figure>
<p>So let's recap:</p>
<ul>
<li>xlsx is slow and consumes a lot of CPU.</li>
<li>xls is not supported by the Excel version used by our users.</li>
<li>csv has many encoding and formatting issues, and proved to be unusable in the past.</li>
</ul>
<hr>
<h2 id="using-a-different-excel-writer"><a class="toclink" href="#using-a-different-excel-writer">Using a Different Excel Writer</a></h2>
<p>As always, when all options suck and the future is looking bleak, we turned to Google.</p>
<p>A quick search of <em>"python excel performance"</em> brought up <a href="https://gist.github.com/jmcnamara/ba25c2bf4ba0777065eb" rel="noopener">this gist</a> which compares 4 different Excel writers in Python (gotta love the internet!).</p>
<p>These are the benchmark results:</p>
<div class="highlight"><pre><span></span># Source: https://gist.github.com/jmcnamara/ba25c2bf4ba0777065eb
Versions:
python : 2.7.2
openpyxl : 2.2.1
pyexcelerate: 0.6.6
xlsxwriter : 0.7.2
xlwt : 1.0.0
Dimensions:
Rows = 10000
Cols = 50
Times:
<span class="hll"> pyexcelerate : 10.63
</span> xlwt : 16.93
xlsxwriter (optimised): 20.37
xlsxwriter : 24.24
openpyxl (optimised): 26.63
<span class="hll"> openpyxl : 35.75
</span></pre></div>
<p>According to the results, there is a big difference between the xlsx libraries.</p>
<p>As mentioned before, we use <code>django-import-export</code> to produce Excel files from Django models and querysets. Under the hood, <code>django-import-export</code> uses the popular <a href="https://docs.python-tablib.org" rel="noopener"><code>tablib</code> package</a> to do the actual export.</p>
<p>Tablib offers export and import capabilities to and from many formats, but it doesn't do any of the heavy lifting itself. To <a href="https://github.com/vinayak-mehta/tablib/blob/master/tablib/formats/_xlsx.py" rel="noopener">produce xlsx files</a>, tablib is using the package <a href="https://openpyxl.readthedocs.io/en/stable/" rel="noopener"><code>openpyxl</code></a>.</p>
<h3 id="a-faster-excel-writer-in-python"><a class="toclink" href="#a-faster-excel-writer-in-python">A Faster Excel Writer in Python</a></h3>
<p>Looking back at the benchmark results, <code>openpyxl</code> is the slowest among all packages. It looks like by switching to the fastest implementation, <code>pyexcelerate</code>, we might be able to gain some significant improvement for this export process.</p>
<p>The <a href="https://github.com/kz26/PyExcelerate" rel="noopener">package <code>pyexcelerate</code></a> looked great from the start. The tag line is just what we needed:</p>
<blockquote>
<p>PyExcelerate is a Python for writing Excel-compatible XLSX spreadsheet files, with an emphasis on speed.</p>
</blockquote>
<p>Even the snarky subtitles on the <a href="https://github.com/kz26/PyExcelerate#usage" rel="noopener">"Usage" section</a> in the README were just what we wanted: <em>fast, faster and fastest!</em></p>
<p>With such promising benchmarks and README, we had to try it out!</p>
<h3 id="patching-tablib"><a class="toclink" href="#patching-tablib">Patching <code>tablib</code></a></h3>
<p>We already have an entire system built on top of <code>django-import-export</code> and <code>tablib</code>, and we didn't want to start making changes everywhere. So instead, we looked for a way to patch tablib, and make it use <code>pyexcelerate</code> instead of <code>openpyxl</code>.</p>
<p>After some digging, we found that tablib uses an internal function called <a href="https://github.com/vinayak-mehta/tablib/blob/d25d24a9bb1ef0479e800aacbe6ae7dd9e6e35c2/tablib/core.py#L253" rel="noopener"><code>_register_formats</code></a> to add export and import formats such as csv, xls and xlsx. To get a list of available formats, tablib imports a collection called <code>available</code> from the module <code>formats</code>. The contents of the file <a href="https://github.com/vinayak-mehta/tablib/blob/d25d24a9bb1ef0479e800aacbe6ae7dd9e6e35c2/tablib/formats/__init__.py" rel="noopener"><code>formats/__init__.py</code></a> where the collection is defined, look like this:</p>
<div class="highlight"><pre><span></span><span class="c1"># -*- coding: utf-8 -*-</span>
<span class="sd">""" Tablib - formats</span>
<span class="sd">"""</span>
<span class="kn">from</span> <span class="nn">.</span> <span class="kn">import</span> <span class="n">_csv</span> <span class="k">as</span> <span class="n">csv</span>
<span class="kn">from</span> <span class="nn">.</span> <span class="kn">import</span> <span class="n">_json</span> <span class="k">as</span> <span class="n">json</span>
<span class="kn">from</span> <span class="nn">.</span> <span class="kn">import</span> <span class="n">_xls</span> <span class="k">as</span> <span class="n">xls</span>
<span class="kn">from</span> <span class="nn">.</span> <span class="kn">import</span> <span class="n">_yaml</span> <span class="k">as</span> <span class="n">yaml</span>
<span class="kn">from</span> <span class="nn">.</span> <span class="kn">import</span> <span class="n">_tsv</span> <span class="k">as</span> <span class="n">tsv</span>
<span class="kn">from</span> <span class="nn">.</span> <span class="kn">import</span> <span class="n">_html</span> <span class="k">as</span> <span class="n">html</span>
<span class="kn">from</span> <span class="nn">.</span> <span class="kn">import</span> <span class="n">_xlsx</span> <span class="k">as</span> <span class="n">xlsx</span>
<span class="kn">from</span> <span class="nn">.</span> <span class="kn">import</span> <span class="n">_ods</span> <span class="k">as</span> <span class="n">ods</span>
<span class="kn">from</span> <span class="nn">.</span> <span class="kn">import</span> <span class="n">_dbf</span> <span class="k">as</span> <span class="n">dbf</span>
<span class="kn">from</span> <span class="nn">.</span> <span class="kn">import</span> <span class="n">_latex</span> <span class="k">as</span> <span class="n">latex</span>
<span class="kn">from</span> <span class="nn">.</span> <span class="kn">import</span> <span class="n">_df</span> <span class="k">as</span> <span class="n">df</span>
<span class="kn">from</span> <span class="nn">.</span> <span class="kn">import</span> <span class="n">_rst</span> <span class="k">as</span> <span class="n">rst</span>
<span class="kn">from</span> <span class="nn">.</span> <span class="kn">import</span> <span class="n">_jira</span> <span class="k">as</span> <span class="n">jira</span>
<span class="n">available</span> <span class="o">=</span> <span class="p">(</span><span class="n">json</span><span class="p">,</span> <span class="n">xls</span><span class="p">,</span> <span class="n">yaml</span><span class="p">,</span> <span class="n">csv</span><span class="p">,</span> <span class="n">dbf</span><span class="p">,</span> <span class="n">tsv</span><span class="p">,</span> <span class="n">html</span><span class="p">,</span> <span class="n">jira</span><span class="p">,</span> <span class="n">latex</span><span class="p">,</span> <span class="n">xlsx</span><span class="p">,</span> <span class="n">ods</span><span class="p">,</span> <span class="n">df</span><span class="p">,</span> <span class="n">rst</span><span class="p">)</span>
</pre></div>
<p>The interesting part is the contents of the file <a href="https://github.com/vinayak-mehta/tablib/blob/d25d24a9bb1ef0479e800aacbe6ae7dd9e6e35c2/tablib/formats/_xlsx.py" rel="noopener">_xlsx.py</a>. The file defines some functions to export and import from Excel using <code>openpyxl</code>.</p>
<p>To patch <code>tablib</code>, we first need to implement a similar interface to the one in <code>_xlsx.py</code> using <code>pyexcelerate</code>, and then register it in <code>tablib</code>.</p>
<p>Let's start with implementing <code>_xlsx.py</code> using <code>pyexcelerate</code>:</p>
<div class="highlight"><pre><span></span><span class="c1"># fast_xlsx.py</span>
<span class="kn">import</span> <span class="nn">itertools</span>
<span class="kn">from</span> <span class="nn">io</span> <span class="kn">import</span> <span class="n">BytesIO</span>

<span class="kn">from</span> <span class="nn">tablib.formats._xlsx</span> <span class="kn">import</span> <span class="o">*</span> <span class="c1"># noqa</span>
<span class="kn">from</span> <span class="nn">pyexcelerate</span> <span class="kn">import</span> <span class="n">Workbook</span>

<span class="c1"># Override the default xlsx implementation</span>
<span class="n">title</span> <span class="o">=</span> <span class="s1">'xlsx'</span>

<span class="k">def</span> <span class="nf">export_set</span><span class="p">(</span><span class="n">dataset</span><span class="p">,</span> <span class="n">freeze_panes</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
<span class="w">    </span><span class="sd">"""Returns XLSX representation of Dataset."""</span>
    <span class="n">title</span> <span class="o">=</span> <span class="n">dataset</span><span class="o">.</span><span class="n">title</span> <span class="ow">or</span> <span class="s1">'Sheet1'</span>
    <span class="n">wb</span> <span class="o">=</span> <span class="n">Workbook</span><span class="p">()</span>
    <span class="n">wb</span><span class="o">.</span><span class="n">new_sheet</span><span class="p">(</span><span class="n">title</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">itertools</span><span class="o">.</span><span class="n">chain</span><span class="p">([</span><span class="n">dataset</span><span class="o">.</span><span class="n">headers</span><span class="p">],</span> <span class="n">dataset</span><span class="p">))</span>
    <span class="n">stream</span> <span class="o">=</span> <span class="n">BytesIO</span><span class="p">()</span>
    <span class="n">wb</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">stream</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">stream</span><span class="o">.</span><span class="n">getvalue</span><span class="p">()</span>

<span class="k">def</span> <span class="nf">export_book</span><span class="p">(</span><span class="n">databook</span><span class="p">,</span> <span class="n">freeze_panes</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
<span class="w">    </span><span class="sd">"""Returns XLSX representation of DataBook."""</span>
    <span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">databook</span><span class="o">.</span><span class="n">_datasets</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span>
    <span class="k">return</span> <span class="n">export_set</span><span class="p">(</span><span class="n">databook</span><span class="o">.</span><span class="n">_datasets</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">freeze_panes</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">dset_sheet</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
    <span class="k">assert</span> <span class="kc">False</span><span class="p">,</span> <span class="s1">'How did you get here?'</span>
</pre></div>
<p>This is a simple implementation of the main functions. It lacks some functionality, such as multiple sheets, but it was fine for our needs.</p>
<p>Next, we need to make <code>tablib</code> register this file instead of the existing xlsx format. To do that, we created a new file called <code>monkeypatches.py</code>:</p>
<div class="highlight"><pre><span></span><span class="c1"># monkeypatches.py</span>
<span class="kn">import</span> <span class="nn">tablib</span>
<span class="kn">from</span> <span class="nn">.</span> <span class="kn">import</span> <span class="n">fast_xlsx</span>
<span class="c1"># Override default xlsx format with a faster implementation</span>
<span class="c1"># using `pyexcelerate` (export only).</span>
<span class="n">tablib</span><span class="o">.</span><span class="n">formats</span><span class="o">.</span><span class="n">available</span> <span class="o">+=</span> <span class="p">(</span><span class="n">fast_xlsx</span><span class="p">,</span> <span class="p">)</span>
</pre></div>
<p>To apply the patch to <code>tablib</code>, we import our implementation and add it to the available formats list. We then import this file in the module's <code>__init__.py</code> so every time the system starts up, <code>tablib</code> is patched.</p>
<p>Now for the moment of truth, <em>did all this hard work finally pay off?</em></p>
<p>These were the results of exporting 10,000 rows in xlsx format with <code>pyexcelerate</code> using manual prefetch and iterator:</p>
<div class="highlight"><pre><span></span>13627507 function calls (13566956 primitive calls) in 10.944 seconds
select 0.137
manual prefetch 2.219
</pre></div>
<p>The hard work definitely paid off! Just so we have an honest comparison, these are the results of exporting 10,000 rows in xlsx format without patching <code>tablib</code> using manual prefetch and iterator:</p>
<div class="highlight"><pre><span></span>55982358 function calls (46155371 primitive calls) in 29.965 seconds
select 0.137
manual prefetch 1.724
</pre></div>
<p>That's a 64% improvement compared to the default implementation provided by <code>tablib</code>, and roughly a 73% improvement compared to the 40s we started with.</p>
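<p>For anyone who wants to check the arithmetic, here is a minimal sketch that computes the relative improvement from the timings reported in this article. The helper name <code>improvement</code> is just an illustration and not part of any library:</p>

```python
def improvement(before: float, after: float) -> float:
    """Return the relative reduction in run time, as a percentage."""
    return (before - after) / before * 100


# Timings in seconds, taken from the profiler output in this article.
pyexcelerate = 10.944    # xlsx with pyexcelerate, manual prefetch and iterator
default_tablib = 29.965  # xlsx with the default tablib writer, same query
original = 39.927        # xlsx with prefetch_related, the starting point

print(f"vs default tablib: {improvement(default_tablib, pyexcelerate):.1f}%")
print(f"vs original export: {improvement(original, pyexcelerate):.1f}%")
```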
<hr>
<h2 id="results-summary"><a class="toclink" href="#results-summary">Results Summary</a></h2>
<p>This is a summary of all the results mentioned in the article:</p>
<div class="table-container">
<table>
<thead>
<tr>
<th>Time</th>
<th>Rows</th>
<th>Format</th>
<th>Method</th>
</tr>
</thead>
<tbody>
<tr>
<td>39.927s</td>
<td>10,000</td>
<td>xlsx</td>
<td><code>prefetch_related</code> (Django)</td>
</tr>
<tr>
<td>17.429s</td>
<td>10,000</td>
<td>csv</td>
<td><code>prefetch_related</code> (Django)</td>
</tr>
<tr>
<td>20.694s</td>
<td>10,000</td>
<td>xls</td>
<td><code>prefetch_related</code> (Django)</td>
</tr>
<tr>
<td>60.962s</td>
<td>1,000</td>
<td>xls</td>
<td>subquery</td>
</tr>
<tr>
<td>62.130s</td>
<td>1,000</td>
<td>xls</td>
<td>subquery and iterator</td>
</tr>
<tr>
<td>57.280s</td>
<td>1,000</td>
<td>xls</td>
<td>simplified query, subquery and iterator</td>
</tr>
<tr>
<td>29.965s</td>
<td>10,000</td>
<td>xlsx</td>
<td>default <code>tablib</code> implementation, manual prefetch and iterator</td>
</tr>
<tr>
<td>11.411s</td>
<td>10,000</td>
<td>xls</td>
<td>using manual prefetch and iterator</td>
</tr>
<tr>
<td>10.944s</td>
<td>10,000</td>
<td>xlsx</td>
<td>using <code>pyexcelerate</code>, manual prefetch and iterator</td>
</tr>
</tbody>
</table>
</div>
<hr>
<h2 id="seifa"><a class="toclink" href="#seifa">Seifa</a></h2>
<p>We try to study every incident and take action to prevent similar incidents from happening in the future. During this incident, some of our users did experience slowness for a short period of time. However, the "Export to Excel" functionality did not <em>really</em> kill our app.</p>
<p>Following this incident, there are a few open questions we haven't had the chance to fully explore yet:</p>
<ul>
<li>
<p><strong>Why was the prefetch query so slow?</strong> The difference boils down to executing <code>Model.objects.filter(fk__in = [1,2,3,4....9,999, 10,000])</code> vs executing <code>Model.objects.filter(fk__in = OtherModel.objects.filter( ... ).values_list('pk'))</code>. When we tried to compare the two in the database, we found no difference, but the built-in <code>prefetch_related</code> was significantly slower. Is it possible that time is being spent generating the query in Python?</p>
</li>
<li>
<p><strong>Can <code>openpyxl3</code> performance be improved?</strong> When I talked to John, the author of the Excel writers benchmark, he mentioned that <a href="https://openpyxl.readthedocs.io/en/stable/#installation" rel="noopener"><code>openpyxl3</code> can be faster if <code>lxml</code> is installed</a>.</p>
</li>
<li>
<p><strong>Is xlsx really the best format?</strong> Can we eliminate some of the problems we had with csv by switching to <a href="https://ebay.github.io/tsv-utils/docs/comparing-tsv-and-csv.html" rel="noopener">a different textual format such as tsv</a>?</p>
</li>
</ul>
<p>If you have an answer to any of these questions, feel free to share it with me and I'll be happy to post the response.</p>
<hr>
<p><em>UPDATED: Aug 19, 2019</em></p>
<h2 id="comments-from-readers"><a class="toclink" href="#comments-from-readers">Comments From Readers</a></h2>
<p>A <a href="https://lobste.rs/s/ujaizr/how_export_excel_almost_killed_our_system#c_3jvn6v" rel="noopener">reader from lobste.rs</a> ran a quick benchmark to check how much faster <code>openpyxl</code> can get using <code>lxml</code>. These were his results:</p>
<div class="highlight"><pre><span></span>Versions:
python: 3.6.8
Dimensions:
Cols = 50
Sheets = 1
Proportion text = 0.10
optimised = True
Rows = 10000
Times:
openpyxl: 2.6.3 using LXML True: 3.70
openpyxl: 2.6.3 using LXML False: 6.03
Rows = 1000
Times:
openpyxl: 2.6.3 using LXML True: 0.37
openpyxl: 2.6.3 using LXML False: 0.57
</pre></div>
<p>This benchmark shows that <code>openpyxl</code> can be made roughly 1.6 times faster just by installing <code>lxml</code>. However, <code>pyexcelerate</code> improved the speed by a factor of almost 3.</p>
<hr>
<p>Many readers on <a href="https://www.reddit.com/r/django/comments/d5i9p4/how_export_to_excel_almost_killed_our_system/" rel="noopener">Reddit</a> and <a href="https://lobste.rs/s/ujaizr/how_export_excel_almost_killed_our_system" rel="noopener">Lobsters</a> suggested that a better approach would be to generate the Excel file on the client side using JavaScript. This is definitely something worth considering when designing a new system, even though I think this approach might be problematic for very large files.</p>How to Get the First or Last Value in a Group Using Group By in SQL2019-08-13T00:00:00+03:002019-08-13T00:00:00+03:00Haki Benitatag:hakibenita.com,2019-08-13:/sql-group-by-first-last-value<p>Getting the last value of a group in an aggregated query in PostgreSQL is a challenging task. In this article we present a simple way to get the first or last value of a group using group by.</p><hr>
<p>I recently had to produce reports on a table containing events of a user's account balance. The user can deposit and withdraw from their account, and support personnel can set the account's credit, which is the maximum amount the user can overdraw.</p>
<p>The table looked roughly like this:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="k">event</span><span class="p">;</span>
<span class="go"> id | account | type | happened_at | data</span>
<span class="go">----+---------+------------+------------------------+-----------------------------------</span>
<span class="go"> 1 | 1 | created | 2019-08-01 15:14:13+03 | {"credit": 0, "delta_balance": 0}</span>
<span class="go"> 2 | 1 | deposited | 2019-08-01 15:15:15+03 | {"delta_balance": 100}</span>
<span class="go"> 3 | 1 | withdraw | 2019-08-02 09:35:33+03 | {"delta_balance": -50}</span>
<span class="go"> 4 | 1 | credit_set | 2019-08-03 16:14:12+03 | {"credit": 50}</span>
<span class="go"> 5 | 1 | withdraw | 2019-08-03 14:45:44+03 | {"delta_balance": -30}</span>
<span class="go"> 6 | 1 | credit_set | 2019-08-03 16:14:12+03 | {"credit": 100}</span>
<span class="go"> 7 | 1 | withdraw | 2019-08-03 16:15:09+03 | {"delta_balance": -50}</span>
<span class="go">(7 rows)</span>
</pre></div>
<p><details>
<summary>Expand for table and data setup</summary></p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">event</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="nb">serial</span><span class="w"> </span><span class="k">primary</span><span class="w"> </span><span class="k">key</span><span class="p">,</span>
<span class="w"> </span><span class="n">account</span><span class="w"> </span><span class="nb">int</span><span class="p">,</span>
<span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span>
<span class="w"> </span><span class="n">happened_at</span><span class="w"> </span><span class="k">timestamp</span><span class="w"> </span><span class="k">with</span><span class="w"> </span><span class="k">time</span><span class="w"> </span><span class="k">zone</span><span class="p">,</span>
<span class="w"> </span><span class="k">data</span><span class="w"> </span><span class="n">jsonb</span>
<span class="p">);</span>
<span class="k">SET</span><span class="w"> </span><span class="n">datestyle</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'ISO, YMD'</span><span class="p">;</span>
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">event</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">account</span><span class="p">,</span><span class="w"> </span><span class="k">type</span><span class="p">,</span><span class="w"> </span><span class="n">happened_at</span><span class="p">,</span><span class="w"> </span><span class="k">data</span>
<span class="p">)</span>
<span class="k">VALUES</span>
<span class="w"> </span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'created'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2019-08-01 15:14:13'</span><span class="p">,</span><span class="w"> </span><span class="s1">'{"delta_balance": 0, "credit": 0}'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'deposited'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2019-08-01 15:15:15'</span><span class="p">,</span><span class="w"> </span><span class="s1">'{"delta_balance": 100}'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'withdraw'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2019-08-02 09:35:33'</span><span class="p">,</span><span class="w"> </span><span class="s1">'{"delta_balance": -50}'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'credit_set'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2019-08-03 16:14:12'</span><span class="p">,</span><span class="w"> </span><span class="s1">'{"credit": 50}'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'withdraw'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2019-08-03 14:45:44'</span><span class="p">,</span><span class="w"> </span><span class="s1">'{"delta_balance": -30}'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">6</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'credit_set'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2019-08-03 16:14:12'</span><span class="p">,</span><span class="w"> </span><span class="s1">'{"credit": 100}'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">7</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'withdraw'</span><span class="p">,</span><span class="w"> </span><span class="s1">'2019-08-03 16:15:09'</span><span class="p">,</span><span class="w"> </span><span class="s1">'{"delta_balance": -50}'</span><span class="p">)</span>
<span class="p">;</span>
</pre></div>
<p></details></p>
<figure><img alt="What deposit boxes used to look like. Photo by <a href="https://unsplash.com/@tjevans">Tim Evans</a>" src="https://hakibenita.com/images/00-sql-group-by-first-last-value.jpg"><figcaption>What deposit boxes used to look like. Photo by <a href="https://unsplash.com/@tjevans">Tim Evans</a></figcaption>
</figure>
<p>To get the current balance of an account, we sum the changes in <code>delta_balance</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">account</span><span class="p">,</span>
<span class="w"> </span><span class="n">SUM</span><span class="p">((</span><span class="k">data</span><span class="o">-></span><span class="s1">'delta_balance'</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">balance</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="k">event</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">account</span><span class="p">;</span>
<span class="go"> account | balance</span>
<span class="go">---------+---------</span>
<span class="go"> 1 | -30</span>
</pre></div>
<p>The <code>data</code> field contains information specific to each type of event. To extract the value of <code>delta_balance</code> from the <code>data</code> column we use the <a href="https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-JSON-OP-TABLE" rel="noopener">arrow operator provided by PostgreSQL</a>.</p>
<p>The result of the query shows that the current balance of account 1 is -30. This means the account is in overdraft. To check if this is within the allowed range, we need to compare it to the credit set for this account. The credit for account 1 was initially set to 0 when the account was created. The credit was then adjusted twice, and is currently set to 100.</p>
<p>To get the current state of an account, we need its aggregated balance and the latest credit that was set for it.</p>
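<p>The check itself is simple once both values are known. A minimal sketch, assuming "within the allowed range" means the balance may not drop below the negative of the credit (the function name <code>is_within_credit</code> is hypothetical):</p>

```python
def is_within_credit(balance: int, credit: int) -> bool:
    """Return True if the overdraft does not exceed the allowed credit."""
    # Credit is the maximum amount the user can overdraw,
    # so the lowest allowed balance is -credit.
    return balance >= -credit


# Account 1: balance of -30 against the current credit of 100.
print(is_within_credit(balance=-30, credit=100))  # True

# The same balance against the initial credit of 0 would not be allowed.
print(is_within_credit(balance=-30, credit=0))  # False
```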
<h2 id="the-problem"><a class="toclink" href="#the-problem">The Problem</a></h2>
<p>In Oracle there is a function called <a href="https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions071.htm" rel="noopener"><code>last</code></a> we can use to get the last <code>credit_set</code> event. A query using <code>last</code> might look like this:</p>
<div class="highlight"><pre><span></span><span class="c1">-- Oracle</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="n">account</span><span class="p">,</span>
<span class="w"> </span><span class="k">MAX</span><span class="p">(</span><span class="k">CASE</span><span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'credit_set'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="k">data</span><span class="w"> </span><span class="k">ELSE</span><span class="w"> </span><span class="k">null</span><span class="w"> </span><span class="k">END</span><span class="p">)</span>
<span class="w"> </span><span class="n">KEEP</span><span class="w"> </span><span class="p">(</span><span class="n">DENSE_RANK</span><span class="w"> </span><span class="k">LAST</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">credit</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">event</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">account</span><span class="p">;</span>
</pre></div>
<p>PostgreSQL also has a <a href="https://www.postgresql.org/docs/current/functions-window.html#id-1.5.8.26.6.2.2.10.1.1" rel="noopener"><code>LAST_VALUE</code></a> analytic function. However, analytic functions cannot be used in a group by the way aggregate functions can:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">account</span><span class="p">,</span>
<span class="w"> </span><span class="n">LAST_VALUE</span><span class="p">(</span><span class="k">data</span><span class="p">)</span><span class="w"> </span><span class="k">OVER</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">PARTITION</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">account</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">id</span>
<span class="w"> </span><span class="k">RANGE</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="k">UNBOUNDED</span><span class="w"> </span><span class="k">PRECEDING</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="k">UNBOUNDED</span><span class="w"> </span><span class="k">FOLLOWING</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">credit</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="k">event</span>
<span class="k">WHERE</span>
<span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'credit_set'</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">account</span><span class="p">;</span>
<span class="gs">ERROR:</span><span class="gr"> column "event.data" must appear in the GROUP BY clause or be used in an aggregate function</span>
<span class="gs">LINE 3:</span><span class="gr"> LAST_VALUE(data) OVER (</span>
</pre></div>
<p>The error tells us that the <code>data</code> field used in the analytic function must be used in the group by. This is not really what we want. To use PostgreSQL's <code>LAST_VALUE</code> function, we need to remove the group by:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">account</span><span class="p">,</span>
<span class="w"> </span><span class="n">LAST_VALUE</span><span class="p">(</span><span class="k">data</span><span class="p">)</span><span class="w"> </span><span class="k">OVER</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">PARTITION</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">account</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">id</span>
<span class="w"> </span><span class="k">RANGE</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="k">UNBOUNDED</span><span class="w"> </span><span class="k">PRECEDING</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="k">UNBOUNDED</span><span class="w"> </span><span class="k">FOLLOWING</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">credit</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="k">event</span>
<span class="k">WHERE</span>
<span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'credit_set'</span><span class="p">;</span>
<span class="go"> account | credit</span>
<span class="go">---------+-----------------</span>
<span class="go"> 1 | {"credit": 100}</span>
<span class="go"> 1 | {"credit": 100}</span>
</pre></div>
<p>These are not exactly the results we need. Analytic functions, also called window functions, return a value for every row in the set rather than one row per group, so we get one row for each <code>credit_set</code> event.</p>
<p><strong>PostgreSQL doesn't have a built-in function to obtain the first or last value in a group using group by.</strong></p>
<p>To get the last value in a group by, we need to get creative!</p>
<hr>
<h2 id="old-fashioned-sql"><a class="toclink" href="#old-fashioned-sql">Old Fashioned SQL</a></h2>
<p>The plain SQL solution is to divide and conquer. We already have a query to get the current balance of an account. If we write another query to get the credit for each account, we can join the two together and get the complete state of an account.</p>
<p>To get the last event for each account in PostgreSQL we can use <a href="https://www.postgresql.org/docs/current/sql-select.html#SQL-DISTINCT" rel="noopener"><code>DISTINCT ON</code></a>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="k">DISTINCT</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="p">(</span><span class="n">account</span><span class="p">)</span>
<span class="w"> </span><span class="n">account</span><span class="p">,</span>
<span class="w"> </span><span class="k">data</span><span class="o">-></span><span class="s1">'credit'</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">credit</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="k">event</span>
<span class="k">WHERE</span>
<span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'credit_set'</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">account</span><span class="p">,</span>
<span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="k">DESC</span><span class="p">;</span>
<span class="go">account | credit</span>
<span class="go">---------+--------</span>
<span class="go"> 1 | 100</span>
</pre></div>
<p>Great! Using <code>DISTINCT ON</code> we got the last credit set for each account.</p>
<div class="admonition info">
<p class="admonition-title">DISTINCT ON</p>
<p>I've written about <a href="/the-many-faces-of-distinct-in-postgre-sql">the many faces of DISTINCT in PostgreSQL</a>.</p>
</div>
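<p>If <code>DISTINCT ON</code> is unfamiliar, its effect in this query can be approximated in Python: sort the rows by account and descending id, then keep only the first row of each account. This is a rough analogy rather than a description of how PostgreSQL executes the query, and the function name <code>last_credit_per_account</code> is made up for the example:</p>

```python
from itertools import groupby
from operator import itemgetter

# (id, account, credit) for the credit_set events in the sample data.
credit_events = [(4, 1, 50), (6, 1, 100)]


def last_credit_per_account(rows):
    # Equivalent of ORDER BY account, id DESC.
    ordered = sorted(rows, key=lambda row: (row[1], -row[0]))
    # Equivalent of DISTINCT ON (account): keep the first row of each group.
    return {
        account: next(group)[2]
        for account, group in groupby(ordered, key=itemgetter(1))
    }


print(last_credit_per_account(credit_events))  # {1: 100}
```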
<p>The next step is to join the two queries, and get the complete account state:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">a</span><span class="mf">.</span><span class="n">account</span><span class="p">,</span>
<span class="w"> </span><span class="n">a</span><span class="mf">.</span><span class="n">balance</span><span class="p">,</span>
<span class="w"> </span><span class="n">b</span><span class="mf">.</span><span class="n">credit</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">account</span><span class="p">,</span>
<span class="w"> </span><span class="n">SUM</span><span class="p">((</span><span class="k">data</span><span class="o">-></span><span class="s1">'delta_balance'</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">balance</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="k">event</span>
<span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">account</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">a</span>
<span class="w"> </span><span class="k">JOIN</span>
<span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="k">DISTINCT</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="p">(</span><span class="n">account</span><span class="p">)</span>
<span class="w"> </span><span class="n">account</span><span class="p">,</span>
<span class="w"> </span><span class="k">data</span><span class="o">-></span><span class="s1">'credit'</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">credit</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="k">event</span>
<span class="w"> </span><span class="k">WHERE</span>
<span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'credit_set'</span>
<span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">account</span><span class="p">,</span>
<span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="k">DESC</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">b</span>
<span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">a</span><span class="mf">.</span><span class="n">account</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span><span class="mf">.</span><span class="n">account</span><span class="p">;</span>
<span class="go"> account | balance | credit</span>
<span class="go">---------+---------+--------</span>
<span class="go"> 1 | -30 | 100</span>
</pre></div>
<p>We got the expected result.</p>
<p>Before we move on, let's take a glance at the execution plan:</p>
<div class="highlight"><pre><span></span> QUERY PLAN
---------------------------------------------------------------------------------------------
Hash Join (cost=44.53..49.07 rows=4 width=44)
Hash Cond: (event.account = b.account)
-> HashAggregate (cost=25.00..27.00 rows=200 width=12)
Group Key: event.account
-> Seq Scan on event (cost=0.00..17.50 rows=750 width=36)
-> Hash (cost=19.49..19.49 rows=4 width=36)
-> Subquery Scan on b (cost=19.43..19.49 rows=4 width=36)
-> Unique (cost=19.43..19.45 rows=4 width=40)
-> Sort (cost=19.43..19.44 rows=4 width=40)
Sort Key: event_1.account, event_1.id DESC
-> Seq Scan on event event_1 (cost=0.00..19.39 rows=4 width=40)
Filter: (type = 'credit_set'::text)
</pre></div>
<p>The event table is being scanned twice, once for each subquery. The <code>DISTINCT ON</code> subquery also requires a sort by <code>account</code> and <code>id</code>. The two subqueries are then joined using a hash-join.</p>
<p><strong>PostgreSQL is unable to combine the two subqueries into a single scan of the table.</strong> If the event table is very large, performing two full table scans, and a sort and a hash join, might become slow and consume a lot of memory.</p>
<div class="admonition info">
<p class="admonition-title">common table expression</p>
<p>It's tempting to use a common table expression (CTE) to make the query more readable. However, <a href="/be-careful-with-cte-in-postgre-sql">CTEs are currently optimization fences</a>, and using one here will most definitely prevent PostgreSQL from performing any optimization that involves both subqueries.</p>
</div>
<hr>
<h2 id="the-array-trick"><a class="toclink" href="#the-array-trick">The Array Trick</a></h2>
<p>Using good ol' SQL got us the answer, but it took two passes on the table. We can do better with the following trick:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">account</span><span class="p">,</span>
<span class="w"> </span><span class="n">SUM</span><span class="p">((</span><span class="k">data</span><span class="o">-></span><span class="s1">'delta_balance'</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">balance</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="p">(</span><span class="n">MAX</span><span class="p">(</span><span class="k">ARRAY</span><span class="p">[</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="k">data</span><span class="o">-></span><span class="s1">'credit'</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="p">])</span><span class="w"> </span><span class="k">FILTER</span><span class="w"> </span><span class="p">(</span><span class="k">WHERE</span><span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'credit_set'</span><span class="p">))[</span><span class="mf">2</span><span class="p">]</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">credit</span>
</span><span class="k">FROM</span>
<span class="w"> </span><span class="k">event</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">account</span><span class="p">;</span>
<span class="go"> account | balance | credit</span>
<span class="go">---------+---------+--------</span>
<span class="go"> 1 | -30 | 100</span>
</pre></div>
<p>This is much simpler than the previous query, so let's break it down.</p>
<h3 id="how-postgresql-compares-arrays"><a class="toclink" href="#how-postgresql-compares-arrays">How PostgreSQL Compares Arrays</a></h3>
<p>To understand what exactly is going on here, we first need to understand <a href="https://www.postgresql.org/docs/current/functions-array.html" rel="noopener">how PostgreSQL compares arrays</a>:</p>
<blockquote>
<p>Array comparisons compare the array contents element-by-element, using the default B-tree comparison function for the element data type.</p>
</blockquote>
<p>When comparing arrays, PostgreSQL will go element by element and compare the values according to their type. To demonstrate:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="k">greatest</span><span class="p">(</span><span class="k">ARRAY</span><span class="p">[</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">200</span><span class="p">],</span><span class="w"> </span><span class="k">ARRAY</span><span class="p">[</span><span class="mf">2</span><span class="p">,</span><span class="w"> </span><span class="mf">100</span><span class="p">]);</span>
<span class="go"> greatest</span>
<span class="go">----------</span>
<span class="go"> {2,100}</span>
</pre></div>
<p>The first element of the second array (2) is larger than the first element of the first array (1), so it's the greatest.</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="k">greatest</span><span class="p">(</span><span class="k">ARRAY</span><span class="p">[</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">200</span><span class="p">],</span><span class="w"> </span><span class="k">ARRAY</span><span class="p">[</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">201</span><span class="p">]);</span>
<span class="go"> greatest</span>
<span class="go">----------</span>
<span class="go"> {1,201}</span>
<span class="go">(1 row)</span>
</pre></div>
<p>The first elements of both arrays are equal (1), so PostgreSQL moves on to the next element. In this case, the second element of the second array (201) is the greatest.</p>
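<p>The same ordering applies to the regular comparison operators, which is a quick way to check how a specific pair of arrays compares. A minimal sketch using literal arrays rather than the <code>event</code> table (the column aliases are made up for the example):</p>
<div class="highlight"><pre><span></span>-- Arrays are compared element by element, left to right.
SELECT
    -- 1 &lt; 2, so the first element already decides the comparison
    ARRAY[1, 200] &lt; ARRAY[2, 100] AS first_element_decides,
    -- the first elements are equal, so the second element breaks the tie
    ARRAY[1, 200] &lt; ARRAY[1, 201] AS second_element_breaks_tie;
</pre></div>
<p>Both comparisons evaluate to true, which matches the results of <code>greatest</code> above.</p>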
<h3 id="max-by-key"><a class="toclink" href="#max-by-key">Max by Key...</a></h3>
<p>Using this feature of PostgreSQL, we construct an array where the first element is the value to sort by, and the second element is the value we want to keep. In our case, we want to get the credit by the max <code>id</code>:</p>
<div class="highlight"><pre><span></span><span class="k">MAX</span><span class="p">(</span><span class="nb">ARRAY</span><span class="p">[</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="k">data</span><span class="o">-></span><span class="s1">'credit'</span><span class="p">)::</span><span class="nb">int</span><span class="p">])</span>
</pre></div>
<p>Not all events set credit, so we need to restrict the result to <code>credit_set</code> events:</p>
<div class="highlight"><pre><span></span><span class="k">MAX</span><span class="p">(</span><span class="nb">ARRAY</span><span class="p">[</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="k">data</span><span class="o">-></span><span class="s1">'credit'</span><span class="p">)::</span><span class="nb">int</span><span class="p">])</span><span class="w"> </span><span class="n">FILTER</span><span class="w"> </span><span class="p">(</span><span class="k">WHERE</span><span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'credit_set'</span><span class="p">)</span>
</pre></div>
<p>The result of this expression is an array:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">account</span><span class="p">,</span>
<span class="w"> </span><span class="n">MAX</span><span class="p">(</span><span class="k">ARRAY</span><span class="p">[</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="k">data</span><span class="o">-></span><span class="s1">'credit'</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="p">])</span><span class="w"> </span><span class="k">FILTER</span><span class="w"> </span><span class="p">(</span><span class="k">WHERE</span><span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'credit_set'</span><span class="p">)</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="k">event</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">account</span><span class="p">;</span>
<span class="go"> account | max</span>
<span class="go">---------+---------</span>
<span class="go"> 1 | {6,100}</span>
</pre></div>
<p>We only want the second element, the value of <code>credit</code>:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="k">MAX</span><span class="p">(</span><span class="nb">ARRAY</span><span class="p">[</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="k">data</span><span class="o">-></span><span class="s1">'credit'</span><span class="p">)::</span><span class="nb">int</span><span class="p">])</span><span class="w"> </span><span class="n">FILTER</span><span class="w"> </span><span class="p">(</span><span class="k">WHERE</span><span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'credit_set'</span><span class="p">))[</span><span class="mi">2</span><span class="p">]</span>
</pre></div>
<p>And this is it! This way we can get the last credit set for each account.</p>
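<p>The same construction should work in the other direction: swapping <code>MAX</code> for <code>MIN</code> picks the array with the smallest <code>id</code>, which gives the first credit set rather than the last. A sketch against the same <code>event</code> table (the <code>first_credit</code> and <code>last_credit</code> aliases are just illustrative names):</p>
<div class="highlight"><pre><span></span>SELECT
    account,
    -- smallest id among credit_set events: the first credit that was set
    (MIN(ARRAY[id, (data-&gt;'credit')::int]) FILTER (WHERE type = 'credit_set'))[2] AS first_credit,
    -- largest id among credit_set events: the last credit that was set
    (MAX(ARRAY[id, (data-&gt;'credit')::int]) FILTER (WHERE type = 'credit_set'))[2] AS last_credit
FROM
    event
GROUP BY
    account;
</pre></div>
<p>Both values come out of a single pass over the table. The rest of this section continues with the <code>MAX</code> version.</p>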
<p>Next, let's examine the execution plan:</p>
<div class="highlight"><pre><span></span> QUERY PLAN
---------------------------------------------------------------
HashAggregate (cost=32.50..34.50 rows=200 width=16)
Group Key: account
-> Seq Scan on event (cost=0.00..17.50 rows=750 width=72)
</pre></div>
<p>Simple plan for a simple query!</p>
<p>The main benefit of this approach is that it needs only one pass over the table and no sort.</p>
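<p>To reproduce a plan like the one above, the query can be prefixed with <code>EXPLAIN</code>. This is a sketch using the same query as before; the costs and row estimates will differ depending on the data and the table statistics:</p>
<div class="highlight"><pre><span></span>-- EXPLAIN only plans the query, it does not execute it.
EXPLAIN
SELECT
    account,
    SUM((data-&gt;'delta_balance')::int) AS balance,
    (MAX(ARRAY[id, (data-&gt;'credit')::int]) FILTER (WHERE type = 'credit_set'))[2] AS credit
FROM
    event
GROUP BY
    account;
</pre></div>
<p>Using <code>EXPLAIN (ANALYZE)</code> instead also executes the query and reports actual timings, which is usually more informative on a table with realistic amounts of data.</p>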
<h3 id="caveats"><a class="toclink" href="#caveats">Caveats</a></h3>
<p>This approach is very useful, but it has some restrictions:</p>
<div class="admonition warning">
<p class="admonition-title">caveat</p>
<p>All elements must be of the same type.</p>
</div>
<p>PostgreSQL does not support <a href="https://www.postgresql.org/docs/current/arrays.html#ARRAYS-DECLARATION" rel="noopener">arrays with different types of elements</a>. As an example, if we wanted to get the last credit set by date, and not by id:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">account</span><span class="p">,</span>
<span class="w"> </span><span class="n">SUM</span><span class="p">((</span><span class="k">data</span><span class="o">-></span><span class="s1">'delta_balance'</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">balance</span><span class="p">,</span>
<span class="w"> </span><span class="p">(</span><span class="n">MAX</span><span class="p">(</span>
<span class="hll"><span class="w"> </span><span class="k">ARRAY</span><span class="p">[</span><span class="n">happened_at</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="k">data</span><span class="o">-></span><span class="s1">'credit'</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="p">]</span>
</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">FILTER</span><span class="w"> </span><span class="p">(</span><span class="k">WHERE</span><span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'credit_set'</span><span class="p">))[</span><span class="mf">2</span><span class="p">]</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">credit</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="k">event</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">account</span><span class="p">;</span>
<span class="hll"><span class="gs">ERROR:</span><span class="gr"> ARRAY types timestamp with time zone and integer cannot be matched</span>
</span><span class="gp">LINE 4: (MAX(ARRAY[happened_at, (data-></span><span class="s1">'credit'</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="p">])</span><span class="w"> </span><span class="k">FILTER</span><span class="mf">...</span>
</pre></div>
<p>PostgreSQL tells us that an array cannot contain both timestamps and integers.</p>
<p>We can overcome this restriction in some cases by casting one of the elements:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">account</span><span class="p">,</span>
<span class="w"> </span><span class="n">SUM</span><span class="p">((</span><span class="k">data</span><span class="o">-></span><span class="s1">'delta_balance'</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">balance</span><span class="p">,</span>
<span class="w"> </span><span class="p">(</span><span class="n">MAX</span><span class="p">(</span>
<span class="hll"><span class="w"> </span><span class="k">ARRAY</span><span class="p">[</span><span class="k">EXTRACT</span><span class="p">(</span><span class="s1">'EPOCH'</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">happened_at</span><span class="p">),</span><span class="w"> </span><span class="p">(</span><span class="k">data</span><span class="o">-></span><span class="s1">'credit'</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="p">]</span>
</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">FILTER</span><span class="w"> </span><span class="p">(</span><span class="k">WHERE</span><span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'credit_set'</span><span class="p">))[</span><span class="mf">2</span><span class="p">]</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">credit</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="k">event</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">account</span><span class="p">;</span>
<span class="go"> account | balance | credit</span>
<span class="go">---------+---------+--------</span>
<span class="go"> 1 | -30 | 100</span>
</pre></div>
<p>We <a href="https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT" rel="noopener">converted the timestamp to epoch</a>, which is the number of seconds since 1970. Once both elements are of the same type, we can use the array trick.</p>
<div class="admonition warning">
<p class="admonition-title">caveat</p>
<p>Query might consume a little more memory.</p>
</div>
<p>This one is a bit of a stretch, but as <a href="how-we-solved-a-storage-problem-in-postgre-sql-without-adding-a-single-bytes-of-storage">we demonstrated in the past</a>, large group and sort keys consume more memory in joins and sorts. Using the array trick, the value being aggregated is an array, which is a bit larger than the plain fields we usually sort by.</p>
<p>Also, the array trick can be used to "piggyback" more than one value:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="k">MAX</span><span class="p">(</span><span class="nb">ARRAY</span><span class="p">[</span>
<span class="w"> </span><span class="n">id</span><span class="p">,</span>
<span class="w"> </span><span class="p">(</span><span class="k">data</span><span class="o">-></span><span class="s1">'credit'</span><span class="p">)::</span><span class="nb">int</span><span class="p">,</span>
<span class="w"> </span><span class="k">EXTRACT</span><span class="p">(</span><span class="s1">'EPOCH'</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">happened_at</span><span class="p">)</span>
<span class="p">])</span><span class="w"> </span><span class="n">FILTER</span><span class="w"> </span><span class="p">(</span><span class="k">WHERE</span><span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'credit_set'</span><span class="p">))[</span><span class="mi">2</span><span class="p">:]</span>
</pre></div>
<p>This expression returns both the last credit set and the date on which it was set. The entire array is used for the comparison, so the more values we put in the array, the larger the aggregated value gets.</p>
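<p>To turn the piggybacked values back into separate columns, one option is to compute the array in a subquery and unpack it in the outer query. This is a sketch, assuming the same <code>event</code> table; the names <code>latest</code>, <code>last_credit</code> and <code>credit_set_at</code> are made up for the example. Because an array holds a single element type, the credit comes back in the type of the epoch value, so it is cast back to <code>int</code>, and <code>to_timestamp</code> converts the epoch back to a timestamp:</p>
<div class="highlight"><pre><span></span>SELECT
    account,
    last_credit[2]::int AS credit,
    to_timestamp(last_credit[3]::double precision) AS credit_set_at
FROM (
    SELECT
        account,
        -- element 1 is the sort key, elements 2 and 3 are the piggybacked values
        MAX(ARRAY[
            id,
            (data-&gt;'credit')::int,
            EXTRACT('EPOCH' FROM happened_at)
        ]) FILTER (WHERE type = 'credit_set') AS last_credit
    FROM
        event
    GROUP BY
        account
) AS latest;
</pre></div>
<p>Accounts with no <code>credit_set</code> events get a <code>NULL</code> array from the filtered aggregate, so both unpacked columns are <code>NULL</code> for them.</p>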
<h2 id="conclusion"><a class="toclink" href="#conclusion">Conclusion</a></h2>
<p>The array trick is very useful, and can significantly simplify complicated queries and improve performance. We use it to produce reports on time series data, and to generate read models from event tables.</p>
<hr>
<div class="admonition tip">
<p class="admonition-title">Call out to my readers</p>
<p>I'm pretty sure that with a <a href="https://www.postgresql.org/docs/current/xaggr.html" rel="noopener">user-defined aggregate function in PostgreSQL</a> it should be possible to create a function with the signature <code>MAX_BY(key, value)</code>. I haven't had time to dig deep into custom aggregate functions, but if any of the readers do, please share your implementation and I'll be happy to post it here.</p>
</div>
<hr>
<p><em>UPDATED: Aug 17, 2019</em></p>
<h2 id="comments-from-readers"><a class="toclink" href="#comments-from-readers">Comments From Readers</a></h2>
<p>In the few days following the publication of this article, I received several suggestions and comments from readers. This is a summary of the comments I received, and my thoughts on them.</p>
<hr>
<p>One <a href="https://www.reddit.com/r/PostgreSQL/comments/cpskf7/how_to_get_the_first_or_last_value_in_a_group/ewsbdqo/" rel="noopener">commenter on Reddit</a> suggested using <code>ARRAY_AGG</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">account</span><span class="p">,</span>
<span class="w"> </span><span class="p">((</span><span class="n">ARRAY_AGG</span><span class="p">(</span><span class="k">data</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">happened_at</span><span class="w"> </span><span class="k">DESC</span><span class="p">)</span>
<span class="w"> </span><span class="k">FILTER</span><span class="w"> </span><span class="p">(</span><span class="k">WHERE</span><span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'credit_set'</span><span class="p">))[</span><span class="mf">1</span><span class="p">]</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="s1">'credit'</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">credit</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="k">event</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">account</span><span class="p">;</span>
<span class="go"> account | credit</span>
<span class="go">---------+--------</span>
<span class="go"> 1 | 50</span>
</pre></div>
<p>This approach obviously works, and it doesn't require the key and the value to be of the same type, which is a big limitation of the array trick.</p>
<p>The downside to this approach is that it requires a sort, which might become expensive with very large data sets:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">QUERY</span><span class="w"> </span><span class="n">PLAN</span>
<span class="c1">------------------------------------------------------------------</span>
<span class="w"> </span><span class="n">GroupAggregate</span><span class="w"> </span><span class="p">(</span><span class="k">cost</span><span class="o">=</span><span class="mf">1.17..1.36</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">7</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mf">8</span><span class="p">)</span>
<span class="w"> </span><span class="k">Group</span><span class="w"> </span><span class="k">Key</span><span class="p">:</span><span class="w"> </span><span class="n">account</span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Sort</span><span class="w"> </span><span class="p">(</span><span class="k">cost</span><span class="o">=</span><span class="mf">1.17..1.19</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">7</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mf">76</span><span class="p">)</span>
<span class="w"> </span><span class="n">Sort</span><span class="w"> </span><span class="k">Key</span><span class="p">:</span><span class="w"> </span><span class="n">account</span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Seq</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="k">event</span><span class="w"> </span><span class="p">(</span><span class="k">cost</span><span class="o">=</span><span class="mf">0.00..1.07</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">7</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mf">76</span><span class="p">)</span>
</pre></div>
<hr>
<p>Another <a href="https://www.reddit.com/r/PostgreSQL/comments/cpskf7/how_to_get_the_first_or_last_value_in_a_group/ewu6juf/" rel="noopener">commenter on Reddit</a> suggested using window functions in combination with <code>DISTINCT ON</code>. This is the original suggestion:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="k">DISTINCT</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="p">(</span><span class="n">account</span><span class="p">)</span>
<span class="w"> </span><span class="n">account</span><span class="p">,</span>
<span class="w"> </span><span class="n">FIRST_VALUE</span><span class="p">((</span><span class="k">data</span><span class="o">-></span><span class="s1">'credit'</span><span class="p">)::</span><span class="nb">int</span><span class="p">)</span><span class="w"> </span><span class="n">OVER</span><span class="w"> </span><span class="n">w</span><span class="p">,</span>
<span class="w"> </span><span class="n">LAST_VALUE</span><span class="p">((</span><span class="k">data</span><span class="o">-></span><span class="s1">'credit'</span><span class="p">)::</span><span class="nb">int</span><span class="p">)</span><span class="w"> </span><span class="n">OVER</span><span class="w"> </span><span class="n">w</span><span class="p">,</span>
<span class="w"> </span><span class="k">SUM</span><span class="p">((</span><span class="k">data</span><span class="o">-></span><span class="s1">'credit'</span><span class="p">)::</span><span class="nb">int</span><span class="p">)</span><span class="w"> </span><span class="n">OVER</span><span class="w"> </span><span class="n">w</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">event</span>
<span class="k">WHERE</span>
<span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'credit_set'</span>
<span class="n">WINDOW</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">PARTITION</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">account</span>
<span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">id</span>
<span class="w"> </span><span class="k">ROWS</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="n">UNBOUNDED</span><span class="w"> </span><span class="n">PRECEDING</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">UNBOUNDED</span><span class="w"> </span><span class="n">FOLLOWING</span>
<span class="p">)</span>
</pre></div>
<p>The query uses both <code>DISTINCT ON</code> and window functions. It works by calculating the aggregates using the window functions on the entire set (all rows of the account), and then fetching the first or last row using <code>DISTINCT ON</code>.</p>
<p>To make the window functions behave like a "group by" and calculate the aggregates on the entire set, the bound is defined as <code>BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING</code>, meaning "for the entire partition".</p>
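<p>The effect of the frame is easiest to see on a tiny data set. The following sketch uses a literal three-row set instead of the <code>event</code> table, and the aliases <code>default_frame</code> and <code>full_frame</code> are made up for the example:</p>
<div class="highlight"><pre><span></span>SELECT
    id,
    -- with ORDER BY and no explicit frame, the frame ends at the current row
    LAST_VALUE(id) OVER (ORDER BY id) AS default_frame,
    -- with an explicit frame, every row sees the entire partition
    LAST_VALUE(id) OVER (
        ORDER BY id
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS full_frame
FROM
    (VALUES (1), (2), (3)) AS t(id);
</pre></div>
<p>With the default frame, <code>LAST_VALUE</code> simply returns the current row's <code>id</code> (1, 2, 3), because the frame stops at the current row. With the explicit frame it returns 3 for every row, which is the "group by" behavior the query relies on.</p>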
<p>To avoid repeating the window for every aggregate, the query uses <a href="https://www.postgresql.org/docs/current/sql-select.html#SQL-WINDOW" rel="noopener">a WINDOW clause</a> to define a named window that can be used multiple times in the query.</p>
<p>This query, however, doesn't really work because the <code>WHERE</code> clause restricts it to events of type <code>credit_set</code>. To get the complete status of the account, we also need to aggregate the balance of <em>all</em> events.</p>
<p>To actually make this approach work, we need to make the following adjustments:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="k">DISTINCT</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="p">(</span><span class="n">account</span><span class="p">)</span>
<span class="w"> </span><span class="n">account</span><span class="p">,</span>
<span class="w"> </span><span class="n">LAST_VALUE</span><span class="p">((</span><span class="k">data</span><span class="o">-></span><span class="s1">'credit'</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="p">)</span><span class="w"> </span><span class="k">OVER</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">PARTITION</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">account</span>
<span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">CASE</span>
<span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'credit_set'</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="n">happened_at</span>
<span class="w"> </span><span class="k">ELSE</span><span class="w"> </span><span class="k">null</span>
<span class="w"> </span><span class="k">END</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">ASC</span><span class="w"> </span><span class="k">NULLS</span><span class="w"> </span><span class="k">FIRST</span>
<span class="w"> </span><span class="k">ROWS</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="k">UNBOUNDED</span><span class="w"> </span><span class="k">PRECEDING</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="k">UNBOUNDED</span><span class="w"> </span><span class="k">FOLLOWING</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">credit</span><span class="p">,</span>
<span class="w"> </span><span class="n">SUM</span><span class="p">(</span><span class="k">COALESCE</span><span class="p">((</span><span class="k">data</span><span class="o">-></span><span class="s1">'delta_balance'</span><span class="p">)</span><span class="o">::</span><span class="nb">int</span><span class="p">,</span><span class="w"> </span><span class="mf">0</span><span class="p">))</span><span class="w"> </span><span class="k">OVER</span><span class="w"> </span><span class="p">(</span><span class="k">PARTITION</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">account</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">balance</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="k">event</span><span class="p">;</span>
<span class="go"> account | credit | balance</span>
<span class="go">---------+--------+---------</span>
<span class="go"> 1 | 100 | -30</span>
</pre></div>
<p>We made the following changes:</p>
<ul>
<li>We had to ditch the where clause so all events are processed.</li>
<li>We also had to do some "creative sorting" to get the last <code>credit_set</code> event.</li>
<li>We removed the named window because it was no longer reused.</li>
</ul>
<p>The plan also got more complicated:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">Unique</span><span class="w"> </span><span class="p">(</span><span class="k">cost</span><span class="o">=</span><span class="mf">1.19..1.55</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">7</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mf">24</span><span class="p">)</span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">WindowAgg</span><span class="w"> </span><span class="p">(</span><span class="k">cost</span><span class="o">=</span><span class="mf">1.19..1.54</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">7</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mf">24</span><span class="p">)</span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">WindowAgg</span><span class="w"> </span><span class="p">(</span><span class="k">cost</span><span class="o">=</span><span class="mf">1.19..1.38</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">7</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mf">48</span><span class="p">)</span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Sort</span><span class="w"> </span><span class="p">(</span><span class="k">cost</span><span class="o">=</span><span class="mf">1.19..1.20</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">7</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mf">44</span><span class="p">)</span>
<span class="w"> </span><span class="n">Sort</span><span class="w"> </span><span class="k">Key</span><span class="p">:</span><span class="w"> </span><span class="n">account</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="k">CASE</span><span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="p">(</span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'credit_set'</span><span class="o">::</span><span class="nb">text</span><span class="p">)</span>
<span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="n">happened_at</span><span class="w"> </span><span class="k">ELSE</span><span class="w"> </span><span class="k">NULL</span><span class="o">::</span><span class="nb">timestamp</span><span class="w"> </span><span class="nb">with time zone</span>
<span class="w"> </span><span class="k">END</span><span class="p">)</span><span class="w"> </span><span class="k">NULLS</span><span class="w"> </span><span class="k">FIRST</span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Seq</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="k">event</span><span class="w"> </span><span class="p">(</span><span class="k">cost</span><span class="o">=</span><span class="mf">0.00..1.09</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">7</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mf">44</span><span class="p">)</span>
</pre></div>
<p>A sort, two window aggregates, and a unique step. The bottom line, in my opinion, is that this approach is harder to maintain and it yields a significantly more complicated plan. I wouldn't use it in this case. However, like the previous approach, it is not restricted by the type of the key and value.</p>
<hr>
<p>In response to <a href="https://twitter.com/mdevanr/status/1161275759786418177" rel="noopener">my tweet</a>, a reader pointed me to <a href="https://wiki.postgresql.org/wiki/First/last_(aggregate)" rel="noopener">an old wiki page</a> with an implementation of two custom aggregate functions <code>FIRST</code> and <code>LAST</code>.</p>
<p>After creating the custom aggregates in the database as instructed in the wiki page, the query can look like this:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">account</span><span class="p">,</span>
<span class="w"> </span><span class="k">SUM</span><span class="p">(</span><span class="k">COALESCE</span><span class="p">((</span><span class="k">data</span><span class="o">->></span><span class="s1">'delta_balance'</span><span class="p">)::</span><span class="nb">int</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">))</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">balance</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="k">LAST</span><span class="p">((</span><span class="k">data</span><span class="o">->></span><span class="s1">'credit'</span><span class="p">)::</span><span class="nb">int</span><span class="p">)</span><span class="w"> </span><span class="n">FILTER</span><span class="w"> </span><span class="p">(</span><span class="k">WHERE</span><span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'credit_set'</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">credit</span>
</span><span class="k">FROM</span>
<span class="w"> </span><span class="n">event</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">account</span><span class="p">;</span>
</pre></div>
<p>The main issue I found with this approach is that the order seems to be arbitrary. First and last can only be defined in the context of some order. I couldn't find a way to provide a field to sort by, so I consider this approach flawed for this use case.</p>
<hr>
<p>As I suspected, this use case is ideal for custom aggregates and extensions, and indeed, <a href="https://www.reddit.com/r/PostgreSQL/comments/cpskf7/how_to_get_the_first_or_last_value_in_a_group/ewudq11/" rel="noopener">another reader on Reddit</a> pointed me to <a href="https://pgxn.org/dist/first_last" rel="noopener">the extension "first_last"</a>. The API is roughly similar to the custom aggregate above, but it also offers a way to sort the results so the first and last are not arbitrary.</p>
<p>I did not install the extension, but the query should look something like this:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">account</span><span class="p">,</span>
<span class="w"> </span><span class="k">SUM</span><span class="p">(</span><span class="k">COALESCE</span><span class="p">((</span><span class="k">data</span><span class="o">->></span><span class="s1">'delta_balance'</span><span class="p">)::</span><span class="nb">int</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">))</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">balance</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="k">LAST</span><span class="p">((</span><span class="k">data</span><span class="o">->></span><span class="s1">'credit'</span><span class="p">)::</span><span class="nb">int</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">happened_at</span><span class="p">)</span>
</span><span class="w"> </span><span class="n">FILTER</span><span class="w"> </span><span class="p">(</span><span class="k">WHERE</span><span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'credit_set'</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">credit</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">event</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">account</span><span class="p">;</span>
</pre></div>What You Need to Know to Manage Users in Django Admin2019-08-05T00:00:00+03:002019-08-05T00:00:00+03:00Haki Benitatag:hakibenita.com,2019-08-05:/what-you-need-to-know-to-manage-users-in-django-admin<p>Have you ever stopped to think what staff users can do in your Django admin site? Did you know staff users with misconfigured permissions on the user model can make themselves superusers? Permissive permissions to staff users can cause disastrous human errors at best, and lead to major data leaks at worst.</p><hr>
<p>Have you ever stopped to think what your staff users can do in Django admin? Did you know staff users with misconfigured permissions on the user model can make themselves superusers?</p>
<p>Permissive permissions to staff users can cause disastrous human errors at best, and lead to major data leaks at worst. With the great staff of <a href="https://realpython.com" rel="noopener">RealPython</a>, I wrote about ways to protect your Django admin and make it safer for users, and staff users.</p>
<p><a href="https://realpython.com/manage-users-in-django-admin/" rel="noopener"><strong>Read "What You Need to Know to Manage Users in Django Admin" on RealPython »</strong></a></p>
<figure><img alt="What You Need to Know to Manage Users in Django Admin" src="https://hakibenita.com/images/01-what-you-need-to-know-to-manage-users-in-django-admin.jpeg"><figcaption>What You Need to Know to Manage Users in Django Admin</figcaption>
</figure>Fastest Way to Load Data Into PostgreSQL Using Python2019-07-09T00:00:00+03:002019-07-09T00:00:00+03:00Haki Benitatag:hakibenita.com,2019-07-09:/fast-load-data-python-postgresql<p>Explore the best way to import messy data from remote source into PostgreSQL using Python and Psycopg2. The data is big, fetched from a remote source, and needs to be cleaned and transformed.</p><hr>
<p>As glorified data plumbers, we are often tasked with loading data fetched from a remote source into our systems. If we are lucky, the data is serialized as JSON or YAML. When we are less fortunate, we get an Excel spreadsheet or a CSV file which is always broken in some way, can't explain it.</p>
<p>Data from large companies or old systems is somehow always encoded in a weird way, and the Sysadmins always think they do us a favour by zipping the files (please gzip) or breaking them into smaller files with random names.</p>
<p>Modern services might provide a decent API, but more often than not we need to fetch a file from an FTP, SFTP, S3 or some proprietary vault that works only on Windows.</p>
<p><strong>In this article we explore the best way to import messy data from a remote source into PostgreSQL.</strong></p>
<p>To provide a real life, workable solution, we set the following ground rules:</p>
<ol>
<li>The Data is fetched from a remote source.</li>
<li>The Data is dirty and needs to be transformed.</li>
<li>Data is big.</li>
</ol>
<p><details class="toc-container">
<summary>Table of Contents</summary></p>
<div class="toc">
<ul>
<li><a href="#setup-a-beer-brewery">Setup: A Beer Brewery</a><ul>
<li><a href="#the-data">The Data</a></li>
<li><a href="#fetch-the-data">Fetch the Data</a></li>
<li><a href="#create-a-table-in-the-database">Create a Table in the Database</a></li>
</ul>
</li>
<li><a href="#metrics">Metrics</a><ul>
<li><a href="#measuring-time">Measuring Time</a></li>
<li><a href="#measuring-memory">Measuring Memory</a></li>
<li><a href="#profile-decorator">profile Decorator</a></li>
</ul>
</li>
<li><a href="#benchmark">Benchmark</a><ul>
<li><a href="#insert-rows-one-by-one">Insert Rows One by One</a></li>
<li><a href="#execute-many">Execute Many</a></li>
<li><a href="#execute-many-from-iterator">Execute Many From Iterator</a></li>
<li><a href="#execute-batch">Execute Batch</a></li>
<li><a href="#execute-batch-from-iterator">Execute Batch From Iterator</a></li>
<li><a href="#execute-batch-from-iterator-with-page-size">Execute Batch From Iterator with Page Size</a></li>
<li><a href="#execute-values">Execute Values</a></li>
<li><a href="#execute-values-from-iterator">Execute Values From Iterator</a></li>
<li><a href="#execute-values-from-iterator-with-page-size">Execute Values From Iterator with Page Size</a></li>
<li><a href="#copy">Copy</a></li>
<li><a href="#copy-data-from-a-string-iterator">Copy Data From a String Iterator</a></li>
<li><a href="#copy-data-from-a-string-iterator-with-buffer-size">Copy Data From a String Iterator with Buffer Size</a></li>
</ul>
</li>
<li><a href="#results-summary">Results Summary</a></li>
<li><a href="#summary">Summary</a></li>
</ul>
</div>
<p></details></p>
<figure><img alt="Speedy Gonzales" src="https://hakibenita.com/images/01-fast-load-data-python-postgresql.jpg"><figcaption>Speedy Gonzales</figcaption>
</figure>
<hr>
<h2 id="setup-a-beer-brewery"><a class="toclink" href="#setup-a-beer-brewery">Setup: A Beer Brewery</a></h2>
<p>I found this <a href="https://punkapi.com/documentation/v2" rel="noopener">great public API for beers</a>, so we are going to import data to a beer table in the database.</p>
<h3 id="the-data"><a class="toclink" href="#the-data">The Data</a></h3>
<p>A single beer from the API looks like this:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>curl<span class="w"> </span>https://api.punkapi.com/v2/beers/?per_page<span class="o">=</span><span class="m">1</span><span class="p">&</span><span class="nv">page</span><span class="o">=</span><span class="m">1</span>
<span class="go">[</span>
<span class="go"> {</span>
<span class="go"> "id": 1,</span>
<span class="go"> "name": "Buzz",</span>
<span class="go"> "tagline": "A Real Bitter Experience.",</span>
<span class="go"> "first_brewed": "09/2007",</span>
<span class="go"> "description": "A light, crisp and bitter IPA ...",</span>
<span class="go"> "image_url": "https://images.punkapi.com/v2/keg.png",</span>
<span class="go"> "abv": 4.5,</span>
<span class="go"> "ibu": 60,</span>
<span class="go"> "target_fg": 1010,</span>
<span class="go"> "target_og": 1044,</span>
<span class="go"> "ebc": 20,</span>
<span class="go"> "srm": 10,</span>
<span class="go"> "ph": 4.4,</span>
<span class="go"> "attenuation_level": 75,</span>
<span class="go"> "volume": {</span>
<span class="go"> "value": 20,</span>
<span class="go"> "unit": "litres"</span>
<span class="go"> },</span>
<span class="go"> "contributed_by": "Sam Mason <samjbmason>",</span>
<span class="go"> "brewers_tips": "The earthy and floral aromas from...",</span>
<span class="go"> "boil_volume": {},</span>
<span class="go"> "method": {},</span>
<span class="go"> "ingredients": {},</span>
<span class="go"> "food_pairing": []</span>
<span class="go"> }</span>
<span class="go">]</span>
</pre></div>
<p>I trimmed the output for brevity, but there is a lot of information about beers here. In this article we want to import all of the fields up to and including <code>brewers_tips</code> to a table in the database.</p>
<p>The field <code>volume</code> is nested. We want to extract only the <code>value</code> from the field, and save it to a field called <code>volume</code> in the table.</p>
<div class="highlight"><pre><span></span><span class="n">volume</span> <span class="o">=</span> <span class="n">beer</span><span class="p">[</span><span class="s1">'volume'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">]</span>
</pre></div>
<p>The field <code>first_brewed</code> contains only year and month, and in some cases, only the year. We want to transform the value to a valid date. For example, the value <code>09/2007</code> will be transformed to date <code>2007-09-01</code>. The value <code>2006</code> will be transformed to date <code>2006-01-01</code>.</p>
<p>Let's write a simple function to transform the text value in the field, to a Python <code>datetime.date</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">datetime</span>

<span class="k">def</span> <span class="nf">parse_first_brewed</span><span class="p">(</span><span class="n">text</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">:</span>
    <span class="n">parts</span> <span class="o">=</span> <span class="n">text</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">'/'</span><span class="p">)</span>
    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">parts</span><span class="p">)</span> <span class="o">==</span> <span class="mi">2</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">parts</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="nb">int</span><span class="p">(</span><span class="n">parts</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="mi">1</span><span class="p">)</span>
    <span class="k">elif</span> <span class="nb">len</span><span class="p">(</span><span class="n">parts</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">parts</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="k">assert</span> <span class="kc">False</span><span class="p">,</span> <span class="s1">'Unknown date format'</span>
</pre></div>
<p>Let's quickly make sure that it works:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">parse_first_brewed</span><span class="p">(</span><span class="s1">'09/2007'</span><span class="p">)</span>
<span class="go">datetime.date(2007, 9, 1)</span>
<span class="gp">>>> </span><span class="n">parse_first_brewed</span><span class="p">(</span><span class="s1">'2006'</span><span class="p">)</span>
<span class="go">datetime.date(2006, 1, 1)</span>
</pre></div>
<p>In real life, transformations can be much more complicated. But for our purpose, this is more than enough.</p>
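<p>To see how the two transformations fit together, here is a minimal sketch that turns one API record into a flat row. The helper name <code>beer_to_row</code> and the reduced set of columns are assumptions made for illustration only, and the sample record is trimmed to the fields the sketch needs.</p>

```python
import datetime
from typing import Any, Dict, Tuple


def parse_first_brewed(text: str) -> datetime.date:
    # Same logic as the function above: accepts 'MM/YYYY' or 'YYYY'.
    parts = text.split('/')
    if len(parts) == 2:
        return datetime.date(int(parts[1]), int(parts[0]), 1)
    elif len(parts) == 1:
        return datetime.date(int(parts[0]), 1, 1)
    raise ValueError(f'Unknown date format: {text!r}')


def beer_to_row(beer: Dict[str, Any]) -> Tuple[Any, ...]:
    # Hypothetical helper: flatten a beer into (id, name, first_brewed, volume).
    return (
        beer['id'],
        beer['name'],
        parse_first_brewed(beer['first_brewed']),
        beer['volume']['value'],  # keep only the nested value
    )


# Trimmed sample record, based on the API output shown earlier
sample = {
    'id': 1,
    'name': 'Buzz',
    'first_brewed': '09/2007',
    'volume': {'value': 20, 'unit': 'litres'},
}
row = beer_to_row(sample)
print(row)
```

<p>Keeping the transformation in one small function makes it easy to reuse the same cleaning step with every loading method.</p>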
<h3 id="fetch-the-data"><a class="toclink" href="#fetch-the-data">Fetch the Data</a></h3>
<p>The API provides paged results. To encapsulate the paging, we create a generator that yields beers one by one:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Iterator</span><span class="p">,</span> <span class="n">Dict</span><span class="p">,</span> <span class="n">Any</span>
<span class="kn">from</span> <span class="nn">urllib.parse</span> <span class="kn">import</span> <span class="n">urlencode</span>
<span class="kn">import</span> <span class="nn">requests</span>

<span class="k">def</span> <span class="nf">iter_beers_from_api</span><span class="p">(</span><span class="n">page_size</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">5</span><span class="p">)</span> <span class="o">-></span> <span class="n">Iterator</span><span class="p">[</span><span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">]]:</span>
    <span class="n">session</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">Session</span><span class="p">()</span>
    <span class="n">page</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
        <span class="n">response</span> <span class="o">=</span> <span class="n">session</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'https://api.punkapi.com/v2/beers?'</span> <span class="o">+</span> <span class="n">urlencode</span><span class="p">({</span>
            <span class="s1">'page'</span><span class="p">:</span> <span class="n">page</span><span class="p">,</span>
            <span class="s1">'per_page'</span><span class="p">:</span> <span class="n">page_size</span>
        <span class="p">}))</span>
        <span class="n">response</span><span class="o">.</span><span class="n">raise_for_status</span><span class="p">()</span>
        <span class="n">data</span> <span class="o">=</span> <span class="n">response</span><span class="o">.</span><span class="n">json</span><span class="p">()</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">data</span><span class="p">:</span>
            <span class="k">break</span>
        <span class="k">yield from</span> <span class="n">data</span>
        <span class="n">page</span> <span class="o">+=</span> <span class="mi">1</span>
</pre></div>
<p>And to use the generator function, we call and iterate it:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">beers</span> <span class="o">=</span> <span class="n">iter_beers_from_api</span><span class="p">()</span>
<span class="gp">>>> </span><span class="nb">next</span><span class="p">(</span><span class="n">beers</span><span class="p">)</span>
<span class="go">{'id': 1,</span>
<span class="go"> 'name': 'Buzz',</span>
<span class="go"> 'tagline': 'A Real Bitter Experience.',</span>
<span class="go"> 'first_brewed': '09/2007',</span>
<span class="go"> 'description': 'A light, crisp and bitter IPA brewed...',</span>
<span class="go"> 'image_url': 'https://images.punkapi.com/v2/keg.png',</span>
<span class="go"> 'abv': 4.5,</span>
<span class="go"> 'ibu': 60,</span>
<span class="go"> 'target_fg': 1010,</span>
<span class="go">...</span>
<span class="go">}</span>
<span class="gp">>>> </span><span class="nb">next</span><span class="p">(</span><span class="n">beers</span><span class="p">)</span>
<span class="go">{'id': 2,</span>
<span class="go"> 'name': 'Trashy Blonde',</span>
<span class="go"> 'tagline': "You Know You Shouldn't",</span>
<span class="go"> 'first_brewed': '04/2008',</span>
<span class="go"> 'description': 'A titillating, ...',</span>
<span class="go"> 'image_url': 'https://images.punkapi.com/v2/2.png',</span>
<span class="go"> 'abv': 4.1,</span>
<span class="go"> 'ibu': 41.5,</span>
</pre></div>
<p>You will notice that the first result of each page takes a bit longer. This is because it does a network request to fetch the page.</p>
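<p>The lazy paging is easier to observe without a network. The sketch below repeats the same loop, but swaps the HTTP call for an injected function. The names <code>iter_paged</code> and <code>fake_fetch</code>, and the seven made-up beers, are assumptions for illustration and are not part of the article's code.</p>

```python
from typing import Any, Callable, Dict, Iterator, List

Beer = Dict[str, Any]


def iter_paged(
    fetch_page: Callable[[int, int], List[Beer]],
    page_size: int = 5,
) -> Iterator[Beer]:
    # Same loop as iter_beers_from_api, with the HTTP request replaced
    # by an injected function so it runs without a network.
    page = 1
    while True:
        data = fetch_page(page, page_size)
        if not data:
            break
        yield from data
        page += 1


ALL_BEERS: List[Beer] = [{'id': i} for i in range(1, 8)]  # 7 made-up beers
calls: List[int] = []


def fake_fetch(page: int, page_size: int) -> List[Beer]:
    # Stand-in for the remote API: record the request and return one page.
    calls.append(page)
    start = (page - 1) * page_size
    return ALL_BEERS[start:start + page_size]


beers = iter_paged(fake_fetch, page_size=5)
first = next(beers)              # triggers the request for page 1 only
calls_after_first = list(calls)
rest = list(beers)               # fetches page 2, then an empty page 3 ends the loop
```

<p>Nothing is fetched until the first <code>next</code>, and each following page is requested only when the previous one is exhausted. The final request returns an empty page, which is what ends the loop.</p>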
<h3 id="create-a-table-in-the-database"><a class="toclink" href="#create-a-table-in-the-database">Create a Table in the Database</a></h3>
<p>The next step is to create a table in the database to import the data into.</p>
<p>Create a database:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>createdb<span class="w"> </span>-O<span class="w"> </span>haki<span class="w"> </span>testload
</pre></div>
<p>Change <code>haki</code> in the example to your local user.</p>
<p>To connect from Python to a PostgreSQL database, we use <a href="http://initd.org/psycopg/" rel="noopener">psycopg</a>:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>python<span class="w"> </span>-m<span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>psycopg2
</pre></div>
<p>Using psycopg, create a connection to the database:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">psycopg2</span>
<span class="n">connection</span> <span class="o">=</span> <span class="n">psycopg2</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span>
<span class="n">host</span><span class="o">=</span><span class="s2">"localhost"</span><span class="p">,</span>
<span class="n">database</span><span class="o">=</span><span class="s2">"testload"</span><span class="p">,</span>
<span class="n">user</span><span class="o">=</span><span class="s2">"haki"</span><span class="p">,</span>
<span class="n">password</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">connection</span><span class="o">.</span><span class="n">autocommit</span> <span class="o">=</span> <span class="kc">True</span>
</pre></div>
<p>We set <a href="http://initd.org/psycopg/docs/connection.html#connection.autocommit" rel="noopener"><code>autocommit=True</code></a> so every command we execute will take effect immediately. For the purpose of this article, this is fine.</p>
<p>Now that we have a connection, we can write a function to create a table:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">create_staging_table</span><span class="p">(</span><span class="n">cursor</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
    <span class="n">cursor</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s2">"""</span>
<span class="s2">        DROP TABLE IF EXISTS staging_beers;</span>
<span class="s2">        CREATE UNLOGGED TABLE staging_beers (</span>
<span class="s2">            id INTEGER,</span>
<span class="s2">            name TEXT,</span>
<span class="s2">            tagline TEXT,</span>
<span class="s2">            first_brewed DATE,</span>
<span class="s2">            description TEXT,</span>
<span class="s2">            image_url TEXT,</span>
<span class="s2">            abv DECIMAL,</span>
<span class="s2">            ibu DECIMAL,</span>
<span class="s2">            target_fg DECIMAL,</span>
<span class="s2">            target_og DECIMAL,</span>
<span class="s2">            ebc DECIMAL,</span>
<span class="s2">            srm DECIMAL,</span>
<span class="s2">            ph DECIMAL,</span>
<span class="s2">            attenuation_level DECIMAL,</span>
<span class="s2">            brewers_tips TEXT,</span>
<span class="s2">            contributed_by TEXT,</span>
<span class="s2">            volume INTEGER</span>
<span class="s2">        );</span>
<span class="s2">    """</span><span class="p">)</span>
</pre></div>
<p>The function receives a cursor and creates an unlogged table called <code>staging_beers</code>.</p>
<div class="admonition warning">
<p class="admonition-title">UNLOGGED TABLE</p>
<p>Data written to an <a href="https://www.postgresql.org/docs/current/sql-createtable.html#id-1.9.3.85.6" rel="noopener">unlogged table</a> will not be logged to the write-ahead-log (WAL), making it ideal for intermediate tables. Note that <code>UNLOGGED</code> tables will not be restored in case of a crash, and will not be replicated.</p>
</div>
<p>Using the connection we created before, this is how the function is used:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="k">with</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span> <span class="k">as</span> <span class="n">cursor</span><span class="p">:</span>
<span class="gp">... </span>    <span class="n">create_staging_table</span><span class="p">(</span><span class="n">cursor</span><span class="p">)</span>
</pre></div>
<p>We are now ready to move on to the next part.</p>
<hr>
<h2 id="metrics"><a class="toclink" href="#metrics">Metrics</a></h2>
<p>Throughout this article we are interested in two main metrics: time and memory.</p>
<h3 id="measuring-time"><a class="toclink" href="#measuring-time">Measuring Time</a></h3>
<p>To measure time for each method we use the built-in <a href="https://docs.python.org/3/library/time.html" rel="noopener"><code>time</code> module</a>:</p>
<div class="highlight"><pre><span></span>>>> import time
>>> start = time.perf_counter()
>>> time.sleep(1) # do work
>>> elapsed = time.perf_counter() - start
>>> print(f'Time {elapsed:0.4}')
Time 1.001
</pre></div>
<p>The function <a href="https://docs.python.org/3/library/time.html#time.perf_counter" rel="noopener"><code>perf_counter</code></a> provides the clock with the highest available resolution, which makes it ideal for our purposes.</p>
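<p>If the same timing pattern is needed in several places, it can be wrapped in a small context manager. This is only a sketch: the name <code>timer</code> and the dictionary used to hand back the result are assumptions, not something the article relies on.</p>

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timer() -> Iterator[Dict[str, float]]:
    # Hypothetical helper: stores the elapsed seconds under result['elapsed'].
    result: Dict[str, float] = {}
    start = time.perf_counter()
    try:
        yield result
    finally:
        # Runs even if the timed block raises, so a value is always recorded.
        result['elapsed'] = time.perf_counter() - start


with timer() as t:
    time.sleep(0.01)  # do work

print(f"Time {t['elapsed']:0.4}")
```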
<h3 id="measuring-memory"><a class="toclink" href="#measuring-memory">Measuring Memory</a></h3>
<p>To measure memory consumption, we are going to use the package <a href="https://pypi.org/project/memory-profiler/" rel="noopener">memory-profiler</a>.</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>python<span class="w"> </span>-m<span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>memory-profiler
</pre></div>
<p>This package provides the memory usage, and the incremental memory usage for each line in the code. This is very useful when optimizing for memory. To illustrate, this is the example provided in PyPI:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>python<span class="w"> </span>-m<span class="w"> </span>memory_profiler<span class="w"> </span>example.py
<span class="go">Line # Mem usage Increment Line Contents</span>
<span class="go">==============================================</span>
<span class="go"> 3 @profile</span>
<span class="go"> 4 5.97 MB 0.00 MB def my_func():</span>
<span class="go"> 5 13.61 MB 7.64 MB a = [1] * (10 ** 6)</span>
<span class="go"> 6 166.20 MB 152.59 MB b = [2] * (2 * 10 ** 7)</span>
<span class="go"> 7 13.61 MB -152.59 MB del b</span>
<span class="go"> 8 13.61 MB 0.00 MB return a</span>
</pre></div>
<p>The interesting part is the <code>Increment</code> column that shows the additional memory allocated by the code in each line.</p>
<p>In this article we are interested in the peak memory used by the function. The peak memory is the difference between the starting value of the "Mem usage" column, and the highest value (also known as the "high watermark").</p>
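<p>As a quick illustration of that definition, the sketch below computes the peak from the "Mem usage" values in the example output above. The function name <code>peak_memory</code> is an assumption for illustration.</p>

```python
from typing import Sequence


def peak_memory(samples: Sequence[float]) -> float:
    # Peak = highest sample minus the starting sample, in the same unit (MB).
    return max(samples) - samples[0]


# "Mem usage" values taken from the memory_profiler example output above
mem_usage = [5.97, 13.61, 166.20, 13.61, 13.61]
peak = peak_memory(mem_usage)
print(f'Peak {peak:.2f} MB')  # prints: Peak 160.23 MB
```

<p>Subtracting the minimum sample instead of the first one gives the same result whenever the lowest reading is the starting one, as it is here.</p>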
<p>To get the list of "Mem usage" we use the function <code>memory_usage</code> from <code>memory_profiler</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">memory_profiler</span> <span class="kn">import</span> <span class="n">memory_usage</span>
<span class="gp">>>> </span><span class="n">mem</span><span class="p">,</span> <span class="n">retval</span> <span class="o">=</span> <span class="n">memory_usage</span><span class="p">((</span><span class="n">fn</span><span class="p">,</span> <span class="n">args</span><span class="p">,</span> <span class="n">kwargs</span><span class="p">),</span> <span class="n">retval</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">interval</span><span class="o">=</span><span class="mf">1e-7</span><span class="p">)</span>
</pre></div>
<p>When used like this, the function <code>memory_usage</code> executes the function <code>fn</code> with the provided <code>args</code> and <code>kwargs</code>, but also launches another process in the background to monitor the memory usage every <code>interval</code> seconds.</p>
<p>For very quick operations the function <code>fn</code> might be executed more than once. By setting <code>interval</code> to a value <a href="https://github.com/pythonprofilers/memory_profiler/blob/0.55/memory_profiler.py#L350" rel="noopener">lower than 1e-6</a>, we force it to execute only once.</p>
<p>The argument <code>retval</code> tells the function to return the result of <code>fn</code>.</p>
<h3 id="profile-decorator"><a class="toclink" href="#profile-decorator"><code>profile</code> Decorator</a></h3>
<p>To put it all together, we create the following decorator to measure and report time and memory:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">time</span>
<span class="kn">from</span> <span class="nn">functools</span> <span class="kn">import</span> <span class="n">wraps</span>
<span class="kn">from</span> <span class="nn">memory_profiler</span> <span class="kn">import</span> <span class="n">memory_usage</span>

<span class="k">def</span> <span class="nf">profile</span><span class="p">(</span><span class="n">fn</span><span class="p">):</span>
    <span class="nd">@wraps</span><span class="p">(</span><span class="n">fn</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">inner</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="n">fn_kwargs_str</span> <span class="o">=</span> <span class="s1">', '</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">k</span><span class="si">}</span><span class="s1">=</span><span class="si">{</span><span class="n">v</span><span class="si">}</span><span class="s1">'</span> <span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">kwargs</span><span class="o">.</span><span class="n">items</span><span class="p">())</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'</span><span class="se">\n</span><span class="si">{</span><span class="n">fn</span><span class="o">.</span><span class="vm">__name__</span><span class="si">}</span><span class="s1">(</span><span class="si">{</span><span class="n">fn_kwargs_str</span><span class="si">}</span><span class="s1">)'</span><span class="p">)</span>

        <span class="c1"># Measure time</span>
        <span class="n">t</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span>
        <span class="n">retval</span> <span class="o">=</span> <span class="n">fn</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="n">elapsed</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span> <span class="o">-</span> <span class="n">t</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'Time </span><span class="si">{</span><span class="n">elapsed</span><span class="si">:</span><span class="s1">0.4</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>

        <span class="c1"># Measure memory</span>
        <span class="n">mem</span><span class="p">,</span> <span class="n">retval</span> <span class="o">=</span> <span class="n">memory_usage</span><span class="p">((</span><span class="n">fn</span><span class="p">,</span> <span class="n">args</span><span class="p">,</span> <span class="n">kwargs</span><span class="p">),</span> <span class="n">retval</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=</span><span class="mi">200</span><span class="p">,</span> <span class="n">interval</span><span class="o">=</span><span class="mf">1e-7</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'Memory </span><span class="si">{</span><span class="nb">max</span><span class="p">(</span><span class="n">mem</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nb">min</span><span class="p">(</span><span class="n">mem</span><span class="p">)</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">retval</span>

    <span class="k">return</span> <span class="n">inner</span>
</pre></div>
<p>To eliminate mutual effects of the timing on the memory and vice versa, we execute the function twice. First to time it, second to measure the memory usage.</p>
<p>The decorator will print the function name and any keyword arguments, and report the time and memory used:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="nd">@profile</span>
<span class="gp">... </span><span class="k">def</span> <span class="nf">work</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
<span class="gp">... </span>    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
<span class="gp">... </span>        <span class="mi">2</span> <span class="o">**</span> <span class="n">n</span>
<span class="gp">>>> </span><span class="n">work</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
<span class="go">work()</span>
<span class="go">Time 0.06269</span>
<span class="go">Memory 0.0</span>
<span class="gp">>>> </span><span class="n">work</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="mi">10000</span><span class="p">)</span>
<span class="go">work(n=10000)</span>
<span class="go">Time 0.3865</span>
<span class="go">Memory 0.0234375</span>
</pre></div>
<p>Only keyword arguments are printed. This is intentional: we are going to rely on it in the parameterized tests.</p>
<hr>
<h2 id="benchmark"><a class="toclink" href="#benchmark">Benchmark</a></h2>
<p>At the time of writing, the beers API contains only 325 beers. To work on a large dataset, we duplicate it 100 times and store it in-memory. The resulting dataset contains 32,500 beers:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">beers</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">iter_beers_from_api</span><span class="p">())</span> <span class="o">*</span> <span class="mi">100</span>
<span class="gp">>>> </span><span class="nb">len</span><span class="p">(</span><span class="n">beers</span><span class="p">)</span>
<span class="go">32500</span>
</pre></div>
<p>To imitate a remote API, our functions will accept iterators similar to the return value of <code>iter_beers_from_api</code>:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">process</span><span class="p">(</span><span class="n">beers</span><span class="p">:</span> <span class="n">Iterator</span><span class="p">[</span><span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">]])</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
    <span class="c1"># Process beers...</span>
    <span class="o">...</span>
</pre></div>
<p>For the benchmark, we are going to import the beer data into the database. To eliminate external influences such as the network, we fetch the data from the API in advance, and serve it locally.</p>
<p>To get an accurate timing, we "fake" the remote API:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">beers</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">iter_beers_from_api</span><span class="p">())</span> <span class="o">*</span> <span class="mi">100</span>
<span class="gp">>>> </span><span class="n">process</span><span class="p">(</span><span class="n">beers</span><span class="p">)</span>
</pre></div>
<p>In a real life situation you would use the function <code>iter_beers_from_api</code> directly:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">process</span><span class="p">(</span><span class="n">iter_beers_from_api</span><span class="p">())</span>
</pre></div>
<p>We are now ready to start!</p>
<h3 id="insert-rows-one-by-one"><a class="toclink" href="#insert-rows-one-by-one">Insert Rows One by One</a></h3>
<p>To establish a baseline we start with the simplest approach, insert rows one by one:</p>
<div class="highlight"><pre><span></span><span class="nd">@profile</span>
<span class="k">def</span> <span class="nf">insert_one_by_one</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="n">beers</span><span class="p">:</span> <span class="n">Iterator</span><span class="p">[</span><span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">]])</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
    <span class="k">with</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span> <span class="k">as</span> <span class="n">cursor</span><span class="p">:</span>
        <span class="n">create_staging_table</span><span class="p">(</span><span class="n">cursor</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">beer</span> <span class="ow">in</span> <span class="n">beers</span><span class="p">:</span>
            <span class="n">cursor</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s2">"""</span>
<span class="s2">                INSERT INTO staging_beers VALUES (</span>
<span class="s2">                    </span><span class="si">%(id)s</span><span class="s2">,</span>
<span class="s2">                    </span><span class="si">%(name)s</span><span class="s2">,</span>
<span class="s2">                    </span><span class="si">%(tagline)s</span><span class="s2">,</span>
<span class="s2">                    </span><span class="si">%(first_brewed)s</span><span class="s2">,</span>
<span class="s2">                    </span><span class="si">%(description)s</span><span class="s2">,</span>
<span class="s2">                    </span><span class="si">%(image_url)s</span><span class="s2">,</span>
<span class="s2">                    </span><span class="si">%(abv)s</span><span class="s2">,</span>
<span class="s2">                    </span><span class="si">%(ibu)s</span><span class="s2">,</span>
<span class="s2">                    </span><span class="si">%(target_fg)s</span><span class="s2">,</span>
<span class="s2">                    </span><span class="si">%(target_og)s</span><span class="s2">,</span>
<span class="s2">                    </span><span class="si">%(ebc)s</span><span class="s2">,</span>
<span class="s2">                    </span><span class="si">%(srm)s</span><span class="s2">,</span>
<span class="s2">                    </span><span class="si">%(ph)s</span><span class="s2">,</span>
<span class="s2">                    </span><span class="si">%(attenuation_level)s</span><span class="s2">,</span>
<span class="s2">                    </span><span class="si">%(brewers_tips)s</span><span class="s2">,</span>
<span class="s2">                    </span><span class="si">%(contributed_by)s</span><span class="s2">,</span>
<span class="s2">                    </span><span class="si">%(volume)s</span>
<span class="s2">                );</span>
<span class="s2">            """</span><span class="p">,</span> <span class="p">{</span>
                <span class="o">**</span><span class="n">beer</span><span class="p">,</span>
                <span class="s1">'first_brewed'</span><span class="p">:</span> <span class="n">parse_first_brewed</span><span class="p">(</span><span class="n">beer</span><span class="p">[</span><span class="s1">'first_brewed'</span><span class="p">]),</span>
                <span class="s1">'volume'</span><span class="p">:</span> <span class="n">beer</span><span class="p">[</span><span class="s1">'volume'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">],</span>
            <span class="p">})</span>
</pre></div>
<p>Notice that as we iterate over the beers, we transform <code>first_brewed</code> to a <code>datetime.date</code> and extract the volume value from the nested <code>volume</code> field.</p>
<p>Running this function produces the following output:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">insert_one_by_one</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="n">beers</span><span class="p">)</span>
<span class="go">insert_one_by_one()</span>
<span class="go">Time 128.8</span>
<span class="go">Memory 0.08203125</span>
</pre></div>
<p>The function took 129 seconds to import 32K rows. The memory profiler shows that the function consumed very little memory.</p>
<p>Intuitively, inserting rows one by one does not sound very efficient. The constant context switching between the program and the database must be slowing it down.</p>
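<p>To put a rough number on that overhead, we can divide the measured time by the number of rows. This is only a back-of-the-envelope calculation based on the timing reported above, and the variable names are made up for illustration:</p>

```python
# Figures copied from the benchmark output above.
rows = 32_500
total_seconds = 128.8

# Average cost of a single INSERT, including the trip to the database and back.
per_row_ms = total_seconds / rows * 1000
rows_per_second = rows / total_seconds

print(f"{per_row_ms:.2f} ms per row")
print(f"{rows_per_second:.0f} rows per second")
```

<p>That is roughly 4 milliseconds per row, or about 250 rows per second. The transformation itself is trivial, so it is reasonable to suspect that most of this time is spent waiting on the database.</p>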
<h3 id="execute-many"><a class="toclink" href="#execute-many">Execute Many</a></h3>
<p>Psycopg2 provides a way to insert many rows at once using <a href="http://initd.org/psycopg/docs/cursor.html#cursor.executemany" rel="noopener"><code>executemany</code></a>. From the docs:</p>
<blockquote>
<p>Execute a database operation (query or command) against all parameter tuples or mappings found in the sequence vars_list.</p>
</blockquote>
<p>Sounds promising!</p>
<p>Let's try to import the data using <code>executemany</code>:</p>
<div class="highlight"><pre><span></span><span class="nd">@profile</span>
<span class="k">def</span> <span class="nf">insert_executemany</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="n">beers</span><span class="p">:</span> <span class="n">Iterator</span><span class="p">[</span><span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">]])</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
    <span class="k">with</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span> <span class="k">as</span> <span class="n">cursor</span><span class="p">:</span>
        <span class="n">create_staging_table</span><span class="p">(</span><span class="n">cursor</span><span class="p">)</span>
        <span class="n">all_beers</span> <span class="o">=</span> <span class="p">[{</span>
            <span class="o">**</span><span class="n">beer</span><span class="p">,</span>
            <span class="s1">'first_brewed'</span><span class="p">:</span> <span class="n">parse_first_brewed</span><span class="p">(</span><span class="n">beer</span><span class="p">[</span><span class="s1">'first_brewed'</span><span class="p">]),</span>
            <span class="s1">'volume'</span><span class="p">:</span> <span class="n">beer</span><span class="p">[</span><span class="s1">'volume'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">],</span>
        <span class="p">}</span> <span class="k">for</span> <span class="n">beer</span> <span class="ow">in</span> <span class="n">beers</span><span class="p">]</span>
        <span class="n">cursor</span><span class="o">.</span><span class="n">executemany</span><span class="p">(</span><span class="s2">"""</span>
<span class="s2">            INSERT INTO staging_beers VALUES (</span>
<span class="s2">                </span><span class="si">%(id)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(name)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(tagline)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(first_brewed)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(description)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(image_url)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(abv)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(ibu)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(target_fg)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(target_og)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(ebc)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(srm)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(ph)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(attenuation_level)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(brewers_tips)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(contributed_by)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(volume)s</span>
<span class="s2">            );</span>
<span class="s2">        """</span><span class="p">,</span> <span class="n">all_beers</span><span class="p">)</span>
</pre></div>
<p>The function looks very similar to the previous function, and the transformations are the same. The main difference here is that we first transform all of the data in-memory, and only then import it to the database.</p>
<p>Running this function produces the following output:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">insert_executemany</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="n">beers</span><span class="p">)</span>
<span class="go">insert_executemany()</span>
<span class="go">Time 124.7</span>
<span class="go">Memory 2.765625</span>
</pre></div>
<p>This is disappointing. The timing is just a little bit better, but the function now consumes 2.7MB of memory.</p>
<p>To put the memory usage in perspective, a JSON file containing only the data we import weighs 25MB on disk. Considering the proportion, using this method to import a 1GB file will require 110MB of memory.</p>
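<p>The extrapolation above is simple proportional scaling. As a sketch, assuming memory grows linearly with the size of the input (an assumption, not something the benchmark verifies) and taking 1GB as 1,000MB:</p>

```python
# Figures copied from the text and the profiler output above.
measured_memory_mb = 2.765625  # memory reported for insert_executemany
dataset_size_mb = 25           # size of the equivalent JSON file on disk

# Memory used per MB of input data.
ratio = measured_memory_mb / dataset_size_mb

# Projected memory for a 1GB input, assuming linear growth.
estimate_for_1gb_mb = ratio * 1000

print(f"{ratio:.1%} of the input size")
print(f"{estimate_for_1gb_mb:.1f} MB for a 1GB file")
```

<p>The ratio is about 11% of the input size, which gives roughly 110MB for a 1GB file.</p>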
<h3 id="execute-many-from-iterator"><a class="toclink" href="#execute-many-from-iterator">Execute Many From Iterator</a></h3>
<p>The previous method consumed a lot of memory because the transformed data was stored in-memory before being processed by psycopg.</p>
<p>Let's see if we can use an iterator to avoid storing the data in-memory:</p>
<div class="highlight"><pre><span></span><span class="nd">@profile</span>
<span class="k">def</span> <span class="nf">insert_executemany_iterator</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="n">beers</span><span class="p">:</span> <span class="n">Iterator</span><span class="p">[</span><span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">]])</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
    <span class="k">with</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span> <span class="k">as</span> <span class="n">cursor</span><span class="p">:</span>
        <span class="n">create_staging_table</span><span class="p">(</span><span class="n">cursor</span><span class="p">)</span>
        <span class="n">cursor</span><span class="o">.</span><span class="n">executemany</span><span class="p">(</span><span class="s2">"""</span>
<span class="s2">            INSERT INTO staging_beers VALUES (</span>
<span class="s2">                </span><span class="si">%(id)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(name)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(tagline)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(first_brewed)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(description)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(image_url)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(abv)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(ibu)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(target_fg)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(target_og)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(ebc)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(srm)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(ph)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(attenuation_level)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(brewers_tips)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(contributed_by)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(volume)s</span>
<span class="s2">            );</span>
<span class="s2">        """</span><span class="p">,</span> <span class="p">({</span>
            <span class="o">**</span><span class="n">beer</span><span class="p">,</span>
            <span class="s1">'first_brewed'</span><span class="p">:</span> <span class="n">parse_first_brewed</span><span class="p">(</span><span class="n">beer</span><span class="p">[</span><span class="s1">'first_brewed'</span><span class="p">]),</span>
            <span class="s1">'volume'</span><span class="p">:</span> <span class="n">beer</span><span class="p">[</span><span class="s1">'volume'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">],</span>
        <span class="p">}</span> <span class="k">for</span> <span class="n">beer</span> <span class="ow">in</span> <span class="n">beers</span><span class="p">))</span>
</pre></div>
<p>The difference here is that the transformed data is "streamed" into <code>executemany</code> using an iterator.</p>
<p>This function produces the following result:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">insert_executemany_iterator</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="n">beers</span><span class="p">)</span>
<span class="go">insert_executemany_iterator()</span>
<span class="go">Time 129.3</span>
<span class="go">Memory 0.0</span>
</pre></div>
<p>Our "streaming" solution worked as expected, and we managed to bring the measured memory down to zero. The timing, however, remains roughly the same, even compared to the one-by-one method.</p>
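<p>The memory difference between a list and a generator can be reproduced without a database. The following is a minimal sketch using the standard library's <code>tracemalloc</code>. The <code>transform</code> function and the <code>records</code> data are stand-ins invented for this example, not the beer data used in the benchmark:</p>

```python
import tracemalloc
from typing import Any, Callable, Dict


def transform(record: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for the per-row transformation: flatten the nested volume.
    return {**record, "volume": record["volume"]["value"]}


# Made-up input data, created before tracing starts so it is not counted.
records = [{"id": i, "volume": {"value": 20}} for i in range(10_000)]


def peak_memory(consume: Callable[[], None]) -> int:
    """Return the peak number of bytes allocated while consume() runs."""
    tracemalloc.start()
    consume()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


def consume_list() -> None:
    # All transformed rows exist in memory before the first one is used.
    transformed = [transform(r) for r in records]
    for _ in transformed:
        pass


def consume_generator() -> None:
    # Each transformed row is created, used, and discarded in turn.
    for _ in (transform(r) for r in records):
        pass


list_peak = peak_memory(consume_list)
generator_peak = peak_memory(consume_generator)
print(f"list peak: {list_peak} bytes, generator peak: {generator_peak} bytes")
```

<p>The exact numbers depend on the Python version and platform, but the generator's peak should be a small fraction of the list's, because only one transformed row is alive at any given time.</p>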
<h3 id="execute-batch"><a class="toclink" href="#execute-batch">Execute Batch</a></h3>
<p>The psycopg documentation has a very interesting note about <code>executemany</code> in the <a href="http://initd.org/psycopg/docs/extras.html#fast-execution-helpers" rel="noopener">"fast execution helpers" section</a>:</p>
<blockquote>
<p>The current implementation of executemany() is (using an extremely charitable understatement) not particularly performing. These functions can be used to speed up the repeated execution of a statement against a set of parameters. By reducing the number of server roundtrips the performance can be orders of magnitude better than using executemany().</p>
</blockquote>
<p>So we've been doing it wrong all along!</p>
<p>The function just below this section is <a href="http://initd.org/psycopg/docs/extras.html#psycopg2.extras.execute_batch" rel="noopener"><code>execute_batch</code></a>:</p>
<blockquote>
<p>Execute groups of statements in fewer server roundtrips.</p>
</blockquote>
<p>Let's implement the loading function using <code>execute_batch</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">psycopg2.extras</span>
<span class="nd">@profile</span>
<span class="k">def</span> <span class="nf">insert_execute_batch</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="n">beers</span><span class="p">:</span> <span class="n">Iterator</span><span class="p">[</span><span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">]])</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
    <span class="k">with</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span> <span class="k">as</span> <span class="n">cursor</span><span class="p">:</span>
        <span class="n">create_staging_table</span><span class="p">(</span><span class="n">cursor</span><span class="p">)</span>
        <span class="n">all_beers</span> <span class="o">=</span> <span class="p">[{</span>
            <span class="o">**</span><span class="n">beer</span><span class="p">,</span>
            <span class="s1">'first_brewed'</span><span class="p">:</span> <span class="n">parse_first_brewed</span><span class="p">(</span><span class="n">beer</span><span class="p">[</span><span class="s1">'first_brewed'</span><span class="p">]),</span>
            <span class="s1">'volume'</span><span class="p">:</span> <span class="n">beer</span><span class="p">[</span><span class="s1">'volume'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">],</span>
        <span class="p">}</span> <span class="k">for</span> <span class="n">beer</span> <span class="ow">in</span> <span class="n">beers</span><span class="p">]</span>
        <span class="n">psycopg2</span><span class="o">.</span><span class="n">extras</span><span class="o">.</span><span class="n">execute_batch</span><span class="p">(</span><span class="n">cursor</span><span class="p">,</span> <span class="s2">"""</span>
<span class="s2">            INSERT INTO staging_beers VALUES (</span>
<span class="s2">                </span><span class="si">%(id)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(name)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(tagline)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(first_brewed)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(description)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(image_url)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(abv)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(ibu)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(target_fg)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(target_og)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(ebc)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(srm)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(ph)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(attenuation_level)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(brewers_tips)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(contributed_by)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(volume)s</span>
<span class="s2">            );</span>
<span class="s2">        """</span><span class="p">,</span> <span class="n">all_beers</span><span class="p">)</span>
</pre></div>
<p>Executing the function:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">insert_execute_batch</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="n">beers</span><span class="p">)</span>
<span class="go">insert_execute_batch()</span>
<span class="go">Time 3.917</span>
<span class="go">Memory 2.50390625</span>
</pre></div>
<p>Wow! That's a huge leap. The function completed in just under 4 seconds. That's ~33 times faster than the 129 seconds we started with.</p>
<h3 id="execute-batch-from-iterator"><a class="toclink" href="#execute-batch-from-iterator">Execute Batch From Iterator</a></h3>
<p>The function <code>execute_batch</code> used less memory than <code>executemany</code> did for the same data. Let's try to eliminate memory by "streaming" the data into <code>execute_batch</code> using an iterator:</p>
<div class="highlight"><pre><span></span><span class="nd">@profile</span>
<span class="k">def</span> <span class="nf">insert_execute_batch_iterator</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="n">beers</span><span class="p">:</span> <span class="n">Iterator</span><span class="p">[</span><span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">]])</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
    <span class="k">with</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span> <span class="k">as</span> <span class="n">cursor</span><span class="p">:</span>
        <span class="n">create_staging_table</span><span class="p">(</span><span class="n">cursor</span><span class="p">)</span>
        <span class="n">iter_beers</span> <span class="o">=</span> <span class="p">({</span>
            <span class="o">**</span><span class="n">beer</span><span class="p">,</span>
            <span class="s1">'first_brewed'</span><span class="p">:</span> <span class="n">parse_first_brewed</span><span class="p">(</span><span class="n">beer</span><span class="p">[</span><span class="s1">'first_brewed'</span><span class="p">]),</span>
            <span class="s1">'volume'</span><span class="p">:</span> <span class="n">beer</span><span class="p">[</span><span class="s1">'volume'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">],</span>
        <span class="p">}</span> <span class="k">for</span> <span class="n">beer</span> <span class="ow">in</span> <span class="n">beers</span><span class="p">)</span>
        <span class="n">psycopg2</span><span class="o">.</span><span class="n">extras</span><span class="o">.</span><span class="n">execute_batch</span><span class="p">(</span><span class="n">cursor</span><span class="p">,</span> <span class="s2">"""</span>
<span class="s2">            INSERT INTO staging_beers VALUES (</span>
<span class="s2">                </span><span class="si">%(id)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(name)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(tagline)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(first_brewed)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(description)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(image_url)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(abv)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(ibu)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(target_fg)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(target_og)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(ebc)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(srm)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(ph)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(attenuation_level)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(brewers_tips)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(contributed_by)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(volume)s</span>
<span class="s2">            );</span>
<span class="s2">        """</span><span class="p">,</span> <span class="n">iter_beers</span><span class="p">)</span>
</pre></div>
<p>Executing the function:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">insert_execute_batch_iterator</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="n">beers</span><span class="p">)</span>
<span class="go">insert_execute_batch_iterator()</span>
<span class="go">Time 4.333</span>
<span class="go">Memory 0.2265625</span>
</pre></div>
<p>We got roughly the same time, but with less memory.</p>
<h3 id="execute-batch-from-iterator-with-page-size"><a class="toclink" href="#execute-batch-from-iterator-with-page-size">Execute Batch From Iterator with Page Size</a></h3>
<p>When reading through <a href="http://initd.org/psycopg/docs/extras.html#psycopg2.extras.execute_batch" rel="noopener">the documentation for <code>execute_batch</code></a>, the argument <code>page_size</code> caught my eye:</p>
<blockquote>
<p>page_size – maximum number of argslist items to include in every statement. If there are more items the function will execute more than one statement.</p>
</blockquote>
<p>The documentation previously stated that the function performs better because it makes fewer roundtrips to the database. If that's the case, a larger page size should reduce the number of roundtrips and result in a faster loading time.</p>
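<p>To see how the page size relates to the number of roundtrips, here is a simplified model of the paging idea. It is not psycopg's actual implementation: <code>paginate</code> is a hypothetical helper written for this illustration, and the model assumes one roundtrip per page:</p>

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def paginate(items: Iterable[Any], page_size: int) -> Iterator[List[Any]]:
    """Yield consecutive lists of at most page_size items (hypothetical helper)."""
    iterator = iter(items)
    while True:
        page = list(islice(iterator, page_size))
        if not page:
            return
        yield page


TOTAL_ROWS = 32_500  # size of the benchmark dataset

# In this model, every page is sent to the server as a single command,
# so the number of pages equals the number of roundtrips.
roundtrips: Dict[int, int] = {
    page_size: sum(1 for _ in paginate(range(TOTAL_ROWS), page_size))
    for page_size in (1, 100, 1000, 10000)
}

for page_size, count in roundtrips.items():
    print(f"page_size={page_size:>5}: {count:>5} roundtrips")
```

<p>Under this model, a page size of 1 needs 32,500 roundtrips, the same as inserting rows one by one. A page size of 100 needs 325, a page size of 1000 needs 33, and a page size of 10000 needs only 4. Each page is held in memory while it is being sent, so larger pages should also cost more memory.</p>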
<p>Let's add an argument for page size to our function so we can experiment:</p>
<div class="highlight"><pre><span></span><span class="nd">@profile</span>
<span class="k">def</span> <span class="nf">insert_execute_batch_iterator</span><span class="p">(</span>
    <span class="n">connection</span><span class="p">,</span>
    <span class="n">beers</span><span class="p">:</span> <span class="n">Iterator</span><span class="p">[</span><span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">]],</span>
    <span class="n">page_size</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">100</span><span class="p">,</span>
<span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
    <span class="k">with</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span> <span class="k">as</span> <span class="n">cursor</span><span class="p">:</span>
        <span class="n">create_staging_table</span><span class="p">(</span><span class="n">cursor</span><span class="p">)</span>
        <span class="n">iter_beers</span> <span class="o">=</span> <span class="p">({</span>
            <span class="o">**</span><span class="n">beer</span><span class="p">,</span>
            <span class="s1">'first_brewed'</span><span class="p">:</span> <span class="n">parse_first_brewed</span><span class="p">(</span><span class="n">beer</span><span class="p">[</span><span class="s1">'first_brewed'</span><span class="p">]),</span>
            <span class="s1">'volume'</span><span class="p">:</span> <span class="n">beer</span><span class="p">[</span><span class="s1">'volume'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">],</span>
        <span class="p">}</span> <span class="k">for</span> <span class="n">beer</span> <span class="ow">in</span> <span class="n">beers</span><span class="p">)</span>
        <span class="n">psycopg2</span><span class="o">.</span><span class="n">extras</span><span class="o">.</span><span class="n">execute_batch</span><span class="p">(</span><span class="n">cursor</span><span class="p">,</span> <span class="s2">"""</span>
<span class="s2">            INSERT INTO staging_beers VALUES (</span>
<span class="s2">                </span><span class="si">%(id)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(name)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(tagline)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(first_brewed)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(description)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(image_url)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(abv)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(ibu)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(target_fg)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(target_og)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(ebc)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(srm)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(ph)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(attenuation_level)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(brewers_tips)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(contributed_by)s</span><span class="s2">,</span>
<span class="s2">                </span><span class="si">%(volume)s</span>
<span class="s2">            );</span>
<span class="s2">        """</span><span class="p">,</span> <span class="n">iter_beers</span><span class="p">,</span> <span class="n">page_size</span><span class="o">=</span><span class="n">page_size</span><span class="p">)</span>
</pre></div>
<p>The default page size is 100. Let's benchmark different values and compare the results:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">insert_execute_batch_iterator</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="nb">iter</span><span class="p">(</span><span class="n">beers</span><span class="p">),</span> <span class="n">page_size</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="go">insert_execute_batch_iterator(page_size=1)</span>
<span class="go">Time 130.2</span>
<span class="go">Memory 0.0</span>
<span class="gp">>>> </span><span class="n">insert_execute_batch_iterator</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="nb">iter</span><span class="p">(</span><span class="n">beers</span><span class="p">),</span> <span class="n">page_size</span><span class="o">=</span><span class="mi">100</span><span class="p">)</span>
<span class="go">insert_execute_batch_iterator(page_size=100)</span>
<span class="go">Time 4.333</span>
<span class="go">Memory 0.0</span>
<span class="gp">>>> </span><span class="n">insert_execute_batch_iterator</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="nb">iter</span><span class="p">(</span><span class="n">beers</span><span class="p">),</span> <span class="n">page_size</span><span class="o">=</span><span class="mi">1000</span><span class="p">)</span>
<span class="go">insert_execute_batch_iterator(page_size=1000)</span>
<span class="go">Time 2.537</span>
<span class="go">Memory 0.2265625</span>
<span class="gp">>>> </span><span class="n">insert_execute_batch_iterator</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="nb">iter</span><span class="p">(</span><span class="n">beers</span><span class="p">),</span> <span class="n">page_size</span><span class="o">=</span><span class="mi">10000</span><span class="p">)</span>
<span class="go">insert_execute_batch_iterator(page_size=10000)</span>
<span class="go">Time 2.585</span>
<span class="go">Memory 25.4453125</span>
</pre></div>
<p>We got some interesting results. Let's break them down:</p>
<ul>
<li>1: The results are similar to the results we got inserting rows one by one.</li>
<li>100: This is the default <code>page_size</code>, so the results are similar to our previous benchmark.</li>
<li>1000: The timing here is about 40% faster, and the memory is low.</li>
<li>10000: Timing is not much faster than with a page size of 1000, but the memory is significantly higher.</li>
</ul>
<p>The results show that there is a tradeoff between memory and speed. In this case, the sweet spot seems to be a page size of 1000.</p>
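<p>One way to picture the tradeoff: the rows are split into pages, and each page is held in memory and sent to the database together. The sketch below uses a hypothetical helper called <code>paginate</code> (it is not part of psycopg, and it is not psycopg's actual implementation) to show this kind of chunking on top of an iterator:</p>

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List


def paginate(rows: Iterable[Any], page_size: int) -> Iterator[List[Any]]:
    """Yield lists of at most `page_size` rows, consuming `rows` lazily."""
    it = iter(rows)
    while True:
        # Only one page is materialized in memory at a time.
        page = list(islice(it, page_size))
        if not page:
            return
        yield page


pages = list(paginate(range(5), page_size=2))
empty = list(paginate([], page_size=3))
print(pages)
```

<p>A larger page means fewer pages to send, but each page takes more memory while it is being built.</p>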
<h3 id="execute-values"><a class="toclink" href="#execute-values">Execute Values</a></h3>
<p>The gems in psycopg's documentation do not end with <code>execute_batch</code>. While strolling through the documentation, another function called <a href="http://initd.org/psycopg/docs/extras.html#psycopg2.extras.execute_values" rel="noopener"><code>execute_values</code></a> caught my eye:</p>
<blockquote>
<p>Execute a statement using VALUES with a sequence of parameters.</p>
</blockquote>
<p>The function <code>execute_values</code> works by generating a huge <a href="https://www.postgresql.org/docs/current/queries-values.html" rel="noopener">VALUES list</a> and placing it in the query in place of the <code>%s</code> placeholder.</p>
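<p>To get a feel for what a multi-row VALUES list looks like, here is a rough sketch. The helper <code>build_values_query</code> is hypothetical and only illustrates the shape of the generated statement. psycopg composes and quotes the query on its own, and the table name here is interpolated directly purely for demonstration:</p>

```python
from typing import Any, List, Sequence, Tuple


def build_values_query(
    table: str, rows: Sequence[Sequence[Any]]
) -> Tuple[str, List[Any]]:
    """Build one INSERT with a multi-row VALUES list and a flat parameter list."""
    # One "(%s, %s, ...)" group per row, all rows in a single statement.
    row_placeholder = "(" + ", ".join(["%s"] * len(rows[0])) + ")"
    values_list = ", ".join([row_placeholder] * len(rows))
    query = f"INSERT INTO {table} VALUES {values_list};"
    params = [value for row in rows for value in row]
    return query, params


query, params = build_values_query(
    "staging_beers", [(1, "Buzz"), (2, "Trashy Blonde")]
)
print(query)
print(params)
```

<p>Instead of one statement per row, the database receives a single statement that carries many rows.</p>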
<p>Let's give it a spin:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">psycopg2.extras</span>

<span class="nd">@profile</span>
<span class="k">def</span> <span class="nf">insert_execute_values</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="n">beers</span><span class="p">:</span> <span class="n">Iterator</span><span class="p">[</span><span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">]])</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
    <span class="k">with</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span> <span class="k">as</span> <span class="n">cursor</span><span class="p">:</span>
        <span class="n">create_staging_table</span><span class="p">(</span><span class="n">cursor</span><span class="p">)</span>
        <span class="n">psycopg2</span><span class="o">.</span><span class="n">extras</span><span class="o">.</span><span class="n">execute_values</span><span class="p">(</span><span class="n">cursor</span><span class="p">,</span> <span class="s2">"""</span>
<span class="s2">            INSERT INTO staging_beers VALUES </span><span class="si">%s</span><span class="s2">;</span>
<span class="s2">        """</span><span class="p">,</span> <span class="p">[(</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'id'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'name'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'tagline'</span><span class="p">],</span>
            <span class="n">parse_first_brewed</span><span class="p">(</span><span class="n">beer</span><span class="p">[</span><span class="s1">'first_brewed'</span><span class="p">]),</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'description'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'image_url'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'abv'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'ibu'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'target_fg'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'target_og'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'ebc'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'srm'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'ph'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'attenuation_level'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'brewers_tips'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'contributed_by'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'volume'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">],</span>
        <span class="p">)</span> <span class="k">for</span> <span class="n">beer</span> <span class="ow">in</span> <span class="n">beers</span><span class="p">])</span>
</pre></div>
<p>Importing beers using the function:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">insert_execute_values</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="n">beers</span><span class="p">)</span>
<span class="go">insert_execute_values()</span>
<span class="go">Time 3.666</span>
<span class="go">Memory 4.50390625</span>
</pre></div>
<p>So right out of the box we get a slight speedup compared to <code>execute_batch</code>. However, the memory is slightly higher.</p>
<h3 id="execute-values-from-iterator"><a class="toclink" href="#execute-values-from-iterator">Execute Values From Iterator</a></h3>
<p>Just like we did before, to reduce memory consumption we try to avoid storing data in-memory by using an iterator instead of a list:</p>
<div class="highlight"><pre><span></span><span class="nd">@profile</span>
<span class="k">def</span> <span class="nf">insert_execute_values_iterator</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="n">beers</span><span class="p">:</span> <span class="n">Iterator</span><span class="p">[</span><span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">]])</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
    <span class="k">with</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span> <span class="k">as</span> <span class="n">cursor</span><span class="p">:</span>
        <span class="n">create_staging_table</span><span class="p">(</span><span class="n">cursor</span><span class="p">)</span>
        <span class="n">psycopg2</span><span class="o">.</span><span class="n">extras</span><span class="o">.</span><span class="n">execute_values</span><span class="p">(</span><span class="n">cursor</span><span class="p">,</span> <span class="s2">"""</span>
<span class="s2">            INSERT INTO staging_beers VALUES </span><span class="si">%s</span><span class="s2">;</span>
<span class="s2">        """</span><span class="p">,</span> <span class="p">((</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'id'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'name'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'tagline'</span><span class="p">],</span>
            <span class="n">parse_first_brewed</span><span class="p">(</span><span class="n">beer</span><span class="p">[</span><span class="s1">'first_brewed'</span><span class="p">]),</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'description'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'image_url'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'abv'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'ibu'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'target_fg'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'target_og'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'ebc'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'srm'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'ph'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'attenuation_level'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'brewers_tips'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'contributed_by'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'volume'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">],</span>
        <span class="p">)</span> <span class="k">for</span> <span class="n">beer</span> <span class="ow">in</span> <span class="n">beers</span><span class="p">))</span>
</pre></div>
<p>Executing the function produced the following results:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">insert_execute_values_iterator</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="n">beers</span><span class="p">)</span>
<span class="go">insert_execute_values_iterator()</span>
<span class="go">Time 3.677</span>
<span class="go">Memory 0.0</span>
</pre></div>
<p>So the timing is almost the same, but the memory is back to zero.</p>
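<p>The difference between the two versions comes down to square brackets versus parentheses: a list comprehension builds every row up front, while a generator expression produces rows on demand. A small sketch, assuming made-up rows rather than the beers dataset, and keeping in mind that <code>sys.getsizeof</code> only measures the container itself and not the rows it references:</p>

```python
import sys

# A list comprehension materializes every row immediately.
rows_list = [(i, i * 2) for i in range(100_000)]

# A generator expression holds only its own state and yields rows lazily.
rows_gen = ((i, i * 2) for i in range(100_000))

list_size = sys.getsizeof(rows_list)
gen_size = sys.getsizeof(rows_gen)
print(list_size > gen_size)

# Rows are produced only when something asks for them.
first = next(rows_gen)
```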
<h3 id="execute-values-from-iterator-with-page-size"><a class="toclink" href="#execute-values-from-iterator-with-page-size">Execute Values From Iterator with Page Size</a></h3>
<p>Just like <code>execute_batch</code>, the function <code>execute_values</code> also accepts a <code>page_size</code> argument:</p>
<div class="highlight"><pre><span></span><span class="nd">@profile</span>
<span class="k">def</span> <span class="nf">insert_execute_values_iterator</span><span class="p">(</span>
    <span class="n">connection</span><span class="p">,</span>
    <span class="n">beers</span><span class="p">:</span> <span class="n">Iterator</span><span class="p">[</span><span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">]],</span>
    <span class="n">page_size</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">100</span><span class="p">,</span>
<span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
    <span class="k">with</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span> <span class="k">as</span> <span class="n">cursor</span><span class="p">:</span>
        <span class="n">create_staging_table</span><span class="p">(</span><span class="n">cursor</span><span class="p">)</span>
        <span class="n">psycopg2</span><span class="o">.</span><span class="n">extras</span><span class="o">.</span><span class="n">execute_values</span><span class="p">(</span><span class="n">cursor</span><span class="p">,</span> <span class="s2">"""</span>
<span class="s2">            INSERT INTO staging_beers VALUES </span><span class="si">%s</span><span class="s2">;</span>
<span class="s2">        """</span><span class="p">,</span> <span class="p">((</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'id'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'name'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'tagline'</span><span class="p">],</span>
            <span class="n">parse_first_brewed</span><span class="p">(</span><span class="n">beer</span><span class="p">[</span><span class="s1">'first_brewed'</span><span class="p">]),</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'description'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'image_url'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'abv'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'ibu'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'target_fg'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'target_og'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'ebc'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'srm'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'ph'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'attenuation_level'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'brewers_tips'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'contributed_by'</span><span class="p">],</span>
            <span class="n">beer</span><span class="p">[</span><span class="s1">'volume'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">],</span>
        <span class="p">)</span> <span class="k">for</span> <span class="n">beer</span> <span class="ow">in</span> <span class="n">beers</span><span class="p">),</span> <span class="n">page_size</span><span class="o">=</span><span class="n">page_size</span><span class="p">)</span>
</pre></div>
<p>Executing with different page sizes:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">insert_execute_values_iterator</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="nb">iter</span><span class="p">(</span><span class="n">beers</span><span class="p">),</span> <span class="n">page_size</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="go">insert_execute_values_iterator(page_size=1)</span>
<span class="go">Time 127.4</span>
<span class="go">Memory 0.0</span>
<span class="gp">>>> </span><span class="n">insert_execute_values_iterator</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="nb">iter</span><span class="p">(</span><span class="n">beers</span><span class="p">),</span> <span class="n">page_size</span><span class="o">=</span><span class="mi">100</span><span class="p">)</span>
<span class="go">insert_execute_values_iterator(page_size=100)</span>
<span class="go">Time 3.677</span>
<span class="go">Memory 0.0</span>
<span class="gp">>>> </span><span class="n">insert_execute_values_iterator</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="nb">iter</span><span class="p">(</span><span class="n">beers</span><span class="p">),</span> <span class="n">page_size</span><span class="o">=</span><span class="mi">1000</span><span class="p">)</span>
<span class="go">insert_execute_values_iterator(page_size=1000)</span>
<span class="go">Time 1.468</span>
<span class="go">Memory 0.0</span>
<span class="gp">>>> </span><span class="n">insert_execute_values_iterator</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="nb">iter</span><span class="p">(</span><span class="n">beers</span><span class="p">),</span> <span class="n">page_size</span><span class="o">=</span><span class="mi">10000</span><span class="p">)</span>
<span class="go">insert_execute_values_iterator(page_size=10000)</span>
<span class="go">Time 1.503</span>
<span class="go">Memory 2.25</span>
</pre></div>
<p>Just like <code>execute_batch</code>, we see a tradeoff between memory and speed. Here as well, the sweet spot is around page size 1000. However, using <code>execute_values</code> we got results ~40% faster (1.468 versus 2.537 seconds) compared to the same page size using <code>execute_batch</code>.</p>
<h3 id="copy"><a class="toclink" href="#copy">Copy</a></h3>
<p>The official documentation for PostgreSQL features an entire section on <a href="https://www.postgresql.org/docs/current/populate.html#POPULATE-COPY-FROM" rel="noopener">Populating a Database</a>. According to the documentation, the best way to load data into a database is using the <a href="https://www.postgresql.org/docs/current/sql-copy.html" rel="noopener"><code>copy</code> command</a>.</p>
<p>To use <code>copy</code> from Python, psycopg provides a special function called <a href="http://initd.org/psycopg/docs/cursor.html#cursor.copy_from" rel="noopener"><code>copy_from</code></a>. The <code>copy</code> command reads delimited text, such as a CSV file. Let's see if we can transform our data into CSV, and load it into the database using <code>copy_from</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">io</span>

<span class="k">def</span> <span class="nf">clean_csv_value</span><span class="p">(</span><span class="n">value</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">Any</span><span class="p">])</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
    <span class="k">if</span> <span class="n">value</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
        <span class="k">return</span> <span class="sa">r</span><span class="s1">'\N'</span>
    <span class="k">return</span> <span class="nb">str</span><span class="p">(</span><span class="n">value</span><span class="p">)</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">,</span> <span class="s1">'</span><span class="se">\\</span><span class="s1">n'</span><span class="p">)</span>

<span class="nd">@profile</span>
<span class="k">def</span> <span class="nf">copy_stringio</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="n">beers</span><span class="p">:</span> <span class="n">Iterator</span><span class="p">[</span><span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">]])</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
    <span class="k">with</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span> <span class="k">as</span> <span class="n">cursor</span><span class="p">:</span>
        <span class="n">create_staging_table</span><span class="p">(</span><span class="n">cursor</span><span class="p">)</span>
        <span class="n">csv_file_like_object</span> <span class="o">=</span> <span class="n">io</span><span class="o">.</span><span class="n">StringIO</span><span class="p">()</span>
        <span class="k">for</span> <span class="n">beer</span> <span class="ow">in</span> <span class="n">beers</span><span class="p">:</span>
            <span class="n">csv_file_like_object</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'|'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="n">clean_csv_value</span><span class="p">,</span> <span class="p">(</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'id'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'name'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'tagline'</span><span class="p">],</span>
                <span class="n">parse_first_brewed</span><span class="p">(</span><span class="n">beer</span><span class="p">[</span><span class="s1">'first_brewed'</span><span class="p">]),</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'description'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'image_url'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'abv'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'ibu'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'target_fg'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'target_og'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'ebc'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'srm'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'ph'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'attenuation_level'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'brewers_tips'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'contributed_by'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'volume'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">],</span>
            <span class="p">)))</span> <span class="o">+</span> <span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
        <span class="n">csv_file_like_object</span><span class="o">.</span><span class="n">seek</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
        <span class="n">cursor</span><span class="o">.</span><span class="n">copy_from</span><span class="p">(</span><span class="n">csv_file_like_object</span><span class="p">,</span> <span class="s1">'staging_beers'</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">'|'</span><span class="p">)</span>
</pre></div>
<p>Let's break it down:</p>
<ul>
<li><code>clean_csv_value</code>: Transforms a single value<ul>
<li><strong>Escape new lines</strong>: some of the text fields include newlines, so we escape <code>\n</code> -> <code>\\n</code>.</li>
<li><strong><code>None</code> values are transformed to <code>\N</code></strong>: The string <code>"\N"</code> is the default string used by PostgreSQL to indicate NULL in COPY (this can be changed using the <code>NULL</code> option).</li>
</ul>
</li>
<li><code>csv_file_like_object</code>: Generate a file-like object using <a href="https://docs.python.org/3.7/library/io.html?#io.StringIO" rel="noopener"><code>io.StringIO</code></a>. A <code>StringIO</code> object contains a string which can be used like a file. In our case, a CSV file.</li>
<li><code>csv_file_like_object.write</code>: Transform a beer to a CSV row<ul>
<li><strong>Transform the data</strong>: transformations on <code>first_brewed</code> and <code>volume</code> are performed here.</li>
<li><strong>Pick a delimiter</strong>: Some of the fields in the dataset contain free text with commas. To prevent conflicts, we pick "|" as the delimiter (another option is to use <code>QUOTE</code>).</li>
</ul>
</li>
</ul>
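<p>To make the transformation concrete, here is a small sketch that runs <code>clean_csv_value</code> (repeated here so the snippet is self-contained) on a made-up row. The sample values are invented for illustration and are not taken from the dataset:</p>

```python
from typing import Any, Optional


def clean_csv_value(value: Optional[Any]) -> str:
    # Same logic as the function above: None becomes \N, newlines are escaped.
    if value is None:
        return r'\N'
    return str(value).replace('\n', '\\n')


# A made-up row with a missing value and a multi-line text field.
row = (1, 'Buzz', None, 'Hoppy.\nCrisp.', 4.5)
line = '|'.join(map(clean_csv_value, row)) + '\n'
print(repr(line))
```

<p>The row ends up on a single line, with <code>\N</code> standing in for the missing value. Note that this simple version would not cope with values that contain the delimiter <code>|</code> or a backslash. Those would need to be escaped as well.</p>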
<p>Now let's see if all of this hard work paid off:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">copy_stringio</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="n">beers</span><span class="p">)</span>
<span class="go">copy_stringio()</span>
<span class="go">Time 0.6274</span>
<span class="go">Memory 99.109375</span>
</pre></div>
<p>The <code>copy</code> command is the fastest we've seen so far! Using <code>COPY</code>, the process completed in less than a second. However, it seems like this method is a lot more wasteful in terms of memory usage. The function consumes 99MB, which is more than twice the size of our JSON file on disk.</p>
<h3 id="copy-data-from-a-string-iterator"><a class="toclink" href="#copy-data-from-a-string-iterator">Copy Data From a String Iterator</a></h3>
<p>One of the main drawbacks of using copy with <code>StringIO</code> is that the entire file is created in-memory. What if, instead of creating the entire file in-memory, we create a file-like object that acts as a buffer between the remote source and the <code>COPY</code> command? The buffer will consume JSON via the iterator, clean and transform the data, and output clean CSV.</p>
<figure><img alt="Copy Data From a String Iterator (yuml.me)" src="https://hakibenita.com/images/02-fast-load-data-python-postgresql.png"><figcaption>Copy Data From a String Iterator (<a href="https://yuml.me/edit/64d10485" target="_blank">yuml.me</a>)</figcaption>
</figure>
<p>Inspired by <a href="https://stackoverflow.com/a/12604375/2000875" rel="noopener">this Stack Overflow answer</a>, we created an object that feeds off an iterator, and provides a file-like interface:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Iterator</span><span class="p">,</span> <span class="n">Optional</span>
<span class="kn">import</span> <span class="nn">io</span>

<span class="k">class</span> <span class="nc">StringIteratorIO</span><span class="p">(</span><span class="n">io</span><span class="o">.</span><span class="n">TextIOBase</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">iter</span><span class="p">:</span> <span class="n">Iterator</span><span class="p">[</span><span class="nb">str</span><span class="p">]):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_iter</span> <span class="o">=</span> <span class="nb">iter</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_buff</span> <span class="o">=</span> <span class="s1">''</span>

    <span class="k">def</span> <span class="nf">readable</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-></span> <span class="nb">bool</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">True</span>

    <span class="k">def</span> <span class="nf">_read1</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">n</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">int</span><span class="p">]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">)</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
        <span class="k">while</span> <span class="ow">not</span> <span class="bp">self</span><span class="o">.</span><span class="n">_buff</span><span class="p">:</span>
            <span class="k">try</span><span class="p">:</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">_buff</span> <span class="o">=</span> <span class="nb">next</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_iter</span><span class="p">)</span>
            <span class="k">except</span> <span class="ne">StopIteration</span><span class="p">:</span>
                <span class="k">break</span>
        <span class="n">ret</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_buff</span><span class="p">[:</span><span class="n">n</span><span class="p">]</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_buff</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_buff</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="n">ret</span><span class="p">):]</span>
        <span class="k">return</span> <span class="n">ret</span>

    <span class="k">def</span> <span class="nf">read</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">n</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">int</span><span class="p">]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">)</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
        <span class="n">line</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="k">if</span> <span class="n">n</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">or</span> <span class="n">n</span> <span class="o"><</span> <span class="mi">0</span><span class="p">:</span>
            <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
                <span class="n">m</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_read1</span><span class="p">()</span>
                <span class="k">if</span> <span class="ow">not</span> <span class="n">m</span><span class="p">:</span>
                    <span class="k">break</span>
                <span class="n">line</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">m</span><span class="p">)</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">while</span> <span class="n">n</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
                <span class="n">m</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_read1</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>
                <span class="k">if</span> <span class="ow">not</span> <span class="n">m</span><span class="p">:</span>
                    <span class="k">break</span>
                <span class="n">n</span> <span class="o">-=</span> <span class="nb">len</span><span class="p">(</span><span class="n">m</span><span class="p">)</span>
                <span class="n">line</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">m</span><span class="p">)</span>
        <span class="k">return</span> <span class="s1">''</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">line</span><span class="p">)</span>
</pre></div>
<p>To demonstrate how this works, this is how a CSV file-like object can be generated from a list of numbers:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">gen</span> <span class="o">=</span> <span class="p">(</span><span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s1">,</span><span class="si">{</span><span class="n">i</span><span class="o">**</span><span class="mi">2</span><span class="si">}</span><span class="se">\n</span><span class="s1">'</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">))</span>
<span class="gp">>>> </span><span class="n">gen</span>
<span class="go"><generator object <genexpr> at 0x7f58bde7f5e8></span>
<span class="gp">>>> </span><span class="n">f</span> <span class="o">=</span> <span class="n">StringIteratorIO</span><span class="p">(</span><span class="n">gen</span><span class="p">)</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">())</span>
<span class="go">0,0</span>
<span class="go">1,1</span>
<span class="go">2,4</span>
</pre></div>
<p>Notice that we used <code>f</code> like a file. Internally, it fetched the rows from <code>gen</code> only when its internal line buffer was empty.</p>
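<p>To make the lazy behavior visible, here is a minimal, self-contained sketch. The class name <code>LazyTextReader</code> and the <code>produced</code> list are hypothetical names used only for this illustration; the class is modeled on the <code>read</code> method shown above and is not the article's exact implementation. The generator records each row at the moment it is produced, so we can check how far the iterator has advanced after each read:</p>

```python
import io
from typing import Iterator, List, Optional


class LazyTextReader(io.TextIOBase):
    """Hypothetical read-only, file-like wrapper around an iterator of strings."""

    def __init__(self, chunks: Iterator[str]) -> None:
        self._chunks = chunks
        self._buffer = ''

    def readable(self) -> bool:
        return True

    def _read1(self, n: Optional[int] = None) -> str:
        # Pull the next chunk from the iterator only when the buffer is empty.
        while not self._buffer:
            try:
                self._buffer = next(self._chunks)
            except StopIteration:
                break
        piece = self._buffer[:n]
        self._buffer = self._buffer[len(piece):]
        return piece

    def read(self, n: Optional[int] = None) -> str:
        parts: List[str] = []
        if n is None or n < 0:
            # Read until the iterator is exhausted.
            while True:
                piece = self._read1()
                if not piece:
                    break
                parts.append(piece)
        else:
            # Read at most n characters.
            while n > 0:
                piece = self._read1(n)
                if not piece:
                    break
                n -= len(piece)
                parts.append(piece)
        return ''.join(parts)


produced: List[int] = []


def rows() -> Iterator[str]:
    for i in range(3):
        produced.append(i)  # record the moment each row is actually generated
        yield f'{i},{i ** 2}\n'


f = LazyTextReader(rows())
first = f.read(4)                      # exactly one row's worth of characters
produced_after_first = list(produced)  # snapshot of how far the generator got
rest = f.read()                        # drain whatever is left
```

<p>After the first <code>read(4)</code> only one row has been generated. The remaining rows are produced when the second <code>read()</code> asks for them.</p>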
<p>The loading function using <code>StringIteratorIO</code> looks like this:</p>
<div class="highlight"><pre><span></span><span class="nd">@profile</span>
<span class="k">def</span> <span class="nf">copy_string_iterator</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="n">beers</span><span class="p">:</span> <span class="n">Iterator</span><span class="p">[</span><span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">]])</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
    <span class="k">with</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span> <span class="k">as</span> <span class="n">cursor</span><span class="p">:</span>
        <span class="n">create_staging_table</span><span class="p">(</span><span class="n">cursor</span><span class="p">)</span>
        <span class="n">beers_string_iterator</span> <span class="o">=</span> <span class="n">StringIteratorIO</span><span class="p">((</span>
            <span class="s1">'|'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="n">clean_csv_value</span><span class="p">,</span> <span class="p">(</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'id'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'name'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'tagline'</span><span class="p">],</span>
                <span class="n">parse_first_brewed</span><span class="p">(</span><span class="n">beer</span><span class="p">[</span><span class="s1">'first_brewed'</span><span class="p">])</span><span class="o">.</span><span class="n">isoformat</span><span class="p">(),</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'description'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'image_url'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'abv'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'ibu'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'target_fg'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'target_og'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'ebc'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'srm'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'ph'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'attenuation_level'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'brewers_tips'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'contributed_by'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'volume'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">],</span>
            <span class="p">)))</span> <span class="o">+</span> <span class="s1">'</span><span class="se">\n</span><span class="s1">'</span>
            <span class="k">for</span> <span class="n">beer</span> <span class="ow">in</span> <span class="n">beers</span>
        <span class="p">))</span>
        <span class="n">cursor</span><span class="o">.</span><span class="n">copy_from</span><span class="p">(</span><span class="n">beers_string_iterator</span><span class="p">,</span> <span class="s1">'staging_beers'</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">'|'</span><span class="p">)</span>
</pre></div>
<p>The main difference is that the beers CSV file is consumed on demand, and the data is not stored in-memory after it was used.</p>
<p>Let's execute the function and see the results:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">copy_string_iterator</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="n">beers</span><span class="p">)</span>
<span class="go">copy_string_iterator()</span>
<span class="go">Time 0.4596</span>
<span class="go">Memory 0.0</span>
</pre></div>
<p>Great! Timing is low and memory is back to zero.</p>
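<p>The memory difference between buffering and streaming can be reproduced without a database. The sketch below is only an illustration under assumptions: it uses the standard library's <code>tracemalloc</code> instead of the article's <code>@profile</code> decorator, and the names <code>rows</code>, <code>peak_bytes</code>, <code>consume_buffered</code>, <code>consume_streamed</code> and the row count are made up for this example. It compares writing all rows into an <code>io.StringIO</code> with consuming the rows one at a time:</p>

```python
import io
import tracemalloc
from typing import Callable, Iterator

N_ROWS = 50_000  # arbitrary size chosen for the illustration


def rows(n: int) -> Iterator[str]:
    """Generate delimited rows one at a time."""
    for i in range(n):
        yield f'{i}|beer number {i}\n'


def peak_bytes(func: Callable[[], int]) -> int:
    """Return the peak number of bytes allocated while func runs."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def consume_buffered() -> int:
    # Write every row into an in-memory file first, then read it back.
    buffer = io.StringIO()
    for row in rows(N_ROWS):
        buffer.write(row)
    buffer.seek(0)
    return len(buffer.read())


def consume_streamed() -> int:
    # Handle each row as it is produced; nothing is kept afterwards.
    return sum(len(row) for row in rows(N_ROWS))


buffered_peak = peak_bytes(consume_buffered)
streamed_peak = peak_bytes(consume_streamed)
print(f'buffered: {buffered_peak:,} bytes, streamed: {streamed_peak:,} bytes')
```

<p>Both functions see exactly the same characters, but the buffered version has to hold the whole file in memory while the streamed version only holds one row at a time.</p>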
<h3 id="copy-data-from-a-string-iterator-with-buffer-size"><a class="toclink" href="#copy-data-from-a-string-iterator-with-buffer-size">Copy Data From a String Iterator with Buffer Size</a></h3>
<p>In an attempt to squeeze one final drop of performance, we notice that just like <code>page_size</code>, the <code>copy_from</code> function also accepts a similar argument called <code>size</code>:</p>
<blockquote>
<p>size – size of the buffer used to read from the file.</p>
</blockquote>
<p>Let's add a <code>size</code> argument to the function:</p>
<div class="highlight"><pre><span></span><span class="nd">@profile</span>
<span class="k">def</span> <span class="nf">copy_string_iterator</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="n">beers</span><span class="p">:</span> <span class="n">Iterator</span><span class="p">[</span><span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">]],</span> <span class="n">size</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">8192</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
    <span class="k">with</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span> <span class="k">as</span> <span class="n">cursor</span><span class="p">:</span>
        <span class="n">create_staging_table</span><span class="p">(</span><span class="n">cursor</span><span class="p">)</span>
        <span class="n">beers_string_iterator</span> <span class="o">=</span> <span class="n">StringIteratorIO</span><span class="p">((</span>
            <span class="s1">'|'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="n">clean_csv_value</span><span class="p">,</span> <span class="p">(</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'id'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'name'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'tagline'</span><span class="p">],</span>
                <span class="n">parse_first_brewed</span><span class="p">(</span><span class="n">beer</span><span class="p">[</span><span class="s1">'first_brewed'</span><span class="p">])</span><span class="o">.</span><span class="n">isoformat</span><span class="p">(),</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'description'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'image_url'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'abv'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'ibu'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'target_fg'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'target_og'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'ebc'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'srm'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'ph'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'attenuation_level'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'brewers_tips'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'contributed_by'</span><span class="p">],</span>
                <span class="n">beer</span><span class="p">[</span><span class="s1">'volume'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">],</span>
            <span class="p">)))</span> <span class="o">+</span> <span class="s1">'</span><span class="se">\n</span><span class="s1">'</span>
            <span class="k">for</span> <span class="n">beer</span> <span class="ow">in</span> <span class="n">beers</span>
        <span class="p">))</span>
        <span class="n">cursor</span><span class="o">.</span><span class="n">copy_from</span><span class="p">(</span><span class="n">beers_string_iterator</span><span class="p">,</span> <span class="s1">'staging_beers'</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">'|'</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">size</span><span class="p">)</span>
</pre></div>
<p>The default value for size is 8192, which is <code>2 ** 13</code>, so we will keep sizes in powers of 2:</p>
<div class="highlight"><pre><span></span>>>> copy_string_iterator(connection, iter(beers), size=1024)
copy_string_iterator(size=1024)
Time 0.4536
Memory 0.0
>>> copy_string_iterator(connection, iter(beers), size=8192)
copy_string_iterator(size=8192)
Time 0.4596
Memory 0.0
>>> copy_string_iterator(connection, iter(beers), size=16384)
copy_string_iterator(size=16384)
Time 0.4649
Memory 0.0
>>> copy_string_iterator(connection, iter(beers), size=65536)
copy_string_iterator(size=65536)
Time 0.6171
Memory 0.0
</pre></div>
<p>Unlike the previous examples, it seems like there is no tradeoff between speed and memory. This makes sense because this method was designed to consume no memory. However, we do get different timings when changing the buffer size. For our dataset, sizes up to 16384 perform about the same and only 65536 is noticeably slower, so the default 8192 is a sensible choice.</p>
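<p>One way to build intuition for the <code>size</code> argument is to count how many reads a consumer needs for a given buffer size. The sketch below rests on an assumption: it models a consumer that calls <code>read(size)</code> in a loop until it gets an empty string, which is not necessarily how psycopg2 reads internally. The helper <code>count_reads</code> and the sample data are hypothetical:</p>

```python
import io
from typing import Dict, TextIO


def count_reads(f: TextIO, size: int) -> int:
    """Count the read(size) calls needed to exhaust a file-like object."""
    calls = 0
    while True:
        calls += 1
        if not f.read(size):
            # An empty string signals end of file.
            return calls


# Sample data for the illustration only.
data = ''.join(f'{i}|beer number {i}\n' for i in range(10_000))

calls_by_size: Dict[int, int] = {
    size: count_reads(io.StringIO(data), size)
    for size in (1024, 8192, 16384, 65536)
}

for size, calls in calls_by_size.items():
    print(f'size={size}: {calls} reads')
```

<p>A larger buffer means fewer, bigger reads. Whether that is faster depends on the data and the consumer, which is why it is worth measuring rather than guessing.</p>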
<h2 id="results-summary"><a class="toclink" href="#results-summary">Results Summary</a></h2>
<p>A summary of the results:</p>
<div class="table-container">
<table>
<thead>
<tr>
<th>Function</th>
<th>Time (seconds)</th>
<th>Memory (MB)</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>insert_one_by_one()</code></td>
<td>128.8</td>
<td>0.08203125</td>
</tr>
<tr>
<td><code>insert_executemany()</code></td>
<td>124.7</td>
<td>2.765625</td>
</tr>
<tr>
<td><code>insert_executemany_iterator()</code></td>
<td>129.3</td>
<td>0.0</td>
</tr>
<tr>
<td><code>insert_execute_batch()</code></td>
<td>3.917</td>
<td>2.50390625</td>
</tr>
<tr>
<td><code>insert_execute_batch_iterator(page_size=1)</code></td>
<td>130.2</td>
<td>0.0</td>
</tr>
<tr>
<td><code>insert_execute_batch_iterator(page_size=100)</code></td>
<td>4.333</td>
<td>0.0</td>
</tr>
<tr>
<td><code>insert_execute_batch_iterator(page_size=1000)</code></td>
<td>2.537</td>
<td>0.2265625</td>
</tr>
<tr>
<td><code>insert_execute_batch_iterator(page_size=10000)</code></td>
<td>2.585</td>
<td>25.4453125</td>
</tr>
<tr>
<td><code>insert_execute_values()</code></td>
<td>3.666</td>
<td>4.50390625</td>
</tr>
<tr>
<td><code>insert_execute_values_iterator(page_size=1)</code></td>
<td>127.4</td>
<td>0.0</td>
</tr>
<tr>
<td><code>insert_execute_values_iterator(page_size=100)</code></td>
<td>3.677</td>
<td>0.0</td>
</tr>
<tr>
<td><code>insert_execute_values_iterator(page_size=1000)</code></td>
<td>1.468</td>
<td>0.0</td>
</tr>
<tr>
<td><code>insert_execute_values_iterator(page_size=10000)</code></td>
<td>1.503</td>
<td>2.25</td>
</tr>
<tr>
<td><code>copy_stringio()</code></td>
<td>0.6274</td>
<td>99.109375</td>
</tr>
<tr>
<td><code>copy_string_iterator(size=1024)</code></td>
<td>0.4536</td>
<td>0.0</td>
</tr>
<tr>
<td><code>copy_string_iterator(size=8192)</code></td>
<td>0.4596</td>
<td>0.0</td>
</tr>
<tr>
<td><code>copy_string_iterator(size=16384)</code></td>
<td>0.4649</td>
<td>0.0</td>
</tr>
<tr>
<td><code>copy_string_iterator(size=65536)</code></td>
<td>0.6171</td>
<td>0.0</td>
</tr>
</tbody>
</table>
</div>
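<p>To put the table in perspective, the short script below computes how many times faster a few of the approaches are than <code>insert_one_by_one()</code>. The timings are copied from the table above; the selection of rows and the variable names are just for illustration:</p>

```python
# Timings in seconds, copied from the results table.
timings = {
    'insert_one_by_one()': 128.8,
    'insert_execute_batch_iterator(page_size=1000)': 2.537,
    'insert_execute_values_iterator(page_size=1000)': 1.468,
    'copy_stringio()': 0.6274,
    'copy_string_iterator(size=8192)': 0.4596,
}

baseline = timings['insert_one_by_one()']

# Speedup factor of each approach relative to inserting rows one by one.
speedups = {name: baseline / seconds for name, seconds in timings.items()}

for name, factor in sorted(speedups.items(), key=lambda item: item[1]):
    print(f'{name}: {factor:.0f}x')
```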
<hr>
<h2 id="summary"><a class="toclink" href="#summary">Summary</a></h2>
<p>The big question now is <em>What should I use?</em> As always, the answer is <em>It depends</em>.</p>
<p>Each method has its own advantages and disadvantages, and is suited for different circumstances:</p>
<div class="admonition tip">
<p class="admonition-title">Take away</p>
<p>Prefer built-in approaches for complex data types.</p>
</div>
<p>Execute many, execute values and execute batch take care of the conversion between Python data types and database types. The CSV approaches require escaping.</p>
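<p>To illustrate what escaping means for the copy approaches, here is a rough sketch of a helper that prepares a single value for PostgreSQL's <code>COPY</code> text format. The function <code>escape_copy_value</code> is hypothetical and is not the <code>clean_csv_value</code> function used in the benchmark. It assumes the default <code>\N</code> marker for <code>NULL</code> and a single character delimiter:</p>

```python
from typing import Any, Optional


def escape_copy_value(value: Optional[Any], sep: str = '|') -> str:
    """Hypothetical helper: render one value for COPY's text format."""
    if value is None:
        # \N is the default NULL marker in COPY's text format.
        return r'\N'
    text = str(value)
    # Escape backslashes first, so escapes added below are not doubled.
    text = text.replace('\\', '\\\\')
    # A delimiter inside the data must be preceded by a backslash.
    text = text.replace(sep, '\\' + sep)
    # Line breaks and tabs must not appear literally inside a value.
    text = text.replace('\n', r'\n').replace('\r', r'\r').replace('\t', r'\t')
    return text


values = (1, 'Pipe | Ale', None, 'two\nlines')
row = '|'.join(escape_copy_value(v) for v in values) + '\n'
print(row, end='')
```

<p>With the built-in approaches none of this is necessary, because the driver adapts Python values to database types for you.</p>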
<div class="admonition tip">
<p class="admonition-title">Take away</p>
<p>Prefer built-in approaches for small data volume.</p>
</div>
<p>The built-in approaches are more readable and less likely to break in the future. If memory and time are not an issue, keep it simple!</p>
<div class="admonition tip">
<p class="admonition-title">Take away</p>
<p>Prefer copy approaches for large data volume.</p>
</div>
<p>The copy approach is more suitable for larger amounts of data where memory might become an issue.</p>
<hr>
<div class="admonition info">
<p class="admonition-title">Source code</p>
<p>The source code for this benchmark can be found <a href="https://gist.github.com/hakib/7e723d2c113b947f7920bf55737e4d16" rel="noopener">here</a>.</p>
</div>
<hr>
<h1>Improve Serialization Performance in Django Rest Framework</h1>
<p>Haki Benita, 2019-06-08</p>
<hr>
<p>When a developer chooses Python, Django, or Django Rest Framework, it's usually not because of its blazing fast performance. Python has always been the "comfortable" choice, the language you choose when you care more about ergonomics than shaving a few microseconds off some process.</p>
<p>There is nothing wrong with ergonomics. Most projects don't really need that microsecond performance boost, but they do need to ship quality code fast.</p>
<p>All of this doesn't mean performance is not important. As this story taught us, major performance boosts can be gained with just a little attention, and a few small changes.</p>
<figure><img alt="mip mip" src="https://hakibenita.com/images/01-django-rest-framework-slow.jpg"><figcaption>"mip mip"</figcaption>
</figure>
<p><details class="toc-container" open>
<summary>Table of Contents</summary></p>
<div class="toc">
<ul>
<li><a href="#model-serializer-performance">Model Serializer Performance</a><ul>
<li><a href="#simple-function">Simple Function</a></li>
<li><a href="#modelserializer">ModelSerializer</a></li>
<li><a href="#read-only-modelserializer">Read Only ModelSerializer</a></li>
<li><a href="#regular-serializer">"Regular" Serializer</a></li>
<li><a href="#read-only-regular-serializer">Read Only "regular" Serializer</a></li>
<li><a href="#results-summary">Results Summary</a></li>
</ul>
</li>
<li><a href="#why-is-this-happening">Why is This Happening?</a><ul>
<li><a href="#prior-work">Prior Work</a></li>
<li><a href="#fixing-djangos-lazy">Fixing Django's lazy</a></li>
<li><a href="#fixing-django-rest-framework">Fixing Django Rest Framework</a></li>
</ul>
</li>
<li><a href="#takeaway">Takeaway</a></li>
<li><a href="#bonus-forcing-good-habits">Bonus: Forcing Good Habits</a></li>
</ul>
</div>
<p></details></p>
<hr>
<h2 id="model-serializer-performance"><a class="toclink" href="#model-serializer-performance">Model Serializer Performance</a></h2>
<p>A while back we noticed very poor performance from one of our main API endpoints. The endpoint fetched data from a very large table, so we naturally assumed that the problem must be in the database.</p>
<p>When we noticed that even small data sets get poor performance, we started looking into other parts of the app. This journey eventually led us to Django Rest Framework (DRF) serializers.</p>
<div class="admonition info">
<p class="admonition-title">versions</p>
<p>In the benchmark we use Python 3.7, Django 2.1.1 and Django Rest Framework 3.9.4.</p>
</div>
<h3 id="simple-function"><a class="toclink" href="#simple-function">Simple Function</a></h3>
<p>Serializers are used for transforming data into objects, and objects into data. This is a simple function, so let's write one that accepts a <code>User</code> instance, and returns a dict:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Dict</span><span class="p">,</span> <span class="n">Any</span>
<span class="kn">from</span> <span class="nn">django.contrib.auth.models</span> <span class="kn">import</span> <span class="n">User</span>
<span class="k">def</span> <span class="nf">serialize_user</span><span class="p">(</span><span class="n">user</span><span class="p">:</span> <span class="n">User</span><span class="p">)</span> <span class="o">-></span> <span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">]:</span>
    <span class="k">return</span> <span class="p">{</span>
        <span class="s1">'id'</span><span class="p">:</span> <span class="n">user</span><span class="o">.</span><span class="n">id</span><span class="p">,</span>
        <span class="s1">'last_login'</span><span class="p">:</span> <span class="n">user</span><span class="o">.</span><span class="n">last_login</span><span class="o">.</span><span class="n">isoformat</span><span class="p">()</span> <span class="k">if</span> <span class="n">user</span><span class="o">.</span><span class="n">last_login</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="k">else</span> <span class="kc">None</span><span class="p">,</span>
        <span class="s1">'is_superuser'</span><span class="p">:</span> <span class="n">user</span><span class="o">.</span><span class="n">is_superuser</span><span class="p">,</span>
        <span class="s1">'username'</span><span class="p">:</span> <span class="n">user</span><span class="o">.</span><span class="n">username</span><span class="p">,</span>
        <span class="s1">'first_name'</span><span class="p">:</span> <span class="n">user</span><span class="o">.</span><span class="n">first_name</span><span class="p">,</span>
        <span class="s1">'last_name'</span><span class="p">:</span> <span class="n">user</span><span class="o">.</span><span class="n">last_name</span><span class="p">,</span>
        <span class="s1">'email'</span><span class="p">:</span> <span class="n">user</span><span class="o">.</span><span class="n">email</span><span class="p">,</span>
        <span class="s1">'is_staff'</span><span class="p">:</span> <span class="n">user</span><span class="o">.</span><span class="n">is_staff</span><span class="p">,</span>
        <span class="s1">'is_active'</span><span class="p">:</span> <span class="n">user</span><span class="o">.</span><span class="n">is_active</span><span class="p">,</span>
        <span class="s1">'date_joined'</span><span class="p">:</span> <span class="n">user</span><span class="o">.</span><span class="n">date_joined</span><span class="o">.</span><span class="n">isoformat</span><span class="p">(),</span>
    <span class="p">}</span>
</pre></div>
<p>Create a user to use in the benchmark:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.contrib.auth.models</span> <span class="kn">import</span> <span class="n">User</span>
<span class="gp">>>> </span><span class="n">u</span> <span class="o">=</span> <span class="n">User</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create_user</span><span class="p">(</span>
<span class="gp">... </span>    <span class="n">username</span><span class="o">=</span><span class="s1">'hakib'</span><span class="p">,</span>
<span class="gp">... </span>    <span class="n">first_name</span><span class="o">=</span><span class="s1">'haki'</span><span class="p">,</span>
<span class="gp">... </span>    <span class="n">last_name</span><span class="o">=</span><span class="s1">'benita'</span><span class="p">,</span>
<span class="gp">... </span>    <span class="n">email</span><span class="o">=</span><span class="s1">'me@hakibenita.com'</span><span class="p">,</span>
<span class="gp">... </span><span class="p">)</span>
</pre></div>
<p>For our benchmark we are using <a href="https://docs.python.org/3.7/library/profile.html" rel="noopener"><code>cProfile</code></a>. To eliminate external influences such as the database, we fetch a user in advance and serialize it 5,000 times:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">cProfile</span>
<span class="gp">>>> </span><span class="n">cProfile</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="s1">'for i in range(5000): serialize_user(u)'</span><span class="p">,</span> <span class="n">sort</span><span class="o">=</span><span class="s1">'tottime'</span><span class="p">)</span>
<span class="hll"><span class="go">15003 function calls in 0.034 seconds</span>
</span>
<span class="go">Ordered by: internal time</span>
<span class="go">ncalls tottime percall cumtime percall filename:lineno(function)</span>
<span class="go"> 5000 0.020 0.000 0.021 0.000 {method 'isoformat' of 'datetime.datetime' objects}</span>
<span class="go"> 5000 0.010 0.000 0.030 0.000 drf_test.py:150(serialize_user)</span>
<span class="go"> 1 0.003 0.003 0.034 0.034 <string>:1(<module>)</span>
<span class="go"> 5000 0.001 0.000 0.001 0.000 __init__.py:208(utcoffset)</span>
<span class="go"> 1 0.000 0.000 0.034 0.034 {built-in method builtins.exec}</span>
<span class="go"> 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}</span>
</pre></div>
<p>The simple function took 0.034 seconds to serialize a <code>User</code> object 5,000 times.</p>
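<p>If you want to try the same style of benchmark without a Django project, the sketch below profiles a plain serialization function with <code>cProfile</code> and inspects the result with <code>pstats</code>. The <code>PlainUser</code> dataclass and <code>serialize_plain_user</code> function are hypothetical stand-ins for Django's <code>User</code> model and the <code>serialize_user</code> function above, with fewer fields, so the numbers will not match the ones in this article:</p>

```python
import cProfile
import pstats
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass
class PlainUser:
    """Hypothetical stand-in for Django's User, so the example runs without Django."""
    id: int
    username: str
    date_joined: datetime
    last_login: Optional[datetime] = None


def serialize_plain_user(user: PlainUser) -> Dict[str, Any]:
    return {
        'id': user.id,
        'username': user.username,
        'last_login': user.last_login.isoformat() if user.last_login is not None else None,
        'date_joined': user.date_joined.isoformat(),
    }


user = PlainUser(id=1, username='hakib', date_joined=datetime.now(timezone.utc))

# Profile only the serialization loop, the object is created in advance.
profiler = cProfile.Profile()
profiler.enable()
for _ in range(5000):
    serialize_plain_user(user)
profiler.disable()

# Sort by internal time, like sort='tottime' in cProfile.run().
stats = pstats.Stats(profiler)
stats.sort_stats('tottime').print_stats(5)
```

<p>Using a <code>Profile</code> object instead of <code>cProfile.run</code> makes it easy to exclude setup code from the measurement and to work with the collected statistics programmatically.</p>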
<h3 id="modelserializer"><a class="toclink" href="#modelserializer"><code>ModelSerializer</code></a></h3>
<p>Django Rest Framework (DRF) comes with a few utility classes, namely the <a href="https://www.django-rest-framework.org/api-guide/serializers/#modelserializer" rel="noopener"><code>ModelSerializer</code></a>.</p>
<p>A <code>ModelSerializer</code> for the built-in <code>User</code> model might look like this:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">rest_framework</span> <span class="kn">import</span> <span class="n">serializers</span>
<span class="k">class</span> <span class="nc">UserModelSerializer</span><span class="p">(</span><span class="n">serializers</span><span class="o">.</span><span class="n">ModelSerializer</span><span class="p">):</span>
    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
        <span class="n">model</span> <span class="o">=</span> <span class="n">User</span>
        <span class="n">fields</span> <span class="o">=</span> <span class="p">[</span>
            <span class="s1">'id'</span><span class="p">,</span>
            <span class="s1">'last_login'</span><span class="p">,</span>
            <span class="s1">'is_superuser'</span><span class="p">,</span>
            <span class="s1">'username'</span><span class="p">,</span>
            <span class="s1">'first_name'</span><span class="p">,</span>
            <span class="s1">'last_name'</span><span class="p">,</span>
            <span class="s1">'email'</span><span class="p">,</span>
            <span class="s1">'is_staff'</span><span class="p">,</span>
            <span class="s1">'is_active'</span><span class="p">,</span>
            <span class="s1">'date_joined'</span><span class="p">,</span>
        <span class="p">]</span>
</pre></div>
<p>Running the same benchmark as before:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">cProfile</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="s1">'for i in range(5000): UserModelSerializer(u).data'</span><span class="p">,</span> <span class="n">sort</span><span class="o">=</span><span class="s1">'tottime'</span><span class="p">)</span>
<span class="hll"><span class="go">18845053 function calls (18735053 primitive calls) in 12.818 seconds</span>
</span>
<span class="go">Ordered by: internal time</span>
<span class="go"> ncalls tottime percall cumtime percall filename:lineno(function)</span>
<span class="go"> 85000 2.162 0.000 4.706 0.000 functional.py:82(__prepare_class__)</span>
<span class="go"> 7955000 1.565 0.000 1.565 0.000 {built-in method builtins.hasattr}</span>
<span class="go"> 1080000 0.701 0.000 0.701 0.000 functional.py:102(__promise__)</span>
<span class="go"> 50000 0.594 0.000 4.886 0.000 field_mapping.py:66(get_field_kwargs)</span>
<span class="go"> 1140000 0.563 0.000 0.581 0.000 {built-in method builtins.getattr}</span>
<span class="go"> 55000 0.489 0.000 0.634 0.000 fields.py:319(__init__)</span>
<span class="go"> 1240000 0.389 0.000 0.389 0.000 {built-in method builtins.setattr}</span>
<span class="go"> 5000 0.342 0.000 11.773 0.002 serializers.py:992(get_fields)</span>
<span class="go"> 20000 0.338 0.000 0.446 0.000 {built-in method builtins.__build_class__}</span>
<span class="go"> 210000 0.333 0.000 0.792 0.000 trans_real.py:275(gettext)</span>
<span class="go"> 75000 0.312 0.000 2.285 0.000 functional.py:191(wrapper)</span>
<span class="go"> 20000 0.248 0.000 4.817 0.000 fields.py:762(__init__)</span>
<span class="go"> 1300000 0.230 0.000 0.264 0.000 {built-in method builtins.isinstance}</span>
<span class="go"> 50000 0.224 0.000 5.311 0.000 serializers.py:1197(build_standard_field)</span>
</pre></div>
<p>It took DRF 12.8 seconds to serialize a user 5,000 times, or 2.56ms to serialize just a single user. That is <strong>377 times slower than the plain function</strong>.</p>
<p>We can see that a significant amount of time is spent in <code>functional.py</code>. <code>ModelSerializer</code> uses the <code>lazy</code> function from <code>django.utils.functional</code> to evaluate validations. Django also uses it for verbose names and so on, which DRF evaluates as well. This function seems to be weighing down the serializer.</p>
<h3 id="read-only-modelserializer"><a class="toclink" href="#read-only-modelserializer">Read Only <code>ModelSerializer</code></a></h3>
<p>Field validations are added by <code>ModelSerializer</code> only for writable fields. To measure the effect of validation, we create a <code>ModelSerializer</code> and mark all fields as read only:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">rest_framework</span> <span class="kn">import</span> <span class="n">serializers</span>
<span class="k">class</span> <span class="nc">UserReadOnlyModelSerializer</span><span class="p">(</span><span class="n">serializers</span><span class="o">.</span><span class="n">ModelSerializer</span><span class="p">):</span>
    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
        <span class="n">model</span> <span class="o">=</span> <span class="n">User</span>
        <span class="n">fields</span> <span class="o">=</span> <span class="p">[</span>
            <span class="s1">'id'</span><span class="p">,</span>
            <span class="s1">'last_login'</span><span class="p">,</span>
            <span class="s1">'is_superuser'</span><span class="p">,</span>
            <span class="s1">'username'</span><span class="p">,</span>
            <span class="s1">'first_name'</span><span class="p">,</span>
            <span class="s1">'last_name'</span><span class="p">,</span>
            <span class="s1">'email'</span><span class="p">,</span>
            <span class="s1">'is_staff'</span><span class="p">,</span>
            <span class="s1">'is_active'</span><span class="p">,</span>
            <span class="s1">'date_joined'</span><span class="p">,</span>
        <span class="p">]</span>
<span class="hll">        <span class="n">read_only_fields</span> <span class="o">=</span> <span class="n">fields</span>
</span></pre></div>
<p>When all fields are read only, the serializer cannot be used to create new instances.</p>
<p>Let's run our benchmark with the read only serializer:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">cProfile</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="s1">'for i in range(5000): UserReadOnlyModelSerializer(u).data'</span><span class="p">,</span> <span class="n">sort</span><span class="o">=</span><span class="s1">'tottime'</span><span class="p">)</span>
<span class="hll"><span class="go">14540060 function calls (14450060 primitive calls) in 7.407 seconds</span>
</span>
<span class="go"> Ordered by: internal time</span>
<span class="go"> ncalls tottime percall cumtime percall filename:lineno(function)</span>
<span class="go">6090000 0.809 0.000 0.809 0.000 {built-in method builtins.hasattr}</span>
<span class="go"> 65000 0.725 0.000 1.516 0.000 functional.py:82(__prepare_class__)</span>
<span class="go"> 50000 0.561 0.000 4.182 0.000 field_mapping.py:66(get_field_kwargs)</span>
<span class="go"> 55000 0.435 0.000 0.558 0.000 fields.py:319(__init__)</span>
<span class="go"> 840000 0.330 0.000 0.346 0.000 {built-in method builtins.getattr}</span>
<span class="go"> 210000 0.294 0.000 0.688 0.000 trans_real.py:275(gettext)</span>
<span class="go"> 5000 0.282 0.000 6.510 0.001 serializers.py:992(get_fields)</span>
<span class="go"> 75000 0.220 0.000 1.989 0.000 functional.py:191(wrapper)</span>
<span class="go">1305000 0.200 0.000 0.228 0.000 {built-in method builtins.isinstance}</span>
<span class="go"> 50000 0.182 0.000 4.531 0.000 serializers.py:1197(build_standard_field)</span>
<span class="go"> 50000 0.145 0.000 0.259 0.000 serializers.py:1310(include_extra_kwargs)</span>
<span class="go"> 55000 0.133 0.000 0.696 0.000 text.py:14(capfirst)</span>
<span class="go"> 50000 0.127 0.000 2.377 0.000 field_mapping.py:46(needs_label)</span>
<span class="go"> 210000 0.119 0.000 0.145 0.000 gettext.py:451(gettext)</span>
</pre></div>
<p>Only 7.4 seconds. A 40% improvement compared to the writable <code>ModelSerializer</code>.</p>
<p>In the benchmark's output we can see a lot of time is being spent in <code>field_mapping.py</code> and <code>fields.py</code>. These are related to the inner workings of the <code>ModelSerializer</code>. In the serialization and initialization process the <code>ModelSerializer</code> is using a lot of metadata to construct and validate the serializer fields, and it comes at a cost.</p>
<h3 id="regular-serializer"><a class="toclink" href="#regular-serializer">"Regular" <code>Serializer</code></a></h3>
<p>In the next benchmark, we wanted to measure exactly how much the <code>ModelSerializer</code> "costs" us. Let's create a "regular" <a href="https://www.django-rest-framework.org/api-guide/serializers/#declaring-serializers" rel="noopener"><code>Serializer</code></a> for the <code>User</code> model:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">rest_framework</span> <span class="kn">import</span> <span class="n">serializers</span>
<span class="k">class</span> <span class="nc">UserSerializer</span><span class="p">(</span><span class="n">serializers</span><span class="o">.</span><span class="n">Serializer</span><span class="p">):</span>
    <span class="nb">id</span> <span class="o">=</span> <span class="n">serializers</span><span class="o">.</span><span class="n">IntegerField</span><span class="p">()</span>
    <span class="n">last_login</span> <span class="o">=</span> <span class="n">serializers</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">()</span>
    <span class="n">is_superuser</span> <span class="o">=</span> <span class="n">serializers</span><span class="o">.</span><span class="n">BooleanField</span><span class="p">()</span>
    <span class="n">username</span> <span class="o">=</span> <span class="n">serializers</span><span class="o">.</span><span class="n">CharField</span><span class="p">()</span>
    <span class="n">first_name</span> <span class="o">=</span> <span class="n">serializers</span><span class="o">.</span><span class="n">CharField</span><span class="p">()</span>
    <span class="n">last_name</span> <span class="o">=</span> <span class="n">serializers</span><span class="o">.</span><span class="n">CharField</span><span class="p">()</span>
    <span class="n">email</span> <span class="o">=</span> <span class="n">serializers</span><span class="o">.</span><span class="n">EmailField</span><span class="p">()</span>
    <span class="n">is_staff</span> <span class="o">=</span> <span class="n">serializers</span><span class="o">.</span><span class="n">BooleanField</span><span class="p">()</span>
    <span class="n">is_active</span> <span class="o">=</span> <span class="n">serializers</span><span class="o">.</span><span class="n">BooleanField</span><span class="p">()</span>
    <span class="n">date_joined</span> <span class="o">=</span> <span class="n">serializers</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">()</span>
</pre></div>
<p>Running the same benchmark using the "regular" serializer:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">cProfile</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="s1">'for i in range(5000): UserSerializer(u).data'</span><span class="p">,</span> <span class="n">sort</span><span class="o">=</span><span class="s1">'tottime'</span><span class="p">)</span>
<span class="hll"><span class="go">3110007 function calls (3010007 primitive calls) in 2.101 seconds</span>
</span>
<span class="go">Ordered by: internal time</span>
<span class="go"> ncalls tottime percall cumtime percall filename:lineno(function)</span>
<span class="go"> 55000 0.329 0.000 0.430 0.000 fields.py:319(__init__)</span>
<span class="go">105000/5000 0.188 0.000 1.247 0.000 copy.py:132(deepcopy)</span>
<span class="go"> 50000 0.145 0.000 0.863 0.000 fields.py:626(__deepcopy__)</span>
<span class="go"> 20000 0.093 0.000 0.320 0.000 fields.py:762(__init__)</span>
<span class="go"> 310000 0.092 0.000 0.092 0.000 {built-in method builtins.getattr}</span>
<span class="go"> 50000 0.087 0.000 0.125 0.000 fields.py:365(bind)</span>
<span class="go"> 5000 0.072 0.000 1.934 0.000 serializers.py:508(to_representation)</span>
<span class="go"> 55000 0.055 0.000 0.066 0.000 fields.py:616(__new__)</span>
<span class="go"> 5000 0.053 0.000 1.204 0.000 copy.py:268(_reconstruct)</span>
<span class="go"> 235000 0.052 0.000 0.052 0.000 {method 'update' of 'dict' objects}</span>
<span class="go"> 50000 0.048 0.000 0.097 0.000 fields.py:55(is_simple_callable)</span>
<span class="go"> 260000 0.048 0.000 0.075 0.000 {built-in method builtins.isinstance}</span>
<span class="go"> 25000 0.047 0.000 0.051 0.000 deconstruct.py:14(__new__)</span>
<span class="go"> 55000 0.042 0.000 0.057 0.000 copy.py:252(_keep_alive)</span>
<span class="go"> 50000 0.041 0.000 0.197 0.000 fields.py:89(get_attribute)</span>
<span class="go"> 5000 0.037 0.000 1.459 0.000 serializers.py:353(fields)</span>
</pre></div>
<p>Here is the leap we were waiting for!</p>
<p>The "regular" serializer took only 2.1 seconds. That's about 72% faster than the read only <code>ModelSerializer</code>, and a whopping 84% faster than the writable <code>ModelSerializer</code>.</p>
<p>At this point it becomes obvious that the <code>ModelSerializer</code> does not come cheap!</p>
<h3 id="read-only-regular-serializer"><a class="toclink" href="#read-only-regular-serializer">Read Only "regular" <code>Serializer</code></a></h3>
<p>In the writable <code>ModelSerializer</code> a lot of time was spent on validations. We were able to make it faster by marking all fields as read only. The "regular" serializer does not define any validation, so marking fields as read only is not expected to be faster. Let's make sure:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">rest_framework</span> <span class="kn">import</span> <span class="n">serializers</span>
<span class="k">class</span> <span class="nc">UserReadOnlySerializer</span><span class="p">(</span><span class="n">serializers</span><span class="o">.</span><span class="n">Serializer</span><span class="p">):</span>
    <span class="nb">id</span> <span class="o">=</span> <span class="n">serializers</span><span class="o">.</span><span class="n">IntegerField</span><span class="p">(</span><span class="n">read_only</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="n">last_login</span> <span class="o">=</span> <span class="n">serializers</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">(</span><span class="n">read_only</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="n">is_superuser</span> <span class="o">=</span> <span class="n">serializers</span><span class="o">.</span><span class="n">BooleanField</span><span class="p">(</span><span class="n">read_only</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="n">username</span> <span class="o">=</span> <span class="n">serializers</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">read_only</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="n">first_name</span> <span class="o">=</span> <span class="n">serializers</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">read_only</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="n">last_name</span> <span class="o">=</span> <span class="n">serializers</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">read_only</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="n">email</span> <span class="o">=</span> <span class="n">serializers</span><span class="o">.</span><span class="n">EmailField</span><span class="p">(</span><span class="n">read_only</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="n">is_staff</span> <span class="o">=</span> <span class="n">serializers</span><span class="o">.</span><span class="n">BooleanField</span><span class="p">(</span><span class="n">read_only</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="n">is_active</span> <span class="o">=</span> <span class="n">serializers</span><span class="o">.</span><span class="n">BooleanField</span><span class="p">(</span><span class="n">read_only</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="n">date_joined</span> <span class="o">=</span> <span class="n">serializers</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">(</span><span class="n">read_only</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</pre></div>
<p>And running the benchmark for a user instance:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">cProfile</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="s1">'for i in range(5000): UserReadOnlySerializer(u).data'</span><span class="p">,</span> <span class="n">sort</span><span class="o">=</span><span class="s1">'tottime'</span><span class="p">)</span>
<span class="hll"><span class="go">3360009 function calls (3210009 primitive calls) in 2.254 seconds</span>
</span>
<span class="go">Ordered by: internal time</span>
<span class="go"> ncalls tottime percall cumtime percall filename:lineno(function)</span>
<span class="go"> 55000 0.329 0.000 0.433 0.000 fields.py:319(__init__)</span>
<span class="go">155000/5000 0.241 0.000 1.385 0.000 copy.py:132(deepcopy)</span>
<span class="go"> 50000 0.161 0.000 1.000 0.000 fields.py:626(__deepcopy__)</span>
<span class="go"> 310000 0.095 0.000 0.095 0.000 {built-in method builtins.getattr}</span>
<span class="go"> 20000 0.088 0.000 0.319 0.000 fields.py:762(__init__)</span>
<span class="go"> 50000 0.087 0.000 0.129 0.000 fields.py:365(bind)</span>
<span class="go"> 5000 0.073 0.000 2.086 0.000 serializers.py:508(to_representation)</span>
<span class="go"> 55000 0.055 0.000 0.067 0.000 fields.py:616(__new__)</span>
<span class="go"> 5000 0.054 0.000 1.342 0.000 copy.py:268(_reconstruct)</span>
<span class="go"> 235000 0.053 0.000 0.053 0.000 {method 'update' of 'dict' objects}</span>
<span class="go"> 25000 0.052 0.000 0.057 0.000 deconstruct.py:14(__new__)</span>
<span class="go"> 260000 0.049 0.000 0.076 0.000 {built-in method builtins.isinstance}</span>
</pre></div>
<p>As expected, marking the fields as read only didn't make a significant difference compared to the "regular" serializer. This reaffirms that, in the <code>ModelSerializer</code>, the time was spent on validations derived from the model's field definitions.</p>
<h3 id="results-summary"><a class="toclink" href="#results-summary">Results Summary</a></h3>
<p>Here is a summary of the results so far:</p>
<div class="table-container">
<table>
<thead>
<tr>
<th>serializer</th>
<th>seconds</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>UserModelSerializer</code></td>
<td>12.818</td>
</tr>
<tr>
<td><code>UserReadOnlyModelSerializer</code></td>
<td>7.407</td>
</tr>
<tr>
<td><code>UserSerializer</code></td>
<td>2.101</td>
</tr>
<tr>
<td><code>UserReadOnlySerializer</code></td>
<td>2.254</td>
</tr>
<tr>
<td><code>serialize_user</code></td>
<td>0.034</td>
</tr>
</tbody>
</table>
</div>
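<p>The relative improvements quoted above follow directly from these timings. As a rough worked example, the snippet below recomputes the share of time saved relative to the writable <code>ModelSerializer</code>. The helper name <code>percent_saved</code> is ours for illustration and is not part of the benchmark code.</p>

```python
# Timings in seconds, copied from the summary table above.
TIMINGS = {
    "UserModelSerializer": 12.818,
    "UserReadOnlyModelSerializer": 7.407,
    "UserSerializer": 2.101,
    "UserReadOnlySerializer": 2.254,
    "serialize_user": 0.034,
}


def percent_saved(seconds, reference):
    """Share of the reference time that is saved, in percent (hypothetical helper)."""
    return (reference - seconds) / reference * 100


# Use the writable ModelSerializer as the baseline for every comparison.
baseline = TIMINGS["UserModelSerializer"]
for name, seconds in TIMINGS.items():
    saved = percent_saved(seconds, baseline)
    print(f"{name:<28} {seconds:>7.3f}s  {saved:5.1f}% saved")
```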
<hr>
<h2 id="why-is-this-happening"><a class="toclink" href="#why-is-this-happening">Why is This Happening?</a></h2>
<p>A lot of articles were written about serialization performance in Python. As expected, most articles focus on improving DB access using techniques like <code>select_related</code> and <code>prefetch_related</code>. While both are valid ways to improve the <em>overall</em> response time of an API request, they don't address the serialization itself. I suspect this is because nobody expects serialization to be slow.</p>
<h3 id="prior-work"><a class="toclink" href="#prior-work">Prior Work</a></h3>
<p>Other articles that do focus solely on serialization usually avoid fixing DRF, and instead promote <a href="https://engineering.betterworks.com/2015/09/04/ditching-django-rest-framework-serializers-for-serpy/" rel="noopener">new serialization frameworks</a> such as <a href="https://marshmallow.readthedocs.io" rel="noopener">marshmallow</a> and <a href="https://serpy.readthedocs.io/en/latest/" rel="noopener">serpy</a>. There is even a site devoted to <a href="https://voidfiles.github.io/python-serialization-benchmark/" rel="noopener">comparing serialization formats in Python</a>. To save you a click, DRF always comes last.</p>
<p>In late 2013, Tom Christie, the creator of Django Rest Framework, wrote <a href="https://www.dabapps.com/blog/api-performance-profiling-django-rest-framework/" rel="noopener">an article</a> discussing some of DRF's drawbacks. In his benchmarks, serialization accounted for 12% of the total time spent on processing a single request. In the summary, Tom recommends not always resorting to serializers:</p>
<blockquote>
<p><strong>4. You don't always need to use serializers.</strong></p>
<p>For performance critical views you might consider dropping the serializers entirely and simply use .values() in your database queries.</p>
</blockquote>
<p>As we'll see in a bit, this is solid advice.</p>
<h3 id="fixing-djangos-lazy"><a class="toclink" href="#fixing-djangos-lazy">Fixing Django's <code>lazy</code></a></h3>
<p>In the first benchmark using <code>ModelSerializer</code> we saw a significant amount of time being spent in <code>functional.py</code>, and more specifically in the function <code>lazy</code>.</p>
<p>The function <code>lazy</code> is used internally by Django for many things such as verbose names, templates etc. <a href="https://github.com/django/django/blob/2.2.1/django/utils/functional.py#L92-L207" rel="noopener">The source</a> describes <code>lazy</code> as follows:</p>
<blockquote>
<p>Encapsulate a function call and act as a proxy for methods that are called on the result of that function. The function is not evaluated until one of the methods on the result is called.</p>
</blockquote>
<p>The <code>lazy</code> function does its magic by creating a proxy of the result class. To create the proxy, <code>lazy</code> iterates over all attributes and functions of the result class (and its super-classes), and creates a wrapper class which evaluates the function only when its result is actually used.</p>
<p>For large result classes, it can take some time to create the proxy. So, to speed things up, <code>lazy</code> caches the proxy. But as it turns out, <strong>a small oversight in the code completely broke the cache mechanism, making the <code>lazy</code> function <em>very very</em> slow.</strong></p>
<p>To get a sense of just how slow <code>lazy</code> is without proper caching, let's use a simple function which returns a <code>str</code> (the result class), such as <code>upper</code>. We choose <code>str</code> because it has a lot of methods, so it should take a while to set up a proxy for it.</p>
<p>To establish a baseline, we benchmark using <code>str.upper</code> directly, without <code>lazy</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">cProfile</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.utils.functional</span> <span class="kn">import</span> <span class="n">lazy</span>
<span class="gp">>>> </span><span class="n">upper</span> <span class="o">=</span> <span class="nb">str</span><span class="o">.</span><span class="n">upper</span>
<span class="gp">>>> </span><span class="n">cProfile</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="s1">'''for i in range(50000): upper('hello') + ""'''</span><span class="p">,</span> <span class="n">sort</span><span class="o">=</span><span class="s1">'cumtime'</span><span class="p">)</span>
<span class="hll"><span class="go"> 50003 function calls in 0.034 seconds</span>
</span>
<span class="go"> Ordered by: cumulative time</span>
<span class="go"> ncalls tottime percall cumtime percall filename:lineno(function)</span>
<span class="go"> 1 0.000 0.000 0.034 0.034 {built-in method builtins.exec}</span>
<span class="go"> 1 0.024 0.024 0.034 0.034 <string>:1(<module>)</span>
<span class="go"> 50000 0.011 0.000 0.011 0.000 {method 'upper' of 'str' objects}</span>
<span class="go"> 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}</span>
</pre></div>
<p>Now for the scary part, the exact same function but this time wrapped with <code>lazy</code>:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="gp">>>> </span><span class="n">lazy_upper</span> <span class="o">=</span> <span class="n">lazy</span><span class="p">(</span><span class="n">upper</span><span class="p">,</span> <span class="nb">str</span><span class="p">)</span>
</span><span class="gp">>>> </span><span class="n">cProfile</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="s1">'''for i in range(50000): lazy_upper('hello') + ""'''</span><span class="p">,</span> <span class="n">sort</span><span class="o">=</span><span class="s1">'cumtime'</span><span class="p">)</span>
<span class="hll"><span class="go"> 4900111 function calls in 1.139 seconds</span>
</span>
<span class="go"> Ordered by: cumulative time</span>
<span class="go"> ncalls tottime percall cumtime percall filename:lineno(function)</span>
<span class="go"> 1 0.000 0.000 1.139 1.139 {built-in method builtins.exec}</span>
<span class="go"> 1 0.037 0.037 1.139 1.139 <string>:1(<module>)</span>
<span class="go"> 50000 0.018 0.000 1.071 0.000 functional.py:160(__wrapper__)</span>
<span class="go"> 50000 0.028 0.000 1.053 0.000 functional.py:66(__init__)</span>
<span class="go"> 50000 0.500 0.000 1.025 0.000 functional.py:83(__prepare_class__)</span>
<span class="go">4600000 0.519 0.000 0.519 0.000 {built-in method builtins.hasattr}</span>
<span class="go"> 50000 0.024 0.000 0.031 0.000 functional.py:106(__wrapper__)</span>
<span class="go"> 50000 0.006 0.000 0.006 0.000 {method 'mro' of 'type' objects}</span>
<span class="go"> 50000 0.006 0.000 0.006 0.000 {built-in method builtins.getattr}</span>
<span class="go"> 54 0.000 0.000 0.000 0.000 {built-in method builtins.setattr}</span>
<span class="go"> 54 0.000 0.000 0.000 0.000 functional.py:103(__promise__)</span>
<span class="go"> 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}</span>
</pre></div>
<p>No mistake! Using <code>lazy</code> it took 1.139 seconds to turn 50,000 strings uppercase. The same exact function used directly took only 0.034 seconds. That is 33.5 times slower.</p>
<p>This was obviously an oversight. The developers were clearly aware of the importance of caching the proxy. A PR was issued, and merged shortly after (diff <a href="https://github.com/django/django/commit/a2c31e12da272acc76f3a3a0157fae9a7f6477ac" rel="noopener">here</a>). Once released, this patch is supposed to make Django's overall performance a bit better.</p>
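<p>To illustrate how a cache like this can silently stop working, here is a simplified, hypothetical sketch. It is not Django's actual code: the <code>make_proxy_class</code> factory and the <code>prepared</code> flag are assumptions made for the example. The only difference between the two variants is whether the "already prepared" flag is stored on the instance or on the class.</p>

```python
def make_proxy_class(cache_on_class):
    """Build a toy proxy class (hypothetical, not Django's implementation)."""

    class Proxy:
        prepared = False   # class-level flag: has the expensive setup run?
        prepare_calls = 0  # counts how often the expensive setup ran

        def __init__(self, func, *args):
            self._func = func
            self._args = args
            if not self.prepared:
                self._prepare_class()
            if cache_on_class:
                # Stored on the class, so every later instance sees it.
                type(self).prepared = True
            else:
                # Stored on this instance only; the class flag stays False.
                self.prepared = True

        @classmethod
        def _prepare_class(cls):
            cls.prepare_calls += 1
            # Stand-in for the expensive scan over the result class.
            for name in dir(str):
                hasattr(str, name)

        def __str__(self):
            # The wrapped function is evaluated only when the result is needed.
            return self._func(*self._args)

    return Proxy


Broken = make_proxy_class(cache_on_class=False)
Fixed = make_proxy_class(cache_on_class=True)

for _ in range(1000):
    Broken(str.upper, "hello")
    Fixed(str.upper, "hello")

print(Broken.prepare_calls)  # the setup ran once per instance
print(Fixed.prepare_calls)   # the setup ran only once
```

<p>In the broken variant every new proxy repeats the expensive setup, which matches the pattern in the profile above where <code>__prepare_class__</code> is called once per iteration.</p>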
<h3 id="fixing-django-rest-framework"><a class="toclink" href="#fixing-django-rest-framework">Fixing Django Rest Framework</a></h3>
<p>DRF uses <code>lazy</code> for validations and field verbose names. When all of these lazy evaluations are put together, you get a noticeable slowdown.</p>
<p>The fix to <code>lazy</code> in Django would have solved this issue for DRF as well after a minor fix, but nonetheless, <a href="https://github.com/encode/django-rest-framework/commit/c2293e9f251b1f215825186a7bcbf5a006df0cb0" rel="noopener">a separate fix to DRF</a> was made to replace <code>lazy</code> with something more efficient.</p>
<p>To see the effect of the changes, install the latest of both Django and DRF:</p>
<div class="highlight"><pre><span></span><span class="gp gp-VirtualEnv">(venv)</span> <span class="gp">$ </span>pip<span class="w"> </span>install<span class="w"> </span>git+https://github.com/encode/django-rest-framework
<span class="gp gp-VirtualEnv">(venv)</span> <span class="gp">$ </span>pip<span class="w"> </span>install<span class="w"> </span>git+https://github.com/django/django
</pre></div>
<p>After applying both patches, we ran the same benchmark again. These are the results side by side:</p>
<div class="table-container">
<table>
<thead>
<tr>
<th>serializer</th>
<th>before</th>
<th>after</th>
<th>% change</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>UserModelSerializer</code></td>
<td>12.818</td>
<td>5.674</td>
<td>-55%</td>
</tr>
<tr>
<td><code>UserReadOnlyModelSerializer</code></td>
<td>7.407</td>
<td>5.323</td>
<td>-28%</td>
</tr>
<tr>
<td><code>UserSerializer</code></td>
<td>2.101</td>
<td>2.146</td>
<td>+2%</td>
</tr>
<tr>
<td><code>UserReadOnlySerializer</code></td>
<td>2.254</td>
<td>2.125</td>
<td>-5%</td>
</tr>
<tr>
<td><code>serialize_user</code></td>
<td>0.034</td>
<td>0.034</td>
<td>0%</td>
</tr>
</tbody>
</table>
</div>
<p>To sum up the results of the changes to both Django and DRF:</p>
<ul>
<li>Serialization time for a writable <code>ModelSerializer</code> was cut by more than half.</li>
<li>Serialization time for a read only <code>ModelSerializer</code> was cut by almost a third.</li>
<li>As expected, there is no noticeable difference in the other serialization methods.</li>
</ul>
<hr>
<h2 id="takeaway"><a class="toclink" href="#takeaway">Takeaway</a></h2>
<p>Our takeaways from this experiment were:</p>
<div class="admonition tip">
<p class="admonition-title">Take away</p>
<p>Upgrade DRF and Django once these patches make their way into a formal release.</p>
</div>
<p>Both PRs were merged but not yet released.</p>
<div class="admonition tip">
<p class="admonition-title">Take away</p>
<p>In performance critical endpoints, use a "regular" serializer, or none at all.</p>
</div>
<p>We had several places where clients were fetching large amounts of data using an API. The API was used only for reading data from the server, so we decided to not use a <code>Serializer</code> at all, and inline the serialization instead.</p>
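<p>As a minimal sketch of what inlining the serialization can look like, the function below builds a plain dictionary from the same fields used in the benchmarks. The names <code>serialize_user_inline</code> and <code>isoformat_or_none</code> are hypothetical, and a <code>SimpleNamespace</code> stands in for a real <code>User</code> instance so the example runs without Django.</p>

```python
from datetime import datetime, timezone
from types import SimpleNamespace
from typing import Any, Dict, Optional


def isoformat_or_none(value: Optional[datetime]) -> Optional[str]:
    """Render a datetime as an ISO 8601 string, keeping None as None."""
    return value.isoformat() if value is not None else None


def serialize_user_inline(user: Any) -> Dict[str, Any]:
    """Hypothetical read-only serialization: no fields, no validation."""
    return {
        "id": user.id,
        "last_login": isoformat_or_none(user.last_login),
        "is_superuser": user.is_superuser,
        "username": user.username,
        "first_name": user.first_name,
        "last_name": user.last_name,
        "email": user.email,
        "is_staff": user.is_staff,
        "is_active": user.is_active,
        "date_joined": isoformat_or_none(user.date_joined),
    }


# Stand-in for a Django User instance (assumption made for the example).
user = SimpleNamespace(
    id=1,
    last_login=None,
    is_superuser=False,
    username="haki",
    first_name="",
    last_name="",
    email="haki@example.com",
    is_staff=False,
    is_active=True,
    date_joined=datetime(2019, 1, 1, tzinfo=timezone.utc),
)
print(serialize_user_inline(user))
```

<p>The trade-off is that nothing is validated and nothing is derived from the model, so a function like this only makes sense for endpoints that never write.</p>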
<div class="admonition tip">
<p class="admonition-title">Take away</p>
<p>Serializer fields that are not used for writing or validation should be read only.</p>
</div>
<p>As we've seen in the benchmarks, the way validations are implemented makes them expensive. Marking fields as read only eliminates this unnecessary cost.</p>
<hr>
<h2 id="bonus-forcing-good-habits"><a class="toclink" href="#bonus-forcing-good-habits">Bonus: Forcing Good Habits</a></h2>
<p>To make sure developers don't forget to set read only fields, we added a Django check to make sure all <code>ModelSerializer</code>s set <code>read_only_fields</code>:</p>
<div class="highlight"><pre><span></span><span class="c1"># common/checks.py</span>
<span class="kn">import</span> <span class="nn">django.core.checks</span>
<span class="nd">@django</span><span class="o">.</span><span class="n">core</span><span class="o">.</span><span class="n">checks</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="s1">'rest_framework.serializers'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">check_serializers</span><span class="p">(</span><span class="n">app_configs</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
    <span class="kn">import</span> <span class="nn">inspect</span>
    <span class="kn">from</span> <span class="nn">rest_framework.serializers</span> <span class="kn">import</span> <span class="n">ModelSerializer</span>
    <span class="kn">import</span> <span class="nn">conf.urls</span> <span class="c1"># noqa, force import of all serializers.</span>
    <span class="k">for</span> <span class="n">serializer</span> <span class="ow">in</span> <span class="n">ModelSerializer</span><span class="o">.</span><span class="n">__subclasses__</span><span class="p">():</span>
        <span class="c1"># Skip third-party apps.</span>
        <span class="n">path</span> <span class="o">=</span> <span class="n">inspect</span><span class="o">.</span><span class="n">getfile</span><span class="p">(</span><span class="n">serializer</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">path</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s1">'site-packages'</span><span class="p">)</span> <span class="o">></span> <span class="o">-</span><span class="mi">1</span><span class="p">:</span>
            <span class="k">continue</span>
        <span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">serializer</span><span class="o">.</span><span class="n">Meta</span><span class="p">,</span> <span class="s1">'read_only_fields'</span><span class="p">):</span>
            <span class="k">continue</span>
        <span class="k">yield</span> <span class="n">django</span><span class="o">.</span><span class="n">core</span><span class="o">.</span><span class="n">checks</span><span class="o">.</span><span class="n">Warning</span><span class="p">(</span>
            <span class="s1">'ModelSerializer must define read_only_fields.'</span><span class="p">,</span>
            <span class="n">hint</span><span class="o">=</span><span class="s1">'Set read_only_fields in ModelSerializer.Meta'</span><span class="p">,</span>
            <span class="n">obj</span><span class="o">=</span><span class="n">serializer</span><span class="p">,</span>
            <span class="nb">id</span><span class="o">=</span><span class="s1">'H300'</span><span class="p">,</span>
        <span class="p">)</span>
</pre></div>
<p>With this check in place, when a developer adds a serializer she must also set <code>read_only_fields</code>. If the serializer is writable, <code>read_only_fields</code> can be set to an empty tuple. If a developer forgets to set <code>read_only_fields</code>, she gets the following warning:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>python<span class="w"> </span>manage.py<span class="w"> </span>check
<span class="go">System check identified some issues:</span>
<span class="go">WARNINGS:</span>
<span class="go"><class 'serializers.UserSerializer'>: (H300) ModelSerializer must define read_only_fields.</span>
<span class="go"> HINT: Set read_only_fields in ModelSerializer.Meta</span>
<span class="go">System check identified 1 issue (4 silenced).</span>
</pre></div>
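<p>One thing to keep in mind with a check like this is that <code>__subclasses__()</code> returns only <em>direct</em> subclasses, so serializers that inherit from another project serializer are not visited. A possible way to cover them is to walk the class tree. The sketch below uses stand-in classes instead of real serializers, and <code>iter_all_subclasses</code> is a hypothetical helper, not part of Django or DRF.</p>

```python
def iter_all_subclasses(cls):
    """Yield every direct and indirect subclass of cls exactly once."""
    seen = set()
    stack = [cls]
    while stack:
        for sub in stack.pop().__subclasses__():
            if sub not in seen:
                seen.add(sub)
                yield sub
                stack.append(sub)


# Stand-ins for ModelSerializer and two project serializers (assumptions).
class BaseSerializer:
    pass


class UserSerializer(BaseSerializer):
    class Meta:
        read_only_fields = ("id",)


class AdminUserSerializer(UserSerializer):
    class Meta:  # forgot to set read_only_fields
        fields = ("id", "username")


direct = [c.__name__ for c in BaseSerializer.__subclasses__()]
everything = sorted(c.__name__ for c in iter_all_subclasses(BaseSerializer))
missing = [
    c.__name__
    for c in iter_all_subclasses(BaseSerializer)
    if not hasattr(c.Meta, "read_only_fields")
]
print(direct, everything, missing)
```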
<p>We use Django checks a lot to make sure nothing falls through the cracks. You can find many other useful checks in this article about <a href="/automating-the-boring-stuff-in-django-using-the-check-framework">how we use the Django system check framework</a>.</p>
<hr>
<p><strong>How to Let Google Know of Other Languages in Your Django Site</strong> (Haki Benita, 2019-05-07)</p>
<p>If you have a public facing Django site in multiple languages, you probably want to let Google and other search engines know about it.</p>
<figure><img alt="Linguistic map of the world (<a href="https://en.wikipedia.org/wiki/Linguistic_map">source</a>)" src="https://hakibenita.com/images/01-django-multi-language-site-hreflang.png"><figcaption>Linguistic map of the world (<a href="https://en.wikipedia.org/wiki/Linguistic_map">source</a>)</figcaption>
</figure>
<h2 id="multi-language-django-site"><a class="toclink" href="#multi-language-django-site">Multi-Language Django Site</a></h2>
<p>Django has a <a href="https://docs.djangoproject.com/en/2.2/topics/i18n/" rel="noopener">very extensive framework</a> to serve sites in multiple languages.
The minimal setup needed to add languages to a Django site is as follows.</p>
<p>Activate the <a href="https://docs.djangoproject.com/en/2.2/ref/settings/#std:setting-USE_I18N" rel="noopener">i18n framework</a> in <code>settings.py</code>:</p>
<div class="highlight"><pre><span></span><span class="c1"># settings.py</span>
<span class="n">USE_I18N</span> <span class="o">=</span> <span class="kc">True</span>
</pre></div>
<p>Define the <a href="https://docs.djangoproject.com/en/2.2/ref/settings/#languages" rel="noopener">supported languages</a>:</p>
<div class="highlight"><pre><span></span><span class="c1"># settings.py</span>
<span class="kn">from</span> <span class="nn">django.utils.translation</span> <span class="kn">import</span> <span class="n">gettext_lazy</span> <span class="k">as</span> <span class="n">_</span>
<span class="n">LANGUAGES</span> <span class="o">=</span> <span class="p">[</span>
    <span class="p">(</span><span class="s1">'en'</span><span class="p">,</span> <span class="n">_</span><span class="p">(</span><span class="s1">'English'</span><span class="p">)),</span>
    <span class="p">(</span><span class="s1">'he'</span><span class="p">,</span> <span class="n">_</span><span class="p">(</span><span class="s1">'Hebrew'</span><span class="p">)),</span>
<span class="p">]</span>
</pre></div>
<p>Set the <a href="https://docs.djangoproject.com/en/2.2/ref/settings/#language-code" rel="noopener">default language</a>:</p>
<div class="highlight"><pre><span></span><span class="c1"># settings.py</span>
<span class="n">LANGUAGE_CODE</span> <span class="o">=</span> <span class="s1">'en'</span>
</pre></div>
<p>Add <code>LocaleMiddleware</code>:</p>
<div class="highlight"><pre><span></span><span class="c1"># settings.py</span>
<span class="n">MIDDLEWARE</span> <span class="o">=</span> <span class="p">[</span>
    <span class="c1"># ...</span>
    <span class="s1">'django.middleware.locale.LocaleMiddleware'</span><span class="p">,</span>
    <span class="c1"># ...</span>
<span class="p">]</span>
</pre></div>
<p>Use <code>gettext</code> to mark texts for translation:</p>
<div class="highlight"><pre><span></span><span class="c1"># app/views.py</span>
<span class="kn">from</span> <span class="nn">django.utils.translation</span> <span class="kn">import</span> <span class="n">gettext_lazy</span> <span class="k">as</span> <span class="n">_</span>
<span class="kn">from</span> <span class="nn">django.http</span> <span class="kn">import</span> <span class="n">HttpResponse</span>
<span class="k">def</span> <span class="nf">about</span><span class="p">(</span><span class="n">request</span><span class="p">)</span> <span class="o">-></span> <span class="n">HttpResponse</span><span class="p">:</span>
    <span class="k">return</span> <span class="n">HttpResponse</span><span class="p">(</span><span class="n">_</span><span class="p">(</span><span class="s1">'Hello!'</span><span class="p">))</span>
</pre></div>
<p>Generate the translation files:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>python<span class="w"> </span>manage.py<span class="w"> </span>makemessages
</pre></div>
<p>Translate the text:</p>
<div class="highlight"><pre><span></span><span class="nv">msgid</span> <span class="s">"Hello!"</span>
<span class="nv">msgstr</span> <span class="s">"שלום!"</span>
</pre></div>
<p>Compile the translation files:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>python<span class="w"> </span>manage.py<span class="w"> </span>compilemessages
</pre></div>
<p>Serve views in multiple languages using <a href="https://docs.djangoproject.com/en/2.2/topics/i18n/translation/#django.conf.urls.i18n.i18n_patterns" rel="noopener"><code>i18n_patterns</code></a>:</p>
<div class="highlight"><pre><span></span><span class="c1"># urls.py</span>
<span class="kn">from</span> <span class="nn">django.conf.urls.i18n</span> <span class="kn">import</span> <span class="n">i18n_patterns</span>
<span class="kn">from</span> <span class="nn">django.conf.urls</span> <span class="kn">import</span> <span class="n">url</span>
<span class="kn">from</span> <span class="nn">.</span> <span class="kn">import</span> <span class="n">views</span>
<span class="n">urlpatterns</span> <span class="o">=</span> <span class="n">i18n_patterns</span><span class="p">(</span>
<span class="n">url</span><span class="p">(</span><span class="sa">r</span><span class="s1">'^about$'</span><span class="p">,</span> <span class="n">views</span><span class="o">.</span><span class="n">about</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">'about'</span><span class="p">),</span>
<span class="p">)</span>
</pre></div>
<p>Make sure that it works:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>curl<span class="w"> </span>http://localhost:8000/en/about
<span class="go">Hello!</span>
<span class="gp">$ </span>curl<span class="w"> </span>http://localhost:8000/he/about
<span class="go">שלום!</span>
</pre></div>
<p>This is it!</p>
<p>There are a few additional steps like <a href="https://docs.djangoproject.com/en/2.2/topics/i18n/translation/#set-language-redirect-view" rel="noopener">adding a view to switch the language</a>, but all in all, your multi-language Django site is ready to go!</p>
<h2 id="link-to-other-languages-using-hreflang"><a class="toclink" href="#link-to-other-languages-using-hreflang">Link to Other Languages Using <code>hreflang</code></a></h2>
<p>To let search engines know a page is available in a different language, you can use a special link tag:</p>
<div class="highlight"><pre><span></span><span class="p"><</span><span class="nt">link</span> <span class="na">rel</span><span class="o">=</span><span class="s">"alternate"</span> <span class="na">hreflang</span><span class="o">=</span><span class="s">"en"</span> <span class="na">href</span><span class="o">=</span><span class="s">"https://example.com/en"</span> <span class="p">/></span>
</pre></div>
<p>The tag has the following attributes:</p>
<ul>
<li><code>hreflang</code>: Language code of the linked page.</li>
<li><code>href</code>: Link to the page in the specified language.</li>
</ul>
<p>According to <a href="https://support.google.com/webmasters/answer/189077?hl=en" rel="noopener">Google's guidelines</a>, and the <a href="https://en.wikipedia.org/wiki/Hreflang" rel="noopener">information in Wikipedia</a>, these are the rules we need to follow:</p>
<ol>
<li>Use absolute URLs, including the scheme.</li>
<li>Links must be valid, and the linked page should be in the specified language.</li>
<li>List all languages, including the current one.</li>
<li>If language X links to language Y, language Y should link back to language X.</li>
</ol>
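<p>To make the rules concrete, here is a minimal sketch of a checker for rules 1, 3 and 4. It is not part of Django or of the guidelines: the function name <code>check_hreflang</code> and the shape of its input (a mapping of page URL to the <code>hreflang</code> links that page declares) are assumptions made for illustration. Rule 2 is left out because it requires fetching the pages.</p>

```python
from typing import Dict, List
from urllib.parse import urlsplit


def check_hreflang(pages: Dict[str, Dict[str, str]]) -> List[str]:
    """Return a list of rule violations (empty if none were found).

    `pages` maps each page URL to the {hreflang: href} links it declares.
    This input shape is hypothetical, chosen only for this sketch.
    """
    problems = []
    for page_url, links in pages.items():
        # Rule 1: every href must be absolute, including the scheme.
        for lang, href in links.items():
            parts = urlsplit(href)
            if not (parts.scheme and parts.netloc):
                problems.append(f"{page_url}: {lang} link is not absolute: {href}")
        # Rule 3: the page must list all languages, including its own.
        if page_url not in links.values():
            problems.append(f"{page_url}: missing link to itself")
        # Rule 4: every linked page we know about must link back.
        for href in links.values():
            if href in pages and page_url not in pages[href].values():
                problems.append(f"{href}: does not link back to {page_url}")
    return problems


en = "https://example.com/en/about"
he = "https://example.com/he/about"
print(check_hreflang({en: {"en": en, "he": he}, he: {"he": he}}))
```

<p>In this example the Hebrew page does not link back to the English page, so the checker reports a single problem.</p>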
<p>To implement this in Django, start by listing the available languages in a template, and set the language code in the <code>hreflang</code> attribute:</p>
<div class="highlight"><pre><span></span><span class="cp">{%</span> <span class="k">load</span> <span class="nv">i18n</span> <span class="cp">%}</span>
<span class="cp">{%</span> <span class="k">get_available_languages</span> <span class="k">as</span> <span class="nv">LANGUAGES</span> <span class="cp">%}</span>
<span class="cp">{%</span> <span class="k">for</span> <span class="nv">language_code</span><span class="o">,</span> <span class="nv">language_name</span> <span class="k">in</span> <span class="nv">LANGUAGES</span> <span class="cp">%}</span>
<span class="x"><link</span>
<span class="x"> rel="alternate"</span>
<span class="x"> hreflang="</span><span class="cp">{{</span> <span class="nv">language_code</span> <span class="cp">}}</span><span class="x">"</span>
<span class="x"> href="TODO" /></span>
<span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span>
</pre></div>
<p>The next step is to add localized links for each language. It took some digging, but it turns out Django already has a function called <code>translate_url</code> we can use:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django</span> <span class="kn">import</span> <span class="n">urls</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.utils</span> <span class="kn">import</span> <span class="n">translation</span>
<span class="gp">>>> </span><span class="n">translation</span><span class="o">.</span><span class="n">activate</span><span class="p">(</span><span class="s1">'en'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">urls</span><span class="o">.</span><span class="n">reverse</span><span class="p">(</span><span class="s1">'about'</span><span class="p">)</span>
<span class="go">'/en/about'</span>
<span class="gp">>>> </span><span class="n">urls</span><span class="o">.</span><span class="n">translate_url</span><span class="p">(</span><span class="s1">'/en/about'</span><span class="p">,</span> <span class="s1">'he'</span><span class="p">)</span>
<span class="go">'/he/about'</span>
</pre></div>
<p>The function <code>translate_url</code> accepts a URL and a language, and returns the URL in that language. In the example above, we activated English and got a URL prefixed with <code>/en</code>. We then used <code>translate_url</code> to get the same URL prefixed with <code>/he</code>.</p>
<p>The guidelines require absolute URLs.</p>
<p>Let's make sure <code>translate_url</code> can handle absolute URLs as well:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">urls</span><span class="o">.</span><span class="n">translate_url</span><span class="p">(</span><span class="s1">'https://example.com/en/about'</span><span class="p">,</span> <span class="s1">'he'</span><span class="p">)</span>
<span class="go">'https://example.com/he/about'</span>
</pre></div>
<p>Great! <code>translate_url</code> can "translate" absolute URLs.</p>
<p>How about URLs with query parameters or a hash?</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">urls</span><span class="o">.</span><span class="n">translate_url</span><span class="p">(</span><span class="s1">'https://example.com/en/about?utm_source=search#top'</span><span class="p">,</span> <span class="s1">'he'</span><span class="p">)</span>
<span class="go">'https://example.com/he/about?utm_source=search#top'</span>
</pre></div>
<p>Cool, it worked too!</p>
<p><strong>NOTE</strong>: It doesn't make much sense to have a page URL with query params and hashes in a place like a link tag (or canonical for that matter). The reason I mention it is because it might be useful for deep linking into other pages.</p>
<p>This is basically all that we need. But, <code>translate_url</code> has some limitations that are worth knowing.</p>
<p>Translate a non-localized URL:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">urls</span><span class="o">.</span><span class="n">translate_url</span><span class="p">(</span><span class="s1">'/about'</span><span class="p">,</span> <span class="s1">'en'</span><span class="p">)</span>
<span class="go">'/about'</span>
</pre></div>
<p>If you use the <a href="https://docs.djangoproject.com/en/2.2/ref/middleware/#django.middleware.locale.LocaleMiddleware" rel="noopener">built-in <code>LocaleMiddleware</code></a> and try to navigate to <code>/about</code>, Django will redirect you to the page in the current language. <code>translate_url</code> is unable to do the same.</p>
<div class="admonition info">
<p class="admonition-title">good to know</p>
<p><code>translate_url</code> cannot "translate" a non-localized URL (even though it might exist).</p>
</div>
<p>What about translating a URL already in a language which is not the current language?</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">translation</span><span class="o">.</span><span class="n">activate</span><span class="p">(</span><span class="s1">'en'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">urls</span><span class="o">.</span><span class="n">translate_url</span><span class="p">(</span><span class="s1">'/he/about'</span><span class="p">,</span> <span class="s1">'en'</span><span class="p">)</span>
<span class="go">'/he/about'</span>
</pre></div>
<p>Nope, can't do that either.</p>
<div class="admonition info">
<p class="admonition-title">good to know</p>
<p><code>translate_url</code> can only translate localized URLs in the current language.</p>
</div>
<p>If you look at the <a href="https://github.com/django/django/blob/master/django/urls/base.py#L163" rel="noopener">implementation of <code>translate_url</code></a> this restriction becomes clear:</p>
<div class="highlight"><pre><span></span><span class="c1"># django/urls/base.py</span>
<span class="k">def</span> <span class="nf">translate_url</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">lang_code</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""</span>
<span class="sd"> Given a URL (absolute or relative), try to get its translated version in</span>
<span class="sd"> the `lang_code` language (either by i18n_patterns or by translated regex).</span>
<span class="sd"> Return the original URL if no translated version is found.</span>
<span class="sd"> """</span>
<span class="n">parsed</span> <span class="o">=</span> <span class="n">urlsplit</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">match</span> <span class="o">=</span> <span class="n">resolve</span><span class="p">(</span><span class="n">parsed</span><span class="o">.</span><span class="n">path</span><span class="p">)</span>
<span class="k">except</span> <span class="n">Resolver404</span><span class="p">:</span>
<span class="k">pass</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">if</span> <span class="n">match</span><span class="o">.</span><span class="n">namespace</span><span class="p">:</span>
<span class="n">to_be_reversed</span> <span class="o">=</span> <span class="s2">"</span><span class="si">%s</span><span class="s2">:</span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">match</span><span class="o">.</span><span class="n">namespace</span><span class="p">,</span> <span class="n">match</span><span class="o">.</span><span class="n">url_name</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">to_be_reversed</span> <span class="o">=</span> <span class="n">match</span><span class="o">.</span><span class="n">url_name</span>
<span class="k">with</span> <span class="n">override</span><span class="p">(</span><span class="n">lang_code</span><span class="p">):</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">url</span> <span class="o">=</span> <span class="n">reverse</span><span class="p">(</span><span class="n">to_be_reversed</span><span class="p">,</span> <span class="n">args</span><span class="o">=</span><span class="n">match</span><span class="o">.</span><span class="n">args</span><span class="p">,</span> <span class="n">kwargs</span><span class="o">=</span><span class="n">match</span><span class="o">.</span><span class="n">kwargs</span><span class="p">)</span>
<span class="k">except</span> <span class="n">NoReverseMatch</span><span class="p">:</span>
<span class="k">pass</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">url</span> <span class="o">=</span> <span class="n">urlunsplit</span><span class="p">((</span><span class="n">parsed</span><span class="o">.</span><span class="n">scheme</span><span class="p">,</span> <span class="n">parsed</span><span class="o">.</span><span class="n">netloc</span><span class="p">,</span> <span class="n">url</span><span class="p">,</span> <span class="n">parsed</span><span class="o">.</span><span class="n">query</span><span class="p">,</span> <span class="n">parsed</span><span class="o">.</span><span class="n">fragment</span><span class="p">))</span>
<span class="k">return</span> <span class="n">url</span>
</pre></div>
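<p>Django first tries to <code>resolve</code> the URL path, which is its way of checking that the URL is valid. The path is resolved against the currently active language, so in the examples above a path with no language prefix, or with the prefix of another language, fails to resolve and the original URL is returned unchanged. Only if the path resolves is it reversed in the desired language and reassembled with the original scheme, host, query string and fragment.</p>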
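<p>To see the shape of this logic without a Django project, here is a rough stand-in that uses only the standard library. It is not Django's implementation: <code>swap_language_prefix</code>, <code>LANGUAGES</code> and the <code>active</code> parameter are hypothetical names, and a plain prefix check replaces <code>resolve</code> and <code>reverse</code>. It only mimics the behavior we observed above.</p>

```python
from urllib.parse import urlsplit, urlunsplit

LANGUAGES = ("en", "he")  # hypothetical language codes for this sketch


def swap_language_prefix(url: str, lang_code: str, active: str = "en") -> str:
    """Toy stand-in for translate_url: swap the leading /<active>/ prefix."""
    parsed = urlsplit(url)
    prefix = f"/{active}/"
    # Mimic resolve(): only paths under the active language prefix "resolve".
    if lang_code not in LANGUAGES or not parsed.path.startswith(prefix):
        return url  # fall back to the original URL, like translate_url does
    path = f"/{lang_code}/" + parsed.path[len(prefix):]
    # Reassemble with the original scheme, host, query and fragment untouched.
    return urlunsplit(
        (parsed.scheme, parsed.netloc, path, parsed.query, parsed.fragment)
    )


print(swap_language_prefix("https://example.com/en/about?utm_source=search#top", "he"))
print(swap_language_prefix("/about", "en"))
print(swap_language_prefix("/he/about", "en"))
```

<p>The first call swaps the prefix and keeps the query string and hash. The other two return the URL unchanged, just like the two limitations above.</p>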
<h3 id="translate_url-template-tag"><a class="toclink" href="#translate_url-template-tag"><code>translate_url</code> template tag</a></h3>
<p>Now that we know how to "translate" URLs to different languages, we need to be able to use it in a template. Django provides us with a way to define <a href="https://docs.djangoproject.com/en/2.2/howto/custom-template-tags/" rel="noopener">custom template tags and filters</a>.</p>
<p>Let's add a custom template tag for <code>translate_url</code>:</p>
<div class="highlight"><pre><span></span><span class="c1"># app/templatetags/urls.py</span>
<span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Any</span><span class="p">,</span> <span class="n">Dict</span><span class="p">,</span> <span class="n">Optional</span>
<span class="kn">from</span> <span class="nn">django</span> <span class="kn">import</span> <span class="n">template</span><span class="p">,</span> <span class="n">urls</span>
<span class="n">register</span> <span class="o">=</span> <span class="n">template</span><span class="o">.</span><span class="n">Library</span><span class="p">()</span>
<span class="nd">@register</span><span class="o">.</span><span class="n">simple_tag</span><span class="p">(</span><span class="n">takes_context</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">translate_url</span><span class="p">(</span><span class="n">context</span><span class="p">:</span> <span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">],</span> <span class="n">language</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">])</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
<span class="w"> </span><span class="sd">"""Get the absolute URL of the current page for the specified language.</span>
<span class="sd"> Usage:</span>
<span class="sd"> {% translate_url 'en' %}</span>
<span class="sd"> """</span>
<span class="n">url</span> <span class="o">=</span> <span class="n">context</span><span class="p">[</span><span class="s1">'request'</span><span class="p">]</span><span class="o">.</span><span class="n">build_absolute_uri</span><span class="p">()</span>
<span class="k">return</span> <span class="n">urls</span><span class="o">.</span><span class="n">translate_url</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">language</span><span class="p">)</span>
</pre></div>
<p>Our <code>translate_url</code> template tag takes context. This is necessary if we want to provide an absolute URL. We use <a href="https://docs.djangoproject.com/en/2.2/ref/request-response/#django.http.HttpRequest.build_absolute_uri" rel="noopener"><code>build_absolute_uri</code></a> to grab the absolute URL from the request.</p>
<p>The tag also accepts the target language code to translate the URL to, and uses <code>translate_url</code> to generate the translated URL.</p>
<p>With our new template tag, we can fill in the blanks in the previous implementation:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="cp">{%</span> <span class="k">load</span> <span class="nv">i18n</span> <span class="nv">urls</span> <span class="cp">%}</span>
</span>
<span class="cp">{%</span> <span class="k">get_available_languages</span> <span class="k">as</span> <span class="nv">LANGUAGES</span> <span class="cp">%}</span>
<span class="cp">{%</span> <span class="k">for</span> <span class="nv">language_code</span><span class="o">,</span> <span class="nv">language_name</span> <span class="k">in</span> <span class="nv">LANGUAGES</span> <span class="cp">%}</span>
<span class="x"><link</span>
<span class="x"> rel="alternate"</span>
<span class="x"> hreflang="</span><span class="cp">{{</span> <span class="nv">language_code</span> <span class="cp">}}</span><span class="x">"</span>
<span class="hll"><span class="x"> href="</span><span class="cp">{%</span> <span class="k">translate_url</span> <span class="nv">language_code</span> <span class="cp">%}</span><span class="x">" /></span>
</span><span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span>
</pre></div>
<h3 id="using-x-default-for-the-default-language"><a class="toclink" href="#using-x-default-for-the-default-language">Using <code>x-default</code> for the default language</a></h3>
<p>The guidelines include another recommendation:</p>
<blockquote>
<p>The reserved value hreflang="x-default" is used when no other language/region matches the user's browser setting. This value is optional, but recommended, as a way for you to control the page when no languages match. A good use is to target your site's homepage where there is a clickable map that enables the user to select their country.</p>
</blockquote>
<p>So it's also a good idea to add a link to some default language. If, for example, we want to make our default language English, we can add the following to the snippet above:</p>
<div class="highlight"><pre><span></span><span class="cp">{%</span> <span class="k">load</span> <span class="nv">i18n</span> <span class="nv">urls</span> <span class="cp">%}</span>
<span class="cp">{%</span> <span class="k">get_available_languages</span> <span class="k">as</span> <span class="nv">LANGUAGES</span> <span class="cp">%}</span>
<span class="cp">{%</span> <span class="k">for</span> <span class="nv">language_code</span><span class="o">,</span> <span class="nv">language_name</span> <span class="k">in</span> <span class="nv">LANGUAGES</span> <span class="cp">%}</span>
<span class="x"><link</span>
<span class="x"> rel="alternate"</span>
<span class="x"> hreflang="</span><span class="cp">{{</span> <span class="nv">language_code</span> <span class="cp">}}</span><span class="x">"</span>
<span class="x"> href="</span><span class="cp">{%</span> <span class="k">translate_url</span> <span class="nv">language_code</span> <span class="cp">%}</span><span class="x">" /></span>
<span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span>
<span class="hll"><span class="x"><link</span>
</span><span class="hll"><span class="x"> rel="alternate"</span>
</span><span class="hll"><span class="x"> hreflang="x-default"</span>
</span><span class="hll"><span class="x"> href="</span><span class="cp">{%</span> <span class="k">translate_url</span> <span class="s1">'en'</span> <span class="cp">%}</span><span class="x">" /></span>
</span></pre></div>
<p>When we set up our Django project, we already defined a default language. Instead of hard-coding English (or any other language for that matter), we want to use the <code>LANGUAGE_CODE</code> defined in <code>settings.py</code>.</p>
<p>To use values from <code>settings.py</code> in templates, we can use an old trick <a href="/5-ways-to-make-django-admin-safer#visually-distinguish-environments">we used in the past to visually distinguish between environments in Django admin</a>. It's a simple <a href="https://docs.djangoproject.com/en/2.2/ref/templates/api/#using-requestcontext" rel="noopener">context processor</a> that exposes specific values from <code>settings.py</code> to templates through the request context:</p>
<div class="highlight"><pre><span></span><span class="c1"># app/context_processors.py</span>
<span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Dict</span><span class="p">,</span> <span class="n">Any</span>
<span class="kn">from</span> <span class="nn">django.conf</span> <span class="kn">import</span> <span class="n">settings</span>
<span class="k">def</span> <span class="nf">from_settings</span><span class="p">(</span><span class="n">request</span><span class="p">)</span> <span class="o">-></span> <span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">]:</span>
<span class="k">return</span> <span class="p">{</span>
<span class="n">attr</span><span class="p">:</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">settings</span><span class="p">,</span> <span class="n">attr</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
<span class="k">for</span> <span class="n">attr</span> <span class="ow">in</span> <span class="p">(</span>
<span class="s1">'LANGUAGE_CODE'</span><span class="p">,</span>
<span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>To register the context processor, add the following in <code>settings.py</code>:</p>
<div class="highlight"><pre><span></span><span class="c1"># settings.py</span>
<span class="n">TEMPLATES</span> <span class="o">=</span> <span class="p">[{</span>
<span class="c1"># ...</span>
<span class="s1">'OPTIONS'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'context_processors'</span><span class="p">:</span> <span class="p">[</span>
<span class="c1">#...</span>
<span class="s1">'app.context_processors.from_settings'</span><span class="p">,</span>
<span class="p">],</span>
<span class="c1">#...</span>
<span class="p">}</span>
<span class="p">}]</span>
</pre></div>
<p>Now that we have access to <code>LANGUAGE_CODE</code> in the template, we can really complete our snippet:</p>
<div class="highlight"><pre><span></span><span class="cp">{%</span> <span class="k">load</span> <span class="nv">i18n</span> <span class="nv">urls</span> <span class="cp">%}</span>
<span class="cp">{%</span> <span class="k">get_available_languages</span> <span class="k">as</span> <span class="nv">LANGUAGES</span> <span class="cp">%}</span>
<span class="cp">{%</span> <span class="k">for</span> <span class="nv">language_code</span><span class="o">,</span> <span class="nv">language_name</span> <span class="k">in</span> <span class="nv">LANGUAGES</span> <span class="cp">%}</span>
<span class="x"><link</span>
<span class="x"> rel="alternate"</span>
<span class="x"> hreflang="</span><span class="cp">{{</span> <span class="nv">language_code</span> <span class="cp">}}</span><span class="x">"</span>
<span class="x"> href="</span><span class="cp">{%</span> <span class="k">translate_url</span> <span class="nv">language_code</span> <span class="cp">%}</span><span class="x">" /></span>
<span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span>
<span class="x"><link</span>
<span class="x"> rel="alternate"</span>
<span class="x"> hreflang="x-default"</span>
<span class="hll"><span class="x"> href="</span><span class="cp">{%</span> <span class="k">translate_url</span> <span class="nv">LANGUAGE_CODE</span> <span class="cp">%}</span><span class="x">" /></span>
</span></pre></div>
<p>The rendered markup for the about page looks like this:</p>
<div class="highlight"><pre><span></span><span class="p"><</span><span class="nt">link</span>
<span class="na">rel</span><span class="o">=</span><span class="s">"alternate"</span>
<span class="na">hreflang</span><span class="o">=</span><span class="s">"en"</span>
<span class="na">href</span><span class="o">=</span><span class="s">"https://example.com/en/about"</span> <span class="p">/></span>
<span class="p"><</span><span class="nt">link</span>
<span class="na">rel</span><span class="o">=</span><span class="s">"alternate"</span>
<span class="na">hreflang</span><span class="o">=</span><span class="s">"he"</span>
<span class="na">href</span><span class="o">=</span><span class="s">"https://example.com/he/about"</span> <span class="p">/></span>
<span class="p"><</span><span class="nt">link</span>
<span class="na">rel</span><span class="o">=</span><span class="s">"alternate"</span>
<span class="na">hreflang</span><span class="o">=</span><span class="s">"x-default"</span>
<span class="na">href</span><span class="o">=</span><span class="s">"https://example.com/en/about"</span> <span class="p">/></span>
</pre></div>
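<p>If you want to check the rendered markup in a test, one option is to build the expected tags independently of the template. The helper below is only a sketch: <code>render_hreflang_links</code> is a hypothetical name, and it assumes you already have the absolute URL of the page in each language.</p>

```python
from html import escape
from typing import Dict


def render_hreflang_links(urls_by_language: Dict[str, str], default_language: str) -> str:
    """Build the expected <link> tags, one per language plus x-default."""
    lines = [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        for code, url in urls_by_language.items()
    ]
    # x-default points at the page in the default language.
    default_url = urls_by_language[default_language]
    lines.append(
        f'<link rel="alternate" hreflang="x-default" href="{escape(default_url)}" />'
    )
    return "\n".join(lines)


print(render_hreflang_links(
    {
        "en": "https://example.com/en/about",
        "he": "https://example.com/he/about",
    },
    default_language="en",
))
```

<p>The output has one tag per language, followed by an <code>x-default</code> tag that points to the English page.</p>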
<h2 id="final-words"><a class="toclink" href="#final-words">Final Words</a></h2>
<p>Hopefully this short article helped you gain a better understanding of how search engines can identify different languages in your Django site. In the process, you might have also picked up on some little tricks to manipulate localized URLs in Django.</p>How to Create an Index in Django Without Downtime2019-04-10T00:00:00+03:002019-04-10T00:00:00+03:00Haki Benitatag:hakibenita.com,2019-04-10:/how-to-create-django-index-without-downtime<p>If you ever had to maintain a traffic heavy Django site, you probably had to deal with graceful migrations. In the article I explain what atomic and reversible migrations are, how to execute "raw" SQL in migrations the right way, and how using a little known migration command we can completely alter the Django migrations built-in behavior.</p><hr>
<p>If you maintain a Django site with decent traffic, you probably need to deal with graceful migrations. With the help of the <a href="https://realpython.com" rel="noopener">RealPython</a> team, I wrote an article about one of the most common problems in Django migrations: <strong>How to create an index without causing downtime</strong>.</p>
<p>In the article I explain what atomic and reversible migrations are, how to execute "raw" SQL in migrations the right way, and how you can completely alter the built-in behavior of Django migrations using a little-known migration operation called <code>SeparateDatabaseAndState</code>.</p>
<p><a href="https://realpython.com/create-django-index-without-downtime/" rel="noopener"><strong>Read "How to Create an Index in Django Without Downtime" on RealPython ≫</strong></a></p>
<figure><img alt="How to Create an Index in Django Without Downtime" src="https://hakibenita.com/images/01-how-to-create-django-index-without-downtime.png"><figcaption>How to Create an Index in Django Without Downtime</figcaption>
</figure>How to Use Grouping Sets in Django2019-03-10T00:00:00+02:002019-03-10T00:00:00+02:00Haki Benitatag:hakibenita.com,2019-03-10:/how-to-use-grouping-sets-in-django<p>How we cut a heavy admin dashboard response time in half with advanced SQL and some Django hackery. I recently had the pleasure of optimizing an old dashboard. The solution we came up with required some advanced SQL that Django does not support out of the box. In this article I present the solution, how we got to it, and a word of caution.</p><hr>
<p>I recently had the pleasure of optimizing an <a href="/how-to-turn-django-admin-into-a-lightweight-dashboard">old dashboard</a>. The solution we came up with required some advanced SQL that Django does not support out of the box. In this article I present the solution, how we got to it, and a word of caution.</p>
<div class="admonition info">
<p class="admonition-title">advanced SQL</p>
<p>This article covers advanced topics in SQL aggregation. If you need to perform GROUP BY in Django ORM, check out <a href="/django-group-by-sql">Understand Group by in Django with SQL</a>.</p>
</div>
<h2 id="the-dashboard"><a class="toclink" href="#the-dashboard">The Dashboard</a></h2>
<p>The dashboard is of a sales model. It includes a simple table with metrics grouped by merchants and their devices, and a summary line.</p>
<p>The code to produce the table looks roughly like this:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Sum</span><span class="p">,</span> <span class="n">Count</span><span class="p">,</span> <span class="n">Avg</span><span class="p">,</span> <span class="n">Q</span>
<span class="n">metrics</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'total'</span><span class="p">:</span> <span class="n">Count</span><span class="p">(</span><span class="s1">'id'</span><span class="p">),</span>
<span class="s1">'avg_charged_amount'</span><span class="p">:</span> <span class="n">Avg</span><span class="p">(</span><span class="s1">'charged_amount'</span><span class="p">),</span>
<span class="s1">'unique_users'</span><span class="p">:</span> <span class="n">Count</span><span class="p">(</span><span class="s1">'user'</span><span class="p">,</span> <span class="n">distinct</span><span class="o">=</span><span class="kc">True</span><span class="p">),</span>
<span class="p">}</span>
<span class="n">results</span> <span class="o">=</span> <span class="p">(</span>
<span class="n">Sale</span><span class="o">.</span><span class="n">objects</span>
<span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'merchant'</span><span class="p">,</span> <span class="s1">'device'</span><span class="p">)</span>
<span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="o">**</span><span class="n">metrics</span><span class="p">)</span>
<span class="p">)</span>
</pre></div>
<p>The code to produce the summary line uses the same metrics, and looks like this:</p>
<div class="highlight"><pre><span></span><span class="n">summary</span> <span class="o">=</span> <span class="n">Sale</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span><span class="o">**</span><span class="n">metrics</span><span class="p">)</span>
</pre></div>
<p>Our admin gets a nice dashboard that looks roughly like this:</p>
<figure><img alt="A summary line in Django Admin page" src="https://hakibenita.com/images/01-how-to-use-grouping-sets-in-django.png"><figcaption>A summary line in Django Admin page</figcaption>
</figure>
<div class="admonition info">
<p class="admonition-title">see also</p>
<p>To learn how to create the dashboard above, see
<a href="/how-to-turn-django-admin-into-a-lightweight-dashboard">How to Turn Django Admin Into a Lightweight Dashboard</a></p>
</div>
<h2 id="the-problem"><a class="toclink" href="#the-problem">The Problem</a></h2>
<p>The dashboard worked well for about three years. We got good response times and accurate information. However, as data piled up, performance degraded to the point where the page became unusable.</p>
<p>To analyze the problem, we inspected the SQL, and timed it. The query to produce the table looks like this:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">merchant</span><span class="p">,</span>
<span class="w"> </span><span class="n">device</span><span class="p">,</span>
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">total</span><span class="p">,</span>
<span class="w"> </span><span class="k">AVG</span><span class="p">(</span><span class="n">charged_amount</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">avg_charged_amount</span><span class="p">,</span>
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="k">DISTINCT</span><span class="w"> </span><span class="n">user_id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">unique_users</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">sale</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">merchant</span><span class="p">,</span>
<span class="w"> </span><span class="n">device</span>
</pre></div>
<p>At worst, this query took about 30s to complete.</p>
<p>The next query executed by the dashboard was used to produce the summary line:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">total</span><span class="p">,</span>
<span class="w"> </span><span class="k">AVG</span><span class="p">(</span><span class="n">charged_amount</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">avg_charged_amount</span><span class="p">,</span>
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="k">DISTINCT</span><span class="w"> </span><span class="n">user_id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">unique_users</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">sale</span>
</pre></div>
<p>This query took roughly the same time, ~30s. Together, at their worst, <strong>the two queries took more than a minute to complete.</strong></p>
<h3 id="aggregate-in-memory"><a class="toclink" href="#aggregate-in-memory">Aggregate in Memory</a></h3>
<p>Both queries process the exact same data; the only difference is the <code>GROUP BY</code> key. The first query produces results at the merchant and device level, and the second produces the same aggregates for the entire dataset.</p>
<p>The first thought that came to mind was to <strong>calculate the summary by aggregating the results in-memory.</strong></p>
<p>The first metric, <code>total</code>, is easy to calculate:</p>
<div class="highlight"><pre><span></span><span class="n">summary_total</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">r</span><span class="p">[</span><span class="s1">'total'</span><span class="p">]</span> <span class="k">for</span> <span class="n">r</span> <span class="ow">in</span> <span class="n">results</span><span class="p">)</span>
</pre></div>
<p>The second metric is the average charged amount. We can't just sum, or even average, the per-group averages, because each merchant and device has a different number of sales. We need additional information.</p>
<p>To calculate the average charged amount for all merchants and devices, we need to divide the total charged amount by the number of sales. We already have the number of sales, so we need to add a metric for the total charged amount:</p>
<div class="highlight"><pre><span></span><span class="n">metrics</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'total'</span><span class="p">:</span> <span class="n">Count</span><span class="p">(</span><span class="s1">'id'</span><span class="p">),</span>
<span class="s1">'avg_charged_amount'</span><span class="p">:</span> <span class="n">Avg</span><span class="p">(</span><span class="s1">'charged_amount'</span><span class="p">),</span>
<span class="hll"> <span class="s1">'total_charged_amount'</span><span class="p">:</span> <span class="n">Sum</span><span class="p">(</span><span class="s1">'charged_amount'</span><span class="p">),</span>
</span> <span class="s1">'unique_users'</span><span class="p">:</span> <span class="n">Count</span><span class="p">(</span><span class="s1">'user'</span><span class="p">,</span> <span class="n">distinct</span><span class="o">=</span><span class="kc">True</span><span class="p">),</span>
<span class="p">}</span>
</pre></div>
<p>Now that we have both <code>total</code> and <code>total_charged_amount</code>, we can compute <code>avg_charged_amount</code>:</p>
<div class="highlight"><pre><span></span><span class="n">summary_total</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">r</span><span class="p">[</span><span class="s1">'total'</span><span class="p">]</span> <span class="k">for</span> <span class="n">r</span> <span class="ow">in</span> <span class="n">results</span><span class="p">)</span>
<span class="n">summary_total_charged_amount</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">r</span><span class="p">[</span><span class="s1">'total_charged_amount'</span><span class="p">]</span> <span class="k">for</span> <span class="n">r</span> <span class="ow">in</span> <span class="n">results</span><span class="p">)</span>
<span class="hll"><span class="n">summary_avg_charged_amount</span> <span class="o">=</span> <span class="n">summary_total_charged_amount</span> <span class="o">/</span> <span class="n">summary_total</span>
</span></pre></div>
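<p>To see why the extra metric is needed, here is a minimal sketch with two hypothetical result rows (the numbers are made up for illustration and are not taken from the dashboard). Averaging the per-group averages gives every group the same weight, while dividing the overall sum by the overall count weighs each sale equally:</p>

```python
# Hypothetical per-group results, shaped like the dashboard rows.
results = [
    {"total": 3, "total_charged_amount": 300},  # group average: 100
    {"total": 1, "total_charged_amount": 20},   # group average: 20
]

# Naive approach: average the per-group averages (every group weighs the same).
group_averages = [r["total_charged_amount"] / r["total"] for r in results]
naive_avg = sum(group_averages) / len(group_averages)

# Correct approach: divide the overall sum by the overall count.
summary_total = sum(r["total"] for r in results)
summary_total_charged_amount = sum(r["total_charged_amount"] for r in results)
summary_avg_charged_amount = summary_total_charged_amount / summary_total

print(naive_avg)                   # 60.0
print(summary_avg_charged_amount)  # 80.0
```

<p>In a real view you would probably also want to guard against an empty result set, where <code>summary_total</code> is zero.</p>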
<p>We have one metric left, <code>unique_users</code>. This metric counts the number of unique users that visited each device at each merchant. The same user can visit several devices at different merchants, so if we sum <code>unique_users</code> we count that user more than once and overstate the number for the entire set.</p>
<p><strong>It's impossible to accurately compute distinct values from aggregated results, so the solution must be in the database.</strong></p>
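<p>A small sketch of the problem, using a handful of hypothetical <code>(merchant, device, user_id)</code> rows made up for illustration. Once each group is reduced to a count, the information about which users overlap between groups is gone:</p>

```python
# Hypothetical (merchant, device, user_id) sale rows, for illustration only.
sales = [
    ("Walmart", "D1", 1),
    ("Walmart", "D2", 1),
    ("Walmart", "D2", 3),
    ("Costco", "D1", 1),
    ("Costco", "D1", 4),
]

# Distinct users per (merchant, device), like COUNT(DISTINCT user_id) per group.
users_per_group = {}
for merchant, device, user_id in sales:
    users_per_group.setdefault((merchant, device), set()).add(user_id)

# Summing the per-group counts counts user 1 three times.
sum_of_group_counts = sum(len(users) for users in users_per_group.values())

# The real number of distinct users needs access to the raw rows.
true_unique_users = len({user_id for _, _, user_id in sales})

print(sum_of_group_counts)  # 5
print(true_unique_users)    # 3
```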
<hr>
<h2 id="aggregating-in-the-database"><a class="toclink" href="#aggregating-in-the-database">Aggregating in the Database</a></h2>
<p>Most SQL implementations provide several <a href="https://www.postgresql.org/docs/devel/queries-table-expressions.html#QUERIES-GROUPING-SETS" rel="noopener">useful functions to aggregate data</a> at different levels.</p>
<div class="admonition info">
<p class="admonition-title">Database Support</p>
<p>Throughout this article I use PostgreSQL. Similar functions are available in <a href="https://docs.oracle.com/cd/B19306_01/server.102/b14223/aggreg.htm" rel="noopener">Oracle</a>, <a href="https://dev.mysql.com/doc/refman/8.0/en/group-by-modifiers.html" rel="noopener">MySQL</a> and <a href="https://docs.microsoft.com/en-us/sql/t-sql/queries/select-group-by-transact-sql?view=sql-server-2017" rel="noopener">MSSQL</a>. As far as I know, SQLite has no support for the functions I'm about to use.</p>
</div>
<p>Let's start with some data:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">sale</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="nb">integer</span><span class="p">,</span>
<span class="w"> </span><span class="n">merchant</span><span class="w"> </span><span class="nb">varchar</span><span class="p">,</span>
<span class="w"> </span><span class="n">device</span><span class="w"> </span><span class="nb">varchar</span><span class="p">,</span>
<span class="w"> </span><span class="n">user_id</span><span class="w"> </span><span class="nb">integer</span><span class="p">,</span>
<span class="w"> </span><span class="n">charged_amount</span><span class="w"> </span><span class="nb">integer</span><span class="p">,</span>
<span class="w"> </span><span class="n">sold_at</span><span class="w"> </span><span class="k">timestamp</span>
<span class="p">);</span>
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">sale</span>
<span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">merchant</span><span class="p">,</span><span class="w"> </span><span class="n">device</span><span class="p">,</span><span class="w"> </span><span class="n">user_id</span><span class="p">,</span><span class="w"> </span><span class="n">charged_amount</span><span class="p">,</span><span class="w"> </span><span class="n">sold_at</span><span class="p">)</span>
<span class="k">VALUES</span>
<span class="w"> </span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'Walmart'</span><span class="p">,</span><span class="w"> </span><span class="s1">'D1'</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">54</span><span class="p">,</span><span class="w"> </span><span class="n">NOW</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="s1">'1 days'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="s1">'Walmart'</span><span class="p">,</span><span class="w"> </span><span class="s1">'D1'</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="p">,</span><span class="w"> </span><span class="n">NOW</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="s1">'2 days'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="s1">'Walmart'</span><span class="p">,</span><span class="w"> </span><span class="s1">'D2'</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">22</span><span class="p">,</span><span class="w"> </span><span class="n">NOW</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="s1">'1 days'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="s1">'Walmart'</span><span class="p">,</span><span class="w"> </span><span class="s1">'D2'</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="mi">14</span><span class="p">,</span><span class="w"> </span><span class="n">NOW</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="s1">'3 days'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="s1">'Walmart'</span><span class="p">,</span><span class="w"> </span><span class="s1">'D2'</span><span class="p">,</span><span class="w"> </span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="mi">15</span><span class="p">,</span><span class="w"> </span><span class="n">NOW</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="s1">'4 days'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">6</span><span class="p">,</span><span class="w"> </span><span class="s1">'Walmart'</span><span class="p">,</span><span class="w"> </span><span class="s1">'D3'</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="mi">29</span><span class="p">,</span><span class="w"> </span><span class="n">NOW</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="s1">'1 days'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">7</span><span class="p">,</span><span class="w"> </span><span class="s1">'Walmart'</span><span class="p">,</span><span class="w"> </span><span class="s1">'D3'</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="mi">45</span><span class="p">,</span><span class="w"> </span><span class="n">NOW</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="s1">'3 days'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">8</span><span class="p">,</span><span class="w"> </span><span class="s1">'Costco'</span><span class="p">,</span><span class="w"> </span><span class="s1">'D1'</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">47</span><span class="p">,</span><span class="w"> </span><span class="n">NOW</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="s1">'5 days'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">9</span><span class="p">,</span><span class="w"> </span><span class="s1">'Costco'</span><span class="p">,</span><span class="w"> </span><span class="s1">'D1'</span><span class="p">,</span><span class="w"> </span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="mi">223</span><span class="p">,</span><span class="w"> </span><span class="n">NOW</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="s1">'1 days'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="s1">'Costco'</span><span class="p">,</span><span class="w"> </span><span class="s1">'D1'</span><span class="p">,</span><span class="w"> </span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="mi">67</span><span class="p">,</span><span class="w"> </span><span class="n">NOW</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="s1">'2 days'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">11</span><span class="p">,</span><span class="w"> </span><span class="s1">'Costco'</span><span class="p">,</span><span class="w"> </span><span class="s1">'D4'</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">25</span><span class="p">,</span><span class="w"> </span><span class="n">NOW</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="s1">'5 days'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">12</span><span class="p">,</span><span class="w"> </span><span class="s1">'Costco'</span><span class="p">,</span><span class="w"> </span><span class="s1">'D5'</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="mi">125</span><span class="p">,</span><span class="w"> </span><span class="n">NOW</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="s1">'4 days'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">13</span><span class="p">,</span><span class="w"> </span><span class="s1">'Costco'</span><span class="p">,</span><span class="w"> </span><span class="s1">'D5'</span><span class="p">,</span><span class="w"> </span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="mi">150</span><span class="p">,</span><span class="w"> </span><span class="n">NOW</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="s1">'1 days'</span><span class="p">);</span>
</pre></div>
<p>The query we used in our dashboard produces the following results:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">merchant</span><span class="p">,</span>
<span class="w"> </span><span class="n">device</span><span class="p">,</span>
<span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">total</span><span class="p">,</span>
<span class="w"> </span><span class="n">AVG</span><span class="p">(</span><span class="n">charged_amount</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">avg_charged_amount</span><span class="p">,</span>
<span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="k">DISTINCT</span><span class="w"> </span><span class="n">user_id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">unique_users</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">sale</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">merchant</span><span class="p">,</span>
<span class="w"> </span><span class="n">device</span><span class="p">;</span>
<span class="go"> merchant | device | total | avg_charged_amount | unique_users</span>
<span class="go">----------+--------+-------+----------------------+--------------</span>
<span class="go"> Costco | D1 | 3 | 112.3333333333333333 | 3</span>
<span class="go"> Costco | D4 | 1 | 25.0000000000000000 | 1</span>
<span class="go"> Costco | D5 | 2 | 137.5000000000000000 | 2</span>
<span class="go"> Walmart | D1 | 2 | 77.0000000000000000 | 1</span>
<span class="go"> Walmart | D2 | 3 | 17.0000000000000000 | 3</span>
<span class="go"> Walmart | D3 | 2 | 37.0000000000000000 | 1</span>
<span class="go">(6 rows)</span>
</pre></div>
<p>The query to produce the summary line:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">total</span><span class="p">,</span>
<span class="w"> </span><span class="n">AVG</span><span class="p">(</span><span class="n">charged_amount</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">avg_charged_amount</span><span class="p">,</span>
<span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="k">DISTINCT</span><span class="w"> </span><span class="n">user_id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">unique_users</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">sale</span><span class="p">;</span>
<span class="go"> total | avg_charged_amount | unique_users</span>
<span class="go">-------+---------------------+--------------</span>
<span class="go"> 13 | 70.4615384615384615 | 5</span>
<span class="go">(1 row)</span>
</pre></div>
<h3 id="rollup"><a class="toclink" href="#rollup">Rollup</a></h3>
<p>The first special <code>GROUP BY</code> expression is <code>ROLLUP</code>. As the name suggests, <code>ROLLUP</code> aggregates from the lowest level up:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">device</span><span class="p">,</span>
<span class="w"> </span><span class="n">merchant</span><span class="p">,</span>
<span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">sale</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="hll"><span class="w"> </span><span class="k">ROLLUP</span><span class="p">(</span><span class="n">device</span><span class="p">,</span><span class="w"> </span><span class="n">merchant</span><span class="p">);</span>
</span>
<span class="go"> device | merchant | count</span>
<span class="go">--------+----------+-------</span>
<span class="go"> | | 13</span>
<span class="go"> D3 | Walmart | 2</span>
<span class="go"> D5 | Costco | 2</span>
<span class="go"> D1 | Walmart | 2</span>
<span class="go"> D4 | Costco | 1</span>
<span class="go"> D1 | Costco | 3</span>
<span class="go"> D2 | Walmart | 3</span>
<span class="go"> D2 | | 3</span>
<span class="go"> D4 | | 1</span>
<span class="go"> D1 | | 5</span>
<span class="go"> D5 | | 2</span>
<span class="go"> D3 | | 2</span>
<span class="go">(12 rows)</span>
</pre></div>
<p>We grouped by two fields, <code>device</code> and <code>merchant</code>, and we got three groups of aggregation:</p>
<ol>
<li><code>()</code> <em>all</em></li>
<li><code>(device, merchant)</code></li>
<li><code>(device)</code></li>
</ol>
<p><code>ROLLUP</code> aggregates "up", so the order of the fields is significant. Let's flip the order of the fields:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">device</span><span class="p">,</span>
<span class="w"> </span><span class="n">merchant</span><span class="p">,</span>
<span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">sale</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="hll"><span class="w"> </span><span class="k">ROLLUP</span><span class="p">(</span><span class="n">merchant</span><span class="p">,</span><span class="w"> </span><span class="n">device</span><span class="p">);</span>
</span>
<span class="go"> device | merchant | count</span>
<span class="go">--------+----------+-------</span>
<span class="go"> | | 13</span>
<span class="go"> D4 | Costco | 1</span>
<span class="go"> D3 | Walmart | 2</span>
<span class="go"> D1 | Walmart | 2</span>
<span class="go"> D1 | Costco | 3</span>
<span class="go"> D5 | Costco | 2</span>
<span class="go"> D2 | Walmart | 3</span>
<span class="go"> | Costco | 6</span>
<span class="go"> | Walmart | 7</span>
<span class="go">(9 rows)</span>
</pre></div>
<p>This time we got the following groups:</p>
<ol>
<li><code>()</code> <em>all</em></li>
<li><code>(merchant, device)</code></li>
<li><code>(merchant)</code></li>
</ol>
<h3 id="cube"><a class="toclink" href="#cube">Cube</a></h3>
<p>The next <code>GROUP BY</code> expression is most likely borrowed from <a href="https://en.wikipedia.org/wiki/Online_analytical_processing" rel="noopener">OLAP</a>, where cubes are a common concept. The <code>CUBE</code> expression aggregates over all possible combinations of the fields:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">device</span><span class="p">,</span>
<span class="w"> </span><span class="n">merchant</span><span class="p">,</span>
<span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">sale</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="hll"><span class="w"> </span><span class="k">CUBE</span><span class="p">(</span><span class="n">merchant</span><span class="p">,</span><span class="w"> </span><span class="n">device</span><span class="p">);</span>
</span>
<span class="go"> device | merchant | count</span>
<span class="go">--------+----------+-------</span>
<span class="go"> | | 13</span>
<span class="go"> D4 | Costco | 1</span>
<span class="go"> D3 | Walmart | 2</span>
<span class="go"> D1 | Walmart | 2</span>
<span class="go"> D1 | Costco | 3</span>
<span class="go"> D5 | Costco | 2</span>
<span class="go"> D2 | Walmart | 3</span>
<span class="go"> | Costco | 6</span>
<span class="go"> | Walmart | 7</span>
<span class="go"> D2 | | 3</span>
<span class="go"> D4 | | 1</span>
<span class="go"> D1 | | 5</span>
<span class="go"> D5 | | 2</span>
<span class="go"> D3 | | 2</span>
<span class="go">(14 rows)</span>
</pre></div>
<p>The results contain the following groups:</p>
<ol>
<li><code>()</code> <em>all</em></li>
<li><code>(device, merchant)</code></li>
<li><code>(merchant)</code></li>
<li><code>(device)</code></li>
</ol>
<h3 id="grouping-sets"><a class="toclink" href="#grouping-sets">Grouping Sets</a></h3>
<p>Grouping sets allow us to provide the exact groups of aggregation we want. For example, to recreate the results of <code>ROLLUP(merchant, device)</code> above, we can provide the following grouping sets:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">device</span><span class="p">,</span>
<span class="w"> </span><span class="n">merchant</span><span class="p">,</span>
<span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">sale</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="hll"><span class="w"> </span><span class="k">GROUPING</span><span class="w"> </span><span class="k">SETS</span><span class="w"> </span><span class="p">(</span>
</span><span class="hll"><span class="w"> </span><span class="p">(</span><span class="n">merchant</span><span class="p">),</span>
</span><span class="hll"><span class="w"> </span><span class="p">(</span><span class="n">merchant</span><span class="p">,</span><span class="w"> </span><span class="n">device</span><span class="p">),</span>
</span><span class="hll"><span class="w"> </span><span class="p">()</span>
</span><span class="hll"><span class="w"> </span><span class="p">);</span>
</span>
<span class="go"> device | merchant | count</span>
<span class="go">--------+----------+-------</span>
<span class="go"> | | 13</span>
<span class="go"> D4 | Costco | 1</span>
<span class="go"> D3 | Walmart | 2</span>
<span class="go"> D1 | Walmart | 2</span>
<span class="go"> D1 | Costco | 3</span>
<span class="go"> D5 | Costco | 2</span>
<span class="go"> D2 | Walmart | 3</span>
<span class="go"> | Costco | 6</span>
<span class="go"> | Walmart | 7</span>
<span class="go">(9 rows)</span>
</pre></div>
<p>Each list of fields inside parentheses in the <code>GROUPING SETS</code> is a group in the result.</p>
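<p>To make the idea concrete, here is a rough in-memory emulation of what the database does for <code>COUNT(*)</code>, using the 13 sample sales from this article. The helper <code>count_by_grouping_sets</code> is a hypothetical name for illustration; it is not how PostgreSQL implements grouping sets, only a sketch of the result shape. Every row is counted once per grouping set, and fields that are not part of the set are replaced with <code>None</code> (the <code>NULL</code> in the output above):</p>

```python
from collections import Counter

# (merchant, device) pairs for the 13 sample sales in this article.
sales = (
    [("Walmart", "D1")] * 2
    + [("Walmart", "D2")] * 3
    + [("Walmart", "D3")] * 2
    + [("Costco", "D1")] * 3
    + [("Costco", "D4")] * 1
    + [("Costco", "D5")] * 2
)

FIELDS = ("merchant", "device")


def count_by_grouping_sets(rows, grouping_sets):
    """Count rows once per grouping set; fields outside the set become None."""
    counts = Counter()
    for row in rows:
        record = dict(zip(FIELDS, row))
        for grouping_set in grouping_sets:
            key = tuple(
                record[field] if field in grouping_set else None
                for field in FIELDS
            )
            counts[key] += 1
    return counts


counts = count_by_grouping_sets(
    sales,
    grouping_sets=[("merchant",), ("merchant", "device"), ()],
)
for (merchant, device), count in counts.items():
    print(merchant, device, count)
```

<p>The sketch yields nine keys: six at the merchant and device level, two at the merchant level, and one for the entire set.</p>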
<p>Both <code>CUBE</code> and <code>ROLLUP</code> can be implemented using <code>GROUPING SETS</code>. The following table shows the equivalent <code>GROUPING SETS</code> expression for both <code>ROLLUP</code> and <code>CUBE</code>, on two fields <code>a</code> and <code>b</code>:</p>
<table>
<thead>
<tr>
<th>expression</th>
<th>equivalent grouping sets</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>ROLLUP(a, b)</code></td>
<td><code>GROUPING SETS ((a, b), (a), ())</code></td>
</tr>
<tr>
<td><code>CUBE(a, b)</code></td>
<td><code>GROUPING SETS ((a, b), (a), (b), ())</code></td>
</tr>
</tbody>
</table>
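<p>The same expansion can be sketched for any number of fields. The two helpers below, <code>rollup_sets</code> and <code>cube_sets</code>, are hypothetical names used only for illustration: <code>ROLLUP</code> corresponds to every prefix of the field list, and <code>CUBE</code> corresponds to every subset of it:</p>

```python
from itertools import combinations


def rollup_sets(*fields):
    """Grouping sets equivalent to ROLLUP: every prefix, longest first."""
    return [tuple(fields[:n]) for n in range(len(fields), -1, -1)]


def cube_sets(*fields):
    """Grouping sets equivalent to CUBE: every subset, largest first."""
    return [
        combo
        for n in range(len(fields), -1, -1)
        for combo in combinations(fields, n)
    ]


print(rollup_sets("a", "b"))  # [('a', 'b'), ('a',), ()]
print(cube_sets("a", "b"))    # [('a', 'b'), ('a',), ('b',), ()]
```

<p>For <code>n</code> fields, <code>ROLLUP</code> produces <code>n + 1</code> grouping sets while <code>CUBE</code> produces <code>2^n</code>, which is worth keeping in mind before using <code>CUBE</code> on many fields.</p>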
<p>In our original query we had metrics at the merchant and device level, and we wanted to get a summary line. Using <code>GROUPING SETS</code>, the query looks like this:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">merchant</span><span class="p">,</span>
<span class="w"> </span><span class="n">device</span><span class="p">,</span>
<span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">total</span><span class="p">,</span>
<span class="w"> </span><span class="n">AVG</span><span class="p">(</span><span class="n">charged_amount</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">avg_charged_amount</span><span class="p">,</span>
<span class="w"> </span><span class="n">COUNT</span><span class="p">(</span><span class="k">DISTINCT</span><span class="w"> </span><span class="n">user_id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">unique_users</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">sale</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="hll"><span class="w"> </span><span class="k">GROUPING</span><span class="w"> </span><span class="k">SETS</span><span class="w"> </span><span class="p">(</span>
</span><span class="hll"><span class="w"> </span><span class="p">(</span><span class="n">merchant</span><span class="p">,</span><span class="w"> </span><span class="n">device</span><span class="p">),</span>
</span><span class="hll"><span class="w"> </span><span class="p">()</span>
</span><span class="hll"><span class="w"> </span><span class="p">)</span>
</span>
<span class="w"> </span><span class="n">merchant</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">device</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">total</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">avg_charged_amount</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">unique_users</span>
<span class="c1">----------+--------+-------+----------------------+--------------</span>
<span class="w"> </span><span class="n">Costco</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">D1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">112.3333333333333333</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">3</span>
<span class="w"> </span><span class="n">Costco</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">D4</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">25.0000000000000000</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">1</span>
<span class="w"> </span><span class="n">Costco</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">D5</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">2</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">137.5000000000000000</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">2</span>
<span class="w"> </span><span class="n">Walmart</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">D1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">2</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">77.0000000000000000</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">1</span>
<span class="w"> </span><span class="n">Walmart</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">D2</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">17.0000000000000000</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">3</span>
<span class="w"> </span><span class="n">Walmart</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">D3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">2</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">37.0000000000000000</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">1</span>
<span class="hll"><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">13</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">70.4615384615384615</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">5</span>
</span><span class="p">(</span><span class="mf">7</span><span class="w"> </span><span class="k">rows</span><span class="p">)</span>
</pre></div>
<p>The first 6 lines are similar to the original query. The last line is similar to the results from the summary query we used.</p>
<p><strong>Using <code>GROUPING SETS</code> we get the results we need in just one query instead of two.</strong></p>
<hr>
<h2 id="using-grouping-sets-in-django"><a class="toclink" href="#using-grouping-sets-in-django">Using Grouping Sets in Django</a></h2>
<p>Now that we have the query, we need to find a way to use it with Django. Unfortunately, Django still has no support for grouping sets. On top of that, the query is generated by Django Admin, and it includes predicates from list filters and date hierarchy. We couldn't just use raw SQL.</p>
<p><strong>We need to find a way to modify a given Django QuerySet, and add grouping sets to it.</strong></p>
<p>Since Django has no built-in support for grouping sets, we are forced to manipulate the query. The base query we need to manipulate is the query that Django generates, along with any predicates and annotations added by Django Admin. Eventually, we want to execute the query in the database the same way Django does.</p>
<h3 id="getting-the-query"><a class="toclink" href="#getting-the-query">Getting the Query</a></h3>
<p>A nice feature of <a href="https://docs.djangoproject.com/en/2.1/ref/models/querysets/#queryset-api-reference" rel="noopener">Django QuerySet</a> is that it provides the generated SQL:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">qs</span> <span class="o">=</span> <span class="p">(</span>
<span class="gp">>>> </span><span class="o">...</span> <span class="n">Sale</span><span class="o">.</span><span class="n">objects</span>
<span class="gp">>>> </span><span class="o">...</span> <span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'merchant'</span><span class="p">,</span> <span class="s1">'device'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="o">...</span> <span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="o">**</span><span class="n">metrics</span><span class="p">)</span>
<span class="gp">>>> </span><span class="p">)</span>
<span class="hll"><span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">qs</span><span class="o">.</span><span class="n">query</span><span class="p">)</span>
</span><span class="go">SELECT "sale"."merchant", "sale"."device", COUNT("sale"."id") AS "total",</span>
<span class="go">AVG("sale"."charged_amount") AS "avg_charged_amount",</span>
<span class="go">COUNT(DISTINCT "sale"."user_id") AS "unique_users" FROM "sale"</span>
<span class="go">GROUP BY "sale"."merchant", "sale"."device"</span>
</pre></div>
<p>This is a simple query. Can we execute it <a href="https://docs.djangoproject.com/en/2.1/topics/db/sql/#executing-custom-sql-directly" rel="noopener">directly in the database</a>?</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">connection</span>
<span class="gp">>>></span>
<span class="gp">>>> </span><span class="k">with</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span> <span class="k">as</span> <span class="n">cursor</span><span class="p">:</span>
<span class="gp">>>> </span><span class="o">...</span> <span class="n">cursor</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">qs</span><span class="o">.</span><span class="n">query</span><span class="p">))</span>
<span class="gp">>>> </span><span class="o">...</span> <span class="n">results</span> <span class="o">=</span> <span class="n">cursor</span><span class="o">.</span><span class="n">fetchall</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">results</span>
<span class="go">[('Costco', 'D1', 3, 112.3333333333333333, 3),</span>
<span class="go">('Costco', 'D4', 1, 25.0000000000000000, 1),</span>
<span class="go">...</span>
<span class="go">('Walmart', 'D3', 2, 37.0000000000000000, 1)]</span>
</pre></div>
<p>This looks like something we can work with, let's dig deeper...</p>
<p>As mentioned before, the QuerySet is generated by Django Admin, and it might include predicates for list filters and date hierarchy. Let's try to execute a query with a predicate on the <code>sold_at</code> date field:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">datetime</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">connection</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.utils</span> <span class="kn">import</span> <span class="n">timezone</span>
<span class="gp">>>></span>
<span class="hll"><span class="gp">>>> </span><span class="n">qs</span> <span class="o">=</span> <span class="n">qs</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">sold_at__lt</span><span class="o">=</span><span class="n">timezone</span><span class="o">.</span><span class="n">now</span><span class="p">())</span>
</span><span class="gp">>>> </span><span class="k">with</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span> <span class="k">as</span> <span class="n">cursor</span><span class="p">:</span>
<span class="gp">>>> </span><span class="o">...</span> <span class="n">cursor</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">qs</span><span class="o">.</span><span class="n">query</span><span class="p">))</span>
<span class="go">*** django.db.utils.ProgrammingError: syntax error at or near "00"</span>
<span class="go">LINE 1: ..."sale"."sold_at" >= 2019-03-02 00:00:00+0...</span>
</pre></div>
<p>Looks like Django is unable to execute the query as is. The reason for that is that the text generated by <code>str(qs.query)</code> is just a text representation of the query. Under the hood, Django uses proper bind variables (might also be known as substitution variables) to avoid SQL injection.</p>
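<p>The same failure can be reproduced outside Django. The following is a minimal sketch using the standard library's <code>sqlite3</code> module rather than the PostgreSQL setup from the article, so the placeholder is <code>?</code> instead of <code>%s</code>. The table and the <code>cutoff</code> value are made up for illustration:</p>

```python
import sqlite3

# In-memory database with a tiny, hypothetical "sale" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (id INTEGER PRIMARY KEY, sold_at TEXT)")
conn.execute("INSERT INTO sale (sold_at) VALUES ('2019-03-01 12:00:00')")

cutoff = "2019-03-02 00:00:00"

# Pasting the value into the SQL text leaves it unquoted,
# so the parser fails on it, much like str(qs.query) did.
try:
    conn.execute(f"SELECT id FROM sale WHERE sold_at < {cutoff}")
    interpolation_failed = False
except sqlite3.OperationalError:
    interpolation_failed = True

# Passing the value separately lets the driver bind it to the placeholder.
rows = conn.execute(
    "SELECT id FROM sale WHERE sold_at < ?",
    (cutoff,),
).fetchall()
```

<p>The interpolated statement is rejected by the parser, while the parameterized one runs and returns the matching row.</p>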
<p>Much of the Django ORM QuerySet logic is carried out by an internal class called <code>Query</code>. The class is not documented. The only place to learn about it is in <a href="https://github.com/django/django/blob/2.1/django/db/models/sql/query.py#L133" rel="noopener">the source</a>. One promising function of <code>Query</code> is <code>sql_with_params</code>. Let's use it on the query above, and see what we get:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">qs</span> <span class="o">=</span> <span class="n">qs</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">sold_at__lt</span><span class="o">=</span><span class="n">timezone</span><span class="o">.</span><span class="n">now</span><span class="p">())</span>
<span class="hll"><span class="gp">>>> </span><span class="n">sql</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="n">qs</span><span class="o">.</span><span class="n">query</span><span class="o">.</span><span class="n">sql_with_params</span><span class="p">()</span>
</span><span class="gp">>>> </span><span class="n">sql</span>
<span class="go">SELECT "sale"."merchant", "sale"."device", COUNT("sale"."id") AS "total",</span>
<span class="go">AVG("sale"."charged_amount") AS "avg_charged_amount",</span>
<span class="go">COUNT(DISTINCT "sale"."user_id") AS "unique_users" FROM "sale"</span>
<span class="go">WHERE "sale"."sold_at" < %s</span>
<span class="go">GROUP BY "sale"."merchant", "sale"."device" ORDER BY "sale"."merchant", "sale"."device"</span>
<span class="gp">>>> </span><span class="n">params</span>
<span class="go">(datetime.datetime(2019, 3, 2, 0, 0, tzinfo=<UTC>),)</span>
</pre></div>
<p>The function <code>sql_with_params</code> returns a tuple. The first element is the SQL query text. The second is a tuple of parameters for that query.</p>
<p>The keen-eyed might have spotted the placeholder <code>%s</code> in the query text:</p>
<div class="highlight"><pre><span></span>WHERE "sale"."sold_at" < %s
</pre></div>
<p>This placeholder corresponds to the parameter we got in the second element. Let's try to execute the query with the placeholder and the params:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">datetime</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">connection</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.utils</span> <span class="kn">import</span> <span class="n">timezone</span>
<span class="gp">>>></span>
<span class="gp">>>> </span><span class="n">qs</span> <span class="o">=</span> <span class="n">qs</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">sold_at__lt</span><span class="o">=</span><span class="n">timezone</span><span class="o">.</span><span class="n">now</span><span class="p">())</span>
<span class="gp">>>> </span><span class="n">sql</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="n">qs</span><span class="o">.</span><span class="n">query</span><span class="o">.</span><span class="n">sql_with_params</span><span class="p">()</span>
<span class="gp">>>></span>
<span class="gp">>>> </span><span class="k">with</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span> <span class="k">as</span> <span class="n">cursor</span><span class="p">:</span>
<span class="hll"><span class="gp">>>> </span><span class="o">...</span> <span class="n">cursor</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="n">sql</span><span class="p">,</span> <span class="n">params</span><span class="p">)</span>
</span><span class="gp">>>> </span><span class="o">...</span> <span class="n">results</span> <span class="o">=</span> <span class="n">cursor</span><span class="o">.</span><span class="n">fetchall</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">results</span>
<span class="go">[('Costco', 'D1', 3, 112.3333333333333333, 3),</span>
<span class="go">('Costco', 'D4', 1, 25.0000000000000000, 1),</span>
<span class="go">...</span>
<span class="go">('Walmart', 'D3', 2, 37.0000000000000000, 1)]</span>
</pre></div>
<p>Great! We are now able to execute a query as Django does. We are ready to manipulate the query.</p>
<h3 id="manipulating-the-query"><a class="toclink" href="#manipulating-the-query">Manipulating the Query</a></h3>
<p>The query generated by Django includes a simple GROUP BY clause:</p>
<div class="highlight"><pre><span></span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="ss">"sale"</span><span class="p">.</span><span class="ss">"merchant"</span><span class="p">,</span><span class="w"> </span><span class="ss">"sale"</span><span class="p">.</span><span class="ss">"device"</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="ss">"sale"</span><span class="p">.</span><span class="ss">"merchant"</span><span class="p">,</span><span class="w"> </span><span class="ss">"sale"</span><span class="p">.</span><span class="ss">"device"</span>
</pre></div>
<p>We want to replace that with the following group by clause:</p>
<div class="highlight"><pre><span></span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">GROUPING</span><span class="w"> </span><span class="k">SETS</span><span class="w"> </span><span class="p">((</span><span class="ss">"sale"</span><span class="p">.</span><span class="ss">"merchant"</span><span class="p">,</span><span class="w"> </span><span class="ss">"sale"</span><span class="p">.</span><span class="ss">"device"</span><span class="p">),</span><span class="w"> </span><span class="p">())</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="ss">"sale"</span><span class="p">.</span><span class="ss">"merchant"</span><span class="p">,</span><span class="w"> </span><span class="ss">"sale"</span><span class="p">.</span><span class="ss">"device"</span>
</pre></div>
<p>This looks like a job for <a href="https://docs.python.org/3/library/re.html" rel="noopener"><code>re</code></a>.</p>
<p>We want to catch the grouped-by fields between <code>GROUP BY</code> and <code>ORDER BY</code>, and make them the first group in the <code>GROUPING SETS</code> expression. Then, we want to add the group <code>()</code> for the summary:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">re</span>
<span class="n">sql_with_grouping_sets</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span>
<span class="sa">r</span><span class="s1">'GROUP BY (.*) ORDER'</span><span class="p">,</span>
<span class="sa">r</span><span class="s1">'GROUP BY GROUPING SETS (( \1 ), ()) ORDER'</span><span class="p">,</span>
<span class="n">sql</span><span class="p">,</span>
<span class="p">)</span>
</pre></div>
<p>Now we can take the modified query, and execute it with the params:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="k">with</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span> <span class="k">as</span> <span class="n">cursor</span><span class="p">:</span>
<span class="hll"><span class="gp">>>> </span><span class="o">...</span> <span class="n">cursor</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="n">sql_with_grouping_sets</span><span class="p">,</span> <span class="n">params</span><span class="p">)</span>
</span><span class="gp">>>> </span><span class="o">...</span> <span class="n">results</span> <span class="o">=</span> <span class="n">cursor</span><span class="o">.</span><span class="n">fetchall</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">results</span>
<span class="go">[('Costco', 'D1', 3, 112.3333333333333333, 3),</span>
<span class="go">('Costco', 'D4', 1, 25.0000000000000000, 1),</span>
<span class="go">...</span>
<span class="go">('Walmart', 'D3', 2, 37.0000000000000000, 1),</span>
<span class="hll"><span class="go">(None, None, 13, 70.4615384615384615, 5)]</span>
</span></pre></div>
<p>Lo and behold... We got both the results <em>and</em> the summary line in a single query.</p>
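<p>One caveat: <code>re.sub</code> returns the SQL unchanged when the pattern does not match, for example when the queryset has no <code>ORDER BY</code>. A slightly more defensive sketch could fail loudly instead. The helper name <code>add_grouping_sets</code> and the <code>sample_sql</code> string are hypothetical and only serve the illustration:</p>

```python
import re


def add_grouping_sets(sql: str) -> str:
    """Wrap the GROUP BY fields in GROUPING SETS and add the empty summary group."""
    new_sql, count = re.subn(
        r'GROUP BY (.*) ORDER',
        r'GROUP BY GROUPING SETS ((\1), ()) ORDER',
        sql,
    )
    # Refuse to continue if the query text did not have the expected shape.
    if count != 1:
        raise ValueError('expected exactly one GROUP BY ... ORDER BY clause')
    return new_sql


# Hypothetical SQL text, shaped like the output of sql_with_params().
sample_sql = (
    'SELECT "sale"."merchant", "sale"."device", COUNT("sale"."id") AS "total" '
    'FROM "sale" WHERE "sale"."sold_at" < %s '
    'GROUP BY "sale"."merchant", "sale"."device" '
    'ORDER BY "sale"."merchant", "sale"."device"'
)
rewritten = add_grouping_sets(sample_sql)
```

<p>The <code>%s</code> placeholder is left untouched, so the rewritten text can still be executed with the original params.</p>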
<h2 id="conclusion"><a class="toclink" href="#conclusion">Conclusion</a></h2>
<p>Several important issues to consider about this approach:</p>
<ul>
<li>
<p><strong>Don't do this!:</strong> This is as bad as it gets. This approach is a nice exercise, and a great opportunity to explore the ORM internals, but the implementation is too fragile. When using an internal, undocumented API, there is no guarantee it won't change unexpectedly in the future. Having said that, we decided to use this approach in one of our internal admin pages. It's a very specific scenario involving a queryset that's not used for any user-facing features. It helped us cut the page response time exactly in half and we are pleased with the result.</p>
</li>
<li>
<p><strong>Make the sort order deterministic:</strong> When using <code>GROUPING SETS</code> (and <code>ROLLUP</code> or <code>CUBE</code> for that matter), you mix more than one level of aggregation in a single query. To be able to fetch the results in a predictable way, it's important to explicitly sort the results. For example, in the query above, the summary row has <code>NULL</code> in both <code>merchant</code> and <code>device</code>, so to make sure it is the first row, add the following sort order: <code>qs.order_by(F('merchant').desc(nulls_first=True))</code>.</p>
</li>
</ul>It's Time to Own My Own Content2019-01-14T00:00:00+02:002019-01-14T00:00:00+02:00Haki Benitatag:hakibenita.com,2019-01-14:/its-time-to-own-my-own-content<p>I started writing about two years ago. Back then, I used to read a lot on Medium. When I finally felt the urge to write something, it made sense to publish there as well. Medium provided me with a platform, an audience, and constant reinforcements in the form of stats, likes and comments. It motivated me to keep writing. Despite its many advantages, I feel Medium is lacking in some areas.</p><hr>
<p>I started writing about two years ago. Back then, I used to read a lot on Medium. When I finally felt the urge to write something, it made sense to publish there as well.</p>
<p>Medium provided me with a platform, an audience, and constant reinforcements in the form of stats, likes and comments. It motivated me to keep writing.</p>
<h2 id="whats-wrong-with-medium"><a class="toclink" href="#whats-wrong-with-medium">What's Wrong With Medium</a></h2>
<p>Despite its many advantages, I feel Medium is lacking in some areas.</p>
<h4 id="poor-support-for-source-code"><a class="toclink" href="#poor-support-for-source-code">Poor Support For Source Code</a></h4>
<p>Poor support is really an understatement: Medium has no support for source code. I understand Medium is not intended for developers, so I don't expect features like syntax highlighting. However, making the "source" container scroll on overflow would have made all the difference.</p>
<figure><img alt="A code block shorter than 50 characters, on a small screen on Medium" src="https://hakibenita.com/images/05-its-time-to-own-my-own-content.png"><figcaption>A code block shorter than 50 characters, on a small screen on Medium</figcaption>
</figure>
<p>I write about code, and most of my articles are filled with code examples. Not being able to embed code snippets in a suitable manner is a deal breaker. I know using gist and the like is an option, but it's too much hassle.</p>
<p>I'm not the only one who thinks Medium can do better, the readers think so as well. The most <a href="https://medium.com/@hakibenita/bullet-proofing-django-models-c080739be4e" rel="noopener">highlighted sentence</a> in my articles is this:</p>
<figure><img alt="Medium has such poor support for source code" src="https://hakibenita.com/images/01-its-time-to-own-my-own-content.png"><figcaption>Medium has such poor support for source code</figcaption>
</figure>
<h4 id="bad-reading-experience-for-non-registered-users"><a class="toclink" href="#bad-reading-experience-for-non-registered-users">Bad Reading Experience For Non-Registered Users</a></h4>
<p>A lot has been said about Medium's effort to make you register. The large popup, the app banner on top, the fixed footer. All of these harm the reading experience. Even though I am a registered user, once in a while I get to open a Medium article in a WebView, and I get to experience what it is like for non-registered users. It bothers me that I have to put users through that just to read my article.</p>
<h4 id="the-writing-experience"><a class="toclink" href="#the-writing-experience">The Writing Experience</a></h4>
<p>When Medium first started everybody praised the innovative UI. A clean WYSIWYG editor with a minimal popup element that appears when you highlight some text. That was fun at first, but once I actually started writing, I found it annoying. After a while I started writing in Google Docs and copy-pasting back to Medium.</p>
<p>I wanted a spell checker. I wanted people who were not registered on Medium to review my writing. I wanted to write code without manually adding leading spaces if the previous line was blank. I wanted to paste code directly from my editor, without quotes and dashes magically converted into Unicode characters (like "..." -> “…”). Writing became tedious!</p>
<figure><img alt="Trying to make a code block look decent on Medium" src="https://hakibenita.com/images/03-its-time-to-own-my-own-content.gif"><figcaption>Trying to make a code block look decent on Medium</figcaption>
</figure>
<p>This is also a good opportunity to apologize to the following Twitter users, who were automatically tagged in so many of my articles: @property, @classmethod, @contextmanager, @receiver, @cached_property, @patch and @register. Sorry... 😥</p>
<h4 id="i-stopped-reading-on-medium"><a class="toclink" href="#i-stopped-reading-on-medium">I Stopped Reading on Medium</a></h4>
<p>About a year ago Medium did a redesign. They changed the likes to claps, which made the score meaningless. The stream of articles related to my interests was replaced with random tiles from various topics, with no score and a very short summary. I found myself going to Medium only to check notifications and respond to comments.</p>
<figure><img alt="A lot of things I don't care about" src="https://hakibenita.com/images/04-its-time-to-own-my-own-content.png"><figcaption>A lot of things I don't care about</figcaption>
</figure>
<p>Having said all of that, I think Medium still excels in many things. For registered users, the reading experience is very pleasant, a lot of the crowd is still there, and they seem to be constantly trying to improve the UI.</p>
<hr>
<h2 id="do-it-yourself"><a class="toclink" href="#do-it-yourself">Do It Yourself!</a></h2>
<p>At some point I said to myself:</p>
<blockquote>
<p>Damn it, you're a developer... you can host your own stuff!</p>
</blockquote>
<p>And so I did...</p>
<p>I wanted to make a site I would enjoy reading. I put together all the things I liked from other places (including Medium), and stitched it all up. It took me a couple of weeks to migrate all of the content, but it was worth it.</p>
<h4 id="minimal-interface"><a class="toclink" href="#minimal-interface">Minimal Interface</a></h4>
<p>It's all about the content, anything else is a nuisance. I didn't want a fixed header or footer. There are no popups or modals, no floating elements or share buttons that follow you around. Just long lines of words made out of letters. I'm obviously not a designer, but I think it turned out OK.</p>
<h4 id="code-code-code"><a class="toclink" href="#code-code-code">Code, Code, Code!</a></h4>
<p>I write about code, so the code should take center stage. The articles are written in markdown, and syntax highlighting is done using <a href="http://pygments.org/" rel="noopener">Pygments</a>. I was surprised to learn how many <a href="http://pygments.org/languages/" rel="noopener">languages are supported</a> (there is even a formatter for Jinja and Django templates which I often use). The code blocks are wide and spacious, on small screens they scroll, and the text can be copy-pasted directly into the code editor.</p>
<h4 id="read-progress-indicator"><a class="toclink" href="#read-progress-indicator">Read Progress Indicator</a></h4>
<p>I like knowing where I am in the article. Some browsers, especially on mobile, hide the scroll indicator. I don't like that. I like to know where I am, and how much reading I have left.</p>
<p>I liked the idea of a small, fixed read progress indicator at the top of the screen. I tried to do it <a href="https://codepen.io/MadeByMike/pen/ZOrEmr" rel="noopener">only with CSS</a>, but it turned out to be flaky on some mobile browsers. You can check out the result. Make sure JS is enabled in the browser and look up ↑↑↑</p>
<p>I also like knowing how long an article is before I start reading it. I used a plugin called <a href="https://github.com/getpelican/pelican-plugins/tree/master/readtime" rel="noopener">readtime</a> for it. It basically counts the words in the article and divides by some predefined <a href="http://en.wikipedia.org/wiki/Words_per_minute" rel="noopener">words per minute</a>. If you're curious, it's 230.</p>
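<p>The calculation is easy to sketch. The snippet below is not the plugin's code, just an illustration of the idea. The word-splitting rule, the rounding up, and the one-minute minimum are all assumptions:</p>

```python
import math
import re

WORDS_PER_MINUTE = 230  # the rate mentioned above


def estimate_read_minutes(text: str, wpm: int = WORDS_PER_MINUTE) -> int:
    """Rough reading time in whole minutes.

    Rounding up and the one-minute floor are assumptions,
    not necessarily what the readtime plugin does.
    """
    words = len(re.findall(r"\w+", text))
    return max(1, math.ceil(words / wpm))
```

<p>With these assumptions, a 460-word article comes out at two minutes, and one extra word tips it over to three.</p>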
<h4 id="dark-theme"><a class="toclink" href="#dark-theme">Dark Theme</a></h4>
<p>When I use Gnome, VSCode, Android, YouTube, Feedly, Medium, Reddit and just about every app that supports it, I always choose the dark theme. There is no reason for me not to provide one as well:</p>
<figure><img alt="Dark theme" src="https://hakibenita.com/images/06-its-time-to-own-my-own-content.png"><figcaption>Dark theme</figcaption>
</figure>
<p>Also, I wanted to play around with CSS variables (they're awesome!).</p>
<h4 id="titles-are-anchor-links"><a class="toclink" href="#titles-are-anchor-links">Titles are Anchor Links</a></h4>
<p>This is so obvious but so commonly overlooked. I like sharing articles with peers and team members. I also like to reference articles in commit messages and comments. I like being able to point to a specific section in the article. This is what anchor links are for.</p>
<h4 id="experiment"><a class="toclink" href="#experiment">Experiment</a></h4>
<p>I wanted to use this site to experiment with a static web site generator. I chose <a href="https://blog.getpelican.com/" rel="noopener">Pelican</a>. It's implemented in Python, articles are written in <a href="https://python-markdown.github.io/" rel="noopener">Markdown</a> and templates are rendered using <a href="http://jinja.pocoo.org/" rel="noopener">Jinja</a>. Pelican comes with a decent set of plugins and options. It also comes with a useful <code>Makefile</code> for common operations such as devserver, publish and clean. It was pretty easy to set up and customize.</p>
<p>The best part is the deployment. After years of fat Python apps, Ansible playbooks, virtual environments and what not, the deployment process here is just pure gold:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>make<span class="w"> </span>publish
$<span class="w"> </span>rsync<span class="w"> </span>-avc<span class="w"> </span>output/*<span class="w"> </span>hakibenita:/opt/www/
</pre></div>
<hr>
<p>Hopefully this change will turn out for the best. I intend to keep posting a short intro to my articles on Medium, and reference the full article on this site. I also post on Twitter and Reddit so if you like what I write, you should be able to find me.</p>Modeling Polymorphism in Django2019-01-02T00:00:00+02:002019-01-02T00:00:00+02:00Haki Benitatag:hakibenita.com,2019-01-02:/modeling-polymorphism-in-django<p>Modeling polymorphism in relational databases is a challenging task. In this article, we present several modeling techniques to represent polymorphic objects in a relational database using the Django object-relational mapping (ORM).</p><hr>
<p>If you ever added a <code>type</code>, <code>kind</code> or a <code>mode</code> field to a Django model, you probably had to deal with polymorphism at some level. With the great people over at <a href="https://realpython.com" rel="noopener">RealPython</a>, I wrote about 5 ways to model polymorphism in Django.</p>
<p><a href="https://realpython.com/modeling-polymorphism-django-python/" rel="noopener"><strong>Read "Modeling Polymorphism in Django With Python" on RealPython ≫</strong></a></p>
<figure><img alt="Modeling Polymorphism in Django With Python" src="https://hakibenita.com/images/01-modeling-polymorphism-in-django-with-python.jpeg"><figcaption>Modeling Polymorphism in Django With Python</figcaption>
</figure>How We Solved a Storage Problem in PostgreSQL Without Adding a Single Byte of Storage2018-12-22T00:00:00+02:002018-12-22T00:00:00+02:00Haki Benitatag:hakibenita.com,2018-12-22:/how-we-solved-a-storage-problem-in-postgre-sql-without-adding-a-single-bytes-of-storage<p>A while back we started getting alerts in the middle of the night on low disk space. A quick investigation led us to one of our ETL tasks. Every night the task was fired to eliminate duplicate dumps, and free up some space. This is a short story about how we found our silver bullet and solved the issue without adding a single byte of storage.</p><hr>
<p>A while back we started getting alerts in the middle of the night on low disk space. A quick investigation led us to one of our ETL tasks.</p>
<p>The ETL task was working on a table that stored binary records we refer to as "dumps". Every night the task was fired to eliminate duplicate dumps, and free up some space.</p>
<p>To find duplicate dumps we used this query:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">id</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="k">MIN</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="n">OVER</span><span class="w"> </span><span class="p">(</span><span class="n">PARTITION</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="nb">blob</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">id</span><span class="p">)</span>
</span><span class="k">FROM</span>
<span class="w"> </span><span class="n">dumps</span>
</pre></div>
<p>The query groups similar dumps by the blob field. Using a window function we get the ID of the first occurrence of each dump. We later use this query to remove the other duplicate dumps.</p>
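<p>To make the de-duplication step concrete, here is a minimal sketch of the whole flow. It is only an illustration: it uses Python's built-in <code>sqlite3</code> module (SQLite 3.25 or later, for window functions) as a stand-in for PostgreSQL, and the sample rows are made up. The table and column names follow the query above.</p>

```python
import sqlite3

# SQLite stands in for PostgreSQL here; the sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dumps (id INTEGER PRIMARY KEY, blob BLOB)")
conn.executemany(
    "INSERT INTO dumps (id, blob) VALUES (?, ?)",
    [(1, b"aaa"), (2, b"bbb"), (3, b"aaa"), (4, b"ccc"), (5, b"bbb")],
)

# Same window function as the query above: for every row, find the
# smallest id among the rows that share the same blob.
rows = conn.execute(
    """
    SELECT id, MIN(id) OVER (PARTITION BY blob ORDER BY id) AS first_id
    FROM dumps
    """
).fetchall()

# A row is a duplicate when it is not the first occurrence of its blob.
duplicate_ids = sorted(row_id for row_id, first_id in rows if row_id != first_id)
conn.executemany("DELETE FROM dumps WHERE id = ?", [(i,) for i in duplicate_ids])

remaining = [row[0] for row in conn.execute("SELECT id FROM dumps ORDER BY id")]
print(duplicate_ids, remaining)
```

<p>Only the first occurrence of each blob survives. The expensive part is the window function: the database has to sort every row by the full <code>blob</code> value before it can tell which rows are duplicates.</p>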
<p>This query took some time to run. The log showed that while it ran, it consumed a significant amount of disk space. The following chart shows the big dips in free disk space every night while the query was running:</p>
<figure><img alt="Free storage space over time" src="https://hakibenita.com/images/01-how-we-solved-a-storage-problem-in-postgre-sql-without-adding-a-single-bytes-of-storage.png"><figcaption>Free storage space over time</figcaption>
</figure>
<p>As time passed the query consumed more and more disk space and the dips got deeper. Looking at the execution plan, the reason for the high disk usage became apparent:</p>
<div class="highlight"><pre><span></span><span class="n">WindowAgg</span><span class="w"> </span><span class="p">(</span><span class="k">cost</span><span class="o">=</span><span class="mf">69547.50..79494.14</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">497332</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mf">40</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="nb">time</span><span class="o">=</span><span class="mf">107.619..152.457</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">39160</span><span class="p">)</span>
<span class="w"> </span><span class="n">Buffers</span><span class="p">:</span><span class="w"> </span><span class="n">shared</span><span class="w"> </span><span class="n">hit</span><span class="o">=</span><span class="mf">3916</span><span class="p">,</span><span class="w"> </span><span class="k">temp</span><span class="w"> </span><span class="k">read</span><span class="o">=</span><span class="mf">3807</span><span class="w"> </span><span class="n">written</span><span class="o">=</span><span class="mf">3816</span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Sort</span><span class="w"> </span><span class="p">(</span><span class="k">cost</span><span class="o">=</span><span class="mf">69547.50..70790.83</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">497332</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mf">36</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="nb">time</span><span class="o">=</span><span class="mf">107.607..127.485</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">39160</span><span class="p">)</span>
<span class="w"> </span><span class="n">Sort</span><span class="w"> </span><span class="k">Key</span><span class="p">:</span><span class="w"> </span><span class="n">blob</span><span class="p">,</span><span class="w"> </span><span class="n">id</span>
<span class="hll"><span class="w"> </span><span class="n">Sort</span><span class="w"> </span><span class="k">Method</span><span class="p">:</span><span class="w"> </span><span class="k">external</span><span class="w"> </span><span class="n">merge</span><span class="w"> </span><span class="n">Disk</span><span class="p">:</span><span class="w"> </span><span class="mf">30456</span><span class="n">kB</span>
</span><span class="w"> </span><span class="n">Buffers</span><span class="p">:</span><span class="w"> </span><span class="n">shared</span><span class="w"> </span><span class="n">hit</span><span class="o">=</span><span class="mf">3916</span><span class="p">,</span><span class="w"> </span><span class="k">temp</span><span class="w"> </span><span class="k">read</span><span class="o">=</span><span class="mf">3807</span><span class="w"> </span><span class="n">written</span><span class="o">=</span><span class="mf">3816</span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Seq</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="n">dumps</span><span class="w"> </span><span class="p">(</span><span class="k">cost</span><span class="o">=</span><span class="mf">0..8889.32</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">497332</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mf">36</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="nb">time</span><span class="o">=</span><span class="mf">0.022..8.747</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">39160</span><span class="p">)</span>
<span class="w"> </span><span class="n">Buffers</span><span class="p">:</span><span class="w"> </span><span class="n">shared</span><span class="w"> </span><span class="n">hit</span><span class="o">=</span><span class="mf">3916</span>
<span class="n">Execution</span><span class="w"> </span><span class="nb">time</span><span class="p">:</span><span class="w"> </span><span class="mf">159.960</span><span class="w"> </span><span class="n">ms</span>
</pre></div>
<p><strong>The sort consumes a significant amount of disk space</strong>. In the execution plan above from a test dataset, the sort consumed ~30MB of disk space.</p>
<h3 id="why-is-this-happening"><a class="toclink" href="#why-is-this-happening">Why is This Happening?</a></h3>
<p>PostgreSQL allocates memory for hash and sort operations. The amount of memory is controlled using the <a href="https://www.postgresql.org/docs/9.6/runtime-config-resource.html#GUC-WORK-MEM" rel="noopener">work_mem</a> parameter. The default size of work_mem is 4MB. Once a sort or hash operation requires more than 4MB to complete, PostgreSQL will resort to temporary disk space.</p>
<p>Our query is obviously consuming more than 4MB of memory, and this is why we see the database using so much disk space. We decided that before we increase the parameter or add more storage, we should find another way to <strong>reduce the amount of space consumed by the sort</strong>.</p>
<h3 id="giving-the-sort-a-diet"><a class="toclink" href="#giving-the-sort-a-diet">Giving the Sort a Diet</a></h3>
<p>The amount of space consumed by a sort is affected by the size of the dataset and the size of the sorting key. We can't change the size of the dataset, but we can <strong>reduce the size of the key</strong>.</p>
<p>To establish a baseline, let's see the average size of the sort key:</p>
<div class="highlight"><pre><span></span><span class="o">></span><span class="w"> </span><span class="k">select</span><span class="w"> </span><span class="n">avg</span><span class="p">(</span><span class="n">pg_column_size</span><span class="p">(</span><span class="n">blob</span><span class="p">))</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">dumps</span><span class="p">;</span>
<span class="go"> avg</span>
<span class="go">----------</span>
<span class="go"> 780</span>
</pre></div>
<p>Each key weighs 780 bytes on average. One way to reduce the size of a binary key is to hash it. In PostgreSQL we can use <a href="https://www.postgresql.org/docs/9.6/functions-string.html" rel="noopener">md5</a> (yes, it is insecure, but fine for our purposes). Let's see the size of the blob hashed using md5:</p>
<div class="highlight"><pre><span></span><span class="o">></span><span class="w"> </span><span class="k">select</span><span class="w"> </span><span class="n">avg</span><span class="p">(</span><span class="n">pg_column_size</span><span class="p">(</span><span class="n">md5</span><span class="p">(</span><span class="n">array_to_string</span><span class="p">(</span><span class="n">blob</span><span class="p">,</span><span class="w"> </span><span class="s1">''</span><span class="p">))))</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">dumps</span><span class="p">;</span>
<span class="go"> avg</span>
<span class="go">-----------</span>
<span class="go"> 36</span>
</pre></div>
<p>The size of a key hashed with md5 is 36 bytes. <strong>The hashed key is ~5% the size of the original key</strong>, way smaller.</p>
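<p>A back-of-the-envelope calculation shows why the key size matters so much. The sketch below multiplies the row count from the execution plan above by the average key size and compares the result to the default <code>work_mem</code>. It is a rough lower bound only: it ignores the <code>id</code> column and PostgreSQL's per-row overhead, and the function name is made up for illustration.</p>

```python
WORK_MEM_BYTES = 4 * 1024 * 1024  # PostgreSQL's default work_mem (4MB)


def estimate_key_bytes(rows: int, avg_key_bytes: int) -> int:
    """Rough lower bound: bytes taken by the sort keys alone."""
    return rows * avg_key_bytes


rows = 39_160  # row count reported in the execution plan above

raw = estimate_key_bytes(rows, 780)    # sorting on the blob itself
hashed = estimate_key_bytes(rows, 36)  # sorting on md5(blob)

print(f"blob keys: {raw / 1024:,.0f} kB, fits in work_mem: {raw <= WORK_MEM_BYTES}")
print(f"md5 keys:  {hashed / 1024:,.0f} kB, fits in work_mem: {hashed <= WORK_MEM_BYTES}")
```

<p>The estimate for the raw blob lands close to the 30456kB the plan reported on disk, well above 4MB, while the hashed keys alone take far less than <code>work_mem</code>.</p>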
<p>The next step was to try our original query with the hashed key:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">id</span><span class="p">,</span>
<span class="w"> </span><span class="k">MIN</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="n">OVER</span><span class="w"> </span><span class="p">(</span>
<span class="hll"><span class="w"> </span><span class="n">PARTITION</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">md5</span><span class="p">(</span><span class="n">array_to_string</span><span class="p">(</span><span class="nb">blob</span><span class="p">,</span><span class="w"> </span><span class="s1">''</span><span class="p">)</span>
</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">id</span><span class="p">)</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">dumps</span><span class="p">;</span>
</pre></div>
<p>And the execution plan:</p>
<div class="highlight"><pre><span></span><span class="n">WindowAgg</span><span class="w"> </span><span class="p">(</span><span class="k">cost</span><span class="o">=</span><span class="mf">7490.74..8469.74</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">39160</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mf">40</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="nb">time</span><span class="o">=</span><span class="mf">349.394..371.771</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">39160</span><span class="p">)</span>
<span class="w"> </span><span class="n">Buffers</span><span class="p">:</span><span class="w"> </span><span class="n">shared</span><span class="w"> </span><span class="n">hit</span><span class="o">=</span><span class="mf">3916</span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Sort</span><span class="w"> </span><span class="p">(</span><span class="k">cost</span><span class="o">=</span><span class="mf">7490.74..7588.64</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">39160</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mf">36</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="nb">time</span><span class="o">=</span><span class="mf">349.383..353.045</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">39160</span><span class="p">)</span>
<span class="w"> </span><span class="n">Sort</span><span class="w"> </span><span class="k">Key</span><span class="p">:</span><span class="w"> </span><span class="p">(</span><span class="n">md5</span><span class="p">(</span><span class="n">array_to_string</span><span class="p">(</span><span class="n">blob</span><span class="p">,</span><span class="w"> </span><span class="s1">''</span><span class="o">::</span><span class="nb">text</span><span class="p">))),</span><span class="w"> </span><span class="n">id</span>
<span class="hll"><span class="w"> </span><span class="n">Sort</span><span class="w"> </span><span class="k">Method</span><span class="p">:</span><span class="w"> </span><span class="n">quicksort</span><span class="w"> </span><span class="n">Memory</span><span class="p">:</span><span class="w"> </span><span class="mf">4005</span><span class="n">kB</span>
</span><span class="w"> </span><span class="n">Buffers</span><span class="p">:</span><span class="w"> </span><span class="n">shared</span><span class="w"> </span><span class="n">hit</span><span class="o">=</span><span class="mf">3916</span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Seq</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="n">dumps</span><span class="w"> </span><span class="p">(</span><span class="k">cost</span><span class="o">=</span><span class="mf">0..4503.40</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">39160</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mf">36</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="nb">time</span><span class="o">=</span><span class="mf">0.055..292.070</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">39160</span><span class="p">)</span>
<span class="w"> </span><span class="n">Buffers</span><span class="p">:</span><span class="w"> </span><span class="n">shared</span><span class="w"> </span><span class="n">hit</span><span class="o">=</span><span class="mf">3916</span>
<span class="n">Execution</span><span class="w"> </span><span class="nb">time</span><span class="p">:</span><span class="w"> </span><span class="mf">374.125</span><span class="w"> </span><span class="n">ms</span>
</pre></div>
<p>Using the hashed key, the sort needed only ~4MB and, as the <code>quicksort</code> sort method in the plan shows, it was done entirely in memory without spilling to disk. That is ~13% of the previous size of 30MB. This means <strong>the size of the sort key has a significant impact on the amount of storage consumed by a sort</strong>.</p>
<hr>
<h3 id="extra-credit"><a class="toclink" href="#extra-credit">Extra Credit</a></h3>
<p>In the example above we used <code>md5</code> to hash the blob. Hashes generated with MD5 are supposed to be 16 bytes. However, using <code>md5</code> we get a bigger output size:</p>
<div class="highlight"><pre><span></span><span class="k">select</span><span class="w"> </span><span class="n">pg_column_size</span><span class="p">(</span><span class="w"> </span><span class="n">md5</span><span class="p">(</span><span class="s1">'foo'</span><span class="p">)</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">md5_size</span>
<span class="n">md5_size</span>
<span class="c1">-------------</span>
<span class="mi">32</span>
</pre></div>
<p>The size of the hash is exactly double the size we expected. This is because <code>md5</code> provides the hash as text represented in hexadecimal.</p>
<p>There is another way to hash with MD5 in PostgreSQL using the <a href="https://www.postgresql.org/docs/current/pgcrypto.html" rel="noopener"><code>pgcrypto</code> extension</a>. <code>pgcrypto</code> can produce MD5 as <a href="https://www.postgresql.org/docs/current/datatype-binary.html" rel="noopener"><code>bytea</code> (binary)</a>:</p>
<div class="highlight"><pre><span></span><span class="k">create</span><span class="w"> </span><span class="n">extension</span><span class="w"> </span><span class="n">pgcrypto</span><span class="p">;</span>
<span class="k">select</span><span class="w"> </span><span class="n">pg_column_size</span><span class="p">(</span><span class="w"> </span><span class="n">digest</span><span class="p">(</span><span class="s1">'foo'</span><span class="p">,</span><span class="w"> </span><span class="s1">'md5'</span><span class="p">)</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">crypto_md5_size</span>
<span class="n">crypto_md5_size</span>
<span class="c1">---------------</span>
<span class="mi">20</span>
</pre></div>
<p>The size of the hash is still 4 bytes larger than we expected. This is because the <code>bytea</code> type uses an additional 4 bytes to store the length of the value.</p>
<p>To strip off these 4 bytes, we can resort to some hackery. As it happens, the <code>uuid</code> type in PostgreSQL is exactly 16 bytes, and it supports any arbitrary value. We can use that fact to strip off the remaining 4 bytes:</p>
<div class="highlight"><pre><span></span><span class="k">select</span><span class="w"> </span><span class="n">pg_column_size</span><span class="p">(</span><span class="w"> </span><span class="n">uuid_in</span><span class="p">(</span><span class="n">md5</span><span class="p">(</span><span class="s1">'foo'</span><span class="p">)::</span><span class="n">cstring</span><span class="p">)</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">uuid_size</span>
<span class="n">uuid_size</span>
<span class="c1">---------------</span>
<span class="mi">16</span>
</pre></div>
<p>And there you have it. From 32 bytes using <code>md5</code>, down to 16 bytes using <code>uuid</code>.</p>
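<p>The same three representations can be reproduced outside the database. The following sketch uses Python's standard <code>hashlib</code> and <code>uuid</code> modules as an analogy. It shows the size of the values themselves, without the length header PostgreSQL adds to variable-length types.</p>

```python
import hashlib
import uuid

data = b"foo"

hex_digest = hashlib.md5(data).hexdigest()  # text in hexadecimal, like md5()
raw_digest = hashlib.md5(data).digest()     # raw bytes, like pgcrypto's digest()
as_uuid = uuid.UUID(hex=hex_digest)         # the same 128 bits stored as a UUID

print(len(hex_digest))     # number of hexadecimal characters
print(len(raw_digest))     # number of raw bytes
print(len(as_uuid.bytes))  # number of bytes in the UUID
```

<p>The hexadecimal text needs two characters for every byte of the hash, which is why it is twice as big. The UUID holds exactly the same 16 bytes as the raw digest.</p>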
<p>To check the impact of this change I had to use a bigger dataset than the one I used in this article. Since I can't share the actual data, I'll only share the results:</p>
<table>
<thead>
<tr>
<th>expression</th>
<th>size</th>
<th>disk used</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>blob</code></td>
<td>780</td>
<td>309MB</td>
</tr>
<tr>
<td><code>md5(blob)</code></td>
<td>36</td>
<td>11MB</td>
</tr>
<tr>
<td><code>uuid_in(md5(blob))</code></td>
<td>16</td>
<td>7MB</td>
</tr>
</tbody>
</table>
<p>As the table above shows, the original query which caused us grief used ~300MB of disk space (and woke us up at night). After we changed the query to use a <code>uuid</code> key, the sort used only 7MB of disk space.</p>
<h3 id="additional-considerations"><a class="toclink" href="#additional-considerations">Additional Considerations</a></h3>
<p>The query with the hashed sort key runs much slower, even though it consumes less disk space:</p>
<table>
<thead>
<tr>
<th>query</th>
<th>run time</th>
</tr>
</thead>
<tbody>
<tr>
<td>blob</td>
<td>160ms</td>
</tr>
<tr>
<td>hashed blob</td>
<td>374ms</td>
</tr>
</tbody>
</table>
<p>Hashing utilizes more CPU. This caused the query with the hash to be slower. In our case, we were trying to solve a disk space problem. The task ran at night so we weren't concerned with the execution time. We were willing to make this compromise to solve the disk space problem.</p>
<p>This case was a good reminder that <strong>tuning a database is not only about making queries run faster</strong>. It's all about balancing the resources at our disposal and doing as much as possible with as little as possible.</p>Optimizing the Django Admin Paginator2018-11-06T00:00:00+02:002018-11-06T00:00:00+02:00Haki Benitatag:hakibenita.com,2018-11-06:/optimizing-the-django-admin-paginator<p>I often talk about making Django scale but what does it actually mean? It means getting consistent performance regardless of the amount of data. In this article we tackle the last nail in Django admin's scalability coffin - the paginator.</p><hr>
<p>In almost every project we work on, we use Django admin for support and operations. Over time we experienced an influx of new users and the amount of data we had stored grew rapidly. With a large dataset we started to experience the real cost of some Django admin features.</p>
<p>I often talk about making Django scale but what does it actually mean? It means getting consistent performance regardless of the amount of data. Over the past few years I've written about different approaches to scale and optimize Django admin for large tables:</p>
<ul>
<li><a href="/things-you-must-know-about-django-admin-as-your-app-gets-bigger">Things You Must Know About Django admin As Your App Gets Bigger</a>: Included an overview of built-in options to optimize for larger tables.</li>
<li><a href="/how-to-add-custom-action-buttons-to-django-admin">How to Add Custom Action Buttons to Django Admin</a>: Explained how to make Django admin more suitable for operations.</li>
<li><a href="/how-to-turn-django-admin-into-a-lightweight-dashboard">How to turn Django Admin into a lightweight dashboard</a>: Showed how to add custom pages to the admin with fancy graphs and charts. This was our first dashboard (and we still use it!).</li>
<li><a href="/5-ways-to-make-django-admin-safer">5 ways to make Django Admin safer</a>: Described different measures we took to make our admin more secure (this one also inspired <a href="https://github.com/dizballanze/django-admin-env-notice" rel="noopener">a package</a>).</li>
<li><a href="/scaling-django-admin-date-hierarchy">Scaling Django Admin Date Hierarchy</a>: Identified issues with the date hierarchy and suggested ways to alter the admin behavior to better handle large tables (we also turned it into <a href="https://github.com/hakib/django-admin-lightweight-date-hierarchy" rel="noopener">a package</a>!).</li>
<li><a href="/django-admin-range-based-date-hierarchy">Django Admin Range-Based Date Hierarchy </a>: Looked at the queries generated by the date hierarchy and made them much faster. I'm especially proud of this one because <strong>I managed to <a href="https://github.com/django/django/pull/9469" rel="noopener">get it into Django 2.1</a></strong>.</li>
<li><a href="/how-to-add-a-text-filter-to-django-admin">How to add a text filter to Django Admin</a>: We added a new type of filter to save some time when generating list filters.</li>
</ul>
<p>After all of this, only one problem remained...</p>
<hr>
<h2 id="the-paginator"><a class="toclink" href="#the-paginator">The Paginator</a></h2>
<p>The last nail in Django admin's scalability coffin is the paginator:</p>
<figure><img alt="Count is taking so much time!" src="https://hakibenita.com/images/01-optimizing-django-admin-paginator.png"><figcaption>Count is taking so much time!</figcaption>
</figure>
<p>Django spent 781ms out of 786ms just to count the rows in the table. That's ~99.4% of the time, just for the paginator!</p>
<p>For reference, the actual query used to fetch the data took only 2.10ms. The reason it's much quicker is that it only needs to fetch one page (notice the <code>LIMIT 100</code>).</p>
<h3 id="what-can-we-do"><a class="toclink" href="#what-can-we-do">What Can We Do?</a></h3>
<p>Making something scale is all about eliminating operations that work on the entire dataset. The paginator has to count the rows to determine how many pages there are. This forces every single page in the admin to scan the entire table.</p>
<p>As the table grows in size this query takes longer and longer to execute. The size of the table has a direct impact on the load time of a single page and this is exactly what we want to avoid.</p>
<p>Django <code>ModelAdmin</code> <a href="https://docs.djangoproject.com/en/2.1/ref/contrib/admin/#django.contrib.admin.ModelAdmin.paginator" rel="noopener">provides a way to override the paginator</a>.
Our initial thought was to implement an entirely different type of paginator - one that doesn't calculate the number of pages, but only shows links to the previous and next pages.</p>
<p>Unfortunately, the paginator is embedded deep into Django admin - Django uses partial templates to render the paginator. This makes it difficult to create a paginator that doesn't count pages.</p>
<p>To provide a different paginator to a <code>ModelAdmin</code> we need to implement a <code>Paginator</code>. The interesting function in the paginator implementation is <code>count</code>:</p>
<div class="highlight"><pre><span></span><span class="c1"># django/core/paginator.py</span>
<span class="nd">@cached_property</span>
<span class="k">def</span> <span class="nf">count</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="k">try</span><span class="p">:</span>
<span class="hll">        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">object_list</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
</span>    <span class="k">except</span> <span class="p">(</span><span class="ne">AttributeError</span><span class="p">,</span> <span class="ne">TypeError</span><span class="p">):</span>
<span class="hll">        <span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">object_list</span><span class="p">)</span>
</span></pre></div>
<p>This is where the horror happens. Django counts the rows in the queryset (or the object list) and caches the result.</p>
<p>If we want to eliminate the count this is where we need to start.</p>
<h3 id="a-dumb-paginator"><a class="toclink" href="#a-dumb-paginator">A Dumb Paginator</a></h3>
<p>Let's take the simplest approach and create a paginator that returns a very
large number without actually counting the rows:</p>
<div class="highlight"><pre><span></span><span class="c1"># common/paginator.py</span>
<span class="kn">from</span> <span class="nn">django.core.paginator</span> <span class="kn">import</span> <span class="n">Paginator</span>
<span class="kn">from</span> <span class="nn">django.utils.functional</span> <span class="kn">import</span> <span class="n">cached_property</span>

<span class="k">class</span> <span class="nc">DumbPaginator</span><span class="p">(</span><span class="n">Paginator</span><span class="p">):</span>
<span class="w">    </span><span class="sd">"""</span>
<span class="sd">    Paginator that does not count the rows in the table.</span>
<span class="sd">    """</span>
    <span class="nd">@cached_property</span>
    <span class="k">def</span> <span class="nf">count</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="c1"># Skip the COUNT query and report a very large fake total.</span>
        <span class="k">return</span> <span class="mi">9999999999</span>
</pre></div>
<p>Let's add it to a <code>ModelAdmin</code> of a large table and see what happens:</p>
<div class="highlight"><pre><span></span><span class="c1"># app/admin.py</span>
<span class="kn">from</span> <span class="nn">django.contrib</span> <span class="kn">import</span> <span class="n">admin</span>
<span class="kn">from</span> <span class="nn">common.paginator</span> <span class="kn">import</span> <span class="n">DumbPaginator</span>
<span class="kn">from</span> <span class="nn">.</span> <span class="kn">import</span> <span class="n">models</span>  <span class="c1"># assumes models.py lives in the same app</span>

<span class="nd">@admin</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">LargeTable</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">LargeTableAdmin</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
<span class="hll">    <span class="n">paginator</span> <span class="o">=</span> <span class="n">DumbPaginator</span>
</span></pre></div>
<p>And the admin page:</p>
<figure><img alt="Admin page did not count rows" src="https://hakibenita.com/images/02-optimizing-django-admin-paginator.png"><figcaption>Admin page did not count rows</figcaption>
</figure>
<p>Wow! With the dumb paginator the page now loads in only 4ms. This is impressive, but what does the UI look like?</p>
<figure><img alt="Pagination UI" src="https://hakibenita.com/images/03-optimizing-django-admin-paginator.png"><figcaption>Pagination UI</figcaption>
</figure>
<p>We made the paginator think there are 9999999999 results so this is what it shows. Clicking on a page that doesn't exist will open an empty list view.</p>
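<p>The number of pages the UI shows follows directly from the count. As a rough illustration in plain Python (this is not Django's code, and the function name is made up), the page count is the row count divided by the page size, rounded up. The page size of 100 is taken from the <code>LIMIT 100</code> mentioned earlier.</p>

```python
def num_pages(count: int, per_page: int) -> int:
    """Pages needed to show `count` rows, `per_page` rows at a time."""
    # Integer ceiling division: add (per_page - 1) before dividing.
    return (count + per_page - 1) // per_page


fake_total = 9999999999  # value returned by the dumb paginator
per_page = 100           # admin page size, as in the LIMIT 100 above

print(num_pages(fake_total, per_page))
print(num_pages(250, per_page))
```

<p>With the fake count the paginator believes there are 100,000,000 pages, so almost every page link it renders points to a page with no rows.</p>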
<hr>
<h2 id="getting-creative"><a class="toclink" href="#getting-creative">Getting Creative</a></h2>
<p>We saw that we can provide a custom paginator and eliminate the count. We also know that Django is very attached to this specific paginator template so we can't easily replace it.</p>
<p>At this point it's obvious that if we don't want to work <em>too</em> hard and alter Django's templates, we need to make a compromise. A compromise can be made either in the UI or in accuracy.</p>
<p>I personally don't like to sacrifice accuracy so quickly, so I tend to make compromises in the UI, especially in internal projects such as ones implemented with Django admin.</p>
<p>Instead of always eliminating the pagination, what if we just limit the execution time of the count query:</p>
<div class="highlight"><pre><span></span><span class="c1"># common/paginator.py</span>
<span class="kn">from</span> <span class="nn">django.core.paginator</span> <span class="kn">import</span> <span class="n">Paginator</span>
<span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">connection</span><span class="p">,</span> <span class="n">transaction</span><span class="p">,</span> <span class="n">OperationalError</span>
<span class="kn">from</span> <span class="nn">django.utils.functional</span> <span class="kn">import</span> <span class="n">cached_property</span>

<span class="hll"><span class="k">class</span> <span class="nc">TimeLimitedPaginator</span><span class="p">(</span><span class="n">Paginator</span><span class="p">):</span>
</span><span class="w">    </span><span class="sd">"""</span>
<span class="sd">    Paginator that enforces a timeout on the count operation.</span>
<span class="sd">    If the operation times out, a bogus value is</span>
<span class="sd">    returned instead.</span>
<span class="sd">    """</span>
    <span class="nd">@cached_property</span>
    <span class="k">def</span> <span class="nf">count</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="c1"># We set the timeout in a db transaction to prevent it from</span>
        <span class="c1"># affecting other transactions.</span>
<span class="hll">        <span class="k">with</span> <span class="n">transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">(),</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span> <span class="k">as</span> <span class="n">cursor</span><span class="p">:</span>
</span>            <span class="c1"># statement_timeout is a PostgreSQL setting, in milliseconds.</span>
<span class="hll">            <span class="n">cursor</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s1">'SET LOCAL statement_timeout TO 200;'</span><span class="p">)</span>
</span>            <span class="k">try</span><span class="p">:</span>
<span class="hll">                <span class="k">return</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">count</span>
</span>            <span class="k">except</span> <span class="n">OperationalError</span><span class="p">:</span>
<span class="hll">                <span class="k">return</span> <span class="mi">9999999999</span>
</span></pre></div>
<p>We let the original paginator count the rows. If the count takes longer than 200ms, the database will kill the query, an <code>OperationalError</code> will be raised and we return the fake value. If the count takes less than 200ms it will work as usual.</p>
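<p>The "try to count, give up after a time limit, fall back to a fake value" pattern can be tried without Django or PostgreSQL. The sketch below is only an analogy: it uses Python's built-in <code>sqlite3</code> module, where a progress handler plays the role of <code>statement_timeout</code>. The function name, table name and queries are made up for illustration.</p>

```python
import sqlite3
import time

FAKE_COUNT = 9999999999  # same placeholder value the paginator returns


def count_with_time_limit(conn, query, timeout_seconds):
    """Run a COUNT query, but give up once `timeout_seconds` have elapsed."""
    deadline = time.monotonic() + timeout_seconds

    def check_deadline():
        # A non-zero return value asks SQLite to abort the running statement.
        return 1 if time.monotonic() >= deadline else 0

    # Call check_deadline every 1000 SQLite virtual machine instructions.
    conn.set_progress_handler(check_deadline, 1000)
    try:
        return conn.execute(query).fetchone()[0]
    except sqlite3.DatabaseError:
        # The statement was interrupted: fall back to the fake value.
        return FAKE_COUNT
    finally:
        # Remove the handler so it does not affect other statements.
        conn.set_progress_handler(None, 0)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE large_table (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO large_table (id) VALUES (?)", [(i,) for i in range(1000)])

# A cheap count with a generous limit returns the real number.
fast = count_with_time_limit(conn, "SELECT COUNT(*) FROM large_table", timeout_seconds=5)

# An expensive count with an already expired limit is aborted.
slow_query = """
    WITH RECURSIVE numbers(n) AS (
        SELECT 1 UNION ALL SELECT n + 1 FROM numbers WHERE n < 1000000
    )
    SELECT COUNT(*) FROM numbers
"""
slow = count_with_time_limit(conn, slow_query, timeout_seconds=0)

print(fast, slow)
```

<p>As in the paginator, the caller never sees the error. It gets either the real count or the fake one, and the limit applies only to the count statement.</p>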
<p>To check the new approach we set the paginator in the <code>ModelAdmin</code>:</p>
<div class="highlight"><pre><span></span><span class="c1"># app/admin.py</span>
<span class="kn">from</span> <span class="nn">common.paginator</span> <span class="kn">import</span> <span class="n">TimeLimitedPaginator</span>
<span class="nd">@admin</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">LargeTable</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">LargeTableAdmin</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
<span class="hll"> <span class="n">paginator</span> <span class="o">=</span> <span class="n">TimeLimitedPaginator</span>
</span></pre></div>
<p>First let's try it with a big result set. For example, all the rows from year 2018:</p>
<figure><img alt="Query took longer than 200ms and was killed" src="https://hakibenita.com/images/04-optimizing-django-admin-paginator.png"><figcaption>Query took longer than 200ms and was killed</figcaption>
</figure>
<p>Query took longer than 200ms and was killed.</p>
<p>Let's try a shorter period. For example, only records from a single day in 2018:</p>
<figure><img alt="Query finished in under 200ms and was not killed." src="https://hakibenita.com/images/05-optimizing-django-admin-paginator.png"><figcaption>Query finished in under 200ms and was not killed.</figcaption>
</figure>
<p>Count took only 2.41ms. It was not killed and we got the regular "paginator" - exactly what we wanted!</p>
<div class="admonition info">
<p class="admonition-title">source code</p>
<p>The complete code for <code>TimeLimitedPaginator</code> can be found in <a href="https://gist.github.com/hakib/5cbda96c8121299088115a94ec634903" rel="noopener">this gist</a>.</p>
</div>
<hr>
<h2 id="other-approaches"><a class="toclink" href="#other-approaches">Other Approaches</a></h2>
<p>As discussed in the previous section, compromises can be made in different ways.
Some ideas I encountered when I researched this issue were:</p>
<ol>
<li><strong>Estimate the number of rows based on the database execution plan</strong> - in Django 2.1 there is even a new <a href="https://docs.djangoproject.com/en/2.1/ref/models/querysets/#explain" rel="noopener">explain function</a> on <code>QuerySet</code>. This approach is very inaccurate, especially with complicated predicates, and might yield unexpected results.</li>
<li><strong>Cache the number of rows</strong> - the boring and unimaginative solution. As always with a solution that involves caching, you need to decide when to invalidate the cache, which is a problem of its own.</li>
<li><strong>Completely replace Django's paginator</strong> - replacing templates and possibly making adjustments to the change list itself. <br> I went down this road
initially but gave up when I realized it would be difficult to make this "plug and play". I do believe that a proper solution should let users replace the pagination implementation with something like <a href="https://www.django-rest-framework.org/api-guide/pagination/#cursorpagination" rel="noopener">cursor pagination</a> (pagination that does not need to evaluate the entire dataset).</li>
</ol>Be Careful With CTE in PostgreSQL2018-09-17T00:00:00+03:002018-09-17T00:00:00+03:00Haki Benitatag:hakibenita.com,2018-09-17:/be-careful-with-cte-in-postgre-sql<p>Common table expressions, also known as the WITH clause, are a very useful feature. They help break down big queries into smaller pieces which makes it easier to read and understand. But, when used incorrectly they can cause a significant performance hit.</p><hr>
<p><a href="https://www.postgresql.org/docs/current/static/queries-with.html" rel="noopener">Common table expressions</a> (CTE), also known as the WITH clause, are a very useful feature. They help break down big queries into smaller pieces which makes it easier to read and understand.</p>
<div class="admonition info">
<p class="admonition-title">PostgreSQL Version</p>
<p>This article is intended for PostgreSQL versions 11 and prior. Starting at version 12, <a href="https://www.postgresql.org/docs/12/release-12.html" rel="noopener">PostgreSQL changed the way it treats CTE</a> to prevent the issues described in this article.</p>
</div>
<h3 id="whats-so-dangerous"><a class="toclink" href="#whats-so-dangerous">What's So Dangerous?</a></h3>
<p>Let's create a sample table with two columns and an index, and populate it with 1M random rows:</p>
<div class="highlight"><pre><span></span><span class="gp">haki=#</span><span class="w"> </span><span class="k">create</span><span class="w"> </span><span class="k">table</span><span class="w"> </span><span class="n">foo</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="w"> </span><span class="nb">int</span><span class="p">,</span><span class="w"> </span><span class="n">padding</span><span class="w"> </span><span class="nb">text</span><span class="p">);</span>
<span class="go">CREATE TABLE</span>
<span class="gp">haki=#</span><span class="w"> </span><span class="k">insert</span><span class="w"> </span><span class="k">into</span><span class="w"> </span><span class="n">foo</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">padding</span><span class="p">)</span>
<span class="w"> </span><span class="k">select</span><span class="w"> </span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">md5</span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="o">::</span><span class="nb">text</span><span class="p">)</span>
<span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">1000000</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">id</span>
<span class="w"> </span><span class="k">order</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="n">random</span><span class="p">();</span>
<span class="go">INSERT 0 1000000</span>
<span class="gp">haki=#</span><span class="w"> </span><span class="k">create</span><span class="w"> </span><span class="k">index</span><span class="w"> </span><span class="n">foo_id_ix</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="n">foo</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="p">);</span>
<span class="go">CREATE INDEX</span>
<span class="gp">haki=#</span><span class="w"> </span><span class="k">analyze</span><span class="w"> </span><span class="n">foo</span><span class="p">;</span>
<span class="go">ANALYZE</span>
</pre></div>
<p>To illustrate the problem with CTE, let's execute a simple query to fetch a single record from the table, first without a CTE and then with one:</p>
<div class="highlight"><pre><span></span><span class="gp">haki=#</span><span class="w"> </span><span class="k">select</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">foo</span><span class="w"> </span><span class="k">where</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">500000</span><span class="p">;</span>
<span class="go">id | padding</span>
<span class="go">------------------------------------------</span>
<span class="go">500000 | b292eb19f3145fb087648d5956dfa44e</span>
<span class="hll"><span class="go">Time: 0.619 ms</span>
</span>
<span class="gp">haki=#</span><span class="w"> </span><span class="k">with</span><span class="w"> </span><span class="n">cte</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="p">(</span><span class="k">select</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">foo</span><span class="p">)</span><span class="w"> </span><span class="k">select</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">cte</span><span class="w"> </span><span class="k">where</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">500000</span><span class="p">;</span>
<span class="go">id | padding</span>
<span class="go">------------------------------------------</span>
<span class="go">500000 | b292eb19f3145fb087648d5956dfa44e</span>
<span class="hll"><span class="go">Time: 227.675 ms</span>
</span></pre></div>
<p>The first query took 0.619 ms while the second one took 227 ms, more than 350 times longer. Why is that?</p>
<p>A lesser known fact about CTE in PostgreSQL is that the <strong>database will evaluate the query inside the CTE and store the results</strong>.</p>
<p>From <a href="https://www.postgresql.org/docs/10/static/queries-with.html" rel="noopener">the docs</a>:</p>
<blockquote>
<p>A useful property of WITH queries is that they are evaluated only once per
execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects</p>
</blockquote>
<p>This sounds great: using a CTE we can "cache" expensive calculations that are reused multiple times in the query, BUT...</p>
<blockquote>
<p>However, the other side of this coin is that the optimizer is less able to push
restrictions from the parent query down into a WITH query than an ordinary subquery.</p>
</blockquote>
<p>Going back to the queries above, let's take a look at the execution plans:</p>
<div class="highlight"><pre><span></span><span class="gp">haki=#</span><span class="w"> </span><span class="k">explain</span><span class="w"> </span><span class="p">(</span><span class="k">analyze</span><span class="w"> </span><span class="k">on</span><span class="p">,</span><span class="w"> </span><span class="n">timing</span><span class="w"> </span><span class="k">on</span><span class="p">)</span><span class="w"> </span><span class="k">select</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">foo</span><span class="w"> </span><span class="k">where</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">500000</span><span class="p">;</span>
<span class="go">QUERY PLAN</span>
<span class="go">------------------------------</span>
<span class="hll"><span class="go">Index Scan using foo_id_ix on foo (cost=0.42..8.44 rows=1 width=37)</span>
</span><span class="go"> (actual time=0.026..0.028 rows=1 loops=1)</span>
<span class="go"> Index Cond: (id = 500000)</span>
<span class="go">Execution time: 0.060 ms</span>
</pre></div>
<p>In the simple query without the CTE PostgreSQL used the index on the ID field to quickly locate the desired record. Simple and fast.</p>
<p>The execution plan using the CTE is a bit different:</p>
<div class="highlight"><pre><span></span><span class="gp">haki=#</span><span class="w"> </span><span class="k">explain</span><span class="w"> </span><span class="p">(</span><span class="k">analyze</span><span class="w"> </span><span class="k">on</span><span class="p">,</span><span class="w"> </span><span class="n">timing</span><span class="w"> </span><span class="k">on</span><span class="p">)</span>
<span class="k">with</span><span class="w"> </span><span class="n">cte</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">select</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">foo</span>
<span class="p">)</span>
<span class="k">select</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">cte</span><span class="w"> </span><span class="k">where</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">500000</span><span class="p">;</span>
<span class="go">QUERY PLAN</span>
<span class="go">------------------------------</span>
<span class="go">CTE Scan on cte (cost=18334.00..40834.00 rows=5000 width=36)</span>
<span class="go"> (actual time=3.243..269.290 rows=1 loops=1)</span>
<span class="go"> Filter: (id = 500000)</span>
<span class="go"> Rows Removed by Filter: 999999</span>
<span class="hll"><span class="go"> CTE cte</span>
</span><span class="hll"><span class="go"> -> Seq Scan on foo (cost=0.00..18334.00 rows=1000000 width=37)</span>
</span><span class="hll"><span class="go"> (actual time=0.029..77.078 rows=1000000 loops=1)</span>
</span>
<span class="go">Execution time: 276.625 ms</span>
</pre></div>
<p>PostgreSQL materialized the CTE, meaning it <strong>created a temporary structure with the results of the query defined in the CTE</strong>, and only then applied the filter to it. Because the predicate was applied to the CTE rather than to the table, PostgreSQL was unable to utilize the index on the ID column.</p>
<p>The overall cost of the second query is significantly higher than the first one. It's essentially equivalent to two full table scans plus extra memory to store the CTE result.</p>
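<p>To make the difference concrete, here is a rough analogy in plain Python. It is only a sketch: the list, the dict and the two function names are stand-ins made up for illustration, and none of it reflects how PostgreSQL is implemented. The first function applies the predicate directly to the "table" through an "index"; the second copies every row into a temporary structure first and only then filters it.</p>

```python
# Rough analogy only: plain Python containers standing in for a table,
# an index, and a materialized CTE. This is not PostgreSQL's implementation.

# "Table": a list of (id, padding) rows, with a dict acting as the index on id.
table = [(i, f"padding-{i}") for i in range(1, 100_001)]
index_on_id = {row[0]: position for position, row in enumerate(table)}


def query_with_index(wanted_id):
    """Predicate applied to the table: one index lookup, one row touched."""
    position = index_on_id[wanted_id]
    return table[position], 1


def query_materialized_cte(wanted_id):
    """Predicate applied to the CTE: copy every row first, then filter."""
    materialized = list(table)  # the "temporary structure" with the CTE result
    rows_touched = len(materialized)
    matches = [row for row in materialized if row[0] == wanted_id]
    return matches[0], rows_touched


row_a, touched_a = query_with_index(50_000)
row_b, touched_b = query_materialized_cte(50_000)
print(row_a == row_b, touched_a, touched_b)
```

<p>Both functions return the same row, but the second one has to touch every row in the table to get there, which mirrors the "Rows Removed by Filter" line in the plan above.</p>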
<p><strong>A possible alternative to CTE is a subquery</strong>. Let's see what the execution plan looks like when we <strong>inline the CTE</strong> as a subquery:</p>
<div class="highlight"><pre><span></span><span class="gp">haki=#</span><span class="w"> </span><span class="k">explain</span><span class="w"> </span><span class="p">(</span><span class="k">analyze</span><span class="w"> </span><span class="k">on</span><span class="p">,</span><span class="w"> </span><span class="n">timing</span><span class="w"> </span><span class="k">on</span><span class="p">)</span>
<span class="k">select</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="p">(</span><span class="k">select</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">foo</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">subquery</span><span class="w"> </span><span class="k">where</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">500000</span><span class="p">;</span>
<span class="go">QUERY PLAN</span>
<span class="go">------------------------------</span>
<span class="hll"><span class="go">Index Scan using foo_id_ix on foo (cost=0.42..8.44 rows=1 width=37)</span>
</span><span class="go"> (actual time=0.028..0.031 rows=1 loops=1)</span>
<span class="go"> Index Cond: (id = 500000)</span>
<span class="go">Execution time: 0.066 ms</span>
</pre></div>
<p>The execution plan using the subquery is similar to the simple query without the CTE. PostgreSQL was smart enough to apply the predicate <code>id = 500000</code> in the subquery and utilize the index.</p>
<h3 id="is-it-possible-to-prevent-postgresql-from-materializing-a-cte"><a class="toclink" href="#is-it-possible-to-prevent-postgresql-from-materializing-a-cte">Is it Possible to Prevent PostgreSQL From Materializing a CTE?</a></h3>
<p>The short answer is <strong>not that I know of</strong>.</p>
<p>To illustrate the difference let's look at how <strong>Oracle</strong> behaves under similar circumstances (setup can be found <a href="http://sqlfiddle.com/#!4/a03a7/8" rel="noopener">here</a>):</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="go">> SELECT * FROM foo WHERE id = 500000;</span>
</span>
<span class="go">-----------------------------------------------------------------</span>
<span class="go">| Id | Operation | Name | Rows | Cost |</span>
<span class="go">-----------------------------------------------------------------</span>
<span class="go">| 0 | SELECT STATEMENT | | 1 | 21 |</span>
<span class="go">| 1 | TABLE ACCESS BY INDEX ROWID | FOO | 1 | 21 |</span>
<span class="go">| * 2 | INDEX RANGE SCAN | FOO_ID_IX | 37 | 1 |</span>
<span class="go">-----------------------------------------------------------------</span>
<span class="go">Predicate Information (identified by operation id):</span>
<span class="go">------------------------------------------</span>
<span class="go">* 2 - access("ID"=500000)</span>
<span class="hll"><span class="go">> WITH cte AS (</span>
</span><span class="hll"><span class="go"> SELECT * FROM foo</span>
</span><span class="hll"><span class="go">)</span>
</span><span class="hll"><span class="go">SELECT * FROM cte WHERE id = 500000;</span>
</span>
<span class="go">-----------------------------------------------------------------</span>
<span class="go">| Id | Operation | Name | Rows | Cost |</span>
<span class="go">-----------------------------------------------------------------</span>
<span class="go">| 0 | SELECT STATEMENT | | 1 | 21 |</span>
<span class="go">| 1 | TABLE ACCESS BY INDEX ROWID | FOO | 1 | 21 |</span>
<span class="go">| * 2 | INDEX RANGE SCAN | FOO_ID_IX | 37 | 1 |</span>
<span class="go">-----------------------------------------------------------------</span>
<span class="go">Predicate Information (identified by operation id):</span>
<span class="go">------------------------------------------</span>
<span class="go">* 2 - access("FOO"."ID"=500000)</span>
</pre></div>
<p>Unlike PostgreSQL, <strong>Oracle does not materialize CTEs by default</strong> and the two queries generate the same execution plan.</p>
<p>There is, however, an undocumented hint in Oracle that can be used to force it to materialize the CTE the same way PostgreSQL does:</p>
<div class="highlight"><pre><span></span><span class="go">> WITH cte AS (</span>
<span class="hll"><span class="go"> SELECT /*+ MATERIALIZE */ * FROM foo</span>
</span><span class="go">)</span>
<span class="go">SELECT * FROM cte WHERE id = 500000;</span>
<span class="go">--------------------------------------------------------------</span>
<span class="go">| Id | Operation | Name | Rows | Cost |</span>
<span class="go">--------------------------------------------------------------</span>
<span class="go">| 0 | SELECT STATEMENT | | 9308 | 46 |</span>
<span class="go">| 1 | TEMP TABLE TRANSFORMATION | | | |</span>
<span class="go">| 2 | LOAD AS SELECT | SYS_TEMP | | |</span>
<span class="go">| 3 | TABLE ACCESS FULL | FOO | 9308 | 22 |</span>
<span class="go">| * 4 | VIEW | | 9308 | 24 |</span>
<span class="go">| 5 | TABLE ACCESS FULL | SYS_TEMP | 9308 | 24 |</span>
<span class="go">--------------------------------------------------------------</span>
<span class="go">Predicate Information (identified by operation id):</span>
<span class="go">------------------------------------------</span>
<span class="go">* 4 - filter("ID"=500000)</span>
</pre></div>
<p>In the execution plan we can see that Oracle created an in-memory temp table to store the results of the CTE. The runtime and the memory usage are significantly higher.</p>
<h3 id="predicate-push-down-and-cte-inlining"><a class="toclink" href="#predicate-push-down-and-cte-inlining">Predicate Push Down and CTE Inlining</a></h3>
<p>The behaviour illustrated above is often referred to as "<strong>push predicate</strong>", "predicate push down" or "<strong>CTE inlining</strong>".</p>
<p><a href="https://docs.oracle.com/database/121/TGSQL/tgsql_transform.htm#TGSQL-GUID-D2E2DE0D-A013-41C2-8527-A797B1F35709" rel="noopener">Predicate push down</a> means that the query optimizer can move predicates around based on logical rules in order to generate better execution plans.</p>
<p><strong>CTE inlining</strong> is when the query optimizer decides to inline a CTE as a subquery which, as we've seen above, makes it possible to push the predicate down.
In the versions this article covers (11 and prior), PostgreSQL does not inline CTEs.</p>
<p>Discussions about the original decision to not inline CTEs and newer requests to change this behavior can be found in the <a href="https://www.postgresql.org/message-id/flat/87sh48ffhb.fsf@news-spur.riddles.org.uk" rel="noopener">PostgreSQL mailing lists</a>.</p>
<h3 id="should-i-stop-using-cte"><a class="toclink" href="#should-i-stop-using-cte">Should I Stop Using CTE?</a></h3>
<p><strong>No! CTE are awesome and very useful</strong>. It's just important to be aware, especially in PostgreSQL, that CTEs are materialized. I found many cases where easy performance gains were achieved simply by inlining CTEs.</p>Automating the Boring Stuff in Django Using the Check Framework2018-06-05T00:00:00+03:002018-06-05T00:00:00+03:00Haki Benitatag:hakibenita.com,2018-06-05:/automating-the-boring-stuff-in-django-using-the-check-framework<p>Every team has a unique development style. Some teams implement localization and require translations. Some teams are more sensitive to database issues and require more careful handling of indexes and constraints. In this article we describe how we enforce our own development style using the Django check framework, the inspect and the ast modules from the Python standard library.</p><hr>
<p>Every team has a unique development style. Some teams implement localization and require translations. Some teams are more sensitive to database issues and require more careful handling of indexes and constraints.</p>
<p>Existing tools cannot always address these specific issues out of the box, so we came up with a way to <strong>enforce our own development style using the Django check framework and the inspect and ast modules from the Python standard library.</strong></p>
<div class="dark--invert">
<figure><img alt="Image by <a href="https://www.instagram.com/_wrightdesign/">Wright Design</a>" src="https://hakibenita.com/images/all_nighter.png"><figcaption>Image by <a href="https://www.instagram.com/_wrightdesign/">Wright Design</a></figcaption>
</figure>
</div>
<p><details class="toc-container" open>
<summary>Table of Contents</summary></p>
<div class="toc">
<ul>
<li><a href="#django-checks">Django checks</a><ul>
<li><a href="#our-first-check">Our first check</a></li>
<li><a href="#inspecting-the-code">Inspecting the code</a></li>
<li><a href="#parsing-the-code">Parsing the code</a></li>
<li><a href="#evaluating-a-model-field">Evaluating a Model Field</a></li>
<li><a href="#issuing-django-checks">Issuing Django checks</a></li>
<li><a href="#putting-it-all-together">Putting it all together</a></li>
</ul>
</li>
<li><a href="#custom-checks-in-the-real-world">Custom Checks in the Real World</a></li>
</ul>
</div>
<p></details></p>
<hr>
<h2 id="django-checks"><a class="toclink" href="#django-checks">Django checks</a></h2>
<p>Django checks are part of the <a href="https://docs.djangoproject.com/en/2.0/ref/checks/#system-check-framework" rel="noopener">Django System Check
framework</a>.
To quote the docs:</p>
<blockquote>
<p>The system check framework is a set of static checks for validating Django
projects. It detects common problems and provides hints for how to fix them. The framework is extensible so you can easily add your own checks.</p>
</blockquote>
<p>One check you might be familiar with is this one from Django admin:</p>
<div class="highlight"><pre><span></span><span class="go">SystemCheckError: System check identified some issues:</span>
<span class="go">ERRORS:</span>
<span class="go"><class 'app.admin.BarAdmin'></span>
<span class="go">(admin.E108) The value of 'list_display[3]' refers to 'foo',</span>
<span class="go">which is not a callable, an attribute of 'Bar', or an attribute</span>
<span class="go">or method on 'app.Bar'.</span>
</pre></div>
<p>The Django admin developers added a system check to warn developers about fields in the model admin that do not exist in the actual model. In this case the field 'foo' does not exist in the model Bar.</p>
<p>Checks are executed by some management commands such as <code>makemigrations</code> and <code>migrate</code>. It's also possible to run the checks explicitly using <code>manage.py</code>:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./manage.py<span class="w"> </span>check
</pre></div>
<p>It's a good idea to incorporate <code>check</code> in your CI. If you want to fail the CI on warnings you can do that by setting a flag:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./manage.py<span class="w"> </span>check<span class="w"> </span>--fail-level<span class="o">=</span>WARNING
</pre></div>
<p>A simple example of how Django uses checks can be found in the source code of the <a href="https://github.com/django/django/blob/c03e41712b2274f524d32bc2aef455ed82c9e3b4/django/db/models/fields/__init__.py#L211" rel="noopener">model Field checks</a>.</p>
<h3 id="our-first-check"><a class="toclink" href="#our-first-check">Our first check</a></h3>
<p>Most of our apps are not intended for English speakers, so we use translations extensively. We put a lot of focus during code review on making sure everything is translated properly.</p>
<p>One of the main issues that come up during code reviews is that <strong>developers often forget to set verbose_name on model fields.</strong></p>
<p>Checking that a field has a verbose name is a pretty straightforward task and we wanted to automate the process of making sure it was set.</p>
<p>To get us started we are going to define a simple customer profile model:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">CustomerProfile</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
<span class="nb">id</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">PositiveSmallIntegerField</span><span class="p">(</span>
<span class="n">primary_key</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
<span class="n">verbose_name</span><span class="o">=</span><span class="n">_</span><span class="p">(</span><span class="s1">'id'</span><span class="p">),</span>
<span class="p">)</span>
<span class="n">name</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span>
<span class="n">max_length</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">created_by</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span>
<span class="n">User</span><span class="p">,</span>
<span class="n">on_delete</span><span class="o">=</span><span class="n">models</span><span class="o">.</span><span class="n">PROTECT</span><span class="p">,</span>
<span class="p">)</span>
</pre></div>
<p>The "name" field does not have <code>verbose_name</code>. Let's see if we can identify that using only the model's <code>_meta</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">name_field</span> <span class="o">=</span> <span class="n">CustomerProfile</span><span class="o">.</span><span class="n">_meta</span><span class="o">.</span><span class="n">get_field</span><span class="p">(</span><span class="s1">'name'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">name_field</span><span class="o">.</span><span class="n">verbose_name</span>
<span class="go">name</span>
</pre></div>
<p>It looks like Django did something under the hood to set the <code>verbose_name</code>. Looking at the Field class, there is a function called <a href="https://github.com/django/django/blob/c03e41712b2274f524d32bc2aef455ed82c9e3b4/django/db/models/fields/__init__.py#L724" rel="noopener">set_attributes_from_name</a> that populates <code>verbose_name</code> by transforming the name of the field - this is where the <code>verbose_name</code> "name" came from.</p>
<p>Because Django is setting the <code>verbose_name</code> on its own the string "name" <strong>will not be picked up by </strong><code>makemessages</code><strong> and will not be added to the po file automatically. </strong>This will probably cause the string "name" to go unnoticed. We don't want that.</p>
<p>Also, because Django is populating the field automatically <strong>we can't use the model _meta to check if </strong><code>verbose_name</code><strong> was originally set</strong>. To do that we need to inspect the actual source code.</p>
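<p>To see why the original intent is lost, here is a simplified stand-in for the fallback described above. <code>SketchField</code> is a hypothetical class written for this illustration, not Django's <code>Field</code>; it only assumes the fallback derives the verbose name from the attribute name by replacing underscores with spaces.</p>

```python
# Simplified stand-in for a Django model field. SketchField is a hypothetical
# name used for illustration; it is not Django's actual Field class.
class SketchField:
    def __init__(self, verbose_name=None):
        self.name = None
        self.verbose_name = verbose_name

    def set_attributes_from_name(self, name):
        # Mirrors the fallback described above: when no verbose_name was
        # given, derive one from the attribute name.
        self.name = name
        if self.verbose_name is None:
            self.verbose_name = name.replace("_", " ")


explicit = SketchField(verbose_name="created by")
implicit = SketchField()
explicit.set_attributes_from_name("created_by")
implicit.set_attributes_from_name("created_by")

# After the fallback runs, both fields look identical from the outside.
print(explicit.verbose_name == implicit.verbose_name)
```

<p>Once the fallback has run, nothing on the field object tells the two cases apart, which is why the check has to look at the source code instead.</p>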
<h3 id="inspecting-the-code"><a class="toclink" href="#inspecting-the-code">Inspecting the code</a></h3>
<p>I didn't use the word <em>inspect</em> for no reason - Python has a module called <a href="https://docs.python.org/3/library/inspect.html" rel="noopener">inspect</a> that we can use to, well, inspect code:</p>
<blockquote>
<p>The inspect module provides several useful functions to help get information
about live objects such as modules, classes, methods, functions, tracebacks, frame objects, and code objects. For example, it can help you examine the contents of a class, retrieve the source code of a method, extract and format the argument list for a function, or get all the information you need to display a detailed traceback.</p>
</blockquote>
<p>Let's see what we can get from inspect:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">inspect</span>
<span class="gp">>>> </span><span class="n">inspect</span><span class="o">.</span><span class="n">getsource</span><span class="p">(</span><span class="n">CustomerProfile</span><span class="p">)</span>
<span class="go">"class CustomerProfile(models.Model):\n id = models.PositiveSmallIntegerField(\n</span>
<span class="go">primary_key=True,\n verbose_name=_('Name'),\n )\n name = models.CharField(\n</span>
<span class="go">max_length=100,\n )\n created_by = models.ForeignKey(\n User,\n on_delete=models.PROTECT,\n</span>
<span class="go">)\n\n def __str__(self):\n return self.name\n"</span>
</pre></div>
<p>That's pretty exciting. We gave inspect the class and got the source code for that class as text.</p>
<p>Given the source code we could have used some fancy RegExp to parse the code but once again, Python already has us covered.</p>
<h3 id="parsing-the-code"><a class="toclink" href="#parsing-the-code">Parsing the code</a></h3>
<p>Parsing code in Python is done by the <a href="https://docs.python.org/3/library/ast.html" rel="noopener">ast
module</a>:</p>
<blockquote>
<p><em>The ast module helps Python applications to process trees of the Python
abstract syntax grammar.</em></p>
</blockquote>
<p>Great! A tree is much easier to work with than text.</p>
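<p>As a minimal sketch of what "easier to work with" means, the snippet below parses a single field definition and lists the keyword arguments passed to the field. The source string is a made-up one-liner, and nothing here needs Django installed because the code is only parsed, never executed.</p>

```python
import ast

# A single field definition as text. It is parsed, not executed, so the
# `models` name does not need to exist.
source = "name = models.CharField(max_length=100)"

tree = ast.parse(source)
assignment = tree.body[0]  # the Assign node for `name = ...`
call = assignment.value    # the Call node for models.CharField(...)

target = assignment.targets[0].id
keyword_names = [keyword.arg for keyword in call.keywords]

print(target, keyword_names, "verbose_name" in keyword_names)
```

<p>Answering "was <code>verbose_name</code> passed to this field?" becomes a membership test on a list, with no regular expressions involved.</p>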
<p>Let's use ast to parse the source code of our model:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">inspect</span>
<span class="hll"><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">ast</span>
</span><span class="gp">>>> </span><span class="n">model_source</span> <span class="o">=</span> <span class="n">inspect</span><span class="o">.</span><span class="n">getsource</span><span class="p">(</span><span class="n">CustomerProfile</span><span class="p">)</span>
<span class="hll"><span class="gp">>>> </span><span class="n">model_node</span> <span class="o">=</span> <span class="n">ast</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="n">model_source</span><span class="p">)</span>
</span><span class="gp">>>> </span><span class="n">ast</span><span class="o">.</span><span class="n">dump</span><span class="p">(</span><span class="n">model_node</span><span class="p">,</span> <span class="kc">False</span><span class="p">)</span>
<span class="go">Module([</span>
<span class="go">  ClassDef('CustomerProfile',</span>
<span class="go">    [Attribute(Name('models', Load()), 'Model', Load())],</span>
<span class="go">    [],</span>
<span class="go">    [</span>
<span class="go">      Assign(</span>
<span class="go">        [Name('id', Store())],</span>
<span class="go">        Call(</span>
<span class="go">          Attribute(Name('models', Load()), 'PositiveSmallIntegerField', Load()),</span>
<span class="go">          [],</span>
<span class="go">          [</span>
<span class="go">            keyword('primary_key', NameConstant(True)),</span>
<span class="go">            keyword('verbose_name', Call(Name('_', Load()), [Str('Name')], []))</span>
<span class="go">          ]</span>
<span class="go">        )</span>
<span class="go">      ),</span>
<span class="go">      Assign(</span>
<span class="go">        [Name('name', Store())],</span>
<span class="go">        Call(</span>
<span class="go">          Attribute(Name('models', Load()), 'CharField', Load()),</span>
<span class="go">          [],</span>
<span class="go">          [keyword('max_length', Num(100))]</span>
<span class="go">        )</span>
<span class="go">      ),</span>
<span class="go">      Assign(</span>
<span class="go">        [Name('created_by', Store())],</span>
<span class="go">        Call(</span>
<span class="go">          Attribute(Name('models', Load()), 'ForeignKey', Load()),</span>
<span class="go">          [Name('User', Load())],</span>
<span class="go">          [keyword('on_delete', Attribute(Name('models', Load()), 'PROTECT', Load()))]</span>
<span class="go">        )</span>
<span class="go">      ),</span>
<span class="go">      FunctionDef(</span>
<span class="go">        '__str__',</span>
<span class="go">        arguments([arg('self', None)],None,[],[],None,[]),</span>
<span class="go">        [Return(Attribute(Name('self', Load()), 'name', Load()))], [], None</span>
<span class="go">      )</span>
<span class="go">    ],</span>
<span class="go">    []</span>
<span class="go">  )</span>
<span class="go">])</span>
</pre></div>
<p>If we look closely at the dump we can identify that our <strong>model fields are all
Assign nodes</strong>.</p>
<p>Let's zoom in on the "name" field:</p>
<div class="highlight"><pre><span></span>Assign(
  [Name('name', Store())],
  Call(
    Attribute(Name('models', Load()), 'CharField', Load()),
    [],
    [keyword('max_length', Num(100))]
  )
)
</pre></div>
<p>The model field is an assignment of a Call node (CharField) to a Name node ("name"). The Call node has a list of keyword arguments. In this case there is only one, "max_length", with the numeric value 100.</p>
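<p>As a rough illustration of that structure, the sketch below parses a one-line field definition and pulls out the same three pieces: the target name, the field class and the keyword arguments. The source string is a made-up stand-in for the model source, and <code>ast.literal_eval</code> only works here because the keyword value is a plain literal.</p>

```python
import ast

# Hypothetical one-line field definition, mirroring the "name" field above.
source = "name = models.CharField(max_length=100)"

# ast.parse returns a Module; its first statement is the Assign node.
assign = ast.parse(source).body[0]

target = assign.targets[0].id          # the Name node on the left-hand side
field_class = assign.value.func.attr   # the attribute looked up on `models`
keywords = {
    kw.arg: ast.literal_eval(kw.value)  # only safe for literal values
    for kw in assign.value.keywords
}

print(target, field_class, keywords)
```

<p>Nothing is imported or executed from the parsed source, so the undefined name <code>models</code> is not a problem: the code is only inspected as a tree.</p>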
<p>Our id field looks like this:</p>
<div class="highlight"><pre><span></span>Assign(
  [Name('id', Store())],
  Call(
    Attribute(Name('models', Load()), 'PositiveSmallIntegerField', Load()), [], [
      keyword('primary_key', NameConstant(True)),
      keyword('verbose_name', Call(
        Name('_', Load()), [Str('Name')], []
      )
    )
  ])
)
</pre></div>
<p>The id field is also an Assign node with a Name node and a Call node. The id field has two keywords, <code>primary_key</code> and <code>verbose_name</code>; the latter is <strong>the one we are looking for</strong>.</p>
<h3 id="evaluating-a-model-field"><a class="toclink" href="#evaluating-a-model-field">Evaluating a Model Field</a></h3>
<p>To evaluate the fields we first need to identify them. We already saw that <strong>model fields are Assign nodes</strong> but we <strong>can't rely on them being the only Assign nodes</strong> in the class.</p>
<p>The only thing we can rely on is that at the top level of the class the attribute names are unique. Meaning, <strong>if we know there is a field called "name" we can assume the attribute "name" of the class is the field.</strong></p>
<p>Let's join forces with Django model <code>_meta</code> to find the nodes of the model fields:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.core.exceptions</span> <span class="kn">import</span> <span class="n">FieldDoesNotExist</span>

<span class="c1"># `model` is the model class being checked, e.g. CustomerProfile.</span>
<span class="k">for</span> <span class="n">node</span> <span class="ow">in</span> <span class="n">model_node</span><span class="o">.</span><span class="n">body</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">body</span><span class="p">:</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">Assign</span><span class="p">):</span>
        <span class="k">continue</span>
    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">targets</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">1</span><span class="p">:</span>
        <span class="k">continue</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">targets</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">ast</span><span class="o">.</span><span class="n">Name</span><span class="p">):</span>
        <span class="k">continue</span>
    <span class="n">field_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">targets</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">id</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">field</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">_meta</span><span class="o">.</span><span class="n">get_field</span><span class="p">(</span><span class="n">field_name</span><span class="p">)</span>
    <span class="k">except</span> <span class="n">FieldDoesNotExist</span><span class="p">:</span>
        <span class="k">continue</span>
    <span class="c1"># node is field!</span>
</pre></div>
<p>Let's break it down:</p>
<ol>
<li><strong>Model fields are defined at the top level of the class</strong> - we only need to check attributes defined at the top level (no need to "visit" nodes recursively).</li>
<li><strong>Model fields will have a Name target </strong>- the name of the field.</li>
<li>Finally, the field we assign will be <strong>registered in the Django model as a field</strong>.</li>
</ol>
<p>Now we have the field node and we can check if there is a <code>verbose_name</code> attribute defined.</p>
<p>Let's iterate the keywords and search for <code>verbose_name</code>:</p>
<div class="highlight"><pre><span></span><span class="k">for</span> <span class="n">kw</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">value</span><span class="o">.</span><span class="n">keywords</span><span class="p">:</span>
<span class="hll">    <span class="k">if</span> <span class="n">kw</span><span class="o">.</span><span class="n">arg</span> <span class="o">==</span> <span class="s1">'verbose_name'</span><span class="p">:</span>
</span><span class="hll">        <span class="n">verbose_name</span> <span class="o">=</span> <span class="n">kw</span>
</span>        <span class="k">break</span>
<span class="k">else</span><span class="p">:</span>
    <span class="n">verbose_name</span> <span class="o">=</span> <span class="kc">None</span>
</pre></div>
<p>At this point, if <code>verbose_name</code> is None we know that the attribute was not set and we are ready to <strong>issue our first warning!</strong></p>
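<p>To try the detection logic without a Django project, here is a minimal sketch that works on a plain source string. The model source and the <code>field_names</code> set are assumptions made for the example: the set stands in for the <code>_meta.get_field</code> lookup, and <code>any()</code> replaces the <code>for</code>/<code>else</code> loop.</p>

```python
import ast

# Hypothetical model source; in the article this comes from inspect.getsource().
MODEL_SOURCE = """
class CustomerProfile(models.Model):
    id = models.PositiveSmallIntegerField(primary_key=True, verbose_name=_('Name'))
    name = models.CharField(max_length=100)
    default_limit = 10

    def __str__(self):
        return self.name
"""


def fields_missing_verbose_name(source, field_names):
    """Return names of fields whose definition has no verbose_name keyword.

    field_names stands in for Django's model._meta.get_field() lookup.
    """
    class_node = ast.parse(source).body[0]
    missing = []
    for node in class_node.body:
        # Same filters as above: single-target assignments to a plain name.
        if not isinstance(node, ast.Assign) or len(node.targets) != 1:
            continue
        target = node.targets[0]
        if not isinstance(target, ast.Name) or target.id not in field_names:
            continue
        # A field definition is expected to be a call such as models.CharField(...).
        if not isinstance(node.value, ast.Call):
            continue
        if not any(kw.arg == 'verbose_name' for kw in node.value.keywords):
            missing.append(target.id)
    return missing


print(fields_missing_verbose_name(MODEL_SOURCE, {'id', 'name'}))
```

<p>The <code>default_limit</code> attribute is an Assign node too, but it is skipped because it is not one of the known field names.</p>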
<h3 id="issuing-django-checks"><a class="toclink" href="#issuing-django-checks">Issuing Django checks</a></h3>
<p>To issue checks we need to register a function with the check framework:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.core</span> <span class="kn">import</span> <span class="n">checks</span>

<span class="hll"><span class="nd">@checks</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">checks</span><span class="o">.</span><span class="n">Tags</span><span class="o">.</span><span class="n">models</span><span class="p">)</span>
</span><span class="k">def</span> <span class="nf">run_custom_checks</span><span class="p">(</span><span class="n">app_configs</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
    <span class="c1"># implement check logic</span>
    <span class="k">return</span> <span class="p">[]</span>
</pre></div>
<p>Inside the function we implement the check logic and return a list of checks.</p>
<p>We want to warn the developer that a field is missing a <code>verbose_name</code> attribute, so once we find a field that has no <code>verbose_name</code> we create a <code>CheckMessage</code> of type <code>Warning</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.core</span> <span class="kn">import</span> <span class="n">checks</span>
<span class="kn">from</span> <span class="nn">django.core.checks</span> <span class="kn">import</span> <span class="ne">Warning</span>

<span class="nd">@checks</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">checks</span><span class="o">.</span><span class="n">Tags</span><span class="o">.</span><span class="n">models</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">run_custom_checks</span><span class="p">(</span><span class="n">app_configs</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
    <span class="c1"># inspect and parse models...</span>
    <span class="k">return</span> <span class="p">[(</span>
<span class="hll">        <span class="ne">Warning</span><span class="p">(</span>
</span><span class="hll">            <span class="s1">'Field has no verbose name'</span><span class="p">,</span>
</span><span class="hll">            <span class="n">hint</span><span class="o">=</span><span class="s1">'Set verbose name on field </span><span class="si">{}</span><span class="s1">.'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">field</span><span class="o">.</span><span class="n">name</span><span class="p">),</span>
</span><span class="hll">            <span class="n">obj</span><span class="o">=</span><span class="n">field</span><span class="p">,</span>
</span><span class="hll">            <span class="nb">id</span><span class="o">=</span><span class="s1">'H001'</span><span class="p">,</span>
</span>        <span class="p">)</span>
    <span class="p">)]</span>
</pre></div>
<p>I assigned the code <code>H00X</code> to my warnings (guess why…). For each warning we can also add a hint to inform the developer on how to address the issue raised by the warning.</p>
<h3 id="putting-it-all-together"><a class="toclink" href="#putting-it-all-together">Putting it all together</a></h3>
<p>To recap what we did so far:</p>
<ol>
<li>Get the source code for a model using inspect.</li>
<li>Parse the model source code using ast and identify the field nodes.</li>
<li>Examine a field node and check if <code>verbose_name</code> is defined.</li>
<li>Register a function with the check framework and issue a Warning.</li>
</ol>
<p>The skeleton of a function that checks a single model:</p>
<div class="highlight"><pre><span></span><span class="c1"># common/checks.py</span>
<span class="kn">import</span> <span class="nn">ast</span>
<span class="kn">import</span> <span class="nn">inspect</span>
<span class="kn">from</span> <span class="nn">django.core.checks</span> <span class="kn">import</span> <span class="ne">Warning</span>

<span class="k">def</span> <span class="nf">check_model</span><span class="p">(</span><span class="n">model</span><span class="p">):</span>
<span class="w">    </span><span class="sd">"""Check a single model.</span>

<span class="sd">    Yields (django.checks.CheckMessage)</span>
<span class="sd">    """</span>
    <span class="n">model_source</span> <span class="o">=</span> <span class="n">inspect</span><span class="o">.</span><span class="n">getsource</span><span class="p">(</span><span class="n">model</span><span class="p">)</span>
    <span class="n">model_node</span> <span class="o">=</span> <span class="n">ast</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="n">model_source</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">node</span> <span class="ow">in</span> <span class="n">model_node</span><span class="o">.</span><span class="n">body</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">body</span><span class="p">:</span>
        <span class="c1"># Check if node is a model field.</span>
        <span class="c1"># Check if field has verbose name defined</span>
        <span class="k">yield</span> <span class="ne">Warning</span><span class="p">(</span>
            <span class="s1">'Field has no verbose name'</span><span class="p">,</span>
            <span class="n">hint</span><span class="o">=</span><span class="s1">'Set verbose name on field </span><span class="si">{}</span><span class="s1">.'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">field</span><span class="o">.</span><span class="n">name</span><span class="p">),</span>
            <span class="n">obj</span><span class="o">=</span><span class="n">field</span><span class="p">,</span>
            <span class="nb">id</span><span class="o">=</span><span class="s1">'H001'</span><span class="p">,</span>
        <span class="p">)</span>
</pre></div>
<p>The next step is to implement a single function to <strong>iterate over all models</strong>, run our checks and register it with the Django check framework:</p>
<div class="highlight"><pre><span></span><span class="c1"># common/checks.py</span>
<span class="kn">import</span> <span class="nn">django.apps</span>
<span class="kn">from</span> <span class="nn">django.core</span> <span class="kn">import</span> <span class="n">checks</span>

<span class="nd">@checks</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">checks</span><span class="o">.</span><span class="n">Tags</span><span class="o">.</span><span class="n">models</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">check_models</span><span class="p">(</span><span class="n">app_configs</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
    <span class="n">errors</span> <span class="o">=</span> <span class="p">[]</span>
<span class="hll">    <span class="k">for</span> <span class="n">app</span> <span class="ow">in</span> <span class="n">django</span><span class="o">.</span><span class="n">apps</span><span class="o">.</span><span class="n">apps</span><span class="o">.</span><span class="n">get_app_configs</span><span class="p">():</span>
</span>
        <span class="c1"># Skip third party apps.</span>
        <span class="k">if</span> <span class="n">app</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s1">'site-packages'</span><span class="p">)</span> <span class="o">></span> <span class="o">-</span><span class="mi">1</span><span class="p">:</span>
            <span class="k">continue</span>
        <span class="k">for</span> <span class="n">model</span> <span class="ow">in</span> <span class="n">app</span><span class="o">.</span><span class="n">get_models</span><span class="p">():</span>
<span class="hll">            <span class="k">for</span> <span class="n">check_message</span> <span class="ow">in</span> <span class="n">check_model</span><span class="p">(</span><span class="n">model</span><span class="p">):</span>
</span><span class="hll">                <span class="n">errors</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">check_message</span><span class="p">)</span>
</span>
    <span class="k">return</span> <span class="n">errors</span>
</pre></div>
<p>We use a little trick to skip models from third party apps. We assume that when
installing third party apps using <code>pip install</code> they are installed in a
directory called "site-packages".</p>
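<p>A small sketch of that test as a standalone helper. The function name and the example paths are made up for illustration. Unlike the substring search above, it compares whole path components, so a project directory that merely contains the text "site-packages" in its name is not skipped. Installs that land elsewhere (for example Debian's <code>dist-packages</code>) would need their own rule.</p>

```python
from pathlib import PurePosixPath


def is_third_party(app_path):
    """Guess whether an app lives in an installed package directory."""
    # Compare whole path components rather than searching for a substring.
    return 'site-packages' in PurePosixPath(app_path).parts


# Hypothetical paths, for illustration only.
print(is_third_party('/venv/lib/python3.11/site-packages/rest_framework'))  # True
print(is_third_party('/srv/project/customers'))  # False
```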
<p>The only thing left to do is to import this file somewhere in the code and
that's it.</p>
<div class="highlight"><pre><span></span><span class="c1"># app/__init__.py</span>
<span class="hll"><span class="kn">from</span> <span class="nn">common.checks</span> <span class="kn">import</span> <span class="o">*</span> <span class="c1"># noqa</span>
</span></pre></div>
<p>Let's see our new check in action:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./manage.py<span class="w"> </span>check
SystemCheckError:<span class="w"> </span>System<span class="w"> </span>check<span class="w"> </span>identified<span class="w"> </span>some<span class="w"> </span>issues:
<span class="hll">WARNINGS:
</span><span class="hll">app.CustomerProfile.name:<span class="w"> </span><span class="o">(</span>H001<span class="o">)</span><span class="w"> </span>Field<span class="w"> </span>has<span class="w"> </span>no<span class="w"> </span>verbose<span class="w"> </span>name
</span><span class="hll">HINT:<span class="w"> </span>Set<span class="w"> </span>verbose<span class="w"> </span>name<span class="w"> </span>on<span class="w"> </span>the<span class="w"> </span>field<span class="w"> </span><span class="s2">"name"</span>.
</span>
System<span class="w"> </span>check<span class="w"> </span>identified<span class="w"> </span><span class="m">1</span><span class="w"> </span>issues<span class="w"> </span><span class="o">(</span><span class="m">0</span><span class="w"> </span>silenced<span class="o">)</span>.
</pre></div>
<p>Exactly what we wanted!</p>
<hr>
<h2 id="custom-checks-in-the-real-world"><a class="toclink" href="#custom-checks-in-the-real-world">Custom Checks in the Real World</a></h2>
<p>To give a sense of what you can do with Django checks, these are the checks we
use in our code base:</p>
<ul>
<li>
<p><strong>H001: Field has no verbose name.</strong>
This is the example we just saw.</p>
</li>
<li>
<p><strong>H002: Verbose name should use gettext.</strong>
Make sure verbose_name is always in the form of <code>verbose_name=_('text')</code>. If the value is not using gettext it will not be translated.</p>
</li>
<li>
<p><strong>H003: Words in verbose name must be all upper case or all lower case.</strong>
We decided to use only lower case in verbose names. Using lower case texts we were able to reuse more translations. One exception to the rule is acronyms such as API and ETL. The general rule we ended up with is making sure all words are either all lower or all upper case. For example, "etl run" is valid, "ETL run" is also valid, "Etl Run" is not valid.</p>
</li>
<li>
<p><strong>H004: Help text should use gettext.</strong>
Help text is displayed to the user in admin forms and detail views so it should use gettext and be translated.</p>
</li>
<li>
<p><strong>H005: Model must define class Meta.</strong>
The translation of the model name is defined in the model Meta class so every model must have a class Meta.</p>
</li>
<li>
<p><strong>H006: Model has no verbose name.</strong>
Model verbose names are defined in the class Meta and are displayed to the user in the admin so they should be translated.</p>
</li>
<li>
<p><strong>H007: Model has no verbose name plural.</strong>
Plural model names are used in the admin and are displayed to the user so they should be translated.</p>
</li>
<li>
<p><strong>H008: Must set db_index explicitly on a ForeignKey field.</strong>
This must be the most useful check we defined. This check forces the developer to explicitly set db_index on every ForeignKey field. I wrote in the past about <a href="/9-django-tips-for-working-with-databases#fk-indexes">how a database index is created implicitly for every foreign key field</a>. By making sure developers are aware of that and making them decide whether an index is required, you are left with <strong>only the indexes you really need!</strong></p>
</li>
</ul>
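<p>To make one of these rules concrete, here is a rough sketch of the casing test behind H003. The helper name is hypothetical and this is not the code from our checks, only the rule as described above. Note that in this sketch a word made only of digits or punctuation counts as invalid, because it is neither lower nor upper case.</p>

```python
def has_consistent_case(verbose_name):
    """Return True if every word is all lower case or all upper case."""
    return all(
        word.islower() or word.isupper()
        for word in verbose_name.split()
    )


print(has_consistent_case('etl run'))  # True
print(has_consistent_case('ETL run'))  # True
print(has_consistent_case('Etl Run'))  # False
```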
<p>This is it, <strong>go piss off some colleagues!</strong></p>
<div class="admonition info">
<p class="admonition-title">source code</p>
<p>The complete source code for the checks above can be found in <a href="https://gist.github.com/hakib/e2e50d41d19a6984dc63bd94580c8647" rel="noopener">this gist</a>.</p>
</div>9 Django Tips for Working with Databases2018-01-29T00:00:00+02:002018-01-29T00:00:00+02:00Haki Benitatag:hakibenita.com,2018-01-29:/9-django-tips-for-working-with-databases<p>ORMs offer great utility for developers but abstracting access to the database has its costs. Developers who are willing to poke around the database and change some defaults often find that great improvements can be made.</p><hr>
<h3 id="aggregation-with-filter"><a class="toclink" href="#aggregation-with-filter">Aggregation with Filter</a></h3>
<p>Prior to Django 2.0 if we wanted to get something like the total number of users and the total number of active users we had to resort to <a href="https://docs.djangoproject.com/en/2.0/ref/models/conditional-expressions/" rel="noopener">conditional expressions</a>:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.contrib.auth.models</span> <span class="kn">import</span> <span class="n">User</span>
<span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="p">(</span>
<span class="n">Count</span><span class="p">,</span>
<span class="n">Sum</span><span class="p">,</span>
<span class="n">Case</span><span class="p">,</span>
<span class="n">When</span><span class="p">,</span>
<span class="n">Value</span><span class="p">,</span>
<span class="n">IntegerField</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">User</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span>
<span class="n">total_users</span><span class="o">=</span><span class="n">Count</span><span class="p">(</span><span class="s1">'id'</span><span class="p">),</span>
<span class="n">total_active_users</span><span class="o">=</span><span class="n">Sum</span><span class="p">(</span><span class="n">Case</span><span class="p">(</span>
<span class="hll"> <span class="n">When</span><span class="p">(</span><span class="n">is_active</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">then</span><span class="o">=</span><span class="n">Value</span><span class="p">(</span><span class="mi">1</span><span class="p">)),</span>
</span><span class="hll"> <span class="n">default</span><span class="o">=</span><span class="n">Value</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span>
</span><span class="hll"> <span class="n">output_field</span><span class="o">=</span><span class="n">IntegerField</span><span class="p">(),</span>
</span> <span class="p">)),</span>
<span class="p">)</span>
</pre></div>
<p>In Django 2.0 a <code>filter</code> <a href="https://docs.djangoproject.com/en/2.0/ref/models/querysets/#id6" rel="noopener">argument to aggregate functions</a> was added to make this a lot easier:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.contrib.auth.models</span> <span class="kn">import</span> <span class="n">User</span>
<span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Count</span><span class="p">,</span> <span class="n">F</span>
<span class="n">User</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span>
<span class="n">total_users</span><span class="o">=</span><span class="n">Count</span><span class="p">(</span><span class="s1">'id'</span><span class="p">),</span>
<span class="hll"> <span class="n">total_active_users</span><span class="o">=</span><span class="n">Count</span><span class="p">(</span><span class="s1">'id'</span><span class="p">,</span> <span class="nb">filter</span><span class="o">=</span><span class="n">F</span><span class="p">(</span><span class="s1">'is_active'</span><span class="p">)),</span>
</span><span class="p">)</span>
</pre></div>
<p>Nice, short and sweet.</p>
<p>If you are using PostgreSQL, the two queries will look like this:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w">    </span><span class="k">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">total_users</span><span class="p">,</span>
<span class="hll"><span class="w">    </span><span class="k">SUM</span><span class="p">(</span><span class="k">CASE</span><span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="n">is_active</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">ELSE</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="k">END</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">total_active_users</span>
</span><span class="k">FROM</span>
<span class="w">    </span><span class="n">auth_user</span><span class="p">;</span>

<span class="k">SELECT</span>
<span class="w">    </span><span class="k">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">total_users</span><span class="p">,</span>
<span class="hll"><span class="w">    </span><span class="k">COUNT</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="n">FILTER</span><span class="w"> </span><span class="p">(</span><span class="k">WHERE</span><span class="w"> </span><span class="n">is_active</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">total_active_users</span>
</span><span class="k">FROM</span>
<span class="w">    </span><span class="n">auth_user</span><span class="p">;</span>
</pre></div>
<p>The second query uses the <code>FILTER (WHERE …)</code> clause.</p>
<hr>
<h3 id="queryset-results-as-namedtuple"><a class="toclink" href="#queryset-results-as-namedtuple">QuerySet Results as <code>namedtuple</code></a></h3>
<p><a href="/working-with-apis-the-pythonic-way">I'm a big fan of namedtuples</a> and apparently, starting with Django 2.0, so is the ORM.</p>
<p>In Django 2.0 a <a href="https://docs.djangoproject.com/en/2.0/ref/models/querysets/#django.db.models.query.QuerySet.values_list" rel="noopener">new argument was added to values_list called</a> <code>named</code>. Setting <code>named=True</code> returns each row of the queryset as a namedtuple:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">User</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">values_list</span><span class="p">(</span>
<span class="go"> 'first_name',</span>
<span class="go"> 'last_name',</span>
<span class="go">)[0]</span>
<span class="go">('Haki', 'Benita')</span>
<span class="gp">>>> </span><span class="n">user_names</span> <span class="o">=</span> <span class="n">User</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">values_list</span><span class="p">(</span>
<span class="go"> 'first_name',</span>
<span class="go"> 'last_name',</span>
<span class="hll"><span class="go"> named=True,</span>
</span><span class="go">)</span>
<span class="gp">>>> </span><span class="n">user_names</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="hll"><span class="go">Row(first_name='Haki', last_name='Benita')</span>
</span>
<span class="hll"><span class="gp">>>> </span><span class="n">user_names</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">first_name</span>
</span><span class="go">'Haki'</span>
<span class="gp">>>> </span><span class="n">user_names</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">last_name</span>
<span class="go">'Benita'</span>
</pre></div>
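<p>For readers who have not used namedtuples before, the sketch below shows how such a row behaves, using only the standard library. The <code>Row</code> class here is built by hand as a stand-in for what the ORM returns; it is not Django's own implementation.</p>

```python
from collections import namedtuple

# Hypothetical stand-in for one row returned with named=True.
Row = namedtuple('Row', ['first_name', 'last_name'])
row = Row('Haki', 'Benita')

# A namedtuple is still a tuple, so indexing and unpacking keep working...
first_name, last_name = row
print(row[0], last_name)

# ...and it can be converted to a dict when needed.
print(row._asdict())
```

<p>Because the row is still a tuple, code written against the plain <code>values_list</code> output should keep working after switching to <code>named=True</code>.</p>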
<hr>
<h3 id="custom-functions"><a class="toclink" href="#custom-functions">Custom Functions</a></h3>
<p>Django ORM is very powerful and feature-rich but it can't possibly keep up with all database vendors. Luckily the ORM lets us extend it with custom functions.</p>
<p>Say we have a Report model with a duration field. We want to find the average duration of all reports:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Avg</span>
<span class="gp">>>> </span><span class="n">Report</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span><span class="n">avg_duration</span><span class="o">=</span><span class="n">Avg</span><span class="p">(</span><span class="s1">'duration'</span><span class="p">))</span>
<span class="go">{'avg_duration': datetime.timedelta(0, 0, 55432)}</span>
</pre></div>
<p>That's great, but average alone tells us very little. Let's try to fetch the standard deviation as well:</p>
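<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Avg</span><span class="p">,</span> <span class="n">StdDev</span>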
<span class="gp">>>> </span><span class="n">Report</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span>
<span class="gp">... </span> <span class="n">avg_duration</span><span class="o">=</span><span class="n">Avg</span><span class="p">(</span><span class="s1">'duration'</span><span class="p">),</span>
<span class="hll"><span class="gp">... </span> <span class="n">std_duration</span><span class="o">=</span><span class="n">StdDev</span><span class="p">(</span><span class="s1">'duration'</span><span class="p">),</span>
</span><span class="gp">... </span><span class="p">)</span>
<span class="go">ProgrammingError: function stddev_pop(interval) does not exist</span>
<span class="go">LINE 1: SELECT STDDEV_POP("report"."duration") AS "std_dura...</span>
<span class="go"> ^</span>
<span class="go">HINT: No function matches the given name and argument types.</span>
<span class="hll"><span class="go">You might need to add explicit type casts.</span>
</span></pre></div>
<p>Oops… PostgreSQL does not support stddev on an interval field - we need to convert the interval to a number before we can apply <code>STDDEV_POP</code> to it.</p>
<p>One option is extracting epoch from the duration:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">AVG</span><span class="p">(</span><span class="n">duration</span><span class="p">),</span>
<span class="hll"><span class="w"> </span><span class="n">STDDEV_POP</span><span class="p">(</span><span class="k">EXTRACT</span><span class="p">(</span><span class="n">EPOCH</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">duration</span><span class="p">))</span>
</span><span class="k">FROM</span>
<span class="w"> </span><span class="n">report</span><span class="p">;</span>
<span class="go"> avg | stddev_pop</span>
<span class="go">------------------+------------------</span>
<span class="go"> 00:00:00.55432 | 1.06310113695549</span>
<span class="go">(1 row)</span>
</pre></div>
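<p>The same idea can be sketched in plain Python, outside the database. This is only an illustration: the sample durations and the helper name <code>population_stddev_seconds</code> are made up, and <code>statistics.pstdev</code> stands in for <code>STDDEV_POP</code> because both compute the population standard deviation. The point is that the intervals are converted to plain seconds before the aggregate is applied:</p>

```python
import statistics
from datetime import timedelta

# Hypothetical sample durations, made up for illustration.
durations = [
    timedelta(seconds=1),
    timedelta(seconds=2),
    timedelta(seconds=3),
]


def population_stddev_seconds(values):
    """Population standard deviation of timedeltas, expressed in seconds."""
    # Convert each interval to a plain number first,
    # similar in spirit to EXTRACT(EPOCH FROM duration).
    seconds = [value.total_seconds() for value in values]
    return statistics.pstdev(seconds)


std_seconds = population_stddev_seconds(durations)
```

<p>The result is a float in seconds, just like the <code>stddev_pop</code> column in the query above.</p>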
<p>So how can we implement this in Django? You guessed it - a <a href="https://docs.djangoproject.com/en/2.0/ref/models/expressions/#func-expressions" rel="noopener">custom
function</a>:</p>
<div class="highlight"><pre><span></span><span class="c1"># common/db.py</span>
<span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Func</span>

<span class="k">class</span> <span class="nc">Epoch</span><span class="p">(</span><span class="n">Func</span><span class="p">):</span>
    <span class="n">function</span> <span class="o">=</span> <span class="s1">'EXTRACT'</span>
    <span class="n">template</span> <span class="o">=</span> <span class="s2">"</span><span class="si">%(function)s</span><span class="s2">('epoch' from </span><span class="si">%(expressions)s</span><span class="s2">)"</span>
</pre></div>
<p>And use our new function like this:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Avg</span><span class="p">,</span> <span class="n">StdDev</span><span class="p">,</span> <span class="n">F</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">common.db</span> <span class="kn">import</span> <span class="n">Epoch</span>
<span class="gp">>>> </span><span class="n">Report</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span>
<span class="gp">... </span> <span class="n">avg_duration</span><span class="o">=</span><span class="n">Avg</span><span class="p">(</span><span class="s1">'duration'</span><span class="p">),</span>
<span class="gp">... </span> <span class="n">std_duration</span><span class="o">=</span><span class="n">StdDev</span><span class="p">(</span><span class="n">Epoch</span><span class="p">(</span><span class="n">F</span><span class="p">(</span><span class="s1">'duration'</span><span class="p">))),</span>
<span class="gp">... </span><span class="p">)</span>
<span class="go">{'avg_duration': datetime.timedelta(0, 0, 554320),</span>
<span class="go">'std_duration': 1.06310113695549}</span>
</pre></div>
<p>Notice the use of the <a href="https://docs.djangoproject.com/en/2.0/ref/models/expressions/#f-expressions" rel="noopener">F expression</a> in the call to Epoch.</p>
<hr>
<h3 id="statement-timeout"><a class="toclink" href="#statement-timeout">Statement Timeout</a></h3>
<p>This is probably the easiest and most important tip I can give. We are all humans and we make mistakes. We can't possibly handle each and every edge case so <strong>we must set boundaries</strong>.</p>
<p>Unlike non-blocking app servers such as Tornado, asyncio or even Node, Django usually uses synchronous worker processes. This means that <strong>when a user executes a long running operation, the worker process is blocked and no one else can use it until it is done</strong>.</p>
<p>I'm sure no one is really running Django in production with just one worker process but we still want to make sure a single query is not hogging too many resources for too long.</p>
<p>In most Django apps the majority of time is spent waiting for database queries. So, <a href="https://www.postgresql.org/docs/9.6/static/runtime-config-client.html#GUC-STATEMENT-TIMEOUT" rel="noopener">setting a timeout on SQL queries</a> <strong>is a good place to start.</strong></p>
<p>We like setting a global timeout in our <code>wsgi.py</code> file like this:</p>
<div class="highlight"><pre><span></span><span class="c1"># wsgi.py</span>
<span class="kn">from</span> <span class="nn">django.db.backends.signals</span> <span class="kn">import</span> <span class="n">connection_created</span>
<span class="kn">from</span> <span class="nn">django.dispatch</span> <span class="kn">import</span> <span class="n">receiver</span>

<span class="nd">@receiver</span><span class="p">(</span><span class="n">connection_created</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">setup_postgres</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">connection</span><span class="o">.</span><span class="n">vendor</span> <span class="o">!=</span> <span class="s1">'postgresql'</span><span class="p">:</span>
        <span class="k">return</span>

    <span class="c1"># Timeout statements after 30 seconds.</span>
    <span class="k">with</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span> <span class="k">as</span> <span class="n">cursor</span><span class="p">:</span>
<span class="hll">        <span class="n">cursor</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s2">"SET statement_timeout TO 30000;"</span><span class="p">)</span>
</span></pre></div>
<p><strong>Why wsgi.py?</strong> This way it only affects worker processes and not out-of-band analytic queries, cron tasks, etc.</p>
<p>Hopefully, you are using <a href="https://docs.djangoproject.com/en/2.0/ref/databases/#persistent-connections" rel="noopener">persistent database connections</a>, so this per-connection setup should not add overhead to each request.</p>
<p>The timeout can also be set <a href="https://www.postgresql.org/docs/9.5/static/sql-alteruser.html" rel="noopener">at the user level</a>:</p>
<div class="highlight"><pre><span></span><span class="gp">db=#</span><span class="w"> </span><span class="k">alter</span><span class="w"> </span><span class="k">user</span><span class="w"> </span><span class="n">app_user</span><span class="w"> </span><span class="k">set</span><span class="w"> </span><span class="n">statement_timeout</span><span class="w"> </span><span class="k">TO</span><span class="w"> </span><span class="mf">30000</span><span class="p">;</span>
<span class="go">ALTER ROLE</span>
</pre></div>
<p><em>SIDE NOTE</em>: The other common place where we spend a lot of time is networking. So when you call a remote service, make sure to <a href="http://docs.python-requests.org/en/master/user/quickstart/#timeouts" rel="noopener">always set a timeout</a>:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">requests</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span>
    <span class="s1">'https://api.slow-as-hell.com'</span><span class="p">,</span>
<span class="hll">    <span class="n">timeout</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span>  <span class="c1"># requests expects seconds, not milliseconds</span>
</span><span class="p">)</span>
</pre></div>
<hr>
<h3 id="limit"><a class="toclink" href="#limit">LIMIT</a></h3>
<p>This is somewhat related to the last point about setting boundaries. Sometimes we want to let users produce reports and maybe export them to a spreadsheet. These types of views are usually the immediate suspects for any weird behavior in production.</p>
<p>It's not uncommon to encounter a user that thinks it's reasonable to export all sales since the dawn of time in the middle of the work day. It's also not uncommon for this same user to open another tab and try again when the first attempt "got stuck".</p>
<p><strong>This is where LIMIT comes in.</strong></p>
<p>Let's limit a certain query to no more than 100 rows:</p>
<div class="highlight"><pre><span></span><span class="go"># bad example</span>
<span class="gp">>>> </span><span class="n">data</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">Sale</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">())[:</span><span class="mi">100</span><span class="p">]</span>
</pre></div>
<p>This is the worst thing you can do. You just fetched all gazillion rows into memory just to return the first 100.</p>
<p>Let's try again:</p>
<div class="highlight"><pre><span></span><span class="go">data = Sale.objects.all()[:100]</span>
</pre></div>
<p>This is better. Django will use the limit clause in the SQL to fetch only 100 rows.</p>
<p>Now let's say we added the limit, the users are under control and all is good. We still have one problem - the user asked for all the sales and we gave them 100. The <strong>user now thinks there are only 100 sales - this is wrong</strong>.</p>
<p>Instead of blindly returning the first 100 rows, let's make sure that if there are more than 100 rows (normally after filtering) we throw an exception:</p>
<div class="highlight"><pre><span></span><span class="n">LIMIT</span> <span class="o">=</span> <span class="mi">100</span>
<span class="k">if</span> <span class="n">Sale</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">count</span><span class="p">()</span> <span class="o">></span> <span class="n">LIMIT</span><span class="p">:</span>
    <span class="k">raise</span> <span class="n">ExceededLimit</span><span class="p">(</span><span class="n">LIMIT</span><span class="p">)</span>
<span class="k">return</span> <span class="n">Sale</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">()[:</span><span class="n">LIMIT</span><span class="p">]</span>
</pre></div>
<p>This will work but we just added another query.</p>
<p>Can we do better? I think we can:</p>
<div class="highlight"><pre><span></span><span class="n">LIMIT</span> <span class="o">=</span> <span class="mi">100</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">Sale</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">()[:(</span><span class="n">LIMIT</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)]</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="o">></span> <span class="n">LIMIT</span><span class="p">:</span>
    <span class="k">raise</span> <span class="n">ExceededLimit</span><span class="p">(</span><span class="n">LIMIT</span><span class="p">)</span>
<span class="k">return</span> <span class="n">data</span>
</pre></div>
<p>Instead of fetching 100 rows, we fetch 100 + 1 = 101 rows. <strong>If the 101st row exists, that is enough for us to know there are more than 100 rows</strong>. Or in other words, fetching LIMIT + 1 rows is the least we need to make sure there are no more than LIMIT rows in the query result.</p>
<p>Remember the LIMIT + 1 trick, it can come in pretty handy at times.</p>
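<p>The trick is not tied to querysets. Here is a minimal sketch of the same idea for any iterable of rows, using only the standard library. The helper name <code>fetch_at_most</code> is hypothetical, and <code>ExceededLimit</code> is defined here only so the example is self-contained:</p>

```python
from itertools import islice


class ExceededLimit(Exception):
    """Raised when a result has more rows than the allowed limit."""


def fetch_at_most(rows, limit):
    """Return all rows if there are at most `limit` of them, otherwise raise."""
    # Pull one row more than the limit. The extra row is never returned,
    # it only tells us that the result is too large.
    data = list(islice(rows, limit + 1))
    if len(data) > limit:
        raise ExceededLimit(limit)
    return data
```

<p>Because <code>islice</code> stops after LIMIT + 1 items, a huge (or even endless) source is never read in full.</p>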
<hr>
<h3 id="select-for-update-of"><a class="toclink" href="#select-for-update-of">Select for update … of</a></h3>
<p>This one we learned the hard way. We started getting errors in the middle of the night about transactions timing out due to locks in the database.</p>
<p>A common <a href="/bullet-proofing-django-models">pattern for manipulating a transaction</a> in our code would look like this:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">transaction</span> <span class="k">as</span> <span class="n">db_transaction</span>
<span class="c1"># ...</span>
<span class="k">with</span> <span class="n">db_transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
    <span class="n">transaction</span> <span class="o">=</span> <span class="p">(</span>
        <span class="n">Transaction</span><span class="o">.</span><span class="n">objects</span>
<span class="hll">        <span class="o">.</span><span class="n">select_related</span><span class="p">(</span>
</span>            <span class="s1">'user'</span><span class="p">,</span>
            <span class="s1">'product'</span><span class="p">,</span>
            <span class="s1">'product__category'</span><span class="p">,</span>
        <span class="p">)</span>
<span class="hll">        <span class="o">.</span><span class="n">select_for_update</span><span class="p">()</span>
</span>        <span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">uid</span><span class="o">=</span><span class="n">uid</span><span class="p">)</span>
    <span class="p">)</span>
    <span class="c1"># ...</span>
</pre></div>
<p>Manipulating the transaction usually involves some properties from the user and the product so we often use select_related to force a join and save some queries.</p>
<p>Updating the transaction also involves obtaining a lock to make sure it's not being manipulated by anyone else.</p>
<p>Now, <strong>do you see the problem?</strong> NO? Neither did we.</p>
<p>We had some ETL processes running at night performing maintenance on the product and user tables. These ETLs performed updates and inserts to the tables so they also obtained locks on rows in those tables.</p>
<p>So what was the problem? <strong>When select_for_update is used along with select_related, Django will attempt to obtain a lock on the selected rows in all the tables in the query.</strong></p>
<p>The code we used to fetch the transaction tried to obtain a lock on rows in the transaction table and also in the user, product and category tables. Once the ETL locked rows in the last three tables in the middle of the night, transactions started to fail.</p>
<p>Once we had a better understanding of the problem we started looking for ways to lock only the necessary table - the transaction table. Luckily, <a href="https://docs.djangoproject.com/en/2.0/releases/2.0/#database-backend-api" rel="noopener">a new option to select_for_update just became available in Django 2.0</a>:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">transaction</span> <span class="k">as</span> <span class="n">db_transaction</span>
<span class="c1"># ...</span>
<span class="k">with</span> <span class="n">db_transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
    <span class="n">transaction</span> <span class="o">=</span> <span class="p">(</span>
        <span class="n">Transaction</span><span class="o">.</span><span class="n">objects</span>
        <span class="o">.</span><span class="n">select_related</span><span class="p">(</span>
            <span class="s1">'user'</span><span class="p">,</span>
            <span class="s1">'product'</span><span class="p">,</span>
            <span class="s1">'product__category'</span><span class="p">,</span>
        <span class="p">)</span>
<span class="hll">        <span class="o">.</span><span class="n">select_for_update</span><span class="p">(</span>
</span><span class="hll">            <span class="n">of</span><span class="o">=</span><span class="p">(</span><span class="s1">'self'</span><span class="p">,)</span>
</span><span class="hll">        <span class="p">)</span>
</span>        <span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">uid</span><span class="o">=</span><span class="n">uid</span><span class="p">)</span>
    <span class="p">)</span>
    <span class="c1"># ...</span>
</pre></div>
<p>The <code>of</code> option was added to <a href="https://docs.djangoproject.com/en/2.0/ref/models/querysets/#django.db.models.query.QuerySet.select_for_update" rel="noopener">select_for_update</a>. Using <code>of</code> we can explicitly state which tables we want to lock. <code>self</code> is a special keyword indicating we want to lock the model we are working on, in this case, the Transaction.</p>
<p>Currently, this feature is only available for the PostgreSQL and Oracle backends.</p>
<hr>
<h3 id="fk-indexes"><a class="toclink" href="#fk-indexes">FK Indexes</a></h3>
<p>When creating a model, Django will automatically create a B-Tree index on any foreign key. B-Tree indexes can get pretty heavy and sometimes they are not really necessary.</p>
<p>A classic example is a through model for an M2M relation:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Membership</span><span class="p">(</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">group</span> <span class="o">=</span> <span class="n">ForeignKey</span><span class="p">(</span><span class="n">Group</span><span class="p">)</span>
    <span class="n">user</span> <span class="o">=</span> <span class="n">ForeignKey</span><span class="p">(</span><span class="n">User</span><span class="p">)</span>
</pre></div>
<p>In the model above <strong>Django will implicitly create two indexes</strong> - one for user and one for group.</p>
<p>Another common pattern in M2M models is adding a unique constraint on the two fields. In our case it means that a user can only be a member of the same group once:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Membership</span><span class="p">(</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">group</span> <span class="o">=</span> <span class="n">ForeignKey</span><span class="p">(</span><span class="n">Group</span><span class="p">)</span>
    <span class="n">user</span> <span class="o">=</span> <span class="n">ForeignKey</span><span class="p">(</span><span class="n">User</span><span class="p">)</span>

    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
        <span class="n">unique_together</span> <span class="o">=</span> <span class="p">(</span>
<span class="hll">            <span class="p">(</span><span class="s1">'group'</span><span class="p">,</span> <span class="s1">'user'</span><span class="p">,)</span>
</span>        <span class="p">)</span>
</pre></div>
<p>The unique_together will also create an index on both fields. So <strong>we get one model with two fields and three indexes</strong>.</p>
<p>Depending on the work we do with this model, many times we can dismiss the FK indexes and keep only the one created by the unique constraint:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Membership</span><span class="p">(</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">group</span> <span class="o">=</span> <span class="n">ForeignKey</span><span class="p">(</span>
        <span class="n">Group</span><span class="p">,</span>
<span class="hll">        <span class="n">db_index</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
</span>    <span class="p">)</span>
    <span class="n">user</span> <span class="o">=</span> <span class="n">ForeignKey</span><span class="p">(</span>
        <span class="n">User</span><span class="p">,</span>
<span class="hll">        <span class="n">db_index</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
</span>    <span class="p">)</span>

    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
        <span class="n">unique_together</span> <span class="o">=</span> <span class="p">(</span>
            <span class="p">(</span><span class="s1">'group'</span><span class="p">,</span> <span class="s1">'user'</span><span class="p">,)</span>
        <span class="p">)</span>
</pre></div>
<p>Removing redundant indexes will make inserts and updates faster. Plus, our database is now lighter, which is always a good thing.</p>
<hr>
<h3 id="order-of-columns-in-composite-index"><a class="toclink" href="#order-of-columns-in-composite-index">Order of columns in composite index</a></h3>
<p>Indexes with more than one column are called <strong>composite indexes</strong>. In B-Tree composite indexes the first column is indexed using a tree structure. From the leaves of the first level a new tree is created for the second level and so on.</p>
<p><strong>The order of the columns in the index is significant</strong>.</p>
<p>In the example above we would get a tree for groups first, and for each group another tree for all its users.</p>
<p>The rule of thumb for B-Tree composite indexes is to <strong>make the secondary indexes as small as possible</strong>. In other words, columns with high cardinality (more distinct values) should come first.</p>
<p>In our example it's reasonable to assume there are more users than groups, so putting the user column first will make the secondary index on group smaller.</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Membership</span><span class="p">(</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">group</span> <span class="o">=</span> <span class="n">ForeignKey</span><span class="p">(</span>
        <span class="n">Group</span><span class="p">,</span>
        <span class="n">db_index</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="n">user</span> <span class="o">=</span> <span class="n">ForeignKey</span><span class="p">(</span>
        <span class="n">User</span><span class="p">,</span>
        <span class="n">db_index</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
    <span class="p">)</span>

    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
        <span class="n">unique_together</span> <span class="o">=</span> <span class="p">(</span>
<span class="hll">            <span class="p">(</span><span class="s1">'user'</span><span class="p">,</span> <span class="s1">'group'</span><span class="p">,</span> <span class="p">)</span>
</span>        <span class="p">)</span>
</pre></div>
<p>This is just a rule of thumb and it should be taken with a grain of salt. The final indexing should be optimized for the specific use case. The main point here is to <strong>be aware of implicit indexes and the significance of the column order in composite indexes.</strong></p>
<hr>
<h3 id="brin-indexes"><a class="toclink" href="#brin-indexes">BRIN indexes</a></h3>
<p>A B-Tree index is structured like a tree. The cost of looking up a single value is the height of the tree + 1 for the random access to the table. This makes B-Tree indexes ideal for unique constraints and (some) range queries.</p>
<p><strong>The disadvantage of a B-Tree index is its size - B-Tree indexes can get big.</strong></p>
<p>It's not uncommon to think there are no alternatives but databases offer <a href="https://medium.com/@Alibaba_Cloud/principles-and-applications-of-the-index-types-supported-by-postgresql-481f59bab67d" rel="noopener">other types of indexes for specific use cases</a>.</p>
<p>Starting with Django 1.11 there is a <a href="https://docs.djangoproject.com/en/2.0/ref/models/indexes/" rel="noopener">new Meta option for creating indexes</a> on a model. This gives us an opportunity to explore other types of indexes.</p>
<p>PostgreSQL has a very useful type of index called BRIN (Block Range Index). <strong>Under some circumstances BRIN indexes can be more efficient than B-Tree indexes.</strong></p>
<p>Let's see what <a href="https://www.postgresql.org/docs/9.5/static/brin-intro.html" rel="noopener">the official documentation</a> has
to say first:</p>
<blockquote>
<p><em>BRIN is designed for handling very large tables in which certain columns have
some natural correlation with their physical location within the table.</em></p>
</blockquote>
<p>To understand this statement it's important to understand how a BRIN index works. As the name suggests, a BRIN index will create a mini index on a range of
adjacent blocks in the table. The index is very small and it can only say if a certain value is <strong>definitely not in the range</strong> or if <strong>it might be in the range</strong> of indexed blocks.</p>
<p>Let's do a simplified example of how BRIN works to help us understand.</p>
<p>Say we have these values in a column, each is one block:</p>
<div class="highlight"><pre><span></span>1, 2, 3, 4, 5, 6, 7, 8, 9
</pre></div>
<p>Let's create a range for each 3 adjacent blocks:</p>
<div class="highlight"><pre><span></span>[1,2,3], [4,5,6], [7,8,9]
</pre></div>
<p>For each range we are going to <strong>keep the minimum and maximum</strong> value in the range:</p>
<div class="highlight"><pre><span></span>[1–3], [4–6], [7–9]
</pre></div>
<p>Using this index, let's try to search for the value 5:</p>
<ul>
<li><code>[1–3]</code> - Definitely not here.</li>
<li><code>[4–6]</code> - Might be here.</li>
<li><code>[7–9]</code> - Definitely not here.</li>
</ul>
<p>Using the index we limited our search to blocks 4–6.</p>
<p>Let's take another example, this time the values in the column are not going to be nicely sorted:</p>
<div class="highlight"><pre><span></span>[2,9,5], [1,4,7], [3,8,6]
</pre></div>
<p>And this is our index with the minimum and maximum value in each range:</p>
<div class="highlight"><pre><span></span>[2–9], [1–7], [3–8]
</pre></div>
<p>Let's try to search for the value 5:</p>
<ul>
<li><code>[2–9]</code> - Might be here.</li>
<li><code>[1–7]</code> - Might be here.</li>
<li><code>[3–8]</code> - Might be here.</li>
</ul>
<p>The index is useless - not only did it not limit the search at all, we actually had to read more because we fetched both the index and the entire table.</p>
<p>Going back to the documentation:</p>
<blockquote>
<p><em>…columns have some natural correlation with their physical location within the
table.</em></p>
</blockquote>
<p>This is key for BRIN indexes. To get the most out of it, <strong>the values in the column must be roughly sorted or clustered on disk.</strong></p>
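<p>The two walkthroughs above can be reproduced with a few lines of Python. This is a toy model under the same simplification as the example (one value per block, three blocks per range), not how PostgreSQL actually stores a BRIN index. The function names are made up for this sketch:</p>

```python
def build_block_ranges(blocks, blocks_per_range):
    """Summarize each run of adjacent blocks as a (min, max) pair."""
    ranges = []
    for start in range(0, len(blocks), blocks_per_range):
        chunk = blocks[start:start + blocks_per_range]
        ranges.append((min(chunk), max(chunk)))
    return ranges


def candidate_ranges(ranges, value):
    """Return the positions of the ranges that might contain `value`."""
    # A range can only be ruled out when the value falls outside [min, max].
    return [
        position
        for position, (low, high) in enumerate(ranges)
        if low <= value <= high
    ]


# Values that are sorted on disk, and the same values shuffled.
sorted_index = build_block_ranges([1, 2, 3, 4, 5, 6, 7, 8, 9], 3)
shuffled_index = build_block_ranges([2, 9, 5, 1, 4, 7, 3, 8, 6], 3)
```

<p>Searching for 5 with <code>candidate_ranges</code> leaves a single range to scan in the sorted case, and every range in the shuffled case, which is exactly why physical ordering matters so much for this type of index.</p>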
<p>Now back to Django, <strong>what field do we have that is often indexed and will most likely be naturally sorted on disk?</strong> That's right, I'm looking at you <a href="https://docs.djangoproject.com/en/2.0/ref/models/fields/#datefield" rel="noopener">auto_now_add</a>.</p>
<p>A very common pattern in Django models is this:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">SomeModel</span><span class="p">(</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">created</span> <span class="o">=</span> <span class="n">DateTimeField</span><span class="p">(</span>
<span class="hll">        <span class="n">auto_now_add</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
</span>    <span class="p">)</span>
</pre></div>
<p>When auto_now_add is used Django will automatically populate the field with the current time when the row is created. A <code>created</code> field is usually also a great candidate for queries so it's often indexed.</p>
<p>Let's add a BRIN index on <code>created</code>:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="kn">from</span> <span class="nn">django.contrib.postgres.indexes</span> <span class="kn">import</span> <span class="n">BrinIndex</span>
</span>
<span class="k">class</span> <span class="nc">SomeModel</span><span class="p">(</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">created</span> <span class="o">=</span> <span class="n">DateTimeField</span><span class="p">(</span>
        <span class="n">auto_now_add</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
    <span class="p">)</span>

    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
        <span class="n">indexes</span> <span class="o">=</span> <span class="p">(</span>
<span class="hll">            <span class="n">BrinIndex</span><span class="p">(</span><span class="n">fields</span><span class="o">=</span><span class="p">[</span><span class="s1">'created'</span><span class="p">]),</span>
</span>        <span class="p">)</span>
</pre></div>
<p>To get a sense of the difference in size I created a table with ~2M rows with a date field that is naturally sorted on disk:</p>
<ul>
<li>B-Tree index: 37 MB</li>
<li>BRIN index: 49 KB</li>
</ul>
<p>That's right, no mistake.</p>
<p>There is a lot more to consider when creating indexes than the size of the
index. But now, with Django 1.11 support for indexes, we can easily integrate
new types of indexes into our apps and make them lighter and faster.</p>How to Add a Text Filter to Django Admin2018-01-02T00:00:00+02:002018-01-02T00:00:00+02:00Haki Benitatag:hakibenita.com,2018-01-02:/how-to-add-a-text-filter-to-django-admin<p>Django Admin search fields are great, throw a bunch of fields in search_fields and Django will handle the rest. The problem with search field begins when there are too many of them. This is how we replaced Django search with text filters for specific fields, and made Django admin much faster.</p><hr>
<p>When creating a new Django Admin page, a common conversation between the developer and the support personnel might sound like this:</p>
<blockquote>
<p><strong>Developer</strong>: Hey, I'm adding a new admin page for transactions. Can you tell
me how you want to search for transactions?</p>
<p><strong>Support</strong>: Sure, I usually just search by the username.</p>
<p><strong>Developer</strong>: Cool.</p>
</blockquote>
<div class="highlight"><pre><span></span><span class="n">search_fields</span> <span class="o">=</span> <span class="p">(</span>
    <span class="s1">'user__username'</span><span class="p">,</span>
<span class="p">)</span>
</pre></div>
<blockquote>
<p>Anything else?</p>
<p><strong>Support</strong>: I sometimes also want to search by the user email address.</p>
<p><strong>Developer</strong>: OK.</p>
</blockquote>
<div class="highlight"><pre><span></span><span class="n">search_fields</span> <span class="o">=</span> <span class="p">(</span>
    <span class="s1">'user__username'</span><span class="p">,</span>
<span class="hll">    <span class="s1">'user__email'</span><span class="p">,</span>
</span><span class="p">)</span>
</pre></div>
<blockquote>
<p><strong>Support</strong>: And the first and last name of course.</p>
<p><strong>Developer:</strong> Yeah, OK.</p>
</blockquote>
<div class="highlight"><pre><span></span><span class="n">search_fields</span> <span class="o">=</span> <span class="p">(</span>
    <span class="s1">'user__username'</span><span class="p">,</span>
    <span class="s1">'user__email'</span><span class="p">,</span>
<span class="hll">    <span class="s1">'user__first_name'</span><span class="p">,</span>
</span><span class="hll">    <span class="s1">'user__last_name'</span><span class="p">,</span>
</span><span class="p">)</span>
</pre></div>
<blockquote>
<p><strong>Developer:</strong> Is that it?</p>
<p><strong>Support</strong>: Well, sometimes I need to search by the payment voucher number.</p>
<p><strong>Developer</strong>: OK.</p>
</blockquote>
<div class="highlight"><pre><span></span><span class="n">search_fields</span> <span class="o">=</span> <span class="p">(</span>
    <span class="s1">'user__username'</span><span class="p">,</span>
    <span class="s1">'user__email'</span><span class="p">,</span>
    <span class="s1">'user__first_name'</span><span class="p">,</span>
    <span class="s1">'user__last_name'</span><span class="p">,</span>
<span class="hll">    <span class="s1">'payment__voucher_number'</span><span class="p">,</span>
</span><span class="p">)</span>
</pre></div>
<blockquote>
<p><strong>Developer</strong>: Anything else?</p>
<p><strong>Support</strong>: Some customers send their invoices and ask questions so I search by the invoice number as well.</p>
<p><strong>Developer</strong>: FINE!</p>
</blockquote>
<div class="highlight"><pre><span></span><span class="n">search_fields</span> <span class="o">=</span> <span class="p">(</span>
    <span class="s1">'user__username'</span><span class="p">,</span>
    <span class="s1">'user__email'</span><span class="p">,</span>
    <span class="s1">'user__first_name'</span><span class="p">,</span>
    <span class="s1">'user__last_name'</span><span class="p">,</span>
    <span class="s1">'payment__voucher_number'</span><span class="p">,</span>
<span class="hll">    <span class="s1">'invoice__invoice_number'</span><span class="p">,</span>
</span><span class="p">)</span>
</pre></div>
<blockquote>
<p><strong>Developer</strong>: OK, are you sure this is it?</p>
<p><strong>Support</strong>: Well, developers sometimes forward tickets to us and they use these long random strings. I'm never really sure what they are so I just search and hope for the best.</p>
<p><strong>Developer:</strong> These are called UUIDs.</p>
</blockquote>
<div class="highlight"><pre><span></span><span class="n">search_fields</span> <span class="o">=</span> <span class="p">(</span>
    <span class="s1">'user__username'</span><span class="p">,</span>
    <span class="s1">'user__email'</span><span class="p">,</span>
    <span class="s1">'user__first_name'</span><span class="p">,</span>
    <span class="s1">'user__last_name'</span><span class="p">,</span>
    <span class="s1">'payment__voucher_number'</span><span class="p">,</span>
    <span class="s1">'invoice__invoice_number'</span><span class="p">,</span>
    <span class="s1">'uid'</span><span class="p">,</span>
<span class="hll">    <span class="s1">'user__uid'</span><span class="p">,</span>
</span><span class="hll">    <span class="s1">'payment__uid'</span><span class="p">,</span>
</span><span class="hll">    <span class="s1">'invoice__uid'</span><span class="p">,</span>
</span><span class="p">)</span>
</pre></div>
<blockquote>
<p><strong>Developer:</strong> So is that it?</p>
<p><strong>Support</strong>: Yes, for now…</p>
</blockquote>
<h3 id="the-problem-with-search-fields"><a class="toclink" href="#the-problem-with-search-fields">The Problem With Search Fields</a></h3>
<p><a href="https://docs.djangoproject.com/en/2.0/ref/contrib/admin/#django.contrib.admin.ModelAdmin.search_fields" rel="noopener">Django Admin search fields</a> are great: throw a bunch of fields in <code>search_fields</code> and Django will handle the rest.</p>
<p><strong>The problem with search fields begins when there are too many of them.</strong></p>
<p>When the admin user wants to search by UID or email, Django has no idea this is what the user intended, so it has to search by all the fields listed in <code>search_fields</code>. These "match any" queries have huge WHERE clauses and lots of joins, and can quickly become very slow.</p>
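<p>To get a feel for how quickly such a query grows, here is a rough sketch in plain Python that builds a WHERE clause of the same shape. It is an illustration only: <code>build_where</code> is a hypothetical helper, field lookups are used in place of real column names, the joins are left out, and the SQL Django actually generates differs in its details.</p>

```python
SEARCH_FIELDS = (
    "user__username",
    "user__email",
    "user__first_name",
    "user__last_name",
    "payment__voucher_number",
    "invoice__invoice_number",
    "uid",
    "user__uid",
    "payment__uid",
    "invoice__uid",
)


def build_where(search_fields, term):
    """Return (sql, params) for a "match any field" search.

    Hypothetical helper, for illustration only: every word in the
    term is compared against every field in search_fields.
    """
    groups = []
    params = []
    for word in term.split():
        conditions = [f"{field} ILIKE %s" for field in search_fields]
        groups.append("(" + " OR ".join(conditions) + ")")
        params.extend([f"%{word}%"] * len(search_fields))
    return " AND ".join(groups), params


sql, params = build_where(SEARCH_FIELDS, "john doe")
condition_count = len(params)
print(condition_count, "conditions")
```

<p>Two words against ten fields already produce 20 conditions, and most of them sit behind a join.</p>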
<p><strong>Using a regular ListFilter is not an option.</strong> A <code>ListFilter</code> renders a list of choices from the distinct values of the field. Some of the fields we listed above are unique and the others have many distinct values, so <strong>showing choices is not an option.</strong></p>
<h3 id="bridging-the-gap-between-django-and-the-user"><a class="toclink" href="#bridging-the-gap-between-django-and-the-user">Bridging the gap between Django and the user</a></h3>
<p>We started thinking of ways to create multiple search fields, one for each field or group of fields. We thought that if the user wants to search by email or UID, there is no reason to search by any other field.</p>
<p>After some thought we came up with a solution - <strong>a custom <a href="https://docs.djangoproject.com/en/2.0/ref/contrib/admin/#django.contrib.admin.ModelAdmin.list_filter" rel="noopener">SimpleListFilter</a></strong>:</p>
<ul>
<li>ListFilter allows for custom filtering logic.</li>
<li>ListFilter can have a custom template.</li>
<li>Django already has support for multiple ListFilters.</li>
</ul>
<p>We wanted it to look like this:</p>
<figure><img alt="A text list filter" src="https://hakibenita.com/images/01-how-to-add-a-text-filter-to-django-admin.png"><figcaption>A text list filter</figcaption>
</figure>
<h3 id="implementing-inputfilter"><a class="toclink" href="#implementing-inputfilter">Implementing <code>InputFilter</code></a></h3>
<p>What we want to do is have a <strong>ListFilter with a text input instead of choices.</strong></p>
<p>Before we dive into the implementation, let's start from the end. This is how we want to use our <code>InputFilter</code> in a <code>ModelAdmin</code>:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">UIDFilter</span><span class="p">(</span><span class="n">InputFilter</span><span class="p">):</span>
<span class="hll">    <span class="n">parameter_name</span> <span class="o">=</span> <span class="s1">'uid'</span>
</span>    <span class="n">title</span> <span class="o">=</span> <span class="n">_</span><span class="p">(</span><span class="s1">'UID'</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">queryset</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="n">queryset</span><span class="p">):</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="p">()</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
            <span class="n">uid</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="p">()</span>
<span class="hll">            <span class="k">return</span> <span class="n">queryset</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span>
</span><span class="hll">                <span class="n">Q</span><span class="p">(</span><span class="n">uid</span><span class="o">=</span><span class="n">uid</span><span class="p">)</span> <span class="o">|</span>
</span><span class="hll">                <span class="n">Q</span><span class="p">(</span><span class="n">payment__uid</span><span class="o">=</span><span class="n">uid</span><span class="p">)</span> <span class="o">|</span>
</span><span class="hll">                <span class="n">Q</span><span class="p">(</span><span class="n">user__uid</span><span class="o">=</span><span class="n">uid</span><span class="p">)</span>
</span><span class="hll">            <span class="p">)</span>
</span></pre></div>
<p>And use it like any other list filter in a <code>ModelAdmin</code>:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">TransactionAdmin</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
    <span class="c1"># ...</span>
    <span class="n">list_filter</span> <span class="o">=</span> <span class="p">(</span>
<span class="hll">        <span class="n">UIDFilter</span><span class="p">,</span>
</span>    <span class="p">)</span>
    <span class="c1"># ...</span>
</pre></div>
<ul>
<li>We create a custom filter for the uuid field - <code>UIDFilter</code>.</li>
<li>We set the <code>parameter_name</code> in the URL to be <code>uid</code>. A URL filtered by uid will look like this <code>/admin/app/transaction?uid=<uid></code></li>
<li>If the user entered a uid we search by transaction uid, payment uid or user uid.</li>
</ul>
<p><strong>So far this is just like a regular custom ListFilter.</strong></p>
<p>Now that we have a better idea of what we want let's implement our <code>InputFilter</code>:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">InputFilter</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">SimpleListFilter</span><span class="p">):</span>
    <span class="n">template</span> <span class="o">=</span> <span class="s1">'admin/input_filter.html'</span>

    <span class="k">def</span> <span class="nf">lookups</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="n">model_admin</span><span class="p">):</span>
        <span class="c1"># Dummy, required to show the filter.</span>
        <span class="k">return</span> <span class="p">((),)</span>
</pre></div>
<p>We inherit from <code>SimpleListFilter</code> and override the template. We don't have any lookups and we want the template to render a text input instead of choices:</p>
<div class="highlight"><pre><span></span><span class="cm"><!-- templates/admin/input_filter.html --></span>
{% load i18n %}
<span class="p"><</span><span class="nt">h3</span><span class="p">></span>{% blocktrans with filter_title=title %} By {{ filter_title }} {% endblocktrans %}<span class="p"></</span><span class="nt">h3</span><span class="p">></span>
<span class="p"><</span><span class="nt">ul</span><span class="p">></span>
<span class="p"><</span><span class="nt">li</span><span class="p">></span>
<span class="hll"> <span class="p"><</span><span class="nt">form</span> <span class="na">method</span><span class="o">=</span><span class="s">"GET"</span> <span class="na">action</span><span class="o">=</span><span class="s">""</span><span class="p">></span>
</span> <span class="p"><</span><span class="nt">input</span>
<span class="na">type</span><span class="o">=</span><span class="s">"text"</span>
<span class="na">value</span><span class="o">=</span><span class="s">"{{ spec.value|default_if_none:'' }}"</span>
<span class="na">name</span><span class="o">=</span><span class="s">"{{ spec.parameter_name }}"</span><span class="p">/></span>
<span class="p"></</span><span class="nt">form</span><span class="p">></span>
<span class="p"></</span><span class="nt">li</span><span class="p">></span>
<span class="p"></</span><span class="nt">ul</span><span class="p">></span>
</pre></div>
<p>We use markup similar to Django's existing list filters to make it look native. The template renders a simple form with a GET action and a text field for the parameter. When the form is submitted, the URL is updated with the parameter name and the submitted value.</p>
<h3 id="play-nice-with-other-filters"><a class="toclink" href="#play-nice-with-other-filters">Play Nice With Other Filters</a></h3>
<p>So far our filter works but only if there are no other filters. If we want to play nice with other filters we need to consider them in our form. To do that, we need to get their values.</p>
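<p>The reason is the way browsers submit GET forms: the query string of the new URL is built from the form's fields only, so parameters that were already in the URL are dropped. A small sketch using the standard library, assuming a made-up <code>status</code> filter is already applied to the page (<code>submit_get_form</code> is a hypothetical helper):</p>

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def submit_get_form(current_url, form_fields):
    """Mimic a browser submitting a GET form with an empty action.

    The existing query string is replaced by the form fields.
    """
    parts = urlsplit(current_url)
    return urlunsplit(parts._replace(query=urlencode(form_fields)))


# "status" is a made-up filter that is already applied to the page.
current_url = "/admin/app/transaction/?status=paid"
new_url = submit_get_form(current_url, {"uid": "abc123"})
print(new_url)
```

<p>The <code>status</code> parameter is gone from the resulting URL, which is exactly the problem we need to solve.</p>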
<p>The list filter has another function called <code>choices</code>. The function accepts a <code>changelist</code> object that contains all the information about the current view and returns a list of choices.</p>
<p>We don't have any choices, so we are going to use this function to extract all the filters that were applied to the queryset and expose them to the template:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">InputFilter</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">SimpleListFilter</span><span class="p">):</span>
    <span class="n">template</span> <span class="o">=</span> <span class="s1">'admin/input_filter.html'</span>

    <span class="k">def</span> <span class="nf">lookups</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="n">model_admin</span><span class="p">):</span>
        <span class="c1"># Dummy, required to show the filter.</span>
        <span class="k">return</span> <span class="p">((),)</span>

    <span class="k">def</span> <span class="nf">choices</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">changelist</span><span class="p">):</span>
        <span class="c1"># Grab only the "all" option.</span>
        <span class="n">all_choice</span> <span class="o">=</span> <span class="nb">next</span><span class="p">(</span><span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">choices</span><span class="p">(</span><span class="n">changelist</span><span class="p">))</span>
        <span class="n">all_choice</span><span class="p">[</span><span class="s1">'query_parts'</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span>
            <span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="p">)</span>
            <span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">changelist</span><span class="o">.</span><span class="n">get_filters_params</span><span class="p">()</span><span class="o">.</span><span class="n">items</span><span class="p">()</span>
            <span class="k">if</span> <span class="n">k</span> <span class="o">!=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parameter_name</span>
        <span class="p">)</span>
        <span class="k">yield</span> <span class="n">all_choice</span>
</pre></div>
<p>To include the filters we <strong>add a hidden input field for each parameter</strong>:</p>
<div class="highlight"><pre><span></span><span class="cm"><!-- templates/admin/input_filter.html --></span>
{% load i18n %}
<span class="p"><</span><span class="nt">h3</span><span class="p">></span>{% blocktrans with filter_title=title %} By {{ filter_title }} {% endblocktrans %}<span class="p"></</span><span class="nt">h3</span><span class="p">></span>
<span class="p"><</span><span class="nt">ul</span><span class="p">></span>
<span class="p"><</span><span class="nt">li</span><span class="p">></span>
{% with choices.0 as all_choice %}
<span class="p"><</span><span class="nt">form</span> <span class="na">method</span><span class="o">=</span><span class="s">"GET"</span> <span class="na">action</span><span class="o">=</span><span class="s">""</span><span class="p">></span>
<span class="hll"> {% for k, v in all_choice.query_parts %}
</span><span class="hll"> <span class="p"><</span><span class="nt">input</span> <span class="na">type</span><span class="o">=</span><span class="s">"hidden"</span> <span class="na">name</span><span class="o">=</span><span class="s">"{{ k }}"</span> <span class="na">value</span><span class="o">=</span><span class="s">"{{ v }}"</span> <span class="p">/></span>
</span><span class="hll"> {% endfor %}
</span>
<span class="p"><</span><span class="nt">input</span>
<span class="na">type</span><span class="o">=</span><span class="s">"text"</span>
<span class="na">value</span><span class="o">=</span><span class="s">"{{ spec.value|default_if_none:'' }}"</span>
<span class="na">name</span><span class="o">=</span><span class="s">"{{ spec.parameter_name }}"</span><span class="p">/></span>
<span class="p"></</span><span class="nt">form</span><span class="p">></span>
{% endwith %}
<span class="p"></</span><span class="nt">li</span><span class="p">></span>
<span class="p"></</span><span class="nt">ul</span><span class="p">></span>
</pre></div>
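<p>In plain Python, the two pieces work together roughly like this. The sketch is a stand-in, not Django code: <code>query_parts</code> mirrors the generator built in <code>choices</code>, and <code>render_hidden_inputs</code> is a hypothetical helper that mimics what the template loop produces. The applied filters are made up.</p>

```python
from html import escape


def query_parts(filter_params, parameter_name):
    """Return a generator of every applied filter except our own."""
    return (
        (k, v)
        for k, v in filter_params.items()
        if k != parameter_name
    )


def render_hidden_inputs(parts):
    """Hypothetical stand-in for the template's for loop."""
    return [
        f'<input type="hidden" name="{escape(k)}" value="{escape(v)}" />'
        for k, v in parts
    ]


# Made-up filters applied to the page; "uid" belongs to our filter.
applied = {"status": "paid", "uid": "abc123", "created__year": "2017"}
inputs = render_hidden_inputs(query_parts(applied, "uid"))
print("\n".join(inputs))
```

<p>Note that <code>query_parts</code> is a generator, so it can be iterated only once. That is enough here because the template loops over it a single time.</p>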
<p>Now we have a filter with a text input that plays nice with other filters. The only thing left to do is to <strong>add a "clear" option.</strong></p>
<p>To clear the filter we need a URL that includes all filters except ours:</p>
<div class="highlight"><pre><span></span><span class="cm"><!-- templates/admin/input_filter.html --></span>
...
<span class="p"><</span><span class="nt">input</span>
<span class="na">type</span><span class="o">=</span><span class="s">"text"</span>
<span class="na">value</span><span class="o">=</span><span class="s">"{{ spec.value|default_if_none:'' }}"</span>
<span class="na">name</span><span class="o">=</span><span class="s">"{{ spec.parameter_name }}"</span><span class="p">/></span>
<span class="hll">{% if not all_choice.selected %}
</span><span class="hll"> <span class="p"><</span><span class="nt">strong</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"{{ all_choice.query_string }}"</span><span class="p">></span>⨉ {% trans 'Remove' %}<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">strong</span><span class="p">></span>
</span><span class="hll">{% endif %}
</span>
...
</pre></div>
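<p>The "all" choice that Django builds for a <code>SimpleListFilter</code> already carries such a URL in <code>query_string</code>, which is why the template can use it directly. The idea behind it can be sketched in a few lines of plain Python; <code>query_string_without</code> is a hypothetical helper, not the function Django uses:</p>

```python
from urllib.parse import urlencode


def query_string_without(params, parameter_name):
    """Build a query string from params, leaving out one parameter.

    Hypothetical helper that imitates the "all" choice's query string.
    """
    remaining = {k: v for k, v in params.items() if k != parameter_name}
    return "?" + urlencode(remaining)


# Made-up parameters; "uid" belongs to our filter.
params = {"status": "paid", "uid": "abc123"}
remove_url = query_string_without(params, "uid")
print(remove_url)
```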
<p><strong>Voilà!</strong></p>
<p>This is what we get:</p>
<figure><img alt="InputFilter with other filters and a remove button" src="https://hakibenita.com/images/02-how-to-add-a-text-filter-to-django-admin.png"><figcaption>InputFilter with other filters and a remove button</figcaption>
</figure>
<p>The complete code of admin.py can be found in <a href="https://gist.githubusercontent.com/hakib/1491a848e71078dae81fca48c46cc258/raw/19934611bcdd6d806aabaf00f55f582cd40fffd8/admin.py" rel="noopener">this gist</a> and the complete code of the template can be found in <a href="https://gist.githubusercontent.com/hakib/1491a848e71078dae81fca48c46cc258/raw/19934611bcdd6d806aabaf00f55f582cd40fffd8/input_filter.html" rel="noopener">this gist</a>.</p>
<hr>
<h4 id="bonus"><a class="toclink" href="#bonus">Bonus</a></h4>
<h3 id="search-multiple-words-similar-to-django-search"><a class="toclink" href="#search-multiple-words-similar-to-django-search">Search Multiple Words Similar to Django Search</a></h3>
<p>You might have noticed that when searching multiple words, <a href="https://github.com/django/django/blob/master/django/contrib/admin/options.py#L972" rel="noopener">Django finds results where every word matches at least one of the search fields</a>, so the words don't have to appear together in the same field.</p>
<p>For example, if you search for a user "John Doe", Django will find a user whose first name is "John" and whose last name is "Doe", even though no single field contains both words. This is very convenient when searching for things like full names, product names and so on.</p>
<p>We can implement a similar condition using our <code>InputFilter</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Q</span>

<span class="k">class</span> <span class="nc">UserFilter</span><span class="p">(</span><span class="n">InputFilter</span><span class="p">):</span>
    <span class="n">parameter_name</span> <span class="o">=</span> <span class="s1">'user'</span>
    <span class="n">title</span> <span class="o">=</span> <span class="n">_</span><span class="p">(</span><span class="s1">'User'</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">queryset</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="n">queryset</span><span class="p">):</span>
        <span class="n">term</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">term</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
            <span class="k">return</span>
<span class="hll">        <span class="n">any_name</span> <span class="o">=</span> <span class="n">Q</span><span class="p">()</span>
</span><span class="hll">        <span class="k">for</span> <span class="n">bit</span> <span class="ow">in</span> <span class="n">term</span><span class="o">.</span><span class="n">split</span><span class="p">():</span>
</span><span class="hll">            <span class="n">any_name</span> <span class="o">&=</span> <span class="p">(</span>
</span><span class="hll">                <span class="n">Q</span><span class="p">(</span><span class="n">user__first_name__icontains</span><span class="o">=</span><span class="n">bit</span><span class="p">)</span> <span class="o">|</span>
</span><span class="hll">                <span class="n">Q</span><span class="p">(</span><span class="n">user__last_name__icontains</span><span class="o">=</span><span class="n">bit</span><span class="p">)</span>
</span><span class="hll">            <span class="p">)</span>
</span>
        <span class="k">return</span> <span class="n">queryset</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">any_name</span><span class="p">)</span>
</pre></div>
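<p>Because the conditions are combined with <code>&=</code>, every word in the term has to match either the first name or the last name. The same logic on plain Python data looks roughly like this. It is only a sketch: the <code>matches</code> helper and the sample users are made up.</p>

```python
def matches(user, term):
    """Return True if every word of term appears in the first or last name."""
    return all(
        bit.lower() in user["first_name"].lower()
        or bit.lower() in user["last_name"].lower()
        for bit in term.split()
    )


# Made-up sample data.
users = [
    {"first_name": "John", "last_name": "Doe"},
    {"first_name": "John", "last_name": "Foo"},
    {"first_name": "Bar", "last_name": "Doe"},
]

found = [u for u in users if matches(u, "john doe")]
print(found)
```

<p>Only the user that matches both words is returned. An empty term matches everyone, just like filtering by an empty <code>Q()</code>.</p>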
<p><strong>This is it!</strong></p>Django Admin Range-Based Date Hierarchy2017-12-11T00:00:00+02:002017-12-11T00:00:00+02:00Haki Benitatag:hakibenita.com,2017-12-11:/django-admin-range-based-date-hierarchy<p>A few weeks ago we encountered a major performance regression in one of our main admin pages. The page took more than 10 seconds to load (at best) and hit the query execution timeout at worst. When we investigated the issue, we found that the date hierarchy was the cause for most of the time spent loading the admin page. In the article we describe how we significantly improved the performance of Django Admin date hierarchy</p><hr>
<p>A few weeks ago we encountered a major performance regression in one of our main admin pages. The page took more than 10 seconds to load (at best) and hit the query execution timeout at worst.</p>
<p>The page was an admin list view of a transactions model, one of the main models in our app. The model is used by support personnel on a daily basis. The table has millions of rows, and the page uses several joins to display relevant information.</p>
<p>The most common use for the page was to filter transactions by a certain period, most commonly the last day. We used Django admin date hierarchy to drill down on the creation date of the records. When we investigated the issue, we found that the date hierarchy was the cause for most of the time spent loading the admin page.</p>
<hr>
<h3 id="identifying-the-problem"><a class="toclink" href="#identifying-the-problem">Identifying the Problem</a></h3>
<p>The filtered URL looked like this:</p>
<div class="highlight"><pre><span></span>/admin/transactions/?created__year=2017&created__month=11
</pre></div>
<p>We identified the "heavy" query as the one fetching the data to populate the list. The query performed the join with all the lookup tables and applied the filters we listed in the ModelAdmin. The relevant WHERE clause was:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">created</span><span class="w"> </span><span class="k">BETWEEN</span>
<span class="w"> </span><span class="s1">'2017-01-01T00:00:00+00:00'</span><span class="p">::</span><span class="n">timestamptz</span><span class="w"> </span><span class="k">AND</span>
<span class="w"> </span><span class="s1">'2017-12-31T23:59:59.999999+00:00'</span><span class="p">::</span><span class="n">timestamptz</span>
<span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="k">EXTRACT</span><span class="p">(</span><span class="s1">'month'</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">created</span><span class="w"> </span><span class="k">AT</span><span class="w"> </span><span class="k">TIME</span><span class="w"> </span><span class="k">ZONE</span><span class="w"> </span><span class="s1">'UTC'</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">11</span>
</pre></div>
<p>When we inspected the execution plan of this query we found this snippet at the bottom:</p>
<div class="highlight"><pre><span></span><span class="k">Index</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="k">using</span><span class="w"> </span><span class="n">ix</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="n">transactions_transaction</span><span class="w"> </span><span class="p">(</span><span class="k">cost</span><span class="o">=</span><span class="mf">0.43..90663.65</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mf">4561</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mf">471</span><span class="p">)</span>
<span class="w"> </span><span class="k">Index</span><span class="w"> </span><span class="n">Cond</span><span class="p">:</span><span class="w"> </span><span class="p">((</span><span class="n">created</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="s1">'2017-01-01 02:00:00+02'</span><span class="o">::</span><span class="nb">timestamp</span><span class="w"> </span><span class="nb">with time zone</span><span class="p">)</span>
<span class="hll"><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="p">(</span><span class="n">created</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="s1">'2018-01-01 01:59:59.999999+02'</span><span class="o">::</span><span class="nb">timestamp</span><span class="w"> </span><span class="nb">with time zone</span><span class="p">))</span>
</span><span class="w"> </span><span class="k">Filter</span><span class="p">:</span><span class="w"> </span><span class="p">(</span><span class="n">date_part</span><span class="p">(</span><span class="s1">'month'</span><span class="o">::</span><span class="nb">text</span><span class="p">,</span><span class="w"> </span><span class="n">timezone</span><span class="p">(</span><span class="s1">'UTC'</span><span class="o">::</span><span class="nb">text</span><span class="p">,</span><span class="w"> </span><span class="n">created</span><span class="p">))</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'11'</span><span class="o">::</span><span class="nb">double precision</span><span class="p">)</span>
</pre></div>
<p>PostgreSQL decided to use the index on the <code>created</code> column. It estimated 4,561 rows, with a cost estimate that topped out at around 90,000. The row estimate was wildly inaccurate, which led us to our first conclusion:</p>
<p><strong>The estimate made by the database is very low and inaccurate.</strong></p>
<p>We first suspected that the low estimate was due to stale statistics on the column. We gathered stats on the column and tried again:</p>
<div class="highlight"><pre><span></span><span class="gp">haki=#</span><span class="w"> </span><span class="k">analyze</span><span class="w"> </span><span class="n">transactions_transaction</span><span class="w"> </span><span class="p">(</span><span class="n">created</span><span class="p">);</span>
</pre></div>
<p>No change. Estimate remained the same.</p>
<p>We decided to give the query another look. We found this snippet in the WHERE clause a bit odd:</p>
<div class="highlight"><pre><span></span><span class="k">WHERE</span>
<span class="w"> </span><span class="n">created</span><span class="w"> </span><span class="k">BETWEEN</span>
<span class="w"> </span><span class="s1">'2017-01-01T00:00:00+00:00'</span><span class="p">::</span><span class="n">timestamptz</span>
<span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="s1">'2017-12-31T23:59:59.999999+00:00'</span><span class="p">::</span><span class="n">timestamptz</span>
<span class="hll"><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="k">EXTRACT</span><span class="p">(</span><span class="s1">'month'</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">created</span><span class="w"> </span><span class="k">AT</span><span class="w"> </span><span class="k">TIME</span><span class="w"> </span><span class="k">ZONE</span><span class="w"> </span><span class="s1">'UTC'</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">11</span>
</span></pre></div>
<p>Django applied a filter on the <strong>full year</strong> and then used <code>EXTRACT</code> to filter only the month.</p>
<p>We decided to check what happens if we remove the <code>EXTRACT</code> function and instead simplify the condition:</p>
<div class="highlight"><pre><span></span><span class="k">WHERE</span><span class="w"> </span><span class="n">created</span><span class="w"> </span><span class="k">BETWEEN</span>
<span class="w"> </span><span class="s1">'2017-11-01T00:00:00+00:00'</span><span class="p">::</span><span class="n">timestamptz</span>
<span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="s1">'2017-11-30T23:59:59.999999+00:00'</span><span class="p">::</span><span class="n">timestamptz</span>
</pre></div>
<p>The execution plan for that query was:</p>
<div class="highlight"><pre><span></span>Index Scan using ix on transactions_transaction (cost=0.43..1265.75 rows=13716 width=471)
  Index Cond: ((created >= '2017-11-01 02:00:00+02'::timestamp with time zone)
    AND (created <= '2017-12-01 01:59:59.999999+02'::timestamp with time zone))
</pre></div>
<p>The row estimate is still low, but the cost estimate is now significantly lower. This execution plan helped us reach our second conclusion:</p>
<p><strong>The way Django applies the filter makes it difficult for the database to optimize the query.</strong></p>
<hr>
<h3 id="the-problem-with-the-way-date-hierarchy-is-implemented"><a class="toclink" href="#the-problem-with-the-way-date-hierarchy-is-implemented">The Problem With the Way Date Hierarchy is Implemented</a></h3>
<p>Django filters the queryset for a given level in the date hierarchy using a database function to extract the relevant date part.</p>
<p><strong>A function is opaque to the database optimizer</strong>. If you have a range-based (BTREE) index on the field, using <code>EXTRACT</code> does not limit the range at all. The <strong>index is not utilized</strong> properly which might lead to a <strong>sub-optimal execution plan</strong>.</p>
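<p>A loose analogy in plain Python may help here. A sorted list plays the role of the BTREE index: a range condition can jump straight to the matching slice with a binary search, while a condition on a computed value has to evaluate the function for every candidate entry. This is only a sketch to build intuition, with made-up data and hypothetical helper names; it is not how PostgreSQL works internally.</p>

```python
from bisect import bisect_left
from datetime import datetime, timedelta

# A sorted list of timestamps stands in for an index on "created":
# one made-up transaction every 6 hours throughout 2017.
start = datetime(2017, 1, 1)
created = [start + timedelta(hours=6 * i) for i in range(365 * 4)]


def by_range(index, lower, upper):
    """Binary search for the slice where lower <= created < upper."""
    lo = bisect_left(index, lower)
    hi = bisect_left(index, upper)
    return index[lo:hi], hi - lo  # rows, number of entries visited


def by_extract(index, month):
    """Evaluate the "function" on every entry, like a filter step."""
    rows = [ts for ts in index if ts.month == month]
    return rows, len(index)  # rows, number of entries visited


range_rows, range_visited = by_range(
    created, datetime(2017, 11, 1), datetime(2017, 12, 1)
)
extract_rows, extract_visited = by_extract(created, 11)
print(range_visited, extract_visited)
```

<p>Both approaches return the same rows for November, but the range lookup visits 120 entries while the function-based filter visits all 1,460.</p>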
<p>Once we had a better understanding of the problem we started discussing possible solutions.</p>
<h3 id="function-based-index"><a class="toclink" href="#function-based-index">Function based index</a></h3>
<p>A <a href="https://www.postgresql.org/docs/9.1/static/indexes-expressional.html" rel="noopener">function based index</a> is an index on an expression. In our case, an appropriate function based index might look like this:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">transactions_transaction_created_month_brin</span>
<span class="k">ON</span><span class="w"> </span><span class="n">transactions_transaction</span>
<span class="k">USING</span><span class="w"> </span><span class="n">BRIN</span><span class="p">(</span><span class="k">EXTRACT</span><span class="p">(</span><span class="s1">'month'</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">created</span><span class="w"> </span><span class="k">AT</span><span class="w"> </span><span class="k">TIME</span><span class="w"> </span><span class="k">ZONE</span><span class="s1">'UTC'</span><span class="p">));</span>
</pre></div>
<p>The index made the query run faster but it came at a cost.</p>
<p>The downside to this approach is having to maintain additional indexes for each level of the hierarchy (day and month). <strong>Additional indexes slow down insert and update operations, and take up space.</strong></p>
<p>Another downside is the index size. We used a <a href="https://www.postgresql.org/docs/9.5/static/brin-intro.html" rel="noopener">BRIN index</a> in this case to minimize the size of the index. The table is naturally clustered by the creation date so this is an ideal use case for a BRIN index. When this is not the case, a similar BTREE index can become quite heavy.</p>
<div class="admonition tip">
<p class="admonition-title">see also</p>
<p>More on how BRIN indexes work in <a href="/9-django-tips-for-working-with-databases#brin-indexes">9-django-tips-for-working-with-databases</a>.</p>
</div>
<h3 id="simplify-the-condition-used-by-django-date-hierarchy"><a class="toclink" href="#simplify-the-condition-used-by-django-date-hierarchy">Simplify the Condition Used by Django Date Hierarchy</a></h3>
<p>We decided to see if we can <strong>simplify the condition used by Django</strong> to apply the date hierarchy.</p>
<p>To implement the filter differently we first need to understand how Django admin applies filters on a queryset.</p>
<p>In <a href="https://github.com/django/django/blob/master/django/contrib/admin/views/main.py" rel="noopener">django/contrib/admin/views/main.py</a> there is a function called
<a href="https://github.com/django/django/blob/master/django/contrib/admin/views/main.py#L98" rel="noopener">get_filters</a>. It might look scary at first but what it does is:</p>
<ol>
<li>Extract all the query parameters from the URL.</li>
<li>Get all the <code>ListFilter</code> declared in the <code>ModelAdmin</code> and apply them one by one to the parameter list.</li>
<li>Each <code>ListFilter</code> receives the parameter list, takes what it needs and removes those values from the parameter list.</li>
<li>Any parameters left after all the <code>ListFilter</code>s were applied are processed using the "default" Django filter. This is why, for example, it's possible to filter the queryset directly from the URL even when a <code>ListFilter</code> is not explicitly defined.</li>
</ol>
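<p>The steps above can be sketched without Django at all. This is only an illustration of the "each filter takes what it needs" idea, not Django's actual code: <code>apply_filters</code>, the filter names and the sample parameters are all made up for the example.</p>

```python
def apply_filters(params, filters):
    """Let each filter claim its parameters; return the claims and the leftovers."""
    remaining = dict(params)  # work on a copy so the caller's dict is untouched
    claimed = {}
    for name, wanted in filters.items():
        # A filter pops only the keys it knows about.
        claimed[name] = {key: remaining.pop(key) for key in wanted if key in remaining}
    return claimed, remaining


# Hypothetical query parameters, as they would arrive from the URL.
params = {'status': 'paid', 'created__year': '2017', 'amount__gte': '100'}

# Hypothetical filters and the parameter names each one is interested in.
filters = {
    'StatusFilter': ['status'],
    'DateHierarchyFilter': ['created__day', 'created__month', 'created__year'],
}

claimed, leftovers = apply_filters(params, filters)
print(claimed)    # what each filter took
print(leftovers)  # what would fall through to the "default" filter
```

<p>Whatever is left in <code>leftovers</code> plays the role of the parameters Django applies with its default filtering.</p>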
<p>If you look at the date hierarchy query parameters you'll see that there is nothing special about them: they are just regular URL params. This sparked an idea:</p>
<p><strong>Implement a ListFilter to grab the relevant date hierarchy parameters from the parameter list and apply a custom filter on the queryset.</strong></p>
<h3 id="the-implementation"><a class="toclink" href="#the-implementation">The Implementation</a></h3>
<p>Let's start by grabbing the date hierarchy fields from the parameter list:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">re</span>

<span class="kn">from</span> <span class="nn">django.contrib</span> <span class="kn">import</span> <span class="n">admin</span>


<span class="k">class</span> <span class="nc">RangeBasedDateHierarchyListFilter</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ListFilter</span><span class="p">):</span>
    <span class="n">title</span> <span class="o">=</span> <span class="s1">''</span>

    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="n">params</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">model_admin</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">date_hierarchy_field</span> <span class="o">=</span> <span class="n">model_admin</span><span class="o">.</span><span class="n">date_hierarchy</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">date_hierarchy</span> <span class="o">=</span> <span class="p">{}</span>
        <span class="n">date_hierarchy_field_re</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span>
<span class="hll">            <span class="sa">r</span><span class="s1">'^</span><span class="si">{}</span><span class="s1">__(day|month|year)$'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">date_hierarchy_field</span><span class="p">)</span>
</span>        <span class="p">)</span>
        <span class="c1"># Iterate over a copy of the keys because we pop from params in the loop.</span>
        <span class="k">for</span> <span class="n">param</span> <span class="ow">in</span> <span class="nb">list</span><span class="p">(</span><span class="n">params</span><span class="o">.</span><span class="n">keys</span><span class="p">()):</span>
            <span class="n">match</span> <span class="o">=</span> <span class="n">date_hierarchy_field_re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="n">param</span><span class="p">)</span>
            <span class="k">if</span> <span class="n">match</span><span class="p">:</span>
                <span class="n">period</span> <span class="o">=</span> <span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="hll">                <span class="bp">self</span><span class="o">.</span><span class="n">date_hierarchy</span><span class="p">[</span><span class="n">period</span><span class="p">]</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">params</span><span class="o">.</span><span class="n">pop</span><span class="p">(</span><span class="n">param</span><span class="p">))</span>
</span></pre></div>
<ol>
<li>We take the <code>date_hierarchy</code> field name from the <code>model_admin</code>. In our case it was <code>created</code>.</li>
<li>We create a regex pattern to identify the parameters in the URL. The pattern is always the name of the date hierarchy field followed by the period (day, month, year).<br> In our case the possible parameters are <code>created__day</code>, <code>created__month</code> and <code>created__year</code>.</li>
<li>We iterate over the parameter list, pop any date hierarchy parameter that matches our pattern and store the period and the value in a dict.</li>
</ol>
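<p>To see the three steps in isolation, here is a minimal sketch that runs on a plain dict instead of the admin's parameter list. The helper name <code>pop_date_hierarchy_params</code> and the sample parameters are assumptions made for the example; the <code>re.escape</code> call is an extra precaution that the filter above does not take.</p>

```python
import re


def pop_date_hierarchy_params(params, field_name):
    """Pop <field>__day/month/year entries from params and return them as ints."""
    pattern = re.compile(r'^{}__(day|month|year)$'.format(re.escape(field_name)))
    found = {}
    for key in list(params):  # copy the keys, the dict is mutated in the loop
        match = pattern.match(key)
        if match:
            found[match.group(1)] = int(params.pop(key))
    return found


# Hypothetical parameters for ?created__year=2017&created__month=10&status=paid
params = {'created__year': '2017', 'created__month': '10', 'status': 'paid'}
date_hierarchy = pop_date_hierarchy_params(params, 'created')

print(date_hierarchy)  # the periods we extracted
print(params)          # unrelated parameters are left for other filters
```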
<p><strong>Now, this is where all the magic happens:</strong></p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.utils</span> <span class="kn">import</span> <span class="n">timezone</span>


<span class="k">class</span> <span class="nc">RangeBasedDateHierarchyListFilter</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ListFilter</span><span class="p">):</span>
    <span class="c1"># ...</span>

    <span class="k">def</span> <span class="nf">queryset</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="n">queryset</span><span class="p">):</span>
        <span class="n">tz</span> <span class="o">=</span> <span class="n">timezone</span><span class="o">.</span><span class="n">get_default_timezone</span><span class="p">()</span>
        <span class="n">from_date</span><span class="p">,</span> <span class="n">to_date</span> <span class="o">=</span> <span class="n">get_date_range_for_hierarchy</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">date_hierarchy</span><span class="p">,</span> <span class="n">tz</span><span class="p">)</span>

        <span class="k">return</span> <span class="n">queryset</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="o">**</span><span class="p">{</span>
            <span class="s1">'</span><span class="si">{}</span><span class="s1">__gte'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">date_hierarchy_field</span><span class="p">):</span> <span class="n">from_date</span><span class="p">,</span>
            <span class="s1">'</span><span class="si">{}</span><span class="s1">__lt'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">date_hierarchy_field</span><span class="p">):</span> <span class="n">to_date</span><span class="p">,</span>
        <span class="p">})</span>
</pre></div>
<p>Django will call the <code>queryset</code> function of our <code>ListFilter</code> with a queryset, and the function is expected to return the filtered queryset.</p>
<p>In the example above the date range is computed in a separate function (so we can test it, you know...):</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">datetime</span>


<span class="k">def</span> <span class="nf">get_date_range_for_hierarchy</span><span class="p">(</span><span class="n">date_hierarchy</span><span class="p">,</span> <span class="n">tz</span><span class="p">):</span>
<span class="w">    </span><span class="sd">"""Generate date range for date hierarchy.</span>

<span class="sd">    date_hierarchy &lt;dict&gt;:</span>
<span class="sd">        year (int)</span>
<span class="sd">        month (int or None)</span>
<span class="sd">        day (int or None)</span>
<span class="sd">    tz &lt;timezone or None&gt;:</span>
<span class="sd">        The timezone in which to generate the datetimes.</span>
<span class="sd">        If None, the datetimes will be naive.</span>

<span class="sd">    Returns (tuple):</span>
<span class="sd">        from_date (datetime.datetime, aware if tz is set) inclusive</span>
<span class="sd">        to_date (datetime.datetime, aware if tz is set) exclusive</span>
<span class="sd">    """</span>
    <span class="n">from_date</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="p">(</span>
        <span class="n">date_hierarchy</span><span class="p">[</span><span class="s1">'year'</span><span class="p">],</span>
        <span class="n">date_hierarchy</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'month'</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
        <span class="n">date_hierarchy</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'day'</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
    <span class="p">)</span>

    <span class="k">if</span> <span class="n">tz</span><span class="p">:</span>
        <span class="c1"># localize() is part of the pytz timezone API.</span>
        <span class="n">from_date</span> <span class="o">=</span> <span class="n">tz</span><span class="o">.</span><span class="n">localize</span><span class="p">(</span><span class="n">from_date</span><span class="p">)</span>

    <span class="k">if</span> <span class="s1">'day'</span> <span class="ow">in</span> <span class="n">date_hierarchy</span><span class="p">:</span>
        <span class="n">to_date</span> <span class="o">=</span> <span class="n">from_date</span> <span class="o">+</span> <span class="n">datetime</span><span class="o">.</span><span class="n">timedelta</span><span class="p">(</span><span class="n">days</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>

    <span class="k">elif</span> <span class="s1">'month'</span> <span class="ow">in</span> <span class="n">date_hierarchy</span><span class="p">:</span>
        <span class="k">assert</span> <span class="n">from_date</span><span class="o">.</span><span class="n">day</span> <span class="o">==</span> <span class="mi">1</span>
        <span class="c1"># 32 days after the 1st always lands in the next month.</span>
        <span class="n">to_date</span> <span class="o">=</span> <span class="p">(</span><span class="n">from_date</span> <span class="o">+</span> <span class="n">datetime</span><span class="o">.</span><span class="n">timedelta</span><span class="p">(</span><span class="n">days</span><span class="o">=</span><span class="mi">32</span><span class="p">))</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="n">day</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>

    <span class="k">else</span><span class="p">:</span>
        <span class="n">to_date</span> <span class="o">=</span> <span class="n">from_date</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="n">year</span><span class="o">=</span><span class="n">from_date</span><span class="o">.</span><span class="n">year</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>

    <span class="k">return</span> <span class="n">from_date</span><span class="p">,</span> <span class="n">to_date</span>
</pre></div>
<p>The function receives the dict we constructed in <code>__init__</code> and returns a date range to filter on. The <code>queryset</code> function then applies a simpler range filter that our database can better utilize.</p>
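<p>To convince ourselves that the half-open range selects the same rows as the "extract year and month" condition, here is a small check on in-memory data. It is a sketch under simplifying assumptions: <code>naive_range</code> is a timezone-free variant of the function above written for this example, and <code>rows</code> is made-up sample data standing in for the <code>created</code> column.</p>

```python
import datetime


def naive_range(date_hierarchy):
    """Timezone-free variant of the range calculation, for illustration only."""
    start = datetime.datetime(
        date_hierarchy['year'],
        date_hierarchy.get('month', 1),
        date_hierarchy.get('day', 1),
    )
    if 'day' in date_hierarchy:
        end = start + datetime.timedelta(days=1)
    elif 'month' in date_hierarchy:
        # 32 days after the 1st is always in the next month.
        end = (start + datetime.timedelta(days=32)).replace(day=1)
    else:
        end = start.replace(year=start.year + 1)
    return start, end


# Made-up timestamps around the edges of October 2017.
rows = [
    datetime.datetime(2017, 9, 30, 23, 59),
    datetime.datetime(2017, 10, 1, 0, 0),
    datetime.datetime(2017, 10, 31, 23, 59, 59),
    datetime.datetime(2017, 11, 1, 0, 0),
]

start, end = naive_range({'year': 2017, 'month': 10})

# Range condition: from_date inclusive, to_date exclusive.
by_range = [row for row in rows if start <= row < end]

# "Extract" condition, similar in spirit to filtering on year and month parts.
by_extract = [row for row in rows if row.year == 2017 and row.month == 10]

print(by_range == by_extract)
```

<p>Both conditions pick the same two rows, but only the range condition compares the column directly, which is what lets the database use a plain index on it.</p>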
<p>To use the simplified range condition in our <code>ModelAdmin</code> we need to add it as a <code>list_filter</code>:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">TransactionAdmin</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
    <span class="n">date_hierarchy</span> <span class="o">=</span> <span class="s1">'created'</span>
    <span class="n">list_filter</span> <span class="o">=</span> <span class="p">(</span>
        <span class="c1">#...</span>
        <span class="n">RangeBasedDateHierarchyListFilter</span><span class="p">,</span>
    <span class="p">)</span>
</pre></div>
<h3 id="the-result"><a class="toclink" href="#the-result">The Result</a></h3>
<p>After we deployed this change the performance of the page improved drastically. Our database was happy, the support team was happy and so were we.</p>
<div class="admonition info">
<p class="admonition-title">source code</p>
<p>See our package <a href="https://github.com/hakib/django-admin-lightweight-date-hierarchy" rel="noopener">django-admin-lightweight-date-hierarchy</a> on GitHub and PyPI.</p>
</div>Scaling Django Admin Date Hierarchy2017-10-06T00:00:00+03:002017-10-06T00:00:00+03:00Haki Benitatag:hakibenita.com,2017-10-06:/scaling-django-admin-date-hierarchy<p>The date hierarchy is a great feature but it comes at a price. On very large tables, the way date hierarchy is implemented can make an admin page nearly unusable. In this article we describe the limitations of the date hierarchy, and suggest a way to overcome them.</p><hr>
<div class="admonition info">
<p class="admonition-title">package</p>
<p>We published a package called <a href="https://github.com/hakib/django-admin-lightweight-date-hierarchy" rel="noopener">django-admin-lightweight-date-hierarchy</a> which overrides Django Admin <code>date_hierarchy</code> template tag and eliminates all database queries from it.<br>For the implementation details and the shocking performance analysis read on.</p>
</div>
<hr>
<p>If you are not familiar with Django Admin <a href="https://docs.djangoproject.com/en/1.11/ref/contrib/admin/#django.contrib.admin.ModelAdmin.date_hierarchy" rel="noopener">date_hierarchy</a> you should be, it's great. Set the <code>date_hierarchy</code> attribute of a <code>ModelAdmin</code> to the name of a <code>DateField</code> or <code>DateTimeField</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.contrib</span> <span class="kn">import</span> <span class="n">admin</span>

<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">Sale</span>


<span class="nd">@admin</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">Sale</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">SaleAdmin</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
<span class="hll">    <span class="n">date_hierarchy</span> <span class="o">=</span> <span class="s1">'created'</span>
</span></pre></div>
<p>And you get a nice drill-down menu at the top of the admin change list:</p>
<figure><img alt="Django date_hierarchy in action" src="https://hakibenita.com/images/01-scaling-django-admin-date-hierarchy.png"><figcaption>Django date_hierarchy in action</figcaption>
</figure>
<p>When selecting a year, Django will filter the data to the selected year, and present a list of months for which there is data in that year.</p>
<p>When selecting a month, Django will apply the filter and present the list of days for which there is data in that month.</p>
<h3 id="date_hierarchy-behind-the-scenes"><a class="toclink" href="#date_hierarchy-behind-the-scenes"><code>date_hierarchy</code> Behind the Scenes</a></h3>
<p>To produce a list of dates for which there is data, <strong>Django has to perform a query</strong>. For example, to produce the list of years, the first level in the hierarchy, Django will execute the following query:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="k">DISTINCT</span>
<span class="w"> </span><span class="n">django_datetime_trunc</span><span class="p">(</span><span class="s1">'year'</span><span class="p">,</span><span class="w"> </span><span class="ss">"sales_sale"</span><span class="p">.</span><span class="ss">"created"</span><span class="p">,</span><span class="w"> </span><span class="s1">'UTC'</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="ss">"datetimefield"</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="ss">"sales_sale"</span>
<span class="k">WHERE</span>
<span class="w"> </span><span class="ss">"sales_sale"</span><span class="p">.</span><span class="ss">"created"</span><span class="w"> </span><span class="k">IS</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="ss">"datetimefield"</span><span class="w"> </span><span class="k">ASC</span><span class="p">;</span>
</pre></div>
<p>When a year is selected, Django will execute a query to produce the next level in the hierarchy - months:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="k">DISTINCT</span>
<span class="w"> </span><span class="n">django_datetime_trunc</span><span class="p">(</span><span class="s1">'month'</span><span class="p">,</span><span class="w"> </span><span class="ss">"sales_sale"</span><span class="p">.</span><span class="ss">"created"</span><span class="p">,</span><span class="w"> </span><span class="s1">'UTC'</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="ss">"datetimefield"</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="ss">"sales_sale"</span>
<span class="k">WHERE</span>
<span class="w"> </span><span class="ss">"sales_sale"</span><span class="p">.</span><span class="ss">"created"</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="s1">'2017-01-01 00:00:00'</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="s1">'2017-12-31 23:59:59.999999'</span>
<span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="ss">"sales_sale"</span><span class="p">.</span><span class="ss">"created"</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="s1">'2017-01-01 00:00:00'</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="s1">'2017-12-31 23:59:59.999999'</span>
<span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="ss">"sales_sale"</span><span class="p">.</span><span class="ss">"created"</span><span class="w"> </span><span class="k">IS</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="ss">"datetimefield"</span><span class="w"> </span><span class="k">ASC</span><span class="p">;</span>
</pre></div>
<h3 id="how-expensive-is-it"><a class="toclink" href="#how-expensive-is-it">How Expensive is it?</a></h3>
<p>The date hierarchy is a great feature but it comes at a price.</p>
<p>To illustrate the problem we created a simple <code>Sale</code> model with two fields - id and created, and populated it with ~1,000,000 rows.</p>
<p>Using <a href="https://github.com/jazzband/django-debug-toolbar" rel="noopener">django-debug-toolbar</a> we can see how long it takes Django to produce the date hierarchy:</p>
<figure><img alt="Breakdown of SQL queries executed by Django Admin" src="https://hakibenita.com/images/02-scaling-django-admin-date-hierarchy.png"><figcaption>Breakdown of SQL queries executed by Django Admin</figcaption>
</figure>
<p><strong>WOW! The page took a mind-boggling ~8s to load, of which 7.6 seconds were spent producing the date hierarchy!</strong></p>
<p>Just for comparison, the exact same page without date_hierarchy:</p>
<figure><img alt="Breakdown of SQL queries executed by Django Admin without date hierarchy" src="https://hakibenita.com/images/03-scaling-django-admin-date-hierarchy.png"><figcaption>Breakdown of SQL queries executed by Django Admin without date hierarchy</figcaption>
</figure>
<p><strong>17ms. That's about 99.8% better.</strong></p>
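<p>For the record, this is how the percentage works out from the two rounded timings quoted above (roughly 8 seconds with the date hierarchy, 17 milliseconds without). The variable names are just for the example.</p>

```python
# Rounded timings taken from the two measurements above.
with_hierarchy_ms = 8000
without_hierarchy_ms = 17

# Share of the original load time that was eliminated.
improvement = 1 - without_hierarchy_ms / with_hierarchy_ms

print(f"{improvement:.1%}")
```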
<h3 id="a-possible-solution"><a class="toclink" href="#a-possible-solution">A Possible Solution</a></h3>
<p>Looking at the chart above it's clear that we have a problem. The date hierarchy is weighing down our page, making it nearly unusable.</p>
<p>Django needs to execute a query because it only wants to show dates for which there is data. In our case, <strong>we have sales every day</strong>. Once we make this assumption <strong>we no longer have to query the data to produce a list of dates - we can just show them all!</strong></p>
<p>The idea we came up with is:</p>
<ul>
<li>If the user <strong>selected a month</strong> we show <strong>all of the days in the month</strong>.</li>
<li>If the user <strong>selected a year</strong> we show <strong>all of the months in the year</strong>.</li>
<li>If the user <strong>selected nothing</strong> we need to make an additional assumption - in our case we decided to show <strong>±3 years from the current year</strong>. This is a compromise we were willing to make for the sake of performance and usability.</li>
</ul>
<p>Now that we have the general idea let's dive into the implementation.</p>
<h3 id="implementation"><a class="toclink" href="#implementation">Implementation</a></h3>
<p>Looking at the output above we can see that the queries originate from a template tag called <code>{% date_hierarchy cl %}</code>. The argument <code>cl</code> is the <code>ChangeList</code> created by the <code>ModelAdmin</code>.</p>
<p>The implementation for the <code>date_hierarchy</code> template tag can be found in <a href="https://github.com/django/django/blob/master/django/contrib/admin/templatetags/admin_list.py#L329" rel="noopener">admin_list.py</a>. The interesting part is where the queries are executed.</p>
<p>Let's take a look at how Django produces a list of months for a given year:</p>
<div class="highlight"><pre><span></span><span class="c1"># ...</span>
<span class="n">year_field</span> <span class="o">=</span> <span class="s1">'</span><span class="si">%s</span><span class="s1">__year'</span> <span class="o">%</span> <span class="n">field_name</span>
<span class="c1"># ...</span>
<span class="n">year_lookup</span> <span class="o">=</span> <span class="n">cl</span><span class="o">.</span><span class="n">params</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">year_field</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">link</span><span class="p">(</span><span class="n">filters</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">cl</span><span class="o">.</span><span class="n">get_query_string</span><span class="p">(</span><span class="n">filters</span><span class="p">,</span> <span class="p">[</span><span class="n">field_generic</span><span class="p">])</span>

<span class="c1"># ...</span>
<span class="k">elif</span> <span class="n">year_lookup</span><span class="p">:</span>
<span class="hll">    <span class="n">months</span> <span class="o">=</span> <span class="n">cl</span><span class="o">.</span><span class="n">queryset</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="o">**</span><span class="p">{</span><span class="n">year_field</span><span class="p">:</span> <span class="n">year_lookup</span><span class="p">})</span>
</span><span class="hll">    <span class="n">months</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">months</span><span class="p">,</span> <span class="s1">'dates'</span><span class="p">)(</span><span class="n">field_name</span><span class="p">,</span> <span class="s1">'month'</span><span class="p">)</span>
</span>    <span class="k">return</span> <span class="p">{</span>
        <span class="s1">'show'</span><span class="p">:</span> <span class="kc">True</span><span class="p">,</span>
        <span class="s1">'back'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'link'</span><span class="p">:</span> <span class="n">link</span><span class="p">({}),</span>
            <span class="s1">'title'</span><span class="p">:</span> <span class="n">_</span><span class="p">(</span><span class="s1">'All dates'</span><span class="p">),</span>
        <span class="p">},</span>
        <span class="s1">'choices'</span><span class="p">:</span> <span class="p">[{</span>
            <span class="s1">'link'</span><span class="p">:</span> <span class="n">link</span><span class="p">({</span>
                <span class="n">year_field</span><span class="p">:</span> <span class="n">year_lookup</span><span class="p">,</span>
                <span class="n">month_field</span><span class="p">:</span> <span class="n">month</span><span class="o">.</span><span class="n">month</span><span class="p">,</span>
            <span class="p">}),</span>
            <span class="s1">'title'</span><span class="p">:</span> <span class="n">capfirst</span><span class="p">(</span><span class="n">formats</span><span class="o">.</span><span class="n">date_format</span><span class="p">(</span><span class="n">month</span><span class="p">,</span> <span class="s1">'YEAR_MONTH_FORMAT'</span><span class="p">))</span>
        <span class="p">}</span> <span class="k">for</span> <span class="n">month</span> <span class="ow">in</span> <span class="n">months</span><span class="p">]</span>
    <span class="p">}</span>
<span class="c1"># ...</span>
</pre></div>
<p>Our <code>date_hierarchy</code> is set to <code>created</code>. If we drill down on year 2017 we get the following URL:</p>
<div class="highlight"><pre><span></span>http://localhost:8000/admin/sales/sale/?created__year=2017
</pre></div>
<p>In the template tag <code>year_field</code> is <code>created__year</code> and <code>year_lookup</code> is 2017. The generated query filters on <code>created</code> and stores the list of months that have data in 2017 in the variable <code>months</code>.</p>
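<p>One detail worth keeping in mind is that values arriving from the query string are strings. The sketch below uses only the standard library to parse the URL above; it imitates, rather than reproduces, how the admin ends up with its parameter dict.</p>

```python
from urllib.parse import parse_qsl, urlsplit

url = 'http://localhost:8000/admin/sales/sale/?created__year=2017'

# Roughly what ends up in the parameter dict: keys and values are both strings.
params = dict(parse_qsl(urlsplit(url).query))

year_field = 'created__year'
year_lookup = params.get(year_field)

print(params)
print(type(year_lookup).__name__)  # a str, so it must be converted before date math
print(int(year_lookup) + 1)
```

<p>This is why any code that builds dates from the lookup has to convert it with <code>int()</code> first.</p>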
<p>Let's replace this bit and populate <code>months</code> with a list of all the months
in the year instead:</p>
<div class="highlight"><pre><span></span><span class="c1"># months = cl.queryset.filter(**{year_field: year_lookup})</span>
<span class="c1"># months = getattr(months, 'dates')(field_name, 'month')</span>
<span class="c1"># All months of selected year.</span>
<span class="n">months</span> <span class="o">=</span> <span class="p">(</span>
<span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">year_lookup</span><span class="p">),</span> <span class="n">month</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="k">for</span> <span class="n">month</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">13</span><span class="p">)</span>
<span class="p">)</span>
</pre></div>
<p>After the change:</p>
<figure><img alt="Queries executed by Django Admin after the change" src="https://hakibenita.com/images/04-scaling-django-admin-date-hierarchy.png"><figcaption>Queries executed by Django Admin after the change</figcaption>
</figure>
<p><strong>Awesome! No queries.</strong></p>
<p>And the list view:</p>
<figure><img alt="All months of year 2017 are shown" src="https://hakibenita.com/images/05-scaling-django-admin-date-hierarchy.png"><figcaption>All months of year 2017 are shown</figcaption>
</figure>
<p>Our little change worked! All the months are displayed and <strong>no queries are executed by the date hierarchy</strong>.</p>
<p>Let's do the same for days:</p>
<div class="highlight"><pre><span></span><span class="c1"># days = cl.queryset.filter(**{year_field: year_lookup, month_field: month_lookup})</span>
<span class="c1"># days = getattr(days, dates_or_datetimes)(field_name, 'day')</span>
<span class="c1"># All days of month.</span>
<span class="n">days_in_month</span> <span class="o">=</span> <span class="n">calendar</span><span class="o">.</span><span class="n">monthrange</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">year_lookup</span><span class="p">),</span> <span class="nb">int</span><span class="p">(</span><span class="n">month_lookup</span><span class="p">))[</span><span class="mi">1</span><span class="p">]</span>
<span class="n">first_day_of_month</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">year_lookup</span><span class="p">),</span> <span class="nb">int</span><span class="p">(</span><span class="n">month_lookup</span><span class="p">),</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">days</span> <span class="o">=</span> <span class="p">(</span>
<span class="n">first_day_of_month</span> <span class="o">+</span> <span class="n">datetime</span><span class="o">.</span><span class="n">timedelta</span><span class="p">(</span><span class="n">days</span><span class="o">=</span><span class="n">i</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">days_in_month</span><span class="p">)</span>
<span class="p">)</span>
</pre></div>
<p>We use the <code>calendar</code> module to find out how many days there are in a given month.</p>
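<p>As a quick reminder of what <code>calendar.monthrange</code> returns: a tuple of the weekday of the first day of the month and the number of days in the month. We only need the second item, and it takes leap years into account. The variable names below are just for the example.</p>

```python
import calendar

# monthrange returns (weekday of the 1st, number of days in the month).
first_weekday, days_in_february_2017 = calendar.monthrange(2017, 2)

# 2016 was a leap year, so February is one day longer.
days_in_february_2016 = calendar.monthrange(2016, 2)[1]

print(days_in_february_2017)
print(days_in_february_2016)
```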
<p>Let's handle the years. Remember, we show ±3 years from the current year:</p>
<div class="highlight"><pre><span></span><span class="c1"># years = getattr(cl.queryset, dates_or_datetimes)(field_name, 'year')</span>
<span class="c1"># Three years in each direction.</span>
<span class="n">today</span> <span class="o">=</span> <span class="n">get_today</span><span class="p">()</span>
<span class="n">years</span> <span class="o">=</span> <span class="p">(</span>
<span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="k">for</span> <span class="n">y</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">today</span><span class="o">.</span><span class="n">year</span> <span class="o">-</span> <span class="mi">3</span><span class="p">,</span> <span class="n">today</span><span class="o">.</span><span class="n">year</span> <span class="o">+</span> <span class="mi">3</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">)</span>
</pre></div>
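<p>Put together, the three cases can be sketched as one plain function. This is only an illustration of the rules and not the template tag itself: the name <code>hierarchy_choices</code>, its arguments and the <code>span</code> default are made up for the example, and <code>today</code> can be passed in to make the result predictable.</p>

```python
import calendar
import datetime


def hierarchy_choices(year=None, month=None, today=None, span=3):
    """Return the dates to offer at the current drill-down level, without any query."""
    if year is not None and month is not None:
        # A month is selected: offer every day of that month.
        days_in_month = calendar.monthrange(year, month)[1]
        return [datetime.date(year, month, day) for day in range(1, days_in_month + 1)]

    if year is not None:
        # A year is selected: offer all twelve months.
        return [datetime.date(year, m, 1) for m in range(1, 13)]

    # Nothing is selected: offer `span` years in each direction.
    today = today or datetime.date.today()
    return [datetime.date(y, 1, 1) for y in range(today.year - span, today.year + span + 1)]


print(len(hierarchy_choices(year=2017, month=2)))
print(len(hierarchy_choices(year=2017)))
print([d.year for d in hierarchy_choices(today=datetime.date(2017, 10, 6))])
```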
<h3 id="integration"><a class="toclink" href="#integration">Integration</a></h3>
<p>Up until now we fiddled with Django's source directly, but we can't really ship that. Instead, we <strong>override the template tag to make Django use our implementation</strong>.</p>
<p>Let's copy the function and register a template tag with the same name:</p>
<div class="highlight"><pre><span></span><span class="c1"># app/templatetags/admin_list.py</span>
<span class="kn">from</span> <span class="nn">django.contrib.admin.templatetags.admin_list</span> <span class="kn">import</span> <span class="n">register</span>


<span class="nd">@register</span><span class="o">.</span><span class="n">inclusion_tag</span><span class="p">(</span><span class="s1">'admin/date_hierarchy.html'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">date_hierarchy</span><span class="p">(</span><span class="n">cl</span><span class="p">):</span>
    <span class="c1"># ... (original implementation) ...</span>
</pre></div>
<p>Register the library in our app:</p>
<div class="highlight"><pre><span></span><span class="c1"># settings.py</span>
<span class="n">TEMPLATES</span> <span class="o">=</span> <span class="p">[{</span>
<span class="c1"># ...</span>
<span class="s1">'OPTIONS'</span><span class="p">:</span> <span class="p">{</span>
<span class="c1"># ...</span>
<span class="s1">'libraries'</span><span class="p">:</span> <span class="p">{</span>
<span class="c1"># ...</span>
<span class="s1">'admin_list'</span><span class="p">:</span> <span class="s1">'app.templatetags.admin_list'</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">},</span>
<span class="p">}]</span>
</pre></div>
<p>Now Django uses our template tag instead of its own.</p>
<p>The problem with producing the date hierarchy is really an issue only for very large tables. <strong>We don't want to disable the existing behavior - we want to enable the new behavior only for very large tables</strong>.</p>
<p>Let's add an attribute on the <code>ModelAdmin</code> to turn the default drill-down behavior on and off:</p>
<div class="highlight"><pre><span></span><span class="nd">@admin</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">Sale</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">SaleAdmin</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
<span class="n">date_hierarchy</span> <span class="o">=</span> <span class="s1">'created'</span>
<span class="hll"> <span class="n">date_hierarchy_drilldown</span> <span class="o">=</span> <span class="kc">False</span>
</span></pre></div>
<p>When <code>date_hierarchy_drilldown</code> is set to <code>False</code>, our new template tag will not execute any queries to build the hierarchy. Otherwise, we preserve the original behavior.</p>
<p>To implement this we add the following at the start of our implementation of the <code>date_hierarchy</code> template tag:</p>
<div class="highlight"><pre><span></span><span class="n">date_hierarchy_drilldown</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span>
<span class="n">cl</span><span class="o">.</span><span class="n">model_admin</span><span class="p">,</span>
<span class="s1">'date_hierarchy_drilldown'</span><span class="p">,</span>
<span class="kc">True</span><span class="p">,</span>
<span class="p">)</span>
</pre></div>
<p>Now we can re-enable the default behavior when <code>date_hierarchy_drilldown=True</code>.</p>
<p>For example, producing a list of months:</p>
<div class="highlight"><pre><span></span><span class="c1"># ...</span>
<span class="k">if</span> <span class="n">date_hierarchy_drilldown</span><span class="p">:</span>
<span class="n">months</span> <span class="o">=</span> <span class="n">cl</span><span class="o">.</span><span class="n">queryset</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="o">**</span><span class="p">{</span><span class="n">year_field</span><span class="p">:</span> <span class="n">year_lookup</span><span class="p">})</span>
<span class="n">months</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">months</span><span class="p">,</span> <span class="n">dates_or_datetimes</span><span class="p">)(</span><span class="n">field_name</span><span class="p">,</span> <span class="s1">'month'</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="c1"># All months of selected year.</span>
<span class="n">months</span> <span class="o">=</span> <span class="p">(</span>
<span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">year_lookup</span><span class="p">),</span> <span class="n">month</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="k">for</span> <span class="n">month</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">13</span><span class="p">)</span>
<span class="p">)</span>
<span class="c1"># ...</span>
</pre></div>
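<p>To see the two pieces in isolation, here is a minimal sketch outside of Django. The helper names <code>drilldown_enabled</code> and <code>lightweight_months</code> are made up for this illustration and are not part of Django or of the template tag above; they only mirror the <code>getattr</code> default and the <code>else</code> branch:</p>

```python
import datetime


def drilldown_enabled(model_admin):
    # Hypothetical helper: mirrors the getattr() call in the template tag.
    # A ModelAdmin that does not define the attribute keeps the default behavior.
    return getattr(model_admin, 'date_hierarchy_drilldown', True)


def lightweight_months(year_lookup):
    # Hypothetical helper: the first day of every month in the selected year,
    # produced in plain Python without querying the table.
    return [datetime.date(int(year_lookup), month, 1) for month in range(1, 13)]


class PlainAdmin:
    pass


class LargeTableAdmin:
    date_hierarchy_drilldown = False


print(drilldown_enabled(PlainAdmin()))       # attribute missing, default applies
print(drilldown_enabled(LargeTableAdmin()))  # explicitly turned off
print(lightweight_months('2017')[:2])
```

<p>The trade-off is visible here: the lightweight version always offers all twelve months, including months that have no rows.</p>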
<p>That's it! We've successfully scaled the date hierarchy to handle millions of rows with a small compromise on UX.</p>
<h3 id="package"><a class="toclink" href="#package">Package</a></h3>
<p>We found this approach useful in several projects so we decided to publish it as a package.</p>
<p>Using it takes three steps.</p>
<p>Install it:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>django-admin-lightweight-date-hierarchy
</pre></div>
<p>Add it to your INSTALLED_APPS:</p>
<div class="highlight"><pre><span></span><span class="n">INSTALLED_APPS</span> <span class="o">=</span> <span class="p">(</span>
<span class="s1">'django_admin_lightweight_date_hierarchy'</span><span class="p">,</span>
<span class="p">)</span>
</pre></div>
<p>Set <code>date_hierarchy_drilldown</code> to False on any ModelAdmin with date_hierarchy to prevent the default drill-down behavior:</p>
<div class="highlight"><pre><span></span><span class="nd">@admin</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">MyModel</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">MyModelAdmin</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
<span class="n">date_hierarchy</span> <span class="o">=</span> <span class="s1">'created'</span>
<span class="hll"> <span class="n">date_hierarchy_drilldown</span> <span class="o">=</span> <span class="kc">False</span>
</span></pre></div>
<p>To squeeze some more juice out of Django Admin check out this post as well:</p>
<p><strong>Cheers!</strong></p>How We Replaced Dozens of Test Fixtures With One Simple Function2017-08-16T00:00:00+03:002017-08-16T00:00:00+03:00Haki Benitatag:hakibenita.com,2017-08-16:/how-we-replaced-dozens-of-test-fixtures-with-one-simple-function<p>We had a large codebase and a lot of tests. Unfortunately, a lot of our tests were a relic from when we were using fixtures extensively. In this article, we describe a different approach that reduced the number of fixtures we maintain.</p><hr>
<p>It all started when we added feature flags to our app. After some deliberation we created a "feature set" model with boolean fields for each feature:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">FeatureSet</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
<span class="n">name</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">50</span><span class="p">)</span>
<span class="n">can_pay_with_credit_card</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">BooleanField</span><span class="p">()</span>
<span class="n">can_save_credit_card</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">BooleanField</span><span class="p">()</span>
<span class="n">can_receive_email_notifications</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">BooleanField</span><span class="p">()</span>
</pre></div>
<p>We added a foreign key from the user account to the feature sets model, and created feature sets for "pro", "newbie" and "commercial" users.</p>
<p>To enforce the features we added checks in the appropriate places. For example:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">pay_with_credit_card</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">user_account</span><span class="p">,</span> <span class="n">amount</span><span class="p">):</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">user_account</span><span class="o">.</span><span class="n">feature_set</span><span class="o">.</span><span class="n">can_pay_with_credit_card</span><span class="p">:</span>
<span class="k">raise</span> <span class="n">FeatureDisabled</span><span class="p">(</span><span class="s1">'can_pay_with_credit_card'</span><span class="p">)</span>
<span class="c1"># ...</span>
</pre></div>
<h3 id="the-problem"><a class="toclink" href="#the-problem">The Problem</a></h3>
<p>At this point we had a large codebase and a lot of tests. Unfortunately, a lot of our tests were a relic from when we were using fixtures extensively.</p>
<p>The thought of having to update existing fixtures and add new ones was unacceptable. But we still had to test the new features, so we started writing tests like this:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">test_should_charge_credit_card</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="hll"> <span class="n">feature_set</span> <span class="o">=</span> <span class="n">user_account</span><span class="o">.</span><span class="n">feature_set</span>
</span><span class="hll"> <span class="n">feature_set</span><span class="o">.</span><span class="n">can_pay_with_credit_card</span> <span class="o">=</span> <span class="kc">True</span>
</span> <span class="n">feature_set</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">update_fields</span><span class="o">=</span><span class="p">[</span><span class="s1">'can_pay_with_credit_card'</span><span class="p">])</span>
<span class="n">pay_with_credit_card</span><span class="p">(</span><span class="n">user_account</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_should_fail_when_feature_disabled</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="hll"> <span class="n">feature_set</span> <span class="o">=</span> <span class="n">user_account</span><span class="o">.</span><span class="n">feature_set</span>
</span><span class="hll"> <span class="n">feature_set</span><span class="o">.</span><span class="n">can_pay_with_credit_card</span> <span class="o">=</span> <span class="kc">False</span>
</span> <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">assertRaises</span><span class="p">(</span><span class="n">FeatureDisabled</span><span class="p">):</span>
<span class="n">pay_with_credit_card</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">user_account</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
</pre></div>
<p>We had a lot of tests to update, and some of the features we added broke the flow of other tests, which resulted in <strong>a mess!</strong></p>
<h3 id="the-context-manager"><a class="toclink" href="#the-context-manager">The Context Manager</a></h3>
<p><a href="/how-to-test-django-signals-like-a-pro#enter-context-manager">We already used context managers to improve our tests in the past</a>, and we thought we could use one here to turn features on and off:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">contextlib</span> <span class="kn">import</span> <span class="n">contextmanager</span>
<span class="nd">@contextmanager</span>
<span class="k">def</span> <span class="nf">feature</span><span class="p">(</span><span class="n">feature_set</span><span class="p">,</span> <span class="n">feature_name</span><span class="p">,</span> <span class="n">enabled</span><span class="p">):</span>
<span class="n">original_value</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">feature_set</span><span class="p">,</span> <span class="n">feature_name</span><span class="p">)</span>
<span class="nb">setattr</span><span class="p">(</span><span class="n">feature_set</span><span class="p">,</span> <span class="n">feature_name</span><span class="p">,</span> <span class="n">enabled</span><span class="p">)</span>
<span class="n">feature_set</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">update_fields</span><span class="o">=</span><span class="p">[</span><span class="n">feature_name</span><span class="p">])</span>
<span class="k">try</span><span class="p">:</span>
<span class="k">yield</span>
<span class="k">finally</span><span class="p">:</span>
<span class="nb">setattr</span><span class="p">(</span><span class="n">feature_set</span><span class="p">,</span> <span class="n">feature_name</span><span class="p">,</span> <span class="n">original_value</span><span class="p">)</span>
<span class="n">feature_set</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">update_fields</span><span class="o">=</span><span class="p">[</span><span class="n">feature_name</span><span class="p">])</span>
</pre></div>
<p>What does this context manager do?</p>
<ol>
<li>Save the original value of the feature.</li>
<li>Set the new value for the feature.</li>
<li>Yield - this is where our test code actually executes.</li>
<li>Set the feature back to the original value.</li>
</ol>
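<p>The <code>try</code>/<code>finally</code> around the <code>yield</code> is what makes step 4 reliable: the original value is restored even when the test body raises. A minimal sketch of that behavior, assuming a hypothetical <code>FakeFeatureSet</code> class that stands in for the Django model and only records calls to <code>save</code>:</p>

```python
from contextlib import contextmanager


class FakeFeatureSet:
    """Hypothetical stand-in for the FeatureSet model (no database involved)."""

    def __init__(self):
        self.can_pay_with_credit_card = False
        self.saved = []

    def save(self, update_fields=None):
        # Record which fields were "saved" instead of hitting a database.
        self.saved.append(list(update_fields or []))


@contextmanager
def feature(feature_set, feature_name, enabled):
    original_value = getattr(feature_set, feature_name)
    setattr(feature_set, feature_name, enabled)
    feature_set.save(update_fields=[feature_name])
    try:
        yield
    finally:
        # Runs on normal exit and when the body raises.
        setattr(feature_set, feature_name, original_value)
        feature_set.save(update_fields=[feature_name])


feature_set = FakeFeatureSet()
try:
    with feature(feature_set, 'can_pay_with_credit_card', True):
        value_inside = feature_set.can_pay_with_credit_card
        raise RuntimeError('simulated failure inside the test body')
except RuntimeError:
    pass

value_after = feature_set.can_pay_with_credit_card
```

<p>Note that the feature name is passed as a string, because the context manager looks it up with <code>getattr</code> and <code>setattr</code>.</p>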
<p>This made our tests much more elegant:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">test_should_charge_credit_card</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="hll"> <span class="k">with</span> <span class="n">feature</span><span class="p">(</span><span class="n">user_account</span><span class="o">.</span><span class="n">feature_set</span><span class="p">,</span> <span class="s1">'can_pay_with_credit_card'</span><span class="p">,</span> <span class="kc">True</span><span class="p">):</span>
</span> <span class="n">pay_with_credit_card</span><span class="p">(</span><span class="n">user_account</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_should_fail_when_feature_disabled</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="hll"> <span class="k">with</span> <span class="n">feature</span><span class="p">(</span><span class="n">user_account</span><span class="o">.</span><span class="n">feature_set</span><span class="p">,</span> <span class="s1">'can_pay_with_credit_card'</span><span class="p">,</span> <span class="kc">False</span><span class="p">):</span>
</span> <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">assertRaises</span><span class="p">(</span><span class="n">FeatureDisabled</span><span class="p">):</span>
<span class="n">pay_with_credit_card</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">user_account</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
</pre></div>
<h3 id="kwargs"><a class="toclink" href="#kwargs">**kwargs</a></h3>
<p>This context manager has proven to be very useful for features so we thought... <strong>why not use it for other things as well?</strong></p>
<p>We had a lot of tests involving more than one feature:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">test_should_not_send_notification</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">feature_set</span> <span class="o">=</span> <span class="n">user_account</span><span class="o">.</span><span class="n">feature_set</span>
<span class="hll"> <span class="k">with</span> <span class="n">feature</span><span class="p">(</span><span class="n">feature_set</span><span class="p">,</span> <span class="s1">'can_pay_with_credit_card'</span><span class="p">,</span> <span class="kc">True</span><span class="p">):</span>
</span><span class="hll"> <span class="k">with</span> <span class="n">feature</span><span class="p">(</span><span class="n">feature_set</span><span class="p">,</span> <span class="s1">'can_receive_email_notifications'</span><span class="p">,</span> <span class="kc">False</span><span class="p">):</span>
</span> <span class="n">pay_with_credit_card</span><span class="p">(</span><span class="n">user_account</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
</pre></div>
<p>Or more than one object:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">test_should_not_send_notification_to_inactive_user</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">feature_set</span> <span class="o">=</span> <span class="n">user_account</span><span class="o">.</span><span class="n">feature_set</span>
<span class="hll"> <span class="n">user_account</span><span class="o">.</span><span class="n">user</span><span class="o">.</span><span class="n">is_active</span> <span class="o">=</span> <span class="kc">False</span>
</span><span class="hll"> <span class="k">with</span> <span class="n">feature</span><span class="p">(</span><span class="n">feature_set</span><span class="p">,</span> <span class="s1">'can_receive_email_notifications'</span><span class="p">,</span> <span class="kc">False</span><span class="p">):</span>
</span> <span class="n">pay_with_credit_card</span><span class="p">(</span><span class="n">user_account</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
</pre></div>
<p>So we rewrote the context manager to accept any object and added support for multiple arguments:</p>
<div class="highlight"><pre><span></span><span class="nd">@contextmanager</span>
<span class="k">def</span> <span class="nf">temporarily</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="hll"> <span class="n">original_values</span> <span class="o">=</span> <span class="p">{</span><span class="n">k</span><span class="p">:</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">k</span><span class="p">)</span> <span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="n">kwargs</span><span class="p">}</span>
</span>
<span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">kwargs</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
<span class="nb">setattr</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="p">)</span>
<span class="n">obj</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">update_fields</span><span class="o">=</span><span class="n">kwargs</span><span class="o">.</span><span class="n">keys</span><span class="p">())</span>
<span class="k">try</span><span class="p">:</span>
<span class="hll"> <span class="k">yield</span>
</span>
<span class="k">finally</span><span class="p">:</span>
<span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">original_values</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
<span class="nb">setattr</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="p">)</span>
<span class="hll"> <span class="n">obj</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">update_fields</span><span class="o">=</span><span class="n">original_values</span><span class="o">.</span><span class="n">keys</span><span class="p">())</span>
</span></pre></div>
<p>The context manager can now accept multiple features, save the original values,
set the new values and restore when we are done.</p>
<p>Testing became much easier:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">test_should_not_send_notification</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="hll"> <span class="k">with</span> <span class="n">temporarily</span><span class="p">(</span>
</span><span class="hll"> <span class="n">user_account</span><span class="o">.</span><span class="n">feature_set</span><span class="p">,</span>
</span><span class="hll"> <span class="n">can_pay_with_credit_card</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
</span><span class="hll"> <span class="n">can_receive_email_notifications</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
</span><span class="hll"> <span class="p">):</span>
</span> <span class="n">pay_with_credit_card</span><span class="p">(</span><span class="n">user_account</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">outbox</span><span class="p">),</span> <span class="mi">0</span><span class="p">)</span>
</pre></div>
<p>We can now use the function on other objects as well:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">test_should_fail_to_login_inactive_user</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="hll"> <span class="k">with</span> <span class="n">temporarily</span><span class="p">(</span><span class="n">user</span><span class="p">,</span> <span class="n">is_active</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
</span> <span class="n">response</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">login</span><span class="p">(</span><span class="n">user</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">response</span><span class="o">.</span><span class="n">status_code</span><span class="p">,</span> <span class="mi">400</span><span class="p">)</span>
</pre></div>
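<p>The function still changes one object at a time, but a single <code>with</code> statement can hold several context managers, which covers the "more than one object" case. A minimal sketch, assuming a hypothetical <code>Record</code> class as a stand-in for Django models and using the field name from the <code>FeatureSet</code> model:</p>

```python
from contextlib import contextmanager


class Record:
    """Hypothetical stand-in for a Django model instance."""

    def __init__(self, **fields):
        self.__dict__.update(fields)

    def save(self, update_fields=None):
        # A real model would write the listed fields to the database.
        pass


@contextmanager
def temporarily(obj, **kwargs):
    original_values = {k: getattr(obj, k) for k in kwargs}
    for k, v in kwargs.items():
        setattr(obj, k, v)
    obj.save(update_fields=list(kwargs))
    try:
        yield
    finally:
        for k, v in original_values.items():
            setattr(obj, k, v)
        obj.save(update_fields=list(original_values))


user = Record(is_active=True)
feature_set = Record(can_receive_email_notifications=True)

# Two objects changed in one with statement; they are restored in reverse order.
with temporarily(user, is_active=False), \
        temporarily(feature_set, can_receive_email_notifications=False):
    inside = (user.is_active, feature_set.can_receive_email_notifications)

after = (user.is_active, feature_set.can_receive_email_notifications)
```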
<p><strong>Profit!</strong></p>
<hr>
<h3 id="the-hidden-performance-benefit"><a class="toclink" href="#the-hidden-performance-benefit">The Hidden Performance Benefit</a></h3>
<p>After getting comfortable with the new utility, we noticed another benefit: performance. In tests that had heavy setups we managed to move the setup from the test level to the class level.</p>
<p>To illustrate the difference let's test a function that sends an invoice to the users. Invoices are usually sent only when the transaction is complete. To create a complete transaction we need a lot of setup (choose products, checkout, issue payment etc).</p>
<p>This is a test that requires a lot of setup:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">TestSendInvoice</span><span class="p">(</span><span class="n">TestCase</span><span class="p">):</span>
<span class="hll"> <span class="k">def</span> <span class="nf">setUp</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span> <span class="bp">self</span><span class="o">.</span><span class="n">user</span> <span class="o">=</span> <span class="n">User</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create_user</span><span class="p">(</span> <span class="o">...</span> <span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">transaction</span> <span class="o">=</span> <span class="n">Transaction</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">user</span><span class="p">,</span> <span class="o">...</span> <span class="p">)</span>
<span class="n">Transaction</span><span class="o">.</span><span class="n">add_product</span><span class="p">(</span> <span class="o">...</span> <span class="p">)</span>
<span class="n">Transaction</span><span class="o">.</span><span class="n">add_product</span><span class="p">(</span> <span class="o">...</span> <span class="p">)</span>
<span class="n">Transaction</span><span class="o">.</span><span class="n">checkout</span><span class="p">(</span> <span class="o">...</span> <span class="p">)</span>
<span class="n">Transaction</span><span class="o">.</span><span class="n">request_payment</span><span class="p">(</span> <span class="o">...</span> <span class="p">)</span>
<span class="n">Transaction</span><span class="o">.</span><span class="n">process_payment</span><span class="p">(</span> <span class="o">...</span> <span class="p">)</span>
<span class="k">def</span> <span class="nf">test_should_not_send_invoice_to_commercial_user</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">user</span><span class="o">.</span><span class="n">type</span> <span class="o">=</span> <span class="s1">'commercial'</span>
<span class="n">mail</span><span class="o">.</span><span class="n">outbox</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">Transaction</span><span class="o">.</span><span class="n">send_invoice</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">user</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">mail</span><span class="o">.</span><span class="n">outbox</span><span class="p">),</span> <span class="mi">0</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_should_attach_special_offer_to_pro_user</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">user</span><span class="o">.</span><span class="n">type</span> <span class="o">=</span> <span class="s1">'pro'</span>
<span class="n">mail</span><span class="o">.</span><span class="n">outbox</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">Transaction</span><span class="o">.</span><span class="n">send_invoice</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">user</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">mail</span><span class="o">.</span><span class="n">outbox</span><span class="p">),</span> <span class="mi">1</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span>
<span class="n">mail</span><span class="o">.</span><span class="n">outbox</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">subject</span><span class="p">,</span>
<span class="s1">'Invoice and a special offer!'</span>
<span class="p">)</span>
</pre></div>
<p>The <code>setUp</code> function needs to execute before each test function because the test functions change the objects, and that might create a dangerous dependency between test cases.</p>
<p>To prevent dependencies between test cases we need to make sure each test leaves the data exactly as it got it. Luckily, this is exactly what our new context manager does:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">TestSendInvoice</span><span class="p">(</span><span class="n">TestCase</span><span class="p">):</span>
<span class="nd">@classmethod</span>
<span class="hll"> <span class="k">def</span> <span class="nf">setUpTestData</span><span class="p">(</span><span class="bp">cls</span><span class="p">):</span>
</span> <span class="bp">cls</span><span class="o">.</span><span class="n">user</span> <span class="o">=</span> <span class="n">User</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create_user</span><span class="p">(</span> <span class="o">...</span> <span class="p">)</span>
<span class="bp">cls</span><span class="o">.</span><span class="n">transaction</span> <span class="o">=</span> <span class="n">Transaction</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="bp">cls</span><span class="o">.</span><span class="n">user</span><span class="p">,</span> <span class="o">...</span> <span class="p">)</span>
<span class="n">Transaction</span><span class="o">.</span><span class="n">add_product</span><span class="p">(</span> <span class="o">...</span> <span class="p">)</span>
<span class="n">Transaction</span><span class="o">.</span><span class="n">add_product</span><span class="p">(</span> <span class="o">...</span> <span class="p">)</span>
<span class="n">Transaction</span><span class="o">.</span><span class="n">checkout</span><span class="p">(</span> <span class="o">...</span> <span class="p">)</span>
<span class="n">Transaction</span><span class="o">.</span><span class="n">request_payment</span><span class="p">(</span> <span class="o">...</span> <span class="p">)</span>
<span class="n">Transaction</span><span class="o">.</span><span class="n">process_payment</span><span class="p">(</span> <span class="o">...</span> <span class="p">)</span>
<span class="k">def</span> <span class="nf">test_should_not_send_invoice_to_commercial_user</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">mail</span><span class="o">.</span><span class="n">outbox</span> <span class="o">=</span> <span class="p">[]</span>
<span class="hll"> <span class="k">with</span> <span class="n">temporarily</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">user</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="s1">'commercial'</span><span class="p">):</span>
</span> <span class="n">Transaction</span><span class="o">.</span><span class="n">send_invoice</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">user</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">mail</span><span class="o">.</span><span class="n">outbox</span><span class="p">),</span> <span class="mi">0</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_should_attach_special_offer_to_pro_user</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">mail</span><span class="o">.</span><span class="n">outbox</span> <span class="o">=</span> <span class="p">[]</span>
<span class="hll"> <span class="k">with</span> <span class="n">temporarily</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">user</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="s1">'pro'</span><span class="p">):</span>
</span> <span class="n">Transaction</span><span class="o">.</span><span class="n">send_invoice</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">user</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">mail</span><span class="o">.</span><span class="n">outbox</span><span class="p">),</span> <span class="mi">1</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">mail</span><span class="o">.</span><span class="n">outbox</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">subject</span><span class="p">,</span> <span class="s1">'Invoice and a special offer!'</span><span class="p">)</span>
</pre></div>
<p>We moved the setup code to setUpTestData. <strong>The setup code will execute only once for the entire test class resulting in quicker tests.</strong></p>
<hr>
<h3 id="final-words"><a class="toclink" href="#final-words">Final Words</a></h3>
<p>The motivation for this context manager was our long, unhealthy relationship with fixtures. As we scaled our app the fixtures became a burden. Having so many tests rely on them made them difficult to replace completely.</p>
<p>As we added features we knew we did not want to rely on fixtures any more, and we looked for creative, more verbose and maintainable ways of managing test data. Having a simple way to create different variations of an object for testing was exactly what we needed.</p>How to Manage Concurrency in Django Models2017-07-07T00:00:00+03:002017-07-07T00:00:00+03:00Haki Benitatag:hakibenita.com,2017-07-07:/how-to-manage-concurrency-in-django-models<p>The days of desktop systems serving single users are long gone. Web applications nowadays are serving millions of users at the same time. With many users comes a wide range of new problems: concurrency problems. In this article we describe two approaches for managing concurrency in Django models.</p><hr>
<p>The days of desktop systems serving single users are long gone. Web applications nowadays are serving millions of users at the same time. With many users comes a wide range of new problems: <strong>concurrency problems</strong>.</p>
<h3 id="the-problem"><a class="toclink" href="#the-problem">The Problem</a></h3>
<p>To demonstrate common concurrency issues we are going to work on a bank account
model:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Account</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="nb">id</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">AutoField</span><span class="p">(</span>
        <span class="n">primary_key</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="n">user</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span>
        <span class="n">User</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="n">balance</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">IntegerField</span><span class="p">(</span>
        <span class="n">default</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span>
    <span class="p">)</span>
</pre></div>
<p>To get started we are going to implement naive deposit and withdraw methods for an account instance:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">deposit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">amount</span><span class="p">):</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">balance</span> <span class="o">+=</span> <span class="n">amount</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>

<span class="k">def</span> <span class="nf">withdraw</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">amount</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">amount</span> <span class="o">></span> <span class="bp">self</span><span class="o">.</span><span class="n">balance</span><span class="p">:</span>
        <span class="k">raise</span> <span class="n">errors</span><span class="o">.</span><span class="n">InsufficientFunds</span><span class="p">()</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">balance</span> <span class="o">-=</span> <span class="n">amount</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
</pre></div>
<p>This seems innocent enough and it might even pass unit tests and integration tests on localhost. But, <strong>what happens when two users perform actions on the same account at the same time?</strong></p>
<ol>
<li>User A fetches the account<ul>
<li>balance is 100$</li>
</ul>
</li>
<li>User B fetches the account<ul>
<li>balance is 100$</li>
</ul>
</li>
<li>User B withdraws 30$<ul>
<li>balance is updated to 100$ - 30$ = 70$</li>
</ul>
</li>
<li>User A deposits 50$<ul>
<li>balance is updated to 100$ + 50$ = 150$</li>
</ul>
</li>
</ol>
<h4 id="what-happened-here"><a class="toclink" href="#what-happened-here">What Happened Here?</a></h4>
<p>User B asked to withdraw 30$ and user A deposited 50$. We expect the balance to be 120$, but we ended up with 150$.</p>
<h4 id="why-did-it-happen"><a class="toclink" href="#why-did-it-happen">Why Did it Happen?</a></h4>
<p>At step 4, when user A updated the balance, the amount he had stored in memory was stale (user B had already withdrawn 30$). To prevent this situation from happening we need to <strong>make sure the resource we are working on is not altered while we are working on it.</strong></p>
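<p>To make the stale read concrete, here is a minimal sketch that replays the four steps without Django. The <code>database</code> dict and the <code>StaleAccount</code> class are hypothetical stand-ins for the table and the model instance; they only mimic "fetch a copy, change it, save it back":</p>

```python
# In-memory stand-in for the accounts table (hypothetical, not Django).
database = {"balance": 100}


class StaleAccount:
    """Mimics a model instance: copies the balance at fetch time."""

    def __init__(self):
        self.balance = database["balance"]

    def deposit(self, amount):
        self.balance += amount
        database["balance"] = self.balance  # plays the role of save()

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        database["balance"] = self.balance  # plays the role of save()


user_a = StaleAccount()  # step 1: user A sees 100
user_b = StaleAccount()  # step 2: user B sees 100
user_b.withdraw(30)      # step 3: 70 is written
user_a.deposit(50)       # step 4: 100 + 50 is written over the 70

print(database["balance"])  # 150, although we expected 120
```

<p>Nothing here runs in parallel. The bug only needs two copies of the same row to be alive at the same time.</p>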
<hr>
<h3 id="pessimistic-approach"><a class="toclink" href="#pessimistic-approach">Pessimistic Approach</a></h3>
<p>The pessimistic approach dictates that you should <strong>lock the resource exclusively until you are finished with it</strong>. If nobody else can acquire a lock on the object while you are working on it, you can be sure the object was not changed.</p>
<p>To acquire a lock on a resource we use a <strong>database lock</strong> for several reasons:</p>
<ol>
<li>(relational) <strong>databases are very good at managing locks</strong> and maintaining consistency.</li>
<li>The database is the lowest level at which data is accessed - acquiring the lock at the lowest level will <strong>protect the data from other processes</strong> modifying the data as well. For example, direct updates in the DB, cron jobs, cleanup tasks, etc.</li>
<li>A Django app <strong>can run on multiple processes</strong> (e.g. workers). Maintaining locks at the app level will require a lot of (unnecessary) work.</li>
</ol>
<h4 id="implementation"><a class="toclink" href="#implementation">Implementation</a></h4>
<p>Let's implement safe deposit and withdraw actions using a pessimistic approach:</p>
<div class="highlight"><pre><span></span><span class="nd">@classmethod</span>
<span class="k">def</span> <span class="nf">deposit</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">amount</span><span class="p">):</span>
<span class="hll">    <span class="k">with</span> <span class="n">transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
</span>        <span class="n">account</span> <span class="o">=</span> <span class="p">(</span>
            <span class="bp">cls</span><span class="o">.</span><span class="n">objects</span>
<span class="hll">            <span class="o">.</span><span class="n">select_for_update</span><span class="p">()</span>
</span>            <span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="nb">id</span><span class="p">)</span>
        <span class="p">)</span>
        <span class="n">account</span><span class="o">.</span><span class="n">balance</span> <span class="o">+=</span> <span class="n">amount</span>
        <span class="n">account</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
    <span class="k">return</span> <span class="n">account</span>

<span class="nd">@classmethod</span>
<span class="k">def</span> <span class="nf">withdraw</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">amount</span><span class="p">):</span>
<span class="hll">    <span class="k">with</span> <span class="n">transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
</span>        <span class="n">account</span> <span class="o">=</span> <span class="p">(</span>
            <span class="bp">cls</span><span class="o">.</span><span class="n">objects</span>
<span class="hll">            <span class="o">.</span><span class="n">select_for_update</span><span class="p">()</span>
</span>            <span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="nb">id</span><span class="p">)</span>
        <span class="p">)</span>
        <span class="k">if</span> <span class="n">account</span><span class="o">.</span><span class="n">balance</span> <span class="o"><</span> <span class="n">amount</span><span class="p">:</span>
            <span class="k">raise</span> <span class="n">errors</span><span class="o">.</span><span class="n">InsufficientFunds</span><span class="p">()</span>
        <span class="n">account</span><span class="o">.</span><span class="n">balance</span> <span class="o">-=</span> <span class="n">amount</span>
        <span class="n">account</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
    <span class="k">return</span> <span class="n">account</span>
</pre></div>
<ol>
<li>We use <code>select_for_update</code> on our queryset to tell the database to lock the object until the transaction is done.</li>
<li>Locking a row in the database requires a database transaction. We use Django's <code>transaction.atomic()</code> as a context manager to scope the transaction.</li>
<li>We use a <code>classmethod</code> instead of an instance method. To acquire the lock we need to tell the database to lock the row, and to do that we need to be the ones fetching the object from the database. When operating on <em>self</em> the object is already fetched and we don't have any guarantee that it was locked.</li>
<li>All the operations on the account are executed within the database transaction.</li>
</ol>
<p>Let's see how the scenario from earlier is prevented with our new implementation:</p>
<ol>
<li>User A asks to withdraw 30$:<ul>
<li>User A acquires a lock on the account</li>
<li>Balance is 100$</li>
</ul>
</li>
<li>User B asks to deposit 50$:<ul>
<li>Attempt to acquire lock on account fails (locked by user A)</li>
<li>User B waits for the lock to release</li>
</ul>
</li>
<li>User A withdraws 30$:<ul>
<li>Balance is 70$</li>
<li>Lock of user A on account is released</li>
</ul>
</li>
<li>User B acquires a lock on the account<ul>
<li>Balance is 70$</li>
<li>New balance is 70$ + 50$ = 120$</li>
</ul>
</li>
<li>Lock of user B on account is released, balance is 120$. Bug prevented!</li>
</ol>
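<p>The same sequence can be sketched outside Django. In this rough analogy a <code>threading.Lock</code> stands in for the row lock the database takes for <code>select_for_update</code>; it is an assumption made for illustration and not how Django implements it. The point is only that the balance is read <em>after</em> the lock is acquired:</p>

```python
import threading

balance = 100
account_lock = threading.Lock()  # stand-in for the database row lock


def deposit(amount):
    global balance
    with account_lock:         # wait here until nobody else holds the lock
        current = balance      # read only after the lock is ours
        balance = current + amount


def withdraw(amount):
    global balance
    with account_lock:
        current = balance
        if current < amount:
            raise ValueError("insufficient funds")
        balance = current - amount


threads = [
    threading.Thread(target=withdraw, args=(30,)),
    threading.Thread(target=deposit, args=(50,)),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(balance)  # 120, whichever thread gets the lock first
```

<p>A lock like this protects only a single process, which is exactly why the real implementation leaves locking to the database.</p>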
<h4 id="what-you-need-to-know-about-select_for_update"><a class="toclink" href="#what-you-need-to-know-about-select_for_update">What You Need to Know About <code>select_for_update</code></a></h4>
<ul>
<li><strong>You don't have to wait for the lock to release</strong> - In our scenario, user B waited for user A to release the lock. Instead of waiting, we can tell Django not to wait for the lock to release and raise a <code>DatabaseError</code> instead. To do that, we set <code>select_for_update(nowait=True)</code>.</li>
<li><strong>Select related objects are also locked</strong> - Using <code>select_for_update</code> with <code>select_related</code> locks the related objects as well.<br>For example, if we <code>select_related</code> the user along with the account, both the user and the account are locked. If during a deposit someone tries to update the user's first name, that update will be blocked because the user object is locked.<br> If you are using PostgreSQL or Oracle this might not be a problem soon, thanks to <a href="https://docs.djangoproject.com/en/dev/ref/models/querysets/#select-for-update" rel="noopener">a new feature</a> in the upcoming Django 2.0. In this version, <code>select_for_update</code> has an <code>of</code> option to explicitly state which of the tables in the query to lock.</li>
</ul>
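<p>The difference between waiting and not waiting can be sketched with the standard library. This is an analogy and not Django code: <code>acquire(blocking=False)</code> plays the role of <code>nowait=True</code>, and the hypothetical <code>LockNotAvailable</code> exception plays the role of the <code>DatabaseError</code> Django raises:</p>

```python
import threading

account_lock = threading.Lock()  # stand-in for the database row lock


class LockNotAvailable(Exception):
    """Hypothetical error, playing the role of Django's DatabaseError."""


def deposit_nowait(account, amount):
    # blocking=False returns immediately instead of waiting for the lock.
    if not account_lock.acquire(blocking=False):
        raise LockNotAvailable("account is locked by someone else")
    try:
        account["balance"] += amount
    finally:
        account_lock.release()


account = {"balance": 100}

account_lock.acquire()           # user A is holding the lock
try:
    deposit_nowait(account, 50)  # user B refuses to wait
    outcome = "deposited"
except LockNotAvailable:
    outcome = "rejected"
finally:
    account_lock.release()       # user A is done

deposit_nowait(account, 50)      # the lock is free, so this one goes through
print(outcome, account["balance"])
```

<p>Failing fast like this moves the decision to the caller, who has to choose between retrying later and reporting the error.</p>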
<div class="admonition tip">
<p class="admonition-title">See Also</p>
<p>I used the bank account example in the past to demonstrate common patterns we use in Django models. You are welcome to follow up <a href="bullet-proofing-django-models">in this article</a>.</p>
</div>
<hr>
<h3 id="optimistic-approach"><a class="toclink" href="#optimistic-approach">Optimistic Approach</a></h3>
<p>Unlike the pessimistic approach, the optimistic approach does not require a lock on the object. The optimistic approach assumes collisions are not very common, and dictates that one should only make sure there were no changes made to the object at the time it is updated.</p>
<h4 id="implementation_1"><a class="toclink" href="#implementation_1">Implementation</a></h4>
<p>First, we add a column to keep track of changes made to the object:</p>
<div class="highlight"><pre><span></span><span class="n">version</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">IntegerField</span><span class="p">(</span><span class="n">default</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
</pre></div>
<p>Then, when we update an object, we make sure the version did not change:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">deposit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">amount</span><span class="p">):</span>
    <span class="c1"># Matches no rows if the version changed since self was fetched.</span>
    <span class="n">updated</span> <span class="o">=</span> <span class="n">Account</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span>
        <span class="nb">id</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">id</span><span class="p">,</span>
<span class="hll">        <span class="n">version</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">version</span><span class="p">,</span>
</span>    <span class="p">)</span><span class="o">.</span><span class="n">update</span><span class="p">(</span>
        <span class="n">balance</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">balance</span> <span class="o">+</span> <span class="n">amount</span><span class="p">,</span>
<span class="hll">        <span class="n">version</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">version</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span>
</span>    <span class="p">)</span>
    <span class="k">return</span> <span class="n">updated</span> <span class="o">></span> <span class="mi">0</span>

<span class="k">def</span> <span class="nf">withdraw</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">amount</span><span class="p">):</span>
    <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">balance</span> <span class="o"><</span> <span class="n">amount</span><span class="p">:</span>
        <span class="k">raise</span> <span class="n">errors</span><span class="o">.</span><span class="n">InsufficientFunds</span><span class="p">()</span>
    <span class="n">updated</span> <span class="o">=</span> <span class="n">Account</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span>
        <span class="nb">id</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">id</span><span class="p">,</span>
<span class="hll">        <span class="n">version</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">version</span><span class="p">,</span>
</span>    <span class="p">)</span><span class="o">.</span><span class="n">update</span><span class="p">(</span>
        <span class="n">balance</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">balance</span> <span class="o">-</span> <span class="n">amount</span><span class="p">,</span>
<span class="hll">        <span class="n">version</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">version</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span>
</span>    <span class="p">)</span>
    <span class="k">return</span> <span class="n">updated</span> <span class="o">></span> <span class="mi">0</span>
</pre></div>
<p>Let's break it down:</p>
<ol>
<li>We operate directly on the instance (no classmethod).</li>
<li>We rely on the fact that the version is incremented every time the object is updated.</li>
<li>We update only if the version did not change:<ul>
<li>If the object was not modified since we fetched it then the object is updated.</li>
<li>If it was modified then the query will match zero records and the object will not be updated.</li>
</ul>
</li>
<li>Django returns the number of updated rows. If <code>updated</code> is zero it means someone else changed the object since we fetched it.</li>
</ol>
<p>How does optimistic locking work in our scenario:</p>
<ol>
<li>User A fetches the account:<ul>
<li>balance is 100$</li>
<li>version is 0</li>
</ul>
</li>
<li>User B fetches the account:<ul>
<li>balance is 100$</li>
<li>version is 0</li>
</ul>
</li>
<li>User B asks to withdraw 30$:<ul>
<li>Balance is updated to 100$ - 30$ = 70$</li>
<li>Version is incremented to 1</li>
</ul>
</li>
<li>User A asks to deposit 50$:<ul>
<li>The calculated balance is 100$ + 50$ = 150$</li>
<li>No account with version 0 exists any more, so nothing is updated</li>
</ul>
</li>
</ol>
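<p>Under the hood this is a single conditional <code>UPDATE</code>. The sketch below replays the scenario with the standard library's <code>sqlite3</code> module instead of the Django ORM, so the table layout and the helper names <code>fetch</code> and <code>update_balance</code> are assumptions made for the example:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO account VALUES (1, 100, 0)")


def fetch(account_id):
    return conn.execute(
        "SELECT id, balance, version FROM account WHERE id = ?", (account_id,)
    ).fetchone()


def update_balance(account_id, new_balance, seen_version):
    """Write only if the row still has the version we read."""
    cursor = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, seen_version),
    )
    return cursor.rowcount > 0  # zero rows means someone else got there first


_, balance_a, version_a = fetch(1)  # user A: balance 100, version 0
_, balance_b, version_b = fetch(1)  # user B: balance 100, version 0

b_ok = update_balance(1, balance_b - 30, version_b)  # matches version 0
a_ok = update_balance(1, balance_a + 50, version_a)  # version is already 1

print(b_ok, a_ok, fetch(1))
```

<p>User A's stale write is rejected instead of silently overwriting the withdrawal, and the row ends up with balance 70 and version 1.</p>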
<h4 id="what-you-need-to-know-about-the-optimistic-approach"><a class="toclink" href="#what-you-need-to-know-about-the-optimistic-approach">What You Need to Know About the Optimistic Approach:</a></h4>
<ul>
<li>Unlike the pessimistic approach, this approach requires an <strong>additional field and a lot of discipline</strong>. One way to overcome the discipline issue is to abstract this behavior. Some packages we've taken inspiration from are:<ul>
<li><a href="https://github.com/kmmbvnr/django-fsm" rel="noopener">django-fsm</a> implements <a href="https://github.com/kmmbvnr/django-fsm/blob/master/django_fsm/__init__.py#L463" rel="noopener">optimistic locking using a version field</a> as described above.</li>
<li><a href="https://github.com/gavinwahl/django-optimistic-lock" rel="noopener">django-optimistic-lock</a> seems to do the same.</li>
</ul>
</li>
<li>In an environment with a lot of concurrent updates this approach might be <strong>wasteful</strong>.</li>
<li>The optimistic approach <strong>does not protect from modifications made to the object outside the app</strong>. If you have other tasks that modify the data directly (e.g. not through the model), you need to make sure they use the version as well.</li>
<li>Using the optimistic approach, the <strong>function can fail</strong> and return <code>False</code>. In this case we will most likely want to retry the operation. With the pessimistic approach and <code>nowait=False</code> the operation does not fail on a locked row; it waits for the lock to be released.</li>
</ul>
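<p>Retrying usually means fetching the object again and repeating the conditional update a bounded number of times. Here is a minimal sketch using <code>sqlite3</code> rather than the ORM. The helper names, the <code>max_attempts</code> limit and the <code>before_write</code> hook (used only to simulate a concurrent writer) are all hypothetical:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO account VALUES (1, 100, 0)")


def try_deposit(account_id, amount, before_write=None):
    """One optimistic attempt; returns False if the version moved."""
    balance, version = conn.execute(
        "SELECT balance, version FROM account WHERE id = ?", (account_id,)
    ).fetchone()
    if before_write is not None:
        before_write()  # simulation hook: another writer sneaks in here
    cursor = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (balance + amount, account_id, version),
    )
    return cursor.rowcount > 0


def deposit_with_retry(account_id, amount, max_attempts=3, before_write=None):
    for attempt in range(1, max_attempts + 1):
        # Interfere only on the first attempt so the retry can succeed.
        hook = before_write if attempt == 1 else None
        if try_deposit(account_id, amount, hook):
            return attempt
    raise RuntimeError("too many concurrent updates, giving up")


def concurrent_withdrawal():
    conn.execute(
        "UPDATE account SET balance = balance - 30, version = version + 1 "
        "WHERE id = 1"
    )


attempts = deposit_with_retry(1, 50, before_write=concurrent_withdrawal)
final_balance, final_version = conn.execute(
    "SELECT balance, version FROM account WHERE id = 1"
).fetchone()
print(attempts, final_balance, final_version)
```

<p>The first attempt loses to the withdrawal, and the second one re-reads the balance and succeeds. Every retry costs another round trip, which is why this approach gets wasteful when collisions are frequent.</p>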
<hr>
<h3 id="which-one-should-i-use"><a class="toclink" href="#which-one-should-i-use">Which One Should I Use?</a></h3>
<p>Like any great question, the answer is <em>"it depends"</em>:</p>
<ul>
<li>If your object has a lot of concurrent updates you are probably better off with the pessimistic approach.</li>
<li>If you have updates happening outside the ORM (for example, directly in the database) the pessimistic approach is safer.</li>
<li>If your method has side effects such as remote API calls or OS calls make sure they are safe. Some things to consider - can the remote call take a long time? Is the remote call idempotent (safe to retry)?</li>
</ul>5 Ways to Make Django Admin Safer2017-06-09T00:00:00+03:002017-06-09T00:00:00+03:00Haki Benitatag:hakibenita.com,2017-06-09:/5-ways-to-make-django-admin-safer<p>With great power comes great responsibility. The more powerful your Django admin is, the safer it should be. Making a Django admin safer and more secure doesn't have to be hard - you just have to pay attention. In this article I present 5 ways to protect the Django Admin from human errors and attackers.</p><hr>
<p>In this article I present 5 ways to protect the Django Admin from human errors and attackers.</p>
<p><details class="toc-container" open>
<summary>Table of Contents</summary></p>
<div class="toc">
<ul>
<li><a href="#change-the-url">Change the URL</a></li>
<li><a href="#visually-distinguish-environments">Visually Distinguish Environments</a></li>
<li><a href="#name-your-admin-site">Name Your Admin Site</a></li>
<li><a href="#separate-the-django-admin-from-the-main-site">Separate the Django Admin From The Main Site</a></li>
<li><a href="#add-two-factor-authentication-2fa">Add Two Factor Authentication (2FA)</a></li>
<li><a href="#final-words">Final Words</a></li>
</ul>
</div>
<p></details></p>
<hr>
<h2 id="change-the-url"><a class="toclink" href="#change-the-url">Change the URL</a></h2>
<p>Every framework has a fingerprint and Django is no exception. A skilled developer, an attacker or even a tech-savvy user can identify a Django site by looking at things like cookies and auth URLs.</p>
<p><strong>Once a site is identified as a Django site, an attacker will most likely try /admin.</strong></p>
<p>To make it harder to gain access we can change the "recommended" URL to something harder to guess.</p>
<p>In the base <code>urls.py</code> of the app, <a href="https://docs.djangoproject.com/en/1.11/ref/contrib/admin/#hooking-adminsite-instances-into-your-urlconf" rel="noopener">register the admin site</a> under a different URL:</p>
<div class="highlight"><pre><span></span><span class="n">urlpatterns</span> <span class="o">+=</span> <span class="n">i18n_patterns</span><span class="p">(</span>
    <span class="n">url</span><span class="p">(</span><span class="sa">r</span><span class="s1">'^super-secret/'</span><span class="p">,</span> <span class="n">admin</span><span class="o">.</span><span class="n">site</span><span class="o">.</span><span class="n">urls</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">'admin'</span><span class="p">),</span>
<span class="p">)</span>
</pre></div>
<p>Change <em>"super-secret"</em> to something you and your team can remember.</p>
<p>This is definitely not the only precaution you should take, but it is a good start.</p>
<hr>
<h2 id="visually-distinguish-environments"><a class="toclink" href="#visually-distinguish-environments">Visually Distinguish Environments</a></h2>
<p>Users and admins are not perfect and mistakes happen. When you have multiple environments such as development, QA, staging and production, it's not unlikely for an admin to perform a destructive operation in the wrong environment by accident (<a href="https://about.gitlab.com/2017/02/01/gitlab-dot-com-database-incident/" rel="noopener">just ask GitLab</a>).</p>
<p>To reduce the chance of mistakes, we mark different environments clearly in the admin:</p>
<figure><img alt="Indicator in different environments" src="https://hakibenita.com/images/01-5-ways-to-make-django-admin-safer.png"><figcaption>Indicator in different environments</figcaption>
</figure>
<p>First you need to have some way of knowing which environment you are in. We have a variable called <code>ENVIRONMENT_NAME</code> that we populate during deployment. We have another variable called <code>ENVIRONMENT_COLOR</code> for the indicator color.</p>
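<p>How you populate these variables depends on your deployment. One possible sketch, assuming the deployment exports an <code>ENVIRONMENT_NAME</code> environment variable, is to derive both values in <code>settings.py</code>. The <code>environment_settings</code> helper, the default name and the color mapping are hypothetical and not part of our actual setup:</p>

```python
# settings.py (sketch)
import os

# Hypothetical mapping; pick whatever colors work for your team.
ENVIRONMENT_COLORS = {
    "development": "gray",
    "qa": "green",
    "staging": "orange",
    "production": "red",
}


def environment_settings(environ=os.environ):
    """Return (name, color), falling back to development defaults."""
    name = environ.get("ENVIRONMENT_NAME", "development")
    color = ENVIRONMENT_COLORS.get(name.lower(), "gray")
    return name, color


ENVIRONMENT_NAME, ENVIRONMENT_COLOR = environment_settings()
```

<p>Keeping the lookup in a small function makes it easy to check without touching the real process environment, for example <code>environment_settings({"ENVIRONMENT_NAME": "production"})</code>.</p>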
<p>To add the environment indicator to every page in the admin, override the base admin template:</p>
<div class="highlight"><pre><span></span><span class="x"><!-- app/templates/admin/base_site.html --></span>
<span class="hll"><span class="cp">{%</span> <span class="k">extends</span> <span class="s2">"admin/base_site.html"</span> <span class="cp">%}</span>
</span>
<span class="cp">{%</span> <span class="k">block</span> <span class="nv">extrastyle</span> <span class="cp">%}</span>
<span class="x"><style type="text/css"></span>
<span class="x">    body:before {</span>
<span class="x">        display: block;</span>
<span class="x">        line-height: 35px;</span>
<span class="x">        text-align: center;</span>
<span class="x">        font-weight: bold;</span>
<span class="x">        text-transform: uppercase;</span>
<span class="x">        color: white;</span>
<span class="hll"><span class="x">        content: "</span><span class="cp">{{</span> <span class="nv">ENVIRONMENT_NAME</span> <span class="cp">}}</span><span class="x">";</span>
</span><span class="hll"><span class="x">        background-color: </span><span class="cp">{{</span> <span class="nv">ENVIRONMENT_COLOR</span> <span class="cp">}}</span><span class="x">;</span>
</span><span class="x">    }</span>
<span class="x"></style></span>
<span class="cp">{%</span> <span class="k">endblock</span> <span class="cp">%}</span>
</pre></div>
<p>To make the ENVIRONMENT variables from <code>settings.py</code> available in the template we use a context processor:</p>
<div class="highlight"><pre><span></span><span class="c1"># app/context_processors.py</span>
<span class="kn">from</span> <span class="nn">django.conf</span> <span class="kn">import</span> <span class="n">settings</span>

<span class="k">def</span> <span class="nf">from_settings</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="k">return</span> <span class="p">{</span>
        <span class="s1">'ENVIRONMENT_NAME'</span><span class="p">:</span> <span class="n">settings</span><span class="o">.</span><span class="n">ENVIRONMENT_NAME</span><span class="p">,</span>
        <span class="s1">'ENVIRONMENT_COLOR'</span><span class="p">:</span> <span class="n">settings</span><span class="o">.</span><span class="n">ENVIRONMENT_COLOR</span><span class="p">,</span>
    <span class="p">}</span>
</pre></div>
<p>To register the context processor add the following in <code>settings.py</code>:</p>
<div class="highlight"><pre><span></span><span class="n">TEMPLATES</span> <span class="o">=</span> <span class="p">[{</span>
    <span class="c1"># ...</span>
    <span class="s1">'OPTIONS'</span><span class="p">:</span> <span class="p">{</span>
        <span class="s1">'context_processors'</span><span class="p">:</span> <span class="p">[</span>
            <span class="c1"># ...</span>
            <span class="s1">'app.context_processors.from_settings'</span><span class="p">,</span>
        <span class="p">],</span>
        <span class="c1"># ...</span>
    <span class="p">},</span>
<span class="p">}]</span>
</pre></div>
<p>Now when you open Django Admin you should see the indicator on top.</p>
<hr>
<h2 id="name-your-admin-site"><a class="toclink" href="#name-your-admin-site">Name Your Admin Site</a></h2>
<p>If you have multiple Django services that look the same, admins can easily get confused. To help admins be more aware of where they are, change the title:</p>
<div class="highlight"><pre><span></span><span class="c1"># urls.py</span>
<span class="kn">from</span> <span class="nn">django.contrib</span> <span class="kn">import</span> <span class="n">admin</span>
<span class="n">admin</span><span class="o">.</span><span class="n">site</span><span class="o">.</span><span class="n">site_header</span> <span class="o">=</span> <span class="s1">'Awesome Inc. Administration'</span>
<span class="n">admin</span><span class="o">.</span><span class="n">site</span><span class="o">.</span><span class="n">site_title</span> <span class="o">=</span> <span class="s1">'Awesome Inc. Administration'</span>
</pre></div>
<p>And you get:</p>
<figure><img alt="Django Admin site with a name and a title" src="https://hakibenita.com/images/02-5-ways-to-make-django-admin-safer.png"><figcaption>Django Admin site with a name and a title</figcaption>
</figure>
<p>For more exotic options look around <a href="https://docs.djangoproject.com/en/1.11/ref/contrib/admin/#adminsite-attributes" rel="noopener">in the docs</a>.</p>
<hr>
<h2 id="separate-the-django-admin-from-the-main-site"><a class="toclink" href="#separate-the-django-admin-from-the-main-site">Separate the Django Admin From The Main Site</a></h2>
<p>Using the same codebase you can deploy two instances of the same Django app - one only for the admin and one only for the rest of the app.</p>
<p>This is controversial and not as easy as the other tips. The implementation is dependent on the configuration (e.g. if you are using gunicorn or uwsgi) so I won't go into the details.</p>
<p>Some reasons you might want to split the admin to its own instance are:</p>
<ul>
<li><strong>Deploy the admin inside a VPN (virtual private network)</strong> - If the admin is used only internally and you have a VPN it is good practice to have it inside the private network.</li>
<li><strong>Remove unnecessary components from the main site</strong> - For example, the Django admin uses the messages framework. If the main site does not, you can remove that middleware. Another example is authentication - if the main site is an API backend using token authentication, you can remove a lot of templates configuration, session middleware, etc. and trim some fat from the request-response cycle.</li>
<li><strong>Stronger authentication</strong> - If you want to strengthen the security of Django Admin you might want to provide a different authentication mechanism just for the admin. This is much easier on different instances with different settings.</li>
</ul>
<p>We split the admin from the main site only in public-facing sites. We don't bother with internal apps because it complicates the deployment without making them any more secure.</p>
<hr>
<h2 id="add-two-factor-authentication-2fa"><a class="toclink" href="#add-two-factor-authentication-2fa">Add Two Factor Authentication (2FA)</a></h2>
<p>Two factor authentication has become very popular lately as many sites started offering this option. 2FA performs authentication using two things:</p>
<ol>
<li><strong>Something you know</strong> - Usually a password.</li>
<li><strong>Something you have</strong> - Usually a mobile app that generates a new one-time code every 30 seconds (such as <a href="https://play.google.com/store/apps/details?id=com.google.android.apps.authenticator2&hl=en" rel="noopener">Authenticator by Google</a>).</li>
</ol>
<p>On first signup the user is usually asked to scan a barcode with the authenticator app. After this initial setup the app will start generating the one-time codes.</p>
<p>I don't usually recommend third-party packages, but a couple of months ago we started using <a href="https://pypi.python.org/pypi/django-otp" rel="noopener">django-otp</a> to implement 2FA in our admin site and it's working great for us. It's hosted on Bitbucket so you might have missed it.</p>
<p>The setup is pretty simple:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>django-otp
$<span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>qrcode
</pre></div>
<p>Add django-otp to the installed apps and the middleware:</p>
<div class="highlight"><pre><span></span><span class="c1"># settings.py</span>
<span class="n">INSTALLED_APPS</span> <span class="o">=</span> <span class="p">(</span>
    <span class="s1">'django_otp'</span><span class="p">,</span>
    <span class="s1">'django_otp.plugins.otp_totp'</span><span class="p">,</span>
<span class="p">)</span>

<span class="n">MIDDLEWARE</span> <span class="o">=</span> <span class="p">(</span>
    <span class="s1">'django.contrib.auth.middleware.AuthenticationMiddleware'</span><span class="p">,</span>
    <span class="s1">'django_otp.middleware.OTPMiddleware'</span><span class="p">,</span>
<span class="p">)</span>
</pre></div>
<p>Name the issuer - this is the name users will see in the authenticator app, so make it distinguishable.</p>
<div class="highlight"><pre><span></span># settings.py
OTP_TOTP_ISSUER = 'Awesome Inc.'
</pre></div>
<p>Add 2FA authentication to the admin site:</p>
<div class="highlight"><pre><span></span># urls.py
from django.contrib import admin
from django_otp.admin import OTPAdminSite
admin.site.__class__ = OTPAdminSite
</pre></div>
<p>Now you have a secure admin page that looks like this:</p>
<figure><img alt="Django Admin login with OTP token" src="https://hakibenita.com/images/03-5-ways-to-make-django-admin-safer.png"><figcaption>Django Admin login with OTP token</figcaption>
</figure>
<p>To set up a new user, create a "TOTP Device" from the Django Admin. Once you are done, click the QR link and you will get a screen like this:</p>
<figure><img alt="QR for setting up a new user" src="https://hakibenita.com/images/04-5-ways-to-make-django-admin-safer.png"><figcaption>QR for setting up a new user</figcaption>
</figure>
<p>Have the user scan the QR code with the authenticator app on their personal device, and they will have a fresh code generated every 30 seconds.</p>
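<p>For the curious, the rotating code is not magic. The sketch below is not django-otp's code; it is a minimal standard-library illustration of the time-based one-time password scheme described in RFC 6238, which is what apps like Google Authenticator implement. The function name <code>totp</code> and the sample secret are made up for the example, and in a real project you should rely on the library rather than on this snippet.</p>

```python
import hashlib
import hmac
import struct
import time


def totp(secret, at=None, step=30, digits=6):
    """Return the one-time code for `secret` (bytes) at unix time `at`."""
    if at is None:
        at = time.time()
    # Number of whole `step`-second windows since the unix epoch.
    counter = int(at // step)
    # HMAC-SHA1 over the counter encoded as an 8-byte big-endian integer.
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte pick an offset.
    offset = digest[-1] & 0x0F
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)


# Shared secret used by the RFC test vectors (illustrative only).
print(totp(b"12345678901234567890", at=59))  # 287082
```

<p>Both sides hold the same secret and compute the same function of the current time, which is why the code changes every 30 seconds and why the server can verify it without any network call to the phone.</p>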
<hr>
<h2 id="final-words"><a class="toclink" href="#final-words">Final Words</a></h2>
<p>Making a Django admin safer and more secure doesn't have to be hard - you just have to pay attention. Some of the tips mentioned here are very easy to set up and they go a long way.</p>The Many Faces of DISTINCT in PostgreSQL2017-05-11T00:00:00+03:002017-05-11T00:00:00+03:00Haki Benitatag:hakibenita.com,2017-05-11:/the-many-faces-of-distinct-in-postgre-sql<p>I started my programming career as an Oracle DBA. It took a few years but eventually I got fed up with the corporate world and I went about doing my own thing. After I got over not having proper partitions and a MERGE statement, I found some nice unique features in PostgreSQL. Oddly enough, a lot of them contained the word DISTINCT.</p><hr>
<p>I started my programming career as an Oracle DBA. It took a few years but eventually I got fed up with the corporate world and I went about doing my own thing.</p>
<p>When I no longer had the comfy cushion of Oracle Enterprise Edition I discovered PostgreSQL. After I got over not having proper partitions and a MERGE statement (aka UPSERT), I found some nice unique features in PostgreSQL. Oddly enough, a lot of them contained the word DISTINCT.</p>
<h3 id="distinct"><a class="toclink" href="#distinct">DISTINCT</a></h3>
<p>I created a simple Employee table with name, department and salary using mock data from <a href="https://www.mockaroo.com/" rel="noopener">this site</a>:</p>
<div class="highlight"><pre><span></span><span class="gp">haki=#</span><span class="w"> </span><span class="kp">\d</span><span class="w"> </span><span class="ss">employee</span>
<span class="go"> Column | Type | Modifiers</span>
<span class="go">------------+-----------------------+-----------</span>
<span class="go"> id | integer | not null</span>
<span class="go"> name | character varying(30) |</span>
<span class="go"> department | character varying(30) |</span>
<span class="go"> salary | integer |</span>
<span class="gp">haki=#</span><span class="w"> </span><span class="k">select</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">employee</span><span class="w"> </span><span class="k">limit</span><span class="w"> </span><span class="mf">5</span><span class="p">;</span>
<span class="go"> id | name | department | salary</span>
<span class="go">----+----------------+----------------------+--------</span>
<span class="go"> 1 | Carl Frazier | Engineering | 3052</span>
<span class="go"> 2 | Richard Fox | Product Management | 13449</span>
<span class="go"> 3 | Carolyn Carter | Engineering | 8366</span>
<span class="go"> 4 | Benjamin Brown | Business Development | 7386</span>
<span class="go"> 5 | Diana Fisher | Services | 10419</span>
</pre></div>
<h4 id="what-is-distinct"><a class="toclink" href="#what-is-distinct">What is DISTINCT?</a></h4>
<blockquote>
<p>SELECT DISTINCT eliminates duplicate rows from the result.</p>
</blockquote>
<p>The simplest use of distinct is, for example, to get a unique list of
departments:</p>
<div class="highlight"><pre><span></span><span class="gp">haki=#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="k">DISTINCT</span><span class="w"> </span><span class="n">department</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">employee</span><span class="p">;</span>
<span class="go"> department</span>
<span class="go">--------------------------</span>
<span class="go"> Services</span>
<span class="go"> Support</span>
<span class="go"> Training</span>
<span class="go"> Accounting</span>
<span class="go"> Business Development</span>
<span class="go"> Marketing</span>
<span class="go"> Product Management</span>
<span class="go"> Human Resources</span>
<span class="go"> Engineering</span>
<span class="go"> Sales</span>
<span class="go"> Research and Development</span>
<span class="go"> Legal</span>
</pre></div>
<p><em>(easy CS students, I know it's not normalized…)</em></p>
<p>We can do the same thing with <code>GROUP BY</code>:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="n">department</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">employee</span><span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">department</span><span class="p">;</span>
</pre></div>
<p>But we are talking about DISTINCT.</p>
<hr>
<h3 id="distinct-on"><a class="toclink" href="#distinct-on">DISTINCT ON</a></h3>
<p>A classic job interview question is <strong>finding the employee with the highest salary in each department</strong>.</p>
<p>This is what they teach at university:</p>
<div class="highlight"><pre><span></span><span class="gp">haki=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">employee</span>
<span class="k">WHERE</span>
<span class="w"> </span><span class="p">(</span><span class="n">department</span><span class="p">,</span><span class="w"> </span><span class="n">salary</span><span class="p">)</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">department</span><span class="p">,</span>
<span class="w"> </span><span class="n">MAX</span><span class="p">(</span><span class="n">salary</span><span class="p">)</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">employee</span>
<span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">department</span>
<span class="w"> </span><span class="p">)</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">department</span><span class="p">;</span>
<span class="go"> id | name | department | salary</span>
<span class="go">----+------------------+--------------------------+--------</span>
<span class="go"> 30 | Sara Roberts | Accounting | 13845</span>
<span class="go"> 4 | Benjamin Brown | Business Development | 7386</span>
<span class="go"> 3 | Carolyn Carter | Engineering | 8366</span>
<span class="go"> 20 | Janet Hall | Human Resources | 2826</span>
<span class="hll"><span class="go"> 14 | Chris Phillips | Legal | 3706</span>
</span><span class="hll"><span class="go"> 10 | James Cunningham | Legal | 3706</span>
</span><span class="go"> 11 | Richard Bradley | Marketing | 11272</span>
<span class="go"> 2 | Richard Fox | Product Management | 13449</span>
<span class="go"> 25 | Evelyn Rodriguez | Research and Development | 10628</span>
<span class="go"> 17 | Benjamin Carter | Sales | 6197</span>
<span class="go"> 24 | Jessica Elliott | Services | 14542</span>
<span class="go"> 7 | Bonnie Robertson | Support | 12674</span>
<span class="go"> 8 | Jean Bailey | Training | 13230</span>
</pre></div>
<p><strong>Legal</strong> has two employees with the same high salary. Depending on the use case, this query can get pretty nasty.</p>
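<p>To see why the tie shows up, here is a rough Python rendition of what the query does, using a handful of rows from the table above. It is only an illustration of the logic, not how PostgreSQL executes the query, and the variable names are mine.</p>

```python
# A few rows from the employee table: (id, name, department, salary).
employees = [
    (1, "Carl Frazier", "Engineering", 3052),
    (3, "Carolyn Carter", "Engineering", 8366),
    (10, "James Cunningham", "Legal", 3706),
    (14, "Chris Phillips", "Legal", 3706),
]

# Roughly: SELECT department, MAX(salary) FROM employee GROUP BY department
max_salary = {}
for _id, _name, department, salary in employees:
    max_salary[department] = max(salary, max_salary.get(department, salary))

# Roughly: WHERE (department, salary) IN (<the subquery above>)
top_earners = [row for row in employees if max_salary[row[2]] == row[3]]

for row in top_earners:
    print(row)
```

<p>Every row whose salary equals the department maximum passes the filter, so both Legal employees are returned.</p>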
<p>If you graduated a while back, already know a few things about databases, and have heard about <strong><a href="https://www.postgresql.org/docs/9.5/static/functions-window.html" rel="noopener">analytic and window functions</a></strong>, you might do this:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">ranked_employees</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span>
<span class="hll"><span class="w"> </span><span class="n">ROW_NUMBER</span><span class="p">()</span><span class="w"> </span><span class="n">OVER</span><span class="w"> </span><span class="p">(</span>
</span><span class="hll"><span class="w"> </span><span class="n">PARTITION</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">department</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">salary</span><span class="w"> </span><span class="k">DESC</span>
</span><span class="hll"><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">rn</span><span class="p">,</span>
</span><span class="w"> </span><span class="o">*</span>
<span class="w"> </span><span class="k">FROM</span>
<span class="w"> </span><span class="n">employee</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">ranked_employees</span>
<span class="k">WHERE</span>
<span class="hll"><span class="w"> </span><span class="n">rn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span>
</span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">department</span><span class="p">;</span>
</pre></div>
<p>The result is the same, but without the duplicates, because <code>ROW_NUMBER</code> gives the number 1 to only one of the tied rows:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">rn</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">department</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">salary</span>
<span class="c1">----+----+------------------+--------------------------+--------</span>
<span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">30</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Sara</span><span class="w"> </span><span class="n">Roberts</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Accounting</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">13845</span>
<span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">4</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Benjamin</span><span class="w"> </span><span class="n">Brown</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Business</span><span class="w"> </span><span class="n">Development</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">7386</span>
<span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Carolyn</span><span class="w"> </span><span class="n">Carter</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Engineering</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">8366</span>
<span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">20</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Janet</span><span class="w"> </span><span class="n">Hall</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Human</span><span class="w"> </span><span class="n">Resources</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">2826</span>
<span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">14</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Chris</span><span class="w"> </span><span class="n">Phillips</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Legal</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">3706</span>
<span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">11</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Richard</span><span class="w"> </span><span class="n">Bradley</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Marketing</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">11272</span>
<span class="mf">...</span>
</pre></div>
<p>Up until now, this is what I would have done.</p>
<p>Now for the real treat: PostgreSQL has a <strong>special nonstandard clause to find the first row in a group</strong>:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="k">DISTINCT</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="p">(</span><span class="n">department</span><span class="p">)</span>
<span class="w"> </span><span class="o">*</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">employee</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">department</span><span class="p">,</span>
<span class="w"> </span><span class="n">salary</span><span class="w"> </span><span class="k">DESC</span><span class="p">;</span>
</pre></div>
<figure><img alt="This is wild!" src="https://hakibenita.com/images/01-the-many-faces-of-distinct-in-postgresql.png"><figcaption>This is wild!</figcaption>
</figure>
<p><strong>This is wild! Why did nobody tell me this was possible?</strong></p>
<p><a href="https://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT" rel="noopener">The docs</a> explain DISTINCT ON:</p>
<blockquote>
<p><em>SELECT DISTINCT ON ( expression [, …] ) keeps only the first row of each set of
rows where the given expressions evaluate to equal</em></p>
</blockquote>
<p>And the reason I haven't heard about it is:</p>
<blockquote>
<p>Nonstandard Clauses<br> DISTINCT ON ( … ) is an extension of the SQL standard.</p>
</blockquote>
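<p>PostgreSQL does all the heavy lifting for us. The only requirement is that the <code>ORDER BY</code> starts with the expressions listed in <code>DISTINCT ON</code> (<code>department</code> in this case); the remaining <code>ORDER BY</code> expressions (<code>salary DESC</code>) decide which row comes first in each group. It also allows for "grouping" by more than one field, which makes this clause even more powerful.</p>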
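<p>If it helps to think about it procedurally, <code>DISTINCT ON</code> behaves like "sort, then keep the first row of every group". The Python below is a sketch of that idea on a few rows from the table above; the helper name <code>distinct_on_department</code> is mine and it says nothing about how PostgreSQL actually implements the clause.</p>

```python
from itertools import groupby
from operator import itemgetter

# A few rows from the employee table: (id, name, department, salary).
employees = [
    (1, "Carl Frazier", "Engineering", 3052),
    (3, "Carolyn Carter", "Engineering", 8366),
    (10, "James Cunningham", "Legal", 3706),
    (14, "Chris Phillips", "Legal", 3706),
]


def distinct_on_department(rows):
    # ORDER BY department, salary DESC
    ordered = sorted(rows, key=lambda row: (row[2], -row[3]))
    # DISTINCT ON (department): keep the first row of each run of equal departments.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(2))]


for row in distinct_on_department(employees):
    print(row)
```

<p>Python's sort is stable, so the tie in Legal is resolved by input order here. In PostgreSQL the row picked among ties is not predictable unless the <code>ORDER BY</code> includes a tie-breaker such as <code>id</code>.</p>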
<hr>
<h3 id="is-distinct-from"><a class="toclink" href="#is-distinct-from">IS DISTINCT FROM</a></h3>
<p>Comparing values in SQL can result in three outcomes - <code>true</code>, <code>false</code> or <code>unknown</code>:</p>
<div class="highlight"><pre><span></span><span class="k">WITH</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="k">UNION</span><span class="w"> </span><span class="k">ALL</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="k">UNION</span><span class="w"> </span><span class="k">ALL</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="k">NULL</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">UNION</span><span class="w"> </span><span class="k">ALL</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="k">NULL</span><span class="p">,</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">)</span>
<span class="k">SELECT</span>
<span class="w"> </span><span class="n">a</span><span class="p">,</span>
<span class="w"> </span><span class="n">b</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">equal</span>
</span><span class="k">FROM</span>
<span class="w"> </span><span class="n">t</span><span class="p">;</span>
<span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">equal</span>
<span class="c1">------+------+-------</span>
<span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">t</span>
<span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">f</span>
<span class="w"> </span><span class="k">NULL</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">NULL</span>
<span class="w"> </span><span class="k">NULL</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">NULL</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">NULL</span>
</pre></div>
<p>The result of comparing NULL with NULL using equality (=) is UNKNOWN (marked as NULL in the table).</p>
<p><strong>In SQL, 1 = 1 is true and NULL IS NULL is true, but NULL = NULL is unknown.</strong></p>
<p>It's important to be aware of this subtlety because <strong>comparing nullable fields might yield unexpected results</strong>.</p>
<p>The full condition to get either true or false when comparing nullable fields is:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="n">a</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="k">null</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="k">null</span><span class="p">)</span>
<span class="k">or</span>
<span class="p">(</span><span class="n">a</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="k">not</span><span class="w"> </span><span class="k">null</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="k">not</span><span class="w"> </span><span class="k">null</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span><span class="p">)</span>
</pre></div>
<p>And the result:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">equal</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">full_condition</span>
<span class="c1">------+------+-------+----------</span>
<span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">t</span>
<span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">2</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">f</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">f</span>
<span class="w"> </span><span class="k">NULL</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">NULL</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">f</span>
<span class="w"> </span><span class="k">NULL</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">NULL</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">NULL</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">t</span>
</pre></div>
<p>This is the result we want but it is very long. <strong>Is there a better way?</strong></p>
<p>PostgreSQL implements the SQL standard for safely comparing nullable fields:</p>
<div class="highlight"><pre><span></span><span class="gp">haki=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">a</span><span class="p">,</span>
<span class="w"> </span><span class="n">b</span><span class="p">,</span>
<span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">equal</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="k">IS</span><span class="w"> </span><span class="k">DISTINCT</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">is_distinct_from</span>
</span><span class="k">FROM</span>
<span class="w"> </span><span class="n">t</span><span class="p">;</span>
<span class="go"> a | b | equal | is_distinct_from</span>
<span class="go">------+------+-------+------------------</span>
<span class="go"> 1 | 1 | t | f</span>
<span class="go"> 1 | 2 | f | t</span>
<span class="go"> NULL | 1 | NULL | t</span>
<span class="go"> NULL | NULL | NULL | f</span>
</pre></div>
<p>The PostgreSQL wiki explains <code>IS DISTINCT FROM</code>:</p>
<blockquote>
<p><strong>IS DISTINCT FROM</strong><em> and </em><strong>IS NOT DISTINCT FROM</strong><em> … treat NULL as if it was a
known value, rather than a special case for unknown.</em></p>
</blockquote>
<p>Much better - short and explicit.</p>
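<p>You can play with the three-valued logic without a PostgreSQL server. The sketch below uses SQLite through Python's built-in <code>sqlite3</code> module as a stand-in, so it is not the setup used in this article. SQLite's <code>IS NOT</code> operator plays the role of <code>IS DISTINCT FROM</code> here, and <code>NULL</code> comes back to Python as <code>None</code>.</p>

```python
import sqlite3

# Same four pairs as the table `t` above; None stands for NULL.
rows = [(1, 1), (1, 2), (None, 1), (None, None)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

results = conn.execute(
    """
    SELECT
        a,
        b,
        a = b AS equal,
        (a IS NULL AND b IS NULL)
            OR (a IS NOT NULL AND b IS NOT NULL AND a = b) AS full_condition,
        a IS NOT b AS is_distinct_from  -- SQLite's NULL-safe "not equal"
    FROM t
    ORDER BY rowid
    """
).fetchall()

for row in results:
    print(row)
```

<p>SQLite reports booleans as 1 and 0, so the <code>equal</code> column is <code>None</code> whenever a NULL is involved, while the other two columns always come back as 0 or 1.</p>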
<h4 id="how-other-databases-handle-this"><a class="toclink" href="#how-other-databases-handle-this">How Do Other Databases Handle This?</a></h4>
<ul>
<li><strong>MySQL</strong> - A <a href="https://dev.mysql.com/doc/refman/5.7/en/comparison-operators.html#operator_equal-to" rel="noopener">special operator</a> <code>&lt;=&gt;</code> with similar functionality.</li>
<li><strong>Oracle</strong> - Provides a function called <a href="https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions078.htm" rel="noopener">LNNVL</a> to compare nullable fields (good luck with that…).</li>
<li><strong>MSSQL</strong> - Couldn't find a similar function.</li>
</ul>
<hr>
<h3 id="array_agg-distinct"><a class="toclink" href="#array_agg-distinct">ARRAY_AGG (DISTINCT)</a></h3>
<p><code>ARRAY_AGG</code> was one of the major selling points of PostgreSQL when I was
transitioning from Oracle.</p>
<p><code>ARRAY_AGG</code> aggregates values into an array:</p>
<div class="highlight"><pre><span></span><span class="gp">haki=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">department</span><span class="p">,</span>
<span class="w"> </span><span class="n">ARRAY_AGG</span><span class="p">(</span><span class="k">name</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">employees</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">employee</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">department</span><span class="p">;</span>
<span class="go"> department | employees</span>
<span class="go">----------------------+-------------------------------------</span>
<span class="go">Services | {"Diana Fisher","Jessica Elliott"}</span>
<span class="go">Support | {"Bonnie Robertson"}</span>
<span class="go">Training | {"Jean Bailey"}</span>
<span class="go">Accounting | {"Phillip Reynolds","Sean Franklin"}</span>
<span class="go">Business Development | {"Benjamin Brown","Brian Hayes"}</span>
<span class="go">Marketing | {"Richard Bradley","Arthur Moreno"}</span>
<span class="go">Product Management | {"Richard Fox","Randy Wells"}</span>
<span class="go">Human Resources | {"Janet Hall"}</span>
<span class="go">Engineering | {"Carl Frazier","Carolyn Carter"}</span>
<span class="go">Sales | {"Benjamin Carter"}</span>
<span class="go">Research and Develo.. | {"Donna Reynolds","Ann Boyd"}</span>
<span class="go">Legal | {"James Cunningham","George Hanson"}</span>
</pre></div>
<p>I find <code>ARRAY_AGG</code> useful mostly in the CLI for getting a quick view of the data, or when used with an ORM.</p>
<p>PostgreSQL went the extra mile and implemented the DISTINCT option for this aggregate function as well. Using DISTINCT we can, for example, quickly view the unique salaries in each department:</p>
<div class="highlight"><pre><span></span><span class="gp">haki=#</span><span class="w"> </span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">department</span><span class="p">,</span>
<span class="w"> </span><span class="n">ARRAY_AGG</span><span class="p">(</span><span class="k">DISTINCT</span><span class="w"> </span><span class="n">salary</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">salaries</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">employee</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="n">department</span><span class="p">;</span>
<span class="go">department | salaries</span>
<span class="go">--------------------------+---------------</span>
<span class="go"> Accounting | {11203}</span>
<span class="go"> Business Development | {2196,7386}</span>
<span class="go"> Engineering | {1542,3052}</span>
<span class="go"> Human Resources | {2826}</span>
<span class="go"> Legal | {1079,3706}</span>
<span class="go"> Marketing | {5740}</span>
<span class="go"> Product Management | {9101,13449}</span>
<span class="go"> Research and Development | {6451,10628}</span>
<span class="go"> Sales | {6197}</span>
<span class="go"> Services | {2119}</span>
<span class="go"> Support | {12674}</span>
<span class="go"> Training | {13230}</span>
</pre></div>
<p>We can immediately see that everyone in the support department is making the same salary.</p>
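<p>Conceptually, <code>ARRAY_AGG(DISTINCT ...)</code> collects a set of values per group. A rough Python equivalent, assuming some made-up (department, salary) pairs that include duplicates, could look like this. The function name and the sample rows are hypothetical, and sorting is a choice made in this sketch to keep the output deterministic.</p>

```python
from collections import defaultdict

# Hypothetical (department, salary) pairs; the duplicates are made up for illustration.
salaries = [
    ("Legal", 1079),
    ("Legal", 3706),
    ("Legal", 3706),
    ("Support", 12674),
    ("Support", 12674),
]


def array_agg_distinct(pairs):
    # GROUP BY department, collecting each distinct salary once.
    groups = defaultdict(set)
    for department, salary in pairs:
        groups[department].add(salary)
    # Sort departments and values so the output is stable.
    return {department: sorted(values) for department, values in sorted(groups.items())}


print(array_agg_distinct(salaries))
```

<p>A department whose list ends up with a single value, like Support in this sample, is one where everybody earns the same.</p>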
<h4 id="how-other-databases-handle-this_1"><a class="toclink" href="#how-other-databases-handle-this_1">How Do Other Databases Handle This?</a></h4>
<ul>
<li><strong>MySQL</strong> - Has a similar function called <a href="https://dev.mysql.com/doc/refman/5.6/en/group-by-functions.html#function_group-concat" rel="noopener">GROUP_CONCAT</a>.</li>
<li><strong>Oracle</strong> - Has an aggregate function called <a href="https://docs.oracle.com/cd/E11882_01/server.112/e41084/functions089.htm#SQLRF30030" rel="noopener">LISTAGG</a>. It has no support for DISTINCT. Oracle introduced the function in version 11.2, and up until then the world wide web was filled with custom implementations.</li>
<li><strong>MSSQL</strong> - The closest I found was a function called <a href="https://docs.microsoft.com/en-us/sql/t-sql/functions/stuff-transact-sql" rel="noopener">STUFF</a> that accepts an expression.</li>
</ul>
<hr>
<h3 id="take-away"><a class="toclink" href="#take-away">Takeaway</a></h3>
<p>The takeaway from this article is that you should always go back to the basics!</p>All You Need To Know About Prefetching in Django2017-04-29T00:00:00+03:002017-04-29T00:00:00+03:00Haki Benitatag:hakibenita.com,2017-04-29:/all-you-need-to-know-about-prefetching-in-django<p>A rundown of all the ways you can use Prefetch to speed up queries in Django.</p><hr>
<p>I recently worked on a ticket ordering system for a conference. It was very important for the customer to see a table of orders including a column with a list of program names in each order:</p>
<figure><img alt="The column requested by the users" src="https://hakibenita.com/images/01-all-you-need-to-know-about-prefetching-in-django.png"><figcaption>The column requested by the users</figcaption>
</figure>
<p>The models looked (roughly) like this:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>

<span class="k">class</span> <span class="nc">Program</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">name</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span>
        <span class="n">max_length</span><span class="o">=</span><span class="mi">20</span><span class="p">,</span>
    <span class="p">)</span>

<span class="k">class</span> <span class="nc">Price</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">program</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span>
        <span class="n">Program</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="n">from_date</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">()</span>
    <span class="n">to_date</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">()</span>

<span class="k">class</span> <span class="nc">Order</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">state</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span>
        <span class="n">max_length</span><span class="o">=</span><span class="mi">20</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="n">items</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ManyToManyField</span><span class="p">(</span><span class="n">Price</span><span class="p">)</span>
</pre></div>
<ul>
<li><strong>Program</strong> - a session, lecture or a conference day.</li>
<li><strong>Price</strong> - Prices can change over time. One way to model changes over time is using a <a href="https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row" rel="noopener">type 2 slowly changing dimension</a> (SCD). The <code>Price</code> model represents the price of a program at a certain point in time.</li>
<li><strong>Order</strong> - An order to one or more programs. Each item in the order is the price of the program at the time the order was made.</li>
</ul>
<h3 id="before-we-start"><a class="toclink" href="#before-we-start">Before We Start</a></h3>
<p>Throughout this article we are going to monitor the queries executed by Django.
To log the queries add the following to the <code>LOGGING</code> settings in <code>settings.py</code>:</p>
<div class="highlight"><pre><span></span><span class="n">LOGGING</span> <span class="o">=</span> <span class="p">{</span>
    <span class="c1"># ...</span>
    <span class="s1">'loggers'</span><span class="p">:</span> <span class="p">{</span>
        <span class="s1">'django.db.backends'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'level'</span><span class="p">:</span> <span class="s1">'DEBUG'</span><span class="p">,</span>
        <span class="p">},</span>
    <span class="p">},</span>
<span class="p">}</span>
</pre></div>
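<p>The snippet above elides the rest of the dictionary. A minimal complete configuration might look like the sketch below. The <code>console</code> handler name is an assumption made for illustration, and keep in mind that Django only logs queries to <code>django.db.backends</code> when <code>DEBUG</code> is <code>True</code>:</p>

```python
# settings.py (sketch): a minimal, complete LOGGING dict.
# The handler name 'console' is an assumption, any configured handler will do.
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'console': {
            # Write log records to stderr.
            'class': 'logging.StreamHandler',
        },
    },
    'loggers': {
        'django.db.backends': {
            'level': 'DEBUG',
            'handlers': ['console'],
        },
    },
}
```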
<hr>
<h3 id="whats-the-problem"><a class="toclink" href="#whats-the-problem">What's The Problem?</a></h3>
<p>Let's try to fetch the program names for a single order:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="gp">>>> </span><span class="n">o</span> <span class="o">=</span> <span class="n">Order</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">state</span><span class="o">=</span><span class="s1">'completed'</span><span class="p">)</span><span class="o">.</span><span class="n">first</span><span class="p">()</span>
</span>
<span class="go">(0.002) SELECT ... FROM "orders_order"</span>
<span class="go">WHERE "orders_order"."state" = 'completed'</span>
<span class="go">ORDER BY "orders_order"."id" ASC LIMIT 1;</span>
<span class="hll">
</span><span class="gp">>>> </span><span class="p">[</span><span class="n">p</span><span class="o">.</span><span class="n">program</span><span class="o">.</span><span class="n">name</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">o</span><span class="o">.</span><span class="n">items</span><span class="o">.</span><span class="n">all</span><span class="p">()]</span>
<span class="go">(0.002) SELECT ... FROM "events_price"</span>
<span class="go">INNER JOIN "orders_order_items" ON ("events_price"."id" = "orders_order_items"."price_id")</span>
<span class="go">WHERE "orders_order_items"."order_id" = 29; args=(29,)</span>
<span class="go">(0.001) SELECT ... FROM "events_program"</span>
<span class="go">WHERE "events_program"."id" = 8; args=(8,)</span>
<span class="go">['Day 1 Pass']</span>
</pre></div>
<ul>
<li>To fetch completed orders we need <strong>one query</strong>.</li>
<li>To fetch the program names for each order we need <strong>two more queries</strong>.</li>
</ul>
<p>I previously <a href="/things-you-must-know-about-django-admin-as-your-app-gets-bigger">wrote about the N+1 problem</a> and this is a classic case. If we need two queries for each order, <strong>the number of queries for 100 orders will be 1 + 100 * 2 = 201 queries</strong>. That's a lot!</p>
<p>Let's use Django to reduce the number of queries:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">o</span><span class="o">.</span><span class="n">items</span><span class="o">.</span><span class="n">values_list</span><span class="p">(</span><span class="s1">'program__name'</span><span class="p">)</span>
<span class="go">(0.003) SELECT "events_program"."name" FROM "events_price"</span>
<span class="hll"><span class="go">INNER JOIN "orders_order_items" ON ("events_price"."id" = "orders_order_items"."price_id")</span>
</span><span class="hll"><span class="go">INNER JOIN "events_program" ON ("events_price"."program_id" = "events_program"."id")</span>
</span><span class="go">WHERE "orders_order_items"."order_id" = 29 LIMIT 21;</span>
<span class="go">['Day 1 Pass']</span>
</pre></div>
<p><strong>Great!</strong> Django performed a join between <code>Price</code> and <code>Program</code> and reduced the number of queries to just one per order.</p>
<p>At this point instead of 201 queries we only need 101 queries for 100 orders. Can we do better?</p>
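<p>The arithmetic behind these numbers is simple enough to write down. The function below is only an illustration (its name is made up), and it assumes, as in the example, that every order costs the same fixed number of queries:</p>

```python
def naive_query_count(orders, queries_per_order=2):
    """One query for the list of orders, plus a fixed number of queries per order."""
    return 1 + orders * queries_per_order


print(naive_query_count(100))                       # 201: two queries per order
print(naive_query_count(100, queries_per_order=1))  # 101: one joined query per order
```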
<h3 id="why-cant-we-join"><a class="toclink" href="#why-cant-we-join">Why Can't We Join?</a></h3>
<p>The first question that should come to mind is <em>"why can't we join the tables?"</em></p>
<p>If we have a foreign key we can use <code>select_related</code>, or use the double-underscore lookup syntax like we did above, to fetch the related fields in a single query.</p>
<p>For example, we fetched the program name for a list of prices in a single query using <code>values_list('program__name')</code>. We were able to do that because each price is related to exactly one program.</p>
<p>If the relation between two models is many to many we can't do that. Every order has one or more related prices - if we join the two tables we get duplicate orders:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="n">id</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">order_id</span><span class="p">,</span>
<span class="w"> </span><span class="n">p</span><span class="p">.</span><span class="n">id</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">price_id</span>
<span class="k">FROM</span>
<span class="w"> </span><span class="n">orders_order</span><span class="w"> </span><span class="n">o</span>
<span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">orders_order_items</span><span class="w"> </span><span class="n">op</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="p">(</span><span class="n">o</span><span class="p">.</span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">op</span><span class="p">.</span><span class="n">order_id</span><span class="p">)</span>
<span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">events_price</span><span class="w"> </span><span class="n">p</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="p">(</span><span class="n">op</span><span class="p">.</span><span class="n">price_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">p</span><span class="p">.</span><span class="n">id</span><span class="p">)</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span>
<span class="w"> </span><span class="mi">1</span><span class="p">,</span>
<span class="w"> </span><span class="mi">2</span><span class="p">;</span>
<span class="w"> </span><span class="n">order_id</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">price_id</span>
<span class="c1">----------+----------</span>
<span class="w"> </span><span class="mi">45</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">38</span>
<span class="w"> </span><span class="mi">45</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">56</span>
<span class="w"> </span><span class="mi">70</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">38</span>
<span class="w"> </span><span class="mi">70</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">50</span>
<span class="w"> </span><span class="mi">70</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">77</span>
<span class="w"> </span><span class="mi">71</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">38</span>
</pre></div>
<p>Orders 70 and 45 have multiple items so they come up more than once in the result - <strong>Django can't handle that.</strong></p>
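<p>To end up with one object per order, the duplicated rows would have to be collapsed after the join. Here is a plain-Python sketch of that step, using the rows from the result above. It only illustrates the problem and is not what Django does internally:</p>

```python
from collections import defaultdict

# (order_id, price_id) rows copied from the join result above.
rows = [(45, 38), (45, 56), (70, 38), (70, 50), (70, 77), (71, 38)]

# Collapse the duplicated orders: one entry per order, holding a list of its prices.
prices_by_order = defaultdict(list)
for order_id, price_id in rows:
    prices_by_order[order_id].append(price_id)

for order_id, price_ids in prices_by_order.items():
    print(order_id, price_ids)
```

<p>Six joined rows collapse into three orders.</p>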
<hr>
<h3 id="enter-prefetch_related"><a class="toclink" href="#enter-prefetch_related">Enter <code>prefetch_related</code></a></h3>
<p>Django has a nice, built-in way of dealing with this problem called <a href="https://docs.djangoproject.com/en/1.10/ref/models/querysets/#prefetch-related" rel="noopener"><code>prefetch_related</code></a>:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">o</span> <span class="o">=</span> <span class="n">Order</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span>
<span class="gp">... </span> <span class="n">state</span><span class="o">=</span><span class="s1">'completed'</span><span class="p">,</span>
<span class="hll"><span class="gp">... </span><span class="p">)</span><span class="o">.</span><span class="n">prefetch_related</span><span class="p">(</span>
</span><span class="hll"><span class="gp">... </span> <span class="s1">'items__program'</span><span class="p">,</span>
</span><span class="gp">... </span><span class="p">)</span><span class="o">.</span><span class="n">first</span><span class="p">()</span>
<span class="go">(0.002) SELECT ... FROM "orders_order"</span>
<span class="go">WHERE "orders_order"."state" = 'completed'</span>
<span class="go">ORDER BY "orders_order"."id" ASC LIMIT 1;</span>
<span class="go">(0.001) SELECT ("orders_order_items"."order_id") AS "_prefetch_related_val_order_id", "events_price"...</span>
<span class="go">FROM "events_price"</span>
<span class="go">INNER JOIN "orders_order_items" ON ("events_price"."id" = "orders_order_items"."price_id")</span>
<span class="go">WHERE "orders_order_items"."order_id" IN (29);</span>
<span class="go">(0.001) SELECT "events_program"."id", "events_program"."name" FROM "events_program"</span>
<span class="go">WHERE "events_program"."id" IN (8);</span>
</pre></div>
<p>We told Django we intend to fetch <code>items__program</code> from the result set. In the second and third query we can see that Django fetched the through table <code>orders_order_items</code> and the relevant programs from <code>events_program</code>. <strong>The results of the prefetch are cached on the objects</strong>.</p>
<p>What happens when we try to fetch program names from the result?</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="p">[</span><span class="n">p</span><span class="o">.</span><span class="n">program</span><span class="o">.</span><span class="n">name</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">o</span><span class="o">.</span><span class="n">items</span><span class="o">.</span><span class="n">all</span><span class="p">()]</span>
<span class="go">['Day 1 Pass']</span>
</pre></div>
<p><strong>No additional queries</strong> - exactly what we wanted!</p>
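<p>To get a feel for what happens behind the scenes, here is a rough simulation of the same three-query strategy using only <code>sqlite3</code> from the standard library. The table names, the sample rows and the <code>query</code> helper are all simplified assumptions for illustration, this is not Django's actual implementation:</p>

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE program (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE price (id INTEGER PRIMARY KEY, program_id INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE order_items (order_id INTEGER, price_id INTEGER);
    INSERT INTO program VALUES (8, 'Day 1 Pass'), (9, 'Day 2 Pass');
    INSERT INTO price VALUES (38, 8), (50, 9);
    INSERT INTO orders VALUES (29, 'completed'), (30, 'completed');
    INSERT INTO order_items VALUES (29, 38), (30, 38), (30, 50);
""")

executed = []  # every SELECT we send to the database


def query(sql, params=()):
    """Hypothetical helper: run a query and record it so we can count queries."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


def placeholders(values):
    return ', '.join('?' for _ in values)


# Query 1: the orders themselves.
order_ids = [row[0] for row in query(
    "SELECT id FROM orders WHERE state = ? ORDER BY id", ('completed',))]

# Query 2: the prices of all fetched orders, through the many-to-many table.
items = query(
    "SELECT oi.order_id, p.id, p.program_id FROM price p "
    "JOIN order_items oi ON p.id = oi.price_id "
    f"WHERE oi.order_id IN ({placeholders(order_ids)}) "
    "ORDER BY oi.order_id, p.id", order_ids)

# Query 3: all the programs referenced by those prices.
program_ids = sorted({program_id for _, _, program_id in items})
program_names = dict(query(
    f"SELECT id, name FROM program WHERE id IN ({placeholders(program_ids)})",
    program_ids))

# Stitch the results together in Python, no further queries needed.
names_by_order = defaultdict(list)
for order_id, _price_id, program_id in items:
    names_by_order[order_id].append(program_names[program_id])

print(len(executed), dict(names_by_order))
```

<p>The number of queries stays at three no matter how many orders are fetched, because the related rows are looked up with <code>IN</code> for all orders at once.</p>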
<p>When using prefetch, it's important to <strong>work on the object and not on the query</strong>. Trying to fetch the program names with a query will produce the same outcome but will result in an additional query:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">o</span><span class="o">.</span><span class="n">items</span><span class="o">.</span><span class="n">values_list</span><span class="p">(</span><span class="s1">'program__name'</span><span class="p">)</span>
<span class="go">(0.002) SELECT "events_program"."name" FROM "events_price"</span>
<span class="go">INNER JOIN "orders_order_items" ON ("events_price"."id" = "orders_order_items"."price_id")</span>
<span class="go">INNER JOIN "events_program" ON ("events_price"."program_id" = "events_program"."id")</span>
<span class="go">WHERE "orders_order_items"."order_id" = 29 LIMIT 21;</span>
<span class="go">['Day 1 Pass']</span>
</pre></div>
<p>At this point, <strong>fetching 100 orders requires only 3 queries</strong>. Can we do even better?</p>
<h3 id="introducing-prefetch"><a class="toclink" href="#introducing-prefetch">Introducing <code>Prefetch</code></a></h3>
<p>In version 1.7 Django introduced a new <a href="https://docs.djangoproject.com/en/1.11/ref/models/querysets/#django.db.models.Prefetch" rel="noopener"><code>Prefetch</code> object</a> that extends the capabilities of <code>prefetch_related</code>.</p>
<p>The new object allows the developer to override the query used by Django to prefetch the related objects.</p>
<p>In our previous example Django used two queries for the prefetch - one for the through table and one for the program table. What if we could tell Django to join these two together?</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Prefetch</span>
<span class="gp">>>> </span><span class="n">prices_and_programs</span> <span class="o">=</span> <span class="n">Price</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">select_related</span><span class="p">(</span><span class="s1">'program'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">o</span> <span class="o">=</span> <span class="n">Order</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span>
<span class="gp">... </span> <span class="n">state</span><span class="o">=</span><span class="s1">'completed'</span>
<span class="gp">... </span><span class="p">)</span><span class="o">.</span><span class="n">prefetch_related</span><span class="p">(</span>
<span class="hll"><span class="gp">... </span> <span class="n">Prefetch</span><span class="p">(</span><span class="s1">'items'</span><span class="p">,</span> <span class="n">queryset</span><span class="o">=</span><span class="n">prices_and_programs</span><span class="p">)</span>
</span><span class="gp">... </span><span class="p">)</span><span class="o">.</span><span class="n">first</span><span class="p">()</span>
<span class="go">(0.001) SELECT ... FROM "orders_order"</span>
<span class="go">WHERE "orders_order"."state" = 'completed'</span>
<span class="go">ORDER BY "orders_order"."id" ASC LIMIT 1;</span>
<span class="go">(0.001) SELECT ("orders_order_items"."order_id") AS "_prefetch_related_val_order_id",</span>
<span class="go">"events_price"..., "events_program"...</span>
<span class="go">INNER JOIN "events_program" ON ("events_price"."program_id" = "events_program"."id")</span>
<span class="go">WHERE "orders_order_items"."order_id" IN (29);</span>
</pre></div>
<p>We created a query that joins prices with programs. Then we told Django to use this query to prefetch the values. This is like telling Django that you intend to fetch both items and programs for each order.</p>
<p>Fetching program names for an order:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="p">[</span><span class="n">p</span><span class="o">.</span><span class="n">program</span><span class="o">.</span><span class="n">name</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">o</span><span class="o">.</span><span class="n">items</span><span class="o">.</span><span class="n">all</span><span class="p">()]</span>
<span class="go">['Day 1 Pass']</span>
</pre></div>
<p><strong>No additional queries</strong> - it worked!</p>
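<p>The same idea can be sketched outside of Django with <code>sqlite3</code>: one query for the orders and a single prefetch query that already joins prices with programs. The schema and the sample data below are simplified assumptions, not the real tables:</p>

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE program (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE price (id INTEGER PRIMARY KEY, program_id INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE order_items (order_id INTEGER, price_id INTEGER);
    INSERT INTO program VALUES (8, 'Day 1 Pass'), (9, 'Day 2 Pass');
    INSERT INTO price VALUES (38, 8), (50, 9);
    INSERT INTO orders VALUES (29, 'completed'), (30, 'completed');
    INSERT INTO order_items VALUES (29, 38), (30, 38), (30, 50);
""")

# Query 1: the orders.
order_ids = [row[0] for row in conn.execute(
    "SELECT id FROM orders WHERE state = 'completed' ORDER BY id")]

# Query 2: the prefetch query, prices joined with programs for all orders at once.
marks = ', '.join('?' for _ in order_ids)
rows = conn.execute(
    "SELECT oi.order_id, program.name FROM price "
    "JOIN order_items oi ON price.id = oi.price_id "
    "JOIN program ON price.program_id = program.id "
    f"WHERE oi.order_id IN ({marks}) "
    "ORDER BY oi.order_id, price.id",
    order_ids).fetchall()

# Attach the program names to their orders in Python.
names_by_order = defaultdict(list)
for order_id, name in rows:
    names_by_order[order_id].append(name)

print(dict(names_by_order))
# {29: ['Day 1 Pass'], 30: ['Day 1 Pass', 'Day 2 Pass']}
```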
<hr>
<h3 id="taking-it-to-the-next-level"><a class="toclink" href="#taking-it-to-the-next-level">Taking It To The Next Level</a></h3>
<p>When we talked earlier about the models we mentioned that the prices are modeled as an SCD table. This means we might want to query only active prices at a certain date.</p>
<p>A price is active at a certain date if it's between <code>from_date</code> and <code>to_date</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.utils</span> <span class="kn">import</span> <span class="n">timezone</span>
<span class="gp">>>> </span><span class="n">now</span> <span class="o">=</span> <span class="n">timezone</span><span class="o">.</span><span class="n">now</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">active_prices</span> <span class="o">=</span> <span class="n">Price</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span>
<span class="gp">... </span> <span class="n">from_date__lte</span><span class="o">=</span><span class="n">now</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">to_date__gt</span><span class="o">=</span><span class="n">now</span><span class="p">,</span>
<span class="gp">... </span><span class="p">)</span>
</pre></div>
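<p>Notice that the filter uses <code>lte</code> on one side and <code>gt</code> on the other, so the range is half-open: a price is no longer active at the exact moment of its <code>to_date</code>. This way two consecutive prices for the same program never overlap. A plain-Python sketch of the same check (the function name and the dates are made up for illustration):</p>

```python
from datetime import datetime, timezone


def is_active(from_date, to_date, at):
    """True when from_date <= at < to_date (a half-open interval)."""
    return from_date <= at < to_date


start = datetime(2017, 1, 1, tzinfo=timezone.utc)
end = datetime(2017, 6, 1, tzinfo=timezone.utc)

print(is_active(start, end, datetime(2017, 4, 29, tzinfo=timezone.utc)))  # True
print(is_active(start, end, start))  # True, the lower bound is inclusive
print(is_active(start, end, end))    # False, the upper bound is exclusive
```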
<p>Using the Prefetch object we can tell Django to store the prefetched objects in
a new attribute of the result set:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.utils</span> <span class="kn">import</span> <span class="n">timezone</span>
<span class="gp">>>> </span><span class="n">now</span> <span class="o">=</span> <span class="n">timezone</span><span class="o">.</span><span class="n">now</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">active_prices_and_programs</span> <span class="o">=</span> <span class="p">(</span>
<span class="gp">... </span> <span class="n">Price</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span>
<span class="gp">... </span> <span class="n">from_date__lte</span><span class="o">=</span><span class="n">now</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">to_date__gt</span><span class="o">=</span><span class="n">now</span><span class="p">,</span>
<span class="gp">... </span> <span class="p">)</span><span class="o">.</span><span class="n">select_related</span><span class="p">(</span><span class="s1">'program'</span><span class="p">)</span>
<span class="gp">... </span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">o</span> <span class="o">=</span> <span class="n">Order</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span>
<span class="gp">... </span> <span class="n">state</span><span class="o">=</span><span class="s1">'completed'</span>
<span class="gp">... </span><span class="p">)</span><span class="o">.</span><span class="n">prefetch_related</span><span class="p">(</span>
<span class="hll"><span class="gp">... </span> <span class="n">Prefetch</span><span class="p">(</span>
</span><span class="hll"><span class="gp">... </span> <span class="s1">'items'</span><span class="p">,</span>
</span><span class="hll"><span class="gp">... </span> <span class="n">queryset</span><span class="o">=</span><span class="n">active_prices_and_programs</span><span class="p">,</span>
</span><span class="hll"><span class="gp">... </span> <span class="n">to_attr</span><span class="o">=</span><span class="s1">'active_prices'</span><span class="p">,</span>
</span><span class="hll"><span class="gp">... </span> <span class="p">),</span>
</span><span class="gp">... </span><span class="p">)</span><span class="o">.</span><span class="n">first</span><span class="p">()</span>
<span class="go">(0.001) SELECT ... FROM "orders_order"</span>
<span class="go">WHERE "orders_order"."state" = 'completed'</span>
<span class="go">ORDER BY "orders_order"."id" ASC</span>
<span class="go">LIMIT 1;</span>
<span class="go">(0.001) SELECT ... FROM "events_price"</span>
<span class="go">INNER JOIN "orders_order_items" ON ("events_price"."id" = "orders_order_items"."price_id")</span>
<span class="go">INNER JOIN "events_program" ON ("events_price"."program_id" = "events_program"."id")</span>
<span class="go">WHERE ("orders_order_items"."order_id" IN (29)</span>
<span class="hll"><span class="go">AND "events_price"."from_date" <= '2017-04-29T07:53:00.210537+00:00'::timestamptz</span>
</span><span class="hll"><span class="go">AND "events_price"."to_date" > '2017-04-29T07:53:00.210537+00:00'::timestamptz);</span>
</span></pre></div>
<p>We can see in the log that Django performed only two queries, and the prefetch query now includes the custom filter we defined.</p>
<p>To fetch the active prices we can use the new attribute defined in <code>to_attr</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="p">[</span><span class="n">p</span><span class="o">.</span><span class="n">program</span><span class="o">.</span><span class="n">name</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">o</span><span class="o">.</span><span class="n">active_prices</span><span class="p">]</span>
<span class="go">['Day 1 Pass']</span>
</pre></div>
<p><strong>No additional query!</strong></p>
<hr>
<h3 id="final-words"><a class="toclink" href="#final-words">Final Words</a></h3>
<p>Prefetch is a very powerful feature of Django ORM. I strongly recommend going over <a href="https://docs.djangoproject.com/en/2.1/ref/models/querysets/#prefetch-related" rel="noopener">the documentation</a>, you are bound to strike a gem.</p>How to Turn Django Admin Into a Lightweight Dashboard2017-03-31T00:00:00+03:002017-03-31T00:00:00+03:00Haki Benitatag:hakibenita.com,2017-03-31:/how-to-turn-django-admin-into-a-lightweight-dashboard<p>Django Admin is a powerful tool for managing data in your app. However, it was not designed with summary tables and charts in mind. Luckily, the developers of Django Admin made it easy for us to customize. We are going to turn Django Admin into a dashboard by adding a chart and a summary table.</p><hr>
<div class="admonition info">
<p class="admonition-title">source code</p>
<p>The complete source code for this article can be found in <a href="https://gist.github.com/hakib/ec462baef03a6146654e4c095142b5eb" rel="noopener">this gist</a>.</p>
</div>
<p>Django Admin is a powerful tool for managing data in your app. However, it was not designed with summary tables and charts in mind. Luckily, the developers of Django Admin made it easy for us to customize.</p>
<p>This is what it's going to look like at the end:</p>
<figure><img alt="Django admin dashboard" src="https://hakibenita.com/images/01-how-to-turn-django-admin-into-a-lightweight-dashboard.png"><figcaption>Django admin dashboard</figcaption>
</figure>
<h3 id="why-would-i-want-to-do-that"><a class="toclink" href="#why-would-i-want-to-do-that">Why Would I Want To Do That</a></h3>
<p>There are a lot of tools, apps and packages out there that can produce very nice looking dashboards. I personally found that <strong>unless the product is an actual dashboard</strong>, most of the time all you need is a simple summary table and a few charts.</p>
<p>Just as important, this approach adds no dependencies.</p>
<p><strong>If all you need is a little boost to your admin interface this approach is definitely worth considering.</strong></p>
<h3 id="setup"><a class="toclink" href="#setup">Setup</a></h3>
<p>We are going to use a made up <code>Sale</code> model.</p>
<p>To harness the full power of Django Admin we are going to <strong>base our dashboard on the built-in ModelAdmin</strong>.</p>
<p>To do that we need a model:</p>
<div class="highlight"><pre><span></span><span class="c1"># models.py</span>
<span class="k">class</span> <span class="nc">SaleSummary</span><span class="p">(</span><span class="n">Sale</span><span class="p">):</span>
    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
<span class="hll">        <span class="n">proxy</span> <span class="o">=</span> <span class="kc">True</span>
</span>        <span class="n">verbose_name</span> <span class="o">=</span> <span class="s1">'Sale Summary'</span>
        <span class="n">verbose_name_plural</span> <span class="o">=</span> <span class="s1">'Sales Summary'</span>
</pre></div>
<p>A <a href="https://docs.djangoproject.com/en/1.10/topics/db/models/#proxy-models" rel="noopener">proxy model</a> extends the functionality of another model without creating an actual table in the database.</p>
<p>Now that we have a model we can create the <code>ModelAdmin</code>:</p>
<div class="highlight"><pre><span></span><span class="c1"># admin.py</span>
<span class="kn">from</span> <span class="nn">django.contrib</span> <span class="kn">import</span> <span class="n">admin</span>
<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">SaleSummary</span>
<span class="nd">@admin</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">SaleSummary</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">SaleSummaryAdmin</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
    <span class="n">change_list_template</span> <span class="o">=</span> <span class="s1">'admin/sale_summary_change_list.html'</span>
    <span class="n">date_hierarchy</span> <span class="o">=</span> <span class="s1">'created'</span>
</pre></div>
<p>Because we are using a standard <code>ModelAdmin</code> we can use its features. In this example I added a <code>date_hierarchy</code> to filter sales by creation date. We are going to use this later for the chart.</p>
<p>To keep the page looking like a "regular" admin page we extend Django's <code>change_list</code> template and place our content in the <code>result_list</code> block:</p>
<div class="highlight"><pre><span></span><span class="x"><!-- sales/templates/admin/sale_summary_change_list.html --></span>
<span class="hll"><span class="cp">{%</span> <span class="k">extends</span> <span class="s2">"admin/change_list.html"</span> <span class="cp">%}</span>
</span>
<span class="cp">{%</span> <span class="k">block</span> <span class="nv">content_title</span> <span class="cp">%}</span>
<span class="x"> <h1> Sales Summary </h1></span>
<span class="cp">{%</span> <span class="k">endblock</span> <span class="cp">%}</span>
<span class="cp">{%</span> <span class="k">block</span> <span class="nv">result_list</span> <span class="cp">%}</span>
<span class="hll"><span class="x"> <!-- Our content goes here... --></span>
</span><span class="cp">{%</span> <span class="k">endblock</span> <span class="cp">%}</span>
<span class="cp">{%</span> <span class="k">block</span> <span class="nv">pagination</span> <span class="cp">%}{%</span> <span class="k">endblock</span> <span class="cp">%}</span>
</pre></div>
<p>This is what our page looks like at this point:</p>
<figure><img alt="A bare Django admin dashboard" src="https://hakibenita.com/images/02-how-to-turn-django-admin-into-a-lightweight-dashboard.png"><figcaption>A bare Django admin dashboard</figcaption>
</figure>
<h3 id="adding-a-summary-table"><a class="toclink" href="#adding-a-summary-table">Adding a Summary Table</a></h3>
<p>The context sent to the template is populated by the <code>ModelAdmin</code> in a function called <code>changelist_view</code>.</p>
<p>To render the table in the template we fetch the data in <code>changelist_view</code> and add it to the context:</p>
<div class="highlight"><pre><span></span><span class="c1"># admin.py</span>
<span class="kn">from</span> <span class="nn">django.contrib</span> <span class="kn">import</span> <span class="n">admin</span>
<span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Count</span><span class="p">,</span> <span class="n">Sum</span>


<span class="k">class</span> <span class="nc">SaleSummaryAdmin</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
    <span class="c1"># ...</span>
    <span class="k">def</span> <span class="nf">changelist_view</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="n">extra_context</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="n">response</span> <span class="o">=</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">changelist_view</span><span class="p">(</span>
            <span class="n">request</span><span class="p">,</span>
            <span class="n">extra_context</span><span class="o">=</span><span class="n">extra_context</span><span class="p">,</span>
        <span class="p">)</span>
        <span class="k">try</span><span class="p">:</span>
<span class="hll">            <span class="n">qs</span> <span class="o">=</span> <span class="n">response</span><span class="o">.</span><span class="n">context_data</span><span class="p">[</span><span class="s1">'cl'</span><span class="p">]</span><span class="o">.</span><span class="n">queryset</span>
</span>        <span class="k">except</span> <span class="p">(</span><span class="ne">AttributeError</span><span class="p">,</span> <span class="ne">KeyError</span><span class="p">):</span>
            <span class="k">return</span> <span class="n">response</span>
        <span class="n">metrics</span> <span class="o">=</span> <span class="p">{</span>
            <span class="s1">'total'</span><span class="p">:</span> <span class="n">Count</span><span class="p">(</span><span class="s1">'id'</span><span class="p">),</span>
            <span class="s1">'total_sales'</span><span class="p">:</span> <span class="n">Sum</span><span class="p">(</span><span class="s1">'price'</span><span class="p">),</span>
        <span class="p">}</span>
<span class="hll">        <span class="n">response</span><span class="o">.</span><span class="n">context_data</span><span class="p">[</span><span class="s1">'summary'</span><span class="p">]</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span>
</span>            <span class="n">qs</span>
            <span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'sale__category__name'</span><span class="p">)</span>
            <span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="o">**</span><span class="n">metrics</span><span class="p">)</span>
            <span class="o">.</span><span class="n">order_by</span><span class="p">(</span><span class="s1">'-total_sales'</span><span class="p">)</span>
        <span class="p">)</span>
        <span class="k">return</span> <span class="n">response</span>
</pre></div>
<p>Let's break it down:</p>
<ol>
<li>Call super to let Django do its thing (populate headers, breadcrumbs, queryset, filters and so on).</li>
<li>Extract the queryset created for us from the context. At this point the query is filtered with any inline filters or date hierarchy selected by the user.</li>
<li>If we can't fetch the queryset from the context it's most likely due to invalid query parameters. In cases like this Django will redirect so we don't interfere and return the response.</li>
<li>Aggregate total sales by category and return a list (the "metrics" dict will become clear in the next section).</li>
</ol>
<p>Now that we have the data in the context we can render it in the template:</p>
<div class="highlight"><pre><span></span><span class="x"><!-- sale_summary_change_list.html --></span>
<span class="cp">{%</span> <span class="k">load</span> <span class="nv">humanize</span> <span class="cp">%}</span>
<span class="x"><!-- ... --></span>
<span class="cp">{%</span> <span class="k">block</span> <span class="nv">result_list</span> <span class="cp">%}</span>
<span class="x"><div class="results"></span>
<span class="x"> <table></span>
<span class="x"> <thead></span>
<span class="x"> <tr></span>
<span class="x"> <th></span>
<span class="x"> <div class="text"></span>
<span class="hll"><span class="x"> <a href="#">Category</a></span>
</span><span class="x"> </div></span>
<span class="x"> </th></span>
<span class="x"> <th></span>
<span class="x"> <div class="text"></span>
<span class="hll"><span class="x"> <a href="#">Total</a></span>
</span><span class="x"> </div></span>
<span class="x"> </th></span>
<span class="x"> <th></span>
<span class="x"> <div class="text"></span>
<span class="hll"><span class="x"> <a href="#">Total Sales</a></span>
</span><span class="x"> </div></span>
<span class="x"> </th></span>
<span class="x"> <th></span>
<span class="x"> <div class="text"></span>
<span class="x"> <a href="#"></span>
<span class="hll"><span class="x"> <strong>% Of Total Sales</strong></span>
</span><span class="x"> </a></span>
<span class="x"> </div></span>
<span class="x"> </th></span>
<span class="x"> </tr></span>
<span class="x"> </thead></span>
<span class="x"> <tbody></span>
<span class="x"> </span><span class="cp">{%</span> <span class="k">for</span> <span class="nv">row</span> <span class="k">in</span> <span class="nv">summary</span> <span class="cp">%}</span>
<span class="hll"><span class="x"> <tr class="</span><span class="cp">{%</span> <span class="k">cycle</span> <span class="s1">'row1'</span> <span class="s1">'row2'</span> <span class="cp">%}</span><span class="x">"></span>
</span><span class="x"> <td> </span><span class="cp">{{</span> <span class="nv">row.sale__category__name</span> <span class="cp">}}</span><span class="x"> </td></span>
<span class="x"> <td> </span><span class="cp">{{</span> <span class="nv">row.total</span> <span class="o">|</span> <span class="nf">intcomma</span> <span class="cp">}}</span><span class="x"> </td></span>
<span class="x"> <td> </span><span class="cp">{{</span> <span class="nv">row.total_sales</span> <span class="o">|</span> <span class="nf">default</span><span class="o">:</span><span class="m">0</span> <span class="o">|</span> <span class="nf">intcomma</span> <span class="cp">}}</span><span class="x">$ </td></span>
<span class="x"> <td></span>
<span class="x"> <strong></span>
<span class="x"> </span><span class="cp">{{</span> <span class="nv">row.total_sales</span> <span class="o">|</span>
<span class="nf">default</span><span class="o">:</span><span class="m">0</span> <span class="o">|</span>
<span class="nf">percentof</span><span class="o">:</span><span class="nv">summary_total.total_sales</span> <span class="cp">}}</span>
<span class="x"> </strong></span>
<span class="x"> </td></span>
<span class="x"> </tr></span>
<span class="x"> </span><span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span>
<span class="x"> </tbody></span>
<span class="x"> </table></span>
<span class="x"></div></span>
<span class="x"><!-- ... --></span>
<span class="cp">{%</span> <span class="k">endblock</span> <span class="cp">%}</span>
</pre></div>
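<p>The template relies on a <code>percentof</code> filter that is not shown in this article. Below is a minimal sketch of what such a filter might compute, assuming it receives the row amount and the grand total and returns a formatted percentage. The signature and the output format are assumptions, and in a real project the function would also have to be registered as a template filter:</p>

```python
def percentof(amount, total):
    """Return ``amount`` as a percentage of ``total``, e.g. '25.00%'.

    Hypothetical helper: the name matches the filter used in the template,
    but the behavior shown here is an assumption.
    """
    try:
        return '{:.2f}%'.format(float(amount) / float(total) * 100)
    except (TypeError, ValueError, ZeroDivisionError):
        # Missing values or an empty total render as an empty cell.
        return ''


print(percentof(25, 100))
print(percentof(5, 0))
```

<p>Returning an empty string for a missing or zero total keeps the table rendering even when the filtered queryset is empty.</p>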
<p><strong>The markup is important</strong>. To get the native Django look we need to render tables in the same way Django renders them.</p>
<p>This is what we have so far:</p>
<figure><img alt="Django admin dashboard with just a table" src="https://hakibenita.com/images/03-how-to-turn-django-admin-into-a-lightweight-dashboard.png"><figcaption>Django admin dashboard with just a table</figcaption>
</figure>
<p>A summary table is not much without a bottom line. We can use the metrics and do some Django ORM voodoo to quickly <strong>calculate the bottom line</strong>:</p>
<div class="highlight"><pre><span></span><span class="c1"># admin.py</span>
<span class="k">class</span> <span class="nc">SaleSummaryAdmin</span><span class="p">(</span><span class="n">ModelAdmin</span><span class="p">):</span>
    <span class="c1"># ...</span>
    <span class="k">def</span> <span class="nf">changelist_view</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="n">extra_context</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="c1"># ...</span>
        <span class="n">response</span><span class="o">.</span><span class="n">context_data</span><span class="p">[</span><span class="s1">'summary_total'</span><span class="p">]</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span>
<span class="hll">            <span class="n">qs</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span><span class="o">**</span><span class="n">metrics</span><span class="p">)</span>
</span>        <span class="p">)</span>
        <span class="k">return</span> <span class="n">response</span>
</pre></div>
<p>That's a pretty cool trick...</p>
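<p>The point of the <code>metrics</code> dict is that one definition feeds both the per-category rows and the bottom line. A plain-Python sketch of the same idea, using made-up rows and ordinary functions instead of ORM aggregates (everything below is illustrative, not Django code):</p>

```python
from collections import defaultdict

# Made-up rows standing in for the filtered queryset.
sales = [
    {'category': 'books', 'price': 10},
    {'category': 'books', 'price': 15},
    {'category': 'games', 'price': 40},
]

# One definition of the metrics, reused for every grouping.
metrics = {
    'total': len,
    'total_sales': lambda rows: sum(r['price'] for r in rows),
}


def summarize(rows):
    """Apply every metric to the given rows."""
    return {name: func(rows) for name, func in metrics.items()}


by_category = defaultdict(list)
for row in sales:
    by_category[row['category']].append(row)

# Per-category rows, similar in spirit to values(...).annotate(**metrics).
summary = sorted(
    ({'category': name, **summarize(rows)} for name, rows in by_category.items()),
    key=lambda r: r['total_sales'],
    reverse=True,
)

# Bottom line, similar in spirit to aggregate(**metrics).
summary_total = summarize(sales)

print(summary)
print(summary_total)
```

<p>Adding a metric in one place adds it to both the rows and the total, which is what makes the dict worth having.</p>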
<p>Let's add the bottom line to the table:</p>
<div class="highlight"><pre><span></span><span class="x"><!-- sale_summary_change_list.html --></span>
<span class="x"><div class="results"></span>
<span class="x"> <table></span>
<span class="x"> <!-- ... --></span>
<span class="x"> <tr style="font-weight:bold; border-top:2px solid #DDDDDD;"></span>
<span class="x"> <td> Total </td></span>
<span class="hll"><span class="x"> <td> </span><span class="cp">{{</span> <span class="nv">summary_total.total</span> <span class="o">|</span> <span class="nf">intcomma</span> <span class="cp">}}</span><span class="x"> </td></span>
</span><span class="hll"><span class="x"> <td> </span><span class="cp">{{</span> <span class="nv">summary_total.total_sales</span> <span class="o">|</span> <span class="nf">default</span><span class="o">:</span><span class="m">0</span> <span class="cp">}}</span><span class="x">$ </td></span>
</span><span class="x"> <td> 100% </td></span>
<span class="x"> </tr></span>
<span class="x"> </table></span>
<span class="x"></div></span>
</pre></div>
<p>This is starting to take shape:</p>
<figure><img alt="Django admin dashboard with a summary table" src="https://hakibenita.com/images/04-how-to-turn-django-admin-into-a-lightweight-dashboard.png"><figcaption>Django admin dashboard with a summary table</figcaption>
</figure>
<h3 id="adding-filters"><a class="toclink" href="#adding-filters">Adding Filters</a></h3>
<p>We are using a "regular" model admin so <strong>filters are already baked in</strong>. Let's add a filter by device:</p>
<div class="highlight"><pre><span></span><span class="c1"># admin.py</span>
<span class="k">class</span> <span class="nc">SaleSummaryAdmin</span><span class="p">(</span><span class="n">ModelAdmin</span><span class="p">):</span>
    <span class="c1"># ...</span>
<span class="hll">    <span class="n">list_filter</span> <span class="o">=</span> <span class="p">(</span>
</span><span class="hll">        <span class="s1">'device'</span><span class="p">,</span>
</span><span class="hll">    <span class="p">)</span>
</span></pre></div>
<p>And the result:</p>
<figure><img alt="Django admin dashboard with a filter" src="https://hakibenita.com/images/05-how-to-turn-django-admin-into-a-lightweight-dashboard.png"><figcaption>Django admin dashboard with a filter</figcaption>
</figure>
<h3 id="adding-a-chart"><a class="toclink" href="#adding-a-chart">Adding a Chart</a></h3>
<p>A dashboard is not complete without a chart so <strong>we are going to add a bar chart to show sales over time.</strong></p>
<p>To build our chart we are going to use plain HTML and some good ol' CSS with flexbox. The data for the chart is going to be a time series of percents to use as the bar height.</p>
<p>Back to our <code>changelist_view</code>, we add the following:</p>
<div class="highlight"><pre><span></span><span class="c1"># admin.py</span>
<span class="kn">from</span> <span class="nn">django.db.models.functions</span> <span class="kn">import</span> <span class="n">Trunc</span>
<span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">DateTimeField</span><span class="p">,</span> <span class="n">Max</span><span class="p">,</span> <span class="n">Min</span><span class="p">,</span> <span class="n">Sum</span>
<span class="k">class</span> <span class="nc">SaleSummaryAdmin</span><span class="p">(</span><span class="n">ModelAdmin</span><span class="p">):</span>
    <span class="c1"># ...</span>
    <span class="k">def</span> <span class="nf">changelist_view</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="n">extra_context</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="c1"># ...</span>
        <span class="n">summary_over_time</span> <span class="o">=</span> <span class="p">(</span>
            <span class="n">qs</span><span class="o">.</span><span class="n">annotate</span><span class="p">(</span>
                <span class="n">period</span><span class="o">=</span><span class="n">Trunc</span><span class="p">(</span>
                    <span class="s1">'created'</span><span class="p">,</span>
                    <span class="s1">'day'</span><span class="p">,</span>
                    <span class="n">output_field</span><span class="o">=</span><span class="n">DateTimeField</span><span class="p">(),</span>
                <span class="p">),</span>
            <span class="p">)</span>
            <span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'period'</span><span class="p">)</span>
            <span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">total</span><span class="o">=</span><span class="n">Sum</span><span class="p">(</span><span class="s1">'price'</span><span class="p">))</span>
            <span class="o">.</span><span class="n">order_by</span><span class="p">(</span><span class="s1">'period'</span><span class="p">)</span>
        <span class="p">)</span>
        <span class="n">summary_range</span> <span class="o">=</span> <span class="n">summary_over_time</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span>
            <span class="n">low</span><span class="o">=</span><span class="n">Min</span><span class="p">(</span><span class="s1">'total'</span><span class="p">),</span>
            <span class="n">high</span><span class="o">=</span><span class="n">Max</span><span class="p">(</span><span class="s1">'total'</span><span class="p">),</span>
        <span class="p">)</span>
        <span class="n">high</span> <span class="o">=</span> <span class="n">summary_range</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'high'</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
        <span class="n">low</span> <span class="o">=</span> <span class="n">summary_range</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'low'</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="hll">
</span>        <span class="n">response</span><span class="o">.</span><span class="n">context_data</span><span class="p">[</span><span class="s1">'summary_over_time'</span><span class="p">]</span> <span class="o">=</span> <span class="p">[{</span>
            <span class="s1">'period'</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="s1">'period'</span><span class="p">],</span>
            <span class="s1">'total'</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="s1">'total'</span><span class="p">]</span> <span class="ow">or</span> <span class="mi">0</span><span class="p">,</span>
            <span class="s1">'pct'</span><span class="p">:</span> \
                <span class="p">((</span><span class="n">x</span><span class="p">[</span><span class="s1">'total'</span><span class="p">]</span> <span class="ow">or</span> <span class="mi">0</span><span class="p">)</span> <span class="o">-</span> <span class="n">low</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">high</span> <span class="o">-</span> <span class="n">low</span><span class="p">)</span> <span class="o">*</span> <span class="mi">100</span>
                <span class="k">if</span> <span class="n">high</span> <span class="o">></span> <span class="n">low</span> <span class="k">else</span> <span class="mi">0</span><span class="p">,</span>
        <span class="p">}</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">summary_over_time</span><span class="p">]</span>
        <span class="k">return</span> <span class="n">response</span>
</pre></div>
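<p>The <code>pct</code> value is a min-max scaling of each period's total. Stripped of the ORM, the calculation looks roughly like this (a standalone sketch; <code>bar_heights</code> is a made-up name, and it derives the range from the list itself rather than from a database aggregate):</p>

```python
def bar_heights(totals):
    """Scale totals to the 0-100 range relative to the lowest and highest value."""
    values = [t or 0 for t in totals]  # treat missing totals as zero
    if not values:
        return []
    low, high = min(values), max(values)
    if high <= low:
        # All periods are equal (or there is only one): avoid dividing by zero.
        return [0 for _ in values]
    return [(v - low) / (high - low) * 100 for v in values]


print(bar_heights([10, 20, 30]))
```

<p>Note that with this scaling the lowest period always gets a height of 0, so it shows up as an empty slot rather than a short bar.</p>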
<p>Let's add the bar chart to the template and style it a bit:</p>
<div class="highlight"><pre><span></span><span class="x"><!-- sale_summary_change_list.html --></span>
<span class="x"><div class="results"></span>
<span class="x"> <!-- ... --></span>
<span class="x"> <h2> Sales over time </h2></span>
<span class="x"> <style></span>
<span class="x"> .bar-chart {</span>
<span class="hll"><span class="x"> display: flex;</span>
</span><span class="hll"><span class="x"> justify-content: space-around;</span>
</span><span class="x"> height: 160px;</span>
<span class="x"> padding-top: 60px;</span>
<span class="x"> overflow: hidden;</span>
<span class="x"> }</span>
<span class="x"> .bar-chart .bar {</span>
<span class="hll"><span class="x"> flex: 100%;</span>
</span><span class="hll"><span class="x"> align-self: flex-end;</span>
</span><span class="x"> margin-right: 2px;</span>
<span class="x"> position: relative;</span>
<span class="x"> background-color: #79aec8;</span>
<span class="x"> }</span>
<span class="x"> .bar-chart .bar:last-child {</span>
<span class="x"> margin: 0;</span>
<span class="x"> }</span>
<span class="x"> .bar-chart .bar:hover {</span>
<span class="x"> background-color: #417690;</span>
<span class="x"> }</span>
<span class="x"> .bar-chart .bar .bar-tooltip {</span>
<span class="x"> position: relative;</span>
<span class="x"> z-index: 999;</span>
<span class="x"> }</span>
<span class="x"> .bar-chart .bar .bar-tooltip {</span>
<span class="x"> position: absolute;</span>
<span class="x"> top: -60px;</span>
<span class="x"> left: 50%;</span>
<span class="x"> transform: translateX(-50%);</span>
<span class="x"> text-align: center;</span>
<span class="x"> font-weight: bold;</span>
<span class="x"> opacity: 0;</span>
<span class="x"> }</span>
<span class="x"> .bar-chart .bar:hover .bar-tooltip {</span>
<span class="x"> opacity: 1;</span>
<span class="x"> }</span>
<span class="x"> </style></span>
<span class="x"> <div class="results"></span>
<span class="x"> <div class="bar-chart"></span>
<span class="hll"><span class="x"> </span><span class="cp">{%</span> <span class="k">for</span> <span class="nv">x</span> <span class="k">in</span> <span class="nv">summary_over_time</span> <span class="cp">%}</span>
</span><span class="x"> <div class="bar" style="height:</span><span class="cp">{{</span><span class="nv">x.pct</span><span class="cp">}}</span><span class="x">%"></span>
<span class="x"> <div class="bar-tooltip"></span>
<span class="x"> </span><span class="cp">{{</span><span class="nv">x.total</span> <span class="o">|</span> <span class="nf">default</span><span class="o">:</span><span class="m">0</span> <span class="o">|</span> <span class="nf">intcomma</span> <span class="cp">}}</span><span class="x"><br></span>
<span class="x"> </span><span class="cp">{{</span><span class="nv">x.period</span> <span class="o">|</span> <span class="nf">date</span><span class="s2">:"d/m/Y"</span><span class="cp">}}</span>
<span class="x"> </div></span>
<span class="x"> </div></span>
<span class="x"> </span><span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span>
<span class="x"> </div></span>
<span class="x"> </div></span>
<span class="x"></div></span>
</pre></div>
<p>For those of you not familiar with flexbox, that piece of CSS means "draw from the bottom up, pull to the left and adjust the width to fit".</p>
<p>This is what it looks like now:</p>
<figure><img alt="Django admin dashboard with a basic chart" src="https://hakibenita.com/images/06-how-to-turn-django-admin-into-a-lightweight-dashboard.png"><figcaption>Django admin dashboard with a basic chart</figcaption>
</figure>
<p>That's looking pretty good, but... Each bar in the chart represents a day. What will happen when we try to show data for a single day? Or several years?</p>
<figure><img alt="Daily chart for several years" src="https://hakibenita.com/images/07-how-to-turn-django-admin-into-a-lightweight-dashboard.png"><figcaption>Daily chart for several years</figcaption>
</figure>
<p>A chart like that is both <strong>unreadable and dangerous</strong>. Fetching so much data will flood the server and generate a huge HTML file.</p>
<p>Django Admin has a date hierarchy - let's see if we can use that to <strong>adjust the period of the bars based on the selected date hierarchy:</strong></p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">get_next_in_date_hierarchy</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">date_hierarchy</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">date_hierarchy</span> <span class="o">+</span> <span class="s1">'__day'</span> <span class="ow">in</span> <span class="n">request</span><span class="o">.</span><span class="n">GET</span><span class="p">:</span>
        <span class="k">return</span> <span class="s1">'hour'</span>
    <span class="k">if</span> <span class="n">date_hierarchy</span> <span class="o">+</span> <span class="s1">'__month'</span> <span class="ow">in</span> <span class="n">request</span><span class="o">.</span><span class="n">GET</span><span class="p">:</span>
        <span class="k">return</span> <span class="s1">'day'</span>
    <span class="k">if</span> <span class="n">date_hierarchy</span> <span class="o">+</span> <span class="s1">'__year'</span> <span class="ow">in</span> <span class="n">request</span><span class="o">.</span><span class="n">GET</span><span class="p">:</span>
        <span class="k">return</span> <span class="s1">'week'</span>
    <span class="k">return</span> <span class="s1">'month'</span>
</pre></div>
<ul>
<li>If the user filtered a <strong>single day</strong>, each bar will be <strong>one hour</strong> (max 24 bars).</li>
<li>If the user selected a <strong>month</strong>, each bar will be <strong>one day</strong> (max 31 bars).</li>
<li>If the user selected a <strong>year</strong>, each bar will be <strong>one week</strong> (about 52 bars).</li>
<li><strong>More</strong> than that and each bar will be <strong>one month</strong>.</li>
</ul>
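<p>The rules above can be checked without a running server. The sketch below repeats the function and feeds it stand-in request objects; <code>fake_request</code> is a hypothetical helper that only mimics the <code>GET</code> mapping, and <code>created</code> is assumed to be the date hierarchy field:</p>

```python
from types import SimpleNamespace


def get_next_in_date_hierarchy(request, date_hierarchy):
    if date_hierarchy + '__day' in request.GET:
        return 'hour'
    if date_hierarchy + '__month' in request.GET:
        return 'day'
    if date_hierarchy + '__year' in request.GET:
        return 'week'
    return 'month'


def fake_request(**params):
    """Stand-in for a request: only the GET mapping is provided."""
    return SimpleNamespace(GET=params)


# Drilling down to a day keeps the year and month parameters as well,
# which is why the most specific parameter is checked first.
day = fake_request(created__year='2017', created__month='2', created__day='18')
month = fake_request(created__year='2017', created__month='2')
year = fake_request(created__year='2017')

print(get_next_in_date_hierarchy(day, 'created'))
print(get_next_in_date_hierarchy(fake_request(), 'created'))
```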
<p>Now we need just one small adjustment to the change list view:</p>
<div class="highlight"><pre><span></span><span class="c1"># admin.py</span>
<span class="k">class</span> <span class="nc">SaleSummaryAdmin</span><span class="p">(</span><span class="n">ModelAdmin</span><span class="p">):</span>
    <span class="c1"># ...</span>
    <span class="k">def</span> <span class="nf">changelist_view</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="n">extra_context</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="c1"># ...</span>
<span class="hll">        <span class="n">period</span> <span class="o">=</span> <span class="n">get_next_in_date_hierarchy</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">date_hierarchy</span><span class="p">)</span>
</span><span class="hll">        <span class="n">response</span><span class="o">.</span><span class="n">context_data</span><span class="p">[</span><span class="s1">'period'</span><span class="p">]</span> <span class="o">=</span> <span class="n">period</span>
</span>
        <span class="n">summary_over_time</span> <span class="o">=</span> <span class="p">(</span>
            <span class="n">qs</span><span class="o">.</span><span class="n">annotate</span><span class="p">(</span>
<span class="hll">                <span class="n">period</span><span class="o">=</span><span class="n">Trunc</span><span class="p">(</span><span class="s1">'created'</span><span class="p">,</span> <span class="n">period</span><span class="p">,</span> <span class="n">output_field</span><span class="o">=</span><span class="n">DateTimeField</span><span class="p">()),</span>
</span>            <span class="p">)</span>
            <span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'period'</span><span class="p">)</span>
            <span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">total</span><span class="o">=</span><span class="n">Sum</span><span class="p">(</span><span class="s1">'price'</span><span class="p">))</span>
            <span class="o">.</span><span class="n">order_by</span><span class="p">(</span><span class="s1">'period'</span><span class="p">)</span>
        <span class="p">)</span>
        <span class="c1"># ...</span>
</pre></div>
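<p>To get a feel for what truncating by a period does to each timestamp, here is a rough plain-Python approximation. It is only a sketch: it ignores time zones, <code>truncate</code> is a made-up helper, and it follows the convention of starting weeks on Monday:</p>

```python
from datetime import datetime, timedelta


def truncate(dt, period):
    """Round a datetime down to the start of the given period."""
    if period == 'hour':
        return dt.replace(minute=0, second=0, microsecond=0)
    day = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == 'day':
        return day
    if period == 'week':
        # weekday() is 0 for Monday, so this steps back to the Monday of the week.
        return day - timedelta(days=day.weekday())
    if period == 'month':
        return day.replace(day=1)
    raise ValueError('unsupported period: {}'.format(period))


created = datetime(2017, 3, 15, 13, 45)
for period in ('hour', 'day', 'week', 'month'):
    print(period, truncate(created, period))
```

<p>Every sale that truncates to the same value lands in the same bar, so a coarser period means fewer rows to fetch and fewer bars to draw.</p>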
<p>The <code>period</code> argument passed to <code>Trunc</code> is now a parameter. The result:</p>
<figure><img alt="Django admin dashboard chart with adjusted period" src="https://hakibenita.com/images/08-how-to-turn-django-admin-into-a-lightweight-dashboard.png"><figcaption>Django admin dashboard chart with adjusted period</figcaption>
</figure>
<p>That's a beautiful trend...</p>
<hr>
<h3 id="where-can-we-take-it-from-here"><a class="toclink" href="#where-can-we-take-it-from-here">Where Can We Take It From Here?</a></h3>
<p>Now that you have all this spare time from <em>not</em> rolling your own dashboard you can:</p>
<ul>
<li><a href="/things-you-must-know-about-django-admin-as-your-app-gets-bigger">Make it faster</a>.</li>
<li><a href="/how-to-add-custom-action-buttons-to-django-admin">Add a button</a>.</li>
</ul>How to Test Django Signals Like a Pro2017-02-18T00:00:00+02:002017-02-18T00:00:00+02:00Haki Benitatag:hakibenita.com,2017-02-18:/how-to-test-django-signals-like-a-pro<p>Django signals are extremely useful for decoupling modules. They allow a low-level Django app to send events for other apps to handle without creating a direct dependency. Signals are easy to set up, but harder to test. In this article we implement a context manager for testing Django signals, step by step.</p><hr>
<p><a href="https://docs.djangoproject.com/en/1.10/topics/signals/" rel="noopener">Django Signals</a> are extremely useful for decoupling modules. They allow a low-level Django app to
send events for other apps to handle without creating a direct dependency.</p>
<h3 id="the-use-case"><a class="toclink" href="#the-use-case">The Use Case</a></h3>
<p>Let's say you have a payment module with a charge function. (I <a href="/working-with-apis-the-pythonic-way">write a lot about payments</a>, so I know this use case well.) Once a charge is made, you want to increment a total charges counter.</p>
<p>What would that look like using signals?</p>
<p>First, define the signal:</p>
<div class="highlight"><pre><span></span><span class="c1"># signals.py</span>
<span class="kn">from</span> <span class="nn">django.dispatch</span> <span class="kn">import</span> <span class="n">Signal</span>
<span class="n">charge_completed</span> <span class="o">=</span> <span class="n">Signal</span><span class="p">(</span><span class="n">providing_args</span><span class="o">=</span><span class="p">[</span><span class="s1">'total'</span><span class="p">])</span>
</pre></div>
<p>Then send the signal when a charge completes successfully:</p>
<div class="highlight"><pre><span></span><span class="c1"># payment.py</span>
<span class="kn">from</span> <span class="nn">.signals</span> <span class="kn">import</span> <span class="n">charge_completed</span>
<span class="nd">@classmethod</span>
<span class="k">def</span> <span class="nf">process_charge</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">total</span><span class="p">):</span>
    <span class="c1"># Process charge...</span>
    <span class="k">if</span> <span class="n">success</span><span class="p">:</span>
<span class="hll">        <span class="n">charge_completed</span><span class="o">.</span><span class="n">send_robust</span><span class="p">(</span><span class="n">sender</span><span class="o">=</span><span class="bp">cls</span><span class="p">,</span> <span class="n">total</span><span class="o">=</span><span class="n">total</span><span class="p">)</span>
</span></pre></div>
<p>A different app, such as a summary app, can connect a handler that increments a total charges counter:</p>
<div class="highlight"><pre><span></span><span class="c1"># summary.py</span>
<span class="kn">from</span> <span class="nn">django.dispatch</span> <span class="kn">import</span> <span class="n">receiver</span>
<span class="kn">from</span> <span class="nn">.signals</span> <span class="kn">import</span> <span class="n">charge_completed</span>
<span class="hll"><span class="nd">@receiver</span><span class="p">(</span><span class="n">charge_completed</span><span class="p">)</span>
</span><span class="k">def</span> <span class="nf">increment_total_charges</span><span class="p">(</span><span class="n">sender</span><span class="p">,</span> <span class="n">total</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
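    <span class="k">global</span> <span class="n">total_charges</span>  <span class="c1"># module-level counter, assumed to be defined elsewhere</span>
    <span class="n">total_charges</span> <span class="o">+=</span> <span class="n">total</span>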
</pre></div>
<p>The payment module does not have to know the summary module or any other module handling completed charges. <strong>You can add many receivers without modifying the payment module</strong>.</p>
<p>For example, the following are good candidates for receivers:</p>
<ul>
<li>Update the transaction status.</li>
<li>Send an email notification to the user.</li>
<li>Update the last used date of the credit card.</li>
</ul>
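<p>To see the decoupling in isolation, here is a toy dispatcher in plain Python. <code>MiniSignal</code> is a made-up stand-in written for this illustration and is far simpler than Django's <code>Signal</code>; the point is only that the sending code never references its receivers:</p>

```python
class MiniSignal:
    """A toy signal: keeps a list of receivers and calls each one on send."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)

    def disconnect(self, receiver):
        self.receivers.remove(receiver)

    def send(self, sender, **kwargs):
        return [(receiver, receiver(sender, **kwargs)) for receiver in self.receivers]


charge_completed = MiniSignal()

totals = {'charges': 0}
notified = []


def increment_total_charges(sender, total, **kwargs):
    totals['charges'] += total


def notify_user(sender, total, **kwargs):
    notified.append(total)


charge_completed.connect(increment_total_charges)
charge_completed.connect(notify_user)


def process_charge(total):
    # The payment code only knows about the signal, not about the receivers.
    charge_completed.send(sender='payment', total=total)


process_charge(100)
print(totals, notified)
```

<p>Adding the second receiver required no change to <code>process_charge</code>.</p>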
<hr>
<h3 id="testing-signals"><a class="toclink" href="#testing-signals">Testing Signals</a></h3>
<p>Now that you have the basics covered, let's write a test for <code>process_charge</code>. You want to make sure the signal is sent with the right arguments when a charge completes successfully.</p>
<p>The best way to test if a signal was sent is to connect to it:</p>
<div class="highlight"><pre><span></span><span class="c1"># test.py</span>
<span class="kn">from</span> <span class="nn">django.test</span> <span class="kn">import</span> <span class="n">TestCase</span>
<span class="kn">from</span> <span class="nn">.payment</span> <span class="kn">import</span> <span class="n">charge</span>
<span class="kn">from</span> <span class="nn">.signals</span> <span class="kn">import</span> <span class="n">charge_completed</span>
<span class="k">class</span> <span class="nc">TestCharge</span><span class="p">(</span><span class="n">TestCase</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">test_should_send_signal_when_charge_succeeds</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">signal_was_called</span> <span class="o">=</span> <span class="kc">False</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">total</span> <span class="o">=</span> <span class="kc">None</span>
<span class="hll">        <span class="k">def</span> <span class="nf">handler</span><span class="p">(</span><span class="n">sender</span><span class="p">,</span> <span class="n">total</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
</span>            <span class="bp">self</span><span class="o">.</span><span class="n">signal_was_called</span> <span class="o">=</span> <span class="kc">True</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">total</span> <span class="o">=</span> <span class="n">total</span>
<span class="hll">
</span>        <span class="n">charge_completed</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="n">handler</span><span class="p">)</span>
        <span class="n">charge</span><span class="p">(</span><span class="mi">100</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">assertTrue</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">signal_was_called</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">total</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
<span class="hll">
</span>        <span class="n">charge_completed</span><span class="o">.</span><span class="n">disconnect</span><span class="p">(</span><span class="n">handler</span><span class="p">)</span>
</pre></div>
<p>We create a handler, connect to the signal, execute the function and check the args.</p>
<p>We use <code>self</code> inside the handler to create a closure. If we hadn't used <code>self</code>, the handler function would update variables in its own local scope and we wouldn't have access to them. We will revisit this later.</p>
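<p>The scoping issue is easy to reproduce outside of a test case. In this small sketch (all names are illustrative) a plain assignment inside the nested function creates a new local variable, while setting an attribute on an outer object, or declaring the name <code>nonlocal</code>, updates state the caller can see:</p>

```python
def run_with_local_flag():
    was_called = False

    def handler():
        was_called = True  # creates a new local variable inside handler

    handler()
    return was_called


def run_with_nonlocal_flag():
    was_called = False

    def handler():
        nonlocal was_called  # rebind the variable of the enclosing function
        was_called = True

    handler()
    return was_called


class Recorder:
    was_called = False


def run_with_attribute():
    recorder = Recorder()

    def handler():
        recorder.was_called = True  # mutate an object from the outer scope

    handler()
    return recorder.was_called


print(run_with_local_flag(), run_with_nonlocal_flag(), run_with_attribute())
```

<p>Setting attributes on <code>self</code> in the test is the same technique as <code>run_with_attribute</code>.</p>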
<p>Let's add a test to <strong>make sure the signal is not called if the charge failed</strong>:</p>
<div class="highlight"><pre><span></span><span class="c1"># test.py</span>
<span class="k">def</span> <span class="nf">test_should_not_send_signal_when_charge_failed</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">signal_was_called</span> <span class="o">=</span> <span class="kc">False</span>
    <span class="k">def</span> <span class="nf">handler</span><span class="p">(</span><span class="n">sender</span><span class="p">,</span> <span class="n">total</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">signal_was_called</span> <span class="o">=</span> <span class="kc">True</span>
    <span class="n">charge_completed</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="n">handler</span><span class="p">)</span>
<span class="hll">    <span class="n">charge</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>
</span>    <span class="bp">self</span><span class="o">.</span><span class="n">assertFalse</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">signal_was_called</span><span class="p">)</span>
    <span class="n">charge_completed</span><span class="o">.</span><span class="n">disconnect</span><span class="p">(</span><span class="n">handler</span><span class="p">)</span>
</pre></div>
<p>This works, but it's <strong>a lot of boilerplate!</strong> There must be a better way.</p>
<h3 id="enter-context-manager"><a class="toclink" href="#enter-context-manager">Enter Context Manager</a></h3>
<p>Let's break down what we did so far:</p>
<ol>
<li>Connect a signal to some handler.</li>
<li>Run the test code and save the arguments passed to the handler.</li>
<li>Disconnect the handler from the signal.</li>
</ol>
<p>This pattern sounds familiar...</p>
<p>Let's look at what a (file) <a href="https://docs.python.org/3/library/functions.html#open" rel="noopener">open context manager</a> does:</p>
<ol>
<li>Open a file.</li>
<li>Process the file.</li>
<li>Close the file.</li>
</ol>
<p>And a <a href="https://docs.djangoproject.com/en/1.10/topics/db/transactions/#controlling-transactions-explicitly" rel="noopener">database transaction context manager</a>:</p>
<ol>
<li>Open transaction.</li>
<li>Execute some operations.</li>
<li>Close transaction (commit / rollback).</li>
</ol>
<p>It looks like <strong>a context manager can work for signals as well</strong>.</p>
<p>Before you start, think about how you want to use a context manager to test signals:</p>
<div class="highlight"><pre><span></span><span class="k">with</span> <span class="n">CatchSignal</span><span class="p">(</span><span class="n">charge_completed</span><span class="p">)</span> <span class="k">as</span> <span class="n">signal_args</span><span class="p">:</span>
<span class="n">charge</span><span class="p">(</span><span class="mi">100</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">signal_args</span><span class="o">.</span><span class="n">total</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
</pre></div>
<p>Nice, let's give it a try:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">CatchSignal</span><span class="p">:</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">signal</span><span class="p">):</span>
<span class="hll"> <span class="bp">self</span><span class="o">.</span><span class="n">signal</span> <span class="o">=</span> <span class="n">signal</span>
</span> <span class="bp">self</span><span class="o">.</span><span class="n">signal_kwargs</span> <span class="o">=</span> <span class="p">{}</span>
<span class="hll"> <span class="k">def</span> <span class="nf">handler</span><span class="p">(</span><span class="n">sender</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
</span> <span class="bp">self</span><span class="o">.</span><span class="n">signal_kwargs</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">kwargs</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">handler</span> <span class="o">=</span> <span class="n">handler</span>
<span class="k">def</span> <span class="fm">__enter__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="hll"> <span class="bp">self</span><span class="o">.</span><span class="n">signal</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">handler</span><span class="p">)</span>
</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">signal_kwargs</span>
<span class="k">def</span> <span class="fm">__exit__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">exc_type</span><span class="p">,</span> <span class="n">exc_value</span><span class="p">,</span> <span class="n">tb</span><span class="p">):</span>
<span class="hll"> <span class="bp">self</span><span class="o">.</span><span class="n">signal</span><span class="o">.</span><span class="n">disconnect</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">handler</span><span class="p">)</span>
</span></pre></div>
<p>What we have here:</p>
<ul>
<li>You initialize the context with the signal you want to "catch".</li>
<li>The context creates a handler function to save the arguments sent by the signal.</li>
<li>You create a closure by updating an existing object (<code>signal_kwargs</code>) on <code>self</code>.</li>
<li>You connect the handler to the signal.</li>
<li>Some processing is done (by the test) between <code>__enter__</code> and <code>__exit__</code>.</li>
<li>You disconnect the handler from the signal.</li>
</ul>
<p>Let's use the context manager to test the charge function:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">test_should_send_signal_when_charge_succeeds</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="hll"> <span class="k">with</span> <span class="n">CatchSignal</span><span class="p">(</span><span class="n">charge_completed</span><span class="p">)</span> <span class="k">as</span> <span class="n">signal_args</span><span class="p">:</span>
</span> <span class="n">charge</span><span class="p">(</span><span class="mi">100</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">signal_args</span><span class="p">[</span><span class="s1">'total'</span><span class="p">],</span> <span class="mi">100</span><span class="p">)</span>
</pre></div>
<p>This is better, but <strong>what would the negative test look like?</strong></p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">test_should_not_send_signal_when_charge_failed</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">with</span> <span class="n">CatchSignal</span><span class="p">(</span><span class="n">charge_completed</span><span class="p">)</span> <span class="k">as</span> <span class="n">signal_args</span><span class="p">:</span>
<span class="n">charge</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>
<span class="hll"> <span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">signal_args</span><span class="p">,</span> <span class="p">{})</span>
</span></pre></div>
<p>Yuck, that's bad.</p>
<p>Let's take another look at the handler:</p>
<ul>
<li>We want to make sure the handler function was invoked.</li>
<li>We want to test the args sent to the handler function.</li>
</ul>
<p>Wait... <strong>I already know this function!</strong></p>
<h3 id="enter-mock"><a class="toclink" href="#enter-mock">Enter Mock</a></h3>
<p>Let's replace our handler with a Mock:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="kn">from</span> <span class="nn">unittest</span> <span class="kn">import</span> <span class="n">mock</span>
</span>
<span class="k">class</span> <span class="nc">CatchSignal</span><span class="p">:</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">signal</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">signal</span> <span class="o">=</span> <span class="n">signal</span>
<span class="hll"> <span class="bp">self</span><span class="o">.</span><span class="n">handler</span> <span class="o">=</span> <span class="n">mock</span><span class="o">.</span><span class="n">Mock</span><span class="p">()</span>
</span>
<span class="k">def</span> <span class="fm">__enter__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">signal</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">handler</span><span class="p">)</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">handler</span>
<span class="k">def</span> <span class="fm">__exit__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">exc_type</span><span class="p">,</span> <span class="n">exc_value</span><span class="p">,</span> <span class="n">tb</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">signal</span><span class="o">.</span><span class="n">disconnect</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">handler</span><span class="p">)</span>
</pre></div>
<p>And the tests:</p>
<div class="highlight"><pre><span></span><span class="c1"># test.py</span>
<span class="k">def</span> <span class="nf">test_should_send_signal_when_charge_succeeds</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">with</span> <span class="n">CatchSignal</span><span class="p">(</span><span class="n">charge_completed</span><span class="p">)</span> <span class="k">as</span> <span class="n">handler</span><span class="p">:</span>
<span class="n">charge</span><span class="p">(</span><span class="mi">100</span><span class="p">)</span>
<span class="hll"> <span class="n">handler</span><span class="o">.</span><span class="n">assert_called_once_with</span><span class="p">(</span>
</span> <span class="n">total</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span>
<span class="n">sender</span><span class="o">=</span><span class="n">mock</span><span class="o">.</span><span class="n">ANY</span><span class="p">,</span>
<span class="n">signal</span><span class="o">=</span><span class="n">charge_completed</span><span class="p">,</span>
<span class="p">)</span>
<span class="k">def</span> <span class="nf">test_should_not_send_signal_when_charge_failed</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">with</span> <span class="n">CatchSignal</span><span class="p">(</span><span class="n">charge_completed</span><span class="p">)</span> <span class="k">as</span> <span class="n">handler</span><span class="p">:</span>
<span class="n">charge</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>
<span class="hll"> <span class="n">handler</span><span class="o">.</span><span class="n">assert_not_called</span><span class="p">()</span>
</span></pre></div>
<p><strong>Much better!</strong></p>
<p>You used the mock for exactly what it should be used for, and you don't need to worry about scope and closure.</p>
<p>Now that you have this working, <strong>can you make it even better?</strong></p>
<h3 id="enter-contextlib"><a class="toclink" href="#enter-contextlib">Enter <code>contextlib</code></a></h3>
<p>Python has a utility module for handling context managers called <a href="https://docs.python.org/3.6/library/contextlib.html" rel="noopener">contextlib</a>.</p>
<p>Let's rewrite our context using <code>contextlib</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">unittest</span> <span class="kn">import</span> <span class="n">mock</span>
<span class="hll"><span class="kn">from</span> <span class="nn">contextlib</span> <span class="kn">import</span> <span class="n">contextmanager</span>
</span>
<span class="hll"><span class="nd">@contextmanager</span>
</span><span class="k">def</span> <span class="nf">catch_signal</span><span class="p">(</span><span class="n">signal</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Catch django signal and return the mocked call."""</span>
<span class="n">handler</span> <span class="o">=</span> <span class="n">mock</span><span class="o">.</span><span class="n">Mock</span><span class="p">()</span>
<span class="n">signal</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="n">handler</span><span class="p">)</span>
<span class="k">yield</span> <span class="n">handler</span>
<span class="n">signal</span><span class="o">.</span><span class="n">disconnect</span><span class="p">(</span><span class="n">handler</span><span class="p">)</span>
</pre></div>
<p>I like this approach better because it's easier to follow:</p>
<ul>
<li>The yield makes it clear where the test code is executed.</li>
<li>No need to save objects on <code>self</code> because the setup and teardown code (enter and exit) is in the same scope.</li>
</ul>
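<p>One caveat is worth sketching. If the code inside the <code>with</code> block raises, a generator-based context manager only runs its cleanup when the cleanup sits in a <code>finally</code> clause. The variant below assumes that behavior is wanted here. <code>FakeSignal</code> is a hypothetical stand-in with just enough of a signal's interface for the example to run without Django; it is not Django's implementation.</p>

```python
from contextlib import contextmanager
from unittest import mock


class FakeSignal:
    """Hypothetical stand-in for a signal, only for this sketch."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)

    def disconnect(self, receiver):
        self.receivers.remove(receiver)

    def send(self, sender, **kwargs):
        # Call every connected receiver with keyword arguments.
        for receiver in list(self.receivers):
            receiver(signal=self, sender=sender, **kwargs)


@contextmanager
def catch_signal(signal):
    """Connect a mock handler and disconnect it even if the block raises."""
    handler = mock.Mock()
    signal.connect(handler)
    try:
        yield handler
    finally:
        signal.disconnect(handler)


charge_completed = FakeSignal()

try:
    with catch_signal(charge_completed) as handler:
        charge_completed.send(sender=None, total=100)
        raise RuntimeError("something failed inside the with block")
except RuntimeError:
    pass

# The handler saw the signal, and it is no longer connected.
handler.assert_called_once_with(signal=charge_completed, sender=None, total=100)
```

<p>Without the <code>try</code> / <code>finally</code>, the error above would leave the mock connected and it could leak into later tests.</p>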
<p>And that's it, 4 lines of code to rule them all! <strong>Profit!</strong></p>Working With APIs the Pythonic Way2017-01-05T00:00:00+02:002017-01-05T00:00:00+02:00Haki Benitatag:hakibenita.com,2017-01-05:/working-with-apis-the-pythonic-way<p>Communication with external services is an integral part of any modern system. Whether it's a payment service, authentication, analytics or an internal one - systems need to talk to each other. In this short article we are going to implement a module for communicating with a made-up payment gateway, step by step.</p><hr>
<p>Communication with external services is an integral part of any modern system. Whether it's a payment service, authentication, analytics or an internal one - <strong>systems need to talk to each other</strong>.</p>
<p><strong>In this short article we are going to implement a module for communicating with a made-up payment gateway, step by step.</strong></p>
<figure><img alt="It used to be harder" src="https://hakibenita.com/images/01-working-with-apis-the-pythonic-way.jpeg"><figcaption>It used to be harder</figcaption>
</figure>
<h3 id="the-external-service"><a class="toclink" href="#the-external-service">The External Service</a></h3>
<p>Let's start by defining an imaginary payment service.</p>
<p>To charge a credit card we need a credit card token, an amount to charge (in cents) and some unique ID provided by the client (us):</p>
<div class="highlight"><pre><span></span>POST
{
token: <string>,
amount: <number>,
uid: <string>,
}
</pre></div>
<p>If the charge was successful we get a 200 OK status with the data from our request, an expiration time for the charge and a transaction ID:</p>
<div class="highlight"><pre><span></span>200 OK
{
uid: <string>,
amount: <number>,
token: <string>,
expiration: <string, isoformat>,
transaction_id: <number>
}
</pre></div>
<p>If the charge was not successful we get a 400 status with an error code and an informative message:</p>
<div class="highlight"><pre><span></span>400 Bad Request
{
uid: <string>,
error: <number>,
message: <string>
}
</pre></div>
<p>There are two error codes we want to handle: 1 (refused) and 2 (stolen).</p>
<h3 id="naive-implementation"><a class="toclink" href="#naive-implementation">Naive Implementation</a></h3>
<p>To get the ball rolling, we start with a naive implementation and build from there:</p>
<div class="highlight"><pre><span></span><span class="c1"># payments.py</span>
<span class="kn">import</span> <span class="nn">uuid</span>
<span class="kn">import</span> <span class="nn">requests</span>
<span class="n">PAYMENT_GATEWAY_BASE_URL</span> <span class="o">=</span> <span class="s1">'https://gw.com/api'</span>
<span class="n">PAYMENT_GATEWAY_TOKEN</span> <span class="o">=</span> <span class="s1">'topsecret'</span>
<span class="k">def</span> <span class="nf">charge</span><span class="p">(</span>
<span class="n">amount</span><span class="p">,</span>
<span class="n">token</span><span class="p">,</span>
<span class="n">timeout</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span>
<span class="p">):</span>
<span class="w"> </span><span class="sd">"""Charge.</span>
<span class="sd"> amount (int):</span>
<span class="sd"> Amount in cents to charge.</span>
<span class="sd"> token (str):</span>
<span class="sd"> Credit card token.</span>
<span class="sd"> timeout (int):</span>
<span class="sd"> Timeout in seconds.</span>
<span class="sd"> Returns (dict):</span>
<span class="sd"> New payment information.</span>
<span class="sd"> """</span>
<span class="n">headers</span> <span class="o">=</span> <span class="p">{</span>
<span class="s2">"Authorization"</span><span class="p">:</span> <span class="s2">"Bearer "</span> <span class="o">+</span> <span class="n">PAYMENT_GATEWAY_TOKEN</span><span class="p">,</span>
<span class="p">}</span>
<span class="n">payload</span> <span class="o">=</span> <span class="p">{</span>
<span class="s2">"token"</span><span class="p">:</span> <span class="n">token</span><span class="p">,</span>
<span class="s2">"amount"</span><span class="p">:</span> <span class="n">amount</span><span class="p">,</span>
<span class="s2">"uid"</span><span class="p">:</span> <span class="nb">str</span><span class="p">(</span><span class="n">uuid</span><span class="o">.</span><span class="n">uuid4</span><span class="p">()),</span>
<span class="p">}</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">post</span><span class="p">(</span>
<span class="n">PAYMENT_GATEWAY_BASE_URL</span> <span class="o">+</span> <span class="s1">'/charge'</span><span class="p">,</span>
<span class="n">json</span><span class="o">=</span><span class="n">payload</span><span class="p">,</span>
<span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">,</span>
<span class="n">timeout</span><span class="o">=</span><span class="n">timeout</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">response</span><span class="o">.</span><span class="n">raise_for_status</span><span class="p">()</span>
<span class="k">return</span> <span class="n">response</span><span class="o">.</span><span class="n">json</span><span class="p">()</span>
</pre></div>
<p>90% of developers will stop here, <strong>so what is the problem?</strong></p>
<h3 id="handling-errors"><a class="toclink" href="#handling-errors">Handling Errors</a></h3>
<p>There are two types of errors we need to handle:</p>
<ul>
<li>HTTP errors such as connection errors, timeout or connection refused.</li>
<li>Remote payment errors such as refusal or stolen card.</li>
</ul>
<p>Our decision to use <code>requests</code> is an internal implementation detail. The consumer of our module shouldn't have to be aware of that.</p>
<p><strong>To provide a complete API our module must communicate errors.</strong></p>
<p>Let's start by defining custom error classes:</p>
<div class="highlight"><pre><span></span><span class="c1"># errors.py</span>
<span class="k">class</span> <span class="nc">Error</span><span class="p">(</span><span class="ne">Exception</span><span class="p">):</span>
<span class="k">pass</span>
<span class="k">class</span> <span class="nc">Unavailable</span><span class="p">(</span><span class="n">Error</span><span class="p">):</span>
<span class="k">pass</span>
<span class="k">class</span> <span class="nc">PaymentGatewayError</span><span class="p">(</span><span class="n">Error</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">code</span><span class="p">,</span> <span class="n">message</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">code</span> <span class="o">=</span> <span class="n">code</span>
<span class="bp">self</span><span class="o">.</span><span class="n">message</span> <span class="o">=</span> <span class="n">message</span>
<span class="k">class</span> <span class="nc">Refused</span><span class="p">(</span><span class="n">PaymentGatewayError</span><span class="p">):</span>
<span class="k">pass</span>
<span class="k">class</span> <span class="nc">Stolen</span><span class="p">(</span><span class="n">PaymentGatewayError</span><span class="p">):</span>
<span class="k">pass</span>
</pre></div>
<p>I <a href="https://medium.com/@hakibenita/bullet-proofing-django-models-c080739be4e#.4ju7vgl0t" rel="noopener">previously wrote</a> about the benefits of using a base error class.</p>
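<p>As a rough illustration of why the shared base class helps, here is a minimal sketch of a consumer deciding how specific it wants to be. The error classes are repeated from above so the snippet runs on its own; <code>describe_failure</code> and the messages are hypothetical and exist only for this example.</p>

```python
class Error(Exception):
    pass


class Unavailable(Error):
    pass


class PaymentGatewayError(Error):
    def __init__(self, code, message):
        self.code = code
        self.message = message


class Refused(PaymentGatewayError):
    pass


class Stolen(PaymentGatewayError):
    pass


def describe_failure(exc):
    """Hypothetical helper: handle errors from most to least specific."""
    try:
        raise exc
    except Refused:
        # A specific case the caller cares about.
        return "refused"
    except PaymentGatewayError as e:
        # Any other error reported by the gateway, including Stolen.
        return "gateway error {}".format(e.code)
    except Error:
        # Anything else raised by the payments module.
        return "payment module error"


results = [
    describe_failure(Refused(1, "Card refused")),
    describe_failure(Stolen(2, "Card reported stolen")),
    describe_failure(Unavailable()),
]
```

<p>A caller that doesn't care about the details can catch <code>Error</code> alone and still be sure nothing from the module slips through.</p>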
<p>Let's add exception handling and logging to our function:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">logging</span>
<span class="kn">from</span> <span class="nn">.</span> <span class="kn">import</span> <span class="n">errors</span>
<span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="s1">'payments'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">charge</span><span class="p">(</span>
<span class="n">amount</span><span class="p">,</span>
<span class="n">token</span><span class="p">,</span>
<span class="n">timeout</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span>
<span class="p">):</span>
<span class="c1"># ...</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">post</span><span class="p">(</span>
<span class="n">PAYMENT_GATEWAY_BASE_URL</span> <span class="o">+</span> <span class="s1">'/charge'</span><span class="p">,</span>
<span class="n">json</span><span class="o">=</span><span class="n">payload</span><span class="p">,</span>
<span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">,</span>
<span class="n">timeout</span><span class="o">=</span><span class="n">timeout</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">response</span><span class="o">.</span><span class="n">raise_for_status</span><span class="p">()</span>
<span class="k">except</span> <span class="p">(</span><span class="n">requests</span><span class="o">.</span><span class="n">ConnectionError</span><span class="p">,</span> <span class="n">requests</span><span class="o">.</span><span class="n">Timeout</span><span class="p">)</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="hll"> <span class="k">raise</span> <span class="n">errors</span><span class="o">.</span><span class="n">Unavailable</span><span class="p">()</span> <span class="kn">from</span> <span class="nn">e</span>
</span>
<span class="k">except</span> <span class="n">requests</span><span class="o">.</span><span class="n">exceptions</span><span class="o">.</span><span class="n">HTTPError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="k">if</span> <span class="n">e</span><span class="o">.</span><span class="n">response</span><span class="o">.</span><span class="n">status_code</span> <span class="o">==</span> <span class="mi">400</span><span class="p">:</span>
<span class="n">error</span> <span class="o">=</span> <span class="n">e</span><span class="o">.</span><span class="n">response</span><span class="o">.</span><span class="n">json</span><span class="p">()</span>
<span class="n">code</span> <span class="o">=</span> <span class="n">error</span><span class="p">[</span><span class="s1">'error'</span><span class="p">]</span>
<span class="n">message</span> <span class="o">=</span> <span class="n">error</span><span class="p">[</span><span class="s1">'message'</span><span class="p">]</span>
<span class="k">if</span> <span class="n">code</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
<span class="hll"> <span class="k">raise</span> <span class="n">errors</span><span class="o">.</span><span class="n">Refused</span><span class="p">(</span><span class="n">code</span><span class="p">,</span> <span class="n">message</span><span class="p">)</span> <span class="kn">from</span> <span class="nn">e</span>
</span> <span class="k">elif</span> <span class="n">code</span> <span class="o">==</span> <span class="mi">2</span><span class="p">:</span>
<span class="hll"> <span class="k">raise</span> <span class="n">errors</span><span class="o">.</span><span class="n">Stolen</span><span class="p">(</span><span class="n">code</span><span class="p">,</span> <span class="n">message</span><span class="p">)</span> <span class="kn">from</span> <span class="nn">e</span>
</span> <span class="k">else</span><span class="p">:</span>
<span class="hll"> <span class="k">raise</span> <span class="n">errors</span><span class="o">.</span><span class="n">PaymentGatewayError</span><span class="p">(</span><span class="n">code</span><span class="p">,</span> <span class="n">message</span><span class="p">)</span> <span class="kn">from</span> <span class="nn">e</span>
</span>
<span class="n">logger</span><span class="o">.</span><span class="n">exception</span><span class="p">(</span><span class="s2">"Payment service had internal error."</span><span class="p">)</span>
<span class="hll"> <span class="k">raise</span> <span class="n">errors</span><span class="o">.</span><span class="n">Unavailable</span><span class="p">()</span> <span class="kn">from</span> <span class="nn">e</span>
</span></pre></div>
<p>Great! Our function no longer raises <code>requests</code> exceptions. Important errors such as stolen card or refusal are raised as custom exceptions.</p>
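<p>If the list of error codes grows, the <code>if</code> / <code>elif</code> chain can be swapped for a lookup table. The sketch below is one possible alternative, not part of the module above. It assumes the 400 body carries the <code>error</code> and <code>message</code> fields from the response format described earlier; <code>ERROR_CLASSES</code> and <code>error_from_body</code> are hypothetical names.</p>

```python
class Error(Exception):
    pass


class PaymentGatewayError(Error):
    def __init__(self, code, message):
        self.code = code
        self.message = message


class Refused(PaymentGatewayError):
    pass


class Stolen(PaymentGatewayError):
    pass


# Hypothetical mapping from gateway error code to exception class.
ERROR_CLASSES = {
    1: Refused,
    2: Stolen,
}


def error_from_body(body):
    """Build the matching exception for a decoded 400 response body."""
    code = body["error"]
    # Unknown codes fall back to the generic gateway error.
    error_class = ERROR_CLASSES.get(code, PaymentGatewayError)
    return error_class(code, body["message"])


refused = error_from_body({"uid": "abc", "error": 1, "message": "Card refused"})
unknown = error_from_body({"uid": "abc", "error": 99, "message": "Unknown"})
```

<p>The calling code would then <code>raise error_from_body(...) from e</code>, so the original <code>requests</code> exception stays attached as the cause.</p>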
<h3 id="defining-the-response"><a class="toclink" href="#defining-the-response">Defining the Response</a></h3>
<p>Our function returns a dict. A dict is a great and flexible data structure, but when you have a defined set of fields you are better off using a more targeted data type.</p>
<p>In every OOP class you learn that everything is an object. While it is true in Java land, Python has a lightweight solution that works better in our case - <a href="https://docs.python.org/3.7/library/collections.html#collections.namedtuple" rel="noopener"><strong>namedtuple</strong></a>.</p>
<p>A namedtuple is just what it sounds like: a tuple whose fields have names. You use it like a class, and it is lightweight: instances carry no per-instance <code>__dict__</code>, so they take about as little memory as a class with slots.</p>
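<p>For readers who haven't used one, here is a small sketch of what a namedtuple gives you. <code>Point</code> is a made-up type used only for illustration.</p>

```python
from collections import namedtuple

# A made-up type with two named fields.
Point = namedtuple("Point", ["x", "y"])

p = Point(x=1, y=2)

# Fields are available by name and by position.
by_name = p.x
by_index = p[1]

# It still unpacks like a regular tuple.
x, y = p

# Instances are immutable: assigning to a field raises AttributeError.
try:
    p.x = 10
    is_immutable = False
except AttributeError:
    is_immutable = True

# _replace returns a new instance and leaves the original untouched.
moved = p._replace(x=10)
```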
<p>Let's define a namedtuple for the charge response:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">namedtuple</span>
<span class="n">ChargeResponse</span> <span class="o">=</span> <span class="n">namedtuple</span><span class="p">(</span><span class="s1">'ChargeResponse'</span><span class="p">,</span> <span class="p">[</span>
<span class="s1">'uid'</span><span class="p">,</span>
<span class="s1">'amount'</span><span class="p">,</span>
<span class="s1">'token'</span><span class="p">,</span>
<span class="s1">'expiration'</span><span class="p">,</span>
<span class="s1">'transaction_id'</span><span class="p">,</span>
<span class="p">])</span>
</pre></div>
<p>If the charge was successful, we create a <code>ChargeResponse</code> object:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">datetime</span>
<span class="c1"># ...</span>
<span class="k">def</span> <span class="nf">charge</span><span class="p">(</span>
<span class="n">amount</span><span class="p">,</span>
<span class="n">token</span><span class="p">,</span>
<span class="n">timeout</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span>
<span class="p">):</span>
<span class="c1"># ...</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">response</span><span class="o">.</span><span class="n">json</span><span class="p">()</span>
<span class="hll"> <span class="n">charge_response</span> <span class="o">=</span> <span class="n">ChargeResponse</span><span class="p">(</span>
</span> <span class="n">uid</span><span class="o">=</span><span class="n">uuid</span><span class="o">.</span><span class="n">UUID</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="s1">'uid'</span><span class="p">]),</span>
<span class="n">amount</span><span class="o">=</span><span class="n">data</span><span class="p">[</span><span class="s1">'amount'</span><span class="p">],</span>
<span class="n">token</span><span class="o">=</span><span class="n">data</span><span class="p">[</span><span class="s1">'token'</span><span class="p">],</span>
<span class="n">expiration</span><span class="o">=</span><span class="n">datetime</span><span class="o">.</span><span class="n">strptime</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="s1">'expiration'</span><span class="p">],</span> <span class="s2">"%Y-%m-</span><span class="si">%d</span><span class="s2">T%H:%M:%S.</span><span class="si">%f</span><span class="s2">"</span><span class="p">),</span>
<span class="n">transaction_id</span><span class="o">=</span><span class="n">data</span><span class="p">[</span><span class="s1">'transaction_id'</span><span class="p">],</span>
<span class="p">)</span>
<span class="k">return</span> <span class="n">charge_response</span>
</pre></div>
<p>Our function now returns a <code>ChargeResponse</code> object. Additional processing such as casting and validations can be added easily.</p>
<p>In the case of our imaginary payment gateway, we convert the expiration date to a datetime object. The consumer doesn't have to guess the date format used by the remote service (when it comes to date formats, I am sure we have all encountered our fair share of horrors).</p>
<p>By using a custom "class" as the return value we reduce the dependency on the payment vendor's serialization format. If the response were XML, would we still return a dict? That's just awkward.</p>
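<p>To make the date conversion concrete, here is a small sketch of the parsing step on its own. The sample value is made up, and the format string is the one used in <code>charge</code> above. Whether the imaginary gateway always sends fractional seconds is an assumption; the second half shows what happens when it doesn't.</p>

```python
from datetime import datetime

EXPIRATION_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"

# Made-up sample value in the format the charge function expects.
raw = "2017-01-05T12:30:45.123456"
expiration = datetime.strptime(raw, EXPIRATION_FORMAT)

# The format requires fractional seconds, so a value without them fails.
try:
    datetime.strptime("2017-01-05T12:30:45", EXPIRATION_FORMAT)
    strict_parse_failed = False
except ValueError:
    strict_parse_failed = True

# On Python 3.7+, datetime.fromisoformat accepts both variants.
without_fraction = datetime.fromisoformat("2017-01-05T12:30:45")
with_fraction = datetime.fromisoformat(raw)
```

<p>Keeping this detail inside the module means a change in the vendor's format is fixed in one place, and consumers keep receiving a datetime.</p>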
<h3 id="using-a-session"><a class="toclink" href="#using-a-session">Using a Session</a></h3>
<p>To shave some extra milliseconds off API calls we can use a session. A <a href="http://docs.python-requests.org/en/master/user/advanced/#session-objects" rel="noopener">Requests session</a> uses a connection pool internally, so requests to the same host can reuse connections. We also take the opportunity to add useful configuration such as blocking cookies:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">http.cookiejar</span>
<span class="c1"># A shared requests session for payment requests.</span>
<span class="k">class</span> <span class="nc">BlockAll</span><span class="p">(</span><span class="n">http</span><span class="o">.</span><span class="n">cookiejar</span><span class="o">.</span><span class="n">CookiePolicy</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">set_ok</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">cookie</span><span class="p">,</span> <span class="n">request</span><span class="p">):</span>
<span class="k">return</span> <span class="kc">False</span>
<span class="n">payment_session</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">Session</span><span class="p">()</span>
<span class="n">payment_session</span><span class="o">.</span><span class="n">cookies</span><span class="o">.</span><span class="n">policy</span> <span class="o">=</span> <span class="n">BlockAll</span><span class="p">()</span>
<span class="c1"># ...</span>
<span class="k">def</span> <span class="nf">charge</span><span class="p">(</span>
<span class="n">amount</span><span class="p">,</span>
<span class="n">token</span><span class="p">,</span>
<span class="n">timeout</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span>
<span class="p">):</span>
<span class="c1"># ...</span>
<span class="hll"> <span class="n">response</span> <span class="o">=</span> <span class="n">payment_session</span><span class="o">.</span><span class="n">post</span><span class="p">(</span> <span class="o">...</span> <span class="p">)</span>
</span> <span class="c1"># ...</span>
</pre></div>
<h3 id="more-actions"><a class="toclink" href="#more-actions">More Actions</a></h3>
<p>Any external service, and a payment service in particular, has more than one action.</p>
<p>The first section of our function takes care of authorization, the request and HTTP errors. The second part handles protocol errors and serialization specific to the charge action.</p>
<p>The first part is relevant to all actions while the second part is specific only to the charge.</p>
<p>Let's split the function so we can reuse the first part:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">uuid</span>
<span class="kn">import</span> <span class="nn">logging</span>
<span class="kn">import</span> <span class="nn">requests</span>
<span class="kn">import</span> <span class="nn">http.cookiejar</span>
<span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">datetime</span>
<span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="s1">'payments'</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">BlockAll</span><span class="p">(</span><span class="n">http</span><span class="o">.</span><span class="n">cookiejar</span><span class="o">.</span><span class="n">CookiePolicy</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">set_ok</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">cookie</span><span class="p">,</span> <span class="n">request</span><span class="p">):</span>
<span class="k">return</span> <span class="kc">False</span>
<span class="n">payment_session</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">Session</span><span class="p">()</span>
<span class="n">payment_session</span><span class="o">.</span><span class="n">cookies</span><span class="o">.</span><span class="n">policy</span> <span class="o">=</span> <span class="n">BlockAll</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">make_payment_request</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">payload</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=</span><span class="mi">5</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Make a request to the payment gateway.</span>
<span class="sd"> path (str):</span>
<span class="sd"> Path to post to.</span>
<span class="sd"> payload (object):</span>
<span class="sd"> JSON-serializable request payload.</span>
<span class="sd"> timeout (int):</span>
<span class="sd"> Timeout in seconds.</span>
<span class="sd"> Raises</span>
<span class="sd"> Unavailable</span>
<span class="sd"> requests.exceptions.HTTPError</span>
<span class="sd"> Returns (response)</span>
<span class="sd"> """</span>
<span class="n">headers</span> <span class="o">=</span> <span class="p">{</span>
<span class="s2">"Authorization"</span><span class="p">:</span> <span class="s2">"Bearer "</span> <span class="o">+</span> <span class="n">PAYMENT_GATEWAY_TOKEN</span><span class="p">,</span>
<span class="p">}</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">payment_session</span><span class="o">.</span><span class="n">post</span><span class="p">(</span>
<span class="n">PAYMENT_GATEWAY_BASE_URL</span> <span class="o">+</span> <span class="n">path</span><span class="p">,</span>
<span class="n">json</span><span class="o">=</span><span class="n">payload</span><span class="p">,</span>
<span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">,</span>
<span class="n">timeout</span><span class="o">=</span><span class="n">timeout</span><span class="p">,</span>
<span class="p">)</span>
<span class="k">except</span> <span class="p">(</span><span class="n">requests</span><span class="o">.</span><span class="n">ConnectionError</span><span class="p">,</span> <span class="n">requests</span><span class="o">.</span><span class="n">Timeout</span><span class="p">)</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="k">raise</span> <span class="n">errors</span><span class="o">.</span><span class="n">Unavailable</span><span class="p">()</span> <span class="kn">from</span> <span class="nn">e</span>
<span class="n">response</span><span class="o">.</span><span class="n">raise_for_status</span><span class="p">()</span>
<span class="k">return</span> <span class="n">response</span><span class="o">.</span><span class="n">json</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">charge</span><span class="p">(</span><span class="n">amount</span><span class="p">,</span> <span class="n">token</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Charge credit card.</span>
<span class="sd"> amount (int):</span>
<span class="sd"> Amount to charge in cents.</span>
<span class="sd"> token (str):</span>
<span class="sd"> Credit card token.</span>
<span class="sd"> Raises</span>
<span class="sd"> Unavailable</span>
<span class="sd"> Refused</span>
<span class="sd"> Stolen</span>
<span class="sd"> PaymentGatewayError</span>
<span class="sd"> Returns (ChargeResponse)</span>
<span class="sd"> """</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">make_payment_request</span><span class="p">(</span><span class="s1">'/charge'</span><span class="p">,</span> <span class="p">{</span>
<span class="s1">'uid'</span><span class="p">:</span> <span class="nb">str</span><span class="p">(</span><span class="n">uuid</span><span class="o">.</span><span class="n">uuid4</span><span class="p">()),</span>
<span class="s1">'amount'</span><span class="p">:</span> <span class="n">amount</span><span class="p">,</span>
<span class="s1">'token'</span><span class="p">:</span> <span class="n">token</span><span class="p">,</span>
<span class="p">})</span>
<span class="k">except</span> <span class="n">requests</span><span class="o">.</span><span class="n">HTTPError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="k">if</span> <span class="n">e</span><span class="o">.</span><span class="n">response</span><span class="o">.</span><span class="n">status_code</span> <span class="o">==</span> <span class="mi">400</span><span class="p">:</span>
<span class="n">error</span> <span class="o">=</span> <span class="n">e</span><span class="o">.</span><span class="n">response</span><span class="o">.</span><span class="n">json</span><span class="p">()</span>
<span class="n">code</span> <span class="o">=</span> <span class="n">error</span><span class="p">[</span><span class="s1">'code'</span><span class="p">]</span>
<span class="n">message</span> <span class="o">=</span> <span class="n">error</span><span class="p">[</span><span class="s1">'message'</span><span class="p">]</span>
<span class="k">if</span> <span class="n">code</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
<span class="k">raise</span> <span class="n">Refused</span><span class="p">(</span><span class="n">code</span><span class="p">,</span> <span class="n">message</span><span class="p">)</span> <span class="kn">from</span> <span class="nn">e</span>
<span class="k">elif</span> <span class="n">code</span> <span class="o">==</span> <span class="mi">2</span><span class="p">:</span>
<span class="k">raise</span> <span class="n">Stolen</span><span class="p">(</span><span class="n">code</span><span class="p">,</span> <span class="n">message</span><span class="p">)</span> <span class="kn">from</span> <span class="nn">e</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">raise</span> <span class="n">PaymentGatewayError</span><span class="p">(</span><span class="n">code</span><span class="p">,</span> <span class="n">message</span><span class="p">)</span> <span class="kn">from</span> <span class="nn">e</span>
<span class="n">logger</span><span class="o">.</span><span class="n">exception</span><span class="p">(</span><span class="s2">"Payment service had internal error"</span><span class="p">)</span>
<span class="k">raise</span> <span class="n">errors</span><span class="o">.</span><span class="n">Unavailable</span><span class="p">()</span> <span class="kn">from</span> <span class="nn">e</span>
<span class="k">return</span> <span class="n">ChargeResponse</span><span class="p">(</span>
        <span class="n">uid</span><span class="o">=</span><span class="n">uuid</span><span class="o">.</span><span class="n">UUID</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="s1">'uid'</span><span class="p">]),</span>
<span class="n">amount</span><span class="o">=</span><span class="n">data</span><span class="p">[</span><span class="s1">'amount'</span><span class="p">],</span>
<span class="n">token</span><span class="o">=</span><span class="n">data</span><span class="p">[</span><span class="s1">'token'</span><span class="p">],</span>
<span class="n">expiration</span><span class="o">=</span><span class="n">datetime</span><span class="o">.</span><span class="n">strptime</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="s1">'expiration'</span><span class="p">],</span> <span class="s2">"%Y-%m-</span><span class="si">%d</span><span class="s2">T%H:%M:%S.</span><span class="si">%f</span><span class="s2">"</span><span class="p">),</span>
<span class="n">transaction_id</span><span class="o">=</span><span class="n">data</span><span class="p">[</span><span class="s1">'transaction_id'</span><span class="p">],</span>
<span class="p">)</span>
</pre></div>
<p><strong>This is the entire code.</strong></p>
<p>There is a clear separation between "transport", serialization, authentication and request processing. We also have a well defined interface to our top level function <code>charge</code>.</p>
<p>To add a new action we define a new return type, call <code>make_payment_request</code> and handle the response the same way:</p>
<div class="highlight"><pre><span></span><span class="n">RefundResponse</span> <span class="o">=</span> <span class="n">namedtuple</span><span class="p">(</span><span class="s1">'RefundResponse'</span><span class="p">,</span> <span class="p">[</span>
    <span class="s1">'transaction_id'</span><span class="p">,</span>
    <span class="s1">'refunded_transaction_id'</span><span class="p">,</span>
<span class="p">])</span>

<span class="k">def</span> <span class="nf">refund</span><span class="p">(</span><span class="n">transaction_id</span><span class="p">):</span>
<span class="w">    </span><span class="sd">"""Refund charged transaction.</span>

<span class="sd">    transaction_id (str):</span>
<span class="sd">        Transaction id to refund.</span>

<span class="sd">    Raises:</span>

<span class="sd">    Return (RefundResponse)</span>
<span class="sd">    """</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">data</span> <span class="o">=</span> <span class="n">make_payment_request</span><span class="p">(</span><span class="s1">'/refund'</span><span class="p">,</span> <span class="p">{</span>
            <span class="s1">'uid'</span><span class="p">:</span> <span class="nb">str</span><span class="p">(</span><span class="n">uuid</span><span class="o">.</span><span class="n">uuid4</span><span class="p">()),</span>
            <span class="s1">'transaction_id'</span><span class="p">:</span> <span class="n">transaction_id</span><span class="p">,</span>
        <span class="p">})</span>
    <span class="k">except</span> <span class="n">requests</span><span class="o">.</span><span class="n">HTTPError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
        <span class="c1"># TODO: Handle refund remote errors (re-raised as-is until then)</span>
        <span class="k">raise</span>

    <span class="k">return</span> <span class="n">RefundResponse</span><span class="p">(</span>
        <span class="n">transaction_id</span><span class="o">=</span><span class="n">data</span><span class="p">[</span><span class="s1">'transaction_id'</span><span class="p">],</span>
        <span class="n">refunded_transaction_id</span><span class="o">=</span><span class="n">data</span><span class="p">[</span><span class="s1">'refunded_transaction_id'</span><span class="p">],</span>
    <span class="p">)</span>
</pre></div>
<p><strong>Profit!</strong></p>
<h3 id="testing"><a class="toclink" href="#testing">Testing</a></h3>
<p>The challenge with external APIs is that you can't (or at least, shouldn't) make calls to them in automated tests. I want to focus on <strong>testing code that uses our payments module</strong> rather than testing the actual module.</p>
<p>Our module has a simple interface, so it's easy to mock. Let's test a made-up function called <code>charge_user_for_product</code>:</p>
<div class="highlight"><pre><span></span><span class="c1"># test.py</span>
<span class="kn">import</span> <span class="nn">datetime</span>
<span class="kn">from</span> <span class="nn">unittest</span> <span class="kn">import</span> <span class="n">TestCase</span><span class="p">,</span> <span class="n">mock</span>
<span class="kn">from</span> <span class="nn">payment.payment</span> <span class="kn">import</span> <span class="n">ChargeResponse</span>
<span class="kn">from</span> <span class="nn">payment</span> <span class="kn">import</span> <span class="n">errors</span>
<span class="k">class</span> <span class="nc">TestApp</span><span class="p">(</span><span class="n">TestCase</span><span class="p">):</span>
<span class="hll"> <span class="nd">@mock</span><span class="o">.</span><span class="n">patch</span><span class="p">(</span><span class="s1">'payment.charge'</span><span class="p">)</span>
</span> <span class="k">def</span> <span class="nf">test_should_charge_user_for_product</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">mock_charge</span><span class="p">):</span>
<span class="hll"> <span class="n">mock_charge</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">ChargeResponse</span><span class="p">(</span>
</span> <span class="n">uid</span><span class="o">=</span><span class="s1">'test-uid'</span><span class="p">,</span>
<span class="n">amount</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span>
<span class="n">token</span><span class="o">=</span><span class="s1">'test-token'</span><span class="p">,</span>
<span class="n">expiration</span><span class="o">=</span><span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="p">(</span><span class="mi">2017</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">15</span><span class="p">,</span> <span class="mi">30</span><span class="p">,</span> <span class="mi">7</span><span class="p">),</span>
<span class="n">transaction_id</span><span class="o">=</span><span class="mi">12345</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">charge_user_for_product</span><span class="p">(</span><span class="n">user</span><span class="p">,</span> <span class="n">product</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">user</span><span class="o">.</span><span class="n">approved_transactions</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="hll"> <span class="nd">@mock</span><span class="o">.</span><span class="n">patch</span><span class="p">(</span><span class="s1">'payment.charge'</span><span class="p">)</span>
</span> <span class="k">def</span> <span class="nf">test_should_suspend_user_if_stolen</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">mock_charge</span><span class="p">):</span>
<span class="hll"> <span class="n">mock_charge</span><span class="o">.</span><span class="n">side_effect</span> <span class="o">=</span> <span class="n">errors</span><span class="o">.</span><span class="n">Stolen</span>
</span> <span class="n">charge_user_for_product</span><span class="p">(</span><span class="n">user</span><span class="p">,</span> <span class="n">product</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">user</span><span class="o">.</span><span class="n">is_active</span><span class="p">,</span> <span class="kc">False</span><span class="p">)</span>
</pre></div>
<p>Pretty straightforward - no need to mock the API response. The tests are confined to data structures we defined ourselves and have full control of.</p>
<h4 id="note-about-dependency-injection"><a class="toclink" href="#note-about-dependency-injection">Note About Dependency Injection</a></h4>
<p>Another approach to test a service is to provide two implementations: the real one, and a fake one. Then for tests, inject the fake one.</p>
<p>This is, of course, how dependency injection works. Django doesn't do DI but it utilizes the same concept with "backends" (email, cache, template, etc). For example, you can test emails in Django by using a test backend, test caching by using an in-memory backend, etc.</p>
<p>This also has other advantages in that you can have multiple "real" backends.</p>
<p>Whether you choose to mock the service calls as illustrated above or inject a "fake" service, you must have a proper interface.</p>
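<p>To make the "fake service" idea concrete, here is a minimal sketch of an injected backend. Everything in it is hypothetical and simplified for illustration: <code>FakePaymentBackend</code>, the <code>User</code> class, the trimmed-down <code>ChargeResponse</code> and the <code>backend</code> argument are assumptions, not part of the module built above.</p>

```python
from collections import namedtuple

# Hypothetical, trimmed-down response type (the real one has more fields).
ChargeResponse = namedtuple('ChargeResponse', ['amount', 'token', 'transaction_id'])


class Stolen(Exception):
    """Stand-in for the payment module's Stolen error."""


class FakePaymentBackend:
    """In-memory backend for tests; it never touches the network."""

    def __init__(self, stolen_tokens=()):
        self.stolen_tokens = set(stolen_tokens)
        self.charges = []

    def charge(self, amount, token):
        if token in self.stolen_tokens:
            raise Stolen(token)
        response = ChargeResponse(amount, token, transaction_id=len(self.charges) + 1)
        self.charges.append(response)
        return response


class User:
    """Hypothetical user model with just the fields the tests look at."""

    def __init__(self, token):
        self.token = token
        self.is_active = True
        self.approved_transactions = 0


def charge_user_for_product(user, price, backend):
    """Charge the user through whichever backend was injected."""
    try:
        backend.charge(price, user.token)
    except Stolen:
        user.is_active = False
    else:
        user.approved_transactions += 1
```

<p>A test would pass <code>FakePaymentBackend()</code> where production code passes the real implementation. Both only need to agree on the <code>charge</code> interface, which is why the interface has to be well defined either way.</p>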
<hr>
<h3 id="summary"><a class="toclink" href="#summary">Summary</a></h3>
<p>We have an external service we want to use in our app. We want to implement a module to communicate with that external service and make it robust, resilient and reusable.</p>
<p>We worked through the following steps:</p>
<ol>
<li><strong>Naive implementation</strong> - Fetched using requests and returned a JSON response.</li>
<li><strong>Handled errors</strong> - Defined custom errors to catch both transport and remote application errors. The consumer is indifferent to the transport (HTTP, RPC, Web Socket) and implementation details (requests).</li>
<li><strong>Formalized the return value</strong> - Used a namedtuple to return a class-like type that represents a response from the remote service. The consumer is now indifferent to the serialization format as well.</li>
<li><strong>Added a session</strong> - Shaved a few milliseconds off the request and added a place for global connection configuration.</li>
<li><strong>Split request from action</strong> - The request part is reusable and new actions can be added more easily.</li>
<li><strong>Tested</strong> - Mocked calls to our module and replaced them with our own response types and custom exceptions.</li>
</ol>
<hr>
<h2>The Best New Feature in unittest You Didn't Know You Need</h2>
<p>Haki Benita, 2016-12-02</p>
<p>From time to time I like to read the documentation of modules <em>I think</em> I know well. The Python documentation is not a pleasant read, but sometimes you strike a gem.</p>
<figure><img alt="Same thing but slightly different" src="https://hakibenita.com/images/01-the-best-new-feature-in-unittest-you-didnt-know-you-need.jpeg"><figcaption>Same thing but slightly different</figcaption>
</figure>
<h3 id="distinguishing-test-iterations"><a class="toclink" href="#distinguishing-test-iterations">Distinguishing Test Iterations</a></h3>
<p>Let's start with a simple function to check if a number is even:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">is_even</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
<span class="k">return</span> <span class="n">n</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span>
</pre></div>
<p>And a simple test:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">TestIsEven</span><span class="p">(</span><span class="n">TestCase</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">test_should_be_even</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertTrue</span><span class="p">(</span><span class="n">is_even</span><span class="p">(</span><span class="mi">2</span><span class="p">))</span>
</pre></div>
<p>Nice, let's add some more cases:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">TestIsEven</span><span class="p">(</span><span class="n">TestCase</span><span class="p">):</span>
<span class="c1"># ...</span>
<span class="k">def</span> <span class="nf">test_zero_should_be_even</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertTrue</span><span class="p">(</span><span class="n">is_even</span><span class="p">(</span><span class="mi">0</span><span class="p">))</span>
<span class="k">def</span> <span class="nf">test_negative_should_be_even</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertTrue</span><span class="p">(</span><span class="n">is_even</span><span class="p">(</span><span class="o">-</span><span class="mi">2</span><span class="p">))</span>
</pre></div>
<p>This is a simple example and we copied code three times. Let's try to do better by writing a loop to iterate values we expect to be even:</p>
<div class="highlight"><pre><span></span>class TestIsEven(TestCase):
    def test_should_all_be_even(self):
        for n in (2, 0, -2, 11):
            self.assertTrue(is_even(n))
</pre></div>
<p>This is starting to look more elegant, so <strong>what is the problem?</strong> I added an odd value, 11, to fail the test. Let's run the test and see what it looks like:</p>
<div class="highlight"><pre><span></span>F
===================================================
FAIL: test_should_all_be_even (__main__.TestIsEven)
- - -- - - - - - - - - - - - - - - - - - - - - - -
Traceback (most recent call last):
File "subtest.py", line 18, in test_should_all_be_even
self.assertTrue(is_even(n))
AssertionError: False is not true
</pre></div>
<p>It failed as expected, but <strong>which value failed</strong>?</p>
<h3 id="enter-subtest"><a class="toclink" href="#enter-subtest">Enter <code>subTest</code></a></h3>
<p>Python 3.4 added a new feature called <a href="https://docs.python.org/3.5/library/unittest.html#unittest.TestCase.subTest" rel="noopener">subTest</a>. Let's see it in action:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">TestIsEven</span><span class="p">(</span><span class="n">TestCase</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">test_should_all_be_even</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="o">-</span><span class="mi">2</span><span class="p">,</span> <span class="mi">11</span><span class="p">):</span>
<span class="hll"> <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">subTest</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="n">n</span><span class="p">):</span>
</span> <span class="bp">self</span><span class="o">.</span><span class="n">assertTrue</span><span class="p">(</span><span class="n">is_even</span><span class="p">(</span><span class="n">n</span><span class="p">))</span>
</pre></div>
<p>Running this test produces the following output:</p>
<div class="highlight"><pre><span></span><span class="go">F</span>
<span class="go">==========================================================</span>
<span class="hll"><span class="go">FAIL: test_should_all_be_even (__main__.TestIsEven) (n=11)</span>
</span><span class="go">- - - - - - - - - - - - - - - - - - - - - - - - - - - - -</span>
<span class="go">Traceback (most recent call last):</span>
<span class="go">File "subtest.py", line 23, in test_should_all_be_even</span>
<span class="go">self.assertTrue(is_even(n))</span>
<span class="go">AssertionError: False is not true</span>
</pre></div>
<p>So which value failed? 11! <strong>It's in the title</strong>.</p>
<p>What do multiple failures look like? Here is the output when several values in the loop (3, 5 and 11) are odd:</p>
<div class="highlight"><pre><span></span><span class="go">F</span>
<span class="go">===========================================================</span>
<span class="hll"><span class="go">FAIL: test_should_all_be_even (__main__.TestIsEven) (n=3)</span>
</span><span class="go">- - - - - - - - - - - - - - - - - - - - - - - - - - - - -</span>
<span class="go">Traceback (most recent call last):</span>
<span class="go">File "subtest.py", line 23, in test_should_all_be_even</span>
<span class="go">self.assertTrue(is_even(n))</span>
<span class="go">AssertionError: False is not true</span>
<span class="go">==========================================================</span>
<span class="hll"><span class="go">FAIL: test_should_all_be_even (__main__.TestIsEven) (n=5)</span>
</span><span class="go">- - - - - - - - - - - - - - - - - - - - - - - - - - - - -</span>
<span class="go">Traceback (most recent call last):</span>
<span class="go">File "subtest.py", line 23, in test_should_all_be_even</span>
<span class="go">self.assertTrue(is_even(n))</span>
<span class="go">AssertionError: False is not true</span>
<span class="go">==========================================================</span>
<span class="hll"><span class="go">FAIL: test_should_all_be_even (__main__.TestIsEven) (n=11)</span>
</span><span class="go">- - - - - - - - - - - - - - - - - - - - - - - - - - - - -</span>
<span class="go">Traceback (most recent call last):</span>
<span class="go">File "subtest.py", line 23, in test_should_all_be_even</span>
<span class="go">self.assertTrue(is_even(n))</span>
<span class="go">AssertionError: False is not true</span>
</pre></div>
<p>Exactly as if we wrote three separate test cases.</p>
<p><strong>Profit!</strong></p>
<hr>
<h2>Timing Tests in Python For Fun and Profit</h2>
<p>Haki Benita, 2016-11-09</p>
<p>Hunting down slow tests by reporting tests that take longer than a certain threshold (because the first step to better test performance is awareness!).</p>
<p>I was preparing to push some changes a couple of days ago and, as I usually do, I ran the tests. I sat back in my chair as the dots raced across the screen when suddenly I noticed that one of the dots lingered. "OS is probably running some updates in the background or something" I said to myself, and ran the tests again just to be sure. I watched closely as the dots filled the screen and there it was again - <strong>I have a slow test</strong>!</p>
<figure><img alt="Can you spot the slow test? Neither can I..." src="https://hakibenita.com/images/01-timing-tests-in-python-for-fun-and-profit.png"><figcaption>Can you spot the slow test? Neither can I...</figcaption>
</figure>
<p><strong>We are going to hunt down slow tests by reporting tests that take longer than a certain threshold.</strong></p>
<h3 id="the-basics"><a class="toclink" href="#the-basics">The Basics</a></h3>
<p>To get the ball rolling let's create a simple test case with a fast test and a slow test:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">time</span>
<span class="kn">import</span> <span class="nn">unittest</span>
<span class="k">class</span> <span class="nc">SlowTestCase</span><span class="p">(</span><span class="n">unittest</span><span class="o">.</span><span class="n">TestCase</span><span class="p">):</span>
<span class="hll"> <span class="k">def</span> <span class="nf">test_should_run_fast</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span> <span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="hll"> <span class="k">def</span> <span class="nf">test_should_run_slow</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span> <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mf">0.5</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
</pre></div>
<p>Running this script from the command line produces the following output:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>python<span class="w"> </span>-m<span class="w"> </span>unittest<span class="w"> </span>timing.py
..
Ran<span class="w"> </span><span class="m">2</span><span class="w"> </span>tests<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">0</span>.502s
OK
</pre></div>
<p>I'm sorry unittest, but <strong>this is definitely not OK</strong> - 0.5s for two tests?</p>
<p>To figure out which tests are slow we need to <strong>measure the time it takes each test to execute</strong>.</p>
<p>A Python <code>unittest.TestCase</code> has hooks that execute in the following order:</p>
<div class="highlight"><pre><span></span>> setUpClass
> setUp
> test_*
> tearDown
> tearDownClass
</pre></div>
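<p>The order can be confirmed with a small experiment. This is only a sketch: the <code>calls</code> list, <code>HookOrderTestCase</code> and <code>run_hooks</code> are hypothetical names used to record which hook fires when.</p>

```python
import io
import unittest

calls = []  # filled in by the hooks, in the order they execute


class HookOrderTestCase(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        calls.append('setUpClass')

    def setUp(self):
        calls.append('setUp')

    def test_something(self):
        calls.append('test_something')

    def tearDown(self):
        calls.append('tearDown')

    @classmethod
    def tearDownClass(cls):
        calls.append('tearDownClass')


def run_hooks():
    """Run the test case quietly and return the recorded hook order."""
    calls.clear()
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(HookOrderTestCase)
    unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return list(calls)
```

<p>With more than one test method, <code>setUp</code> and <code>tearDown</code> wrap each test, while the class-level hooks still run only once. That is what makes <code>setUp</code> and <code>tearDown</code> the natural place for a per-test timer.</p>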
<p>If we want to time a single test (<code>test_*</code>) we need to start a timer in <code>setUp</code> and stop it in <code>tearDown</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">time</span>
<span class="kn">import</span> <span class="nn">unittest</span>
<span class="k">class</span> <span class="nc">SlowTestCase</span><span class="p">(</span><span class="n">unittest</span><span class="o">.</span><span class="n">TestCase</span><span class="p">):</span>
<span class="hll">    <span class="k">def</span> <span class="nf">setUp</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span>        <span class="bp">self</span><span class="o">.</span><span class="n">_started_at</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>

<span class="hll">    <span class="k">def</span> <span class="nf">tearDown</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span>        <span class="n">elapsed</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="bp">self</span><span class="o">.</span><span class="n">_started_at</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'</span><span class="si">{}</span><span class="s1"> (</span><span class="si">{}</span><span class="s1">s)'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">id</span><span class="p">(),</span> <span class="nb">round</span><span class="p">(</span><span class="n">elapsed</span><span class="p">,</span> <span class="mi">2</span><span class="p">)))</span>

    <span class="k">def</span> <span class="nf">test_should_run_fast</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">test_should_run_slow</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mf">0.5</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
</pre></div>
<p>This produces the following output:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>python<span class="w"> </span>-m<span class="w"> </span>unittest<span class="w"> </span>timing.py
__main__.SlowTestCase.test_should_run_fast<span class="w"> </span><span class="o">(</span><span class="m">0</span>.0s<span class="o">)</span>
.__main__.SlowTestCase.test_should_run_slow<span class="w"> </span><span class="o">(</span><span class="m">0</span>.5s<span class="o">)</span>
.
Ran<span class="w"> </span><span class="m">2</span><span class="w"> </span>tests<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">0</span>.503s
OK
</pre></div>
<p>Great! We got the timing for each test but we really want <strong>only the slow ones</strong>.</p>
<p>Let's say a slow test is a test that takes longer than 0.3s:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="n">SLOW_TEST_THRESHOLD</span> <span class="o">=</span> <span class="mf">0.3</span>
</span>
<span class="k">class</span> <span class="nc">SlowTestCase</span><span class="p">(</span><span class="n">unittest</span><span class="o">.</span><span class="n">TestCase</span><span class="p">):</span>
    <span class="c1"># ...</span>
    <span class="k">def</span> <span class="nf">tearDown</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">elapsed</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="bp">self</span><span class="o">.</span><span class="n">_started_at</span>
<span class="hll">        <span class="k">if</span> <span class="n">elapsed</span> <span class="o">></span> <span class="n">SLOW_TEST_THRESHOLD</span><span class="p">:</span>
</span>            <span class="nb">print</span><span class="p">(</span><span class="s1">'</span><span class="si">{}</span><span class="s1"> (</span><span class="si">{}</span><span class="s1">s)'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">id</span><span class="p">(),</span> <span class="nb">round</span><span class="p">(</span><span class="n">elapsed</span><span class="p">,</span> <span class="mi">2</span><span class="p">)))</span>
</pre></div>
<p>And the output:</p>
<div class="highlight"><pre><span></span>><span class="w"> </span>python<span class="w"> </span>-m<span class="w"> </span>unittest<span class="w"> </span>timing.py
.__main__.SlowTestCase.test_should_run_slow<span class="w"> </span><span class="o">(</span><span class="m">0</span>.5s<span class="o">)</span>
.
Ran<span class="w"> </span><span class="m">2</span><span class="w"> </span>tests<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">0</span>.503s
OK
</pre></div>
<p>Awesome! We got exactly what we wanted but it is still incomplete. We are good developers so we are most likely dead lazy. We don't want to go around and update every test case - <strong>we need a more robust solution</strong>.</p>
<h3 id="the-runner"><a class="toclink" href="#the-runner">The Runner</a></h3>
<p>One of the roles of the <strong><code>TestRunner</code></strong> is to print test results to an output stream. The runner uses a <strong><code>TestResult</code></strong> object to format the results. The unittest module comes with a default <strong><a href="https://docs.python.org/3/library/unittest.html#unittest.TextTestRunner" rel="noopener">TextTestRunner</a></strong> and <strong><a href="https://docs.python.org/3/library/unittest.html#unittest.TextTestResult" rel="noopener">TextTestResult</a></strong>.</p>
<p>Let's implement a custom <code>TestResult</code> to report slow tests:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">time</span>
<span class="kn">from</span> <span class="nn">unittest.runner</span> <span class="kn">import</span> <span class="n">TextTestResult</span>
<span class="n">SLOW_TEST_THRESHOLD</span> <span class="o">=</span> <span class="mf">0.3</span>
<span class="k">class</span> <span class="nc">TimeLoggingTestResult</span><span class="p">(</span><span class="n">TextTestResult</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">startTest</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">test</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_started_at</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">startTest</span><span class="p">(</span><span class="n">test</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">addSuccess</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">test</span><span class="p">):</span>
        <span class="n">elapsed</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="bp">self</span><span class="o">.</span><span class="n">_started_at</span>
        <span class="k">if</span> <span class="n">elapsed</span> <span class="o">></span> <span class="n">SLOW_TEST_THRESHOLD</span><span class="p">:</span>
            <span class="n">name</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">getDescription</span><span class="p">(</span><span class="n">test</span><span class="p">)</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">stream</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s2">"</span><span class="se">\n</span><span class="si">{}</span><span class="s2"> (</span><span class="si">{:.03}</span><span class="s2">s)</span><span class="se">\n</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">elapsed</span><span class="p">))</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">addSuccess</span><span class="p">(</span><span class="n">test</span><span class="p">)</span>
</pre></div>
<p>Almost identical to what we already have but using <strong>different hooks</strong>. Instead of <code>setUp</code> we use <code>startTest</code>, and instead of <code>tearDown</code> we use <code>addSuccess</code>.</p>
<p>The built-in <code>TextTestRunner</code> uses <code>TextTestResult</code>. To use a different <code>TestResult</code> we create an instance of <strong><code>TextTestRunner</code></strong> with our result class:</p>
<div class="highlight"><pre><span></span>import unittest
from unittest import TextTestRunner

if __name__ == '__main__':
    test_runner = TextTestRunner(resultclass=TimeLoggingTestResult)
    unittest.main(testRunner=test_runner)
</pre></div>
<p>And the output:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>python<span class="w"> </span>runner.py
.
test_should_run_slow<span class="w"> </span><span class="o">(</span>__main__.SlowTestCase<span class="o">)</span><span class="w"> </span><span class="o">(</span><span class="m">0</span>.501s<span class="o">)</span>
.
Ran<span class="w"> </span><span class="m">2</span><span class="w"> </span>tests<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">0</span>.501s
OK
</pre></div>
<p>We get a nice report <strong>without having to make any changes</strong> to existing test cases.</p>
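<p>One thing to keep in mind is that <code>addSuccess</code> is only called for tests that pass, so a slow test that fails is not reported. A minimal sketch of one way around this, assuming we move the check to the <code>stopTest</code> hook, which <code>unittest</code> calls after every test regardless of outcome. The names <code>AllOutcomesTimingResult</code> and <code>run_demo</code> are made up for this example, and <code>time.perf_counter</code> is swapped in for <code>time.time</code> because it is monotonic:</p>

```python
import io
import time
import unittest
from unittest.runner import TextTestResult

SLOW_TEST_THRESHOLD = 0.3  # seconds, the same value used above


class AllOutcomesTimingResult(TextTestResult):
    """Hypothetical variant: reports slow tests whether they pass or fail."""

    def startTest(self, test):
        # perf_counter is monotonic, so clock adjustments cannot skew the result.
        self._started_at = time.perf_counter()
        super().startTest(test)

    def stopTest(self, test):
        # stopTest runs after every test, not only after successful ones.
        elapsed = time.perf_counter() - self._started_at
        if elapsed > SLOW_TEST_THRESHOLD:
            name = self.getDescription(test)
            self.stream.write("\n{} ({:.3f}s)\n".format(name, elapsed))
        super().stopTest(test)


def run_demo():
    """Run one slow, failing test and return the result and the captured output."""

    class DemoTestCase(unittest.TestCase):
        def test_slow_and_failing(self):
            time.sleep(0.35)
            self.assertEqual(1, 2)

    stream = io.StringIO()
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DemoTestCase)
    runner = unittest.TextTestRunner(stream=stream, resultclass=AllOutcomesTimingResult)
    result = runner.run(suite)
    return result, stream.getvalue()
```

<p>Calling <code>run_demo()</code> returns the captured runner output, which should contain a timing line for the failing test even though <code>addSuccess</code> never fired.</p>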
<h3 id="can-we-do-better"><a class="toclink" href="#can-we-do-better">Can We Do Better?</a></h3>
<p>Right now we have a bunch of messages sprinkled around in random places across the screen. What if we could get a nice report with all the slow tests? Well, we can!</p>
<p>Let's start by making our TestResult store the timings without reporting them:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">time</span>
<span class="kn">from</span> <span class="nn">unittest.runner</span> <span class="kn">import</span> <span class="n">TextTestResult</span>
<span class="k">class</span> <span class="nc">TimeLoggingTestResult</span><span class="p">(</span><span class="n">TextTestResult</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
<span class="hll">        <span class="bp">self</span><span class="o">.</span><span class="n">test_timings</span> <span class="o">=</span> <span class="p">[]</span>
</span>
    <span class="k">def</span> <span class="nf">startTest</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">test</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_test_started_at</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">startTest</span><span class="p">(</span><span class="n">test</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">addSuccess</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">test</span><span class="p">):</span>
        <span class="n">elapsed</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="bp">self</span><span class="o">.</span><span class="n">_test_started_at</span>
        <span class="n">name</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">getDescription</span><span class="p">(</span><span class="n">test</span><span class="p">)</span>
<span class="hll">        <span class="bp">self</span><span class="o">.</span><span class="n">test_timings</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="n">name</span><span class="p">,</span> <span class="n">elapsed</span><span class="p">))</span>
</span>        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">addSuccess</span><span class="p">(</span><span class="n">test</span><span class="p">)</span>

<span class="hll">    <span class="k">def</span> <span class="nf">getTestTimings</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span><span class="hll">        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">test_timings</span>
</span></pre></div>
<p>The test result now holds a list of tuples containing the test name and the elapsed time. Moving over to our custom <code>TestRunner</code>:</p>
<div class="highlight"><pre><span></span><span class="c1"># common/test/runner.py</span>
<span class="kn">import</span> <span class="nn">unittest</span>
<span class="k">class</span> <span class="nc">TimeLoggingTestRunner</span><span class="p">(</span><span class="n">unittest</span><span class="o">.</span><span class="n">TextTestRunner</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">slow_test_threshold</span><span class="o">=</span><span class="mf">0.3</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="hll">        <span class="bp">self</span><span class="o">.</span><span class="n">slow_test_threshold</span> <span class="o">=</span> <span class="n">slow_test_threshold</span>
</span><span class="hll">        <span class="k">return</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">resultclass</span><span class="o">=</span><span class="n">TimeLoggingTestResult</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
</span>
    <span class="k">def</span> <span class="nf">run</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">test</span><span class="p">):</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">test</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">stream</span><span class="o">.</span><span class="n">writeln</span><span class="p">(</span><span class="s2">"</span><span class="se">\n</span><span class="s2">Slow Tests (></span><span class="si">{:.03}</span><span class="s2">s):"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">slow_test_threshold</span><span class="p">))</span>
<span class="hll">        <span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">elapsed</span> <span class="ow">in</span> <span class="n">result</span><span class="o">.</span><span class="n">getTestTimings</span><span class="p">():</span>
</span><span class="hll">            <span class="k">if</span> <span class="n">elapsed</span> <span class="o">></span> <span class="bp">self</span><span class="o">.</span><span class="n">slow_test_threshold</span><span class="p">:</span>
</span><span class="hll">                <span class="bp">self</span><span class="o">.</span><span class="n">stream</span><span class="o">.</span><span class="n">writeln</span><span class="p">(</span><span class="s2">"(</span><span class="si">{:.03}</span><span class="s2">s) </span><span class="si">{}</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">elapsed</span><span class="p">,</span> <span class="n">name</span><span class="p">))</span>
</span>
        <span class="k">return</span> <span class="n">result</span>
</pre></div>
<p>Let's break it down:</p>
<ul>
<li>We've replaced <code>SLOW_TEST_THRESHOLD</code> with a parameter to <code>__init__</code>, which is much cleaner.</li>
<li>We've set the appropriate <code>TestResult</code> class.</li>
<li>We've overridden <code>run</code> and added our custom "slow test" report.</li>
</ul>
<p>This is what the output looks like (I added some slow tests to illustrate):</p>
<div class="highlight"><pre><span></span>><span class="w"> </span>python<span class="w"> </span>timing.py
.....
Ran<span class="w"> </span><span class="m">5</span><span class="w"> </span>tests<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span>.706s
OK
Slow<span class="w"> </span>Tests<span class="w"> </span><span class="o">(</span>>0.3s<span class="o">)</span>:
<span class="o">(</span><span class="m">0</span>.501s<span class="o">)</span><span class="w"> </span>test_should_run_slow<span class="w"> </span><span class="o">(</span>__main__.SlowTestCase<span class="o">)</span>
<span class="o">(</span><span class="m">0</span>.802s<span class="o">)</span><span class="w"> </span>test_should_run_very_slow<span class="w"> </span><span class="o">(</span>__main__.SlowTestCase<span class="o">)</span>
<span class="o">(</span><span class="m">0</span>.301s<span class="o">)</span><span class="w"> </span>test_should_run_slow_enough<span class="w"> </span><span class="o">(</span>__main__.SlowTestCase<span class="o">)</span>
</pre></div>
<p>Now that we have the timing data we can use that to generate interesting reports. We can sort by elapsed time, show potential time reduction and highlight sluggish tests.</p>
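<p>As a rough sketch of such a report, the helper below sorts the timings from slowest to fastest and shows each slow test's share of the total run time. The function name <code>format_slow_test_report</code>, the <code>limit</code> parameter, and the sample timings are assumptions made up for illustration. The input has the same shape as the list returned by <code>getTestTimings()</code>:</p>

```python
def format_slow_test_report(test_timings, slow_test_threshold=0.3, limit=10):
    """Return report lines for the slowest tests, slowest first.

    test_timings is a list of (name, elapsed_seconds) tuples.
    """
    slow = [
        (name, elapsed)
        for name, elapsed in test_timings
        if elapsed > slow_test_threshold
    ]
    # Sort by elapsed time, slowest first.
    slow.sort(key=lambda item: item[1], reverse=True)

    # Total time of all tests, used to show how much each slow test costs.
    total = sum(elapsed for _, elapsed in test_timings)

    lines = []
    for name, elapsed in slow[:limit]:
        share = 100 * elapsed / total if total else 0.0
        lines.append("({:.3f}s, {:4.1f}% of total) {}".format(elapsed, share, name))
    return lines


# Sample data for illustration only.
timings = [
    ("test_should_run_fast", 0.001),
    ("test_should_run_slow", 0.501),
    ("test_should_run_very_slow", 0.802),
    ("test_should_run_slow_enough", 0.301),
]

for line in format_slow_test_report(timings):
    print(line)
```

<p>Inside the runner, the loop in <code>run</code> could write these lines to <code>self.stream</code> instead of printing them.</p>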
<h3 id="how-to-use-this-with-django"><a class="toclink" href="#how-to-use-this-with-django">How To Use This With Django</a></h3>
<p>Django has its own test runner so we need to make some adjustments:</p>
<div class="highlight"><pre><span></span><span class="c1"># common/test/runner.py</span>
<span class="kn">from</span> <span class="nn">django.test.runner</span> <span class="kn">import</span> <span class="n">DiscoverRunner</span>
<span class="c1"># ...</span>
<span class="k">class</span> <span class="nc">TimeLoggingTestRunner</span><span class="p">(</span><span class="n">DiscoverRunner</span><span class="p">):</span>
<span class="hll">
</span>    <span class="k">def</span> <span class="nf">get_resultclass</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">TimeLoggingTestResult</span>
</pre></div>
<p>And to make Django use our custom runner we set the following:</p>
<div class="highlight"><pre><span></span><span class="c1"># settings.py</span>
<span class="n">TEST_RUNNER</span> <span class="o">=</span> <span class="s1">'common.test.runner.TimeLoggingTestRunner'</span>
</pre></div>
<p>Go make some tests faster!</p>How to Add Custom Action Buttons to Django Admin2016-11-02T00:00:00+02:002016-11-02T00:00:00+02:00Haki Benitatag:hakibenita.com,2016-11-02:/how-to-add-custom-action-buttons-to-django-admin<p>The built-in admin actions operate on a queryset and are hidden in a dropdown menu. They are not suitable for most use cases. In this article we are going to add custom action buttons for each row in a Django Admin list view.</p><hr>
<p>We are big fans of the Django admin interface. It's a huge selling point for Django as it takes the load off developing a "back office" for support and day to day operations.</p>
<p>In the <a href="/bullet-proofing-django-models">last post</a> we presented a pattern we use often in our Django models. We used a bank account application with an <strong>Account</strong> and account <strong>Action</strong> models to demonstrate the way we handle common issues such as concurrency and validation. The bank account had two operations we wanted to expose in the admin interface - <strong>deposit</strong> and <strong>withdraw</strong>.</p>
<p><strong>We are going to add buttons in the Django admin interface to deposit and withdraw from an account, and we are going to do it in less than 100 lines of code!</strong></p>
<p>What will it look like at the end?</p>
<figure><img alt="Django Admin interface with custom action buttons" src="https://hakibenita.com/images/01-how-to-add-custom-action-buttons-to-django-admin.png"><figcaption>Django Admin interface with custom action buttons</figcaption>
</figure>
<p>Our custom actions are the nice looking deposit and withdraw buttons next to each account.</p>
<h3 id="why-not-use-the-existing-admin-actions"><a class="toclink" href="#why-not-use-the-existing-admin-actions">Why Not Use the Existing Admin Actions?</a></h3>
<p>The <a href="https://docs.djangoproject.com/en/1.10/ref/contrib/admin/actions/" rel="noopener">built-in admin actions</a> operate on a queryset. They are <strong>hidden in a dropdown menu</strong> in the top toolbar, and they are mostly useful for <strong>executing bulk operations</strong>. A good example is the default delete action. You mark several rows and select "delete rows" from the dropdown menu. This is not very flexible, and not suitable for some use cases.</p>
<figure><img alt="Django built in actions" src="https://hakibenita.com/images/02-how-to-add-custom-action-buttons-to-django-admin.png"><figcaption>Django built in actions</figcaption>
</figure>
<p>Another downside is that <strong>actions are not available in the detail view</strong>. To add buttons to the detail view you need to override the template, which is a huge pain.</p>
<h3 id="the-forms"><a class="toclink" href="#the-forms">The Forms</a></h3>
<p>First things first, we need some data from the user to perform the action. Naturally, <strong>we need a form</strong>. We need one form for deposit, and one form for withdraw.</p>
<p>In addition to performing the action, we are going to add a nifty option to send a notification email to the account owner, informing them about an action made on their account.</p>
<p>All of our actions have common arguments like <code>comment</code> and <code>send_email</code>. The actions also handle success and failure in a similar way.</p>
<p>Let's start with a base form to handle a general action:</p>
<div class="highlight"><pre><span></span><span class="c1"># forms.py</span>
<span class="kn">from</span> <span class="nn">django</span> <span class="kn">import</span> <span class="n">forms</span>
<span class="kn">from</span> <span class="nn">common.utils</span> <span class="kn">import</span> <span class="n">send_email</span>
<span class="kn">from</span> <span class="nn">.</span> <span class="kn">import</span> <span class="n">errors</span>
<span class="k">class</span> <span class="nc">AccountActionForm</span><span class="p">(</span><span class="n">forms</span><span class="o">.</span><span class="n">Form</span><span class="p">):</span>
    <span class="n">comment</span> <span class="o">=</span> <span class="n">forms</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span>
        <span class="n">required</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
        <span class="n">widget</span><span class="o">=</span><span class="n">forms</span><span class="o">.</span><span class="n">Textarea</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="n">send_email</span> <span class="o">=</span> <span class="n">forms</span><span class="o">.</span><span class="n">BooleanField</span><span class="p">(</span>
        <span class="n">required</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
    <span class="p">)</span>

    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">email_subject_template</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="s1">'email/account/notification_subject.txt'</span>

    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">email_body_template</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">form_action</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">account</span><span class="p">,</span> <span class="n">user</span><span class="p">):</span>
        <span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">save</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">account</span><span class="p">,</span> <span class="n">user</span><span class="p">):</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">account</span><span class="p">,</span> <span class="n">action</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">form_action</span><span class="p">(</span><span class="n">account</span><span class="p">,</span> <span class="n">user</span><span class="p">)</span>
        <span class="k">except</span> <span class="n">errors</span><span class="o">.</span><span class="n">Error</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
            <span class="n">error_message</span> <span class="o">=</span> <span class="nb">str</span><span class="p">(</span><span class="n">e</span><span class="p">)</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">add_error</span><span class="p">(</span><span class="kc">None</span><span class="p">,</span> <span class="n">error_message</span><span class="p">)</span>
            <span class="k">raise</span>

        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">cleaned_data</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'send_email'</span><span class="p">,</span> <span class="kc">False</span><span class="p">):</span>
            <span class="n">send_email</span><span class="p">(</span>
                <span class="n">to</span><span class="o">=</span><span class="p">[</span><span class="n">account</span><span class="o">.</span><span class="n">user</span><span class="o">.</span><span class="n">email</span><span class="p">],</span>
                <span class="n">subject_template</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">email_subject_template</span><span class="p">,</span>
                <span class="n">body_template</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">email_body_template</span><span class="p">,</span>
                <span class="n">context</span><span class="o">=</span><span class="p">{</span>
                    <span class="s2">"account"</span><span class="p">:</span> <span class="n">account</span><span class="p">,</span>
                    <span class="s2">"action"</span><span class="p">:</span> <span class="n">action</span><span class="p">,</span>
                <span class="p">}</span>
            <span class="p">)</span>

        <span class="k">return</span> <span class="n">account</span><span class="p">,</span> <span class="n">action</span>
</pre></div>
<ul>
<li>Every action has a comment, and an option to send a notification if the action completed successfully.</li>
<li>The actual operation is executed when the form is saved. This is similar to how <code>ModelForm</code> works.</li>
<li>For logging and audit purposes, the caller must provide the user executing the action.</li>
<li>Required properties that are not implemented by derived forms will raise <code>NotImplementedError</code>. This way, we make sure the developer gets an informative error message if she forgets to implement something.</li>
<li>Errors are handled at the base exception class level. Our models define a base error class, so we can easily catch all (and only) account related exceptions, and handle them appropriately.</li>
</ul>
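<p>To illustrate the last point, here is a minimal sketch of what a base error class buys us, written in plain Python without Django. Only the base <code>Error</code> name comes from the form above (<code>errors.Error</code>). The subclasses <code>InsufficientFunds</code> and <code>ExceedsLimit</code>, the <code>withdraw</code> stand-in, and <code>try_withdraw</code> are hypothetical names made up for this example. Unlike <code>save</code>, which records the message on the form and re-raises, this sketch simply returns the message:</p>

```python
class Error(Exception):
    """Base class for all account related errors."""


class InsufficientFunds(Error):  # hypothetical subclass
    def __init__(self, balance, amount):
        super().__init__(
            "balance {} is lower than requested amount {}".format(balance, amount)
        )


class ExceedsLimit(Error):  # hypothetical subclass
    pass


def withdraw(balance, amount, limit=1000):
    """Stand-in for the model method; it raises only Error subclasses."""
    if amount > limit:
        raise ExceedsLimit("amount {} exceeds limit {}".format(amount, limit))
    if amount > balance:
        raise InsufficientFunds(balance, amount)
    return balance - amount


def try_withdraw(balance, amount):
    """Catch all (and only) account errors; anything else propagates."""
    try:
        return withdraw(balance, amount), None
    except Error as e:
        return balance, str(e)
```

<p>A single <code>except Error</code> clause handles every account related failure, while an unrelated bug such as a <code>TypeError</code> still surfaces as usual.</p>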
<p>Now that we have a simple base class, let's add a form to withdraw from an account. For the withdraw form we need to add an amount field:</p>
<div class="highlight"><pre><span></span><span class="c1"># forms.py</span>
<span class="kn">from</span> <span class="nn">django.utils</span> <span class="kn">import</span> <span class="n">timezone</span>
<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">Account</span><span class="p">,</span> <span class="n">Action</span>
<span class="k">class</span> <span class="nc">WithdrawForm</span><span class="p">(</span><span class="n">AccountActionForm</span><span class="p">):</span>
    <span class="n">amount</span> <span class="o">=</span> <span class="n">forms</span><span class="o">.</span><span class="n">IntegerField</span><span class="p">(</span>
        <span class="n">min_value</span><span class="o">=</span><span class="n">Account</span><span class="o">.</span><span class="n">MIN_WITHDRAW</span><span class="p">,</span>
        <span class="n">max_value</span><span class="o">=</span><span class="n">Account</span><span class="o">.</span><span class="n">MAX_WITHDRAW</span><span class="p">,</span>
        <span class="n">required</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
        <span class="n">help_text</span><span class="o">=</span><span class="s1">'How much to withdraw?'</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="n">email_body_template</span> <span class="o">=</span> <span class="s1">'email/account/withdraw.txt'</span>
    <span class="n">field_order</span> <span class="o">=</span> <span class="p">(</span>
        <span class="s1">'amount'</span><span class="p">,</span>
        <span class="s1">'comment'</span><span class="p">,</span>
        <span class="s1">'send_email'</span><span class="p">,</span>
    <span class="p">)</span>

    <span class="k">def</span> <span class="nf">form_action</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">account</span><span class="p">,</span> <span class="n">user</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">Account</span><span class="o">.</span><span class="n">withdraw</span><span class="p">(</span>
            <span class="nb">id</span><span class="o">=</span><span class="n">account</span><span class="o">.</span><span class="n">pk</span><span class="p">,</span>
            <span class="n">user</span><span class="o">=</span><span class="n">account</span><span class="o">.</span><span class="n">user</span><span class="p">,</span>
            <span class="n">amount</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">cleaned_data</span><span class="p">[</span><span class="s1">'amount'</span><span class="p">],</span>
            <span class="n">withdrawn_by</span><span class="o">=</span><span class="n">user</span><span class="p">,</span>
            <span class="n">comment</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">cleaned_data</span><span class="p">[</span><span class="s1">'comment'</span><span class="p">],</span>
            <span class="n">asof</span><span class="o">=</span><span class="n">timezone</span><span class="o">.</span><span class="n">now</span><span class="p">(),</span>
        <span class="p">)</span>
</pre></div>
<ul>
<li>We extended the base <code>AccountActionForm</code> and added an <code>amount</code> field with proper validations.</li>
<li>We filled in the required attribute, <code>email_body_template</code>.</li>
<li>We implemented the form action using the <code>classmethod</code> from the previous post. The model takes care of locking the record, updating any calculated fields, and adding the proper action to the log.</li>
</ul>
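<p>It may not be obvious why assigning a plain string to <code>email_body_template</code> works when the base form declares it as a property. A minimal sketch in plain Python, using made-up class names (<code>BaseForm</code>, <code>WithdrawLikeForm</code>, <code>IncompleteForm</code>) in place of the Django forms, shows that a class attribute on the subclass shadows the property inherited from the base class:</p>

```python
class BaseForm:
    @property
    def email_subject_template(self):
        return "email/account/notification_subject.txt"

    @property
    def email_body_template(self):
        # Derived forms are expected to provide this.
        raise NotImplementedError()


class WithdrawLikeForm(BaseForm):
    # A plain class attribute shadows the property defined on the base class,
    # because attribute lookup finds it on the subclass first.
    email_body_template = "email/account/withdraw.txt"


class IncompleteForm(BaseForm):
    # Forgot to set email_body_template; accessing it raises NotImplementedError.
    pass


form = WithdrawLikeForm()
print(form.email_body_template)
print(form.email_subject_template)
```

<p>Accessing <code>IncompleteForm().email_body_template</code> raises <code>NotImplementedError</code>, which is the informative failure the base form relies on.</p>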
<p>The next step is to add a deposit action. Deposit requires amount, reference and reference type fields:</p>
<div class="highlight"><pre><span></span><span class="c1"># forms.py</span>
<span class="k">class</span> <span class="nc">DepositForm</span><span class="p">(</span><span class="n">AccountActionForm</span><span class="p">):</span>
    <span class="n">amount</span> <span class="o">=</span> <span class="n">forms</span><span class="o">.</span><span class="n">IntegerField</span><span class="p">(</span>
        <span class="n">min_value</span><span class="o">=</span><span class="n">Account</span><span class="o">.</span><span class="n">MIN_DEPOSIT</span><span class="p">,</span>
        <span class="n">max_value</span><span class="o">=</span><span class="n">Account</span><span class="o">.</span><span class="n">MAX_DEPOSIT</span><span class="p">,</span>
        <span class="n">required</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
        <span class="n">help_text</span><span class="o">=</span><span class="s1">'How much to deposit?'</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="n">reference_type</span> <span class="o">=</span> <span class="n">forms</span><span class="o">.</span><span class="n">ChoiceField</span><span class="p">(</span>
        <span class="n">required</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
        <span class="n">choices</span><span class="o">=</span><span class="n">Action</span><span class="o">.</span><span class="n">REFERENCE_TYPE_CHOICES</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="n">reference</span> <span class="o">=</span> <span class="n">forms</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span>
        <span class="n">required</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
    <span class="p">)</span>

    <span class="n">email_body_template</span> <span class="o">=</span> <span class="s1">'email/account/deposit.txt'</span>

    <span class="n">field_order</span> <span class="o">=</span> <span class="p">(</span>
        <span class="s1">'amount'</span><span class="p">,</span>
        <span class="s1">'reference_type'</span><span class="p">,</span>
        <span class="s1">'reference'</span><span class="p">,</span>
        <span class="s1">'comment'</span><span class="p">,</span>
        <span class="s1">'send_email'</span><span class="p">,</span>
    <span class="p">)</span>

    <span class="k">def</span> <span class="nf">form_action</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">account</span><span class="p">,</span> <span class="n">user</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">Account</span><span class="o">.</span><span class="n">deposit</span><span class="p">(</span>
            <span class="nb">id</span><span class="o">=</span><span class="n">account</span><span class="o">.</span><span class="n">pk</span><span class="p">,</span>
            <span class="n">user</span><span class="o">=</span><span class="n">account</span><span class="o">.</span><span class="n">user</span><span class="p">,</span>
            <span class="n">amount</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">cleaned_data</span><span class="p">[</span><span class="s1">'amount'</span><span class="p">],</span>
            <span class="n">deposited_by</span><span class="o">=</span><span class="n">user</span><span class="p">,</span>
            <span class="n">reference</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">cleaned_data</span><span class="p">[</span><span class="s1">'reference'</span><span class="p">],</span>
            <span class="n">reference_type</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">cleaned_data</span><span class="p">[</span><span class="s1">'reference_type'</span><span class="p">],</span>
            <span class="n">comment</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">cleaned_data</span><span class="p">[</span><span class="s1">'comment'</span><span class="p">],</span>
            <span class="n">asof</span><span class="o">=</span><span class="n">timezone</span><span class="o">.</span><span class="n">now</span><span class="p">(),</span>
        <span class="p">)</span>
</pre></div>
<p>So far we have the forms needed to accept, validate, and execute deposits and withdrawals. The next step is to integrate them into the Django Admin list view.</p>
<hr>
<h3 id="the-admin"><a class="toclink" href="#the-admin">The Admin</a></h3>
<p>Before we can add fancy buttons for actions, we need to set up a <strong>basic admin page</strong> for our <code>Account</code> model:</p>
<div class="highlight"><pre><span></span><span class="c1"># admin.py</span>
<span class="kn">from</span> <span class="nn">django.contrib</span> <span class="kn">import</span> <span class="n">admin</span>

<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">Account</span>


<span class="nd">@admin</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">Account</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">AccountAdmin</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
    <span class="c1"># date_hierarchy takes a single field name, not a tuple.</span>
    <span class="n">date_hierarchy</span> <span class="o">=</span> <span class="s1">'modified'</span>
    <span class="n">list_display</span> <span class="o">=</span> <span class="p">(</span>
        <span class="s1">'id'</span><span class="p">,</span>
        <span class="s1">'user'</span><span class="p">,</span>
        <span class="s1">'modified'</span><span class="p">,</span>
        <span class="s1">'balance'</span><span class="p">,</span>
        <span class="s1">'account_actions'</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="n">readonly_fields</span> <span class="o">=</span> <span class="p">(</span>
        <span class="s1">'id'</span><span class="p">,</span>
        <span class="s1">'user'</span><span class="p">,</span>
        <span class="s1">'modified'</span><span class="p">,</span>
        <span class="s1">'balance'</span><span class="p">,</span>
        <span class="s1">'account_actions'</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="n">list_select_related</span> <span class="o">=</span> <span class="p">(</span>
        <span class="s1">'user'</span><span class="p">,</span>
    <span class="p">)</span>

    <span class="k">def</span> <span class="nf">account_actions</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">obj</span><span class="p">):</span>
        <span class="c1"># TODO: Render action buttons</span>
        <span class="k">pass</span>
</pre></div>
<p>Side Note: We can make the list view much better by adding a link to the user, and to the account actions. We can add some search fields and much more. I previously wrote about <a href="/things-you-must-know-about-django-admin-as-your-app-gets-bigger">performance considerations in the admin interface when scaling a Django app to hundreds of thousands of users</a>. There are some nice tricks there that can make even this simple view much nicer.</p>
<h3 id="adding-the-action-buttons"><a class="toclink" href="#adding-the-action-buttons">Adding the Action Buttons</a></h3>
<p>We want to add action buttons for each account, and have them link to a page with a form. <code>ModelAdmin</code> provides the <code>get_urls</code> method for registering additional URLs on a model admin. Let's override it to add the routes for our custom actions:</p>
<div class="highlight"><pre><span></span><span class="c1"># admin.py</span>
<span class="c1"># django.conf.urls.url() was removed in Django 4.0; use django.urls.re_path() there.</span>
<span class="kn">from</span> <span class="nn">django.conf.urls</span> <span class="kn">import</span> <span class="n">url</span>
<span class="kn">from</span> <span class="nn">django.urls</span> <span class="kn">import</span> <span class="n">reverse</span>
<span class="kn">from</span> <span class="nn">django.utils.html</span> <span class="kn">import</span> <span class="n">format_html</span>


<span class="k">class</span> <span class="nc">AccountAdmin</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
    <span class="c1"># ...</span>

    <span class="k">def</span> <span class="nf">get_urls</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">urls</span> <span class="o">=</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">get_urls</span><span class="p">()</span>
        <span class="n">custom_urls</span> <span class="o">=</span> <span class="p">[</span>
            <span class="n">url</span><span class="p">(</span>
<span class="hll">                <span class="sa">r</span><span class="s1">'^(?P<account_id>.+)/deposit/$'</span><span class="p">,</span>
</span><span class="hll">                <span class="bp">self</span><span class="o">.</span><span class="n">admin_site</span><span class="o">.</span><span class="n">admin_view</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">process_deposit</span><span class="p">),</span>
</span>                <span class="n">name</span><span class="o">=</span><span class="s1">'account-deposit'</span><span class="p">,</span>
            <span class="p">),</span>
            <span class="n">url</span><span class="p">(</span>
<span class="hll">                <span class="sa">r</span><span class="s1">'^(?P<account_id>.+)/withdraw/$'</span><span class="p">,</span>
</span><span class="hll">                <span class="bp">self</span><span class="o">.</span><span class="n">admin_site</span><span class="o">.</span><span class="n">admin_view</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">process_withdraw</span><span class="p">),</span>
</span>                <span class="n">name</span><span class="o">=</span><span class="s1">'account-withdraw'</span><span class="p">,</span>
            <span class="p">),</span>
        <span class="p">]</span>
        <span class="k">return</span> <span class="n">custom_urls</span> <span class="o">+</span> <span class="n">urls</span>

    <span class="k">def</span> <span class="nf">account_actions</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">obj</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">format_html</span><span class="p">(</span>
            <span class="s1">'<a class="button" href="</span><span class="si">{}</span><span class="s1">">Deposit</a>&nbsp;'</span>
            <span class="s1">'<a class="button" href="</span><span class="si">{}</span><span class="s1">">Withdraw</a>'</span><span class="p">,</span>
            <span class="n">reverse</span><span class="p">(</span><span class="s1">'admin:account-deposit'</span><span class="p">,</span> <span class="n">args</span><span class="o">=</span><span class="p">[</span><span class="n">obj</span><span class="o">.</span><span class="n">pk</span><span class="p">]),</span>
            <span class="n">reverse</span><span class="p">(</span><span class="s1">'admin:account-withdraw'</span><span class="p">,</span> <span class="n">args</span><span class="o">=</span><span class="p">[</span><span class="n">obj</span><span class="o">.</span><span class="n">pk</span><span class="p">]),</span>
        <span class="p">)</span>
    <span class="n">account_actions</span><span class="o">.</span><span class="n">short_description</span> <span class="o">=</span> <span class="s1">'Account Actions'</span>
    <span class="n">account_actions</span><span class="o">.</span><span class="n">allow_tags</span> <span class="o">=</span> <span class="kc">True</span>

    <span class="k">def</span> <span class="nf">process_deposit</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="c1"># TODO</span>
        <span class="k">pass</span>

    <span class="k">def</span> <span class="nf">process_withdraw</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="c1"># TODO</span>
        <span class="k">pass</span>
</pre></div>
<ol>
<li>We registered two URLs, one for deposit and one for withdraw.</li>
<li>We pointed each route at the relevant function, <code>process_deposit</code> or <code>process_withdraw</code>. These functions will render an intermediate page with the corresponding form, and will carry out the operation.</li>
<li>We added a custom field <code>account_actions</code> to render the buttons for each action. The benefit of using a "regular" admin field like <code>account_actions</code>, is that it is available in both the detail and the list view.</li>
</ol>
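<p>To see what the two route patterns capture, here is a minimal sketch that runs the same regular expressions outside Django, using only the standard library. The <code>resolve</code> helper is hypothetical and exists only for illustration; Django's own URL resolver does the real matching.</p>

```python
import re

# The same patterns that are registered in get_urls().
DEPOSIT_RE = re.compile(r'^(?P<account_id>.+)/deposit/$')
WITHDRAW_RE = re.compile(r'^(?P<account_id>.+)/withdraw/$')


def resolve(path):
    """Hypothetical helper: return (action, account_id) for a path, or None."""
    for action, pattern in (('deposit', DEPOSIT_RE), ('withdraw', WITHDRAW_RE)):
        match = pattern.match(path)
        if match:
            # The named group is what Django would pass as the account_id kwarg.
            return action, match.group('account_id')
    return None


print(resolve('42/deposit/'))
print(resolve('42/withdraw/'))
print(resolve('42/history/'))
```

<p>Note that <code>.+</code> is greedy and also matches slashes, so a path such as <code>a/b/deposit/</code> would capture <code>a/b</code> as the account id. The view still has to look the account up and handle a missing object.</p>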
<p>Let's move on to implement the functions to handle the actions:</p>
<div class="highlight"><pre><span></span><span class="c1"># admin.py</span>
<span class="kn">from</span> <span class="nn">django.http</span> <span class="kn">import</span> <span class="n">HttpResponseRedirect</span>
<span class="kn">from</span> <span class="nn">django.template.response</span> <span class="kn">import</span> <span class="n">TemplateResponse</span>

<span class="c1"># Assumption: errors is the app-level module that defines the base Error class.</span>
<span class="kn">from</span> <span class="nn">.</span> <span class="kn">import</span> <span class="n">errors</span>
<span class="kn">from</span> <span class="nn">.forms</span> <span class="kn">import</span> <span class="n">DepositForm</span><span class="p">,</span> <span class="n">WithdrawForm</span>


<span class="k">class</span> <span class="nc">AccountAdmin</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
    <span class="c1"># ...</span>

    <span class="k">def</span> <span class="nf">process_deposit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="n">account_id</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">process_action</span><span class="p">(</span>
            <span class="n">request</span><span class="o">=</span><span class="n">request</span><span class="p">,</span>
            <span class="n">account_id</span><span class="o">=</span><span class="n">account_id</span><span class="p">,</span>
            <span class="n">action_form</span><span class="o">=</span><span class="n">DepositForm</span><span class="p">,</span>
            <span class="n">action_title</span><span class="o">=</span><span class="s1">'Deposit'</span><span class="p">,</span>
        <span class="p">)</span>

    <span class="k">def</span> <span class="nf">process_withdraw</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="n">account_id</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">process_action</span><span class="p">(</span>
            <span class="n">request</span><span class="o">=</span><span class="n">request</span><span class="p">,</span>
            <span class="n">account_id</span><span class="o">=</span><span class="n">account_id</span><span class="p">,</span>
            <span class="n">action_form</span><span class="o">=</span><span class="n">WithdrawForm</span><span class="p">,</span>
            <span class="n">action_title</span><span class="o">=</span><span class="s1">'Withdraw'</span><span class="p">,</span>
        <span class="p">)</span>

    <span class="k">def</span> <span class="nf">process_action</span><span class="p">(</span>
        <span class="bp">self</span><span class="p">,</span>
        <span class="n">request</span><span class="p">,</span>
        <span class="n">account_id</span><span class="p">,</span>
        <span class="n">action_form</span><span class="p">,</span>
        <span class="n">action_title</span>
    <span class="p">):</span>
        <span class="n">account</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_object</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">account_id</span><span class="p">)</span>

        <span class="k">if</span> <span class="n">request</span><span class="o">.</span><span class="n">method</span> <span class="o">!=</span> <span class="s1">'POST'</span><span class="p">:</span>
            <span class="n">form</span> <span class="o">=</span> <span class="n">action_form</span><span class="p">()</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">form</span> <span class="o">=</span> <span class="n">action_form</span><span class="p">(</span><span class="n">request</span><span class="o">.</span><span class="n">POST</span><span class="p">)</span>
            <span class="k">if</span> <span class="n">form</span><span class="o">.</span><span class="n">is_valid</span><span class="p">():</span>
                <span class="k">try</span><span class="p">:</span>
                    <span class="n">form</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">account</span><span class="p">,</span> <span class="n">request</span><span class="o">.</span><span class="n">user</span><span class="p">)</span>
                <span class="k">except</span> <span class="n">errors</span><span class="o">.</span><span class="n">Error</span><span class="p">:</span>
                    <span class="c1"># If save() raised, the form will have a non-field</span>
                    <span class="c1"># error containing an informative message.</span>
                    <span class="k">pass</span>
                <span class="k">else</span><span class="p">:</span>
                    <span class="bp">self</span><span class="o">.</span><span class="n">message_user</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="s1">'Success'</span><span class="p">)</span>
                    <span class="n">url</span> <span class="o">=</span> <span class="n">reverse</span><span class="p">(</span>
                        <span class="s1">'admin:account_account_change'</span><span class="p">,</span>
                        <span class="n">args</span><span class="o">=</span><span class="p">[</span><span class="n">account</span><span class="o">.</span><span class="n">pk</span><span class="p">],</span>
                        <span class="n">current_app</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">admin_site</span><span class="o">.</span><span class="n">name</span><span class="p">,</span>
                    <span class="p">)</span>
                    <span class="k">return</span> <span class="n">HttpResponseRedirect</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>

        <span class="n">context</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">admin_site</span><span class="o">.</span><span class="n">each_context</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
        <span class="n">context</span><span class="p">[</span><span class="s1">'opts'</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">_meta</span>
        <span class="n">context</span><span class="p">[</span><span class="s1">'form'</span><span class="p">]</span> <span class="o">=</span> <span class="n">form</span>
        <span class="n">context</span><span class="p">[</span><span class="s1">'account'</span><span class="p">]</span> <span class="o">=</span> <span class="n">account</span>
        <span class="n">context</span><span class="p">[</span><span class="s1">'title'</span><span class="p">]</span> <span class="o">=</span> <span class="n">action_title</span>

        <span class="k">return</span> <span class="n">TemplateResponse</span><span class="p">(</span>
            <span class="n">request</span><span class="p">,</span>
            <span class="s1">'admin/account/account_action.html'</span><span class="p">,</span>
            <span class="n">context</span><span class="p">,</span>
        <span class="p">)</span>
</pre></div>
<p>Deposit and withdraw are executed in a very similar way. To carry out both actions, we want to render a form in an intermediate page, and execute the action when the user submits the form.</p>
<p><code>process_action</code> handles the form submission for both actions. The function accepts a form, the title of the action, and the account id. <code>process_withdraw</code> and <code>process_deposit</code> are used to set the relevant context for each operation.</p>
<p><em>NOTE: There is some Django admin boilerplate here that is required by the Django admin site. No point in digging too deep into it because it's not relevant to us at this point.</em></p>
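<p>Stripped of the admin boilerplate, the control flow of <code>process_action</code> can be sketched in plain Python. This is only an illustration under stated assumptions: <code>FakeForm</code>, <code>ActionError</code> and the returned <code>('render', ...)</code> / <code>('redirect', ...)</code> tuples are hypothetical stand-ins for the Django form, the app's error class and the HTTP responses.</p>

```python
class ActionError(Exception):
    """Hypothetical stand-in for the app's base error class."""


class FakeForm:
    """Hypothetical stand-in for a Django form; valid when 'amount' is all digits."""

    def __init__(self, data=None):
        self.data = data
        self.saved = False

    def is_valid(self):
        return self.data is not None and str(self.data.get('amount', '')).isdigit()

    def save(self, account, user):
        # Simulate a business rule that fails inside save().
        if int(self.data['amount']) > 1000:
            raise ActionError('amount too large')
        self.saved = True


def process_action(method, post_data, form_class, account, user):
    """Return ('render', form) or ('redirect', account), mirroring the view's branches."""
    if method != 'POST':
        # First visit: show an empty, unbound form.
        form = form_class()
    else:
        form = form_class(post_data)
        if form.is_valid():
            try:
                form.save(account, user)
            except ActionError:
                # Fall through and render the form again.
                pass
            else:
                return 'redirect', account
    # GET, invalid POST, or a failed save: render the intermediate page.
    return 'render', form


print(process_action('GET', None, FakeForm, 'account-1', 'staff')[0])
print(process_action('POST', {'amount': '10'}, FakeForm, 'account-1', 'staff')[0])
```

<p>The only path that leaves the intermediate page is a valid form whose <code>save</code> succeeds. Every other path ends up rendering the form again.</p>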
<p>To complete the process, we need a template for the intermediate page that contains the action form. We are going to base our template on an existing template used by Django Admin itself:</p>
<div class="highlight"><pre><span></span><span class="x"><!-- templates/admin/account/account_action.html --></span>
<span class="hll"><span class="cp">{%</span> <span class="k">extends</span> <span class="s2">"admin/change_form.html"</span> <span class="cp">%}</span>
</span><span class="cp">{%</span> <span class="k">load</span> <span class="nv">i18n</span> <span class="nv">admin_static</span> <span class="nv">admin_modify</span> <span class="cp">%}</span>
<span class="cp">{%</span> <span class="k">block</span> <span class="nv">content</span> <span class="cp">%}</span>
<span class="x"><div id="content-main"></span>
<span class="x"> <form action="" method="POST"></span>
<span class="x"> </span><span class="cp">{%</span> <span class="k">csrf_token</span> <span class="cp">%}</span>
<span class="x"> </span><span class="cp">{%</span> <span class="k">if</span> <span class="nv">form.non_field_errors</span><span class="o">|</span><span class="nf">length</span> <span class="o">></span> <span class="m">0</span> <span class="cp">%}</span>
<span class="x"> <p class="errornote"></span>
<span class="x"> "Please correct the errors below."</span>
<span class="x"> </p></span>
<span class="x"> </span><span class="cp">{{</span> <span class="nv">form.non_field_errors</span> <span class="cp">}}</span>
<span class="x"> </span><span class="cp">{%</span> <span class="k">endif</span> <span class="cp">%}</span>
<span class="x"> <fieldset class="module aligned"></span>
<span class="x"> </span><span class="cp">{%</span> <span class="k">for</span> <span class="nv">field</span> <span class="k">in</span> <span class="nv">form</span> <span class="cp">%}</span>
<span class="x"> <div class="form-row"></span>
<span class="x"> </span><span class="cp">{{</span> <span class="nv">field.errors</span> <span class="cp">}}</span>
<span class="x"> </span><span class="cp">{{</span> <span class="nv">field.label_tag</span> <span class="cp">}}</span>
<span class="x"> </span><span class="cp">{{</span> <span class="nv">field</span> <span class="cp">}}</span>
<span class="x"> </span><span class="cp">{%</span> <span class="k">if</span> <span class="nv">field.field.help_text</span> <span class="cp">%}</span>
<span class="x"> <p class="help"></span>
<span class="x"> </span><span class="cp">{{</span> <span class="nv">field.field.help_text</span><span class="o">|</span><span class="nf">safe</span> <span class="cp">}}</span>
<span class="x"> </p></span>
<span class="x"> </span><span class="cp">{%</span> <span class="k">endif</span> <span class="cp">%}</span>
<span class="x"> </div></span>
<span class="x"> </span><span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span>
<span class="x"> </fieldset></span>
<span class="x"> <div class="submit-row"></span>
<span class="x"> <input type="submit" class="default" value="Submit"></span>
<span class="x"> </div></span>
<span class="x"> </form></span>
<span class="x"></div></span>
<span class="cp">{%</span> <span class="k">endblock</span> <span class="cp">%}</span>
</pre></div>
<p><strong>This is it!</strong></p>
<p>Staff members can now easily deposit and withdraw directly from the admin interface. No need to create an expensive dashboard or SSH to the server.</p>
<p>I promised we would do it in 100 lines, and we did it in fewer!</p>
<hr>
<h3 id="where-can-we-take-it-from-here"><a class="toclink" href="#where-can-we-take-it-from-here">Where Can We Take It From Here</a></h3>
<p>Now that we've nailed this technique, we can do pretty much whatever we want with it. We have total control over the route and what's being rendered. The next step would be to abstract this functionality and put it in a mixin, but we will cross that bridge when we get there.</p>
<hr>
<h3 id="credits"><a class="toclink" href="#credits">Credits</a></h3>
<p>Big parts of the implementation are taken from the excellent (excellent!) package <a href="https://github.com/django-import-export/django-import-export" rel="noopener">django-import-export</a>. It saved us hours of "Can you just send me the data in Excel?" and we love it for it. If you are not familiar with it you should definitely <a href="https://github.com/django-import-export/django-import-export" rel="noopener">check it out</a>.</p>
Bullet Proofing Django Models2016-10-26T00:00:00+03:002016-10-26T00:00:00+03:00Haki Benitatag:hakibenita.com,2016-10-26:/bullet-proofing-django-models
<p>We recently added bank-account-like functionality to one of our products. During development we ran into some textbook problems, and I thought it would be a good opportunity to go over some of the patterns we use in our Django models.</p>
<h3 id="a-bank-account"><a class="toclink" href="#a-bank-account">A Bank Account</a></h3>
<p>This article was written in the order in which we usually address new problems:</p>
<ol>
<li>Define the business requirements.</li>
<li>Write down a naive implementation and model definition.</li>
<li>Challenge the solution.</li>
<li>Refine and repeat.</li>
</ol>
<h4 id="business-requirements"><a class="toclink" href="#business-requirements">Business Requirements</a></h4>
<ul>
<li>Each user can have only one account but not every user must have one.</li>
<li>The user can deposit and withdraw from the account up to a certain amount.</li>
<li>The account balance cannot be negative.</li>
<li>There is a max limit to the user's account balance.</li>
<li>The total amount of all balances in the app cannot exceed a certain amount.</li>
<li>There must be a record for every action on the account.</li>
<li>Actions on the account can be executed by the user from either the mobile app or the web interface and by support personnel from the admin interface.</li>
</ul>
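<p>The amount and balance rules above can be sketched as two plain functions. This is a rough illustration, not the model code: the limit values are placeholders, and <code>check_deposit</code> and <code>check_withdraw</code> are hypothetical names.</p>

```python
# Placeholder limits, chosen only for illustration.
MAX_BALANCE = 10000
MIN_BALANCE = 0
MAX_DEPOSIT = 1000
MIN_DEPOSIT = 1
MAX_WITHDRAW = 1000
MIN_WITHDRAW = 1


def check_deposit(balance, amount):
    """Return True if depositing `amount` keeps every limit intact."""
    if not MIN_DEPOSIT <= amount <= MAX_DEPOSIT:
        return False
    return balance + amount <= MAX_BALANCE


def check_withdraw(balance, amount):
    """Return True if withdrawing `amount` keeps every limit intact."""
    if not MIN_WITHDRAW <= amount <= MAX_WITHDRAW:
        return False
    # The balance may never drop below the minimum (zero here).
    return balance - amount >= MIN_BALANCE


print(check_deposit(balance=9900, amount=500))
print(check_withdraw(balance=500, amount=200))
```

<p>Checks like these say nothing about concurrency or about the cap on the total of all balances. Both need the database to take part.</p>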
<p>Now that we have the business requirements we can start with a model definition.</p>
<h4 id="account-model"><a class="toclink" href="#account-model">Account Model</a></h4>
<div class="highlight"><pre><span></span><span class="c1"># models.py</span>
<span class="kn">import</span> <span class="nn">uuid</span>

<span class="kn">from</span> <span class="nn">django.conf</span> <span class="kn">import</span> <span class="n">settings</span>
<span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>


<span class="k">class</span> <span class="nc">Account</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
        <span class="n">verbose_name</span> <span class="o">=</span> <span class="s1">'Account'</span>
        <span class="n">verbose_name_plural</span> <span class="o">=</span> <span class="s1">'Accounts'</span>

    <span class="n">MAX_TOTAL_BALANCES</span> <span class="o">=</span> <span class="mi">10000000</span>

    <span class="n">MAX_BALANCE</span> <span class="o">=</span> <span class="mi">10000</span>
    <span class="n">MIN_BALANCE</span> <span class="o">=</span> <span class="mi">0</span>

    <span class="n">MAX_DEPOSIT</span> <span class="o">=</span> <span class="mi">1000</span>
    <span class="n">MIN_DEPOSIT</span> <span class="o">=</span> <span class="mi">1</span>

    <span class="n">MAX_WITHDRAW</span> <span class="o">=</span> <span class="mi">1000</span>
    <span class="n">MIN_WITHDRAW</span> <span class="o">=</span> <span class="mi">1</span>

    <span class="nb">id</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">AutoField</span><span class="p">(</span>
        <span class="n">primary_key</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="n">uid</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">UUIDField</span><span class="p">(</span>
        <span class="n">unique</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
        <span class="n">editable</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
        <span class="n">default</span><span class="o">=</span><span class="n">uuid</span><span class="o">.</span><span class="n">uuid4</span><span class="p">,</span>
        <span class="n">verbose_name</span><span class="o">=</span><span class="s1">'Public identifier'</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="n">user</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">OneToOneField</span><span class="p">(</span>
        <span class="n">settings</span><span class="o">.</span><span class="n">AUTH_USER_MODEL</span><span class="p">,</span>
        <span class="n">on_delete</span><span class="o">=</span><span class="n">models</span><span class="o">.</span><span class="n">PROTECT</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="n">created</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">(</span>
        <span class="n">blank</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="n">modified</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">(</span>
        <span class="n">blank</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="n">balance</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">PositiveIntegerField</span><span class="p">(</span>
        <span class="n">verbose_name</span><span class="o">=</span><span class="s1">'Current balance'</span><span class="p">,</span>
    <span class="p">)</span>
</pre></div>
<p>Let's break it down:</p>
<ul>
<li><strong>We use two unique identifiers</strong> - A private identifier, which is an auto-generated number (<code>id</code>), and a public identifier, which is a UUID (<code>uid</code>). It's a good idea to <strong>keep enumerators private</strong> - they expose important information about our data, such as how many accounts we have, and we don't want that.</li>
<li><strong>We use OneToOneField for the user</strong> - It's like a ForeignKey but with a unique constraint. This ensures a user cannot have more than one account.</li>
<li><strong>We set <code>on_delete=models.PROTECT</code></strong> - Starting with Django 2.0 <a href="https://docs.djangoproject.com/en/1.10/ref/models/fields/#foreignkey" rel="noopener">this argument will be mandatory</a>. The default is <code>CASCADE</code>: when the user is deleted, the related account is deleted as well. In our case this doesn't make sense. Imagine the bank "deleting your money" when you close an account. With <code>on_delete=models.PROTECT</code>, attempting to delete a user that has an account raises a <code>ProtectedError</code>, which is a subclass of <code>IntegrityError</code>.</li>
<li>You probably noticed that the code is very... "vertical". We write like that because it makes <strong>git diffs look nicer.</strong></li>
</ul>
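<p>To illustrate why the public identifier is a UUID rather than the auto-incrementing number, here is a small standard-library sketch. The lists are made up for the example. The only thing taken from the model is the use of <code>uuid.uuid4</code> as the default.</p>

```python
import uuid

# Sequential ids, as an AutoField would hand out for three new accounts.
sequential_ids = [1, 2, 3]

# Public identifiers generated the same way as the model's `uid` default.
public_ids = [uuid.uuid4() for _ in sequential_ids]

# Whoever sees the newest sequential id can estimate how many accounts exist.
print('newest private id:', sequential_ids[-1])

# A random UUID carries no ordering or count information.
for private_id, public_id in zip(sequential_ids, public_ids):
    print(private_id, '->', public_id)
```

<p>Exposing only <code>uid</code> in URLs and API responses keeps the sequence, and with it the rough number of accounts, internal.</p>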
<h4 id="account-action-model"><a class="toclink" href="#account-action-model">Account Action Model</a></h4>
<p>Now that we have an account model we can create a model to log actions made to the account:</p>
<div class="highlight"><pre><span></span><span class="c1"># models.py</span>
<span class="k">class</span> <span class="nc">Action</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
<span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
<span class="n">verbose_name</span> <span class="o">=</span> <span class="s1">'Account Action'</span>
<span class="n">verbose_name_plural</span> <span class="o">=</span> <span class="s1">'Account Actions'</span>
<span class="n">ACTION_TYPE_CREATED</span> <span class="o">=</span> <span class="s1">'CREATED'</span>
<span class="n">ACTION_TYPE_DEPOSITED</span> <span class="o">=</span> <span class="s1">'DEPOSITED'</span>
<span class="n">ACTION_TYPE_WITHDRAWN</span> <span class="o">=</span> <span class="s1">'WITHDRAWN'</span>
<span class="n">ACTION_TYPE_CHOICES</span> <span class="o">=</span> <span class="p">(</span>
<span class="p">(</span><span class="n">ACTION_TYPE_CREATED</span><span class="p">,</span> <span class="s1">'Created'</span><span class="p">),</span>
<span class="p">(</span><span class="n">ACTION_TYPE_DEPOSITED</span><span class="p">,</span> <span class="s1">'Deposited'</span><span class="p">),</span>
<span class="p">(</span><span class="n">ACTION_TYPE_WITHDRAWN</span><span class="p">,</span> <span class="s1">'Withdrawn'</span><span class="p">),</span>
<span class="p">)</span>
<span class="n">REFERENCE_TYPE_BANK_TRANSFER</span> <span class="o">=</span> <span class="s1">'BANK_TRANSFER'</span>
<span class="n">REFERENCE_TYPE_CHECK</span> <span class="o">=</span> <span class="s1">'CHECK'</span>
<span class="n">REFERENCE_TYPE_CASH</span> <span class="o">=</span> <span class="s1">'CASH'</span>
<span class="n">REFERENCE_TYPE_NONE</span> <span class="o">=</span> <span class="s1">'NONE'</span>
<span class="n">REFERENCE_TYPE_CHOICES</span> <span class="o">=</span> <span class="p">(</span>
<span class="p">(</span><span class="n">REFERENCE_TYPE_BANK_TRANSFER</span><span class="p">,</span> <span class="s1">'Bank Transfer'</span><span class="p">),</span>
<span class="p">(</span><span class="n">REFERENCE_TYPE_CHECK</span><span class="p">,</span> <span class="s1">'Check'</span><span class="p">),</span>
<span class="p">(</span><span class="n">REFERENCE_TYPE_CASH</span><span class="p">,</span> <span class="s1">'Cash'</span><span class="p">),</span>
<span class="p">(</span><span class="n">REFERENCE_TYPE_NONE</span><span class="p">,</span> <span class="s1">'None'</span><span class="p">),</span>
<span class="p">)</span>
<span class="nb">id</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">AutoField</span><span class="p">(</span>
<span class="n">primary_key</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">user_friendly_id</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span>
<span class="n">unique</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
<span class="n">editable</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
<span class="n">max_length</span><span class="o">=</span><span class="mi">30</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">user</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span>
<span class="n">settings</span><span class="o">.</span><span class="n">AUTH_USER_MODEL</span><span class="p">,</span>
<span class="n">on_delete</span><span class="o">=</span><span class="n">models</span><span class="o">.</span><span class="n">PROTECT</span><span class="p">,</span>
<span class="n">help_text</span><span class="o">=</span><span class="s1">'User who performed the action.'</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">created</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">(</span>
<span class="n">blank</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">account</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span>
<span class="n">Account</span><span class="p">,</span>
<span class="p">)</span>
<span class="nb">type</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span>
<span class="n">max_length</span><span class="o">=</span><span class="mi">30</span><span class="p">,</span>
<span class="n">choices</span><span class="o">=</span><span class="n">ACTION_TYPE_CHOICES</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">delta</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">IntegerField</span><span class="p">(</span>
<span class="n">help_text</span><span class="o">=</span><span class="s1">'Balance delta.'</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">reference</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">TextField</span><span class="p">(</span>
<span class="n">blank</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">reference_type</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span>
<span class="n">max_length</span><span class="o">=</span><span class="mi">30</span><span class="p">,</span>
<span class="n">choices</span><span class="o">=</span><span class="n">REFERENCE_TYPE_CHOICES</span><span class="p">,</span>
<span class="n">default</span><span class="o">=</span><span class="n">REFERENCE_TYPE_NONE</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">comment</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">TextField</span><span class="p">(</span>
<span class="n">blank</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
<span class="p">)</span>
<span class="c1"># Fields used solely for debugging purposes.</span>
<span class="n">debug_balance</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">IntegerField</span><span class="p">(</span>
<span class="n">help_text</span><span class="o">=</span><span class="s1">'Balance after the action.'</span><span class="p">,</span>
<span class="p">)</span>
</pre></div>
<p>What do we have here?</p>
<ul>
<li>Each record will hold a reference to the associated balance and the delta amount. A deposit of 100$ will have a delta of 100$, and a withdrawal of 50$ will have a delta of -50$. This way we can <strong>sum the deltas of all actions made to an account and get the current balance</strong>. This is important for validating our calculated balance.</li>
<li>We follow the same pattern of adding two identifiers - a private one and a public one. The difference here is that reference numbers for actions are often used by users and support personnel to identify a specific action over the phone or in emails. <strong>A uuid is not user friendly</strong> - it's very long and it's not something users are used to seeing. I found a nice implementation of user-friendly IDs in <a href="https://github.com/simonluijk/django-invoice/blob/master/invoice/utils/friendly_id.py" rel="noopener">django-invoice.</a></li>
<li>Two of the fields, reference and reference type, are only relevant for one type of action: deposit. There are a lot of ways to tackle this issue - table inheritance and down casting, JSON fields, table polymorphism, and the list of <strong>overly complicated solutions</strong> goes on. In our case we are going to <strong>use a sparse table</strong>.</li>
</ul>
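<p>To make the idea of a user-friendly identifier concrete, here is a minimal sketch. It is <em>not</em> the django-invoice implementation linked above; the alphabet, the length and the grouping are assumptions made for illustration. It only generates a random value, so uniqueness still has to be enforced by the <code>unique=True</code> constraint on the field.</p>

```python
import secrets

# Hypothetical alphabet: digits and uppercase letters,
# without look-alike characters (0/O, 1/I/L).
ALPHABET = '23456789ABCDEFGHJKMNPQRSTUVWXYZ'


def generate_user_friendly_id(length=8, group=4):
    """Return a short random ID split into groups, e.g. two groups of four.

    Illustrative only: collisions are possible, so the caller (or a
    unique database constraint) must handle duplicates.
    """
    chars = [secrets.choice(ALPHABET) for _ in range(length)]
    groups = [
        ''.join(chars[i:i + group])
        for i in range(0, length, group)
    ]
    return '-'.join(groups)
```

<p>A short, grouped ID like this is easier to read over the phone than a 36 character uuid, which is the whole point of keeping a second, public identifier.</p>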
<p>Note about the design: Maintaining <strong>calculated fields in the model is usually bad design</strong>. Calculated fields such as the account's balance should be avoided whenever possible.</p>
<p>However, in our "real life" implementation there are additional action types and thousands of actions on each account, so we <strong>treat the calculated attribute as an optimization</strong>. Maintaining state poses some interesting challenges, and we thought it could serve the purpose of this post, so we decided to present it as well.</p>
<h3 id="challenges"><a class="toclink" href="#challenges">Challenges</a></h3>
<h4 id="multiple-platforms"><a class="toclink" href="#multiple-platforms">Multiple Platforms</a></h4>
<p>We have three client applications we need to support:</p>
<ul>
<li>Mobile app - Uses an API interface to manage the account.</li>
<li>Web client - Uses either an API interface (if we have some sort of SPA), or good old server side rendering with Django forms.</li>
<li>Admin interface - Uses Django's admin module with Django forms.</li>
</ul>
<p>Our motivation is to keep things as DRY and self-contained as possible.</p>
<h4 id="validation"><a class="toclink" href="#validation">Validation</a></h4>
<p>We have two types of validations hiding in the business requirements:</p>
<p><strong>Input validation</strong> such as "amount must be between X and Y", "balance cannot exceed Z", etc. These types of validation are well supported by Django and can usually be expressed as database constraints or Django validations.</p>
<p>The second validation is a bit more complicated. We need to ensure the total amount of all balances in the entire system does not exceed a certain amount. This forces us to <strong>validate an instance against all other instances of the model</strong>.</p>
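<p>As a rough illustration of why this second validation is different, here is a framework-free sketch. The limit value and the function name are assumptions made for the example. The point is that the check needs the balances of <em>all</em> accounts, not just the one being changed.</p>

```python
MAX_TOTAL_BALANCES = 1000  # hypothetical system-wide limit


class ExceedsLimit(Exception):
    pass


def validate_total(all_balances, amount, limit=MAX_TOTAL_BALANCES):
    """Raise ExceedsLimit if adding amount pushes the system over the limit.

    all_balances is an iterable with the balance of every account
    in the system, not only the account being deposited to.
    """
    new_total = sum(all_balances) + amount
    if new_total > limit:
        raise ExceedsLimit()
    return new_total
```

<p>In Django the sum would typically come from an aggregate query over the whole table, which is why this check does not fit naturally on a single instance.</p>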
<h4 id="atomicity"><a class="toclink" href="#atomicity">Atomicity</a></h4>
<p>Race conditions are a very common issue in distributed systems, and even more so in models that maintain state such as a bank account (you can read more about <a href="https://en.wikipedia.org/wiki/Race_condition" rel="noopener">race conditions on Wikipedia</a>).</p>
<p>To illustrate the problem, consider an account with a balance of 100$. The user connects from two different devices at the exact same time and issues a withdrawal of 100$ from each. Since the two actions are executed at the exact same time, it is possible that both of them read a current balance of 100$. Given that both sessions see a sufficient balance, they both get approved and update the balance to 0$. The user has withdrawn a total of 200$ and the current balance is now 0$ - <strong>we have a race condition</strong> and we are down 100$.</p>
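<p>The scenario can be reproduced without a database. The sketch below is only a simulation: the first function replays the bad interleaving step by step so the result is deterministic, and the second uses an in-process <code>threading.Lock</code> as a stand-in for a row lock. All names here are made up for the example.</p>

```python
import threading


class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def unsafe_interleaving(account, amount):
    """Replay two sessions that both read before either one writes.

    Returns the number of approved withdrawals.
    """
    seen_by_a = account.balance  # session A reads the balance
    seen_by_b = account.balance  # session B reads the same value
    approved = 0
    if seen_by_a >= amount:
        account.balance = seen_by_a - amount  # A writes its result
        approved += 1
    if seen_by_b >= amount:
        account.balance = seen_by_b - amount  # B overwrites with a stale result
        approved += 1
    return approved


def withdraw_with_lock(account, amount):
    """Check and update while holding the lock, so reads cannot go stale."""
    with account.lock:
        if account.balance < amount:
            return False
        account.balance -= amount
        return True
```

<p>With a starting balance of 100, <code>unsafe_interleaving</code> approves both withdrawals of 100, while two concurrent calls to <code>withdraw_with_lock</code> approve exactly one. The important part is that the check and the update happen under the same lock.</p>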
<h4 id="logging-history"><a class="toclink" href="#logging-history">Logging / History</a></h4>
<p>The log serves two purposes:</p>
<ul>
<li><strong>Log and Audit</strong> - Information about historical actions: dates, amounts, users etc.</li>
<li><strong>Check Consistency</strong> - We maintain state in the model so we want to be able to validate the calculated balance by aggregating the action deltas.</li>
</ul>
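<p>The consistency check boils down to comparing the stored balance with the sum of the deltas. A minimal sketch, using a plain frozen dataclass in place of the Django model (the class and function names are assumptions for illustration):</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a record cannot be modified after creation
class ActionRecord:
    type: str
    delta: int


def balance_from_actions(actions):
    """Recompute the balance by summing the deltas of all actions."""
    return sum(action.delta for action in actions)


def is_consistent(stored_balance, actions):
    """True when the stored balance matches the action history."""
    return stored_balance == balance_from_actions(actions)


history = [
    ActionRecord('CREATED', 0),
    ActionRecord('DEPOSITED', 100),
    ActionRecord('WITHDRAWN', -50),
]
```

<p>For this history the recomputed balance is 50, so a stored balance of 50 is consistent and any other value indicates a bug or a tampered record.</p>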
<p>The history records must be 100% immutable.</p>
<hr>
<h3 id="the-naive-implementation"><a class="toclink" href="#the-naive-implementation">The Naive Implementation</a></h3>
<p>Let's start with a naive implementation of deposit (this is <em>not</em> a good implementation):</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Account</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="c1"># ...</span>
    <span class="k">def</span> <span class="nf">deposit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">amount</span><span class="p">,</span> <span class="n">deposited_by</span><span class="p">,</span> <span class="n">asof</span><span class="p">):</span>
        <span class="k">assert</span> <span class="n">amount</span> <span class="o">></span> <span class="mi">0</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="bp">self</span><span class="o">.</span><span class="n">MIN_DEPOSIT</span> <span class="o"><=</span> <span class="n">amount</span> <span class="o"><=</span> <span class="bp">self</span><span class="o">.</span><span class="n">MAX_DEPOSIT</span><span class="p">:</span>
            <span class="k">raise</span> <span class="n">InvalidAmount</span><span class="p">(</span><span class="n">amount</span><span class="p">)</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">balance</span> <span class="o">+</span> <span class="n">amount</span> <span class="o">></span> <span class="bp">self</span><span class="o">.</span><span class="n">MAX_BALANCE</span><span class="p">:</span>
            <span class="k">raise</span> <span class="n">ExceedsLimit</span><span class="p">()</span>
        <span class="n">total</span> <span class="o">=</span> <span class="n">Account</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span>
            <span class="n">total</span><span class="o">=</span><span class="n">Sum</span><span class="p">(</span><span class="s1">'balance'</span><span class="p">)</span>
        <span class="p">)[</span><span class="s1">'total'</span><span class="p">]</span>
        <span class="k">if</span> <span class="n">total</span> <span class="o">+</span> <span class="n">amount</span> <span class="o">></span> <span class="bp">self</span><span class="o">.</span><span class="n">MAX_TOTAL_BALANCES</span><span class="p">:</span>
            <span class="k">raise</span> <span class="n">ExceedsLimit</span><span class="p">()</span>
<span class="hll">        <span class="n">action</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">actions</span><span class="o">.</span><span class="n">create</span><span class="p">(</span>
</span>            <span class="n">user</span><span class="o">=</span><span class="n">deposited_by</span><span class="p">,</span>
            <span class="nb">type</span><span class="o">=</span><span class="n">Action</span><span class="o">.</span><span class="n">ACTION_TYPE_DEPOSITED</span><span class="p">,</span>
            <span class="n">delta</span><span class="o">=</span><span class="n">amount</span><span class="p">,</span>
            <span class="n">created</span><span class="o">=</span><span class="n">asof</span><span class="p">,</span>
        <span class="p">)</span>
<span class="hll">        <span class="bp">self</span><span class="o">.</span><span class="n">balance</span> <span class="o">+=</span> <span class="n">amount</span>
</span>        <span class="bp">self</span><span class="o">.</span><span class="n">modified</span> <span class="o">=</span> <span class="n">asof</span>
<span class="hll">        <span class="bp">self</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
</span></pre></div>
<p>And let's add a simple endpoint for it using DRF @api_view:</p>
<div class="highlight"><pre><span></span><span class="c1"># api.py</span>
<span class="c1"># ...</span>
<span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">transaction</span>
<span class="c1"># ...</span>
<span class="nd">@api_view</span><span class="p">([</span><span class="s1">'POST'</span><span class="p">])</span>
<span class="k">def</span> <span class="nf">deposit</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">amount</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">request</span><span class="o">.</span><span class="n">data</span><span class="p">[</span><span class="s1">'amount'</span><span class="p">])</span>
    <span class="k">except</span> <span class="p">(</span><span class="ne">KeyError</span><span class="p">,</span> <span class="ne">ValueError</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">Response</span><span class="p">(</span><span class="n">status</span><span class="o">=</span><span class="n">status</span><span class="o">.</span><span class="n">HTTP_400_BAD_REQUEST</span><span class="p">)</span>
<span class="hll">    <span class="k">with</span> <span class="n">transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
</span>        <span class="k">try</span><span class="p">:</span>
            <span class="n">account</span> <span class="o">=</span> <span class="p">(</span>
                <span class="n">Account</span><span class="o">.</span><span class="n">objects</span>
<span class="hll">                <span class="o">.</span><span class="n">select_for_update</span><span class="p">()</span>
</span>                <span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">user</span><span class="o">=</span><span class="n">request</span><span class="o">.</span><span class="n">user</span><span class="p">)</span>
            <span class="p">)</span>
        <span class="k">except</span> <span class="n">Account</span><span class="o">.</span><span class="n">DoesNotExist</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">Response</span><span class="p">(</span><span class="n">status</span><span class="o">=</span><span class="n">status</span><span class="o">.</span><span class="n">HTTP_404_NOT_FOUND</span><span class="p">)</span>
        <span class="k">try</span><span class="p">:</span>
<span class="hll">            <span class="n">account</span><span class="o">.</span><span class="n">deposit</span><span class="p">(</span>
</span><span class="hll">                <span class="n">amount</span><span class="o">=</span><span class="n">amount</span><span class="p">,</span>
</span><span class="hll">                <span class="n">deposited_by</span><span class="o">=</span><span class="n">request</span><span class="o">.</span><span class="n">user</span><span class="p">,</span>
</span><span class="hll">                <span class="n">asof</span><span class="o">=</span><span class="n">timezone</span><span class="o">.</span><span class="n">now</span><span class="p">(),</span>
</span><span class="hll">            <span class="p">)</span>
</span>        <span class="k">except</span> <span class="p">(</span><span class="n">ExceedsLimit</span><span class="p">,</span> <span class="n">InvalidAmount</span><span class="p">):</span>
            <span class="k">return</span> <span class="n">Response</span><span class="p">(</span><span class="n">status</span><span class="o">=</span><span class="n">status</span><span class="o">.</span><span class="n">HTTP_400_BAD_REQUEST</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">Response</span><span class="p">(</span><span class="n">status</span><span class="o">=</span><span class="n">status</span><span class="o">.</span><span class="n">HTTP_200_OK</span><span class="p">)</span>
</pre></div>
<p><strong>So what is the problem?</strong></p>
<p><strong>Locking the account</strong> - An instance cannot lock itself because it has already been fetched. We gave up control over the locking and fetching, so we have to trust the caller to properly obtain a lock - <strong>this is very bad design</strong>. Don't take my word for it, let's take a glimpse at <a href="https://docs.djangoproject.com/en/1.10/misc/design-philosophies/" rel="noopener">Django's design philosophy</a>:</p>
<blockquote>
<p><strong>Loose coupling
</strong>A fundamental goal of Django's stack is loose coupling and tight cohesion. The various layers of the framework shouldn't "know" about each other unless absolutely necessary.</p>
</blockquote>
<p>So is it really the business of our API, forms and django admin to fetch the account for us and obtain a proper lock? <em>I think not</em>.</p>
<p><strong>Validation</strong> - The account has to validate itself against all other accounts - this just feels awkward.</p>
<h3 id="a-better-approach"><a class="toclink" href="#a-better-approach">A Better Approach</a></h3>
<p>We need to hook into the process before the account is fetched (to obtain a lock) and in a place where it makes sense to validate and process more than one account.</p>
<p>Let's start with a function to create an Action instance and write it as a <code>classmethod</code>:</p>
<div class="highlight"><pre><span></span><span class="c1"># models.py</span>
<span class="kn">from</span> <span class="nn">django.core.exceptions</span> <span class="kn">import</span> <span class="n">ValidationError</span>
<span class="k">class</span> <span class="nc">Action</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
<span class="c1"># ...</span>
<span class="hll"> <span class="nd">@classmethod</span>
</span><span class="hll"> <span class="k">def</span> <span class="nf">create</span><span class="p">(</span>
</span><span class="hll"> <span class="bp">cls</span><span class="p">,</span>
</span> <span class="n">user</span><span class="p">,</span>
<span class="n">account</span><span class="p">,</span>
<span class="nb">type</span><span class="p">,</span>
<span class="n">delta</span><span class="p">,</span>
<span class="n">asof</span><span class="p">,</span>
<span class="n">reference</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
<span class="n">reference_type</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
<span class="n">comment</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
<span class="p">):</span>
<span class="w"> </span><span class="sd">"""Create Action.</span>
<span class="sd"> user (User):</span>
<span class="sd"> User who executed the action.</span>
<span class="sd"> account (Account):</span>
<span class="sd"> Account the action executed on.</span>
<span class="sd"> type (str, one of Action.ACTION_TYPE_*):</span>
<span class="sd"> Type of action.</span>
<span class="sd"> delta (int):</span>
<span class="sd"> Change in balance.</span>
<span class="sd"> asof (datetime.datetime):</span>
<span class="sd"> When was the action executed.</span>
<span class="sd"> reference (str or None):</span>
<span class="sd"> Reference number when appropriate.</span>
<span class="sd"> reference_type(str or None):</span>
<span class="sd"> Type of reference.</span>
<span class="sd"> Defaults to "NONE".</span>
<span class="sd"> comment (str or None):</span>
<span class="sd"> Optional comment on the action.</span>
<span class="sd"> Raises:</span>
<span class="sd"> ValidationError</span>
<span class="sd"> Returns (Action)</span>
<span class="sd"> """</span>
<span class="k">assert</span> <span class="n">asof</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">type</span> <span class="o">==</span> <span class="bp">cls</span><span class="o">.</span><span class="n">ACTION_TYPE_DEPOSITED</span> <span class="ow">and</span>
<span class="n">reference_type</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">):</span>
<span class="k">raise</span> <span class="n">ValidationError</span><span class="p">({</span>
<span class="s1">'reference_type'</span><span class="p">:</span> <span class="s1">'required for deposit.'</span><span class="p">,</span>
<span class="p">})</span>
<span class="k">if</span> <span class="n">reference_type</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
<span class="n">reference_type</span> <span class="o">=</span> <span class="bp">cls</span><span class="o">.</span><span class="n">REFERENCE_TYPE_NONE</span>
<span class="c1"># Don't store null in text field.</span>
<span class="k">if</span> <span class="n">reference</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
<span class="n">reference</span> <span class="o">=</span> <span class="s1">''</span>
<span class="k">if</span> <span class="n">comment</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
<span class="n">comment</span> <span class="o">=</span> <span class="s1">''</span>
<span class="n">user_friendly_id</span> <span class="o">=</span> <span class="n">generate_user_friendly_id</span><span class="p">()</span>
<span class="hll"> <span class="k">return</span> <span class="bp">cls</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create</span><span class="p">(</span>
</span> <span class="n">user_friendly_id</span><span class="o">=</span><span class="n">user_friendly_id</span><span class="p">,</span>
<span class="n">created</span><span class="o">=</span><span class="n">asof</span><span class="p">,</span>
<span class="n">user</span><span class="o">=</span><span class="n">user</span><span class="p">,</span>
<span class="n">account</span><span class="o">=</span><span class="n">account</span><span class="p">,</span>
<span class="nb">type</span><span class="o">=</span><span class="nb">type</span><span class="p">,</span>
<span class="n">delta</span><span class="o">=</span><span class="n">delta</span><span class="p">,</span>
<span class="n">reference</span><span class="o">=</span><span class="n">reference</span><span class="p">,</span>
<span class="n">reference_type</span><span class="o">=</span><span class="n">reference_type</span><span class="p">,</span>
<span class="n">comment</span><span class="o">=</span><span class="n">comment</span><span class="p">,</span>
<span class="n">debug_balance</span><span class="o">=</span><span class="n">account</span><span class="o">.</span><span class="n">balance</span><span class="p">,</span>
<span class="p">)</span>
</pre></div>
<p>What do we have here:</p>
<ul>
<li>We used a classmethod that accepts all necessary data to validate and create the new instance. By <em>not</em> calling the default manager's create function (Action.objects.create) directly we <strong>encapsulate all the business logic in the creation process</strong>.</li>
<li>We easily introduced <strong>custom validation</strong> and raise a proper ValidationError.</li>
<li>We accept the creation time as an argument. That might seem a bit strange at first glance - why not use the built-in auto_now_add? For starters, <strong>it's much easier to test with predictable values</strong>. Second, as we are going to see in just a bit, we can make sure the modified time of the account is exactly the same as the action's created time.</li>
</ul>
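<p>The benefit of passing the time in explicitly is easiest to see in a test. The following is a framework-free sketch of the same factory pattern, assuming a plain dataclass named <code>ActionSketch</code> and <code>ValueError</code> in place of Django's ValidationError; none of these names come from the original code.</p>

```python
import datetime
from dataclasses import dataclass


@dataclass
class ActionSketch:
    type: str
    delta: int
    created: datetime.datetime
    reference_type: str

    @classmethod
    def create(cls, type, delta, asof, reference_type=None):
        """Validate the input and build the instance in one place."""
        if asof is None:
            raise ValueError('asof is required')
        if type == 'DEPOSITED' and reference_type is None:
            raise ValueError('reference_type is required for deposit')
        return cls(
            type=type,
            delta=delta,
            created=asof,  # the caller decides the time, not the clock
            reference_type=reference_type or 'NONE',
        )


# In a test the timestamp is a fixed, known value.
asof = datetime.datetime(2017, 1, 1, tzinfo=datetime.timezone.utc)
action = ActionSketch.create('CREATED', 0, asof)
assert action.created == asof
```

<p>Because the timestamp is an argument, the assertion compares against an exact value instead of "some time close to now", and two objects created in the same operation can share the exact same timestamp.</p>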
<p>Before moving over to the implementation of the account methods let's define <strong>custom exceptions</strong> for our Account module:</p>
<div class="highlight"><pre><span></span><span class="c1"># errors.py</span>
<span class="k">class</span> <span class="nc">Error</span><span class="p">(</span><span class="ne">Exception</span><span class="p">):</span>
    <span class="k">pass</span>

<span class="k">class</span> <span class="nc">ExceedsLimit</span><span class="p">(</span><span class="n">Error</span><span class="p">):</span>
    <span class="k">pass</span>

<span class="k">class</span> <span class="nc">InvalidAmount</span><span class="p">(</span><span class="n">Error</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">amount</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">amount</span> <span class="o">=</span> <span class="n">amount</span>

    <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="s1">'Invalid Amount: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">amount</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">InsufficientFunds</span><span class="p">(</span><span class="n">Error</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">balance</span><span class="p">,</span> <span class="n">amount</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">balance</span> <span class="o">=</span> <span class="n">balance</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">amount</span> <span class="o">=</span> <span class="n">amount</span>

    <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="s1">'amount: </span><span class="si">{}</span><span class="s1">, current balance: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">amount</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">balance</span><span class="p">)</span>
</pre></div>
<p>We <strong>define a base Error class</strong> that inherits from Exception. This is something we found very useful and we use it a lot. A base error class allows us to catch all errors coming from a certain module:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">account.errors</span> <span class="kn">import</span> <span class="n">Error</span> <span class="k">as</span> <span class="n">AccountError</span>

<span class="k">try</span><span class="p">:</span>
    <span class="c1"># action on account</span>
<span class="k">except</span> <span class="n">AccountError</span><span class="p">:</span>
    <span class="c1"># Handle all errors from account</span>
</pre></div>
<p>A <a href="https://github.com/kennethreitz/requests/blob/master/requests/exceptions.py#L12" rel="noopener">similar pattern</a> can be found in the popular requests package.</p>
<p>Let's implement the method to create a new Account:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Account</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
<span class="c1"># ...</span>
<span class="hll"> <span class="nd">@classmethod</span>
</span><span class="hll"> <span class="k">def</span> <span class="nf">create</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">user</span><span class="p">,</span> <span class="n">created_by</span><span class="p">,</span> <span class="n">asof</span><span class="p">):</span>
</span><span class="w"> </span><span class="sd">"""Create account.</span>
<span class="sd"> user (User):</span>
<span class="sd"> Owner of the account.</span>
<span class="sd"> created_by (User):</span>
<span class="sd"> User that created the account.</span>
<span class="sd"> asof (datetime.datetime):</span>
<span class="sd"> Time of creation.</span>
<span class="sd"> Returns (tuple):</span>
<span class="sd"> [0] Account</span>
<span class="sd"> [1] Action</span>
<span class="sd"> """</span>
<span class="hll"> <span class="k">with</span> <span class="n">transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
</span><span class="hll"> <span class="n">account</span> <span class="o">=</span> <span class="bp">cls</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create</span><span class="p">(</span>
</span> <span class="n">user</span><span class="o">=</span><span class="n">user</span><span class="p">,</span>
<span class="n">created</span><span class="o">=</span><span class="n">asof</span><span class="p">,</span>
<span class="n">modified</span><span class="o">=</span><span class="n">asof</span><span class="p">,</span>
<span class="n">balance</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span>
<span class="p">)</span>
<span class="hll"> <span class="n">action</span> <span class="o">=</span> <span class="n">Action</span><span class="o">.</span><span class="n">create</span><span class="p">(</span>
</span> <span class="n">user</span><span class="o">=</span><span class="n">created_by</span><span class="p">,</span>
<span class="n">account</span><span class="o">=</span><span class="n">account</span><span class="p">,</span>
<span class="nb">type</span><span class="o">=</span><span class="n">Action</span><span class="o">.</span><span class="n">ACTION_TYPE_CREATED</span><span class="p">,</span>
<span class="n">delta</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span>
<span class="n">asof</span><span class="o">=</span><span class="n">asof</span><span class="p">,</span>
<span class="p">)</span>
<span class="hll"> <span class="k">return</span> <span class="n">account</span><span class="p">,</span> <span class="n">action</span>
</span></pre></div>
<p>Pretty straightforward: create the instance, create the action and return them both.</p>
<p>Notice how we accept asof here as well: modified, created and the action creation time are all equal. You can't do that with auto_now and auto_now_add.</p>
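<p>To illustrate why passing the timestamp in matters, here is a minimal sketch in plain Python (no Django involved). The <code>Record</code> class and both helper functions are hypothetical names made up for this example; the second helper only approximates what happens when each field reads the clock on its own.</p>

```python
import datetime
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical stand-in for a row with two timestamp fields.
    created: datetime.datetime
    modified: datetime.datetime


def create_with_asof(asof):
    # One caller-supplied timestamp is reused for every field,
    # so the values are guaranteed to be identical.
    return Record(created=asof, modified=asof)


def create_with_clock():
    # Each field reads the clock separately, so the two values
    # are not guaranteed to match.
    return Record(
        created=datetime.datetime.now(datetime.timezone.utc),
        modified=datetime.datetime.now(datetime.timezone.utc),
    )


asof = datetime.datetime(2024, 1, 3, 12, 0, tzinfo=datetime.timezone.utc)
record = create_with_asof(asof)
assert record.created == record.modified == asof
```

<p>A side benefit of accepting asof is that tests can pass a fixed time and compare against it exactly.</p>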
<p>Now to the business logic:</p>
<div class="highlight"><pre><span></span><span class="c1"># models.py</span>
<span class="hll"><span class="nd">@classmethod</span>
</span><span class="hll"><span class="k">def</span> <span class="nf">deposit</span><span class="p">(</span>
</span> <span class="bp">cls</span><span class="p">,</span>
<span class="n">uid</span><span class="p">,</span>
<span class="n">deposited_by</span><span class="p">,</span>
<span class="n">amount</span><span class="p">,</span>
<span class="n">asof</span><span class="p">,</span>
<span class="n">comment</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
<span class="p">):</span>
<span class="w"> </span><span class="sd">"""Deposit to account.</span>
<span class="sd"> uid (uuid.UUID):</span>
<span class="sd"> Account public identifier.</span>
<span class="sd"> deposited_by (User):</span>
<span class="sd"> Deposited by.</span>
<span class="sd"> amount (positive int):</span>
<span class="sd"> Amount to deposit.</span>
<span class="sd"> asof (datetime.datetime):</span>
<span class="sd"> Time of deposit.</span>
<span class="sd"> comment(str or None):</span>
<span class="sd"> Optional comment.</span>
<span class="sd"> Raises:</span>
<span class="sd"> Account.DoesNotExist</span>
<span class="sd"> InvalidAmount</span>
<span class="sd"> ExceedsLimit</span>
<span class="sd"> Returns (tuple):</span>
<span class="sd"> [0] (Account) Updated account instance.</span>
<span class="sd"> [1] (Action) Deposit action.</span>
<span class="sd"> """</span>
<span class="k">assert</span> <span class="n">amount</span> <span class="o">></span> <span class="mi">0</span>
<span class="hll"> <span class="k">with</span> <span class="n">transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
</span><span class="hll"> <span class="n">account</span> <span class="o">=</span> <span class="bp">cls</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">select_for_update</span><span class="p">()</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">uid</span><span class="o">=</span><span class="n">uid</span><span class="p">)</span>
</span>
<span class="k">if</span> <span class="ow">not</span> <span class="p">(</span><span class="bp">cls</span><span class="o">.</span><span class="n">MIN_DEPOSIT</span> <span class="o"><=</span> <span class="n">amount</span> <span class="o"><=</span> <span class="bp">cls</span><span class="o">.</span><span class="n">MAX_DEPOSIT</span><span class="p">):</span>
<span class="k">raise</span> <span class="n">errors</span><span class="o">.</span><span class="n">InvalidAmount</span><span class="p">(</span><span class="n">amount</span><span class="p">)</span>
<span class="k">if</span> <span class="n">account</span><span class="o">.</span><span class="n">balance</span> <span class="o">+</span> <span class="n">amount</span> <span class="o">></span> <span class="bp">cls</span><span class="o">.</span><span class="n">MAX_BALANCE</span><span class="p">:</span>
<span class="k">raise</span> <span class="n">errors</span><span class="o">.</span><span class="n">ExceedsLimit</span><span class="p">()</span>
<span class="n">total</span> <span class="o">=</span> <span class="bp">cls</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span><span class="n">total</span><span class="o">=</span><span class="n">Sum</span><span class="p">(</span><span class="s1">'balance'</span><span class="p">))</span><span class="p">[</span><span class="s1">'total'</span><span class="p">]</span>
<span class="k">if</span> <span class="n">total</span> <span class="o">+</span> <span class="n">amount</span> <span class="o">></span> <span class="bp">cls</span><span class="o">.</span><span class="n">MAX_TOTAL_BALANCES</span><span class="p">:</span>
<span class="k">raise</span> <span class="n">errors</span><span class="o">.</span><span class="n">ExceedsLimit</span><span class="p">()</span>
<span class="hll"> <span class="n">account</span><span class="o">.</span><span class="n">balance</span> <span class="o">+=</span> <span class="n">amount</span>
</span> <span class="n">account</span><span class="o">.</span><span class="n">modified</span> <span class="o">=</span> <span class="n">asof</span>
<span class="hll"> <span class="n">account</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">update_fields</span><span class="o">=</span><span class="p">[</span>
</span> <span class="s1">'balance'</span><span class="p">,</span>
<span class="s1">'modified'</span><span class="p">,</span>
<span class="p">])</span>
<span class="hll"> <span class="n">action</span> <span class="o">=</span> <span class="n">Action</span><span class="o">.</span><span class="n">create</span><span class="p">(</span>
</span> <span class="n">user</span><span class="o">=</span><span class="n">deposited_by</span><span class="p">,</span>
<span class="n">account</span><span class="o">=</span><span class="n">account</span><span class="p">,</span>
<span class="nb">type</span><span class="o">=</span><span class="n">Action</span><span class="o">.</span><span class="n">ACTION_TYPE_DEPOSITED</span><span class="p">,</span>
<span class="n">delta</span><span class="o">=</span><span class="n">amount</span><span class="p">,</span>
<span class="n">asof</span><span class="o">=</span><span class="n">asof</span><span class="p">,</span>
<span class="p">)</span>
<span class="k">return</span> <span class="n">account</span><span class="p">,</span> <span class="n">action</span>
<span class="hll"><span class="nd">@classmethod</span>
</span><span class="hll"><span class="k">def</span> <span class="nf">withdraw</span><span class="p">(</span>
</span> <span class="bp">cls</span><span class="p">,</span>
<span class="n">uid</span><span class="p">,</span>
<span class="n">withdrawn_by</span><span class="p">,</span>
<span class="n">amount</span><span class="p">,</span>
<span class="n">asof</span><span class="p">,</span>
<span class="n">comment</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
<span class="p">):</span>
<span class="w"> </span><span class="sd">"""Withdraw from account.</span>
<span class="sd"> uid (uuid.UUID):</span>
<span class="sd"> Account public identifier.</span>
<span class="sd"> withdrawn_by (User):</span>
<span class="sd"> The withdrawing user.</span>
<span class="sd"> amount (positive int):</span>
<span class="sd"> Amount to withdraw.</span>
<span class="sd"> asof (datetime.datetime):</span>
<span class="sd"> Time of withdraw.</span>
<span class="sd"> comment (str or None):</span>
<span class="sd"> Optional comment.</span>
<span class="sd"> Raises:</span>
<span class="sd"> Account.DoesNotExist</span>
<span class="sd"> InvalidAmount</span>
<span class="sd"> InsufficientFunds</span>
<span class="sd"> Returns (tuple):</span>
<span class="sd"> [0] (Account) Updated account instance.</span>
<span class="sd"> [1] (Action) Withdraw action.</span>
<span class="sd"> """</span>
<span class="k">assert</span> <span class="n">amount</span> <span class="o">></span> <span class="mi">0</span>
<span class="hll"> <span class="k">with</span> <span class="n">transaction</span><span class="o">.</span><span class="n">atomic</span><span class="p">():</span>
</span><span class="hll"> <span class="n">account</span> <span class="o">=</span> <span class="bp">cls</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">select_for_update</span><span class="p">()</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">uid</span><span class="o">=</span><span class="n">uid</span><span class="p">)</span>
</span>
<span class="k">if</span> <span class="ow">not</span> <span class="p">(</span><span class="bp">cls</span><span class="o">.</span><span class="n">MIN_WITHDRAW</span> <span class="o"><=</span> <span class="n">amount</span> <span class="o"><=</span> <span class="bp">cls</span><span class="o">.</span><span class="n">MAX_WITHDRAW</span><span class="p">):</span>
<span class="k">raise</span> <span class="n">errors</span><span class="o">.</span><span class="n">InvalidAmount</span><span class="p">(</span><span class="n">amount</span><span class="p">)</span>
<span class="k">if</span> <span class="n">account</span><span class="o">.</span><span class="n">balance</span> <span class="o">-</span> <span class="n">amount</span> <span class="o">&lt;</span> <span class="bp">cls</span><span class="o">.</span><span class="n">MIN_BALANCE</span><span class="p">:</span>
<span class="k">raise</span> <span class="n">errors</span><span class="o">.</span><span class="n">InsufficientFunds</span><span class="p">(</span><span class="n">amount</span><span class="p">,</span> <span class="n">account</span><span class="o">.</span><span class="n">balance</span><span class="p">)</span>
<span class="hll"> <span class="n">account</span><span class="o">.</span><span class="n">balance</span> <span class="o">-=</span> <span class="n">amount</span>
</span> <span class="n">account</span><span class="o">.</span><span class="n">modified</span> <span class="o">=</span> <span class="n">asof</span>
<span class="hll">
</span> <span class="n">account</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">update_fields</span><span class="o">=</span><span class="p">[</span>
<span class="s1">'balance'</span><span class="p">,</span>
<span class="s1">'modified'</span><span class="p">,</span>
<span class="p">])</span>
<span class="hll"> <span class="n">action</span> <span class="o">=</span> <span class="n">Action</span><span class="o">.</span><span class="n">create</span><span class="p">(</span>
</span> <span class="n">user</span><span class="o">=</span><span class="n">withdrawn_by</span><span class="p">,</span>
<span class="n">account</span><span class="o">=</span><span class="n">account</span><span class="p">,</span>
<span class="nb">type</span><span class="o">=</span><span class="n">Action</span><span class="o">.</span><span class="n">ACTION_TYPE_WITHDRAWN</span><span class="p">,</span>
<span class="n">delta</span><span class="o">=-</span><span class="n">amount</span><span class="p">,</span>
<span class="n">asof</span><span class="o">=</span><span class="n">asof</span><span class="p">,</span>
<span class="p">)</span>
<span class="k">return</span> <span class="n">account</span><span class="p">,</span> <span class="n">action</span>
</pre></div>
<p>We can start to see the pattern here:</p>
<ol>
<li><strong>Acquire a lock on the account</strong> using select_for_update. This will lock the account row in the database and make sure no one can update the account instance until the transaction is completed (either committed or rolled back).</li>
<li><strong>Perform validation checks</strong> and raise proper exceptions. Raising an exception will cause the transaction to roll back.</li>
<li>If all the validations passed, <strong>update the state</strong> (current balance), set the modification time, save the instance and <strong>create the log</strong> (action).</li>
</ol>
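<p>The three steps above can be sketched without a database. This is only an analogy, not the Django implementation: a <code>threading.Lock</code> stands in for the row lock taken by select_for_update, and <code>InMemoryAccount</code> is a hypothetical class invented for the illustration.</p>

```python
import threading


class InsufficientFunds(Exception):
    pass


class InMemoryAccount:
    """Hypothetical in-memory account; the lock stands in for the row lock."""

    def __init__(self, balance=0):
        self.balance = balance
        self.actions = []
        self._lock = threading.Lock()

    def withdraw(self, amount):
        with self._lock:  # 1. acquire the lock
            if self.balance - amount < 0:  # 2. validate and raise
                raise InsufficientFunds(amount, self.balance)
            self.balance -= amount  # 3. update the state...
            self.actions.append(("WITHDRAWN", -amount))  # ...and log it
            return self.balance


account = InMemoryAccount(balance=50)
failures = []


def attempt():
    try:
        account.withdraw(10)
    except InsufficientFunds:
        failures.append(1)


# Ten concurrent withdrawals of 10 against a balance of 50.
threads = [threading.Thread(target=attempt) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert account.balance == 0
assert len(account.actions) == 5
assert len(failures) == 5
```

<p>Because the check and the update happen under the same lock, exactly five withdrawals succeed and the balance never goes negative, no matter how the threads interleave.</p>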
<p>So how does the model hold up to our challenges?</p>
<ul>
<li><strong>Multiple Platforms and Validation:</strong> We encapsulated all of our business logic, including input and system-wide validation, inside the model methods, so consumers such as an API, admin actions or forms only need to handle exceptions and serialization / UI.</li>
<li><strong>Atomicity:</strong> Each method obtains its own lock, so there is no risk of a race condition.</li>
<li><strong>Logging / History:</strong> We created an action model and made sure each function registers the proper action.</li>
</ul>
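<p>From a consumer's point of view, this could look roughly like the sketch below. Everything here is hypothetical: <code>handle_deposit</code> plays the role of an API view or form handler, <code>fake_deposit</code> is a stand-in for the model method, and the limits are made-up numbers.</p>

```python
class InvalidAmount(Exception):
    pass


class ExceedsLimit(Exception):
    pass


def fake_deposit(amount):
    # Stand-in for the model method; the limits are made up.
    if not (1 <= amount <= 1000):
        raise InvalidAmount(amount)
    if amount > 500:
        raise ExceedsLimit()
    return amount  # pretend this is the new balance


def handle_deposit(deposit, amount):
    """Hypothetical consumer: no business rules, only error mapping."""
    try:
        balance = deposit(amount)
    except InvalidAmount:
        return {"ok": False, "error": "invalid amount"}
    except ExceedsLimit:
        return {"ok": False, "error": "limit exceeded"}
    return {"ok": True, "balance": balance}


assert handle_deposit(fake_deposit, 100) == {"ok": True, "balance": 100}
assert handle_deposit(fake_deposit, 0) == {"ok": False, "error": "invalid amount"}
assert handle_deposit(fake_deposit, 600) == {"ok": False, "error": "limit exceeded"}
```

<p>The consumer contains no validation of its own, so adding another platform means writing another thin wrapper like this one.</p>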
<p><strong>Profit!</strong></p>
<hr>
<h3 id="testing"><a class="toclink" href="#testing">Testing</a></h3>
<p>Our app will be incomplete without proper tests. I previously wrote about <a href="/keeping-tests-dry-with-class-based-tests-in-python">class based tests</a>. We are going to take a slightly different approach, but still have a base class with utility functions:</p>
<div class="highlight"><pre><span></span><span class="c1"># tests/common.py</span>
<span class="k">class</span> <span class="nc">TestAccountBase</span><span class="p">:</span>
<span class="n">DEFAULT</span> <span class="o">=</span> <span class="nb">object</span><span class="p">()</span>
<span class="nd">@classmethod</span>
<span class="k">def</span> <span class="nf">default</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">value</span><span class="p">,</span> <span class="n">default_value</span><span class="p">):</span>
<span class="k">return</span> <span class="n">default_value</span> <span class="k">if</span> <span class="n">value</span> <span class="ow">is</span> <span class="bp">cls</span><span class="o">.</span><span class="n">DEFAULT</span> <span class="k">else</span> <span class="n">value</span>
<span class="nd">@classmethod</span>
<span class="k">def</span> <span class="nf">setUpTestData</span><span class="p">(</span><span class="bp">cls</span><span class="p">):</span>
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">setUpTestData</span><span class="p">()</span>
<span class="c1"># Set up some default values</span>
<span class="bp">cls</span><span class="o">.</span><span class="n">admin</span> <span class="o">=</span> <span class="n">User</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create_superuser</span><span class="p">(</span>
<span class="s1">'Admin'</span><span class="p">,</span>
<span class="s1">'admin'</span><span class="p">,</span>
<span class="s1">'admin@testing.test'</span><span class="p">,</span>
<span class="p">)</span>
<span class="bp">cls</span><span class="o">.</span><span class="n">user_A</span> <span class="o">=</span> <span class="n">User</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create_user</span><span class="p">(</span>
<span class="s1">'user_A'</span><span class="p">,</span>
<span class="s1">'user_A'</span><span class="p">,</span>
<span class="s1">'A@testing.test'</span><span class="p">,</span>
<span class="p">)</span>
<span class="nd">@classmethod</span>
<span class="k">def</span> <span class="nf">create</span><span class="p">(</span>
<span class="bp">cls</span><span class="p">,</span>
<span class="n">user</span><span class="o">=</span><span class="n">DEFAULT</span><span class="p">,</span>
<span class="n">created_by</span><span class="o">=</span><span class="n">DEFAULT</span><span class="p">,</span>
<span class="n">asof</span><span class="o">=</span><span class="n">DEFAULT</span>
<span class="p">):</span>
<span class="n">user</span> <span class="o">=</span> <span class="bp">cls</span><span class="o">.</span><span class="n">default</span><span class="p">(</span><span class="n">user</span><span class="p">,</span> <span class="bp">cls</span><span class="o">.</span><span class="n">user_A</span><span class="p">)</span>
<span class="n">created_by</span> <span class="o">=</span> <span class="bp">cls</span><span class="o">.</span><span class="n">default</span><span class="p">(</span><span class="n">created_by</span><span class="p">,</span> <span class="bp">cls</span><span class="o">.</span><span class="n">admin</span><span class="p">)</span>
<span class="n">asof</span> <span class="o">=</span> <span class="bp">cls</span><span class="o">.</span><span class="n">default</span><span class="p">(</span><span class="n">asof</span><span class="p">,</span> <span class="n">timezone</span><span class="o">.</span><span class="n">now</span><span class="p">())</span>
<span class="n">account</span><span class="p">,</span> <span class="n">action</span> <span class="o">=</span> <span class="n">Account</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">user</span><span class="p">,</span> <span class="n">created_by</span><span class="p">,</span> <span class="n">asof</span><span class="p">)</span>
<span class="k">return</span> <span class="n">account</span><span class="p">,</span> <span class="n">action</span>
<span class="k">def</span> <span class="nf">deposit</span><span class="p">(</span>
<span class="bp">self</span><span class="p">,</span>
<span class="n">amount</span><span class="p">,</span>
<span class="n">account</span><span class="o">=</span><span class="n">DEFAULT</span><span class="p">,</span>
<span class="n">deposited_by</span><span class="o">=</span><span class="n">DEFAULT</span><span class="p">,</span>
<span class="n">asof</span><span class="o">=</span><span class="n">DEFAULT</span><span class="p">,</span>
<span class="n">comment</span><span class="o">=</span><span class="n">DEFAULT</span><span class="p">,</span>
<span class="p">):</span>
<span class="n">account</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">default</span><span class="p">(</span><span class="n">account</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">account</span><span class="p">)</span>
<span class="n">deposited_by</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">default</span><span class="p">(</span><span class="n">deposited_by</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">admin</span><span class="p">)</span>
<span class="n">asof</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">default</span><span class="p">(</span><span class="n">asof</span><span class="p">,</span> <span class="n">timezone</span><span class="o">.</span><span class="n">now</span><span class="p">())</span>
<span class="n">comment</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">default</span><span class="p">(</span><span class="n">comment</span><span class="p">,</span> <span class="s1">'deposit comment'</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">account</span><span class="p">,</span> <span class="n">action</span> <span class="o">=</span> <span class="n">Account</span><span class="o">.</span><span class="n">deposit</span><span class="p">(</span>
<span class="n">uid</span><span class="o">=</span><span class="n">account</span><span class="o">.</span><span class="n">uid</span><span class="p">,</span>
<span class="n">deposited_by</span><span class="o">=</span><span class="n">deposited_by</span><span class="p">,</span>
<span class="n">amount</span><span class="o">=</span><span class="n">amount</span><span class="p">,</span>
<span class="n">asof</span><span class="o">=</span><span class="n">asof</span><span class="p">,</span>
<span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">action</span><span class="o">.</span><span class="n">type</span><span class="p">,</span> <span class="n">Action</span><span class="o">.</span><span class="n">ACTION_TYPE_DEPOSITED</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertIsNotNone</span><span class="p">(</span><span class="n">action</span><span class="o">.</span><span class="n">user_friendly_id</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">action</span><span class="o">.</span><span class="n">created</span><span class="p">,</span> <span class="n">asof</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">action</span><span class="o">.</span><span class="n">delta</span><span class="p">,</span> <span class="n">amount</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">action</span><span class="o">.</span><span class="n">user</span><span class="p">,</span> <span class="n">deposited_by</span><span class="p">)</span>
<span class="k">return</span> <span class="n">action</span>
<span class="k">def</span> <span class="nf">withdraw</span><span class="p">(</span>
<span class="bp">self</span><span class="p">,</span>
<span class="n">amount</span><span class="p">,</span>
<span class="n">account</span><span class="o">=</span><span class="n">DEFAULT</span><span class="p">,</span>
<span class="n">withdrawn_by</span><span class="o">=</span><span class="n">DEFAULT</span><span class="p">,</span>
<span class="n">asof</span><span class="o">=</span><span class="n">DEFAULT</span><span class="p">,</span>
<span class="n">comment</span><span class="o">=</span><span class="n">DEFAULT</span><span class="p">,</span>
<span class="p">):</span>
<span class="n">account</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">default</span><span class="p">(</span><span class="n">account</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">account</span><span class="p">)</span>
<span class="n">withdrawn_by</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">default</span><span class="p">(</span><span class="n">withdrawn_by</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">admin</span><span class="p">)</span>
<span class="n">asof</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">default</span><span class="p">(</span><span class="n">asof</span><span class="p">,</span> <span class="n">timezone</span><span class="o">.</span><span class="n">now</span><span class="p">())</span>
<span class="n">comment</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">default</span><span class="p">(</span><span class="n">comment</span><span class="p">,</span> <span class="s1">'withdraw comment'</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">account</span><span class="p">,</span> <span class="n">action</span> <span class="o">=</span> <span class="n">Account</span><span class="o">.</span><span class="n">withdraw</span><span class="p">(</span>
<span class="n">uid</span><span class="o">=</span><span class="n">account</span><span class="o">.</span><span class="n">uid</span><span class="p">,</span>
<span class="n">withdrawn_by</span><span class="o">=</span><span class="n">withdrawn_by</span><span class="p">,</span>
<span class="n">amount</span><span class="o">=</span><span class="n">amount</span><span class="p">,</span>
<span class="n">asof</span><span class="o">=</span><span class="n">asof</span><span class="p">,</span>
<span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">action</span><span class="o">.</span><span class="n">type</span><span class="p">,</span> <span class="n">Action</span><span class="o">.</span><span class="n">ACTION_TYPE_WITHDRAWN</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertIsNotNone</span><span class="p">(</span><span class="n">action</span><span class="o">.</span><span class="n">user_friendly_id</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">action</span><span class="o">.</span><span class="n">created</span><span class="p">,</span> <span class="n">asof</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">action</span><span class="o">.</span><span class="n">delta</span><span class="p">,</span> <span class="o">-</span><span class="n">amount</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">action</span><span class="o">.</span><span class="n">user</span><span class="p">,</span> <span class="n">withdrawn_by</span><span class="p">)</span>
<span class="k">return</span> <span class="n">action</span>
</pre></div>
<p>To make tests easier to write, the utility functions cut the boilerplate of specifying the user, the account, etc. each time: they provide default values and operate on self.account.</p>
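<p>The DEFAULT sentinel is worth a closer look. A minimal sketch of the idea outside of a test class, where <code>make_comment</code> is a hypothetical function used only for this illustration:</p>

```python
DEFAULT = object()  # unique sentinel: no caller can pass it by accident


def default(value, default_value):
    # Fall back only when the argument was omitted entirely.
    return default_value if value is DEFAULT else value


def make_comment(comment=DEFAULT):
    return default(comment, "deposit comment")


assert make_comment() == "deposit comment"
assert make_comment("x") == "x"
# None is a legitimate value here, so it must not trigger the fallback.
assert make_comment(None) is None
```

<p>Using None as the "not provided" marker would make it impossible to test a call with an explicit None, which is a valid value for arguments such as comment.</p>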
<p>Let's use our base class to write some tests:</p>
<div class="highlight"><pre><span></span><span class="c1"># tests/test_account.py</span>
<span class="kn">from</span> <span class="nn">unittest</span> <span class="kn">import</span> <span class="n">mock</span>
<span class="kn">from</span> <span class="nn">django.test</span> <span class="kn">import</span> <span class="n">TestCase</span>
<span class="kn">from</span> <span class="nn">.common</span> <span class="kn">import</span> <span class="n">TestAccountBase</span>
<span class="kn">from</span> <span class="nn">..models</span> <span class="kn">import</span> <span class="n">Account</span><span class="p">,</span> <span class="n">Action</span>
<span class="kn">from</span> <span class="nn">..errors</span> <span class="kn">import</span> <span class="p">(</span>
<span class="n">InvalidAmount</span><span class="p">,</span>
<span class="n">ExceedsLimit</span><span class="p">,</span>
<span class="n">InsufficientFunds</span><span class="p">,</span>
<span class="p">)</span>
<span class="k">class</span> <span class="nc">TestAccount</span><span class="p">(</span><span class="n">TestAccountBase</span><span class="p">,</span> <span class="n">TestCase</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">setUp</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">account</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">create</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">test_should_start_with_zero_balance</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">account</span><span class="o">.</span><span class="n">balance</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_should_deposit</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">deposit</span><span class="p">(</span><span class="mi">100</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">account</span><span class="o">.</span><span class="n">balance</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">deposit</span><span class="p">(</span><span class="mi">150</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">account</span><span class="o">.</span><span class="n">balance</span><span class="p">,</span> <span class="mi">250</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_should_fail_to_deposit_less_than_minimum</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">assertRaises</span><span class="p">(</span><span class="n">InvalidAmount</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">deposit</span><span class="p">(</span><span class="n">Account</span><span class="o">.</span><span class="n">MIN_DEPOSIT</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">account</span><span class="o">.</span><span class="n">balance</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_should_fail_to_deposit_more_than_maximum</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">assertRaises</span><span class="p">(</span><span class="n">InvalidAmount</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">deposit</span><span class="p">(</span><span class="n">Account</span><span class="o">.</span><span class="n">MAX_DEPOSIT</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">account</span><span class="o">.</span><span class="n">balance</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="nd">@mock</span><span class="o">.</span><span class="n">patch</span><span class="p">(</span><span class="s1">'account.models.Account.MAX_BALANCE'</span><span class="p">,</span> <span class="mi">500</span><span class="p">)</span>
<span class="nd">@mock</span><span class="o">.</span><span class="n">patch</span><span class="p">(</span><span class="s1">'account.models.Account.MAX_DEPOSIT'</span><span class="p">,</span> <span class="mi">502</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_should_fail_to_deposit_more_than_max_balance</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">assertRaises</span><span class="p">(</span><span class="n">ExceedsLimit</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">deposit</span><span class="p">(</span><span class="mi">501</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">account</span><span class="o">.</span><span class="n">balance</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="nd">@mock</span><span class="o">.</span><span class="n">patch</span><span class="p">(</span><span class="s1">'account.models.Account.MAX_BALANCE'</span><span class="p">,</span> <span class="mi">500</span><span class="p">)</span>
<span class="nd">@mock</span><span class="o">.</span><span class="n">patch</span><span class="p">(</span><span class="s1">'account.models.Account.MAX_DEPOSIT'</span><span class="p">,</span> <span class="mi">500</span><span class="p">)</span>
<span class="nd">@mock</span><span class="o">.</span><span class="n">patch</span><span class="p">(</span><span class="s1">'account.models.Account.MAX_TOTAL_BALANCES'</span><span class="p">,</span> <span class="mi">600</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_should_fail_when_exceed_max_total_balances</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="c1"># Exceed max total balances for the same account</span>
<span class="bp">self</span><span class="o">.</span><span class="n">deposit</span><span class="p">(</span><span class="mi">500</span><span class="p">)</span>
<span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">assertRaises</span><span class="p">(</span><span class="n">ExceedsLimit</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">deposit</span><span class="p">(</span><span class="mi">500</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">account</span><span class="o">.</span><span class="n">balance</span><span class="p">,</span> <span class="mi">500</span><span class="p">)</span>
<span class="c1"># Exceed max total balances in other account</span>
<span class="n">other_user</span> <span class="o">=</span> <span class="n">User</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create_user</span><span class="p">(</span><span class="s1">'foo'</span><span class="p">,</span> <span class="s1">'bar'</span><span class="p">,</span> <span class="s1">'baz'</span><span class="p">)</span>
<span class="n">other_account</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">user</span><span class="o">=</span><span class="n">other_user</span><span class="p">)</span>
<span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">assertRaises</span><span class="p">(</span><span class="n">ExceedsLimit</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">deposit</span><span class="p">(</span><span class="mi">200</span><span class="p">,</span> <span class="n">account</span><span class="o">=</span><span class="n">other_account</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">other_account</span><span class="o">.</span><span class="n">balance</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_should_withdraw</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">deposit</span><span class="p">(</span><span class="mi">100</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">withdraw</span><span class="p">(</span><span class="mi">50</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">account</span><span class="o">.</span><span class="n">balance</span><span class="p">,</span> <span class="mi">50</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">withdraw</span><span class="p">(</span><span class="mi">30</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">account</span><span class="o">.</span><span class="n">balance</span><span class="p">,</span> <span class="mi">20</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_should_fail_when_insufficient_funds</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">deposit</span><span class="p">(</span><span class="mi">100</span><span class="p">)</span>
<span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">assertRaises</span><span class="p">(</span><span class="n">InsufficientFunds</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">withdraw</span><span class="p">(</span><span class="mi">101</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">account</span><span class="o">.</span><span class="n">balance</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
</pre></div>
<h3 id="final-words"><a class="toclink" href="#final-words">Final Words</a></h3>
<p>The classmethod approach has proved itself in our development for quite some time now. We found that it provides the necessary <strong>flexibility, readability and testability with very little overhead</strong>.</p>
<p>In this article we presented two common issues we encounter frequently - validation and concurrency. This method can be extended to handle <strong>access control</strong> (permissions) and <strong>caching</strong> (we have total control over the fetch, remember?), <strong>performance optimization</strong> (use select_related and update_fields...), <strong>audit and monitoring</strong> and additional business logic.</p>
<p>We usually support several interfaces for each model - admin interface for support, API for mobile and SPA clients, and a dashboard. Encapsulating the business logic inside the model reduced the amount of code duplication and required tests which leads to overall <strong>quality code that is easy to maintain</strong>.</p>
<p>In a follow up post I (might) present the admin interface for this model with some neat tricks (such as custom actions, intermediate pages etc) and possibly an RPC implementation using DRF to interact with the account as an API.</p>Keeping Tests DRY with Class Based Tests In Python2016-08-20T00:00:00+03:002016-08-20T00:00:00+03:00Haki Benitatag:hakibenita.com,2016-08-20:/keeping-tests-dry-with-class-based-tests-in-python<p>Tests can be a bummer to write but even a bigger nightmare to maintain. When we noticed we are putting off simple tasks just because we were afraid to update some monster test case, we started looking for more creative ways to simplify the process of writing and maintaining tests. In this article I will describe a class based approach to writing tests.</p><hr>
<p>Tests can be a bummer to write but even a bigger nightmare to maintain. When we noticed we are putting off simple tasks just because we were afraid to update some monster test case, we started looking for more creative ways to simplify the process of writing and maintaining tests.</p>
<p><strong>In this article I will describe a class based approach to writing tests.</strong></p>
<p>Before we start writing code let's set some goals:</p>
<ul>
<li><strong>Extensive</strong> - We want our tests to cover as many scenarios as possible. We hope a solid platform for writing tests will make it easier for us to adapt to changes and cover more grounds.</li>
<li><strong>Expressive</strong> - Good tests tell a story. Issues become irrelevant and documents get lost but tests must always pass - this is why <strong>we treat our tests as specs</strong>. Writing good tests can help newcomers (and future self) to understand all the edge cases and micro-decisions made during development.</li>
<li><strong>Maintainable</strong> - As requirements and implementations change we want to adapt quickly with as little effort as possible.</li>
</ul>
<h3 id="enter-class-based-tests"><a class="toclink" href="#enter-class-based-tests">Enter Class Based Tests</a></h3>
<p>Articles and tutorials about testing always give simple examples such as <code>add</code> and <code>sub</code>. I rarely have the pleasure of testing such simple functions. I'll take a more realistic example and test an API endpoint that does login:</p>
<div class="highlight"><pre><span></span><span class="go">POST /api/account/login</span>
<span class="go">{</span>
<span class="go"> username: <str>,</span>
<span class="go"> password: <str></span>
<span class="go">}</span>
</pre></div>
<p>The scenarios we want to test are:</p>
<ul>
<li>User logs in successfully.</li>
<li>User does not exist.</li>
<li>Incorrect password.</li>
<li>Missing or malformed data.</li>
<li>User already authenticated.</li>
</ul>
<p>The input to our test is:</p>
<ul>
<li>A payload, <code>username</code> and <code>password</code>.</li>
<li>The client performing the action, anonymous or authenticated.</li>
</ul>
<p>The output we want to test is:</p>
<ul>
<li>The return value, error or payload.</li>
<li>The response status code.</li>
<li>Side effects. For example, last login date after successful login.</li>
</ul>
<p>After properly defining the input and output, we can write a base test class:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">unittest</span> <span class="kn">import</span> <span class="n">TestCase</span>
<span class="kn">import</span> <span class="nn">requests</span>

<span class="k">class</span> <span class="nc">TestLogin</span><span class="p">:</span>
<span class="w">    </span><span class="sd">"""Base class for testing login endpoint."""</span>

    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">client</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">requests</span><span class="o">.</span><span class="n">Session</span><span class="p">()</span>

    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">username</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">()</span>

    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">password</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">()</span>

    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">payload</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">{</span>
            <span class="s1">'username'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">username</span><span class="p">,</span>
            <span class="s1">'password'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">password</span><span class="p">,</span>
        <span class="p">}</span>

    <span class="n">expected_status_code</span> <span class="o">=</span> <span class="mi">200</span>
    <span class="n">expected_return_payload</span> <span class="o">=</span> <span class="p">{}</span>

    <span class="k">def</span> <span class="nf">setUp</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">response</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">client</span><span class="o">.</span><span class="n">post</span><span class="p">(</span><span class="s1">'/api/account/login'</span><span class="p">,</span> <span class="n">json</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">payload</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">test_should_return_expected_status_code</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">response</span><span class="o">.</span><span class="n">status_code</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">expected_status_code</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">test_should_return_expected_payload</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">response</span><span class="o">.</span><span class="n">json</span><span class="p">(),</span> <span class="bp">self</span><span class="o">.</span><span class="n">expected_return_payload</span><span class="p">)</span>
</pre></div>
<ul>
<li>We defined the input, <code>client</code> and <code>payload</code>, and the expected output <code>expected_*</code>.</li>
<li>We performed the login action during test <code>setUp</code>. To let specific test cases access the result, we kept the response on the class instance.</li>
<li>We implemented two common test cases:<ul>
<li>Test the expected status code.</li>
<li>Test the expected return value.</li>
</ul>
</li>
</ul>
<p>The observant reader might notice we raise a <code>NotImplementedError</code> exception from the properties. This way, if the test author forgets to set one of the required values for the test, they get a useful exception.</p>
<p>Let's use our <code>TestLogin</code> class to write a test for a successful login:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="k">class</span> <span class="nc">TestSuccessfulLogin</span><span class="p">(</span><span class="n">TestLogin</span><span class="p">,</span> <span class="n">TestCase</span><span class="p">):</span>
</span>    <span class="n">username</span> <span class="o">=</span> <span class="s1">'Haki'</span>
    <span class="n">password</span> <span class="o">=</span> <span class="s1">'correct-password'</span>
    <span class="n">expected_status_code</span> <span class="o">=</span> <span class="mi">200</span>
    <span class="n">expected_return_payload</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s1">'id'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
        <span class="s1">'username'</span><span class="p">:</span> <span class="s1">'Haki'</span><span class="p">,</span>
        <span class="s1">'full_name'</span><span class="p">:</span> <span class="s1">'Haki Benita'</span><span class="p">,</span>
    <span class="p">}</span>

    <span class="k">def</span> <span class="nf">test_should_update_last_login_date_in_user_model</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">user</span> <span class="o">=</span> <span class="n">User</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">pk</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">response</span><span class="o">.</span><span class="n">json</span><span class="p">()[</span><span class="s1">'id'</span><span class="p">])</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">assertIsNotNone</span><span class="p">(</span><span class="n">user</span><span class="o">.</span><span class="n">last_login_date</span><span class="p">)</span>
</pre></div>
<p>By just reading the code we can tell that a <code>username</code> and <code>password</code> are sent. We expect a response with a 200 status code, and additional data about the user. We extended the test to also check the <code>last_login_date</code> in our user model. This specific test might not be relevant to all test cases, so we add it only to the successful test case.</p>
<p>Let's test a failed login scenario:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">TestInvalidPassword</span><span class="p">(</span><span class="n">TestLogin</span><span class="p">,</span> <span class="n">TestCase</span><span class="p">):</span>
    <span class="n">username</span> <span class="o">=</span> <span class="s1">'Haki'</span>
    <span class="n">password</span> <span class="o">=</span> <span class="s1">'wrong-password'</span>
    <span class="n">expected_status_code</span> <span class="o">=</span> <span class="mi">401</span>

<span class="k">class</span> <span class="nc">TestMissingPassword</span><span class="p">(</span><span class="n">TestLogin</span><span class="p">,</span> <span class="n">TestCase</span><span class="p">):</span>
    <span class="n">payload</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'username'</span><span class="p">:</span> <span class="s1">'Haki'</span><span class="p">}</span>
    <span class="n">expected_status_code</span> <span class="o">=</span> <span class="mi">400</span>

<span class="k">class</span> <span class="nc">TestMalformedData</span><span class="p">(</span><span class="n">TestLogin</span><span class="p">,</span> <span class="n">TestCase</span><span class="p">):</span>
    <span class="n">payload</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'username'</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]}</span>
    <span class="n">expected_status_code</span> <span class="o">=</span> <span class="mi">400</span>
</pre></div>
<p>A developer that stumbles upon this piece of code will be able to tell exactly what should happen for any type of input. The name of the class describes the scenario, and the names of the attributes describe the input. Together, <strong>the class tells a story which is easy to read and understand</strong>.</p>
<p>The last two tests set the <code>payload</code> directly, without setting <code>username</code> and <code>password</code>. This won't raise a <code>NotImplementedError</code> because the class attribute overrides the <code>payload</code> property, which is the only place that reads <code>username</code> and <code>password</code>.</p>
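<p>The override behaviour can be checked without making any HTTP calls. The following is a minimal sketch rather than part of the test suite above: <code>LoginInputs</code> and its subclasses are hypothetical names that keep only the input part of <code>TestLogin</code>.</p>
<div class="highlight"><pre><span></span>class LoginInputs:
    """Hypothetical stand-in for the input properties of TestLogin."""

    @property
    def username(self):
        raise NotImplementedError('set username on the test class')

    @property
    def password(self):
        raise NotImplementedError('set password on the test class')

    @property
    def payload(self):
        return {'username': self.username, 'password': self.password}


class WithCredentials(LoginInputs):
    # Plain class attributes shadow the properties of the base class.
    username = 'Haki'
    password = 'correct-password'


class WithRawPayload(LoginInputs):
    # payload is replaced, so username and password are never evaluated.
    payload = {'username': 'Haki'}


class ForgotPassword(LoginInputs):
    # password is missing: reading payload raises NotImplementedError.
    username = 'Haki'
</pre></div>
<p>Reading <code>WithRawPayload().payload</code> returns the dictionary as is, while <code>ForgotPassword().payload</code> raises the exception from the base class and points the test author at the missing value.</p>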
<p><strong>A good test should help you find where the problem is.</strong></p>
<p>Let's see the output of a failed test case:</p>
<div class="highlight"><pre><span></span><span class="hll"><span class="go">FAIL: test_should_return_expected_status_code (tests.test_login.TestInvalidPassword)</span>
</span><span class="go">------------------------------------------------------</span>
<span class="go">Traceback (most recent call last):</span>
<span class="go"> File "../tests/test_login.py", line 28, in test_should_return_expected_status_code</span>
<span class="go"> self.assertEqual(self.response.status_code, self.expected_status_code)</span>
<span class="hll"><span class="go">AssertionError: 400 != 401</span>
</span><span class="go">------------------------------------------------------</span>
</pre></div>
<p>Looking at the failed test report, it is clear what went wrong. When the password is invalid we expect status code 401, but we received 400.</p>
<p>Let's make things a bit harder, and test an authenticated user attempting to log in:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">TestAuthenticatedUserLogin</span><span class="p">(</span><span class="n">TestLogin</span><span class="p">,</span> <span class="n">TestCase</span><span class="p">):</span>
    <span class="n">username</span> <span class="o">=</span> <span class="s1">'Haki'</span>
    <span class="n">password</span> <span class="o">=</span> <span class="s1">'correct-password'</span>

    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">client</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">session</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">Session</span><span class="p">()</span>
        <span class="n">session</span><span class="o">.</span><span class="n">auth</span> <span class="o">=</span> <span class="p">(</span><span class="s1">'Haki'</span><span class="p">,</span> <span class="s1">'correct-password'</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">session</span>

    <span class="n">expected_status_code</span> <span class="o">=</span> <span class="mi">400</span>
</pre></div>
<p>This time we had to override the client property to authenticate the session.</p>
<h3 id="putting-our-test-to-the-test"><a class="toclink" href="#putting-our-test-to-the-test">Putting Our Test To The Test</a></h3>
<p>To illustrate how resilient our new test cases are, let's see how we can modify the base class as we introduce new requirements and changes:</p>
<ul>
<li>We did some refactoring and the <strong>endpoint changed</strong> to <code>/api/user/login</code>:</li>
</ul>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">TestLogin</span><span class="p">:</span>
    <span class="c1"># ...</span>
    <span class="k">def</span> <span class="nf">setUp</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">response</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">client</span><span class="o">.</span><span class="n">post</span><span class="p">(</span>
<span class="hll">            <span class="s1">'/api/user/login'</span><span class="p">,</span>
</span>            <span class="n">json</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">payload</span><span class="p">,</span>
        <span class="p">)</span>
</pre></div>
<ul>
<li>Someone decided it can speed things up if we <strong>use a different serialization format</strong> (msgpack, xml, yaml):</li>
</ul>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">TestLogin</span><span class="p">:</span>
    <span class="c1"># ...</span>
    <span class="k">def</span> <span class="nf">setUp</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">response</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">client</span><span class="o">.</span><span class="n">post</span><span class="p">(</span>
            <span class="s1">'/api/account/login'</span><span class="p">,</span>
<span class="hll">            <span class="n">data</span><span class="o">=</span><span class="n">encode</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">payload</span><span class="p">),</span>
</span>        <span class="p">)</span>
</pre></div>
<ul>
<li>The product guys want to go global, and now we need to test <strong>different languages</strong>:</li>
</ul>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">TestLogin</span><span class="p">:</span>
    <span class="n">language</span> <span class="o">=</span> <span class="s1">'en'</span>
    <span class="c1"># ...</span>
    <span class="k">def</span> <span class="nf">setUp</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">response</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">client</span><span class="o">.</span><span class="n">post</span><span class="p">(</span>
<span class="hll">            <span class="s1">'/</span><span class="si">{}</span><span class="s1">/api/account/login'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">language</span><span class="p">),</span>
</span>            <span class="n">json</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">payload</span><span class="p">,</span>
        <span class="p">)</span>
</pre></div>
<p>None of the changes above managed to break our existing tests.</p>
<h3 id="taking-it-a-step-further"><a class="toclink" href="#taking-it-a-step-further">Taking it a Step Further</a></h3>
<p>A few things to consider when employing this technique.</p>
<h4 id="speed-things-up"><a class="toclink" href="#speed-things-up">Speed Things Up</a></h4>
<p><code>setUp</code> is executed for each test case in the class (test cases are the functions beginning with <code>test_*</code>). To speed things up, it is <strong>better to perform the action in <code>setUpClass</code></strong>. This changes a few things. For example, the properties we used should be set as attributes on the class or as <code>@classmethod</code>s.</p>
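<p>As a rough illustration of the <code>setUpClass</code> variant, here is a minimal sketch that runs without a server. The <code>perform_action</code> classmethod and the <code>calls</code> counter are assumptions made for this example; they stand in for the HTTP request and are not part of the original test suite.</p>
<div class="highlight"><pre><span></span>import unittest


class ActionOnceBase:
    """Base class that performs the action once per class, not once per test."""

    calls = 0        # how many times the action ran (for illustration only)
    payload = None   # a plain class attribute instead of a property

    @classmethod
    def perform_action(cls):
        # Hypothetical stand-in for the request made in setUp.
        ActionOnceBase.calls += 1
        return {'echo': cls.payload}

    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        # Stored on the class, so every test method can read it via self.
        cls.response = cls.perform_action()

    def test_should_echo_payload(self):
        self.assertEqual(self.response['echo'], self.payload)

    def test_should_return_dict(self):
        self.assertIsInstance(self.response, dict)


class TestEcho(ActionOnceBase, unittest.TestCase):
    payload = {'username': 'Haki'}
</pre></div>
<p>Running <code>TestEcho</code> executes both test methods while <code>perform_action</code> is called only once. The trade-off is that the tests now share one response object, so they should only read from it.</p>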
<h4 id="using-fixtures"><a class="toclink" href="#using-fixtures">Using Fixtures</a></h4>
<p>When using <strong>Django with fixtures</strong>, the action should go in <strong>setUpTestData</strong>:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">TestLogin</span><span class="p">:</span>
    <span class="n">fixtures</span> <span class="o">=</span> <span class="p">(</span>
        <span class="s1">'test/users'</span><span class="p">,</span>
    <span class="p">)</span>

    <span class="nd">@classmethod</span>
<span class="hll">    <span class="k">def</span> <span class="nf">setUpTestData</span><span class="p">(</span><span class="bp">cls</span><span class="p">):</span>
</span>        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">setUpTestData</span><span class="p">()</span>
        <span class="bp">cls</span><span class="o">.</span><span class="n">response</span> <span class="o">=</span> <span class="bp">cls</span><span class="o">.</span><span class="n">get_client</span><span class="p">()</span><span class="o">.</span><span class="n">post</span><span class="p">(</span><span class="s1">'/api/account/login'</span><span class="p">,</span> <span class="n">json</span><span class="o">=</span><span class="bp">cls</span><span class="o">.</span><span class="n">payload</span><span class="p">)</span>
</pre></div>
<p>Django loads fixtures in <code>setUpClass</code>, right before it calls <code>setUpTestData</code>, so the action is executed after the fixtures were loaded. Calling super first keeps any setup defined by parent classes and mixins.</p>
<p>Another quick note about Django and requests. I've used the <code>requests</code> package but Django, and the popular Django REST framework, provide their own clients. <a href="https://docs.djangoproject.com/en/2.1/topics/testing/tools/#default-test-client" rel="noopener"><code>django.test.Client</code></a> is Django's client, and <a href="https://www.django-rest-framework.org/api-guide/testing/#apiclient" rel="noopener"><code>rest_framework.test.APIClient</code></a> is DRF's client.</p>
<h4 id="testing-exceptions"><a class="toclink" href="#testing-exceptions">Testing Exceptions</a></h4>
<p>When a function raises an exception, we can extend the base class and wrap the action with <code>try ... except</code>:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">TestLoginFailure</span><span class="p">(</span><span class="n">TestLogin</span><span class="p">):</span>
    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">expected_exception</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">setUp</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="c1"># Stays None when the action does not raise</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">exception</span> <span class="o">=</span> <span class="kc">None</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">setUp</span><span class="p">()</span>
        <span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="hll">            <span class="bp">self</span><span class="o">.</span><span class="n">exception</span> <span class="o">=</span> <span class="n">e</span>
</span>
    <span class="k">def</span> <span class="nf">test_should_raise_expected_exception</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">assertIsInstance</span><span class="p">(</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">exception</span><span class="p">,</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">expected_exception</span>
        <span class="p">)</span>
</pre></div>
<p>If you are familiar with the <a href="https://docs.python.org/3/library/unittest.html#unittest.TestCase.assertRaises" rel="noopener"><code>assertRaises</code></a> context manager, I haven't used it in this case because the test should not fail during <code>setUp</code>.</p>
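<p>To see the pattern end to end, here is a self-contained sketch under a few assumptions: the <code>login</code> function, <code>LoginCallBase</code> and <code>LoginCallFailureBase</code> are hypothetical and replace the HTTP based classes above, so the example can run without a server.</p>
<div class="highlight"><pre><span></span>import unittest


def login(username, password):
    """Hypothetical function under test; raises when the password is missing."""
    if not password:
        raise ValueError('password is required')
    return {'username': username}


class LoginCallBase:
    username = None
    password = None

    def setUp(self):
        self.result = login(self.username, self.password)


class LoginCallFailureBase(LoginCallBase):
    expected_exception = None
    exception = None  # stays None when the action does not raise

    def setUp(self):
        try:
            super().setUp()
        except Exception as e:
            # Keep the exception so a test method can inspect it.
            self.exception = e

    def test_should_raise_expected_exception(self):
        self.assertIsInstance(self.exception, self.expected_exception)


class TestMissingPasswordRaises(LoginCallFailureBase, unittest.TestCase):
    username = 'Haki'
    password = ''
    expected_exception = ValueError
</pre></div>
<p>The concrete test case only declares the input and the exception it expects; capturing the exception stays in the base class.</p>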
<h4 id="create-mixins"><a class="toclink" href="#create-mixins">Create Mixins</a></h4>
<p>Test cases are repetitive by nature. With mixins, we can abstract common parts of test cases and compose new ones. For example:</p>
<ul>
<li><code>TestAnonymousUserMixin</code> - populates the test with anonymous API client.</li>
<li><code>TestRemoteResponseMixin</code> - mock response from remote service.</li>
</ul>
<p>The latter might look something like this:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">unittest</span> <span class="kn">import</span> <span class="n">mock</span>

<span class="k">class</span> <span class="nc">TestRemoteServiceXResponseMixin</span><span class="p">:</span>
    <span class="n">mock_response_data</span> <span class="o">=</span> <span class="kc">None</span>

    <span class="nd">@classmethod</span>
    <span class="nd">@mock</span><span class="o">.</span><span class="n">patch</span><span class="p">(</span><span class="s1">'path.to.function.making.remote.request'</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">setUpTestData</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">mock_remote</span><span class="p">):</span>
        <span class="n">mock_remote</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="bp">cls</span><span class="o">.</span><span class="n">mock_response_data</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">setUpTestData</span><span class="p">()</span>
</pre></div>
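<p>To show how such a mixin composes with a base class, here is a runnable sketch that does not need Django. It uses <code>setUpClass</code> from plain <code>unittest</code> instead of <code>setUpTestData</code>, and <code>RemoteService</code>, <code>get_profile</code>, <code>ProfileBase</code> and <code>MockRemoteMixin</code> are hypothetical names invented for the example.</p>
<div class="highlight"><pre><span></span>import unittest
from unittest import mock


class RemoteService:
    """Hypothetical client for a remote service."""

    @staticmethod
    def fetch():
        raise RuntimeError('network access is not available in tests')


def get_profile():
    """Code under test: depends on the remote call."""
    return {'profile': RemoteService.fetch()}


class ProfileBase:
    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        cls.result = get_profile()


class MockRemoteMixin:
    mock_response_data = None

    @classmethod
    def setUpClass(cls):
        # The patch is active only while the action in the base class runs.
        with mock.patch.object(RemoteService, 'fetch') as mock_remote:
            mock_remote.return_value = cls.mock_response_data
            super().setUpClass()


class TestProfileWithRemoteData(MockRemoteMixin, ProfileBase, unittest.TestCase):
    mock_response_data = {'name': 'Haki'}

    def test_should_include_remote_data(self):
        self.assertEqual(self.result, {'profile': {'name': 'Haki'}})
</pre></div>
<p>The order of the base classes matters: the mixin is listed first so its <code>setUpClass</code> runs before the base class performs the action, and each class calls super so the chain is not cut short.</p>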
<h3 id="conclusion"><a class="toclink" href="#conclusion">Conclusion</a></h3>
<p>Someone once said that <em>duplication is cheaper than the wrong abstraction</em>. I couldn't agree more. <strong>If your tests do not fit easily into a pattern then this solution is probably not the right one</strong>. It's important to carefully decide what to abstract. The more you abstract, the more flexible your tests are. But as parameters pile up in base classes, tests become harder to write and maintain, and we go back to square one.</p>
<p>Having said that, we found this technique to be useful in various situations and with different frameworks (such as Tornado and Django). Over time it has proven itself as being resilient to changes and easy to maintain. This is what we set out to achieve and we consider it a success!</p>Things You Must Know About Django Admin As Your App Gets Bigger2016-08-05T00:00:00+03:002016-08-05T00:00:00+03:00Haki Benitatag:hakibenita.com,2016-08-05:/things-you-must-know-about-django-admin-as-your-app-gets-bigger<p>The Django admin is a very powerful tool. We use it for day to day operations, browsing data and support. As we grew some of our projects from zero to 100K+ users we started experiencing some of Django's admin pain points - long response times and heavy load on the database.</p><hr>
<p>The Django admin is a very powerful tool. We use it for day to day operations, browsing data and support. As we grew some of our projects from zero to 100K+ users we started experiencing some of Django's admin pain points - long response times and heavy load on the database.</p>
<p>In this short article I am going to share some simple techniques we use in our projects to make the Django admin behave as apps grow in size and complexity.</p>
<p><em>We use Django 1.8, Python 3.4 and PostgreSQL 9.4. The code samples are for Python 3.4 but they can be easily modified to work on 2.7 and other Django versions.</em></p>
<hr>
<h3 id="before-we-start"><a class="toclink" href="#before-we-start">Before We Start</a></h3>
<p>These are the main components in a Django Admin list view:</p>
<figure><img alt="Example admin list view including some of the components discussed in the article" src="https://hakibenita.com/images/01-things-you-must-know-about-django-admin-as-your-app-gets-bigger.png"><figcaption>Example admin list view including some of the components discussed in the article</figcaption>
</figure>
<h4 id="logging"><a class="toclink" href="#logging">Logging</a></h4>
<p>Most of Django's work is performing SQL queries so our main focus will be on
<strong>minimizing the number of queries</strong>. To keep track of query execution you can
use one of the following:</p>
<ul>
<li><a href="https://django-debug-toolbar.readthedocs.io/en/stable/" rel="noopener">django-debug-toolbar</a> - Very nice utility that adds a little panel on the side of the screen with a list of SQL queries executed and other useful metrics.</li>
<li>If, like us, you prefer to avoid extra dependencies, you can log SQL queries to the console by adding the following logger in <code>settings.py</code>:</li>
</ul>
<div class="highlight"><pre><span></span><span class="n">LOGGING</span> <span class="o">=</span> <span class="p">{</span>
<span class="c1"># ...</span>
<span class="s1">'loggers'</span><span class="p">:</span> <span class="p">{</span>
<span class="hll"> <span class="s1">'django.db.backends'</span><span class="p">:</span> <span class="p">{</span>
</span><span class="hll"> <span class="s1">'level'</span><span class="p">:</span> <span class="s1">'DEBUG'</span><span class="p">,</span>
</span> <span class="p">},</span>
<span class="p">},</span>
<span class="c1"># ...</span>
<span class="p">}</span>
</pre></div>
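<p>On its own, a logger entry only sets the level; the messages also need a handler to reach the console. Below is a minimal sketch of a complete configuration. It assumes a handler named <code>console</code> (the name is our choice for this example, not something Django requires) and that <code>DEBUG = True</code>, because Django only records SQL queries in debug mode:</p>

```python
import logging
import logging.config

# Hypothetical complete LOGGING setting for settings.py.
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        # 'console' is an arbitrary handler name chosen for this sketch.
        'console': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
        },
    },
    'loggers': {
        'django.db.backends': {
            'handlers': ['console'],
            'level': 'DEBUG',
            # Do not pass the message on to parent loggers, so each
            # query is printed only once.
            'propagate': False,
        },
    },
}

if __name__ == '__main__':
    # Django loads the setting on its own. Outside Django, the standard
    # library can load the same dict to check that it is well formed.
    logging.config.dictConfig(LOGGING)
    logging.getLogger('django.db.backends').debug('(0.000) SELECT 1; args=()')
```

<p>If your project already defines handlers, reusing one of them instead of adding a new one should work just as well.</p>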
<hr>
<h3 id="the-n1-problem"><a class="toclink" href="#the-n1-problem">The N+1 Problem</a></h3>
<p>The N+1 problem is a well-known problem in ORMs. To illustrate the problem let's say we have this schema:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Category</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">name</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">50</span><span class="p">)</span>

    <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">name</span>


<span class="k">class</span> <span class="nc">Product</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">name</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">50</span><span class="p">)</span>
    <span class="n">category</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="n">Category</span><span class="p">)</span>
</pre></div>
<p>By implementing <code>__str__</code> we tell Django that we want the name of the category to be used as the default description of the object. Whenever we print a category object, Django will fetch the name of the category.</p>
<p>A simple admin page for our Product model might look like this:</p>
<div class="highlight"><pre><span></span><span class="nd">@admin</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Product</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">ProductAdmin</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
<span class="n">list_display</span> <span class="o">=</span> <span class="p">(</span>
<span class="s1">'id'</span><span class="p">,</span>
<span class="s1">'name'</span><span class="p">,</span>
<span class="s1">'category'</span><span class="p">,</span>
<span class="p">)</span>
</pre></div>
<p>This seems innocent enough but the SQL log reveals the horror:</p>
<div class="highlight"><pre><span></span><span class="o">(</span><span class="m">0</span>.000<span class="o">)</span><span class="w"> </span>SELECT<span class="w"> </span>COUNT<span class="o">(</span>*<span class="o">)</span><span class="w"> </span>AS<span class="w"> </span><span class="s2">"__count"</span><span class="w"> </span>FROM<span class="w"> </span><span class="s2">"app_product"</span><span class="p">;</span><span class="w"> </span><span class="nv">args</span><span class="o">=()</span>
<span class="o">(</span><span class="m">0</span>.002<span class="o">)</span><span class="w"> </span>SELECT<span class="w"> </span><span class="s2">"app_product"</span>.<span class="s2">"id"</span>,<span class="w"> </span><span class="s2">"app_product"</span>.<span class="s2">"name"</span>,<span class="w"> </span><span class="s2">"app_product"</span>.<span class="s2">"category_id"</span>
<span class="w"> </span>FROM<span class="w"> </span><span class="s2">"app_product"</span><span class="w"> </span>ORDER<span class="w"> </span>BY<span class="w"> </span><span class="s2">"app_product"</span>.<span class="s2">"id"</span><span class="w"> </span>DESC<span class="w"> </span>LIMIT<span class="w"> </span><span class="m">100</span><span class="p">;</span><span class="w"> </span><span class="nv">args</span><span class="o">=()</span>
<span class="o">(</span><span class="m">0</span>.000<span class="o">)</span><span class="w"> </span>SELECT<span class="w"> </span>...<span class="w"> </span>FROM<span class="w"> </span><span class="s2">"app_category"</span><span class="w"> </span>where<span class="w"> </span><span class="s2">"app_category"</span>.<span class="s2">"id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">;</span><span class="w"> </span><span class="nv">args</span><span class="o">=(</span><span class="m">1</span><span class="o">)</span>
<span class="o">(</span><span class="m">0</span>.000<span class="o">)</span><span class="w"> </span>SELECT<span class="w"> </span>...<span class="w"> </span>FROM<span class="w"> </span><span class="s2">"app_category"</span><span class="w"> </span>where<span class="w"> </span><span class="s2">"app_category"</span>.<span class="s2">"id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">;</span><span class="w"> </span><span class="nv">args</span><span class="o">=(</span><span class="m">2</span><span class="o">)</span>
<span class="o">(</span><span class="m">0</span>.000<span class="o">)</span><span class="w"> </span>SELECT<span class="w"> </span>...<span class="w"> </span>FROM<span class="w"> </span><span class="s2">"app_category"</span><span class="w"> </span>where<span class="w"> </span><span class="s2">"app_category"</span>.<span class="s2">"id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">;</span><span class="w"> </span><span class="nv">args</span><span class="o">=(</span><span class="m">1</span><span class="o">)</span>
<span class="o">(</span><span class="m">0</span>.000<span class="o">)</span><span class="w"> </span>SELECT<span class="w"> </span>...<span class="w"> </span>FROM<span class="w"> </span><span class="s2">"app_category"</span><span class="w"> </span>where<span class="w"> </span><span class="s2">"app_category"</span>.<span class="s2">"id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">4</span><span class="p">;</span><span class="w"> </span><span class="nv">args</span><span class="o">=(</span><span class="m">4</span><span class="o">)</span>
<span class="o">(</span><span class="m">0</span>.000<span class="o">)</span><span class="w"> </span>SELECT<span class="w"> </span>...<span class="w"> </span>FROM<span class="w"> </span><span class="s2">"app_category"</span><span class="w"> </span>where<span class="w"> </span><span class="s2">"app_category"</span>.<span class="s2">"id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">;</span><span class="w"> </span><span class="nv">args</span><span class="o">=(</span><span class="m">3</span><span class="o">)</span>
...
<span class="o">(</span><span class="m">0</span>.000<span class="o">)</span><span class="w"> </span>SELECT<span class="w"> </span>...<span class="w"> </span>FROM<span class="w"> </span><span class="s2">"app_category"</span><span class="w"> </span>where<span class="w"> </span><span class="s2">"app_category"</span>.<span class="s2">"id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">;</span><span class="w"> </span><span class="nv">args</span><span class="o">=(</span><span class="m">2</span><span class="o">)</span>
<span class="o">(</span><span class="m">0</span>.000<span class="o">)</span><span class="w"> </span>SELECT<span class="w"> </span>...<span class="w"> </span>FROM<span class="w"> </span><span class="s2">"app_category"</span><span class="w"> </span>where<span class="w"> </span><span class="s2">"app_category"</span>.<span class="s2">"id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">99</span><span class="p">;</span><span class="w"> </span><span class="nv">args</span><span class="o">=(</span><span class="m">99</span><span class="o">)</span>
<span class="o">(</span><span class="m">0</span>.000<span class="o">)</span><span class="w"> </span>SELECT<span class="w"> </span>...<span class="w"> </span>FROM<span class="w"> </span><span class="s2">"app_category"</span><span class="w"> </span>where<span class="w"> </span><span class="s2">"app_category"</span>.<span class="s2">"id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">104</span><span class="p">;</span><span class="w"> </span><span class="nv">args</span><span class="o">=(</span><span class="m">104</span><span class="o">)</span>
</pre></div>
<p>Django first counts the objects (more on that later), then fetches the actual objects (limiting to the default page size of 100) and then passes the data on to the template for rendering. We used the category name as the description of the <code>Category</code> object, so for each product Django has to fetch the category name. This results in <strong>100 additional queries</strong>.</p>
<p>To tell Django we want to perform a join instead of fetching the names of the categories one by one, we can use <a href="https://docs.djangoproject.com/en/2.1/ref/contrib/admin/#django.contrib.admin.ModelAdmin.list_select_related" rel="noopener"><code>list_select_related</code></a>:</p>
<div class="highlight"><pre><span></span><span class="nd">@admin</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Product</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">ProductAdmin</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
<span class="n">list_display</span> <span class="o">=</span> <span class="p">(</span>
<span class="s1">'id'</span><span class="p">,</span>
<span class="s1">'name'</span><span class="p">,</span>
<span class="s1">'category'</span><span class="p">,</span>
<span class="p">)</span>
<span class="hll"> <span class="n">list_select_related</span> <span class="o">=</span> <span class="p">(</span>
</span><span class="hll"> <span class="s1">'category'</span><span class="p">,</span>
</span><span class="hll"> <span class="p">)</span>
</span></pre></div>
<p>Now, the SQL log looks much nicer. Instead of 101 queries to fetch the page we have only one:</p>
<div class="highlight"><pre><span></span>(0.004) SELECT "app_product"."id", "app_product"."name",
"app_product"."category_id", "app_category"."id", "app_category"."name"
FROM "app_product"
INNER JOIN "app_category" on ("app_product"."category_id" = "app_category"."id")
ORDER BY "app_product"."id" DESC LIMIT 100; args=()
</pre></div>
<p>To understand the real impact of this setting consider the following. Django's default page size is 100 objects. If you have one related field displayed in the list view, you have ~101 queries. If you have two related fields, you have ~201 queries, and so on.</p>
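<p>The arithmetic can be sketched in a few lines. The function below is a hypothetical helper written for this illustration, not part of Django. It also counts the <code>COUNT(*)</code> query, so its results are one higher than the approximate figures above:</p>

```python
def estimated_list_view_queries(page_size=100, related_fields=0, joined=False):
    """Rough number of queries needed to render one admin list page.

    One query counts the rows and one fetches the page. Without a join,
    every row triggers one extra query per related field displayed.
    """
    base = 2  # COUNT(*) + the page of objects
    if joined:
        # Related objects arrive with the page, so no per-row queries.
        return base
    return base + page_size * related_fields


for fields in (1, 2):
    print(fields, 'related field(s):', estimated_list_view_queries(related_fields=fields))
print('with a join:', estimated_list_view_queries(related_fields=2, joined=True))
```

<p>The estimate assumes every row issues its own query, which matches the log above: the same category id is fetched more than once.</p>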
<p>Fetching related fields in a join can only work for <code>ForeignKey</code> relations. If you wish to display <code>ManyToMany</code> relations it's a bit more complicated (and most of the time wrong, but keep reading).</p>
<hr>
<h3 id="related-fields"><a class="toclink" href="#related-fields">Related Fields</a></h3>
<p>Sometimes it can be useful to quickly navigate between objects. After trying for a while to teach support personnel to filter using URL parameters, we finally gave up and created two simple decorators.</p>
<h4 id="admin_link"><a class="toclink" href="#admin_link"><code>admin_link</code></a></h4>
<p>Create a link to a detail page of a related model:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">admin_change_url</span><span class="p">(</span><span class="n">obj</span><span class="p">):</span>
<span class="n">app_label</span> <span class="o">=</span> <span class="n">obj</span><span class="o">.</span><span class="n">_meta</span><span class="o">.</span><span class="n">app_label</span>
<span class="n">model_name</span> <span class="o">=</span> <span class="n">obj</span><span class="o">.</span><span class="n">_meta</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="vm">__name__</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span>
<span class="k">return</span> <span class="n">reverse</span><span class="p">(</span><span class="s1">'admin:</span><span class="si">{}</span><span class="s1">_</span><span class="si">{}</span><span class="s1">_change'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
<span class="n">app_label</span><span class="p">,</span> <span class="n">model_name</span>
<span class="p">),</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="n">obj</span><span class="o">.</span><span class="n">pk</span><span class="p">,))</span>
<span class="k">def</span> <span class="nf">admin_link</span><span class="p">(</span><span class="n">attr</span><span class="p">,</span> <span class="n">short_description</span><span class="p">,</span> <span class="n">empty_description</span><span class="o">=</span><span class="s2">"-"</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Decorator used for rendering a link to a related model in</span>
<span class="sd"> the admin detail page.</span>
<span class="sd"> attr (str):</span>
<span class="sd"> Name of the related field.</span>
<span class="sd"> short_description (str):</span>
<span class="sd"> Name of the field.</span>
<span class="sd"> empty_description (str):</span>
<span class="sd"> Value to display if the related field is None.</span>
<span class="sd"> The wrapped method receives the related object and should</span>
<span class="sd"> return the link text.</span>
<span class="sd"> Usage:</span>
<span class="sd"> @admin_link('credit_card', _('Credit Card'))</span>
<span class="sd"> def credit_card_link(self, credit_card):</span>
<span class="sd"> return credit_card.name</span>
<span class="sd"> """</span>
<span class="k">def</span> <span class="nf">wrap</span><span class="p">(</span><span class="n">func</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">field_func</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">obj</span><span class="p">):</span>
<span class="n">related_obj</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">attr</span><span class="p">)</span>
<span class="k">if</span> <span class="n">related_obj</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
<span class="k">return</span> <span class="n">empty_description</span>
<span class="n">url</span> <span class="o">=</span> <span class="n">admin_change_url</span><span class="p">(</span><span class="n">related_obj</span><span class="p">)</span>
<span class="k">return</span> <span class="n">format_html</span><span class="p">(</span><span class="s1">'<a href="</span><span class="si">{}</span><span class="s1">"></span><span class="si">{}</span><span class="s1"></a>'</span><span class="p">,</span> <span class="n">url</span><span class="p">,</span> <span class="n">func</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">related_obj</span><span class="p">))</span>
<span class="n">field_func</span><span class="o">.</span><span class="n">short_description</span> <span class="o">=</span> <span class="n">short_description</span>
<span class="n">field_func</span><span class="o">.</span><span class="n">allow_tags</span> <span class="o">=</span> <span class="kc">True</span>
<span class="k">return</span> <span class="n">field_func</span>
<span class="k">return</span> <span class="n">wrap</span>
</pre></div>
<p>The decorator will render a link (<code>&lt;a href="..."&gt;...&lt;/a&gt;</code>) to the related model in both the list view and the detail view. If, for example, we want to add a link from each product to its category detail page, we use the decorator like this:</p>
<div class="highlight"><pre><span></span><span class="nd">@admin</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Product</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">ProductAdmin</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
<span class="n">list_display</span> <span class="o">=</span> <span class="p">(</span>
<span class="s1">'id'</span><span class="p">,</span>
<span class="s1">'name'</span><span class="p">,</span>
<span class="s1">'category_link'</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">list_select_related</span> <span class="o">=</span> <span class="p">(</span>
<span class="s1">'category'</span><span class="p">,</span>
<span class="p">)</span>
<span class="hll"> <span class="nd">@admin_link</span><span class="p">(</span><span class="s1">'category'</span><span class="p">,</span> <span class="n">_</span><span class="p">(</span><span class="s1">'Category'</span><span class="p">))</span>
</span><span class="hll"> <span class="k">def</span> <span class="nf">category_link</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">category</span><span class="p">):</span>
</span><span class="hll"> <span class="k">return</span> <span class="n">category</span>
</span></pre></div>
<h4 id="admin_changelist_link"><a class="toclink" href="#admin_changelist_link"><code>admin_changelist_link</code></a></h4>
<p>More complicated links such as "all the products of a category" require a different implementation. We created a decorator that accepts a query string and links to the list view of a related model:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">admin_changelist_url</span><span class="p">(</span><span class="n">model</span><span class="p">):</span>
<span class="n">app_label</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">_meta</span><span class="o">.</span><span class="n">app_label</span>
<span class="n">model_name</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="vm">__name__</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span>
<span class="k">return</span> <span class="n">reverse</span><span class="p">(</span><span class="s1">'admin:</span><span class="si">{}</span><span class="s1">_</span><span class="si">{}</span><span class="s1">_changelist'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">app_label</span><span class="p">,</span> <span class="n">model_name</span><span class="p">))</span>
<span class="k">def</span> <span class="nf">admin_changelist_link</span><span class="p">(</span>
<span class="n">attr</span><span class="p">,</span>
<span class="n">short_description</span><span class="p">,</span>
<span class="n">empty_description</span><span class="o">=</span><span class="s1">'-'</span><span class="p">,</span>
<span class="n">query_string</span><span class="o">=</span><span class="kc">None</span>
<span class="p">):</span>
<span class="w"> </span><span class="sd">"""Decorator used for rendering a link to the list display of</span>
<span class="sd"> a related model in the admin detail page.</span>
<span class="sd"> attr (str):</span>
<span class="sd"> Name of the related field.</span>
<span class="sd"> short_description (str):</span>
<span class="sd"> Field display name.</span>
<span class="sd"> empty_description (str):</span>
<span class="sd"> Value to display if the related field is None.</span>
<span class="sd"> query_string (function):</span>
<span class="sd"> Optional callback for adding a query string to the link.</span>
<span class="sd"> Receives the object and should return a query string.</span>
<span class="sd"> The wrapped method receives the related object and</span>
<span class="sd"> should return the link text.</span>
<span class="sd"> Usage:</span>
<span class="sd"> @admin_changelist_link('credit_card', _('Credit Card'))</span>
<span class="sd"> def credit_card_link(self, credit_card):</span>
<span class="sd"> return credit_card.name</span>
<span class="sd"> """</span>
<span class="k">def</span> <span class="nf">wrap</span><span class="p">(</span><span class="n">func</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">field_func</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">obj</span><span class="p">):</span>
<span class="n">related_obj</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">attr</span><span class="p">)</span>
<span class="k">if</span> <span class="n">related_obj</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
<span class="k">return</span> <span class="n">empty_description</span>
<span class="n">url</span> <span class="o">=</span> <span class="n">admin_changelist_url</span><span class="p">(</span><span class="n">related_obj</span><span class="o">.</span><span class="n">model</span><span class="p">)</span>
<span class="k">if</span> <span class="n">query_string</span><span class="p">:</span>
<span class="n">url</span> <span class="o">+=</span> <span class="s1">'?'</span> <span class="o">+</span> <span class="n">query_string</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span>
<span class="k">return</span> <span class="n">format_html</span><span class="p">(</span><span class="s1">'<a href="</span><span class="si">{}</span><span class="s1">"></span><span class="si">{}</span><span class="s1"></a>'</span><span class="p">,</span> <span class="n">url</span><span class="p">,</span> <span class="n">func</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">related_obj</span><span class="p">))</span>
<span class="n">field_func</span><span class="o">.</span><span class="n">short_description</span> <span class="o">=</span> <span class="n">short_description</span>
<span class="n">field_func</span><span class="o">.</span><span class="n">allow_tags</span> <span class="o">=</span> <span class="kc">True</span>
<span class="k">return</span> <span class="n">field_func</span>
<span class="k">return</span> <span class="n">wrap</span>
</pre></div>
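<p>To add a link from a category to a list of its products, we do the following in <code>CategoryAdmin</code>. This assumes the <code>category</code> foreign key on <code>Product</code> is declared with <code>related_name='products'</code>; with the schema shown earlier, the default reverse accessor would be <code>product_set</code>:</p>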
<div class="highlight"><pre><span></span>@admin.register(models.Category)
class CategoryAdmin(admin.ModelAdmin):
list_display = (
'id',
'name',
'products_link',
)
@admin_changelist_link('products', _('Products'),
query_string=lambda c: 'category_id={}'.format(c.pk))
def products_link(self, products):
return _('Products')
</pre></div>
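<p>The <code>query_string</code> callback returns a plain string, so any escaping is up to us. One possible way to build it, sketched here with the standard library's <code>urlencode</code>, is a small named function instead of a lambda. The helper name <code>products_query_string</code> and the stand-in category object are made up for this example:</p>

```python
from types import SimpleNamespace
from urllib.parse import urlencode


def products_query_string(category):
    """Build the changelist filter for the products of one category.

    Hypothetical helper: `category` only needs a `pk` attribute.
    """
    return urlencode({'category_id': category.pk})


# Stand-in for a Category instance, so the sketch runs without Django.
category = SimpleNamespace(pk=7)
print(products_query_string(category))
```

<p>The function can then be passed to the decorator as <code>query_string=products_query_string</code>. Adding more filters becomes a matter of adding keys to the dict.</p>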
<p>Be careful with the products argument. It is very tempting to do something like this:</p>
<div class="highlight"><pre><span></span><span class="c1"># Bad example</span>
<span class="nd">@admin_changelist_link</span><span class="p">(</span><span class="s1">'products'</span><span class="p">,</span> <span class="n">_</span><span class="p">(</span><span class="s1">'Products'</span><span class="p">),</span>
<span class="n">query_string</span><span class="o">=</span><span class="k">lambda</span> <span class="n">c</span><span class="p">:</span> <span class="s1">'category_id=</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">c</span><span class="o">.</span><span class="n">pk</span><span class="p">))</span>
<span class="k">def</span> <span class="nf">products_link</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">products</span><span class="p">):</span>
<span class="c1"># Don't do that!</span>
<span class="k">return</span> <span class="s1">'see </span><span class="si">{}</span><span class="s1"> products'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">products</span><span class="o">.</span><span class="n">count</span><span class="p">())</span>
</pre></div>
<p>The example above calls <code>count()</code> for every row, which results in one additional query per category in the list.</p>
<h4 id="readonly_fields"><a class="toclink" href="#readonly_fields"><code>readonly_fields</code></a></h4>
<p>In the detail page, Django creates an editable element for each field. Text and numeric fields will be rendered as regular input fields. Choice fields and foreign key fields will be rendered as a <code>&lt;select&gt;</code> element. To render a select box Django has to do the following:</p>
<ol>
<li>Fetch the options - the entire related model and their descriptions (<em>remember the N+1 problem?</em>).</li>
<li>Render the option list - one option for each related model instance.</li>
</ol>
<p>A common scenario that is often overlooked is a foreign key to the <code>User</code> model. When you have 100 users you might not notice the load, but what happens when you suddenly have 100K users? <strong>The detail page will fetch the entire users table, and the option list will make the resulting HTML huge</strong>. We pay twice: first for the full table scan, and then for downloading the HTML file. Not to mention the memory required to generate the HTML file in the first place.</p>
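<p>To get a feel for the size of such a page, here is a rough sketch that builds a select box by hand. It does not use Django's form rendering, and the usernames are made up (real descriptions are usually longer), so treat the numbers it prints as a lower bound:</p>

```python
def render_user_select(user_count):
    """Render a <select> with one <option> per user, roughly as a form would.

    Hypothetical stand-in for the widget Django renders for a foreign key.
    """
    options = ''.join(
        '<option value="{0}">user{0}</option>'.format(pk)
        for pk in range(1, user_count + 1)
    )
    return '<select name="user">{}</select>'.format(options)


for count in (100, 100000):
    size_kb = len(render_user_select(count).encode('utf-8')) / 1024
    print('{:>7} users: {:,.0f} KB'.format(count, size_kb))
```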
<p>Having a select element with 100K options is not really usable. The easiest way to prevent Django from rendering a field as a <code>&lt;select&gt;</code> element is to add it to <a href="https://docs.djangoproject.com/en/2.1/ref/contrib/admin/#django.contrib.admin.ModelAdmin.readonly_fields" rel="noopener"><code>readonly_fields</code></a>:</p>
<div class="highlight"><pre><span></span><span class="nd">@admin</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">SomeModel</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">SomeModelAdmin</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
    <span class="n">readonly_fields</span> <span class="o">=</span> <span class="p">(</span>
        <span class="s1">'user'</span><span class="p">,</span>
    <span class="p">)</span>
</pre></div>
<p>This will render the description of the related model, without being able to change it in the admin.</p>
<p>Another option to prevent Django from rendering a select box is to add the field to <a href="https://docs.djangoproject.com/en/2.1/ref/contrib/admin/#django.contrib.admin.ModelAdmin.raw_id_fields" rel="noopener"><code>raw_id_fields</code></a>:</p>
<div class="highlight"><pre><span></span><span class="nd">@admin</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">SomeModel</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">SomeModelAdmin</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
    <span class="n">raw_id_fields</span> <span class="o">=</span> <span class="p">(</span>
        <span class="s1">'user'</span><span class="p">,</span>
    <span class="p">)</span>
</pre></div>
<p>Using <code>raw_id_fields</code>, Django will render a special widget that shows the id of the value, and an option to open a list of all values in a popup window. <strong>This option is very useful when you want to edit a foreign key value</strong>.</p>
<hr>
<h3 id="filters"><a class="toclink" href="#filters">Filters</a></h3>
<p>We often use the admin interface as a day to day tool for general support. We found that most of the time we use the same filters: only active users, users registered in the last month, successful transactions and so on. Once we realized that, we asked ourselves, why fetch the entire dataset if we are most likely to immediately apply a filter to it? We started to look for a way to <strong>apply a default filter when entering the model list view</strong>.</p>
<h4 id="defaultfiltermixin"><a class="toclink" href="#defaultfiltermixin"><code>DefaultFilterMixin</code></a></h4>
<p>There are many approaches to apply default filters. Some approaches involve custom filters or injecting special query parameters to the request. We wanted to avoid those.</p>
<p>We found the following approach to be simple and straightforward:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">urllib.parse</span> <span class="kn">import</span> <span class="n">urlencode</span>

<span class="kn">from</span> <span class="nn">django.shortcuts</span> <span class="kn">import</span> <span class="n">redirect</span>


<span class="k">class</span> <span class="nc">DefaultFilterMixin</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">get_default_filters</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">):</span>
<span class="w">        </span><span class="sd">"""Set default filters to the page.</span>

<span class="sd">        request (Request)</span>

<span class="sd">        Returns (dict):</span>
<span class="sd">            Default filter to encode.</span>
<span class="sd">        """</span>
        <span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">changelist_view</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="n">extra_context</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="n">ref</span> <span class="o">=</span> <span class="n">request</span><span class="o">.</span><span class="n">META</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'HTTP_REFERER'</span><span class="p">,</span> <span class="s1">''</span><span class="p">)</span>
        <span class="n">path</span> <span class="o">=</span> <span class="n">request</span><span class="o">.</span><span class="n">META</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'PATH_INFO'</span><span class="p">,</span> <span class="s1">''</span><span class="p">)</span>
        <span class="c1"># If we already have query parameters, or if the page</span>
        <span class="c1"># was referred from itself (by drilldown or redirect),</span>
        <span class="c1"># don't apply the default filter.</span>
        <span class="k">if</span> <span class="n">request</span><span class="o">.</span><span class="n">GET</span> <span class="ow">or</span> <span class="n">ref</span><span class="o">.</span><span class="n">endswith</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
            <span class="k">return</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">changelist_view</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">extra_context</span><span class="o">=</span><span class="n">extra_context</span><span class="p">)</span>
        <span class="n">query</span> <span class="o">=</span> <span class="n">urlencode</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">get_default_filters</span><span class="p">(</span><span class="n">request</span><span class="p">))</span>
        <span class="k">return</span> <span class="n">redirect</span><span class="p">(</span><span class="s1">'</span><span class="si">{}</span><span class="s1">?</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">query</span><span class="p">))</span>
</pre></div>
<p>If the list view was accessed from a different view, and no query params were specified, we generate a default query and redirect.</p>
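<p>To make the decision easier to follow, here is a minimal, framework-free sketch of the same check. The helper name <code>default_filter_redirect</code>, the example path and the filter values are made up for illustration; only <code>urlencode</code> is taken from the mixin above:</p>

```python
from urllib.parse import urlencode


def default_filter_redirect(path, query_params, referer, default_filters):
    """Return the URL to redirect to, or None to render the list as-is.

    Hypothetical helper that mirrors the mixin's check without Django.
    """
    # Explicit query parameters, or a referer that ends with this page's
    # own path, mean the default filter should not be applied.
    if query_params or referer.endswith(path):
        return None
    return '{}?{}'.format(path, urlencode(default_filters))


# Assumed values, for illustration only.
path = '/admin/shop/product/'
filters = {'created__year': 2024, 'created__month': 1}

# Arriving from another admin page with no query string: redirect.
print(default_filter_redirect(path, {}, 'https://example.com/admin/', filters))

# Arriving with explicit query parameters: no redirect.
print(default_filter_redirect(path, {'q': 'chair'}, '', filters))

# Referred from the list view itself: no redirect.
print(default_filter_redirect(path, {}, 'https://example.com' + path, filters))
```

<p>The first call returns the path with the default query string appended. The other two return <code>None</code>, meaning the list view renders as usual.</p>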
<p>Let's apply a default filter to our product page to show only products created in the current month:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.utils</span> <span class="kn">import</span> <span class="n">timezone</span>

<span class="nd">@admin</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Product</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">ProductAdmin</span><span class="p">(</span><span class="n">DefaultFilterMixin</span><span class="p">,</span> <span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
    <span class="n">date_hierarchy</span> <span class="o">=</span> <span class="s1">'created'</span>

<span class="hll">    <span class="k">def</span> <span class="nf">get_default_filters</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">):</span>
</span>        <span class="n">now</span> <span class="o">=</span> <span class="n">timezone</span><span class="o">.</span><span class="n">now</span><span class="p">()</span>
        <span class="k">return</span> <span class="p">{</span>
            <span class="s1">'created__year'</span><span class="p">:</span> <span class="n">now</span><span class="o">.</span><span class="n">year</span><span class="p">,</span>
            <span class="s1">'created__month'</span><span class="p">:</span> <span class="n">now</span><span class="o">.</span><span class="n">month</span><span class="p">,</span>
        <span class="p">}</span>
</pre></div>
<p>If we drill down from within the page, or if we get to the page with query parameters, the default filter will not be applied.</p>
<hr>
<h3 id="quick-bits"><a class="toclink" href="#quick-bits">Quick Bits</a></h3>
<p>Some neat tricks we gathered over time.</p>
<h4 id="show_full_result_count"><a class="toclink" href="#show_full_result_count"><code>show_full_result_count</code></a></h4>
<p>Prevent Django from showing the total number of rows in the list view. Setting <a href="https://docs.djangoproject.com/ja/1.9/ref/contrib/admin/#django.contrib.admin.ModelAdmin.show_full_result_count"><code>show_full_result_count=False</code></a> saves a <code>count(*)</code> query on the queryset on every page load.</p>
<h4 id="defer"><a class="toclink" href="#defer"><code>defer</code></a></h4>
<p>When performing a query, the entire result set is loaded into memory for processing. If your model has large columns such as JSON or text fields, it might be a good idea to <a href="https://docs.djangoproject.com/en/1.9/ref/models/querysets/#django.db.models.query.QuerySet.defer">defer</a> them until you really need them. To defer fields, override <a href="https://docs.djangoproject.com/ja/1.9/ref/contrib/admin/#django.contrib.admin.ModelAdmin.get_queryset" rel="noopener"><code>get_queryset</code></a>.</p>
<h4 id="change-the-admin-default-url-route"><a class="toclink" href="#change-the-admin-default-url-route">Change the Admin Default URL Route</a></h4>
<p>This is definitely not the only precaution you should take to protect your admin page, but it can make it harder for "curious" users to reach the login page.</p>
<p>In your main <code>urls.py</code> override the default admin route:</p>
<div class="highlight"><pre><span></span><span class="c1"># urls.py</span>
<span class="kn">from</span> <span class="nn">django.conf.urls</span> <span class="kn">import</span> <span class="n">include</span><span class="p">,</span> <span class="n">url</span>
<span class="kn">from</span> <span class="nn">django.contrib</span> <span class="kn">import</span> <span class="n">admin</span>
<span class="n">urlpatterns</span> <span class="o">=</span> <span class="p">[</span>
    <span class="n">url</span><span class="p">(</span><span class="sa">r</span><span class="s1">'^foo/'</span><span class="p">,</span> <span class="n">include</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">site</span><span class="o">.</span><span class="n">urls</span><span class="p">)),</span>
<span class="p">]</span>
</pre></div>
<div class="admonition tip">
<p class="admonition-title">see also</p>
<p>I wrote a bunch of tips on <a href="5-ways-to-make-django-admin-safer">how to make Django admin safer</a>.</p>
</div>
<h4 id="date_hierarchy"><a class="toclink" href="#date_hierarchy"><code>date_hierarchy</code></a></h4>
<p>We found that the following index can improve queries generated with the <a href="https://docs.djangoproject.com/ja/1.9/ref/contrib/admin/#django.contrib.admin.ModelAdmin.date_hierarchy" rel="noopener">date hierarchy</a> predicate in PostgreSQL 9.4:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">yourmodel_date_hierarchy_ix</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">yourmodel_table</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">extract</span><span class="p">(</span><span class="s1">'day'</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">created</span><span class="w"> </span><span class="k">at</span><span class="w"> </span><span class="k">time</span><span class="w"> </span><span class="k">zone</span><span class="w"> </span><span class="s1">'America/New_York'</span><span class="p">),</span>
<span class="w"> </span><span class="k">extract</span><span class="p">(</span><span class="s1">'month'</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">created</span><span class="w"> </span><span class="k">at</span><span class="w"> </span><span class="k">time</span><span class="w"> </span><span class="k">zone</span><span class="w"> </span><span class="s1">'America/New_York'</span><span class="p">),</span>
<span class="w"> </span><span class="k">extract</span><span class="p">(</span><span class="s1">'year'</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">created</span><span class="w"> </span><span class="k">at</span><span class="w"> </span><span class="k">time</span><span class="w"> </span><span class="k">zone</span><span class="w"> </span><span class="s1">'America/New_York'</span><span class="p">)</span>
<span class="p">);</span>
</pre></div>
<p>Make sure to change table name, index name, the date hierarchy column and the time zone.</p>
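<p>If you need this index for several models, the substitutions can be scripted. The following is only a sketch: <code>date_hierarchy_index_sql</code> is a hypothetical helper, the table and column names are placeholders, and the values are interpolated as-is, so it should only be fed trusted identifiers (for example, from a migration you write yourself):</p>

```python
def date_hierarchy_index_sql(table, column, time_zone, index_name=None):
    """Build the CREATE INDEX statement for a date_hierarchy column.

    Hypothetical helper: arguments are interpolated without escaping,
    so pass only trusted identifiers, never user input.
    """
    index_name = index_name or '{}_date_hierarchy_ix'.format(table)
    parts = ',\n'.join(
        "    extract('{}' from {} at time zone '{}')".format(part, column, time_zone)
        for part in ('day', 'month', 'year')
    )
    return 'CREATE INDEX {} ON {} (\n{}\n);'.format(index_name, table, parts)


# Placeholder table, column and time zone, for illustration only.
sql = date_hierarchy_index_sql('shop_product', 'created', 'UTC')
print(sql)
```

<p>The generated statement has the same shape as the one above, with the table name, index name, column and time zone filled in.</p>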
<div class="admonition tip">
<p class="admonition-title">see also</p>
<p>I wrote about <a href="scaling-django-admin-date-hierarchy">scaling Django admin <code>date_hierarchy</code></a>.</p>
</div>
<hr>
<h3 id="conclusion"><a class="toclink" href="#conclusion">Conclusion</a></h3>
<p>Even if you don't have 100K users and millions of records in the database, it is still important to keep the admin tidy. Bad code has this nasty tendency of biting you in the ass when you least expect it.</p>