One of my favorite job interview questions is this:
Write a function that returns tomorrow's date
This looks innocent enough for someone to suggest this as a solution:
import datetime

def tomorrow() -> datetime.date:
    return datetime.date.today() + datetime.timedelta(days=1)
This will work, but there is a followup question:
How would you test this function?
Before you move on.... take a second to think about your answer.
Naive Approach
The most naive approach to test a function that returns tomorrow's date is this:
# Bad
assert tomorrow() == datetime.date(2020, 4, 16)
This test will pass today, but it will fail on any other day.
Another way to test the function is this:
# Bad
assert tomorrow() == datetime.date.today() + datetime.timedelta(days=1)
This will also work, but there is an inherent problem with this approach. The same way you can't define a word in the dictionary using itself, you should not test a function by repeating its implementation.
Another problem with this approach is that it's only testing one scenario, for the day it is executed. What about getting the next day across a month or a year? What about the day after 2020-02-28?
The problem with both implementations is that today is set inside the function, and to simulate different test scenarios you need to control this value. One solution that comes to mind is to mock datetime.date, and try to set the value returned by today():
>>> from unittest import mock
>>> with mock.patch('datetime.date.today', return_value=datetime.date(2020, 1, 1)):
...     assert tomorrow() == datetime.date(2020, 1, 2)
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/unittest/mock.py", line 1410, in __enter__
    setattr(self.target, self.attribute, new_attr)
TypeError: can't set attributes of built-in/extension type 'datetime.date'
As the exception suggests, attributes of built-in and extension types written in C, such as datetime.date, cannot be replaced with setattr, so they cannot be patched this way. The unittest.mock documentation specifically addresses this attempt to mock the datetime module. Apparently, this is a very common issue and the writers of the official documentation felt it was worth mentioning. They even go the extra mile and link to a blog post on this exact problem. The article is worth a read, and we are going to address the solution it presents later on.
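One workaround that needs no third-party library is to patch the name date on the datetime module, rather than an attribute of the type itself. The following is only a minimal sketch of that idea: FixedDate is a hypothetical helper introduced here for illustration, and the frozen date is arbitrary.

```python
import datetime
from unittest import mock


def tomorrow() -> datetime.date:
    return datetime.date.today() + datetime.timedelta(days=1)


class FixedDate(datetime.date):
    """Hypothetical helper: a date subclass whose today() is frozen."""

    @classmethod
    def today(cls) -> "FixedDate":
        return cls(2020, 1, 1)


# The datetime *module* is an ordinary mutable object, so its `date`
# attribute can be swapped, even though the built-in type cannot be modified.
with mock.patch("datetime.date", FixedDate):
    result = tomorrow()

assert result == datetime.date(2020, 1, 2)
```

This only affects code that looks the class up through the module at call time (datetime.date.today()). Code that did `from datetime import date` holds its own reference and would need to be patched where it is used.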
Like every other problem in Python, there are libraries that provide a solution. Two libraries that stand out are freezegun and libfaketime. Both provide the ability to mock time at different levels. However, resorting to external libraries is a luxury only developers of legacy systems can afford. For new projects, or projects that are small enough to change, there are other alternatives that can keep the project free of these dependencies.
Dependency Injection
The problem we were trying to solve with mock can also be solved by changing the function's API:
import datetime

def tomorrow(asof: datetime.date) -> datetime.date:
    return asof + datetime.timedelta(days=1)
To control the reference time of the function, the time can be provided as an argument. This makes it easier to test the function in different scenarios:
import datetime
assert tomorrow(asof=datetime.date(2020, 5, 1)) == datetime.date(2020, 5, 2)
assert tomorrow(asof=datetime.date(2019, 12, 31)) == datetime.date(2020, 1, 1)
assert tomorrow(asof=datetime.date(2020, 2, 28)) == datetime.date(2020, 2, 29)
assert tomorrow(asof=datetime.date(2021, 2, 28)) == datetime.date(2021, 3, 1)
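Because the reference date is now an argument, the scenarios can also be kept in a small table, and edge cases that were previously unreachable become testable. The sketch below is one possible arrangement, not a prescribed layout; the CASES name is an assumption made for illustration.

```python
import datetime


def tomorrow(asof: datetime.date) -> datetime.date:
    return asof + datetime.timedelta(days=1)


# (asof, expected) pairs covering month, year and leap-day boundaries.
CASES = [
    (datetime.date(2020, 5, 1), datetime.date(2020, 5, 2)),
    (datetime.date(2019, 12, 31), datetime.date(2020, 1, 1)),
    (datetime.date(2020, 2, 28), datetime.date(2020, 2, 29)),  # leap year
    (datetime.date(2021, 2, 28), datetime.date(2021, 3, 1)),   # non-leap year
]

for asof, expected in CASES:
    assert tomorrow(asof=asof) == expected, f"tomorrow({asof}) != {expected}"

# The largest representable date has no "tomorrow": the addition overflows.
try:
    tomorrow(asof=datetime.date.max)
except OverflowError:
    overflowed = True
else:
    overflowed = False

assert overflowed
```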
To remove the function's dependency on datetime.date.today, we provide today's date as an argument. This pattern of providing, or "injecting" dependencies into functions and objects is often called "dependency injection", or in short "DI".
Dependency Injection in The Wild
Dependency injection is a way to decouple modules from each other. As our previous example shows, the function tomorrow no longer depends on today.
Using dependency injection is very common and often very intuitive. It's very likely that you already use it without even knowing. For example, this article suggests that providing an open file to json.load is a form of dependency injection:
import json

with open('path/to/file.json', 'r') as f:
    data = json.load(f)
The popular test framework pytest builds its entire fixture infrastructure around the concept of dependency injection:
import pytest

@pytest.fixture
def one() -> int:
    return 1

@pytest.fixture
def two() -> int:
    return 2

def test_one_is_less_than_two(one: int, two: int) -> None:
    assert one < two
The functions one and two are declared as fixtures. When pytest executes the test function test_one_is_less_than_two, it will provide it with the values returned by the fixture functions matching the argument names. In pytest, the injection is magically happening simply by using the name of a known fixture as an argument.
Dependency injection is not limited just to Python. The popular JavaScript framework Angular is also built around dependency injection:
@Component({
  selector: 'order-list',
  template: `...`
})
export class OrderListComponent {
  orders: Order[];

  constructor(orderService: OrderService) {
    this.orders = orderService.getOrders();
  }
}
Notice how the orderService is provided, or injected, to the constructor. The component is using the order service, but is not instantiating it.
Injecting Functions
Sometimes injecting a value is not enough. For example, what if we need to get the current date before and after some operation:
from typing import Tuple
import datetime

def go() -> Tuple[datetime.datetime, datetime.datetime]:
    started_at = datetime.datetime.now()
    # Do something ...
    ended_at = datetime.datetime.now()
    return started_at, ended_at
To test this function, we can provide the start time like we did before, but we can't provide the end time. One way to approach this is to make the calls to start and end outside the function. This is a valid solution, but for the sake of discussion we'll assume they need to be called inside.
Since we can't mock datetime.datetime itself, one way to make this function testable is to create a separate function that returns the current date:
from typing import Tuple
import datetime

def now() -> datetime.datetime:
    return datetime.datetime.now()

def go() -> Tuple[datetime.datetime, datetime.datetime]:
    started_at = now()
    # Do something ...
    ended_at = now()
    return started_at, ended_at
To control the values returned by the function now in tests, we can use a mock:
>>> from unittest import mock
>>> fake_start = datetime.datetime(2020, 1, 1, 15, 0, 0)
>>> fake_end = datetime.datetime(2020, 1, 1, 15, 1, 30)
>>> with mock.patch('__main__.now', side_effect=[fake_start, fake_end]):
...     go()
(datetime.datetime(2020, 1, 1, 15, 0),
 datetime.datetime(2020, 1, 1, 15, 1, 30))
Another way to approach this without mocking is to rewrite the function once again:
from typing import Callable, Tuple
import datetime

def go(
    now: Callable[[], datetime.datetime],
) -> Tuple[datetime.datetime, datetime.datetime]:
    started_at = now()
    # Do something ...
    ended_at = now()
    return started_at, ended_at
This time we provide the function with another function that returns a datetime. This is very similar to the first solution we suggested, when we injected the datetime itself to the function.
The function can now be used like this:
>>> go(datetime.datetime.now)
(datetime.datetime(2020, 4, 18, 14, 14, 5, 687471),
datetime.datetime(2020, 4, 18, 14, 14, 5, 687475))
To test it, we provide a different function that returns known datetimes:
>>> fake_start = datetime.datetime(2020, 1, 1, 15, 0, 0)
>>> fake_end = datetime.datetime(2020, 1, 1, 15, 1, 30)
>>> gen = iter([fake_start, fake_end])
>>> go(lambda: next(gen))
(datetime.datetime(2020, 1, 1, 15, 0),
datetime.datetime(2020, 1, 1, 15, 1, 30))
This pattern can be generalized even more using a utility object:
from typing import Iterator
import datetime

def ticker(
    start: datetime.datetime,
    interval: datetime.timedelta,
) -> Iterator[datetime.datetime]:
    """Generate an unending stream of datetimes in fixed intervals.

    Useful to test processes which require datetime for each step.
    """
    current = start
    while True:
        yield current
        current += interval
Using ticker, the test will now look like this:
>>> gen = ticker(datetime.datetime(2020, 1, 1, 15, 0, 0), datetime.timedelta(seconds=90))
>>> go(lambda: next(gen))
(datetime.datetime(2020, 1, 1, 15, 0),
 datetime.datetime(2020, 1, 1, 15, 1, 30))
Fun fact: the name "ticker" was stolen from Go.
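For reference, here is the ticker approach as one self-contained script. It is a sketch rather than a recommended layout: the only change from the snippets above is that functools.partial(next, gen) stands in for the lambda, which is a stylistic choice and not something the pattern requires.

```python
import datetime
import functools
from typing import Callable, Iterator, Tuple


def ticker(
    start: datetime.datetime,
    interval: datetime.timedelta,
) -> Iterator[datetime.datetime]:
    """Yield start, start + interval, start + 2 * interval, ... forever."""
    current = start
    while True:
        yield current
        current += interval


def go(
    now: Callable[[], datetime.datetime],
) -> Tuple[datetime.datetime, datetime.datetime]:
    started_at = now()
    # Do something ...
    ended_at = now()
    return started_at, ended_at


gen = ticker(datetime.datetime(2020, 1, 1, 15, 0, 0), datetime.timedelta(seconds=90))

# partial(next, gen) is a zero-argument callable, just like datetime.datetime.now.
started_at, ended_at = go(functools.partial(next, gen))

assert started_at == datetime.datetime(2020, 1, 1, 15, 0, 0)
assert ended_at - started_at == datetime.timedelta(seconds=90)
```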
Injecting Values
The previous sections demonstrate injection of both values and functions. It's clear from the examples that injecting values is much simpler. This is why it's usually favorable to inject values rather than functions.
Another reason is consistency. Take this common pattern that is often used in Django models:
from django.db import models

class Order(models.Model):
    created = models.DateTimeField(auto_now_add=True)
    modified = models.DateTimeField(auto_now=True)
The model Order includes two datetime fields, created and modified. It uses Django's auto_now_add attribute to automatically set created when the object is saved for the first time, and auto_now to set modified every time the object is saved.
Say we create a new order and save it to the database:
>>> o = Order.objects.create()
Would you expect this comparison to fail?
>>> o.created == o.modified
False
This is very unexpected. How can an object that was just created have two different values for created and modified? Can you imagine what would happen if you rely on modified and created to be equal when an object was never changed, and actually use it to identify unchanged objects:
from django.db.models import F

# Wrong!
def get_unchanged_objects():
    return Order.objects.filter(created=F('modified'))
For the Order model above, this function will almost always return an empty queryset.
The reason for this unexpected behavior is that each individual DateTimeField calls django.utils.timezone.now internally during save() to get the current time. The time that passes between populating the two fields causes the values to end up slightly different:
>>> o.created
datetime.datetime(2020, 4, 18, 11, 41, 35, 740909, tzinfo=<UTC>)
>>> o.modified
datetime.datetime(2020, 4, 18, 11, 41, 35, 741015, tzinfo=<UTC>)
If we treat timezone.now like an injected function, we understand the inconsistencies it may cause.
So, can this be avoided? Can created and modified be equal when the object is first created? I'm sure there are a lot of hacks, libraries and other such exotic solutions but the truth is much simpler. If you want to make sure these two fields are equal when the object is first created, you are better off avoiding auto_now and auto_now_add:
from django.db import models

class Order(models.Model):
    created = models.DateTimeField()
    modified = models.DateTimeField()
Then, when you create a new instance, explicitly provide the values for both fields:
>>> from django.utils import timezone
>>> asof = timezone.now()
>>> o = Order.objects.create(created=asof, modified=asof)
>>> assert o.created == o.modified
>>> Order.objects.filter(created=F('modified'))
<QuerySet [<Order: Order object (2)>]>
To quote the "Zen of Python", explicit is better than implicit. Explicitly providing the values for the fields requires a bit more work, but this is a small price to pay for reliable and predictable data.
using auto_now and auto_now_add
When is it OK to use auto_now and auto_now_add? Usually when a date is used for audit purposes and not for business logic, it's fine to make this shortcut and use auto_now or auto_now_add.
When to Instantiate Injected Values
Injecting values poses another interesting question: at what point should the value be set? The answer to this is "it depends", but there is a rule of thumb that is usually correct: values should be instantiated at the topmost level.
For example, if asof represents when an order is created, a website backend serving a store front may set this value when the request is received. In a normal Django setup, this means that the value should be set by the view. Another common example is a scheduled job. If you have jobs that use management commands, asof should be set by the management command.
Setting the values at the topmost level guarantees that the lower levels remain decoupled and easier to test. The level at which injected values are set is the level that you will usually need to use mock to test. In the example above, setting asof in the view will make the models easier to test.
Other than testing and correctness, another benefit of setting values explicitly rather than implicitly, is that it gives you more control over your data. For example, in the website scenario, an order's creation date is set by the view immediately when the request is received. However, if you process a batch file from a large customer, the time in which the order was created may well be in the past, when the customer first created the files. By avoiding "auto-magically" generated dates, we can implement this by passing the past date as an argument.
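The rule of thumb can be sketched without Django. In the example below, Order, create_order, handle_request and process_batch are all hypothetical names standing in for a model, a lower-level function, a view and a batch job. The point is only that the clock is read in exactly one place per entry point.

```python
import datetime
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    created: datetime.datetime
    modified: datetime.datetime


def create_order(asof: datetime.datetime) -> Order:
    """Lower level: never reads the clock, only uses the value it is given."""
    return Order(created=asof, modified=asof)


def handle_request() -> Order:
    """Top level (think: a view). The only place that reads the clock."""
    asof = datetime.datetime.now(tz=datetime.timezone.utc)
    return create_order(asof)


def process_batch(file_created_at: datetime.datetime) -> Order:
    """Top level for a batch job: the date comes from the customer's file."""
    return create_order(file_created_at)


past = datetime.datetime(2020, 1, 1, tzinfo=datetime.timezone.utc)
batch_order = process_batch(past)
live_order = handle_request()

assert batch_order.created == batch_order.modified == past
assert live_order.created == live_order.modified
```

Testing create_order needs no mock at all. Only handle_request, the level where the value is instantiated, would need one.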
Dependency Injection in Practice
The best way to understand the benefits of DI and the motivation for it is using a real life example.
IP Lookup
Say we want to try and guess where visitors to our Django site are coming from, and we decide to try to use the IP address from the request to do that. An initial implementation can look like this:
from typing import Optional
from django.http import HttpRequest
import requests

def get_country_from_request(request: HttpRequest) -> Optional[str]:
    ip = request.META.get('REMOTE_ADDR', request.META.get('HTTP_X_FORWARDED_FOR'))
    if ip is None or ip == '':
        return None

    response = requests.get(f'http://ip-api.com/json/{ip}')
    if not response.ok:
        return None

    data = response.json()
    if data['status'] != 'success':
        return None

    return data['countryCode']
This single function accepts an HttpRequest, tries to extract an IP address from the request headers, and then uses the requests library to call an external service to get the country code.
ip lookup
I'm using the free service https://ip-api.com to lookup a country from an IP. I'm using this service just for demonstration purposes. I'm not familiar with it, so don't see this as a recommendation to use it.
Let's try to use this function:
>>> from django.test import RequestFactory
>>> rf = RequestFactory()
>>> request = rf.get('/', REMOTE_ADDR='216.58.210.46')
>>> get_country_from_request(request)
'US'
OK, so it works. Notice that to use it we created an HttpRequest object using Django's RequestFactory.
Let's try to write a test for a scenario when a country code is found:
import re
import json
import responses
from django.test import RequestFactory

rf = RequestFactory()

with responses.RequestsMock() as rsps:
    url_pattern = re.compile(r'http://ip-api.com/json/[0-9\.]+')
    rsps.add(responses.GET, url_pattern, status=200, content_type='application/json', body=json.dumps({
        'status': 'success',
        'countryCode': 'US'
    }))
    request = rf.get('/', REMOTE_ADDR='216.58.210.46')
    country_code = get_country_from_request(request)
    assert country_code == 'US'
The function is using the requests library internally to make a request to the external API. To mock the response, we used the responses library.
If you look at this test and feel like it's very complicated, then you are right. To test the function we had to do the following:
- Generate a Django request using a RequestFactory.
- Mock a requests response using responses.
- Have knowledge of the inner workings of the function (what URL it uses).
That last point is where it gets hairy. To test the function we used our knowledge of how the function is implemented: what endpoint it uses, how the URL is structured, what method it uses and what the response looks like. This creates an implicit dependency between the test and the implementation. In other words, the implementation of the function cannot change without changing the test as well. This type of unhealthy dependency is both unexpected, and prevents us from treating the function as a "black box".
Also, notice that we only tested one scenario. If you look at the coverage of this test you'll find that it's very low. So next, we try and simplify this function.
Assigning Responsibility
One of the techniques to make functions easier to test is to remove dependencies. Our IP function currently depends on Django's HttpRequest, the requests library and implicitly on the external service. Let's start by moving the part of the function that handles the external service to a separate function:
def get_country_from_ip(ip: str) -> Optional[str]:
    response = requests.get(f'http://ip-api.com/json/{ip}')
    if not response.ok:
        return None

    data = response.json()
    if data['status'] != 'success':
        return None

    return data['countryCode']

def get_country_from_request(request: HttpRequest) -> Optional[str]:
    ip = request.META.get('REMOTE_ADDR', request.META.get('HTTP_X_FORWARDED_FOR'))
    if ip is None or ip == '':
        return None

    return get_country_from_ip(ip)
We now have two functions:
- get_country_from_ip: receives an IP address and returns the country code.
- get_country_from_request: accepts a Django HttpRequest, extracts the IP from the header, and then uses the first function to find the country code.
After splitting the function we can now search an IP directly, without creating a request:
>>> get_country_from_ip('216.58.210.46')
'US'
>>> from django.test import RequestFactory
>>> request = RequestFactory().get('/', REMOTE_ADDR='216.58.210.46')
>>> get_country_from_request(request)
'US'
Now, let's write a test for this function:
import re
import json
import responses

with responses.RequestsMock() as rsps:
    url_pattern = re.compile(r'http://ip-api.com/json/[0-9\.]+')
    rsps.add(responses.GET, url_pattern, status=200, content_type='application/json', body=json.dumps({
        'status': 'success',
        'countryCode': 'US'
    }))
    country_code = get_country_from_ip('216.58.210.46')
    assert country_code == 'US'
This test looks similar to the previous one, but we no longer need to use RequestFactory. Because we have a separate function that retrieves the country code for an IP directly, we don't need to "fake" a Django HttpRequest.
Having said that, we still want to make sure the top level function works, and that the IP is being extracted from the request correctly:
# BAD EXAMPLE!
import re
import json
import responses
from django.test import RequestFactory
rf = RequestFactory()
request_with_no_ip = rf.get('/')
country_code = get_country_from_request(request_with_no_ip)
assert country_code is None
We created a request with no IP and the function returned None. With this outcome, can we really say for sure that the function works as expected? Can we tell that the function returned None because it couldn't extract the IP from the request, or because the country lookup returned nothing?
Someone once told me that if to describe what a function does you need to use the words "and" or "or", you can probably benefit from splitting it. This is the layman's version of the Single-responsibility principle that dictates that every class or function should have just one reason to change.
The function get_country_from_request extracts the IP from a request and tries to find the country code for it. So, if the rule is correct, we need to split it up:
def get_ip_from_request(request: HttpRequest) -> Optional[str]:
    ip = request.META.get('REMOTE_ADDR', request.META.get('HTTP_X_FORWARDED_FOR'))
    if ip is None or ip == '':
        return None
    return ip

# Maintain backward compatibility
def get_country_from_request(request: HttpRequest) -> Optional[str]:
    ip = get_ip_from_request(request)
    if ip is None:
        return None
    return get_country_from_ip(ip)
To be able to test if we extract an IP from a request correctly, we yanked this part to a separate function. We can now test this function separately:
rf = RequestFactory()
assert get_ip_from_request(rf.get('/')) is None
assert get_ip_from_request(rf.get('/', REMOTE_ADDR='0.0.0.0')) == '0.0.0.0'
assert get_ip_from_request(rf.get('/', HTTP_X_FORWARDED_FOR='0.0.0.0')) == '0.0.0.0'
assert get_ip_from_request(rf.get('/', REMOTE_ADDR='0.0.0.0', HTTP_X_FORWARDED_FOR='1.1.1.1')) == '0.0.0.0'
With just these 5 lines of code we covered a lot more possible scenarios.
Using a Service
So far we've implemented unit tests for the function that extracts the IP from the request, and made it possible to do a country lookup using just an IP address. The tests for the top level function are still very messy. Because we use requests inside the function, we were forced to use responses as well to test it. There is nothing wrong with responses, but the fewer dependencies the better.
Invoking a request inside the function creates an implicit dependency between this function and the requests library. One way to eliminate this dependency is to extract the part making the request to a separate service:
from typing import Optional
import requests

class IpLookupService:

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        response = requests.get(f'{self.base_url}/json/{ip}')
        if not response.ok:
            return None

        data = response.json()
        if data['status'] != 'success':
            return None

        return data['countryCode']
The new IpLookupService is instantiated with the base url for the service, and provides a single function to get a country from an IP:
>>> ip_lookup_service = IpLookupService('http://ip-api.com')
>>> ip_lookup_service.get_country_from_ip('216.58.210.46')
'US'
Constructing services this way has many benefits:
- Encapsulates all the logic related to IP lookup
- Provides a single interface with type annotations
- Can be reused
- Can be tested separately
- Can be developed separately (as long as the API it provides remains unchanged)
- Can be adjusted for different environments (for example, use a different URL for test and production)
The top level function should also change. Instead of making requests on its own, it uses the service:
def get_country_from_request(
    request: HttpRequest,
    ip_lookup_service: IpLookupService,
) -> Optional[str]:
    ip = get_ip_from_request(request)
    if ip is None:
        return None
    return ip_lookup_service.get_country_from_ip(ip)
To use the function, we pass an instance of the service to it:
>>> ip_lookup_service = IpLookupService('http://ip-api.com')
>>> request = RequestFactory().get('/', REMOTE_ADDR='216.58.210.46')
>>> get_country_from_request(request, ip_lookup_service)
'US'
Now that we have full control of the service, we can test the top level function without using responses:
from unittest import mock
from django.test import RequestFactory
fake_ip_lookup_service = mock.create_autospec(IpLookupService)
fake_ip_lookup_service.get_country_from_ip.return_value = 'US'
request = RequestFactory().get('/', REMOTE_ADDR='216.58.210.46')
country_code = get_country_from_request(request, fake_ip_lookup_service)
assert country_code == 'US'
To test the function without actually making HTTP requests we created a mock of the service. We then set the return value of get_country_from_ip, and passed the mock service to the function.
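An injected mock also lets the test say why a result came back, which settles the ambiguity from the earlier "bad example": a None result can now be attributed to the missing IP because the service was never called. The sketch below runs without Django by using FakeRequest, a hypothetical stand-in that only provides the META dictionary. The service body is omitted because the mock never executes it.

```python
from typing import Optional
from unittest import mock


class IpLookupService:
    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        raise NotImplementedError  # real HTTP call omitted in this sketch


class FakeRequest:
    """Hypothetical stand-in for django.http.HttpRequest; only META is used."""

    def __init__(self, **meta: str) -> None:
        self.META = meta


def get_ip_from_request(request) -> Optional[str]:
    ip = request.META.get('REMOTE_ADDR', request.META.get('HTTP_X_FORWARDED_FOR'))
    if ip is None or ip == '':
        return None
    return ip


def get_country_from_request(request, ip_lookup_service) -> Optional[str]:
    ip = get_ip_from_request(request)
    if ip is None:
        return None
    return ip_lookup_service.get_country_from_ip(ip)


# Scenario 1: an IP is present, so the service is asked exactly once.
service = mock.create_autospec(IpLookupService, instance=True)
service.get_country_from_ip.return_value = 'US'
found = get_country_from_request(FakeRequest(REMOTE_ADDR='216.58.210.46'), service)
service.get_country_from_ip.assert_called_once_with('216.58.210.46')

# Scenario 2: no IP, so the result is None *and* the service is never called.
unused_service = mock.create_autospec(IpLookupService, instance=True)
missing = get_country_from_request(FakeRequest(), unused_service)
unused_service.get_country_from_ip.assert_not_called()

assert found == 'US'
assert missing is None
```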
Changing Implementations
Another benefit of DI which is often mentioned is the ability to completely change the underlying implementation of an injected service. For example, one day you discover that you don't have to use a remote service to look up an IP. Instead, you can use a local IP database.
Because our IpLookupService does not leak its internal implementation, it's an easy switch:
from typing import Optional
import GeoIP

class LocalIpLookupService:
    def __init__(self, path_to_db_file: str) -> None:
        self.db = GeoIP.open(path_to_db_file, GeoIP.GEOIP_STANDARD)

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        return self.db.country_code_by_addr(ip)
The service API remained unchanged, so you can use it the same way as the old service:
>>> ip_lookup_service = LocalIpLookupService('/usr/share/GeoIP/GeoIP.dat')
>>> ip_lookup_service.get_country_from_ip('216.58.210.46')
'US'
>>> from django.test import RequestFactory
>>> request = RequestFactory().get('/', REMOTE_ADDR='216.58.210.46')
>>> get_country_from_request(request, ip_lookup_service)
'US'
The best part here is that the tests are unaffected. All the tests should pass without making any changes.
GeoIP
In the example I use the MaxMind GeoIP Legacy Python Extension API because it uses files I already have in my OS as part of geoiplookup. If you really need to look up IP addresses check out GeoIP2 and make sure to check the license and usage restrictions.
Also, Django users might be delighted to know that Django provides a wrapper around geoip2.
Typing Services
In the last section we cheated a bit. We injected the new service LocalIpLookupService into a function that expects an instance of IpLookupService. We made sure that these two are the same, but the type annotations are now wrong. We also used a mock to test the function which is also not of type IpLookupService. So, how can we use type annotations and still be able to inject different services?
from abc import ABCMeta
from typing import Optional
import GeoIP
import requests

class IpLookupService(metaclass=ABCMeta):
    def get_country_from_ip(self, ip: str) -> Optional[str]:
        raise NotImplementedError()

class RemoteIpLookupService(IpLookupService):
    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        response = requests.get(f'{self.base_url}/json/{ip}')
        if not response.ok:
            return None

        data = response.json()
        if data['status'] != 'success':
            return None

        return data['countryCode']

class LocalIpLookupService(IpLookupService):
    def __init__(self, path_to_db_file: str) -> None:
        self.db = GeoIP.open(path_to_db_file, GeoIP.GEOIP_STANDARD)

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        return self.db.country_code_by_addr(ip)
We defined a base class called IpLookupService that acts as an interface. The base class defines the public API for users of IpLookupService. Using the base class, we can provide two implementations:
- RemoteIpLookupService: uses the requests library to look up the IP at an external service.
- LocalIpLookupService: uses the local GeoIP database.
Now, any function that needs an instance of IpLookupService can use this type, and the function will be able to accept any subclass of it.
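A variation worth knowing: marking the interface method with abc.abstractmethod makes Python refuse to instantiate a subclass that forgot to implement it, instead of failing later with NotImplementedError at call time. This is a sketch of that variation, not the approach used in the rest of this article. IncompleteService and StaticIpLookupService are hypothetical classes used only to show the behavior.

```python
from abc import ABC, abstractmethod
from typing import Optional


class IpLookupService(ABC):
    @abstractmethod
    def get_country_from_ip(self, ip: str) -> Optional[str]:
        """Return the country code for the IP, or None if it is unknown."""


class IncompleteService(IpLookupService):
    """Hypothetical subclass that forgot to implement the interface."""


class StaticIpLookupService(IpLookupService):
    """Hypothetical implementation that answers every lookup with one code."""

    def __init__(self, country_code: str) -> None:
        self.country_code = country_code

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        return self.country_code


# Instantiating a class with unimplemented abstract methods raises TypeError.
try:
    IncompleteService()
except TypeError as exc:
    error = str(exc)
else:
    error = None

assert error is not None
assert StaticIpLookupService('US').get_country_from_ip('216.58.210.46') == 'US'
```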
Before we wrap things up, we still need to handle the tests. Previously we removed the test's dependency on responses, now we can ditch mock as well. Instead, we subclass IpLookupService with a simple implementation for testing:
from typing import Iterable

class FakeIpLookupService(IpLookupService):
    def __init__(self, results: Iterable[Optional[str]]):
        self.results = iter(results)

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        return next(self.results)
The FakeIpLookupService implements IpLookupService, and is producing results from a list of predefined results we provide to it:
from django.test import RequestFactory
fake_ip_lookup_service = FakeIpLookupService(results=['US'])
request = RequestFactory().get('/', REMOTE_ADDR='216.58.210.46')
country_code = get_country_from_request(request, fake_ip_lookup_service)
assert country_code == 'US'
The test no longer uses mock.
Using a Protocol
The form of class hierarchy demonstrated in the previous section is called "nominal subtyping". There is another way to utilize typing without classes, using Protocols:
from typing import Iterable, Optional
from typing_extensions import Protocol
import GeoIP
import requests

class IpLookupService(Protocol):
    def get_country_from_ip(self, ip: str) -> Optional[str]:
        pass

class RemoteIpLookupService:
    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        response = requests.get(f'{self.base_url}/json/{ip}')
        if not response.ok:
            return None

        data = response.json()
        if data['status'] != 'success':
            return None

        return data['countryCode']

class LocalIpLookupService:
    def __init__(self, path_to_db_file: str) -> None:
        self.db = GeoIP.open(path_to_db_file, GeoIP.GEOIP_STANDARD)

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        return self.db.country_code_by_addr(ip)

class FakeIpLookupService:
    def __init__(self, results: Iterable[Optional[str]]):
        self.results = iter(results)

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        # Return the next predefined result; `yield from` here would turn
        # the method into a generator and break the declared return type.
        return next(self.results)
The switch from classes to protocols is mild. Instead of creating IpLookupService as a base class, we declare it a Protocol. A protocol is used to define an interface and cannot be instantiated. Instead, a protocol is used only for typing purposes. When a class implements the interface defined by the protocol, it means "structural subtyping" exists and the type check will pass.
In our case, we use a protocol to make sure an argument of type IpLookupService implements the functions we expect an IP service to provide.
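To see structural subtyping in isolation, here is a small sketch that depends on neither GeoIP nor requests. It assumes Python 3.8 or later, where Protocol is available directly from typing. DictIpLookupService and describe are hypothetical names used only for illustration. Note that the service class inherits from nothing, yet a type checker such as mypy would accept it wherever an IpLookupService is expected, because the method signature matches.

```python
from typing import Dict, Optional, Protocol  # typing.Protocol needs Python 3.8+


class IpLookupService(Protocol):
    def get_country_from_ip(self, ip: str) -> Optional[str]:
        ...


class DictIpLookupService:
    """Hypothetical in-memory service; it does not subclass the protocol."""

    def __init__(self, countries: Dict[str, str]) -> None:
        self.countries = countries

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        return self.countries.get(ip)


def describe(ip: str, ip_lookup_service: IpLookupService) -> str:
    """Format a lookup result using any object that satisfies the protocol."""
    country_code = ip_lookup_service.get_country_from_ip(ip)
    return f'{ip} -> {country_code or "unknown"}'


service = DictIpLookupService({'216.58.210.46': 'US'})

assert describe('216.58.210.46', service) == '216.58.210.46 -> US'
assert describe('0.0.0.0', service) == '0.0.0.0 -> unknown'
```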
structural and nominal subtyping
I've written about protocols, structural and nominal subtyping in the past. Check out Modeling Polymorphism in Django With Python.
So which to use? Some languages, like Java, use nominal typing exclusively, while other languages, like Go, use structural typing for interfaces. There are advantages and disadvantages to both ways, but we won't get into that here. In Python, nominal typing is easier to use and understand, so my recommendation is to stick to it, unless you need the flexibility afforded by protocols.
Nondeterminism and Side-Effects
If you ever had a test that one day just started to fail, unprovoked, or a test that fails once every blue moon for no apparent reason, it's possible your code is relying on something that is not deterministic. In the datetime.date.today example, the result of datetime.date.today relies on the current time which is always changing, hence it's not deterministic.
There are many sources of nondeterminism. Common examples include:
- Randomness
- Network access
- Filesystem access
- Database access
- Environment variables
- Mutable global variables
Dependency injection provides a good way to control nondeterminism in tests. The basic recipe is this:
- Identify the source of nondeterminism and encapsulate it in a service: For example, TimeService, RandomnessService, HttpService, FilesystemService and DatabaseService.
- Use dependency injection to access these services: Never bypass them by using datetime.now() and similar directly.
- Provide deterministic implementations of these services in tests: Use a mock, or a custom implementation suited for tests instead.
If you follow the recipe diligently, your tests will not be affected by external circumstances and you will not have flaky tests!
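The same recipe applies to randomness. In the sketch below, pick_winners is a hypothetical function invented for illustration. Instead of calling the module-level random functions, it receives a random.Random instance, so a test can inject a seeded generator and get a repeatable result.

```python
import random
from typing import List, Sequence


def pick_winners(
    participants: Sequence[str],
    count: int,
    rng: random.Random,
) -> List[str]:
    """Pick `count` distinct participants using the injected generator."""
    return rng.sample(participants, count)


participants = ['ann', 'bob', 'cat', 'dan']

# In production the caller would inject an unseeded generator:
#     pick_winners(participants, 2, random.Random())
# In a test, two generators created with the same seed produce the same picks.
first = pick_winners(participants, 2, random.Random(42))
second = pick_winners(participants, 2, random.Random(42))

assert first == second
assert len(set(first)) == 2
```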
Conclusion
Dependency injection is a design pattern just like any other. Developers can decide to what degree they want to take advantage of it. The main benefits of DI are:
- Decouple modules, functions and objects.
- Switch implementations, or support several different implementations.
- Eliminate nondeterminism from tests.
In the use-case above we took several twists and turns to illustrate a point, which might have caused the implementation to seem more complicated than it really is. In addition to that, searching for information about dependency injection in Python often results in libraries and packages that seem to completely change the way you structure your application. This can be very intimidating.
In reality, DI can be used sparingly and in appropriate places to achieve the benefits listed above. When implemented correctly, DI can make your code easier to maintain and to test.