The days of desktop systems serving single users are long gone. Web applications nowadays serve millions of users at the same time. With many users comes a whole new class of problems: concurrency problems.
To demonstrate common concurrency issues, we are going to work on a bank account model:
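```python
from django.contrib.auth.models import User
from django.db import models


class Account(models.Model):
    id = models.AutoField(
        primary_key=True,
    )
    user = models.ForeignKey(
        User,
        on_delete=models.CASCADE,  # on_delete is required since Django 2.0
    )
    balance = models.IntegerField(
        default=0,
    )
```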
To get started, we are going to implement naive deposit and withdraw methods for an account instance:
```python
def deposit(self, amount):
    self.balance += amount
    self.save()

def withdraw(self, amount):
    if amount > self.balance:
        raise errors.InsufficientFunds()
    self.balance -= amount
    self.save()
```
This seems innocent enough, and it might even pass unit tests and integration tests on localhost. But what happens when two users perform actions on the same account at the same time?
- User A fetches the account
  - balance is 100$
- User B fetches the account
  - balance is 100$
- User B withdraws 30$
  - balance is updated to 100$ - 30$ = 70$
- User A deposits 50$
  - balance is updated to 100$ + 50$ = 150$
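The interleaving above can be replayed step by step in plain Python. This is a minimal sketch. It assumes a dict stands in for the database row and that each user holds a private in-memory copy of the account. The names `db`, `fetch` and `save` are hypothetical, not Django APIs:

```python
# Hypothetical stand-in for the account row stored in the database.
db = {"balance": 100}


def fetch():
    # Each user reads the row into a private, in-memory copy.
    return {"balance": db["balance"]}


def save(account):
    # Blindly writes the in-memory value back, like the naive self.save().
    db["balance"] = account["balance"]


account_a = fetch()  # User A fetches the account: balance is 100
account_b = fetch()  # User B fetches the account: balance is 100

account_b["balance"] -= 30  # User B withdraws 30
save(account_b)             # stored balance is now 70

account_a["balance"] += 50  # User A deposits 50 on top of the stale 100
save(account_a)             # stored balance is now 150: B's withdrawal is lost

print(db["balance"])  # 150 instead of the expected 120
```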
What Happened Here?
User B asked to withdraw 30$ and user A deposited 50$. We expect the balance to be 120$, but we ended up with 150$.
Why Did It Happen?
At step 4, when user A updated the balance, the amount he had stored in memory was stale (user B had already withdrawn 30$). To prevent this from happening, we need to make sure the resource we are working on is not altered while we are working on it.
Pessimistic Approach
The pessimistic approach dictates that you should lock the resource exclusively until you are finished with it. If nobody else can acquire a lock on the object while you are working on it, you can be sure the object was not changed.
To lock the resource, we use a database lock, for several reasons:
- Relational databases are very good at managing locks and maintaining consistency.
- The database is the lowest level at which data is accessed. Acquiring the lock at the lowest level also protects the data from other processes that modify it, for example direct updates in the DB, cron jobs, cleanup tasks, etc.
- A Django app can run in multiple processes (e.g., workers). Maintaining locks at the app level would require a lot of (unnecessary) work.
Let's implement safe deposit and withdraw actions using the pessimistic approach:
```python
from django.db import transaction

@classmethod
def deposit(cls, id, amount):
    with transaction.atomic():
        # Lock the row until the transaction ends.
        account = (
            cls.objects
            .select_for_update()
            .get(id=id)
        )
        account.balance += amount
        account.save()
    return account

@classmethod
def withdraw(cls, id, amount):
    with transaction.atomic():
        account = (
            cls.objects
            .select_for_update()
            .get(id=id)
        )
        if account.balance < amount:
            raise errors.InsufficientFunds()
        account.balance -= amount
        account.save()
    return account
```
- We use `select_for_update` on our queryset to tell the database to lock the object until the transaction is done.
- Locking a row in the database requires a database transaction. We use Django's `transaction.atomic()`, here as a context manager, to scope the transaction.
- We use a `classmethod` instead of an instance method. To acquire the lock, we need to tell the database to lock the row, which means we have to be the ones fetching the object from the database. When operating on `self`, the object is already fetched, and we have no guarantee that it was locked.
- All the operations on the account are executed within the database transaction.
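The same pattern (lock first, then read, modify and write) can be sketched without Django. This is an in-process illustration only. A `threading.Lock` per account stands in for the row lock that `select_for_update()` takes inside `transaction.atomic()`. Unlike the database lock, it would not protect the data from other processes. The names `accounts`, `row_locks` and the local `InsufficientFunds` are hypothetical stand-ins:

```python
import threading


class InsufficientFunds(Exception):
    """Stand-in for errors.InsufficientFunds."""


# Hypothetical stand-ins: a dict for the accounts table, one lock per row.
accounts = {1: 100}
row_locks = {1: threading.Lock()}


def deposit(account_id, amount):
    # Take the lock *before* reading, as select_for_update() does inside atomic().
    with row_locks[account_id]:
        balance = accounts[account_id]
        accounts[account_id] = balance + amount
        return accounts[account_id]


def withdraw(account_id, amount):
    with row_locks[account_id]:
        balance = accounts[account_id]
        if balance < amount:
            raise InsufficientFunds()
        accounts[account_id] = balance - amount
        return accounts[account_id]


# User A withdraws 30 and user B deposits 50 at the same time.
threads = [
    threading.Thread(target=withdraw, args=(1, 30)),
    threading.Thread(target=deposit, args=(1, 50)),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(accounts[1])  # 120, whichever thread gets the lock first
```

Because every operation reads the balance only after it holds the lock, neither thread can act on a stale value.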
Let's see how the scenario from earlier is prevented with our new implementation:
- User A asks to withdraw 30$:
  - User A acquires a lock on the account
  - Balance is 100$
- User B asks to deposit 50$:
  - User B cannot acquire a lock on the account (locked by user A)
  - User B waits for the lock to be released
- User A withdraws 30$:
  - Balance is 70$
  - User A's lock on the account is released
- User B acquires a lock on the account:
  - Balance is 70$
  - New balance is 70$ + 50$ = 120$
- User B's lock on the account is released, and the balance is 120$. Bug prevented!
What You Need to Know About the Pessimistic Approach:
- You don't have to wait for the lock to be released. In our scenario, user B waited for user A to release the lock. Instead of waiting, we can tell Django not to wait and to raise a `DatabaseError` instead. To do that, we set `nowait=True`: `select_for_update(nowait=True)`.
- Related objects are also locked. Combining `select_for_update` with `select_related` locks the related rows as well. For example, if we `select_related` the user along with the account, both the user and the account are locked. If someone tries to update the user's first name during a deposit, that update will block until the deposit's transaction releases the lock.
If you are using PostgreSQL or Oracle, this might not be a problem soon, thanks to a new feature in the upcoming Django 2.0. In this version, `select_for_update` accepts an `of` option to explicitly state which of the tables in the query to lock.
I have used the bank account example before to demonstrate common patterns we use in Django models. You are welcome to follow up in this article.
Optimistic Approach
Unlike the pessimistic approach, the optimistic approach does not require a lock on the object. It assumes collisions are uncommon and dictates that we only make sure no changes were made to the object by the time we update it.
First, we add a column to keep track of changes made to the object:
```python
version = models.IntegerField(default=0)
```
Then, when we update an object, we make sure the version did not change:
```python
def deposit(self, amount):
    # Matches the row only if its version is still the one we fetched.
    updated = Account.objects.filter(
        id=self.id,
        version=self.version,
    ).update(
        balance=self.balance + amount,
        version=self.version + 1,
    )
    return updated > 0

def withdraw(self, amount):
    if self.balance < amount:
        raise errors.InsufficientFunds()
    updated = Account.objects.filter(
        id=self.id,
        version=self.version,
    ).update(
        balance=self.balance - amount,
        version=self.version + 1,
    )
    return updated > 0
```
Let's break it down:
- We operate directly on the instance (no classmethod).
- We rely on the fact that the version is incremented every time the object is updated.
- We update only if the version did not change:
  - If the object was not modified since we fetched it, the object is updated.
  - If it was modified, the filter matches no rows and the object is not updated.
- `update()` returns the number of updated rows. If `updated` is zero, someone else changed the object after we fetched it.
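The same conditional update can be sketched with Python's built-in `sqlite3` module. The sketch assumes a table that mirrors the `Account` model plus its `version` column, and it replays the two-user scenario. `cursor.rowcount` plays the role of the row count returned by Django's `update()`. The helpers `fetch` and `save_balance` are hypothetical:

```python
import sqlite3

# In-memory table mirroring the Account model plus its version column.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, balance INTEGER NOT NULL, "
    "version INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")


def fetch(account_id):
    # Remember the version we read along with the balance.
    balance, version = conn.execute(
        "SELECT balance, version FROM account WHERE id = ?", (account_id,)
    ).fetchone()
    return {"id": account_id, "balance": balance, "version": version}


def save_balance(account, new_balance):
    # Only matches the row if nobody bumped the version since we read it.
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = ? WHERE id = ? AND version = ?",
        (new_balance, account["version"] + 1, account["id"], account["version"]),
    )
    return cur.rowcount > 0  # zero rows means someone else got there first


user_a = fetch(1)  # balance 100, version 0
user_b = fetch(1)  # balance 100, version 0

b_saved = save_balance(user_b, user_b["balance"] - 30)  # matches: version 0 -> 1
a_saved = save_balance(user_a, user_a["balance"] + 50)  # no row with version 0 left

print(b_saved, a_saved, fetch(1))
# True False {'id': 1, 'balance': 70, 'version': 1}
```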
Let's see how optimistic locking works in our scenario:
- User A fetches the account:
  - Balance is 100$
  - Version is 0
- User B fetches the account:
  - Balance is 100$
  - Version is 0
- User B asks to withdraw 30$:
  - Balance is updated to 100$ - 30$ = 70$
  - Version is incremented to 1
- User A asks to deposit 50$:
  - The calculated balance is 100$ + 50$ = 150$
  - No account with version 0 exists anymore, so nothing is updated and the deposit returns `False`
What You Need to Know About the Optimistic Approach:
- Unlike the pessimistic approach, this approach requires an additional field and a lot of discipline. One way to overcome the discipline issue is to abstract this behavior. Some packages we've taken inspiration from are:
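  - django-fsm implements a form of optimistic locking for state transitions (`ConcurrentTransitionMixin`), checking the state field much like the version field described above.
  - django-optimistic-lock seems to do the same using a version field.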
- In an environment with a lot of concurrent updates, this approach might be wasteful: many updates will fail and have to be retried.
- The optimistic approach does not protect against modifications made to the object outside the app. If you have other tasks that modify the data directly (e.g., not through the model), you need to make sure they use the version as well.
- Using the optimistic approach, the function can fail and return `False`. In this case we will most likely want to retry the operation. Using the pessimistic approach with `nowait=False` (the default), the operation does not fail because of a concurrent update; it waits for the lock to be released.
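A retry for the optimistic approach has to re-read the row on every attempt; retrying with the stale balance and version would just fail again. Below is a minimal sketch with `sqlite3` that assumes the same table layout as the model. The names `deposit_once`, `deposit`, `ConcurrentUpdate` and the `on_read` hook are all hypothetical. The hook exists only to simulate another user writing between our read and our write:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, balance INTEGER NOT NULL, "
    "version INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")


class ConcurrentUpdate(Exception):
    """Raised when every attempt lost the race to another writer."""


def deposit_once(account_id, amount, on_read=None):
    # Re-read balance and version on every attempt; never reuse stale values.
    balance, version = conn.execute(
        "SELECT balance, version FROM account WHERE id = ?", (account_id,)
    ).fetchone()
    if on_read is not None:
        on_read()  # simulation hook: lets "someone else" write in between
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (balance + amount, account_id, version),
    )
    return cur.rowcount > 0


def deposit(account_id, amount, max_attempts=3, on_read=None):
    # Returns the number of the attempt that succeeded.
    for attempt in range(1, max_attempts + 1):
        if deposit_once(account_id, amount, on_read):
            return attempt
    raise ConcurrentUpdate(f"gave up after {max_attempts} attempts")


interfered = []


def user_b_withdraws_once():
    # Simulates user B withdrawing 30 between our first read and our first write.
    if not interfered:
        interfered.append(True)
        conn.execute(
            "UPDATE account SET balance = balance - 30, version = version + 1 "
            "WHERE id = 1"
        )


attempts = deposit(1, 50, on_read=user_b_withdraws_once)
row = conn.execute("SELECT balance, version FROM account WHERE id = 1").fetchone()
print(attempts, row)  # 2 (120, 2)
```

The first attempt loses the race and matches no rows. The second attempt re-reads the balance of 70 and succeeds, leaving the expected 120. The retry limit of 3 is an arbitrary choice for the sketch.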
Which One Should I Use?
Like any great question, the answer is "it depends":
- If your object has a lot of concurrent updates you are probably better off with the pessimistic approach.
- If you have updates happening outside the ORM (for example, directly in the database) the pessimistic approach is safer.
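- If your method has side effects, such as remote API calls or OS calls, make sure they are safe. Some things to consider: can the remote call take a long time (under the pessimistic approach, the lock is held for the whole call)? Is the remote call idempotent, that is, safe to retry (under the optimistic approach, a failed update may be retried)?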