Python Concurrency: Threads, Processes, and asyncio Explained


Intro

I recently needed to learn Python concurrency. I always thought I’d get to that topic someday. That someday arrived.

The best resource I’ve found is David Beazley’s Python Concurrency From the Ground Up: LIVE!. Not only does he live-code without mistakes, but he explains the topic very intuitively. This post is a summary of his talk, plus a section on asyncio (which hadn’t been released at the time of his talk).

The goal is to understand the differences between threads, processes, and asyncio.

We’ll explore each one, see what they’re good at, and figure out which you should use for your situation.

First, we need to define concurrency:

Concurrency in programming means that multiple computations happen at the same time.

Setup

To get started, we’re going to need two things:

- a CPU-bound task to compute
- a server that accepts concurrent requests for that task

For the first item, we are using the Fibonacci sequence. It’s familiar, and it perfectly shows our concurrency challenges. When you’re computing fib(50), you’ll appreciate why it matters.

fib.py
def fib(n):
    if n <= 2:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)

Next, Dave uses socket programming to create a simple server around fib. This is helpful for seeing the effect of multiple concurrent requests for a CPU-bound task.

server.py
from socket import *
from fib import fib

def fib_server(address):
    sock = socket(AF_INET, SOCK_STREAM)
    sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
    sock.bind(address)
    sock.listen(5)
    while True:
        client, addr = sock.accept()
        print("Connection", addr)
        fib_handler(client)

def fib_handler(client):
    while True:
        req = client.recv(100)
        if not req:
            break
        result = fib(int(req))
        resp = str(result).encode("ascii") + b"\n"
        client.send(resp)

fib_server(("", 25000))

What’s happening here:

- fib_server creates a TCP socket, binds it to the given address, and listens for connections.
- Each accepted connection is handed to fib_handler, which reads a number from the client, computes fib(n), and sends back the result.
- Since fib_handler is called from the accept loop itself, the server can only serve one client at a time.

When you run this program with python server.py, you can open a connection to the server and start sending it requests:

telnet localhost 25000

Now you can input any number and the server will give you the result of fib(n). But as we mentioned earlier, it can only handle one connection at a time. You can test this by running telnet localhost 25000 in another terminal, typing any number, and seeing that it doesn’t return a result.
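If you don’t have telnet handy, a minimal Python client works just as well. This is a sketch of my own (the file name client.py is mine, not from the talk):

client.py
from socket import *

# Connect to the fib server and request fib(10)
sock = socket(AF_INET, SOCK_STREAM)
sock.connect(("localhost", 25000))
sock.send(b"10")
print(sock.recv(100).decode())  # prints "55"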

Now that we understand the basic server setup and its limitations, let’s see how threads can help us solve this problem.

Threads

We can use threads to help us handle multiple connections simultaneously.

server_with_threads.py
from socket import *
from fib import fib
from threading import Thread

def fib_server(address):
    sock = socket(AF_INET, SOCK_STREAM)
    sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
    sock.bind(address)
    sock.listen(5)
    while True:
        client, addr = sock.accept()
        print("Connection", addr)
        # Hand each connection off to its own thread
        Thread(target=fib_handler, args=(client,)).start()

def fib_handler(client):
    while True:
        req = client.recv(100)
        if not req:
            break
        result = fib(int(req))
        resp = str(result).encode("ascii") + b"\n"
        client.send(resp)

fib_server(("", 25000))

You can check that this works by opening a connection (and inputting numbers) in two or more terminals:

telnet localhost 25000

Why does this work? We’re offloading each client connection to its own thread, so the accept loop is immediately free to take the next connection.

Now let’s stress-test our server! The following are performance scripts from Dave’s talk.

perf1.py
# Time of a long running request
from socket import *
import time

sock = socket(AF_INET, SOCK_STREAM)
sock.connect(("localhost", 25000))

while True:
    start = time.time()
    sock.send(b"30")
    resp = sock.recv(100)
    end = time.time()
    print(end - start)

This script, perf1.py, simulates a long-running request (fib(30)). If we run it with python perf1.py in one terminal, we might see something like:

> python perf1.py
0.17196965217590332
0.17380785942077637
0.16659832000732422
0.18915700912475586
...

This shows us how long each request takes. Now if we keep this running, and open another terminal window to run it a second time, we see something like this:

> python perf1.py
0.36538004875183105
0.2956357002258301
0.288482666015625
0.309873104095459
0.2971019744873047

In both terminal windows, the time it takes to run fib(30) has doubled. The runtime increases roughly linearly with the number of concurrent clients.

Now Dave asks an interesting question: what happens if we mix a long-running request with very short-running requests?

We can simulate short running requests with this script:

perf2.py
# requests/sec of fast requests
from socket import *
from threading import Thread
import time

sock = socket(AF_INET, SOCK_STREAM)
sock.connect(("localhost", 25000))

n = 0

def monitor():
    # Print the request rate once per second
    global n
    while True:
        time.sleep(1)
        print(n, "reqs/sec")
        n = 0

Thread(target=monitor).start()

while True:
    sock.send(b"1")
    resp = sock.recv(100)
    n += 1

This script sends requests for fib(1), a very fast computation compared to perf1.py’s fib(30).

Now let’s run perf2.py, followed by perf1.py:

> python perf2.py
66683 reqs/sec
70768 reqs/sec
59795 reqs/sec
> python perf1.py
0.17106914520263672
0.1491072177886963
0.14998102188110352

If we take a look back at our terminal with perf2.py, we can see a significant drop in reqs/sec.

> python perf2.py
66683 reqs/sec
70768 reqs/sec
59795 reqs/sec
...
71 reqs/sec
126 reqs/sec
108 reqs/sec

Our fast script’s throughput drops by a factor of roughly 600! Meanwhile, the long-running requests still take about the same amount of time. What’s happening here?

Here’s where we run into the infamous global interpreter lock (GIL). The GIL prevents multiple threads from executing Python bytecode at the same time. So what happens when a long-running request in one thread competes with very short requests in another? As we saw in the experiment above, the thread with the long-running, CPU-bound request effectively gets priority, starving the short requests.

We can also see that we have no control over when a task switch occurs. We’ll see later how that contrasts with asyncio, which lets you control exactly when task switches happen.

This is why you may have heard Python developers say you shouldn’t use threads. Many have been burned by scenarios where an app ground to a halt because of a long-running request in a thread.
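If you want to see the GIL’s effect for yourself, here’s a minimal sketch (the file name and structure are mine, not from the talk) that times fib(30) twice sequentially and then in two threads. Under the GIL, the threaded version takes about as long as the sequential one:

gil_demo.py
import time
from threading import Thread
from fib import fib

# Two calls back to back
start = time.time()
fib(30)
fib(30)
print("sequential:", time.time() - start)

# Two calls in separate threads; the GIL serializes the bytecode,
# so expect roughly the same total time
start = time.time()
t1 = Thread(target=fib, args=(30,))
t2 = Thread(target=fib, args=(30,))
t1.start()
t2.start()
t1.join()
t2.join()
print("threaded:", time.time() - start)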

While threads helped us with concurrent connections, the GIL was a significant bottleneck. Our next approach uses processes to parallelize the work.

Processes

Processes sidestep the GIL: each worker process runs its own Python interpreter, with its own GIL, so CPU-bound work can truly run in parallel. Let’s take a look at how we can implement this in our server:

server_with_processes.py
from socket import *
from fib import fib
from threading import Thread
from concurrent.futures import ProcessPoolExecutor as Pool

pool = Pool(4)

def fib_server(address):
    sock = socket(AF_INET, SOCK_STREAM)
    sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
    sock.bind(address)
    sock.listen(5)
    while True:
        client, addr = sock.accept()
        print("Connection", addr)
        Thread(target=fib_handler, args=(client,), daemon=True).start()

def fib_handler(client):
    while True:
        req = client.recv(100)
        if not req:
            break
        n = int(req)
        future = pool.submit(fib, n)  # run fib(n) in a worker process
        result = future.result()      # block this thread until it's done
        resp = str(result).encode("ascii") + b"\n"
        client.send(resp)
    print("Closed")

fib_server(("", 25000))

We’re using ProcessPoolExecutor from concurrent.futures, since it offers a higher-level abstraction (a pool plus futures) and some niceties over managing processes by hand.
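As a standalone illustration of that API, here’s a minimal sketch (the file name is mine): submit() schedules a call in a worker process and returns a Future, and result() blocks until the worker finishes.

pool_demo.py
from concurrent.futures import ProcessPoolExecutor
from fib import fib

if __name__ == "__main__":  # required where workers are spawned, e.g. macOS/Windows
    with ProcessPoolExecutor(4) as pool:
        future = pool.submit(fib, 30)  # runs in a worker process
        print(future.result())         # blocks until the result is ready
        # map() spreads many calls across the pool
        print(list(pool.map(fib, [28, 29, 30, 31])))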

Now let’s run both perf1.py and perf2.py again and see what happens.

> python perf2.py
3493 reqs/sec
3527 reqs/sec
3844 reqs/sec
> python perf1.py
0.2727804183959961
0.19918036460876465
0.2786672115325928

And if we go back to perf2.py’s terminal, let’s see what happens to it after running perf1.py.

> python perf2.py
3493 reqs/sec
3527 reqs/sec
3844 reqs/sec
...
3087 reqs/sec
2785 reqs/sec
3090 reqs/sec

We can make a few observations here:

- The fast requests drop from ~60,000 to ~3,500 reqs/sec even before perf1.py starts: every request now pays the overhead of serializing data to and from a worker process.
- But when the long-running fib(30) requests arrive, the fast requests barely slow down. The CPU-bound work happens in separate processes, each with its own GIL, so it no longer starves the other clients.

Processes have solved the CPU-intensive concurrency issue, but they carry overhead costs. For tasks that aren’t CPU-intensive, there’s another approach that avoids the overhead of both threads and processes.

Asynchronous Programming

At this point, Dave introduces asynchronous programming. With threads, we can’t control when each thread becomes active or inactive; the OS decides that. With the asynchronous programming pattern, we control exactly where a switch can happen.

This means you generally don’t have to use locks, since you decide when task switches occur. The cost of a task switch is also very low.

He demonstrates this by using yield to create coroutines and a tiny scheduler that runs them; you can find his example in the talk.
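To give the flavor of that idea, here’s a minimal sketch in the same spirit (my own simplified version, not Dave’s actual code): each yield is a point where the coroutine voluntarily hands control back to a round-robin scheduler.

yield_scheduler.py
from collections import deque

def countdown(label, n):
    # A coroutine: each yield hands control back to the scheduler
    while n > 0:
        print(label, n)
        n -= 1
        yield

def run(tasks):
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)           # resume the task until its next yield
        except StopIteration:
            continue             # the task finished; drop it
        ready.append(task)       # otherwise, requeue it behind the others

run([countdown("A", 3), countdown("B", 2)])
# Interleaves the output: A 3, B 2, A 2, B 1, A 1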

This is where I’ll take a bit of a detour from the video, and try to do something similar using asyncio. asyncio allows you to write concurrent code using the async/await syntax, and maintain full control over the execution of coroutines.

Let’s implement this in a simple way:

server_with_asyncio.py
import asyncio
from socket import *
from fib import fib

async def fib_server(address):
    sock = socket(AF_INET, SOCK_STREAM)
    sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
    sock.bind(address)
    sock.listen(5)
    sock.setblocking(False)  # required for the loop's socket methods
    loop = asyncio.get_running_loop()
    while True:
        client, addr = await loop.sock_accept(sock)
        print("Connection", addr)
        loop.create_task(fib_handler(client, loop))

async def fib_handler(client, loop):
    while True:
        req = await loop.sock_recv(client, 100)
        if not req:
            break
        result = fib(int(req))
        resp = str(result).encode("ascii") + b"\n"
        await loop.sock_sendall(client, resp)
    client.close()

if __name__ == "__main__":
    asyncio.run(fib_server(("", 25000)))

Note: Typically you would use asyncio’s high-level server API rather than raw socket programming. But since we want a direct comparison with the earlier servers, we’ll keep the raw socket operations. For reference, a sketch of the high-level version follows.
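Here’s what that might look like with asyncio.start_server (a sketch; the file name is mine):

server_with_asyncio_hl.py
import asyncio
from fib import fib

async def fib_handler(reader, writer):
    while True:
        req = await reader.read(100)
        if not req:
            break
        result = fib(int(req))
        writer.write(str(result).encode("ascii") + b"\n")
        await writer.drain()  # wait until the write buffer is flushed
    writer.close()

async def main():
    # start_server wires up the socket, accept loop, and task creation for us
    server = await asyncio.start_server(fib_handler, "", 25000)
    async with server:
        await server.serve_forever()

asyncio.run(main())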

Back in server_with_asyncio.py, what’s happening?

- The listening socket is set to non-blocking, and every potentially blocking operation (accept, recv, send) goes through the event loop’s awaitable socket methods: sock_accept, sock_recv, sock_sendall.
- Each connection becomes a coroutine wrapped in a task via loop.create_task, so while one handler is waiting on I/O, the loop runs the others.
- Every await marks an explicit point where a task switch may happen; that’s precisely the control we didn’t have with threads.

A major difference between asyncio and threads is that asyncio handles many concurrent connections with coroutine tasks. It is single-threaded, driven by an event loop. This means asyncio has less overhead and is better suited to managing very large numbers of concurrent connections, since OS threads are comparatively expensive.
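One caveat worth noting: because everything runs on one thread, our handler’s direct call to fib() blocks the whole event loop while it computes. A common remedy (my addition, not from the talk) is to offload the CPU-bound call to a process pool with run_in_executor:

# A sketch: inside fib_handler, instead of calling fib directly
import asyncio
from concurrent.futures import ProcessPoolExecutor
from fib import fib

pool = ProcessPoolExecutor(4)  # created once, at startup

async def compute_fib(n):
    loop = asyncio.get_running_loop()
    # fib(n) runs in a worker process; the event loop keeps
    # serving other connections while we await the result
    return await loop.run_in_executor(pool, fib, n)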

Now that we’ve explored these three approaches to Python concurrency, let’s summarize when each one makes sense.

Summary

This is a bit simplistic, but here’s my takeaway for deciding when to use each:

if cpu_intensive:
    'processes'
else:
    if suited_for_threads:
        'threads'
    elif suited_for_asyncio:
        'asyncio'

If your tasks are CPU-bound and intensive, you don’t want them blocking each other behind the GIL. Processes are ideal for this.

But what about threads vs asyncio? In what situation would you use each?

Let’s summarize what we explored above (this discussion thread, with comments from CPython core developers, is also helpful).

Advantages of asyncio:

- You control when task switches occur (at each await), so you generally don’t need locks.
- Task switches are very cheap compared to thread context switches.
- Coroutine tasks are far lighter than OS threads, so a single process can handle many more concurrent connections.

Disadvantages of asyncio:

- Everything must cooperate: one blocking or CPU-bound call (like our fib) stalls the whole event loop.
- You need async-aware libraries, and code has to be written in the async/await style end to end.

Advantages of threads:

- They work with ordinary blocking code and libraries; no rewrite needed.
- The OS handles scheduling for you.

Disadvantages of threads:

- The GIL prevents CPU-bound threads from running in parallel, and a long-running request can starve the others.
- You don’t control when switches happen, so shared state needs locks.
- OS threads are relatively expensive, which limits how many connections you can handle.

When should you use asyncio? When you have a large number of I/O-bound connections and can use async-aware libraries.

When should you use threads? When your workload is I/O-bound but depends on blocking libraries, or when you only need a modest number of concurrent tasks.

Understanding the strengths and limitations of threads, processes, and asyncio will help us choose the right tool to improve performance and efficiency.

Resources

- David Beazley, Python Concurrency From the Ground Up: LIVE! (PyCon 2015)