Python concurrency
Various notes on concurrency in Python. The main source of information is the Real Python concurrency article [1].
The dictionary definition of concurrency is simultaneous occurrence.
Concurrency types:
- Pre-emptive multitasking (threading), single process/processor. Switch decision: the operating system decides when to switch tasks, externally to Python.
- Cooperative multitasking (asyncio), single process/processor. Switch decision: the tasks decide when to switch.
- Multiprocessing (multiprocessing), many processes/processors. Switch decision: none needed, because the processes all run at the same time on different processors.
Difference between threading and asyncio?
In threading, the operating system can interrupt a thread and switch to another at any point. In asyncio, a task keeps running until it explicitly gives up control (at an await), and only then does the event loop pick the next task.
When concurrency is useful
The main use cases for concurrency are CPU-bound and I/O-bound tasks.
If your program spends most of its time talking to a slow device, such as a network connection, a hard drive, or a printer, it is I/O-bound. Speeding it up involves overlapping the time spent waiting for these devices.
If your program spends most of its time doing CPU operations, it is CPU-bound. Speeding it up involves finding ways to do more computations in the same amount of time, for example by spreading the work across more physical cores.
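The sketch below illustrates the CPU-bound case using the standard-library concurrent.futures.ProcessPoolExecutor. The names cpu_bound and find_sums are hypothetical. The sum of squares is only a stand-in for real computation. Each call is handed to a separate worker process, which lets the work spread over several physical cores.

```python
import concurrent.futures


def cpu_bound(n: int) -> int:
    """Hypothetical pure-computation workload (no I/O): sum of squares below n."""
    return sum(i * i for i in range(n))


def find_sums(numbers: list[int]) -> list[int]:
    """Compute cpu_bound(n) for every n, spreading the calls over worker processes."""
    # Each worker is a separate process with its own interpreter (and its own GIL),
    # so several calls can run on different cores at the same time.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        return list(executor.map(cpu_bound, numbers))
```

Call find_sums only from code guarded by if __name__ == "__main__":. With the "spawn" start method (the default on Windows and macOS), each worker re-imports the main module. Without the guard, that re-import would try to start a new pool.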
How to Speed Up an I/O-Bound Program
Let’s say you have a synchronous version of a program that does an I/O-bound task.
Why do we use requests.Session here?
Creating a Session object allows requests to do some fancy networking tricks and really speed things up. It can reuse a connection across requests instead of opening a new one each time.
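Below is a minimal sketch of such a synchronous downloader. It assumes the third-party requests package is installed. The function names download_site and download_all_sites are hypothetical. A single Session is shared by every request, so requests can keep a connection to a host open and reuse it.

```python
import requests


def download_site(url: str, session: requests.Session) -> int:
    """Fetch one page through the shared session and return its size in bytes."""
    with session.get(url, timeout=10) as response:
        response.raise_for_status()  # treat HTTP errors as failures
        return len(response.content)


def download_all_sites(sites: list[str]) -> int:
    """Download every site one after another (synchronously); return total bytes."""
    # One Session for the whole run: requests pools connections per host, so later
    # requests to the same host can skip a new TCP (and TLS) handshake.
    with requests.Session() as session:
        return sum(download_site(url, session) for url in sites)
```

Usage: download_all_sites(["https://www.python.org"] * 10). The URL list is only a placeholder workload. Each request still waits for the previous one to finish, which is the slowness the threaded version addresses.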
In my case, the I/O-bound task is downloading a bunch of websites, and with my internet connection it takes less than a minute.
The big problem here is that it’s relatively slow compared to the other solutions, and sometimes you can’t afford to wait that long. To solve this problem, we can use threads.
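Here is a minimal sketch of the threaded approach. To avoid real network traffic, fake_download is a hypothetical stand-in that sleeps with time.sleep as if waiting on a socket. While one thread is blocked waiting, the operating system runs another, so the waits overlap. Ten 0.2-second "downloads" therefore finish in roughly the time of one instead of about 2 seconds.

```python
import concurrent.futures
import time


def fake_download(url: str) -> str:
    """Hypothetical stand-in for downloading url: blocks as if waiting on the network."""
    time.sleep(0.2)
    return url


def download_threaded(urls: list[str], max_workers: int = 10) -> list[str]:
    """Run fake_download for every URL on a pool of threads; results keep input order."""
    # Each blocked thread releases the CPU, so the OS switches to another thread
    # and the waiting periods overlap instead of adding up.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fake_download, urls))
```

With real downloads through requests, note that requests does not guarantee that a Session is thread-safe. A common pattern is to keep one Session per thread in a threading.local() object.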
Threading
One major possible issue with threading is race conditions.
Race conditions happen because the programmer has not sufficiently protected data accesses to prevent threads from interfering with each other.
Can you explain problems with this code?
In order to increment counter, each of the threads needs to read the current
value, add one to it, and then save that value back to the variable. That happens
in this line: counter += 1.
Because the operating system knows nothing about your code and can swap
threads at any point in the execution, it’s possible for this swap to
happen after a thread has read the value, but before it has had the chance to
write it back. If the new code that is running modifies counter as well,
then the first thread has a stale copy of the data and trouble will
ensue (incorrect results).
Because this situation is rare, this type of problem is quite difficult to debug.
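The sketch below shows the race and the usual fix. The fix is a threading.Lock held around the read-modify-write, so no other thread can slip in between the read and the write. Counter and hammer are hypothetical names. Whether the unprotected version actually loses updates on a given run depends on the interpreter version and on timing, which is exactly what makes the bug hard to reproduce.

```python
import threading
from collections.abc import Callable


class Counter:
    """Hypothetical shared counter with an unsafe and a lock-protected increment."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment_unsafe(self) -> None:
        # Read, add one, write back: a thread switch can happen between these steps.
        self.value += 1

    def increment_safe(self) -> None:
        # Only one thread can hold the lock, so the read-modify-write
        # completes before any other thread can touch self.value.
        with self._lock:
            self.value += 1


def hammer(increment: Callable[[], None], n_threads: int = 4, times: int = 100_000) -> None:
    """Call increment() `times` times from each of n_threads threads."""
    def worker() -> None:
        for _ in range(times):
            increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

After hammer(c.increment_safe), c.value is always 400000. After hammer(c.increment_unsafe), it may come out lower, and the result can change from run to run.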
asyncio
Detailed explanation available in asyncio note.
General concept of the event loop?
The event loop object is aware of each task and knows what state it’s in (for example, ready to run or waiting for a network operation).
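To make the idea concrete, here is a toy round-robin scheduler built on plain Python generators. It is not how asyncio is implemented, since the real loop also waits on I/O readiness and timers. The names worker and run_event_loop are hypothetical. Each task runs until it reaches yield, its voluntary switch point, and the loop tracks which tasks are still unfinished.

```python
from collections import deque
from collections.abc import Generator, Iterable

Task = Generator[None, None, None]


def worker(name: str, steps: int, log: list[str]) -> Task:
    """Hypothetical task: does one step of work, then voluntarily yields control."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # explicit switch point, playing the role of `await` in asyncio


def run_event_loop(tasks: Iterable[Task]) -> None:
    """Toy loop: resume each unfinished task in turn until all of them are done."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)  # run the task until its next yield
        except StopIteration:
            continue  # the task finished, so drop it
        ready.append(task)  # still running: queue it again
```

Running run_event_loop([worker("a", 3, log), worker("b", 2, log)]) leaves log as ["a0", "b0", "a1", "b1", "a2"]. The tasks interleave only at their yield points.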
An important point of asyncio is that the tasks never give up control without
intentionally doing so. They never get interrupted in the middle of an
operation. This allows us to share resources a bit more easily in asyncio than
in threading. You don’t have to worry about making your code thread-safe.