In this post, we will dive head-first into what Python bytecode is, how to view it, and how to read and understand it.
What is Python bytecode?
Like many people who have worked with Python, you may only be used to seeing Python source code (.py files), but what happens to that source code so that it can be executed by a CPU?
Like many other interpreted programming languages, Python first compiles its source code to an intermediate bytecode format, which is in turn interpreted by the Python runtime: its evaluation loop carries out each bytecode instruction using native CPU instructions. The compiled bytecode of imported modules is cached in .pyc files inside a __pycache__ directory, and the Python runtime reuses those files on later runs instead of recompiling the source.
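To make the compile step concrete, here is a minimal sketch using the built-in compile() function and importlib.util.cache_from_source(). The source string and the file name example.py are made up for illustration only.

```python
import importlib.util

# Compile a small piece of source code into a code object.
# "<example>" is a placeholder file name that only shows up in tracebacks.
source = "total = 1 + 2"
code = compile(source, "<example>", "exec")

# The raw bytecode lives on the code object as a bytes string.
print(type(code.co_code), len(code.co_code))

# Where the runtime would cache the bytecode for a hypothetical module file.
# The file name includes an interpreter tag, so it varies between versions.
cache_path = importlib.util.cache_from_source("example.py")
print(cache_path)
```

The printed length and cache path depend on the interpreter in use, so no fixed output is shown here.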
How to view the generated bytecode
The Python standard library provides the dis module, which exposes an API for disassembling Python bytecode into human-readable instructions.
Official Docs: dis — Disassembler for Python bytecode.
We can utilise the dis(obj) function within this module to print out the disassembled bytecode of the object passed in as an argument.
Below is an example of a simple hello_world() function which has been disassembled using the dis() function.
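A minimal sketch of such an example, assuming hello_world() simply prints a greeting (the function body is an assumption, not taken from the original listing):

```python
import dis
import io


def hello_world():
    # The body is an assumption made for illustration.
    print("Hello, world!")


# dis.dis() prints the disassembly to stdout by default.
dis.dis(hello_world)

# The optional file argument captures the same text instead of printing it.
buffer = io.StringIO()
dis.dis(hello_world, file=buffer)
listing = buffer.getvalue()
```

The exact instructions printed differ between Python versions, so treat any particular listing as specific to the interpreter that produced it.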
Reading and understanding Python bytecode
The bytecode output is composed of the following properties.
- The line number of the Python code that the current block of bytecode corresponds to.
- The instruction's offset, in bytes, from the start of the bytecode sequence.
- The opcode of the instruction.
- The oparg: the argument for the opcode, where applicable.
- Where possible, the resolved oparg value.
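The same properties can also be read programmatically. The sketch below uses dis.get_instructions(), which yields one Instruction object per bytecode instruction. The greet() function is a hypothetical example, and the line number is left out because the attribute that holds it differs between Python versions.

```python
import dis


def greet(name):
    # Hypothetical function used only to have something to disassemble.
    return "Hello, " + name


rows = []
for instr in dis.get_instructions(greet):
    # offset: position in the bytecode, opname: readable opcode name,
    # arg: the raw oparg (None if unused), argrepr: the resolved oparg.
    rows.append((instr.offset, instr.opname, instr.arg, instr.argrepr))
    print(f"{instr.offset:>4} {instr.opname:<20} {instr.arg!s:<6} {instr.argrepr}")
```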
Let's step through a simple (perhaps somewhat contrived) example and outline what each bytecode instruction does.
The following function simply takes two arguments, x & y, and returns the sum of the two provided arguments.
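A sketch of the function described above, together with the call that disassembles it. Opcode names depend on the interpreter version: from Python 3.11 onwards the addition is shown as BINARY_OP rather than BINARY_ADD, and newer releases may merge the two loads into a single instruction.

```python
import dis


def add(x, y):
    return x + y


# Print the disassembly of add() to stdout.
dis.dis(add)

# Collect just the opcode names, which is handy for comparing versions.
opnames = [instr.opname for instr in dis.get_instructions(add)]
print(opnames)
```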
The first two LOAD_FAST instructions push the x & y arguments provided to the add function onto the evaluation stack. The opargs provided to the LOAD_FAST instructions reference the index of the values to be loaded in the function's array of local variables.
The BINARY_ADD instruction then pops the top two items from the evaluation stack (x & y) and sums the two values. The result of the calculation is then pushed onto the top of the stack.
RETURN_VALUE then returns the value from the top of the evaluation stack to the caller and exits the function.
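The stack behaviour of these instructions can be mimicked with a toy interpreter. This is only an illustrative sketch, not how the Python runtime is implemented: the run() function, the instruction tuples and the fast_locals list are all hypothetical names invented for this example.

```python
def run(instructions, fast_locals):
    """Execute a tiny list of (opname, oparg) pairs on a list-based stack."""
    stack = []
    for opname, oparg in instructions:
        if opname == "LOAD_FAST":
            # Push the local variable stored at index oparg.
            stack.append(fast_locals[oparg])
        elif opname == "BINARY_ADD":
            # Pop the two top items, add them, push the result.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "RETURN_VALUE":
            # Hand the top of the stack back to the caller.
            return stack.pop()
        else:
            raise ValueError(f"unsupported instruction: {opname}")
    raise RuntimeError("program ended without RETURN_VALUE")


program = [
    ("LOAD_FAST", 0),       # push x
    ("LOAD_FAST", 1),       # push y
    ("BINARY_ADD", None),   # pop both, push x + y
    ("RETURN_VALUE", None), # return the top of the stack
]
result = run(program, fast_locals=[2, 3])
print(result)  # 5
```

Here fast_locals plays the role of the argument slots, so index 0 stands for x and index 1 for y.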