Multiple Interpreters? Whaaaaat?

Photo by Aleksandr Kadykov / Unsplash

I'd like to talk about multiple interpreters, also known as "subinterpreters", which CPython has supported since version 1.5, released in 1997. So far the feature is only accessible through the C-API. The interpreters run in relative isolation from one another, which opens up new approaches to concurrency.

One Damn Fine Proposal

The proposal: add a new standard library module named interpreters, and help extension module maintainers support it.

This module will offer a high-level interface to the existing multiple interpreter functionality in CPython. It will also provide a basic mechanism for passing data between interpreters by setting simple objects in the __main__ module of a target subinterpreter. The objects themselves are not shared between interpreters; only their data is passed. PEP 554 goes into more detail on shared data and the API for passing it between interpreters.
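To make "the data is passed, not the object" a bit more concrete, here is a minimal sketch. It does not use the proposed interpreters module at all. It uses the stdlib pickle module as a stand-in for whatever mechanism the interpreters end up using, and the function name send_copy is made up for illustration.

```python
import pickle


def send_copy(obj):
    """Hypothetical stand-in for handing a value to another interpreter.

    The object is flattened to bytes and rebuilt, so the receiver gets
    an equal but separate object, never the sender's original.
    """
    raw = pickle.dumps(obj)   # sender side: object -> raw bytes
    return pickle.loads(raw)  # receiver side: raw bytes -> new object


original = {"name": "spam", "counts": [1, 2, 3]}
received = send_copy(original)

# The receiver can mutate its copy without touching the sender's object.
received["counts"].append(4)

print(received is original)  # False: they were never the same object
print(original["counts"])    # [1, 2, 3]: the sender's data is unchanged
```

The point of the sketch is only the copy semantics: whatever happens to the value on the receiving side stays on the receiving side.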

The proposal suggests that an extension module implementing multi-phase init (PEP 489) be considered isolated and therefore compatible with multiple interpreters. However, many extension modules are not yet compatible.

To fast-track compatibility and reduce impact, the proposal suggests the following steps:

  1. Clarify that extension modules are not required to support use in multiple interpreters;
  2. Trigger an ImportError when an incompatible module is imported in a subinterpreter;
  3. Offer resources to aid maintainers in achieving compatibility;
  4. Engage with maintainers of Cython and most-used extension modules for feedback and potential assistance.
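Steps 1 and 2 above boil down to a simple rule, sketched here as a toy model in Python. The real check would live in CPython's import machinery in C. The function name, its parameters, and the error message are all hypothetical and only illustrate the rule as this post describes it.

```python
def check_extension_import(name, *, multi_phase_init, in_main_interpreter):
    """Toy model of the proposed import check (all names hypothetical).

    The main interpreter may import any extension module. A
    subinterpreter may only import modules that use multi-phase init.
    """
    if in_main_interpreter or multi_phase_init:
        return  # the import proceeds as usual
    raise ImportError(
        f"module {name!r} does not support use in subinterpreters"
    )


# A legacy single-phase module is still fine in the main interpreter...
check_extension_import("legacy_ext", multi_phase_init=False,
                       in_main_interpreter=True)

# ...but fails loudly, not mysteriously, in a subinterpreter.
try:
    check_extension_import("legacy_ext", multi_phase_init=False,
                           in_main_interpreter=False)
except ImportError as exc:
    print(exc)
```

Failing at import time is the design choice here: the user gets a clear error up front, not a crash later from state leaking between interpreters.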

The proposal acknowledges that failure to support multiple interpreters may lead to user confusion and negative implications for extension maintainers.

See the Examples section of PEP 554 for some code.

Concerns and Misconceptions

Critics argue that subinterpreters are not beneficial enough to justify making the language bigger. Supporters counter that they offer a unique concurrency model and a chance to improve CPython's use of multiple CPU cores, which is currently hindered by the Global Interpreter Lock (GIL). As per usual, there are opinions and you can make up your own mind.

Alternatives to subinterpreters include threading, async, and multiprocessing, each with its limitations. Subinterpreters provide benefits of isolation and potential performance improvement without aiming to replace the alternatives.

Concerns have been raised about the additional maintenance burden on C extension authors, given the incomplete isolation of CPython's subinterpreters. The PEP's position is that the actual burden will be minimal and outweighed by the benefits of subinterpreters.

Misconceptions exist that the proposal includes the removal of the shared GIL, leading to confusion about its value. However, the proposal does not aim to introduce a new concurrency API or eliminate the shared GIL. Instead, it seeks to:

  1. Increase exposure of the existing feature,
  2. Promote isolated execution of interpreters, and
  3. Encourage experimentation.

Lastly, there are concerns that sharing data between interpreters would hurt cache performance in multi-core scenarios. The PEP responds that the immediate plan is to copy data between interpreters, not share it.

Concurrency, Sharing and Isolation

Sharing data between Python's interpreters is a significant challenge because of constraints on object ownership, visibility, and mutability. Objects are bound to the interpreter that created them, which complicates sharing. A variety of solutions exist, and Interpreter.run() is to be implemented in a way that lets them coexist, although building those solutions is not itself part of the proposal.

Most objects can be converted to raw data and safely passed between interpreters, providing a basic interim solution. However, determining the best method is outside the scope of this proposal.
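Here is a rough sketch of that "convert to raw data and pass it along" idea. It is an illustration under assumptions, not the PEP's mechanism: a plain thread stands in for the second interpreter, os.pipe() is the channel, and marshal does the conversion to bytes. The names receiver, results and payload are made up for the example.

```python
import marshal
import os
import threading

results = []


def receiver(read_fd):
    # Stands in for code running in another interpreter.
    with os.fdopen(read_fd, "rb") as reader:
        raw = reader.read()  # reads until the writer closes its end
    results.append(marshal.loads(raw))  # rebuild a fresh object from bytes


read_fd, write_fd = os.pipe()
worker = threading.Thread(target=receiver, args=(read_fd,))
worker.start()

payload = {"job": 42, "values": [1.5, 2.5], "label": "spam"}
with os.fdopen(write_fd, "wb") as writer:
    writer.write(marshal.dumps(payload))  # only bytes cross the pipe

worker.join()
print(results[0] == payload)  # True: same data
print(results[0] is payload)  # False: a different object
```

Since nothing but bytes travels through the pipe, neither side ever holds a reference to an object owned by the other.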

Python's interpreters are designed to be strictly isolated, each having its own copy of all modules, classes, functions, and variables. But some state remains shared, and there are issues with isolation due to bugs or designs that did not account for subinterpreters.
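If you want to poke at the isolation yourself, here is a hedged sketch. PEP 554 as summarised here names the module interpreters and the method Interpreter.run(). To my knowledge, what eventually shipped (in Python 3.14) is concurrent.interpreters with an exec() method, so that is what this sketch assumes. On older versions it simply returns None. The function name and the demo_marker attribute are made up.

```python
import sys


def sys_is_isolated():
    """Return True if a subinterpreter got its own sys module, or None
    if this Python has no concurrent.interpreters (assumed 3.14+)."""
    try:
        from concurrent import interpreters
    except ImportError:
        return None

    interp = interpreters.create()
    try:
        # Runs inside the new interpreter, against *its* sys module.
        interp.exec("import sys; sys.demo_marker = 'set in subinterpreter'")
    finally:
        interp.close()

    # The main interpreter's sys module should be untouched.
    return not hasattr(sys, "demo_marker")


# Expected: True where concurrent.interpreters exists, otherwise None.
print(sys_is_isolated())
```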

Currently, multiple interpreter support is not widely used, and there are only a few documented cases of widespread use. Those cases suggest that the feature is relatively stable, but the small sample also means there's limited data to assess its utility.

I don't know about you, but I am getting really excited by this thing and really looking forward to experimenting with it if it comes to pass. It will of course not be perfect straight out of the box, but nevertheless it can certainly contribute to making Python a more performant language.

If you have any brain cells left after reading this condensed version, have a strong coffee and read the full PEP 554.

Alright!