Faster enum implementation (Pure Python version) #1581

mdboom · 2026-02-05T19:54:44Z

This adds an implementation of integer enums that is significantly faster than the one in the stdlib (enum.IntEnum).

There was an alternate implementation of this that used a handwritten C extension, but its performance benefit was minor. This pure Python version is much faster than the stdlib while also being small and easy to maintain.

Additionally, this lets us assign docstrings to the enumeration values, which is a feature the stdlib enum doesn't have. All of the generated code has been updated to include docstrings, auto-derived from the headers.
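
As a rough illustration of how a per-value docstring can work on an int-based value, here is a minimal sketch. `DocumentedInt` and `SUCCESS` are hypothetical names chosen for this example, and this is not necessarily how `_fast_enum.py` does it:

```python
class DocumentedInt(int):
    """Hypothetical int subclass whose instances carry their own docstring."""

    def __new__(cls, value, doc=""):
        self = super().__new__(cls, value)
        # Stored on the instance, so it shadows the class-level docstring.
        self.__doc__ = doc
        return self


# Illustrative member; the name and text are made up for this sketch.
SUCCESS = DocumentedInt(0, "The call completed without error.")

print(int(SUCCESS), SUCCESS.__doc__)
```

The value still behaves like an ordinary int in comparisons and arithmetic; only `__doc__` differs per instance.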

+------------------------------+----------+-----------------------+
| Benchmark                    | baseline | fast_enum_py_docs     |
+==============================+==========+=======================+
| enum from int                | 300 ns   | 170 ns: 1.77x faster  |
+------------------------------+----------+-----------------------+
| int from enum                | 55.7 ns  | 57.5 ns: 1.03x slower |
+------------------------------+----------+-----------------------+
| enum value                   | 203 ns   | 79.9 ns: 2.54x faster |
+------------------------------+----------+-----------------------+
| cuda.bindings.driver import  | 38.3 ms  | 32.1 ms: 1.19x faster |
+------------------------------+----------+-----------------------+
| cuda.bindings.runtime import | 73.4 ms  | 61.4 ms: 1.20x faster |
+------------------------------+----------+-----------------------+
| cuda.bindings.nvrtc import   | 62.1 ms  | 55.2 ms: 1.12x faster |
+------------------------------+----------+-----------------------+
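
For readers who want to reproduce numbers of this kind, the sketch below times three operations on a stdlib `IntEnum` with `timeit`. The mapping from benchmark names to statements (`Color(1)`, `int(Color.RED)`, `Color.RED.value`) is an assumption, as is the `Color` class; the harness actually used for the table above is not shown in this PR description.

```python
import enum
import timeit


class Color(enum.IntEnum):
    """Stand-in enum for timing; not part of cuda.bindings."""
    RED = 1
    GREEN = 2


def bench(stmt, number=100_000):
    """Return the mean time per execution of `stmt` in nanoseconds."""
    total = timeit.timeit(stmt, globals={"Color": Color}, number=number)
    return total / number * 1e9


# Assumed meaning of the three micro-benchmarks in the table.
results = {
    "enum from int": bench("Color(1)"),
    "int from enum": bench("int(Color.RED)"),
    "enum value": bench("Color.RED.value"),
}

for name, ns in results.items():
    print(f"{name:15s} {ns:8.1f} ns")
```

Absolute timings depend on the machine and Python version, so only the ratios between implementations are meaningful.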

copy-pr-bot · 2026-02-05T19:54:48Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

mdboom · 2026-02-05T19:54:59Z

/ok to test

github-actions · 2026-02-05T20:08:16Z

Doc Preview CI
🚀 View preview at https://nvidia.github.io/cuda-python/pr-preview/pr-1581/
https://nvidia.github.io/cuda-python/pr-preview/pr-1581/cuda-core/
https://nvidia.github.io/cuda-python/pr-preview/pr-1581/cuda-bindings/
https://nvidia.github.io/cuda-python/pr-preview/pr-1581/cuda-pathfinder/
Preview will be ready when the GitHub Pages deployment is complete.

mdboom · 2026-02-06T18:37:30Z

/ok to test

mdboom · 2026-02-06T19:31:41Z

/ok to test

mdboom · 2026-02-06T20:40:56Z

/ok to test

mdboom · 2026-02-06T20:44:00Z

/ok to test

copy-pr-bot · 2026-02-09T20:53:53Z

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

mdboom · 2026-02-09T20:54:59Z

/ok to test

kkraus14 · 2026-02-10T02:54:14Z

cuda_bindings/cuda/bindings/_internal/_fast_enum.py

+It supports the most important subset of the IntEnum API.  See `test_enum` in
+`cuda_bindings/tests/test_basics.py` for details.


Based on this comment I assume this means that we're only supporting a subset of an IntEnum and given we're returning these that would probably constitute an API break?

I.e. because we were previously returning an IntEnum as error codes, if someone has code that does something like isinstance(value, enum.IntEnum), that would then start returning False?

Yes. But moving away from Enum/IntEnum is our only route to gain performance on the error handling side. I think this ducktyping is good enough and the break due to type checking seems to be theoretical. Users most likely would just write isinstance(err, CUresult) (which is nonbreaking) instead of isinstance(err, Enum).

Yeah -- I'm probably being overly cautious with the word subset here. I carefully looked at everything IntEnum exposes and replicated it here. The test even tests stdlib and our implementation using the same test code. You are right that the biggest visible change is the isinstance(value, enum.IntEnum) thing, but I agree with @leofang, that seems unlikely.

In any event, I'll add a CHANGELOG entry here.
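
The compatibility question above can be shown with a small sketch. `FastResult` is a hypothetical stand-in for a generated enum built directly on `int`; it is not the PR's actual class:

```python
import enum


class StdResult(enum.IntEnum):
    SUCCESS = 0


class FastResult(int):
    """Hypothetical stand-in for a generated enum built on plain int."""


FastResult.SUCCESS = FastResult(0)

checks = {
    "stdlib member is IntEnum": isinstance(StdResult.SUCCESS, enum.IntEnum),
    "fast member is IntEnum": isinstance(FastResult.SUCCESS, enum.IntEnum),
    "fast member is its own type": isinstance(FastResult.SUCCESS, FastResult),
    "fast member is int": isinstance(FastResult.SUCCESS, int),
}

for name, result in checks.items():
    print(f"{name}: {result}")
```

Only the check against `enum.IntEnum` changes its answer; checks against the concrete enum type or against `int` keep working.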

If we inherited from enum.IntEnum instead of int would that nullify some of the performance gains?

Could we use __instancecheck__ to make isinstance(enum.IntEnum) and isinstance(enum.Enum) work as they did previously?

Either of these would likely limit the breakage to code that relies on anti-patterns such as comparing type(...) directly.

If we inherited from enum.IntEnum instead of int would that nullify some of the performance gains?

Yeah, unfortunately you can't bring in the value type (IntEnum) without bringing in the metaclass and all of its startup cost:

TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
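
A minimal reproduction of that error, assuming the fast implementation uses its own metaclass that is unrelated to the stdlib enum metaclass (`LightMeta` and `Mixed` are hypothetical names):

```python
import enum


class LightMeta(type):
    """Hypothetical lightweight metaclass, unrelated to the enum metaclass."""


try:
    # The class statement itself fails while Python picks the metaclass.
    class Mixed(enum.IntEnum, metaclass=LightMeta):
        pass
except TypeError as exc:
    message = str(exc)
else:
    message = None

print(message)
```

Python requires the metaclass of a new class to be a subclass of the metaclasses of all its bases, so the only way to inherit from `IntEnum` is to build on the enum metaclass as well.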

Could we use __instancecheck__ to make isinstance(enum.IntEnum) and isinstance(enum.Enum) work as they did previously?

Unfortunately not. __instancecheck__ is a method on the metaclass of the type that you are testing for. In other words, to make isinstance(FastEnum.VALUE, enum.IntEnum) work, we would have to monkeypatch an __instancecheck__ onto the metaclass of enum.IntEnum.

This GitHub code search with a regex (for isinstance\(.*, IntEnum\) in a file that also contains the word cuda) turns up a few possibly problematic uses. I guess we need to decide whether the performance matters enough that it's worth the pain to help our users fix those things in a few places.
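
The lookup rule can be checked with a short sketch. All class names here are hypothetical and exist only to show where `isinstance` looks for the hook:

```python
class AcceptsInts(type):
    """Metaclass whose classes treat every int as an instance."""

    def __instancecheck__(cls, instance):
        return isinstance(instance, int)


class Target(metaclass=AcceptsInts):
    pass


class Value(int):
    # Defined on the value's own class, so isinstance(Value(...), X)
    # never consults it.
    def __instancecheck__(self, instance):
        return True


via_metaclass = isinstance(Value(1), Target)  # hook found on type(Target)
not_an_int = isinstance("text", Target)       # hook runs and rejects a str
ignored_hook = isinstance(Value(1), str)      # Value's method plays no part

print(via_metaclass, not_an_int, ignored_hook)
```

This is why nothing defined on the fast enum side can change the result of a check against `enum.IntEnum`.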

Monkeypatching IntEnum's __instancecheck__ does work, with the loss of some performance:

+------------------------------+-------------------+-----------------------+
| isinstance(enum, IntEnum)    | 48.1 ns           | 80.6 ns: 1.68x slower |
+------------------------------+-------------------+-----------------------+

If performance matters to our users, we can suggest updating their code to isinstance(enum, FastEnum), which has no penalty from the monkeypatching.

Maybe this is an ok solution for backward compatibility.

@leofang, @kkraus14: So I think we just need to decide between:

1. We merge this as-is, where isinstance(enum, IntEnum) returns False.

2. We include a monkeypatch to make isinstance(enum, IntEnum) work. This makes that isinstance check about 68% slower (though that kind of check generally appears in frameworks, not really in user code), but we wouldn't break dependent libraries. As with all monkeypatches, unintended consequences are a little hard to reason about. I /think/ this is safe, but I wouldn't bet my life on it.

3. We don't do this whole thing right now. (And, as a reminder to myself: revert the generator changes if that's what we decide.)

This is what (2) looks like:

    def __instancecheck__(cls, instance) -> bool:
        if isinstance(instance, FastEnum):
            return True
        return issubclass(type(instance), IntEnum)

    type(IntEnum).__instancecheck__ = __instancecheck__
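
To try option (2) in isolation, the snippet can be wrapped in a self-contained sketch. `FastEnum` here is a hypothetical stand-in for the PR's base class, `Std` is an ordinary stdlib enum added for comparison, and the patch is undone at the end so the interpreter is left unchanged:

```python
from enum import IntEnum


class FastEnum(int):
    """Hypothetical stand-in for the PR's fast enum base class."""


class Std(IntEnum):
    A = 1


def __instancecheck__(cls, instance) -> bool:
    if isinstance(instance, FastEnum):
        return True
    return issubclass(type(instance), IntEnum)


enum_meta = type(IntEnum)
had_hook = "__instancecheck__" in vars(enum_meta)
previous = vars(enum_meta).get("__instancecheck__")

before = isinstance(FastEnum(1), IntEnum)

enum_meta.__instancecheck__ = __instancecheck__
try:
    after = isinstance(FastEnum(1), IntEnum)
    std_member = isinstance(Std.A, IntEnum)
    plain_int = isinstance(1, IntEnum)
    # The hook lives on the metaclass, so it answers for every enum class.
    other_enum = isinstance(FastEnum(1), Std)
finally:
    # Restore the original state of the enum metaclass.
    if had_hook:
        enum_meta.__instancecheck__ = previous
    else:
        del enum_meta.__instancecheck__

restored = isinstance(FastEnum(1), IntEnum)
print(before, after, std_member, plain_int, other_enum, restored)
```

Because the hook ignores `cls`, it also reports a fast enum value as an instance of unrelated enum classes such as `Std`, which is one concrete example of the hard-to-foresee consequences mentioned above.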

I vote for 1 and if there is any bug report we do 2 in a patch release.

+1. I'd vote for 1 and then fallback to 2 if required based on user feedback.

mdboom · 2026-02-10T13:45:43Z

/ok to test

mdboom · 2026-02-10T16:17:30Z

/ok to test

mdboom · 2026-02-10T17:34:23Z

/ok to test

mdboom mentioned this pull request Feb 5, 2026

Replace stdlib enum with something more performant #1557

Open

Use a new, faster enum implementation

40d1305

mdboom force-pushed the fast-enum-py branch from 9ede54a to 40d1305 Compare February 9, 2026 20:45

mdboom marked this pull request as ready for review February 9, 2026 20:53

mdboom requested a review from leofang February 9, 2026 21:15

leofang modified the milestones: cuda.bindings 13.1.2 & 12.9.6, cuda.bindings next Feb 9, 2026

leofang added the enhancement, P0, and cuda.bindings labels Feb 9, 2026

leofang assigned mdboom Feb 9, 2026

kkraus14 reviewed Feb 10, 2026

mdboom added 2 commits February 10, 2026 08:37

Updates from the generator side

2609f25

Add CHANGELOG entry

f97f1fb

mdboom added 2 commits February 10, 2026 11:12

Move _fast_enum.py to cuda.bindings._internal

b4b50ed

Merge remote-tracking branch 'upstream/main' into fast-enum-py

36edba7

Fix up imports

18af218

leofang mentioned this pull request Feb 10, 2026

cuda.core latency benchmark suite #1579

Open

Merge branch 'main' into fast-enum-py

ed2a35f
