

@mdboom (Contributor) commented Feb 5, 2026

This adds an implementation of integer enums that is significantly faster than the one in the stdlib (enum.IntEnum).

There was an alternative implementation of this that used a handwritten C extension, but its performance benefit was minor. This one is much faster than the stdlib while also being small and easy to maintain.

Additionally, this lets us assign docstrings to the enumeration values, which the stdlib enum doesn't support. All of the generated code has been updated to include docstrings, auto-derived from the headers.

+------------------------------+----------+-----------------------+
| Benchmark                    | baseline | fast_enum_py_docs     |
+==============================+==========+=======================+
| enum from int                | 300 ns   | 170 ns: 1.77x faster  |
+------------------------------+----------+-----------------------+
| int from enum                | 55.7 ns  | 57.5 ns: 1.03x slower |
+------------------------------+----------+-----------------------+
| enum value                   | 203 ns   | 79.9 ns: 2.54x faster |
+------------------------------+----------+-----------------------+
| cuda.bindings.driver import  | 38.3 ms  | 32.1 ms: 1.19x faster |
+------------------------------+----------+-----------------------+
| cuda.bindings.runtime import | 73.4 ms  | 61.4 ms: 1.20x faster |
+------------------------------+----------+-----------------------+
| cuda.bindings.nvrtc import   | 62.1 ms  | 55.2 ms: 1.12x faster |
+------------------------------+----------+-----------------------+
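For readers wondering how an `int`-based enum can be both fast and carry per-member docstrings, here is a minimal sketch. It is not the PR's implementation, which is not shown in this thread; the `(value, "docstring")` tuple convention, the `_value2member_map_` attribute, and the `Result` class are assumptions for illustration. It avoids a metaclass entirely: members are built once in `__init_subclass__`, and "enum from int" is a single dict lookup.

```python
class FastEnum(int):
    """Hypothetical int-backed enum base; a sketch, not the cuda.bindings code."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._value2member_map_ = {}
        # Snapshot the namespace because members are written back into it.
        for name, raw in list(vars(cls).items()):
            if name.startswith("_"):
                continue
            # Assumed convention: NAME = value  or  NAME = (value, "docstring").
            if isinstance(raw, tuple) and len(raw) == 2:
                value, doc = raw
            elif isinstance(raw, int):
                value, doc = raw, None
            else:
                continue
            member = int.__new__(cls, value)  # bypass the lookup in __new__ below
            member._name_ = name
            member.__doc__ = doc  # stored on the instance, so each member has its own
            cls._value2member_map_[value] = member
            setattr(cls, name, member)

    def __new__(cls, value):
        # "Enum from int": one dict lookup, no metaclass machinery involved.
        try:
            return cls._value2member_map_[value]
        except KeyError:
            raise ValueError(f"{value!r} is not a valid {cls.__name__}") from None

    @property
    def name(self):
        return self._name_

    @property
    def value(self):
        return int(self)

    def __repr__(self):
        return f"<{type(self).__name__}.{self._name_}: {int(self)}>"


class Result(FastEnum):
    """Hypothetical generated enum."""

    SUCCESS = 0, "No error (hypothetical docstring)."
    ERROR_INVALID_VALUE = 1, "An argument was out of range (hypothetical docstring)."


print(repr(Result(1)), Result(1).__doc__)
```

Because members are real `int` instances, `int(member)`, comparisons, and hashing fall through to `int` with no extra Python-level code.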

@copy-pr-bot (bot) commented Feb 5, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@mdboom (Contributor Author) commented Feb 5, 2026

/ok to test

@github-actions (bot) commented Feb 5, 2026

@mdboom (Contributor Author) commented Feb 6, 2026

/ok to test

@mdboom marked this pull request as ready for review February 9, 2026 20:53
@copy-pr-bot (bot) commented Feb 9, 2026

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.


@mdboom (Contributor Author) commented Feb 9, 2026

/ok to test

@mdboom requested a review from leofang February 9, 2026
@leofang added the enhancement, P0, and cuda.bindings labels Feb 9, 2026
Comment on lines +11 to +12
It supports the most important subset of the IntEnum API. See `test_enum` in
`cuda_bindings/tests/test_basics.py` for details.
Collaborator:

Based on this comment, I assume we're only supporting a subset of `IntEnum`, and given that we're returning these, wouldn't that constitute an API break?

I.e., because we previously returned `IntEnum` members as error codes, code that does something like `isinstance(value, enum.IntEnum)` would start returning False?

Member:

Yes. But moving away from `Enum`/`IntEnum` is our only route to better performance on the error-handling side. I think this duck typing is good enough, and the break due to type checking seems theoretical. Users would most likely write `isinstance(err, CUresult)` (which is non-breaking) rather than `isinstance(err, Enum)`.
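The compatibility trade-off being discussed can be reproduced with a plain `int` subclass standing in for the new enum type. `CUresultStandIn` and `OldCUresult` below are hypothetical stand-ins, not the generated `cuda.bindings` classes: checks against the class itself, against `int`, and by value keep working, while checks against `enum.IntEnum` and `enum.Enum` stop matching.

```python
import enum


class CUresultStandIn(int):
    """Hypothetical stand-in for an int-based, non-stdlib enum class."""


class OldCUresult(enum.IntEnum):
    """Stdlib-based version, like the enums returned before this change."""

    CUDA_SUCCESS = 0


old_err = OldCUresult.CUDA_SUCCESS
new_err = CUresultStandIn(0)

# Each row: (check on the old IntEnum member, same check on the new member).
rows = {
    "isinstance(err, <own class>)": (isinstance(old_err, OldCUresult), isinstance(new_err, CUresultStandIn)),
    "isinstance(err, int)": (isinstance(old_err, int), isinstance(new_err, int)),
    "err == 0": (old_err == 0, new_err == 0),
    "isinstance(err, enum.IntEnum)": (isinstance(old_err, enum.IntEnum), isinstance(new_err, enum.IntEnum)),
    "isinstance(err, enum.Enum)": (isinstance(old_err, enum.Enum), isinstance(new_err, enum.Enum)),
}
for check, (old, new) in rows.items():
    print(f"{check:32} old={old!s:<5} new={new}")
```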

Contributor Author:

Yeah, I'm probably being overly cautious with the word "subset" here. I carefully looked at everything `IntEnum` exposes and replicated it here. The test even exercises both the stdlib and our implementation with the same test code. You are right that the biggest visible change is the `isinstance(value, enum.IntEnum)` behavior, but I agree with @leofang that it seems unlikely to matter.

In any event, I'll add a CHANGELOG entry here.

Collaborator:

If we inherited from `enum.IntEnum` instead of `int`, would that nullify some of the performance gains?

Could we use `__instancecheck__` to make `isinstance(x, enum.IntEnum)` and `isinstance(x, enum.Enum)` work as they did previously?

Either of these would likely limit the breakage to code using anti-patterns such as checking `type(...)` directly.

Contributor Author (@mdboom, Feb 10, 2026):

> If we inherited from `enum.IntEnum` instead of `int`, would that nullify some of the performance gains?

Yeah, unfortunately you can't inherit from the value type (`IntEnum`) without also bringing in its metaclass and all of its startup cost:

TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
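That error is what Python raises when a class's explicitly requested metaclass and the metaclass of one of its bases are unrelated. A minimal reproduction, assuming the fast enum uses its own lightweight metaclass (`FastMeta` is a hypothetical name, not taken from the PR):

```python
import enum


class FastMeta(type):
    """Hypothetical lightweight metaclass that a fast enum might use."""


def try_mixing_with_intenum():
    """Return the TypeError message raised when mixing FastMeta with IntEnum."""
    try:
        class Mixed(enum.IntEnum, metaclass=FastMeta):
            A = 1
    except TypeError as exc:
        return str(exc)
    return None


print(try_mixing_with_intenum())
# Every IntEnum subclass is created by IntEnum's own metaclass.
print(type(enum.IntEnum))
```

`type(enum.IntEnum)` is `enum.EnumMeta` (spelled `enum.EnumType` since Python 3.11), so any `IntEnum` subclass pays for that metaclass's class-creation work.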

> Could we use `__instancecheck__` to make `isinstance(x, enum.IntEnum)` and `isinstance(x, enum.Enum)` work as they did previously?

Unfortunately not. `__instancecheck__` is looked up on the type (metaclass) of the class you are testing against, not on the instance's class. In other words, to make `isinstance(FastEnum.VALUE, enum.IntEnum)` work, we would have to monkeypatch an `__instancecheck__` onto `enum.IntEnum`'s metaclass.
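A small illustration of where the hook has to live: `isinstance(obj, C)` consults `type(C).__instancecheck__`, so a hook on a (hypothetical) metaclass `DuckMeta` takes effect, while the same method defined on an ordinary class `Ignored` is never consulted for `isinstance(x, Ignored)`.

```python
class DuckMeta(type):
    # isinstance(obj, C) looks up __instancecheck__ on type(C), i.e. this metaclass.
    def __instancecheck__(cls, instance):
        return isinstance(instance, int)


class IntLike(metaclass=DuckMeta):
    """Hypothetical class that 'accepts' any int through its metaclass hook."""


class Ignored:
    # Defined on the class itself, this is NOT used by isinstance(x, Ignored).
    def __instancecheck__(self, instance):
        return True


print(isinstance(5, IntLike), isinstance(5, Ignored))
```

This is why making `isinstance(member, enum.IntEnum)` succeed requires touching `type(enum.IntEnum)` rather than anything on the fast enum class.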

This GitHub code search (regex `isinstance\(.*, IntEnum\)` in files that also contain the word `cuda`) turns up a few potentially problematic call sites. I guess we need to decide whether the performance matters enough to be worth the pain of helping our users fix those few places.
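A similar search can be run locally over a checkout of a downstream project. The helpers below (`find_intenum_checks`, `scan_tree`) are hypothetical and not part of cuda.bindings; they reuse the regex from the GitHub search and, like that search, only report files that also mention `cuda`.

```python
import re
from pathlib import Path

# Same pattern as the GitHub code search.
INTENUM_CHECK = re.compile(r"isinstance\(.*, IntEnum\)")


def find_intenum_checks(text):
    """Return (line_number, line) pairs matching the pattern, if the text mentions cuda."""
    if "cuda" not in text:
        return []
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if INTENUM_CHECK.search(line)
    ]


def scan_tree(root="."):
    """Yield (path, line_number, line) for every suspicious check under root."""
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in find_intenum_checks(text):
            yield path, lineno, line
```

For example, `for path, lineno, line in scan_tree("path/to/checkout"): print(f"{path}:{lineno}: {line}")`. As written, the pattern does not match the qualified spelling `isinstance(x, enum.IntEnum)`; a variant such as `isinstance\(.*,\s*(enum\.)?IntEnum\)` would catch both.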

Contributor Author:

Monkeypatching `__instancecheck__` on `IntEnum`'s metaclass does work, at some cost in performance:

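+------------------------------+-------------------+-----------------------+
| Benchmark                    | unpatched         | monkeypatched         |
+==============================+===================+=======================+
| isinstance(enum, IntEnum)    | 48.1 ns           | 80.6 ns: 1.68x slower |
+------------------------------+-------------------+-----------------------+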

If performance matters to our users, we can suggest updating their code to isinstance(enum, FastEnum), which has no penalty from the monkeypatching.

Maybe this is an ok solution for backward compatibility.

Contributor Author:

@leofang, @kkraus14: So I think we just need to decide:

  1. We merge this as-is, so `isinstance(enum, IntEnum)` returns False for our enums.
  2. We include a monkeypatch to make `isinstance(enum, IntEnum)` work. This makes that `isinstance` check about 68% slower (though that check mostly appears in generic framework code, not in user code), but we wouldn't break dependent libraries. As with all monkeypatches, unintended consequences are a little hard to reason about. I /think/ this is safe, but I wouldn't bet my life on it.
  3. We don't do this whole thing right now. (And, as a reminder to myself, I'd revert the generator changes if that's what we decide.)

This is what (2) looks like:

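```python
from enum import Enum, IntEnum

# FastEnum: the PR's int-based enum base class (import path not shown here).
_default_instancecheck = type(IntEnum).__instancecheck__

def __instancecheck__(cls, instance) -> bool:
    # Widen only the generic bases; every other enum class keeps the default check.
    if cls in (Enum, IntEnum) and isinstance(instance, FastEnum):
        return True
    return _default_instancecheck(cls, instance)

type(IntEnum).__instancecheck__ = __instancecheck__
```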
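Because the patch is global, one way to reason about option (2) is to exercise it in isolation. The sketch below uses a hypothetical stand-in `FastEnum` (a bare `int` subclass) and a hypothetical context manager, `intenum_compat_patch`, that installs a hook scoped to `enum.Enum` and `enum.IntEnum` on `type(enum.IntEnum)` and removes it afterwards. It checks the intended effect and two side effects that a hook ignoring `cls` would get wrong: a plain `enum.Enum` member must still match `enum.Enum`, and a `FastEnum` member must not match an unrelated `IntEnum` subclass.

```python
import contextlib
import enum


class FastEnum(int):
    """Hypothetical stand-in for the PR's int-based enum base class."""


class Color(enum.Enum):
    RED = 1


class Level(enum.IntEnum):
    LOW = 1


@contextlib.contextmanager
def intenum_compat_patch():
    """Temporarily install a scoped __instancecheck__ on IntEnum's metaclass."""
    meta = type(enum.IntEnum)
    own = vars(meta).get("__instancecheck__")  # None if inherited from type
    default = meta.__instancecheck__

    def __instancecheck__(cls, instance):
        # Widen only the generic bases; every other enum class keeps the default check.
        if cls in (enum.Enum, enum.IntEnum) and isinstance(instance, FastEnum):
            return True
        return default(cls, instance)

    meta.__instancecheck__ = __instancecheck__
    try:
        yield
    finally:
        if own is None:
            del meta.__instancecheck__
        else:
            meta.__instancecheck__ = own


member = FastEnum(0)
with intenum_compat_patch():
    patched = (
        isinstance(member, enum.IntEnum),  # intended effect
        isinstance(member, enum.Enum),  # intended effect
        isinstance(member, Level),  # unrelated IntEnum: should stay False
        isinstance(Color.RED, enum.Enum),  # plain Enum member: should stay True
    )
after = isinstance(member, enum.IntEnum)  # hook removed again
print(patched, after)
```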

Member:

I vote for 1 and if there is any bug report we do 2 in a patch release.

Collaborator:

+1. I'd vote for 1 and then fall back to 2 if required based on user feedback.

@mdboom (Contributor Author) commented Feb 10, 2026

/ok to test

Labels

cuda.bindings (Everything related to the cuda.bindings module), enhancement (Any code-related improvements), P0 (High priority - Must do!)

4 participants