Skip to content

Conversation

@Andy-Jost
Copy link
Contributor

Summary

Cythonize the _linker module, continuing the work from #1565 (Program) to complete #1079. This moves all linker backend calls (nvJitLink and cuLink) to the C level, releases the GIL during linking operations, and replaces the weakref/MNFF cleanup with RAII resource handles.

Changes

  • Rename _linker.py to _linker.pyx with a cdef class Linker
  • Add NvJitLinkHandle and CuLinkHandle RAII wrappers to the resource handle infrastructure, including HANDLE_RETURN_NVJITLINK for standardized nvJitLink error handling with log annotation
  • Replace all Python-level cuda.bindings.nvjitlink / cuda.bindings.driver calls with C-level cynvjitlink / cydriver calls wrapped in nogil blocks
  • Remove weakref.finalize / MNFF cleanup in favor of shared_ptr-based RAII handles
  • Reorganize the module per the developer guide: principal class first, cdef inline helpers at the end, proper import ordering, __all__, module docstring
  • Type the Linker _linker member in _program.pxd (was object)
  • Update release notes and breaking change note (add cynvjitlink to the 12.9.0 build requirement rationale)

Performance

The Linker constructor + link cycle is dominated by the nvJitLink backend (~3.5 ms), so the Python overhead reduction is not visible in wall-clock benchmarks. The primary benefit is GIL release during those ~3.5 ms backend calls, enabling concurrent work in multithreaded applications.

For reference, the Program constructor (from #1565) saw a ~44% reduction (9.4 -> 5.3 us) where the backend overhead was much smaller.

Test Coverage

  • All existing test_linker.py and test_program.py tests pass (98 passed, 5 skipped, 1 xfailed)
  • Full test suite: 1770 passed, 93 skipped, 2 xfailed

Closes #1079.

Made with Cursor

Andy-Jost and others added 21 commits February 3, 2026 16:24
- Rename _program.py to _program.pyx
- Convert Program to cdef class with _program.pxd declarations
- Extract _MembersNeededForFinalize to module-level _ProgramMNFF
  (nested classes not allowed in cdef class)
- Add __repr__ method to Program
- Keep ProgramOptions as @DataClass (unchanged)
- Keep weakref.finalize pattern for handle cleanup
- Move _translate_program_options to Program_translate_options (cdef)
- Move _can_load_generated_ptx to Program_can_load_generated_ptx (cdef)
- Remove unused TYPE_CHECKING import block
- Follow _memory/_buffer.pyx helper function patterns
- Reorganize file structure per developer guide (principal class first)
- Add module docstring, __all__, type alias section
- Factor long methods into cdef inline helpers
- Add proper exception specs to cdef functions
- Fix docstrings (use :class: refs, public paths)
- Add type annotations to public methods
- Inline _nvvm_exception_manager (single use)
- Remove Union import, use | syntax
- Add public Program.driver_can_load_nvrtc_ptx_output() API
- Update tests to use new public API

Closes NVIDIA#1082
Add fixtures for different Program backends (NVRTC, PTX, NVVM) and
ObjectCode code types (cubin, PTX, LTOIR). Split API_TYPES into more
precise HASH_TYPES, EQ_TYPES, and WEAKREF_TYPES lists. Derive
DICT_KEY_TYPES and WEAK_KEY_TYPES for collection tests.
- Add NvrtcProgramHandle and NvvmProgramHandle to resource handles module
- Add function pointer initialization for nvrtcDestroyProgram and nvvmDestroyProgram
- Forward-declare nvvmProgram to avoid nvvm.h dependency
- Refactor detail::make_py to accept module name parameter
- Remove _ProgramMNFF class from _program.pyx
- Program now uses typed handles directly with RAII cleanup
- Update handle property to return None when handle is null
- Add NVVMError exception class
- Add HANDLE_RETURN_NVRTC for nogil NVRTC error handling with program log
- Add HANDLE_RETURN_NVVM for nogil NVVM error handling with program log
- Remove vestigial supported_error_type fused type
- Simplify HANDLE_RETURN to directly take cydriver.CUresult
- Change cdef function return types from ObjectCode to object (Cython limitation)
- Remove unused imports: intptr_t, NvrtcProgramHandle, NvvmProgramHandle, as_intptr
- Update as_py(NvvmProgramHandle) to return Python int via PyLong_FromSsize_t
- Update test assertions: remove handle checks after close(), test idempotency instead
- Update NVVM error message regex to match new unified format
… SPDX dates

- Remove Program.driver_can_load_nvrtc_ptx_output() public static method
- Make _can_load_generated_ptx a cpdef (callable from Python tests)
- Update tests to import _can_load_generated_ptx from cuda.core._program
- Update SPDX copyright years to 2024-2026 for files with 2024-2025
- Update get_kernel docstring: name parameter is str | bytes

Co-authored-by: Cursor <cursoragent@cursor.com>
…error helpers

- Make NVVMError inherit from cuda.bindings.nvvm.nvvmError for compatibility
  with code catching nvvmError, while also inheriting from CUDAError
- Simplify HANDLE_RETURN_NVRTC and HANDLE_RETURN_NVVM to just check success
  and delegate to helper functions
- Move all error handling logic into _raise_nvrtc_error and _raise_nvvm_error

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Rename _linker.py to _linker.pyx with cdef class Linker. Create
_linker.pxd with typed attribute declarations. Move inner
_MembersNeededForFinalize class to module level.

No behavior change; all existing tests pass.

Co-authored-by: Cursor <cursoragent@cursor.com>
Add resource handle infrastructure for linker cythonization:

- NvJitLinkHandle and CuLinkHandle in resource_handles.hpp/.cpp with
  shared_ptr RAII ownership and automatic destroy on release.
- TaggedHandle<T, int> template to make void*-based handle types
  (NvvmProgramHandle, NvJitLinkHandle) distinct for C++ overloading.
- Factory functions, as_cu/as_intptr/as_py accessors, and function
  pointer initialization for nvJitLinkDestroy and cuLinkDestroy.
- HANDLE_RETURN_NVJITLINK in cuda_utils for nvJitLink error handling
  with automatic error log retrieval.

Pure infrastructure; no changes to _linker.pyx yet.

Co-authored-by: Cursor <cursoragent@cursor.com>
The Linker class now holds NvJitLinkHandle and CuLinkHandle shared_ptrs
directly as cdef attributes, replacing the _LinkerMembersNeededForFinalize
helper class and weakref.finalize destructor pattern. close() simply
resets the shared_ptr. Also removes _const_char_keep_alive (unnecessary)
and TYPE_CHECKING block.

Co-authored-by: Cursor <cursoragent@cursor.com>
Both nvJitLink and driver paths now use cynvjitlink/cydriver C-level
calls directly, releasing the GIL around expensive operations. Option
arrays built with std::vector (RAII, no manual malloc/free). Removed
_exception_manager context manager (was too broad, caught TypeError);
driver path errors annotated via targeted except CUDAError. Removed
ctypes dependency, _input_type_from_code_type (inlined), and
TYPE_CHECKING block.

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace _nvjitlink Python module global with _use_nvjitlink_backend flag.
Input type dicts now use C-level enum values (cynvjitlink/cydriver)
instead of Python enum objects. The cuda.bindings.nvjitlink module is
only imported locally for availability detection.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Add module docstring and __all__ declaration
- Reorder file: Linker (principal) -> LinkerOptions (supporting) -> cdef inline helpers -> private state
- Extract cdef inline helpers: Linker_init, Linker_add_code_object, Linker_link, Linker_annotate_error_log
- Fix import ordering per developer guide (5 groups)
- Apply c_ prefix to cdef variables for clarity
- Replace _formatted_options/option_keys with typed log members
- Cache decoded log strings after link() for efficiency
- Type Linker _linker member in _program.pxd (object -> Linker)

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

# Conflicts:
#	cuda_core/cuda/core/_cpp/resource_handles.cpp
#	cuda_core/cuda/core/_cpp/resource_handles.hpp
#	cuda_core/cuda/core/_program.pxd
#	cuda_core/cuda/core/_resource_handles.pxd
#	cuda_core/cuda/core/_resource_handles.pyx
#	cuda_core/cuda/core/_utils/cuda_utils.pxd
#	cuda_core/cuda/core/_utils/cuda_utils.pyx
#	cuda_core/docs/source/release/0.6.x-notes.rst
Remove unused cimports (CuLinkHandle, NvJitLinkHandle already declared
in .pxd) and use __import__ for the availability check to avoid an
unused-import lint error.

Co-authored-by: Cursor <cursoragent@cursor.com>
@Andy-Jost Andy-Jost added this to the cuda.core v0.6.0 milestone Feb 11, 2026
@Andy-Jost Andy-Jost added P0 High priority - Must do! cuda.core Everything related to the cuda.core module labels Feb 11, 2026
@Andy-Jost Andy-Jost self-assigned this Feb 11, 2026
@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Feb 11, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Andy-Jost
Copy link
Contributor Author

/ok to test fc8e0c0

@Andy-Jost Andy-Jost marked this pull request as draft February 11, 2026 20:26
@github-actions
Copy link

@Andy-Jost Andy-Jost requested a review from leofang February 11, 2026 23:57
@Andy-Jost Andy-Jost marked this pull request as ready for review February 11, 2026 23:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda.core Everything related to the cuda.core module P0 High priority - Must do!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cythonize _linker.py

1 participant