-
Notifications
You must be signed in to change notification settings - Fork 245
Cythonize _linker.py #1604
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
Andy-Jost
wants to merge
21
commits into
NVIDIA:main
Choose a base branch
from
Andy-Jost:cythonize-linker
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Cythonize _linker.py #1604
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
- Rename _program.py to _program.pyx - Convert Program to cdef class with _program.pxd declarations - Extract _MembersNeededForFinalize to module-level _ProgramMNFF (nested classes not allowed in cdef class) - Add __repr__ method to Program - Keep ProgramOptions as @DataClass (unchanged) - Keep weakref.finalize pattern for handle cleanup
- Move _translate_program_options to Program_translate_options (cdef) - Move _can_load_generated_ptx to Program_can_load_generated_ptx (cdef) - Remove unused TYPE_CHECKING import block - Follow _memory/_buffer.pyx helper function patterns
- Reorganize file structure per developer guide (principal class first) - Add module docstring, __all__, type alias section - Factor long methods into cdef inline helpers - Add proper exception specs to cdef functions - Fix docstrings (use :class: refs, public paths) - Add type annotations to public methods - Inline _nvvm_exception_manager (single use) - Remove Union import, use | syntax - Add public Program.driver_can_load_nvrtc_ptx_output() API - Update tests to use new public API Closes NVIDIA#1082
Add fixtures for different Program backends (NVRTC, PTX, NVVM) and ObjectCode code types (cubin, PTX, LTOIR). Split API_TYPES into more precise HASH_TYPES, EQ_TYPES, and WEAKREF_TYPES lists. Derive DICT_KEY_TYPES and WEAK_KEY_TYPES for collection tests.
- Add NvrtcProgramHandle and NvvmProgramHandle to resource handles module - Add function pointer initialization for nvrtcDestroyProgram and nvvmDestroyProgram - Forward-declare nvvmProgram to avoid nvvm.h dependency - Refactor detail::make_py to accept module name parameter - Remove _ProgramMNFF class from _program.pyx - Program now uses typed handles directly with RAII cleanup - Update handle property to return None when handle is null
- Add NVVMError exception class - Add HANDLE_RETURN_NVRTC for nogil NVRTC error handling with program log - Add HANDLE_RETURN_NVVM for nogil NVVM error handling with program log - Remove vestigial supported_error_type fused type - Simplify HANDLE_RETURN to directly take cydriver.CUresult
- Change cdef function return types from ObjectCode to object (Cython limitation) - Remove unused imports: intptr_t, NvrtcProgramHandle, NvvmProgramHandle, as_intptr - Update as_py(NvvmProgramHandle) to return Python int via PyLong_FromSsize_t - Update test assertions: remove handle checks after close(), test idempotency instead - Update NVVM error message regex to match new unified format
… SPDX dates - Remove Program.driver_can_load_nvrtc_ptx_output() public static method - Make _can_load_generated_ptx a cpdef (callable from Python tests) - Update tests to import _can_load_generated_ptx from cuda.core._program - Update SPDX copyright years to 2024-2026 for files with 2024-2025 - Update get_kernel docstring: name parameter is str | bytes Co-authored-by: Cursor <cursoragent@cursor.com>
…error helpers - Make NVVMError inherit from cuda.bindings.nvvm.nvvmError for compatibility with code catching nvvmError, while also inheriting from CUDAError - Simplify HANDLE_RETURN_NVRTC and HANDLE_RETURN_NVVM to just check success and delegate to helper functions - Move all error handling logic into _raise_nvrtc_error and _raise_nvvm_error Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Rename _linker.py to _linker.pyx with cdef class Linker. Create _linker.pxd with typed attribute declarations. Move inner _MembersNeededForFinalize class to module level. No behavior change; all existing tests pass. Co-authored-by: Cursor <cursoragent@cursor.com>
Add resource handle infrastructure for linker cythonization: - NvJitLinkHandle and CuLinkHandle in resource_handles.hpp/.cpp with shared_ptr RAII ownership and automatic destroy on release. - TaggedHandle<T, int> template to make void*-based handle types (NvvmProgramHandle, NvJitLinkHandle) distinct for C++ overloading. - Factory functions, as_cu/as_intptr/as_py accessors, and function pointer initialization for nvJitLinkDestroy and cuLinkDestroy. - HANDLE_RETURN_NVJITLINK in cuda_utils for nvJitLink error handling with automatic error log retrieval. Pure infrastructure; no changes to _linker.pyx yet. Co-authored-by: Cursor <cursoragent@cursor.com>
The Linker class now holds NvJitLinkHandle and CuLinkHandle shared_ptrs directly as cdef attributes, replacing the _LinkerMembersNeededForFinalize helper class and weakref.finalize destructor pattern. close() simply resets the shared_ptr. Also removes _const_char_keep_alive (unnecessary) and TYPE_CHECKING block. Co-authored-by: Cursor <cursoragent@cursor.com>
Both nvJitLink and driver paths now use cynvjitlink/cydriver C-level calls directly, releasing the GIL around expensive operations. Option arrays built with std::vector (RAII, no manual malloc/free). Removed _exception_manager context manager (was too broad, caught TypeError); driver path errors annotated via targeted except CUDAError. Removed ctypes dependency, _input_type_from_code_type (inlined), and TYPE_CHECKING block. Co-authored-by: Cursor <cursoragent@cursor.com>
Replace _nvjitlink Python module global with _use_nvjitlink_backend flag. Input type dicts now use C-level enum values (cynvjitlink/cydriver) instead of Python enum objects. The cuda.bindings.nvjitlink module is only imported locally for availability detection. Co-authored-by: Cursor <cursoragent@cursor.com>
- Add module docstring and __all__ declaration - Reorder file: Linker (principal) -> LinkerOptions (supporting) -> cdef inline helpers -> private state - Extract cdef inline helpers: Linker_init, Linker_add_code_object, Linker_link, Linker_annotate_error_log - Fix import ordering per developer guide (5 groups) - Apply c_ prefix to cdef variables for clarity - Replace _formatted_options/option_keys with typed log members - Cache decoded log strings after link() for efficiency - Type Linker _linker member in _program.pxd (object -> Linker) Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com> # Conflicts: # cuda_core/cuda/core/_cpp/resource_handles.cpp # cuda_core/cuda/core/_cpp/resource_handles.hpp # cuda_core/cuda/core/_program.pxd # cuda_core/cuda/core/_resource_handles.pxd # cuda_core/cuda/core/_resource_handles.pyx # cuda_core/cuda/core/_utils/cuda_utils.pxd # cuda_core/cuda/core/_utils/cuda_utils.pyx # cuda_core/docs/source/release/0.6.x-notes.rst
Remove unused cimports (CuLinkHandle, NvJitLinkHandle already declared in .pxd) and use __import__ for the availability check to avoid an unused-import lint error. Co-authored-by: Cursor <cursoragent@cursor.com>
Contributor
Contributor
Author
|
/ok to test fc8e0c0 |
|
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
Cythonize the
_linkermodule, continuing the work from #1565 (Program) to complete #1079. This moves all linker backend calls (nvJitLink and cuLink) to the C level, releases the GIL during linking operations, and replaces the weakref/MNFF cleanup with RAII resource handles.Changes
_linker.pyto_linker.pyxwith acdef class LinkerNvJitLinkHandleandCuLinkHandleRAII wrappers to the resource handle infrastructure, includingHANDLE_RETURN_NVJITLINKfor standardized nvJitLink error handling with log annotationcuda.bindings.nvjitlink/cuda.bindings.drivercalls with C-levelcynvjitlink/cydrivercalls wrapped innogilblocksweakref.finalize/ MNFF cleanup in favor ofshared_ptr-based RAII handlescdef inlinehelpers at the end, proper import ordering,__all__, module docstringLinker _linkermember in_program.pxd(wasobject)cynvjitlinkto the 12.9.0 build requirement rationale)Performance
The Linker constructor + link cycle is dominated by the nvJitLink backend (~3.5 ms), so the Python overhead reduction is not visible in wall-clock benchmarks. The primary benefit is GIL release during those ~3.5 ms backend calls, enabling concurrent work in multithreaded applications.
For reference, the Program constructor (from #1565) saw a ~44% reduction (9.4 -> 5.3 us) where the backend overhead was much smaller.
Test Coverage
test_linker.pyandtest_program.pytests pass (98 passed, 5 skipped, 1 xfailed)Closes #1079.
Made with Cursor