Implement CUDA support and GPU operations for tensor processing #8
Merged
Alwaysproblem merged 34 commits into main on Mar 8, 2026
Pull request overview
This PR adds a CUDA/GPU-oriented path to the `mlir/cuda-tile` Toy-based compiler flow, including a new `matmul` op, GPU outlining, and a pass to emit/embed CUDA Tile binaries, plus assorted scripts and sample MLIR/Toy programs to exercise the pipeline.
Changes:
- Add Toy dialect/compiler extensions: `matmul` op + lowering, GPU outlining (`toy.launch_gpu`/`toy.gpu_func`), and a CUDA Tile emission/embedding pass.
- Add build/sync scripts, devcontainer configuration, and VS Code configs to stand up a CUDA/MLIR development environment.
- Add sample `.toy`/`.mlir` programs and CUDA shim/kernel sources demonstrating GPU execution.
Reviewed changes
Copilot reviewed 73 out of 73 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| mlir/cuda-tile/vscode/settings.json | VS Code CMake/C++ tooling configuration for the project. |
| mlir/cuda-tile/vscode/launch.json | VS Code debug launch configuration. |
| mlir/cuda-tile/vscode/cmake-kits.json | CMake Tools kit definition for the workspace. |
| mlir/cuda-tile/vscode/c_cpp_properties.json | IntelliSense configuration for the workspace. |
| mlir/cuda-tile/vscode/.zsh_history | Added shell history file (should not be committed). |
| mlir/cuda-tile/vscode/.initial_container.sh | Helper script to start a CUDA-enabled dev container. |
| mlir/cuda-tile/scripts/update.sh | Script to sync/update tutorial sources from LLVM examples. |
| mlir/cuda-tile/scripts/sync_deps.sh | Script to clone/sync LLVM + cuda-tile dependencies. |
| mlir/cuda-tile/scripts/patch/matmul.patch | Patch capturing `matmul`-related changes against upstream tutorial chapters. |
| mlir/cuda-tile/scripts/patch/matmul.back.patch | Alternate/back patch variant for `matmul` changes. |
| mlir/cuda-tile/scripts/make_patch.sh | Script to generate the chapter diff patch. |
| mlir/cuda-tile/scripts/build_deps.sh | Script to configure/build/install LLVM+MLIR with CUDA runner enabled. |
| mlir/cuda-tile/scripts/build_cuda_tile.sh | Script to build/install the external `cuda-tile` dependency. |
| mlir/cuda-tile/scripts/apply_patch.sh | Script to apply the provided chapter patch to a local copy. |
| mlir/cuda-tile/sample/validation.py | Numpy validation snippet for matmul results. |
| mlir/cuda-tile/sample/test.mlir | Sample MLIR module using CUDA shim calls. |
| mlir/cuda-tile/sample/matmul_numpy.py | Numpy reference implementation for Toy examples. |
| mlir/cuda-tile/sample/matmul.toy.mlir | Sample Toy-MLIR for a matmul/transpose case. |
| mlir/cuda-tile/sample/matmul.toy | Sample Toy source including matmul usage. |
| mlir/cuda-tile/sample/lowering-llvm.sh | Script to lower MLIR to LLVM dialect/IR and link with CUDA shim. |
| mlir/cuda-tile/sample/gpu.mlir | Sample MLIR using `toy.gpu_func` + `toy.launch_gpu`. |
| mlir/cuda-tile/sample/gpu-func.mlir | Expanded host-side CUDA shim sample for launching a kernel. |
| mlir/cuda-tile/sample/example.toy | Minimal Toy example program. |
| mlir/cuda-tile/sample/cuda-tile.mlir | Sample MLIR including a `cuda_tile.module` entry. |
| mlir/cuda-tile/explore/run.sh | Experimental script for GPU lowering to NVVM/LLVM IR. |
| mlir/cuda-tile/explore/outlined.mlir | Sample MLIR with GPU kernel outlining results. |
| mlir/cuda-tile/explore/gpu.mlir | Experimental MLIR showing `gpu.*` dialect usage. |
| mlir/cuda-tile/explore/extern_fun.mlir | Experimental MLIR for external/shim function calls. |
| mlir/cuda-tile/cuda_shim/vector_add.cu | CUDA kernel source used for PTX generation testing. |
| mlir/cuda-tile/cuda_shim/outlined_gpu_kernel.cu | CUDA kernels corresponding to outlined Toy GPU subgraphs. |
| mlir/cuda-tile/cuda_shim/load_ptx_main.cpp | Minimal C++ demo to load PTX and launch via shim ABI. |
| mlir/cuda-tile/build_with_conda.sh | Build helper for conda-based environments. |
| mlir/cuda-tile/build.sh | Build helper for non-conda environments. |
| mlir/cuda-tile/Toy/parser/AST.cpp | Toy AST dumper implementation (copied/ported from tutorial). |
| mlir/cuda-tile/Toy/mlir/ToyCombine.td | TableGen DRR patterns for Toy canonicalization. |
| mlir/cuda-tile/Toy/mlir/ToyCombine.cpp | C++ canonicalization patterns registration. |
| mlir/cuda-tile/Toy/mlir/ShapeInferencePass.cpp | Shape inference pass implementation. |
| mlir/cuda-tile/Toy/mlir/MLIRGen.cpp | MLIR generation updates including `matmul` emission. |
| mlir/cuda-tile/Toy/mlir/LowerToLLVM.cpp | Lowering pipeline from Toy/Affine/SCF to LLVM dialect. |
| mlir/cuda-tile/Toy/mlir/LowerToGpu.cpp | Pass to outline GPU-eligible Toy op subgraphs into `toy.gpu_func`. |
| mlir/cuda-tile/Toy/mlir/EmitCudaTile.cpp | Pass to write CUDA Tile bytecode, run `tileiras`, and annotate launches with CUDA binary metadata. |
| mlir/cuda-tile/Toy/include/toy/ShapeInferenceInterface.td | Shape inference op interface definition (TableGen). |
| mlir/cuda-tile/Toy/include/toy/ShapeInferenceInterface.h | Generated interface header inclusion wrapper. |
| mlir/cuda-tile/Toy/include/toy/Passes.h | Pass factory declarations (incl. GPU/cuda-tile passes). |
| mlir/cuda-tile/Toy/include/toy/Parser.h | Toy parser implementation header. |
| mlir/cuda-tile/Toy/include/toy/Ops.td | Toy ODS op definitions (adds `matmul`, `launch_gpu`, `gpu_func`). |
| mlir/cuda-tile/Toy/include/toy/MLIRGen.h | MLIRGen API header. |
| mlir/cuda-tile/Toy/include/toy/Lexer.h | Toy lexer header. |
| mlir/cuda-tile/Toy/include/toy/Dialect.h | Toy dialect + op/interface includes. |
| mlir/cuda-tile/Toy/include/toy/CMakeLists.txt | TableGen targets for Toy dialect/ops/interfaces. |
| mlir/cuda-tile/Toy/include/toy/AST.h | Toy AST node definitions. |
| mlir/cuda-tile/Toy/include/cuda_shim/SupportOps.hpp | Defines which Toy ops are considered GPU-eligible by the outlining pass. |
| mlir/cuda-tile/Toy/include/CMakeLists.txt | Adds Toy include subdirectory. |
| mlir/cuda-tile/Toy/cuda_wrapper/CMakeLists.txt | Builds the CUDA shim wrapper library. |
| mlir/cuda-tile/Toy/CMakeLists.txt | Builds `toy-cuda` tool and wires MLIR/CUDA Tile deps. |
| mlir/cuda-tile/CMakeLists.txt | Top-level CMake config for the cuda-tile MLIR/Toy project. |
| mlir/cuda-tile/.pre-commit-config.yaml | Pre-commit hooks config (clang-format/cmake-format/etc.). |
| mlir/cuda-tile/.gitignore | Ignores CUDA/LLVM build artifacts and generated binaries. |
| mlir/cuda-tile/.envsetup.sh | Conda activation helper script. |
| mlir/cuda-tile/.devcontainer/noop.txt | Placeholder to satisfy devcontainer COPY steps. |
| mlir/cuda-tile/.devcontainer/devcontainer.json | Devcontainer config for CUDA/MLIR development. |
| mlir/cuda-tile/.devcontainer/Dockerfile | Devcontainer image definition with LLVM/Clang/CUDA tooling. |
| mlir/cuda-tile/.clang-format | Clang-format configuration for this subproject. |
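
The table lists `sample/validation.py` and `sample/matmul_numpy.py` as NumPy references for matmul results, and `sample/matmul.toy.mlir` as a matmul/transpose case. The contents of those files are not shown here, so the following is only a rough sketch of what such a host-side check might look like. It assumes plain Python lists instead of NumPy to stay dependency-free, and the names `matmul_reference`, `transpose`, and `all_close` are hypothetical helpers, not identifiers from the PR.

```python
import math


def matmul_reference(a, b):
    """Naive reference matmul on nested lists: (m x k) @ (k x n) -> (m x n)."""
    k = len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions do not match")
    n = len(b[0])
    return [
        [sum(row[p] * b[p][j] for p in range(k)) for j in range(n)]
        for row in a
    ]


def transpose(a):
    """Return the transpose of a rectangular nested list."""
    return [list(col) for col in zip(*a)]


def all_close(x, y, rel_tol=1e-6, abs_tol=1e-9):
    """Element-wise tolerance comparison of two equally shaped nested lists."""
    if len(x) != len(y):
        return False
    return all(
        len(rx) == len(ry)
        and all(math.isclose(p, q, rel_tol=rel_tol, abs_tol=abs_tol)
                for p, q in zip(rx, ry))
        for rx, ry in zip(x, y)
    )


# Hypothetical input: a 2x3 matrix multiplied by its own transpose.
a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
expected = matmul_reference(a, transpose(a))
```

In use, the values printed by the GPU path would be parsed into a nested list and passed to `all_close` together with `expected`; a tolerance comparison is used rather than exact equality because a GPU kernel may accumulate floating-point sums in a different order than the host reference.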
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>