Implement CUDA support and GPU operations for tensor processing #8

Merged
Alwaysproblem merged 34 commits into main from cuda-tile on Mar 8, 2026

Alwaysproblem (Owner) commented on Mar 8, 2026

Pull request overview

This PR adds a CUDA/GPU-oriented path to the mlir/cuda-tile Toy-based compiler flow, including a new matmul op, GPU outlining, and a pass to emit/embed CUDA Tile binaries, plus assorted scripts and sample MLIR/Toy programs to exercise the pipeline.

Changes:

  • Add Toy dialect/compiler extensions: matmul op + lowering, GPU outlining (toy.launch_gpu/toy.gpu_func), and a CUDA Tile emission/embedding pass.
  • Add build/sync scripts, devcontainer configuration, and VS Code configs to stand up a CUDA/MLIR development environment.
  • Add sample .toy/.mlir programs and CUDA shim/kernel sources demonstrating GPU execution.
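
To make the outlining step more concrete, here is a minimal Python sketch of the general idea: maximal runs of GPU-eligible ops are moved into separate kernel functions, and each run is replaced on the host side by a single launch. This is an illustration only and not the code in LowerToGpu.cpp. The function name `outline_gpu_runs`, the `GPU_ELIGIBLE` set, the `@gpu_func_N` naming, and the modeling of a program as a flat list of op names are all assumptions made for this example; the PR defines the real eligibility rules in SupportOps.hpp.

```python
from typing import List, Set, Tuple

# Hypothetical eligibility set, chosen for illustration only.
# The PR keeps the real list in Toy/include/cuda_shim/SupportOps.hpp.
GPU_ELIGIBLE: Set[str] = {"toy.matmul", "toy.transpose", "toy.add", "toy.mul"}


def outline_gpu_runs(
    ops: List[str], eligible: Set[str] = GPU_ELIGIBLE
) -> Tuple[List[str], List[List[str]]]:
    """Replace each maximal run of eligible ops with one launch marker.

    Returns (host_ops, kernels): host_ops is the rewritten op list and
    kernels[i] holds the ops moved into the i-th outlined function.
    """
    host_ops: List[str] = []
    kernels: List[List[str]] = []
    run: List[str] = []

    def flush() -> None:
        # Close the current run, if any, and emit a launch in its place.
        if run:
            host_ops.append(f"toy.launch_gpu @gpu_func_{len(kernels)}")
            kernels.append(list(run))
            run.clear()

    for op in ops:
        if op in eligible:
            run.append(op)  # extend the current GPU-eligible run
        else:
            flush()  # a host-only op ends the run
            host_ops.append(op)
    flush()  # handle a run that reaches the end of the program
    return host_ops, kernels
```

For example, under these assumptions the op list `["toy.constant", "toy.transpose", "toy.matmul", "toy.print"]` yields one outlined kernel containing the transpose and the matmul, with a single launch marker left between the constant and the print on the host side. A real pass works on SSA values and regions rather than a flat list, so it also has to thread operands and results across the launch boundary.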

Reviewed changes

Copilot reviewed 73 out of 73 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| mlir/cuda-tile/vscode/settings.json | VS Code CMake/C++ tooling configuration for the project. |
| mlir/cuda-tile/vscode/launch.json | VS Code debug launch configuration. |
| mlir/cuda-tile/vscode/cmake-kits.json | CMake Tools kit definition for the workspace. |
| mlir/cuda-tile/vscode/c_cpp_properties.json | IntelliSense configuration for the workspace. |
| mlir/cuda-tile/vscode/.zsh_history | Added shell history file (should not be committed). |
| mlir/cuda-tile/vscode/.initial_container.sh | Helper script to start a CUDA-enabled dev container. |
| mlir/cuda-tile/scripts/update.sh | Script to sync/update tutorial sources from LLVM examples. |
| mlir/cuda-tile/scripts/sync_deps.sh | Script to clone/sync LLVM + cuda-tile dependencies. |
| mlir/cuda-tile/scripts/patch/matmul.patch | Patch capturing matmul-related changes against upstream tutorial chapters. |
| mlir/cuda-tile/scripts/patch/matmul.back.patch | Alternate/back patch variant for matmul changes. |
| mlir/cuda-tile/scripts/make_patch.sh | Script to generate the chapter diff patch. |
| mlir/cuda-tile/scripts/build_deps.sh | Script to configure/build/install LLVM+MLIR with CUDA runner enabled. |
| mlir/cuda-tile/scripts/build_cuda_tile.sh | Script to build/install the external cuda-tile dependency. |
| mlir/cuda-tile/scripts/apply_patch.sh | Script to apply the provided chapter patch to a local copy. |
| mlir/cuda-tile/sample/validation.py | Numpy validation snippet for matmul results. |
| mlir/cuda-tile/sample/test.mlir | Sample MLIR module using CUDA shim calls. |
| mlir/cuda-tile/sample/matmul_numpy.py | Numpy reference implementation for Toy examples. |
| mlir/cuda-tile/sample/matmul.toy.mlir | Sample Toy-MLIR for a matmul/transpose case. |
| mlir/cuda-tile/sample/matmul.toy | Sample Toy source including matmul usage. |
| mlir/cuda-tile/sample/lowering-llvm.sh | Script to lower MLIR to LLVM dialect/IR and link with CUDA shim. |
| mlir/cuda-tile/sample/gpu.mlir | Sample MLIR using toy.gpu_func + toy.launch_gpu. |
| mlir/cuda-tile/sample/gpu-func.mlir | Expanded host-side CUDA shim sample for launching a kernel. |
| mlir/cuda-tile/sample/example.toy | Minimal Toy example program. |
| mlir/cuda-tile/sample/cuda-tile.mlir | Sample MLIR including a cuda_tile.module entry. |
| mlir/cuda-tile/explore/run.sh | Experimental script for GPU lowering to NVVM/LLVM IR. |
| mlir/cuda-tile/explore/outlined.mlir | Sample MLIR with GPU kernel outlining results. |
| mlir/cuda-tile/explore/gpu.mlir | Experimental MLIR showing gpu.* dialect usage. |
| mlir/cuda-tile/explore/extern_fun.mlir | Experimental MLIR for external/shim function calls. |
| mlir/cuda-tile/cuda_shim/vector_add.cu | CUDA kernel source used for PTX generation testing. |
| mlir/cuda-tile/cuda_shim/outlined_gpu_kernel.cu | CUDA kernels corresponding to outlined Toy GPU subgraphs. |
| mlir/cuda-tile/cuda_shim/load_ptx_main.cpp | Minimal C++ demo to load PTX and launch via shim ABI. |
| mlir/cuda-tile/build_with_conda.sh | Build helper for conda-based environments. |
| mlir/cuda-tile/build.sh | Build helper for non-conda environments. |
| mlir/cuda-tile/Toy/parser/AST.cpp | Toy AST dumper implementation (copied/ported from tutorial). |
| mlir/cuda-tile/Toy/mlir/ToyCombine.td | TableGen DRR patterns for Toy canonicalization. |
| mlir/cuda-tile/Toy/mlir/ToyCombine.cpp | C++ canonicalization patterns registration. |
| mlir/cuda-tile/Toy/mlir/ShapeInferencePass.cpp | Shape inference pass implementation. |
| mlir/cuda-tile/Toy/mlir/MLIRGen.cpp | MLIR generation updates including matmul emission. |
| mlir/cuda-tile/Toy/mlir/LowerToLLVM.cpp | Lowering pipeline from Toy/Affine/SCF to LLVM dialect. |
| mlir/cuda-tile/Toy/mlir/LowerToGpu.cpp | Pass to outline GPU-eligible Toy op subgraphs into toy.gpu_func. |
| mlir/cuda-tile/Toy/mlir/EmitCudaTile.cpp | Pass to write CUDA Tile bytecode, run tileiras, and annotate launches with CUDA binary metadata. |
| mlir/cuda-tile/Toy/include/toy/ShapeInferenceInterface.td | Shape inference op interface definition (TableGen). |
| mlir/cuda-tile/Toy/include/toy/ShapeInferenceInterface.h | Generated interface header inclusion wrapper. |
| mlir/cuda-tile/Toy/include/toy/Passes.h | Pass factory declarations (incl. GPU/cuda-tile passes). |
| mlir/cuda-tile/Toy/include/toy/Parser.h | Toy parser implementation header. |
| mlir/cuda-tile/Toy/include/toy/Ops.td | Toy ODS op definitions (adds matmul, launch_gpu, gpu_func). |
| mlir/cuda-tile/Toy/include/toy/MLIRGen.h | MLIRGen API header. |
| mlir/cuda-tile/Toy/include/toy/Lexer.h | Toy lexer header. |
| mlir/cuda-tile/Toy/include/toy/Dialect.h | Toy dialect + op/interface includes. |
| mlir/cuda-tile/Toy/include/toy/CMakeLists.txt | TableGen targets for Toy dialect/ops/interfaces. |
| mlir/cuda-tile/Toy/include/toy/AST.h | Toy AST node definitions. |
| mlir/cuda-tile/Toy/include/cuda_shim/SupportOps.hpp | Defines which Toy ops are considered GPU-eligible by the outlining pass. |
| mlir/cuda-tile/Toy/include/CMakeLists.txt | Adds Toy include subdirectory. |
| mlir/cuda-tile/Toy/cuda_wrapper/CMakeLists.txt | Builds the CUDA shim wrapper library. |
| mlir/cuda-tile/Toy/CMakeLists.txt | Builds toy-cuda tool and wires MLIR/CUDA Tile deps. |
| mlir/cuda-tile/CMakeLists.txt | Top-level CMake config for the cuda-tile MLIR/Toy project. |
| mlir/cuda-tile/.pre-commit-config.yaml | Pre-commit hooks config (clang-format/cmake-format/etc.). |
| mlir/cuda-tile/.gitignore | Ignores CUDA/LLVM build artifacts and generated binaries. |
| mlir/cuda-tile/.envsetup.sh | Conda activation helper script. |
| mlir/cuda-tile/.devcontainer/noop.txt | Placeholder to satisfy devcontainer COPY steps. |
| mlir/cuda-tile/.devcontainer/devcontainer.json | Devcontainer config for CUDA/MLIR development. |
| mlir/cuda-tile/.devcontainer/Dockerfile | Devcontainer image definition with LLVM/Clang/CUDA tooling. |
| mlir/cuda-tile/.clang-format | Clang-format configuration for this subproject. |
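
The summary lists sample/validation.py as a Numpy validation snippet for matmul results, but its contents are not shown in this conversation. As a rough idea of what such a check involves, the sketch below compares a candidate result against a naive reference product using only the Python standard library. The names `matmul_ref` and `matches`, the tolerances, and the nested-list matrix representation are assumptions for this example and are not taken from the PR.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul_ref(a: Matrix, b: Matrix) -> Matrix:
    """Naive O(n^3) reference product of a (m x k) and b (k x n)."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def matches(actual: Matrix, expected: Matrix,
            rel_tol: float = 1e-5, abs_tol: float = 1e-8) -> bool:
    """Element-wise comparison with a tolerance, since GPU and CPU
    floating-point results may differ in the last few bits."""
    if len(actual) != len(expected):
        return False
    for got_row, want_row in zip(actual, expected):
        if len(got_row) != len(want_row):
            return False
        for got, want in zip(got_row, want_row):
            if not math.isclose(got, want, rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True
```

A typical use would be to copy the values printed by the compiled Toy program into `actual` and call `matches(actual, matmul_ref(a, b))`. Comparing with a tolerance rather than exact equality is a deliberate choice here, because accumulation order on the GPU usually differs from the reference loop.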

Alwaysproblem self-assigned this on Mar 8, 2026
Copilot AI review requested due to automatic review settings March 8, 2026 02:08
Copilot AI left a comment

Alwaysproblem and others added 3 commits March 8, 2026 17:41
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Alwaysproblem merged commit 8a28372 into main on Mar 8, 2026
0 of 3 checks passed
Alwaysproblem deleted the cuda-tile branch on March 8, 2026 10:13

2 participants