Implement CUDA support and GPU operations for tensor processing by Alwaysproblem · Pull Request #8 · Alwaysproblem/MLcompiler-tutorial

Alwaysproblem · 2026-03-08T02:08:12Z

Pull request overview

This PR adds a CUDA/GPU-oriented path to the mlir/cuda-tile Toy-based compiler flow, including a new matmul op, GPU outlining, and a pass to emit/embed CUDA Tile binaries, plus assorted scripts and sample MLIR/Toy programs to exercise the pipeline.

Changes:

Add Toy dialect/compiler extensions: matmul op + lowering, GPU outlining (toy.launch_gpu/toy.gpu_func), and a CUDA Tile emission/embedding pass.
Add build/sync scripts, devcontainer configuration, and VS Code configs to stand up a CUDA/MLIR development environment.
Add sample .toy/.mlir programs and CUDA shim/kernel sources demonstrating GPU execution.
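
The GPU outlining item above can be pictured with a small model. The sketch below is not the PR's `LowerToGpu.cpp` logic, which is not shown on this page. It assumes a straight-line list of op names and a hypothetical eligibility set (the PR's real list lives in `SupportOps.hpp`). It groups each maximal run of eligible ops into a `gpu_func_N` kernel and leaves a `toy.launch_gpu` placeholder in the host sequence.

```python
# Hypothetical eligibility set, for illustration only; the PR defines the
# real set of GPU-eligible Toy ops in cuda_shim/SupportOps.hpp.
GPU_ELIGIBLE = {"toy.matmul", "toy.add", "toy.mul", "toy.transpose"}


def outline_gpu_regions(ops, eligible=GPU_ELIGIBLE):
    """Split a straight-line op list into host ops and outlined kernels.

    Each maximal run of consecutive eligible ops becomes one kernel, and a
    "toy.launch_gpu @<kernel>" entry replaces the run in the host list.
    Returns (host_ops, kernels), where kernels maps kernel name -> op list.
    """
    host_ops = []
    kernels = {}
    run = []  # eligible ops collected since the last host-only op

    def flush():
        # Close the current run, if any, and emit a launch in its place.
        if run:
            name = f"gpu_func_{len(kernels)}"
            kernels[name] = list(run)
            host_ops.append(f"toy.launch_gpu @{name}")
            run.clear()

    for op in ops:
        if op in eligible:
            run.append(op)
        else:
            flush()
            host_ops.append(op)
    flush()  # a run may extend to the end of the list
    return host_ops, kernels


if __name__ == "__main__":
    program = ["toy.constant", "toy.transpose", "toy.matmul",
               "toy.print", "toy.add", "toy.return"]
    host, kernels = outline_gpu_regions(program)
    print(host)
    print(kernels)
```

A real outlining pass also has to track SSA operands and results across the kernel boundary; this model ignores data flow entirely and only shows the grouping idea.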

Reviewed changes

Copilot reviewed 73 out of 73 changed files in this pull request and generated 12 comments.

File	Description
mlir/cuda-tile/vscode/settings.json	VS Code CMake/C++ tooling configuration for the project.
mlir/cuda-tile/vscode/launch.json	VS Code debug launch configuration.
mlir/cuda-tile/vscode/cmake-kits.json	CMake Tools kit definition for the workspace.
mlir/cuda-tile/vscode/c_cpp_properties.json	IntelliSense configuration for the workspace.
mlir/cuda-tile/vscode/.zsh_history	Added shell history file (should not be committed).
mlir/cuda-tile/vscode/.initial_container.sh	Helper script to start a CUDA-enabled dev container.
mlir/cuda-tile/scripts/update.sh	Script to sync/update tutorial sources from LLVM examples.
mlir/cuda-tile/scripts/sync_deps.sh	Script to clone/sync LLVM + cuda-tile dependencies.
mlir/cuda-tile/scripts/patch/matmul.patch	Patch capturing `matmul`-related changes against upstream tutorial chapters.
mlir/cuda-tile/scripts/patch/matmul.back.patch	Alternate/back patch variant for `matmul` changes.
mlir/cuda-tile/scripts/make_patch.sh	Script to generate the chapter diff patch.
mlir/cuda-tile/scripts/build_deps.sh	Script to configure/build/install LLVM+MLIR with CUDA runner enabled.
mlir/cuda-tile/scripts/build_cuda_tile.sh	Script to build/install the external `cuda-tile` dependency.
mlir/cuda-tile/scripts/apply_patch.sh	Script to apply the provided chapter patch to a local copy.
mlir/cuda-tile/sample/validation.py	Numpy validation snippet for matmul results.
mlir/cuda-tile/sample/test.mlir	Sample MLIR module using CUDA shim calls.
mlir/cuda-tile/sample/matmul_numpy.py	Numpy reference implementation for Toy examples.
mlir/cuda-tile/sample/matmul.toy.mlir	Sample Toy-MLIR for a matmul/transpose case.
mlir/cuda-tile/sample/matmul.toy	Sample Toy source including matmul usage.
mlir/cuda-tile/sample/lowering-llvm.sh	Script to lower MLIR to LLVM dialect/IR and link with CUDA shim.
mlir/cuda-tile/sample/gpu.mlir	Sample MLIR using `toy.gpu_func` + `toy.launch_gpu`.
mlir/cuda-tile/sample/gpu-func.mlir	Expanded host-side CUDA shim sample for launching a kernel.
mlir/cuda-tile/sample/example.toy	Minimal Toy example program.
mlir/cuda-tile/sample/cuda-tile.mlir	Sample MLIR including a `cuda_tile.module` entry.
mlir/cuda-tile/explore/run.sh	Experimental script for GPU lowering to NVVM/LLVM IR.
mlir/cuda-tile/explore/outlined.mlir	Sample MLIR with GPU kernel outlining results.
mlir/cuda-tile/explore/gpu.mlir	Experimental MLIR showing `gpu.*` dialect usage.
mlir/cuda-tile/explore/extern_fun.mlir	Experimental MLIR for external/shim function calls.
mlir/cuda-tile/cuda_shim/vector_add.cu	CUDA kernel source used for PTX generation testing.
mlir/cuda-tile/cuda_shim/outlined_gpu_kernel.cu	CUDA kernels corresponding to outlined Toy GPU subgraphs.
mlir/cuda-tile/cuda_shim/load_ptx_main.cpp	Minimal C++ demo to load PTX and launch via shim ABI.
mlir/cuda-tile/build_with_conda.sh	Build helper for conda-based environments.
mlir/cuda-tile/build.sh	Build helper for non-conda environments.
mlir/cuda-tile/Toy/parser/AST.cpp	Toy AST dumper implementation (copied/ported from tutorial).
mlir/cuda-tile/Toy/mlir/ToyCombine.td	TableGen DRR patterns for Toy canonicalization.
mlir/cuda-tile/Toy/mlir/ToyCombine.cpp	C++ canonicalization patterns registration.
mlir/cuda-tile/Toy/mlir/ShapeInferencePass.cpp	Shape inference pass implementation.
mlir/cuda-tile/Toy/mlir/MLIRGen.cpp	MLIR generation updates including `matmul` emission.
mlir/cuda-tile/Toy/mlir/LowerToLLVM.cpp	Lowering pipeline from Toy/Affine/SCF to LLVM dialect.
mlir/cuda-tile/Toy/mlir/LowerToGpu.cpp	Pass to outline GPU-eligible Toy op subgraphs into `toy.gpu_func`.
mlir/cuda-tile/Toy/mlir/EmitCudaTile.cpp	Pass to write CUDA Tile bytecode, run `tileiras`, and annotate launches with CUDA binary metadata.
mlir/cuda-tile/Toy/include/toy/ShapeInferenceInterface.td	Shape inference op interface definition (TableGen).
mlir/cuda-tile/Toy/include/toy/ShapeInferenceInterface.h	Generated interface header inclusion wrapper.
mlir/cuda-tile/Toy/include/toy/Passes.h	Pass factory declarations (incl. GPU/cuda-tile passes).
mlir/cuda-tile/Toy/include/toy/Parser.h	Toy parser implementation header.
mlir/cuda-tile/Toy/include/toy/Ops.td	Toy ODS op definitions (adds `matmul`, `launch_gpu`, `gpu_func`).
mlir/cuda-tile/Toy/include/toy/MLIRGen.h	MLIRGen API header.
mlir/cuda-tile/Toy/include/toy/Lexer.h	Toy lexer header.
mlir/cuda-tile/Toy/include/toy/Dialect.h	Toy dialect + op/interface includes.
mlir/cuda-tile/Toy/include/toy/CMakeLists.txt	TableGen targets for Toy dialect/ops/interfaces.
mlir/cuda-tile/Toy/include/toy/AST.h	Toy AST node definitions.
mlir/cuda-tile/Toy/include/cuda_shim/SupportOps.hpp	Defines which Toy ops are considered GPU-eligible by the outlining pass.
mlir/cuda-tile/Toy/include/CMakeLists.txt	Adds Toy include subdirectory.
mlir/cuda-tile/Toy/cuda_wrapper/CMakeLists.txt	Builds the CUDA shim wrapper library.
mlir/cuda-tile/Toy/CMakeLists.txt	Builds `toy-cuda` tool and wires MLIR/CUDA Tile deps.
mlir/cuda-tile/CMakeLists.txt	Top-level CMake config for the cuda-tile MLIR/Toy project.
mlir/cuda-tile/.pre-commit-config.yaml	Pre-commit hooks config (clang-format/cmake-format/etc.).
mlir/cuda-tile/.gitignore	Ignores CUDA/LLVM build artifacts and generated binaries.
mlir/cuda-tile/.envsetup.sh	Conda activation helper script.
mlir/cuda-tile/.devcontainer/noop.txt	Placeholder to satisfy devcontainer COPY steps.
mlir/cuda-tile/.devcontainer/devcontainer.json	Devcontainer config for CUDA/MLIR development.
mlir/cuda-tile/.devcontainer/Dockerfile	Devcontainer image definition with LLVM/Clang/CUDA tooling.
mlir/cuda-tile/.clang-format	Clang-format configuration for this subproject.
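
The table lists `sample/validation.py` and `sample/matmul_numpy.py` as NumPy reference checks for matmul results. Their contents are not shown on this page, so the following is only a sketch of that kind of check, written in pure Python rather than NumPy. The function names and tolerances are assumptions, chosen loosely with FP32 kernels in mind.

```python
import math


def matmul_ref(a, b):
    """Reference matrix product of two row-major nested lists."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0]) if b else 0
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def matrices_close(x, y, rel_tol=1e-5, abs_tol=1e-6):
    """Element-wise comparison with tolerances (assumed values, FP32-ish)."""
    if len(x) != len(y):
        return False
    for row_x, row_y in zip(x, y):
        if len(row_x) != len(row_y):
            return False
        for p, q in zip(row_x, row_y):
            if not math.isclose(p, q, rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    expected = matmul_ref(a, b)
    # Stand-in for values read back from the GPU; here just a perturbed copy.
    gpu_result = [[v + 1e-7 for v in row] for row in expected]
    print(matrices_close(gpu_result, expected))
```

Comparing with a tolerance rather than exact equality matters because a GPU kernel may accumulate the dot products in a different order than the reference.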

mlir/cuda-tile/vscode/.zsh_history

mlir/cuda-tile/vscode/settings.json

mlir/cuda-tile/CMakeLists.txt

mlir/cuda-tile/vscode/settings.json

mlir/cuda-tile/.envsetup.sh

mlir/cuda-tile/explore/run.sh

mlir/cuda-tile/Toy/mlir/EmitCudaTile.cpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Alwaysproblem added 30 commits December 29, 2025 14:12

Inital commits

3679d7a

Make ch6 works

e3204ff

Added Matmul Toy Op

2b310d3

Added the validation for cuda tile

ca1bd69

Try gpu and failed

afe7997

update the build dep

88ce121

Change float type to FP32

8a0ba30

Added the GPU C API

8be2741

Added the GPU related operation

7e1f45c

Create the gpu outline pass

37800da

Added Affine pass code

fd70514

Added the EntryOp for cuda tile IR

60b8083

Sync the command line history

d12717f

Added devcontainer

bdb9a66

Added the make tensor view

07c722f

Sync

43b2739

Added the return, add, mul lowering

0a3ccf4

Added the pass that can compile cuda tile IR

f2d06f6

Verified the cuda shim API and POC is ready for the cuda shim

99596b0

sync the progress not finish

80e2354

upload file

034cc87

sync

5007d48

Sync: added the getglobal memref

b9cfa7e

Added the input and allocation for the cuda shim

dbf81a7

Tested with cuda 12.x cubin worked

96a122c

Move the helper function into cuda shim builder

b2e34d0

Tested on the GPU RTX4090 with cuda 12.x

45a5e01

Tested on the RTX4090

8ad69b7

Update ReadMe

5cdabd4

Tested on RTX5090

1d98f48

Test on RTX4090 after grid change

d41cea8

Alwaysproblem self-assigned this Mar 8, 2026

Copilot AI review requested due to automatic review settings March 8, 2026 02:08

Copilot started reviewing on behalf of Alwaysproblem March 8, 2026 02:08

Copilot AI reviewed Mar 8, 2026

Alwaysproblem and others added 3 commits March 8, 2026 17:41

Update mlir/cuda-tile/Toy/mlir/EmitCudaTile.cpp

87330ea

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update mlir/cuda-tile/Toy/mlir/EmitCudaTile.cpp

c7b1673

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Fix the comments from github copilot

bcb8038

Alwaysproblem merged commit 8a28372 into main Mar 8, 2026
0 of 3 checks passed

Alwaysproblem deleted the cuda-tile branch March 8, 2026 10:13

Alwaysproblem merged 34 commits into main from cuda-tile


2 participants
