
feat: Add LoRA (Low-Rank Adaptation) support for efficient model fine-tuning#108

Open
chen2021673 wants to merge 7 commits into master from add_lora

Conversation

Contributor

chen2021673 commented Feb 12, 2026

Summary

Added LoRA (Low-Rank Adaptation) support for parameter-efficient fine-tuning. This feature significantly reduces the number of trainable parameters through low-rank decomposition, enabling efficient fine-tuning of large models.

Changes

New Features

LoRA Infrastructure (infini_train/include/nn/lora/):

  • lora_config.h/cc - LoRA configuration (rank, alpha, dropout)
  • lora_linear.h/cc - LoRA linear layer wrapper
  • lora_model.h/cc - Multi-LoRA layer management
  • lora_parallel_linear.h/cc - Tensor parallelism support
  • lora_utils.h/cc - Utility functions

Tests:

  • test/lora/test_lora.cc - Unit tests

Documentation:

  • docs/lora_usage.md - Usage documentation

Examples:

  • example/gpt2/main.cc - Added LoRA training example

Build:

  • CMakeLists.txt - Added test_lora build target

Test Result

Accuracy: (screenshot)
Performance: (screenshot)
llama3 output comparison: (screenshot)

chen2021673 and others added 5 commits February 12, 2026 09:11
- Add LoRA module infrastructure with configurable rank, alpha, dropout
- Implement LoRALinear wrapper for seamless integration with Linear layers
- Support tensor parallelism via LoRAParallelLinear
- Add LoRAModel utility for managing multiple LoRA layers
- Integrate LoRA configuration and utilities
- Add GPT2 example demonstrating LoRA fine-tuning
- Include comprehensive usage documentation and test suite

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Refactor LoRA config construction with proper target module parsing
- Add GetLoRAModel for in-place LoRA layer injection
- Fix DDP reducer to correctly handle LoRA parameters
- Fix RowParallel/ColumnParallel LoRA input handling to match base module behavior
- Add shape-based defensive checks for TP/SP consistency
- Move TP/SP communication helper function declarations to utils.h
- Move getter implementations from header to .cc file
- Add unit test for SaveLoRAWeights/LoadLoRAWeights functionality

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
// LoRA A: [rank, in_features] - replicated across TP ranks (implemented as Linear)
// LoRA B: [out_features_per_partition, rank] - sharded like base weight (implemented as ColumnParallelLinear with
// gather_output)
class LoRAColumnParallelLinear : public nn::CloneableModule<LoRAColumnParallelLinear> {
Contributor

Couldn't this inherit from the original ColumnParallelLinear? That would save some space on the base class's member definitions and getters.

Contributor Author

I'd recommend against inheritance here. LoRA layers an incremental update on top of the original Linear (a classic decorator: LoRA doesn't deal with the parallelism details and leaves communication to the base), so it isn't a new kind of ColumnParallelLinear; composition beats inheritance in this case. With inheritance, the base class and base_module_ would each maintain their own set of weight / flags, which could easily fall out of sync.

continue;
}

if (type == Linear::kType) {
Contributor

From this point on, this file has quite a few of these three-way if checks whose bodies differ only in the class name; it feels like a more elegant approach could be used.

Contributor Author

The most we can do here is extract a shared helper function to shrink the body of each if branch; since type is determined at runtime, templates can't reduce the number of branches. But this logic shouldn't grow any further, so I think it's acceptable.

2 participants