Official PyTorch implementation of "GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance" (ICML 2025)
[CAAI AIR'24] Minimize Quantization Output Error with Bias Compensation
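The bias-compensation idea named in the entry above can be sketched as follows. This is an illustrative toy, not the paper's exact algorithm: after quantizing a weight matrix W to W_q, the expected layer-output error (W - W_q) E[x] is folded into the layer bias, so the quantized layer matches the full-precision layer in expectation. The uniform rounding quantizer and the calibration mean are assumptions for the example.

```python
# Hedged sketch of bias compensation for quantization output error
# (illustrative only; not the CAAI AIR'24 paper's exact method).

def quantize(w, step=0.5):
    """Naive uniform rounding quantizer (assumed for illustration)."""
    return round(w / step) * step

def matvec(W, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in W]

# Toy linear layer: y = W x + b
W = [[0.27, -0.61], [0.83, 0.14]]
b = [0.1, -0.2]
W_q = [[quantize(w) for w in row] for row in W]

# Mean input E[x], assumed estimated from calibration data.
mean_x = [1.0, 2.0]

# Compensated bias: b' = b + (W - W_q) @ E[x]
err = [[w - wq for w, wq in zip(rw, rq)] for rw, rq in zip(W, W_q)]
b_comp = [bi + ei for bi, ei in zip(b, matvec(err, mean_x))]

# On the mean input, the compensated quantized layer reproduces the
# full-precision output exactly; on nearby inputs, the mean error is gone.
y_fp = [yi + bi for yi, bi in zip(matvec(W, mean_x), b)]
y_q = [yi + bi for yi, bi in zip(matvec(W_q, mean_x), b_comp)]
print(y_fp, y_q)
```

The same correction is cheap at deployment time because it only rewrites the bias vector; the quantized weights are untouched.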
A high-performance, memory-efficient healthcare framework that deploys fine-tuned Large Language Models (LLMs) on edge devices. A multi-agent system provides personalized diagnostic reasoning, health education, and dietary planning.
On-device AI system for processing and analyzing civil complaints | LLM compression & fine-tuning | Field-mirroring industry-linked project - building hands-on practical skills based on industry demand
Let me make GGUF files quickly
Local & lightweight LLM inference runtime in C++ with support for GGUF & quantization
Ternary Quantization for LLMs: implements balanced ternary (T3_K) weights for 2.63-bit quantization, the first working solution for modern large language models.
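Balanced ternary quantization, as in the T3_K entry above, maps each weight to {-1, 0, +1} times a shared scale; the information content is log2(3) ≈ 1.58 bits per weight, so the quoted 2.63-bit figure presumably includes scale and packing overhead. A minimal sketch, where the zero-threshold rule (0.7 × mean |w|) follows common ternary-quantization practice and is not necessarily the repo's exact scheme:

```python
# Minimal sketch of balanced ternary weight quantization.
# The threshold rule (delta = 0.7 * mean|w|) is a common heuristic,
# assumed here; not necessarily the T3_K repo's exact scheme.

def ternarize(weights):
    """Map a group of weights to {-1, 0, +1} plus one scale factor."""
    n = len(weights)
    delta = 0.7 * sum(abs(w) for w in weights) / n  # zero threshold
    ternary = [0 if abs(w) < delta else (1 if w > 0 else -1)
               for w in weights]
    # Scale = mean magnitude of the weights that stay non-zero.
    nz = [abs(w) for w, t in zip(weights, ternary) if t != 0]
    scale = sum(nz) / len(nz) if nz else 0.0
    return ternary, scale

w = [0.9, -0.05, 0.4, -0.8, 0.02, -0.3]
t, s = ternarize(w)
dequant = [s * ti for ti in t]  # reconstructed weights
print(t, s)
```

In a real kernel the ternary digits would be bit-packed (e.g. several trits per byte) and the scales stored per block, which is where the extra fraction of a bit per weight comes from.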
Enable expert-level, multi-step diagnostic reasoning in Claude Code with an easy-to-use skill for clear and explainable AI diagnosis.
Implementation of advanced Natural Language Processing architectures and optimization techniques, built from scratch. The projects focus on understanding the internal mechanics of Transformers, LLM efficiency through quantization, and scaling via Mixture-of-Experts (MoE).
Implemented and fine-tuned BERT for a custom sequence classification task, leveraging LoRA adapters for efficient parameter updates and 4-bit quantization to optimize performance and resource utilization.
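The LoRA technique referenced in the entry above replaces a full weight update with a low-rank one, y = W x + (alpha/r) * B A x, training only the small matrices A (r × k) and B (d × r) while W stays frozen (and can itself be 4-bit quantized, as in QLoRA-style setups). A dependency-free sketch with toy sizes; a real fine-tuning run would go through a library such as PEFT rather than this hand-rolled math:

```python
# Minimal sketch of a LoRA-style forward pass (pure Python, toy sizes).
# Only A and B would receive gradient updates; W is frozen.
# Shapes: W is d x k, B is d x r, A is r x k, with rank r << min(d, k).

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col))
             for col in zip(*Y)] for row in X]

def lora_forward(W, A, B, x, alpha=2.0, r=1):
    """y = W x + (alpha / r) * B (A x)."""
    delta = matmul(B, A)  # low-rank update B @ A
    Wx = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    Dx = [sum(d * xi for d, xi in zip(row, x)) for row in delta]
    return [wx + (alpha / r) * dx for wx, dx in zip(Wx, Dx)]

# Frozen base weight (d=2, k=3) and a rank-1 adapter.
W = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
B = [[1.0], [0.5]]     # d x r
A = [[0.0, 0.1, 0.0]]  # r x k
x = [1.0, 2.0, 3.0]
print(lora_forward(W, A, B, x))
```

The memory win comes from the parameter count: for d = k = 4096 and r = 8, the adapter holds 2 * 4096 * 8 weights versus 4096 * 4096 for a full update, roughly 0.4% of the original.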