Under Review

LiME: Lightweight Mixture
of Experts

Expert specialization through lightweight modulation vectors and zero-parameter routing for efficient multimodal multi-task learning

Paper · Code · 🤗 Dataset
0.02โ€“0.57M
Trainable Params
4×
Fewer Parameters
4.52
Samples/s
47
Tasks

Method

How LiME Works

LiME replaces heavy per-expert parameters with lightweight modulation vectors and a zero-parameter router, enabling efficient expert specialization at a fraction of the cost.

LiME Architecture

Figure 1. LiME architecture: (a) standard MoE-LoRA, (b) LiME shared adapter + modulation, (c) AutoTop-K routing, (d) load-balancing losses.

🧬

Shared Adapter + Modulation

A single shared LoRA adapter modulated by lightweight per-expert vectors, dramatically reducing parameters while preserving specialization.
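The shared-adapter idea can be sketched in a few lines. This is an illustrative example, not the actual `LiMELoRA` implementation: a single LoRA pair (A, B) is shared across all experts, and each expert owns only a rank-sized modulation vector that rescales the low-rank bottleneck.

```python
import torch

# Hypothetical sketch of LiME-style shared LoRA + per-expert modulation.
# All names and shapes here are illustrative, not the library's API.
d_model, rank, num_experts = 16, 2, 4
A = torch.randn(d_model, rank) * 0.01   # shared down-projection
B = torch.zeros(rank, d_model)          # shared up-projection (zero-init, standard LoRA)
mod = torch.ones(num_experts, rank)     # one small modulation vector per expert

def expert_delta(x: torch.Tensor, e: int) -> torch.Tensor:
    """Low-rank update for expert e: modulate the shared bottleneck."""
    return (x @ A) * mod[e] @ B

x = torch.randn(3, d_model)
out = expert_delta(x, e=2)
print(out.shape)  # torch.Size([3, 16])

# Parameter count: per-expert LoRA pairs vs one shared pair + modulation vectors
per_expert_lora = num_experts * (d_model * rank * 2)
lime_style = d_model * rank * 2 + num_experts * rank
print(per_expert_lora, lime_style)  # 256 vs 72 in this toy setting
```

In this toy configuration the shared-adapter variant needs 72 parameters instead of 256, and the gap widens with more experts, since each additional expert costs only `rank` extra values.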

⚡

Zero-Parameter Routing

Expert routing via n-gram hidden-state similarity: no learned gating weights, no auxiliary parameters, no routing collapse.
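A minimal sketch of similarity-based routing, assuming each token's n-gram-averaged hidden state is compared against fixed per-expert prototype vectors (the prototype mechanism and function names here are assumptions for illustration; the paper's exact formulation may differ). Because the prototypes are buffers rather than a learned gating matrix, routing adds zero trainable parameters.

```python
import torch
import torch.nn.functional as F

def ngram_route(hidden: torch.Tensor, prototypes: torch.Tensor,
                n_gram: int = 1, temperature: float = 0.5) -> torch.Tensor:
    """Illustrative zero-parameter router: cosine similarity between
    n-gram-averaged hidden states and per-expert prototypes."""
    if n_gram > 1:
        # Average each token with its (n_gram - 1) predecessors
        pad = hidden[:1].repeat(n_gram - 1, 1)
        windows = torch.cat([pad, hidden]).unfold(0, n_gram, 1)  # (seq, d, n)
        hidden = windows.mean(dim=-1)
    sims = F.cosine_similarity(hidden.unsqueeze(1),      # (seq, 1, d)
                               prototypes.unsqueeze(0),  # (1, E, d)
                               dim=-1)                   # -> (seq, E)
    return F.softmax(sims / temperature, dim=-1)

h = torch.randn(5, 8)        # 5 tokens, hidden size 8
protos = torch.randn(4, 8)   # 4 expert prototypes
probs = ngram_route(h, protos, n_gram=2)
print(probs.shape)  # torch.Size([5, 4]); each row sums to 1
```

The `n_gram` and `temperature` knobs mirror the `apply_peft` arguments shown in the Quick Start below, but the routing logic itself is a sketch.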

🎯

AutoTop-K Selection

Dynamically adjusts expert activation per token based on confidence thresholds, enabling adaptive computation.

⚖️

Load Balancing

An importance loss plus a KL-to-uniform loss ensure experts are utilized evenly, preventing dominant-expert collapse.
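The two balancing terms can be sketched as follows. The coefficients correspond to `importance_coef` and `kl_coef` in `LiMEArguments` below, but the exact formulation here (squared coefficient of variation for importance, KL from the mean routing distribution to uniform) is an assumed common choice, not necessarily the paper's.

```python
import torch

def balance_losses(probs: torch.Tensor):
    """Illustrative balancing terms for routing probabilities of shape
    (num_tokens, num_experts)."""
    importance = probs.sum(dim=0)                       # routing mass per expert
    cv_sq = importance.var() / importance.mean() ** 2   # importance loss (CV^2)
    mean_p = probs.mean(dim=0)                          # average routing distribution
    uniform = torch.full_like(mean_p, 1.0 / mean_p.numel())
    kl = (mean_p * (mean_p / uniform).log()).sum()      # KL(mean_p || uniform)
    return cv_sq, kl

# Perfectly balanced routing drives both terms to zero
uniform_probs = torch.full((10, 4), 0.25)
cv_sq, kl = balance_losses(uniform_probs)
print(cv_sq.item(), kl.item())  # ~0.0, ~0.0
```

Both terms vanish when every expert receives equal routing mass, and grow as any single expert starts to dominate.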


Results

Performance and Efficiency

LiME matches or outperforms existing MoE-PEFT methods with up to 4× fewer trainable parameters and higher throughput.

Results Table
Table 2. Average results across benchmark categories. LiME variants (highlighted) achieve top performance with significantly fewer parameters.
Efficiency Comparison
Figure 2. Efficiency comparison: LiME achieves higher throughput, lower memory, and fewer trainable parameters across all variants.

Quick Start

Get Started in Minutes

Apply LiME to any pretrained model with just a few lines of code. Compatible with LLaVA-OneVision, Qwen2-VL, and more.

1

Install Dependencies

Clone the repository and install required packages.

2

Apply LiME to Your Model

Wrap any model with LiME using a single function call.

3

Train with LiMETrainer

Use our custom trainer with specialized learning rates for MoE and PEFT components.

apply_lime.py (Python)
import torch
from LiMELoRA import apply_peft
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

# Load your base model and its processor (needed for the tokenizer below)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    "llava-onevision-qwen2-7b-ov-hf",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("llava-onevision-qwen2-7b-ov-hf")

# โญ Apply LiME โ€” one line is all you need
model = apply_peft(
    model,
    targets=["q_proj", "k_proj", "v_proj", "o_proj", "out_proj"],
    num_experts=4,
    rank=2,
    use_shared_LiME=True,
    n_gram=1,
    top_k=1,
    rep_mode="token",
    jitter_noise=0.1,
    tokenizer=processor.tokenizer,
    temperature=0.5,
    gamma_routing=0.7,
    auto_topk=True,
    auto_topk_threshold=0.5,
)
train.py (Python)
from trainer import LiMEArguments, LiMETrainer

training_args = LiMEArguments(
    output_dir="./llava-lime-finetuned",
    per_device_train_batch_size=5,
    gradient_accumulation_steps=4,
    num_train_epochs=4,
    bf16=True,
    learning_rate=2e-4,
    moe_lr=1e-3,          # Dedicated LR for routing components
    peft_lr=4e-4,         # Dedicated LR for LoRA A/B
    importance_coef=0.1,
    kl_coef=0.01,
    balance_every_n_steps=50,
)

trainer = LiMETrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=processor.tokenizer,
    data_collator=collator,
)

trainer.train()

Dataset

MMT-47 Benchmark

A comprehensive multimodal multi-task benchmark with 47 tasks spanning vision, language, reasoning, and video understanding.

Download Images (Bash)
# Download the HuggingFace dataset
pip install datasets
python -c "from datasets import load_dataset; load_dataset('Kowsher/MMT-47')"

# Download images
huggingface-cli download \
  Kowsher/MMT-47 \
  --repo-type dataset \
  --include "images/*" \
  --local-dir images/

# Extract images (images.zip is a zip file)
cd images && unzip images.zip && cd ..
Download Videos from MVTamperBench (Bash)
# Download video data from MVTamperBench
huggingface-cli download \
  Srikant86/MVTamperBench \
  --repo-type dataset \
  --include "video/*" \
  --local-dir videos/

# Extract all video zip files
cd videos/
for f in *.zip; do
  d="${f%.zip}"
  if [ -d "$d" ]; then
    echo "Skipping $f (already extracted)"
  else
    echo "Extracting $f"
    unzip "$f" -d "$d"
  fi
done
cd ..
⚠️
License Notice: MMT-47 is a curated benchmark that aggregates data from multiple existing datasets, each governed by its own license. By using MMT-47, you agree to respect and comply with the individual license terms of every constituent dataset. Please review the original dataset licenses before using the data for any purpose. See the dataset card on HuggingFace for the full list of sources and their respective licenses.

Citation

Cite LiME

If you find LiME useful in your research, please consider citing our paper.

@inproceedings{lime2026,
  title     = {LiME: Lightweight Mixture of Experts for Efficient Multimodal Multi-task Learning},
  author    = {[Authors]},
  booktitle = {arXiv},
  year      = {2026}
}