[PEFT]: Parameter-Efficient Fine-Tuning


PEFT๋ž€?

PLMs๋ฅผ specific task์— ์ ์šฉํ•  ๋•Œ, ๋Œ€๋ถ€๋ถ„์˜ Parameter๋ฅผ freezeโ„๏ธ, ์†Œ์ˆ˜์˜ parameter๋งŒ FTํ•˜๋Š” ๊ธฐ๋ฒ•.
PEFT๋Š” ๋ชจ๋ธ ์„ฑ๋Šฅ์„ ์œ ์ง€ + #parameter↓๊ฐ€ ๊ฐ€๋Šฅํ•จ.
๋˜ํ•œ, catastrophic forgetting๋ฌธ์ œ ์œ„ํ—˜๋„ ๋˜ํ•œ ๋‚ฎ์Œ.
๐Ÿค—Huggingface์—์„œ ์†Œ๊ฐœํ•œ ํ˜์‹ ์  ๋ฐฉ๋ฒ•์œผ๋กœ downstream task์—์„œ FT๋ฅผ ์œ„ํ•ด ์‚ฌ์šฉ๋จ.

What is Catastrophic Forgetting?

The phenomenon where, as a model learns new information, it forgets part of the knowledge it had previously learned.



Main Concept

  • Reduced Parameter Fine-tuning
    The vast majority of the pretrained LLM's parameters are frozen, and only a small number of additional parameters are fine-tuned.
    This selective fine-tuning sharply reduces the computational requirements.
  • Overcoming Catastrophic Forgetting
    Catastrophic forgetting arises when an entire LLM is fine-tuned; PEFT can be used to mitigate it.
    With PEFT, the model can learn new downstream tasks while preserving the knowledge from pretraining.
  • Application Across Modalities
    PEFT extends beyond its original Natural Language Processing (NLP) domain to a variety of areas.
    It has been applied successfully to Computer Vision (CV), including Stable Diffusion and LayoutLM, and to audio modalities with models such as Whisper and XLS-R.
  • Supported PEFT Methods
    The library supports a variety of PEFT methods.
    LoRA (Low-Rank Adaptation), Prefix Tuning, Prompt Tuning, and others are each designed for particular fine-tuning requirements and scenarios (see the sketch below).
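
As a minimal sketch of how the 🤗 PEFT library is typically applied (the base model name and LoRA hyperparameters below are illustrative assumptions, not values from this post):

# A minimal sketch: wrapping a causal LM with a LoRA adapter via 🤗 PEFT.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # assumed base model

lora_config = LoraConfig(
    r=8,               # rank of the low-rank matrices A and B
    lora_alpha=16,     # scaling factor: the update is scaled by alpha / r
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)   # base weights frozen ❄️, adapters trainable
model.print_trainable_parameters()           # prints the tiny trainable fraction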

 

 

 

[Figure] The output activations of the original (frozen ❄️) pretrained weights (left) are augmented by a low-rank adapter composed of weight matrices A and B (right).

 

[Q-LoRA]: Quantized-LoRA

Q-LoRA๋ž€?

2023๋…„ 5์›” NeurIPS์—์„œ ์–‘์žํ™”์™€ LoRA๋ฅผ ํ•ฉ์ณ "A6000 ๋‹จ์ผ GPU๋กœ 65B๋ชจ๋ธ ํŠœ๋‹์ด ๊ฐ€๋Šฅ"ํ•œ ๋ฐฉ๋ฒ•๋ก ์„ ๋ฐœํ‘œํ•จ.
QLoRA๋Š” ๊ฒฐ๊ตญ ๊ธฐ์กด์˜ LoRA์— ์ƒˆ๋กœ์šด quantization์„ ๋”ํ•œ ํ˜•ํƒœ์ด๋‹ค.
๋ฒ ์ด์Šค ๋ชจ๋ธ์ธ PLM์˜ ๊ฐ€์ค‘์น˜๋ฅผ ์–ผ๋ฆฌ๊ณ (frozen), LoRA ์–ด๋Œ‘ํ„ฐ์˜ ๊ฐ€์ค‘์น˜๋งŒ ํ•™์Šต ๊ฐ€๋Šฅํ•˜๊ฒŒ(trainable)ํ•˜๋Š” ๊ฒƒ์€ LoRA์™€ ๋™์ผํ•˜๋ฉฐ, frozen PLM์˜ ๊ฐ€์ค‘์น˜๊ฐ€ '4๋น„ํŠธ๋กœ ์–‘์žํ™”'๋˜์—ˆ๋‹ค๋Š” ์ •๋„๊ฐ€ ๋‹ค๋ฅธ ์ ์ด๋‹ค.
๋•Œ๋ฌธ์—, QLoRA์—์„œ ์ฃผ์š”ํžˆ ์ƒˆ๋กœ ์†Œ๊ฐœ๋˜๋Š” ๊ธฐ์ˆ (Main Contribution)์€ ์–‘์žํ™” ๋ฐฉ๋ฒ•๋ก ์ด ์ฃผ๊ฐ€ ๋œ๋‹ค๋Š” ์‚ฌ์‹ค์ด๋‹ค.

์–‘์žํ™”๋ž€?

weight์™€ activation output์„ ๋” ์ž‘์€ bit๋‹จ์œ„๋กœ ํ‘œํ˜„ํ•˜๋„๋ก ๋ณ€ํ™˜ํ•˜๋Š” ๊ฒƒ.
์ฆ‰, data์ •๋ณด๋ฅผ ์•ฝ๊ฐ„์ค„์ด๊ณ , ์ •๋ฐ€๋„๋Š” ๋‚ฎ์ถ”์ง€๋งŒ
"์ €์žฅ ๋ฐ ์—ฐ์‚ฐ์— ํ•„์š”ํ•œ ์—ฐ์‚ฐ์„ ๊ฐ์†Œ์‹œ์ผœ ํšจ์œจ์„ฑ์„ ํ™•๋ณด"ํ•˜๋Š” ๊ฒฝ๋Ÿ‰ํ™” ๋ฐฉ๋ฒ•๋ก ์ด๋‹ค.



How to Use in MLLMs...?

๊ทธ๋ ‡๋‹ค๋ฉด ์–ด๋–ป๊ฒŒ MLLMs์— ์ ์šฉํ•  ์ˆ˜ ์žˆ์„๊นŒ? MLLMs๋Š” ๋งค์šฐ ์ข…๋ฅ˜๊ฐ€ ๋งŽ์ง€๋งŒ, ๊ฐ€์žฅ ์‰ฌ์šด ์˜ˆ์ œ๋กœ VLMs๋ฅผ ๋“ค์–ด๋ณด์ž๋ฉด,
Q-LoRA ๋ฐ LoRA๋Š” PEFT๋ฐฉ๋ฒ•๋ก ์ด๊ธฐ์— ์ด๋Š” LLMs, MLLMs๋ชจ๋‘ ํ†ต์šฉ๋˜๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค.
๊ทธ๋ ‡๊ธฐ์— VLMs(Vision Encoder + LLM Decoder)๋ฅผ ๊ธฐ์ค€์œผ๋กœ ์„ค๋ช…ํ•ด๋ณด์ž๋ฉด:

  • ์–ธ์–ด์  ๋Šฅ๋ ฅ์„ ๊ฐ•ํ™”์‹œํ‚ค๊ณ  ์‹ถ๋‹ค๋ฉด, LLM๋งŒ PEFT๋ฅผ ์ง„ํ–‰.
  • ์‹œ๊ฐ์  ๋Šฅ๋ ฅ์„ ๊ฐ•ํ™”์‹œํ‚ค๊ณ  ์‹ถ๋‹ค๋ฉด, Vision Encoder๋งŒ PEFT๋ฅผ ์ง„ํ–‰.
  • ๋‘ ๋Šฅ๋ ฅ ๋ชจ๋‘ ๊ฐ•ํ™”์‹œํ‚ค๊ณ  ์‹ถ๋‹ค๋ฉด, Encoder, Decoder ๋ชจ๋‘ PEFT๋ฅผ ์ง„ํ–‰ํ•˜๋ฉด ๋œ๋‹ค.

Reference Code:

 

  • A Definitive Guide to QLoRA: Fine-tuning Falcon-7b with PEFT ("Unveiling the Power of QLoRA: Comprehensive Explanation and Practical Coding with 🤗 PEFT", medium.com)
  • Finetuning Llama2 with QLoRA — TorchTune documentation (pytorch.org)
  • Reference notebook: https://github.com/V2LLAIN/Transformers-Tutorials/blob/master/qlora_baseline.ipynb (github.com)

Deepspeed๋ž€?

# finetune_qlora.sh

deepspeed ovis/train/train.py \
        --deepspeed scripts/zero2.json \
        ...


๋ฌผ๋ก  ๋‚˜๋งŒ์˜ ๋ฐฉ๋ฒ•์„ ๊ณ ์ˆ˜ํ•˜๋Š”๊ฒƒ๋„ ์ข‹์ง€๋งŒ, ๋Œ€๋ถ€๋ถ„์˜ user๋“ค์ด ์ด ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•˜๋Š”๊ฑธ ๋ด์„œ๋Š” ์ผ๋‹จ ์•Œ์•„๋†“๋Š”๊ฒŒ ์ข‹์„ ๊ฒƒ ๊ฐ™๊ธฐ์— ์•Œ์•„๋ณด๊ณ ์žํ•œ๋‹ค.


deepspeed...?

๋ชจ๋ธ์˜ training, inference์†๋„๋ฅผ ๋น ๋ฅด๊ณ  ํšจ์œจ์ ์œผ๋กœ ์ฒ˜๋ฆฌํ•˜๊ฒŒ ๋„์™€์ฃผ๋Š” Microsoft์‚ฌ์˜ ๋”ฅ๋Ÿฌ๋‹ ์ตœ์ ํ™” ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์ด๋‹ค.

ํ•™์Šต device ์ข…๋ฅ˜:

  • CPU
    Single GPU
    1 Node, Multi GPU
    Multi Node, Multi GPU --> ๋งค์šฐ ํฐ GPT4 ๋“ฑ์˜ ํ•™์Šต์„ ์œ„ํ•ด ์‚ฌ์šฉ๋จ.

๋ถ„์‚ฐํ•™์Šต ๋ฐฉ์‹:

  • Data Parallel: ํ•˜๋‚˜์˜ device๊ฐ€ data๋ฅผ ๋‚˜๋ˆ„๊ณ , ๊ฐ device์—์„œ ์ฒ˜๋ฆฌ๊ฒฐ๊ณผ๋ฅผ ๋ชจ์•„ ๊ณ„์‚ฐ
    --> ํ•˜๋‚˜์˜ device๊ฐ€ ๋‹ค๋ฅธ device์— ๋น„ํ•ด ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ๋Ÿ‰์ด ๋งŽ์•„์ง€๋Š”, ๋ฉ”๋ชจ๋ฆฌ ๋ถˆ๊ท ํ˜• ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•œ๋‹ค!
  • Distributed Data Parallel: ๊ฐ๊ฐ์˜ device๋ฅผ ํ•˜๋‚˜์˜ Process๋กœ ๋ณด๊ณ , ๊ฐ process์—์„œ ๋ชจ๋ธ์„ ๋„์›Œ์„œ ์‚ฌ์šฉ.
    ์ด๋•Œ, ์—ญ์ „ํŒŒ์—์„œ๋งŒ ๋‚ด๋ถ€์ ์œผ๋กœ gradient๋ฅผ ๋™๊ธฐํ™” --> ๋ฉ”๋ชจ๋ฆฌ ๋ถˆ๊ท ํ˜•๋ฌธ์ œโŒ


cf) Requirements:

- PyTorch must be installed before installing DeepSpeed.
- For full feature support we recommend a version of PyTorch that is >= 1.9 and ideally the latest PyTorch stable release.
- A CUDA or ROCm compiler such as nvcc or hipcc used to compile C++/CUDA/HIP extensions.
- Specific GPUs we develop and test against are listed below; this doesn't mean your GPU will not work if it doesn't fall into this category, it's just that DeepSpeed is most well tested on the following:
        NVIDIA: Pascal, Volta, Ampere, and Hopper architectures
        AMD: MI100 and MI200



pip install deepspeed

๋กœ ์„ค์น˜๊ฐ€ ๊ฐ€๋Šฅํ•˜๋ฉฐ, ์‚ฌ์šฉ๋ฐฉ๋ฒ•์€ ์•„๋ž˜์™€ ๊ฐ™๋‹ค.


์‚ฌ์šฉ๋ฐฉ๋ฒ•:
Step1)
deepspeed์‚ฌ์šฉ์„ ์œ„ํ•œ Config.jsonํŒŒ์ผ ์ž‘์„ฑ

{
    "train_micro_batch_size_per_gpu": 160,
    "gradient_accumulation_steps": 1,
    "optimizer": {
        "type": "Adam",
        "params": {
            "lr": 0.001
        }
    },
    "zero_optimization": {
        "stage": 1,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "overlap_comm": false,
        "contiguous_gradients": false
    }
}

Config args: https://www.deepspeed.ai/docs/config-json/

Step2) import & read json

import json

import deepspeed
from deepspeed.ops.adam import DeepSpeedCPUAdam

with open('config.json', 'r') as f:
    deepspeed_config = json.load(f)



Step3) Configure the optimizer & initialize the model and optimizer

# DeepSpeedCPUAdam matches the CPU optimizer offload configured in config.json.
optimizer = DeepSpeedCPUAdam(model.parameters(), lr=lr)

# deepspeed.initialize wraps the model in a DeepSpeed engine.
model, optimizer, _, _ = deepspeed.initialize(model=model,
                                              model_parameters=model.parameters(),
                                              optimizer=optimizer,
                                              config_params=deepspeed_config)
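
The engine returned by deepspeed.initialize is then used directly in the training loop; a minimal sketch (train_loader and criterion are placeholders, not from this post):

for batch, labels in train_loader:
    outputs = model(batch)    # forward pass through the DeepSpeed engine
    loss = criterion(outputs, labels)
    model.backward(loss)      # the engine handles loss scaling / ZeRO partitioning
    model.step()              # optimizer step and gradient zeroing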


cf) The DeepSpeed arguments can also be added to an ArgumentParser!

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=-1)

# Registers DeepSpeed's own CLI arguments (e.g., --deepspeed, --deepspeed_config).
parser = deepspeed.add_config_arguments(parser)




Step4) Train!

# Launch directly from the shell:
deepspeed --num_gpus={number of gpus} train_deepspeed.py

# Or put the same command in train.sh and run:
# >> bash train.sh


์ฃผ์˜ !)

DeepSpeed๋Š” CUDA_VISIBLE_DEVICES๋กœ ํŠน์ • GPU๋ฅผ ์ œ์–ดํ•  ์ˆ˜ ์—†๋‹ค!
์•„๋ž˜์™€ ๊ฐ™์ด --include๋กœ๋งŒ ํŠน์ • GPU๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.

deepspeed --include localhost:<GPU_NUM1>,<GPU_NUM2> <python_file.py>


  • gpu node์˜ ๊ฐœ์ˆ˜๊ฐ€ ๋งŽ์•„์งˆ์ˆ˜๋ก deepspeed์˜ ์žฅ์ ์ธ ํ•™์Šต ์†๋„๊ฐ€ ๋นจ๋ผ์ง„๋‹ค!
 
