🧹 What is Sweep?

0. Overview

Many people work tirelessly to raise model performance by even 1%. (e.g. paperswithcode)

In the end, getting the most out of a model means changing hyper-parameters over and over to find the best values, and that search is tiring and costly.

W&B's Sweep was created to solve exactly this problem!

 

 

 

1. What is Sweep?

In short, a tool that automatically optimizes hyper-parameters.

There are three hyper-parameter search methods:

  • Grid
  • Random
  • Bayes

Once hyper-parameter tuning with the selected search method has finished, the results can be viewed as visualizations in the dashboard on the WandB web app.

The visualization looks like the figure above. Besides tuning automatically, Sweep also shows how important each hyper-parameter is to a metric (accuracy, loss, etc.) and how strongly the two are correlated, which makes it close to essential.

(Dashboard)

 

 

 

 

 

🧹 How to use Sweep❗️

 

Tune Hyperparameters | Weights & Biases Documentation

Hyperparameter search and model optimization with W&B Sweeps

docs.wandb.ai

A Sweep always requires two steps: Initialize the Sweep, then Run the Sweep Agent.

 

 

1. Initialize the Sweep 

 ∙ Define the Sweep Configuration

To initialize a Sweep, first define its configuration.

The configuration keys are divided into required and optional ones.

The configuration must define where (program), what (method), and how (parameters) to optimize.

There are three optimization methods:
  • Grid: searches every possible combination (= Cost↑)
  • Random: picks combinations at random (= Cost↓, chance of finding the optimum↓)
  • Bayes: uses the results of previously tried hyper-parameter combinations to infer the next combination to try
    → looks for the hyper-parameter combination that improves model performance the most (= slow initial search)
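To make the difference between Grid and Random concrete, here is a minimal pure-Python sketch. It does not call W&B at all: the search space and the helper names grid_candidates and random_candidates are made up for illustration.

```python
import itertools
import random

# Hypothetical search space, for illustration only.
search_space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [16, 32],
    "optimizer": ["adam", "sgd"],
}

def grid_candidates(space):
    """Enumerate every combination, as a grid search would."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in itertools.product(*space.values())]

def random_candidates(space, count, seed=0):
    """Draw `count` combinations independently at random."""
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in space.items()} for _ in range(count)]

grid = grid_candidates(search_space)
sampled = random_candidates(search_space, count=4)
print(len(grid))     # 3 * 2 * 2 = 12 combinations
print(len(sampled))  # 4 combinations, chosen at random
```

The grid grows multiplicatively with every added hyper-parameter, while the random variant costs only as many runs as you ask for. A Bayes search would also look at the metric of earlier runs before proposing the next combination, which this sketch leaves out.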

 


The parameters part is especially important, so let's look at it more closely.


values, value
Set specific values for a hyper-parameter so that only the values we want are chosen.
(value is used when a single fixed value is set.)

distribution
The counterpart to values.
Instead of listing specific values, a value is drawn from a chosen distribution.
Sweep offers a variety of distributions such as uniform, normal, and q_log_uniform.
The chosen distribution can be shaped freely through min, max, mu, sigma, and q.

min, max
Set the minimum and maximum of the distribution.

mu, sigma
The mean and standard deviation, which determine the shape of the normal distribution.

q
Short for quantization: the value X drawn from the distribution is quantized.
ex) If q is set to 2, X is turned into a multiple of 2.
(ex. Applying round(X / q) * q turns -2.96 into -2, 13.27 into 14, and 8.43 into 8.)
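The keys above can be read as a recipe for drawing one value per hyper-parameter. The sketch below imitates that recipe in plain Python. It is an illustration, not W&B's actual sampling code: sample_parameter and quantize are hypothetical helpers, the parameter names and ranges are invented, and only a few distributions are covered.

```python
import random

def quantize(x, q):
    """Snap x to the nearest multiple of q, i.e. round(x / q) * q."""
    return round(x / q) * q

def sample_parameter(spec, rng):
    """Draw one value from a parameter spec (illustrative only)."""
    if "value" in spec:    # a single fixed value
        return spec["value"]
    if "values" in spec:   # one of the listed values
        return rng.choice(spec["values"])
    dist = spec["distribution"]
    if dist in ("uniform", "q_uniform"):
        x = rng.uniform(spec["min"], spec["max"])
    elif dist in ("normal", "q_normal"):
        x = rng.gauss(spec["mu"], spec["sigma"])
    else:
        raise ValueError(f"distribution not covered by this sketch: {dist}")
    # The q_ variants quantize the drawn value.
    return quantize(x, spec["q"]) if dist.startswith("q_") else x

# Assumed example parameters.
parameters = {
    "epochs": {"value": 10},
    "optimizer": {"values": ["adam", "sgd"]},
    "dropout": {"distribution": "uniform", "min": 0.1, "max": 0.5},
    "fc_layer_size": {"distribution": "q_uniform", "min": 32, "max": 256, "q": 32},
    "init_scale": {"distribution": "normal", "mu": 0.0, "sigma": 1.0},
}

rng = random.Random(0)
trial = {name: sample_parameter(spec, rng) for name, spec in parameters.items()}
print(trial)
print(quantize(-2.96, 2), quantize(13.27, 2), quantize(8.43, 2))  # -2 14 8
```

With q set to 32, fc_layer_size always lands on a multiple of 32, which is usually what you want for layer widths or batch sizes.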

 ∙ Initialize with the Sweep API to use it in a project

Once the Sweep config is defined correctly, it has to be applied to the project.

Sweep initialization code:

sweep_id = wandb.sweep(config.sweep_config)


It takes the config variable defined above as input and returns a sweep id.

This id is used as the unique identifier when the sweep is run in the next step.

 

 

 

 

 

 

2. Run the Sweep Agent

  • Run a function or program with hyper-parameters handed out by the W&B server (the agent process itself runs on your machine).

Now only the actual run remains.

Let's run the sweep using the configuration defined above.

Sweep execution code:

wandb.agent(sweep_id, function=train, count=count)

Here, the sweep_id returned above is passed as the first argument.

The train function we defined is passed as function, and the number of runs to execute is passed as count.

Screen shown when the Sweep runs successfully.

 

 

 

 

 

cf) Running from a yaml file.

Places where project and entity can be specified:

  • the config file (config.py or config.yaml)
  • wandb.sweep()
  • wandb.init()
  • wandb.agent()

The config can be defined either in a .py file or in a .yaml file. Let's look at how to run from a .yaml file.


1. Create the config.yaml file
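As a rough idea of what such a file can contain, the Python snippet below writes a small config.yaml to the current directory. The keys (program, method, metric, parameters) are standard sweep configuration keys. The script name train.py, the metric name, and every value are placeholders assumed for this example, not the original post's file.

```python
from pathlib import Path

# Assumed example values: train.py, the metric name and the parameter ranges
# are placeholders, not taken from the original post.
CONFIG_YAML = """\
program: train.py
method: bayes
metric:
  name: loss
  goal: minimize
parameters:
  learning_rate:
    distribution: uniform
    min: 0.0001
    max: 0.1
  batch_size:
    values: [16, 32, 64]
  epochs:
    value: 10
"""

config_path = Path("config.yaml")
config_path.write_text(CONFIG_YAML)
print(config_path.read_text())
```

The metric name should match a key that the training script actually logs with wandb.log, otherwise the Bayes search has nothing to optimize.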

 

 

 

 

2. Pass the .yaml file to Sweep

wandb sweep config.yaml

 

 

3. Pass the Sweep id to the Agent

wandb agent SWEEP_ID

 

 

 

cf) Setting options from the terminal with wandb commands.

∙ Limit the number of sweep runs

wandb agent --count [LIMIT_NUM] [SWEEPID]

 

 

∙ Using Sweep with multiple GPUs

CUDA_VISIBLE_DEVICES=0 wandb agent sweep_id
CUDA_VISIBLE_DEVICES=1 wandb agent sweep_id
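The two commands above are meant to run at the same time, one agent per GPU, each in its own terminal or process. If you would rather start them from a single script, a sketch along these lines could work. agent_commands and launch are hypothetical helper names, "SWEEP_ID" is a placeholder, and the launch call is left commented out because it needs wandb installed and a real sweep id.

```python
import os
import subprocess

def agent_commands(sweep_id, gpu_ids):
    """Build one (env, argv) pair per GPU; nothing is started here."""
    jobs = []
    for gpu in gpu_ids:
        # Each agent only sees the single GPU assigned to it.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        jobs.append((env, ["wandb", "agent", sweep_id]))
    return jobs

def launch(jobs):
    """Start every agent as a separate process and wait for all of them."""
    procs = [subprocess.Popen(argv, env=env) for env, argv in jobs]
    return [proc.wait() for proc in procs]

jobs = agent_commands("SWEEP_ID", gpu_ids=[0, 1])  # placeholder sweep id
for env, argv in jobs:
    print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']}", " ".join(argv))
# launch(jobs)  # uncomment on a machine with wandb installed and a real sweep id
```

All agents pull from the same sweep, so the runs are spread across the GPUs without any extra coordination on your side.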

 

 

 

 


 

 

 

🧹 Example code for running a W&B Sweep

import argparse

import wandb

import config
from dataset import SweepDataset
from model import ConvNet
from optimize import build_optimizer
from utils import train_epoch

# Command-line defaults; argparse exposes --batch-size as args.batch_size.
parser = argparse.ArgumentParser()
parser.add_argument('--batch-size', type=int, default=8, metavar='N')
parser.add_argument('--epochs', type=int, default=10)
args = parser.parse_args()

def train():
    # wandb.config only exists after wandb.init(), so the CLI arguments are
    # merged into the defaults here. Values chosen by the sweep override them.
    wandb.init(config={**config.hyperparameter_defaults, **vars(args)})
    w_config = wandb.config

    # Build the data loader, model and optimizer from the swept values.
    loader = SweepDataset(w_config.batch_size, config.train_transform)
    model = ConvNet(w_config.fc_layer_size, w_config.dropout).to(config.DEVICE)
    optimizer = build_optimizer(model, w_config.optimizer, w_config.learning_rate)

    # Track gradients and parameters of the model.
    wandb.watch(model, log='all')

    for epoch in range(w_config.epochs):
        avg_loss = train_epoch(model, loader, optimizer, wandb)
        print(f"TRAIN: EPOCH {epoch + 1:04d} / {w_config.epochs:04d} | Epoch LOSS {avg_loss:.4f}")
        wandb.log({"epoch": epoch, "loss": avg_loss})

# Register the sweep, then let the agent call train() twice.
sweep_id = wandb.sweep(config.sweep_config)
wandb.agent(sweep_id, train, count=2)

The config for the sweep is implemented in config.py. Step by step, the code does the following:

1. The initial hyper-parameter values are passed to wandb.init.
2. w_config holds the hyper-parameters being swept.
3. The w_config values are passed as arguments to the loader, model, and optimizer builders.
4. Once the model is defined, wandb.watch tracks its gradients.
5. The per-epoch log is recorded with wandb.log.
6. The configuration defined in the config file is passed to wandb.sweep.
7. The id returned by wandb.sweep, the train function implemented above, and the run count are passed to wandb.agent to run the sweep.
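The post does not show config.py itself, so the following is only a guess at its shape. The names hyperparameter_defaults and sweep_config come from the training script above. Every value, the chosen method, and the metric are assumptions. DEVICE and train_transform are left out because they depend on the deep-learning framework in use.

```python
# config.py (hypothetical sketch; all values are assumptions)

# Defaults passed to wandb.init(config=...); the sweep overrides them per run.
hyperparameter_defaults = {
    "batch_size": 8,
    "epochs": 10,
    "fc_layer_size": 128,
    "dropout": 0.3,
    "optimizer": "adam",
    "learning_rate": 0.001,
}

# Search definition passed to wandb.sweep(...).
sweep_config = {
    "method": "random",
    # "loss" matches the key logged with wandb.log in the training script.
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {
        "batch_size": {"values": [8, 16, 32]},
        "fc_layer_size": {"values": [128, 256, 512]},
        "dropout": {"values": [0.3, 0.4, 0.5]},
        "optimizer": {"values": ["adam", "sgd"]},
        "learning_rate": {"distribution": "uniform", "min": 0.0001, "max": 0.1},
    },
}

# Sanity check: every swept parameter also has a default value.
missing = set(sweep_config["parameters"]) - set(hyperparameter_defaults)
print(missing)  # set()
```

Keeping a default for every swept parameter means train() can also be run once outside a sweep for debugging.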

 

Reference) https://pebpung.github.io/wandb/2021/10/10/WandB-2.html

 

 

 


🧹 Sweep visualization

After running WandB, let's analyze the visualized results. (Dashboard)

To do that, we need to understand how the Sweep workspace is laid out.

Left graph: the y-axis is the metric and the x-axis is the creation date.

Right table: shows how important each hyper-parameter is to the metric (accuracy, loss, etc.) and how strongly it is correlated with it.
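To get a feel for what a correlation value between a hyper-parameter and a metric means, the sketch below computes a plain Pearson correlation over a handful of made-up runs. It only illustrates the idea: the run data is invented, pearson is a hand-written helper, and the dashboard's importance score is computed differently and is not reproduced here.

```python
import math

def pearson(xs, ys):
    """Linear (Pearson) correlation between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented results of five runs: higher dropout went along with higher loss.
runs = [
    {"dropout": 0.1, "loss": 0.40},
    {"dropout": 0.2, "loss": 0.45},
    {"dropout": 0.3, "loss": 0.47},
    {"dropout": 0.4, "loss": 0.55},
    {"dropout": 0.5, "loss": 0.60},
]

r = pearson([run["dropout"] for run in runs], [run["loss"] for run in runs])
print(f"correlation(dropout, loss) = {r:.3f}")
```

A value near +1 means the metric rises together with the hyper-parameter, a value near -1 means it falls, and a value near 0 means there is no clear linear relationship.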

 


 

The figure above visualizes the hyper-parameter selection process.

 ∙ X-axis: the kinds of hyper-parameters set in the config

 ∙ y-axis: the range of each hyper-parameter set in the config

Hovering the mouse over the graph additionally shows the values at that point.
