Learning Rate Finder
Suggested learning rates based on model size and batch size.
Baseline (3e-4): 3.00e-4
Scaled by model size: 4.01e-5
Scaled by batch size: 3.00e-4
Combined scaling: 4.01e-5
Suggested range to search: 1.34e-5 → 1.20e-4
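The values above can be reproduced with a short sketch. The tool's actual formula is not shown, so everything here is an assumption: square-root scaling for both model size and batch size, a hypothetical 125e6-parameter reference model, a hypothetical reference batch of 256, and example inputs (7e9 parameters, batch 256) chosen only because they happen to match the displayed numbers. The search range appears to be the combined value divided and multiplied by 3.

```python
import math

def suggest_lr(n_params, batch_size, base_lr=3e-4,
               ref_params=125e6, ref_batch=256, span=3.0):
    """Sketch of a learning-rate suggestion (assumed rules, not the tool's code)."""
    # Assumption: LR shrinks with the square root of model size.
    model_scaled = base_lr * math.sqrt(ref_params / n_params)
    # Assumption: LR grows with the square root of batch size.
    batch_scaled = base_lr * math.sqrt(batch_size / ref_batch)
    # Apply both scaling factors to the baseline.
    combined = model_scaled * batch_scaled / base_lr
    # Search a band of one factor of `span` on each side of the combined value.
    return {
        "model_scaled": model_scaled,
        "batch_scaled": batch_scaled,
        "combined": combined,
        "low": combined / span,
        "high": combined * span,
    }

# Hypothetical inputs: a 7e9-parameter model at the reference batch size.
result = suggest_lr(n_params=7e9, batch_size=256)
for name, value in result.items():
    print(f"{name}: {value:.2e}")
```

With a different reference model, reference batch, or scaling exponent the numbers shift, so treat the output as a starting point for a sweep rather than a final setting.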
Rules of thumb: Smaller models → higher LR. Larger batches → higher LR. Always validate the suggested rate with a short training run that uses warmup followed by cosine decay.
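One way to picture the warmup plus cosine decay schedule mentioned in the rules of thumb is the following minimal sketch. The function name `lr_at_step`, the linear warmup shape, and the step counts are illustrative assumptions, not settings taken from the tool.

```python
import math

def lr_at_step(step, peak_lr, warmup_steps, total_steps, min_lr=0.0):
    """Learning rate at a given step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine curve from peak_lr (progress 0) down to min_lr (progress 1).
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Hypothetical run: peak at the combined suggestion, 100 warmup steps, 1000 total.
for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at_step(step, 4.01e-5, 100, 1000):.2e}")
```

Repeating a short run like this at a few peak values across the suggested range is a cheap way to check which rate trains stably.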