/r/learnmachinelearning
One draw of Keras that would get people to switch over is how easy it makes model parallelism, but you'd need to hit better MFU than FSDP/DeepSpeed, ideally competitive with Megatron, while staying way easier and more flexible for people to switch to.
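For context, a minimal sketch of what that looks like with Keras 3's `keras.distribution` API (assuming a JAX backend with 8 accelerators; the mesh shape, regex keys, and layer names here are just illustrative, and the exact `ModelParallel` signature has shifted across Keras 3 releases):

```python
import keras

# discover local accelerators (assumes 8 devices on a JAX backend)
devices = keras.distribution.list_devices()

# 2-way data parallel x 4-way model parallel mesh
mesh = keras.distribution.DeviceMesh(
    shape=(2, 4), axis_names=["data", "model"], devices=devices
)

# regex -> layout: shard dense kernels column-wise across the "model" axis
layout_map = keras.distribution.LayoutMap(mesh)
layout_map["dense.*kernel"] = (None, "model")
layout_map["dense.*bias"] = ("model",)

keras.distribution.set_distribution(
    keras.distribution.ModelParallel(layout_map=layout_map, batch_dim_name="data")
)

# any model built after this point gets sharded variables automatically
model = keras.Sequential([
    keras.Input(shape=(512,)),
    keras.layers.Dense(4096, activation="relu", name="dense_1"),
    keras.layers.Dense(10, name="dense_2"),
])
```

The appeal is that the sharding spec is declarative rather than baked into the model code the way it is in Megatron; whether it reaches comparable MFU is exactly the open question.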
If you look at something like Evol-Instruct data, it's so similar to HumanEval that it'd be a surprise if models trained on it (or other synthetic data) didn't perform well.
As a rule of thumb, I generally only trust benchmarks for base models (and even then it's iffy); for finetuned models, I only trust them after actually using them.
Read Noam Shazeer's work; you have now caught up.
Public leaderboards mean nothing because 99% of the finetuned models are overfitted to hell. It's like nobody has ever done a Kaggle comp before.
A lot of reported architecture improvements disappear at scale, or turn out to involve some benchmark contamination.
The best way to see if one actually works is to release the code and let others tinker with it.
You're doing something wrong. I've managed to cut VRAM usage by more than 4x with LoRA on 7B LLaMA models, from 160GB down to 40GB.
Performance is a separate issue, but that's the tradeoff for the memory savings.
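For anyone wondering what that setup looks like, here's a minimal sketch with HuggingFace PEFT (the checkpoint name and hyperparameters are just placeholders, not the exact config from the run above). The savings come from freezing the fp16 base weights and only keeping gradients and optimizer state for the small adapter matrices:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# placeholder 7B checkpoint; any LLaMA-style causal LM works
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,  # base weights stay frozen in fp16
    device_map="auto",
)

# low-rank adapters on the attention projections only
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)

# typically well under 1% of parameters are trainable, which is where
# the optimizer-state and gradient memory savings come from
model.print_trainable_parameters()
```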