/r/learnmachinelearning
One draw of Keras that would get people to switch over is how easy it makes model parallelism, but you'd need to hit better MFU than FSDP/DeepSpeed, ideally competitive with Megatron, while staying way easier and more flexible for people to switch to.
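For context, a minimal sketch of what that looks like with Keras 3's `keras.distribution` API (assuming a JAX backend with 8 accelerators; the mesh shape, regex keys, and layer names here are just illustrative, and the exact `ModelParallel` signature has shifted across Keras 3 releases):

```python
import keras

# discover local accelerators (assumes 8 devices on a JAX backend)
devices = keras.distribution.list_devices()

# 2-way data parallel x 4-way model parallel mesh
mesh = keras.distribution.DeviceMesh(
    shape=(2, 4), axis_names=["data", "model"], devices=devices
)

# regex -> layout: shard dense kernels column-wise across the "model" axis
layout_map = keras.distribution.LayoutMap(mesh)
layout_map["dense.*kernel"] = (None, "model")
layout_map["dense.*bias"] = ("model",)

keras.distribution.set_distribution(
    keras.distribution.ModelParallel(layout_map=layout_map, batch_dim_name="data")
)

# any model built after this point gets sharded variables automatically
model = keras.Sequential([
    keras.Input(shape=(512,)),
    keras.layers.Dense(4096, activation="relu", name="dense_1"),
    keras.layers.Dense(10, name="dense_2"),
])
```

The appeal is that the sharding spec is declarative rather than baked into the model code the way it is in Megatron; whether it reaches comparable MFU is exactly the open question.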
If you look at something like Evol-Instruct data, it's so similar to HumanEval that it'd be a surprise if models trained on it (or other synthetic data) didn't perform well.
As a rule of thumb, I generally only trust benchmarks for base models (and even then it's iffy); for finetuned models, I only trust them after actually using them.
Read Noam Shazeer's work; you have now caught up.
Public leaderboards mean nothing because 99% of the finetuned models are overfitted to hell. It's like nobody has ever done a Kaggle comp before.
A lot of reported architecture improvements disappear at scale, or turn out to involve some benchmark contamination.
The best way to see if one actually works is to release the code and let others tinker with it.
You're doing something wrong. I've managed to cut VRAM usage by more than 4x with LoRA on 7B LLaMA models, from 160GB down to 40GB.
Performance is a separate issue, but that's the tradeoff for the memory savings.
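For anyone wondering what that setup looks like, here's a minimal sketch with HuggingFace PEFT (the checkpoint name and hyperparameters are just placeholders, not the exact config from the run above). The savings come from freezing the fp16 base weights and only keeping gradients and optimizer state for the small adapter matrices:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# placeholder 7B checkpoint; any LLaMA-style causal LM works
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,  # base weights stay frozen in fp16
    device_map="auto",
)

# low-rank adapters on the attention projections only
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)

# typically well under 1% of parameters are trainable, which is where
# the optimizer-state and gradient memory savings come from
model.print_trainable_parameters()
```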