• vatsadev@alien.top

    Well, the model is trained on RefinedWeb, which is about 3.5T tokens, so a little below Chinchilla-optimal for 180B (roughly 3.6T at ~20 tokens per parameter; see the sketch after the list). Also, all the models in the Falcon series feel more and more undertrained:

    • The 1B model was good, and still holds up several newer generations later
    • The 7B was competitive before Llama 2
    • The 40B and 180B were never as good for their size
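    A quick back-of-the-envelope for the Chinchilla claim, using the ~20 tokens/parameter rule of thumb (all numbers are rough, for illustration only):

    ```python
    # Chinchilla-style check: ~20 training tokens per parameter
    # is the usual compute-optimal rule of thumb.
    params = 180e9          # Falcon-180B parameter count
    tokens_seen = 3.5e12    # ~RefinedWeb-scale training run
    optimal = 20 * params   # ~3.6e12 tokens for a 180B model

    print(f"trained on {tokens_seen/1e12:.1f}T of ~{optimal/1e12:.1f}T "
          f"optimal ({tokens_seen/optimal:.0%})")
    ```

    So 180B sits right at the edge of optimal, while the smaller Falcon models saw far more than 20 tokens per parameter on the same corpus, which matches the impression above.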
  • koolaidman123@alien.top

    Public leaderboards mean nothing because 99% of the finetuned models are overfitted to hell; it's like nobody has ever done a Kaggle comp before.
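    A minimal sketch of the selection effect behind this (hypothetical numbers, not any real leaderboard): if many finetunes of equal true skill are ranked by a noisy public benchmark, the "winner" looks inflated, and a held-out private split reveals the regression:

    ```python
    # Pick the best of many equally-skilled models by public-split score,
    # then evaluate that same model on a held-out private split.
    import numpy as np

    rng = np.random.default_rng(0)
    n_models = 500       # candidate finetunes, identical true skill
    true_skill = 0.70    # genuine accuracy of every model
    noise = 0.03         # eval noise on each finite benchmark split

    public = true_skill + rng.normal(0, noise, n_models)
    private = true_skill + rng.normal(0, noise, n_models)

    best = np.argmax(public)  # "leaderboard winner"
    print(f"public score of winner:  {public[best]:.3f}")   # inflated
    print(f"private score of winner: {private[best]:.3f}")  # ~true skill
    ```

    This is exactly why Kaggle scores the final ranking on a private test split.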