• we_are_mammals@alien.top (OP)
    10 months ago

    According to the scaling laws, the loss/error is approximated as

    w0 + w1 * pow(num_params, -w2) + w3 * pow(num_tokens, -w4)
    

    Bill wrote before that he’d been meeting with the OpenAI team since 2016, so he’s probably fairly knowledgeable about these things. He might be referring to the fact that, after a while, you see sharply diminishing returns from increasing num_params: in the limit, the corresponding term vanishes, but the other terms do not (see the sketch below).
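    For concreteness, here is a minimal Python sketch of that formula. The coefficient values are illustrative assumptions (loosely in the spirit of published Chinchilla-style fits), not numbers from the comment; the point is only that the parameter term shrinks toward zero while w0 and the token term remain.

    # Sketch of the parametric scaling law above, with assumed coefficients.
    def approx_loss(num_params, num_tokens,
                    w0=1.69, w1=406.4, w2=0.34, w3=410.7, w4=0.28):
        """Loss ~ w0 + w1 * N^-w2 + w3 * D^-w4."""
        return w0 + w1 * pow(num_params, -w2) + w3 * pow(num_tokens, -w4)

    # Diminishing returns from parameters alone: hold tokens fixed, scale N.
    tokens = 1.4e12  # fixed training-token budget (illustrative)
    for n in [1e9, 1e10, 1e11, 1e12, 1e13]:
        print(f"N={n:.0e}  loss ~= {approx_loss(n, tokens):.3f}")

    # As N -> infinity the w1 * N^-w2 term vanishes, but the irreducible w0
    # and the data term w3 * D^-w4 remain, so the loss plateaus.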

    • liongalahad@alien.top
      10 months ago

      I’m not an expert by any means, just someone who is interested and reads AI news, but lately it seems like optimisation and efficiency are doing more to improve LLM performance than simply increasing the parameter count. Research is also clearly pointing at different architectures, beyond transformers, to improve performance. I’d be surprised if GPT-5, which is 2-3 years away, were just a straightforward development of GPT-4, i.e. an LLM with many more parameters. These statements from Bill seem a little short-sighted and at odds with the general consensus.

      I am also aware of the Dunning–Kruger effect and how it may be tricking me into thinking I somewhat understand things I actually have no idea about, lol