wojcech@alien.top to Machine Learning@academy.garden · English · 10 months ago
[R] "It's not just memorizing the training data" they said: Scalable Extraction of Training Data from (Production) Language Models (arxiv.org)
Cross-posted to: hackernews@lemmy.smeargle.fans, hackernews@derp.foo, machinelearning@lemmit.online
StartledWatermelon@alien.top · English · 10 months ago
GPT-4: 1.76 trillion parameters, about 6.5* trillion tokens in the dataset.
*Could be twice that; the leaks weren't crystal clear. The above number is the more likely one, though.
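For context, here is a quick back-of-the-envelope calculation using the figures quoted in that comment (which are themselves leaked, unconfirmed numbers, not official ones): it just divides the reported token count by the reported parameter count to get training tokens seen per parameter.

# Back-of-the-envelope ratio from the leaked, unconfirmed figures in the comment above.
params = 1.76e12              # reported GPT-4 parameter count
tokens_low = 6.5e12           # reported training-set size in tokens
tokens_high = 2 * tokens_low  # "could be twice that"

for label, tokens in [("low estimate", tokens_low), ("high estimate", tokens_high)]:
    print(f"{label}: {tokens / params:.1f} training tokens per parameter")
# low estimate: 3.7 training tokens per parameter
# high estimate: 7.4 training tokens per parameter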