[D] Is grokking present in LLMs?
PierroZ-PLKG@alien.top to Machine Learning@academy.garden · English · 10 months ago
As the title says, I'm curious whether grokking has been shown to occur in LLMs. Could it be the case with GPT-4?
yannbouteiller@alien.top · 10 months ago
Am I correct to say that "grokking" is apparently an effect of regularization, as in reaching good generalization performance by pushing the weights to be as small as possible, until the model reaches a capacity smaller than the dataset?
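The weight-shrinking effect described in the comment can be sketched in a few lines. This is a minimal, hypothetical illustration, not anyone's actual training code: it builds the modular-addition dataset used in the original grokking paper (Power et al., 2022) and shows how an SGD step with decoupled weight decay strictly shrinks the weight norm even when the task gradient is zero. The function name `sgd_step` and the constants are assumptions for illustration only.

```python
import numpy as np

# Canonical grokking task: addition modulo a prime (p = 97 in Power et al. 2022).
p = 97
pairs = [(a, b, (a + b) % p) for a in range(p) for b in range(p)]

def sgd_step(w, grad, lr=0.1, weight_decay=0.01):
    """One SGD update with decoupled (AdamW-style) weight decay.

    The decay term subtracts a fraction of w itself, independently of the
    task gradient, which is the regularization pressure the comment refers to.
    """
    return w - lr * grad - lr * weight_decay * w

w = np.ones(4)
w_next = sgd_step(w, grad=np.zeros(4))  # zero task gradient: only decay acts
# The norm strictly decreases, pulling weights toward zero over training.
assert np.linalg.norm(w_next) < np.linalg.norm(w)
```

Whether this shrinkage alone explains grokking is debated; it is one prominent hypothesis (small-norm solutions generalizing better), not a settled mechanism.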