[R]eading List for Andrej Karpathy’s “Busy person’s intro to Large Language Models” Video

FallMindless3563@alien.top · 10 months ago

Maykey@alien.top · 10 months ago

I haven’t watched the talk, but I think the reading list should have some love for SSMs (S4, S5, H3): on the one hand, their variants are very prominent on Long Range Arena (LRA); on the other, they are relatively “unknown”.

They are not unknown to researchers, seeing how many variants there are, but there are hundreds more videos and blogs explaining transformers. If you find a course about LLMs, it will likely include Transformers but not SSMs, so I think their success on LRA and absence from learning materials qualify them for a “dive in deeper” list.

derpgod123@alien.top · 10 months ago

Only papers to read, no books?

FallMindless3563@alien.top · 10 months ago

The only book he explicitly mentions is “Thinking, Fast and Slow” by Daniel Kahneman, but I think there are a ton of books that would be great resources alongside the papers. I just happened to pull a lot of the papers from the footnotes and concepts he mentioned.

um-xpto@alien.top · 10 months ago

Nice! Thank you for your work.

Regarding the video.

Q1) Minute 14:14, Finetuning into an Assistant: when you have multiple tasks/datasets with diverse outputs, how is training performed? Are all datasets combined in a single training run? Or is finetuning done on top of a previous finetuning? Or is the question parsed and sent to a specific model?

Q2) Minute 27:43, Tool Use (Browser, Calculator, etc.): does anyone have links to similar implementations for Llama, and how it is done or what kind of tech/frameworks are used?

Disastrous_Elk_6375@alien.top · 10 months ago

Q2) Minute 27:43, Tool Use (Browser, Calculator, etc.): does anyone have links to similar implementations for Llama, and how it is done or what kind of tech/frameworks are used?

The naive way is to use LangChain, but that’s hit and miss for several reasons, and whatever you build will be held together by duct tape and prayers. Alternative frameworks include Haystack and Griptape.

I’ve found that for local models the best tool-usage you can get is by using an advanced control library. This gives you a lot of flexibility in organising the prompts and “helping” the local models a lot. Guidance and LMQL are two such libraries.
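To make the idea of tool use concrete, here is a rough, library-free sketch of the basic loop: the model writes a tool call in its output, the harness runs the tool, and the result is spliced back into the text. Everything here is an assumption for illustration: the `CALC(...)` call syntax, the `fake_model` stand-in (a real setup would call a local LLM), and the function names. It is not how Guidance, LMQL, or any of the frameworks above implement it.

```python
import ast
import operator
import re

# Arithmetic operators the calculator tool is allowed to evaluate.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def calculator(expression: str):
    """Evaluate a plain arithmetic expression without calling eval()."""

    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")

    return walk(ast.parse(expression, mode="eval").body)


# Hypothetical call syntax: the prompt asks the model to write CALC(<expr>).
# This simple pattern does not handle nested parentheses.
CALL_PATTERN = re.compile(r"CALC\(([^)]*)\)")


def fake_model(prompt: str) -> str:
    """Stand-in for a local LLM; a real model would generate this text."""
    return "The total is CALC(12 * 7 + 5) apples."


def run_with_tools(prompt: str) -> str:
    """Generate text, then replace each CALC(...) call with its result."""
    draft = fake_model(prompt)
    return CALL_PATTERN.sub(lambda m: str(calculator(m.group(1))), draft)


print(run_with_tools("How many apples are in 12 crates of 7, plus 5 loose?"))
```

The control libraries help mostly with the first half of this loop: constraining the model so that it reliably emits a call the harness can parse.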

um-xpto@alien.top · 10 months ago

Thanks. Guidance seems like a good fit; I’ll start looking for more info.

FallMindless3563@alien.top · 10 months ago

You certainly can combine all the tasks and datasets into a single instruction fine tuning dataset. Then you would have a separate dataset for the reinforcement learning half where the model is learning human preferences.
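A minimal sketch of the "combine everything" option, assuming three toy tasks: each task-specific record is mapped to one shared prompt/response schema, and the pooled examples are shuffled so every training batch mixes tasks. The task names, field names, and prompt templates are hypothetical, and the separate preference dataset for the reinforcement learning stage is not shown.

```python
import random


def to_instruction_example(task: str, record: dict) -> dict:
    """Map a task-specific record to one shared prompt/response schema."""
    if task == "summarize":
        return {"prompt": f"Summarize the following text:\n{record['text']}",
                "response": record["summary"]}
    if task == "translate":
        return {"prompt": f"Translate to French:\n{record['source']}",
                "response": record["target"]}
    if task == "qa":
        return {"prompt": f"Answer the question:\n{record['question']}",
                "response": record["answer"]}
    raise ValueError(f"unknown task: {task}")


def build_mixture(datasets: dict, seed: int = 0) -> list:
    """Pool all tasks into one list and shuffle it reproducibly."""
    examples = [
        to_instruction_example(task, record)
        for task, records in datasets.items()
        for record in records
    ]
    random.Random(seed).shuffle(examples)
    return examples


# Toy data, invented purely for illustration.
TOY_DATASETS = {
    "summarize": [{"text": "A long article about cats.", "summary": "Cats."}],
    "translate": [{"source": "Good morning", "target": "Bonjour"}],
    "qa": [
        {"question": "What is 2 + 2?", "answer": "4"},
        {"question": "What color is the sky?", "answer": "Blue"},
    ],
}

mixture = build_mixture(TOY_DATASETS)
print(len(mixture), "examples in the combined fine-tuning set")
```

The resulting list can then be fed to whatever supervised fine-tuning loop you use; the point is only that diverse outputs stop being a problem once every task is phrased as an instruction and a response.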

akardashian@alien.top · 10 months ago

thanks for compiling!

coumineol@alien.top · 10 months ago

Thanks but here’s the problem with this list: most of the papers mentioned are on a very high technical level, and people who would be able to understand them are probably people who have already read them. Note that Andrej was careful to keep the material at a certain level because he addresses those who want to go one step further than talking to ChatGPT, without necessarily understanding all the underlying theory.

teryret@alien.top · 10 months ago

Right, that’s why OP prefaced with “to dive deeper into a lot of the topics”. If folks aren’t at a point where diving deeper makes sense, it’s not a list for them. There are plenty of resources for any given level of understanding; obviously, no list is going to be appropriate for every member of a diverse community.

coumineol@alien.top · 10 months ago

Not to start an argument here, but I can’t imagine anybody with any level of understanding who should start diving deeper by reading the “Attention is All You Need” paper. Yes, this is a diverse community, but when you try to address everybody’s needs, you usually end up addressing nobody’s needs.

eek04@alien.top · 10 months ago

Since “Attention is All You Need” is fairly high on my reading list for understanding the details of transformer architecture, what do you recommend instead?

coumineol@alien.top · 10 months ago

https://arxiv.org/abs/2106.04554

If you’re trying to learn more about language models, don’t bother with anything written before 2020. That’s basically the Stone Age.

eek04@alien.top · 10 months ago

Thank you!

whymauri@alien.top · 10 months ago

Just me, but I think of busy coworkers with great background in math/stats and ‘classic’ ML who would ramp up quickly from a list like this. When I onboarded chemists (PhDs) to my ML team at a drug startup, I would send them a similarly dense reading list. With their strong background in physics, it would take them two weeks flat to understand the necessary theory and jargon to be productive (in our niche field).

coumineol@alien.top · 10 months ago

Didn’t mean to say those papers are completely useless, but even for those with a strong Math/ML background I would advise starting with recent survey papers. Reading “Attention is All You Need” is kind of like reading the General Relativity papers of Einstein - cool as a historical curiosity, but not ideal for optimizing expertise acquisition.

lakolda@alien.top · 10 months ago

Some of the content also seems to allude to what Q* might be…

currentscurrents@alien.top · 10 months ago

It really doesn’t, because no one has any clue what Q* is or if it’s even real.

lakolda@alien.top · 10 months ago

Ever hear the term “might”?