• 0 Posts
  • 15 Comments
Joined 5 months ago
Cake day: October 15th, 2024






  • Arkthos@pawb.social to Privacy@lemmy.world: Please, don't!
    English · 6 upvotes · 14 days ago

    You can offload them into RAM. The response time gets way slower once this happens, but you can do it. I’ve run a 70B Llama model on my 3060 12 GB at 2-bit quantisation (I do have plenty of RAM, so no offloading from RAM to disk at least lmao). It took like 6-7 minutes to generate each reply, but it did work.
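
For a rough sense of why offloading is needed here, a back-of-the-envelope sketch follows. It assumes a nominal 2 bits per weight (real 2-bit formats carry extra overhead for scales, so actual files are larger), an 80-layer model, and a hypothetical 2 GB of VRAM reserved for context and buffers. The function names are made up for illustration and are not any library's API. At exactly 2 bits, 70 billion weights come to 17.5 GB, which is already more than the card's 12 GB.

```python
GB = 1e9  # decimal gigabytes, to keep the arithmetic simple


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in bytes."""
    return n_params * bits_per_weight / 8


def layers_on_gpu(n_params: float, bits_per_weight: float, n_layers: int,
                  vram_bytes: float, reserve_bytes: float = 0.0) -> int:
    """Estimate how many equally sized layers fit in VRAM.

    Assumes all weights are spread evenly over the layers, which is a
    simplification (embeddings and the output head are ignored).
    """
    per_layer = weight_bytes(n_params, bits_per_weight) / n_layers
    usable = max(vram_bytes - reserve_bytes, 0.0)
    return min(n_layers, int(usable // per_layer))


if __name__ == "__main__":
    total = weight_bytes(70e9, 2)
    fit = layers_on_gpu(70e9, 2, n_layers=80, vram_bytes=12 * GB,
                        reserve_bytes=2 * GB)  # 2 GB reserve is an assumption
    print(f"weights: {total / GB:.1f} GB, layers that fit on GPU: {fit} of 80")
```

Whatever does not fit on the GPU has to sit in system RAM and run there, which is where the multi-minute replies come from.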








  • Thankfully there is often a pretty big difference between studying and working.

    I found there to be a level of stress in my studies that I never had a problem with later. There was this idea that any moment not spent poring over books was contributing, at least in my mind, to inevitable failure; doubly so with exams looming ahead.

    For me, finishing my engineering degree was such a massive relief, and work is so much better. I’m in anon’s boat.