Hey guys,

What’s currently the best LLM for a low-VRAM machine with only 6 GB of VRAM? I’ve also got 32 GB of RAM.

I’m experimenting a little with SillyTavern and I’m curious which model gets the most out of my setup. It should be multilingual and suitable for “casual chatting”.

I know I will probably not get very far with this, but I’m still interested in how far we’ve already come.

(I’m using KoboldCpp, if that matters.)

~sp3ctre

  • Denixen@feddit.nu
    5 days ago

    My setup is a laptop with 8 GB of VRAM and 16 GB of RAM.

    I have been using Ministral 3B (fast) and 14B (slower, but somewhat smarter and more capable) via Ollama. They work remarkably well considering how small they are.

    I have been using it as a text translator, summarizer, and assistant for discussing more basic things. I have also integrated it into PyCharm as a coding assistant using the Ollama assist plugin.

    For autocomplete in PyCharm I have to use Llama 3.1 8B, since Ministral cannot do autocomplete (?).

    I can recommend Ministral. Mistral are really great at creating small distilled models that pack a lot of bang for the parameters they have.
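
For anyone who wants to try the translator use mentioned above without a plugin, here is a rough sketch of calling a local Ollama server from Python. It assumes Ollama is running on its default port (11434) and uses its `/api/generate` endpoint. The model tag `ministral-3:3b` is a guess, so check `ollama list` for the tag you actually pulled. The function names are made up for this example.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"
# ASSUMPTION: placeholder model tag; replace with a tag shown by `ollama list`.
MODEL_TAG = "ministral-3:3b"


def build_translation_request(text, target_language, model=MODEL_TAG):
    """Build the JSON body for a one-shot, non-streaming generate call."""
    prompt = (
        f"Translate the following text into {target_language}. "
        f"Reply with the translation only.\n\n{text}"
    )
    # "stream": False asks Ollama for a single JSON object instead of chunks.
    return {"model": model, "prompt": prompt, "stream": False}


def translate(text, target_language, model=MODEL_TAG, url=OLLAMA_URL):
    """Send the request to the local server and return the generated text."""
    body = json.dumps(build_translation_request(text, target_language, model))
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Small models on a laptop can be slow, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]
```

With the server running, something like `print(translate("Hej, hur mår du?", "English"))` should print the model's translation. Swapping the prompt text is enough to turn the same call into a summarizer.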