We started translating ProleWiki to French locally, which would have taken 8 days of inference on my computer, and then we found out Mistral AI offers a free API. You get 500,000 input tokens per minute and a total of 2 billion generated tokens per month. All you need is to give them a phone number.
Basically, with limits like these we could translate all of ProleWiki English (about 50 million tokens currently) into, well, every single language in existence or something like that (2 billion divided by 50 million works out to 40 full-wiki translations per month per account).
All of their models are freely accessible like this (they have thinking models, coding models, etc. for different tasks), but the trade-off is that they’re allowed to train their models on what you send. Which is fine with me lol, here, have some communist propaganda in your AI.
I can only speak for French, but the translation quality is very, very good with the 3.2 Large model. Previously I could only run the 3.2 Small on my computer. The API is also much faster than local inference, and because it’s an API you can multithread (run more than one instance of the script at a time).
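A rough sketch of what that multithreading can look like in Python. The `translate_chunk` function here is a hypothetical stand-in for whatever API call your script actually makes:

```python
from concurrent.futures import ThreadPoolExecutor

def translate_chunk(chunk: str) -> str:
    # Hypothetical stand-in: in the real script this would POST the
    # chunk to the Mistral API and return the translated text.
    return f"[fr] {chunk}"

chunks = ["First paragraph.", "Second paragraph.", "Third paragraph."]

# Each worker handles one chunk at a time; since the API does the heavy
# lifting, the threads spend most of their time just waiting on I/O,
# which is exactly when threading pays off in Python.
with ThreadPoolExecutor(max_workers=4) as pool:
    translated = list(pool.map(translate_chunk, chunks))

print(translated)
```

`pool.map` keeps the results in the same order as the input chunks, which matters when you stitch the translated pieces back into one page.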
For example, it’s able to figure out acronyms and automatically reorder them (NATO becomes OTAN in French, for instance). Oh, and we’re also translating our entire English-language library, which contains over 1,400 books. And just to preface: yes, many of these are already available in French, but that’s not a problem; down the line we can easily replace the machine translations with the existing ‘official’ editions. That’s the beauty of the internet, this would have been too much of a pain in print.
Of course language capabilities depend on a lot of things, but we may look at this for other language instances if the result is worth it. It requires a lot of testing with native speakers, changing the prompt one word at a time and comparing the different outputs on the same chunk of text.
It does introduce some artefacts, but these are very easily fixable with some regex to clean up the files, a step I think a lot of people just don’t bother with even though it’s literally so simple. For example, the Large model sometimes likes to say “Here is the French translation of the provided wikitext:”, but you can very easily remove that in bulk. The Small model didn’t have this quirk, funnily enough.
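That cleanup pass really is just a couple of substitutions. The exact preamble strings below are examples (you’d extend the pattern as you find new variants in your output files), but the idea is this:

```python
import re

# Example preamble lines the model sometimes prepends to its output;
# add alternatives to the pattern as you discover new ones.
PREAMBLE = re.compile(
    r"^(Here is the French translation of the provided wikitext:|"
    r"Voici la traduction\s*:)\s*\n?",
    re.IGNORECASE,
)

def clean(text: str) -> str:
    # Strip the chatty preamble plus any leading/trailing blank lines,
    # and normalize to a single trailing newline.
    return PREAMBLE.sub("", text).strip() + "\n"

raw = "Here is the French translation of the provided wikitext:\n\n== Histoire ==\nTexte traduit."
print(clean(raw))
```

Run it over every output file in a loop and the whole batch is cleaned in seconds.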
Honestly I have to wonder how long this will keep going, because this can’t be profitable for them. I don’t know how Mistral AI makes its money, but they aim to be one of the big players in the sphere (and they’re based in France, so it’s Western tech).
I made the script with crush + the DeepSeek API and adapted it to be compatible with Mistral’s API, which uses the OpenAI format, so it works out of the box. I just run the script on my computer and they do the heavy lifting. And the more you use crush, the more you know what you should have in your script or what you should ask crush/DeepSeek, like any tool. If I had to redo this script from scratch I think I could get it in just one prompt, maybe two.
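Since the API speaks the OpenAI chat format, the request body is just the usual messages structure. This sketch only builds the payload (the system prompt is a placeholder, and `mistral-large-latest` is an assumption on my part; check Mistral’s docs for current model IDs):

```python
def build_request(text: str, model: str = "mistral-large-latest") -> dict:
    # Standard OpenAI-style chat-completion payload; any OpenAI-compatible
    # client can send this as-is to Mistral's endpoint.
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Translate the following wikitext to French. "
                        "Preserve all wiki markup exactly."},
            {"role": "user", "content": text},
        ],
        "temperature": 0.2,  # low temperature keeps translations consistent
    }

payload = build_request("== History ==\nProleWiki was founded in 2020.")
print(payload["model"], len(payload["messages"]))
```

This is why switching the script from DeepSeek to Mistral was painless: only the base URL, key, and model name change, and the payload stays identical.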
You could also use crush with their API, it’s just that the data will be used to train their AI. I also don’t know how good their model is at coding.
I was worried about potential censoring (“I’m sorry I can’t help with that let’s talk about something else”) but it seems APIs are generally more permissive than the web interfaces, probably because it would drive away enterprise clients.
Anyway, instead of running cycles for 8 days with my GPU this should take only 2 days, though we are spreading the workload between 4 people and have added the entire Library to this run.
If you want to try all of this out, you can start with the free Mistral API. Open this link and it should prompt you to make an account; follow the steps, then you can generate an API key here: https://console.mistral.ai/home?workspace_dialog=apiKeys.


LLMs are pretty good at translation in my experience, often better than traditional translation services. But quality can vary wildly between languages. English to French and vice versa is probably the best-case scenario for Mistral models, so I wouldn’t expect the same level of quality for other languages, especially non-European and/or obscure ones.
Also, I do agree that getting more pro-communist text on the internet for LLMs to train on is something we could try and push for. Would certainly make for better data than all the liberal and right-wing stuff on the internet right now.
I’m sure they’ve already scraped our website 3 times over lol, but I’m completely fine running the API into the void just to force them to get these texts in their data too. OpenAI is actually pretty terrible with their scraping; if you ever check your website logs you’ll be surprised how often they make requests (every second of the day, basically).
We chose Mistral specifically for French, but I might try Spanish or Portuguese with it too, just to see. As with everything it’s part of a pipeline: we have to find a model, craft a prompt (this one is 600+ tokens), send a good representative sample text to the LLM, let a native speaker read it over, try a few more prompts with more or fewer instructions and compare the outputs, and then decide whether we can work with that or if we need to find another model.
If anything I find that with LLMs people want stuff to happen ever faster, me included, because what took days now takes only hours. For the next jobs I’ll take the time to go through the process more carefully; it’s important to follow it and make sure you’ve checked all the boxes.