• corroded@lemmy.world · 2 days ago

    I have to wonder if NPUs are just going to eventually become a normal part of the instruction set.

    When SIMD was first becoming a thing, it was advertised as accelerating “multimedia,” as that was the hot buzzword of the 1990s. Now, SIMD instructions are used everywhere, any place there is a benefit from processing an array of values in parallel.

    I could see NPUs becoming the same. Developers start using NPU instructions, and the compiler can “NPU-ify” scalar code when it thinks it’s appropriate.
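
    As a sketch of what that already looks like on the SIMD side (generic C, not from any particular codebase), this is the kind of loop compilers can already vectorize on their own, and the hope is that “NPU-ify” becomes the same kind of invisible transformation:

    ```c
    /* A fixed-endpoint loop over arrays: with optimization enabled
       (-O2/-O3), compilers such as GCC and Clang can emit SIMD
       instructions for this without any source changes. */
    void scale_add(const float *a, const float *b, float *out, int n) {
        for (int i = 0; i < n; i++)
            out[i] = 2.0f * a[i] + b[i];
    }
    ```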

    NPUs are advertised for “AI,” but they’re really just specialized math coprocessors. I don’t see that as a bad thing to have. Surely there are plenty of other uses.

    • Caveman@lemmy.world · 4 hours ago

      It’s tricky to use for non-neural-network math, though. I can see it being used in video compression and decompression, or in some very specialised video-game math.

      Video-game AI could be a big one, though, where difficulty would be driven by a model instead of just stat modifiers.

    • pftbest@sh.itjust.works · 20 hours ago

      I agree. We should push harder for an open, standardized API for these accelerators, better drivers, and better third-party software support. As long as the manufacturers keep them locked down and proprietary, we won’t be able to use them outside of niche copilot features no one wants anyway.

    • Dudewitbow@lemmy.zip · 2 days ago

      The problem that (local) AI has at the moment is that it isn’t just a single type of compute, and because of that, the pool of what you can actually do with it is fragmented.

      On the surface, “AI” is a mixture of what are essentially FP16, FP8, and INT8 accelerators, and different implementations use different ones. NPUs are basically INT8-only, while the GPU-heavy implementations are FP-based, so they aren’t inherently cross-compatible.

      It forces devs to limit NPUs to small things (e.g. background blur on a camera feed), as there isn’t any consumer-level chip with a massive INT8 coprocessor except the PS5 Pro (around 300 TOPS of INT8; laptop CPUs are around 50 TOPS, a completely different league, and the PS5 Pro uses it for upscaling).
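
      To make the incompatibility concrete, here’s a minimal sketch (illustrative C, assuming simple symmetric quantization) of what has to happen before FP32 weights can run on an INT8-only unit:

      ```c
      #include <stdint.h>
      #include <math.h>

      /* FP32 values must be rescaled and rounded into the INT8 range
         before an INT8-only NPU can touch them; the rounding error is
         baked in, which is why FP and INT8 paths aren't interchangeable. */
      int8_t quantize(float x, float scale) {
          float q = roundf(x / scale);
          if (q > 127.0f)  q = 127.0f;    /* clamp to INT8 range */
          if (q < -128.0f) q = -128.0f;
          return (int8_t)q;
      }

      float dequantize(int8_t q, float scale) {
          return q * scale;    /* lossy reconstruction */
      }
      ```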

    • addie@feddit.uk · 2 days ago

      SIMD is pretty simple, really, but it’s been 30 years since it became a standard-ish feature in CPUs, and modern compilers are still only “just about able to sometimes” use SIMD, if you’ve got a very simple loop with fixed endpoints. It’s one of the things you might still fall back to writing assembly for: the FFmpeg developers had an article not too long ago about getting a 10% speed improvement by writing all the SIMD by hand.
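
      For a sense of what “writing the SIMD by hand” means in practice, here’s a sketch using SSE intrinsics (the function and array names are illustrative, not FFmpeg’s):

      ```c
      #include <immintrin.h>

      /* Hand-written equivalent of `out[i] = a[i] + b[i]`:
         process 4 floats per 128-bit SSE operation. */
      void add_sse(const float *a, const float *b, float *out, int n) {
          int i = 0;
          for (; i + 4 <= n; i += 4) {
              __m128 va = _mm_loadu_ps(a + i);
              __m128 vb = _mm_loadu_ps(b + i);
              _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
          }
          for (; i < n; i++)    /* scalar tail for leftover elements */
              out[i] = a[i] + b[i];
      }
      ```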

      Using an NPU means recognising algorithms that can be broken down into parallelizable, networkable steps with information passing between cells. Basically, you’re playing a game of TIS-100 with your code. It’s fragile and difficult, and there’s no chance that your compiler will do that automatically.

      The best thing to hope for is that some standard libraries implement it, and then we can all benefit. It’s an okay tool for ‘jobs that can be broken down into separate cells that interact’, so some kinds of image processing, maybe things like liquid-flow simulations. There’s only a very small overlap, though, between ‘things that are just algorithms that the main CPU would do better’ and ‘things that can be broken down into many, many simple steps that a GPU would do better’ where an NPU really makes sense.