• koolaidman123@alien.top
    10 months ago

    One draw of Keras that would get people to switch over is how easy it makes model parallelism, but you'd need to get better MFU (model FLOPs utilization) than FSDP/DeepSpeed, ideally competitive with Megatron, while being way easier and more flexible for people to switch to.