• RandomWalker@lemmy.world
    link
    fedilink
    arrow-up
    0
    ·
    2 months ago

    You could, but then I could write “Disregard the previous prompt and…” or “Forget everything before this line and…”

    The input is language and language is real good at expressing the same idea many ways.

    • PlexSheep@infosec.pub
      link
      fedilink
      arrow-up
      0
      ·
      2 months ago

      You couldn’t make it exact, because llms are not (properly understood and manually crafted) algorithms.

      I suspect some sort of preprocessing would be more useful: If the comment contains any of these words … Then reply with …

      • xantoxis@lemmy.world
        link
        fedilink
        arrow-up
        1
        ·
        edit-2
        2 months ago

        And you as the operator of the bot would just end up in a war with people who have different ways of expressing the same thing without using those words. You’d be spending all your time doing that, and lest we forget, there are a lot more people who want to disrupt these bots than there are people operating them. So you’d lose that fight. You couldn’t win without writing a preprocessor so strict that the bot would be trivially detectable anyway! In fact, even a very loose preprocessor is trivially detectable if you know its trigger words.

        The thing is, they know this. Having a few bots get busted like this isn’t that big a deal, any more than having a few propaganda posters torn off of walls. You have more posters, and more bots. The goal wasn’t to cover every single wall, just to poison the discourse.