• Ethanol@pawb.social
      2 days ago

      That’s a great post that nicely shows the issues with AI-written tests! Out of personal curiosity I looked at the testing rewrite of rsync, specifically the chgrp_test, because it was the smallest test I could find quickly. If you look at the original shell script, all it does is call chgrp and then fail if that doesn’t work. In the Python rewrite, on the other hand, the LLM calls chown to change the group and only tests chgrp if that fails. So if for some reason chown works but chgrp would fail, the original shell script would easily catch that (why test chown there anyway?), while the Python rewrite doesn’t even call chgrp when chown works.

      Even though this might not be much of a problem in practice, I think it illustrates that the AI tends to write tests where it already anticipates and tries to fix potential issues, which absolutely goes against the use of tests!
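
To make the difference concrete, here is a rough sketch of the two test shapes described above. This is not the actual rsync test code: `strict_test`, `fallback_test`, `working_chown`, `broken_chgrp`, and `outcome` are hypothetical names, and chown/chgrp are faked with plain functions so the example runs without touching real files or needing privileges. It simulates the case where chown works but chgrp is broken.

```python
# Hypothetical stand-ins for illustration only; not taken from the rsync test suite.

def working_chown(path, group):
    """Fake chown that always succeeds."""
    return None


def broken_chgrp(path, group):
    """Fake chgrp that always fails, simulating a broken chgrp."""
    raise OSError("chgrp failed")


def strict_test(chgrp, path, group):
    """Shape of the shell script as described: call chgrp, fail if it fails."""
    chgrp(path, group)  # any exception propagates and fails the test


def fallback_test(chown, chgrp, path, group):
    """Shape of the rewrite as described: try chown, use chgrp only if chown fails."""
    try:
        chown(path, group)
    except OSError:
        chgrp(path, group)  # only reached when chown fails


def outcome(test, *args):
    """Run a test function and report 'pass' or 'fail'."""
    try:
        test(*args)
        return "pass"
    except OSError:
        return "fail"


strict_result = outcome(strict_test, broken_chgrp, "testfile", "staff")
fallback_result = outcome(fallback_test, working_chown, broken_chgrp, "testfile", "staff")

print(strict_result)    # fail: the broken chgrp is caught
print(fallback_result)  # pass: chgrp was never called, so the breakage is hidden
```

With a working chown, the fallback version reports success without ever exercising chgrp, which is exactly the case the strict version catches.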

      • The_Decryptor@aussie.zone
        2 days ago

        I think it illustrates that the AI tends to write tests where it already anticipates and tries to fix potential issues, which absolutely goes against the use of tests!

        LLMs just generate “statistically probable” text. All they’re doing is generating text that looks like how you’d write tests; the result may or may not actually test anything.

    • whoisearth@lemmy.ca
      2 days ago

      Lol, this is hilarious! I want to see the prompt they used, because my god, they didn’t think it through.