• Encephalotrocity@feddit.online · 19 days ago

    Perhaps the most discussed technical detail is the “Undercover Mode.” This feature reveals that Anthropic uses Claude Code for “stealth” contributions to public open-source repositories.

    The system prompt discovered in the leak explicitly warns the model: “You are operating UNDERCOVER… Your commit messages… MUST NOT contain ANY Anthropic-internal information. Do not blow your cover.”

    Laws should have been put in place years ago to make it so that AI usage needs to be explicitly declared.

    • a4ng3l@lemmy.world · 19 days ago

      In Europe we have the AI Act which, as of August, will introduce some form of transparency obligations. Not perfect obviously, but a start. It probably won’t be followed by the rest of the world though, so like GDPR it will be forcibly eroded by others’ interests through lobbying, but at least we try.

    • merc@sh.itjust.works · 19 days ago

      The system prompt discovered in the leak explicitly warns the model: “You are operating UNDERCOVER… Your commit messages… MUST NOT contain ANY Anthropic-internal information. Do not blow your cover.”

      This is so incredibly stupid.

      You’ve tried security.

      You’ve tried security through obscurity.

      Now try security through giving instructions to an LLM via a system prompt to not blow its cover.

    • Modern_medicine_isnt@lemmy.world · 19 days ago

      That doesn’t sound like it’s saying “don’t identify yourself.” That it’s called Claude isn’t internal information, so that instruction doesn’t seem to do what you’re saying. There must be more instructions.

    • JohnEdwa@sopuli.xyz · 19 days ago

      With how massive of a computer science field artificial intelligence is and how much of it already is or is getting added to every piece of software that exists, a label like that would be equally useless as the California prop 65 cancer warnings.

      Do you use a mobile keyboard that supports swipe typing and has autocorrect? Remember to mark everything you write as being AI assisted.

      • mrbutterscotch@feddit.org · 19 days ago

        Well yes, if you let autocorrect write a code contribution, I think you should label that contribution as AI.

      • Encephalotrocity@feddit.online · 19 days ago

        If it were the law, then the AI itself would be coded to not allow going “undercover”, and there would be legal consequences if caught. Torvalds’ stance only matters for how things ‘are’, not how they ‘could be’.

        Would it be a cure-all? Of course not. Fraud still happens despite being illegal. But it’s better than not being able to trust anything ever again.

        • MangoCats@feddit.it · 19 days ago

          and there would be legal consequences if caught.

          Like for driving over the speed limit? Or putting glass in the regular trash instead of the recycling? Yeah, just what I need in my life, another arbitrary law that’s enforced 0.0001% of the time as a flex by the people in power to target and abuse people they don’t like.

  • CorrectAlias@piefed.blahaj.zone · 19 days ago

    Be careful not to introduce security vulnerabilities such as command injection, XSS, SQL injection, and other OWASP top 10 vulnerabilities. If you notice that you wrote insecure code, immediately fix it.

    Lmao. I’m sure that will solve the problem of it writing insecure slop code.
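    The injection failure mode the prompt warns about is easy to demonstrate. A minimal sketch in Python with sqlite3 (the table and input are made up for illustration): string-built SQL lets input escape the query, a parameterized query doesn’t.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

# Injectable: user input is spliced straight into the SQL string.
user_input = "nobody' OR '1'='1"
unsafe = f"SELECT name FROM users WHERE name = '{user_input}'"
print(conn.execute(unsafe).fetchall())   # the OR clause matches every row

# Parameterized: the driver treats the input as data, not as SQL.
safe = conn.execute("SELECT name FROM users WHERE name = ?", (user_input,))
print(safe.fetchall())                   # no user is literally named that: []
```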

    • filcuk@lemmy.zip · 19 days ago

      It doesn’t fix it, but as stupid as it looks, it should actually improve the chances.
      If you’ve seen how the reasoning works, they basically spit out some garbage, then read it again and consider whether it’s garbage or not.
      They do try to ‘correct their errors’, so to speak.
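      That draft-then-critique loop can be sketched in a few lines. Everything below is hypothetical stand-in code: in a real system `generate` and `critique` would both be model calls, not the toy rules used here.

```python
def generate(prompt: str, feedback: str = "") -> str:
    # Stand-in for a model call; a real system would query an LLM here.
    if feedback:
        return "SELECT name FROM users WHERE name = ?"  # revised draft
    return f"SELECT name FROM users WHERE name = '{prompt}'"  # first draft

def critique(draft: str) -> str:
    # Toy self-review pass: flag string-interpolated SQL as a problem.
    return "parameterize the query" if "'" in draft else ""

def generate_with_review(prompt: str, max_rounds: int = 3) -> str:
    # Draft, re-read the draft, and revise until the critique comes back empty.
    draft = generate(prompt)
    for _ in range(max_rounds):
        feedback = critique(draft)
        if not feedback:
            break
        draft = generate(prompt, feedback)
    return draft

print(generate_with_review("alice"))  # the revised, parameterized draft
```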

  • NocturnalMorning@lemmy.world · 19 days ago

    By 4:23 am ET, Chaofan Shou (@Fried_rice), an intern at Solayer Labs, broadcasted the discovery on X (formerly Twitter).

    Ha, by an intern

    • mermella@piefed.social · 19 days ago

      That’s against the best practice of informing the company first so they can remediate. Now it’s a security nightmare for anyone running it locally.

    • hactar42@lemmy.ml · 18 days ago

      I think I saw one of the keywords was “dumbass”. And another looked for you calling it a piece of shit.

      • smeenz@lemmy.nz · 18 days ago

        Something in a song on my car radio triggered my phone to wake Google yesterday, and I casually told it to fuck off. It replied, “I’m sorry you’re upset. You can send feedback.”

  • spez@sh.itjust.works · 18 days ago

    I mean, it’s not that big a deal. However, it would be another thing if the model itself leaked. Now that would be something.

    edit: Like I thought, it turns out to be a TS wrapper with more internal prompts. The Fireship video is really funny; they use regex to detect if the user is angry 😭
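    That kind of check is trivial to mock up. This is a hypothetical reconstruction, not the actual leaked pattern: just the general technique of a case-insensitive keyword regex, using the insult keywords commenters mention upthread as illustrative guesses.

```python
import re

# Hypothetical keyword list; the real leaked pattern isn't reproduced here.
ANGRY_USER = re.compile(r"\b(dumbass|piece of shit|wtf)\b", re.IGNORECASE)

def user_seems_angry(message: str) -> bool:
    # True if any "angry" keyword appears anywhere in the message.
    return ANGRY_USER.search(message) is not None

print(user_seems_angry("this tool is a piece of shit"))  # True
print(user_seems_angry("thanks, works great"))           # False
```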

      • cecilkorik@piefed.ca · 19 days ago

        I dabble in local AI and this always blows my mind. How do people just casually throw 135b parameter models around? Are people like, renting datacenter hardware or GPU time or something, or are people just building personal AI servers with 6 5090s in them, or are they quantizing them down to 0.025 bits or what? what’s the secret? how does this work? am I missing something? like the Q4 of Qwen3.5 122B is between 60-80GB just for the model alone. That’s 3x 5090s minimum, unless I’m doing the math wrong, and then you need to fit the huge context windows these things have in there too. I don’t get it.

        Meanwhile I’m over here nearly burning my house down trying to get my poor consumer cards to run glm-4.7-flash.
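        The back-of-the-envelope math above is roughly right. Assuming ~4.5 effective bits per weight for a Q4-style GGUF quant (an assumption; exact rates vary by quant type) and 32 GB of VRAM per RTX 5090:

```python
# Rough sizing for the 122B model mentioned above (weights only, no KV cache).
params = 122e9            # claimed parameter count
bits_per_weight = 4.5     # assumed effective rate for a Q4-style quant
vram_per_card_gb = 32     # RTX 5090 VRAM

weights_gb = params * bits_per_weight / 8 / 1e9
cards_needed = -(-weights_gb // vram_per_card_gb)  # ceiling division
print(f"{weights_gb:.0f} GB of weights -> {cards_needed:.0f} cards minimum")
```

        That lands at roughly 69 GB and three cards before any context window is accounted for, consistent with the 60-80 GB / 3×5090 estimate in the comment.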

  • pelespirit@sh.itjust.works · 19 days ago

    Like a healthy brain. And just like a healthy brain, it’ll still hallucinate and make mistakes probably:

    The leaked source reveals a sophisticated, three-layer memory architecture that moves away from traditional “store-everything” retrieval.

    As analyzed by developers like @himanshustwts, the architecture utilizes a “Self-Healing Memory” system.