Perhaps the most discussed technical detail is the “Undercover Mode.” This feature reveals that Anthropic uses Claude Code for “stealth” contributions to public open-source repositories.
The system prompt discovered in the leak explicitly warns the model: “You are operating UNDERCOVER… Your commit messages… MUST NOT contain ANY Anthropic-internal information. Do not blow your cover.”
Laws should have been put in place years ago to make it so that AI usage needs to be explicitly declared.
In Europe we have the AI Act which, as of August, introduces some form of transparency obligations. Not perfect, obviously, but a start. It probably won't be followed by the rest of the world, though, so like the GDPR it will be eroded by others' interests through lobbying, but at least we try.
The system prompt discovered in the leak explicitly warns the model: “You are operating UNDERCOVER… Your commit messages… MUST NOT contain ANY Anthropic-internal information. Do not blow your cover.”
This is so incredibly stupid.
You’ve tried security.
You’ve tried security through obscurity.
Now try security through giving instructions to an LLM via a system prompt to not blow its cover.
That doesn’t sound like it is saying don’t identify yourself. That it’s called Claude isn’t internal information, so that instruction doesn’t seem to be doing what you are saying. There must be more instructions.
Given how massive a field of computer science artificial intelligence is, and how much of it is already in, or being added to, every piece of software that exists, a label like that would be as useless as California's Prop 65 cancer warnings.
Do you use a mobile keyboard that supports swipe typing and has autocorrect? Remember to mark everything you write as being AI assisted.
Well yes, if you let autocorrect write a code contribution, I think you should label that contribution as AI.
AI usage needs to be explicitly declared.
Pointless. https://www.theregister.com/2026/01/08/linus_versus_llms_ai_slop_docs/
If it were the law then the AI itself would be coded to not allow going “undercover”, and there would be legal consequences if caught. Torvalds’s stance only matters for how things ‘are’, not how they ‘could be’.
Would it be a cure all? Of course not. Fraud still happens despite the illegality. But it’s better than not being able to trust anything ever again.
and there would be legal consequences if caught.
Like for driving over the speed limit? Or putting glass in the regular trash instead of the recycling? Yeah, just what I need in my life, another arbitrary law that’s enforced 0.0001% of the time as a flex by the people in power to target and abuse people they don’t like.
Be careful not to introduce security vulnerabilities such as command injection, XSS, SQL injection, and other OWASP top 10 vulnerabilities. If you notice that you wrote insecure code, immediately fix it.
Lmao. I’m sure that will solve the problem of it writing insecure slop code.
That sounds like written by some dumbass vibe-coder who actually believes their LLM is “smart”.
It doesn’t fix it, but as stupid as it looks, it should actually improve the chances.
If you’ve seen how the reasoning works, they basically spit out some garbage, then read it again and think about whether it’s garbage or not.
They do try to ‘correct their errors’, so to say.
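For the curious, the “write, then re-read and self-correct” loop described above can be sketched in a few lines. The `generate` and `critique` functions below are stand-ins for model calls; none of this is Claude Code’s actual logic, just an illustration of the pattern:

```python
# Toy sketch of a generate-then-critique loop. In a real system,
# generate() and critique() would both be LLM calls; here they are
# hypothetical stubs so the control flow is visible.
def generate(prompt: str) -> str:
    return f"draft answer to: {prompt}"

def critique(draft: str) -> bool:
    # A real critique pass would ask the model to review its own output.
    return "draft" not in draft

def answer_with_self_check(prompt: str, max_rounds: int = 3) -> str:
    draft = generate(prompt)
    for _ in range(max_rounds):
        if critique(draft):
            break
        # "Fix" the flagged issue; a real loop would regenerate instead.
        draft = draft.replace("draft ", "")
    return draft
```

The point is just that a second read-through catches some of the garbage; it doesn’t guarantee correctness, which is why the loop is capped at a few rounds.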
By 4:23 am ET, Chaofan Shou (@Fried_rice), an intern at Solayer Labs, had broadcast the discovery on X (formerly Twitter).
Ha, by an intern
That’s against the best practice of informing the company first so it can remediate. Now it’s a security nightmare for anyone running it locally.
Best part of the leak, they use regex matches for sentiment lol
I think I saw that one of the keywords was “dumbass”. And another looked for you calling it a piece of shit.
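A toy sketch of what that kind of keyword matching might look like. The patterns here are my guesses based on the keywords people spotted, not the actual leaked regexes:

```python
import re

# Hypothetical keyword-based "sentiment" flagging, as described in the
# thread. These patterns are illustrative guesses, not the leaked code.
FRUSTRATION_PATTERNS = [
    re.compile(r"\bdumbass\b", re.IGNORECASE),
    re.compile(r"\bpiece of shit\b", re.IGNORECASE),
]

def user_seems_angry(message: str) -> bool:
    return any(p.search(message) for p in FRUSTRATION_PATTERNS)

print(user_seems_angry("this tool is a piece of shit"))  # True
print(user_seems_angry("please refactor this function"))  # False
```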
Something in a song on my car radio triggered my phone to wake google yesterday and I casually told it to fuck off, and it replied “I’m sorry you’re upset. You can send feedback”
Adversarial audio, but just occurring by chance? Wild stuff. I was just looking into how to do that.
Lmao, so the LLM framework falls back to similar shit to what ALICE used?
I mean, it’s not that big a deal. However, it would be another thing if the model itself leaked. Now that would be something.
edit: Like I thought, it turns out to be a TS wrapper with more internal prompts. The fireship video is really funny, they use regex to detect if the user is angry 😭
Tool usage is very important. Qwen3.5 (135b) can already do wonderful things on OpenCode.
I dabble in local AI and this always blows my mind. How do people just casually throw 135b parameter models around? Are people like, renting datacenter hardware or GPU time or something, or are people just building personal AI servers with 6 5090s in them, or are they quantizing them down to 0.025 bits or what? what’s the secret? how does this work? am I missing something? like the Q4 of Qwen3.5 122B is between 60-80GB just for the model alone. That’s 3x 5090s minimum, unless I’m doing the math wrong, and then you need to fit the huge context windows these things have in there too. I don’t get it.
Meanwhile I’m over here nearly burning my house down trying to get my poor consumer cards to run glm-4.7-flash.
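The back-of-envelope math from the comment above checks out. Assuming roughly 4.5 bits per parameter for a Q4 quant once you include quantization overhead (a rough rule of thumb, not a vendor spec):

```python
# Rough VRAM estimate for quantized model weights alone (no KV cache,
# no context). bits_per_param=4.5 is an assumed average for Q4 quants.
def quant_size_gb(params_b: float, bits_per_param: float = 4.5) -> float:
    return params_b * 1e9 * bits_per_param / 8 / 1e9

print(round(quant_size_gb(122), 1))  # ~68.6 GB, consistent with 60-80GB
```

So yes, three 32GB 5090s is about the floor before you even budget for the context window, which is why most people either rent GPU time or use a hosted API.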
I pay for Ollama Cloud. As for the training of the big models, big companies do it using who-knows-what resources.
This is just the UI right? Or the models too?
Like a healthy brain. And just like a healthy brain, it’ll probably still hallucinate and make mistakes:
The leaked source reveals a sophisticated, three-layer memory architecture that moves away from traditional “store-everything” retrieval.
As analyzed by developers like @himanshustwts, the architecture utilizes a “Self-Healing Memory” system.
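Based only on that description, here is an illustrative guess at the shape of such a system: a small working set, a compressed mid-term store, and a long-term store whose entries are re-validated before use. The names and structure below are mine; the leaked implementation is TypeScript and is not reproduced here:

```python
# Hypothetical sketch of a three-layer, "self-healing" memory as
# described above. NOT the leaked code; every name here is invented.
from dataclasses import dataclass, field

@dataclass
class Memory:
    working: list = field(default_factory=list)        # current session notes
    summaries: list = field(default_factory=list)      # compressed older notes
    long_term: dict = field(default_factory=dict)      # durable facts

    def remember(self, note: str) -> None:
        self.working.append(note)
        if len(self.working) > 5:          # spill: compress the oldest note
            old = self.working.pop(0)
            self.summaries.append(old[:40])

    def heal(self, still_true) -> None:
        # "Self-healing": drop long-term entries that fail re-validation,
        # instead of retrieving everything ever stored.
        self.long_term = {k: v for k, v in self.long_term.items()
                          if still_true(k, v)}
```

The contrast with “store-everything” retrieval is the `heal` pass: stale facts get pruned rather than retrieved and trusted forever.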