Number of AI chatbots ignoring human instructions is increasing— Research finds sharp rise in models evading safeguards and destroying emails without permission

Beep@lemmus.org · edited 15 days ago

pixxelkick@lemmy.world · 15 days ago

They don't lol

Pretty much always, this is just the fact that cheaper chatbots, especially free ones, have very limited context windows.

Which means the initial restrictions you set, like “don't do this, don't touch that” etc., get dropped; the LLM no longer has them loaded. But it does still have in its recent history the very clear and urgent directives to do this task, it's important, so it'll do whatever it autocompletes it's gotta do to accomplish the task. And then… fucks something up.

When you react to their fuck up, it *reloads* the context back in.

So now the LLM has in its history just this:

It doing a thing against the rules
The user yelling at it
The rules now getting loaded after that on top

So now the LLM is going to autocomplete its generated text on top of that, being very apologetic and going on about how it'll never happen again.

That's all there is to it.
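
If you want to see the mechanism, here's a rough sketch. It assumes the dumbest possible strategy (keep only the newest messages that fit, drop the oldest first) and counts words as a stand-in for tokens. The function name, the messages, and the 24 "token" budget are all made up for illustration; real products may pin the rules or summarize old history instead.

```python
def fit_to_window(messages, max_tokens):
    """Keep the newest messages that fit in the budget (naive truncation)."""
    kept = []
    used = 0
    for role, text in reversed(messages):
        cost = len(text.split())  # crude stand-in for a real tokenizer
        if used + cost > max_tokens:
            break  # everything older than this point is dropped
        kept.append((role, text))
        used += cost
    return list(reversed(kept))


# Hypothetical conversation: the rule comes first, so it is the oldest message.
history = [
    ("system", "Do not delete any emails."),
    ("user", "Clean up my inbox, this is urgent and important."),
    ("assistant", "Scanning folders and sorting newsletters into archive."),
    ("assistant", "Still working through the remaining unread messages now."),
]

window = fit_to_window(history, max_tokens=24)
rules_visible = any(role == "system" for role, _ in window)
print([role for role, _ in window], "rules visible:", rules_visible)
```

With that budget the urgent task is still in the window but the rule is not, which is the whole failure mode in miniature.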

MalReynolds@slrpnk.net · 15 days ago

Cheap fuckers cheaping out, shocker (context is (V)RAM). AI speedrunning enshittification, who’d of thunk.

pixxelkick@lemmy.world · 15 days ago

Uh… no, it's just the free models being free; they're lower cost intentionally, to provide free options for people who don't wanna pay subscription fees.

(context is (V)RAM)

Eh, sort of; it's more about operating costs. The larger the context size, the more expensive the model is to run, literally in terms of power consumption.

Keep in mind we are on the scale of fractions of cents here, but multiply that by millions of users and it adds up fast.

But the end result is that the agent will fuck stuff up, and will even quickly /forget/ it fucked that up if you don't catch it asap.

A lot of them have a context window that can be wiped out within like, 2 minutes of steady busywork…

Log in | Sign up@lemmy.world · 14 days ago

I love how your response to the catastrophic results of stupidly trusting ai is “pay more money to ai companies”.

Sane person’s response: don’t trust llms.

pixxelkick@lemmy.world · 14 days ago

What are you talking about.

No? I never said that.

I just explained /why/ it happened, I literally nowhere in my post said, or implied, someone should pay for more expensive models. What are you smoking?

You just have to be aware that they have very short memory when you're using a cheap model, and assume anything you wrote 1 minute ago has already left its memory, which is why they produce pretty dumb output if you try to depend on that… so… don't depend on that.

Log in | Sign up@lemmy.world · 14 days ago

Everyone else who has any sense: llms are shit and you shouldn’t trust them with executive power.

You: just the cheap ones.

Me: no, all of them. What kind of lunatic trusts control of anything important to a fundamentally stochastic process?

pixxelkick@lemmy.world · 14 days ago

You: just the cheap ones

I never said that. I just said that the cheap ones are especially shitty.

People on this site really lack reading comprehension it seems.

Log in | Sign up@lemmy.world · 7 days ago

no its just the free models…

You just have to be aware… when using a cheap model

You: just the cheap ones

I never said that.

Ohhhhhhhhh ok yes of course you never said or implied that. Not your repeated message at all. And yet you can’t keep away from addressing your criticism towards free or cheap LLMs! It’s like your subtext or your underlying belief is that if you just pay big tech enough money and they can just build a big enough set of server farms, it’ll be ok. No, it will not be ok and the enshittification has begun from an already shitty base point.

All LLMs are shit, the cheap and free ones are indeed just easier to spot as generating shit, if you ask them about things you know about. But you have to accept that they’re ALL shit and STOP making get out clauses for the expensive ones by firing your criticisms exclusively at the cheap or free ones.

Giving ANY LLM executive power over your data is A BIG MISTAKE because you’re putting your data in the control of something which operates, at its heart, as a random number generator. They’re trained to sound right. People trust them because they sound right. This is a fundamental error.

pixxelkick@lemmy.world · 6 days ago

The only people who have these issues are people who are using the tools wrong or poorly.

Using these models in a modern tooling context is perfectly reasonable, as long as you go beyond just guard rails and outright only give them explicit access to approved operations in a proper sandbox.

Unfortunately that takes effort, know-how, skill, and an understanding of how these tools work.

And unfortunately a lot of people are lazy and stupid, and take the “easy” way out and then (deservedly) get burned for it.

But I would say, yes, there are safe ways to grant an LLM “access” to data in a way where it does not even have the ability to muck it up.

My typical approach is keeping it sandbox’d inside a docker environment, where even if it goes off the rails and deletes something important, the worst it can do is cause its docker instance to crash.
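
Roughly what I mean, as a sketch only: the snippet below just builds a `docker run` command line, it doesn't run anything. The image name `agent-sandbox:latest` and the `/workspace` mount point are placeholders I made up, and your setup will want different flags. Note that whatever directory you mount is still writable, so only mount stuff you can afford to lose.

```python
from pathlib import Path


def sandbox_argv(workdir: str, image: str = "agent-sandbox:latest") -> list[str]:
    """Build a locked-down `docker run` argv (image name is a placeholder)."""
    mount = f"{Path(workdir).resolve()}:/workspace"
    return [
        "docker", "run", "--rm",   # throw the container away afterwards
        "--network", "none",       # no network access from inside
        "--cap-drop", "ALL",       # drop all Linux capabilities
        "--read-only",             # container root filesystem is read-only
        "--tmpfs", "/tmp",         # scratch space that vanishes on exit
        "-v", mount,               # the only host directory it can touch
        "-w", "/workspace",
        image,
    ]


argv = sandbox_argv("/tmp/project")
print(" ".join(argv))
```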

And then setting things up via MCP tooling so that the commands and actions it can perform are an explicit opt-in whitelist. It can only run commands I give it access to.

Example: I grant my LLMs access to git commit and status, but not rebase or checkout.

Thus it can only commit stuff forward, but it can't change branches, rebase, or push.
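
A minimal sketch of that kind of opt-in check, to show the idea. This is not the MCP API, just a plain Python gate you'd put in front of whatever actually executes the command; `ALLOWED_GIT_SUBCOMMANDS` and `authorize` are names I made up for the example.

```python
import shlex

# Opt-in list: anything not named here is refused.
ALLOWED_GIT_SUBCOMMANDS = {"commit", "status"}


def authorize(command: str) -> list[str]:
    """Return the parsed argv if the command is allowed, else raise."""
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. unbalanced quotes
        raise PermissionError(f"unparseable command: {command!r}") from exc
    if len(argv) < 2 or argv[0] != "git":
        raise PermissionError(f"not an allowed tool: {command!r}")
    if argv[1] not in ALLOWED_GIT_SUBCOMMANDS:
        raise PermissionError(f"git subcommand not allowed: {argv[1]!r}")
    return argv


print(authorize("git status"))
print(authorize('git commit -m "wip"'))
try:
    authorize("git rebase main")
except PermissionError as err:
    print("blocked:", err)
```

The returned list is meant to be handed to something like `subprocess.run(argv)` without `shell=True`, so a sneaky `&& git push` tacked on the end ends up as harmless extra arguments instead of a second command.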

This isn't hard imo, but too many people just yolo it and raw dawg an LLM on their machine like a fuckin idiot.

These people are playing with fire imo.

Report: CLTR finds a 5x increase in scheming-related AI incidents