@Tenniswaffles

Tenniswaffles@lemmy.blahaj.zone · 2 days ago

Then why did you say it?

The question was about any shows with a mixed gender group of friends with no romantic entanglements between them.

Someone said The IT Crowd, and you then brought up one person attempting to roofie someone. Based on the available information, this very strongly implies that you consider this a “romantic entanglement”.

If you didn’t think that was a romantic entanglement, why the fuck would you bring it up?

Tenniswaffles@lemmy.blahaj.zone · 3 days ago

Do… do you consider attempted sexual assault/rape a “romantic entanglement”?

What the fuck.

Tenniswaffles@lemmy.blahaj.zone · 8 days ago

Here, I plugged our entire conversation into chatgpt.

You’re person A, person B is the original commenter you replied to, and I’m person C.

Person A and Person C are mostly talking past each other.

Where Person A is right

Person A is making a practical claim: “In ordinary use, ChatGPT seems highly reliable, and critics often exaggerate how often it gives wrong answers.”

That’s a reasonable observation.

Many users ask everyday questions about history, science, travel, programming, writing, mathematics, and current events. Modern models are substantially more accurate than both:

random people on social media,

random Google search results clicked without expertise,

and many traditional question-answering sources.

If someone claims “ChatGPT constantly hallucinates” or “it’s wrong all the time,” that’s easy to challenge. Modern models clearly perform far better than that.

Person A is also correct that:

Nobody in the thread produced a simple “gotcha” question that reliably causes failure. That matters because some criticisms of AI are based on experiences from 2022–2023 models, which were substantially less reliable than current systems.

Where Person C is right

Person C is making a different claim:

“You cannot estimate a 99.9% accuracy rate from personal impressions.”

This is also correct.

Suppose you ask 100 questions and all appear correct.

That does not justify concluding the true error rate is 0.1%.

Why?

Because: you may not know enough to verify every answer,

you may be unconsciously selecting easy questions,

many errors are subtle rather than obvious, and a small sample size tells you little about rare failures.

This is exactly why benchmark testing exists. If somebody claimed:

“Commercial aircraft engines fail only once every 100,000 flights because I’ve flown 20 times and never seen one fail,” we would reject that reasoning.

Person C is essentially pointing out a sampling problem.
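To put a rough number on that sampling problem, here is a minimal sketch. It assumes every question is independent and that every answer really was checked correctly, which is generous. The function name `upper_bound_error_rate` is made up for illustration. It asks: if 100 answers all looked right, how high could the true error rate still be?

```python
def upper_bound_error_rate(n_questions: int, confidence: float = 0.95) -> float:
    """Largest true error rate still consistent with zero observed errors.

    Assumes independent questions and perfect fact-checking (both assumptions).
    """
    # If the true error rate is p, the chance of n clean answers is (1 - p) ** n.
    # Setting that equal to (1 - confidence) and solving for p gives the bound.
    return 1.0 - (1.0 - confidence) ** (1.0 / n_questions)


bound = upper_bound_error_rate(100)
print(f"Error rate could still be as high as {bound:.2%}")
```

Under those assumptions, 100 clean answers only rule out error rates above roughly 3%, which is about thirty times worse than the 0.1% being claimed.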

The strongest argument against Person A

The strongest criticism is not that ChatGPT is “often wrong.”

It’s that:

Humans are surprisingly bad at detecting when ChatGPT is wrong.

Many AI errors look plausible.

For example, if I invent:

a fake court case,

a fake academic paper,

a fake quote,

a nonexistent historical anecdote,

the answer can sound extremely convincing.

A casual user may never discover the mistake. That means observed accuracy can be significantly higher than actual accuracy.
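The gap between observed and actual accuracy can be sketched with simple arithmetic. The numbers below are invented purely for illustration, and `detection_rate` (the share of wrong answers the user actually notices) is an assumed parameter, not a measured one.

```python
def observed_accuracy(true_accuracy: float, detection_rate: float) -> float:
    """Accuracy as it appears to a user who only notices some of the errors."""
    true_error = 1.0 - true_accuracy
    # Only the errors the user catches count against the model in their eyes.
    noticed_error = true_error * detection_rate
    return 1.0 - noticed_error


# Hypothetical: the model is right 95% of the time, the user spots 1 in 5 mistakes.
print(f"{observed_accuracy(0.95, 0.2):.1%}")
```

With those made-up inputs, a model that is wrong 5% of the time would look like it is wrong only about 1% of the time.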

What experiment would actually settle this?

Person A asks a fair question:

“How much evidence would convince you?”

A proper experiment would look something like:

Construct a large test set (thousands of questions).

Cover many domains:

history

law

medicine

physics

mathematics

current events

geography

programming

Define correct answers beforehand.

Blindly score responses.

Measure:

correctness,

completeness,

confidence calibration.

This is basically what AI benchmark researchers do.

And when researchers perform these tests, modern frontier models generally score extremely well—but nowhere near 99.9% across all domains.

Performance varies dramatically by task.

Can someone produce a question that reliably causes errors?

This is harder than Person A assumes. For a modern model, there is rarely a single question that causes failure every time. A better challenge would be categories such as:

obscure local regulations,

highly specialized legal questions,

recent events,

nonexistent citations,

invented books or papers,

long chains of reasoning,

edge-case programming problems,

ambiguous historical claims.

Models are much more likely to fail there. But that doesn’t mean they fail on every attempt.

Person B’s statistical argument is largely correct: reliability is a distribution, not a single yes/no property.

The “anti-AI bias” question

Person A is probably correct that some critics hold AI to a standard they do not apply to humans.

If a person:

answers 95% correctly,

speaks fluently on thousands of topics,

writes code,

explains science,

summarizes books,

we’d consider them exceptionally knowledgeable.

Many critics compare AI not against average humans but against:

experts,

reference works,

or idealized notions of truth.

However, the opposite mistake also exists. Some AI enthusiasts compare ChatGPT against:

random Reddit comments,

random Google searches,

uninformed acquaintances.

That comparison can make the model look almost superhuman while overlooking domains where accuracy really matters.

Overall verdict

If I had to judge the arguments:

Person C has the stronger methodological argument. You cannot infer 99–99.9% accuracy from personal experience.

Person A has the stronger practical argument.

Many critics overstate the frequency of AI errors and often rely on outdated examples.

Person B is probably closest to the core issue.

The meaningful question isn’t “Can someone find one wrong answer?” but “What is the model’s error rate across different classes of tasks?”

So if this were a debate, I’d say:

Person A is probably correct that modern ChatGPT is much more reliable than many critics claim.

Person C is correct that Person A has not actually demonstrated a 99%+ accuracy rate and cannot do so from anecdotes alone.

Those positions are compatible rather than contradictory.

Do with that as you will.

Tenniswaffles@lemmy.blahaj.zone · 9 days ago

The scientific method exists for a reason. If you want an accurate idea of the accuracy of LLMs, then the best way is by applying the scientific method to it.

Until you’ve done that, you’re just basing your conclusions on conjecture, anecdotal evidence and vibes, with nothing actually substantive or empirical backing it up.

Tenniswaffles@lemmy.blahaj.zone · 9 days ago

You seem to be positing that it’s giving results to the tune of 99% to 99.9% accuracy based entirely on vibes.

If you actually want to know, you will have to do thousands upon thousands of prompts, across hundreds of topics that you can accurately fact check, before you can say with any sort of confidence whether it’s that accurate or not.

Your sample size is orders of magnitude too small for you to reasonably estimate an accuracy rate.
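For a rough sense of scale, here is a back-of-the-envelope sketch. It assumes independent prompts, perfect fact-checking, and zero errors found, so it is the best case. `questions_needed` is a name made up for this example.

```python
import math


def questions_needed(max_error_rate: float, confidence: float = 0.95) -> int:
    """Error-free prompts needed before claiming the error rate is below max_error_rate.

    Best-case sketch: assumes independent prompts and that not a single
    error turns up. Any error found pushes the required number higher.
    """
    # Chance of n clean answers at error rate p is (1 - p) ** n.
    # We need that chance to drop to (1 - confidence) or lower.
    n = math.log(1.0 - confidence) / math.log(1.0 - max_error_rate)
    return math.ceil(n)


for claimed_accuracy in (0.99, 0.999):
    n = questions_needed(1.0 - claimed_accuracy)
    print(f"{claimed_accuracy:.1%} accuracy: about {n} flawless, verified prompts")
```

So even in the best case it takes roughly 300 verified prompts to back up 99%, and roughly 3,000 to back up 99.9%, and that is before splitting them across topics.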

Tenniswaffles@lemmy.blahaj.zone · 10 days ago

I see that you’ve completely sidestepped that you were in fact attempting to use witty language.

I couldn’t care less about you putting someone down, I only brought it up for context. Whether the other person is knowledgeable or not is completely irrelevant here. You’re just bringing it up to deflect from the original point, which is that your use of “the oxygen metaphor” was in fact an example of wit or wittiness.

My saying that this is a text-based conversation and thus doesn’t require oxygen was my attempt at wittiness. But unlike you I’m not going to attempt to deny it, nor do I particularly care whether it was a good one or not.

And addressing the metaphor itself, it’s not a very good one here. One comment in one thread where everything is equally visible is hardly using up much of the conversational oxygen or whatever you want to call it. It just reeks of whinging because you don’t like people having different opinions from you.

Also it’s breathe, not breath. Can you not read at a high-school level yet? Shouldn’t be throwing those stones in your glass house there.

Tenniswaffles@lemmy.blahaj.zone · 10 days ago

Being witty means using words in a clever or funny way. Telling someone to stop talking so as not to waste oxygen is an obvious example of trying to put someone down with clever (witty) language.

Also, do you know what an abstraction is? Because calling someone dumb and a waste of oxygen isn’t one, I’m not even sure if it really counts as a metaphor.

An abstraction is more or less a simplification of a more complex topic. The fuck were you simplifying?

Tenniswaffles@lemmy.blahaj.zone · 10 days ago

Common doesn’t mean good dawg. Herpes is common, doesn’t mean I want it.

I understood what you meant. I understood you were trying to be witty. I just don’t think you were successful.

But hey, that’s just my subjective opinion. If you don’t agree, then by all means continue this pointless argument. I love a good low stakes pissing match.

Tenniswaffles@lemmy.blahaj.zone · 10 days ago

Not a very good one.

Tenniswaffles@lemmy.blahaj.zone · 10 days ago

You do realise that this is a text-based conversation, right? Contributing or not has no impact on oxygen use. I understand using hyperbole to be witty, but that only works if you, y’know, have wit.

Tenniswaffles@lemmy.blahaj.zone · 24 days ago

I have dyslexia, shoulda read it more carefully.

Tenniswaffles@lemmy.blahaj.zone · 24 days ago

Misread it. My bad.

Tenniswaffles@lemmy.blahaj.zone · 24 days ago

Nowhere does it say that they got together less than a year before getting married. It’s not even really implied.

Tenniswaffles@lemmy.blahaj.zone · 1 month ago

There are a lot of things that lawmakers put into law to protect people from their own dumbass decisions. Places where wearing seatbelts is mandatory have fewer car-related deaths, same with helmets on motorbikes. Both are things people should have the common sense to do without laws, but they don’t. Furthermore, places where pool fencing is mandatory have fewer child deaths due to drowning, but that doesn’t stop some people from not having a pool fence where it’s not mandatory. There are hundreds of “common sense” things like these that, if they weren’t actual law, would be completely ignored.

So actual protections for children’s use of the internet being made into law isn’t necessarily a bad thing in and of itself. And I’d be all for them if they were reasonable and realistic, but they never are. No matter how much you want to make it so, expecting everyone to do reasonable things to protect themselves and those dependent on them without some sort of incentive is unrealistic.

Of course in saying all that, banning VPNs and all the laws people want to implement similar to it, have nothing to do with protecting children and everything to do with controlling people.

Tenniswaffles@lemmy.blahaj.zone · 1 month ago

And then the other people with guns will come and jack your shit.

Tenniswaffles@lemmy.blahaj.zone · 1 month ago

It’s cute that you think age means wisdom. If that were true, then the older ruling class wouldn’t be absolutely fucking our planet.

Every problem that EVs have, there’s a solution for. People are actively working on them, and some governments are pushing EVs for environmental reasons. Once the problems that they have are solved they will be objectively better than ICE cars.

Also, mobile phones weren’t always the objectively best choice. There was a period of time where landlines were the better option because of the lack of infrastructure supporting mobiles. It wasn’t until the infrastructure started to expand to cover everywhere that the explosive growth really happened.

And your “physics problems” are also solvable, people are currently putting billions upon billions of dollars into battery research. There will come a point where batteries are superior to a tank of fuel for most use cases. I don’t want to be the person saying “oh, solid state batteries are right around the corner,” because companies have been promising them “next year” for like a decade now. But they almost certainly will eventually be viable commercially, and they only need to be half as good as promised to really push the viability of EVs for the majority of use cases.

Tenniswaffles@lemmy.blahaj.zone · 1 month ago

Growth doesn’t have to be a straight line always going up. We’re in a transitional stage where the big issues are currently being addressed. The growth in EV adoption has been exponential, i.e. slow to start, with 20 years of basically nothing, then 5 years of some improvements with a bit more adoption. Then 5 years of explosive growth, and the next 5 years will likely include more explosive growth as we address all the issues that they currently have.

Pretty much all technology improves like this until it plateaus. Just look at mobile phones, they first popped up ~50 years ago and had very slow improvements for ~25 years until they started to pick up steam in the 2000s, and then absolutely exploded in the 2010s.

You’re just an old person yelling at clouds who can’t read the writing on the wall.

Tenniswaffles@lemmy.blahaj.zone · 1 month ago

Regardless, it’s still incorrect to be using it in English right now.

Tenniswaffles@lemmy.blahaj.zone · 1 month ago

Men who hate women in general will usually keep it to themselves in public for the most part. They’re not going to go on a tirade about how much they hate women to someone they don’t think is going to agree with them, which might explain why you’ve not seen it personally.

And have you seriously never heard of incels? I’m not saying hating women is inherent to inceldom, but it’s pretty damned pervasive among them. Especially in online spaces.

Here’s some reading if you can be bothered.

https://reporter.anu.edu.au/all-stories/misogynistic-mass-violence-is-on-the-rise-why-are-we-ignoring-it

https://womensaid.org.uk/information-support/what-is-domestic-abuse/domestic-abuse-is-a-gendered-crime/

https://www.motherjones.com/politics/2015/01/warren-farrell-mens-rights-movement-feminism-misogyny-trolls/

https://www.splcenter.org/resources/reports/mens-rights-movement-spreads-false-claims-about-women/

Tenniswaffles@lemmy.blahaj.zone · 1 month ago

Why on Earth are you using the thorn like that? Not only is it incorrect when writing in English, it’s not even the correct pronunciation for those words. þ is pronounced like the th in the words “thorn” or “think.” You should be using ð, which is pronounced like the th in the words “this,” “the” and “they.”