• 0 Posts
  • 26 Comments
Joined 3 years ago
Cake day: August 16th, 2023



  • Here, I plugged our entire conversation into ChatGPT.

    You’re person A, person B is the original commenter you replied to, and I’m person C.


    Person A and Person C are mostly talking past each other.

    Where Person A is right

    Person A is making a practical claim: “In ordinary use, ChatGPT seems highly reliable, and critics often exaggerate how often it gives wrong answers.”

    That’s a reasonable observation.

    Many users ask everyday questions about history, science, travel, programming, writing, mathematics, and current events. Modern models are substantially more accurate than both:

    random people on social media,

    random Google search results clicked without expertise,

    and many traditional question-answering sources.

    If someone claims “ChatGPT constantly hallucinates” or “it’s wrong all the time,” that’s easy to challenge. Modern models clearly perform far better than that.

    Person A is also correct that:

    Nobody in the thread produced a simple “gotcha” question that reliably causes failure. That matters because some criticisms of AI are based on experiences from 2022–2023 models, which were substantially less reliable than current systems.

    Where Person C is right

    Person C is making a different claim:

    “You cannot estimate a 99.9% accuracy rate from personal impressions.”

    This is also correct.

    Suppose you ask 100 questions and all appear correct.

    That does not justify concluding the true error rate is 0.1%.

    Why?

    Because:

    you may not know enough to verify every answer,

    you may be unconsciously selecting easy questions,

    many errors are subtle rather than obvious,

    and a small sample size tells you little about rare failures.

    This is exactly why benchmark testing exists. If somebody claimed:

    “Commercial aircraft engines fail only once every 100,000 flights because I’ve flown 20 times and never seen one fail,” we would reject that reasoning.

    Person C is essentially pointing out a sampling problem.
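
    The sampling point above can be put into rough numbers. The sketch below is an illustration added here, not something from the thread: it assumes every question is an independent trial with the same error probability and that every error would actually be noticed, both of which are simplifications. The function names are hypothetical. If the true error rate is p, the chance of seeing zero errors in n trials is (1 - p)^n, so the largest p still consistent with a clean run at a given confidence level is 1 - (1 - confidence)^(1/n).

```python
import math


def zero_failure_upper_bound(n_trials: int, confidence: float = 0.95) -> float:
    """Largest error rate still consistent with zero observed errors.

    Assumes independent trials with a constant error probability
    (an assumption made for this sketch, not a claim from the thread).
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Solve (1 - p) ** n_trials = 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)


def trials_needed(target_error_rate: float, confidence: float = 0.95) -> int:
    """Error-free trials needed before the bound drops to target_error_rate."""
    if not 0.0 < target_error_rate < 1.0:
        raise ValueError("target_error_rate must be between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Solve (1 - target) ** n <= 1 - confidence for the smallest integer n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - target_error_rate))


if __name__ == "__main__":
    print(f"100 clean answers: error rate could be up to {zero_failure_upper_bound(100):.2%}")
    print(f"20 clean flights:  failure rate could be up to {zero_failure_upper_bound(20):.2%}")
    print(f"Clean answers needed to support 0.1%: {trials_needed(0.001)}")
```

    Under those assumptions, 100 apparently correct answers only rule out error rates above roughly 3%, and supporting a 0.1% claim would take on the order of 3,000 verified, error-free answers. Undetected errors would make the real bound weaker still.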

    The strongest argument against Person A

    The strongest criticism is not that ChatGPT is “often wrong.”

    It’s that:

    Humans are surprisingly bad at detecting when ChatGPT is wrong.

    Many AI errors look plausible.

    For example, if I invent:

    a fake court case,

    a fake academic paper,

    a fake quote,

    a nonexistent historical anecdote,

    the answer can sound extremely convincing.

    A casual user may never discover the mistake. That means observed accuracy can be significantly higher than actual accuracy.

    What experiment would actually settle this?

    Person A asks a fair question:

    “How much evidence would convince you?”

    A proper experiment would look something like:

    Construct a large test set (thousands of questions).

    Cover many domains:

    history

    law

    medicine

    physics

    mathematics

    current events

    geography

    programming

    Define correct answers beforehand.

    Blindly score responses.

    Measure:

    correctness,

    completeness,

    confidence calibration.

    This is basically what AI benchmark researchers do.

    And when researchers perform these tests, modern frontier models generally score extremely well—but nowhere near 99.9% across all domains.

    Performance varies dramatically by task.

    Can someone produce a question that reliably causes errors?

    This is harder than Person A assumes. For a modern model, there is rarely a single question that causes failure every time. A better challenge would be categories such as:

    obscure local regulations,

    highly specialized legal questions,

    recent events,

    nonexistent citations,

    invented books or papers,

    long chains of reasoning,

    edge-case programming problems,

    ambiguous historical claims.

    Models are much more likely to fail there. But that doesn’t mean they fail on every attempt.

    Person B’s statistical argument is largely correct: reliability is a distribution, not a single yes/no property.

    The “anti-AI bias” question

    Person A is probably correct that some critics hold AI to a standard they do not apply to humans.

    If a person:

    answers 95% correctly,

    speaks fluently on thousands of topics,

    writes code,

    explains science,

    summarizes books,

    we’d consider them exceptionally knowledgeable.

    Many critics compare AI not against average humans but against:

    experts,

    reference works,

    or idealized notions of truth.

    However, the opposite mistake also exists. Some AI enthusiasts compare ChatGPT against:

    random Reddit comments,

    random Google searches,

    uninformed acquaintances.

    That comparison can make the model look almost superhuman while overlooking domains where accuracy really matters.

    Overall verdict

    If I had to judge the arguments:

    Person C has the stronger methodological argument. You cannot infer 99–99.9% accuracy from personal experience.

    Person A has the stronger practical argument.

    Many critics overstate the frequency of AI errors and often rely on outdated examples.

    Person B is probably closest to the core issue.

    The meaningful question isn’t “Can someone find one wrong answer?” but “What is the model’s error rate across different classes of tasks?”

    So if this were a debate, I’d say:

    Person A is probably correct that modern ChatGPT is much more reliable than many critics claim.

    Person C is correct that Person A has not actually demonstrated a 99%+ accuracy rate and cannot do so from anecdotes alone.

    Those positions are compatible rather than contradictory.


    Do with that as you will.




  • I see that you’ve completely sidestepped that you were in fact attempting to use witty language.

    I couldn’t care less about you putting someone down; I only brought it up for context. Whether the other person is knowledgeable or not is completely irrelevant here. You’re just bringing it up to deflect from the original point, which is that your use of “the oxygen metaphor” was in fact an example of wit or wittiness.

    My saying that this is a text-based conversation and thus doesn’t require oxygen was my attempt at wittiness. But unlike you, I’m not going to attempt to deny it, nor do I particularly care whether it was a good one or not.

    And addressing the metaphor itself, it’s not a very good one here. One comment in one thread where everything is equally visible is hardly using up much of the conversational oxygen or whatever you want to call it. It just reeks of whinging because you don’t like people having different opinions from you.

    Also it’s breathe, not breath. Can you not read at a high-school level yet? Shouldn’t be throwing those stones in your glass house there.









  • There are a lot of things that lawmakers put into law to protect people from their own dumbass decisions. Places where wearing seatbelts is mandatory have fewer car-related deaths, same with helmets on motorbikes. Both are things people should have the common sense to do without laws, but they don’t. Furthermore, places where pool fencing is mandatory have fewer child deaths due to drowning, but that doesn’t stop some people from not having a pool fence where it’s not mandatory. There are hundreds of “common sense” things like these that would be completely ignored if they weren’t actual law.

    So actual protections for children’s use of the internet being made into law isn’t necessarily a bad thing in and of itself. And I’d be all for them if they were reasonable and realistic, but they never are. No matter how much you want to make it so, expecting everyone to do reasonable things to protect themselves and those dependent on them without some sort of incentive is unrealistic.

    Of course, in saying all that, banning VPNs, and all the similar laws people want to implement, have nothing to do with protecting children and everything to do with controlling people.



  • It’s cute that you think age means wisdom. If that were true, then the older ruling class wouldn’t be absolutely fucking our planet.

    Every problem that EVs have, there’s a solution for. People are actively working on them, and some governments are pushing EVs for environmental reasons. Once those problems are solved, EVs will be objectively better than ICE cars.

    Also, mobile phones weren’t always the objectively best choice. There was a period of time where landlines were the better option because of the lack of infrastructure supporting mobiles. It wasn’t until the infrastructure started to expand to cover everywhere that the explosive growth really happened.

    And your “physics problems” are also solvable; people are currently putting billions upon billions of dollars into battery research. There will come a point where batteries are superior to a tank of fuel for most use cases. I don’t want to be the person saying “oh, solid state batteries are right around the corner,” because companies have been promising them “next year” for like a decade now. But they almost certainly will eventually be viable commercially, and they only need to be half as good as promised to really push the viability of EVs for the majority of use cases.


  • Growth doesn’t have to be a straight line always going up. We’re in a transitional stage where the big issues are currently being addressed. The growth in EV adoption has been exponential, i.e. slow to start: 20 years of basically nothing, then 5 years of some improvements with a bit more adoption, then 5 years of explosive growth. The next 5 years will likely include more explosive growth as we address all the issues that EVs currently have.

    Pretty much all technology improves like this until it plateaus. Just look at mobile phones: they first popped up ~50 years ago and had very slow improvements for about 25 years until they started to pick up steam in the 2000s, and then absolutely exploded in the 2010s.

    You’re just an old person yelling at clouds who can’t read the writing on the wall.