• 0 Posts
  • 21 Comments
Joined 2 years ago
Cake day: June 12th, 2024

  • Well, I suppose we can at least agree to disagree.

    I have seen so much incoherent but confident nonsense produced by LLMs (mainly by frontier models attempting even basic software development) that I cannot say in good conscience that any thought was involved. Junior developers would have done better. The experience definitely fits the behavior of a word predictor, though.

    Having seen what LLMs claim about software development, my stance is that no one should take what these models output at face value. They’re Dunning-Kruger machines.

    As for producing new ideas, these models are as creative as a random number generator. Fittingly, a random number generator is also what fakes their creativity: the next token is picked at random, and the “temperature” parameter controls how random that pick is.
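
    For readers unfamiliar with the “temperature” parameter, here is a minimal sketch of temperature sampling. The function name and the logits are made up for illustration, and real inference stacks add further steps (top-k, top-p and so on); the point is only that the final choice of token comes from a random number generator.

```python
import math
import random


def sample_with_temperature(logits, temperature, rng):
    """Pick a token index from raw model scores (logits).

    Dividing the logits by the temperature before the softmax sharpens the
    distribution when temperature < 1 and flattens it when temperature > 1.
    The final pick is made by the random number generator `rng`.
    """
    if temperature <= 0:
        # Temperature 0 is conventionally treated as greedy decoding:
        # always take the highest-scoring token, no randomness involved.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(x - top) for x in scaled]
    # random.choices normalizes the weights itself.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Hypothetical scores for a four-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]
rng = random.Random(42)
print(sample_with_temperature(logits, 0, rng))  # 0 (greedy)
print([sample_with_temperature(logits, 1.5, rng) for _ in range(10)])
```

    Lower temperatures make the highest-scoring token win almost every time; higher ones let the random draw pick less likely tokens more often.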

    I guess that’s all I feel like saying in this particular thread.



  • I think the Wikipedia definition of thought is quite good.

    However, I have a feeling that whatever definition I came up with, you’d just claim LLMs fit it because their output is sometimes somewhat coherent.

    You can claim that LLMs technically “think” because their output sometimes contains conclusions, some of them even rational, even though LLMs still struggle to count the Rs in “strawberry”.

    I find that disingenuous because it implies that the LLM is aware of anything at all, and that it can form ideas independently.

    Most importantly, it implies that you can trust it with even basic reasoning: that you can trust, now or at some point in the future, the plagiarism machine that tells you to put glue on your pizza, eat rocks, and walk to the car wash instead of driving.

    Whatever definition of thinking we use, it should include a simple rule: the allegedly thinking entity should demonstrate its intelligence by reliably answering simple queries correctly. Humans, by and large, can do that. LLMs fail at it miserably. If LLMs were truly thinking, that would be shocking. Understanding the underlying technology, and that it is not truly reasoning, makes it obvious and expected.

    Even OpenAI has admitted that hallucinations are a mathematical inevitability that cannot be fixed, something you hand-waved away as a problem that will be solved in time. And no, the fact that humans can hallucinate too is not comparable.


  • “I hold an MSc from what is arguably the most prestigious university in Europe.”

    Good for you. Have a cookie, I guess?

    “LLMs do not simply regurgitate existing content, and are in fact capable of creating wholly new content not seen before.”

    Citation needed.

    “Hallucinations occur when their context buffer is too small, and as time goes on, they will largely be a thing of the past.”

    A whole book of citations needed. That claim is wildly inconsistent with the consensus about AI hallucinations.

    “Magic Eight Balls, as I’m sure you’re aware, have a limited, predetermined number of responses.”

    You mean like how LLMs keep hallucinating the same passwords and nonexistent dependencies, to the point that bad actors are using that fact to compromise vibe-coded systems via techniques like slopsquatting?

    “I would disagree with you, and would suspect you are basing your assessment of their abilities on dated usage.”

    In fact, I keep experimenting with frontier models (including Fable when it was available) just so that the “but we’ve made so much progress in the past few months” argument can’t be used against me. You’re wildly overselling their capabilities.


  • Except LLM output is largely gibberish. Just confident gibberish. There’s a reason we call it “AI slop”.

    LLM responses are only ever “sound” when they’re regurgitating existing information they were trained on. Beyond some simple transformations, they are unable to create original ideas. They very frequently break down on even slightly novel tasks, as evidenced by the ever-prevalent code slop that is eroding our software.

    They don’t remember previous conversations (unless you literally copy-paste them into the prompt), and they don’t learn (Claude’s “memories” feature literally just copy-pastes a summary into the prompt, only automatically). They don’t have any “thoughts” of their own between prompts (OpenClaw just keeps prompting them so they can pretend to be autonomous).
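
    A minimal sketch of what “memory” amounts to from the client’s side, assuming a hypothetical chat app. The `ChatSession` class, its field names and the summary format are invented here and not taken from any vendor’s actual implementation; the point is that the model keeps no state, so every call re-sends the stored notes and the conversation so far as plain prompt text.

```python
from dataclasses import dataclass, field


@dataclass
class ChatSession:
    """Client-side state for a hypothetical chat app.

    The model itself keeps nothing between calls; everything it "remembers"
    is text the client pastes back into the next prompt.
    """
    memory_summary: str = ""  # hypothetical stored "memory" text
    history: list = field(default_factory=list)  # (role, text) pairs

    def build_prompt(self, user_message: str) -> str:
        parts = []
        if self.memory_summary:
            # The "memory" is simply prepended as ordinary prompt text.
            parts.append(f"Notes from earlier conversations: {self.memory_summary}")
        for role, text in self.history:
            parts.append(f"{role}: {text}")
        parts.append(f"user: {user_message}")
        return "\n".join(parts)

    def record(self, user_message: str, reply: str) -> None:
        # Store the exchange so it can be re-sent on the next call.
        self.history.append(("user", user_message))
        self.history.append(("assistant", reply))


session = ChatSession(memory_summary="User prefers Python.")
session.record("Hi", "Hello!")
print(session.build_prompt("Write a sort function"))
```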

    The underlying implementation of “reasoning” in LLMs is literally “hallucinate some more text that vaguely looks like thoughts and hope it influences the answer”. LLMs are probabilistic models that we have figured out how to tune so that they produce somewhat correct-looking answers a little more often than chance.

    Magic 8-balls sometimes give sound responses. Do they think? Where do we draw the line with this interpretation of “thinking”?

  • First, restrict the problem space to a manageable set of classes. If it isn’t one yet, split the problem until it is.

    Make sure you have 100% test coverage on this piece of code, and that the tests are understandable and document the functionality well. You might find unreachable code, which should be deleted. Mutation testing can also help find code paths that are executed but not actually checked by any assertion (and therefore edge cases you might not have considered).
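
    To illustrate what mutation testing catches that line coverage does not, here is a hand-rolled sketch with a hypothetical `is_adult` function. Real tools (such as mutmut for Python or PIT for Java) generate and run mutants automatically; this only shows the principle: both suites below execute every line, but only the one that checks the boundary value kills the mutant.

```python
def is_adult(age):
    # Original production code.
    return age >= 18


def is_adult_mutant(age):
    # A "mutant": the same code with >= changed to >.
    return age > 18


def weak_suite(fn):
    # 100% line coverage, but no test at the boundary value 18.
    return fn(30) is True and fn(5) is False


def strong_suite(fn):
    # Adds the boundary case, which is exactly where the mutant differs.
    return weak_suite(fn) and fn(18) is True


for name, suite in [("weak", weak_suite), ("strong", strong_suite)]:
    original_passes = suite(is_adult)
    mutant_killed = not suite(is_adult_mutant)
    print(f"{name}: original passes={original_passes}, mutant killed={mutant_killed}")
# weak: original passes=True, mutant killed=False
# strong: original passes=True, mutant killed=True
```

    A surviving mutant points at a missing test, which is usually an edge case nobody wrote down.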

    Until you have that coverage, write new tests and/or refactor existing ones. You may uncover a ton of existing bugs at this point. Better to find them now than later, when you would be left wondering whether your own changes introduced them. Your new failing tests will document those bugs.
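
    One way to keep bug-documenting tests in the suite without leaving the build permanently red is Python’s `unittest.expectedFailure` decorator. This is one option among several, not something the steps above prescribe, and `parse_price` and its bug are hypothetical.

```python
import unittest


def parse_price(text):
    """Hypothetical legacy function: parse a price like '12.50' into cents."""
    whole, _, frac = text.partition(".")
    # Bug: "12.5" becomes 1205 cents instead of 1250 (frac is not right-padded).
    return int(whole) * 100 + int(frac or 0)


class ParsePriceTest(unittest.TestCase):
    def test_two_decimal_places(self):
        # Characterizes behavior that already works.
        self.assertEqual(parse_price("12.50"), 1250)

    def test_whole_number(self):
        self.assertEqual(parse_price("7"), 700)

    @unittest.expectedFailure
    def test_one_decimal_place(self):
        # Documents a known bug found while writing tests. Once the bug is
        # fixed, unittest reports an "unexpected success": remove the decorator.
        self.assertEqual(parse_price("12.5"), 1250)
```

    Run it with `python -m unittest` or pytest; the known bug shows up as an expected failure rather than a red build, while the test still records exactly what is wrong.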

    By now, your tests should fully document what the code is supposed to do. This will be tremendously helpful in understanding it.

    Then go public method by public method, line by line, renaming variables, extracting private methods and so on: your basic run-of-the-mill class refactoring, until you fully understand what’s going on in the production code.
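
    A small before/after sketch of that kind of refactoring, using a hypothetical `Report` class. The public method `gen` and the dictionary keys stay the same because callers and tests depend on them; only local names change, and logic moves into private helpers. In practice each rename and each extraction would be its own tested commit.

```python
# Before: terse names, one method doing everything.
class Report:
    def gen(self, d):
        t = 0
        for x in d:
            if x["s"] == "paid":
                t += x["a"]
        return f"Total: {t / 100:.2f}"


# After: same behavior, clearer names, logic extracted into private helpers.
class RefactoredReport:
    def gen(self, invoices):
        total_cents = self._sum_paid_cents(invoices)
        return self._format_total(total_cents)

    def _sum_paid_cents(self, invoices):
        # Only invoices with status "paid" count towards the total.
        return sum(inv["a"] for inv in invoices if inv["s"] == "paid")

    def _format_total(self, total_cents):
        return f"Total: {total_cents / 100:.2f}"


invoices = [{"a": 1250, "s": "paid"}, {"a": 500, "s": "open"}]
assert Report().gen(invoices) == RefactoredReport().gen(invoices)  # behavior unchanged
```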

    After every small refactoring, run the tests and commit if they pass. If they fail, you only have a small amount of uncommitted code, which you know is the culprit.

    Finally, fix any bugs you documented in the test writing stage.

    At this point you can add any new functionality to the code.