• pooterbroo@programming.dev
    6 days ago

    Well, they didn’t even use the latest models in Feb 2025. They should’ve used DeepSeek R1 and OpenAI o3-mini, which use additional test-time compute to arrive at better answers. Instead they used GPT-3.5, which was about 2½ years old at the time.