A new study found that nearly half of the medical advice generated by popular AI chatbots like ChatGPT and Grok is problematic. The chatbots frequently provided incorrect health information, faked scientific references, and refused to admit ignorance.
Well, they didn’t even use the latest models as of Feb 2025. They should’ve used DeepSeek R1 and OpenAI o3-mini, which use additional test-time compute to arrive at better answers. They used GPT-3.5, which was about 2½ years old at the time.