AI chatbots fail to diagnose patients by talking with them

Although popular AI models score highly on medical exams, their accuracy drops significantly when they have to reach a diagnosis through a conversation with a simulated patient