GPT‑4.5 Outsmarts Humans in Controlled Turing Test

In a tightly controlled study, OpenAI’s GPT‑4.5 fooled judges 73% of the time, raising fresh questions about AI deception and trust.

Headlines Orbit AI Desk

June 09, 2026 · 03:25 AM · 2 min read

In a recent experiment run by researchers at Stony Brook University and UC San Diego, OpenAI’s GPT‑4.5 convinced human judges that it was a person 73% of the time in a five‑minute Turing test. The study, in which hundreds of judges conversed with both AI models and human volunteers, also tested LLaMa‑3.1‑405B, ELIZA, and GPT‑4o, all of which posted markedly lower success rates.

GPT‑4.5’s 73% Success Rate in a Five‑Minute Test

The headline result came from GPT‑4.5, which judges picked as the human more often than they picked the real human participants. According to the study, judges were asked to identify which participant was human after a brief text exchange. GPT‑4.5 was misidentified as human 73% of the time, well above the 50% chance baseline.

LLaMa‑3.1‑405B and the Near‑Chance Performance

In contrast, Meta’s LLaMa‑3.1‑405B achieved a 56% success rate, only slightly above the 50% chance level. The other two systems, ELIZA and GPT‑4o, performed poorly, with success rates of 23% and 21% respectively. These disparities suggest that model architecture and training data influence how convincingly a system can pass as human.
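
How far a given rate sits from the 50% chance level depends on how many judgments stand behind it, a number this article does not report. The Python sketch below is an illustration only: it takes the four reported rates and a hypothetical trial count (`HYPOTHETICAL_TRIALS`, an assumption, not a figure from the study) and computes the probability that pure guessing would produce a result at least that far from 50% in the same direction. The function and variable names are made up for this example.

```python
from math import comb


def tail_probability(successes: int, trials: int, p: float = 0.5) -> float:
    """Chance of a count at least as extreme as `successes`, on the same
    side of the expected value, if each trial succeeds with probability p."""
    if successes >= p * trials:
        counts = range(successes, trials + 1)   # upper tail
    else:
        counts = range(0, successes + 1)        # lower tail
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k) for k in counts
    )


# Success rates as reported in the article.
REPORTED_RATES = {
    "GPT-4.5": 0.73,
    "LLaMa-3.1-405B": 0.56,
    "ELIZA": 0.23,
    "GPT-4o": 0.21,
}

# ASSUMPTION: the article gives no per-model trial counts.
# 100 is a placeholder chosen only to make the arithmetic concrete.
HYPOTHETICAL_TRIALS = 100


def summarize(trials: int = HYPOTHETICAL_TRIALS) -> dict:
    """Map each system to its tail probability under pure guessing."""
    return {
        name: tail_probability(round(rate * trials), trials)
        for name, rate in REPORTED_RATES.items()
    }


if __name__ == "__main__":
    for name, prob in summarize().items():
        print(f"{name}: tail probability under chance = {prob:.3g}")
```

Under that placeholder count, a 56% rate would be easy to reach by guessing alone, while rates of 73%, 23%, or 21% would be very unlikely. Substituting the study's real counts would change the exact values.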

Experiment Design: Prompting a Young, Introverted Persona

Researchers found that the best results emerged when the AI was prompted to adopt a “young, introverted, chronically‑online” persona. This suggests that the gains came from steering the model toward a specific conversational style, not from a general increase in intelligence.

Open Questions About Real‑World Applicability

While the study demonstrates that AI can outperform humans in a tightly controlled Turing test, the authors caution that the findings have limited relevance to real‑world interactions. Key unanswered questions include: How would GPT‑4.5 fare in longer, more complex conversations? What role does the human judge’s expertise play in detection? And to what extent can these results be generalized beyond the specific experimental setup?

Implications for Trust and Deception in Casual AI Interaction

The study’s findings raise concerns about the potential for AI to deceive users in everyday text conversations. As the technology becomes more accessible, the line between genuine human dialogue and AI‑generated text may blur, making authenticity harder to judge and potentially eroding trust in digital communication.
