Ontario Auditor General Shelley Spence has warned that AI-powered medical transcription tools are producing unreliable patient records. A recent investigation found that these systems often generate hallucinations or omit critical data, potentially endangering patient care.

The 20 AI programs flagged by Shelley Spence

The province's Auditor General, Shelley Spence, uncovered extensive errors across 20 different AI programs designed to assist doctors with note-taking. According to the report, these tools, which are intended to streamline the transcription of conversations between physicians and their patients, frequently provided incomplete or entirely incorrect information. The review was not an isolated study but part of a broader probe into how artificial intelligence is being integrated across the Ontario provincial public service.

The scale of the failure suggests that the deployment of these tools may have outpaced the rigorous testing required for clinical environments. When AI systems are used to document medical history or current symptoms, the margin for error is virtually zero, yet the Auditor General found that these 20 programs were not evaluated adequately before being put into practice.

How AI hallucinations threaten Ontario patient treatment plans

A primary concern highlighted in the report is the presence of "hallucinations," a phenomenon where AI generates confident but entirely fabricated information. In a medical context, a hallucinated symptom or a misplaced medication dosage in a transcript could lead to inadequate or even harmful treatment plans for patients. According to the report, these inaccuracies have the potential to directly affect health outcomes by misleading the providers who rely on these notes for long-term care.

The danger is compounded by the nature of medical documentation. If a doctor relies on an AI-generated summary that omits a critical allergy or a specific patient complaint, the resulting clinical decision could be based on a flawed data set. This transforms a tool meant for efficiency into a potential liability for the Ontario healthcare system.
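To make these two failure modes concrete, the following is a minimal, purely illustrative Python sketch of the kind of cross-check a reviewer or safety tool might run against an AI-generated note. The term list, function name, and sample data are all hypothetical assumptions chosen for demonstration; nothing here reflects how the flagged Ontario tools or the government's review process actually work.

```python
# Illustrative sketch only: a naive cross-check between a source transcript
# and an AI-generated note. All names and data here are hypothetical; this is
# not how Ontario's tools or the Auditor General's evaluation operate.

# Terms whose omission or invention would be clinically significant.
SAFETY_TERMS = {"penicillin allergy", "warfarin", "chest pain", "metformin"}

def cross_check(transcript: str, ai_note: str) -> dict:
    """Flag safety-critical terms that appear in one document but not the other."""
    t, n = transcript.lower(), ai_note.lower()
    omitted = {term for term in SAFETY_TERMS if term in t and term not in n}
    invented = {term for term in SAFETY_TERMS if term in n and term not in t}
    return {"omitted_from_note": omitted, "hallucinated_in_note": invented}

transcript = "Patient reports chest pain and a penicillin allergy."
ai_note = "Patient reports chest pain. Currently taking warfarin."

flags = cross_check(transcript, ai_note)
print(flags["omitted_from_note"])     # {'penicillin allergy'} -- dropped by the AI
print(flags["hallucinated_in_note"])  # {'warfarin'} -- never said by the patient
```

Even this toy example shows why the problem is hard: a real note can paraphrase, negate, or reorder what was said, so simple keyword matching cannot reliably catch hallucinations. The burden of judgment ultimately falls back on a human reader.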

The government's mandate for manual doctor reviews

To mitigate the risks of AI errors, the Ontario government has issued guidelines requiring doctors to manually review every AI-generated note to ensure accuracy. While this creates a safety buffer, it essentially shifts the burden of quality control back onto the physician. The original promise of AI scribes was to reduce administrative burnout; however, the necessity of auditing every word of a transcript may negate those time-saving benefits.

This manual review process assumes that doctors have the time and inclination to catch subtle hallucinations. If a physician is overworked, they may be more likely to skim a transcript and miss a critical error. In effect, the government's current solution is a procedural fix for a technical failure.

A wider pattern of AI failure in Ontario's public service

The failures of these medical scribes are part of a larger trend of rapid, under-tested AI adoption within the Ontario provincial public service. This rush to automate mirrors a global pattern in which governments deploy generative AI to cut costs or increase efficiency without establishing strict validation frameworks. The Auditor General's probe suggests that the province may be treating high-stakes medical data with the same casualness as low-stakes administrative tasks.

Historically, the introduction of electronic health records (EHR) faced similar teething problems, but the autonomous nature of AI introduces a new variable: the system's ability to invent facts. This makes the Ontario experience a cautionary tale for other jurisdictions considering the integration of large language models (LLMs) into frontline healthcare.

Which specific AI vendors failed the Auditor General's test?

Despite the severity of the findings, the report leaves several critical questions unanswered. Most notably, the Auditor General did not publicly identify the vendors or the 20 programs that failed the evaluation. Without this transparency, healthcare providers in Ontario may not know whether the specific tool they are using is one of the problematic systems flagged by Shelley Spence.

Furthermore, the report does not specify the exact error rate or the frequency of hallucinations across the tested programs. It remains unclear whether some of the 20 tools performed significantly better than others, or whether the inaccuracies were systemic across all platforms tested by the provincial government.
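For illustration only, the sketch below shows the form such transparency could take: an aggregate hallucination and omission rate per tool over a reviewed sample of notes. Every tool name and number here is invented; the report publishes no figures of this kind.

```python
# Hypothetical illustration of the kind of metric the report does not publish:
# an aggregate hallucination/omission rate per tool over a labelled test set.
# The tool names and counts below are invented for demonstration.

# Each entry: (tool_name, notes_reviewed, notes_with_hallucinations, notes_with_omissions)
results = [
    ("tool_a", 500, 35, 60),
    ("tool_b", 500, 12, 20),
]

for tool, total, halluc, omitted in results:
    print(f"{tool}: hallucination rate {halluc / total:.1%}, "
          f"omission rate {omitted / total:.1%}")
# tool_a: hallucination rate 7.0%, omission rate 12.0%
# tool_b: hallucination rate 2.4%, omission rate 4.0%
```

Per-tool figures of this kind would let Ontario clinicians judge whether the scribe in their own practice sits at the better or worse end of the 20 flagged programs.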