
Before ‘Shadow AI’ becomes the next Shadow IT: A playbook for hospital leaders | Viewpoint
We trust trainees with real work, but only with supervision and accountability. Agentic workflows should be treated the same way.
Hospitals are on the verge of a new kind of shadow information technology. Not another rogue spreadsheet. This time it is artificial intelligence.
Here is what it looks like on the ground. A resident suspects time-to-antibiotics worsened after a triage redesign. The answer is in the data, but getting it out means submitting a request, waiting weeks, and hoping the analysis returns before staffing and order sets change again. So she improvises. She exports a dataset, drops it into a consumer AI tool, asks it to write code, and gets a plot in minutes.
That is the promise and the trap of the new generation of AI. We have mostly discussed chatbots that summarize notes and draft messages. Useful, but limited.
The bigger shift is toward
The leadership question is no longer whether this shows up in your organization. It is whether you will provide a safe lane for it. If you do not, clinicians and analysts will still experiment on
Why this happens
Most hospitals have a graveyard of “should be knowable” questions. Where do imaging delays occur? Which discharge pathway lowers readmissions for a subgroup? Which patients fall through the cracks after a workflow change?
The ideas are not the problem. The time-to-answer is. When the path from question to analysis is slow, questions die. When a tool makes it fast, questions come alive. So do risks. This is why “ban it” is not a strategy. Prohibition does not stop use. It pushes it outside governance, and organizations discover it only after something breaks.
What can go wrong
Privacy is first. When workflows span multiple tools, boundaries blur. Protected information can end up where no one intended, sometimes without an audit trail your compliance team can reconstruct.
Next is overreliance. A clean plot and “working” code are easy to mistake for correctness. These systems can produce workflows that look complete even when something subtle is wrong: a cohort definition quietly fails, a join drops a subgroup, or missing data is handled in a way that changes the result.
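To make the "join drops a subgroup" failure concrete, here is a minimal sketch in Python. The records, field names, and helper functions are hypothetical and exist only for illustration. The join runs without error and the output looks complete, yet one patient has disappeared. A row-count comparison before and after the step is what exposes it.

```python
# Hypothetical records: patient 3 has no matching triage entry.
patients = [
    {"patient_id": 1, "unit": "ED"},
    {"patient_id": 2, "unit": "ED"},
    {"patient_id": 3, "unit": "Transfer"},
]
triage_by_patient = {1: "high", 2: "low"}


def inner_join(rows, lookup):
    """Keep only rows whose patient_id is in lookup; the rest vanish silently."""
    return [
        {**row, "acuity": lookup[row["patient_id"]]}
        for row in rows
        if row["patient_id"] in lookup
    ]


def dropped_ids(rows_before, rows_after):
    """Return the patient_ids lost between two steps, sorted."""
    before = {r["patient_id"] for r in rows_before}
    after = {r["patient_id"] for r in rows_after}
    return sorted(before - after)


joined = inner_join(patients, triage_by_patient)
lost = dropped_ids(patients, joined)
print(f"rows before: {len(patients)}, after: {len(joined)}, dropped ids: {lost}")
```

Nothing in the joined table signals a problem. Only the comparison against the input shows that the transfer patient is gone.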
Finally, there is drift. Even a validated workflow can become less reliable as data pipelines change, documentation practices shift, or patient populations evolve. Nothing crashes. The outputs just move.
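One simple way to notice that "the outputs just move" is to compare a recent window of a metric against a baseline window. The sketch below is an assumption-laden illustration, not a validated monitoring method: the function name, the threshold of 3, and the sample numbers are all made up for the example.

```python
from statistics import mean, stdev


def drift_flag(baseline, recent, z_threshold=3.0):
    """Flag drift when the recent mean sits far from the baseline mean.

    Distance is measured in standard errors derived from the baseline
    spread. The z_threshold default is an illustrative cutoff only.
    """
    standard_error = stdev(baseline) / (len(recent) ** 0.5)
    z = (mean(recent) - mean(baseline)) / standard_error
    return abs(z) > z_threshold, z


# Hypothetical weekly values of a metric, e.g. median minutes to antibiotics.
baseline = [60, 62, 58, 61, 59, 60, 63, 57]
recent_stable = [61, 59, 60, 62]
recent_shifted = [66, 68, 65, 69]

print(drift_flag(baseline, recent_stable))
print(drift_flag(baseline, recent_shifted))
```

A flag is a prompt for a human to look at what changed upstream, not proof that the workflow is wrong.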
What hospitals should do now
Health systems already know how to manage capable novices. We trust trainees with real work, but only inside supervision, verification, and accountability. Agentic workflows should be treated the same way.
1. Create a safe lane for experimentation.
If we want safe innovation, the compliant path must be the easiest path. Provide a governed sandbox where staff can explore questions without pasting data into consumer tools. Keep it usable: role-based access, approved tools and approved AI endpoints, clear rules about what data can be used, and audit logging of data queries and code execution. Add lightweight version control so analyses can be reproduced.
2. Require a simple verification checklist before anything influences patient care.
We need checks that are easy to teach and hard to skip when a workflow could affect patient-specific decisions.
Known-answer testing: before running on real patient data, the workflow must succeed on synthetic or held-out cases where the correct answer is known.
Intermediate step review: require intermediate outputs, not just the final plot. This is where clinical sense catches computational nonsense.
Reproducibility by default: anything intended to inform decisions should be rerunnable. Version the code, log the instructions, and save a data snapshot or query reference.
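The first two checks can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed implementation: the field names `arrived_at` and `antibiotic_at`, the function names, and the synthetic cases are hypothetical. The analysis returns its intermediate counts alongside the final number, and it must reproduce a known answer on synthetic data before it touches real records.

```python
from datetime import datetime
from statistics import median


def minutes_to_antibiotics(encounters):
    """Return (median_minutes, counts) so reviewers see the intermediate steps."""
    delays = []
    excluded = 0
    for enc in encounters:
        if enc["antibiotic_at"] is None:
            excluded += 1  # no antibiotic recorded: excluded, but counted
            continue
        delta = enc["antibiotic_at"] - enc["arrived_at"]
        delays.append(delta.total_seconds() / 60)
    counts = {
        "total": len(encounters),
        "included": len(delays),
        "excluded": excluded,
    }
    return median(delays), counts


def synthetic_encounters():
    """Synthetic cases: delays of 30, 60, and 90 minutes, plus one untreated."""
    t0 = datetime(2024, 1, 1, 8, 0)
    return [
        {"arrived_at": t0, "antibiotic_at": datetime(2024, 1, 1, 8, 30)},
        {"arrived_at": t0, "antibiotic_at": datetime(2024, 1, 1, 9, 0)},
        {"arrived_at": t0, "antibiotic_at": datetime(2024, 1, 1, 9, 30)},
        {"arrived_at": t0, "antibiotic_at": None},
    ]


# Known-answer test: the median of 30, 60, and 90 minutes must be 60.
result, counts = minutes_to_antibiotics(synthetic_encounters())
assert result == 60.0, "known-answer test failed"
assert counts == {"total": 4, "included": 3, "excluded": 1}
print(result, counts)
```

The counts are where a clinician can ask the useful question: why was one encounter excluded, and is that the right call for this cohort?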
3. Make ownership and monitoring real.
Shadow IT persists because no one owns it. If a workflow influences decisions, it needs an owner, a version, and a plan for what happens when it fails.
Create a lightweight inventory that records purpose, owner, version, limitations, and monitoring plan. Match monitoring to risk. Low-risk analytics may need periodic reruns and anomaly checks. Recommendation workflows need continuous monitoring with triggers for pausing use and rolling back.
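An inventory entry does not need special software. As a rough sketch, the five fields named above fit in a small Python record, with a check that refuses to treat a blank field as filled in. The class name, helper, and example values are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class WorkflowRecord:
    """One inventory entry; every field is free text in this sketch."""
    purpose: str
    owner: str
    version: str
    limitations: str
    monitoring_plan: str


def missing_fields(record):
    """Return the names of fields that are empty or whitespace only."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Hypothetical entry with no owner assigned yet.
entry = WorkflowRecord(
    purpose="Track time-to-antibiotics after triage redesign",
    owner="",
    version="0.3.1",
    limitations="Excludes transfers; single site",
    monitoring_plan="Monthly rerun with anomaly check",
)
print(missing_fields(entry))
```

An entry with a missing owner is exactly the kind of workflow that becomes shadow IT, so a check like this could gate whether a workflow is allowed to inform decisions.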
What policymakers and purchasers can do
Require basics for higher-impact use cases: exportable audit logs, version pinning so you know what ran, clear data-handling terms including retention, and a rollback pathway if performance changes. If a vendor cannot support traceability, its product may not be appropriate for workflows that affect patient care.
The point: Speed and trust can coexist
This is not an argument against AI. It is an argument against pretending that speed alone is progress.
AI that can plan and execute work will reduce time-to-answer across quality improvement, operations, and research. It will also migrate to shadow channels if the governed pathway is too slow or too hard to use.
The leadership opportunity is to make the safe lane the default lane. Build an environment where people can move quickly while still protecting patients, data, and trust. If you do that, you unlock learning.
Henry Bair, MD, MBA, and Mak Djulbegovic, MD, MSc, are resident physicians at Wills Eye Hospital whose research focuses on the intersection of artificial intelligence and clinical decision-making.
















































