AI for Therapists: A Clinical Guide to What Actually Helps (and What Doesn't)
A working clinician's guide to AI in therapy — what large language models do well, where they fail, and how to use them without compromising care.
Most clinicians we talk to have a quietly conflicted relationship with AI. They have tried ChatGPT for a treatment plan draft, felt the lift in cognitive load, and immediately felt uneasy about what just happened. That tension is the right starting point. AI is not neutral in clinical work — it changes the shape of the labor, what gets remembered, what gets standardized, and what the client never sees.
This guide is for therapists who want to use AI well. Not the marketing version. The honest version — what large language models actually do, where they reliably fail, the clinical risks worth taking seriously, and the narrow set of tasks where AI quietly returns hours of your week without putting the work at risk.
What an LLM is, in clinical terms
A large language model is a statistical text generator trained on a vast corpus of writing. It predicts the next plausible token given everything before it. That is the whole mechanism. It does not understand the client. It does not reason about therapeutic alliance. It does not hold the case in mind. It produces text that sounds like what a competent professional would write, because it has been trained on millions of examples of competent professionals writing.
For clinical work, that distinction matters in three ways. First, fluency is not accuracy — an LLM will write a confident, well-structured paragraph that is also wrong. Second, the model has no continuity unless you give it one — every session starts from zero unless you load context yourself. Third, the model has no clinical accountability — the duty of care, the documentation standard, the diagnostic judgment all remain yours.
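For readers who want to see the mechanism rather than take it on faith, here is a deliberately tiny sketch of next-word prediction. It is an analogy under loud assumptions: real LLMs use neural networks over tokens and far longer context, and the function names and sample notes below are invented for illustration. What it does show is how plausible-sounding text can come out of nothing but word statistics.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus: str) -> dict[str, Counter]:
    """Count which word follows which in the training text."""
    words = corpus.lower().split()
    following: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def continue_text(model: dict[str, Counter], start: str, length: int) -> str:
    """Extend `start` by repeatedly appending the most frequent next word."""
    out = [start]
    for _ in range(length):
        candidates = model.get(out[-1])
        if not candidates:
            break  # the toy model has never seen this word followed by anything
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)


# Invented sample text, not real clinical notes.
notes = (
    "the client reports low mood "
    "the client reports poor sleep "
    "the client denies intent"
)
model = train_bigrams(notes)
print(continue_text(model, "the", 2))  # the client reports
```

The toy model writes "the client reports" because that sequence is the most common one in its training text, not because it knows anything about a client. Scale changes the fluency, not the absence of clinical accountability.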
Where AI is genuinely useful
There is a narrow band of clinical tasks where AI reliably reduces friction without introducing risk, and it is worth being precise about what falls inside it.
Drafting language you will edit. Progress note skeletons from your own bullet points, psychoeducation paragraphs at the client's reading level, between-session homework instructions, intake summaries from structured fields. The model produces a first draft; you do the clinical work of revision. The revision is non-negotiable.
Reformatting and translating your own clinical thinking. You wrote four bullets about a session; the model expands them into a SOAP note. You wrote a paragraph in clinical language; the model rewrites it at a 6th-grade reading level for a client handout. The content is yours; the form is AI-assisted.
Pattern naming and brainstorming. "Given these themes across the last five sessions, what conceptualizations should I consider?" Used as a thinking partner — never as a decision-maker — the model can surface frames you might not have reached for. Treat its output as a checklist of candidates to consider, not a conclusion.
Lookup with citation. "What does the current evidence say about [intervention X] for [population Y]?" An LLM can give you a starting orientation in seconds, but you must verify against the actual literature before relying on it. Never quote a paper a model cited until you have read the paper.
Where AI fails, often confidently
These failure modes are worth memorizing because they recur.
Hallucinated citations. Models will invent papers, authors, page numbers, and DOIs that do not exist. They look perfect. They are wrong. Never use an AI-generated citation in a treatment plan, court report, or peer letter without independently verifying it.
Diagnostic overconfidence. Given a vignette, models produce a single confident diagnosis where a clinician would hold three differentials in mind. They miss base rates, cultural context, and developmental history. They will not ask the follow-up question that changes the impression. Diagnostic impressions are clinician work.
Clinical recommendations that ignore safeguards. Asked to generate a suicide safety plan, a model can produce something that looks like a Stanley-Brown safety plan but quietly omits the means-restriction step or softens the language. Asked to recommend trauma processing, it will not assess for stabilization unless you prompt it to. Safety-relevant work cannot be drafted in a tab and pasted in.
Confabulated agreement. Models default to agreeing with the framing in your prompt. If you ask "is this client likely BPD?", you will get reasons it might be BPD. If you ask "is this client likely complex PTSD?", you will get reasons it might be cPTSD. The model is not adjudicating; it is mirroring. Use neutral prompts ("what differentials are worth considering?") and remain the skeptic.
The privacy line
This is the part most clinicians get wrong on first contact. Consumer LLM tools — ChatGPT, Claude, Gemini in their standard consumer plans — are not HIPAA-compliant by default. Anything you paste into them may be retained, reviewed by humans for safety/quality, and used to improve future models depending on the vendor's policy and your account tier. That includes paraphrased clinical content with identifying detail.
The minimum bar for any AI tool you use in clinical work:
- No PHI in, ever — strip names, dates of birth, exact ages when small, locations, employers, phone numbers, school names, identifying medical history, and any combination of attributes that would re-identify the client in a small population. Use initials or role labels ("the client", "the partner") only.
- A signed Business Associate Agreement if the tool is going to touch any PHI at all. No BAA, no PHI, no exceptions.
- Documented data handling — what is retained, for how long, who can access it, whether it trains future models. If the vendor cannot answer in writing, the answer is no.
- Informed consent in your treatment agreement if AI tooling will be used at any point in the workflow that touches the client's record.
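As a rough illustration of the "strip identifiers" habit, the sketch below screens a draft for a few machine-detectable identifier shapes before it goes anywhere near a prompt. Everything in it is an assumption made for illustration: the pattern set, the function names, and the sample text are hypothetical, and pattern matching cannot catch names, employers, schools, or identifying combinations of detail. Treat it as a second pair of eyes, not as de-identification.

```python
import re

# Hypothetical patterns for illustration only. They cover a few obvious
# formats (US-style phone numbers, slash dates, emails) and nothing else.
PATTERNS = {
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def flag_identifiers(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs for anything matching a pattern."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.group()))
    return hits


def redact(text: str) -> str:
    """Replace each match with a bracketed placeholder such as [PHONE]."""
    for kind, pattern in PATTERNS.items():
        text = pattern.sub(f"[{kind.upper()}]", text)
    return text


# Invented sample text, not a real record.
draft = "Client (DOB 03/14/1988) can be reached at 555-867-5309 or jane@example.com."
print(flag_identifiers(draft))
print(redact(draft))
```

An empty result from `flag_identifiers` means only that none of these three shapes appeared. The read-through for names, places, and re-identifying combinations is still yours.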
We hold ourselves to a stricter version of this standard in TherapistAssist — the section on how we apply it is below.
Ethical obligations the AI cannot carry
A model can write a treatment plan. It cannot hold the therapeutic frame. The obligations that remain entirely yours, regardless of tooling:
- Informed consent and the alliance. The client is consenting to your care, with your clinical judgment. If AI shapes that care, they need to know.
- Standard of care. "The model suggested it" is not a defense. The standard is what a reasonably prudent clinician in your jurisdiction would do, full stop.
- Cultural and contextual judgment. Models reproduce dominant-culture defaults. They miss class, race, gender, neurodivergence, and lived-experience context unless you actively correct for it. The correction is yours.
- Risk assessment and safety planning. Always clinician-led, always documented in your own words, always with collateral when indicated.
- The relational repair when something goes wrong. No model can do this, and the rupture-and-repair work is often where the actual change happens. Do not let AI efficiency thin out the relational hours.
How to use AI without diluting the work
This is the short discipline list we use ourselves and recommend.
- AI drafts, clinicians decide. Every AI output is a draft. Every draft gets clinician review before it touches a record, a client, or a referral.
- Strip identifiers at the door. Build the habit so it becomes reflex. Initials only, no DOB, no employer, no exact location.
- Use neutral prompts. "What should I consider?" not "Is this X?" — the model will mirror your framing.
- Cap the scope. AI for language, formatting, brainstorming, and lookup. Not for diagnosis, not for safety planning, not for adjudicating between modalities.
- Document the workflow. If AI is in your process, it belongs in your informed consent and your policies. Be transparent with clients who ask.
- Audit your own outputs. Once a month, re-read AI-assisted notes for drift — has the language flattened? Have you missed nuance you would have caught writing from scratch? Adjust.
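The "use neutral prompts" habit can be made concrete with a small sketch. It assumes, purely for illustration, that a prompt opening with a yes/no verb is a leading one, and it builds a neutral prompt around the wording suggested earlier in this guide. The opener list, function names, and sample themes are hypothetical, and the heuristic will miss leading prompts that are phrased differently.

```python
import re

# Hypothetical heuristic: yes/no openers invite the model to confirm
# whatever the question already assumes.
LEADING_OPENERS = re.compile(
    r"^\s*(is|isn't|are|does|doesn't|do|could|would|wouldn't|should)\b",
    re.IGNORECASE,
)


def is_leading(prompt: str) -> bool:
    """True when the prompt opens like a yes/no question."""
    return bool(LEADING_OPENERS.match(prompt))


def neutral_prompt(themes: list[str]) -> str:
    """Build an open-ended prompt from de-identified session themes."""
    bullet_list = "\n".join(f"- {theme}" for theme in themes)
    return (
        "Themes across recent sessions (de-identified):\n"
        f"{bullet_list}\n"
        "What differentials are worth considering?"
    )


print(is_leading("Is this client likely BPD?"))  # True
print(is_leading("What should I consider?"))     # False
print(neutral_prompt(["sleep disruption", "avoidance of reminders"]))
```

The check is crude on purpose. Its job is to interrupt the reflex of asking the model to confirm a hunch, not to judge the clinical quality of the question.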
How we apply this in TherapistAssist
We built TherapistAssist on a single rule: AI should reduce the cognitive load of documentation and preparation, never the clinician's authorship of the clinical work. In practice that means:
- The AI Homework Builder generates client-facing worksheets from a clinician-described situation, then you review and adjust before sending. The clinical decision — what intervention, what scaffolding, what reading level — stays with you.
- We never store PHI in prompts. The interface is designed so clinicians describe a situation in role terms, not identifying terms.
- No client data is used to train models. We use vendors with enterprise data-handling and no-train guarantees, and we re-verify quarterly.
- Risk-relevant tools (safety planning, lethal-means counseling, intake assessment with SI items) are clinician-completed, not AI-generated. The forms scaffold your judgment.
The longer version of our position is in the companion piece on ethical AI in therapy.
A short FAQ
Should I tell clients I use AI? If AI touches their record or their care in any way that a reasonable client would want to know about, yes — in your informed consent, in plain language, with the option to opt out for documentation tasks.
Is it ever okay to paste a transcript into ChatGPT? Not into a consumer tier, not without a BAA, not without aggressive de-identification — and even then, almost never. The transcript is the densest possible PHI artifact you produce. Treat it that way.
Will AI replace therapists? No, and the question itself is framed the wrong way. AI changes which parts of clinical work cost the most time. The relational work, the part clients pay for, is not on the list of things models can do.
How do I evaluate a new AI tool for my practice? Three questions: Is there a signed BAA? Does the vendor publish its data handling in writing? Can you describe to a client in one sentence what the tool does and does not do? If any answer is fuzzy, hold off on adopting the tool.
Where to go from here
Read the ethical framework piece for the deeper version of the privacy and consent argument. Try the AI Homework Builder on a low-stakes task — a generic psychoeducation handout for stress, not a specific client's worksheet — and see where the draft is useful and where it needs your hand. The point is to develop an internal feel for what the tool gets right and where you have to take the wheel.