Large Language Models in Psychology: A Workshop for MSc Students

In March, I had a chance to spend a long day with MSc students in personality psychology and talk about something that is changing their field whether they like it or not — large language models. The format was a six-hour workshop: half theory, half hands-on demos, with a lot of conversation in between. The goal was simple but ambitious: by the end of the day, the students should know not only how to use LLMs, but also why they work, where they break, and what this means for psychological research and practice.

Why this matters for psychology

It is tempting to treat LLMs as just another piece of software that one needs to be slightly familiar with. I argued the opposite: for psychology — both as a science and as a practice — LLMs are arguably the most consequential tool to appear in the last few decades. Today, more than 300 million people use ChatGPT every week. About 85% of Fortune 500 companies have already integrated LLMs into their workflows. Psychologists are, of course, no exception: clients are arriving with conclusions drawn from ChatGPT, students are writing literature reviews with the help of Claude, and qualitative researchers are starting to use LLMs for thematic analysis.

Once you accept that this technology is here to stay, the question is no longer whether to use it, but how to use it responsibly and competently.

A short history of AI — without the magic

Before talking about how to use the models, I wanted to demystify them. We spent a meaningful chunk of the workshop walking from the Turing test of 1950 and the philosophical debates that followed (Nagel’s concerns about consciousness in the 1970s, Searle’s Chinese Room in 1980) through the classical machine learning of the 80s and 90s, the deep-learning revolution of the 2010s, and the “Attention Is All You Need” paper in 2017, to the current generation of decoder-only transformers: GPT-5, Claude 4.6, Gemini 3.

The reason for that detour is that, in my experience, students relate to LLMs very differently once they understand that there is no homunculus inside. We went through the basics of supervised, unsupervised, and reinforcement learning with concrete psychology-flavoured examples (kNN classifying anxiety vs. depression, decision trees for screening, neural networks predicting whether someone will go for a walk). We worked through a tiny neural network by hand — multiplying matrices, applying ReLU, computing a sigmoid — so that everyone could see, in plain arithmetic, what “the model thinks” actually means: numbers in matrices, nothing more.
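The by-hand exercise can be reproduced in a few lines. What follows is a minimal sketch, not the network we used in the workshop: the two input features, the weights, and the biases are all made-up numbers chosen so the arithmetic is easy to check on paper.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector):
    """Replace every negative number with zero."""
    return [max(0.0, x) for x in vector]


def sigmoid(x):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, W1, b1, w2, b2):
    """One hidden layer with ReLU, then a single sigmoid output."""
    pre_activation = [z + b for z, b in zip(matvec(W1, x), b1)]
    hidden = relu(pre_activation)
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return hidden, logit, sigmoid(logit)


# Hypothetical inputs: [is it sunny (0 or 1), hours of free time].
x = [1.0, 2.0]

# Hypothetical weights and biases, invented for this illustration.
W1 = [[0.5, 0.25],
      [-1.0, 0.5]]
b1 = [0.0, -0.5]
w2 = [1.0, -2.0]
b2 = 0.0

hidden, logit, probability = forward(x, W1, b1, w2, b2)
print("hidden layer:", hidden)
print("logit:", logit)
print("P(goes for a walk):", round(probability, 3))
```

With these numbers the second hidden unit goes negative and ReLU zeroes it out, so the output depends on the first unit alone. Changing a single weight and re-running is a quick way to see that the "decision" is nothing but the arithmetic.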

Then we went one level deeper: tokenisation, embeddings, the attention mechanism. The library metaphor worked surprisingly well: you walk in with a query, compare it against the key on each book’s spine, and take away the value inside the books that match best. We even computed attention scores for a short Russian sentence by hand to demonstrate how a single word’s representation shifts toward its context. Multi-head attention, masked self-attention, autoregressive generation, the three stages of training (pre-training, supervised fine-tuning, and reinforcement learning from human feedback, or SFT and RLHF for short): all of it fits into a few hours if you let the students do the math themselves.
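The attention calculation is just as small. The sketch below is a simplified version of scaled dot-product attention for a single query, under loud assumptions: the three tokens and their two-dimensional embeddings are invented (this is not the Russian sentence from the workshop), and the embeddings are used directly as queries, keys, and values, whereas a real transformer first passes them through learned projection matrices.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    largest = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    # Similarity between the query and the key on each "book spine".
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    # The new representation is a weighted average of the values.
    output = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, output


# Hypothetical toy sentence and embeddings, invented for illustration.
tokens = ["the", "river", "bank"]
embeddings = [
    [0.0, 0.0],  # "the"
    [2.0, 0.0],  # "river"
    [1.0, 1.0],  # "bank"
]

# Simplification: queries, keys and values are the raw embeddings.
query = embeddings[2]  # the representation of "bank"
weights, contextual_bank = attend(query, embeddings, embeddings)

for token, weight in zip(tokens, weights):
    print(f"{token:>6}: {weight:.3f}")
print("bank before:", query)
print("bank after: ", [round(x, 3) for x in contextual_bank])
```

Because "bank" is the last token, attending to everything before it is also what masked self-attention would allow. The first coordinate of its vector moves toward the "river" vector, which is the shift toward context described above, reduced to three dot products and one weighted average.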

Myths worth breaking

Once the mechanics are demystified, the myths fall on their own. I picked three that come up most often in psychology contexts:

“LLMs store all the texts they were trained on.” No: they store statistical patterns. The weights are a lossy compression of the training data, not a database, even if some frequently repeated passages can be reproduced verbatim.
“LLMs tell the truth.” They hallucinate, and so do humans — something psychologists, of all people, should appreciate. Critical thinking and verification are not optional.
“LLMs will never replace X.” They might. AI ethics can be formalised. Research already shows that people sometimes feel more comfortable opening up to bots. Give the model a physical body, and the picture changes further. The honest answer is that we don’t know, but “never” is not a serious position.

Where LLMs actually help in psychology

The most useful part of the workshop, judging by the questions, was the demo block. We went through concrete examples:

Qualitative analysis. Coding interviews, thematic analysis, extracting categories from open-ended responses. With the right prompt, an LLM can do in fifteen minutes what used to take a research assistant a week — provided you treat its output as a first draft, not a final result.
Literature reviews. Synthesising hundreds of articles, comparing theoretical frameworks, fact-checking claims. Deep Research modes in modern models are genuinely useful here, and citations make the result auditable.
Psychometrics. Generating and validating questionnaires, drafting items aligned with a target construct, sanity-checking translations.
Therapy-session analysis. Transcription, pattern detection, theme extraction — with all the privacy caveats one would expect.

Multimodality matters too. Text is the lingua franca, but modern models also process images (diagrams, scans, hand-written notes) and audio (recorded sessions, lectures). For a discipline that lives off interviews and observations, this is a much bigger deal than it sounds.

Caveats — and they are not negligible

I tried to spend as much time on the caveats as on the capabilities. Hallucinations. Privacy and informed consent when working with sensitive material. The temptation to outsource clinical judgement to a chatbot. The risk of training students to write with an LLM before they have learned to write without one. The fact that an LLM is, by design, a fluent pattern-completer — and fluency is not truth.

For psychology specifically, there is also a subtler risk: LLMs reflect the corpora they were trained on, which means they reflect the dominant theoretical frames of the English-language internet. That is not nothing, and it is worth flagging to students who will be using these tools to think with.

What I hoped they would walk away with

Three things, mainly. First, that an LLM is a tool, not an oracle — useful in the hands of someone who understands what it is doing. Second, that the gap between “I use ChatGPT sometimes” and “I use LLMs as part of a research workflow” is large, and crossing it is mostly about practice and prompt discipline, not technical wizardry. Third, that the people best positioned to use these models well in psychology are psychologists themselves — not engineers who happen to have read a textbook on attachment theory.

The workshop ran for six hours, with breaks. We didn’t cover everything; we couldn’t. But by the end, the students were asking the right kind of questions — about validation, about ethics, about reproducibility — and that is, I think, the most one can hope for from a single day.