Rethinking UX research in the age of AI
Why starting fresh, testing carefully, and asking better questions matters more than ever
This has been one of those weeks where a lot of ideas came at me from different directions - through conversations, testing things out in workshops, and a fair bit of reading. And the more I dug into how we are using AI in UX research, the more I realised how easily things can go off track if we are not paying close attention.
Here is a round-up of the key things I have been thinking about and sharing with others lately.
Starting fresh is not just good practice, it is now critical
Something I have really come to appreciate is how unreliable LLMs can become over the course of a conversation. I used to think this was only an issue in longer chats, but it turns out even a short back-and-forth can subtly shift the way a model interprets your prompts.
Some recent studies (and a few experiments I ran myself) showed that output quality drops when you feed instructions in step by step instead of all at once. Models still drift mid-conversation, and even the more reasoning-heavy ones are not immune.
That is why I have started recommending people open a brand-new chat for any serious task. It is not just about tidiness, it is about maintaining control and avoiding unintentional interference from earlier turns.
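To make that habit concrete, here is a minimal sketch of what a fresh, self-contained request can look like, assuming the OpenAI Python SDK; the model name, system message, and brief below are placeholders rather than a recommended setup. The point is that every serious task gets its own single request, with no inherited conversation history.

```python
# A minimal sketch of the "fresh chat per task" habit, assuming the OpenAI
# Python SDK. The model name, system message, and brief are placeholders,
# not a recommendation of a particular setup.
from openai import OpenAI

client = OpenAI()

def run_fresh_task(brief: str, notes: str, model: str = "gpt-4o") -> str:
    """Send one self-contained request with no inherited conversation history."""
    messages = [
        {"role": "system", "content": "You are assisting with UX research analysis."},
        # Everything the model needs goes into this single user turn,
        # rather than being drip-fed across a long-running chat.
        {"role": "user", "content": f"{brief}\n\n---\nNotes:\n{notes}"},
    ]
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content
```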
Language models are incredibly suggestible
One of the more surprising things I witnessed was how easily a model can be nudged off course. In a conversation that casually referenced safaris a few times, the model eventually started hallucinating giraffes into the content without any prompt explicitly asking for them.
It sounds silly, but it is a good reminder: these models are not neutral. Every sentence you feed in leaves a trace, and repeated ideas (even accidental ones) can skew the results. It is like writing in wet cement: every input sticks.
Context build-up can work against you
I have also been reflecting on how difficult it is to recover from early misunderstandings or errors in a conversation. Once a flawed assumption gets into the mix, it tends to hang around and shape what comes next, often without you realising it.
People call this “context poisoning”, but I think of it more as gradual distortion. You might not notice it straight away, but the model's output makes less and less sense the longer you go. It is one of the reasons I do not trust long-running sessions for anything that matters.
One clear prompt is better than 10 clarifications
I am more convinced than ever that the most reliable outputs come from concise, well-structured instructions delivered in a single go. And I am saying this after testing various prompts and setups this week.
I now write my prompts like mini research briefs: clear scope, clear method, and no assumption that the model “remembers” what I said two minutes ago. I do this for anything that resembles research analysis or synthesis.
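For what it is worth, here is roughly what one of those mini briefs looks like in practice; the headings and wording below are illustrative only, not a fixed template.

```python
# An illustrative "mini research brief" prompt; adjust the scope, method,
# and output sections to your own study. The transcript text is a placeholder.
RESEARCH_BRIEF = """\
Scope: analyse only the interview transcripts pasted below; nothing else is in scope.

Method: identify recurring themes, quote the line that supports each one,
and flag anything ambiguous as "needs follow-up" rather than guessing.

Output: a list of themes with supporting quotes, followed by open questions.

Assume no prior context: everything you need is in this message.

Transcripts:
{transcripts}
"""

transcripts = "P1: I could not find the export option ...\nP2: Setup took far too long ..."
prompt = RESEARCH_BRIEF.format(transcripts=transcripts)
```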
Using third-party tools? Check what is behind the curtain
During a conversation with a data scientist last week, he shared an insight that completely changed my perspective: most AI research tools are not actually doing the hard work themselves. They are just interfacing with models like GPT, Claude, or others behind the scenes.
That is not necessarily a bad thing, but it does mean we are relying on someone else’s prompt engineering, model selection, and context handling. If you do not know what model a tool is using or how it is injecting your data, then you are basically flying blind.
One simple trick I have found helpful is checking the subprocessor lists on privacy pages. It is not a full answer, but it gives you a sense of whether a tool runs things locally or just uses OpenAI or Anthropic on your behalf.
Not all AI tools are what they seem
A lot of these AI tools are beautifully packaged, with clever interfaces and automation features that promise to make synthesis “effortless.” But peel back the surface, and what you often find is that they are simply acting as middlemen, connecting your data to a commercial model like GPT-4, Claude, or another foundation model in the background.
The trouble is, you are rarely told exactly how that is happening. What version of the model is running? What temperature settings are used? Is your data being chopped up and embedded via RAG or just dumped in as-is? Who is crafting the prompt on your behalf and what assumptions are baked into it?
It is a bit like handing over your notes to a stranger and asking them to write your insights without knowing their process, tools, or whether they have even read the whole thing.
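To make those questions a little more concrete, here is a toy sketch of the gap between a tool that dumps your notes in as-is and one that runs a RAG-style retrieval step first. The keyword overlap below stands in for embedding similarity; a real tool would use an embedding model, and you rarely get to see which approach it takes.

```python
# A toy illustration of "dumped in as-is" versus a RAG-style retrieval step.
# Keyword overlap stands in for embedding similarity; real tools do something
# more sophisticated behind the scenes, usually without telling you.

def chunk(notes: str, size: int = 500) -> list[str]:
    """Split the raw notes into fixed-size chunks."""
    return [notes[i:i + size] for i in range(0, len(notes), size)]

def naive_relevance(question: str, passage: str) -> int:
    """Crude stand-in for cosine similarity between embeddings."""
    question_terms = set(question.lower().split())
    return sum(1 for word in passage.lower().split() if word in question_terms)

def build_context(notes: str, question: str, top_k: int = 3, use_rag: bool = True) -> str:
    if not use_rag:
        # "Dumped in as-is": the model sees everything, relevant or not.
        return notes
    ranked = sorted(chunk(notes), key=lambda c: naive_relevance(question, c), reverse=True)
    # With retrieval, the model only ever sees the top-ranked chunks.
    return "\n---\n".join(ranked[:top_k])
```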
That is why I have started encouraging people to look for indicators of what is going on behind the curtain. Sometimes you can find clues in a tool’s privacy policy or documentation, such as whether it mentions prompt injection, API usage, or subprocessors.
To be clear, I am not saying we should avoid these platforms entirely. They can be incredibly useful, especially when time is tight. But if you are working with sensitive data or your research will inform major product decisions, it is worth asking: am I still the one in charge here? Or have I quietly given away the steering wheel?
Working with AI in UX means rethinking our methods
All of this has made me think about how AI is changing our practice, not just by automating tasks but by pushing us to adopt new mindsets. Here is where I have landed:
We need to treat AI like a new material. You cannot master it without playing around, sketching with it, and testing its limits.
We have to design our prompts and evaluation methods with the same care we would give to an A/B test or usability study (see the sketch after this list).
AI-generated prototypes are brilliant for sparking ideas, but they are not production-ready, and that is fine. Sometimes it is worth making ten dodgy mockups just to discard nine and learn something from the tenth.
The more we use AI in human-centred research, the more responsibility we have to notice when things go wrong, especially around misrepresentation, bias, or harm.
Finally, we should not limit AI to just copying what we already do. Some of the most interesting work I have seen recently involves using AI to explore data differently or build interfaces that adapt to what users actually need in the moment.
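On the evaluation point above, here is a rough sketch of what that care can look like in practice: a tiny harness that runs one prompt over transcripts whose themes you already know and measures coverage. The ask_model function is a placeholder for whichever model or tool you actually use, and the test cases are invented.

```python
# A rough sketch of evaluating a prompt the way you would pilot a study:
# run it over transcripts where the themes are already known and check
# coverage. ask_model is a placeholder for whichever model or tool you call,
# and the test cases below are invented.

def ask_model(prompt: str) -> str:
    raise NotImplementedError("Swap in your own model or tool call here.")

TEST_CASES = [
    {"transcript": "P1: I gave up because the export button was hidden ...",
     "expected_themes": ["export", "discoverability"]},
    {"transcript": "P2: The onboarding email never arrived, so I stalled ...",
     "expected_themes": ["onboarding", "email"]},
]

def evaluate(prompt_template: str) -> float:
    """Return the share of expected themes that the model's output surfaces."""
    hits, total = 0, 0
    for case in TEST_CASES:
        output = ask_model(prompt_template.format(transcript=case["transcript"])).lower()
        for theme in case["expected_themes"]:
            total += 1
            hits += theme in output  # crude check, but enough to compare prompt variants
    return hits / total
```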
To whoever is reading this, I will leave you with this thought: AI can absolutely support research and design but only if we stay in the driver’s seat. This week’s learnings reminded me that it is not enough to get a good-looking output. We need to understand how it got there, what is missing, and where it might be leading us astray.
Until we build stronger guardrails around how we use these tools, every insight we generate is only as good as the attention we gave to the process.
Until next time,