Paul Sherman
April 21, 2026

AI Checking Its Own Work: Bad Idea Right Now

P17 - Head of UX, Design Consultancy

A 25-year UX strategist at a small design agency watching his clients' customer journeys move off-site into LLMs, warning that organizations now running synthetic personas and synthetic user testing against AI-generated work are one expensive reality check away from a major failure, while quietly testing what a strategist's new execution range looks like when Figma Make plus Supabase removes the database-schema barrier.

Isn't there a concern of being completely disassociated from the actual reality of the user, and the fact that AI is checking its own work?

P17: Survey Data and Session Summary

Survey Responses

Question | Response
Age | 55–64
Education | Bachelor's degree
Current role / position level | Both practitioner & lead
Job title | Head of UX
Years of professional experience | More than 25 years
Organization description | UX Research / Design / Strategy
Industry | Other or not sure
Individual AI tools used | Text generation (e.g., creating documents, emails, summaries), Search and information retrieval, Data analysis and synthesis, Code generation and completion, Vibe coding / design
Organizational AI tools deployed | Internal search and knowledge summarization, Code generation and developer tools, Figma design, research automation
AI adoption involvement | Contributed to technical design, requirements gathering, or implementation; Led the project, strategy, or initiative (project manager, initiative owner); Provided subject matter expertise, requirements, or end-user feedback
Biggest work win with AI | Rapid prototype development in order to visualize scope of user experience and capture project vision.
Biggest disappointment with AI | Analysis tasks that seem correct but that are executed poorly and in a manner that is difficult to trust.
Organization's biggest AI success | Learning to encapsulate our institutional knowledge and expertise into automated toolsets.
Organization's biggest AI challenge | As the tools evolve we see greater opportunities but challenges in aligning our processes with variables outside of our control.

Background

P17 is Head of UX at a small design agency, with more than 25 years of professional experience. He is both a practitioner and a design strategy lead, and his perspective on AI adoption is distinct from that of most participants in this study: he's watching his clients navigate AI adoption, rather than primarily navigating it from inside a single organization. The portfolio of clients he sees gives him a wide-angle view of the distribution of approaches, from hesitancy on one end to companies declaring they are going "completely AI" by a fixed date on the other.

His "oh wow" moment with AI was Figma Make connected to Supabase. The combination let him do something he has spent years working around: build a prototype that actually persists data, generates fake records, and round-trips schema changes from the interface back to the database. The barrier that disappeared was not coding (he had AI to write the code) but database schema design, which he names as "really not my sweet spot." That single observation organizes much of his self-assessment for the rest of the session.

The agency perspective also gives him line of sight into how LLMs are restructuring his clients' customer journeys. Organic SEO is "getting brutalized," product discovery is moving off-site into LLMs, and clients are trying to figure out how marketing websites should change when discovery has already happened elsewhere. He has been advocating that clients restructure content into atomistic, recombinable units that LLMs can use without extrapolating into hallucination. He thinks the cost of doing this well may be high, but the cost of doing it poorly is higher.

Key Findings

Designing Agent-Optimized Marketing Sites

P17 articulates the clearest end-to-end picture in the study so far of how LLMs are reshaping the customer journey for B2B websites. His clients' organic SEO is "getting brutalized" because users are running product comparisons and discovery questions inside LLMs instead of inside search engines. The role of the marketing website is shifting from discovery (where am I, what do you do) to confirmation (the LLM told me X, is that true). Companies have near-zero visibility into how they are being represented inside LLMs, and the tools springing up to give them visibility (e.g., Profound, SEMrush) are guessing at LLM behavior by querying it repeatedly rather than measuring real user conversations.

"All of our clients have noted that their organic SEO is just getting brutalized... It feels like we're right back to the very beginning of Google organic search and Google AdWords, where people were trying to game or guess the algorithm. That's where we are right now with LLMs."

His prescription is to restructure content into "atomistic parts" that LLMs can recombine accurately for the very specific questions users are asking. Generic marketing pages and broad product summaries do not give the LLM enough material to handle requests like "I'm a midsize manufacturing company, I need a product that does X, Y, Z with this licensing model." Companies that do not break down their offerings into recombinable units force the LLM to extrapolate, which is where hallucinations enter.

"It's about, how do you essentially break down your solutions into atomistic parts, so that the LLM can appropriately recombine them and then talk about them in a way that minimizes hallucination and extrapolation? Because that's what people are doing [with] LLMs."

The expensive part is not the strategy. The expensive part is the execution: every knowledge-based article, every data point buried in product documentation, must be indexed, kept accurate, and date-stamped well enough for the LLM to know how stale it is. P17 notes that LLMs do not look at date stamps the way a human reader does, so a four-year-old article being pulled into a current LLM response is "a real danger." This is the Top-of-Funnel Displacement theme as a fully fleshed argument, not a one-line complaint.

AI Checking Its Own Work

P17 names a recursive failure mode that this study has not yet captured: organizations stacking AI evaluation on top of AI generation, with no human-grounded reference point anywhere in the loop. The concrete instance is a friend's company doing synthetic personas + synthetic user testing + AI-run QA in an effort to ship a wholesale-AI redesign by a fixed date. The friend acknowledges the blind-spot risk but accepts it for the theoretical efficiency.

"My friend said 'We're using synthetic personas to validate it. We're doing synthetic user testing in order to test it. We're running QA programs against that.' I'm like, 'Well, you're doing it all within an AI. Isn't there a concern of being completely disassociated from the actual reality of the user, and the fact that AI is checking its own work?'"

The failure mode is structural rather than tactical. Any blind spot baked into the synthetic personas persists through every subsequent validation step that uses those personas, because the validators inherit the same gap as the generators. P17 frames it as "one big screw-up away from a big expensive reality check," with the specific failure surfaces being security incidents, user experience regressions, and transactional failures. This is Synthetic Validation Recursion (new) and it predicts a class of post-launch failures that have plausible-looking pre-launch validation reports.
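The structural argument can be shown with a toy model. The sketch below is my own simplification, with invented issue names and an assumed blind spot: it treats each validation stage as able to detect any issue outside the personas' blind spot, and shows that stacking stages built on the same personas never shrinks the set of missed issues.

```python
# Hypothetical illustration: issue names and the blind spot are invented.
real_issues = {"checkout_timeout", "low_contrast_cta", "screen_reader_trap"}

# What the synthetic personas cannot represent (assumed for the example).
persona_blind_spot = {"screen_reader_trap"}


def validate(issues, blind_spot):
    """Return the issues a validator can detect: everything outside its blind spot."""
    return issues - blind_spot


# Three stacked AI checks, all built on the same personas.
stages = ["synthetic personas", "synthetic user testing", "AI-run QA"]
caught = set()
for _ in stages:
    caught |= validate(real_issues, persona_blind_spot)

missed_by_ai_only = real_issues - caught

# One human-grounded check with a different blind spot (here, assumed to be none).
caught_with_human = caught | validate(real_issues, set())
missed_with_human = real_issues - caught_with_human

print(missed_by_ai_only)
print(missed_with_human)
```

Adding more stages to the loop changes nothing; only a check with a different blind spot does, which is the point of P17's question to his friend.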

AI Augmentation Works When It's Bounded

P17's "oh wow" moment is a precise illustration of AI as Cognitive Prosthetic. He's been creating prototypes for 25 years, but database schema design is "really not my sweet spot," which has historically meant that any prototype with real persistence required handing the data layer to someone else. Figma Make plus Supabase changed that for him: he tells the interface what he wants the prototype to do, and the system handles the schema inference, recombines existing data shapes, and round-trips the changes back to the prototype.
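For readers who have not done this kind of work, the following sketch shows roughly what "breaking a glob into separate items" means at the database level. It is an assumption-heavy stand-in: it uses Python's built-in `sqlite3` rather than Supabase, and the table names, the five field names, and the sample record are all invented for illustration. It does not show what Figma Make actually generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# First pass: the simplest thing that works, one text "glob" per record.
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, details TEXT)")
conn.execute(
    "INSERT INTO contacts (details) VALUES (?)",
    ("Ada Lovelace|ada@example.com|Analyst|London|2026-01-15",),
)

# Later the interface needs the pieces separately, so the glob becomes five columns.
fields = ["name", "email", "role", "city", "joined"]  # hypothetical field names
conn.execute(
    "CREATE TABLE contacts_v2 (id INTEGER PRIMARY KEY, "
    + ", ".join(f"{f} TEXT" for f in fields)
    + ")"
)
for row_id, details in conn.execute("SELECT id, details FROM contacts").fetchall():
    conn.execute(
        "INSERT INTO contacts_v2 (id, name, email, role, city, joined) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (row_id, *details.split("|")),
    )

migrated = conn.execute("SELECT name, city FROM contacts_v2").fetchall()
print(migrated)
```

This is the step P17 no longer has to specify by hand: he describes the change he wants in the interface, and the tooling works out the equivalent restructuring.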

"I can get some of the functionality that I really want without necessarily having to go through and use a separate interface in order to try to describe what I need that database schema to be, which is really not my sweet spot. So the ability for the interface to infer what a database schema would be, and as I would imagine a whole bunch of other technical factors, from what I want the interface to be able to accomplish, and the functionality it has to support, was actually a big moment for me."

The framing is specific rather than general. He's not claiming AI made him faster across the board. He's naming a self-identified limitation (schema design) and a tool that targets exactly that limitation. This is the same idea as P6 and P9 using AI to compensate for cognitive weaknesses, and P16's "process brain first versus a visual person" framing. P17 also takes the next step, locating this new execution range inside his strategist role. This doesn't mean that he can now replace his team's designers or developers. It means that he can take a prototype further into execution before handing it off.

"As someone whose role is primarily a strategist, I can actually see advantages to me, because I think about it in terms of inputs and research, and putting it all together, and then I do a certain level of design, and then I pass it off to someone else. Theoretically as a good design strategist, that opens up doors for me to actually do execution that I wouldn't be able to do normally."

The Human Factors Gap: Reading Markdown vs. Seeing the Page

P17 raises an observation about how LLMs perceive interfaces that the codebook had not yet captured. When asked to evaluate a UX design, an LLM responds confidently with a checklist-style assessment ("check, check, check, check, check") that looks like a competent review. The problem, he explains, is that the LLM is not seeing the interface the way a human user sees it. It's reading markdown, structural relationships, and whatever schema sits behind the rendered page. It's not perceiving proximity, visibility, prioritization, or obscurity the way a human eye does.

"It's important to understand that human beings digest information very differently, and LLMs do a very good job of obscuring that fact."

The mechanism is distinct from hallucination. The LLM is not making things up; it's correctly reporting on what it can see, which is not what humans see. P17 has tested this directly by debating with Claude about its own perceptual model:

"I've been going back and forth with Claude about this and I've been like, 'All the ways in which you perceive an interface are very different from the ways that I perceive an interface, and there are, there seem to be, hidden obstacles in that process.' It will tell you, 'Yes, you're absolutely right. I'm looking at in terms of a markdown. I'm looking at in terms of, what if there's a schema behind it.'"

This new theme (AI–Human Perception Mismatch) names a specific risk surface for the wave of AI-driven UX evaluation: confident pass/fail judgments produced by a system whose perceptual modality does not include the human factors UX is fundamentally about. P17 notes that this is fine when the goal is to produce interfaces another machine will read (he expects agentic-to-agentic interactions to be where AI shines), but it is a structural problem for interfaces real people will use.
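A small sketch may help show why a structure-only review can pass an interface that fails for people. Everything here is hypothetical and mine rather than P17's: the `RenderedElement` shape, the two check functions, and the 800-pixel fold are assumptions. The 4.5 contrast threshold follows the WCAG AA minimum for normal text.

```python
from dataclasses import dataclass


@dataclass
class RenderedElement:
    markup: str            # what a text-only reviewer reads
    y_position: int        # pixels from the top of the page, known only after rendering
    contrast_ratio: float  # text-to-background contrast, known only after rendering


def structural_check(el):
    """Checklist-style review: is there a button labeled 'Buy' in the markup?"""
    return "<button" in el.markup and "Buy" in el.markup


def visual_check(el, fold=800, min_contrast=4.5):
    """Human-factors review: is the button above the fold and readable?"""
    return el.y_position < fold and el.contrast_ratio >= min_contrast


# Identical markup, very different rendered result (values invented).
markup = "<button>Buy</button>"
prominent = RenderedElement(markup, y_position=300, contrast_ratio=7.0)
buried = RenderedElement(markup, y_position=2400, contrast_ratio=1.8)

structural = [structural_check(prominent), structural_check(buried)]
visual = [visual_check(prominent), visual_check(buried)]
print(structural, visual)
```

Both versions pass the structural check because the markup is identical; only the check that has access to rendered position and contrast separates them.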

The Big Fears

When asked about his biggest fear, P17 names replacement directly: that clients will come to believe all of his embedded UX expertise already exists inside the LLM, so they no longer need to hire him for it. He reaches for the travel agent as the historical referent: a profession built on being a storehouse and guardian of information, destroyed when the internet made the information directly accessible.

"People whose jobs were built upon being a storehouse and guardian of knowledge, they got destroyed, like, when you talk about a travel agent and things like that. Because people didn't have access to the information, and the travel agent had the ability to piece together, hold and guard, that information, and then piece that information together for you. In some ways, that's some of what our job is."

He locates the fear inside a specific life stage: "as a 50-something-year-old guy who's been working in technology, like, is that going to be obliterated?" That single phrase is the cleanest articulation in the study so far of late-career Job Security Anxiety in AI-exposed knowledge work. The fear is not an abstract worry about the labor market; it is a concrete worry about whether his own career could end abruptly in the next 5–10 years.

The other half of this finding is the apprenticeship case he makes for the next generation. He spends multiple turns arguing that the cost of pulling AI into entry-level work will be the destruction of the developmental pipeline that produces senior practitioners who can spot edge cases and non-obvious failures.

"It's about building the next generation of people who can be the senior developers who are going to look at this and be able to do it. The same thing is going to happen with design. The same thing is going to happen with research."

The conjunction of the two halves, his own replacement fear and the apprenticeship erosion he expects across all three of his agency's professional roles, is the emotional center of the session.

Warehouse of Project Knowledge vs. Slop

P17 offers a positive case for Vibe Code Governance that participants in earlier sessions had only seen as a negative pattern (P14 and P16 both framed vibe-coded artifacts as governance problems). His agency built an internal estimating tool through vibe coding. It is useful internally, but he does not intend it ever to be external-facing, and he is clear about why: "if it fucks up, then all of a sudden, it would be a significant problem."

What made the experience useful was not the speed of execution. It was that the prototype became, in his framing, a "warehouse of project knowledge." Once functional, the artifact carried business logic, requirements, and design intent in a form that downstream team members (information designers, visual designers, developers) could pick up and refine without re-reading static specifications.

"I never viewed this as an opportunity to supplant the other people in my team or doing that process, but it allowed me to get them an artifact that was much richer and much more effective."

The negative case lives in the same session, in his concern about product managers who bypass designers and engineers entirely:

"When you have a product manager who thinks, 'I don't need designers, I don't even need engineers. I'm just going to spec it all out and feed it in here, and then I'm going to look at it not with the critical eye that really is necessary, but just from a product manager perspective, and like, it works, and then push it to production.' And then it turns out it doesn't work because there was no critical expertise."

The contrast is useful. The same tool, used inside a scaffolded process with the right people around it, becomes a productive artifact. Used as a substitute for the people, it produces work that looks finished but is not.

Emerging Themes

Theme | Description | Key Quote
Augmentation Not Replacement | Using AI to extend existing roles rather than displace teammates | "I never viewed this as an opportunity to supplant the other people in my team."
AI as Cognitive Prosthetic | Targeted use of AI to compensate for a self-identified skill gap | "Database schema design is really not my sweet spot."
Organizational AI Adoption Challenges | The full distribution of client responses, from hesitancy to wholesale "going AI" by a fixed date | "There's a lot of hesitancy there that's happening as they are trying to figure it out."
Synthetic Validation Recursion (NEW) | AI validating AI output with no human-grounded reference anywhere in the loop | "Isn't there a concern... that AI is checking its own work?"
Top-of-Funnel Displacement (NEW) | The compression and relocation of product discovery from people-focused marketing channels into agent-mediated interactions | "The role of a lot of our clients' marketing website has shifted from discovery to confirmation or validation."
AI–Human Perception Mismatch (NEW) | LLMs read markdown and structure; they do not see proximity, visibility, prioritization the way humans do | "LLMs do a very good job of obscuring that fact."
Hallucination Frustration | Confident citations that do not cover the actual evidence base | "It's going to give you a handful of those. It's not going to give you all of them."
Trust Calibration | Probabilistic-engine awareness at the model level, higher accuracy bar at the onsite-deployment level | "You have no idea where that evidence comes from. So controlling the evidence, right?"
Knowledge Displacement | Clients perceive the LLM as already containing the practitioner's expertise | "I don't need your personal expertise about it."
Job Security Anxiety | Late-career framing of replacement risk | "Is that going to be obliterated?"
Apprenticeship Erosion | Loss of the developmental pipeline that produces senior practitioners across design, development, and research | "It's about building the next generation of people who can be the senior developers."
Vibe Code Governance | Both the negative case (PM bypasses everyone) and the positive case (artifact as warehouse of project knowledge) | "When you have a product manager who thinks, 'I don't need designers, I don't even need engineers.'"

P17 contributes three new themes to the codebook: Top-of-Funnel Displacement, Synthetic Validation Recursion, and AI–Human Perception Mismatch. Each names a pattern that no prior session articulated in this form, and each predicts a specific class of failures I expect to see again in future sessions.

P17's Augmentation Not Replacement framing is locked tightly to his strategist role. He's making a specific claim that vibe-coded artifacts let a strategist take a project further into execution before the handoff, and that the value to the team is a richer artifact, not fewer people on the team.

"It wasn't so much that I was replacing our information designer or our visual designer. It was the fact that the things that I needed this interface to be able to do were very hard to communicate in terms of requirements or in terms of business logic, or anything like that. You were able to visualize that very easily."

P17's AI as Cognitive Prosthetic evidence is the cleanest instance in the dataset so far. He names a specific limitation (database schema design) and a specific technical stack (Figma Make + Supabase). This lines up with the pattern seen in P6, P9, and P16 of targeting AI at a self-identified gap.

"The ability for the interface to infer what a database schema would be, and as I would imagine a whole bunch of other technical factors, from what I want the interface to be able to accomplish, and the functionality it has to support, was actually a big moment for me."

P17's Organizational AI Adoption Challenges evidence covers a wider range than any single session so far. His agency perspective gives him exposure to clients across the cautious-to-aggressive spectrum simultaneously, including the friend's company that has declared a full-AI conversion by June 1st. He is also the first participant to itemize the operational cost of doing marketing website AI-readiness well: optimizing all knowledge-based articles, ensuring accurate date stamps, and fixing indexing.

"For our clients right now there is uncertainty, and even if they kind of think, 'Well, this is probably where it's going to go,' it's going to be expensive to get there. I mean, you're talking about optimizing all of your knowledge-based articles, you're talking about optimizing all this data that is buried down in there."

P17's Synthetic Validation Recursion contribution is the first evidence for this theme that I've found. The friend's company is the concrete instance, and P17's "one big screw-up away" framing is the prediction worth tracking.

"Any blind spots that are built into, let's say, the synthetic personas, are going to persist throughout that process... It made me think that we're one big screw-up away from having a big expensive reality check in terms of what people are outputting."

P17's Top-of-Funnel Displacement contribution is the longest sustained section of the session (roughly 00:08:54 through 00:16:00). He identifies a structural shift in B2B digital marketing that affects every client he has and that carries specific operational consequences (e.g., the need for atomistic content and agent-friendly website designs). This theme is worth watching for in future sessions with marketing-strategy-adjacent participants.

"That is a big thing for our customers, our clients, trying to understand: how does LLM fit into the overall customer journey? How is the customer journey being changed by this? Then what do they need to do in terms of optimizing their website to really do the things that LLMs can't do very well?"

P17's AI–Human Perception Mismatch contribution is the first evidence for this theme. It's unusual in this study because P17 has tested the claim directly by interrogating Claude about its own perceptual model and Claude has confirmed the gap back to him. That makes the evidence both observational and demonstrable, and the failure mode it predicts (confident AI UX assessments that miss the human factors elements that UX is fundamentally about) is one to watch for in any future session where a participant describes an AI evaluation of an interface.

"I've been going back and forth with Claude about this and I've been like, 'All the ways in which you perceive an interface are very different from the ways that I perceive an interface.'"

P17's Knowledge Displacement evidence operates at two scales simultaneously. At the individual scale, he worries that clients will perceive the LLM as already containing his embedded UX expertise. At the systemic scale, he worries about losing the ability to generate the senior practitioners who evaluate complex products, an evaluation the LLM cannot perform because it lacks the edge-case sense that comes from years of seeing things break.

"You're going to lose the ability to generate the people who are going to be necessary in order to do the overall evaluation of whether this thing is working correctly or not."

P17's Apprenticeship Erosion contribution applies the case to all three of his agency's professional tracks (design, development, research) and names the mechanism explicitly: the entry-level work is the developmental path to senior expertise, and removing the entry-level work removes the pipeline.

"I know that for a lot of entry-level jobs, and I know there's been a lot of talk about this, it's not about cost savings. It's about building the next generation of people who can be the senior developers who are going to look at this and be able to do it."

P17's Vibe Code Governance contribution is the first in the study to fully articulate the positive case for the theme. The same artifact (a vibe-coded prototype) becomes a "warehouse of project knowledge" when used inside a scaffolded process with the right people around it, and becomes a governance failure when used to bypass those people. The contrast clarifies what the theme is actually about: it's not the tool that introduces risk, but the absence of the surrounding process and expertise.

"One of the things that we found doing this sort of vibe coding that was going on, is, actually, it was still super useful to do high-level flows and functionality, because once you, if you just try to jump in and start coding it, the complexity gets to you really quickly."

Interview Transcript

00:00:00

Paul: Okay, so thanks for joining me today. The first thing I'd like to ask is, I'd like you to tell me the story of your first "oh wow" moment with AI. So what was going on that made you try AI, and what happened that made the light bulb turn on for you?

P17: I think for me, I used it for assistance and research analysis for sure. I had used it in my daily life as sort of like a super Google. But when I actually started to use Figma Make in terms of roughing out the experience of an actual thing from beginning to end, that was a change for me. There were two things about it that really jumped out. One was that Figma was the first one at that time that allowed you to actually select items within the interface and to comment and to prompt based on a particular item. That was excellent because one of the hardest things I found with using AI to do any sort of interface was describing the thing I wanted it to do and the particular thing I wanted to change, and not to change other things.

00:01:13

P17: That was a big moment where I thought, oh well, yeah, this makes sense. Now I can actually see how I can use this to strategically iterate or tactically iterate different elements of it. The other part was when I actually hooked up Figma to Supabase, and was able to make this thing really work, rather than just building prototypes for 25 years. It's always been a game of, you know, "imagine if" and "don't go down this path" and all those different things where you're sort of emulating actual functionality. Here I was able to actually make it work and to save that information not just in the session but to persist over time, which allowed me to generate a whole bunch of fake data to actually store in the Supabase. That was a big moment, but probably even more so was the idea that I didn't have to learn how to configure that database. As I was telling Figma what I wanted it to do and making changes to the interface, it would say, "Okay, hold on. I need to go back and look at that database and either create a new handler for it, or I need to

00:02:28

P17: change the way that this data is stored," because it will always assume the simplest thing. It will glom together five pieces of information into one item. But then when you say, "Well, actually, I want to use that information slightly differently," it will say, "Oh, I've got to take that big glob of information, break it up into five items. Hold on, let me go do that." And then it would make the changes in Supabase and round-trip those back to the interface. That was a sort of big moment for me, because I was like, well, I can get some of the functionality that I really want without necessarily having to go through and use a separate interface in order to try to describe what I need that database schema to be, which is really not my sweet spot. So the ability for the interface to infer what a database schema would be, and as I would imagine a whole bunch of other technical factors, from what I want the interface to be able to accomplish, and the functionality it has to support, was actually a big moment for me.

Paul: I've been asking people about how their organization is handling AI adoption, but because you're in a consulting role I want to change the question to: how are your clients handling AI adoption? I want you to do some compare and contrast and just talk to me about what you're seeing.

P17: So, I think that there's a lot of hesitancy right now, because teams are trying to figure out a couple different things. One, how should their teams be using AI, whether it's for design or for research or for a myriad other things? And the efficiency that they get out of that and/or the cost that's associated with that, how does that impact how they want to structure their team, whether they want to work with agencies or not? I think that there's a lot of hesitancy there that's happening as they are trying to figure it out.

00:04:32

P17: It's changing every day right now and there's a lot of hype around it as well. I was talking with a friend of mine who works for a company, and he said their company's going completely AI. Like, all their designers, all they're going to do is write specs. I was like, well, that makes sense in a lot of ways if you're doing additive features to an interface. It's a lot harder to spec out what an entire thing has to be. That gets really complicated really fast. That was actually one of the things I liked about Figma, and actually the thing that kind of changed a little bit. I'm a sort of a ground-up designer. I always think about what are the sort of building blocks, what is the data, what are the elements, and then build up the interface from there. Whereas I know a lot of designers really start with the first page and then they try to figure it out the other way.

00:05:27

P17: Being ground up actually works really well with AI, because what it allows you to do is think, okay, well I'm going to create the end-result page, let's say, and then I can evolve toward the warehouse and the functionality that gets the user to some version of that page. In the process of doing so, you're actually creating data for whatever you're creating. His company says, "We're doing this by like June 1st." I was like, "Wow, that's really ambitious." And he's like, "Yeah." I was like, "Are you concerned, you know, not for your job, but about the quality of the output?" He says, "Yeah, of course we're concerned." But the interesting thing was, it's like, "We're using synthetic personas to validate it. We're doing synthetic user testing in order to test it. We're running QA programs against that." I'm like, "Well, you're doing it all within an AI.

00:06:29

P17: Isn't there a concern of being completely disassociated from the actual reality of the user, and the fact that AI is checking its own work? Any blind spots that are built into, let's say, the synthetic personas, are going to persist throughout that process." He says, "It's a very real problem and we're aware of it, and we're going to try to tune it as we go through, but the theoretical cost savings, and time savings more than the cost savings, give us that 80/20 rule." I was like, "Well, good luck." It made me think that we're one big screw-up away from having a big expensive reality check in terms of what people are outputting, whether that's going to be security related, or user experience related, or transactional related in some way. So that was one perspective. In terms of what we're hearing from our clients, much more cautious in terms of how they're implementing it.

00:07:33

P17: We've been advocating for the implementation of more AI onsite for people, like their own versions of LLMs. People are becoming habituated to using LLMs for product discovery, product comparison, things like that, and they get the idea, but I think how they would prepare their organizations for that, in order for it to be effective, their benchmarks for accuracy are much higher. So if you put anything on your site that's LLM oriented and it tells someone something that's wrong, well, if that happens in ChatGPT, ChatGPT is like, "Sorry," you know, you go validate it with the client. But if you're actually on the client site and let's say I want to validate that a package or a certain combination of products will work in a certain way and I have very detailed questions and it's giving me very detailed answers and that turns out to be wrong, then the repercussions of that are much higher. I think people are starting to come around to the idea that so much of their product discovery process is happening off-site, outside their control, and they only have indirect control of that.

00:08:54

P17: Users are going and asking very detailed questions about our customers' products and trying to figure out what is the best product, what's the best version of the product. I always tell people, when people, it's a very hard process. No one likes to do it. It's no one's primary job to go and figure out what's this enterprise product that we have to implement across our organization, or even just for our department. It's always been a very hard thing that no one really likes, mostly because a lot of times getting the information has been very hard. Creating an apples-to-apples comparison is very hard, and LLMs are making that very easy. Now, whether it's doing it correctly or not, or doing it in your favor or not, is another question. But from the user standpoint, I suspect that people are asking very detailed questions that these companies don't have content to really support yet.

00:09:56

P17: They're saying things like, "Look, I'm a midsize manufacturing company. We need a product that has this function, this function, this function. I want this type of licensing model." They are able to describe their problem. That's the only thing that they come into this process with at least some clarity about. The bridge has always been about trying to map their problem to your products, and to make it even harder, to map your product to other products and to be able to make that decision-making process feel accurate and complete. LLMs are helping with that tremendously. But in order for that to work, your company needs to actually be able to come up with super solutions, or synthetic solutions, that allow it to extrapolate correctly about how your product would work for the very specific situations that they are defining using LLM.

Paul: That's an interesting challenge.

P17: And it is a very interesting time, because it's not about creating a thousand solutions. It's about, how do you essentially break down your solutions into atomistic parts, so that the LLM can appropriately recombine them and then talk about them in a way that minimizes hallucination and extrapolation? Because that's what people are doing on LLM. The role of a lot of our clients', let's say, marketing website has shifted from, or probably will continue to shift even more so from, discovery to confirmation or validation. That is really changing the role of what a website is supposed to do, and what becomes important on that. Now you have to figure out, if you assume that someone is having an experience on LLM and will eventually come to your website, then you have to figure out, what's that graceful bridge? Like, what have they been set for? What, when they actually do hit your website, what do they need to know? What do they already know? Which is hard, because LLMs are a bit of a black box right now, right? So you don't know, you can only kind of guess based on where they entered into.

00:12:11

Paul: It's sort of like SEO is necessary but not sufficient anymore.

P17: Yes. And all of our clients have noted that their organic SEO is just getting brutalized, because you're getting those Gemini answer overviews that are happening there, and a lot of people are shifting to an LLM and they're doing their discovery there. So the ability to precisely understand what people are [doing], and this is a problem, because all these companies have spent the last 10 years optimizing their SEO and they want to have the same type of inbound tracking that they've grown very accustomed to. Their ability to tune it and their ability to understand it, it feels like we're right back to the very beginning of Google organic search and Google AdWords, where people were trying to game or guess the algorithm. That's where we are right now with LLM. All the strategy around inbound LLM is basically, "How do I game it? How do I optimize for it?

00:13:23

P17: So I'm most likely to be cited, my answers are accurate, I have a large share of voice," all these different things. But at the end of the day, you have almost zero visibility into that. All the tools that are springing up right now, like Profound and Semrush and things like that, they're trying to mimic that sense of control, but it's really not, because, like, Profound is basically taking the 10 quote-unquote most likely answers, or most likely questions, that people have, and hitting those LLMs again and again and trying to say, "Well, this is what it's most likely saying." But there's no guarantee that that's actually what it's saying, or what people are engaging with. Never mind the fact that, the ability to model and understand what the structure of the entire conversation looks like, that's another thing. When you talk to your clients and they're like, "Oh well, someone says, 'Show me the top five competitor companies that do X,'" and they pop up, they're like, "Well, that's probably good enough." You're like, "Well, that's really just the first question of what is likely at least a five- to ten-point of engagement. And where does that end up? Where does that conversation end up?" That is a big thing for our customers, our clients, trying to understand: how does LLM fit into the overall customer journey? How is the customer journey being changed by this? Then what do they need to do in terms of optimizing their website to really do the things that LLMs can't do very well? When someone is coming in, they can pick up that baton. The top part of that customer journey, the product discovery part of the journey, is being compressed and happening elsewhere. But the other parts of that customer journey, which is, not the discovery, not even the high-level consideration, but the "okay, let's get into it" type of part of stuff, that is still critically important for these websites to be able to do. 
But the pathways to that, they don't have the information about what does that maturation look like? What are people ready for? How should we, like, we don't want to push that stuff too far, because there is going to be a certain portion of people who are still like, "Now, what are you all about?"

00:15:53

P17: For our clients right now there is uncertainty, and even if they kind of think, "Well, this is probably where it's going to go," it's going to be expensive to get there. I mean, you're talking about optimizing all of your knowledge base articles, you're talking about optimizing all this data that is buried down in there. You have to make sure it gets indexed, you have to make sure it's talking about it correctly, you have to make sure it's updated, because LLMs won't look at date stamps. You and I will look at a knowledge base article and be like, "Well, that's four years old. I don't know if that's really true anymore." But if an LLM just goes and pulls that in and uses that as a basis of a response, then there's a real danger there.

00:16:44

Paul: That makes me feel like there's going to be, if not a new web browsing mode that's LLM-specific, there's definitely going to be, I want to say, new ways so that LLMs can recognize that, "Oh, look, this article is five years old. I should discount this."

I want to ask you, well, let me lead this question in by saying you've described a win that you had with Figma Make plus Supabase, so that you could really make full-featured prototypes and proofs of concept. Have you had any unexpected disappointments or surprise failures trying to implement AI? And if not you, feel free to describe someone in your organization, or another organization that you know about.

P17: Yeah. We haven't used it on a lot of client-facing projects yet. The reason why was because our projects were too far along... AI is really kind of good at the beginning, I think, and once you get further in, it becomes more sort of human-driven in a lot of ways, because it's less about scale of execution and more about strategy and understanding what the client is asking for, and then we have a certain number of projects which are sufficiently unique that AI is not going to be particularly useful in that.

00:18:15

P17: That's one of the other things, AI is a very good copycat. It's very good at giving you something that you think, based on what everyone else is doing. But if the thing that you're doing is not something that really exists elsewhere or is sufficiently different, then, it can still be a very good tactical execution tool. You can give it a task saying, "Okay, I need you to come up with some ideas about that." But it's not like you can just say, "Here's a marketing website," and what it's going to do is generate that. So I don't know if we've had any real failures with it yet. I think we're still trying to figure out exactly the best ways to use it in a way that still leverages what we are good at as an agency, and promotes that difference to our clients to make ourselves still relevant and useful. I do think research is one of those, honestly. Research is one of those things that goes in and out of favor a lot.

00:19:21

P17: When times are good, everyone thinks, "Oh, we should do research." Then when times get tight, you feel like it's one of the first things that they cut, and then they start to rely upon internal stakeholder direction, and then you have all the problems where you get a myopic view of what we're supposed to be doing, and then everyone just assumes it's working well, and it turns out it's not working well. Then people go back to the research. I think trying to figure out ways to leverage the data synthesis part of it [is key]. I do think at the end of the day, AI tools, when it comes to research, are going to be very useful, but they're only going to be as useful as the quality of the data that you feed into it.

00:20:09

Paul: I would plus-one that and say, and the quality of the planning, and ability to recognize insights.

P17: Yes, and the ability to, I mean, one of the nice things about it is that we've always had this qual, right? With this, potentially there's the idea that you can scale qual a little bit more, and still make it so, instead of doing 20 sessions, you can do 50 sessions. The key is not just to have a transcript and dump it in there and be like, "Tell me what it says." The key is being able to ask very, very cogent questions in that, and then be able to do pattern analysis across it. But it's going to come down to the quality of the questions, because I've been doing a lot of comparison of, "Well, what would be the things I recommend about changing on a website versus what AI is telling me?" There is definitely some overlap for sure, but the area of overlap is that AI is good at best practices it's found someplace else.

00:21:15

P17: Those are fine, but there's a lot of things about the actual particularities of the interface or the use case that it's not quite so good at. You can try to get there by being very specific with the LLM, but it's still drawing upon a probabilistic data set of what is the most likely right answer based on the preponderance of evidence, and you have no idea where that evidence comes from. So controlling the evidence, right?

Paul: That's because it's a word prediction machine at heart. I'm plus-oneing what you're saying, and also getting on a soapbox, so I apologize, but I don't think there's a real... I'm going to regret saying this in x number of years. I don't think there's a real chance that we'll replace a researcher's, or a critical thinker's judgment about both what questions to ask, how to analyze what the content is actually saying, and what recommendations to generate.

00:22:20

Paul: I think the AI augments, but it can't replace.

P17: Yeah. And I think, the augmentation, the one thing about AI, it's very good at looking like it has really good specific and general knowledge, right? But it doesn't. Like you said, it's a probabilistic engine that is looking for what is most likely the next right word to put together. It's very good at seeming like intelligence, but really at the end of the day it's not. All it's giving you is, "This is what basically everyone else is saying." If you have tasks which are sort of wisdom-of-the-crowd-oriented, then it works really well. But if you have tasks that are not based on that, that are more individualistic or unique, then it's not very good at that. The question is, a lot of people don't understand the difference between those two things.

00:23:23

Paul: My wife, who's been in AI and conversation design for decades, says that we mistake fluency for competency and intelligence.

P17: Yeah. I think at the end of the day, we've all seen some big examples, like that McKinsey report, where they charged like $100,000 and it turned out it was all AI slop. The ability for a human being to go through and discern whether something is correct or incorrect, and then be able to go back to the data set that it's built on, when it comes to being able to trust your own evidence, your own sources, your own starting points, and being able to look at it and run your own analysis on it. You can tell an LLM, "Give me all the citations which you're drawing from," and it's going to give you a handful of those. It's not going to give you all of them. And even then, now I'm going to have to go through someone else's data and try to figure out what they actually said and whether the extrapolation was correct. That's a problem, because, now, what time is being saved here?

00:25:28

P17: I do think the really interesting thing that's happening right now, that AI, everyone who's developing AI, loves, is the idea that they can extract knowledge from you as a practitioner in a very granular way to try to encode that in some way as either a skill or some sort of MD file, and then to use that as a very specific metric against it. Now, there's still an assumption in there that you have described that correctly, and that it's interpreting that correctly, and all these different things. But I do think that there's going to be some way of getting it. Personas are a good example, right? It's like, if you say, "Here's a synthetic persona," and you create a sample persona and you say, "I want you to evaluate the website against this persona," it can do a reasonably decent job at that. Whether it's really thinking about it correctly or not, it looks like it's doing it correctly.

00:26:27

P17: Now, for me, I look at that as a challenge to say, well, the way that we traditionally think about a persona is really a synopsis of a whole bunch of different things, to make it digestible by a human, and to give it a whole bunch of shortcuts. So if we went into that and really broke that down on a much more granular basis, to essentially do like a psychographic profile of what we think that persona really is, on a level with a series of criteria that's much more extensive than we would do in a traditional persona, you know, things that we would just kind of add a blurb for and be like, "That's enough." But you could actually break that down into like 20 different criteria of decision-making and perception and things like that.

P17: I wonder if it'll be, but then at that point I'm sort of working against what everyone thinks the efficiencies of the LLM are. Now I'm sort of doing more machine learning in some way, and I'm still relying upon the LLM's ability to interpret that into an interface that it cannot see. This is the thing that most people don't realize, is that LLMs, when you say, "Hey, go look at this website," it doesn't read an interface the same way a human being does. This comes across oftentimes when you say, "Hey, how well is this page designed according to this criteria?" and it will be like, "Check, check, check, check, check." Then you go and you look at the interface and you're like, you understand that the LLM doesn't digest or read an interface with the same variability or weaknesses as a human. So the issues of prioritization and visibility and obscurity, like, "Well, it exists on the page." You're like, "Well, but does it exist in the right spot on the page in a way that someone's going to see it, or in the way that human beings associate proximity with relative relationships," all these different factors that go into how humans actually do it. Now, if you're asking an LLM to create an interface that another machine can read, yeah, it's going to kick ass doing that. That's why I think everyone is very excited about agentic, where agents are going to be talking to agents, and humans are going to be taken out of that, except as a completely confirmation type of role. But in terms of interfaces that people use, it's important to understand that human beings digest information very differently, and LLMs do a very good job of obscuring that fact. When you ask them, you and I, I've been going back and forth with Claude about this and I've been like, "All the ways in which you perceive an interface are very different from the ways that I perceive an interface, and there are, there seem to be, hidden obstacles in that process." 
It will tell you, "Yes, you're absolutely right. I'm looking at [it] in terms of a markdown. I'm looking at [it] in terms of, what if there's a schema behind it." All these things that are completely invisible to the user, and I'm making value judgments based on that. Now, from a research standpoint, does that hurt it when it comes to a design standpoint? Probably not so much. But again, what it's going to do is, it's going to look at examples out there that it thinks, "Hey, everyone's doing it this way. This is the right answer."

Paul: And that's an appropriate answer sometimes and in some contexts, but not all.

P17: Yes, not all.

Paul: Yeah. I want to ask you, because I want to respect our time box and we're just coming up, we're a little over. So I want to understand how all of this makes you feel.

00:29:32

Paul: I want to get at that by asking you, what's your biggest fear about this brave new world? And then, what's your biggest hope, in your most optimistic mood, for AI?

P17: So, I think the biggest fear is replacement, or the idea that what I do is no longer [valued] and the knowledge I hold and the experience I have is not nearly as valuable anymore. I look at what happened when the internet first came along.

People whose jobs were built upon being a storehouse and guardian of knowledge, they got destroyed, like, when you talk about a travel agent and things like that. Because people didn't have access to the information, and the travel agent had the ability to piece together, hold and guard, that information, and then piece that information together for you. In some ways, that's some of what our job is, when we talk about UX, or when we talk about design. We have an embedded expertise about certain things.

00:30:35

P17: So my concern is that, whether it's true or not, the perception of customers or clients would be that all that just exists in the LLM already. So, "I don't need your personal expertise about it." I don't know where that's going to shake out, I honestly don't.

So that is a concern that makes me, as a 50-something-year-old guy who's been working in technology, like, is that going to be obliterated?

On the other side of it, having worked through some prototypes and things like that, I definitely do see the advantages of speed of execution. As someone whose role is primarily a strategist, I can actually see advantages to me, because I think about it in terms of inputs and research, and putting it all together, and then I do a certain level of design, and then I pass it off to someone else.

00:31:45

P17: Theoretically as a good design strategist, that opens up doors for me to actually do execution that I wouldn't be able to do normally, like being able to connect a prototype to Supabase, which three years ago we could have, but it would have been a lot of diving deep into the... A lot of back and forth, and involving a lot more people. Whereas now I can take the concept much further into that process. It's interesting because we've been having conversations like, we're back to the whole sort of unicorn thing, right? Every time a new technology comes out, everyone's like, "Oh well, it's just going to take one guy, and my company's looking for the person who can do it from stakeholder interviews, through research, through design, all the way to coding, and then just hand it off to like a DevOps person who's just going to make it work." That can work, theoretically, but I think the other side of the coin, that people tend to not really appreciate, is that a large part of the process that we do is not just executing a design but socializing designs within organizations for the purpose of decision-making and collaboration. That process, as much as you automate it, or make it AI assisted, it still requires a certain skill set that is more human-based than anything else. A big part of my job is, working with the client, "Okay, so what are we trying to achieve out of this?" If we do some research, being able to say what are the most important or germane parts of that research, and how do we flow that into design? And then, "Okay, here's some overall concepts."

00:34:03

P17: With the project that we did, one of the things that we found doing this sort of vibe coding that was going on, is, actually, it was still super useful to do high-level flows and functionality, because once you, if you just try to jump in and start coding it, the complexity gets to you really quickly. You can recover from it, sure. It's not like you're going to go, "Oh my God, I can't do it." But it makes a lot more sense to think through the entire experience of what you're trying to do for it, and then to use AI to execute either on a screen level or on a functionality level.

Paul: I will yes-and that, and say that also jumping straight into prototyping always gives you tunnel vision and keeps you from the top of the discovery funnel, define the problem, define the solution space.

P17: Yes. The LLMs suffer from the same problem on a coding level, is that it starts to iterate out code based on assumptions that were created earlier in the process, before you actually were able to describe the overall experience. So now it's working backwards and trying to figure it and fix it... Yes, you can go back to Claude Code and like, "Okay, so now this thing is done. I want you to go through and I want you to optimize all the code, I want you to optimize the database, I want you to look at it for security holes." It can do that, but the question is, is your prototype full enough now? It's gone down these, you know, it's connected 17 different rabbit holes. Can it still step back and say, "Okay, this is how we should think about this holistically, not just from a stylesheet level or anything like that, but from a sort of overall functional level?"

P17: For me, when I look at it and what is my potential role in this, like, if I was a coder, I'd be terrified, right? Because I'm not a person who can look at that and be able to verify where their code is. But I know that for a lot of entry-level jobs, and I know there's been a lot of talk about this, it's not about cost savings. It's about building the next generation of people who can be the senior developers who are going to look at this and be able to do it. The same thing is going to happen with design. The same thing is going to happen with research. People are going to have very short-term thinking about it. You're going to lose the ability to generate the people who are going to be necessary in order to do the overall evaluation of whether this thing is working correctly or not.

Paul: This has been coming up with everyone I've been talking to.

P17: I think it's true, and I think it is a real concern. Maybe not for you and I, but for, you know, 10 years from now, where people are just going to trust that it works, and that they're going to use the LLM to verify that it works.

00:37:08

P17: At the end of the day, my biggest concern, my biggest fear, is, when you build a very complex interface, is that you completely miss edge cases, right? And not, like, extreme edge cases, but just non-obvious use cases. That causes a really big problem in the interface. It's not necessarily a bug, but a certain way of a person using the interface is not going to be supported, or on a certain path they take, they're going to dead-end, or they're going to lose their data. They're going to do all these different things. That is really one of my concerns about how this thing is, when you start to, when you have a product manager who thinks, "I don't need designers, I don't even need engineers. I'm just going to spec it all out and feed it in here, and then I'm going to look at it not with the critical eye that really is necessary, but just from a product manager perspective, and like, it works, and then push it to production." And then it turns out it doesn't work because there was no critical expertise in terms of understanding exactly how things work, or what that process should look like.

00:38:15

Paul: That brings up a huge question that I know everyone's kicking around, and I want to get some of this on the recording. So then we'll break in like three or four minutes. The job descriptions as we know them are collapsing. So you just described a product manager who says, "What do I need a designer for?" But you also implied, "What do I need a developer for?" because you said they push it to production.

P17: Right.

Paul: To me, this is a natural evolution. There used to be a job called systems engineer, and that was my first role at Lucent. Now, maybe you'll see the job systems engineer, which was basically a speccer, you know, you were responsible for the big honkin' spec. And now there

P17: Right, which no one ever read.

Paul: Right, of course. Now the LLM's committing [code]. So I don't know where I'm going with this question. It's not really a question, but I want to get your perspective on this collapse of the span of control, the collapse of job descriptions because the span of control now is so much wider.

00:39:27

P17: I think what you'll start to see is probably more overlap that will happen for a while, and then it will do what it's always done. It will start to digress back out to people who perhaps have overlapping skills. Like, you might have a researcher who's also a good UX strategist, or vice versa, or a UX strategist who can also do good information design, maybe not visual design, etc. Instead of being distinct roles, you're going to see some, like, I always think of the roles as a continuum, right? You have a business analyst, then you have a researcher, then you have a UX strategist, and somewhere in there you usually have a product manager, or yeah, project manager. You have a good interaction design, information design person. You have a good visual design. You have a good coder.

00:40:29

P17: People have always made themselves more valuable by understanding what comes before them and what comes after them, even if they don't do it per se. So the gray area between their area of expertise and other people's expertise. We have a good visual designer who's also an excellent animator who also understands code, right? He doesn't write code. We never ask him to write code. But he can anticipate what is going to happen with his designs on a code level. That makes it so much easier. So those gray areas are probably going to expand a bit more. What you're going to see in this is people who can do a little bit more and push the project further down. A good example is this project that we did, where it was an internal tool for estimating projects on a very granular basis based on historical data, and the ability to basically go through and say, "Okay, we're going to do this, we're going to do this, we're going to do this, we're going to do this," and have it generate both the schedule as well as the cost estimate, and then hours estimate, because we're an agency, so everything is based on hours. As we were doing it, I never thought, "Well, we're going to push this live," right? We might use it internally, or put it in front of our clients or anything like that, but we might use it internally, where, but our problem, we have a lot of the same problems everyone else has, that, you know, if it fucks up, then all of a sudden, it would be a significant problem.

00:41:51

P17: But when I was doing it, as someone who was very excited that I could actually put together a live prototype, it wasn't so much that I was replacing our information designer or our visual designer. It was the fact that the things that I needed this interface to be able to do were very hard to communicate in terms of requirements or in terms of business logic, or anything like that. You were able to visualize that very easily, and to make this artifact something that incorporated that information in a way that was much easier to understand. So you don't have the risk of a designer not reading a note about how something should work, because it actually works that way. One of the things that we talked about was, push, as we're doing all these different things, that becomes the artifact that we push through the pipeline. We can do a very high-level diagram of what it looks like, and then we can visualize that in a functional prototype in one way or another. But then the information designer could go in and actually change all the screens, and even the functionality there, but still understand what it's supposed to do. And then we pass it off to visual designer. For me, this prototype as being a warehouse of project knowledge was excellent. We toyed with the idea of annotating it, just having it so, like, as a rule, it's like throwing up the business logic so that anyone can see it as it's happening, and to bring that all back to the idea is that, I never viewed this as an opportunity to supplant the other people in my team or doing that process, but it allowed me to get them an artifact that was much richer and much more effective, and potentially to share that with our clients. Because clients notoriously have a very hard time, you know, you can show them 10 different screens and walk them through how they're all going to work together, but they're not professionals. They don't see it that way. 
So there's always a risk there, that their assumptions are coming into play, or their lack of ability to understand what you're actually telling them, and what they're agreeing to. So doing that in that way, I did view it as a way of saying, it's a way for me to extend, not my skill set per se, but to visualize what I do in a way that is more effective. I don't know whether everyone else is going to see it that way, or even whether a client will see it that way, but I view it that way. I am not a visual designer. I'm a pretty good information designer. But my value to our projects is not that. My value is being able to look at the whole thing and say, "Is it really doing what we need it to do? Here's an area, or here's a way that we might approach that type of problem," and letting other people solve those problems.

Paul: Yeah. That's a good place to stop the recording.

AI Use Disclosure

I used AI to analyze the data collected via interviews and surveys. How?

  • I took notes after each session.
  • I fed those notes to several AIs, along with the moderator guide, project proposal, session transcript, the participant's survey responses, and a codebook of tags and themes I've been iterating as I collect data.
  • I prompted each to write a background, findings, and emerging themes section.
  • Then I iterated on each AI's draft, challenging the AI where appropriate and removing what I'm euphemistically calling "hallucinatory content" :-).
  • I collected each AI's drafts, added them to the project I've set up in Claude Cowork, and prompted it to draft the background, findings, and emerging themes section, pushing back as appropriate.
  • Then I edited the content, because "human in the loop" means "I have final edit." At least to me it does.
  • I then published each session writeup.

There's a bit more to it, but I'm trying to keep this short. Reach out if you want to talk about my AI-assisted workflow, which I'm still evolving as I go.