Paul Sherman
April 15, 2026

Mandated Enthusiasm: Bonuses, Bubbles, and the View from the Grid

P4 - Senior UX Researcher, Software

A senior UX researcher on one-week sprint cycles describes an organization simultaneously investing in custom AI research tools and mandating AI usage through performance bonuses, while fabricated pain points flow unchecked through vibe-coded PRDs and data centers consume farmland in their community.

My bonus, my performance, is attached to how much I use AI at work. So I have to [use it]... if I don't I might not get my bonus.

P4: Survey Data and Session Summary

Survey Responses

Question: Response

Age: 25-34
Education: Master's degree
Role / Level: Individual contributor
Job title: Senior UX Researcher
Years of experience: 8-15 years
Organization description: Make an agentic AI tax return software for accountants/CPA firms to automate their clients tax returns (Tax industry, but software company)
Industry: Other or not sure
Individual AI tools used: Text generation (creating documents, emails, summaries), Media creation (images, audio, video), Search and information retrieval, Data analysis and synthesis, Workflow automation and process automation, AI prototyping/vibe coding
Organizational AI tools: Internal search and knowledge summarization, Code generation and developer tools
AI adoption involvement: No direct involvement in adoption or deployment (mostly a user of a deployed AI system)
Biggest work win with AI: I am on a one week research sprint for my new product (Ready to Review). We went from PowerPoint to MVP in under 10 months and the field of tax is complex with each CPA being mostly mistrusting of AI and is extremely skeptical of AI agents. Accuracy is important. With our constant feedback, we were able to launch features that even convinced detractors. However, since our product has a lot of friction on the front-end, it is been difficult to get adoption (migrating clients over; supporting enough source documents). Because of this short timeline, I have to host 5-7 customer calls every week and get the report done in a day. I clean up the transcripts with a Transcript cleaner that our doctor UX researcher on our team created and then throw those into a chain in Claude to look for themes and make a draft report. I generally have a skeleton of the top themes, but use this draft supplementally. I then write my report and have Claude edit to be concise. Also, this one week research sprint put a lot of stress on design (who had to have designs ready each week), so v0 and FigmaMake have been a gamechanger for us.
Biggest work disappointment with AI: I have one tool that our Data Analyst built for us that is a Vector Search + Reranking based on Claude that analyzes all sales and pre-demo calls uploaded to Gong (what our sales team hosts their calls). Me and my product manager can set filters and ask any questions. Although it is built on Claude, it has hallucinated, so I spend more time having to fact-check, which you guessed it, my product manager does not do. Also, when prompting it, you get different results each time. It will not quantify how it ranks it and comes up with the top ten themes, but does not feel like it is actually holistically looking across all calls. It is better than nothing, but it is not ideal or accurate.
Organization's biggest AI success: [organization] has actually been quite thoughtful and I have been impressed with how they have invested in tools. One of the doctors (super smart dude, Yeti Li) has been reserved to only make tools for us fellow researchers to be faster in our jobs for this year. Happy to show you the roadmap of what we will get built for us. Q1 is the transcript screener and Q2 will be a Notetaker that integrates with Claude workspaces with the quotes on sticky notes.
Organization's biggest AI challenge: In a big corporation, everyone is scared that we are utilizing a tool that will replace us. They cut a significant portion of product and rehired the bottom of the barrel devs in India (who are not as competent as the former US employees and the India devs lost so much context/internal knowledge that never got passed over since they cut whole departments). TR announced they will want 50 percent of all code written to be done agentically. However, devs are frustrated because they rather just write it perfectly the first time and KNOW where an error is then have to find it later. Also, it is hard to run tests to explain why it failed - we just implemented a new DataDog but for AI tool, but it is confusing. Because this large corporation has very specialized roles, folks who are UX Content Designers or UX Accessibility Designers get very nervous as they build tools in Cursor that help automate their processes that they will get cut in 1-3 years.

Background

P4 is a senior UX researcher at a large software maker of business productivity tools, working on an agentic AI product that automates tax return preparation for accountants and CPA firms. At the time of the interview, they were operating on one-week research sprint cycles: designs finalized on Friday, customer sessions on Tuesday and Wednesday, analysis on Thursday, stakeholder debrief on Friday morning. The pace is intense enough that P4 described it as making you "want to really pant."

The organization has made substantial investments in researcher-specific AI tooling. A PhD researcher on the team has been reassigned full-time to build custom tools for the research function, including a transcript cleaner (delivered in Q1) and an upcoming notetaker that integrates with Claude workspaces (planned for Q2). P4 also has access to an internal platform that provides Claude-based prompt chains and an LLM marketplace, and the organization recently issued Claude licenses to all employees. The tooling investment is real, but so is the pressure: the organization has set an OKR requiring researchers to document 10 hours per week of time savings from AI tools by end of year, and P4's performance bonus is tied to AI usage metrics.

The session was the longest in the study at over an hour and included screen-sharing of P4's internal tools and workflows, making it the most operationally detailed interview so far. It was also the most politically candid, with P4 describing data center construction in their community, grid stress their spouse witnesses as a lineman, and a generational anxiety about whether the AI investment bubble will collapse.

Key Findings

The Role-Based Trust Gradient

P4 articulated a trust continuum that runs across professional roles within the organization. Researchers occupy the high-verification end, performing a second pass on all AI output. Developers fall roughly in the middle. Product managers and marketers occupy the low-verification end, tending to accept AI output without substantive review.

This pattern manifested concretely in P4's description of their product manager using the customer insights explorer. P4 observed that the PM would "spin anything to it being okay" and, when asked whether she had verified a source, responded "I don't have time to check it." The trust gradient is not a matter of individual personality but of role-based accountability norms: researchers are trained to verify, and their professional identity depends on the accuracy of their analysis. Product managers face different incentive structures.

"If everybody was on a scale of who it is, research is more on I'm going to do my second pass. We're probably on the highest end and then product managers are way over here."

Hallucination Across the Stack

P4 encountered fabricated AI output in three distinct contexts: a customer insights explorer that hallucinated an entire quote, vibe-coded PRDs containing "hallucinated pain points" disconnected from user evidence, and an LLM that silently skipped input documents without disclosing the gap. Each failure mode is different (fabrication, confabulation, silent omission), but the downstream effect is the same: unreliable output flowing into decisions.
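Of the three failure modes, silent omission is the easiest to guard against mechanically. A minimal sketch, assuming a prompt that instructs the model to tag every claim with a `[doc: ...]` citation (the citation format and file names here are hypothetical, not P4's actual tooling): diff the set of documents supplied against the set the output actually cites.

```python
# Minimal sketch: detect when an LLM output silently omits source documents.
# Assumes the prompt instructs the model to cite sources as [doc: <name>];
# the citation convention and file names below are invented for illustration.
import re

def cited_documents(llm_output: str) -> set[str]:
    """Extract document names from [doc: ...] citation markers."""
    return set(re.findall(r"\[doc:\s*([^\]]+)\]", llm_output))

def coverage_report(supplied: set[str], llm_output: str) -> dict:
    cited = cited_documents(llm_output)
    return {
        "supplied": len(supplied),
        "cited": len(cited & supplied),
        "missing": sorted(supplied - cited),  # never referenced at all
    }

# Seven transcripts attached, as in the study P4 mentions...
supplied = {f"transcript_{i}.txt" for i in range(1, 8)}
output = ("Theme 1: onboarding friction [doc: transcript_2.txt] "
          "[doc: transcript_5.txt]")
report = coverage_report(supplied, output)
print(report["missing"])  # the five transcripts the model never cited
```

The check cannot prove the model *read* a cited document, but an empty `missing` list at least rules out the "attached seven, referenced five" gap going unnoticed.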

The PRD hallucination is particularly consequential. Product managers use AI to generate requirements documents that include sections for "secondary research and customer research," but the cited pain points are invented. Because the documents look polished, they pass review without scrutiny. P4's frustration centers on the gap between appearance and substance.

"A lot of them now are making them look really cool and have vibe coding but I don't think they ever go back in and add anything just whatever they prompted and told our customers and there's parts where it says supposed to have secondary research and customer research and it's just making up pain points in there and so everything looks put together and there's a lot of words on a page but nobody's still going in for that second layer."
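P4's complaint that the insights explorer never cites more than about ten sources, however many calls match, is consistent with how retrieve-then-rerank pipelines are typically built: a fixed top-k cutoff, not the corpus, bounds what can ever surface. A hypothetical sketch under that assumption (all data, names, and scoring invented; P4's actual tool is not documented here):

```python
# Hypothetical sketch of a retrieve-then-rerank pipeline of the kind P4's
# insights explorer appears to be. Everything here is invented for
# illustration. The point: a fixed top_k cap, not the number of relevant
# calls, determines how many sources the final answer can cite.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve(query_vec, call_vecs, top_k=50):
    # First stage: rank every call by embedding similarity to the query.
    scored = sorted(call_vecs.items(),
                    key=lambda kv: dot(query_vec, kv[1]), reverse=True)
    return [name for name, _ in scored[:top_k]]

def rerank(candidates, top_k=10):
    # Second stage: a (stubbed) reranker keeps only top_k candidates,
    # so at most top_k sources can ever be cited downstream.
    return candidates[:top_k]

# 849 calls match the filters, as in P4's session...
calls = {f"call_{i}": [i / 849, 1 - i / 849] for i in range(849)}
cited = rerank(retrieve([1.0, 0.0], calls))
print(len(cited))  # 10
```

If this is roughly the architecture, P4's intuition is right: the tool is not "holistically looking across all calls" when it answers; it is summarizing whatever survives two hard cutoffs.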

Tool Proliferation Without Absorption

The organization is deploying AI tools at a pace that exceeds its capacity to train employees on them. P4 listed their internal AI platform, Claude licenses, Cursor, Figma Make, Linear, and v0 as tools that have been thrown at various teams. The research team is relatively well-supported, with structured training sessions, pilot studies, and a job-to-be-done mapping exercise that identified where AI could reduce pain points. Designers, by contrast, are "literally just being told by their manager like, 'Here's Figma. Go play with it.'"

The churn compounds the problem. P4 described cycling through "Rive mania to Figma Make mania and now we're on Cursor mania," each wave requiring new learning before the previous tool has been fully integrated. The gap here is not between official and shadow IT (as in P3's organization), but between the volume of tools deployed and the organizational capacity to absorb them meaningfully.

"We already went from Rive mania to Figma Make mania and now we're on like Cursor mania. It seems like there is always a new tool and you have to almost use all of them to feel comfortable."

Mandated Adoption and Perverse Incentives

P4's organization has tied AI usage to measurable performance targets. The OKR calls for 10 hours per week of documented time savings. P4's individual performance bonus is attached to how much they use AI at work. Before P4 had figured out how to integrate AI meaningfully into their workflow, they were generating unnecessary queries (asking for grocery lists, "other dumb stuff") to hit the usage target, while simultaneously feeling guilty about the environmental cost of each query.

This is structurally distinct from expectation escalation. The organization is not just expecting more output; it is measuring and rewarding AI usage itself as a KPI, regardless of whether that usage produces value. The 50% agentic code mandate for developers creates a parallel dynamic: developers report that AI-generated code requires more cleanup time than writing it from scratch, but the mandate persists.

"My bonus, my performance, is attached to how much I use AI at work. So I have to [use it]... if I don't I might not get my bonus. So at first before I was really figuring out how to do it in my workflow. I was just asking it for my grocery list and other dumb stuff and I felt bad because everybody tells you one search is dumping out a water bottle and I'm like oh no I have to do so many searches a day or else I don't get my bonus."

The View from the Grid

The session took an unusually personal turn when P4 described the physical infrastructure of AI in their community. Two data centers are under construction near their home, consuming farmland and straining water resources. P4's spouse works as a lineman for the local electrical company and sees the grid stress firsthand. A friend who poured concrete for one of the data centers told P4 that the builders are already designing the facility to convert into a warehouse if the market shifts.

P4 described a split between their professional persona ("wear a mask on LinkedIn and like I love AI") and their personal convictions about the environmental and social costs. Their hope for the future is that "it pops a little bit," not that the underlying technology disappears, but that the hype and bloatedness deflate. They compared the current moment to the dot-com bust but noted it feels "more apocalyptic" because of the environmental impact, workforce displacement, and the reach of misinformation through AI-generated content.

"Our friend who does the concrete for the [nearby city] data center that there was a big push to get that closed, but there's just not very many laws to protect the rights of what people want. They're already building it in a way that they're like, 'Well, we can turn this into a warehouse or maybe this would just be an Amazon warehouse afterwards.' So they're already kind of predicting like the people that are building it are already like this bubble might pop."

Emerging Themes

Theme: Description. Key quote.

Trust Calibration: Deliberate, ongoing practices for evaluating AI trustworthiness on a spectrum. Key quote: "Research is more on I'm going to do my second pass. We're probably on the highest end and then product managers are way over here."
Hallucination Frustration: Disappointment at AI confidently producing fabricated content. Key quote: "Right off the bat, the first time I used this like a month ago, it hallucinated a whole quote."
Expectation Escalation: AI enabling faster delivery while simultaneously raising stakeholder expectations. Key quote: "I think they want us to be like down 10 hours of work a week with these tools by the end of the year."
Corporate Tooling Gap: Mismatch between tools deployed and organizational capacity to absorb them. Key quote: "They are throwing every single tool our way. And I feel bad for our designers because they have even more."
Job Security Anxiety: Fear that AI will reduce headcount, prompting career anxiety. Key quote: "Are they still going to have me in three years or will there just be less of us?"
Apprenticeship Erosion: Concern that skipping the "second layer" of review eliminates what juniors would apprentice into. Key quote: "Everybody's like, 'Looks cool.' And then I'm like, 'No, but read it.' Does any of this [make sense]?"
Infrastructure Anxiety: Concern about AI's physical and environmental costs, grounded in direct personal proximity. Key quote: "We're just seeing all these horror stories of people running out of water and we know they're coming for the Midwest because of our water."
Organizational AI Adoption Challenges: Organizations struggling to find an effective path forward with AI, from arbitrary code targets to bonuses tied to usage metrics. Key quote: "They're saying, 'Oh, we want 50% of code to be written by AI.'... it just would have been faster."

Interview Transcript

00:06:14

Paul: What was the first AI tool you remember trying and what were you hoping it would do for you?

00:06:14

P4: Yeah, that part's always tricky because I feel like there's been a lot of features that have like technically been AI that we want to categorize as AI. Especially when I was at my former company, [former company], and it seemed like any type of marketing or like SEO buzz, you almost had to say something to say, I don't like get it in like other like hits and other blog posts. I would take it. So I would say technically like the OCR capabilities were like AI that was in some of our Snagit products for like grabbing the text from a screen capture image and the video editing audio portion.

00:07:33

P4: So, we had this tool called Audiate, which like integrates with Camtasia, which now they're just like smashing those two things together. So, you might not even really find the word Audiate, but essentially Audiate was a tool where it's kind of like the video editing to script platform where you're able to edit just the words out of your videos, like all the ums and ahs, like instantly. So it kind of builds on, like a lot of people forget that like a whole bunch of like transcripts and auto captions, that's still like a lot powered by AI. Or at least like one form of it. So, those are probably like the first tools. And I would say it's only been a muscle that I've been flexing recently. Due to like the speed of I'm on one week sprint cycles here at a large software publisher on my product, which makes you want to like really pant. So I like have to use AI to like at least help me get drafts or like clean up a report. And I just lean on Claude. I was using like RAG chains with ChatGPT, but I like the way Claude like words things better.

I'm on one week sprint cycles here at [organization] on my product, which makes you want to really pant. So I have to use AI to at least help me get drafts or clean up a report.

00:08:32

P4: I'm pure research thankfully. They're giving me tools to vibe code. But because I'm on one week's sprint cycles, I already told him like, "Guys, if I am making the helping you make the prototype and still having to like turn around this test, no." So, I'm like put my foot down there, but I'm seeing co-workers who are vibe coding like CXO dashboards and things like that. So, I know I could use vibe coding to make like fun dashboards or like artifacts with my research, but I haven't haven't touched that yet.

00:08:32

Paul: Okay. What do you think has been your biggest win, success, or efficiency gain from using AI tools in your work?

00:09:45

P4: Definitely using it to help me keep up with this intense pace. so for like well first like just AI in general like the product I work on is fully agentic AI. So that's been cool to like try to like see my company hold a high standard for like accuracy cuz we work in tax which you know it's not good enough for it to be 96% accurate because like one thing can ripple effect across a whole messy complicated tax return. So that's been good that like everyone here is very much like trust and verify keep the human in the loop. I feel like there are a lot of products that are building without that thought behind it. They're like, "Let's just ship fast and break stuff, but ours is like, no, no, this has to be nearly I know we can't be perfect, but we're trying to be nearly perfect." so that's just like a win for the culture on my team. And then the second thing is definitely I'll share my screen because it's always like easy for me.

00:10:53

P4: Let me pull up. Actually, I probably have to log into our Zscaler. I have some of those things open as like examples that I had in my free screener.

00:10:53

Paul: Great.

00:10:53

P4: so the first one is just like using like a classic chain, which we have this thing called [internal AI platform], which is like a large software publisher approved tools. they just literally just got everyone like last week Claude licenses which we have like all these tools in [internal AI platform] that we could like choose from. So I am kind of confused.

00:10:53

Paul: It sounds like there's a proliferation of tools that are sanctioned.

00:10:53

P4: They are throwing every single tool our way. And I feel bad for our designers because they have even more. like for example linear and then there's cursor and then we have all the figma makes all like it's oh let me allow what's to I'm going to revisit and try to share screen entire screen.

They are throwing every single tool our way. And I feel bad for our designers because they have even more. Like for example Linear and then there's Cursor and then we have all the Figma Make.

00:12:15

P4: So this kind of like we have like you know all of these options that you can hop into. So, like for quick things, I'll just like always like hop into Claude and like ask it, you know, my random questions and upload documents. And then we have these like you can like have these prompt builders that we all have. I haven't played around with our marketplace, but essentially if I can do I have mine favorite. Okay, I don't know why it says all chains, but if I can go in and then find and transcripts. I might put it in the description, but essentially that's what I was doing. It'll take a second to load.

00:12:15

Paul: What is this?

00:13:25

P4: Yeah. Oops. Yeah. So, this is like we got to take a class on it and I don't really like to fully remember how it works, but you essentially have your like saved system prompt and you with these chains. I guess the benefit of running it through one of these chains compared to going back to one of these [internal AI platform] experiences and tossing it into there is that you can do more transcripts at a time and then like switch the LLM on the back end. So I could be like, oh, what does Claude say? What does ChatGPT say? I think that's what it means. One of the other things I wanted to share with you is, kind of like how we're like working through this stuff. So, we've one thing that has been a pain point for other users not necessarily me cuz I think it's been going okay is the transcript cleanup, especially if you're tossing like 27 transcripts at a time.

00:14:39

P4: We just have one of we have a really smart guy, he's a researcher on our team, but he got kind of moved on this team that is just creating AI tools for us researchers. So he's still a researcher like us. But that's another thing that I think a large software publisher is totally making that investment in us of like we're going to pull people from your team just to make custom things for the team, which is great because I'm on one-week research sprints. I don't want to waste time experimenting. I just want it to work.

P4: And I want to still like practice with the transcript cleaner, but I don't Yeah, I don't have time to teach myself these things, especially as we already went from like Rive mania to Figma Make mania and now we're on like Cursor mania. It seems like there is always a new tool and you have to almost use all of them to like feel comfortable.

We already went from Rive mania to Figma Make mania and now we're on like Cursor mania. It seems like there is always a new tool and you have to almost use all of them to feel comfortable.

00:15:38

Paul: Is that person that you mentioned, is this someone who's not assigned to a product team like you are, but is more in a centralized, center of excellence or corporate function?

00:15:38

P4: He was on kind of like our like the future discovery work of our products. And I think he'll still get pulled into those if there's like they need the bandwidth. But like right now he's attached to only trying to like help us hit our OKRs of like reducing work for us other researchers. So he got kind of pulled into like yeah he's always been kind of on like the forefront type of stuff which was for our products and he still will like come in time to time if there's like a future of audit project he'll come in and like kind of be a lead on it but he won't actually have to like facilitate all the sessions.

00:15:38

Paul: How did this come about? I'm looking at your team's SharePoint right now and I see that you've got a research team page and it's labeled AI plus UXR and then there's samples and you also showed me some tools and workflow chains.

00:16:37

P4: Which this is so great.

00:16:37

Paul: Is this something that was top down driven, more bottom up, or a little bit of both?

00:16:37

P4: I think it was like [person] being good at this stuff and then maybe not having as many projects on his plate and then from the top up of them being like we're going to set our OKRs and we know our head of leadership like all the way to the CEO is like we want we're we're taking bets on this.

00:17:35

P4: we're going to invest in this heavily, but we want you guys to start measuring that in your OKRs. So for example, I think they want us to be like down 10 hours of work a week with these tools by the end of the year. So then they were like

I think they want us to be like down 10 hours of work a week with these tools by the end of the year.

00:17:35

Paul: So explicitly an efficiency gain. When you say down 10 hours, they're saying we want you to document having saved 10 hours of work.

00:17:35

P4: Yeah. Yeah. With these type of tools, which I was like, okay, 10 hours. I'm not. And again, it's all like perceived time, right? And we all want to hit our OKRs. So, I feel a little weird about how they're keeping track of it. But I like that at least for us, thanks to having like Yeti have the time and the bandwidth to do this. Like my designers are not getting something as thorough as this.

00:18:26

P4: Like we've had like training sessions as a group. We've he's done it like little mini pilot sessions before we even get to the training sessions as a group. So, we're like super lucky because my designers are literally just being told by their manager like, "Here's Figma. Go play with it." and they always like talk about like how cool it looks, but I don't think their training gets as like practical or for example, I don't think theirs gets as attached to like a use case as much. So for example, like this job map is awesome because of course the researchers are going to like attach everything to a use case. So we they had like surveys of like where our pain points were. I know this image is kind of small but essentially like you can see like where our core jobs to be done and like where we wanted like AI to like intervene which actually is the hyperlink. No, it's not clicking up to that. oh actually it's here.

My designers are not getting something as thorough as this. We've done little mini pilot sessions before we even get to the training sessions as a group. So we're super lucky because my designers are literally just being told by their manager like, "Here's Figma. Go play with it."

00:19:34

P4: Yeah. And I yeah this so they're really like thinking through that thought process which has made they were just they were trying to convince all of us that unfortunately when you are throwing things into like any type of even like Claude, like if you attach seven, we have they had were doing plenty of studies where it's not actually reading all seven. It might have only referenced five. So that's been an issue for us is like trusting to be like, okay, did it actually analyze all the calls? Which leads me into one of the other tools that our data analysts made, which is this one, which is like you, so you'll see this customer insights explorer. And this is probably like the worst, well, not the worst, but it's like the least reliable tool I use. So I'm assuming it's like a RAG chain, vector search plus reranking, that's like based on Claude.

They were doing plenty of studies where it's not actually reading all seven. It might have only referenced five. So that's been an issue for us is trusting to be like, okay, did it actually analyze all the calls?

00:20:43

P4: So again I don't know why it's not as reliable but this is searching I can pump it up and like rank I can do like final results after. So if I just want like the top 10 themes and then I can like set filters and I would do like all customer calls but just make sure they're external. So this would include like customer experience like those customer success managers and then this would be sales. So if I want customer success and sales that's what that filter means. and then I would just do it for my product which is called R2R. And then I could ask it like okay like hey say but I say what are the top pain points? and usually I this is what my PM does. She has a very short prompt. I'm always like, don't be verbose. Make sure to add like I have like a copy and paste that I will pull from like a prompt library that I've already written.

00:21:44

P4: That's a paragraph and then it will rewrite like some of the things. So I just like in ready to review. that does that to me. so it'll like rewrite the query and then it will supposedly search across all of those. So it's finds 849 calls. So, there's a ton of calls that are happening, which I'm glad we have this and not have this because there's no way I would go have time to go through 849 calls. But like right off the bat, the first time I used this like a month ago, it like hallucinated a whole quote.

Right off the bat, the first time I used this like a month ago, it hallucinated a whole quote.

00:21:44

Paul: This is interesting what you're showing me and I'm going to do a little bit of restating just for the transcript because I don't want to share your org's work on a video to the world. So this has taken in calls to support and someone set up a front end on your intranet and it's using RAG chains and some LLM in the background to let you query the calls and set some filtering parameters around what type of calls that are maybe categorized by tag.

00:23:15

Paul: I love the irony of you as a researcher derailing me because it's this is really good stuff and I want to make sure that we keep going with this. But what I'm seeing here and hearing is that yeah, it kind of works, but you don't 100% trust it because you're not sure that it's picking up the right type of calls or amount of calls. What do you think is going? What's the failure mode? Or are you not sure?

00:23:15

P4: Yeah. So I saw that there was like back when I was at [former company] like I took pride in using Dovetail as our research repository. I don't know if you've like used Dovetail in the past, but essentially we were so customer-call, video-heavy at [former company] that like we uploaded everything into one and I would spend a good amount of time like tagging everything.

00:24:11

P4: So then when there was the moment which I know Dovetail has like AI tags now it wasn't very good when I left but it's like getting better. So they like we would have it and we I don't know if maybe my tagging helped, but their AI was like so good on the back end of and then I could go to like my tag library and be like, see this is every time somebody talked about removing and I don't know in their video just making something up. So I liked having a qualitative number to my quantitative analysis and I don't think LLM's are good with numbers yet or like citing things in that way that I like. So essentially like there were 849 calls, but why if this is the number one theme, why do I only see two things cited? I want to see 80 things cited. And I don't know if it's because of how he the our analyst who made it is also like is it because it's 10?

00:25:16

P4: Like I've tried it with like 50 and then it's only gives me like 50 results. It doesn't like ever do more than like 10 sources is what I found. And that just isn't compelling for me to be like you're giving me three or then why does this one have three quotes? I don't know. So, I know it says that it has this like reranking. I just don't understand. And then when I have like looked into them, it's just not very accurate. There's been times where it's like it's picking up what the saleserson is saying, not the actual participant. or it's loosening a quote. or and then because there's all this like foggess, I spend more time clicking into these and like just like reading or re-watching the video to like get a feel for like what they were actually meaning.

00:25:16

Paul: You're doing a lot of pogo-sticking, in a way.

00:26:15

P4: Yeah. And I have a very sensitive product manager who's like she would she can spin anything to it being okay and that we don't have to improve it. So that's really hard for me because I know she's coming in here and saying like what are the wins and then if she doesn't find win she's going to like twist it and then I'm like wait did you check that? And she's like I don't have time to check it. So that's the one I'm at least glad it's better than nothing. But yeah, I don't know. This is something and I've reached out to the guy who created that this and he hasn't responded to me at all.

She can spin anything to it being okay and that we don't have to improve it. So that's really hard for me because I know she's coming in here and saying like what are the wins and then if she doesn't find win she's going to like twist it and then I'm like wait did you check that? And she's like I don't have time to check it.

00:26:15

Paul: Interesting. I really appreciate you showing this to me.

P4: Yeah.

00:26:15

Paul: I've used NotebookLM in a way that has given me high confidence that it's citing correctly. So it's using the appropriate sources and citing appropriately and using my tagging and thematic structure. But I don't want to fill up this chat with that.

00:27:13

P4: Oh, okay.

00:27:13

Paul: But I'm happy to talk to you about it because it was pretty cool and even though I was using capabilities from a few months back, it's only better now.

00:27:13

P4: Yeah, that's great. I've gotten good things out of that too.

00:27:13

P4: Again, when I've used either those chains I showed you, or attached things in here, to do a first draft of my reports — I could show you one report that I barely tweaked. Generally, I write my skeleton of a report and then use AI to fill in a couple of quotes, or maybe something that's kind of important. And it makes sense that that was the easiest report.

00:27:13

P4: I guess it's more of a blog post — that's what we call these — but this one was essentially nearly 100% written by AI.

00:28:10

P4: So I didn't have to spend very long writing a report for that one week of research.

00:28:10

Paul: When you run a session, do you do session-level quick summary?

00:28:10

P4: Yeah. So my typical weekly schedule: we try to figure out what we're doing hopefully two weeks in advance, but essentially, that Friday I'll be in a jam session with the product manager and the designer — and if they don't have the designs ready, we're all vibe coding live together. We're usually still tweaking that on Monday. Now I've gotten to a schedule where we do half-day sessions on Tuesdays, because I was having an issue with sample size. So I'll run customer sessions on Tuesdays and Wednesdays to hopefully get five to six people. Then I spend all of Thursday analyzing and writing a little blog post like the one you just saw, and on Friday morning we have the debrief with all the stakeholders.

00:29:18

P4: And then we just rinse and repeat, every single week.

00:29:18

Paul: Okay. The reason I asked whether you do session-level summaries is that I've found, using NotebookLM, that it developed the themes without going off into hallucination land when I fed it both the transcripts and my one-to-two-paragraph Slack summaries — which I'd write at the end of each session, just a human-only brain dump, basically. And I've always been curious about how much that's done in other organizations.

00:29:18

P4: I know they say to do smaller slices, you know, to avoid that. And a lot of my teammates are running calls through it to give the summary notes to their stakeholders in between sessions. Unfortunately, I think those are too long for stakeholders to even read. I just do the key decision questions and then maybe two bullet points.

00:30:26

P4: But I do that because I take notes live during the session, in a spreadsheet. So I'm a little bit different — I feel like I am the AI. But I know it's super easy if they want a summary of all our stuff; you can just use Copilot, because we upload everything to SharePoint. So yeah.

00:30:26

Paul: You've talked about the wins, the efficiency gains, that you and your organization have seen. And you talked about a disappointment, something that wasn't working as well as you'd like — that's when you showed me the call transcript front end for the support calls. What do you think has been the organization's biggest disappointment or failure, without bringing the dirty laundry out into the light? Have you encountered a time when there was an AI rollout and it just didn't work as advertised, or as hoped?

00:31:11

P4: Yeah, I think we're still kind of there. There have been times when we've rolled out products to our customers that didn't have the level of accuracy the tax market needs, and that's been disappointing. They even named one of them the same name as the product we just launched, which is just confusing — they called theirs Review Ready, and mine's Ready to Review. They probably could have thought about that a little harder, because the two launched within two years of each other and they're completely different. So that's a bit messy. But I would say — and I put this in my screener — it's more that the devs are really concerned about it.

00:32:32

P4: They see it more as a threat to their jobs — they have a fear of job replacement, and the quality has not stayed consistent. I'm in this enormous corporate office, and there used to be way more people here; then they cut two-thirds of the staff. So overnight, entire departments with internal knowledge of really old code — and you know how important that internal knowledge is, of why old code is so quirky — a lot of that just disappeared when they cut everybody. Then they rehired in India, and I don't want this to sound ethnocentric — it's not that we think Americans are better; that's not what I'm saying. It's that, to get those capital gains, they hired at the lowest rung of the salary range they could over there. So it's like they tried saving money, and then saving money even more, and they're just not getting as high a quality as they could.

00:33:42

P4: So now they're saying, "Oh, we want 50% of code to be written by AI." And I have some lone-ranger, [midwest US city]-located developers who are like: I already spend so much time cleaning up this low-quality code from our overseas colleagues, and now I have even crappier AI code to review — it would have been faster to just write it. So one side of the coin is people seeing it as job replacement, or just a way to be faster, but the quality isn't there. Everybody's very specialized in a large corporation, which was a new culture change for me — I liked being ambidextrous at my former place — but here you have somebody who just does the wording on the page, the content designer; then you have your UX designer; and I'm only a researcher, and I don't know why there doesn't get to be more of us. So you have some folks who are worried that people just go with the first bit of slop they see.

They're saying, "Oh, we want 50% of code to be written by AI." And I have some of my locally located developers who are like, I already spend so much time cleaning up this low-quality code from our overseas colleagues and now I have even crappier code in my AI that they have to review and they're like, it just would have been faster.

00:34:58

P4: So my accessibility and content designers are a little concerned. As accessibility builds this really cool thing in Cursor — an agent they've trained to remind everybody to be accessible — they're like, okay, well, are they still going to have me in three years, or will there just be fewer of us? So that's a real fear. And the research program here at a large software publisher is pretty robust; I think our downside is moving slowly, but with rolling research, you can't get faster than a week. But yeah, everybody has this fear that people will just decide the first thing is good enough. People are even arguing that AI-moderated research is better than human researchers, which — I don't know about that. So we'll see how it goes. I know it's going to disrupt a lot, and I think organizations that don't value quality are going to cut corners — but those are companies I don't want to work for.

So my accessibility and content designers are a little concerned that as accessibility builds this really cool thing in Cursor to remind everybody to be accessible — this agent they've trained — they're like okay well are they still going to have me in three years or will there just be fewer of us?
People are even arguing that AI-moderated research is better than human researchers, which I'm like, I don't know about that.

00:36:04

Paul: You mentioned AI-moderated research, and I'm already at the point where I watch some sci-fi short on YouTube and I'm not sure whether I'm looking at human actors or AI-generated ones anymore, because that shininess we all saw in 2024 and 2025 is now pretty much gone — you see real pores and realistic expressions — and I don't like that.

00:36:04

P4: Yeah.

00:36:04

Paul: Is there anything you've completely stopped doing now because AI does it for you? And, flip side of that, is there anything you've started doing that you wouldn't have done a year or two ago?

00:37:11

P4: Yeah — I don't think I've stopped doing anything, but I'm always reaching for it now. Before, I would just use it like a Google search — I'd just ask it questions — but now I'm trying to use it at every stage of the process. So it's like, okay, I come up with the draft of my recruitment screener, but then I'm going to throw it in there and ask it to help me buff up the screener and come up with what the recruitment spiel at the beginning should be. It's always: I put down my splat of ideas and then have it fluff them up, at every step.

00:38:07

Paul: So ideating and iterating on it. Yeah, that's also how I've been using it.

00:38:07

P4: I haven't created any agents to fully do a task for me yet. I know we'll get a note-taking task and then a report-writing output — those were on Yeti's roadmap to have by the end of the year, so that will be cool. But a lot of researchers I know don't take notes live during sessions, or they have to go back in afterwards, so that's probably going to save them more time than it would me. Also, I can't get away with as much silliness here as I did at [former company] — I used to make very silly shareouts; a lot of them were video, and I would have a bald cap on and be like an SNL character. So I can't get quite as silly here.

00:39:00

P4: And a lot of my reports are written first in that blog-post style, but then I have to go make an academic report later, which is a hassle. So if AI could automate that, that would be cool. But for right now, it's just kind of a co-editor.

00:39:00

Paul: Right. Okay.

00:39:00

P4: That's all it really is.

00:39:00

Paul: Let's talk about disclosing AI use. I think everyone's had the experience where you look at someone's content — in their personal life or their work product — and it just feels like low-effort, phoned-in AI slop. What norms and unwritten rules are forming, if any, around disclosing AI use and making sure it has the quality you would expect of a human effort?

00:40:02

P4: That's a good question, and nobody has set those norms yet. I still feel like it could go either way. You'll have some folks who are like, "Whoa, this looks really good," and they're like, "Thanks, I vibe coded it," and everybody compliments them for it. For me, though — for example, we have PRDs, which is... I don't even know what it stands for in dev language, but essentially it's what the product manager makes of the requirements for design. A lot of them now are making those look really cool with vibe coding, but I don't think they ever go back in and add anything beyond whatever they prompted. And there are parts that are supposed to have secondary research and customer research, and it's just making up pain points in there. So everything looks put together, and there are a lot of words on the page, but nobody's going in for that second layer — and that's the part where I'm disappointed.

A lot of them now are making them look really cool with vibe coding, but I don't think they ever go back in and add anything beyond whatever they prompted, and there's parts where it's supposed to have secondary research and customer research and it's just making up pain points in there, and so everything looks put together and there's a lot of words on a page but nobody's going in for that second layer.

00:41:18

Everybody's like, 'Looks cool.' And then I'm like, 'No, but read it.' Does any of this [make sense]?

00:41:18

Paul: Have you ever gotten any pushback from the developers, who presumably have to consume those documents to build? Is that coming up at all — do you think the devs even read the documents?

P4: No. I just feel like we all get... yeah, I think we all get stuck in more meetings, or more design jams, where instead of somebody saying, "Hey, I'm going to make sure I get secondary research from Aaron instead of this made-up gobbledygook" — because I'm like, where is it pulling from? It's not connected to my research bank — we're all just doing it on the fly, anecdotally, which I don't love either.

00:42:09

Paul: This is directly related to my other question about trust, which is how do you decide when and whether to trust AI?

00:42:09

P4: I verify. Even starting with a transcript cleaner first, to be like, hey, we know the transcript output that comes from [Microsoft] Teams is only kind of okay. UserZoom is a platform we use that's trash — their transcripts suck most of the time; I don't think they've updated their software in forever, and they just keep merging. So okay, I guess this is going to be the enshittification, as they call it, of a lot of merging companies.

00:43:21

P4: So essentially we have the transcript cleaner first, to get something better. But like I said, I had no idea that if I dropped seven transcripts into Claude, it wasn't always cleaning them up, or referencing all of them in the report. So there are little things we can do, like telling it to cite how many of the linked documents it read — you can try that, and it'll be like, oops, sorry, here are the other two. So there have been some ways we've checked it on our side, and I like that our team started so small instead of going for the biggest thing first; it seems like we're taking our steps that way. But then, yeah, our designers feel like they're getting their toes stepped on a little, with the product managers also vibe coding. The thing with AI is that it falls down pretty quick — again, style over substance. It looks cool, but if you're making just a very small feature, it makes that feature so prominent in the design that you're like, this is the smallest feature — you still need a designer to go through and say, actually, no, this is what the workflow is. So there's a healthy bit of skepticism. If everybody was on a scale of who's going to do a second pass, research is probably at the highest end, and product managers are way over here. I don't interact with the devs, but from what I've heard from the devs I have lunch with, they're probably actually more in the middle — even though studies show you get more accuracy for code than you do for other things.

If everybody was on a scale of who's going to do a second pass, research is probably at the highest end, and product managers are way over here.

00:44:28

Paul: So, just to clarify: when you say researchers are over here and product managers are over there — that's a continuum of trust?

P4: Yes — trust, slash, whether you're actually going to go back in and change something. As for my product managers and my marketing team...

00:45:26

Paul: Yeah. Okay.

00:45:26

P4: I would put them right over there, where they still think whatever came out first is cool and they're going to go with it.

00:45:26

Paul: Cool. Let me see — we covered trust, we covered norms developing. I was going to ask how you feel AI is changing how you approach solving problems, but you've demonstrated that; you've shown me. So I want to skip that in favor of my wrap-up questions, which are: how is the increasing presence of AI in the world, both work and personal, making you

00:47:24

P4: I definitely am using it more than I would like to.

00:47:24

Paul: feel?

00:47:24

P4: I would say — my friend group, people who think the way I do politically, it's a swear word around them. They don't think it's cool; they don't buy the hype. I live in [midwest US city], and there's a data center going in in [city], which is where a large software publisher's headquarters is, and another data center going in in [neighboring city], which is technically the city I live in. So we're seeing all these horror stories of people running out of water, and we know they're coming for the Midwest because of our water, and it makes us worried that it's all some big dumb bubble. Also, my husband works for our electrical company as a lineman, so he already sees how stressed the grid is from people just flicking on their air conditioning in the summer. And a lot of these data centers get a free pass on our utilities, whether it's water or power, without building their own substation — because that would cost way too much money.

00:48:38

P4: And I'm just like, will they even be around in 10 years? Our friend does the concrete for the [city] data center — there was a big push to get that one stopped, but there just aren't very many laws to protect what people want. The [city] data center is huge; if you look it up, it's acres and acres and acres of farmland. But they're building it in a way where they're like, "Well, we can turn this into a warehouse — maybe it would just be an Amazon warehouse afterwards." So the people building it are already predicting this bubble might pop. Personally, it feels like you have to wear a mask on LinkedIn — "I love AI" — but in my personal life, I'm seeing... I love Perfect Union, one of those independent reporting accounts you can find on Instagram or YouTube.

I live in [midwest US city] and there is a data center getting put in [city], which is where [organization] headquarters is, and a data center being put in [neighboring city], which is technically the city I live in. And so we're seeing all these horror stories of people running out of water and we know they're coming for the Midwest because of our water and it makes us worried that it's all some big dumb bubble.
My husband works for our electrical company as a lineman. So he already sees how stressed out the grid is from people just flicking on their air conditioning in the summer. And a lot of these data centers get a free pass at a lot of our utilities, whether it be water or power, without building their own substation because that would cost way too much money.
Our friend does the concrete for the [nearby city] data center — there was a big push to get that closed, but there's just not very many laws to protect the rights of what people want. They're building it in a way that they're like, 'Well, we can turn this into a warehouse or maybe this would just be an Amazon warehouse afterwards.' So the people that are building it are already predicting this bubble might pop.

00:49:41

P4: You see all this reporting of stuff that's happening — like how, because they're not building their own substation, they have to use all these awful generators that can increase asthma and all this other bad stuff for people with health issues. So environmentally, and in terms of what it means for how people live — I don't love that. And we have pushed within our company: hey, can we make sure we're doing carbon offsets, anything we can do. And they always say some corporate spiel, but I don't really feel like there's truth or anything cited behind it. I even have a dear friend who's an immigration lawyer in Chicago, and he uses one of a large software publisher's legal tools for AI research — CoCounsel is what it's called — and sure, it saves him time, but he still describes the AI he uses (and he still does use it) as crap. That's how he views it. A lot of people in my generation — he's maybe six years younger, so a little closer to Gen Z; I'm a millennial — view it that negatively. They think it's that off-putting. And even though my husband's a lineman, he does animation on the side, so there are cool things to do with it. But if anybody is even a little bit environmentally minded, or leans politically one way — although that's not quite true, because my parents are boomers who voted for Trump, and my mom is extremely against data centers. So I think this can be something both sides of the aisle agree on. But yeah, I would say it's hard.

00:50:50

Paul: Interesting.

00:50:50

P4: It's hard, because I'm still going to ask it. And actually, my bonus — my performance — is attached to how much I use AI at work. So I have to; if I don't, I might not get my bonus. Which is strange.

00:51:45

P4: So at first, before I'd really figured out how to work it into my workflow, I was just asking it for my grocery list and other dumb stuff, and I felt bad, because everybody tells you one search is dumping out a water bottle, and I'm like, oh no, I have to do so many searches a day or else I don't get my bonus. So, I don't know. That's all.

My bonus, my performance, is attached to how much I use AI at work. So I have to [use it]... if I don't I might not get my bonus. At first, before I'd really figured out how to work it into my workflow, I was just asking it for my grocery list and other dumb stuff, and I felt bad because everybody tells you one search is dumping out a water bottle, and I'm like, oh no, I have to do so many searches a day or else I don't get my bonus.

00:51:45

Paul: What's your single biggest concern or fear? And what do you think is the most significant breakthrough or positive outcome AI might enable within the next decade? So: single biggest fear first, then single biggest hope.

00:51:45

P4: I guess my biggest fear is that we will never have the regulation we need. There are so many ways you could say, all right — if you want to build in my town, then build your own substation, and we're going to monitor how much water you use. We could have done that as a community, but it didn't get pushed through because of the political climate. And I fear a lot for children's safety with AI — I have two little ones — so I'll just use that one.

00:52:35

P4: It's so fucked up what people can do with that type of technology — deepfakes and everything. So yeah, I just think there needs to be an awareness of the misinformation, of what can happen, across all political systems. We saw what happened with the violence in Myanmar because of Facebook, and I just think we haven't learned anything — especially given the amount of misinformation and AI slop in those videos, and how good deepfakes are.

00:53:39

P4: There's just so much that should be regulated. You could think of anything and come up with a law, and we have zero laws — and even at the state level, there isn't a federal law for us to point to and say, well, I want this because we federally banned that. So I just think regulation would be sweet, and I'm trying not to be a hypocrite — I'm still using it. But I just wish that, as a whole, we could regulate it, because I think if we set aside how we voted and talked about it clearly, we'd say, "Yeah, we should." You were seeing that movement of states trying to come up with their own laws — California, Michigan — we were all coming up with our own laws, but there needs to be regulation to protect people.

00:54:31

Paul: How about biggest hope? You know, what's your biggest positive hope for outcome when it comes to AI?

00:54:31

P4: Oh my gosh. I hope it pops a little bit.

00:54:31

Paul: That's fair.

00:54:31

P4: I just think there's a lot of hype and bloat to it, and I want to see some of it pop. I think there's still going to be great underlying technology, but I want the savior aspect of it to chill out a little. It's cool, but it's not everything. And until we see how much ROI these companies actually need — is it really the coolest thing? We can still be offering good value in other parts of our economy.

AI Use Disclosure

I used AI to analyze the data collected via interviews and surveys. How?

  • I took notes after each session.
  • I fed those notes to several AIs, along with the moderator guide, project proposal, session transcript, the participant's survey responses, and a codebook of tags and themes I've been iterating as I collect data.
  • I prompted each to write a background, findings, and emerging themes section.
  • Then I iterated on each AI's draft, challenging the AI where appropriate and removing what I'm euphemistically calling "hallucinatory content" :-).
  • I collected each AI's drafts, added them to the project I've set up in Claude Cowork, and prompted it to draft the background, findings, and emerging themes section, pushing back as appropriate.
  • Then I edited the content, because "human in the loop" means "I have final edit." At least to me it does.
  • I then published each session writeup.
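For readers who want a concrete picture, the per-session steps above can be sketched roughly in Python. Everything here is illustrative — the function names, prompt wording, and the idea of matching source names by string are my assumptions about one way to do it, not the actual tooling:

```python
def build_session_prompt(guide: str, codebook: str, transcript: str, notes: str) -> str:
    """Assemble one analysis prompt from the session materials:
    moderator guide, codebook of tags/themes, transcript, post-session notes."""
    return (
        "Moderator guide:\n" + guide + "\n\n"
        "Codebook of tags and themes:\n" + codebook + "\n\n"
        "Session transcript:\n" + transcript + "\n\n"
        "Moderator notes:\n" + notes + "\n\n"
        "Write three sections: Background, Findings, Emerging Themes. "
        "Cite the source document for every claim."
    )

def uncited_sources(draft: str, source_names: list[str]) -> list[str]:
    """Return the supplied sources a draft never mentions — the
    'did you actually reference all seven transcripts?' spot check
    P4 describes running against AI-drafted reports."""
    return [name for name in source_names if name not in draft]
```

So after collecting a draft, `uncited_sources(draft, ["session1.vtt", "session2.vtt"])` lists the transcripts the draft never names — a cheap first pass before the human read-through that "human in the loop" actually requires.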

There's a bit more to it, but I'm trying to keep this short. Reach out if you want to talk about my AI-assisted workflow, which I'm still evolving as I go.