Trust Calibration

Human-AI RelationshipProvisional

Deliberate, ongoing practices for evaluating how much to trust AI output, not a binary trust/don't-trust but a spectrum that requires continuous recalibration

17 sessions37 annotated passages

Evidence

“We serve as coaches. We serve as supervisors. We evaluate the agentic risk. That's the expert's job.”

P1 - Principal UX Designer, InsuretechSoftware

“I will never 100% trust AI ever because I don't think it will earn that. It's hard. I say it's an oxymoron.”

P1 - Principal UX Designer, InsuretechSoftware

“I would say that it has to do with how important what you're asking it is. So I think it's a great first step.”

P2 - IT Business Analyst & Adjunct Professor, HealthcareInformation Technology

“We had were doing plenty of studies where it's not actually reading all seven. It might have only referenced five. So that's been an issue for us is trusting to be like, okay, did it actually analyze all the calls?”

P4 - Senior UX Researcher, SoftwareSoftware

“She can spin anything to it being okay and that we don't have to improve it. So that's really hard for me because I know she's coming in here and saying like what are the wins and then if she doesn't find win she's going to like twist it and then I'm like wait did you check that? And she's like I don't have time to check it.”

P4 - Senior UX Researcher, SoftwareSoftware

“If everybody was on a scale of who it is, research is more on I'm going to do my second pass. We're probably on the highest end and then product managers are way over here.”

P4 - Senior UX Researcher, SoftwareSoftware

“So, I think, well, I guess disappointment is maybe the right word. So, it was kind of discovering that, maybe not unexpectedly, I guess the bar was pretty low, but discovering that Dovetail still needed a lot of babysitting to get a lot of results. We had to go in and we realized that the transcripts had a lot of misattributions. I mean there were just a lot of things that need to be cleaned up to make it useful beforehand. That allowing Dovetail to kind of create its own tags and apply those was not sufficient. We still needed to do the diligence to go in and apply our own tags to make it more meaningful and real world context.”

P5 - Sr. Manager, UX Research, SoftwareSoftware

“We had product that was making recommendations and we're kind of proving on the results that were presented by the AI to the participants in testing and that was one thing that just came out as huge, that without some insight into where the AI was coming up with that output, and having some indication like just calling out these are the preferences or I'm getting it from our discussion about X Y and Z, without that, particularly in an enterprise context when decisions can be costly and have risk associated with them,”

P5 - Sr. Manager, UX Research, SoftwareSoftware

“it was a clear message from the participants that there's no way that they would rely on those outputs without some insight into where they were coming from.”

P5 - Sr. Manager, UX Research, SoftwareSoftware

“I don't know if you recall, there's an old George Carlin routine that, you'd be talking with someone that sounds like they really know what they're talking about for a while and you're like, "Yeah, yeah, yeah, go on." And then there's this, he's full of b.s., I think I've encountered that with AI a few times.”

P5 - Sr. Manager, UX Research, SoftwareSoftware

“Yeah, I mean I think in some cases if I'm not too far down a path I can just go back and kind of confirm but then also kind of reset the chat saying like it's getting off topic, I want to focus more on X Y and Z and I'd like to have it based on these particular types of resources and just kind of pull it pull it back in focus a little.”

P5 - Sr. Manager, UX Research, SoftwareSoftware

“It's grossly inaccurate, but I think that kind of points to the human-in-the-loop element: it only gets smarter if whoever's using that system goes back and double checks and says, "Oh, no, it's not that, it's this.”

P6 - Senior Technical Product Manager, Consumer FinanceConsumer Finance

“And because we don't want people just translating things who don't understand the language, it also assigns a confidence rating. And our set confidence is if it's a 95% or above confidence rating you can roll with it.”

P6 - Senior Technical Product Manager, Consumer FinanceConsumer Finance

“If it's below that it needs some human oversight, and if it's below a certain level, like once we hit like 70%, it's something that we would want to send to our translation partner.”

P6 - Senior Technical Product Manager, Consumer FinanceConsumer Finance

“I am an advocate of continuous human oversight. I saw a quote from IBM today and it was something to the effect of, a computer cannot be held accountable and therefore it should not make managerial decisions. That applies over a lot of different areas. I think, I'm sure you've read about the United Healthcare stuff where it was making accept/reject determinations that resulted in a massive lawsuit. I am a huge advocate for human in the loop.”

P6 - Senior Technical Product Manager, Consumer FinanceConsumer Finance

“I don't take what comes out of an AI at face value, ever. In general.”

P7 - Principal Design Researcher, Software ConsultingSoftware

“it was suggesting that I went to a specific place, I did a plan, I didn't check, and then that place was shut down that day and for a while for renovation. I said, "I wasted one day that I had in this location. Why didn't I think about checking?”

P7 - Principal Design Researcher, Software ConsultingSoftware

“I think I rely a lot on, when there are things about research, things that I know, I use myself as a benchmark and I say, "No, you're not saying the right thing." And then that worries me, though, because I'm thinking, "Okay, all the things that I don't know, which are many, and all the domains that I don't know... should I believe it or not?”

P7 - Principal Design Researcher, Software ConsultingSoftware

“just really defining, getting narrower and narrower in focus of what I want AI to do, and that got me better results. And that's kind of a metaphor for how I interact with the different chatbots: start with a really good definition of what I'm looking for, give it some background, and I turn it into a conversation.”

P8 - UX Researcher/Designer, Electric UtilitiesElectric Utilities

“If I can't trust it, well, my first assumption is that I did not define the problem well enough, I think.”

P8 - UX Researcher/Designer, Electric UtilitiesElectric Utilities

“Well, first of all, I started teaching about hallucinations from early on, and you can literally say, "LLM, teach me how to avoid hallucinations." And there is plenty you can do to make sure what you're getting back is real, right? So there are steps you can take, but now my friends are building these things, these AI brains where they're stacking the LLMs on top of each other, pulling a confidence score. But also the hallucination gap is closing too. It's getting less and less. I tell people all the time, if it's a subject matter that is common like tech, right, and there's old long history with it, it's probably going to get that right. But if you ask it about like, you know, the Figma update from yesterday, it's going to get it wrong. So, there's a time piece to it.”

P9 - UX Researcher and AI Specialist, IndependentIndependent Consultant

“There's a subject piece to it. There's a prompting piece to it. But you can have the AI, I tell people all the time, have the AI teach you about the AI.”

P9 - UX Researcher and AI Specialist, IndependentIndependent Consultant

“I honestly try to hedge against that by only asking it things where there is no quote unquote wrong answer, because like I said, because of that December calendar incident. I'm not 100% sure that I would trust it if I asked it what 2 plus 2 is half the time. So if I have a big hairy task that again would require processing tens of thousands of rows of data, I don't know that I would, I would probably ironically, and again this goes back to that concern that I have of am I now suddenly seduced into overrelying on it, but I would probably ask it to say, "Okay, process this 10,000-row file but then also tell me how should I double-check your work," which is circular reasoning in the worst sort of way. But yeah, I think ultimately my personal strategy is try not to use it in spaces where there is high risk and or where there is an absolute 00:17:02 need for 100% accuracy and then just stick to it more where the spaces are of, like, I know I keep saying it, but the idea generation where there really is no wrong answer per se, it's just input for me.”

P10 - UX Manager, InsuranceInsurance

“Sometimes Perplexity will give me a bad link and I always check the links. I always go back and review the work that AI did in the background because I can go back and look at the source links and sometimes the links, like I noticed in Gemini when I was doing some research, I was asking general questions about AI use and I found it was citing sources that weren't as rigorous as others. It was citing blogs. It was just searching the internet. It wasn't doing an academic search of stuff that I could cite. So I would say that's kind of the part I don't trust. There's also another piece of trust that I don't have and that's bias.”

P11 - CTE Program Manager, K-12 EducationK-12 Education

“So, this is something I think about a lot. I serve a majority minority district and what are the sources? What's the input? Because there's so much, like can I see the data set that AI was trained on? Because I want to know that when it's giving a teacher an answer, say they're not very, how do I put it, they're not very culturally sensitive, if it gives them something that's wrong I want them to be able to identify it.”

P11 - CTE Program Manager, K-12 EducationK-12 Education

“Okay. So an ongoing chat I have in Gemini, I saw past tense being used in a conversation. So I asked at the current time which was off, which was really sort of shocking to me, and it's obviously not a constant, you'd think maybe a computer would be. So I had to ask it to going forward always refer to the atomic clock. So occasionally I'll ask what time it is and sometimes it'll also tell me what time it is when I answer for a new part of the chat. But I think critical thinking is so so important because I will notice in this ongoing chat things that are left out, I will question and they'll act like they forgot. I don't know where that disconnect is, but I would say if you're not really critically thinking about the information you're getting, it's going to probably let you down in some ways.”

P12 - UX Designer/Researcher, Advertising & DesignAdvertising & Design

“Well it's what happens all the time because AI is a tool and a tool that's based on algorithms. So any wrong command, any wrong prompt is going to trigger a not so accurate response. So normally when I think about like, for example, my investments or some of the accounts, I was like, well 1 + 1 equals, why are you giving me 4.2? What's irrational. And a lot of times I compare Claude with ChatGPT and I say, okay, you know, this is wrong, or whatever the situation. And I caught it a lot of times. Say I give a table for you to tell me what's going on and you're not reading the table properly. It's like, okay, do your job as you should do. And my response, because there is, so you have at the end of the day it's not about the AI, it's how can you use it? How can you leverage?”

P13 - UX Design Consultant, Consumer FinanceConsumer Finance

“I think because my work has been more just front end, I haven't gotten in trouble yet with anything. There's definitely one instance in our company, we have a siloed off instance of Claude that has all of our product context in it that, because we're on Azure, we have to use and I'm on Mac but I have to use a virtual desktop to use, and it has all of the context for our product and so you can ask it questions and it will give pretty good technical answers. And I think one of our salespeople or product people was responding to a client and used a different instance of Claude and it gave him a plausible answer that ended up in a client email that was wrong. And that was not good and that had to be, that was sort of like a step back moment for the company to say please be careful and please validate everything you're seeing coming out of the LLMs.”

P14 - Head of Design, Healthcare SoftwareHealthcare Software

“We sort of raised that flag and had to move on because we knew it was just how we were building it. And then until recently we came across a new scenario where we had a whole other kind of, a patient-facing example, and we had to take a step back and our engineer had to basically re-engineer how that was working so that it wasn't generative anymore but it was only generative based on a design system that was already defined, which is I think how my vision of it was already but we weren't there yet. So now that we have that structure in place, there's work for me to go back and make sure that the design system it's referencing makes sense. But at least that's open to us now. And that is more context into why a vibe-coded app, when you don't have that design system in place, has even more weight because there's nothing to ground it.”

P14 - Head of Design, Healthcare SoftwareHealthcare Software

“That's if AI does everything perfectly. But we all know it hallucinates and it's interesting because when multiple things hallucinate then you've got this chain reaction of everything going off the rails.”

P15 - Senior Developer, TelecommunicationsTelecommunications

“My current work is to meet the security technical implementation guide from the DSA, the defense industry. So they have a huge list of requirements for, we'll just call it security hardening, and there are tools out there for scanning. So I have to build an image, scan it, look at it, look at the requirements, and there's not always a mitigation or remediation that is something you could script. Sometimes it's a site policy that has to be manually done, like it could be password complexity rules or checks that have to be done manually by somebody on site. So the process spans both code and policy documentation, and that really varies from customer to customer. So it's not the best example of how to use the AI to take on a bunch of requirements. I can take them one by one and say, "Hey, how do I remediate this?" And sometimes it can be done with scripting or some tweaks in the OS or whatever, but closing the loop of, "Hey, just take this whole document that's a thousand pages and implement everything," well, the government's going to come back and say, "Where's your audit trail of your development?" And I'm going to say, "Oh, well, I just told Claude to do it." I don't know how that's going to fly.”

P15 - Senior Developer, TelecommunicationsTelecommunications

“I think that goes with anything that we research online. You know, just because you Google something doesn't mean it's fact. It's all in reference to the context that you give it. So analyzing metrics, analyzing statistics, I took a stats psychology class back in my undergrad and it's given me a whole different perspective on statistics. Like, you can look at statistics five different ways and it's going to give you five different answers.”

P16 - Senior Product Designer, B2B SaaSB2B SaaS

“I don't know if there's actually been a time that I trusted it and I shouldn't have. If I've been unsure, I'm always one that reviews it. So I never trust it at face value. I will still go review it and determine, like, is this actually what I told it? Is this not? So I can't really say that's something I've encountered. It could have been something I encountered if I didn't review it. It messed up my interactions in a prototype that I was trying to create because the prompt didn't account for it. It messed up the interaction. So, I had to go back and revert and rewrite the prompt. But I think to make sure that we don't take things at face value, you still have to have that analytical thinking, right?”

P16 - Senior Product Designer, B2B SaaSB2B SaaS

“So you review it, you determine, does this make sense? And if it doesn't, research it on your own and come up with the answer. I think that goes with anything that we research online. You know, just because you Google something doesn't mean it's fact. It's all in reference to the context that you give it. So analyzing metrics, analyzing statistics. I took a stats psychology class back in my undergrad and it's given me a whole different perspective on statistics. Like, you can look at statistics five different ways and it's going to give you five different answers. It's just depending on, how do you want that to come off? How do you want that to be portrayed by your reader?”

P16 - Senior Product Designer, B2B SaaSB2B SaaS

“We've been advocating for the implementation of more AI onsite for people, like their own versions of LLMs. People are becoming habituated to using LLMs for product discovery, product comparison, things like that, and they get the idea, but I think how they would prepare their organizations for that, in order for it to be effective, their benchmarks for accuracy are much higher. So if you put anything on your site that's LLM oriented and it tells someone something that's wrong, well, if that happens in ChatGPT, ChatGPT is like, "Sorry," you know, you go validate it with the client. But if you're actually on the client site and let's say I want to validate that a package or a certain combination of products will work in a certain way and I have very detailed questions and it's giving me very detailed answers and that turns out to be wrong, then the repercussions of that are much higher. I think people are starting to come around to the idea that so much of their product discovery process is happening off-site, outside their control, and they only have indirect control of that.”

P17 - Head of UX, Design ConsultancyDesign Consulting

“Those are fine, but there's a lot of things about the actual particularities of the interface or the use case that it's not quite so good at. You can try to get there by being very specific with the LLM, but it's still drawing upon a probabilistic data set of what is the most likely right answer based on the preponderance of evidence, and you have no idea where that evidence comes from. So controlling the evidence, right?”

P17 - Head of UX, Design ConsultancyDesign Consulting

“And now it's like maybe five, not even five, minutes. Like a couple of minutes. And then also, anything that has to do with creating spreadsheets, I really use AI a lot. And I'll ask it for second opinions, like when it comes to research determinations: is this study exempt? Is it a limited review? I'll ask it for second opinions. It's not so good at that, but at least I get a second opinion to bounce off of.”

P18 - Research Integrity Program Manager, Higher EducationHigher Education

Sessions

The Reluctant Early Adopter

P1 - Principal UX Designer, Insuretech · Software · Apr 14, 2026

The Pragmatic Equalizer

P2 - IT Business Analyst & Adjunct Professor, Healthcare · Information Technology · Apr 14, 2026

Mandated Enthusiasm: Bonuses, Bubbles, and the View from the Grid

P4 - Senior UX Researcher, Software · Software · Apr 15, 2026

The Five-Day, One-Day, Three-Day Problem

P5 - Sr. Manager, UX Research, Software · Software · Apr 15, 2026

The Bigger, Badder Machine

P6 - Senior Technical Product Manager, Consumer Finance · Consumer Finance · Apr 15, 2026

The Tunnel Vision Experiment

P7 - Principal Design Researcher, Software Consulting · Software · Apr 16, 2026

Hallucinations Are a Feature

P8 - UX Researcher/Designer, Electric Utilities · Electric Utilities · Apr 16, 2026

The Suitcase and the Plan B

P9 - UX Researcher and AI Specialist, Independent · Independent Consultant · Apr 17, 2026

The Seductive Skeptic

P10 - UX Manager, Insurance · Insurance · Apr 17, 2026

The Norm-Setter on the Guest Network

P11 - CTE Program Manager, K-12 Education · K-12 Education · Apr 17, 2026

The Visual Thinker with a Framework

P12 - UX Designer/Researcher, Advertising & Design · Advertising & Design · Apr 20, 2026

Midnight Ideas and Shadow Adoption

P13 - UX Design Consultant, Consumer Finance · Consumer Finance · Apr 20, 2026

Fighting Fire with Fire

P14 - Head of Design, Healthcare Software · Healthcare Software · Apr 20, 2026

Building My Own Replacement

P15 - Senior Developer, Telecommunications · Telecommunications · Apr 20, 2026

Stay in Your Lane

P16 - Senior Product Designer, B2B SaaS · B2B SaaS · Apr 21, 2026

AI Checking Its Own Work: Bad Idea Right Now

P17 - Head of UX, Design Consultancy · Design Consulting · Apr 21, 2026

AI Agents Talking to AI Agents

P18 - Research Integrity Program Manager, Higher Education · Higher Education · Apr 21, 2026

Related Themes in Human-AI Relationship

Self-Maintenance Skill Erosion Augmentation Not Replacement AI as Sounding Board AI as Learning Partner AI as Cognitive Prosthetic Hallucination as Engagement Useful AI Techniques

← Back to Theme Explorer