Trust Calibration
Human-AI Relationship · Provisional
Deliberate, ongoing practices for evaluating how much to trust AI output: not a binary trust/don't-trust decision but a spectrum that requires continuous recalibration
Evidence
“We serve as coaches. We serve as supervisors. We evaluate the agentic risk. That's the expert's job.”
“I will never 100% trust AI ever because I don't think it will earn that. It's hard. I say it's an oxymoron.”
“I would say that it has to do with how important what you're asking it is. So I think it's a great first step.”
“We were doing plenty of studies where it's not actually reading all seven. It might have only referenced five. So that's been an issue for us, trusting it, like, okay, did it actually analyze all the calls?”
“She can spin anything into it being okay and that we don't have to improve it. So that's really hard for me, because I know she's coming in here and saying, like, what are the wins, and then if she doesn't find a win she's going to, like, twist it. And then I'm like, wait, did you check that? And she's like, I don't have time to check it.”
“If everybody was on a scale of how much they double-check, research is more on the "I'm going to do my second pass" end. We're probably on the highest end, and then product managers are way over here.”
“So, I think, well, I guess disappointment is maybe the right word. It was kind of discovering that, maybe not unexpectedly, I guess the bar was pretty low, but discovering that Dovetail still needed a lot of babysitting to get a lot of results. We had to go in, and we realized that the transcripts had a lot of misattributions. I mean, there were just a lot of things that needed to be cleaned up to make it useful beforehand. Allowing Dovetail to kind of create its own tags and apply those was not sufficient. We still needed to do the diligence to go in and apply our own tags to make it meaningful in a real-world context.”
“We had product that was making recommendations, and we were kind of probing on the results that were presented by the AI to the participants in testing, and one thing that came out as huge: without some insight into where the AI was coming up with that output, without some indication like "these are the preferences" or "I'm getting this from our discussion about X, Y and Z," particularly in an enterprise context where decisions can be costly and have risk associated with them, it was a clear message from the participants that there's no way they would rely on those outputs without some insight into where they were coming from.”
“I don't know if you recall, there's an old George Carlin routine where you'd be talking with someone that sounds like they really know what they're talking about for a while, and you're like, "Yeah, yeah, yeah, go on." And then there's this moment: he's full of b.s. I think I've encountered that with AI a few times.”
“Yeah, I mean, I think in some cases, if I'm not too far down a path, I can just go back and kind of confirm, but then also kind of reset the chat, saying it's getting off topic, I want to focus more on X, Y and Z, and I'd like to have it based on these particular types of resources, and just kind of pull it back in focus a little.”
“It's grossly inaccurate, but I think that kind of points to the human-in-the-loop element: it only gets smarter if whoever's using that system goes back and double-checks and says, "Oh, no, it's not that, it's this."”
“And because we don't want people just translating things who don't understand the language, it also assigns a confidence rating. And our set threshold is: if it's a 95% or above confidence rating, you can roll with it.”
“If it's below that it needs some human oversight, and if it's below a certain level, like once we hit like 70%, it's something that we would want to send to our translation partner.”
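As a concrete illustration of the two-tier rule this participant describes, here is a minimal routing sketch. The function name and Route labels are hypothetical, and the cutoffs (0.95 and 0.70, taken from the quote) stand in for whatever their actual policy sets:

```python
from enum import Enum

class Route(Enum):
    AUTO_APPROVE = "roll with it"
    HUMAN_REVIEW = "needs human oversight"
    TRANSLATION_PARTNER = "send to translation partner"

# Cutoffs as described in the quote; real values are a policy decision.
AUTO_APPROVE_THRESHOLD = 0.95
PARTNER_THRESHOLD = 0.70

def route_translation(confidence: float) -> Route:
    """Route a machine translation by its confidence rating."""
    if confidence >= AUTO_APPROVE_THRESHOLD:
        return Route.AUTO_APPROVE
    if confidence >= PARTNER_THRESHOLD:
        return Route.HUMAN_REVIEW
    return Route.TRANSLATION_PARTNER

# e.g. route_translation(0.97) -> Route.AUTO_APPROVE
#      route_translation(0.80) -> Route.HUMAN_REVIEW
#      route_translation(0.60) -> Route.TRANSLATION_PARTNER
```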
“I am an advocate of continuous human oversight. I saw a quote from IBM today and it was something to the effect of, a computer cannot be held accountable and therefore it should not make managerial decisions. That applies over a lot of different areas. I think, I'm sure you've read about the United Healthcare stuff where it was making accept/reject determinations that resulted in a massive lawsuit. I am a huge advocate for human in the loop.”
“I don't take what comes out of an AI at face value, ever. In general.”
“it was suggesting that I go to a specific place. I made a plan, I didn't check, and then that place was shut down that day, and for a while, for renovation. I said, "I wasted one day that I had in this location. Why didn't I think about checking?"”
“I think I rely a lot on, when there are things about research, things that I know, I use myself as a benchmark and I say, "No, you're not saying the right thing." And then that worries me, though, because I'm thinking, "Okay, all the things that I don't know, which are many, and all the domains that I don't know... should I believe it or not?"”
“just really defining, getting narrower and narrower in focus of what I want AI to do, and that got me better results. And that's kind of a metaphor for how I interact with the different chatbots: start with a really good definition of what I'm looking for, give it some background, and I turn it into a conversation.”
“If I can't trust it, well, my first assumption is that I did not define the problem well enough, I think.”
“Well, first of all, I started teaching about hallucinations from early on, and you can literally say, "LLM, teach me how to avoid hallucinations." And there is plenty you can do to make sure what you're getting back is real, right? So there are steps you can take, but now my friends are building these things, these AI brains where they're stacking the LLMs on top of each other, pulling a confidence score. But the hallucination gap is closing too. It's getting less and less. I tell people all the time, if it's a subject matter that is common, like tech, right, and there's a long history with it, it's probably going to get that right. But if you ask it about, like, you know, the Figma update from yesterday, it's going to get it wrong. So there's a time piece to it.”
“There's a subject piece to it. There's a prompting piece to it. But you can have the AI, I tell people all the time, have the AI teach you about the AI.”
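One way to read the "stacking LLMs, pulling a confidence score" idea is as a simple cross-check: ask independent models the same question and only trust answers they agree on. This sketch abstracts the models as plain callables; it is an interpretation of the quote, not the participant's actual pipeline:

```python
from collections import Counter
from typing import Callable

def cross_check(prompt: str, models: list[Callable[[str], str]]) -> tuple[str, float]:
    """Ask each model the same prompt; return the majority answer
    plus a crude agreement score (fraction of models that gave it)."""
    answers = [m(prompt).strip().lower() for m in models]
    top, count = Counter(answers).most_common(1)[0]
    return top, count / len(answers)

# An agreement score well below 1.0 flags the answer for human review.
```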
“I honestly try to hedge against that by only asking it things where there is no quote-unquote wrong answer, because, like I said, because of that December calendar incident, I'm not 100% sure that I would trust it if I asked it what 2 plus 2 is half the time. So if I have a big hairy task that would require processing tens of thousands of rows of data, I would probably, ironically, and this goes back to that concern I have of am I now suddenly seduced into overrelying on it, ask it to say, "Okay, process this 10,000-row file, but then also tell me how I should double-check your work," which is circular reasoning in the worst sort of way. But yeah, I think ultimately my personal strategy is to try not to use it in spaces where there is high risk and/or an absolute need for 100% accuracy, and to stick to the spaces where, like, I know I keep saying it, but the idea generation, where there really is no wrong answer per se, it's just input for me.”
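The "tell me how I should double-check your work" instinct can be made systematic by hand-verifying a random sample of the AI-processed rows. A minimal sketch, assuming a CSV input; the path, sample size, and function name are hypothetical:

```python
import csv
import random

def sample_for_review(path: str, k: int = 50, seed: int = 0) -> list[dict]:
    """Draw k random rows from an AI-processed file for manual spot-checking.
    A fixed seed keeps the audit sample reproducible."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    random.seed(seed)
    return random.sample(rows, min(k, len(rows)))

# e.g. for row in sample_for_review("processed_10k_rows.csv"): verify by hand
```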
“Sometimes Perplexity will give me a bad link, and I always check the links. I always go back and review the work that AI did in the background, because I can go back and look at the source links. Like I noticed in Gemini, when I was doing some research asking general questions about AI use, I found it was citing sources that weren't as rigorous as others. It was citing blogs. It was just searching the internet. It wasn't doing an academic search of stuff that I could cite. So I would say that's kind of the part I don't trust. There's also another piece of trust that I don't have, and that's bias.”
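"I always check the links" is easy to partially automate as a first pass that flags citations which do not even resolve. A minimal sketch using only the standard library; it catches dead links, not weak sources, so the rigor judgment stays human:

```python
import urllib.request

def link_is_live(url: str, timeout: float = 5.0) -> bool:
    """Return True if a cited URL responds with a non-error status.
    A first-pass filter only; it says nothing about source quality."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except Exception:
        return False

# e.g. dead = [u for u in cited_urls if not link_is_live(u)]
```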
“So, this is something I think about a lot. I serve a majority minority district and what are the sources? What's the input? Because there's so much, like can I see the data set that AI was trained on? Because I want to know that when it's giving a teacher an answer, say they're not very, how do I put it, they're not very culturally sensitive, if it gives them something that's wrong I want them to be able to identify it.”
“Okay. So, an ongoing chat I have in Gemini, I saw past tense being used in a conversation. So I asked it the current time, which was off, which was really sort of shocking to me, and it's obviously not a constant, you'd think maybe a computer would be. So I had to ask it to, going forward, always refer to the atomic clock. Occasionally I'll ask what time it is, and sometimes it'll also tell me what time it is when I enter a new part of the chat. But I think critical thinking is so, so important, because I will notice in this ongoing chat things that are left out, I will question, and they'll act like they forgot. I don't know where that disconnect is, but I would say if you're not really critically thinking about the information you're getting, it's going to probably let you down in some ways.”
“Well, it's what happens all the time, because AI is a tool, and a tool that's based on algorithms. So any wrong command, any wrong prompt is going to trigger a not-so-accurate response. So normally when I think about, for example, my investments or some of the accounts, I was like, well, 1 + 1 equals, why are you giving me 4.2? That's irrational. And a lot of times I compare Claude with ChatGPT and I say, okay, you know, this is wrong, or whatever the situation. And I caught it a lot of times. Say I give you a table for you to tell me what's going on, and you're not reading the table properly. It's like, okay, do your job as you should. At the end of the day, it's not about the AI, it's how can you use it? How can you leverage it?”
“I think because my work has been more just front end, I haven't gotten in trouble yet with anything. There's definitely one instance in our company: we have a siloed-off instance of Claude that has all of our product context in it, which we have to use because we're on Azure (I'm on Mac, so I have to use a virtual desktop to get to it), and you can ask it questions and it will give pretty good technical answers. And I think one of our salespeople or product people was responding to a client and used a different instance of Claude, and it gave him a plausible answer that ended up in a client email that was wrong. And that was not good. That was sort of a step-back moment for the company, to say please be careful and please validate everything you're seeing coming out of the LLMs.”
“We sort of raised that flag and had to move on, because we knew it was just how we were building it. And then recently we came across a new scenario, a whole other kind of patient-facing example, and we had to take a step back, and our engineer had to basically re-engineer how that was working so that it wasn't freely generative anymore but only generative based on a design system that was already defined, which is, I think, how I envisioned it from the start, but we weren't there yet. So now that we have that structure in place, there's work for me to go back and make sure that the design system it's referencing makes sense. But at least that's open to us now. And that is more context into why a vibe-coded app, when you don't have that design system in place, has even more weight, because there's nothing to ground it.”
“That's if AI does everything perfectly. But we all know it hallucinates and it's interesting because when multiple things hallucinate then you've got this chain reaction of everything going off the rails.”
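The "chain reaction" intuition has simple arithmetic behind it: if each of n chained AI steps is independently correct with probability p, the whole chain is correct with probability p^n. A back-of-the-envelope check, with purely illustrative numbers:

```python
# Five chained steps, each 95% reliable, succeed together only ~77% of the time.
p, n = 0.95, 5
print(f"chain reliability: {p**n:.2f}")  # 0.77
```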
“My current work is to meet the Security Technical Implementation Guide from DISA, the defense industry. So they have a huge list of requirements for, we'll just call it security hardening, and there are tools out there for scanning. So I have to build an image, scan it, look at it, look at the requirements, and there's not always a mitigation or remediation that is something you could script. Sometimes it's a site policy that has to be manually done, like password complexity rules, or checks that have to be done manually by somebody on site. So the process spans both code and policy documentation, and that really varies from customer to customer. So it's not the best example of how to use the AI to take on a bunch of requirements. I can take them one by one and say, "Hey, how do I remediate this?" And sometimes it can be done with scripting or some tweaks in the OS or whatever, but closing the loop of, "Hey, just take this whole document that's a thousand pages and implement everything," well, the government's going to come back and say, "Where's your audit trail of your development?" And I'm going to say, "Oh, well, I just told Claude to do it." I don't know how that's going to fly.”
Sessions
The Reluctant Early Adopter
P1 - Principal UX Designer, Insuretech · Software · Apr 14, 2026
The Pragmatic Equalizer
P2 - IT Business Analyst & Adjunct Professor, Healthcare · Information Technology · Apr 14, 2026
Mandated Enthusiasm: Bonuses, Bubbles, and the View from the Grid
P4 - Senior UX Researcher, Software · Software · Apr 15, 2026
The Five-Day, One-Day, Three-Day Problem
P5 - Sr. Manager, UX Research, Software · Software · Apr 15, 2026
The Bigger, Badder Machine
P6 - Senior Technical Product Manager, Consumer Finance · Consumer Finance · Apr 15, 2026
The Tunnel Vision Experiment
P7 - Principal Design Researcher, Software Consulting · Software · Apr 16, 2026
Hallucinations Are a Feature
P8 - UX Researcher/Designer, Electric Utilities · Electric Utilities · Apr 16, 2026
The Suitcase and the Plan B
P9 - UX Researcher and AI Specialist, Independent · Independent Consultant · Apr 17, 2026
The Seductive Skeptic
P10 - UX Manager, Insurance · Insurance · Apr 17, 2026
The Norm-Setter on the Guest Network
P11 - CTE Program Manager, K-12 Education · K-12 Education · Apr 17, 2026
The Visual Thinker with a Framework
P12 - UX Designer/Researcher, Advertising & Design · Advertising & Design · Apr 20, 2026
Midnight Ideas and Shadow Adoption
P13 - UX Design Consultant, Consumer Finance · Consumer Finance · Apr 20, 2026
Fighting Fire with Fire
P14 - Head of Design, Healthcare Software · Healthcare Software · Apr 20, 2026
Building My Own Replacement
P15 - Senior Developer, Telecommunications · Telecommunications · Apr 20, 2026