The Bottom Left Hand Corner
On the dance between consumer trust and technical capabilities
One of the underrated parts of running a beta program is getting to learn from conversations the users have, with us and with each other. Every so often, a thread breaks out that cracks open an idea I have been trying to shape for months and hands it back to me sharper.
This week, one of those threads landed on trust and technical capabilities. A user wrote in with what looked like a simple question: what is AVA actually for? If the job is just to make sure nothing slips through the cracks, she could see how some capabilities (catching new things) might matter more than others (editing or deleting old ones). But if the job is reducing the mental load of running the calendar, then editing and deleting have to be as reliable as adding.
She had, in effect, just named the gap between two related but different things: how capable the system actually is, and how much the user can stop watching it. Here is what I wrote back:
I see the ultimate goal as the user carrying a significantly reduced mental load. Not just because AVA can manage the calendar completely, but also do things like call the doctor’s office and fill out forms. I see the progression of capabilities being plotted along a line with two axes: 1) primitive capability (does the tech actually work) and 2) user trust (do users trust the tech to work). They are related, but not entirely the same thing. Where we are today, I believe, is at the bottom-left-hand corner of that curve. We will push on the primitives, and constantly calibrate on the user trust.
This frame, primitive capability and user trust, will not leave my brain. Their relationship, and our intentional approach to each one, is what we have spent the last year trying to articulate. When I dug into the research, I found that this dynamic is one of the most studied in twentieth-century engineering. Yet almost nobody who is building or writing about AI-for-families is reading that research.
I want to take a longer pass. The curve, the corner we are in, and the order of operations are the most important things, I think, for anyone building or using an AI agent in their home right now.
The Curve
The two-axis frame is this. On one axis, primitive capability: can the system actually do the thing. On the other, user trust: will the human let the system do the thing without standing over its shoulder. They are correlated, but absolutely not the same. A system can be highly capable and trusted by no one. A system can be barely capable and trusted by many. The first failure mode is IBM Watson Health, a multibillion-dollar AI that hospitals quietly refused to use. The second is Tesla Autopilot: fifty-plus NHTSA-verified deaths from drivers treating driver-assist as full autonomy.
What matters for an agentic system is not where you sit on either axis. It is whether you are moving along the diagonal. Capability without trust is a demo. Trust without capability is a disaster. The work, all the time, is closing the gap between the two.
Every consumer technology that has ever crossed this curve has done it in roughly the same way: a long, slow, often boring crossing in which the technology became more reliable in small steps and humans recalibrated their trust in small steps to match. The crossings that look fast in retrospect were almost never fast in practice. And the home, it turns out, is the hardest place on earth to do this.
How Long This Usually Takes
It is worth saying out loud how long these crossings have taken in the past, because the public conversation about AI assumes a different timeline.
The ATM was invented in 1967 at a Barclays branch in north London. The first US ATM was installed two years later, in 1969, at a Chemical Bank branch on Long Island. By 1973, there were 2,000 of them across the country, all tied to single banks. Shared networks did not appear until the mid-1980s. The ATM did not become a routine piece of infrastructure that an ordinary American adult used without thinking about it until the early 1990s. That is roughly a twenty-five-year crossing for a technology whose primitive capability (dispense cash, debit account, print receipt) was effectively solved in the first year.
The commercial airplane autopilot is even older. Lawrence Sperry demonstrated the first in 1914, at the Concours de la Sécurité en Aéroplane in France, by climbing out of the cockpit and standing on the wing while the plane flew straight and level. The audience was stunned. The technology was real. And then it took another half-century for passengers to stop flinching every time the pilot announced he was engaging it. Modern autoland systems, which can land a commercial jet in zero visibility better than a human can, were certified in the late 1960s and did not enter routine use until the 1980s. That is roughly seventy years from working to trusted.
Online banking is the closest analog to what we are trying to do, and it is the one I find most clarifying. Wells Fargo launched the first commercial web-based banking platform in May 1995. By mid-1997, two years in, the site was getting 450,000 visits a week and 12,000 banking sessions a day. By February 2005, the Pew Research Center put US online banking adoption at 25 percent of all adults and 44 percent of internet users. By 2010, 46 percent of US adults and 58 percent of internet users. The 51 percent threshold was not crossed until 2013. That is an eighteen-year crossing for a technology that, on the capability axis, did not change much after about 2001.
These are the success stories.
The case where the curve has not been crossed, and where it is still not clear that it will be, is self-driving. Google’s secret self-driving car project started in 2009. Waymo spun out as its own Alphabet subsidiary in 2016. Waymo One launched commercially in Phoenix in 2018, became driverless in 2019, and as of mid-2026 operates as a commercial robotaxi service in a handful of US cities. That is seventeen years of capability work for a service most Americans will not see this decade. The reason is not that the technology cannot work in the cities it operates in. It can. The reason is that the trust axis has not moved as quickly as the capability axis, and the stakes of a single failure are so high that the curve is genuinely steeper than for any other consumer technology in living memory.
The fastest of the completed crossings is the Roomba. iRobot launched it in September 2002. By 2004, sales had passed a million units. By 2006, two million. By 2010, more than five million. By 2025, more than fifty million. That is an eight-to-ten-year crossing, the fastest in the modern home, and it tells us something specific about why.
Stakes and Visibility
The Roomba crossed fast because of two structural features that almost no other automated technology in the home has.
The first is low stakes. If the Roomba does a bad job, you have a slightly dusty floor. You do not have a missed cancer screening, a forfeited mortgage application, or a six-year-old who shows up at school without lunch. The downside of a Roomba failure is recoverable in fifteen minutes.
The second is visible work. You can see whether the Roomba cleaned. The trust calibration loop runs in real time, against direct observation. You watch it work. You watch the bin fill. You inspect the floor. The system’s reliability is legible to you, in your living room, every single run.
Compare that to calling the doctor’s office. The stakes are high. A wrong appointment time creates a real downstream cost. A missed medication question creates a worse one. And the work is mostly invisible. The user does not hear the call. The user does not see the form. The user does not watch the agent click submit. The trust calibration loop has to run on artifacts (the confirmation email, the calendar event, the inbox notification) rather than on direct observation of the work.
This is the structural reason the home, and specifically the household-operations layer of the home, is an exceedingly hard place to cross the trust-capability curve. The stakes are uniformly high. The work is uniformly invisible. And the user has, almost always, been burned by other systems before.
What We Can Learn From Decades of Engineering Research
This is not new ground in engineering research. It is just new ground in consumer AI.
The foundational paper in the field is Raja Parasuraman and Victor Riley’s 1997 Humans and Automation: Use, Misuse, Disuse, Abuse, published in Human Factors. Parasuraman and Riley named four patterns that show up at the boundary between humans and automated systems, three of which are failure modes. Use is when the human appropriately relies on the system. Misuse is when the human overrelies, fails to monitor, and gets burned by an automation error nobody caught. Disuse is when the human under-relies, distrusts the system, and does the work themselves (more slowly, more error-prone) because they cannot calibrate confidence in the machine. Abuse is when designers and managers automate a function without due regard for what that does to the humans who have to work with it. Three of the four patterns are about how much the human relies on the system, which is to say trust, not capability.
The most-cited paper in the field, John D. Lee and Katrina A. See’s 2004 Trust in Automation: Designing for Appropriate Reliance, also in Human Factors, builds on this. The paper has been cited more than four thousand times. Its central concept is calibrated trust: trust that is appropriate to the system’s actual reliability. The failure modes Parasuraman and Riley named are forms of miscalibration. The job of an automated system is not to maximize user trust. It is to help the user calibrate trust accurately, so the user relies on the system in the situations where the system is reliable and overrides it in the situations where it is not. Trust calibration is part of a closed-loop process. It evolves based on performance feedback, organizational context, and the user’s own propensity to trust. Designing for calibrated reliance, not for trust maximization, is the actual job.
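One way to make calibrated trust concrete is a toy sketch in Python. Everything in it is an assumption made for illustration: the TaskHistory record, the 0.15 tolerance, and the shortcut of comparing a reliance rate directly against a reliability rate are not taken from Lee and See or from how AVA works. The point is only to show that the target is a small gap between reliance and reliability, not the highest possible reliance.

```python
from dataclasses import dataclass


@dataclass
class TaskHistory:
    """Hypothetical record for one kind of task, e.g. adding calendar events."""
    attempts: int        # times the system performed the task
    successes: int       # times the result was later verified as correct
    opportunities: int   # times the user could have delegated without checking
    delegated: int       # times the user actually did delegate without checking


def reliability(h: TaskHistory) -> float:
    """Observed share of attempts the system got right."""
    return h.successes / h.attempts if h.attempts else 0.0


def reliance(h: TaskHistory) -> float:
    """Observed share of opportunities where the user did not double-check."""
    return h.delegated / h.opportunities if h.opportunities else 0.0


def calibration(h: TaskHistory, tolerance: float = 0.15) -> str:
    """Label the gap between reliance and reliability (tolerance is arbitrary)."""
    gap = reliance(h) - reliability(h)
    if gap > tolerance:
        return "over-reliance (misuse)"
    if gap < -tolerance:
        return "under-reliance (disuse)"
    return "calibrated"


# Made-up numbers for three task types.
examples = {
    "add event": TaskHistory(attempts=200, successes=196, opportunities=100, delegated=90),
    "delete event": TaskHistory(attempts=40, successes=28, opportunities=40, delegated=38),
    "edit event": TaskHistory(attempts=50, successes=48, opportunities=50, delegated=10),
}

for task, history in examples.items():
    print(f"{task}: {calibration(history)}")
```

In these made-up numbers, deleting is the task the user trusts too much and editing is the task the user trusts too little. Pushing trust up across the board would help the second case and make the first one worse, which is why calibration, not maximization, is the design target.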
The third paper worth knowing about is Lisanne Bainbridge’s Ironies of Automation, from 1983, published in Automatica. Bainbridge is the one who pointed out, decades before consumer AI was a thing, that automating part of a task changes the human’s job in ways that are not usually anticipated. The human is now a monitor of the system. Monitoring is harder than doing. The system handles the routine cases; the human is left with the rare, hard, intervention-only cases for which they have had no practice. Every problem with self-driving handoffs, with pilot deskilling, with the way doctors miss what AI screening missed, is downstream of what she described.
Pull those three papers together and you get the actual shape of the curve we are crossing. Capability matters, but it is not the binding constraint. The binding constraint is whether the system trains the user to know when to trust it and when to verify, and whether the system keeps the user’s monitoring skills intact while the system is doing the work. That is a much harder design problem than “make the AI better.” And it is the one nobody in our category is really talking about.
This engineering research tells us the shape of the curve. What it doesn’t tell us is why the household has the steepest version of it. Next week, I’ll share how new technologies in the home have historically added to mothers’ mental load, and what we’re doing at AVA to work within this two-axis framework so that we actually bring that load down.
Sources
Parasuraman, R., & Riley, V. (1997). Humans and Automation: Use, Misuse, Disuse, Abuse. Human Factors, 39(2), 230-253.
Lee, J. D., & See, K. A. (2004). Trust in Automation: Designing for Appropriate Reliance. Human Factors, 46(1), 50-80.
Bainbridge, L. (1983). Ironies of Automation. Automatica, 19(6), 775-779.
Glikson, E., & Woolley, A. W. (2020). Human Trust in Artificial Intelligence: Review of Empirical Research. Academy of Management Annals, 14(2), 627-660.
Daminger, A. (2019). The Cognitive Dimension of Household Labor. American Sociological Review, 84(4), 609-633.
American Banker, “The evolution of the ATM”; CNBC, “5 fun facts about the ATM, in honor of its 50th birthday”; ATM Marketplace, “Timeline: The ATM’s history” (1967 Barclays, 1969 Chemical Bank, 1973 two thousand US ATMs, shared networks mid-1980s)
Wells Fargo History, “First in online banking” (May 1995 launch; ten thousand of 3.5 million customers using Prodigy; 450,000 weekly visits by mid-1997)
Pew Research Center, “Online Banking 2005”; Pew Research Center, “51% of U.S. Adults Bank Online” (2013) (adoption curve)
Federal Reserve Bulletin, “U.S. Households’ Access to and Use of Electronic Banking, 1989-2007”
Bolt Flight, “The Surprising History of Airplane Autopilot: How a 1912 Invention Revolutionized Aviation”; Wikipedia, “Autopilot” (1912 Sperry; 1914 Concours de la Sécurité en Aéroplane demonstration; autoland systems certified late 1960s, routine 1980s)
iRobot, “Sales of iRobot Roomba Vacuuming Robot Surpass 2 Million Units”; iRobot, “iRobot Achieves Milestone in Home Robot Market”; IEEE Spectrum, “iRobot Roomba History: How a Focus Group Changed It”; Wikipedia, “Roomba” (September 2002 launch; 1M units by 2004; 2M by 2006; 5M by 2010; 50M by 2025)
Bernard Marr, “Key Milestones Of Waymo - Google’s Self-Driving Cars”; Wikipedia, “Waymo” (2009 project start; 2016 Waymo spin-out; 2018 commercial launch in Phoenix; 2019 driverless service)

