There's a moment, usually in a car or on a walk, when you remember something important. Something you absolutely cannot forget. A thing you've been meaning to do, a call you've been putting off, an idea that finally crystallised after three days of vague background processing.

And then you have a small, familiar crisis. Do you stop to type it? Risk it evaporating while you unlock your phone, find the app, and hunt for the right place to put it? Or do you just try to hold it — add it to the already crowded mental pile and hope for the best?

Most of us choose the pile. Which is a shame, because the alternative — just saying it out loud — is faster, lower effort, and, increasingly, more accurate than typing. We've had the technology for years. We're just weirdly resistant to using it.

The numbers are more striking than you'd expect

The speed case for voice input is not subtle. Research from Stanford University found that voice input is roughly three times faster than typing on a smartphone: 161 words per minute for speech versus around 53 for keyboard entry. In everyday use, the average person speaks at about 150 words per minute and types at about 40. That's not a marginal difference. It's the difference between capturing a thought at the speed it arrives and transcribing it long after the moment has passed.

150 wpm Average speaking speed
40 wpm Average typing speed
3x Speed advantage of voice over typing on mobile (Stanford)

And yet, despite this, typing remains the overwhelming default for capturing thoughts, notes, and tasks. We carry devices with Siri, Google Assistant, and Alexa built in — Siri alone has 86.5 million users in the United States — and most of us use them to set timers and ask about the weather, then go back to thumbing things out letter by letter for everything that actually matters.

The gap between capability and behaviour is the whole interesting story here.

We're already doing it more than we think

Here's what's easy to miss: voice input has already become a genuine habit for a significant chunk of the population — just not for the things that would help most.

Around 52% of people use voice search daily or almost daily. 71% of consumers say they prefer voice queries over typing when possible. There are now more than 8.4 billion voice-enabled devices in use globally — more than there are people on Earth. The infrastructure is there. The habit, in a limited form, is already forming.

But there's a pattern in how people actually use it. Voice tends to get deployed for low-stakes, transactional requests — "hey Siri, what time is it," "Alexa, add milk to my shopping list," "okay Google, navigate home." The moment the task gets more personal, more complex, or more likely to be overheard, most people reach for the keyboard instead.

Which tells you something important: the barrier to voice input isn't technological. The technology is good. Recognition accuracy has reached 93–95% across major platforms. The barrier is psychological.

The embarrassment problem nobody talks about

Research on voice assistant adoption consistently surfaces one finding that doesn't get nearly enough attention: people feel self-conscious talking to devices, especially in the presence of others.

Studies show that infrequent users in particular feel socially embarrassed about using speech-based assistants when other people are present. There's something about speaking aloud to a phone that still reads as slightly odd, performative in a way that tapping doesn't. We've absorbed a set of social norms around when it's appropriate to use your voice in public, and "dictating your to-do list on the bus" doesn't yet fit comfortably within them.

This isn't irrational. It's a real social calculation. The irony is that as voice input becomes more common, with more people doing it in more contexts, the self-consciousness tends to dissolve. The people who pushed through the awkward early phase consistently report that it stopped feeling strange faster than they expected.

There's also a subtler version of the embarrassment problem that operates even in private. Saying something out loud makes it feel more real, more committed, more observable — even if no one is listening. For capturing difficult thoughts or admitting to yourself that you've been avoiding something, the voice can feel higher stakes than the keyboard. It's worth knowing that feeling is normal, and that it fades.

What voice actually does to your thinking

Beyond speed, there's something more interesting happening when you use voice to capture thoughts rather than type them.

When you type, there's a cognitive bottleneck between what you're thinking and what gets recorded. Your brain is generating ideas at speaking pace — roughly 150 words per minute — but your fingers can only keep up with a fraction of that. Something has to give. Usually it's nuance, or completeness, or the half-formed idea you haven't quite articulated yet. You simplify in order to transcribe.

Voice removes that bottleneck. You capture the thought at something closer to the speed it actually exists in your head. The idea doesn't have to survive the compression of typing — it arrives intact.

93.7% Accuracy rate of modern voice assistant responses. The technology has quietly crossed the threshold where it's reliable enough to trust — most of us just haven't updated our assumptions about it yet.

This matters particularly for the kind of thoughts that make up a mental load: half-finished tasks, things to remember, things to follow up on. These are often not fully formed ideas. They're fragments. Typing them forces premature articulation. Speaking them lets them exist in the form they actually take in your head — messy, quick, incomplete in a way that's still useful.

What happens when you make it easy enough

The research on friction and behaviour change is pretty clear: small reductions in effort produce disproportionate changes in behaviour. A habit that requires five steps is vastly less likely to stick than one that requires two, even if the five-step version is objectively better in every other way.

This is the real reason voice input hasn't fully displaced typing for thought capture, even though it's faster. The existing options — voice memos, Siri, Alexa — require enough setup, navigation, or context-switching that the friction still wins. You reach for the keyboard because it's already open, already familiar, already there.

The promise of voice input — for task capture, for mental load, for all the things people are trying to hold in their heads — only really delivers when the friction gets low enough that the moment between "thought" and "captured" collapses almost completely. One tap. Say it. Done. No navigation, no categorisation in real time, no decisions required.

Which is why we built Carry around voice

Carry is a mental load app. Its job is to get the things in your head out of your head — tasks, reminders, things to follow up on, all the fragments of ongoing life that compete for working memory and lose.

We could have built it around typing. Most apps do. But the research pointed somewhere else: voice is faster, lower friction, and better suited to capturing thoughts in the form they actually arrive — which is usually while you're doing something else entirely. On a walk. Driving. Mid-conversation when something surfaces unexpectedly.

The primary input in Carry is voice. You tap, you speak, and the app handles the rest — categorising, organising, surfacing things back at the right moment. The goal was to make the gap between "I just remembered something" and "that thing is captured and I can stop holding it" as small as possible.

Because the mental load problem isn't really about organisation. It's about the cognitive cost of carrying things around unexternalised. And the only way to reduce that cost is to make externalisation easy enough that you actually do it — in the moment, every time, without it becoming another task on the list.

Voice, done right, is the closest thing we've found to capturing a thought at the speed of thought. The technology got there a while ago. The habit is just catching up.