The diplomat, the researcher, and the founder: three independent verdicts on local-first personal AI
Three people, three completely different vantage points, three different reasons to care, and over the past six months all three arrived at the same architectural answer. None of them was talking to the others. None of them had any reason to converge on the same picture. They did anyway.
That kind of convergence is rare. It is also the strongest signal you can get that an architecture is real and not a fashion. So I want to lay out who said what, why it lands, and what it means for anyone trying to figure out where personal AI is heading.
The diplomat
On 26 April 2026, Vivian Balakrishnan, Singapore's Minister for Foreign Affairs, posted publicly on Facebook about a system he had been quietly building. He called it a second brain for a diplomat. It runs on a Raspberry Pi on his desk. It compiles a knowledge graph from his speeches and articles over time. It answers questions about his work, drafts speeches, condenses information, and sits at the intersection of every conversation channel he uses. He wrote one line about how he feels about it that no marketing team could manufacture:
It has become invaluable – I don't dare switch it off.
The minister is a surgeon-turned-technocrat who has been in the Singapore cabinet for two decades. He is not a hobbyist. He is also not someone with anything to gain from endorsing one architecture over another. He is a senior user describing what is working for him, and what is working for him is a small reasoner running locally on inexpensive hardware, glued to a personal corpus that never leaves the device. (Original Facebook post.)
The architecture he describes is not Ostler. He is using NanoClaw, an open-source self-hosted Claude assistant by Gavriel Cohen, plus the LLM Wiki pattern from Andrej Karpathy. Different runtime, different physical hardware, different country, different use case. But the architectural decisions are the same as the ones we made: the reasoner is small and local, the memory is yours and lives on your hardware, and nothing material flows out of the device unless you choose to send it.
The thing that strikes me about his post is the operational tone. He is not arguing for the architecture. He is just using it, and the sentence that fell out of his typing was about not wanting to lose access to it. That is what category-defining feels like from inside.
The researcher
Six months before the minister's post, on 17 October 2025, Andrej Karpathy went on Dwarkesh Patel's podcast and laid out the cognitive-core argument in detail. Roughly 95% of a frontier model's weights are doing memorisation work that has nothing to do with reasoning. Split the two functions apart, and a small reasoner with a curated external memory beats a 1.8-trillion-parameter monolith on the work that actually matters to a single human.
The maths backs him up. Llama 3 compresses its training data at something like 0.07 bits per token. Well-structured English carries around 1.5 bits per token. The frontier model is holding a lossy compressed image of the open web, and most of that image is noise that you, the user, do not need to access. GPT-4o at roughly 200 billion parameters already outperforms the original GPT-4 at 1.8 trillion. Inference cost for GPT-3.5-level quality fell by a factor of 280 between 2022 and 2024. The trend line is clear and not slowing.
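The arithmetic behind those two figures can be spelled out. This is an illustrative back-of-envelope sketch using the article's approximate numbers, not measured values:

```python
import math

# The article's estimates: Llama 3 retains ~0.07 bits per token of its
# training data, while well-structured English carries ~1.5 bits per token.
web_bits_per_token = 0.07
english_bits_per_token = 1.5

# Fraction of the text's information the frontier model actually retains --
# the rest is lost to the lossy compression.
retained_fraction = web_bits_per_token / english_bits_per_token
print(f"information retained: {retained_fraction:.1%}")  # ~4.7%

# A 280x cost fall between 2022 and 2024 implies this halving period:
cost_factor, months = 280, 24
halving_months = months * math.log(2) / math.log(cost_factor)
print(f"implied cost-halving period: {halving_months:.1f} months")  # ~3.0
```

In other words, if the trend holds, the cost of a given quality level halves roughly every quarter, which is what makes a locally hosted reasoner viable on consumer hardware.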
Karpathy was making an architectural argument. He was not building a product. But the architecture he was describing is exactly what runs on the diplomat's Pi and on every Ostler customer's Mac. (Longer post on Karpathy's argument here.)
And he was not the only researcher saying this. Six months later, on 21 April 2026, OpenAI quietly released Privacy Filter as Apache-2.0 open weights. A 1.5-billion-parameter specialist whose entire job is to scrub PII from text on-device, designed to slot in front of any local pipeline. The cognitive-core thesis as a concrete deliverable, shipped by the company you would least expect to ship it. (Post here.)
A few days after that, a peer-reviewed paper from Nanjing University and ByteDance landed with a benchmark. PersonaVLM (arXiv:2604.13074): a 7-billion-parameter reasoner with a curated personalised memory beats GPT-4o by 5.2% on long-term personalisation tasks. Their memory taxonomy – core, semantic, procedural, episodic – maps almost cleanly onto the Personal World Graph that Ostler already builds. Their personality-evolution mechanism, a five-dimensional Big Five vector updated via exponential moving average across interactions, is a clean answer to a question I had been carrying in my own backlog.
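The personality-evolution mechanism is simple enough to sketch. This is a minimal illustration of a five-dimensional Big Five vector updated by exponential moving average; the trait ordering, the smoothing factor, and the function names here are my own assumptions, not taken from the paper:

```python
# Hedged sketch: an EMA update over a Big Five trait vector, as described
# for PersonaVLM. alpha is an assumed smoothing factor.

TRAITS = ("openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism")

def update_personality(current, observed, alpha=0.1):
    """EMA update: new = (1 - alpha) * current + alpha * observed."""
    return tuple((1 - alpha) * c + alpha * o
                 for c, o in zip(current, observed))

profile = (0.5,) * 5                   # neutral starting point
signal = (0.9, 0.5, 0.2, 0.7, 0.3)     # traits inferred from one interaction
profile = update_personality(profile, signal)
print(dict(zip(TRAITS, (round(v, 2) for v in profile))))
```

The appeal of the EMA form is that each interaction nudges the stored profile rather than overwriting it, so the vector drifts with the user over time while staying robust to any single noisy observation.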
One researcher articulating the architecture. One vendor shipping a primitive. One academic team benchmarking a piece. All independent. All converging.
The founder
I am the third vantage point, and the least authoritative of the three. I am not a researcher. I am not a senior user with two decades of public-service operational expertise. I am someone who founded Creative Machines in September 2025, looked at the cloud route for personal AI, and concluded that it could not be made to work on the privacy axis without compromise that I personally could not stomach. So I built the local-first version instead.
The pattern fell out of the constraints. Ostler runs a 9-billion-parameter reasoner on your Mac, glued to a personal knowledge graph that holds your contacts, calendar, messages, browsing, documents, and conversations as a structured memory. The reasoner takes a question, queries the memory, returns an answer. The reasoner is interchangeable; we will swap it for whatever is best every quarter. The memory is the thing the user owns and the thing the cloud route fundamentally cannot replicate.
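The reasoner-over-memory loop described above can be sketched in a few lines. Everything here is hypothetical: `PersonalGraph` and the keyword lookup are toy stand-ins, since Ostler's real interfaces are not public. The point of the shape is that the reasoner is a swappable function while the memory is the durable object:

```python
# Toy sketch of the pattern: a local reasoner querying a personal memory.
from dataclasses import dataclass, field

@dataclass
class PersonalGraph:
    """Stand-in for the on-device knowledge graph (assumed name)."""
    facts: dict = field(default_factory=dict)

    def query(self, question: str, k: int = 3):
        # Trivial keyword overlap in place of real graph retrieval.
        words = set(question.lower().split())
        hits = [v for key, v in self.facts.items()
                if words & set(key.lower().split())]
        return hits[:k]

def answer(question, graph, reason):
    """The loop: take a question, query the memory, let the reasoner respond."""
    context = graph.query(question)
    return reason(question, context)

graph = PersonalGraph({"dentist appointment": "Dentist on Tuesday at 09:00"})
reply = answer("when is my dentist appointment", graph,
               lambda q, ctx: ctx[0] if ctx else "No memory found.")
print(reply)  # → Dentist on Tuesday at 09:00
```

Because `reason` is just a parameter, swapping the model each quarter leaves the graph, and everything the user has accumulated in it, untouched.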
I did not see this as a thesis at the time. It was just the only architecture that satisfied the privacy guarantee I wanted. The fact that the most cited researcher in the field, the most data-rich frontier lab, a senior research team in China, and the foreign minister of one of Asia's most technocratic governments have now all independently arrived at the same architecture is not a victory lap. It is the strongest piece of validation I could have asked for, from people who have no reason to confirm a small founder's hypothesis.
Why convergence from disjoint vantage points matters
If the diplomat were a researcher, you could call it groupthink. If the researcher were a vendor, you could call it positioning. If the founder were anything other than a small operator with a budget you could fit on a credit card, you could call it well-funded marketing. None of those frames work when the agreement comes from three roles that have no operational overlap.
The diplomat is solving a real, daily, operational problem: how does a busy minister keep his own working memory up to date and accessible? He is not optimising for elegance. He is optimising for not having to switch the system off.
The researcher is solving an architectural problem: where in the stack does each function belong? He is optimising for the right factoring of the workload. The diplomat's use case never enters his analysis.
The founder, in this case me, is solving a constraint problem: what is the smallest set of architectural commitments that satisfy a strict privacy guarantee while still being a useful product. Different lens again.
And yet they all end up with: a small reasoner, running locally, talking to a curated personal memory that lives on the user's own hardware. Three problems, three lenses, one answer.
The contrast still holds
Every cloud-routed personal AI on the market is making the opposite bet. Apple is reportedly about to announce that Siri will route difficult queries through Google Gemini. Perplexity sells a "Personal Computer" product that ships your data to their servers. Poke raised twenty-five million dollars to put an iMessage assistant on the cloud. Every one of these products has a trillion-parameter model doing the reasoning and a remote database holding the memory. They are optimising the wrong half. The reasoner is shrinking by a factor of two every nine months. The memory is the part that should never have left your device in the first place.
If the architecture survives the scrutiny of three completely disjoint vantage points, the cloud-routed competitors are not just making a different bet. They are building on a foundation that is actively shrinking underneath them.
Where this leaves us
The honest version of this story is that the architecture was forced on me by privacy, validated by the cost-quality trend, articulated by a researcher, shipped as a primitive by a frontier lab, benchmarked by an academic team, and is now being used by a foreign minister to answer policy questions on a Raspberry Pi. We did not predict the validation cluster. We just kept building, and the validation kept arriving.
If that is the future you want to bet on, Ostler is the version of it that lives on your Mac. The product is in friends beta. The architecture is documented here.
Three independent verdicts. One architecture.
A small reasoner. Your life as memory. Ostler runs on your Mac.
Why Ostler?
Questions, corrections, disagreements – hello@ostler.ai.