OpenAI shipped an open-weight PII model. We are wiring it in.

Andy Massey · 25 April 2026 · Hong Kong

On 21 April 2026, OpenAI quietly released something called Privacy Filter. It is a 1.5-billion-parameter mixture-of-experts model that detects personally identifiable information in text and redacts it. It runs locally. It is Apache 2.0. It hits 96% F1 on the standard PII-masking benchmark. The entire model, weights and all, sits on Hugging Face and GitHub for anyone to use.

I want to talk about what that release means, and why we are adding it to Ostler's pipeline this week.

What the thing actually is

The model is small by frontier standards. 1.5 billion total parameters, only 50 million active at inference time thanks to the mixture-of-experts design. 128k context window, which is generous for a task that usually operates on short strings. It is a bidirectional token classifier with span decoding, which is the right architecture for this job – it marks the start and end of every private span in the text rather than rewriting the whole string and hoping.

Eight categories out of the box: private person, private address, private email, private phone, private URL, private date, account number, and generic secret. It is fine-tunable with small amounts of domain-specific data – reportedly enough to lift F1 on a new domain from around 54% to 96% without much effort.
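
To make the span-decoding point concrete, here is roughly what that output looks like downstream. The category list is from the release; the Swift types and the `redact` function are my sketch of a consumer, not the model's actual interface.

```swift
enum PIICategory: String {
    case privatePerson, privateAddress, privateEmail, privatePhone
    case privateURL, privateDate, accountNumber, genericSecret
}

struct PIISpan {
    let start: Int          // character offset where the private span begins
    let end: Int            // exclusive end offset
    let category: PIICategory
}

/// Splice a category placeholder over each flagged span. Assumes spans are
/// non-overlapping, which is what a span decoder emits.
func redact(_ text: String, spans: [PIISpan]) -> String {
    var out = ""
    var cursor = text.startIndex
    for span in spans.sorted(by: { $0.start < $1.start }) {
        let lo = text.index(text.startIndex, offsetBy: span.start)
        let hi = text.index(text.startIndex, offsetBy: span.end)
        out += text[cursor..<lo]
        out += "[\(span.category.rawValue)]"
        cursor = hi
    }
    out += text[cursor...]
    return out
}

// "John Smith" sits at character offsets 9..<19 in this string.
print(redact("Meet me, John Smith, at noon",
             spans: [PIISpan(start: 9, end: 19, category: .privatePerson)]))
// → Meet me, [privatePerson], at noon
```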

On a consumer Mac, this thing is nearly free to run. The full 1.5 billion weights fit comfortably in RAM at typical quantisation levels, and with only 50 million parameters active per token, each scrub costs a sliver of compute. You can scrub a 10,000-character payload before a web request finishes loading.
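
Back-of-envelope, with the quantisation level and characters-per-token ratio as my assumptions rather than published figures:

```swift
// Rough cost of one 10,000-character scrub. The 4-bit quantisation and
// ~4 chars/token are assumptions, not numbers from the model card.
let totalParams  = 1_500_000_000.0               // 1.5B weights resident in RAM
let residentMB   = totalParams * 0.5 / 1e6       // ≈ 750 MB at 4-bit
let activeParams = 50_000_000.0                  // 50M active per token (MoE)
let tokens       = 10_000.0 / 4.0                // ≈ 2,500 tokens per payload
let tflops       = 2 * activeParams * tokens / 1e12  // ≈ 0.25 TFLOP per pass
print(residentMB, tflops)  // well within what Apple-silicon Macs do per second
```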

Why OpenAI releasing it matters

Set aside the product for a minute. The strategic signal here is loud.

OpenAI's commercial business is built on you sending them your text. That is the unit economics. Every API call is revenue. If PII detection needed to happen at their edge, on their servers, in their cloud, that is where they would have built it – and they would have charged you per-token for the privilege.

They did not. They shipped it as open weights, under a permissive licence, designed to run on your own machine, with explicit documentation saying it is for de-identifying data before it leaves the device.

When the largest cloud-AI company on the planet ships you a model specifically designed to stop your data reaching cloud AI companies, pay attention. They are telling you where they think the industry is going.

This is the same thesis that Karpathy articulated on Dwarkesh on 17 October 2025. Small models, narrowly specialised, running locally, beat big models doing everything. Six months later OpenAI shipped a concrete example of the pattern. And a few days after that, a peer-reviewed paper from Nanjing University and ByteDance ("PersonaVLM", arXiv:2604.13074) landed with the same architectural argument and a benchmark behind it – a 7-billion-parameter reasoner with curated personalised memory beating GPT-4o by 5.2% on long-term personalisation tasks. Three independent endorsements – one researcher in October, one frontier lab and one academic team in April – for the bet we already placed.

What we are doing with it

Ostler has always been local-first. Your data does not leave your Mac unless you explicitly ask it to, and even then it travels through a payload viewer that shows you every scalar before it is sent.

Privacy Filter slots in as an extra belt on top of the braces. We are wiring it into three places.

One: ingest. When Ostler imports a GDPR export, or reads your browser history, or extracts facts from a recorded conversation, there are edge cases where third-party names slip into places the user did not expect. Running Privacy Filter over the extracted facts lets us flag "this note mentions someone who is not in your graph yet, do you want to store their details?" instead of quietly indexing them. That is a user-consent question Ostler is now in a position to ask.
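
A sketch of that gate – the detector protocol and graph lookup are hypothetical stand-ins, but the fail-to-consent behaviour is the actual plan:

```swift
protocol PIIDetector {
    /// The person-name substrings the model flags in `text`.
    func personNames(in text: String) -> [String]
}

struct ContactGraph {
    let knownNames: Set<String>
}

enum IngestDecision {
    case store(fact: String)
    case askConsent(fact: String, unknownPeople: [String])
}

/// Runs before any extracted fact is indexed: known names pass straight
/// through, unknown third parties become a consent question instead of
/// a silent write.
func gateFact(_ fact: String,
              detector: any PIIDetector,
              graph: ContactGraph) -> IngestDecision {
    let unknown = detector.personNames(in: fact)
        .filter { !graph.knownNames.contains($0) }
    return unknown.isEmpty
        ? .store(fact: fact)
        : .askConsent(fact: fact, unknownPeople: unknown)
}
```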

Two: the Doctor diagnostic bundle. When something breaks and you send us a support bundle, we already sanitise it. Today that is a hand-maintained allowlist of acceptable scalar types. Tomorrow it is the allowlist plus Privacy Filter running over every string in the bundle before compression. Anything the model flags gets redacted and shown to you in a before-and-after view. You approve the send, or you cancel it. Nothing goes without your eye on it.
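
Roughly, the pass looks like this – the allowlist and redactor are stand-in closures, while the before-and-after pairing for your review is the real design:

```swift
struct Redaction {
    let field: String
    let before: String
    let after: String
}

/// Allowlist first, model second. Every string the model changes is kept
/// as a before/after pair so the user reviews exactly what was redacted
/// before anything is compressed and sent.
func sanitiseBundle(
    _ bundle: [String: String],
    allowlisted: (String) -> Bool,     // known-safe scalar patterns
    redact: (String) -> String         // Privacy Filter pass
) -> (clean: [String: String], review: [Redaction]) {
    var clean: [String: String] = [:]
    var review: [Redaction] = []
    for (field, value) in bundle {
        if allowlisted(value) {
            clean[field] = value       // matches a safe pattern, untouched
        } else {
            let scrubbed = redact(value)
            clean[field] = scrubbed
            if scrubbed != value {
                review.append(Redaction(field: field, before: value, after: scrubbed))
            }
        }
    }
    return (clean, review)
}
```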

Three: cloud routing, if and when. Ostler does not currently route any query to cloud LLMs. At some point we may add an opt-in pathway for queries that are obviously public – "what year did the Berlin Wall come down" – where a cloud model gives a better answer and there is no personal data involved. Privacy Filter becomes the pre-flight gate. If it detects anything private in the query, the route is aborted, the query runs locally, and you are shown why. No silent leaks.
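
A sketch of the gate, fail-closed by construction – the types are mine:

```swift
enum Route {
    case cloud(query: String)                  // verified clean, may leave the Mac
    case local(query: String, reason: String)  // aborted route, reason shown to user
}

/// Fail closed: anything the detector flags keeps the query on-device,
/// and the reason is surfaced rather than swallowed. `detect` stands in
/// for a Privacy Filter pass returning flagged category names.
func preflight(_ query: String, detect: (String) -> [String]) -> Route {
    let flagged = detect(query)
    guard flagged.isEmpty else {
        return .local(query: query,
                      reason: "Privacy Filter flagged: \(flagged.joined(separator: ", "))")
    }
    return .cloud(query: query)
}

// An obviously public query with nothing flagged routes to cloud;
// anything else stays local with the why attached.
print(preflight("what year did the Berlin Wall come down", detect: { _ in [] }))
```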

The engineering work is not huge. The integration scoping doc I drafted this week puts Phase 1 at under three engineer-days. The bigger work is the UI around it – the before-and-after viewer, the consent flows, the audit trail – because that is where user trust is actually earned.

The belt-and-braces argument

Privacy Filter is not a silver bullet. 96% F1 still means the model misses some spans – recall is not perfect – and over a long enough timescale a missed span is a real failure mode. We are not replacing our existing controls with it. We are stacking it on top.

The architecture goes: allowlist first, model second, user eyes third. No single layer failing is enough to leak your data. The allowlist rejects anything that does not match a known safe pattern. The model flags natural-language PII the allowlist cannot reason about. The viewer shows you the final payload before it leaves the machine. Three failures have to stack up for a leak to happen, and the third one is you clicking "send" on something you can read.
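
In code terms, the layering is a chain of veto gates – names and types are mine, but the property it encodes is the one that matters: a layer can block or narrow a payload, never widen it.

```swift
enum Verdict {
    case pass(payload: String)    // possibly narrowed, never expanded
    case block(reason: String)
}

typealias Layer = (String) -> Verdict

/// Runs allowlist, then model, then the human viewer, in that order.
/// The first block wins; nothing leaves unless every layer passes.
func outboundGate(_ payload: String, layers: [Layer]) -> Verdict {
    var current = payload
    for layer in layers {
        switch layer(current) {
        case .pass(let narrowed): current = narrowed
        case .block(let reason):  return .block(reason: reason)
        }
    }
    return .pass(payload: current)
}
```

The viewer is just the last layer in the array – the one whose pass verdict is your click on send.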

That is a different trust model from "we pinky-promise our cloud is secure." The interesting question is not whether our layers are perfect. It is whether our layers plus your attention are more trustworthy than someone else's server plus their privacy policy. I think they obviously are. I built Ostler on that belief.

What this says about the next twelve months

Open-weight specialist models are going to flood the market over the next year. Privacy Filter is one. There will be small local models for code analysis, for document classification, for medical redaction, for accessibility transcription. Every one of them is a piece of infrastructure that a local-first product can compose into its pipeline for roughly the cost of the RAM to hold them.

The gap between "local AI" and "cloud AI" used to be a capability gap. It is becoming an assembly gap. Who has the best pipeline of small, composable, local models doing narrow jobs really well? Not who has the biggest single model.

We have a head start on that pipeline. OpenAI just added a good part to it for free.

Privacy is an architecture, not a promise

Ostler keeps your data on your Mac. See how the pipeline fits together.


Thoughts, questions, or corrections to the pipeline design – hello@ostler.ai.