White Circle: From Universal Jailbreak to $11M AI Safety Startup

The Jailbreak That Became a Company

One evening in late 2024, Denis Shilov wrote a single prompt that broke every major AI model he tried. The instruction was simple: stop acting like a chatbot with safety rules and instead behave like an API endpoint, a piece of software that accepts requests and returns responses without deciding whether to refuse them. It worked. ChatGPT, Claude, and other models complied with questions they were designed to reject.

Researchers call this a universal jailbreak: a single prompt that bypasses the guardrails of many different models and elicits prohibited outputs. Shilov posted his findings on X, where, according to Today's Startup News, the post reached 1.4 million views.

That virality was not just a distribution event; it was a credential. The post drew direct attention from Anthropic and Hugging Face, leading Shilov to join Anthropic's bug bounty program. Those conversations made something clear to him: the problem extended past finding clever prompts. "Jailbreaks are just one part of the problem," Shilov said. "In as many ways people can misbehave, models can misbehave too. Because these models are very smart, they can do a lot more harm."

When the person who exposed a fundamental flaw in your product shows up offering to build the fix, you pay attention. The $11 million seed round announced on May 12, 2026, backed by operators from OpenAI, Anthropic, DeepMind, Mistral, and Hugging Face, is the financial manifestation of that attention. But for engineers evaluating whether to join White Circle, the jailbreak is the real pitch. It proves the founder identified a structural weakness in how the largest AI labs build their safety systems, and it gives the company a technical origin story that no amount of venture funding can manufacture.

The Investor Roster Is the Hiring Pipeline

The seed round's investor list reads less like a cap table and more like a who's-who of the labs whose models White Circle is built to police. Romain Huet from OpenAI, Durk Kingma (ex-OpenAI, now at Anthropic), Guillaume Lample from Mistral, Thomas Wolf from Hugging Face, Mehdi Ghissassi and Paige Bailey from DeepMind, Olivier Pomel from Datadog, François Chollet (Keras), and David Cramer from Sentry all put personal money into the round, per the company's May 2026 announcement.

These are not passive checks. The investors are operators who see the production guardrail problem from the inside every day. When the people building the models also fund the layer that monitors them, the signal to potential engineering hires is direct: this is where the industry is heading, and the people who know the models best are betting on it.

The dynamic works both ways. White Circle's investors function as early design partners. They know exactly where their own production deployments break, and they have a financial stake in White Circle fixing it. For a safety engineer weighing offers between a model lab's internal alignment team and a startup, that operator-led roster is a recruiting tool. It says the company has access to real production constraints, not just research abstractions.

The hiring implications are concrete. White Circle plans to use the $11M to expand its team across the US, UK, and Europe, and the investor network doubles as a talent referral engine. Zero G Talent's board data shows OpenAI alone added 62 roles in the past week; Anthropic added 30. Engineers at those labs who want to move from building models to controlling them in production now have a direct line, since the same people writing the checks are the ones who can make the introduction.

Ophelia Cai, partner at Tiny VC, which also participated in the round, framed it plainly: the team has "deep technical credibility and a clear commercial instinct." That combination is what makes the investor-hiring pipeline credible. These aren't brand-name endorsements. They're operators who build the models, saw the gap, and funded the fix.

CircleGuardBench: A Shared Measuring Stick

White Circle open-sourced CircleGuardBench on GitHub in May 2025, and the move does something most AI safety startups never attempt: it gives the entire field a shared measuring stick before the company has shipped a production product. The benchmark evaluates large language model guard systems across 17 harm categories, from violence and cybercrime to jailbreaking and self-harm, and folds in adversarial prompt variations, false-positive rates on safe inputs, and runtime latency. Most benchmarks stop at accuracy. CircleGuardBench combines accuracy, attack robustness, and speed into a single integral score.
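To make the idea of a single integral score concrete, the sketch below folds detection accuracy, jailbreak robustness, false positives, and latency into one number. This is not CircleGuardBench's actual formula: the `GuardResult` schema, the `integral_score` function, the 0.5-second latency budget, and the multiplicative penalty are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class GuardResult:
    """Aggregate measurements for one guard model (hypothetical schema)."""
    harmful_recall: float    # share of harmful prompts blocked, 0..1
    jailbreak_recall: float  # share of adversarial variants blocked, 0..1
    safe_pass_rate: float    # share of safe prompts allowed (1 - false-positive rate)
    avg_latency_s: float     # mean seconds per request


def integral_score(r: GuardResult, latency_budget_s: float = 0.5) -> float:
    """Illustrative composite: mean accuracy scaled by a latency penalty.

    Assumption: models at or under the budget take no penalty; slower
    models are scaled down in proportion to how far they exceed it.
    """
    accuracy = (r.harmful_recall + r.jailbreak_recall + r.safe_pass_rate) / 3
    if r.avg_latency_s <= 0:
        penalty = 1.0
    else:
        penalty = min(1.0, latency_budget_s / r.avg_latency_s)
    return accuracy * penalty


# Made-up numbers: a quick, decent guard versus a slow, very accurate one.
fast = GuardResult(0.90, 0.80, 0.95, avg_latency_s=0.1)
slow = GuardResult(0.99, 0.95, 0.97, avg_latency_s=2.0)
print(integral_score(fast), integral_score(slow))
```

Under these assumed weights the faster guard scores higher despite lower raw accuracy, which is the trade-off a combined score is meant to surface.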

That design choice signals where White Circle thinks the actual bottleneck is. A guard model that catches 99% of harmful prompts but adds 800 milliseconds of latency per request will get bypassed by any product team shipping real-time chat. The benchmark's leaderboard, hosted as a Hugging Face Space, ranks models on macro-averaged metrics across default prompts, jailbreak attempts, and per-category breakdowns. The public dataset sits on Hugging Face under White Circle's Responsible Use License, which restricts commercial model training and redistribution without written consent.
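Macro-averaging means computing a metric per category and then averaging those values with equal weight, so a small harm category counts as much as a large one. A minimal sketch follows; the category names and verdicts are invented for illustration and are not taken from the benchmark's dataset.

```python
from collections import defaultdict


def macro_average_accuracy(records):
    """records: iterable of (category, is_correct) pairs.

    Returns (per-category accuracy, unweighted mean across categories).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, is_correct in records:
        total[category] += 1
        correct[category] += int(is_correct)
    if not total:
        raise ValueError("no records to average")
    per_category = {c: correct[c] / total[c] for c in total}
    macro = sum(per_category.values()) / len(per_category)
    return per_category, macro


# Hypothetical verdicts: one large category and one small one.
records = (
    [("violence", True)] * 9 + [("violence", False)] * 1
    + [("self_harm", True)] * 1 + [("self_harm", False)] * 1
)
per_category, macro = macro_average_accuracy(records)
print(per_category, macro)
```

Pooling all twelve verdicts would give 10 correct out of 12, while the macro average weights the weaker small category equally and comes out lower, so a guard cannot hide a weak category behind a large, easy one.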

The engineering details matter here. CircleGuardBench supports four inference engines: OpenAI's API, vLLM, SGLang, and Hugging Face Transformers. The repo's commit history shows active maintenance through early 2026, including a March 2026 pull request that stripped heavy dependencies like vLLM and PyTorch from the base install to keep the Hugging Face Space build functional on CPU-only Docker. The project has 70 stars and 5 forks on GitHub, modest numbers that understate its utility: this is infrastructure, not a viral demo.

When a safety company releases a public benchmark, it's making a claim that the problem is bigger than any single vendor's solution, and that the company's long-term value lies in being the layer everyone builds on top of. That's a different bet than joining Anthropic's internal safety team or OpenAI's alignment division, where the work feeds one model lab's product roadmap. At White Circle, the bet is that guardrails become a horizontal discipline, and the engineers building the evaluation tooling today are defining the standards the rest of the industry will measure against.

For anyone evaluating offers across the AI safety space, the distinction is concrete. Model labs treat safety as a constraint on their product. A company like White Circle, which literally published the ruler everyone else will get measured with, treats safety as the product. The CircleGuardBench GitHub repo is the proof.

The Hiring Blitz: Mission Control for Enterprise AI

White Circle's $11M seed isn't going into compute clusters or research papers. It's going into headcount, specifically the kind of engineering talent that turns AI safety from a whiteboard exercise into something that runs inside a company's live product.

The roles on offer tell you exactly what "production-grade guardrails" means in practice. This isn't a lab hiring alignment theorists to write papers about catastrophic risk. It's a company building a mission-control layer for enterprise AI, and the job postings read like those of an infrastructure startup, not a think tank.

What "Mission Control for AI" Actually Means

When White Circle talks about mission control, the term maps to a concrete engineering problem. Enterprises deploying agentic AI (systems that take actions, call tools, move money, or adjust inventory) need a real-time oversight layer that sits between the model and the action. That layer has to do four things simultaneously: detect when the model is drifting off-policy, block harmful outputs before they reach the user or the database, log every decision for compliance, and do all of it with latency low enough that it doesn't break the product.
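The four duties described above can be sketched as a thin wrapper around a model's output. Everything here is hypothetical: `MissionControl`, `Decision`, and the toy `no_card_numbers` detector are illustrative names, not White Circle's product or API, and a real detector would be a classifier rather than a digit count.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Decision:
    allowed: bool
    reason: str
    latency_ms: float


@dataclass
class MissionControl:
    """Hypothetical oversight layer: detect, block, log, and time each check."""
    detectors: List[Callable[[str], str]]  # each returns "" or a violation reason
    audit_log: List[dict] = field(default_factory=list)

    def review(self, output: str) -> Decision:
        start = time.perf_counter()
        # Detect: run checks lazily and stop at the first violation found.
        reason = next((r for r in (d(output) for d in self.detectors) if r), "")
        latency_ms = (time.perf_counter() - start) * 1000
        # Block: the caller only forwards the output when `allowed` is True.
        decision = Decision(allowed=not reason, reason=reason or "ok", latency_ms=latency_ms)
        # Log: record every decision, allowed or not, for later audit.
        self.audit_log.append({"output": output, **decision.__dict__})
        return decision


def no_card_numbers(text: str) -> str:
    """Toy detector: flags any run of exactly 16 consecutive digits."""
    runs = "".join(ch if ch.isdigit() else " " for ch in text).split()
    return "possible card number" if any(len(run) == 16 for run in runs) else ""


mc = MissionControl(detectors=[no_card_numbers])
print(mc.review("Your order has shipped.").allowed)
print(mc.review("Card on file: 4111111111111111").allowed)
```

The first call is allowed and the second is blocked, and both land in `audit_log`. The sketch only measures latency; enforcing a budget (for example, failing closed when a check runs too long) is a separate design decision.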

That's a systems engineering problem. It requires people who know how to build low-latency inference pipelines, how to write detection logic that runs alongside a model call, how to integrate guardrail APIs into existing enterprise stacks, and how to validate that the whole thing works at scale. The safety part matters, since you need to understand adversarial prompting, jailbreak vectors, and output classification, but it's applied safety, the kind that ships.

Three Talent Clusters Driving the Buildout

Based on the company's stated direction and the pattern of roles emerging in the AI safety infrastructure space, White Circle's hiring push centers on three clusters of talent.

Guardrail engineering. These are the people building the actual detection and filtering systems, the middleware that intercepts model outputs and decides in milliseconds whether something passes. The work requires fluency in natural language processing, adversarial testing, and the kind of latency-sensitive systems design you'd find at a database company. Salaries for this profile at well-funded startups run from $250K to $385K in San Francisco, a range reflected across the board at companies like OpenAI and Anthropic, where safety-adjacent engineering roles are priced at parity with core model work.

Enterprise deployment. Getting guardrails into a Fortune 500 requires integration engineers who understand API gateways, identity management, and the compliance frameworks (SOC 2, HIPAA) that procurement teams demand. Mistral's recent hiring in Seoul for applied AI and fullstack deployment roles signals that this isn't just a U.S. trend; the demand for engineers who can ship safety infrastructure into production is going global.

Red team and benchmark development. CircleGuardBench exists because the industry needed a standard way to measure guardrail performance. Maintaining and evolving that benchmark requires people who can design adversarial test suites, curate evaluation datasets, and turn red-team findings into engineering requirements. It's the feedback loop between attack and defense, and it's a role that barely existed as a job title three years ago.

Why This Hiring Wave Matters Beyond One Startup

The pattern here is bigger than White Circle. Zero G Talent's data shows OpenAI added 62 roles in the past week alone, including a Device Safety & Risk Operations Specialist and a Government and Community Affairs Manager, positions that reflect the operational reality of running AI systems that touch real users. Anthropic posted 30 roles in the same window, spanning applied AI architects and editorial staff focused on model behavior. Mistral is building out an applied AI team in Seoul.

Across the board, the AI safety and alignment field is shifting from a research discipline to an engineering one. The job postings prove it: companies aren't just hiring people to think about safety. They're hiring people to build it, deploy it, monitor it, and keep it running when the system is live at 3 a.m. and a model is doing something it shouldn't.

White Circle's bet is that this production layer, the mission control, is a distinct product category, not a feature that model labs will build themselves. If that bet is right, the engineers joining now are getting in at the point where AI safety stops being a research problem and starts being an industry.

Why Enterprise AI Needs a Mission-Control Layer

The bottleneck in enterprise AI has shifted. The models work. The APIs are stable. The remaining problem is that nobody has built the layer between a capable model and a production workflow that can say "no" reliably and log why.

Companies deploying agentic systems into customer-facing or regulated workflows are hitting the same wall: the model can do the task, but the organization can't verify, constrain, or audit what it does at runtime. Existing guardrail approaches (prompt-level instructions, post-hoc filtering, human-in-the-loop review queues) break down the moment an agent chains multiple tool calls or operates across a session longer than a few turns. The failure mode isn't the model hallucinating once. It's the model taking a sequence of individually reasonable actions that, in aggregate, violate a policy nobody encoded.

This is the gap White Circle is targeting. The company's pitch is that enterprise AI needs a mission-control layer, a runtime enforcement system that sits between the agent and the tools it can access, evaluating each action against a defined policy before execution, not after. That positioning matters because it reframes safety from a research problem (how do we align the model?) to an infrastructure problem (how do we constrain the system the model operates inside?). The second problem has engineering answers: policy engines, deterministic guardrails, audit logs, override mechanisms. The first one doesn't, yet.
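The pre-execution check and the aggregate failure mode can be sketched together. In the example below, a hypothetical `SessionPolicy` approves each refund on its own terms but also tracks a running total, so a chain of small, individually acceptable actions is stopped once it would cross a session limit. The class, the limits, and the `issue_refund` tool name are illustrative assumptions, not White Circle's design.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SessionPolicy:
    """Hypothetical policy: cap each refund and the session's running total."""
    per_action_limit: float = 100.0
    session_limit: float = 250.0
    spent: float = 0.0
    audit_log: List[Tuple[str, float, str]] = field(default_factory=list)

    def authorize(self, tool: str, amount: float) -> bool:
        """Decide before execution; state only changes when the action is allowed."""
        if amount > self.per_action_limit:
            verdict = "deny: per-action limit"
        elif self.spent + amount > self.session_limit:
            verdict = "deny: session limit"
        else:
            verdict = "allow"
            self.spent += amount
        # Every decision is logged, including denials.
        self.audit_log.append((tool, amount, verdict))
        return verdict == "allow"


policy = SessionPolicy()
# Each 90.0 refund passes the per-action check; the third would bring the total to 270.0.
results = [policy.authorize("issue_refund", 90.0) for _ in range(3)]
print(results)
for entry in policy.audit_log:
    print(entry)
```

The check is deterministic and runs before the tool call, so the denial reason in the log explains why the third refund never executed.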

The hiring data supports the thesis that this layer is where the work is moving. OpenAI posted a Device Safety & Risk Operations Specialist role in the past week. Anthropic is hiring an Applied AI Architect for partnerships, a role that implies deploying models into external systems with constraints attached. Mistral's recent listings in Seoul are all deployment-focused: AI Deployment Strategist, Applied AI Engineer, Fullstack. The job titles tell the story. The industry is staffing for the layer between model and production, not just for better models.

White Circle's bet is that this layer can't be built inside any single model lab, because it has to work across models, including ones the lab didn't make. That's the structural reason a standalone company can exist here where a research team inside Anthropic or OpenAI can't fully replace it. A mission-control layer that only works on one model's outputs is a feature. One that works across OpenAI, Anthropic, Mistral, and open-weight models is infrastructure. The $11M seed, drawn from operators at all of those labs, is a bet that the infrastructure play is the one worth making.

Safety as Infrastructure vs. Safety as Research

White Circle sits in a crowded field, but the company's thesis is structurally different from most of what's competing for the same engineering talent. The distinction comes down to where safety lives in the stack and who it's built for.

Anthropic's internal safety teams and OpenAI's alignment divisions operate as research functions inside model labs. Their work feeds back into the models themselves: red-teaming, constitutional training, refusal tuning, interpretability research. The output is a safer model, shipped by the lab. Engineers who join those teams are joining a research org that happens to sit inside a product company. The work is deep, but it's upstream; it ends when the model ships.

White Circle is building downstream. The company's bet is that model-level safety is necessary but not sufficient, and that the real gap is at the deployment layer, the point where an enterprise plugs an AI agent into a customer-support queue, a financial-approval pipeline, or a healthcare triage system. That layer needs guardrails that are configurable, auditable, and benchmarked against real production scenarios, not just research benchmarks. CircleGuardBench is the signal of that positioning: an open-source benchmark implies the company wants to become a standard, not a proprietary feature.

The investor composition reinforces the split. OpenAI and Anthropic are hiring aggressively for their own safety and alignment roles. Zero G Talent's board shows OpenAI with the same 62 roles added in the past week alone, including a Device Safety & Risk Operations Specialist and an Agent Post-Training Context Research position, while Anthropic posted 30. Those are research-heavy roles inside the model labs. White Circle's backers come from the same companies, but they're writing checks as individuals, not as corporate strategy arms. That operator-led funding pattern, with engineers and leaders from OpenAI, Anthropic, DeepMind, Mistral, and Hugging Face investing personal capital, suggests they see a gap the labs themselves aren't structured to fill.

The competitive implication for engineers is concrete. Joining Anthropic's safety team means working on the next model's alignment. Joining White Circle means building the layer that sits between that model and the enterprise that deploys it. One is research. The other is infrastructure. The talent market is starting to price that difference.

