
Job Description
About Tessel
Tessel is the evidence infrastructure for safety-critical AI — the system that proves a model works, and keeps proving it.
The hard part in safety-critical sectors isn't building models; it's proving to stakeholders that they are safe and effective. Today, regulators, buyers, and insurers increasingly demand evidence about AI behavior across approval, procurement, and reimbursement. AI vendors are chasing a moving target: there are no settled evidence requirements, and each stakeholder defines safety differently. The result is extensive back-and-forth and time spent on post-hoc model analysis that surfaces issues too late to address efficiently, producing work that doesn't compound and breaks down at scale. That friction is the tax on every AI product in healthcare, financial services, and autonomous systems, and it's only growing as regulation tightens.
Tessel closes that gap. We align AI development with the outcomes that matter to the business by providing the infrastructure and methodology to continuously investigate how a model behaves. This surfaces AI failures before they cost companies regulatory approval and customer trust, and generates the behavioral evidence needed to navigate regulatory, procurement, and reimbursement processes.
Role Overview
You'll work directly with diagnostic imaging AI vendors preparing 510(k) or De Novo submissions, and with academic medical centers building and deploying their own models under LDT pathways. You'll run evidence investigations into model behavior using an AI-native workflow. This involves rigorously analyzing data, probing the models' internal representations, and producing claims about model behavior backed by structured evidence.
You'll also iterate on the platform itself, rethinking the evidence standards and methodology we impose, such as the safety case abstractions we instantiate. You'll critique the current platform, write methodology proposals, and write Python platform code yourself. When frontend features need user testing before we commit to building them in production, you'll prototype them with AI tooling.
We're hiring for a Technical Staff role (the in-fashion term for an ML engineer), but it's really a founding-member position. We're looking for someone who can challenge our assumptions and work with us beyond the technical to build what we think is foundational infrastructure for the future of ML, starting with delivering real value to our customers.
Key Responsibilities
Own engagements end-to-end: run the customer meetings, scope the real question, investigate model behavior (data analysis, probing internal representations, building safety cases), and deliver findings you stand behind, including telling customers when a model isn't ready.
Improve the platform: write methodology proposals on evidence standards and safety case abstractions, ship Python platform code, and prototype frontend features with AI tooling for user testing.
Shape how engagements run as the team grows: cadences, standards, onboarding.
Skills & Experience
Required:
Strong data science and ML fundamentals. You understand the math behind the methods you use, not just the APIs, and you've applied them in real systems with messy data and unclear questions.
Python fluency end-to-end, from raw data through analysis to platform code others will use.
You make informed engineering decisions based on trade-offs: how to structure code, where to draw interface boundaries, when reusability is worth the cost. If you're not fully there yet, you show a strong willingness to get there.
You adapt technical communication to different audiences without losing precision, and you hold rigorous positions under commercial pressure. You're comfortable telling customers their model isn't ready when it isn't.
Fluent with AI-assisted workflows. Comfortable using AI tooling as a primary mode of working, not just an occasional helper.
Nice-to-Have:
Top-tier degree in CS with an AI track, ML, math, physics, engineering, or a related field, plus 3–5 years of professional ML experience or a PhD in a relevant area such as model evaluation, robustness, OOD detection, or interpretability.
Reviewed AI/ML medical devices at the FDA, or equivalent regulatory experience.
Strong background in safety case methodology from aviation, automotive, or another safety-critical field.
Performance Expectations
By month 3: leading 2 engagements independently, with findings the team would defend externally.
By month 6: running 4 concurrent engagements, and having contributed to and pushed back on platform features and evidence methodology in meaningful ways.
By month 12: setting internal standards across engagement methodology, evidence quality, and onboarding and coaching for new hires.
We'll refine these milestones together in your first weeks and track progress through 1:1s and quarterly formal reviews.
Job Details
- Category: Aerospace Engineering
- Employment Type: Full Time
- Location: Sunnyvale, CA
- Compensation: $150,000 - $220,000 per year
About Tessel
Tessel conducts rigorous, independent evaluations of medical imaging AI systems. We focus on bridging the gap between benchmark performance and clinical reliability by analyzing how models behave in real-world settings. We serve as an independent evaluator that both healthcare institutions and AI vendors trust to measure, explain, and monitor model performance.