
Job Description
About the Role
We’re seeking a strong Software Engineer to join our AI Evaluation team, focused on building the infrastructure and internal tooling that enable reliable, repeatable evaluation of AI systems in production. In this role, you’ll develop the scaffolding around our evaluations platform—API/data integrations, test configuration management, and reporting tools that make evaluations easy to run, extend, and operate at scale. You’ll also build and maintain evaluation frameworks for individual system components such as ASR, LLMs, TTS, knowledge bases, and guardrails, productionalizing these frameworks to support regular analysis, regression detection, and continuous monitoring.
In addition, you’ll create debugging tools that make it easy to inspect end-to-end calls, trace failures, and surface all relevant signals in one place, empowering internal teams to diagnose issues quickly and confidently. The ideal candidate is a pragmatic, infrastructure-minded engineer who enjoys turning ad hoc analysis into durable systems, cares deeply about developer experience, holds a high bar for software quality, and takes pride in building tooling that makes complex AI systems observable, testable, and easier to operate.
Responsibilities:
Build and maintain infrastructure and tooling for the AI evaluations platform used by internal teams, including an automated testing platform for AI voice agents and debugging and observability tools.
Develop and productionalize evaluation frameworks for individual system components such as ASR, LLMs, TTS, knowledge bases, and guardrails.
Partner with ML, engineering, and QA teams to translate evaluation requirements into robust, maintainable infrastructure and tooling.
Improve developer experience by making evaluation systems easy to extend, well-documented, and reliable in day-to-day use.
Ensure evaluation tooling meets production standards for reliability, performance, and maintainability.
Qualifications:
5+ years of professional software engineering experience, with a strong focus on building backend systems, platforms, or developer tooling.
Proven experience designing and maintaining production-grade infrastructure with code, including APIs, services, and data pipelines.
Strong proficiency in at least one general-purpose programming language (e.g., Python, TypeScript/JavaScript, Java, or similar).
Experience using test automation frameworks, evaluation pipelines, or CI/CD-integrated testing systems.
Familiarity with observability and debugging tools (logging, metrics, tracing) and building internal tools that improve developer and QA workflows.
Strong debugging skills and a methodical approach to diagnosing production and evaluation issues.
Ability to collaborate effectively across engineering, QA, and operations teams, translating requirements into reliable, maintainable systems.
Product-minded approach to infrastructure, with attention to usability, documentation, and long-term maintainability.
Preferred:
Experience working with complex, multi-component systems (e.g., ASR, LLMs, TTS, or other ML-powered services).
Experience working in healthcare or other regulated environments, including awareness of HIPAA and PHI handling.
Familiarity with conversational AI or voice agents, including multi-turn dialogue, latency constraints, and error recovery.
Familiarity with LLM observability or evaluation tools (e.g., Langfuse, prompt eval frameworks).
Background in digital health, care coordination, or patient-facing systems.
Salary and Benefits
We offer competitive salary and benefits, including 401(k) matching, health, vision, and dental insurance, and very flexible paid time off.
The typical salary range for this role is $160,000 to $210,000 USD, depending on skills, qualifications, and relevant experience.
Background Checks
As a health technology company, we reserve the right to run background checks on candidates to whom we extend offers, in compliance with applicable laws. We evaluate candidates holistically and comply with all “ban the box” regulations.
Assistance
If you have a disability or require accommodations during the application or recruitment process, please contact [email protected].
Job Details
- Category: Aerospace Engineering
- Employment Type: Full Time
- Location: San Francisco, CA (Hybrid)
- Posted
- Last updated: Jun 2, 2026, 06:36 PM
- Compensation: $160,000 - $210,000 per year
About Ellipsis Health
Sage provides 24/7 support and simplified healthcare navigation, guiding patients through complex healthcare processes with empathetic conversation.