Founding Platform Engineer, Data & ML Systems

Compensation
$145,000–$250,000/year

Job Description

About CellType

CellType is building foundation models and agent systems for biology.

We believe the next major advances in biotech AI will come from rich biological data, strong model systems, and reliable infrastructure working together. We work with pharma and biotech partners on problems such as preclinical-to-clinical translation, response prediction, biomarker discovery, and scientific reasoning across complex biological datasets.

We are building the core intelligence layer for biology, and that requires a world-class data and ML platform.

About the role

We are hiring a Founding Platform Engineer to build the infrastructure backbone behind our training, evaluation, and inference stack.

We are looking for someone who can build the systems that make biological data usable for model development at speed and at scale: ingestion, indexing, search, retrieval, dataset interfaces, reproducibility, validation, orchestration, observability, and distributed performance.

You will work on the full path from raw data to training-ready datasets to reliable production workflows. The right person will make it dramatically easier for the rest of the team to build, evaluate, and ship models.

What you'll do

  • Build and maintain data infrastructure for model training, evaluation, and inference
  • Design and scale high-performance inference serving systems for biological foundation models
  • Design standardized dataset interfaces so biological data is consistent, discoverable, and easy to use across the team
  • Build ingestion and processing pipelines for public, proprietary, and customer datasets
  • Build indexing, search, and retrieval systems that make large datasets queryable and useful in practice
  • Establish safeguards and validation systems so datasets are reproducible, versioned, and trustworthy once standardized
  • Improve throughput, latency, and reliability of distributed data loading and ML pipelines
  • Profile and eliminate performance bottlenecks across GPU, networking, and storage layers
  • Automate fault detection and recovery for serving and training systems
  • Build internal tools for dataset inspection, debugging, quality control, and operational visibility
  • Partner closely with ML engineers and researchers so the platform fits real workflows rather than abstract platform ideals
  • Help define how we handle permissions, privacy, compliance boundaries, and operational rigor for sensitive biological and customer data

You may be a fit if you

  • Have deep experience in backend, infrastructure, distributed systems, or data platform engineering
  • Have built scalable data pipelines or stateful distributed systems in production
  • Have experience building or operating large-scale inference or training systems
  • Have a deep understanding of GPU execution constraints, memory trade-offs, and data-loading bottlenecks in training workloads
  • Have experience with dataset infrastructure for large-scale ML systems, training pipelines, or inference-adjacent systems
  • Have worked with multimodal or very large datasets that cannot simply fit in memory
  • Have hands-on experience with data indexing, search, or retrieval infrastructure, and understand how to make large datasets discoverable, queryable, and usable in practice
  • Can reason about system-level trade-offs between latency, throughput, and cost
  • Have experience working with privacy-sensitive or compliance-sensitive data systems
  • Have built internal developer tools for ML or data teams
  • Have a track record of owning critical production infrastructure
  • Are comfortable designing APIs, modular abstractions, and internal platform interfaces with strong attention to user experience
  • Have strong instincts around reliability, reproducibility, and operational simplicity
  • Are comfortable with cloud infrastructure, containers, Kubernetes, Infrastructure-as-Code, CI/CD, and observability
  • Produce maintainable code and make pragmatic architecture decisions under time pressure
  • Thrive in a small team where ownership is broad and priorities can change quickly

We'd be especially excited if you also have

  • Experience with biological, genomic, or scientific data formats and workflows
  • Contributions to open-source data or ML infrastructure projects
  • Experience building streaming or real-time data systems
  • Background in database internals, storage engines, or query optimization
  • Experience designing systems that serve both batch training and low-latency inference workloads

At CellType, the quality of our data and ML platform directly determines research speed, model quality, and customer trust. The right person will make the entire company faster and will shape the foundation we build on for years.

If you want to build the systems layer behind frontier AI for biology, we'd love to talk.

Job Details

Category: Software
Employment Type: Full Time
Location: New York, NY, US
Posted: Apr 11, 2026, 12:40 AM
Listed: Apr 11, 2026, 12:40 AM
Compensation: $145,000–$250,000/year
