
Cloud Inference Engineer

Compensation
$150,000–$350,000/year

Job Description

Qualifications

  • CUDA + GPU inference optimization
  • vLLM, SGLang, or TensorRT-LLM experience
  • KV caching, paged attention, batching, token streaming, etc.
  • Distributed compute (experience with GPUs a strong plus)
  • No degree required

Company

Luminal (YC S25) builds an AI compiler and serving stack that makes models 10x faster and production-ready with one line of code.

Role

Founding role, on-site in downtown SF. Ship low-latency, high-throughput model serving on Luminal Cloud.

Day to day responsibilities:

  • Deploy and tune models with optimizations like KV caching, paged attention, sequence packing, etc.
  • Conduct model performance reviews
  • Improve scheduler, batcher, autoscaling; profile latency, cost, utilization
  • Write the occasional kernel and, yes, the occasional tasteful shitpost
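For a flavor of the arithmetic behind optimizations like KV caching and batch sizing, here is a minimal sketch (hypothetical model shape, not Luminal's code) of how per-token KV-cache memory constrains how many tokens a serving batch can hold:

```python
# Illustrative sketch: estimate per-token KV-cache memory for a transformer.
# This is the kind of back-of-envelope math behind paged attention and
# continuous batching; the model shape below is a hypothetical example.

def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int,
                             head_dim: int, dtype_bytes: int = 2) -> int:
    """Bytes one token's KV cache occupies: K and V tensors per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes

# Example: a Llama-3-8B-like shape (32 layers, 8 KV heads, head_dim 128, fp16)
per_token = kv_cache_bytes_per_token(32, 8, 128)  # 131072 bytes = 128 KiB

# Tokens that fit if 20 GiB of VRAM is reserved for the cache
budget_tokens = (20 * 1024**3) // per_token
```

Paged attention matters precisely because this budget is shared across all in-flight requests: allocating the cache in fixed-size pages instead of contiguous per-request buffers avoids fragmenting it.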


Job Details

Location
San Francisco, CA, US
Posted
Mar 24, 2026, 04:26 PM

