
Compensation
$150,000–$350,000/year
Job Description
Qualifications
- CUDA + GPU inference optimization
- vLLM, SGLang, or TensorRT-LLM experience
- KV caching, paged attention, batching, token streaming, etc.
- Distributed compute experience (with GPUs a strong plus)
- No degree required
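To make the KV-caching qualification above concrete, here is a minimal sketch of the idea: during decoding, the key/value projections for earlier tokens are cached and reused, so each step attends over the cache instead of recomputing the whole prefix. All names here are illustrative, not Luminal's API.

```python
import math

def attend(q, keys, values):
    # Single-query scaled dot-product attention over the cached keys/values.
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(d)]

class KVCache:
    """Append-only cache: one key/value pair per generated token."""
    def __init__(self):
        self.keys, self.values = [], []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

cache = KVCache()
for step in range(3):
    # Stand-ins for the projections a real model would compute for this token.
    k = v = q = [float(step + i) for i in range(4)]
    cache.append(k, v)
    out = attend(q, cache.keys, cache.values)  # work scales with cache length, not step^2
print(len(cache.keys), len(out))  # 3 4
```

Paged attention (as in vLLM) extends this by storing the cache in fixed-size blocks so memory can be allocated and shared per block rather than per contiguous sequence.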
Company
Luminal (YC S25) builds an AI compiler and serving stack that makes models 10x faster and production-ready with one line of code.
Role
Founding engineer role, on-site in downtown San Francisco. Ship low-latency, high-throughput model serving on Luminal Cloud.
Day-to-day responsibilities:
- Deploy and tune models with optimizations like KV caching, paged attention, sequence packing, etc.
- Conduct model performance reviews
- Improve the scheduler, batcher, and autoscaling; profile latency, cost, and utilization
- Sometimes write kernels and, yes, occasional tasteful shitposting
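The scheduler/batcher work above can be sketched as continuous batching: rather than waiting for a whole batch to drain, finished sequences are evicted each step and their slots refilled from the queue. This is a hedged toy model of the technique, not Luminal's scheduler; request IDs and token counts are made up.

```python
from collections import deque

def continuous_batching(requests, max_batch=4):
    """requests: list of (request_id, tokens_to_generate).
    Returns request IDs in completion order; free slots refill immediately."""
    pending = deque(requests)
    running = {}   # request_id -> tokens remaining
    finished = []
    while pending or running:
        # Refill free slots instead of waiting for the whole batch to drain.
        while pending and len(running) < max_batch:
            rid, n = pending.popleft()
            running[rid] = n
        # One "model step": every running sequence generates one token.
        for rid in list(running):
            running[rid] -= 1
            if running[rid] == 0:
                del running[rid]
                finished.append(rid)
    return finished

# Short request "a" finishes first and frees its slot for "c" while "b" keeps running.
print(continuous_batching([("a", 1), ("b", 3), ("c", 2), ("d", 1), ("e", 1)], max_batch=2))
# -> ['a', 'b', 'c', 'd', 'e']
```

The payoff over static batching is utilization: the batch stays full of active sequences, which is what drives the latency and cost metrics the role profiles.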
Job Details
- Category
- Software
- Employment Type
- Full Time
- Location
- San Francisco, CA, US
- Posted
- Mar 24, 2026, 04:26 PM
- Compensation
- $150,000–$350,000 per year
About Luminal
Part of the growing frontier tech ecosystem pushing the edges of what's possible.
Cloud Inference Engineer
Luminal