Cloud Inference Engineer

Compensation
$150,000–$350,000/year
Job Description
Qualifications
- CUDA + GPU inference optimization
- vLLM, SGLang, or TensorRT-LLM experience
- KV caching, paged attention, batching, token streaming, etc.
- Distributed computing experience (with GPUs a strong plus)
- No degree required
Company
Luminal (YC S25) builds an AI compiler and serving stack that makes models 10x faster and production-ready with one line of code.
Role
Founding role, on-site in downtown SF. Ship low-latency, high-throughput model serving on Luminal Cloud.
Day-to-day responsibilities:
- Deploy and tune models with optimizations like KV caching, paged attention, sequence packing, etc.
- Conduct model performance reviews
- Improve the scheduler, batcher, and autoscaling; profile latency, cost, and utilization
- Occasionally write kernels and, yes, the odd tasteful shitpost
Job Details
- Location
- San Francisco, CA, US
- Posted
- Mar 24, 2026, 04:26 PM