Cloud Inference Engineer

San Francisco, CA • Full Time

Posted 3 months ago • Software

Compensation

$150,000–$350,000/year

Job Description

Qualifications

CUDA + GPU inference optimization
vLLM, SGLang, or TensorRT-LLM experience
KV caching, paged attention, batching, token streaming, etc.
Distributed compute (GPU experience is a super plus)
No degree required

Company

Luminal (YC S25) builds an AI compiler and serving stack that makes models 10x faster and production-ready with one line.

Role

Founding role, on-site in downtown SF. Ship low-latency, high-throughput model serving on Luminal Cloud.

Day to day responsibilities:

Deploy and tune models with optimizations like KV caching, paged attention, sequence packing, etc.
Conduct model performance reviews
Improve scheduler, batcher, autoscaling; profile latency, cost, utilization
Sometimes write kernels and, yes, occasional tasteful shitposting

Job Details

Category: Software
Employment Type: Full Time
Location: San Francisco, CA
Posted: Mar 24, 2026, 04:26 PM
Last updated: Jun 29, 2026, 12:40 PM
Compensation: $150,000 - $350,000 per year

About Luminal

Luminal provides a machine learning compiler and serverless cloud platform that automates PyTorch model optimization and deployment.
