Job Description
You are a seasoned SRE who keeps production infrastructure running at scale. You own the reliability and availability of customer-facing systems — from Kubernetes clusters to deployment pipelines to the networking layer that connects it all. You think in SLOs, automate ruthlessly, and treat every incident as a chance to make the system better.
Key Responsibilities
- Own and operate our Kubernetes infrastructure: cluster lifecycle, upgrades, networking, and multi-tenant isolation for customer workloads
- Build and maintain CI/CD pipelines and deployment infrastructure
- Use AI aggressively to automate the analysis and resolution of production issues and to improve software development speed, reliability, and maintainability
- Build dashboards, alerting, and anomaly detection across our systems
- Define and enforce SLOs and build out incident response processes
- Manage and improve our networking, load balancing, and service mesh configurations
- Drive reliability improvements across the stack through automation, runbooks, and chaos engineering
Requirements
- 5+ years of experience managing critical production systems and software development workflows
- Strong production experience setting up and operating Kubernetes at scale, using infrastructure-as-code (Terraform, Ansible)
- Deep knowledge of Linux networking, container networking (CNI plugins, VXLAN, BGP), and DNS
- Experience building CI/CD systems and GitOps workflows (FluxCD, ArgoCD)
- Proficiency in Python and either Go or Bash for tooling and automation
- Strong experience with logging, monitoring and alerting (Prometheus, Grafana, Loki, Thanos, VictoriaMetrics, Datadog)
- Excellent communication and ability to drive technical decisions across teams
- Self-starter who executes quickly, takes ownership, and constantly seeks improvement
Nice to have
- Experience managing GPU and AI/ML workloads
- Experience with kernel-based monitoring and routing (eBPF, XDP)
- Experience with security tooling (Falco, Coroot, SIEM)
- Experience with bare metal Kubernetes networking (Calico, Cilium, MetalLB)
- Experience with distributed storage systems (Ceph, Longhorn, etc.)
Location
- Turkey
What we offer at fal
- Interesting and challenging work
- A lot of learning and growth opportunities
- Regular team events and offsites
Job Details
- Category: Software
- Employment Type: Full Time
- Location: Turkey
- Posted: May 7, 2026, 01:18 PM
- Listed: Mar 30, 2026, 07:15 PM
About Fal
Part of the growing frontier tech ecosystem pushing the edges of what's possible.
Software Engineer, Site Reliability
Fal