Reliability Operations Engineer (Mexico)

Job Description

At Serve Robotics, we’re reimagining how things move in cities. Our personable sidewalk robot is our vision for the future. It’s designed to take deliveries away from congested streets, make deliveries available to more people, and benefit local businesses.

The Serve fleet has been delighting merchants, customers, and pedestrians along the way in Los Angeles, Miami, Dallas, Atlanta and Chicago while doing commercial deliveries. We’re looking for talented individuals who will grow robotic deliveries from surprising novelty to efficient ubiquity.

Who We Are

We are tech industry veterans in software, hardware, and design who are pooling our skills to build the future we want to live in. We are solving real-world problems leveraging robotics, machine learning and computer vision, among other disciplines, with a mindful eye towards the end-to-end user experience. Our team is agile, diverse, and driven. We believe that the best way to solve complicated dynamic problems is collaboratively and respectfully.

The Reliability Operations Engineer supports the operational reliability of robotic and cloud systems by handling Tier 2 escalations, following and improving runbooks, and performing technical investigations during your region’s daytime hours. This role works closely with senior team members, product engineering, and SREs to investigate issues, refine operational workflows, and strengthen system health. This position contributes to incident response by providing triage and clear communication, ensuring timely escalation and effective coordination across teams.

Responsibilities

  • Lead incident investigations during your region’s daytime hours, providing timely updates, escalating appropriately, and supporting senior engineers leading the response.

  • Respond to escalations from Tier 1 support using established runbooks, metrics, logs, and diagnostics to remediate issues or escalate to Tier 3 when needed.

  • Update runbooks and operational documentation based on new issues, discoveries, and feedback, ensuring clarity and consistency across all procedures.

  • Run existing automations and collaborate with senior team members to enhance tooling and scripts that streamline troubleshooting and remediation tasks.

  • Use observability tools such as Grafana/Prometheus, GCP Monitoring, and OpenTelemetry to interpret metrics, logs, and traces, helping identify anomalies and validate system performance.

  • Provide concise, accurate updates during incidents, ensuring information reaches the correct engineering and SRE contacts and supporting structured incident coordination.

  • Participate in discussions around root causes, share operational insights, and contribute to process improvements that enhance system stability and supportability.

  • Participate in a shared weekend on-call rotation to help maintain operational coverage for production systems, responding to incidents and escalations as needed and coordinating with engineering teams when issues arise.

  • Proactively strengthen workflows, adopt best practices, and build the foundation of the Reliability Operations function as it evolves.

Qualifications

  • hands-on experience.

  • 2–4 years of experience in Reliability Operations, Site Reliability Engineering, DevOps, IT Operations, or a related technical support function.

  • Experience participating in Tier 1 or Tier 2 investigations, including log review, basic triage, and structured escalation.

  • Exposure to operational environments supporting distributed or cloud-based systems.

  • Participation in incident response workflows and/or on-call rotations.

  • Proficiency with Linux, including navigating systems, reviewing logs, and performing basic diagnostics.

  • Experience using and contributing to runbooks and operational workflows.

  • Ability to interpret metrics, logs, and traces using tools such as Grafana/Prometheus, Google Cloud Monitoring, and OpenTelemetry.

  • Familiarity with cloud platforms, preferably Google Cloud Platform (GCP).

  • Ability to follow documented remediation steps, with good judgment around when to escalate.

  • Understanding of CI/CD pipelines and how application deployments affect runtime behavior.

  • Experience using Jira or similar ticketing systems.

  • Clear and effective communicator, especially when providing updates during time-sensitive operational issues.

  • Calm, organized approach to troubleshooting and prioritization.

  • Collaborative mindset, working effectively with senior operations engineers, product teams, and SREs.

  • Strong sense of ownership and accountability for operational responsibilities.

What Makes You Stand Out

  • Prior experience participating in high-severity incident response or supporting operational incidents.

  • Exposure to robot fleets, IoT systems, or other distributed physical device environments.

  • Ability to write or modify lightweight scripts and automations to improve operational workflows.

  • Familiarity with incident management platforms such as PagerDuty, OpsGenie, Jira Service Management, or Grafana IRM.

  • Experience contributing to the creation or improvement of operational runbooks and support documentation.

  • Strong networking fundamentals; familiarity with Tailscale or similar zero-trust networking tools is a plus.

  • Demonstrated ability to learn quickly and contribute to improving operational maturity within a team.

Additional Information

  • As part of maintaining continuous operational coverage, this role also participates in a rotating weekend on-call schedule shared across the Reliability Operations team.

Job Details

Department: Software
Category: Software
Employment Type: Full Time
Location: Mexico City, MX (remote)
Posted: Apr 6, 2026, 03:31 PM
Listed: Apr 6, 2026, 03:41 PM
