Reliability Operations Engineer (Mexico)

Serve Robotics • Mexico City, 15020, MX • Remote • Full Time

Posted 2 hours ago • Software

Job Description

At Serve Robotics, we’re reimagining how things move in cities. Our personable sidewalk robot is our vision for the future. It’s designed to take deliveries away from congested streets, make deliveries available to more people, and benefit local businesses.

The Serve fleet has been making commercial deliveries in Los Angeles, Miami, Dallas, Atlanta, and Chicago, delighting merchants, customers, and pedestrians along the way. We’re looking for talented individuals who will grow robotic deliveries from surprising novelty to efficient ubiquity.

Who We Are

We are tech industry veterans in software, hardware, and design who are pooling our skills to build the future we want to live in. We are solving real-world problems leveraging robotics, machine learning and computer vision, among other disciplines, with a mindful eye towards the end-to-end user experience. Our team is agile, diverse, and driven. We believe that the best way to solve complicated dynamic problems is collaboratively and respectfully.

The Reliability Operations Engineer supports the operational reliability of robotic and cloud systems by handling Tier 2 escalations, following and improving runbooks, and performing technical investigations during your region’s daytime hours. This role works closely with senior team members, product engineering, and SREs to investigate issues, refine operational workflows, and strengthen system health. This position contributes to incident response by providing triage and clear communication, ensuring timely escalation and effective coordination across teams.

Responsibilities

Lead incident investigations during your region’s daytime hours, providing timely updates, escalating appropriately, and supporting senior engineers leading the response.
Respond to escalations from Tier 1 support using established runbooks, metrics, logs, and diagnostics to remediate issues or escalate to Tier 3 when needed.
Update runbooks and operational documentation based on new issues, discoveries, and feedback, ensuring clarity and consistency across all procedures.
Run existing automations and collaborate with senior team members to enhance tooling and scripts that streamline troubleshooting and remediation tasks.
Use observability tools such as Grafana/Prometheus, GCP Monitoring, and OpenTelemetry to interpret metrics, logs, and traces, helping identify anomalies and validate system performance.
Provide concise, accurate updates during incidents, ensuring information reaches the correct engineering and SRE contacts and supporting structured incident coordination.
Participate in discussions around root causes, share operational insights, and contribute to process improvements that enhance system stability and supportability.
Participate in a shared weekend on-call rotation to help maintain operational coverage for production systems, responding to incidents and escalations as needed and coordinating with engineering teams when issues arise.
Proactively strengthen workflows, adopt best practices, and build the foundation of the Reliability Operations function as it evolves.

Qualifications

hands-on experience.
2–4 years of experience in Reliability Operations, Site Reliability Engineering, DevOps, IT Operations, or a related technical support function.
Experience participating in Tier 1 or Tier 2 investigations, including log review, basic triage, and structured escalation.
Exposure to operational environments supporting distributed or cloud-based systems.
Participation in incident response workflows and/or on-call rotations.
Proficiency with Linux, including navigating systems, reviewing logs, and performing basic diagnostics.
Experience using and contributing to runbooks and operational workflows.
Ability to interpret metrics, logs, and traces using tools such as Grafana/Prometheus, Google Cloud Monitoring, and OpenTelemetry.
Familiarity with cloud platforms, preferably Google Cloud Platform (GCP).
Ability to follow documented remediation steps, with good judgment around when to escalate.
Understanding of CI/CD pipelines and how application deployments affect runtime behavior.
Experience using Jira or similar ticketing systems.
Clear and effective communicator, especially when providing updates during time-sensitive operational issues.
Calm, organized approach to troubleshooting and prioritization.
Collaborative mindset, working effectively with senior operations engineers, product teams, and SREs.
Strong sense of ownership and accountability for operational responsibilities.

What Makes You Stand Out

Prior experience participating in high-severity incident response or supporting operational incidents.
Exposure to robot fleets, IoT systems, or other distributed physical device environments.
Ability to write or modify lightweight scripts and automations to improve operational workflows.
Familiarity with incident management platforms such as PagerDuty, OpsGenie, Jira Service Management, or Grafana IRM.
Experience contributing to the creation or improvement of operational runbooks and support documentation.
Strong networking fundamentals; familiarity with Tailscale or similar zero-trust networking tools is a plus.
Demonstrated ability to learn quickly and contribute to improving operational maturity within a team.

Additional Information

As part of maintaining continuous operational coverage, this role also participates in a rotating weekend on-call schedule shared across the Reliability Operations team.

Job Details

Department: Software
Category: Software
Employment Type: Full Time
Location: Mexico City, MX (Remote)
Posted: Apr 6, 2026, 03:31 PM
Listed: Apr 6, 2026, 03:41 PM
