Location: Berlin, Germany

We Say HI*

Site Reliability Engineer / Cloud Operations Engineer (f/m/d)

Companies and public administrations in Germany are ready to accelerate their digital transformation and their use of AI, but they will never compromise on the security of their most sensitive data. This is where Thales in Germany, in partnership with Google Cloud and our new company currently being established, comes into play. With a new, 100% German business unit, we are providing a concrete response to the strict requirements of the BSI. What we are creating is a locally and fully autonomously operated “Trusted Cloud”. It provides access to the broadest service portfolio on the market, while everything remains strictly under European jurisdiction. By combining German and French standards such as SecNumCloud, C5 and C3-A, we offer our customers unequaled resilience and business continuity. This is a turning point for our industry and a decisive step towards a strong, sovereign digital Europe.

Your mission as Site Reliability Engineer:

Operate and maintain mission-critical sovereign cloud services with availability targets of 99.99% and above.
Monitor service health, reliability, scalability, latency, and performance using Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
Investigate, troubleshoot, and resolve complex production incidents across large-scale distributed cloud environments.
Participate in a structured 24/7 on-call rotation (approximately one week every six weeks) to ensure continuous service availability.
Collaborate with Site Reliability Engineers, Cloud Infrastructure Specialists, and Product Experts across international teams to mitigate incidents and drive long-term solutions.
Build a deep understanding of Google's cloud technologies and distributed systems through an intensive training program covering technologies such as Borg, Colossus, Spanner, and other core GCP components.
Drive operational excellence by creating and maintaining technical documentation, standardizing incident response procedures, and continuously improving operational playbooks.
Lead and contribute to post-incident reviews, root cause analyses, and the implementation of preventive measures to improve platform reliability.
Identify opportunities for automation and contribute to improving operational efficiency, scalability, compliance, and service reliability.
Support the operation of highly secure cloud environments designed to meet stringent regulatory and sovereignty requirements.

What we are looking for:

Several years of experience in Site Reliability Engineering, Cloud Operations, DevOps, Platform Engineering, Infrastructure Engineering, Production Support, Network Operations (NOC), Technical Operations, or a comparable role.
Experience operating and supporting business-critical production systems with demanding uptime and availability requirements.
Strong troubleshooting and incident management skills in complex technical environments.
Experience monitoring, operating, and maintaining distributed systems, cloud platforms, infrastructure services, or large-scale applications.
Familiarity with reliability engineering concepts, observability, monitoring, alerting, incident response, and root cause analysis.
Experience working with automation, scripting, operational tooling, or Infrastructure-as-Code approaches.
Strong analytical and problem-solving skills with a structured and methodical approach.
Professional proficiency in both German and English.
Willingness to participate in a regular on-call rotation.
Curiosity, adaptability, and a strong desire to learn and work with hyperscale cloud technologies.

The Group invests more than €4.5 billion per year in Research & Development in key areas, particularly for critical environments, such as Artificial Intelligence, cybersecurity, quantum and cloud technologies.

In 2025, the Group generated sales of €22.1 billion.

For our more than 85,000 employees in 65 countries, we open up visionary perspectives, realise individual career paths and enable creative freedom. This is achieved with courage, versatility and the firm intention to make the demanding challenges of our time safer and more inclusive. With our sustainable, value-focused management, we actively support diversity.

Say HI* – Your journey to us

In times of change, our international teams are ready to meet the complexity of today with the industry-leading technologies of tomorrow. Will you be part of it? Your Talent Acquisition contact, Andre Fuhrmann, is looking forward to your online application.

Andre Fuhrmann – Talent Acquisition Partner

+49 7156 / 302-22002

*Human Intelligence

#LI-AF1

#LI-HYBRID

Senior Site Reliability Engineer / Cloud Operations Engineer (m/f/d)
