Senior Site Reliability Engineer (SRE) – Hybrid Cloud & Automation

Tel Aviv, Israel

Not Disclosed

Type: Full-time

Experience: 7-10 years

Hybrid Cloud Architecture
Infrastructure-as-Code (Terraform/Ansible/Helm)
Linux System Engineering
CI/CD (GitHub/Jenkins/Artifactory)
Kubernetes & Container Management
Google Cloud Platform (GCP)

Posted on: July 24, 2025
Openings: 1

Job description

We are seeking a Senior Site Reliability Engineer who thrives in a fast-paced environment and enjoys working with innovative hybrid cloud technologies. In this role, you’ll design, automate, and manage resilient cloud infrastructure that powers mission-critical enterprise applications. You’ll collaborate with engineering teams to ensure scalability, reliability, and operational excellence across our systems.

Your Impact

Architect, provision, and manage hybrid cloud deployments using automation frameworks.
Partner with development teams to ensure production-ready applications with built-in scalability and reliability.
Oversee CI/CD platforms and Linux infrastructure; perform capacity planning, build operational runbooks, and improve automation frameworks.
Develop tools and frameworks to automate deployments, monitoring, and operational tasks for services and applications.
Participate in on-call rotations for critical incident response and lead root cause analysis for production issues.
Manage scalability, redundancy, and resiliency strategies to meet stringent SLAs.
Implement proactive monitoring, alerting, and trend analysis to maintain high service availability.
Contribute to documentation covering design, deployment, validation, and operations.

Your Experience

6+ years in system engineering for mission-critical, enterprise-level environments.
Extensive experience with Linux platforms (Ubuntu, SUSE, CentOS) in hybrid (cloud + on-prem) settings.
Infrastructure-as-Code expertise (Terraform, Ansible, Helm) for building large-scale environments.
3+ years with public cloud platforms, preferably Google Cloud Platform (GCP).
Strong foundation in Linux OS troubleshooting, design, and implementation.
Hands-on experience with CI/CD pipelines (GitHub, Jenkins, Artifactory).
Programming experience in Python, Bash, Go, or Perl for automation.
Knowledge of networking, firewalls, load balancers, and complex architectures.
Familiarity with monitoring tools (Datadog, Nagios, Grafana, Graphite, Cacti).
Understanding of Kubernetes and container lifecycle management.
Proven skills in high availability, disaster recovery, and scalability planning.
Excellent problem-solving, communication, and collaboration skills.
Bonus: Passion, drive, energy, and a positive, team-oriented attitude.