Senior Site Reliability Engineer (SRE) – Hybrid Cloud & Automation

Tel Aviv, Israel
Not Disclosed
Type: Full-time
Experience: 7-10 years
Hybrid Cloud Architecture, Infrastructure-as-Code (Terraform/Ansible/Helm), Linux System Engineering, CI/CD (GitHub/Jenkins/Artifactory), Kubernetes & Container Management, Google Cloud Platform (GCP)
  • Posted on: July 24, 2025
  • Openings: 1

Job description

We are seeking a Senior Site Reliability Engineer who thrives in a fast-paced environment and enjoys working with innovative hybrid cloud technologies. In this role, you’ll design, automate, and manage resilient cloud infrastructure that powers mission-critical enterprise applications. You’ll collaborate with engineering teams to ensure scalability, reliability, and operational excellence across our systems.


Your Impact

  • Architect, provision, and manage hybrid cloud deployments using automation frameworks.

  • Partner with development teams to ensure production-ready applications with built-in scalability and reliability.

  • Oversee CI/CD platforms and Linux infrastructure; perform capacity planning, build operational runbooks, and improve automation frameworks.

  • Develop tools and frameworks to automate deployments, monitoring, and operational tasks for services and applications.

  • Participate in on-call rotations for critical incident response and lead root cause analysis for production issues.

  • Manage scalability, redundancy, and resiliency strategies to meet stringent SLAs.

  • Implement proactive monitoring, alerting, and trend analysis to maintain high service availability.

  • Contribute to documentation covering design, deployment, validation, and operations.


Your Experience

  • 6+ years in system engineering for mission-critical, enterprise-level environments.

  • Extensive experience with Linux platforms (Ubuntu, SUSE, CentOS) in hybrid (cloud + on-prem) settings.

  • Infrastructure-as-Code expertise (Terraform, Ansible, Helm) for building large-scale environments.

  • 3+ years with public cloud platforms, preferably Google Cloud Platform (GCP).

  • Strong foundation in Linux OS troubleshooting, design, and implementation.

  • Hands-on experience with CI/CD pipelines (GitHub, Jenkins, Artifactory).

  • Programming experience in Python, Bash, Go, or Perl for automation.

  • Knowledge of networking, firewalls, load balancers, and complex architectures.

  • Familiarity with monitoring tools (Datadog, Nagios, Grafana, Graphite, Cacti).

  • Understanding of Kubernetes and container lifecycle management.

  • Proven skills in high availability, disaster recovery, and scalability planning.

  • Excellent problem-solving, communication, and collaboration skills.

  • Bonus: Passion, drive, energy, and a positive, team-oriented attitude.
