We are seeking a Senior Site Reliability Engineer who thrives in a fast-paced environment and enjoys working with innovative hybrid cloud technologies. In this role, you’ll design, automate, and manage resilient cloud infrastructure that powers mission-critical enterprise applications. You’ll collaborate with engineering teams to ensure scalability, reliability, and operational excellence across our systems.
Architect, provision, and manage hybrid cloud deployments using automation frameworks.
Partner with development teams to ensure production-ready applications with built-in scalability and reliability.
Oversee CI/CD platforms and Linux infrastructure; perform capacity planning, build operational runbooks, and improve automation frameworks.
Develop tools and frameworks to automate deployments, monitoring, and operational tasks for services and applications.
Participate in on-call rotations for critical incident response and lead root cause analysis for production issues.
Manage scalability, redundancy, and resiliency strategies to meet stringent SLAs.
Implement proactive monitoring, alerting, and trend analysis to maintain high service availability.
Contribute to documentation covering design, deployment, validation, and operations.
6+ years of systems engineering experience in mission-critical, enterprise-level environments.
Extensive experience with Linux platforms (Ubuntu, SUSE, CentOS) in hybrid (cloud + on-prem) settings.
Infrastructure-as-Code expertise (Terraform, Ansible, Helm) for building large-scale environments.
3+ years with public cloud platforms, preferably Google Cloud Platform (GCP).
Strong foundation in Linux OS troubleshooting, design, and implementation.
Hands-on experience with CI/CD pipelines (GitHub, Jenkins, Artifactory).
Programming experience in Python, Bash, Go, or Perl for automation.
Knowledge of networking, firewalls, load balancers, and complex architectures.
Familiarity with monitoring tools (Datadog, Nagios, Grafana, Graphite, Cacti).
Understanding of Kubernetes and container lifecycle management.
Proven skills in high availability, disaster recovery, and scalability planning.
Excellent problem-solving, communication, and collaboration skills.
Bonus: passion, drive, energy, and a positive, team-oriented attitude.