May 25 | Agileengine | Argentina
Apply on Kit Empleo: kitempleo.com.ar/empleo/q5b9v
We are looking for a Middle SRE Operations Engineer to maintain reliability across a cloud-based SaaS platform. You'll handle live incidents, improve observability, and reduce toil through automation using Kubernetes, Terraform, Grafana, and AWS. This is a hands-on, execution-focused role with real ownership across CI/CD pipelines, GitOps workflows, and on-call rotations.
What you will do
- Monitor and support production and staging environments to ensure availability, performance, and stability;
- Respond to incidents, perform triage and root cause analysis, and contribute to remediation efforts;
- Participate in on‑call rotations with defined SLAs;
- Handle operational requests from internal teams;
- Maintain and improve monitoring, alerting, dashboards, logs, and metrics;
- Support CI/CD pipelines, production releases, and GitOps workflows;
- Contribute to automation initiatives to reduce operational overhead;
- Maintain and improve Kubernetes-based infrastructure and containerized workloads;
- Support Infrastructure as Code practices and environment improvements.
Must haves
- 2+ years of experience in Site Reliability Engineering, DevOps, or Production Operations;
- Experience with AWS supporting production environments;
- Experience supporting production SaaS applications;
- Strong understanding of CI/CD systems (GitHub Actions, Jenkins, CircleCI);
- Experience with GitOps and Git fundamentals;
- Experience using GitHub, Jira, and Confluence;
- Experience with Kubernetes (EKS, kOps, or similar);
- Experience with Docker and containerization;
- Experience with observability tools (Grafana, Prometheus, Loki, PagerDuty);
- Proficiency in scripting (Bash, Python, or Go);
- Experience with Infrastructure as Code (Terraform, Helm);
- Ability to work within structured operational processes and SLAs;
- Strong written and verbal English communication skills;
- Self-driven with a growth mindset.
Nice to haves
- AWS certifications such as Solutions Architect, DevOps Engineer, or SysOps Administrator;
- Experience with multi-tenant SaaS environments;
- Experience working in globally distributed teams;
- Familiarity with ChatOps practices;
- Experience improving monitoring quality and reducing alert fatigue.
📌 Site Reliability Engineer (Argentina)