Senior Staff Software Engineer, Reliability (Buenos Aires)

May 26 | hireworks.io | Buenos Aires


About the job
About hireworks
hireworks is building a community of top talent in key international markets by unlocking unparalleled access to positions at leading U.S.-based companies.
As your employer, hireworks will ensure you have a seamless interview, onboarding, and employee experience, providing ongoing support and resources along the way.
Established in ****, hireworks is forging corp-to-corp relationships with leading U.S.-based organizations looking to grow their teams with best-in-class talent around the world.
Working with hireworks means unlocking access to a network of local peers and mentors and career opportunities through our client network.
About our client
Our client is building artificial intelligence to make the physical world more responsive.
The company is pioneering what it calls the Recognition Economy, a future where repetitive tasks disappear and being recognized unlocks seamless access, comfort, and personalized experiences across everyday environments.
From transforming parking into a frictionless drive-in, drive-out experience for millions of users to expanding its intelligence layer across industries such as retail and hospitality, the company is developing technology that makes real-world interactions more intuitive and efficient.
As the organization continues to grow, it is looking for builders, innovators, and problem solvers who want to help shape the next generation of intelligent infrastructure for physical spaces.
Position Overview
Our client is seeking a Staff Software Engineer focused on Reliability to own reliability across their entire platform and drive the comprehensive practices that ensure system availability, resilience, and observability for our mission-critical mobility infrastructure.
In this role, you will build reliability from first principles, architecting failover systems, implementing chaos engineering, and improving our observability foundation to maintain 99.9%+ uptime as we scale to new markets.
As the technical owner of our reliability posture, you will tackle challenges like external service failover, dependency mirroring, and database replication, working alongside highly technical teams across the organization to influence architecture decisions and establish company-wide reliability standards.
You will join the Product Foundations team, playing a key role in building the foundational infrastructure that powers the future of mobility commerce.
What You'll Do
Own the overall reliability posture for the platform, establishing practices, metrics, and systems that ensure 99.9%+ uptime across all services
Design and implement automatic failover mechanisms for critical external dependencies like Twilio for SMS/voice and Stripe for payments with circuit breakers, retry policies, and degraded mode operations
Architect and build active-passive or active-active regional deployment strategies with database replication, automated failover, and DNS-based traffic routing including disaster recovery planning and testing
Establish comprehensive monitoring using Datadog for APM, logs, and metrics correlation
Implement synthetic monitoring, SLO-based alerting, on-call rotation, and escalation policies while building service health dashboards that show customer impact
Own the incident management process including workflows, tooling, post-mortem culture, runbook automation, and initiatives that reduce mean time to recovery (MTTR) from detection to resolution
Drive adoption of resilience patterns across all services including health checks, graceful degradation, feature flags, rate limiting, backpressure mechanisms, and chaos engineering practices
Build and maintain local mirrors for critical dependencies with artifact caching, dependency pinning, and vulnerability scanning to prevent build failures from upstream outages
About You
8+ years of engineering experience including software engineering, reliability engineering, SRE practices, or production operations at scale
Demonstrate expert-level reliability engineering skills including hands-on experience with multi-region architectures, failover automation, circuit breakers, chaos engineering, and disaster recovery
Utilize production observability expertise with deep experience implementing monitoring, alerting, tracing, and logging systems at scale – specifically Datadog or similar APM platforms in high-load environments
Apply strong systems thinking with proven ability to design resilient distributed systems that gracefully handle failures, network partitions, and external dependency outages
Demonstrate database and data systems knowledge including replication strategies, backup/restore procedures, connection pooling, query optimization, and experience with both relational and NoSQL databases
Leverage cloud platform expertise with production experience operating and ensuring reliability of systems on AWS including multi-region deployments, load balancing, and DNS-based failover
Possess experience with AI-powered development tools such as Claude Code, GitHub Copilot, or similar agentic coding tools for enhanced productivity – context engineering in particular
Exhibit excellent technical communication with ability to influence technical decisions across teams, document complex systems, conduct post-mortems, and establish reliability standards organization-wide
Demonstrate expert-level Java and/or Scala proficiency with strong understanding of JVM performance, concurrency, and operational characteristics
Our Stack
Languages + Frameworks: TypeScript, React, Scala (principally), Java (limited)
Cloud: AWS
Version control: Git & GitHub
AI Tooling: Copilot on GitHub
hireworks is cultivating a growing community of top talent across Colombia, Argentina, and Bulgaria.
In addition to unlocking access to positions at top-tier U.S.-based companies, we offer a variety of benefits to enhance your experience:
Competitive Pay – compensation that reflects your experience and accomplishments.
Remote Flexibility – work from anywhere within your local country (Colombia, Argentina or Bulgaria), with the option to use co-working space as available locally.
Paid Time Off – ample vacation days to rest and recharge.
Public Holidays – all local federal holidays are fully paid days off.