Site Reliability Engineering Manager, GWCP

ProNavigator

Software Engineering, Other Engineering

Curitiba, PR, Brazil

Posted on May 15, 2026

Job Description

What You'll Do

Technical Leadership & Execution

  • Provide technical direction and oversight for SRE initiatives, ensuring best practices in reliability, scalability, and performance.

  • Remain hands-on where needed, contributing to system design, automation, and incident resolution.

  • Guide the design and development of tools supporting 24x7 follow-the-sun operations.

  • Drive automation across infrastructure provisioning, deployments, and operational workflows.

  • Ensure effective observability strategies (metrics, logging, tracing) and promote self-healing systems.

  • Partner with engineering teams to influence system design for reliability and operability.


Reliability, Process Engineering & Continuous Improvement

  • Design, evolve, and simplify SRE processes (incident management, production readiness, capacity planning, change management) with a focus on effectiveness over overhead.

  • Apply process engineering principles so that processes stay lightweight and scalable, and enable teams rather than slow them down.

  • Prioritize people over process: use processes as guardrails, not rigid workflows, and empower engineers to make sound decisions.

  • Proactively identify gaps, inefficiencies, and risks—and drive them through to resolution with a bias for action.

  • Establish and enforce SLOs, SLIs, and error budgets across services.

  • Lead major incident response and ensure blameless postmortems result in real, implemented improvements, not just documentation.

  • Continuously reduce operational toil through automation and simplification.

  • Ensure follow-the-sun operations are practical, sustainable, and optimized for real-world execution.


People Leadership

  • Hire, onboard, and develop site reliability engineers.

  • Lead your direct reports by setting clear expectations, providing guidance, and removing obstacles to execution.

  • Foster a culture of ownership, accountability, and service orientation.

  • Support engineers in making decisions and taking action, rather than relying on rigid processes or escalation.

  • Encourage critical thinking and problem-solving over checklist-driven execution.

  • Balance workload across the team, ensuring sustainable on-call participation and operational responsibilities.

  • Set clear priorities and ensure the team is focused on high-impact work that improves reliability and customer outcomes.


Cross-Team Collaboration & Service Mentality

  • Act as a key stakeholder across SRE Platform, Product Development, and Cloud Engineering teams.

  • Demonstrate a strong service mentality—ensuring platform capabilities meet the needs of internal teams and customers.

  • Balance platform standards with pragmatism, enabling teams while maintaining reliability and guardrails.

  • Partner with teams to solve problems collaboratively, rather than acting as a gatekeeper.

  • Drive adoption of best practices through influence, not enforcement alone.


Operational Strategy & Execution

  • Define and track metrics that reflect real outcomes (reliability, customer impact, team efficiency), not just process adherence.

  • Ensure work is prioritized toward meaningful improvements in reliability, scalability, and developer experience.

  • Continuously evaluate whether processes, tools, and practices are delivering value—and adjust when they are not.

  • Avoid unnecessary process overhead; focus on enabling teams to move faster without compromising safety.

  • Advocate for and drive investments in platform improvements and reliability initiatives.


Documentation & Knowledge Sharing

  • Ensure high-quality documentation, runbooks, and operational guidance.

  • Promote knowledge sharing across teams and regions.

  • Enable teams to operate independently through clear documentation and tooling.


Who You Are


Technical Expertise

  • Strong programming skills in Python or Go; experience with Java/Spring Boot is a plus.

  • Deep experience with Kubernetes (EKS), including networking, ingress, and operator patterns.

  • Expertise in Terraform and infrastructure as code at scale.

  • Advanced knowledge of AWS services and distributed systems architecture.

  • Strong background in observability tools such as Prometheus, OpenTelemetry, or Datadog.

  • Experience supporting production systems at scale in a microservices environment.

  • Familiarity with CI/CD systems such as TeamCity, GitHub Actions, or Jenkins.

  • Understanding of SSO, SAML, and OAuth; experience with Okta is a plus.


Leadership & Ownership

  • Proven experience managing engineers while remaining technically credible.

  • Demonstrated ability to build and evolve processes that serve people and outcomes, not bureaucracy.

  • Strong sense of ownership with a track record of driving issues through to resolution.

  • Demonstrated ability to identify problems, take initiative, and implement solutions without waiting for direction.

  • Ability to balance short-term operational needs with long-term improvements.

  • Comfortable making decisions and taking accountability in high-pressure situations.


Collaboration & Communication

  • Excellent communication skills with the ability to influence across teams.

  • Ability to translate complex technical concepts into clear, actionable insights.

  • Experience working in agile environments (Scrum, Kanban).


Mindset

  • Strong service-oriented mindset with a focus on enabling others to succeed.

  • Bias toward action and problem-solving over coordination and escalation.

  • Focus on outcomes, not process overhead.

  • Passion for reliability, automation, and continuous improvement.

  • Curiosity and willingness to explore emerging technologies, including AI, to improve productivity and outcomes.


Bonus Points

  • Kubernetes or AWS certifications.

  • Experience leading SRE or platform teams.

  • Contributions to open source projects.

  • Familiarity with tools like KubeVela (OAM) or Crossplane.

  • Experience implementing SLO/error budget frameworks at scale.