Site Reliability Engineer III, GWCP

ProNavigator

Software Engineering

Curitiba, PR, Brazil

Posted on May 15, 2026

Job Description

What You’ll Do

Drive Reliability, Automation & Scale

Design, build, and operate highly reliable, scalable infrastructure for a multi-tenant SaaS platform.
Automate deployment, provisioning, and operational workflows across cloud infrastructure and applications.
Develop internal tools, services, and frameworks to improve efficiency and reduce manual effort.
Participate in a 24x7 follow-the-sun on-call rotation to support critical production systems.

Improve Platform & Infrastructure

Contribute to core platform systems by building features, resolving issues, and enhancing reliability.
Partner with development teams to ensure systems meet availability, performance, and scalability requirements.
Proactively identify risks, bottlenecks, and failure modes, and implement solutions before they impact customers.

Observability, Incident Management & Resilience

Build and maintain observability systems (metrics, logging, tracing, dashboards).
Define and track Service Level Objectives (SLOs) and reliability metrics.
Lead or contribute to incident response, root cause analysis, and blameless postmortems.
Drive improvements toward self-healing systems and reduced operational toil.

Security & Identity

Design and support secure access patterns, including SSO, SAML, and OAuth-based authentication systems.
Ensure platform services meet security and compliance standards.

Enablement & Collaboration

Collaborate across engineering teams, providing guidance, feedback, and hands-on contributions.
Create and maintain documentation, runbooks, and training materials.
Mentor engineers and promote best practices in reliability engineering and automation.

Who You Are

Technical Expertise

Strong programming skills in Python or Go (Java/Spring Boot is a plus).
Deep experience with AWS and building/operating production systems at scale.
Hands-on expertise with Kubernetes (EKS), Docker, Helm, CNI, and Ingress networking.
Strong understanding of Kubernetes primitives and patterns (deployments, services, operators, etc.).
Experience with Infrastructure as Code (Terraform, Terragrunt, or similar).
Solid understanding of Linux systems and networking fundamentals.

Observability & Operations

Experience with observability platforms such as Datadog, Prometheus, OpenTelemetry, or CloudWatch.
Familiarity with incident management practices and production support in a microservices environment.
Experience with messaging/streaming systems (e.g., Kafka, SQS) and relational databases (e.g., Aurora, RDS) is a plus.

Security & Identity

Working knowledge of SSO, SAML, OAuth, and identity providers (Okta is a plus).

DevOps & Delivery

Experience with CI/CD and GitOps tools such as GitHub Actions, TeamCity, Jenkins, FluxCD, or Bitbucket Pipelines.
Comfortable working in agile environments (Scrum, Kanban).

Mindset & Collaboration

Strong troubleshooting and problem-solving skills with a proactive, systems-thinking mindset.
Passion for automation: “If you have to do it more than once, automate it.”
Excellent communication skills and ability to work across distributed teams.
A collaborative team player who can influence, mentor, and lead through technical expertise.
Demonstrated ability to leverage AI and data-driven insights to improve productivity and outcomes.

Preferred Qualifications

Bachelor’s degree in Computer Science or related field (or equivalent experience).
Experience supporting large-scale SaaS platforms.
Exposure to modern platform frameworks such as KubeVela (OAM) or Crossplane.
AWS or Kubernetes certifications.
Contributions to open-source projects.

Why Guidewire?

Work on a mission-critical global platform used by leading insurers.
Solve complex, real-world problems at scale.
Be part of a collaborative, high-impact engineering culture.
Opportunity to shape the future of a rapidly evolving cloud platform.

AI & Innovation at Guidewire

We foster a culture of curiosity and innovation, empowering engineers to responsibly leverage AI and emerging technologies to drive continuous improvement, efficiency, and better outcomes.