Site Reliability Engineer III

ProNavigator

Software Engineering, IT

Bengaluru, Karnataka, India

Posted on May 19, 2026

Job Description

What You'll Do

Drive Reliability, Automation & Scale:

  • Design, build, and operate highly reliable, scalable infrastructure for a multi-tenant SaaS platform.

  • Automate deployment, provisioning, and operational workflows across cloud infrastructure and applications.

  • Develop internal tools, services, and frameworks to improve efficiency and reduce manual effort.

  • Participate in a 24x7 follow-the-sun on-call rotation to support critical production systems.

Improve Platform & Infrastructure:

  • Contribute to core platform systems by building features, resolving issues, and enhancing reliability.

  • Partner with development teams to ensure systems meet availability, performance, and scalability requirements.

  • Proactively identify risks, bottlenecks, and failure modes, and implement solutions before they impact customers.

Observability, Incident Management & Resilience:

  • Build and maintain observability systems (metrics, logging, tracing, dashboards).

  • Define and track Service Level Objectives (SLOs) and reliability metrics.

  • Lead or contribute to incident response, root cause analysis, and blameless postmortems.

  • Drive improvements toward self-healing systems and reduced operational toil.

Security & Identity:

  • Design and support secure access patterns, including SSO, SAML, and OAuth-based authentication systems.

  • Ensure platform services meet security and compliance standards.

Enablement & Collaboration:

  • Collaborate across engineering teams, providing guidance, feedback, and hands-on contributions.

  • Create and maintain documentation, runbooks, and training materials.

  • Mentor engineers and promote best practices in reliability engineering and automation.

Who You Are

Infrastructure Development:

  • Strong programming skills in Python or Go (Java/Spring Boot is a plus).

  • Deep experience with AWS and building/operating production systems at scale.

  • Hands-on expertise with Kubernetes (EKS), Docker, Helm, CNI, and Ingress networking.

  • Strong understanding of Kubernetes primitives and patterns (deployments, services, operators, etc.).

  • Experience with Infrastructure as Code (Terraform, Terragrunt, or similar).

  • Solid understanding of Linux systems and networking fundamentals.

Observability & Operations:

  • Experience with observability platforms such as Datadog, Prometheus, OpenTelemetry, or CloudWatch.

  • Familiarity with incident management practices and production support in a microservices environment.

  • Experience with messaging/streaming systems (e.g., Kafka, SQS) and relational databases (e.g., Aurora, RDS) is a plus.

Security & Identity:

  • Working knowledge of SSO, SAML, OAuth, and identity providers (Okta is a plus).

  • Experience with AWS IAM (roles, policies, IRSA), VPC security groups, and Kubernetes security primitives (RBAC, network policies, pod security standards, secrets management).

DevOps & Delivery:

  • Experience with CI/CD and GitOps tools such as GitHub Actions, TeamCity, Jenkins, FluxCD, or Bitbucket Pipelines.

  • Comfortable working in agile environments (Scrum, Kanban).

Mindset & Collaboration:

  • Strong troubleshooting and problem-solving skills with a proactive, systems-thinking mindset.

  • Passion for automation: “If you have to do it more than once, automate it.”

  • Excellent communication skills and ability to work across distributed teams.

  • A collaborative team player who can influence, mentor, and lead through technical expertise.

  • Demonstrated ability to leverage AI and data-driven insights to improve productivity and outcomes.

Preferred Qualifications

  • Bachelor's degree in Computer Science or related field, or equivalent experience

  • Experience supporting large-scale SaaS platforms

  • AWS or Kubernetes certifications

  • Exposure to modern platform frameworks such as KubeVela (OAM) or Crossplane

  • Contributions to open-source projects

Why Guidewire?

  • Work on a mission-critical global platform used by leading P&C insurers worldwide

  • Solve complex, real-world infrastructure problems at genuine scale

  • Be part of a collaborative, high-impact engineering culture grounded in integrity, rationality, and collegiality

  • Opportunity to shape the future of a rapidly evolving cloud platform

  • A culture of curiosity and innovation where engineers are empowered to leverage AI and emerging technologies