Site Reliability Engineer II (SRE) - Guidewire Cloud Platform (Application)
ProNavigator
Software Engineering
Kraków, Poland
Job Description
What You'll Do:
Assist in troubleshooting and resolving issues in collaboration with development teams, reducing customer impact.
Develop and maintain automated runbooks to address common issues proactively.
Apply engineering principles and basic automation to enhance our operating environments.
Monitor applications and help improve their reliability and performance on the Guidewire Cloud Platform.
Use your software engineering skills to optimize systems and reduce manual tasks.
Document incidents and assist in refining processes to prevent future occurrences.
Stay informed about industry trends, tools, and best practices in site reliability engineering.
Contribute to a culture of innovation, learning, and continuous improvement.
Participate in on-call rotations to ensure the availability and reliability of our services.
What You'll Bring:
Experience as an SRE or in a similar role, with a focus on improving system reliability.
Strong problem-solving skills and the ability to assist in analyzing complex systems and devising effective solutions.
Effective collaboration and communication skills to work cross-functionally and document processes.
Experience with automation, monitoring, and performance optimization tools and techniques.
Commitment to maximizing uptime and scalability and to delivering an exceptional end-user experience.
Passion for technology and a desire to continuously learn and grow your skills.
Alignment with Guidewire's mission to leverage technology to help protect and support others.
Required Skills:
Basic understanding of SLIs, SLOs, and error budgets
Familiarity with application performance monitoring (APM) and telemetry tools
Some experience with troubleshooting and debugging distributed systems on cloud infrastructure
Exposure to CI/CD pipelines within Kubernetes (K8s) or legacy ecosystems
Familiarity with creating monitors, dashboards, and synthetic transactions in monitoring tools like Datadog
Some experience deploying and managing infrastructure within AWS or Kubernetes ecosystems using Terraform or other cloud-native approaches
Familiarity with infrastructure configuration management using GitOps practices or tools such as Puppet or Ansible
Basic understanding of AWS cloud networking and security
Comfortable with Linux system administration and able to program or script in Python, Go, Java, shell, or an equivalent language
Preferred Skills:
Pursuing or interested in pursuing SRE Certification
Pursuing or interested in pursuing AWS Certification
Familiarity with SQL, database administration, data pipelines, performance tuning, and schema design
Exposure to pipeline tools such as TeamCity, Bitbucket Pipelines, Jenkins, or GitHub Actions
Interest in learning about distributed data processing frameworks and platforms such as Hadoop, Apache Spark, or Amazon Redshift
Why Guidewire
This is an opportunity to join a mission-driven company and make a real impact in the lives of people facing challenges. You'll work with cutting-edge technology, collaborate with talented peers, and grow your skills in a culture that values innovation, teamwork, and work-life balance. We offer competitive compensation, comprehensive benefits, and opportunities for career development.
If you're an SRE who combines deep technical expertise with a passion for problem-solving and a commitment to reliability, we'd love to hear from you. Join us in building the software that helps insurers care for their customers when they need it most.
This position requires participation in mandatory on-call rotations to ensure the availability and reliability of our services. This includes responding to incidents and alerts outside of regular business hours, on weekends, and during holidays, as per the established on-call schedule. Candidates must be willing and able to fulfill this critical responsibility.