Platform Architect
IT
Bengaluru, Karnataka, India
Job Description
AWS Production Environment: Stand up and operate the team’s AWS PROD environment end-to-end: networking, IAM, compute, storage, and observability. Set the standards that every Strike Team’s workloads will run within.
CI/CD Across Strike Teams: Build and maintain unified CI/CD pipelines that all Strike Teams consume. Ensure consistent quality gates, deployment patterns, and rollback safety across 4–6 concurrent projects.
Vector Database Operations: Own the vector database tier (e.g., OpenSearch, Pinecone, Weaviate, Neo4j). Scale indexes, manage ingestion pipelines, and keep operation cost-efficient as project volume grows.
API Hosting & Token Optimization: Manage LLM API integrations and hosting — quotas, rate limits, model routing, caching, and ongoing cost optimization across providers (Bedrock, Anthropic, OpenAI).
Cross-Team Enablement: Build reusable Terraform/CDK modules, golden paths, and platform documentation that let Product Engineers ship features without re-inventing infrastructure.
Technical Standards: Set platform standards in partnership with the AI Architect, QA Architect, and SecOps Engineer: coherent decisions across infra, AI, quality, and security, so that projects compose cleanly.
KPIs & Success Metrics
AWS PROD environment availability and operational SLAs met.
CI/CD adoption: share of Strike Teams on the unified pipeline; deployment frequency and rollback safety.
Vector database performance and cost-efficiency (retrieval latency and cost per query within budget).
LLM API and token-cost optimization — cost per project trending down with no quota incidents.
Developer enablement: reusable Terraform/CDK modules and golden paths adopted; reduced per-project infra setup time.
Key Skills & Experience
Education: Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related technical field.
Experience: 12+ years of software/infrastructure engineering experience, with at least 5 years in a senior platform or cloud architect role.
Technical Proficiency:
Deep expertise in AWS — networking (VPC, PrivateLink), compute (ECS/EKS, Lambda), data services (RDS, OpenSearch, S3), and IAM.
Strong Infrastructure-as-Code skills with Terraform or AWS CDK.
Hands-on experience operating production vector databases at scale.
Production experience integrating and operating LLM APIs (Bedrock, Anthropic, OpenAI) — including cost modeling, caching strategies, and observability.
Strong CI/CD background (GitHub Actions, CodePipeline, or equivalent) with multi-team pipeline design experience.
Engineering Excellence: Proven track record of operating platforms used by multiple downstream teams. Strong instincts for self-service tooling, golden paths, and developer experience.
Domain Context: Experience supporting AI or data-intensive workloads in production.
Preferred Skills & Experience
Guidewire Knowledge: Prior experience with Guidewire Cloud or insurance/regulated-industry platforms.
MLOps: Experience building MLOps platforms, model evaluation pipelines, or LLM observability stacks.
Security: AWS security specialization or familiarity with SOC 2 / compliance-driven environments.
Multi-Region: Experience operating production platforms across multiple AWS regions.