AI Engineer
Software Engineering, Data Science
Bengaluru, Karnataka, India
Job Description
Key Responsibilities
• Harness Engineering: Build and maintain the LLM harnesses that power each AI feature — agent loops, tool/function calling, context construction, memory, retries, and failure handling. Use frameworks like LangChain, LlamaIndex, or LangGraph where they fit; write custom orchestration where they don't.
• Prompt Engineering: Design, iterate, and version prompts as first-class assets. Run structured prompt experiments, measure deltas with eval datasets, and keep prompt libraries clean and reviewable.
• RAG Pipelines: Build and operate Retrieval-Augmented Generation pipelines that extract information from documents and parse it into structured knowledge bases, covering chunking, indexing, retrieval, reranking, prompt assembly, and response handling. Collaborate with the AI Architect to shape and refine the patterns.
• Embeddings & Vector Indexes: Generate embeddings and manage vector indexes (e.g., OpenSearch, Pinecone, pgvector). Tune indexes for retrieval quality and cost.
• Data Parsing & Curation: Build extraction and parsing pipelines for documents, structured records, and customer datasets so they are clean, labeled, and ready for downstream AI work.
• Accuracy Validation: Write and operate accuracy validation scripts. Maintain evaluation datasets and report quality metrics to the Strike Team and the AI Architect.
• Light ML Work: Apply traditional ML where appropriate — classification, clustering, lightweight fine-tuning — to complement LLM-based components.
• Analysis: Do the data analysis that informs AI design decisions — sample inspection, error analysis, prompt iteration.
KPIs & Success Metrics
• AI feature accuracy and quality against eval datasets meet the project bar.
• Prompt iteration velocity with measurable eval deltas on owned features.
• RAG and retrieval quality (relevance, groundedness) for owned pipelines.
• On-time delivery of AI-side Strike Team commitments.
• Eval coverage: golden and regression sets maintained for owned features.
Key Skills & Experience
• Education: Bachelor's or Master's degree in Computer Science, Data Science, Statistics, or a related technical field.
• Experience: 3+ years of AI or data science experience, with hands-on time in LLM-based application development.
• Technical Proficiency:
– Strong Python skills, including the standard data science stack (pandas, numpy, scikit-learn).
– Hands-on experience building LLM harnesses — agent loops, tool/function calling, structured outputs — against APIs like Anthropic, OpenAI, or Bedrock.
– Strong prompt engineering practice: structured iteration, prompt versioning, and prompt evaluation against datasets.
– Working experience with at least one orchestration framework (LangChain, LlamaIndex, LangGraph) and at least one vector database.
– Comfortable with embeddings, similarity search, and basic retrieval evaluation.
– Working knowledge of classical ML for analysis and lightweight modeling tasks.
– Comfortable using AI coding assistants (e.g., Claude Code) for daily work.
• Engineering Excellence: Able to write clean, tested code that ships to production — not just notebooks. Familiar with Git, code review, and basic CI/CD.
• Analytical Mindset: Strong instinct for data analysis, error inspection, and iterative experimentation.
Preferred Skills & Experience
• Agent Frameworks: Hands-on experience with agentic frameworks (LangGraph, Claude Agent SDK, OpenAI Agents SDK) or custom agent harnesses in production.
• Evals: Experience building structured eval harnesses (golden sets, regression suites, LLM-as-judge patterns).
• Cloud: Hands-on AWS experience (Bedrock, SageMaker, OpenSearch).
• Fine-Tuning: Any experience with model fine-tuning or distillation.
• Guidewire Knowledge: Familiarity with Guidewire products or the insurance domain is a plus.
• Domain Analysis: Prior experience working on document-heavy, regulated, or insurance/finance datasets.