TL;DR
AI QA Engineer: Develop and execute AI model evaluation strategies and implement automated and manual testing for LLM-based applications, with an emphasis on detecting biases and hallucinations. Focus on optimizing model performance and ensuring high-quality AI outputs through debugging and root cause analysis.
Location: Hybrid in Armenia
What you will do
- Develop and execute AI model evaluation pipelines, ensuring accuracy, consistency, and fairness.
- Implement automated and manual testing for LLM-based applications.
- Work closely with AI engineers to debug failures, identify root causes, and optimize model performance.
- Collaborate with AI engineers to integrate testing into early-stage development.
- Build and manage test datasets, ensuring high-quality, diverse, and balanced samples.
- Develop synthetic data pipelines to enhance model evaluation.
Requirements
- Experience with AI/ML testing frameworks and LLM evaluation methodologies.
- Strong understanding of LLM behaviors, biases, failure modes, and edge cases.
- Proficiency in Python and familiarity with Python testing frameworks (e.g., pytest, unittest).
- Experience with test dataset management and annotation tools.
- Familiarity with synthetic data generation and adversarial testing techniques.
- Strong problem-solving and debugging skills to analyze AI failures and inconsistencies.
- English: B2 level required, including the ability to evaluate AI-generated text and improve prompts.
Culture & Benefits
- Krisp is an Equal Opportunity Employer.
- All applicants are considered regardless of race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, veteran status, gender identity, or sexual orientation.
- No tolerance for discrimination or harassment of any kind.
- All employees and contractors treat each other with respect and empathy.
Hiring process
- Apply through the provided form.
- Only shortlisted candidates will be contacted for the next stages.
