About this role
We’re hiring a Member of Technical Staff (MTS) to serve as a technical owner at the intersection of research, data, and real-world AI systems. This is a hands-on role focused on improving model and system performance through rigorous evaluation, failure analysis, and iterative development.
You’ll work closely with researchers, domain experts, and operators to ensure that experimental work produces clean, defensible research signal—and that this signal translates into meaningful improvements in deployed systems.
Skills
Research Signal Judgment, ML-Oriented Data Design, Ops-to-Research Translation, RL Environments
Key responsibilities
- Own research and evaluation initiatives end-to-end: problem framing, data design, quality calibration, and signal validation.
- Design ML-oriented data systems, including task definitions, annotation schemas, rubrics, incentives, and pipelines optimized for downstream model performance.
- Analyze model and system failures to identify root causes, edge cases, and opportunities for improvement.
- Translate ambiguous, real-world behavior into structured evaluation frameworks and new data categories.
- Work closely with researchers and domain experts to calibrate quality early and continuously raise the signal bar.
- Iterate rapidly on evaluations, datasets, and feedback loops to improve system performance.
- Act as a quality gate: block claims, pause work, or force scope changes when signal strength or data integrity is insufficient.
- Partner with cross-functional and client-facing teams to translate research progress into clear, credible narratives grounded in evidence.
- Identify gaps in data or evaluation coverage and recommend where to invest, iterate, or stop based on learnings and impact.
Required skills & qualifications
- Strong judgment around research signal quality and when work is (or is not) ready to be externalized.
- Experience designing ML-oriented datasets, evaluation frameworks, and QA processes.
- Ability to translate messy, real-world system behavior into structured research and evaluation opportunities.
- Comfort operating in ambiguity, with a bias toward ownership and decisive action.
- Clear written and verbal communication, especially when explaining tradeoffs, limitations, and signal strength to technical and non-technical stakeholders.
- Proven ability to work directly with experts during project kickoff, calibration, and iteration.
- A systems-level mindset, with interest in improving end-to-end model or agent performance rather than isolated components.
Preferred qualifications
- Experience with reinforcement learning environments, simulators, or feedback-driven training systems.
- Experience improving agentic systems or AI systems operating in real-world workflows.
- Prior work embedded in applied research or production environments with direct impact on deployed systems.
- Experience with evaluation design for complex or real-world tasks.
- Familiarity with designing incentives for experts and keeping them engaged in high-stakes technical projects.