AI Systems Testing & Evaluation Expert
Software Developer in Test · IEEE Member
Specializing in engineering-grade evaluation frameworks for LLMs, RAG systems, multimodal models, and agentic AI pipelines.
Author of the upcoming AI Systems Testing & Evaluation Program, launching June 1, 2026 · Enrollment: $999.
Book a Consultation — $150/hour
Join the Course Waitlist — $999
About the Author
Dmytro Kyiashko is an AI evaluation engineer with 10+ years of experience in software testing, engineering leadership, and the development of structured methodologies for assessing AI systems.

His expertise includes:
  • designing evaluation frameworks for LLM, RAG & agent workflows
  • constructing reproducible datasets and scoring pipelines
  • developing automated evaluators and metrics
  • implementing continuous evaluation in CI/CD
  • failure mode diagnostics and taxonomy engineering
  • assessing reasoning quality, factuality, hallucinations, and stability
As an IEEE member, he contributes to international engineering communities, reviews scientific work, and serves as an expert judge at global events including UAtech Venture Night (Web Summit Vancouver).

Consulting — $150/hour
Dmytro provides high-level technical consulting for teams building or scaling AI systems.
Book a Consultation
Why you will like this course
  • AI Evaluation Framework Design
    Custom evaluation pipelines for LLMs, RAG systems, agentic workflows, and multimodal architectures.
  • CI/CD Integration for Continuous Evaluation
    Automated evaluators, regression detection, model version comparison, and production monitoring.
  • Metrics & Dashboards
    Factuality, stability, safety, reasoning depth, and engineering-grade quality dashboards.
  • Failure Analysis & Risk Diagnostics
    Taxonomies, axial coding, hallucination detection, edge-case discovery, critical risk identification.
About the Course
AI Systems Testing & Evaluation Program
Course Launch: June 1, 2026
Enrollment is now open.
This program provides a full engineering methodology for testing and evaluating modern AI systems — LLMs, RAG pipelines, agentic workflows, and multimodal models.
Students learn how to design reproducible evaluation workflows, create structured metrics, diagnose failures, and build continuous evaluation pipelines for production.
Who This Course Is For
  • SDET & QA Engineers
  • Machine Learning Engineers
  • Data Scientists
  • Tech Leads & Engineering Managers
  • AI Product Teams
  • Startups building AI/LLM solutions
Learning Outcomes
Participants will learn to:
  • build evaluation pipelines for LLM, RAG & agent systems
  • design scoring rubrics & reproducible datasets
  • automate factuality, hallucination & stability checks
  • create failure taxonomies with axial coding
  • evaluate multi-step reasoning and agent workflows
  • integrate evaluation into CI/CD
  • build dashboards for production monitoring
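For a concrete, deliberately simplified flavour of these outcomes, here is a minimal Python sketch of a reproducible scoring pipeline: a fixed dataset, a keyword-containment check standing in for a real factuality evaluator, and an aggregated report. The dataset, names, and dummy model are illustrative assumptions, not course materials.

```python
# Illustrative scoring pipeline: fixed dataset -> per-item check -> aggregate report.
# The keyword-containment check is a crude stand-in for a real factuality evaluator.
from dataclasses import dataclass

@dataclass
class EvalItem:
    prompt: str
    expected_facts: list[str]  # facts the answer must mention

# A small, version-controlled dataset keeps runs reproducible.
DATASET = [
    EvalItem("What is the boiling point of water at sea level?", ["100"]),
    EvalItem("Which planet is known as the Red Planet?", ["Mars"]),
]

def factuality_check(answer: str, expected_facts: list[str]) -> float:
    """Fraction of expected facts mentioned in the answer."""
    hits = sum(1 for fact in expected_facts if fact.lower() in answer.lower())
    return hits / len(expected_facts)

def run_eval(generate) -> dict:
    """`generate` is whatever model call the team uses: prompt -> answer string."""
    scores = [factuality_check(generate(item.prompt), item.expected_facts) for item in DATASET]
    return {"n_items": len(scores), "mean_factuality": sum(scores) / len(scores)}

if __name__ == "__main__":
    # Dummy model so the sketch runs end to end; swap in a real client.
    dummy_model = lambda prompt: "Mars" if "planet" in prompt else "It boils at 100 °C."
    print(run_eval(dummy_model))
```

Swapping the dummy callable for a real model client is all it takes to run the same dataset against different model versions and compare the reports.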
Course Curriculum (3 Modules)
Module 1
Systematic Testing of LLMs and AI Pipelines
Full-stack methodology for evaluating LLM-driven systems, including RAG architectures, tool-augmented workflows, and multi-component pipelines. Students learn to create datasets, operationalize LLM-as-a-judge scoring, automate checks, and build failure taxonomies.
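To make "operationalizing LLM-as-a-judge" concrete, here is a hedged, provider-agnostic sketch: a rubric prompt plus score parsing, written against a generic call_model(prompt) -> str callable. The template and function names are illustrative assumptions, not the program's actual materials.

```python
# Illustrative LLM-as-a-judge wrapper: build a rubric prompt, call any model, parse a 1-5 score.
import re
from typing import Callable

JUDGE_TEMPLATE = """You are a strict evaluator.
Question: {question}
Answer under review: {answer}
Rate the answer's factual correctness from 1 (wrong) to 5 (fully correct).
Reply with only the number."""

def judge_score(question: str, answer: str, call_model: Callable[[str], str]) -> int:
    """Ask the judge model for a 1-5 score; raise if the reply contains no valid digit."""
    reply = call_model(JUDGE_TEMPLATE.format(question=question, answer=answer))
    match = re.search(r"[1-5]", reply)
    if match is None:
        raise ValueError(f"Unparseable judge reply: {reply!r}")
    return int(match.group())

if __name__ == "__main__":
    # Stub judge so the sketch is runnable; replace with a real LLM client call.
    stub_judge = lambda prompt: "4"
    print(judge_score("What is 2 + 2?", "4", stub_judge))
```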
Module 2
Evaluation of Multimodal and Multi-Step Agentic Systems
Assess multimodal models and agent systems with multi-step reasoning. Includes intermediate state evaluation, tool interaction validation, stability checks, and alignment assessment.
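One stability check from this module can be illustrated in a few lines: run the same task several times and measure how often the final answers agree. The agent below is a toy placeholder; real agent frameworks and answer-normalization rules are left to the reader.

```python
# Illustrative stability probe: run the same task N times and measure answer consistency.
import random
from collections import Counter
from typing import Callable

def stability_rate(task: str, run_agent: Callable[[str], str], n_runs: int = 5) -> float:
    """Share of runs that agree with the most common (normalized) final answer."""
    answers = [run_agent(task).strip().lower() for _ in range(n_runs)]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / n_runs

if __name__ == "__main__":
    # Toy non-deterministic "agent" so the sketch runs; swap in the real pipeline.
    toy_agent = lambda task: random.choice(["Paris", "Paris", "paris", "Lyon"])
    print(stability_rate("Capital of France?", toy_agent, n_runs=10))
```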
Module 3
Continuous Evaluation & CI/CD for AI Systems
Design continuous evaluation pipelines, integrate automated evaluators, detect regressions, compare model versions, and build monitoring dashboards.
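As a small taste of the CI/CD material, a minimal regression gate: compare a candidate model's metric report against a stored baseline and fail the build when any metric drops by more than a tolerance. The file names and threshold are placeholders, not prescribed by the course.

```python
# Illustrative CI regression gate: exit non-zero if any metric drops beyond a tolerance.
import json
import sys

TOLERANCE = 0.02  # placeholder: allow at most a 0.02 drop per metric

def check_regression(baseline_path: str, candidate_path: str) -> list[str]:
    """Return a list of metrics where the candidate regressed past the tolerance."""
    with open(baseline_path) as f:
        baseline = json.load(f)
    with open(candidate_path) as f:
        candidate = json.load(f)
    failures = []
    for metric, base_value in baseline.items():
        new_value = candidate.get(metric, 0.0)
        if new_value < base_value - TOLERANCE:
            failures.append(f"{metric}: {base_value:.3f} -> {new_value:.3f}")
    return failures

if __name__ == "__main__":
    # Typical CI usage: python regression_gate.py baseline.json candidate.json
    regressions = check_regression(sys.argv[1], sys.argv[2])
    if regressions:
        print("Regression detected:\n" + "\n".join(regressions))
        sys.exit(1)
    print("No regressions; safe to promote this model version.")
```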
Course Format
6 weeks
Weekly lectures and hands-on assignments
Real evaluation datasets
Templates and engineering frameworks
Step-by-step implementation guides
Lifetime access
Certificate of completion
Pricing
Book a Consultation — $150/hour
Join the Course Waitlist — $999
© 2025–2026 · Dmytro Kyiashko
AI Systems Testing & Evaluation Expert