Demetrios Brinkmann on Building Reliable Evaluation Systems for MLOps | swampUP 2025

September 15, 2025

Standard benchmarks often fall short and can be misleading. Leaderboards can erode trust in model claims, as they rarely address specific, real-world needs. In this talk, Demetrios Brinkmann will detail how MLOps engineers and developers can build and continuously update their own evaluation systems to create a strong competitive advantage. He’ll cover how to build a reliable “golden dataset,” optimize data collection and labeling, and use the right tools to ensure evaluations truly reflect the intended use case.
