#evaluation

3 items tagged with "evaluation"

📄 Articles

📄 article

medium.com

Apr 7, 2026

New article: Which Verdicts Changed, and Why: a Row-Level Audit of Fabric Data Agent Evaluation

The author performs a detailed row-level audit of a 72-question benchmark to understand why evaluation verdicts changed after fixing errors in the benchmark itself. Many initial “failures” turn out to be caused by faulty ground truth, ambiguous phrasing, or inconsistent casing rules rather than true Data Agent mistakes. After refining benchmark wording, tightening Agent instructions, and clarifying metric definitions, accuracy rises to 97.2%. The few remaining errors stem from extremely complex multi-step prompts and ambiguous schema references, revealing limits of the underlying model rather than flaws in the benchmark.
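To make the idea of a row-level audit concrete, here is a minimal sketch of comparing per-question verdicts between two evaluation runs. It is not the author's code and does not use the Fabric SDK: the function names, the question ids, and the toy verdicts are all hypothetical, and each run is assumed to be a plain mapping from question id to a pass/fail boolean. The last line only checks the arithmetic behind the summary: on 72 questions, 70 correct answers round to 97.2%.

```python
def diff_verdicts(before, after):
    """Return {question_id: (old, new)} for every verdict that changed.

    Both arguments are assumed to map a question id to True (pass)
    or False (fail). Ids missing from `after` are ignored.
    """
    changed = {}
    for qid, old in before.items():
        new = after.get(qid)
        if new is not None and new != old:
            changed[qid] = (old, new)
    return changed


def accuracy(verdicts):
    """Share of questions with a passing verdict."""
    return sum(verdicts.values()) / len(verdicts)


# Hypothetical toy data, not taken from the article.
before = {"q01": True, "q02": False, "q03": False, "q04": True}
after = {"q01": True, "q02": True, "q03": False, "q04": True}

changed = diff_verdicts(before, after)
print(changed)                      # {'q02': (False, True)}
print(f"{accuracy(after):.1%}")     # 75.0%

# Arithmetic check for a 72-question benchmark.
print(f"{70 / 72:.1%}")             # 97.2%
```

Listing only the rows whose verdict flipped keeps the audit focused: each changed row can then be inspected by hand to decide whether the benchmark or the agent was at fault.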

data-agent fabric evaluation

Author: Luca Zavarella

📄 article

lucazavarella.medium.com

Mar 17, 2026

We Built the Benchmark. Now Let’s Evaluate the Fabric Data Agent for Real

This article shows how to move from a benchmark design to a real evaluation workflow for a Microsoft Fabric Data Agent. Starting from a 72-question benchmark built in a previous article for an Italian multilingual scenario, it explains how to complete the ground-truth dataset, run evaluate_data_agent on Fabric, inspect summary and row-level results, and use notebooks to operationalize the full process. A key insight is that part of the observed weakness may come not only from the Data Agent, but also from the evaluation layer itself. By inspecting the SDK source code and testing a stricter custom critic prompt, the article shows how evaluation reliability can improve significantly without changing the agent or the benchmark. Overall, the piece is a practical guide to benchmarking and evaluating Fabric Data Agents more rigorously, especially in multilingual business scenarios.

data-agent fabric evaluation multilingual

Author: Luca Zavarella

🎬 Videos

🎬 video

youtube.com

Aug 9, 2025

Extend Fabric Data Agents with Python SDK End to End Tutorial

Let's walk through an end-to-end tutorial for extending an existing Data Agent with Bradley Ball, aka @SQLBalls. In this tutorial we use the first two links to clone the GitHub repo and substitute our own questions for those hard-coded in the tutorial. We also extend the tutorial to move from hard-coded questions to a more interactive Q&A application!

dataagent pythonsdk evaluation

Speaker: Bradley Ball