#evaluation

2 items tagged with "evaluation"

📄 Articles

lucazavarella.medium.com

Mar 17, 2026

We Built the Benchmark. Now Let’s Evaluate the Fabric Data Agent for Real

This article shows how to move from a benchmark design to a real evaluation workflow for a Microsoft Fabric Data Agent. Starting from a 72-question benchmark built in a previous article for an Italian multilingual scenario, it explains how to complete the ground-truth dataset, run evaluate_data_agent in Fabric, inspect the summary and row-level results, and use notebooks to operationalize the full process. A key insight is that part of the observed weakness may come not only from the Data Agent but also from the evaluation layer itself. By inspecting the SDK source code and testing a stricter custom critic prompt, the article shows that evaluation reliability can improve significantly without changing either the agent or the benchmark. Overall, the piece is a practical guide to benchmarking and evaluating Fabric Data Agents more rigorously, especially in multilingual business scenarios.

data-agent fabric evaluation multilingual

Author: Luca Zavarella

🎬 Videos

youtube.com

Aug 9, 2025

Extend Fabric Data Agents with Python SDK End to End Tutorial

Let's walk through an end-to-end tutorial for extending an existing Data Agent with Bradley Ball, aka @SQLBalls. In this tutorial we will use the first two links to get and clone the GitHub repo, then use the code to substitute our own questions for the ones hard-coded in the tutorial. We also extend the tutorial to move from hard-coded questions to a more interactive Q&A application!

dataagent pythonsdk evaluation

Speaker: Bradley Ball