{"id":156,"date":"2026-03-24T15:40:43","date_gmt":"2026-03-24T15:40:43","guid":{"rendered":"https:\/\/thelab.wppresolve.com\/?p=156"},"modified":"2026-04-28T09:57:43","modified_gmt":"2026-04-28T09:57:43","slug":"data-quality-assurance-agent-blog-post","status":"publish","type":"post","link":"https:\/\/cms.research.wpp.com\/?p=156","title":{"rendered":"Meet Your New Agentic Data Guardian"},"content":{"rendered":"\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" style=\"border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-left-radius:0px;border-bottom-right-radius:0px\"><strong>1. The high cost of \u201cdirty\u201d data<\/strong><\/h2>\n\n\n\n<p>In the modern enterprise, data is the raw ingredient behind every strategic decision. Think of it like a premier restaurant: the Data Engineer is the sous-chef, meticulously sourcing and preparing ingredients, while the Data Scientist is the executive chef, transforming them into the predictive models and insights that drive the business forward. If the ingredients are spoiled or mislabelled, the final dish fails, no matter how talented the chef.<\/p>\n\n\n\n<p>Across several of our AI initiatives at WPP, we uncovered a pattern that was quietly draining velocity from our most ambitious projects. Our \u201csous-chefs\u201d, skilled data engineers responsible for pipeline integrity, were spending <strong>up to one full day per week<\/strong> on tedious, largely manual Quality Assurance (QA) of data flowing into BigQuery. Row by row, column by column, they checked for missing values, logical contradictions, and phantom duplicates, work that was essential but deeply repetitive.<\/p>\n\n\n\n<p>This wasn\u2019t just an inconvenience. It was a <strong>strategic bottleneck<\/strong>: it slowed the delivery of every downstream AI application, consumed senior engineering talent on janitorial tasks, and most dangerously created risk. 
When a human eye is the only safeguard between raw data and a production model, errors don\u2019t just slip through occasionally. They slip through <em>systematically<\/em>, at exactly the moments when the data is most complex and the engineer is most fatigued.<\/p>\n\n\n\n<p>We asked ourselves a different question: <em>What if, instead of building another dashboard or writing another validation script, we built an intelligent agent, one that could reason about data quality the way an experienced engineer does, learn from every audit it performs, and get better over time?<\/em><\/p>\n\n\n\n<p>This article describes how we built that agent, what makes it fundamentally different from traditional automation, and what happened when we put it to the test.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" style=\"margin-top:19px;margin-bottom:0\"><strong>2. The problem: why data quality demands more than scripts<\/strong><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>The data &amp; the modelling ecosystem<\/strong><\/h3>\n\n\n\n<p>The agent operates on digital marketing campaign performance data hosted in BigQuery, massive tables that track how advertising campaigns perform on a daily basis across major ad networks like Meta (Facebook and Instagram). Each row represents a highly granular intersection of a specific campaign, audience segment, platform, device, and creative asset. 
This data captures everything from broad identifiers (like the parent brand and geographical targeting) down to precise performance metrics, including impressions, clicks, daily spend, conversions, leads, and app installs.<\/p>\n\n\n\n<p>This foundational data is the lifeblood of two critical machine learning systems:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>The Prediction Model:<\/strong>&nbsp;A classification system designed to predict whether a planned campaign will yield a negative, neutral, or positive outcome.<\/li>\n\n\n\n<li><strong>The Recommendation System:<\/strong>&nbsp;A highly flexible advisory engine capable of handling any combination of &#8220;missing modalities.&#8221; For example, if a media planner inputs a specific Brand, Target Audience, and Location, the system dynamically recommends the optimal missing parameters, such as the best platform to use and the most effective creative asset to deploy.<\/li>\n<\/ol>\n\n\n\n<p>For more background on the broader modelling context, see <a href=\"https:\/\/research.wpp.com\/blog\/from-guesswork-to-foresight-how-ai-is-predicting-the-future-of-marketing-campaigns\">From Guesswork to Glimpse: How AI is Predicting the Future of Marketing Campaigns<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>The silent threat of data corruption<\/strong><\/h3>\n\n\n\n<p>Because these models directly inform real-world media spend and strategic campaign planning, their accuracy is paramount. The underlying data is regularly refreshed directly from the advertising platforms to keep the models up to date. 
However, this automated refresh process frequently introduces subtle corruption and systemic inconsistencies.<\/p>\n\n\n\n<p>For instance, while metrics like&nbsp;<code>engagement<\/code>&nbsp;and&nbsp;<code>clicks<\/code>&nbsp;generally remain stable, downstream pipeline issues frequently render&nbsp;<code>conversions<\/code>&nbsp;and&nbsp;<code>awareness<\/code>&nbsp;metrics unreliable (&#8220;not high quality&#8221;). At the individual row level, these anomalies are often entirely invisible. But at scale, they are devastating. If left unchecked, these untrustworthy data points bleed into the training sets, silently degrading the prediction model&#8217;s accuracy and causing the recommendation engine to suggest sub-optimal, expensive campaign configurations. This makes rigorous, automated data quality validation not just a nice-to-have, but an absolute necessity for the ecosystem to function.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The failure modes<\/h3>\n\n\n\n<p>The scale and velocity of data flowing into BigQuery mean that errors don\u2019t announce themselves. They hide. Through our manual QA process, we catalogued six prevalent failure modes, each one capable of silently degrading every model built on top of the data:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Failure Mode<\/th><th>What Happens<\/th><th>Why It Matters<\/th><\/tr><\/thead><tbody><tr><td><strong>Missing Values<\/strong><\/td><td>Fields arrive empty: sometimes 5% of a column, sometimes 40%<\/td><td>Models trained on incomplete data learn incomplete patterns. 
Forecasts drift silently.<\/td><\/tr><tr><td><strong>Outliers<\/strong><\/td><td>A metric reads 200,000 clicks when the true value is 500<\/td><td>A single extreme value can skew an entire model\u2019s calibration, distorting spend recommendations.<\/td><\/tr><tr><td><strong>Duplicate Rows<\/strong><\/td><td>Identical records appear multiple times<\/td><td>Inflated counts cascade into inflated budgets. Campaigns appear to outperform reality.<\/td><\/tr><tr><td><strong>Categorical Corruption<\/strong><\/td><td>A brand name like <code>\"Nike\"<\/code> is replaced with <code>\"zX9pQ\"<\/code><\/td><td>Segmentation breaks. Reports attribute performance to entities that don\u2019t exist.<\/td><\/tr><tr><td><strong>Logical Inconsistencies<\/strong><\/td><td>More clicks than impressions. Spend recorded against zero impressions.<\/td><td>These are the most insidious. Each value looks valid in isolation, but the <em>relationships<\/em> between them violate business reality.<\/td><\/tr><tr><td><strong>Missing Columns<\/strong><\/td><td>An entire field disappears from a refresh<\/td><td>Downstream pipelines fail or, worse, silently fall back to defaults.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>A static validation script can catch some of these: the easy ones, the ones you&#8217;ve already seen. But scripts are <strong>brittle<\/strong>: they encode yesterday\u2019s assumptions and break on tomorrow\u2019s edge case. They cannot reason about <em>why<\/em> a pattern looks wrong, weigh it against historical context, or decide whether a recurring anomaly is a genuine error or a known artifact of a data source.<\/p>\n\n\n\n<p>That requires judgment. And judgment is what we built the agent to provide.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">3. 
Our approach: an agent that reasons, remembers, and improves<\/h2>\n\n\n\n<p>We designed the <strong>Data Quality Assurance Agent<\/strong> as a <em>reasoning entity<\/em> capable of planning an audit strategy, querying data, forming hypotheses about its health, testing those hypotheses, and learning from the results. The distinction matters. A script checks what you tell it to check. An agent decides <em>what to check<\/em>, based on what it knows, and it has the tools to act on that decision end-to-end.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Architecture: one agent, specialised tools<\/h3>\n\n\n\n<p>The agent is powered by a single reasoning core that plans, decides, and acts. What gives it breadth is its <strong>toolkit<\/strong>, a set of specialised capabilities it can invoke as needed, selecting the right tool for each step of the audit:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"572\" src=\"https:\/\/cms.research.wpp.com\/wp-content\/uploads\/2026\/03\/generated-image-1774261680374-1024x572.jpg\" alt=\"\" class=\"wp-image-438\" srcset=\"https:\/\/cms.research.wpp.com\/wp-content\/uploads\/2026\/03\/generated-image-1774261680374-1024x572.jpg 1024w, https:\/\/cms.research.wpp.com\/wp-content\/uploads\/2026\/03\/generated-image-1774261680374-300x167.jpg 300w, https:\/\/cms.research.wpp.com\/wp-content\/uploads\/2026\/03\/generated-image-1774261680374-768x429.jpg 768w, https:\/\/cms.research.wpp.com\/wp-content\/uploads\/2026\/03\/generated-image-1774261680374.jpg 1376w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Data Agent Architecture Diagram<\/figcaption><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Database Tool:<\/strong> enables the agent to query BigQuery directly, fetching schemas, row counts, column statistics, and raw data samples.<\/li>\n\n\n\n<li><strong>Auditing Tool:<\/strong> the agent\u2019s analytical 
engine. It formulates hypotheses about potential quality issues, runs targeted checks, and compiles structured findings. This tool reads from and writes to the <strong>Memory Bank<\/strong>.<\/li>\n\n\n\n<li><strong>Analytics Tool:<\/strong> generates visualisations using Python, including charts, distributions, and plots that make audit findings immediately legible to stakeholders.<\/li>\n\n\n\n<li><strong>Artifact Tool:<\/strong> packages the final audit report, charts, and evidence into downloadable artifacts stored in Google Cloud.<\/li>\n<\/ul>\n\n\n\n<p>The agent orchestrates these tools autonomously. When a user asks it to audit a table, the agent formulates a plan, queries the data, runs its checks, generates visualisations where useful, and compiles a structured report, all without the user needing to specify which tool to use or in what order.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The key innovation: long-term memory<\/h3>\n\n\n\n<p>Most AI tools are stateless. When the session ends, everything the system learned disappears. The next audit starts from zero. This is the fundamental limitation we set out to break. The agent maintains a <strong>persistent Memory Bank,<\/strong> a long-term knowledge store that survives across sessions and accumulates institutional intelligence over time. This memory captures three categories of knowledge:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Historical Explanations<\/strong> When a data engineer confirms that a recurring anomaly is caused by a known tracking limitation or data source quirk, the agent records that explanation. 
The next time it encounters the same pattern, it doesn\u2019t waste time flagging it as a new issue; it references the known cause, notes it in the report, and moves on to genuinely novel problems.<\/li>\n\n\n\n<li><strong>Business Context<\/strong> Over successive audits, the agent absorbs the specific rhythms and patterns of our marketing data: seasonal spikes, platform-specific reporting delays, and expected variance ranges for different campaign types. This contextual awareness allows it to distinguish between a <em>real<\/em> anomaly and normal business variation.<\/li>\n\n\n\n<li><strong>Evolutionary Learning<\/strong> With every audit, the agent\u2019s knowledge base deepens. Instead of repeating the same blind checks, it refines its hypotheses based on what it has seen before, including which columns tend to have issues, which tables are most prone to duplication, and which logical inconsistencies recur. The agent doesn\u2019t just run. It <em>compounds<\/em>.<\/li>\n<\/ol>\n\n\n\n<p>This is what separates an agent from a script. A script executes the same logic every time, regardless of history. 
The agent carries forward everything it has learned, and every audit it performs makes the next one sharper.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The tech stack<\/h3>\n\n\n\n<p>To ensure the agent was enterprise-grade, we built on the full Google Cloud AI ecosystem:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Component<\/th><th>Role<\/th><\/tr><\/thead><tbody><tr><td><strong>Vertex AI Agent Engine<\/strong><\/td><td>Persists the agent\u2019s long-term memory and stores chat sessions<\/td><\/tr><tr><td><strong>BigQuery<\/strong><\/td><td>The single source of truth where the agent performs direct, in-place auditing against production tables<\/td><\/tr><tr><td><strong>Agent Development Kit (ADK)<\/strong><\/td><td>The framework used to define the agent\u2019s tools, constraints, and interaction boundaries<\/td><\/tr><tr><td><strong>Google Cloud Storage<\/strong><\/td><td>Persistent storage for audit trails, PDF reports, and visual evidence<\/td><\/tr><tr><td><strong>Cloud Run<\/strong><\/td><td>Hosts the A2A Agent API and the ADK Web UI for demo purposes<\/td><\/tr><tr><td><strong>A2A<\/strong><\/td><td>The protocol used to expose the agent as a headless API<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">4. Proving it works: synthetic error injection<\/h2>\n\n\n\n<p>We didn\u2019t hope the agent worked. We <strong>proved<\/strong> it using a controlled methodology we call <strong>Synthetic Error Injection<\/strong>. The premise is straightforward: take a perfectly clean dataset, intentionally corrupt it in specific, measurable ways, and then challenge the agent to find every error we planted. 
If the agent can detect artificially injected errors, whose exact type, location, and severity we control, we can be confident it will handle real-world data corruption, which is typically far less extreme.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Preparing the test data<\/h3>\n\n\n\n<p>Before injecting errors, we prepare the data for safe, controlled experimentation:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Anonymisation:<\/strong> Real brand and advertiser names are replaced with generic identifiers (<code>\"Brand 1\"<\/code>, <code>\"Company A\"<\/code>). Sensitive business information never enters the test environment.<\/li>\n\n\n\n<li><strong>Corruption:<\/strong> The dataset is then corrupted at varying severity levels. This allows us to map the agent\u2019s detection accuracy as a function of error density, from subtle (5%) to extreme (40%).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Injecting controlled errors<\/h3>\n\n\n\n<p>Using purpose-built scripts, we introduce precisely calibrated corruptions into a table, covering 4 structural and 7 logical error types:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th><strong>Category<\/strong><\/th><th><strong>Error<\/strong><\/th><th><strong>Description<\/strong><\/th><\/tr><\/thead><tbody><tr><td>Structural<\/td><td>Missing Values (Nulls)<\/td><td>X% of cells set to NULL<\/td><\/tr><tr><td>Structural<\/td><td>Duplicate Rows<\/td><td>X% exact row copies<\/td><\/tr><tr><td>Structural<\/td><td>Dropped Columns<\/td><td>X% of columns removed<\/td><\/tr><tr><td>Structural<\/td><td>Categorical Errors<\/td><td>Random alphanumeric strings in category fields<\/td><\/tr><tr><td>Logical<\/td><td>Clicks &gt; Impressions<\/td><td>Can&#8217;t click what wasn&#8217;t shown<\/td><\/tr><tr><td>Logical<\/td><td>Conversions &gt; Clicks<\/td><td>Can&#8217;t convert without clicking<\/td><\/tr><tr><td>Logical<\/td><td>Spend with 0 
Impressions<\/td><td>Paying for zero ad delivery<\/td><\/tr><tr><td>Logical<\/td><td>Video Completions &gt; Plays<\/td><td>Can&#8217;t finish without starting<\/td><\/tr><tr><td>Logical<\/td><td>Purchases without Add-to-Cart<\/td><td>Funnel step skipped<\/td><\/tr><tr><td>Logical<\/td><td>Landing Page Views &gt; Clicks<\/td><td>More landings than clicks<\/td><\/tr><tr><td>Logical<\/td><td>Negative Metric Values<\/td><td>Performance metrics can&#8217;t be negative<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Step 3: Synthetic ground truth dataset<\/h3>\n\n\n\n<p>We keep track of the errors we introduce in a table and produce a ground truth dataset that looks like:<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:100%\">\n<div class=\"wp-block-group has-global-padding is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-columns is-style-default is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:100%\">\n<figure style=\"margin-right:0;margin-left:0\" class=\"wp-block-table has-small-font-size\"><table style=\"border-style:none;border-width:0px\"><thead><tr><th>Table_name<\/th><th>number_of_injected_logical_errors<\/th><th>type_of_logical_error<\/th><th>number_of_injected_structural_errors<\/th><th>type_of_structural_error<\/th><\/tr><\/thead><tbody><tr><td>table_01<\/td><td>0<\/td><td>&#8211;<\/td><td>1<\/td><td>categorical errors<\/td><\/tr><tr><td>table_02<\/td><td>0<\/td><td>&#8211;<\/td><td>1<\/td><td>dropped 
columns<\/td><\/tr><tr><td>table_03<\/td><td>1<\/td><td>clicks_exceed_impressions<\/td><td>0<\/td><td>&#8211;<\/td><\/tr><tr><td>table_04<\/td><td>1<\/td><td>spend_with_zero_impressions<\/td><td>0<\/td><td>&#8211;<\/td><\/tr><\/tbody><\/table><\/figure>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">5. Evaluation pipeline, experiments and results<\/h2>\n\n\n\n<p>To evaluate our agent, we follow the pipeline below:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"559\" src=\"https:\/\/cms.research.wpp.com\/wp-content\/uploads\/2026\/03\/evaluation_pipeline_generated-image-1774267652120-1024x559.jpg\" alt=\"\" class=\"wp-image-440\" style=\"width:800px\" srcset=\"https:\/\/cms.research.wpp.com\/wp-content\/uploads\/2026\/03\/evaluation_pipeline_generated-image-1774267652120-1024x559.jpg 1024w, https:\/\/cms.research.wpp.com\/wp-content\/uploads\/2026\/03\/evaluation_pipeline_generated-image-1774267652120-300x164.jpg 300w, https:\/\/cms.research.wpp.com\/wp-content\/uploads\/2026\/03\/evaluation_pipeline_generated-image-1774267652120-768x419.jpg 768w, https:\/\/cms.research.wpp.com\/wp-content\/uploads\/2026\/03\/evaluation_pipeline_generated-image-1774267652120.jpg 1408w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Evaluation pipeline flow diagram<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">The 4 experiments and results<\/h2>\n\n\n\n<p>Each experiment isolates a <strong>single variable<\/strong> to understand what affects the auditor agent&#8217;s detection quality.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Experiment 1: Prompt comparison<\/h3>\n\n\n\n<p><strong>Question:<\/strong> <em>Does giving the agent a more detailed prompt improve error detection?<\/em><\/p>\n\n\n\n<p>We run the agent 3 times on the same 
table, each time with a different user query style:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Prompt Level<\/th><th>What the user asks<\/th><th>Agent&#8217;s system instruction<\/th><\/tr><\/thead><tbody><tr><td><strong>Simple<\/strong><\/td><td>&#8220;Check if there are any errors for table X&#8221;<\/td><td>Stays constant (forensic mode)<\/td><\/tr><tr><td><strong>Medium<\/strong><\/td><td>&#8220;Perform a structured assessment checking physical integrity, numerical sanity, categorical validity&#8221;<\/td><td>Stays constant<\/td><\/tr><tr><td><strong>Complex<\/strong><\/td><td>&#8220;Conduct a forensic audit checking for 11 specific error types with detailed cross-column logical checks&#8221;<\/td><td>Stays constant<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Key insight from results:<\/strong> &nbsp;Only the complex prompt successfully detected the injected&nbsp;<code>spend_with_zero_impressions<\/code>&nbsp;error (139 rows, 1.82%), while both the simple and medium prompts missed it entirely, confirming that more detailed, forensic-style instructions are critical for the agent to test nuanced logical relationships rather than just surface-level checks.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Experiment 2: Table sweep<\/h3>\n\n\n\n<p><strong>Question:<\/strong> <em>How well does the agent detect different types of errors?<\/em><\/p>\n\n\n\n<p>Experiment 2 stress-tests the Auditor agent (using the complex prompt) on 11 synthetic BigQuery tables with progressively stacked error combinations, ranging from a single logical violation to the full set of 7 logical plus 4 structural error types (11 total). 
The agent achieved perfect detection on 8 of 11 tables (72.7% with F1 = 1.0) and delivered strong overall performance (precision 0.905, recall 0.864, F1 0.883).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Experiment 3: Consistency check<\/h3>\n\n\n\n<p><strong>Question:<\/strong> <em>If we run the exact same audit N times, do we get the same results?<\/em><\/p>\n\n\n\n<p>We run the identical configuration multiple times (default: 3).<\/p>\n\n\n\n<p>The Consistency Experiment demonstrated that the auditor agent achieves 100% reproducibility, detecting both injected errors and real data quality issues with the exact same metrics and explanations across all three runs. This deterministic performance confirms that the agent&#8217;s complex prompt configuration is highly reliable, effectively eliminating the need for redundant audits.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Experiment 4: Memory ablation<\/h3>\n\n\n\n<p><strong>Question:<\/strong> <em>Does loading past audit memories help the agent perform better?<\/em><\/p>\n\n\n\n<p>We run the agent twice on the same table with the complex prompt:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Condition<\/th><th>What happens<\/th><\/tr><\/thead><tbody><tr><td><strong>No Memory<\/strong><\/td><td>Cold start; agent has no context from prior audits<\/td><\/tr><tr><td><strong>With Memory<\/strong><\/td><td>Agent loads past findings from Vertex AI memory before auditing<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Key insight from results:<\/strong> &nbsp;Memory doesn&#8217;t unlock new capabilities when tables contain simple structural errors or when the user prompt already leads to a very detailed audit, but it acts as a critical consistency enforcer, stabilising the audit scope to guarantee a 100% 
success rate. However, when a table contains many different logical errors and the prompt is simple, performance becomes optimal once past memories are loaded into the agent\u2019s present context.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">6. The agent in action<\/h2>\n\n\n\n<p>Below is an actual interaction with the deployed agent, demonstrating its reasoning in practice. The agent discovers available tables, loads its accumulated memory from prior audits, and then, guided by that memory, conducts a targeted audit.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Discovery<\/h3>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Human:<\/strong> \u201cWhat tables do you have access to?\u201d<\/p>\n\n\n\n<p><strong>Agent:<\/strong> \u201cI have access to the following tables in the dataset:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>campaign_performance_table_1<\/code><\/li>\n\n\n\n<li><code>campaign_performance_table_2<\/code><\/li>\n\n\n\n<li><code>campaign_performance_table_3<\/code><\/li>\n\n\n\n<li><code>campaign_performance_table_4<\/code><\/li>\n\n\n\n<li><code>campaign_performance_table_5<\/code><\/li>\n<\/ul>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Memory recall<\/h3>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Human:<\/strong> \u201cLoad your memories.\u201d<\/p>\n\n\n\n<p><strong>Agent:<\/strong> \u201cI have loaded the following memories:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>A general instruction to <strong>always check for null values<\/strong> in tables.<\/li>\n\n\n\n<li>A specific data quality issue in <code>campaign_performance_table_2<\/code> where <strong>188 rows (18.8%) were duplicates<\/strong> based on <code>event_date<\/code>, <code>campaign_id<\/code>, and <code>ad_creative_id<\/code>.<\/li>\n\n\n\n<li>A 
preference for reports to be <strong>formatted in Markdown<\/strong> and include an <strong>executive summary<\/strong>.\u201d<\/li>\n<\/ol>\n<\/blockquote>\n\n\n\n<p>The agent doesn\u2019t start from scratch. It arrives with institutional knowledge: prior findings, user preferences, and standing instructions, all retrieved from its persistent Memory Bank.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Hypothesis-driven audit<\/h3>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Human:<\/strong> \u201cBased on these memories, audit <code>campaign_performance_table_3<\/code>.\u201d<\/p>\n<\/blockquote>\n\n\n\n<p>The agent formulates a plan, queries the table using its database tool, and, informed by its memories, produces a structured audit report:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Executive Summary:<\/strong> This audit revealed a consistent pattern of missing values across key columns, affecting exactly 5% of total rows. Specifically, <code>event_date<\/code>, <code>campaign_id<\/code>, <code>impressions<\/code>, <code>clicks<\/code>, and <code>conversions<\/code> each had 50 null entries out of 1,000 rows. No exact duplicate rows were found. 
This suggests a <strong>systemic issue with data completeness<\/strong> rather than random omissions.<\/p>\n<\/blockquote>\n\n\n\n<p>The agent then provides detailed findings per column, quantifying impact, assessing severity, and recommending next steps, before offering to proceed with outlier detection, categorical validation, and logical consistency checks.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Persistent learning<\/h3>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Human:<\/strong> \u201cSave the results for this table.\u201d<\/p>\n\n\n\n<p><strong>Agent:<\/strong> \u201cThe audit results have been saved to memory.\u201d<\/p>\n<\/blockquote>\n\n\n\n<p>The next time the agent audits <em>any<\/em> table, it will carry forward what it learned here: the 5% missing-value pattern, the columns affected, the systemic nature of the issue. Every audit makes the next one faster, more targeted, and more informed.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">7. Impact and what comes next<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Operational impact<\/h3>\n\n\n\n<p>The deployment of the Data Quality Assurance Agent has delivered measurable results:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Reclaimed engineering capacity.<\/strong> By automating the bulk of manual QA, data engineers have shifted from repetitive inspection work to higher-value activities.<\/li>\n\n\n\n<li><strong>Earlier error detection.<\/strong> Quality issues are now caught <em>at the point of ingestion<\/em>, before they propagate into downstream models. 
This \u201cshift-left\u201d approach shortens the window in which bad data can propagate from hours to minutes.<\/li>\n\n\n\n<li><strong>Higher model reliability.<\/strong> Marketing agents, analytics pipelines, and machine learning models now operate on data that has been systematically validated, reducing the risk of predictions and recommendations built on flawed foundations.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">The bigger picture<\/h3>\n\n\n\n<p>This agent is more than a tool. It is a <strong>blueprint for autonomous data governance,<\/strong> a pattern that can be replicated across any data pipeline where quality, scale, and velocity collide.<\/p>\n\n\n\n<p>We are currently extending the agent along three axes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cross-table auditing:<\/strong> enabling the agent to detect inconsistencies <em>across related datasets<\/em>, not just within a single table. Many of the most damaging data quality issues manifest as contradictions between tables that individually look clean.<\/li>\n\n\n\n<li><strong>Event-driven execution:<\/strong> triggering the agent automatically whenever a BigQuery table is updated, transforming data quality monitoring from a scheduled chore into a <strong>continuous, always-on safeguard<\/strong>.<\/li>\n\n\n\n<li><strong>Adversarial stress-testing:<\/strong> today, our synthetic error injection is script-based and manually configured. We are building a dedicated adversarial agent whose sole purpose is to generate increasingly complex, realistic data corruptions (subtle logical contradictions, plausible-looking outliers, correlated missing-value patterns) specifically designed to challenge the QA agent\u2019s detection capabilities. 
By pitting one agent against the other in a continuous red-team \/ blue-team loop, both improve: the adversarial agent learns to craft harder-to-detect errors, and the QA agent learns to catch them, driving each other toward sharper, more robust performance over time.<\/li>\n<\/ul>\n\n\n\n<p>Together, these extensions move us toward a future where data quality monitoring is not a task that consumes an engineer\u2019s day. It is a capability the agent handles continuously and intelligently, surfacing only the issues that require human judgment and decision-making.<\/p>\n\n\n\n<p>Ready to explore the specifics? Read our full technical deep dive into the <a href=\"https:\/\/research.wpp.com\/pods\/data-quality-agent-pod\">Data Quality Agent Pod<\/a> for a closer look at our methodology.<\/p>\n\n\n\n<p><em>Disclaimer: This content was created with AI assistance. All research and conclusions are the work of the WPP AI Lab team.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Silent data corruption is a well-documented challenge across modern ML pipelines. Broken ingestion jobs, schema drift, logical inconsistencies: these issues rarely trigger alerts, and by the time they&#8217;re caught, downstream models may have already been learning from compromised data. We built an autonomous agent that audits data directly in BigQuery, runs forensic structural and logical checks with zero manual input, and, crucially, remembers. Its persistent memory architecture means every audit sharpens the next, elevating data quality from a routine operational task into a compounding strategic advantage. 
The results: F1 of 0.88, perfect detection on 73% of test scenarios, and 100% consistency across runs.<\/p>\n","protected":false},"author":20,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"tags":[],"ppma_author":[{"id":20,"display_name":"Anastasios Stamoulakatos","first_name":"Anastasios","last_name":"Stamoulakatos","nickname":"anastasios.stamoulakatos","user_nicename":"anastasios-stamoulakatos","user_email":"anastasios.stamoulakatos@satalia.com","biographical_info":"Anastasios (Tasos) Stamoulakatos is a Data Scientist at Satalia (WPP), focusing on agentic AI solutions for marketing. His work spans multi-agent systems, RAG and GraphRAG, and image retrieval, developing scalable AI solutions from early-stage POCs to production. He holds a PhD in Applied AI and Computer Vision from the University of Strathclyde and has over four years of commercial experience across industries including marketing, agriculture, pharmaceuticals, oil and gas, and manufacturing, with a strong focus on applied research and turning complex AI into practical business value.","avatar_url":"https:\/\/cms.research.wpp.com\/wp-content\/uploads\/2026\/03\/headshot_small.jpg","job_title":"Data Scientist","is_lead":null,"display_as_researcher":null,"order_priority":null},{"id":21,"display_name":"Thanos Lyras","first_name":"Thanos","last_name":"Lyras","nickname":"thanos.lyras","user_nicename":"thanos-lyras","user_email":"thanos.lyras@satalia.com","biographical_info":"Thanos Lyras is a data scientist at Satalia specializing in building end-to-end AI pipelines and deploying real-world AI applications. A graduate in Computer Engineering with an MSc in Data Science, his work has led to research contributions in the fields of Big Data, AI, and database performance. 
Currently, he is focused on pioneering research in agentic projects, exploring the next wave of artificial intelligence.","avatar_url":"https:\/\/cms.research.wpp.com\/wp-content\/uploads\/2026\/04\/profile_photo.png","job_title":"Data Scientist","is_lead":false,"display_as_researcher":true,"order_priority":null}],"class_list":["post-156","post","type-post","status-publish","format-standard","hentry"],"acf":{"related_pods":[155],"featured":false},"authors":[{"term_id":20,"user_id":20,"is_guest":0,"slug":"anastasios-stamoulakatos","display_name":"Anastasios Stamoulakatos","avatar_url":"https:\/\/cms.research.wpp.com\/wp-content\/uploads\/2026\/03\/headshot_small.jpg","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":"","biographical_info":"Anastasios (Tasos) Stamoulakatos is a Data Scientist at Satalia (WPP), focusing on agentic AI solutions for marketing. His work spans multi-agent systems, RAG and GraphRAG, and image retrieval, developing scalable AI solutions from early-stage POCs to production. He holds a PhD in Applied AI and Computer Vision from the University of Strathclyde and has over four years of commercial experience across industries including marketing, agriculture, pharmaceuticals, oil and gas, and manufacturing, with a strong focus on applied research and turning complex AI into practical business value."},{"term_id":37,"user_id":21,"is_guest":0,"slug":"thanos-lyras","display_name":"Thanos Lyras","avatar_url":"https:\/\/cms.research.wpp.com\/wp-content\/uploads\/2026\/04\/profile_photo.png","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":"","biographical_info":"Thanos Lyras is a data scientist at Satalia specializing in building end-to-end AI pipelines and deploying real-world AI applications. A graduate in Computer Engineering with an MSc in Data Science, his work has led to research contributions in the fields of Big Data, AI, and database performance. 
Currently, he is focused on pioneering research in agentic projects, exploring the next wave of artificial intelligence."}],"_links":{"self":[{"href":"https:\/\/cms.research.wpp.com\/index.php?rest_route=\/wp\/v2\/posts\/156","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cms.research.wpp.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cms.research.wpp.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cms.research.wpp.com\/index.php?rest_route=\/wp\/v2\/users\/20"}],"replies":[{"embeddable":true,"href":"https:\/\/cms.research.wpp.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=156"}],"version-history":[{"count":75,"href":"https:\/\/cms.research.wpp.com\/index.php?rest_route=\/wp\/v2\/posts\/156\/revisions"}],"predecessor-version":[{"id":1325,"href":"https:\/\/cms.research.wpp.com\/index.php?rest_route=\/wp\/v2\/posts\/156\/revisions\/1325"}],"wp:attachment":[{"href":"https:\/\/cms.research.wpp.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=156"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cms.research.wpp.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=156"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/cms.research.wpp.com\/index.php?rest_route=%2Fwp%2Fv2%2Fppma_author&post=156"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}