{"id":1649,"date":"2026-03-20T12:55:26","date_gmt":"2026-03-20T12:55:26","guid":{"rendered":"https:\/\/cms.research.wpp.com\/?post_type=research_feed&#038;p=1649"},"modified":"2026-06-17T12:39:16","modified_gmt":"2026-06-17T12:39:16","slug":"ad-performance-pod-technical-walkthrough","status":"publish","type":"research_feed","link":"https:\/\/cms.research.wpp.com\/?research_feed=ad-performance-pod-technical-walkthrough","title":{"rendered":"Ad Performance Pod: Technical walkthrough"},"content":{"rendered":"","protected":false},"excerpt":{"rendered":"","protected":false},"author":2,"featured_media":0,"template":"","meta":{"_acf_changed":false},"tags":[],"content_types":[{"id":51,"name":"Technical Walkthrough","slug":"technical-walkthrough"}],"ppma_author":[{"id":2,"display_name":"MIchal Koziki","first_name":"MIchal","last_name":"Kozicki","nickname":"michal.kozicki","user_nicename":"michalk","user_email":"michal.kozicki@wppmedia.com","biographical_info":"Designer","avatar_url":"https:\/\/cms.research.wpp.com\/wp-content\/uploads\/2026\/04\/default-1.jpg","job_title":null,"is_lead":false,"display_as_researcher":false,"order_priority":null}],"class_list":["post-1649","research_feed","type-research_feed","status-publish","hentry","content_type-technical-walkthrough"],"acf":{"content":"<p><!-- wp:heading {\"level\":1} --><\/p>\n<h1 class=\"wp-block-heading\"><strong>Motivation<\/strong><\/h1>\n<p><!-- \/wp:heading --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p>The core objective of this project is to move beyond traditional multimodal model training by leveraging the reasoning capabilities of Large Language Models (LLMs). We aim to convert complex multimodal data types (numerical data, images, text, and video) into a single unified modality: semantic text. 
This text is then used to predict media post performance.<\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p>An autonomous agent performs iterative training of a downstream model by optimizing the prompt used for the multimodal-to-text conversion. This approach addresses two challenges in modern advertising analytics:<\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p><strong>A. Modality Conversion to Text<\/strong> Traditional feature extraction requires manually defining what matters (e.g., detecting faces, measuring brightness, OCR). This is rigid and often misses high-level semantic nuances like &#8220;humor,&#8221; &#8220;urgency,&#8221; or &#8220;brand alignment.&#8221;<\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:list --><\/p>\n<ul class=\"wp-block-list\"><!-- wp:list-item --><\/p>\n<li><strong>The Solution:<\/strong> Multimodal LLMs act as universal feature extractors capable of capturing abstract concepts that traditional computer vision misses. By projecting all modalities into natural language, we normalize heterogeneous data into a single, unified format that is easy to process.<\/li>\n<p><!-- \/wp:list-item --><\/ul>\n<p><!-- \/wp:list --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p><strong>B. Prompt Optimization<\/strong> Human engineers often struggle to write the &#8220;perfect&#8221; prompt to extract the right predictive features. We might ask an LLM to &#8220;describe the image,&#8221; but we don&#8217;t know if mentioning the color palette is more predictive of ad success than mentioning the audio pace.<\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:list --><\/p>\n<ul class=\"wp-block-list\"><!-- wp:list-item --><\/p>\n<li><strong>The Solution:<\/strong> We employ an Agentic Optimizer that treats the natural language prompt as a hyperparameter. 
By iteratively rewriting the prompt based on downstream model performance, the system learns which visual and textual features genuinely correlate with success.<\/li>\n<p><!-- \/wp:list-item --><\/ul>\n<p><!-- \/wp:list --><\/p>\n<p><!-- wp:heading {\"level\":1} --><\/p>\n<h1 class=\"wp-block-heading\"><strong>High-Level Approach<\/strong><\/h1>\n<p><!-- \/wp:heading --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p>The proposed architecture functions as a feedback loop consisting of three distinct engines: The <strong>Transmuter<\/strong>, The <strong>Predictor<\/strong>, and The <strong>Optimizer<\/strong>.<\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:heading --><\/p>\n<h2 class=\"wp-block-heading\"><strong>Phase I: Semantic Translation (The Transmuter)<\/strong><\/h2>\n<p><!-- \/wp:heading --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p>This module is responsible for unifying the data modalities.<\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:list --><\/p>\n<ul class=\"wp-block-list\"><!-- wp:list-item --><\/p>\n<li><strong>Input:<\/strong> Raw Media (Video\/Image files) + Metadata (CSV Metrics) + System Prompt.<\/li>\n<p><!-- \/wp:list-item --><\/p>\n<p><!-- wp:list-item --><\/p>\n<li><strong>Process:<\/strong> A Multimodal LLM (e.g., Gemini, GPT, Claude) ingests the media. To process large volumes of posts efficiently during each optimization round, we utilize <strong>Vertex AI Batch Predictions<\/strong>. 
The System Prompt directs the model to extract specific features (e.g., &#8220;Describe the video\/image,&#8221; &#8220;Read the copy,&#8221; &#8220;Contextualize the click-through rate&#8221;).<\/li>\n<p><!-- \/wp:list-item --><\/p>\n<p><!-- wp:list-item --><\/p>\n<li><strong>Output:<\/strong> A rich, structured text document for each ad or post.<\/li>\n<p><!-- \/wp:list-item --><\/ul>\n<p><!-- \/wp:list --><\/p>\n<p><!-- wp:heading --><\/p>\n<h2 class=\"wp-block-heading\">Phase II: Performance Prediction (The Predictor)<\/h2>\n<p><!-- \/wp:heading --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p>This module evaluates performance based purely on the text output from Phase I.<\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:list --><\/p>\n<ul class=\"wp-block-list\"><!-- wp:list-item --><\/p>\n<li><strong>Input:<\/strong> The generated text profiles.<\/li>\n<p><!-- \/wp:list-item --><\/p>\n<p><!-- wp:list-item --><\/p>\n<li><strong>Process:<\/strong> A lightweight, text-only Language Model (e.g., a fine-tuned LLaMA) is trained on these descriptions. It can either predict a binary label (&#8220;Good&#8221; vs. 
&#8220;Bad&#8221; performance) or perform regression to predict continuous metrics like the number of likes.<\/li>\n<p><!-- \/wp:list-item --><\/p>\n<p><!-- wp:list-item --><\/p>\n<li><strong>Output:<\/strong> A performance probability score and a validation metric (e.g., F1-Score, RMSE, or AUC).<\/li>\n<p><!-- \/wp:list-item --><\/ul>\n<p><!-- \/wp:list --><\/p>\n<p><!-- wp:heading --><\/p>\n<h2 class=\"wp-block-heading\">Phase III: Iterative Optimization (The Optimizer Agent)<\/h2>\n<p><!-- \/wp:heading --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p>This is the meta-learning layer that improves the system over time without human intervention.<\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:list --><\/p>\n<ul class=\"wp-block-list\"><!-- wp:list-item --><\/p>\n<li><strong>The Logic:<\/strong> If the Predictor fails to classify an ad correctly, it implies the generated description was missing key information.<\/li>\n<p><!-- \/wp:list-item --><\/p>\n<p><!-- wp:list-item --><\/p>\n<li><strong>The Agent:<\/strong> An LLM agent analyses the validation results, comparing the current prompt against error analysis data.<\/li>\n<p><!-- \/wp:list-item --><\/p>\n<p><!-- wp:list-item --><\/p>\n<li><strong>The Update:<\/strong> The agent rewrites the System Prompt to be more effective. 
For example, if the model confused a &#8220;luxury&#8221; ad for a &#8220;budget&#8221; ad, the Agent might modify the prompt to: <em>&#8220;Include specific details about the production quality, describe the product environment, and capture the perceived vibe of the ad.&#8221;<\/em><\/li>\n<p><!-- \/wp:list-item --><\/ul>\n<p><!-- \/wp:list --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p><strong>Summary of Workflow:<\/strong><\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:list {\"ordered\":true} --><\/p>\n<ol class=\"wp-block-list\"><!-- wp:list-item --><\/p>\n<li><strong>Initialize:<\/strong> Start with a generic prompt (&#8220;Describe this ad&#8221;).<\/li>\n<p><!-- \/wp:list-item --><\/p>\n<p><!-- wp:list-item --><\/p>\n<li><strong>Transmute:<\/strong> Convert media to text using the current prompt.<\/li>\n<p><!-- \/wp:list-item --><\/p>\n<p><!-- wp:list-item --><\/p>\n<li><strong>Train:<\/strong> Train the text classifier\/regressor.<\/li>\n<p><!-- \/wp:list-item --><\/p>\n<p><!-- wp:list-item --><\/p>\n<li><strong>Evaluate:<\/strong> Measure accuracy.<\/li>\n<p><!-- \/wp:list-item --><\/p>\n<p><!-- wp:list-item --><\/p>\n<li><strong>Refine:<\/strong> The agent updates the prompt to extract better predictive features.<\/li>\n<p><!-- \/wp:list-item --><\/ol>\n<p><!-- \/wp:list --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p><strong>Repeat:<\/strong><\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p>Loop until performance plateaus.<\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:image --><\/p>\n<figure class=\"wp-block-image\"><img alt=\"\"\/><\/figure>\n<p><!-- \/wp:image --><\/p>\n<p><!-- wp:code --><\/p>\n<pre class=\"wp-block-code\"><code>Illustration of the proposed approach\n<\/code><\/pre>\n<p><!-- \/wp:code --><\/p>\n<p><!-- wp:heading {\"level\":1} --><\/p>\n<h1 class=\"wp-block-heading\">Dataset<\/h1>\n<p><!-- \/wp:heading --><\/p>\n<p><!-- wp:heading --><\/p>\n<h2 class=\"wp-block-heading\">Dataset overview<\/h2>\n<p><!-- 
\/wp:heading --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p>We utilized the <a href=\"https:\/\/sites.google.com\/site\/sbkimcv\/dataset\/instagram-influencer-dataset\"><strong>Instagram Influencer Dataset<\/strong><\/a> to extract text descriptions of posts and predict engagement metrics (such as the number of likes).<\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:list --><\/p>\n<ul class=\"wp-block-list\"><!-- wp:list-item --><\/p>\n<li><strong>Type:<\/strong> Category classification and regression.<\/li>\n<p><!-- \/wp:list-item --><\/p>\n<p><!-- wp:list-item --><\/p>\n<li><strong>Description:<\/strong> This dataset contains 33,935 Instagram influencers categorized into nine domains: beauty, family, fashion, fitness, food, interior, pet, travel, and other. It features 300 posts per influencer, totalling roughly 10.18 million posts.<\/li>\n<p><!-- \/wp:list-item --><\/p>\n<p><!-- wp:list-item --><\/p>\n<li><strong>Structure:<\/strong> Post metadata is stored in JSON format (caption, user tags, hashtags, timestamp, sponsorship status, likes, comments). The image files are in JPEG format. Because a single post can contain multiple images, the dataset provides a JSON-to-Image mapping file to link metadata with its corresponding visual assets.<\/li>\n<p><!-- \/wp:list-item --><\/ul>\n<p><!-- \/wp:list --><\/p>\n<p><!-- wp:heading --><\/p>\n<h2 class=\"wp-block-heading\"><strong>Exploratory Data Analysis (EDA)<\/strong><\/h2>\n<p><!-- \/wp:heading --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p>To better understand the target variables for our Predictor engine, we conducted a rigorous EDA on the dataset, revealing several key structural behaviours:<\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:list --><\/p>\n<ul class=\"wp-block-list\"><!-- wp:list-item --><\/p>\n<li><strong>Visualizing the Distribution of Likes:<\/strong> When visualizing the distribution of likes across the dataset, we observed a massive right-skew. 
The average (mean) post receives ~4,344 likes, but the median is only 662. Because of this heavy right tail, <strong>regressing directly on the raw number of likes is unreliable.<\/strong> Instead, we transform the target variable using log(likes + 1), which compresses the tail, stabilizes the variance, and lets our regression model learn effectively.<\/li>\n<p><!-- \/wp:list-item --><\/p>\n<p><!-- wp:list-item --><\/p>\n<li><strong>Likes vs. Followers Correlation:<\/strong> The scatter plot shows a strong positive correlation (<strong>0.7853<\/strong>) between an influencer&#8217;s follower count and the number of likes they receive.<\/li>\n<p><!-- \/wp:list-item --><\/p>\n<p><!-- wp:list-item --><\/p>\n<li><strong>Engagement Rate Baseline:<\/strong> We calculated the Engagement Rate (Likes \/ Followers * 100). The dataset shows a <strong>mean engagement rate of 4.23%<\/strong> and a <strong>median of 2.96%<\/strong>.<\/li>\n<p><!-- \/wp:list-item --><\/ul>\n<p><!-- \/wp:list --><\/p>\n<p><!-- wp:image {\"id\":103,\"sizeSlug\":\"full\",\"linkDestination\":\"none\"} --><\/p>\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"902\" height=\"342\" src=\"https:\/\/research.wpp.com\/wp-content\/uploads\/2026\/03\/image-6.png\" alt=\"\" class=\"wp-image-103\" srcset=\"https:\/\/cms.research.wpp.com\/wp-content\/uploads\/2026\/03\/image-6.png 902w, https:\/\/cms.research.wpp.com\/wp-content\/uploads\/2026\/03\/image-6-300x114.png 300w, https:\/\/cms.research.wpp.com\/wp-content\/uploads\/2026\/03\/image-6-768x291.png 768w\" sizes=\"auto, (max-width: 902px) 100vw, 902px\" \/><\/figure>\n<p><!-- \/wp:image --><\/p>\n<p><!-- wp:code --><\/p>\n<pre class=\"wp-block-code\"><code>                                               Histogram of likes and log(likes + 1)\n<\/code><\/pre>\n<p><!-- \/wp:code --><\/p>\n<p><!-- wp:image {\"id\":104,\"sizeSlug\":\"full\",\"linkDestination\":\"none\"} 
--><\/p>\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"902\" height=\"600\" src=\"https:\/\/research.wpp.com\/wp-content\/uploads\/2026\/03\/image-7.png\" alt=\"\" class=\"wp-image-104\" srcset=\"https:\/\/cms.research.wpp.com\/wp-content\/uploads\/2026\/03\/image-7.png 902w, https:\/\/cms.research.wpp.com\/wp-content\/uploads\/2026\/03\/image-7-300x200.png 300w, https:\/\/cms.research.wpp.com\/wp-content\/uploads\/2026\/03\/image-7-768x511.png 768w\" sizes=\"auto, (max-width: 902px) 100vw, 902px\" \/><\/figure>\n<p><!-- \/wp:image --><\/p>\n<p><!-- wp:code --><\/p>\n<pre class=\"wp-block-code\"><code>                                                     Scatter plot of likes vs followers\n<\/code><\/pre>\n<p><!-- \/wp:code --><\/p>\n<p><!-- wp:heading {\"level\":1} --><\/p>\n<h1 class=\"wp-block-heading\">Results<\/h1>\n<p><!-- \/wp:heading --><\/p>\n<p><!-- wp:heading --><\/p>\n<h2 class=\"wp-block-heading\">Data Preparation<\/h2>\n<p><!-- \/wp:heading --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p>To ensure the integrity of our predictive modelling, we first applied filters based on our EDA. Roughly 13.8% of the dataset contained sponsor labels. We removed these sponsored posts entirely, as financial backing artificially skews organic engagement rates. We then narrowed our focus to create two high-density subsets: one featuring posts from the top 20 influencers, and a larger subset featuring the top 100 influencers (minimum 100 posts each). We opted to use the top 20 influencers dataset in most of our experiments.<\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p>To handle the computational load of the iterative prompt optimization, we built a scalable cloud pipeline. Raw images and post metadata were staged in <strong>GCP Buckets<\/strong>. Gemini 2.5 Flash was deployed as the <em>Transmuter<\/em> to generate the text profiles, capturing both general post context and specific image content. 
Because the agentic loop required regenerating descriptions for thousands of posts across multiple prompt iterations, we leveraged <strong>Vertex AI Batch Predictions<\/strong>. This allowed us to asynchronously and cost-effectively generate the text profiles for each optimization round. Finally, the poster\u2019s profile description, bio, and category were appended to the end of each generated description to provide complete semantic context for the downstream model.<\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:heading --><\/p>\n<h2 class=\"wp-block-heading\">Baseline Modelling: Classification vs. Regression<\/h2>\n<p><!-- \/wp:heading --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p>We initially framed the problem as a 3-class classification task (predicting low, average, and high likes) using a custom Deep Neural Network (three linear layers with ReLU activations, trained with cross-entropy loss). However, results showed that treating the problem as a <strong>regression task<\/strong> on the log(likes + 1) target yielded significantly better, more granular predictive performance.<\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p>For the regression task, we benchmarked three models:<\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:list --><\/p>\n<ul class=\"wp-block-list\"><!-- wp:list-item --><\/p>\n<li><strong>XGBoost Regressor:<\/strong> (n_estimators=300, learning_rate=0.05, max_depth=6, subsample=0.8)<\/li>\n<p><!-- \/wp:list-item --><\/p>\n<p><!-- wp:list-item --><\/p>\n<li><strong>LightGBM Regressor:<\/strong> (n_estimators=300, learning_rate=0.05, max_depth=6, num_leaves=31)<\/li>\n<p><!-- \/wp:list-item --><\/p>\n<p><!-- wp:list-item --><\/p>\n<li><strong>Transformer Model:<\/strong> distilbert-base-uncased, fine-tuned end-to-end.<\/li>\n<p><!-- \/wp:list-item --><\/ul>\n<p><!-- \/wp:list --><\/p>\n<p><!-- wp:table --><\/p>\n<figure class=\"wp-block-table\">\n<table 
class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Model<\/th>\n<th>R2<\/th>\n<th>MAE<\/th>\n<th>RMSE<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>XGBoost<\/td>\n<td>0.6908<\/td>\n<td>0.4184<\/td>\n<td>0.5759<\/td>\n<\/tr>\n<tr>\n<td>LightGBM<\/td>\n<td>0.5749<\/td>\n<td>0.5084<\/td>\n<td>0.6752<\/td>\n<\/tr>\n<tr>\n<td>distilbert-base-uncased<\/td>\n<td>0.7925<\/td>\n<td>0.3775<\/td>\n<td>0.4804<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p><!-- \/wp:table --><\/p>\n<p><!-- wp:code --><\/p>\n<pre class=\"wp-block-code\"><code>Results on the regression task, using History-Based Optimization\n<\/code><\/pre>\n<p><!-- \/wp:code --><\/p>\n<p><!-- wp:heading --><\/p>\n<h2 class=\"wp-block-heading\">Iterative Prompt Optimization Strategies<\/h2>\n<p><!-- \/wp:heading --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p>Using the Top-20 influencer subset, we tested two distinct agentic prompt optimization approaches to see which method helped the LLM extract the most predictive features:<\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:list {\"ordered\":true} --><\/p>\n<ol class=\"wp-block-list\"><!-- wp:list-item --><\/p>\n<li><strong>History-Based Optimization:<\/strong> The Gemini model was provided with the prompt history alongside the actual regression metrics (R2, MAE, RMSE) from previous iterations. The prompt instructed the LLM to deduce how to improve feature extraction based on these hard metrics.<\/li>\n<p><!-- \/wp:list-item --><\/p>\n<p><!-- wp:list-item --><\/p>\n<li><strong>Google Few-Shot Prompt Optimizer:<\/strong> Utilizing <a href=\"https:\/\/docs.cloud.google.com\/vertex-ai\/generative-ai\/docs\/learn\/prompts\/few-shot-optimizer\">Vertex AI&#8217;s Few-Shot Optimizer<\/a>, the agent was provided with 20 &#8220;good&#8221; and 20 &#8220;bad&#8221; prediction examples from the prior iteration. 
The optimization rubric was defined strictly as: [&#8220;Acceptable prediction error&#8221;, &#8220;Absolute prediction error value&#8221;].<\/li>\n<p><!-- \/wp:list-item --><\/ol>\n<p><!-- \/wp:list --><\/p>\n<p><!-- wp:image {\"id\":105,\"sizeSlug\":\"full\",\"linkDestination\":\"none\"} --><\/p>\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"904\" height=\"534\" src=\"https:\/\/research.wpp.com\/wp-content\/uploads\/2026\/03\/image-8.png\" alt=\"\" class=\"wp-image-105\" srcset=\"https:\/\/cms.research.wpp.com\/wp-content\/uploads\/2026\/03\/image-8.png 904w, https:\/\/cms.research.wpp.com\/wp-content\/uploads\/2026\/03\/image-8-300x177.png 300w, https:\/\/cms.research.wpp.com\/wp-content\/uploads\/2026\/03\/image-8-768x454.png 768w\" sizes=\"auto, (max-width: 904px) 100vw, 904px\" \/><\/figure>\n<p><!-- \/wp:image --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p>R2 value across 20 prompt optimization rounds using the Few-Shot prompt optimization strategy.<\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:table --><\/p>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Prompt Optimization Strategy<\/th>\n<th>Model<\/th>\n<th>R2<\/th>\n<th>MAE<\/th>\n<th>RMSE<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>History-Based Optimization<\/td>\n<td>XGBoost<\/td>\n<td>0.6908<\/td>\n<td>0.4184<\/td>\n<td>0.5759<\/td>\n<\/tr>\n<tr>\n<td><\/td>\n<td>distilbert-base-uncased<\/td>\n<td>0.7925<\/td>\n<td>0.3775<\/td>\n<td>0.4804<\/td>\n<\/tr>\n<tr>\n<td>Google Few-Shot Prompt Optimizer<\/td>\n<td>XGBoost<\/td>\n<td>0.6763<\/td>\n<td>0.4691<\/td>\n<td>0.6113<\/td>\n<\/tr>\n<tr>\n<td><\/td>\n<td>distilbert-base-uncased<\/td>\n<td>0.8068<\/td>\n<td>0.3463<\/td>\n<td>0.4544<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p><!-- \/wp:table --><\/p>\n<p><!-- wp:code --><\/p>\n<pre class=\"wp-block-code\"><code>Quantitative evaluation of prompt optimization strategies on the regression 
task\n<\/code><\/pre>\n<p><!-- \/wp:code --><\/p>\n<p><!-- wp:heading --><\/p>\n<h2 class=\"wp-block-heading\">Embedding Model Benchmarking<\/h2>\n<p><!-- \/wp:heading --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p>We additionally conducted an experiment to find the optimal text embedding model. We vectorized the generated descriptions (2,677 in total) using several popular embedding architectures and measured the downstream XGBoost regression performance.<\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:table --><\/p>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th><strong>Embedding Model<\/strong><\/th>\n<th><strong>Dimension<\/strong><\/th>\n<th><strong>R2 Score<\/strong><\/th>\n<th><strong>MAE<\/strong><\/th>\n<th><strong>RMSE<\/strong><\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>thenlper\/gte-base<\/strong><\/td>\n<td><strong>768<\/strong><\/td>\n<td><strong>0.7372<\/strong><\/td>\n<td><strong>0.4044<\/strong><\/td>\n<td><strong>0.5308<\/strong><\/td>\n<\/tr>\n<tr>\n<td>thenlper\/gte-large<\/td>\n<td>1024<\/td>\n<td>0.7165<\/td>\n<td>0.4100<\/td>\n<td>0.5513<\/td>\n<\/tr>\n<tr>\n<td>sentence-transformers\/gtr-t5-large<\/td>\n<td>768<\/td>\n<td>0.7122<\/td>\n<td>0.4179<\/td>\n<td>0.5554<\/td>\n<\/tr>\n<tr>\n<td>all-mpnet-base-v2<\/td>\n<td>768<\/td>\n<td>0.6934<\/td>\n<td>0.4357<\/td>\n<td>0.5734<\/td>\n<\/tr>\n<tr>\n<td>all-MiniLM-L12-v2<\/td>\n<td>384<\/td>\n<td>0.6273<\/td>\n<td>0.4775<\/td>\n<td>0.6322<\/td>\n<\/tr>\n<tr>\n<td>all-MiniLM-L6-v2<\/td>\n<td>384<\/td>\n<td>0.5594<\/td>\n<td>0.4996<\/td>\n<td>0.6874<\/td>\n<\/tr>\n<tr>\n<td>all-roberta-large-v1<\/td>\n<td>1024<\/td>\n<td>0.5520<\/td>\n<td>0.5198<\/td>\n<td>0.6931<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p><!-- \/wp:table --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p>Evaluation of different embedding models on the regression task, using XGBoost model.<\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:heading {\"level\":1} --><\/p>\n<h1 
class=\"wp-block-heading\">Conclusion<\/h1>\n<p><!-- \/wp:heading --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p>This project demonstrates a highly effective, interpretable alternative to traditional black-box multimodal models for predicting media performance. By leveraging Large Language Models as universal feature extractors (The Transmuter), we successfully unified heterogeneous data inputs into a single, human-readable semantic modality.<\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p>Several key insights emerged from our experimental pipeline:<\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:list --><\/p>\n<ul class=\"wp-block-list\"><!-- wp:list-item --><\/p>\n<li><strong>Target Transformation is Crucial:<\/strong> Predicting raw engagement metrics directly is unreliable because the labels are heavily right-skewed. Transforming the target variable to log(likes + 1) and framing the problem as a continuous regression task yielded superior and more granular results compared to our baseline classification approach.<\/li>\n<p><!-- \/wp:list-item --><\/p>\n<p><!-- wp:list-item --><\/p>\n<li><strong>Prompt Optimization Strategy:<\/strong> In our agentic optimization loop, History-Based Optimization was more suitable for the task at hand when paired with XGBoost (R2 0.6908 vs. 0.6763), although the Google Few-Shot Prompt Optimizer scored slightly higher with distilbert-base-uncased (R2 0.8068 vs. 0.7925). Explicitly feeding the LLM agent hard quantitative error metrics (R2, MAE, RMSE) from previous iterations allowed it to reason more effectively about feature importance. It &#8220;learned&#8221; to rewrite prompts that extracted visual and semantic elements correlated with user engagement.<\/li>\n<p><!-- \/wp:list-item --><\/p>\n<p><!-- wp:list-item --><\/p>\n<li><strong>Embedding Efficiency Over Size:<\/strong> Our benchmarking revealed that bigger isn&#8217;t always better. The thenlper\/gte-base model (768 dimensions) achieved the highest predictive performance (R2: 0.7372), outperforming larger models such as gte-large and all-roberta-large-v1. 
This suggests that for this specific transmuted text space, well-optimized, mid-sized embeddings provide the most useful features for tree-based regressors like XGBoost.<\/li>\n<p><!-- \/wp:list-item --><\/ul>\n<p><!-- \/wp:list --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p>Ultimately, this agentic feedback loop shows that natural language prompts can be treated as tunable hyperparameters. This architecture not only predicts media success with strong accuracy but, more importantly, provides the crucial <em>\u201cwhy\u201d<\/em> behind the prediction, giving human engineers and marketers the transparency that traditional vision models lack.<\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p><a href=\"https:\/\/www.notion.so\/The-Performance-Optimization-Agent-Technical-Pipeline-and-Evaluation-f432494e8bd982dcbbbe81e99afca694?pvs=21\"><strong>The Performance Optimization Agent: Technical Pipeline and Evaluation<\/strong><\/a><\/p>\n<p><!-- \/wp:paragraph --><\/p>\n<p><!-- wp:paragraph --><\/p>\n<p><a href=\"https:\/\/www.notion.so\/Prediction-Optimization-with-Self-Improving-AI-Agents-Optimizing-and-Explaining-Media-Performance-M-90a2494e8bd98382b13901f6c8e15121?pvs=21\"><strong>Prediction Optimization with Self-Improving AI Agents: Optimizing and Explaining Media Performance Models<\/strong><\/a><\/p>\n<p><!-- \/wp:paragraph --><\/p>\n","related_pods":[102],"content_quarter":"Q1 2026"},"research_categories":[],"raw_acf":{"content":"<!-- wp:heading {\"level\":1} -->\n<h1 class=\"wp-block-heading\"><strong>Motivation<\/strong><\/h1>\n<!-- \/wp:heading -->\n\n<!-- wp:paragraph -->\n<p>The core objective of this project is to move beyond traditional multimodal model training by leveraging the reasoning capabilities of Large Language Models (LLMs). We aim to convert complex multimodal data types (numerical data, images, text, and video) into a single unified modality: semantic text. 
This text is then used to predict media post performance.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>An autonomous agent performs iterative training of a downstream model by optimizing the prompt used for the multimodal-to-text conversion. This approach addresses two challenges in modern advertising analytics:<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p><strong>A. Modality Conversion to Text<\/strong> Traditional feature extraction requires manually defining what matters (e.g., detecting faces, measuring brightness, OCR). This is rigid and often misses high-level semantic nuances like \"humor,\" \"urgency,\" or \"brand alignment.\"<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:list -->\n<ul class=\"wp-block-list\"><!-- wp:list-item -->\n<li><strong>The Solution:<\/strong> Multimodal LLMs act as universal feature extractors capable of capturing abstract concepts that traditional computer vision misses. By projecting all modalities into natural language, we normalize heterogeneous data into a single, unified format that is easy to process.<\/li>\n<!-- \/wp:list-item --><\/ul>\n<!-- \/wp:list -->\n\n<!-- wp:paragraph -->\n<p><strong>B. Prompt Optimization<\/strong> Human engineers often struggle to write the \"perfect\" prompt to extract the right predictive features. We might ask an LLM to \"describe the image,\" but we don't know if mentioning the color palette is more predictive of ad success than mentioning the audio pace.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:list -->\n<ul class=\"wp-block-list\"><!-- wp:list-item -->\n<li><strong>The Solution:<\/strong> We employ an Agentic Optimizer that treats the natural language prompt as a hyperparameter. 
By iteratively rewriting the prompt based on downstream model performance, the system learns which visual and textual features genuinely correlate with success.<\/li>\n<!-- \/wp:list-item --><\/ul>\n<!-- \/wp:list -->\n\n<!-- wp:heading {\"level\":1} -->\n<h1 class=\"wp-block-heading\"><strong>High-Level Approach<\/strong><\/h1>\n<!-- \/wp:heading -->\n\n<!-- wp:paragraph -->\n<p>The proposed architecture functions as a feedback loop consisting of three distinct engines: The <strong>Transmuter<\/strong>, The <strong>Predictor<\/strong>, and The <strong>Optimizer<\/strong>.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:heading -->\n<h2 class=\"wp-block-heading\"><strong>Phase I: Semantic Translation (The Transmuter)<\/strong><\/h2>\n<!-- \/wp:heading -->\n\n<!-- wp:paragraph -->\n<p>This module is responsible for unifying the data modalities.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:list -->\n<ul class=\"wp-block-list\"><!-- wp:list-item -->\n<li><strong>Input:<\/strong> Raw Media (Video\/Image files) + Metadata (CSV Metrics) + System Prompt.<\/li>\n<!-- \/wp:list-item -->\n\n<!-- wp:list-item -->\n<li><strong>Process:<\/strong> A Multimodal LLM (e.g., Gemini, GPT, Claude) ingests the media. To process large volumes of posts efficiently during each optimization round, we utilize <strong>Vertex AI Batch Predictions<\/strong>. 
The System Prompt directs the model to extract specific features (e.g., \"Describe the video\/image,\" \"Read the copy,\" \"Contextualize the click-through rate\").<\/li>\n<!-- \/wp:list-item -->\n\n<!-- wp:list-item -->\n<li><strong>Output:<\/strong> A rich, structured text document for each ad or post.<\/li>\n<!-- \/wp:list-item --><\/ul>\n<!-- \/wp:list -->\n\n<!-- wp:heading -->\n<h2 class=\"wp-block-heading\">Phase II: Performance Prediction (The Predictor)<\/h2>\n<!-- \/wp:heading -->\n\n<!-- wp:paragraph -->\n<p>This module evaluates performance based purely on the text output from Phase I.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:list -->\n<ul class=\"wp-block-list\"><!-- wp:list-item -->\n<li><strong>Input:<\/strong> The generated text profiles.<\/li>\n<!-- \/wp:list-item -->\n\n<!-- wp:list-item -->\n<li><strong>Process:<\/strong> A lightweight, text-only Language Model (e.g., a fine-tuned LLaMA) is trained on these descriptions. It can either predict a binary label (\"Good\" vs. 
\"Bad\" performance) or perform regression to predict continuous metrics like the number of likes.<\/li>\n<!-- \/wp:list-item -->\n\n<!-- wp:list-item -->\n<li><strong>Output:<\/strong> A performance probability score and a validation metric (e.g., F1-Score, RMSE, or AUC).<\/li>\n<!-- \/wp:list-item --><\/ul>\n<!-- \/wp:list -->\n\n<!-- wp:heading -->\n<h2 class=\"wp-block-heading\">Phase III: Iterative Optimization (The Optimizer Agent)<\/h2>\n<!-- \/wp:heading -->\n\n<!-- wp:paragraph -->\n<p>This is the meta-learning layer that improves the system over time without human intervention.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:list -->\n<ul class=\"wp-block-list\"><!-- wp:list-item -->\n<li><strong>The Logic:<\/strong> If the Predictor fails to classify an ad correctly, it implies the generated description was missing key information.<\/li>\n<!-- \/wp:list-item -->\n\n<!-- wp:list-item -->\n<li><strong>The Agent:<\/strong> An LLM agent analyses the validation results, comparing the current prompt against error analysis data.<\/li>\n<!-- \/wp:list-item -->\n\n<!-- wp:list-item -->\n<li><strong>The Update:<\/strong> The agent rewrites the System Prompt to be more effective. 
For example, if the model mistook a \"luxury\" ad for a \"budget\" ad, the Agent might modify the prompt to: <em>\"Include specific details about the production quality, describe the product environment, and capture the perceived vibe of the ad.\"<\/em><\/li>\n<!-- \/wp:list-item --><\/ul>\n<!-- \/wp:list -->\n\n<!-- wp:paragraph -->\n<p><strong>Summary of Workflow:<\/strong><\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:list {\"ordered\":true} -->\n<ol class=\"wp-block-list\"><!-- wp:list-item -->\n<li><strong>Initialize:<\/strong> Start with a generic prompt (\"Describe this ad\").<\/li>\n<!-- \/wp:list-item -->\n\n<!-- wp:list-item -->\n<li><strong>Transmute:<\/strong> Convert media to text using the current prompt.<\/li>\n<!-- \/wp:list-item -->\n\n<!-- wp:list-item -->\n<li><strong>Train:<\/strong> Train the text classifier\/regressor.<\/li>\n<!-- \/wp:list-item -->\n\n<!-- wp:list-item -->\n<li><strong>Evaluate:<\/strong> Measure the validation metric (e.g., F1-Score for classification, RMSE for regression).<\/li>\n<!-- \/wp:list-item -->\n\n<!-- wp:list-item -->\n<li><strong>Refine:<\/strong> The agent updates the prompt to extract better predictive features.<\/li>\n<!-- \/wp:list-item --><\/ol>\n<!-- \/wp:list -->\n\n<!-- wp:paragraph -->\n<p><strong>Repeat:<\/strong><\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>Loop until performance plateaus.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:image -->\n<figure class=\"wp-block-image\"><img alt=\"\"\/><\/figure>\n<!-- \/wp:image -->\n\n<!-- wp:code -->\n<pre class=\"wp-block-code\"><code>Illustration of the proposed approach\n<\/code><\/pre>\n<!-- \/wp:code -->\n\n<!-- wp:heading {\"level\":1} -->\n<h1 class=\"wp-block-heading\">Dataset<\/h1>\n<!-- \/wp:heading -->\n\n<!-- wp:heading -->\n<h2 class=\"wp-block-heading\">Dataset overview<\/h2>\n<!-- \/wp:heading -->\n\n<!-- wp:paragraph -->\n<p>We used the <a href=\"https:\/\/sites.google.com\/site\/sbkimcv\/dataset\/instagram-influencer-dataset\"><strong>Instagram Influencer Dataset<\/strong><\/a> to extract text 
descriptions of posts and predict engagement metrics (such as the number of likes).<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:list -->\n<ul class=\"wp-block-list\"><!-- wp:list-item -->\n<li><strong>Type:<\/strong> Category classification and regression.<\/li>\n<!-- \/wp:list-item -->\n\n<!-- wp:list-item -->\n<li><strong>Description:<\/strong> This dataset contains 33,935 Instagram influencers categorized into nine domains: beauty, family, fashion, fitness, food, interior, pet, travel, and other. It features 300 posts per influencer, totalling roughly 10.18 million posts.<\/li>\n<!-- \/wp:list-item -->\n\n<!-- wp:list-item -->\n<li><strong>Structure:<\/strong> Post metadata is stored in JSON format (caption, user tags, hashtags, timestamp, sponsorship status, likes, comments). The image files are in JPEG format. Because a single post can contain multiple images, the dataset provides a JSON-to-Image mapping file to link metadata with its corresponding visual assets.<\/li>\n<!-- \/wp:list-item --><\/ul>\n<!-- \/wp:list -->\n\n<!-- wp:heading -->\n<h2 class=\"wp-block-heading\"><strong>Exploratory Data Analysis (EDA)<\/strong><\/h2>\n<!-- \/wp:heading -->\n\n<!-- wp:paragraph -->\n<p>To better understand the target variables for our Predictor engine, we conducted a rigorous EDA on the dataset, revealing several key structural behaviours:<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:list -->\n<ul class=\"wp-block-list\"><!-- wp:list-item -->\n<li><strong>Visualizing the Distribution of Likes:<\/strong> When visualizing the distribution of likes across the dataset, we observed a massive right-skew. The average (mean) post receives ~4,344 likes, but the median is only 662. 
Because of this heavy skew, <strong>regressing directly on the raw number of likes is unreliable.<\/strong> Instead, we transform the target variable using log(likes + 1), which compresses the long tail, stabilizes the variance, and gives the regression model a better-behaved target.<\/li>\n<!-- \/wp:list-item -->\n\n<!-- wp:list-item -->\n<li><strong>Likes vs. Followers Correlation:<\/strong> The scatter plot shows a strong positive correlation (<strong>0.7853<\/strong>) between an influencer's follower count and the number of likes they receive.<\/li>\n<!-- \/wp:list-item -->\n\n<!-- wp:list-item -->\n<li><strong>Engagement Rate Baseline:<\/strong> We calculated the Engagement Rate (Likes \/ Followers * 100). The dataset shows a <strong>mean engagement rate of 4.23%<\/strong> and a <strong>median of 2.96%<\/strong>.<\/li>\n<!-- \/wp:list-item --><\/ul>\n<!-- \/wp:list -->\n\n<!-- wp:image {\"id\":103,\"sizeSlug\":\"full\",\"linkDestination\":\"none\"} -->\n<figure class=\"wp-block-image size-full\"><img src=\"https:\/\/research.wpp.com\/wp-content\/uploads\/2026\/03\/image-6.png\" alt=\"\" class=\"wp-image-103\"\/><\/figure>\n<!-- \/wp:image -->\n\n<!-- wp:code -->\n<pre class=\"wp-block-code\"><code>Histogram of likes and log(likes + 1)\n<\/code><\/pre>\n<!-- \/wp:code -->\n\n<!-- wp:image {\"id\":104,\"sizeSlug\":\"full\",\"linkDestination\":\"none\"} -->\n<figure class=\"wp-block-image size-full\"><img src=\"https:\/\/research.wpp.com\/wp-content\/uploads\/2026\/03\/image-7.png\" alt=\"\" class=\"wp-image-104\"\/><\/figure>\n<!-- \/wp:image -->\n\n<!-- wp:code -->\n<pre class=\"wp-block-code\"><code>Scatter plot of likes vs followers\n<\/code><\/pre>\n<!-- \/wp:code -->\n\n<!-- wp:heading {\"level\":1} -->\n<h1 class=\"wp-block-heading\">Results<\/h1>\n<!-- \/wp:heading -->\n\n<!-- wp:heading -->\n<h2 
class=\"wp-block-heading\">Data Preparation<\/h2>\n<!-- \/wp:heading -->\n\n<!-- wp:paragraph -->\n<p>To ensure the integrity of our predictive modelling, we first applied filters based on our EDA. Roughly 13.8% of the posts in the dataset carried sponsor labels. We removed these sponsored posts entirely, as financial backing artificially skews organic engagement rates. We then narrowed our focus to create two high-density subsets: one featuring posts from the top 20 influencers, and a larger subset featuring the top 100 influencers (minimum 100 posts each). We used the top-20 influencer subset in most of our experiments.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>To handle the computational load of the iterative prompt optimization, we built a scalable cloud pipeline. Raw images and post metadata were staged in <strong>GCP Buckets<\/strong>. Gemini 2.5 Flash was deployed as the <em>Transmuter<\/em> to generate the text profiles, capturing both general post context and specific image content. Because the agentic loop required regenerating descriptions for thousands of posts across multiple prompt iterations, we leveraged <strong>Google Batch Predictions<\/strong>. This allowed us to asynchronously and cost-effectively generate the text profiles for each optimization round. Finally, the poster\u2019s profile description, bio, and category were appended to the end of each generated description to provide complete semantic context for the downstream model.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:heading -->\n<h2 class=\"wp-block-heading\">Baseline Modelling: Classification vs. Regression<\/h2>\n<!-- \/wp:heading -->\n\n<!-- wp:paragraph -->\n<p>We initially framed the problem as a 3-class classification task (predicting low, average, and high likes) using a custom Deep Neural Network (three Linear layers with ReLU activation, and Cross-Entropy Loss). 
However, results showed that treating the problem as a <strong>regression task<\/strong> on the log(likes + 1) target yielded significantly better, more granular predictive performance.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>For the regression task, we benchmarked three models:<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:list -->\n<ul class=\"wp-block-list\"><!-- wp:list-item -->\n<li><strong>XGBoost Regressor:<\/strong> (n_estimators=300, learning_rate=0.05, max_depth=6, subsample=0.8)<\/li>\n<!-- \/wp:list-item -->\n\n<!-- wp:list-item -->\n<li><strong>LightGBM Regressor:<\/strong> (n_estimators=300, learning_rate=0.05, max_depth=6, num_leaves=31)<\/li>\n<!-- \/wp:list-item -->\n\n<!-- wp:list-item -->\n<li><strong>Transformer Model:<\/strong> distilbert-base-uncased, fine-tuned end-to-end.<\/li>\n<!-- \/wp:list-item --><\/ul>\n<!-- \/wp:list -->\n\n<!-- wp:table -->\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Model<\/th><th>R2<\/th><th>MAE<\/th><th>RMSE<\/th><\/tr><\/thead><tbody><tr><td>XGBoost<\/td><td>0.6908<\/td><td>0.4184<\/td><td>0.5759<\/td><\/tr><tr><td>LightGBM<\/td><td>0.5749<\/td><td>0.5084<\/td><td>0.6752<\/td><\/tr><tr><td>distilbert-base-uncased<\/td><td>0.7925<\/td><td>0.3775<\/td><td>0.4804<\/td><\/tr><\/tbody><\/table><\/figure>\n<!-- \/wp:table -->\n\n<!-- wp:code -->\n<pre class=\"wp-block-code\"><code>Results on the regression task, using History-Based Optimization\n<\/code><\/pre>\n<!-- \/wp:code -->\n\n<!-- wp:heading -->\n<h2 class=\"wp-block-heading\">Iterative Prompt Optimization Strategies<\/h2>\n<!-- \/wp:heading -->\n\n<!-- wp:paragraph -->\n<p>Using the Top-20 influencer subset, we tested two distinct agentic prompt optimization approaches to see which method helped the LLM extract the most predictive features:<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:list {\"ordered\":true} -->\n<ol class=\"wp-block-list\"><!-- wp:list-item -->\n<li><strong>History-Based Optimization:<\/strong> 
The Gemini model was provided with the prompt history alongside the actual regression metrics (R2, MAE, RMSE) from previous iterations. The prompt instructed the LLM to deduce how to improve feature extraction based on these hard metrics.<\/li>\n<!-- \/wp:list-item -->\n\n<!-- wp:list-item -->\n<li><strong>Google Few-Shot Prompt Optimizer:<\/strong> Utilizing <a href=\"https:\/\/docs.cloud.google.com\/vertex-ai\/generative-ai\/docs\/learn\/prompts\/few-shot-optimizer\">Vertex AI's Few-Shot Optimizer<\/a>, the agent was provided with 20 \"good\" and 20 \"bad\" prediction examples from the prior iteration. The optimization rubric was defined strictly as: [\"Acceptable prediction error\", \"Absolute prediction error value\"].<\/li>\n<!-- \/wp:list-item --><\/ol>\n<!-- \/wp:list -->\n\n<!-- wp:image {\"id\":105,\"sizeSlug\":\"full\",\"linkDestination\":\"none\"} -->\n<figure class=\"wp-block-image size-full\"><img src=\"https:\/\/research.wpp.com\/wp-content\/uploads\/2026\/03\/image-8.png\" alt=\"\" class=\"wp-image-105\"\/><\/figure>\n<!-- \/wp:image -->\n\n<!-- wp:paragraph -->\n<p>R2 value across 20 prompt optimization rounds using the Few-Shot prompt optimization strategy.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:table -->\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Prompt Optimization Strategy<\/th><th>Model<\/th><th>R2<\/th><th>MAE<\/th><th>RMSE<\/th><\/tr><\/thead><tbody><tr><td>History-Based Optimization<\/td><td>XGBoost<\/td><td>0.6908<\/td><td>0.4184<\/td><td>0.5759<\/td><\/tr><tr><td><\/td><td>distilbert-base-uncased<\/td><td>0.7925<\/td><td>0.3775<\/td><td>0.4804<\/td><\/tr><tr><td>Google Few-Shot Prompt Optimizer<\/td><td>XGBoost<\/td><td>0.6763<\/td><td>0.4691<\/td><td>0.6113<\/td><\/tr><tr><td><\/td><td>distilbert-base-uncased<\/td><td>0.8068<\/td><td>0.3463<\/td><td>0.4544<\/td><\/tr><\/tbody><\/table><\/figure>\n<!-- \/wp:table -->\n\n<!-- wp:code -->\n<pre class=\"wp-block-code\"><code>Quantitative evaluation of 
prompt optimization strategies on the regression task\n<\/code><\/pre>\n<!-- \/wp:code -->\n\n<!-- wp:heading -->\n<h2 class=\"wp-block-heading\">Embedding Model Benchmarking<\/h2>\n<!-- \/wp:heading -->\n\n<!-- wp:paragraph -->\n<p>We additionally conducted an experiment to find the optimal text embedding model. We vectorized the generated descriptions (2,677 in total) using several popular embedding architectures and measured the downstream XGBoost regression performance.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:table -->\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th><strong>Embedding Model<\/strong><\/th><th><strong>Dimension<\/strong><\/th><th><strong>R2 Score<\/strong><\/th><th><strong>MAE<\/strong><\/th><th><strong>RMSE<\/strong><\/th><\/tr><\/thead><tbody><tr><td><strong>thenlper\/gte-base<\/strong><\/td><td><strong>768<\/strong><\/td><td><strong>0.7372<\/strong><\/td><td><strong>0.4044<\/strong><\/td><td><strong>0.5308<\/strong><\/td><\/tr><tr><td>thenlper\/gte-large<\/td><td>1024<\/td><td>0.7165<\/td><td>0.4100<\/td><td>0.5513<\/td><\/tr><tr><td>sentence-transformers\/gtr-t5-large<\/td><td>768<\/td><td>0.7122<\/td><td>0.4179<\/td><td>0.5554<\/td><\/tr><tr><td>all-mpnet-base-v2<\/td><td>768<\/td><td>0.6934<\/td><td>0.4357<\/td><td>0.5734<\/td><\/tr><tr><td>all-MiniLM-L12-v2<\/td><td>384<\/td><td>0.6273<\/td><td>0.4775<\/td><td>0.6322<\/td><\/tr><tr><td>all-MiniLM-L6-v2<\/td><td>384<\/td><td>0.5594<\/td><td>0.4996<\/td><td>0.6874<\/td><\/tr><tr><td>all-roberta-large-v1<\/td><td>1024<\/td><td>0.5520<\/td><td>0.5198<\/td><td>0.6931<\/td><\/tr><\/tbody><\/table><\/figure>\n<!-- \/wp:table -->\n\n<!-- wp:paragraph -->\n<p>Evaluation of different embedding models on the regression task, using XGBoost model.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:heading {\"level\":1} -->\n<h1 class=\"wp-block-heading\">Conclusion<\/h1>\n<!-- \/wp:heading -->\n\n<!-- wp:paragraph -->\n<p>This project demonstrates a highly effective, 
interpretable alternative to traditional black-box multimodal models for predicting media performance. By leveraging Large Language Models as universal feature extractors (The Transmuter), we successfully unified heterogeneous data inputs into a single, human-readable semantic modality.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>Several key insights emerged from our experimental pipeline:<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:list -->\n<ul class=\"wp-block-list\"><!-- wp:list-item -->\n<li><strong>Target Transformation is Crucial:<\/strong> Predicting raw engagement metrics directly can lead to poor predictions because the label distribution is heavily skewed. Transforming the target variable to log(likes + 1) and framing the problem as a continuous regression task yielded superior and more granular results compared to our baseline classification approach.<\/li>\n<!-- \/wp:list-item -->\n\n<!-- wp:list-item -->\n<li><strong>Prompt Optimization Strategy:<\/strong> In our agentic optimization loop, History-Based Optimization was more suitable for the task at hand. Explicitly feeding the LLM agent hard quantitative error metrics (R2, MAE, RMSE) from previous iterations allowed it to reason more effectively about feature importance. It successfully \"learned\" to rewrite prompts that extracted visual and semantic elements highly correlated with user engagement.<\/li>\n<!-- \/wp:list-item -->\n\n<!-- wp:list-item -->\n<li><strong>Embedding Efficiency Over Size:<\/strong> Our benchmarking revealed that bigger isn't always better. The thenlper\/gte-base model (768 dimensions) achieved the highest predictive performance (R2: 0.7372), outperforming significantly heavier models like gte-large and all-roberta-large-v1. 
This suggests that for this specific transmuted text space, well-optimized, mid-sized embeddings give tree-based regressors like XGBoost the most useful feature representation.<\/li>\n<!-- \/wp:list-item --><\/ul>\n<!-- \/wp:list -->\n\n<!-- wp:paragraph -->\n<p>Ultimately, this agentic feedback loop shows that natural language prompts can be treated as tunable hyperparameters. This architecture not only predicts media success with strong accuracy but, more importantly, provides the crucial <em>\u201cwhy\u201d<\/em> behind the prediction, giving human engineers and marketers the transparency that traditional vision models lack.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p><a href=\"https:\/\/www.notion.so\/The-Performance-Optimization-Agent-Technical-Pipeline-and-Evaluation-f432494e8bd982dcbbbe81e99afca694?pvs=21\"><strong>The Performance Optimization Agent: Technical Pipeline and Evaluation<\/strong><\/a><\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p><a href=\"https:\/\/www.notion.so\/Prediction-Optimization-with-Self-Improving-AI-Agents-Optimizing-and-Explaining-Media-Performance-M-90a2494e8bd98382b13901f6c8e15121?pvs=21\"><strong>Prediction Optimization with Self-Improving AI Agents: Optimizing and Explaining Media Performance Models<\/strong><\/a><\/p>\n<!-- \/wp:paragraph -->","content_quarter":"Q1 
2026","related_pods":["102"],"featured":"","legacy_perspective_source_id":""},"_links":{"self":[{"href":"https:\/\/cms.research.wpp.com\/index.php?rest_route=\/wp\/v2\/research_feed\/1649","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cms.research.wpp.com\/index.php?rest_route=\/wp\/v2\/research_feed"}],"about":[{"href":"https:\/\/cms.research.wpp.com\/index.php?rest_route=\/wp\/v2\/types\/research_feed"}],"author":[{"embeddable":true,"href":"https:\/\/cms.research.wpp.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"acf:post":[{"embeddable":true,"href":"https:\/\/cms.research.wpp.com\/index.php?rest_route=\/wp\/v2\/research_pods\/102"}],"wp:attachment":[{"href":"https:\/\/cms.research.wpp.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1649"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cms.research.wpp.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1649"},{"taxonomy":"content_type","embeddable":true,"href":"https:\/\/cms.research.wpp.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcontent_types&post=1649"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/cms.research.wpp.com\/index.php?rest_route=%2Fwp%2Fv2%2Fppma_author&post=1649"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}