From guesswork to foresight: How AI is predicting the future of marketing campaigns

Ever wonder why some advertisements seem to pop up exactly when you’re thinking about buying something, while others feel completely irrelevant? Or how a brand knows just the right message to share to get your attention? The answer lies in the evolving world of marketing campaigns, and increasingly, in the powerful capabilities of Artificial Intelligence (AI).

But what is a marketing campaign?

At its core, a marketing campaign is a carefully planned series of activities designed to achieve a specific goal for a business – whether that’s selling more products, building brand awareness, or encouraging people to sign up for a service. Think of it like launching a rocket: you need to choose the right destination (your objective), design a powerful engine (your creative message), select the perfect crew (your audience), and pick the best launchpad (your platform).

The process of creating and running these campaigns involves countless decisions, such as:

  • Audience: Who are we trying to reach? What are their interests, demographics, and behaviours?
  • Brand: What message do we want to convey about our brand? How does our brand resonate with the audience?
  • Creative: What kind of ads should we run? (text, images, videos, headlines, calls to action).
  • Objective: What’s the main goal? (e.g., getting clicks, making sales, increasing brand recognition).
  • Platform: Where should we run these ads? (e.g., Facebook, Instagram, Google Search, TV, billboards).

Campaign design is a complex process shaped by multiple factors, such as creative genius, market insights, and domain expertise. Marketers launch campaigns, closely track their impact, and then adjust their approach in real time, refining messages or recalibrating target audiences. However, given the diverse and dynamic nature of consumer behavior, this iterative adaptation process can be taxing in terms of both budget and time. It’s like setting a rocket’s course: unforeseen atmospheric shifts can require significant mid-flight corrections, each consuming valuable resources.

This is where the big challenge lies: how do we predict if a campaign will be successful before we invest significant time and money into it?

Machine Learning: Your marketing crystal ball

This challenge is precisely where Machine Learning (ML) steps in. Simply put, Machine Learning is a branch of AI that allows computers to “learn” from data without being explicitly programmed. Instead of following a strict set of rules, ML algorithms analyze vast amounts of past information, identify hidden patterns and relationships, and then use those learnings to make predictions or decisions on new, unseen data.

In the context of marketing campaigns, ML becomes an incredibly powerful tool:

  • Data powerhouse: Imagine collecting every detail from thousands of past marketing campaigns: who saw them, what the ads looked like, where they were shown, how much they cost, and crucially, what the final outcome was (e.g., how many clicks, sales, or sign-ups they generated). ML algorithms can digest this colossal amount of data in seconds.
  • Pattern recognition: These algorithms don’t just store data; they look for correlations. Did campaigns with a specific type of image perform better with a certain age group? Does a particular headline style lead to more conversions on one platform versus another? ML can uncover these subtle yet powerful insights that human analysts might miss.
  • Predictive power: Once trained, an ML model can take the proposed details of a new campaign (e.g., its target audience, creative idea, intended platform) and predict its likely outcome. It can estimate click-through rates, conversion probabilities, or even the potential return on investment (ROI) before a single dollar is spent.
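The idea in the bullets above can be sketched in a few lines. This is a hypothetical toy, not WPP's real pipeline: the feature names, values, and choice of a random-forest classifier are all illustrative stand-ins for "learn from past campaigns, then score a new one before spending anything."

```python
# Hypothetical sketch: learning from past campaigns to score a new one.
# All feature names and values are illustrative, not a real schema.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

# Each past campaign: (platform, objective, budget) -> outcome label
past = [
    ("instagram", "awareness", 5000, "good"),
    ("search",    "sales",     2000, "average"),
    ("instagram", "sales",      800, "bad"),
    ("search",    "awareness", 3000, "good"),
]
cats = [[p, o] for p, o, _, _ in past]
enc = OneHotEncoder(handle_unknown="ignore").fit(cats)
X = np.hstack([enc.transform(cats).toarray(),
               np.array([[budget] for _, _, budget, _ in past])])
y = [label for *_, label in past]

model = RandomForestClassifier(random_state=0).fit(X, y)

# Score a proposed campaign before a single dollar is spent
new = np.hstack([enc.transform([["instagram", "awareness"]]).toarray(),
                 [[4500]]])
print(model.predict(new)[0])
```

In practice the training set would hold thousands of campaigns and far richer features, but the shape of the workflow (encode, fit, predict on an unseen proposal) is the same.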

The benefits are transformative: marketers can make data-driven decisions, allocate budgets more efficiently, target the most receptive audiences with precision, and ultimately, launch campaigns with a much higher probability of success. It’s like having a detailed weather forecast for your rocket launch, helping you choose the perfect day and trajectory.

The multimodal challenge: Mixing apples, oranges, and billboards

In reality, and contrary to what many might assume, a campaign isn’t just a neat row of numbers on a spreadsheet; it’s a vibrant, messy mix of text, images, locations, and abstract concepts like brand identity. This presents a fundamental challenge: how do we empower AI not just to process, but to truly understand and connect these inherently different types of information into a holistic view? For instance, how can an AI understand the interplay between the nuanced visual cues of a video ad and the detailed socio-economic data of a specific target audience in a specific location?

The “secret sauce” is a technology called embeddings. Think of an embedding as a universal translator. It takes complex information, like the “feeling” of a brand or the intent of a sentence, and turns it into a list of numbers that an algorithm can easily digest. However, every piece of the campaign puzzle requires a different translation strategy.
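To make the "universal translator" idea concrete, here is a toy stand-in: TF-IDF vectors instead of learned embeddings. The brand descriptions are invented, and real embedding models are far more powerful, but the principle is the same: text becomes a list of numbers, and similar meanings end up close together in that numeric space.

```python
# Toy stand-in for an embedding model: TF-IDF vectors instead of learned
# embeddings, just to show the idea of "text -> list of numbers".
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

texts = [
    "bold energetic sportswear brand for young athletes",
    "energetic athletic apparel aimed at young runners",
    "traditional luxury watchmaker with long heritage",
]
vectors = TfidfVectorizer().fit_transform(texts)  # each row = one "embedding"

# Similar brand descriptions score closer together than dissimilar ones
sims = cosine_similarity(vectors)
print(round(sims[0, 1], 2), round(sims[0, 2], 2))
```

The first two descriptions share vocabulary and therefore similarity; the watchmaker does not. A learned embedding model captures this kind of closeness from meaning rather than shared words.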

Translating the campaign puzzle

To build a complete picture, we process each element through a specialized lens:

  • Audience, Platform, and Objective: We convert these categories into numerical “flavours.” This allows the AI to recognise the distinct profile of, for example, an Instagram awareness campaign versus a search engine lead-generation tactic.
  • Brand identity: We leverage the fact that Large Language Models (LLMs) already possess a wealth of knowledge about established brands. By feeding the AI a rich, descriptive profile of a brand, we create a deep numerical representation of its identity. This task is so nuanced that it led to the birth of our Brand Perception Atlas Pod.
  • Creative (Images): A picture may be worth a thousand words, but our models currently prefer numbers. To bridge this gap, we use AI to extract a highly detailed description of each image, which is then translated into data. We quickly discovered that the quality of these descriptions depends entirely on the instructions given to the AI. This led us to develop the Self-improving AI Agent.
  • Geography: Location is more than just a pin on a map. To capture the true essence of a region, we use advanced models that go beyond coordinates. Google’s PDFM (Population Dynamics Foundation Model) embeddings capture the social, economic, and demographic fabric of an area, providing the AI with the “soul” of a location rather than just its name.
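Once each element has been translated, the pieces are joined into a single numeric representation. The sketch below is illustrative: the vector sizes and values are made up, and real systems may fuse modalities in more sophisticated ways than plain concatenation, but it shows the basic move of turning the whole puzzle into one vector a model can consume.

```python
import numpy as np

# Illustrative fusion: each modality is embedded separately, then the
# vectors are concatenated into one campaign representation.
# All dimensions and values below are made up for the example.
audience  = np.array([1.0, 0.0, 0.0])       # e.g. one-hot audience segment
brand     = np.array([0.12, -0.45, 0.88])   # stand-in for an LLM brand embedding
creative  = np.array([0.31, 0.07])          # stand-in for an image-description embedding
geography = np.array([-0.2, 0.6])           # stand-in for a PDFM-style location embedding

campaign_vector = np.concatenate([audience, brand, creative, geography])
print(campaign_vector.shape)  # one flat numeric vector the model can consume
```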

Where does the data come from?

Real-world marketing data is essential, but on its own it is not enough for AI research. At WPP, we combine rich, real-world data with carefully engineered synthetic data to build and evaluate models more effectively. Real data grounds our work in genuine market behaviour, complexity, and business context. Synthetic data adds something equally important: control. It allows us to create the specific conditions we need to properly challenge, probe, and improve our models.

This matters because many of the scenarios that determine whether a model is truly robust are rare, emerging, or simply absent from historical records until the moment they become a real problem. To prepare for that, we deliberately generate datasets that introduce edge cases, shifting patterns, variable data volumes, heterogeneity, sparsity, and data drift. In other words, we use synthetic data to stress-test models in ways that real data alone cannot support, so they are more resilient, reliable, and ready for the real world.

To address this, we built a Synthetic Data Generator. Think of this as a high-fidelity flight simulator for marketing. Instead of testing our models only on the limited “flights” we’ve taken in the past, this tool creates realistic, artificial campaign data. This allows us to:

  • Train with precision: We can create scenarios that haven’t happened yet to see how the AI reacts.
  • Test the limits: We can stress-test our models against extreme market conditions without any real-world risk.
  • Ensure safety: We can evaluate performance using high-quality data that carries none of the privacy concerns of personal information.
  • Hold the answer key: Because we generate this artificial data from scratch, we already know the exact outcome (the “ground truth”) of every scenario. It’s like giving our AI a test where we already hold the perfect answer key, allowing us to verify its predictions and recommendations.

By “conjuring” this artificial data, we ensure our models are battle-tested and ready for the complexities of the live market.
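The "answer key" property is worth seeing in miniature. The generator below is a hypothetical sketch, not our actual tool: we invent a hidden scoring rule, so every synthetic campaign arrives with a known ground-truth label, and the noise parameter is one knob for making the signal harder to find.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical generator: a hidden "true" scoring rule means every
# synthetic campaign comes with a known ground-truth label.
def generate_campaigns(n, noise=0.1):
    platform = rng.integers(0, 3, n)           # 3 fake platforms
    budget   = rng.uniform(0.0, 1.0, n)        # normalised spend
    fit      = rng.uniform(0.0, 1.0, n)        # audience/creative fit
    score = (0.5 * fit + 0.3 * budget + 0.2 * (platform == 1)
             + rng.normal(0.0, noise, n))      # tunable noise = signal strength
    labels = np.digitize(score, [0.35, 0.65])  # 0=bad, 1=average, 2=good
    return np.column_stack([platform, budget, fit]), labels

X, y = generate_campaigns(1000)
print(np.bincount(y))  # the "answer key" label distribution
```

Because the rule is known, any model's predictions can be checked against it exactly, which is impossible with messy historical data alone.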

From data to decisions: Empowering the expert

We’ve explored the “ingredients” and the “recipe,” but what does this actually look like in the hands of a marketing expert? Our goal isn’t just to crunch numbers; it’s to provide actionable recommendations that make experts more efficient and their campaigns more successful.

Imagine a strategist coming to the platform with a specific mission:

I’m launching a campaign for Brand B, targeting Audience A in Location X, with the objective of Increasing Awareness. What is the best platform and creative style to use?

To answer a question like this, we need more than a search engine; we need a Predictive Engine, our “crystal ball”.

Before we can offer a recommendation, we must train a Machine Learning (ML) model to understand performance. We teach it to look at millions of historical and synthetic data points to predict an outcome: Is this specific combination of elements likely to be Good, Average, or Bad?

There isn’t just one way to build this crystal ball. In our research, we explore a spectrum of algorithms, including both traditional models and modern techniques. Each approach offers its own set of advantages: some prioritise speed, while others prioritise pinpoint accuracy. By testing across this variety, we ensure that when an expert asks for a recommendation, the answer is backed by the most robust mathematical thinking available today.

1. The reliable workhorse: LightGBM

We started with a classic, high-speed approach called LightGBM. Think of this as a highly efficient logic tree. It’s fast, dependable, and excellent at spotting clear patterns in structured data. It serves as our “baseline”, the standard we aim to beat.

2. The specialist team: Neural Networks

Next, we built a more sophisticated system, based on Neural Network architectures, that works like a well-organized corporation. We divided the AI into two stages:

  • Specialized departments: Each type of data (like your brand identity or your creative images) is handled by its own “mini-expert” that decides which details are actually important.
  • The executive board: Once the experts have done their work, a central “manager”, a Multi-Layer Perceptron (MLP), looks at all the reports together to make the final call: Will this campaign succeed?

In this category, we have experimented with multiple architectures and techniques. For example, one of our best models mathematically groups elements that “belong” together before making a prediction. If a specific high-energy image consistently drives high success when paired with a young, active audience, the model learns to pull those winning pieces closer. This not only makes the model smarter but also helps us give you much better recommendations for future pairings.
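The two-stage "departments plus board" structure can be sketched in plain NumPy. Everything here is illustrative: the weights are random (training is omitted), the layer sizes are invented, and a real implementation would use a deep-learning framework, but the shape of the computation, one small expert per modality feeding a shared head, is the point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal sketch of the two-stage idea: one small "expert" per modality,
# then a shared MLP head over the concatenated expert outputs.
def dense(x, w, b):
    return np.maximum(x @ w + b, 0.0)  # linear layer + ReLU

# Per-modality experts (random weights; training is omitted in this sketch)
experts = {
    "brand":    (rng.normal(size=(8, 4)), np.zeros(4)),
    "creative": (rng.normal(size=(6, 4)), np.zeros(4)),
    "audience": (rng.normal(size=(5, 4)), np.zeros(4)),
}
head_w, head_b = rng.normal(size=(12, 3)), np.zeros(3)  # 3 outcome classes

def predict(inputs):
    # Each expert condenses its own modality; the head sees all reports
    parts = [dense(inputs[name], *experts[name]) for name in experts]
    logits = np.concatenate(parts) @ head_w + head_b
    return int(np.argmax(logits))  # 0=bad, 1=average, 2=good

sample = {"brand": rng.normal(size=8),
          "creative": rng.normal(size=6),
          "audience": rng.normal(size=5)}
print(predict(sample))
```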

3. The language experts: LLMs

Finally, we tested whether a standard Large Language Model (like the ones used for chatbots) could do the job on its own. Interestingly, we found that “out-of-the-box” AI isn’t naturally great at these specific marketing predictions. However, when we provide specialized training (a process called “fine-tuning”), their performance skyrockets, as evidenced by our research: From hype to impact: Predicting campaign performance with fine-tuned LLMs.
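Fine-tuning starts with turning structured campaign records into text the model can learn from. The helper below is a hypothetical illustration of that formatting step; the field names, prompt wording, and prompt/completion record shape are assumptions, not our actual training format.

```python
# Hypothetical example of turning a campaign record into a fine-tuning
# example (prompt/completion pair); field names are illustrative only.
def to_finetune_record(campaign):
    prompt = (f"Brand: {campaign['brand']}\n"
              f"Audience: {campaign['audience']}\n"
              f"Platform: {campaign['platform']}\n"
              f"Objective: {campaign['objective']}\n"
              "Predict the campaign outcome (Good, Average, or Bad):")
    return {"prompt": prompt, "completion": campaign["outcome"]}

record = to_finetune_record({
    "brand": "Brand B", "audience": "Audience A",
    "platform": "Instagram", "objective": "Increase Awareness",
    "outcome": "Good",
})
print(record["prompt"])
```

Thousands of such pairs, drawn from historical and synthetic campaigns, are what teach a general-purpose LLM the specifics of campaign performance.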

The verdict: Measuring impact

To evaluate our models and determine how accurately they predict campaign performance, we must first establish a rigorous testing ground. This involves two key components: the diversity of our data and the precision of our metrics.

Datasets

To ensure our findings aren’t just a “lucky” outlier, we don’t rely on a single source of information. Instead, we test every model against three different versions of our synthetic datasets. By proving that our models can perform consistently across various simulated environments, we can be confident that their predictive power is both reliable and adaptable to real-world shifts.

While each of our datasets shares a consistent structure, we have intentionally varied their internal characteristics to put our models through a rigorous stress test. By using our Synthetic Data Generator, we can precisely control three key variables to create progressively more challenging environments:

  • Volume: Testing how the models perform with both limited information and vast amounts of data.
  • Balance: Adjusting the “label distribution”. For example, creating datasets where “average” results are far more common than clear successes or failures, to reflect the reality of a crowded market.
  • Signal strength: Tuning how obvious or subtle the patterns are, which forces the models to work harder to find the winning combinations.

This approach ensures that our models aren’t just memorizing easy patterns, but are truly learning to find value in complex, “noisy” environments where the right answer isn’t always obvious.
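The three knobs described above map neatly onto standard dataset-generation parameters. As a hedged illustration (our actual generator is richer than this), scikit-learn's make_classification exposes all three: sample count for volume, class weights for balance, and label-flip noise as a crude proxy for signal strength.

```python
import numpy as np
from sklearn.datasets import make_classification

# Sketch of the three stress-test knobs:
# volume (n_samples), balance (weights), signal strength (flip_y noise).
def build_variant(n_samples, weights, label_noise):
    X, y = make_classification(
        n_samples=n_samples, n_features=10, n_informative=6, n_classes=3,
        weights=weights, flip_y=label_noise, random_state=0)
    return X, y

# A harder variant: big, dominated by "average" outcomes, noisy labels
X, y = build_variant(n_samples=5000, weights=[0.2, 0.6, 0.2], label_noise=0.1)
print(np.bincount(y) / len(y))  # class proportions, roughly [0.2, 0.6, 0.2]
```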

Model performance

When it comes to measuring performance, we use a standard industry benchmark known as the F1 score, because simple accuracy can be a liar. Imagine you have a box of 100 fruits: 10 are apples and 90 are oranges. You build a robot to grab only the apples. If the robot sits still and does nothing, it is technically “90% accurate” because it correctly ignored the 90 oranges, but it’s a total failure at its job. The F1 score exposes this by balancing two hidden grades:

  • Precision (the “quality” grade): When the robot grabs a fruit and says “Apple,” is it right? High precision means it never accidentally grabs an orange.
  • Recall (the “completeness” grade): Did the robot find all 10 apples, or did it leave some behind? High recall means the robot is thorough and doesn’t miss any.

The F1 score is a single number that combines these two (technically, their harmonic mean). Unlike a normal average, it “punishes” extreme failure. If your robot is perfectly precise but misses every single apple, its F1 score will be 0. This gives us a much more honest picture of how well a model actually works in the real world. To circle back to our case, we use the F1 scores in two ways:

  • The big picture: We report the Average F1 score across the entire dataset to show overall model health.
  • Performance by category: We break down results into three specific classes: Negative, Average, and Positive.

This granular view is where the true business value lies. It allows us to ensure the model excels at the extremes, identifying the “Negative” combinations a marketer should avoid at all costs, and the “Positive” combinations that will truly drive results beyond the status quo.

To bring structure to our innovation, we developed a centralized model Leaderboard. This platform serves as the definitive “source of truth” for our research team, ensuring that every breakthrough is measured against the same rigorous standards. The Leaderboard allows team members to download standardized training and testing splits for any dataset (whether real or synthetic) and submit their results for comparison. By centralizing our findings in one place, we achieve several key advantages:

  • True comparability: We can be certain that we are comparing equals across different algorithms and techniques.
  • Accelerated testing: It allows us to quickly and safely iterate on new ideas without reinventing the wheel.
  • Institutional knowledge: It creates a permanent record of our progress, ensuring that the best-performing models are always visible and ready to be deployed.

This structured environment is what allows us to move from individual experiments to a scalable, high-efficiency engine for marketing AI.

With our datasets defined and our Leaderboard in place, we put our models to the ultimate test. By measuring how each approach handled “Negative,” “Average,” and “Positive” campaign outcomes, we can clearly see which strategies offer the most reliable path to success.

Here is a glimpse of how our top models performed across the board:

Dataset                   Model                Overall F1   Neg F1   Avg F1   Pos F1
Small and easy            Tree-based           85.11%       86.40%   80.21%   88.73%
                          Deep Learning (v1)   86.26%       89.81%   82.51%   86.44%
                          Deep Learning (v2)   89.47%       91.25%   85.95%   91.22%
Small and slightly noisy  Tree-based           85.54%       87.48%   80.92%   88.21%
                          Deep Learning (v1)   84.80%       89.41%   80.50%   84.50%
                          Deep Learning (v2)   88.84%       91.13%   85.26%   90.14%
Big and slightly noisy    Tree-based           77.40%       76.40%   81.47%   74.34%
                          Deep Learning (v1)   80.74%       80.88%   86.10%   75.25%
                          Deep Learning (v2)   81.29%       82.25%   84.58%   77.03%

F1 score performance of top models (Tree-based, Deep Learning v1, and Deep Learning v2) across varied datasets

Analysing the Leaderboard: Reliability at scale

The results from our testing provide a clear picture of how these models handle real-world complexity.

Our tree-based model remains a formidable workhorse, maintaining an F1 score above 85%. Most importantly, it demonstrates high accuracy in identifying “Positive” and “Negative” outcomes. This means the model is exceptionally reliable at flagging the two things marketers care about most: which campaigns are likely to be massive successes and which ones are headed for failure. While performance naturally dips as we introduce more noise and scale into the datasets, its baseline remains impressively high.

While the classic models are strong, both of our Deep Learning approaches consistently take the lead. These models perform better because of their inherent capacity for “relational intelligence”: they can spot the subtle, complex connections that simpler, logic-based systems often miss.

As the datasets grow larger and the patterns become more “noisy,” this deep understanding becomes a critical advantage. Seeing these models maintain performance above the 80% mark, even in the most challenging scenarios, gives us the confidence that our AI can handle high complexity scenarios.

The privacy puzzle: Learning without sharing

While our research shows how powerful these models can be, a significant question remains: How do we build an elite AI that learns from everyone, without exposing anyone’s private data?

In the traditional world of marketing, building a “super-brain” meant pooling all client data into one giant, central database. In today’s world, that is a massive privacy red flag. We believe you shouldn’t have to choose between competitive intelligence and data security. To solve this, we utilize a cutting-edge approach called Federated Learning (FL).

Think of Federated Learning like a team of specialized doctors working in different hospitals. To find a cure for a new disease, they don’t send their private patient files to a central office; that would be a breach of trust. Instead, each doctor studies their own patients locally, discovers what works and what doesn’t, and then shares only the “recipe for the cure” with their colleagues, never the patients’ identities.

In our ecosystem, each client trains the AI model locally on their own private data. The “lessons learned” are sent back to our main server, where they are combined to create a smarter, globally-informed model for everyone.

The result? You benefit from a model that has “seen” millions of scenarios, yet your private data never leaves your hands.
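The core mechanic, clients train locally and the server only averages their model parameters, can be shown in a tiny federated-averaging sketch. This is a deliberately minimal toy (linear regression on invented data, plain gradient steps), not our production system, but it demonstrates that the raw data never leaves each "client".

```python
import numpy as np

# Minimal federated-averaging sketch: each "client" trains locally and
# shares only its model weights; raw data never leaves the client.
def local_update(weights, X, y, lr=0.1):
    # One gradient-descent step on the client's own least-squares loss
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])  # the pattern hidden in everyone's data
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(0.0, 0.1, 50)
    clients.append((X, y))

global_w = np.zeros(2)
for _ in range(100):                       # communication rounds
    local = [local_update(global_w, X, y) for X, y in clients]
    global_w = np.mean(local, axis=0)      # server averages the weights only

print(global_w)  # recovers weights close to true_w without pooling any data
```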

Discover more about Federated Learning by reading our post: Training together, sharing nothing: The promise of Federated Learning.

Looking to the future: The next frontier

Our journey doesn’t end with a successful prediction. We are already exploring the next horizon of marketing intelligence, moving from understanding the past to actively designing the future. Here is what we are building next:

1. Infusing data with “common sense”

What if our models understood human psychology as well as they understand spreadsheets? We are exploring ways to inject the broad, contextual knowledge of LLMs directly into our training data. This gives our models a “common sense” layer, allowing them to understand the subtle cultural and social nuances that drive human behavior. Learn more about it in our post: The uncharted territory: Beyond the known data

2. AI building AI

We believe the best architect for a complex model might be the AI itself. Instead of manually designing every layer of a neural network, we are using advanced systems to automatically discover the ultimate model structure for marketing predictions. This process of “digital evolution”, which we delve into further in our AlphaEvolve article, ensures our tech is always one step ahead.

3. From predictions to proactive recommendations

We are currently building a tool that doesn’t just predict success, it suggests it. Imagine entering your brand and target audience and having the AI instantly recommend the perfect visual or message. We are perfecting this using two unique methods:

  • The Matchmaker space: Utilizing the “relational map” we built in our earlier phases to instantly pair your audience with the creative assets they are most likely to love.
  • The “hot or cold” optimisation: We treat our AI like a high-precision compass. If you have a brand and an audience but are missing the right “creative,” the system rapidly tests thousands of variations. It plays a high-speed game of “hot or cold” until it locks onto the highest possible performance score.
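The "hot or cold" loop described above can be sketched as a simple random search against a predicted-performance score. Everything here is a stand-in: the scoring function is an invented toy (a real system would call the trained Predictive Engine), and the search strategy is the simplest possible version of the idea.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in predictive engine: scores a creative vector for a fixed
# brand/audience. The hidden "ideal" creative is invented for this toy;
# in reality the score would come from the trained prediction model.
target = np.array([0.7, -0.2, 0.4])
def predicted_performance(creative):
    return -np.sum((creative - target) ** 2)  # closer to ideal = better

# "Hot or cold": test many variations, keep whichever scores highest
best, best_score = None, -np.inf
for _ in range(5000):
    candidate = rng.uniform(-1, 1, 3)
    score = predicted_performance(candidate)
    if score > best_score:
        best, best_score = candidate, score

print(best)  # the best-scoring candidate found by the search
```

More sophisticated optimisers (gradient-based or Bayesian search) would find the optimum with far fewer probes, but the principle, letting the prediction model grade thousands of candidate creatives, is the same.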

By moving from educated guesswork to advanced, multimodal AI, we are finally bridging the gap between creative intuition and measurable results. The rocket is fueled, the coordinates are set, and the launch sequence has officially begun.

Ready to explore the specifics? Read our full technical deep dive into Multimodal Fusion Models for a closer look at our methodology.

Disclaimer: This content was created with AI assistance. All research and conclusions are the work of the WPP Research team.

Authors

  • Jael is a Data Scientist at Satalia, leveraging her physics background for a deep analytical foundation in complex systems analysis and modeling. Her experience, spanning foundational research in computational physics and a proven track record in data science consultancy, provides a unique perspective for architecting robust, scalable models in intricate environments. In Satalia’s Research Lab, she bridges scientific methodology with industrial innovation to address WPP’s most sophisticated data challenges. Her current research focuses on multimodal fusion models, aiming to improve campaign performance and pioneer state-of-the-art machine learning.

  • Eirini is a Data Scientist at Satalia with a multidisciplinary background in Management Science and Computer Science. She specialises in architecting end-to-end data science solutions, leveraging a deep technical toolkit to solve complex industrial challenges across diverse sectors. Known for bridging the gap between theoretical research and scalable application, she focuses on delivering high-impact models that translate abstract data patterns into actionable strategic intelligence.

    Her current research focuses on sophisticated campaign performance multimodal modelling and the development of data enrichment frameworks to maximise predictive accuracy.
