Bayesian Marketing Mix Modeling in Python via PyMC3 (2023)



December 30, 2021

read 10 minutes


practical tutorials

Estimate saturation, carryover, and other parameters all at once, including their uncertainty


In this article, I want to combine two concepts that I've discussed in previous posts: Bayesian modeling and marketing mix modeling. Since you are probably unfamiliar with these two topics, I would like to give you a brief introduction and some further reading. I will

  1. motivate what marketing mix modeling is,
  2. explain what Bayesian modeling is, and
  3. show why it makes sense to combine both.

Then I'll show you how to do it in practice using PyMC3.

If you are an avid reader of my articles (thank you!), you can skip a few sections and go straight to the code. Otherwise, please read on.

Marketing Mix Modeling

A fundamental problem of every company is deciding which channels to spend the marketing budget on. You could spend €1,000 on TV ads, €2,000 on radio ads, and €3,000 on web banners every day, following your intuition. But is that any good?

Maybe the web banner channel is already saturated, and spending only €1,500 there is as good as spending €3,000. Then you could save €1,500 or put it into other, better-performing channels to generate more sales.

Or maybe a channel even has a negative ROI: for every euro you spend on ads, you get back less than one euro. We definitely shouldn't waste a lot of money on such a channel, at least if it's not strategically important from a business point of view.

To answer questions like these, you need to understand how the different media spends (TV, radio, ...) impact your sales or another KPI of interest.

In marketing mix modeling, you start with a dataset of media spends. It is usually enriched with some control variables, i.e. further information about anything that could affect the target KPI, such as holidays, the weather, football championships, closures, the price of a product, and much more. However, we omit the control variables here for the sake of clarity. Then, of course, you need a KPI that you want to predict. This is often sales, the number of new customers, and the like. A typical dataset could look like this:

[Figure: example dataset with media spends and sales]

In my previous articles, I describe the motivation and how to conduct marketing mix modeling in more detail. To understand the rest of this article, check out both of them here:

Introduction to Marketing Mix Modeling in Python: Which Ad Spend Really Drives Your Sales?
An Upgraded Marketing Mix Model in Python: Making my lackluster marketing mix model much more powerful

Bayesian Modeling

Many estimators and models stem from a maximum likelihood approach. For example, imagine you want to estimate the probability p of a coin showing heads. You flip it 10 times and see 8 heads. What do you conclude? A natural estimate of the probability is then p = 8/10 = 80%, which is also the maximum likelihood estimate. You could also calculate a confidence interval to see if this estimate is reliable, but let's go another way.

Imagine we want to incorporate some prior knowledge about the probability p. For example, if you randomly took the coin from your wallet, there is no reason to think that the coin is biased, so p should not be too far from 50%, assuming you're not a magician.

With Bayesian modeling, you can incorporate this prior knowledge and end up with a density estimate of p, i.e. not a single value but a complete distribution. This distribution will likely peak somewhere between the maximum likelihood estimate and the prior, perhaps at 65%.
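To make the coin example concrete, here is a minimal sketch. It assumes a Beta(10, 10) prior, which is my own illustrative choice for "the coin is probably fair" and is not taken from the article. With a Beta prior and coin-flip data, the posterior is again a Beta distribution, so no sampling is needed.

```python
# Illustrative assumption: a Beta(10, 10) prior centered at 50%.
prior_alpha, prior_beta = 10, 10

heads, flips = 8, 10
tails = flips - heads

# Maximum likelihood estimate: the observed share of heads.
mle = heads / flips

# Conjugate update: add the observed heads and tails to the prior counts.
post_alpha = prior_alpha + heads
post_beta = prior_beta + tails

# Mode (peak) of a Beta(a, b) density, valid for a, b > 1.
prior_mode = (prior_alpha - 1) / (prior_alpha + prior_beta - 2)
posterior_mode = (post_alpha - 1) / (post_alpha + post_beta - 2)

print(f"prior peak: {prior_mode:.3f}")
print(f"posterior peak: {posterior_mode:.3f}")
print(f"maximum likelihood estimate: {mle:.3f}")
```

With this particular prior, the posterior peak lands between the prior peak (50%) and the maximum likelihood estimate (80%). A weaker prior, such as Beta(2, 2), would pull it closer to 80%.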

[Figure: three density curves for p: the prior belief (blue), the data (red), and the posterior (yellow)]

In short, Bayesian modeling is about finding a balance between prior knowledge and observed data. In the figure above, this means the following: without any data, we start with the blue curve. It's just a belief, a gut feeling. Then we observe data that tells us to shift the blue curve towards the red one. We end up with the yellow hybrid curve, which represents the so-called posterior distribution.

You can read more about the motivation here:

A Gentle Introduction to Bayesian Inference: Learn more about the difference between frequentist and Bayesian reasoning

Well, understanding the theory is good, but we also need to be able to apply it to get things done. I usually use PyMC3, an awesome Python library for Bayesian modeling. You can see it in action here:

Bayesian Linear Regression in Python via PyMC3: Learn how to derive model parameters and make predictions for new data, including uncertainty estimates.

Why model the marketing mix with Bayes?

You can define marketing mix models with many hyperparameters:

  • the saturation
  • the carryover strength
  • the carryover length

You can then use a hyperparameter tuning method to find the best combination. I did that in my other article on marketing mix modeling, An Upgraded Marketing Mix Model in Python.

This approach works well, but there's something I don't like:

Hyperparameter estimates are often unstable.

This means that completely different sets of hyperparameters can yield equally good models. There might be

  • Model A with a TV carryover strength of 0.4 and a TV saturation of 0.8, and
  • Model B with a TV carryover strength of 0.9 and a TV saturation of 0.5,

both with the same r² or MAPE on the test set. From a forecasting point of view, both models are interchangeable, if you stay within the marketing spend limits you've seen so far.

However, extrapolating with Model A is completely different from extrapolating with Model B. This is very unsatisfactory and problematic behavior, because extrapolation is exactly what you do when optimizing your media budget. If you have spent between €0 and €1,000 per day on TV advertising in the past, then for the optimization you need to know what happens when you spend €5,000 or even €10,000.

You need a model with excellent extrapolation skills.
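The following sketch illustrates the extrapolation problem with two hypothetical response curves. The functional form (a coefficient times an exponential saturation curve) and both parameter sets are assumptions chosen for illustration; they are not fitted to any data, and carryover is ignored to keep things short.

```python
import numpy as np

def saturating_response(spend, coef, sat):
    # A coefficient times an exponential saturation curve.
    return coef * (1 - np.exp(-sat * spend))

# Two hypothetical parameter sets (not fitted to any data).
model_a = dict(coef=10_000, sat=0.0005)
model_b = dict(coef=20_000, sat=0.00022)

# Spends that were observed in the past.
observed_range = np.linspace(0, 1_000, 101)
in_range_gap = np.max(np.abs(
    saturating_response(observed_range, **model_a)
    - saturating_response(observed_range, **model_b)
))

# A spend far outside of anything observed so far.
extrapolation_gap = abs(
    saturating_response(5_000, **model_a)
    - saturating_response(5_000, **model_b)
)

print(f"largest gap for spends up to 1,000: {in_range_gap:.0f}")
print(f"gap at a spend of 5,000: {extrapolation_gap:.0f}")
```

Within the observed spend range, the two curves stay close to each other, so noisy data could hardly tell them apart. At a spend of 5,000, their predictions differ by thousands.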

And usually you have to choose between more than two models. In this case, you can proceed in at least two ways.

  • You may choose the first model you create because you are not even aware of the problem. This approach is simple but dangerous.
  • You can choose a model that feels right to you, some domain experts, or stakeholders. This is fine for some people, but I prefer not to put output expectations into the model because of the following question:

If someone already knows the answer, why would they build a model that just reproduces that answer?

There might also be ways to do something sensible from here, but for now I want to show you how to avoid this problem using Bayesian modeling.

First, let's get our dataset.

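import pandas as pd

# The file path and read options were lost from the source; supply your own.
data = pd.read_csv(...)

X = data.drop(columns=['Sales'])  # media spends, one column per channel
y = data['Sales']                 # target KPI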
Next, we need to define the saturation and carryover functions, similar to what we did in the previous article. In PyMC3 language, it might look like this:

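import theano.tensor as tt

def saturate(x, a):
    # Exponential saturation: grows from 0 towards 1 as x increases.
    return 1 - tt.exp(-a*x)

def carryover(x, strength, length=21):
    # Weights 1, strength, strength**2, ... for lags 0, 1, 2, ...
    w = tt.as_tensor_variable(
        [tt.power(strength, i) for i in range(length)]
    )

    # Row i is x shifted by i steps and padded with zeros at the start.
    # (The padding lines were lost from the source and are reconstructed.)
    x_lags = tt.stack(
        [tt.concatenate([
            tt.zeros(i),
            x[:x.shape[0]-i]
        ]) for i in range(length)]
    )

    return tt.dot(w, x_lags)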

The saturation function should be easy to understand. The carryover function is a bit more work, though. Basically, you can express the carryover transformation as a matrix-vector multiplication. All you have to do is assemble the matrix x_lags and the vector w first. As an example, we can transform the input vector x = (x₁, x₂, x₃, x₄) with a carryover length of 3 via

[Figure: the carryover transformation of x written as the product of the weight vector w and the matrix x_lags]

The carryover function in the code above does just that. Now that we have these functions, we can finally start modeling.
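To see the matrix-vector form with actual numbers, here is a small NumPy sketch of the same idea. The function name carryover_numpy, the input values, and the strength of 0.5 are my own choices for illustration.

```python
import numpy as np

def carryover_numpy(x, strength, length=3):
    # Assumes length <= len(x).
    x = np.asarray(x, dtype=float)
    # Weights 1, strength, strength**2, ...
    w = strength ** np.arange(length)
    # Row i holds x shifted right by i steps, padded with zeros on the left.
    x_lags = np.stack([
        np.concatenate([np.zeros(i), x[:len(x) - i]])
        for i in range(length)
    ])
    return w @ x_lags

x = np.array([1.0, 2.0, 3.0, 4.0])
result = carryover_numpy(x, strength=0.5, length=3)
print(result)
```

Each entry of the result is the current value plus 0.5 times the previous value plus 0.25 times the value before that, which gives 1, 2.5, 4.25, and 6 for this input.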

import pymc3 as pm

with pm.Model() as mmm:
    channel_contributions = []

    for channel in X.columns:
        # One coefficient, saturation and carryover strength per channel.
        coef = pm.Exponential(f'coef_{channel}', lam=0.0001)
        sat = pm.Exponential(f'sat_{channel}', lam=1)
        car = pm.Beta(f'car_{channel}', alpha=2, beta=2)

        channel_data = X[channel].values
        # The variable name and the carryover call were lost from the
        # source and are reconstructed here.
        channel_contribution = pm.Deterministic(
            f'contribution_{channel}',
            coef * saturate(
                carryover(
                    channel_data,
                    car
                ),
                sat
            )
        )

        channel_contributions.append(channel_contribution)

    base = pm.Exponential('base', lam=0.0001)
    noise = pm.Exponential('noise', lam=0.0001)

    sales = pm.Normal(
        'sales',
        mu=sum(channel_contributions) + base,
        sigma=noise,
        observed=y
    )

    trace = pm.sample(return_inferencedata=True, tune=3000)

Here we can see all the parameters (no more hyperparameters!): the regression coefficients, the saturation strength, the carryover strength, the baseline, and the noise.
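The priors are easier to judge once you see which values they imply. The sketch below draws from the same three distribution families with NumPy instead of PyMC3, which is my substitution so that it runs without a sampler. The number of draws and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# NumPy parametrizes the exponential distribution by scale = 1 / lam.
coef_prior = rng.exponential(scale=1 / 0.0001, size=n)  # Exponential(lam=0.0001)
sat_prior = rng.exponential(scale=1 / 1, size=n)        # Exponential(lam=1)
car_prior = rng.beta(2, 2, size=n)                      # Beta(alpha=2, beta=2)

for name, draws in [("coef", coef_prior), ("sat", sat_prior), ("car", car_prior)]:
    low, high = np.percentile(draws, [5, 95])
    print(f"{name}: mean={draws.mean():.3g}, 90% of draws in [{low:.3g}, {high:.3g}]")
```

An exponential distribution with rate lam has mean 1/lam, so lam=0.0001 is a very wide prior with a mean of 10,000, while the Beta(2, 2) prior keeps the carryover strength between 0 and 1 with a mean of 0.5.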

Note that I didn't include the carryover length; I fixed it to 21 instead. That's because I still haven't figured out how to create the matrices and vectors in the carryover function with variable dimensions in PyMC3. Normally, though, you would use a Poisson random variable for the carryover length. Write to me if you know how to do it properly! 😉 This might also be a good time to try Pyro, another probabilistic programming language like PyMC3.

Analysis of the Model Output

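Afterwards, we can take a look at the usual pictures. Let's start with the posterior distributions. Executing

import arviz as az

# Plotting call reconstructed; it was lost from the source.
az.plot_posterior(trace, var_names=['~contribution'], filter_vars='like')

yields

[Figure: posterior distributions of all model parameters]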

Here you can see the posteriors of all parameters. All of them have a nice unimodal (= single peak) shape. You can also explore how pairs of variables behave together via

# Call reconstructed around the surviving var_names argument.
az.plot_pair(trace, var_names=['coef_TV', 'sat_TV'])

[Figure: joint posterior of coef_TV and sat_TV]

Here you can see that the saturation strength and the regression coefficient are not independent of each other, but negatively correlated: the higher the coefficient, the lower the saturation parameter tends to be. This makes sense because a higher coefficient can compensate for a more slowly increasing saturation curve (= lower sat_TV) and vice versa.

Let's look at another one:

# Call reconstructed around the surviving var_names argument.
az.plot_pair(trace, var_names=['car_TV', 'sat_TV'])

[Figure: joint posterior of car_TV and sat_TV]

Here we can see why hyperparameter optimization can run into problems. Each point in this picture is a potential model that you could find using hyperparameter optimization.

For a truly unique best model, we would prefer to see a point cloud that is heavily concentrated around a single point (car_TV_true, sat_TV_true). But here we see that the TV carryover strength can take reasonable values between 0.4 and 0.5, depending on the saturation parameter.

We can also check if the model is any good before jumping to conclusions. Executing

import matplotlib.pyplot as plt
import numpy as np

with mmm:
    posterior = pm.sample_posterior_predictive(trace)

# Mean and standard deviation over all posterior predictive draws.
means = posterior['sales'].mean(0)
stds = posterior['sales'].std(0)

plt.figure(figsize=(20, 8))
plt.plot(y.values, linewidth=2, c='r', label='Observations')
plt.plot(means, linewidth=1, c='b', label='Mean prediction')
plt.fill_between(np.arange(len(y)), means - 2*stds, means + 2*stds, alpha=0.33)
plt.legend()

gives us

[Figure: observed sales (red) with the mean prediction (blue) and a band of two standard deviations around it]

So it looks like the model has picked up something useful. I won't go into detail on how to test model performance here; we might do that in the future.
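One simple numeric check, which the article itself does not perform, is to count how many observations fall inside the band of two standard deviations. The sketch below uses synthetic stand-ins for the posterior predictive draws and the observations, so every number in it is an assumption. With real results, you would plug in posterior['sales'] and y.values instead.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins (assumptions): 4000 predictive draws for 200 time steps.
true_mean = np.linspace(5_000, 8_000, 200)
draws = true_mean + rng.normal(0, 500, size=(4_000, 200))
observed = true_mean + rng.normal(0, 500, size=200)

# Average over the draws, giving one value per time step.
means = draws.mean(axis=0)
stds = draws.std(axis=0)

# Share of observations inside the mean +/- 2 standard deviations band.
inside = (observed >= means - 2 * stds) & (observed <= means + 2 * stds)
coverage = inside.mean()
print(f"share of observations inside the band: {coverage:.2f}")
```

For roughly normal predictive distributions, about 95% of the observations should fall inside such a band. A much lower share would hint at a model that is too confident.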

Channel Contributions

So far we have dealt with distributions, but for our favorite channel contribution picture, let us take the mean to end up with single values again. Since we introduced some channel contribution variables in the PyMC3 code, we can now extract them easily with a small compute_mean function.

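def compute_mean(trace, channel):
    # Average contribution_<channel> over all posterior draws.
    # (The variable name was lost from the source and is reconstructed.)
    return (trace
            .posterior[f'contribution_{channel}']
            .values
            .reshape(4000, 200)  # (chains * draws, observations) for this run
            .mean(0)
           )

channels = list(X.columns)  # same channels that the model loops over
unadj_contributions = pd.DataFrame(
    {'Base': trace.posterior['base'].values.mean()},
    index=X.index
)

for channel in channels:
    unadj_contributions[channel] = compute_mean(trace, channel)

# Rescale each row so that the contributions add up to the observed sales.
adj_contributions = (unadj_contributions
                     .div(unadj_contributions.sum(axis=1), axis=0)
                     .mul(y, axis=0)
                    )

ax = (adj_contributions
      .plot.area(
          figsize=(16, 10),
          linewidth=1,
          title='Sales Forecast and Breakdown',
      )
     )

# Reverse the legend so that it matches the stacking order of the areas.
handles, labels = ax.get_legend_handles_labels()
ax.legend(
    handles[::-1], labels[::-1],
    title='Channels', loc="center left",
    bbox_to_anchor=(1.01, 0.5)
)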

[Figure: stacked area chart "Sales Forecast and Breakdown" with the contributions of the baseline and each channel over time]

Seems to be good!
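The rescaling step in the code above can be checked on a toy example. The numbers below are made up for illustration; the point is only that, after dividing each row by its sum and multiplying by the observed sales, the contributions in each row add up to the observed value.

```python
import pandas as pd

# Toy numbers (assumptions): unadjusted contributions for three time steps.
unadj = pd.DataFrame({
    "Base": [100.0, 100.0, 100.0],
    "TV": [50.0, 100.0, 0.0],
    "Radio": [50.0, 0.0, 100.0],
})
y_toy = pd.Series([220.0, 180.0, 200.0])

# Turn each row into shares that sum to 1, then scale by the observed sales.
shares = unadj.div(unadj.sum(axis=1), axis=0)
adj = shares.mul(y_toy, axis=0)

print(adj)
```

In the first row, the baseline has a share of one half, so it is assigned 110 of the 220 observed sales, and TV and radio get 55 each.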

In this article, we discussed that marketing mix modeling with a maximum likelihood approach can be difficult due to hyperparameter estimation. There may be many models that work well in the test set, but have completely different extrapolation behavior: (hyper)parameters are quite unstable.

Proper extrapolation, however, is key to optimization. Therefore, we designed a basic marketing mix model with saturation and carryover effects in a Bayesian setting. This was useful because it estimates all parameters at once and provides more stable parameter estimates.

We then implemented it with PyMC3 and recreated a nice contribution chart to see how much the channel spends contributed to the sales.

Model Extensions

Now we can further extend the Bayesian model we created. For example, we can include parameters that change over time. This can be useful when the TV carryover strength started at 0.8 two years ago but has slowly decreased to 0.5 over time, which is an example of concept drift. We can easily model this by using Gaussian random walks, as shown in my other article on rolling regression:

Rockin' Rolling Regression in Python via PyMC3: Learn how to deal with varying parameters
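To give a feeling for what a Gaussian random walk looks like, here is a small NumPy simulation. It only simulates one possible path for a carryover strength and does not estimate anything; the starting value of 0.8 is taken from the example above, while the number of weeks, the step size, and the clipping are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

n_weeks = 104
step_sd = 0.02  # hypothetical size of the week-to-week change

# A Gaussian random walk: a starting value plus accumulated normal steps.
steps = rng.normal(0.0, step_sd, size=n_weeks - 1)
walk = 0.8 + np.concatenate([[0.0], np.cumsum(steps)])

# A carryover strength must stay in [0, 1]; clipping is a crude way
# to enforce that in this simulation.
carryover_strength = np.clip(walk, 0.0, 1.0)

print(carryover_strength[:5])
```

In a Bayesian model, the steps would not be simulated like this but would be unknown quantities with their own prior, so that the data can inform how the parameter moves over time.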

We can model variable baselines in the same way. So far we've treated the baseline as a single fixed number for the entire training period. But maybe it's also increased over time due to our strategic promotional efforts. This is something that is difficult to solve with our old hyperparameter-optimized maximum likelihood models.

I hope you learned something new, interesting and useful today. Thank you for reading!

As a last point, if you

  1. would like to help me write more about machine learning and
  2. plan to get a Medium subscription anyway,

why not do it via this link? It would help me a lot!

To be transparent, the price to you doesn't change, but about half of the subscription fees go directly to me.

Thank you if you consider supporting me!

If you have any questions, write to me on LinkedIn!

Article information

Author: The Hon. Margery Christiansen

Last Updated: 03/16/2023
