December 30, 2021
·
10 min read
·
practical tutorials
Estimate saturation, carryover, and other parameters at once, including their uncertainty
In this article, I want to combine two concepts that I have talked about in previous posts: Bayesian modeling and Marketing Mix Modeling. In case you are unfamiliar with either of these topics, let me give you a short introduction and some further reading. I am going to
- motivate what Marketing Mix Modeling is,
- explain what Bayesian modeling is, and
- show why it makes sense to combine both.
Then I'll show you how it works in practice using PyMC3.
If you are an avid reader of my articles (thank you!), you can skip some sections and jump straight to the code. If not, just read on.
Marketing Mix Modeling
A fundamental problem of every company is deciding which channels to spend the marketing budget on. You could spend €1,000 on TV ads, €2,000 on radio ads, and €3,000 on web banners every single day, following your gut feeling. But is that any good?
Maybe the web banner channel is saturated, and spending only €1,500 performs just as well as spending €3,000. Then you could save €1,500, or put it into other, better-performing channels to generate more sales.
Or maybe a channel even has a negative ROI: for every euro that you spend on ads there, you get back less than one euro. You definitely shouldn't pour a lot of money into such a channel, at least if it's not strategically important from a business point of view.
To answer questions like these, you have to understand how different media spends (TV, radio, …) impact your sales or another KPI of interest.
In Marketing Mix Modeling, you start with a dataset of media spends. It is usually enriched with some control variables, i.e. more information about everything that could influence the target KPI, such as holidays, the weather, football championships, lockdowns, the price of a product, and much more. However, we will omit control variables here for the sake of clarity. Then, of course, you need a KPI that you want to predict, often sales, the number of new customers, and the like. A typical dataset could look like this:
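For illustration, here is a small, made-up stand-in built with pandas. The column names are my assumption, chosen to match the dataset we load later:

import pandas as pd

# hypothetical MMM dataset: daily media spends per channel plus the target KPI
data = pd.DataFrame({
    'TV':      [200.0,   0.0, 500.0],      # euros spent on TV ads per day
    'Radio':   [  0.0, 100.0, 100.0],      # euros spent on radio ads per day
    'Banners': [ 50.0,  50.0,   0.0],      # euros spent on web banners per day
    'Sales':   [9800.0, 8600.0, 10400.0],  # made-up sales figures
}, index=pd.date_range('2021-01-01', periods=3, name='Date'))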
In my previous articles, I describe the motivation and the design of marketing mix models in more detail. To understand the rest of this article, check them both out here:
Bayesian Modeling
Many estimators and models are derived via a maximum likelihood approach. For example, imagine you want to estimate the probability p of a coin showing heads. You flip it 10 times and observe 8 heads. What do you conclude? A natural estimate is p = 8/10 = 80%, which happens to be the maximum likelihood estimate as well. You could also compute a confidence interval to see how reliable this estimate is, but let's take a different route.
Imagine we want to incorporate some prior knowledge about the probability p. For example, if you just took the coin out of your wallet, there is no reason to believe that it is heavily biased, so p shouldn't be too far away from 50%, assuming you're not a magician.
With Bayesian modeling, you can incorporate this prior knowledge and end up with a density estimate for p, i.e. not a single value, but a whole distribution. This distribution will likely have its peak somewhere between the maximum likelihood estimate and the prior guess, perhaps at around 65%.
In short, Bayesian modeling is about finding a balance between prior knowledge and observed data. In the figure above, this means the following: without any data, we start with the blue curve. It's just a belief, a gut feeling. Then we look at the data, which tells us to push the blue curve closer to the red one. We end up with the yellow hybrid curve, which represents the so-called posterior distribution.
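To make the coin example concrete, here is a minimal sketch of this update, my own addition using scipy instead of PyMC3, where a Beta(5, 5) prior is an assumed stand-in for the "close to 50%" belief:

from scipy import stats

heads, flips = 8, 10

# prior: Beta(5, 5), peaked at 50%; data: 8 heads out of 10 flips
# conjugate update: the posterior is Beta(5 + heads, 5 + tails)
posterior = stats.beta(5 + heads, 5 + (flips - heads))

print(heads / flips)     # maximum likelihood estimate: 0.8
print(posterior.mean())  # posterior mean: 0.65, between prior and MLE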
You can read more about the motivation here:
Now, understanding the theory is nice, but we also have to be able to apply it to get things done. I usually use the awesome Python library PyMC3 for Bayesian modeling. You can see it in action here:
Why model the marketing mix with Bayes?
You can define marketing mix models with many hyperparameters:
- saturation strength
- carryover strength
- carryover length
- …
You can then use a hyperparameter tuning method to find the best fit. I did exactly that in my other article on Marketing Mix Modeling, An Updated Marketing Mix Model in Python.
This approach works well, but there's something about it that I don't like:
Hyperparameter estimates are often unstable.
This means that completely different sets of hyperparameters can yield equally good models. You might have
- model A with a TV carryover strength of 0.4 and a TV saturation of 0.8, and
- model B with a TV carryover strength of 0.9 and a TV saturation of 0.5,
and both might have the same r² or MAPE on the test set. From a forecasting perspective, both models are interchangeable, as long as you stay within the range of marketing spends you have seen so far.
However, extrapolating with model A yields completely different results than extrapolating with model B. And this is very unsatisfying and problematic behavior, because extrapolation is exactly what you do when optimizing your media budget: if you have spent €0–1,000 per day on TV ads in the past, you have to know what happens when you spend €5,000 or even €10,000 in order to optimize.
You need a model with excellent extrapolation capabilities.
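To see the problem in numbers, here is a small sketch with made-up parameter values in the spirit of models A and B: two saturation curves that nearly agree on historical spends but diverge badly when extrapolating.

import numpy as np

def sat(x, coef, a):
    # exponential saturation curve, the same shape we will use in the model below
    return coef * (1 - np.exp(-a * x))

for spend in [500, 1000, 5000]:
    model_a = sat(spend, coef=20000, a=0.0001)  # made-up parameters
    model_b = sat(spend, coef=4836, a=0.0005)   # made-up parameters
    print(spend, round(model_a), round(model_b))

# within the observed range (up to €1,000) the two curves differ by about 10%,
# but at a spend of €5,000 they differ by almost a factor of two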
And usually, you have to choose between far more than two models. In this case, you can proceed in at least two ways:
- You take the first model you find because you are not even aware of the problem. This approach is simple, but dangerous.
- You choose the model that feels right to you, some domain expert, or other stakeholders. This is fine for some people, but I prefer not to bake performance expectations into the model because of the following problem:
If someone already knows the answer, why would they build a model that just reproduces that answer?
There might be other ways to do something sensible from here, but for now, I want to show you how to circumvent this problem using Bayesian modeling.
First, let's get our dataset.
import pandas as pd

data = pd.read_csv(
    'https://raw.githubusercontent.com/Garve/datasets/4576d323bf2b66c906d5130d686245ad205505cf/mmm.csv',
    parse_dates=['Date'],
    index_col='Date'
)

X = data.drop(columns=['Sales'])
y = data['Sales']
Next, we have to define the saturation and carryover functions, similar to what we did in the previous article. In PyMC3 language, it might look like this:
import theano.tensor as tt

def saturate(x, a):
    return 1 - tt.exp(-a*x)

def carryover(x, strength, length=21):
    w = tt.as_tensor_variable(
        [tt.power(strength, i) for i in range(length)]
    )

    x_lags = tt.stack(
        [tt.concatenate([
            tt.zeros(i),
            x[:x.shape[0]-i]
        ]) for i in range(length)]
    )

    return tt.dot(w, x_lags)
The saturation function should be easy to understand. The carryover is a bit more work, though. Basically, you can express the carryover transformation as a matrix-vector multiplication; all you have to do is assemble the matrix x_lags and the vector w first. As an example, take the input vector x = (x₁, x₂, x₃, x₄), a carryover strength α, and a carryover length of 3. Then w = (1, α, α²), the rows of x_lags are (x₁, x₂, x₃, x₄), (0, x₁, x₂, x₃), and (0, 0, x₁, x₂), and the product w · x_lags equals

(x₁, αx₁ + x₂, α²x₁ + αx₂ + x₃, α²x₂ + αx₃ + x₄),

i.e. each day's spend leaves a geometrically decaying echo on the following days. The carryover function in the code above does just that. With these functions in place, we can finally start modeling.
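Before we do, a quick sanity check — my own addition, assuming the functions above are defined — running a unit impulse through the carryover should produce exactly this decaying echo:

import numpy as np

# a spend of 1 on the first day should echo as 1, 0.5, 0.25 on consecutive days
x = tt.as_tensor_variable(np.array([1.0, 0.0, 0.0, 0.0]))
print(carryover(x, 0.5, length=3).eval())  # [1.   0.5  0.25 0.  ]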
import pymc3 as pm

with pm.Model() as mmm:
    channel_contributions = []

    for channel in X.columns:
        coef = pm.Exponential(f'coef_{channel}', lam=0.0001)
        sat = pm.Exponential(f'sat_{channel}', lam=1)
        car = pm.Beta(f'car_{channel}', alpha=2, beta=2)

        channel_data = X[channel].values
        channel_contribution = pm.Deterministic(
            f'contribution_{channel}',
            coef * saturate(
                carryover(
                    channel_data,
                    car
                ),
                sat
            )
        )

        channel_contributions.append(channel_contribution)

    base = pm.Exponential('base', lam=0.0001)
    noise = pm.Exponential('noise', lam=0.0001)

    sales = pm.Normal(
        'sales',
        mu=sum(channel_contributions) + base,
        sigma=noise,
        observed=y
    )

    trace = pm.sample(return_inferencedata=True, tune=3000)
Here you can see all the parameters (no more hyperparameters!): the regression coefficients, the saturation strengths, the carryover strengths, the baseline, and the noise.
Note that I didn't model the carryover length; I just fixed it to 21. That's because I haven't figured out yet how to create matrices and vectors with variable dimensions inside the carryover function in PyMC3. Normally, you would use something like a Poisson random variable for the carryover length. Write me if you know how to do it properly! 😉 It might also be a good occasion to try out Pyro, another probabilistic programming language similar to PyMC3.
Analyzing the Model Output
Now we can take a look at the usual plots. Let's start with the posterior distributions via
import arviz as az

az.plot_posterior(
    trace,
    var_names=['~contribution'],
    filter_vars='like'
)
Here you can see the posteriors of all parameters. Each one has a nice unimodal (= single peak) shape. You can also study how pairs of variables behave together via
az.plot_joint(
    trace,
    var_names=['coef_TV', 'sat_TV'],
)
Here you can see that the saturation parameter and the regression coefficient are not independent of each other, but negatively correlated: the higher the coefficient, the lower the saturation parameter tends to be. This makes sense because a higher coefficient can compensate for a saturation curve that rises more slowly (= a lower sat_TV), and vice versa.
Let's look at another one:
az.plot_joint(
    trace,
    var_names=['car_TV', 'sat_TV'],
)
Here we can see why hyperparameter optimization can run into problems: each point in this picture is a potential model that hyperparameter optimization might find. For a well-behaved, truly unique model, we would prefer to see a point cloud concentrated tightly around a single point (car_TV_true, sat_TV_true). But here we see that the TV carryover strength can plausibly take values between about 0.4 and 0.5, depending on the saturation parameter.
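If you prefer numbers over pictures, you can also inspect the spread numerically; this is a small addition of mine using ArviZ's summary function:

# means, standard deviations, and credible intervals for the two parameters
az.summary(trace, var_names=['car_TV', 'sat_TV'])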
Before we jump to any conclusions, we can also check whether the model is any good at all via
import matplotlib.pyplot as plt
import numpy as np

with mmm:
    posterior = pm.sample_posterior_predictive(trace)

means = posterior['sales'].mean(0)
stds = posterior['sales'].std(0)

plt.figure(figsize=(20, 8))
plt.plot(y.values, linewidth=2, c='r', label='Observed sales')
plt.plot(means, linewidth=1, c='b', label='Mean prediction')
plt.fill_between(np.arange(len(y)), means - 2*stds, means + 2*stds, alpha=0.33)
plt.legend()
which gives us
So it looks like the model has picked up on something useful. I won't go into detail about how to assess model performance here; we might do that some other time.
Channel Contributions
So far, we have dealt with distributions, but for our beloved channel contributions picture, we take means to end up with single values again. Since we introduced the channel contribution variables in the PyMC3 code, we can now simply extract them with a small compute_mean function.
def compute_mean(trace, channel):
    return (trace
            .posterior[f'contribution_{channel}']
            .values
            .reshape(4000, 200)  # 4 chains × 1000 draws = 4000 samples, 200 days
            .mean(0)
           )

channels = ['Banners', 'Radio', 'TV']

unadj_contributions = pd.DataFrame(
    {'Base': trace.posterior['base'].values.mean()},
    index=X.index
)
for channel in channels:
    unadj_contributions[channel] = compute_mean(trace, channel)

# scale the raw contributions so that they add up to the observed sales
adj_contributions = (unadj_contributions
                     .div(unadj_contributions.sum(axis=1), axis=0)
                     .mul(y, axis=0)
                    )

ax = (adj_contributions
      .plot.area(
          figsize=(16, 10),
          linewidth=1,
          title='Predicted Sales and Breakdown',
          ylabel='Sales',
          xlabel='Date'
      )
     )

handles, labels = ax.get_legend_handles_labels()
ax.legend(
    handles[::-1], labels[::-1],
    title='Channels', loc="center left",
    bbox_to_anchor=(1.01, 0.5)
)
Looks good!
In this article, we discussed how Marketing Mix Modeling via a maximum likelihood approach can suffer from hyperparameter estimation: there can be many models that perform equally well on the test set but show completely different extrapolation behavior, i.e. the (hyper)parameters are quite unstable.
Proper extrapolation, however, is the key to optimization. Therefore, we designed a basic marketing mix model with saturation and carryover effects in a Bayesian setting. This was useful because it estimates all parameters at once and yields more stable parameter estimates.
We then implemented it with PyMC3 and created a nice contribution chart to see how much each channel's spend contributed to the sales.
Model Extensions
Now we can extend the Bayesian model we created even further. For example, we can let the parameters change over time. This can be useful if, say, the TV carryover strength started out at 0.8 two years ago but slowly decreased to 0.5 over time, which is an instance of concept drift. We can easily model this using Gaussian random walks, as shown in my other article on rolling regression:
In the same way, we can model a varying baseline. So far, we have treated the baseline as a single fixed number for the whole training period. But maybe it increased over time thanks to our strategic branding efforts. This is something that is hard to address with our old hyperparameter-optimized maximum likelihood models.
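Here is a minimal sketch of what such an extension could look like; this is my own illustration rather than code from the model above, and it reuses the saturate function and the dataset X:

import pymc3 as pm

with pm.Model() as time_varying_mmm:
    # one TV coefficient per day; the random walk keeps consecutive days
    # close together, so the coefficient drifts slowly over time
    coef_tv = pm.GaussianRandomWalk('coef_TV', sigma=0.1, shape=len(X))
    sat_tv = pm.Exponential('sat_TV', lam=1)

    # note: in a real model you would also constrain the coefficient to stay
    # positive, for example by exponentiating the random walk
    contribution_tv = pm.Deterministic(
        'contribution_TV',
        coef_tv * saturate(X['TV'].values, sat_tv)
    )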
I hope you learned something new, interesting and useful today. Thank you for reading!
As a final point, if you
- want to support me in writing more about machine learning and
- plan to get a Medium subscription anyway,
why not do it via this link? It would help me a lot!
To be transparent, the price doesn't change for you, but about half of the subscription fees go directly to me.
Thank you if you consider supporting me!
If you have any questions, write me on LinkedIn!