Last year, I noticed that Snickers bars seem to taste different in different countries, but I was not sure. So my partner Nellissa and I conducted a little experiment that involved a lot of chocolate and a little Bayesian statistics.

We wanted to establish whether Snickers bars from different countries taste different or not. To this end, we collected three Snickers bars, one from England (GB), one from Germany (DE), and one from Vietnam (VN). There are five plausible hypotheses:

  1. All Snickers bars taste the same: \mathcal{H}_{=}
  2. All Snickers bars taste different: \mathcal{H}_{\neq}
  3. The German and English bars are identical, but the Vietnamese is different: \mathcal{H}_{VN}
  4. The German and Vietnamese bars are identical, but the English is different: \mathcal{H}_{GB}
  5. The English and Vietnamese bars are identical, but the German is different: \mathcal{H}_{DE}
FIGURE 1: Slicing up Snickers bars from three different countries in preparation for the experiment.

During the experiment we sliced the three bars into 12 slices each. Then we randomly paired slices and assessed whether they tasted the same or not. For example, one measurement may result in the observation \rm{DE} = \rm{GB} and another measurement may result in \rm{GB} \neq \rm{VN}. In addition, we also compared Snickers bars to themselves, so we might obtain \rm{VN} = \rm{VN}. After each measurement, we can perform a Bayesian update (explained below) on the above hypotheses.
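The random pairing can be sketched in a few lines of Python. This is only an illustration under my own assumptions: that every slice is used exactly once and that pairs are drawn by shuffling all 36 slices. The function name `random_pairs` and its parameters are made up for this sketch.

```python
import random

def random_pairs(countries=("GB", "DE", "VN"), slices_per_bar=12, seed=None):
    """Shuffle all slices and pair them up two at a time.

    Assumption: each slice is tasted exactly once. Pairs of two slices
    from the same bar (e.g. VN with VN) can occur, as in the experiment.
    """
    rng = random.Random(seed)
    # One label per slice, e.g. 12 x "GB", 12 x "DE", 12 x "VN".
    slices = [c for c in countries for _ in range(slices_per_bar)]
    rng.shuffle(slices)
    # Pair neighbouring slices in the shuffled order.
    return list(zip(slices[0::2], slices[1::2]))

pairs = random_pairs(seed=1)
for a, b in pairs[:3]:
    print(f"taste {a} against {b}")
```

With three bars of 12 slices each, this yields 18 pairs per run.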

The measurement is subjective and may be affected by substantial noise. In particular, given two samples the experimenter may decide they taste different, even though they are equal, or vice versa. For simplicity, we assume a single failure rate \epsilon for all experiments. Given one sample pair, the probability that any experimenter misjudges the equality of the samples is \epsilon, thus \epsilon \in [0,1].

To make the five hypotheses that we have introduced above more explicit, we write down the outcome probabilities of an experiment (likelihoods), given that hypothesis \mathcal{H}_k is true and \epsilon is known.

For the all-equal hypothesis \mathcal{H}_{=} we have:

P(\rm{DE} = \rm{GB}\,|\,\mathcal{H}_{=}, \epsilon) = P(\rm{DE} = \rm{VN}\,|\,\mathcal{H}_{=}, \epsilon) = P(\rm{GB} = \rm{VN}\,|\,\mathcal{H}_{=}, \epsilon) = 1 - \epsilon\;,
P(\rm{DE} \neq \rm{GB}\,|\,\mathcal{H}_{=}, \epsilon) = P(\rm{DE} \neq \rm{VN}\,|\,\mathcal{H}_{=}, \epsilon) = P(\rm{GB} \neq \rm{VN}\,|\,\mathcal{H}_{=}, \epsilon) = \phantom{1 - } \epsilon\;.

For the \mathcal{H}_{VN} hypothesis we have instead:

P(\rm{DE} = \rm{GB}\,|\,\mathcal{H}_{VN}, \epsilon) = P(\rm{DE} \neq \rm{VN}\,|\,\mathcal{H}_{VN}, \epsilon) = P(\rm{GB} \neq \rm{VN}\,|\,\mathcal{H}_{VN}, \epsilon) = 1 - \epsilon\;,
P(\rm{DE} \neq \rm{GB}\,|\,\mathcal{H}_{VN}, \epsilon) = P(\rm{DE} = \rm{VN}\,|\,\mathcal{H}_{VN}, \epsilon) = P(\rm{GB} = \rm{VN}\,|\,\mathcal{H}_{VN}, \epsilon) = \phantom{1 - } \epsilon\;,

and so on.
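The "and so on" can be spelled out compactly in code. The sketch below is one possible way to write the likelihoods for all five hypotheses at once: each hypothesis is encoded as a partition of the three countries into groups that taste the same, and a judgement has probability 1 - \epsilon if it agrees with the partition and \epsilon otherwise. The names `HYPOTHESES`, `truly_same` and `likelihood` are my own choices for illustration, and I assume that a bar compared with itself counts as "truly the same" under every hypothesis.

```python
# Each hypothesis partitions the countries into groups of identical bars.
HYPOTHESES = {
    "H_eq":  [{"DE", "GB", "VN"}],        # all taste the same
    "H_neq": [{"DE"}, {"GB"}, {"VN"}],    # all taste different
    "H_VN":  [{"DE", "GB"}, {"VN"}],      # only the Vietnamese bar differs
    "H_GB":  [{"DE", "VN"}, {"GB"}],      # only the English bar differs
    "H_DE":  [{"GB", "VN"}, {"DE"}],      # only the German bar differs
}

def truly_same(a, b, hypothesis):
    """True if bars a and b taste the same under the given hypothesis."""
    return any(a in group and b in group for group in HYPOTHESES[hypothesis])

def likelihood(a, b, judged_same, hypothesis, eps):
    """P(judgement | hypothesis, eps) for one tasting of the pair (a, b).

    A correct judgement has probability 1 - eps, a misjudgement eps.
    """
    correct = judged_same == truly_same(a, b, hypothesis)
    return 1.0 - eps if correct else eps

# Observation DE = GB evaluated under H_VN and under H_DE, with eps = 0.25:
print(likelihood("DE", "GB", True, "H_VN", 0.25))  # correct judgement: 1 - eps
print(likelihood("DE", "GB", True, "H_DE", 0.25))  # misjudgement: eps
```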

Before we performed the experiment, we formulated the following prior beliefs. We assigned equal probabilities to all five hypotheses, i.e.

P(\mathcal{H}_{=}) = P(\mathcal{H}_{\neq}) = P(\mathcal{H}_{\rm{GB}}) = P(\mathcal{H}_{\rm{DE}}) = P(\mathcal{H}_{\rm{VN}}) = 1/5

In addition, we were uncertain about the failure rate \epsilon. Therefore, we split each of the five hypotheses into sub-hypotheses with different values for \epsilon. Specifically, we define the cases \{\epsilon \in [0.0,0.1], \epsilon \in [0.1,0.2], \dots, \epsilon \in [0.9,1.0]\} and assign the following prior probabilities to the different epsilons:

FIGURE 2: Prior probability mass distribution for the failure rate \epsilon. The probabilities are chosen subjectively, arguing that our failure rate is probably not greater than 50%, but most likely around 25%. Of course we might be over-confident in our sense of taste, but the data will show if that is the case.
P(\epsilon \in [0.0, 0.1]) = 10/81
P(\epsilon \in [0.1, 0.2]) = 13/81
P(\epsilon \in [0.2, 0.3]) = 16/81
P(\epsilon \in [0.3, 0.4]) = 14/81
P(\epsilon \in [0.4, 0.5]) = 12/81
P(\epsilon \in [0.5, 0.6]) = 8/81
P(\epsilon \in [0.6, 0.7]) = 4/81
P(\epsilon \in [0.7, 0.8]) = 2/81
P(\epsilon \in [0.8, 1.0]) = 1/81

For simplicity we assumed that Nellissa and I both have the same failure rate, and we also assumed that \epsilon is independent of the hypothesis, i.e.

P(\mathcal{H}_k, \epsilon) = P(\mathcal{H}_k)\,P(\epsilon)\;.
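As a rough sketch, the joint prior can be built as a small table with one entry per hypothesis and \epsilon bin. Two things here are assumptions of the sketch, not statements from the text: the nine listed weights add up to 80/81, so I assume that the bins [0.8, 0.9] and [0.9, 1.0] each carry 1/81, which makes the prior sum to one; and I identify the bins by their index rather than by a particular value of \epsilon. The variable names are illustrative.

```python
from fractions import Fraction

HYPOTHESES = ["H_eq", "H_neq", "H_GB", "H_DE", "H_VN"]

# Prior weights for the ten eps bins [0.0,0.1], ..., [0.9,1.0] in units of 1/81.
# ASSUMPTION: the last two bins each get 1/81 so that the weights sum to 81.
EPS_WEIGHTS = [10, 13, 16, 14, 12, 8, 4, 2, 1, 1]
EPS_PRIOR = [Fraction(w, 81) for w in EPS_WEIGHTS]

# Uniform prior over the five hypotheses.
H_PRIOR = {h: Fraction(1, 5) for h in HYPOTHESES}

# Independence: P(H_k, eps_i) = P(H_k) * P(eps_i).
joint_prior = {
    (h, i): H_PRIOR[h] * p_eps
    for h in HYPOTHESES
    for i, p_eps in enumerate(EPS_PRIOR)
}

print(sum(joint_prior.values()))   # total probability of the 5 x 10 table
print(joint_prior[("H_VN", 2)])    # H_VN with eps in [0.2, 0.3]
```

Using `Fraction` keeps the arithmetic exact, which makes it easy to check that the table really is normalised.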

Thus, we can display the whole probability space in a contour plot, as shown in Figure 3.

FIGURE 3: Contour plot of the assumed prior probability distribution. Darker colors correspond to higher probability.

When we collect a datum such as \rm{DE} = \rm{VN}, we update our beliefs (probabilities) according to Bayes’ theorem:

P(\mathcal{H}_k, \epsilon\,|\,x) = P(x\,|\,\mathcal{H}_k, \epsilon)\,P(\mathcal{H}_k, \epsilon) / P(x)\;,

where x is the datum and we can expand the probability to observe x into

P(x) = \sum_{k,i} P(x\,|\,\mathcal{H}_k, \epsilon_i)\,P(\mathcal{H}_k)\,P(\epsilon_i)\;.
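A single update step could look like the following minimal sketch. It is not the code behind the figures in this post, just one way to implement the two formulas above. It assumes that each \epsilon bin is represented by its midpoint (0.05, 0.15, ...) and that the bins [0.8, 0.9] and [0.9, 1.0] each have prior weight 1/81; all function and variable names are illustrative, and the datum in the example is hypothetical.

```python
# Hypotheses as partitions of the countries into identical-tasting groups.
GROUPS = {
    "H_eq":  [{"DE", "GB", "VN"}],
    "H_neq": [{"DE"}, {"GB"}, {"VN"}],
    "H_VN":  [{"DE", "GB"}, {"VN"}],
    "H_GB":  [{"DE", "VN"}, {"GB"}],
    "H_DE":  [{"GB", "VN"}, {"DE"}],
}
# ASSUMPTION: bin midpoints stand in for eps; last two weights are 1/81 each.
EPS = [0.05 + 0.1 * i for i in range(10)]
EPS_PRIOR = [w / 81 for w in [10, 13, 16, 14, 12, 8, 4, 2, 1, 1]]

def likelihood(a, b, judged_same, h, eps):
    """P(x | H_k, eps): 1 - eps for a correct judgement, eps otherwise."""
    same = any(a in g and b in g for g in GROUPS[h])
    return 1.0 - eps if judged_same == same else eps

def make_prior():
    """Joint prior P(H_k, eps_i) = P(H_k) P(eps_i) with P(H_k) = 1/5."""
    return {(h, i): p / 5 for h in GROUPS for i, p in enumerate(EPS_PRIOR)}

def update(belief, a, b, judged_same):
    """One Bayesian update of the belief map for the datum (a, b, judged_same)."""
    unnormalised = {
        (h, i): likelihood(a, b, judged_same, h, EPS[i]) * p
        for (h, i), p in belief.items()
    }
    evidence = sum(unnormalised.values())  # P(x), the sum over k and i
    return {key: value / evidence for key, value in unnormalised.items()}

def marginal_hypotheses(belief):
    """Sum out eps to get P(H_k | data)."""
    return {h: sum(p for (hh, _), p in belief.items() if hh == h) for h in GROUPS}

# Hypothetical single datum: DE and VN were judged to taste the same.
belief = update(make_prior(), "DE", "VN", True)
for h, p in marginal_hypotheses(belief).items():
    print(f"{h}: {p:.3f}")
```

Feeding the measurements in one at a time, with the output of one `update` call as the input of the next, gives the sequence of belief maps.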

With a little bit of code, we can now explore how each datum changes our belief map, according to Bayes’ theorem above:

Change of our belief map due to the data we collected. The most likely hypothesis is \mathcal{H}_{\rm{VN}}, i.e. the German and English Snickers are identical, but the Vietnamese Snickers is different from the two.

The most probable hypothesis is thus that the German and English Snickers bars are the same, but the Vietnamese Snickers bar is different (\mathcal{H}_{\rm{VN}}), and that our failure rate \epsilon is 30% to 40%.

The conclusion changes, however, when only my or only Nellissa’s measurements are taken into account. Here is a short video in which I explore the data with a little app I wrote.

Much more insight can be gained from the belief maps shown in the video above, but I leave this to you, dear reader, to think about the results. I, for my part, have had enough Snickers bars for a lifetime.