## Part 2

# NHL 19 THREES Data Deep Dive

*Analyzing game development using data science*

*By Brian Maresso & Sean Finlon*

## Recap

If you haven’t already, check out Part 1 of our NHL 19 THREES Data Deep Dive. In Part 1, we collected data from 100 games of NHL 19 THREES and used it to analyze the probable distribution of Money Pucks. Two tables are of particular interest from this part:

Puck Value | Frequency |
---|---|
+1 | 49.80% |
+2 | 26.98% |
+3 | 7.93% |
-1 | 8.32% |
-2 | 3.65% |
-3 | 3.32% |

Current Puck | Next: +1 | Next: +2 | Next: +3 | Next: -1 | Next: -2 | Next: -3 |
---|---|---|---|---|---|---|
N/A | 56.00% | 36.00% | 8.00% | 0.00% | 0.00% | 0.00% |
+1 | 46.39% | 24.76% | 9.38% | 10.34% | 4.57% | 4.57% |
+2 | 49.67% | 26.80% | 7.41% | 9.15% | 3.92% | 3.05% |
+3 | 45.86% | 25.56% | 1.50% | 15.04% | 6.77% | 5.26% |
-1 | 54.93% | 38.03% | 7.04% | 0.00% | 0.00% | 0.00% |
-2 | 67.74% | 20.97% | 11.29% | 0.00% | 0.00% | 0.00% |
-3 | 68.63% | 27.45% | 3.92% | 0.00% | 0.00% | 0.00% |

The gap between the general probability of a given Money Puck (first table) and its conditional probability (second table, where each row is the current puck and each column is the next puck) leads us to believe that Money Puck distribution is a Markov chain: the probability that the next Money Puck has a given value is determined entirely by the current puck’s value. For example, while there is a 7.93% chance that any given puck is a +3, the chance that the next puck is a +3 given that the current puck is a -2 is 11.29%. Likewise, the -3 puck has a 3.32% chance of appearing overall, but given that the current puck is a +1, that chance becomes 4.57%. Since we found no evidence that Money Pucks influence the probability of pucks beyond the one directly following them, we can treat Money Puck distribution as a Markov chain.
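To make the Markov assumption concrete, here is a minimal Python sketch (not the authors’ code, which is C#). It stores the conditional table as a lookup and multiplies transition probabilities along a chain. The names `TRANSITIONS`, `transition_probability`, `chain_probability`, and `sample_chain` are hypothetical; the numbers are copied from the second table.

```python
import random

PUCKS = ["+1", "+2", "+3", "-1", "-2", "-3"]

# Row = current puck, columns = next puck in PUCKS order.
# "N/A" is the state before the first puck of the game.
TRANSITIONS = {
    "N/A": [0.5600, 0.3600, 0.0800, 0.0000, 0.0000, 0.0000],
    "+1":  [0.4639, 0.2476, 0.0938, 0.1034, 0.0457, 0.0457],
    "+2":  [0.4967, 0.2680, 0.0741, 0.0915, 0.0392, 0.0305],
    "+3":  [0.4586, 0.2556, 0.0150, 0.1504, 0.0677, 0.0526],
    "-1":  [0.5493, 0.3803, 0.0704, 0.0000, 0.0000, 0.0000],
    "-2":  [0.6774, 0.2097, 0.1129, 0.0000, 0.0000, 0.0000],
    "-3":  [0.6863, 0.2745, 0.0392, 0.0000, 0.0000, 0.0000],
}


def transition_probability(current, nxt):
    """P(next puck = nxt | current puck = current)."""
    return TRANSITIONS[current][PUCKS.index(nxt)]


def chain_probability(chain):
    """Probability of a whole puck sequence, starting from N/A."""
    prob = 1.0
    current = "N/A"
    for puck in chain:
        prob *= transition_probability(current, puck)
        current = puck
    return prob


def sample_chain(length, rng):
    """Draw one random chain; each puck depends only on the previous one."""
    current = "N/A"
    chain = []
    for _ in range(length):
        current = rng.choices(PUCKS, weights=TRANSITIONS[current])[0]
        chain.append(current)
    return chain
```

For instance, `transition_probability("-2", "+3")` returns the 11.29% figure quoted above, and any chain that places a negative puck directly after another negative puck has probability zero.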

## Introduction

Consider the following situation: Team A and Team B are playing a normal game of hockey. When time expires, Team A has scored 6 goals to Team B’s 2, so Team A wins the game 6-2. In hockey, this would generally be considered a rout. Now, what if I said that Team B was actually the winner despite scoring 4 fewer times than their opponent? This is very much within the realm of possibility in THREES. How? Consider the following scenario:

- Team A scores a +1 (1-0)
- Team A scores a +1 (2-0)
- Team A scores a +1 (3-0)
- Team B scores a -3 (0-1)
- Team A scores a +1 (1-1)
- Team A scores a +1 (2-1)
- Team A scores a +1 (3-1)
- Team B scores a -3 (0-2)
- Time expires

In real hockey, we keep track of a stat called plus/minus (often abbreviated as +/-). Team +/- tells us how many more goals a team scored compared to their opponent. This can also be called a scoring differential. Typically, good teams will have a high +/- since they’re generally scoring more goals than they let in. For example, the 2014/15 Chicago Blackhawks won the Stanley Cup with 220 goals for (GF) and 186 goals against (GA) for a team +/- of +34. Compare this to the 2018/19 Chicago Blackhawks (a team which did not make the playoffs) with 267 GF and 291 GA for a team +/- of -24 (source: Hockey Reference).

In THREES, +/- takes on a different meaning: it tells us *puck* differential, not goal differential. Going back to our earlier example, Team A would have a +4 rating while Team B would have a -4. In real hockey, having a positive team +/- in a game means that your team won. This is not the case in THREES. Since the +/- stat can generally tell us which team is more skilled (i.e. better able to outscore the opponent while preventing their scoring opportunities), THREES presents an interesting conundrum: the more skilled team does not always win. This leads to our next question for this article.

The Big Question:

**What is the probability that a team will still lose despite outscoring their opponent?**

## Section 1: Game of States

One way that we could go about answering our Big Question would be to play another hundred or so games and record our +/- and the final score. However, this is problematic. We want a generalized rule which can apply to *any* team, not just ours. We win over 75% of games, so that would skew the data towards a winning record. Additionally, there are millions of game permutations which we have not played (and likely never will).

The solution is to develop a formula which takes +/- (also referred to as puck differential or *p*) and the total number of pucks scored in the game (*G*) and returns the probability that the given team won. Of course, this is easier said than done…

Our first step was to brute-force the solution as far as possible. This would take *every* possible way to make a chain of *G* pucks and apply these ‘game states’ to *every* possible way that two teams can score *G* pucks. Note: we assume that both teams are of equal skill. We do this by giving each team an equal probability (50%) to score any given puck.

First, we define each Team as an enum. This could also be accomplished as a boolean, but this way makes the code more readable:
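The authors’ listing is C#. As a rough stand-in, a Python sketch of the same idea could look like the following; the `opponent` helper is a hypothetical addition for illustration.

```python
from enum import Enum


class Team(Enum):
    """The two sides in a THREES game."""
    A = 0
    B = 1

    def opponent(self):
        """Return the other team (hypothetical convenience helper)."""
        return Team.B if self is Team.A else Team.A
```

Compared with a bare boolean, `Team.A` and `Team.B` read unambiguously at every call site.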

Next, we need to generate every possible way that two teams can score *G* pucks. This is no different than generating every possible binary string of length *G*, of which there are 2^G. Rather than write the method to generate all 2^G possible scoring options by hand, we can take the easy way out and convert each binary string to an array of Teams (at the cost of some efficiency).
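A minimal Python sketch of the binary-string trick, assuming a two-member `Team` enum as described earlier (the function name `scoring_orders` is hypothetical, and this is not the authors’ C# code). It assumes *G* is at least 1:

```python
from enum import Enum


class Team(Enum):
    A = 0
    B = 1


def scoring_orders(g):
    """Yield every way two teams can score g pucks (g >= 1), as tuples of Team."""
    for n in range(2 ** g):
        # Zero-padded binary form of n, e.g. n=5, g=4 -> "0101".
        bits = format(n, f"0{g}b")
        # Each bit picks the scoring team for that puck: 0 -> A, 1 -> B.
        yield tuple(Team(int(bit)) for bit in bits)
```

Going through a string is slower than bit-shifting, which is the efficiency cost mentioned above, but it keeps the mapping from bit to team easy to read.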

Next, we need to generate every possible way that Money Pucks can make a chain of *G* length. This is by far the most expensive operation to brute-force. We define a container class called `GameState` which stores a `Game` (recall from Part 1 that `Game` is a simple wrapper class for a `List<MoneyPuck>`) and the probability that this `GameState` occurs. Each `GameState` *s1* can use the table from Part 1 to determine the probability that it will transition into a `GameState` *s2* which contains the given Money Puck *m2* following the last Money Puck in *s1*, labeled *m1*. The total probability of the next `GameState` *s2* will simply be *P*(*s2*) = *P*(*s1*) × *P*(*m1* → *m2*).

This method will ignore states where it is impossible to transition from *m1* to *m2*. For example, a `GameState` which ends with a -2 cannot transition into a state which follows with a -1 since *P*(-2 → -1) = 0. If we start with a `GameState` which only contains N/A (the default ‘before-first’ Money Puck) then the probability is 1. As we recursively create more states and multiply probabilities, the chance of each state gets smaller and smaller.
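The state expansion described above can be sketched in Python as follows. This is an illustrative stand-in, not the authors’ C# `GameState`: the field names, `expand`, and `game_states` are hypothetical, the loop is iterative rather than recursive, and `None` stands in for the N/A puck.

```python
from dataclasses import dataclass

PUCKS = [1, 2, 3, -1, -2, -3]

# Transition table from Part 1; None is the N/A "before-first" puck.
TRANSITIONS = {
    None: [0.5600, 0.3600, 0.0800, 0.0, 0.0, 0.0],
    1:    [0.4639, 0.2476, 0.0938, 0.1034, 0.0457, 0.0457],
    2:    [0.4967, 0.2680, 0.0741, 0.0915, 0.0392, 0.0305],
    3:    [0.4586, 0.2556, 0.0150, 0.1504, 0.0677, 0.0526],
    -1:   [0.5493, 0.3803, 0.0704, 0.0, 0.0, 0.0],
    -2:   [0.6774, 0.2097, 0.1129, 0.0, 0.0, 0.0],
    -3:   [0.6863, 0.2745, 0.0392, 0.0, 0.0, 0.0],
}


@dataclass(frozen=True)
class GameState:
    pucks: tuple        # Money Puck values in the order they appeared
    probability: float  # chance that exactly this chain occurs


def expand(state):
    """Yield every reachable successor s2 with P(s2) = P(s1) * P(m1 -> m2)."""
    last = state.pucks[-1] if state.pucks else None
    for puck, p in zip(PUCKS, TRANSITIONS[last]):
        if p > 0:  # skip impossible transitions such as -2 -> -1
            yield GameState(state.pucks + (puck,), state.probability * p)


def game_states(g):
    """All chains of g pucks, starting from the N/A state with probability 1."""
    states = [GameState((), 1.0)]
    for _ in range(g):
        states = [nxt for s in states for nxt in expand(s)]
    return states
```

Because every list of length *G* is held in memory at once, this approach hits the same wall the article runs into as *G* grows.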

After running both of these methods, we get the following table:

Game Length (G) | Scoring Combinations | Money Puck Combinations |
---|---|---|
1 | 2 | 3 |
2 | 4 | 18 |
3 | 8 | 81 |
4 | 16 | 405 |
5 | 32 | 1,944 |
6 | 64 | 9,477 |
7 | 128 | 45,927 |
8 | 256 | 223,074 |
9 | 512 | 1,082,565 |
10 | 1,024 | 5,255,361 |
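The Money Puck column can be cross-checked without enumerating anything. According to the Part 1 transition table, a chain ending in a positive puck has six possible successors, a chain ending in a negative puck has only the three positive ones, and the first puck is always positive. A short Python sketch of that recurrence (the function name `count_chains` is hypothetical):

```python
def count_chains(g):
    """Number of Money Puck chains of length g (g >= 1) with nonzero probability."""
    positive, negative = 3, 0  # length-1 chains: +1, +2 or +3 only
    for _ in range(g - 1):
        # Any chain can be extended by 3 positive pucks;
        # only chains ending in a positive puck can be extended by 3 negative ones.
        positive, negative = 3 * (positive + negative), 3 * positive
    return positive + negative


for g in range(1, 11):
    print(g, 2 ** g, count_chains(g))
```

The recurrence grows by a factor of a little under 5 per extra puck, which is why brute-force enumeration becomes impractical so quickly.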

## Section 2: Playing the Odds

With all scoring combinations and Money Puck combinations in place, the final preparatory step is to write a method to calculate the score of a `GameState` given the scoring order (expressed as `Team[]`). Note that THREES games cannot end in a tie: they will continue to add overtime periods until one team scores (much like NHL playoff games). However, every value of *G* except 1 includes states where both teams will have an equal number of goals (though not necessarily pucks). To plan for this possibility, we return a `Nullable` form of `Team` (a C# feature) instead of a winning `Team`. A `null` winner tells us that the game is a tie.

First, we create a method to determine the score of a `Game`, which we will later use to determine a winner.
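A hedged Python sketch of such a scoring method follows (the authors’ version is C#). The scoring rule is inferred from the scenario in the Introduction: a positive puck adds its value to the scoring team, while a negative puck gives the scoring team one goal and removes goals from the opponent. Clamping the opponent’s score at zero is an assumption on our part, and the names `Team` and `score` are hypothetical.

```python
from enum import Enum


class Team(Enum):
    A = 0
    B = 1


def score(pucks, order):
    """Return (goals_a, goals_b) for puck values scored in the given team order."""
    goals = {Team.A: 0, Team.B: 0}
    for value, scorer in zip(pucks, order):
        other = Team.B if scorer is Team.A else Team.A
        if value > 0:
            goals[scorer] += value
        else:
            goals[scorer] += 1
            # Assumption: a score cannot drop below zero.
            goals[other] = max(0, goals[other] + value)
    return goals[Team.A], goals[Team.B]


# The scenario from the Introduction: six +1 pucks for A, two -3 pucks for B.
A, B = Team.A, Team.B
print(score([1, 1, 1, -3, 1, 1, 1, -3], [A, A, A, B, A, A, A, B]))
```

Replaying the Introduction’s scenario this way ends with Team A on 0 and Team B on 2, matching the scores listed there.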

Next, the simple method to determine a winner (if one exists):
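In Python, the closest analogue to C#’s nullable `Team` is `Optional[Team]`, with `None` standing for a tie. A minimal sketch, assuming the score is available as a pair of goal totals (the function name `winner` is hypothetical, and this is not the authors’ listing):

```python
from enum import Enum
from typing import Optional


class Team(Enum):
    A = 0
    B = 1


def winner(goals_a: int, goals_b: int) -> Optional[Team]:
    """Return the team with more goals, or None when the game is tied."""
    if goals_a > goals_b:
        return Team.A
    if goals_b > goals_a:
        return Team.B
    return None
```

Returning `None` rather than a third enum member keeps `Team` limited to the two real sides, at the cost of a null check at each call site.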

Using these methods, we can now determine the probability that a team wins, loses, or ties a given `GameState` provided their puck differential (+/-). This is a compound probability influenced by 1) the probability that Money Pucks form the given chain and 2) the probability that two equally-skilled teams score in the given order. The following tables are the results for values of *G* from 1-10:

G=1

+/- | Chance to Win | Chance to Lose | Chance to Tie |
---|---|---|---|
+1 | 100.00000% | 0.00000% | 0.00000% |
-1 | 0.00000% | 100.00000% | 0.00000% |

G=2

+/- | Chance to Win | Chance to Lose | Chance to Tie |
---|---|---|---|
+2 | 100.00000% | 0.00000% | 0.00000% |
+0 | 30.20820% | 30.20820% | 39.58360% |
-2 | 0.00000% | 100.00000% | 0.00000% |

G=3

+/- | Chance to Win | Chance to Lose | Chance to Tie |
---|---|---|---|
+3 | 100.00000% | 0.00000% | 0.00000% |
+1 | 80.36029% | 3.65183% | 15.98788% |
-1 | 3.65183% | 80.36029% | 15.98788% |
-3 | 0.00000% | 100.00000% | 0.00000% |

G=4

+/- | Chance to Win | Chance to Lose | Chance to Tie |
---|---|---|---|
+4 | 100.00000% | 0.00000% | 0.00000% |
+2 | 97.28788% | 0.21376% | 2.49837% |
+0 | 36.35404% | 36.35404% | 27.29191% |
-2 | 0.21376% | 97.28788% | 2.49837% |
-4 | 0.00000% | 100.00000% | 0.00000% |

G=5

+/- | Chance to Win | Chance to Lose | Chance to Tie |
---|---|---|---|
+5 | 100.00000% | 0.00000% | 0.00000% |
+3 | 99.77505% | 0.00000% | 0.22495% |
+1 | 74.06629% | 10.07839% | 15.85532% |
-1 | 10.07839% | 74.06629% | 15.85532% |
-3 | 0.00000% | 99.77505% | 0.22495% |
-5 | 0.00000% | 100.00000% | 0.00000% |

G=6

+/- | Chance to Win | Chance to Lose | Chance to Tie |
---|---|---|---|
+6 | 100.00000% | 0.00000% | 0.00000% |
+4 | 100.00000% | 0.00000% | 0.00000% |
+2 | 93.05398% | 1.91151% | 5.03451% |
+0 | 39.03839% | 39.03839% | 21.92322% |
-2 | 1.91151% | 93.05398% | 5.03451% |
-4 | 0.00000% | 100.00000% | 0.00000% |
-6 | 0.00000% | 100.00000% | 0.00000% |

G=7

+/- | Chance to Win | Chance to Lose | Chance to Tie |
---|---|---|---|
+7 | 100.00000% | 0.00000% | 0.00000% |
+5 | 100.00000% | 0.00000% | 0.00000% |
+3 | 98.67824% | 0.26110% | 1.06066% |
+1 | 70.52975% | 14.56674% | 14.90351% |
-1 | 14.56674% | 70.52975% | 14.90351% |
-3 | 0.26110% | 98.67824% | 1.06066% |
-5 | 0.00000% | 100.00000% | 0.00000% |
-7 | 0.00000% | 100.00000% | 0.00000% |

G=8

+/- | Chance to Win | Chance to Lose | Chance to Tie |
---|---|---|---|
+8 | 100.00000% | 0.00000% | 0.00000% |
+6 | 100.00000% | 0.00000% | 0.00000% |
+4 | 99.81772% | 0.02387% | 0.15841% |
+2 | 89.30659% | 4.15212% | 6.54129% |
+0 | 40.56486% | 40.56486% | 18.87027% |
-2 | 4.15212% | 89.30659% | 6.54129% |
-4 | 0.02387% | 99.81772% | 0.15841% |
-6 | 0.00000% | 100.00000% | 0.00000% |
-8 | 0.00000% | 100.00000% | 0.00000% |

G=9

+/- | Chance to Win | Chance to Lose | Chance to Tie |
---|---|---|---|
+9 | 100.00000% | 0.00000% | 0.00000% |
+7 | 100.00000% | 0.00000% | 0.00000% |
+5 | 99.98220% | 0.00107% | 0.01673% |
+3 | 96.73148% | 1.01228% | 2.25623% |
+1 | 65.56358% | 19.14714% | 15.28929% |
-1 | 19.14714% | 65.56358% | 15.28929% |
-3 | 1.01228% | 96.73148% | 2.25623% |
-5 | 0.00107% | 99.98220% | 0.01673% |
-7 | 0.00000% | 100.00000% | 0.00000% |
-9 | 0.00000% | 100.00000% | 0.00000% |

G=10

+/- | Chance to Win | Chance to Lose | Chance to Tie |
---|---|---|---|
+10 | 100.00000% | 0.00000% | 0.00000% |
+8 | 100.00000% | 0.00000% | 0.00000% |
+6 | 99.99901% | 0.00000% | 0.00098% |
+4 | 99.07636% | 0.22742% | 0.69622% |
+2 | 81.17651% | 8.68209% | 10.14140% |
+0 | 40.13809% | 40.13809% | 19.72382% |
-2 | 8.68209% | 81.17651% | 10.14140% |
-4 | 0.22742% | 99.07636% | 0.69622% |
-6 | 0.00000% | 99.99901% | 0.00098% |
-8 | 0.00000% | 100.00000% | 0.00000% |
-10 | 0.00000% | 100.00000% | 0.00000% |
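For small values of *G*, tables like these can be approximated with a compact Python sketch of the whole brute-force pipeline. This is not the authors’ C# implementation. The scoring rule (a negative puck gives the scorer one goal and removes goals from the opponent) is inferred from the Introduction’s scenario, the floor at zero is an assumption, and all function names are hypothetical. Teams are encoded as 0 and 1 to keep the block short.

```python
from collections import defaultdict
from itertools import product

PUCKS = [1, 2, 3, -1, -2, -3]
# Part 1 transition table; None stands for N/A (before the first puck).
TRANSITIONS = {
    None: [0.5600, 0.3600, 0.0800, 0.0, 0.0, 0.0],
    1:    [0.4639, 0.2476, 0.0938, 0.1034, 0.0457, 0.0457],
    2:    [0.4967, 0.2680, 0.0741, 0.0915, 0.0392, 0.0305],
    3:    [0.4586, 0.2556, 0.0150, 0.1504, 0.0677, 0.0526],
    -1:   [0.5493, 0.3803, 0.0704, 0.0, 0.0, 0.0],
    -2:   [0.6774, 0.2097, 0.1129, 0.0, 0.0, 0.0],
    -3:   [0.6863, 0.2745, 0.0392, 0.0, 0.0, 0.0],
}


def chains(g, last=None, prob=1.0, prefix=()):
    """Recursively yield (pucks, probability) for every possible chain of g pucks."""
    if len(prefix) == g:
        yield prefix, prob
        return
    for puck, p in zip(PUCKS, TRANSITIONS[last]):
        if p > 0:
            yield from chains(g, puck, prob * p, prefix + (puck,))


def final_score(pucks, order):
    """Goals for team 0 and team 1 under the inferred scoring rule."""
    goals = [0, 0]
    for value, scorer in zip(pucks, order):
        if value > 0:
            goals[scorer] += value
        else:
            goals[scorer] += 1
            # Assumption: the opponent's score is floored at zero.
            goals[1 - scorer] = max(0, goals[1 - scorer] + value)
    return goals


def outcome_table(g):
    """Map team 0's +/- to [P(win), P(lose), P(tie)], conditional on that +/-."""
    totals = defaultdict(lambda: [0.0, 0.0, 0.0])
    all_chains = list(chains(g))
    for order in product((0, 1), repeat=g):
        diff = order.count(0) - order.count(1)
        for pucks, prob in all_chains:
            a, b = final_score(pucks, order)
            idx = 0 if a > b else 1 if b > a else 2
            totals[diff][idx] += prob
    # Every scoring order is equally likely, so normalising within each +/-
    # row gives the conditional probabilities.
    return {d: [x / sum(v) for x in v]
            for d, v in sorted(totals.items(), reverse=True)}


for diff, (win, lose, tie) in outcome_table(2).items():
    print(f"{diff:+d}  win={win:.5%}  lose={lose:.5%}  tie={tie:.5%}")
```

Under these assumptions the *G*=2 and *G*=3 results land close to the published values. Small differences are expected because the transition table from Part 1 is rounded to two decimal places.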

An alternative way of looking at all of these tables at once is to treat each column as the total number of pucks scored (*G*) and each row as the team’s +/-. Each cell represents the probability that the team will win given their +/- and the value of *G*. A reference to this table can be found here.

## Section 3: Analysis

We could not get the brute-force method to run beyond 10-puck games due to memory constraints. Given the calculation rates, it would have taken ~1.5 years to calculate the probability table for *G*=11 and 15 billion millennia to calculate the probability table for *G*=27 (our longest recorded game).

Brute force gave us a good stepping stone for future work. There are clear patterns that develop over time which we can use to create approximation methods. But for now, let’s take a look at what we can learn from this data.

We’ll start with two “water is wet” statements to confirm the validity of our data. First, if a team’s +/- equals *G*, then they will win 100% of the time. Put simply, if one team scores all of the pucks, it’s impossible for them to lose. This is reflected in our probability tables.

Second, for any given *G*, the probability to tie is highest when the team’s +/- is at or nearest to zero. Additionally, the chance to tie for a given +/- *p* is always equal to the chance to tie for *-p*.

Now for the more interesting patterns. The most unfair game possible in our calculations would be one of two options (depending on how you define the word ‘unfair’):

- In a game where 10 pucks are scored, **Team A could outscore Team B by +6 pucks** *and still walk away with a tie*. The chance is minuscule (0.00098%) but still possible. This would involve Team A consistently scoring +1 pucks while Team B exclusively scores -3 pucks, erasing Team A’s lead again and again. Simply bad luck or a choke job on par with the 2018/19 Tampa Bay Lightning?
- In a game where 9 pucks are scored, **Team A could outscore Team B by +5 pucks** *and lose*. This runs at a probability of 0.00107%. For reference, this would be a score of 7-2 in real hockey. And the team that scored 2 wins.

Some other miscellaneous patterns:

- We also notice that **ties trend towards a 20% probability** as *G* grows (for even-numbered values of *G*). When the +/- of a team is zero (only possible when *G* is even), the probability to win is always equal to the probability to lose. The probability to tie starts at ~40% when *G*=2 and settles near 20% by *G*=8 and *G*=10.
- Once 6 pucks have been scored, **it is impossible to lose or tie if a team’s +/- is at least *G*-2** (that is, if the opponent scores at most one puck), regardless of the pucks’ values.
- Finally, we can see that the game of THREES is still fair *as a whole*. No matter how many pucks are scored, **the team which scored more pucks will probably win**. While this is certainly not always the case (just ask any THREES player), the game is still set up to favor the more skilled team. Still, there is always going to be some level of frustration. Especially if you outscore your opponent 7-2 and walk away with the L. Data science probably isn’t much consolation in that case…

The next article will discuss our methodology for approximating win probabilities and game states. The brute-force method provides exact data, but we don’t exactly have 15 billion millennia of free time to wait for a 27-length probability table. Since our average game is 14 pucks long, being capped at 10 isn’t ideal. This also leads us down the path to answer the age-old THREES question: *how many times have we been robbed of a victory by Money Pucks?*