Part 1
NHL 19 THREES Data Deep Dive
Analyzing game development using data science
By Brian Maresso & Sean Finlon
Introduction
Montreal Canadiens defenseman Andrei Markov and mathematician Andrey Markov have a few things in common. Both were born in Russia, both are well known in their respective fields, and their full names are one letter apart. That’s about it. Trust me, I’m going somewhere with this.
Take a step back and consider the following. Data science is revolutionizing the professional sports world. Every league is beginning to embrace the changes brought about by analytics, and the National Hockey League (NHL) is no exception. Coaches, players, and general managers are the beneficiaries of big data, which can provide insights into roster moves, trades, and on-ice strategies. The core game of hockey has changed very little in the Stanley Cup era in terms of rules and regulations. Sure, the game has gotten faster, and player safety has become more of a priority, leading to rule changes regarding hits and fighting, but the core aspects are the same. Put the puck in the other team’s net and you get a goal. The team with the most goals at the end of the game wins. The rules are well-documented and well-known. Because we know the ins and outs of the sport, we can use our available tools (like computers) to maximize our chance to win.
But what if this were not the case? What if the rules of the game were decided behind closed doors? League officials would decide the rules and referees would enforce them. Of course, the game could still be played in this case. Over time, players and coaches might get a good idea of the game they’re playing, but how can one work out the unknown rules of a game while participating in it? One game’s worth of observations would not be enough. Think of it this way: there are dozens of penalties in the NHL (seriously, there are some obscure ones), but a referee wouldn’t call all of them in a single game. To assemble a comprehensive set of the rules, one would need to observe many games and use analytics to pinpoint every rule and its definition…
Section 1: THREES
Sean and I play a hockey video game called EA NHL 19. Specifically, we play a minigame called THREES. We’re both ranked in the top 100 players worldwide on Xbox One, so we like to think we’re pretty good at it. If you’re not familiar with it, here is the description from the publisher Electronic Arts (source):
Inspired by fun, pick-up-and-play arcade sports games, NHL THREES brings fans a three-on-three hockey experience with big hits, fast-paced action, and intense back-and-forth competition. The THREES announcer and colorful arenas with unique on-ice designs enhance the excitement, while no offsides and no icing means the fun never stops. Mechanics like RPM Skating and Collision Physics give you all-new ways to perform bigger hits, as you go up-and-down the ice with authentic responsiveness. The smaller-sized THREES rink leads to non-stop action, as less space heightens the thrill of the game.
THREES has one important distinction from the regular game of hockey: the MoneyPuck™ (which we will refer to simply as the Money Puck). Whereas one puck is worth one goal in the NHL, pucks in THREES can be worth bonus goals for the scorer (+2 or +3 instead of +1) or take goals away from the other team (-1, -2, or -3). Clearly, scoring Money Pucks is incredibly important to winning in THREES. However, there is something frustrating about being a THREES player: you don’t understand how the game distributes Money Pucks. Sometimes it seems like they appear at the worst possible time to let the other team take the lead. Other times you’re trying to complete a comeback but Money Pucks are nowhere to be found. A player can’t help but wonder: is the game rigged? There is no documentation regarding Money Puck distribution. The rules are decided on behind closed doors by the developers at EA. But just because this information isn’t readily available, that doesn’t mean we can’t calculate it ourselves.
The Big Question:
How are Money Pucks in THREES distributed?
Section 2: Data Collection
The first step to answering this question is to collect data. We recorded the value of every single puck (between -3 and +3) for 100 games. Each row in the .csv file represented one game, and the nth column represented the nth puck value for that game. This process took over a week of recording, and the data file needed to be cleaned before we could process it. Once the file was ready, we could begin analyzing our data. (If you’re interested, the .csv file can be downloaded here)
Note: For this project, I used C# as my language of choice. Sean used R as his.
First, the program environment needs to be set up. I decided to express Money Pucks as an enum.
The ‘None’ Money Puck is not obtainable in-game but is used to represent the puck that comes before the first one. This becomes relevant later. The numerical value of each enum member represents its index in an array (rather than using a `Dictionary`).
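The original C# enum is not reproduced here. As a rough illustration of the same idea, here is a minimal Python sketch. The member names, their ordering, and the `GOAL_VALUE` lookup list are assumptions made for this example, not the author's actual definitions.

```python
from enum import IntEnum


class MoneyPuck(IntEnum):
    """Puck types; each member's integer is an array index, not a goal value."""
    NONE = 0         # placeholder for "no previous puck" (never appears in-game)
    PLUS_ONE = 1
    PLUS_TWO = 2
    PLUS_THREE = 3
    MINUS_ONE = 4
    MINUS_TWO = 5
    MINUS_THREE = 6


# Hypothetical lookup: goal value of each puck, indexed by the enum member
# (a plain list instead of a dictionary).
GOAL_VALUE = [0, 1, 2, 3, -1, -2, -3]

counts = [0] * len(MoneyPuck)        # one slot per puck type
counts[MoneyPuck.MINUS_TWO] += 1     # an enum member works directly as an index
```

Using the enum member as an index keeps counting code free of lookups: any per-puck statistic can live in a list of length 7.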
Using a ‘Game’ class (a simple wrapper class for a `List<MoneyPuck>`) we load every MoneyPuck into a usable data structure.
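The loading step might look roughly like the Python sketch below. It assumes each CSV cell holds a signed integer such as `2` or `-1` and that shorter games are padded with blank cells; the sample rows are made up, and `load_games` is a hypothetical helper rather than the original C# code.

```python
import csv
import io


class Game:
    """Thin wrapper around the ordered list of puck values seen in one game."""

    def __init__(self, pucks):
        self.pucks = list(pucks)


def load_games(lines):
    """Parse CSV rows: one game per row, nth column = nth puck value."""
    games = []
    for row in csv.reader(lines):
        # Games differ in length, so skip blank padding cells (an assumption).
        values = [int(cell) for cell in row if cell.strip()]
        if values:
            games.append(Game(values))
    return games


# Made-up sample rows, not the real dataset.
sample = io.StringIO("1,2,-1,1\n3,1,1,,\n")
games = load_games(sample)
print(len(games), games[0].pucks)
```

With a real file, the `io.StringIO` object would be replaced by an open file handle.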
We also define several extension methods for the MoneyPuck enum. There are more efficient ways to go about this, but this is suitable for our simple application.
After reading in all of the Money Pucks from every game in our database, we retrieved the following table:
Puck Value | Instances |
---|---|
+1 | 886 |
+2 | 480 |
+3 | 141 |
-1 | 148 |
-2 | 65 |
-3 | 59 |
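A tally like the one above can be produced with a single pass over every game. The following Python sketch uses three made-up games rather than the real dataset, so its counts will not match the table.

```python
from collections import Counter

# Made-up games (each a list of puck values in order), not the real dataset.
games = [[1, 2, -1, 1], [3, 1, 1], [2, -3, 1]]

# Flatten all games into one stream of values and count each value.
tally = Counter(value for game in games for value in game)

for value in (1, 2, 3, -1, -2, -3):
    # Counter returns 0 for values that never occurred.
    print(f"{value:+d} | {tally[value]}")
```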
We can see from this table that different puck values occur with very different frequencies. This already tells us the following:
Money Pucks are NOT chosen uniformly at random
If every puck value were equally likely, we would expect each Money Puck to occur with near-identical frequency (roughly one sixth of the time). Instead, we see that +1 is the most common puck while -3 is the least common.
If we take the instances of each Money Puck and divide by the total number of observed pucks, we can get the distribution of Money Pucks by their value:
Puck Value | Frequency |
---|---|
+1 | 49.80% |
+2 | 26.98% |
+3 | 7.93% |
-1 | 8.32% |
-2 | 3.65% |
-3 | 3.32% |
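The percentages above follow directly from the counts in the first table. Here is a short Python check of that arithmetic, with the share each value would have under a uniform distribution included for comparison.

```python
# Observed counts, copied from the "Instances" table.
counts = {+1: 886, +2: 480, +3: 141, -1: 148, -2: 65, -3: 59}

total = sum(counts.values())            # 1779 pucks observed in total
frequency = {value: 100 * n / total for value, n in counts.items()}
uniform_share = 100 / len(counts)       # about 16.67% if all values were equally likely

for value, pct in frequency.items():
    print(f"{value:+d} | {pct:.2f}% (uniform would be {uniform_share:.2f}%)")
```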
Section 3: Markov Pucks
When we began gathering the data for this project, we became interested in the order in which Money Pucks are distributed. For example, games seemed to start with a +2 or +3 value fairly often. At the same time, they never seemed to open with a negative Money Puck (which makes sense, since a team cannot have a score below zero in THREES). We wondered if this observation would be backed up by the data. In particular, we wanted to know if the process by which Money Pucks were chosen was a Markov Chain. A Markov Chain is a sequence of events in which the probability of each state depends solely on the previous state. This fit our observations, but we needed to test it against the data. If Money Pucks form a Markov Chain, then the probability of a given Money Puck occurring must be a function of the previous Money Puck. If the distribution of Money Pucks following a given puck differs significantly from their overall distribution (see the previous table), then we can postulate that the pucks are indeed following a Markov Chain.
First, we must go through the dataset and count the number of times each state (Money Puck) follows every other state. The end result will be a 2D array where each row represents the current state and each column holds the probability that the current state transitions to that column’s state. This was accomplished in two steps:
The first step is to simply get the number of times each state follows every other state in every game in the dataset. This is accomplished by the following method:
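The original C# method is not shown here. A minimal Python sketch of the counting step follows, using made-up games and `None` to stand in for the 'no previous puck' state. The names `STATES`, `INDEX`, and `count_transitions` are hypothetical.

```python
# Row/column order of the matrix; None is the state before the first puck.
STATES = [None, 1, 2, 3, -1, -2, -3]
INDEX = {state: i for i, state in enumerate(STATES)}


def count_transitions(games):
    """Return counts where counts[i][j] = times STATES[j] directly followed STATES[i]."""
    counts = [[0] * len(STATES) for _ in STATES]
    for game in games:
        previous = None                  # every game starts from the None state
        for puck in game:
            counts[INDEX[previous]][INDEX[puck]] += 1
            previous = puck
    return counts


# Made-up games, not the real dataset.
games = [[1, 2, -1, 1], [3, 1, 1]]
counts = count_transitions(games)
print(counts[INDEX[None]])               # how often each value opened a game
```

Resetting `previous` at the start of each game matters: without it, the last puck of one game would be counted as preceding the first puck of the next.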
Next, we need to calculate the probability that each state follows another by dividing each count by the total of its row:
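In Python, that normalization could be sketched as below. The example row uses the opening-puck counts implied by 100 games (56, 36, and 8 games opening with +1, +2, and +3); the function name is an assumption, and rows with no observations are left as zeros to avoid dividing by zero.

```python
def to_probabilities(counts):
    """Divide every count by its row total so each non-empty row sums to 1."""
    probabilities = []
    for row in counts:
        row_total = sum(row)
        # An all-zero row (a state never observed) stays all zeros.
        probabilities.append([n / row_total if row_total else 0.0 for n in row])
    return probabilities


# Columns: +1, +2, +3, -1, -2, -3. First row: opening pucks over 100 games.
example_counts = [
    [56, 36, 8, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
probabilities = to_probabilities(example_counts)
print([f"{p:.2%}" for p in probabilities[0]])
```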
The result is the following table:
Current puck | +1 | +2 | +3 | -1 | -2 | -3 |
---|---|---|---|---|---|---|
N/A | 56.00% | 36.00% | 8.00% | 0.00% | 0.00% | 0.00% |
+1 | 46.39% | 24.76% | 9.38% | 10.34% | 4.57% | 4.57% |
+2 | 49.67% | 26.80% | 7.41% | 9.15% | 3.92% | 3.05% |
+3 | 45.86% | 25.56% | 1.50% | 15.04% | 6.77% | 5.26% |
-1 | 54.93% | 38.03% | 7.04% | 0.00% | 0.00% | 0.00% |
-2 | 67.74% | 20.97% | 11.29% | 0.00% | 0.00% | 0.00% |
-3 | 68.63% | 27.45% | 3.92% | 0.00% | 0.00% | 0.00% |
The N/A state represents the lack of a current state. This is used to denote the puck ‘before’ the first puck in the game. Therefore, the probability that N/A transitions into another state s is simply the probability that a game begins with s.
We can see that these numbers are not identical to the total distribution in the earlier table. This tells us that the current state does indeed have an impact on the next state. Given information on the current state, we can more accurately predict the following puck as compared to predicting based solely on the total probability of each puck. Knowing this, we can answer our original question:
Money Pucks are chosen by a probability based solely on the value of the current puck
Some interesting observations from this relationship:
- The game cannot open with a negative Money Puck. It makes sense that the developers included this rule, since negative Money Pucks are only worth 1 goal to the scoring team. Since games begin 0-0 and scores cannot be reduced to negative numbers, opening with a negative Money Puck would be functionally identical to opening with a +1 puck.
- A negative Money Puck cannot be followed by another negative Money Puck. This is more of an interesting game design choice than a logical one. If we consider Money Pucks to be valued by their ‘goal swing’ rather than their face value (something we explore in a future article), then a -1 puck represents a 2-goal swing (since the scoring team gains 1 goal and the other team loses 1 goal). This means that a -1 is (in the best case) worth 2 goals. This is comparable to a +2 Money Puck, which is worth +2 in all cases. The most valuable puck in terms of best-case potential swing is a -3, since it represents a best-case 4-goal swing. It would make sense that the developers did not want two -3 pucks to follow each other (a maximum 8-goal swing from 2 pucks is infuriating for the other team), but why a -1 cannot follow a -1 is something we cannot answer.
- +1 pucks become less likely to follow one another. The overall probability that a random puck is a +1 is 49.80%. The probability that a +1 follows a +1 is 46.39%. That means that on average, +1 pucks are less likely to transition into themselves, albeit by a small amount.
- The least likely transition is a +3 to a +3. For a long time while collecting the data, we assumed that a +3 could not follow a +3, just as a negative puck cannot follow a negative puck. Only twice in 100 games did a +3 follow a +3. A +3 also followed a -3 exactly twice, but because a -3 occurs far less often and never transitions to another negative puck, those 2 instances carry a higher probability (3.92%) than the 2 instances of +3 to +3 (1.50%).
- The most likely transition is a -3 to a +1. At 68.63%, there is more than a 2/3 chance that a +1 will follow a -3. It figures that the rarest Money Puck should want to be followed by the most common puck.
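The ‘goal swing’ arithmetic from the second observation can be written as a tiny Python function. This sketch assumes, as described above, that a negative puck gives the scorer 1 goal and removes goals from the opponent, and that the opponent has enough goals to lose (the best case). The function name is hypothetical.

```python
def best_case_swing(puck_value):
    """Largest change in goal differential a single puck can cause.

    Positive pucks add their face value to the scorer. Negative pucks give
    the scorer 1 goal and, in the best case, remove abs(value) goals from
    the opponent.
    """
    if puck_value > 0:
        return puck_value
    return 1 + abs(puck_value)


for value in (1, 2, 3, -1, -2, -3):
    print(f"{value:+d} puck: best-case swing of {best_case_swing(value)} goals")
```

Two consecutive -3 pucks would therefore allow a swing of up to 8 goals, which matches the figure quoted in the list.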
The next article will dive into win probabilities and game outcomes. The way that THREES is structured, the better team doesn’t always win, and we’ll determine just how often that happens…