Previous: README
Next: 2. Process data

1. Scrape raw data

nhl.com provides play-by-play data from the 2002–2003 season onward, for both regular-season and playoff games. To scrape it, I use the nhlscrapr R package. Its compile.all.games() command downloads every game and compiles the results in a single step.

However, it waits 20 seconds between games, so processing all 12 available seasons takes more than 3.5 days. Instead, one might want to use something like the two options shown later in this section, which download games one by one or season by season with a shorter wait interval.
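As a rough check on that figure, the time spent pausing can be estimated from the number of games. The sketch below assumes about 1,230 regular-season games per season (30 teams playing 82 games each) and ignores playoffs, the shortened 2012–2013 season, and the download time itself, so it is only a ballpark. The function name `pause_days` is made up for this illustration.

```r
# Ballpark estimate of the time spent in pauses alone.
# Assumed inputs (not measured): 12 seasons, 1,230 games per season.
pause_days <- function(wait_seconds, seasons = 12, games_per_season = 1230) {
  seasons * games_per_season * wait_seconds / (60 * 60 * 24)
}

pause_days(20)       # about 3.4 days with a 20-second pause
pause_days(2) * 24   # about 8.2 hours with a 2-second pause
```

Playoff games and the download time itself add to this, which is in line with the figure quoted above.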

suppressMessages({
  library(nhlscrapr)
})

compile.all.games()
## Loading game and player data.
## 20022003: no games need updating.
## 20032004: no games need updating.
## 20052006: no games need updating.
## 20062007: no games need updating.
## 20072008: no games need updating.
## 20082009: no games need updating.
## 20092010: no games need updating.
## 20102011: no games need updating.
## 20112012: no games need updating.
## 20122013: no games need updating.
## 20132014: no games need updating.
## Downloading files for game 20142015 30117
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30136
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30137
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30157
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30167
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30175
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30176
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30177
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30187
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30217
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30235
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30236
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30237
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30246
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30247
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Downloading files for game 20142015 30417
## Warning in download.single.game(season, gcode, rdata.folder, ...): Could
## not recover x-y coordinates.
## Pausing: 20
## Saving to output file
## [1] TRUE

To download games one by one or season by season instead, either of the two approaches below can be used. Both pass wait=2 to shorten the pause between downloads.

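# get the full list of games available
games <- full.game.database()

# Option 1: download game by game, with a 2-second pause instead of 20.
# apply() passes each row as a named character vector, hence game["season"].
invisible(apply(games, 1, function(game) {
  download.single.game(season=game["season"], gcode=game["gcode"], wait=2)
  gc()  # free memory between downloads
}))

# Option 2: download season by season
invisible(lapply(unique(games$season), function(season) {
  download.games(games[games$season == season, ], wait=2)
  gc()  # free memory between seasons
}))

# once everything is downloaded, compile it all together
compile.all.games()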
