Code Documentation
Rmarkdown and HTML Files
- Rmarkdown file: report.Rmd
- Github Repo link
- Compiled HTML link
./data
- game_stats_all.csv
- team_stats_all.csv
- post_62_games_pred.rds
- pre_20_games.rds
./nba_simulation_shiny
- ShinyApp source code
- ShinyApp link
./plots
./source
- EDA_visualize.R
- data_integration.R
- data_scraping.R
- data_wrangling.R
- feature_eng.R
- model_build.R
- simulation.R
Wrap-Up Functions
- Data Scraping
get_schedule_data(year,month): obtain the game schedules and results for a specific year and month
combine_game_data(): combine the game schedule data
get_team_stats(year): obtain the team-level data (30 teams in the league) for the first 20 games of a specific year
- Data Wrangling and Feature Engineering
get_ws_data(year): obtain the win share data for a specific year
get_team_year_feature(year): obtain the team performance feature vectors for a specific year
get_game_year_feature(year): obtain the game results in tidy format for a specific year
get_injury_data(year), combine_injury_data(): obtain and combine the injury data
- EDA
team_def_off_plot(year): show each team’s defense vs. offense in a specific year
team_rank_plot(year,Nfirst): show each team’s season rank vs. its rank over the first N games in a specific year
home_way_plot(year): show each team’s home and away records in a specific year; show Western vs. Eastern records
team_ws_plot(year): show each team’s highest win share in a specific year
player_ws_plot(): show the win shares of the top players in the league in recent years
- Model Building
get_train_test_data(year,firstN): obtain the training and test data for the first N games in a specific year
build_logit_model_year(train_year, test_year, firstN): use the first N games of train_year to build the model, and test the model on test_year; returns the predicted result (W/L) and winning probability of each game
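
The train-on-one-season, test-on-another workflow described above can be sketched roughly as follows, using base R's glm() on synthetic data. This is only an illustration under stated assumptions: the columns rating_diff and home_win and the helpers make_season() and fit_and_predict() are hypothetical and are not taken from model_build.R.

```r
# Hypothetical sketch: synthetic data stands in for the real game features.
set.seed(1)

# Simulate one "season" of games with a single made-up feature.
make_season <- function(n_games) {
  rating_diff <- rnorm(n_games)              # assumed home-minus-away strength
  p_home <- plogis(0.3 + 1.2 * rating_diff)  # true home-win probability
  data.frame(rating_diff = rating_diff,
             home_win = rbinom(n_games, 1, p_home))
}

# Fit a logistic regression on the training season and score the test season.
fit_and_predict <- function(train, test) {
  model <- glm(home_win ~ rating_diff, data = train, family = binomial())
  prob <- predict(model, newdata = test, type = "response")
  data.frame(win_prob = prob,
             pred = ifelse(prob > 0.5, "W", "L"),  # W/L from the home team's view
             stringsAsFactors = FALSE)
}

train <- make_season(300)
test <- make_season(300)
result <- fit_and_predict(train, test)
head(result)
```

The 0.5 cutoff for turning a probability into W/L is an assumption of this sketch; the repository may use a different rule.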
- Integrative Prediction
combine_season_data(): combine the team performance feature data of multiple seasons
team_pc_plot(): create the PC plot for the team performances of all 30 teams across multiple seasons, along with the distance matrix of the season data
team_cluster_plot(): create the cluster plot for each team across multiple seasons
win_prob_prediction(train_year, test_year,firstN,year_effect): predict the winning probabilities, allowing for year_effect
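
For readers unfamiliar with the PC plot, distance matrix, and cluster plot mentioned above, here is a minimal base R sketch on synthetic data. The feature names, the matrix layout, and the choice of average-linkage hierarchical clustering are all assumptions made for illustration; the source does not say which clustering method the repository uses.

```r
# Hypothetical sketch: 6 made-up team-seasons described by 4 made-up features.
set.seed(7)
features <- matrix(rnorm(24), nrow = 6,
                   dimnames = list(paste0("team", 1:6),
                                   c("off", "def", "pace", "ws")))

# Principal components on centered, scaled features; keep the first two scores.
pca <- prcomp(features, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]

# Euclidean distance matrix between team-seasons on the scaled features.
d <- dist(scale(features))

# Assumed method: average-linkage hierarchical clustering cut into 2 groups.
clusters <- cutree(hclust(d, method = "average"), k = 2)

plot(scores, col = clusters, pch = 19, main = "Team-seasons on PC1 vs PC2")
```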
- Simulation
WL_record_cal(game_schedule,WL_record): calculate the W/L records for each team given the game schedule
game_simulation(pred_result,B,year,Nfirst): given the prediction results for a specific year, repeat the game simulation B times
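
The idea of repeating a game simulation B times from predicted winning probabilities can be sketched as below. This is an assumption-laden illustration, not the code in simulation.R: the three-team schedule, the column names home, away, and home_win_prob, and the helpers simulate_once() and simulate_many() are all hypothetical.

```r
# Hypothetical sketch: a tiny schedule with assumed home-win probabilities.
set.seed(42)
pred_result <- data.frame(
  home = c("A", "B", "C", "A", "B", "C"),
  away = c("B", "C", "A", "C", "A", "B"),
  home_win_prob = c(0.6, 0.55, 0.4, 0.7, 0.5, 0.45),
  stringsAsFactors = FALSE
)
teams <- sort(unique(c(pred_result$home, pred_result$away)))

# Play every game once: draw a Bernoulli outcome per game, then count wins per team.
simulate_once <- function(pred_result, teams) {
  home_wins <- rbinom(nrow(pred_result), 1, pred_result$home_win_prob) == 1
  winners <- ifelse(home_wins, pred_result$home, pred_result$away)
  as.integer(table(factor(winners, levels = teams)))
}

# Repeat the whole schedule B times; result is a teams x B matrix of win counts.
simulate_many <- function(pred_result, teams, B) {
  wins <- replicate(B, simulate_once(pred_result, teams))
  rownames(wins) <- teams
  wins
}

wins <- simulate_many(pred_result, teams, B = 1000)
rowMeans(wins)  # average simulated wins per team
```

Each column of the resulting matrix is one simulated set of results, so quantities such as the spread of a team's win total can be read directly from the rows.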