NBA Fantasy League Prediction Based on Historical Data

Machine Learning Project

Jeevan Parmar


Introduction

For years I have loved both watching and playing the game of basketball. As I got older, my friends and I started to play in a fantasy league together. Each year before our draft, we would spend hours sifting through websites of predictions and rankings for the upcoming season. One day while eating breakfast, I was reading an ESPN article about the upcoming season and a thought dawned on me: why couldn't I create my own predictions? Using three machine learning models (Linear Regression, a K-Nearest Neighbors Regressor and a Decision Tree Regressor), I predicted the upcoming NBA season. I created this website to show my findings and give my opinion on the 2022-23 season predicted by these models.


Approach

My approach consisted of the following steps: collecting and storing the data, cleaning and manipulating the data, and finally training and testing the models to predict the 2022-23 season.

Collecting and Storage of the Data

Before I could do anything, I had to actually get the data. I used the official API for raw NBA statistics and also designed a web scraper, using requests, to pull data from the NBA's advanced stats website. After pulling the data, I stored it in a MySQL database.
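The collect-and-store step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code used in the project: the payload shape (a "headers" list plus a "rows" list), the table name player_seasons and the helper names are all hypothetical, and sqlite3 from the standard library stands in for MySQL so the example runs without a database server.

```python
import sqlite3


def fetch_json(url, params=None):
    """Fetch one JSON payload; the URL and parameters are supplied by the caller."""
    import requests  # third-party; imported here so the rest runs without it

    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


def payload_to_rows(payload):
    """Turn an assumed {"headers": [...], "rows": [[...], ...]} payload into dicts."""
    headers = payload["headers"]
    return [dict(zip(headers, row)) for row in payload["rows"]]


def store_rows(conn, table, rows):
    """Create the table if needed and insert every row with a parameterized query."""
    columns = list(rows[0].keys())
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({column_list})")
    conn.executemany(
        f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})",
        [tuple(row[c] for c in columns) for row in rows],
    )
    conn.commit()


# Example with a hand-written payload instead of a live request.
sample = {
    "headers": ["PLAYER_NAME", "SEASON_ID", "AGE"],
    "rows": [["Player A", "2020-21", 25], ["Player B", "2020-21", 31]],
}
conn = sqlite3.connect(":memory:")  # stand-in for the MySQL connection
store_rows(conn, "player_seasons", payload_to_rows(sample))
stored = conn.execute(
    "SELECT PLAYER_NAME, AGE FROM player_seasons ORDER BY AGE"
).fetchall()
print(stored)
```

With a MySQL driver the same pattern should apply, although the placeholder style and connection setup depend on the driver chosen.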

Cleaning and Manipulation

I spent a considerable amount of time cleaning and manipulating the data before I could do anything meaningful with it. After the data was collected, I had to decide whether each column was relevant or irrelevant. I then added further columns to the data frame, based on my own knowledge of fantasy leagues and on online forums. My final data frame consisted of the most important player-centric metrics from 2010 to 2022. A lot of sanity checking went into cleaning the data, so much of what I kept and dropped is a subjective choice.
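To make the cleaning step concrete, here is a small sketch of the three operations described above: keeping relevant columns, sanity-checking rows, and adding a derived column. The column names, the sample rows and the derived PTS_PER_GAME column are hypothetical, since the actual columns of the data frame are not listed here.

```python
# Hypothetical column names; the real data frame's columns are not listed in the write-up.
KEEP = ["PLAYER_NAME", "SEASON_ID", "AGE", "GP", "PTS", "REB", "AST"]


def clean(rows):
    """Keep relevant columns, drop incomplete rows, and add a per-game column."""
    cleaned = []
    for row in rows:
        kept = {col: row.get(col) for col in KEEP}
        # Sanity check: skip rows with missing values or no games played.
        if any(value is None for value in kept.values()) or kept["GP"] <= 0:
            continue
        kept["PTS_PER_GAME"] = round(kept["PTS"] / kept["GP"], 2)  # derived column
        cleaned.append(kept)
    return cleaned


raw = [  # made-up rows for illustration
    {"PLAYER_NAME": "Player A", "SEASON_ID": "2020-21", "AGE": 25, "GP": 70,
     "PTS": 1400, "REB": 350, "AST": 420, "NICKNAME": "A"},
    {"PLAYER_NAME": "Player B", "SEASON_ID": "2020-21", "AGE": 31, "GP": 0,
     "PTS": 0, "REB": 0, "AST": 0, "NICKNAME": "B"},
    {"PLAYER_NAME": "Player C", "SEASON_ID": "2020-21", "AGE": 22, "GP": 60,
     "PTS": None, "REB": 200, "AST": 100, "NICKNAME": "C"},
]
cleaned = clean(raw)
print(cleaned)
```

Only Player A survives: Player B has no games played and Player C has a missing value, and the irrelevant NICKNAME column is dropped.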

Training and Testing

After the data was cleaned, it was finally time to start predicting the 2022-23 season. I did this with three different machine learning algorithms: Linear Regression, a K-Nearest Neighbors Regressor and a Decision Tree Regressor.

My dataset was split 90% for training and the remaining 10% for testing, so that each algorithm would output one season. To properly train and test each machine learning model, I created an algorithm that looped through the data frame, taking a subset of the data frame in each loop where age stayed constant. In other words, in each iteration the independent variable, X, was the age of the player and the dependent variable, Y, was the column selected for that iteration.
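The loop described above can be sketched as follows. This is a dependency-free illustration under assumptions, not the project's code: the split holds out one season by name, the stat columns and the numbers are made up, and the least-squares fit and nearest-neighbor average are written by hand (scikit-learn's LinearRegression and KNeighborsRegressor implement the same two ideas). The decision tree is left out to keep the example short.

```python
def split_by_season(rows, held_out_season):
    """Chronological split: the held-out season is the test set, the rest trains."""
    train = [r for r in rows if r["season"] != held_out_season]
    test = [r for r in rows if r["season"] == held_out_season]
    return train, test


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def knn_predict(xs, ys, x_new, k=2):
    """K-nearest-neighbors regression: average y over the k closest x values."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / len(nearest)


TARGETS = ["pts", "reb"]  # hypothetical stat columns

history = [  # made-up numbers for a single player
    {"season": "2018-19", "age": 24, "pts": 20.0, "reb": 5.0},
    {"season": "2019-20", "age": 25, "pts": 22.0, "reb": 5.5},
    {"season": "2020-21", "age": 26, "pts": 24.0, "reb": 6.0},
    {"season": "2021-22", "age": 27, "pts": 26.0, "reb": 6.5},
]

train, test = split_by_season(history, "2021-22")
ages = [r["age"] for r in train]
predictions = {}
for column in TARGETS:  # one model per stat column, with age as the only feature
    values = [r[column] for r in train]
    slope, intercept = fit_line(ages, values)
    predictions[column] = {
        "linear": slope * test[0]["age"] + intercept,
        "knn": knn_predict(ages, values, test[0]["age"]),
    }
print(predictions)
```

On this toy data the linear fit extrapolates the upward trend, while the nearest-neighbor average can only return values it has already seen, which is one reason the models can rank players differently.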


Results

First, I present the top 25 players predicted using Simple Linear Regression:

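| SEASON_ID | PLAYER_NAME | TEAM_ABBREVIATION | AGE | NBA_FANTASY_PTS_RANK_ESPN | NET_ESPN | AVG_NET_ESPN |
|---|---|---|---|---|---|---|
| 2022-23 | Nikola Jokic | DEN | 28 | 1 | 3929 | 47.91 |
| 2022-23 | LeBron James | LAL | 38 | 2 | 3685 | 44.94 |
| 2022-23 | Kevin Durant | BKN | 34 | 3 | 3433 | 41.87 |
| 2022-23 | Giannis Antetokounmpo | MIL | 28 | 4 | 3365 | 41.04 |
| 2022-23 | Karl-Anthony Towns | MIN | 27 | 5 | 3253 | 39.67 |
| 2022-23 | Joel Embiid | PHI | 29 | 6 | 3242 | 39.54 |
| 2022-23 | Russell Westbrook | LAL | 34 | 7 | 3132 | 38.2 |
| 2022-23 | Ben Simmons | PHI | 25 | 8 | 3014 | 36.76 |
| 2022-23 | Luka Doncic | DAL | 24 | 9 | 2991 | 36.48 |
| 2022-23 | Chris Paul | PHX | 38 | 10 | 2916 | 35.56 |
| 2022-23 | James Harden | PHI | 33 | 11 | 2899 | 35.35 |
| 2022-23 | Trae Young | ATL | 24 | 12 | 2877 | 35.09 |
| 2022-23 | Pascal Siakam | TOR | 29 | 13 | 2811 | 34.28 |
| 2022-23 | Stephen Curry | GSW | 35 | 14 | 2745 | 33.48 |
| 2022-23 | Damian Lillard | POR | 32 | 15 | 2718 | 33.15 |
| 2022-23 | Anthony Davis | LAL | 30 | 16 | 2683 | 32.72 |
| 2022-23 | Julius Randle | NYK | 28 | 17 | 2612 | 31.85 |
| 2022-23 | Bradley Beal | WAS | 30 | 18 | 2563 | 31.26 |
| 2022-23 | Domantas Sabonis | SAC | 27 | 19 | 2557 | 31.18 |
| 2022-23 | Donovan Mitchell | UTA | 26 | 20 | 2510 | 30.61 |
| 2022-23 | Devin Booker | PHX | 26 | 21 | 2498 | 30.46 |
| 2022-23 | Rudy Gobert | UTA | 31 | 22 | 2484 | 30.29 |
| 2022-23 | Khris Middleton | MIL | 31 | 22 | 2484 | 30.29 |
| 2022-23 | Andre Drummond | BKN | 29 | 24 | 2475 | 30.18 |
| 2022-23 | Scottie Barnes | TOR | 21 | 25 | 2458 | 29.98 |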
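The two numeric columns appear to be related: in every row of the Linear Regression table, AVG_NET_ESPN equals NET_ESPN divided by 82, the number of games in an NBA regular season, rounded to two decimals. Reading AVG_NET_ESPN as a per-game average over a full season is an inference from the numbers rather than a documented definition. The check below uses three rows copied from that table.

```python
GAMES_PER_SEASON = 82  # NBA regular-season length

rows = [  # (player, NET_ESPN, AVG_NET_ESPN) copied from the table above
    ("Nikola Jokic", 3929, 47.91),
    ("LeBron James", 3685, 44.94),
    ("Russell Westbrook", 3132, 38.2),
]
per_game = {name: round(net / GAMES_PER_SEASON, 2) for name, net, _ in rows}
for name, net, avg in rows:
    # Print the recomputed average and whether it matches the table value.
    print(name, per_game[name], per_game[name] == avg)
```

One consequence is that the predictions implicitly assume every player appears in all 82 games.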

Second, I present the top 25 players predicted by the K-Nearest Neighbors Regressor model:

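| SEASON_ID | PLAYER_NAME | TEAM_ABBREVIATION | AGE | NBA_FANTASY_PTS_RANK_ESPN | NET_ESPN | AVG_NET_ESPN |
|---|---|---|---|---|---|---|
| 2022-23 | Russell Westbrook | LAL | 34 | 1 | 4655 | 57 |
| 2022-23 | James Harden | PHI | 33 | 2 | 4479 | 55 |
| 2022-23 | Nikola Jokic | DEN | 28 | 3 | 4215 | 51 |
| 2022-23 | LeBron James | LAL | 38 | 4 | 3902 | 48 |
| 2022-23 | Stephen Curry | GSW | 35 | 6 | 3752 | 46 |
| 2022-23 | Giannis Antetokounmpo | MIL | 28 | 6 | 3761 | 46 |
| 2022-23 | Trae Young | ATL | 24 | 7 | 3681 | 45 |
| 2022-23 | Luka Doncic | DAL | 24 | 8 | 3391 | 41 |
| 2022-23 | Kevin Durant | BKN | 34 | 9 | 3197 | 39 |
| 2022-23 | Miles Bridges | CHA | 25 | 10 | 3102 | 38 |
| 2022-23 | Anthony Davis | LAL | 30 | 10 | 3120 | 38 |
| 2022-23 | De'Aaron Fox | SAC | 25 | 12 | 3021 | 37 |
| 2022-23 | Paul George | LAC | 33 | 12 | 3066 | 37 |
| 2022-23 | Hassan Whiteside | UTA | 34 | 14 | 2970 | 36 |
| 2022-23 | Andre Drummond | BKN | 29 | 14 | 2942 | 36 |
| 2022-23 | Kyrie Irving | BKN | 31 | 17 | 2835 | 35 |
| 2022-23 | DeAndre Jordan | PHI | 34 | 17 | 2872 | 35 |
| 2022-23 | Domantas Sabonis | SAC | 27 | 17 | 2870 | 35 |
| 2022-23 | Damian Lillard | POR | 32 | 20 | 2820 | 34 |
| 2022-23 | Chris Paul | PHX | 38 | 20 | 2807 | 34 |
| 2022-23 | Donovan Mitchell | UTA | 26 | 20 | 2804 | 34 |
| 2022-23 | DeMar DeRozan | CHI | 33 | 20 | 2818 | 34 |
| 2022-23 | Eric Bledsoe | POR | 33 | 24 | 2676 | 33 |
| 2022-23 | Nikola Vucevic | CHI | 32 | 24 | 2672 | 33 |
| 2022-23 | Rudy Gobert | UTA | 31 | 24 | 2709 | 33 |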
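Several players in the K-Nearest Neighbors table share a rank (rank 6 appears twice and rank 5 not at all). One convention that reproduces this pattern is to give tied AVG_NET_ESPN values the average of the positions they occupy and then round that average. The sketch below is an assumption about how such ranks could arise, not a description of how the rank column was actually computed; the function name is hypothetical.

```python
def average_ranks(scores):
    """Rank scores in descending order; ties share the rounded average position."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        # Positions are 1-based; Python's round() sends halves to the even integer.
        shared = round((start + 1 + end + 1) / 2)
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


# First 11 AVG_NET_ESPN values from the K-Nearest Neighbors table.
avg_net = [57, 55, 51, 48, 46, 46, 45, 41, 39, 38, 38]
ranks = average_ranks(avg_net)
print(ranks)
```

Under this convention the two players at 46 occupy positions 5 and 6 and both receive 6, while the two at 38 occupy positions 10 and 11 and both receive 10, matching the first eleven ranks in the table.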

Finally, I present the top 25 players predicted by the Decision Tree Regressor model:

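| SEASON_ID | PLAYER_NAME | TEAM_ABBREVIATION | AGE | NBA_FANTASY_PTS_RANK_ESPN | NET_ESPN | AVG_NET_ESPN |
|---|---|---|---|---|---|---|
| 2022-23 | Nikola Jokic | DEN | 28 | 1 | 4215 | 51 |
| 2022-23 | Chris Paul | PHX | 38 | 2 | 3993 | 49 |
| 2022-23 | James Harden | PHI | 33 | 4 | 3943 | 48 |
| 2022-23 | Stephen Curry | GSW | 35 | 4 | 3919 | 48 |
| 2022-23 | Giannis Antetokounmpo | MIL | 28 | 5 | 3459 | 42 |
| 2022-23 | Ben Simmons | PHI | 25 | 6 | 3347 | 41 |
| 2022-23 | Russell Westbrook | LAL | 34 | 7 | 3313 | 40 |
| 2022-23 | LeBron James | LAL | 38 | 8 | 3205 | 39 |
| 2022-23 | Paul George | LAC | 33 | 8 | 3186 | 39 |
| 2022-23 | Luka Doncic | DAL | 24 | 10 | 3134 | 38 |
| 2022-23 | Anthony Davis | LAL | 30 | 10 | 3120 | 38 |
| 2022-23 | De'Aaron Fox | SAC | 25 | 12 | 3021 | 37 |
| 2022-23 | DeAndre Jordan | PHI | 34 | 12 | 3073 | 37 |
| 2022-23 | Hassan Whiteside | UTA | 34 | 14 | 2970 | 36 |
| 2022-23 | Andre Drummond | BKN | 29 | 14 | 2942 | 36 |
| 2022-23 | Trae Young | ATL | 24 | 17 | 2837 | 35 |
| 2022-23 | Kyrie Irving | BKN | 31 | 17 | 2835 | 35 |
| 2022-23 | Domantas Sabonis | SAC | 27 | 17 | 2870 | 35 |
| 2022-23 | Damian Lillard | POR | 32 | 20 | 2820 | 34 |
| 2022-23 | Donovan Mitchell | UTA | 26 | 20 | 2804 | 34 |
| 2022-23 | LaMarcus Aldridge | BKN | 37 | 22 | 2739 | 33 |
| 2022-23 | Nikola Vucevic | CHI | 32 | 22 | 2672 | 33 |
| 2022-23 | Al Horford | BOS | 37 | 22 | 2702 | 33 |
| 2022-23 | Rudy Gobert | UTA | 31 | 22 | 2694 | 33 |
| 2022-23 | Paul Millsap | PHI | 38 | 26 | 2649 | 32 |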

Conclusion

In my opinion, the Simple Linear Regression model performed the best. Even so, I wouldn't build my fantasy team on its predictions alone, or I would at least take them with a grain of salt. There are some intangibles and factors that I didn't take into account (the team the player plays for, the player's role, the player's injury history and its severity, player usage, etc.). If I were to do this project again, I would try to create an algorithm that builds a ranking system and assigns a numeric value to each of these intangibles before feeding the data into a machine learning model, so that the predicted fantasy data is more realistic. For this project, though, implementing what I just proposed was beyond my knowledge. At the end of the day, this project's goal was to showcase some of my skills in programming, data science and cognitive thinking. I believe that goal was met, and I am satisfied with the project.
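As a rough idea of what assigning numeric values to intangibles could look like, the sketch below maps a player's role to an ordinal score and expresses injury history as a share of games missed. Everything in it is hypothetical: the role categories, the scores, the games_missed field and the function name are illustrative choices, not a tested ranking system.

```python
# Hypothetical scores for a player's role; the scale is illustrative only.
ROLE_SCORE = {"bench": 0, "rotation": 1, "starter": 2, "primary option": 3}


def encode_player(player):
    """Convert categorical intangibles into numbers a regressor can consume."""
    return {
        "age": player["age"],
        "role_score": ROLE_SCORE[player["role"]],
        # Share of an 82-game season missed last year, as a crude injury signal.
        "missed_share": player["games_missed"] / 82,
    }


features = encode_player({"age": 27, "role": "starter", "games_missed": 41})
print(features)
```

Features like these could then sit alongside age as extra inputs to the same three models.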

My next step is to actually create a fantasy league team based on my findings and see how it performs.