Basketball has been a passion of mine for years, both as a fan and as a player. As I got older, my friends and I started playing in a fantasy league together. Each year before our draft, we would spend hours sifting through websites full of predictions and rankings for the upcoming season. One morning over breakfast, while reading an ESPN article about the upcoming season, a thought dawned on me: why couldn't I create my own predictions? Using three machine learning models (Linear Regression, a K-Nearest Neighbors Regressor and a Decision Tree Regressor), I predicted the upcoming NBA season. I created this website to share my findings and give my opinion on the 2022-23 season these models predicted.
My approach consisted of the following steps: collecting and storing the data, cleaning and manipulating it, and finally training and testing models to predict the 2022-23 season.
Before I could do anything, I had to get the data. I used the official API for raw NBA statistics and also designed a web scraper with the requests library to pull data from the NBA's advanced stats website. After pulling the data, I stored it in a MySQL database.
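The storage step can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's actual code: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for MySQL so it runs without a server, the scraping itself is omitted because it needs network access, and the table name `player_seasons`, its columns and the placeholder rows are all hypothetical.

```python
import sqlite3

# Hypothetical, made-up rows shaped like what a scraper might return:
# (season, player name, age, points per game).
ROWS = [
    ("2020-21", "Player A", 25, 20.1),
    ("2021-22", "Player A", 26, 22.4),
    ("2021-22", "Player B", 31, 15.0),
]


def store_rows(conn, rows):
    """Create the (hypothetical) table if needed and insert rows with bound parameters."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS player_seasons (
               season_id   TEXT    NOT NULL,
               player_name TEXT    NOT NULL,
               age         INTEGER NOT NULL,
               pts         REAL,
               PRIMARY KEY (season_id, player_name)  -- one row per player per season
           )"""
    )
    # executemany binds each tuple to the ? placeholders instead of formatting strings
    conn.executemany("INSERT INTO player_seasons VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def load_player(conn, name):
    """Return (season_id, age, pts) rows for one player, oldest season first."""
    cur = conn.execute(
        "SELECT season_id, age, pts FROM player_seasons "
        "WHERE player_name = ? ORDER BY season_id",
        (name,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
    store_rows(conn, ROWS)
    print(load_player(conn, "Player A"))
```

With a MySQL driver the SQL would look much the same, though most MySQL drivers use `%s` rather than `?` as the placeholder.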
I spent a considerable amount of time cleaning and manipulating the data before I could do anything meaningful with it. Once the data was collected, I had to decide whether each column was relevant or irrelevant. I then added relevant columns to the data frame based on my own knowledge of fantasy leagues and on online forums. My final data frame consisted of the most important player-centric metrics from 2010 to 2022. Cleaning the data involved a lot of sanity checking, so much of what I kept or dropped comes down to personal judgment.
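A minimal sketch of the column-selection and sanity-check step, in plain Python over rows stored as dicts. Only `SEASON_ID`, `PLAYER_NAME`, `TEAM_ABBREVIATION` and `AGE` come from the tables on this page; `PTS`, `REB`, `AST` and `PLUS_MINUS` are hypothetical stand-ins for whatever columns were actually kept or dropped, the sample rows are made up, and reading "2010 to 2022" as the 2010-11 through 2021-22 seasons is an assumption.

```python
# Columns judged relevant. SEASON_ID, PLAYER_NAME, TEAM_ABBREVIATION and AGE appear
# in the result tables; PTS, REB and AST are hypothetical examples.
KEEP = ["SEASON_ID", "PLAYER_NAME", "TEAM_ABBREVIATION", "AGE", "PTS", "REB", "AST"]

# Assumed study window: seasons starting 2010 through 2021 (2010-11 to 2021-22).
FIRST_START_YEAR, LAST_START_YEAR = 2010, 2021


def season_start_year(season_id):
    """Convert a season label such as '2015-16' to its starting year, 2015."""
    return int(season_id[:4])


def clean(rows):
    """Keep only relevant columns and drop rows outside the window or with gaps."""
    cleaned = []
    for row in rows:
        if not FIRST_START_YEAR <= season_start_year(row["SEASON_ID"]) <= LAST_START_YEAR:
            continue  # season outside the study window
        subset = {col: row.get(col) for col in KEEP}  # irrelevant columns are dropped here
        if any(value is None for value in subset.values()):
            continue  # sanity check: skip rows missing any kept metric
        cleaned.append(subset)
    return cleaned


if __name__ == "__main__":
    # Made-up placeholder rows; PLUS_MINUS plays the role of an irrelevant column.
    raw = [
        {"SEASON_ID": "2009-10", "PLAYER_NAME": "Player A", "TEAM_ABBREVIATION": "AAA",
         "AGE": 24, "PTS": 10.0, "REB": 5.0, "AST": 2.0, "PLUS_MINUS": 1.0},
        {"SEASON_ID": "2015-16", "PLAYER_NAME": "Player B", "TEAM_ABBREVIATION": "BBB",
         "AGE": 27, "PTS": 12.0, "REB": 3.0, "AST": 4.0, "PLUS_MINUS": -2.0},
    ]
    print(clean(raw))  # only Player B survives, without PLUS_MINUS
```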
With the data cleaned, it was finally time to start predicting the 2022-23 season. I made the predictions with three different machine learning algorithms: Linear Regression, a K-Nearest Neighbors Regressor and a Decision Tree Regressor.
I split my dataset into 90% for training and the remaining 10% for testing, so that each algorithm would output one season. To properly train and test each model, I wrote an algorithm that looped through the data frame, taking a subset of it on each iteration while keeping age as the constant input. In other words, on every iteration the independent variable X was the player's age and the dependent variable Y was the column being predicted in that iteration.
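One possible reading of the split and loop described above can be sketched in plain Python with Simple Linear Regression. This is a hypothetical reconstruction, not the project's code: it assumes rows are dicts keyed by column name, that the 10% test split holds out the most recent season(s) rather than random rows, and that each player's history is fitted separately; `PTS` and the helper names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # every x identical: no slope can be estimated, predict the mean
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def split_by_season(rows, test_fraction=0.1):
    """Hold out the most recent seasons (about test_fraction of them) for testing."""
    seasons = sorted({r["SEASON_ID"] for r in rows})  # '2010-11' < '2011-12' < ...
    n_test = max(1, round(len(seasons) * test_fraction))  # 12 seasons -> 1 test season
    test_seasons = set(seasons[-n_test:])
    train = [r for r in rows if r["SEASON_ID"] not in test_seasons]
    test = [r for r in rows if r["SEASON_ID"] in test_seasons]
    return train, test


def predict_next_season(history, stat_columns):
    """Loop over the stat columns: each iteration regresses one column (Y) on
    AGE (X) and evaluates the fitted line at the player's next age."""
    ages = [r["AGE"] for r in history]
    next_age = max(ages) + 1
    prediction = {"AGE": next_age}
    for col in stat_columns:
        intercept, slope = fit_line(ages, [r[col] for r in history])
        prediction[col] = intercept + slope * next_age
    return prediction
```

For example, a player who scored 20, 22 and 24 points per game at ages 24, 25 and 26 would be projected at 26 points per game at age 27.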
First, here are the top 25 players predicted by Simple Linear Regression:
SEASON_ID | PLAYER_NAME | TEAM_ABBREVIATION | AGE | NBA_FANTASY_PTS_RANK_ESPN | NET_ESPN | AVG_NET_ESPN |
---|---|---|---|---|---|---|
2022-23 | Nikola Jokic | DEN | 28 | 1 | 3929 | 47.91 |
2022-23 | LeBron James | LAL | 38 | 2 | 3685 | 44.94 |
2022-23 | Kevin Durant | BKN | 34 | 3 | 3433 | 41.87 |
2022-23 | Giannis Antetokounmpo | MIL | 28 | 4 | 3365 | 41.04 |
2022-23 | Karl-Anthony Towns | MIN | 27 | 5 | 3253 | 39.67 |
2022-23 | Joel Embiid | PHI | 29 | 6 | 3242 | 39.54 |
2022-23 | Russell Westbrook | LAL | 34 | 7 | 3132 | 38.2 |
2022-23 | Ben Simmons | PHI | 25 | 8 | 3014 | 36.76 |
2022-23 | Luka Doncic | DAL | 24 | 9 | 2991 | 36.48 |
2022-23 | Chris Paul | PHX | 38 | 10 | 2916 | 35.56 |
2022-23 | James Harden | PHI | 33 | 11 | 2899 | 35.35 |
2022-23 | Trae Young | ATL | 24 | 12 | 2877 | 35.09 |
2022-23 | Pascal Siakam | TOR | 29 | 13 | 2811 | 34.28 |
2022-23 | Stephen Curry | GSW | 35 | 14 | 2745 | 33.48 |
2022-23 | Damian Lillard | POR | 32 | 15 | 2718 | 33.15 |
2022-23 | Anthony Davis | LAL | 30 | 16 | 2683 | 32.72 |
2022-23 | Julius Randle | NYK | 28 | 17 | 2612 | 31.85 |
2022-23 | Bradley Beal | WAS | 30 | 18 | 2563 | 31.26 |
2022-23 | Domantas Sabonis | SAC | 27 | 19 | 2557 | 31.18 |
2022-23 | Donovan Mitchell | UTA | 26 | 20 | 2510 | 30.61 |
2022-23 | Devin Booker | PHX | 26 | 21 | 2498 | 30.46 |
2022-23 | Rudy Gobert | UTA | 31 | 22 | 2484 | 30.29 |
2022-23 | Khris Middleton | MIL | 31 | 22 | 2484 | 30.29 |
2022-23 | Andre Drummond | BKN | 29 | 24 | 2475 | 30.18 |
2022-23 | Scottie Barnes | TOR | 21 | 25 | 2458 | 29.98 |
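The two derived columns in this table can be reproduced with simple arithmetic. Every AVG_NET_ESPN value on this page is consistent with NET_ESPN divided by an 82-game season (for example, 3929 / 82 rounds to 47.91), and the ties in this table (two players at 22, the next at 24) follow standard competition ranking. The sketch below assumes both rules; the function names are hypothetical, and the sample rows are taken from the table above.

```python
GAMES_PER_SEASON = 82  # assumption: the per-game average spreads NET_ESPN over 82 games


def add_average(rows):
    """Add AVG_NET_ESPN = NET_ESPN / 82, rounded to two decimals."""
    for row in rows:
        row["AVG_NET_ESPN"] = round(row["NET_ESPN"] / GAMES_PER_SEASON, 2)
    return rows


def competition_rank(rows, key, rank_column="NBA_FANTASY_PTS_RANK_ESPN"):
    """Rank rows by key, highest first; tied rows share the best rank and the
    following rank is skipped (1, 2, 2, 4)."""
    ordered = sorted(rows, key=lambda r: r[key], reverse=True)
    rank, previous = 0, None
    for position, row in enumerate(ordered, start=1):
        if row[key] != previous:  # a new value starts a new rank
            rank, previous = position, row[key]
        row[rank_column] = rank
    return ordered


if __name__ == "__main__":
    rows = [
        {"PLAYER_NAME": "Rudy Gobert", "NET_ESPN": 2484},
        {"PLAYER_NAME": "Khris Middleton", "NET_ESPN": 2484},
        {"PLAYER_NAME": "Andre Drummond", "NET_ESPN": 2475},
    ]
    for row in competition_rank(add_average(rows), "NET_ESPN"):
        print(row)  # ranks 1, 1, 3 within this three-row subset
```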
Second, here are the top 25 players predicted by the K-Nearest Neighbors Regressor:
SEASON_ID | PLAYER_NAME | TEAM_ABBREVIATION | AGE | NBA_FANTASY_PTS_RANK_ESPN | NET_ESPN | AVG_NET_ESPN |
---|---|---|---|---|---|---|
2022-23 | Russell Westbrook | LAL | 34 | 1 | 4655 | 57 |
2022-23 | James Harden | PHI | 33 | 2 | 4479 | 55 |
2022-23 | Nikola Jokic | DEN | 28 | 3 | 4215 | 51 |
2022-23 | LeBron James | LAL | 38 | 4 | 3902 | 48 |
2022-23 | Stephen Curry | GSW | 35 | 6 | 3752 | 46 |
2022-23 | Giannis Antetokounmpo | MIL | 28 | 6 | 3761 | 46 |
2022-23 | Trae Young | ATL | 24 | 7 | 3681 | 45 |
2022-23 | Luka Doncic | DAL | 24 | 8 | 3391 | 41 |
2022-23 | Kevin Durant | BKN | 34 | 9 | 3197 | 39 |
2022-23 | Miles Bridges | CHA | 25 | 10 | 3102 | 38 |
2022-23 | Anthony Davis | LAL | 30 | 10 | 3120 | 38 |
2022-23 | De'Aaron Fox | SAC | 25 | 12 | 3021 | 37 |
2022-23 | Paul George | LAC | 33 | 12 | 3066 | 37 |
2022-23 | Hassan Whiteside | UTA | 34 | 14 | 2970 | 36 |
2022-23 | Andre Drummond | BKN | 29 | 14 | 2942 | 36 |
2022-23 | Kyrie Irving | BKN | 31 | 17 | 2835 | 35 |
2022-23 | DeAndre Jordan | PHI | 34 | 17 | 2872 | 35 |
2022-23 | Domantas Sabonis | SAC | 27 | 17 | 2870 | 35 |
2022-23 | Damian Lillard | POR | 32 | 20 | 2820 | 34 |
2022-23 | Chris Paul | PHX | 38 | 20 | 2807 | 34 |
2022-23 | Donovan Mitchell | UTA | 26 | 20 | 2804 | 34 |
2022-23 | DeMar DeRozan | CHI | 33 | 20 | 2818 | 34 |
2022-23 | Eric Bledsoe | POR | 33 | 24 | 2676 | 33 |
2022-23 | Nikola Vucevic | CHI | 32 | 24 | 2672 | 33 |
2022-23 | Rudy Gobert | UTA | 31 | 24 | 2709 | 33 |
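For readers curious how a K-Nearest Neighbors regressor arrives at a number, here is a minimal version in plain Python with age as the only feature, following the setup described earlier. It illustrates the technique and is not the project's code; `knn_predict` and its default `k=3` are hypothetical choices (scikit-learn's `KNeighborsRegressor` defaults to `n_neighbors=5`).

```python
def knn_predict(train_x, train_y, query_x, k=3):
    """K-nearest-neighbors regression on one feature: average the targets of
    the k training points whose x is closest to query_x."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training indices by distance to the query (ties keep their original order)
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query_x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k


if __name__ == "__main__":
    ages = [22, 23, 24, 25, 26]
    points = [10.0, 12.0, 14.0, 16.0, 18.0]  # made-up values
    print(knn_predict(ages, points, 27, k=2))  # averages the age-26 and age-25 targets
```

Because each prediction is an average of observed targets, a KNN regressor can never forecast a value above the largest (or below the smallest) target it saw during training.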
Finally, here are the top 25 players predicted by the Decision Tree Regressor:
SEASON_ID | PLAYER_NAME | TEAM_ABBREVIATION | AGE | NBA_FANTASY_PTS_RANK_ESPN | NET_ESPN | AVG_NET_ESPN |
---|---|---|---|---|---|---|
2022-23 | Nikola Jokic | DEN | 28 | 1 | 4215 | 51 |
2022-23 | Chris Paul | PHX | 38 | 2 | 3993 | 49 |
2022-23 | James Harden | PHI | 33 | 4 | 3943 | 48 |
2022-23 | Stephen Curry | GSW | 35 | 4 | 3919 | 48 |
2022-23 | Giannis Antetokounmpo | MIL | 28 | 5 | 3459 | 42 |
2022-23 | Ben Simmons | PHI | 25 | 6 | 3347 | 41 |
2022-23 | Russell Westbrook | LAL | 34 | 7 | 3313 | 40 |
2022-23 | LeBron James | LAL | 38 | 8 | 3205 | 39 |
2022-23 | Paul George | LAC | 33 | 8 | 3186 | 39 |
2022-23 | Luka Doncic | DAL | 24 | 10 | 3134 | 38 |
2022-23 | Anthony Davis | LAL | 30 | 10 | 3120 | 38 |
2022-23 | De'Aaron Fox | SAC | 25 | 12 | 3021 | 37 |
2022-23 | DeAndre Jordan | PHI | 34 | 12 | 3073 | 37 |
2022-23 | Hassan Whiteside | UTA | 34 | 14 | 2970 | 36 |
2022-23 | Andre Drummond | BKN | 29 | 14 | 2942 | 36 |
2022-23 | Trae Young | ATL | 24 | 17 | 2837 | 35 |
2022-23 | Kyrie Irving | BKN | 31 | 17 | 2835 | 35 |
2022-23 | Domantas Sabonis | SAC | 27 | 17 | 2870 | 35 |
2022-23 | Damian Lillard | POR | 32 | 20 | 2820 | 34 |
2022-23 | Donovan Mitchell | UTA | 26 | 20 | 2804 | 34 |
2022-23 | LaMarcus Aldridge | BKN | 37 | 22 | 2739 | 33 |
2022-23 | Nikola Vucevic | CHI | 32 | 22 | 2672 | 33 |
2022-23 | Al Horford | BOS | 37 | 22 | 2702 | 33 |
2022-23 | Rudy Gobert | UTA | 31 | 22 | 2694 | 33 |
2022-23 | Paul Millsap | PHI | 38 | 26 | 2649 | 32 |
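A decision tree regressor predicts by splitting the feature range into regions and returning the mean target of each region. The sketch below fits a single split (a depth-1 tree, or "stump") on age in plain Python, choosing the threshold that minimizes squared error. It is a simplified illustration with a hypothetical `fit_stump` helper, not the project's model; a full tree such as scikit-learn's `DecisionTreeRegressor` keeps splitting each side recursively.

```python
def fit_stump(xs, ys):
    """Depth-1 regression tree on one feature: pick the threshold that minimizes
    the summed squared error of predicting each side's mean, and return a predictor."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between two equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbors
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x identical: a single leaf predicting the overall mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


if __name__ == "__main__":
    ages = [22, 23, 24, 30, 31, 32]
    points = [10.0, 11.0, 12.0, 20.0, 21.0, 22.0]  # made-up values
    predict = fit_stump(ages, points)
    print(predict(25), predict(33))  # each side returns its own mean
```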
In my opinion, the Simple Linear Regression model performed the best. Even so, I wouldn't build my fantasy team on its predictions alone; at the very least, I would take them with a grain of salt. There are some intangibles and other factors I didn't take into account (the team the player plays for, the player's role, the severity of the player's injury history, player usage, etc.). If I were to do this project again, I would try to build a ranking system that assigns a numeric value to each of these intangibles before feeding the data into a machine learning model, so that the predicted fantasy numbers are more realistic. For this project, though, I felt that implementing that idea was beyond my knowledge. At the end of the day, this project's goal was to showcase some of my skills in programming, data science and critical thinking. I believe I met that goal, and I am satisfied with the project.
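The ranking idea above could start as simply as turning each intangible into a rank-based score between 0 and 1 and adding those scores as extra input features. This is a hypothetical sketch of that proposal, not something the project implemented: `USG_PCT` (usage) and `GAMES_MISSED` (a proxy for injury history) are assumed column names, and scoring each player by the share of other players they beat is just one possible scheme.

```python
def rank_scores(values, higher_is_better=True):
    """Score each value in [0, 1] as the share of the other players it beats.
    Tied values beat neither each other nor themselves, so ties share a score."""
    n = len(values)
    if n < 2:
        return [1.0] * n
    scores = []
    for v in values:
        if higher_is_better:
            beaten = sum(1 for other in values if other < v)
        else:
            beaten = sum(1 for other in values if other > v)
        scores.append(beaten / (n - 1))
    return scores


def add_intangible_features(players):
    """Attach rank-based scores for two hypothetical intangibles."""
    usage = rank_scores([p["USG_PCT"] for p in players])
    health = rank_scores([p["GAMES_MISSED"] for p in players], higher_is_better=False)
    for player, u, h in zip(players, usage, health):
        player["USAGE_SCORE"] = u   # higher usage -> higher score
        player["HEALTH_SCORE"] = h  # fewer games missed -> higher score
    return players
```

Scaling every intangible to the same 0 to 1 range keeps any single factor from dominating purely because of its units.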
My next step is to build a fantasy team based on my findings and see how it performs.