Chapter 4 NBA Analytics

4.1 Introduction

The purpose of this chapter is to examine whether there is a relationship between conference (Eastern vs. Western) and player performance. I looked at season statistics such as points, rebounds, assists, steals, and blocks for the players on every team and examined whether any of them were correlated with conference.

4.2 Loading and Preparing the Data

library(readxl)
library(tidyverse)
NBA_teams <- read_xlsx("NBA Team Total Data 2024-2025.xlsx")
# View(NBA_teams)  # opens RStudio's data viewer; not needed when knitting

# Read one team's sheet and add identifier and summary columns.
# PRA here sums points, offensive rebounds (ORB), and assists;
# Stocks sums steals and blocks.
loading_teams <- function(file_name, team_name, sheet_name) {
  team_data <- read_xlsx(file_name, sheet = sheet_name)
  team_data$Team <- team_name
  team_data$Sheet <- sheet_name
  team_data$PRA <- rowSums(team_data[, c("PTS", "ORB", "AST")], na.rm = TRUE)
  team_data$Stocks <- rowSums(team_data[, c("STL", "BLK")], na.rm = TRUE)
  # "1" if the player has any entry in Awards, "0" otherwise
  team_data$Won_award <- ifelse(is.na(team_data$Awards), "0", "1")
  return(team_data)
}
team_warrior <- loading_teams("NBA Team Total Data 2024-2025.xlsx", "Warriors", "Warriors")
team_warrior
## # A tibble: 23 × 35
##       Rk Player    Age     G    GS    MP    FG   FGA `FG%`  `3P` `3PA` `3P%`  `2P` `2PA` `2P%` `eFG%`    FT   FTA
##    <dbl> <chr>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>
##  1     1 Stephe…    36    70    70  2252   564  1258 0.448   311   784 0.397   253   474 0.534  0.572   279   299
##  2     2 Draymo…    34    68    66  1983   216   509 0.424    80   246 0.325   136   263 0.517  0.503   101   147
##  3     3 Buddy …    32    82    22  1863   328   786 0.417   203   549 0.37    125   237 0.527  0.546    53    64
##  4     4 Brandi…    21    64    33  1716   280   629 0.445   115   309 0.372   165   320 0.516  0.537    72    95
##  5     5 Moses …    22    74    34  1649   246   568 0.433   126   337 0.374   120   231 0.519  0.544   106   133
##  6     6 Andrew…    29    43    43  1296   261   588 0.444    94   248 0.379   167   340 0.491  0.524   139   179
##  7     7 Jonath…    22    47    10  1144   258   568 0.454    46   151 0.305   212   417 0.508  0.495   157   235
##  8     8 Kevon …    28    76     6  1142   143   278 0.514     2     5 0.4     141   273 0.516  0.518    56    99
##  9     9 Jimmy …    35    30    30   980   159   334 0.476    19    68 0.279   140   266 0.526  0.504   201   231
## 10    10 Trayce…    24    62    37   967   174   302 0.576     0     3 0       174   299 0.582  0.576    59   102
## # ℹ 13 more rows
## # ℹ 17 more variables: `FT%` <dbl>, ORB <dbl>, DRB <dbl>, TRB <dbl>, AST <dbl>, STL <dbl>, BLK <dbl>, TOV <dbl>,
## #   PF <dbl>, PTS <dbl>, `Trp-Dbl` <dbl>, Awards <chr>, Team <chr>, Sheet <chr>, PRA <dbl>, Stocks <dbl>,
## #   Won_award <chr>
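To make the two derived columns concrete, here is a minimal sketch on a hypothetical three-player table (`toy`; the numbers are invented, not taken from the workbook). It mirrors the `rowSums()` calls in `loading_teams()`: PRA adds PTS, ORB, and AST, Stocks adds STL and BLK, and `na.rm = TRUE` treats a missing value as zero instead of returning NA.

```r
# Hypothetical toy rows; column names match the workbook's stat columns
toy <- data.frame(
  Player = c("A", "B", "C"),
  PTS = c(1000, 500, 250),
  ORB = c(50, 100, NA),   # player C has a missing ORB value
  AST = c(300, 150, 40),
  STL = c(60, 30, 10),
  BLK = c(20, 40, 5)
)

# Same computations as in loading_teams()
toy$PRA    <- rowSums(toy[, c("PTS", "ORB", "AST")], na.rm = TRUE)
toy$Stocks <- rowSums(toy[, c("STL", "BLK")], na.rm = TRUE)

toy[, c("Player", "PRA", "Stocks")]
# PRA: 1350, 750, 290; Stocks: 80, 70, 15
```

Note that player C's missing ORB silently counts as zero; whether that is appropriate depends on why the value is missing.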
getwd()
## [1] "/Users/crystaladote/Downloads/Reproducible Psyc Fall 2025/Bookdown_assignment"
path<- "/Users/crystaladote/Downloads/Reproducible Psyc Fall 2025/NBA Team Total Data 2024-2025.xlsx"
file.exists(path)
## [1] TRUE
team_sheets<- excel_sheets(path)
team_sheets
##  [1] "Nets"         "Knicks"       "Raptors"      "Philly"       "Celtics"      "Timberwolves" "Thunder"     
##  [8] "Jazz"         "Trailblazers" "Nuggets"      "Bulls"        "Bucks"        "Cavaliers"    "Pistons"     
## [15] "Pacers"       "Warriors"     "Suns"         "Lakers"       "Clippers"     "Kings"        "Hornets"     
## [22] "Magic"        "Wizards"      "Hawks"        "Heat"         "Grizzles"     "Spurs"        "Pelicans"    
## [29] "Rockets"      "Mavericks"
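The absolute paths above only work on the author's machine. One possible alternative, sketched here under the assumption (suggested by the `getwd()` output and the absolute path) that the workbook sits one folder above the `Bookdown_assignment` working directory, is a relative path built with `file.path()`. The name `nba_file` is hypothetical.

```r
# Hypothetical: assumes the working directory is Bookdown_assignment and the
# workbook sits in its parent folder, as the absolute path above suggests
nba_file <- file.path("..", "NBA Team Total Data 2024-2025.xlsx")

# Report a readable message if the workbook is not where we expect it
if (!file.exists(nba_file)) {
  message("Workbook not found at: ", normalizePath(nba_file, mustWork = FALSE))
}
```

If this resolves correctly, `nba_file` could stand in for `path` and `file_name`, so the chapter would build on any machine with the same folder layout.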
file_name <- path  # same workbook; team_sheets above already lists its sheets

# Load every sheet with loading_teams() and stack the results into one table
all_teams <- bind_rows(
  lapply(team_sheets, function(sheet_name) {
    loading_teams(file_name = file_name, team_name = sheet_name, sheet_name = sheet_name)
  })
)
# View(all_teams)
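The `bind_rows(lapply(...))` pattern reads one sheet at a time and stacks the results. A quick way to confirm the stack is complete is to count rows per team and compare against the sheet list. The sketch below uses a hypothetical named list of small data frames (`toy_sheets`) in place of the real sheets, and base R's `do.call(rbind, ...)` so it runs without the workbook; on the real data, `table(all_teams$Team)` gives the same kind of count.

```r
# Hypothetical stand-ins for three workbook sheets
toy_sheets <- list(
  Nets    = data.frame(Player = c("a", "b"), PTS = c(900, 400)),
  Knicks  = data.frame(Player = c("c", "d", "e"), PTS = c(1200, 300, 100)),
  Raptors = data.frame(Player = "f", PTS = 650)
)

# Tag each sheet with its team name, then stack (like loading_teams + bind_rows)
stacked <- do.call(rbind, lapply(names(toy_sheets), function(sheet_name) {
  d <- toy_sheets[[sheet_name]]
  d$Team <- sheet_name
  d
}))

rows_per_team <- table(stacked$Team)
rows_per_team

# Sheets that contributed no rows (e.g. a failed read) would be listed here
setdiff(names(toy_sheets), names(rows_per_team))  # character(0) if none missing
```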

4.3 Adding Conference Information

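team_conference <- read_excel("Team Conferences.xlsx")
# View(team_conference)

# merge() joins on every column name the two tables share (presumably Team)
# and keeps only rows that match in both tables
full_team_data <- merge(all_teams, team_conference)
# View(full_team_data)

# Recode Conference as a 0/1 dummy: East = 1, West = 0
full_team_data <- full_team_data %>%
  mutate(
    Conference = case_when(
      Conference == "East" ~ 1,
      Conference == "West" ~ 0
    )
  )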
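`merge()` without `by =` joins on every shared column name and, by default, keeps only rows that match in both tables (an inner join). If the two files spelled a team differently (the workbook's sheet is named "Grizzles", for instance), those players would silently drop out. The sketch below uses hypothetical toy tables (`players`, `conferences`) and assumes the shared key is a column called `Team`; it shows one way to surface unmatched rows with `all.x = TRUE` before recoding Conference.

```r
# Hypothetical player rows and conference lookup (note the mismatched spelling)
players <- data.frame(
  Player = c("p1", "p2", "p3"),
  Team   = c("Knicks", "Warriors", "Grizzles")
)
conferences <- data.frame(
  Team       = c("Knicks", "Warriors", "Grizzlies"),
  Conference = c("East", "West", "West")
)

# Left join: keep every player, even without a matching conference
joined <- merge(players, conferences, by = "Team", all.x = TRUE)

# Rows with NA Conference point to key mismatches that an inner join would drop
unmatched <- joined[is.na(joined$Conference), "Team"]
unmatched  # "Grizzles"

# Recode to the 1 = East, 0 = West dummy used in this chapter (NA stays NA)
joined$Conference <- ifelse(joined$Conference == "East", 1, 0)
```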

4.4 Visual Exploration

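library(ggplot2)
# Convert the 0/1 code back to labels so ggplot uses a discrete color scale
ggplot(full_team_data,
       aes(x = PRA, y = Stocks,
           color = factor(Conference, levels = c(0, 1), labels = c("West", "East")))) +
  geom_point() +
  labs(
    title = "Relationship between PRA and Stocks",
    x = "PRA",
    y = "Stocks",
    color = "Conference"
  )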

Figure 4.1: This scatter plot shows us the relationship between PRA and Stocks.

This scatter plot shows us that for both conferences (East and West), there is a positive relationship between PRA and Stocks.
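The claim that the relationship is positive in both conferences can also be checked numerically by computing the correlation between PRA and Stocks separately within each group. A minimal base-R sketch on invented data (`toy` is hypothetical) follows; with the real data, the same `split()` call would run on `full_team_data`, where Conference is coded 1 = East, 0 = West.

```r
set.seed(1)
# Hypothetical data: 40 players, Conference coded 1 = East, 0 = West
toy <- data.frame(Conference = rep(c(0, 1), each = 20))
toy$PRA    <- runif(40, 100, 2500)
toy$Stocks <- 0.05 * toy$PRA + rnorm(40, sd = 20)

# One correlation per conference
by_conf <- sapply(split(toy, toy$Conference), function(d) cor(d$PRA, d$Stocks))
by_conf  # named vector: "0" = West, "1" = East
```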

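# Same discrete West/East coloring as the previous plot
ggplot(full_team_data,
       aes(x = DRB, y = STL,
           color = factor(Conference, levels = c(0, 1), labels = c("West", "East")))) +
  geom_point() +
  labs(
    title = "Relationship between Defensive Rebounds and Steals",
    x = "Defensive Rebounds",
    y = "Steals",
    color = "Conference"
  )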

Figure 4.2: This scatter plot shows the relationship between defensive rebounds and steals.

The scatter plot shows a positive relationship between defensive rebounds and steals for both the East and the West.
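Because the workbook reports season totals, players who log more minutes (MP) accumulate more of every statistic, which by itself can push totals such as DRB and STL upward together. One common check, offered here as a suggestion rather than part of the original analysis, is to convert totals to per-36-minute rates before correlating. The sketch uses invented numbers (`mp`, `drb`, `stl` are hypothetical) in which both stats depend only on minutes.

```r
set.seed(2)
# Hypothetical players whose DRB and STL rates are unrelated to each other
mp  <- runif(50, 200, 2500)
drb <- rpois(50, mp * 0.15)   # about 0.15 defensive rebounds per minute
stl <- rpois(50, mp * 0.03)   # about 0.03 steals per minute

cor(drb, stl)                 # high: both totals scale with minutes

# Per-36-minute rates remove most of the playing-time effect
drb36 <- drb / mp * 36
stl36 <- stl / mp * 36
cor(drb36, stl36)             # much closer to 0
```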

4.5 Correlation Analysis

cor.test(full_team_data$PRA, full_team_data$Conference)
## 
##  Pearson's product-moment correlation
## 
## data:  x and y
## t = -1.7941, df = 650, p-value = 0.07325
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.146194514  0.006620927
## sample estimates:
##         cor 
## -0.07019864

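There is a weak negative correlation between PRA and Conference (r = -0.070). Because East is coded 1 and West 0, the negative sign means Eastern Conference players had slightly lower PRA on average. This correlation is not statistically significant (p = 0.073 > 0.05).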
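Because Conference is a 0/1 variable, this Pearson correlation is a point-biserial correlation: it equals the group mean difference scaled by the population standard deviation, r = (M1 - M0) / s * sqrt(p(1 - p)), and its test gives the same p-value as a pooled-variance two-sample t-test. The sketch below checks both identities on invented data (`x` and `g` are hypothetical).

```r
set.seed(3)
# Hypothetical outcome and 0/1 group (1 = East, 0 = West)
g <- rep(c(0, 1), times = c(30, 25))
x <- rnorm(55, mean = 800 - 60 * g, sd = 300)

r <- cor(x, g)

# Point-biserial form: mean difference scaled by the population SD of x
p     <- mean(g)
s_pop <- sqrt(mean((x - mean(x))^2))
r_pb  <- (mean(x[g == 1]) - mean(x[g == 0])) / s_pop * sqrt(p * (1 - p))

all.equal(r, r_pb)     # TRUE

# Same p-value as a two-sample t-test with equal variances
p_cor <- cor.test(x, g)$p.value
p_t   <- t.test(x ~ g, var.equal = TRUE)$p.value
all.equal(p_cor, p_t)  # TRUE
```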

cor.test(full_team_data$Stocks, full_team_data$Conference)
## 
##  Pearson's product-moment correlation
## 
## data:  x and y
## t = -2.094, df = 650, p-value = 0.03665
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.157650363 -0.005105577
## sample estimates:
##         cor 
## -0.08185737

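There is also a weak negative correlation between Stocks and Conference (r = -0.082). Unlike the PRA result, this correlation is statistically significant (p = 0.037), but it is very small: Conference accounts for less than 1% of the variance in Stocks (r² ≈ 0.007).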
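The significance of such a small correlation comes largely from the sample size (652 player rows, hence df = 650). The t statistic in the output follows directly from r and the degrees of freedom via t = r * sqrt(df / (1 - r²)). A short worked check using the values reported above:

```r
r  <- -0.08185737   # Stocks vs Conference, from the cor.test output
df <- 650

r^2                            # about 0.0067, i.e. under 1% of variance
t  <- r * sqrt(df / (1 - r^2)) # about -2.094, matching the output
p  <- 2 * pt(-abs(t), df)      # about 0.037, matching the output
c(r_squared = r^2, t = t, p = p)
```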

library(ggcorrplot)
stats_matrix<- full_team_data %>% dplyr::select(Age, PRA, Stocks)
APS_matrix<- cor(stats_matrix, use="pairwise.complete.obs")
APS_matrix
##               Age       PRA     Stocks
## Age    1.00000000 0.1246811 0.07734898
## PRA    0.12468112 1.0000000 0.81779753
## Stocks 0.07734898 0.8177975 1.00000000
ggcorrplot(APS_matrix, lab=TRUE, type="lower")+
  labs(title="Correlation Matrix: Age, PRA, and Stocks")

Figure 4.3: This figure looks at the correlation matrix that includes age, PRA, and Stocks.

The relationship between PRA and Stocks is the strongest of the three, with a positive correlation of 0.82, the closest to 1. Age is only weakly related to both PRA (0.12) and Stocks (0.08).

library(ppcor)
pcor.test(full_team_data$PRA, full_team_data$Stocks, full_team_data$Age)
##    estimate       p.value statistic   n gp  Method
## 1 0.8169568 2.748291e-157   36.0888 652  1 pearson

Controlling for Age, the partial correlation between PRA and Stocks remains strong and positive (r = 0.817, p < 0.001), almost identical to the uncontrolled correlation of 0.818. This means age explains very little of the relationship between PRA and Stocks.
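The partial correlation can be reproduced by hand from the three pairwise correlations in `APS_matrix` with the standard first-order formula r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz²)(1 - r_yz²)). Plugging in the printed values recovers the `pcor.test()` estimate and shows why controlling for age barely moves it: both age correlations are small, so the adjustment term r_xz * r_yz is under 0.01.

```r
# Pairwise correlations printed in APS_matrix (x = PRA, y = Stocks, z = Age)
r_xy <- 0.8177975
r_xz <- 0.1246811
r_yz <- 0.07734898

r_xy_z <- (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
r_xy_z   # about 0.81696, matching the pcor.test() estimate

# Test statistic with one control variable: df = n - 2 - 1
n <- 652
t_stat <- r_xy_z * sqrt((n - 3) / (1 - r_xy_z^2))
t_stat   # about 36.09, matching the reported statistic
```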

4.6 Findings

To Mr. Silver,

Based on these correlations, there does not appear to be a meaningful difference between Eastern and Western Conference players in PRA (points, offensive rebounds, and assists) or Stocks (steals and blocks). The PRA correlation was not statistically significant, and the Stocks correlation, although significant, was very weak (r = -0.082).

We also saw a strong, positive correlation between PRA and Stocks that held when Age was controlled, indicating that age does not account for the relationship between the two: players who produce more on offense also tend to produce more on defense. A second scatter plot showed a positive relationship between defensive rebounds and steals, suggesting that these two defensive statistics tend to move together; its strength was judged visually and was not tested with a correlation.

One potential next step would be to examine the relationship between 2-point and 3-point percentages, and possibly field goal percentage. One limitation of my analysis is that the weak correlations observed could be due to other factors not captured in the data. The strong correlations may also partly reflect playing time, since the statistics are season totals and players who log more minutes accumulate more of every statistic.