Chapter 9 NYCComplaintApp
9.1 Introduction
For this assignment, I opted to build on code my professor (Christian Martinez) created and use it as a launch point for thinking about language analysis. For the first study in my thesis project, we ended up collecting a lot of “open-response” data. Yet it wasn’t directly related to our overall questions, so we didn’t do much of anything with it. I thought I could use this project to start exploring very basic language-analysis ideas, in case my advisor and I ever decide to do something with that existing data. (Note: this code is intended for Shiny, not R Markdown, so it will not run within this book.)
9.3 Interactive Visualizations
Below, I preserved your code from class. I edited some minor aspects (such as the maximum number of complaints, fonts, and colors) but otherwise kept it as it was.
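The chunks in this chapter assume the tidyverse, tm, wordcloud, and RColorBrewer packages are loaded and that a `data_311` data frame (with `borough`, `complaint_type`, and `descriptor` columns) already exists. A minimal setup sketch; the file name here is a placeholder, not the actual path used in class:

```r
# Setup assumed by the chunks below (file name is a placeholder)
library(tidyverse)    # filter(), count(), slice_head(), ggplot2, fct_reorder()
library(tm)           # VCorpus(), tm_map(), TermDocumentMatrix()
library(wordcloud)    # wordcloud()
library(RColorBrewer) # brewer.pal()

data_311 <- read_csv("data_311.csv")  # hypothetical file name
```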
inputPanel(
  selectInput("boro", "Choose a Borough:",
              choices = c("BRONX", "BROOKLYN", "MANHATTAN",
                          "QUEENS", "STATEN ISLAND", "Unspecified")),
  sliderInput("top_n", "Number of Top Complaint Types:",
              min = 5, max = 15, value = 10)
)

renderPlot({
  # Count complaints in the selected borough and keep the top_n types
  data_311 %>%
    filter(borough == input$boro) %>%
    count(complaint_type, sort = TRUE) %>%
    slice_head(n = input$top_n) %>%
    ggplot(aes(n, fct_reorder(complaint_type, n))) +
    geom_col(fill = "purple") +
    labs(
      title = paste("Top Complaint Types in", input$boro),
      x = "Number of Complaints",
      y = "Complaint Type"
    ) +
    theme_minimal(base_size = 30) +
    theme(
      plot.title = element_text(size = 30, face = "bold"),
      axis.text = element_text(size = 20),
      axis.title = element_text(size = 18)
    )
})
9.4 Word Clouds
After this, I thought it would be interesting to create a word cloud based on the specific descriptions of the complaints tenants gave. While word clouds are obviously not particularly scientific, I think they are fun, and even meaningful, visual representations of the data. (They are also good starting points for t-shirt designs, for anyone who wishes to protest NYC’s landlords.) I did use outside resources for this code, but I tried to make sure I understood what every piece of it meant. I also opted to eliminate the words “entire,” “apartment,” and “building,” since I didn’t feel they provided particularly meaningful descriptions of the types of complaints tenants were making. The number of words that appear in the word cloud corresponds to both the borough and the number of overall complaints you select at the beginning.
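The text-cleaning steps used in the word cloud can be previewed outside Shiny. A small standalone sketch using made-up descriptors (not real 311 data) to show what the tm pipeline leaves behind:

```r
library(tm)

# Toy descriptors standing in for the 311 "descriptor" column
toy <- c("NO HEAT in entire apartment", "Mice in apartment building",
         "No heat, no hot water", "heat out since Monday")

corpus <- VCorpus(VectorSource(toy))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, removeWords, c("entire", "apartment", "building"))

tdm  <- TermDocumentMatrix(corpus)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
head(freq)
# "heat" comes out on top: stopwords and the excluded words are gone
```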
renderPlot({
  par(mar = c(2, 2, 4, 2))
  # Pull non-missing, non-empty descriptors for the selected borough
  text_data <- data_311[data_311$borough == input$boro, "descriptor"]
  text_data <- text_data[!is.na(text_data)]
  text_data <- text_data[text_data != ""]
  # Clean the text: lowercase, strip punctuation/numbers/stopwords,
  # and drop "entire", "apartment", and "building"
  corpus <- VCorpus(VectorSource(text_data))
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeNumbers)
  corpus <- tm_map(corpus, removeWords, stopwords("english"))
  corpus <- tm_map(corpus, removeWords, c("entire", "apartment", "building"))
  # Count word frequencies and keep the top_n terms
  tdm <- TermDocumentMatrix(corpus)
  mat <- as.matrix(tdm)
  freq <- sort(rowSums(mat), decreasing = TRUE)
  freq <- head(freq, input$top_n)
  wordcloud(
    words = names(freq),
    freq = freq,
    max.words = 100,
    scale = c(5, 1.4),
    colors = brewer.pal(8, "Dark2")
  )
  title(main = "Graphic Depiction of Common Complaint Words Per Selected Borough",
        cex.main = 2.0)
})
9.5 Histograms
The word clouds are fun, but not particularly precise. So, I thought it would be good to have histograms of the descriptors as well. Like the word clouds above, these correspond to the initial selections you make regarding borough and complaints per borough.
renderPlot({
  # Same cleaning pipeline as the word cloud above
  text_data <- data_311[data_311$borough == input$boro, "descriptor"]
  text_data <- text_data[!is.na(text_data)]
  text_data <- text_data[text_data != ""]
  corpus <- VCorpus(VectorSource(text_data))
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeNumbers)
  corpus <- tm_map(corpus, removeWords, stopwords("english"))
  corpus <- tm_map(corpus, removeWords, c("entire", "apartment", "building"))
  tdm <- TermDocumentMatrix(corpus)
  mat <- as.matrix(tdm)
  freq <- sort(rowSums(mat), decreasing = TRUE)
  freq <- head(freq, input$top_n)
  # Convert the named frequency vector to a data frame for ggplot
  freq_df <- data.frame(
    word = names(freq),
    count = as.numeric(freq)
  )
  ggplot(freq_df, aes(x = reorder(word, count), y = count)) +
    geom_col(fill = "steelblue") +
    labs(
      title = paste("Top", input$top_n, "Descriptor Words in", input$boro),
      x = "Word",
      y = "Frequency"
    ) +
    theme_minimal(base_size = 20) +
    theme(
      plot.title = element_text(size = 26, face = "bold"),
      axis.text = element_text(size = 18),
      axis.title = element_text(size = 20),
      axis.text.x = element_text(angle = 45, hjust = 1)
    )
})
I also included the actual raw data table below, in case users wish to cross-reference the histograms, word clouds, etc. with the original data. It automatically shows the raw data for whichever borough you select at the beginning.
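The chunk for the table itself is not reproduced in this chapter. A hedged sketch of how such a borough-filtered table could be rendered with the DT package; the use of DT, and the column names shown, are my assumptions rather than the original code:

```r
# Hypothetical sketch of a borough-filtered raw data table
DT::renderDataTable({
  data_311 %>%
    filter(borough == input$boro) %>%
    select(created_date, complaint_type, descriptor)  # column names assumed
}, options = list(pageLength = 10))
```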