ACS Census Data (2015)

Sara Stoudt true
04-30-2018

NOTE: When updating my website in June 2021, this code was revealed to be deprecated. I am using eval = F to preserve the post, but code will not run as is. I will try to update at some point (or if you are reading this and now what to do to fix it, let me know).

Setup

Week 5 - County-level American Community Survey (5-year estimates) 2015

RAW DATA
DataSource: census.gov
Kaggle source

This week I am taking inspiration from the Tidy Tuesday submissions of @AidoBo and @jakekaupp.

I’m slightly tweaking @AidoBo’s function to plot continuous variables on a map to help me explore.

For #TidyTuesday I created simple function which allows you to plot any continuous variable in the data on a map #rstats #r4ds pic.twitter.com/6Q1I121VqI

— Aidan Boland (@AidoBo) May 1, 2018

And inspired by @jakekaupp’s work showing commute time in terms of number of Despacito listens

A blog post catching up on week 4 and week 5 of #TidyTuesday https://t.co/AoXuNI5s0j Code available at https://t.co/kuJdBQG4pn #rstats #r4ds pic.twitter.com/IXjONQ0LXs

— Jake Kaupp (@jakekaupp) May 3, 2018

I wanted to adapt the function from above (thanks @AidoBo) to make a commuting map for any song. We can use the spotifyr package to access the length of a given song.

counties= map_data("county")
state=map_data("state")

  county_plot <-function(x){
  ## adapted from
  
  ##https://twitter.com/AidoBo/status/991338257391804416
  
  all_county$x<-all_county[,x] ## a different fix for this? something like aes_string?
  
  ggplot(data=counties,mapping=aes(x=long,y=lat,group=group))+
    geom_polygon(data=all_county, aes(fill=x),color="grey")+labs(fill=x)+scale_fill_distiller(palette="Spectral")+theme_void()+
    geom_path(data=state, aes(x=long,y=lat,group=group),color="black") ## add state boundaries
  
}
setwd("~/Desktop/tidytuesday/data/2018/2018-04-30")
acs<-read.csv("week5_acs2015_county_data.csv")

head(acs)
names(acs)

## from @AidoBo too
all_county<-inner_join(counties,acs %>% mutate(County=tolower(County),State=tolower(State)),by=c("subregion"="County","region"="State"))

Get your own Client ID and Client Secret here.

spotify_client_id="" ## put yours here
spotify_client_secret="" ## put yours here
access_token <- get_spotify_access_token(client_id=spotify_client_id,client_secret=spoitfy_client_secret)
county_commute_plot_tunes <-function(artist,song,access_token){
  artists <- get_artists(artist,authorization=access_token)
  albums <- get_albums(artists$artist_uri[1],authorization=access_token)
  tracks<-get_album_tracks(albums,authorization=access_token)
  

  track= tracks[which(grepl(song,tracks$track_name)),][1,"track_uri"]
  
  audio_features <- get_track_audio_features(track,authorization=access_token)
  
  songLength=audio_features$duration_ms/1000/60
  
  all_county$commuteTune=all_county$MeanCommute/songLength
  
  ggplot(data=counties,mapping=aes(x=long,y=lat,group=group))+
    geom_polygon(data=all_county, aes(fill=commuteTune),color="grey")+labs(fill=paste("Song Plays Per \n Average Commute"))+scale_fill_distiller(palette="Spectral")+theme_void()+ggtitle(paste(artist,song,sep=" - "))+
    geom_path(data=state, aes(x=long,y=lat,group=group),color="black") ## add state boundaries
  
}

Commuting

What are these hot spots?

p=county_plot("MeanCommute") 
p

We can use ggplotly to use hover information to identify counties of interest.

county_plotly <-function(x){
  ## adapted from
  
  ##https://twitter.com/AidoBo/status/991338257391804416
  
  all_county$x<-all_county[,x] ## a different fix for this? something like aes_string?
  
  ggplot(data=counties,mapping=aes(x=long,y=lat,group=group))+
    geom_polygon(data=all_county, aes(fill=x,region=region,subregion=subregion),color="grey")+labs(fill=x)+scale_fill_distiller(palette="Spectral")+theme_void()+
    geom_path(data=state, aes(x=long,y=lat,group=group),color="black") ## add state boundaries
  
}

test=county_plotly("MeanCommute")
ggplotly(test,tooltip=c("region","subregion"))

Must be the money?

county_commute_plot_tunes("Nelly","Ride Wit Me",access_token)

Where can commuting make you make more money?

Caveat: I’m not really answering this question because we don’t have the data at the individual level, but as an exploratory exercise…

p <- ggplot(acs, aes(x = MeanCommute, y = IncomePerCap, text =paste(County,State,sep="-"))) +
  geom_point() +xlab("mean commute")+
  ylab("income per cap")+ggtitle("Where Does/Doesn't Commuting Pay Off?")
p ## static for GitHub
#ggplotly(p)

Can we get a rough idea of where it does and doesn’t pay to commute on a map instead of relying on hovering?

## which counties have above average income and below average commute time per state (averages within a state)
averagesByState=group_by(acs,State)%>% summarize(avgMeanCommute=mean(MeanCommute),avgIncomePerCap=mean(IncomePerCap))


acsM=merge(acs,averagesByState,by.x="State",by.y="State",all.x=T)

acs$goodCommuteIncomeLevels=rep(0, nrow(acs))
acs$goodCommuteIncomeLevels[which(acsM$IncomePerCap>acsM$avgIncomePerCap & acsM$MeanCommute < acsM$avgMeanCommute)]=1
acs$goodCommuteIncomeLevels=as.factor(acs$goodCommuteIncomeLevels)

all_county<-inner_join(counties,acs %>% mutate(County=tolower(County),State=tolower(State)),by=c("subregion"="County","region"="State"))

## need a discrete version of the map
ggplot(data=counties,mapping=aes(x=long,y=lat,group=group))+
    geom_polygon(data=all_county, aes(fill=goodCommuteIncomeLevels),color="grey")+labs(fill="Good Commuter Given Income")+scale_fill_discrete()+theme_void()+
    geom_path(data=state, aes(x=long,y=lat,group=group),color="black") ## add state boundaries

Work At Home

p=county_plot("WorkAtHome") 
p

Again we could use ggplotly to identify hot spots?

test=county_plotly("WorkAtHome")
ggplotly(test,tooltip=c("region","subregion"))

A lot could be going on here, so again I don’t want to read too much into this plot. Since we don’t have income per person we don’t know if those working from home make more or less than those in other jobs within their county. However, there are some interesting patterns here that it would be interesting to look into with data at the individual level.

ggplot(acs,aes(x=WorkAtHome,y=acs$IncomePerCap))+geom_point() +xlab("percentage working from home")+
  ylab("income per cap")+ggtitle("Does Working From Home Pay?")
county_commute_plot_tunes("Fifth Harmony","Work from Home",access_token)

Note: My county_commute_plot_tunes is not robust to capitalization. There was some trial and error involved.