NOTE: When I updated my website in June 2021, this code turned out to be deprecated. I am using eval = F
to preserve the post, but the code will not run as is. I will try to update it at some point (or, if you are reading this and know how to fix it, let me know).
RAW DATA
DataSource: census.gov
Kaggle source
This week I am taking inspiration from the Tidy Tuesday submissions of @AidoBo and @jakekaupp.
I’m slightly tweaking @AidoBo’s function to plot continuous variables on a map to help me explore.
For #TidyTuesday I created simple function which allows you to plot any continuous variable in the data on a map #rstats #r4ds pic.twitter.com/6Q1I121VqI
— Aidan Boland (@AidoBo) May 1, 2018
I was also inspired by @jakekaupp’s work showing commute time in terms of the number of Despacito listens:
A blog post catching up on week 4 and week 5 of #TidyTuesday https://t.co/AoXuNI5s0j Code available at https://t.co/kuJdBQG4pn #rstats #r4ds pic.twitter.com/IXjONQ0LXs
— Jake Kaupp (@jakekaupp) May 3, 2018
I wanted to adapt the function from above (thanks @AidoBo) to make a commuting map for any song. We can use the spotifyr package to access the length of a given song.
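library(ggplot2); library(dplyr) ## map_data() also needs the maps package installed
library(plotly); library(spotifyr) ## ggplotly and the Spotify helpers used below
counties= map_data("county")
state=map_data("state")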
county_plot <-function(x){
## adapted from
##https://twitter.com/AidoBo/status/991338257391804416
all_county$x<-all_county[,x] ## a different fix for this? something like aes_string?
ggplot(data=counties,mapping=aes(x=long,y=lat,group=group))+
geom_polygon(data=all_county, aes(fill=x),color="grey")+labs(fill=x)+scale_fill_distiller(palette="Spectral")+theme_void()+
geom_path(data=state, aes(x=long,y=lat,group=group),color="black") ## add state boundaries
}
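On the aes_string question in the comment above: one option in current ggplot2 is the .data pronoun, which looks a column up by name inside aes() without copying it into a new column first. Here is a minimal sketch, assuming a recent ggplot2. The toy data frame and the name county_plot_tidy are made up for illustration, and the function takes the data as an argument instead of reading the global all_county.

```r
library(ggplot2)

## Toy stand-in for all_county: two square "counties", one value each.
toy_county <- data.frame(
  long = c(0, 1, 1, 0, 1, 2, 2, 1),
  lat = c(0, 0, 1, 1, 0, 0, 1, 1),
  group = rep(c(1, 2), each = 4),
  MeanCommute = rep(c(20, 35), each = 4)
)

## Hypothetical variant of county_plot(): x is a column name given as a string,
## and .data[[x]] maps that column to fill.
county_plot_tidy <- function(df, x) {
  ggplot(df, aes(x = long, y = lat, group = group)) +
    geom_polygon(aes(fill = .data[[x]]), color = "grey") +
    labs(fill = x) +
    scale_fill_distiller(palette = "Spectral") +
    theme_void()
}

p_toy <- county_plot_tidy(toy_county, "MeanCommute")
```

Printing p_toy draws the two squares filled by MeanCommute. The state outline layer is left out here only to keep the sketch independent of the maps package.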
Get your own Client ID and Client Secret here.
spotify_client_id="" ## put yours here
spotify_client_secret="" ## put yours here
access_token <- get_spotify_access_token(client_id=spotify_client_id,client_secret=spotify_client_secret)
county_commute_plot_tunes <-function(artist,song,access_token){
artists <- get_artists(artist,authorization=access_token)
albums <- get_albums(artists$artist_uri[1],authorization=access_token)
tracks<-get_album_tracks(albums,authorization=access_token)
track= tracks[which(grepl(song,tracks$track_name)),][1,"track_uri"]
audio_features <- get_track_audio_features(track,authorization=access_token)
songLength=audio_features$duration_ms/1000/60
all_county$commuteTune=all_county$MeanCommute/songLength
ggplot(data=counties,mapping=aes(x=long,y=lat,group=group))+
geom_polygon(data=all_county, aes(fill=commuteTune),color="grey")+labs(fill=paste("Song Plays Per \n Average Commute"))+scale_fill_distiller(palette="Spectral")+theme_void()+ggtitle(paste(artist,song,sep=" - "))+
geom_path(data=state, aes(x=long,y=lat,group=group),color="black") ## add state boundaries
}
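The arithmetic inside county_commute_plot_tunes is easy to check by hand. This is a small worked example with made-up numbers (not values from Spotify or the ACS data), assuming MeanCommute is measured in minutes:

```r
## Hypothetical inputs for illustration only.
duration_ms <- 240000            ## a made-up track lasting 4 minutes
mean_commute <- c(18, 24, 30)    ## made-up mean commutes, in minutes

song_length <- duration_ms / 1000 / 60           ## milliseconds -> seconds -> minutes
plays_per_commute <- mean_commute / song_length  ## full plays that fit in each commute

song_length        ## 4
plays_per_commute  ## 4.5 6.0 7.5
```

So a county with a 30 minute average commute would get through this hypothetical song seven and a half times.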
What are these hot spots?
p=county_plot("MeanCommute")
p
We can use ggplotly to add hover information that identifies counties of interest.
county_plotly <-function(x){
## adapted from
##https://twitter.com/AidoBo/status/991338257391804416
all_county$x<-all_county[,x] ## a different fix for this? something like aes_string?
ggplot(data=counties,mapping=aes(x=long,y=lat,group=group))+
geom_polygon(data=all_county, aes(fill=x,region=region,subregion=subregion),color="grey")+labs(fill=x)+scale_fill_distiller(palette="Spectral")+theme_void()+
geom_path(data=state, aes(x=long,y=lat,group=group),color="black") ## add state boundaries
}
test=county_plotly("MeanCommute")
ggplotly(test,tooltip=c("region","subregion"))
Must be the money?
county_commute_plot_tunes("Nelly","Ride Wit Me",access_token)
Where can commuting make you make more money?
Caveat: I’m not really answering this question because we don’t have the data at the individual level, but as an exploratory exercise…
Can we get a rough idea of where it does and doesn’t pay to commute on a map instead of relying on hovering?
## which counties have above average income and below average commute time per state (averages within a state)
averagesByState=group_by(acs,State)%>% summarize(avgMeanCommute=mean(MeanCommute),avgIncomePerCap=mean(IncomePerCap))
acsM=left_join(acs,averagesByState,by="State") ## left_join keeps the row order of acs, so the indices below line up (merge() sorts by State)
acs$goodCommuteIncomeLevels=rep(0, nrow(acs))
acs$goodCommuteIncomeLevels[which(acsM$IncomePerCap>acsM$avgIncomePerCap & acsM$MeanCommute < acsM$avgMeanCommute)]=1
acs$goodCommuteIncomeLevels=as.factor(acs$goodCommuteIncomeLevels)
all_county<-inner_join(counties,acs %>% mutate(County=tolower(County),State=tolower(State)),by=c("subregion"="County","region"="State"))
## need a discrete version of the map
ggplot(data=counties,mapping=aes(x=long,y=lat,group=group))+
geom_polygon(data=all_county, aes(fill=goodCommuteIncomeLevels),color="grey")+labs(fill="Good Commuter Given Income")+scale_fill_discrete()+theme_void()+
geom_path(data=state, aes(x=long,y=lat,group=group),color="black") ## add state boundaries
p=county_plot("WorkAtHome")
p
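For the within-state comparison above, another way to compute the flag is a grouped mutate, which compares each county to its own state's averages without building a separate table of averages. This is a rough sketch with made-up rows standing in for acs. The column names match the ones used in this post, but the values are invented.

```r
library(dplyr)

## Made-up rows standing in for acs; deliberately not sorted by State.
acs_toy <- data.frame(
  State = c("B", "A", "B", "A"),
  County = c("b1", "a1", "b2", "a2"),
  MeanCommute = c(20, 30, 40, 10),
  IncomePerCap = c(50, 20, 30, 40)
)

## Flag counties with above-average income and below-average commute,
## where both averages are taken within the county's own state.
flagged <- acs_toy %>%
  group_by(State) %>%
  mutate(goodCommuteIncomeLevels = as.integer(
    IncomePerCap > mean(IncomePerCap) & MeanCommute < mean(MeanCommute)
  )) %>%
  ungroup()

flagged$goodCommuteIncomeLevels  ## 1 0 0 1
```

Because mutate keeps rows in their original order, the flag lines up with the rows of the input no matter how the states are ordered.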
Again, we could use ggplotly to identify hot spots.
A lot could be going on here, so again I don’t want to read too much into this plot. Since we don’t have income at the individual level, we don’t know whether those working from home make more or less than those in other jobs within their county. Still, there are some patterns here that would be worth looking into with individual-level data.
county_commute_plot_tunes("Fifth Harmony","Work from Home",access_token)
Note: My county_commute_plot_tunes function is not robust to capitalization. There was some trial and error involved.
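One possible way to make the song lookup less sensitive to capitalization is to lower-case both the search string and the track names before matching. Here is a sketch in base R. The helper find_track and the toy track listing are hypothetical and only mimic the track_name and track_uri columns used above.

```r
## Hypothetical track listing standing in for the tracks data frame.
tracks_toy <- data.frame(
  track_name = c("Song One", "Ride Wit Me", "Song Three"),
  track_uri = c("uri_a", "uri_b", "uri_c"),
  stringsAsFactors = FALSE
)

## Return the URI of the first track whose name contains `song`,
## ignoring case; fixed = TRUE treats `song` as plain text, not a regex.
find_track <- function(song, tracks) {
  hit <- grepl(tolower(song), tolower(tracks$track_name), fixed = TRUE)
  tracks$track_uri[hit][1]
}

find_track("ride wit me", tracks_toy)  ## "uri_b"
```

If nothing matches, find_track returns NA, which is easier to check for than an empty data frame.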