NOTE: When updating my website in June 2021, I discovered this code is deprecated. I am using eval = F to preserve the post, but the code will not run as is. I will try to update it at some point (or, if you are reading this and know how to fix it, let me know).
I recently spent some time at the Geometry of Redistricting Hackathon, where I learned about quantitative approaches to assessing gerrymandering. Check out the Metric Geometry and Gerrymandering Group on GitHub to see how you can get involved. I focused on improving documentation during my brief time there, but due to my own time constraints I did not get to contribute as much as I would have liked.
Here I hope to continue improving the overall documentation by giving some intuition about how one can generate valid redistricting plans and evaluate them with compactness measures, using the mandeR and redist packages as well as some code from the MCMC visualization project. The mandeR package takes shapefiles and calculates a variety of compactness measures for each polygon. The redist package implements a Markov chain Monte Carlo (MCMC) approach to generating valid redistricting plans. Combining the functionality of both gives some intuition about the issues facing the quantitative study of gerrymandering.
*See the README for mandeR if you have trouble installing it.
The redist package implements a new approach for simulating possible redistricting scenarios using MCMC. Fifield et al. define the problem as “a state consisting of \(m\) geographical units (e.g. census blocks or voting precincts) must be divided into \(n\) contiguous districts.”
For small \(m\) or \(n\) it may be possible to enumerate all valid redistricting plans with a specified number of districts given a set of geographic units. redist.enumerate() does this, but it quickly becomes too slow as \(m\) and \(n\) increase. The example provided in the documentation enumerates all possible redistricting plans for 25 contiguous precincts in Florida. The method expects only an adjacency list (specifying which geographic units share a boundary) and does not use any other spatial information.
We can see the number of valid redistricting scenarios. A valid redistricting scenario gives a district label to each geographic unit (here a 1 or 2).
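As a sketch of what ran here (following the redist package's documented Florida example; this is the older, now-deprecated API, so like the rest of the post it is not evaluated):

```r
## Sketch based on redist's documented example (older, deprecated API).
## algdat.pfull ships with redist: 25 contiguous Florida precincts.
library(redist)
data(algdat.pfull)

## Enumerate every valid two-district plan from the adjacency list alone.
plans = redist.enumerate(adjobj = algdat.pfull$adjlist, ndists = 2)
length(plans)  ## number of valid two-district plans
plans[[1]]     ## a district label (1 or 2) for each of the 25 precincts
```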
If we want to impose equal population constraints for each district, the popvec argument allows us to give population values for each geographical unit, and popcons gives the threshold for how far from equal population we will allow (here we specify within 5%).
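A hedged sketch of the constrained call, assuming the precinct populations live in algdat.pfull$precinct.data$pop as in the package's examples (again the older API, not evaluated):

```r
## Keep only plans whose districts are within 5% of equal population.
## Assumption: precinct populations are in algdat.pfull$precinct.data$pop.
library(redist)
data(algdat.pfull)

plans.eq = redist.enumerate(adjobj = algdat.pfull$adjlist, ndists = 2,
                            popvec  = algdat.pfull$precinct.data$pop,
                            popcons = 0.05)
length(plans.eq)  ## far fewer plans survive the constraint
```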
We can see there are many fewer valid redistricting scenarios when we impose this constraint.
If we want three districts instead of two, we already reach intractability (at least for my patience level on my laptop).
test = redist.enumerate(adjobj = algdat.pfull$adjlist, ndists = 3) ## don't run
If it is not feasible to try all possible combinations, how can we generate possible redistricting scenarios efficiently? The challenge is that a random “redistricting” may not be a valid one. We also need to impose a certain structure (contiguous geographic units) and set of constraints (equal population, geographic compactness). Previous approaches are inefficient and ad hoc. redist.mcmc() uses MCMC to sample uniformly from redistricting plans with a specified number of contiguous districts, subject to constraints.
Let’s walk through the example in the documentation:
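A sketch of that documented example (older redist API; argument names as in the version this post used, not evaluated here):

```r
## Sketch of the documented redist.mcmc example (older, deprecated API).
library(redist)
data(algdat.pfull)

## Sample three-district plans for the 25 Florida precincts via MCMC.
mcmc.out = redist.mcmc(adjobj = algdat.pfull$adjlist,
                       popvec = algdat.pfull$precinct.data$pop,
                       nsims = 10000, ndists = 3)

## One district assignment per geographic unit, per iteration.
dim(mcmc.out$partitions)
```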
For each geographic unit, we see which district it is placed in per iteration. We can also track various constraint measures. However, this all operates via adjacency list, which isn’t easy to parse. What if we want to visualize these redistricting scenarios to better see what is going on? We will use some helper functions from the mcmcviz project.
## wrapper for redist.mcmc
## pre-processing: convert the shapefile to an adjacency list
## post-processing: thinning
## full signature: redistrict = function(geom, nsims, nthin, nburn, ndists, popcons, eprob, lambda)
redistrict = function(geom, nsims, nthin, nburn, ndists, popcons) { ## eprob, lambda removed for our example
  ## DE-9IM pattern "****1****": polygons whose boundaries intersect in a line
  adj_obj = st_relate(geom, pattern = "****1****")
  mcmc = redist.mcmc(adj_obj, geom$population,
                     nsims = nsims + nburn, ndists = ndists, popcons = popcons)
  ## thinning removed: couldn't find the appropriate thin() function (not coda's)
  mcmc$partitions %>% as.data.frame() %>% as.list()
}
## groups geographic units into districts, merging each district's polygons
create_district_map = function(geom, districts) {
  mutate(geom, district = as.character(districts)) %>%
    group_by(district) %>%
    summarize(
      population = sum(population),
      geometry = st_union(geometry)
    )
}
## gets a district map per iteration, in parallel
gather_maps = function(geom, iters) {
  mclapply(iters, create_district_map, geom = geom, mc.cores = detectCores())
}
The mcmcviz project also has some real shapefiles of Anne Arundel, MD that we will use here. I have this data downloaded locally, but I would love it if someone could tell me how to load a shapefile from GitHub via code (I suspect issues because the accompanying .dbf, .prj, etc. files are needed at the same time).
setwd("~/Desktop/mcmcviz/data")
geom = st_read("AnneArundelN.shp", quiet = TRUE)
names(geom) = tolower(names(geom)) ## needed in order for redist.rsg to be able to create an initial districting
iters = redistrict(geom, nsims=1000, nthin=10, nburn=100, ndists=3, popcons=0.05)
maps = gather_maps(geom, iters) ## time intensive even in parallel
## look at a few iterations (1, 10, 20, 30, 40, 50)
mapDistrict <- function(idx) {
  mapdata = maps[[idx]]
  ggplot(mapdata) +
    geom_sf(aes(fill = district)) +
    theme(axis.title.x = element_blank(), axis.text.x = element_blank(),
          axis.ticks.x = element_blank(), axis.title.y = element_blank(),
          axis.text.y = element_blank(), axis.ticks.y = element_blank())
}
g1=mapDistrict(1)
g2=mapDistrict(10)
g3=mapDistrict(20)
g4=mapDistrict(30)
g5=mapDistrict(40)
g6=mapDistrict(50)
grid.arrange(g1,g2,g3,g4,g5,g6,ncol=2)
Using the README from mandeR we can now check the compactness measures of the proposed redistricting scenarios.
Expanding on the description of compactness measures from compactnesslib: the Polsby-Popper score is \(4\pi\) times a district's area divided by its squared perimeter (1 for a circle, lower for more contorted boundaries); the Schwartzberg score compares the district's perimeter to the circumference of a circle of equal area; the convex hull score divides the district's area by the area of its convex hull; and the Reock score divides the district's area by the area of its smallest bounding circle.
Read more about compactness measures here.
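As a quick intuition check (plain arithmetic, independent of any package): Polsby-Popper is \(4\pi \cdot \text{area}/\text{perimeter}^2\), so a circle scores 1 and a unit square scores \(\pi/4\).

```r
## Polsby-Popper = 4*pi*area / perimeter^2
polsby_popper = function(area, perimeter) 4 * pi * area / perimeter^2

polsby_popper(pi, 2 * pi)  ## unit circle (radius 1): score 1
polsby_popper(1, 4)        ## unit square: pi/4, about 0.785
```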
mapdata = maps[[1]]
# Convert the districts to WKT (the input class needed by compactnesslib)
wkt_str <- lapply(st_geometry(mapdata), st_as_text)
# Retrieve compactness scores from mandeR
scores <- lapply(wkt_str, getScoresForWKT)
scores = do.call(rbind, scores)
scores$id = 1:nrow(scores)
# Merge scores back into districts
dists <- merge(mapdata, scores, by.x = "district", by.y = "id")
names(dists)
mapScore <- function(dists, name) {
  ggplot(dists) +
    geom_sf(aes_string(fill = name)) +
    theme(axis.title.x = element_blank(), axis.text.x = element_blank(),
          axis.ticks.x = element_blank(), axis.title.y = element_blank(),
          axis.text.y = element_blank(), axis.ticks.y = element_blank())
}
g1=mapScore(dists,"population")
g2=mapScore(dists,"CvxHullPS")
g3=mapScore(dists,"ReockPS")
g4=mapScore(dists,"Schwartzbe")
g5=mapScore(dists,"PolsbyPopp")
grid.arrange(g1,g2,g3,g4,g5,ncol=2)
mapdata = maps[[50]]
# Convert the districts to WKT
wkt_str <- lapply(st_geometry(mapdata), st_as_text)
# Retrieve compactness scores from mandeR
scores <- lapply(wkt_str, getScoresForWKT)
scores = do.call(rbind, scores)
scores$id = 1:nrow(scores)
# Merge scores back into districts
dists <- merge(mapdata, scores, by.x = "district", by.y = "id")
g1=mapScore(dists,"population")
g2=mapScore(dists,"CvxHullPS")
g3=mapScore(dists,"ReockPS")
g4=mapScore(dists,"Schwartzbe")
g5=mapScore(dists,"PolsbyPopp")
grid.arrange(g1,g2,g3,g4,g5,ncol=2)
If we want to more systematically compare all of the redistricting options, we can get compactness scores for all of the iterations.
getScoresFn = function(mapdata) {
  # Convert the districts to WKT
  wkt_str <- lapply(st_geometry(mapdata), st_as_text)
  # Retrieve compactness scores from mandeR
  scores <- lapply(wkt_str, getScoresForWKT)
  scores = do.call(rbind, scores)
  scores$id = 1:nrow(scores)
  # Merge scores back into districts
  merge(mapdata, scores, by.x = "district", by.y = "id")
}
scoresPerIter=mclapply(maps,getScoresFn,mc.cores = detectCores())
Then we can plot the density of the scores across iterations. This can help us see which particular redistricting plans are extreme relative to the simulated distribution (extremity of an enacted plan is sometimes used as evidence of intentional gerrymandering).
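A sketch of one such density plot, pooling the Polsby-Popper column from every iteration's score table (st_drop_geometry, from a reasonably recent sf, drops the geometry column before stacking):

```r
## Pool Polsby-Popper scores from every iteration and plot their density.
## Assumes scoresPerIter from above: a list of sf data frames with a
## PolsbyPopp column (the mandeR score name used earlier in the post).
allscores = do.call(rbind, lapply(scoresPerIter, st_drop_geometry))
ggplot(allscores, aes(x = PolsbyPopp)) +
  geom_density() +
  labs(x = "Polsby-Popper score", y = "density")
```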
Now we can: