I wanted a simple way to group items across restaurants, so I’m going to follow this example and use hierarchical clustering via the Levenshtein Distance (typically used for string distances).
Let’s see what groups we end up with.
1 2 3 4 5 6 7 8 9 10
57 368 39 5 15 12 4 4 2 9
$`1`
cut item restaurant
1 1 Artisan Grilled Chicken Sandwich Mcdonalds
58 1 Chargrilled Chicken Club Sandwich Chick Fil-A
59 1 Chargrilled Chicken Sandwich Chick Fil-A
73 1 4 Piece Grilled Chicken Nuggets Chick Fil-A
74 1 6 Piece Grilled Chicken Nuggets Chick Fil-A
75 1 8 piece Grilled Chicken Nuggets Chick Fil-A
76 1 12 Piece Grilled Chicken Nuggets Chick Fil-A
77 1 Spicy Grilled Chicken Sub Sandwich Chick Fil-A
78 1 Regular Grilled Chicken Sub Sandwich Chick Fil-A
79 1 Smokehouse BBQ Bacon Sandwich Chick Fil-A
82 1 Chargrilled Chicken Cool Wrap Chick Fil-A
112 1 3 Piece Crispy Chicken Tender Dinner Sonic
113 1 5 Piece Crispy Chicken Tender Dinner Sonic
117 1 Small Jumbo Popcorn Chicken Sonic
118 1 Large Jumbo Popcorn Chicken Sonic
$`2`
cut item restaurant
2 2 Single Bacon Smokehouse Burger Mcdonalds
3 2 Double Bacon Smokehouse Burger Mcdonalds
6 2 Big Mac Mcdonalds
7 2 Cheeseburger Mcdonalds
8 2 Classic Chicken Sandwich Mcdonalds
9 2 Double Cheeseburger Mcdonalds
10 2 Double Quarter Pounder® with Cheese Mcdonalds
11 2 Filet-O-Fish® Mcdonalds
12 2 Garlic White Cheddar Burger Mcdonalds
15 2 Hamburger Mcdonalds
16 2 Lobster Roll Mcdonalds
17 2 Maple Bacon Dijon 1/4 lb Burger Mcdonalds
20 2 McChicken Mcdonalds
21 2 McDouble Mcdonalds
22 2 McRib Mcdonalds
$`3`
cut item restaurant
4 3 Grilled Bacon Smokehouse Chicken Sandwich Mcdonalds
5 3 Crispy Bacon Smokehouse Chicken Sandwich Mcdonalds
13 3 Grilled Garlic White Cheddar Chicken Sandwich Mcdonalds
14 3 Crispy Garlic White Cheddar Chicken Sandwich Mcdonalds
18 3 Grilled Maple Bacon Dijon Chicken Sandwich Mcdonalds
19 3 Crispy Maple Bacon Dijon Chicken Sandwich Mcdonalds
24 3 Grilled Pico Guacamole Chicken Sandwich Mcdonalds
25 3 Crispy Pico Guacamole Chicken Sandwich Mcdonalds
26 3 Premium Buttermilk Crispy Chicken Deluxe Sandwich Mcdonalds
27 3 Premium Crispy Chicken Deluxe Sandwich Mcdonalds
30 3 Grilled Signature Sriracha Chicken Sandwich Mcdonalds
31 3 Crispy Signature Sriracha Chicken Sandwich Mcdonalds
33 3 Grilled Sweet BBQ Bacon Chicken Sandwich Mcdonalds
34 3 Crispy Sweet BBQ Bacon Chicken Sandwich Mcdonalds
35 3 3 piece Buttermilk Crispy Chicken Tenders Mcdonalds
$`4`
cut item restaurant
46 4 4 piece Sweet N' Spicy Honey BBQ Glazed Tenders Mcdonalds
47 4 6 piece Sweet N' Spicy Honey BBQ Glazed Tenders Mcdonalds
48 4 10 piece Sweet N' Spicy Honey BBQ Glazed Tenders Mcdonalds
270 4 4 Piece Chicken Strip Basket w/ Country Gravy Dairy Queen
271 4 6 Piece Chicken Strip Basket w/ Country Gravy Dairy Queen
$`5`
cut item
49 5 Premium Asian Salad w/o Chicken
50 5 Premium Asian Salad w/ Grilled Chicken
51 5 Premium Asian Salad w/ Crispy Chicken
52 5 Premium Bacon Ranch Salad w/o Chicken
53 5 Premium Bacon Ranch Salad w/ Grilled Chicken
54 5 Premium Bacon Ranch Salad w/ Crispy Chicken
55 5 Premium Southwest Salad w/o Chicken
56 5 Premium Southwest Salad w/ Grilled Chicken
57 5 Premium Southwest Salad w/ Crispy Chicken
225 5 Chicken BLT Salad w/ Grilled Chicken
226 5 Chicken BLT Salad w/ Crispy Chicken
227 5 Chicken Caesar Salad w/ Grilled Chicken
228 5 Chicken Caesar Salad w/ Crispy Chicken
229 5 Chicken, Apple & Cranberry Salad w/ Grilled Chicken
230 5 Chicken, Apple & Cranberry Salad w/ Crispy Chicken
restaurant
49 Mcdonalds
50 Mcdonalds
51 Mcdonalds
52 Mcdonalds
53 Mcdonalds
54 Mcdonalds
55 Mcdonalds
56 Mcdonalds
57 Mcdonalds
225 Burger King
226 Burger King
227 Burger King
228 Burger King
229 Burger King
230 Burger King
$`6`
cut item restaurant
92 6 Sonic Bacon Cheeseburger (w/mayo) Sonic
99 6 Super Sonic Bacon Double Cheeseburger (w/mayo) Sonic
100 6 Super Sonic Double Cheeseburger W/ Mustard Sonic
101 6 Super Sonic Double Cheeseburger W/ Ketchup Sonic
102 6 Super Sonic Double Cheeseburger W/ Mayo Sonic
103 6 Super Sonic Jalapeno Double Cheeseburger Sonic
435 6 Cool Ranch® Doritos® Double Decker® Taco Taco Bell
442 6 Spicy Sweet Double Stacked Taco Taco Bell
443 6 Cool Ranch Habanero Double Stacked Taco Taco Bell
444 6 Nacho Crunch Double Stacked Taco Taco Bell
445 6 Fiery Doritos® Double Decker® Taco Taco Bell
449 6 Nacho Cheese Doritos® Double Decker® Taco Taco Bell
$`7`
cut
223 7
224 7
231 7
232 7
item
223 Bacon Cheddar Ranch Chicken Salad w/ grilled Chicken & Dressing
224 Bacon Cheddar Ranch Chicken Salad w/ crispy Chicken & Dressing
231 Garden Grilled Chicken Salad w/ Grilled Chicken, no dressing
232 Garden Grilled Chicken Salad w/ Crispy Chicken, no dressing
restaurant
223 Burger King
224 Burger King
231 Burger King
232 Burger King
$`8`
cut item
233 8 Side Caesar Salad with dressing
234 8 Side Garden Salad and Avocado Ranch Dressing
374 8 Buffalo Chicken Salad (with Ranch dressing)
377 8 Chicken & Bacon Ranch Melt Salad (includes Ranch dressing)
restaurant
233 Burger King
234 Burger King
374 Subway
377 Subway
$`9`
cut item restaurant
362 9 6" Turkey Italiano Melt (with Provolone) Subway
363 9 Footlong Turkey Italiano Melt (with Provolone) Subway
$`10`
cut item restaurant
436 10 Cool Ranch® Doritos® Locos Taco Taco Bell
437 10 Cool Ranch® Doritos® Locos Taco Supreme Taco Bell
446 10 Fiery Doritos® Locos Taco Taco Bell
447 10 Fiery Doritos® Locos Taco Supreme Taco Bell
450 10 Nacho Cheese Doritos® Locos Tacos Taco Bell
451 10 Nacho Cheese Doritos® Locos Tacos Supreme Taco Bell
470 10 Doritos® Cheesy Gordita Crunch - Cool Ranch Taco Bell
471 10 Doritos® Cheesy Gordita Crunch - Fiery Taco Bell
472 10 Doritos® Cheesy Gordita Crunch - Nacho Cheese Taco Bell
## should be purrr-ing
First things first, the Doritos + Tacobell pairing is in a league of its own.
Just from looking at the top 15:
Group 1: chicken
Group 2: sandwiches (no chicken) [but chicken items are in here, just not in head]
Group 3: chicken sandwiches
Group 4: chicken
Group 5: chicken salads
Group 6: doubles
Group 7: chicken salads with dressing
Group 8: salad with dressing
Group 9: Subway
Group 10: Doritos + Tacobell
Where are the other Subway items? Where are the Arby’s items hiding?
lapply(
split(df, df$restaurant),
function(x) {
x %>%
group_by(cut) %>%
summarise(count = n()) %>%
mutate(prop = count / sum(count))
}
)
$Arbys
# A tibble: 3 x 3
cut count prop
* <int> <int> <dbl>
1 1 16 0.291
2 2 37 0.673
3 3 2 0.0364
$`Burger King`
# A tibble: 6 x 3
cut count prop
* <int> <int> <dbl>
1 1 2 0.0286
2 2 51 0.729
3 3 5 0.0714
4 5 6 0.0857
5 7 4 0.0571
6 8 2 0.0286
$`Chick Fil-A`
# A tibble: 2 x 3
cut count prop
* <int> <int> <dbl>
1 1 10 0.370
2 2 17 0.630
$`Dairy Queen`
# A tibble: 3 x 3
cut count prop
* <int> <int> <dbl>
1 1 1 0.0238
2 2 39 0.929
3 4 2 0.0476
$Mcdonalds
# A tibble: 5 x 3
cut count prop
* <int> <int> <dbl>
1 1 1 0.0175
2 2 24 0.421
3 3 20 0.351
4 4 3 0.0526
5 5 9 0.158
$Sonic
# A tibble: 4 x 3
cut count prop
* <int> <int> <dbl>
1 1 6 0.113
2 2 29 0.547
3 3 12 0.226
4 6 6 0.113
$Subway
# A tibble: 4 x 3
cut count prop
* <int> <int> <dbl>
1 1 14 0.146
2 2 78 0.812
3 8 2 0.0208
4 9 2 0.0208
$`Taco Bell`
# A tibble: 4 x 3
cut count prop
* <int> <int> <dbl>
1 1 7 0.0609
2 2 93 0.809
3 6 6 0.0522
4 10 9 0.0783
Most of the Subway and Arby’s items are in the biggest group. Increasing the number of clusters would eventually break this group up further.
What makes some restaraunts more spread out across groups than others? Since the distance metric counts additions, deletions, and substitutions, there is some dependence on the length of the item name.
ff$len <- nchar(ff$item)
ggplot(ff, aes(len, col = restaurant, group = restaurant)) +
geom_density()
Hm… nothing is jumping out at me here.
Are there differences in nutrient levels across clusters? I’m going to color the clusters that I preliminarily marked as “salad” groups to see if they are healthier. Remember, that these groups have all salads, but there are salads in other groups.
toP <- df %>%
group_by(cut) %>%
summarise_if(is.numeric, mean, na.rm = T) %>%
gather(nutrient, value, -cut)
toP$isSalad <- as.factor(ifelse(toP$cut %in% c(5, 7, 8), 1, 0))
ggplot(toP, aes(as.factor(cut), value, col = isSalad)) +
geom_col() +
facet_wrap(~nutrient, scales = "free_y") +
xlab("cluster") +
ylab("mean value")
The purely salad groups still have fairly high cholesterol and saturated fat levels. They also have high values of vitamin A and C though, so it’s not all bad.
Why are chicken items so spread out among groups? Again, is this due to a difference in the length of item description? We can also see the distribution of distances between an item and all other items by whether or not the item has “chicken” in it to see if that gives us any clues.
dim(d) ## distance between each item and every other item
[1] 515 515
dim(ff)
[1] 515 17
ff$med <- apply(d, 1, median)
ff$max <- apply(d, 1, max)
ff$sd <- apply(d, 1, sd)
ff$isChicken <- as.factor(ifelse(grepl("Chicken", ff$item), 1, 0))
ff$len <- nchar(ff$item)
g1 <- ggplot(ff, aes(med, col = isChicken, group = isChicken)) +
geom_density() +
xlab("median string distance between other items")
g2 <- ggplot(ff, aes(max, col = isChicken, group = isChicken)) +
geom_density() +
xlab("max string distance between other items")
g3 <- ggplot(ff, aes(sd, col = isChicken, group = isChicken)) +
geom_density() +
xlab("sd string distance between other items")
g4 <- ggplot(ff, aes(len, col = isChicken, group = isChicken)) +
geom_density() +
xlab("length of item name")
grid.arrange(g1, g2, g3, g4, ncol = 2)
Chicken items do seem to be further in string distance from other items than non-chicken items. They also have slightly longer item name lengths. However, non-chicken items actually have higher maximum distances from other items and a wider range of variability in distances.
Disclaimer: This is not real text analysis; I’m just using some fancy-ish tools to explore. If anyone has ideas about how to make this more rigorous, I’d be curious to learn more.