Day 2 of 30DayMapChallenge: « Lines » (previously).
We’ll map the gender of the first names that appear in Lyon’s street names. For that we need a database of French first names that tells us the gender of each name, and we will extract the streets of Lyon from OpenStreetMap.
library(arrow)
library(dplyr)
library(tidyr)
library(readr)
library(purrr)
library(ggplot2)
library(stringr)
library(sf)
library(osmdata)
library(ggspatial)
library(glue)
library(knitr)

set.seed(42)
if (!file.exists("freq_prenoms.rds")) {
  freq_prenoms <- read_parquet("") |>
    filter(preusuel != "_PRENOMS_RARES") |>
    # strip accents so names compare equal to the transliterated street names
    mutate(preusuel = iconv(preusuel, to = "ASCII//TRANSLIT")) |>
    group_by(preusuel, sexe) |>
    summarise(n = sum(nombre, na.rm = TRUE), .groups = "drop_last") |>
    # still grouped by preusuel here: total count per name, both sexes
    mutate(total = sum(n)) |>
    ungroup() |>
    mutate(sexe = case_when(sexe == 1 ~ "M",
                            sexe == 2 ~ "F",
                            .default = NA_character_)) |>
    pivot_wider(names_from = sexe, values_from = n, values_fill = 0) |>
    # turn counts into the share of each sex for the name
    mutate(across(c(M, F), \(x) x / total)) |>
    write_rds("freq_prenoms.rds")
} else {
  freq_prenoms <- read_rds("freq_prenoms.rds")
}
We have 34234 first names and their gender frequencies since 1900.
preusuel | total | M | F |
ZENABOU | 48 | 0 | 1 |
EMILIENE | 25 | 0 | 1 |
KINGSLEY | 878 | 1 | 0 |
DOLOVAN | 73 | 1 | 0 |
ERCOLE | 67 | 1 | 0 |
YVA | 178 | 0 | 1 |
ISSEY | 79 | 1 | 0 |
SAWSSEN | 121 | 0 | 1 |
MISBAH | 24 | 0 | 1 |
GOHANN | 20 | 1 | 0 |
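To make the M and F columns concrete, here is a minimal sketch of the same computation on a toy table, written in base R rather than dplyr/tidyr. The counts are invented for illustration; only the column names (`preusuel`, `sexe`, `nombre`) and the coding 1 = male, 2 = female come from the code above.

```r
# Toy input with invented counts (hypothetical, not INSEE data)
births <- data.frame(
  preusuel = c("CAMILLE", "CAMILLE", "NELLIE", "VICTOR"),
  sexe     = c(1, 2, 2, 1),
  nombre   = c(30, 70, 50, 40)
)

# Recode the sex as in the post: 1 = M, 2 = F
births$sexe <- ifelse(births$sexe == 1, "M", "F")

# Sum the counts per name and sex, then divide each row by its total
counts <- xtabs(nombre ~ preusuel + sexe, data = births)
shares <- prop.table(counts, margin = 1)

print(shares)
```

In this toy table CAMILLE gets F = 0.7 and M = 0.3, so a 0.8 threshold on either column would leave it out of both name lists.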
lyon_bbox <- getbb("Lyon, France", featuretype = "city")

if (!file.exists("osm.rds")) {
  lyon <- opq(lyon_bbox) |>
    add_osm_features(features = c(
      '"highway"="motorway"',
      '"highway"="trunk"',
      '"highway"="primary"',
      '"highway"="secondary"',
      '"highway"="tertiary"',
      '"highway"="motorway_link"',
      '"highway"="trunk_link"',
      '"highway"="primary_link"',
      '"highway"="secondary_link"',
      '"highway"="tertiary_link"',
      '"highway"="motorway_junction"',
      '"highway"="unclassified"',
      '"highway"="service"',
      '"highway"="pedestrian"',
      '"highway"="living_street"',
      '"highway"="residential"')) |>
    osmdata_sf() |>
    pluck("osm_lines") |>
    select(osm_id, name) |>
    drop_na(name) |>
    # merge all segments sharing a name into a single feature
    group_by(name) |>
    summarise() |>
    write_rds("osm.rds")
} else {
  lyon <- read_rds("osm.rds")
}
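Both the first-name table and the OSM extract are cached in an `.rds` file so that the slow step runs only once. As a sketch, the pattern could be factored into a small helper; `cached()` and `slow_step()` are hypothetical names, and base R's `saveRDS()`/`readRDS()` stand in for the readr wrappers used above.

```r
# Hypothetical helper: compute a value once, then reuse the file on later runs
cached <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  value <- compute()
  saveRDS(value, path)
  value
}

# Demonstration with a temporary file and a counter of real computations
path <- tempfile(fileext = ".rds")
calls <- 0
slow_step <- function() {
  calls <<- calls + 1
  data.frame(x = 1:3)
}

first <- cached(path, slow_step)   # computes and writes the file
second <- cached(path, slow_step)  # reads the file, slow_step() is not called
```

Deleting the `.rds` file forces a fresh download or recomputation, which is also how the original `if (!file.exists(...))` blocks behave.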
We use a brute-force method: for each street, we check whether any word of its name appears in our list of female or male first names. We keep only first names that are given to one gender more than 80% of the time.
female <- freq_prenoms |>
  filter(F > .8, str_length(preusuel) > 1, preusuel != "LA") |>
  pull(preusuel)

male <- freq_prenoms |>
  filter(M > .8, str_length(preusuel) > 1) |>
  pull(preusuel)

# One alternation wrapped in word boundaries, \b(?:A|B|...)\b, so that every
# name (including the first and the last of the list) only matches a whole word
name_regex <- function(names) {
  str_c("\\b(?:", str_c(names, collapse = "|"), ")\\b")
}

street_gender <- lyon |>
  mutate(name = str_to_upper(iconv(name, to = "ASCII//TRANSLIT")),
         m = str_extract_all(name, name_regex(male)),
         f = str_extract_all(name, name_regex(female)),
         # compare the number of male and female first names found per street
         gender = unlist(map2(f, m,
                       ~ case_when(length(.y) > length(.x) ~ "male",
                                   length(.x) > length(.y) ~ "female",
                                   identical(.x, character(0)) & identical(.y, character(0)) ~ "not concerned",
                                   length(.x) == length(.y) ~ "undecidable",
                                   .default = NA_character_))))
name | geometry | m | f | gender |
COURS DE VERDUN RECAMIER | LINESTRING (4.830426 45.748… | | | not concerned |
IMPASSE DES ANGLAIS | LINESTRING (4.795807 45.753… | | | not concerned |
RUE DES PROVENCES | LINESTRING (4.79335 45.7369… | | | not concerned |
CHEMIN DES PEUPLIERS | LINESTRING (4.866587 45.801… | | | not concerned |
ALLEE DU LEVANT | LINESTRING (4.878859 45.759… | | | not concerned |
RUE ROPOSTE | LINESTRING (4.866353 45.760… | | | not concerned |
ALLEE NELLIE BLY | LINESTRING (4.84882 45.7429… | | NELLIE | female |
LA VIEILLE ROUTE | LINESTRING (4.769782 45.720… | | | not concerned |
AVENUE DE CHAMPAGNE | MULTILINESTRING ((4.796801 … | | | not concerned |
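The decision rule behind the `gender` column can be sketched on a handful of street names. This is a simplified base R version, not the code used for the map: the two name lists are tiny invented samples, and `name_pattern()`, `count_matches()` and `classify_street()` are hypothetical helpers.

```r
# Tiny invented name lists (the real ones come from freq_prenoms)
male <- c("JEAN", "VICTOR")
female <- c("NELLIE", "MARIE")

# Build one alternation wrapped in word boundaries: \b(?:A|B)\b
name_pattern <- function(names) {
  paste0("\\b(?:", paste(names, collapse = "|"), ")\\b")
}

# Number of whole-word matches of `pattern` in each element of `x`
count_matches <- function(x, pattern) {
  lengths(regmatches(x, gregexpr(pattern, x, perl = TRUE)))
}

# More male names -> "male", more female names -> "female",
# no name at all -> "not concerned", a tie -> "undecidable"
classify_street <- function(street) {
  n_m <- count_matches(street, name_pattern(male))
  n_f <- count_matches(street, name_pattern(female))
  ifelse(n_m > n_f, "male",
    ifelse(n_f > n_m, "female",
      ifelse(n_m == 0, "not concerned", "undecidable")))
}

streets <- c("ALLEE NELLIE BLY", "RUE VICTOR HUGO", "RUE JEAN MARIE DUPONT",
             "CHEMIN DES PEUPLIERS", "RUE JEANNE")
result <- classify_street(streets)
print(data.frame(street = streets, gender = result))
```

Here "RUE JEAN MARIE DUPONT" is a tie and comes out as undecidable, while "RUE JEANNE" stays not concerned: the word boundary prevents JEAN from matching inside JEANNE, and JEANNE itself is not in the toy list.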
street_gender |>
  mutate(gender = factor(gender,
                         levels = c("female", "male", "undecidable", "not concerned"))) |>
  st_set_crs("EPSG:4326") |>
  ggplot() +
  geom_sf(aes(color = gender),
          linewidth = .5,
          key_glyph = "timeseries") +
  scale_color_manual(values = c("female" = "lightpink1",
                                "male" = "lightskyblue",
                                "undecidable" = "lightyellow4",
                                "not concerned" = "seashell2")) +
  annotation_scale(bar_cols = c("darkgrey", "white"),
                   line_col = "darkgrey",
                   text_col = "darkgrey",
                   height = unit(0.1, "cm")) +
  coord_sf(xlim = lyon_bbox[c(1, 3)],
           ylim = lyon_bbox[c(2, 4)]) +
  labs(title = "Gender in Lyon street names",
       color = "",
       caption = glue("Map data © OpenStreetMap contributors using INSEE Fichier des prénoms 2023 - {Sys.Date()}")) +
  theme_void() +
  theme(plot.background = element_rect(color = NA, fill = "white"),
        plot.caption = element_text(size = 5, color = "darkgrey"))
Many biases make this map unreliable; it would need manual editing…