    spuriouscorrelations: An R package to show examples about spurious correlations

    Published by pacha.dev/blog on 2025-05-17 04:00:00
    [This article was first published on pacha.dev/blog and kindly contributed to R-bloggers.]

    I’ve been busy with the field exams, so I haven’t had much time to work on the blog.

    The spuriouscorrelations package started as a fun project for one of my tutorials.

    Here is a case of an interesting correlation: the number of people who drowned by falling into a pool and the number of films Nicolas Cage appeared in.

    library(spuriouscorrelations)
    library(dplyr)
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:stats':
    #>
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #>
    #>     intersect, setdiff, setequal, union
    library(ggplot2)
    
    # list the first variable of each available pair
    unique(spurious_correlations$var1)
    #>  [1] Suicides by hanging, strangulation and suffocation              
    #>  [2] Number of people who drowned by falling into a pool             
    #>  [3] Number of people who died by becoming tangled in their bedsheets
    #>  [4] Murders by steam, hot vapours and hot objects                   
    #>  [5] Computer science doctorates awarded in the US                   
    #>  [6] Sociology doctorates awarded in the US                          
    #>  [7] Civil engineering doctorates awarded in the US                  
    #>  [8] People who drowned after falling out of a fishing boat          
    #>  [9] Drivers killed in collision with railway train                  
    #> [10] Total US crude oil imports                                      
    #> [11] Number of people who drowned while in a swimming-pool           
    #> [12] Suicides by crashing of motor vehicle                           
    #> [13] Number of people killed by venomous spiders                     
    #> [14] Mathematics doctorates awarded                                  
    #> 14 Levels: Civil engineering doctorates awarded in the US ...

    # keep only the pool drownings pair
    drownings <- spurious_correlations %>%
      filter(
        var1 == "Number of people who drowned by falling into a pool"
      ) %>%
      select(year, var1, var2, var1_value, var2_value)
    
    # Pearson correlation between the two series
    cor(drownings$var1_value, drownings$var2_value)
    #> [1] 0.6660043
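
To make it clearer what `cor()` reports above, here is a minimal sketch that computes the Pearson coefficient from its definition and compares it with `cor()`. The vectors `x` and `y` are hypothetical toy values, not data from the package.

```r
# Hypothetical toy data (not taken from spurious_correlations)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Pearson correlation: sum of cross-deviations divided by the
# square root of the product of the sums of squared deviations
dx <- x - mean(x)
dy <- y - mean(y)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# cor() uses method = "pearson" by default, so both values should agree
r_builtin <- cor(x, y)
all.equal(r_manual, r_builtin)
```

The coefficient only measures how closely the two series move together linearly. It says nothing about why they do, which is the point of the package.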

    Now let’s plot the data.

    # compute a scale factor so that max(var2_value * ratio) equals max(var1_value)
    max1 <- max(drownings$var1_value)
    max2 <- max(drownings$var2_value)
    ratio <- max1 / max2
    
    ggplot(drownings, aes(x = year)) +
      geom_line(aes(y = var1_value, color = "Drownings")) +
      geom_line(aes(y = var2_value * ratio, color = "Films")) +
      scale_y_continuous(
        name = "Number of drownings",
        sec.axis = sec_axis(~ . / ratio,
          name = "Number of films"
        ),
        limits = c(0, NA)
      ) +
      scale_color_manual(
        name = "",
        values = c(
          "Drownings" = "blue",
          "Films" = "red"
        )
      ) +
      theme_minimal() +
      labs(
        title = "Number of people who drowned by falling into a pool vs.\nNumber of films Nicolas Cage appeared in",
        caption = "Source: Spurious Correlations (Vigen 2015)"
      )
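
The dual-axis trick in the plot above relies on a pair of inverse transformations: the second series is multiplied by `ratio` so it can be drawn on the primary axis, and `sec_axis(~ . / ratio)` divides the tick values by the same factor to label the secondary axis in the original units. A minimal base R sketch of that round trip, using hypothetical toy vectors `primary` and `secondary` rather than the package data:

```r
# Hypothetical toy series on very different scales
primary   <- c(100, 120, 90, 110)
secondary <- c(2, 4, 1, 3)

# Scale factor that aligns the two maxima (120 / 4 = 30 here)
ratio <- max(primary) / max(secondary)

# Values actually drawn against the primary axis
scaled <- secondary * ratio

# Values the secondary axis labels show after undoing the scaling
labels_back <- scaled / ratio

max(scaled) == max(primary)          # the maxima coincide
all.equal(labels_back, secondary)    # the original units are recovered
```

Because the two axes are only linked by this arbitrary factor, the visual overlap of the lines depends on the scaling choice and should not be read as evidence of a relationship.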

    Interested? You can install the package from GitHub:

    pak::pkg_install("pachadotdev/spuriouscorrelations")