IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    Summarising Dates with Missing Values

    R on Alan Yeung发表于 2023-10-02 00:00:00
    love 0

    This blog post is just a note that when you try to do a grouped summary of a date variable but some groups have all missing values, it will return Inf. This means that the summary will not show up as an NA and this can cause issues in analysis if you are not careful.

    library(tidyverse)
    
    df <- tibble::tribble(
        ~id,          ~dt,
        1L, "01/01/2001",
        1L,           NA,
        2L,           NA,
        2L,           NA
    ) %>% 
        mutate(dt = dmy(dt))
    
    z1 <- df %>% 
        group_by(id) %>% 
        summarise(dt_min = min(dt, na.rm = TRUE),
                  .groups = "drop")
    
    z1
    # A tibble: 2 × 2
    #      id dt_min    
    #   <int> <date>    
    # 1     1 2001-01-01
    # 2     2 Inf
    
    sum(is.na(z1$dt_min))
    # [1] 0

    There are a couple of ways around this. Firstly you can use an if() statement.

    z2 <- df %>% 
        group_by(id) %>% 
        summarise(dt_min = if (all(is.na(dt))) NA_Date_ else min(dt, na.rm = TRUE),
                  .groups = "drop")
    
    z2
    # A tibble: 2 × 2
    #      id dt_min    
    #   <int> <date>    
    # 1     1 2001-01-01
    # 2     2 NA  
    sum(is.na(z2$dt_min))
    # [1] 1

    Or you can summary functions from the hablar package.

    z3 <- df %>% 
        group_by(id) %>% 
        summarise(dt_min = hablar::min_(dt),
                  .groups = "drop")
    
    z3
    # A tibble: 2 × 2
    #      id dt_min    
    #   <int> <date>    
    # 1     1 2001-01-01
    # 2     2 NA  
    sum(is.na(z3$dt_min))
    # [1] 1

    Is there a reason why R decides to return Inf when summarising dates? Are there any other solutions to summarising date variables that contain missing values? Leave me a comment if you know thanks.

    Continue reading: Summarising Dates with Missing Values


沪ICP备19023445号-2号
友情链接