IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    Working with Ordinal Ranks in {marginaleffects}

    Stat's What It's All About发表于 2025-04-30 21:00:00
    love 0
    [This article was first published on Stat's What It's All About, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
    Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

    Given an ordinal regression model, it is relatively easy to get class-wise predictions – the conditional predicted probability of each level of the outcome. However, often, one might be interested in summarizing effects not on the class-wise probability scale (nor on the latent scale), but instead on the rank scale – the expected ordinal level, expressed as a single number.

    The {emmeans} package has the mode = "mean.class" that does just this.

    Let’s see how we can do this in {marginaleffects} by utilizing the powerful hypothesis= argument.

    Fit a model

    library(tidyverse)
    
    library(marginaleffects)
    
    bfi <- psych::bfi |> 
      mutate(
        gender = factor(gender),
        A1 = ordered(A1)
      ) |> 
      drop_na(A1, age, gender)
    
    nrow(bfi)
    #> [1] 2784
    
    fit <- MASS::polr(A1 ~ age + gender, 
                      data = bfi,
                      Hess = TRUE)

    Class-Wise Predictions

    pr1 <- predictions(fit, variables = "gender")
    pr1
    #> 
    #>  Group Estimate Std. Error     z Pr(>|z|)     S   2.5 % 97.5 %
    #>      1   0.1800    0.01136 15.85   <0.001 185.5 0.15777 0.2023
    #>      1   0.1886    0.01128 16.71   <0.001 205.9 0.16649 0.2107
    #>      1   0.1843    0.01132 16.28   <0.001 195.6 0.16210 0.2065
    #>      1   0.1843    0.01132 16.28   <0.001 195.6 0.16210 0.2065
    #>      1   0.1843    0.01132 16.28   <0.001 195.6 0.16210 0.2065
    #> --- 33398 rows omitted. See ?print.marginaleffects --- 
    #>      6   0.0289    0.00335  8.64   <0.001 57.3 0.02234 0.0355
    #>      6   0.0231    0.00264  8.78   <0.001 59.1 0.01798 0.0283
    #>      6   0.0219    0.00250  8.76   <0.001 58.8 0.01699 0.0268
    #>      6   0.0207    0.00238  8.71   <0.001 58.1 0.01604 0.0254
    #>      6   0.0121    0.00165  7.35   <0.001 42.2 0.00891 0.0154
    #> Type:  probs
    
    nrow(pr1) # original data * 2 (counterfactual gender) * 6 (levels of A1)
    #> [1] 33408

    For each observation in the original data frame we now have 12 predictions:
    - 2 counterfactual predictions for the two levels of gender, times
    - 6 (arguably also counterfactual) predictions for each of the possible outcome levels.

    Sum-scores (mean rank)

    We can average the class ranks by their predicted probability to obtain “sum scores”:

    And we can do this for each row in the counterfactual data frame (marked by the rowidcf variable):

    sum_score <- function(x) {
      x |> 
        arrange(rowidcf, group, gender) |> 
        # for each counterfactual row and level of gender
        summarise(
          term = "sum score",
          estimate = sum(estimate * (1:6)), 
          
          .by = c(rowidcf, gender)
        )
    }
    
    pr2 <- predictions(fit, variables = "gender", 
                       hypothesis = sum_score)
    pr2
    #> 
    #>  Estimate Std. Error    z Pr(>|z|)   S 2.5 % 97.5 %
    #>      3.02     0.0583 51.8   <0.001 Inf  2.90   3.13
    #>      2.51     0.0451 55.6   <0.001 Inf  2.42   2.60
    #>      2.97     0.0553 53.7   <0.001 Inf  2.86   3.08
    #>      2.46     0.0415 59.4   <0.001 Inf  2.38   2.55
    #>      3.00     0.0568 52.8   <0.001 Inf  2.88   3.11
    #> --- 5558 rows omitted. See ?print.marginaleffects --- 
    #>      2.24     0.0305 73.4   <0.001   Inf  2.18   2.30
    #>      2.67     0.0473 56.4   <0.001   Inf  2.57   2.76
    #>      2.20     0.0304 72.2   <0.001   Inf  2.14   2.26
    #>      2.26     0.0645 35.0   <0.001 890.8  2.13   2.38
    #>      1.85     0.0457 40.6   <0.001   Inf  1.77   1.94
    #> Term: sum score
    #> Type:  probs
    
    nrow(pr2) # original data * 2 (counterfactual gender)
    #> [1] 5568

    As expected, we now have 2 predictions for each observation in the original data frame: - 2 counterfactual predictions for the two levels of gender

    Note that we could have also computed the mean rank for the two levels of gender, or literally anything else. But here I am trying to mimic the basic behavior of the avg_/comparisons() functions by keeping with the workflow:

    1. Compute observation wise counterfactual predictions
    2. Contrast them for each observation
    3. (Possibly) average them

    So let’s get those contrasts!

    Contrast the ranks

    sum_score.contr <- function(x) {
      # same as before, and...
      sum_score(x) |> 
        # compute a single contrast for each counterfactual row
        summarise(
          term = "gender (2-1)",
          estimate = estimate[gender == "2"] - estimate[gender == "1"], 
          
          .by = rowidcf
        )
    }

    This function will compute a difference between the rank for each gender for each observation.

    pr3 <- predictions(fit, variables = "gender", 
                       hypothesis = sum_score.contr)
    pr3
    #> 
    #>  Estimate Std. Error     z Pr(>|z|)    S  2.5 % 97.5 %
    #>    -0.511     0.0588 -8.68   <0.001 57.8 -0.626 -0.395
    #>    -0.506     0.0584 -8.67   <0.001 57.7 -0.621 -0.392
    #>    -0.509     0.0586 -8.68   <0.001 57.8 -0.623 -0.394
    #>    -0.509     0.0586 -8.68   <0.001 57.8 -0.623 -0.394
    #>    -0.509     0.0586 -8.68   <0.001 57.8 -0.623 -0.394
    #> --- 2774 rows omitted. See ?print.marginaleffects --- 
    #>    -0.504     0.0582 -8.67   <0.001 57.6 -0.618 -0.390
    #>    -0.483     0.0560 -8.62   <0.001 57.1 -0.593 -0.373
    #>    -0.477     0.0554 -8.61   <0.001 56.9 -0.586 -0.368
    #>    -0.471     0.0548 -8.59   <0.001 56.7 -0.578 -0.363
    #>    -0.403     0.0488 -8.26   <0.001 52.6 -0.499 -0.307
    #> Term: gender (2-1)
    #> Type:  probs
    
    nrow(pr3) # original data
    #> [1] 2784

    Average Contrast

    Finally, we can get the average difference:

    # average contrast 
    avg_sum_score.contr <- function(x) {
      sum_score.contr(x) |> 
        summarise(
          term = "avg. gender (2-1)",
          estimate = mean(estimate)
        )
    }
    
    predictions(fit, variables = "gender", 
                hypothesis = avg_sum_score.contr)
    #> 
    #>  Estimate Std. Error     z Pr(>|z|)    S  2.5 % 97.5 %
    #>    -0.474     0.0551 -8.61   <0.001 56.9 -0.582 -0.366
    #> 
    #> Term: avg. gender (2-1)
    #> Type:  probs

    POMP

    POMP (percent of maximum possible) are a for of standardized units for Likert-type items (see Solomon Kurz’s excellent blog post for more – better – info).

    These are linear transformations of the ranks described above:

    We can obtain POMPs for each row in the counterfactual data frame by further processing the sum-scores:

    POMP <- function(x) {
      sum_score(x) |> 
        mutate(
          estimate = 100 * (estimate - 1) / (6 - 1)
        )
    }
    
    predictions(fit, variables = "gender", 
                hypothesis = POMP)
    #> 
    #>  Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 %
    #>      40.4      1.166 34.6   <0.001 870.6  38.1   42.7
    #>      30.2      0.902 33.5   <0.001 812.8  28.4   31.9
    #>      39.4      1.107 35.6   <0.001 920.9  37.3   41.6
    #>      29.3      0.829 35.3   <0.001 905.5  27.7   30.9
    #>      39.9      1.135 35.1   <0.001 896.3  37.7   42.1
    #> --- 5558 rows omitted. See ?print.marginaleffects --- 
    #>      24.7      0.609 40.6   <0.001   Inf  23.5   25.9
    #>      33.3      0.945 35.3   <0.001 902.9  31.5   35.2
    #>      23.9      0.608 39.3   <0.001   Inf  22.7   25.1
    #>      25.2      1.289 19.5   <0.001 279.4  22.6   27.7
    #>      17.1      0.914 18.7   <0.001 256.9  15.3   18.9
    #> Term: sum score
    #> Type:  probs

    And likewise get contrasts -

    POMP.contr <- function(x) {
      POMP(x) |> 
        summarise(
          term = "gender (2-1)",
          estimate = estimate[gender == "2"] - estimate[gender == "1"], 
          
          .by = rowidcf
        )
    }
    
    predictions(fit, variables = "gender", 
                hypothesis = sum_score.contr)
    #> 
    #>  Estimate Std. Error     z Pr(>|z|)    S  2.5 % 97.5 %
    #>    -0.511     0.0588 -8.68   <0.001 57.8 -0.626 -0.395
    #>    -0.506     0.0584 -8.67   <0.001 57.7 -0.621 -0.392
    #>    -0.509     0.0586 -8.68   <0.001 57.8 -0.623 -0.394
    #>    -0.509     0.0586 -8.68   <0.001 57.8 -0.623 -0.394
    #>    -0.509     0.0586 -8.68   <0.001 57.8 -0.623 -0.394
    #> --- 2774 rows omitted. See ?print.marginaleffects --- 
    #>    -0.504     0.0582 -8.67   <0.001 57.6 -0.618 -0.390
    #>    -0.483     0.0560 -8.62   <0.001 57.1 -0.593 -0.373
    #>    -0.477     0.0554 -8.61   <0.001 56.9 -0.586 -0.368
    #>    -0.471     0.0548 -8.59   <0.001 56.7 -0.578 -0.363
    #>    -0.403     0.0488 -8.26   <0.001 52.6 -0.499 -0.307
    #> Term: gender (2-1)
    #> Type:  probs

    And average contrasts -

    # average contrast 
    avg_POMP.contr <- function(x) {
      POMP.contr(x) |> 
        summarise(
          term = "avg. gender (2-1)",
          estimate = mean(estimate)
        )
    }
    
    predictions(fit, variables = "gender", 
                hypothesis = avg_POMP.contr)
    #> 
    #>  Estimate Std. Error     z Pr(>|z|)    S 2.5 % 97.5 %
    #>     -9.49        1.1 -8.61   <0.001 56.9 -11.6  -7.33
    #> 
    #> Term: avg. gender (2-1)
    #> Type:  probs

    A note for Bayesians

    Hey, Bayesians, how are you doing?

    I have some bad news: {marginaleffects} does not support hypothesis=<function>.

    But I also have some good news! You can basically do all of this by getting the posterior draws in an rvar format, and then directly manipulating the posterior(s) - so the examples above should basically work out of the box.

    For example:

    library(brms)
    library(posterior)
    
    fit_b <- brm(A1 ~ age + gender, 
                 data = bfi,
                 family = cumulative(),
                 prior = NULL) # obviously this is a bad prior
    
    pr1_b <- predictions(fit_b, variables = "gender")

    We just need to adapt the function(s) written above to work properly with rvars. I’ve marked

    avg_POMP.contr_b <- function(x) {
      x |> 
        arrange(rowidcf, group, gender) |> 
        # for each counterfactual row and level of gender
        summarise(
          term = "sum score",
          # estimate = sum(estimate * (1:6)), 
    1      estimate = rvar_sum(rvar * (1:6)),
          
          .by = c(rowidcf, gender)
        ) |> 
        # POMP
        mutate(
          estimate = 100 * (estimate - 1) / (6 - 1)
        ) |> 
        # contrasts
        summarise(
          term = "gender (2-1)",
          estimate = estimate[gender == "2"] - estimate[gender == "1"], 
          
          .by = rowidcf
        ) |> 
        # average
        summarise(
          term = "avg. gender (2-1)",
          # estimate = mean(estimate)
    2      estimate = rvar_mean(estimate)
        )
    }
    
    get_draws(pr1_b, shape = "rvar") |>
      avg_POMP.contr_b()
    #>                term   estimate
    #> 1 avg. gender (2-1) -9.4 ± 1.1
    1
    Use the rvar column and the rvar_sum() function (instead of sum())
    2
    Use the rvar_mean() function (instead of mean())
    To leave a comment for the author, please follow the link and comment on their blog: Stat's What It's All About.

    R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
    Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
    Continue reading: Working with Ordinal Ranks in {marginaleffects}


沪ICP备19023445号-2号
友情链接