We all need to pivot data at some point, so these are just some notes
for my own benefit really, because gather
and spread
are no longer
in favour within tidyr
.
I tended to only ever need gather
, and nearly always relied on the
same key and value names, so it was an easy function for me to use.
pivot_longer
and pivot_wider
are much more flexible, they just take a little bit more thinking about.
For example, my old approach has now changed along these lines
# old way with `gather` df %>% mutate(row = row_number()) %>% gather('column', 'source', -row, -N) # key = column, value = source, retain row and N # further transforms # new way with pivot_longer df %>% mutate(row = row_number()) %>% pivot_longer(!c(row , N), names_to = 'column', values_to = 'source') # further transforms
However, what I really want to do is show how to replicate much of the
tidyr
pivot functionality with data.table
.
Once again, this is not intended to be in-depth.
I have simply used the tidyr
help file code, and tried to replicate it
with data.table
.
I’d be interested in improvements to my data.table code.
Let’s pivot!
Note – in all examples, I’ll create a copy of the data set as a
data.table using setDT(copy(source_data))
Also, I intended to use code folding to show the datasets and results, but that’s gone horribly wrong, so you can run the code yourself.
You only need:
library(tidyr) library(data.table)
I’m using the base pipe for simplicity.
Using the built-in relig_income
dataset:
## # A tibble: 18 × 11 ## religion `<$10k` $10-2…¹ $20-3…² $30-4…³ $40-5…⁴ $50-7…⁵ $75-1…⁶ $100-…⁷ ## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 Agnostic 27 34 60 81 76 137 122 109 ## 2 Atheist 12 27 37 52 35 70 73 59 ## 3 Buddhist 27 21 30 34 33 58 62 39 ## 4 Catholic 418 617 732 670 638 1116 949 792 ## 5 Don’t know/r… 15 14 15 11 10 35 21 17 ## 6 Evangelical … 575 869 1064 982 881 1486 949 723 ## 7 Hindu 1 9 7 9 11 34 47 48 ## 8 Historically… 228 244 236 238 197 223 131 81 ## 9 Jehovah's Wi… 20 27 24 24 21 30 15 11 ## 10 Jewish 19 19 25 25 30 95 69 87 ## 11 Mainline Prot 289 495 619 655 651 1107 939 753 ## 12 Mormon 29 40 48 51 56 112 85 49 ## 13 Muslim 6 7 9 10 9 23 16 8 ## 14 Orthodox 13 17 23 32 32 47 38 42 ## 15 Other Christ… 9 7 11 13 13 14 18 14 ## 16 Other Faiths 20 33 40 46 49 63 46 40 ## 17 Other World … 5 2 3 4 2 7 3 4 ## 18 Unaffiliated 217 299 374 365 341 528 407 321 ## # … with 2 more variables: `>150k` <dbl>, `Don't know/refused` <dbl>, and ## # abbreviated variable names ¹`$10-20k`, ²`$20-30k`, ³`$30-40k`, ⁴`$40-50k`, ## # ⁵`$50-75k`, ⁶`$75-100k`, ⁷`$100-150k`
Code comparison
relig_income |> pivot_longer(!religion, # keep religion as a column names_to = "income", # desired name for new column values_to = "count") # what data goes into the new column? melt(DT, id.vars = "religion", variable.name = "income", value.name = "count", variable.factor = FALSE) # added to keep output consistent with tidyr
With data.table, you can often get away with only supplying either
measure.vars
or id.vars
, and nothing else, and it does a pretty
great job of guessing what to do.
Obviously it’s better to be specific, but worth bearing in mind.
You can compare outputs here:
pivot_longer
output
## # A tibble: 180 × 3 ## religion income count ## <chr> <chr> <dbl> ## 1 Agnostic <$10k 27 ## 2 Agnostic $10-20k 34 ## 3 Agnostic $20-30k 60 ## 4 Agnostic $30-40k 81 ## 5 Agnostic $40-50k 76 ## 6 Agnostic $50-75k 137 ## 7 Agnostic $75-100k 122 ## 8 Agnostic $100-150k 109 ## 9 Agnostic >150k 84 ## 10 Agnostic Don't know/refused 96 ## # … with 170 more rows
melt
output
## religion income count ## 1: Agnostic <$10k 27 ## 2: Atheist <$10k 12 ## 3: Buddhist <$10k 27 ## 4: Catholic <$10k 418 ## 5: Don’t know/refused <$10k 15 ## --- ## 176: Orthodox Don't know/refused 73 ## 177: Other Christian Don't know/refused 18 ## 178: Other Faiths Don't know/refused 71 ## 179: Other World Religions Don't know/refused 8 ## 180: Unaffiliated Don't know/refused 597
Now, a slightly more complex case where columns have common prefix, and
missings are structural so should be dropped, using the billboard
dataset
Code comparison:
billboard |> pivot_longer( cols = starts_with("wk"), names_to = "week", names_prefix = "wk", values_to = "rank", values_drop_na = TRUE ) melt(DT, measure.vars = patterns("^wk"), variable.name = "week", value.name = "rank", na.rm = TRUE)
pivot_longer
output
## # A tibble: 5,307 × 5 ## artist track date.entered week rank ## <chr> <chr> <date> <chr> <dbl> ## 1 2 Pac Baby Don't Cry (Keep... 2000-02-26 1 87 ## 2 2 Pac Baby Don't Cry (Keep... 2000-02-26 2 82 ## 3 2 Pac Baby Don't Cry (Keep... 2000-02-26 3 72 ## 4 2 Pac Baby Don't Cry (Keep... 2000-02-26 4 77 ## 5 2 Pac Baby Don't Cry (Keep... 2000-02-26 5 87 ## 6 2 Pac Baby Don't Cry (Keep... 2000-02-26 6 94 ## 7 2 Pac Baby Don't Cry (Keep... 2000-02-26 7 99 ## 8 2Ge+her The Hardest Part Of ... 2000-09-02 1 91 ## 9 2Ge+her The Hardest Part Of ... 2000-09-02 2 87 ## 10 2Ge+her The Hardest Part Of ... 2000-09-02 3 92 ## # … with 5,297 more rows
melt
output
## artist track date.entered week rank ## 1: 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk1 87 ## 2: 2Ge+her The Hardest Part Of ... 2000-09-02 wk1 91 ## 3: 3 Doors Down Kryptonite 2000-04-08 wk1 81 ## 4: 3 Doors Down Loser 2000-10-21 wk1 76 ## 5: 504 Boyz Wobble Wobble 2000-04-15 wk1 57 ## --- ## 5303: Creed Higher 1999-09-11 wk63 50 ## 5304: Lonestar Amazed 1999-06-05 wk63 45 ## 5305: Creed Higher 1999-09-11 wk64 50 ## 5306: Lonestar Amazed 1999-06-05 wk64 50 ## 5307: Creed Higher 1999-09-11 wk65 49
Using the who
dataset
who
data
## A tibble: 7,240 × 60 ## country iso2 iso3 year new_s…¹ new_s…² new_s…³ new_s…⁴ new_s…⁵ new_s…⁶ ## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 Afghan… AF AFG 1980 NA NA NA NA NA NA ## 2 Afghan… AF AFG 1981 NA NA NA NA NA NA ## 3 Afghan… AF AFG 1982 NA NA NA NA NA NA ## 4 Afghan… AF AFG 1983 NA NA NA NA NA NA ## 5 Afghan… AF AFG 1984 NA NA NA NA NA NA ## 6 Afghan… AF AFG 1985 NA NA NA NA NA NA ## 7 Afghan… AF AFG 1986 NA NA NA NA NA NA ## 8 Afghan… AF AFG 1987 NA NA NA NA NA NA ## 9 Afghan… AF AFG 1988 NA NA NA NA NA NA ## 10 Afghan… AF AFG 1989 NA NA NA NA NA NA ## … with 7,230 more rows, 50 more variables: new_sp_m65 <dbl>, ## new_sp_f014 <dbl>, new_sp_f1524 <dbl>, new_sp_f2534 <dbl>, ## new_sp_f3544 <dbl>, new_sp_f4554 <dbl>, new_sp_f5564 <dbl>, ## new_sp_f65 <dbl>, new_sn_m014 <dbl>, new_sn_m1524 <dbl>, ## new_sn_m2534 <dbl>, new_sn_m3544 <dbl>, new_sn_m4554 <dbl>, ## new_sn_m5564 <dbl>, new_sn_m65 <dbl>, new_sn_f014 <dbl>, ## new_sn_f1524 <dbl>, new_sn_f2534 <dbl>, new_sn_f3544 <dbl>, … ## ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
who |> pivot_longer( cols = new_sp_m014:newrel_f65, names_to = c("diagnosis", "gender", "age"), names_pattern = "new_?(.*)_(.)(.*)", # Whoa values_to = "count" ) DT[,melt(.SD, measure.vars = 5:60, value.name = "count") ][,variable := gsub("new_?","", variable) ][,c("diagnosis", "temp") := tstrsplit(variable,"_", fixed = TRUE) ][, gender := tstrsplit(temp, "[^mf]+", fixed = FALSE) ][, age := tstrsplit(temp, "[mf]+", fixed = FALSE, keep = 2) ][, !c("variable","temp") ][,c(1:4,6:8,5)][]
OK - we’re going to need to break this one down. There’s a lot of magic
going on with that names_pattern
call in pivot_longer
, where various
parts of the tidyverse are working in the background.
I’m not going to dissect pivot_longer
, because calling debugonce
show you that various methods are involved, which is harder to debug.
Instead, I will build up the data.table solution.
Firstly, note that this time round, I am using melt
inside of DT
,
rather than melt(DT)
DT[,melt(.SD, measure.vars = 5:60, value.name = "count") ][]
This returns a new variable
column, which needs splitting up into more
columns.
First, we need to get rid of that new_
, using gsub
to replace it
with nothing at all:
DT[,melt(.SD, measure.vars = 5:60, value.name = "count") ][,variable := gsub("new_?","", variable) ][]
Now I want the diagnosis part, so I create a diagnosis column, and a “temp” column for the remaining bits, which I’ll split in the next passes
DT[,melt(.SD, measure.vars = 5:60, value.name = "count") ][,variable := gsub("new_?","", variable) ][,c("diagnosis", "temp") := tstrsplit(variable,"_", fixed = TRUE) ][]
Now to split out the gender
column, from the new temp
column. This
is going to be either an “m” or an “f”.
DT[,melt(.SD, measure.vars = 5:60, value.name = "count") ][,variable := gsub("new_?","", variable) ][,c("diagnosis", "temp") := tstrsplit(variable,"_", fixed = TRUE) ][, gender := tstrsplit(temp, "[^mf]+", fixed = FALSE) ][]
Then retrieve the age
, again from the temp
column. This generates
two vectors, I only want the second one, hence the keep = 2
DT[,melt(.SD, measure.vars = 5:60, value.name = "count") ][,variable := gsub("new_?","", variable) ][,c("diagnosis", "temp") := tstrsplit(variable,"_", fixed = TRUE) ][, gender := tstrsplit(temp, "[^mf]+", fixed = FALSE) ][, age := tstrsplit(temp, "[mf]+", fixed = FALSE, keep = 2) ][]
The last two lines discard the variable
and temp
columns, and sort
the columns into the same order as that returned by pivot_longer
DT[,melt(.SD, measure.vars = 5:60, value.name = "count") ][,variable := gsub("new_?","", variable) ][,c("diagnosis", "temp") := tstrsplit(variable,"_", fixed = TRUE) ][, gender := tstrsplit(temp, "[^mf]+", fixed = FALSE) ][, age := tstrsplit(temp, "[mf]+", fixed = FALSE, keep = 2) ][, !c("variable","temp") ][,c(1:4,6:8,5)][]
## country iso2 iso3 year diagnosis gender age count ## 1: Afghanistan AF AFG 1980 sp m 014 NA ## 2: Afghanistan AF AFG 1981 sp m 014 NA ## 3: Afghanistan AF AFG 1982 sp m 014 NA ## 4: Afghanistan AF AFG 1983 sp m 014 NA ## 5: Afghanistan AF AFG 1984 sp m 014 NA ## --- ## 405436: Zimbabwe ZW ZWE 2009 rel f 65 NA ## 405437: Zimbabwe ZW ZWE 2010 rel f 65 NA ## 405438: Zimbabwe ZW ZWE 2011 rel f 65 NA ## 405439: Zimbabwe ZW ZWE 2012 rel f 65 NA ## 405440: Zimbabwe ZW ZWE 2013 rel f 65 725
Using the anscombe
dataset
## x1 x2 x3 x4 y1 y2 y3 y4 ## 1 10 10 10 8 8.04 9.14 7.46 6.58 ## 2 8 8 8 8 6.95 8.14 6.77 5.76 ## 3 13 13 13 8 7.58 8.74 12.74 7.71 ## 4 9 9 9 8 8.81 8.77 7.11 8.84 ## 5 11 11 11 8 8.33 9.26 7.81 8.47 ## 6 14 14 14 8 9.96 8.10 8.84 7.04 ## 7 6 6 6 8 7.24 6.13 6.08 5.25 ## 8 4 4 4 19 4.26 3.10 5.39 12.50 ## 9 12 12 12 8 10.84 9.13 8.15 5.56 ## 10 7 7 7 8 4.82 7.26 6.42 7.91 ## 11 5 5 5 8 5.68 4.74 5.73 6.89
Here’s what we’re aiming for:
## # A tibble: 44 × 3 ## set x y ## <chr> <dbl> <dbl> ## 1 1 10 8.04 ## 2 1 8 6.95 ## 3 1 13 7.58 ## 4 1 9 8.81 ## 5 1 11 8.33 ## 6 1 14 9.96 ## 7 1 6 7.24 ## 8 1 4 4.26 ## 9 1 12 10.8 ## 10 1 7 4.82 ## # … with 34 more rows
The two approaches look like this:
anscombe |> pivot_longer( everything(), cols_vary = "slowest", names_to = c(".value", "set"), names_pattern = "(.)(.)" ) DT[,melt(.SD, variable.name = "set", value.name = c("x","y"), variable.factor = FALSE, measure.vars = patterns("^x","^y"))]
This is the output from melt
- we’ve seen the pivot_longer
outputs above
## set x y ## 1: 1 10 8.04 ## 2: 1 8 6.95 ## 3: 1 13 7.58 ## 4: 1 9 8.81 ## 5: 1 11 8.33 ## 6: 1 14 9.96 ## 7: 1 6 7.24 ## 8: 1 4 4.26 ## 9: 1 12 10.84 ## 10: 1 7 4.82 ## 11: 1 5 5.68 ## 12: 2 10 9.14 ## 13: 2 8 8.14 ## 14: 2 13 8.74 ## 15: 2 9 8.77 ## 16: 2 11 9.26 ## 17: 2 14 8.10 ## 18: 2 6 6.13 ## 19: 2 4 3.10 ## 20: 2 12 9.13 ## 21: 2 7 7.26 ## 22: 2 5 4.74 ## 23: 3 10 7.46 ## 24: 3 8 6.77 ## 25: 3 13 12.74 ## 26: 3 9 7.11 ## 27: 3 11 7.81 ## 28: 3 14 8.84 ## 29: 3 6 6.08 ## 30: 3 4 5.39 ## 31: 3 12 8.15 ## 32: 3 7 6.42 ## 33: 3 5 5.73 ## 34: 4 8 6.58 ## 35: 4 8 5.76 ## 36: 4 8 7.71 ## 37: 4 8 8.84 ## 38: 4 8 8.47 ## 39: 4 8 7.04 ## 40: 4 8 5.25 ## 41: 4 19 12.50 ## 42: 4 8 5.56 ## 43: 4 8 7.91 ## 44: 4 8 6.89 ## set x y
We’ve done wide to long
, now for the opposite.
I have to be honest, dcast
is not a term that makes me think long to wide
, but you can’t have everything.
We’re using this fish_encounters
dataset for this bit.
It’s about time fish made an appearance on this blog.
Probably.
fish_encounters ## # A tibble: 114 × 3 ## fish station seen ## <fct> <fct> <int> ## 1 4842 Release 1 ## 2 4842 I80_1 1 ## 3 4842 Lisbon 1 ## 4 4842 Rstr 1 ## 5 4842 Base_TD 1 ## 6 4842 BCE 1 ## 7 4842 BCW 1 ## 8 4842 BCE2 1 ## 9 4842 BCW2 1 ## 10 4842 MAE 1 ## # … with 104 more rows
Here’s what we want:
## # A tibble: 19 × 12 ## fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE MAW ## <fct> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> ## 1 4842 1 1 1 1 1 1 1 1 1 1 1 ## 2 4843 1 1 1 1 1 1 1 1 1 1 1 ## 3 4844 1 1 1 1 1 1 1 1 1 1 1 ## 4 4845 1 1 1 1 1 NA NA NA NA NA NA ## 5 4847 1 1 1 NA NA NA NA NA NA NA NA ## 6 4848 1 1 1 1 NA NA NA NA NA NA NA ## 7 4849 1 1 NA NA NA NA NA NA NA NA NA ## 8 4850 1 1 NA 1 1 1 1 NA NA NA NA ## 9 4851 1 1 NA NA NA NA NA NA NA NA NA ## 10 4854 1 1 NA NA NA NA NA NA NA NA NA ## 11 4855 1 1 1 1 1 NA NA NA NA NA NA ## 12 4857 1 1 1 1 1 1 1 1 1 NA NA ## 13 4858 1 1 1 1 1 1 1 1 1 1 1 ## 14 4859 1 1 1 1 1 NA NA NA NA NA NA ## 15 4861 1 1 1 1 1 1 1 1 1 1 1 ## 16 4862 1 1 1 1 1 1 1 1 1 NA NA ## 17 4863 1 1 NA NA NA NA NA NA NA NA NA ## 18 4864 1 1 NA NA NA NA NA NA NA NA NA ## 19 4865 1 1 1 NA NA NA NA NA NA NA NA
Code comparison:
fish_encounters |> pivot_wider(names_from = station, values_from = seen) dcast(DT, fish ~ station, value.var = "seen")
Technically, we can avoid stating that value.var = "seen"
, but we do
then get a large informational message in the console, which is always
disconcerting. (If data.table tells you something, you’re well advised
to act on it, because the help is scarily accurate)
Here are the data.table outputs
dcast
output
## fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE MAW ## 1: 4842 1 1 1 1 1 1 1 1 1 1 1 ## 2: 4843 1 1 1 1 1 1 1 1 1 1 1 ## 3: 4844 1 1 1 1 1 1 1 1 1 1 1 ## 4: 4845 1 1 1 1 1 NA NA NA NA NA NA ## 5: 4847 1 1 1 NA NA NA NA NA NA NA NA ## 6: 4848 1 1 1 1 NA NA NA NA NA NA NA ## 7: 4849 1 1 NA NA NA NA NA NA NA NA NA ## 8: 4850 1 1 NA 1 1 1 1 NA NA NA NA ## 9: 4851 1 1 NA NA NA NA NA NA NA NA NA ## 10: 4854 1 1 NA NA NA NA NA NA NA NA NA ## 11: 4855 1 1 1 1 1 NA NA NA NA NA NA ## 12: 4857 1 1 1 1 1 1 1 1 1 NA NA ## 13: 4858 1 1 1 1 1 1 1 1 1 1 1 ## 14: 4859 1 1 1 1 1 NA NA NA NA NA NA ## 15: 4861 1 1 1 1 1 1 1 1 1 1 1 ## 16: 4862 1 1 1 1 1 1 1 1 1 NA NA ## 17: 4863 1 1 NA NA NA NA NA NA NA NA NA ## 18: 4864 1 1 NA NA NA NA NA NA NA NA NA ## 19: 4865 1 1 1 NA NA NA NA NA NA NA NA
Let’s get rid of those NA’s.
fish_encounters |> pivot_wider(names_from = station, values_from = seen, values_fill = 0) dcast(DT, fish ~ station, value.var = "seen", fill = 0)
Rather than have the DT
inside dcast
, we can use .SD
and have
dcast
inside DT
, which is helpful for further chaining. The same
applies to melt
, as you’ll have seen above.
DT[, dcast(.SD, fish ~ station, value.var = "seen", fill = 0)]
Using the us_rent_income
data
## # A tibble: 104 × 5 ## GEOID NAME variable estimate moe ## <chr> <chr> <chr> <dbl> <dbl> ## 1 01 Alabama income 24476 136 ## 2 01 Alabama rent 747 3 ## 3 02 Alaska income 32940 508 ## 4 02 Alaska rent 1200 13 ## 5 04 Arizona income 27517 148 ## 6 04 Arizona rent 972 4 ## 7 05 Arkansas income 23789 165 ## 8 05 Arkansas rent 709 5 ## 9 06 California income 29454 109 ## 10 06 California rent 1358 3 ## # … with 94 more rows
Comparison:
us_rent_income |> pivot_wider( names_from = variable, values_from = c(estimate, moe) ) dcast(DT, GEOID + NAME ~ variable, value.var = c("estimate","moe"))
Alternatively we can pass ...
to indicate all other unspecified columns:
dcast(DT, ... ~ variable, value.var = c("estimate","moe"))
us_rent_income |> pivot_wider( names_from = variable, names_sep = ".", values_from = c(estimate, moe) ) dcast(DT, GEOID + NAME ~ variable, value.var = c("estimate","moe"), sep = ".") ## alternatively DT[, dcast(.SD, GEOID + NAME ~ variable, value.var = c("estimate","moe"), sep = ".")]
From the tidyr help: “You can control whether names_from
values vary
fastest or slowest relative to the values_from
column names using
names_vary
”.
us_rent_income |> pivot_wider( names_from = variable, values_from = c(estimate, moe), names_vary = "slowest" ) |> names() ## [1] "GEOID" "NAME" "estimate_income" "moe_income" ## [5] "estimate_rent" "moe_rent"
To mimic this in data.table, we specify the final column order
DT[, dcast(.SD, GEOID + NAME ~ variable, value.var = c("estimate","moe")) ][,c(1:3,5,4,6)] |> names() ## [1] "GEOID" "NAME" "estimate_income" "moe_income" ## [5] "estimate_rent" "moe_rent"
The fastest
option is exactly the same as the original
us_rent_income |> pivot_wider( names_from = variable, values_from = c(estimate, moe), names_vary = "fastest" ) |> names() ## [1] "GEOID" "NAME" "estimate_income" "estimate_rent" ## [5] "moe_income" "moe_rent" DT[, dcast(.SD, GEOID + NAME ~ variable, value.var = c("estimate","moe"))] |> names() ## [1] "GEOID" "NAME" "estimate_income" "estimate_rent" ## [5] "moe_income" "moe_rent"
values_fn
Using the warpbreaks
dataset:
## breaks wool tension ## 1 26 A L ## 2 30 A L ## 3 54 A L ## 4 25 A L ## 5 70 A L ## 6 52 A L ## 7 51 A L ## 8 26 A L ## 9 67 A L ## 10 18 A M ## 11 21 A M ## 12 29 A M ## 13 17 A M ## 14 12 A M ## 15 18 A M ## 16 35 A M ## 17 30 A M ## 18 36 A M ## 19 36 A H ## 20 21 A H ## 21 24 A H ## 22 18 A H ## 23 10 A H ## 24 43 A H ## 25 28 A H ## 26 15 A H ## 27 26 A H ## 28 27 B L ## 29 14 B L ## 30 29 B L ## 31 19 B L ## 32 29 B L ## 33 31 B L ## 34 41 B L ## 35 20 B L ## 36 44 B L ## 37 42 B M ## 38 26 B M ## 39 19 B M ## 40 16 B M ## 41 39 B M ## 42 28 B M ## 43 21 B M ## 44 39 B M ## 45 29 B M ## 46 20 B H ## 47 21 B H ## 48 24 B H ## 49 17 B H ## 50 13 B H ## 51 15 B H ## 52 15 B H ## 53 16 B H ## 54 28 B H
warpbreaks %>% pivot_wider( names_from = wool, values_from = breaks, values_fn = mean ) dcast(DT, tension ~ wool, value.var = "breaks", fun = mean) warpbreaks %>% pivot_wider( names_from = wool, values_from = breaks, values_fn = mean ) ## # A tibble: 3 × 3 ## tension A B ## <fct> <dbl> <dbl> ## 1 L 44.6 28.2 ## 2 M 24 28.8 ## 3 H 24.6 18.8 dcast(DT, tension ~ wool, value.var = "breaks", fun = mean) ## tension A B ## 1: L 44.55556 28.22222 ## 2: M 24.00000 28.77778 ## 3: H 24.55556 18.77778
Or, do it inside DT:
DT[, dcast(.SD, tension ~ wool, value.var = "breaks", fun = mean)]
I hope this is useful to others.
If nothing else, please at least give me credit for not using the Ross
Geller GIF.