Restrict `pivot_wider` to rows matching a pattern

#r #pattern-matching #tidyr

Question:

I want to pivot a column wider based not on all values in the column, but only on those that match a pattern.

Some toy data:

    df <- data.frame(utterance = c("A and stuff",
                                   "X and something",
                                   "A and some more",
                                   "B etc.",
                                   "B",
                                   "x yz and so on",
                                   "BBB"),
                     timestamp = c("00:05:31.736 - 00:05:35.263", "00:05:31.829 - 00:05:36.449",
                                   "00:05:31.829 - 00:05:36.449", "00:05:31.829 - 00:05:36.449",
                                   "00:05:31.842 - 00:05:35.302", "00:05:35.088 - 00:05:36.134",
                                   "00:05:35.263 - 00:05:53.052"))

I want to pivot wider only those rows whose `utterance` starts with `A` or `B`. So far I can only pivot wider on all values in `utterance`:

    library(dplyr)  # group_by()
    library(tidyr)

    df %>%
      group_by(timestamp) %>%
      pivot_wider(id_cols = -utterance,  # named, so it is not passed positionally
                  names_from = utterance,
                  values_from = utterance)
    # A tibble: 5 x 8
    # Groups:   timestamp [5]
      timestamp                   `A and stuff` `X and something` `A and some more` `B etc.` B     `x yz and so on` BBB
      <chr>                       <chr>         <chr>             <chr>             <chr>    <chr> <chr>            <chr>
    1 00:05:31.736 - 00:05:35.263 A and stuff   NA                NA                NA       NA    NA               NA
    2 00:05:31.829 - 00:05:36.449 NA            X and something   A and some more   B etc.   NA    NA               NA
    3 00:05:31.842 - 00:05:35.302 NA            NA                NA                NA       B     NA               NA
    4 00:05:35.088 - 00:05:36.134 NA            NA                NA                NA       NA    x yz and so on   NA
    5 00:05:35.263 - 00:05:53.052 NA            NA                NA                NA       NA    NA               BBB

I tried subsetting `utterance` by the pattern, but got an error:

    df %>%
      group_by(timestamp) %>%
      pivot_wider(names_from = utterance[grepl("^(A|B)", utterance)],
                  values_from = utterance[grepl("^(A|B)", utterance)])
    Error: object 'utterance' not found

How can I pivot only on the matching rows?

Expected output:

    #   timestamp                   `A`             utterance       `B`
    #   <chr>                       <chr>           <chr>           <chr>
    #   00:05:31.736 - 00:05:35.263 A and stuff     NA              NA
    #   00:05:31.829 - 00:05:36.449 A and some more X and something B etc.
    #   00:05:31.842 - 00:05:35.302 NA              NA              B
    #   00:05:35.088 - 00:05:36.134 NA              x yz and so on  NA
    #   00:05:35.263 - 00:05:53.052 NA              NA              BBB

Comments:

1. Could you `filter` before pivoting?
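The comment's suggestion might look roughly like the sketch below. It assumes the new column names should simply be the first letter of each matching utterance; the helper column `key` and the result name `wide_ab` are made up for illustration. Note that filtering drops the non-matching rows entirely, so this does not reproduce the expected output above (there is no `utterance` column, and the timestamp with only a non-matching utterance disappears).

```r
library(dplyr)
library(tidyr)
library(stringr)

df <- data.frame(
  utterance = c("A and stuff", "X and something", "A and some more",
                "B etc.", "B", "x yz and so on", "BBB"),
  timestamp = c("00:05:31.736 - 00:05:35.263", "00:05:31.829 - 00:05:36.449",
                "00:05:31.829 - 00:05:36.449", "00:05:31.829 - 00:05:36.449",
                "00:05:31.842 - 00:05:35.302", "00:05:35.088 - 00:05:36.134",
                "00:05:35.263 - 00:05:53.052")
)

wide_ab <- df %>%
  # keep only utterances that start with "A" or "B"
  filter(str_detect(utterance, "^[AB]")) %>%
  # hypothetical helper column: the first letter becomes the new column name
  mutate(key = str_sub(utterance, 1, 1)) %>%
  pivot_wider(names_from = key, values_from = utterance)

wide_ab
```

With the toy data this gives four rows and the columns `timestamp`, `A` and `B`.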

Answer #1:

You can create a new column to take the names from:

    library(stringr)
    library(dplyr)
    library(tidyr)

    df %>%
      mutate(pvt = case_when(str_detect(utterance, "^A") ~ "A",
                             str_detect(utterance, "^B") ~ "B",
                             TRUE ~ "utterance")) %>%
      pivot_wider(names_from = pvt,
                  values_from = utterance)

This returns

    # A tibble: 5 x 4
      timestamp                   A               utterance       B
      <chr>                       <chr>           <chr>           <chr>
    1 00:05:31.736 - 00:05:35.263 A and stuff     NA              NA
    2 00:05:31.829 - 00:05:36.449 A and some more X and something B etc.
    3 00:05:31.842 - 00:05:35.302 NA              NA              B
    4 00:05:35.088 - 00:05:36.134 NA              x yz and so on  NA
    5 00:05:35.263 - 00:05:53.052 NA              NA              BBB
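If there are more prefixes than just `A` and `B`, writing one `case_when` branch per prefix gets tedious. A possible variant, sketched here under the assumption that the prefix is always a single leading character, derives the key with `str_extract` and falls back to `"utterance"` for non-matches. The result name `wide` is illustrative only.

```r
library(dplyr)
library(tidyr)
library(stringr)

df <- data.frame(
  utterance = c("A and stuff", "X and something", "A and some more",
                "B etc.", "B", "x yz and so on", "BBB"),
  timestamp = c("00:05:31.736 - 00:05:35.263", "00:05:31.829 - 00:05:36.449",
                "00:05:31.829 - 00:05:36.449", "00:05:31.829 - 00:05:36.449",
                "00:05:31.842 - 00:05:35.302", "00:05:35.088 - 00:05:36.134",
                "00:05:35.263 - 00:05:53.052")
)

wide <- df %>%
  # str_extract() returns the leading "A"/"B" or NA when there is no match;
  # coalesce() replaces those NAs with the catch-all name "utterance"
  mutate(pvt = coalesce(str_extract(utterance, "^[AB]"), "utterance")) %>%
  pivot_wider(names_from = pvt, values_from = utterance)

wide
```

As in the answer above, this relies on each timestamp having at most one utterance per key; with duplicates, `pivot_wider` would warn and produce list-columns.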

Answer #2:

A solution without `pivot_wider`:

    library(tidyverse)

    df <- data.frame(utterance = c("A and stuff",
                                   "X and something",
                                   "A and some more",
                                   "B etc.",
                                   "B",
                                   "x yz and so on",
                                   "BBB"),
                     timestamp = c("00:05:31.736 - 00:05:35.263", "00:05:31.829 - 00:05:36.449",
                                   "00:05:31.829 - 00:05:36.449", "00:05:31.829 - 00:05:36.449",
                                   "00:05:31.842 - 00:05:35.302", "00:05:35.088 - 00:05:36.134",
                                   "00:05:35.263 - 00:05:53.052"))

    df %>%
      mutate(A = ifelse(str_detect(utterance, "^A"), utterance, NA),
             B = ifelse(str_detect(utterance, "^B"), utterance, NA),
             utterance = ifelse(str_detect(utterance, "^A|^B"), NA, utterance)) %>%
      relocate(utterance, .before = "B") %>%
      group_by(timestamp) %>%
      fill(everything(), .direction = "downup") %>%
      ungroup() %>%
      distinct()
    #> # A tibble: 5 × 4
    #>   timestamp                   A               utterance       B
    #>   <chr>                       <chr>           <chr>           <chr>
    #> 1 00:05:31.736 - 00:05:35.263 A and stuff     <NA>            <NA>
    #> 2 00:05:31.829 - 00:05:36.449 A and some more X and something B etc.
    #> 3 00:05:31.842 - 00:05:35.302 <NA>            <NA>            B
    #> 4 00:05:35.088 - 00:05:36.134 <NA>            x yz and so on  <NA>
    #> 5 00:05:35.263 - 00:05:53.052 <NA>            <NA>            BBB