#r #pattern-matching #tidyr
Question:
I want to pivot a column wider based not on all the values in the column, but only on those that match a pattern.
Some toy data:
df <- data.frame(
  utterance = c("A and stuff", "X and something", "A and some more",
                "B etc.", "B", "x yz and so on", "BBB"),
  timestamp = c("00:05:31.736 - 00:05:35.263", "00:05:31.829 - 00:05:36.449",
                "00:05:31.829 - 00:05:36.449", "00:05:31.829 - 00:05:36.449",
                "00:05:31.842 - 00:05:35.302", "00:05:35.088 - 00:05:36.134",
                "00:05:35.263 - 00:05:53.052")
)
I want to pivot wider only those rows where utterance starts with A or B. So far I can only pivot wider on all values in utterance:
library(dplyr)  # provides %>% and group_by()
library(tidyr)

df %>%
  group_by(timestamp) %>%
  pivot_wider(-utterance, names_from = utterance, values_from = utterance)

# A tibble: 5 x 8
# Groups:   timestamp [5]
  timestamp                   `A and stuff` `X and something` `A and some more` `B etc.` B     `x yz and so on` BBB
  <chr>                       <chr>         <chr>             <chr>             <chr>    <chr> <chr>            <chr>
1 00:05:31.736 - 00:05:35.263 A and stuff   NA                NA                NA       NA    NA               NA
2 00:05:31.829 - 00:05:36.449 NA            X and something   A and some more   B etc.   NA    NA               NA
3 00:05:31.842 - 00:05:35.302 NA            NA                NA                NA       B     NA               NA
4 00:05:35.088 - 00:05:36.134 NA            NA                NA                NA       NA    x yz and so on   NA
5 00:05:35.263 - 00:05:53.052 NA            NA                NA                NA       NA    NA               BBB
I tried subsetting utterance by the pattern, but got an error:
df %>%
  group_by(timestamp) %>%
  pivot_wider(names_from = utterance[grepl("^(A|B)", utterance)],
              values_from = utterance[grepl("^(A|B)", utterance)])
Error: object 'utterance' not found
How can I pivot only on the matching rows?
Expected:
# timestamp                   `A`             utterance       `B`
# <chr>                       <chr>           <chr>           <chr>
# 00:05:31.736 - 00:05:35.263 A and stuff     NA              NA
# 00:05:31.829 - 00:05:36.449 A and some more X and something B etc.
# 00:05:31.842 - 00:05:35.302 NA              NA              B
# 00:05:35.088 - 00:05:36.134 NA              x yz and so on  NA
# 00:05:35.263 - 00:05:53.052 NA              NA              BBB
Comments:
1. Could you filter before pivoting?
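As a rough illustration of that suggestion (this is not code the commenter posted), the sketch below filters to the matching rows and then pivots on the first letter. The helper column name key is made up for the example. As far as I can tell, filtering first drops the non-matching rows entirely, so the "X and something" value and the whole 00:05:35.088 timestamp disappear, which differs from the expected output in the question.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  utterance = c("A and stuff", "X and something", "A and some more",
                "B etc.", "B", "x yz and so on", "BBB"),
  timestamp = c("00:05:31.736 - 00:05:35.263", "00:05:31.829 - 00:05:36.449",
                "00:05:31.829 - 00:05:36.449", "00:05:31.829 - 00:05:36.449",
                "00:05:31.842 - 00:05:35.302", "00:05:35.088 - 00:05:36.134",
                "00:05:35.263 - 00:05:53.052")
)

filtered_wide <- df %>%
  # Keep only utterances that start with A or B
  filter(grepl("^(A|B)", utterance)) %>%
  # `key` is a hypothetical helper column: the first letter names the new column
  mutate(key = substr(utterance, 1, 1)) %>%
  pivot_wider(names_from = key, values_from = utterance)

# Rows that did not match the pattern are gone, not kept in a separate column
nrow(filtered_wide)
names(filtered_wide)
```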
Answer 1:
You can create a new names column:
library(stringr)
library(dplyr)
library(tidyr)

df %>%
  # Label each row with the name of the column it should end up in
  mutate(pvt = case_when(
    str_detect(utterance, "^A") ~ "A",
    str_detect(utterance, "^B") ~ "B",
    TRUE ~ "utterance"
  )) %>%
  pivot_wider(names_from = pvt, values_from = utterance)
This returns
# A tibble: 5 x 4
  timestamp                   A               utterance       B
  <chr>                       <chr>           <chr>           <chr>
1 00:05:31.736 - 00:05:35.263 A and stuff     NA              NA
2 00:05:31.829 - 00:05:36.449 A and some more X and something B etc.
3 00:05:31.842 - 00:05:35.302 NA              NA              B
4 00:05:35.088 - 00:05:36.134 NA              x yz and so on  NA
5 00:05:35.263 - 00:05:53.052 NA              NA              BBB
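A possible variation on the same idea, not part of the original answer: if the prefixes of interest are single characters, the names column could be derived with str_extract() and coalesce() instead of one case_when() branch per prefix. This is only a sketch under that assumption. It also relies, like the answer above, on each timestamp having at most one utterance per target column.

```r
library(stringr)
library(dplyr)
library(tidyr)

df <- data.frame(
  utterance = c("A and stuff", "X and something", "A and some more",
                "B etc.", "B", "x yz and so on", "BBB"),
  timestamp = c("00:05:31.736 - 00:05:35.263", "00:05:31.829 - 00:05:36.449",
                "00:05:31.829 - 00:05:36.449", "00:05:31.829 - 00:05:36.449",
                "00:05:31.842 - 00:05:35.302", "00:05:35.088 - 00:05:36.134",
                "00:05:35.263 - 00:05:53.052")
)

wide <- df %>%
  # str_extract() returns the leading A or B, or NA when there is no match;
  # coalesce() then sends the non-matching rows to a column named "utterance"
  mutate(pvt = coalesce(str_extract(utterance, "^[AB]"), "utterance")) %>%
  pivot_wider(names_from = pvt, values_from = utterance)

wide
```

Adding another prefix would then mean editing only the character class in the pattern.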
Answer 2:
A solution without pivot_wider:
library(tidyverse)

df <- data.frame(
  utterance = c("A and stuff", "X and something", "A and some more",
                "B etc.", "B", "x yz and so on", "BBB"),
  timestamp = c("00:05:31.736 - 00:05:35.263", "00:05:31.829 - 00:05:36.449",
                "00:05:31.829 - 00:05:36.449", "00:05:31.829 - 00:05:36.449",
                "00:05:31.842 - 00:05:35.302", "00:05:35.088 - 00:05:36.134",
                "00:05:35.263 - 00:05:53.052")
)

df %>%
  # Copy matching utterances into A and B, and blank them out in utterance
  mutate(A = ifelse(str_detect(utterance, "^A"), utterance, NA),
         B = ifelse(str_detect(utterance, "^B"), utterance, NA),
         utterance = ifelse(str_detect(utterance, "^A|^B"), NA, utterance)) %>%
  relocate(utterance, .before = "B") %>%
  # Within each timestamp, spread the non-NA values over all rows, then dedupe
  group_by(timestamp) %>%
  fill(everything(), .direction = "downup") %>%
  ungroup() %>%
  distinct()
#> # A tibble: 5 × 4
#>   timestamp                   A               utterance       B
#>   <chr>                       <chr>           <chr>           <chr>
#> 1 00:05:31.736 - 00:05:35.263 A and stuff     <NA>            <NA>
#> 2 00:05:31.829 - 00:05:36.449 A and some more X and something B etc.
#> 3 00:05:31.842 - 00:05:35.302 <NA>            <NA>            B
#> 4 00:05:35.088 - 00:05:36.134 <NA>            x yz and so on  <NA>
#> 5 00:05:35.263 - 00:05:53.052 <NA>            <NA>            BBB