Merging two data frames with different row/column names together

Post author: admin
Post published: January 9, 2022
Post category: Programming questions

#r #dplyr

Question:

I have this data frame:

 dtMatrix <- structure(list(category = c("Opponent", "Opponent", "Opponent", 
"Opponent", "P1", "P2", "P3", "P4", "P2", "Opponent", "Opponent", 
"P1"), Event = c("Good Pass", "Good Pass", "Good Pass", "Turnover", 
"Good Pass", "Good Pass", "Good Pass", "Good Pass", "Good Pass", 
"Intercepted Pass", "Bad Pass", "Good Pass"), Receiver = c(NA, 
NA, NA, NA, "P2", "P3", "P4", "P5", "P1", NA, NA, "P2")), row.names = c(NA, 
-12L), class = c("tbl_df", "tbl", "data.frame"))

From it, I created a matrix:

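library(dplyr)
library(tidyr)
library(tibble)  # provides column_to_rownames()

# dfList (the player labels P1-P5) is defined further down in this question
goodMatrix <- dtMatrix %>%
  filter(Event == 'Good Pass' & !is.na(Receiver)) %>%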
  dplyr::count(category, Receiver) %>%
  tidyr::complete(category = dfList, Receiver = dfList, fill = list(n = 0)) %>%
  pivot_wider(names_from = Receiver, values_from = n) %>%
  column_to_rownames('category')

This goodMatrix stores the combinations of good passes between P1-P5. In dtMatrix, the Event column also has other values, such as turnover/intercepted pass, and the data also includes the opponent. I would like to create a matrix similar to goodMatrix, but for the events and the opponent mentioned above.

countTypes <- dtMatrix %>% dplyr::count(category, Event) captures the count of every event by the category column. With that, I then did:

 secondMatrix <- data.frame(matrix(ncol = length(unique(countTypes$Event)), nrow = length(unique(countTypes$category))))
rownames(secondMatrix) <- unique(countTypes$category)
colnames(secondMatrix) <- unique(countTypes$Event)
secondMatrix

test <- merge(goodMatrix, secondMatrix, by = "row.names")

This was my attempt to merge the two separate matrices together.

 anotherMatrix <- dtMatrix %>% 
  dplyr::count(category, Event) %>% 
  tidyr::complete(category = dfList, Event = dfList, fill = list(n = 0)) %>%
  pivot_wider(names_from = Event, values_from = n) %>%
  column_to_rownames('category')

This also puts them into a single object, but it does not keep the values from dtMatrix and instead resets them to 0.

My expected output should look like this:

 expectedOutput <- structure(list(P1 = c(0, 1, 0, 0, 0, 0), P2 = c(2, 0, 0, 0, 0, 
0), P3 = c(0, 1, 0, 0, 0, 0), P4 = c(0, 0, 1, 0, 0, 0), P5 = c(0, 
0, 0, 1, 0, 0), `Good Pass` = c(2, 2, 1, 1, 0, 3), `Bad Pass` = c(0, 
0, 0, 0, 0, 1), `Intercepted Pass` = c(0, 0, 0, 0, 0, 1), Turnover = c(0, 
0, 0, 0, 0, 1)), row.names = c("P1", "P2", "P3", "P4", "P5", 
"Opponent"), class = "data.frame")

anotherMatrix does half of this, while dtMatrix does the other half, but I am struggling to combine them into the result I would like to see.

Edit

 newTest <- test[,-1]
rownames(newTest) <- test[,1]
newTry <- merge(anotherMatrix, newTest, by = "row.names")

This is just one more method I tried. It also gets close to my expected output, but it does not include the Opponent row and it duplicates every column.

 dfList <- c("P1", "P2", "P3", "P4", "P5")

Edit 2

A quick follow-up on merging 2 DFs with different numbers of rows/columns: how would I combine passesComb and copyComb into gamesComb:

 passesComb <- structure(list(P1_Good = c(0, 1, 0, 0, 0, 0, 1), P2_Good = c(2, 
0, 0, 0, 0, 0, 2), P3_Good = c(0, 1, 0, 0, 0, 0, 1), P4_Good = c(0, 
0, 1, 0, 0, 0, 1), P5_Good = c(0, 0, 0, 1, 0, 0, 1), P1_Bad = c(0, 
0, 0, 0, 0, 0, 0), P2_Bad = c(0, 0, 0, 0, 0, 0, 0), P3_Bad = c(0, 
0, 0, 0, 0, 0, 0), P4_Bad = c(0, 0, 1, 0, 0, 0, 1), P5_Bad = c(0, 
0, 0, 0, 0, 0, 0), `Bad Pass` = c(0, 0, 1, 0, 0, 1, 1), `Good Pass` = c(2, 
2, 1, 1, 0, 3, 6), `Intercepted Pass` = c(0, 0, 0, 0, 0, 1, 0
), Turnover = c(0, 0, 0, 0, 0, 1, 0), totalEvents = c(2, 2, 2, 
1, 0, 6, 7)), row.names = c("P1", "P2", "P3", "P4", "P5", "Opponent", 
"VT"), class = "data.frame")

 copyComb <- structure(list(P1_Good = c(0, 1, 0, 0, 0, 1), P2_Good = c(2, 
0, 0, 0, 0, 2), P4_Good = c(0, 0, 0, 0, 0, 1), P5_Good = c(0, 
0, 1, 0, 0, 1), P1_Bad = c(0, 0, 0, 0, 0, 0), P2_Bad = c(0, 0, 
0, 0, 0, 0), P3_Bad = c(0, 0, 0, 0, 0, 0), P4_Bad = c(0, 0, 0, 
0, 0, 1), P5_Bad = c(0, 0, 0, 0, 0, 0), `Bad Pass` = c(0, 0, 
0, 0, 1, 1), `Good Pass` = c(2, 2, 1, 0, 3, 6), `Intercepted Pass` = c(0, 
0, 0, 0, 1, 0), Turnover = c(0, 0, 0, 0, 1, 0), totalEvents = c(2, 
2, 1, 0, 6, 7)), row.names = c("P1", "P2", "P4", "P5", "Opponent", 
"VT"), class = "data.frame")

copyComb is the same as passesComb, but with row/column 3 removed. I tried to adapt the code from the original answer.

 gamesComb <- data.frame(matrix(NA, nrow = ifelse(nrow(passesComb) >= nrow(copyComb), nrow(passesComb),nrow(copyComb)),
                               ncol = ifelse(ncol(passesComb) >= ncol(copyComb), ncol(passesComb),ncol(copyComb))))
                        
gamesComb[row.names(ifelse(nrow(passesComb) >= nrow(copyComb), passesComb, copyComb)),
                           colnames(ifelse(ncol(passesComb) >= ncol(copyComb), passesComb, copyComb))] <- passesComb

but this only creates a 7×15 df, and for some reason it does not add the row/column names or the cell values.

Answer #1:

If the intention is to update anotherMatrix with goodMatrix, use the row.names and colnames from goodMatrix to subset anotherMatrix, and assign goodMatrix to that subset:

 anotherMatrix[row.names(goodMatrix), colnames(goodMatrix)] <- goodMatrix

Then we just replace the NA values with 0:

 anotherMatrix[is.na(anotherMatrix)] <- 0

Checking against the expected output:

 > identical(expectedOutput, anotherMatrix[names(expectedOutput)])
[1] TRUE
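
The name-based assignment above can be tried in isolation on a small case. The following is a minimal sketch in base R, assuming two toy data frames, small and big, that stand in for goodMatrix and anotherMatrix; their names and values are made up for illustration and are not taken from the question.

```r
# Toy stand-ins for goodMatrix (small) and anotherMatrix (big); values are illustrative only
small <- data.frame(P1 = c(0, 1), P2 = c(2, 0), row.names = c("P1", "P2"))
big <- data.frame(
  P1 = c(0, 0, NA),
  P2 = c(0, 0, NA),
  `Good Pass` = c(2, 2, 3),
  row.names = c("P1", "P2", "Opponent"),
  check.names = FALSE  # keep the non-syntactic column name "Good Pass"
)

# Overwrite only the cells whose row and column names also exist in `small`
big[row.names(small), colnames(small)] <- small

# Every cell that is still NA becomes 0
big[is.na(big)] <- 0

print(big)
```

Because the assignment is indexed by names rather than positions, the rows and columns of the smaller data frame may appear in any order within the larger one, as long as each name is present there.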

1. Thanks! I just added an edit about merging two new DFs based on what you wrote, and the result is not quite what I expected. Could you help me see where I went wrong in the edit?

2. @samrizz4 I see that the gamesComb you created does not have the same column/row names as passesComb or the other data frame

3. Yes, I tried to use this line gamesComb <- data.frame(matrix(NA, nrow = ifelse(nrow(passesComb) >= nrow(copyComb), nrow(passesComb),nrow(copyComb)),ncol = ifelse(ncol(passesComb) >= ncol(copyComb), ncol(passesComb),ncol(copyComb)))) to get the same dim as whichever is larger. passesComb is 7×15 and copyComb is 6×14, so gamesComb becomes 7×15, but it does not get the names. I am trying to add passesComb and copyComb into gamesComb, summing every cell in the 2 DFs that has the same row/column name, and, as in my original post, they have different dimensions.
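
The thread stops here without a worked follow-up. One possible way to do the cell-by-cell sum described in the last comment is sketched below. It assumes that every column is numeric and that a cell missing from one data frame should count as 0. The helper add_by_names and the toy inputs g1 and g2 are hypothetical and do not come from the thread. The sketch builds the target from the union of the names instead of using ifelse(), because ifelse() chooses values element by element and is not meant for choosing between whole data frames.

```r
# Hypothetical helper (not from the thread): add two numeric data frames cell by
# cell, matching on row and column names; a cell missing from one input counts as 0.
add_by_names <- function(a, b) {
  rows <- union(row.names(a), row.names(b))
  cols <- union(colnames(a), colnames(b))
  total <- matrix(0, nrow = length(rows), ncol = length(cols),
                  dimnames = list(rows, cols))
  for (df in list(a, b)) {
    rn <- row.names(df)
    cn <- colnames(df)
    # Add this data frame's values into the cells that share its names
    total[rn, cn] <- total[rn, cn, drop = FALSE] + as.matrix(df)
  }
  data.frame(total, check.names = FALSE)
}

# Toy inputs with different dimensions (illustrative values only)
g1 <- data.frame(P1_Good = c(0, 1, 0), `Good Pass` = c(2, 2, 1),
                 row.names = c("P1", "P2", "P3"), check.names = FALSE)
g2 <- data.frame(P1_Good = c(0, 1), `Good Pass` = c(2, 2),
                 row.names = c("P1", "P2"), check.names = FALSE)

gamesSum <- add_by_names(g1, g2)
print(gamesSum)
```

With the objects from the question, the corresponding call would be add_by_names(passesComb, copyComb). Whether a missing row or column should be treated as 0 or as NA is a design choice that depends on what the totals are meant to represent.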
