R help needed for my cumulative percent function

Post author: admin
Post published: February 12, 2023
Post category: Programming questions

#r #function #cumulative-frequency


Question:

A recent change (whether in R or somewhere else) has caused my previously working function to stop working. The function is meant to generate two columns that tell me which percentile score (see df2$CumPercent) corresponds to a given score on a survey (see df2$V1). I have therefore made a few changes to a manual version of the logic, which works well. When I apply the same logic inside a function, it throws an error message indicating that the Var1 variable was not found. Any ideas what might be going wrong here?

 df5 <- structure(list(MyVariable = c(4.66666666666667, 2.16666666666667, 
                                     5.66666666666667, 4.5, 5.16666666666667, 4.5, 1, 3.83333333333333, 
                                     2, 4, 2.33333333333333, 5.5, 5.66666666666667, 2.66666666666667, 
                                     5.66666666666667, 2.83333333333333, 4.33333333333333, 5.33333333333333, 
                                     5.66666666666667, 4.33333333333333, 2.33333333333333, 4.5, 3.66666666666667, 
                                     3.83333333333333, 2, 5, 2.83333333333333, 3, 4.83333333333333, 
                                     5.16666666666667, 3, 5.16666666666667, 1.33333333333333, 5.16666666666667, 
                                     2.16666666666667, 4, 3.66666666666667, 4, 3.5, 4.5, 3, 5.16666666666667, 
                                     4.83333333333333, 4.66666666666667, 3.16666666666667, 4.16666666666667, 
                                     2.83333333333333, 4.83333333333333, 2.66666666666667, 4.16666666666667, 
                                     5.16666666666667, 6.16666666666667, 1.83333333333333, 3.33333333333333, 
                                     4.5, 4.83333333333333, 5.5, 4.33333333333333, 4.33333333333333, 
                                     4.83333333333333, 2.33333333333333, 4.5, 4.16666666666667, 5.5, 
                                     4.5, 4.83333333333333, 5, 1, 4.5, 5, 2.33333333333333, 4, 3.5, 
                                     3.33333333333333, 4.66666666666667, 1.5, 5.83333333333333, 4.33333333333333, 
                                     5.16666666666667, 3.33333333333333, 4.66666666666667, 6, 4.33333333333333, 
                                     2.16666666666667, 4.16666666666667, 5.83333333333333, 3.66666666666667, 
                                     5, 5.83333333333333, 4.33333333333333, 4.33333333333333, 4.66666666666667, 
                                     4.83333333333333, 5.16666666666667, 5, 3.5, 5, 5.5, 4.66666666666667, 
                                     5.33333333333333, 5.5, 3.66666666666667, 1.83333333333333, 2.33333333333333, 
                                     5, 5.83333333333333, 4.66666666666667, 4.83333333333333, 5.83333333333333, 
                                     3.66666666666667, 3.33333333333333, 2.5, 5.33333333333333, 4.16666666666667, 
                                     4.16666666666667, 3.5, 3, 5.16666666666667, 3.66666666666667, 
                                     5.83333333333333, 4, 5.33333333333333, 6, 3.16666666666667, 2.33333333333333, 
                                     4.66666666666667, 5.66666666666667, 3.5, 4.66666666666667, 1.33333333333333, 
                                     4, 4.33333333333333, 3.5, 3.16666666666667, 5.16666666666667, 
                                     4.66666666666667, 2.83333333333333, 4, 2.5, 2.83333333333333, 
                                     4.83333333333333, 5.33333333333333, 4.5, 3.83333333333333, 4)), row.names = c(NA, 
                                                                                                                   -145L), class = "data.frame")

#Manual version of the cumulative percent logic (which works as intended)
PercentilesRaw <- data.frame(seq(from=0, to=7, by=.01)) #Create every increment of percentile as vector
colnames(PercentilesRaw)[colnames(PercentilesRaw)=="seq.from...0..to...7..by...0.01."] <- "V1" #Rename percentile column name
df <- data.frame(table(df5$MyVariable)) #Count the number of original values in the column
df[,"Var1"] <- as.numeric(as.character(df[,"Var1"])) #The table function above produces factor levels so need to convert to numeric
V1 <- df[,"Var1"] #Make a vector from the Var1 column
Frequency <- df[,"Freq"] #Make a vector from the Freq column
CumSum <- cumsum(df[,"Freq"]) #Calculate a cumulative sum from the Freq column
CumPercent <- CumSum/sum(df[,"Freq"])*100 #Calculate the cumulative percentage vector
CumPercent <- round(CumPercent,2) #Round the cumulative percentage vector to 2 dp
output <- cbind(round(V1,2), CumPercent) #Map the cumulative percent results to the V1 vector
df2 <- data.frame(output) #Convert the two columns into a df

#Now attempt to convert into a function.
cpave1 <- function(x) {
  PercentilesRaw <- data.frame(seq(from=0, to=7, by=.01)) #Create every increment of percentile as vector
  colnames(PercentilesRaw)[colnames(PercentilesRaw)=="seq.from...0..to...7..by...0.01."] <- "V1" #Rename percentile column name
  df <- data.frame(table(x)) #Count the number of original values in the column
  df[,"Var1"] <- as.numeric(as.character(df[,"Var1"])) #The table function above produces factor levels so need to convert to numeric
  V1 <- df[,"Var1"] #Make a vector from the Var1 column
  Frequency <- df[,"Freq"] #Make a vector from the Freq column
  CumSum <- cumsum(df[,"Freq"]) #Calculate a cumulative sum from the Freq column
  CumPercent <- CumSum/sum(df[,"Freq"])*100 #Calculate the cumulative percentage vector
  CumPercent <- round(CumPercent,2) #Round the cumulative percentage vector to 2 dp
  output <- cbind(round(V1,2), CumPercent) #Map the cumulative percent results to the V1 vector
  df2 <- data.frame(output) #Convert the two columns into a df
}

#Apply function to the MyVariable column.
MyVariable <- cpave1(df5$MyVariable)

Answer 1:

As the error message suggests, there is no "Var1" column in your data. The column is called x. Here is a shorter, updated version of your function that returns the same result.

cpave1 <- function(x) {
  df <- type.convert(data.frame(table(x)), as.is = TRUE)
  data.frame(V1 = round(df$x, 2),
             CumPercent = round(cumsum(df$Freq) / sum(df$Freq) * 100, 2))
}
cpave1(df5$MyVariable)

#     V1 CumPercent
#1  1.00       1.38
#2  1.33       2.76
#3  1.50       3.45
#4  1.83       4.83
#5  2.00       6.21
#6  2.17       8.28
#7  2.33      12.41
#8  2.50      13.79
#...
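
As a cross-check on the cumulative percent logic above, base R's ecdf() gives the proportion of observations at or below a value, so evaluating it at the sorted unique scores should reproduce the cumsum-based column. This is only a sketch on a small made-up vector (not the df5 data from the question); the names toy_scores, from_table and from_ecdf are chosen here for illustration.

```r
# Hypothetical toy scores, used only to illustrate the calculation
toy_scores <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 5)

# Approach used in the answer: counts per value, then cumulative share
counts <- table(toy_scores)
from_table <- as.numeric(cumsum(counts)) / sum(counts) * 100

# Alternative: empirical CDF evaluated at each distinct value
vals <- sort(unique(toy_scores))
from_ecdf <- ecdf(toy_scores)(vals) * 100

result <- data.frame(V1 = vals, CumPercent = round(from_ecdf, 2))
print(result)
print(all.equal(from_table, from_ecdf))
```

Both routes should agree; the table-based version is closer to the original function, while ecdf() avoids the round trip through factor levels.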

1. Thank you very much, this works perfectly, but I don't understand why. I also don't understand how you could tell that the function was creating x. The manual version produces Var1, so why didn't my function do the same (the error message didn't help me)?

2. I had to debug the function to figure that out. The column name is set based on the name of the input. For example, x <- df5$MyVariable; data.frame(table(x)) creates a column named x.

3. Ah, I see, that is really useful to know. Thanks again!
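
The naming behaviour discussed in the comments above can be reproduced on a tiny example. The sketch below uses a made-up data frame called scores (an assumption, not the df5 from the question) and also shows one possible way to pin the column name with the dnn argument of table(), which is not something the answer itself uses.

```r
# Hypothetical toy data, not the df5 from the question
scores <- data.frame(MyVariable = c(1, 2, 2, 3))

# Called on an expression such as scores$MyVariable, table() has no plain
# symbol to take a name from, so the data frame column falls back to "Var1".
manual_names <- names(data.frame(table(scores$MyVariable)))

# Inside a function the argument is the plain symbol x, so the column is "x".
inside_names <- (function(x) names(data.frame(table(x))))(scores$MyVariable)

# Supplying dnn fixes the name regardless of how the input was written.
fixed_names <- (function(x) {
  names(data.frame(table(x, dnn = "Var1")))
})(scores$MyVariable)

print(manual_names)
print(inside_names)
print(fixed_names)
```

With dnn = "Var1" the original function body could keep referring to df[,"Var1"] unchanged.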

Answer 2:

I think data.table is the best way. The solution without creating a function is simply:

 library(data.table)
df5 <- data.table(df5)
df5[, .N, MyVariable][order(MyVariable)][, .(MyVariable, CumPercent = round(cumsum(N) / sum(N), 4) * 100)]

or, if you want to create a function:

 library(data.table)
df5 <- data.table(df5)
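cpave2 <- function(data, colname) {
  # Grouping by get(colname) yields a grouping column that data.table names `get`,
  # which is why the later steps refer to `get` as a column.
  data[, .N, get(colname)][order(get)][, .(Values = get,
                                           CumPercent = round(cumsum(N) / sum(N), 4) * 100)]
}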

cpave2(df5, 'MyVariable')
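
For readers who prefer to stay in base R, the same "data plus column name" interface can be sketched without data.table. The function name cpave_base and the toy data frame are assumptions made for illustration; they do not appear in either answer.

```r
# Hypothetical base R counterpart of cpave2: takes a data frame and a column name
cpave_base <- function(data, colname) {
  # [[ ]] extracts the column by its name given as a string
  counts <- table(data[[colname]])
  data.frame(
    # table() stores the distinct values as names, so convert them back to numbers
    Values = as.numeric(names(counts)),
    CumPercent = round(as.numeric(cumsum(counts)) / sum(counts) * 100, 2)
  )
}

# Toy data for illustration only
toy <- data.frame(Score = c(1, 2, 2, 3))
res <- cpave_base(toy, "Score")
print(res)
```

Because the column names are set explicitly in data.frame(), this version does not depend on how table() labels its output.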