r - Repeating sequential functions when creating multiple lists/matrices/dataframes within `lapply`
This question is an elaboration on a previous question I'd asked about repeating functions on sequentially-labeled dataframes.
In the past, I needed to make minor alterations to data.tables I had read in from a folder in R (e.g. changing dates, recoding).
Now, however, my goals are a bit more complex: I'd like to read in several text files from a folder, take a random sample of the character vectors, read the random sample into a corpus (using the package `tm`), and generate a new data.frame that has a list of words/phrases and their frequencies.
The code I've developed so far is as follows:

    library(tm)     # Corpus, tm_map, TermDocumentMatrix, stopwords
    library(RWeka)  # NGramTokenizer, Weka_control

    # finds words or phrases (1- to 5-grams)
    BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 1, max = 5))

    # reads in files
    files <- list.files("~/path/", full.names = TRUE, pattern = "\\.txt$")

    out <- lapply(seq_along(files), function(x) {
      df <- scan(files[x], what = "", sep = "\n")      # read in files
      df <- sample(df, size = 1500, replace = FALSE)   # take random sample
      corpus <- Corpus(VectorSource(df))               # create corpus
      corpus <- tm_map(corpus, stripWhitespace)
      corpus <- tm_map(corpus, content_transformer(tolower))  # tm >= 0.6 expects base functions wrapped this way
      corpus <- tm_map(corpus, removeWords, stopwords("english"))
      tdm <- TermDocumentMatrix(corpus, control = list(tokenize = BigramTokenizer))  # create term document matrix
      m <- as.matrix(tdm)
      v <- sort(rowSums(m), decreasing = TRUE)
      d <- data.frame(word = names(v), freq = v)       # create new dataframe of words & frequencies
    })

However, although the function works, I'm not sure how to access the data.frames `d` while discarding the rest. Does `out` contain all of the objects created in `lapply`?
The `lapply` function returns a list containing the values returned by the specified function. In your example, the function returns the data frame assigned to `d`, so `out` is a list containing the `d` data frames. All of the other objects created in the function (such as `tdm`, `m`, and `v`) are discarded, which seems to be what you want.
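
To see this behaviour without needing `tm` or any text files, here is a minimal sketch using base R only. The function body is a toy stand-in for the one in the question, and the names `tmp_counts` and `toy_out` are made up for illustration.

```r
# Toy stand-in for the question's function: it builds an intermediate
# object, then ends with an assignment whose value lapply() collects.
toy_out <- lapply(1:3, function(i) {
  tmp_counts <- c(a = i, b = i * 2)   # intermediate object, local to this call
  d <- data.frame(word = names(tmp_counts), freq = tmp_counts)  # last expression
})

length(toy_out)        # one list element per input value
class(toy_out[[1]])    # each element is a data frame
exists("tmp_counts")   # FALSE in a fresh session: the local did not survive
```

Because the last expression is the assignment to `d`, its value is what each call hands back. Ending the function with a bare `d` (or `return(d)`) would behave the same way and makes the intent more explicit.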
you can access data frames in out
indexing them, in out[[1]], lapply, in lapply(out, function(d) d$word)
, or combining them do.call('rbind', out)
.
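
The three access patterns can be tried on a small made-up list. The data below are invented purely for illustration (they are not output from the question's code), and the `source` column in the last step is an optional extra that the answer itself does not mention.

```r
# Two small made-up data frames standing in for the per-file results.
out <- list(
  data.frame(word = c("cat", "dog"),  freq = c(3, 1)),
  data.frame(word = c("cat", "fish"), freq = c(2, 5))
)

first    <- out[[1]]                          # one data frame, by position
words    <- lapply(out, function(d) d$word)   # the word column of each element
combined <- do.call("rbind", out)             # all rows stacked into one data frame

# Optional: record which list element each row came from before stacking.
tagged <- do.call(
  "rbind",
  Map(function(d, i) cbind(d, source = i), out, seq_along(out))
)
```

In the question's setting, something like `names(out) <- basename(files)` would additionally let you pull a result out by file name, e.g. `out[["some_file.txt"]]` (a hypothetical file name).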