r - Repeating sequential functions when creating multiple lists/matrices/dataframes within `lapply`
This question is an elaboration on a previous question I'd asked about repeating functions on sequentially-labeled dataframes.
In the past, I needed to make minor alterations to data.tables read in from a folder in R (e.g. changing dates, recoding).
Now, however, my goals are a bit more complex: I'd like to read in several text files from a folder, take a random sample of the character vectors, read the random sample into a corpus (using the package tm), and generate a new data.frame that has a list of words/phrases and their frequencies.
The code I've developed so far is as follows:
    library(tm)
    library(RWeka)

    # Finds words or phrases of 1 to 5 tokens
    BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 1, max = 5))

    # Lists the files to read in
    files <- list.files("~/path/", full.names = TRUE, pattern = "\\.txt$")

    out <- lapply(1:length(files), function(x) {
      df <- scan(files[x], what = "", sep = "\n")      # read in one file, one element per line
      df <- sample(c(df), size = 1500, replace = FALSE) # take random sample
      corpus <- Corpus(VectorSource(df))                # create corpus
      corpus <- tm_map(corpus, stripWhitespace)
      corpus <- tm_map(corpus, content_transformer(tolower)) # wrapper needed in tm >= 0.6
      corpus <- tm_map(corpus, removeWords, stopwords("english"))
      tdm <- TermDocumentMatrix(corpus, control = list(tokenize = BigramTokenizer)) # create term document matrix
      m <- as.matrix(tdm)
      v <- sort(rowSums(m), decreasing = TRUE)
      d <- data.frame(word = names(v), freq = v)        # create new dataframe of words & frequencies
    })

However, although the function works, I'm not sure how to access the data.frames d while discarding the rest. Does out contain all of the objects created in lapply?
The lapply function returns a list containing the values returned by the specified function. In your example, the function returns the data frame assigned to d, so out is a list containing the d data frames. All of the other objects created by the function (such as tdm, m, and v) are discarded, which seems to be what you want.
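To see this behaviour without tm or any files on disk, here is a minimal sketch in base R. The `texts` list is made-up toy data standing in for the sampled lines, and `table()` stands in for the term-document matrix step; neither is part of the original code.

```r
# Toy stand-in for the text files: three small character vectors (hypothetical data).
texts <- list(
  c("a", "b", "a"),
  c("b", "b", "c"),
  c("c", "a", "c")
)

out <- lapply(seq_along(texts), function(i) {
  # Intermediate object: exists only while this call is running
  v <- sort(table(texts[[i]]), decreasing = TRUE)
  d <- data.frame(word = names(v), freq = as.vector(v))
  d  # the last value is the only thing lapply keeps for this element
})

length(out)      # one list element per input
str(out[[1]])    # a data frame with columns word and freq
```

Ending the function with a bare `d` is optional here, since the value of the assignment is returned anyway, but it makes the return value explicit.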
You can access the data frames in out by indexing them, as in out[[1]], with lapply, as in lapply(out, function(d) d$word), or by combining them with do.call('rbind', out).
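The three access patterns can be sketched as follows. The `out` list below is a hypothetical stand-in with invented words and counts, and the file names are placeholders. Tagging each row with its source is an optional extra, not something the original code does.

```r
# Hypothetical stand-in for `out`: a list of two word/frequency data frames.
out <- list(
  data.frame(word = c("cat", "dog"), freq = c(3, 1), stringsAsFactors = FALSE),
  data.frame(word = c("dog", "bird"), freq = c(2, 2), stringsAsFactors = FALSE)
)
names(out) <- c("file1.txt", "file2.txt")  # placeholder names, e.g. from basename(files)

first    <- out[[1]]                          # a single data frame
words    <- lapply(out, function(d) d$word)   # list of word vectors, one per file
combined <- do.call("rbind", out)             # all rows stacked into one data frame

# Optional: add a column recording which file each row came from before stacking
tagged <- Map(function(d, nm) cbind(source = nm, d), out, names(out))
combined_tagged <- do.call("rbind", tagged)
```

Naming the list elements is a small design choice that pays off later: `out[["file1.txt"]]` is easier to read than a numeric index, and the names carry through `lapply` and `Map`.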