Running PLSR predictions in parallel in R using foreach
Dear users,
I am looking for a solution to "parallelize" PLSR predictions in order to save processing time. I am trying to use the "foreach" construct with "%dopar%" (cf. the 2nd part of the code below), but I am unable to allocate both the predicted values and the model performance parameter (RMSEP) to the output variable.
The code:
    set.seed(10000)
    # generate some data...
    mat <- replicate(100, rnorm(100))
    y <- mat[, 1, drop = FALSE]
    x <- mat[, 2:100]
    ed <- dist(x, method = "euclidean")  # distance matrix to find close samples
    edm <- as.matrix(ed)
    kns <- matrix(NA, nrow(x), 10)       # empty matrix to hold the 10 closest samples
    for (i in 1:nrow(edm)) {             # identify the closest samples in a loop and store them in kns
      kns[i, ] <- head(order(edm[, i]), 11)[-1]
    }
So far I consider the code "safe", but the next part is challenging for me, since I have never used the "foreach" construct before:
    library(pls)
    library(foreach)
    library(doParallel)
    cl <- makeCluster(2)
    registerDoParallel(cl)
    out <- foreach(j = 1:nrow(mat), .combine = "rbind", .packages = "pls") %dopar% {
      pls <- plsr(y ~ x, ncomp = 5, validation = "CV", subset = kns[j, ])
      predict(pls, ncomp = 5, newdata = x[j, , drop = FALSE])
      RMSEP(pls, estimate = "CV")$val[1, 1, 5]
    }
    stopCluster(cl)
As I understand it, the code line starting with "RMSEP(pls, ..." is overwriting the data written by the "predict" code line. Somehow I was assuming that the .combine option would take care of this?
Many thanks for your help!
Best, Chega
If you want to return two objects from the body of a foreach loop, you need to put them into a single object such as a list:
    out <- foreach(j = 1:nrow(mat), .packages = "pls") %dopar% {
      pls <- plsr(y ~ x, ncomp = 5, validation = "CV", subset = kns[j, ])
      list(p = predict(pls, ncomp = 5, newdata = x[j, , drop = FALSE]),
           r = RMSEP(pls, estimate = "CV")$val[1, 1, 5])
    }
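Only the "final value" of the loop body is returned to the master and then processed by the .combine function. The value of the earlier predict() call is therefore computed on the worker and discarded.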
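The "final value" rule can be seen in a small sequential sketch. It uses %do% instead of %dopar% so that no cluster is needed, and the arithmetic in the loop body is a made-up placeholder, not the PLSR code from the question:

```r
library(foreach)

# Two expressions in the body: the first one is evaluated and then discarded.
res_last <- foreach(j = 1:3, .combine = "c") %do% {
  j * 10   # evaluated, but not returned
  j + 1    # last expression: this is the value of the iteration
}

# Wrapping both values in one named vector makes each iteration return both;
# rbind then stacks the iterations into a 3 x 2 matrix.
res_both <- foreach(j = 1:3, .combine = "rbind") %do% {
  c(first = j * 10, second = j + 1)
}
```

Here res_last holds only the values of j + 1, while res_both has one row per iteration and one column per returned quantity.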
Note that I removed the .combine argument, so the result will be a list of lists of length 2. It's not clear to me that rbind is the appropriate function to use to process the results.
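One possible way to flatten such a list of lists afterwards is sketched below in base R. The `out` object here is a hypothetical stand-in with invented numbers, not real model output, and the column names `pred` and `rmsep` are my own choice:

```r
# Hypothetical stand-in for the foreach result:
# one list(p = ..., r = ...) per iteration, with made-up values.
out <- list(
  list(p = 0.12,  r = 1.05),
  list(p = -0.40, r = 0.98),
  list(p = 0.33,  r = 1.10)
)

# Pull each component out of every element; as.numeric() also flattens the
# 1 x 1 x 1 array that predict() would return for a single sample.
res <- data.frame(
  pred  = vapply(out, function(o) as.numeric(o$p), numeric(1)),
  rmsep = vapply(out, function(o) as.numeric(o$r), numeric(1))
)
```

This gives one row per sample, with the prediction and the RMSEP side by side. It assumes each iteration yields exactly one predicted value, as in the question, where newdata is a single row.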