r - Why does this assignment using `which` not work in `data.table`?


I have a data.table with the following structure:

    Classes ‘data.table’ and 'data.frame':  1336 obs. of  5 variables:
     $ timestamp: POSIXct, format: "2013-02-01 00:03:49" "2013-02-01 00:03:49" "2013-02-01 00:07:54" ...
     $ hour     : int  1 1 1 1 1 1 1 1 1 1 ...
     $ price    : num  21 22 21 22 21 22 35 35.5 35.9 38 ...
     $ qty      : num  50 20 50 20 50 20 15 20 3 30 ...
     $ timegroup: int  1 250 506 757 758 1004 1253 1 250 506 ...
     - attr(*, ".internal.selfref")=<externalptr>

Example data:

    > df
                    timestamp hour price qty timegroup
       1: 2013-02-01 00:03:49    1    21  50         1
       2: 2013-02-01 00:03:49    1    22  20         1
       3: 2013-02-01 00:07:54    1    21  50         1
       4: 2013-02-01 00:07:54    1    22  20         1
       5: 2013-02-01 00:11:59    1    21  50         1
      ---
    1332: 2013-04-07 00:12:10    1    40  50         1
    1333: 2013-04-07 00:12:10    1    47  50         1
    1334: 2013-04-07 00:12:10    1    53  15         1
    1335: 2013-04-07 00:12:10    1    78  50         1
    1336: 2013-04-07 00:12:10    1   345  25         1

I am trying to clean the data, because there are duplicate entries at different times. For example, rows 3 and 4 should be deleted because they duplicate rows 1 and 2 and were only registered at a different time. I am trying to achieve this by generating groups of timestamps and comparing subsequent groups with each other. I got stuck at generating the groups of date-times.

    groups <- unique(df$timestamp)
    df[, timegroup := which(timestamp == groups)]

But for some unknown reason the timegroup column does not come out as intended. The warnings I get do not tell me much:

    Warning messages:
    1: In `==.default`(timestamp, groups) :
      longer object length is not a multiple of shorter object length
    2: In `[.data.table`(df, , `:=`(timegroup, which(timestamp == groups))) :
      Supplied 7 items to be assigned to 1336 items of column 'timegroup' (recycled leaving remainder of 6 items).

Using sapply or a for loop did not work either.

Can anyone tell me why? It seems to be somehow connected with the format... Thank you.

The answer to your immediate problem is this:

    df[, timegroup := .GRP, by = timestamp]
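As a rough illustration of why the original line misbehaves, here is a minimal sketch on made-up toy data (the table `dt`, its values, and the column `timegroup_match` are assumptions for the example, not the asker's data). `==` compares the two vectors element by element and recycles the shorter one, so `which()` returns only the positions that happen to line up, not a group number per row. `.GRP` is data.table's running group counter; `match()` is shown as a possible alternative that is probably closer to what the original code was meant to do.

```r
library(data.table)

# Toy data (hypothetical values): three distinct timestamps, two rows each
dt <- data.table(
  timestamp = as.POSIXct(c("2013-02-01 00:03:49", "2013-02-01 00:03:49",
                           "2013-02-01 00:07:54", "2013-02-01 00:07:54",
                           "2013-02-01 00:11:59", "2013-02-01 00:11:59"),
                         tz = "UTC"),
  price = c(21, 22, 21, 22, 21, 22)
)

groups <- unique(dt$timestamp)   # 3 distinct values

# `==` works element by element and recycles `groups` to length 6,
# so row k is compared against groups[((k - 1) %% 3) + 1], not against all groups
recycled <- dt$timestamp == groups

# which() keeps only the positions where that pairing happened to be TRUE,
# so its length is generally not the number of rows
hits <- which(recycled)

# .GRP is 1 for the first group, 2 for the second, and so on
dt[, timegroup := .GRP, by = timestamp]

# match() looks up each row's timestamp in `groups`; same numbering here,
# because `groups` is in order of first appearance
dt[, timegroup_match := match(timestamp, groups)]

print(dt)
```

In the question the lengths were 1336 and some number of groups that does not divide 1336, which is where the first warning comes from; the second warning is data.table complaining that the 7 values returned by `which()` had to be recycled over 1336 rows.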

I don't think I understand the general problem you're facing well enough to suggest a solution for that.

My relatively wild guess is that you want this:

    library(data.table)

    df = data.table(timestamp = c(1,1,2,2,3,3),
                    var1 = c(1,2,1,2,1,3),
                    var2 = c(1,2,1,2,1,4))
    groups = unique(df$timestamp)

    # A group is a duplicate if all its non-timestamp columns are
    # identical to those of the previous group
    groups.duplicated = c(FALSE, sapply(seq_along(groups)[-1], function(i) {
        identical(df[timestamp == groups[i-1], -1, with=FALSE],
                  df[timestamp == groups[i], -1, with=FALSE])
    }))

    # Keep only the rows whose group is not a duplicate
    df[timestamp %in% groups[!groups.duplicated]]
    #   timestamp var1 var2
    #1:         1    1    1
    #2:         1    2    2
    #3:         3    1    1
    #4:         3    3    4
