r - Why does this assignment using `which` not work in `data.table`?
I have a data.table with the following structure:
    Classes ‘data.table’ and 'data.frame': 1336 obs. of 5 variables:
     $ timestamp: POSIXct, format: "2013-02-01 00:03:49" "2013-02-01 00:03:49" "2013-02-01 00:07:54" ...
     $ hour     : int 1 1 1 1 1 1 1 1 1 1 ...
     $ price    : num 21 22 21 22 21 22 35 35.5 35.9 38 ...
     $ qty      : num 50 20 50 20 50 20 15 20 3 30 ...
     $ timegroup: int 1 250 506 757 758 1004 1253 1 250 506 ...
     - attr(*, ".internal.selfref")=<externalptr>
Example data:

    > df
                    timestamp hour price qty timegroup
       1: 2013-02-01 00:03:49    1    21  50         1
       2: 2013-02-01 00:03:49    1    22  20         1
       3: 2013-02-01 00:07:54    1    21  50         1
       4: 2013-02-01 00:07:54    1    22  20         1
       5: 2013-02-01 00:11:59    1    21  50         1
      ---
    1332: 2013-04-07 00:12:10    1    40  50         1
    1333: 2013-04-07 00:12:10    1    47  50         1
    1334: 2013-04-07 00:12:10    1    53  15         1
    1335: 2013-04-07 00:12:10    1    78  50         1
    1336: 2013-04-07 00:12:10    1   345  25         1
I am trying to clean the data, because there are duplicate entries registered at different times. For example, rows 3 and 4 should be deleted because they duplicate rows 1 and 2 and were only registered at a different time. I am trying to achieve this by generating groups of timestamps and comparing subsequent groups with each other, but I got stuck at generating the groups of date-times.

    groups <- unique(df$timestamp)
    df[,timegroup:=which(timestamp==groups)]
But for some unknown reason the `timegroup` column is not created the way I want. The warnings do not tell me much:
    Warning messages:
    1: In `==.default`(timestamp, groups) :
      longer object length is not a multiple of shorter object length
    2: In `[.data.table`(df, , `:=`(timegroup, which(timestamp == groups))) :
      Supplied 7 items to be assigned to 1336 items of column 'timegroup' (recycled leaving remainder of 6 items).
Neither `sapply` nor a `for` loop works either.
Can anyone tell me why? It seems to be somehow connected with the format... Thank you.
The answer to your immediate problem is this:
    df[, timegroup := .GRP, by = timestamp]
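To see why the original `which()` call returns too few values, here is a minimal sketch on made-up data. The table `dt` and the column names `by_match` and `by_grp` are assumptions for illustration only, not the asker's data. As far as I can tell, the issue is that `==` compares two vectors element by element and recycles the shorter one, so it never checks each timestamp against every group.

```r
library(data.table)

# Hypothetical toy data: six rows, three distinct timestamps.
dt <- data.table(timestamp = c(10, 10, 20, 20, 30, 30))
groups <- unique(dt$timestamp)            # 10 20 30

# `==` recycles `groups` to 10 20 30 10 20 30 and compares position by position,
# so which() only reports the rows where the positions happen to line up.
hits <- which(dt$timestamp == groups)     # fewer values than rows

# match() looks each timestamp up in `groups` and returns one index per row.
dt[, by_match := match(timestamp, groups)]

# .GRP is data.table's group counter; grouping by timestamp gives the same ids.
dt[, by_grp := .GRP, by = timestamp]

print(hits)
print(dt)
```

In this sketch `hits` has only two elements, so assigning it to a six-row column would force recycling, which is the same kind of mismatch the second warning in the question complains about. Both `match()` and `.GRP` give one group id per row instead.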
I don't think I understand the general problem you're facing well enough to suggest a solution to that.
My relatively wild guess is that you want this:
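    library(data.table)

    df = data.table(timestamp = c(1,1,2,2,3,3),
                    var1 = c(1,2,1,2,1,3),
                    var2 = c(1,2,1,2,1,4))

    groups = unique(df$timestamp)

    # TRUE for every timestamp group whose rows (minus the timestamp column)
    # are identical to those of the previous group
    groups.duplicated = c(FALSE, sapply(seq_along(groups)[-1], function(i) {
      identical(df[timestamp == groups[i-1], -1, with=FALSE],
                df[timestamp == groups[i], -1, with=FALSE])
    }))

    # keep only the groups that do not repeat their predecessor
    df[timestamp %in% groups[!groups.duplicated]]
    #   timestamp var1 var2
    #1:         1    1    1
    #2:         1    2    2
    #3:         3    1    1
    #4:         3    3    4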