MongoDB: map-reduce style query that needs to correlate with the previous row


I have the following schema:

  • client id
  • location name
  • time of visit
  • purchases made // list

Since this is unstructured data, a flat database like MongoDB is a natural fit, so I am using MongoDB.

This data stores client visit information at various locations. Suppose I want to find out the number of repeat visits on a particular day. The logic for a repeat visit is simple: if a person who has visited a store today had also visited the same store earlier, they are a repeat visitor at that store.

This is the logic I am using to find the number of repeat visitors:

Query: select * from schema order by location id asc, client id asc, 'time of visit' asc

Once the data from the above query is sorted, I can compare the 'time of visit' of the previous and next rows if the location id and client id match. If the difference is of > 1 day, it is a repeat visit.

Since the data is huge, a join type of query would be highly inefficient (even if it were possible in MongoDB).

Now, I understand there is a map-reduce framework in MongoDB. However, is it possible to perform a comparison between the previous and current record, and a computation based on that, after map/reduce is triggered?

Example:

  • Customer A visits Store B on day 1 // no repeat visit
  • Customer A visits Store B again on day 1 // still no repeat visit
  • Customer A visits Store B on day 2 // repeat visit for Customer A on day 2
  • Customer A visits Store B on day 2 // also counted as a repeat visit on day 2
  • Customer A visits Store B on day 3 // repeat visit for Customer A on day 3

  • Customer C visits Store B on day 2 // first visit for Customer C, not a repeat visit
  • Customer C visits Store B again on day 2 // first day of visits, not a repeat visit
  • Customer C visits Store B on day 3 // repeat visit on day 3
  • Customer C visits Store B on day 4 // repeat visit on day 4

  • Customer D visits Store B on day 5 // first visit, not a repeat visit

Final output of repeat visits:

  • Store B, day 1: 0 repeat visits
  • Store B, day 2: 2 repeat visits
  • Store B, day 3: 2 repeat visits
  • Store B, day 4: 1 repeat visit
  • Store B, day 5: 0 repeat visits
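The sort-and-compare logic described above can be sketched in plain JavaScript, without a database. This is only an illustration under assumptions: the in-memory `visits` array, its field names (`client`, `store`, `day`), the label "A" for the first customer, and the helper `countRepeatVisits` are all hypothetical. A visit is treated as a repeat when the same customer was first seen at that store on an earlier day, as in the example.

```javascript
// Hypothetical in-memory rows mirroring the example (customer, store, day number).
const visits = [
  { client: "A", store: "B", day: 1 },
  { client: "A", store: "B", day: 1 },
  { client: "A", store: "B", day: 2 },
  { client: "A", store: "B", day: 2 },
  { client: "A", store: "B", day: 3 },
  { client: "C", store: "B", day: 2 },
  { client: "C", store: "B", day: 2 },
  { client: "C", store: "B", day: 3 },
  { client: "C", store: "B", day: 4 },
  { client: "D", store: "B", day: 5 },
];

function countRepeatVisits(rows) {
  // Sort by store, then client, then day, like the ordered query above.
  const sorted = [...rows].sort(
    (a, b) =>
      a.store.localeCompare(b.store) ||
      a.client.localeCompare(b.client) ||
      a.day - b.day
  );

  const counts = new Map(); // "store|day" -> number of repeat visits
  let prev = null;
  let firstDay = null; // first day the current client visited the current store

  for (const row of sorted) {
    const sameGroup =
      prev !== null && prev.store === row.store && prev.client === row.client;
    if (!sameGroup) firstDay = row.day; // a new (store, client) run starts here

    const key = `${row.store}|${row.day}`;
    if (!counts.has(key)) counts.set(key, 0);
    // Any visit on a day later than the first visit day counts as a repeat.
    if (row.day > firstDay) counts.set(key, counts.get(key) + 1);
    prev = row;
  }
  return counts;
}

const repeatCounts = countRepeatVisits(visits);
for (const [key, n] of repeatCounts) console.log(key, n);
```

For the example rows this gives 0, 2, 2, 1 and 0 repeat visits for days 1 through 5, which matches the expected output listed above.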

If you were doing this in a relational database, you would not compare visits row by row. Instead you would use an aggregation query to find repeat visits (using SELECT ... GROUP BY), and you should do it the same way in MongoDB.

First you need to aggregate visits per customer per store per day:

group1 = {
    "$group" : {
        "_id" : {
            "c" : "$clientid",
            "l" : "$location",
            "day" : {
                "y" : { "$year" : "$tov" },
                "m" : { "$month" : "$tov" },
                "d" : { "$dayOfMonth" : "$tov" }
            }
        },
        "visits" : { "$sum" : 1 }
    }
};

Edit: since you only want repeat days, next you group by customer and store, and count on how many different days that customer visited that store:

group2 = {
    "$group" : {
        "_id" : {
            "c" : "$_id.c",
            "s" : "$_id.l"
        },
        "totaldays" : { "$sum" : 1 }
    }
};

Then you want to include only the records from above where the same customer visited the same store on more than one day:

match = {
    "$match" : {
        "totaldays" : { "$gt" : 1 }
    }
};

Here's a sample data set and the result of the aggregation using the above pipeline operations:

> db.visits.find({},{_id:0,purchases:0}).sort({location:1, clientid:1, tov:1})
{ "clientid" : 1, "location" : "l1", "tov" : ISODate("2013-01-01T20:00:00Z") }
{ "clientid" : 1, "location" : "l1", "tov" : ISODate("2013-01-01T21:00:00Z") }
{ "clientid" : 1, "location" : "l1", "tov" : ISODate("2013-01-03T20:00:00Z") }
{ "clientid" : 2, "location" : "l1", "tov" : ISODate("2013-01-01T21:00:00Z") }
{ "clientid" : 3, "location" : "l1", "tov" : ISODate("2013-01-01T21:00:00Z") }
{ "clientid" : 3, "location" : "l1", "tov" : ISODate("2013-01-02T21:00:00Z") }
{ "clientid" : 1, "location" : "l2", "tov" : ISODate("2013-01-01T23:00:00Z") }
{ "clientid" : 3, "location" : "l2", "tov" : ISODate("2013-01-02T21:00:00Z") }
{ "clientid" : 3, "location" : "l2", "tov" : ISODate("2013-01-02T21:00:00Z") }
{ "clientid" : 1, "location" : "l3", "tov" : ISODate("2013-01-03T20:00:00Z") }
{ "clientid" : 2, "location" : "l3", "tov" : ISODate("2013-01-04T20:00:00Z") }
{ "clientid" : 4, "location" : "l3", "tov" : ISODate("2013-01-04T20:00:00Z") }
{ "clientid" : 4, "location" : "l3", "tov" : ISODate("2013-01-04T21:00:00Z") }
{ "clientid" : 4, "location" : "l3", "tov" : ISODate("2013-01-04T22:00:00Z") }

> db.visits.aggregate(group1, group2, match)
{
    "result" : [
        {
            "_id" : {
                "c" : 3,
                "s" : "l1"
            },
            "totaldays" : 2
        },
        {
            "_id" : {
                "c" : 1,
                "s" : "l1"
            },
            "totaldays" : 2
        }
    ],
    "ok" : 1
}
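To check the pipeline's logic without a MongoDB server, the three stages can be emulated in plain JavaScript. The sketch below is not how MongoDB executes the pipeline. The helper names `groupByClientLocationDay` and `groupByClientLocation` are hypothetical, and the calendar day is derived in UTC on the assumption that this matches the default behaviour of `$year`, `$month` and `$dayOfMonth`.

```javascript
// Sample visits from above (purchases omitted); field names follow the sample data.
const visits = [
  { clientid: 1, location: "l1", tov: new Date("2013-01-01T20:00:00Z") },
  { clientid: 1, location: "l1", tov: new Date("2013-01-01T21:00:00Z") },
  { clientid: 1, location: "l1", tov: new Date("2013-01-03T20:00:00Z") },
  { clientid: 2, location: "l1", tov: new Date("2013-01-01T21:00:00Z") },
  { clientid: 3, location: "l1", tov: new Date("2013-01-01T21:00:00Z") },
  { clientid: 3, location: "l1", tov: new Date("2013-01-02T21:00:00Z") },
  { clientid: 1, location: "l2", tov: new Date("2013-01-01T23:00:00Z") },
  { clientid: 3, location: "l2", tov: new Date("2013-01-02T21:00:00Z") },
  { clientid: 3, location: "l2", tov: new Date("2013-01-02T21:00:00Z") },
  { clientid: 1, location: "l3", tov: new Date("2013-01-03T20:00:00Z") },
  { clientid: 2, location: "l3", tov: new Date("2013-01-04T20:00:00Z") },
  { clientid: 4, location: "l3", tov: new Date("2013-01-04T20:00:00Z") },
  { clientid: 4, location: "l3", tov: new Date("2013-01-04T21:00:00Z") },
  { clientid: 4, location: "l3", tov: new Date("2013-01-04T22:00:00Z") },
];

// Stage 1 (like group1): one entry per client, location and UTC calendar day.
function groupByClientLocationDay(rows) {
  const groups = new Map();
  for (const r of rows) {
    const day = r.tov.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
    const key = JSON.stringify([r.clientid, r.location, day]);
    groups.set(key, (groups.get(key) || 0) + 1);
  }
  return [...groups].map(([key, count]) => {
    const [c, l, day] = JSON.parse(key);
    return { _id: { c, l, day }, visits: count };
  });
}

// Stage 2 (like group2): count distinct visit days per client and location.
function groupByClientLocation(dayGroups) {
  const groups = new Map();
  for (const g of dayGroups) {
    const key = JSON.stringify([g._id.c, g._id.l]);
    groups.set(key, (groups.get(key) || 0) + 1);
  }
  return [...groups].map(([key, totaldays]) => {
    const [c, s] = JSON.parse(key);
    return { _id: { c, s }, totaldays };
  });
}

// Stage 3 (like match): keep clients seen at a location on more than one day.
const repeatClients = groupByClientLocation(groupByClientLocationDay(visits))
  .filter((g) => g.totaldays > 1);

console.log(JSON.stringify(repeatClients));
```

This yields clients 1 and 3 at location l1, each with totaldays 2, the same two entries as the aggregation result; the order of the groups is not significant. In more recent MongoDB shells the stages are usually passed as a single array, for example db.visits.aggregate([group1, group2, match]).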
