mongodb - How to extract total count for values not in array type fields -


i storing complete message thread (with messages) single document. array field participants holds participants user ids. every message has array field read_by user ids read message.

example data:

db.threads_test.insert( { "subject" : "subject 1", "participants" : ["u1", "u2"], "messages" : [ {"message_id" : "m1", "message" : "msg 1", "read_by" : ["u1"]}, {"message_id" : "m2", "message" : "msg 2", "read_by" : ["u2"]} ]});  db.threads_test.insert( { "subject" : "subject 2", "participants" : ["u1", "u2"], "messages" : [ {"message_id" : "m3", "message" : "msg 3", "read_by" : ["u1"]}, {"message_id" : "m4", "message" : "msg 4", "read_by" : ["u1"]} ]});  db.threads_test.insert( { "subject" : "subject 3", "participants" : ["u1", "u3"], "messages" : [ {"message_id" : "m5", "message" : "msg 5", "read_by" : ["u1", "u3"]} ]}); 

i need found out how many unread threads , how many unread messages user have.

logic interpretation this:

  • unread thread 1 user in participants array , has @ least 1 message user not in read_by
  • unread message 1 belong thread has user in participants , not have user in read_by

expected unread counts per user:

u1: threds=1, messages=1 u2: threads=2, messages=3 u3: threads=0, messages=0 

i've been checking aggregation framework not find solution.

mongo version 2.4.9

this not easy 1 aggregation framework , due many arrays, there ways can wrong. fortunately there sound logic pattern here when @ closely since comes down comparison point:

db.threads_test.aggregate([     // unwind arrays     { "$unwind": "$messages" },     { "$unwind": "$messages.read_by" },     { "$unwind": "$participants" },      // group on distinct "message_id" comparing "particpant" , "read_by"     { "$group": {         "_id": {             "_id": "$_id",             "participant": "$participants",             "message_id": "$messages.message_id"         },         "unread": {              "$min": {                 "$cond": [                     { "$ne": [ "$participants", "$messages.read_by" ] },                     1,                     0                 ]             }         }     }},      // sum of unread per thread     { "$group": {         "_id": {             "_id": "$_id._id",             "participant": "$_id.participant",         },         "unread": { "$sum": "$unread" }     }},      // sum per participant counting unread threads     { "$group": {         "_id": "$_id.participant",         "threads": {              "$sum": {                  "$cond": [                     { "$ne": [ "$unread", 0 ] },                     1,                     0                 ]              }                },         "unread": { "$sum": "$unread" }     }} ]) 

which gives result:

{ "_id" : "u2", "threads" : 2, "unread" : 3 } { "_id" : "u3", "threads" : 0, "unread" : 0 } { "_id" : "u1", "threads" : 1, "unread" : 1 } 

the first $group stage there critical. ater $unwind being processed on each array there going lot of duplication invoved. fortunately each level of "thread" , "message" have own distinct "id" values. along distinct "participants" vital point.

when @ data in "unwound" form, should able see through duplication "key test" here compare "participant" , "read_by" values see if same. , "processing loops" ( except combinations not laid out ), need return "once" given message "participant" , "ready_by" "equal".

this explains "grouping" combination. "key" comprised of "thread", "participant" , "message_id" need $min numerical result comparison "read_by". if @ least "one" of "read_by" matched count 1 otherwise it's 0.

the next stages careful grouping totals. first total "unread" count per thread, count threads unread messages down final "participant" grouping key.

so though it's not "always" path finding solution, doing $unwind operations @ start way visualize data can understand solution.


alternate method

as state, have mongodb 2.4 available this, , depending on size of collection processing $unwind can cause lot of overhead. later versions have provision possibly problem.

i mentioned earlier "processing loops", can mapreduce instead.

while aggregation framework preferred here, might need consider if size restriction:

db.threads_test.mapreduce(     function () {       var doc = this;       doc.participants.foreach(function(participant) {         doc.messages.foreach(function(message) {           var obj = {             threads: [],             unread: 0           };            if ( message.read_by.indexof(participant) == -1 ) {             obj.threads.push(doc._id.valueof());             obj.unread = 1;           }            emit(participant,obj);         })       })     },     function (key,values) {        var result = { "threads": [], "unread": 0 };        values.foreach(function(value) {         value.threads.foreach(function(thread) {           if ( result.threads.indexof(thread) == -1 )             result.threads.push(thread);         })         result.unread += value.unread;       });        return result;      },     {         "finalize": function(key,value) {            value.threads = value.threads.length;            return value;        },        "out": { "inline": 1 }     } ) 

same thing here really. each participant on thread each message, compared "read_by" list see if in there. emit "thread id" when message unread , result if "unread". emitted every message on thread in combination participant. "loop of loops".

the results "reduced" pulling distinct values "threads" , totalling unread messages participant.

since "threads" distinct list of "id" values, want length of list in end after reduction. "finalize" here , converts list numeric value of it's length.

same results not pretty restriction of mapreduce:

    "results" : [             {                     "_id" : "u1",                     "value" : {                             "threads" : 1,                             "unread" : 1                     }             },             {                     "_id" : "u2",                     "value" : {                             "threads" : 2,                             "unread" : 3                     }             },             {                     "_id" : "u3",                     "value" : {                             "threads" : 0,                             "unread" : 0                     }             }     ], 

no matter 1 suits best, problem solution should clear both programitic brain , aggregation brain


Comments

Popular posts from this blog

javascript - Using jquery append to add option values into a select element not working -

Android soft keyboard reverts to default keyboard on orientation change -

Rendering JButton to get the JCheckBox behavior in a JTable by using images does not update my table -