MongoDB Map Reduce

Map-Reduce is a computing model: a large job is broken down into pieces of work that are processed independently (MAP), and the partial results are then merged into a final result (REDUCE).

MongoDB provides a flexible Map-Reduce implementation that is practical for large-scale data analysis.
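
As a rough illustration of the model, the following plain JavaScript sketch runs both phases in memory. The helper name runMapReduce and the sample orders data are assumptions made up for this example; they are not part of MongoDB, and here emit is passed to the map function as an argument instead of being provided by the server.

```javascript
// Minimal in-memory sketch of the map-reduce model (not MongoDB's implementation).
// runMapReduce is a hypothetical helper written for this illustration.
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> array of emitted values

  // MAP phase: each document may emit zero or more (key, value) pairs.
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map(doc, emit);

  // REDUCE phase: collapse each key's values into a single value.
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduce(key, values);
  }
  return result;
}

// Hypothetical sample data: total amount per customer.
const orders = [
  { customer: "a", amount: 10 },
  { customer: "b", amount: 5 },
  { customer: "a", amount: 7 },
];

const totals = runMapReduce(
  orders,
  (doc, emit) => emit(doc.customer, doc.amount),
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);
console.log(totals); // { a: 17, b: 5 }
```

The map step only decides which key each value belongs to; all of the combining happens in the reduce step.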


MapReduce command

The following is the basic syntax of MapReduce:

>db.collection.mapReduce(
   function() {emit(key,value);},  // map function
   function(key,values) {return reduceFunction},   // reduce function
   {
      out: collection,
      query: document,
      sort: document,
      limit: number
   }
)

To use MapReduce you implement two functions, Map and Reduce. MongoDB calls the Map function for every document in the collection that matches the query; the Map function calls emit(key, value), and the emitted values are grouped by key and passed to the Reduce function for processing.

The Map function returns its key-value pairs by calling emit(key, value).

Parameter Description:

  • map: the mapping function (generates the sequence of key-value pairs that become the arguments of the reduce function).
  • reduce: the aggregation function; its task is to turn a key and its array of values into a single value for that key.
  • out: the collection in which the results are stored (if it is not specified, a temporary collection is used and is deleted automatically after the client disconnects).
  • query: a filter condition; only documents that match it are passed to the map function (query, limit and sort can be combined freely).
  • sort: a sort specification, usually combined with limit, that orders the documents before they are sent to the map function; it can help optimize the grouping.
  • limit: the maximum number of documents sent to the map function (without limit, sort alone is of little use).
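
To make the roles of query, sort and limit concrete, here is a small in-memory sketch of how they narrow the input before the map function runs. The helper selectInput is hypothetical and handles only equality filters and a single sort field; it assumes the options are applied in the order query, sort, limit, and it is not how MongoDB is implemented.

```javascript
// Sketch only: mimics how query, sort and limit narrow the map input.
// selectInput is a hypothetical helper; it supports only equality matches
// in query and a single sort field, unlike MongoDB's real query engine.
function selectInput(docs, { query = {}, sort = null, limit = Infinity } = {}) {
  // query: keep documents whose fields equal every value in the filter.
  let selected = docs.filter((doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value)
  );

  // sort: order the remaining documents (1 = ascending, -1 = descending).
  if (sort) {
    const [[field, direction]] = Object.entries(sort);
    selected = [...selected].sort((a, b) =>
      a[field] < b[field] ? -direction : a[field] > b[field] ? direction : 0
    );
  }

  // limit: cap how many documents reach the map function.
  return selected.slice(0, limit);
}

const sampleDocs = [
  { user_name: "w3big", status: "active" },
  { user_name: "mark", status: "active" },
  { user_name: "mark", status: "disabled" },
  { user_name: "mark", status: "active" },
];

const mapInput = selectInput(sampleDocs, {
  query: { status: "active" },
  sort: { user_name: 1 },
  limit: 2,
});
console.log(mapInput.map((d) => d.user_name)); // [ 'mark', 'mark' ]
```

With the sort in place, the limit keeps the first two documents in user_name order; without it, the limit would simply keep the first two matching documents in their stored order.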

Use MapReduce

Consider the following documents, which store users' articles. Each document stores the author's user_name and the article's status field:

>db.posts.insert({
   "post_text": "本教程,最全的技术文档。",
   "user_name": "mark",
   "status":"active"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
   "post_text": "本教程,最全的技术文档。",
   "user_name": "mark",
   "status":"active"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
   "post_text": "本教程,最全的技术文档。",
   "user_name": "mark",
   "status":"active"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
   "post_text": "本教程,最全的技术文档。",
   "user_name": "mark",
   "status":"active"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
   "post_text": "本教程,最全的技术文档。",
   "user_name": "mark",
   "status":"disabled"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
   "post_text": "本教程,最全的技术文档。",
   "user_name": "w3big",
   "status":"disabled"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
   "post_text": "本教程,最全的技术文档。",
   "user_name": "w3big",
   "status":"disabled"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
   "post_text": "本教程,最全的技术文档。",
   "user_name": "w3big",
   "status":"active"
})
WriteResult({ "nInserted" : 1 })

Now we will use the mapReduce function on the posts collection to select the published articles (status: "active"), group them by user_name, and count the number of posts for each user:

>db.posts.mapReduce( 
   function() { emit(this.user_name,1); }, 
   function(key, values) {return Array.sum(values)}, 
      {  
         query:{status:"active"},  
         out:"post_total" 
      }
)

The mapReduce command above produces the following output:

{
        "result" : "post_total",
        "timeMillis" : 23,
        "counts" : {
                "input" : 5,
                "emit" : 5,
                "reduce" : 1,
                "output" : 2
        },
        "ok" : 1
}

The results show that 5 documents matched the query criteria (status: "active"), the map function emitted 5 key-value pairs for them, and the reduce function then grouped the pairs with the same key into 2 groups.
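
The counts in this output can be reproduced with a small in-memory sketch over the same user_name and status values as the eight inserted posts. The helper countStats is hypothetical, its filter handles equality only, and emit is passed to the map function as an argument instead of being provided by the server. It assumes that reduce is skipped for a key that has only one emitted value, which is why the reduce count is 1 while the output count is 2; on larger data sets MongoDB may call reduce more than once per key.

```javascript
// Sketch only: recompute the "counts" fields for the tutorial's sample data.
// countStats is a hypothetical helper, not a MongoDB API.
function countStats(docs, query, map, reduce) {
  const counts = { input: 0, emit: 0, reduce: 0, output: 0 };
  const groups = new Map(); // key -> array of emitted values

  for (const doc of docs) {
    // Equality-only filter standing in for the query option.
    const matches = Object.entries(query).every(([k, v]) => doc[k] === v);
    if (!matches) continue;
    counts.input += 1;

    // Call map with the document bound to `this`.
    map.call(doc, (key, value) => {
      counts.emit += 1;
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }

  const results = [];
  for (const [key, values] of groups) {
    let value = values[0];
    // A key with a single value is passed through without calling reduce.
    if (values.length > 1) {
      value = reduce(key, values);
      counts.reduce += 1;
    }
    results.push({ _id: key, value });
    counts.output += 1;
  }
  return { counts, results };
}

// Same user_name/status mix as the eight inserted posts (post_text omitted).
const posts = [
  ...Array(4).fill({ user_name: "mark", status: "active" }),
  { user_name: "mark", status: "disabled" },
  { user_name: "w3big", status: "disabled" },
  { user_name: "w3big", status: "disabled" },
  { user_name: "w3big", status: "active" },
];

const { counts, results } = countStats(
  posts,
  { status: "active" },
  function (emit) { emit(this.user_name, 1); },
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);
console.log(counts);  // { input: 5, emit: 5, reduce: 1, output: 2 }
console.log(results); // [ { _id: 'mark', value: 4 }, { _id: 'w3big', value: 1 } ]
```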

The fields in the output are:

  • result: the name of the collection that stores the results. Here it is post_total, the collection named in the out option; when no output collection is named, a temporary collection is used and is deleted automatically after the connection closes.
  • timeMillis: the execution time, in milliseconds
  • input: the number of documents that matched the query and were sent to the map function
  • emit: the number of times emit was called in the map function, that is, the total number of key-value pairs produced
  • reduce: the number of times the reduce function was called
  • output: the number of documents in the result collection (this count is very helpful for debugging)
  • ok: whether the command succeeded; 1 means success
  • err: if the command fails, the reason may be given here, but in practice the message tends to be vague and not very useful

Use the find operator to view the results of the mapReduce query:

>db.posts.mapReduce( 
   function() { emit(this.user_name,1); }, 
   function(key, values) {return Array.sum(values)}, 
      {  
         query:{status:"active"},  
         out:"post_total" 
      }
).find()

The results of the above query are shown below. Two users, mark and w3big, have published articles, 4 and 1 respectively:

{ "_id" : "mark", "value" : 4 }
{ "_id" : "w3big", "value" : 1 }

In a similar manner, MapReduce can be used to build large, complex aggregate queries.

The Map and Reduce functions are written in JavaScript, which makes MapReduce very flexible and powerful.