Buscar

sábado, 29 de noviembre de 2014

MongoDB course for developers. unit 5/8. Aggregation framework

Simple Aggregation Example

Write the aggregation query that will find the number of products by category of a collection that has the form:


{
 "_id" : ObjectId("50b1aa983b3d0043b51b2c52"),
 "name" : "Nexus 7",
 "category" : "Tablets",
 "manufacturer" : "Google",
 "price" : 199
}

Have the resulting key be called "num_products," as in the video lesson. Hint, you just need to change which key you are aggregating on relative to the examples shown in the lesson.
Please double quote all keys to make it easier to check your result.
db.products.aggregate([
 {"$group": { "_id": "$category",
                       "num_products" : {"$sum" : 1}
                    }
 }
])

The Aggregation Pipeline

  • $project: reshape the document. (remodela el documento). (can manipulate 1 document to 1)
  • $match: filter etape, you can select all the documents. (can reduce n documents to 1)
  • $group: allow aggregation (perform opertarions like that sum, count, avg). (can reduce n documents to 1)
  • $sort: sort documents in a particular order. (can manipulate 1 document to 1)
  • $skip: to skip documents (skip 10 or 100 documets, etc). (can reduce n documents to 1)
  • $limit: limit the number of documents. (can reduce n documents to 1)
  • $unwind: to normalize the data. (can increments 1 documents to n) like that:
    • "tags": ["abc","xyz","123"] is transform to
      • "tags": "abc"
      • "tags": "xyz"
      • "tags": "123"
  • $out: out the result. (can manipulate 1 document to 1). By default the result of aggregation is returned in a cursor but you can redirect to a collection using $out
others operators:

  • $redact: to limit that documents the users can see
  • $geoNear: to pipeline stages based on geolocation
Which of the following are stages in the aggregation pipeline. Check all that apply.





Simple Example Expanded

If you have the following collection of stuff:




> db.stuff.find()
{ "_id" : ObjectId("50b26f9d80a78af03b5163c8"), "a" : 1, "b" : 1, "c" : 1 }
{ "_id" : ObjectId("50b26fb480a78af03b5163c9"), "a" : 2, "b" : 2, "c" : 1 }
{ "_id" : ObjectId("50b26fbf80a78af03b5163ca"), "a" : 3, "b" : 3, "c" : 1 }
{ "_id" : ObjectId("50b26fcd80a78af03b5163cb"), "a" : 3, "b" : 3, "c" : 2 }
{ "_id" : ObjectId("50b26fd380a78af03b5163cc"), "a" : 3, "b" : 5, "c" : 3 }
and you perform the following aggregation:
db.stuff.aggregate([{$group:{_id:'$c'}}])
How many documents will be in the result set from aggregate?
3

Compound Grouping

Given the following collection:
> db.stuff.find()
{ "_id" : ObjectId("50b26f9d80a78af03b5163c8"), "a" : 1, "b" : 1, "c" : 1 }
{ "_id" : ObjectId("50b26fb480a78af03b5163c9"), "a" : 2, "b" : 2, "c" : 1 }
{ "_id" : ObjectId("50b26fbf80a78af03b5163ca"), "a" : 3, "b" : 3, "c" : 1 }
{ "_id" : ObjectId("50b26fcd80a78af03b5163cb"), "a" : 3, "b" : 3, "c" : 2 }
{ "_id" : ObjectId("50b26fd380a78af03b5163cc"), "a" : 3, "b" : 5, "c" : 3 }
{ "_id" : ObjectId("50b27f7080a78af03b5163cd"), "a" : 3, "b" : 3, "c" : 2 }
And the following aggregation query:
db.stuff.aggregate([{$group:
       {_id:
        {'moe':'$a', 
         'larry':'$b',
         'curly':'$c'
        }
       }
      }])
How many documents will be in the result set?
5

Using a document for _id

In order to use a document for "_id", the document will be unique.

db.example.insert({ "_id" { "a" : 3, "b" : 3}, "c":1})
db.example.insert({ "_id" { "a" : 3, "b" : 3}, "c":1})
--> duplicate key error

Aggregation Expressions

Expressions you can use in $group stage:
  • $sum: add a constant, a value of a document or the value of the key
  • $avg: divide the avg of the keys across the documents
  • $min: find the minimun value of the documents
  • $max: find the maximun value of the documents
  • $push: build array adding keys to an array
  • $addToSet: build arrays adding unique keys to an array 
  • $first: require to first sort the documents. find the value of the first document from the result of the $group operartor
  • $last: require to first sort the documents. find the value of the last document from the result of the $group operartor
Which of the following aggregation expressions must be used in conjunction with a sort to make any sense?

Using $sum

This problem, and some after it, use the zips collection from media.mongodb.org/zips.json. You don't need to download it, but you can if you want, allowing you to test your queries within MongoDB. You can import, once downloaded, using mongoimport




Suppose we have a collection of populations by postal code. The postal codes in are in the _id field, anddb.zips.aggregate([
{$group: {_id:"$state", average_pop:{$avg: "$pop"} } }
])
db.zips.aggregate([
{$group: {_id:"$state", average_pop:{$avg: "$pop"} } }
])
db.zips.aggregate([
{$group: {_id:"$state", average_pop:{$avg: "$pop"} } }
])
 are therefore unique. Documents look like this:
{
 "city" : "CLANTON",
 "loc" : [
  -86.642472,
  32.835532
 ],
 "pop" : 13990,
 "state" : "AL",
 "_id" : "35045"
}
For students outside the United States, there are 50 non-overlapping states in the US with two letter abbreviations such as NY and CA. In addition, the capital of Washington is within an area designated the District of Columbia, and carries the abbreviation DC. For purposes of the mail, the postal service considers DC to be a "state." So in this dataset, there are 51 states. We call postal codes "zip codes." A city may overlap several zip codes.
Write an aggregation query to sum up the population (pop) by state and put the result in a field called population. Don't use a compound _id key (you don't need one and the quiz checker is not expecting one). The collection name is zips. so something along the lines of db.zips.aggregrate...
db.zips.aggregate([
{$group: { "_id": { "state": "$state"},
                  "population": { "$sum": "$pop"}}}
])

Using $avg

This problem uses the same dataset as we described in using $sum quiz and you should review that quiz if you did not complete it.




Given population data by zip code (postal code) that looks like this:
{
 "city" : "FISHERS ISLAND",
 "loc" : [
  -72.017834,
  41.263934
 ],
 "pop" : 329,
 "state" : "NY",
 "_id" : "06390"
}
Write an aggregation expression to calculate the average population of a zip code (postal code) by state. As before, the postal code is in the _id field and is unique. The collection is assumed to be called "zips" and you should name the key in the result set "average_pop".


db.zips.aggregate([
     { $group: { "_id" : "$state",
                        "average_pop" : {$avg: "$pop"} 
                      } 
      }
])

Using $addToSet

This problem uses the same zip code data as the $using sum quiz. See that quiz for a longer explanation.




Suppose we population by zip code (postal code) data that looks like this (putting in a query for the zip codes in Palo Alto)
> db.zips.find({state:"CA",city:"PALO ALTO"})
{ "city" : "PALO ALTO", "loc" : [ -122.149685, 37.444324 ], "pop" : 15965, "state" : "CA", "_id" : "94301" }
{ "city" : "PALO ALTO", "loc" : [ -122.184234, 37.433424 ], "pop" : 1835, "state" : "CA", "_id" : "94304" }
{ "city" : "PALO ALTO", "loc" : [ -122.127375, 37.418009 ], "pop" : 24309, "state" : "CA", "_id" : "94306" }
Write an aggregation query that will return the postal codes that cover each city. The results should look like this:
  {
   "_id" : "CENTREVILLE",
   "postal_codes" : [
    "22020",
    "49032",
    "39631",
    "21617",
    "35042"
   ]
  },
Again the collection will be called zips. You can deduce what your result column names should be from the above output. (ignore the issue that a city may have the same name in two different states and is in fact two different cities in that case - for eg Springfield, MO and Springfield, MA)


db.zips.aggregate([
{
    "$group" : { "_id": {  "city" : "$city"},
                "postal_codes" : { "$addToSet" : "$_id"}
    }
}

])

Using $push

Given the zipcode dataset (explained more fully in the using $sum quiz) that has documents that look like this:




> db.zips.findOne()
{
 "city" : "ACMAR",
 "loc" : [
  -86.51557,
  33.584132
 ],
 "pop" : 6055,
 "state" : "AL",
 "_id" : "35004"
}

would you expect the following two queries to produce the same result or different results?
db.zips.aggregate([{"$group":{"_id":"$city", "postal_codes":{"$push":"$_id"}}}])
db.zips.aggregate([{"$group":{"_id":"$city", "postal_codes":{"$addToSet":"$_id"}}}])

Using $max and $min

Again thinking about the zip code database, write an aggregation query that will return the population of the postal code in each state with the highest population. It should return output that looks like this:




{
   "_id" : "WI",
   "pop" : 57187
  },
  {
   "_id" : "WV",
   "pop" : 70185
  },
..and so on
Once again, the collection is named zips.


db.zips.aggregate([
{
             $group:{_id:"$state", pop: {$max:"$pop"}}
}
])

Double $group stages

Given the following collection:




> db.fun.find()
{ "_id" : 0, "a" : 0, "b" : 0, "c" : 21 }
{ "_id" : 1, "a" : 0, "b" : 0, "c" : 54 }
{ "_id" : 2, "a" : 0, "b" : 1, "c" : 52 }
{ "_id" : 3, "a" : 0, "b" : 1, "c" : 17 }
{ "_id" : 4, "a" : 1, "b" : 0, "c" : 22 }
{ "_id" : 5, "a" : 1, "b" : 0, "c" : 5 }
{ "_id" : 6, "a" : 1, "b" : 1, "c" : 87 }
{ "_id" : 7, "a" : 1, "b" : 1, "c" : 97 }

And the following aggregation query
db.fun.aggregate([{$group:{_id:{a:"$a", b:"$b"}, c:{$max:"$c"}}}, {$group:{_id:"$_id.a", c:{$min:"$c"}}}])
What values are returned?

$project

With $project you can:
  • remove keys
  • add new keys
  • reshape keys
  • use some simple:
    • $toUper
    • $tiLower
    • $add
    • $multiply
The main reason to use a $project is because you want to clean up the documents but you can also do it at beginning if you want eliminate and manipulate specific keys. It is a filtering stage to eliminate a lot of data before sending to a $group.

Write an aggregation query with a single projection stage that will transform the documents in the zips collection from this:



{
 "city" : "ACMAR",
 "loc" : [
  -86.51557,
  33.584132
 ],
 "pop" : 6055,
 "state" : "AL",
 "_id" : "35004"
}

to documents in the result set that look like this:
{
 "city" : "acmar",
 "pop" : 6055,
 "state" : "AL",
 "zip" : "35004"
}
So that the checker works properly, please specify what you want to do with the _id key as the first item. The other items should be ordered as above. As before, assume the collection is called zips. You are running only the projection part of the pipeline for this quiz.

A few facts not mentioned in the lesson that you will need to know to get this right: If you don't mention a key, it is not included, except for _id, which must be explicitly suppressed. If you want to include a key exactly as it is named in the source document, you just write key:1, where key is the name of the key. You will probably get more out of this quiz is you download the zips.json file and practice in the shell. zips.json link is in the using $sum quiz

db.zips.aggregate([
{ "$project" :
    {   "_id":0,
        "city": { "$toLower" : "$city" },
        "pop" : true,
        "state": true,
        "zip": "$_id"
    }
}
])

$match

If a document matches, it will be pushed to the next aggregation pipeline.

Reasons to use:

  • pre aggregation to filter documents
  • filter the results
One thing to note about $match (and $sort) is that they can use indexes, but only if done at the beginning of the aggregation pipeline.


Again, thinking about the zipcode collection, write an aggregation query with a single match phase that filters for zipcodes with greater than 100,000 people. You may need to look up the use of the $gt operator in the MongoDB docs.




Assume the collection is called zips.


db.zips.aggregate([
    { 
    "$match" : { "pop" : {"$gt" : 100000 }}
    }
])


$text

Full text search works so well with aggregation in mongoDB 2.6. It is better to have an index of text to use aggregation. In this example the aggregate will search in the "words" for the words "tree" or "rat" and it si not necessary to specify any key in $match stage bacause the collection only can have a text index and in this case we have it created:

db.test.aggregate([
    { "$match": { "$text" : {"$search" : "tree rat"} } 
    }
    ,{ "$project" : { "words" : true, "_id" : false }
    }
])

You can also modify the weight of the words in order to have first the documents that have the two words and after those that have only one word.

db.test.aggregate([
    { "$match": { "$text" : {"$search" : "tree rat"} } 
    }
   ,{ "$sort" : { score : { "$meta" : "textScore" } }
    }
    ,{ "$project" : { "words" : true, "_id" : false }
    }
])

The $match stage that includes $text searches must appear first in the aggregation, because if you use a text index, it can be used in that stage and not in next stages.


Which of the following statements are true about using a $text operator in the aggregation pipeline

$sort

The aggregation allow disk and memory sort based and by default it uses memory with s limit of 100MB. it can be used it before or after the grouping stage.

Again, considering the zipcode collection, which has documents that look like this,




{
 "city" : "ACMAR",
 "loc" : [
  -86.51557,
  33.584132
 ],
 "pop" : 6055,
 "state" : "AL",
 "_id" : "35004"
}

Write an aggregation query with just a sort stage to sort by (state, city), both ascending. Assume the collection is called zips.


db.zips.aggregate([
{$sort: {state:1, city:1} }
])

$limit and $skip

It is necessary to sort the documents before use, in order to have a defined result. To have sense, we use $skip and then $limit.

Suppose you change the order of skip and limit in the query shown in the lesson, to look like this:



db.zips.aggregate([
    {$match:
     {
  state:"NY"
     }
    },
    {$group:
     {
  _id: "$city",
  population: {$sum:"$pop"},
     }
    },
    {$project:
     {
  _id: 0,
  city: "$_id",
  population: 1,
     }
    },
    {$sort:
     {
  population:-1
     }
    },
    {$limit: 5},
    {$skip: 10} 
])

How many documents do you think will be in the result set?

Revisiting $first and $last

They are group operators. Example:

db.zips.aggregate([
{

/* get the population of every city in every state */

"$group" : {
"_id" : { "state" :"$state", "city": "$city"},
population : { "$sum" : "$pop"}
}
}

 /* sort by state and population */

,{
"$sort" : { "_id.state": 1, "population": -1}
}

/* group by state, get the first item in each group */

,{
"$group" : { "_id" : "$_id.state",
"city" : { "$first" : "$_id.city"},
"population": { "$first" : "$population"}
}
}

/* now sort by state again */
,{
"$sort" : { "_id" : 1}

}
])

Given the following collection:



> db.fun.find()
{ "_id" : 0, "a" : 0, "b" : 0, "c" : 21 }
{ "_id" : 1, "a" : 0, "b" : 0, "c" : 54 }
{ "_id" : 2, "a" : 0, "b" : 1, "c" : 52 }
{ "_id" : 3, "a" : 0, "b" : 1, "c" : 17 }
{ "_id" : 4, "a" : 1, "b" : 0, "c" : 22 }
{ "_id" : 5, "a" : 1, "b" : 0, "c" : 5 }
{ "_id" : 6, "a" : 1, "b" : 1, "c" : 87 }
{ "_id" : 7, "a" : 1, "b" : 1, "c" : 97 }
What would be the value of c in the result from this aggregation query
db.fun.aggregate([
    {$match:{a:0}},
    {$sort:{c:-1}}, 
    {$group:{_id:"$a", c:{$first:"$c"}}}
])

 $unwind

Example:
 db.items.findOne()
{
"_id" : "nail",
"attributes" : [
"hard",
"shiny",
"pointy",
"thin"
]

}
when we apply $unwind operator we get:

{ "_id" : "nail", "attributes" : "hard" }
{ "_id" : "nail", "attributes" : "shiny" }
{ "_id" : "nail", "attributes" : "pointy" }

{ "_id" : "nail", "attributes" : "thin" }


Suppose you have the following collection:

db.people.find()

{ "_id" : "Barack Obama", "likes" : [ "social justice", "health care", "taxes" ] }
{ "_id" : "Mitt Romney", "likes" : [ "a balanced budget", "corporations", "binders full of women" ] }
And you unwind the "likes" array of each document. How many documents will you wind up with?

$unwind example

Which grouping operator will enable to you to reverse the effects of an unwind?





Doble $unwind

When you have more than one array to unwind.

Examples:

db.inventory.aggregate([
    {$unwind: "$sizes"},
    {$unwind: "$colors"},
    /* create the color array */
    {$group: 
     {
 '_id': {name:"$name",size:"$sizes"},
  'colors': {$push: "$colors"},
     }
    },
    /* create the size array */
    {$group: 
     {
 '_id': {'name':"$_id.name",
  'colors' : "$colors"},
  'sizes': {$push: "$_id.size"}
     }
    },
    /* reshape for beauty */
    {$project: 
     {
  _id:0,
  "name":"$_id.name",
  "sizes":1,
  "colors": "$_id.colors"
     }
    } 
])

{
"sizes" : [
"Large",
"Medium",
"Small"
],
"name" : "Polo Shirt",
"colors" : [
"navy",
"white",
"orange",
"red"
]
}
{
"sizes" : [
"X-Large",
"Large",
"Medium",
"Small"
],
"name" : "T-Shirt",
"colors" : [
"navy",
"black",
"orange",
"red"
]
}
{
"sizes" : [
"31x30",
"32x32",
"36x32"
],
"name" : "Chino Pants",
"colors" : [
"navy",
"white",
"orange",
"violet"
]
}

db.inventory.aggregate([
    {$unwind: "$sizes"},
    {$unwind: "$colors"},
    {$group: 
     {
 '_id': "$name",
  'sizes': {$addToSet: "$sizes"},
  'colors': {$addToSet: "$colors"},
     }
    }
])


{
 "_id" : "Chino Pants",
 "sizes" : [
  "36x32",
  "31x30",
  "32x32"
 ],
 "colors" : [
  "violet",
  "white",
  "orange",
  "navy"
 ]
}
{
 "_id" : "T-Shirt",
 "sizes" : [
  "X-Large",
  "Large",
  "Medium",
  "Small"
 ],
 "colors" : [
  "red",
  "black",
  "orange",
  "navy"
 ]
}
{
 "_id" : "Polo Shirt",
 "sizes" : [
  "Large",
  "Medium",
  "Small"
 ],
 "colors" : [
  "red",
  "white",
  "orange",
  "navy"
 ]
}


db.inventory.aggregate([
    {$unwind: "$sizes"},
    {$unwind: "$colors"},
    {$group: 
     {
 '_id': {'size':'$sizes', 'color':'$colors'},
 'count' : {'$sum':1}
     }
    }
])


{ "_id" : { "size" : "36x32", "color" : "orange" }, "count" : 1 }
{ "_id" : { "size" : "36x32", "color" : "white" }, "count" : 1 }
{ "_id" : { "size" : "36x32", "color" : "navy" }, "count" : 1 }
{ "_id" : { "size" : "31x30", "color" : "orange" }, "count" : 1 }
{ "_id" : { "size" : "32x32", "color" : "orange" }, "count" : 1 }
{ "_id" : { "size" : "32x32", "color" : "white" }, "count" : 1 }
{ "_id" : { "size" : "X-Large", "color" : "red" }, "count" : 1 }
{ "_id" : { "size" : "36x32", "color" : "violet" }, "count" : 1 }
{ "_id" : { "size" : "Small", "color" : "black" }, "count" : 1 }
{ "_id" : { "size" : "X-Large", "color" : "orange" }, "count" : 1 }
{ "_id" : { "size" : "Medium", "color" : "black" }, "count" : 1 }
{ "_id" : { "size" : "Large", "color" : "orange" }, "count" : 2 }


Can you reverse the effects of a double unwind (2 unwinds in a row) in our inventory collection (shown in the lesson ) with the $push operator?

Using $out

It is used to redirect the output of the aggregation to a new collection only using 2.6 version and after. Example:


db.games.aggregate([
{ "$unwind" : "$moves" }
       ,{ "$out" : "sumary_results"}

])

in this case we can have duplicity of '_id" because de $unwind operator will produce new documents, one for each value of the array "moves". There is not any problem if the output is to shell but if the output is redirect to a new collection using $out operator, a dupplicated key error will occur when we execute the aggregation but the previous collections will not be altered because the temporary result of execution is sent to a mongo temporary collection.

Aggregation option

The aggregation subsystem allow you to specify some options that will change the behaviour of aggregation. The format is:

aggregate([ { stage1 }, { stage2 }, { stage3 }], {options} ]

The options are:

  • explain: let's you get the query plan if you run the aggregation query. It is used to optimicing aggregation operation.
                db.zips.aggregate([
                 {$group: { "_id": { "state": "$state"},
                                  "population": { "$sum": "$pop"}}}
               ],{ "explain": true})

  • allowDiskUse:control:if the aggregation framework will use the hard drive for some intermediate stages. By default any given stage of aggregation is limited to 100MB. Some stages like projection run the document in front and don't use a lot of memory but if we used $sort, that by default sort in memory, the aggregation will fail if the memory used exceeds of 100MB

  •           db.zips.aggregate([   {$group: { "_id": { "state": "$state"},
                                      "population": { "$sum": "$pop"}}}
                   ],{ "explain": true})
  • cursorif you will change the behaviour of returning results to Python, returning a cursos instead of a big document it is necessary specify this option in the python aggregation:
    • aggregate([{"$group": { "_id": "$state",
                               "population": { "$sum": "$pop"}}}],
                    cursor = {})
           

Which of the following aggregation expressions properly allows disk use when performing aggregation?

Python and Aggregation results

Prior 2.6 version, aggregation returns a big document. In 2.6 version aggregation returns a cursor and gives you a document after another. If we use Python, we receive the results in a big document but optionally it will receive a cursor. Example:

      results = db.zips.aggregate([


                    {"$group": { "_id": "$state",
                                         "population": { "$sum": "$pop"}}}],
                    cursor = {}, 
                    explain = True,
                    allowDiskUse = True)
           
            for doc in results
                 print doc

Which of the following statements about aggregation results are true? Check all that apply.

Mapping between SQL and Aggregations

SQL Terms, Functions, and ConceptsMongoDB Aggregation Operators
WHERE$match
GROUP BY$group
HAVING$match
SELECT$project
ORDER BY$sort
LIMIT$limit
SUM()$sum
COUNT()$sum
joinNo direct corresponding operator; however, the $unwindoperator allows for somewhat similar functionality, but with fields embedded within the document.

Some Common SQL examples

From the documentacion SQL to Aggregation Mapping Chart:
The following table presents a quick reference of SQL aggregation statements and the corresponding MongoDB statements. The examples in the table assume the following conditions:
  • The SQL examples assume two tables, orders and order_lineitem that join by theorder_lineitem.order_id and the orders.id columns.
  • The MongoDB examples assume one collection orders that contain documents of the following prototype:
    {
      cust_id: "abc123",
      ord_date: ISODate("2012-11-02T17:04:11.102Z"),
      status: 'A',
      price: 50,
      items: [ { sku: "xxx", qty: 25, price: 1 },
               { sku: "yyy", qty: 25, price: 1 } ]
    }
    
SQL ExampleMongoDB ExampleDescription
SELECT COUNT(*) AS count
FROM orders
db.orders.aggregate( [
   {
     $group: {
        _id: null,
        count: { $sum: 1 }
     }
   }
] )
Count all records fromorders
SELECT SUM(price) AS total
FROM orders
db.orders.aggregate( [
   {
     $group: {
        _id: null,
        total: { $sum: "$price" }
     }
   }
] )
Sum theprice field from orders
SELECT cust_id,
       ord_date,
       SUM(price) AS total
FROM orders
GROUP BY cust_id,
         ord_date
HAVING total > 250
db.orders.aggregate( [
   {
     $group: {
        _id: {
           cust_id: "$cust_id",
           ord_date: {
               month: { $month: "$ord_date" },
               day: { $dayOfMonth: "$ord_date" },
               year: { $year: "$ord_date"}
           }
        },
        total: { $sum: "$price" }
     }
   },
   { $match: { total: { $gt: 250 } } }
] )
For each uniquecust_id,ord_dategrouping, sum the pricefield and return only where the sum is greater than 250. Excludes the time portion of the date.
SELECT cust_id,
       SUM(price) as total
FROM orders
WHERE status = 'A'
GROUP BY cust_id
HAVING total > 250
db.orders.aggregate( [
   { $match: { status: 'A' } },
   {
     $group: {
        _id: "$cust_id",
        total: { $sum: "$price" }
     }
   },
   { $match: { total: { $gt: 250 } } }
] )
For each uniquecust_id with status A, sum the pricefield and return only where the sum is greater than 250.
SELECT cust_id,
       SUM(li.qty) as qty
FROM orders o,
     order_lineitem li
WHERE li.order_id = o.id
GROUP BY cust_id
db.orders.aggregate( [
   { $unwind: "$items" },
   {
     $group: {
        _id: "$cust_id",
        qty: { $sum: "$items.qty" }
     }
   }
] )
For each uniquecust_id, sum the corresponding line item qtyfields associated with the orders.
SELECT COUNT(*)
FROM (SELECT cust_id,
             ord_date
      FROM orders
      GROUP BY cust_id,
               ord_date)
      as DerivedTable
db.orders.aggregate( [
   {
     $group: {
        _id: {
           cust_id: "$cust_id",
           ord_date: {
               month: { $month: "$ord_date" },
               day: { $dayOfMonth: "$ord_date" },
               year: { $year: "$ord_date"}
           }
        }
     }
   },
   {
     $group: {
        _id: null,
        count: { $sum: 1 }
     }
   }
] )
Count the number of distinctcust_id,ord_dategroupings. Excludes the time portion of the date.

Limitations of the aggregation framework

  1. by default we have 100MB for pipeline stages.
    1. use allowDiskUse option for more than 100MB
  2. limit of 16MB if you decide to return the results in one document. In Python by default 16MB because by default it returns one document
    1. use cursor option to return a cursor instead of a big document
  3. in a sharding system, using $sort or $group or anytring that requeries looking at all the results, the results will return to the primary shard of the database

No hay comentarios:

Publicar un comentario