
Sunday, November 23, 2014

MongoDB for developers 4/8. Performance. Homeworks

Homework 4.1

Suppose you have a collection with the following indexes:
 
> db.products.getIndexes()
[
 {
  "v" : 1,
  "key" : {
   "_id" : 1
  },
  "ns" : "store.products",
  "name" : "_id_"
 },
 {
  "v" : 1,
  "key" : {
   "sku" : 1
  },
                "unique" : true,
  "ns" : "store.products",
  "name" : "sku_1"
 },
 {
  "v" : 1,
  "key" : {
   "price" : -1
  },
  "ns" : "store.products",
  "name" : "price_-1"
 },
 {
  "v" : 1,
  "key" : {
   "description" : 1
  },
  "ns" : "store.products",
  "name" : "description_1"
 },
 {
  "v" : 1,
  "key" : {
   "category" : 1,
   "brand" : 1
  },
  "ns" : "store.products",
  "name" : "category_1_brand_1"
 },
 {
  "v" : 1,
  "key" : {
   "reviews.author" : 1
  },
  "ns" : "store.products",
  "name" : "reviews.author_1"
 }
]

Which of the following queries can utilize an index? Check all that apply.
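As a rough mental model for answering this (a simplification, not the server's actual query planner), a query can use one of the indexes above when it constrains a leading prefix of that index's key pattern. A minimal Python sketch of the prefix rule, using the index definitions from the getIndexes output:

```python
# Sketch of the index-prefix rule: a query can use an index when the
# fields it filters on include the index's leading key. This is a
# simplification of MongoDB's real query planner.

INDEXES = {
    "_id_": ["_id"],
    "sku_1": ["sku"],
    "price_-1": ["price"],
    "description_1": ["description"],
    "category_1_brand_1": ["category", "brand"],
    "reviews.author_1": ["reviews.author"],
}

def usable_indexes(query_fields):
    """Return names of indexes whose leading key is constrained by the query."""
    names = []
    for name, keys in INDEXES.items():
        # It is enough that the first key of the index appears in the query;
        # remaining query fields are filtered after the index scan.
        if keys[0] in query_fields:
            names.append(name)
    return names

# A query on "brand" alone cannot use category_1_brand_1, because
# "brand" is not a leading prefix of ["category", "brand"]:
print(usable_indexes({"brand"}))         # []
print(usable_indexes({"category"}))      # ['category_1_brand_1']
print(usable_indexes({"sku", "price"}))  # ['sku_1', 'price_-1']
```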

Homework 4.2

Suppose you have a collection called tweets whose documents contain information about the created_at time of the tweet and the user's followers_count at the time they issued the tweet. What can you infer from the following explain output?
 
db.tweets.find({"user.followers_count":{$gt:1000}}).sort({"created_at" : 1 }).limit(10).skip(5000).explain()
{
        "cursor" : "BtreeCursor created_at_-1 reverse",
        "isMultiKey" : false,
        "n" : 10,
        "nscannedObjects" : 46462,
        "nscanned" : 46462,
        "nscannedObjectsAllPlans" : 49763,
        "nscannedAllPlans" : 49763,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 205,
        "indexBounds" : {
                "created_at" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ]
        },
        "server" : "localhost.localdomain:27017"
}
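Reading this legacy (pre-3.0) explain format: the cursor line shows the created_at index was walked in reverse to satisfy the sort (and scanAndOrder is false, so no in-memory sort), but the index bounds are unbounded, so tens of thousands of documents were examined to return 10. A small Python sketch that pulls the relevant facts out of the explain document above:

```python
# Derive a few facts from the legacy explain() document shown above.

explain = {
    "cursor": "BtreeCursor created_at_-1 reverse",
    "n": 10,
    "nscanned": 46462,
    "nscannedObjects": 46462,
    "scanAndOrder": False,
    "indexOnly": False,
    "millis": 205,
}

def summarize(e):
    """Return a few derived facts about how the query ran."""
    return {
        # A BtreeCursor means an index was used; a BasicCursor would
        # mean a full collection scan.
        "used_index": e["cursor"].startswith("BtreeCursor"),
        # scanAndOrder False: the index delivered the sort order,
        # so no in-memory sort was needed.
        "in_memory_sort": e["scanAndOrder"],
        # Documents examined per document returned; a large ratio means
        # the index did not help with the followers_count filter.
        "docs_scanned_per_returned": e["nscanned"] // e["n"],
    }

print(summarize(explain))
```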
 


Homework 4.3

Making the Blog fast
Please download hw4-3.zip from the Download Handout link to get started. This assignment requires MongoDB 2.2 or above.
In this homework assignment you will be adding some indexes to the post collection to make the blog fast.
We have provided the full code for the blog application and you don't need to make any changes, or even run the blog. But you can, for fun.
We are also providing a patriotic (if you are an American) data set for the blog. There are 1000 entries with lots of comments and tags. You must load this dataset to complete the problem.
 
# from the mongo shell
use blog
db.posts.drop()
# from a Mac or PC terminal window
mongoimport -d blog -c posts < posts.json
or
mongoimport --host localhost --port 27017 --db blog --collection posts --file "posts.json" --drop --stopOnError

The blog has been enhanced so that it can also display the top 10 most recent posts by tag. There are hyperlinks from the post tags to the page that displays the 10 most recent blog entries for that tag. (run the blog and it will be obvious)
Your assignment is to make the following blog pages fast:
  • The blog home page
  • The page that displays blog posts by tag (http://localhost:8082/tag/whatever)
  • The page that displays a blog entry by permalink (http://localhost:8082/post/permalink)
By fast, we mean that indexes should be in place to satisfy these queries such that we only need to scan the number of documents we are going to return. To figure out what queries you need to optimize, you can read the blog.py code and see what it does to display those pages. Isolate those queries and use explain to explore.
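A common rule of thumb when deriving the key patterns for queries like these: put the equality-match fields first and the sort fields after them, so the index both narrows the scan and delivers the documents pre-sorted. A small Python sketch of that rule (a simplification; the field names below are the ones blog.py queries on):

```python
# Rule-of-thumb sketch for choosing a compound index key pattern:
# equality-filter fields first, then sort fields with their directions.
# A simplification of index design, not MongoDB's planner.

def index_for(filter_fields, sort_spec):
    """filter_fields: equality-matched field names, in order.
    sort_spec: list of (field, direction) pairs from the query's sort."""
    keys = [(f, 1) for f in filter_fields]
    keys += [(f, d) for f, d in sort_spec if f not in filter_fields]
    return keys

# Home page: find().sort('date', -1).limit(10)
print(index_for([], [("date", -1)]))        # [('date', -1)]
# Posts by tag: find({'tags': tag}).sort('date', -1).limit(10)
print(index_for(["tags"], [("date", -1)]))  # [('tags', 1), ('date', -1)]
# Post by permalink: find_one({'permalink': permalink})
print(index_for(["permalink"], []))         # [('permalink', 1)]
```

These key patterns correspond to the ensure_index calls highlighted in the code below (pymongo.ASCENDING is 1, pymongo.DESCENDING is -1).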

****************************
    # returns an array of num_posts posts, reverse ordered
    def get_posts(self, num_posts):

##################################################################
        self.posts.ensure_index([ ("date", pymongo.DESCENDING)])
##################################################################        

        cursor = self.posts.find().sort('date', direction=-1).limit(num_posts)
        l = []

        for post in cursor:
            post['date'] = post['date'].strftime("%A, %B %d %Y at %I:%M%p") # fix up date
            if 'tags' not in post:
                post['tags'] = [] # fill it in if it's not there already
            if 'comments' not in post:
                post['comments'] = []

            l.append({'title':post['title'], 'body':post['body'], 'post_date':post['date'],
                      'permalink':post['permalink'],
                      'tags':post['tags'],
                      'author':post['author'],
                      'comments':post['comments']})

        return l

    # returns an array of num_posts posts, reverse ordered, filtered by tag
    def get_posts_by_tag(self, tag, num_posts):

##################################################################
        self.posts.ensure_index([ ("tags", pymongo.ASCENDING),("date", pymongo.DESCENDING)])
##################################################################        

        cursor = self.posts.find({'tags':tag}).sort('date', direction=-1).limit(num_posts)
        l = []

        for post in cursor:
            post['date'] = post['date'].strftime("%A, %B %d %Y at %I:%M%p")     # fix up date
            if 'tags' not in post:
                post['tags'] = []           # fill it in if it's not there already
            if 'comments' not in post:
                post['comments'] = []

            l.append({'title': post['title'], 'body': post['body'], 'post_date': post['date'],
                      'permalink': post['permalink'],
                      'tags': post['tags'],
                      'author': post['author'],
                      'comments': post['comments']})

        return l

    # find a post corresponding to a particular permalink
    def get_post_by_permalink(self, permalink):

##################################################################
        self.posts.ensure_index([ ("permalink", pymongo.ASCENDING)])
##################################################################        

        post = self.posts.find_one({'permalink': permalink})

        if post is not None:
            # fix up likes values. set to zero if data is not present
            for comment in post['comments']:
                if 'num_likes' not in comment:
                    comment['num_likes'] = 0

            # fix up date
            post['date'] = post['date'].strftime("%A, %B %d %Y at %I:%M%p")

        return post


Once you have added the indexes to make those pages fast run the following.
 
python validate.py

(Note that for folks who are using MongoLabs or MongoHQ, there are some command-line options to validate.py to make it possible to use those services.) Now enter the validation code below.


Homework 4.4

In this problem you will analyze a profile log taken from a MongoDB instance. To start, please download sysprofile.json from the Download Handout link and import it with the following command:

mongoimport -d m101 -c profile < sysprofile.json
or
mongoimport --host localhost --port 27017 --db m101 --collection profile --file "sysprofile.json" --drop --stopOnError

Now query the profile data, looking for all queries to the students collection in the database school2, sorted in order of decreasing latency.

db.profile.find({"ns" : /school2.students/}).sort({"millis":-1}).limit(1).pretty()

What is the latency of the longest running operation to the collection, in milliseconds?
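The shell query above just takes, over the profile documents whose ns matches school2.students, the one with the largest millis. A minimal Python sketch of the same operation over a couple of made-up profile documents (the values here are placeholders, not the homework answer):

```python
# Equivalent of
#   db.profile.find({"ns": /school2.students/}).sort({"millis": -1}).limit(1)
# done in plain Python. The sample documents below are invented;
# run the real query against your imported data for the answer.

profile = [
    {"ns": "school2.students", "op": "query", "millis": 15},
    {"ns": "school2.grades",   "op": "query", "millis": 500},
    {"ns": "school2.students", "op": "query", "millis": 250},
]

def slowest(docs, ns_substring):
    """Return the highest-latency operation whose ns contains the substring."""
    matching = [d for d in docs if ns_substring in d["ns"]]
    return max(matching, key=lambda d: d["millis"]) if matching else None

print(slowest(profile, "school2.students"))  # the 250 ms operation
```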

