Web applications development : MongoDB for developers 4/8. Performance

Indexes

Indexes are the most important factor in mongodb performance. By default the data is locked for sequently access.

Use indexing in order to find sorted data is faster than not indexing
The order of the keys in the index is important to find data fast
Because indexes take space on disk and need to be updated every write, it is important not to index for all keys of the document- It is much more efficiently to have indexes for all the most common queries
We can use the keys of index in the order of are defined. index (a,b,c) --> index for a, index for a,b but not index for c or index for c.

The optimization that have the greatest impact on the performance of a database is adding appropiate indexes on large collections so that only a small percentatge of queries need to scan the collection

Creating Indexes

db.collection.ensureIndex({ camp: order (1 or -1)})
db.system.indexes.find() --> show all the indexes
db.collection.getIndexes() --> show the indexes of a collection
db.collection.dropIndex({}) --> drop an index of a collection

Please provide the mongo shell command to add an index to a collection namedstudents, having the index key be class, student_name.

db.students.ensureIndex({class:1, student_name:1})

Multikey Indexes

mongodb supports: an index on a key with value can be an array

mongodb does not support: an index with combinations of a key with value of an array and other document's elements.

An index with more than one combination of arrays and values than are not arrays:
            ensureIndex({a:1, b:1})
            {a:1,b:1}
            {a:[1,2,3] , b:1}        --> support
            {a:[1,2,3] , b:[4,5,6]} --> not support

db.collection.find().explain() --> show ahow the find has been made

Suppose we have a collection foo that has an index created as follows:

db.foo.ensureIndex({a:1, b:1})

Which of the following inserts are valid to this collection?

we can make an index of subparts of an array:

b: [ {a:1,b:1, c: [1,2,3] }]

Two parallel arrays index are not allowed.

Index Creation option, Unique

Please provide the mongo shell command to add a unique index to the collectionstudents on the keys student_id, class_id.

db.students.ensureIndex({student_id:1, class_id:1}, {unique: true})

Index Creation, Removing Dups

db.collection.ensureIndex( {a:1},{unique: true, dropDups:true}}) --> when the index is created, if it finds a dupplicated document, remove all documents that have this dupplicated key, except one

If you choose the dropDups option when creating a unique index, what will the MongoDB do to documents that conflict with an existing index entry?

Delete them for ever and ever, Amen.

Index Creation, Sparse

To create unique indexes when the indexed key is not present in the document.
    1. {a:1,b:2,c:3}
    2. {a:10,b:5,c:10}
    3. {a:13,b:4}
    4. {a:7,b:23}

In this documents, a Spare index will create a index with the present keys discarding the documents that do not contain the same key. If we want to index for {c:1}, the documents 3 and 4 will not be added to the index

db.collection.ensureIndex( {a:1},{unique: true, sparse:true}}) --> it will create a disperse index
db.collection.find().sort().hint() --> hint() forces the query optimizer to use a specific index to fulfill the query

Suppose you had the following documents in a collection called people with the following docs:

> db.people.find()
{ "_id" : ObjectId("50a464fb0a9dfcc4f19d6271"), "name" : "Andrew", "title" : "Jester" }
{ "_id" : ObjectId("50a4650c0a9dfcc4f19d6272"), "name" : "Dwight", "title" : "CEO" }
{ "_id" : ObjectId("50a465280a9dfcc4f19d6273"), "name" : "John" }

And there is an index defined as follows:

db.people.ensureIndex({title:1}, {sparse:1})

If you perform the following query, what do you get back, and why?

db.people.find({title:null})

No documents, because the query uses the index and there are no documents with title:null in the index.

Index Creation, Background

foreground (default)    Background:
        faster                     slow
        block writes            dos not block writers
            (per DBlock)

Which things are true about creating an index in the background in MongoDB. Check all that apply.

Using Explain

Inform how the query was done,which index was used to and how they were used.

Given the following output from explain, what is the best description of what happened during the query?

{
 "cursor" : "BasicCursor",
 "isMultiKey" : false,
 "n" : 100000,
 "nscannedObjects" : 10000000,
 "nscanned" : 10000000,
 "nscannedObjectsAllPlans" : 10000000,
 "nscannedAllPlans" : 10000000,
 "scanAndOrder" : false,
 "indexOnly" : false,
 "nYields" : 7,
 "nChunkSkips" : 0,
 "millis" : 5151,
 "indexBounds" : {
  
 },
 "server" : "Andrews-iMac.local:27017"
}

The query scanned 10,000,000 documents, returning 100,000 in 5.2 seconds.

When is an index used?

MongoDb extract estatistic information of the useful queries and choose the best indexation in background every 100 queries more or less.

Given collection foo with the following index:

db.foo.ensureIndex({a:1, b:1, c:1})

Which of the following queries will use the index?

How large is your index?

Indexes have to be in memory in order to get good performance. The size of the index can be very big and will use a lot of memory. This is a consideration at time to planning what sort of indexes we want to create for the documents that we have.

db.collection.stats() --> statistic information
db.collection.totalIndexSize() --> get information of size on disc of indexes

Is it more important that your index or your data fit into memory?

Index Cardinality

Regular index: 1 to 1
Sparse index: <= documents
Multikey index: with array of tags > number of documents

Let's say you update a document with a key called tags and that update causes the document to need to get moved on disk. If the document has 100 tags in it, and if the tags array is indexed with a multikey index, how many index points need to be updated in the index to accomodate the move?

100

Indexing in pyMongo

db.collection.ensureIndex([ ('key1', pymongo.ASCENDING), ('key2', pymongo.DESCENDING)])

Hinting an Index

db.people.find().sort({'title':1}).hint({'title:1}
db.people.find().sort({'title':1}).hint({ $natural:1 }) --> specify the index which is the best for mongodb

hint() specify wich index will be used. Using an index with a key that do not exist in the documents, the query cannot be executed because there is not any pointer in the index to any document.

Given the following data in a collection:

> db.people.find()
{ "_id" : ObjectId("50a464fb0a9dfcc4f19d6271"), "name" : "Andrew", "title" : "Jester" }
{ "_id" : ObjectId("50a4650c0a9dfcc4f19d6272"), "name" : "Dwight", "title" : "CEO" }
{ "_id" : ObjectId("50a465280a9dfcc4f19d6273"), "name" : "John" }

and the following indexex:

> db.people.getIndexes()
[
 {
  "v" : 1,
  "key" : {
   "_id" : 1
  },
  "ns" : "test.people",
  "name" : "_id_"
 },
 {
  "v" : 1,
  "key" : {
   "title" : 1
  },
  "ns" : "test.people",
  "name" : "title_1",
  "sparse" : 1
 }
]

Which query below will return the most documents.

hint natural to use BasicCursor returns all docs.

Efficiency of index use

There are elements that $gt, $lt, $eq, $ne, $exist than can make the query slow because have to examine all the documents.

Is better to use regular expressions /abcd/ -> look for a,b,c,d, /^abcd/ do not look for a,b,c,d

Keep in mind when you think aboinut indexing you have to consider how the index was used: only for the sort o if it was used inefficiently and caused that de database examined millions of records, etc

Geospatial Indexes

They are indexes based in locations using 2D coordinates. : {'location': [x,y] }

ensureIndex({ "location": '2d', type: 1})
find({location: { "$near" : [x,y] }} ) --> retorn locatiosn in increase distances

Suppose you have a 2D geospatial index defined on the key location in the collection places. Write a query that will find the closest three places (the closest three documents) to the location 74, 140.

db.places.find({location: {$near: [74,140]}}).limit(3)

Geospatial Spherical

lng -> vertical
lat -> horizontal (-90 to 90)

specification GeoJSON -> ( )
{ "location" : { Type : "Point", "coordinates : [-122,40] "} }

ensureIndex( { location : '2dsphere'})

find( { "location" :
            { "$near" :
                { "$geometry" :
                    { "type"         : "Point" ,
                    "coordinates" : [-10,10] },
                    "$maxdistante" : 2000 <-- in meters
                    }
                }
            })

What is the query that will query a collection named "stores" to return the stores that are within 1,000,000 meters of the location latitude=39, longitude=-130? Type the query in the box below. Assume the stores collection has a 2dsphere index on "loc" and please use the "$near" operator. Each store record looks like this:

{ "_id" : { "$oid" : "535471aaf28b4d8ee1e1c86f" },

  "store_id" : 8,

  "loc" : { "type" : "Point", "coordinates" : [ -37.47891236119904, 4.488667018711567 ] } }

db.stores.find( { loc : { "$near" : { "$geometry" : { "type" : "Point", "coordinates : [ -130, 39]},"$maxdistance" : 1000000}}})

Full Text searches in mongoDb

There is a type of index that allow to look for text in the data.

ensureIndex( { 'words': 'text'})

db.collection.find( { "$text" : {"$search":'texto'}) --> look for dog in the documents no case-sensitive.

db.collection.find( { "$text" : {"$search":'word1 word2 word3 '}}, { "score" : {"$meta" : 'textScore'}}).sort( { "score": { "$meta" : 'textScore'}}) --> look for documents that contains all of the tree wordsYou create a text index on the "title" field of the movies collection, and then perform the following text search:

> db.movies.find( { $text : { $search : "Big Lebowski" } } )

Which of the following documents will be returned, assuming they are in the movies collection? Check all that apply.

Logging and profiling: log slow queries

Mongodb have a profiler to detect via log informing how mongod is accessing to database: system.profile

We have tree levels to log information of tue queries to know how our application is working:

level 0: default and it is log off
level 1: only log slow queries (register slow queries)
level 2: record all logs of the queries (register my queries) --> is for debugging

mongod --db dbpath --profile 1 --slows 2 (2 mseconds)

db.system.profile.find()
db.getProfilingLevel()
db.getProfilingStatus()
db.setProfilingLevel(1,4) level 1 , 4 mseconds
db.setProfilingLevel(0) --> off

Write the query to look in the system profile collection for all queries that took longer than one second, ordered by timestamp descending.

db.system.profile.find({millis: {$gt: 1000}}).sort({ts: -1})

MongoStat

system information of mongodb database.

column idx miss% --> % de perdida de memoria por los índices

MongoTop

give a high level view of how mongdb is spending the time.

Resume

    1. indexes are critical to performance
    2. explain()
    3. hint()
    4. profiling

Sharding

It is a techique to divide a collections in multiples servers

Application --> mongos --> mongod 1
--> mongod 2
--> mongod 3

It is necessaryt to include a sharding key to look for the server in which document is

Web applications development

Buscar

domingo, 23 de noviembre de 2014

MongoDB for developers 4/8. Performance