Indexes
Indexes are the most important factor in MongoDB performance. Without an index, the data has to be looked up by sequential access (a scan of the whole collection).
- Using an index to find and sort data is faster than not using one
- The order of the keys in the index matters for finding data fast
- Because indexes take space on disk and have to be updated on every write, it is important not to index every key of the document. It is much more efficient to have indexes for the most common queries
- The keys of an index can be used in the order in which they are defined: index (a,b,c) serves queries on a, on a,b and on a,b,c, but not on b, on c, or on b,c.
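The prefix rule above can be sketched as a toy model in Python. This is not MongoDB's query planner: `usable_prefix` is a hypothetical helper that only shows which leading keys of a compound index a query can use.

```python
def usable_prefix(index_keys, query_fields):
    """Return the leading index keys that a query on `query_fields` can use.

    Toy model of the compound-index prefix rule: the index narrows the search
    only for an unbroken run of keys starting at its first key.
    """
    prefix = []
    for key in index_keys:
        if key not in query_fields:
            break  # the first missing key ends the usable prefix
        prefix.append(key)
    return tuple(prefix)


index_abc = ("a", "b", "c")
for fields in (["a"], ["a", "b"], ["a", "b", "c"], ["c"], ["b", "c"], ["a", "c"]):
    used = usable_prefix(index_abc, set(fields))
    print(fields, "->", used if used else "index not used")
```

For a query on a and c, the model uses the index only for a. The condition on c then has to be checked on the documents that the index finds.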
Creating Indexes
- db.collection.ensureIndex({ field: 1 }) --> create an index on field; 1 = ascending, -1 = descending (createIndex() in MongoDB 3.0 and later)
- db.system.indexes.find() --> show all the indexes
- db.collection.getIndexes() --> show the indexes of a collection
- db.collection.dropIndex({ field: 1 }) --> drop an index of a collection (pass the same key document used to create it)
Multikey Indexes
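ensureIndex({a:1, b:1})
{a:1, b:1} --> supported
{a:[1,2,3], b:1} --> supported (only one indexed field is an array)
{a:[1,2,3], b:[4,5,6]} --> not supported: a compound index cannot index two array fields in the same document
db.foo.ensureIndex({a:1, b:1})
Which of the following inserts are valid for this collection?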
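Here is a minimal Python sketch of the multikey rule above. It assumes the only constraint is "at most one array-valued indexed field per document". Real MongoDB enforces this by rejecting the insert with an error. `array_fields` and `can_insert` are hypothetical helpers.

```python
def array_fields(doc, index_keys):
    """Return the indexed fields whose value in `doc` is an array."""
    return [key for key in index_keys if isinstance(doc.get(key), list)]


def can_insert(doc, index_keys):
    """A compound multikey index allows at most one array-valued indexed field per document."""
    return len(array_fields(doc, index_keys)) <= 1


index_ab = ("a", "b")
print(can_insert({"a": 1, "b": 1}, index_ab))                  # True
print(can_insert({"a": [1, 2, 3], "b": 1}, index_ab))          # True
print(can_insert({"a": [1, 2, 3], "b": [4, 5, 6]}, index_ab))  # False
```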
Index Creation option, Unique
Index Creation, Removing Dups
If you choose the dropDups option when creating a unique index, what will MongoDB do to documents that conflict with an existing index entry?
Delete them for ever and ever, Amen.
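The following toy Python model shows how a unique index is built, with and without dropDups. It assumes the first document with a given value is kept and later duplicates are deleted. `build_unique_index` and the sample documents are hypothetical. The dropDups option itself was removed in MongoDB 3.0.

```python
def build_unique_index(docs, key, drop_dups=False):
    """Build a unique index {value: _id} on `key` over `docs` (a list of dicts).

    Returns (index, kept_docs). Without drop_dups a duplicate raises ValueError;
    with drop_dups the first document for each value is kept and later
    duplicates are dropped from the collection.
    """
    index = {}
    kept = []
    for doc in docs:
        value = doc.get(key)  # a missing key counts as null, so two such docs also conflict
        if value in index:
            if not drop_dups:
                raise ValueError(f"duplicate key {key}: {value!r}")
            continue  # dropDups: this document is deleted for good
        index[value] = doc["_id"]
        kept.append(doc)
    return index, kept


docs = [{"_id": 1, "name": "Andrew"}, {"_id": 2, "name": "Dwight"}, {"_id": 3, "name": "Andrew"}]
index, kept = build_unique_index(docs, "name", drop_dups=True)
print(index)                     # {'Andrew': 1, 'Dwight': 2}
print([d["_id"] for d in kept])  # [1, 2]
```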
Sparse indexes are used to create unique indexes when the indexed key is not present in every document.
1. {a:1,b:2,c:3}
2. {a:10,b:5,c:10}
3. {a:13,b:4}
4. {a:7,b:23}
In these documents, a sparse index only creates entries for the documents that contain the indexed key and leaves out the others. If we index {c:1}, documents 3 and 4 will not be added to the index.
Quiz (people collection with a sparse index on title, see Index Creation, Sparse): db.people.find({title:null}) returns no documents, because the query uses the index and there are no documents with title:null in the index. This is the answer for the MongoDB version used in the course. Newer versions avoid a sparse index when it could return incomplete results, unless the index is hinted.
explain() reports how the query was done: which index was used, if any, and how it was used.
Quiz: given the explain output in the Using Explain section, what is the best description of what happened during the query? The query scanned 10,000,000 documents, returning 100,000 in 5.2 seconds.
MongoDB's query optimizer gathers statistics about which query plans work best, and it re-evaluates its choice of index in the background roughly every 100 queries.
Given collection foo with the following index: db.foo.ensureIndex({a:1, b:1, c:1})
Indexes have to fit in memory to get good performance. An index can be very large and use a lot of memory, so this has to be considered when planning which indexes to create for our documents. Is it more important that your index or your data fit into memory? Answer: the index.
Let's say you update a document with a key called tags and that update causes the document to need to get moved on disk. If the document has 100 tags in it, and if the tags array is indexed with a multikey index, how many index points need to be updated in the index to accommodate the move? Answer: 100
In PyMongo (Python): db.collection.create_index([('key1', pymongo.ASCENDING), ('key2', pymongo.DESCENDING)]) (ensure_index() in older PyMongo versions)
hint() specifies which index will be used. If you hint an index on a key that does not exist in some documents (a sparse index), those documents cannot be returned, because the index has no pointer to them.
For the people collection in the Hinting an Index quiz, hinting {$natural:1} forces a BasicCursor (a collection scan), which returns all the documents.
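With regular expressions, prefer anchored ones: /abcd/ has to examine every index key looking for abcd anywhere in it, while /^abcd/ only needs to look at the keys that start with abcd, so it can use the index efficiently.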
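To show why the anchor matters, this sketch models an index as a sorted list of string keys and counts how many keys each kind of regex examines. This is a simplified model of a B-tree range scan. `scan_unanchored`, `scan_prefix` and the sample keys are hypothetical.

```python
import bisect
import re


def scan_unanchored(sorted_keys, pattern):
    """/abcd/: the match can be anywhere, so every key must be examined."""
    regex = re.compile(pattern)
    matches = [key for key in sorted_keys if regex.search(key)]
    return matches, len(sorted_keys)


def scan_prefix(sorted_keys, prefix):
    """/^abcd/: jump to the first key >= prefix and stop once keys no longer start with it."""
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    examined = 0
    for key in sorted_keys[start:]:
        examined += 1  # includes the key that ends the range
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches, examined


keys = sorted(["zzz", "abcd", "b", "abcde", "aaa", "xabcd", "abcdz", "abc"])
print(scan_unanchored(keys, "abcd"))  # 4 matches, 8 keys examined
print(scan_prefix(keys, "abcd"))      # 3 matches, 4 keys examined
```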
When you think about indexing, keep in mind how the index was used: only for the sort, or inefficiently, causing the database to examine millions of records, etc.
Geospatial indexes are based on locations using 2D coordinates: {'location': [x,y]}
ensureIndex({ "location": '2d', type: 1})
find({location: { "$near" : [x,y] }} ) --> returns locations in order of increasing distance
Quiz: suppose you have a 2D geospatial index defined on the key location in the collection places. Write a query that will find the closest three places (the closest three documents) to the location 74, 140. Answer: db.places.find({location: {$near: [74,140]}}).limit(3)
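The following toy Python version of a flat '2d' $near query sorts documents by Euclidean distance and applies a limit, as in the quiz answer. It does not model how the real index prunes the search. `near` and the sample places are hypothetical.

```python
import math


def near(docs, point, limit=None):
    """Toy $near on a flat '2d' index: docs ordered by Euclidean distance to point."""
    px, py = point
    ordered = sorted(
        docs,
        key=lambda d: math.hypot(d["location"][0] - px, d["location"][1] - py),
    )
    return ordered[:limit] if limit is not None else ordered


places = [
    {"name": "p1", "location": [70, 140]},
    {"name": "p2", "location": [74, 141]},
    {"name": "p3", "location": [0, 0]},
    {"name": "p4", "location": [80, 130]},
    {"name": "p5", "location": [74, 138]},
]
closest_three = [p["name"] for p in near(places, (74, 140), limit=3)]
print(closest_three)  # ['p2', 'p5', 'p1']
```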
{ "location" : { "type" : "Point", "coordinates" : [-122, 40] } }
ensureIndex( { location : '2dsphere' } )
find( { "location" :
{ "$near" :
{ "$geometry" :
{ "type" : "Point" ,
"coordinates" : [-10,10] },
"$maxDistance" : 2000   // in meters
}
}
})
Quiz: what query returns the stores in a collection named "stores" that are within 1,000,000 meters of latitude=39, longitude=-130? Assume the stores collection has a 2dsphere index on "loc" and use the "$near" operator (a store record is shown in the Geospatial Spherical section). Answer: db.stores.find( { loc : { "$near" : { "$geometry" : { "type" : "Point", "coordinates" : [ -130, 39 ] }, "$maxDistance" : 1000000 } } } )
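The quiz answer can be checked with a haversine sketch that filters points by great-circle distance. It assumes a spherical Earth of radius 6,371 km, and MongoDB's internal constant may differ slightly. Store 8's coordinates come from the sample record. Store 99 is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters


def haversine_m(lng1, lat1, lng2, lat2):
    """Great-circle distance in meters between two [lng, lat] points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def near_sphere(docs, center, max_distance_m):
    """Toy 2dsphere $near with $maxDistance: nearest first, farther ones dropped."""
    lng, lat = center
    with_dist = [(haversine_m(lng, lat, *d["loc"]["coordinates"]), d) for d in docs]
    with_dist.sort(key=lambda pair: pair[0])
    return [d for dist, d in with_dist if dist <= max_distance_m]


stores = [
    {"store_id": 8, "loc": {"type": "Point", "coordinates": [-37.47891236119904, 4.488667018711567]}},
    {"store_id": 99, "loc": {"type": "Point", "coordinates": [-130, 45]}},  # hypothetical, ~667 km away
]
print([s["store_id"] for s in near_sphere(stores, (-130, 39), 1_000_000)])  # [99]
```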
ensureIndex( { 'words': 'text' } )
db.collection.find( { "$text" : { "$search" : 'dog' } } ) --> looks for "dog" in the indexed text; the search is case-insensitive
db.collection.find( { "$text" : { "$search" : 'word1 word2 word3' } }, { "score" : { "$meta" : 'textScore' } } ).sort( { "score" : { "$meta" : 'textScore' } } ) --> looks for documents that contain any of the three words (the terms are combined with a logical OR) and sorts them by relevance score
Quiz: you create a text index on the "title" field of the movies collection and then perform the text search shown in the Full Text searches section.
The profiler has three levels for logging information about queries, to see how our application is working.
db.system.profile.find({millis: {$gt: 1000}}).sort({ts: -1}) --> profiled queries that took more than 1000 ms, most recent first
Index Creation, Sparse
1. {a:1,b:2,c:3}
2. {a:10,b:5,c:10}
3. {a:13,b:4}
4. {a:7,b:23}
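A sparse index on c over the four documents above can be modeled as a list of (value, document number) entries that skips documents without the key. `build_sparse_index` is a hypothetical helper, not a MongoDB API.

```python
def build_sparse_index(docs, key):
    """Sparse index: only documents that contain `key` get an entry (value, doc number)."""
    return sorted((doc[key], pos) for pos, doc in enumerate(docs, start=1) if key in doc)


docs = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 10, "b": 5, "c": 10},
    {"a": 13, "b": 4},
    {"a": 7, "b": 23},
]
print(build_sparse_index(docs, "c"))  # [(3, 1), (10, 2)] -> documents 3 and 4 are not indexed
```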
- db.collection.ensureIndex({a:1}, {unique: true, sparse: true}) --> creates a unique sparse index: documents without a are not indexed, so they do not conflict with each other
- db.collection.find().sort().hint() --> hint() forces the query optimizer to use a specific index to fulfill the query
> db.people.find()
{ "_id" : ObjectId("50a464fb0a9dfcc4f19d6271"), "name" : "Andrew", "title" : "Jester" }
{ "_id" : ObjectId("50a4650c0a9dfcc4f19d6272"), "name" : "Dwight", "title" : "CEO" }
{ "_id" : ObjectId("50a465280a9dfcc4f19d6273"), "name" : "John" }
And there is an index defined as follows:
db.people.ensureIndex({title:1}, {sparse:1})
If you perform the following query, what do you get back, and why?
db.people.find({title:null})
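To see the two possible answers side by side, this sketch compares a lookup through a sparse title index with a full collection scan. It models the index as a plain list of (title, doc) pairs, which is an assumption for illustration only.

```python
people = [
    {"name": "Andrew", "title": "Jester"},
    {"name": "Dwight", "title": "CEO"},
    {"name": "John"},
]

# Sparse index on title: John has no title, so he gets no entry at all.
sparse_title_index = [(doc["title"], doc) for doc in people if "title" in doc]

# {title: null} answered through the index: no entry has a null title.
by_index = [doc["name"] for title, doc in sparse_title_index if title is None]

# {title: null} answered by a collection scan: a missing title matches null.
by_scan = [doc["name"] for doc in people if doc.get("title") is None]

print(by_index)  # []
print(by_scan)   # ['John']
```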
Index Creation, Background
Foreground (default): faster; blocks writers (per database lock)
Background: slower; does not block writers
Which things are true about creating an index in the background in MongoDB. Check all that apply.
Using Explain
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 100000,
"nscannedObjects" : 10000000,
"nscanned" : 10000000,
"nscannedObjectsAllPlans" : 10000000,
"nscannedAllPlans" : 10000000,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 7,
"nChunkSkips" : 0,
"millis" : 5151,
"indexBounds" : {
},
"server" : "Andrews-iMac.local:27017"
}
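The explain output above can be read with a small Python helper that pulls out the numbers behind the quiz answer. It assumes the pre-3.0 explain format, where "BasicCursor" means a collection scan and an index shows up as "BtreeCursor <index name>". `summarize` is hypothetical.

```python
explain = {
    "cursor": "BasicCursor",
    "n": 100000,
    "nscannedObjects": 10000000,
    "nscanned": 10000000,
    "millis": 5151,
}


def summarize(explain):
    """Summarize an old-style explain() document."""
    return {
        "used_index": explain["cursor"] != "BasicCursor",  # BasicCursor = no index
        "returned": explain["n"],
        "scanned": explain["nscanned"],
        "returned_per_scanned": explain["n"] / explain["nscanned"],
        "seconds": explain["millis"] / 1000,
    }


print(summarize(explain))
```

A ratio of 0.01 returned per scanned document, with no index, is what makes this query slow.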
When is an index used?
db.foo.ensureIndex({a:1, b:1, c:1})
Which of the following queries will use the index?
How large is your index?
- db.collection.stats() --> statistics about the collection, including index sizes
- db.collection.totalIndexSize() --> total size of the collection's indexes, in bytes
Index Cardinality
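- Regular index: 1 to 1 (one index entry per document)
- Sparse index: number of index entries <= number of documents
- Multikey index: with arrays of tags, the number of index entries can be > number of documents (one entry per array element)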
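The cardinality rules above can be checked with a toy entry counter. It assumes single-field indexes, non-empty arrays with distinct values, and that a regular index stores a missing key as null. `index_entry_count` and the `posts` data are hypothetical. The 100-tag post also shows why the earlier quiz answer is 100 index points.

```python
def index_entry_count(docs, key, sparse=False):
    """Count the entries a single-field index on `key` would hold (toy model)."""
    count = 0
    for doc in docs:
        if key not in doc:
            if not sparse:
                count += 1  # a regular index still gets an entry (key indexed as null)
            continue  # a sparse index skips documents without the key
        value = doc[key]
        # Multikey: one entry per array element (assumes non-empty arrays of distinct values).
        count += len(value) if isinstance(value, list) else 1
    return count


people = [{"name": "Andrew", "title": "Jester"}, {"name": "Dwight", "title": "CEO"}, {"name": "John"}]
posts = [{"_id": 1, "tags": [f"tag{i}" for i in range(100)]}, {"_id": 2, "tags": ["x", "y"]}, {"_id": 3}]

print(index_entry_count(posts, "_id"))                  # 3   regular: 1 to 1
print(index_entry_count(people, "title", sparse=True))  # 2   sparse: <= 3 documents
print(index_entry_count(posts, "tags"))                 # 103 multikey: > 3 documents
```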
Indexing in pyMongo
Hinting an Index
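- db.people.find().sort({'title':1}).hint({'title':1}) --> uses the sparse title_1 index, so only documents that have a title are returned
- db.people.find().sort({'title':1}).hint({ $natural:1 }) --> forces a collection scan without any index (BasicCursor), so every document is returned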
> db.people.find()
{ "_id" : ObjectId("50a464fb0a9dfcc4f19d6271"), "name" : "Andrew", "title" : "Jester" }
{ "_id" : ObjectId("50a4650c0a9dfcc4f19d6272"), "name" : "Dwight", "title" : "CEO" }
{ "_id" : ObjectId("50a465280a9dfcc4f19d6273"), "name" : "John" }
and the following indexes:
> db.people.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "test.people",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"title" : 1
},
"ns" : "test.people",
"name" : "title_1",
"sparse" : 1
}
]
Which query below will return the most documents.
Efficiency of index use
Operators such as $gt, $lt, $ne and $exists can make a query slow, because they may have to examine a large part of the index or all the documents.
Geospatial Indexes
find({location: { "$near" : [x,y] }} ) --> returns locations in order of increasing distance
Geospatial Spherical
- lng (longitude) -> vertical lines, -180 to 180
- lat (latitude) -> horizontal lines, -90 to 90
{ "location" : { "type" : "Point", "coordinates" : [-122, 40] } }   (GeoJSON order is [longitude, latitude])
find( { "location" :
{ "$near" :
{ "$geometry" :
{ "type" : "Point" ,
"coordinates" : [-10,10] },
"$maxDistance" : 2000   // in meters
}
}
})
What is the query that will query a collection named "stores" to return the stores that are within 1,000,000 meters of the location latitude=39, longitude=-130? Type the query in the box below. Assume the stores collection has a 2dsphere index on "loc" and please use the "$near" operator. Each store record looks like this:
{ "_id" : { "$oid" : "535471aaf28b4d8ee1e1c86f" },
"store_id" : 8,
"loc" : { "type" : "Point", "coordinates" : [ -37.47891236119904, 4.488667018711567 ] } }
Full Text searches in mongoDb
There is a type of index (the text index) that allows searching for text in the data.
> db.movies.find( { $text : { $search : "Big Lebowski" } } )
Which of the following documents will be returned, assuming they are in the movies collection? Check all that apply.
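The following toy Python model of $text combines the search words with OR, ignores case, and scores a document by how many words it matches. Real text search also stems words, removes stop words and weights terms, so this is only an approximation. `text_search` and the movie titles are hypothetical and are not the quiz options.

```python
import re


def text_search(docs, field, search):
    """Toy $text: case-insensitive OR of the search words, best score first."""
    terms = set(search.lower().split())
    results = []
    for doc in docs:
        words = set(re.findall(r"\w+", doc.get(field, "").lower()))
        score = len(terms & words)  # number of distinct search words present
        if score:
            results.append((score, doc[field]))
    results.sort(key=lambda pair: -pair[0])  # stable sort keeps ties in collection order
    return results


movies = [{"title": "The Big Lebowski"}, {"title": "Big"}, {"title": "Lebowski"},
          {"title": "Big Fish"}, {"title": "Fargo"}]
print(text_search(movies, "title", "Big Lebowski"))
```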
Logging and profiling: log slow queries
MongoDB has a profiler that logs how mongod is accessing the database, writing the entries to the system.profile collection:
- level 0: default, profiling is off
- level 1: log only slow queries
- level 2: log all queries (useful for debugging)
- db.system.profile.find()
- db.getProfilingLevel()
- db.getProfilingStatus()
- db.setProfilingLevel(1,4) --> level 1, log queries that take longer than 4 milliseconds
- db.setProfilingLevel(0) --> off
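The profiling levels and the slow-query lookup can be sketched in Python. `should_log` models the three levels, assuming level 1 logs operations slower than the slowms threshold. The filter mirrors db.system.profile.find({millis: {$gt: 1000}}).sort({ts: -1}). The profile entries are hypothetical, with integer ts values in place of real dates.

```python
def should_log(level, millis, slowms):
    """Toy model of the profiler levels: 0 = off, 1 = slow operations only, 2 = everything."""
    if level == 0:
        return False
    if level == 1:
        return millis > slowms
    return True


profile = [
    {"op": "query", "ns": "test.people", "millis": 1500, "ts": 3},
    {"op": "query", "ns": "test.people", "millis": 20, "ts": 4},
    {"op": "update", "ns": "test.posts", "millis": 4200, "ts": 7},
]

# Equivalent of find({millis: {$gt: 1000}}).sort({ts: -1}) over the list above.
slow = sorted((e for e in profile if e["millis"] > 1000), key=lambda e: e["ts"], reverse=True)
print([e["ts"] for e in slow])  # [7, 3]
```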
MongoStat
System statistics of the MongoDB server.
Column idx miss% --> percentage of index accesses that were not found in memory
MongoTop
Gives a high-level view of where MongoDB is spending its time.
Summary
1. indexes are critical to performance
2. explain()
3. hint()
4. profiling
Sharding
A technique to split a collection across multiple servers:
Application --> mongos --> mongod 1
--> mongod 2
--> mongod 3
It is necessary to include the shard key in queries so that mongos can find the server that holds the document; otherwise the query has to be sent to every shard.
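Shard-key routing can be illustrated with a toy range-based router. It assumes a hypothetical shard key user_id with fixed chunk boundaries. Real chunk ranges are kept in the cluster's config servers and move over time. `route`, `CHUNK_UPPER_BOUNDS` and `SHARDS` are made up for illustration.

```python
import bisect

# Hypothetical chunks on shard key "user_id":
# [-inf, 1000) -> mongod 1, [1000, 2000) -> mongod 2, [2000, +inf) -> mongod 3
CHUNK_UPPER_BOUNDS = [1000, 2000]
SHARDS = ["mongod 1", "mongod 2", "mongod 3"]


def route(query, shard_key="user_id"):
    """Return the shards mongos would contact for an equality query (toy model)."""
    if shard_key in query:
        return [SHARDS[bisect.bisect_right(CHUNK_UPPER_BOUNDS, query[shard_key])]]
    return list(SHARDS)  # no shard key: the query goes to every shard


print(route({"user_id": 1500}))  # ['mongod 2']
print(route({"name": "John"}))   # ['mongod 1', 'mongod 2', 'mongod 3']
```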