
Wednesday, January 21, 2015

MongoDB for DBAs 3/7. Performance

Chapter 3. PERFORMANCE: Indexing and Monitoring

Indexes Explained

In a binary-tree index, each key references a document location. MongoDB uses B-tree indexes, which are "n-ary" indexes: a single node holds multiple keys and has multiple children.


{ a: 1, b: -1 } --> where 1 --> ASCending order and -1 --> DEScending order
{ a: 1 } --> this index can serve both ASC and DESC sorts, because it can be traversed in either direction.
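The ordering a compound index such as { a: 1, b: -1 } imposes can be sketched as a plain comparator (an illustrative sketch of the sort order only, not MongoDB internals; the documents are made up):

```javascript
// Illustrative sketch of the sort order a { a: 1, b: -1 } index imposes
// (not MongoDB internals; the documents are made up).
function compoundCompare(x, y) {
  if (x.a !== y.a) return x.a < y.a ? -1 : 1; // a: 1  -> ascending
  if (x.b !== y.b) return x.b > y.b ? -1 : 1; // b: -1 -> descending
  return 0;
}

const docs = [
  { a: 1, b: 5 },
  { a: 1, b: 9 },
  { a: 0, b: 2 },
];
docs.sort(compoundCompare);
console.log(docs); // a ascending; ties on a broken by b descending
```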

db.collection.ensureIndex({ a : 1})

db.collection.find({}).sort({ a : 1})
db.collection.find({}).sort({ a : -1})

Both queries can use the same index to sort the documents, since the index can be walked in either direction.

How do a B-tree and a binary tree differ?

EnsureIndex() Command

It works well if the collection is really small or new. The one issue is creating an index on a large collection, because of the work required to build it.

On collections with billions of documents, index creation must be planned rather than done ad hoc, for example scheduled for a release maintenance window.


Which index would you create to optimize your reads if you used only these queries in your application, plus queries that differed from these only in the value?
find( { a : 5 , b : 10 } )
find( { a: { $gt : 5 }  } )
find( { a : { $lt : 100 } } )



Collection Scans

Without an index, it is necessary to scan all the collection's documents sequentially.
With an index, the document can be reached directly. Also, "duplicate keys" are allowed in MongoDB indexes: the indexed field's values may repeat, because uniqueness is guaranteed by each document's _id, not by the indexed field.

Index Notes

  1. Index keys can be of any type; because MongoDB has a dynamic schema, indexes are one of the few things we do pre-declare. It doesn't matter whether the indexed field contains integers, strings or nulls.
  2. The "_id" index is created automatically with a unique constraint on it, because the _id value must be unique within the collection and immutable (it shouldn't change for that document's lifetime). It is not unique across the entire database, just within the collection in question.
  3. Indexes other than "_id" are created explicitly using the ensureIndex command.
  4. Indexes are used automatically when present.
  5. We usually don't have to state explicitly which index we wish to be used for a query or a write operation.
  6. The contents of an array can be indexed --> "multikey" indexes.
  7. Subdocuments and subfields can be indexed.
  8. Field names are not stored in the index --> this saves space.

Every index created will take up more space, both on disk and potentially in RAM.

A MongoDB index can have keys of different types (e.g., ints, dates, strings).

Unique Indexes

  1. By default, indexes allow duplicate keys.
  2. Specifying the options document { "unique" : true } creates a unique index.
  3. In compound indexes, the unique constraint is checked against the combination of fields.
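The duplicate-key behavior can be sketched in miniature (illustrative only, not MongoDB internals; the E11000 prefix mimics the server's duplicate-key error message, and the field values are made up):

```javascript
// Illustrative sketch of unique-index behavior (not MongoDB internals;
// the E11000 prefix mimics the server's duplicate key error message).
function makeUniqueIndex() {
  const seen = new Set();
  return function insertKey(key) {
    if (seen.has(key)) throw new Error("E11000 duplicate key error");
    seen.add(key);
  };
}

const insertEmail = makeUniqueIndex();
insertEmail("a@example.com"); // first insert succeeds
try {
  insertEmail("a@example.com"); // duplicate key -> rejected
} catch (e) {
  console.log(e.message);
}
```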

If an index is created with the options document,
{ unique : true }
can 2 documents which do not include the field that is indexed exist in the same collection?

Sparse Indexes

The option { "sparse" : true } skips indexing documents that are missing the field, producing a much smaller, more efficient index.
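What a sparse index stores can be sketched as a filter over the documents (illustrative, not MongoDB internals; the field names and values are made up):

```javascript
// Illustrative sketch: a sparse index only has entries for documents that
// contain the indexed field (not MongoDB internals; docs are made up).
// The real index would be created with:
//   db.collection.ensureIndex({ x : 1 }, { sparse : true })
function sparseIndexKeys(docs, field) {
  return docs.filter(d => d[field] !== undefined).map(d => d[field]);
}

const docs = [{ x: 1 }, { y: 2 }, { x: 3, y: 4 }, {}];
console.log(sparseIndexKeys(docs, "x")); // [ 1, 3 ] -- docs without x get no entry
```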

If an index is unique AND sparse, can 2 documents which do not include the field that is indexed exist in the same collection?

TTL Indexes

A time-to-live index, sometimes referred to as a TTL collection: gives the system a way to automatically expire (remove) documents after a specified amount of time.
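The expiry rule can be sketched as a pure function (an illustrative sketch, not the actual background task; the sessions/createdAt names are made up):

```javascript
// Illustrative sketch of the TTL expiry rule (not the actual background
// task; collection/field names like sessions/createdAt are made up).
// The real index would be created with something like:
//   db.sessions.ensureIndex({ createdAt : 1 }, { expireAfterSeconds : 3600 })
function isExpired(createdAt, expireAfterSeconds, now) {
  return (now - createdAt) / 1000 >= expireAfterSeconds;
}

const created = new Date("2015-01-21T10:00:00Z");
console.log(isExpired(created, 3600, new Date("2015-01-21T10:30:00Z"))); // false
console.log(isExpired(created, 3600, new Date("2015-01-21T11:30:00Z"))); // true
```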


Geospatial Indexes

Earlier geospatial indexes were flat and two-dimensional ("2d"); newer versions also support spherical ("2dsphere") indexes, which model points on the surface of the Earth.


db.collection.ensureIndex( { "loc" : "2dsphere" } )

lng : -180 to 180 degrees
lat: -90 to 90 degrees

Examples:

db.cities.find( {
    "loc" : {
        "$near" : {
            "$geometry" : {
                "type" : "Point",
                "coordinates" : [ -105.230256, 20.615845 ]
            },
            "$maxDistance" : 20000,
            "$minDistance" : 0
        }
    }
} ).pretty()

(With a 2dsphere index, $near already computes spherical distances in meters, so no separate "spherical" flag is needed.)

db.cities.find( {
    "loc" : {
        "$geoWithin" : {
            "$polygon" : [ [-77.53466,23.75975],[-77.78,23.71],[-78.03405,24.28615],[-78.40848,24.57564],[-78.19087,25.2103],[-77.89,25.17],[-77.54,24.34],[-77.53466,23.75975],[-77.82,26.58],[-78.91,26.42],[-78.98,26.79],[-78.51,26.87],[-77.85,26.84],[-77.82,26.58],[-77,26.59],[-77.17255,25.87918],[-77.35641,26.00735],[-77.34,26.53],[-77.78802,26.92516],[-77.79,27.04],[-77,26.59] ]
        }
    }
} )

Text Indexes

In MongoDB we can create a text search index to optimize text searches. It builds a multikey index over all the words of the field, much like an index on an array.


db.collection.ensureIndex( { "things" : "text" } )

db.collection.find( { "$text" : { "$search" : "mobile" } } )
db.collection.find( { "$text" : { "$search" : "mobiles" } } ) --> matches singular or plural forms (stemming)
db.collection.find( { "$text" : { "$search" : "home car" } } ) --> matches documents containing the first word, the second word, or both (terms are ORed)

db.collection.ensureIndex( { "things" : "text" }, { "default_language" : "english" } )


You create a text index on the "title" field of the movies collection, and then perform the following text search:
db.movies.find( { $text : { $search : "Big Lebowski" } } )
Which of the following documents will be returned, assuming they are in the movies collection? Check all that apply.



dropDups parameter

{ dropDups : true } --> optional parameter when creating a unique index --> if there are any duplicates, all of the duplicate documents except one will be arbitrarily deleted (dropped) so the unique index can be built --> this deletes DATA


Background Index Creation

Characteristics of the optional parameter { background : true }:

  1. Creates the index in the background on the primary.
  2. We can keep writing to and reading from the collection while that's going on. However, it is not a background operation on the secondaries, for various technical reasons.
  3. Queries against the secondaries will block on this collection while the build is occurring.
  4. This is mainly interesting if you have a replica set and are writing to and reading from the primary.
  5. This option is slower than the foreground option.
  6. It may take a while (perhaps an hour for big collections); it is not immediate. The shell session that issued the build will wait until the index is created, but other collections can keep serving reads and writes on the primary while this is going on.
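A background build is requested through the same options-document mechanism; a minimal shell fragment (the collection and field names are illustrative, and this needs a running mongod):

```javascript
// Build the index in the background on the primary so reads and writes
// against this collection are not blocked there.
db.collection.ensureIndex( { a : 1 }, { background : true } )
```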

Characteristics of the default, foreground creation:

  1. It is good for building very large indexes on very large existing collections that are larger than RAM.
  2. It packs the index more tightly than a background build, because of the way B-trees work.

Covered Indexes

This is not a special index type; it just means the query optimizer can answer the query by looking only at the index, never fetching the documents themselves.
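The idea can be sketched as a simple check (an illustrative sketch, not the real query planner, which also has to consider multikey indexes, the implicit _id projection, and other details):

```javascript
// Illustrative sketch, not the real query planner: a query is "covered"
// when every field it filters on and every field it returns is part of
// one index, so the documents themselves never need to be fetched.
function isCovered(indexFields, queryFields, projectionFields) {
  const idx = new Set(indexFields);
  return queryFields.every(f => idx.has(f)) &&
         projectionFields.every(f => idx.has(f));
}

// With an index on { a: 1, b: 1 }, filtering on a and b and projecting
// only b (with _id excluded) can be answered from the index alone:
console.log(isCovered(["a", "b"], ["a", "b"], ["b"])); // true
console.log(isCovered(["a", "b"], ["a", "c"], ["b"])); // false
```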

Suppose we run:
db.foo.ensureIndex({a:1,b:2,c:3})
db.foo.find({a:"sports", b:{$gt:100}})
Then

Explain Plans and hint()

The explain() operation runs the query and reports statistics about what happened. It can still take some time and be slow; using the killOp command we can kill the operation.


The hint() command forces the use of a specific index.

db.collection.find({}).hint( { index_a : 1 } ).explain()

Ideally, you don't need to use hints because the query optimizer does a good job at picking the right query plan.

Hint always works.

Read & Write impact on Indexes

Do indexes speed up writes, slow down writes, or neither?
     It depends on what the operation is:

  1. inserts: more indexes will always make them slower
  2. updates or deletes: it depends on the situation.
    1. updating or deleting everything will be slower, because keys have to be removed from the index
    2. updating or deleting one record: one of the indexes might be helpful just to find the record to be modified

In general indexes:

  1. speed up reads
  2. slow down writes: a write can take much longer when there are many indexes, because every index touched by the operation must be updated
    1. inserts: take time proportional to the number of indexes
    2. deletes: faster than inserts, because the only index operation is to mark the key as deleted in the B-tree
    3. updates: can cause extra writes if an index exists on the updated field; if the updated field is not indexed, the operation is very fast
If you have very high write rates on a collection, having a lot of indexes would generally not be a great idea.


What are some disadvantages to having indexes that are not needed or not used? Check all that apply.


Effects of Document Movement

In MongoDB v2.2, an update operation needs enough space to save the new data.
  1. If there is enough padding to absorb the growth, no index writes are needed as long as the changed field is not in an index.
  2. If there is not sufficient padding, the document is moved with a move operation, which is more expensive: a new allocation unit has to be found, the old unit is marked as deleted, and the pointer to the new unit must be updated in all of the indexes, even for fields that are not being changed.
Moves are expensive. MongoDB adds some padding to documents if it sees moves happening, to try to reduce the number of moves, but this is a balance between the space used for padding and the number of moves occurring.

With the arrival of MongoDB 2.6, documents have padding by default -- usePowerOf2Sizes is on by default.


This helps minimize the number of document moves, and is a good default to have when you don't know how big documents are going to get when you update them over time, but it does trade some empty space for speed. Depending on your application, and how often documents will grow (and by how much) there may be a better allocation strategy.

PowerOf2Sizes

When MongoDB inserts a document, it adds some extra padding to the end of the allocated space, leaving a little room for additional data to come in.

If, during an update, there is not enough space to save the data, the document has to be moved to a new allocation, and the pointer to that new allocation must be updated in all the indexes -- a much more expensive operation.

How much padding gets added? MongoDB v2.6+ differs from older versions, at least by default. In v2.6+ there is a per-collection configuration option called powerOf2Sizes, a setting that can be either true or false and defaults to true.

powerOf2Sizes : true --> means that the allocation units for records come in powers of 2.
The smallest documents are placed in 32-byte allocation units; anything larger gets the next size up: 64, 128, 256 bytes, and so on, up to the 16MB document limit. Past 4MB, allocations grow by 1MB at a time instead.

The reason for this sizing strategy is to avoid heap fragmentation (deleted-space fragmentation) when records of varying sizes are repeatedly deleted and reinserted. Otherwise the collection ends up with many empty slots of all different sizes, and there is a tendency toward fragmentation.
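The sizing strategy above can be sketched as a function (an illustrative sketch of the described behavior, not the server's actual allocator):

```javascript
// Illustrative sketch of the powerOf2Sizes strategy described above
// (not the server's actual allocator): round the record size up to the
// next power of two starting at 32 bytes; past 4MB grow 1MB at a time,
// up to the 16MB document limit.
function allocationSize(docBytes) {
  const MB = 1024 * 1024;
  if (docBytes > 16 * MB) throw new Error("exceeds 16MB document limit");
  if (docBytes > 4 * MB) return Math.ceil(docBytes / MB) * MB; // 1MB steps
  let size = 32;
  while (size < docBytes) size *= 2;
  return size;
}

console.log(allocationSize(20));      // 32
console.log(allocationSize(100));     // 128
console.log(allocationSize(5000000)); // 5242880 (rounded up to a 1MB boundary)
```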


If you have a collection you really need to optimize and you want a very specific allocation unit size, you can turn this off.

Read & Write Recap


  1. generally, more indexes --> faster reads
  2. generally, more indexes --> slower writes
  3. it is faster to build an index post-import than pre-import, because otherwise every insertion during the import also has to update the B-tree. Post-import, index creation can be done in batch mode, bottom-up -- except for the _id index, which is there automatically the whole time.

currentOp() & killOp() Revisited

db.currentOp() -> to see what operations are currently running
db.killOp( <opId> ) -> to kill operations or commands


db.currentOp().inprog.forEach( function (op) {
    // a 0-second threshold prints every op; raise it to catch only slow ones
    if ( op.secs_running >= 0 ) {
        print("slow op in progress? secs: " + op.secs_running + " opid: " + op.opid);
    }
});

Cautions:

  1. Exercise care killing write ops on secondaries
  2. Exercise care killing compact data operations
  3. Don't kill internal ops


Which ops are safe to kill (without any special effort or cleanup or implications later beyond the obvious):

The Profiler

It runs at the level of an individual mongod. You can use it to profile individual MongoDB instances in a sharded cluster and in replica sets.

db.setProfilingLevel( level, slowMilliseconds )

levels:

  1. level = 0 --> OFF (the default)
  2. level = 1 --> selective: log "slow" operations, based on a threshold in milliseconds
  3. level = 2 --> ON (log all operations)

db.system.profile.find().pretty()

the system profile collection is a capped collection.

How much overhead is there when profiling is on? It slows things down, but not as much as you might expect.
db.system.namespaces.find()
db.system.profile.getIndexes()
db.system.profile.stats() --> we can see if the collection is capped and other information

True or False: The system profiler is on by default.

mongostat and mongotop

They are utilities at the command line that are useful for administrators:

  1. mongostat: 
    1. mongostat --help
    2. mongostat --port 27017
    3. gives information about:
      1. inserts per second
      2. queries per second
      3. updates per second
      4. deletes per second
      5. getmores per second --> includes secondaries pulling replication data
      6. commands per second
      7. flushes --> the background thread flushing data files to disk
      8. mapped --> how much memory is mapped by MongoDB
      9. virtual --> virtual size of the process
      10. resident --> resident size
      11. faults --> page faults
      12. locked db --> write lock percentage

  2. mongotop: a per-collection view of where time is being spent
