
Friday, January 30, 2015

MongoDB for DBAs 4/7. Replication

Chapter 4. REPLICATION: Replication, Failover, Recovery

Replication Overview

Replica sets are redundant copies of the data across multiple virtual or physical machines, each with its own internal or attached storage.

The essence of the matter is to have each document on multiple servers: multiple, redundant copies of the same data.

Why would we do this?

The primary reasons we do replication are:

  1. HA (High Availability): if we lose a server, we have failover
  2. DS (Data Safety) in terms of durability
  3. DR (Disaster Recovery)
Other reasons are:
  1. We can potentially read or query from different places
  2. A bit of geographic scalability, because the servers don't have to be in the same facility
  3. Putting different workloads on different servers (controlled via read preference in MongoDB)

Why do we use replica sets? Check all that apply.

Replication with Sharding

If we have multiple copies of the data on multiple servers (for example three servers; some people would write N = 3), we have three copies of the data. If these machines use RAID for their storage, there may additionally be some mirroring at the disk level, but it doesn't matter whether or not the disk subsystem mirrors. If we also use sharding, we have multiple shards, each holding different documents; within one shard, the replicas all hold the same documents, the same information.

The replication factor is up to you to choose.
If we want two copies of the data, that means two machines per shard. It is unusual to have only one copy per shard, because that would be dangerous: if we had M = 1000 shards (1000 machines) and no extra copies, it would be very common for at least one of the thousand machines to be down at any given time.

So, the more shards you have, the more likely you are to want a high replication factor. With half a dozen shards, two copies might be enough; with 1000 shards, we would tend to use three, because the probability of two servers in the same shard being down at the same time is higher.
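As a rough sanity check on that intuition, here is a small sketch. This is an assumption-laden toy model, not anything MongoDB computes: each machine is assumed to fail independently with probability p, and a shard is lost only if all of its copies are down at once.

```javascript
// Toy availability model: each machine fails independently with
// probability pMachineDown; a shard is lost only when every one of its
// rf copies is down at the same time.
function pAnyShardDown(numShards, rf, pMachineDown) {
  const pShardDown = Math.pow(pMachineDown, rf);   // all copies of one shard down
  return 1 - Math.pow(1 - pShardDown, numShards);  // at least one shard down
}

// 1000 shards, one copy each: an outage somewhere is near-certain.
console.log(pAnyShardDown(1000, 1, 0.01));  // ≈ 0.99996
// The same cluster with three copies per shard is far safer.
console.log(pAnyShardDown(1000, 3, 0.01));  // ≈ 0.001
```

The exact numbers depend entirely on the assumed per-machine failure probability; the point is only that the risk grows with the number of shards and shrinks sharply with the replication factor.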

We don't have to use sharding to use replication.
We can use replication on an unsharded system, which is effectively a single shard.

Asynchronous Replication

Replication in MongoDB is asynchronous, a form of replication that is fairly common in databases.

In databases that replicate synchronously between master and slave, the machines tend to sit side by side on a very low-latency network to keep performance acceptable. Asynchronous replication, by contrast, tolerates higher latencies, so the members can be geographically separated.


Which of the following are true about replication in MongoDB?

Statement-based vs. Binary Replication

Statement-based replication is another property of replica sets, and it is best understood by contrasting it with binary replication.

There are a couple of ways to do replication:

  1. at a logical (statement) level
  2. at a binary level, shipping the raw data files across
If you replicate statements, the replication layer does not need to care how the data is physically stored on disk.

With binary replication, the system must track exactly which parts of the data files change in order to keep the replicas identical. MongoDB uses this style of mechanism in its journal files, but only for crash recovery of a single node; it is not involved in replication.
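To see what "logical level" means here: the operations MongoDB replicates through the oplog are themselves documents describing statements, not raw disk blocks. An illustrative, simplified entry (the exact field set varies by version):

```javascript
// Illustrative, simplified oplog entry; field values are made up.
{
  "ts" : Timestamp(1422612000, 1),  // when the operation happened
  "op" : "i",                        // "i" = insert, "u" = update, "d" = delete
  "ns" : "test.foo",                 // namespace the operation applies to
  "o"  : { "_id" : 1, "x" : 100 }    // the logical content of the operation
}
```

Because the entry describes the operation rather than bytes on disk, a secondary can apply it regardless of how its own data files are laid out.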



Replication Concepts

In MongoDB, the way we do replication is via replica sets. If we shard, each shard is a replica set with one or more servers in it, typically two or three, and there may be more. In fact, it is possible to have just one, which becomes the trivial case: not very useful, but the notion is supported. If we have three members in a replica set, RF = 3 (replication factor).


Automatic Failover

What replica sets give us beyond standard master/slave asynchronous replication is:

  1. Automatic failover
  2. Automatic node recovery after a failure occurs

If we have one primary and two secondaries and the primary goes down, the members monitor each other until the secondaries detect that the primary is gone. At that point the secondaries reach a consensus that the primary node is down and hold an election to determine which of them becomes the new primary. Replication then continues from the new primary to the remaining secondary. The client drivers are replica-set aware, so they realize they need to talk to the new primary instead of the one that went down.

The failover is not instantaneous; it takes a little bit of time (tens of seconds). But with automatic failover and a single primary accepting writes, we get strong consistency. It is also possible to read from secondaries by setting the read preference, which indicates whether reads are served by the primary or a secondary.

Imagine a replica set with 5 servers. What is the minimum number of servers (assume each server has 1 vote) to form a consensus?
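The consensus rule above boils down to a strict majority of the voting members, i.e. more than half of all votes. A quick sketch:

```javascript
// A replica-set election requires a strict majority:
// more than half of all votes in the set.
function majority(nVoters) {
  return Math.floor(nVoters / 2) + 1;
}

[3, 5, 7].forEach(function (n) {
  console.log(n + " voters -> " + majority(n) + " needed for consensus");
});
```

This is also why odd-sized sets are preferred: going from 3 voters to 4 raises the majority from 2 to 3 without letting the set survive any additional failures.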

Recovery

Suppose the primary that went down comes back: maybe it just lost power, booted back up, and auto-started the mongod process on the server based on how we had set it up.

What do we do?

What MongoDB replica sets do is automatically roll back any writes on the old primary that never made it out the door to the other servers. If a secondary went down, it is easy for it to sync back up: it just starts tailing the oplog from where it left off.


For recovery we also have the facility to get acknowledgement that a write has committed to a majority of the cluster. Once that has occurred, the write will never be rolled back.
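In the shell of this era, that acknowledgement can be requested through getLastError with w: "majority". A hypothetical session, assuming a running replica set:

```javascript
// Hypothetical mongo shell session; assumes a connection to the primary
// of a running replica set.
db.foo.insert({ x : 1 })
// Ask for acknowledgement that the write reached a majority of the set;
// once it has, the write will never be rolled back.
db.runCommand({ getLastError : 1, w : "majority", wtimeout : 5000 })
```

The wtimeout value bounds how long the command waits for the majority acknowledgement before giving up.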


MongoDB has a method to ensure a write was replicated and will never need to be rolled back:


Starting Replica Sets

To set up a three-node replica set using only one server, we will run three mongod processes, specifying a different port and database path for each one. If they were actually separate machines or virtual machines, that would not be necessary.


We will need three directories (dir1, dir2, dir3) starting mongod in each one.

mongod --replSet nameReplicaSet --dbpath dir1 --port 27001 --smallfiles --oplogSize 50 --logpath log.1 --logappend --fork

mongod --replSet nameReplicaSet --dbpath dir2 --port 27002 --smallfiles --oplogSize 50 --logpath log.2 --logappend --fork

mongod --replSet nameReplicaSet --dbpath dir3 --port 27003 --smallfiles --oplogSize 50 --logpath log.3 --logappend --fork

--oplogSize = size (in MB) of the replication operation log (oplog)
--logappend = append to the log file instead of resetting it to empty on every start
--smallfiles = use smaller initial data files

mongo --port 27001
db.isMaster() --> to find out whether this server is the primary node or not.

Only the primary accepts writes. If we get a "not master" error on a write operation, we can use db.getLastErrorObj() to confirm that the error is being returned by getLastError.

Why do we give replica sets names?

Initiating a Replica

What happens when we initiate the set concerns two things:

  1. the initial data
  2. the configuration of the replica set

Regarding the initial data: the member of the replica set that receives the initiate command may or may not already hold data.

  1. NO INITIAL DATA: every member starts out empty on initiation.
  2. INITIAL DATA PRESENT: we can use the member that holds it to seed the set; the other members must start out empty and will then sync from it. The command is replSetInitiate. In the shell we can type rs.initiate(<< config >>)

Regarding the configuration of the replica set: it depends on the parameters we pass at initiation. We can create a JSON document containing the configuration, like this:

config = {
    // name of the replica set
    "_id" : "nameReplicaSet",
    "members" : [
        // node 1
        { "_id" : 1, "host" : "nameHost.local:27001" },
        // node 2
        { "_id" : 2, "host" : "nameHost.local:27002" },
        // node 3
        { "_id" : 3, "host" : "nameHost.local:27003" }
    ]
}

Best practices for naming hosts in replica set configs:

  1. do not use raw IP addresses
  2. do not use names from /etc/hosts
  3. use DNS
    1. pick an appropriate TTL (e.g. 5 minutes if we may need to replace a machine quickly)


To initiate the replica set:

rs.initiate(config) --> mongod will create the oplogs

Replica Set Status

rs.status() = to get status information about the replica set.

What does optimeDate represent?

Replica Set Commands

rs.conf() = gets the configuration of the replica set, which is stored in the local database in a collection named system.replset. This is a special collection that exists just to hold the replica set config; it has at most one document in it, a sort of singleton collection.
The config data is a single document for a couple of reasons:

  1. We want to be able to modify the configuration atomically, and having it be a single document lets us achieve that.
  2. Reconfigurations must go through the replica set commands (rs.reconfig() in the shell); direct writes to the collection are not allowed.
rs.add() = to add a new host or a new member

rs.stepDown() = to step down from primary for a certain period of time, perhaps for administrative or maintenance work.

rs.syncFrom( host ) = to make a secondary sync from the given member

rs.freeze( secs ) = to make a node ineligible to become a primary for the time specified

Which command prevents a node from becoming primary for 5 minutes?

Reading & Writing

We can run rs.slaveOk() in a shell connected to a secondary to allow queries against that secondary.

Failover

When a primary goes down and then is brought back online, it will always resume primary status:

Read Preference

MongoDB drivers use the read preference when deciding where to send queries. With a replica set it is possible to read stale data from secondaries. Reasons for directing some reads to secondaries can be:

  1. geography: some nodes are nearer than others
  2. separating out a workload (e.g. an analytics server)
  3. spreading read load
  4. availability

Depending on what we want, we can read from the primary (the default) or choose a secondary.

What are good reasons to read from a secondary? Check all that apply.

Read Preference Options

Available in MongoDB v2.2+; you can also use tags to refine member selection. The modes are:
  1. primary (the default): when we open a connection in a client using one of the drivers, we can select one of these modes as the read preference. primary is the default because it gives the least surprising semantics: immediately consistent reads, with no surprises like stale data.
  2. primaryPreferred: try to talk to the primary, but if the driver cannot find it, read from a secondary.
  3. secondary: read only from secondaries, if we want to keep read load off the primary.
  4. secondaryPreferred: use secondaries first, but fall back to the primary if no secondary is reachable.
  5. nearest: go to the member of the set that is nearest in terms of network latency.
With modes 2, 3, 4 and 5, stale data is possible because reads may come from a secondary.
With modes 2, 4 and 5, reads from the primary are also possible.
  • use the default, primary, if you are not sure what to use
  • use nearest
    • in a remote data center
    • it can also be good for evening out read load
  • use secondary for certain reporting workloads, keeping in mind the possibility of replication lag
  • use secondaryPreferred if you have lots of secondary capacity
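A hypothetical shell session showing how a mode is set per connection (assumes MongoDB 2.2+ and a running replica set; the collection name is made up):

```javascript
// Hypothetical mongo shell session; assumes a running replica set.
db.getMongo().setReadPref("secondaryPreferred")
db.users.find()   // may be served by a secondary, so the data can be stale
```

In the drivers, the equivalent is an option on the connection or on the individual query, with the same mode names.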
For reads which must be consistent, which read preference(s) is used?
