
Wednesday, February 4, 2015

MongoDB for DBAs 5/7. Replication Part 2

Chapter 5. REPLICATION PART 2: Optimizing and monitoring your Replica Sets

Reconfiguring a Replica Set

You can reconfigure a replica set even while a member is down. If a set has a configuration, we might want to change it by:

  1. adding a new server
  2. changing the replication factor
  3. etc.

If we have a replica set with three members (a primary and two secondaries) and one of the secondaries goes down, we can still reconfigure the set, even though that server is down. Reconfiguring the set requires that a majority of the set's members, of the voters, are up. In this case we can reconfigure because we have a majority: two members up and only one member down. We will use the command:

rs.reconfig() using the shell --> send this command to the primary of the set (assuming the primary is up). When the reconfiguration is done, the version number of the configuration is incremented. When the downed secondary comes back up, it will pick up the updated configuration of the replica set.
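For instance, the first change listed above, adding a new server, can be sketched like this in the mongo shell; the new host on port 27004 is purely illustrative and requires a running set named abc:

```javascript
// Run against the primary of the set:
cfg = rs.conf()                                          // fetch the current config
cfg.members.push({ _id : 4, host : "localhost:27004" })  // add a new server
rs.reconfig(cfg)                                         // version is incremented automatically
```

In this simple case, rs.add("localhost:27004") is an equivalent shorthand.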

Example of the configuration parameters of a replica set that has been reconfigured:

abc.PRIMARY> rs.conf()
{
    "_id" : "abc",
    "version" : 2,
    "members" : [
        { "_id" : 1, "host" : "localhost:27001" },
        { "_id" : 2, "host" : "localhost:27002" },
        { "_id" : 3, "host" : "localhost:27003" }
    ]
}

Which of the following statements are true about reconfiguring a replica set? Check all that apply.

Arbiters

We can specify options in the configuration document when setting up a replica set:

config = {
    // name of the replica set
    "_id" : "abc",
    "members" : [
        // node 1
        { "_id" : 1, "host" : "localhost:27001", <options> },
        // node 2
        { "_id" : 2, "host" : "localhost:27002", <options> },
        // node 3
        { "_id" : 3, "host" : "localhost:27003", <options> }
    ]
}
options: --> 'arbiterOnly' : true --> indicates that this member of the set is what we call an arbiter: a member that holds no data. An arbiter can participate in elections in order to break a tie, acting as a tiebreaker.
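A sketch of a configuration document that makes the third member an arbiter, reusing the example hosts from above:

```javascript
config = {
    "_id" : "abc",
    "members" : [
        { "_id" : 1, "host" : "localhost:27001" },
        { "_id" : 2, "host" : "localhost:27002" },
        { "_id" : 3, "host" : "localhost:27003", "arbiterOnly" : true }
    ]
}
rs.initiate(config)
// on an already-running set, rs.addArb("localhost:27003") achieves the same
```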

If we have one primary and one secondary, and due to network problems the secondary cannot communicate with the primary, an election will take place. In the election, each member casts one vote for itself. Neither of them can be elected, because with only one of the two votes visible on each side, there is no majority.

If we add an arbiter (a member without data) to the replica set, the picture changes: during a partition, the data-bearing member that can still reach the arbiter sees two of the three votes, a majority, so it can be elected (or remain) primary. The arbiter contributes a vote even though it holds no data.

If two secondaries are competing in an election, each starts with its own vote, and members vote for the candidate with the newest data, so the secondary with the freshest data picks up the extra vote and wins the tiebreaker.

When might you want to use an arbiter? Check all that apply.

Priority Options

Another option is:

priority : <n> --> used when we want to express a bias as to which server should be primary when more than one is eligible. The higher the number, the higher the priority of the server. Possible values:

  1. value = 1 --> the default
  2. value = 0 --> never primary
  3. fractional values (e.g. 0.5) --> eligible for primary, but preferred less than any member with a higher priority
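A sketch of setting per-member priorities on a running set; the specific values here are illustrative:

```javascript
cfg = rs.conf()
cfg.members[0].priority = 2   // preferred primary
cfg.members[1].priority = 1   // default
cfg.members[2].priority = 0   // never primary
rs.reconfig(cfg)
```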

Which values are valid "priority" values in a replica set configuration? Check all that apply.

Hidden Option & Slave Delay

Another option which is related to priority is:

hidden : <bool> --> hides the member from clients (it goes together with priority 0, so it cannot be primary), although we may still want to query it directly. With this option set to true, clients will not route reads to the server; if we declare a disaster, we would probably bump up the priority and remove hidden : true.

slaveDelay : <seconds> --> indicates that the member must lag: it can never be fresher than the delay time. This option is super useful in the real world in terms of risk management and building reliable systems. In other words, the member delays replication by a fixed time, giving us a window to recover data that a client accidentally deleted.

You need to set:
  1. hidden : true
  2. priority : 0
  3. slaveDelay : <seconds>
at the same time, because we want the delayed replica set member hidden: we do not want clients reading its deliberately outdated data.

Example:
cfg = rs.conf()
cfg.members[0].priority = 0
cfg.members[0].hidden = true
cfg.members[0].slaveDelay = 3600
rs.reconfig(cfg)
After the replica set reconfigures, the delayed secondary member cannot become primary and is hidden from applications. The slaveDelay value delays both replication and the member’s oplog by 3600 seconds (1 hour).

When updating the replica configuration object, access the replica set members in the members array with the array index. The array index begins with 0. Do not confuse this index value with the value of the _id field in each document in the members array.

IMPORTANT: The length of the secondary slaveDelay must fit within the window of the oplog. If the oplog is shorter than the slaveDelay window, the delayed member cannot successfully replicate operations.

WARNING:
  • The rs.reconfig() shell method can force the current primary to step down, which causes an election. When the primary steps down, the mongod closes all client connections. While this typically takes 10-20 seconds, try to make these changes during scheduled maintenance periods.
  • To successfully reconfigure a replica set, a majority of the members must be accessible. If your replica set has an even number of members, add an arbiter to ensure that members can quickly obtain a majority of votes in an election for primary.
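The majority rule from that warning can be expressed as plain arithmetic; this helper is purely illustrative, not a MongoDB API:

```javascript
// Minimal sketch of the majority rule: a reconfiguration (or an election)
// needs a strict majority of the voting members to be reachable.
function hasMajority(votingMembers, membersUp) {
  return membersUp > Math.floor(votingMembers / 2);
}

hasMajority(3, 2); // true  -> one member down, reconfiguration still possible
hasMajority(4, 2); // false -> an even split is not a majority
```

This is why an even-sized set benefits from an arbiter: it turns a possible 2-2 split into a 3-2 majority.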

In which scenarios does it make sense to use slave delay? Check all that apply.

Voting Options

Another option is:

votes : <number> --> in general, as a best practice, do not change votes. For example, if we have two members (a replication factor of 2), an election cannot reach a majority because each member votes for itself. The recommended solution is to add an arbiter, but this option could also be used to give one member extra votes and so break the tie.

MongoDB allows at most seven voting members in a replica set, so if your replica set has more than seven members, you will need to assign any further members zero votes.
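For example, in a hypothetical nine-member set, the two members beyond the seven voters could be made non-voting like this (the member indexes are illustrative; non-voting members should also carry priority 0):

```javascript
cfg = rs.conf()
cfg.members[7].votes = 0
cfg.members[7].priority = 0
cfg.members[8].votes = 0
cfg.members[8].priority = 0
rs.reconfig(cfg)
```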

Generally, is it typical for servers to have different vote counts?

Applied Reconfiguration

If we want to make a server slave-delayed, we do not need to shut it down: we can apply a reconfiguration, updating the configuration parameters of the replica set.

> var cfg = rs.conf()
> cfg
{
    "_id" : "abc",
    "version" : 1,
    "members" : [
        { "_id" : 1, "host" : "localhost:27001" },
        { "_id" : 2, "host" : "localhost:27002" },
        { "_id" : 3, "host" : "localhost:27003" }
    ]
}

> cfg.members[2].slaveDelay = 180;

> rs.reconfig( cfg )
{ "ok" :  1 }

> rs.conf()
{
    "_id" : "abc",
    "version" : 2,
    "members" : [
        { "_id" : 1, "host" : "localhost:27001" },
        { "_id" : 2, "host" : "localhost:27002" },
        { "_id" : 3, "host" : "localhost:27003", "slaveDelay" : 180 }
    ]
}

> cfg.members[2].priority = 0;  --> this means the member can never become primary in an election

> rs.reconfig( cfg )
{ "ok" :  1 }

> rs.conf()
{
    "_id" : "abc",
    "version" : 3,
    "members" : [
        { "_id" : 1, "host" : "localhost:27001" },
        { "_id" : 2, "host" : "localhost:27002" },
        { "_id" : 3, "host" : "localhost:27003", "priority" : 0, "slaveDelay" : 180 }
    ]
}

> rs.reconfig( cfg )  --> if the server being reconfigured is the primary, clients will be disconnected and reconnected while an election runs (here less than a second; in other cases it may take ten seconds or longer). If the server is a secondary, there is no disconnection. After this change, the server still votes and is still a secondary.

If we now make server 3 slave-delayed:

> cfg.members[2].slaveDelay = 8*3600; --> a delay of eight hours
> cfg.members[2].hidden = true; --> set explicitly; the server may also force hidden to true for slave-delayed members

> cfg
{
    "_id" : "abc",
    "version" : 3,
    "members" : [
        { "_id" : 1, "host" : "localhost:27001" },
        { "_id" : 2, "host" : "localhost:27002" },
        { "_id" : 3, "host" : "localhost:27003", "priority" : 0, "slaveDelay" : 28800, "hidden" : true }
    ]
}

It is also possible to apply the reconfiguration by shutting down server 3, starting it back up, and then running the reconfiguration:

mongod --rest --replSet abc --dbpath 3 --port 27003 --oplogSize 50 --logpath log.3 --logappend --fork

> rs.reconfig( cfg )


Write Concern Principles

Write concern describes the guarantee that MongoDB provides when reporting on the success of a write operation. 

Write concern can include the 'w' option to specify the required number of acknowledgments before returning, the 'j' option to require writes to the journal before returning, and 'wtimeout' option to specify a time limit to prevent write operations from blocking indefinitely.

Principles of write concern in replica sets:
  1. a write is truly committed upon application at a majority of the set
  2. we can get acknowledgement of this

Combining these options in a write operation returns an acknowledgement once a cluster-wide commit has occurred, i.e. the data is committed and durable:

db.users.insert({ "aa" : 1 })
db.runCommand({ getLastError : 1, "w" : "majority", "wtimeout" : 8000 })

The 'w' parameter lets us say how many servers we would like acknowledgement back from. We could specify a number.

The 'j' option: confirms that the mongod instance has written the data to the on-disk journal. This ensures that data is not lost if the mongod instance shuts down unexpectedly. Set to true to enable. Changed in version 2.6: specifying a write concern that includes j: true to a mongod or mongos running with the --nojournal option now errors; previous versions would ignore j: true. NOTE: requiring journaled write concern in a replica set only requires a journal commit of the write operation to the primary of the set, regardless of the level of replica-acknowledged write concern.
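A minimal shell sketch of requesting a journaled acknowledgement, assuming journaling is enabled on the mongod:

```javascript
// Insert, then require a journal commit on the primary before returning:
db.users.insert({ "aa" : 2 })
db.runCommand({ getLastError : 1, j : true })
```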

The 'wtimeout' option: specifies a time limit, in milliseconds, for the write concern. wtimeout is only applicable for w values greater than 1. wtimeout causes write operations to return with an error after the specified limit, even if the required write concern will eventually succeed. When these write operations return, MongoDB does not undo successful data modifications performed before the write concern exceeded the wtimeout time limit. If you do not specify the wtimeout option and the level of write concern is unachievable, the write operation will block indefinitely. Specifying a wtimeout value of 0 is equivalent to a write concern without the wtimeout option.

getLastError() --> returns the error status of the preceding write operation on the current connection.

With MongoDB 2.6+, you can simply pass a write concern in the options parameter of your write operation (insert, update, or remove). For example, to perform an update on the students collection with a write concern of 3, you might use

db.students.update( { _id : 3 }, { $set : { grade : "A" } }, { writeConcern : { w : 3 } } )

Imagine you're using a 5-server replica set and you have critical inserts for which you do not want any potential for a rollback. You also have to consider that secondaries may be taken down from time to time for maintenance, leaving you with a potential 4-server replica set. Which write concern is best suited for these critical inserts?

Examining the 'w' parameter

The 'w' parameter lets us say how many servers we would like acknowledgement back from. Options:
  1. 'w' = 1 --> provides acknowledgment of write operations on a standalone mongod or the primary in a replica set. This is the default write concern for MongoDB.
  2. 'w' = 0 --> disables basic acknowledgment of write operations, but returns information about socket exceptions and networking errors to the application. If you disable basic write operation acknowledgment but require journal commit acknowledgment, the journal commit prevails, and the server will require that mongod acknowledge the write operation.
  3. 'w' = number > 1 --> guarantees that write operations have propagated successfully to the specified number of replica set members, including the primary. For example, w: 2 indicates acknowledgements from the primary and at least one secondary. If you set w to a number that is greater than the number of set members that hold data, MongoDB waits for the non-existent members to become available, which means MongoDB blocks indefinitely.
  4. 'w' = majority --> confirms that write operations have propagated to the majority of the configured replica set: a majority of the set's configured members must acknowledge the write operation before it succeeds. This allows you to avoid hard-coding assumptions about the size of your replica set into your application. Changed in version 2.6: in master/slave deployments, MongoDB treats w: "majority" as equivalent to w: 1; in earlier versions of MongoDB, w: "majority" produces an error in master/slave deployments.
  5. 'w' = <tag set> --> by specifying a tag set, you can have fine-grained control over which replica set members must acknowledge a write operation to satisfy the required level of write concern.
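A sketch of the tag-set option; the tag names, the custom mode name MultiDC, and the member indexes here are all illustrative:

```javascript
// 1) tag the members and define a custom write concern mode in one reconfiguration
cfg = rs.conf()
cfg.members[0].tags = { "dc" : "east" }
cfg.members[1].tags = { "dc" : "west" }
cfg.settings = { getLastErrorModes : { MultiDC : { "dc" : 2 } } }
rs.reconfig(cfg)

// 2) require acknowledgement from members covering two distinct "dc" values
db.users.insert({ "aa" : 1 })
db.runCommand({ getLastError : 1, w : "MultiDC" })
```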

Quiz: write concern is set at the DB level, and every operation on this DB must use the same write concern. True or false?

Write Concern Use Cases & Patterns

Why would we ever not want to call getLastError() and get a result back?

Use cases of write concern:

  1. no call to getLastError() --> example: a page view counter, where there is no user impact
  2. 'w' : 1 --> for writes that are not super critical
  3. 'w' : 'majority' --> for writes where durability is important
  4. 'w' : number > 1 --> for writes, keeping flow control in mind
  5. 'w' : <tag set> -->
  6. call every N writes --> depends on the criticality of the data and how we want to view it; we call getLastError() periodically:
for (i = 0; i < N; i++) {
   db.test.insert({ "aa" : i });
   if (i % 500 == 0 || i == N - 1)
      db.getLastError("majority");
}

A variation:

for (i = 0; i < N; i++) {
   db.test.insert({ "aa" : i });
   if (i % 500 == 0 || i == N - 1)
      db.getLastError("majority");
   else
      db.getLastError(1);
}

For getLastError / WriteConcern with w=3, if you have an arbiter, it counts as one of the 3.

Reexamining the Page View Counter Pattern


In order to have the best control over writes:

  1. use write concern
  2. use 'w' : 'majority'
  3. tune if and only if it is slow

Does getLastError() need to be called if using default Write Concerns?

wtimeout & Capacity Planning

In some situations it is interesting to call getLastError() when a job ends: wait and get an acknowledgement once the job is completed.


For example, suppose we are using write concern in a high-volume application doing a lot of writes: we will be hitting the primary with writes, which replicate to the secondaries:

getLastError( { 'w' : 'majority' } )

We are requesting acknowledgements at 'w' : 'majority' while many clients are writing at the same time. If the system is fast, this works quite well, because we get our acknowledgements back as soon as w:majority is satisfied.

Now suppose a scenario where the secondaries are lagging because of high volume or peak traffic. With plain 'w' : 'majority', the calls will simply block until the majority catches up. If we add the wtimeout option:

getLastError( { 'w' : 'majority', 'wtimeout' : 8000 })

then, when reaching 'w' : 'majority' takes longer than wtimeout, the calls return errors; meanwhile the system is slow and the number of open connections piles up very high. That pile-up is the real problem that gets triggered when things get slow.

Connections pile up on slowness (and a wtimeout is one kind of slowness). It is necessary to choose the number of open connections to the mongod server carefully: if we allow a maximum of ten thousand open connections across one hundred app servers, the max pool size per app server should be one hundred connections.
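The pool-size arithmetic from that example, as an illustrative helper (plain JavaScript, not a driver API):

```javascript
// Spread a server-side connection limit evenly across the app servers
// that share it, to bound how far connections can pile up.
function maxPoolSize(maxServerConnections, appServers) {
  return Math.floor(maxServerConnections / appServers);
}

maxPoolSize(10000, 100); // 100 connections per app-server pool
```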

Important:

  1. choose the max number of open connections
  2. choose the wtimeout carefully: better 10 seconds than 3, so the write concern has enough time to be satisfied
  3. monitor for lag

The best option is 'w' : 'majority', because if one member out of three is down or slow, we are waiting only for the fastest two members, assuming the slow one is a secondary and not the primary.

What are some issues with using wtimeout? Check all that apply.

Replica Sets in a Single Datacenter

Recommended configurations for replica sets. Possible scenarios in a single datacenter:

  1. 3 members
  2. 2 members + arbiter
  3. 2 members with manual failover
  4. 5 members (not often necessary, but it scales up)
  5. 2 members + 1 small member with priority = 0 (never primary)

Replica Sets in Multiple Datacenters

Possible scenarios with two datacenters:
  1. Datacenter 1
    1. 2 members (primary + secondary)
  2. Datacenter 2
    1. a secondary (with priority = 0)
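A sketch of this two-datacenter layout as a configuration document; the hostnames are invented for illustration:

```javascript
config = {
    "_id" : "abc",
    "members" : [
        { "_id" : 1, "host" : "dc1-a.example.net:27017" },                  // DC1, primary-eligible
        { "_id" : 2, "host" : "dc1-b.example.net:27017" },                  // DC1, secondary
        { "_id" : 3, "host" : "dc2-a.example.net:27017", "priority" : 0 }   // DC2, never primary
    ]
}
```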
Possible scenarios with three datacenters:
  1. Datacenter 1
    1. a secondary
  2. Datacenter 2
    1. a secondary
  3. Datacenter 3
    1. a primary
If a network failure isolates datacenter 3, the two secondaries will elect a primary between them: each starts with its own vote, and the member with the older data votes for the member with the newest data. The primary in datacenter 3 will step down to the secondary state, since it cannot see the rest of the members. In theory, a client in datacenter 3 can still read from it as a secondary.

---------------------
  1. Datacenter 1
    1. a primary
  2. Datacenter 2
    1. a secondary
  3. Datacenter 3
    1. an arbiter (small member)

2 comments:

  1. can any one help on this?

    Homework: Homework 5.2

    You have just been hired at a new company with an existing MongoDB deployment. They are running a single replica set with two members. When you ask why, they explain that this ensures that the data will be durable in the face of the failure of either server. They also explain that should they use a readPreference of “primaryPreferred”, that the application can read from the one remaining server during server maintenance.

    You are concerned about two things, however. First, a server is brought down for maintenance once a month. When this is done, the replica set primary steps down, and the set cannot accept writes. You would like to ensure availability of writes during server maintenance.

    Second, you also want to ensure that all writes can be replicated during server maintenance.

    Which of the following options will allow you to ensure that a primary is available during server maintenance, and that any writes it receives will replicate during this time?

    Check all that apply.


    Add another data bearing node.
    Add two data bearing members plus one arbiter.
    Add an arbiter.
    Add two arbiters.
    Increase the priority of the first server from one to two.
    can anyone please help me for this assignment:

    I confused between below options: probably answer will be out of below options:

    1 and 5 ??
    1 and 3 and 5 ??
    only 1 ??
    only 5 ??
    Only 2 ??

    i tried 1, 3 but not working

    tried 3,5 but not working
    I have only one attempt left so need expert opinion.

  2. Add another data bearing node.
    Add two data bearing members plus one arbiter.
