
Friday, January 30, 2015

MongoDB for DBA's 4/7. Replication. Homeworks

Homework 4.1

In this chapter’s homework we will create a replica set and add some data to it.
1. Unpack replication.js from the Download Handout zip file.
2. We will create a three member replica set. Pick a root working directory to work in. Go to that directory in a console window.
Given we will have three members in the set, and three mongod processes, create three data directories:
mkdir 1
mkdir 2
mkdir 3
3. We will now start a single mongod as a standalone server. Given we will have three mongod processes on our single test server, we will explicitly specify the port numbers (this wouldn’t be necessary if we had three real machines or three virtual machines). We’ll also use the --smallfiles parameter and --oplogSize so the files are small given we have a lot of server processes running on our test PC.
$ # starting as a standalone server for problem 1:
$ mongod --dbpath 1 --port 27001 --smallfiles --oplogSize 50
Note: for all mongod startups in this chapter's homework, you can optionally use --logpath, --logappend, and --fork. Or, since this is just an exercise on a local PC, you could simply keep a separate terminal window open for each and forgo those settings. Run “mongod --help” for more info on those.
mongod --dbpath 1 --port 27001 --smallfiles --logpath log.1 --logappend --fork --oplogSize 50
mongod --dbpath 2 --port 27002 --smallfiles --logpath log.2 --logappend --fork --oplogSize 50
mongod --dbpath 3 --port 27003 --smallfiles --logpath log.3 --logappend --fork --oplogSize 50

To find the PIDs of the mongod processes:

ps -Aef | grep mongod

To kill a mongod process:

kill <pid>

4. In a separate terminal window (cmd.exe on Windows), run the mongo shell with the replication.js file:
mongo --port 27001 --shell replication.js
Then run in the shell:
> homework.init()
This will load a small amount of test data into the database.
Now run:
 > homework.a()

and enter the result. This will simply confirm all the above happened ok.

result: 5001


Homework 4.2

Now convert the mongod instance (the one in the problem 4.1 above, which uses “--dbpath 1”) to a single server replica set. To do this, you’ll need to stop the mongod (NOT the mongo shell instance) and restart it with “--replSet” on its command line. Give the set any name you like.
ps -Aef | grep mongod

mongodb   1009     1  0 09:25 ?        00:00:05 /usr/bin/mongod --config /etc/mongod.conf
user  3260  2009  0 09:42 ?        00:00:02 mongod --dbpath 1 --port 27001 --smallfiles --logpath log.1 --logappend --fork --oplogSize 50
user  3272  2009  0 09:43 ?        00:00:01 mongod --dbpath 2 --port 27002 --smallfiles --logpath log.2 --logappend --fork --oplogSize 50
user  3284  2009  0 09:43 ?        00:00:01 mongod --dbpath 3 --port 27003 --smallfiles --logpath log.3 --logappend --fork --oplogSize 50

user  3360  3221  0 09:48 pts/0    00:00:00 grep --color=auto mongod

To kill the first mongod process (the one started with --dbpath 1):

kill 3260

mongod --replSet abc --dbpath 1 --port 27001 --smallfiles --logpath log.1 --logappend --fork --oplogSize 50
Then go to the mongo shell. Once there, run
> rs.initiate()
{
"info2" : "no configuration explicitly specified -- making one",
"me" : "SERVER:27001",
"info" : "Config now saved locally.  Should come online in about a minute.",
"ok" : 1
}
rs.status()
{
"set" : "abc",
"date" : ISODate("2015-01-30T08:55:02Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "SERVER:27001",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 364,
"optime" : Timestamp(1422607970, 1),
"optimeDate" : ISODate("2015-01-30T08:52:50Z"),
"electionTime" : Timestamp(1422607970, 2),
"electionDate" : ISODate("2015-01-30T08:52:50Z"),
"self" : true
}
],
"ok" : 1
}
abc:PRIMARY> rs.conf()
{
"_id" : "abc",
"version" : 1,
"members" : [
{
"_id" : 0,
"host" : "SERVER:27001"
}
]
}
When you first ran homework.init(), we loaded some data into the mongod. You should see it in the replication database. You can confirm with:
> use replication
> db.foo.find()
Once done with that, run
> homework.b()

in the mongo shell and enter that result below.
result: 5002

Homework 4.3

Now add two more members to the set. Use the 2/ and 3/ directories we created in homework 4.1. Run those two mongod’s on ports 27002 and 27003 respectively (the exact numbers could be different).
Remember to use the same replica set name as you used for the first member.
kill 3272 3284
mongod --replSet abc --dbpath 2 --port 27002 --smallfiles --logpath log.2 --logappend --fork --oplogSize 50
mongod --replSet abc --dbpath 3 --port 27003 --smallfiles --logpath log.3 --logappend --fork --oplogSize 50
You will need to add these two new members to your replica set, which will initially have only one member. In the shell running on the first member, you can see your replica set status with
> rs.status()
Initially it will have just that first member. Adding the other members is done with
rs.add()
. For example,
> rs.add("localhost:27002")
You'll know it's added when you see an
{ "ok" : 1 }
document.
Your machine may or may not be OK with 'localhost'. If it isn't, try using the name in the "members.name" field in the document you get by calling rs.status() (but remember to use the correct port!).
abc:PRIMARY> rs.conf()
{
"_id" : "abc",
"version" : 3,
"members" : [
{
"_id" : 0,
"host" : "SERVER:27001"
},
{
"_id" : 1,
"host" : "SERVER:27002"
},
{
"_id" : 2,
"host" : "SERVER:27003"
}
]
}
Once a secondary has spun up, you can connect to it with a new mongo shell instance.
mongo --port 27002
Use
rs.slaveOk()
to let the shell know you're OK with (potentially) stale data, and run some queries. You can also insert data on your primary and then read it out on your secondary. Once the servers have sync'd with the primary and are caught up, run (on your primary):
> homework.c()
and enter the result below.

result: 5


Homework 4.4

We will now remove the first member (@ port 27001) from the set.
As a first step to doing this we will shut it down. (Given the rest of the set can maintain a majority, we can still do a majority reconfiguration if it is down.)
We could simply terminate its mongod process, but if we use the replSetStepDown command, the failover may be faster. That is a good practice, though not essential. Connect to member 1 (port 27001) in the shell and run:
> rs.stepDown()
Then cleanly terminate the mongod process for member 1.
abc:SECONDARY> rs.status()
{
"set" : "abc",
"date" : ISODate("2015-01-30T09:42:11Z"),
"myState" : 2,
"syncingTo" : "SERVER:27003",
"members" : [
{
"_id" : 0,
"name" : "SERVER:27001",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 621,
"optime" : Timestamp(1422610708, 149),
"optimeDate" : ISODate("2015-01-30T09:38:28Z"),
"infoMessage" : "syncing to: SERVER:27003",
"self" : true
},
{
"_id" : 1,
"name" : "SERVER:27002",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 398,
"optime" : Timestamp(1422610708, 149),
"optimeDate" : ISODate("2015-01-30T09:38:28Z"),
"lastHeartbeat" : ISODate("2015-01-30T09:42:10Z"),
"lastHeartbeatRecv" : ISODate("2015-01-30T09:42:11Z"),
"pingMs" : 0,
"lastHeartbeatMessage" : "syncing to: SERVER:27001",
"syncingTo" : "SERVER:27001"
},
{
"_id" : 2,
"name" : "SERVER:27003",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 394,
"optime" : Timestamp(1422610708, 149),
"optimeDate" : ISODate("2015-01-30T09:38:28Z"),
"lastHeartbeat" : ISODate("2015-01-30T09:42:11Z"),
"lastHeartbeatRecv" : ISODate("2015-01-30T09:42:11Z"),
"pingMs" : 0,
"electionTime" : Timestamp(1422610890, 1),
"electionDate" : ISODate("2015-01-30T09:41:30Z")
}
],
"ok" : 1
}
Next, go to the new primary of the set. You will probably need to connect with the mongo shell, which you'll want to run with '--shell replication.js' since we'll be getting the homework solution from there. 
mongo --port 27003 --shell replication.js
Once you are connected, run rs.status() to check that things are as you expect. Then reconfigure to remove member 1.
Tip: You can either use rs.reconfig() with your new configuration that does not contain the first member, or rs.remove(), specifying the host:port of the server you wish to remove as a string for the input.
rs.remove("SERVER:27001")
When done, run
> homework.d()
and enter the result.
Tip: If you ran the shell without replication.js on the command line, restart the shell with it.
 result: 6


Homework 4.5

Note our replica set now has an even number of members, and that is not a best practice. However, to keep the homework from getting too long we’ll leave it at that for now, and instead do one more exercise below involving the oplog.
To get the right answer on this problem, you must perform the homework questions in order. Otherwise, your oplog may look different than we expect.
Go to the secondary in the replica set. The shell should say SECONDARY at the prompt if you've done everything correctly.
mongo --port 27002 --shell replication.js
Switch to the local database and then look at the oplog:
> db.oplog.rs.find()
If you get a blank result, you are not on the right database.
Note: as the local database doesn’t replicate, it will let you query it without entering “rs.slaveOk()” first.
Next look at the stats on the oplog to get a feel for its size:
> db.oplog.rs.stats()
What result does this expression give when evaluated?
db.oplog.rs.find().sort({$natural:1}).limit(1).next().o.msg[0]

 result: R


MongoDB for DBA's 4/7. Replication

Chapter 4. REPLICATION: Replication, Failover, Recovery

Replication Overview

A replica set keeps redundant copies of the same data across multiple virtual or physical machines, each with internal or attached storage.

The heart of the matter is getting each document onto multiple servers, so that we have multiple, redundant copies of the same data.

Why would we do this?

The primary reasons we do replication are:

  1. HA (High Availability): if we lose a server we have failover
  2. DS (Data Safety), in terms of durability
  3. DR (Disaster Recovery)
Other reasons are:
  1. We could potentially read or query from different places
  2. To get a little bit of scalability for geographic purposes, because servers don't have to be in the same facility
  3. To put different workloads on different servers (called read preference in MongoDB)

Why do we use replica sets? Check all that apply.

Replication with Sharding

If we keep multiple copies of the data on multiple servers (for example three servers, N = 3), we have three copies of the data. If those machines use RAID storage, there may additionally be some mirroring of the copies internally, but whether or not the disk subsystem mirrors is irrelevant here. When we combine replica sets with sharding, we have multiple shards, each holding different data; within each shard, the replicas hold the same documents, i.e. the same information copied N times.

The replication factor is up to you to choose. If we want two copies of the data, we use two machines per shard. It is unusual to have only one copy per shard because that would be a little dangerous: if we had M = 1000 shards (1000 machines) and no extra copies, it would be very common for at least one of the thousand machines to be down.

So the more shards you have, the more likely you want a high replication factor. With half a dozen shards, two copies might be enough; with 1000 shards, we would tend to have three, due to the higher probability of two servers within one shard being down at the same time.

We don't have to use sharding to use replication.
We can use replication on an unsharded system, which is effectively a single shard.
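The back-of-the-envelope reasoning above can be sketched in Python (my own illustration, not from the course; the per-machine failure probability is an assumed example value):

```python
# Why more shards push you toward a higher replication factor: with a
# per-machine downtime probability p, the chance that at least one of
# M machines is down at any moment grows quickly with M.
def p_any_down(machines: int, p_down: float) -> float:
    """Probability that at least one of `machines` is currently down."""
    return 1 - (1 - p_down) ** machines

# Assuming p = 0.1% per machine:
small_cluster = p_any_down(6, 0.001)     # ~0.6% for half a dozen shards
big_cluster = p_any_down(1000, 0.001)    # ~63% for 1000 shards

assert small_cluster < 0.01 < big_cluster
```

With a thousand single-copy shards, some machine being down is more likely than not, which is why large sharded deployments tend toward three copies per shard.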

Asynchronous Replication

Replication in MongoDB is asynchronous, which is not what all databases do.

Commonly, in other databases, replication between master and slave is synchronous, and the machines tend to sit side by side on a very low-latency network connecting them to keep performance high. Asynchronous replication lets replica set members be geographically separated, at the cost of secondaries lagging slightly behind the primary.


Which of the following are true about replication in MongoDB?

Statement-based vs. Binary Replication

Statement-based replication is another property of replica sets; it is best understood by contrasting it with binary replication.

There are a couple of ways to do replication:

  1. at a logical (statement) level, shipping the operations themselves
  2. at a physical (binary) level, shipping the raw changed bytes of the data files

If you replicate statements, the replication does not need to care how the data is physically stored on each member.

If you use binary replication, the data files must match byte-for-byte across members. MongoDB uses this physical style in its journal files, for crash recovery of a single node, which does not involve replication; replication between members is statement-based, via the oplog.
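A toy Python sketch of the contrast (my own illustration, not MongoDB internals):

```python
# Statement-based (logical) replication: ship the operations and replay
# them on the replica; the replica does not need to know how the data is
# physically stored.
primary = {"a": 1}
oplog = [("set", "b", 2), ("set", "a", 7), ("del", "b", None)]

def apply_ops(doc, ops):
    """Replay a list of logical operations against a document store."""
    for op, key, value in ops:
        if op == "set":
            doc[key] = value
        elif op == "del":
            doc.pop(key, None)
    return doc

replica = apply_ops(dict(primary), oplog)  # replica replays the statements
primary = apply_ops(primary, oplog)        # primary applies them too
assert replica == primary == {"a": 7}      # both converge to the same state

# Binary (physical) replication: ship the raw bytes of the stored data,
# so the copies must match exactly at the byte level.
import pickle
binary_copy = pickle.loads(pickle.dumps(primary))
assert binary_copy == primary
```

The logical approach is what lets members of a replica set store data however they like on disk, as long as each replays the same oplog entries.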



Replication Concepts

In MongoDB, the way we do replication is via replica sets. If we shard, each shard is a replica set with one or more servers in it, typically two or three, and there may be more. In fact, it is possible to have just one, which becomes a sort of trivial case: not very useful, but the notion is supported. If we have three members in a replica set, the replication factor is RF = 3.


Automatic Failover

What replica sets give us beyond standard master/slave asynchronous replication is:

  1. Automatic failover
  2. Automatic node recovery after a failure has occurred

If we have one primary and two secondaries and the primary goes down, the members monitor each other until the secondaries detect that the primary is gone. At that moment, the secondaries reach a consensus that the primary node is down and hold an election to determine which of them becomes the new primary. Replication then continues between the new primary and the remaining secondary. The client drivers are replica-set aware: they realize they need to talk to the new primary instead of the one that went down.

Failover is not instantaneous; it takes a little bit of time (tens of seconds). But because all writes go through a single primary at a time, we get strong consistency along with automatic failover. It is also possible to read from secondaries by setting a read preference, indicating whether reads go to the primary or to a secondary.

Imagine a replica set with 5 servers. What is the minimum number of servers (assume each server has 1 vote) to form a consensus?
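The consensus rule behind the question above is simple arithmetic; a minimal sketch (my own, not driver code):

```python
def majority(voting_members: int) -> int:
    """Minimum number of votes needed to elect a primary:
    a strict majority of the voting members."""
    return voting_members // 2 + 1

assert majority(5) == 3  # a 5-member set needs 3 votes
assert majority(3) == 2
# An even-sized set needs as many votes as the next larger odd size,
# so it tolerates no additional failures -- one reason even member
# counts are discouraged (see the note in homework 4.5).
assert majority(4) == 3
```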

Recovery

Suppose the failed primary comes back: perhaps it just lost power, booted back up, and auto-started its mongod process, based on how we had set it up.

What do we do?

What MongoDB replica sets do is automatically roll back the writes on that member that did not make it out the door to the other servers before it failed. If a secondary went down instead, it is quite easy for it to sync back up: it just resumes tailing the primary's oplog from where it left off.

In recovery we also have the facility to get acknowledgement that a write has committed to a majority of the cluster. Once that has occurred, the write will never be rolled back.
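A toy model of why a majority-acknowledged write is safe (my own illustration):

```python
def safe_from_rollback(replicated_to: int, set_size: int) -> bool:
    """A write is permanently durable once a majority of members have it:
    any future primary must win an election held by a majority, and any
    two majorities overlap in at least one member that saw the write."""
    return replicated_to >= set_size // 2 + 1

# In a 3-member set:
assert not safe_from_rollback(1, 3)  # only on the primary: may be rolled back
assert safe_from_rollback(2, 3)      # on a majority: never rolled back
```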


MongoDB has a method to ensure a write was replicated and will never need to be rolled back:


Starting Replica Sets

To set up a three-node replica set using only one server, we will run three mongod processes, specifying a different port and database path for each one. If they were actually separate machines or virtual machines, that would not be necessary.


We will need three directories (dir1, dir2, dir3), starting a mongod in each one:

mongod --replSet nameReplicaSet --dbpath dir1 --port 27001 --smallfiles --oplogSize 50 --logpath log.1 --logappend --fork

mongod --replSet nameReplicaSet --dbpath dir2 --port 27002 --smallfiles --oplogSize 50 --logpath log.2 --logappend --fork

mongod --replSet nameReplicaSet --dbpath dir3 --port 27003 --smallfiles --oplogSize 50 --logpath log.3 --logappend --fork

--oplogSize = size (in MB) of the replication operation log
--logappend = append to the log file instead of resetting it to empty on every start
--smallfiles = use smaller data files

mongo --port 27001
db.isMaster() --> to find out whether the server is the primary node or not.

Writes are only allowed on the primary. If we get a "not master" error on a write operation, we can use db.getLastErrorObj() to confirm that the error is coming back from getLastError.

Why do we give replica sets names?

Initiating a Replica

Two things happen when we initiate the set:

  1. something related to the initial data
  2. something related to the configuration of the replica set

Regarding the initial data: the member of the replica set that receives the initiate command may or may not already hold data.

  1. NO INITIAL DATA: everyone else also has to start out empty on initiation; the other members will then sync from that one.
  2. WITH INITIAL DATA: we can use that member's data to seed the replica set. The underlying command is replSetInitiate; in the shell we can type rs.initiate(<config>).

Regarding the configuration of the replica set, it depends on the parameters of the initiation. We can create a JSON document containing the configuration like this:

config = {
    // name of the replica set
    "_id" : "nameReplicaSet",
    "members" : [
        // node 1
        { "_id" : 1, "host" : "nameHost.local:27001" },
        // node 2
        { "_id" : 2, "host" : "nameHost.local:27002" },
        // node 3
        { "_id" : 3, "host" : "nameHost.local:27003" }
    ]
}
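A small Python sketch of building and sanity-checking a config document like the one above (the helper functions are my own, not part of MongoDB; the host names are the placeholder values used above):

```python
def make_config(set_name, hosts):
    """Build a replica set config document from a set name and host list."""
    return {
        "_id": set_name,
        "members": [{"_id": i, "host": h} for i, h in enumerate(hosts)],
    }

def check_config(config):
    """A few sanity checks before handing the document to rs.initiate()."""
    assert config["members"], "at least one member is required"
    ids = [m["_id"] for m in config["members"]]
    assert len(ids) == len(set(ids)), "member _id values must be unique"
    hosts = [m["host"] for m in config["members"]]
    assert len(hosts) == len(set(hosts)), "host:port pairs must be unique"
    return True

cfg = make_config("nameReplicaSet", ["nameHost.local:27001",
                                     "nameHost.local:27002",
                                     "nameHost.local:27003"])
assert check_config(cfg)
```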

Best practices for naming the hosts in replica set configs:

  1. do not use raw IP addresses
  2. do not use names from /etc/hosts
  3. use DNS
    1. pick an appropriate TTL (e.g. 5 minutes, if we need to replace a machine quickly)


To initiate the replica set:

rs.initiate(config) --> mongod will create the oplog

Replica Set Status

rs.status() to get information of the replica set.

What does optimeDate represent?

Replica Set Commands

rs.conf() = gets the configuration of the replica set, which is stored in the local database in a collection named system.replset. This is a special collection, holding just the replica set config, with at most one document in it: a sort of singleton collection.
The config data is a single document for a couple of reasons:

  1. We want to be able to modify the config atomically, and having it be a single document lets us achieve that.
  2. Reconfigurations must go through the rs.reconfig() command; direct writes to the collection are not allowed.
rs.add() = to add a new host or a new member

rs.stepDown() = to step down from primary for a certain period of time, perhaps for administrative work or maintenance.

rs.syncFrom( host ) = to make a secondary sync from the given member

rs.freeze( secs ) = to make a node ineligible to become primary for the time specified

Which command prevents a node from becoming primary for 5 minutes?

Reading & Writing

We can run rs.slaveOk() in a shell connected to a secondary to allow queries on that secondary.

Failover

When a primary goes down and then is brought back online, it will always resume primary status:

Read Preference

MongoDB drivers use read preference to decide where reads go. In a replica set, secondaries may have stale data. Reasons for directing some reads to secondaries include:

  1. geography: some nodes are nearer than others
  2. to separate a workload (e.g. an analytics server)
  3. to spread read load
  4. availability

Depending on what we want, we can read from the primary (the default) or choose a secondary.

What are good reasons to read from a secondary? Check all that apply.

Read Preference Options

Available in MongoDB v2.2+; you can also use tags to refine member selection. The modes are:
  1. primary (the default): when we open a connection in the mongo client using one of the drivers, we can select one of these modes as the read preference. The default is primary, because it gives the least surprising semantics: immediately consistent reads, with no surprises like stale data.
  2. primaryPreferred: try to talk to the primary, but if the driver cannot find one, read from a secondary.
  3. secondary: read only from secondaries, if we want to keep the load off the primary.
  4. secondaryPreferred: use secondaries first, but use the primary if you cannot reach any secondaries. In this case the primary might receive little or no read traffic.
  5. nearest: go to the nearest member of the set in terms of network latency.
In modes 2, 3, 4 and 5, stale data is possible (the read may come from a secondary).
In modes 2, 4 and 5, reading from the primary is also possible.
  • use the default, primary, if you are not sure what to use
  • use nearest
    • in a remote data center
    • could also be good just to even out read loads
  • use secondary for certain reporting workloads, keeping in mind the possibility of lag
  • use secondaryPreferred if you have lots of capacity
For reads which must be consistent, which read preference(s) is used?
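The five modes above can be sketched as a selection function (my own approximation, not driver source; member fields and hosts are made up for illustration):

```python
def pick(members, mode):
    """Return the member a driver might read from under each mode."""
    primaries = [m for m in members if m["state"] == "PRIMARY"]
    secondaries = [m for m in members if m["state"] == "SECONDARY"]
    if mode == "primary":
        return primaries[0] if primaries else None
    if mode == "primaryPreferred":
        return (primaries or secondaries or [None])[0]
    if mode == "secondary":
        return secondaries[0] if secondaries else None
    if mode == "secondaryPreferred":
        return (secondaries or primaries or [None])[0]
    if mode == "nearest":
        return min(members, key=lambda m: m["ping_ms"]) if members else None
    raise ValueError("unknown read preference: " + mode)

members = [
    {"host": "a:27001", "state": "PRIMARY", "ping_ms": 40},
    {"host": "b:27002", "state": "SECONDARY", "ping_ms": 5},
]
assert pick(members, "primary")["host"] == "a:27001"   # consistent reads
assert pick(members, "nearest")["host"] == "b:27002"   # lowest latency wins
# With no primary reachable, primaryPreferred falls back to a secondary:
assert pick([m for m in members if m["state"] != "PRIMARY"],
            "primaryPreferred")["host"] == "b:27002"
```

This also makes the quiz answer visible: only the primary mode guarantees the read hits the primary, so it is the one to use for reads that must be consistent.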