
Thursday, December 4, 2014

MongoDB course for developers. Unit 6/8. Application Engineering. Homework

Homework 6.1





Which of the following statements are true about MongoDB replication? Check all that apply.

Homework 6.2

Let's suppose you have a five member replica set and want to assure that writes are committed to the journal and are acknowledged by at least 3 nodes before you proceed forward. What would be the appropriate settings for w and j?

Homework 6.3

Which of the following statements are true about choosing and using a shard key:

Homework 6.4

You have a sharded system with three shards and have sharded the collection "grades" in the "test" database across those shards. The output of sh.status() when connected to mongos looks like this:
mongos> sh.status()
--- Sharding Status --- 
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
 {  "_id" : "s0",  "host" : "s0/localhost:37017,localhost:37018,localhost:37019" }
 {  "_id" : "s1",  "host" : "s1/localhost:47017,localhost:47018,localhost:47019" }
 {  "_id" : "s2",  "host" : "s2/localhost:57017,localhost:57018,localhost:57019" }
  databases:
 {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
 {  "_id" : "test",  "partitioned" : true,  "primary" : "s0" }
  test.grades chunks:
    s1 4
    s0 4
    s2 4
   { "student_id" : { $minKey : 1 } } -->> { "student_id" : 0 } on : s1 Timestamp(12000, 0) 
   { "student_id" : 0 } -->> { "student_id" : 2640 } on : s0 Timestamp(11000, 1) 
   { "student_id" : 2640 } -->> { "student_id" : 91918 } on : s1 Timestamp(10000, 1) 
   { "student_id" : 91918 } -->> { "student_id" : 176201 } on : s0 Timestamp(4000, 2) 
   { "student_id" : 176201 } -->> { "student_id" : 256639 } on : s2 Timestamp(12000, 1) 
   { "student_id" : 256639 } -->> { "student_id" : 344351 } on : s2 Timestamp(6000, 2) 
   { "student_id" : 344351 } -->> { "student_id" : 424983 } on : s0 Timestamp(7000, 2) 
   { "student_id" : 424983 } -->> { "student_id" : 509266 } on : s1 Timestamp(8000, 2) 
   { "student_id" : 509266 } -->> { "student_id" : 596849 } on : s1 Timestamp(9000, 2) 
   { "student_id" : 596849 } -->> { "student_id" : 772260 } on : s0 Timestamp(10000, 2) 
   { "student_id" : 772260 } -->> { "student_id" : 945802 } on : s2 Timestamp(11000, 2) 
   { "student_id" : 945802 } -->> { "student_id" : { $maxKey : 1 } } on : s2 Timestamp(11000, 3) 
If you ran the query
use test
db.grades.find({'student_id':530289})
Which shards would be involved in answering the query?

Homework 6.5

Create three directories for the three mongod processes. On unix, this could be done as follows:
mkdir -p /data/rs1 /data/rs2 /data/rs3
Now start three mongod instances as follows. Note that there are three commands; the browser is probably wrapping them visually.
./mongod --replSet m101 --logpath "1.log" --dbpath /data/rs1 --port 27017 --smallfiles --fork

./mongod --replSet m101 --logpath "2.log" --dbpath /data/rs2 --port 27018 --smallfiles --fork

./mongod --replSet m101 --logpath "3.log" --dbpath /data/rs3 --port 27019 --smallfiles --fork
Now connect to a mongo shell and make sure it comes up
./mongo --port 27017
Now you will create the replica set. Type the following commands into the mongo shell:
config = { _id: "m101", members:[
          { _id : 0, host : "localhost:27017"},
          { _id : 1, host : "localhost:27018"},
          { _id : 2, host : "localhost:27019"} ]
};
rs.initiate(config);
At this point, the replica set should be coming up. You can type
rs.status()
to see the state of replication. 

m101:PRIMARY> rs.status()
{
    "set" : "m101",
    "date" : ISODate("2014-12-04T18:50:17Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "localhost:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 234,
            "optime" : Timestamp(1417718826, 1),
            "optimeDate" : ISODate("2014-12-04T18:47:06Z"),
            "electionTime" : Timestamp(1417718835, 1),
            "electionDate" : ISODate("2014-12-04T18:47:15Z"),
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "localhost:27018",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 190,
            "optime" : Timestamp(1417718826, 1),
            "optimeDate" : ISODate("2014-12-04T18:47:06Z"),
            "lastHeartbeat" : ISODate("2014-12-04T18:50:17Z"),
            "lastHeartbeatRecv" : ISODate("2014-12-04T18:50:17Z"),
            "pingMs" : 0,
            "syncingTo" : "localhost:27017"
        },
        {
            "_id" : 2,
            "name" : "localhost:27019",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 190,
            "optime" : Timestamp(1417718826, 1),
            "optimeDate" : ISODate("2014-12-04T18:47:06Z"),
            "lastHeartbeat" : ISODate("2014-12-04T18:50:17Z"),
            "lastHeartbeatRecv" : ISODate("2014-12-04T18:50:17Z"),
            "pingMs" : 0,
            "syncingTo" : "localhost:27017"
        }
    ],
    "ok" : 1
}

MongoDB course for developers. Unit 6/8. Application Engineering

Write Concern





Write concern answers the question: how concerned are you that your writes have completed before you get a response back?
All of this is controlled by the drivers. The driver and the mongo shell will execute the getLastError function for you after the write operation; in MongoDB, write operations themselves are not acknowledged and require this second call to getLastError.
Every single time you do an insert or update (a single operation), getLastError is called in order to retrieve any error. If you use a driver to access the database, you can decide whether or not the driver calls getLastError.
The getLastError function can take two parameters:
  • w: (write) if set to 1, it determines whether or not you want to wait for the write operation to be acknowledged.
  • j: (journal, the on-disk log of operations) if set to 1, getLastError waits until the journal commits to disk.
Values of the parameters:
  • w = 0, j = 0: fire and forget.
  • w = 1, j = 0: wait for a simple acknowledgement from MongoDB that it received the write. This is the default.
  • w = 0 or 1, j = 1: wait until the write commits to the journal, to be sure it has reached disk before you go on.
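As a minimal sketch (pymongo 2.x-era API, hypothetical localhost server), these options can be passed directly when connecting, so every write uses them:

import pymongo

# w=1, j=True: each write waits for an acknowledgement from the server
# AND for the journal commit to disk before returning
client = pymongo.MongoClient("mongodb://localhost:27017", w=1, j=True)
db = client.test
db.things.insert({"x": 1})  # blocks until the write is journaled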
Provided you assume that the disk is persistent, what are the w and j settings required to guarantee that an insert or update has been written all the way to disk?


Network Errors

We can have network errors that leave us unable to know whether a write actually reached disk, even with the write concern options set (w=1, j=1): the write may have been applied but its acknowledgement lost. We can never be completely sure what happened in that operation. To handle these situations you can wrap the write in a try/except block, catch the error and decide how to react.
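A minimal sketch of that idea (hypothetical server and collection names):

import pymongo

client = pymongo.MongoClient("mongodb://localhost:27017", w=1, j=True)
things = client.test.things

try:
    things.insert({"x": 1})
except pymongo.errors.AutoReconnect:
    # The connection dropped before the acknowledgement arrived: the write
    # may or may not have reached the server, so verify before retrying
    print "Not sure whether the write completed; check before retrying"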

What are the reasons why an application may receive an error back even if the write was successful? Check all that apply.

The Pymongo Driver

The API web site is api.mongodb.org. This is the main directory of all the drivers to use with MongoDB as a developer.

The current driver will always be at api.mongodb.org/python/current

The most important takeaways are that you should be using pymongo.MongoClient() to connect to a standalone server or, if you're connecting to a replica set, the even better option pymongo.MongoReplicaSetClient().
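For example (hypothetical hosts, pymongo 2.x-era API):

import pymongo

# Standalone server
c1 = pymongo.MongoClient("mongodb://localhost:27017")

# Replica set: the client tracks the members and follows the primary
c2 = pymongo.MongoReplicaSetClient("localhost:27017,localhost:27018",
                                   replicaSet="m101")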

Which of the following are valid, supported ways to connect to a server with pymongo?


Introduction to Replication

Replication provides fault tolerance, so the system keeps working when a node goes down or there is an accident such as a fire.

MongoDB's solution is the replica set: a group of mongod nodes that mirror each other's data. One node is the primary and the others are secondaries, and the roles are assigned dynamically.

In operation, your application and its driver stay connected to the primary node and write to it (you can only write to the primary). If the primary goes down, the remaining nodes perform an election to elect a new primary, which requires a strict majority of the original nodes.

The minimum number of nodes needed to build a replica set is three, and you can use an arbiter node to break a tie when electing the primary.

What is the minimum original number of nodes needed to assure the election of a new Primary if a node goes down?


Replica Set Elections

The replica set is totally transparent to applications, which keep working without any interruption.

Types of replica set nodes:
  • Regular: holds the data; the most common type of node. It can be a primary or a secondary.
  • Arbiter: holds no data and exists only for voting purposes. If you have an even number of replica set nodes, add an arbiter so that there is a strict majority to elect a primary.
  • Delayed: a regular node configured to stay a fixed amount of time behind the others, so you can recover data quickly after a mistake. It cannot become primary (its priority is set to zero) but it can vote in elections.
  • Hidden: often used for analytics. It cannot become primary either (priority zero), but it can vote.
Setting a node's priority to zero means it will never be elected primary, although it can still participate in elections as a voter.
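As a sketch (hypothetical hosts and fourth member), these node types map onto fields of the replica set configuration document you pass to rs.initiate(), shown here as a Python dict (in the mongo shell you would write true in lowercase):

config = {
    "_id": "m101",
    "members": [
        {"_id": 0, "host": "localhost:27017"},                       # regular
        {"_id": 1, "host": "localhost:27018"},                       # regular
        {"_id": 2, "host": "localhost:27019", "arbiterOnly": True},  # arbiter
        {"_id": 3, "host": "localhost:27020",
         "priority": 0, "hidden": True},                             # hidden
        # a delayed member would add "slaveDelay": <seconds> with priority 0
    ]
}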
 
Which types of nodes can participate in elections of a new primary?

Write Consistency

Writes are sent to the primary node, but reads can be sent to any node (by default they go to the primary, which gives strong consistency). Keep in mind that the replication lag between any two nodes is not guaranteed, because replication is asynchronous, so a read from a secondary may not yet see data that was just written.
 
During the time when failover is occurring, can writes successfully complete?

Creating a Replica Set

To create a replica set, we have to create the nodes and then initialize them (in this case the three nodes run on one server with different ports; otherwise it would be necessary to specify the target host and port of each node):

1) Create the nodes of Replica set:
mongod --port 27017 --dbpath "/var/lib/mongodb/data/rs1" --replSet group1 --logpath "log_1.log" --oplogSize 200 --fork --smallfiles

mongod --port 27018 --dbpath "/var/lib/mongodb/data/rs2" --replSet group1 --logpath "log_2.log" --oplogSize 200 --fork --smallfiles

mongod --port 27019 --dbpath "/var/lib/mongodb/data/rs3" --replSet group1 --logpath "log_3.log" --oplogSize 200 --fork --smallfiles

2) Initialize the replica set by creating a configuration variable and loading it from the shell.

config = {
    "_id" : "group1",
    "members" : [
    //node 1
     { "_id" : 1 , "host" : "localhost:27017"}
    //node 2
    ,{ "_id" : 2 , "host" : "localhost:27018"}
    //node 3
    ,{ "_id" : 3 , "host" : "localhost:27019"}
    ]
};

rs.initiate(config);

To check the status of the replica set:

rs.status();

To allow reads from a secondary (slave) node, it is necessary to execute this on the secondary:

rs.slaveOk()
 
To show which node is the master:

rs.isMaster()

To force the current replica set member to step down as primary and then avoid re-election as primary for the designated number of seconds (60 seconds by default). It produces an error if the current member is not the primary.

rs.stepDown()

To show help of replica set options:

rs.help()

Which command, when issued from the mongo shell, will allow you to read from a secondary?


Replica Set Internals

In the video how long did it take to elect a new primary?

 

Failover and Rollback

What happens if a node comes back up as a secondary after a period of being offline and the oplog has looped on the primary?

Connecting to a Replica Set from Pymongo

c = pymongo.MongoClient(host=["mongodb://localhost:27017",
                              "mongodb://localhost:27018",
                              "mongodb://localhost:27019"],
                        replicaSet="rs1",
                        w=1, j=True)

or

c = pymongo.MongoClient(host=["mongodb://localhost:27017"],
                        replicaSet="rs1",
                        w=1, j=True)
 
If you leave a replica set node out of the seedlist within the driver, what will happen?

What happens when failover occurs

When the election is happening you can't complete writes or reads, because you don't have a primary to go to.

What will happen if the following statement is executed in Python during a primary election?
db.test.insert({'x':1})
 

Detecting Failover

If you catch exceptions during failover, are you guaranteed to have your writes succeed?
 

Proper Handling of Failover

Example in Python of code for handling a failover against a replica set:
 
import sys
import time
import pymongo

# assumes 'test' is a collection handle, e.g.
# test = pymongo.MongoClient("mongodb://localhost:27017").m101.test

def writesome():
    # let's do some inserting
    for i in range(0, 1000000):
        for retries in range(0, 3):

            # 'doc' is built here, inside the retry loop, so a retried
            # test.insert(doc) inserts a document with a different _id
            # and will not raise a DuplicateKeyError

            doc = {'i': i}
            try:
                test.insert(doc)
                print "Inserted " + str(i)
                break
            except pymongo.errors.DuplicateKeyError:
                print "Duplicate key error"
                break
            except:
                print sys.exc_info()[0]
                print "Retrying..."
                time.sleep(5)
        time.sleep(.5)

 
Is this code guaranteed to get the write done if failover occurs?
doc = {'i': i}
for retries in range(0, 3):
    try:
        test.insert(doc)
        print "Inserted " + str(i)
        break
    except pymongo.errors.DuplicateKeyError:
        print "Duplicate key error"
        break
    except:
        print sys.exc_info()[0]
        print "Retrying..."
        time.sleep(5)


Write concern revisited

Options:
  • w = 1: wait until the primary node has applied the write
  • w = 2: wait until two nodes have applied the write (if we have three nodes)
  • w = 3: wait until all three nodes have applied the write (if we have three nodes)
  • j = 1: wait until the primary node has committed the write to its journal on disk
  • wtimeout (milliseconds): how long you are willing to wait for the write to be acknowledged by the secondaries (it can be set in the drivers)
 There are three different places where these options can be set:
  1. on the connection
  2. on a database or collection inside the driver
  3. in the configuration of the replica set itself, as default values
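A sketch of the first two levels in pymongo (2.x-era API, hypothetical hosts; remember wtimeout is in milliseconds):

import pymongo

# Level 1: defaults for the whole connection
client = pymongo.MongoClient("mongodb://localhost:27017",
                             replicaSet="m101",
                             w=2, j=True, wtimeout=5000)

# Level 2: override per collection inside the driver
grades = client.test.grades
grades.write_concern = {"w": 3, "wtimeout": 10000}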
 If you set w=1 and j=1, is it possible to wind up rolling back a committed write to the primary on failover?


Read preferences

You can specify a read preference to send reads to secondary nodes.
Options in Pymongo:
  1. primary: always read from the primary
  2. secondary: always read from a secondary; if no secondary is available, reads fail
  3. secondary preferred: read from a secondary and, if there isn't one, from the primary
  4. primary preferred: read from the primary and, if there isn't one, from a secondary
  5. nearest: read from the nearest node
  6. by tagging: you can assign tags to nodes in order to address them by name
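In pymongo (2.x-era API, hypothetical hosts) these options map onto ReadPreference constants:

import pymongo
from pymongo import ReadPreference

# Option 3: prefer a secondary for reads, fall back to the primary
client = pymongo.MongoReplicaSetClient(
    "localhost:27017,localhost:27018",
    replicaSet="m101",
    read_preference=ReadPreference.SECONDARY_PREFERRED)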
 You can configure your applications via the drivers to read from secondary nodes within a replica set. What are the reasons that you might not want to do that? Check all that apply.

Implications of replication

  1. Seed list: give the driver several members so it can still find the replica set and an election can proceed when the primary goes down
  2. Write concern: the idea of waiting for some number of nodes to acknowledge the writes through the w parameter; the j parameter, which lets you wait (or not) for the primary node to commit that write to disk; and the wtimeout parameter, which is how long you are willing to wait to see your write replicated to other members of the replica set
  3. Read preferences
  4. Errors can happen: because of transient situations like a failover occurring, because of network errors, or because of violations of unique key constraints
To create a robust application you need to check for exceptions on read and write operations against the database, so that if anything comes up the application knows about it. You also need to understand, at the application level, what data has been committed and what data is durable.
If you set w=4 on a connection and there are only three nodes in the replica set, how long will you wait in PyMongo for a response from an insert if you don't set a timeout?

 

Introduction to Sharding

This is an approach to horizontal scalability. Each shard is typically its own replica set, so many hosts are involved.

To distribute the data, MongoDB uses a router named 'mongos' that takes care of the distribution. It keeps a connection pool and knowledge of all the different hosts, and routes operations to them properly.

shard key: some part of the document itself (for example its _id) that determines which shard a document lives on.

Once you decide what shard key to use, MongoDB breaks the collection into chunks and decides, in a range-based way, which shard each chunk lives on. Any query you make, which now has to be routed through a mongos, will then go to the appropriate shard(s) to answer it.
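As a hedged sketch (assuming a mongos listening on localhost:27017), enabling sharding and picking the shard key are admin commands that can also be issued from pymongo:

import pymongo

# Connect to the mongos router, not to an individual shard
client = pymongo.MongoClient("mongodb://localhost:27017")

# Shard the database, then the collection, on a range-based key;
# mongos will split the collection into chunks along this key
client.admin.command("enableSharding", "test")
client.admin.command("shardCollection", "test.grades",
                     key={"student_id": 1})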
 
If the shard key is not included in a find operation and there are 3 shards, each one a replica set with 3 nodes, how many nodes will see the find operation?

Building a Sharded Environment

If you want to build a production system with two shards, each one a replica set with three nodes, how many mongod processes must you start?
Two shards of three nodes each gives 6 mongod processes, plus 3 config servers: 9 in total.

Implications of Sharding

 Some things to remember in a sharded environment:
  1. Every document needs to include the shard key
  2. The shard key is immutable: you cannot change the shard key inside a document
  3. You need an index that starts with the shard key, but it cannot be a multikey index
  4. When you do an update, you must specify the shard key or set multi to true (see the sketch after this list)
  5. No shard key in a query means a scatter-gather operation, which can be expensive
  6. You can't have a unique index unless it is also part of the shard key
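For example (point 4, hypothetical documents, pymongo 2.x-era API):

import pymongo

db = pymongo.MongoClient("mongodb://localhost:27017").test  # via mongos

# Targeted update: the query contains the shard key, so mongos can route it
db.grades.update({"student_id": 530289}, {"$set": {"grade": "A"}})

# Broadcast update: no shard key in the query, so multi=True is required
db.grades.update({"grade": "F"}, {"$set": {"flagged": True}}, multi=True)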
Suppose you wanted to shard the zip code collection after importing it. You want to shard on zip code. What index would be required to allow MongoDB to shard on zip code?

Sharding + Replication

Suppose you want to run multiple mongos routers for redundancy. What level of the stack will assure that you can failover to a different mongos from within your application?

Choosing a Shard Key

  1. Sufficient cardinality: so that MongoDB can distribute the documents across the shards
  2. Avoid hotspots in writes: a monotonically increasing shard key (such as a timestamp) would send all inserts to the same shard
You are building a facebook competitor called footbook that will be a mobile social network of feet. You have decided that your primary data structure for posts to the wall will look like this:
{'username':'toeguy',
     'posttime':ISODate("2012-12-02T23:12:23Z"),
     "randomthought": "I am looking at my feet right now",
     'visible_to':['friends','family', 'walkers']}
Thinking about the tradeoffs of shard key selection, select the true statements below.