Wednesday, February 18, 2015

MongoDB for DBA's 7/7. Backup and Recovery

Chapter 7. BACKUP AND RECOVERY: Security, backups and restoring from backups

Overview of Security

There are a few different ways to run MongoDB securely:
  1. Run it in a trusted environment: no one has access except the clients, and the clients have full access to the database. In this case we lock down, at the network layer, the relevant TCP ports for the processes on the machines.
  2. Enable MongoDB authentication with two command-line options:
    1. --auth: secures client connections and access
    2. --keyFile: intra-cluster security, where the machines in the cluster authenticate each other. You can also optionally layer SSL on top of it. To encrypt all communications, you have to build MongoDB with the --ssl build option. More information on SSL: docs.mongodb.org/manual/administration/ssl
To run MongoDB with encryption over the wire, you need a build with SSL support.

Security continued: Authentication and Authorization

  1. Authentication: MongoDB supports several different forms:
    1. MongoDB challenge-response (the original authentication mechanism in the product), with usernames and passwords that are stored by the cluster itself.
    2. X.509 (SSL-style certificates and authentication)
    3. Kerberos (only in the Enterprise edition)
    4. LDAP (only in the Enterprise edition)
  2. Authorization: access control once you are logged in
  3. Encryption: it is possible to run a MongoDB cluster using SSL and encryption for all communications through the cluster
  4. Network setup: making sure the nodes are not accessible on the MongoDB ports from servers or clients that are not allowed to use the cluster. For example, locking it down so that it is not visible from the public internet
  5. Auditing: it is not free; it is only available with a MongoDB subscription, in the Enterprise edition
By default, when we run a mongod or a mongos process, security is off. It is assumed to be running in a trusted environment.

To turn on the security facilities we can use the option command line --auth:

mongod --dbpath database --auth

mongo localhost/test --> lets you connect to the test database on localhost, because no authentication has been configured for the cluster yet: we have not created any users or roles.

To create users and roles, we have to switch to the admin database (a reserved name: it is a special database that holds the users and roles for administrators).

Users are stored in <dbname>.system.users for each database.

Example using MongoDB challenge-response authentication with usernames and passwords:

use admin

var me = { user : "bob", pwd : "robert", roles : [ "userAdminAnyDatabase" ] }

db.createUser( me )

At that point any attempt to access a database is not authorized, because we have not yet authenticated as the user we just created. Also, this user has no privileges for reading or writing databases, only for user administration.

mongo localhost/admin -u bob -p robert

var you = { user : "will", pwd : "<password>", roles : [ "readWriteAnyDatabase" ] }

db.createUser( you )

mongo localhost/admin -u will -p <password>

We can create users that have permissions for a specific database.

Quick summary of some of the standard roles available from v2.6+:
  1. read: read-only access to a database (readAnyDatabase: all databases)
  2. readWrite: read and write access to a database (readWriteAnyDatabase: all databases)
  3. dbAdmin: administrative tasks on a database (dbAdminAnyDatabase: all databases)
  4. userAdmin: create and modify users and roles on a database (userAdminAnyDatabase: all databases)
  5. clusterAdmin: authorizes the cluster-related MongoDB commands: adding a shard, managing replica sets, etc.
You can also create custom, user-defined roles. More information on the built-in roles in the MongoDB documentation: docs.mongodb.org/manual/security
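
As a minimal sketch of a user limited to one database, the helper below builds the kind of document that db.createUser() accepts, using the { role, db } form for each role. The helper name buildUser, the user name and the database name "reports" are hypothetical and only there for illustration.

```javascript
// Hypothetical helper: builds a user document for db.createUser().
// Each role is scoped to one database using the { role, db } form.
function buildUser(name, password, dbName, roleNames) {
  return {
    user: name,
    pwd: password,
    roles: roleNames.map(function (role) {
      return { role: role, db: dbName };
    })
  };
}

// Example: a user limited to reading and writing the "reports" database.
var analyst = buildUser("ana", "s3cret", "reports", ["readWrite"]);

// In the mongo shell, once authenticated as a user administrator:
//   use reports
//   db.createUser(analyst)
```

Building the document separately keeps the role list easy to review before the user is actually created.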

--auth on the command line: when we are working with a sharded cluster or a replica set, we use another parameter, --keyFile <filename>, to tell the processes that make up the cluster (the mongod and mongos processes) how to authenticate among themselves within the cluster. The file contains a shared secret key that all members of the cluster have available, so that they can cross-authenticate and coordinate their actions. Using --keyFile implies --auth, but it is recommended to list both explicitly in the config file or on the command line.

SSL and Keyfiles

  • The --keyFile option ensures that the members of the cluster are all legitimate and authenticated as legal members of the cluster. Clients and client applications also talk to the cluster, via the mongo drivers, but a member of the cluster is basically a server process that has full privileges to do all sorts of things. The key is really about authentication rather than authorization.
  • The --auth command-line option enables the security mechanisms for:
    • authenticating the client and
    • authorizing the client to access a particular database.
In both cases, the data transferred over the TCP sockets is not encrypted. However, the connections are authenticated and the initial authentication handshake is secure: the password or key used for authentication does not go over the wire unencrypted.

We can use SSL with MongoDB, but it requires us to build it ourselves using scons --ssl.

Security and Clients

mongod --dbpath database --auth

Users are created per database.

There are a couple of different kinds of users in terms of system authorizations.
  1. Admin users:
    1. can perform admin operations
    2. are created in the admin database
    3. can access all databases; they are superusers
  2. Regular users:
    1. access a specific database
    2. can be read/write or read-only
To create an admin user:

use admin --> switch to the special admin database
{ user: "<name>",
  pwd: "<cleartext password>",
  customData: { <any information> },
  roles: [
    { role: "<role>", db: "<database>" } | "<role>",
    ...
  ]
}
userAdmin = { user: "theAdmin", pwd: "admin", roles: [ "userAdminAnyDatabase" ] }

db.createUser( userAdmin )
db.auth( "theAdmin", "admin" ) --> authenticates against the admin database only. The same username can exist in two different databases with different passwords.

db.system.users.find()

To create a user for a specific database:

use test
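db.createUser( { user: "pat", pwd: "123", roles: [ "readWrite" ] } ) --> createUser takes a user document; the readWrite role grants read and write access to the test database.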

All the drivers provide APIs where you can supply credentials to authenticate the client as a particular user when connecting to the database.

Roles are much more finely grained, and custom user permissions can be created. More information at docs.mongodb.org/manual/reference/built-in-roles

Intra-cluster Security

To use the --keyFile option, we need to put a key into a file. The key is a text string made up of legal base64 characters (upper- and lower-case letters, numbers and a few symbols). Any such string is legal, but it is strongly advisable to use a strong key.
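
One way to get a strong key is to generate it from random bytes. The sketch below does that with Node.js (it is not mongo shell code); the function name generateKey is hypothetical, and the size of 756 bytes is an assumption chosen so that the encoded key stays under the 1024-character limit for key files.

```javascript
// Sketch for Node.js (not the mongo shell): generate key file contents.
const nodeCrypto = require('crypto');

// 756 random bytes encode to 1008 base64 characters,
// which stays within the 1024-character key length limit.
function generateKey() {
  return nodeCrypto.randomBytes(756).toString('base64');
}

const key = generateKey();

// To save it for use with --keyFile (the file name is only an example),
// keep the file readable by its owner only:
//   require('fs').writeFileSync('filename.txt', key, { mode: 0o600 });
```

Every member of the cluster must be started with a copy of the same file.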

In MongoDB 2.6, --auth is implied by --keyFile.

mongod --dbpath data1 --auth --replSet abc --keyFile filename.txt

rs.initiate()

mongod --port 27002 --dbpath data2 --auth --replSet abc --keyFile filename.txt


> rs.status()
{
 "ok" : 0,
 "errmsg" : "not authorized on admin to execute command { replSetGetStatus: 1.0 }",
 "code" : 13
}
>
We need to log in as an administrator to use that administrative command.

Overview of Backing Up

Methods to back up an individual replica set or an individual server in MongoDB:

  1. mongodump utility: makes a dump of all databases on a given server, or of a particular database.
  2. filesystem snapshot
  3. backup from a replica set secondary
    1. shut it down, copy its files and restart it

Mongodump

Makes a dump of all databases on a given server, or of a particular database.
  • Use the --oplog option.
  • It can be done while the system is servicing its normal load and operations, but it creates some additional load on the system.
  • mongorestore utility: restores from the backup later if needed. Use the --oplogReplay option: together with --oplog on the dump, it gives a true point-in-time snapshot of the replica set we are dumping.
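
The pairing of the two options can be sketched as a pair of command builders. The helper names dumpArgs and restoreArgs, the host and the backup path are hypothetical; --out is mongodump's option for the output directory.

```javascript
// Hypothetical helpers: build the argument lists for a dump and a restore.
function dumpArgs(host, outDir) {
  // --oplog also records the writes that happen while the dump is running.
  return ["mongodump", "--host", host, "--oplog", "--out", outDir];
}

function restoreArgs(host, dumpDir) {
  // --oplogReplay applies those recorded writes after the data is restored.
  return ["mongorestore", "--host", host, "--oplogReplay", dumpDir];
}

var dumpCommand = dumpArgs("localhost:27017", "/backups/dump").join(" ");
var restoreCommand = restoreArgs("localhost:27017", "/backups/dump").join(" ");
```

Both commands are run from the operating system shell, not from inside the mongo shell.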

Filesystem Snapshotting

If we have a true snapshot capability, this is a good option for a mongod instance that is up and running (hot):

  1. We need journaling enabled. The journal is used for crash recovery after an unexpected mongod crash, which also means that any snapshot that includes the journal will be valid, provided the underlying file system takes a true point-in-time snapshot.
    1. If journaling is not enabled, the snapshot could capture a point in time where an operation was midway through its writes to the data files, so it would not necessarily be consistent.
    2. We need to snapshot the entire data directory and file system.
  2. We can use db.fsyncLock(): it flushes everything to disk and then locks the system, preventing writes. db.fsyncUnlock() releases the lock afterwards.
Snapshots are generally very fast.
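
A minimal sketch of the lock, snapshot, unlock sequence follows. Here db is the mongo shell database object, and takeSnapshot is a hypothetical callback standing in for whatever triggers the filesystem snapshot; neither the function name nor the callback comes from the course.

```javascript
// Sketch: hold the fsync lock only while the snapshot is being taken.
// "db" is the mongo shell database object; "takeSnapshot" is a
// hypothetical callback that triggers the filesystem snapshot.
function snapshotWithLock(db, takeSnapshot) {
  db.fsyncLock();        // flush pending writes to disk and block new writes
  try {
    return takeSnapshot();
  } finally {
    db.fsyncUnlock();    // always release the lock, even if the snapshot fails
  }
}
```

The try/finally is the point of the sketch: a server left locked after a failed snapshot would keep refusing writes.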

Backing Up a Sharded Cluster

When we use sharding, we back up each shard individually, and also the config servers. Steps for backing up a sharded cluster:

  1. Turn off the balancer: it moves data around, and chunk migrations in progress during the backup could be problematic. In the shell, we stop the balancer with sh.stopBalancer(); if it is in the middle of doing something, it may take a minute before it returns.
  2. Back up the config database, either by:
    1. using mongodump
    2. stopping one of our three config servers and copying its files
  3. Back up each replica set.
  4. Start the balancer again: sh.startBalancer()

How to restore it?
    We would put these data files back on the appropriate machines and then start everything up.
Quiz: After stopping the balancer on your sharded cluster, you may need to wait for a live migration to finish before the balancer stops. Type the command you would use in the mongo shell in order to stop the balancer.
Hint: It starts with the "sh" helper command...

Answer: sh.stopBalancer()

Backup Strategies

To back up a sharded cluster:
  1. stop the balancer
    1. mongo --host nameMongos --eval "sh.stopBalancer()" --> (make sure that worked)
  2. back up the config database / a config server
    1. mongodump --host nameMongos_or_nameConfigServer --db config
  3. back up all the shards
    1. Two ways:
      1. shut down a secondary in each replica set and grab its data files. If we have snapshotting, just grab a snapshot of one node from each replica set
      2. do a mongodump of each shard:
        1. mongodump --host shard_1_srv --oplog --out /backups/clusters/shard_1
        2. mongodump --host shard_2_srv --oplog --out /backups/clusters/shard_2
        3. ...
        4. mongodump --host shard_n_srv --oplog --out /backups/clusters/shard_n
  4. turn the balancer back on
    1. mongo --host nameMongos --eval "sh.startBalancer()"
Keep in mind:

  1. We might want to check that the cluster is healthy before we even begin.
  2. If we have replica sets, they can stay up while a single server in each set is down.
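
The ordering of the backup steps in this section can be sketched as a small plan generator. The function name shardedBackupPlan is hypothetical, the host names and backup path reuse the placeholders from the steps, and --out (mongodump's output directory option) is used to name each target directory.

```javascript
// Hypothetical helper: list the backup commands for a sharded cluster, in order.
function shardedBackupPlan(mongosHost, shardHosts, backupRoot) {
  var plan = [];
  // 1. Stop the balancer so no chunk migration runs during the backup.
  plan.push('mongo --host ' + mongosHost + ' --eval "sh.stopBalancer()"');
  // 2. Dump the config database.
  plan.push('mongodump --host ' + mongosHost + ' --db config --out ' + backupRoot + '/config');
  // 3. Dump every shard, each into its own directory.
  shardHosts.forEach(function (host, i) {
    plan.push('mongodump --host ' + host + ' --oplog --out ' + backupRoot + '/shard_' + (i + 1));
  });
  // 4. Turn the balancer back on.
  plan.push('mongo --host ' + mongosHost + ' --eval "sh.startBalancer()"');
  return plan;
}

var plan = shardedBackupPlan("nameMongos", ["shard_1_srv", "shard_2_srv"], "/backups/clusters");
```

Printing the plan before running anything makes it easy to confirm that the balancer is stopped first and restarted last.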

Additional Features of MongoDB

  1. Capped collections: circular queues where data ages out in least-recently-inserted order. They are used for the MongoDB replication oplog and for the system.profile collection.
    1. They must have a preallocated size.
    2. We cannot delete documents in these collections or grow them via update.
    3. They can be very fast for inserts.
  2. TTL collections: automatically age out old documents by means of a special index created with an extra time-to-live parameter (expireAfterSeconds).
  3. GridFS: short for grid file system, used to store files in MongoDB. The BSON document size limit is currently 16 MB. To store something larger there is a facility called GridFS: a convention for chunking up large data files or binary objects in a consistent way. Most of the drivers have built-in support for it. The files are stored as documents in MongoDB collections, broken into pieces of, say, 1 megabyte each. Utilities:
    1. mongofiles: puts a file into a GridFS collection in MongoDB, or pulls one out and reassembles it from its pieces.
You can store videos in MongoDB.
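
As a rough sketch of the first two features, the function below creates a capped collection and a TTL index from the mongo shell. The function name, the collection names, the field name createdAt and the sizes are all made up for illustration; older shells offer ensureIndex with the same arguments as createIndex.

```javascript
// Sketch: "db" is the mongo shell database object; names and sizes are examples.
function setUpAgingCollections(db) {
  // Capped collection: fixed, preallocated size in bytes (here 1 MiB).
  db.createCollection("recent_events", { capped: true, size: 1048576 });

  // TTL index: documents are removed about one hour after the date
  // stored in their "createdAt" field.
  db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 });
}

// In the mongo shell:
//   setUpAgingCollections(db)
```

The TTL index only works on a field that holds a date, so documents without that field are never aged out.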

Restoring Data with MMS Backup



GridFS

Even a 100-terabyte object or file can be stored in GridFS.
The MongoDB drivers understand GridFS and have support for it.
There are command-line tools for it.
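
To get a feel for the chunking, the sketch below computes how many pieces a file is split into for a given chunk size. The function name chunkCount is hypothetical, and the 1 MiB chunk size is only the illustrative figure used in these notes; the real default depends on the driver and version.

```javascript
// Number of chunks needed to store a file of fileSize bytes
// when it is split into pieces of at most chunkSize bytes.
function chunkCount(fileSize, chunkSize) {
  return Math.ceil(fileSize / chunkSize);
}

var MiB = 1024 * 1024;

// A 100 MiB video split into 1 MiB chunks.
var chunksForVideo = chunkCount(100 * MiB, 1 * MiB);

// One extra byte is enough to require one more chunk.
var chunksForVideoPlusOne = chunkCount(100 * MiB + 1, 1 * MiB);
```

Because every chunk is an ordinary document well under the BSON size limit, the total file size is bounded by storage rather than by the document limit.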

Hardware Tips

  • a fast CPU clock is more helpful than more cores (faster cores rather than more cores)
  • RAM is good. mongos does not require a lot of RAM
  • we definitely want 64-bit, because MongoDB uses memory-mapped files
  • virtualization is OK but certainly not required. It runs pretty well on Amazon EC2, and it runs fine on VMware
  • disable NUMA (Non-Uniform Memory Access) on the machine
  • SSDs (solid-state disks) are good (reserve some empty, unpartitioned space on the disk)
  • the file system cache accounts for most of mongod's memory usage
  • check the readahead setting (use a small value)

Additional Resources

  1. docs:
    1. mongodb.org
  2. driver docs: 
    1. docs.mongodb.org/ecosystem/drivers/
  3. bug database / features
    1. jira.mongodb.org
    2. support forum
      1. google groups
  4. IRC
    1. irc.freenode.net/#mongodb
    2. webchat.freenode.net
  5. github.com
    1. source code
  6. blog:
    1. blog.mongodb.org
  7. @mongodb
  8. MMUGs (mongo meetup groups) in various cities around the world
  9. MMS (Mongo Monitoring Service)
