
Sunday, November 23, 2014

MongoDB for developers 3/8. Schema design

MongoDB Schema Design







What's the single most important factor in designing your application schema within MongoDB?
Matching the data access patterns of your application.

Mongo Design for Blog

Which data access pattern is not well supported by the blog schema?
Providing a table of contents by tag (the posts would have to be grouped by tag with an aggregation).
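The grouping that this answer refers to can be sketched in plain JavaScript over in-memory data. This is only an illustration: the posts, their fields and the function name tableOfContentsByTag are hypothetical, and a real application would run an aggregation on the server instead.

```javascript
// Hypothetical blog posts; each post embeds its tags as an array.
const posts = [
  { _id: 1, title: "Schema design", tags: ["mongodb", "schema"] },
  { _id: 2, title: "Indexes", tags: ["mongodb", "performance"] },
  { _id: 3, title: "Embedding", tags: ["schema"] },
];

// Build a map tag -> [titles]. Every (post, tag) pair is visited once and
// collected under its tag, which is the work a group-by-tag step has to do.
function tableOfContentsByTag(allPosts) {
  const toc = new Map();
  for (const post of allPosts) {
    for (const tag of post.tags) {
      if (!toc.has(tag)) toc.set(tag, []);
      toc.get(tag).push(post.title);
    }
  }
  return toc;
}

const toc = tableOfContentsByTag(posts);
```

Because the tags live inside each post, every post has to be scanned to build the table, which is why this access pattern is the weak spot of the schema.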

Living without Constraints
"Embed first the data that makes sense for the application, since that makes it much easier to keep the data intact and consistent."
 
What does Living Without Constraints refer to?

Living without Transactions
In the relational world, transactions offer atomicity, consistency, isolation and durability. MongoDB, as presented in this course, has no transactions, but it does have atomic operations.

Atomic operation: when we work on a single document, the work is completed before anyone else sees the document. Other readers see all of the changes or none of them.
Atomic operations can achieve the same result as transactions in a relational database. The reason is that a relational transaction usually has to span several tables that are joined together, whereas in MongoDB, if all the data is embedded in one document, the change is also applied in a single step.

In MongoDB we have three options to make up for the lack of transactions:
  1. Restructure the code: embed the data so that a single atomic operation on one document covers everything that is needed.
  2. Implement transactions in software: critical sections around find and modify, semaphores, etc.
  3. Tolerate some inconsistency: this often works in modern web applications and in other applications where a little temporary inconsistency in the data is acceptable. E.g. people on Facebook can see information about their friends that has not yet been updated.
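The first option can be illustrated with a minimal in-memory sketch. It is not how MongoDB works internally; the accounts map and the updateOne helper are hypothetical and only show the "all the changes or none of them" behaviour when both fields live in the same document.

```javascript
// In-memory stand-in for a collection: _id -> document (hypothetical data).
const accounts = new Map([
  [1, { _id: 1, checking: 100, savings: 50 }],
]);

// Apply every change to a private copy, then publish the copy in one step.
// If any change throws, the stored document is left untouched.
function updateOne(collection, id, changes) {
  const current = collection.get(id);
  if (current === undefined) return false;
  const draft = JSON.parse(JSON.stringify(current));
  for (const change of changes) change(draft);
  collection.set(id, draft);
  return true;
}

// Move 30 from checking to savings. Both fields are embedded in one
// document, so both changes become visible together.
updateOne(accounts, 1, [
  (doc) => { doc.checking -= 30; },
  (doc) => { doc.savings += 30; },
]);
```

If checking and savings were stored in two separate documents, the same transfer would need two independent updates, and a reader could observe the state between them.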

Examples of operations that operate atomically within a single document:



Relationships. Denormalize
As long as we do not duplicate data, we are not exposed to modification anomalies.

One to One Relations: Always embedded
The considerations depend on how the data is accessed and how frequently each piece of it is accessed (example: the Employee and Resume (CV) collections):
  • frequency of access: of one document relative to the other. If one document holds a lot of information that does not need to be updated frequently, choose separate documents to get the best performance.
  • size of the items: if the elements of one document keep growing, you can separate the collections so that you do not load a lot of information every time you update one specific piece of it. You have to separate them if one document would be larger than 16 MB (multimedia, event history, etc.).
  • atomicity of data: in MongoDB there are no transactions, only atomic operations on individual documents. If you cannot accept any inconsistency and want to update both documents at the same time, you can embed one in the other and update everything in one operation.
 
A good reason to keep two documents that are related to each other one-to-one in separate collections is:

It is perfectly safe to embed the data, because you are not duplicating it.
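The two shapes discussed for the Employee and Resume example could look like this. It is a sketch in plain JavaScript; the field names (resume_id, employee_id, jobs, education) and the resumeFor helper are assumptions made for illustration.

```javascript
// Option A: the resume is embedded in the employee.
// One read returns everything and one update changes both atomically.
const employeeEmbedded = {
  _id: 20,
  name: "Andrew",
  resume: { jobs: ["Engineer"], education: ["BSc"] },
};

// Option B: separate collections linked one-to-one by id, so reading an
// employee does not load the (possibly large) resume.
const employees = [{ _id: 20, name: "Andrew", resume_id: 30 }];
const resumes = [
  { _id: 30, employee_id: 20, jobs: ["Engineer"], education: ["BSc"] },
];

// With option B the application performs the "join" itself.
function resumeFor(employee) {
  return resumes.find((r) => r._id === employee.resume_id) || null;
}

const linkedResume = resumeFor(employees[0]);
```

Neither shape duplicates data; the choice comes down to access frequency, document size and whether both parts must change together.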

One To Many Relations
Embed only when the relation is really one to few, which is much easier to model in MongoDB. When the many side is large, it is recommended to represent a one to many relationship in multiple collections.
 
Embedding works well, with no duplicated data, if you embed the many into the one. If you want to go from the one to the many, linking avoids duplicating data.
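Both directions can be sketched with in-memory documents. The post, cities and people data and the peopleIn helper below are hypothetical examples, not part of the course material quoted here.

```javascript
// One to few: the comments (the many) are embedded in the post (the one).
// Nothing is duplicated, because each comment belongs to exactly one post.
const post = {
  _id: 1,
  title: "Schema design",
  comments: [
    { author: "ana", text: "Nice" },
    { author: "luis", text: "Thanks" },
  ],
};

// One to many with a large "many": each person links to a city by _id,
// so the city details are stored once instead of inside every person.
const cities = [{ _id: "NYC", country: "USA" }];
const people = [
  { _id: 1, name: "Andrew", city: "NYC" },
  { _id: 2, name: "Maria", city: "NYC" },
];

// Follow the link from the city side: all people living in a given city.
function peopleIn(cityId) {
  return people.filter((p) => p.city === cityId).map((p) => p.name);
}

const newYorkers = peopleIn("NYC");
```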
 
If you need to embed data for performance reasons, to match the access pattern of your application, it can make sense even if it duplicates data, especially if the data rarely changes or is rarely updated.

It is recommended to represent a one to many relationship in multiple collections whenever the many is large.

Many To Many Relations
If the relation is really few to few, we can model it in MongoDB by embedding documents.
 
To avoid the problems of denormalization, model the relationship by linking, with arrays of ObjectId values in the documents.
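A minimal sketch of linking with arrays of ids, using hypothetical books and authors (plain numbers stand in for ObjectId values so the example runs without a database):

```javascript
// Each side keeps an array with the ids of the other side.
const authors = [
  { _id: 1, name: "Author A", books: [10, 11] },
  { _id: 2, name: "Author B", books: [11] },
];
const books = [
  { _id: 10, title: "Book X", authors: [1] },
  { _id: 11, title: "Book Y", authors: [1, 2] },
];

// Resolve the links in the application: titles written by one author.
function titlesBy(authorId) {
  const author = authors.find((a) => a._id === authorId);
  if (!author) return [];
  return books.filter((b) => author.books.includes(b._id)).map((b) => b.title);
}

// And the other direction: names of the authors of one book.
function authorsOf(bookId) {
  const book = books.find((b) => b._id === bookId);
  if (!book) return [];
  return authors.filter((a) => book.authors.includes(a._id)).map((a) => a.name);
}

const titles = titlesBy(1);
const names = authorsOf(11);
```

Keeping the array on both sides, as here, is itself a design choice: it makes both directions cheap to read, but the two arrays have to be kept in sync by the application.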

Multikey indexes
Multikey indexes on array fields make it very fast to find documents by the values stored in those arrays.
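The idea behind a multikey index, one index entry per array element, can be illustrated with a small in-memory sketch. This is not MongoDB's actual index structure; the students data and the buildMultikeyIndex helper are hypothetical.

```javascript
// Hypothetical students, each linked to several teachers by id.
const students = [
  { _id: 0, name: "Ana", teachers: [1, 3] },
  { _id: 1, name: "Luis", teachers: [0, 1] },
  { _id: 2, name: "Eva", teachers: [3] },
];

// Index every element of the array field separately:
// value -> list of _id values of the documents that contain it.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    for (const value of doc[field]) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

const byTeacher = buildMultikeyIndex(students, "teachers");

// Looking up one teacher no longer requires scanning every student.
const taughtBy1 = byTeacher.get(1) || [];
```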

Benefits of embedding
The data is contiguous on disk, so read performance is high.

Trees
MongoDB can list ancestors and children. The best way is to include in each document an array with its ancestors, in order, so that all the parent documents of the current document can be found. Example:
 
Given the following typical document for an e-commerce category hierarchy collection called categories
 
{
  _id: 34,
  name : "Snorkeling",
  parent_id: 12,
  ancestors: [12, 35, 90]
}
 
Which query will find all descendants of the snorkeling category?
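A descendant of Snorkeling is any category whose ancestors array contains 34. In the mongo shell that is a query on the array field, db.categories.find({ ancestors: 34 }). The same filter can be sketched in plain JavaScript; every category below other than Snorkeling is made up for the illustration.

```javascript
// Hypothetical contents of the categories collection.
const categories = [
  { _id: 34, name: "Snorkeling", parent_id: 12, ancestors: [12, 35, 90] },
  { _id: 50, name: "Masks", parent_id: 34, ancestors: [34, 12, 35, 90] },
  { _id: 51, name: "Kids masks", parent_id: 50, ancestors: [50, 34, 12, 35, 90] },
  { _id: 12, name: "Water sports", parent_id: 35, ancestors: [35, 90] },
];

// In-memory equivalent of find({ ancestors: categoryId }):
// keep every document whose ancestors array contains the id.
function descendantsOf(categoryId) {
  return categories.filter((c) => c.ancestors.includes(categoryId));
}

const descendantNames = descendantsOf(34).map((c) => c.name);
```

Note that the match covers descendants at any depth, not only the direct children, because each document lists all of its ancestors.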
 

When to denormalize
As long as we do not duplicate data, we are not exposed to modification anomalies.

Handling Blobs
GridFS stores large files and blobs in two collections, one for metadata and one for the blob chunks.
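The split into a metadata document and chunk documents can be sketched as follows. This is an in-memory illustration, not the driver's GridFS API: splitIntoChunks and reassemble are hypothetical helpers, and the chunk size of 4 bytes is chosen only to keep the example small.

```javascript
// Split a blob into one metadata document and several chunk documents.
// Each chunk records the file it belongs to (files_id) and its position (n).
function splitIntoChunks(fileId, filename, data, chunkSize) {
  const fileDoc = { _id: fileId, filename, length: data.length, chunkSize };
  const chunkDocs = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n += 1) {
    chunkDocs.push({
      files_id: fileId,
      n,
      data: data.subarray(offset, offset + chunkSize),
    });
  }
  return { fileDoc, chunkDocs };
}

// Put the chunks back in order and concatenate them to recover the blob.
function reassemble(chunkDocs) {
  const ordered = [...chunkDocs].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((c) => c.data));
}

const blob = Buffer.from("0123456789");
const stored = splitIntoChunks(1, "digits.txt", blob, 4);
const restored = reassemble(stored.chunkDocs);
```

Because each chunk is its own small document, a file larger than the 16 MB document limit can still be stored.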
