A small introduction to MongoDB
Author: Ivan Sotelo Codo
Category: Java & Web
My goal in this post is to give you a short introduction that helps you decide how and when to use MongoDB in your projects, and which tools and facilities to combine it with. MongoDB is an open source NoSQL database that is simple to install and has drivers for many programming languages. It is also easy to scale and to set up for high availability.
In its 9 years of existence, MongoDB has become the most popular document-oriented NoSQL database. Because you can change your schema dynamically, you can adapt your application to the market quickly. It usually runs on commodity servers and basic maintenance does not require dedicated DBAs, so it is a very affordable database.
Terms, structure and commands
The terms, structure and commands are completely different between MongoDB and SQL relational database implementations. In this table you will see some SQL and MongoDB terms.
Let me explain how the two differ on atomicity, distributed transactions and rollbacks.
- Atomicity of transactions: in MongoDB versions before 4.0, a write is atomic only at the level of a single document. If an operation modifies multiple documents, the modification of each document is atomic, but the operation as a whole is not, and other operations may interleave with it. The $isolated operator prevents that interleaving for a multi-document write, but it doesn’t provide all-or-nothing atomicity. In SQL databases you can update multiple records atomically.
- Distributed transactions: if you need to perform transactions that span multiple nodes (two-phase commits), you have to keep track of the transaction state yourself for the whole duration of the process, in case you need to roll back. In SQL implementations you can make use of distributed transactions and two-phase commits. You can find a very good explanation about how to solve this kind of problem following this link.
- Rollback: MongoDB versions before 4.0 don’t support rolling back a multi-document operation, something very common in SQL implementations.
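The bookkeeping described in the list above can be sketched in plain Python. This is an in-memory model, not MongoDB driver code: the `accounts` dictionary, the transaction document and its `state` values are hypothetical names chosen for illustration. It shows why the application has to record the transaction state itself: each single-document update is atomic, but the transfer as a whole is not, so a failure in the middle has to be undone by hand.

```python
# In-memory stand-in for a collection of account documents (hypothetical data).
accounts = {
    "A": {"_id": "A", "balance": 100},
    "B": {"_id": "B", "balance": 50},
}


def transfer(accounts, source, destination, amount, fail_after_debit=False):
    """Move `amount` between two documents, tracking state in a transaction document."""
    txn = {"source": source, "destination": destination,
           "amount": amount, "state": "initial"}
    try:
        txn["state"] = "pending"
        # Each of these two updates touches one document and is atomic on its own.
        accounts[source]["balance"] -= amount
        txn["debited"] = True
        if fail_after_debit:
            raise RuntimeError("simulated crash between the two updates")
        accounts[destination]["balance"] += amount
        txn["credited"] = True
        txn["state"] = "done"
    except RuntimeError:
        # Manual rollback: undo only the steps the transaction document recorded.
        if txn.get("credited"):
            accounts[destination]["balance"] -= amount
        if txn.get("debited"):
            accounts[source]["balance"] += amount
        txn["state"] = "canceled"
    return txn


ok = transfer(accounts, "A", "B", 30)
failed = transfer(accounts, "A", "B", 30, fail_after_debit=True)
print(ok["state"], failed["state"], accounts["A"]["balance"], accounts["B"]["balance"])
```

The second transfer fails after the debit, and the recorded state is what allows the debit to be reversed so that the balances stay consistent.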
The most important decision you have to make when you decide to use MongoDB concerns your data model. With relational databases you usually apply the first, second and third normal forms to get a strong and safe data model, but its many constraints can degrade performance.
If you choose a non-relational database, you are probably not looking for constraints but for better performance.
Let me show you an example of an implementation of User and Address entities.
Example of a relational database implementation:
In this example we created three entities and followed the basic principles of normalization. If we need to change the name of a city, we only have to apply the change in the City table, which is pretty simple.
This is a very good implementation when there are many updates, but complex queries that need several joins between tables can be expensive.
Example of a MongoDB database implementation:
In a non-relational implementation we don’t need to execute many joins to fetch the entire entry. As you can see, the Address entity is embedded in the User entity, and the City entity is in turn embedded in the Address entity.
We deliberately don’t follow the normalization rules, in order to increase performance. With the NoSQL model above, inserting and fetching entries is faster, but some updates are not. We may run into problems if we need to change the name of a city: in that case we have to update every document that contains that city name. But how often do the names of cities or streets change?
These are the kinds of questions you have to think about before you start modeling your database schema.
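To make the trade-off concrete, here is a minimal sketch using plain Python dictionaries, with no database involved. The field names (`name`, `street`, `city_id` and so on) and the sample values are assumptions for illustration; the original diagrams may use different ones.

```python
# Normalized model: three "tables", linked by ids (hypothetical sample data).
cities = {1: {"name": "Amsterdam"}}
addresses = {10: {"street": "Main Street 1", "city_id": 1}}
users = {100: {"name": "Alice", "address_id": 10},
         101: {"name": "Bob", "address_id": 10}}


def fetch_user_normalized(user_id):
    """Rebuild the full entry by following the references (the 'joins')."""
    user = users[user_id]
    address = addresses[user["address_id"]]
    city = cities[address["city_id"]]
    return {"name": user["name"],
            "address": {"street": address["street"],
                        "city": {"name": city["name"]}}}


# Embedded model: each user document already contains its address and city.
user_documents = [fetch_user_normalized(uid) for uid in users]


def rename_city_embedded(documents, old_name, new_name):
    """Renaming a city means touching every document that embeds it."""
    changed = 0
    for doc in documents:
        city = doc["address"]["city"]
        if city["name"] == old_name:
            city["name"] = new_name
            changed += 1
    return changed


# Normalized: one update in one place.
cities[1]["name"] = "Rotterdam"
# Embedded: one update per document that contains the city.
updated = rename_city_embedded(user_documents, "Amsterdam", "Rotterdam")
print(updated)  # number of embedded documents that had to change
```

Reading an embedded document needs no lookups at all, while the rename has to visit every document that carries a copy of the city.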
MongoDB stores data in documents, which makes it very flexible in adapting to different structures. Every document has its own structure, and you don’t need to create a table before inserting data. Normally you just execute the insert operation and the database creates the new structure for you.
If you execute this command for the first time:
    db.restaurant.insert({
        "name": "Lumpia sauce restaurant",
        "category": "restaurants",
        "type": "Philippine"
    })
MongoDB will automatically create the collection 'restaurant' and insert the document, without you having to run any DDL script first.
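This behaviour can be modelled in a few lines of plain Python. It is a toy in-memory stand-in, not the MongoDB API: the `ToyDatabase` class and its methods are hypothetical, and so is the second sample document. The only point it illustrates is that the collection appears on the first insert and that documents in one collection need not share the same fields.

```python
from collections import defaultdict
from itertools import count


class ToyDatabase:
    """Toy model: collections are created lazily on first insert."""

    def __init__(self):
        self._collections = defaultdict(list)
        self._ids = count(1)

    def insert(self, collection, document):
        # Copy the document and give it an identifier, loosely like MongoDB's _id.
        stored = dict(document, _id=next(self._ids))
        self._collections[collection].append(stored)
        return stored["_id"]

    def collection_names(self):
        return sorted(self._collections)

    def find(self, collection):
        # Plain lookup so that reading does not create an empty collection.
        return list(self._collections.get(collection, []))


db = ToyDatabase()
print(db.collection_names())  # no collections yet
db.insert("restaurant", {"name": "Lumpia sauce restaurant",
                         "category": "restaurants",
                         "type": "Philippine"})
# A second, made-up document with a different structure is accepted as well.
db.insert("restaurant", {"name": "Corner cafe", "seats": 20})
print(db.collection_names())
```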
A replica set is a group of MongoDB instances that provides high availability, redundancy and automatic failover. One member is the primary node and the others are secondary nodes. By default your application reads from and writes to the primary only, and the operations are applied to the secondary nodes asynchronously. When the primary stops communicating with the secondary nodes, they hold an election to choose a new primary so that normal operations can continue. This is the automatic failover mechanism.
The diagram above (taken from the official documentation) gives a better picture of the structure of a MongoDB replica set.
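The failover behaviour can be sketched as a small simulation. This is a deliberately simplified model in plain Python, not how MongoDB implements elections: the member names and priority values are illustrative, and the rule used here (a strict majority of the members must be reachable, then the reachable member with the highest priority wins) is an assumption that ignores details such as how up to date each member's data is.

```python
def elect_primary(members, reachable):
    """Return the name of the new primary, or None if no majority is reachable.

    members:   dict mapping member name -> priority (hypothetical values)
    reachable: set of member names that can still talk to each other
    """
    candidates = [name for name in members if name in reachable]
    # Without a strict majority of the set, no primary is elected.
    if len(candidates) * 2 <= len(members):
        return None
    # Simplified rule: highest priority wins, name breaks ties deterministically.
    return max(candidates, key=lambda name: (members[name], name))


members = {"node1": 2, "node2": 1, "node3": 1}

# node1 (the old primary) is unreachable; node2 and node3 still form a majority.
print(elect_primary(members, {"node2", "node3"}))
# Only one member left out of three: no majority, so no primary.
print(elect_primary(members, {"node3"}))
```

The majority check is the reason replica sets are usually deployed with an odd number of members: a three-member set survives the loss of one member, but not of two.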
You can think of a shard as a set of instances holding a subset of the data. The data is split into chunks based on a key that you choose, and the chunks are distributed across multiple shards. Each shard can also be replicated as a replica set. A sharded configuration consists of one or more router servers, one or more config servers, and the shards. Your application connects to the router nodes to execute operations; a router consults the config servers to find out where the data is located and then contacts the shards to execute the operations.
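The routing path described above can be sketched as follows. This is an in-memory model in plain Python, not MongoDB code: the chunk boundaries, the shard names and the `user_id` key are made up, and range-based chunks are just one possible way to split the data.

```python
import bisect

# "Config server" metadata: sorted upper bounds of each chunk (exclusive)
# and the shard that owns the chunk. All values are hypothetical.
CHUNK_UPPER_BOUNDS = [1000, 2000, float("inf")]
CHUNK_OWNERS = ["shardA", "shardB", "shardC"]

# The shards themselves: each one stores only its own subset of the data.
shards = {"shardA": [], "shardB": [], "shardC": []}


def locate_shard(shard_key_value):
    """What the router asks the config metadata: which shard owns this key?"""
    index = bisect.bisect_right(CHUNK_UPPER_BOUNDS, shard_key_value)
    return CHUNK_OWNERS[index]


def routed_insert(document):
    """Router: look up the owning shard, then forward the write to it."""
    shard = locate_shard(document["user_id"])
    shards[shard].append(document)
    return shard


def routed_find(user_id):
    """Router: a query on the shard key only needs to visit one shard."""
    shard = locate_shard(user_id)
    return [doc for doc in shards[shard] if doc["user_id"] == user_id]


for uid in (5, 999, 1000, 1500, 4200):
    routed_insert({"user_id": uid})

print({name: [d["user_id"] for d in docs] for name, docs in shards.items()})
```

A query that does not include the key would have to be sent to every shard, which is one reason the choice of key matters so much.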
MongoDB offers the following ways to perform aggregation. This is useful when the data is sharded over multiple running instances (maybe hundreds or thousands):
- Aggregation pipeline
- Map-reduce function
- Single purpose aggregation methods
With the first two you can run aggregations on sharded collections; the single purpose aggregation methods are the only option that cannot do this.
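To give an idea of what an aggregation pipeline does, here is a sketch in plain Python that mimics two stages, a filter followed by a grouped sum, similar in spirit to the `$match` and `$group` stages. It does not use MongoDB; the order documents and their field names are hypothetical. The last lines show one way to picture aggregation over sharded data: each shard computes a partial result, and the partial results are then merged.

```python
from collections import Counter

# Hypothetical documents, already split over two shards.
shard_1 = [{"status": "paid", "customer": "ann", "amount": 10},
           {"status": "open", "customer": "ann", "amount": 99}]
shard_2 = [{"status": "paid", "customer": "bob", "amount": 5},
           {"status": "paid", "customer": "ann", "amount": 7}]


def match(documents, **criteria):
    """Stage 1: keep only documents whose fields equal the given values."""
    return [d for d in documents
            if all(d.get(k) == v for k, v in criteria.items())]


def group_sum(documents, key, field):
    """Stage 2: group by `key` and sum `field` within each group."""
    totals = Counter()
    for d in documents:
        totals[d[key]] += d[field]
    return totals


def pipeline(documents):
    """Chain the stages: the output of one stage is the input of the next."""
    return group_sum(match(documents, status="paid"), "customer", "amount")


# Run the pipeline on each shard, then merge the partial totals.
merged = pipeline(shard_1) + pipeline(shard_2)
print(dict(merged))
```

Merging works here because sums can be combined from partial sums; running the same pipeline over all documents at once gives the same totals.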
Graphical user interfaces
You can find several GUIs to use with your MongoDB instance. Some of them are free, but there are also paid versions if you need more features.
MongoDB in short:
- A free and open source database
- High availability
- An aggregation framework
- A possibility to map your Java entities to documents
Apart from that, MongoDB is simple to install and to use. Its most common features help you solve the typical critical problems of your daily routine. You can change your schemas dynamically without writing DDL scripts, so you can adapt your application to the market really fast. And because the basic maintenance tasks are simple, you don’t need DBAs to keep your instances running, which saves a lot of money.
Suppose you are going to build a content management system (CMS) in which users can create pages dynamically, adding and deleting fields, creating rules, validating input, et cetera. You have to change the schema dynamically for each page that is created or updated.
How can you do that without executing any DDL script? MongoDB fits perfectly in this case.
If you need a database that is mature, cheap, reliable, scalable, very flexible and easy to maintain, MongoDB can be the right tool for you. It also suits applications that handle large amounts of data, or that need to adapt quickly and value performance over strict transactional control. This holds provided you know what you are doing, because some features can be tricky. The document-oriented structure is both very powerful and dangerous: if you make many changes without tracking them properly, you can lose control of your data model.
Sharding needs to be set up correctly from the start to avoid problems in the future, for example if one of the keys needs to be changed or if the servers need to be migrated.
You can find a comparison between SQL and MongoDB commands on this page.
If you are planning to use MongoDB with Java, you should have a look at the Morphia framework. Morphia is an open source persistence framework built on top of the MongoDB Java driver and maintained by the MongoDB company. With it you can use annotations to map your Java entities to MongoDB documents and vice versa, and you can also add support for custom types.
You can read more about this framework following this link.