A small introduction to MongoDB
Author: Ivan Sotelo Codo | Category: Java & Web
When do you use MongoDB? And how? In this blog, I will help you answer both questions. But first, a short introduction: MongoDB is a simple, easy-to-install, open source NoSQL database, with drivers for many programming languages. It is easy to scale, and high-availability setups are easy to implement.
In its nine years of existence, MongoDB has become the most popular document-oriented NoSQL database, probably because it allows you to change your schema dynamically, which means you can quickly adapt your application to the market. MongoDB runs on commodity servers and you don't need DBAs, so it is a very affordable database.
Terms, structure and commands
MongoDB and SQL relational databases differ completely in their terms, structure, and commands. In this table, you can see a few SQL and MongoDB terms.
Let me explain the difference between some of the paradigms involving atomicity, distributed transactions and rollbacks.
- Atomicity of transactions: MongoDB guarantees atomicity only for changes to a single document. If an operation touches multiple documents, the modification of each document is atomic, but the operation as a whole is not, so other operations may interleave with it. There is an operator ($isolated) that you can use when changing multiple documents, but it doesn't provide all-or-nothing atomicity. In SQL databases, you can update multiple records atomically.
- Distributed transactions: if you need to perform a transaction that spans multiple nodes, you have to implement a two-phase commit yourself and keep track of the state of the transaction during the whole process, in case you need to roll it back. In SQL implementations, you can make use of built-in distributed transactions and two-phase commits. You'll find a very good explanation of how to solve this kind of problem in this link.
- Rollback: MongoDB doesn't support rollback, a feature that is very common in SQL implementations.
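The bookkeeping described in the second bullet can be sketched as follows. This is a minimal in-memory illustration, not a MongoDB API: plain Python dictionaries stand in for collections, and the names (`accounts`, `transactions`, `begin`, `apply_transfer`, `finish`, `rollback`) and the state labels are assumptions made for this example.

```python
# In-memory stand-ins for two collections (hypothetical sample data).
accounts = {
    "A": {"_id": "A", "balance": 100, "pending": []},
    "B": {"_id": "B", "balance": 50, "pending": []},
}
transactions = {}


def begin(txn_id, source, destination, amount):
    """Record the intent first, so an interrupted transfer can be found later."""
    transactions[txn_id] = {
        "_id": txn_id, "source": source, "destination": destination,
        "amount": amount, "state": "initial",
    }


def apply_transfer(txn_id):
    """Apply the transfer one document at a time, marking progress as we go."""
    txn = transactions[txn_id]
    txn["state"] = "pending"
    for account_id, delta in ((txn["source"], -txn["amount"]),
                              (txn["destination"], txn["amount"])):
        account = accounts[account_id]
        if txn_id not in account["pending"]:  # makes a retry idempotent
            account["balance"] += delta
            account["pending"].append(txn_id)
    txn["state"] = "applied"


def finish(txn_id):
    """Clear the per-account markers and close the transaction."""
    txn = transactions[txn_id]
    for account_id in (txn["source"], txn["destination"]):
        if txn_id in accounts[account_id]["pending"]:
            accounts[account_id]["pending"].remove(txn_id)
    txn["state"] = "done"


def rollback(txn_id):
    """Undo only those single-document updates that were already applied."""
    txn = transactions[txn_id]
    txn["state"] = "canceling"
    for account_id, delta in ((txn["source"], txn["amount"]),
                              (txn["destination"], -txn["amount"])):
        account = accounts[account_id]
        if txn_id in account["pending"]:
            account["balance"] += delta
            account["pending"].remove(txn_id)
    txn["state"] = "canceled"
```

The `pending` list on each account is what makes recovery possible: after a crash, it shows which of the individual updates had already been applied.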
The most important decision you have to make when deciding whether or not to use MongoDB is related to your data model. Usually, when using relational databases, you apply the first, second and third normal forms to get a strong and safe data model, but the many restrictions that come with it degrade performance.
If you choose a non-relational database, you are probably looking for better performance rather than restrictions.
Let me show you an example of an implementation of User and Address entities.
Example of a relational database implementation:
In this example, we created three entities and followed the basic principles of normalization. If we need to change the name of a city, we only need to apply the change in the City table, which is pretty simple.
This is a very good implementation when there are many updates, but complex queries that need several joins between tables can be expensive.
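As a rough sketch of the normalized model described above, the three entities can be mimicked with Python dictionaries keyed by primary key. The column names and sample values are assumptions for illustration; they are not taken from the original diagram.

```python
# Three "tables" keyed by primary key (hypothetical columns and data).
cities = {1: {"name": "Amsterdam"}}
addresses = {10: {"street": "Main Street 1", "city_id": 1}}
users = {
    100: {"name": "Alice", "address_id": 10},
    101: {"name": "Bob", "address_id": 10},
}


def fetch_user(user_id):
    """Assemble a full user record by following both foreign keys (two joins)."""
    user = users[user_id]
    address = addresses[user["address_id"]]
    city = cities[address["city_id"]]
    return {"name": user["name"], "street": address["street"], "city": city["name"]}


def rename_city(city_id, new_name):
    """A rename touches exactly one row, no matter how many users live there."""
    cities[city_id]["name"] = new_name
```

Reading one user costs two lookups through foreign keys, while renaming a city is a single write.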
Example of a MongoDB database implementation:
In the non-relational implementation, we don't need to execute a lot of joins to fetch the entire entry. As you can see, we have embedded the Address entity into the User entity, and the City entity is in turn embedded into the Address entity.
We deliberately don't follow the normalization rules, in order to increase performance. With the NoSQL model above, inserting and fetching entries is faster, but updating is not. We might face problems if we need to change the name of a city: in that case, we have to update all the documents that contain that name. But how often do the names of cities or streets change?
These are the kinds of questions you have to think about before you start modeling your database schema.
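The trade-off of the embedded model can be sketched with nested Python dictionaries standing in for documents. The field names and sample values are assumptions for illustration only.

```python
# Each user document embeds its address, which embeds its city (hypothetical data).
user_documents = [
    {"name": "Alice", "address": {"street": "Main Street 1", "city": {"name": "Amsterdam"}}},
    {"name": "Bob", "address": {"street": "Side Street 2", "city": {"name": "Amsterdam"}}},
    {"name": "Carol", "address": {"street": "Harbour Road 3", "city": {"name": "Utrecht"}}},
]


def find_user(name):
    """One lookup returns the whole entry; no joins are needed."""
    for document in user_documents:
        if document["name"] == name:
            return document
    return None


def rename_embedded_city(old_name, new_name):
    """Every document that embeds the city must be rewritten; returns the count."""
    modified = 0
    for document in user_documents:
        city = document["address"]["city"]
        if city["name"] == old_name:
            city["name"] = new_name
            modified += 1
    return modified
```

Here a read is a single lookup, but a city rename has to visit every document that contains the old name.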
MongoDB stores data in documents, which makes it very flexible when the structure of your data changes. Every document has its own structure, so you don't need to create a table before inserting data. Normally you just execute the insert operation, and the database creates the new structure for you.
If you execute this command for the first time:
    db.restaurant.insert({
        "name": "Lumpia sauce restaurant",
        "category": "restaurants",
        "type": "Philippine"
    })
It will create the collection 'restaurant' and insert the document automatically, without you having to execute any DDL script first.
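The behaviour described above, where a collection comes into existence on the first insert, can be mimicked in a few lines of Python. This is a toy model that uses a `defaultdict`, not a MongoDB driver; the `insert` helper and the second document are hypothetical.

```python
from collections import defaultdict

# A toy "database": collection name -> list of documents.
# defaultdict creates the list on first access, mimicking implicit creation.
database = defaultdict(list)


def insert(collection_name, document):
    """Store a copy of the document; the collection appears on first insert."""
    database[collection_name].append(dict(document))


insert("restaurant", {
    "name": "Lumpia sauce restaurant",
    "category": "restaurants",
    "type": "Philippine",
})
# A document with a different shape can go into the same collection.
insert("restaurant", {"name": "Hypothetical cafe", "category": "restaurants", "terrace": True})
```

No structure was declared up front, and the two documents in the same collection do not share the same fields.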
A replica set is a group of MongoDB instances that together provide high availability, redundancy, and automatic failover. One member is the primary node and the others are secondary nodes. By default, your application reads from and writes to the primary only, and the operations are applied to the secondary nodes asynchronously. When the primary stops communicating with the secondary nodes, they hold an election to choose a new primary so that normal operations can continue. This is the automatic failover feature.
The diagram above (taken from the official documentation) gives you a better understanding of the structure of a MongoDB replica set.
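To make the failover idea concrete, here is a deliberately simplified toy model in Python. It is not MongoDB's actual election protocol: the `Member` and `ReplicaSet` classes are hypothetical, and the rule "pick the reachable member with the most applied operations, provided a majority is reachable" is a simplification chosen for this sketch.

```python
class Member:
    def __init__(self, name):
        self.name = name
        self.log = []          # operations applied on this member
        self.reachable = True


class ReplicaSet:
    def __init__(self, names):
        self.members = [Member(name) for name in names]
        self.primary = self.members[0]

    def write(self, operation):
        """Clients write to the primary only."""
        if not self.primary.reachable:
            raise RuntimeError("no reachable primary; an election is needed")
        self.primary.log.append(operation)

    def replicate(self):
        """Secondaries catch up later by copying the operations they miss."""
        if not self.primary.reachable:
            return
        for member in self.members:
            if member is not self.primary and member.reachable:
                member.log.extend(self.primary.log[len(member.log):])

    def elect(self):
        """Choose a new primary among reachable members if they form a majority."""
        candidates = [m for m in self.members if m.reachable]
        if len(candidates) * 2 <= len(self.members):
            raise RuntimeError("no majority; the set cannot elect a primary")
        # Simplification: prefer the member with the most operations applied.
        self.primary = max(candidates, key=lambda m: len(m.log))
        return self.primary
```

Because replication is asynchronous, a write that has not yet reached any secondary when the primary fails is not present on the newly elected primary in this model.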
You can think of a shard as a set of instances that holds a subset of the data. The data is split into chunks based on a key that you choose, and the chunks are distributed over multiple shards. Each shard can also be replicated as a replica set. In this kind of configuration, you have one or more router servers, one or more config servers, and the shards. Your application sends its operations to the router nodes; a router asks the config servers where the data is located and then accesses the relevant shards to execute the operations.
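The routing step can be sketched in Python as a lookup from a key value to the shard that owns its chunk. The chunk boundaries, shard names, and the `city` shard key below are all hypothetical, and real deployments keep this metadata on the config servers rather than in module-level lists.

```python
import bisect

# Stand-in for config-server metadata (hypothetical): sorted chunk boundaries
# and the shard owning each chunk.
# Chunks: key < "h", "h" <= key < "p", key >= "p".
CHUNK_BOUNDARIES = ["h", "p"]
CHUNK_OWNERS = ["shard-1", "shard-2", "shard-3"]

shards = {name: [] for name in CHUNK_OWNERS}


def locate(shard_key_value):
    """What the router learns from the config data: which shard owns the key."""
    return CHUNK_OWNERS[bisect.bisect_right(CHUNK_BOUNDARIES, shard_key_value)]


def route_insert(document, shard_key="city"):
    """Send the document to the shard that owns its key range."""
    shard = locate(document[shard_key])
    shards[shard].append(document)
    return shard


def route_find(shard_key_value, shard_key="city"):
    """A query on the shard key only has to visit one shard."""
    shard = locate(shard_key_value)
    return [d for d in shards[shard] if d[shard_key] == shard_key_value]
```

A query that does not include the shard key cannot be targeted this way and would have to be sent to every shard, which is one reason the choice of key matters.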
MongoDB offers the following ways to perform aggregation. This is useful when you use sharding and have multiple instances running (maybe hundreds or thousands):
- Aggregation pipeline
- Map-reduce function
- Single-purpose aggregation methods
In the first two cases, you can perform aggregations on sharded collections. With the single-purpose aggregation methods, you can't.
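To give an idea of what an aggregation pipeline looks like, the sketch below evaluates a two-stage pipeline over a list of Python dictionaries. The stage syntax (`$match`, `$group`, `$sum`) follows MongoDB's, but `run_pipeline` is a hypothetical helper that supports only equality matches and `{"$sum": 1}` counters, and the sample documents are invented.

```python
def run_pipeline(documents, pipeline):
    """Evaluate a tiny subset of pipeline stages against a list of dicts."""
    for stage in pipeline:
        (operator, spec), = stage.items()  # each stage holds exactly one operator
        if operator == "$match":
            documents = [d for d in documents
                         if all(d.get(k) == v for k, v in spec.items())]
        elif operator == "$group":
            key_field = spec["_id"][1:]  # "$type" -> "type"
            groups = {}
            for d in documents:
                key = d.get(key_field)
                group = groups.setdefault(key, {"_id": key})
                for out_field, accumulator in spec.items():
                    if out_field == "_id":
                        continue
                    if accumulator != {"$sum": 1}:
                        raise ValueError("only {'$sum': 1} is supported in this sketch")
                    group[out_field] = group.get(out_field, 0) + 1
            documents = list(groups.values())
        else:
            raise ValueError(f"unsupported stage: {operator}")
    return documents


restaurants = [
    {"name": "Lumpia sauce restaurant", "category": "restaurants", "type": "Philippine"},
    {"name": "Hypothetical noodle bar", "category": "restaurants", "type": "Philippine"},
    {"name": "Hypothetical trattoria", "category": "restaurants", "type": "Italian"},
    {"name": "Hypothetical bookshop", "category": "shops", "type": "Books"},
]

# Count restaurants per type: filter first, then group.
pipeline = [
    {"$match": {"category": "restaurants"}},
    {"$group": {"_id": "$type", "count": {"$sum": 1}}},
]
result = run_pipeline(restaurants, pipeline)
```

Each stage receives the output of the previous one, which is what makes a pipeline easy to build up step by step.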
Graphical user interfaces
You can find various GUIs to use with your MongoDB instance. Some of them are free, but you can also find paid versions if you need more features.
MongoDB in short:
- A free and open source database
- High availability
- An aggregation framework
- A possibility to map your Java entities to documents
MongoDB is simple to install and use. Its most common features can help you solve the 'normal', critical problems in your daily routine. You can change your schemas dynamically without writing DDL scripts. You can adapt your application to the market really fast and, thanks to the simplicity of the basic maintenance tasks, you don't need DBAs to keep your instances running. And last but not least: it saves a lot of money.
Suppose you want to build a content management system (CMS) where users can create pages dynamically, adding and deleting fields, defining rules, validations, et cetera. This means the schema has to change dynamically for each page that is created or updated.
How can you do this without executing any DDL script? MongoDB fits perfectly in this case.
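A minimal sketch of the CMS idea, assuming each page is stored as one document: fields are added to or removed from a single page without touching the others. The `pages` dictionary and the helper names are hypothetical; the helpers are similar in spirit to MongoDB's `$set` and `$unset` update operators but do not call any driver.

```python
pages = {}  # page id -> document; stands in for a "pages" collection


def create_page(page_id, title):
    """Start a page with only the fields every page has."""
    pages[page_id] = {"_id": page_id, "title": title}


def set_fields(page_id, fields):
    """Add or overwrite fields on one page only (similar in spirit to $set)."""
    pages[page_id].update(fields)


def unset_fields(page_id, field_names):
    """Remove fields from one page only (similar in spirit to $unset)."""
    for name in field_names:
        pages[page_id].pop(name, None)
```

For example, after `create_page("home", "Home")` and `create_page("contact", "Contact")`, calling `set_fields("contact", {"phone": "n/a", "form_enabled": True})` changes the shape of the contact page only.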
If you need a mature, cheap, reliable, scalable, very flexible and easy-to-maintain database, MongoDB could be the right tool for you. It is also suited to applications that manipulate large amounts of data, or that need to adapt fast and can trade transactional control for performance. You do need to know what you're doing, though, because some features can be tricky. The flexible document structure is a powerful but dangerous feature: you can easily lose control if the structure changes many times and those changes are not tracked properly.
Sharding needs to be set up correctly to avoid problems in the future, for example when one of the keys needs to be changed or when the server needs to be migrated.
You can find a comparison between SQL and MongoDB commands on this page.
If you are planning to use MongoDB with Java, you should have a look at the Morphia framework. Morphia is an open source framework for persisting data, built on top of the MongoDB Java driver and maintained by the company behind MongoDB. With this framework, you can use annotations to map your Java entities to MongoDB documents and vice versa. You can also add support for custom types.
You can read more about this framework following this link.