A small introduction to MongoDB

Category: Java & Web

When do you use MongoDB? And how? In this blog, I will help you answer both questions. But first, a short introduction: MongoDB is a simple, easy-to-install, open source NoSQL database with drivers for many programming languages. It scales easily and makes high-availability setups straightforward to implement.

In its nine years of existence, MongoDB has become the most popular document-oriented NoSQL database, probably because it lets you change your schema dynamically, which means you can quickly adapt your application to the market. MongoDB runs on commodity servers, and its basic maintenance tasks are simple enough that you don’t need dedicated DBAs, which makes it a very affordable database.

Terms, structure and commands

The terms, structure, and commands of MongoDB differ considerably from those of SQL relational databases. The table below lists a few SQL terms and commands next to their MongoDB counterparts.

SQL term              MongoDB term
Database (schema)     Database
Table                 Collection
Index                 Index
Row                   Document
Column                Field
Joining               Linking & embedding
Partition             Shard

SQL commands                  MongoDB commands
create table people(...);     db.people.insertOne(...);
alter table people ...        db.people.updateMany(...);
drop table people ...         db.people.drop(...);
insert into people (...)      db.people.insertOne(...);
select * from people          db.people.find(...);

Paradigms

Let me explain the difference between some of the paradigms involving atomicity, distributed transactions and rollbacks.

  • Atomicity of transactions: before version 4.0, MongoDB guarantees atomicity only at the level of a single document. If you execute an operation on multiple documents, the modification of each document is atomic. The operation as a whole isn’t atomic though, so other operations may interleave with it. There is an operator ($isolated) that prevents this interleaving for a multi-document write, but it doesn’t provide all-or-nothing atomicity. In SQL databases, you can update multiple records atomically. From version 4.0 onward, $isolated is gone and multi-document transactions are available.
  • Distributed transactions: if you need to perform multiple transactions (two-phase commits) across multiple nodes, you have to keep track of the state of your transaction yourself during the whole process, in case you need to do a rollback. In SQL implementations, you can make use of built-in distributed transactions and two-phase commits. You’ll find a very good explanation of how to solve this kind of problem in this link.
  • Rollback: outside of the multi-document transactions added in version 4.0, MongoDB doesn’t support rollback, which is very common in SQL implementations.
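
To make the "keep track of the state yourself" idea a bit more tangible, here is a minimal sketch in plain JavaScript. It does not talk to MongoDB at all: the two Maps merely stand in for collections, and every name in it (accounts, transactions, transfer, rollback, the state values) is an assumption made up for the illustration. Each step changes one document at a time, and a separate transaction document records how far the process got, so a half-finished transfer can be undone by hand.

```javascript
// In-memory stand-ins for two collections; no MongoDB server is involved.
const accounts = new Map([
  ["A", { _id: "A", balance: 100, pendingTransactions: [] }],
  ["B", { _id: "B", balance: 20, pendingTransactions: [] }],
]);
const transactions = new Map();

// Each step below changes exactly one document, mirroring the
// single-document atomicity described in the list above.
function transfer(txId, from, to, amount, failAfterDebit = false) {
  const tx = { _id: txId, from, to, amount, state: "initial" };
  transactions.set(txId, tx);

  tx.state = "pending";
  const source = accounts.get(from);
  source.balance -= amount;
  source.pendingTransactions.push(txId);

  if (failAfterDebit) {
    // Simulated crash between the two single-document updates.
    return tx;
  }

  const target = accounts.get(to);
  target.balance += amount;
  target.pendingTransactions.push(txId);

  tx.state = "applied";
  for (const id of [from, to]) {
    const doc = accounts.get(id);
    doc.pendingTransactions = doc.pendingTransactions.filter((t) => t !== txId);
  }
  tx.state = "done";
  return tx;
}

// Manual rollback: undo only the steps that the recorded state proves happened.
function rollback(txId) {
  const tx = transactions.get(txId);
  if (tx.state !== "pending") return tx;
  tx.state = "canceling";
  for (const [id, sign] of [[tx.from, +1], [tx.to, -1]]) {
    const doc = accounts.get(id);
    if (doc.pendingTransactions.includes(txId)) {
      doc.balance += sign * tx.amount;
      doc.pendingTransactions = doc.pendingTransactions.filter((t) => t !== txId);
    }
  }
  tx.state = "canceled";
  return tx;
}

transfer("t1", "A", "B", 30);       // runs to completion
transfer("t2", "A", "B", 50, true); // stops after the debit
rollback("t2");                     // puts the debited amount back
```

The point of the sketch is the bookkeeping, not the arithmetic: without the pendingTransactions markers and the state field, nothing would tell you which half of "t2" had already been applied.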

Data model

The most important decision you have to make when deciding whether or not to use MongoDB is related to your data model. Usually, when using relational databases, you apply the first, second and third normal forms to get a strong, consistent data model, but the many constraints that come with it can degrade performance.

If you choose a non-relational database, you’re probably not looking for restrictions but for better performance.
Let me show you an example of an implementation of the User and Address entities.

Example of a relational database implementation:

In this example, we created three entities and followed the basic principles of normalization. If we need to change the name of a City, we only need to apply the change in the City table, which is pretty simple.
This implementation works very well when there are many updates, but complex queries that join several tables can become expensive.

Example of a MongoDB database implementation:

For a non-relational database implementation, we don’t need to execute a lot of joins to fetch the entire entry. As you can see, we have embedded the Address entity into the User entity, and the City entity is in turn embedded into the Address entity.

We deliberately depart from the normalization rules in order to increase performance. With the NoSQL model described above, inserting and fetching entries is faster, but executing updates is not. We might face problems if we need to change the name of a City: in that case, we have to update every document that contains that name. But how often do the names of cities or streets change?
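
The trade-off between the two models can be sketched in plain JavaScript, without any database. The entity and field names below (users, addresses, cities, loadUser and so on) are assumptions chosen for the illustration, not the exact schema from the examples above. The normalized shape needs two lookups to assemble one user, while the embedded shape has to touch every matching document when a city is renamed.

```javascript
// Normalized (relational-style) shape: three "tables" linked by ids.
const cities = [{ id: 1, name: "Springfield" }];
const addresses = [
  { id: 10, street: "Main Street 1", cityId: 1 },
  { id: 11, street: "Station Road 5", cityId: 1 },
];
const users = [
  { id: 100, name: "Alice", addressId: 10 },
  { id: 101, name: "Bob", addressId: 11 },
];

// Reading one complete user needs two lookups (the "joins").
function loadUser(userId) {
  const user = users.find((u) => u.id === userId);
  const address = addresses.find((a) => a.id === user.addressId);
  const city = cities.find((c) => c.id === address.cityId);
  return {
    name: user.name,
    address: { street: address.street, city: { name: city.name } },
  };
}

// Embedded (document-style) shape: everything needed sits inside the user.
const userDocuments = users.map((u) => loadUser(u.id));

// Renaming a city changes one row in the normalized model ...
function renameCityNormalized(cityId, newName) {
  cities.find((c) => c.id === cityId).name = newName;
  return 1;
}

// ... but every matching document in the embedded model.
function renameCityEmbedded(oldName, newName) {
  let touched = 0;
  for (const doc of userDocuments) {
    if (doc.address.city.name === oldName) {
      doc.address.city.name = newName;
      touched += 1;
    }
  }
  return touched;
}

const rowsChanged = renameCityNormalized(1, "New Springfield");
const documentsChanged = renameCityEmbedded("Springfield", "New Springfield");
```

With only two users the difference is trivial, but the number of documents touched by the embedded rename grows with the number of users living in that city.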

These are the kinds of questions you have to think about before starting to model your database schema.

Schemaless

MongoDB stores data in documents, which makes it easy to adapt to a different structure. Every document has its own structure, so you don’t need to create a table (or rather, a collection) before inserting the data. Normally you just execute the insert operation, and the database creates the new structure for you.
If you execute this command for the first time:

  db.restaurant.insertOne({
      "name": "Lumpia sauce restaurant",
      "category": "restaurants",
      "type": "Philippine"
  });

It will create the collection 'restaurant' and insert the document automatically, without you having to execute any DDL script first.
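
The same behaviour can be mimicked in a few lines of plain JavaScript. This is only a toy stand-in, not the MongoDB driver: the db Map and the insertOne helper are made up for the illustration, as is the second restaurant. It shows a collection appearing on the first insert and two documents with different fields living side by side.

```javascript
// Tiny in-memory stand-in for a database; not the MongoDB driver.
const db = new Map();

function insertOne(collectionName, document) {
  // The collection comes into existence on the first insert.
  if (!db.has(collectionName)) db.set(collectionName, []);
  const collection = db.get(collectionName);
  collection.push({ _id: collection.length + 1, ...document });
}

insertOne("restaurant", {
  name: "Lumpia sauce restaurant",
  category: "restaurants",
  type: "Philippine",
});

// A second document may carry different fields; nothing has to be altered first.
insertOne("restaurant", {
  name: "Noodle corner",
  category: "restaurants",
  seats: 40,
  takeaway: true,
});

// The field names of each stored document, to show that they differ.
const fieldSets = db.get("restaurant").map((doc) => Object.keys(doc).sort().join(","));
```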

Replica Set

A replica set is a group of MongoDB instances that together provide high availability, redundancy, and automatic failover. One member is the primary node and the others are secondary nodes. By default, your application reads from and writes to the primary only. The operations are then applied to the secondary nodes asynchronously. When the primary stops communicating with the secondary nodes, they hold an election to choose a new primary node so that normal operations can continue. This is the automatic failover mechanism.

The diagram above (taken from the official documentation) illustrates the structure of a MongoDB replica set.
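
To illustrate the flow of writes, asynchronous replication, and failover, here is a heavily simplified model in plain JavaScript. It is an illustration only: the Member and ReplicaSet classes, the node names, and the "longest log wins" rule are assumptions of this sketch, and MongoDB’s real election protocol is considerably more involved.

```javascript
// Simplified model of a three-member replica set; an illustration only.
class Member {
  constructor(name) {
    this.name = name;
    this.alive = true;
    this.oplog = []; // ordered list of applied write operations
  }
}

class ReplicaSet {
  constructor(names) {
    this.members = names.map((n) => new Member(n));
    this.primary = this.members[0];
  }

  // Writes go to the primary only; secondaries are not touched yet.
  write(operation) {
    this.primary.oplog.push(operation);
  }

  // Asynchronous replication: live secondaries catch up with the primary's log.
  replicate() {
    for (const m of this.members) {
      if (m !== this.primary && m.alive) {
        m.oplog = this.primary.oplog.slice();
      }
    }
  }

  // Failover: if the primary is unreachable and a majority of members is
  // still alive, the live member with the longest log takes over.
  electIfNeeded() {
    if (this.primary.alive) return this.primary;
    const live = this.members.filter((m) => m.alive);
    if (live.length * 2 <= this.members.length) {
      throw new Error("no majority, no primary can be elected");
    }
    this.primary = live.reduce((best, m) =>
      m.oplog.length > best.oplog.length ? m : best
    );
    return this.primary;
  }
}

const rs = new ReplicaSet(["node1", "node2", "node3"]);
rs.write({ insert: "a" });
rs.replicate();
rs.write({ insert: "b" });   // not yet replicated
rs.members[0].alive = false; // the primary goes down
const newPrimary = rs.electIfNeeded();
```

Note that the new primary in this model never saw the second write. That is the practical consequence of replication being asynchronous: a write that only reached the old primary is not on the member that takes over.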

Sharding

You can think of a shard as a set of instances holding a subset of the data. The data is split into chunks based on a key that you have to choose (the shard key), and the chunks are distributed across multiple shards. Each shard can also be deployed as a replica set. In this kind of configuration, you have one or more router servers, one or more config servers, and the shards. Your application talks to the router nodes to execute its operations; the router consults the config servers to find out where the data lives and then contacts the relevant shards to execute the operations.
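
The routing step can be sketched in plain JavaScript as follows. Everything here is hypothetical: the chunk ranges, the shard names, and the choice of lastName as shard key are made up, and the sketch assumes lowercase keys only. The chunkTable plays the role of the config servers, and the locate function plays the role of the router.

```javascript
// Chunk ranges as the (simulated) config servers would keep them:
// each chunk covers the half-open range [min, max) of shard-key values.
const chunkTable = [
  { min: "a", max: "h", shard: "shard0" },
  { min: "h", max: "p", shard: "shard1" },
  { min: "p", max: "{", shard: "shard2" }, // "{" sorts right after "z"
];

const shards = { shard0: [], shard1: [], shard2: [] };

// The router looks up which shard owns a given shard-key value.
function locate(shardKeyValue) {
  const chunk = chunkTable.find(
    (c) => shardKeyValue >= c.min && shardKeyValue < c.max
  );
  if (!chunk) throw new Error(`no chunk covers ${shardKeyValue}`);
  return chunk.shard;
}

// An insert is forwarded to the single shard that owns the key.
function routedInsert(document) {
  const shard = locate(document.lastName);
  shards[shard].push(document);
  return shard;
}

// A query that includes the shard key is sent to one shard only.
function routedFind(lastName) {
  return shards[locate(lastName)].filter((d) => d.lastName === lastName);
}

const placements = [
  routedInsert({ lastName: "adams", firstName: "Ann" }),
  routedInsert({ lastName: "miller", firstName: "Max" }),
  routedInsert({ lastName: "young", firstName: "Yara" }),
];
const found = routedFind("miller");
```

A query that does not include the shard key could not use locate at all and would have to ask every shard, which is one reason the choice of key matters so much.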

Aggregation

MongoDB offers the following ways to perform aggregation. They are especially useful in a sharded setup with many instances running (maybe hundreds or thousands):

  • Aggregation pipeline
  • Map-reduce function
  • Single purpose aggregation methods

In the first two cases, you can perform aggregations on sharded collections. With the single purpose aggregation methods, you can’t.
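
For readers who haven’t seen an aggregation pipeline before, here is a rough imitation in plain JavaScript. The sample documents and field names are invented, and the three functions only mimic what the $match, $group and $sort stages do; in the shell, the stage documents shown in the comments would be passed as an array to db.orders.aggregate(), where orders is a hypothetical collection name.

```javascript
// Sample documents; the field names are made up for this sketch.
const orders = [
  { customer: "ann", status: "paid", amount: 30 },
  { customer: "bob", status: "paid", amount: 15 },
  { customer: "ann", status: "open", amount: 99 },
  { customer: "ann", status: "paid", amount: 20 },
];

// Stage 1, like { $match: { status: "paid" } }
const match = (docs) => docs.filter((d) => d.status === "paid");

// Stage 2, like { $group: { _id: "$customer", total: { $sum: "$amount" } } }
const group = (docs) => {
  const totals = new Map();
  for (const d of docs) {
    totals.set(d.customer, (totals.get(d.customer) || 0) + d.amount);
  }
  return [...totals].map(([_id, total]) => ({ _id, total }));
};

// Stage 3, like { $sort: { total: -1 } }
const sort = (docs) => [...docs].sort((a, b) => b.total - a.total);

// A pipeline feeds the output of each stage into the next one.
const pipeline = [match, group, sort];
const result = pipeline.reduce((docs, stage) => stage(docs), orders);
```

The order of the stages matters: filtering first means the grouping stage only has to look at the documents that survive the match.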

Graphical user interfaces

You can find different GUIs to use with your instance of MongoDB. Some of them are free, but you can also find paid versions if you need more features.

Conclusion

MongoDB in short:

  • A free and open source database
  • High availability
  • Redundancy
  • An aggregation framework
  • Sharding
  • A possibility to map your Java entities to documents

MongoDB is simple to install and use. Its most common features can help you solve the ‘normal’, critical problems in your daily routine. You can change your schemas dynamically without writing DDL scripts. You can adapt your application to the market really fast, and due to the simplicity of the basic maintenance tasks you don’t need DBAs to keep your instances running. And last but not least: it saves a lot of money.

Example

You want to build a content management system (CMS) where users can create a page dynamically: inserting and deleting fields, defining rules, adding validation, et cetera. This means you have to change the schema dynamically for each page created or updated.

How can you do this without executing any DDL script? MongoDB fits perfectly in this case.

What's next?

If you need a mature, cheap, reliable, scalable, very flexible and easy to maintain database, MongoDB could be the correct tool for you. It’s also suited for applications that manipulate large amounts of data, or that need to adapt fast and value performance over transactional control. All of this holds provided you know what you’re doing, because some features can be tricky. The flexible document structure is a powerful but dangerous feature, because you can easily lose control when the structure changes many times and those changes aren’t tracked properly.

Sharding needs to be set up correctly to avoid problems in the future, for example when one of the shard keys needs to be changed or when a server has to be migrated.
You can find a comparison between SQL and MongoDB commands on this page.

In case you are planning to use MongoDB with Java, you should have a look at the Morphia framework. Morphia is an open source framework for persisting data, written on top of the Mongo Java Driver and maintained by the MongoDB company. With this framework, you can easily use annotations to map your Java entities to MongoDB documents and vice versa. You can also add support for custom types.

You can read more about this framework by following this link.

About the author Ivan Sotelo Codo

Comments (3)
  1. at 10:10

    Nice article. Another example in which a document database can be a good choice is when you need to store logging or business intelligence reporting data. This is because the fields that you want to store vary a lot over time.

    1. at 23:11

      Bram,
      Logs should be stored in (indexed) search-analytical databases for end-user use or into Kafka for streaming use. Logs can grow too fast for MongoDB to keep up, because it doesn't provide enough features to handle logs. Logs for archival, can go to archive systems or object storage.

      I have talked to customers acknowledging that storing BI reporting data in a document store (like MongoDB) was the worst choice. So be careful. Proper xOLAP tools just run queries that document databases can't handle across the documents stored, when they want to analyse trends or multi-dimensional analytical calculations. Solution: they choose to export the data, or first develop some code to get the data in the right (fixed) format first.
      Only fixed format reporting (published or PDF reporting) would fit, but the industry standard here is XML or things like XBRL, but it's a niche and we have tools/platforms for that nowadays.

      It can all be very simple: combine relational and document data in a single database, there are many polyglot databases nowadays (MarkLogic, Oracle). I would never consider another database storage architecture when you already know multiple use cases, like you mention in your comment.

    2. at 13:01

      I agree with Hasso regarding storing logs in MongoDB, because we already have better open source options on the market.

      However, I don't think the use of multiple databases in one application could be a problem nowadays and we also need to keep in mind that both databases mentioned are paid and MongoDB is open source and free. For sure it always depends on how much the company is able to spend or if they already are using these databases, etc.

      Thanks for the comments!
