A small introduction to MongoDB

Category: Java & Web

My goal in this post is to give you a short introduction that helps you decide how and when to use MongoDB in your projects, and in combination with which tools and facilities. MongoDB is an open source NoSQL database that is simple to install and has drivers for many programming languages. It is also easy to scale and to set up for high availability.

In its nine years of existence, MongoDB has become the most popular document-oriented NoSQL database. You can change your schema dynamically, so you can adapt your application to the market quickly. It usually runs on commodity servers and basic maintenance often doesn't require dedicated DBAs, which makes it a very affordable database.

Terms, structure and commands

The terms, structure and commands of MongoDB differ considerably from those of relational SQL databases. The overview below maps some SQL terms and commands to their MongoDB counterparts.

| SQL term          | MongoDB term        |
|-------------------|---------------------|
| Database (schema) | Database            |
| Table             | Collection          |
| Index             | Index               |
| Row               | Document            |
| Column            | Field               |
| Joining           | Linking & embedding |
| Partition         | Shard               |

| SQL commands              | MongoDB commands           |
|---------------------------|----------------------------|
| create table people(...); | db.people.insertOne(...);  |
| alter table people ...    | db.people.updateMany(...); |
| drop table people ...     | db.people.drop(...);       |
| insert into people (...)  | db.people.insertOne(...);  |
| select * from people      | db.people.find(...);       |

Paradigms

Let me explain how MongoDB and SQL databases differ when it comes to atomicity, distributed transactions and rollbacks.

  • Atomicity of transactions: in MongoDB versions before 4.0, only a change to a single document is atomic. If an operation touches multiple documents, the modification of each document is atomic, but the operation as a whole is not, and other operations may interleave. The $isolated operator prevents that interleaving for a multi-document write, but it doesn't provide all-or-nothing atomicity (it was removed in MongoDB 4.0, which introduced multi-document transactions on replica sets). In SQL databases you can update multiple records atomically.
  • Distributed transactions: if you need to perform a transaction across multiple nodes (a two-phase commit), you have to keep track of the transaction's state yourself for the duration of the whole process, in case you need to roll back. In SQL implementations you can rely on built-in distributed transactions and two-phase commits. You can find a very good explanation of how to solve this kind of problem by following this link.
  • Rollback: without multi-document transactions, MongoDB cannot roll back an operation that has already changed some documents, something very common in SQL implementations.
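The bookkeeping described in the distributed transactions bullet can be sketched in plain JavaScript, without any database. This is only a minimal in-memory illustration: the `accounts` documents, the `transfer` function and the state names are hypothetical, and nothing here is the MongoDB API. It shows the idea of recording how far an operation got, so that a half-applied change can be undone by hand.

```javascript
// In-memory stand-ins for two documents; names and fields are hypothetical.
const accounts = {
  A: { _id: "A", balance: 100 },
  B: { _id: "B", balance: 50 },
};

// Moves `amount` from one account to another while recording progress in a
// transaction record, so a failure halfway can be compensated manually.
function transfer(from, to, amount, failAfterDebit = false) {
  const tx = { from, to, amount, state: "initial", debited: false, credited: false };
  try {
    tx.state = "pending";
    accounts[from].balance -= amount; // first single-document change
    tx.debited = true;
    if (failAfterDebit) throw new Error("simulated crash between the two updates");
    accounts[to].balance += amount; // second single-document change
    tx.credited = true;
    tx.state = "done";
  } catch (err) {
    // Manual rollback: undo only the steps the record says were applied.
    if (tx.credited) accounts[to].balance -= amount;
    if (tx.debited) accounts[from].balance += amount;
    tx.state = "canceled";
  }
  return tx;
}

const ok = transfer("A", "B", 30);
const failed = transfer("A", "B", 30, true);
console.log(ok.state, failed.state, accounts.A.balance, accounts.B.balance);
```

The second call fails after the debit, and the recorded state is what allows the debit to be reversed. A relational database does this bookkeeping for you inside a transaction.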

Data model

The most important decision you have to make when you decide to use MongoDB concerns your data model. With relational databases you usually apply the first, second and third normal forms to get a strong, safe data model, at the price of many constraints that can degrade performance.

If you choose a non-relational database, you are probably not looking for those constraints but for better performance.
Let me show you an example of an implementation of User and Address entities.

Example of a relational database implementation:

In this example we created three entities and followed the basic principles of normalization. If we need to change the name of a City, we only have to apply the change in the City table, which is pretty simple.
This is a very good implementation when there are many updates, but complex queries that join several tables can be expensive.

Example of a MongoDB database implementation:

In a non-relational implementation we don't need to execute many joins to fetch the entire entry. As you can see, the Address entity is embedded in the User entity, and the City entity is in turn embedded in the Address entity.

We deliberately don't follow the normalization rules, in order to increase performance. With the NoSQL model above, inserting and fetching entries is faster, but some updates are not. We may run into trouble if we need to change the name of a City: in that case we have to update every document that contains that city name. But how often do the names of cities or streets change?
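The trade-off between the two models can be sketched with plain JavaScript objects. This is a minimal illustration under assumptions: the field names and sample data are made up, and arrays stand in for tables and collections. No database is involved.

```javascript
// Normalized shape: three "tables" linked by ids (all data is made up).
const cities = [{ id: 1, name: "Utrecht" }];
const addresses = [{ id: 10, street: "Main Street 1", cityId: 1 }];
const users = [
  { id: 100, name: "Ann", addressId: 10 },
  { id: 101, name: "Bob", addressId: 10 },
];

// Reading a full user needs two lookups, the in-memory analogue of two joins.
function loadUserNormalized(userId) {
  const user = users.find((u) => u.id === userId);
  const address = addresses.find((a) => a.id === user.addressId);
  const city = cities.find((c) => c.id === address.cityId);
  return { name: user.name, address: { street: address.street, city: { name: city.name } } };
}

// Embedded shape: each user document carries its own address and city.
const userDocuments = users.map((u) => ({ _id: u.id, ...loadUserNormalized(u.id) }));

// Renaming a city touches one row in the normalized shape...
function renameCityNormalized(cityId, newName) {
  cities.find((c) => c.id === cityId).name = newName;
  return 1;
}

// ...but every matching document in the embedded shape.
function renameCityEmbedded(oldName, newName) {
  let modified = 0;
  for (const doc of userDocuments) {
    if (doc.address.city.name === oldName) {
      doc.address.city.name = newName;
      modified += 1;
    }
  }
  return modified;
}

const embeddedUpdates = renameCityEmbedded("Utrecht", "Utrecht City");
const normalizedUpdates = renameCityNormalized(1, "Utrecht City");
console.log(normalizedUpdates, embeddedUpdates);
```

Reading an embedded document is a single lookup, while the rename has to visit every document that contains the city. That is the price paid for the faster reads.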

These are the kinds of questions you have to think about before you start modelling your database schema.

Schemaless

MongoDB stores data in documents, which makes it very flexible in adapting to different structures. Every document has its own structure, and you don't need to create a table before inserting data. Normally you just execute the insert operation and the database creates the new structure for you.
If you execute this command for the first time:

    db.restaurant.insertOne({
        "name": "Lumpia sauce restaurant",
        "category": "restaurants",
        "type": "Philippine"
    });

It will create the collection 'restaurant' and insert the document automatically, without executing any DDL script.
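The behaviour can be imitated in a few lines of plain JavaScript. This is a sketch under assumptions, not the MongoDB API: the `db` map and this `insertOne` function are hypothetical stand-ins, and the second document is made up to show that documents in one collection may have different fields.

```javascript
// A hypothetical in-memory store: collections appear on first insert.
const db = new Map();

function insertOne(collectionName, document) {
  if (!db.has(collectionName)) db.set(collectionName, []); // implicit creation
  db.get(collectionName).push({ ...document });
  return db.get(collectionName).length;
}

insertOne("restaurant", {
  name: "Lumpia sauce restaurant",
  category: "restaurants",
  type: "Philippine",
});
// A second document with a different set of fields needs no schema change.
insertOne("restaurant", { name: "Second place", terrace: true });

const fieldSets = db.get("restaurant").map((doc) => Object.keys(doc).sort().join(","));
console.log(fieldSets);
```

The flip side is that code reading the collection has to cope with fields that exist in some documents and not in others.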

Replica Set

A replica set is a group of MongoDB instances that together provide high availability, redundancy and automatic failover. One member is the primary node and the others are secondary nodes. By default your application reads from and writes to the primary only, and the operations are applied to the secondary nodes asynchronously. When the primary stops communicating with the other members, the secondaries hold an election to choose a new primary so that normal operations can continue. This is the automatic failover mechanism.

The diagram above (taken from the official documentation) gives you a better understanding of the structure of a MongoDB replica set.
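An election only succeeds when a majority of the voting members can take part, which is why replica sets usually have an odd number of members. The arithmetic can be sketched as follows; the function names are hypothetical and the code only does the counting, it does not talk to a replica set.

```javascript
// Votes needed to elect a primary: a strict majority of voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many members can be unreachable while an election can still succeed.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

// Hypothetical helper: can the reachable members elect a primary?
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers >= majority(votingMembers);
}

for (const n of [3, 4, 5]) {
  console.log(n, majority(n), faultTolerance(n));
}
```

Going from three to four members does not buy extra fault tolerance: both survive the loss of only one member.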

Sharding

You can think of a shard as a set of instances that holds a subset of the data. The data is split into chunks based on a key that you choose, and the chunks are distributed over multiple shards. Each shard can itself be a replica set. In this kind of configuration you have one or more router servers, one or more config servers, and the shards. Your application talks to the router nodes to execute operations; the router consults the config servers to find out where the data lives, and then contacts the shards to execute the operations.
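The division of labour between router, config servers and shards can be sketched in plain JavaScript. Everything here is an assumption made for illustration: the `chunks` table plays the role of the config server metadata, `lastName` is a made-up shard key, and the arrays stand in for the shards. Real chunk ranges and routing are managed by MongoDB itself.

```javascript
// Hypothetical config-server metadata: which shard owns which key range.
// Ranges are [min, max) on the chosen shard key (here: a lower-cased last name).
const chunks = [
  { min: "a", max: "i", shard: "shard0" },
  { min: "i", max: "r", shard: "shard1" },
  { min: "r", max: "{", shard: "shard2" }, // "{" sorts right after "z"
];

// One array per shard stands in for the instances holding a subset of the data.
const shards = { shard0: [], shard1: [], shard2: [] };

// The router asks the metadata where a key lives...
function locateShard(shardKeyValue) {
  const key = shardKeyValue.toLowerCase();
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  if (!chunk) throw new Error(`no chunk covers key ${shardKeyValue}`);
  return chunk.shard;
}

// ...and then forwards the operation to that shard.
function routedInsert(document) {
  const shard = locateShard(document.lastName);
  shards[shard].push(document);
  return shard;
}

const placements = ["Adams", "Jones", "Smith"].map((lastName) => routedInsert({ lastName }));
console.log(placements);
```

The sketch also hints at why the key matters: if most last names started with the same letter, one shard would receive nearly all of the data.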

Aggregation

MongoDB offers the following ways to perform aggregation. This is especially useful when the data is sharded across multiple running instances (maybe hundreds or thousands):

  • Aggregation pipeline
  • Map-reduce function
  • Single purpose aggregation methods

With the first two you can perform aggregations on sharded collections; only the single purpose aggregation methods cannot do this.
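The idea behind the aggregation pipeline is that documents flow through a sequence of stages, each of which transforms the output of the previous one. The sketch below imitates three stages in plain JavaScript, loosely modelled on the $match, $group and $sort stages. The helper functions and the `orders` data are hypothetical, and this is not how MongoDB implements the pipeline.

```javascript
// Made-up documents standing in for a collection.
const orders = [
  { customer: "ann", status: "paid", total: 20 },
  { customer: "bob", status: "paid", total: 15 },
  { customer: "ann", status: "open", total: 99 },
  { customer: "ann", status: "paid", total: 5 },
];

// Each stage takes an array of documents and returns a new array.
const match = (predicate) => (docs) => docs.filter(predicate);
const groupSum = (keyField, sumField) => (docs) => {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc[keyField], (totals.get(doc[keyField]) || 0) + doc[sumField]);
  }
  return [...totals].map(([_id, total]) => ({ _id, total }));
};
const sortBy = (field, direction) => (docs) =>
  [...docs].sort((a, b) => direction * (a[field] - b[field]));

// Runs the stages in order, feeding each stage's output to the next.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

const result = runPipeline(orders, [
  match((doc) => doc.status === "paid"),
  groupSum("customer", "total"),
  sortBy("total", -1),
]);
console.log(result);
```

In the shell, the equivalent would be an array of stage documents passed to the `aggregate` method of a collection.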

Graphical user interfaces

You can find various GUIs to use with your MongoDB instance. Some of them are free, but you can also find paid versions if you need more features.

Conclusion

MongoDB in short:

  • A free and open source database
  • High availability
  • Redundancy
  • An aggregation framework
  • Sharding
  • A possibility to map your Java entities to documents

Apart from that, MongoDB is simple to install and to use. Its most common features can help you solve the typical critical problems of your daily routine. You can change your schemas dynamically without writing DDL scripts. You can adapt your application to the market really fast, and because the basic maintenance tasks are so simple you often don't need DBAs to keep your instances running, which saves a lot of money.

Example

Suppose you are going to build a content management system (CMS) in which users can create pages dynamically: inserting and deleting fields, creating rules, adding validation, and so on. You have to change the schema dynamically for each page that is created or updated.

How can you do that without executing any DDL script? MongoDB fits perfectly in this case.

What's next?

If you need a database that is mature, cheap, reliable, scalable, very flexible and easy to maintain, MongoDB may be the right tool for you. The same goes for applications that have to handle large amounts of data, or that need to adapt fast and can trade some transactional control for performance. All of this holds provided you know what you are doing, because some features can be tricky. The document-oriented structure is both a very powerful and a dangerous feature, because you can lose control if the structure changes many times without those changes being tracked properly.

Sharding needs to be set up correctly to avoid problems in the future, for example when one of the keys needs to be changed or when a server needs to be migrated.
You can find a comparison between SQL and MongoDB commands on this page.

If you are planning to use MongoDB with Java, you should have a look at the Morphia framework. Morphia is an open source persistence framework, written on top of the MongoDB Java driver and maintained by the MongoDB company. With this framework you can easily use annotations to map your Java entities to MongoDB documents and vice versa, and you can also add support for custom types.

You can read more about this framework by following this link.

About the author Ivan Sotelo Codo

Comments (3)
  1. at 10:10

    Nice article. Another example in which a document database can be a good choice is when you need to store logging or business intelligence reporting data. This is because the fields that you want to store vary a lot over time.

    1. at 23:11

      Bram,
      Logs should be stored in (indexed) search-analytical databases for end-user use, or in Kafka for streaming use. Logs can grow too fast for MongoDB to keep up, because it doesn't provide enough features to handle logs. Logs for archival can go to archive systems or object storage.

      I have talked to customers acknowledging that storing BI reporting data in a document store (like MongoDB) was the worst choice. So be careful. Proper xOLAP tools just run queries that document databases can't handle across the documents stored, when they want to analyse trends or multi-dimensional analytical calculations. Solution: they choose to export the data, or first develop some code to get the data in the right (fixed) format first.
      Only fixed format reporting (published or PDF reporting) would fit, but the industry standard here is XML or things like XBRL, but it's a niche and we have tools/platforms for that nowadays.

      It can all be very simple: combine relational and document data in a single database. There are many polyglot databases nowadays (MarkLogic, Oracle). I would never consider another database storage architecture when you already know of multiple use cases, like the ones you mention in your comment.

    2. at 13:01

      I agree with Hasso regarding storing logs in MongoDB, because we already have better open source options on the market.

      However, I don't think the use of multiple databases in one application could be a problem nowadays and we also need to keep in mind that both databases mentioned are paid and MongoDB is open source and free. For sure it always depends on how much the company is able to spend or if they already are using these databases, etc.

      Thanks for the comments!
