How to create a new collection in MongoDB by joining two collections
If you already use MongoDB, you probably know everything covered here, so I recommend you skip this blog post. But if you are just starting with MongoDB, see it as a gentle introduction.
Before starting
This blog post assumes you already have a MongoDB account. If you don't have one yet, you can create one for free, and MongoDB offers a free tier you can set up while creating your account.
You'll be using the sample data from MongoDB to learn how to do an aggregation and create a new collection.
Getting acquainted with the data
In MongoDB, a dataset is called a collection; think of it as a special spreadsheet. Whereas the records in a spreadsheet are rows, in MongoDB a record is called a document. You'll use the `sample_mflix` sample database, which contains collections for movies, comments, users, etc.
Make a basic filter
Your objective is to create a collection containing all the comments each movie received in August 2002.
Start by going into the project's database. My project is called "Project 0" and has the "Example" cluster, which holds three databases and 15 collections.
Each project has a set of tabs, one of which is the "Collections" tab, where you can see the databases and the documents in each collection, as shown in the image above. From within the Collections tab, you can see the data and make simple queries if you want.
{date: {$gte:ISODate('2002-08-18')}}
For example, the query above shows every document with a date value on or after August 18th, 2002.
Using queries like that is great to see the preliminary results of the collection you are trying to build. But to actually create the collection, we need to use aggregations.
Constructing the aggregation
Aggregations are one way you can build collections inside MongoDB, and there's a tab for that, as shown in the image below.
To see the Aggregation tab, you need to click on the collection you want to investigate or use as the basis for your new collection. On the Aggregation tab, you'll see the pipeline builder. A pipeline is a set of steps; each step is called a stage; each stage does one thing and one thing only. You will build three stages:
- Filter comments based on a date;
- Add the movie's details into each comment;
- Output the results into a brand new collection.
Filter comments
To avoid dealing with unnecessary data, that is, data outside your date window of interest, the first thing you'll want to do is find the comments in that particular period (August 2002).
On the drop-down menu for the first stage, select the `$match` operator; this operator will allow you to filter the records based on the date. Note that once you select the operator, MongoDB will auto-populate the editable field with the standard structure for that operator.
Now you need to add the following code in the query section there:
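As a rough sketch of what goes in that stage, assuming a half-open date range for August 2002, the same filter can be written as a plain JavaScript object. Here `new Date(...)` stands in for the `ISODate(...)` helper so the snippet also runs outside MongoDB, and `isInAugust2002` is a hypothetical helper added only to check the bounds. In the stage editor you would paste just the object under `$match`, below the comment lines that are pre-filled, so the line numbers in the breakdown do not line up with this sketch.

```javascript
// Assumption: new Date(...) replaces ISODate(...) so this also runs in plain Node.js.
const start = new Date("2002-08-01T00:00:00Z"); // inclusive lower bound
const end = new Date("2002-09-01T00:00:00Z"); // exclusive upper bound

// In the stage editor, only the object under $match goes into the text box.
const matchStage = {
  $match: {
    date: { $gte: start, $lt: end },
  },
};

// Hypothetical helper: the same condition in plain JavaScript, to check the bounds.
const isInAugust2002 = (comment) => comment.date >= start && comment.date < end;
```

The upper bound is September 1st with `$lt` rather than August 31st, because `ISODate("2002-08-31")` means midnight at the start of the 31st and would leave out comments posted during that day.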
Once that is done, you'll see sample results start to appear.
Now let's break down each step of that operation, shall we?
- Lines 1-4: Just comments, mainly to explain what the operator is. Note that MQL stands for MongoDB Query Language;
- Line 6 (`date:`): the field you want to filter on;
- Line 7 (`$gte: ISODate("2002-08-01")`): this line uses the `$gte` operator to keep every date on or after August 1st, 2002, given in ISODate format;
- Line 8 (`$lt: ISODate("2002-09-01")`): this line uses the `$lt` operator to keep every date before September 1st, 2002. An upper bound of `ISODate("2002-08-31")` would leave out the comments posted on August 31st.
The result of this stage is every document dated from August 1st through August 31st, 2002.
Adding movie information to each comment
You'll notice that the comments don't have the movie information, but the movie identification number (ID) is present, as shown in the image below.
Including the movie information in the filtered comments you just got from the first stage is the second step of our aggregation. Click the "Add stage" button below the first stage to start working on adding the movie information to the collection using the `$lookup` operator.
The `$lookup` operator does a "left outer join". Think of it as a "filtered join": for each document in the collection on the "left" (comments), it selects the documents on the "right" (movies) that match on a given field, in this case the movie ID. This way, you keep every comment from the time window, and movies without comments never show up in the result.
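To make the left outer join concrete, here is a small in-memory sketch in plain JavaScript. It is not how MongoDB implements `$lookup`; the sample documents are hypothetical, and only the field names (`movie_id`, `_id`, `movie_info`) follow this post.

```javascript
// Hypothetical sample documents; only the field names follow the post.
const comments = [
  { _id: "c1", movie_id: "m1", text: "Great movie" },
  { _id: "c2", movie_id: "m9", text: "Refers to a movie that is not in the list" },
];
const movies = [
  { _id: "m1", title: "Movie One" },
  { _id: "m2", title: "Movie Two" }, // has no comments, so it never appears in the output
];

// Left outer join: every comment is kept, and its matching movies are attached as an array.
const joined = comments.map((comment) => ({
  ...comment,
  movie_info: movies.filter((movie) => movie._id === comment.movie_id),
}));
```

Each comment ends up with a `movie_info` array: one element when a movie matches, and an empty array when nothing matches.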
On the drop-down menu for the second stage, select the `$lookup` operator, as shown in the image above. Once again, MongoDB will pre-fill the operator sample code, and you can update the code section with the code below.
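A minimal sketch of that stage, assembled from the four fields described in this post, could look like the object below. The variable name `lookupStage` is only for illustration; in the stage editor you would fill in just the object under `$lookup`, below the pre-filled comment lines.

```javascript
// The four fields of the $lookup stage, using the values from this post.
const lookupStage = {
  $lookup: {
    from: "movies", // the "right" side of the join
    localField: "movie_id", // field in the current collection (comments)
    foreignField: "_id", // matching field in the movies collection
    as: "movie_info", // new array field that receives the matched movies
  },
};
```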
Once more, let's break down each step of that operation:
- Lines 1-9: Just comments, mainly to explain what the operator is;
- Line 11 (`from: "movies",`): the collection you are running the pipeline from (comments) is the left side of the join, so this is where you define the "right" side. Here you choose the "movies" collection to add the movie information to the comments;
- Line 12 (`localField: "movie_id",`): the field in the current collection (comments) that corresponds to a field in the other collection (movies);
- Line 13 (`foreignField: "_id",`): the matching field in the other collection;
- Line 14 (`as: "movie_info"`): the name of the new field that receives the information; note that the lookup adds it as an array.
And you should see the results show up like in the image below.
All the data manipulation is done. Time to save the results and actually create your new collection.
Save the results
Once again, click the "Add stage" button to create a stage that outputs the result into a new collection, and on the drop-down menu for the third stage, select the `$out` operator.
Now replace the contents of the code field in the `$out` stage with the code below.
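In the stage editor, the `$out` stage needs only the quoted name of the new collection. As a hedged sketch of how the three stages fit together outside the UI, the snippet below assembles them into one pipeline. It assumes `db` is a database handle from the official Node.js driver pointing at `sample_mflix`; `createAugustCollection` is a hypothetical helper name, and `new Date(...)` stands in for `ISODate(...)`.

```javascript
// Third stage: write the results to a new collection.
const outStage = { $out: "augustmoviescomments" };

const pipeline = [
  // Stage 1: keep the comments from August 2002 (half-open range).
  { $match: { date: { $gte: new Date("2002-08-01T00:00:00Z"), $lt: new Date("2002-09-01T00:00:00Z") } } },
  // Stage 2: attach the matching movie documents to each comment.
  { $lookup: { from: "movies", localField: "movie_id", foreignField: "_id", as: "movie_info" } },
  // Stage 3: $out has to be the last stage of the pipeline.
  outStage,
];

// Assumption: `db` is a Node.js driver database handle for sample_mflix.
async function createAugustCollection(db) {
  // Iterating the cursor is what runs the pipeline; the returned documents are not needed here.
  await db.collection("comments").aggregate(pipeline).toArray();
}
```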
Let's break down this part.
- Lines 1-4: Just comments, mainly to explain what the operator is;
- Line 5 (`'augustmoviescomments'`): the name of the new collection.
After filling out the new collection's name, click the "Save documents" button.
Once the collection is created (it might take a few seconds), you'll see a message stating that documents were persisted in the collection and a "Go to collection" link that you should click.
Clicking the "Go to collection" link will open a new tab. If you do not want to open another tab, refresh the page, and you'll see the `augustmoviescomments` collection show up on the left-hand side menu.
Now that your collection is ready, you can see the collection analytics data, such as how many activities were logged and how much storage space the collection takes up.
Recap
Now you know how to create collections by aggregating two other collections using pipelines. You also know:
- Datasets in MongoDB are called collections;
- One collection is a set of documents;
- To create collections by joining other collections, you can use pipelines;
- Pipelines are a set of steps called stages;
- Each stage does only one thing.
Now you can explore what else to use pipelines for.