CRUD Minus R on MongoDB

20,000 American flags, one for every Massachusetts resident who died in a war over the past 100 years, on display at Boston Common, Boston, MA, Memorial Day 2011

It is a beautiful day in the Twin Cities of Minneapolis and St. Paul, part of the Memorial Day 2018 weekend. The forecast calls for very warm days; on Monday the temperature will reach 95F. My wife and I decided to go for a walk first thing in the morning. Today, Saturday, we got up at our usual time (05:30 AM CDT), had breakfast (yogurt, granola, blueberries, strawberries, bananas and a cup of milk with a shot of espresso) and headed out. As the walk progressed, we ran into several people getting their daily walk out of the way in order to avoid the high temperatures. Back home I showered and sat in front of my computer. I will be working for the next four hours or so. This afternoon my wife and I will attempt a walk around one of the lakes in Minneapolis. It will be very hot, but we are planning on a stroll and a stop for ice cream and water.

As you know I like to read and take technical courses. In order to fully understand the subject I like to experiment with the material. I am currently reading chapter four of “MongoDB – The Definitive Guide” by Kristina Chodorow so this post will touch on my experimentation with material found in the book. If you are interested in learning MongoDB I strongly recommend the book. I purchased my copy via Amazon Prime.

The title of this post might be somewhat misleading. MongoDB is a NoSQL document database and it supports CRUD (Create, Read, Update and Delete) operations. Reading (finding records) is a more extensive topic so the author of the book decided to dedicate chapter four to reading. I will cover that chapter in my next post.

As you know there are different ways to access a MongoDB database. You could take the simplest path and use Compass which provides a nice graphical user interface. It is fine for simple operations, but if you really want to understand and experiment with database commands the MongoDB console is the way to go. In a future post I will cover a third way which is using the MongoDB API from different programming languages. I will do so using Java.

Let’s start by opening a MongoDB shell from a command prompt on a Windows computer:

C:\>mongo
MongoDB shell version v3.6.4
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.6.4
Server has startup warnings:
:::: :::: ::::
>

I will be using the “:::: :::: ::::” string to indicate that some non-relevant text has been omitted.

Let’s create a very simple DICOM (Digital Imaging and Communications in Medicine) database which we will use as a theme in this post. DICOM specifies the following levels: patient, study, series and image. A patient object represents an individual. A study represents a set of one or more series. For example, a referring physician might request a CT study with and without contrast. That implies that a study for a patient will include two series; one with contrast and one without contrast. The image level includes one entry per image in each series. A CT scan, depending on the part of the body, may contain from a few dozen to several hundred images.
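To make the hierarchy concrete, here is a minimal sketch of how documents at the four levels could reference each other. The field names (patientID, studyID, seriesID, imageID and so on) are illustrative choices for this post, not DICOM standard attribute names:

// **** hypothetical linked documents, one per DICOM level ****
{ patientID: "P0001", name: "John Canessa" }                      // patient
{ studyID: 12345678, patientID: "P0001", modality: "CT" }         // study
{ seriesID: 11111111, studyID: 12345678, contrast: false }        // series
{ imageID: 22222222, seriesID: 11111111, instanceNumber: 1 }      // image

Each level points back to its parent, so given an image we can walk up to its series, study and patient. With that picture in mind, let's create the database: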

> show dbs
admin   0.000GB
config  0.000GB
local   0.000GB
towns   0.000GB
> use dicom
switched to db dicom
> show dbs
admin   0.000GB
config  0.000GB
local   0.000GB
towns   0.000GB
>

As you can see, we started using the “dicom” database but it has not been created yet. It will be created when we add a document / record to the “patient” collection as illustrated by:

> db.patient.insert( { name : "John Canessa" } )
WriteResult({ "nInserted" : 1 })
> db.patient.find()
{ "_id" : ObjectId("5b096f6b8a7d4656190e1795"), "name" : "John Canessa" }
>

Please note this is an oversimplification. A patient record requires dozens of fields with enough granularity to avoid confusing the images from different patients. For example, if the only reference for the patient were the name, a look at the phone book for a single large city might turn up dozens of people with the same name (e.g., “Jane Smith” or “Paul Nelson”). We are not going to get into the details that would need to be addressed if we were developing an actual DICOM archive to be used by healthcare facilities.
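Just to give a flavor of what would be needed to disambiguate patients, here is a sketch of the shape a more realistic patient document could take. The field names and values are made up for illustration; an actual archive would follow the DICOM patient module:

// **** illustrative patient document (field names and values are made up) ****
{
    patientID:   "MRN-00012345",            // medical record number issued by the facility
    firstName:   "Jane",
    lastName:    "Smith",
    dateOfBirth: "1970-03-15",
    sex:         "F",
    issuer:      "Some Healthcare Facility"
}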

Let’s now add a record each for a study, series and image:

> db.study.insert( { "studyID" : 12345678 } )
WriteResult({ "nInserted" : 1 })
> db.series.insert( { "seriesID" : 11111111 } )
WriteResult({ "nInserted" : 1 })
> db.image.insert( { "imageID" : 22222222 } )
WriteResult({ "nInserted" : 1 })
> 

Now let’s take a look at the database and associated collections:

> show dbs
admin   0.000GB
config  0.000GB
dicom   0.000GB
local   0.000GB
towns   0.000GB
> use dicom
switched to db dicom
> show collections
image
patient
series
study
>

The dicom database has been created. In addition, the necessary collections have also been created. Note that we did not use a createDB() or createCollection() command. As soon as we added a document to the patient collection, the dicom database and the patient collection were created. Likewise, as soon as we inserted a document into the study, series and image collections, those collections were created. This approach is different from the one used by SQL databases, in which one needs to explicitly create databases and tables with their associated schemas.
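For completeness, MongoDB does provide an explicit db.createCollection() helper. It is normally only needed when you want to pass options at creation time, for example a capped collection. The collection name and size below are just placeholders:

> db.createCollection("auditLog", { capped: true, size: 1048576 })
{ "ok" : 1 }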

If we have a set of images generated by a CT (Computed Tomography) machine, we might wish to insert them using some type of batch. The following screen capture illustrates how this could be accomplished:

> db.image.insert( [ { imageID: 00000001 }, { imageID: 00000002 }, { imageID: 00000003 }, { imageID: 00000004}, { imageID: 00000005} ] )
BulkWriteResult({
        "writeErrors" : [ ],
        "writeConcernErrors" : [ ],
        "nInserted" : 5,
        "nUpserted" : 0,
        "nMatched" : 0,
        "nModified" : 0,
        "nRemoved" : 0,
        "upserted" : [ ]
})
>

We could take a look at the documents / records in the image collection as follows:

> db.image.find()
{ "_id" : ObjectId("5b0970918a7d4656190e1798"), "imageID" : 22222222 }
{ "_id" : ObjectId("5b097c6e8a7d4656190e1799"), "imageID" : 1 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179a"), "imageID" : 2 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179b"), "imageID" : 3 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179c"), "imageID" : 4 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179d"), "imageID" : 5 }
>

The results indicate that we have successfully inserted five more images into our collection. Note that the values are stored as 1 through 5; the mongo shell treats entries like 00000001 as numeric literals, so the leading zeros are not preserved. If we wish to get the count we could use:

> db.image.find().count()
6

Note that each document in the image collection has a unique _id field of type ObjectId which was generated by MongoDB. This value is what makes each document unique.
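The ObjectId is not a random number; its first four bytes encode the time the value was generated, and the remaining bytes include machine / process identifiers and a counter, which keeps the values unique even when many documents are created in the same second. In the shell we can ask an ObjectId for its timestamp; for the document with imageID 1 above it returns something along these lines:

> ObjectId("5b097c6e8a7d4656190e1799").getTimestamp()
ISODate("2018-05-26T15:25:34Z")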

We can remove all or some documents from the image collection. The one with imageID = 22222222 does not seem to be correct, so let’s remove it:

> db.image.remove( { imageID: 22222222 } )
WriteResult({ "nRemoved" : 1 })
> db.image.find()
{ "_id" : ObjectId("5b097c6e8a7d4656190e1799"), "imageID" : 1 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179a"), "imageID" : 2 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179b"), "imageID" : 3 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179c"), "imageID" : 4 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179d"), "imageID" : 5 }
>
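As a side note, recent versions of the shell also provide deleteOne() and deleteMany() helpers which make the intent explicit. A quick sketch, not run against the collection above:

// **** equivalent removals using the newer helpers ****
db.image.deleteOne( { imageID: 22222222 } )     // removes at most one matching document
db.image.deleteMany( { imageID: 22222222 } )    // removes every matching document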

When we need to empty a collection with many documents we could remove the documents or, for performance reasons, we could drop the collection and then recreate it. Let’s take a gander. We could enter the JavaScript code directly into the mongo console and it would work, or we could write a script and then load it. The advantage of the second approach is that we can easily modify the script and run it at different times. Because I am lazy, we will write a script to populate the test collection with some test data. This is illustrated in the following script:

// **** populate the test collection with sample documents ****
var populateCollection = function() {
    
    // **** number of documents to insert ****
    var numberToInsert = 100000;
    
    // **** get the start time ****
    var start = (new Date()).getTime();
    
    // **** populate the collection ****
    for (var i = 0; i < numberToInsert; i++) {
        db.test.insert( { "i": i, "j": i - 100 } );
    }
    
    // **** get and display the elapsed time ****
    var end = (new Date()).getTime();
    print("<<< elapsed: " + (end - start) + " ms");
    
    // **** get and display the count of documents ****
    var count = db.test.find().count();
    print("<<< count: " + count);
}

// **** populate the collection ****
print("<<< before populateCollection ...");
populateCollection();
print("<<< after populateCollection !!!");

The following screen capture illustrates the previous script executing:

> load("populate_collection.js")
<<< before populateCollection ...
<<< elapsed: 77532 ms
<<< count: 100000
<<< after populateCollection !!! true >

The operation took about 77 seconds to complete.
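Most of that time is spent issuing 100,000 individual insert() calls. A variant I have not timed here, but which should reduce the number of round trips considerably, is to group the documents and use insertMany(). The following is just a sketch; the batch size of 1,000 is an arbitrary choice:

// **** populate the test collection using batches of documents ****
var populateCollectionBatched = function(numberToInsert, batchSize) {
    
    // **** get the start time ****
    var start = (new Date()).getTime();
    
    // **** build and insert the documents in batches ****
    var batch = [];
    for (var i = 0; i < numberToInsert; i++) {
        batch.push( { "i": i, "j": i - 100 } );
        if (batch.length === batchSize) {
            db.test.insertMany(batch);
            batch = [];
        }
    }
    
    // **** insert any leftover documents ****
    if (batch.length > 0) {
        db.test.insertMany(batch);
    }
    
    // **** get and display the elapsed time ****
    var end = (new Date()).getTime();
    print("<<< elapsed: " + (end - start) + " ms");
}

// **** populate the collection in batches ****
populateCollectionBatched(100000, 1000);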

Now let’s remove all documents from the test collection using the following script:

// **** remove all documents from the test collection ****
var removeCollection = function() {
    
    // **** get and display the document count before the remove ****
    var count = db.test.find().count();
    print("<<< count: " + count);
    
    // **** get the start time ****
    var start = (new Date()).getTime();
    
    // **** remove all documents from the collection ****
    db.test.remove( {} );
    
    // **** issue a query to make sure the remove has completed ****
    db.test.findOne();
    
    // **** get the end time and display the elapsed time ****
    var end = (new Date()).getTime();
    print("<<< elapsed: " + (end - start) + " ms");
    
    // **** get and display the document count after the remove ****
    count = db.test.find().count();
    print("<<< count: " + count);   
}

// **** remove all documents from the test collection ****
print("<<< before removeCollection ...");
removeCollection();
print("<<< after removeCollection !!!");

Following is the run for the script:

> load("remove_collection.js")
<<< before removeCollection ...
<<< count: 100000
<<< elapsed: 4312 ms
<<< count: 0
<<< after removeCollection !!! true >

The remove operation took about 4 seconds. That was not so bad. What if we had 1,000,000 documents?

> load("populate_collection.js")
<<< before populateCollection ...
<<< elapsed: 779492 ms
<<< count: 1000000
<<< after populateCollection !!! true >

Now let’s remove them:

> load("remove_collection.js")
<<< before removeCollection ...
<<< count: 1000000
<<< elapsed: 34646 ms
<<< count: 0
<<< after removeCollection !!! true >

The 4 seconds went up to about 34 seconds. By dropping the collection and then recreating it, one could save valuable time if emptying a collection is a common operation in a particular workflow. A similar approach (dropping and recreating tables) is also used with SQL databases.
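A sketch of the drop-and-recreate approach follows (not run here). Keep in mind that dropping a collection also drops its indexes, so any indexes would have to be rebuilt afterwards:

// **** empty the test collection by dropping it ****
var dropAndRecreate = function() {
    
    // **** get the start time ****
    var start = (new Date()).getTime();
    
    // **** drop the entire collection (documents and indexes) ****
    db.test.drop();
    
    // **** the collection is recreated by the next insert ****
    db.test.insert( { "i": 0, "j": -100 } );
    
    // **** get and display the elapsed time ****
    var end = (new Date()).getTime();
    print("<<< elapsed: " + (end - start) + " ms");
}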

Document replacement is an operation that can be used to change the schema of the documents in a collection. If you are familiar with SQL databases you will really appreciate this feature. The key advantage is that the change can be performed while the database is operational; the software just needs to be able to handle both the old and the new schemas. So let’s see how a document can be replaced, given that new documents can immediately be stored using a different schema.

Before we proceed, let’s drop the test collection:

> show collections
image
patient
series
study
test
> db.test.drop()
true
> show collections
image
patient
series
study
>

Now let’s insert a new document for a new patient:

> db.patient.insert( { firstName: "James", lastName: "Bond" } )
WriteResult({ "nInserted" : 1 })
> db.patient.find()
{ "_id" : ObjectId("5b096f6b8a7d4656190e1795"), "name" : "John Canessa" }
{ "_id" : ObjectId("5b0acf29938a76240a7e2b0d"), "firstName" : "James", "lastName" : "Bond" }
>

Behind the scenes I also updated my own record to use the first and last name schema and inserted two more patients using the original single name schema. Let’s take a look at the updated patient collection:

> db.patient.find()
{ "_id" : ObjectId("5b096f6b8a7d4656190e1795"), "firstName" : "John", "lastName" : "Canessa" }
{ "_id" : ObjectId("5b0acf29938a76240a7e2b0d"), "firstName" : "James", "lastName" : "Bond" }
{ "_id" : ObjectId("5b0c17ed94fd61e1ea91daec"), "name" : "Auric Goldfinger" }
{ "_id" : ObjectId("5b0c180c94fd61e1ea91daed"), "name" : "Emilio Largo" }
>

We have two documents with first and last names and two with just a name. We would like to update the documents so the entire collection uses the first and last name schema. To get this accomplished we will use the update_patient.js script:

> load("update_patient.js")
<<< before updatePatients ...
<<< firstName: John
<<<  lastName: Canessa

<<< firstName: James
<<<  lastName: Bond

<<<      name: "Auric Goldfinger"

<<<      name: "Emilio Largo"

<<< docCount: 1
<<< docCount: 2
<<< docCount: 3
<<< update myName: "Auric Goldfinger"
<<< myId: 5b0c17ed94fd61e1ea91daec
<<< names[0]: Auric names[1]: Goldfinger
<<< docCount: 4
<<< update myName: "Emilio Largo"
<<< myId: 5b0c180c94fd61e1ea91daed
<<< names[0]: Emilio names[1]: Largo

<<< firstName: John
<<<  lastName: Canessa

<<< firstName: James
<<<  lastName: Bond

<<< firstName: Auric
<<<  lastName: Goldfinger

<<< firstName: Emilio
<<<  lastName: Largo

<<< after updatePatients !!! true >

Let’s look at the contents of the script:

// **** print all documents in the patient collection ****
var printPatients = function() {

    // **** get a cursor to traverse the documents ****
    var myCursor = db.patient.find();
  
    // **** loop processing all the documents in the collection ****
    while (true) {
        
        // **** check if we are done with all the documents in the collection ****
        var myDocument = myCursor.hasNext() ? myCursor.next() : null;
        if (myDocument === null) {
            break;
        }
        
        // **** display the data in the document ****
        var myName = myDocument.name;
        if (myName === undefined) {
            
            // **** get the first and last name from the document ****
            var firstName = myDocument.firstName;
            var lastName = myDocument.lastName;
            
            // **** display the first and last name ****
            print("<<< firstName: " + firstName);
            print("<<<  lastName: " + lastName);
        } else {
            print("<<<      name: " + tojson(myName));
        }
        print();
    }
}

// **** update documents ****
var updateDocuments = function() {
    
    var docCount = 0;
    
    // **** get a cursor to traverse the documents ****
    var myCursor = db.patient.find();
    
     // **** loop processing all the documents in the collection ****
    while (true) {
        
        // **** check if we are done with all the documents in the collection ****
        var myDocument = myCursor.hasNext() ? myCursor.next() : null;
        if (myDocument === null) {
            break;
        }
        
        // **** increment and display the count of documents ****
        docCount++;
        print("<<< docCount: " + docCount);
        
        // **** update the document (if needed) ****
        var myFirstName = myDocument.firstName;
        if (myFirstName === undefined) {
 
            // **** get the name from the document ****
            var myName = myDocument.name;
            print("<<< update myName: " + tojson(myName));
            
            // **** get the _id from the document ****
            var myId = myDocument._id;
            print("<<< myId: " + myId);
 
            // **** split the name ****
            var names = myName.split(" ");
            print("<<< names[0]: " + names[0] + " names[1]: " + names[1]);
            
            // **** update the first and last name in the document ****
            db.patient.update(  { _id: myId }, 
                                { firstName: names[0], lastName: names[1] },
                                { upsert: true } );
        }
    }
}

// **** print, update and print the patient documents ****
var updatePatients = function() {
   
    // **** display before ****
    printPatients();
    
    // **** update documents that use the old schema ****
    updateDocuments();
    print();
    
    // **** display after ****
    printPatients();
}

// **** run the update ****
print("<<< before updatePatients ...");
updatePatients();
print("<<< after updatePatients !!!");

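As an aside, the same migration could have been expressed with update operators instead of passing a whole replacement document. A sketch of that update call, using the same myId and names variables as in updateDocuments():

// **** alternative: add the new fields and drop the old one with update operators ****
db.patient.update(
    { _id: myId },
    { $set:   { firstName: names[0], lastName: names[1] },
      $unset: { name: "" } }
);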
I would agree that the script contains many print() statements. In practice I like to add print() statements while developing code; when done, the functions / methods should be as clean as possible. I also like to have comments, since I do not like to have to memorize how code works.

Note that the _id fields in the different documents have not changed. This means that the existing documents were updated in place and not replaced by new ones. An alternate approach could have been to read each document, write a new document with the desired schema, and delete the original one. For a couple of records in a small collection it would not make much of a difference. On the other hand, if the collection is large and such updates are part of the regular process, performance is something that needs to be considered.

To prepare for the next feature, let’s first update the image collection to include an accessedCount field, which should be updated every time an image is read. Images in DICOM must be immutable, but metadata such as an access counter can change.

> db.image.find()
{ "_id" : ObjectId("5b097c6e8a7d4656190e1799"), "imageID" : 1 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179a"), "imageID" : 2 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179b"), "imageID" : 3 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179c"), "imageID" : 4 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179d"), "imageID" : 5 }
> db.image.update( {}, { $set: { accessedCount: 1 } }, { multi: true } )
WriteResult({ "nMatched" : 5, "nUpserted" : 0, "nModified" : 5 })
> db.image.find()
{ "_id" : ObjectId("5b097c6e8a7d4656190e1799"), "imageID" : 1, "accessedCount" : 1 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179a"), "imageID" : 2, "accessedCount" : 1 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179b"), "imageID" : 3, "accessedCount" : 1 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179c"), "imageID" : 4, "accessedCount" : 1 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179d"), "imageID" : 5, "accessedCount" : 1 }
>

Our collection of five images now has the new accessedCount field, and we have set it to one to indicate that each image was accessed once when it was first written to the database. Now let’s simulate an image read by incrementing the counter for imageID 3 using the $inc operator:

> db.image.update( { imageID: 3 }, { $inc: { accessedCount : 1 } } )
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.image.find()
{ "_id" : ObjectId("5b097c6e8a7d4656190e1799"), "imageID" : 1, "accessedCount" : 1 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179a"), "imageID" : 2, "accessedCount" : 1 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179b"), "imageID" : 3, "accessedCount" : 2 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179c"), "imageID" : 4, "accessedCount" : 1 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179d"), "imageID" : 5, "accessedCount" : 1 }
>

The imageID 3 image now shows an accessedCount value of 2.
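In a real archive the counter would typically be bumped as part of the read itself. The findOneAndUpdate() helper can fetch the image document and increment the counter in a single atomic operation; a quick sketch:

// **** read the image document and increment its access counter atomically ****
db.image.findOneAndUpdate(
    { imageID: 3 },                      // image being read
    { $inc: { accessedCount: 1 } },      // bump the access counter
    { returnNewDocument: true }          // return the document after the update
);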

There are different ways to refer to a collection in the shell. So far we have been using the most common form:

> db.image.find();
{ "_id" : ObjectId("5b097c6e8a7d4656190e1799"), "imageID" : 1, "accessedCount" : 1 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179a"), "imageID" : 2, "accessedCount" : 1 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179b"), "imageID" : 3, "accessedCount" : 2 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179c"), "imageID" : 4, "accessedCount" : 1 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179d"), "imageID" : 5, "accessedCount" : 1 }

That works if you know the name of the collection in advance. If you do not, for example when the name is held in a variable or computed at run time, you could use:

> db["image"].find();
{ "_id" : ObjectId("5b097c6e8a7d4656190e1799"), "imageID" : 1, "accessedCount" : 1 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179a"), "imageID" : 2, "accessedCount" : 1 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179b"), "imageID" : 3, "accessedCount" : 2 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179c"), "imageID" : 4, "accessedCount" : 1 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179d"), "imageID" : 5, "accessedCount" : 1 }

Or:

> db.getCollection("image").find();
{ "_id" : ObjectId("5b097c6e8a7d4656190e1799"), "imageID" : 1, "accessedCount" : 1 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179a"), "imageID" : 2, "accessedCount" : 1 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179b"), "imageID" : 3, "accessedCount" : 2 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179c"), "imageID" : 4, "accessedCount" : 1 }
{ "_id" : ObjectId("5b097cba8a7d4656190e179d"), "imageID" : 5, "accessedCount" : 1 }

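The last two forms are handy when the collection name is computed at run time. For example, we could loop over every collection in the current database and print a document count; a short sketch:

// **** print the document count for every collection in the current database ****
db.getCollectionNames().forEach(function(name) {
    var count = db.getCollection(name).find().count();
    print("<<< " + name + ": " + count + " document(s)");
});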
Believe it or not, I started this post over a week ago. I have been busy with family and friends and finally was able to set some time aside to complete it.

Last week I purchased a book from Amazon and this morning I finished reading it. In addition, I am taking an online MongoDB course and had to submit the first homework.

As usual, hope the material covered in this post is useful.

Enjoy;

John

john.canessa@gmail.com
@john_canessa
