MongoDB Text Search

Yesterday was a warm and humid day in the Twin Cities of Minneapolis and St. Paul.

Last week a software developer mentioned that he was interested in using MongoDB Atlas and needed to do some text searches. I used text searches in a project a couple of years ago. I have also experimented with them using the Community version currently installed on one of my Windows machines. I decided to run a test to make sure that searching for text in MongoDB works as advertised.

I could have used an existing MongoDB database and collection but for simplicity I decided to use the example illustrated in the MongoDB documentation for Text Search. I would like to note that in my opinion MongoDB has done a very good job with their documentation.

It is always a good idea to note the version of the software you are experimenting or testing with. If something does not work as expected and you need help, you should always include the version of the software.

When you open a command prompt on Windows the OS version is displayed. In this post I am using Windows 10.

Microsoft Windows [Version 10.0.17134.885]
(c) 2018 Microsoft Corporation. All rights reserved.

C:\Users\John>systeminfo

Host Name:                 CONDOR
OS Name:                   Microsoft Windows 10 Pro
OS Version:                10.0.17134 N/A Build 17134
OS Manufacturer:           Microsoft Corporation
OS Configuration:          Standalone Workstation
OS Build Type:             Multiprocessor Free
Registered Owner:          John
Registered Organization:   Microsoft
Product ID:                00330-80000-00000-AA206
Original Install Date:     5/19/2018, 11:38:17 AM
System Boot Time:          7/12/2019, 5:18:15 AM
System Manufacturer:       Dell Inc.
System Model:              Precision WorkStation T7500
System Type:               x64-based PC
Processor(s):              2 Processor(s) Installed.
                           [01]: Intel64 Family 6 Model 44 Stepping 2 GenuineIntel ~2128 Mhz
                           [02]: Intel64 Family 6 Model 44 Stepping 2 GenuineIntel ~2128 Mhz
BIOS Version:              Dell Inc. A18, 10/15/2018
Windows Directory:         C:\WINDOWS
System Directory:          C:\WINDOWS\system32
Boot Device:               \Device\HarddiskVolume2
System Locale:             en-us;English (United States)
Input Locale:              en-us;English (United States)
Time Zone:                 (UTC-06:00) Central Time (US & Canada)
Total Physical Memory:     24,574 MB
Available Physical Memory: 17,647 MB
Virtual Memory: Max Size:  49,150 MB
Virtual Memory: Available: 39,993 MB
Virtual Memory: In Use:    9,157 MB
Page File Location(s):     C:\pagefile.sys
Domain:                    WORKGROUP
Logon Server:              \\CONDOR
Hotfix(s):                 11 Hotfix(s) Installed.
                           [01]: KB4343669
                           [02]: KB4346084
                           [03]: KB4456655
                           [04]: KB4465663
                           [05]: KB4477137
                           [06]: KB4485449
                           [07]: KB4497398
                           [08]: KB4497932
                           [09]: KB4503308
                           [10]: KB4509094
                           [11]: KB4507435
Network Card(s):           5 NIC(s) Installed.
                           [01]: Hyper-V Virtual Ethernet Adapter
                                 Connection Name: vEthernet (Broadcom NetXtreme 57xx Gigabit Controller Virtual Switch)
                                 DHCP Enabled:    No
                                 IP address(es)
                                 [01]: 192.168.1.110
                                 [02]: fe80::2da0:c8eb:8b8e:4efa
                                 [03]: 2600:6c46:7b00:2058:2164:e54:b7f5:fed2
                                 [04]: 2600:6c46:7b00:2058:1c54:12cc:a575:9aa8
                                 [05]: 2600:6c46:7b00:2058:2da0:c8eb:8b8e:4efa
                           [02]: Hyper-V Virtual Ethernet Adapter
                                 Connection Name: vEthernet (Internal Ethernet Port Windows Phone Emulator Internal Switch)
                                 DHCP Enabled:    No
                                 IP address(es)
                                 [01]: 169.254.80.80
                                 [02]: fe80::4c25:7b15:321d:e6fc
                           [03]: Hyper-V Virtual Ethernet Adapter
                                 Connection Name: vEthernet (Default Switch)
                                 DHCP Enabled:    Yes
                                 DHCP Server:     255.255.255.255
                                 IP address(es)
                                 [01]: 172.29.31.97
                                 [02]: fe80::d910:3a38:fa35:6e32
                           [04]: Broadcom NetXtreme 57xx Gigabit Controller
                                 Connection Name: Local Area Connection
                                 DHCP Enabled:    Yes
                                 DHCP Server:     N/A
                                 IP address(es)
                           [05]: VirtualBox Host-Only Ethernet Adapter
                                 Connection Name: VirtualBox Host-Only Network #2
                                 DHCP Enabled:    No
                                 IP address(es)
                                 [01]: 192.168.56.1
                                 [02]: fe80::b537:b95a:6760:6865
Hyper-V Requirements:      VM Monitor Mode Extensions: Yes
                           Virtualization Enabled In Firmware: Yes
                           Second Level Address Translation: Yes
                           Data Execution Prevention Available: Yes

C:\Users\John>

I open a command prompt and the first thing it displays is the Windows version, which in this case happens to be 10.0.17134.885. That is all we need to note in case something does not work. In case one is asked to provide additional information, I also ran the systeminfo command. Note that the same OS version is displayed in addition to many other items.

We can also get the version of MongoDB we are using.

C:\>mongod --version
db version v4.0.9
git version: fc525e2d9b0e4bceff5c2201457e564362909765
allocator: tcmalloc
modules: none
build environment:
    distmod: 2008plus-ssl
    distarch: x86_64
    target_arch: x86_64

C:\>mongo --version
MongoDB shell version v4.0.9
git version: fc525e2d9b0e4bceff5c2201457e564362909765
allocator: tcmalloc
modules: none
build environment:
    distmod: 2008plus-ssl
    distarch: x86_64
    target_arch: x86_64

Note that I first requested the version of mongod, which is the version of the database engine. I then requested the version of the MongoDB shell, which is the tool I will be using in this post. Both seem to be at the same revision level.
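If you want to automate the comparison, the check can be sketched in a few lines of plain JavaScript (run under Node.js). The function names parseVersion and sameVersion are my own and hypothetical; the two input strings are the first lines of the output shown above. This is a minimal sketch, not something provided by MongoDB.

```javascript
// Extract the "vX.Y.Z" version from the output of `mongod --version`
// or `mongo --version`. Returns null when no version is found.
function parseVersion(output) {
  const match = /v(\d+)\.(\d+)\.(\d+)/.exec(output);
  if (match === null) {
    return null;
  }
  // Capture groups 1..3 hold major, minor and patch as strings.
  return match.slice(1, 4).map(Number);
}

// True when both outputs report the same major.minor.patch version.
function sameVersion(serverOutput, shellOutput) {
  const server = parseVersion(serverOutput);
  const shell = parseVersion(shellOutput);
  if (server === null || shell === null) {
    return false;
  }
  return server.every((part, i) => part === shell[i]);
}

// First lines copied from the transcript above.
const serverLine = "db version v4.0.9";
const shellLine = "MongoDB shell version v4.0.9";

console.log(parseVersion(serverLine)); // [ 4, 0, 9 ]
console.log(sameVersion(serverLine, shellLine)); // true
```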

Instead of the MongoDB shell we could have used the “MongoDB Compass Community” edition or “Robo 3T, MongoDB management tool”, both of which I have installed on the machine I am currently using. That said, it is always best to eliminate possible extraneous issues by using the simplest tools provided by the vendor, in this case MongoDB.

Next I will log in to start experimenting with text searches. In general, when I am not using a tool frequently, I tend to forget the inconsequential details, like how to log in to MongoDB. What seems to work in many interfaces is to ask for help.

C:\>mongo --help
MongoDB shell version v4.0.9
usage: mongo [options] [db address] [file names (ending in .js)]
db address can be:
  foo                   foo database on local machine
  192.168.0.5/foo       foo database on 192.168.0.5 machine
  192.168.0.5:9999/foo  foo database on 192.168.0.5 machine on port 9999
Options:
  --shell                             run the shell after executing files
  --nodb                              don't connect to mongod on startup - no
                                      'db address' arg expected
  --norc                              will not run the ".mongorc.js" file on
                                      start up
  --quiet                             be less chatty
  --port arg                          port to connect to
  --host arg                          server to connect to
  --eval arg                          evaluate javascript
  -h [ --help ]                       show this usage information
  --version                           show version information
  --verbose                           increase verbosity
  --ipv6                              enable IPv6 support (disabled by default)
  --disableJavaScriptJIT              disable the Javascript Just In Time
                                      compiler
  --enableJavaScriptJIT               enable the Javascript Just In Time
                                      compiler
  --disableJavaScriptProtection       allow automatic JavaScript function
                                      marshalling
  --ssl                               use SSL for all connections
  --sslCAFile arg                     Certificate Authority file for SSL
  --sslPEMKeyFile arg                 PEM certificate/key file for SSL
  --sslPEMKeyPassword arg             password for key in PEM file for SSL
  --sslCRLFile arg                    Certificate Revocation List file for SSL
  --sslAllowInvalidHostnames          allow connections to servers with
                                      non-matching hostnames
  --sslAllowInvalidCertificates       allow connections to servers with invalid
                                      certificates
  --sslFIPSMode                       activate FIPS 140-2 mode at startup
  --sslCertificateSelector arg        SSL Certificate in system store
  --sslDisabledProtocols arg          Comma separated list of TLS protocols to
                                      disable [TLS1_0,TLS1_1,TLS1_2]
  --retryWrites                       automatically retry write operations upon
                                      transient network errors
  --disableImplicitSessions           do not automatically create and use
                                      implicit sessions
  --jsHeapLimitMB arg                 set the js scope's heap size limit

Authentication Options:
  -u [ --username ] arg               username for authentication
  -p [ --password ] arg               password for authentication
  --authenticationDatabase arg        user source (defaults to dbname)
  --authenticationMechanism arg       authentication mechanism
  --gssapiServiceName arg (=mongodb)  Service name to use when authenticating
                                      using GSSAPI/Kerberos
  --gssapiHostName arg                Remote host name to use for purpose of
                                      GSSAPI/Kerberos authentication

file names: a list of files to run. files have to end in .js and will exit after unless --shell is specified

C:\>mongo -u john
MongoDB shell version v4.0.9
Enter password:
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("e4ac5bc7-b060-4730-a478-ae3a3bf809a9") }
MongoDB server version: 4.0.9

> show dbs
admin            0.000GB
agg              0.000GB
blog             0.000GB
chat             0.000GB
config           0.000GB
course           0.000GB
dicom            0.000GB
images           0.000GB
local            0.000GB
m101             0.000GB
mongo-exercises  0.000GB
playground       0.000GB
school           0.000GB
students         0.000GB
test             0.041GB
towns            0.000GB
video            0.000GB
vidly            0.000GB
>

Besides providing some additional information (e.g., the MongoDB shell version), the help output lists the authentication options towards the end, which illustrate how to log in. With that information on hand, I log in to the database by just specifying my user name. The MongoDB shell then requests my password and I am logged in. Note that the MongoDB server version is also displayed.

I always like to issue some basic command to make sure all is well. In this case I used the “show dbs” command. Note that if you have authentication enabled and you issue the mongo command without credentials, you will still get into the MongoDB shell, as illustrated by the change in the command prompt. But if you then issue the “show dbs” command, only databases with public access will be shown. I do not have public access databases on my machines.

> use test
switched to db test

> show collections
customer
example
fun
movieDetails
sentences
zips
>

We need to select a database in which we will create a collection to test the operation of text searches in MongoDB. I decided to use the test database. To make sure there will be no conflict, I display the existing collections. All seems fine so far.

> db.stores.insert(
...    [
...      { _id: 1, name: "Java Hut", description: "Coffee and cakes" },
...      { _id: 2, name: "Burger Buns", description: "Gourmet hamburgers" },
...      { _id: 3, name: "Coffee Shop", description: "Just coffee" },
...      { _id: 4, name: "Clothes Clothes Clothes", description: "Discount clothing" },
...      { _id: 5, name: "Java Shopping", description: "Indonesian goods" }
...    ]
... )
BulkWriteResult({
        "writeErrors" : [ ],
        "writeConcernErrors" : [ ],
        "nInserted" : 5,
        "nUpserted" : 0,
        "nMatched" : 0,
        "nModified" : 0,
        "nRemoved" : 0,
        "upserted" : [ ]
})
>

> db.stores.find()
{ "_id" : 1, "name" : "Java Hut", "description" : "Coffee and cakes" }
{ "_id" : 2, "name" : "Burger Buns", "description" : "Gourmet hamburgers" }
{ "_id" : 3, "name" : "Coffee Shop", "description" : "Just coffee" }
{ "_id" : 4, "name" : "Clothes Clothes Clothes", "description" : "Discount clothing" }
{ "_id" : 5, "name" : "Java Shopping", "description" : "Indonesian goods" }
>

We now create a simple collection with five documents. We have names and simple descriptions. As usual, after issuing a command I like to verify that all is well. I do so by finding all the documents in the stores collection. Once again, all seems to be working as expected.

You need to create a text index to be able to search text. Note that a collection can only have one text search index, but that index can cover multiple fields.

> db.stores.createIndex( { name: "text", description: "text" } )
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}
>

> db.stores.getIndexes()
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "test.stores"
        },
        {
                "v" : 2,
                "key" : {
                        "_fts" : "text",
                        "_ftsx" : 1
                },
                "name" : "name_text_description_text",
                "ns" : "test.stores",
                "weights" : {
                        "description" : 1,
                        "name" : 1
                },
                "default_language" : "english",
                "language_override" : "language",
                "textIndexVersion" : 3
        }
]
>

So we go ahead and create our text index for the stores collection. We then list all the indexes in the stores collection in order to verify that we have created a text index. Once again, all seems to be going as expected.
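The index above gives both fields the default weight of 1, as the getIndexes() output shows. If you wanted matches in the name field to count more, createIndex accepts a weights option. The sketch below only builds the two documents in plain JavaScript; the helper names and the index name “stores_text” are hypothetical, and the weight of 10 is an arbitrary value picked for illustration.

```javascript
// Build the key document for a text index over the given fields,
// e.g. ["name", "description"] -> { name: "text", description: "text" }.
function textIndexKeys(fields) {
  const keys = {};
  for (const field of fields) {
    keys[field] = "text";
  }
  return keys;
}

// Build the options document. Only the weights that are passed in are
// included; fields without an explicit weight keep the default of 1.
function textIndexOptions(indexName, weights) {
  return { name: indexName, weights: Object.assign({}, weights) };
}

const keys = textIndexKeys(["name", "description"]);
// Hypothetical index name and weight, chosen for this example only.
const options = textIndexOptions("stores_text", { name: 10 });

// In the mongo shell these would be passed as:
//   db.stores.createIndex(keys, options)
console.log(JSON.stringify(keys));
console.log(JSON.stringify(options));
```

Since a collection can only have one text index, I would expect to have to drop the existing one first, e.g. db.stores.dropIndex("name_text_description_text"), before creating an index with different weights.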

> db.stores.find( { $text: { $search: "java coffee shop" } } )
{ "_id" : 3, "name" : "Coffee Shop", "description" : "Just coffee" }
{ "_id" : 1, "name" : "Java Hut", "description" : "Coffee and cakes" }
{ "_id" : 5, "name" : "Java Shopping", "description" : "Indonesian goods" }
>

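We now test a search by looking for documents that contain any of the following search words: “java coffee shop”. The terms are combined with a logical OR, so we get three results back. The first contains “coffee shop”, the second “java” and “coffee”, and the last document “java shopping”. The last one matches “shop” because MongoDB text search stems words, so “shop” and “Shopping” are treated as the same term.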
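To convince myself why these three documents match, here is a toy model of the OR matching in plain JavaScript. Please note the assumptions: crudeStem is a crude stand-in I made up for this post, not the stemmer MongoDB uses, and the real server also handles stop words and scoring, which this sketch ignores. It only reproduces which of the five documents match.

```javascript
// The five documents inserted into the stores collection above.
const stores = [
  { _id: 1, name: "Java Hut", description: "Coffee and cakes" },
  { _id: 2, name: "Burger Buns", description: "Gourmet hamburgers" },
  { _id: 3, name: "Coffee Shop", description: "Just coffee" },
  { _id: 4, name: "Clothes Clothes Clothes", description: "Discount clothing" },
  { _id: 5, name: "Java Shopping", description: "Indonesian goods" },
];

// Crude stand-in for stemming (NOT MongoDB's algorithm): strips "ing"
// (collapsing a doubled final consonant) or a trailing "s".
function crudeStem(word) {
  let w = word.toLowerCase();
  if (w.endsWith("ing") && w.length > 5) {
    w = w.slice(0, -3);
    if (w.length > 2 && w[w.length - 1] === w[w.length - 2]) {
      w = w.slice(0, -1); // "shopp" -> "shop"
    }
  } else if (w.endsWith("s") && w.length > 3) {
    w = w.slice(0, -1); // "cakes" -> "cake"
  }
  return w;
}

// Split text into lowercase words and stem each one.
function stemsOf(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean).map(crudeStem);
}

// Keep the documents that contain at least one of the search terms (OR).
function matchAny(docs, search) {
  const wanted = new Set(stemsOf(search));
  return docs.filter((doc) =>
    stemsOf(doc.name + " " + doc.description).some((stem) => wanted.has(stem))
  );
}

// Matches are listed in collection order, not in the server's order.
console.log(matchAny(stores, "java coffee shop").map((doc) => doc._id)); // [ 1, 3, 5 ]
```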

> db.stores.find( { $text: { $search: "\"coffee shop\"" } } )
{ "_id" : 3, "name" : "Coffee Shop", "description" : "Just coffee" }
>

We can also search for an exact phrase by wrapping it in escaped double quotes inside the search string. In this case we search for “coffee shop” and our results include a single document.
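The escaped quotes are easy to get wrong when typing. A small helper can build the query document for you. The name phraseQuery is hypothetical, and dropping embedded double quotes is a simplification I chose for this sketch rather than anything MongoDB requires.

```javascript
// Wrap a phrase in double quotes so that $text treats it as an exact phrase.
// Embedded double quotes are dropped; this sketch does not try to escape them.
function phraseQuery(phrase) {
  const cleaned = phrase.replace(/"/g, "").trim();
  return { $text: { $search: '"' + cleaned + '"' } };
}

const query = phraseQuery("coffee shop");

// In the mongo shell the document would be used as:
//   db.stores.find(query)
console.log(JSON.stringify(query));
```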

> db.stores.find( { $text: { $search: "java shop -coffee" } } )
{ "_id" : 5, "name" : "Java Shopping", "description" : "Indonesian goods" }
>

In this search we look for documents that include the words “java” or “shop” but do not include the word “coffee”. A hyphen in front of a word excludes documents that contain it. Only one document is a match.
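The three kinds of items we have used so far in a search string (plain terms, quoted phrases and negated terms) can be told apart with a small parser. This is a sketch of the syntax only. The function parseSearch is hypothetical, it does not handle negated phrases, and it is not how the server parses the string internally.

```javascript
// Split a $search string into plain terms, "quoted phrases" and -negated terms.
function parseSearch(search) {
  const phrases = [];
  // Pull out the quoted phrases first and leave a space in their place.
  const rest = search.replace(/"([^"]*)"/g, (_, phrase) => {
    phrases.push(phrase.toLowerCase());
    return " ";
  });

  const terms = [];
  const negated = [];
  for (const token of rest.split(/\s+/).filter(Boolean)) {
    if (token === "-") {
      continue; // a lone hyphen carries no term; ignored in this sketch
    }
    if (token.startsWith("-")) {
      negated.push(token.slice(1).toLowerCase());
    } else {
      terms.push(token.toLowerCase());
    }
  }
  return { terms, phrases, negated };
}

// The three search strings used in this post.
console.log(parseSearch("java coffee shop"));
console.log(parseSearch('"coffee shop"'));
console.log(parseSearch("java shop -coffee"));
```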

> db.stores.find(
...    { $text: { $search: "java coffee shop" } },
...    { score: { $meta: "textScore" } }
... ).sort( { score: { $meta: "textScore" } } )
{ "_id" : 3, "name" : "Coffee Shop", "description" : "Just coffee", "score" : 2.25 }
{ "_id" : 1, "name" : "Java Hut", "description" : "Coffee and cakes", "score" : 1.5 }
{ "_id" : 5, "name" : "Java Shopping", "description" : "Indonesian goods", "score" : 1.5 }
>

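In the final search we use the words “java coffee shop” again, but sort the results in order of relevance and display the score that MongoDB assigned to each document. The first document has the highest score because it matches “coffee” in both the name and the description, in addition to “shop”. Note that it does not contain “java”. The other two documents each match two of the three words once (“java” and “coffee”, and “java” and “shop”), so they share the same lower score.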
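Two of the documents have the same score, so I would not rely on their relative order. The sketch below sorts the results client-side with a tie-break on _id. The scores are copied from the output above, and the comparator name byScoreThenId is hypothetical.

```javascript
// Results copied from the transcript above, deliberately shuffled.
const results = [
  { _id: 5, name: "Java Shopping", score: 1.5 },
  { _id: 3, name: "Coffee Shop", score: 2.25 },
  { _id: 1, name: "Java Hut", score: 1.5 },
];

// Highest score first; documents with equal scores are ordered by _id.
function byScoreThenId(a, b) {
  if (b.score !== a.score) {
    return b.score - a.score;
  }
  return a._id - b._id;
}

// slice() copies the array so the original order is left untouched.
const ordered = results.slice().sort(byScoreThenId);
console.log(ordered.map((doc) => doc._id)); // [ 3, 1, 5 ]
```

As far as I know the same tie-break can be requested from the server by adding a second sort key, e.g. .sort( { score: { $meta: "textScore" }, _id: 1 } ), but I have not run that variation for this post.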

In conclusion, the Community version of MongoDB supports text searching. This implies that the paid Atlas version, which is available on Azure and AWS, should also support it.

If you have comments or questions regarding this or any other post in this blog, or if you would like me to help with any phase in the SDLC (Software Development Life Cycle) of a product or service, please do not hesitate to leave me a note below. Requests for help will remain private.

Keep on reading and experimenting. It is the best way to learn and refresh your knowledge!

John

Follow me on Twitter:  @john_canessa
