Yesterday was a warm and humid day in the Twin Cities of Minneapolis and St. Paul.
Last week a software developer mentioned that he was interested in using MongoDB Atlas and needed to do some text searches. I used text searches in a project a couple of years ago, and I have also experimented with them using the Community version currently installed on one of my Windows machines. I decided to test the feature to make sure that searching for text in MongoDB works as advertised.
I could have used an existing MongoDB database and collection but for simplicity I decided to use the example illustrated in the MongoDB documentation for Text Search. I would like to note that in my opinion MongoDB has done a very good job with their documentation.
It is always a good idea to note the version of the software you are experimenting or testing with. If something does not work as expected and you need help, you should always include the version of the software.
When you open a command prompt on Windows, the OS version is displayed (on Linux, a command such as uname -a reports the kernel version). In this post I am using Windows 10.
Microsoft Windows [Version 10.0.17134.885]
(c) 2018 Microsoft Corporation. All rights reserved.

C:\Users\John>systeminfo

Host Name: CONDOR
OS Name: Microsoft Windows 10 Pro
OS Version: 10.0.17134 N/A Build 17134
OS Manufacturer: Microsoft Corporation
OS Configuration: Standalone Workstation
OS Build Type: Multiprocessor Free
Registered Owner: John
Registered Organization: Microsoft
Product ID: 00330-80000-00000-AA206
Original Install Date: 5/19/2018, 11:38:17 AM
System Boot Time: 7/12/2019, 5:18:15 AM
System Manufacturer: Dell Inc.
System Model: Precision WorkStation T7500
System Type: x64-based PC
Processor(s): 2 Processor(s) Installed.
    [01]: Intel64 Family 6 Model 44 Stepping 2 GenuineIntel ~2128 Mhz
    [02]: Intel64 Family 6 Model 44 Stepping 2 GenuineIntel ~2128 Mhz
BIOS Version: Dell Inc. A18, 10/15/2018
Windows Directory: C:\WINDOWS
System Directory: C:\WINDOWS\system32
Boot Device: \Device\HarddiskVolume2
System Locale: en-us;English (United States)
Input Locale: en-us;English (United States)
Time Zone: (UTC-06:00) Central Time (US & Canada)
Total Physical Memory: 24,574 MB
Available Physical Memory: 17,647 MB
Virtual Memory: Max Size: 49,150 MB
Virtual Memory: Available: 39,993 MB
Virtual Memory: In Use: 9,157 MB
Page File Location(s): C:\pagefile.sys
Domain: WORKGROUP
Logon Server: \\CONDOR
Hotfix(s): 11 Hotfix(s) Installed.
    [01]: KB4343669
    [02]: KB4346084
    [03]: KB4456655
    [04]: KB4465663
    [05]: KB4477137
    [06]: KB4485449
    [07]: KB4497398
    [08]: KB4497932
    [09]: KB4503308
    [10]: KB4509094
    [11]: KB4507435
Network Card(s): 5 NIC(s) Installed.
    [01]: Hyper-V Virtual Ethernet Adapter
        Connection Name: vEthernet (Broadcom NetXtreme 57xx Gigabit Controller Virtual Switch)
        DHCP Enabled: No
        IP address(es)
        [01]: 192.168.1.110
        [02]: fe80::2da0:c8eb:8b8e:4efa
        [03]: 2600:6c46:7b00:2058:2164:e54:b7f5:fed2
        [04]: 2600:6c46:7b00:2058:1c54:12cc:a575:9aa8
        [05]: 2600:6c46:7b00:2058:2da0:c8eb:8b8e:4efa
    [02]: Hyper-V Virtual Ethernet Adapter
        Connection Name: vEthernet (Internal Ethernet Port Windows Phone Emulator Internal Switch)
        DHCP Enabled: No
        IP address(es)
        [01]: 169.254.80.80
        [02]: fe80::4c25:7b15:321d:e6fc
    [03]: Hyper-V Virtual Ethernet Adapter
        Connection Name: vEthernet (Default Switch)
        DHCP Enabled: Yes
        DHCP Server: 255.255.255.255
        IP address(es)
        [01]: 172.29.31.97
        [02]: fe80::d910:3a38:fa35:6e32
    [04]: Broadcom NetXtreme 57xx Gigabit Controller
        Connection Name: Local Area Connection
        DHCP Enabled: Yes
        DHCP Server: N/A
        IP address(es)
    [05]: VirtualBox Host-Only Ethernet Adapter
        Connection Name: VirtualBox Host-Only Network #2
        DHCP Enabled: No
        IP address(es)
        [01]: 192.168.56.1
        [02]: fe80::b537:b95a:6760:6865
Hyper-V Requirements: VM Monitor Mode Extensions: Yes
    Virtualization Enabled In Firmware: Yes
    Second Level Address Translation: Yes
    Data Execution Prevention Available: Yes

C:\Users\John>
The first thing the command prompt displays is the Windows version, in this case 10.0.17134.885, which is usually all we need to note in case something does not work. In case additional information is requested, I also ran the systeminfo command. Note that it reports the same OS version in addition to many other items.
Now we could also get the version of MongoDB we are using.
C:\>mongod --version
db version v4.0.9
git version: fc525e2d9b0e4bceff5c2201457e564362909765
allocator: tcmalloc
modules: none
build environment:
    distmod: 2008plus-ssl
    distarch: x86_64
    target_arch: x86_64

C:\>mongo --version
MongoDB shell version v4.0.9
git version: fc525e2d9b0e4bceff5c2201457e564362909765
allocator: tcmalloc
modules: none
build environment:
    distmod: 2008plus-ssl
    distarch: x86_64
    target_arch: x86_64
Note that I first requested the version of mongod, which is the version of the database engine. I then requested the version of the MongoDB shell, which is the tool I will be using in this post. Both seem to be at the same revision level.
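If you want to compare the two versions in a script instead of by eye, the check can be sketched in plain JavaScript. This is only a minimal sketch: the function name extractVersion is hypothetical, the two banner strings are copied from the first line of each command's output, and the code is meant for Node.js, not for the mongo shell.

```javascript
// Pull the first "vX.Y.Z" token out of a `--version` banner line.
// extractVersion is a hypothetical helper, not part of any MongoDB tool.
function extractVersion(banner) {
  const match = /v(\d+\.\d+\.\d+)/.exec(banner);
  return match ? match[1] : null;
}

// First line printed by `mongod --version` and `mongo --version` in this post.
const serverBanner = "db version v4.0.9";
const shellBanner = "MongoDB shell version v4.0.9";

const serverVersion = extractVersion(serverBanner);
const shellVersion = extractVersion(shellBanner);

// Report both versions and whether they agree.
console.log(serverVersion, shellVersion, serverVersion === shellVersion);
```

A mismatch between the two is not necessarily a problem, but it is worth mentioning when asking for help.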
I guess that instead of the MongoDB shell we could have used the “MongoDB Compass Community” edition or “Robo 3T, MongoDB management tool” which I have installed in the machine I am currently using. That said, it is always best to eliminate possible extraneous issues by using the simplest tools provided by the vendor, in this case MongoDB.
Next I will log in to start experimenting with text searches. In general, when I am not using a tool frequently, I tend to forget the inconsequential details, like how to log in to MongoDB. What seems to work in many interfaces is to ask for help.
C:\>mongo --help
MongoDB shell version v4.0.9
usage: mongo [options] [db address] [file names (ending in .js)]
db address can be:
  foo                   foo database on local machine
  192.168.0.5/foo       foo database on 192.168.0.5 machine
  192.168.0.5:9999/foo  foo database on 192.168.0.5 machine on port 9999
Options:
  --shell  run the shell after executing files
  --nodb  don't connect to mongod on startup - no 'db address' arg expected
  --norc  will not run the ".mongorc.js" file on start up
  --quiet  be less chatty
  --port arg  port to connect to
  --host arg  server to connect to
  --eval arg  evaluate javascript
  -h [ --help ]  show this usage information
  --version  show version information
  --verbose  increase verbosity
  --ipv6  enable IPv6 support (disabled by default)
  --disableJavaScriptJIT  disable the Javascript Just In Time compiler
  --enableJavaScriptJIT  enable the Javascript Just In Time compiler
  --disableJavaScriptProtection  allow automatic JavaScript function marshalling
  --ssl  use SSL for all connections
  --sslCAFile arg  Certificate Authority file for SSL
  --sslPEMKeyFile arg  PEM certificate/key file for SSL
  --sslPEMKeyPassword arg  password for key in PEM file for SSL
  --sslCRLFile arg  Certificate Revocation List file for SSL
  --sslAllowInvalidHostnames  allow connections to servers with non-matching hostnames
  --sslAllowInvalidCertificates  allow connections to servers with invalid certificates
  --sslFIPSMode  activate FIPS 140-2 mode at startup
  --sslCertificateSelector arg  SSL Certificate in system store
  --sslDisabledProtocols arg  Comma separated list of TLS protocols to disable [TLS1_0,TLS1_1,TLS1_2]
  --retryWrites  automatically retry write operations upon transient network errors
  --disableImplicitSessions  do not automatically create and use implicit sessions
  --jsHeapLimitMB arg  set the js scope's heap size limit
Authentication Options:
  -u [ --username ] arg  username for authentication
  -p [ --password ] arg  password for authentication
  --authenticationDatabase arg  user source (defaults to dbname)
  --authenticationMechanism arg  authentication mechanism
  --gssapiServiceName arg (=mongodb)  Service name to use when authenticating using GSSAPI/Kerberos
  --gssapiHostName arg  Remote host name to use for purpose of GSSAPI/Kerberos authentication
file names: a list of files to run. files have to end in .js and will exit after unless --shell is specified

C:\>mongo -u john
MongoDB shell version v4.0.9
Enter password:
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("e4ac5bc7-b060-4730-a478-ae3a3bf809a9") }
MongoDB server version: 4.0.9
> show dbs
admin            0.000GB
agg              0.000GB
blog             0.000GB
chat             0.000GB
config           0.000GB
course           0.000GB
dicom            0.000GB
images           0.000GB
local            0.000GB
m101             0.000GB
mongo-exercises  0.000GB
playground       0.000GB
school           0.000GB
students         0.000GB
test             0.041GB
towns            0.000GB
video            0.000GB
vidly            0.000GB
>
Besides some additional information (e.g., the MongoDB shell version), the Authentication Options section near the end of the help output shows how to log in. With that information in hand, I log in by specifying just my user name. The MongoDB shell then requests my password and I am logged in. Note that the MongoDB server version is also displayed.
I always like to issue a basic command to make sure all is well. In this case I used the “show dbs” command. Note that if you have authentication enabled and you issue the mongo command without credentials, you will still get into the MongoDB shell, as illustrated by the change in the command prompt. But if you then issue the “show dbs” command, only databases with public access will be shown. I do not have public access databases on my machines.
> use test
switched to db test
> show collections
customer
example
fun
movieDetails
sentences
zips
>
We need to select a database in which we will create a collection to test the operation of text searches in MongoDB. I decided to use the test database. To make sure there will be no conflict, I display the existing collections. All seems fine so far.
> db.stores.insert(
... [
... { _id: 1, name: "Java Hut", description: "Coffee and cakes" },
... { _id: 2, name: "Burger Buns", description: "Gourmet hamburgers" },
... { _id: 3, name: "Coffee Shop", description: "Just coffee" },
... { _id: 4, name: "Clothes Clothes Clothes", description: "Discount clothing" },
... { _id: 5, name: "Java Shopping", description: "Indonesian goods" }
... ]
... )
BulkWriteResult({
    "writeErrors" : [ ],
    "writeConcernErrors" : [ ],
    "nInserted" : 5,
    "nUpserted" : 0,
    "nMatched" : 0,
    "nModified" : 0,
    "nRemoved" : 0,
    "upserted" : [ ]
})
>
> db.stores.find()
{ "_id" : 1, "name" : "Java Hut", "description" : "Coffee and cakes" }
{ "_id" : 2, "name" : "Burger Buns", "description" : "Gourmet hamburgers" }
{ "_id" : 3, "name" : "Coffee Shop", "description" : "Just coffee" }
{ "_id" : 4, "name" : "Clothes Clothes Clothes", "description" : "Discount clothing" }
{ "_id" : 5, "name" : "Java Shopping", "description" : "Indonesian goods" }
>
We now create a simple collection with five documents. We have names and simple descriptions. As usual, after issuing a command I like to verify that all is well. I do so by finding all the documents in the stores collection. Once again, all seems to be working as expected.
You need to create a text index to be able to search text. Note that a collection can have only one text index, but that index can cover multiple fields.
> db.stores.createIndex( { name: "text", description: "text" } )
{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}
>
> db.stores.getIndexes()
[
    {
        "v" : 2,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "test.stores"
    },
    {
        "v" : 2,
        "key" : {
            "_fts" : "text",
            "_ftsx" : 1
        },
        "name" : "name_text_description_text",
        "ns" : "test.stores",
        "weights" : {
            "description" : 1,
            "name" : 1
        },
        "default_language" : "english",
        "language_override" : "language",
        "textIndexVersion" : 3
    }
]
>
So we go ahead and create our text index for the stores collection. We then list all the indexes in the stores collection in order to verify that we have created a text index. Once again, all seems to be going as expected.
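The same verification can be done programmatically. The following is a minimal sketch in plain JavaScript for Node.js; the helper name findTextIndexes is hypothetical, and the indexes array is copied (slightly trimmed) from the getIndexes() listing in this post. It relies only on what that listing shows: the text index appears with a key of the form { _fts: "text", _ftsx: 1 } and a weights entry per indexed field.

```javascript
// Index listing copied from the getIndexes() output in this post,
// trimmed to the fields used here.
const indexes = [
  { v: 2, key: { _id: 1 }, name: "_id_", ns: "test.stores" },
  {
    v: 2,
    key: { _fts: "text", _ftsx: 1 },
    name: "name_text_description_text",
    ns: "test.stores",
    weights: { description: 1, name: 1 },
  },
];

// Keep only the entries whose key marks them as text indexes.
// findTextIndexes is a hypothetical helper written for this sketch.
function findTextIndexes(list) {
  return list.filter((ix) => ix.key && ix.key._fts === "text");
}

const textIndexes = findTextIndexes(indexes);
console.log(textIndexes.map((ix) => ix.name)); // [ 'name_text_description_text' ]
console.log(Object.keys(textIndexes[0].weights).sort()); // [ 'description', 'name' ]
```

The keys of the weights document confirm which fields the text index covers, in this case name and description.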
> db.stores.find( { $text: { $search: "java coffee shop" } } )
{ "_id" : 3, "name" : "Coffee Shop", "description" : "Just coffee" }
{ "_id" : 1, "name" : "Java Hut", "description" : "Coffee and cakes" }
{ "_id" : 5, "name" : "Java Shopping", "description" : "Indonesian goods" }
>
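We now test a search by looking for documents that contain any of the following search words: “java coffee shop”. MongoDB treats the space-separated terms as a logical OR. We get three results back. The first contains “coffee” and “shop”, the second “java” and “coffee”, and the last “java” and “shopping”, which matches “shop” because the text index stems words.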
> db.stores.find( { $text: { $search: "\"coffee shop\"" } } )
{ "_id" : 3, "name" : "Coffee Shop", "description" : "Just coffee" }
>
We can search for an exact phrase. In this case we search for “coffee shop” and our results include a single document.
> db.stores.find( { $text: { $search: "java shop -coffee" } } )
{ "_id" : 5, "name" : "Java Shopping", "description" : "Indonesian goods" }
>
In this search we look for documents that include the word “java” or the word “shop” but do not include the word “coffee”. Only one document is a match.
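The three search forms used so far (plain terms, an exact phrase in double quotes, and a term excluded with a leading hyphen) can be assembled by a small helper. This is only a sketch in plain JavaScript for Node.js: buildSearch and its parameter names are hypothetical and not part of MongoDB. It produces the same $search strings that were typed by hand in the shell.

```javascript
// Build a $search string: plain terms are listed as-is, phrases are wrapped
// in double quotes, and excluded terms get a leading hyphen.
// buildSearch is a hypothetical helper written for this sketch.
function buildSearch({ terms = [], phrases = [], exclude = [] }) {
  const parts = [
    ...terms,
    ...phrases.map((p) => `"${p}"`),
    ...exclude.map((t) => `-${t}`),
  ];
  return parts.join(" ");
}

// The three query documents used in this post.
const anyTerm = { $text: { $search: buildSearch({ terms: ["java", "coffee", "shop"] }) } };
const exactPhrase = { $text: { $search: buildSearch({ phrases: ["coffee shop"] }) } };
const withExclusion = {
  $text: { $search: buildSearch({ terms: ["java", "shop"], exclude: ["coffee"] }) },
};

console.log(JSON.stringify(anyTerm));
console.log(JSON.stringify(exactPhrase));
console.log(JSON.stringify(withExclusion));
```

Each resulting object has the shape that was passed to db.stores.find() in the shell sessions in this post.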
> db.stores.find(
... { $text: { $search: "java coffee shop" } },
... { score: { $meta: "textScore" } }
... ).sort( { score: { $meta: "textScore" } } )
{ "_id" : 3, "name" : "Coffee Shop", "description" : "Just coffee", "score" : 2.25 }
{ "_id" : 1, "name" : "Java Hut", "description" : "Coffee and cakes", "score" : 1.5 }
{ "_id" : 5, "name" : "Java Shopping", "description" : "Indonesian goods", "score" : 1.5 }
>
In the final search we again use the words “java coffee shop”, but this time we sort the results by relevance and display the score MongoDB assigned to each document. None of the documents matches all three words. The first document scores highest because “coffee” appears in both its name and its description, in addition to “shop” in its name. The other two documents each match two of the three words once.
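To see which search words hit which fields, here is a toy sketch in plain JavaScript for Node.js. It is not MongoDB's scoring algorithm: the helper fieldHits is hypothetical, and it approximates stemming by treating a word as a hit when it starts with the search term (so “shopping” hits “shop”). It only counts hits per field for the five documents inserted earlier.

```javascript
// The five documents inserted into the stores collection in this post.
const stores = [
  { _id: 1, name: "Java Hut", description: "Coffee and cakes" },
  { _id: 2, name: "Burger Buns", description: "Gourmet hamburgers" },
  { _id: 3, name: "Coffee Shop", description: "Just coffee" },
  { _id: 4, name: "Clothes Clothes Clothes", description: "Discount clothing" },
  { _id: 5, name: "Java Shopping", description: "Indonesian goods" },
];

// Toy matcher (an assumption, NOT MongoDB's algorithm): lowercase the field,
// split it into words, and record a hit when any word starts with the term.
function fieldHits(doc, terms, fields = ["name", "description"]) {
  const hits = [];
  for (const field of fields) {
    const words = doc[field].toLowerCase().split(/[^a-z]+/).filter(Boolean);
    for (const term of terms) {
      if (words.some((w) => w.startsWith(term))) hits.push(`${field}:${term}`);
    }
  }
  return hits;
}

const terms = ["java", "coffee", "shop"];
for (const doc of stores) {
  console.log(doc._id, fieldHits(doc, terms));
}
```

Under this approximation, document 3 gets three hits (“coffee” in both fields plus “shop” in the name), documents 1 and 5 get two hits each, and documents 2 and 4 get none, which is consistent with the ordering of the scores returned by MongoDB.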
In conclusion, the Community version of MongoDB supports text searching. This suggests that MongoDB Atlas, the hosted service available on cloud providers such as Azure and AWS, should also support it.
If you have comments or questions regarding this or any other post in this blog, or if you would like me to help with any phase in the SDLC (Software Development Life Cycle) of a product or service, please do not hesitate to leave me a note below. Requests for help will remain private.
Keep on reading and experimenting. It is the best way to learn and refresh your knowledge!
John
Follow me on Twitter: @john_canessa