10. Queries and Indexes


Queries

Queries are used to search a collection for a subset of documents.

Below is a list of the asynchronous mongoose functions that perform queries:

  • Model.deleteMany()
  • Model.deleteOne()
  • Model.find()
  • Model.findById()
  • Model.findByIdAndDelete()
  • Model.findByIdAndUpdate()
  • Model.findOne()
  • Model.findOneAndDelete()
  • Model.findOneAndReplace()
  • Model.findOneAndUpdate()
  • Model.replaceOne()
  • Model.updateMany()
  • Model.updateOne()
  • Model.countDocuments()

When we perform a query we provide a filter object which indicates the match criteria for the documents that should be selected.  The filter can contain query predicates which evaluates to true for the selected documents.

Some mongoose query methods allow us to specify a set of fields to return in each document (a.k.a. a projection) rather than return every field in each document.  

In addition, some query methods allow a query to specify options to further modify the resulting data set such as sort the document set, limit the number of documents returned, and make the document set lean.

Below is an example query that uses a filter, projection, and option.

const users = await User.find(
   { username : "rajiv"}, // query
   { username: 1, age : 1 }, // projection
   { limit : 1 } // options
)

Later on we’ll continue to investigate how to build queries using these mongoose methods.  But before we do, let’s discuss indexes and the various types that are available in MongoDB.

MongoDB Indexes

When designing a schema for a collection we need to consider which fields will be used in the queries that the API server provides to the client.

An Index is data structure that that maps the value of fields to documents in a collection. MongoDB indexes are implemented as B-trees.  The runtime for insert, search, and delete in. B-tree is O(log n).  Without indexes, a query would require searching every document in a collection taking O(n) time.  And as the collection grows this time can be intolerable.

MongoDB differentiates between MongoDB indexes and MongoDB Search indexes.  MongoDB indexes are used to locate documents based on the entire values of the indexed fields whereas MongoDB Search indexes enable searching within text fields, support partial matches, and autocomplete. MongoDB Search indexes are only available on Atlas. 

There are several types of MongoDB indexes: single field indexes, compound indexes, multikey indexes, geospatial indexes, text indexes,  hashed indexes, and clustered indexes.

When defining an index, if we choose, we can specify options to define which documents will be indexed and how the indexed documents will be stored.  Options can be used to create unique indexes,  case-insensitive indexes, hidden indexes, partial indexes, sparse indexes, and TTL indexes.  Note that not all options can be used for all index types.  

Because indexes reside in memory, engineers need to weigh the memory costs vs. the performance benefits of indexes before making a design decision. Included in the MongoDB documentation are articles that discuss design strategies.  

We say an index covers a query if to resolve a query MongoDB does not have to open a collection file, that is, all of the keys used in a query are in the index and the query projects only the fields that are defined in the index.  Covered queries are most efficient because the in order to resolve the query MongoDB does not need to pull documents from a collection as all of the information that is needed is in the index.

Defining Indexes with Mongoose

With Mongoose we can defined an index by calling the index() method on a schema object.  For example, below we create a compound index that uses the firstName and lastName fields, both sorted alphabetically, and for which any pair of strings for firstName and lastName can appear only once (making the pair unique) in the index.

userSchema.index({ firstname: 1, lastName: 1 }, { unique: true });

MongoDB Index Types

1.   Single field indexes store the values of a single field in the index.  These allow queries for documents within a collection using a single field.  When we create a single field index we need to specify the sort order for the indexed values (ascending or descending).

By default, MongoDB creates a single field index for every collection using the _id key.

2.   Compound indexes store the values for up to 32 fields in the index.  The order of the keys in the list of keys that is specified when creating the index limits the way queries can be written for the index.   For example, suppose we define a compound index using the following list: { a: 1, b: 1, c: 1 }. Then we can only perform searches on the following sets of keys:

  • a
  • a and b
  • a, b, and c

For some queries we’ll want to search specific fields for exact matches.  These fields are called equality fields.  For example, when the user with “joe123” as their username is logging into an app, we’ll want to find the one document that has the username field equal to “joe123”.  Equality fields must be specified first when defining the index – order matters.

Sometimes we want to select the set of documents that have field values that are within some range of values.  These fields are called range fields.   It is recommended to place range fields before sort fields if the range predicate is very selective to limit the documents that need to be sorted; otherwise list range fields after the sort fields.

Some fields may be used to specify how the documents in the query results will be sorted.  We call these sort fields. For example the query below returns all admins sorted according to their last name.

const users = await User.find( { role: "admin" } )
.sort( { lastName: 1 } )
.exec()

The MongoDB docs advise placing sort fields before range fields (ESR) if you wish to avoid in-memory sorts.

Example

Below we have an example query that has an equality field (role), a range field (startYear), and a sort field (lastname).

const users = await User.find( { role: 'admin', startYear: { $gt: 2020 } } )
.sort( { lastname: 1 } )
.exec()

To support this sort of query we should define the index using the following arrangement of keys.

userSchema.index({ role: 1, lastname: 1, startYear: 1 })

3. If when populating an index MongoDB finds a document that has an index field whose value is an array, then MongoDB sets that index to be a multikey index.  MongoDB then, for each document, creates a key in the B-tree for each value of the array if the key does not yet exist.  This makes performing queries (especially equality matches) on arrays fast.

We can create a compound multikey index so long as no two keys in the index have array values.  If you attempt to add a document that contains two index keys, each having an array value, the insert will fail.

4. Geospacial indexes allow you to query data that is stored as GeoJSON objects or “legacy coordinate pairs“.

5. Text indexes are obsolete indexes used to query fields that hold strings.  MongoDB offers a more advanced index solution, called MongoDB Search, for searching fields that holds strings.  It was announced on September 17, 2025 that MongoDB Search and Vector Search, which are currently available on Atlas, will also be available soon on self-managed MongoDB servers.

6. Sharding is a method for distributing data across multiple computers.  Though, sharding is beyond the scope of this class, we should be aware that hashed indexes are used to enable sharding.

7. A clustered collection is a collection whose documents are stored in order of the _id field rather than the collection’s natural order.   This may be advantageous if we have queries that search a single field using range scans or equality comparisons.  For example, searching for documents that have a timestamp within a range of values. In this case we design the schema so that the timestamp is stored in the _id field.  

We create a clustered collection by creating a clustered index when we create the collection.

Index Options

As mentioned above we have various options when creating indexes.  We can create unique indexes,  case-insensitive indexes, hidden indexes, partial indexes, sparse indexes, and TTL indexes.  Below is a description of each option.

1.  Unique indexes prevent multiple documents from having the same values for the indexed fields.  The unique option can be applied to all index types except hashed indexes.

memberSchema.index( { "username": 1 }, { unique: true } )

2. Case-insensitive indexes allow queries on indexed fields containing text to disregard the case of characters.

fruitSchema.index( { type: 1 }, { collation: { locale: 'en', strength: 2 } } )

3. Hidden indexes cannot be used in queries but retain their other properties. Making an index hidden allows engineers to evaluate the impact of dropping the hidden index.  If the impact is negative the index can be unhidden. 

addressSchema.index( { borough: 1 }, { hidden: true } );

4. Partial indexes index a subset of the documents in a collection, namely only the documents that satisfy the specified filter requirement.  Queries on a partial index must use an expression, similar to the filter expression, that searches for a subset of the documents indexed by the index and must include all indexed fields.  Sparse and _id indexes cannot be partial.

restaurantSchema.index( { cuisine: 1 }, { partialFilterExpression: { rating: { $gt: 5 } } })

5. Sparse indexes only index documents that have values (possibly null) defined for at least one indexed field.

userSchema.index( { score: 1 } , { sparse: true } )

6. TTL indexes are single field indexes that MongoDB can use to delete documents at a specific time or after a certain amount of time.  Note the mongoose option uses the expires property whereas MongoDB createIndex() uses expiresAfterSeconds property.  Mongoose uses the ms npm module to convert various intervals represented as strings into milliseconds and then converts the milliseconds to seconds.

eventLogSchema.index( { "timestamp": 1 }, { expires: '1d' } )

MongoDB Search

MongoDB Search (Search) supersedes the use of MongoDB text indexes, and provides more than just fast string matching.  Search provides apps the ability to do tasks such as predicting words from partial strings (autocomplete), allow users to refine search results using filters, return search results with relevance scores,  incorporate a collection of synonyms when comparing strings, and perform sequential searches to implement pagination.

We can create Search indexes using MongoDB Altas UI, Atlas API, and the Mongo Shell.  Once the indexes have been created we can use Mongoose to create queries that use the Search indexes by creating $search stages in an aggregate pipeline (more on this later).

Designing MongoDB Search Indexes and Queries

The Queries and Indexes documentation contains a useful process for designing, creating, and using Search indexes.   The process includes asking and answering the following questions:

  • What are your users searching for?
    • Documents or meta data
  • Which fields in your documents contain likely search terms?
    • Certain fields, all fields?
    • Static or dynamic set of fields
  • How closely should users’ search terms match your data?
    • Do queries need exact, similar, within a range, or partial matches?
  • Do you need advanced text analysis?
    • Choose between built-in or custom text analyzers
  • How do you want to present search results?
    • Should results be order according to relevance score or sorted?
    • Are results presented to the user is blocks?
    • Should users be able to filter results based on categories or other dimensions?
  • How can you optimize search performance?
    • Run queries concurrently; partition your index?

Please read the document cited above to better understand the capabilities provided by Search.