MongoDB Indexes

Like most database engine MongoDB allows you to optimise queries by creating an index on the collection. Indexes are created and used in a similar way to in SQL Server. There is no concept of a heap in MongoDB, all documents in a collection must have an _id field and the _id field is indexed automatically. You can create additional indexes on a collection in a similar way to creating non clustered index in SQL Server, i.e. a copy of the subset of fields in each document is copied to another location on disk in a specified order. The optimiser will then decide whether it is better to use the index than access the data directly from the collection.

The explain() function

In a similar way to viewing an execution plan in SQL Server you can using the explain() function to return information on how the query is going to be executed including whether any indexes were used. When you use the explain() function with any operation just execution details are returned and not the results of the query itself.

An example

The following code examples are all run in the Mongo shell.  I’m just going to use the following snippet of JavaScript to create some documents in the indexTest collection of the simonBlog database.

> use simonBlog
> db.dropDatabase()
> var someStuff = ""
  someStuff = someStuff.pad(1000)
  for (i=0;i<100;i++) {
    for (j=0;j<10;j++) {
      for (k=0;k<32;k++) {
        db.indexTest.insert({a:i, b:j, c:k, d:someStuff})
      }
    }
  }

This will write 32,000 documents to the collection.

Now if we run a find() to return all documents and use the explain() method as follows…

> db.indexTest.find().explain()

we get the following result…

{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "simonBlog.indexTest",
    "indexFilterSet" : false,
    "parsedQuery" : {
      "$and" : [ ]
     },
    "winningPlan" : {
      "stage" : "COLLSCAN",
      "filter" : {
        "$and" : [ ]
      },
      "direction" : "forward"
    },
    "rejectedPlans" : [ ]
  },
  "serverInfo" : {
    "host" : "Simon-Laptop",
    "port" : 27017,
    "version" : "3.0.1",
    "gitVersion" : "534b5a3f9d10f00cd27737fbcd951032248b5952"
  },
  "ok" : 1
}

The namespace is the database and collection we’re accessing.
The parsedQuery shows us the query executed – this is blank because we are searching for everything.
The winningPlan and rejectedPlans contains details of all the execution plans considered and which one won.
In this case there is only possible plan and there is only one stage to the plan, a collection scan (COLLSCAN) with no filter in the forward direction.

As mentioned above there is an index created on the _id field automatically. We can prove that by running the following…

> db.indexTest.getIndexes()

which gives the following result…

[
  {
    "v" : 1,
    "key" : {
      "_id" : 1
    },
    "name" : "_id_",
    "ns" : "simonBlog.indexTest"
  }
]

Now suppose we want to find all documents where the “b” field equals 4…

> db.indexTest.find({b:4}).explain()
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "simonBlog.indexTest",
    "indexFilterSet" : false,
    "parsedQuery" : {
      "b" : {
        "$eq" : 4
      }
    },
    "winningPlan" : {
      "stage" : "COLLSCAN",
      "filter" : {
        "b" : {
          "$eq" : 4
        }
      },
      "direction" : "forward"
    },
    "rejectedPlans" : [ ]
  },
  "serverInfo" : {
    "host" : "Simon-Laptop",
    "port" : 27017,
    "version" : "3.0.1",
    "gitVersion" : "534b5a3f9d10f00cd27737fbcd951032248b5952"
  },
  "ok" : 1
}

Again we’re doing a collection scan but this time we can see the filter details:

 { "b" : { "$eq" : 4 } }

The explain() method accepts an optional “verbosity” parameter. If this is omitted then the default queryPlanner verbosity is used. The parameters dictates how much information is returned in by the explain() method. You can see in the previous example that a field queryPlanner is returned.

If we run the same query using the executionStats verbosity we get an additional field in the results with the execution statistics

> db.indexTest.find({b:4}).explain("executionStats")
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "simonBlog.indexTest",
    "indexFilterSet" : false,
    "parsedQuery" : {
      "b" : {
        "$eq" : 4
      }
    },
    "winningPlan" : {
      "stage" : "COLLSCAN",
      "filter" : {
        "b" : {
          "$eq" : 4
       }
     },
     "direction" : "forward"
   },
   "rejectedPlans" : [ ]
  },
  "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 3200,
    "executionTimeMillis" : 29,
    "totalKeysExamined" : 0,
    "totalDocsExamined" : 32000,
    "executionStages" : {
      "stage" : "COLLSCAN",
      "filter" : {
        "b" : {
          "$eq" : 4
        }
      },
      "nReturned" : 3200,
      "executionTimeMillisEstimate" : 30,
      "works" : 32002,
      "advanced" : 3200,
      "needTime" : 28801,
      "needFetch" : 0,
      "saveState" : 250,
      "restoreState" : 250,
      "isEOF" : 1,
      "invalidates" : 0,
      "direction" : "forward",
      "docsExamined" : 32000
    }
  },
  "serverInfo" : {
    "host" : "Simon-Laptop",
    "port" : 27017,
    "version" : "3.0.1",
    "gitVersion" : "534b5a3f9d10f00cd27737fbcd951032248b5952"
  },
  "ok" : 1
}

This gives some information about how many documents will be returned by the query (nReturned) and how many documents and index keys were examined (totalDocsExamined and totalKeysExamined respectively). Here we can see that all 32,000 documents were accessed because a collections scan occurred.

The final verbosity option is allPlansExecution and this returns statistics on the other plans considered for execution.

Now lets add an index that might be useful to our query. We’re filtering on the “b” field so lets add an index there.

> db.indexTest.createIndex({"b":1})
{
  "createdCollectionAutomatically" : false,
  "numIndexesBefore" : 1,
  "numIndexesAfter" : 2,
  "ok" : 1
}

I’m using version 3.0 of MongoDB and in previous versions to create an index you had to use ensureIndex() instead of createIndex() which is still available for backwards compatibility.

As you can see the index was successfully created: we did have 1 index (the one on _id) and we now have 2 indexes. Creating the index on {“b”:1} means create the index in ascending order. {“b”:-1} would be in descending order.

> db.indexTest.getIndexes()
[
  {
    "v" : 1,
    "key" : {
      "_id" : 1
    },
    "name" : "_id_",
    "ns" : "simonBlog.indexTest"
  },
  {
    "v" : 1,
    "key" : {
      "b" : 1
    },
    "name" : "b_1",
    "ns" : "simonBlog.indexTest"
  }
]

Now let’s run our query again using the explain() method with the execution statistics on

> db.indexTest.find({b:4}).explain("executionStats")
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "simonBlog.indexTest",
    "indexFilterSet" : false,
    "parsedQuery" : {
      "b" : {
        "$eq" : 4
      }
    },
    "winningPlan" : {
      "stage" : "FETCH",
      "inputStage" : {
        "stage" : "IXSCAN",
        "keyPattern" : {
          "b" : 1
        },
        "indexName" : "b_1",
        "isMultiKey" : false,
        "direction" : "forward",
        "indexBounds" : {
          "b" : [
            "[4.0, 4.0]"
          ]
        }
      }
    },
    "rejectedPlans" : [ ]
  },
  "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 3200,
    "executionTimeMillis" : 171,
    "totalKeysExamined" : 3200,
    "totalDocsExamined" : 3200,
    "executionStages" : {
      "stage" : "FETCH",
      "nReturned" : 3200,
      "executionTimeMillisEstimate" : 10,
      "works" : 3201,
      "advanced" : 3200,
      "needTime" : 0,
      "needFetch" : 0,
      "saveState" : 26,
      "restoreState" : 26,
      "isEOF" : 1,
      "invalidates" : 0,
      "docsExamined" : 3200,
      "alreadyHasObj" : 0,
      "inputStage" : {
        "stage" : "IXSCAN",
        "nReturned" : 3200,
        "executionTimeMillisEstimate" : 10,
        "works" : 3200,
        "advanced" : 3200,
        "needTime" : 0,
        "needFetch" : 0,
        "saveState" : 26,
        "restoreState" : 26,
        "isEOF" : 1,
        "invalidates" : 0,
        "keyPattern" : {
          "b" : 1
        },
        "indexName" : "b_1",
        "isMultiKey" : false,
        "direction" : "forward",
        "indexBounds" : {
          "b" : [
            "[4.0, 4.0]"
          ]
        },
        "keysExamined" : 3200,
        "dupsTested" : 0,
        "dupsDropped" : 0,
        "seenInvalidated" : 0,
        "matchTested" : 0
      }
    }
  },
  "serverInfo" : {
    "host" : "Simon-Laptop",
    "port" : 27017,
    "version" : "3.0.1",
    "gitVersion" : "534b5a3f9d10f00cd27737fbcd951032248b5952"
  },
  "ok" : 1
}

Now we can see that an index scan occurred (IXSCAN) and 3,200 keys were examined (keysExamined). However, you can see from the docsExamined field that we still had to access 3,200 documents. This is an improvement from the 32,000 documents we accessed without the index and MongoDB has to access the actual documents because we are returning all the fields in the documents. Only the “b” field is included in the index so it uses the index to do the filtering then uses the pointers in the index to directly access the correct documents.

Now suppose we only want to return the “c” field in our query. A more useful index would be on the “b” field and then the “c” field. The “b” must be the first one in the index as this is the one used to filter on.

> db.indexTest.createIndex({"b":1,"c":-1})
{
  "createdCollectionAutomatically" : false,
  "numIndexesBefore" : 2,
  "numIndexesAfter" : 3,
  "ok" : 1
}

Now’s let use projections only return the “c” field in the results.

> db.indexTest.find({b:4},{_id:0,c:1}).explain("executionStats")
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "simonBlog.indexTest",
    "indexFilterSet" : false,
    "parsedQuery" : {
      "b" : {
        "$eq" : 4
      }
    },
    "winningPlan" : {
      "stage" : "PROJECTION",
      "transformBy" : {
        "_id" : 0,
        "c" : 1
      },
      "inputStage" : {
        "stage" : "IXSCAN",
        "keyPattern" : {
          "b" : 1,
          "c" : -1
        },
        "indexName" : "b_1_c_-1",
        "isMultiKey" : false,
        "direction" : "forward",
        "indexBounds" : {
          "b" : [
            "[4.0, 4.0]"
          ],
          "c" : [
            "[MaxKey, MinKey]"
          ]
        }
      }
    },
    "rejectedPlans" : [
      {
        "stage" : "PROJECTION",
        "transformBy" : {
          "_id" : 0,
          "c" : 1
        },
        "inputStage" : {
          "stage" : "FETCH",
          "inputStage" : {
            "stage" : "IXSCAN",
            "keyPattern" : {
              "b" : 1
            },
            "indexName" : "b_1",
            "isMultiKey" : false,
            "direction" : "forward",
            "indexBounds" : {
              "b" : [
                "[4.0, 4.0]"
              ]
            }
          }
        }
      }
    ]
  },
  "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 3200,
    "executionTimeMillis" : 4,
    "totalKeysExamined" : 3200,
    "totalDocsExamined" : 0,
    "executionStages" : {
      "stage" : "PROJECTION",
      "nReturned" : 3200,
      "executionTimeMillisEstimate" : 0,
      "works" : 3201,
      "advanced" : 3200,
      "needTime" : 0,
      "needFetch" : 0,
      "saveState" : 26,
      "restoreState" : 26,
      "isEOF" : 1,
      "invalidates" : 0,
      "transformBy" : {
        "_id" : 0,
        "c" : 1
      },
      "inputStage" : {
        "stage" : "IXSCAN",
        "nReturned" : 3200,
        "executionTimeMillisEstimate" : 0,
        "works" : 3201,
        "advanced" : 3200,
        "needTime" : 0,
        "needFetch" : 0,
        "saveState" : 26,
        "restoreState" : 26,
        "isEOF" : 1,
        "invalidates" : 0,
        "keyPattern" : {
          "b" : 1,
          "c" : -1
        },
        "indexName" : "b_1_c_-1",
        "isMultiKey" : false,
        "direction" : "forward",
        "indexBounds" : {
          "b" : [
            "[4.0, 4.0]"
          ],
          "c" : [
            "[MaxKey, MinKey]"
          ]
        },
        "keysExamined" : 3200,
        "dupsTested" : 0,
        "dupsDropped" : 0,
        "seenInvalidated" : 0,
        "matchTested" : 0
      }
    }
  },
  "serverInfo" : {
    "host" : "Simon-Laptop",
    "port" : 27017,
    "version" : "3.0.1",
    "gitVersion" : "534b5a3f9d10f00cd27737fbcd951032248b5952"
  },
  "ok" : 1
}

The executionStats field contains details on the winning plan and we can now see that 3,200 keys are still examined but now 0 documents have been accessed. This index now covers this query. If you compare the executionTimeMillisEstimate now we have 0 (meaning less than 1) where when we still had to access the documents it was 10. If you can see it took 30 milliseconds when there was no index at all.  So having the covered index is over 30 times faster.

In this last result from explain() you can see that MongoDB did consider using the first index we created just on the “b” field but this plan was rejected.

All the examples above have been on fields with simple data types, either strings or integers. It is possible to index fields that contain arrays and embedded documents.  The creation of the index is the same, just how they are stored is slightly different.  More on that in a future post.

The documentation on the MongoDB website is pretty good once you’ve got a basic understanding of how things work and there are more details on the different fields returned by the explain() method.

Advertisements

One thought on “MongoDB Indexes

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s