MongoDB Indexes

Like most database engines, MongoDB allows you to optimise queries by creating indexes on a collection. Indexes are created and used in a similar way to SQL Server. There is no concept of a heap in MongoDB: every document in a collection must have an _id field, and the _id field is indexed automatically. You can create additional indexes on a collection in a similar way to creating nonclustered indexes in SQL Server, i.e. a copy of a subset of fields from each document is stored in another location on disk in a specified order. The optimiser then decides whether it is better to use the index or to access the data directly from the collection.

The explain() function

In a similar way to viewing an execution plan in SQL Server, you can use the explain() function to return information on how a query is going to be executed, including whether any indexes were used. When you use the explain() function with an operation, only the execution details are returned, not the results of the query itself.

An example

The following code examples are all run in the Mongo shell.  I’m just going to use the following snippet of JavaScript to create some documents in the indexTest collection of the simonBlog database.

> use simonBlog
> db.dropDatabase()
> var someStuff = ""
  someStuff = someStuff.pad(1000)   // pad() is a mongo shell helper (not standard JavaScript) that pads the string with spaces to 1,000 characters
  for (var i = 0; i < 100; i++) {
    for (var j = 0; j < 10; j++) {
      for (var k = 0; k < 32; k++) {
        db.indexTest.insert({a: i, b: j, c: k, d: someStuff})
      }
    }
  }

This will write 32,000 documents to the collection.

Now if we run a find() to return all documents and use the explain() method as follows…

> db.indexTest.find().explain()

we get the following result…

{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "simonBlog.indexTest",
    "indexFilterSet" : false,
    "parsedQuery" : {
      "$and" : [ ]
     },
    "winningPlan" : {
      "stage" : "COLLSCAN",
      "filter" : {
        "$and" : [ ]
      },
      "direction" : "forward"
    },
    "rejectedPlans" : [ ]
  },
  "serverInfo" : {
    "host" : "Simon-Laptop",
    "port" : 27017,
    "version" : "3.0.1",
    "gitVersion" : "534b5a3f9d10f00cd27737fbcd951032248b5952"
  },
  "ok" : 1
}

The namespace is the database and collection we’re accessing.
The parsedQuery shows us the query being executed – this is empty because we are searching for everything.
The winningPlan and rejectedPlans fields contain details of all the execution plans considered and which one won.
In this case there is only one possible plan and it has a single stage: a collection scan (COLLSCAN) with no filter, in the forward direction.

As mentioned above, an index is created on the _id field automatically. We can prove that by running the following…

> db.indexTest.getIndexes()

which gives the following result…

[
  {
    "v" : 1,
    "key" : {
      "_id" : 1
    },
    "name" : "_id_",
    "ns" : "simonBlog.indexTest"
  }
]

Now suppose we want to find all documents where the “b” field equals 4…

> db.indexTest.find({b:4}).explain()
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "simonBlog.indexTest",
    "indexFilterSet" : false,
    "parsedQuery" : {
      "b" : {
        "$eq" : 4
      }
    },
    "winningPlan" : {
      "stage" : "COLLSCAN",
      "filter" : {
        "b" : {
          "$eq" : 4
        }
      },
      "direction" : "forward"
    },
    "rejectedPlans" : [ ]
  },
  "serverInfo" : {
    "host" : "Simon-Laptop",
    "port" : 27017,
    "version" : "3.0.1",
    "gitVersion" : "534b5a3f9d10f00cd27737fbcd951032248b5952"
  },
  "ok" : 1
}

Again we’re doing a collection scan but this time we can see the filter details:

 { "b" : { "$eq" : 4 } }

The explain() method accepts an optional “verbosity” parameter. If this is omitted then the default queryPlanner verbosity is used. The parameter dictates how much information is returned by the explain() method. You can see in the previous example that a queryPlanner field is returned.

If we run the same query using the executionStats verbosity we get an additional field in the results containing the execution statistics…

> db.indexTest.find({b:4}).explain("executionStats")
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "simonBlog.indexTest",
    "indexFilterSet" : false,
    "parsedQuery" : {
      "b" : {
        "$eq" : 4
      }
    },
    "winningPlan" : {
      "stage" : "COLLSCAN",
      "filter" : {
        "b" : {
          "$eq" : 4
       }
     },
     "direction" : "forward"
   },
   "rejectedPlans" : [ ]
  },
  "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 3200,
    "executionTimeMillis" : 29,
    "totalKeysExamined" : 0,
    "totalDocsExamined" : 32000,
    "executionStages" : {
      "stage" : "COLLSCAN",
      "filter" : {
        "b" : {
          "$eq" : 4
        }
      },
      "nReturned" : 3200,
      "executionTimeMillisEstimate" : 30,
      "works" : 32002,
      "advanced" : 3200,
      "needTime" : 28801,
      "needFetch" : 0,
      "saveState" : 250,
      "restoreState" : 250,
      "isEOF" : 1,
      "invalidates" : 0,
      "direction" : "forward",
      "docsExamined" : 32000
    }
  },
  "serverInfo" : {
    "host" : "Simon-Laptop",
    "port" : 27017,
    "version" : "3.0.1",
    "gitVersion" : "534b5a3f9d10f00cd27737fbcd951032248b5952"
  },
  "ok" : 1
}

This gives some information about how many documents were returned by the query (nReturned) and how many documents and index keys were examined (totalDocsExamined and totalKeysExamined respectively). Here we can see that all 32,000 documents were accessed because a collection scan occurred.

The final verbosity option is allPlansExecution and this returns statistics on the other plans considered for execution.
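
For example, once we have more than one usable index you can compare the plans with something like this (the same query as above, just with the extra verbosity):

> db.indexTest.find({b:4}).explain("allPlansExecution")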

Now let’s add an index that might be useful to our query. We’re filtering on the “b” field so let’s add an index there.

> db.indexTest.createIndex({"b":1})
{
  "createdCollectionAutomatically" : false,
  "numIndexesBefore" : 1,
  "numIndexesAfter" : 2,
  "ok" : 1
}

I’m using version 3.0 of MongoDB. In previous versions you had to use ensureIndex() instead of createIndex() to create an index; ensureIndex() is still available for backwards compatibility.

As you can see the index was successfully created: we had 1 index (the one on _id) and we now have 2. Creating the index on {"b":1} means the index is created in ascending order; {"b":-1} would create it in descending order.

> db.indexTest.getIndexes()
[
  {
    "v" : 1,
    "key" : {
      "_id" : 1
    },
    "name" : "_id_",
    "ns" : "simonBlog.indexTest"
  },
  {
    "v" : 1,
    "key" : {
      "b" : 1
    },
    "name" : "b_1",
    "ns" : "simonBlog.indexTest"
  }
]

Now let’s run our query again using the explain() method with the executionStats verbosity…

> db.indexTest.find({b:4}).explain("executionStats")
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "simonBlog.indexTest",
    "indexFilterSet" : false,
    "parsedQuery" : {
      "b" : {
        "$eq" : 4
      }
    },
    "winningPlan" : {
      "stage" : "FETCH",
      "inputStage" : {
        "stage" : "IXSCAN",
        "keyPattern" : {
          "b" : 1
        },
        "indexName" : "b_1",
        "isMultiKey" : false,
        "direction" : "forward",
        "indexBounds" : {
          "b" : [
            "[4.0, 4.0]"
          ]
        }
      }
    },
    "rejectedPlans" : [ ]
  },
  "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 3200,
    "executionTimeMillis" : 171,
    "totalKeysExamined" : 3200,
    "totalDocsExamined" : 3200,
    "executionStages" : {
      "stage" : "FETCH",
      "nReturned" : 3200,
      "executionTimeMillisEstimate" : 10,
      "works" : 3201,
      "advanced" : 3200,
      "needTime" : 0,
      "needFetch" : 0,
      "saveState" : 26,
      "restoreState" : 26,
      "isEOF" : 1,
      "invalidates" : 0,
      "docsExamined" : 3200,
      "alreadyHasObj" : 0,
      "inputStage" : {
        "stage" : "IXSCAN",
        "nReturned" : 3200,
        "executionTimeMillisEstimate" : 10,
        "works" : 3200,
        "advanced" : 3200,
        "needTime" : 0,
        "needFetch" : 0,
        "saveState" : 26,
        "restoreState" : 26,
        "isEOF" : 1,
        "invalidates" : 0,
        "keyPattern" : {
          "b" : 1
        },
        "indexName" : "b_1",
        "isMultiKey" : false,
        "direction" : "forward",
        "indexBounds" : {
          "b" : [
            "[4.0, 4.0]"
          ]
        },
        "keysExamined" : 3200,
        "dupsTested" : 0,
        "dupsDropped" : 0,
        "seenInvalidated" : 0,
        "matchTested" : 0
      }
    }
  },
  "serverInfo" : {
    "host" : "Simon-Laptop",
    "port" : 27017,
    "version" : "3.0.1",
    "gitVersion" : "534b5a3f9d10f00cd27737fbcd951032248b5952"
  },
  "ok" : 1
}

Now we can see that an index scan occurred (IXSCAN) and 3,200 keys were examined (keysExamined). However, you can see from the docsExamined field that we still had to access 3,200 documents. This is an improvement on the 32,000 documents we accessed without the index, but MongoDB still has to access the actual documents because we are returning all the fields in each document. Only the “b” field is included in the index, so it uses the index to do the filtering and then uses the pointers in the index to access the correct documents directly.

Now suppose we only want to return the “c” field in our query. A more useful index would be on the “b” field and then the “c” field. The “b” field must come first in the index as it is the one we filter on.

> db.indexTest.createIndex({"b":1,"c":-1})
{
  "createdCollectionAutomatically" : false,
  "numIndexesBefore" : 2,
  "numIndexesAfter" : 3,
  "ok" : 1
}

Now let’s use a projection to only return the “c” field in the results.

> db.indexTest.find({b:4},{_id:0,c:1}).explain("executionStats")
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "simonBlog.indexTest",
    "indexFilterSet" : false,
    "parsedQuery" : {
      "b" : {
        "$eq" : 4
      }
    },
    "winningPlan" : {
      "stage" : "PROJECTION",
      "transformBy" : {
        "_id" : 0,
        "c" : 1
      },
      "inputStage" : {
        "stage" : "IXSCAN",
        "keyPattern" : {
          "b" : 1,
          "c" : -1
        },
        "indexName" : "b_1_c_-1",
        "isMultiKey" : false,
        "direction" : "forward",
        "indexBounds" : {
          "b" : [
            "[4.0, 4.0]"
          ],
          "c" : [
            "[MaxKey, MinKey]"
          ]
        }
      }
    },
    "rejectedPlans" : [
      {
        "stage" : "PROJECTION",
        "transformBy" : {
          "_id" : 0,
          "c" : 1
        },
        "inputStage" : {
          "stage" : "FETCH",
          "inputStage" : {
            "stage" : "IXSCAN",
            "keyPattern" : {
              "b" : 1
            },
            "indexName" : "b_1",
            "isMultiKey" : false,
            "direction" : "forward",
            "indexBounds" : {
              "b" : [
                "[4.0, 4.0]"
              ]
            }
          }
        }
      }
    ]
  },
  "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 3200,
    "executionTimeMillis" : 4,
    "totalKeysExamined" : 3200,
    "totalDocsExamined" : 0,
    "executionStages" : {
      "stage" : "PROJECTION",
      "nReturned" : 3200,
      "executionTimeMillisEstimate" : 0,
      "works" : 3201,
      "advanced" : 3200,
      "needTime" : 0,
      "needFetch" : 0,
      "saveState" : 26,
      "restoreState" : 26,
      "isEOF" : 1,
      "invalidates" : 0,
      "transformBy" : {
        "_id" : 0,
        "c" : 1
      },
      "inputStage" : {
        "stage" : "IXSCAN",
        "nReturned" : 3200,
        "executionTimeMillisEstimate" : 0,
        "works" : 3201,
        "advanced" : 3200,
        "needTime" : 0,
        "needFetch" : 0,
        "saveState" : 26,
        "restoreState" : 26,
        "isEOF" : 1,
        "invalidates" : 0,
        "keyPattern" : {
          "b" : 1,
          "c" : -1
        },
        "indexName" : "b_1_c_-1",
        "isMultiKey" : false,
        "direction" : "forward",
        "indexBounds" : {
          "b" : [
            "[4.0, 4.0]"
          ],
          "c" : [
            "[MaxKey, MinKey]"
          ]
        },
        "keysExamined" : 3200,
        "dupsTested" : 0,
        "dupsDropped" : 0,
        "seenInvalidated" : 0,
        "matchTested" : 0
      }
    }
  },
  "serverInfo" : {
    "host" : "Simon-Laptop",
    "port" : 27017,
    "version" : "3.0.1",
    "gitVersion" : "534b5a3f9d10f00cd27737fbcd951032248b5952"
  },
  "ok" : 1
}

The executionStats field contains details of the winning plan and we can now see that 3,200 keys are still examined but 0 documents have been accessed. This index now covers the query. The executionTimeMillisEstimate is now 0 (meaning less than 1 millisecond), compared to 10 milliseconds when we still had to access the documents and around 30 milliseconds when there was no index at all. So having a covering index makes this query over 30 times faster.

In this last result from explain() you can see that MongoDB did consider using the first index we created, just on the “b” field, but that plan was rejected.

All the examples above have been on fields with simple data types, either strings or integers. It is also possible to index fields that contain arrays and embedded documents.  The creation of the index is the same; it’s just how they are stored that is slightly different.  More on that in a future post.
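
As a quick taster, and purely as a sketch (the tags array and the info embedded document are hypothetical fields that don’t exist in the indexTest collection above), the creation syntax looks just the same:

> db.indexTest.insert({a:1, tags:["mongo","index"], info:{x:1, y:2}})
> db.indexTest.createIndex({"tags":1})      // becomes a multikey index because tags is an array
> db.indexTest.createIndex({"info.x":1})    // index on a field inside an embedded document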

The documentation on the MongoDB website is pretty good once you’ve got a basic understanding of how things work and there are more details on the different fields returned by the explain() method.

MongoDB Storage Engines

I’m currently participating in the free “MongoDB for DBAs” and “MongoDB for .Net Developers” online courses on the https://university.mongodb.com/ website.  This week in the DBA course they talked about how the storage engine works in MongoDB.  It was quite interesting stuff and here are some of my notes…

MongoDB Storage Engine

The storage engine is the part of the MongoDB server that runs all the CRUD commands and it affects how the data is read from, written to and removed from disk, as well as how it is physically stored on the disk.

In MongoDB 3.0 you now get a choice of storage engine.

In earlier versions you only had the original MMAP v1 engine, which is still the default in 3.0. Now you also have the option to use the Wired Tiger storage engine, a newly supported, open source storage engine that is also used by other databases.

When you start the mongod server you can specify which storage engine to use, e.g.

mongod --storageEngine mmapv1

or

mongod --storageEngine wiredTiger
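
Alternatively you can set the engine in the mongod configuration file. This is just a minimal sketch of the YAML format, assuming a data path of /data/db, so adjust it for your own setup:

storage:
  dbPath: /data/db
  engine: wiredTiger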

If you’ve specified a storage engine you’ll see it in the log when mongod starts, but not if you’re just using the default.

01

You can also see it by running the following from the shell

db.serverStatus()

02
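
If you only want the storage engine details rather than the full serverStatus() output, you should be able to pull out just that section (this relies on the storageEngine field that 3.0 adds to serverStatus):

db.serverStatus().storageEngine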

Your MongoDB instance can only run either MMAP v1 or Wired Tiger. You cannot mix MMAP v1 and Wired Tiger data files in your data path directory.

03

MMAP v1

MMAP v1 stores the data as BSON and maps the raw BSON directly into virtual memory, which allows the operating system to do most of the work for the storage engine. If the files are not already in memory then they get pulled into RAM, and any updates get propagated back to disk.

Collection level locking is available in version 3.0; versions 2.2 to 2.6 had database level locking.

The MongoDB lock is a “multiple reader, single writer” lock, which means you can have multiple readers on the data that will lock out all writers, whereas a single writer will lock out all readers and all other writers.

MMAP v1 doesn’t have document level locking because there are other shared resources that can cause contention. For example, if two separate documents exist in the same index, then an update to the first document will result in an update to the index and an update to the second document will result in another update to the index. If the updates to both documents occur simultaneously then we may have issues updating the index.

The journal is used a bit like the transaction log in SQL Server. Any update is written to the journal before it is applied to the database. This means that if the file system or the service goes down, the updates in the journal that have not yet been applied can be replayed when everything is back up and running.

As soon as you create a document in a collection in a database, a 64MB data file gets created. When the size of the database increases and the original 64MB file is filled, a new file of double the size is created, i.e. a 128MB file. As the database grows, each new data file is double the size of the last until the 2GB limit is reached. Once it reaches this limit, new 2GB files keep being created.

04
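
You can keep an eye on this growth from the shell; db.stats() reports, among other things, the total size of the data files allocated for the current database:

db.stats()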

Prior to version 3.0, MongoDB would add some padding if it saw documents increasing in size and would try to guess how much padding to add. In version 3.0 they introduced power of 2 sized allocations, similar to what they do with the data files. If a small document of less than 32 bytes is inserted it will be allocated a 32B slot in the data file. If the document then grows to 33B it will be moved to a 64B slot. The size of the allocation slot will be a power of 2 up to the 2MB limit, and after this further 2MB slots get allocated up to the 16MB document size limit. If a document has to move on disk then all the pointers to it in any indexes need to be updated, which has a performance overhead. This power of 2 padding doesn’t prevent document movement but it can reduce it, and a document that has moved leaves behind a standardised slot in the data file for a new document to fit into nicely.

If you have a database where the documents will not grow, or you create placeholders in the documents to give them a certain size, you can disable the padding by explicitly creating the collection with no padding, i.e.

db.createCollection("simonTest", {noPadding:true})

Wired Tiger

There are three main benefits to Wired Tiger:

  1. Performance gains
  2. Document level locking
  3. Compression

This is the first release to include Wired Tiger, so they are expecting more improvements in MongoDB when using it.

Wired Tiger data is stored per collection rather than per database and is stored in B-trees. Any new documents are added to a separate region of the data files and are merged in with the others in the background. When a document is updated, the new version is written as a new document and the old version is removed. Because of this there is no need for padding.  Wired Tiger can apparently perform this with relative ease.

There are two caches: the Wired Tiger Cache and the File System Cache. Changes in the Wired Tiger Cache are moved into the File System Cache via a checkpoint. A checkpoint runs 60 seconds after the last checkpoint ended, and the previous-but-one checkpoint is not deleted until the next checkpoint has finished. This means that you could turn off the journal but you could still potentially lose the last 60 seconds of data. Once in the File System Cache, the changes are flushed to disk.

Wired Tiger has concurrency protocols that are the equivalent of document level locking.

In MMAP v1 the raw BSON is held in memory in the same format as it is stored on disk, so no compression is available. With Wired Tiger the data format in the Wired Tiger Cache is different from the format on the file system, so compression is available. There are two types of compression you can enable in the Wired Tiger storage engine: Snappy or zLib, which are both open source compression libraries. The default is Snappy, which is less processor intensive than zLib but won’t compress the data as much.
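
The collection compressor can be chosen when mongod is started; for example (this flag only applies when running the Wired Tiger engine):

mongod --storageEngine wiredTiger --wiredTigerCollectionBlockCompressor zlib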

For more information on the MongoDB storage engine see the documentation on their website.

Back in the habit

It’s been a while since I last blogged anything and this has mainly been down to a change of jobs. I spent the first few months trying to understand the pretty big and quite complicated system at my new place and got out of the habit of blogging. So the purpose of this post is just to get back to it and there’s nothing much of any use here for anyone looking for any technical tips.

Since I started my new job I’ve had to learn a bit more about Service Broker, change tracking, change data capture and partitioning, which are used extensively in our existing system. I’m hoping to write a few posts about things I’ve found since using these parts of SQL Server.

I’ve also started looking at the data warehousing 2012 exam (70-463) and am currently going through the guide book so might post a few bits about my learning there.

I recently attended SQLBits in London and will put up a few bits about that when I get the chance, at least some links to the blogs of some of the people I saw. Bob Ward’s memory session on the Friday morning was a highlight for me.

While at SQLBits I attended a session titled “7 databases in 70 minutes”, the idea for which arose from the book “7 databases in 7 weeks”, which I own but have not yet read. Off the back of this I had another look at the MongoDB website and found they’ve just released version 3.0 and are running more of their free online courses. I’ve signed up for both the DBA and .Net developer courses and we’re currently in week 3 of the 7 week course (100% grade so far on my homework for both courses – it’s been a while since I looked at any C# code). There is still time to register if anyone wants to learn more about MongoDB and I’m hoping to blog a few things around the DBA side, including data storage and performance. They have a new storage engine named Wired Tiger that is available in version 3.0 and this looks to have a few advantages over their existing MMAP engine, including document level locking. More on that at some point soon.

Using Octopus Deploy to Publish SSDT Projects Built with TeamCity

In another post I talked about how we were asked to move away from TFS to start using GitHub, TeamCity and Octopus Deploy.  In a further post I demonstrated a way of building and publishing an SSDT project under source control in a local git repository using TeamCity.  In this post I’m going to describe how to install Octopus Deploy and how to use it to publish an SSDT project built with TeamCity.

Overview

Before I start going into the details of how to build and publish an SSDT project using TeamCity and Octopus Deploy, I thought I’d run over how the end-to-end process works.

  • An SSDT project exists in source control, for example GitHub or TFS
  • TeamCity is configured to listen for changes made to the source control repository and triggers a build of the SSDT project whenever a check-in occurs
  • Once the build is complete a NuGet package is created that contains the DACPAC for the SSDT project and a Deploy.ps1 PowerShell file that actually does the deployment
  • The NuGet package gets fed to Octopus Server, which is configured to run the Deploy.ps1 file against the required Tentacle

Installation

There are three parts to Octopus Deploy that we need to install to get it all working in the way we want, and all of these are available from the Octopus Deploy download page. Firstly, there is Octopus Server. This is the bit that controls the deployment and should be installed somewhere appropriate, for example on your build server or on its own box. Secondly, there is the Octopus Tentacle. This needs to be installed on every server that you want to deploy to. So if your environment has three database servers, two app servers and four web servers you’ll need to install it on every one. Octopus Server then connects to each of the Tentacles and deploys to them. One single Octopus Server can control the deployment to all your environments, so you can deploy to CI, staging and live all from one box. Finally, there is a TeamCity plugin for Octopus that can be installed to control the publishing directly from TeamCity.  In this example I’m running everything on my local box which is running Windows 7.

Octopus Server

  • Firstly, download the latest version of the Octopus Server from the Octopus Deploy website.  In this example I’ve downloaded the 64 bit version 2.4.5.46.
  • Run the msi and the installation wizard will start.  Accept the terms and conditions and choose an appropriate install path – I’m using D:\Octopus – then click Install.
  • Once the install has completed open the Octopus Manager (if it hasn’t opened already).
  • You should see a Welcome! screen so click on “Get started…” and make your way through to the Licence section.
  • You can download a 45 day trial licence by clicking on the link and filling in the small form on the website.  You’ll then get a licence key emailed to you, which is a piece of XML you can paste into the box in the Octopus Setup Wizard.
  • Choose a storage location – again I’m just using D:\Octopus.
  • Choose an HTTP listen port – I’m using 785.
  • Set up the authentication – I’m just using a username and password stored in Octopus, but if you’re using this for something other than a proof of concept you might want to use an Active Directory account.
  • Once the install has completed you should have a Windows service named “OctopusDeploy” installed and you can browse to http://localhost:785.  You can configure the Octopus Server settings using the Octopus Manager.
Octopus Manager after successful installation

Octopus Tentacle

  • Firstly, download the latest version of the Octopus Tentacle from the Octopus Deploy website.  In this example I’ve downloaded the 64 bit version 2.4.5.46.
  • Run the msi and the installation wizard will start.  Accept the terms and conditions and choose an appropriate install path – I’m using D:\OctopusTentacle – then click Install.
  • Once the install has completed open the Tentacle Manager (if it hasn’t opened already).
  • You should see a Welcome! screen so click on “Get started…”
  • Choose appropriate storage paths – I’m using D:\OctopusTentacle.
  • I used the default Listening Tentacle communication style.
  • Browse to your Octopus Server, e.g. http://localhost:785, and click on the Configuration tab.
  • Click on Certificates and copy the thumbprint value.  This needs to be added to every Tentacle you set up.
  • Back in the Tentacle Setup Wizard, paste the Octopus thumbprint in and choose an appropriate listen port – I’m using the default 10933.
  • Once the install has completed you should have a Windows service named “OctopusDeploy Tentacle” installed. You can configure the Octopus Tentacle settings using the Tentacle Manager.
Tentacle Manager after successful installation

Configure Environment

Now that we’ve got an instance of Octopus Server and a Tentacle, we can configure our first environment.

  • Browse to your Octopus server, e.g. http://localhost:785, and click on the Environments tab
  • Add a new environment and add a new machine
  • Enter the name of the machine you installed the Tentacle on and click Discover.  If the server is able to connect to the Tentacle you should see something like the following…

Octopus3

  • The value in the Thumbprint box will match the thumbprint shown in the Communications section on your Tentacle Manager
  • See the Octopus documentation for how to use the Environments and Roles values in the Deployment section – for this example I’m just going to give this tentacle a role of “Database”
  • Click on Save and you should now see your machine added.
  • If you have multiple environments with multiple machines you would add them all in the same way.

Configure Communication between TeamCity and Octopus Deploy

I believe there are at least a couple of ways to do this, one of them being to use one of the build runner types added as part of the TeamCity Octopus plugin, but for this example I’m going to set up a NuGet feed.  If you’ve not already set up TeamCity please see my previous post. For the remaining steps I’m assuming you’ve got TeamCity set up and connected to some source control.

  • Browse to your TeamCity instance and click on Administration in the top right corner
  • Click on the NuGet Settings under Integrations and enable the NuGet Server.  You will then be given an Authenticated Feed URL.  Copy this URL.

Octopus4

  • Next click on the NuGet.exe tab, then “Fetch NuGet”.  Choose the version of NuGet you want to install and click Add.

Octopus5

  • Browse to your Octopus Server site and click on the Library tab.
  • Under the “External feeds” tab click “Add feed”.
  • Give it an appropriate name, paste in the NuGet Server URL from TeamCity and enter the username and password for your TeamCity account.

Octopus6

  • Click on “Save and test”.  Ignore the test for the moment; we first need to create a NuGet package containing our DACPAC and then we can test it.

Octopack – tricking SSDT into using it

If you’ve not already got an SSDT project that you want to publish then create a test one now.  I’ve just created a very simple one named “SimonOctopusPlay” with a single table named “SimonTest” with a single column named “Id”. In order for Octopus to deploy our DACPAC it needs to be passed in the form of an artefact.  In this case the artefact is a NuGet package containing the DACPAC and a PowerShell script to do the deployment. Octopus provides a NuGet package named OctoPack that can be added to certain project types, and a TeamCity plugin that handles the creation of the artefact for you.

Octopus8

However, OctoPack does not support SSDT projects.

Octopus9

The reason it doesn’t support SSDT projects is that what we’re trying to put in the artefact is not what Octopus is expecting.  Octopus wants to be passed the actual change scripts rather than the full model inside the DACPAC.  For this reason Octopus only supports a change-script-driven deployment approach rather than a model-driven approach, which is what SSDT uses.  Octopus supports ReadyRoll and DbUp, as shown in their documentation.  However, change-script-driven development seems absolutely awful to me.  I’ve only had a brief play with ReadyRoll but in my opinion it’s nowhere near as good a tool as SSDT.  If you’re a developer who does front end, application and database development and you have a pretty simple data model then change-script-driven development may be better for you, but if you are a database specialist I would recommend SSDT.  At the end of the day these are all tools that you can get working, so it’s down to which ones you prefer. Even though OctoPack doesn’t support SSDT we can trick it quite simply.

  • There may be another way to get OctoPack but one way is to add a console application to your solution and add the OctoPack package to it.
  • Now within your solution folder you’ll have a “packages” folder that contains the OctoPack package.
  • You can now delete the console app and open the SSDT project file in a text editor.
  • Manually add an import to the bottom of the project file as follows…
<Import Project="..\packages\OctoPack.3.0.31\tools\OctoPack.targets" />

I’m not sure why it has to go at the bottom, but if it doesn’t the TeamCity plugin doesn’t seem to pick it up.

If you’ve not already done so, we need to get the solution folder into source control for TeamCity to pick up.  I briefly mentioned how to do this in my previous post. Now we need to install the TeamCity plugin for Octopus.

Installation – TeamCity plugin

Now that we’ve tricked our SSDT project into using OctoPack, we need to install the TeamCity plugin to allow the build to use it.

  • Firstly, download the latest version of the TeamCity plugin from the Octopus Deploy website.
  • Copy the downloaded zip file to C:\ProgramData\JetBrains\TeamCity\plugins
  • As soon as you drop it in here it will automatically be unpacked into the .unpacked subdirectory

Octopus11

  • Now restart the TeamCity Server windows service.

Octopack – create NuGet package

Now we can set up our build definition in TeamCity as per my previous post, but only building the solution rather than publishing it.

As we’ve installed the TeamCity plugin you will now see an additional couple of settings under “Octopus Packaging”.

Octopus12

Now is a good time to ensure the build step in TeamCity runs okay.  If it does, we can then tell it to use OctoPack to create a NuGet package containing the DACPAC. To do this we need to create two files within each SSDT project we want to publish. Firstly, create a .nuspec file with the same name as the SSDT project, for example SimonOctopusPlay.nuspec.  This .nuspec file controls what files need to be included in the Octopus artefact and should look something like this…

<?xml version="1.0"?>
<package xmlns="http://schemas.microsoft.com/packaging/2010/07/nuspec.xsd">
  <metadata>
    <id>SimonOctopusPlay</id>
    <version>1.0.0.1</version>
    <authors>Simon Peacock</authors>
    <description>Package for SimonOctopusPlay</description>
  </metadata>
  <files>
    <file src="\bin\Release\SimonOctopusPlay.dacpac" target="Content" />
    <file src="\bin\Release\Deploy.ps1" />
  </files>
</package>

Octopus13

Now we need to create the Deploy.ps1 file.  This is a PowerShell script that Octopus will automatically run, if it exists, during each deployment.  This is the thing that does the actual deployment and we just use the Microsoft.SqlServer.Dac.dll methods to do it. For full details on what you can do with this see MSDN. The DacDeployOptions match the options you can configure when publishing in Visual Studio.

Octopus14

Here is an example of a Deploy.ps1 script

# Set params
$databaseName = "SimonOctopusPlay"

# Add the DLL
# For 32-bit machines
#Add-Type -path "C:\Program Files\Microsoft SQL Server\110\DAC\bin\Microsoft.SqlServer.Dac.dll"
# For 64-bit machines
Add-Type -path "C:\Program Files (x86)\Microsoft SQL Server\110\DAC\bin\Microsoft.SqlServer.Dac.dll"

# Create the DacServices object for the target server
$d = New-Object Microsoft.SqlServer.Dac.DacServices "server=(local)"

$dacpac = (Get-Location).Path + "\Content\" + $databaseName + ".dacpac"

Write-Host $dacpac

# Load the dacpac from file
$dp = [Microsoft.SqlServer.Dac.DacPackage]::Load($dacpac)

# Set the DacDeployOptions
$options = New-Object Microsoft.SqlServer.Dac.DacDeployOptions -Property @{
 'BlockOnPossibleDataLoss' = $true;
 'DropObjectsNotInSource' = $false;
 'ScriptDatabaseOptions' = $true;
 'IgnorePermissions' = $true;
 'IgnoreRoleMembership' = $true
}

# Generate the deployment script
$deployScriptName = $databaseName + ".sql"
$deployScript = $d.GenerateDeployScript($dp, $databaseName, $options)

# Return the script to the log
Write-Host $deployScript

# Write the script out to a file
$deployScript | Out-File $deployScriptName

# Deploy the dacpac
$d.Deploy($dp, $databaseName, $true, $options)

The Deploy.ps1 file must be included in the output of the build, so make sure “Copy to Output Directory” for the file is set to “Copy always” or “Copy if newer”.

Octopus17

Now we can check in our .nuspec and .ps1 files and update TeamCity to create the NuGet package that will be used as the Octopus Deploy artefact.

TeamCity – running the build

We can now tick the “Run OctoPack” option in our build step and try the build again.

Octopus18

Please note: if you are using a git repository for your source control then you will probably see the following error when you run the build.

error MSB4062: The “OctoPack.Tasks.GetAssemblyVersionInfo” task could not be loaded from the assembly C:\TeamCity\buildAgent\work\245900cf36cf882e\packages\OctoPack.3.0.31\tools\OctoPack.Tasks.dll.

The DLL in question is part of the OctoPack package but your global gitignore file is probably set to ignore all DLL files, so it is not actually checked into your repository.  Simply update your global gitignore to not ignore this particular DLL and you should then be able to check it into your git repo.  You’ll need to do the same for the NuGet.exe file in the same folder.

Octopus19

Now you should hopefully have a successful build and you should see that an artefact has been created for the database project containing the Deploy.ps1 file and the compiled DACPAC.

Octopus21

Octopus Deploy

Now that we have our artefact it should be available to Octopus for deployment.  If we go to our Octopus Server web page and click on the Projects tab we can create a new deployment project.  Then under the Process tab we can add a deployment step to “Deploy a NuGet package”.  Choose the NuGet feed that we set up earlier and enter the name of the NuGet package as the NuGet package ID.

Octopus22

Now we can click the “Create Release” button in the top right corner and Save the release. Then we can deploy this release to our environment.  If it is successful the database and table should have been created.

The Octopus TeamCity plugin also gives you three additional build runners: OctopusDeploy: Create release, OctopusDeploy: Deploy release and OctopusDeploy: Promote release.  These can be used to automate the release and deployment through TeamCity rather than doing it manually through Octopus.

Advanced Settings

If you have a database project that has a database reference to another project or DACPAC then you need to make sure that you add the DACPACs for all the referenced projects into the NuGet package so they are available when the Deploy.ps1 file is run.
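
For example, if the project referenced a hypothetical SimonOctopusCommon project, the <files> section of the .nuspec would gain an extra entry along these lines:

<file src="\bin\Release\SimonOctopusCommon.dacpac" target="Content" />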

If you use any SQLCMD variables in your database projects you will need to handle these in your PowerShell script within the DacDeployOptions.  The values can be hard coded into the Deploy.ps1 script or you can set up variables in TeamCity for this.
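
As a rough sketch (the variable names here are hypothetical, and SqlCommandVariableValues is the DacDeployOptions property used for this), you would add something like the following to Deploy.ps1 after the $options object has been created:

# Supply values for any SQLCMD variables defined in the database project (names are hypothetical)
$options.SqlCommandVariableValues.Add("Environment", "CI")

# Or, if you define a matching project variable in Octopus, pull the value from $OctopusParameters instead
$options.SqlCommandVariableValues.Add("AuditDatabaseName", $OctopusParameters["AuditDatabaseName"])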

Hopefully if you followed through these steps you should have something that works.  If not let me know and I’ll help where I can.

Using TeamCity to Build and Deploy SSDT Projects

In another post I talked about how we were asked to move away from TFS to start using GitHub, TeamCity and Octopus Deploy.  In this post I’m going to describe how to install TeamCity and how to build and deploy an SSDT project using it.  I’m running this on Windows 7.

Installation

  • Firstly, download the latest version of TeamCity from the JetBrains website.
  • In the installation wizard select the install directory, install both the Build Agent and Server components and choose the data directory.  You should be able to just use the default values for these.
  • Towards the end of the installation you’ll be asked to select a port to run the server component on.  By default this will be 80, but if you already have an application using port 80 you will be told so.  Simply select a port not in use, for example 4567.
  • Next configure the Build Agent properties.  You should be able to use the defaults for this.
  • Finally, choose the accounts to run the server and agent services under.

Running TeamCity

  • Provided the TeamCity Server windows service is running you’ll now be able to browse to localhost:portNumber and see the server interface.  This is where you configure your builds.
  • The first time you use this interface you have to set up a few bits.  For testing purposes you can just use the “Internal (HSQLDB)” database type.
  • Once the final components have been set up, agree to the licence and then create an admin account.
  • We’re now ready to use TeamCity but first we need an SSDT project to build and deploy.

Creating a test SSDT project

If you’ve not already got an SSDT project that you want to build, you can either import an existing database into a new project or just create a very simple test project.

I’ve just created a new SSDT project named “SimonTeamCity” and created a table named “Person” with a couple of columns as follows:

TeamCity1

There are a few ways that we can deploy this SSDT project, but to start with I’m just going to create a publish.xml file to deploy to my local default instance.

TeamCity2
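
For reference, a minimal publish profile targeting a local default instance looks roughly like this (the file Visual Studio generates will contain a few more options, so treat this as a sketch rather than the exact contents):

<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup>
    <TargetDatabaseName>SimonTeamCity</TargetDatabaseName>
    <TargetConnectionString>Data Source=(local);Integrated Security=True</TargetConnectionString>
    <ProfileVersionNumber>1</ProfileVersionNumber>
  </PropertyGroup>
</Project>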

Now the project files exist we just need to add them to some sort of source control.  I’m going to use git for this and will simply run git init in my solution directory. Please note, TeamCity does support building from TFS repositories (and many others).

Building the test project with TeamCity

The documentation in the Online Reference section of the JetBrains website is good and you should be able to use it to build your project, but this is how I did it.

Firstly we need to create a new TeamCity project and this is done by selecting the “Projects” link at the top of the TeamCity server interface and then selecting “Create project”

TeamCity3

Give the project an appropriate name and click “Create”.

TeamCity4

Now we need to create the source control root.  Click on “VCS Roots” and then “Create VCS root”.  Select Git as the Type of VCS (or whatever source control you are using) and add a VCS root name.  For git, set the Fetch URL to the local file path of the solution directory (i.e. your local git repository) and set the Default branch correctly.  You can click on “Test connection” to make sure everything is working correctly.

TeamCity5

Now under the “General Settings” we can create a new “Build Configuration” giving it an appropriate name.

Choose the existing VCS root to connect to and then the Build step page should be displayed.

You might have been given a few auto-detected build steps and there are several ways to build SSDT projects.  However, I think the easiest is the “Visual Studio (sln)” build runner.

Edit or create your build step by using the “Visual Studio (sln)” Runner type and give it a Step Name.

If not already populated, you can use the icon to the far right of the Solution file path text box to choose the .sln file.

Choose the appropriate version of Visual Studio and, just so we can test the build, leave the Targets and Configuration as Rebuild and Release respectively.  You may want to show the advanced options and set an appropriate Platform, for example “Any CPU”.

TeamCity7

Save the build step and then click the “Run” button. Provided your agent is configured correctly and the agent service is running, the build should start.  If you are not taken to the build run, just click on the “Projects” link and after a little while you should see the success status.

TeamCity8

You can click on the Success (or error) link to see more details of what happened during the build, including the Build Log.

Publishing the test project with TeamCity

In another post I’m going to demonstrate how to use TeamCity to generate a NuGet package containing the compiled DACPAC that can be consumed and deployed using Octopus Deploy.  For now I’m just going to show how we can make a few simple changes to the build step to have TeamCity publish the SSDT project.

All we need to do is set the Targets to Build (or Rebuild) and Publish and then pass the name of the publish.xml file in via the Command line parameters.

TeamCity9
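
For reference, the equivalent command outside TeamCity would be something along these lines (the project and profile names are from my test project, and I’m assuming the profile is named SimonTeamCity.publish.xml, so adjust the paths for your own solution):

msbuild SimonTeamCity\SimonTeamCity.sqlproj /t:Build;Publish /p:SqlPublishProfilePath=SimonTeamCity.publish.xml

In the TeamCity build step it is just the /p:SqlPublishProfilePath=… part that goes into the Command line parameters box.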

Now if we run the build we should get a success message and can see in the build log that the SimonTeamCity database was created and the Person table added.

TeamCity10

We can also see that the database and table now exist on the server specified in the publish.xml file.

TeamCity11

Continuous Integration

Now that we can publish our projects to a database, we can also set up TeamCity to trigger a build and deploy every time we check something into source control.  We can then create some tests using something like tSQLt that can be run after a successful publish to ensure we’ve not broken anything, as sketched below.
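
For example, a simple way to run such tests is an additional command line build step that calls sqlcmd after the publish step (the server and database names are the ones from my test project, and this assumes tSQLt is already installed in that database):

sqlcmd -S "(local)" -d SimonTeamCity -Q "EXEC tSQLt.RunAll"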

To add a trigger we just need to create one using the “Triggers” link in the “Build Configuration Settings”.  Create a VCS Trigger to detect the check-ins.

TeamCity12

Now if I add a column to the existing table in my SSDT project, add a new table and check these in to my local git repository we should see that a build is automatically triggered and my local database gets updated.

TeamCity13

 

TeamCity14

Source control, build and deploy of SQL Server databases using GitHub, TeamCity and Octopus Deploy

I’ve only really used Microsoft products for source control, build and deployment: SourceSafe in my younger days and more recently TFS.  I’ve been using SSDT for a while now with the database projects checked into TFS.  Since we went back to using sprints we’ve been using the TFS work items to track epics, stories and tasks and organise these into a backlog where we pull from for each sprint.  I really like the fact that TFS does everything and that you can check your changes in under one or more work items so it’s easy to tell what changes were made for what story.  I have never been a big fan of the build definitions in TFS and don’t like updating the XAML files.  However, building and deploying SSDT projects is very simple using MSBuild (see a previous post for how to do this).  When you build an SSDT project you end up with a DACPAC and you can publish this using MSBuild.exe passing in a publish.xml file or using SQLPackage.exe passing in target server and database information.  However, a few weeks ago we were told we were moving away from TFS and would be using “better” tools, namely GitHub for source control, TeamCity for build and Octopus for deploy.  The developers in the company were keen to move to GitHub and myself and the rest of the database team are always willing to try new things and improve our processes.
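
For example, a DACPAC publish with SqlPackage.exe is a one-liner along these lines (the file, server and database names here are placeholders):

SqlPackage.exe /Action:Publish /SourceFile:MyDatabase.dacpac /TargetServerName:"(local)" /TargetDatabaseName:MyDatabase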

Moving to GitHub was relatively straightforward.  We’re still using SSDT, and at the end of the day source control just holds a bunch of files, so GitHub was fine for this.  It took a little while to get used to the slightly different way git and GitHub work.  At first I was using SourceTree, and I still use it from time to time, but Visual Studio can also connect directly to a git repository and I find it much easier checking in and syncing my local git repository with the remote one through Visual Studio.  I’m not going to go into any more detail about GitHub as, if you decide to use it, there is plenty of online help.

As I have mentioned, the TFS build definitions were the thing I liked least about TFS, so I was looking forward to having a play with TeamCity. I like the product and have written a separate post on how to install it, set up a build for an SSDT project and publish to a particular environment.

We were recommended to use Octopus Deploy. There is a TeamCity plugin that allows you to run OctoPack, which is a package you can add to Visual Studio projects and which, when run, will create a NuGet package that can be passed to Octopus Deploy to deploy to whatever environments you need. This ensures that the same build artefacts get deployed across all your environments and the whole thing can be completely automated: you check in to your chosen source control, which triggers a build that is then deployed to a CI server where a bunch of automated tests can be run. If the tests pass it can then be deployed to a staging environment and more automated tests can be run. If these tests also pass we can then deploy to live, and the whole process is triggered just by checking in some code. I know you can do all this with TFS but in theory this sounded great and I quite liked the Octopus application from the online demo I watched. However, the problem is that the OctoPack package cannot be added to SSDT projects.  We were told maybe we should use something other than SSDT.  SSDT uses a “model-driven” approach, meaning that you get your SSDT project into the state you want your data model to be in and then use an application like MSBuild to make a target database the same as the SSDT model.  Octopus recommend that you use a “change-script-driven” approach, meaning that you create an initial data model and release it, and any changes you make to the initial model are made by writing change scripts, e.g. ALTER TABLE, ALTER PROC etc.  Octopus only easily supports “change-script-driven” tools like ReadyRoll and DbUp.  I had a quick look at ReadyRoll and, after using SSDT, using a “change-script-driven” tool instead of a “model-driven” tool seems absolutely ridiculous to me.  Why would you make development an absolute nightmare just to fit in with the build and deploy tool you want to use?  So we are sticking with SSDT, and if you’re using SQL Server there is no better tool in my opinion.

Now we had reached a bit of a stalemate: we were told we had to use Octopus and we were not moving away from SSDT.  After a bit of digging around we managed to find a way of tricking the SSDT project into using OctoPack to create a NuGet package containing the compiled DACPAC and a PowerShell deploy script that uses the Data-Tier Application Framework to publish the DACPAC against each environment in Octopus.  It took a long time to work out how to do it but it’s not too complicated.  I’ve written another post that describes in detail how to do this.

In summary

So what’s better for SQL Server source control, build and deploy: TFS, or GitHub, TeamCity and Octopus Deploy? For a start I’d always say use SSDT rather than any other tool.  If licence costs are not an issue and you want the ability to check code in under work items then I’d use TFS for source control.  The main benefit of TFS is that it does everything, including the story board if you’re working in sprints.  We’re currently using Trello as our sprint board.  I’ve been using Trello for my personal to-do list for quite some time, but I prefer the TFS sprint board and being able to check in under a particular work item.  There may be story boards out there that link to GitHub and if so this is an option.  I prefer TeamCity to the TFS build definitions and quite like Octopus.  However, having to manually create a nuspec file and PowerShell deployment script for something you can do with one short MSBuild or SQLPackage command seems like a bit of extra work.  If I was starting a new project and had to choose one setup I’d probably go with TFS for work management and source control and use TeamCity for both build and deploy without using Octopus.  It was a close call and I can get by using any of them.  At the end of the day they are all tools that work (albeit after some initial pain); none of them is perfect but all are manageable.

Related Posts:
Using TeamCity to Build and Deploy SSDT Projects
Using Octopus Deploy to Publish SSDT Projects Built with TeamCity

“Internal Error: Ese could not be initialized” when performing a schema compare in SSDT

I have recently imported several of our live databases into SSDT and was trying to perform a schema compare to make sure I pick up any changes made since the import, before we start using the SSDT solution as our source code.  I received the following error…

“Target is unavailable: Internal Error: Ese could not be initialized”

SSDT Ese Error1

In this case the source is my database and the target is the database project (if I switch them round so the database project is the source then the error changes to “Source is unavailable…”).

If you look in the application log you can see a much more useful error…

SSDT Ese Error2

Looking at this error we can see that during the schema compare SSDT creates a .model file in “C:\Users\SimonP\AppData\Local\Temp\12aadf86-a621-49fa-8c75-de8dfa92fdc2.model” and it cannot open it because it is compressed.  My C drive is compressed but if I remove the compression from “C:\Users\SimonP” the schema compare now works.