Archiving old SQL Server data to Azure Data Lake and reading it with Data Lake Analytics

I recently worked on a project where we rebuilt one of our existing systems. This involved one team building a new application and the data engineering team rebuilding the database.

We migrated a lot of the data from the old database into the new one, but we also aggregated some financial data and added it as a balance transfer from the old system to the new.

The old database has been sitting in a read-only state for a long time now and we’d like to get rid of it to recover some space. However, very recently I was asked to show details of how the balance transfer amount was calculated for a particular customer. I was able to read from the old read-only database on this occasion, but I wanted to come up with a way to archive some of the old data so that it can be accessed if need be.

The solution I came up with was to export the data we need into our Azure Data Lake Store and then create a U-SQL project in Visual Studio to hold the queries we run against the Data Lake. This project is then placed in source control so that other people can use these queries in the future.

I’m going to use some of the tables in the AdventureWorks2016 database to mimic how I set this up…

An example query…

Let’s assume we’ve built a new version of AdventureWorks2016 and migrated a lot of the data across. However, for various reasons we didn’t migrate the order history, but once or twice a year we might get a request asking for the order history for a customer. If the AdventureWorks2016 database is still available then we can run something like the following for AccountNumber AW00029794…

USE AdventureWorks2016;
GO

SELECT
    C.CustomerID,
    C.AccountNumber,
    SOH.SalesOrderID,
    SOH.OrderDate,
    SOH.DueDate,
    SOH.ShipDate,
    SOH.SalesOrderNumber,
    SOD.SalesOrderDetailID,
    SOD.OrderQty,
    SOD.UnitPrice,
    P.ProductID,
    P.Name AS ProductName
FROM Sales.SalesOrderHeader SOH
JOIN Sales.SalesOrderDetail SOD ON SOD.SalesOrderID = SOH.SalesOrderID
JOIN Sales.Customer C ON C.CustomerID = SOH.CustomerID
JOIN Production.Product P ON P.ProductID = SOD.ProductID
WHERE C.AccountNumber = 'AW00029794'
ORDER BY SOH.OrderDate,
    SOH.SalesOrderID,
    SOD.SalesOrderDetailID;

This returned 59 rows including these…

Archive to ADLS 01

The following steps show how we can archive some of this data into the Data Lake and then run adhoc queries like the one above against it.

Archive tables

I just used the Import/Export Wizard via SQL Server Management Studio to export the table data as CSV files.

When choosing the destination it’s important to remember which options you choose, as these will determine the options you need to include in your U-SQL script to read the data from the Data Lake.

Archive to ADLS 02

I’ve identified the Customer, Product, SalesOrderHeader and SalesOrderDetail tables as the only ones I need to archive so I’ve ended up with 4 CSV files containing all the data from these tables.

Archive to ADLS 03

Upload to Azure Data Lake

As with any Azure resource you need an active Azure subscription (free trials are available). I’m assuming you have this and have already provisioned an Azure Data Lake Store (ADLS).

For this demo I’ve got a Data Lake set up called aworks2016archive and a Data Lake Analytics (ADLA) account with the same name linked to this Data Lake.

I now use Visual Studio and the Cloud Explorer for Visual Studio extension to interact with ADLS.

Archive to ADLS 06

This gives me the Cloud Explorer window and I can connect to the ADLS from there…

Archive to ADLS 07

I can open this ADLS in File Explorer and add a new folder named AdventureWorks2016 to store the exported CSV files. I can then right click in that folder and Upload the CSV files…

Archive to ADLS 08

Reading the data from ADLS

I now use Visual Studio and the Azure Data Lake and Stream Analytics Tools extension to interact with ADLS and ADLA and query the data in the CSV files.

Archive to ADLS 04

I create a new U-SQL Project in Visual Studio. This allows me to keep my U-SQL scripts under source control although I can just run these scripts directly in ADLA via the Azure Portal if need be.

Archive to ADLS 05

I then end up with a new project containing a single U-SQL file named Script.usql.

To run the script using the aworks2016archive ADLA account I need to choose it from the drop-down (you can run the script against the local connection while you are developing it). I can also choose the number of analytics units (AUs) here, up to the maximum set on the ADLA account. For the time being I’m going to set this to 1.

Within U-SQL you have extractors and outputters for reading data and writing results, respectively. There is quite a good course on ADLS and ADLA on Microsoft Virtual Academy called Introducing Azure Data Lake and there are plenty of other resources available so I’m not going to go too much into the U-SQL syntax.

One thing I would like to show is a nice way to get your EXTRACT statement for the files in the ADLS. From Cloud Explorer -> File Explorer I can navigate to the file I want to read and double-click it to open a preview of the file. From here I can choose the Delimiter, Quoting and Encoding to make sure the file is readable.

Archive to ADLS 09

There is also a button named Create EXTRACT Script…

Archive to ADLS 10

The file has the column headers in the first row so I can check the File Has Header Row box and get the correct column names in the script…

Archive to ADLS 11

The tool makes a best guess at what the data types should be, but if the data is sparse for a particular column and there are no values in the preview it quite often just chooses string. I can change the data types of the columns in the data set, but I prefer to click Copy Script, paste the code into my U-SQL script and change it there.

@input =
    EXTRACT [ProductID] int,
            [Name] string,
            [ProductNumber] string,
            [MakeFlag] bool,
            [FinishedGoodsFlag] bool,
            [Color] string,
            [SafetyStockLevel] int,
            [ReorderPoint] int,
            [StandardCost] int,
            [ListPrice] int,
            [Size] string,
            [SizeUnitMeasureCode] string,
            [WeightUnitMeasureCode] string,
            [Weight] string,
            [DaysToManufacture] int,
            [ProductLine] string,
            [Class] string,
            [Style] string,
            [ProductSubcategoryID] string,
            [ProductModelID] string,
            [SellStartDate] DateTime,
            [SellEndDate] string,
            [DiscontinuedDate] string,
            [rowguid] Guid,
            [ModifiedDate] DateTime
    FROM "adl://aworks2016archive.azuredatalakestore.net/AdventureWorks2016/Product.csv" 
    USING Extractors.Csv(skipFirstNRows:1);

In order to know what the data types should be it might be useful to store the schema of the original SQL Server table in the ADLS as well. For the Product table this is as follows…

CREATE TABLE [Production].[Product](
	[ProductID] [int] IDENTITY(1,1) NOT NULL,
	[Name] [dbo].[Name] NOT NULL,
	[ProductNumber] [nvarchar](25) NOT NULL,
	[MakeFlag] [dbo].[Flag] NOT NULL,
	[FinishedGoodsFlag] [dbo].[Flag] NOT NULL,
	[Color] [nvarchar](15) NULL,
	[SafetyStockLevel] [smallint] NOT NULL,
	[ReorderPoint] [smallint] NOT NULL,
	[StandardCost] [money] NOT NULL,
	[ListPrice] [money] NOT NULL,
	[Size] [nvarchar](5) NULL,
	[SizeUnitMeasureCode] [nchar](3) NULL,
	[WeightUnitMeasureCode] [nchar](3) NULL,
	[Weight] [decimal](8, 2) NULL,
	[DaysToManufacture] [int] NOT NULL,
	[ProductLine] [nchar](2) NULL,
	[Class] [nchar](2) NULL,
	[Style] [nchar](2) NULL,
	[ProductSubcategoryID] [int] NULL,
	[ProductModelID] [int] NULL,
	[SellStartDate] [datetime] NOT NULL,
	[SellEndDate] [datetime] NULL,
	[DiscontinuedDate] [datetime] NULL,
	[rowguid] [uniqueidentifier] ROWGUIDCOL  NOT NULL,
	[ModifiedDate] [datetime] NOT NULL,
 CONSTRAINT [PK_Product_ProductID] PRIMARY KEY CLUSTERED 
(
	[ProductID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

I can use this to correct any incorrect data types…

@input =
    EXTRACT [ProductID] int,
            [Name] string,
            [ProductNumber] string,
            [MakeFlag] bool,
            [FinishedGoodsFlag] bool,
            [Color] string,
            [SafetyStockLevel] int,
            [ReorderPoint] int,
            [StandardCost] decimal,
            [ListPrice] decimal,
            [Size] string,
            [SizeUnitMeasureCode] string,
            [WeightUnitMeasureCode] string,
            [Weight] decimal?,
            [DaysToManufacture] int,
            [ProductLine] string,
            [Class] string,
            [Style] string,
            [ProductSubcategoryID] int?,
            [ProductModelID] int?,
            [SellStartDate] DateTime,
            [SellEndDate] DateTime?,
            [DiscontinuedDate] DateTime?,
            [rowguid] Guid,
            [ModifiedDate] DateTime
    FROM "adl://aworks2016archive.azuredatalakestore.net/AdventureWorks2016/Product.csv" 
    USING Extractors.Csv(skipFirstNRows:1);

Notice some of the data types have a question mark after their name. This indicates that the field is nullable. Please note, you can’t include this for string data types: U-SQL data types are C# types, and string is a reference type that already allows null.

In this example we’re extracting from a specific file, but U-SQL also supports file sets, which let you read multiple files by including a pattern in the path, as sketched below.
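
For example, here’s a minimal sketch of the file set syntax, assuming hypothetical yearly export files named Orders_2016.csv, Orders_2017.csv and so on, each with two columns. The {OrderYear} token in the path becomes a virtual column that must be declared in the EXTRACT column list, and a predicate on it in a later SELECT can eliminate whole files from the read…

@orders =
    EXTRACT [SalesOrderID] int,
            [TotalDue] decimal,
            [OrderYear] string // virtual column populated from the file name
    FROM "adl://aworks2016archive.azuredatalakestore.net/AdventureWorks2016/Orders_{OrderYear}.csv"
    USING Extractors.Csv(skipFirstNRows:1);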

We are using the built-in CSV extractor and specifying the option to skip the first row. There are other options available here and I’m going to add some of these in even though they are the default values. If we’d checked the Unicode box in the destination on the Import/Export Wizard we would have to specify the Unicode encoding type…

@input =
    EXTRACT [ProductID] int,
            [Name] string,
            [ProductNumber] string,
            [MakeFlag] bool,
            [FinishedGoodsFlag] bool,
            [Color] string,
            [SafetyStockLevel] int,
            [ReorderPoint] int,
            [StandardCost] decimal,
            [ListPrice] decimal,
            [Size] string,
            [SizeUnitMeasureCode] string,
            [WeightUnitMeasureCode] string,
            [Weight] decimal?,
            [DaysToManufacture] int,
            [ProductLine] string,
            [Class] string,
            [Style] string,
            [ProductSubcategoryID] int?,
            [ProductModelID] int?,
            [SellStartDate] DateTime,
            [SellEndDate] DateTime?,
            [DiscontinuedDate] DateTime?,
            [rowguid] Guid,
            [ModifiedDate] DateTime
    FROM "adl://aworks2016archive.azuredatalakestore.net/AdventureWorks2016/Product.csv" 
    USING Extractors.Csv(skipFirstNRows:1, encoding:Encoding.UTF8, quoting:true, silent:false);

I’m going to give this variable a more appropriate name, renaming @input to @product (it’s referenced by that name in the join further down), and then do the same for the other three CSV files…

@customer =
    EXTRACT [CustomerID] int,
            [PersonID] int?,
            [StoreID] int?,
            [TerritoryID] int?,
            [AccountNumber] string,
            [rowguid] Guid,
            [ModifiedDate] DateTime
    FROM "adl://aworks2016archive.azuredatalakestore.net/AdventureWorks2016/Customer.csv" 
    USING Extractors.Csv(skipFirstNRows:1);

@salesorderheader =
    EXTRACT [SalesOrderID] int,
            [RevisionNumber] int,
            [OrderDate] DateTime,
            [DueDate] DateTime,
            [ShipDate] DateTime?,
            [Status] int,
            [OnlineOrderFlag] bool,
            [SalesOrderNumber] string,
            [PurchaseOrderNumber] string,
            [AccountNumber] string,
            [CustomerID] int,
            [SalesPersonID] int?,
            [TerritoryID] int?,
            [BillToAddressID] int,
            [ShipToAddressID] int,
            [ShipMethodID] int,
            [CreditCardID] int?,
            [CreditCardApprovalCode] string,
            [CurrencyRateID] int?,
            [SubTotal] double,
            [TaxAmt] double,
            [Freight] double,
            [TotalDue] double,
            [Comment] string,
            [rowguid] Guid,
            [ModifiedDate] DateTime
    FROM "adl://aworks2016archive.azuredatalakestore.net/AdventureWorks2016/SalesOrderHeader.csv" 
    USING Extractors.Csv(skipFirstNRows:1);

@salesorderdetail =
    EXTRACT [SalesOrderID] int,
            [SalesOrderDetailID] int,
            [CarrierTrackingNumber] string,
            [OrderQty] int,
            [ProductID] int,
            [SpecialOfferID] int,
            [UnitPrice] double,
            [UnitPriceDiscount] double,
            [LineTotal] double,
            [rowguid] Guid,
            [ModifiedDate] DateTime
    FROM "adl://aworks2016archive.azuredatalakestore.net/AdventureWorks2016/SalesOrderDetail.csv" 
    USING Extractors.Csv(skipFirstNRows:1);

Joining data in U-SQL

Now that we’ve got our four EXTRACT statements we can join these all together and create the required data set that we can then output…

@orders =
SELECT
       c.CustomerID,
       c.AccountNumber,
       soh.SalesOrderID,
       soh.OrderDate,
       soh.DueDate,
       soh.ShipDate,
       soh.SalesOrderNumber,
       sod.SalesOrderDetailID,
       sod.OrderQty,
       sod.UnitPrice,
       p.ProductID,
       p.Name AS ProductName
    FROM @customer AS c
    JOIN @salesorderheader AS soh ON soh.CustomerID == c.CustomerID
    JOIN @salesorderdetail AS sod ON sod.SalesOrderID == soh.SalesOrderID
    JOIN @product AS p ON p.ProductID == sod.ProductID
    WHERE c.AccountNumber == "AW00029794";

There are some differences between this SELECT statement and the T-SQL equivalent. Remember, this is U-SQL, not T-SQL:

  • The rowset variables being joined must have an AS before the alias name, and the alias can’t be all in upper case.
  • Equality comparisons use the C# double equals operator == (a single equals is used for assigning values).
  • The code is case sensitive, which you may not be used to if your SQL Server collation isn’t, so ProductID is different from ProductId.
  • Double quotes are used for string literals instead of single quotes.

There is a section at the bottom that shows some of the errors I got when first compiling my U-SQL scripts.

Outputting the data

Now that we’ve got the query to get the required data we need to output the results to the ADLS. We can’t see the results in Visual Studio like we would if running a T-SQL query against SQL Server.

OUTPUT @orders
TO "adl://aworks2016archive.azuredatalakestore.net/AdventureWorks2016/AW00029794.csv"
USING Outputters.Csv(outputHeader:true);

Here we specify the file we want to output the results to and we’re using the built-in CSV outputter. The only option I’m specifying is that I want the column names included as the first row, but you can see other options here.
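
For instance, if a downstream tool prefers tab-separated output, the same rowset could be written with the built-in TSV outputter instead (a sketch; the .tsv file name is just for illustration)…

OUTPUT @orders
TO "adl://aworks2016archive.azuredatalakestore.net/AdventureWorks2016/AW00029794.tsv"
USING Outputters.Tsv(outputHeader:true, quoting:false);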

Now we’ve got our full script we can execute it by clicking the Execute button.

Executing the U-SQL job

The scripts execute within jobs and if the script compiles successfully the job window will be displayed like the following…

Archive to ADLS 12

Here we can see the completed job…

Archive to ADLS 13

We can see the four Extract operations at the top and then some Combine operations, which are where the data is joined together.

One thing to note is the Extract on the Customer.csv file. Our EXTRACT statement didn’t have any filtering; it was only in the SELECT statement that we added the WHERE clause to get a single customer. But if you look at the SV1 EXTRACT operation you can see only 1 row was returned. The compiler knows to push the filter from the SELECT statement down into the EXTRACT. Looking at the code you might think the EXTRACT statements work in a similar way to loading data from SELECT statements into temp tables in T-SQL, in which case all the data would be read and only filtered afterwards, but that’s not what happens.

As we can see from the SV7 Combine operation, 59 rows are output, which matches the row count from the original T-SQL query at the beginning.

Because we are running queries against the ADLA account we can log in to the Azure portal and see the job history…

Archive to ADLS 14

As you can see, the successful job took 58 secs with 1 AU. If I double-click the job I can see its details, including the job graph. There is an AU analysis tab that can be used to work out the most efficient number of AUs to use; we have to find the balance between execution time and cost.

Because the data sets are tiny in this example 1 AU is the best choice.

The output file

The result of the script execution is a file written to ADLS and we can now see that in File Explorer…

Archive to ADLS 15

We can view a preview of this file or download it.

Some errors that can be returned

Here are a few of the errors that I’ve received while creating U-SQL scripts along with the cause and how to resolve them.

  • Alias all in upper case

Problem:

JOIN @salesorderheader AS SOH ON SOH.CustomerID == c.CustomerID

Error:

E_CSC_USER_USQLUPPERCASEIDENTIFIER: The identifier SOH is all upper case.

Solution:

Alias SOH needs to be Soh or soh

JOIN @salesorderheader AS soh ON soh.CustomerID == c.CustomerID

  • Missing AS in alias

Problem:

JOIN @salesorderheader soh ON soh.CustomerID == c.CustomerID

Error:

E_CSC_USER_SYNTAXERROR syntax error. Expected one of: '(' ALL ANTISEMIJOIN ANY AS BROADCASTLEFT BROADCASTRIGHT CROSS DISTINCT EXCEPT FULL FULLCROSS GROUP HASH HAVING INDEXLOOKUP INNER INTERSECT JOIN LEFT LOOP MERGE ON OPTION ORDER OUTER OUTER UNION PAIR PIVOT PRESORT PRODUCE READONLY REQUIRED RIGHT SAMPLE SEMIJOIN SERIAL TO UNIFORM UNION UNIVERSE UNPIVOT USING WHERE WITH ';' ')' ','

Solution:

Add AS before any aliases

JOIN @salesorderheader AS soh ON soh.CustomerID == c.CustomerID

  • Incorrect casing

Problem:

JOIN @salesorderheader AS soh ON soh.CustomerID == c.CustomerId

Error:

E_CSC_USER_QUALIFIEDCOLUMNNOTFOUND: Column 'CustomerId' not found in rowset 'c'.

Solution:

Make sure all casing is correct

JOIN @salesorderheader AS soh ON soh.CustomerID == c.CustomerID

  • Wrong equality operator used

Problem:

JOIN @salesorderheader AS soh ON soh.CustomerID = c.CustomerID

Error:

E_CSC_USER_JOINEXPRESSIONNOTVALID: The Join Condition soh.CustomerID = c.CustomerID is not supported.

Solution:

Equality operators need to be double equals

JOIN @salesorderheader AS soh ON soh.CustomerID == c.CustomerID

  • Trying to return results

Problem:

SELECT
       c.CustomerID,
       c.AccountNumber,
...

Error:

E_CSC_USER_QUERYSTATEMENTMISSINGASSIGNMENT: Missing assignment for query statement.

Solution:

You can’t return results directly from a U-SQL script; you have to assign the query to a variable and output the results to a file…

@orders =
SELECT
       c.CustomerID,
       c.AccountNumber,
...
OUTPUT @orders
...


Exam prep for 70-762

Last Updated 19/11/2018

A few months ago I took exam 70-761 and now I’m going to take 70-762 – Developing SQL Databases.

As part of my prep for 70-761 I wrote about a few of the topics less well known to me and they can all be found here – Exam prep for 70-761.

This page covers some of the topics covered in 70-762.

70-761 Exam Prep: Miscellaneous T-SQL – Part 2

I’m currently preparing to take the exam 70-761 – Querying Data with Transact-SQL and have been going through various resources including Itzik Ben-Gan’s exam guidebook which is certainly worth reading if you’re planning on taking this exam.

There are quite a few smaller bits of T-SQL functionality that I’m not that familiar with that I’m going to demonstrate in this and further posts. It’s a bit of a random selection but hopefully it will help me remember them and might be of use to others as well.

Contents

  1. Date and Time Functions
  2. IIF (Inline IF Function)
  3. CHOOSE Function
  4. String Functions

1. Date and Time Functions

There are quite a few different Date and Time functions available in SQL Server. There are a few I use regularly, such as GETUTCDATE, DATEPART and DATEADD, but some I hardly use or have never used at all. I can never remember the syntax for most of them so I’m going to give an example of each one along with a bit of detail about each.

Returning current date and time

The following functions all return the current date and time.
The timezone of my PC is BST.

This function is specific to SQL Server and returns a DATETIME value containing the current date and time.

SELECT GETDATE(); 
2018-04-30 22:13:18.403

This function is specific to SQL Server and returns a DATETIME value containing the current UTC date and time.

SELECT GETUTCDATE();
2018-04-30 21:13:18.403

This function is SQL standard and returns a DATETIME value containing the current date and time.

SELECT CURRENT_TIMESTAMP;
2018-04-30 22:13:18.403

This function is specific to SQL Server and returns a DATETIME2(7) value containing the current date and time.

SELECT SYSDATETIME(); 
2018-04-30 22:13:18.4048719

This function is specific to SQL Server and returns a DATETIME2(7) value containing the current UTC date and time.

SELECT SYSUTCDATETIME(); 
2018-04-30 21:13:18.4048719

This function is specific to SQL Server and returns a DATETIMEOFFSET value containing the current date, time and offset.
As mentioned above this is BST which is 1 hour ahead of UTC.

SELECT SYSDATETIMEOFFSET();
2018-04-30 22:13:18.4048719 +01:00

Date Parts

The following two functions can be used to extract different parts of a supplied date and time.
For example, the following will return the month number from the supplied date or time parameter (please note I’m passing in a VARCHAR value but this is implicitly converted to a DATETIME2 data type).

SELECT DATEPART(MONTH, '20180430');
4

The following will return the minute value from the supplied date time value.

SELECT DATEPART(MINUTE, '20180430 03:23:17');
23

If no time part is included in the date value then it’s assumed to be midnight which is why this returns 0.

SELECT DATEPART(MINUTE, '20180430');
0

The following returns the day number within the year, counting 1st Jan as day 1.

SELECT DATEPART(DAYOFYEAR, '20180430');
120

The following returns any offset in minutes. The offset is three hours so this returns 180 minutes.

SELECT DATEPART(TZoffset, '20180430 13:30:00 +03:00')
180

DATEPART always returns an INT value.

We can use the DATENAME function to return textual values for date parts.
For example, the following returns the month name for the supplied date.

SELECT DATENAME(MONTH, '20180430')
April

The DATENAME function is language dependent and will return the value in the language of the user running the query.
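
We can see this by changing the session language with SET LANGUAGE (a quick sketch; note the unseparated yyyymmdd date format is safe to use regardless of language settings)…

SET LANGUAGE French;
SELECT DATENAME(MONTH, '20180430');
avril

SET LANGUAGE us_english;
SELECT DATENAME(MONTH, '20180430');
April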

We can use the following functions to return date and time values from integer date parts. Each function requires a different number of parameters, and all are mandatory in order to build up the returned value correctly.

The following returns a DATE value.

SELECT DATEFROMPARTS(2018,04,30)
2018-04-30

The following returns a DATETIME value.

SELECT DATETIMEFROMPARTS(2018,04,30,19,42,34,13)
2018-04-30 19:42:34.013

All 7 parameters in this one are required but we can obviously use zeroes if we don’t have the required level of precision.

SELECT DATETIMEFROMPARTS(2018,04,30,19,42,0,0)
2018-04-30 19:42:00.000

The following returns a DATETIME2 type with an eighth parameter used to specify the precision of the data type.
The precision (the eighth parameter) determines how many digits are allowed in the “fraction” parameter (the seventh one).
For example,

SELECT DATETIME2FROMPARTS(2018,04,30,19,42,34,5,1);
2018-04-30 19:42:34.5


SELECT DATETIME2FROMPARTS(2018,04,30,19,42,34,5,2);
2018-04-30 19:42:34.05


SELECT DATETIME2FROMPARTS(2018,04,30,19,42,34,5,3);
2018-04-30 19:42:34.005


SELECT DATETIME2FROMPARTS(2018,04,30,19,42,34,5,7);
2018-04-30 19:42:34.0000005

If we’re only allowing a precision of 1 we can’t specify 2 digits in the “fraction” parameter.

SELECT DATETIME2FROMPARTS(2018,04,30,19,42,34,15,1);
Msg 289, Level 16, State 5, Line 89
Cannot construct data type datetime2, some of the arguments have values which are not valid.

If we are using a precision of 0, which is just to the nearest second, the “fraction” parameter must be 0 otherwise we get the same error as above.

SELECT DATETIME2FROMPARTS(2018,04,30,19,42,34,5,0);
Msg 289, Level 16, State 5, Line 53
Cannot construct data type datetime2, some of the arguments have values which are not valid.

SELECT DATETIME2FROMPARTS(2018,04,30,19,42,34,0,0);
2018-04-30 19:42:34

The following returns a DATETIMEOFFSET value with the offset hour and minute parameters before the precision.

SELECT DATETIMEOFFSETFROMPARTS(2018,4,30,19,42,34,0,2,30,1);
2018-04-30 19:42:34.0 +02:30

The following function returns a DATE value containing the last date in the month of the supplied date.

SELECT EOMONTH('20180315');
2018-03-31

There is an optional second parameter that moves the result forward or back by the specified number of months…

SELECT EOMONTH('20180315', -1);
2018-02-28
SELECT EOMONTH('20180315', 2);
2018-05-31

There isn’t a “start of month” function but further down I show how to do this using the DATEADD and DATEDIFF functions.

DATEADD

The DATEADD function can be used to add a specified number to the specified date part of the supplied date time value and returns a DATETIME value.
For example, the following adds 10 days to 10th April 2018.

SELECT DATEADD(DAY, 10, '20180410');
2018-04-20 00:00:00.000

The following one takes 1 month from 10th April 2018.

SELECT DATEADD(MONTH, -1, '20180410');
2018-03-10 00:00:00.000

DATEDIFF

This function counts the number of date-part boundaries crossed between two datetime values and returns the result as an INT.
For example, the following calculates that there are 2 days difference between 1st April and 3rd April 2018.

SELECT DATEDIFF(DAY, '20180401', '20180403');
2
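
It’s worth noting that DATEDIFF counts date-part boundaries crossed rather than full periods elapsed, so two dates only a day apart can still be a “year” apart…

SELECT DATEDIFF(YEAR, '20171231', '20180101');
1

SELECT DATEDIFF(YEAR, '20180101', '20181231');
0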

If the number returned is very large we can use this function to return the value as a BIGINT.

SELECT DATEDIFF_BIG(MILLISECOND, '19000101', GETUTCDATE());
3734111598403

Above we saw we can use the built in EOMONTH function to get the last day of the month for the supplied date.
Here’s a nice little trick to get the first date of the supplied month, in this case the current month. I’m running this on 30th April 2018

SELECT DATEADD(MONTH, DATEDIFF(MONTH, 0, GETUTCDATE()), 0);
2018-04-01 00:00:00.000

To explain how this works let’s first look at the DATEDIFF part that is passed in as the second parameter in the DATEADD function.
The start date parameter is supplied as zero. This defaults to 1st Jan 1900.
So the following two statements return the same value which is the number of months since 1st Jan 1900.
I’m running this in April 2018 so this returns 1419

SELECT DATEDIFF(MONTH, 0, GETUTCDATE())
SELECT DATEDIFF(MONTH, '19000101', GETUTCDATE())

The DATEADD function also uses 0 as the date parameter which again defaults to 1st Jan 1900.
So this function is saying “add 1419 months to 1st Jan 1900” and this gives us 1st April 2018.
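
The same pattern works for other date parts. For example, the first day of the current year, or today’s date with the time stripped off (again run on 30th April 2018)…

SELECT DATEADD(YEAR, DATEDIFF(YEAR, 0, GETUTCDATE()), 0);
2018-01-01 00:00:00.000

SELECT DATEADD(DAY, DATEDIFF(DAY, 0, GETUTCDATE()), 0);
2018-04-30 00:00:00.000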

Offset functions

We can use the TODATETIMEOFFSET function to add an offset to a date time value. This returns a DATETIMEOFFSET value.

SELECT TODATETIMEOFFSET('20180430 10:00', '+02:00')
2018-04-30 10:00:00.0000000 +02:00

We can use the SWITCHOFFSET function to switch from one time zone to another.
The first parameter has to be a DATETIMEOFFSET value and the second is the offset from UTC. I’ve run this at 22:13 BST, which is UTC + 1.

SELECT GETDATE();
2018-04-30 22:13:18.403


SELECT SWITCHOFFSET(SYSDATETIMEOFFSET(), '+02:00');
2018-04-30 23:13:18.4048719 +02:00

We can also add an offset using AT TIME ZONE, explicitly specifying the time zone as follows.

SELECT CAST('20180415 17:32' AS DATETIME) AT TIME ZONE 'Central European Standard Time'
2018-04-15 17:32:00.000 +02:00
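
AT TIME ZONE can also be chained to convert between time zones: the first call states which zone the value is in and the second switches the offset…

SELECT CAST('20180415 17:32' AS DATETIME) AT TIME ZONE 'UTC' AT TIME ZONE 'Central European Standard Time';
2018-04-15 19:32:00.000 +02:00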

2. IIF (Inline IF function)

Most programming languages provide some sort of ternary or conditional operator and SQL Server is no different: it provides the Inline IF function (IIF).
This function accepts three parameters. The first is a boolean, or an expression that returns a boolean. If the boolean value is true then the function returns the value supplied in the second parameter; if it is false then the function returns the value supplied in the third parameter.

SELECT IIF(1=1, 1, 0)
1

SELECT IIF(1=0, 1, 0)
0

If you are more familiar with the CASE statement, the queries above are equivalent to the following…

SELECT CASE WHEN 1=1 THEN 1 ELSE 0 END
SELECT CASE WHEN 1=0 THEN 1 ELSE 0 END

3. CHOOSE Function

The CHOOSE function accepts at least 2 parameters.
The first parameter is the “index” and has to be an INT that says which of the values supplied in the subsequent parameters should be returned.
The following examples all use four parameters, meaning there are three values available to choose from in the parameters supplied after the “index” parameter.

The following will return the value in the first parameter after the “index” value

SELECT CHOOSE(1,'a','b','c')
a

The following will return the value in the second parameter after the “index” value

SELECT CHOOSE(2,'a','b','c')
b

The following will return the value in the third parameter after the “index” value

SELECT CHOOSE(3,'a','b','c')
c

If we pass an “index” value greater than the number of values to choose from then the function returns NULL.

SELECT CHOOSE(4,'a','b','c')
NULL
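
As with IIF, CHOOSE is effectively shorthand for a simple CASE expression, so the first two examples above are equivalent to the following…

SELECT CASE 1 WHEN 1 THEN 'a' WHEN 2 THEN 'b' WHEN 3 THEN 'c' END
SELECT CASE 2 WHEN 1 THEN 'a' WHEN 2 THEN 'b' WHEN 3 THEN 'c' END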

4. String Functions

There are several string functions provided in SQL Server and I’m going to demonstrate some of the ones I find most useful…


The following repeats the string “ab” 5 times…

SELECT REPLICATE('ab', 5);
ababababab

The following reverses the supplied string…

SELECT REVERSE('abcd');
dcba

The STRING_SPLIT table-valued function will split a string based on a supplied separator. It is available from SQL Server 2016 onwards.
The following splits the string into a data set of rows using the underscore character as the separator…

SELECT *
FROM STRING_SPLIT('My_Name_Is_Simon', '_')

My
Name
Is
Simon
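
A common use for STRING_SPLIT is filtering rows against a delimited list held in a variable or parameter. Here’s a small self-contained sketch; note the column returned by the function is always named value…

DECLARE @allowed VARCHAR(50) = 'Red,Green,Blue';

SELECT *
FROM (VALUES ('Red'), ('Purple'), ('Blue')) AS T(Colour)
WHERE T.Colour IN (SELECT value FROM STRING_SPLIT(@allowed, ','));

This returns the Red and Blue rows only.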

The STUFF function is one of the most powerful string manipulation tools.
It can be used to insert (or “stuff”) a string at any point inside another string.
It can also be used to delete a certain number of characters at any point within a string and replace them with another string.
The scalar function returns a VARCHAR and accepts four parameters…
1. This is the original string supplied for modification
2. This is the starting point for the string manipulation
3. This is the number of characters in the original string that are going to be deleted starting at the character supplied in parameter 2
4. This is the string that we want to “stuff” into the original string at the character supplied in parameter 2

In the following example the original string is “abcde”. Parameter 2 says we’re going to perform the “stuff” at character 1, i.e. at character “a”. Parameter 3 says we’re deleting 0 characters from the original string, and the final parameter is the string we want to “stuff” in at character 1.
This will insert the character “x” at the beginning of the original string…

SELECT STUFF('abcde',1,0,'x')
xabcde

We can’t use STUFF to add “x” to the end of the string. We would want to place it as the 6th character in this example, but if we try, it just returns NULL.

SELECT STUFF('abcde',6,0,'x')
NULL

However, adding a string to the end of another is just simple concatenation.

SELECT 'abcde' + 'x'
SELECT CONCAT('abcde', 'x')
abcdex

Now let’s try and replace the “c” character with the “x” character using the STUFF function…

SELECT STUFF('abcde', 3, 1, 'x')
abxde

Obviously this is not specifically looking for the “c” character as the REPLACE function would, it’s just replacing the third character in the original string with the string in the fourth parameter.

If we want to replace the second and third characters of the string we can do the following…

SELECT STUFF('abcde', 2, 2, 'x')
axde

This is saying delete the two characters starting at character 2, i.e. “bc”, and insert the string in the fourth parameter, i.e. “x”.

The fourth parameter doesn’t have to be a single character…

SELECT STUFF('abcde', 2, 2, 'alongerstring')
aalongerstringde

The statement won’t error if the third parameter is a bigger number than the number of characters in the original string.
For example, if we know the original string is less than 100 characters long we can run the following to replace everything but the first character…

SELECT STUFF('abcde', 2, 100, 'XXXXXXX')
aXXXXXXX
SELECT STUFF('fghijklmnopq', 2, 100, 'XXXXXXX')
fXXXXXXX

SQL Server now supplies dynamic data masking functionality, but we could use STUFF to do something similar, such as masking all but the last four digits of a credit card.
Card numbers are typically 16 digits long, so we could delete the first 12 digits by starting at digit 1 and supplying 12 as the third parameter. We could then replace the deleted part of the string with 12 asterisk characters…

SELECT STUFF('1234567812345678',1 , 12, REPLICATE('*', 12));
************5678

One final thing on STUFF is that it can be used along with FOR XML to concatenate rows of string data into a single string.  Details of this can be found here.
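
As a minimal self-contained sketch of that pattern: the subquery builds a single comma-prefixed string from the rows and STUFF then removes the leading comma…

SELECT STUFF(
    (SELECT ',' + Colour
     FROM (VALUES ('Red'), ('Green'), ('Blue')) AS T(Colour)
     FOR XML PATH('')), 1, 1, '');
Red,Green,Blue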

The following functions extract a certain number of characters from a supplied string.
This returns the first 2 characters on the left…

SELECT LEFT('abcde', 2);
ab

This returns the last 2 characters…

SELECT RIGHT('abcde', 2);
de

This returns the 3 characters starting from character 2…

SELECT SUBSTRING('abcde', 2, 3);
bcd

I’m only going to show one example of the FORMAT function as there is quite a lot to this one. For full information see the MS documentation.
This example will format a supplied UK phone number into a more readable format…

SELECT FORMAT(07777123678, '+44 (0) #### ### ###');
+44 (0) 7777 123 678

Please note, as the value supplied in the first parameter is a number, the leading zero is ignored.

There are several other string functions available and please see the MS documentation for full details.


Forcing default values on nullable columns

When we add a new column to an existing table that contains data we need to think about whether we need to populate this column for the existing rows.

If the column should not allow NULL then we need to supply a default for the new column (or set it to allow NULL, backfill it, then change to not allow NULL).

We can add a default constraint to a column that allows NULL, but the default is not applied unless we specifically ask for it.

Here’s a demonstration of working with defaults on nullable columns.

First let’s create a table as follows…

DROP TABLE IF EXISTS dbo.WithValuesTest;
GO

CREATE TABLE dbo.WithValuesTest
(
    Id INT NOT NULL IDENTITY PRIMARY KEY,
    FullName VARCHAR(100) NOT NULL
);
GO

INSERT INTO dbo.WithValuesTest
(
    FullName
)
VALUES
    ('Leslie Tambourine'),
    ('Linda Pencilcase'),
    ('Sally Carpets'),
    ('Andy Anchovie'),
    ('Ronnie Clotheshorse');
GO

SELECT *
FROM dbo.WithValuesTest

This gives us the following data…
DefaultNULL01

If we try to add a new non-nullable column to this table it errors, because there is no data to put into the new column for the existing rows…

ALTER TABLE dbo.WithValuesTest ADD DateOfBirth DATETIME2(0) NOT NULL;
Msg 4901, Level 16, State 1, Line 86

ALTER TABLE only allows columns to be added that…

  1. can contain nulls, or
  2. have a DEFAULT definition specified, or
  3. the column being added is an identity or timestamp column, or
  4. the table must be empty

The column DateOfBirth cannot be added to the non-empty table WithValuesTest because it does not satisfy these conditions.

However, if we specify a default it will work…

ALTER TABLE dbo.WithValuesTest ADD DateOfBirth DATETIME2(0) NOT NULL CONSTRAINT DF_DOB DEFAULT '19000101';

SELECT *
FROM dbo.WithValuesTest

DefaultNULL02

To continue let’s drop this new column

ALTER TABLE dbo.WithValuesTest DROP CONSTRAINT IF EXISTS DF_DOB;
ALTER TABLE dbo.WithValuesTest DROP COLUMN IF EXISTS DateOfBirth;

Now let’s create the same column with the same default, but this time allowing the column to be nullable…

ALTER TABLE dbo.WithValuesTest ADD DateOfBirth DATETIME2(0) NULL CONSTRAINT DF_DOB DEFAULT '19000101';

SELECT *
FROM dbo.WithValuesTest;

DefaultNULL03

As you can see, unlike when the column didn’t allow NULL, the default value has not been used.

Let’s drop the column again and this time create it with the WITH VALUES clause…

ALTER TABLE dbo.WithValuesTest ADD DateOfBirth DATETIME2(0) NULL DEFAULT '19000101' WITH VALUES;

SELECT *
FROM dbo.WithValuesTest;

DefaultNULL04

To complete the demo let’s now try and add some new rows.

If we don’t include the DateOfBirth column in the insert list then the default is used…

INSERT INTO dbo.WithValuesTest (FullName)
VALUES ('Bernard Seesaw');

DefaultNULL05

Suppose our code needs to insert the date of birth if it’s supplied but should use the default value if it’s not supplied, i.e. if it’s NULL.

If we just try a straight insert using the NULL value then NULL is what is put into the table.

INSERT INTO dbo.WithValuesTest (FullName, DateOfBirth)
VALUES ('Lizzie Onion', '19830402'),
('Terence Artichoke', NULL);

DefaultNULL06

We can specify DEFAULT instead of NULL and it will use the default value on the insert

INSERT INTO dbo.WithValuesTest (FullName, DateOfBirth)
VALUES ('Mavis Breadbin', DEFAULT);

DefaultNULL07

The DEFAULT value can be used in an UPDATE statement as well…

UPDATE dbo.WithValuesTest
SET DateOfBirth = DEFAULT
WHERE FullName = 'Terence Artichoke'

DefaultNULL08


Partitioning Series

I’ve now written a few different posts about various parts of partitioning, so I thought I would put them all together here as a single series.

  1. Introduction to Partitioning – this post covers how to set up a table with multiple partitions using a partition scheme and function and briefly touches on partition elimination to help reduce the amount of data being read in a query.
  2. More Partitioning – this covers partition elimination in more detail and includes tips to make sure the optimiser knows it can eliminate certain partitions. It also covers the $PARTITION function.
  3. Using $PARTITION to find the last time an event occured – this post demonstrates a real life scenario where I’ve been able to use the $PARTITION function to speed up a query.
  4. 70-761 Exam Prep: Miscellaneous T-SQL – Part 1 – section 3 of this post covers truncating partitions which is new functionality available in SQL Server 2016.
  5. Modifying Partitions – this post shows how to add and remove partitions from an existing partition function.

SSDT: Target Platform and Compatibility Level

I have just seen an issue in one of my database release scripts where I’m getting a load of statements like the following…

PRINT N'Altering [dbo].[TestQueue]...';

GO

PRINT N'No script could be generated for operation 2 on object ''[dbo].[TestQueue]''.';

GO

This is because of the “Target Platform” setting in the properties of the project. For this database it was set to 2008.

This needs to be set to the version of SQL Server that the database sits on in live. In this case it was SQL Server 2014.

When I changed this it automatically updated the compatibility level of the database to 2014. However, although it sits on a SQL Server 2014 instance, this database is still at the 2008 R2 compatibility level, so I had to click on “Database Settings”, then click on the “Miscellaneous” tab and change it back to 2008.

Then, when I generated the scripts again, all the Service Broker queue ALTER statements were gone.

In summary, the “Target Platform” needs to be the version of the SQL Server instance the database sits on, and the “Compatibility level” is the compatibility level of the database itself.
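
If in doubt, you can check what the live database is actually set to; compatibility level 100 corresponds to SQL Server 2008/2008 R2 and 120 to SQL Server 2014 (YourDatabase below is just a placeholder)…

SELECT name, compatibility_level
FROM sys.databases
WHERE name = 'YourDatabase';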