Exam prep for 70-762

Last Updated 19/11/2018

A few months ago I took exam 70-761 and now I’m going to take 70-762 – Developing SQL Databases.

As part of my prep for 70-761 I wrote about a few of the topics less well known to me and they can all be found here – Exam prep for 70-761.

This page covers some of the topics covered in 70-762.

Advertisements

70-761 Exam Prep: Miscellaneous T-SQL – Part 2

I’m currently preparing to take the exam 70-761 – Querying Data with Transact-SQL and have been going through various resources including Itzit Ben-Gan’s exam guidebook which is certainly worth reading if you’re planning on taking this exam.

There are quite a few smaller bits of T-SQL functionality that I’m not that familiar with that I’m going to demonstrate in this and further posts. It’s a bit of a random selection but hopefully it will help me remember them and might be of use to others as well.

Contents

  1. Date and Time Functions
  2. IIF (Inline IF Function)
  3. CHOOSE Function
  4. String Functions

1. Date and Time Functions

There are quite a few different Date and Time functions available in SQL Server. There are a few I used regularly such as GETUTCDATE, DATEPART and DATEADD but some I hardly use or have never used at all. I can never remember the syntax for most of them so I’m going to give an example of each one along with a bit of detail about each.

Returning current date and time

The following functions all return the current date and time.
The timezone of my PC is BST.

This function is specific to SQL Server and returns a DATETIME value containing the current date and time.

SELECT GETDATE(); 
2018-04-30 22:13:18.403

This function is specific to SQL Server and returns a DATETIME value containing the current UTC date and time.

SELECT GETUTCDATE();
2018-04-30 21:13:18.403

This function is SQL standard and returns a DATETIME value containing the current date and time.

SELECT CURRENT_TIMESTAMP;
2018-04-30 22:13:18.403

This function is SQL standard and returns a DATETIME2(7) value containing the current date and time.

SELECT SYSDATETIME(); 
2018-04-30 22:13:18.4048719

This function is SQL standard and returns a DATETIME2(7) value containing the current UTC date and time.

SELECT SYSUTCDATETIME(); 
2018-04-30 21:13:18.4048719

This function is SQL standard and returns a DATETIMEOFFSET value containing the current date, time and offset.
As mentioned above this is BST which is 1 hour ahead of UTC.

SELECT SYSDATETIMEOFFSET();
2018-04-30 22:13:18.4048719 +01:00

Date Parts

These following two functions can be used to extract different parts of a supplied date and time.
For example, the following will return the month number from the supplied date or time parameter (please note I’m passing in a VARCHAR value but this is implicitly converted to a DATETIME2 data type).

SELECT DATEPART(MONTH, '20180430');
4

The following will return the minute value from the supplied date time value.

SELECT DATEPART(MINUTE, '20180430 03:23:17');
23

If no time part is included in the date value then it’s assumed to be midnight which is why this returns 0.

SELECT DATEPART(MINUTE, '20180430');
0

The following returns the number of days since the 1st Jan.

SELECT DATEPART(DAYOFYEAR, '20180430');
120

The following returns any offset in minutes. The offset is three hours so this returns 180 minutes.

SELECT DATEPART(TZoffset, '20180430 13:30:00 +03:00')
180

DATEPART always returns an INT value.

We can use the DATENAME function to return textual values for date parts.
For example, the following returns the month name for the supplied date.

SELECT DATENAME(MONTH, '20180430')
April

The DATENAME function is language dependent and will return the value in the language of the user running the query.

We can use the following functions to return date and time values from integer date parts. Each function requires a different number parameters and all are mandatory in order to build up the returned value correctly.

The following returns a DATE value.

SELECT DATEFROMPARTS(2018,04,30)
2018-04-30

The following returns a DATETIME value.

SELECT DATETIMEFROMPARTS(2018,04,30,19,42,34,13)
2018-04-30 19:42:34.013

All 7 parameters in this one are required but we can obviously use zeroes if we don’t have the required level of precision.

SELECT DATETIMEFROMPARTS(2018,04,30,19,42,0,0)
2018-04-30 19:42:00.000

The following returns a DATETIME2 type with an eighth parameter used to specify the precision of the data type.
The “fraction” parameter (the seventh one) is related to the precision which determines how many digits are allowed in the “fraction” parameter.
For example,

SELECT DATETIME2FROMPARTS(2018,04,30,19,42,34,5,1);
2018-04-30 19:42:34.5


SELECT DATETIME2FROMPARTS(2018,04,30,19,42,34,5,2);
2018-04-30 19:42:34.05


SELECT DATETIME2FROMPARTS(2018,04,30,19,42,34,5,3);
2018-04-30 19:42:34.005


SELECT DATETIME2FROMPARTS(2018,04,30,19,42,34,5,7);
2018-04-30 19:42:34.0000005

If we’re only allowing a precision of 1 we can’t specify 2 digits in the “fraction” parameter.

SELECT DATETIME2FROMPARTS(2018,04,30,19,42,34,15,1);
Msg 289, Level 16, State 5, Line 89
Cannot construct data type datetime2, some of the arguments have values which are not valid.

If we are using a precision of 0, which is just to the nearest second, the “fraction” parameter must be 0 otherwise we get the same error as above.

SELECT DATETIME2FROMPARTS(2018,04,30,19,42,34,5,0);
Msg 289, Level 16, State 5, Line 53
Cannot construct data type datetime2, some of the arguments have values which are not valid.

SELECT DATETIME2FROMPARTS(2018,04,30,19,42,34,0,0);
2018-04-30 19:42:34

The following returns a DATETIMEOFFSET value with the offset hour and minute parameters before the precision.

SELECT DATETIMEOFFSETFROMPARTS(2018,4,30,19,42,34,0,2,30,1);
2018-04-30 19:42:34.0 +02:30

The following function returns a DATE value containing the last date in the month of the supplied date.

SELECT EOMONTH('20180315');
2018-03-31

There is an optional offset parameter that can move forward or back the specified number of months

SELECT EOMONTH('20180315', -1);
2018-02-28
SELECT EOMONTH('20180315', 2);
2018-05-31

There isn’t a “start of month” function but further down I show how to do this using the DATEADD and DATEDIFF functions.

DATEADD

The DATEADD function can be used to add a specified number to the specified date part of the supplied date time value and returns a DATETIME value.
For example, the following adds 10 days to 10th April 2018.

SELECT DATEADD(DAY, 10, '20180410');
2018-04-20 00:00:00.000

The following one takes 1 month from 10th April 2018.

SELECT DATEADD(MONTH, -1, '20180410');
2018-03-10 00:00:00.000

DATEDIFF

This function compares a certain date part value between two datetime values and returns an INT value.
For example, the following calculates that there are 2 days difference between 1st April and 3rd April 2018.

SELECT DATEDIFF(DAY, '20180401', '20180403');
2

If the number returned is very large we can use this function to return the value as a BIGINT.

SELECT DATEDIFF_BIG(MILLISECOND, '19000101', GETUTCDATE());
3734111598403

Above we saw we can use the built in EOMONTH function to get the last day of the month for the supplied date.
Here’s a nice little trick to get the first date of the supplied month, in this case the current month. I’m running this on 30th April 2018

SELECT DATEADD(MONTH, DATEDIFF(MONTH, 0, GETUTCDATE()), 0);
2018-04-01 00:00:00.000

To explain how this works let’s first look at the DATEDIFF part that is passed in as the second parameter in the DATEADD function.
The start date parameter is supplied as zero. This defaults to 1st Jan 1900.
So the following two statements return the same value which is the number of months since 1st Jan 1900.
I’m running this in April 2018 so this returns 1419

SELECT DATEDIFF(MONTH, 0, GETUTCDATE())
SELECT DATEDIFF(MONTH, '19000101', GETUTCDATE())

The DATEADD function also uses 0 as the date parameter which again defaults to 1st Jan 1900.
So this function is saying “add 1419 months to 1st Jan 1900” and this gives us 1st April 2018.

Offest functions

We can use the TODATETIMEOFFSET function to add an offset to a date time value. This returns a DATETIMEOFFSET value.

SELECT TODATETIMEOFFSET('20180430 10:00', '+02:00')
2018-04-30 10:00:00.0000000 +02:00

We can use the SWITCHOFFSET function to switch from one time zone to another.
The first parameter has to be a DATETIMEOFFSET value and the second is the offset from UTC.  I’ve run this at 21:16 BST which is UTC + 1.

SELECT GETDATE();
2018-04-30 22:13:18.403


SELECT SWITCHOFFSET(SYSDATETIMEOFFSET(), '+02:00');
2018-04-30 23:13:18.4048719 +02:00

We can also add an offset using “AT TIME ZONE” and specifically specify the time zone as follows.

SELECT CAST('20180415 17:32' AS DATETIME) AT TIME ZONE 'Central European Standard Time'
2018-04-15 17:32:00.000 +02:00

2. IIF (Inline IF function)

Most programming languages provide some sort of ternary or condition operator and SQL Server is no different.  It provides the Inline IF function (IIF)
This function accepts three parameters. The first is a boolean, or an expression that returns a boolean.  If the boolean value is true then the value supplied in the second parameter is the value returned by the function.  If the boolean value is false then the value supplied in the third parameter is the value returned by the function.

SELECT IIF(1=1, 1, 0)
1

SELECT IIF(1=0, 1, 0)
0

If you are more familiar with the CASE statement, the queries above are equivalent to the following…

SELECT CASE WHEN 1=1 THEN 1 ELSE 0 END
SELECT CASE WHEN 1=0 THEN 1 ELSE 0 END

3. CHOOSE Function

The CHOOSE function accepts at least 2 parameters.
The first parameter is the “index” and has to be an INT that says which value from the preceding parameters should be returned.
The following examples all use four parameters meaning there are three values available to chose from the parameters supplied after the “index” parameter.

The following will return the value in the first parameter after the “index” value

SELECT CHOOSE(1,'a','b','c')
a

The following will return the value in the second parameter after the “index” value

SELECT CHOOSE(2,'a','b','c')
b

The following will return the value in the third parameter after the “index” value

SELECT CHOOSE(3,'a','b','c')
c

If we pass an “index” value greater than the number of values to choose from then the function returns NULL.

SELECT CHOOSE(4,'a','b','c')
NULL

4. String Functions

There are several string functions provided in SQL Server and I’m going to demonstrate some of the ones I find most useful…

 

The following repeats the string “ab” 5 times…

SELECT REPLICATE('ab', 5);
ababababab

The following reverses the supplied string…

SELECT REVERSE('abcd');
dcba

The SPLIT_STRING table valued function will split a string based on a supplied separator. This is available from SQL Server 2016 onwards.
The following splits the string into a data set of rows using the underscore character as the separator…

SELECT *
FROM STRING_SPLIT('My_Name_Is_Simon', '_')

SPLIT_STRING

The STUFF function is one of the most powerful string manipulation tools.
It can be used to insert (or stuff) a string at any point inside another string.
It can be used to replace a certain number of characters at any point within a string and replace them with another string.
The scalar function returns a VARCHAR and accepts four parameters…
1. This is the original string supplied for modification
2. This is the starting point for the string manipulation
3. This is the number of characters in the original string that are going to be deleted starting at the character supplied in parameter 2
4. This is the string that we want to “stuff” into the original string at the character supplied in parameter 2

In the following example the original string is “abcde”. Parameter 2 says we’re going to perform the “stuff” at character 1, i.e. at character “a”.  Parmeter 3 says we’re deleting 0 characters from the original string and the final parameter is the the string we want to “stuff” in at character 1.
This will insert the character “x” at the beginning of the original string…

SELECT STUFF('abcde',1,0,'x')
xabcde

We can’t use STUFF to add “x” to the end of the string.  We would want to place it as the 6th character in this example but if we try this is just returns NULL.

SELECT STUFF('abcde',6,0,'x')
NULL

However, add a string to the end of another is just simple concatenation.

SELECT 'abcde' + 'x'
SELECT CONCAT('abcde', 'x')
abcdex

Now let’s try and replace the “c” character with the “x” character using the STUFF function…

SELECT STUFF('abcde', 3, 1, 'x')
abxde

Obviously this is not specifically looking for the “c” character as the REPLACE function would, it’s just replacing the third character in the original string with the string in the fourth parameter.

If we want to replace the second and third characters of the string we can do the following…

SELECT STUFF('abcde', 2, 2, 'x')
axde

This is saying delete the two characters starting at character 2, i.e. “bc”, and insert the string in the fourth parameter, i.e. “x”.

The fourth parameter doesn’t have to be a single character…

SELECT STUFF('abcde', 2, 2, 'alongerstring')
aalongerstringde

The statement won’t error if the third parameter is a bigger number than the number of characters in the original string.
For example, if we know the original string is less than 100 characters long we can run the following to replace everything but the first character…

SELECT STUFF('abcde', 2, 100, 'XXXXXXX')
aXXXXXXX
SELECT STUFF('fghijklmnopq', 2, 100, 'XXXXXXX')
fXXXXXXX

SQL Server now supplies dynamic data masking functions but we could use STUFF to do something similar such as mask all but the last four digits of a credit card.
Card numbers are typically 16 digits long so we could delete the first 12 digits by starting at digit 1 supplying 12 as the third parameter.  We could then replace that deleted part of the string with 12 asterisk characters…

SELECT STUFF('1234567812345678',1 , 12, REPLICATE('*', 12));
************5678

One final thing on STUFF is that it can be used along with FOR XML to concatenate rows of string data into a single string.  Details of this can be found here.

The following functions extract a certain number of characters from a supplied string
This returns the first 2 characters on the left…

SELECT LEFT('abcde', 2);
ab

This returns the last 2 characters…

SELECT RIGHT('abcde', 2);
de

This returns the 3 character starting from character 2…

SELECT SUBSTRING('abcde', 2, 3);
bcd

I’m only going to show one example of the FORMAT function as there is quite a lot to this one. For full information see the MS documentation.
This example will format a supplied UK phone number into a more readable format…

SELECT FORMAT(07777123678, '+44 (0) #### ### ###');

Please note as the value supplied in the first parameter is a number the leading zero is ignored.

There are several other string functions available and please see the MS documentation for full details.

 

 

Forcing default values on a nullable columns

When we add a new column to an existing table that contains data we need to think about if we need to populate the data in this column for the existing rows.

If the column should not allow NULL then we need to supply a default for the new column (or set it to allow NULL, backfill it, then change to not allow NULL).

We can add a default constraint to a column that allows NULL but the default is not applied unless we specifically say to set it

Here’s a demonstration of working with defaults on nullable columns.

First let’s create a table as follows…

DROP TABLE IF EXISTS dbo.WithValuesTest;
GO

CREATE TABLE dbo.WithValuesTest
(
    Id INT NOT NULL IDENTITY PRIMARY KEY,
    FullName VARCHAR(100) NOT NULL
);
GO

INSERT INTO dbo.WithValuesTest
(
    FullName
)
VALUES
    ('Leslie Tambourine'),
    ('Linda Pencilcase'),
    ('Sally Carpets'),
    ('Andy Anchovie'),
    ('Ronnie Clotheshorse');
GO

SELECT *
FROM dbo.WithValuesTest

This gives us the following data…
DefaultNULL01

If we try to add a new not nullable column to this table this errors because there is no data to add into the new column

ALTER TABLE dbo.WithValuesTest ADD DateOfBirth DATETIME2(0) NOT NULL;
Msg 4901, Level 16, State 1, Line 86

ALTER TABLE only allows columns to be added that…

  1. can contain nulls, or
  2. have a DEFAULT definition specified, or
  3. the column being added is an identity or timestamp column, or
  4. the table must be empty

The column DateOfBirth cannot be added to the non-empty table WithValuesTest because it does not satisfy these conditions.

However, it we specify a default it will work…

ALTER TABLE dbo.WithValuesTest ADD DateOfBirth DATETIME2(0) NOT NULL CONSTRAINT DF_DOB DEFAULT '19000101';

SELECT *
FROM dbo.WithValuesTest

DefaultNULL02

To continue let’s drop this new column

ALTER TABLE dbo.WithValuesTest DROP CONSTRAINT IF EXISTS DF_DOB;
ALTER TABLE dbo.WithValuesTest DROP COLUMN IF EXISTS DateOfBirth;

Now let’s try to create the same column with the same default but now allow the column to be nullable

ALTER TABLE dbo.WithValuesTest ADD DateOfBirth DATETIME2(0) NULL CONSTRAINT DF_DOB DEFAULT '19000101';

SELECT *
FROM dbo.WithValuesTest;

DefaultNULL03

As you can see, unlike when the column didn’t allow NULL, the default value has not been used.

Let’s drop the column again and this time create it with the WITH VALUES clause…

ALTER TABLE dbo.WithValuesTest ADD DateOfBirth DATETIME2(0) NULL DEFAULT '19000101' WITH VALUES;

SELECT *
FROM dbo.WithValuesTest;

DefaultNULL04

To complete the demo let’s now try and add some new rows.

If we don’t include the DateOfBirth column in the insert list then the default is used…

INSERT INTO dbo.WithValuesTest (FullName)
VALUES ('Bernard Seesaw');

DefaultNULL05

Suppose our code needs to insert the date of birth if it’s supplied but should use the default value if it’s not supplied, i.e. if it’s NULL.

If we just try a straight insert using the NULL value then NULL is what is put into the table.

INSERT INTO dbo.WithValuesTest (FullName, DateOfBirth)
VALUES ('Lizzie Onion', '19830402'),
('Terence Artichoke', NULL);

DefaultNULL06

We can specify DEFAULT instead of NULL and it will use the default value on the insert

INSERT INTO dbo.WithValuesTest (FullName, DateOfBirth)
VALUES ('Mavis Breadbin', DEFAULT);

DefaultNULL07

The DEFAULT value can be used in an UPDATE statement as well…

UPDATE dbo.WithValuesTest
SET DateOfBirth = DEFAULT
WHERE FullName = 'Terence Artichoke'

DefaultNULL08

 

Exam prep for 70-761

It’s been a little while since I took an exam so I’d thought I’d have a go at 70-761.

It covers a lot of the same material that was covered in 70-461 which was based on SQL Server 2012. 70-761 is based on SQL Server 2016 so I’m going to write about a few of the newer features in 2016 as part of my exam prep.

There is a good video on Pluralsight named “SQL Server 2016 New Features for Developers” by Leonard Lobel that covers the new features well.

Below are some details on some of the new features

I’ve also been working through the 70-761 guidebook by Itzik Ben-Gan and there are a few bits I’m less familiar with that I’m going to write about to make sure I fully understand them

GROUPING SETS (and ROLLUP & CUBE)

As part of my preparation for the 70-761 exam I’ve been trying to get a full understanding of some of the T-SQL I’m less familiar with. One example of this is the GROUPING SETS clause. There are also ROLLUP and CUBE clauses but I believe these are deprecated and GROUPING SETS can do everything ROLLUP and CUBE can do but with much more control.

The best way to explain how GROUPING SETS works is to show some example code. I’ll show how it compares to ROLLUP and CUBE in case you need to update any code to convert it to using GROUPING SETS.

So lets create some test data. I’m just creating a simple table containing a few purchases made by a few customers…

DROP TABLE IF EXISTS dbo.GroupingSetsDemo
GO

CREATE TABLE dbo.GroupingSetsDemo
 (
 Id INT NOT NULL IDENTITY(1,1),
 CustomerID INT NOT NULL,
 PurchaseDate DATETIME2(0) NOT NULL,
 PurchaseItem VARCHAR(100) NOT NULL,
 Amount DECIMAL(10,2) NOT NULL,
 );
 GO

INSERT INTO dbo.GroupingSetsDemo
 (
 CustomerID,
 PurchaseDate,
 PurchaseItem,
 Amount
 )
 VALUES
 (1, '20171106 12:34', 'Rugby Ball', 14.99),
 (1, '20180307 09:54', 'Frying Pan', 21.50),
 (1, '20180307 10:02', 'Vacuum Cleaner', 259.990),
 (2, '20171106 15:26', 'Orchid', 13.62),
 (2, '20180228 18:47', 'Book', 7.99),
 (2, '20180301 08:02', 'Snakes and Ladders', 8.00),
 (2, '20180306 14:32', 'Pillows', 20.00),
 (2, '20180306 14:32', 'Washing Machine', 359.99),
 (2, '20180307 23:15', 'Jeans', 59.99),
 (3, '20180301 09:27', 'Rucksack', 13.99),
 (3, '20180301 09:27', 'Scarf', 12.50);
 GO

SELECT *
 FROM dbo.GroupingSetsDemo;
 GO

GS1

If we want to see the total value of the purchases for each customer we can just group by the CustomerID and sum up the Amount values as follows…

SELECT CustomerID, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY CustomerID

GS2

As well a grouping by columns we can group by what are called GROUPING SETS. This example just shows how we can use GROUPING SETS to mimic the previous query to get total sales by customer…

SELECT CustomerID, SUM(Amount) AS TotalAmount
 FROM dbo.GroupingSetsDemo WFD
 GROUP BY
 GROUPING SETS ((CustomerID));

GS2

So in this case we’ve just said to group by a single grouping set for CustomerID.

The really useful stuff starts to happen when we have more than one grouping set. One useful example is where you want to create a total value in the same data set. To do this we just need to add an additional grouping set telling SQL Server to also group over all columns to give us the total amount for all customers…

SELECT CustomerID, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY 
 GROUPING SETS ((CustomerID), ());

GS3

Please note, that you still need to make sure all columns in the SELECT clause appear in at least one grouping set. So we can’t just include the total group set from the above query. If we try, we get the following error as expected…

SELECT CustomerID, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY 
 GROUPING SETS (());
Msg 8120, Level 16, State 1, Line 56
 Column 'dbo.GroupingSetsDemo.CustomerID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

More interesting things start to happen when we want to group over multiple columns. Let’s bring the purchase year into our query as follows..

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY 
 GROUPING SETS ((CustomerID, YEAR(WFD.PurchaseDate)));

GS4

We can now add more grouping sets to this query, for example, to give total amount for each customer…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY 
 GROUPING SETS (
    (CustomerID, YEAR(WFD.PurchaseDate)),
    (CustomerID)
 );

GS5

… or for each year…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY 
 GROUPING SETS (
    (CustomerID, YEAR(WFD.PurchaseDate)),
    (YEAR(WFD.PurchaseDate))
 );

GS6

We can also combine everything to give us all possible total values

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY 
 GROUPING SETS (
    (CustomerID, YEAR(WFD.PurchaseDate)),
    (CustomerID),
    (YEAR(WFD.PurchaseDate)),
    ()
 );

GS7

As we can see in the previous example, whenever we are adding a grouping set and getting a total row appear, one or more of the values returned can be NULL.  However, what if one of the values being grouped on is NULL.  How do we differential between the original NULL and the NULL created as part of the grouping set.

In the following example one of the PurchaseItem values is NULL….

DROP TABLE IF EXISTS dbo.GroupingSetsDemoNULL

CREATE TABLE dbo.GroupingSetsDemoNULL
(
  Id INT NOT NULL IDENTITY(1,1),
  CustomerID INT NOT NULL,
  PurchaseItem VARCHAR(100) NULL,
  Amount DECIMAL(10,2) NOT NULL,
);
GO

INSERT INTO dbo.GroupingSetsDemoNULL
(
  CustomerID,
  PurchaseItem,
  Amount
)
VALUES
  (1, 'Rugby Ball', 10),
  (1, 'Frying Pan', 20),
  (1, NULL, 30);
GO

SELECT *
FROM dbo.GroupingSetsDemoNULL;

GS11

Let’s say we want to give a total per customer and per item, and a grand total.  We need to create three grouping sets as follows…

SELECT CustomerID, PurchaseItem, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemoNULL WFD
GROUP BY 
  GROUPING SETS (
    (CustomerID),
    (PurchaseItem),
    ()
  );

GS12

As we can see, we’ve got two rows where the CustomerID and PurchaseItems are NULL.  How do we know which one of these is the aggregation of the NULL PurchaseItem acorss all customer and which one is the grand total?  In this example looking at the amount we can see the that row 4 with the value of 60 is the grand total but we can also use the GROUPING_ID  to show this explicitly.

The GROUPING_ID function is a scalar function that returns a BIT.  We pass in a particular column name and this returns a flag saying if each row is part of the aggregation for that column or not.

In this example we’ve got two columns we are aggregating over so let’s use the GROUPING_ID function on both of these as follows…

SELECT CustomerID, PurchaseItem, SUM(Amount) AS TotalAmount,
     GROUPING_ID(CustomerID) AS CustomerIDGrouped, GROUPING_ID(PurchaseItem) AS PurchaseItemGrouped
FROM dbo.GroupingSetsDemoNULL WFD
GROUP BY 
  GROUPING SETS (
    (CustomerID),
    (PurchaseItem),
    ()
  );

GS13

As we can see row 1 is grouped on the CustomerID only and is the aggregation of the NULL PurchaseItem for all customers and row 4 is the grand total aggregated across all customers and items.

As mentioned above, the only requirement is that each non-aggregated column appears in at least one grouping set so we can return just the total for each customer and for each year as follows…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY 
 GROUPING SETS (
    (CustomerID),
    (YEAR(WFD.PurchaseDate))
 );

GS8

So what about ROLLUP and CUBE (but please remember these are deprecated)?

CUBE just means that the query is grouped by every possible grouping set, i.e. every combination of non-aggregated columns are used as the grouping sets. The syntax for using CUBE is as follows…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY YEAR(WFD.PurchaseDate), WFD.CustomerID
WITH CUBE
ORDER BY WFD.CustomerID, PurchaseYear

GS

This is the same as using the following grouping sets…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY
 GROUPING SETS (
    (YEAR(WFD.PurchaseDate), WFD.CustomerID),
    (WFD.CustomerID),
    (YEAR(WFD.PurchaseDate)),
    ()
 )
ORDER BY WFD.CustomerID, PurchaseYear

ROLLUP is slightly different and it depends on the order of the columns in the GROUP BY clause. It works by creating a grouping set across all columns in the GROUP BY clause, then creating additional sets across all columns except the last and each time dropping the last column off the end until it’s just left with the empty group. So if we’re grouping over three columns it will create a set for (1,2,3) then for (1,2) then for (1) and finally for (). Here’s an example…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY WFD.CustomerID, YEAR(WFD.PurchaseDate)
WITH ROLLUP

GS9

So this is grouping over both customer and year, then by just customer and finally over all columns. This is the same as creating the following grouping sets…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY
 GROUPING SETS (
    (CustomerID, YEAR(WFD.PurchaseDate)),
    (CustomerID),
    ()
 )

As mentioned, the order of the columns in the GROUP BY clause makes a difference and switching them from WFD.CustomerID, YEAR(WFD.PurchaseDate) to YEAR(WFD.PurchaseDate), WFD.CustomerID gives different results…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY YEAR(WFD.PurchaseDate), WFD.CustomerID
WITH ROLLUP

GS10

This time we’re grouping by year and customer, then by year, and finally by all columns.

70-761 Exam Prep: Miscellaneous T-SQL – Part 1

I’m currently preparing to take the exam 70-761 – Querying Data with Transact-SQL and have been going through various resources including Itzit Ben-Gan’s exam guidebook which is certainly worth reading if you’re planning on taking this exam.

There are quite a few smaller bits of T-SQL functionality that I’m not that familiar with that I’m going to demonstrate in this and further posts. It’s a bit of a random selection but hopefully it will help me remember them and might be of use to others as well.

Contents

  1. ALL ANY SOME with subqueries
  2. TOP (x) WITH TIES
  3. Truncate partition
  4. TRY_CAST, TRY_CONVERT, TRY_PARSE

1. ALL ANY SOME with subqueries

This is something that I’ve never seen anyone use anywhere but I’ve had questions come up on this in two different exams!

We use the ALL, ANY and SOME keywords when compare data against rows returned by a subquery.  For example, if we have two tables: the first containing numbers 1 to 5 and the second containing 3 to 5 we can use ALL to return all rows from the first table that are less than ALL the values in the second table.  The only values that are less than all the values in the second table are 1 and 2.  We can also use ANY or SOME (they both mean the same thing) to return all numbers less than any of the numbers in the second table.  1 and 2 are less so they all so they get returned, 3 is less than 4 and 5 that that get’s returns, and 4 is less than 5 so that gets returned.

Here’s the code, so it should make more sense looking at this…

DROP TABLE IF EXISTS dbo.AllAnySome;
GO

CREATE TABLE dbo.AllAnySome (SomeNumber INT);
GO

INSERT INTO dbo.AllAnySome (SomeNumber)
VALUES (1), (2), (3), (4), (5);
GO

DROP TABLE IF EXISTS dbo.AllAnySomeLookup;
GO

CREATE TABLE dbo.AllAnySomeLookup (SomeNumber INT);
GO

INSERT INTO AllAnySomeLookup (SomeNumber)
VALUES (3), (4), (5);
GO

SELECT *
FROM dbo.AllAnySome
WHERE SomeNumber < ALL (SELECT * FROM dbo.AllAnySomeLookup);
GO

AAS1

SELECT *
FROM dbo.AllAnySome
WHERE SomeNumber < SOME (SELECT * FROM dbo.AllAnySomeLookup);
GO

AAS2

SELECT *
FROM dbo.AllAnySome
WHERE SomeNumber < ANY (SELECT * FROM dbo.AllAnySomeLookup);
GO

AAS2

The query above using ALL is returning all the row from dbo.AllAnySome that are less than the minimum value from dbo.AllAnySomeLookup, so it’s equivalent to…

SELECT *
FROM dbo.AllAnySome
WHERE SomeNumber < (SELECT MIN(SomeNumber) FROM dbo.AllAnySomeLookup);
GO

We can also do the following…

SELECT *
FROM dbo.AllAnySome
WHERE SomeNumber = ALL (SELECT * FROM dbo.AllAnySomeLookup);
GO

AAS3

… which is the same as using the IN keyword…

SELECT *
FROM dbo.AllAnySome
WHERE SomeNumber IN (SELECT * FROM dbo.AllAnySomeLookup);
GO

AAS3

In this example, ALL could also be replaced with SOME or ANY to return the same results.

2. TOP (x) WITH TIES

You can specify TOP (x) immediately after a SELECT clause to restrict the returned result set to only return x number of rows.  Here’s an example where we’ve got a table with 12 rows and we’re just asking the top 10 rows…

DROP TABLE IF EXISTS dbo.TiesTest;

CREATE TABLE dbo.TiesTest (Id INT);

INSERT INTO dbo.TiesTest (Id)
VALUES (1), (1), (1),
 (2), (2), (2),
 (3), (3), (3),
 (4), (4), (4);

SELECT TOP (10) *
FROM dbo.TiesTest
ORDER BY Id;

TopTies1

As we can see this query has returned all the rows with values 1, 2 and 3 but only one of the three rows with value 4.  If we wanted the TOP 10 rows but also wanted to include any other rows outside of the top 10 with the same values we can use the WITH TIES clause, meaning any subsequent rows where the value ties with the last value will be included.

SELECT TOP (10) WITH TIES *
FROM dbo.TiesTest
ORDER BY Id

TopTies2

As we can see we’ve now got 12 rows returned instead of the 10 as the values in rows 11 and 12 tie with the value in row 10.

One last thing to note is we must use ORDER BY if we’re using WITH TIES.

3. TRUNCATE PARTITION

In SQL Server 2016 we now have the ability to truncate a single partition.  Prior to this we could only truncate the entire table  Here’s an example of how to truncate a single partition…

CREATE PARTITION FUNCTION PF1 (DATETIME2(0))
AS RANGE RIGHT FOR VALUES ('20171001', '20171101', '20171201', '20180101', 
 '20180201', '20180301', '20180401');
GO

CREATE PARTITION SCHEME PS1
AS PARTITION PF1 ALL TO ([PRIMARY]);
GO

CREATE TABLE dbo.PartitionedTable
(
 Id INT IDENTITY(1,1) NOT NULL,
 FirstName VARCHAR(100) NOT NULL,
 LastName VARCHAR(100) NOT NULL,
 DateOfBirth DATETIME2(0) NOT NULL,
 Address VARCHAR(400) NOT NULL,
 RegisteredDate DATETIME2(0) NOT NULL
) ON PS1(RegisteredDate);

INSERT INTO dbo.PartitionedTable (FirstName, LastName, DateOfBirth, Address, RegisteredDate)
VALUES
('Angela', 'Bennett', '1985-06-03', '1 The Road, This Town, AA1 2BB', '20171015'),
('Colin', 'Denton', '1952-12-05', '12 The Road, This Town, AA1 2BB', '20171125'),
('Eleanor', 'Francis', '1988-07-23', '9 The Road, This Town, AA1 2BB', '20171214'),
('Gerald', 'Harris', '1985-07-25', '102 The Road, This Town, AA1 2BB', '20180109'),
('Iris', 'Jones', '1947-10-18', '93 The Road, This Town, AA1 2BB', '20180206'),
('Kirk', 'Lennox', '2001-04-19', '2 The Road, This Town, AA1 2BB', '20180328');


SELECT *
FROM dbo.PartitionedTable;

TruncPart1

So now we’ve got a partitioned table with a single row on different partitions.

SELECT *
FROM sys.partitions
WHERE object_id = OBJECT_ID('dbo.PartitionedTable');

TruncPart2

As we can see there is a row on partitions 2 to 7.

We can now truncate partitions 2, 3 and 4 which contains the rows for Angela Bennett, Colin Denton and Eleanor Francis as follows…

TRUNCATE TABLE dbo.PartitionedTable WITH (PARTITIONS (2 TO 4));

We can now see that the data has been removed from the table and the partitions now have no rows…

SELECT *
FROM dbo.PartitionedTable;

TruncPart3

SELECT *
FROM sys.partitions
WHERE object_id = OBJECT_ID('dbo.PartitionedTable')

TruncPart4

4. TRY_CAST, TRY_CONVERT, TRY_PARSE

Let’s say, for example, that we have a fairly simple process that loads data from one table to another.  We get a feed of the source data into a table from a third party and it contains date values stored as text strings.  Our target table needs to store the date as a DATE data type rather than a string so we explicitly convert it into a DATE when loading into the target.  We’re just using an insert statement to load the data but the data from the supplier is sometimes invalid and causes the whole batch to fail.  We want a way to allow the valid data to go through (and possibly store the failed data somewhere else).  Let’s set up some test data to demo this…

CREATE TABLE T_Source
(
 ItsADate VARCHAR(20)
);

INSERT INTO T_Source (ItsADate)
VALUES 
('20180203'),
('20180204'),
('ThisIsntADate'),
('20180205');

SELECT *
FROM T_Source;

misc1

As we can see the third row does not contain a valid date.

Let’s now create our target table and try and load the source data in…

CREATE TABLE T_Target
(
 DefinitelyADate DATE
);

INSERT INTO dbo.T_Target (DefinitelyADate)
SELECT CAST(ItsADate AS DATE)
FROM dbo.T_Source;

Msg 241, Level 16, State 1, Line 32
Conversion failed when converting date and/or time from character string.

So because of the one invalid row, the other valid rows do not get loaded.  Instead of using CAST we can use TRY_CAST instead which will return NULL if the value can’t be cast and won’t return an error.

SELECT TRY_CAST(ItsADate AS DATE)
FROM dbo.T_Source

misc2

We probably wouldn’t want to load NULL in but we could do some filtering or maybe just use ISDATE() = 1.

CONVERT and PARSE have “TRY” equivalents…

SELECT TRY_CONVERT(DATE, 'simon')
SELECT TRY_PARSE('99 March 2018' AS DATE USING 'en-gb')

These two queries just return NULL without throwing an error.

 

WINDOW FUNCTIONS – PART 3: Offset Functions

As part of my preparation for the 70-761 exam I’ve been trying to get a full understanding of the different types of window functions which are as follows:

  1. Ranking
  2. Aggregate
  3. Offset

I know the ranking functions pretty well but haven’t used the aggregate and offset functions at all really.

This third and final post demonstrates the offset functions.

I’ve been going through the official exam guidebook by Itzik Ben-Gan and he explains how the window functions work very well and the whole book is certainly worth reading if you use T-SQL at all.

With every window function you use the OVER clause. This is the part that specifies the “window” you are ranking or aggregating over.

The Offset Functions

For the demonstrations I’m going to use the same data set I used in the ranking function demo and create a table with 11 rows that contain purchases for 3 customers as follows…

CREATE TABLE dbo.WindowFunctionDemo
(
   Id INT NOT NULL IDENTITY(1,1),
   CustomerID INT NOT NULL,
   PurchaseDate DATETIME2(0) NOT NULL,
   PurchaseItem VARCHAR(100) NOT NULL,
   Amount DECIMAL(10,2) NOT NULL,
);
GO

INSERT INTO dbo.WindowFunctionDemo
(
   CustomerID,
   PurchaseDate,
   PurchaseItem,
   Amount
)
VALUES
 (1, '20180306 12:34', 'Rugby Ball', 14.99),
 (1, '20180307 09:54', 'Frying Pan', 21.50),
 (1, '20180307 10:02', 'Vacuum Cleaner', 259.990),
 (2, '20180225 15:26', 'Orchid', 13.62),
 (2, '20180228 18:47', 'Book', 7.99),
 (2, '20180301 08:02', 'Snakes and Ladders', 8.00),
 (2, '20180306 14:32', 'Pillows', 20.00),
 (2, '20180306 14:32', 'Washing Machine', 359.99),
 (2, '20180307 23:15', 'Jeans', 59.99),
 (3, '20180301 09:27', 'Rucksack', 13.99),
 (3, '20180301 09:27', 'Scarf', 12.50);
GO

SELECT *
FROM dbo.WindowFunctionDemo;
GO

Windows Ranking 1

The offset window functions allow you to return data from other rows based on their distance away from the current row.  For example, we might want to return all of the purchases made today but in the same row return the value of the previous purchase that the customer made.

The LAG and LEAD window functions allow you to do this: LAG allows you to access previous rows and LEAD allows you to access following rows.

The data set must be ordered so the functions can find previous and following rows correctly.

The following example returns all the data in the table along with the amount of the previous and next purchases each customer has made…

SELECT *,
 LAG(Amount) OVER (PARTITION BY CustomerID ORDER BY PurchaseDate) AS PreviousAmount,
 LEAD(Amount) OVER (PARTITION BY CustomerID ORDER BY PurchaseDate) AS NextAmount
FROM dbo.WindowFunctionDemo;
GO

LAG LEAD 1

You can pass a second parameter into the LAG and LEAD functions to access other rows and not just the ones immediately either side of the current row.  This example returns the Amount value from the two previous orders…

SELECT *,
   LAG(Amount) OVER (PARTITION BY CustomerID ORDER BY PurchaseDate) AS PreviousAmount,
   LAG(Amount, 2) OVER (PARTITION BY CustomerID ORDER BY PurchaseDate) AS SecondPreviousAmount
FROM dbo.WindowFunctionDemo;
GO

LAG LEAD 2

If not supplied, a default value of 1 is used as this parameter so

LAG(Amount)

is the same as

LAG(Amount,1)

There is also a third parameter that can be passed in that will convert NULLs to a default value…

SELECT *,
   LAG(Amount, 1, -100) OVER (PARTITION BY CustomerID ORDER BY PurchaseDate) AS PreviousAmount,
   LAG(Amount, 2, -200) OVER (PARTITION BY CustomerID ORDER BY PurchaseDate) AS SecondPreviousAmount
FROM dbo.WindowFunctionDemo;
GO

LAG LEAD 3

There are also FIRST_VALUE and LAST_VALUE offset functions that return the first and last values in the window frame.

This example returns the Amount value of the first purchase for each customer…

SELECT *,
 FIRST_VALUE(Amount) OVER (PARTITION BY CustomerID ORDER BY PurchaseDate) AS FirstPurchaseAmount
FROM dbo.WindowFunctionDemo;
GO

LAG LEAD 4

We can do something similar to find the last value in the window frame… (however there is something you need to be careful with here!!!)

SELECT *,
   LAST_VALUE(Amount) OVER (PARTITION BY CustomerID ORDER BY PurchaseDate) AS LatestPurchaseAmount
FROM dbo.WindowFunctionDemo;
GO

LAG LEAD 5

As you can see this is not showing the results we might expect.  We might expect to see 259.99 in the LatestPurchaseAmount for all rows for CustomerID = 1.  If you read my post on the aggregate window functions you might remember that the default window frame is…

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

This means that for each row the window frame, i.e. the data available to the offset function, is all previous rows up to and including the current row.  As we’re ordering by PurchaseDate, the window frame for the first purchase by customer 1 of the Rugby Ball on 2018-03-06 12:34:00 only contains that single row.  The window frame is restricting access to the rows for later purchases.

All we need to do to get the required result is make sure we’re using the correct window frame, i.e. make sure the frame includes the latest purchase for each customer.  We can do this as follows…

SELECT *,
   LAST_VALUE(Amount) OVER (
                     PARTITION BY CustomerID 
                     ORDER BY PurchaseDate
                     ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 ) AS LatestPurchaseAmount
FROM dbo.WindowFunctionDemo;
GO

LAG LEAD 6

We could also use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING which spreads the window frame across the whole partition.