Columnstore Indexes – part 71 (“Change Data Capture, Change Tracking & Temporal”)

来源:互联网 时间:1970-01-01

Continuation from the previous 70 parts, the whole series can be found at http://www.nikoport.com/columnstore/ .

In the SQL Server world of data modification tracking, there are 2 very important features, which are called Change Data Capture (CDC) and Change Tracking (CT). Both of them allow you to understand the changes done to the monitored table. Notice that they are implemented in the different way, and have different goals.

This blogpost is dedicated to their integration and collaboration with the Columnstore Indexes in SQL Server 2016.

First of all, let’s consider the reasons for not enabling this functionality in SQL Server 2012 & SQL Server 2014:

In SQL Server 2012 we have had only Nonclustered Columnstore Indexes, which were non-updatable, implementing support for CDC & CT have simply made no sense whatsoever.

In SQL Server 2014 the Nonclustered Columnstore Indexes were kept being unupdatable, and so it was still nonsensical to make them support CDC or CT. As for the Clustered Columnstore Indexes in SQL Server 2014, they were implemented as updatable, but unfortunately both functionalities stayed out of the scope for that release.

In SQL Server 2016, we have received support for both of the scenarios, but lets take it slow and 1 by 1 diving into some of the details and tests:

Change Data Capture

The Change Data Capture is the feature that was originally implemented for the good old SQL Server 2008, with the objective of capturing data modification statements, executed against SQL Server tables. There were some important enhancement since the original version, like the addition of SSIS components in SQL Server 2012, but largely the feature has been set in the original release.

In SQL Server 2016 Clustered Columnstore Indexes have received the support of the secondary Nonclustered B-Tree Indexes, including the Unique ones, you can find more information and example on this topic at the blogpost dedicated to the Clustered Columnstore Indexes Architectural Improvements for SQL Server 2016 .

From the logical perspective, in order to have support for CDC, we shall need to build a table with Clustered Columnstore Index, and we shall have to add a Unique Nonclustered Index, which we shall indicate CDC to use for collection of data modifications, if we are looking to get the net changes, while for the default implementation it is not required. But how should that work ? Given that Clustered Columnstore is the primary Index, we shall have to read its original data from the Delta-Store or from the compressed Row Group, where in the second case the performance will never be something that can be used in the real world scenario…My thoughts are that the CDC support for Clustered Columnstore Index requires a unuseful overload for enabling it on a real Fact table, because in Data Warehousing environments doing ETL would allow you to save the modifications directly within ETL process.

For the start, let’s restore a fresh copy of ContosoRetailDW database and apply the usual modifications, such as setting Compatibility Level to 130, setting SA to be the Database Owner and increasing the default sizes for the Data File & Transaction Log:

USE [master] alter database ContosoRetailDW set SINGLE_USER WITH ROLLBACK IMMEDIATE; RESTORE DATABASE [ContosoRetailDW] FROM DISK = N'C:/Install/ContosoRetailDW.bak' WITH FILE = 1, MOVE N'ContosoRetailDW2.0' TO N'C:/Data/ContosoRetailDW.mdf', MOVE N'ContosoRetailDW2.0_log' TO N'C:/Data/ContosoRetailDW.ldf', NOUNLOAD, STATS = 5; alter database ContosoRetailDW set MULTI_USER;GOGOALTER DATABASE [ContosoRetailDW] SET COMPATIBILITY_LEVEL = 130GOALTER AUTHORIZATION ON DATABASE::ContosoRetailDW to sa;GOALTER DATABASE [ContosoRetailDW] MODIFY FILE ( NAME = N'ContosoRetailDW2.0', SIZE = 2000000KB , FILEGROWTH = 128000KB )GOALTER DATABASE [ContosoRetailDW] MODIFY FILE ( NAME = N'ContosoRetailDW2.0_log', SIZE = 400000KB , FILEGROWTH = 256000KB )GO USE[master]alter databaseContosoRetailDWsetSINGLE_USERWITHROLLBACKIMMEDIATE;RESTORE DATABASE[ContosoRetailDW]FROM DISK=N'C:/Install/ContosoRetailDW.bak'WITH FILE=1, MOVEN'ContosoRetailDW2.0'TON'C:/Data/ContosoRetailDW.mdf',MOVEN'ContosoRetailDW2.0_log'TON'C:/Data/ContosoRetailDW.ldf', NOUNLOAD, STATS=5;alter databaseContosoRetailDWsetMULTI_USER;GOGOALTER DATABASE[ContosoRetailDW]SETCOMPATIBILITY_LEVEL=130GOALTER AUTHORIZATIONONDATABASE::ContosoRetailDWtosa;GOALTER DATABASE[ContosoRetailDW]MODIFYFILE(NAME=N'ContosoRetailDW2.0',SIZE=2000000KB,FILEGROWTH=128000KB)GOALTER DATABASE[ContosoRetailDW]MODIFYFILE(NAME=N'ContosoRetailDW2.0_log',SIZE=400000KB,FILEGROWTH=256000KB)GO

Now we can drop Clustered B-Tree Index and create a Clustered Columnstore Index on our dbo.FactOnlineSales table:

use ContosoRetailDW;GOALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [PK_FactOnlineSales_SalesKey];create clustered columnstore Index PK_FactOnlineSaleson dbo.FactOnlineSales; useContosoRetailDW;GOALTER TABLEdbo.[FactOnlineSales]DROPCONSTRAINT[PK_FactOnlineSales_SalesKey]; create clusteredcolumnstoreIndexPK_FactOnlineSaleson dbo.FactOnlineSales;

To enable Change Data Capture (CDC), we shall need to execute the following script, which will enable it on the database level:

exec sys.sp_cdc_enable_db; exec sys.sp_cdc_enable_db;

Now, let’s try to enable CDC on our test table (notice that I am using most of the default parameters while explicitly setting NET CHANGES to 1 (track all net changes), because this is what I expect from my CDC-enabled table:

exec [email protected]_schema = N'dbo',@source_name = N'FactOnlineSales',@role_name = NULL,@filegroup_name = N'PRIMARY',@supports_net_changes = 1; exec [email protected]_schema=N'dbo',@source_name =N'FactOnlineSales',@role_name =NULL,@filegroup_name=N'PRIMARY',@supports_net_changes=1;

Look at the error message that I have received:

Msg 22939, Level 16, State 1, Procedure sp_cdc_enable_table_internal, Line 201The parameter @supports_net_changes is set to 1, but the source table does not have a primary key defined and no alternate unique index has been specified. Msg22939,Level16,State1,Proceduresp_cdc_enable_table_internal,Line201The [email protected]_net_changesissetto1,butthesourcetabledoesnothaveaprimarykeydefinedandnoalternateuniqueindexhasbeenspecified.

Yeah, I am missing a unique index in my table, because I have removed my Clustered B-Tree Index previously.Let’s add a Unique Nonclustered B-tree Index on the SalesKey column and try to execute the sys.sp_cdc_enable_table again:

create unique nonclustered index UQ_FactOnlineSaleson dbo.FactOnlineSales (OnlineSalesKey)with (DATA_COMPRESSION = PAGE);EXEC [email protected]_schema = N'dbo',@source_name = N'FactOnlineSales',@role_name = NULL,@filegroup_name = N'PRIMARY',@supports_net_changes = 1,@index_name = 'UQ_FactOnlineSales'; create uniquenonclusteredindexUQ_FactOnlineSaleson dbo.FactOnlineSales(OnlineSalesKey)with(DATA_COMPRESSION=PAGE); EXEC [email protected]_schema=N'dbo',@source_name =N'FactOnlineSales',@role_name =NULL,@filegroup_name=N'PRIMARY',@supports_net_changes=1,@index_name='UQ_FactOnlineSales';

This time I have received the following message:

Msg 35353, Level 16, State 1, Procedure sp_cdc_enable_table_internal, Line 215Change Data Capture cannot be enabled on a table with a clustered columnstore index. Consider dropping clustered columnstore index 'PK_FactOnlineSales' on table 'FactOnlineSales'. Msg35353,Level16,State1,Proceduresp_cdc_enable_table_internal,Line215Change DataCapturecannotbeenabledonatablewithaclusteredcolumnstoreindex. Consider droppingclusteredcolumnstoreindex'PK_FactOnlineSales'ontable'FactOnlineSales'.

This means that Clustered Columnstore Indexes are not supporting CDC and that’s perfectly fine from my personal point of view.Notice that trying to enable the CDC even without net changes support, will result in the very same error, meaning that this scenario is not being supported in the moment.

What about the Nonclustered Columnstore Indexes ?For the test, let’s use the different table dbo.FactSales – beginning with the creation of the updatable Nonclustered Columnstore Index:

create nonclustered columnstore index [NCCI_FactSales_SalesKey]on [dbo].[FactSales] (SalesKey, DateKey, channelKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, UnitCost, UnitPrice, SalesQuantity, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, SalesAmount, ETLLoadID, LoadDate, UpdateDate) create nonclusteredcolumnstoreindex[NCCI_FactSales_SalesKey]on[dbo].[FactSales](SalesKey,DateKey,channelKey,StoreKey,ProductKey,PromotionKey,CurrencyKey,UnitCost,UnitPrice,SalesQuantity,ReturnQuantity,ReturnAmount,DiscountQuantity,DiscountAmount,TotalCost,SalesAmount,ETLLoadID,LoadDate,UpdateDate)

Now, let’s enable the CDC scenario on it, using the same parameters as we have had try on the dbo.FactOnlineSales table:

EXEC [email protected]_schema = N'dbo',@source_name = N'FactSales',@role_name = NULL,@filegroup_name = N'PRIMARY',@supports_net_changes = 1 EXEC [email protected]_schema=N'dbo',@source_name =N'FactSales',@role_name =NULL,@filegroup_name=N'PRIMARY',@supports_net_changes=1

This time it has worked without any error message!

To confirm that it was succesfull, let’s execute the sys.sp_cdc_help_change_data_capture system stored procedure, which will show all tables that are configured for the Change Data Capture:

exec sys.sp_cdc_help_change_data_capture; exec sys.sp_cdc_help_change_data_capture;

Magnificent, so that means that we have a Rowstore B-Tree table with an updatable Nonclustered Columnstore Index that supports CDC. This makes perfectly sense, because all we actually need to make it work with the traditional Rowstore structure, simply ignoring the secondary Columnstore Index.

But let’s try to insert, update & delete some data from our table, to make it sure, that everything works fine:

delete top (10) from dbo.FactSales; delete top(10)fromdbo.FactSales;

The deletion operation was a success, and we can confirm it with the execution of the following statement that will count the rows from the cdc.dbo_FactSales_CT table, the default system table that stores modifications from our original dbo.FactSales table:

select count(*) as ModificationsCountfrom [cdc].[dbo_FactSales_CT]; select count(*)asModificationsCountfrom[cdc].[dbo_FactSales_CT];

In the result, I have received exactly what I have expected – 10 rows.Now let’s update some data:

update top (20) dbo.FactSalesset UpdateDate = GetDate(); update top(20)dbo.FactSalesset UpdateDate=GetDate();

Let’s see if we get 50 rows from the cdc.dbo_FactSales_CT table meaning that we have received 40 modifications, because 20 modifications are stored as 20 deletes and 20 inserts:

select count(*) as ModificationsCountfrom [cdc].[dbo_FactSales_CT]; select count(*)asModificationsCountfrom[cdc].[dbo_FactSales_CT];

Success!

Change Tracking

Change Tracking is a solution that provides opportunity with a lightweight tracking of changes made to a table.

To set it up for your database is very easy – its enough to execute the following command:

alter database ContosoRetailDWset change_tracking = ON (CHANGE_RETENTION = 10 DAYS, AUTO_CLEANUP = ON); alter databaseContosoRetailDWset change_tracking=ON (CHANGE_RETENTION=10DAYS,AUTO_CLEANUP=ON);

Notice that you can also set up all of these properties by using the properties dialog of the database that you are configuring.

Since Change Tracking requires a primary key on the tracked table, we shall need to set up our Clustered Columnstore Table with a Primary Key:

alter table dbo.FactOnlineSalesadd constraint PK_FactOlnineSalesprimary key Nonclustered (OnlineSalesKey); alter tabledbo.FactOnlineSalesadd constraintPK_FactOlnineSalesprimary keyNonclustered(OnlineSalesKey);

Now, let’s enable Change Tracking on our Clustered Columnstore table dbo.FactOnlineSales

alter table dbo.FactOnlineSalesenable change_trackingwith (track_columns_updated = on); alter tabledbo.FactOnlineSalesenable change_trackingwith(track_columns_updated=on);

Well, again for Clustered Columnstore Index we have received an error message:

Msg 35353, Level 16, State 1, Line 1Change Tracking cannot be enabled on a table with a clustered columnstore index. Consider dropping clustered columnstore index 'PK_FactOnlineSales' on table 'FactOnlineSales'. Msg35353,Level16,State1,Line1Change Trackingcannotbeenabledonatablewithaclusteredcolumnstoreindex. Consider droppingclusteredcolumnstoreindex'PK_FactOnlineSales'ontable'FactOnlineSales'.

This means that the improvements in tracking for Clustered Columnstore Indexes are not included in SQL Server 2016. As I have written above for the Change Data Capture functionality tests – it does not shock me at all, if Microsoft will never implement it.

Let’s get back to our dbo.FactSales, the one that has a Nonclustered Columnstore Index and check if there is a support for CT (Change Tracking):

alter table dbo.FactSalesenable change_trackingwith (track_columns_updated = on); alter tabledbo.FactSalesenable change_trackingwith(track_columns_updated=on);

As expected, everything functioned perfectly in this case – there were an “Command(s) completed successfully.” message.To verify if our table is being set up for tracking successfully, let’s read data from the sys.change_tracking_tables DMV:

select object_name(object_id), *from sys.change_tracking_tables; select object_name(object_id),*from sys.change_tracking_tables;

Looks great!

Let’s delete some data:

delete top (10)from dbo.FactSales; delete top(10)from dbo.FactSales;

We can verify that the data is being tracked correctly by issuing the following command:

select * from ChangeTable (CHANGES FactSales,0) as CtColumnstore select *from ChangeTable(CHANGESFactSales,0)asCtColumnstore

I am very glad to see both of the functionalities getting support for the Nonclustered Columnstore Indexes in SQL Server 2016, this makes Operational Analytics a kind of solution that many people will be able to use without a lot of additional effort.

Temporal

Temporal is the new tracking mechanism that is implemented in SQL Server 2016, and I am considering to be one of the finest addition to the engine of my favourite database. If you are interested in it – considering visiting amazing articles on Temporal written by Itzik Ben-Gan for the SQL Server PRO Magazine .

Let’s consider a new table dbo.FactOnlineSales and enable Temporal on it by issuing the following commands that will drop existing Primary Key, Create a new Clustered Columnstore Index, add the necessary 2 columns for the Temporal:

ALTER TABLE [dbo].[FactInventory] DROP CONSTRAINT [PK_FactInventory_InventoryKey]GOcreate clustered columnstore index PK_FactInventoryon dbo.FactInventory;GOalter table dbo.FactInventory add temp_start DATETIME2(0) GENERATED ALWAYS AS ROW START NOT NULLCONSTRAINT DFT_FactInventory_temp_start DEFAULT('19000101'), temp_end DATETIME2(0) GENERATED ALWAYS AS ROW END NOT NULLCONSTRAINT DFT_FactInventory_temp_end DEFAULT('99991231 23:59:59'), PERIOD FOR SYSTEM_TIME (temp_start, temp_end); ALTER TABLE[dbo].[FactInventory]DROPCONSTRAINT[PK_FactInventory_InventoryKey]GOcreate clusteredcolumnstoreindexPK_FactInventoryon dbo.FactInventory;GOalter tabledbo.FactInventoryadd temp_startDATETIME2(0)GENERATEDALWAYSASROWSTARTNOTNULLCONSTRAINT DFT_FactInventory_temp_startDEFAULT('19000101'), temp_endDATETIME2(0)GENERATEDALWAYSASROWENDNOTNULLCONSTRAINT DFT_FactInventory_temp_endDEFAULT('99991231 23:59:59'), PERIODFORSYSTEM_TIME(temp_start,temp_end);

Now, we can activate the Temporal System Versioning on our table:

ALTER TABLE dbo.FactInventory SET ( SYSTEM_VERSIONING = ON ( HISTORY_TABLE = dbo.FactInventoryHistory ) ); ALTER TABLEdbo.FactInventory SET(SYSTEM_VERSIONING=ON(HISTORY_TABLE=dbo.FactInventoryHistory));

Well, I have received the following error message, saying that the primary key is missing in this table:

Msg 13553, Level 16, State 1, Line 12System versioned temporal table 'ContosoRetailDW.dbo.FactInventory' must have primary key defined. Msg13553,Level16,State1,Line12System versionedtemporaltable'ContosoRetailDW.dbo.FactInventory'musthaveprimarykeydefined.

Let’s add a new primary key (nonclusrtered) to our FactInventory table and try again:

alter table dbo.FactInventoryadd constraint PKey_FactInventoryprimary key Nonclustered (InventoryKey);ALTER TABLE dbo.FactInventory SET ( SYSTEM_VERSIONING = ON ( HISTORY_TABLE = dbo.FactInventoryHistory ) ); alter tabledbo.FactInventoryadd constraintPKey_FactInventoryprimary keyNonclustered(InventoryKey); ALTER TABLEdbo.FactInventory SET(SYSTEM_VERSIONING=ON(HISTORY_TABLE=dbo.FactInventoryHistory));

Perfect! This time it has worked like a charm!

Let’s take it for a small ride by deleting a couple of rows:

delete top (10)from dbo.FactInventory; delete top(10)from dbo.FactInventory;

As before, to verify I will simply check the pre-configured table dbo.FactInventoryHistory for the 10 entries:

select count(*)from dbo.FactInventoryHistory select count(*)from dbo.FactInventoryHistory

This has worked as expected! This looks great!

Let’s check if the Temporal supports Nonclustered Columnstore Indexes as well – we shall need to drop the Clustered Columnstore Index manually and create the Nonclustered Columnstore Index brand new since the conversion from one type to another does not funciton:

DROP INDEX [PK_FactInventory] ON [dbo].[FactInventory];CREATE NONCLUSTERED COLUMNSTORE INDEX [PK_FactInventory] ON [dbo].[FactInventory] (InventoryKey, DateKey, StoreKey, ProductKey, CurrencyKey, OnHandQuantity, OnOrderQuantity, SafetyStockQuantity, UnitCost, DaysInStock, MinDayInStock, MaxDayInStock, Aging, ETLLoadID, LoadDate, UpdateDate, temp_start, temp_end); DROP INDEX[PK_FactInventory]ON[dbo].[FactInventory]; CREATE NONCLUSTEREDCOLUMNSTOREINDEX[PK_FactInventory]ON[dbo].[FactInventory](InventoryKey,DateKey,StoreKey,ProductKey,CurrencyKey,OnHandQuantity,OnOrderQuantity,SafetyStockQuantity,UnitCost,DaysInStock,MinDayInStock,MaxDayInStock,Aging,ETLLoadID,LoadDate,UpdateDate,temp_start,temp_end);

This script executed without any error, thus confirming that Nonclustered Columnstore Index is supporting Temporal as well.

Conclusion

Its amazing that we have finally received support for Nonclustered Columnstore Indexes in combinations with the Change Data Capture and Change Tracking. I think that given finite resources the possible decision of not implementing support for the Clustered Columnstore Indexes is very well thought.

The support of the Temporal feature for both Clustered as well as the Nonclustered Columnstore Indexes gives a great alternative, if for some specific reason one needs to use Clustered Columnstore Indexes coupled with Data Tracking features.

to be continued …

相关阅读:
Top