As a Data Warehouse consultant, I'm often asked the following by customers: "Is it possible to enrich our dimension data with some manually maintained data?" The answer is always "Yes, of course." However, a few questions quickly follow:
- Who should input this data?
- How do I inform that user that a new dimension value is available?
- In which system should the user enter this data?
- And how can it be done in a user-friendly manner?
I have successfully used Master Data Services from SQL Server 2012 to manage this challenge.
Master Data Services offers an Excel plugin and a web application that make it easy for business users to push data to SQL Server. This tutorial will focus on the Excel plugin for pushing data.
This blog post will explain how you can push data from your SQL Server to your MDS implementation, thereby making it possible for the business user to enrich the data.
I have created an MDS entity as follows:
I would now like to push a new data row from my Data Warehouse solution to the MDS Entity, thereby making it possible for the business user to fill out the “MyMdsColumn” value. My DW table looks as follows:
Notice that row number 3 does not exist in the MDS entity.
The SSIS Package
I have created an SSIS package that looks as follows:
The control flow consists of a Data Flow Task that pushes data to the MDS staging table and an Execute SQL Task that starts the import batch in the MDS solution (more on that later).
The data flow consists of an OLE DB Source that collects the data from the table, a Derived Column component that sets the required MDS settings, and finally an OLE DB Destination that inserts the data into an MDS staging table.
The import process
MDS will automatically create a staging table for you when you create a new entity in MDS. The staging table is placed under the [stg] schema and is named after your entity, suffixed with "_Leaf".
This table serves as a temporary repository before the data is processed against the MDS entity.
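As a sketch of what lands in that staging table, you could insert a row into it directly. Here I assume a hypothetical entity named MyEntity, so the staging table would be [stg].[MyEntity_Leaf]:

```sql
-- Minimal sketch of a manual insert into the MDS leaf staging table.
-- "MyEntity" is a placeholder entity name; Code and Name are the standard leaf columns.
-- ImportStatus_ID = 0 marks the row as ready to be processed by the next batch.
INSERT INTO [stg].[MyEntity_Leaf] (ImportType, ImportStatus_ID, BatchTag, Code, Name)
VALUES (0, 0, N'InsertAndUpdate', N'3', N'My third member');
```

In the tutorial this insert is done by the SSIS data flow rather than by hand, but the columns being populated are the same.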
MDS requires two pieces of information when importing: an Import Type and a Batch Tag (see picture below).
Import Type specifies which type of import action you want MDS to perform for you. Please read the following for a complete explanation of the different import types: http://technet.microsoft.com/en-us/library/ee633854.aspx
The Batch Tag is a name that you specify for the data. The Batch Tag is used later when asking MDS to process a specific Batch Tag.
The picture below shows how I mapped the data from my Data Warehouse to the staging table in MDS.
I can now run my Data Flow, thereby importing the third row in the Data Warehouse table to the staging table in MDS.
MDS recognizes the imported data by the Batch Tag "InsertAndUpdate". Notice in the picture below that the batch has not yet been run.
To run the batch, the Execute SQL Task from the Control Flow needs to run. It contains the following SQL script:
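The exact script depends on your entity and model version; a sketch, assuming a hypothetical entity named MyEntity and the default version name VERSION_1, could look like this:

```sql
-- Sketch: start the staging batch for the leaf staging table of "MyEntity".
-- MDS generates one [stg].[udp_<EntityName>_Leaf] procedure per entity;
-- the entity and version names here are assumptions, adjust them to your model.
DECLARE @VersionName NVARCHAR(50) = N'VERSION_1';

EXEC [stg].[udp_MyEntity_Leaf]
    @VersionName = @VersionName,
    @LogFlag     = 1,                 -- log the resulting transactions
    @BatchTag    = N'InsertAndUpdate';
```

The @BatchTag parameter must match the Batch Tag you wrote into the staging table, otherwise the staged rows are left untouched.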
The script executes a stored procedure from the MDS database. Executing the script tells MDS to start the batch with the Batch Tag “InsertAndUpdate”.
Turning our attention to the MDS again, we see that the batch has now successfully run.
Now, I refresh my entity in Excel and see that a new row (the third row) has been added to my MDS entity.
It is now possible for the business user to fill out the “MyMdsColumn” column for later lookups by the ETL load.
This concludes the brief tutorial of how to import data to your SQL Server 2012 Master Data Service solution.
Next up is how to expose your entity data to your Data Warehouse or other systems. This won't be explained here, but it is easily done with subscription views in MDS.
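Once a subscription view has been created in the MDS web UI, reading the enriched data back is a plain SELECT. The view name below is a placeholder for whatever name you chose when creating the view:

```sql
-- Sketch: subscription views are exposed under the [mdm] schema of the MDS database.
-- "MyEntityView" is a hypothetical view name; Code, Name and MyMdsColumn
-- match the entity used in this tutorial.
SELECT Code, Name, MyMdsColumn
FROM [mdm].[MyEntityView];
```

This is typically the query an ETL lookup would run against MDS.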
Do you see tables taking up much more space than expected? And when looking at the table design and doing a little math, the numbers just don't add up! I have seen this issue so many times: a simple little table with only a few thousand rows taking up gigabytes of storage space. The cause is more or less always the same, and that is what I'll demonstrate in this blog post.
Normally I'm not a big fan of using trace flags; my advice is to use them only when absolutely necessary. Here is a list of the documented trace flags that you can use with SQL Server. Those are the documented ones; besides that, there is a list of undocumented trace flags, and it is, as you might already have guessed, one of these I'll be blogging about today.
With the release of SQL Server 2012 SP1 CU2, an enhancement to the BACKUP command was added. The BACKUP command now supports "TO URL" when you specify a backup device. This gives you the option to use an Azure Storage account as the destination for your backups.
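A sketch of the new syntax follows; the database, storage account, container, and credential names are all placeholders:

```sql
-- Sketch: BACKUP TO URL as introduced in SQL Server 2012 SP1 CU2.
-- The credential must already exist and hold the storage account name and access key.
BACKUP DATABASE [MyDatabase]
TO URL = N'https://myaccount.blob.core.windows.net/backups/MyDatabase.bak'
WITH CREDENTIAL = N'MyAzureCredential';
```

The credential itself is created once with CREATE CREDENTIAL before the first backup runs.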
Where did I put my Sales table?
Filegroups? What can I say! Sometimes I love them and sometimes I just hate them. In theory they should be quite simple and give you benefits, but in my years as a consultant I have seen so many bad filegroup designs where simplicity is traded for an overly complex layout that looks good on paper but is a nightmare to maintain.
A scenario that I often come across is misplaced tables and indexes. This happens mostly when the person who created the database layout is not responsible for creating objects afterwards, or when the end users do not get proper instructions on how to do things the right way. When this is the case, things can end up messy. I will show you how to query the metadata so that you can check whether your data is placed in the right filegroup.
Here is the database layout that I’m about to create:
In addition, here is the code used to create the database:
CREATE DATABASE [Inspari]
ON PRIMARY
    ( NAME = N'System01', FILENAME = N'C:\FusionIO_MP\System01.mdf', SIZE = 100MB, FILEGROWTH = 100MB ),
    ( NAME = N'System02', FILENAME = N'C:\FusionIO_MP\System02.ndf', SIZE = 100MB, FILEGROWTH = 100MB ),
FILEGROUP [Data]
    ( NAME = N'Data01', FILENAME = N'C:\FusionIO_MP\Data01.ndf', SIZE = 100MB, FILEGROWTH = 100MB ),
    ( NAME = N'Data02', FILENAME = N'C:\FusionIO_MP\Data02.ndf', SIZE = 100MB, FILEGROWTH = 100MB ),
FILEGROUP [Index]
    ( NAME = N'Index01', FILENAME = N'C:\FusionIO_MP\Index01.ndf', SIZE = 100MB, FILEGROWTH = 100MB ),
    ( NAME = N'Index02', FILENAME = N'C:\FusionIO_MP\Index02.ndf', SIZE = 100MB, FILEGROWTH = 100MB )
LOG ON
    ( NAME = N'Inspari_log', FILENAME = N'C:\FusionIO_MP\Inspari_log.ldf', SIZE = 100MB, FILEGROWTH = 100MB )
GO
Let us create a table. Pay attention to the ON [Filegroup] clause:
CREATE TABLE RandomPeople
(
    Id INT IDENTITY(1,1) PRIMARY KEY,
    DateCreated DATETIME2 DEFAULT SYSDATETIME(),
    Name VARCHAR(100),
    Salary BIGINT NOT NULL
) ON [Data]
GO
Let us create an index on the table. Again, pay attention to the ON [Filegroup] clause:
CREATE INDEX idx_RandomIndex
ON RandomPeople ( Name, Salary )
ON [Index]
GO
Now imagine that you have a massive database where multiple people have created tables and indexes; the odds of everything being placed in the correct filegroup are close to zero. Some objects will eventually end up in the wrong filegroup. Here is a T-SQL script that queries the metadata to show you which objects are stored in which filegroups.
SELECT
    OBJECT_NAME(t1.object_id) AS TableName,
    t3.name AS IndexName,
    t3.type_desc,
    t1.index_id,
    t4.name AS FileGroupName,
    t1.rows AS RowsInObject,
    (total_pages * 8) / 1024 AS SpaceUsedInMB
FROM sys.partitions t1
INNER JOIN sys.allocation_units t2 ON t2.container_id = t1.hobt_id
INNER JOIN sys.indexes t3 ON t1.object_id = t3.object_id AND t1.index_id = t3.index_id
INNER JOIN sys.filegroups t4 ON t4.data_space_id = t2.data_space_id
WHERE t1.object_id IN
(
    SELECT object_id FROM sys.objects WHERE is_ms_shipped = 0
)
ORDER BY 1
Is your data stored correctly? If so, then congratulations, good job. If not, then there is work to do. The good news is that you might discover that your Sales table is located on slow disks, and by moving it to faster disks you'll improve system performance immediately.
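If you find a misplaced object, one way to move an index between filegroups is to rebuild it with DROP_EXISTING. This sketch moves the demo index from above onto the [Data] filegroup; adapt the names to the object you actually need to move:

```sql
-- Sketch: rebuild an existing index on another filegroup without dropping it first.
-- DROP_EXISTING = ON replaces the old index in a single operation.
CREATE INDEX idx_RandomIndex
ON RandomPeople ( Name, Salary )
WITH ( DROP_EXISTING = ON )
ON [Data];
```

Moving a clustered index this way moves the table's data pages themselves, which is exactly the "move it to the faster disks" step mentioned above.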