vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Was looking at Kimball SSIS templates and note that when he is updating Fact tables, he first dumps the original extract into an "extract" table, then in a 2nd Data Flow area inside the same control flow, pulls that data and lines it up with the dimension(s). I think he's only doing it this way for auditing reasons - is this the norm or are there other reasons to do it this way? |
| |||
| "Joe H" <horton.rj at gmail> wrote in message news:%23JaQxvurIHA.1872@TK2MSFTNGP04.phx.gbl... > Was looking at Kimball SSIS templates and note that when he is updating > Fact tables, he first dumps the original extract into an "extract" table, > then in a 2nd Data Flow area inside the same control flow, pulls that data > and lines it up with the dimension(s). > > I think he's only doing it this way for auditing reasons - is this the > norm or are there other reasons to do it this way? I think Kimball's "extract" table is analagous to what a lot of people simply call a "staging" area. It's just a temporary storage space, and it's pretty common. In addition to auditing, a staging area is often used as a workspace to perform complex calculations and additional validations that might be done more efficiently on the server in a set-based fashion (SQL) before you commit the data in its final form. |
| |||
| This would be the Transform step of the Extract Transform Load process. "Joe H" wrote: > Was looking at Kimball SSIS templates and note that when he is updating Fact > tables, he first dumps the original extract into an "extract" table, then in > a 2nd Data Flow area inside the same control flow, pulls that data and lines > it up with the dimension(s). > > I think he's only doing it this way for auditing reasons - is this the norm > or are there other reasons to do it this way? > > > |
| |||
| Further research took me that this is done because it's faster on very large loads to load to use this method instead of inserting row by row "Eric Russell" <EricRussell@discussions.microsoft.com> wrote in message news > This would be the Transform step of the Extract Transform Load process. > > "Joe H" wrote: > >> Was looking at Kimball SSIS templates and note that when he is updating >> Fact >> tables, he first dumps the original extract into an "extract" table, then >> in >> a 2nd Data Flow area inside the same control flow, pulls that data and >> lines >> it up with the dimension(s). >> >> I think he's only doing it this way for auditing reasons - is this the >> norm >> or are there other reasons to do it this way? >> >> >> |
| ||||
| "Joe H" <hortoristic at gmail> wrote in message news:%23JafXiT1IHA.4492@TK2MSFTNGP02.phx.gbl... > Further research took me that this is done because it's faster on very > large loads to load to use this method instead of inserting row by row That could be true if you're taking advantage of SQL set-based statements to manipulate the data around after you load it into the "staging" area. It's fairly common practice to bulk load data directly into the staging area for a few reasons: auditing (as you already mentioned), control (it's a lot easier to manipulate source data in the database than in flat files, etc.), and potentially speed (if you use SQL set-based manipulations). The speed improvement isn't necessarily a given, however, especially if you need to perform a lot of compex manipulations on your data before moving it to the fact table in finalized form. -- ======== Michael Coles "Pro SQL Server 2008 XML" http://www.amazon.com/Pro-SQL-Server...dp/1590599837/ |