Unix Technical Forum

Should you avoid updating the fact table fields?

This is a discussion on Should you avoid updating the fact table fields? within the SQL Server Data Warehousing forums, part of the Microsoft SQL Server category; --> I'm reading this data warehouse book and it says that you should never update a record in the fact ...


Go Back   Unix Technical Forum > Database Server Software > Microsoft SQL Server > SQL Server Data Warehousing

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-27-2008, 05:09 PM
Stephen
 
Posts: n/a
Default Should you avoid updating the fact table fields?

I'm reading this data warehouse book and it says that you should never
update a record in the fact table unless it is to correct an error but it
doesn't say what the consequences of doing this would be.

Could someone give me an example of why it is a bad idea to update fields in
the fact table?

I'm currently working on a data warehouse that has an invoice number in the
table that is the basis for the fact table. Sometimes this is null
depending on how far along the record is in the billing process before I try
to import it into the data warehouse. Should I import the record and then
at a later time update it when the invoice field is not null or wait until
the record is complete to add it to the fact table?


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 02-27-2008, 05:09 PM
Richard R
 
Posts: n/a
Default Re: Should you avoid updating the fact table fields?

Stephen,

Whether or not you update records in a fact table depends on how you are
processing OLAP cubes built over the underlying SQL server database. It also
has a philosophical aspect in that a data warehouse should have one version
of the data, which is recognised as "The Truth", so you only change it if it
is in error.

If you don't have any OLAP cubes over the database, then there is no problem
updating the data, users' queries will report the latest data. So if you
change that sale of 10,000,000 USD back to 10,000 USD as it should be, then
the warehouse will reflect this. No problem. Unless of course you have just
declared this year-end to the government :-<

If you have OLAP cubes built, then there are consequences, because the cube
is a copy of the data, in a new format. So if you change the fact table,
then you have to make sure that the cube is up to date. Assuming you use MS
Analysis services, then you might have your cube set to incrememtal update,
refresh, or full process. If you have incremental update, then the cube
won't go back to pick up the changes, and your warehouse and cubes will be
out of sync. If you refresh or re-process you will be fine.

I have several warehouses where I load in a rolling 60 day's worth of data,
becuase the users need data up to yesterday, but accept that the data may be
changed for up to 2 months after the initial load (these are for a shipping
organisation, and there are ALWAYS arguments over who pays for what on a
ship!). So every night I delete and update the last 60 days of the fact
table. The cubes either re-process or refresh, and everything is OK.
Eventually when everyone has agreed the fees, the relevant rows are marked
"certified", so the users know they are OK.

In response to Kevin's comment that the invoice number should be on a
dimension, I would disagree. I have the same setup, and the Invoice number
is on the fact table for *non-OLAP* reporting, although I have allowed it to
appear in drill-to-detail.
Even if you had created an Invoices dimension (with information about
invoice type and payment terms etc), then you still need a key field on the
fact table, and it makes sense to use the invoice number itself as the key.
If it is not a purely numeric field then it would be faster to use an
artificial integer key, but probably not worth it. The users won't see any
difference in performance, and I bet the additional time to do the join is
almost imeasurable..
Of course If you want to use the invoice number on the fact table as a
dimension, (which you can if you want to), or as a key to a dimension table
then you MUSTchange the NULL to a valid value. "Unknown" or 00000 would be
obvious choices.

Hope this helps,

Rich


"Stephen" <swrothfuss@hotmail.com> wrote in message
news:#5tQhRhZDHA.652@TK2MSFTNGP10.phx.gbl...
> I'm reading this data warehouse book and it says that you should never
> update a record in the fact table unless it is to correct an error but it
> doesn't say what the consequences of doing this would be.
>
> Could someone give me an example of why it is a bad idea to update fields

in
> the fact table?
>
> I'm currently working on a data warehouse that has an invoice number in

the
> table that is the basis for the fact table. Sometimes this is null
> depending on how far along the record is in the billing process before I

try
> to import it into the data warehouse. Should I import the record and then
> at a later time update it when the invoice field is not null or wait until
> the record is complete to add it to the fact table?
>
>



Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 02-27-2008, 05:09 PM
Stephen
 
Posts: n/a
Default Re: Should you avoid updating the fact table fields?

Thanks for all the information Richard.


"Richard R" <general@adslleamcres> wrote in message
news:edgIxPnZDHA.2020@TK2MSFTNGP10.phx.gbl...
> Stephen,
>
> Whether or not you update records in a fact table depends on how you are
> processing OLAP cubes built over the underlying SQL server database. It

also
> has a philosophical aspect in that a data warehouse should have one

version
> of the data, which is recognised as "The Truth", so you only change it if

it
> is in error.
>
> If you don't have any OLAP cubes over the database, then there is no

problem
> updating the data, users' queries will report the latest data. So if you
> change that sale of 10,000,000 USD back to 10,000 USD as it should be,

then
> the warehouse will reflect this. No problem. Unless of course you have

just
> declared this year-end to the government :-<
>
> If you have OLAP cubes built, then there are consequences, because the

cube
> is a copy of the data, in a new format. So if you change the fact table,
> then you have to make sure that the cube is up to date. Assuming you use

MS
> Analysis services, then you might have your cube set to incrememtal

update,
> refresh, or full process. If you have incremental update, then the cube
> won't go back to pick up the changes, and your warehouse and cubes will be
> out of sync. If you refresh or re-process you will be fine.
>
> I have several warehouses where I load in a rolling 60 day's worth of

data,
> becuase the users need data up to yesterday, but accept that the data may

be
> changed for up to 2 months after the initial load (these are for a

shipping
> organisation, and there are ALWAYS arguments over who pays for what on a
> ship!). So every night I delete and update the last 60 days of the fact
> table. The cubes either re-process or refresh, and everything is OK.
> Eventually when everyone has agreed the fees, the relevant rows are marked
> "certified", so the users know they are OK.
>
> In response to Kevin's comment that the invoice number should be on a
> dimension, I would disagree. I have the same setup, and the Invoice number
> is on the fact table for *non-OLAP* reporting, although I have allowed it

to
> appear in drill-to-detail.
> Even if you had created an Invoices dimension (with information about
> invoice type and payment terms etc), then you still need a key field on

the
> fact table, and it makes sense to use the invoice number itself as the

key.
> If it is not a purely numeric field then it would be faster to use an
> artificial integer key, but probably not worth it. The users won't see any
> difference in performance, and I bet the additional time to do the join is
> almost imeasurable..
> Of course If you want to use the invoice number on the fact table as a
> dimension, (which you can if you want to), or as a key to a dimension

table
> then you MUSTchange the NULL to a valid value. "Unknown" or 00000 would be
> obvious choices.
>
> Hope this helps,
>
> Rich
>
>
> "Stephen" <swrothfuss@hotmail.com> wrote in message
> news:#5tQhRhZDHA.652@TK2MSFTNGP10.phx.gbl...
> > I'm reading this data warehouse book and it says that you should never
> > update a record in the fact table unless it is to correct an error but

it
> > doesn't say what the consequences of doing this would be.
> >
> > Could someone give me an example of why it is a bad idea to update

fields
> in
> > the fact table?
> >
> > I'm currently working on a data warehouse that has an invoice number in

> the
> > table that is the basis for the fact table. Sometimes this is null
> > depending on how far along the record is in the billing process before I

> try
> > to import it into the data warehouse. Should I import the record and

then
> > at a later time update it when the invoice field is not null or wait

until
> > the record is complete to add it to the fact table?
> >
> >

>
>



Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 08:01 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com