This is a discussion on Should you avoid updating the fact table fields? within the SQL Server Data Warehousing forums, part of the Microsoft SQL Server category; --> I'm reading this data warehouse book and it says that you should never update a record in the fact ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I'm reading this data warehouse book and it says that you should never update a record in the fact table unless it is to correct an error but it doesn't say what the consequences of doing this would be. Could someone give me an example of why it is a bad idea to update fields in the fact table? I'm currently working on a data warehouse that has an invoice number in the table that is the basis for the fact table. Sometimes this is null depending on how far along the record is in the billing process before I try to import it into the data warehouse. Should I import the record and then at a later time update it when the invoice field is not null or wait until the record is complete to add it to the fact table? |
| |||
| Stephen, Whether or not you update records in a fact table depends on how you are processing OLAP cubes built over the underlying SQL server database. It also has a philosophical aspect in that a data warehouse should have one version of the data, which is recognised as "The Truth", so you only change it if it is in error. If you don't have any OLAP cubes over the database, then there is no problem updating the data, users' queries will report the latest data. So if you change that sale of 10,000,000 USD back to 10,000 USD as it should be, then the warehouse will reflect this. No problem. Unless of course you have just declared this year-end to the government :-< If you have OLAP cubes built, then there are consequences, because the cube is a copy of the data, in a new format. So if you change the fact table, then you have to make sure that the cube is up to date. Assuming you use MS Analysis services, then you might have your cube set to incrememtal update, refresh, or full process. If you have incremental update, then the cube won't go back to pick up the changes, and your warehouse and cubes will be out of sync. If you refresh or re-process you will be fine. I have several warehouses where I load in a rolling 60 day's worth of data, becuase the users need data up to yesterday, but accept that the data may be changed for up to 2 months after the initial load (these are for a shipping organisation, and there are ALWAYS arguments over who pays for what on a ship!). So every night I delete and update the last 60 days of the fact table. The cubes either re-process or refresh, and everything is OK. Eventually when everyone has agreed the fees, the relevant rows are marked "certified", so the users know they are OK. In response to Kevin's comment that the invoice number should be on a dimension, I would disagree. I have the same setup, and the Invoice number is on the fact table for *non-OLAP* reporting, although I have allowed it to appear in drill-to-detail. Even if you had created an Invoices dimension (with information about invoice type and payment terms etc), then you still need a key field on the fact table, and it makes sense to use the invoice number itself as the key. If it is not a purely numeric field then it would be faster to use an artificial integer key, but probably not worth it. The users won't see any difference in performance, and I bet the additional time to do the join is almost imeasurable.. Of course If you want to use the invoice number on the fact table as a dimension, (which you can if you want to), or as a key to a dimension table then you MUSTchange the NULL to a valid value. "Unknown" or 00000 would be obvious choices. Hope this helps, Rich "Stephen" <swrothfuss@hotmail.com> wrote in message news:#5tQhRhZDHA.652@TK2MSFTNGP10.phx.gbl... > I'm reading this data warehouse book and it says that you should never > update a record in the fact table unless it is to correct an error but it > doesn't say what the consequences of doing this would be. > > Could someone give me an example of why it is a bad idea to update fields in > the fact table? > > I'm currently working on a data warehouse that has an invoice number in the > table that is the basis for the fact table. Sometimes this is null > depending on how far along the record is in the billing process before I try > to import it into the data warehouse. Should I import the record and then > at a later time update it when the invoice field is not null or wait until > the record is complete to add it to the fact table? > > |
| ||||
| Thanks for all the information Richard. "Richard R" <general@adslleamcres> wrote in message news:edgIxPnZDHA.2020@TK2MSFTNGP10.phx.gbl... > Stephen, > > Whether or not you update records in a fact table depends on how you are > processing OLAP cubes built over the underlying SQL server database. It also > has a philosophical aspect in that a data warehouse should have one version > of the data, which is recognised as "The Truth", so you only change it if it > is in error. > > If you don't have any OLAP cubes over the database, then there is no problem > updating the data, users' queries will report the latest data. So if you > change that sale of 10,000,000 USD back to 10,000 USD as it should be, then > the warehouse will reflect this. No problem. Unless of course you have just > declared this year-end to the government :-< > > If you have OLAP cubes built, then there are consequences, because the cube > is a copy of the data, in a new format. So if you change the fact table, > then you have to make sure that the cube is up to date. Assuming you use MS > Analysis services, then you might have your cube set to incrememtal update, > refresh, or full process. If you have incremental update, then the cube > won't go back to pick up the changes, and your warehouse and cubes will be > out of sync. If you refresh or re-process you will be fine. > > I have several warehouses where I load in a rolling 60 day's worth of data, > becuase the users need data up to yesterday, but accept that the data may be > changed for up to 2 months after the initial load (these are for a shipping > organisation, and there are ALWAYS arguments over who pays for what on a > ship!). So every night I delete and update the last 60 days of the fact > table. The cubes either re-process or refresh, and everything is OK. > Eventually when everyone has agreed the fees, the relevant rows are marked > "certified", so the users know they are OK. > > In response to Kevin's comment that the invoice number should be on a > dimension, I would disagree. I have the same setup, and the Invoice number > is on the fact table for *non-OLAP* reporting, although I have allowed it to > appear in drill-to-detail. > Even if you had created an Invoices dimension (with information about > invoice type and payment terms etc), then you still need a key field on the > fact table, and it makes sense to use the invoice number itself as the key. > If it is not a purely numeric field then it would be faster to use an > artificial integer key, but probably not worth it. The users won't see any > difference in performance, and I bet the additional time to do the join is > almost imeasurable.. > Of course If you want to use the invoice number on the fact table as a > dimension, (which you can if you want to), or as a key to a dimension table > then you MUSTchange the NULL to a valid value. "Unknown" or 00000 would be > obvious choices. > > Hope this helps, > > Rich > > > "Stephen" <swrothfuss@hotmail.com> wrote in message > news:#5tQhRhZDHA.652@TK2MSFTNGP10.phx.gbl... > > I'm reading this data warehouse book and it says that you should never > > update a record in the fact table unless it is to correct an error but it > > doesn't say what the consequences of doing this would be. > > > > Could someone give me an example of why it is a bad idea to update fields > in > > the fact table? > > > > I'm currently working on a data warehouse that has an invoice number in > the > > table that is the basis for the fact table. Sometimes this is null > > depending on how far along the record is in the billing process before I > try > > to import it into the data warehouse. Should I import the record and then > > at a later time update it when the invoice field is not null or wait until > > the record is complete to add it to the fact table? > > > > > > |