This is a discussion on UDM and Star Schema within the SQL Server Data Warehousing forums, part of the Microsoft SQL Server category; --> Hi everyone, I have a general question to datawarehousing and it'll be great if someone can drop some tips ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hi everyone, I have a general question to datawarehousing and it'll be great if someone can drop some tips here. I'm about to start designing a corporate real time datawarehouse with SQL Server 2005 and is deciding on the approach for this. SQL Server 2005 now allows us to actually build a datawarehouse without actually restructuring the datasources into star/snowflake schemas. I'm curious to know if this will have any performance impact on the actual implementation? This project will probably require the usage of proactive caching as well as medium latency molap storage structure. By using proactive caching, SSAS will intelligently switch between ROLAP and MOLAP depending on the data changes in the production database. What is confusing to me is, what happens to ETL then? Does this mean that if SSAS is made to query directly from the production database, then ETL is totally eliminated? Any recommendations will be appreciated. Thanks. Nes |
| |||
| SSAS 2005 like any other OLAP tool on the market can works with flat schemas and more complicated schemas. but using star or snowflake schema is far better for a DW project and when you create a cube. SSAS 2005 is able to create the tables for you, so instead of creating the tables in the database then creating the cube, you create your dimensions and cubes first and SSAS2005 will generates the tables for you. this greatly simplify the prototype step of your project. proactive caching don't switch from MOLAP to ROLAP. proactive caching can detect changes in the source database and then update the cube automatically or can do an incremental loading automatically. there is a lot of options available here for different usage and requirements. but there is no switch from MOLAP to ROLAP. the ETL process is not required if your cubes are directly connect to the operational database. today, a lot of databases schemas are more accessible then before and the data quality is good, so accessing the source database is a good option. it's also a good option when you have only 1 database to play with (and not a lot of databases to synchronize) but if you have to sync. multiple databases, or you have to do a lot of cleansing / transformations, then an ETL process is required. another option we can found today: the transaction made to the operational system are also replicated into the datawarehouse in real time (using triggers, procedures or workflow systems) so the DW database himself become a realtime system. "Nestor" <n3570r@yahoo.com> wrote in message news > Hi everyone, > > I have a general question to datawarehousing and it'll be great if someone > can drop some tips here. I'm about to start designing a corporate real > time datawarehouse with SQL Server 2005 and is deciding on the approach > for this. > > SQL Server 2005 now allows us to actually build a datawarehouse without > actually restructuring the datasources into star/snowflake schemas. I'm > curious to know if this will have any performance impact on the actual > implementation? This project will probably require the usage of proactive > caching as well as medium latency molap storage structure. > > By using proactive caching, SSAS will intelligently switch between ROLAP > and MOLAP depending on the data changes in the production database. What > is confusing to me is, what happens to ETL then? Does this mean that if > SSAS is made to query directly from the production database, then ETL is > totally eliminated? > > Any recommendations will be appreciated. > > Thanks. > Nes |
| |||
| On Aug 22, 1:40 am, "Jeje" <willg...@hotmail.com> wrote: > SSAS 2005 like any other OLAP tool on the market can works with flat schemas > and more complicated schemas. > > but using star or snowflake schema is far better for a DW project and when > you create a cube. > SSAS 2005 is able to create the tables for you, so instead of creating the > tables in the database then creating the cube, you create your dimensions > and cubes first and SSAS2005 will generates the tables for you. > this greatly simplify the prototype step of your project. > > proactive caching don't switch from MOLAP to ROLAP. > proactive caching can detect changes in the source database and then update > the cube automatically or can do an incremental loading automatically. > there is a lot of options available here for different usage and > requirements. > but there is no switch from MOLAP to ROLAP. > > the ETL process is not required if your cubes are directly connect to the > operational database. > today, a lot of databases schemas are more accessible then before and the > data quality is good, so accessing the source database is a good option. > it's also a good option when you have only 1 database to play with (and not > a lot of databases to synchronize) > but if you have to sync. multiple databases, or you have to do a lot of > cleansing / transformations, then an ETL process is required. > another option we can found today: > the transaction made to the operational system are also replicated into the > datawarehouse in real time (using triggers, procedures or workflow systems) > so the DW database himself become a realtime system. > > "Nestor" <n35...@yahoo.com> wrote in message > > news > > > > > Hi everyone, > > > I have a general question to datawarehousing and it'll be great if someone > > can drop some tips here. I'm about to start designing a corporate real > > time datawarehouse with SQL Server 2005 and is deciding on the approach > > for this. > > > SQL Server 2005 now allows us to actually build a datawarehouse without > > actually restructuring the datasources into star/snowflake schemas. I'm > > curious to know if this will have any performance impact on the actual > > implementation? This project will probably require the usage of proactive > > caching as well as medium latency molap storage structure. > > > By using proactive caching, SSAS will intelligently switch between ROLAP > > and MOLAP depending on the data changes in the production database. What > > is confusing to me is, what happens to ETL then? Does this mean that if > > SSAS is made to query directly from the production database, then ETL is > > totally eliminated? > > > Any recommendations will be appreciated. > > > Thanks. > > Nes- Hide quoted text - > > - Show quoted text - I would add that in my experience creating a set of views to replace the star schema is good only for prototypes. Chances are that in a real world environment you need to do some transformation and/or cleansing operations that requires you having some staging/working area. Moreover, you will have a longer lifetime of your solution if you decouple your OLTP source from the db that becomes the physical OLAP Data Source. The capability of SSAS to handle relationship more complex than star/ snowflake schema is a tentative to reach the "UDM utopia" (UDM means Unified Dimensional Model, at the beginning of Yukon story it was the "final" model on which every report will be based on - but just only SSRS has a different model from UDM, so we are still far from realizing this vision). Today, I'm very happy to use UDM as an OLAP model that is much richer than those offered by other products (the single feature of many-to-many dimension relationships allowed me to build a lot of complex analytical models, like I described in my paper - http://www.sqlbi.eu/manytomany.aspx). Marco Russo http://www.sqlbi.eu http://sqlblog.com/blogs/marco_russo |
| |||
| Hi Jeje, Thanks for the recommendation. The tricky part is will need the system to fulfil real time scenarios, have the capability to store historical data as well as being able to due with very large SCDs. Proactive caching when set up properly does changes from MOLAP to ROLAP and to MOLAP again according to set up (some reference http://msdn2.microsoft.com/en-us/library/ms174769.aspx). I suppose at the end of the day I might need to identify what needs to be real time (hence ROLAP structure) and what needs to be historical + scheduled (hence MOLAP and Snowflaked). I did some intensive research to UDM and basically 'concluded' that the concept of data cleansing doesn't sit very well in it. Seems like I need to break out the system into 2 conceptual areas in order to fulfil all that requirements... Thanks. Regards, David Ong On Tue, 21 Aug 2007 16:40:56 -0700, Jeje <willgart@hotmail.com> wrote: > SSAS 2005 like any other OLAP tool on the market can works with flat > schemas and more complicated schemas. > > but using star or snowflake schema is far better for a DW project and > when you create a cube. > SSAS 2005 is able to create the tables for you, so instead of creating > the tables in the database then creating the cube, you create your > dimensions and cubes first and SSAS2005 will generates the tables for > you. > this greatly simplify the prototype step of your project. > > proactive caching don't switch from MOLAP to ROLAP. > proactive caching can detect changes in the source database and then > update the cube automatically or can do an incremental loading > automatically. > there is a lot of options available here for different usage and > requirements. > but there is no switch from MOLAP to ROLAP. > > the ETL process is not required if your cubes are directly connect to > the operational database. > today, a lot of databases schemas are more accessible then before and > the data quality is good, so accessing the source database is a good > option. > it's also a good option when you have only 1 database to play with (and > not a lot of databases to synchronize) > but if you have to sync. multiple databases, or you have to do a lot of > cleansing / transformations, then an ETL process is required. > another option we can found today: > the transaction made to the operational system are also replicated into > the datawarehouse in real time (using triggers, procedures or workflow > systems) so the DW database himself become a realtime system. > > > "Nestor" <n3570r@yahoo.com> wrote in message > news >> Hi everyone, >> >> I have a general question to datawarehousing and it'll be great if >> someone can drop some tips here. I'm about to start designing a >> corporate real time datawarehouse with SQL Server 2005 and is deciding >> on the approach for this. >> >> SQL Server 2005 now allows us to actually build a datawarehouse without >> actually restructuring the datasources into star/snowflake schemas. I'm >> curious to know if this will have any performance impact on the actual >> implementation? This project will probably require the usage of >> proactive caching as well as medium latency molap storage structure. >> >> By using proactive caching, SSAS will intelligently switch between >> ROLAP and MOLAP depending on the data changes in the production >> database. What is confusing to me is, what happens to ETL then? Does >> this mean that if SSAS is made to query directly from the production >> database, then ETL is totally eliminated? >> >> Any recommendations will be appreciated. >> >> Thanks. >> Nes > -- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/ |
| |||
| Like marco says, use views in your source database to apply some business rules and cleansing process. for the choice of MOLAP or ROLAP... historical partitions generally are stored in MOLAP mode, because you don't update them frequently. the "Today" partition can be in ROLAP mode and use indexed views. But I recommend the low latency MOLAP mode. provides good performance by keeping the cache longer, but you suffer few seconds of latency between an update and the cube update. if you want to be more realtime the HOLAP mode provides updated result all the time. SCD dimension should not impact historical data. but your business key must be unique other the time to insure usage of both real time and historical content. but regarding the data cleansing process you have to do and other technical constraints. Try to split your solution in 2 projects: Real time project --> contains only "today" data, there is no SCD dimension Historical project -> contains all other data and the SCD dimensions generally doing analysis against the history will not answer the same questions as the real time access. "Nestor" <n3570r@yahoo.com> wrote in message news > Hi Jeje, > > Thanks for the recommendation. The tricky part is will need the system to > fulfil real time scenarios, have the capability to store historical data > as well as being able to due with very large SCDs. > > Proactive caching when set up properly does changes from MOLAP to ROLAP > and to MOLAP again according to set up (some reference > http://msdn2.microsoft.com/en-us/library/ms174769.aspx). I suppose at the > end of the day I might need to identify what needs to be real time (hence > ROLAP structure) and what needs to be historical + scheduled (hence MOLAP > and Snowflaked). > > I did some intensive research to UDM and basically 'concluded' that the > concept of data cleansing doesn't sit very well in it. Seems like I need > to break out the system into 2 conceptual areas in order to fulfil all > that requirements... > > Thanks. > Regards, > David Ong > > > > > On Tue, 21 Aug 2007 16:40:56 -0700, Jeje <willgart@hotmail.com> wrote: > >> SSAS 2005 like any other OLAP tool on the market can works with flat >> schemas and more complicated schemas. >> >> but using star or snowflake schema is far better for a DW project and >> when you create a cube. >> SSAS 2005 is able to create the tables for you, so instead of creating >> the tables in the database then creating the cube, you create your >> dimensions and cubes first and SSAS2005 will generates the tables for >> you. >> this greatly simplify the prototype step of your project. >> >> proactive caching don't switch from MOLAP to ROLAP. >> proactive caching can detect changes in the source database and then >> update the cube automatically or can do an incremental loading >> automatically. >> there is a lot of options available here for different usage and >> requirements. >> but there is no switch from MOLAP to ROLAP. >> >> the ETL process is not required if your cubes are directly connect to >> the operational database. >> today, a lot of databases schemas are more accessible then before and >> the data quality is good, so accessing the source database is a good >> option. >> it's also a good option when you have only 1 database to play with (and >> not a lot of databases to synchronize) >> but if you have to sync. multiple databases, or you have to do a lot of >> cleansing / transformations, then an ETL process is required. >> another option we can found today: >> the transaction made to the operational system are also replicated into >> the datawarehouse in real time (using triggers, procedures or workflow >> systems) so the DW database himself become a realtime system. >> >> >> "Nestor" <n3570r@yahoo.com> wrote in message >> news >>> Hi everyone, >>> >>> I have a general question to datawarehousing and it'll be great if >>> someone can drop some tips here. I'm about to start designing a >>> corporate real time datawarehouse with SQL Server 2005 and is deciding >>> on the approach for this. >>> >>> SQL Server 2005 now allows us to actually build a datawarehouse without >>> actually restructuring the datasources into star/snowflake schemas. I'm >>> curious to know if this will have any performance impact on the actual >>> implementation? This project will probably require the usage of >>> proactive caching as well as medium latency molap storage structure. >>> >>> By using proactive caching, SSAS will intelligently switch between >>> ROLAP and MOLAP depending on the data changes in the production >>> database. What is confusing to me is, what happens to ETL then? Does >>> this mean that if SSAS is made to query directly from the production >>> database, then ETL is totally eliminated? >>> >>> Any recommendations will be appreciated. >>> >>> Thanks. >>> Nes >> > > > > -- > Using Opera's revolutionary e-mail client: http://www.opera.com/mail/ |
| |||
| Hi Jeje, Thanks for the recommendation, I've been ponder about this and thought that's the best approach as well. It's great to hear from someone who agrees with the design. At the end of the day there should be one tha caters to real time reporting and another for historical analysis reporting. It's a pity that the vision of 'one version of truth' might have to be compromised in this case though. Cheers. On Wed, 22 Aug 2007 17:58:46 -0700, Jeje <willgart@hotmail.com> wrote: > Like marco says, > > use views in your source database to apply some business rules and > cleansing process. > > for the choice of MOLAP or ROLAP... > > historical partitions generally are stored in MOLAP mode, because you > don't update them frequently. > the "Today" partition can be in ROLAP mode and use indexed views. > But I recommend the low latency MOLAP mode. provides good performance by > keeping the cache longer, but you suffer few seconds of latency between > an update and the cube update. > if you want to be more realtime the HOLAP mode provides updated result > all the time. > > SCD dimension should not impact historical data. but your business key > must be unique other the time to insure usage of both real time and > historical content. > > but regarding the data cleansing process you have to do and other > technical constraints. > Try to split your solution in 2 projects: > Real time project --> contains only "today" data, there is no SCD > dimension > Historical project -> contains all other data and the SCD dimensions > > generally doing analysis against the history will not answer the same > questions as the real time access. > > "Nestor" <n3570r@yahoo.com> wrote in message > news >> Hi Jeje, >> >> Thanks for the recommendation. The tricky part is will need the system >> to fulfil real time scenarios, have the capability to store historical >> data as well as being able to due with very large SCDs. >> >> Proactive caching when set up properly does changes from MOLAP to ROLAP >> and to MOLAP again according to set up (some reference >> http://msdn2.microsoft.com/en-us/library/ms174769.aspx). I suppose at >> the end of the day I might need to identify what needs to be real time >> (hence ROLAP structure) and what needs to be historical + scheduled >> (hence MOLAP and Snowflaked). >> >> I did some intensive research to UDM and basically 'concluded' that the >> concept of data cleansing doesn't sit very well in it. Seems like I >> need to break out the system into 2 conceptual areas in order to fulfil >> all that requirements... >> >> Thanks. >> Regards, >> David Ong >> >> >> >> >> On Tue, 21 Aug 2007 16:40:56 -0700, Jeje <willgart@hotmail.com> wrote: >> >>> SSAS 2005 like any other OLAP tool on the market can works with flat >>> schemas and more complicated schemas. >>> >>> but using star or snowflake schema is far better for a DW project and >>> when you create a cube. >>> SSAS 2005 is able to create the tables for you, so instead of creating >>> the tables in the database then creating the cube, you create your >>> dimensions and cubes first and SSAS2005 will generates the tables for >>> you. >>> this greatly simplify the prototype step of your project. >>> >>> proactive caching don't switch from MOLAP to ROLAP. >>> proactive caching can detect changes in the source database and then >>> update the cube automatically or can do an incremental loading >>> automatically. >>> there is a lot of options available here for different usage and >>> requirements. >>> but there is no switch from MOLAP to ROLAP. >>> >>> the ETL process is not required if your cubes are directly connect to >>> the operational database. >>> today, a lot of databases schemas are more accessible then before and >>> the data quality is good, so accessing the source database is a good >>> option. >>> it's also a good option when you have only 1 database to play with >>> (and not a lot of databases to synchronize) >>> but if you have to sync. multiple databases, or you have to do a lot >>> of cleansing / transformations, then an ETL process is required. >>> another option we can found today: >>> the transaction made to the operational system are also replicated >>> into the datawarehouse in real time (using triggers, procedures or >>> workflow systems) so the DW database himself become a realtime system. >>> >>> >>> "Nestor" <n3570r@yahoo.com> wrote in message >>> news >>>> Hi everyone, >>>> >>>> I have a general question to datawarehousing and it'll be great if >>>> someone can drop some tips here. I'm about to start designing a >>>> corporate real time datawarehouse with SQL Server 2005 and is >>>> deciding on the approach for this. >>>> >>>> SQL Server 2005 now allows us to actually build a datawarehouse >>>> without actually restructuring the datasources into star/snowflake >>>> schemas. I'm curious to know if this will have any performance impact >>>> on the actual implementation? This project will probably require the >>>> usage of proactive caching as well as medium latency molap storage >>>> structure. >>>> >>>> By using proactive caching, SSAS will intelligently switch between >>>> ROLAP and MOLAP depending on the data changes in the production >>>> database. What is confusing to me is, what happens to ETL then? Does >>>> this mean that if SSAS is made to query directly from the production >>>> database, then ETL is totally eliminated? >>>> >>>> Any recommendations will be appreciated. >>>> >>>> Thanks. >>>> Nes >>> >> >> >> >> -- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/ > -- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/ |
| |||
| I'm still curious to know if there're anyone who is able to advise on building a realtime BI system which is capable of storing historical information, and which dimensions are large SCD Type IIs. For all I know SQLServer 2005 Analysis Service's proactive caching is good for direct connection with the relational database, and it does a good job in balancing performance with near real time data. However, due to the inherited nature of this, we miss out a very important layer which makes it very hard to hold historical data, and even harder to make what appears to be SCD TYPE 1 (even though it's really not type 1) into a Type II. I'm wondering if there's any solution better than splitting the project into 2 sub projects with 1 as real time and the other as historical. I'm trying to maintain the 'one version of truth' perspective in this project. Cheers. On Thu, 23 Aug 2007 15:09:03 -0700, Nestor <n3570r@yahoo.com> wrote: > Hi Jeje, > > Thanks for the recommendation, I've been ponder about this and thought > that's the best approach as well. It's great to hear from someone who > agrees with the design. At the end of the day there should be one tha > caters to real time reporting and another for historical analysis > reporting. It's a pity that the vision of 'one version of truth' might > have to be compromised in this case though. > > Cheers. > > > On Wed, 22 Aug 2007 17:58:46 -0700, Jeje <willgart@hotmail.com> wrote: > >> Like marco says, >> >> use views in your source database to apply some business rules and >> cleansing process. >> >> for the choice of MOLAP or ROLAP... >> >> historical partitions generally are stored in MOLAP mode, because you >> don't update them frequently. >> the "Today" partition can be in ROLAP mode and use indexed views. >> But I recommend the low latency MOLAP mode. provides good performance >> by keeping the cache longer, but you suffer few seconds of latency >> between an update and the cube update. >> if you want to be more realtime the HOLAP mode provides updated result >> all the time. >> >> SCD dimension should not impact historical data. but your business key >> must be unique other the time to insure usage of both real time and >> historical content. >> >> but regarding the data cleansing process you have to do and other >> technical constraints. >> Try to split your solution in 2 projects: >> Real time project --> contains only "today" data, there is no SCD >> dimension >> Historical project -> contains all other data and the SCD dimensions >> >> generally doing analysis against the history will not answer the same >> questions as the real time access. >> >> "Nestor" <n3570r@yahoo.com> wrote in message >> news >>> Hi Jeje, >>> >>> Thanks for the recommendation. The tricky part is will need the system >>> to fulfil real time scenarios, have the capability to store historical >>> data as well as being able to due with very large SCDs. >>> >>> Proactive caching when set up properly does changes from MOLAP to >>> ROLAP and to MOLAP again according to set up (some reference >>> http://msdn2.microsoft.com/en-us/library/ms174769.aspx). I suppose at >>> the end of the day I might need to identify what needs to be real time >>> (hence ROLAP structure) and what needs to be historical + scheduled >>> (hence MOLAP and Snowflaked). >>> >>> I did some intensive research to UDM and basically 'concluded' that >>> the concept of data cleansing doesn't sit very well in it. Seems like >>> I need to break out the system into 2 conceptual areas in order to >>> fulfil all that requirements... >>> >>> Thanks. >>> Regards, >>> David Ong >>> >>> >>> >>> >>> On Tue, 21 Aug 2007 16:40:56 -0700, Jeje <willgart@hotmail.com> wrote: >>> >>>> SSAS 2005 like any other OLAP tool on the market can works with flat >>>> schemas and more complicated schemas. >>>> >>>> but using star or snowflake schema is far better for a DW project and >>>> when you create a cube. >>>> SSAS 2005 is able to create the tables for you, so instead of >>>> creating the tables in the database then creating the cube, you >>>> create your dimensions and cubes first and SSAS2005 will generates >>>> the tables for you. >>>> this greatly simplify the prototype step of your project. >>>> >>>> proactive caching don't switch from MOLAP to ROLAP. >>>> proactive caching can detect changes in the source database and then >>>> update the cube automatically or can do an incremental loading >>>> automatically. >>>> there is a lot of options available here for different usage and >>>> requirements. >>>> but there is no switch from MOLAP to ROLAP. >>>> >>>> the ETL process is not required if your cubes are directly connect to >>>> the operational database. >>>> today, a lot of databases schemas are more accessible then before and >>>> the data quality is good, so accessing the source database is a good >>>> option. >>>> it's also a good option when you have only 1 database to play with >>>> (and not a lot of databases to synchronize) >>>> but if you have to sync. multiple databases, or you have to do a lot >>>> of cleansing / transformations, then an ETL process is required. >>>> another option we can found today: >>>> the transaction made to the operational system are also replicated >>>> into the datawarehouse in real time (using triggers, procedures or >>>> workflow systems) so the DW database himself become a realtime system. >>>> >>>> >>>> "Nestor" <n3570r@yahoo.com> wrote in message >>>> news >>>>> Hi everyone, >>>>> >>>>> I have a general question to datawarehousing and it'll be great if >>>>> someone can drop some tips here. I'm about to start designing a >>>>> corporate real time datawarehouse with SQL Server 2005 and is >>>>> deciding on the approach for this. >>>>> >>>>> SQL Server 2005 now allows us to actually build a datawarehouse >>>>> without actually restructuring the datasources into star/snowflake >>>>> schemas. I'm curious to know if this will have any performance >>>>> impact on the actual implementation? This project will probably >>>>> require the usage of proactive caching as well as medium latency >>>>> molap storage structure. >>>>> >>>>> By using proactive caching, SSAS will intelligently switch between >>>>> ROLAP and MOLAP depending on the data changes in the production >>>>> database. What is confusing to me is, what happens to ETL then? Does >>>>> this mean that if SSAS is made to query directly from the production >>>>> database, then ETL is totally eliminated? >>>>> >>>>> Any recommendations will be appreciated. >>>>> >>>>> Thanks. >>>>> Nes >>>> >>> >>> >>> >>> -- Using Opera's revolutionary e-mail client: >>> http://www.opera.com/mail/ >> > > > -- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/ |
| ||||
| On Aug 29, 4:14 pm, Nestor <n35...@yahoo.com> wrote: > I'm still curious to know if there're anyone who is able to advise on > building a realtime BI system which is capable of storing historical > information, and which dimensions are large SCD Type IIs. > > For all I know SQLServer 2005 Analysis Service's proactive caching is good > for direct connection with the relational database, and it does a good job > in balancing performance with near real time data. However, due to the > inherited nature of this, we miss out a very important layer which makes > it very hard to hold historical data, and even harder to make what appears > to be SCD TYPE 1 (even though it's really not type 1) into a Type II. > > I'm wondering if there's any solution better than splitting the project > into 2 sub projects with 1 as real time and the other as historical. I'm > trying to maintain the 'one version of truth' perspective in this project. > > Cheers. > > > > > > On Thu, 23 Aug 2007 15:09:03 -0700, Nestor <n35...@yahoo.com> wrote: > > Hi Jeje, > > > Thanks for the recommendation, I've been ponder about this and thought > > that's the best approach as well. It's great to hear from someone who > > agrees with the design. At the end of the day there should be one tha > > caters to real time reporting and another for historical analysis > > reporting. It's a pity that the vision of 'one version of truth' might > > have to be compromised in this case though. > > > Cheers. > > > On Wed, 22 Aug 2007 17:58:46 -0700, Jeje <willg...@hotmail.com> wrote: > > >> Like marco says, > > >> use views in your source database to apply some business rules and > >> cleansing process. > > >> for the choice of MOLAP or ROLAP... > > >> historical partitions generally are stored in MOLAP mode, because you > >> don't update them frequently. > >> the "Today" partition can be in ROLAP mode and use indexed views. > >> But I recommend the low latency MOLAP mode. provides good performance > >> by keeping the cache longer, but you suffer few seconds of latency > >> between an update and the cube update. > >> if you want to be more realtime the HOLAP mode provides updated result > >> all the time. > > >> SCD dimension should not impact historical data. but your business key > >> must be unique other the time to insure usage of both real time and > >> historical content. > > >> but regarding the data cleansing process you have to do and other > >> technical constraints. > >> Try to split your solution in 2 projects: > >> Real time project --> contains only "today" data, there is no SCD > >> dimension > >> Historical project -> contains all other data and the SCD dimensions > > >> generally doing analysis against the history will not answer the same > >> questions as the real time access. > > >> "Nestor" <n35...@yahoo.com> wrote in message > >>news > >>> Hi Jeje, > > >>> Thanks for the recommendation. The tricky part is will need the system > >>> to fulfil real time scenarios, have the capability to store historical > >>> data as well as being able to due with very large SCDs. > > >>> Proactive caching when set up properly does changes from MOLAP to > >>> ROLAP and to MOLAP again according to set up (some reference > >>>http://msdn2.microsoft.com/en-us/library/ms174769.aspx). I suppose at > >>> the end of the day I might need to identify what needs to be real time > >>> (hence ROLAP structure) and what needs to be historical + scheduled > >>> (hence MOLAP and Snowflaked). > > >>> I did some intensive research to UDM and basically 'concluded' that > >>> the concept of data cleansing doesn't sit very well in it. Seems like > >>> I need to break out the system into 2 conceptual areas in order to > >>> fulfil all that requirements... > > >>> Thanks. > >>> Regards, > >>> David Ong > > >>> On Tue, 21 Aug 2007 16:40:56 -0700, Jeje <willg...@hotmail.com> wrote: > > >>>> SSAS 2005 like any other OLAP tool on the market can works with flat > >>>> schemas and more complicated schemas. > > >>>> but using star or snowflake schema is far better for a DW project and > >>>> when you create a cube. > >>>> SSAS 2005 is able to create the tables for you, so instead of > >>>> creating the tables in the database then creating the cube, you > >>>> create your dimensions and cubes first and SSAS2005 will generates > >>>> the tables for you. > >>>> this greatly simplify the prototype step of your project. > > >>>> proactive caching don't switch from MOLAP to ROLAP. > >>>> proactive caching can detect changes in the source database and then > >>>> update the cube automatically or can do an incremental loading > >>>> automatically. > >>>> there is a lot of options available here for different usage and > >>>> requirements. > >>>> but there is no switch from MOLAP to ROLAP. > > >>>> the ETL process is not required if your cubes are directly connect to > >>>> the operational database. > >>>> today, a lot of databases schemas are more accessible then before and > >>>> the data quality is good, so accessing the source database is a good > >>>> option. > >>>> it's also a good option when you have only 1 database to play with > >>>> (and not a lot of databases to synchronize) > >>>> but if you have to sync. multiple databases, or you have to do a lot > >>>> of cleansing / transformations, then an ETL process is required. > >>>> another option we can found today: > >>>> the transaction made to the operational system are also replicated > >>>> into the datawarehouse in real time (using triggers, procedures or > >>>> workflow systems) so the DW database himself become a realtime system. > > >>>> "Nestor" <n35...@yahoo.com> wrote in message > >>>>news > >>>>> Hi everyone, > > >>>>> I have a general question to datawarehousing and it'll be great if > >>>>> someone can drop some tips here. I'm about to start designing a > >>>>> corporate real time datawarehouse with SQL Server 2005 and is > >>>>> deciding on the approach for this. > > >>>>> SQL Server 2005 now allows us to actually build a datawarehouse > >>>>> without actually restructuring the datasources into star/snowflake > >>>>> schemas. I'm curious to know if this will have any performance > >>>>> impact on the actual implementation? This project will probably > >>>>> require the usage of proactive caching as well as medium latency > >>>>> molap storage structure. > > >>>>> By using proactive caching, SSAS will intelligently switch between > >>>>> ROLAP and MOLAP depending on the data changes in the production > >>>>> database. What is confusing to me is, what happens to ETL then? Does > >>>>> this mean that if SSAS is made to query directly from the production > >>>>> database, then ETL is totally eliminated? > > >>>>> Any recommendations will be appreciated. > > >>>>> Thanks. > >>>>> Nes > > >>> -- Using Opera's revolutionary e-mail client: > >>>http://www.opera.com/mail/ > > -- > Using Opera's revolutionary e-mail client:http://www.opera.com/mail/- Hide quoted text - > > - Show quoted text - As you note, the problem linking directly with relational database is the lack of surrogate keys. One simple solution is splitting the issue in 2 projects. One harder solution (to build and to maintain) is the use of a separate partition for the "current data". You might prefer to feed this partition with a very high frequency using a SSIS package that "pushes" data directly into the partition. In this case you have an ETL that might generate all surrogate keys you need. At the end of the day, the "current day" partition can be destroyed and the "regular" ETL can process the new data with full historical attributes (you might skip some dimensions in the "current day" processing). In general, you have to balance the cost of development, the time to process, the volume of data and the latency the user can tolerate. I think there is no "silver bullet" that is good for every requirement. Marco Russo http://www.sqlbi.eu http://sqlblog.com/blogs/marco_russo |
| Thread Tools | |
| Display Modes | |
|
|