This is a discussion on Staging Area Design within the SQL Server Data Warehousing forums, part of the Microsoft SQL Server category; --> Hi there, For ETL purposes, I'm wondering whether it would be better to: - grouping all my heterogenous source ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| |||
| Chris Leroquais wrote: > For ETL purposes, I'm wondering whether it would be better to: > - grouping all my heterogenous source systems tables into a same Staging > Database > or > - Using a dedicated Staging Database for each source system It depends on your hardware and on your version of SQL Server. In the best scenario you have SQL Server 2005 Enterprise and your databases are located on a RAID-10 group striping on a lot of disks (like a SAN). In this case you should use a single database and use table partitions as needed. If you don't have SQL 2005 Enterprise then you should make different databases. If you don't have a RAID-10 array you should also locate the databases on different disks to optimize I/O. |
| |||
| Hello Chris, I usually go for one staging database. Staging is a useful way of validating records as they come into to the ETL process. It's the scratch or work space for the ETL to use, before populating your ODS or Star schemas. Make sure you select the correct recovery model of the staging db. Simple recovery model is the easiest to implement, however bulk load will give you the best prefromance. I am not sure about the SAN advice. Other than it is good advice for setting up mid-large (>300 GB) scale system. However there are a lot of discussions on whether dedicated disk is better for database solutions. So if you don't have a SAN for you system I would not worry about it. What version of SQL server are you using? Hope this helps, Myles Matheson Data Warehouse Architect http://bi-on-sql-server.blogspot.com/ |
| ||||
| > I am not sure about the SAN advice. Other than it is good advice for > setting up mid-large (>300 GB) scale system. SAN is not a solution designed only for large databases, it is also very handy when performance is critical. Size does not always matter. > However there are a lot of > discussions on whether dedicated disk is better for database solutions. Yep, there is also a lot of discussions about Elvis whereabouts... Seriously, a LUN of 14 disks in a dedicated RAID-10 array will offer a performance that a single disk can't possibly achieve. Even a 4-disks array will be faster. Thanks to the striping, the workload is spread across all the disks (actually half of them). Good performance can be achieved with a traditional RAID controller; however the read/write cache and the impressive speed (up to 4G/s on FC these days) are features that make SAN the best choice for databases storage. As long as there is some room in the budget of course. In many (or most) scenarios the SAN can be just too expensive. But as far as the performance goes, this is the best choice. > So if you don't have a SAN for you system I would not worry about it. I agree that SAN can be overkill in some situations. However, you have no idea what are the load and performance requirements, so I wonder how you can make such a statement. > Hope this helps, > > Myles Matheson > Data Warehouse Architect A data warehouse architect that is not sure that SAN are a good solution for ETL databases... Quite puzzling. |
| Thread Tools | |
| Display Modes | |
|
|