This is a discussion on DB Structure Advice - lots of measurements in DB or file refs? within the SQL Server forums, part of the Microsoft SQL Server category; --> A project I'm thinking about, could use some advice from those who understand the dimensional limits of an SQL-like ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| A project I'm thinking about, could use some advice from those who understand the dimensional limits of an SQL-like DB. Imagine, for example, a database containing detailed data for many cities during thunderstorms. Each time there is a storm, data collection begins, gathering "rain-drops-per millisecond" maybe 5 times per second (don't worry about the sanity of this, its just an analogy). The logistical data is no concern for me (cities, dates, storm duration etc). I anticipate a few hundred cities and but also thousands of storm events associated with those various cities. However the storage of rain storm data is my question. Should "rain-drop-rate" data be fields in a rain-drop-rate measurements table, with columns of "rate-drop-rate" and a "storm event" foreign key for the storm to which the event belongs? That seems like it would be a poor use of DB resources, as there would be billions of entries in that table after many hundreds or thousands of storms are captured. The alternatives that come to mind are a) each storm event has a file reference, and the file contains thousands of pairs of time and rain-drop-rate. Then I have directories full of files. b) each storm event is a new table in the database with the columns of "rain-drop-rate" and "time of measurement" (The database still gets filled with billions of entries, but spread among thousands of tables) Is there some other option I should consider? It seems like a is the right answer but your input is appreciated... Ross. |
| |||
| RgeeK (Ross@no.thanks.spammers) writes: > Imagine, for example, a database containing detailed data for many > cities during thunderstorms. Each time there is a storm, data > collection begins, gathering "rain-drops-per millisecond" maybe 5 times > per second (don't worry about the sanity of this, its just an analogy). > > The logistical data is no concern for me (cities, dates, storm duration > etc). I anticipate a few hundred cities and but also thousands of storm > events associated with those various cities. > > However the storage of rain storm data is my question. > > Should "rain-drop-rate" data be fields in a rain-drop-rate measurements > table, with columns of "rate-drop-rate" and a "storm event" foreign key > for the storm to which the event belongs? That seems like it would be > a poor use of DB resources, as there would be billions of entries in > that table after many hundreds or thousands of storms are captured. > > The alternatives that come to mind are > > a) each storm event has a file reference, and the file contains > thousands of pairs of time and rain-drop-rate. Then I have directories > full of files. > > b) each storm event is a new table in the database with the columns of > "rain-drop-rate" and "time of measurement" (The database still gets > filled with billions of entries, but spread among thousands of tables) > > Is there some other option I should consider? It seems like a is the > right answer but your input is appreciated... Since we don't know the real business problem and what you will use the data for, it's difficult to say. But your desciption reminds me of the problem the racing team McLaren faced. They implemented, together with people from Conchango, a solution where used the new Filestream feature in SQL 2008. You can read about it on http://www.microsoft.com/casestudies...yid=4000001476 McLaren wanted to work with their data from Excel, why the Filestream solution suited them well. If you want to run SELECT queries against the data, this is less practical. Your logical model should certainly be (storm event, raim-drop-rate, time-of-measurement), but there are some alternatives to organise it. SQL Server offers to partitioning solutions: partition views available already in SQL 2000 and in all editions, and partitioned tables, added in SQL 2005, and available only in Enterprise Edition. Partitioned tables are more solid, but both solutions permit you to add and drop partitions as needed. Partitioned tables can have up 999 partitions if memory serves, whereas partitioned views has a limit of 256 tables. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx |
| |||
| >> Is there some other option I should consider? << Kx (http://kx.com/) and Streambase (http://www.streambase.com/); they were designed from the ground up for capturing this kind of moving data streams and have been in use for years. |
| ||||
| Thanks for the suggestions. Happy to see the words "billions of records" in there. --CELKO-- wrote: >>> Is there some other option I should consider? << > > Kx (http://kx.com/) and Streambase (http://www.streambase.com/); they > were designed from the ground up for capturing this kind of moving > data streams and have been in use for years. |
| Thread Tools | |
| Display Modes | |
|
|