This is a discussion on many tables vs one big table within the SQL Server forums, part of the Microsoft SQL Server category; --> Dear experts, If I have a number of related groups of data (e.g., stock prices for different stocks) is ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Dear experts, If I have a number of related groups of data (e.g., stock prices for different stocks) is it better to put them into many tables (e.g., one table per stock) or into one big table (e.g., a single table where one column is the stock id other columns are the price and other data)? The single table approach with a clustered primary key including the stock id and other information that is different for each stock seems cleaner to me. But I worry that having a single table could hurt preformance when the database gets very large. The clustered primary key should make queries on the single table fast, but I'm worried about inserts. According to my vague understanding of a clustered primary key, records are physically stored on disk based on the primary key. So if I insert a record for a given stock does the database have to physicall "move down" all the records that are below that or does it do something smarter? To summarize: 1. Should I use a single table or many tables? 2. If using a single table, how are inserts handled by the DB to prevent inserts from getting slower and slower as more items are added to the DB? Thanks in advance for your help and advice. |
| |||
| Use a single table, not many tables. Use a multi-column key that includes the stock. When a new stock comes along it should be a user transaction to add a row to the Stocks master table, not a database administrator's job to add tables. The database design should be stable, not in a constant state of change. Performance of INSERTS does not change much between inserting to a table with 10000 rows or 10000000 rows. Roy Harvey Beacon Falls, CT On 2 Jan 2007 08:57:21 -0800, "Emin" <emin.shopper@gmail.com> wrote: >Dear experts, > >If I have a number of related groups of data (e.g., stock prices for >different stocks) is it better to put them into many tables (e.g., one >table per stock) or into one big table (e.g., a single table where one >column is the stock id other columns are the price and other data)? > >The single table approach with a clustered primary key including the >stock id and other information that is different for each stock seems >cleaner to me. But I worry that having a single table could hurt >preformance when the database gets very large. The clustered primary >key should make queries on the single table fast, but I'm worried about >inserts. According to my vague understanding of a clustered primary >key, records are physically stored on disk based on the primary key. So >if I insert a record for a given stock does the database have to >physicall "move down" all the records that are below that or does it do >something smarter? > >To summarize: > > 1. Should I use a single table or many tables? > > 2. If using a single table, how are inserts handled by the DB to >prevent inserts from getting slower and slower as more items are added >to the DB? > >Thanks in advance for your help and advice. |
| |||
| Emin (emin.shopper@gmail.com) writes: > If I have a number of related groups of data (e.g., stock prices for > different stocks) is it better to put them into many tables (e.g., one > table per stock) or into one big table (e.g., a single table where one > column is the stock id other columns are the price and other data)? There are two reasons why you would have more than one table: 1) Information is so different for different (groups of) stocks, that you get different set of columns. 2) The table is really big. Then you would might make into a partioned view or table, based on condition. But that would not be one table per stock id. So, in the logical design, that's one table. > The single table approach with a clustered primary key including the > stock id and other information that is different for each stock seems > cleaner to me. But I worry that having a single table could hurt > preformance when the database gets very large. The clustered primary > key should make queries on the single table fast, but I'm worried about > inserts. According to my vague understanding of a clustered primary > key, records are physically stored on disk based on the primary key. So > if I insert a record for a given stock does the database have to > physicall "move down" all the records that are below that or does it do > something smarter? If the PK is monotonically ascending new rows are just added at the end, all continues smoothly. If the PK is not ascending, but for instance a random GUID, then you will get a lots of page splits, and that takes extra power. But the answer in this case is that you should not cluster on the GUID, but on something else. It's perfectly possible to have the clustered index on some other column(s). -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx |
| |||
| On 2 Jan 2007 08:57:21 -0800, Emin wrote: (snip) >if I insert a record for a given stock does the database have to >physicall "move down" all the records that are below that or does it do >something smarter? Hi Emin, Roy answered most of your questions, but was friendly enough to leave this bit for me to comment on. Yes, it does something smarter. Rows are *NOT* physically stored in the order imposed by the clustered index. The clustered index imposes a *logical* ordering on the data, which is implemented by pointers that form a logical chain of database pages. If a row in inserted, SQL Server first navigates the clustered index to find the correct location (database page). Then, there are two possibilities: 1. There's still space left on the page. The row is added and the insert is finished. 2. There's not enough free space on the page. A page split occurs: half of the data on the page is moved to a newly allocated page and pointers are updated to reflect the location of the new page in the chain. After that, the new row is inserted. If your database has to process *LOTS* of inserts (in the order of thousands per second or more), it makes sense to define your clustered index such that new rows are always inserted at the logical end of the pointer chain. In that case, page splits will never happen. For lower amounts of inserts, the overhead of a page split is insignificant. -- Hugo Kornelis, SQL Server MVP My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis |
| |||
| Dear Roy, Erland, and Hugo, Thank you very much for your excellent responses. It sounds like a single logical table is the way to go since the frequency of inserts is not very high. If necessary, where would I get more information about low level details like page splits (e.g., how to control the page size, how to determine when and how many page splits are occuring, etc.)? Thanks again, -Emin On Jan 2, 5:53 pm, Hugo Kornelis <h...@perFact.REMOVETHIS.info.INVALID> wrote: > On 2 Jan 2007 08:57:21 -0800, Emin wrote: > > (snip) > > >if I insert a record for a given stock does the database have to > >physicall "move down" all the records that are below that or does it do > >something smarter?Hi Emin, > > Roy answered most of your questions, but was friendly enough to leave > this bit for me to comment on. > > Yes, it does something smarter. Rows are *NOT* physically stored in the > order imposed by the clustered index. The clustered index imposes a > *logical* ordering on the data, which is implemented by pointers that > form a logical chain of database pages. > > If a row in inserted, SQL Server first navigates the clustered index to > find the correct location (database page). Then, there are two > possibilities: > > 1. There's still space left on the page. The row is added and the insert > is finished. > > 2. There's not enough free space on the page. A page split occurs: half > of the data on the page is moved to a newly allocated page and pointers > are updated to reflect the location of the new page in the chain. After > that, the new row is inserted. > > If your database has to process *LOTS* of inserts (in the order of > thousands per second or more), it makes sense to define your clustered > index such that new rows are always inserted at the logical end of the > pointer chain. In that case, page splits will never happen. > For lower amounts of inserts, the overhead of a page split is > insignificant. > > -- > Hugo Kornelis, SQL Server MVP > My SQL Server blog:http://sqlblog.com/blogs/hugo_kornelis |
| |||
| One more question, is there a process similar to defragmentation that I should run periodically to collect split pages together? Thanks again, -Emin On Jan 3, 1:42 pm, "Emin" <emin.shop...@gmail.com> wrote: > Dear Roy, Erland, and Hugo, > > Thank you very much for your excellent responses. It sounds like a > single logical table is the way to go since the frequency of inserts is > not very high. If necessary, where would I get more information about > low level details like page splits (e.g., how to control the page size, > how to determine when and how many page splits are occuring, etc.)? > > Thanks again, > -Emin > > On Jan 2, 5:53 pm, Hugo Kornelis <h...@perFact.REMOVETHIS.info.INVALID> > wrote: > > > On 2 Jan 2007 08:57:21 -0800, Emin wrote: > > > (snip) > > > >if I insert a record for a given stock does the database have to > > >physicall "move down" all the records that are below that or does it do > > >something smarter?Hi Emin, > > > Roy answered most of your questions, but was friendly enough to leave > > this bit for me to comment on. > > > Yes, it does something smarter. Rows are *NOT* physically stored in the > > order imposed by the clustered index. The clustered index imposes a > > *logical* ordering on the data, which is implemented by pointers that > > form a logical chain of database pages. > > > If a row in inserted, SQL Server first navigates the clustered index to > > find the correct location (database page). Then, there are two > > possibilities: > > > 1. There's still space left on the page. The row is added and the insert > > is finished. > > > 2. There's not enough free space on the page. A page split occurs: half > > of the data on the page is moved to a newly allocated page and pointers > > are updated to reflect the location of the new page in the chain. After > > that, the new row is inserted. > > > If your database has to process *LOTS* of inserts (in the order of > > thousands per second or more), it makes sense to define your clustered > > index such that new rows are always inserted at the logical end of the > > pointer chain. In that case, page splits will never happen. > > For lower amounts of inserts, the overhead of a page split is > > insignificant. > > > -- > > Hugo Kornelis, SQL Server MVP > > My SQL Server blog:http://sqlblog.com/blogs/hugo_kornelis |
| |||
| Emin (emin.shopper@gmail.com) writes: > One more question, is there a process similar to defragmentation that I > should run periodically to collect split pages together? Yes, DBCC DBREINDEX on SQL 2000. The preferred syntax on SQL 2005 is ALTER INDEX WITH REBUILD or something like that. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx |
| ||||
| On 3 Jan 2007 10:42:46 -0800, Emin wrote: > If necessary, where would I get more information about >low level details like page splits (e.g., how to control the page size, >how to determine when and how many page splits are occuring, etc.)? Hi Emin, For that kind of information, the only serious resource are the books written by Kalen Delaney. Quite advanced stuff, definitely not entry level, but that's unavoidable if you want to know about page splits. -- Hugo Kornelis, SQL Server MVP My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis |
| Thread Tools | |
| Display Modes | |
|
|