This is a discussion on SQL Server storing large amounts of data in multiple tables within the SQL Server forums, part of the Microsoft SQL Server category; --> Hello, Currently we have a database, and it is our desire for it to be able to store millions ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hello, Currently we have a database, and it is our desire for it to be able to store millions of records. The data in the table can be divided up by client, and it stores nothing but about 7 integers. | table | | id | clientId | int1 | int2 | int 3 | ... | Right now, our benchmarks indicate a drastic increase in performance if we divide the data into different tables. For example, table_clientA, table_clientB, table_clientC, despite the fact the tables contain the exact same columns. This however does not seem very clean or elegant to me, and rather illogical since a database exists as a single file on the harddrive. | table_clientA | | id | clientId | int1 | int2 | int 3 | ... | table_clientB | | id | clientId | int1 | int2 | int 3 | ... | table_clientC | | id | clientId | int1 | int2 | int 3 | ... Is there anyway to duplicate this increase in database performance gained by splitting the table, perhaps by using a certain type of index? Thanks, Jeff Brubaker Software Developer |
| |||
| Why not create a view that will combine the separate tables back into your original format. You could even place the different base tables in different locations (i.e., different servers) - the whole distributed partitioned view concept. Other than this, what do you have indexed on this table? What kind of query is showing a drastic improvement when you separate the table like this? -Chuck Urwiler, MCSD, MCDBA "jeff brubaker" <jeff@priva.com> wrote in message news:b7387660.0310021523.f2fb5e6@posting.google.co m... > Hello, > Currently we have a database, and it is our desire for it to be able > to store millions of records. The data in the table can be divided up > by client, and it stores nothing but about 7 integers. > | table | > | id | clientId | int1 | int2 | int 3 | ... | > > Right now, our benchmarks indicate a drastic increase in performance > if we divide the data into different tables. For example, > table_clientA, table_clientB, table_clientC, despite the fact the > tables contain the exact same columns. This however does not seem very > clean or elegant to me, and rather illogical since a database exists > as a single file on the harddrive. > > | table_clientA | > | id | clientId | int1 | int2 | int 3 | ... > > | table_clientB | > | id | clientId | int1 | int2 | int 3 | ... > > | table_clientC | > | id | clientId | int1 | int2 | int 3 | ... > > Is there anyway to duplicate this increase in database performance > gained by splitting the table, perhaps by using a certain type of > index? > > Thanks, > Jeff Brubaker > Software Developer |
| |||
| [posted and mailed, please reply in news] jeff brubaker (jeff@priva.com) writes: > Currently we have a database, and it is our desire for it to be able > to store millions of records. The data in the table can be divided up > by client, and it stores nothing but about 7 integers. >| table | >| id | clientId | int1 | int2 | int 3 | ... | > > Right now, our benchmarks indicate a drastic increase in performance > if we divide the data into different tables. For example, > table_clientA, table_clientB, table_clientC, despite the fact the > tables contain the exact same columns. This however does not seem very > clean or elegant to me, and rather illogical since a database exists > as a single file on the harddrive. >... > > Is there anyway to duplicate this increase in database performance > gained by splitting the table, perhaps by using a certain type of > index? It is not implausible, but with out further knowledge of your tables and the benchmark queries, it is impossible to tell. You could get a more informative answer, if you posted: o The CREATE TABLE statements (both for the unpartitioned table, and the partitioned table). o Any indexes on the tables. o The queries you use for the benchmark. o If you have scripts that generates data for the benchmarks, that would extremely useful. (Provided that they reasonably small.) Which client did you use for the benchmark? Query Analyzer? -- Erland Sommarskog, SQL Server MVP, sommar@algonet.se Books Online for SQL Server SP3 at http://www.microsoft.com/sql/techinf...2000/books.asp |
| |||
| Erland Sommarskog <sommar@algonet.se> wrote in message news:<Xns940AEEEF4BC5EYazorman@127.0.0.1>... > [posted and mailed, please reply in news] > > jeff brubaker (jeff@priva.com) writes: > > Currently we have a database, and it is our desire for it to be able > > to store millions of records. The data in the table can be divided up > > by client, and it stores nothing but about 7 integers. > >| table | > >| id | clientId | int1 | int2 | int 3 | ... | > > > > Right now, our benchmarks indicate a drastic increase in performance > > if we divide the data into different tables. For example, > > table_clientA, table_clientB, table_clientC, despite the fact the > > tables contain the exact same columns. This however does not seem very > > clean or elegant to me, and rather illogical since a database exists > > as a single file on the harddrive. > >... > > > > Is there anyway to duplicate this increase in database performance > > gained by splitting the table, perhaps by using a certain type of > > index? > > It is not implausible, but with out further knowledge of your tables > and the benchmark queries, it is impossible to tell. > > You could get a more informative answer, if you posted: > > o The CREATE TABLE statements (both for the unpartitioned table, > and the partitioned table). > o Any indexes on the tables. > o The queries you use for the benchmark. > o If you have scripts that generates data for the benchmarks, that > would extremely useful. (Provided that they reasonably small.) > > Which client did you use for the benchmark? Query Analyzer? Okay, sorry for the delay. Here is a SQL Script to setup the experiment. Basically it builts one table with 100,000 records, and 10 small tables with 10,000 records. To select all the records from table_5 is substantially faster than selecting all the records from bigTable where clientID = 5 SET NOCOUNT ON /* Drop any tables that might exist */ if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[bigTable]') and OBJECTPROPERTY(id, N'IsUserTable') = 1) drop table [dbo].[bigTable] if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[table_0]') and OBJECTPROPERTY(id, N'IsUserTable') = 1) drop table [dbo].[table_0] if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[table_1]') and OBJECTPROPERTY(id, N'IsUserTable') = 1) drop table [dbo].[table_1] if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[table_2]') and OBJECTPROPERTY(id, N'IsUserTable') = 1) drop table [dbo].[table_2] if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[table_3]') and OBJECTPROPERTY(id, N'IsUserTable') = 1) drop table [dbo].[table_3] if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[table_4]') and OBJECTPROPERTY(id, N'IsUserTable') = 1) drop table [dbo].[table_4] if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[table_5]') and OBJECTPROPERTY(id, N'IsUserTable') = 1) drop table [dbo].[table_5] if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[table_6]') and OBJECTPROPERTY(id, N'IsUserTable') = 1) drop table [dbo].[table_6] if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[table_7]') and OBJECTPROPERTY(id, N'IsUserTable') = 1) drop table [dbo].[table_7] if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[table_8]') and OBJECTPROPERTY(id, N'IsUserTable') = 1) drop table [dbo].[table_8] if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[table_9]') and OBJECTPROPERTY(id, N'IsUserTable') = 1) drop table [dbo].[table_9] /* Create the tables */ CREATE TABLE [dbo].[bigTable] ( [id1] [int] NULL , [id2] [int] NULL , [id3] [int] NULL , [id4] [int] NULL , [id5] [int] NULL , [id6] [int] NULL , [id7] [int] NULL , [id8] [int] NULL , [id9] [int] NULL , [id10] [int] NULL , [id11] [int] NULL , [id12] [int] NULL , [id13] [int] NULL , [id14] [int] NULL , [id15] [int] NULL , [id16] [int] NULL , [id17] [int] NULL , [id18] [int] NULL , [id19] [int] NULL , [id110] [int] NULL , [clientid] [int] NULL ) ON [PRIMARY] CREATE TABLE table_1 ( [id1] [int] NULL , [id2] [int] NULL , [id3] [int] NULL , [id4] [int] NULL , [id5] [int] NULL , [id6] [int] NULL , [id7] [int] NULL , [id8] [int] NULL , [id9] [int] NULL , [id10] [int] NULL , [id11] [int] NULL , [id12] [int] NULL , [id13] [int] NULL , [id14] [int] NULL , [id15] [int] NULL , [id16] [int] NULL , [id17] [int] NULL , [id18] [int] NULL , [id19] [int] NULL , [id20] [int] NULL , ) ON [PRIMARY] CREATE TABLE table_2 ( [id1] [int] NULL , [id2] [int] NULL , [id3] [int] NULL , [id4] [int] NULL , [id5] [int] NULL , [id6] [int] NULL , [id7] [int] NULL , [id8] [int] NULL , [id9] [int] NULL , [id10] [int] NULL , [id11] [int] NULL , [id12] [int] NULL , [id13] [int] NULL , [id14] [int] NULL , [id15] [int] NULL , [id16] [int] NULL , [id17] [int] NULL , [id18] [int] NULL , [id19] [int] NULL , [id20] [int] NULL , ) ON [PRIMARY] CREATE TABLE table_3 ( [id1] [int] NULL , [id2] [int] NULL , [id3] [int] NULL , [id4] [int] NULL , [id5] [int] NULL , [id6] [int] NULL , [id7] [int] NULL , [id8] [int] NULL , [id9] [int] NULL , [id10] [int] NULL , [id11] [int] NULL , [id12] [int] NULL , [id13] [int] NULL , [id14] [int] NULL , [id15] [int] NULL , [id16] [int] NULL , [id17] [int] NULL , [id18] [int] NULL , [id19] [int] NULL , [id20] [int] NULL , ) ON [PRIMARY] CREATE TABLE table_4 ( [id1] [int] NULL , [id2] [int] NULL , [id3] [int] NULL , [id4] [int] NULL , [id5] [int] NULL , [id6] [int] NULL , [id7] [int] NULL , [id8] [int] NULL , [id9] [int] NULL , [id10] [int] NULL , [id11] [int] NULL , [id12] [int] NULL , [id13] [int] NULL , [id14] [int] NULL , [id15] [int] NULL , [id16] [int] NULL , [id17] [int] NULL , [id18] [int] NULL , [id19] [int] NULL , [id20] [int] NULL , ) ON [PRIMARY] CREATE TABLE table_5 ( [id1] [int] NULL , [id2] [int] NULL , [id3] [int] NULL , [id4] [int] NULL , [id5] [int] NULL , [id6] [int] NULL , [id7] [int] NULL , [id8] [int] NULL , [id9] [int] NULL , [id10] [int] NULL , [id11] [int] NULL , [id12] [int] NULL , [id13] [int] NULL , [id14] [int] NULL , [id15] [int] NULL , [id16] [int] NULL , [id17] [int] NULL , [id18] [int] NULL , [id19] [int] NULL , [id20] [int] NULL , ) ON [PRIMARY] CREATE TABLE table_6 ( [id1] [int] NULL , [id2] [int] NULL , [id3] [int] NULL , [id4] [int] NULL , [id5] [int] NULL , [id6] [int] NULL , [id7] [int] NULL , [id8] [int] NULL , [id9] [int] NULL , [id10] [int] NULL , [id11] [int] NULL , [id12] [int] NULL , [id13] [int] NULL , [id14] [int] NULL , [id15] [int] NULL , [id16] [int] NULL , [id17] [int] NULL , [id18] [int] NULL , [id19] [int] NULL , [id20] [int] NULL , ) ON [PRIMARY] CREATE TABLE table_7 ( [id1] [int] NULL , [id2] [int] NULL , [id3] [int] NULL , [id4] [int] NULL , [id5] [int] NULL , [id6] [int] NULL , [id7] [int] NULL , [id8] [int] NULL , [id9] [int] NULL , [id10] [int] NULL , [id11] [int] NULL , [id12] [int] NULL , [id13] [int] NULL , [id14] [int] NULL , [id15] [int] NULL , [id16] [int] NULL , [id17] [int] NULL , [id18] [int] NULL , [id19] [int] NULL , [id20] [int] NULL , ) ON [PRIMARY] CREATE TABLE table_8 ( [id1] [int] NULL , [id2] [int] NULL , [id3] [int] NULL , [id4] [int] NULL , [id5] [int] NULL , [id6] [int] NULL , [id7] [int] NULL , [id8] [int] NULL , [id9] [int] NULL , [id10] [int] NULL , [id11] [int] NULL , [id12] [int] NULL , [id13] [int] NULL , [id14] [int] NULL , [id15] [int] NULL , [id16] [int] NULL , [id17] [int] NULL , [id18] [int] NULL , [id19] [int] NULL , [id20] [int] NULL , ) ON [PRIMARY] CREATE TABLE table_9 ( [id1] [int] NULL , [id2] [int] NULL , [id3] [int] NULL , [id4] [int] NULL , [id5] [int] NULL , [id6] [int] NULL , [id7] [int] NULL , [id8] [int] NULL , [id9] [int] NULL , [id10] [int] NULL , [id11] [int] NULL , [id12] [int] NULL , [id13] [int] NULL , [id14] [int] NULL , [id15] [int] NULL , [id16] [int] NULL , [id17] [int] NULL , [id18] [int] NULL , [id19] [int] NULL , [id20] [int] NULL , ) ON [PRIMARY] CREATE TABLE table_0 ( [id1] [int] NULL , [id2] [int] NULL , [id3] [int] NULL , [id4] [int] NULL , [id5] [int] NULL , [id6] [int] NULL , [id7] [int] NULL , [id8] [int] NULL , [id9] [int] NULL , [id10] [int] NULL , [id11] [int] NULL , [id12] [int] NULL , [id13] [int] NULL , [id14] [int] NULL , [id15] [int] NULL , [id16] [int] NULL , [id17] [int] NULL , [id18] [int] NULL , [id19] [int] NULL , [id20] [int] NULL , ) ON [PRIMARY] DECLARE @countPerClient int SET @countPerClient = 10000 DECLARE @counter int SET @counter = 1 /* Fill the big table with the 10 clients */ WHILE (@counter <= @countPerClient) BEGIN INSERT bigTable (clientId) VALUES (0) SET @counter = @counter + 1 END SET @counter=1 WHILE (@counter <= @countPerClient) BEGIN INSERT bigTable (clientId) VALUES (1) SET @counter = @counter + 1 END SET @counter=1 WHILE (@counter <= @countPerClient) BEGIN INSERT bigTable (clientId) VALUES (2) SET @counter = @counter + 1 END SET @counter=1 WHILE (@counter <= @countPerClient) BEGIN INSERT bigTable (clientId) VALUES (3) SET @counter = @counter + 1 END SET @counter=1 WHILE (@counter <= @countPerClient) BEGIN INSERT bigTable (clientId) VALUES (4) SET @counter = @counter + 1 END SET @counter=1 WHILE (@counter <= @countPerClient) BEGIN INSERT bigTable (clientId) VALUES (5) SET @counter = @counter + 1 END SET @counter=1 WHILE (@counter <= @countPerClient) BEGIN INSERT bigTable (clientId) VALUES (6) SET @counter = @counter + 1 END SET @counter=1 WHILE (@counter <= @countPerClient) BEGIN INSERT bigTable (clientId) VALUES (7) SET @counter = @counter + 1 END SET @counter=1 WHILE (@counter <= @countPerClient) BEGIN INSERT bigTable (clientId) VALUES (8) SET @counter = @counter + 1 END SET @counter=1 WHILE (@counter <= @countPerClient) BEGIN INSERT bigTable (clientId) VALUES (9) SET @counter = @counter + 1 END /* Fill each of the table with 1 clients */ SET @counter = 1 WHILE (@counter <= @countPerClient) BEGIN INSERT table_1 DEFAULT VALUES SET @counter = @counter + 1 END SET @counter = 1 WHILE (@counter <= @countPerClient) BEGIN INSERT table_2 DEFAULT VALUES SET @counter = @counter + 1 END SET @counter = 1 WHILE (@counter <= @countPerClient) BEGIN INSERT table_3 DEFAULT VALUES SET @counter = @counter + 1 END SET @counter = 1 WHILE (@counter <= @countPerClient) BEGIN INSERT table_4 DEFAULT VALUES SET @counter = @counter + 1 END SET @counter = 1 WHILE (@counter <= @countPerClient) BEGIN INSERT table_5 DEFAULT VALUES SET @counter = @counter + 1 END SET @counter = 1 WHILE (@counter <= @countPerClient) BEGIN INSERT table_6 DEFAULT VALUES SET @counter = @counter + 1 END SET @counter = 1 WHILE (@counter <= @countPerClient) BEGIN INSERT table_7 DEFAULT VALUES SET @counter = @counter + 1 END SET @counter = 1 WHILE (@counter <= @countPerClient) BEGIN INSERT table_8 DEFAULT VALUES SET @counter = @counter + 1 END SET @counter = 1 WHILE (@counter <= @countPerClient) BEGIN INSERT table_9 DEFAULT VALUES SET @counter = @counter + 1 END SET @counter = 1 WHILE (@counter <= @countPerClient) BEGIN INSERT table_0 DEFAULT VALUES SET @counter = @counter + 1 END GO /* Now time for the queries */ DECLARE @x datetime SELECT @x = GetDate() select count(*) from table_5 SELECT 'Split Tables' as label,DateDiff(millisecond, @x, GetDate()) SELECT @x = GetDate() select count(*) from bigTable where clientId=5 SELECT 'Big Table' as label,DateDiff(millisecond, @x, GetDate()) |
| ||||
| [posted and mailed, please reply in news] jeff brubaker (jeff@priva.com) writes: > Okay, sorry for the delay. Here is a SQL Script to setup the > experiment. Basically it builts one table with 100,000 records, and 10 > small tables with 10,000 records. To select all the records from > table_5 is substantially faster than selecting all the records from > bigTable where clientID = 5 Yes, since there is no index at all at your tables, this is not strange. First, I had to increase the number of rows per client to 100000 to get a significant difference. When I had run the first test, I ran these two statements: CREATE CLUSTERED INDEX clientid_ix on bigTable (clientId) go DBCC DROPCLEANBUFFERS The first statement builds an index on bigTable.clientId. The second just cleans out the cache, so that all data will be read from disk. (Don't do this on a production machine!) I then ran the benchmarks. table_5 was still faster with 450 ms, where as the SELECT COUNT(*) from bigTable needed 563 ms. However, on successive runs, table_5 took 110 ms, where as the read from bigTable was 16 ms. Before you consider advanced techniques like splitting tables, you should make sure that you have a sound index strategy. A clustered index has the data as its leaf pages, so after adding the clustered index, bigTable is really like table_1 to table_9 glued together. -- Erland Sommarskog, SQL Server MVP, sommar@algonet.se Books Online for SQL Server SP3 at http://www.microsoft.com/sql/techinf...2000/books.asp |