This is a discussion on Q: calling all gurus within the SQL Server forums, part of the Microsoft SQL Server category; --> The Scenario: We have some data that can be in three states. One is a saved state, a temporary ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| The Scenario: We have some data that can be in three states. One is a saved state, a temporary state, and a complete state. The complete state is the bulk of the data. This data will be requested 1% of the time, maybe even less. The saved state and temporary state will only number 10-15 records. This data will be used 99% of the time, very frequently. So to summarize there will be 10-15 records that will be used ALL the time…. And 50,000-100,000 that will be used very infrequently. The programmer that I work with wants to combine the tables. I want to keep the tables sepperate. My logic is that it will be much faster to work with 10-15 records than sift threw 50-100k most of which are unnecessary to 99% of the requests. Also there are frequent deletions/modifications in the temporary table(10-15 records). The complete table can NOT have modifications or deletions (50-100k records). The optimization for these tables is also very different. He says that since the structures are almost identical they should be in the same table, that we should just use a field to denote the different types of records. He insists that my view is wrong and that I know nothing about databases. I may be new to SQL, but am I really that out to lunch? Would it not make sense to keep these tables separate? (Thank you for taking the time to read this) Sat |
| |||
| It's always difficult to give design advice online without the opportunity to analyse a particular situation in detail. But my opinion, based on the information given, is that you should definitely use a single table. Always start with the logical model - you appear to have a single entity so you should have a single table. Add a column to indicate the status and index it. The table is so small that retrieving the relevant rows from the index should give SQLServer no trouble at all. Updating even this indexed column should be far preferable to copying data between tables when the status changes. But as always with performance questions: don't believe what I say - test it and see. -- David Portas ------------ Please reply only to the newsgroup -- |
| |||
| On 29 Oct 2003 07:59:10 -0800, no1ninja@hotmail.com (Saturnin Kepa) wrote: >The Scenario: > >We have some data that can be in three states. One is a saved state, >a temporary state, and a complete state. > >The complete state is the bulk of the data. This data will be >requested 1% of the time, maybe even less. > >The saved state and temporary state will only number 10-15 records. >This data will be used 99% of the time, very frequently. > > >So to summarize there will be 10-15 records that will be used ALL the >time…. And 50,000-100,000 that will be used very infrequently. > >The programmer that I work with wants to combine the tables. I want >to keep the tables sepperate. > >My logic is that it will be much faster to work with 10-15 records >than sift threw 50-100k most of which are unnecessary to 99% of the >requests. Also there are frequent deletions/modifications in the >temporary table(10-15 records). The complete table can NOT have >modifications or deletions (50-100k records). The optimization for >these tables is also very different. > >He says that since the structures are almost identical they should be >in the same table, that we should just use a field to denote the >different types of records. He insists that my view is wrong and that >I know nothing about databases. > >I may be new to SQL, but am I really that out to lunch? Would it not >make sense to keep these tables separate? I'm a relative newbie at SQL design and administration, but I believe that it probably won't make much difference. If you build the table with a clustered index (hopefully an integer), and with an additional field to flag "saved", "temporary" and "complete", and index the additional field, there will be nearly no time lost in scanning the index, and grabbing the 10-15 rows in either of the first two tables. If these were large tables, it might make a difference, but even then, less than you might think; the index on the flag field essentially splits the table into three separate tables, so long as your queries are written to take advantage of that index. (e.g.: if you're looking for something "saved" that has a certain unique attribute, make sure you specify in the query that it's "saved", so SQL Server doesn't have to do a table scan for the attribute you're looking for. (Also, as a side note: proper database design suggests that you do not have a field that allows the words "saved", "temporary" and "complete; you should have a field that allows integers from one to three, and a second table that equates 1 with "saved", 2 with "temporary" and 3 with "complete" (or whatever order seems best). Comparing integers is much faster than comparing strings, and this saves a nice chunk of space, allowing more rows to be cached.) |
| |||
| "Saturnin Kepa" <no1ninja@hotmail.com> wrote: <snip> > My logic is that it will be much faster to work with 10-15 records > than sift threw 50-100k most of which are unnecessary to 99% of the > requests. Also there are frequent deletions/modifications in the > temporary table(10-15 records). The complete table can NOT have > modifications or deletions (50-100k records). The optimization for > these tables is also very different. <snip> My two cents: I agree with David. However, I'd also like to throw in the old adage that pre-mature optimization can cause you untold headaches. My advice would be to go with a nice, clean, logical design (in this instance, avoiding the use of multiple tables for a single type of entity based on a single property). If some prudent indexing won't keep your performance at acceptable levels, then you can try things like splitting into separate tables. However, my guess is that you'll find the single table performs just fine: it's not like we're talking about 10 billion records or anything. Craig |
| |||
| On Wed, 29 Oct 2003 09:07:20 -0800, John Palmer <jpalmer1@ix.netcom.com> wrote: >On 29 Oct 2003 07:59:10 -0800, no1ninja@hotmail.com (Saturnin Kepa) >wrote: > >>The Scenario: >> >>We have some data that can be in three states. One is a saved state, >>a temporary state, and a complete state. >> >>The complete state is the bulk of the data. This data will be >>requested 1% of the time, maybe even less. >> >>The saved state and temporary state will only number 10-15 records. >>This data will be used 99% of the time, very frequently. >> >> >>So to summarize there will be 10-15 records that will be used ALL the >>time…. And 50,000-100,000 that will be used very infrequently. >> >>The programmer that I work with wants to combine the tables. I want >>to keep the tables sepperate. >> >>My logic is that it will be much faster to work with 10-15 records >>than sift threw 50-100k most of which are unnecessary to 99% of the >>requests. Also there are frequent deletions/modifications in the >>temporary table(10-15 records). The complete table can NOT have >>modifications or deletions (50-100k records). The optimization for >>these tables is also very different. >> >>He says that since the structures are almost identical they should be >>in the same table, that we should just use a field to denote the >>different types of records. He insists that my view is wrong and that >>I know nothing about databases. >> >>I may be new to SQL, but am I really that out to lunch? Would it not >>make sense to keep these tables separate? > >I'm a relative newbie at SQL design and administration, but I believe >that it probably won't make much difference. If you build the table >with a clustered index (hopefully an integer), and with an additional >field to flag "saved", "temporary" and "complete", and index the >additional field, there will be nearly no time lost in scanning the >index, and grabbing the 10-15 rows in either of the first two tables. > >If these were large tables, it might make a difference, but even then, >less than you might think; the index on the flag field essentially >splits the table into three separate tables, so long as your queries >are written to take advantage of that index. (e.g.: if you're looking >for something "saved" that has a certain unique attribute, make sure >you specify in the query that it's "saved", so SQL Server doesn't have >to do a table scan for the attribute you're looking for. > >(Also, as a side note: proper database design suggests that you do not >have a field that allows the words "saved", "temporary" and "complete; >you should have a field that allows integers from one to three, and a >second table that equates 1 with "saved", 2 with "temporary" and 3 >with "complete" (or whatever order seems best). Comparing integers is >much faster than comparing strings, and this saves a nice chunk of >space, allowing more rows to be cached.) A further consideration could be how you are obtaining your primary key. If this is one of the auto numbering schemes you may well find it harder to work with two tables. As far as the speed issue is concerned, try both ways and see if the difference justifies having two tables. |
| |||
| >> The programmer that I work with wants to combine the tables. I want to keep the tables separate. << Why? If they are the same entity in your data model, then they should be in one table. What you are doing is called attribute splitting and it gets messy very fast. >> My logic is that it will be much faster to work with 10-15 records [sic] than sift threw 50-100k most of which are unnecessary to 99% of the requests. << First of all, rows are not records and the differences are vital to using and understanding SQL. Secondly, use an index or uniqueness constraint to access the rows; why would you look thru the whole table? This is not a sequential file system. >> He says that since the structures are almost identical they should be in the same table, that we should just use a field [sic] to denote the different types of records [sic]. << Yes. If they are the same kind of entities, and differ only by a status code, then he is right. Get the dara model right first, then tune the performance. |
| |||
| On 29 Oct 2003 13:43:52 -0800, joe.celko@northface.edu (--CELKO--) wrote: >>> The programmer that I work with wants to combine the tables. I >want to keep the tables separate. << > >Why? If they are the same entity in your data model, then they should >be in one table. What you are doing is called attribute splitting and >it gets messy very fast. > >>> My logic is that it will be much faster to work with 10-15 records >[sic] than sift threw 50-100k most of which are unnecessary to 99% of >the requests. << > >First of all, rows are not records and the differences are vital to >using and understanding SQL. Secondly, use an index or uniqueness >constraint to access the rows; why would you look thru the whole >table? This is not a sequential file system. > >>> He says that since the structures are almost identical they should >be in the same table, that we should just use a field [sic] to denote >the different types of records [sic]. << > >Yes. If they are the same kind of entities, and differ only by a >status code, then he is right. Get the dara model right first, then >tune the performance. (not a guru, but here it goes anyway...) Note the word 'almost identical' - isn't that 'almost positive' sign for either over generalization or existing denormalization (especially on conceptual level of design)? Also if one entity never changes (only inserts allowed) and the other allows updates, inserts and deletes doesn't that suggest different entities, and calls for deeper examination of those entities and their attributes? To me it seems that their DBA is the one that fails to see the difference between the conceptual data modeling and physical data model. --- On another note, also hoping to learn something... What exactly does get messy with (for instance): CREATE TABLE ABIG ( prop1 INT PRIMARY KEY CHECK (prop2 > 1000), prop2 CHAR(50)) CREATE TABLE ASMALL ( prop1 INT PRIMARY KEY CHECK (prop1 <= 1000), prop2 CHAR(50)) CREATE VIEW A AS SELECT prop1, prop2 FROM ABIG UNION ALL SELECT prop1, prop2 FROM ASMALL INSERT INTO A (prop1, prop2) VALUES (1, 'G'); INSERT INTO A (prop1, prop2) VALUES (1001, 'G'); ....except that it is already messy. Does it get even messier? ( GoranG79 AT hotmail.com ) |
| |||
| QUOTE On another note, also hoping to learn something... What exactly does get messy with (for instance): CREATE TABLE ABIG ( prop1 INT PRIMARY KEY CHECK (prop2 > 1000), prop2 CHAR(50)) CREATE TABLE ASMALL ( prop1 INT PRIMARY KEY CHECK (prop1 <= 1000), prop2 CHAR(50)) CREATE VIEW A AS SELECT prop1, prop2 FROM ABIG UNION ALL SELECT prop1, prop2 FROM ASMALL INSERT INTO A (prop1, prop2) VALUES (1, 'G'); INSERT INTO A (prop1, prop2) VALUES (1001, 'G'); ....except that it is already messy. Does it get even messier? END QUOTE GoranG, I'm not sure what you are asking or demonstrating here? Tnx, Hans Brouwer *** Sent via Developersdex http://www.developersdex.com *** Don't just participate in USENET...get rewarded for it! |
| |||
| On 30 Oct 2003 12:41:08 GMT, hansje <hansjes@anonymous.com> wrote: >QUOTE >On another note, also hoping to learn something... >What exactly does get messy with (for instance): > >CREATE TABLE ABIG ( > prop1 INT PRIMARY KEY CHECK (prop2 > 1000), > prop2 CHAR(50)) > >CREATE TABLE ASMALL ( > prop1 INT PRIMARY KEY CHECK (prop1 <= 1000), > prop2 CHAR(50)) > >CREATE VIEW A AS >SELECT prop1, prop2 >FROM ABIG >UNION ALL >SELECT prop1, prop2 >FROM ASMALL > >INSERT INTO A (prop1, prop2) VALUES (1, 'G'); >INSERT INTO A (prop1, prop2) VALUES (1001, 'G'); > >...except that it is already messy. >Does it get even messier? >END QUOTE > > > >GoranG, > >I'm not sure what you are asking or demonstrating here? Potential logical equivalence of spliting attributes at physical level. ( GoranG79 AT hotmail.com ) |
| ||||
| On Thu, 30 Oct 2003 14:57:06 +0100, GoranG <no@spam.net> wrote: <cut> > >Potential logical equivalence of spliting attributes at physical >level. Oops. Potential logical equivalence of a) splitting attributes at physical level and having two physical entities and a single view to b) single entity at physical level. ( GoranG79 AT hotmail.com ) |