This is a discussion on Getting list of recently added IDENTITY items within the SQL Server forums, part of the Microsoft SQL Server category; --> Erland Sommarskog wrote: > David Portas (REMOVE_BEFORE_REPLYING_dportas@acm.org) writes: > > If you want to record more than 3ms precision ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Erland Sommarskog wrote: > David Portas (REMOVE_BEFORE_REPLYING_dportas@acm.org) writes: > > If you want to record more than 3ms precision then you'll have to use a > > non-DATETIME datatype. IDENTITY won't help you do that. If you don't > > need that level of precision then you can just increment the > > num_of_hits for each hit within the same 3ms timeframe. Again, IDENTITY > > doesn't help you. Either way my design is perfectly sound. > > No, it isn't. Using no_of_hits is a kludge, and also a performance problem, > because instead of a plain insert, you need to do an IF EXISTS and then > INSERT or UPDATE. Similarly, if you want count the number of hits, you > need to sum no_of_hits, rather than doing count(*) which is likely to > give the optimizer fewer choices for an effective query plan. > > The problem is fairly apparent here, because SQL Server has a fairly > low resolution on time. But the problem is not the resolution, the > problem is that time is a contiguous entity that does not have discreet > values. Using time for as a primary key has the same problem as using a > floating-point as a primary key. > > On the other hand, IDENTITY is a discrete set of value that is easy > to work with. After all, that is all we know. This was hit 56762 that > was registered. It happens to have the same values in the database as > hit 56763, but we don't know if the values they model were the same or > not. Thus, it would be incorrect to handle them as being the same. The > only thing we can to discern them, is to add a number to the observations, > so that we know that they are distinct. > > > Unfortunately these types examples just tend to become rather tedious > > "what if this..." and "what if that..." exchanges. As a result they > > aren't very informative. Fundamentally, my observation is simply this: > > that there is no information in duplicate data that cannot also be > > modelled in relational form (i.e. with keys). > > Yes, these exchanges are tedious, when someone insists on putting the cart > before the horse, and tries to press a circular reality into a squared > model. > > no_of_hits here is perfect example of this. It adds no information to > the data, but only serves to make the data more difficult to work with. > Certainly, it makes the relational purists sleep better at night, but > that is usually not what the customer is paying for. Now here is a more interesting slant on the key problem. When does a metric become a kludge? I don't model account balances or stock as one row per dollar or one row per pack - and I'm sure you don't either. We use values in columns for those things. No_of_hits is surely a value that is likely to be of interest to our users - in fact you've already suggested aggregating the rows to get that value. So when does it become "right" to put it in a column and "wrong" to create N rows instead of just 1 row for the same data? You seen to suggest that performance should be the deciding factor. My view is that Normal Form is a better criteria. In any event, the designer must make that choice and live with the consequences. In my opinion it does no harm to point out yet again that the choice exists. On the other hand it does a lot of harm to perpetuate the transparently silly idea that "There isn't a natural key". -- David Portas, SQL Server MVP Whenever possible please post enough code to reproduce your problem. Including CREATE TABLE and INSERT statements usually helps. State what version of SQL Server you are using and specify the content of any error messages. SQL Server Books Online: http://msdn2.microsoft.com/library/m...S,SQL.90).aspx -- |
| |||
| As far as DDL, just assume a simple lookup table, Id and Description. The main reason I use identities is for ease of use. Sometimes I use natural keys, when they make sense and are a fairly small number of columns, but other times we have tables that are basically a lookup table, Id and Description. I suppose I could make the 500 character description the primary key and only have a single column, but there QueryString length limitations to deal with, index performance and other things to worry about with having a large piece of text as a primary key. I also don't like the idea of having that description data duplicated when used as a foreign key for all the other tables that relate to it. If that is just a personal preference and not an actual problem, I don't know, but as Donald Rumsfeld says, you go to war with the army you have. Sure, we could re-architect the solution to make Celko and others treat us nicely, but that would be hard to sell to the customers - 1 yr delay to rewrite the software so purists are happy in the discussion forums. As of now a serializable transaction seems to be the best way to handle this issue in my mind using my original posted code since we can't also force our customer to spend 25K to upgrade their SQL servers. |
| |||
| "no_of_hits here is perfect example of this. It adds no information to the data, but only serves to make the data more difficult to work with. Certainly, it makes the relational purists sleep better at night, but that is usually not what the customer is paying for. -- Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se " Well phrased. I am sure the SQL ANSI spec does not require an identity type. However, it is SUCH a common and useful tool that any practicing DBA or developer is going ot use them. As an example, the journeyman carpenter specs require green sinker nails. Purists use a claw hammer to drive them in. Guys that want to make money at the business use nail guns. |
| |||
| "You seen to suggest that performance should be the deciding factor. My view is that Normal Form is a better criteria. In any event, the " Real world stuff. I've played some fairly significant games violating all sorts of normalization because the "right" way wasn't fast enough. Early on I tried to teach the customer that it was in their best long term interests to suffer 3 second response times. The danged customers kept saying "FIX IT." I've built systems to create and guarantee unique keys. It is a giant PITA as you have to learn specificities about the particular engine in place to figure out how unique keys could be created. As an example, i read about 3 milliseconds on this forum. I didn't know that, and, it turns out I don't really care. For the past 15 years or so, I try to find the EASIEST way to get guaranteed unique keys from the engine. And the engines of today all seem to have them, just like most modern cars have parking brakes, and I don't need to carry a large rock around any more. regards, doug |
| |||
| pb648174 wrote: > As far as DDL, just assume a simple lookup table, Id and Description. > > The main reason I use identities is for ease of use. Sometimes I use > natural keys, when they make sense and are a fairly small number of > columns, but other times we have tables that are basically a lookup > table, Id and Description. I suppose I could make the 500 character > description the primary key and only have a single column, but there > QueryString length limitations to deal with, index performance and > other things to worry about with having a large piece of text as a > primary key. I also don't like the idea of having that description data > duplicated when used as a foreign key for all the other tables that > relate to it. If that is just a personal preference and not an actual > problem, I don't know, but as Donald Rumsfeld says, you go to war with > the army you have. > > Sure, we could re-architect the solution to make Celko and others treat > us nicely, but that would be hard to sell to the customers - 1 yr delay > to rewrite the software so purists are happy in the discussion forums. > As of now a serializable transaction seems to be the best way to handle > this issue in my mind using my original posted code since we can't also > force our customer to spend 25K to upgrade their SQL servers. Continue to use IDENTITY. Just declare the other key as well. Does my solution work for your original problem? -- David Portas, SQL Server MVP Whenever possible please post enough code to reproduce your problem. Including CREATE TABLE and INSERT statements usually helps. State what version of SQL Server you are using and specify the content of any error messages. SQL Server Books Online: http://msdn2.microsoft.com/library/m...S,SQL.90).aspx -- |
| |||
| David Portas (REMOVE_BEFORE_REPLYING_dportas@acm.org) writes: > Now here is a more interesting slant on the key problem. When does a > metric become a kludge? When the kludge is not a metric. Of course, two hits from the same IP address in the same ms is a little funny - but so are two hits 10 ms apart, but your model does not account for that. No_of_hits is just a mechanism you've added to solve the problem with collisions on a key values that represents a contiguous spectrum. Take another example. Some process in laboratory or whereever generates lots of measurements data, in rates of microseconds. We don't use datetime to store the value, obviously, but would it be right to assume discrete steps of microseconds? Maybe, but what if the frequency is somewhat uneven?. We can get two values registered for the same millisecond, but we know that they are nevertheless apart. And here we are not talking number of hits, but some value - presumably floating-point. What I am saying is that you cannot use en entity that is a continuous spectrum as a key. Key values must be discrete. Of, course, in a digital computer, everything is discrete - but that only means that two analogue values can get the same representation. And this is just one example where the real world does not have that fine key the relational model wants. Again, think customers. > You seen to suggest that performance should be the deciding factor. Not really, as you can see above. However, for registering hits on a busy web site, performance is probably argument enough to kill that model. > On the other hand it does a lot of harm to perpetuate the transparently > silly idea that "There isn't a natural key". It isn't silly. It's fact of life that in many cases. Common cases. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx |
| |||
| Erland, Can you post your definition of a "natural key"? >> And this is just one example where the real world does not have that fine >> key the relational model wants. However, real world examples of entities, by virtue of there mere existance, have keys regardless of what relational model mandates. >> Again, think customers. What about them? -- Anith |
| |||
| Erland Sommarskog wrote: > David Portas (REMOVE_BEFORE_REPLYING_dportas@acm.org) writes: > > Now here is a more interesting slant on the key problem. When does a > > metric become a kludge? > > When the kludge is not a metric. > > Of course, two hits from the same IP address in the same ms is a little > funny - but so are two hits 10 ms apart, but your model does not account > for that. No_of_hits is just a mechanism you've added to solve the > problem with collisions on a key values that represents a contiguous > spectrum. > > Take another example. Some process in laboratory or whereever generates > lots of measurements data, in rates of microseconds. We don't use datetime > to store the value, obviously, but would it be right to assume discrete > steps of microseconds? Maybe, but what if the frequency is somewhat > uneven?. We can get two values registered for the same millisecond, but > we know that they are nevertheless apart. And here we are not talking > number of hits, but some value - presumably floating-point. > > What I am saying is that you cannot use en entity that is a continuous > spectrum as a key. Key values must be discrete. Of, course, in a > digital computer, everything is discrete - but that only means that > two analogue values can get the same representation. See Kimball on slowly changing dimensions. Or Date, Darwen and Lorentzos on Temporal Data and The Relational Model. Time-variant data is commonplace and all the techniques I know of expect the time dimension to be part of the key. Kimball actually asserts that we should choose fixed, granular periods for facts - although I don't think he develops a proper argument to support that. So this is interesting but given that we have widely implemented and proven industry standard solutions for these problems I don't think you should dismiss them without hard evidence that you can better them. > And this is just one example where the real world does not have that > fine key the relational model wants. Again, think customers. Been there. Done that. -- David Portas, SQL Server MVP Whenever possible please post enough code to reproduce your problem. Including CREATE TABLE and INSERT statements usually helps. State what version of SQL Server you are using and specify the content of any error messages. SQL Server Books Online: http://msdn2.microsoft.com/library/m...S,SQL.90).aspx -- |
| |||
| Just out of curiosity, lets say we are told that only a basic description of something is needed, let's say a list of projects that a user is going to work on, that needs to be 500 characters long. What should the primary key be? Is it the 500 character description or an identity column? Are there performance issues with having 500 character foreign keys, indexes, etc.? In what ways is it going to be better than using an identity column taking into account that it is going to be passed through the QueryString, posted in links, etc.? Doug, I think your solution would mostly work, but be a pain to implement when there are lots of columns. I think perhaps a more general way to do it would be to store only the primary key of the copied data (identity or not) and use that to know which records were copied. As long as the amount of copied data is fairly small I don't see that there would be a very big performance impact for that. I think I might just do that to avoid needing the serializable transaction. --CELKO-- wrote: > Your whoel mental model of RDBMS is wrong and it is about to kludge you > to death. > > When you look for a key, start with your industry standards. I have > been in IT for a bit over 35 years, and would estimate that this covers > ~80% of the properly defined systems -- VIN, ISBN, UPC, EAN, etc. which > can be verified and validated via a trusted exernal source. In the ode > days, this took time; you can Google it today. > > After that, there is a natural key in the data, such as (longitude, > latitude), (store, cash register, ticket_nbr), etc. which can be > verified and validated in the reality of the data. This is ~18% of the > cases. if you have no natural key, then your schema is probably wrong. > > > Then, if that fails we invent a key with check digits, validation rules > and an audit trail within our enterprise. This is a great screaming > pain IF you do it right. This is why industry standards exists -- > people got tired the pain (would you like to do Retail without UPC > barcodes on products?) > > So, newbies prefer to do it wrong by using auto-increments or other > proprietary stuff that cannot be verified or validated BECAUSE it is > easier than real RDBMS design. It takes SIX years to become a Union > Journeyman carpenter in New York State, but newbies can be a DB > designer in less than 10 years. Hell, less than 10 months! WOW! > > I know you want the quick easy answer that does not require you > ACTUALLY UNDERSTAND the business problem or your problem domain. Just > make up something in the software and hope you never write for a > company that has a SOX audit or a Data Warehouse project. > > What you posted is just enough to get a lot of Kludges but not a real > answer. |
| ||||
| Just out of curiosity, lets say we are told that only a basic description of something is needed, let's say a list of projects that a user is going to work on, that needs to be 500 characters long. What should the primary key be? Is it the 500 character description or an identity column? Are there performance issues with having 500 character foreign keys, indexes, etc.? In what ways is it going to be better than using an identity column taking into account that it is going to be passed through the QueryString, posted in links, etc.? |