Unix Technical Forum

Getting list of recently added IDENTITY items

This is a discussion on Getting list of recently added IDENTITY items within the SQL Server forums, part of the Microsoft SQL Server category; --> Erland Sommarskog wrote: > David Portas (REMOVE_BEFORE_REPLYING_dportas@acm.org) writes: > > If you want to record more than 3ms precision ...


Go Back   Unix Technical Forum > Database Server Software > Microsoft SQL Server > SQL Server

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #21 (permalink)  
Old 02-29-2008, 07:42 PM
David Portas
 
Posts: n/a
Default Re: Getting list of recently added IDENTITY items

Erland Sommarskog wrote:
> David Portas (REMOVE_BEFORE_REPLYING_dportas@acm.org) writes:
> > If you want to record more than 3ms precision then you'll have to use a
> > non-DATETIME datatype. IDENTITY won't help you do that. If you don't
> > need that level of precision then you can just increment the
> > num_of_hits for each hit within the same 3ms timeframe. Again, IDENTITY
> > doesn't help you. Either way my design is perfectly sound.

>
> No, it isn't. Using no_of_hits is a kludge, and also a performance problem,
> because instead of a plain insert, you need to do an IF EXISTS and then
> INSERT or UPDATE. Similarly, if you want count the number of hits, you
> need to sum no_of_hits, rather than doing count(*) which is likely to
> give the optimizer fewer choices for an effective query plan.
>
> The problem is fairly apparent here, because SQL Server has a fairly
> low resolution on time. But the problem is not the resolution, the
> problem is that time is a contiguous entity that does not have discreet
> values. Using time for as a primary key has the same problem as using a
> floating-point as a primary key.
>
> On the other hand, IDENTITY is a discrete set of value that is easy
> to work with. After all, that is all we know. This was hit 56762 that
> was registered. It happens to have the same values in the database as
> hit 56763, but we don't know if the values they model were the same or
> not. Thus, it would be incorrect to handle them as being the same. The
> only thing we can to discern them, is to add a number to the observations,
> so that we know that they are distinct.
>
> > Unfortunately these types examples just tend to become rather tedious
> > "what if this..." and "what if that..." exchanges. As a result they
> > aren't very informative. Fundamentally, my observation is simply this:
> > that there is no information in duplicate data that cannot also be
> > modelled in relational form (i.e. with keys).

>
> Yes, these exchanges are tedious, when someone insists on putting the cart
> before the horse, and tries to press a circular reality into a squared
> model.
>
> no_of_hits here is perfect example of this. It adds no information to
> the data, but only serves to make the data more difficult to work with.
> Certainly, it makes the relational purists sleep better at night, but
> that is usually not what the customer is paying for.


Now here is a more interesting slant on the key problem. When does a
metric become a kludge? I don't model account balances or stock as one
row per dollar or one row per pack - and I'm sure you don't either. We
use values in columns for those things. No_of_hits is surely a value
that is likely to be of interest to our users - in fact you've already
suggested aggregating the rows to get that value. So when does it
become "right" to put it in a column and "wrong" to create N rows
instead of just 1 row for the same data?

You seen to suggest that performance should be the deciding factor. My
view is that Normal Form is a better criteria. In any event, the
designer must make that choice and live with the consequences. In my
opinion it does no harm to point out yet again that the choice exists.
On the other hand it does a lot of harm to perpetuate the transparently
silly idea that "There isn't a natural key".

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/m...S,SQL.90).aspx
--

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #22 (permalink)  
Old 02-29-2008, 07:42 PM
pb648174
 
Posts: n/a
Default Re: Getting list of recently added IDENTITY items

As far as DDL, just assume a simple lookup table, Id and Description.

The main reason I use identities is for ease of use. Sometimes I use
natural keys, when they make sense and are a fairly small number of
columns, but other times we have tables that are basically a lookup
table, Id and Description. I suppose I could make the 500 character
description the primary key and only have a single column, but there
QueryString length limitations to deal with, index performance and
other things to worry about with having a large piece of text as a
primary key. I also don't like the idea of having that description data
duplicated when used as a foreign key for all the other tables that
relate to it. If that is just a personal preference and not an actual
problem, I don't know, but as Donald Rumsfeld says, you go to war with
the army you have.

Sure, we could re-architect the solution to make Celko and others treat
us nicely, but that would be hard to sell to the customers - 1 yr delay
to rewrite the software so purists are happy in the discussion forums.
As of now a serializable transaction seems to be the best way to handle
this issue in my mind using my original posted code since we can't also
force our customer to spend 25K to upgrade their SQL servers.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #23 (permalink)  
Old 02-29-2008, 07:42 PM
Doug
 
Posts: n/a
Default Re: Getting list of recently added IDENTITY items

"no_of_hits here is perfect example of this. It adds no information to
the data, but only serves to make the data more difficult to work with.
Certainly, it makes the relational purists sleep better at night, but
that is usually not what the customer is paying for.
--
Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se "

Well phrased. I am sure the SQL ANSI spec does not require an identity
type. However, it is SUCH a common and useful tool that any practicing
DBA or developer is going ot use them.
As an example, the journeyman carpenter specs require green sinker
nails. Purists use a claw hammer to drive them in.

Guys that want to make money at the business use nail guns.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #24 (permalink)  
Old 02-29-2008, 07:42 PM
Doug
 
Posts: n/a
Default Re: Getting list of recently added IDENTITY items

"You seen to suggest that performance should be the deciding factor. My
view is that Normal Form is a better criteria. In any event, the "

Real world stuff. I've played some fairly significant games violating
all sorts of normalization because the "right" way wasn't fast enough.
Early on I tried to teach the customer that it was in their best long
term interests to suffer 3 second response times. The danged customers
kept saying "FIX IT."

I've built systems to create and guarantee unique keys. It is a giant
PITA as you have to learn specificities about the particular engine in
place to figure out how unique keys could be created. As an example, i
read about 3 milliseconds on this forum. I didn't know that, and, it
turns out I don't really care.

For the past 15 years or so, I try to find the EASIEST way to get
guaranteed unique keys from the engine. And the engines of today all
seem to have them, just like most modern cars have parking brakes, and
I don't need to carry a large rock around any more.

regards,
doug

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #25 (permalink)  
Old 02-29-2008, 07:42 PM
David Portas
 
Posts: n/a
Default Re: Getting list of recently added IDENTITY items

pb648174 wrote:
> As far as DDL, just assume a simple lookup table, Id and Description.
>
> The main reason I use identities is for ease of use. Sometimes I use
> natural keys, when they make sense and are a fairly small number of
> columns, but other times we have tables that are basically a lookup
> table, Id and Description. I suppose I could make the 500 character
> description the primary key and only have a single column, but there
> QueryString length limitations to deal with, index performance and
> other things to worry about with having a large piece of text as a
> primary key. I also don't like the idea of having that description data
> duplicated when used as a foreign key for all the other tables that
> relate to it. If that is just a personal preference and not an actual
> problem, I don't know, but as Donald Rumsfeld says, you go to war with
> the army you have.
>
> Sure, we could re-architect the solution to make Celko and others treat
> us nicely, but that would be hard to sell to the customers - 1 yr delay
> to rewrite the software so purists are happy in the discussion forums.
> As of now a serializable transaction seems to be the best way to handle
> this issue in my mind using my original posted code since we can't also
> force our customer to spend 25K to upgrade their SQL servers.


Continue to use IDENTITY. Just declare the other key as well. Does my
solution work for your original problem?

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/m...S,SQL.90).aspx
--

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #26 (permalink)  
Old 02-29-2008, 07:42 PM
Erland Sommarskog
 
Posts: n/a
Default Re: Getting list of recently added IDENTITY items

David Portas (REMOVE_BEFORE_REPLYING_dportas@acm.org) writes:
> Now here is a more interesting slant on the key problem. When does a
> metric become a kludge?


When the kludge is not a metric.

Of course, two hits from the same IP address in the same ms is a little
funny - but so are two hits 10 ms apart, but your model does not account
for that. No_of_hits is just a mechanism you've added to solve the
problem with collisions on a key values that represents a contiguous
spectrum.

Take another example. Some process in laboratory or whereever generates
lots of measurements data, in rates of microseconds. We don't use datetime
to store the value, obviously, but would it be right to assume discrete
steps of microseconds? Maybe, but what if the frequency is somewhat
uneven?. We can get two values registered for the same millisecond, but
we know that they are nevertheless apart. And here we are not talking
number of hits, but some value - presumably floating-point.

What I am saying is that you cannot use en entity that is a continuous
spectrum as a key. Key values must be discrete. Of, course, in a
digital computer, everything is discrete - but that only means that
two analogue values can get the same representation.

And this is just one example where the real world does not have that
fine key the relational model wants. Again, think customers.

> You seen to suggest that performance should be the deciding factor.


Not really, as you can see above. However, for registering hits on a
busy web site, performance is probably argument enough to kill that
model.

> On the other hand it does a lot of harm to perpetuate the transparently
> silly idea that "There isn't a natural key".


It isn't silly. It's fact of life that in many cases. Common cases.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #27 (permalink)  
Old 02-29-2008, 07:42 PM
Anith Sen
 
Posts: n/a
Default Re: Getting list of recently added IDENTITY items

Erland,

Can you post your definition of a "natural key"?

>> And this is just one example where the real world does not have that fine
>> key the relational model wants.


However, real world examples of entities, by virtue of there mere existance,
have keys regardless of what relational model mandates.

>> Again, think customers.


What about them?

--
Anith


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #28 (permalink)  
Old 02-29-2008, 07:42 PM
David Portas
 
Posts: n/a
Default Re: Getting list of recently added IDENTITY items

Erland Sommarskog wrote:
> David Portas (REMOVE_BEFORE_REPLYING_dportas@acm.org) writes:
> > Now here is a more interesting slant on the key problem. When does a
> > metric become a kludge?

>
> When the kludge is not a metric.
>
> Of course, two hits from the same IP address in the same ms is a little
> funny - but so are two hits 10 ms apart, but your model does not account
> for that. No_of_hits is just a mechanism you've added to solve the
> problem with collisions on a key values that represents a contiguous
> spectrum.
>
> Take another example. Some process in laboratory or whereever generates
> lots of measurements data, in rates of microseconds. We don't use datetime
> to store the value, obviously, but would it be right to assume discrete
> steps of microseconds? Maybe, but what if the frequency is somewhat
> uneven?. We can get two values registered for the same millisecond, but
> we know that they are nevertheless apart. And here we are not talking
> number of hits, but some value - presumably floating-point.
>
> What I am saying is that you cannot use en entity that is a continuous
> spectrum as a key. Key values must be discrete. Of, course, in a
> digital computer, everything is discrete - but that only means that
> two analogue values can get the same representation.


See Kimball on slowly changing dimensions. Or Date, Darwen and
Lorentzos on Temporal Data and The Relational Model. Time-variant data
is commonplace and all the techniques I know of expect the time
dimension to be part of the key. Kimball actually asserts that we
should choose fixed, granular periods for facts - although I don't
think he develops a proper argument to support that. So this is
interesting but given that we have widely implemented and proven
industry standard solutions for these problems I don't think you should
dismiss them without hard evidence that you can better them.

> And this is just one example where the real world does not have that
> fine key the relational model wants. Again, think customers.


Been there. Done that.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/m...S,SQL.90).aspx
--

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #29 (permalink)  
Old 02-29-2008, 07:42 PM
pb648174
 
Posts: n/a
Default Re: Getting list of recently added IDENTITY items

Just out of curiosity, lets say we are told that only a basic
description of something is needed, let's say a list of projects that a
user is going to work on, that needs to be 500 characters long. What
should the primary key be? Is it the 500 character description or an
identity column? Are there performance issues with having 500 character
foreign keys, indexes, etc.? In what ways is it going to be better than
using an identity column taking into account that it is going to be
passed through the QueryString, posted in links, etc.?


Doug, I think your solution would mostly work, but be a pain to
implement when there are lots of columns. I think perhaps a more
general way to do it would be to store only the primary key of the
copied data (identity or not) and use that to know which records were
copied. As long as the amount of copied data is fairly small I don't
see that there would be a very big performance impact for that. I think
I might just do that to avoid needing the serializable transaction.

--CELKO-- wrote:
> Your whoel mental model of RDBMS is wrong and it is about to kludge you
> to death.
>
> When you look for a key, start with your industry standards. I have
> been in IT for a bit over 35 years, and would estimate that this covers
> ~80% of the properly defined systems -- VIN, ISBN, UPC, EAN, etc. which
> can be verified and validated via a trusted exernal source. In the ode
> days, this took time; you can Google it today.
>
> After that, there is a natural key in the data, such as (longitude,
> latitude), (store, cash register, ticket_nbr), etc. which can be
> verified and validated in the reality of the data. This is ~18% of the
> cases. if you have no natural key, then your schema is probably wrong.
>
>
> Then, if that fails we invent a key with check digits, validation rules
> and an audit trail within our enterprise. This is a great screaming
> pain IF you do it right. This is why industry standards exists --
> people got tired the pain (would you like to do Retail without UPC
> barcodes on products?)
>
> So, newbies prefer to do it wrong by using auto-increments or other
> proprietary stuff that cannot be verified or validated BECAUSE it is
> easier than real RDBMS design. It takes SIX years to become a Union
> Journeyman carpenter in New York State, but newbies can be a DB
> designer in less than 10 years. Hell, less than 10 months! WOW!
>
> I know you want the quick easy answer that does not require you
> ACTUALLY UNDERSTAND the business problem or your problem domain. Just
> make up something in the software and hope you never write for a
> company that has a SOX audit or a Data Warehouse project.
>
> What you posted is just enough to get a lot of Kludges but not a real
> answer.


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #30 (permalink)  
Old 02-29-2008, 07:43 PM
pb648174
 
Posts: n/a
Default Re: Getting list of recently added IDENTITY items

Just out of curiosity, lets say we are told that only a basic
description of something is needed, let's say a list of projects that a

user is going to work on, that needs to be 500 characters long. What
should the primary key be? Is it the 500 character description or an
identity column? Are there performance issues with having 500 character

foreign keys, indexes, etc.? In what ways is it going to be better than

using an identity column taking into account that it is going to be
passed through the QueryString, posted in links, etc.?

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 08:43 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com