This is a discussion on Reducing NUMERIC size for 8.3 within the pgsql Hackers forums, part of the PostgreSQL category; --> We've changed the on-disk database format in 8.3, so we have an opportunity to change other things also. There ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| We've changed the on-disk database format in 8.3, so we have an opportunity to change other things also. There is a patch thats been on the patch queue for some time called numeric508, submitted Dec 2005; I've updated this patch now for 8.3 to remove bit rot (an hour's work). This is posted to pgsql-patches now and it works. The benefit of the patch is that it reduces each NUMERIC value by 2 bytes, so will speed up things considerably. This is now especially important if we are looking to reduce the speed of numeric division by a factor of 4 (recent WIP patch). The objections to applying this patch originally were: 1. it changes on-disk format (we've done this, so argument is void) 2. it would restrict number of digits to 508 and there are allegedly some people that want to store > 508 digits. The current patch passes all regression tests, but currently fails numeric_big.sql since this explicitly checks for support of numeric(1000,800). We could: a) accept the patch as-is and restrict NUMERIC to 508 digits b) refine the patch somewhat to allow 1000 digits (b) is possible in a couple of ways, both fairly quick: - extend the patch so that one of the spare bits from the second digit is used to represent dscale 508-1000. - extend the patch so that if weight > 127 or dscale > 127 we would use the first byte in the digits as an extra indicator byte holding the high bits of both fields. Neither change makes any difference to numbers below 1,000,000,000,000,000....(127 zeroes in total)...000 which probably covers the vast majority of people's usage. Objections: True, we are passed feature freeze, but this patch has been on the queue for 14 months prior to freeze and has been waiting on disk format changes to make patch application acceptable. We definitely want to reduce the size of Numeric by 2 bytes at some point. The question in my mind is: When is the best time to make this change? If we put this off until 8.4, then it will get rejected again because we won't want to change the disk format again. So the best time to do this is now, otherwise we'll put it off forever. Can I get somebody other than Tom to agree to review the patch? Clearly waiting for Tom to review this is just going to delay release, which I don't want to do. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| "Simon Riggs" <simon@2ndquadrant.com> writes: > We've changed the on-disk database format in 8.3, so we have an > opportunity to change other things also. There is a patch thats been on > the patch queue for some time called numeric508, submitted Dec 2005; I thought that idea had been rejected long since as being an unreasonable reduction in the flexibility of numeric arithmetic. > The benefit of the patch is that it reduces each NUMERIC value by 2 > bytes, so will speed up things considerably. This is now especially > important if we are looking to reduce the speed of numeric division by a > factor of 4 (recent WIP patch). The only way that patch would make division faster is by making the slowest cases (longest input numbers) impossible. Which is hardly a solution. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| Simon Riggs wrote: > The objections to applying this patch originally were: > 2. it would restrict number of digits to 508 and there are allegedly > some people that want to store > 508 digits. > If 508 digits are not enough, are1000 digits be sufficient? Both limits appear quite arbitrary to me. Regards, Andreas ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| On Mon, 2007-06-18 at 16:56 +0200, Andreas Pflug wrote: > Simon Riggs wrote: > > The objections to applying this patch originally were: > > 2. it would restrict number of digits to 508 and there are allegedly > > some people that want to store > 508 digits. > > > If 508 digits are not enough, are1000 digits be sufficient? Both limits > appear quite arbitrary to me. Thats the current limit; I agree, but I didn't choose it. IIRC if you don't specify a limit then you can have arbitrarily long numbers. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Andreas Pflug wrote: > Simon Riggs wrote: >> The objections to applying this patch originally were: >> 2. it would restrict number of digits to 508 and there are allegedly >> some people that want to store > 508 digits. >> > If 508 digits are not enough, are1000 digits be sufficient? Both limits > appear quite arbitrary to me. This 1000 is just a restriction on the typmod of numeric. You can still use a much higher number of digits, if you use unconstrained numeric: test=> create table test (n numeric); CREATE TABLE test=> insert into test values (10::numeric ^ 99999); INSERT 0 1 test=> select length(n) from test; length -------- 100017 (1 row) Best Regards Michael Paesold ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| On Mon, 2007-06-18 at 10:54 -0400, Tom Lane wrote: > "Simon Riggs" <simon@2ndquadrant.com> writes: > > We've changed the on-disk database format in 8.3, so we have an > > opportunity to change other things also. There is a patch thats been on > > the patch queue for some time called numeric508, submitted Dec 2005; > > I thought that idea had been rejected long since as being an unreasonable > reduction in the flexibility of numeric arithmetic. I've re-read all of the posts from Dec 2005 and I see everyone in favour of committing the patch, subject to review of whether limitation to 508 was acceptable. For me that was never finalised, at least I can't see a clear decision. The important point is that the patch can be easily modified to overcome these restrictions, a line of thought that was never explored. The current patch can be easily modified to allow 1000 digits, yet still saving 2 bytes per value when storing more common values. Or we can use the same techniques to represent arbitrary number of digits, again still saving 2 bytes on common values. We have the ground work done, we just need to agree further changes. > > The benefit of the patch is that it reduces each NUMERIC value by 2 > > bytes, so will speed up things considerably. This is now especially > > important if we are looking to reduce the speed of numeric division by a > > factor of 4 (recent WIP patch). > > The only way that patch would make division faster is by making the > slowest cases (longest input numbers) impossible. Which is hardly a > solution. It isn't directly related to division, but is a speed up nonetheless which can offset the loss. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| Andreas Pflug <pgadmin@pse-consulting.de> writes: > Simon Riggs wrote: >> The objections to applying this patch originally were: >> 2. it would restrict number of digits to 508 and there are allegedly >> some people that want to store > 508 digits. >> > If 508 digits are not enough, are1000 digits be sufficient? Both limits > appear quite arbitrary to me. As per the recent discussion about factorial, the current limit of numeric format is 10^131071 --- there is a whole lot of daylight between that and 10^508. I had a thought though: it's possible to reduce the header overhead for typical-size numbers without giving up the ability to store large ones. This is because the POS/NEG/NAN sign possibilities leave one unused bit pattern. Hence: 1. Switch the positions of the n_sign_dscale and n_weight fields in the long format, so that the sign bits are in the first word. 2. Reserve the fourth "sign" bit pattern to denote a compressed-header format in which there's just one uint16 header word and the NumericDigits start right after that. The header word could contain: 2 bits: "sign" distinguishing this from the two-word-header format 1 bit: actual number sign (POS or NEG, disallow NaN) 6 bits: weight, room for -32 .. 31 7 bits: dscale, room for 0 .. 127 3. When packing a NumericVar into a Numeric, use this short format when it's not a NaN and the weight and dscale are in range, else use the long format. Since the weight is in base-10000 digits, this bit allocation allows a dynamic range of about +- 10^127 which fits well with the dscale range. But I suspect that most of the use-cases for long numerics involve large integers, so it might be more useful to shave another bit or two from dscale and give 'em to weight. In any case, no capability is lost, unlike the original proposal; and this would be much less invasive than the original patch since there's no need to play tricks with the content of the digit array. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Tom Lane wrote: > Andreas Pflug <pgadmin@pse-consulting.de> writes: > >> Simon Riggs wrote: >> >>> The objections to applying this patch originally were: >>> 2. it would restrict number of digits to 508 and there are allegedly >>> some people that want to store > 508 digits. >>> >>> >> If 508 digits are not enough, are1000 digits be sufficient? Both limits >> appear quite arbitrary to me. >> > > As per the recent discussion about factorial, the current limit of > numeric format is 10^131071 --- there is a whole lot of daylight between > that and 10^508. > > I had a thought though: it's possible to reduce the header overhead for > typical-size numbers without giving up the ability to store large ones. > This is because the POS/NEG/NAN sign possibilities leave one unused bit > pattern. Hence: > > 1. Switch the positions of the n_sign_dscale and n_weight fields in the > long format, so that the sign bits are in the first word. > > 2. Reserve the fourth "sign" bit pattern to denote a compressed-header > format in which there's just one uint16 header word and the > NumericDigits start right after that. The header word could contain: > 2 bits: "sign" distinguishing this from the two-word-header format > 1 bit: actual number sign (POS or NEG, disallow NaN) > 6 bits: weight, room for -32 .. 31 > 7 bits: dscale, room for 0 .. 127 > > 3. When packing a NumericVar into a Numeric, use this short format when > it's not a NaN and the weight and dscale are in range, else use the long > format. > > Since the weight is in base-10000 digits, this bit allocation allows a > dynamic range of about +- 10^127 which fits well with the dscale range. > But I suspect that most of the use-cases for long numerics involve large > integers, so it might be more useful to shave another bit or two from > dscale and give 'em to weight. > > In any case, no capability is lost, unlike the original proposal; and > this would be much less invasive than the original patch since there's > no need to play tricks with the content of the digit array. > I wonder if the currently waiting patch isn't Good Enough for 999.9999999999999999 % of use cases, and "all" others can use numeric instead of numeric(1000,800) or so. Especially since there are many patches waiting that do need further investigation and refining. Regards, Andreas ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| On Mon, 2007-06-18 at 11:32 -0400, Tom Lane wrote: > Andreas Pflug <pgadmin@pse-consulting.de> writes: > > Simon Riggs wrote: > >> The objections to applying this patch originally were: > >> 2. it would restrict number of digits to 508 and there are allegedly > >> some people that want to store > 508 digits. > >> > > If 508 digits are not enough, are1000 digits be sufficient? Both limits > > appear quite arbitrary to me. > > As per the recent discussion about factorial, the current limit of > numeric format is 10^131071 --- there is a whole lot of daylight between > that and 10^508. > > I had a thought though: it's possible to reduce the header overhead for > typical-size numbers without giving up the ability to store large ones. > This is because the POS/NEG/NAN sign possibilities leave one unused bit > pattern. Hence: > > 1. Switch the positions of the n_sign_dscale and n_weight fields in the > long format, so that the sign bits are in the first word. > > 2. Reserve the fourth "sign" bit pattern to denote a compressed-header > format in which there's just one uint16 header word and the > NumericDigits start right after that. The header word could contain: > 2 bits: "sign" distinguishing this from the two-word-header format > 1 bit: actual number sign (POS or NEG, disallow NaN) > 6 bits: weight, room for -32 .. 31 > 7 bits: dscale, room for 0 .. 127 > > 3. When packing a NumericVar into a Numeric, use this short format when > it's not a NaN and the weight and dscale are in range, else use the long > format. > > Since the weight is in base-10000 digits, this bit allocation allows a > dynamic range of about +- 10^127 which fits well with the dscale range. > But I suspect that most of the use-cases for long numerics involve large > integers, so it might be more useful to shave another bit or two from > dscale and give 'em to weight. > > In any case, no capability is lost, unlike the original proposal; and > this would be much less invasive than the original patch since there's > no need to play tricks with the content of the digit array. Sounds good. I thought there'd be a way. Since this is your idea, would you like to do this, or should I? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| ||||
| Andreas Pflug <pgadmin@pse-consulting.de> writes: > Tom Lane wrote: >> In any case, no capability is lost, unlike the original proposal; and >> this would be much less invasive than the original patch since there's >> no need to play tricks with the content of the digit array. > I wonder if the currently waiting patch isn't Good Enough for > 999.9999999999999999 % of use cases, and "all" others can use numeric > instead of numeric(1000,800) or so. Apparently you misunderstand that patch: it takes capability away from unconstrained numeric too. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| Thread Tools | |
| Display Modes | |
|
|