This is a discussion on Re: [COMMITTERS] pgsql: Update: < * Allow adding enumeratedvalues to an existing within the pgsql Hackers forums, part of the PostgreSQL category; --> "Tom Dunstan" <pgsql@tomd.cc> writes: > One scenario I'm not happy about is this: the friendly db admin has > ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| "Tom Dunstan" <pgsql@tomd.cc> writes: > One scenario I'm not happy about is this: the friendly db admin has > happily added an extra value to the end before and the operation has > been a snap - no rewriting required. But this time either a) oid > wraparound has occurred, b) she's inserted one or c) she's reordered > them. Bam - we start rewriting the entire database. That's not the > kind of surprise I like giving people, and the current situation of > either don't allow updates at all, or the alternative to surprises of > always rewrite everything seem pretty deficient. And I don't want to > only allow updates if they won't cause a rewrite, it's > nondeterministic. If we take OIDs out of the picture it wouldn't be nondeterministic. I think with something like your 16bit/16bit design, and say ten free codes between each original assignment, it'd be okay to not support the rewriting stuff at all. The frequency with which people would hit the restriction would be so low it wouldn't be worth supporting the code for it, especially since we couldn't do it any more efficiently than a manual ALTER COLUMN TYPE replacement would. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sat, Apr 26, 2008 at 6:33 AM, Tom Dunstan wrote: > One scenario I'm not happy about is this: the friendly db admin has > happily added an extra value to the end before and the operation has > been a snap - no rewriting required. But this time either a) oid > wraparound has occurred, b) she's inserted one or c) she's reordered > them. Bam - we start rewriting the entire database. As long as the documentation is candid about this, I don't think it's a show-stopper. e.g.: N.B. Rearranging an ENUM will usually be a simple operation, but in $CERTAIN_CASES may require a rewrite of tables using the ENUM, which is time consuming and locks the table against writing ... You'd probably also want a "NOTICE: Change to ENUM will require rewriting of tables." to be emitted when this happens. > > I've already suggested some alternatives in the reply to Brendan that > would solve some of this, but I suppose another gross-seeming way to > stop surprise rewrites would be to never do one unless given a FORCE > REWRITE clause on the ALTER statement or something like that, and fail > if a rewrite is required not specified. > That would be okay too, but I think I'd prefer proceeding with the rewrite after emitting a NOTICE. If the db admin decides not to go ahead, or wait to do it after hours, she can always hit ^C, right? Cheers, BJ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (GNU/Linux) Comment: http://getfiregpg.org iD8DBQFIEkO+5YBsbHkuyV0RAttIAJ9TNhNDN8SAsfyAR5MY9l ppPyeWSQCfYOSs kG25F0V44QqTZ4HMAWXL5JI= =tG5q -----END PGP SIGNATURE----- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| On Sat, Apr 26, 2008 at 2:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > The very first idea that I had was to have an enum value as > > the combination of both an enum id and the ordinal value. > > I seem to remember that we discussed that and rejected it, but I don't > remember the reasoning... I don't think there was any terribly strong objection. IIRC I originally proposed trying to fit everything into 2 bytes, you objected to that as "unnecessary bit-shaving" and proposed 8 bytes, I didn't want to give up pass-by-value, plus my initial pg_enum design was rather denormalized - the current solution was a compromise that fixed that and kept everyone happy. updates too carefully. Maybe it was just a bit too cute a solution. So two alternative proposals, both with a 2 byte "enum id" and a 2 byte "value": 1 - We space the values out as evenly as we can across the 65000ish range and allow people to delete, insert and append, but not reorder. If they do the above gratuitously we might have to do a rewrite, but they'll have to get fairly busy to do it. Rewrite would be required for reorderings. 2- We totally give up the idea of storing a value on disk that is directly comparable (other than equality), and simply number from zero up, using that number to index into an array (or use as syscache key or whatever) containing the real ordering information. We can then reorder or do any other operations to our heart's content. I'm actually favouring option 2 - I think it can be done in such a way as to not be much of an overhead compared to the status quo, and you know that if we don't implement proper reordering now, someone will ask for it, and we'll be having this discussion at a similar time after 8.4 goes out. I'm happy to work on a patch for this if it meets general approval. Cheers Tom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| Oops, sorry for the crossed emails, slight delay in my main being received. On Sat, Apr 26, 2008 at 2:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think with something like your 16bit/16bit design, and say ten free > codes between each original assignment, it'd be okay to not support the > rewriting stuff at all. Yeah. I'm more concerned about someone wanting reordering, hence the second option in my other mail. Cheers Tom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| "Tom Dunstan" <pgsql@tomd.cc> writes: > 1 - We space the values out as evenly as we can across the 65000ish > range and allow people to delete, insert and append, but not reorder. > If they do the above gratuitously we might have to do a rewrite, but > they'll have to get fairly busy to do it. Rewrite would be required > for reorderings. > 2- We totally give up the idea of storing a value on disk that is > directly comparable (other than equality), and simply number from zero > up, using that number to index into an array (or use as syscache key > or whatever) containing the real ordering information. We can then > reorder or do any other operations to our heart's content. > I'm actually favouring option 2 - I'm not ... it strikes me that it will add implementation complexity and runtime overhead for a feature that two days ago we didn't think we needed at all, and IMHO one we still shouldn't be thinking to expend a lot of work on. I like #1 with no rewrite support. That strikes me as covering 99% of the requirements with 10% of the work. Further, as already noted, if you do have to rewrite then a series of manual ALTER COLUMN TYPE operations would probably be a *better* answer than a monolithic implementation, because of the locking problems involved in doing it in one transaction. (Oh, and don't forget the disk space problem: double the disk space for every table involved, simultaneously.) regards, tom lane PS: no, I do *not* want to hear any proposals for ALTER TYPE CONCURRENTLY ;-) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| Tom Dunstan wrote: > So two alternative proposals, both with a 2 byte "enum id" and a 2 byte "value": > > 1 - We space the values out as evenly as we can across the 65000ish > range and allow people to delete, insert and append, but not reorder. > If they do the above gratuitously we might have to do a rewrite, but > they'll have to get fairly busy to do it. Rewrite would be required > for reorderings. > Or else we just error out in such cases. As Tom Lane suggests, rewriting has some nasty deadlock possibilities. You always have the option of creating a new enum type and moving each affected column to that type. > 2- We totally give up the idea of storing a value on disk that is > directly comparable (other than equality), and simply number from zero > up, using that number to index into an array (or use as syscache key > or whatever) containing the real ordering information. We can then > reorder or do any other operations to our heart's content. > > I'm actually favouring option 2 - I think it can be done in such a way > as to not be much of an overhead compared to the status quo, and you > know that if we don't implement proper reordering now, someone will > ask for it, and we'll be having this discussion at a similar time > after 8.4 goes out. > > > Being able simply to order by the oid value is fast. That's one of the current benefits. So I think we'd need some benchmarking to show that this wouldn't slow things down. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| On Sat, Apr 26, 2008 at 2:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm not ... it strikes me that it will add implementation complexity and > runtime overhead for a feature that two days ago we didn't think we > needed at all, and IMHO one we still shouldn't be thinking to expend a > lot of work on. Well, I *did* think it was necessary, I just hadn't spent the effort in coming up with a solution. And on the effort side, I'm not going to be hacking the optimizer any day soon. > I like #1 with no rewrite support. That strikes me as covering 99% > of the requirements with 10% of the work. > Further, as already noted, if you do have to rewrite then a series of > manual ALTER COLUMN TYPE operations would probably be a *better* answer > than a monolithic implementation, because of the locking problems > involved in doing it in one transaction. I don't understand this if it's calling option 2 the monolithic implementation. I was intending that the values be permanent tokens if you like, so that ZERO rewriting would be required for any types of modification. So I don't see where locking comes in. I don't want rewriting either. Hmm, maybe I haven't explained it properly. Here's an example: CREATE TYPE thing AS ENUM('vegetable', 'animal'); Hypothetical new pg_enum: enum_id | value | order | label ---------+-------+-------+----------- 0 | 0 | 0 | vegetable 0 | 1 | 1 | animal ALTER TYPE thing AS ENUM('animal', 'mineral', 'vegetable'); pg_enum: enum_id | value | order | label ---------+-------+-------+----------- 0 | 0 | 2 | vegetable 0 | 1 | 0 | animal 0 | 2 | 1 | mineral So we store the 'value' column on disk, and it never changes. The downside is that we now have to look up the order when we call a non-equality operator, but I reckon we can pretty efficiently cache that, so the lookup is just a couple of array index operations. The upside is that we can reorder, and we'll never run out of values "in-between" existing ones. Anyway, sorry if all of the above *was* clear and I just misunderstood the comment. If there's consensus to go with option 1 I'll pursue that path. It's much less of a change to go from option 1 to option 2 than the current code to either of them anyway, so doing some benchmarking of both options shouldn't be hard if I want to. Cheers Tom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| "Tom Dunstan" <pgsql@tomd.cc> writes: > On Sat, Apr 26, 2008 at 2:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Further, as already noted, if you do have to rewrite then a series of >> manual ALTER COLUMN TYPE operations would probably be a *better* answer >> than a monolithic implementation, because of the locking problems >> involved in doing it in one transaction. > I don't understand this if it's calling option 2 the monolithic > implementation. No, I was imagining an option-1 implementation trying to support rewriting of all affected tables in a single "monolithic" command. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| Andrew Dunstan escribió: > Tom Dunstan wrote: >> So two alternative proposals, both with a 2 byte "enum id" and a 2 byte "value": >> >> 1 - We space the values out as evenly as we can across the 65000ish >> range and allow people to delete, insert and append, but not reorder. >> If they do the above gratuitously we might have to do a rewrite, but >> they'll have to get fairly busy to do it. Rewrite would be required >> for reorderings. > > Or else we just error out in such cases. As Tom Lane suggests, rewriting > has some nasty deadlock possibilities. > > You always have the option of creating a new enum type and moving each > affected column to that type. Another alternative would be internally creating a different temporary enum, rewriting the tables one by one each on its own transaction, and finish by dropping the original enum and renaming the temporary one. This solves the deadlock problem. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| ||||
| Alvaro Herrera wrote: > Andrew Dunstan escribió: > > >> Tom Dunstan wrote: >> >>> So two alternative proposals, both with a 2 byte "enum id" and a 2 byte "value": >>> >>> 1 - We space the values out as evenly as we can across the 65000ish >>> range and allow people to delete, insert and append, but not reorder. >>> If they do the above gratuitously we might have to do a rewrite, but >>> they'll have to get fairly busy to do it. Rewrite would be required >>> for reorderings. >>> >> Or else we just error out in such cases. As Tom Lane suggests, rewriting >> has some nasty deadlock possibilities. >> >> You always have the option of creating a new enum type and moving each >> affected column to that type. >> > > Another alternative would be internally creating a different temporary > enum, rewriting the tables one by one each on its own transaction, and > finish by dropping the original enum and renaming the temporary one. > This solves the deadlock problem. > > What happens when someone tries to join two of the tables, one that has been converted and one that hasn't? You might not have deadlock, but you won't have type integrity either, ISTM. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |