This is a discussion on Column storage positions within the pgsql Hackers forums, part of the PostgreSQL category; --> On Wed, 2007-02-21 at 13:16 -0500, Andrew Dunstan wrote: > Simon Riggs wrote: > > On Wed, 2007-02-21 at ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| On Wed, 2007-02-21 at 13:16 -0500, Andrew Dunstan wrote: > Simon Riggs wrote: > > On Wed, 2007-02-21 at 09:25 -0500, Phil Currier wrote: > > > >> On 2/21/07, Alvaro Herrera <alvherre@commandprompt.com> wrote: > >> > >>> I'd expect the system being able to reoder the columns to the most > >>> efficient order possible (performance-wise and padding-saving-wise), > >>> automatically. When you create a table, sort the columns to the most > >>> efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the > >>> end of the tuple; and anything that requires a rewrite of the table > >>> (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER > >>> do it as well; and do it on TRUNCATE also) again recomputes the most > >>> efficient order. > >>> > >> That's exactly what I'm proposing. On table creation, the system > >> chooses an efficient column order for you. > >> > > > > That's fairly straightforward and beneficial. I much prefer Alvaro's > > approach rather than the storage position details originally described. > > Moreover, you'd need to significantly re-write lots of ALTER TABLE and I > > really don't think you want to go there. > > > > There is a problem: If people do a CREATE TABLE and then issue SELECT * > > they will find the columns in a different order. That could actually > > break some programs, so it isn't acceptable in all cases. e.g. COPY > > without a column-list assumes that the incoming data should be assigned > > to the table columns in the same order as the incoming data file. > > > > You seem to have missed that we will be separating logical from physical > ordering. Each attribute will have a permanent id, a physical ordering > and a logical ordering. You can change either ordering without affecting > the other. I missed nothing, AFAICS. My understanding was that Alvaro was proposing to have just a simple physical re-ordering and that would be altered at CREATE TABLE time. No complexity of multiple column orderings: nice, simple and effective. My only addition was to say: must be optional. > COPY, SELECT and all user-visible commands should follow the logical > ordering, not the physical ordering, which should be completely > invisible to SQL. I agree with comments here about the multiple orderings being a horrible source of bugs, as well as lots of coding even to make it happen at all http://archives.postgresql.org/pgsql...2/msg00859.php -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| Phil Currier wrote: > On 2/21/07, Martijn van Oosterhout <kleptog@svana.org> wrote: >> > don't see any good way to perform an upgrade between PG versions >> > without rewriting each table's data. Maybe most people aren't doing >> > upgrades like this right now, but it seems like it will only become >> > more common in the future. In my opinion, this is more important than >> > #1. >> >> I don't see this either. For all current tables, the storage position >> is the attribute number, no exception. You say: >> >> > because the version X table could >> > have dropped columns that might or might not be present in any given >> > tuple on disk. >> >> Whether they're there or not is irrelevent. Drop columns are not >> necesarily empty, but in any case they occupy a storage position until >> the table is rewritten. A dump/restore doesn't need to preserve this, >> but pg_migrator will need some smarts to handle it. The system will >> need to create a column of the appropriate type and drop it to get to >> the right state. > > I agree, a dump/restore that rewrites all the table datafiles doesn't > need to handle this. And I agree that the system will need to create > dropped columns and then drop them again, that's exactly what I > suggested in fact. We're talking about pg_migrator-style upgrades > only here. Why would a pg_migrator style upgrade use pg_dump at all? I assumed it would rather copy the verbatim data from the old to the new catalog, only changing it if the layout of the tables in pg_catalog actually changed. greetings, Florian Pflug ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| On Wed, 21 Feb 2007, Martijn van Oosterhout wrote: > On Wed, Feb 21, 2007 at 12:06:30PM -0500, Phil Currier wrote: > > Well, for two reasons: > > > > 1) If you have a table with one very-frequently-accessed varchar() > > column and several not-frequently-accessed int columns, it might > > actually make sense to put the varchar column first. The system won't > > always be able to make the most intelligent decision about table > > layout. > > Umm, the point of the exercise is that if you know there are int > columns, then you can skip over them, whereas you can never skip over a > varchar column. So there isn't really any situation where it would be > better to put the varchar first. IIRC, in the first message in this thread, or another recent thread of this type, someone tried a reordering example with alternating smallints and varchar() and found that the leftmost varchar was actually slower to access after reordering, so I'm not sure that we can say there isn't a situation where it would affect things. ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| Simon Riggs wrote: > On Wed, 2007-02-21 at 09:25 -0500, Phil Currier wrote: >> On 2/21/07, Alvaro Herrera <alvherre@commandprompt.com> wrote: >>> I'd expect the system being able to reoder the columns to the most >>> efficient order possible (performance-wise and padding-saving-wise), >>> automatically. When you create a table, sort the columns to the most >>> efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the >>> end of the tuple; and anything that requires a rewrite of the table >>> (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER >>> do it as well; and do it on TRUNCATE also) again recomputes the most >>> efficient order. >> That's exactly what I'm proposing. On table creation, the system >> chooses an efficient column order for you. > > That's fairly straightforward and beneficial. I much prefer Alvaro's > approach rather than the storage position details originally described. > Moreover, you'd need to significantly re-write lots of ALTER TABLE and I > really don't think you want to go there. > > There is a problem: If people do a CREATE TABLE and then issue SELECT * > they will find the columns in a different order. That could actually > break some programs, so it isn't acceptable in all cases. e.g. COPY > without a column-list assumes that the incoming data should be assigned > to the table columns in the same order as the incoming data file. But the display order (and hence the COPY order) of columns would still be determinted by attnum, not by some attstoragepos, no? The column reordering would only apply to the physical storage of columns, not to how it's presented to the user I'd think. The original idea was to add a third column, attdisplaypos, and let the user choose the display ordering independent from the unique id (attnum), which in turn is independent from the storage position. For simplicity, the OP said he omitted the display-position part here, because it's really orthogonal to being able to modify the storage position. greetings, Florian Pflug ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| Simon Riggs wrote: > > I agree with comments here about the multiple orderings being a horrible > source of bugs, as well as lots of coding even to make it happen at all > http://archives.postgresql.org/pgsql...2/msg00859.php > > I thought we were going with this later proposal of Tom's (on which he's convinced me): http://archives.postgresql.org/pgsql...2/msg00983.php - if not I'm totally confused (situation normal). The current thread started with this sentence: > Inspired by this thread [1], and in particular by the idea of storing > three numbers (permanent ID, on-disk storage position, display > position) for each column, I spent a little time messing around with a > prototype implementation of column storage positions to see what kind > of difference it would make. I haven't understood Alvaro to suggest not keeping 3 numbers. cheers andrew ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| Stephan Szabo escribió: > On Wed, 21 Feb 2007, Martijn van Oosterhout wrote: > > > On Wed, Feb 21, 2007 at 12:06:30PM -0500, Phil Currier wrote: > > > Well, for two reasons: > > > > > > 1) If you have a table with one very-frequently-accessed varchar() > > > column and several not-frequently-accessed int columns, it might > > > actually make sense to put the varchar column first. The system won't > > > always be able to make the most intelligent decision about table > > > layout. > > > > Umm, the point of the exercise is that if you know there are int > > columns, then you can skip over them, whereas you can never skip over a > > varchar column. So there isn't really any situation where it would be > > better to put the varchar first. > > IIRC, in the first message in this thread, or another recent thread of > this type, someone tried a reordering example with alternating > smallints and varchar() and found that the leftmost varchar was > actually slower to access after reordering, so I'm not sure that we can > say there isn't a situation where it would affect things. Offsets are cached in tuple accesses, but the caching is obviously disabled for all attributes past any variable-length attribute. So if you put a varlena attr in front, caching is completely disabled for all attrs (but that first one). The automatic reordering algorithm must put all fixed-len attrs at the front, so that their offets (and that of the first variable length attr) can be cached. Did I miss something in what you were trying to say? I assume you must already know this. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| "Florian G. Pflug" <fgp@phlo.org> writes: > Why would a pg_migrator style upgrade use pg_dump at all? I assumed it > would rather copy the verbatim data from the old to the new catalog, > only changing it if the layout of the tables in pg_catalog actually changed. The way pg_migrator works is does a pg_dump to move the schema to the new postgres. Then it transfers the files and drops them into place where the new schema expects to find them. So yes, there would be a use case for specifying the physical column layout when pg_migrator is doing the pg_dump/restore. But pg_migrator could probably just update the physical column numbers itself. It's not like updating system catalog tables directly is any more of an abstraction violation than swapping files out from under the database... -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Alvaro Herrera wrote: >> >> I haven't understood Alvaro to suggest not keeping 3 numbers. >> > > Right, I'm not advocating not doing that -- I'm just saying that the > first step to that could be decoupling physical position with attr id > :-) Logical column ordering (the order in which SELECT * expands to) > seems to me to be a different feature. > > Except in the sense that divorcing the id from the storage order makes it possible to do sanely. :-) Incidentally, I'm sure there would be a full scale revolt if there was a suggestion to alter the visible behaviour of SELECT *, COPY and other commands that rely on the logical ordering (which is currently, and unless we provide commands to alter it would stay as, the definition order). That's the order pg_dump should use IMNSHO - it should never have to worry about the physical order nor about explicitly setting the logical order. cheers andrew ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| On 2/21/07, Gregory Stark <stark@enterprisedb.com> wrote: > So yes, there would be a use case for specifying the physical column layout > when pg_migrator is doing the pg_dump/restore. But pg_migrator could probably > just update the physical column numbers itself. It's not like updating system > catalog tables directly is any more of an abstraction violation than swapping > files out from under the database... If people are ok with that answer, then I'll gladly stop suggesting that ALTER TABLE be able to explicitly set storage positions. I was just trying to avoid forcing a tool like pg_migrator to muck with the system catalogs. phil ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| ||||
| Phil Currier escribió: > On 2/21/07, Gregory Stark <stark@enterprisedb.com> wrote: > >So yes, there would be a use case for specifying the physical column layout > >when pg_migrator is doing the pg_dump/restore. But pg_migrator could > >probably > >just update the physical column numbers itself. It's not like updating > >system > >catalog tables directly is any more of an abstraction violation than > >swapping > >files out from under the database... > > If people are ok with that answer, then I'll gladly stop suggesting > that ALTER TABLE be able to explicitly set storage positions. I was > just trying to avoid forcing a tool like pg_migrator to muck with the > system catalogs. I am ... that would be pg_migrator's goal anyway. And it's certainly going to need knowledge on how to go from one version to the next. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |