Unix Technical Forum

Column storage positions

This is a discussion on Column storage positions within the pgsql Hackers forums, part of the PostgreSQL category; --> On Wed, 2007-02-21 at 13:16 -0500, Andrew Dunstan wrote: > Simon Riggs wrote: > > On Wed, 2007-02-21 at ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #21 (permalink)  
Old 04-12-2008, 07:14 AM
Simon Riggs
 
Posts: n/a
Default Re: Column storage positions

On Wed, 2007-02-21 at 13:16 -0500, Andrew Dunstan wrote:
> Simon Riggs wrote:
> > On Wed, 2007-02-21 at 09:25 -0500, Phil Currier wrote:
> >
> >> On 2/21/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> >>
> >>> I'd expect the system being able to reoder the columns to the most
> >>> efficient order possible (performance-wise and padding-saving-wise),
> >>> automatically. When you create a table, sort the columns to the most
> >>> efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
> >>> end of the tuple; and anything that requires a rewrite of the table
> >>> (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
> >>> do it as well; and do it on TRUNCATE also) again recomputes the most
> >>> efficient order.
> >>>
> >> That's exactly what I'm proposing. On table creation, the system
> >> chooses an efficient column order for you.
> >>

> >
> > That's fairly straightforward and beneficial. I much prefer Alvaro's
> > approach rather than the storage position details originally described.
> > Moreover, you'd need to significantly re-write lots of ALTER TABLE and I
> > really don't think you want to go there.
> >
> > There is a problem: If people do a CREATE TABLE and then issue SELECT *
> > they will find the columns in a different order. That could actually
> > break some programs, so it isn't acceptable in all cases. e.g. COPY
> > without a column-list assumes that the incoming data should be assigned
> > to the table columns in the same order as the incoming data file.
> >

>
> You seem to have missed that we will be separating logical from physical
> ordering. Each attribute will have a permanent id, a physical ordering
> and a logical ordering. You can change either ordering without affecting
> the other.


I missed nothing, AFAICS. My understanding was that Alvaro was proposing
to have just a simple physical re-ordering and that would be altered at
CREATE TABLE time. No complexity of multiple column orderings: nice,
simple and effective. My only addition was to say: must be optional.

> COPY, SELECT and all user-visible commands should follow the logical
> ordering, not the physical ordering, which should be completely
> invisible to SQL.


I agree with comments here about the multiple orderings being a horrible
source of bugs, as well as lots of coding even to make it happen at all
http://archives.postgresql.org/pgsql...2/msg00859.php

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com



---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #22 (permalink)  
Old 04-12-2008, 07:14 AM
Florian G. Pflug
 
Posts: n/a
Default Re: Column storage positions

Phil Currier wrote:
> On 2/21/07, Martijn van Oosterhout <kleptog@svana.org> wrote:
>> > don't see any good way to perform an upgrade between PG versions
>> > without rewriting each table's data. Maybe most people aren't doing
>> > upgrades like this right now, but it seems like it will only become
>> > more common in the future. In my opinion, this is more important than
>> > #1.

>>
>> I don't see this either. For all current tables, the storage position
>> is the attribute number, no exception. You say:
>>
>> > because the version X table could
>> > have dropped columns that might or might not be present in any given
>> > tuple on disk.

>>
>> Whether they're there or not is irrelevent. Drop columns are not
>> necesarily empty, but in any case they occupy a storage position until
>> the table is rewritten. A dump/restore doesn't need to preserve this,
>> but pg_migrator will need some smarts to handle it. The system will
>> need to create a column of the appropriate type and drop it to get to
>> the right state.

>
> I agree, a dump/restore that rewrites all the table datafiles doesn't
> need to handle this. And I agree that the system will need to create
> dropped columns and then drop them again, that's exactly what I
> suggested in fact. We're talking about pg_migrator-style upgrades
> only here.


Why would a pg_migrator style upgrade use pg_dump at all? I assumed it
would rather copy the verbatim data from the old to the new catalog,
only changing it if the layout of the tables in pg_catalog actually
changed.

greetings, Florian Pflug

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #23 (permalink)  
Old 04-12-2008, 07:14 AM
Stephan Szabo
 
Posts: n/a
Default Re: Column storage positions

On Wed, 21 Feb 2007, Martijn van Oosterhout wrote:

> On Wed, Feb 21, 2007 at 12:06:30PM -0500, Phil Currier wrote:
> > Well, for two reasons:
> >
> > 1) If you have a table with one very-frequently-accessed varchar()
> > column and several not-frequently-accessed int columns, it might
> > actually make sense to put the varchar column first. The system won't
> > always be able to make the most intelligent decision about table
> > layout.

>
> Umm, the point of the exercise is that if you know there are int
> columns, then you can skip over them, whereas you can never skip over a
> varchar column. So there isn't really any situation where it would be
> better to put the varchar first.


IIRC, in the first message in this thread, or another recent thread of
this type, someone tried a reordering example with alternating
smallints and varchar() and found that the leftmost varchar was
actually slower to access after reordering, so I'm not sure that we can
say there isn't a situation where it would affect things.

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #24 (permalink)  
Old 04-12-2008, 07:14 AM
Florian G. Pflug
 
Posts: n/a
Default Re: Column storage positions

Simon Riggs wrote:
> On Wed, 2007-02-21 at 09:25 -0500, Phil Currier wrote:
>> On 2/21/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
>>> I'd expect the system being able to reoder the columns to the most
>>> efficient order possible (performance-wise and padding-saving-wise),
>>> automatically. When you create a table, sort the columns to the most
>>> efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
>>> end of the tuple; and anything that requires a rewrite of the table
>>> (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
>>> do it as well; and do it on TRUNCATE also) again recomputes the most
>>> efficient order.

>> That's exactly what I'm proposing. On table creation, the system
>> chooses an efficient column order for you.

>
> That's fairly straightforward and beneficial. I much prefer Alvaro's
> approach rather than the storage position details originally described.
> Moreover, you'd need to significantly re-write lots of ALTER TABLE and I
> really don't think you want to go there.
>
> There is a problem: If people do a CREATE TABLE and then issue SELECT *
> they will find the columns in a different order. That could actually
> break some programs, so it isn't acceptable in all cases. e.g. COPY
> without a column-list assumes that the incoming data should be assigned
> to the table columns in the same order as the incoming data file.


But the display order (and hence the COPY order) of columns would still
be determinted by attnum, not by some attstoragepos, no?
The column reordering would only apply to the physical storage of
columns, not to how it's presented to the user I'd think.

The original idea was to add a third column, attdisplaypos, and let the
user choose the display ordering independent from the unique id
(attnum), which in turn is independent from the storage position.

For simplicity, the OP said he omitted the display-position part here,
because it's really orthogonal to being able to modify the storage position.

greetings, Florian Pflug

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #25 (permalink)  
Old 04-12-2008, 07:14 AM
Andrew Dunstan
 
Posts: n/a
Default Re: Column storage positions

Simon Riggs wrote:
>
> I agree with comments here about the multiple orderings being a horrible
> source of bugs, as well as lots of coding even to make it happen at all
> http://archives.postgresql.org/pgsql...2/msg00859.php
>
>


I thought we were going with this later proposal of Tom's (on which he's
convinced me):
http://archives.postgresql.org/pgsql...2/msg00983.php - if
not I'm totally confused (situation normal). The current thread started
with this sentence:

> Inspired by this thread [1], and in particular by the idea of storing
> three numbers (permanent ID, on-disk storage position, display
> position) for each column, I spent a little time messing around with a
> prototype implementation of column storage positions to see what kind
> of difference it would make.


I haven't understood Alvaro to suggest not keeping 3 numbers.

cheers



andrew

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #26 (permalink)  
Old 04-12-2008, 07:14 AM
Alvaro Herrera
 
Posts: n/a
Default Re: Column storage positions

Stephan Szabo escribió:
> On Wed, 21 Feb 2007, Martijn van Oosterhout wrote:
>
> > On Wed, Feb 21, 2007 at 12:06:30PM -0500, Phil Currier wrote:
> > > Well, for two reasons:
> > >
> > > 1) If you have a table with one very-frequently-accessed varchar()
> > > column and several not-frequently-accessed int columns, it might
> > > actually make sense to put the varchar column first. The system won't
> > > always be able to make the most intelligent decision about table
> > > layout.

> >
> > Umm, the point of the exercise is that if you know there are int
> > columns, then you can skip over them, whereas you can never skip over a
> > varchar column. So there isn't really any situation where it would be
> > better to put the varchar first.

>
> IIRC, in the first message in this thread, or another recent thread of
> this type, someone tried a reordering example with alternating
> smallints and varchar() and found that the leftmost varchar was
> actually slower to access after reordering, so I'm not sure that we can
> say there isn't a situation where it would affect things.


Offsets are cached in tuple accesses, but the caching is obviously
disabled for all attributes past any variable-length attribute. So if
you put a varlena attr in front, caching is completely disabled for all
attrs (but that first one). The automatic reordering algorithm must put
all fixed-len attrs at the front, so that their offets (and that of the
first variable length attr) can be cached.

Did I miss something in what you were trying to say? I assume you must
already know this.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #27 (permalink)  
Old 04-12-2008, 07:14 AM
Gregory Stark
 
Posts: n/a
Default Re: Column storage positions


"Florian G. Pflug" <fgp@phlo.org> writes:

> Why would a pg_migrator style upgrade use pg_dump at all? I assumed it
> would rather copy the verbatim data from the old to the new catalog,
> only changing it if the layout of the tables in pg_catalog actually changed.


The way pg_migrator works is does a pg_dump to move the schema to the new
postgres. Then it transfers the files and drops them into place where the new
schema expects to find them.

So yes, there would be a use case for specifying the physical column layout
when pg_migrator is doing the pg_dump/restore. But pg_migrator could probably
just update the physical column numbers itself. It's not like updating system
catalog tables directly is any more of an abstraction violation than swapping
files out from under the database...

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #28 (permalink)  
Old 04-12-2008, 07:14 AM
Andrew Dunstan
 
Posts: n/a
Default Re: Column storage positions



Alvaro Herrera wrote:
>>
>> I haven't understood Alvaro to suggest not keeping 3 numbers.
>>

>
> Right, I'm not advocating not doing that -- I'm just saying that the
> first step to that could be decoupling physical position with attr id
> :-) Logical column ordering (the order in which SELECT * expands to)
> seems to me to be a different feature.
>
>


Except in the sense that divorcing the id from the storage order makes
it possible to do sanely. :-)

Incidentally, I'm sure there would be a full scale revolt if there was a
suggestion to alter the visible behaviour of SELECT *, COPY and other
commands that rely on the logical ordering (which is currently, and
unless we provide commands to alter it would stay as, the definition
order). That's the order pg_dump should use IMNSHO - it should never
have to worry about the physical order nor about explicitly setting the
logical order.

cheers

andrew



---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #29 (permalink)  
Old 04-12-2008, 07:14 AM
Phil Currier
 
Posts: n/a
Default Re: Column storage positions

On 2/21/07, Gregory Stark <stark@enterprisedb.com> wrote:
> So yes, there would be a use case for specifying the physical column layout
> when pg_migrator is doing the pg_dump/restore. But pg_migrator could probably
> just update the physical column numbers itself. It's not like updating system
> catalog tables directly is any more of an abstraction violation than swapping
> files out from under the database...


If people are ok with that answer, then I'll gladly stop suggesting
that ALTER TABLE be able to explicitly set storage positions. I was
just trying to avoid forcing a tool like pg_migrator to muck with the
system catalogs.

phil

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #30 (permalink)  
Old 04-12-2008, 07:14 AM
Alvaro Herrera
 
Posts: n/a
Default Re: Column storage positions

Phil Currier escribió:
> On 2/21/07, Gregory Stark <stark@enterprisedb.com> wrote:
> >So yes, there would be a use case for specifying the physical column layout
> >when pg_migrator is doing the pg_dump/restore. But pg_migrator could
> >probably
> >just update the physical column numbers itself. It's not like updating
> >system
> >catalog tables directly is any more of an abstraction violation than
> >swapping
> >files out from under the database...

>
> If people are ok with that answer, then I'll gladly stop suggesting
> that ALTER TABLE be able to explicitly set storage positions. I was
> just trying to avoid forcing a tool like pg_migrator to muck with the
> system catalogs.


I am ... that would be pg_migrator's goal anyway. And it's certainly
going to need knowledge on how to go from one version to the next.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 05:08 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com