Unix Technical Forum

Re: Table clustering idea

This is a discussion on Re: Table clustering idea within the pgsql Hackers forums, part of the PostgreSQL category; --> Dawid, > Other idea than using histogram_bounds would be using the > position of key inside the index to ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-12-2008, 03:11 AM
Luke Lonergan
 
Posts: n/a
Default Re: Table clustering idea

Dawid,

> Other idea than using histogram_bounds would be using the
> position of key inside the index to determine the "ideal"
> place of row inside the table and find the closest free spot
> there. This would be of course much more precise and wouldn't
> rely on statistic.


This is generally known as "index organized tables" in other databases.
It implements a clustered storage of the table, which dramatically
speeds access on the chosen indexed column and makes inserts fast.

This also eases some of the visibility/MVCC implementation issues being
discussed on hackers, but does not help with the maintenance of
non-storage key indices.

Other DBMS have index organized tables that can use either hash or btree
organizations, both of which have their uses. We are planning to
implement btree organized tables sometime - anyone else interested in
this idea?

- Luke


---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-12-2008, 03:11 AM
Josh Berkus
 
Posts: n/a
Default Re: Table clustering idea

Luke,

> Other DBMS have index organized tables that can use either hash or btree
> organizations, both of which have their uses. *We are planning to
> implement btree organized tables sometime - anyone else interested in
> this idea?


Search the archives. It's been dicussed on this list at least twice
before, that I know of.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-12-2008, 03:13 AM
Jim C. Nasby
 
Posts: n/a
Default Re: Table clustering idea

On Sun, Jun 25, 2006 at 08:04:18PM -0400, Luke Lonergan wrote:
> Other DBMS have index organized tables that can use either hash or btree
> organizations, both of which have their uses. We are planning to
> implement btree organized tables sometime - anyone else interested in
> this idea?


I'm curious how you'll do it, as I was once told that actually trying to
store heap data in a btree structure would be a non-starter (don't
remember why).

On a somewhat related note, I think that it would be advantageous if the
FSM had a means to prefer certain pages for a given tuple over other
pages. This would allow for a better way to keep heap and possibly index
data more compacted, and it would also be a means of keeping tables
loosely clustered. It would also make it far easier to shrink heaps that
have become bloated, because the FSM could be told to favor pages at the
beginning of the relation.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-12-2008, 03:13 AM
Luke Lonergan
 
Posts: n/a
Default Re: Table clustering idea

Jim,

On 6/26/06 8:15 PM, "Jim C. Nasby" <jnasby@pervasive.com> wrote:

> On a somewhat related note, I think that it would be advantageous if the
> FSM had a means to prefer certain pages for a given tuple over other
> pages. This would allow for a better way to keep heap and possibly index
> data more compacted, and it would also be a means of keeping tables
> loosely clustered. It would also make it far easier to shrink heaps that
> have become bloated, because the FSM could be told to favor pages at the
> beginning of the relation.


Interesting idea - page affinity implemented using the FSM.

WRT feasibility of BTREE organized tables, I'm not sure I see the problem.
Teradata implemented a hashing filesystem for their heap storage and I've
always wondered about how they handle collision and chaining efficiently,
but it's a solved problem for sure - knowing that makes the challenge that
much easier :-)

- Luke



---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-12-2008, 03:13 AM
Kim Bisgaard
 
Posts: n/a
Default Re: Table clustering idea

Jim C. Nasby wrote:
> On Sun, Jun 25, 2006 at 08:04:18PM -0400, Luke Lonergan wrote:
>
>> Other DBMS have index organized tables that can use either hash or btree
>> organizations, both of which have their uses. We are planning to
>> implement btree organized tables sometime - anyone else interested in
>> this idea?
>>

>
> I'm curious how you'll do it, as I was once told that actually trying to
> store heap data in a btree structure would be a non-starter (don't
> remember why).
>

Ingres is now open source - they have clustering on btree/isam/hash
(it's called "modify table xx to btree on col1,col2")

Regards,
Kim


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-12-2008, 03:14 AM
Jim C. Nasby
 
Posts: n/a
Default Re: Table clustering idea

On Mon, Jun 26, 2006 at 11:31:24PM -0700, Luke Lonergan wrote:
> Jim,
>
> On 6/26/06 8:15 PM, "Jim C. Nasby" <jnasby@pervasive.com> wrote:
>
> > On a somewhat related note, I think that it would be advantageous if the
> > FSM had a means to prefer certain pages for a given tuple over other
> > pages. This would allow for a better way to keep heap and possibly index
> > data more compacted, and it would also be a means of keeping tables
> > loosely clustered. It would also make it far easier to shrink heaps that
> > have become bloated, because the FSM could be told to favor pages at the
> > beginning of the relation.

>
> Interesting idea - page affinity implemented using the FSM.
>
> WRT feasibility of BTREE organized tables, I'm not sure I see the problem.
> Teradata implemented a hashing filesystem for their heap storage and I've
> always wondered about how they handle collision and chaining efficiently,
> but it's a solved problem for sure - knowing that makes the challenge that
> much easier :-)


I know there were discussions in the past, though as per usual I can't
find them in the archives. At one point I had suggested clustering not
on a row level, but on a page level, since it doesn't really matter
terribly if the tuples in a page are clustered (worst case you can scan
the entire page).

I think one of the issues might have been: how will you handle other
indexes on the table when you can no longer point them at an item (since
items will need to move to maintain an IOT).
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-12-2008, 03:14 AM
Csaba Nagy
 
Posts: n/a
Default Re: Table clustering idea

> I think one of the issues might have been: how will you handle other
> indexes on the table when you can no longer point them at an item (since
> items will need to move to maintain an IOT).


I guess you shouldn't allow any other indexes. That's a perfectly
acceptable compromise I think... it would be still very useful for big
and narrow tables which would benefit from being clustered.

The other concern is how would you do sequential scans on the table if
items are allowed to move ? I think some other DBs have a facility to
make a "fast index scan" which is essentially a sequential scan of the
index file, something like that would be needed here too.

Cheers,
Csaba.



---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-12-2008, 03:14 AM
J. Andrew Rogers
 
Posts: n/a
Default Re: Table clustering idea


On Jun 27, 2006, at 9:39 AM, Jim C. Nasby wrote:
> I think one of the issues might have been: how will you handle other
> indexes on the table when you can no longer point them at an item
> (since
> items will need to move to maintain an IOT).



There are clean ways to handle this. The table is organized on the
primary key, a typical requirement for IOTs. Any indexes you add to
IOT reference the primary key of the heap tuple. Since the heap and
PK index are the same thing, external indexes use the PK as the tuple
identifier.

The only caveat is that this creates performance asymmetries. IOTs
have significantly faster access through their primary keys but
slower external index access since two B-Trees have to be traversed.
An IOT is typically only used for tables that are only accessed
through their primary key. Not supporting external indexes on IOTs
is a functional implementation (and probably recommended in
practice), though most real implementations allow external indexes if
not always in their first version.


J. Andrew Rogers
jrogers@neopolitan.com


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-12-2008, 03:14 AM
Josh Berkus
 
Posts: n/a
Default Re: Table clustering idea

Jim,

> I know there were discussions in the past, though as per usual I can't
> find them in the archives.


Search on "B-Tree Organized Tables".

From what I can find, this feature isn't prohibitively useless. It's just a
singnificant amount of effort for a result which is a tradeoff. That is,
you'd *only* want to use it on tables which are *always* accessed by their
primary key.

What stopped the features AFAICT is that the interested parties weren't up to
doing the code.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 09:30 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com