Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

#1 Phil Currier, 04-12-2008, 07:12 AM
Column storage positions

Inspired by this thread [1], and in particular by the idea of storing
three numbers (permanent ID, on-disk storage position, display
position) for each column, I spent a little time messing around with a
prototype implementation of column storage positions to see what kind
of difference it would make. The results were encouraging: on a table
with 20 columns of alternating smallint and varchar(10) datatypes,
selecting the max() of one of the rightmost int columns across 1
million rows ran around 3 times faster. The same query on the
leftmost varchar column (which should suffer the most from this
change) predictably got a little slower (about 10%); I couldn't
measure a performance drop on the rightmost varchar columns. The
table's size didn't drop much in this case, but a different table of
20 alternating int and smallint columns showed a 20% slimmer disk
footprint, pretty much as expected. Pgbenching showed no measurable
difference, which isn't surprising since the pgbench test tables
consist of just int values with char filler at the end.

So here is a proposal for separating a column's storage position from
its permanent ID. I've ignored the display position piece of the
original thread because display positions don't do much other than
save you the hassle of creating a view on top of your table, while
storage positions have demonstrable, tangible benefits. And there is
no reason to connect the two features; display positions can easily be
added separately at a later point.

We want to decouple a column's on-disk storage position from its
permanent ID for two reasons: to minimize the space lost to alignment
padding between fields, and to speed up access to individual fields.
The system will automatically assign new storage positions when a
table is created, and when a table alteration requires a rewrite
(currently just adding a column with a default, or changing a column
datatype). To allow users to optimize tables based on the fields they
know will be frequently accessed, I think we should extend ALTER TABLE
to accept user-assigned storage positions (something like "ALTER TABLE
ALTER col SET STORAGE POSITION X"). This command would also be useful
for another reason discussed below.

In my prototype, I used these rules to determine columns' storage order:
1) fixed-width fields before variable-width, dropped columns always last
2) fixed-width fields ordered by increasing size
3) not-null fields before nullable fields
There are other approaches worth considering - for example, you could
imagine swapping the priority of rules 2 and 3. Resultant tables
would generally have more alignment waste, but would tend to have
slightly faster field access. I'm really not sure what the optimal
strategy is since every user will have a slightly different metric for
"optimal". In any event, either of these approaches is better than
the current situation.
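The three rules can be expressed as a single sort key. The Python sketch below is a hypothetical model, not code from the prototype: `Column`, `storage_order_key`, and `assign_storage_positions` are illustrative names, and `typlen = -1` marks a variable-width column, loosely mirroring how PostgreSQL flags varlena types.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Column:
    name: str
    attnum: int            # permanent ID, 1-based like pg_attribute.attnum
    typlen: int            # width in bytes, or -1 for variable width (varlena)
    not_null: bool = False
    dropped: bool = False


def storage_order_key(col: Column):
    """Sort key for the three ordering rules (hypothetical sketch)."""
    # Rule 1: fixed-width before variable-width; dropped columns always last.
    if col.dropped:
        group = 2
    elif col.typlen > 0:
        group = 0
    else:
        group = 1
    # Rule 2: fixed-width fields by increasing size (varlenas all tie at 0).
    size = col.typlen if col.typlen > 0 else 0
    # Rule 3: NOT NULL fields before nullable ones.
    nullable = 0 if col.not_null else 1
    # Final tie-breaker on attnum keeps the result deterministic.
    return (group, size, nullable, col.attnum)


def assign_storage_positions(cols: List[Column]) -> Dict[int, int]:
    """Map each column's attnum to a 1-based storage position."""
    ordered = sorted(cols, key=storage_order_key)
    return {c.attnum: pos for pos, c in enumerate(ordered, start=1)}


cols = [
    Column("a", 1, -1),                  # varchar(10)
    Column("b", 2, 4),                   # int
    Column("c", 3, 2),                   # smallint
    Column("d", 4, 4, not_null=True),    # int NOT NULL
    Column("e", 5, 2),                   # smallint
    Column("f", 6, 8, dropped=True),     # dropped bigint
]
print(assign_storage_positions(cols))  # {3: 1, 5: 2, 4: 3, 2: 4, 1: 5, 6: 6}
```

Swapping the priority of rules 2 and 3, as suggested above, amounts to swapping `size` and `nullable` in the tuple returned by the key function.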

To implement this, we'll need a field (perhaps attstoragepos?) in
pg_attribute to hold the storage position. It will equal attnum until
it is explicitly reassigned. The routines in heaptuple.c need to
quickly loop through the fields of a tuple in storage order rather
than attnum order, so I propose extending TupleDesc to hold an
"attrspos" array that sits alongside the attrs array. In the
prototype I used an array of int2 indices into the attrs array,
ordered by storage position.
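A rough model of the proposed indirection: `attrs` stays in attnum order, while `attrspos` lists indices into it sorted by storage position, so a tuple walker can visit fields in physical order and still record results by attnum. This Python sketch assumes fixed-width, non-null columns described as (typlen, typalign) pairs; `build_attrspos` and `fixed_offsets` are hypothetical helpers, not functions from heaptuple.c.

```python
from typing import List, Tuple

# (typlen, typalign) for each attribute, in attnum order; fixed-width only.
Attr = Tuple[int, int]


def build_attrspos(attstoragepos: List[int]) -> List[int]:
    """Indices into attrs, ordered by storage position (like the int2 array)."""
    return sorted(range(len(attstoragepos)), key=lambda i: attstoragepos[i])


def align_up(off: int, alignment: int) -> int:
    """Round off up to the next multiple of alignment."""
    return (off + alignment - 1) // alignment * alignment


def fixed_offsets(attrs: List[Attr], attrspos: List[int]) -> List[int]:
    """Byte offset of every attribute, computed by walking the tuple in
    storage order but reported in attnum order."""
    offsets = [0] * len(attrs)
    off = 0
    for i in attrspos:
        typlen, typalign = attrs[i]
        off = align_up(off, typalign)
        offsets[i] = off
        off += typlen
    return offsets


attrs = [(4, 4), (2, 2), (4, 4), (2, 2)]   # int, smallint, int, smallint
attrspos = build_attrspos([3, 1, 4, 2])    # storage positions per attnum
print(attrspos, fixed_offsets(attrs, attrspos))  # [1, 3, 0, 2] [4, 0, 8, 2]
```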

These changes cause a problem in ExecTypeFromTLInternal: this function
calls CreateTemplateTupleDesc followed by TupleDescInitEntry, assuming
that attnum == attstoragepos for all tuples. With the introduction of
storage positions, this of course will no longer be true. I got
around this by having expand_targetlist, build_physical_tlist, and
build_relation_tlist make sure each TargetEntry (for targetlists
corresponding to either insert/update tuples, or base tuples pulled
straight from the heap) gets a correct resorigtbl and resname. Then
ExecTypeFromTLInternal first tries calling a new function
TupleDescInitEntryAttr, which hands off to TupleDescInitEntry and then
performs a syscache lookup to update the storage position using the
resorigtbl. This is a little ugly because ExecTypeFromTLInternal
doesn't know in advance what kind of tupledesc it's building, so it
needs to retreat to the old method whenever the syscache lookup fails,
but it was enough to pass the regression tests. I could use some
advice on this - there's probably a better way to do it.
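To make the fallback concrete, here is a toy Python model of the logic described above (not PostgreSQL code). A plain dict stands in for the syscache, keyed by (table OID, column name); `TargetEntry` keeps only `resno`, `resname`, and `resorigtbl`; and `storage_positions_for_tlist` is a hypothetical name. One assumption worth flagging: the sketch falls back for the whole descriptor when any lookup fails, since mixing looked-up positions with attnum-based ones could assign the same position twice.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Toy stand-in for a syscache: (table OID, column name) -> storage position.
Catalog = Dict[Tuple[int, str], int]


@dataclass
class TargetEntry:
    resno: int                  # output column number, used as attnum
    resname: Optional[str]      # column name, if known
    resorigtbl: Optional[int]   # OID of the originating table, if known


def storage_positions_for_tlist(tlist: List[TargetEntry],
                                catalog: Catalog) -> List[int]:
    """Storage position for each entry, in resno order."""
    positions = []
    for tle in tlist:
        key = (tle.resorigtbl, tle.resname)
        if tle.resorigtbl is None or tle.resname is None or key not in catalog:
            # Lookup failed (e.g. a computed column): retreat to the old
            # assumption that storage position == attnum for every entry.
            return [t.resno for t in tlist]
        positions.append(catalog[key])
    return positions


catalog = {(16384, "a"): 3, (16384, "b"): 1, (16384, "c"): 2}
base = [TargetEntry(1, "a", 16384), TargetEntry(2, "b", 16384),
        TargetEntry(3, "c", 16384)]
computed = [TargetEntry(1, "a", 16384), TargetEntry(2, "?column?", None)]
print(storage_positions_for_tlist(base, catalog))      # [3, 1, 2]
print(storage_positions_for_tlist(computed, catalog))  # [1, 2]
```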

Another problem relates to upgrades. With tools like pg_migrator now
on pgfoundry, people will eventually expect quick upgrades that don't
require rewriting each table's data. Storage positions would cause a
problem for every version X -> version Y upgrade with Y >= 8.3, even
when X is also >= 8.3, because a version X table could always have
been altered without a rewrite into a structure different from what
Y's CREATE TABLE will choose. I don't think it's as simple as just
using the above-mentioned ALTER TABLE extension to assign the proper
storage positions for each field, because the version X table could
have dropped columns that might or might not be present in any given
tuple on disk. The best solution I can see is having pg_dump create a
table covering *all* columns (including dropped ones) with explicit
storage positions, and then immediately issue an alter statement to
get rid of the dropped columns. I'm not thrilled about this approach
(in particular, preserving dropped columns across upgrades seems
sloppy), but I haven't been able to think of anything better.
Hopefully I'm missing a simpler way to do this.
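The dump strategy above can be sketched as a small generator of SQL text. This Python sketch rests on several assumptions: `ALTER TABLE ... ALTER col SET STORAGE POSITION n` is only the syntax floated earlier in this proposal (no released PostgreSQL accepts it), the placeholder name and type used for the dropped column are hypothetical (a real dump would need a type with the dropped column's length and alignment), and `DumpColumn` and `dump_table_ddl` are illustrative names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DumpColumn:
    name: str          # hypothetical placeholder name for dropped columns
    type_sql: str      # SQL type to declare the column with
    storage_pos: int   # storage position in the old cluster
    dropped: bool = False


def dump_table_ddl(table: str, cols: List[DumpColumn]) -> List[str]:
    """Statements that recreate the old physical layout, then drop dead columns.

    cols must be in attnum order and include dropped columns.
    """
    defs = ", ".join(f"{c.name} {c.type_sql}" for c in cols)
    stmts = [f"CREATE TABLE {table} ({defs});"]
    # Pin every column (live or dropped) to its old storage position.
    # NOTE: hypothetical syntax from the proposal, not real PostgreSQL.
    for c in cols:
        stmts.append(f"ALTER TABLE {table} ALTER {c.name} "
                     f"SET STORAGE POSITION {c.storage_pos};")
    # Only now get rid of the columns that were already dropped.
    for c in cols:
        if c.dropped:
            stmts.append(f"ALTER TABLE {table} DROP COLUMN {c.name};")
    return stmts


cols = [
    DumpColumn("a", "varchar(10)", 3),
    DumpColumn("dropped_2", "smallint", 1, dropped=True),
    DumpColumn("c", "int", 2),
]
for stmt in dump_table_ddl("foo", cols):
    print(stmt)
```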

Comments and ideas? Does this whole thing seem worthwhile to do?

phil


[1] http://archives.postgresql.org/pgsql...2/msg00780.php

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

#2 Sergey E. Koposov, 04-12-2008, 07:13 AM
Re: Column storage positions


Just my 2 cents on the proposed idea: I want to show that it is very
relevant for performance.

I recently did a migration from PG 8.1 to PG 8.2. During that time I was
dumping a 2TB database with several very wide tables (~200 columns
each). I saw that on my fairly powerful server (8GB RAM, Itanium2
processor, a large RAID array that can do I/O at 100MB/sec), pg_dump
was CPU-limited, and the read speed of the tables was 1-1.5MB/sec
(leading to a two-week dump time).

I was very surprised by these times, and profiled postgres to find the
reason. Here is the top of the gprof output:

      %   cumulative   self              self     total
     time   seconds   seconds    calls   s/call   s/call  name
     60.72     13.52    13.52   6769826    0.00     0.00  nocachegetattr
     10.58     15.88     2.36   9035566    0.00     0.00  CopyAttributeOutText
      7.22     17.49     1.61  65009457    0.00     0.00  CopySendData
      6.34     18.90     1.41         1    1.41    22.21  CopyTo

So the main slowdown came from the code repeatedly recomputing the
column boundaries (nocachegetattr). I confirmed this by removing one
tiny varchar column and using COALESCE to replace all NULLs, after which
pg_dump ran more than twice as fast!

I should have reported this earlier, but I hope my observations are
useful in the context of Phil's idea.

regards,
Sergey

*******************************************************************
Sergey E. Koposov
Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: math@sai.msu.ru

#3 Robert Treat, 04-12-2008, 07:13 AM
Re: Column storage positions

On Tuesday 20 February 2007 16:07, Phil Currier wrote:
> Another problem relates to upgrades. With tools like pg_migrator now
> on pgfoundry, people will eventually expect quick upgrades that don't
> require rewriting each table's data. Storage positions would cause a
> problem for every version X -> version Y upgrade with Y >= 8.3, even
> when X is also >= 8.3, because a version X table could always have
> been altered without a rewrite into a structure different from what
> Y's CREATE TABLE will choose.


If you are using pg_migrator, you're not going to be moving the data files
on disk anyway, so pg_migrator's behavior shouldn't change terribly. If
you're doing a pg_dump-based upgrade, presumably pg_dump could write its
CREATE statements with the columns in attstoragepos order and set
attnum = attstoragepos, preserving the physical layout from the previous
install.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

#4 Alvaro Herrera, 04-12-2008, 07:13 AM
Re: Column storage positions

Phil Currier wrote:
> Inspired by this thread [1], and in particular by the idea of storing
> three numbers (permanent ID, on-disk storage position, display
> position) for each column, I spent a little time messing around with a
> prototype implementation of column storage positions to see what kind
> of difference it would make. The results were encouraging: on a table
> with 20 columns of alternating smallint and varchar(10) datatypes,
> selecting the max() of one of the rightmost int columns across 1
> million rows ran around 3 times faster.


[snipped]

I'd expect the system to be able to reorder the columns into the most
efficient order possible (performance-wise and padding-saving-wise),
automatically. When you create a table, sort the columns to the most
efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
end of the tuple; and anything that requires a rewrite of the table
(ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
do it as well; and do it on TRUNCATE also) again recomputes the most
efficient order.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#5 Phil Currier, 04-12-2008, 07:13 AM
Re: Column storage positions

On 2/21/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> I'd expect the system to be able to reorder the columns into the most
> efficient order possible (performance-wise and padding-saving-wise),
> automatically. When you create a table, sort the columns to the most
> efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
> end of the tuple; and anything that requires a rewrite of the table
> (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
> do it as well; and do it on TRUNCATE also) again recomputes the most
> efficient order.


That's exactly what I'm proposing. On table creation, the system
chooses an efficient column order for you. The next time an ALTER
TABLE operation forces a rewrite, the system would recompute the
column storage order. I hadn't thought of having CLUSTER also redo
the storage order, but that seems safe since it takes an exclusive
lock on the table. I'm less sure about whether it's safe to do this
during a TRUNCATE.

phil

#6 Bruce Momjian, 04-12-2008, 07:13 AM
Re: Column storage positions

Phil Currier wrote:
> On 2/21/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> > I'd expect the system to be able to reorder the columns into the most
> > efficient order possible (performance-wise and padding-saving-wise),
> > automatically. When you create a table, sort the columns to the most
> > efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
> > end of the tuple; and anything that requires a rewrite of the table
> > (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
> > do it as well; and do it on TRUNCATE also) again recomputes the most
> > efficient order.

>
> That's exactly what I'm proposing. On table creation, the system
> chooses an efficient column order for you. The next time an ALTER
> TABLE operation forces a rewrite, the system would recompute the
> column storage order. I hadn't thought of having CLUSTER also redo
> the storage order, but that seems safe since it takes an exclusive
> lock on the table. I'm less sure about whether it's safe to do this
> during a TRUNCATE.


Keep in mind we have a patch in process to reduce the varlena length and
reduce alignment requirements, so once that is in, reordering columns
will not be as important.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#7 Florian G. Pflug, 04-12-2008, 07:13 AM
Re: Column storage positions

Phil Currier wrote:
> On 2/21/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
>> I'd expect the system to be able to reorder the columns into the most
>> efficient order possible (performance-wise and padding-saving-wise),
>> automatically. When you create a table, sort the columns to the most
>> efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
>> end of the tuple; and anything that requires a rewrite of the table
>> (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
>> do it as well; and do it on TRUNCATE also) again recomputes the most
>> efficient order.

>
> That's exactly what I'm proposing. On table creation, the system
> chooses an efficient column order for you. The next time an ALTER
> TABLE operation forces a rewrite, the system would recompute the
> column storage order. I hadn't thought of having CLUSTER also redo
> the storage order, but that seems safe since it takes an exclusive
> lock on the table. I'm less sure about whether it's safe to do this
> during a TRUNCATE.


I think you'd want a flag per field that tells you whether the user
has overridden the storage position for that specific field. Otherwise,
the next time you have the chance to optimize the ordering, you might
throw away changes that the admin made on purpose. The same holds
true for a dump/reload cycle. If none of the fields had their
storage order changed manually, you'd want to reorder them optimally
at dump/reload time. If, however, the admin specified an ordering, you'd
want to preserve it.
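One way to honor such a per-field flag when the system recomputes the order: leave pinned columns where the admin put them and let the automatic ordering fill only the remaining slots. This is a Python sketch under stated assumptions: the `pinned` flag, `Col`, `auto_key`, and `recompute_positions` are hypothetical names, pinned positions are assumed distinct and within 1..n, and the default ordering (fixed-width first, then increasing size) is a simplified stand-in for the rules proposed earlier.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Col:
    name: str
    typlen: int           # bytes, or -1 for variable width
    storage_pos: int      # current 1-based storage position
    pinned: bool = False  # hypothetical flag: admin set the position explicitly


def auto_key(c: Col):
    """Default ordering: fixed-width first, then by increasing size."""
    return (c.typlen < 0, max(c.typlen, 0))


def recompute_positions(cols: List[Col]) -> Dict[str, int]:
    """Keep pinned columns where the admin put them; reorder the rest."""
    result = {c.name: c.storage_pos for c in cols if c.pinned}
    taken = set(result.values())
    free_slots = [p for p in range(1, len(cols) + 1) if p not in taken]
    # sorted() is stable, so ties keep their current relative order.
    movable = sorted((c for c in cols if not c.pinned), key=auto_key)
    for slot, c in zip(free_slots, movable):
        result[c.name] = slot
    return result


cols = [
    Col("a", -1, 1, pinned=True),  # admin wants the varchar stored first
    Col("b", 4, 2),
    Col("c", 2, 3),
    Col("d", -1, 4),
]
print(recompute_positions(cols))  # {'a': 1, 'c': 2, 'b': 3, 'd': 4}
```

With no pinned fields the whole table is reordered, which matches the dump/reload behavior suggested above.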

greetings, Florian Pflug

#8 Alvaro Herrera, 04-12-2008, 07:13 AM
Re: Column storage positions

Bruce Momjian wrote:
> Phil Currier wrote:
> > On 2/21/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> > > I'd expect the system to be able to reorder the columns into the most
> > > efficient order possible (performance-wise and padding-saving-wise),
> > > automatically. When you create a table, sort the columns to the most
> > > efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
> > > end of the tuple; and anything that requires a rewrite of the table
> > > (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
> > > do it as well; and do it on TRUNCATE also) again recomputes the most
> > > efficient order.

> >
> > That's exactly what I'm proposing. On table creation, the system
> > chooses an efficient column order for you. The next time an ALTER
> > TABLE operation forces a rewrite, the system would recompute the
> > column storage order. I hadn't thought of having CLUSTER also redo
> > the storage order, but that seems safe since it takes an exclusive
> > lock on the table. I'm less sure about whether it's safe to do this
> > during a TRUNCATE.

>
> Keep in mind we have a patch in process to reduce the varlena length and
> reduce alignment requirements, so once that is in, reordering columns
> will not be as important.


Yes, but the "cache offset" stuff is still significant, so there will be
some benefit in putting all the fixed-length attributes at the start of
the tuple, and varlena atts grouped at the end.
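Why fixed-length attributes up front help: a column's byte offset is the same in every tuple only if no variable-width column is stored before it, and PostgreSQL caches such offsets (the attcacheoff field). The Python sketch below is a simplified model that ignores NULL bitmaps and alignment; `fixed_offset_columns` and `length_peeks` are hypothetical helpers, and columns are described only by typlen (-1 for variable width).

```python
from typing import List


def fixed_offset_columns(typlens: List[int]) -> int:
    """How many leading columns sit at the same byte offset in every tuple.

    A column qualifies if no variable-width column is stored before it.
    """
    count = 0
    for typlen in typlens:
        count += 1
        if typlen < 0:
            break  # every later column has a data-dependent offset
    return count


def length_peeks(typlens: List[int], target: int) -> int:
    """Variable-width fields whose length must be read to locate column target."""
    return sum(1 for typlen in typlens[:target] if typlen < 0)


# create table foo (a varchar(10), b int, c smallint, d int, e smallint)
original = [-1, 4, 2, 4, 2]
# Same columns with the varchar moved to the end of the tuple.
reordered = [4, 2, 4, 2, -1]

print(fixed_offset_columns(original), fixed_offset_columns(reordered))  # 1 5
# Column e is index 4 in the original layout, index 3 after reordering.
print(length_peeks(original, 4), length_peeks(reordered, 3))  # 1 0
```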

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#9 Phil Currier, 04-12-2008, 07:13 AM
Re: Column storage positions

On 2/21/07, Bruce Momjian <bruce@momjian.us> wrote:
> Keep in mind we have a patch in process to reduce the varlena length and
> reduce alignment requirements, so once that is in, reordering columns
> will not be as important.


Well, as I understand it, that patch isn't really addressing the same
problem. Consider this table:
create table foo (a varchar(10), b int, c smallint, d int, e smallint, ....);

There are two problems here:

1) On my machine, each int/smallint column pair takes up 8 bytes. 2
of those 8 bytes are alignment padding wasted on the smallint field.
If we grouped all the smallint fields together within the tuple, that
space would not be lost.

2) Each time you access any of the int/smallint fields, you have to
peek inside the varchar field to figure out its length. If we stored
the varchar field at the end of the tuple instead, the access times
for all the other fields would be measurably improved, by a factor
that greatly outweighs the small penalty imposed on the varchar field
itself.
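Point 1 is easy to check with a little arithmetic. The Python sketch below lays out fixed-width columns the way an aligned tuple does, assuming 4-byte alignment for int and 2-byte alignment for smallint (typical, but platform-dependent), and compares ten int/smallint pairs interleaved versus grouped. The helper name `data_bytes` is hypothetical, and the varchar column is left out so only alignment padding is measured.

```python
from typing import List, Tuple

INT4 = (4, 4)  # int: 4 bytes, 4-byte alignment (assumed)
INT2 = (2, 2)  # smallint: 2 bytes, 2-byte alignment (assumed)


def data_bytes(cols: List[Tuple[int, int]]) -> int:
    """Bytes from the start of the data area to the end of the last column,
    including alignment padding between columns (the tuple header and any
    trailing padding are ignored)."""
    off = 0
    for typlen, typalign in cols:
        off = (off + typalign - 1) // typalign * typalign  # pad to alignment
        off += typlen
    return off


interleaved = [INT4, INT2] * 10      # b int, c smallint, d int, e smallint, ...
grouped = [INT4] * 10 + [INT2] * 10  # all ints first, then all smallints

print(data_bytes(interleaved))  # 78
print(data_bytes(grouped))      # 60
```

The 18-byte difference is two bytes of padding after each of the first nine smallints, matching the "2 of those 8 bytes" per pair described above.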

My understanding is that the varlena headers patch would potentially
reduce the size of the varchar header (which is definitely worthwhile
by itself), but it wouldn't help much for either of these problems.
Or am I misunderstanding what that patch does?

phil

#10 Bruce Momjian, 04-12-2008, 07:13 AM
Re: Column storage positions

Phil Currier wrote:
> On 2/21/07, Bruce Momjian <bruce@momjian.us> wrote:
> > Keep in mind we have a patch in process to reduce the varlena length and
> > reduce alignment requirements, so once that is in, reordering columns
> > will not be as important.

>
> Well, as I understand it, that patch isn't really addressing the same
> problem. Consider this table:
> create table foo (a varchar(10), b int, c smallint, d int, e smallint, ....);
>
> There are two problems here:
>
> 1) On my machine, each int/smallint column pair takes up 8 bytes. 2
> of those 8 bytes are alignment padding wasted on the smallint field.
> If we grouped all the smallint fields together within the tuple, that
> space would not be lost.


Yes, good point.

> 2) Each time you access any of the int/smallint fields, you have to
> peek inside the varchar field to figure out its length. If we stored
> the varchar field at the end of the tuple instead, the access times
> for all the other fields would be measurably improved, by a factor
> that greatly outweighs the small penalty imposed on the varchar field
> itself.
>
> My understanding is that the varlena headers patch would potentially
> reduce the size of the varchar header (which is definitely worthwhile
> by itself), but it wouldn't help much for either of these problems.
> Or am I misunderstanding what that patch does?
>


Agreed.

