vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Inspired by this thread [1], and in particular by the idea of storing three numbers (permanent ID, on-disk storage position, display position) for each column, I spent a little time messing around with a prototype implementation of column storage positions to see what kind of difference it would make. The results were encouraging: on a table with 20 columns of alternating smallint and varchar(10) datatypes, selecting the max() of one of the rightmost int columns across 1 million rows ran around 3 times faster. The same query on the leftmost varchar column (which should suffer the most from this change) predictably got a little slower (about 10%); I couldn't measure a performance drop on the rightmost varchar columns. The table's size didn't drop much in this case, but a different table of 20 alternating int and smallint columns showed a 20% slimmer disk footprint, pretty much as expected. Pgbenching showed no measurable difference, which isn't surprising since the pgbench test tables consist of just int values with char filler at the end. So here is a proposal for separating a column's storage position from its permanent ID. I've ignored the display position piece of the original thread because display positions don't do much other than save you the hassle of creating a view on top of your table, while storage positions have demonstrable, tangible benefits. And there is no reason to connect the two features; display positions can easily be added separately at a later point. We want to decouple a column's on-disk storage position from its permanent ID for two reasons: to minimize the space lost to alignment padding between fields, and to speed up access to individual fields. The system will automatically assign new storage positions when a table is created, and when a table alteration requires a rewrite (currently just adding a column with a default, or changing a column datatype). To allow users to optimize tables based on the fields they know will be frequently accessed, I think we should extend ALTER TABLE to accept user-assigned storage positions (something like "ALTER TABLE ALTER col SET STORAGE POSITION X"). This command would also be useful for another reason discussed below. In my prototype, I used these rules to determine columns' storage order: 1) fixed-width fields before variable-width, dropped columns always last 2) fixed-width fields ordered by increasing size 3) not-null fields before nullable fields There are other approaches worth considering - for example, you could imagine swapping the priority of rules 2 and 3. Resultant tables would generally have more alignment waste, but would tend to have slightly faster field access. I'm really not sure what the optimal strategy is since every user will have a slightly different metric for "optimal". In any event, either of these approaches is better than the current situation. To implement this, we'll need a field (perhaps attstoragepos?) in pg_attribute to hold the storage position. It will equal attnum until it is explicitly reassigned. The routines in heaptuple.c need to quickly loop through the fields of a tuple in storage order rather than attnum order, so I propose extending TupleDesc to hold an "attrspos" array that sits alongside the attrs array. In the prototype I used an array of int2 indices into the attrs array, ordered by storage position. These changes cause a problem in ExecTypeFromTLInternal: this function calls CreateTemplateTupleDesc followed by TupleDescInitEntry, assuming that attnum == attstoragepos for all tuples. With the introduction of storage positions, this of course will no longer be true. I got around this by having expand_targetlist, build_physical_tlist, and build_relation_tlist make sure each TargetEntry (for targetlists corresponding to either insert/update tuples, or base tuples pulled straight from the heap) gets a correct resorigtbl and resname. Then ExecTypeFromTLInternal first tries calling a new function TupleDescInitEntryAttr, which hands off to TupleDescInitEntry and then performs a syscache lookup to update the storage position using the resorigtbl. This is a little ugly because ExecTypeFromTLInternal doesn't know in advance what kind of tupledesc it's building, so it needs to retreat to the old method whenever the syscache lookup fails, but it was enough to pass the regression tests. I could use some advice on this - there's probably a better way to do it. Another problem relates to upgrades. With tools like pg_migrator now on pgfoundry, people will eventually expect quick upgrades that don't require rewriting each table's data. Storage positions would cause a problem for every version X -> version Y upgrade with Y >= 8.3, even when X is also >= 8.3, because a version X table could always have been altered without a rewrite into a structure different from what Y's CREATE TABLE will choose. I don't think it's as simple as just using the above-mentioned ALTER TABLE extension to assign the proper storage positions for each field, because the version X table could have dropped columns that might or might not be present in any given tuple on disk. The best solution I can see is having pg_dump create a table covering *all* columns (including dropped ones) with explicit storage positions, and then immediately issue an alter statement to get rid of the dropped columns. I'm not thrilled about this approach (in particular, preserving dropped columns across upgrades seems sloppy), but I haven't been able to think of anything better. Hopefully I'm missing a simpler way to do this. Comments and ideas? Does this whole thing seem worthwhile to do? phil [1] http://archives.postgresql.org/pgsql...2/msg00780.php ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| Just as my 2 cents to the proposed idea. I want to demonstrate that the proposed idea is very relevant for the performance. I recently did an migration from PG 8.1 to PG 8.2. During that time I was dumping the 2TB database with several very wide tables (having ~ 200 columns). And I saw that on my pretty powerful server with 8Gb RAM, Itanium2 procesor,large RAID which can do I/O at 100Mb/sec the performance of pg_dump was CPU limited, and the read speed of the tables was 1-1.5mb/sec (leading to 2 week dumping time). I was very surprised by these times, and profiled postgres to check the reason of that: here is the top of gprof: % cumulative self self total time seconds seconds calls s/call s/call name 60.72 13.52 13.52 6769826 0.00 0.00 nocachegetattr 10.58 15.88 2.36 9035566 0.00 0.00 CopyAttributeOutText 7.22 17.49 1.61 65009457 0.00 0.00 CopySendData 6.34 18.90 1.41 1 1.41 22.21 CopyTo So the main slow-down of the process was all this code recomputing the boundaries of the columns.... I checked that by removing one tiny varchar column and COALESCING all NULLs, and after that the performance of pg_dumping increased by more than a factor of 2! I should have reported that experience earlier... but I hope that my observations can be useful in the context of the Phil's idea. regards, Sergey ************************************************** ***************** Sergey E. Koposov Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg Astronomical Institute Tel: +49-6221-528-349 Web: http://lnfm1.sai.msu.ru/~math E-mail: math@sai.msu.ru ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| On Tuesday 20 February 2007 16:07, Phil Currier wrote: > Another problem relates to upgrades. With tools like pg_migrator now > on pgfoundry, people will eventually expect quick upgrades that don't > require rewriting each table's data. Storage positions would cause a > problem for every version X -> version Y upgrade with Y >= 8.3, even > when X is also >= 8.3, because a version X table could always have > been altered without a rewrite into a structure different from what > Y's CREATE TABLE will choose. If you are using pg_migrator your not going to be moving the datafiles on disk anyway,so pg_migrator's behavior shouldnt change terribly. If your doing pg_dump based upgrade, presumably pg_dump could write it's create statements with the columns in attstorpos order and set attnum = attstorpos, preserving the physical layout from the previous install. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| Phil Currier escribió: > Inspired by this thread [1], and in particular by the idea of storing > three numbers (permanent ID, on-disk storage position, display > position) for each column, I spent a little time messing around with a > prototype implementation of column storage positions to see what kind > of difference it would make. The results were encouraging: on a table > with 20 columns of alternating smallint and varchar(10) datatypes, > selecting the max() of one of the rightmost int columns across 1 > million rows ran around 3 times faster. [snipped] I'd expect the system being able to reoder the columns to the most efficient order possible (performance-wise and padding-saving-wise), automatically. When you create a table, sort the columns to the most efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the end of the tuple; and anything that requires a rewrite of the table (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER do it as well; and do it on TRUNCATE also) again recomputes the most efficient order. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| On 2/21/07, Alvaro Herrera <alvherre@commandprompt.com> wrote: > I'd expect the system being able to reoder the columns to the most > efficient order possible (performance-wise and padding-saving-wise), > automatically. When you create a table, sort the columns to the most > efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the > end of the tuple; and anything that requires a rewrite of the table > (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER > do it as well; and do it on TRUNCATE also) again recomputes the most > efficient order. That's exactly what I'm proposing. On table creation, the system chooses an efficient column order for you. The next time an ALTER TABLE operation forces a rewrite, the system would recompute the column storage order. I hadn't thought of having CLUSTER also redo the storage order, but that seems safe since it takes an exclusive lock on the table. I'm less sure about whether it's safe to do this during a TRUNCATE. phil ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Phil Currier wrote: > On 2/21/07, Alvaro Herrera <alvherre@commandprompt.com> wrote: > > I'd expect the system being able to reoder the columns to the most > > efficient order possible (performance-wise and padding-saving-wise), > > automatically. When you create a table, sort the columns to the most > > efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the > > end of the tuple; and anything that requires a rewrite of the table > > (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER > > do it as well; and do it on TRUNCATE also) again recomputes the most > > efficient order. > > That's exactly what I'm proposing. On table creation, the system > chooses an efficient column order for you. The next time an ALTER > TABLE operation forces a rewrite, the system would recompute the > column storage order. I hadn't thought of having CLUSTER also redo > the storage order, but that seems safe since it takes an exclusive > lock on the table. I'm less sure about whether it's safe to do this > during a TRUNCATE. Keep in mind we have a patch in process to reduce the varlena length and reduce alignment requirements, so once that is in, reordering columns will not be as important. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| Phil Currier wrote: > On 2/21/07, Alvaro Herrera <alvherre@commandprompt.com> wrote: >> I'd expect the system being able to reoder the columns to the most >> efficient order possible (performance-wise and padding-saving-wise), >> automatically. When you create a table, sort the columns to the most >> efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the >> end of the tuple; and anything that requires a rewrite of the table >> (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER >> do it as well; and do it on TRUNCATE also) again recomputes the most >> efficient order. > > That's exactly what I'm proposing. On table creation, the system > chooses an efficient column order for you. The next time an ALTER > TABLE operation forces a rewrite, the system would recompute the > column storage order. I hadn't thought of having CLUSTER also redo > the storage order, but that seems safe since it takes an exclusive > lock on the table. I'm less sure about whether it's safe to do this > during a TRUNCATE. I think you'd want to have a flag per field that tell you if the user has overridden the storage pos for that specific field. Otherwise, the next time you have to chance to optimize the ordering, you might throw away changes that the admin has done on purpose. The same hold true for a pg_dump/pg_reload cycle. If none of the fields had their storage order changed manually, you'd want to reoder them optimally at dump/reload time. If, however, the admin specified an ordering, you'd want to preserve that. greetings, Florian Pflug ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| Bruce Momjian escribió: > Phil Currier wrote: > > On 2/21/07, Alvaro Herrera <alvherre@commandprompt.com> wrote: > > > I'd expect the system being able to reoder the columns to the most > > > efficient order possible (performance-wise and padding-saving-wise), > > > automatically. When you create a table, sort the columns to the most > > > efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the > > > end of the tuple; and anything that requires a rewrite of the table > > > (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER > > > do it as well; and do it on TRUNCATE also) again recomputes the most > > > efficient order. > > > > That's exactly what I'm proposing. On table creation, the system > > chooses an efficient column order for you. The next time an ALTER > > TABLE operation forces a rewrite, the system would recompute the > > column storage order. I hadn't thought of having CLUSTER also redo > > the storage order, but that seems safe since it takes an exclusive > > lock on the table. I'm less sure about whether it's safe to do this > > during a TRUNCATE. > > Keep in mind we have a patch in process to reduce the varlena length and > reduce alignment requirements, so once that is in, reordering columns > will not be as important. Yes, but the "cache offset" stuff is still significant, so there will be some benefit in putting all the fixed-length attributes at the start of the tuple, and varlena atts grouped at the end. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| On 2/21/07, Bruce Momjian <bruce@momjian.us> wrote: > Keep in mind we have a patch in process to reduce the varlena length and > reduce alignment requirements, so once that is in, reordering columns > will not be as important. Well, as I understand it, that patch isn't really addressing the same problem. Consider this table: create table foo (a varchar(10), b int, c smallint, d int, e smallint, ....); There are two problems here: 1) On my machine, each int/smallint column pair takes up 8 bytes. 2 of those 8 bytes are alignment padding wasted on the smallint field. If we grouped all the smallint fields together within the tuple, that space would not be lost. 2) Each time you access any of the int/smallint fields, you have to peek inside the varchar field to figure out its length. If we stored the varchar field at the end of the tuple instead, the access times for all the other fields would be measurably improved, by a factor that greatly outweighs the small penalty imposed on the varchar field itself. My understanding is that the varlena headers patch would potentially reduce the size of the varchar header (which is definitely worthwhile by itself), but it wouldn't help much for either of these problems. Or am I misunderstanding what that patch does? phil ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| ||||
| Phil Currier wrote: > On 2/21/07, Bruce Momjian <bruce@momjian.us> wrote: > > Keep in mind we have a patch in process to reduce the varlena length and > > reduce alignment requirements, so once that is in, reordering columns > > will not be as important. > > Well, as I understand it, that patch isn't really addressing the same > problem. Consider this table: > create table foo (a varchar(10), b int, c smallint, d int, e smallint, ....); > > There are two problems here: > > 1) On my machine, each int/smallint column pair takes up 8 bytes. 2 > of those 8 bytes are alignment padding wasted on the smallint field. > If we grouped all the smallint fields together within the tuple, that > space would not be lost. Yes, good point. > 2) Each time you access any of the int/smallint fields, you have to > peek inside the varchar field to figure out its length. If we stored > the varchar field at the end of the tuple instead, the access times > for all the other fields would be measurably improved, by a factor > that greatly outweighs the small penalty imposed on the varchar field > itself. > > My understanding is that the varlena headers patch would potentially > reduce the size of the varchar header (which is definitely worthwhile > by itself), but it wouldn't help much for either of these problems. > Or am I misunderstanding what that patch does? > Agreed. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |