vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Tom Lane <tgl@sss.pgh.pa.us> writes: > Martijn van Oosterhout <kleptog@svana.org> writes: > > The input functions get it, the output functions (bpcharout, > > bpcharsend, etc) don't. Which makes it kind of hard to print a raw > > value if you don't know how long it's going to be. They used to, but > > that was removed some time back. > Even back then you couldn't rely on the typmod value to be supplied; > it was quite likely to be passed as -1. The issue is not actually > with on-disk storage, it is with function/operator arguments and > results. Those have never been identified any more closely than by > giving a type OID. So for any value that came from a function, > you won't have a typmod, and you'd better be able to find out all > you need to know just by inspecting the value itself. Hence, length > words. Hm, so it could be stored on disk without the length header as long as the length header is added to the in-memory representation? I don't think the type system has hooks for reading and storing data to disk though. > This is all pretty off-topic for pgsql-general, isn't it? [moved to -hackers] -- greg ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| Greg Stark <gsstark@mit.edu> writes: > Hm, so it could be stored on disk without the length header as long as > the length header is added to the in-memory representation? I don't > think the type system has hooks for reading and storing data to disk > though. No, it doesn't, and we'd pay a nonzero price for allowing that. Currently the executor doesn't have to care (much) about whether a tuple is on-disk or in-memory --- the individual datums look the same either way. Allowing them to be different would force a lot of format conversion steps that currently need not happen. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| Tom Lane <tgl@sss.pgh.pa.us> writes: > Greg Stark <gsstark@mit.edu> writes: > > Hm, so it could be stored on disk without the length header as long as > > the length header is added to the in-memory representation? I don't > > think the type system has hooks for reading and storing data to disk > > though. > > No, it doesn't, and we'd pay a nonzero price for allowing that. > Currently the executor doesn't have to care (much) about whether a > tuple is on-disk or in-memory --- the individual datums look the same > either way. Allowing them to be different would force a lot of > format conversion steps that currently need not happen. Is there ever a case where an entire tuple is passed around without knowing the typmod of an attribute in the tuple? The conversion would only really have to happen when the attribute is fetched or stored, not when the tuple is being passed around wholesale. But I have a feeling that would be more intrusive than just making the entire system typmod aware. -- greg ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| On Thu, Jun 29, 2006 at 02:40:15AM -0400, Greg Stark wrote: > > Greg Stark <gsstark@mit.edu> writes: > > No, it doesn't, and we'd pay a nonzero price for allowing that. > > Currently the executor doesn't have to care (much) about whether a > > tuple is on-disk or in-memory --- the individual datums look the same > > either way. Allowing them to be different would force a lot of > > format conversion steps that currently need not happen. > > Is there ever a case where an entire tuple is passed around without knowing > the typmod of an attribute in the tuple? A tuple is just an array of datums, with some header information. The problems come when you don't have a tuple anymore, but only the datum, like in arguments for functions. I think it's more a case that most places that deal with datums simply don't know about typmods. For example, the return type of a function can only be char, not char(16). If you consider the case of a function returning a RAW, the caller will have no way of knowing the typmod, they do know the type though. To be honest, it seems like a lot of work to save the four bytes of overhead for the varlena structure on disk if you're going to need it in memory anyway. And anything like RAW(16) which people want for UUIDs, if it's going to have a lot of functions associated with it, may as well just be a new type. I think time would be much better spent finding a way of allowing user-defined types to be created without using C functions. > The conversion would only really have to happen when the attribute is fetched > or stored, not when the tuple is being passed around wholesale. But I have a > feeling that would be more intrusive than just making the entire system typmod > aware. I'm not sure if tuples are ever passed wholesale very far. The first node to actually do anything with it (any join, expression or condition test) is going to need to deconstruct it. Consider where we currently we have a "Filter Cond" on a "Seq Scan". Currently the filter can access the datums directly on the disk page, with what you're proposing, it can't. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFEo55ZIB7bNG8LQkwRAiqFAJ9IW143Ewq0DElWWLRLeM Wf5hX1ewCcDdOq +PT+U4FjO5mHH3E9i/F93xI= =F52V -----END PGP SIGNATURE----- |
| |||
| Martijn van Oosterhout <kleptog@svana.org> writes: > A tuple is just an array of datums, with some header information. The > problems come when you don't have a tuple anymore, but only the datum, > like in arguments for functions. > > I think it's more a case that most places that deal with datums simply > don't know about typmods. For example, the return type of a function > can only be char, not char(16). If you consider the case of a function > returning a RAW, the caller will have no way of knowing the typmod, > they do know the type though. > > To be honest, it seems like a lot of work to save the four bytes of > overhead for the varlena structure on disk if you're going to need it > in memory anyway. And anything like RAW(16) which people want for > UUIDs, if it's going to have a lot of functions associated with it, may > as well just be a new type. For large databases storage density leads directly to speed. Saving four bytes of overhead on a 16-byte data structure would mean a 20% speed increase. Even if that's only helpful on a tenth of the columns you're still talking about a 2% speed increase for all queries on the table. A lot of databases use CHAR(1) for flags. The overhead is even worse there. > Consider where we currently we have a "Filter Cond" on a "Seq Scan". > Currently the filter can access the datums directly on the disk page, with > what you're proposing, it can't. Well it only can't if the data type has conversion functions. I'm not sure how complex it would be having pointers that *getattr sometimes return pointers to the disk page and sometimes return pointers to a palloced copy though. -- greg ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| Greg Stark wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > >> To be honest, it seems like a lot of work to save the four bytes of >> overhead for the varlena structure on disk if you're going to need it >> in memory anyway. And anything like RAW(16) which people want for >> UUIDs, if it's going to have a lot of functions associated with it, may >> as well just be a new type. >> > > For large databases storage density leads directly to speed. Saving four bytes > of overhead on a 16-byte data structure would mean a 20% speed increase. Even > if that's only helpful on a tenth of the columns you're still talking about a > 2% speed increase for all queries on the table. A lot of databases use CHAR(1) > for flags. The overhead is even worse there. > > I have to concur with this. Assume you use a bytea for a UUID that in turn is used as a primary key. The extra overhead will be reflected in all indexes, all foreign keys, etc. In a normalized database some tables may consist of UUID columns only. Regards, Thomas Hallgren ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| On Thu, Jun 29, 2006 at 03:54:36PM +0200, Thomas Hallgren wrote: > I have to concur with this. Assume you use a bytea for a UUID that in > turn is used as a primary key. The extra overhead will be reflected in > all indexes, all foreign keys, etc. In a normalized database some tables > may consist of UUID columns only. So you create a UUID type. It's cheap enough to create new types after all, that's one of postgresql's strengths. What I'm saying is that it's easier to create new fixed length types for the cases that need it, than it is to redo the entire type handling of the backend. And for people that want char(1), they should be using "char", which really is one byte (ex padding ofcourse). Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFEo+ywIB7bNG8LQkwRAvE+AJ9gE10+L8Qskh2OF67Rut GapzCXmQCfVGRH 9wFKXllofQ+KsKDbw7LV7Rw= =z9K7 -----END PGP SIGNATURE----- |
| |||
| Martijn van Oosterhout <kleptog@svana.org> writes: > On Thu, Jun 29, 2006 at 03:54:36PM +0200, Thomas Hallgren wrote: > > I have to concur with this. Assume you use a bytea for a UUID that in > > turn is used as a primary key. The extra overhead will be reflected in > > all indexes, all foreign keys, etc. In a normalized database some tables > > may consist of UUID columns only. > > So you create a UUID type. It's cheap enough to create new types after > all, that's one of postgresql's strengths. What I'm saying is that it's > easier to create new fixed length types for the cases that need it, > than it is to redo the entire type handling of the backend. I guess my motivation here is that I feel currently char(n) is basically broken in Postgres. Sure it satisfies the letter of the specification, but it's failing to actually achieve anything for the users. There's no point at all in using char(n) in Postgres since it takes exactly the same amount of space as varchar() if you're always stuffing it full and more space if you're not. In the current setup the only reason for Postgres to have this data type at all is purely for legacy compatibility. It doesn't actually "work" in that it doesn't provide the space savings it's intended to and that would give users an actual reason to use it in new databases. -- greg ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Greg Stark <gsstark@mit.edu> writes: > In the current setup the only reason for Postgres to have this data type at > all is purely for legacy compatibility. Yes. So? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| ||||
| Martijn van Oosterhout wrote: > On Thu, Jun 29, 2006 at 03:54:36PM +0200, Thomas Hallgren wrote: > >> I have to concur with this. Assume you use a bytea for a UUID that in >> turn is used as a primary key. The extra overhead will be reflected in >> all indexes, all foreign keys, etc. In a normalized database some tables >> may consist of UUID columns only. >> > > So you create a UUID type. It's cheap enough to create new types after > all, that's one of postgresql's strengths. It would be a whole lot easier if I could use a domain. > What I'm saying is that it's > easier to create new fixed length types for the cases that need it, > than it is to redo the entire type handling of the backend. > > Of course. But it's a matter of who does what. Your reasoning push the burden to the users. Regards, Thomas Hallgren ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |