vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I am working on an issue that I deal with a lot, there is of course a standard answer, but maybe it is something to think about for PostgreSQL 9.0 or something. I think I finally understand what I have been fighting for a number of years. When I have been grousing about postgresql configuration, this has been what I have been fighting. One of the problems with the current PostgreSQL design is that all the databases operated by one postmaster server process are interlinked at some core level. They all share the same system tables. If one database becomes corrupt because of disk or something, the whole cluster is affected. If one db is REALLY REALLY huge and doesn't change, and a few others are small and change often, pg_dumpall will spend most of its time dumping the unchanging data. Now, the answer, obviously, is to create multiple postgresql database clusters and run postmaster for each logical group of databases, right? That really is a fine idea, but.... Say, in pgsql, I do this: "\c newdb" It will only find the database that I have in that logical group. If another postmaster is running, obviously, psql doesn't know anything about it. From the DB admin perspective, maybe there should be some heirarchical structure to this. What if there were a program, maybe a special parent "postmaster" process, I don't know, that started a list of child postmasters based on some site config? The parent postmaster would hold all the configuration parameters of the child postmaster processes, so there would only be on postgresql.conf. This also answers "how do we get postgresql options in a database," because the parent postmaster only needs to bootstrap the others, it can be configured to run lean and mean, and the "real" settings can be inspected and changed at will. A trigger will send a HUP to child postmasters when their settings change. The parent postmaster only needs one connection for each child and one admin, right? Does anyone see this as useful? ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| On Thu, 2006-02-02 at 10:23 -0500, Mark Woodward wrote: > If one db is REALLY REALLY huge and doesn't change, and a few > others are small and change often, pg_dumpall will spend most of its time > dumping the unchanging data. > My usual backup strategy does pg_dumpall -g to get the (tiny) global data, and then pg_dump for each individual database. Quite apart from anything else I prefer to have custom format dumps anyway, but I think this should meet your need for less frequent dumping of some constant database. cheers andrew ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| "Mark Woodward" <pgsql@mohawksoft.com> writes: > One of the problems with the current PostgreSQL design is that all the > databases operated by one postmaster server process are interlinked at > some core level. They all share the same system tables. If one database > becomes corrupt because of disk or something, the whole cluster is > affected. This problem is not as large as you paint it, because most of the system catalogs are *not* shared. > Does anyone see this as useful? No... regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| > "Mark Woodward" <pgsql@mohawksoft.com> writes: >> One of the problems with the current PostgreSQL design is that all the >> databases operated by one postmaster server process are interlinked at >> some core level. They all share the same system tables. If one database >> becomes corrupt because of disk or something, the whole cluster is >> affected. > > This problem is not as large as you paint it, because most of the system > catalogs are *not* shared. > >> Does anyone see this as useful? Seriously? No use at all? You don't see any purpose in controlling and managing multiple postgresql postmaster processes from one central point? Sure you don't want to think about this a little? ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Mark Woodward wrote: >>"Mark Woodward" <pgsql@mohawksoft.com> writes: >> >>>One of the problems with the current PostgreSQL design is that all the >>>databases operated by one postmaster server process are interlinked at >>>some core level. They all share the same system tables. If one database >>>becomes corrupt because of disk or something, the whole cluster is >>>affected. >> >>This problem is not as large as you paint it, because most of the system >>catalogs are *not* shared. >> >> >>>Does anyone see this as useful? > > > Seriously? No use at all? You don't see any purpose in controlling and > managing multiple postgresql postmaster processes from one central point? pgAdmin does so. IMHO it's totally sufficient to handle this on a client side level. Regards, Andreas ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Mark Woodward wrote: > Seriously? No use at all? You don't see any purpose in controlling and > managing multiple postgresql postmaster processes from one central point? I'd rather spend effort in fixing the problems that arise from big databases; for example Hannu's patch for concurrent vacuum attacks one of the problems that IMHO are important. More elaborate partitioning does too. Anyway, if you're very excited about it, I don't think it's impossible to code a super-postmaster that would redirect a client to the real postmaster. I even think it can be done without modifying the regular postmaster. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| On Thu, 2 Feb 2006, Mark Woodward wrote: > Now, the answer, obviously, is to create multiple postgresql database > clusters and run postmaster for each logical group of databases, right? > That really is a fine idea, but.... > > Say, in pgsql, I do this: "\c newdb" It will only find the database that I > have in that logical group. If another postmaster is running, obviously, > psql doesn't know anything about it. > >From the DB admin perspective, maybe there should be some heirarchical > structure to this. What if there were a program, maybe a special parent > "postmaster" process, I don't know, that started a list of child > postmasters based on some site config? The parent postmaster would hold > all the configuration parameters of the child postmaster processes, so > there would only be on postgresql.conf. > > This also answers "how do we get postgresql options in a database," > because the parent postmaster only needs to bootstrap the others, it can > be configured to run lean and mean, and the "real" settings can be > inspected and changed at will. A trigger will send a HUP to child > postmasters when their settings change. The parent postmaster only needs > one connection for each child and one admin, right? > > Does anyone see this as useful? Not as described above, no. Perhaps with a more concrete plan that actually talks about these things in more details. For example, you posit the \c thing as an issue, I don't personally agree, but you also don't address it with a solution. ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| > On Thu, 2 Feb 2006, Mark Woodward wrote: > >> Now, the answer, obviously, is to create multiple postgresql database >> clusters and run postmaster for each logical group of databases, right? >> That really is a fine idea, but.... >> >> Say, in pgsql, I do this: "\c newdb" It will only find the database that >> I >> have in that logical group. If another postmaster is running, obviously, >> psql doesn't know anything about it. > >> >From the DB admin perspective, maybe there should be some heirarchical >> structure to this. What if there were a program, maybe a special parent >> "postmaster" process, I don't know, that started a list of child >> postmasters based on some site config? The parent postmaster would hold >> all the configuration parameters of the child postmaster processes, so >> there would only be on postgresql.conf. > >> >> This also answers "how do we get postgresql options in a database," >> because the parent postmaster only needs to bootstrap the others, it can >> be configured to run lean and mean, and the "real" settings can be >> inspected and changed at will. A trigger will send a HUP to child >> postmasters when their settings change. The parent postmaster only needs >> one connection for each child and one admin, right? >> >> Does anyone see this as useful? > > Not as described above, no. Perhaps with a more concrete plan that > actually talks about these things in more details. For example, you posit > the \c thing as an issue, I don't personally agree, but you also don't > address it with a solution. While I understand that it is quite a vague suggestion, I guess I was brainstorming more than detailing an actual set of features. My issue is this, (and this is NOT a slam on PostgreSQL), I have a number of physical databases on one machine on ports 5432, 5433, 5434. All running the same version and in fact, installation of PostgreSQL. Even though they run on the same machine, run the same version of the software, and are used by the same applications, they have NO interoperability. For now, lets just accept that they need to be on separate physical clusters because some need to be able to started and stopped while others need to remain running, there are other reasons, but one reason will suffice for the discussion. From an administration perspective, a single point of admin would seem like a logical and valuable objective, no? Beyond just the admin advanatges, the utilities could be modified to handle a root server that redirects to child servers. The psql program, when handling a "\c" command, queries the root server to find the child server and then connects to that. libpq could also be modified to handle this without changing the applications. The child postmasters will query the root postmaster when a DB is created and deleted to keep it up to date. Conflicts between two children can be managed by either some sort of first come first serve or disallow creating of a duplicate name, or some other method. So, conn = connect("host=localhost dbname=mydb"); Will connect to the root server, find the actual server, and then connect to it, completely hiding the different physical databases, and creating one very large logical install. Perhaps this can even be written to include large scale clustering. Who knows? ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| On Thu, Feb 02, 2006 at 02:05:03PM -0500, Mark Woodward wrote: > My issue is this, (and this is NOT a slam on PostgreSQL), I have a number > of physical databases on one machine on ports 5432, 5433, 5434. All > running the same version and in fact, installation of PostgreSQL. One way of acheiving this would be to allow the PGHOST and/or PGPORT variables to be lists and when you connect it tries each combination until it finds on that works. Maybe not as clean but a lot easier to implement. Unless ofcourse you want "psql -l" to list all databases in all clusters... I think it would be better to put the intelligence into libpq rather than trying to create more servers... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFD4lqOIB7bNG8LQkwRAja/AJoCTdTbJf7v6E1IvXq024hildpRNQCdEVN/ fMUh3ZNPNRUBYI+C0GLwsf4= =12Tq -----END PGP SIGNATURE----- |
| ||||
| Mark, > Even though they run on the same machine, run the same version of the > software, and are used by the same applications, they have NO > interoperability. For now, lets just accept that they need to be on > separate physical clusters because some need to be able to started and > stopped while others need to remain running, there are other reasons, > but one reason will suffice for the discussion. Well, to answer your original question, I personally would not see your general idea as useful at all. I admin 9 or 10 PostgreSQL servers currently and have never run across a need, or even a desire, to do what you are doing. In fact, if there's any general demand, it's to go the opposite way: patches to lock down the system tables and prevent switching databases to support ISPs and other shared-hosting situations. For an immediate solution to what you are encountering, have you looked at pgPool? -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |