Hi,

Is there any reason why PostgreSQL replication solutions are all add-on 3rd party ones? Is there any reason why replication couldn't be implemented using triggers and a handful of stored procedures?

This is what I have in mind:

Have a plperl function that creates connections to all servers in the cluster (replication partners), and issues the supplied write query to them, possibly with a tag of some sort to indicate that it is a replicated query (to prevent circular replication).

Have a post-execute trigger that calls the above replication function if the query was issued directly (as opposed to replicated), and passes it the query it just executed if it was successful.

If the replication failed on any node, the whole thing gets rolled back.

This would effectively give star topology synchronous replication with very little effort, and no need for any external code. Am I missing something obvious that would prevent this from working? It would give replication capabilities better than MySQL's (which can only handle ring-based multi-master replication) for the sake of about 100 lines of code. None of the required functionality is new to PostgreSQL, either.

Is there an existing implementation of this? Perhaps a Perl program that creates the required triggers and stored procedures from looking at a schema?

Thanks.

Gordan

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
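Here is a minimal Python sketch of the proposed star topology. It assumes an in-memory `Node` class as a hypothetical stand-in for a PostgreSQL server and its plperl/DBI connection, so no real database is involved. The origin applies the write and forwards it to every peer tagged as replicated, so peers never forward it again. If any peer fails, everything is rolled back. The final commit loop here is not atomic across nodes. A real implementation would need two-phase commit (PostgreSQL's PREPARE TRANSACTION) for the all-or-nothing behaviour to survive a failure during commit.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory stand-in for one PostgreSQL server."""
    name: str
    reachable: bool = True
    rows: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)

    def apply(self, key, value, replicated):
        """Stage a write; return True if this node should fan it out further."""
        if not self.reachable:
            raise ConnectionError(f"{self.name} unreachable")
        self.pending[key] = value
        # Only directly issued writes are forwarded; the tag stops circular replication.
        return not replicated

    def commit(self):
        self.rows.update(self.pending)
        self.pending.clear()

    def rollback(self):
        self.pending.clear()


def replicated_write(origin, peers, key, value):
    """Apply a write on origin, push it to every peer, roll back everywhere on failure."""
    touched = [origin]
    try:
        if origin.apply(key, value, replicated=False):
            for peer in peers:
                touched.append(peer)
                peer.apply(key, value, replicated=True)
    except ConnectionError:
        for node in touched:
            node.rollback()
        return False
    # Not two-phase: a crash between these commits would leave nodes inconsistent.
    for node in touched:
        node.commit()
    return True
```

If you mark one peer unreachable, every later `replicated_write` returns False and leaves all nodes unchanged. That is exactly the brittleness raised in the first reply below.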
On Jan 18, 2008, at 9:21 AM, gordan@bobich.net wrote:

> Hi,
>
> Is there any reason why PostgreSQL replication solutions are all add-on 3rd party ones?

Because no one solution would be appropriate for everyone. The core team and contributors feel that their time is better spent on the database itself rather than developing and maintaining multiple different replication solutions and dealing with the support thereof. What has been done is to add some extra hooks in 8.3 for replication triggers that can help to specialize when/if a given trigger fires.

> Is there any reason why replication couldn't be implemented using triggers and a handful of stored procedures?

That's usually how it's done. Well, plus some external user-land application libraries.

> This is what I have in mind:
>
> Have a plperl function that creates connections to all servers in the cluster (replication partners), and issues the supplied write query to them, possibly with a tag of some sort to indicate that it is a replicated query (to prevent circular replication).
>
> Have a post-execute trigger that calls the above replication function if the query was issued directly (as opposed to replicated), and passes it the query it just executed if it was successful.
>
> If the replication failed on any node, the whole thing gets rolled back.

That sounds pretty brittle. Do you really want all progress in your databases to stop if there is a network issue to a single server?

> This would effectively give star topology synchronous replication with very little effort, and no need for any external code. Am I missing something obvious that would prevent this from working? It would give replication capabilities better than MySQL's (which can only handle ring-based multi-master replication) for the sake of about 100 lines of code. None of the required functionality is new to PostgreSQL, either.
>
> Is there an existing implementation of this? Perhaps a Perl program that creates the required triggers and stored procedures from looking at a schema?

What you've described here would be pretty simple to implement. However, I think you've greatly underestimated the performance issues involved. If you need to push data to multiple databases before each transaction commits, I think you'll find that pretty slow. That's why most of the available third-party solutions are asynchronous. The biggest options out there are Slony and Londiste (both master-slave, asynchronous) and Bucardo (asynchronous, but supports both master-master and master-slave) which, as you would have it, is written in Perl.

Erik Jones

DBA | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
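To put rough numbers on the performance concern, here is a back-of-the-envelope model in Python. It assumes, hypothetically, that a trigger-based function pushes to peers one at a time. In that case a synchronous commit waits for the sum of the peer round trips. Pushing in parallel would wait only for the slowest peer, and asynchronous replication waits for neither. The model ignores peer execution time and lock contention, so real latencies would be higher.

```python
def commit_latency_ms(local_ms, peer_rtts_ms, mode):
    """Rough commit latency for one write, ignoring peer execution time and locking.

    mode:
      "sync_sequential" - push to peers one after another, then commit
      "sync_parallel"   - push to all peers at once, wait for the slowest
      "async"           - commit locally, replicate later
    """
    if mode == "sync_sequential":
        return local_ms + sum(peer_rtts_ms)
    if mode == "sync_parallel":
        return local_ms + max(peer_rtts_ms, default=0)
    if mode == "async":
        return local_ms
    raise ValueError(f"unknown mode: {mode}")
```

Take a 2 ms local commit and peers at 1 ms, 1 ms and 40 ms. The three modes give 44 ms, 42 ms and 2 ms respectively, so a single distant peer dominates every synchronous commit.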
On Fri, 18 Jan 2008, Erik Jones wrote:

>> Is there any reason why PostgreSQL replication solutions are all add-on 3rd party ones?
>
> Because no one solution would be appropriate for everyone. The core team and contributors feel that their time is better spent on the database itself rather than developing and maintaining multiple different replication solutions and dealing with the support thereof. What has been done is to add some extra hooks in 8.3 for replication triggers that can help to specialize when/if a given trigger fires.

Hmm, selective trigger firing sounds interesting.

>> Is there any reason why replication couldn't be implemented using triggers and a handful of stored procedures?
>
> That's usually how it's done. Well, plus some external user-land application libraries.

That's just it - I don't think any user-land libraries would actually be required. One of the supposed big advantages of MySQL is its straightforward replication support. It's quite painful to see PostgreSQL suffer purely for the sake of lack of marketing in this department. :-(

>> This is what I have in mind:
>>
>> Have a plperl function that creates connections to all servers in the cluster (replication partners), and issues the supplied write query to them, possibly with a tag of some sort to indicate that it is a replicated query (to prevent circular replication).
>>
>> Have a post-execute trigger that calls the above replication function if the query was issued directly (as opposed to replicated), and passes it the query it just executed if it was successful.
>>
>> If the replication failed on any node, the whole thing gets rolled back.
>
> That sounds pretty brittle. Do you really want all progress in your databases to stop if there is a network issue to a single server?

1) That's what MySQL does (it either ignores errors or stops replication on encountering an error; which of those two it does is selectable, but that's about it).

2) If there is a network/server issue, that would be detected because the $dbh would break. If the $dbh breaks, then plperl can either attempt to reconnect or, failing that, boot the node from the replication cluster (there could be a node list with an active/fenced flag in a separate config schema). Neither approach would be difficult to implement. Then it could just not bother reconnecting the fenced node until the user updates the node status in that config schema table. The configuration schema could also be replicated.

>> This would effectively give star topology synchronous replication with very little effort, and no need for any external code. Am I missing something obvious that would prevent this from working? It would give replication capabilities better than MySQL's (which can only handle ring-based multi-master replication) for the sake of about 100 lines of code. None of the required functionality is new to PostgreSQL, either.
>>
>> Is there an existing implementation of this? Perhaps a Perl program that creates the required triggers and stored procedures from looking at a schema?
>
> What you've described here would be pretty simple to implement. However, I think you've greatly underestimated the performance issues involved. If you need to push data to multiple databases before each transaction commits, I think you'll find that pretty slow.

Only if transactions are used. I'm basing the requirements on "at least as good as MySQL", which this would meet without transactions. If transactions are wanted they could be enabled; otherwise it could just be fire-and-forget asynchronous replication à la MySQL. Having a choice between transactions and speed is good. :-)

One thing I haven't quite thought of a good way to do with this approach is the equivalent of the useful (albeit deprecated) LOAD DATA FROM MASTER command. That command gets the server in sync by dropping and reloading all the tables from the master(s) (or rather, peers in a multi-master star replication), and then enables it in the replication. It would be neater than requiring downtime or global write locks. But I guess that could wait until version 2.

> That's why most of the available third-party solutions are asynchronous. The biggest options out there are Slony and Londiste (both master-slave, asynchronous) and Bucardo (asynchronous, but supports both master-master and master-slave) which, as you would have it, is written in Perl.

I looked at all of the above, and they all seemed (to me at least) to involve unnecessary complication or limitations I saw as unreasonable (or both). I looked at Bucardo in detail, and I was rather disappointed to see that it only supports two master nodes at the moment.

Thanks.

Gordan
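Here is a minimal Python sketch of the reconnect-or-fence idea from point 2. It assumes a hypothetical `Peer` record whose `fenced` field stands in for the active/fenced flag in a config schema table. It also simulates a flaky connection with a counter of upcoming failed attempts in place of a real `$dbh`. Each peer gets one reconnect attempt before it is fenced. Fenced peers are then skipped until an operator clears the flag.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical replication partner with a simulated, flaky connection."""
    name: str
    failures_left: int = 0   # how many upcoming connection attempts will fail
    fenced: bool = False     # stands in for an active/fenced flag in a config table
    applied: list = field(default_factory=list)


def try_send(peer: Peer, query: str, retries: int = 1) -> bool:
    """Try to deliver `query`, reconnecting up to `retries` extra times."""
    for _ in range(1 + retries):
        if peer.failures_left > 0:
            peer.failures_left -= 1  # this attempt's connection broke; reconnect
            continue
        peer.applied.append(query)
        return True
    return False


def replicate(peers: list, query: str):
    """Send `query` to every active peer; fence peers that still fail after retrying.

    Returns (names delivered to, names fenced by this call).
    """
    delivered, newly_fenced = [], []
    for peer in peers:
        if peer.fenced:
            continue  # skipped until an operator clears the flag
        if try_send(peer, query):
            delivered.append(peer.name)
        else:
            peer.fenced = True
            newly_fenced.append(peer.name)
    return delivered, newly_fenced
```

Once the flag is cleared, a re-admitted peer only receives new writes and has silently missed everything sent while it was fenced. That is why re-admission needs something like the LOAD DATA FROM MASTER style resync mentioned in this message.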
gordan@bobich.net wrote:
>
> On Fri, 18 Jan 2008, Erik Jones wrote:
>
>>> This is what I have in mind:
>>>
>>> Have a plperl function that creates connections to all servers in the cluster (replication partners), and issues the supplied write query to them, possibly with a tag of some sort to indicate that it is a replicated query (to prevent circular replication).
>>>
>>> Have a post-execute trigger that calls the above replication function if the query was issued directly (as opposed to replicated), and passes it the query it just executed if it was successful.

Not sure here if you mean literally the SQL query that was executed - in which case you have all sorts of problems with sequences and functions returning different values.

>>> If the replication failed on any node, the whole thing gets rolled back.
>>
>>> This would effectively give star topology synchronous replication with very little effort, and no need for any external code. Am I missing something obvious that would prevent this from working? It would give replication capabilities better than MySQL's (which can only handle ring-based multi-master replication) for the sake of about 100 lines of code. None of the required functionality is new to PostgreSQL, either.

But there are plenty of solutions that do a lot better than this. Slony-I is the most popular. My favourite is a spin on the old db_mirror that used to be part of the Postgres distribution.

I can't talk about how Slony works, but db_mirror uses a very fast 'C' function to capture changes in a set of simple replication tables. A replication process then takes data from those tables and replicates it (using actual values, not the SQL statement) to any number of other servers. If one of the servers is down, the data remains in the replication tables until that node returns (or is removed).

The problem with db_mirror was that the replication process was written in Perl. This worked fine for simple tests but was ridiculously slow for replicating tables holding big BYTEA structures. I rewrote the replication code in 'C' and it can replicate just about arbitrarily complex transactions in close to real time.

You seem to be reinventing the wheel, and the reinvention is not quite as round as the existing wheel :-)

>>> Is there an existing implementation of this? Perhaps a Perl program that creates the required triggers and stored procedures from looking at a schema?
>>
>> What you've described here would be pretty simple to implement. However, I think you've greatly underestimated the performance issues involved. If you need to push data to multiple databases before each transaction commits, I think you'll find that pretty slow.
>
> Only if transactions are used. I'm basing the requirements on "at least as good as MySQL", which this would meet without transactions. If transactions are wanted they could be enabled; otherwise it could just be fire-and-forget asynchronous replication à la MySQL. Having a choice between transactions and speed is good. :-)

Synchronous replication tends to imply it works on all servers simultaneously or not on any. If any server fails a transaction, it's rolled back on all servers. What you're describing sounds asynchronous to me.

> One thing I haven't quite thought of a good way to do with this approach is the equivalent of the useful (albeit deprecated) LOAD DATA FROM MASTER command. That command gets the server in sync by dropping and reloading all the tables from the master(s) (or rather, peers in a multi-master star replication), and then enables it in the replication. It would be neater than requiring downtime or global write locks. But I guess that could wait until version 2.

That's one thing. The other problem that most trigger-based replication systems have is propagating schema changes - because (I think) you can attach triggers to schema changes.

> Thanks.
> Gordan

Pete
--
Peter Wilson : http://www.whitebeam.org
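The capture-then-replay design described above can be sketched in Python. This is a minimal illustration, not db_mirror's actual schema or code. A hypothetical `ChangeLog` stands in for the replication tables and records the row values actually written, not the SQL text. Each peer has its own position in the log, so a node that is down simply falls behind and catches up when it returns.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeLog:
    """Hypothetical stand-in for db_mirror-style replication tables."""
    entries: list = field(default_factory=list)    # (seq, table, key, row values)
    positions: dict = field(default_factory=dict)  # peer name -> last seq applied

    def capture(self, table, key, values):
        """Called from a (hypothetical) capture trigger with the values actually written."""
        seq = len(self.entries) + 1
        self.entries.append((seq, table, key, dict(values)))
        return seq

    def pending_for(self, peer):
        done = self.positions.get(peer, 0)
        return [entry for entry in self.entries if entry[0] > done]


def replicate_once(log, peers):
    """One pass of the replication process.

    `peers` maps a peer name to (is_up, store); store is a dict keyed by (table, key).
    """
    for name, (is_up, store) in peers.items():
        if not is_up:
            continue  # changes stay in the log until the node returns (or is removed)
        for seq, table, key, values in log.pending_for(name):
            store[(table, key)] = dict(values)
            log.positions[name] = seq
```

Shipping values instead of statements is what sidesteps the sequence and volatile-function problems mentioned at the top of this message.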
>>>> This is what I have in mind:
>>>>
>>>> Have a plperl function that creates connections to all servers in the cluster (replication partners), and issues the supplied write query to them, possibly with a tag of some sort to indicate that it is a replicated query (to prevent circular replication).
>>>>
>>>> Have a post-execute trigger that calls the above replication function if the query was issued directly (as opposed to replicated), and passes it the query it just executed if it was successful.
>
> Not sure here if you mean literally the SQL query that was executed - in which case you have all sorts of problems with sequences and functions returning different values.

Indeed, but sequences at least can be worked around. Post-execute, the sequence number used should already be available, and the sequence offset and increment can be set so as to ensure they do not clash. That's what MySQL does (and I must apologize for making the comparison all the time).

>>>> If the replication failed on any node, the whole thing gets rolled back.
>>>>
>>>> This would effectively give star topology synchronous replication with very little effort, and no need for any external code. Am I missing something obvious that would prevent this from working? It would give replication capabilities better than MySQL's (which can only handle ring-based multi-master replication) for the sake of about 100 lines of code. None of the required functionality is new to PostgreSQL, either.
>
> But there are plenty of solutions that do a lot better than this. Slony-I is the most popular. My favourite is a spin on the old db_mirror that used to be part of the Postgres distribution.

How would Slony be better? It doesn't seem to support master-master replication, for one.

> I can't talk about how Slony works, but db_mirror uses a very fast 'C' function to capture changes in a set of simple replication tables. A replication process then takes data from those tables and replicates it (using actual values, not the SQL statement) to any number of other servers. If one of the servers is down, the data remains in the replication tables until that node returns (or is removed).

Interesting. I was thinking about making an auxiliary feature that just writes a pending queue log for a server when it cannot establish the $dbh. When it manages to connect, it attempts to replay the log before issuing new queries. The problem with this is that the ordering becomes inconsistent with multiple masters. That would be a bit too inconsistent for my liking. As far as I can tell, that is also why MySQL's current replication method is unsuitable for anything more than ring replication. Having said that, ring suffers from similar race conditions; it's more of a hack than a solution.

Now that I think about it, I'm not actually sure that waiting for global success before the final commit would make UPDATE/DELETE free of race conditions. They won't fail, but they can still yield inconsistencies due to race conditions. Still, I think it's worth having despite this issue.

> The problem with db_mirror was that the replication process was written in Perl. This worked fine for simple tests but was ridiculously slow for replicating tables holding big BYTEA structures. I rewrote the replication code in 'C' and it can replicate just about arbitrarily complex transactions in close to real time.

Yes, I can see how big blobs can be an issue for performance. :-(

> You seem to be reinventing the wheel, and the reinvention is not quite as round as the existing wheel :-)

Not quite - I think multi-master capability is important.

>>>> Is there an existing implementation of this? Perhaps a Perl program that creates the required triggers and stored procedures from looking at a schema?
>>>
>>> What you've described here would be pretty simple to implement. However, I think you've greatly underestimated the performance issues involved. If you need to push data to multiple databases before each transaction commits, I think you'll find that pretty slow.
>>
>> Only if transactions are used. I'm basing the requirements on "at least as good as MySQL", which this would meet without transactions. If transactions are wanted they could be enabled; otherwise it could just be fire-and-forget asynchronous replication à la MySQL. Having a choice between transactions and speed is good. :-)
>
> Synchronous replication tends to imply it works on all servers simultaneously or not on any. If any server fails a transaction, it's rolled back on all servers. What you're describing sounds asynchronous to me.

Indeed, I spotted that above. The transactions roll back if they fail, but this alone does not quite ensure cross-node consistency of the data. Some kind of special DELETE/UPDATE handling would be required to fix this, but I don't have a definitive idea of how this could be handled. Will have to think about it a bit more.

>> One thing I haven't quite thought of a good way to do with this approach is the equivalent of the useful (albeit deprecated) LOAD DATA FROM MASTER command. That command gets the server in sync by dropping and reloading all the tables from the master(s) (or rather, peers in a multi-master star replication), and then enables it in the replication. It would be neater than requiring downtime or global write locks. But I guess that could wait until version 2.
>
> That's one thing. The other problem that most trigger-based replication systems have is propagating schema changes - because (I think) you can attach triggers to schema changes.

I presume you mean that you cannot attach triggers to schema changes. Yes, I had thought of that a minute ago. I don't suppose this could be deemed a feature request for CREATE/ALTER/DROP schema-level triggers?

Gordan
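The sequence workaround gives each node an interleaved, non-overlapping ID stream. In PostgreSQL you would create each node's sequence with START WITH set to the node number and INCREMENT BY set to the node count. MySQL exposes the same idea as the auto_increment_offset and auto_increment_increment settings. The Python sketch below only models the arithmetic. The node numbering scheme is an assumption for illustration.

```python
from itertools import count, islice


def node_ids(node_number: int, node_count: int):
    """IDs a given node may hand out: start at node_number (1-based), step by node_count."""
    if not 1 <= node_number <= node_count:
        raise ValueError("node_number must be between 1 and node_count")
    return count(start=node_number, step=node_count)


def first_ids(node_number: int, node_count: int, n: int) -> list:
    """The first n IDs a node would generate."""
    return list(islice(node_ids(node_number, node_count), n))
```

With three nodes, node 1 hands out 1, 4, 7, 10, and so on, node 2 hands out 2, 5, 8, 11, and node 3 hands out 3, 6, 9, 12. The streams never collide. Adding a fourth node later means re-striding every sequence. And the scheme only prevents key clashes; it does nothing for the ordering problems discussed in this message.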
On Jan 18, 2008, at 11:37 AM, gordan@bobich.net wrote:

>> That's one thing. The other problem that most trigger-based replication systems have is propagating schema changes - because (I think) you can attach triggers to schema changes.
>
> I presume you mean that you cannot attach triggers to schema changes. Yes, I had thought of that a minute ago. I don't suppose this could be deemed a feature request for CREATE/ALTER/DROP schema-level triggers?

I'm sure he did. Also, there was a little bit of discussion a week or so ago about DDL and DDL/DML (such as TRUNCATE) triggers. With the imminent release of 8.3, now's the time to pipe up with feature requests.

Erik Jones
On Fri, Jan 18, 2008 at 04:09:45PM +0000, gordan@bobich.net wrote:
>
> That's just it - I don't think any user-land libraries would actually be required. One of the supposed big advantages of MySQL is its straightforward replication support. It's quite painful to see PostgreSQL suffer purely for the sake of lack of marketing in this department. :-(

The "straightforward" replication support in MySQL is seriously broken. We (by which I really mean "Jan") spent a great deal of time on the design of Slony to avoid several nasty corner cases that are sort of waved aside in the MySQL documentation. (Its add-on nature is a feature, not a bug -- one thing it can do is cross-version upgrades on PostgreSQL versions that were out before Slony was finished being designed.) Designing a replication system that works well 80% of the time is a waste of effort, because the times when you really need it are all already in that 20% of cases that you won't cover with the simple-minded solution.

Specifically,

> 1) That's what MySQL does (it either ignores errors or stops replication on encountering an error; which of those two it does is selectable, but that's about it).

That's got to be _the_ most brain-dead approach to replication I've ever heard. It chooses the two least good of all possible worlds, and when you get into your particular version of hell at 0-dark:30, you have to spend some time first figuring out which hell you happen to be in.

In any case,

> fire-and-forget asynchronous replication à la MySQL. Having a choice between transactions and speed is good. :-)

if this is what you believe, then you don't need a database to store your data anyway. I can make your data system faster by storing all your data on /dev/null. Writes will be very fast indeed.

A
Andrew Sullivan wrote:
> On Fri, Jan 18, 2008 at 04:09:45PM +0000, gordan@bobich.net wrote:
>> That's just it - I don't think any user-land libraries would actually be required. One of the supposed big advantages of MySQL is its straightforward replication support. It's quite painful to see PostgreSQL suffer purely for the sake of lack of marketing in this department. :-(
>
> The "straightforward" replication support in MySQL is seriously broken.

I am not arguing that it isn't! :-) I am merely trying to implement something at least as good (or rather, no more broken) for PostgreSQL with a minimum of effort.

> We (by which I really mean "Jan") spent a great deal of time on the design of Slony to avoid several nasty corner cases that are sort of waved aside in the MySQL documentation. (Its add-on nature is a feature, not a bug -- one thing it can do is cross-version upgrades on PostgreSQL versions that were out before Slony was finished being designed.) Designing a replication system that works well 80% of the time is a waste of effort, because the times when you really need it are all already in that 20% of cases that you won't cover with the simple-minded solution.
>
> Specifically,
>
>> 1) That's what MySQL does (it either ignores errors or stops replication on encountering an error; which of those two it does is selectable, but that's about it).
>
> That's got to be _the_ most brain-dead approach to replication I've ever heard. It chooses the two least good of all possible worlds, and when you get into your particular version of hell at 0-dark:30, you have to spend some time first figuring out which hell you happen to be in.

I couldn't agree more. But I don't see another multi-master replication solution on the horizon.

> In any case,
>
>> fire-and-forget asynchronous replication à la MySQL. Having a choice between transactions and speed is good. :-)
>
> if this is what you believe, then you don't need a database to store your data anyway. I can make your data system faster by storing all your data on /dev/null. Writes will be very fast indeed.

Fantastically put. :-) But in the meantime, until a better multi-master replication solution becomes available, I think I'll stick with the current plan.

I suppose some kind of write counter with a rolling write-query hash could be implemented. The replicator function issues locks and compares the counters/hashes to establish whether the state is consistent on all nodes before a write query is replicated. It's a kludge, and a horrible one at that, and it will slow down the writes under load. But I think it would work for ensuring ordering consistency with non-commutative write operations.

Gordan
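Here is a minimal Python sketch of the write counter plus rolling hash. The `WriteState` class and the choice of SHA-256 are assumptions for illustration. Each node chains every applied write query onto the previous digest, so the final digest depends on order as well as content. Order matters for non-commutative writes. Starting from x = 3, `x = x * 2` followed by `x = x + 1` gives 7, while the reverse order gives 8. Two nodes that applied the same queries in different orders have equal counters but different digests, so the mismatch is caught.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class WriteState:
    """Hypothetical per-node write counter plus rolling hash of applied write queries."""
    counter: int = 0
    digest: str = hashlib.sha256(b"").hexdigest()

    def record(self, query: str) -> None:
        # Chain each query onto the previous digest so the result depends on order.
        self.counter += 1
        self.digest = hashlib.sha256((self.digest + query).encode()).hexdigest()


def consistent(states) -> bool:
    """True if every node applied the same number of writes with the same rolling hash."""
    first = states[0]
    return all(s.counter == first.counter and s.digest == first.digest for s in states)
```

This only detects divergence. Deciding which node is right, and bringing the others back in line, still needs the locking and recovery steps described in the thread.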
Hello,

On Fri, 18 Jan 2008 17:37:07 +0000 (GMT) gordan@bobich.net wrote:

> >>>> This is what I have in mind:
> >>>>
> >>>> Have a plperl function that creates connections to all servers in the cluster (replication partners), and issues the supplied write query to them, possibly with a tag of some sort to indicate that it is a replicated query (to prevent circular replication).
> >>>>
> >>>> Have a post-execute trigger that calls the above replication function if the query was issued directly (as opposed to replicated), and passes it the query it just executed if it was successful.
> >
> > Not sure here if you mean literally the SQL query that was executed - in which case you have all sorts of problems with sequences and functions returning different values.
>
> Indeed, but sequences at least can be worked around. Post-execute, the sequence number used should already be available, and the sequence offset and increment can be set so as to ensure they do not clash. That's what MySQL does (and I must apologize for making the comparison all the time).

Sequences are only one (small) problem. What about functions returning different results (volatile) for each call? Just imagine random() or now(). What about inserts or updates selecting parts of table data? You can't be sure to get exactly the same results on the slave.

> > But there are plenty of solutions that do a lot better than this. Slony-I is the most popular. My favourite is a spin on the old db_mirror that used to be part of the Postgres distribution.
>
> How would Slony be better? It doesn't seem to support master-master replication, for one.

But it works. At least master-slave.

> The problem with this is that the ordering becomes inconsistent with multiple masters. That would be a bit too inconsistent for my liking. As far as I can tell, that is also why MySQL's current replication method is unsuitable for anything more than ring replication. Having said that, ring suffers from similar race conditions; it's more of a hack than a solution.

A lot of books have been written about this topic, many details have been researched, and still master-master replication is a very complicated topic. You should start reading about vector and Lamport clocks. These are (among others) the (very) basics for your sync problem.

> Now that I think about it, I'm not actually sure that waiting for global success before the final commit would make UPDATE/DELETE free of race conditions. They won't fail, but they can still yield inconsistencies due to race conditions. Still, I think it's worth having despite this issue.

Sure, you are right. You are now about to find out what other people have been researching for the last *uhm* 20 years ;-)

> > You seem to be reinventing the wheel, and the reinvention is not quite as round as the existing wheel :-)
>
> Not quite - I think multi-master capability is important.

Yes. But master-master or even multi-master is a very complicated topic. So let's start with something easier, like master-slave. This works, you don't have race conditions, and that's a good starting point for more research.

> I presume you mean that you cannot attach triggers to schema changes. Yes, I had thought of that a minute ago. I don't suppose this could be deemed a feature request for CREATE/ALTER/DROP schema-level triggers?

More triggers, not only for DDL changes, would be nice anyway. I see fields of application for some of my own projects ;-)

Kind regards

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
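As a starting point for the Lamport clocks suggested above, here is a minimal Python sketch. Each node advances its counter on a local write. On seeing a remote stamp, it jumps past it. Ties are broken by node name, so every node that sorts the same set of writes gets the same total order regardless of arrival order. Resolving conflicts by "last writer wins" is an assumption chosen for illustration. It is not the only policy.

```python
from dataclasses import dataclass


@dataclass
class LamportClock:
    node_id: str
    time: int = 0

    def tick(self):
        """Local event such as a write: advance the clock, return a (time, node_id) stamp."""
        self.time += 1
        return (self.time, self.node_id)

    def receive(self, stamp):
        """Merge a stamp seen on a replicated write from another node."""
        self.time = max(self.time, stamp[0]) + 1


def apply_in_order(writes):
    """Apply (stamp, key, value) writes in stamp order; the last write to a key wins."""
    state = {}
    for _stamp, key, value in sorted(writes):
        state[key] = value
    return state
```

Two concurrent writes stamped (1, "a") and (1, "b") end up in the same order on every node, so all nodes converge. However, last-writer-wins silently discards node a's update. Lamport clocks cannot tell concurrent writes apart from causally ordered ones; vector clocks can, which is why both are worth reading about.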
Andreas 'ads' Scherbaum wrote:
>>>>>> Have a plperl function that creates connections to all servers in the cluster (replication partners), and issues the supplied write query to them, possibly with a tag of some sort to indicate that it is a replicated query (to prevent circular replication).
>>>>>>
>>>>>> Have a post-execute trigger that calls the above replication function if the query was issued directly (as opposed to replicated), and passes it the query it just executed if it was successful.
>>> Not sure here if you mean literally the SQL query that was executed - in which case you have all sorts of problems with sequences and functions returning different values.
>> Indeed, but sequences at least can be worked around. Post-execute, the sequence number used should already be available, and the sequence offset and increment can be set so as to ensure they do not clash. That's what MySQL does (and I must apologize for making the comparison all the time).
>
> Sequences are only one (small) problem. What about functions returning different results (volatile) for each call? Just imagine random() or now().

Yes, that's a problem. The bodge workaround for that is to save the master's state for such functions: a pre-execution trigger repacks each such function call into a literal, and then the literals are replicated.

> What about inserts or updates selecting parts of table data? You can't be sure to get exactly the same results on the slave.

You can if you have an ordering consistency check mechanism, as I mentioned in the other mail. Recovery when "something goes wrong" (tm), however, could get interesting, especially under heavy distributed write load. If there's a counter and a hash, I guess you could lock everything, find the one with the biggest counter, and release the lock on everything else until it catches up, then re-lock, then replicate. It would add a fair bit of latency, though.

Gordan
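The volatile-function workaround can be modelled in a few lines of Python. This sketch assumes a hypothetical row "template" in which `Call("random")` or `Call("now")` marks a volatile call. Each node's own random generator and clock reading are simulated by a `NodeEnv`. Statement-based replication lets every node evaluate the calls itself, so the stored rows diverge. Freezing the calls into literals on the origin and shipping only the literal row keeps every replica identical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Marker for a volatile function call in a row template (hypothetical)."""
    name: str  # "random" or "now"


@dataclass
class NodeEnv:
    """What a node sees when it evaluates volatile functions itself."""
    rng: random.Random
    now: float  # this node's clock reading at execution time


def evaluate(template, env):
    """Turn a template into a row of literals using this node's environment."""
    row = {}
    for column, value in template.items():
        if isinstance(value, Call):
            if value.name == "random":
                row[column] = env.rng.random()
            elif value.name == "now":
                row[column] = env.now
            else:
                raise ValueError(f"unknown volatile function {value.name}")
        else:
            row[column] = value
    return row


def statement_based(template, envs):
    """Ship the statement: every node re-evaluates the volatile calls."""
    return [evaluate(template, env) for env in envs]


def value_based(template, origin_env, replica_count):
    """Freeze the calls into literals on the origin, then ship the literal row."""
    frozen = evaluate(template, origin_env)
    return [dict(frozen) for _ in range(replica_count)]
```

The same freezing idea does not help with INSERT ... SELECT or UPDATE statements whose results depend on table contents. Those still depend on every node having applied earlier writes in the same order.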