This is a discussion on PG-MQ? within the pgsql Hackers forums, part of the PostgreSQL category.
| I'm seeing some applications where it appears that there would be value in introducing asynchronous messaging, ala "message queueing." <http://en.wikipedia.org/wiki/Message_queue>

The "granddaddy" of message queuing systems is IBM's MQ-Series, and I don't see particular value in replicating its functionality.

On the other side, the "big names" these days are:

a) The Java Messaging Service, which seems to implement *way* more options than I'm even vaguely interested in having (notably, lots that involve data stores or lack thereof that I do not care to use);

b) AMQP <http://en.wikipedia.org/wiki/Advanced_Message_Queuing_Protocol>, which again is a big, complex, protocol-oriented thing;

c) There are lesser names, like isectd <http://isectd.sf.net> and the (infamous?) Spread Toolkit, which both implement memory-based messaging systems.

There's a simple + free system called MQS that is implemented in Perl and uses XML-RPC <http://www.foo.be/mqs/>; that seems perhaps a little *too* primitive. FYI, here's its simple schema (filtered into PG):

  create table mqname (
    name varchar(255) not null default '',
    id serial,
    priority integer not null default 10,
    primary key (id)
  );

  create table message (
    id serial,
    timestamp timestamptz not null,
    cid varchar(255),
    queue integer not null references mqname(id),
    body text not null,
    priority integer not null default 10,
    flag integer default 0,
    primary key (id)
  );

My bias would be to have something that can basically run as a thin set of stored procedures atop PostgreSQL :-). It would be trivial to extend that to support SOAP/XML-RPC, if desired.

It would be nice to achieve 'higher availability' by having queues where you might replicate the contents (probably using the MQ system itself ;-)) to other servers.
There tend to be varying semantics out there:

- Some queues may represent "subscriptions" where a whole bunch of listeners want to get all the messages;

- Sometimes you have the semantics where:
  - messages need to be delivered at least once
  - messages need to be delivered no more than once
  - messages need to be delivered exactly once

Is there any existing work out there on this? Or should I maybe be looking at prototyping something?
--
(format nil "~S@~S" "cbbrowne" "linuxfinances.info")
http://linuxfinances.info/info/lsf.html
Q: How many Newtons does it take to change a light bulb?
A: Faux! There to eat lemons, axe gravy soup! |
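To make the "thin set of stored procedures" idea concrete, here is a minimal sketch of enqueue and dequeue operations over the MQS-style schema quoted above. It is only an illustration under stated assumptions: Python's built-in sqlite3 stands in for PostgreSQL so the example is self-contained, the function names (`open_db`, `create_queue`, `enqueue`, `dequeue`) are hypothetical, and treating `flag = 1` as "already consumed" and a lower `priority` number as "more urgent" are guesses, not something the MQS schema documents.

```python
import sqlite3

# MQS-style schema, adapted to SQLite types (assumption: stand-in for PostgreSQL).
SCHEMA = """
create table mqname (
    id integer primary key autoincrement,
    name varchar(255) not null default '',
    priority integer not null default 10
);
create table message (
    id integer primary key autoincrement,
    timestamp text not null default current_timestamp,
    cid varchar(255),
    queue integer not null references mqname(id),
    body text not null,
    priority integer not null default 10,
    flag integer default 0
);
"""


def open_db():
    # isolation_level=None: autocommit mode, we issue BEGIN/COMMIT ourselves.
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.executescript(SCHEMA)
    return db


def create_queue(db, name):
    cur = db.execute("insert into mqname (name) values (?)", (name,))
    return cur.lastrowid


def enqueue(db, queue_id, body, priority=10):
    cur = db.execute(
        "insert into message (queue, body, priority) values (?, ?, ?)",
        (queue_id, body, priority),
    )
    return cur.lastrowid


def dequeue(db, queue_id):
    """Claim the next unconsumed message as (id, body), or None if empty."""
    db.execute("begin immediate")
    try:
        row = db.execute(
            "select id, body from message where queue = ? and flag = 0 "
            "order by priority, id limit 1",
            (queue_id,),
        ).fetchone()
        if row is not None:
            # Hypothetical convention: flag = 1 means "consumed".
            db.execute("update message set flag = 1 where id = ?", (row[0],))
        db.execute("commit")
        return row
    except Exception:
        db.execute("rollback")
        raise
```

Because selecting and flagging the message happen in one transaction, a consumer that fails before the commit leaves the message available for the next attempt. This single-connection sketch does not exercise concurrent consumers; in PostgreSQL that would additionally need row locking, for example `SELECT ... FOR UPDATE`.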
| |||
| On Jun 19, 2007, at 2:45 PM, Chris Browne wrote:

> I'm seeing some applications where it appears that there would be
> value in introducing asynchronous messaging, ala "message queueing."
> <http://en.wikipedia.org/wiki/Message_queue>

Me too.

> My bias would be to have something that can basically run as a thin
> set of stored procedures atop PostgreSQL :-). It would be trivial to
> extend that to support SOAP/XML-RPC, if desired.
>
> It would be nice to achieve 'higher availability' by having queues
> where you might replicate the contents (probably using the MQ system
> itself ;-)) to other servers.
>
> There tend to be varying semantics out there:
>
> - Some queues may represent "subscriptions" where a whole bunch of
> listeners want to get all the messages;
>
> - Sometimes you have the semantics where:
> - messages need to be delivered at least once
> - messages need to be delivered no more than once
> - messages need to be delivered exactly once
>
> Is there any existing work out there on this? Or should I maybe be
> looking at prototyping something?

The Skype tools have a decent-looking publish/subscribe thing, PgQ, which they then layer their replication on top of. It's multi-consumer and multi-producer, with "delivered at least once" semantics.

Looks nice.

Cheers,
Steve

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings |
| |||
| steve@blighty.com (Steve Atkins) writes:

>> Is there any existing work out there on this? Or should I maybe be
>> looking at prototyping something?
>
> The skype tools have some sort of decent-looking publish/subscribe
> thing, PgQ, then they layer their replication on top of. It's multi
> consumer and producer, with "delivered at least once" semantics.
>
> Looks nice.

I had not really noticed that - I need to take a look at their connection pooler too, so I guess that puts more "skype" items on my ToDo list ;-).

Thanks for pointing it out...
--
let name="cbbrowne" and tld="linuxdatabases.info" in String.concat "@" [name;tld];;
http://cbbrowne.com/info/advocacy.html
Signs of a Klingon Programmer #1: "Our users will know fear and cower before our software. Ship it! Ship it and let them flee like the dogs they are!" |
| |||
| On Wed, June 20, 2007 04:45, Chris Browne wrote:

> I'm seeing some applications where it appears that there would be
> value in introducing asynchronous messaging, ala "message queueing."
> <http://en.wikipedia.org/wiki/Message_queue>
>
> The "granddaddy" of message queuing systems is IBM's MQ-Series, and I
> don't see particular value in replicating its functionality.

I'm quite interested in this. Maybe I'm thinking of something too complex, but I do think there are some "oh it'll need to do that too" pitfalls that are best considered up front.

The big thing about MQ is that it participates as a resource manager in two-phase commits (and optionally a transaction manager as well). That means that you get atomic processing steps: application takes message off a queue, processes it, commits its changes to the database, replies to message. The queue manager then does a second-phase commit for all of those steps, and that's when the reply really goes out. If the application fails, none of this will have happened so you get ACID over the complete cycle.

That's something we should have free software for.

Perhaps the time is right for something new. A lot of the complexity inside MQ comes from data representation issues like encodings and fixed-length strings, as I recall, and things have changed since MQ was designed.

I agree it could be useful (and probably not hard either) to have a transactional messaging system inside the database. It saves you from having to do two-phase commits. But it does tie everything to postgres to some extent, and you lose the interesting features (atomicity and assured, single delivery) as soon as anything in the chain does anything persistent that does not participate in the postgres transaction.

Perhaps what we really need is more mature components, with a unified control layer on top. That's how a lot of successful free software grows. See below.
> On the other side, the "big names" these days are:
>
> a) The Java Messaging Service, which seems to implement *way* more
> options than I'm even vaguely interested in having (notably, lots
> that involve data stores or lack thereof that I do not care to use);

Far as I know, JMS is an API, not a product. You'd still slot some messaging middleware underneath, such as MQ. That is why MQSeries was renamed: it fits into the WebSphere suite as the implementing engine underneath the JMS API. From what I understand MQ is one of the "best-of-breed" products that JMS was designed around. (Sun's term, bit hypey for my taste).

In one way, Java is easy: the last thing you want to get into is yet another marshaling standard. There are plenty of "standards" to choose from already, each married to one particular communications mechanism: RPC, EDI, CORBA, D-Bus, XMLRPC, what have you. Even postgres has its own. I'd say the most successful mechanism is TCP itself, because it isolates itself from content representation so effectively.

It's hard not to get into marshaling: someone has to do it, and it's often a drag to do it in the application, but the way things stand now *any* choice limits the usefulness of what you're building. That's something I'd like to see change.

Personally I'd love to see marshaling or low-level data representation isolated into a mature component that speaks multiple programming languages on the one hand and multiple data representation formats on the other. Something the implementers of some of these messaging standards would want to use to compose their messages, isolating their format definitions into plugins. Something that would make application writers stop composing messages in finicky ad-hoc code that fails with unexpected locales or has trouble with different line breaks.

If we had a component like that, combining it with existing transactional variants of TCP and [S]HTTP might even be enough to build a usable messaging system.
I haven't looked at them enough to know. Of course we'd need implementations of those protocols; see http://ttcplinux.sourceforge.net/ and http://www.csn.ul.ie/~heathclf/fyp/ for example.

Another box of important tools, and I have no idea where we stand with this one, is transaction management. We have 2-phase commit in postgres now. But do we have interoperability with existing transaction managers? Is there a decent free, portable, everything-agnostic transaction manager? With those, the sphere of reliability of a database-driven messaging package could extend much further.

A free XA-capable filesystem would be great too, but I guess I'm daydreaming.

> There tend to be varying semantics out there:
>
> - Some queues may represent "subscriptions" where a whole bunch of
> listeners want to get all the messages;

The two simplest models that offer something more than TCP/UDP are 1:n reliable publish-subscribe without persistence, and 1:1 request-reply with persistent storage. D-Bus does them both; IIRC MQ does 1:1 and has add-ons on top for publish-subscribe.

I could imagine variations such as persistent publish-subscribe, where you can come back once in a while and see if your subscriptions caught anything since your last visit. But such things probably get more complex and less useful as you add more ideas.

On top of that goes communication model: symmetric or asymmetric, synchronous or asynchronous. Do you end up with a "remote procedure call" model like RPC, D-Bus, CORBA? Or do you stick with a pure message/event view of communication? Personally I think it's good not to intrude into the application's event loop too much, but others seem to feel the central event loop should not be part of application code.
> - Sometimes you have the semantics where:
> - messages need to be delivered at least once
> - messages need to be delivered no more than once
> - messages need to be delivered exactly once

IMHO, if you're not doing "exactly once," or something very close to it, you might as well stay with ad-hoc code. You can ensure single delivery by having the sender re-send when in doubt, and keeping track of duplications in the recipient.

> Is there any existing work out there on this? Or should I maybe be
> looking at prototyping something?

I've looked around a bit (not much) and not found anything very generally useful. I think it's an exciting area that probably needs work, so prototyping might be a good idea. If nothing else, I hope I've given you some examples of what you don't want to get yourself into. :-)

Jeroen |
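The "re-send when in doubt, track duplicates in the recipient" approach can be sketched as follows. This is a toy illustration, not anyone's actual implementation: the `Recipient` class, the `seen` and `ledger` tables, and the `lost_acks` parameter (which simulates acknowledgements dropped in transit) are all made up for the example, and sqlite3 stands in for whatever store the recipient uses.

```python
import sqlite3


class Recipient:
    """Applies each message at most once by remembering message ids."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("create table seen (msg_id text primary key)")
        self.db.execute("create table ledger (msg_id text, amount integer)")
        self.db.commit()

    def deliver(self, msg_id, amount):
        """Return True if the message was applied, False if it was a duplicate."""
        try:
            # One transaction: record the id and apply the effect together.
            with self.db:
                self.db.execute("insert into seen values (?)", (msg_id,))
                self.db.execute(
                    "insert into ledger values (?, ?)", (msg_id, amount)
                )
            return True
        except sqlite3.IntegrityError:
            # Primary-key clash: already processed, so do nothing this time.
            return False

    def total(self):
        row = self.db.execute(
            "select coalesce(sum(amount), 0) from ledger"
        ).fetchone()
        return row[0]


def send_with_retry(recipient, msg_id, amount, lost_acks):
    """Re-send until an acknowledgement gets through; return attempts made."""
    attempts = 0
    while True:
        attempts += 1
        recipient.deliver(msg_id, amount)
        if attempts > lost_acks:  # this acknowledgement finally arrived
            return attempts
```

The sender alone only gives "at least once"; it is the recipient recording the message id in the same transaction as the effect that turns repeated delivery into a single application.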
| |||
| Hi Chris,

Chris Browne wrote:

> I'm seeing some applications where it appears that there would be
> value in introducing asynchronous messaging, ala "message queueing."
> <http://en.wikipedia.org/wiki/Message_queue>

ISTM that 'message queue' is a way too general term. There are hundreds of different queues at different levels on a standard server. So I'm somewhat unsure about what problem you want to solve.

> c) There are lesser names, like isectd <http://isectd.sf.net> and the
> (infamous?) Spread Toolkit which both implement memory-based messaging
> systems.

If a GCS is about what you're looking for, then you also might want to consider these: ensemble, appia or jGroups. There's a Java layer called jGCS, which supports even more similar systems.

Another commonly used term is 'reliable multicast', which guarantees that messages are delivered to a group of recipients. These algorithms often are the basis for a GCS.

> My bias would be to have something that can basically run as a thin
> set of stored procedures atop PostgreSQL :-). It would be trivial to
> extend that to support SOAP/XML-RPC, if desired.

Hm.. in Postgres-R I currently have (partial) support for ensemble and spread. Exporting that interface via stored procedures could be done, but you would probably need a manager process, as you certainly want your connections to persist across transactions (or not?).

Together with that process, we already have half of what Postgres-R is: an additional process which connects to the GCS. So I question whether there's value in exporting the interface. Can you think of other use cases than database replication? Why do you want to do that via the database, then, and not directly with the GCS?

> It would be nice to achieve 'higher availability' by having queues
> where you might replicate the contents (probably using the MQ system
> itself ;-)) to other servers.

Uhm.. sorry, but I fail to see the big news here. Which replication solution does *not* work that way?
Regards

Markus |
| |||
| On 6/20/07, Jeroen T. Vermeulen <jtv@xs4all.nl> wrote:

> On Wed, June 20, 2007 04:45, Chris Browne wrote:
> > - Sometimes you have the semantics where:
> > - messages need to be delivered at least once
> > - messages need to be delivered no more than once
> > - messages need to be delivered exactly once
>
> IMHO, if you're not doing "exactly once," or something very close to it,
> you might as well stay with ad-hoc code. You can ensure single delivery
> by having the sender re-send when in doubt, and keeping track of
> duplications in the recipient.

In the case of PgQ, the "at least once" semantics comes from the batch-based processing it does: in case of failure, the full batch is delivered again, so if the consumer had already managed to process some of the items, it gets them twice.

As it is responsible only for delivering events from the database, it has no way of guaranteeing "exactly once" behaviour; that needs to be built on top of PgQ.

The simplest case is when the events are processed in the same database the queue resides in. Then you can just fetch, process, and close the batch in one transaction, and you immediately get "exactly once" behaviour.

To achieve "exactly once" behaviour with different databases, look at the "pgq_ext" module for a sample. Basically it just requires storing the batch_id/event_id on the remote db and committing there first. Later it can be checked whether the batch/event has already been processed.

It's tricky only if you want to achieve full transactionality for event processing. As I understand, JMS does not have a concept of transactions, probably also other solutions mentioned before, so to use PgQ as backend for them should be much simpler...
To Chris: you should like PgQ, it's just stored procs in the database, plus it's basically just generalized Slony-I, with some optimizations, so it should be familiar territory.

--
marko |
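The batch-tracking idea described above (store the batch id on the remote database, commit there first, and check it on redelivery) can be sketched roughly as below. This does not use or reproduce the real PgQ or pgq_ext API; the `completed_batch` and `applied_event` tables, the `apply_batch` function, and the `fail_after` crash simulation are all hypothetical, and sqlite3 stands in for the remote PostgreSQL database.

```python
import sqlite3


def make_target():
    """Create the (hypothetical) remote database that consumes events."""
    db = sqlite3.connect(":memory:")
    db.execute("create table completed_batch (batch_id integer primary key)")
    db.execute(
        "create table applied_event (event_id integer primary key, payload text)"
    )
    db.commit()
    return db


def apply_batch(target, batch_id, events, fail_after=None):
    """Apply one batch to the target database; return True if work was done.

    events is a list of (event_id, payload) pairs. fail_after simulates a
    consumer crash after that many events, to show that a partially
    processed batch leaves no trace.
    """
    done = target.execute(
        "select 1 from completed_batch where batch_id = ?", (batch_id,)
    ).fetchone()
    if done:
        return False  # redelivered batch: already committed here, skip it

    with target:  # commits on success, rolls back on any exception
        for n, (event_id, payload) in enumerate(events):
            if fail_after is not None and n >= fail_after:
                raise RuntimeError("simulated consumer crash")
            target.execute(
                "insert into applied_event values (?, ?)", (event_id, payload)
            )
        # Recording the batch id in the same transaction as the work is
        # what makes a redelivery of the whole batch harmless.
        target.execute("insert into completed_batch values (?)", (batch_id,))
    return True
```

A crash mid-batch rolls everything back, so the queue's "deliver the full batch again" behaviour is safe; a batch that did commit is recognised and skipped the second time.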
| |||
| Marko Kreen wrote:

> As I understand, JMS does not have a concept
> of transactions, probably also other solutions mentioned before,
> so to use PgQ as backend for them should be much simpler...

JMS certainly does have the concept of transactions. Both distributed ones through XA and two-phase commit, and local involving just one JMS provider. I don't know about others, but would be surprised if they didn't.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com |
| |||
| On 6/20/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:

> Marko Kreen wrote:
> > As I understand, JMS does not have a concept
> > of transactions, probably also other solutions mentioned before,
> > so to use PgQ as backend for them should be much simpler...
>
> JMS certainly does have the concept of transactions. Both distributed
> ones through XA and two-phase commit, and local involving just one JMS
> provider. I don't know about others, but would be surprised if they didn't.

Ah, sorry, my mistake then. Shouldn't trust hearsay.

--
marko |
| |||
| On Wed, June 20, 2007 18:18, Heikki Linnakangas wrote:

> Marko Kreen wrote:
>> As I understand, JMS does not have a concept
>> of transactions, probably also other solutions mentioned before,
>> so to use PgQ as backend for them should be much simpler...
>
> JMS certainly does have the concept of transactions. Both distributed
> ones through XA and two-phase commit, and local involving just one JMS
> provider. I don't know about others, but would be surprised if they
> didn't.

Wait... I thought XA did two-phase commit, and then there was XA+ for *distributed* two-phase commit, which is much harder?

Jeroen |
| ||||
| Jeroen T. Vermeulen wrote:

> On Wed, June 20, 2007 18:18, Heikki Linnakangas wrote:
>> Marko Kreen wrote:
>>> As I understand, JMS does not have a concept
>>> of transactions, probably also other solutions mentioned before,
>>> so to use PgQ as backend for them should be much simpler...
>> JMS certainly does have the concept of transactions. Both distributed
>> ones through XA and two-phase commit, and local involving just one JMS
>> provider. I don't know about others, but would be surprised if they
>> didn't.
>
> Wait... I thought XA did two-phase commit, and then there was XA+ for
> *distributed* two-phase commit, which is much harder?

Well, I meant distributed as in one transaction manager, multiple resource managers, all participating in a single atomic transaction. I don't know what XA+ adds on top of that.

To be precise, being a Java-thing, JMS actually supports two-phase commit through JTA (Java Transaction API), not XA. It's the same design and interface, just defined as Java interfaces instead of at native library level.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com |
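For readers unfamiliar with the "one transaction manager, multiple resource managers" arrangement, here is a deliberately tiny sketch of the two-phase commit control flow. It is a toy model and not the XA or JTA interface: the `ResourceManager` class, its `can_prepare` switch, and `run_transaction` are invented for illustration, and the durable logging and crash recovery that real transaction managers depend on are left out entirely.

```python
class ResourceManager:
    """A toy participant: stages a change on prepare, applies it on commit."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare  # False simulates a "no" vote
        self.staged = None
        self.committed = []

    def prepare(self, change):
        if not self.can_prepare:
            return False  # vote "no"
        self.staged = change
        return True  # vote "yes": a promise to be able to commit later

    def commit(self):
        self.committed.append(self.staged)
        self.staged = None

    def rollback(self):
        self.staged = None


def run_transaction(participants, change):
    """Phase 1 collects votes; phase 2 commits everywhere or nowhere."""
    prepared = []
    for rm in participants:
        if rm.prepare(change):
            prepared.append(rm)
        else:
            # One "no" vote aborts the whole transaction.
            for p in prepared:
                p.rollback()
            return False
    for rm in participants:
        rm.commit()
    return True
```

In a message-queue setting the participants would be, say, the database and the queue manager, so that consuming a message, updating the database, and sending the reply all succeed or fail together.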