Unix Technical Forum

Postgresql replication

This is a discussion on Postgresql replication within the Pgsql General forums, part of the PostgreSQL category; --> On Thu, 2005-08-25 at 17:45 -0700, Jeff Davis wrote: > The replicator surely is not optional, and must be ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql General

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #41 (permalink)  
Old 04-09-2008, 05:41 AM
Matt Miller
 
Posts: n/a
Default Re: Postgresql replication

On Thu, 2005-08-25 at 17:45 -0700, Jeff Davis wrote:
> The replicator surely is not optional, and must be centralized.


>From http://pgcluster.projects.postgresql..._3/index.html:


"Several replication server can be set up. When an problem occurs at the
replication server, Cluster DB automatically changes connection to the
standing-by replication server. When all replication servers stop,
Cluster DB operates in stand-alone mode. There are two modes for
stand-alone mode: read only mode which only permits references, and read
write mode, which permits updating."

> PGCluster is query-based, right?


Yes.

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #42 (permalink)  
Old 04-09-2008, 05:41 AM
William Yu
 
Posts: n/a
Default Re: Postgresql replication

Chris Browne wrote:
> I'm pretty sure that they _don't_ track balance updates for each
> transaction that applies to a customer's account. You could, via one
> form of trickery or another, "overdraw" your account by a fairly hefty
> amount, and they probably won't notice for a day or even three. But
> once they notice/estimate that the Gentle Caller has built up some
> dangerously high balance, they'll warn of impending discontinuation of
> service if some sum isn't paid by some date.


This works for companies that have some degree of power over their
customers. E.g. pay up or we disconnect your service. Return your
overdrafts/pay your fees or we mess up your credit.

This doesn't work if it's a small company who's account has been
emptied. Sure the bank will refuse to honor the check but then that
company will be hit with overdraw penalties and possible legal penalties
to the payee for the bounced check.

The warning threshold system is easy to implement but there will always
be corner cases where the warning is not soon enough or a single payment
wipes out ~ 100% of the account. Warn too often and it'll be ignored by
people as a "boy crying wolf" alarm.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #43 (permalink)  
Old 04-09-2008, 05:41 AM
William Yu
 
Posts: n/a
Default Re: Postgresql replication

Chris Travers wrote:
> I guess I am thinking along different lines than you. I was thinking
> that the simplest solution would be to have master/slave replication for
> *approved* transactions only and no replication for initial commits
> prior to approval. This makes the assumption that a single transaction
> will be committed on a single server, and that a single transaction will
> not be split over multiple servers. In this way, you can commit a
> pending transaction to any single server, and when it is approved, it
> gets replicated via the master. See below for more.


This works if you don't care that multiple servers commit transactions
that force a budget or bank account to be exceeded.


> Thinking about this.... The big issue is that you only want to
> replicate the deltas, not the entire account. I am still thinking
> master/slave, but something where the deltas are replicated in the
> background or where the user, in checking his account, is actually
> querying the home server. This second issue could be done via dblink or
> DBI-Link and would simply require that a master table linking the
> accounts with home servers be replicated (this should, I think, be
> fairly low-overhead).


Except what you know have is your system fails if any server fail or is
inaccessible.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #44 (permalink)  
Old 04-09-2008, 05:41 AM
Chris Travers
 
Posts: n/a
Default Re: Postgresql replication

William Yu wrote:

> Chris Travers wrote:
>
>> I guess I am thinking along different lines than you. I was thinking
>> that the simplest solution would be to have master/slave replication
>> for *approved* transactions only and no replication for initial
>> commits prior to approval. This makes the assumption that a single
>> transaction will be committed on a single server, and that a single
>> transaction will not be split over multiple servers. In this way,
>> you can commit a pending transaction to any single server, and when
>> it is approved, it gets replicated via the master. See below for more.

>
>
> This works if you don't care that multiple servers commit transactions
> that force a budget or bank account to be exceeded.
>

Ok. then lets look at possibility B. (Async Multimaster Replication is
out).

>
>> Thinking about this.... The big issue is that you only want to
>> replicate the deltas, not the entire account. I am still thinking
>> master/slave, but something where the deltas are replicated in the
>> background or where the user, in checking his account, is actually
>> querying the home server. This second issue could be done via dblink
>> or DBI-Link and would simply require that a master table linking the
>> accounts with home servers be replicated (this should, I think, be
>> fairly low-overhead).

>
>
> Except what you know have is your system fails if any server fail or
> is inaccessible.
>

Ok. If you have a system where each location is authoritative for its
customers and the server transparently queries that server where needed
(via a view). then when any server becomes inaccessible then the
customers whose accounts are on that server become inaccessible. This
may not be accessible. But if this is the case, then you could treat
this as a partitioned table, where each partition is authoritative on
one location (see a previous post on how to do this) and then use Slony
to replicate. Again this does not get around your objection above
namely that it is possible to do duplicate transactions at multiple
locations. For this you would need an explicit call to the
authoritative server. I see no other way around that. This might allow
people to view tentative balances from other branches if the home (or
its connection) server is down, but they would not be able to withdraw
funds.

But if you do this, you have a different problem. Namely that
replication will be occuring over your slower than desired links. As
the number of branches grow, so will the bandwidth demands on every
branch. This may not therefore be scalable.

Unfortunately there is no free lunch here. And I think that at some
point you are going to have to choose between:

1) Efficiency of network throughput
2) Tolerance to attempts at repeat transactions before replication
(emptying an account multiple times)
3) Availability of a transaction.

You can pick any two. I think that this is the way it will work with
any other database system as well.

Best Wishes,
Chris Travers
Metatron Technology Consulting

> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>
>



---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #45 (permalink)  
Old 04-09-2008, 05:41 AM
William Yu
 
Posts: n/a
Default Re: Postgresql replication

Chris Travers wrote:
> 1) Efficiency of network throughput
> 2) Tolerance to attempts at repeat transactions before replication
> (emptying an account multiple times)
> 3) Availability of a transaction.


We ended up having to give up #1. It's possible to have our transactions
routed to multiple servers before it becomes a final transaction. User1
might request a payment on ServerA. User2 then marks the payment as
approved on ServerB. ServerC is authoritative and checks the bank/budget
balances before posting as final. After each of these steps requires
replication of the latest changes to all other servers. (In theory, the
first 2 steps only require replication to the authoritative server but
we do so anyways so all servers can act as backups for each other --
pending transactions still need to be restored in case of total DB failure.)

There's definitely a delay in terms of getting from point A to point Z;
duplicate servers. But there's guaranteed financial integrity, users can
connect to any server the load balancer picks and no server requires any
other server/connection to be up for individual user tranactions to occur.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #46 (permalink)  
Old 04-09-2008, 05:42 AM
Chris Travers
 
Posts: n/a
Default Re: Postgresql replication

William Yu wrote:

> Chris Travers wrote:
>
>> 1) Efficiency of network throughput
>> 2) Tolerance to attempts at repeat transactions before replication
>> (emptying an account multiple times)
>> 3) Availability of a transaction.

>
>
> We ended up having to give up #1. It's possible to have our
> transactions routed to multiple servers before it becomes a final
> transaction. User1 might request a payment on ServerA. User2 then
> marks the payment as approved on ServerB. ServerC is authoritative and
> checks the bank/budget balances before posting as final. After each of
> these steps requires replication of the latest changes to all other
> servers. (In theory, the first 2 steps only require replication to the
> authoritative server but we do so anyways so all servers can act as
> backups for each other -- pending transactions still need to be
> restored in case of total DB failure.)


Ok. I see where you are going with this.

It is an interesting problem. Multimaster Async Replication will give
you a problem in that it allows the attack you are describing due to the
asynchronous nature of the replication. If I were trying to prevent
this sort of attack, I would try to build into this some sort of
"account authority" which can manage these transactions. Origionally I
was thinking of the home server as the obvious place to start if it is
available. But if it is not, then you would need some infrastructure to
track attempted withdrawals and handle them appropriately. Such servers
could cache requests and if they see duplicates or many requests coming
from many servers on the same account could flag that. One option might
be to have a rotational authority (i.e. home-server, then next, then
next) in a ring so that an unavailable server still allows reasonable
precautions to be held against emptying the account many times.

Basically, if the servers aren't talking to eachother at the time of the
transaction, then they are not going to know about duplicates. You have
to have some setup beyond your replication to handle this. Otherwise
you will have some issues with delays causing the security risks you
deem unacceptable.

Your question seems to be "How do I set up multimaster async replication
such that a person cannot empty his account on each server" and the
answer is that this is an inherent limitation of multimaster async
replication. This also means that you will have to have some sort of
other verification process for such transactions beyond what is locally
available on the replicants.

> There's definitely a delay in terms of getting from point A to point
> Z; duplicate servers. But there's guaranteed financial integrity,
> users can connect to any server the load balancer picks and no server
> requires any other server/connection to be up for individual user
> tranactions to occur.


The delay will by definition defeat any guarantee of financial integrity
if you are allowing read-write operations to the replica without
checking with some sort of central authority. At very least, the
central authority should look for suspicious patterns. Again, it may be
possible to do some failover here, but I don't think you can do without
*some* sort of centralized control.

(Note, here load balancing is handled by the distribution of accounts.
A down server simply means that the next server in the ring will take
over its remote verification role).

This doesn;t make the security issue go away, but it may reduce it to an
acceptable level. I.e. it is still possible for duplicates to be
submitted just before and after a home server goes down, but this is a
lot better than being able to have one transaction repeated on each
server and then dealing with the massively overdrawn account.

Best Wishes,
Chris Travers
Metatron Technology Consulting

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #47 (permalink)  
Old 04-09-2008, 05:42 AM
William Yu
 
Posts: n/a
Default Re: Postgresql replication

Our own personal IM

Chris Travers wrote:
> The delay will by definition defeat any guarantee of financial integrity
> if you are allowing read-write operations to the replica without
> checking with some sort of central authority. At very least, the
> central authority should look for suspicious patterns. Again, it may be
> possible to do some failover here, but I don't think you can do without
> *some* sort of centralized control.


Actually this is the easy part. When the home server finally issues
payments, it only issues what it knows about and what can be verified as
OK. Any transactions that are currently being entered on another server
will appear after the next replication cycle and it will be verified
afterwards. If the previous payment issuing cycle used up all the money,
the "new" requests are kept in pending until money is put in. This does
allow for newer requests that happen to be executed on home servers to
possibly take precendence over old requests but there is no requirement
in the business process that payments must come out in any specific order.


> This doesn;t make the security issue go away, but it may reduce it to an
> acceptable level. I.e. it is still possible for duplicates to be
> submitted just before and after a home server goes down, but this is a
> lot better than being able to have one transaction repeated on each
> server and then dealing with the massively overdrawn account.


The "home" server going down is the trickiest issue. Because when a
server disappears, is that because it went down temporarily? For good? A
temporary internet problem where nobody can get access to it? Or an
internet routing issue where just the connection between those two
servers is severed? If it's the last, users might still be doing stuff
on ServerA with ServerA is posting financials but ServerB thinks the
server is down and decides to take over ServerA's duties. Of course, in
ServerA's view, it's ServerB and ServerC that's down -- not itself.

Maybe we can mitigate this by having more servers at more data centers
around the world so everybody can monitor everybody. At some point, if
you have N servers and N-1 servers say ServerA is down, it probably is
down. With a high enough N, ServerA could probably decisively decide it
was the server severed from the internet and refuse to post any
financials until connection to the outside world was restore + some
extra threshold.

This last problem, which luckily occurs rarely, we do by hand right now.
We're not ready to run this on full auto because we only have 2 data
centers (with multiple servers within each data center). The servers do
not have enough info to know which server is actually down in order to
auto-promote/demote. It does require staff that's not just in 1 location
though because our primary office going down w/ our local datacenter
would mean nobody there could do the switchover. (Assuming major natural
disaster that kept us from using our laptops at the local Starbucks to
do the work.)
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #48 (permalink)  
Old 04-09-2008, 05:42 AM
Chris Browne
 
Posts: n/a
Default Re: Postgresql replication

William Yu <wyu@talisys.com> writes:
> Chris Browne wrote:
>> I'm pretty sure that they _don't_ track balance updates for each
>> transaction that applies to a customer's account. You could, via one
>> form of trickery or another, "overdraw" your account by a fairly hefty
>> amount, and they probably won't notice for a day or even three. But
>> once they notice/estimate that the Gentle Caller has built up some
>> dangerously high balance, they'll warn of impending discontinuation of
>> service if some sum isn't paid by some date.

>
> This works for companies that have some degree of power over their
> customers. E.g. pay up or we disconnect your service. Return your
> overdrafts/pay your fees or we mess up your credit.
>
> This doesn't work if it's a small company who's account has been
> emptied. Sure the bank will refuse to honor the check but then that
> company will be hit with overdraw penalties and possible legal
> penalties to the payee for the bounced check.
>
> The warning threshold system is easy to implement but there will
> always be corner cases where the warning is not soon enough or a
> single payment wipes out ~ 100% of the account. Warn too often and
> it'll be ignored by people as a "boy crying wolf" alarm.


In a context where there is no "degree of power over their customers,"
I would hardly think that the presence/lack of automated controls or
presence/lack of balance synchronization is particularly material.

In other words, if trying to apply policies is going to forcibly
break, then building the data system may have been a futile exercise
in the first place. And trying to distribute the system may again be
a *business* mistake that admits no technical solution.

If that's the case, then we can conclude that replication is no
solution, and that the organization should improve connectivity for
their single centralized system.

That may be the case, but is the uninteresting case, as it amounts to
throwing our hands up, saying "no answers; you can't have a
distributed system," and giving up on trying anything further. No
point to further discussion...
--
let name="cbbrowne" and tld="acm.org" in String.concat "@" [name;tld];;
http://cbbrowne.com/info/linux.html
"There is no reason anyone would want a computer in their home".
-- Ken Olson, Pres. and founder of Digital Equipment Corp. 1977
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #49 (permalink)  
Old 04-09-2008, 05:44 AM
Andrew Sullivan
 
Posts: n/a
Default Re: Postgresql replication

On Thu, Aug 25, 2005 at 01:44:15PM +0200, Bohdan Linda wrote:
> there are some other db solutions which have good performance when doing
> this kind of replication across the world.


Bluntly, "No."


--
Andrew Sullivan | ajs@crankycanuck.ca
Information security isn't a technological problem. It's an economics
problem.
--Bruce Schneier

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #50 (permalink)  
Old 04-09-2008, 05:44 AM
Andrew Sullivan
 
Posts: n/a
Default Re: Postgresql replication

On Wed, Aug 24, 2005 at 11:54:42AM -0400, Chris Browne wrote:
> There is a Slony-II project ongoing that is trying to construct a
> more-or-less synchronous multimaster replication system (where part of
> the cleverness involves trying to get as much taking place in an
> asynchronous fashion as possible) that would almost certainly be of no
> use to your "use case."


Just to emphasise this point: assuming we ever get Slony-II to work,
it is all but guaranteed to be useless for cases like the one that
started this thread: it'll simply require very fast network
connections to work. I've had more than one person ask me when
multi-site multimaster is coming, and my answer is always, "Have you
started work on it yet?" I think there might be a way to hack up
Slony-I to do it -- Josh Berkus gave me a quick outline while at
OSCON that made me think it possible -- but AFAIK, nobody is actually
doing that work.

It's worth noting that single-origin master-slave async replication
is tricky, but by no means impossible. Multi-master _anything_ is
hard, no question about it; and it more or less always imposes some
overhead that you won't like. The question is merely whether you
want to pay that price.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
A certain description of men are for getting out of debt, yet are
against all taxes for raising money to pay it off.
--Alexander Hamilton

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 09:29 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com