Unix Technical Forum

which Xeon processors don't have the context switching problem

  #21 (permalink)  
Old 04-19-2008, 09:17 AM
Jeff Davis
 
Posts: n/a
Default Re: Two hard drives --- what to do with them?

On Tue, 2007-02-27 at 01:11 +0100, Peter Kovacs wrote:
> On 2/26/07, Jeff Davis <pgsql@j-davis.com> wrote:
> > On Sun, 2007-02-25 at 23:11 +0100, Peter Kovacs wrote:
> > > A related question:
> > > Is it sufficient to disable write cache only on the disk where pg_xlog
> > > is located? Or should write cache be disabled on both disks?
> > >

> >
> > When PostgreSQL does a checkpoint, it thinks the data pages before the
> > checkpoint have successfully made it to disk.
> >
> > If the write cache holds those data pages, and then loses them, there's
> > no way for PostgreSQL to recover. So use a battery backed cache or turn
> > off the write cache.

>
> Sorry for not being familiar with storage technologies... Does
> "battery" here mean battery in the common sense of the word - some
> kind of independent power supply? Shouldn't the disk itself be backed
> by a battery? As should the entire storage subsystem?
>


Yes, a battery that can hold power to keep data alive in the write cache
in case of power failure, etc., for a long enough time to recover and
commit the data to disk.

So, a write cache is OK (even for pg_xlog) if it is durable (i.e. on
permanent storage or backed by enough power to make sure it gets there).
However, if PostgreSQL has no way to know whether a write is durable or
not, it can't guarantee the data is safe.

The reason this becomes an issue is that many consumer-grade disks have
write cache enabled by default and no way to make sure the cached data
actually gets written. So, essentially, these disks "lie" and say they
wrote the data, when in reality, it's in volatile memory. It's
recommended that you disable write cache on such a device.
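
To make the durability point concrete, here is a minimal Python sketch of how an application asks for a durable write on a POSIX system. It is an illustration only, not PostgreSQL's actual code, and the helper name durable_write is hypothetical. Note that os.fsync() can only push the data as far as the drive: if the drive acknowledges the flush while the data still sits in a volatile write cache, the call returns successfully and the data can still be lost on power failure.

```python
import os


def durable_write(path, data):
    """Write data to path and ask the OS to flush it to stable storage.

    Hypothetical helper for illustration. It assumes a POSIX system where a
    directory can be opened and fsync'ed.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Flush the file contents from the OS page cache to the device.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Flush the directory entry too, so a newly created file survives a crash.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Everything in this sketch depends on the device honoring the flush, which is exactly the guarantee a "lying" disk breaks.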

Regards,
Jeff Davis


---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

  #22 (permalink)  
Old 04-19-2008, 09:17 AM
Shane Ambler
 
Posts: n/a
Default Re: Two hard drives --- what to do with them?

Jeff Davis wrote:

>> Sorry for not being familiar with storage technologies... Does
>> "battery" here mean battery in the common sense of the word - some
>> kind of independent power supply? Shouldn't the disk itself be backed
>> by a battery? As should the entire storage subsystem?
>>

>
> Yes, a battery that can hold power to keep data alive in the write cache
> in case of power failure, etc., for a long enough time to recover and
> commit the data to disk.


Just to expand a bit: battery backup options are available on some
RAID cards, so that is where you would look for one. I don't know of
any hard drives that have it built in.

Of course, this is another reason to have a UPS for the server: it keeps
the server running long enough after the clients have gone down to ensure
everything is on disk and to shut down properly.

> So, a write cache is OK (even for pg_xlog) if it is durable (i.e. on
> permanent storage or backed by enough power to make sure it gets there).
> However, if PostgreSQL has no way to know whether a write is durable or
> not, it can't guarantee the data is safe.
>
> The reason this becomes an issue is that many consumer-grade disks have
> write cache enabled by default and no way to make sure the cached data
> actually gets written. So, essentially, these disks "lie" and say they
> wrote the data, when in reality, it's in volatile memory. It's
> recommended that you disable write cache on such a device.


From all that I have heard this is another advantage of SCSI disks -
they honor these settings as you would expect - many IDE/SATA disks
often say "sure I'll disable the cache" but continue to use it or don't
retain the setting after restart.


--

Shane Ambler
pgSQL@Sheeky.Biz

Get Sheeky @ http://Sheeky.Biz

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

  #23 (permalink)  
Old 04-19-2008, 09:17 AM
Peter Kovacs
 
Posts: n/a
Default Re: Two hard drives --- what to do with them?

On 2/27/07, Shane Ambler <pgsql@sheeky.biz> wrote:
> Jeff Davis wrote:
>
> >> Sorry for not being familiar with storage technologies... Does
> >> "battery" here mean battery in the common sense of the word - some
> >> kind of independent power supply? Shouldn't the disk itself be backed
> >> by a battery? As should the entire storage subsystem?
> >>

> >
> > Yes, a battery that can hold power to keep data alive in the write cache
> > in case of power failure, etc., for a long enough time to recover and
> > commit the data to disk.

>
> Just to expand a bit - the battery backup options are available on some
> RAID cards - that is where you would be looking for it. I don't know of
> any hard drives that have it built in.
>
> Of course another reason to have a UPS for the server - keep it running
> long enough after the clients have gone down so that it can ensure
> everything is on disk and shuts down properly.
>
> > So, a write cache is OK (even for pg_xlog) if it is durable (i.e. on
> > permanent storage or backed by enough power to make sure it gets there).
> > However, if PostgreSQL has no way to know whether a write is durable or
> > not, it can't guarantee the data is safe.
> >
> > The reason this becomes an issue is that many consumer-grade disks have
> > write cache enabled by default and no way to make sure the cached data
> > actually gets written. So, essentially, these disks "lie" and say they
> > wrote the data, when in reality, it's in volatile memory. It's
> > recommended that you disable write cache on such a device.

>
> From all that I have heard this is another advantage of SCSI disks -
> they honor these settings as you would expect - many IDE/SATA disks
> often say "sure I'll disable the cache" but continue to use it or don't
> retain the setting after restart.


As far as I know, SCSI drives also have "write cache" which is turned
off by default, but can be turned on (e.g. with the sdparm utility on
Linux). The reason I am so interested in how write cache is
typically used (on or off) is that I recently ran our benchmarks on a
machine with SCSI disks and those benchmarks with a high commit ratio
suffered significantly compared to our previous results
"traditionally" obtained on machines with IDE drives.

I wonder if running a machine on a UPS + 1 hot standby internal PS is
equivalent, in terms of data integrity, to using battery backed write
cache. Instinctively, I'd think that UPS + 1 hot standby internal PS
is better, since this setup also provides for the disk to actually
write out the content of the cache -- as you pointed out.
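
For the write-cache settings discussed earlier in this message, the following Python sketch builds the commands one would typically run on Linux to query or disable a drive's write cache. It assumes the sdparm and hdparm utilities are installed and that the caller has root privileges; the function names and the device path in the usage note are placeholders, not something taken from this thread.

```python
import subprocess

# Command templates per bus type. WCE is the SCSI "write cache enable" bit;
# hdparm's -W flag reads or sets the ATA write-caching feature.
_COMMANDS = {
    ("scsi", "query"): ["sdparm", "--get=WCE"],
    ("scsi", "disable"): ["sdparm", "--clear=WCE"],
    ("ata", "query"): ["hdparm", "-W"],
    ("ata", "disable"): ["hdparm", "-W0"],
}


def write_cache_command(device, bus, action):
    """Return the argv list for querying or disabling the write cache."""
    try:
        template = _COMMANDS[(bus, action)]
    except KeyError:
        raise ValueError(f"unsupported bus/action: {bus!r}/{action!r}") from None
    return template + [device]


def run_write_cache_command(device, bus, action):
    """Run the command and return its output (needs root and the real tool)."""
    result = subprocess.run(
        write_cache_command(device, bus, action),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, write_cache_command("/dev/sda", "scsi", "query") builds the sdparm call that reports the WCE bit. Whether a changed setting survives a power cycle depends on the drive, so it is worth querying again after a reboot.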

Thanks
Peter

  #24 (permalink)  
Old 04-19-2008, 09:17 AM
Ben
 
Posts: n/a
Default Re: Two hard drives --- what to do with them?

Just remember that batteries (in both RAID cards and UPSes) wear out
and will eventually have to be replaced. It depends how critical your
data is, but if you only have a UPS, you risk data loss on the off
chance that the power fails after your UPS battery has worn out.

On Feb 27, 2007, at 12:27 AM, Peter Kovacs wrote:

> I wonder if running a machine on a UPS + 1 hot standby internal PS is
> equivalent, in terms of data integrity, to using battery backed write
> cache. Instinctively, I'd think that UPS + 1 hot standby internal PS
> is better, since this setup also provides for the disk to actually
> write out the content of the cache -- as you pointed out.



  #25 (permalink)  
Old 04-19-2008, 09:17 AM
Shane Ambler
 
Posts: n/a
Default Re: Two hard drives --- what to do with them?

Peter Kovacs wrote:

>> > The reason this becomes an issue is that many consumer-grade disks have
>> > write cache enabled by default and no way to make sure the cached data
>> > actually gets written. So, essentially, these disks "lie" and say they
>> > wrote the data, when in reality, it's in volatile memory. It's
>> > recommended that you disable write cache on such a device.

>>
>> From all that I have heard this is another advantage of SCSI disks -
>> they honor these settings as you would expect - many IDE/SATA disks
>> often say "sure I'll disable the cache" but continue to use it or don't
>> retain the setting after restart.

>
> As far as I know, SCSI drives also have "write cache" which is turned
> off by default, but can be turned on (e.g. with the sdparm utility on
> Linux). The reason I am so interested in how write cache is
> typically used (on or off) is that I recently ran our benchmarks on a
> machine with SCSI disks and those benchmarks with a high commit ratio
> suffered significantly compared to our previous results
> "traditionally" obtained on machines with IDE drives.


Most likely. With the write cache on, the drive puts incoming data into
its cache, says "yep, all done", and you carry on while it writes the
data to the platters. But if the power goes out while it's doing that,
you've got trouble.

The difference between SCSI and IDE/SATA here is that a lot of IDE/SATA
drives, if not all, tell you the cache is disabled when you ask them to
disable it, but either don't actually disable it or don't retain the
setting, so you get caught later. SCSI disks can be trusted when you set
this option.

> I wonder if running a machine on a UPS + 1 hot standby internal PS is
> equivalent, in terms of data integrity, to using battery backed write
> cache. Instinctively, I'd think that UPS + 1 hot standby internal PS
> is better, since this setup also provides for the disk to actually
> write out the content of the cache -- as you pointed out.
>


This is covering two different scenarios.
The UPS maintains power in the event of a black out.
The hot standby internal PS maintains power when the first PS dies.

It is a good choice to have both, as a PS dying will be just as bad as
losing power without a UPS, and the UPS won't save you if the PS goes.

A battery-backed RAID card sits in between these. As long as the
drive's own write cache is off, the RAID card holds data that was sent
to disk until the drive confirms it has been written. The battery will
even preserve that data until the machine is switched back on, at which
point the card completes the write. That would cover you even if the PS
goes.


--

Shane Ambler
pgSQL@Sheeky.Biz

Get Sheeky @ http://Sheeky.Biz

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

  #26 (permalink)  
Old 04-19-2008, 09:17 AM
Jeff Davis
 
Posts: n/a
Default Re: Two hard drives --- what to do with them?

On Tue, 2007-02-27 at 09:27 +0100, Peter Kovacs wrote:
> I wonder if running a machine on a UPS + 1 hot standby internal PS is
> equivalent, in terms of data integrity, to using battery backed write
> cache. Instinctively, I'd think that UPS + 1 hot standby internal PS
> is better, since this setup also provides for the disk to actually
> write out the content of the cache -- as you pointed out.


It's all about the degree of safety. A battery-backed cache on a RAID
controller sits below all of these points of failure:

* External power
* Power supply
* Operating system

and with proper system administration, can recover from any transient
errors in the above. Keep in mind that it can only recover from
transient failures: if you have a long blackout that outlasts your UPS
and cache battery, you can still have data loss. Also, you need a very
responsive system administrator that can make sure that data gets to
disk in case of failure.

Let's say you have a RAID system but you rely on the UPS to make sure
the data hits disk. Well, now if you have an OS crash (caused by another
piece of hardware failing, perhaps), you've lost your data.

If you can afford it (in terms of dollars or performance hit) go with
the safe solution.

Also, put things in context. The chances of failure due to these kinds
of things are fairly low. If it's more likely that someone spills coffee
on your server than the UPS fails, it doesn't make sense to spend huge
amounts of money on NVRAM (or something) to store your data. So identify
the highest-risk scenarios and prevent those first.

Also keep in mind what the cost of failure is: a few hundred bucks more
on a better RAID controller is probably a good value if it prevents a
day of chaos and unhappy customers.
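
The trade-off above can be put into rough numbers. The following Python sketch compares the expected yearly loss from a failure scenario with the yearly cost of preventing it. All figures are hypothetical placeholders chosen for illustration, and the function names are made up for this example; plug in your own estimates.

```python
def expected_annual_loss(failures_per_year, cost_per_failure):
    """Expected yearly cost of a failure scenario (frequency times impact)."""
    return failures_per_year * cost_per_failure


def mitigation_pays_off(mitigation_cost_per_year, failures_per_year,
                        cost_per_failure, fraction_prevented=1.0):
    """True if the loss avoided per year exceeds what the mitigation costs."""
    avoided = (expected_annual_loss(failures_per_year, cost_per_failure)
               * fraction_prevented)
    return avoided > mitigation_cost_per_year


if __name__ == "__main__":
    # Hypothetical numbers: a power-related corruption event once in 50 years
    # that costs 20,000 in downtime, versus a RAID controller whose price
    # works out to 300 per year.
    print(expected_annual_loss(1 / 50, 20_000))
    print(mitigation_pays_off(300, 1 / 50, 20_000))
```

Running the same comparison for each risk (power, power supply, OS crash, spilled coffee) gives a crude ranking of which scenario to address first.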

Regards,
Jeff Davis


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

  #27 (permalink)  
Old 04-19-2008, 09:17 AM
Scott Marlowe
 
Posts: n/a
Default Re: Two hard drives --- what to do with them?

On Tue, 2007-02-27 at 13:23, Jeff Davis wrote:
> Also, put things in context. The chances of failure due to these kinds
> of things are fairly low. If it's more likely that someone spills coffee
> on your server than the UPS fails, it doesn't make sense to spend huge
> amounts of money on NVRAM (or something) to store your data. So identify
> the highest-risk scenarios and prevent those first.
>
> Also keep in mind what the cost of failure is: a few hundred bucks more
> on a better RAID controller is probably a good value if it prevents a
> day of chaos and unhappy customers.


Just FYI, I can testify to the happiness a good battery backed caching
RAID controller can bring. I had the only server that survived a
complete power grid failure in the data center where I used to work. A
piece of wire blew out a power conditioner, which killed the other power
conditioner, all three UPSes and the switch to bring the diesel
generator online.

The only problem the pgsql server had coming back up was that it had
remote NFS mounts it used for file storage that weren't able to boot up
fast enough, so we just waited a few minutes and rebooted it.

All of our other database servers had to be restored from backup due to
massive data corruption because someone had decided that NFS mounts were
a good idea under databases.

  #28 (permalink)  
Old 04-19-2008, 09:18 AM
Bruno Wolff III
 
Posts: n/a
Default Re: Two hard drives --- what to do with them?

On Sun, Feb 25, 2007 at 23:11:01 +0100,
Peter Kovacs <maxottovonstirlitz@gmail.com> wrote:
> A related question:
> Is it sufficient to disable write cache only on the disk where pg_xlog
> is located? Or should write cache be disabled on both disks?


With recent Linux kernels you may also have the option to use write
barriers instead of disabling caching. You need to make sure every layer
of your stacked block devices handles barriers, and most software RAID
levels (other than RAID 1) won't. This won't be a lot faster, since at
sync points the OS needs to order a cache flush, but it does give the
disks a chance to reorder some commands in between flushes.
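
As a rough way to see which mounted filesystems have barriers requested, this Python sketch parses text in the /proc/mounts format and looks for ext3/ext4-style mount options (barrier=1, barrier=0, barrier, nobarrier). That option naming is an assumption: other filesystems use different options or none, and a missing option only means the filesystem default applies. The function names are hypothetical.

```python
def barrier_setting(options):
    """Interpret a comma-separated mount option string.

    Returns True if barriers are explicitly on, False if explicitly off,
    and None if the options do not say (the filesystem default applies).
    """
    opts = options.split(",")
    if "nobarrier" in opts or "barrier=0" in opts:
        return False
    if "barrier" in opts or "barrier=1" in opts:
        return True
    return None


def barriers_by_mount_point(mounts_text):
    """Map each mount point in /proc/mounts-style text to its barrier setting."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, _fstype, options = fields[:4]
        result[mount_point] = barrier_setting(options)
    return result
```

On a Linux machine the input would come from reading /proc/mounts. A mount option only shows what was requested; it does not prove that every block layer underneath passes the barrier through.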

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

  #29 (permalink)  
Old 04-19-2008, 09:18 AM
Bruno Wolff III
 
Posts: n/a
Default Re: Two hard drives --- what to do with them?

On Tue, Feb 27, 2007 at 15:35:13 +1030,
Shane Ambler <pgsql@Sheeky.Biz> wrote:
>
> From all that I have heard this is another advantage of SCSI disks -
> they honor these settings as you would expect - many IDE/SATA disks
> often say "sure I'll disable the cache" but continue to use it or don't
> retain the setting after restart.


It is easy enough to test whether your disks lie about disabling the cache.
I doubt that it is all that common for modern disks to do that.
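
One common way to run such a test, sketched here in Python as an assumption rather than as the method the author had in mind: rewrite the same block and fsync after every write, then compare the rate with the drive's rotation speed. A single rotating disk with its cache really off can complete at most about one such write per platter rotation (about 120 per second at 7200 rpm), so a much higher rate suggests the data is being acknowledged from a cache. The function names and the slack factor are hypothetical, and the heuristic does not apply to SSDs or to disks behind a battery-backed controller cache.

```python
import os
import tempfile
import time


def max_honest_fsyncs_per_sec(rpm):
    """Rough upper bound: one synchronous rewrite of a block per rotation."""
    return rpm / 60.0


def measure_fsync_rate(directory, count=200):
    """Rewrite one 512-byte block `count` times, fsync'ing after each write."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for i in range(count):
            os.lseek(fd, 0, os.SEEK_SET)
            # Alternate the contents so every write changes the block.
            os.write(fd, (b"a" if i % 2 else b"b") * 512)
            os.fsync(fd)
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed


def looks_suspicious(measured_rate, rpm, slack=1.5):
    """True if the measured rate is well above what the rotation speed allows."""
    return measured_rate > slack * max_honest_fsyncs_per_sec(rpm)
```

To use it, call measure_fsync_rate() on a directory that lives on the disk under test and pass the result to looks_suspicious() together with the drive's rpm. A filesystem may issue more than one physical write per fsync, which only lowers the honest rate further.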

  #30 (permalink)  
Old 04-19-2008, 09:18 AM
Bruno Wolff III
 
Posts: n/a
Default Re: Two hard drives --- what to do with them?

On Wed, Feb 28, 2007 at 05:21:41 +1030,
Shane Ambler <pgsql@Sheeky.Biz> wrote:
>
> The difference between SCSI and IDE/SATA here is that a lot of IDE/SATA
> drives, if not all, tell you the cache is disabled when you ask them to
> disable it, but either don't actually disable it or don't retain the
> setting, so you get caught later. SCSI disks can be trusted when you set
> this option.


I have some Western Digital Caviars and they don't lie about disabling
write caching.
