Unix Technical Forum

SEO

vBulletin Search Engine Optimization


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 05-13-2008, 06:16 PM
Francisco Reyes
 
Posts: n/a
Default RAID controllers for Postgresql on large setups

Inheritted a number of servers and I am starting to look into the hardware.

So far what I know from a few of the servers
Redhat servers.
15K rpm disks, 12GB to 32GB of RAM.
Adaptec 2120 SCSI controller (64MB of cache).

The servers have mostly have 12 drives in RAID 10.
We are going to redo one machine to compare RAID 10 vs RAID 50.
Mostly to see if the perfomance is close, the space gain may be usefull.

The usage pattern is mostly large set of transactions ie bulk loads of
millions of rows, queries involving tens of millions of rows. There are
usually only a handfull of connections at once, but I have seen it go up to
10 in the few weeks I have been at the new job. The rows are not very wide.
Mostly 30 to 90 bytes. The few that will be wider will be summary tables
that will be read straight up without joins and indexed on the fields we
will be quering them. Most of the connections will all be doing bulk
reads/updates/writes.

Some of the larger tables have nearly 1 billion rows and most have tens of
millions. Most DBs are under 500GB, since they had split the data as to keep
each machine somewhat evenly balanced compared to the others.

I noticed the machine we are about to redo doesn't have a BBU.

A few questions.
Will it pay to go to a controller with higher memory for existing machines?
The one machine I am about to redo has PCI which seems to
somewhat limit our options. So far I have found another Adaptec controller,
2130SLP, that has 128MB and is also just plain PCI. I need to decide whether
to buy the BBU for the 2120 or get a new controller with more memory and a
BBU. For DBs with bulk updates/inserts is 128MB write cache even enough to
achieve reasonable rates? (ie at least 5K inserts/sec)

A broader question
For large setups (ie 500GB+ per server) does it make sense to try to get a
controller in a machine or do SANs have better throughput even if at a much
higher cost?

For future machines I plan to look into controllers with at least 512MB,
which likely will be PCI-X/PCI-e.. not seen anything with large caches for
PCI. Also the machines in question have SCSI drives, not SAS. I believe the
most recent machine has SAS, but the others may be 15K rpm scsi

Whether a SAN or just an external enclosure is 12disk enough to substain 5K
inserts/updates per second on rows in the 30 to 90bytes territory? At
5K/second inserting/updating 100 Million records would take 5.5 hours. That
is fairly reasonable if we can achieve. Faster would be better, but it
depends on what it would cost to achieve.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 05-13-2008, 06:16 PM
Joshua D. Drake
 
Posts: n/a
Default Re: RAID controllers for Postgresql on large setups

On Mon, 12 May 2008 22:04:03 -0400
Francisco Reyes <lists@stringsutils.com> wrote:

> Inheritted a number of servers and I am starting to look into the
> hardware.
>
> So far what I know from a few of the servers
> Redhat servers.
> 15K rpm disks, 12GB to 32GB of RAM.
> Adaptec 2120 SCSI controller (64MB of cache).
>
> The servers have mostly have 12 drives in RAID 10.
> We are going to redo one machine to compare RAID 10 vs RAID 50.
> Mostly to see if the perfomance is close, the space gain may be
> usefull.


Most likely you have a scsi onboard as well I am guessing. You
shouldn't bother with the 2120. My tests show it is a horrible
controller for random writes.

Comparing software raid on an LSI onboard for an IBM 345 versus a 2120s
using hardware raid 10, the software raid completely blew the adaptec
away.

Joshua D. Drake
--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate



-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFIKPjGATb/zqfZUUQRAs2wAKCiMSwYFF2dYYw4+v7FtqskIYBtswCdGdFd
UD1yjsleqkk446IiYaol4+g=
=+MUg
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 05-13-2008, 06:16 PM
Francisco Reyes
 
Posts: n/a
Default Re: RAID controllers for Postgresql on large setups

Joshua D. Drake writes:

> Most likely you have a scsi onboard as well I am guessing.


Will check.


> shouldn't bother with the 2120. My tests show it is a horrible
> controller for random writes.


Thanks for the feedback..

> Comparing software raid on an LSI onboard for an IBM 345 versus a 2120s
> using hardware raid 10, the software raid completely blew the adaptec
> away.


Any PCI controller you have had good experience with?
How any other PCI-X/PCI-e controller that you have had good results?

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 05-13-2008, 06:16 PM
Chris Ruprecht
 
Posts: n/a
Default Re: RAID controllers for Postgresql on large setups

Joshua,

did you try to run the 345 on an IBM ServeRAID 6i?
I have one in mine, but I never actually ran any speed test.
Do you have any benchmarks that I could run and compare?

best regards,
chris
--
chris ruprecht
database grunt and bit pusher extraordinaíre


On May 12, 2008, at 22:11, Joshua D. Drake wrote:

> On Mon, 12 May 2008 22:04:03 -0400
> Francisco Reyes <lists@stringsutils.com> wrote:
>
>> Inheritted a number of servers and I am starting to look into the
>>


[snip]

> Comparing software raid on an LSI onboard for an IBM 345 versus a
> 2120s
> using hardware raid 10, the software raid completely blew the adaptec
> away.


[more snip]
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 05-13-2008, 06:16 PM
Joshua D. Drake
 
Posts: n/a
Default Re: RAID controllers for Postgresql on large setups

Chris Ruprecht wrote:
> Joshua,
>
> did you try to run the 345 on an IBM ServeRAID 6i?


No the only controllers I had at the time were the 2120 and the LSI on
board that is limited to RAID 1. I put the drives on the LSI in JBOD and
used Linux software raid.

The key identifier for me was using a single writer over 6 (RAID 10)
drives with the 2120 I could get ~ 16 megs a second. The moment I went
to multiple writers it dropped exponentially.

However with software raid I was able to sustain ~ 16 megs a second over
multiple threads. I stopped testing at 4 threads when I was getting 16
megs per thread . I was happy at that point.


Joshua D. Drake



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 05-13-2008, 06:16 PM
Joshua D. Drake
 
Posts: n/a
Default Re: RAID controllers for Postgresql on large setups

Francisco Reyes wrote:
> Joshua D. Drake writes:
>
>
> Any PCI controller you have had good experience with?


I don't have any PCI test data.

> How any other PCI-X/PCI-e controller that you have had good results?


http://www.commandprompt.com/blogs/j...or_postgresql/

If you are digging for used see if you can pick up a 64xx series from
HP. A very nice card that can generally be had for reasonable dollars.

http://cgi.ebay.com/HP-Compaq-SMART-...QQcmdZViewItem

If you want new, definitely go with the P800.

Sincerely,

Joshua D. Drake


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 05-13-2008, 06:16 PM
Greg Smith
 
Posts: n/a
Default Re: RAID controllers for Postgresql on large setups

On Mon, 12 May 2008, Francisco Reyes wrote:

> We are going to redo one machine to compare RAID 10 vs RAID 50. Mostly to
> see if the perfomance is close, the space gain may be usefull.


Good luck with that, you'll need it.

> Will it pay to go to a controller with higher memory for existing
> machines? The one machine I am about to redo has PCI which seems to
> somewhat limit our options. So far I have found another Adaptec
> controller, 2130SLP, that has 128MB and is also just plain PCI. I need
> to decide whether to buy the BBU for the 2120 or get a new controller
> with more memory and a BBU.


These options are both pretty miserable. I hear rumors that Adaptec makes
controllers that work OK under Linux , I've never seen one. A quick
search suggests both the 2120 and 2130SLP are pretty bad. The suggestions
Joshua already gave look like much better ideas.

Considering your goals here, I personally wouldn't put a penny into a
system that wasn't pretty modern. I think you've got too aggressive a
target for database size combined with commit rate to be playing with
hardware unless it's new enough to support PCI-Express cards.

> For DBs with bulk updates/inserts is 128MB write cache even enough to
> achieve reasonable rates? (ie at least 5K inserts/sec)


This really depends on how far the data is spread across disk. You'll
probably be OK on inserts. Let's make a wild guess and say we fit 80
100-byte records in each 8K database block. If you have 5000/second,
that's 63 8K blocks/second which works out to 0.5MB/s of writes. Pretty
easy, unless there's a lot of indexes involved as well. But an update can
require reading in a 8K block, modifying it, then writing another back out
again. In the worst case, if your data was sparse enough (which is
frighteningly possible when I hear you mention a billion records) that
every update was hitting a unique block, 5K/sec * 8K = 39MB/second of
reads *and* writes. That doesn't sound like horribly much, but that's
pretty tough if there's a lot of seeking involved in there.

Now, in reality, many of your small records will be clumped into each
block on these updates and a lot of writes are deferred until checkpoint
time which gives more time to aggregate across shared blocks. You'll
actually be somewhere in the middle of 0.5 and 78MB/s, which is a pretty
wide range. It's hard to estimate too closely here without a lot more
information about the database, the application, what version of
PostgreSQL you're using, all sorts of info.

You really should be thinking in terms of benchmarking the current
hardware first to try and draw some estimates you can extrapolate from.
Theoretical comments are a very weak substitute for real-world
benchmarking on the application itself, even if that benchmarking is done
on less capable hardware. Run some tests, measure your update rate while
also measuring real I/O rate with vmstat, compare that I/O rate to the
disk's sequential/random performance as measured via bonnie++, and now
there's a set of figures that mean something you can estimate based on.

> For large setups (ie 500GB+ per server) does it make sense to try to get a
> controller in a machine or do SANs have better throughput even if at a much
> higher cost?


That's not a large setup nowadays, certainly not large enough that a SAN
would be required to get reasonable performance. You may need an array
that's external to the server itself, but a SAN includes more than just
that.

There are a lot of arguments on both sides for using SANs; see
http://wiki.postgresql.org/wiki/Direct_Storage_vs._SAN for a summary and
link to recent discussion where this was thrashed about heavily. If
you're still considering RAID5 and PCI controllers you're still a bit in
denial about the needs of your situation here, but jumping right from
there to assuming you need a SAN is likely overkill.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 05-13-2008, 06:16 PM
Francisco Reyes
 
Posts: n/a
Default Re: RAID controllers for Postgresql on large setups

PFC writes:

> You say that like you don't mind having PCI in a server whose job is to
> perform massive query over large data sets.


I am in my 4th week at a new job. Trying to figure what I am working with.
From what I see I will likely get as much improvement from new hardware as
from re-doing some of the database design. Can't get everything done at
once, not to mention I have to redo one machine sooner rather than later so
I need to prioritize.

>In fact for bulk IO a box with 2 SATA drives would be just as fast as
> your monster RAID, lol.


I am working on setting up a standard test based on the type of operations
that the company does. This will give me a beter idea. Specially I will work
with the developers to make sure the queries I create for the benchmark are
representative of the workload.

>Adding more drives will help random reads/writes but do nothing for
> throughput since the tiny PCI pipe is choking.


Understood, but right now I have to use the hardware they already have. Just
trying to make the most of it. I believe another server is due in some
months so then I can better plan.

In your opinion if we get a new machine with PCI-e, at how many spindles
will the SCSI random access superiority start to be less notable? Specially
given the low number of connections we usually have running against these
machines.

>If you mean doing large COPY or inserting/updating lots of rows using one
> SQL statement, you are going to need disk bandwidth.


We are using one single SQL statement.

> http://tweakers.net/reviews/557/17/c...pagina-17.html


I have heard great stories about Areca controllers. That is definitely one
in my list to research and consider.

> However RAID5 will choke and burn on small random writes, which will come
> from UPDATing random rows in a large table, updating indexes, etc. Since
> you are doing this apparently, RAID5 is therefore NOT advised !


I thought I read a while back in this list that as the number of drives
increased that RAID 5 was less bad. Say an external enclosure with 20+
drives.


>Have you considered Bizgres ?


Yes. In my todo list, to check it further. I have also considered Greenplums
may DB offering that has clustering, but when I initially mentioned it there
was some reluctance because of cost. Also will look into Enterprise DB.

Right now I am trying to learn usage patterns, what DBs need to be
re-designed and what hardware I have to work with. Not to mention learning
what all these tables are. Also need to make time to research/get a good
ER-diagram tool and document all these DBs.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 05-13-2008, 06:16 PM
Merlin Moncure
 
Posts: n/a
Default Re: RAID controllers for Postgresql on large setups

On Tue, May 13, 2008 at 8:00 AM, Francisco Reyes <lists@stringsutils.com> wrote:
> PFC writes:
>
>
> > You say that like you don't mind having PCI in a server whose job

> is to perform massive query over large data sets.
> >

>
> I am in my 4th week at a new job. Trying to figure what I am working with.
> From what I see I will likely get as much improvement from new hardware as
> from re-doing some of the database design. Can't get everything done at
> once, not to mention I have to redo one machine sooner rather than later so
> I need to prioritize.
>
>
>
> > In fact for bulk IO a box with 2 SATA drives would be just as fast as

> your monster RAID, lol.
> >

>
> I am working on setting up a standard test based on the type of operations
> that the company does. This will give me a beter idea. Specially I will work
> with the developers to make sure the queries I create for the benchmark are
> representative of the workload.
>
>
>
> > Adding more drives will help random reads/writes but do nothing for

> throughput since the tiny PCI pipe is choking.
> >

>
> Understood, but right now I have to use the hardware they already have.
> Just trying to make the most of it. I believe another server is due in some
> months so then I can better plan.
>
> In your opinion if we get a new machine with PCI-e, at how many spindles
> will the SCSI random access superiority start to be less notable? Specially
> given the low number of connections we usually have running against these
> machines.
>
>
>
> > However RAID5 will choke and burn on small random writes, which

> will come from UPDATing random rows in a large table, updating indexes,
> etc. Since you are doing this apparently, RAID5 is therefore NOT advised !
> >

>
> I thought I read a while back in this list that as the number of drives
> increased that RAID 5 was less bad. Say an external enclosure with 20+
> drives.


maybe, but I don't think very many people run that many drives in a
raid 5 configuration...too dangerous. with 20 drives in a single
volume, you need to be running raid 10 or raid 6. 20 drive raid 50 is
pushing it as well..I'd at least want a hot spare.

merlin

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 05-13-2008, 06:16 PM
Vivek Khera
 
Posts: n/a
Default Re: RAID controllers for Postgresql on large setups


On May 12, 2008, at 10:04 PM, Francisco Reyes wrote:

> Adaptec 2120 SCSI controller (64MB of cache).
>
> The servers have mostly have 12 drives in RAID 10.
> We are going to redo one machine to compare RAID 10 vs RAID 50.
> Mostly to see if the perfomance is close, the space gain may be
> usefull.


with only 64Mb of cache, you will see degradation of performance.
from my experience, the adaptec controllers are not the best choice,
but that's mostly FreeBSD experience. And if you don't have a BBU,
you're not benefitting from the write-back cache at all so it is kind
of moot.

If you want to buy a couple of 2230SLP cards with 256Mb of RAM, I have
them for sale. They're identical to the 2130SLP but have two SCSI
channels per card instead of one. they both have BBUs, and are in
working condition. I retired them in favor of an external RAID
attached via Fibre Channel.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 10:56 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
UnixAdminTalk.com

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462