This is a discussion on Oracle - solving performance problem within the Oracle Database forums, part of the Database Server Software category; --> I am a sysadmin tasked with developing a solution to improve our Oracle 10g performance. I inherited the following ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I am a sysadmin tasked with developing a solution to improve our Oracle 10g performance. I inherited the following server and problem. I'm not a DBA, so if my terminology is off please forgive me, if you are kind enough to help but need more info, I can get you that info. Server Spec: Red Hat Enterprise Server 3.0 2.4.21-27.0.4.ELsmp Dual Intel Xeon CPU @ 3.00GHz w/ hyper threading 12 gigs of RAM 3 x RAID 5 arrays with 7 disks each, attached externally. Each array has its own SCSI channel. Problem: We have a 230 gig database that contains 101 million rows. Indexes are placed on mount point/SCSI channel, data on another mount point/SCSI channel. I'm told that the size of this database can't be changed by our development staff. In fact it will be growing by about 20-30% each year. Certain queries are taking a dogs age to run. One is a mass load of data each day, the other a more critical one is a client query that takes about 2 minutes. We want to bring this value down to something like 5-10 seconds, if thats not possible at least 30-45 seconds or so. We know our bottle necks are I/O in both reads and writes. Things we have tried: - Enable and use Async I/O - Increase SGA to 2.7, jump to 3.7 not done yet. - Upgrade SCSI driver, saw some minor improvement - Other minor DB/query changes Consultant was hired to look at our environment they recommended: - We need 5-10% of db size in SGA, therefore 23 gigs of SGA for today's size. - 64 bit architecture, to allow for huge SGA - Scrap the RAID 5 arrays and build RAID 10 arrays instead, more platters the better. - Recommended some query and index changes, the fixes will help, but won't be overly significant. I need to figure out what we can do to achieve our goals and have it last at least 2 years of increasing data. I've been researching the following. - Change hardware platform to a large Sun box that can take more than 32 gigs of RAM running Solaris. (v890 or something) - Some type of high performance platform from IBM with AIX - Intel/Sun Oracle grid with several nodes, blade server perhaps - Scrap SCSI and use a fiber channel SAN instead, in standalone or Grid setup. Anyone have any opinions on the best course of action? How should one handle this amount of data and maintain performance? Grid seems like a good option, but perhaps not necessary because I/O is our problem, not really CPU. Dumb question, but how do multiple grid members access the same data efficiently? Can the SGA be spread around the members of the Grid .Any caveats? Thanks for any advice you may be able to provide. Mark |
| |||
| On 7 Oct 2005 14:42:52 -0700, mark_sensfan@hotmail.com wrote: >Anyone have any opinions on the best course of action? How should one >handle this amount of data and maintain performance? Actually you have forgotten three things - You need to tune the application - You need to tune the application - You need to tune the application After you have done that - you need to tune the application, because it is unscalable If you can't tune the application because it is canned and bought you need to - dump it - sue the vendor or fire the developers Hardware will never compensate for a badly written application. You obviously have one. A SGA of several Gigs is fully ridiculous. -- Sybrand Bakker, Senior Oracle DBA |
| |||
| Perhaps call in a consultant such as Jonathan Lewis? As Sybrand already mentioned tuning the app sql code is your best bet. other ideas include: Are your database statistics current? How are they being gathered? select owner, trunc(last_analyzed), count(1) from all_tables group by owner, trunc(last_analyzed) / Investigate the use of partitioning so as to reduce the number of blocks fetched (aim for getting partition view elimination working). This is a separately licensed option above and beyond enterprise edition. Investigate the use of parallel query option so as to parallelize large operations to possibly shorten their duration (probably no large gains possible due to existing number of CPUs) (enterprise edition required). Investigate the use of materialized views for aggregation of summary data. Investigate the use of analytic functions in the application SQL (available in standard edition). You didn't mention the filesystems used to support the datafiles (extfs3, 4096 byte block size). Did you configure large file settings when you ran mkfs? What is the block size used for the app tablespaces? There is a paper on hotsos that covers "aligning database block and filesystem block sizes". Your file io might be very inefficient due to misalignment. How much sorting is taking place to temp tablespaces? (find this via statspack) Have you super-sized your pga? (joke) Get a baseline of what is going on from within the database and from within the operating system. Install and configure oracle's "statspack" utility and schedule some snapshots (level 7 is a good start). generate a statspack report and upload it to oraperf.com. hth. good luck. -bdbafh |
| |||
| Throwing dollars at hardware to solve the problem should be the last thing you do, not the first. What if you invest in a big dollar server and performance is still far short of your goal? There might well be much less costly and far more effective solutions. That one client query that you mentioned that is taking too much time - figure out specifically what it is doing during those 2 minutes. The available statistics tools (and consultants) can often show a path to utilizing available resources far more efficiently. You can always run out and buy a megaserver later. My 2 cents worth |
| |||
| I know that on AIX, you can mount the filesystems in concurrent i/o mode, which bypasses the unix filesystem cache and can lead to i/o performance that is close to that of using raw volumes (+20% performance gain compared to "cached" i/o). Don't know if it's available for Linux filesystems though, so you might look into this. Matthias |
| |||
| mark_sensfan@hotmail.com wrote: > > Problem: > Certain queries are taking a dogs age to run. One is a mass load > of data each day, the other a more critical one is a client query that > takes about 2 minutes. We want to bring this value down to something > like 5-10 seconds, if thats not possible at least 30-45 seconds or so. So you want an improvement of 12 to 24 fold. How feasible do you think it is that you can improve IO by a factor of 24? Are you currently using less than 5% of the CPU? When you try to improve something by a factor of 20, you have assume the bottleneck is going to change in the course of your improvements. > We know our bottle necks are I/O in both reads and writes. *How* do you know that? I assume it is read during the client query, and write during the mass load? > - Other minor DB/query changes You probably need major query changes. An improved execution plan can increase performance by orders of magnitude. Can you provide the execution plan and various stats for that query? (Of course, there is always the chance that it is already about as good as it can get, but I think this is the best shot.) Better yet, trace it, too. > > Consultant was hired to look at our environment they recommended: > - We need 5-10% of db size in SGA, therefore 23 gigs of SGA for today's > size. Can they justify this recommendation by refering to specific analyses/experiments they did on your specific system? Or did they just pull it out of their asses with a Rule of Thumb? If the former, what was it? If the latter, how much do you pay people to pull things out of their ass? > - 64 bit architecture, to allow for huge SGA Is there evidence this will help, or is it just a hunch? > - Scrap the RAID 5 arrays and build RAID 10 arrays instead, more > platters the better. Might make a big difference for the bulk load. Probably not so much with your customer's query. (Well, unless we are talking about a heck of a lot more platters.) > - Recommended some query and index changes, the fixes will help, but > won't be overly significant. Is there anyone who thoroughly understands the data? Does the query just fundamentally demand a massive amount of data, either because it returns a massive number of rows or because it aggregates a massive number or rows? If not, then there is a very good chance it could still be improved a lot. > > I need to figure out what we can do to achieve our goals and have it > last at least 2 years of increasing data. I've been researching the > following. > > - Change hardware platform to a large Sun box that can take more than > 32 gigs of RAM running Solaris. (v890 or something) That sounds expensive. How confident are you that it will give the desired result? I'm not very confident it will. > - Some type of high performance platform from IBM with AIX Same thing. > - Intel/Sun Oracle grid with several nodes, blade server perhaps Since you are a sysadmin looking for advice on usenet, I wonder if you guys have sufficient DBA talent in house to do a good job of that. Or would it be consultants? > - Scrap SCSI and use a fiber channel SAN instead, in standalone or Grid > setup. I don't know enough to comment. > > Anyone have any opinions on the best course of action? How should one > handle this amount of data and maintain performance? Tune the application. It ain't the size, it is the motion. Compare a 10 GB database that needs to inspect every row to answer every query to a 1TB database than can answer 99% of it's queries based on 3MB of hot data and can answer the other 1% by looking up 20 cold rows via primary key. Xho -- -------------------- http://NewsReader.Com/ -------------------- Usenet Newsgroup Service $9.95/Month 30GB |
| |||
| On Fri, 07 Oct 2005 14:42:52 -0700, mark_sensfan interested us by writing: > Consultant was hired to look at our environment they recommended: > - We need 5-10% of db size in SGA, therefore 23 gigs of SGA for today's > size. While it MAY be required, you might want to find out WHY they made this recommendation. It could be right, BUT ... often consultants who learned their trade from the internet (or in the 90s and never upgraded their skills) will make such a recommendation because the want a huge increase in DB Cache. This is on the assumption that keeping everything in memory will speed things up. They miss a few minor things like how long it takes to find stuff to flush back to disk when the cache is that big. Unless proven otherwise such a large cache could easily slow the entire system down. Your cheapest route is to get the 10g Diagnostic Pack, Tuning Pack and Performance Pack officially turned on (they are included in Enterprise Manager but are extra cost options), start collecting stats with the AWR and use the Advisors to help you find out what is going on. To get started, go to http://otn.oracle.com, click on Enterprise Manager (left side) and look at the demos. There are dozens of things to look at first. Things as simple as 'are you using undo vs redo', 'are you using locally managed tablespaces' and 'could you benefit from partitioning' should be reviewed first. Depending on your SAN technology, you MIGHT benefit from switching Raid 5 to Raid 1+0 (10), but some of the newer SANs compensate for Raid 5 losses in performance reasonably well. Using Real Application Clusters (RAC), which is effectively the database's contribution to Oracle's GRID vision, might be helpful. I'd study other things first, as there are some classes of problems that are not suited for RAC. A lot of what Jonathan Lewis writes in Practical Oracle8i is still valid as well. If you are serious about using intelligent consulting, wander over to http://www.oaktable.net and get the consultants listed there involved. I'm sure there is one close to you. (Or contact any of the regulars in this group offline.) -- Hans Forbrich Canada-wide Oracle training and consulting mailto: Fuzzy.GreyBeard_at_gmail.com *** I no longer assist with top-posted newsgroup queries *** |
| |||
| <mark_sensfan@hotmail.com> wrote in message news:1128721372.461542.213190@g44g2000cwa.googlegr oups.com... > I am a sysadmin tasked with developing a solution to improve our Oracle > 10g performance. I inherited the following server and problem. I'm not > a DBA, so if my terminology is off please forgive me, if you are kind > enough to help but need more info, I can get you that info. > > Server Spec: > Red Hat Enterprise Server 3.0 2.4.21-27.0.4.ELsmp > Dual Intel Xeon CPU @ 3.00GHz w/ hyper threading > 12 gigs of RAM > 3 x RAID 5 arrays with 7 disks each, attached externally. Each array > has its own SCSI channel. > > Problem: > We have a 230 gig database that contains 101 million rows. Indexes are > placed on mount point/SCSI channel, data on another mount point/SCSI > channel. I'm told that the size of this database can't be changed by > our development staff. In fact it will be growing by about 20-30% each > year. Certain queries are taking a dogs age to run. One is a mass load > of data each day, the other a more critical one is a client query that > takes about 2 minutes. We want to bring this value down to something > like 5-10 seconds, if thats not possible at least 30-45 seconds or so. > We know our bottle necks are I/O in both reads and writes. > > Things we have tried: > - Enable and use Async I/O > - Increase SGA to 2.7, jump to 3.7 not done yet. > - Upgrade SCSI driver, saw some minor improvement > - Other minor DB/query changes > > > Consultant was hired to look at our environment they recommended: > - We need 5-10% of db size in SGA, therefore 23 gigs of SGA for today's > size. > - 64 bit architecture, to allow for huge SGA > - Scrap the RAID 5 arrays and build RAID 10 arrays instead, more > platters the better. > - Recommended some query and index changes, the fixes will help, but > won't be overly significant. > > I need to figure out what we can do to achieve our goals and have it > last at least 2 years of increasing data. I've been researching the > following. > > - Change hardware platform to a large Sun box that can take more than > 32 gigs > of RAM running Solaris. (v890 or something) > - Some type of high performance platform from IBM with AIX > - Intel/Sun Oracle grid with several nodes, blade server perhaps > - Scrap SCSI and use a fiber channel SAN instead, in standalone or Grid > setup. > > Anyone have any opinions on the best course of action? How should one > handle this amount of data and maintain performance? > > Grid seems like a good option, but perhaps not necessary because I/O is > our problem, not really CPU. Dumb question, but how do multiple grid > members access the same data efficiently? Can the SGA be spread around > the members of the Grid .Any caveats? > > Thanks for any advice you may be able to provide. > > Mark > You need to find out what the bottleneck is. What is taking the longest? Can you eliminate that? Look, you are wasting money on the consultant and the hardware until you know what is taking the longest and of the whole work load what that represents. Let me tell you a story to illustrate my point. Once upon a time I worked on designing pension administration systems. We created these systems in a scripting language. The language was interpretive, not compiled. Each system was customized for the client. It was not uncommon for a pension calculation to take 2 minutes. Of course, the conventional wisdom said that if you just wrote the entire system in a compiled language (eg C) then it would be much faster. After all we know that compiled languages are faster than interpreted languages. Certainly this is true if all other things are equal. Fortunately it would have been a monumental task to write the whole system in C. The scripting language had a very nice profiler. So I would go on site and profile the calculation. Once I profiled the application, it would turn out that the majority of time was spent getting data. Out of 2 minutes maybe 1 minute and 50 seconds was spent getting the data. So if we wrote the application in assembly language and fine tuned the code the best we could do was to go from 2 minutes to 1 minute 50 seconds. So the answer wasn't to write the code in C or assembly language. The answer was to NOT spend so much time getting data. It was quite common to calculate the final average pay by taking the highest 60 months out of the last 120 months. It was quite common that people who wrote the application would retrieve the pay data 120 times to do the pay calculation. Ahhh, what would happen if we retrieved the pay only once instead of 120 times. It would take me about an hour to make that change in their system and now the calculation time would take maybe 10 seconds. (from 120 seconds) Doing a little more work, looking at what was not the top time waster would in general net a 4 to 7 second reduction in calculation time. So we went from 120 seconds to 6 seconds a factor of 20! All with about 4 hours of focused work. We didn't have to buy faster hardware. We actually could use slower hardware since we were not doing things that we didn't need to do. The point is you need to find out what is event in the whole thing is taking the longest. If you reduce that to 0 what would the results be? Is that fast enough? Can you avoid doing it at all? Can you do less of that? But you have to tackle the most expensive operation first. You might need more hardware, but until you identify it you are just lining the pockets of someone else. Jim |
| |||
| Mark, Check here: http://www.oreilly.com/catalog/optoraclep/ Chapter 1 is available for download as a .pdf doc. Chapter 12 is available in html. hth. -bdbafh |
| ||||
| Mark wrote: >One is a mass load >of data each day, Please provide more specifics on how this is done, there may be some easy fixes such as using a hint or changing commands. >the other a more critical one is a client query that >takes about 2 minutes. We want to bring this value down to something >like 5-10 seconds, if thats not possible at least 30-45 seconds or so. >We know our bottle necks are I/O in both reads and writes. While tracing and statspack are the proper way to go, we might be able to provide some help with an Explain Plan. A proper tuning analyst may be the best investment you make. Also, if you are doing full table or index scans (even other queries than the problem ones), you might need to put those objects into the recycle pool. I saw one complicated OLTP system completely change its behavior during month-end bottleneck processing just by placing one key table (and it's indices) into the recycle pool. Listen to Sybrand. Are you using parallel queries? It is possible putting those together might take longer than not using parallel. By the way, separating indices and data does not necessarily lead to better performance, in most cases it has been proved to be a myth, google this group for myths. Oracle gets the index first, then gets the data, and it all usually becomes irrelevant with multiuser access - you wind up wanting to place both across as many spindles as possible. Google for SAME (Stripe And Mirror Everything) and BAARF (I'm serious). jg -- @home.com is bogus. http://www.physics.uq.edu.au/physics...itchdrop.shtml |