| Hi, I'm quite new to the RAC field and I want to know whether the following behavior is normal for such a configuration. A few weeks ago we succeeded in configuring an Oracle 10.2.0.1 cluster. The configuration comprised 2 nodes and the underlying OS was Red Hat AS4. The installation went well, following all the installation steps in the official Oracle documentation. The OCR and the voting disks were configured on NFS. At that time we noticed that from time to time one of the nodes (not always the same one) was unexpectedly rebooted. Neither the system logs nor the Oracle logs offered any clues, so our conclusion was that NFS might be causing the problems. To test this, we decided to configure RAC on a single node, just for testing purposes. The OCR, the voting disks and the Oracle software were installed on OCFS2 partitions, so no NFS was involved. On this node we configured 2 Oracle instances, which worked fine for a while, but from time to time, or when the server is stressed with intensive SQL, the entire server reboots. After some searching on Metalink we found Bug 4741921/4556989 (36) INSTANCE RESTARTED AFTER SHUTDOWN ABORT IN RAC ENVIRONMENT, which is fixed in the 10.2.0.2 patch. We downloaded and installed the patch, but the strange behavior is still there. The server does reboot less frequently now, but we have no explanation for what really causes the reboots. Has anyone noticed the same behavior on a 10.2.0.x RAC configuration? Are there any workarounds for this? Many thanks. |
| |||
| alek wrote: > [...] > Are there any workarounds for this? Unexpected rebooting of nodes is unfortunately a pretty common occurrence under Linux, usually associated with some kind of lockup or access problem against the shared disk system. There are some parameters you can try setting higher to allow the clusterware to tolerate longer periods of inability to access the storage.
Jeffrey Hunter's site, www.idevelopment.info, has some long writeups on RAC Linux configuration that may be a step in the right direction. Opening a TAR with Oracle Support is, for better or worse, probably the other direction you need to proceed in. |
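The reply above does not name the parameters it has in mind. For the OCFS2 setup described in the first post, one setting that is often raised is O2CB_HEARTBEAT_THRESHOLD in /etc/sysconfig/o2cb; that is an assumption here, not something confirmed in this thread. As far as the OCFS2 documentation describes it, the threshold is counted in 2-second heartbeat iterations, so the value for a desired timeout works out to timeout / 2 + 1. A minimal Python sketch of that arithmetic, with hypothetical function names:

```python
HEARTBEAT_INTERVAL_SECONDS = 2  # assumed OCFS2 disk heartbeat interval


def o2cb_threshold_for_timeout(timeout_seconds: int) -> int:
    """Return the O2CB_HEARTBEAT_THRESHOLD value for a desired timeout.

    Assumes the documented relationship: threshold = timeout / 2 + 1.
    """
    if timeout_seconds < HEARTBEAT_INTERVAL_SECONDS:
        raise ValueError("timeout must be at least one heartbeat interval")
    if timeout_seconds % HEARTBEAT_INTERVAL_SECONDS:
        raise ValueError("timeout must be a multiple of the heartbeat interval")
    return timeout_seconds // HEARTBEAT_INTERVAL_SECONDS + 1


def o2cb_timeout_for_threshold(threshold: int) -> int:
    """Return the timeout in seconds that a given threshold corresponds to."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECONDS
```

Under that relationship a threshold of 7 tolerates about 12 seconds without storage access, and a 60-second tolerance needs a threshold of 31. Whether raising it helps in this particular case is something only testing (or Oracle Support) can tell.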
| |||
| Thanks for the valuable info. I've made some tweaks to the parameters mentioned on Jeffrey Hunter's site, but it seems that the node is still rebooting. However, I feel more comfortable now knowing that this is a common issue on the Linux platform. |
| |||
| I hope it's not that frequent for production sites running RAC on Linux. (We run on HP-UX.) But I have heard of a lot of people having problems in that area. (That includes remarks from people noting that production Linux systems have experienced those symptoms at least on occasion or under load.) When you start bundling the Oracle clusterware with other things like the Oracle file system layer, etc., it can get messy pretty quickly. If you are contemplating a production system on Linux, it's definitely something that needs to be addressed, understood, and hopefully completely resolved. |
| |||
| hpuxrac wrote: > [...] > If you are contemplating a production system using linux it's > definitely something that needs to be addressed, understood, and > hopefully completely resolved. I think Mladen's comments on a slightly different topic actually apply here, too: http://www.freelists.org/archives/or.../msg00366.html jg -- @home.com is bogus. http://www.businessweek.com/technolo...309_009568.htm |
| ||||
| alek wrote: > [...] > Are there any workarounds for this? > > Many thanks. I have never seen the node ejection and rebooting behaviour reported by hpuxrac and others, but then I do all of my work on NetApps with the connection string supplied by NetApp.
I would suggest you monitor the network for outages, as the behaviour you describe is expected if, for some reason, the Oracle clusterware believes it can no longer see a resource. That resource might be the public network, the memory interconnect, or the storage device. -- Daniel A. Morgan http://www.psoug.org damorgan@x.washington.edu (replace x with u to respond) |
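One rough way to act on the "monitor the network for outages" suggestion is to probe each path at a fixed interval and record when it stops answering, so the outage windows can be lined up against the reboot times. The sketch below is only an illustration under assumptions: it uses plain TCP connects from Python's standard library, the host names and ports are hypothetical, and a TCP probe is not how the clusterware itself checks its resources.

```python
import socket
import time
from typing import Callable, Dict, List, Optional, Tuple

Target = Tuple[str, int]                    # (host, TCP port)
Outage = Tuple[float, Optional[float]]      # (start, end or None if still down)


def tcp_probe(target: Target, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to target succeeds within timeout."""
    try:
        with socket.create_connection(target, timeout=timeout):
            return True
    except OSError:
        return False


def find_outages(samples: List[Tuple[float, bool]]) -> List[Outage]:
    """Collapse (timestamp, reachable) samples into outage windows.

    Each window runs from the first failed probe to the first successful
    probe after it; the end is None if the target never came back.
    """
    outages: List[Outage] = []
    start: Optional[float] = None
    for timestamp, reachable in samples:
        if not reachable and start is None:
            start = timestamp
        elif reachable and start is not None:
            outages.append((start, timestamp))
            start = None
    if start is not None:
        outages.append((start, None))
    return outages


def monitor(
    targets: List[Target],
    rounds: int,
    interval: float = 1.0,
    probe: Callable[[Target], bool] = tcp_probe,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[Target, List[Outage]]:
    """Probe every target once per round and return the outages per target."""
    history: Dict[Target, List[Tuple[float, bool]]] = {t: [] for t in targets}
    for round_number in range(rounds):
        for target in targets:
            history[target].append((clock(), probe(target)))
        if round_number < rounds - 1:
            sleep(interval)
    return {target: find_outages(samples) for target, samples in history.items()}
```

For example, `monitor([("node2-priv", 22), ("nas1", 2049)], rounds=3600)` would check a hypothetical interconnect address and a hypothetical NFS server once a second for an hour. Running it from a machine outside the cluster would keep the results from being lost when a node reboots.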