Vital Info:
2-node cluster, p660-6h1s running AIX 4.3.3 ml09 with HACMP 4.4.1. It has shared SCSI disks using two 2104-DU3s with 10 physical volumes in total. Note: I know that my OS and HA software are unsupported right now, and I am planning on upgrading.

Currently 'Node 1' (the primary node) has the cluster resources and is running fine.

'Node 2' (the secondary) is in the cluster but has lost all information regarding the shared disks, i.e. the PVIDs and volume group information are gone. Just a note: I was troubleshooting via diag, and it had issues with a SCSI card which happened to be attached to the shared disks. Diag recommended removing the device and running cfgmgr, which I did and should not have (I knew better than to). OK, I need to get the configuration corrected, and the following is how I plan on doing it:

On Node 2
1. Assign PVIDs to the shared disks, since they no longer have any assigned.
2. Shut down the cluster software.
On Node 1
3. Shut down the cluster software.
4. exportvg vg01
On Node 2
5. importvg vg01
6. varyoffvg vg01
On Node 1
7. importvg vg01
8. varyoffvg vg01
9. Start the cluster software.
On Node 2
10. Start the cluster software.

I tried this before, except that the PVs did not have IDs assigned, and it failed (I'm assuming it failed because of the missing PVIDs).

Questions: Am I missing anything in the steps outlined above? Should I assign the PVIDs, skip the exportvg/importvg, and try a failover to see whether the 'Lazy Update' updates the volume group information? Is there anything I should watch out for?

TIA
Pete's
"Pete's" <empete2000@yahoo.com> wrote in message news:6724a51f.0401220607.36d4155e@posting.google.com...
> Vital Info:
> 2 Node cluster, p660-6h1's running AIX 4.3.3 ml09 with HACMP 4.4.1.
> It has shared SCSI disks using 2 2104-DU3's with 10 total Physical
> Volumes. [snip]
> Currently 'Node 1'(which is the primary node) has the cluster
> resources and is running fine.
>
> 'Node 2'(which is the secondary) is in the cluster, but, has lost all
> information regarding the shared disk, i.e. PVID's and Volume Group
> information is gone.[some snipped]

Hello Pete,
it sounds like you got your punishment already ;-). Here's the good news: the solution might be less painful because we are talking AIX here.
Disclaimer: I am writing from memory, and I assume that the HACMP configuration itself was working properly before and did not change through your diag adventure...

> On Node 2
> 1. Assign PVID's to the shared disk since they no longer have any
> assigned.

Nope. Don't assign them manually; just let the configuration manager read what's there. (Node1 still sees the old/original PVIDs, doesn't it? Then node2 can read them again, too.)

> 2. Shutdown the cluster software.
> On Node 1
> 3. Shutdown the cluster software
> 4. exportvg vg01

Nope!!!!!! Just varyoffvg vg01.

> On Node 2

# cfgmgr -vlscsi0 ...
(or similar, as you like. Use the -v verbose option to watch hardware detection.)

> 5. importvg vg01

# importvg -y vg01 <one_of_the_vg01_member_disks_rediscovered_by_cfgmgr>
Assign the major number (same as on node1) during the import if a filesystem of the VG is exported via NFS. (I would in any case.)

> 6. varyoffvg vg01
> On Node 1
> 7. importvg vg01

Just varyonvg vg01 after the varyoffvg.

> 8. varyoffvg vg01

Why? I did not understand why you want to do this step. Do you want to run HACMP from node2? If you run node1 as primary and node2 as backup (i.e. not a concurrent cluster), I would start HACMP on node1 and then do a manual takeover to node2 and back to see whether everything is working. (As I said above: it should, if your HACMP config was fine before and did not change.) Synchronize the cluster resources before starting takeover tests.

> 9. Start the cluster software.
> On Node 2
> 10. Start the cluster software.
>
> I tried this before except that the PV's did not have id's assigned
> and it failed(I'm assuming it failed because of no pvid's).

Yes. Actually, that's the starting point: you need to see the disks from node2. Write again if cfgmgr cannot access the disks because of some LVM locks or SCSI reserve bits.

HTH,
Andreas
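The order of operations suggested in this reply can be sketched as a small dry-run script. It only prints the commands, grouped by node, and executes nothing. Several things here are assumptions rather than details from the thread: the disk name hdisk2 is hypothetical, cluster services are assumed to be stopped on both nodes already, and the -y (volume group name) and -V (major number) flags are the standard importvg options rather than something spelled out in the post.

```shell
#!/bin/sh
# Dry-run sketch of the suggested sequence: it only PRINTS commands.
# Assumptions (not from the thread): hdisk2 is a member disk of vg01 as
# seen from node2, and cluster services are stopped on both nodes.
VG=${VG:-vg01}
DISK=${DISK:-hdisk2}        # hypothetical disk name
ADAPTER=${ADAPTER:-scsi0}
MAJOR=${MAJOR:-}            # optional: major number used for $VG on node1

recovery_plan() {
    # node1 keeps its ODM entries; the VG is only varied off, not exported
    echo "node1: varyoffvg $VG"
    # node2 re-reads the hardware and should pick up the PVIDs from disk
    echo "node2: cfgmgr -v -l $ADAPTER"
    echo "node2: lspv"
    # import from any one rediscovered member disk
    if [ -n "$MAJOR" ]; then
        echo "node2: importvg -V $MAJOR -y $VG $DISK"
    else
        echo "node2: importvg -y $VG $DISK"
    fi
    # importvg varies the VG on; release it again before starting HACMP
    echo "node2: varyoffvg $VG"
}

recovery_plan
```

Printing the plan first makes it easy to review the sequence with a colleague before touching a live cluster. Starting HACMP, synchronizing the cluster resources, and the takeover test are left out because they depend on the local cluster configuration.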
"Andreas Schulze" <b79xan@gmx.de> wrote in message news:<buon2q$9mk10@news-1.bank.dresdner.net>...
> "Pete's" <empete2000@yahoo.com> wrote in message
> news:6724a51f.0401220607.36d4155e@posting.google.com...
> > Vital Info:
> > 2 Node cluster, p660-6h1's running AIX 4.3.3 ml09 with HACMP 4.4.1.
> > It has shared SCSI disks using 2 2104-DU3's with 10 total Physical
> > Volumes. [snip]
> > Currently 'Node 1'(which is the primary node) has the cluster
> > resources and is running fine.
> >
> > 'Node 2'(which is the secondary) is in the cluster, but, has lost all
> > information regarding the shared disk, i.e. PVID's and Volume Group
> > information is gone.[some snipped]
>
> Hello Pete,
> sounds like you got your punishment already ;-). Here's the good news: the
> solution might be less painful because we are talking AIX here.
> Disclaimer: I am writing from memory and I assume that the HACMP
> configuration itself was working properly before and did not change through
> your diag adventure...

Nope, nothing should have changed through my adventure.

> > On Node 2
> > 1. Assign PVID's to the shared disk since they no longer have any
> > assigned.
> Nope. Don't assign manually, just let configuration manager read what's
> there (node1 still sees the old/original PVIDs, doesn't it? Then they can be
> read by node2 again, too.)
> > 2. Shutdown the cluster software.
> > On Node 1
> > 3. Shutdown the cluster software
> > 4. exportvg vg01
> Nope!!!!!! just varyoffvg vg01

If the cluster software is not running, the VG is already varied off. Plus, when you do an exportvg, it removes the PV and filesystem info from the ODM and places it all on disk. So what you are basically saying is: do not do the exportvg, because I'm not actually moving the data, and all the VG, PV and filesystem information is still there in the VGDA. Correct? Then run cfgmgr and importvg on Node2. Correct?

> > On Node 2
> # cfgmgr -vlscsi0 ... (or similar, as you like. Use -v verbose option to
> watch hardware detection)

When I originally ran cfgmgr on Node2, Node1 had the disk resources, so there could have been locks. But cfgmgr on Node2 did not pick up the PVIDs: I do an lspv and there are no PVIDs assigned. On Node2, should I remove the current PV devices and then run cfgmgr when Node1 does not have the disks locked?

> > 5. importvg vg01
> # importvg -y vg01 <one_of_the_vg01_member_disks_rediscovered_by_cfgmgr>
> Assign Major Number (same as on node1) during import if fs of vg is exported
> via NFS. (I would in any case).

These are not NFS exports, so no worries there.

> > 6. varyoffvg vg01
> > On Node 1
> > 7. importvg vg01
> Just varyonvg vg01 after varyoffvg
> > 8. varyoffvg vg01
> Why? Did not understand why you want to do this step. Do you want to run
> HACMP from node2? Should you run node1 as primary and node2 as backup (i.e.
> not concurrent cluster) I would start HACMP on node1 and then do a manual

This is a non-concurrent cluster. The reason I would do this is that exportvg removes the VG, PV and filesystem information from the ODM (lqueryvg is a headache saver).

> takeover to node2 and back to see whether everything's working. (As I said
> above: it should if your HACMP config was fine before and did not change).
> Synchronize cluster resources before starting takeover tests.
> > 9. Start the cluster software.
> > On Node 2
> > 10. Start the cluster software.
> >
> > I tried this before except that the PV's did not have id's assigned
> > and it failed(I'm assuming it failed because of no pvid's).
> Yes. Actually that's the starting point. You need to see disks from node2.
> Write again if cfgmgr cannot access the disks because of some lvm locks or
> scsi reserve bits.
>
> HTH,
> Andreas

Thanks for your reply, Andreas. Additional questions are above; feel free to email me at my address above.

TIA,
Pete's
"Pete's" <empete2000@yahoo.com> wrote in message news:6724a51f.0401221251.327285be@posting.google.com...
> "Andreas Schulze" <b79xan@gmx.de> wrote in message news:<buon2q$9mk10@news-1.bank.dresdner.net>...
> > "Pete's" <empete2000@yahoo.com> wrote in message
> > news:6724a51f.0401220607.36d4155e@posting.google.com...
> > > Vital Info:
> > > 2 Node cluster, p660-6h1's running AIX 4.3.3 ml09 with HACMP 4.4.1.
> > > It has shared SCSI disks using 2 2104-DU3's with 10 total Physical
> > > Volumes. [snip]
> > > Currently 'Node 1'(which is the primary node) has the cluster
> > > resources and is running fine.
> > >
> > > 'Node 2'(which is the secondary) is in the cluster, but, has lost all
> > > information regarding the shared disk, i.e. PVID's and Volume Group
> > > information is gone.[some snipped]
> >
> > [some snipped] I assume that the HACMP
> > configuration itself was working properly before and did not change through
> > your diag adventure...
> Nope, nothing should have changed through my adventure.
> > > On Node 2
> > > 1. Assign PVID's to the shared disk since they no longer have any
> > > assigned.
> > Nope. Don't assign manually, just let configuration manager read what's
> > there (node1 still sees the old/original PVIDs, doesn't it? Then they can be
> > read by node2 again, too.)
> > > 2. Shutdown the cluster software.
> > > On Node 1
> > > 3. Shutdown the cluster software
> > > 4. exportvg vg01
> > Nope!!!!!! just varyoffvg vg01
> If the cluster software is not running, the vg is already varied off.
> Plus, when you do an exportvg, it removes pv and filesystem info from
> the ODM and places it all on disk. So, what you are basically saying
> is do not do the exportvg because I'm not actually moving the data and
> all the VG, PV and filesystem is still there in the VGDA. Correct?

Each VGDA contains the complete allocation information of the entire volume group. There is no need to remove the known information from the ODM and /etc/filesystems just to import this VGDA information into another server.

> Then run cfgmgr and importvg on Node2. Correct?

Yes. We could have used a Learning Import while node1 is up and the applications are running if diag had not swept out node2's memory.

> > > On Node 2
> > # cfgmgr -vlscsi0 ... (or similar, as you like. Use -v verbose option to
> > watch hardware detection)
>
> When I originally ran cfgmgr on Node2, the Node1 had the disk
> resources, so, there could have been locks.
> But, cfgmgr on Node2 did not pick up the PVID's.
> I do an lspv and there are no pvid's
> assigned. On Node2, should I remove the current pv devices and then
> run cfgmgr when Node1 does not have the disks locked?

Yes, in this case cfgmgr could not read sector 0, which contains the PVID.
On node2, rmdev -d -l all the hdisk devices of the shared VG, then run cfgmgr again (with the shared VG varied off on node1).
If cfgmgr only had problems with the LVM lock, this should be sufficient to read the information we need. Should there still be some SCSI disk reservation active, you could use an HACMP utility to blast the reserve bit:
# /usr/sbin/cluster/utilities/cl_SCSIdiskreset /dev/hdisk_in_question
The path will be /usr/es/sbin... if you use ES. You need to run this command against all of the shared VG's hdisk devices. Alternatively (or in case the utility fails), you could just reboot node1.

> > > 5. importvg vg01
> > # importvg -y vg01 <one_of_the_vg01_member_disks_rediscovered_by_cfgmgr>
> > Assign Major Number (same as on node1) during import if fs of vg is exported
> > via NFS. (I would in any case).
>
> These are not NFS exports, so no worries there.
>
> > > 6. varyoffvg vg01
> > > On Node 1
> > > 7. importvg vg01
> > Just varyonvg vg01 after varyoffvg
> > > 8. varyoffvg vg01
> > Why? Did not understand why you want to do this step. Do you want to run
> > HACMP from node2? Should you run node1 as primary and node2 as backup (i.e.
> > not concurrent cluster) I would start HACMP on node1 and then do a manual
> > [some snipped]
> This is non-concurrent cluster, the reason I would do this is because
> exportvg removes the VG, PV and filesystem information from the
> ODM(lqueryvg is a headache saver).

I see. But IMHO you do not have to export. See above.

> > takeover to node2 and back to see whether everything's working. (As I said
> > above: it should if your HACMP config was fine before and did not change).
> > Synchronize cluster resources before starting takeover tests.

I don't know whether I understood you correctly that you always export and import your shared VG during failover/takeover. In that case, at least the filesystem order in /etc/filesystems should be fine. Otherwise, check it if the order of the filesystems is important for error-free mounting. After the volume group is known to both nodes, synchronize the cluster resources before starting takeover tests.

> > > 9. Start the cluster software.
> > > On Node 2
> > > 10. Start the cluster software.
> > >
> > > I tried this before except that the PV's did not have id's assigned
> > > and it failed(I'm assuming it failed because of no pvid's).
> > Yes. Actually that's the starting point. You need to see disks from node2.
> > Write again if cfgmgr cannot access the disks because of some lvm locks or
> > scsi reserve bits.
> >
> > HTH,
> > Andreas
>
> Thanks for your reply Andreas. Additional questions above, feel free
> to email me at my above address Andreas.

Please let's stick to the list for the benefit of others who might experience similar problems.

> TIA,
> Pete's

HTH,
Andreas
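The rediscovery step described in this reply (remove the stale hdisk definitions on node2, re-run cfgmgr, and only if needed clear a leftover SCSI reservation) can be sketched as another dry-run script. It prints commands and runs nothing. The disk names hdisk2 and hdisk3 are hypothetical, and the utility path is the non-ES one quoted in the post.

```shell
#!/bin/sh
# Dry-run sketch: prints the rediscovery commands for node2, runs nothing.
# The disk names passed in are hypothetical examples.
RESET=/usr/sbin/cluster/utilities/cl_SCSIdiskreset   # /usr/es/sbin/... on ES

rediscover_plan() {
    # Remove the stale hdisk definitions of the shared VG from node2's ODM
    for d in "$@"; do
        echo "rmdev -d -l $d"
    done
    # With the VG varied off on node1, let cfgmgr re-read the disks
    echo "cfgmgr -v"
    # Check whether the PVIDs are visible again
    echo "lspv"
}

reset_plan() {
    # Only needed if a SCSI reservation still blocks access to the disks
    for d in "$@"; do
        echo "$RESET /dev/$d"
    done
}

rediscover_plan hdisk2 hdisk3
```

Keeping the reservation reset in a separate function reflects the advice above: it is a fallback for the case where cfgmgr still cannot read the disks, not part of the normal path.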
Hey Pete, a quick thought... did you assign a unique SCSI ID to each of your SCSI cards? (e.g. Node 1 SCSI ID 6, Node 2 SCSI ID 5)

Rick

Pete's wrote:
> Vital Info:
> 2 Node cluster, p660-6h1's running AIX 4.3.3 ml09 with HACMP 4.4.1.
> It has shared SCSI disks using 2 2104-DU3's with 10 total Physical
> Volumes.
> [rest of original post snipped]
Rick <rmunn@eastlink.ca> wrote in message news:<A90Rb.3811$Ja2.51225@nnrp1.uunet.ca>...
> Hey Pete, a quick thought...did you assign a unique SCSI ID to your SCSI
> cards? (e.g. Node1 SCSI ID 6. Node 2 SCSI ID 5)
>
> Rick
>
> Pete's wrote:
> > [original post snipped]

Yep. If you don't and you boot up, spurious events start happening while Node1 has the disk resources. I ran into this when I first brought up this cluster, about 1.5 years ago, because I forgot to change the ID on one adapter.

Thanks,
Pete's
"Andreas Schulze" <b79xan@gmx.de> wrote in message news:<bupisn$kj2g9$1@ID-208966.news.uni-berlin.de>...
> [earlier discussion snipped]
> After the volumegroup is known to both
> nodes synchronize cluster resources before starting takeover tests.
> [snipped]
> Pls let's stick to the list for the benefit of others that might experience
> similar problems.
>
> HTH,
> Andreas

I thought I'd post a followup to this thread. I successfully fixed my configuration, and everything appears to be synced. To confirm where I went wrong when I first tried to fix this: I had run cfgmgr while my primary node held the disk resources, so the command did not pick up any PVIDs. After shutting down the cluster on both sides, removing all the shared disks with rmdev, and running cfgmgr again, lspv reported the PVIDs.

Hopefully my adventure and punishment are subsiding now. Key lesson with HACMP: never automatically do what diag tells you to do (e.g. rmdev a SCSI device) when troubleshooting SCSI devices.

HTH someone else.
Pete's
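The check that closed this thread (does lspv report PVIDs on the backup node?) is easy to automate. The filter below is a minimal sketch that reads lspv-style output and lists the disks whose PVID column reads "none". The sample data is made up for illustration and is not taken from the poster's systems; the function name is likewise hypothetical.

```shell
#!/bin/sh
# Reads lspv-style output on stdin and prints the names of the disks
# whose PVID column (second field) is "none".
disks_without_pvid() {
    awk '$2 == "none" { print $1 }'
}

# Made-up sample in lspv's "name  PVID  volume-group" layout
disks_without_pvid <<'EOF'
hdisk0  000b1c2d3e4f5a6b  rootvg
hdisk2  none              None
hdisk3  none              None
EOF
```

On a live node the same filter would be fed by `lspv | disks_without_pvid`. An empty result after the rmdev and cfgmgr pass suggests that every disk's PVID was read back from disk.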