This is a discussion on Multiprocessor questions within the Sun Solaris Hardware forums, part of the Solaris Operating System category.
Hi,

I have some questions regarding multiprocessor systems, especially Sun ones. Don't be tough, I don't know much about the subject.

I think I know some of the answers, but I'd like to have them confirmed...

In an n-processor system, is there a possibility that, even if one processor stops working or is damaged, the system continues on?
(I'm betting no)

If the damaged processor is removed, would the n-1 system continue working after reboot?
(I believe so, perhaps after some jumper modification??)

I guess my question is whether there is some microarchitecture similar to a load balancer that would ask if the processor is active, and if so, use it...

Any link on the subject is very welcome.

Thanks!

Joe Smith wrote:
> Hi,
>
> I have some questions regarding multiprocessor systems, especially Sun
> ones. Don't be tough, I don't know much about the subject.
>
> I think I know some of the answers, but I'd like to have them confirmed...
>
> In an n-processor system, is there a possibility that, even if one
> processor stops working or is damaged, the system continues on?
> (I'm betting no)

Yes, dependent on the particular class of hardware; however, any processes running on that particular CPU will die.

> If the damaged processor is removed, would the n-1 system continue working
> after reboot?
> (I believe so, perhaps after some jumper modification??)

Again, yes, dependent on the system design. The answer here is "probably".

> I guess my question is whether there is some microarchitecture similar to a
> load balancer that would ask if the processor is active, and if so, use
> it...

Again, system class dependent.

P.

On Thu, 20 Nov 2003, Joe Smith wrote:

> I have some questions regarding multiprocessor systems, especially Sun
> ones. Don't be tough, I don't know much about the subject.

I'll answer these questions purely from a Sun HW perspective. YMMV for other brands of HW.

> In an n-processor system, is there a possibility that, even if one processor
> stops working or is damaged, the system continues on?
> (I'm betting no)

Yes, the system will continue, although how well depends on which HW range you're talking about. Entry-level MP machines will probably stop, but higher-end machines will keep going. The processes running on the dead CPU will probably die, though.

> If the damaged processor is removed, would the n-1 system continue working
> after reboot?
> (I believe so, perhaps after some jumper modification??)

Yes, no jumper mods required.

> I guess my question is whether there is some microarchitecture similar to a load
> balancer that would ask if the processor is active, and if so, use it...

That's what the kernel does!

--
Rich Teer, SCNA, SCSA
President, Rite Online Inc.
Voice: +1 (250) 979-1638
URL: http://www.rite-online.net

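To make the "load balancer" idea a bit more concrete, here is a minimal toy sketch in Python of a dispatcher that only hands work to processors marked on-line. It is not Solaris kernel code and does not reflect how the real dispatcher is implemented; the `Cpu` class, the `dispatch` function and the least-loaded placement rule are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    """Hypothetical stand-in for one processor and its run queue."""
    cpu_id: int
    online: bool = True
    run_queue: list = field(default_factory=list)


def dispatch(cpus, tasks):
    """Assign each task to the least-loaded CPU that is marked on-line."""
    online = [c for c in cpus if c.online]
    if not online:
        raise RuntimeError("no on-line CPU left to run anything")
    for task in tasks:
        # Off-line CPUs are never considered, so they receive no work.
        target = min(online, key=lambda c: len(c.run_queue))
        target.run_queue.append(task)
    return {c.cpu_id: list(c.run_queue) for c in cpus}


cpus = [Cpu(0), Cpu(1), Cpu(2), Cpu(3)]
cpus[2].online = False  # pretend CPU 2 was detected as bad
placement = dispatch(cpus, ["a", "b", "c", "d", "e", "f"])
print(placement)
```

In this toy model CPU 2 ends up with an empty queue while the six tasks are spread over the other three, which is the n-1 behaviour asked about in the question.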
"Rich Teer" <rich.teer@rite-group.com> wrote:
> On Thu, 20 Nov 2003, Joe Smith wrote:
>
> > I have some questions regarding multiprocessor systems, especially Sun
> > ones. Don't be tough, I don't know much about the subject.
>
> I'll answer these questions purely from a Sun HW perspective.
> YMMV for other brands of HW.
>
> > In an n-processor system, is there a possibility that, even if one processor
> > stops working or is damaged, the system continues on?
> > (I'm betting no)
>
> Yes, the system will continue, although how well depends on
> which HW range you're talking about. Entry-level MP machines
> will probably stop, but higher-end machines will keep going.
> The processes running on the dead CPU will probably die, though.
>
[.. snip ..]

Hi,

Any Sun system will crash if a CPU is dying; that is due to the UNIX operating system. After the crash it will reboot and continue working if at least one working CPU is available. Of course, this means that only one domain will crash if this happens in a multi-domain system like the F15K.

HTH,
Axel Neumann

On 20-Nov-03 15:28:27, Joe Smith <nospam@nospam.com> wrote:

>I have some questions regarding multiprocessor systems, especially Sun
>ones. Don't be tough, I don't know much about the subject.
>
>I think I know some of the answers, but I'd like to have them confirmed...
>
>In an n-processor system, is there a possibility that, even if one processor
>stops working or is damaged, the system continues on?
>(I'm betting no)

I guess it depends on whether any operating-system-critical tasks are running on that particular CPU, at least with Sun HW and Solaris. However, I have never had a CPU fail except during boot.

More expensive systems (mainly the Enterprise systems) even let the technician replace CPU boards without rebooting the machine (the same goes for RAM, disks and PSUs, of course).

>If the damaged processor is removed, would the n-1 system continue working
>after reboot?
>(I believe so, perhaps after some jumper modification??)

Yes, usually you don't even have to remove the CPU board; the machine will detect it as bad and then just not use it.

>I guess my question is whether there is some microarchitecture similar to a load
>balancer that would ask if the processor is active, and if so, use it...

I don't know how it works "inside", but yes, it probably works in some similar way...

I just had to fire up my SS10 for fun (it is normally shut down, since it heats up the room a lot when it's running):

[glenn@hydra glenn]$ psrinfo
0 on-line since 11/22/03 00:05:31
1 on-line since 11/22/03 00:05:36
2 on-line since 11/22/03 00:05:36
3 on-line since 11/22/03 00:05:36
[glenn@hydra glenn]$ uname -a
SunOS hydra 5.8 Generic_108528-18 sun4m sparc SUNW,SPARCstation-10
[glenn@hydra glenn]$

(Yes, I know that it's badly patched, but as I said, it has been turned off all summer since it gets so warm. I'm actually patching it right now.)

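If you want to check this kind of output from a script, a rough Python sketch along these lines could count the on-line processors. It assumes the output looks exactly like the `psrinfo` lines pasted above (CPU number, then a state word); other Solaris releases or other processor states may be formatted differently, and `parse_psrinfo` is just a name invented for this example.

```python
def parse_psrinfo(text):
    """Return {cpu_id: state} from lines shaped like '0 on-line since ...'."""
    states = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and anything that does not start with a CPU number,
        # such as a shell prompt that was pasted along with the output.
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        states[int(parts[0])] = parts[1]
    return states


# Sample copied from the SS10 output quoted in the post above.
sample = """\
[glenn@hydra glenn]$ psrinfo
0 on-line since 11/22/03 00:05:31
1 on-line since 11/22/03 00:05:36
2 on-line since 11/22/03 00:05:36
3 on-line since 11/22/03 00:05:36
"""

states = parse_psrinfo(sample)
online = [cpu for cpu, state in states.items() if state == "on-line"]
print(f"{len(online)} of {len(states)} processors on-line")
```

On a box where a CPU board had been detected as bad, the same script would presumably report fewer on-line processors than the total, but that depends on how the state is shown on that system.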
"Axel Neumann" <Axel.Neumann@epost.de> writes:

> Any Sun system will crash if a CPU is dying, that is due the UNIX
> operating system.

Some CPU faults are caught on some Sun models, and can be fixed without crashing the overall system. However, if you really want a system with no single point of failure, then you need a fault-tolerant system; e.g., pairs of CPUs operating in lockstep such that if any single CPU fails, the error will be caught right away.

Sun doesn't make boxes like that as far as I know; they bought a fault-tolerant SPARC company in the late 1990s, produced a Netra ft 1800 box, and they still have a web page on the topic <http://www.sun.com/servers/ft-sparc/> but I haven't heard much about fault-tolerant SPARC from Sun lately.

Resilience Corp. also put out a fault-tolerant SPARC Solaris box in the 1990s, but they've switched to GNU/Linux in their current products. (Their current boxes are not fault-tolerant -- merely "integrated high availability", which is good enough for most users.)

There's also the LEON-FT SPARC V8 core used by the European Space Agency for long space missions: it's fault-tolerant, but it's not a Sun design. See <http://www.gaisler.com/>.

Fault-tolerant boxes tend to be fairly expensive. The cheapest commercial box that I know of is the NEC Express 5800/ft, a Xeon-based box that runs GNU/Linux and Windows Server 2003 (but not Solaris x86, unfortunately).
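
As a rough illustration of the lockstep idea only, here is a toy Python sketch in which two "units" are fed the same inputs and the run stops at the first disagreement. Real fault-tolerant hardware compares results in hardware on every cycle; the function names and the injected fault below are hypothetical and not taken from any of the products mentioned.

```python
def run_in_lockstep(step_a, step_b, inputs):
    """Feed identical inputs to two units and stop at the first disagreement.

    Returns (results_so_far, index_of_fault), with index None if no fault.
    """
    results = []
    for index, value in enumerate(inputs):
        out_a = step_a(value)
        out_b = step_b(value)
        if out_a != out_b:
            # A pair can detect the fault but cannot say which unit is wrong.
            return results, index
        results.append(out_a)
    return results, None


def healthy(x):
    return x * x


def flaky(x):
    # Hypothetical fault: wrong answer for one specific input.
    return x * x + 1 if x == 3 else x * x


ok_results, ok_fault = run_in_lockstep(healthy, healthy, [1, 2, 3, 4])
bad_results, bad_fault = run_in_lockstep(healthy, flaky, [1, 2, 3, 4])
print(ok_results, ok_fault)
print(bad_results, bad_fault)
```

The mismatch is caught on the very input where the faulty unit goes wrong, before a bad result is passed on, which is the "caught right away" property described above.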