Q: Cluster File System Reliability (Linux Operating System forums, Unix Operating Systems category)
Hi there,

I am rather new to cluster file systems, so let me state my requirements, and hopefully the kind people here can point me in the right direction.

Basically I want a file system spanning multiple physical hosts, but mounted on a server as a single mount point - almost like having some kind of RAID over various physical nodes. The idea is that if one node fails, the server should still be able to continue with processing (assume it's something like a database). I am hoping that when the failed file server(s) come back online, the "RAID" should be rebuilt on the fly, like a real RAID. This will be first prize.

If the above is not possible, or if the performance penalty is too high, I will also settle for a real-time mirroring solution, but with one crucial requirement: the write on the remote file server (the mirror) should be guaranteed - in other words, first ensure the data is on the remote host before writing to the local host disk, and only then return to the system. I know this will also have a very negative impact on performance, but I am between a rock and a hard place with all the requirements (fast file system, but never ever lose any data).

Perhaps my requirements cannot be met, but then I at least need some pointers to the next best thing.

Any help will be greatly appreciated. I will report back to the group in the months to come on what eventually worked for us (or didn't work, and why).

Thanks in advance.

PS: if anybody knows something about battery backups for hard drives, and how to protect the hard drive cache in case of power failures etc., I would also like to hear some of your thoughts/learnings/experiences.
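The write ordering asked for in the mirroring fallback (remote copy confirmed first, then the local disk, then return to the caller) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a production design: `send_to_mirror` is a hypothetical stand-in for whatever network transport is used, and it is assumed to return `True` only once the remote host reports the data as safely stored. `os.fsync` asks the OS to push the local copy to the device; whether the drive's own volatile write cache honours that depends on the drive and file system settings, which is exactly the concern raised in the PS.

```python
import os


class MirroredWriter:
    """Hypothetical sketch: a write is acknowledged only after the remote
    mirror has confirmed it AND the local copy has been fsync'ed."""

    def __init__(self, local_path, send_to_mirror):
        self.local_path = local_path
        # Assumed transport hook: returns True only once the remote host
        # reports the data as safely stored.
        self.send_to_mirror = send_to_mirror

    def write(self, data: bytes) -> None:
        # 1. Remote first. If the mirror cannot confirm, fail without
        #    touching the local disk.
        if not self.send_to_mirror(data):
            raise IOError("mirror did not acknowledge the write")
        # 2. Then the local copy, forced from the OS cache to the device.
        with open(self.local_path, "ab") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # 3. Only now does control return to the caller.
```

If the mirror does not confirm, the local file is never touched and the caller gets an exception, so (given the assumption about `send_to_mirror`) a normal return means both copies exist. Every write pays a network round trip plus a local fsync, which is the performance penalty the post anticipates. If the local write fails after the mirror has confirmed, the two copies diverge and something must reconcile them; the sketch does not handle that.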
nicc777 wrote:
> Basically I want a file system spanning multiple physical hosts, but
> mounted on a server as a single mount point - almost like having some
> kind of RAID over various physical nodes. The idea is that if one node
> fails, the server should still be able to continue with processing
> (assume it's something like a database). I am hoping that when the
> failed file server(s) come back online, the "RAID" should be rebuilt
> on the fly, like a real RAID. This will be first prize.

Not much to ask... is it? :-)

> If the above is not possible, or if the performance penalty is too
> high, I will also settle for a real-time mirroring solution, but with
> one crucial requirement: the write on the remote file server (the
> mirror) should be guaranteed - in other words, first ensure the data
> is on the remote host before writing to the local host disk, and only
> then return to the system. I know this will also have a very negative
> impact on performance, but I am between a rock and a hard place with
> all the requirements (fast file system, but never ever lose any
> data).

I think proprietary systems exist, but not on Linux as such. You would need to hack some code, I think.

I can conceive of one Linux box running a custom daemon that talks to similar daemons on the other machines, intercepting network file requests and sharing them out amongst those machines. Use synchronous writes to ensure data integrity, and loads of RAM to cache files with.

You might use Linux as a starting point, but you would be writing your own code to run on it, I suspect.

> PS: if anybody knows something about battery backups for hard drives,
> and how to protect the hard drive cache in case of power failures etc.,
> I would also like to hear some of your thoughts/learnings/experiences.

The solution there is to have a UPS and power monitoring to flush caches and halt further access before the system is taken down.

Rather than asking for a particular solution, how about describing the exact problem you are trying to solve: how much storage, how much throughput, and what level of integrity.

There may be other approaches that would work.
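The UPS advice (flush caches and halt further access before the system goes down) might look like this inside an application that buffers its own writes. This is a minimal Python sketch; the class and function names are hypothetical, and it assumes the UPS monitoring software triggers an orderly shutdown that ends up delivering `SIGTERM` to the process.

```python
import os
import signal


class GuardedWriter:
    """Hypothetical writer that refuses new writes and flushes to disk
    once a power-failure shutdown has been requested."""

    def __init__(self, path):
        self.f = open(path, "ab")
        self.accepting = True

    def write(self, data: bytes) -> None:
        if not self.accepting:
            raise RuntimeError("shutting down: write refused")
        self.f.write(data)  # may sit in Python's and the OS's buffers for now

    def on_power_fail(self, signum=None, frame=None):
        if self.f.closed:
            return  # already flushed and closed: nothing left to do
        # 1. Halt further access so nothing new arrives mid-flush.
        self.accepting = False
        # 2. Push Python's buffer to the OS, then the OS cache to the device.
        self.f.flush()
        os.fsync(self.f.fileno())
        self.f.close()


def install_handler(writer: GuardedWriter) -> None:
    # Assumption: the UPS monitor's shutdown sequence ends in SIGTERM.
    signal.signal(signal.SIGTERM, writer.on_power_fail)
```

Calling `install_handler(writer)` once at startup is enough; the handler stops new writes before flushing, so nothing can slip in between the flush and the shutdown.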
nicc777 <nicc777@gmail.com> said:
>Basically I want a file system spanning multiple physical hosts, but
>mounted on a server as a single mount point - almost like having some
>kind of RAID over various physical nodes. The idea is that if one node
>fails, the server should still be able to continue with processing
>(assume it's something like a database). I am hoping that when the
>failed file server(s) come back online, the "RAID" should be rebuilt
>on the fly, like a real RAID. This will be first prize.

This doesn't sound like what is commonly called a "cluster file system". A "cluster file system" (to me, at least) means a set of shared storage accessible by several computers concurrently (as local storage). So the actual storage media are concurrently connected to several hosts, and all hosts are able to concurrently access the file system(s) created on the shared media.

What you describe sounds like something I've heard Google has developed for their search data storage - but I haven't heard of such technology being used anywhere else.

>PS: if anybody knows something about battery backups for hard drives,
>and how to protect the hard drive cache in case of power failures etc.,
>I would also like to hear some of your thoughts/learnings/experiences.

There are storage subsystems (disk racks) from various vendors with various amounts of local battery-backed cache. As long as the write to the cache completes, the disk subsystem will be in a consistent state (and when the power resumes, it will flush any buffered data to the disks).
--
Wolf a.k.a. Juha Laiho     Espoo, Finland
(GC 3.0) GIT d- s+: a C++ ULSH++++$ P++@ L+++ E- W+$@ N++ !K w !O !M V
PS(+) PE Y+ PGP(+) t- 5 !X R !tv b+ !DI D G e+ h---- r+++ y++++
"...cancel my subscription to the resurrection!" (Jim Morrison)
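To make the battery-backed cache point concrete, here is a toy Python model (a hypothetical class, not any vendor's interface): a write is acknowledged as soon as it is in the cache, and because the battery keeps the cache alive through an outage, acknowledged writes still reach the disks when power returns. Real controllers do this in firmware, with many details that are not modelled here.

```python
class BatteryBackedCache:
    """Toy model of a disk subsystem with a battery-backed write cache."""

    def __init__(self):
        self.cache = {}   # block number -> data; survives power loss (battery)
        self.disk = {}    # block number -> data actually on the platters
        self.powered = True

    def write(self, block, data):
        if not self.powered:
            raise RuntimeError("no power")
        self.cache[block] = data  # acknowledged as soon as it is cached
        return "ack"

    def destage(self):
        # Flush every cached block to disk, then empty the cache.
        self.disk.update(self.cache)
        self.cache.clear()

    def power_fail(self):
        # The battery keeps the cache contents; nothing reaches disk yet.
        self.powered = False

    def power_restore(self):
        self.powered = True
        self.destage()  # write out anything buffered before the outage
```

The design point the model captures is that "acknowledged" and "on the platters" are different events; the battery is what makes it safe to report the first before the second has happened.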
On Sep 2, 12:30 am, The Natural Philosopher <a...@b.c> wrote:
> Rather than asking for a particular solution, how about describing the
> exact problem you are trying to solve: how much storage, how much
> throughput, and what level of integrity.
>
> There may be other approaches that would work.

Ok - here goes...

I am trying to put together an architecture that can handle a large amount of data processing in near real time (data of around 10TB per day), but the processing of data should not be broken by any hardware failures. Think of an accounting/billing system like you get in mobile operators (or the telecoms industry in general) - if they have a critical hardware failure in the billing modules, one of two things could happen: 1) you can make calls for free for a period of time; or 2) you can make no calls at all (the system stops you). Which scenario you get depends on how the system is designed to handle situations where the billing module stops working.

*** I am trying to prevent the billing system from breaking in the first place. ***

That means a "cluster" with nodes in various physical locations, working as a single image. I don't know if there is a term for this, but the best I can come up with is an entire system looking/working like a RAID5 disk system (maybe the term "grid" also kind of describes it).
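The RAID 5 analogy rests on XOR parity: with N data blocks stored on N nodes and their XOR stored on one more node, any single lost block can be recomputed from the survivors, which is what "rebuilt on the fly" amounts to. A minimal Python sketch of one stripe, under simplifying assumptions: all blocks have the same length, parity sits on the last node (real RAID 5 rotates it across members), and there is no networking. The function names are illustrative.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe(data_blocks):
    """Blocks to store on N+1 nodes: the N data blocks plus one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stored, failed):
    """Recompute the block held by node `failed` from all surviving nodes.

    Works because the XOR of every block in a stripe (data + parity) is zero.
    """
    survivors = [block for i, block in enumerate(stored) if i != failed]
    return xor_blocks(survivors)
```

This tolerates exactly one failed node per stripe; surviving two simultaneous failures needs a second, independent parity (as in RAID 6) or plain replication.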
Storage obviously makes up a big part of the solution, as all the data that is streamed to the system should be written to disk first, to guarantee that the nodes processing the data can retrieve it again (in case of a failure somewhere else in the processing system). This "raw" data might also be archived from time to time when you are collecting data for troubleshooting parts of your system - the captured data can be "re-run" in a simulation environment where you can recreate a day's worth of processing, perhaps slowed down, while you debug (if that makes sense to anybody).

From the commercial products, I think Oracle (11g) is the closest to what I want - but it would be great if we could put a complete "open source" solution together that many others can use. Not that I have anything against Oracle per se, but I have this thing against vendor lock-in, and Oracle is rather good at that.

And yes - I do believe a lot of coding will have to take place. Personally I am leaning toward something like MPI for the data processing bits - but reliable storage is still a problem for me. I don't know how I can get a (physically) distributed file system that looks and behaves like a single system image (or partition) and provides highly reliable data storage.

I hope that sheds some more light on what I want to achieve.

Thanks
Nico
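The "write everything to disk first, then re-run it later" idea is essentially an append-only log. A minimal Python sketch, assuming a hypothetical on-disk format of a 4-byte big-endian length prefix followed by the record bytes: each record is fsync'ed before `append` returns, and `replay` yields records in their original order, skipping a torn final record left by a crash. It reopens the file on every append for simplicity; a real system would keep it open and batch fsyncs.

```python
import os
import struct


class RawLog:
    """Hypothetical append-only log of raw input records."""

    def __init__(self, path):
        self.path = path

    def append(self, record: bytes) -> None:
        # Length prefix + payload, forced to stable storage before returning.
        with open(self.path, "ab") as f:
            f.write(struct.pack(">I", len(record)) + record)
            f.flush()
            os.fsync(f.fileno())

    def replay(self):
        """Yield records in the order they were written."""
        with open(self.path, "rb") as f:
            while True:
                header = f.read(4)
                if len(header) < 4:
                    return  # end of log (or a torn final header)
                (length,) = struct.unpack(">I", header)
                payload = f.read(length)
                if len(payload) < length:
                    return  # torn final record from a crash: ignore it
                yield payload
```

A replay into a test environment could pace itself (for example by sleeping between records) to get the slowed-down re-run described above.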
nicc777 wrote:
> On Sep 2, 12:30 am, The Natural Philosopher <a...@b.c> wrote:
>> nicc777 wrote:
>> Rather than asking for a particular solution, how about describing the
>> exact problem you are trying to solve: how much storage, how much
>> throughput, and what level of integrity.
>>
>> There may be other approaches that would work.
>
> Ok - here goes...
>
> I am trying to put together an architecture that can handle a large
> amount of data processing in near real time (data of around 10TB per
> day), but the processing of data should not be broken by any hardware
> failures. Think of an accounting/billing system like you get in mobile
> operators (or the telecoms industry in general) - if they have a
> critical hardware failure in the billing modules, one of two things
> could happen: 1) you can make calls for free for a period of time; or
> 2) you can make no calls at all (the system stops you). Which scenario
> you get depends on how the system is designed to handle situations
> where the billing module stops working.

Is this a database application? I am not sure, but I think that if you are trying to spread a database across multiple engines, this is something that people like Oracle have addressed already.

> *** I am trying to prevent the billing system from breaking in the
> first place. ***

Well yes, one always hopes...

> That means a "cluster" with nodes in various physical locations,
> working as a single image. I don't know if there is a term for this,
> but the best I can come up with is an entire system looking/working
> like a RAID5 disk system (maybe the term "grid" also kind of describes
> it). Storage obviously makes up a big part of the solution, as all the
> data that is streamed to the system should be written to disk first,
> to guarantee that the nodes processing the data can retrieve it again
> (in case of a failure somewhere else in the processing system). This
> "raw" data might also be archived from time to time when you are
> collecting data for troubleshooting parts of your system - the
> captured data can be "re-run" in a simulation environment where you
> can recreate a day's worth of processing, perhaps slowed down, while
> you debug (if that makes sense to anybody).

Mm. Again, I think this is something that database people have addressed, but in a different way. Google 'oracle cluster' or 'VAX cluster' and see what you come up with. I am very hazy on exactly what is involved here, but I *think* that is a way of having multiple transaction engines that synchronise with each other. The application, rather than the hardware or OS software, does this.

I have to say that in this case my first port of call would be to phone an IBM salesman. My brother-in-law works for them in this sort of area; I might drop him an email. I know he deals with the implementation of massive database projects for very large companies. He wouldn't know himself, but he probably knows a man who does.

> From the commercial products, I think Oracle (11g) is the closest to
> what I want - but it would be great if we could put a complete "open
> source" solution together that many others can use. Not that I have
> anything against Oracle per se, but I have this thing against vendor
> lock-in, and Oracle is rather good at that.

Of all the software people I have worked with, Oracle get my vote for total professionalism of approach. At least in the UK, it's them and IBM. Both have chosen to be ultra-high-quality, total-support vendors to pretty large installation user bases. That doesn't come free, and largely you accept the lock-in as the price you pay for the quality you get.

> And yes - I do believe a lot of coding will have to take place.
> Personally I am leaning toward something like MPI for the data
> processing bits - but reliable storage is still a problem for me.
> I don't know how I can get a (physically) distributed file system that
> looks and behaves like a single system image (or partition) and
> provides highly reliable data storage.

I think you should be looking more at a multiple database/machine system that looks like a single one. In essence, transactions are, I think, cached and distributed amongst machines.

I will say one thing: the likelihood of, say, a heavily monitored server farm going down (Oracle UK have about half an acre of machines, mainly customer machines, in their facility) is a LOT less than the likelihood of a WAN link going down. Likewise, the ability to concentrate 24x7 staff in a single location, with the expertise and spares they need, is also great. Plus the cost of building a fully power-conditioned, air-conditioned environment (yea, even unto diesel generators) that is, if your paranoia runs that deep, terrorist-bomb-proof is a lot less than spreading it all around.

What comes to my mind would be some sort of cluster of RAIDed machines - proper hardware RAID, fully hot-swappable - backed up by service contracts, hot and cold running technicians and plenty of spares, and built with e.g. twin Ethernet cards for redundancy.

The networking layer itself needs attention: you will need a failover routing policy in there as well. Or perhaps simply create two networks with two switches and run some sort of dynamic routing on the boxes so they use whichever network is available...

...but you still need to take machines down sometimes. More on that later.

You need multiply redundant internet connections from TRULY different suppliers running down TRULY different bits of physical fibre, coming into your site by totally different physical routes. Yep, the digger-through-the-single-point-conduit scenario looms here. ;-)

Unsurprisingly, this is precisely the sort of facility that e.g. Oracle have. So hosting at a place like that, even if you have a few of YOUR staff permanently seconded to the site, is probably cost-effective unless you truly have megabucks to play with.

Now you have a hardware system that should take you through anything but a CPU-type failure. I have a feeling that it's possible to have twin engines hitting a single RAID disk array as well, which takes care of that; you need to look into it.

IF you do all THAT, you should have a system that is hardware-failure-proof, at least to a huge degree of probability. Then conventional database clustering takes care of the load issues. And maybe more. I don't know.

Finally, if it all does fsck up, that's YOUR problem, not the customers'. They get to use the system for free while you sort it out. That's good marketing. And it also gives the bean counters something to use as a yardstick for 'lost revenue versus downtime', with which you can beat them when they complain about the costs of all this.

The facility needs full power conditioning, aircon and physical security.

> I hope that sheds some more light on what I want to achieve.
>
> Thanks
>
> Nico
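The case for several internet connections from truly different suppliers over truly different routes can be put in numbers. Assuming each of n links is independently available a fraction a of the time, the connection is only down when all of them are down at once:

```latex
A_n = 1 - (1 - a)^n,
\qquad a = 0.99:\quad A_1 = 0.99,\qquad A_2 = 1 - 0.01^2 = 0.9999
```

With these illustrative figures, roughly 3.65 days of downtime a year for one link becomes under an hour a year for two. The independence assumption is exactly what the digger-through-the-conduit scenario breaks: if both links share one trench, a single cut takes both out, and the redundancy buys nothing against that failure.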
Thanks for all the advice - it really helps to hear someone else's thoughts on these issues. I suppose there is only so much you can do, and in the end it depends on what the cost implications would be. You have given me a lot to chew on, so let me be off then.

Cheers and thanks again!
On Sep 2, 8:17 am, Juha Laiho <Juha.La...@iki.fi> wrote:
> This doesn't sound like what is commonly called a "cluster file system".
> A "cluster file system" (to me, at least) means a set of shared storage
> accessible by several computers concurrently (as local storage).

Thanks for your thoughts.
On Sep 1, 5:50 pm, nicc777 <nicc...@gmail.com> wrote:
> Basically I want a file system spanning multiple physical hosts, but
> mounted on a server as a single mount point - almost like having some
> kind of RAID over various physical nodes. The idea is that if one node
> fails, the server should still be able to continue with processing
> (assume it's something like a database). I am hoping that when the
> failed file server(s) come back online, the "RAID" should be rebuilt
> on the fly, like a real RAID. This will be first prize.

Hi,

Take a look at http://freshmeat.net/projects/chironfs/

Except (as yet) for the resync part, it may be worth a try.

Hope it helps,
Alex
nicc777 wrote:
> Basically I want a file system spanning multiple physical hosts, but
> mounted on a server as a single mount point - almost like having some
> kind of RAID over various physical nodes. The idea is that if one node
> fails, the server should still be able to continue with processing
> (assume it's something like a database). I am hoping that when the
> failed file server(s) come back online, the "RAID" should be rebuilt
> on the fly, like a real RAID. This will be first prize.

Try Coda! http://www.coda.cs.cmu.edu/
|
|