Unix Technical Forum

Q: Cluster File System Reliability

#1 - nicc777 - 01-19-2008, 06:15 AM
Q: Cluster File System Reliability

Hi there,

I am rather new to cluster file systems, so let me state my
requirements, and hopefully the kind people here can point me in the
right direction.

Basically I want a file system spanning multiple physical hosts, but
mounted on a server as a single mount point - almost like having some
kind of RAID over various physical nodes. The idea is that if one node
fails, the server should still be able to continue with processing
(assume it's something like a database). I am hoping that when the
failed file server(s) come back online, the "RAID" should be rebuilt
on the fly, like a real RAID. This will be first prize.

If the above is not possible, or if the performance penalty is too
high, I will also settle for a real time mirroring solution, but with
one crucial requirement: the write on the remote file server (the
mirror) should be guaranteed - in other words, first ensure the data
is on the remote host before writing to the local host disc, and then
only return to the system. I know this will also have a very negative
impact on performance, but I am between a rock and a hard place with
all the requirements (fast file system, but never ever lose any
data).

Perhaps my requirements can not be met, but then I at least need some
pointer as to the next best thing.

Any help will be greatly appreciated. I will report back to the group in
months to come as to what eventually worked for us (or didn't work and
why).

Thanks in advance.

PS: if anybody knows something about battery backups for hard drives,
and how to protect the hard drive cache in case of power failures etc,
I would also like to hear some of your thoughts/learnings/experiences.

#2 - The Natural Philosopher - 01-19-2008, 06:15 AM
Re: Q: Cluster File System Reliability

nicc777 wrote:
> Hi there,
>
> I am rather new to cluster file systems, so let me state my
> requirements, and hopefully the kind people here can point me in the
> right direction.
>
> Basically I want a file system spanning multiple physical hosts, but
> mounted on a server as a single mount point - almost like having some
> kind of RAID over various physical nodes. The idea is that if one node
> fails, the server should still be able to continue with processing
> (assume it's something like a database). I am hoping that when the
> failed file server(s) come back online, the "RAID" should be rebuilt
> on the fly, like a real RAID. This will be first prize.
>


Not much to ask...is it? :-)

> If the above is not possible, or if the performance penalty is too
> high, I will also settle for a real time mirroring solution, but with
> one crucial requirement: the write on the remote file server (the
> mirror) should be guaranteed - in other words, first ensure the data
> is on the remote host before writing to the local host disc, and then
> only return to the system. I know this will also have a very negative
> impact on performance, but I am between a rock and a hard place with
> all the requirements (fast file system, but never ever lose any
> data).


I think proprietary systems exist, but not on Linux as such. You would
need to hack some code I think.

I can conceive of one Linux box running a custom daemon talking to
similar daemons, intercepting network file requests and sharing them out
amongst the other machines. Use synchronous writes to ensure data
integrity, and loads of RAM to cache files with.
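
A minimal sketch of the synchronous-write part of that idea, in Python. It assumes the mirror is reachable as an ordinary path (for example a network mount); `durable_write` and `mirrored_write` are hypothetical names, not part of any existing daemon. The call returns only after both copies have been fsync'ed, mirror first, which is the ordering asked for in the original question. Whether fsync on a network mount really means "on the remote disk" depends on the file system and its mount options, so treat this as an illustration of the ordering only.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data to path and block until fsync reports it as flushed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # do not return until the kernel has flushed the file
    finally:
        os.close(fd)


def mirrored_write(remote_path: str, local_path: str, data: bytes) -> None:
    """Write to the mirror first, then locally, then return to the caller."""
    durable_write(remote_path, data)  # assumption: a path on the mirror host
    durable_write(local_path, data)
```

If the mirror write fails, the exception propagates before the local copy is touched, so the caller never sees a success that exists on only one side.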
>


You might use linux as a starting point, but you would be writing your
own code to run on it, I suspect.

> Perhaps my requirements can not be met, but then I at least need some
> pointer as to the next best thing.
>
> Any help will be greatly appreciated. I will report back to the group in
> months to come as to what eventually worked for us (or didn't work and
> why).
>
> Thanks in advance.
>
> PS: if anybody knows something about battery backups for hard drives,
> and how to protect the hard drive cache in case of power failures etc,
> I would also like to hear some of your thoughts/learnings/experiences.
>

The solution there is to have UPS and power monitoring to flush caches
and halt further access before the system is taken down.
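
A rough Python sketch of that sequence, under stated assumptions: something else (a UPS monitoring daemon, say) reports whether the machine is on battery and how many minutes remain, and `WriteGate`, `on_power_event` and the five-minute threshold are all hypothetical. Once the threshold is crossed the gate refuses further writes and asks the kernel to write out dirty buffers via `os.sync()` (Unix only). The actual system halt is left out.

```python
import os
import threading


class WriteGate:
    """Refuses new writes once a low-battery event has been seen."""

    def __init__(self, sync=os.sync):
        self._closed = threading.Event()
        self._sync = sync  # injectable so the flush can be replaced in tests

    def write_allowed(self) -> bool:
        return not self._closed.is_set()

    def on_power_event(self, on_battery: bool, minutes_left: float,
                       threshold: float = 5.0) -> bool:
        """Return True if the flush-and-stop sequence was triggered."""
        if on_battery and minutes_left <= threshold:
            self._closed.set()  # halt further access first
            self._sync()        # then flush what is already buffered
            return True
        return False
```

Application code would check `write_allowed()` before accepting new work, so nothing new is queued while the flush is in progress.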


Rather than asking for a particular solution, how about describing the
exact problem you are trying to solve..like how much storage, and how
much throughput, and what level of integrity.

There may be other approaches that would work..
#3 - Juha Laiho - 01-19-2008, 06:15 AM
Re: Q: Cluster File System Reliability

nicc777 <nicc777@gmail.com> said:
>I am rather new to cluster file systems, so let me state my
>requirements, and hopefully the kind people here can point me in the
>right direction.
>
>Basically I want a file system spanning multiple physical hosts, but
>mounted on a server as a single mount point - almost like having some
>kind of RAID over various physical nodes. The idea is that if one node
>fails, the server should still be able to continue with processing
>(assume it's something like a database). I am hoping that when the
>failed file server(s) come back online, the "RAID" should be rebuilt
>on the fly, like a real RAID. This will be first prize.


This doesn't sound like what is commonly called a "cluster file system";
a "cluster file system" (to me, at least) means a set of shared storage
accessible by several computers concurrently (as local storage).
So, the actual storage media will be concurrently connected to
several hosts, and all hosts are able to concurrently access the
file system(s) created on the shared media.

What you describe sounds like something I've heard Google has developed
for their search data storage - but I haven't heard of such technology
being used anywhere else.

>PS: if anybody knows something about battery backups for hard drives,
>and how to protect the hard drive cache in case of power failures etc,
>I would also like to hear some of your thoughts/learnings/experiences.


There are storage subsystems (disk racks) from various vendors with
various amounts of local battery-backed cache. As long as the write
to the cache completes, the disk subsystem will be in a consistent
state (and when the power returns, it will flush any buffered data to
disks).
--
Wolf a.k.a. Juha Laiho Espoo, Finland
(GC 3.0) GIT d- s+: a C++ ULSH++++$ P++@ L+++ E- W+$@ N++ !K w !O !M V
PS(+) PE Y+ PGP(+) t- 5 !X R !tv b+ !DI D G e+ h---- r+++ y++++
"...cancel my subscription to the resurrection!" (Jim Morrison)
#4 - nicc777 - 01-19-2008, 06:15 AM
Re: Q: Cluster File System Reliability

On Sep 2, 12:30 am, The Natural Philosopher <a...@b.c> wrote:
> Rather than asking for a particular solution, how about describing the
> exact problem you are trying to solve..like how much storage, and how
> much throughput, and what level of integrity.
>
> There may be other approaches that would work..


Ok - here goes...

I am trying to put together an architecture that can handle a large
amount of data processing in near real time (data of around 10TB per
day), but the processing of data should not be broken by any hardware
failures. Think of an accounting/billing system like you get in mobile
operators (or the telecoms industry in general) - if they have a
critical hardware failure in the billing modules one of two things
could happen: 1) you can make calls for free for a period of time; or
2) you can make no calls at all (system stops you). The two scenarios
depend on how the system is designed to handle situations where the
billing module stops working.

*** I am trying to prevent the billing system from breaking in the
first place. ***

That means a "cluster" with nodes in various physical locations,
working as a single image. I don't know if there is a term for this,
but the best I can come up with is an entire system looking/working
like a RAID5 disk system (maybe the term "grid" also kind of describes
it). Storage obviously makes a big part of the solution, as all the
data that is streamed to the system should be written to disk first to
guarantee that the nodes processing the data can retrieve the data
again (in case of a failure somewhere else in the processing system).
This "raw" data might also be archived from time to time when you are
collecting data for troubleshooting parts of your system - the
captured data can be "re-run" in a simulation environment where you
can recreate a day's worth of processing, but perhaps slowed down or
something while you debug (if that makes sense to anybody).
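
To make the "written to disk first, re-run later" idea concrete, here is a small Python sketch under stated assumptions: the raw stream is kept in a single append-only file of length-prefixed records, and `append_record` and `replay` are hypothetical names. A record counts as accepted only after fsync returns; replay stops quietly at a torn final record, which is what a crash in mid-append would leave behind. It says nothing about spreading that file over several hosts, which is the hard part.

```python
import os
import struct
from typing import Iterator

_LEN = struct.Struct(">I")  # 4-byte big-endian length prefix per record


def append_record(log_path: str, payload: bytes) -> None:
    """Append one record and return only after it has been fsync'ed."""
    with open(log_path, "ab") as log:
        log.write(_LEN.pack(len(payload)))
        log.write(payload)
        log.flush()             # push Python's buffer to the kernel
        os.fsync(log.fileno())  # ask the kernel to push it to the device


def replay(log_path: str) -> Iterator[bytes]:
    """Yield every complete record in the order it was appended."""
    with open(log_path, "rb") as log:
        while True:
            header = log.read(_LEN.size)
            if len(header) < _LEN.size:
                return  # clean end of log, or a truncated header
            (length,) = _LEN.unpack(header)
            payload = log.read(length)
            if len(payload) < length:
                return  # torn final record from an interrupted append
            yield payload
```

A processing node that dies can be restarted on `replay(...)` from the beginning, and the same file can be copied to a simulation environment and fed through at whatever pace suits debugging.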

Of the commercial products, I think Oracle is the closest in terms
of what I want (11g) - but it would be great if we can put a complete
"open source" solution together that many others can use. Not that I
have something against Oracle per se, but I have this thing against
vendor lock-in, and Oracle is rather good with that.

And yes - I do believe a lot of coding will have to take place.
Personally I am leaning toward something like MPI for the data
processing bits - but still, reliable storage is a problem for me. I
don't know how I can get a distributed (physically) filesystem that
looks and behaves like a single system image (or partition) that would
make for highly reliable data storage.

I hope that sheds some more light on what I want to achieve.

Thanks

Nico

#5 - The Natural Philosopher - 01-19-2008, 06:15 AM
Re: Q: Cluster File System Reliability

nicc777 wrote:
> On Sep 2, 12:30 am, The Natural Philosopher <a...@b.c> wrote:
>> nicc777 wrote:


>> Rather than asking for a particular solution, how about describing the
>> exact problem you are trying to solve..like how much storage, and how
>> much throughput, and what level of integrity.
>>
>> There may be other approaches that would work..

>
> Ok - here goes...
>
> I am trying to put together an architecture that can handle a large
> amount of data processing in near real time (data of around 10TB per
> day), but the processing of data should not be broken by any hardware
> failures. Think of an accounting/billing system like you get in mobile
> operators (or the telecoms industry in general) - if they have a
> critical hardware failure in the billing modules one of two things
> could happen: 1) you can make calls for free for a period of time; or
> 2) you can make no calls at all (system stops you). The two scenarios
> depend on how the system is designed to handle situations where the
> billing module stops working.
>


Is this a database application?
I am not sure, but I think that if you are trying to spread a database
across multiple engines, this is something that people like Oracle have
already addressed.

> *** I am trying to prevent the billing system from breaking in the
> first place. ***
>

Well yes, one always hopes..

> That means a "cluster" with nodes in various physical locations,
> working as a single image. I don't know if there is a term for this,
> but the best I can come up with is an entire system looking/working
> like a RAID5 disk system (maybe the term "grid" also kind of describes
> it). Storage obviously makes a big part of the solution, as all the
> data that is streamed to the system should be written to disk first to
> guarantee that the nodes processing the data can retrieve the data
> again (in case of a failure somewhere else in the processing system).
> This "raw" data might also be archived from time to time when you are
> collecting data for troubleshooting parts of your system - the
> captured data can be "re-run" in a simulation environment where you
> can recreate a day's worth of processing, but perhaps slowed down or
> something while you debug (if that makes sense to anybody).
>


MM. Again I think this is something that database people have addressed
but in a different way.

Google 'Oracle cluster' or 'VAX cluster' and see what you come up with.

I am very hazy on exactly what is involved here, but I *think* that is a
way of having multiple transaction engines that synchronise with each
other. The application, rather than the hardware or OS software, does this.

I have to say in this case my first port of call would be to phone an
IBM salesman. My brother in law works for them in this sort of
area..might drop him an email..I know he deals with implementation of
massive database projects for very large companies. He wouldn't know
himself, but he probably knows a man who does..


> Of the commercial products, I think Oracle is the closest in terms
> of what I want (11g) - but it would be great if we can put a complete
> "open source" solution together that many others can use. Not that I
> have something against Oracle per se, but I have this thing against
> vendor lock-in, and Oracle is rather good with that.
>


Of all the software people I have worked with, Oracle get my vote for
total professionalism of approach. At least in the UK, them and IBM.
Both have chosen to be ultra high quality total support vendors to
pretty large installation user bases. That doesn't come free, and
largely you accept the lock in as the price you pay for the quality you get.


> And yes - I do believe a lot of coding will have to take place.
> Personally I am leaning toward something like MPI for the data
> processing bits - but still, reliable storage is a problem for me. I
> don't know how I can get a distributed (physically) filesystem that
> looks and behaves like a single system image (or partition) that would
> make for highly reliable data storage.
>


I think you should be looking more at a multiple database/machine
system that looks like a single one.

In essence transactions are, I think, cached and distributed amongst
machines.

I will say one thing: the likelihood of - say - a heavily monitored
server farm going down (Oracle UK have about half an acre of machines,
mainly customer machines, in their facility) is a LOT less than the
likelihood of a WAN link going down.

Likewise the ability to concentrate 24x7 staff in a single location,
with the expertise and spares they need, is also great.

Plus the cost of building - yea even unto diesel generators - a fully
power conditioned air conditioned environment that is - if your paranoia
runs that deep - terrorist bomb proof..is a lot less than spreading it
all around.

What comes to my mind would be some sort of cluster of RAIDED machines -
proper hardware RAID, fully hot swappable - backed up by service
contracts, hot and cold running technicians and plenty of spares..and
built with e.g. twin Ethernet cards etc..for redundancy. The networking
layer itself needs attention..you will need a failover routing policy in
there as well. Or perhaps simply create two networks with two switches
and run some sort of dynamic routing on the boxes so they use whichever
network is available..
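
Dynamic routing proper lives in the network layer, but the "use whichever network is available" behaviour can also be approximated in the application. The Python sketch below is that simpler stand-in, not a routing policy: it assumes each server is reachable at one address per network, tries them in order and returns the first connection that succeeds. `connect_with_failover` and the addresses in the usage note are hypothetical.

```python
import socket
from typing import Callable, Sequence, Tuple

Endpoint = Tuple[str, int]  # (host, port) of the same server on one network


def connect_with_failover(
    endpoints: Sequence[Endpoint],
    timeout: float = 2.0,
    connect: Callable[..., socket.socket] = socket.create_connection,
) -> socket.socket:
    """Try each network path in order and return the first that connects."""
    last_error = None
    for endpoint in endpoints:
        try:
            return connect(endpoint, timeout=timeout)
        except OSError as exc:
            last_error = exc  # this path is down, fall through to the next
    raise ConnectionError(f"all {len(endpoints)} paths failed") from last_error
```

A caller might pass `[("10.0.0.5", 5432), ("10.1.0.5", 5432)]`, one address on each switch. This covers a failure at connect time only; a path that dies in the middle of a session still has to be detected and retried by the caller.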

...but you still need to take machines down sometimes...more on that later.

You need multiply redundant internet connections from TRULY different
suppliers running down TRULY different bits of physical fibre. Coming
into your site by totally different physical routes..yep the digger
through the single point conduit scenario looms here..;-)

Unsurprisingly this is precisely the sort of facility that e.g. Oracle
have..

So hosting at a place like that, even if you have a few of YOUR staff
permanently seconded to the site, is probably cost effective unless you
truly have megabucks to play with.

Now you have a hardware system that should take you through anything but
CPU type failure.

I have a feeling that it's possible to have twin engines hitting a single
RAID disk array as well, which takes care of that..you need to look into
that.

IF you do all THAT, you should have a system that is hardware failure
proof, at least to a huge degree of probability.

Then conventional database clustering takes care of the load issues. And
maybe more. I don't know.

Finally, if it all does fsck up, that's YOUR problem, not the customers'.
They get to use the system for free while you sort it out. That's good
marketing. And also gives the bean counters something to use as a
yardstick for 'lost revenue versus downtime' with which you can beat
them when they complain about the costs of all this.

The facility needs full power conditioning, aircon and physical security.



> I hope that sheds some more light on what I want to achieve.
>
> Thanks
>
> Nico
>

#6 - nicc777 - 01-19-2008, 06:15 AM
Re: Q: Cluster File System Reliability


Thanks for all the advice - it really helps to hear someone else's
thoughts on these issues. I suppose there is only so much you can do,
and in the end it depends what the cost implication would be.

You have given me a lot to chew on, so let me be off then.

Cheers and thanks again !

#7 - nicc777 - 01-19-2008, 06:15 AM
Re: Q: Cluster File System Reliability

On Sep 2, 8:17 am, Juha Laiho <Juha.La...@iki.fi> wrote:
> nicc777 <nicc...@gmail.com> said:
>
> >I am rather new to cluster file systems, so let me state my
> >requirements, and hopefully the kind people here can point me in the
> >right direction.

>
> >Basically I want a file system spanning multiple physical hosts, but
> >mounted on a server as a single mount point - almost like having some
> >kind of RAID over various physical nodes. The idea is that if one node
> >fails, the server should still be able to continue with processing
> >(assume it's something like a database). I am hoping that when the
> >failed file server(s) come back online, the "RAID" should be rebuilt
> >on the fly, like a real RAID. This will be first prize.

>
> This doesn't sound like what is commonly called a "cluster file system";
> a "cluster file system" (to me, at least) means a set of shared storage
> accessible by several computers concurrently (as local storage).
> So, the actual storage media will be concurrently connected to
> several hosts, and all hosts are able to concurrently access the
> file system(s) created on the shared media.
>
> What you describe sounds like something I've heard Google has developed
> for their search data storage - but I haven't heard of such technology
> being used anywhere else.
>
> >PS: if anybody knows something about battery backups for hard drives,
> >and how to protect the hard drive cache in case of power failures etc,
> >I would also like to hear some of your thoughts/learnings/experiences.

>
> There are storage subsystems (disk racks) from various vendors with
> various amounts of local battery-backed cache. As long as the write
> to the cache completes, the disk subsystem will be in a consistent
> state (and when the power returns, it will flush any buffered data to
> disks).


Thanks for your thoughts.

#8 - Alex - 01-19-2008, 06:15 AM
Re: Q: Cluster File System Reliability

On Sep 1, 5:50 pm, nicc777 <nicc...@gmail.com> wrote:
> Hi there,
>
> Basically I want a file system spanning multiple physical hosts, but
> mounted on a server as a single mount point - almost like having some
> kind of RAID over various physical nodes. The idea is that if one node
> fails, the server should still be able to continue with processing
> (assume it's something like a database). I am hoping that when the
> failed file server(s) come back online, the "RAID" should be rebuilt
> on the fly, like a real RAID. This will be first prize.
>


Hi,
Take a look at http://freshmeat.net/projects/chironfs/
Except for the resync part (not there yet), it may be worth a try.

Hope it helps,
Alex

#9 - Eric Bechet - 01-19-2008, 06:15 AM
Re: Q: Cluster File System Reliability

nicc777 wrote:
> Hi there,
>
> I am rather new to cluster file systems, so let me state my
> requirements, and hopefully the kind people here can point me in the
> right direction.
>
> Basically I want a file system spanning multiple physical hosts, but
> mounted on a server as a single mount point - almost like having some
> kind of RAID over various physical nodes. The idea is that if one node
> fails, the server should still be able to continue with processing
> (assume it's something like a database). I am hoping that when the
> failed file server(s) come back online, the "RAID" should be rebuilt
> on the fly, like a real RAID. This will be first prize.
>
> If the above is not possible, or if the performance penalty is too
> high, I will also settle for a real time mirroring solution, but with
> one crucial requirement: the write on the remote file server (the
> mirror) should be guaranteed - in other words, first ensure the data
> is on the remote host before writing to the local host disc, and then
> only return to the system. I know this will also have a very negative
> impact on performance, but I am between a rock and a hard place with
> all the requirements (fast file system, but never ever lose any
> data).
>
> Perhaps my requirements can not be met, but then I at least need some
> pointer as to the next best thing.
>
> Any help will be greatly appreciated. I will report back to the group in
> months to come as to what eventually worked for us (or didn't work and
> why).
>
> Thanks in advance.
>
> PS: if anybody knows something about battery backups for hard drives,
> and how to protect the hard drive cache in case of power failures etc,
> I would also like to hear some of your thoughts/learnings/experiences.
>
>

Try Coda!
http://www.coda.cs.cmu.edu/