public inbox for linux-xfs@vger.kernel.org
* XFS Resiliency to the disk errors.
@ 2007-04-05  8:08 Zak, Semion
  2007-04-05 16:06 ` Eric Sandeen
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Zak, Semion @ 2007-04-05  8:08 UTC
  To: xfs

Hi,
 
We are studying the possibility of using XFS with cheap (not too
reliable) disks, so we have some questions:

What does XFS do to survive disk errors (bad sectors)?
I know about superblock duplication in every AG. What else?

How does XFS behave on disk errors (panic / no mount / partial
data access)?

What can be done to recover?
Would zeroing the bad sectors, dumping to another device,
reformatting, and restoring help?
 
Thanks,
 
Semion.



* Re: XFS Resiliency to the disk errors.
From: Eric Sandeen @ 2007-04-05 16:06 UTC
  To: Zak, Semion; +Cc: xfs

Zak, Semion wrote:
> Hi,
>  
> We are studying the possibility of using XFS with cheap (not too
> reliable) disks, so we have some questions:
>
> What does XFS do to survive disk errors (bad sectors)?
> I know about superblock duplication in every AG. What else?
>
> How does XFS behave on disk errors (panic / no mount / partial
> data access)?

Generally, metadata I/O errors, or a bad magic number found in metadata,
will make XFS shut the filesystem down gracefully if it can.

I/O errors on data will just be reported as I/O errors.

> What can be done to recover?

xfsdump/xfsrestore, I suppose.

> Would zeroing the bad sectors, dumping to another device,
> reformatting, and restoring help?

Well, you can't make data out of nothing.

You could dd the contents off the junk drive, zeroing out unreadable
sectors, then point xfs_repair at the copy and hope for the best.
Depending on the problem, the result could wind up not being very good.
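
As a rough illustration of the "copy the drive off, zeroing out unreadable
sectors" step, here is a minimal Python sketch. It is not from this thread:
the function name rescue_copy, the 4096-byte block size and the paths are
assumptions made for the example, and a real rescue would more likely use
dd itself or a dedicated rescue tool.

```python
import os


def rescue_copy(src_path, dst_path, block_size=4096):
    """Copy src to dst block by block; unreadable blocks become zeros.

    Returns the list of byte offsets whose read raised an error.
    Hypothetical helper for illustration only.
    """
    bad_offsets = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a file or block device.
        size = os.lseek(src, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < size:
                want = min(block_size, size - offset)
                try:
                    data = os.pread(src, want, offset)
                except OSError:
                    # Read failed (for example EIO on a bad sector).
                    data = b""
                    bad_offsets.append(offset)
                # Pad failed or short reads with zeros so that every
                # later block keeps its original offset in the copy.
                dst.write(data.ljust(want, b"\0"))
                offset += want
    finally:
        os.close(src)
    return bad_offsets
```

The returned offsets say which blocks were zero-filled. xfs_repair would
then be run against the copy rather than against the failing drive; how
much it can recover depends on which blocks were lost.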

If you want to know how to recover from disaster, it sounds like perhaps 
your data is important enough that you should not plan for failure, but 
rather find a way to avoid it?

Seems to me the only way I'd want to put drives which are expected to 
fail regularly into a product is if the recovery method of "replace the 
disk and re-image the appliance" was acceptable, but that's just me.  :)

-Eric


* Re: XFS Resiliency to the disk errors.
From: Peter Grandi @ 2007-04-06 18:49 UTC
  To: Linux XFS

>>> On Thu, 5 Apr 2007 11:08:07 +0300, "Zak, Semion"
>>> <SZak@nds.com> said:

SZak> Hi, We are studying the possibility of using XFS with cheap
SZak> (not too reliable) disks, so we have some questions:

Astute move :-). I hope that you are also thinking of using
16-wide RAID5 too :-).
 
SZak> What does XFS do to survive disk errors (bad
SZak> sectors)? [ ... ]

My impression is that the XFS design is really meant for highly
scalable performance on enterprise-level hardware, where the
block device layer abstracts away all drive error issues,
including having UPSes.

Sure you can use it otherwise, but it has a very different
optimal usage envelope from 'ext3' or ReiserFS/Reiser4 (which
have been designed with stronger resiliency and recoverability
features, as they are more oriented to desktop and cheap server
usage).

Anyhow, a highly reliable block device layer can surely be built
out of cheap disks, if one does it right, and people like EMC2
do it regularly with their midrange products.

It may be interesting for you to have a look at the disk
reliability statistics in some recent papers by some Google and
CMU researchers, discussed here:

http://swik.net/User:dolander/All+Things+Distributed/On+the+Reliability+of+Hard+Disks/


* Re: XFS Resiliency to the disk errors.
From: Martin Steigerwald @ 2007-04-07 20:47 UTC
  To: linux-xfs

On Thursday, 05 April 2007, Zak, Semion wrote:
> Hi,
>
> We are studying the possibility of using XFS with cheap (not too
> reliable) disks, so we have some questions:

Hi Semion!

I recommend at least monitoring the health status of the drives using 
smartmontools - with regular short and long self-tests - or a similar 
mechanism. So you *may* at least be warned *before* a disk fails.
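
A minimal sketch of what such monitoring could look like when driven from
Python. smartctl is the command-line tool shipped with smartmontools; the
helper names, the action labels and the device path /dev/sda below are
placeholders invented for this example, not something from this thread.

```python
import subprocess

# smartctl options for the three operations mentioned above.
SMARTCTL_ACTIONS = {
    "health": ["-H"],               # overall health self-assessment
    "short-test": ["-t", "short"],  # start a short self-test
    "long-test": ["-t", "long"],    # start a long (extended) self-test
}


def smartctl_argv(device, action):
    """Build the smartctl command line for one action (hypothetical helper)."""
    return ["smartctl", *SMARTCTL_ACTIONS[action], device]


def run_smartctl(device, action):
    """Run smartctl and return the CompletedProcess for inspection.

    check=False because smartctl reports conditions through its exit
    status, so a non-zero status is something to examine, not
    necessarily a crash of the tool.
    """
    return subprocess.run(
        smartctl_argv(device, action),
        capture_output=True,
        text=True,
        check=False,
    )
```

For example, run_smartctl("/dev/sda", "short-test") could be called from a
scheduled job, with the "health" action checked afterwards. Running
smartctl normally needs root privileges, and smartmontools also ships a
daemon (smartd) that can do this kind of scheduling on its own.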

Otherwise I would go for a redundant RAID array, so that at least 
one drive in a bunch of drives can fail without data loss.

Regards,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* RE: XFS Resiliency to the disk errors.
From: Zak, Semion @ 2007-04-10  6:49 UTC
  To: Eric Sandeen; +Cc: xfs

Thank you very much.

I have another question, about data loss on a crash or power cut.
Is it possible to keep it no worse than on other file systems by opening
the important file with the O_SYNC flag, or by using the fsync and sync
functions?
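
For reference, a minimal Python sketch of the calls mentioned in the
question. It only shows how fsync is typically used to push one file's
data to storage before continuing; it makes no claim about how XFS
compares with other file systems. The function name durable_write and the
file mode are assumptions made for this example.

```python
import os


def durable_write(path, data):
    """Write data to path and fsync it before returning (illustrative only)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        # os.write may write fewer bytes than asked, so loop until done.
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to flush this file's data and metadata to storage.
        os.fsync(fd)
    finally:
        os.close(fd)
    # Also fsync the containing directory so that the directory entry
    # for a newly created file is flushed as well.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Adding os.O_SYNC to the flags passed to os.open would instead make every
write synchronous, at a cost in throughput. Either way, what actually
survives a power cut also depends on the drive honouring cache flushes.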

Thanks,

Semion. 

