linux-fsdevel.vger.kernel.org archive mirror
From: Ric Wheeler <ric@emc.com>
To: Andreas Dilger <adilger@clusterfs.com>
Cc: Matthew Wilcox <matthew@wil.cx>, linux-fsdevel@vger.kernel.org
Subject: Re: topics for the file system mini-summit
Date: Mon, 29 May 2006 15:29:11 -0400
Message-ID: <447B4B87.7040403@emc.com>
In-Reply-To: <20060529160905.GX5964@schatzie.adilger.int>



Andreas Dilger wrote:

>On May 28, 2006  22:07 -0400, Ric Wheeler wrote:
>
>>you could build a file system on top of a collection of disk
>>partitions/LUNs and then your inode could be extended to encode the
>>partition number and the internal mapping. You could even harden the
>>block groups to the point that fsck could heal one group while the
>>file system was (mostly?) online, backed up by the rest of the block
>>groups...
>
>This is one thing that we have been thinking of for ext3.  Instead of a
>filesystem-wide "error" bit we could move this per-group to only mark
>the block or inode bitmaps in error if they have a checksum failure.
>This would prevent allocations from that group to avoid further potential
>corruption of the filesystem metadata.
>
>Once an error is detected then a filesystem service thread or a userspace
>helper would walk the inode table (starting in the current group, which
>is most likely to hold the relevant data) recreating the respective bitmap
>table and keeping a "valid bit" bitmap as well.  Once all of the bits
>in the bitmap are marked valid then we can start using this group again.
>

That is a neat idea - would you lose complete access to the impacted 
group, or have you thought about "best effort" read-only while under repair?
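
To make the question concrete, here is a rough Python sketch of the per-group scheme quoted above, with "best effort" lookups allowed during repair. It is a toy model under stated assumptions, not ext3 code: the BlockGroup class, its fields, and the idea of answering lookups from the inode table while a bit is not yet validated are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BlockGroup:
    """Toy model of one block group; NOT the ext3 on-disk layout."""
    inode_table: list        # True where an inode slot is in use (trusted)
    bitmap: list             # inode allocation bitmap (may be corrupt)
    in_error: bool = False   # hypothetical per-group error flag
    valid: list = field(default_factory=list)  # "valid bit" per bitmap bit

    def mark_error(self):
        # A bitmap checksum failure was detected: fence off this group only.
        self.in_error = True
        self.valid = [False] * len(self.bitmap)

    def allocate(self):
        # No new allocations from a group that is under repair.
        if self.in_error:
            raise RuntimeError("group under repair; allocate elsewhere")
        idx = self.bitmap.index(False)  # ValueError if the group is full
        self.bitmap[idx] = True
        self.inode_table[idx] = True
        return idx

    def is_allocated(self, idx):
        # Best-effort read-only lookup: until a bit has been revalidated,
        # answer from the inode table instead of the suspect bitmap.
        if self.in_error and not self.valid[idx]:
            return self.inode_table[idx]
        return self.bitmap[idx]

    def repair_step(self, idx):
        # Recreate one bitmap bit from the inode table and mark it valid.
        self.bitmap[idx] = self.inode_table[idx]
        self.valid[idx] = True
        if all(self.valid):
            self.in_error = False  # every bit is valid: group is usable


def repair(group):
    """Walk the whole inode table, as a service thread or helper might."""
    for idx in range(len(group.inode_table)):
        group.repair_step(idx)
```

In this model the rest of the file system never notices the repair, since only allocations that target the damaged group are refused.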

One thing that has worked very well for us is that we keep a digital 
signature of each user object (MD5, a SHA-x hash, etc.) so we can 
validate that what we wrote is what got read back.  This also provides 
a very powerful sanity check after getting hit by failing media or 
severe file system corruption, since whatever we do manage to salvage 
(which might not be all files) can be validated.
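
As a rough illustration of that validation step (not our actual implementation), the Python sketch below checks salvaged files against previously recorded digests. The manifest format (a dict of path to hex digest), the function names, and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


def digest_file(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def validate_salvaged(manifest, algorithm="sha256"):
    """Compare salvaged files against their recorded digests.

    manifest maps path -> expected hex digest (a hypothetical format).
    Returns three lists of paths: (good, bad, missing).
    """
    good, bad, missing = [], [], []
    for path, expected in manifest.items():
        try:
            actual = digest_file(path, algorithm)
        except OSError:
            # The file could not be salvaged or cannot be read at all.
            missing.append(path)
            continue
        (good if actual == expected else bad).append(path)
    return good, bad, missing
```

The point of the three-way split is that a partial salvage is still useful: everything in the "good" list is known to match what was originally written.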

As an archival (write once, read infrequently) storage device, this 
works pretty well for us since the signature does not need to be 
constantly recomputed on each write/append.

For general purpose read/write workloads, I wonder if it would make 
sense to compute and store such a checksum or signature on close (say in 
an extended attribute)?  It might be useful to use another of those 
special attributes (like the immutable attribute) to indicate that this 
file is important enough to digitally sign on close.
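
A user-space approximation of the sign-on-close idea might look like the Python sketch below. It is only a sketch under assumptions: the attribute name "user.checksum.sha256" is invented, the SignedOnClose and verify names are hypothetical, and a kernel implementation would look nothing like this. os.setxattr and os.getxattr are Linux-only, so both can be swapped for other callables.

```python
import hashlib
import os

XATTR_NAME = "user.checksum.sha256"  # hypothetical attribute name


def _digest(path):
    """SHA-256 of the file contents, as ASCII bytes for xattr storage."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest().encode("ascii")


class SignedOnClose:
    """Context manager: write a file, then record its digest on close."""

    def __init__(self, path, setxattr=None):
        self.path = path
        # Defaults to os.setxattr (Linux only); injectable for other stores.
        self._setxattr = setxattr or os.setxattr
        self._file = None

    def __enter__(self):
        self._file = open(self.path, "wb")
        return self._file

    def __exit__(self, exc_type, exc, tb):
        self._file.close()
        if exc_type is None:
            # Sign only after a clean close, once the contents are final.
            self._setxattr(self.path, XATTR_NAME, _digest(self.path))
        return False  # never swallow exceptions


def verify(path, getxattr=None):
    """Return True if the stored digest matches the current contents."""
    getxattr = getxattr or os.getxattr
    return getxattr(path, XATTR_NAME) == _digest(path)
```

The digest is computed once per close rather than on every write, which is what keeps the cost tolerable for files that are appended to many times.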

Regards,

Ric

