From: Andreas Dilger <adilger@clusterfs.com>
To: Andi Kleen <ak@suse.de>
Cc: Alexandre Ratchov <alexandre.ratchov@bull.net>,
	Theodore Ts'o <tytso@mit.edu>,
	linux-ext4@vger.kernel.org
Subject: Re: ext4 compat flag assignments
Date: Sun, 1 Oct 2006 22:34:58 -0600
Message-ID: <20061002043458.GF22010@schatzie.adilger.int>
In-Reply-To: <200609290106.27852.ak@suse.de>

On Sep 29, 2006  01:06 +0200, Andi Kleen wrote:
> Andreas Dilger wrote:
> > No work has been done on this yet.  Getting checksums to be efficient
> > depends on having a generic callback mechanism from the journal code
> > to avoid repeated checksums on a block while it is being modified.
> 
> You can just do incremental checksumming which is very cheap. 
> 
> Or did you mean the flushing to disk of the checksum?  If it's always in
> the same object that would be free, but that is not possible for bitmaps
> at least.  But I guess the checksum write in the block descriptor 
> could be done very lazily at least, perhaps keeping track on disk if invalid
> checksums are expected or not.

I'm not sure I understand what you mean.  My goal is that the ext4 code
modifies the block as many times as it wants during a transaction (this
may happen from multiple threads for a single block), then just before
the transaction is committed to disk, the journal calls a callback for that
block (inode, group descriptor, bitmap, superblock, extent, index, etc.) and
computes the checksum only once for that block.  Then the block is flushed
to the filesystem.
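
As a rough illustration of the commit-time callback described above, here
is a minimal user-space C sketch.  It is not journal code: struct jblock,
block_modify(), txn_commit() and the precommit hook are hypothetical names,
and toy_csum() is an Adler-32-style placeholder, since no checksum algorithm
has been chosen.  The point is only that modifications mark the block dirty
and the checksum is computed once, at commit.

```c
#include <stddef.h>
#include <stdint.h>

#define BLK_SIZE 64u   /* tiny block size, for illustration only */

struct jblock;
typedef void (*precommit_fn)(struct jblock *blk);

/* Hypothetical in-memory view of a journaled metadata block. */
struct jblock {
    uint8_t      data[BLK_SIZE];
    uint32_t     csum;       /* kept beside the data to keep the sketch short */
    int          dirty;      /* modified since the last commit? */
    unsigned     csum_runs;  /* how many times the checksum was computed */
    precommit_fn precommit;  /* per-block-type hook run just before commit */
};

/* Placeholder checksum (Adler-32 style); the real algorithm is undecided. */
static uint32_t toy_csum(const uint8_t *p, size_t n)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < n; i++) {
        a = (a + p[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* Example hook: checksum the whole block once. */
static void csum_precommit(struct jblock *blk)
{
    blk->csum = toy_csum(blk->data, BLK_SIZE);
    blk->csum_runs++;
}

/* Filesystem code may call this any number of times per transaction;
 * it never touches the checksum. */
static void block_modify(struct jblock *blk, size_t off, uint8_t val)
{
    blk->data[off] = val;
    blk->dirty = 1;
}

/* At commit, run the hook exactly once for each dirty block. */
static void txn_commit(struct jblock **blocks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct jblock *blk = blocks[i];
        if (!blk->dirty)
            continue;
        if (blk->precommit)
            blk->precommit(blk);
        blk->dirty = 0;
    }
}
```

With this shape, three calls to block_modify() followed by one txn_commit()
run the checksum once, and a second txn_commit() with no new modifications
does not run it at all.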

I'm not sure I like the idea of writing "this block doesn't have a valid
checksum" to disk, since there is some risk of that block being corrupted
during a crash and then we don't know if the block is valid or not.

> > Finally, the extents format has the capability (though no code is
> > implemented for this yet) to store a checksum in each index and extent
> > block... storing an ext3_extent_tail (checksum, inode+generation
> > backpointer) as the last entry in the block.
> 
> Old style indirect blocks will need them too. My thinking was
> to use another block for those (so an indirect block would be two nearby
> blocks) 

We couldn't do this easily for the existing indirect blocks, but what I'd
thought is that it is possible either to have e2fsck convert block-mapped
files to extent-mapped (with an extent tail of checksum + inode backpointer)
or to have a new block-mapped extent (for fragmented files).  The latter
would also have a header with magic (so that random garbage in a large
filesystem doesn't look like a valid [dt]indirect block) and an extent
tail containing the checksum + inode backpointer.
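
To make the header-plus-tail idea concrete, here is a minimal user-space C
sketch.  Everything in it is an assumption for illustration: the struct and
field names, the magic value, the 128-byte block size and the Adler-32-style
toy_csum() are all hypothetical, and real on-disk fields would be
little-endian with a fixed layout.  It shows only the shape: a magic at the
start of the block, and a tail in the last bytes of the block carrying the
inode + generation backpointer and a checksum over everything before it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK_SIZE  128u     /* tiny block size, for illustration only */
#define MAP_MAGIC 0xB10Cu  /* hypothetical magic value */

struct map_header {        /* hypothetical header at the start of the block */
    uint16_t mh_magic;
    uint16_t mh_entries;
};

struct map_tail {          /* hypothetical tail stored as the last entry */
    uint32_t mt_inum;        /* owning inode (backpointer) */
    uint32_t mt_generation;  /* inode generation */
    uint32_t mt_checksum;    /* covers the block up to this field */
};

/* Placeholder checksum (Adler-32 style); the real algorithm is undecided. */
static uint32_t toy_csum(const uint8_t *p, size_t n)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < n; i++) {
        a = (a + p[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* Number of leading bytes of the block covered by the checksum. */
static size_t csum_span(void)
{
    return BLK_SIZE - sizeof(struct map_tail)
         + offsetof(struct map_tail, mt_checksum);
}

/* Write the magic and the tail; mh_entries is left as the caller set it. */
static void map_block_seal(uint8_t *blk, uint32_t inum, uint32_t gen)
{
    struct map_header h;
    struct map_tail t = { inum, gen, 0 };
    size_t off = BLK_SIZE - sizeof t;

    memcpy(&h, blk, sizeof h);
    h.mh_magic = MAP_MAGIC;
    memcpy(blk, &h, sizeof h);

    memcpy(blk + off, &t, sizeof t);
    t.mt_checksum = toy_csum(blk, csum_span());
    memcpy(blk + off, &t, sizeof t);
}

/* Return 1 if magic, backpointer and checksum all match, else 0. */
static int map_block_verify(const uint8_t *blk, uint32_t inum, uint32_t gen)
{
    struct map_header h;
    struct map_tail t;

    memcpy(&h, blk, sizeof h);
    memcpy(&t, blk + BLK_SIZE - sizeof t, sizeof t);
    if (h.mh_magic != MAP_MAGIC)
        return 0;
    if (t.mt_inum != inum || t.mt_generation != gen)
        return 0;
    return t.mt_checksum == toy_csum(blk, csum_span());
}
```

A block of random garbage fails the magic test, a block belonging to a
different inode fails the backpointer test, and a damaged block fails the
checksum test.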

> Inodes need them, but with the inode extension that will be hopefully
> not a problem to keep a few bytes for this.

Yes, it might even be valuable to put this into the "small" inode so
that it can be used for existing ext3 filesystems.

> And directories, which should be relatively easy to extend with
> the current format.

Haven't thought about that specifically for directories, but I do have
some ideas about enhancing the directory format to allow storing more
data into the dir_entries (e.g. 64-bit inode) and possibly using the
same code to store a tree of EAs in the same format as directories, so
the htree code can be used to do lookups if there are lots of EAs.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.