public inbox for linux-ext4@vger.kernel.org
From: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>,
	Andreas Dilger <adilger@clusterfs.com>,
	Mike Waychison <mikew@google.com>,
	Sreenivasa Busam <sreenivasac@google.com>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>
Subject: Re: fallocate support for bitmap-based files
Date: Fri, 29 Jun 2007 15:36:53 -0500
Message-ID: <1183149414.12702.10.camel@kleikamp.austin.ibm.com>
In-Reply-To: <20070629130120.ec0d1c75.akpm@linux-foundation.org>

On Fri, 2007-06-29 at 13:01 -0700, Andrew Morton wrote:
> Guys, Mike and Sreenivasa at google are looking into implementing
> fallocate() on ext2.  Of course, any such implementation could and should
> also be portable to ext3 and ext4 bitmapped files.
> 
> I believe that Sreenivasa will mainly be doing the implementation work.
> 
> 
> The basic plan is as follows:
> 
> - Create (with tune2fs and mke2fs) a hidden file using one of the
>   reserved inode numbers.  That file will be sized to have one bit for each
>   block in the partition.  Let's call this the "unwritten block file".
> 
>   The unwritten block file will be initialised with all-zeroes
> 
> - at fallocate()-time, allocate the blocks to the user's file (in some
>   yet-to-be-determined fashion) and, for each one which is uninitialised,
>   set its bit in the unwritten block file.  The set bit means "this block
>   is uninitialised and needs to be zeroed out on read".
> 
> - truncate() would need to clear out set-bits in the unwritten blocks file.

That could be done by truncating the blocks file at the correct byte
offset, so that only some bits of the last byte of the file would need
to be zeroed.
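
A minimal sketch of that truncation on an in-memory bitmap.  It assumes
bit i is stored in byte i / 8 at bit position i % 8, and that the bits
being dropped all sit at the tail of the bitmap.  The name
bitmap_truncate() is made up for illustration and is not an existing
kernel or e2fsprogs function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: keep only the first nbits bits of a bitmap that
 * is len bytes long.  Returns the new length in bytes.  Only the last
 * remaining byte needs any bits cleared; everything past it is simply
 * cut off by the shorter length.
 */
size_t bitmap_truncate(uint8_t *map, size_t len, size_t nbits)
{
	size_t new_len = (nbits + 7) / 8;	/* bytes needed for nbits */
	unsigned int tail = nbits % 8;		/* live bits in last byte */

	assert(new_len <= len);
	if (tail)
		map[new_len - 1] &= (uint8_t)((1u << tail) - 1);
	return new_len;
}
```

When nbits is a multiple of 8 no masking is needed at all, since the
cut lands on a byte boundary.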

> - When the fs comes to read a block from disk, it will need to consult
>   the unwritten blocks file to see if that block should be zeroed by the
>   CPU.
> 
> - When the unwritten-block is written to, its bit in the unwritten blocks
>   file gets zeroed.
> 
> - An obvious efficiency concern: if a user file has no unwritten blocks
>   in it, we don't need to consult the unwritten blocks file.
> 
>   Need to work out how to do this.  An obvious solution would be to have
>   a number-of-unwritten-blocks counter in the inode.  But do we have space
>   for that?

Would it be too expensive to test the blocks-file page each time a bit
is cleared to see if it is all-zero, and then free the page, making it a
hole?  This test would stop if it finds any non-zero word, so it may not
be too bad.  (This could further be done on a block basis if the block
size is less than a page.)
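
Roughly, the check could look like the sketch below.  It works on a
plain array of words standing in for the page (or block) of the blocks
file.  Both function names are hypothetical, and actually freeing the
page to make it a hole is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if every word in the chunk is zero.
 * Returns as soon as it sees a non-zero word. */
bool chunk_is_all_zero(const unsigned long *chunk, size_t nwords)
{
	size_t i;

	assert(chunk != NULL);
	for (i = 0; i < nwords; i++)
		if (chunk[i])
			return false;
	return true;
}

/* Hypothetical helper: clear one bit, then report whether the whole
 * chunk is now empty and could be turned into a hole. */
bool unwritten_clear_and_test(unsigned long *chunk, size_t nwords,
			      size_t bit)
{
	const size_t bpw = sizeof(unsigned long) * CHAR_BIT;

	assert(bit / bpw < nwords);
	chunk[bit / bpw] &= ~(1UL << (bit % bpw));
	return chunk_is_all_zero(chunk, nwords);
}
```

The worst case is a full scan of the chunk, which only happens when the
chunk really is empty or its only set bits are near the end.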

>   (I expect google and others would prefer that the on-disk format be
>   compatible with legacy ext2!)
> 
> - One concern is the following scenario:
> 
>   - Mount fs with "new" kernel, fallocate() some blocks to a file.
> 
>   - Now, mount the fs under "old" kernel (which doesn't understand the
>     unwritten blocks file).
> 
>     - This kernel will be able to read uninitialised data from that
>       fallocated-to file, which is a security concern.
> 
>   - Now, the "old" kernel writes some data to a fallocated block.  But
>     this kernel doesn't know that it needs to clear that block's flag in
>     the unwritten blocks file!
> 
>   - Now mount that fs under the "new" kernel and try to read that file.
>     The flag for the block is set, so this kernel will still zero out the
>     data on a read, thus corrupting the user's data.
> 
>   So how to fix this?  Perhaps with a per-inode flag indicating "this
>   inode has unwritten blocks".  But to fix this problem, we'd require that
>   the "old" kernel clear out that flag.
> 
>   Can anyone propose a solution to this?
> 
>   Ah, I can!  Use the compatibility flags in such a way as to prevent the
>   "old" kernel from mounting this filesystem at all.  To mount this fs
>   under an "old" kernel the user will need to run some tool which will
> 
>   - read the unwritten blocks file
> 
>   - for each set-bit in the unwritten blocks file, zero out the
>     corresponding block
> 
>   - zero out the unwritten blocks file
> 
>   - rewrite the superblock to indicate that this fs may now be mounted
>     by an "old" kernel.
> 
>   Sound sane?

Yeah.  I think it would have to be done under a compatibility flag.  Is
going back to an older kernel really that important?  I think it's more
important to make sure it can't be mounted by an older kernel if bad
things can happen, and they can.
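
To make the flag-plus-conversion-tool idea concrete, here is a toy
sketch on an in-memory image.  The struct, the function names, the
block size and the flag value are all invented for illustration; the
flag value is arbitrary and not an allocated ext2 feature bit.  The only
part taken from real ext2 behaviour is the rule that a kernel refuses to
mount when an incompat feature bit it does not know is set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_SIZE		1024u		/* illustrative only */
#define TOY_INCOMPAT_UNWRITTEN	0x40000000u	/* hypothetical flag */

struct toy_fs {
	uint32_t feature_incompat;	/* stand-in for the superblock field */
	size_t nblocks;
	uint8_t *unwritten;		/* one bit per block */
	uint8_t *blocks;		/* nblocks * TOY_BLOCK_SIZE bytes */
};

/* Mount is refused if any incompat bit is outside the supported set. */
int toy_can_mount(const struct toy_fs *fs, uint32_t supported)
{
	return (fs->feature_incompat & ~supported) == 0;
}

/* Conversion tool: zero each block whose bit is set, zero the bitmap,
 * then clear the flag so an "old" kernel may mount the fs again. */
void toy_downgrade(struct toy_fs *fs)
{
	size_t b;

	assert(fs->unwritten != NULL && fs->blocks != NULL);
	for (b = 0; b < fs->nblocks; b++)
		if (fs->unwritten[b / 8] & (1u << (b % 8)))
			memset(fs->blocks + b * TOY_BLOCK_SIZE, 0,
			       TOY_BLOCK_SIZE);
	memset(fs->unwritten, 0, (fs->nblocks + 7) / 8);
	fs->feature_incompat &= ~TOY_INCOMPAT_UNWRITTEN;
}
```

The flag is cleared last, so if the tool is interrupted partway the fs
still cannot be mounted by an "old" kernel.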

Shaggy
-- 
David Kleikamp
IBM Linux Technology Center
