public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed
From: Andreas Dilger <adilger@sun.com>
To: Theodore Tso <tytso@mit.edu>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	linux-ext4 <linux-ext4@vger.kernel.org>
Subject: Re: e2fsprogs and blocks outside i_size
Date: Sun, 20 Jul 2008 23:08:25 -0600	[thread overview]
Message-ID: <20080721050825.GE3370@webber.adilger.int> (raw)
In-Reply-To: <20080718123706.GE11221@mit.edu>

On Jul 18, 2008  08:37 -0400, Theodore Ts'o wrote:
> On Fri, Jul 18, 2008 at 05:41:30PM +0530, Aneesh Kumar K.V wrote:
> > With fallocate FALLOC_FL_KEEP_SIZE option, when we write to prealloc
> > space and if we hit ENOSPC when trying to insert the extent,
> > we actually zero out the extent. That means we can have blocks
> > outside i_size for an inode.

To clarify, doesn't FALLOC_FL_KEEP_SIZE put the extent beyond i_size,
regardless of whether the ENOSPC problem is hit?

> > I guess e2fsck currently doesn't handle this. Or should we fix kernel
> > to update i_size to the right value if we do a zero out of the extent ?
> > 
> > With fallocate if the prealloc area is small we also aggressively zeroout.
> > This was needed so that a random write pattern on falloc area doesn't
> > results in too many extents.  That also can result in the above
> > error on fsck.
> 
> It would seem to me that e2fsck should be fixed to not complain about
> blocks outside of i_size, *if* the blocks in question are marked as
> being unitialized.

Yes, I think that is the right approach.

> It also seems to me that updating i_size when the
> extent is zero'ed out is also not the right thing to do.  Some
> applications depend on i_size.

Yes, this was my thought exactly.  Just because the fallocated extent
is zeroed out doesn't mean the file size can suddenly change.  This
would appear as file corruption or "data loss" to many applications.

> In the case where you hit ENOSPC when you need to grow the tree to
> insert an extent, that's a tough one.  One approach would be to simply

I think you misunderstand Aneesh's comments - the ext4 fallocate code
already handles all of these cases.  If ENOSPC is hit during the
extent split the error recovery will zero out the whole uninitialized
extent.  Slow, but effective.

> For your second case, where we aggressively zero out blocks, one of
> the reasons why we have to do that is because the kernel isn't
> coalescing extents aggressively.  My inclination here is to *not*
> aggressively zero out blocks outside outside of i_size, and to split
> the extent in that case

That is only partly true.  If an application is writing every other
block in a 64MB fallocated extent we don't want to create 64MB/4kB
separate extents.  There is no guarantee that the application will
fill in the rest of the unwritten extents and allow them to merge.

Also, there is likely a net performance improvement by writing every
block (half with data, the other half with zeroes) compared to doing
write + seek over the entire file.

> I suppose the other hack we could do is have e2fsck check the blocks
> that are outside of i_size, and if they are all zero and extents are
> involved, that it's a case of pre-allocated blocks that needed to be
> zero'ed for some reason, as opposed to a corrupted i_size.  That seems
> to be a really gross hack, though.

Yuck, with the added problem that there is no guarantee that these
data blocks ARE all zero.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


  reply	other threads:[~2008-07-21  5:08 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-07-18 12:11 e2fsprogs and blocks outside i_size Aneesh Kumar K.V
2008-07-18 12:37 ` Theodore Tso
2008-07-21  5:08   ` Andreas Dilger [this message]
2008-07-21  5:59     ` Aneesh Kumar K.V
2008-07-21 12:34       ` Theodore Tso
2008-07-21 23:32         ` Andreas Dilger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080721050825.GE3370@webber.adilger.int \
    --to=adilger@sun.com \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox