linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ted Ts'o <tytso@mit.edu>
To: Allison Henderson <achender@linux.vnet.ibm.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH 1/1 v4] ext4: fix xfstests 75, 112, 127 punch hole failure
Date: Sun, 7 Aug 2011 22:42:13 -0400	[thread overview]
Message-ID: <20110808024213.GC11497@thunk.org> (raw)
In-Reply-To: <1312698112-7836-1-git-send-email-achender@linux.vnet.ibm.com>

Thanks, much better.  I still have some questions though.

>  /*
> + * ext4_unmap_partial_page_buffers()
> + * Unmaps a page range of length 'length' starting from offset
> + * 'from'.  The range to be unmaped must be contained with in
> + * one page.  If the specified range exceeds the end of the page
> + * it will be shortened to end of the page that cooresponds to
> + * 'from'.  Only block aligned buffers will be unmapped and unblock
> + * aligned buffers are skipped
> + */
> +static int ext4_unmap_partial_page_buffers(handle_t *handle,
> +		struct address_space *mapping, loff_t from, loff_t length)

Is "unmap" really the right name of this function?  Isn't the real to
invalidate a buffer head that corresponds to a punched-out block?

Correct me if I'm wrong, but the primary problem that we need to worry
about here is that the block that has just been punched out could be
reused by some other inode --- but if we still have a dirty buffer
head mapped to the now-punched-out-and-released-block, writeback could
end up corrupting some other buffer.  So in some sense, what we are
effectively doing is a bforget, right?

In fact, if we are doing data journalling, don't we need to call
ext4_forget()?  Otherwise, the block gets reassigned, but if the block
has been data journaled, without the revoke record written by
ext4_forget(), won't we have a potential problem where the journal
replay could end up corrupting another inode's data?

Furthermore, the question I have is why don't have precisely the same
problem with truncate?  If we have a 16k page size, and we open a 15k
file for writing, and then we overwrite the first 15k, and then
truncate the file down to 4k.  At the moment we'll zero out the last
11k of the file, and leave the buffer heads dirty and mapped; then
suppose those blocks get reassigned and used before writeback happens.
We're not calling ext4_unmap_partial_page_buffers() now; why isn't
this a problem today?  Are we just getting lucky?  Why is this more of
a problem with punch, such that xfstests 75, 112, 127, etc. are
failing?

Am I missing something here?

> +	/*
> +	 * Now we need to unmap the non-page-aligned buffers.
> +	 * If the block size is smaller than the page size
> +	 * and the file space being truncated is not
> +	 * page aligned, then unmap the buffers
> +	 */
> +	if (inode->i_sb->s_blocksize < PAGE_CACHE_SIZE &&
> +	   !((offset % PAGE_CACHE_SIZE == 0) &&
> +	   (length % PAGE_CACHE_SIZE == 0))) {

How does this solve the situation I had outlined before?  Suppose are
have a 4k page size, and a 4k block size, and we issue a punch
operation with offset=4092, and length=4105.  In the previous section
of code, the offsets 4092--4095 and 8193--8197 will be zero'ed out.
But offset will not be a multiple of the page size, and length will
not be a multiple of page size, but what has been left to be dealt
with *is* an exactl multiple of a page size, and thus could be
optimized out.

I didn't see any changes in your patch that adjust offset and length
after calling ext4_block_zero_page_range(); so I think we still have
an optimization opportunity we're missing here.

Regards,

						- Ted

  reply	other threads:[~2011-08-08  2:42 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-08-07  6:21 [PATCH 1/1 v4] ext4: fix xfstests 75, 112, 127 punch hole failure Allison Henderson
2011-08-08  2:42 ` Ted Ts'o [this message]
2011-08-08 20:54   ` Allison Henderson
2011-08-09 16:45     ` Ted Ts'o
2011-08-09 21:10       ` Allison Henderson
2011-08-11  3:17         ` Ted Ts'o
2011-08-11 21:31           ` Allison Henderson
2011-08-11 23:44             ` Ted Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110808024213.GC11497@thunk.org \
    --to=tytso@mit.edu \
    --cc=achender@linux.vnet.ibm.com \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).