From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 1/2] xfs: xfs_check_page_type buffer checks need help
Date: Wed, 5 Mar 2014 17:08:20 -0500
Message-ID: <20140305220819.GC55736@bfoster.bfoster>
In-Reply-To: <1393981893-2497-2-git-send-email-david@fromorbit.com>

On Wed, Mar 05, 2014 at 12:11:32PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> xfs_aops_discard_page() was introduced in the following commit:
>
> xfs: truncate delalloc extents when IO fails in writeback
>
> ... to clean up left over delalloc ranges after I/O failure in
> ->writepage(). generic/224 tests for this scenario and occasionally
> reproduces panics on sub-4k blocksize filesystems.
>
> The cause of this is failure to clean up the delalloc range on a
> page where the first buffer does not match one of the expected
> states of xfs_check_page_type(). If a buffer is not unwritten,
> delayed or dirty&mapped, xfs_check_page_type() stops and
> immediately returns 0.
>
> The stress test of generic/224 creates a scenario where the first
> several buffers of a page with delayed buffers are mapped & uptodate
> and some subsequent buffer is delayed. If the ->writepage() happens
> to fail for this page, xfs_aops_discard_page() incorrectly skips
> the entire page.
>
> This then causes later failures either when direct IO maps the range
> and finds the stale delayed buffer, or we evict the inode and find
> that the inode still has a delayed block reservation accounted to
> it.
>
> We can easily fix this xfs_aops_discard_page() failure by making
> xfs_check_page_type() check all buffers, but this breaks
> xfs_convert_page() more than it is already broken. Indeed,
> xfs_convert_page() wants xfs_check_page_type() to tell it if the
> first buffers on the pages are of a type that can be aggregated into
> the contiguous IO that is already being built.
>
> xfs_convert_page() should not be writing random buffers out of a
> page, but the current behaviour will cause it to do so if there are
> buffers that don't match the current specification on the page.
> Hence for xfs_convert_page() we need to:
>
> a) return "not ok" if the first buffer on the page does not
> match the specification provided so we don't write anything;
> and
> b) abort its buffer-add-to-io loop the moment we come
> across a buffer that does not match the specification.
>
> Hence we need to fix both xfs_check_page_type() and
> xfs_convert_page() to work correctly with pages that have mixed
> buffer types, whilst allowing xfs_aops_discard_page() to scan all
> buffers on the page for a type match.
>
> Reported-by: Brian Foster <bfoster@redhat.com>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
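To check my reading of the description, here is a minimal userspace
model of the two lookup modes. This is only a sketch under my own
assumptions: enum buf_state and page_type_ok() are hypothetical names,
the page is flattened into an array of per-buffer states, and none of
this is the code from the patch itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-buffer classification; BUF_OTHER covers anything
 * that is not unwritten, delayed or dirty&mapped. */
enum buf_state { BUF_OTHER, BUF_UNWRITTEN, BUF_DELAY, BUF_OVERWRITE };

/*
 * Model of the type check: report whether a buffer of the wanted type
 * is found. With check_all_buffers false, only the first buffer is
 * considered (the xfs_convert_page() case); with it true, the whole
 * page is scanned (the xfs_aops_discard_page() case).
 */
static bool page_type_ok(const enum buf_state *bufs, size_t n,
			 enum buf_state type, bool check_all_buffers)
{
	for (size_t i = 0; i < n; i++) {
		if (bufs[i] == type)
			return true;
		if (!check_all_buffers)
			break;	/* first buffer did not match: stop */
	}
	return false;
}
```

For the generic/224 layout described above (leading mapped & uptodate
buffers followed by delayed ones), the first-buffer mode reports no
match while the scan-all mode finds the delayed buffer, which is what
the discard path needs.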
This looks pretty good to me as well. I noticed one small thing that
might not be a real problem, but is worth a quick thought...
> fs/xfs/xfs_aops.c | 81 ++++++++++++++++++++++++++++++++++---------------------
> 1 file changed, 50 insertions(+), 31 deletions(-)
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index ef62c6b..98016b3 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -632,38 +632,46 @@ xfs_map_at_offset(
> }
>
...
> /*
> @@ -697,7 +705,7 @@ xfs_convert_page(
> goto fail_unlock_page;
> if (page->mapping != inode->i_mapping)
> goto fail_unlock_page;
> - if (!xfs_check_page_type(page, (*ioendp)->io_type))
> + if (!xfs_check_page_type(page, (*ioendp)->io_type, false))
> goto fail_unlock_page;
>
> /*
> @@ -742,6 +750,15 @@ xfs_convert_page(
> p_offset = p_offset ? roundup(p_offset, len) : PAGE_CACHE_SIZE;
> page_dirty = p_offset / len;
>
> + /*
> + * The moment we find a buffer that doesn't match our current type
> + * specification or can't be written, abort the loop and start
> + * writeback. As per the above xfs_imap_valid() check, only
> + * xfs_vm_writepage() can handle partial page writeback fully - we are
> + * limited here to the buffers that are contiguous with the current
> + * ioend, and hence a buffer we can't write breaks that contiguity and
> + * we have to defer the rest of the IO to xfs_vm_writepage().
> + */
> bh = head = page_buffers(page);
> do {
> if (offset >= end_offset)
> @@ -750,7 +767,7 @@ xfs_convert_page(
> uptodate = 0;
> if (!(PageUptodate(page) || buffer_uptodate(bh))) {
> done = 1;
> - continue;
> + break;
> }
>
> if (buffer_unwritten(bh) || buffer_delay(bh) ||
> @@ -762,10 +779,11 @@ xfs_convert_page(
> else
> type = XFS_IO_OVERWRITE;
>
> - if (!xfs_imap_valid(inode, imap, offset)) {
> - done = 1;
> - continue;
> - }
> + /*
> + * imap should always be valid because of the above
> + * partial page end_offset check on the imap.
> + */
> + ASSERT(xfs_imap_valid(inode, imap, offset));
>
> lock_buffer(bh);
> if (type != XFS_IO_OVERWRITE)
> @@ -777,6 +795,7 @@ xfs_convert_page(
> count++;
> } else {
> done = 1;
> + break;
> }
> } while (offset += len, (bh = bh->b_this_page) != head);
>
The next couple of lines after the loop are:

	if (uptodate && bh == head)
		SetPageUptodate(page);

Now that we can break out of the loop, the "bh == head" part of that
check no longer necessarily means what it used to mean. The uptodate
variable is initialized to 1 and is reset to 0 the moment we encounter a
!uptodate buffer. Do you think it's possible to break out on the first
buffer of the page, without having reset 'uptodate', and so
incorrectly set the page uptodate?
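
To make the concern concrete, here is a minimal userspace sketch of the
loop's control flow. It is only a model: struct fake_bh, its writable
flag, link_ring() and would_set_page_uptodate() are hypothetical names
of mine, not kernel code, and the ioend handling, offset checks and
locking are left out. The break_early argument switches between the new
break behaviour and the old flow, where the loop kept walking to the end
of the page. Whether the first buffer can really take the early-exit
path in xfs_convert_page() is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct buffer_head; not the kernel type. */
struct fake_bh {
	bool uptodate;               /* models buffer_uptodate(bh) */
	bool writable;               /* models the unwritten/delay/mapped test */
	struct fake_bh *b_this_page; /* circular list, as in the kernel */
};

/* Link n buffers into a ring so the last one points back at the first. */
static void link_ring(struct fake_bh *bufs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		bufs[i].b_this_page = &bufs[(i + 1) % n];
}

/* Returns true if the post-loop check would call SetPageUptodate(). */
static bool would_set_page_uptodate(struct fake_bh *head, bool page_uptodate,
				    bool break_early)
{
	struct fake_bh *bh = head;
	int uptodate = 1;

	assert(head != NULL);
	do {
		if (!bh->uptodate)
			uptodate = 0;
		if (!(page_uptodate || bh->uptodate) || !bh->writable) {
			if (break_early)
				break;    /* bh may still be == head here */
			continue;         /* old flow: advance to the next buffer */
		}
		/* a matching buffer would be added to the ioend here */
	} while ((bh = bh->b_this_page) != head);

	return uptodate && bh == head;
}
```

With a two-buffer page whose first buffer is uptodate but not writable
and whose second buffer is not uptodate, the model returns true with
break_early set and false without it, because only the full walk ever
sees the !uptodate buffer.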
Brian
> @@ -868,7 +887,7 @@ xfs_aops_discard_page(
> struct buffer_head *bh, *head;
> loff_t offset = page_offset(page);
>
> - if (!xfs_check_page_type(page, XFS_IO_DELALLOC))
> + if (!xfs_check_page_type(page, XFS_IO_DELALLOC, true))
> goto out_invalidate;
>
> if (XFS_FORCED_SHUTDOWN(ip->i_mount))
> --
> 1.9.0
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs