From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 1/7 v2] xfs: don't dirty buffers beyond EOF
Date: Fri, 29 Aug 2014 08:13:00 -0400
Message-ID: <20140829121300.GA3640@laptop.bfoster>
In-Reply-To: <20140828234932.GW20518@dastard>
On Fri, Aug 29, 2014 at 09:49:32AM +1000, Dave Chinner wrote:
>
> From: Dave Chinner <dchinner@redhat.com>
>
> generic/263 is failing fsx at this point with a page spanning
> EOF that cannot be invalidated. The operations are:
>
> 1190 mapwrite 0x52c00 thru 0x5e569 (0xb96a bytes)
> 1191 mapread 0x5c000 thru 0x5d636 (0x1637 bytes)
> 1192 write 0x5b600 thru 0x771ff (0x1bc00 bytes)
>
> where 1190 extends EOF from 0x54000 to 0x5e569. When the direct IO
> write attempts to invalidate the cached page over this range, it
> fails with -EBUSY and so any attempt to do page invalidation fails.
>
> The real question is this: Why can't that page be invalidated after
> it has been written to disk and cleaned?
>
> Well, there's data on the first two buffers in the page (1k block
> size, 4k page), but the third buffer on the page (i.e. beyond EOF)
> is failing drop_buffers() because its bh->b_state == 0x3, which is
> BH_Uptodate | BH_Dirty. IOWs, there are dirty buffers beyond EOF. Say
> what?
>
> OK, set_buffer_dirty() is called on all buffers from
> __set_page_dirty_buffers(), regardless of whether the buffer is
> beyond EOF or not, which means that when we get to ->writepage,
> we have buffers marked dirty beyond EOF that we need to clean.
> So, we need to implement our own .set_page_dirty method that
> doesn't dirty buffers beyond EOF.
>
> This is messy because the buffer code is not meant to be shared
> and it has interesting locking issues on the buffer dirty bits.
> So just copy and paste it and then modify it to suit what we need.
>
> Note: the solution that other filesystems and the generic block code
> use, marking the buffers clean in ->writepage, does not work for XFS.
> It still leaves dirty buffers beyond EOF and invalidations still
> fail. Hence rather than play whack-a-mole, this patch simply
> prevents those buffers from being dirtied in the first place.
>
> cc: <stable@kernel.org>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>
> v2: fix page offset calculation. Passed 61 million fsx ops before
> hitting an unrelated problem in xfs_zero_file_space(), so no
> difference to the result with this updated patch.
>
> fs/xfs/xfs_aops.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 58 insertions(+)
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 11e9b4c..9bd2f53 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -1753,11 +1753,69 @@ xfs_vm_readpages(
> return mpage_readpages(mapping, pages, nr_pages, xfs_get_blocks);
> }
>
> +/*
> + * This is basically a copy of __set_page_dirty_buffers() with one
> + * small tweak: buffers beyond EOF do not get marked dirty. If we mark them
> + * dirty, we'll never be able to clean them because we don't write buffers
> + * beyond EOF, and that means we can't invalidate pages that span EOF
> + * that have been marked dirty. Further, the dirty state can leak into
> + * the file interior if the file is extended, resulting in all sorts of
> + * bad things happening as the state does not match the unerlying data.
s/unerlying/underlying/
I tend to agree with Christoph in that it would be nice if this was
handled generically one way or another. That said, I understand not
wanting to tweak behavior for other filesystems. You mention that the
trajectory for XFS is to kill the use of buffer heads. I suppose that
means this code is hopefully short-lived and less likely to be subject
to problems due to changes in the core code. Given that and the
fact that it looks correct at this point:
Reviewed-by: Brian Foster <bfoster@redhat.com>
Though it would be nice to see a small addition to the comment above to
state that explicitly. E.g., 'XXX this code should die when buffer heads
in XFS die...' or something along those lines... thanks.
Brian
> + */
> +STATIC int
> +xfs_vm_set_page_dirty(
> + struct page *page)
> +{
> + struct address_space *mapping = page->mapping;
> + struct inode *inode;
> + loff_t end_offset;
> + loff_t offset;
> + int newly_dirty;
> +
> + if (unlikely(!mapping))
> + return !TestSetPageDirty(page);
> + inode = mapping->host;
> + end_offset = i_size_read(inode);
> + offset = page_offset(page);
> +
> + spin_lock(&mapping->private_lock);
> + if (page_has_buffers(page)) {
> + struct buffer_head *head = page_buffers(page);
> + struct buffer_head *bh = head;
> +
> + do {
> + if (offset < end_offset)
> + set_buffer_dirty(bh);
> + bh = bh->b_this_page;
> + offset += 1 << inode->i_blkbits;
> + } while (bh != head);
> + }
> + newly_dirty = !TestSetPageDirty(page);
> + spin_unlock(&mapping->private_lock);
> +
> + if (newly_dirty) {
> + /* sigh - __set_page_dirty() is static, so copy it here, too */
> + unsigned long flags;
> +
> + spin_lock_irqsave(&mapping->tree_lock, flags);
> + if (page->mapping) { /* Race with truncate? */
> + WARN_ON_ONCE(!PageUptodate(page));
> + account_page_dirtied(page, mapping);
> + radix_tree_tag_set(&mapping->page_tree,
> + page_index(page), PAGECACHE_TAG_DIRTY);
> + }
> + spin_unlock_irqrestore(&mapping->tree_lock, flags);
> + __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> + }
> + return newly_dirty;
> +}
> +
> const struct address_space_operations xfs_address_space_operations = {
> .readpage = xfs_vm_readpage,
> .readpages = xfs_vm_readpages,
> .writepage = xfs_vm_writepage,
> .writepages = xfs_vm_writepages,
> + .set_page_dirty = xfs_vm_set_page_dirty,
> .releasepage = xfs_vm_releasepage,
> .invalidatepage = xfs_vm_invalidatepage,
> .write_begin = xfs_vm_write_begin,
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs