From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 1/7 v2] xfs: don't dirty buffers beyond EOF
Date: Fri, 29 Aug 2014 08:13:00 -0400 [thread overview]
Message-ID: <20140829121300.GA3640@laptop.bfoster> (raw)
In-Reply-To: <20140828234932.GW20518@dastard>
On Fri, Aug 29, 2014 at 09:49:32AM +1000, Dave Chinner wrote:
>
> From: Dave Chinner <dchinner@redhat.com>
>
> generic/263 is failing fsx at this point with a page spanning
> EOF that cannot be invalidated. The operations are:
>
> 1190 mapwrite 0x52c00 thru 0x5e569 (0xb96a bytes)
> 1191 mapread 0x5c000 thru 0x5d636 (0x1637 bytes)
> 1192 write 0x5b600 thru 0x771ff (0x1bc00 bytes)
>
> where 1190 extents EOF from 0x54000 to 0x5e569. When the direct IO
> write attempts to invalidate the cached page over this range, it
> fails with -EBUSY and so any attempt to do page invalidation fails.
>
> The real question is this: Why can't that page be invalidated after
> it has been written to disk and cleaned?
>
> Well, there's data on the first two buffers in the page (1k block
> size, 4k page), but the third buffer on the page (i.e. beyond EOF)
> is failing drop_buffers because it's bh->b_state == 0x3, which is
> BH_Uptodate | BH_Dirty. IOWs, there's dirty buffers beyond EOF. Say
> what?
>
> OK, set_buffer_dirty() is called on all buffers from
> __set_page_buffers_dirty(), regardless of whether the buffer is
> beyond EOF or not, which means that when we get to ->writepage,
> we have buffers marked dirty beyond EOF that we need to clean.
> So, we need to implement our own .set_page_dirty method that
> doesn't dirty buffers beyond EOF.
>
> This is messy because the buffer code is not meant to be shared
> and it has interesting locking issues on the buffer dirty bits.
> So just copy and paste it and then modify it to suit what we need.
>
> Note: the solutions the other filesystems and generic block code use
> of marking the buffers clean in ->writepage does not work for XFS.
> It still leaves dirty buffers beyond EOF and invalidations still
> fail. Hence rather than play whack-a-mole, this patch simply
> prevents those buffers from being dirtied in the first place.
>
> cc: <stable@kernel.org>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>
> v2: fix page offset calculation. passed 61 million fsx ops before
> hitting an unrelated problem in xfs_zero_file_space(), so no
> difference to the result with this updated patch.
>
> fs/xfs/xfs_aops.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 58 insertions(+)
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 11e9b4c..9bd2f53 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -1753,11 +1753,69 @@ xfs_vm_readpages(
> return mpage_readpages(mapping, pages, nr_pages, xfs_get_blocks);
> }
>
> +/*
> + * This is basically a copy of __set_page_dirty_buffers() with one
> + * small tweak: buffers beyond EOF do not get marked dirty. If we mark them
> + * dirty, we'll never be able to clean them because we don't write buffers
> + * beyond EOF, and that means we can't invalidate pages that span EOF
> + * that have been marked dirty. Further, the dirty state can leak into
> + * the file interior if the file is extended, resulting in all sorts of
> + * bad things happening as the state does not match the unerlying data.
underlying
I tend to agree with Christoph in that it would be nice if this was
handled generically one way or another. That said, I understand not
wanting to tweak behavior for other filesystems. You mention that the
trajectory for XFS is to kill the use of buffer heads, I suppose that
means this code is hopefully short-lived and probably less likely
subject to problems due to changes in the core code. Given that and the
fact that it looks correct at this point:
Reviewed-by: Brian Foster <bfoster@redhat.com>
Though it would be nice to see a small addition to the comment above to
state that explicitly. E.g., 'XXX this code should die when buffer heads
in XFS die...' or something along those lines... thanks.
Brian
> + */
> +STATIC int
> +xfs_vm_set_page_dirty(
> + struct page *page)
> +{
> + struct address_space *mapping = page->mapping;
> + struct inode *inode = mapping->host;
> + loff_t end_offset;
> + loff_t offset;
> + int newly_dirty;
> +
> + if (unlikely(!mapping))
> + return !TestSetPageDirty(page);
> +
> + end_offset = i_size_read(inode);
> + offset = page_offset(page);
> +
> + spin_lock(&mapping->private_lock);
> + if (page_has_buffers(page)) {
> + struct buffer_head *head = page_buffers(page);
> + struct buffer_head *bh = head;
> +
> + do {
> + if (offset < end_offset)
> + set_buffer_dirty(bh);
> + bh = bh->b_this_page;
> + offset += 1 << inode->i_blkbits;
> + } while (bh != head);
> + }
> + newly_dirty = !TestSetPageDirty(page);
> + spin_unlock(&mapping->private_lock);
> +
> + if (newly_dirty) {
> + /* sigh - __set_page_dirty() is static, so copy it here, too */
> + unsigned long flags;
> +
> + spin_lock_irqsave(&mapping->tree_lock, flags);
> + if (page->mapping) { /* Race with truncate? */
> + WARN_ON_ONCE(!PageUptodate(page));
> + account_page_dirtied(page, mapping);
> + radix_tree_tag_set(&mapping->page_tree,
> + page_index(page), PAGECACHE_TAG_DIRTY);
> + }
> + spin_unlock_irqrestore(&mapping->tree_lock, flags);
> + __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> + }
> + return newly_dirty;
> +}
> +
> const struct address_space_operations xfs_address_space_operations = {
> .readpage = xfs_vm_readpage,
> .readpages = xfs_vm_readpages,
> .writepage = xfs_vm_writepage,
> .writepages = xfs_vm_writepages,
> + .set_page_dirty = xfs_vm_set_page_dirty,
> .releasepage = xfs_vm_releasepage,
> .invalidatepage = xfs_vm_invalidatepage,
> .write_begin = xfs_vm_write_begin,
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2014-08-29 12:13 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-08-28 11:49 [PATCH v2 0/7] xfs: invalidation and related fixes for v3.17-rc3 Dave Chinner
2014-08-28 11:49 ` [PATCH 1/7] xfs: don't dirty buffers beyond EOF Dave Chinner
2014-08-28 13:34 ` Brian Foster
2014-08-28 22:37 ` Dave Chinner
2014-08-28 23:49 ` [PATCH 1/7 v2] " Dave Chinner
2014-08-29 12:13 ` Brian Foster [this message]
2014-08-29 0:39 ` [PATCH 1/7] " Christoph Hellwig
2014-08-29 0:53 ` Dave Chinner
2014-08-28 11:49 ` [PATCH 2/7] xfs: don't zero partial page cache pages during O_DIRECT writes Dave Chinner
2014-08-29 0:39 ` Christoph Hellwig
2014-08-28 11:49 ` [PATCH 3/7] " Dave Chinner
2014-08-29 0:39 ` Christoph Hellwig
2014-08-28 11:49 ` [PATCH 4/7] xfs: use ranged writeback and invalidation for direct IO Dave Chinner
2014-08-29 0:40 ` Christoph Hellwig
2014-08-28 11:49 ` [PATCH 5/7] xfs: don't log inode unless extent shift makes extent modifications Dave Chinner
2014-08-29 0:41 ` Christoph Hellwig
2014-08-28 11:49 ` [PATCH 6/7] xfs: xfs_file_collapse_range is delalloc challenged Dave Chinner
2014-08-29 0:41 ` Christoph Hellwig
2014-08-28 11:49 ` [PATCH 7/7] xfs: trim eofblocks before collapse range Dave Chinner
2014-08-29 0:42 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140829121300.GA3640@laptop.bfoster \
--to=bfoster@redhat.com \
--cc=david@fromorbit.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox