From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH v2 0/5] clean up collapse range and handle post-eof delalloc
Date: Fri, 12 Sep 2014 15:51:29 -0400 [thread overview]
Message-ID: <20140912195128.GA42029@bfoster.bfoster> (raw)
In-Reply-To: <20140911211915.GA4322@dastard>
On Fri, Sep 12, 2014 at 07:19:15AM +1000, Dave Chinner wrote:
> On Thu, Sep 11, 2014 at 11:20:13AM -0400, Brian Foster wrote:
> > On Thu, Sep 11, 2014 at 02:42:43PM +1000, Dave Chinner wrote:
> > > On Wed, Sep 10, 2014 at 09:20:26AM -0400, Brian Foster wrote:
> > > > Hi all,
> > > >
> > > > Here's v2 of the collapse clean up. We refactor a bit more via the
> > > > insertion of patch 3, otherwise it's similar to v1. This will see some
> > > > continued testing, but it survived ~500m fsx operations overnight.
> > > >
> > > > Brian
> > >
> > > I'm not sure about the invalidation patch now. On a 1k block size
> > > filesystem, generic/127 fails with:
> > >
> > > +ltp/fsx -q -l 262144 -o 65536 -S 191110531 -N 100000 -R -W fsx_std_nommap
> > > +collapse range: 1000 to 3000
> > > +do_collapse_range: fallocate: Device or resource busy
> > >
> > > which indicates we had an invalidation failure. This is probably
> > > exposing some other bug, but I haven't had time to look into it yet
> > > so I don't know.
> > >
> >
> > Yeah, I can reproduce this as well, thanks. I think you're referring to
> > the xfs_free_file_space() patch (5/5)..?
>
> *nod*
>
> > FWIW, I don't see the problem
> > without that patch, so it appears that the full pagecache truncate is
> > still covering up a problem somewhere. I'll try to dig into it...
>
> It's likely that it is leaving a dirty buffer on the page beyond EOF
> as a result of the truncate zeroing the remainder of the page in
> memory.
>
> If I get a chance I'll look at it this afternoon, as this patchset
> also seems to be causing a marked increase in the number of fsstress
> failures due to stray delalloc blocks in files even on 4k block size
> filesystems (e.g. at unmount).
>
I think I have a bit of an idea of what's going on here. I'm not sure
that this one is post-eof delalloc related. For reference, here's the
command that reproduces for me 100% of the time on a 1k block fs
(derived from the fsx trace):
xfs_io -fc "pwrite 185332 55756" \
-c "fcollapse 28672 40960" \
-c "pwrite 133228 63394" \
-c "fcollapse 0 4096" /mnt/file
The first write extends the file to 241088 bytes and the first collapse
targets a hole, but shrinks the file down to 200128 bytes. The last page
offset of the file is now 196608 and with 1k blocks, eof falls within
the last block of this page (e.g., within bh offsets 199680-200704). The
second write falls just into the first block of this page.
What I see occur on the final collapse is that the
invalidate_inode_pages2_range() call in the collapse op fails due to
finding a dirty page at offset 196608. We don't have a launder_page()
callback, so this results in -EBUSY.
Given the above, it seems like the filemap_write_and_wait_range() call
should handle this. Digging further, what I see is one or two
writepage()->xfs_cluster_write() sequences along the range of the
previous write. xfs_convert_page() eventually attempts to handle page
offset 196608 and breaks out at offset 197632 (the second bh in the
page) near the top of the bh loop:
...
if (!(PageUptodate(page) || buffer_uptodate(bh))) {
done = 1;
break;
}
...
I suspect the reason the page/buffer is in this particular state is due
to the previous invalidation (collapse 1), which would have removed the
page from the mapping. This seems reasonable to me since we only wrote
into the first bytes of the page since the prior invalidation. The problem is
that the page somehow has the following buffer_head state at the point of the
EBUSY inval failure:
invalidate_inode_pages2_range(648): state 0x29 block 84 size 1024
invalidate_inode_pages2_range(648): state 0x0 block 18446744073709551615 size 1024
invalidate_inode_pages2_range(648): state 0x0 block 18446744073709551615 size 1024
invalidate_inode_pages2_range(648): state 0x2b block 87 size 1024
The first block is BH_Uptodate|BH_Req|BH_Mapped. I read the next two as
simply not up to date..? The last block is the same as the first plus
BH_Dirty. This last block is offset 199680 and the previous write goes
to 196622, so I'm not sure how this ends up dirty. I think the write
path to this range of the file might be the spot to dig next...
Brian
P.S., As a datapoint from experimentation, this problem doesn't occur if
I ensure that writepage() handles this particular page rather than
xfs_convert_page(). I can do that by either jumping out of
xfs_convert_page() sooner, before the page is set for writeback, or
hacking xfs_start_page_writeback() to use __test_set_page_writeback()
and keep the write tag such that write_cache_pages() can still find the
page. This calls out an interesting bit of behavior in
xfs_convert_page() since commit a49935f2: if we handle a part of a
page, break and mark the page for writeback without handling the rest,
we'll still clear the PAGECACHE_TAG_TOWRITE tag and writeback won't
finish off the page during the current iteration (for WB_SYNC_ALL).
It's not clear to me if this is contributing to the problem in this
particular case, but it seems like an independent bug at the very
least. Thoughts?
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2014-09-12 19:51 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-09-10 13:20 [PATCH v2 0/5] clean up collapse range and handle post-eof delalloc Brian Foster
2014-09-10 13:20 ` [PATCH v2 1/5] xfs: track collapse via file offset rather than extent index Brian Foster
2014-09-10 13:20 ` [PATCH v2 2/5] xfs: refactor shift-by-merge into xfs_bmse_merge() helper Brian Foster
2014-09-10 13:20 ` [PATCH v2 3/5] xfs: refactor single extent shift into xfs_bmse_shift_one() helper Brian Foster
2014-09-10 13:20 ` [PATCH v2 4/5] xfs: writeback and inval. file range to be shifted by collapse Brian Foster
2014-09-10 13:20 ` [PATCH v2 5/5] xfs: only writeback and truncate pages for the freed range Brian Foster
2014-09-11 4:42 ` [PATCH v2 0/5] clean up collapse range and handle post-eof delalloc Dave Chinner
2014-09-11 15:20 ` Brian Foster
2014-09-11 21:19 ` Dave Chinner
2014-09-12 19:51 ` Brian Foster [this message]
2014-09-12 20:05 ` Brian Foster
2014-09-15 1:46 ` Dave Chinner
2014-09-15 13:18 ` Brian Foster
2014-09-15 22:55 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140912195128.GA42029@bfoster.bfoster \
--to=bfoster@redhat.com \
--cc=david@fromorbit.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox