From: Eryu Guan <eguan@redhat.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: xfs <linux-xfs@vger.kernel.org>
Subject: Re: [PATCH] xfs: recheck reflink / dirty page status before freeing CoW reservations
Date: Mon, 15 Jan 2018 14:36:05 +0800
Message-ID: <20180115063605.GC3102@eguan.usersys.redhat.com>
In-Reply-To: <20180112033231.GM5123@eguan.usersys.redhat.com>
On Fri, Jan 12, 2018 at 11:32:31AM +0800, Eryu Guan wrote:
> On Thu, Jan 11, 2018 at 03:54:41PM +0800, Eryu Guan wrote:
> > On Wed, Jan 10, 2018 at 02:03:36PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > >
> > > Eryu Guan reported seeing occasional hangs when running generic/269 with
> > > a new fsstress that supports clonerange/deduperange. The cause of this
> > > hang is an infinite loop when we convert the CoW fork extents from
> > > unwritten to real just prior to writing the pages out; the infinite
> > > loop happens because there's nothing in the CoW fork to convert, and so
> > > it spins forever.
> > >
> > > The underlying issue here is that when we go to perform these CoW fork
> > > conversions, we're supposed to have an extent waiting for us, but the
> > > low space CoW reaper has snuck in and blown them away! There are four
> > > conditions that can dissuade the reaper from touching our file -- no
> > > reflink iflag; dirty page cache; writeback in progress; or directio in
> > > progress. We check the four conditions prior to taking the locks, but
> > > we neglect to recheck them once we have the locks, which is how we end
> > > up whacking the writeback that's in progress.
> > >
> > > Therefore, refactor the four checks into a helper function and call it
> > > once again once we have the locks to make sure we really want to reap
> > > the inode. While we're at it, add an ASSERT for this weird condition so
> > > that we'll fail noisily if we ever screw this up again.
> > >
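To make the race described above concrete, here is a minimal userspace sketch of the check / lock / recheck pattern. It is not the patch itself: struct toy_inode, toy_can_reap(), toy_reap() and toy_start_writeback() are hypothetical names invented for illustration, the "lock" is just a flag, and the race hook merely simulates writeback starting between the unlocked check and the lock acquisition.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the inode state the reaper inspects. */
struct toy_inode {
	bool is_reflink;      /* reflink iflag is set */
	bool has_dirty_pages; /* dirty page cache */
	bool under_writeback; /* writeback in progress */
	int  dio_count;       /* direct I/O requests in flight */
	bool locked;          /* stands in for the real inode locks */
	int  cow_extents;     /* CoW fork reservations still held */
};

/* The four checks folded into one helper: true only if reaping looks safe. */
static bool toy_can_reap(const struct toy_inode *ip)
{
	if (!ip->is_reflink)
		return false;
	if (ip->has_dirty_pages)
		return false;
	if (ip->under_writeback)
		return false;
	if (ip->dio_count > 0)
		return false;
	return true;
}

/*
 * Free the CoW reservations if it is safe to do so; returns true if they
 * were freed. The optional race hook runs after the unlocked check and
 * before the lock is taken, to model another thread changing the state.
 */
static bool toy_reap(struct toy_inode *ip, void (*race)(struct toy_inode *))
{
	bool freed = false;

	/* Cheap unlocked check: skip inodes that are obviously busy. */
	if (!toy_can_reap(ip))
		return false;

	if (race)
		race(ip);

	ip->locked = true;

	/* Recheck under the lock: the state may have changed meanwhile. */
	if (toy_can_reap(ip)) {
		ip->cow_extents = 0;
		freed = true;
	}

	ip->locked = false;
	return freed;
}

/* Simulated racer: writeback begins on the inode. */
static void toy_start_writeback(struct toy_inode *ip)
{
	ip->under_writeback = true;
}
```

With no racer, toy_reap() frees the reservations. With toy_start_writeback as the hook, the second call to toy_can_reap() fails and the extents are left alone, which is the behaviour the recheck is meant to guarantee.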
> > > Reported-by: Eryu Guan <eguan@redhat.com>
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> >
> > I applied this patch on top of v4.15-rc5 kernel, and ran generic/083
> > generic/269 and generic/270 (where I hit the soft lockup and hang before)
> > multiple times and tests all passed. I also ran all tests in 'enospc'
> > group on 1k/2k/4k XFS with reflink enabled, tests passed too. So
> >
> > Tested-by: Eryu Guan <eguan@redhat.com>
>
> Sorry, I have to withdraw this tag for now.. I'm seeing soft lockup
> again in generic/269 run with the patched kernel. I'll do more testings
> to confirm, paste the soft lockup info here for now:

I ran generic/269 for over 4000 iterations and didn't hit the soft lockup,
so I suspect that I previously tested with the wrong (unpatched) xfs module.

However, I occasionally saw a filesystem inconsistency in generic/269. It's
hard to reproduce (it takes 100-200 iterations), but I did see it several
times. It looks like a separate problem.

_check_xfs_filesystem: filesystem on /dev/sda6 is inconsistent (r)
*** xfs_repair -n output ***
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
sb_fdblocks 8178, counted 8188
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 3
        - agno = 2
        - agno = 0
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
*** end xfs_repair output

Thanks,
Eryu