public inbox for linux-xfs@vger.kernel.org
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: xfs <linux-xfs@vger.kernel.org>
Cc: Dave Chinner <david@fromorbit.com>, Brian Foster <bfoster@redhat.com>
Subject: xlog_grant_head_wait deadlocks on high-rolling transactions?
Date: Tue, 12 Mar 2019 11:18:25 -0700	[thread overview]
Message-ID: <20190312181825.GH4359@magnolia> (raw)

Hi all,

Does anyone /else/ occasionally see fstests hang with a hojillion
threads stuck in xlog_grant_head_wait?  I periodically see xfs/347 hang
with piles of threads stuck in:

kworker/0:214   D13120 26117      2 0x80000000
Workqueue: xfs-conv/sdf xfs_end_io [xfs]
Call Trace:
 schedule+0x36/0x90
 xlog_grant_head_wait+0x66/0x450 [xfs]
 xlog_grant_head_check+0xf0/0x170 [xfs]
 xfs_log_reserve+0x166/0x500 [xfs]
 xfs_trans_reserve+0x1ac/0x2b0 [xfs]
 xfs_trans_alloc+0xda/0x220 [xfs]
 xfs_reflink_end_cow_extent+0xda/0x3a0 [xfs]
 xfs_reflink_end_cow+0x92/0x2a0 [xfs]
 xfs_end_io+0xd0/0x120 [xfs]
 process_one_work+0x252/0x600
 worker_thread+0x3d/0x390
 kthread+0x11f/0x140
 ret_from_fork+0x24/0x30

That's the end io worker stalled under xfs_trans_alloc, trying to
reserve log space to remap extents from the COW fork to the data fork.
I also observe one thread stuck here:

kworker/0:215   D13120 26118      2 0x80000000
Workqueue: xfs-conv/sdf xfs_end_io [xfs]
Call Trace:
 schedule+0x36/0x90
 xlog_grant_head_wait+0x66/0x450 [xfs]
 xlog_grant_head_check+0xf0/0x170 [xfs]
 xfs_log_regrant+0x155/0x3b0 [xfs]
 xfs_trans_reserve+0xa5/0x2b0 [xfs]
 xfs_trans_roll+0x9c/0x190 [xfs]
 xfs_defer_trans_roll+0x16e/0x5b0 [xfs]
 xfs_defer_finish_noroll+0xf1/0x7e0 [xfs]
 __xfs_trans_commit+0x1c3/0x630 [xfs]
 xfs_reflink_end_cow_extent+0x285/0x3a0 [xfs]
 xfs_reflink_end_cow+0x92/0x2a0 [xfs]
 xfs_end_io+0xd0/0x120 [xfs]
 process_one_work+0x252/0x600
 worker_thread+0x3d/0x390
 kthread+0x11f/0x140
 ret_from_fork+0x24/0x30

This thread is stalled under xfs_trans_roll trying to reserve more log
space, because it has rolled more times than tr_write.tr_logcount
anticipated.  logcount is 8, but (having added a patch to trace log
tickets that roll more times than logcount guessed) we actually roll
these end_cow transactions 10 times.

I think the problem was introduced when we added the deferred AGFL log
item, because the bunmapi of the old data fork extent and the map_extent
of the new extent can both add separate deferred AGFL log items to the
defer chain.  It's also possible that I underestimated
XFS_WRITE_LOG_COUNT_REFLINK way back when.

Either way, the xfs_trans_roll transaction wants (logres) more space,
and the xfs_trans_alloc transactions want (logres * logcount) space.
Unfortunately, the alloc transactions got to the grant waiter list
first, and there's not enough space for them, so the entire list waits.
There seems to be enough space to grant the rolling transaction its
smaller amount of space, so at least in theory that transaction could
finish (and release a lot of space) if it could be bumped to the head of
the waiter list.

Another way to solve this problem, of course, is to increase tr_logcount
from 8 to 10, though this could cause some user heartburn on small
filesystems because the minimum log size would increase.  However, I'm
not sure about the relative merits of either approach, so I'm kicking
this to the list for further input (while I go have lunch :P).

The second problem I noticed is that the reflink cancel cow and reflink
remap functions follow the pattern of allocating one transaction and
rolling it for every extent they encounter.  This results in /very/ high
roll counts for those transactions, which (on a very busy system with a
smallish log) seems like it could land us right back in this deadlock.
I think the answer is to split those up to run one transaction per
extent (like I did for reflink end_cow), though I'd have to ensure that
we can drop the ILOCK safely to get a new transaction.

Thoughts?

--D


Thread overview: 6+ messages
2019-03-12 18:18 Darrick J. Wong [this message]
2019-03-13 17:43 ` xlog_grant_head_wait deadlocks on high-rolling transactions? Brian Foster
2019-03-13 18:43   ` Darrick J. Wong
2019-03-14 22:36     ` Dave Chinner
2019-03-15 11:32       ` Brian Foster
2019-03-17 22:30         ` Dave Chinner
