From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 1/4] xfs: log worker needs to start before intent/unlink recovery
Date: Thu, 10 Mar 2022 15:46:50 -0800	[thread overview]
Message-ID: <20220310234650.GJ8224@magnolia> (raw)
In-Reply-To: <20220309015512.2648074-2-david@fromorbit.com>

On Wed, Mar 09, 2022 at 12:55:09PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> After 963 iterations of generic/530, it deadlocked during recovery
> on a pinned inode cluster buffer like so:
> 
> XFS (pmem1): Starting recovery (logdev: internal)
> INFO: task kworker/8:0:306037 blocked for more than 122 seconds.
>       Not tainted 5.17.0-rc6-dgc+ #975
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/8:0     state:D stack:13024 pid:306037 ppid:     2 flags:0x00004000
> Workqueue: xfs-inodegc/pmem1 xfs_inodegc_worker
> Call Trace:
>  <TASK>
>  __schedule+0x30d/0x9e0
>  schedule+0x55/0xd0
>  schedule_timeout+0x114/0x160
>  __down+0x99/0xf0
>  down+0x5e/0x70
>  xfs_buf_lock+0x36/0xf0
>  xfs_buf_find+0x418/0x850
>  xfs_buf_get_map+0x47/0x380
>  xfs_buf_read_map+0x54/0x240
>  xfs_trans_read_buf_map+0x1bd/0x490
>  xfs_imap_to_bp+0x4f/0x70
>  xfs_iunlink_map_ino+0x66/0xd0
>  xfs_iunlink_map_prev.constprop.0+0x148/0x2f0
>  xfs_iunlink_remove_inode+0xf2/0x1d0
>  xfs_inactive_ifree+0x1a3/0x900
>  xfs_inode_unlink+0xcc/0x210
>  xfs_inodegc_worker+0x1ac/0x2f0
>  process_one_work+0x1ac/0x390
>  worker_thread+0x56/0x3c0
>  kthread+0xf6/0x120
>  ret_from_fork+0x1f/0x30
>  </TASK>
> task:mount           state:D stack:13248 pid:324509 ppid:324233 flags:0x00004000
> Call Trace:
>  <TASK>
>  __schedule+0x30d/0x9e0
>  schedule+0x55/0xd0
>  schedule_timeout+0x114/0x160
>  __down+0x99/0xf0
>  down+0x5e/0x70
>  xfs_buf_lock+0x36/0xf0
>  xfs_buf_find+0x418/0x850
>  xfs_buf_get_map+0x47/0x380
>  xfs_buf_read_map+0x54/0x240
>  xfs_trans_read_buf_map+0x1bd/0x490
>  xfs_imap_to_bp+0x4f/0x70
>  xfs_iget+0x300/0xb40
>  xlog_recover_process_one_iunlink+0x4c/0x170
>  xlog_recover_process_iunlinks.isra.0+0xee/0x130
>  xlog_recover_finish+0x57/0x110
>  xfs_log_mount_finish+0xfc/0x1e0
>  xfs_mountfs+0x540/0x910
>  xfs_fs_fill_super+0x495/0x850
>  get_tree_bdev+0x171/0x270
>  xfs_fs_get_tree+0x15/0x20
>  vfs_get_tree+0x24/0xc0
>  path_mount+0x304/0xba0
>  __x64_sys_mount+0x108/0x140
>  do_syscall_64+0x35/0x80
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
>  </TASK>
> task:xfsaild/pmem1   state:D stack:14544 pid:324525 ppid:     2 flags:0x00004000
> Call Trace:
>  <TASK>
>  __schedule+0x30d/0x9e0
>  schedule+0x55/0xd0
>  io_schedule+0x4b/0x80
>  xfs_buf_wait_unpin+0x9e/0xf0
>  __xfs_buf_submit+0x14a/0x230
>  xfs_buf_delwri_submit_buffers+0x107/0x280
>  xfs_buf_delwri_submit_nowait+0x10/0x20
>  xfsaild+0x27e/0x9d0
>  kthread+0xf6/0x120
>  ret_from_fork+0x1f/0x30
> 
> We have the mount process waiting on an inode cluster buffer read,
> inodegc doing unlink waiting on the same inode cluster buffer, and
> the AIL push thread blocked in writeback waiting for the inode to
> become unpinned.
> 
> What has happened here is that the AIL push thread has raced with
> the inodegc process modifying, committing and pinning the inode
> cluster buffer in xfs_buf_delwri_submit_buffers() here:
> 
> 	blk_start_plug(&plug);
> 	list_for_each_entry_safe(bp, n, buffer_list, b_list) {
> 		if (!wait_list) {
> 			if (xfs_buf_ispinned(bp)) {
> 				pinned++;
> 				continue;
> 			}
> Here >>>>>>
> 			if (!xfs_buf_trylock(bp))
> 				continue;
> 
> Basically, the AIL found the buffer wasn't pinned and then got the
> lock without blocking, but in between the buffer was pinned. This
> implies the AIL thread was preempted between the pin check and the
> lock, because the pin count can only be increased while holding the
> buffer locked. Hence when it went to submit the IO, it blocked
> waiting for the buffer to be unpinned.
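
For context, the window being described is between the unlocked
xfs_buf_ispinned() check and the xfs_buf_trylock() in the snippet
above. One way to close it in the non-blocking path is to take the
trylock first and only then check the pin count, which is the
direction a later patch in this series takes. A rough sketch of that
idea (illustration only, not the actual hunk):

	if (!wait_list) {
		/* Lock first, so a pin can't slip in after the check. */
		if (!xfs_buf_trylock(bp))
			continue;
		/* The pin count can only rise under the lock; re-check now. */
		if (xfs_buf_ispinned(bp)) {
			xfs_buf_unlock(bp);
			pinned++;
			continue;
		}
	}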
> 
> With all executing threads now waiting on the buffer to be unpinned,
> we normally get out of situations like this via the background log
> worker issuing a log force which will unpin stuck buffers like
> this. But at this point in recovery, we haven't started the log
> worker. In fact, the first thing we do after processing intents and
> unlinked inodes is *start the log worker*. IOWs, we start it too
> late to have it break deadlocks like this.

Because finishing the intents, processing unlinked inodes, and freeing
dead COW extents are all just regular transactional updates that run
after sorting out the log contents, there's no reason why the log worker
oughtn't be running, right?
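
For reference, the deadlock-breaking mechanism in question is the
periodic log work, which forces the log (unpinning stuck buffers) and
kicks the AIL before re-arming itself. A simplified sketch of that
behaviour, using a hypothetical helper name and eliding the
log-covering details of the real worker:

	static void
	log_worker_sketch(struct work_struct *work)
	{
		struct xlog		*log = container_of(to_delayed_work(work),
							struct xlog, l_work);
		struct xfs_mount	*mp = log->l_mp;

		/* Force the log so pinned items get written back and unpinned. */
		xfs_log_force(mp, 0);

		/* Push everything in the AIL to keep metadata writeback moving. */
		xfs_ail_push_all(mp->m_ail);

		/* Re-arm the delayed work for the next interval. */
		xfs_log_work_queue(mp);
	}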

> Avoid this and any other similar deadlock vectors in intent and
> unlinked inode recovery by starting the log worker before we recover
> intents and unlinked inodes. This part of recovery runs as though
> the filesystem is fully active, so we really should have the same
> infrastructure running as we normally do at runtime.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 89fec9a18c34..ffd928cf9a9a 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -812,10 +812,9 @@ xfs_log_mount_finish(
>  	 * mount failure occurs.
>  	 */
>  	mp->m_super->s_flags |= SB_ACTIVE;
> +	xfs_log_work_queue(mp);
>  	if (xlog_recovery_needed(log))
>  		error = xlog_recover_finish(log);
> -	if (!error)
> -		xfs_log_work_queue(mp);

I /think/ in the error case, we'll cancel and wait for the worker in
xfs_mountfs -> xfs_log_mount_cancel -> xfs_log_unmount -> xfs_log_clean
-> xfs_log_quiesce, right?
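
If that chain is right, the cancellation would be the
cancel_delayed_work_sync() call on the log's delayed work in the
quiesce path, so a worker queued early can't outlive a failed mount.
Roughly, and reduced to the relevant calls (a sketch with a
hypothetical function name, not the verbatim quiesce code):

	void
	log_quiesce_sketch(struct xfs_mount *mp)
	{
		/* Synchronously cancel the periodic log work. */
		cancel_delayed_work_sync(&mp->m_log->l_work);

		/* Force the log and drain the AIL before tearing down. */
		xfs_log_force(mp, XFS_LOG_SYNC);
		xfs_ail_push_all_sync(mp->m_ail);
	}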

TBH I'd tried to solve these g/530 hangs by making this exact change, so
assuming the answers are {yes, yes}, then:

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

>  	mp->m_super->s_flags &= ~SB_ACTIVE;
>  	evict_inodes(mp->m_super);
>  
> -- 
> 2.33.0
> 

Thread overview: 7+ messages
2022-03-09  1:55 [PATCH 0/4 V2] xfs: log recovery hang fixes Dave Chinner
2022-03-09  1:55 ` [PATCH 1/4] xfs: log worker needs to start before intent/unlink recovery Dave Chinner
2022-03-10 23:46   ` Darrick J. Wong [this message]
2022-03-11  0:13     ` Dave Chinner
2022-03-09  1:55 ` [PATCH 2/4] xfs: check buffer pin state after locking in delwri_submit Dave Chinner
2022-03-09  1:55 ` [PATCH 3/4] xfs: xfs_ail_push_all_sync() stalls when racing with updates Dave Chinner
2022-03-09  1:55 ` [PATCH 4/4] xfs: async CIL flushes need pending pushes to be made stable Dave Chinner
