public inbox for linux-xfs@vger.kernel.org
* PATCH [0/4 V2] xfs: log recovery hang fixes
@ 2022-03-09  1:55 Dave Chinner
From: Dave Chinner @ 2022-03-09  1:55 UTC
  To: linux-xfs

Hi folks,

Willy reported generic/530 had started hanging on his test machines
and I've tried to reproduce the problem he reported. While I haven't
reproduced the exact hang he's been having, I've found a couple of
others while running g/530 in a tight loop on a couple of test
machines.

The first 3 patches are defensive fixes - the log worker acts as a
watchdog, and the issues in patches 2 and 3 were triggered during my
testing of g/530 and led to 30s delays that the log worker watchdog
caught. Without the watchdog, these may actually be deadlock
triggers.

The 4th patch is the one that fixes the problem Willy reported.
It is a regression from the conversion of AIL pushing to use
non-blocking CIL flushes. It is unknown why this suddenly started
showing up on Willy's test machine right now, and why only on that
machine, but it is clearly a problem. This patch catches the state
that leads to the deadlock and breaks it with an immediate log
force to flush any pending iclogs.

Version 2:
- updated to 5.17-rc7
- tested by Willy.

Version 1:
- https://lore.kernel.org/linux-xfs/20220307053252.2534616-1-david@fromorbit.com/



Thread overview: 7+ messages
2022-03-09  1:55 PATCH [0/4 V2] xfs: log recovery hang fixes Dave Chinner
2022-03-09  1:55 ` [PATCH 1/4] xfs: log worker needs to start before intent/unlink recovery Dave Chinner
2022-03-10 23:46   ` Darrick J. Wong
2022-03-11  0:13     ` Dave Chinner
2022-03-09  1:55 ` [PATCH 2/4] xfs: check buffer pin state after locking in delwri_submit Dave Chinner
2022-03-09  1:55 ` [PATCH 3/4] xfs: xfs_ail_push_all_sync() stalls when racing with updates Dave Chinner
2022-03-09  1:55 ` [PATCH 4/4] xfs: async CIL flushes need pending pushes to be made stable Dave Chinner
