* [Cluster-devel] [PATCH v5.19-rc1 0/7] fs: dlm: recovery error handling
@ 2022-06-10 17:06 Alexander Aring
From: Alexander Aring @ 2022-06-10 17:06 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

I have had these patches lying around for a long time, and it is maybe
time to bring them up. They make three changes to dlm recovery handling:

1.

The dlm_lsop_recover_prep() callback should be called only once, when the
lockspace is stopped, and not again if the lockspace is already stopped
while recovery is running.

This changes the possible call sequence:

dlm_lsop_recover_prep()
...
dlm_lsop_recover_prep()
dlm_lsop_recover_done()

to one with only a single prep call:

dlm_lsop_recover_prep()
dlm_lsop_recover_done()
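The intended behaviour of item 1 can be illustrated with a small
userspace sketch. All names here (struct lockspace, ls_stop(),
ls_recovery_finished()) are hypothetical and are not the kernel's dlm
code; the point is only that the prep notification is tied to the
running -> stopped transition, so a second stop while recovery is
already in progress does not call it again.

```c
#include <stdbool.h>

/* Hypothetical, simplified lockspace state (not the kernel's struct dlm_ls). */
struct lockspace {
	bool stopped;    /* true while the lockspace is stopped for recovery */
	int prep_calls;  /* number of recover_prep notifications delivered */
	int done_calls;  /* number of recover_done notifications delivered */
};

/* Stand-ins for the user's dlm_lsop_recover_prep()/_done() callbacks. */
static void recover_prep(struct lockspace *ls) { ls->prep_calls++; }
static void recover_done(struct lockspace *ls) { ls->done_calls++; }

/* Stop the lockspace. Only the running -> stopped transition notifies the
 * user, so stopping an already stopped lockspace does not call prep again. */
static void ls_stop(struct lockspace *ls)
{
	bool was_stopped = ls->stopped;

	ls->stopped = true;
	if (!was_stopped)
		recover_prep(ls);
}

/* Recovery finished: resume the lockspace and notify the user once. */
static void ls_recovery_finished(struct lockspace *ls)
{
	ls->stopped = false;
	recover_done(ls);
}
```

Calling ls_stop() twice before ls_recovery_finished() yields exactly one
prep and one done notification.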

2.

When a lockspace is created with new_lockspace(), we wait until the
members have been successfully pinged, and then new_lockspace() returns
to the caller. However, recovery might still be running at that point.
Most dlm users work around this by waiting for the
dlm_lsop_recover_done() callback to know when the lockspace can be used.
With this series, new_lockspace() waits until recovery is done. This
should be backwards compatible with existing dlm users; however, they
can drop their own handling if they want.
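The wait described in item 2 is a wait/complete handshake between the
creating caller and the recovery thread. The following is a userspace
sketch only, assuming a pthread mutex and condition variable as a
stand-in for a kernel primitive such as a struct completion; the names
recovery_wait, recoverd() and new_lockspace_sketch() are hypothetical
and do not come from the patches. It also hands the recovery result back
to the creator, so an error can be returned from the creation path.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel completion carrying a result code. */
struct recovery_wait {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
	int result;  /* 0 on success, negative errno on failure */
};

static void recovery_wait_init(struct recovery_wait *w)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->cond, NULL);
	w->done = false;
	w->result = 0;
}

/* Recovery side: publish the result and wake every waiter. */
static void recovery_complete(struct recovery_wait *w, int result)
{
	pthread_mutex_lock(&w->lock);
	w->result = result;
	w->done = true;
	pthread_cond_broadcast(&w->cond);
	pthread_mutex_unlock(&w->lock);
}

/* Creator side: block until recovery has finished, then return its result. */
static int wait_for_recovery(struct recovery_wait *w)
{
	int result;

	pthread_mutex_lock(&w->lock);
	while (!w->done)  /* loop guards against spurious wakeups */
		pthread_cond_wait(&w->cond, &w->lock);
	result = w->result;
	pthread_mutex_unlock(&w->lock);
	return result;
}

/* Hypothetical recovery thread that finishes with a simulated result. */
struct recoverd_arg {
	struct recovery_wait *w;
	int simulated_result;
};

static void *recoverd(void *p)
{
	struct recoverd_arg *a = p;

	recovery_complete(a->w, a->simulated_result);
	return NULL;
}

/* Sketch of a creation path that only returns once recovery is done,
 * so the caller no longer needs to wait for recover_done itself. */
static int new_lockspace_sketch(int simulated_result)
{
	struct recovery_wait w;
	struct recoverd_arg arg = { &w, simulated_result };
	pthread_t tid;
	int error;

	recovery_wait_init(&w);
	if (pthread_create(&tid, NULL, recoverd, &arg) != 0) {
		error = -1;
	} else {
		error = wait_for_recovery(&w);
		pthread_join(tid, NULL);
	}
	pthread_cond_destroy(&w.cond);
	pthread_mutex_destroy(&w.lock);
	return error;
}
```

The done flag makes the handshake safe regardless of ordering: if the
recovery thread completes before the creator starts waiting, the wait
returns immediately.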

3.

There are two ways recovery can be triggered. Either somebody called
new_lockspace(), in which case a waiter waits until recovery is done, or
it is a completely asynchronous process, e.g. nodes joining or leaving
the lockspace. In the async case no caller waits for dlm recovery to
finish, so there is no error handling that reacts to possible recovery
errors. This patch series introduces a "best effort" approach that simply
retries the recovery (via schedule()) on error and hopes the error gets
resolved. If the error persists after 5 retries, panic() fences the node.
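The retry-then-fence policy of item 3 can be sketched as a bounded loop.
This is a hypothetical userspace illustration, not the code from the
series: recover_with_retry(), the recover_fn/fence_fn hooks and the
flaky_recover() demo helper are invented names, abort() stands in for
panic(), and it assumes the initial attempt is not counted as one of the
5 retries (6 attempts in total).

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed budget from the cover letter: one initial attempt plus 5 retries. */
#define RECOVERY_MAX_RETRIES 5

/* Hypothetical recovery step: returns 0 on success or a negative errno. */
typedef int (*recover_fn)(void *ctx);

/* Hypothetical fencing hook; in the kernel series this role is panic(). */
typedef void (*fence_fn)(int last_error);

/* Run recovery; on error retry it, and fence if every attempt failed. */
static int recover_with_retry(recover_fn recover, void *ctx, fence_fn fence)
{
	int error = 0;

	for (int attempt = 0; attempt <= RECOVERY_MAX_RETRIES; attempt++) {
		error = recover(ctx);
		if (!error)
			return 0;
		/* A kernel version would schedule() here before retrying. */
	}
	fence(error);
	return error;
}

/* Userspace analogue of fencing via panic(): report and abort. */
static void abort_fence(int last_error)
{
	fprintf(stderr, "recovery failed (%d), fencing node\n", last_error);
	abort();
}

/* Demo helper: a recovery step that fails a configurable number of times. */
struct flaky {
	int failures_left;
	int calls;
};

static int flaky_recover(void *ctx)
{
	struct flaky *f = ctx;

	f->calls++;
	if (f->failures_left > 0) {
		f->failures_left--;
		return -EAGAIN;
	}
	return 0;
}

/* Demo fence that records the error instead of aborting. */
static int fenced_error;

static void record_fence(int last_error)
{
	fenced_error = last_error;
}
```

With failures_left = 2, recovery succeeds on the third call and the
fence is never invoked; with a step that never succeeds, it is called six
times before the fence fires. In production the fence would be
abort_fence() (or panic() in the kernel) rather than record_fence().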

- Alex

Alexander Aring (7):
  fs: dlm: add notes for recovery and membership handling
  fs: dlm: call dlm_lsop_recover_prep once
  fs: dlm: let new_lockspace() wait until recovery
  fs: dlm: handle recovery result outside of ls_recover
  fs: dlm: handle recovery -EAGAIN case as retry
  fs: dlm: change -EINVAL recovery error to -EAGAIN
  fs: dlm: add WARN_ON for non waiter case

 fs/dlm/dlm_internal.h |  4 +--
 fs/dlm/lock.c         |  5 +++-
 fs/dlm/lockspace.c    |  9 ++++---
 fs/dlm/member.c       | 30 +++++++++++-----------
 fs/dlm/recoverd.c     | 60 ++++++++++++++++++++++++++++++++++++++++---
 5 files changed, 82 insertions(+), 26 deletions(-)

-- 
2.31.1



Thread overview: 10+ messages
2022-06-10 17:06 [Cluster-devel] [PATCH v5.19-rc1 0/7] fs: dlm: recovery error handling Alexander Aring
2022-06-10 17:06 ` [Cluster-devel] [PATCH v5.19-rc1 1/7] fs: dlm: add notes for recovery and membership handling Alexander Aring
2022-06-10 17:06 ` [Cluster-devel] [PATCH v5.19-rc1 2/7] fs: dlm: call dlm_lsop_recover_prep once Alexander Aring
2022-06-10 17:06 ` [Cluster-devel] [PATCH v5.19-rc1 3/7] fs: dlm: let new_lockspace() wait until recovery Alexander Aring
2022-06-10 17:06 ` [Cluster-devel] [PATCH v5.19-rc1 4/7] fs: dlm: handle recovery result outside of ls_recover Alexander Aring
2022-06-10 17:06 ` [Cluster-devel] [PATCH v5.19-rc1 5/7] fs: dlm: handle recovery -EAGAIN case as retry Alexander Aring
2022-06-10 17:06 ` [Cluster-devel] [PATCH v5.19-rc1 6/7] fs: dlm: change -EINVAL recovery error to -EAGAIN Alexander Aring
2022-06-14 14:54   ` Alexander Aring
2022-06-10 17:06 ` [Cluster-devel] [PATCH v5.19-rc1 7/7] fs: dlm: add WARN_ON for non waiter case Alexander Aring
2022-06-14 17:59   ` Alexander Aring
