* [Cluster-devel] [PATCH AUTOSEL 5.14 51/99] fs: dlm: fix return -EINTR on recovery stopped
[not found] <20210910001558.173296-1-sashal@kernel.org>
@ 2021-09-10 0:15 ` Sasha Levin
2021-09-10 0:15 ` [Cluster-devel] [PATCH AUTOSEL 5.14 86/99] fs: dlm: avoid comms shutdown delay in release_lockspace Sasha Levin
1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2021-09-10 0:15 UTC (permalink / raw)
To: cluster-devel.redhat.com
From: Alexander Aring <aahringo@redhat.com>
[ Upstream commit aee742c9928ab4f5f4e0b00f41fb2d2cffae179e ]
This patch will return -EINTR instead of 1 if recovery is stopped. In
case of ping_members() the return value will be checked if the error is
-EINTR for signaling another recovery was triggered and the whole
recovery process will come to a clean end to process the next one.
Returning 1 will abort the recovery process and can leave the recovery
in a broken state.
It was reported with the following kernel log message attached and a gfs2
mount stopped working:
"dlm: bobvirt1: dlm_recover_members error 1"
whereas 1 was returned because of a conversion of "dlm_recovery_stopped()"
to an errno was missing which this patch will introduce. While on it all
other possible missing errno conversions at other places were added as
they are done as in other places.
It might be worth to check the error case at this recovery level,
because some of the functionality also returns -ENOBUFS and check why
recovery ends in a broken state. However this will fix the issue if
another recovery was triggered at some points of recovery handling.
Reported-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/dlm/dir.c | 4 +++-
fs/dlm/member.c | 4 +++-
fs/dlm/recoverd.c | 4 +++-
3 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/fs/dlm/dir.c b/fs/dlm/dir.c
index 10c36ae1a8f9..45ebbe602bbf 100644
--- a/fs/dlm/dir.c
+++ b/fs/dlm/dir.c
@@ -85,8 +85,10 @@ int dlm_recover_directory(struct dlm_ls *ls)
for (;;) {
int left;
error = dlm_recovery_stopped(ls);
- if (error)
+ if (error) {
+ error = -EINTR;
goto out_free;
+ }
error = dlm_rcom_names(ls, memb->nodeid,
last_name, last_len);
diff --git a/fs/dlm/member.c b/fs/dlm/member.c
index d9e1e4170eb1..731d489aa323 100644
--- a/fs/dlm/member.c
+++ b/fs/dlm/member.c
@@ -443,8 +443,10 @@ static int ping_members(struct dlm_ls *ls)
list_for_each_entry(memb, &ls->ls_nodes, list) {
error = dlm_recovery_stopped(ls);
- if (error)
+ if (error) {
+ error = -EINTR;
break;
+ }
error = dlm_rcom_status(ls, memb->nodeid, 0);
if (error)
break;
diff --git a/fs/dlm/recoverd.c b/fs/dlm/recoverd.c
index 85e245392715..97d052cea5a9 100644
--- a/fs/dlm/recoverd.c
+++ b/fs/dlm/recoverd.c
@@ -125,8 +125,10 @@ static int ls_recover(struct dlm_ls *ls, struct dlm_recover *rv)
dlm_recover_waiters_pre(ls);
error = dlm_recovery_stopped(ls);
- if (error)
+ if (error) {
+ error = -EINTR;
goto fail;
+ }
if (neg || dlm_no_directory(ls)) {
/*
--
2.30.2
^ permalink raw reply related [flat|nested] 2+ messages in thread
* [Cluster-devel] [PATCH AUTOSEL 5.14 86/99] fs: dlm: avoid comms shutdown delay in release_lockspace
[not found] <20210910001558.173296-1-sashal@kernel.org>
2021-09-10 0:15 ` [Cluster-devel] [PATCH AUTOSEL 5.14 51/99] fs: dlm: fix return -EINTR on recovery stopped Sasha Levin
@ 2021-09-10 0:15 ` Sasha Levin
1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2021-09-10 0:15 UTC (permalink / raw)
To: cluster-devel.redhat.com
From: Alexander Aring <aahringo@redhat.com>
[ Upstream commit ecd95673142ef80169a6c003b569b8a86d1e6329 ]
When dlm_release_lockspace does active shutdown on connections to
other nodes, the active shutdown will wait for any exisitng passive
shutdowns to be resolved. But, the sequence of operations during
dlm_release_lockspace can prevent the normal resolution of passive
shutdowns (processed normally by way of lockspace recovery.)
This disruption of passive shutdown handling can cause the active
shutdown to wait for a full timeout period, delaying the completion
of dlm_release_lockspace.
To fix this, make dlm_release_lockspace resolve existing passive
shutdowns (by calling dlm_clear_members earlier), before it does
active shutdowns. The active shutdowns will not find any passive
shutdowns to wait for, and will not be delayed.
Reported-by: Chris Mackowski <cmackows@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/dlm/lockspace.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c
index d71aba8c3e64..e51ae83fc5b4 100644
--- a/fs/dlm/lockspace.c
+++ b/fs/dlm/lockspace.c
@@ -793,6 +793,7 @@ static int release_lockspace(struct dlm_ls *ls, int force)
if (ls_count == 1) {
dlm_scand_stop();
+ dlm_clear_members(ls);
dlm_midcomms_shutdown();
}
--
2.30.2
^ permalink raw reply related [flat|nested] 2+ messages in thread
end of thread, other threads:[~2021-09-10 0:15 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20210910001558.173296-1-sashal@kernel.org>
2021-09-10 0:15 ` [Cluster-devel] [PATCH AUTOSEL 5.14 51/99] fs: dlm: fix return -EINTR on recovery stopped Sasha Levin
2021-09-10 0:15 ` [Cluster-devel] [PATCH AUTOSEL 5.14 86/99] fs: dlm: avoid comms shutdown delay in release_lockspace Sasha Levin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).