From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexander Aring
Date: Wed, 22 Jun 2022 14:45:13 -0400
Subject: [Cluster-devel] [PATCH RESEND v5.19-rc3 10/20] fs: dlm: add notes for recovery and membership handling
In-Reply-To: <20220622184523.1886869-1-aahringo@redhat.com>
References: <20220622184523.1886869-1-aahringo@redhat.com>
Message-ID: <20220622184523.1886869-11-aahringo@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

This patch adds comments to make clear that the ls_recover() function
should never fail before membership handling is done. Membership
handling means adding nodes to or removing nodes from the lockspace's
ls_nodes attribute in dlm_recover_members(). This is because
functionality such as dlm_midcomms_add_member(),
dlm_midcomms_remove_member() or dlm_lsop_recover_slot() should always
be made aware of any join or leave of lockspace members. If we added
e.g. a dlm_locking_stopped() check before dlm_recover_members() to see
whether the recovery was interrupted and abort it, we might skip the
calls to dlm_midcomms_add_member(), dlm_midcomms_remove_member() or
dlm_lsop_recover_slot(). One reason for the recovery being interrupted
could be that the cluster manager signalled a new configuration, e.g.
members joined or left. It is fine to interrupt or fail the recovery
after the mentioned handling in dlm_recover_members(), but never
before.

Signed-off-by: Alexander Aring
---
 fs/dlm/member.c   | 6 +++++-
 fs/dlm/recoverd.c | 4 ++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/dlm/member.c b/fs/dlm/member.c
index 98084e0cfccf..7e5f5aefefb5 100644
--- a/fs/dlm/member.c
+++ b/fs/dlm/member.c
@@ -534,7 +534,11 @@ int dlm_recover_members(struct dlm_ls *ls, struct dlm_recover *rv, int *neg_out)
 	int i, error, neg = 0, low = -1;
 
 	/* previously removed members that we've not finished removing need to
-	   count as a negative change so the "neg" recovery steps will happen */
+	 * count as a negative change so the "neg" recovery steps will happen
+	 *
+	 * This functionality must report all member changes to lsops or
+	 * midcomms layer and must never return before.
+	 */
 
 	list_for_each_entry(memb, &ls->ls_nodes_gone, list) {
 		log_rinfo(ls, "prev removed member %d", memb->nodeid);
diff --git a/fs/dlm/recoverd.c b/fs/dlm/recoverd.c
index a55dfce705dd..b5b519cde20b 100644
--- a/fs/dlm/recoverd.c
+++ b/fs/dlm/recoverd.c
@@ -70,6 +70,10 @@ static int ls_recover(struct dlm_ls *ls, struct dlm_recover *rv)
 
 	/*
 	 * Add or remove nodes from the lockspace's ls_nodes list.
+	 *
+	 * Due the fact we must report all membership changes to lsops or
+	 * midcomms layer it is not permitted to abort ls_recover() until
+	 * this is done.
 	 */
 
 	error = dlm_recover_members(ls, rv, &neg);
-- 
2.31.1
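
For illustration only, the ordering rule described in the commit message
can be modelled in a few lines of standalone C. This is a rough sketch,
not kernel code: struct toy_ls, toy_recover_members() and
toy_ls_recover() are hypothetical stand-ins for struct dlm_ls,
dlm_recover_members() and ls_recover(), the "stopped" flag only models
what a dlm_locking_stopped() style check would report, and the -1 return
value is an arbitrary choice for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dlm_ls; not the kernel structure. */
struct toy_ls {
	bool stopped;		/* models an interrupted recovery */
	int members_reported;	/* join/leave events handed to upper layers */
};

/*
 * Models dlm_recover_members(): every pending membership change is
 * reported, and the function does not return before that is done.
 */
int toy_recover_members(struct toy_ls *ls, int pending_changes)
{
	assert(ls != NULL);

	for (int i = 0; i < pending_changes; i++)
		ls->members_reported++;	/* stands in for the midcomms/lsops calls */

	return 0;
}

/* Models the ordering in ls_recover(): membership first, abort checks later. */
int toy_ls_recover(struct toy_ls *ls, int pending_changes)
{
	int error;

	/* No "was recovery interrupted?" check is allowed before this call. */
	error = toy_recover_members(ls, pending_changes);
	if (error)
		return error;

	/* From here on it is fine to interrupt or fail the recovery. */
	if (ls->stopped)
		return -1;	/* arbitrary error value for this sketch */

	return 0;
}
```

Even when the toy recovery is marked as stopped, all pending membership
changes have been reported by the time toy_ls_recover() returns its
error. Moving the "stopped" check above the toy_recover_members() call
would lose those reports, which is the situation the new comments warn
about.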