gfs2 filesystem and dlm development
 help / color / mirror / Atom feed
* [PATCH v6.13-rc1 1/5] dlm: fix srcu_read_lock() return type to int
@ 2024-12-02 15:26 Alexander Aring
  2024-12-02 15:26 ` [PATCH v6.13-rc1 2/5] dlm: fix missing rsb put on scan timer Alexander Aring
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Alexander Aring @ 2024-12-02 15:26 UTC (permalink / raw)
  To: teigland; +Cc: gfs2, aahringo

The return type of srcu_read_lock() is int and not bool. Whereas we
using the ret variable only to evaluate a bool type of
dlm_lowcomms_con_has_addr() to check if an address is already being set.

Fixes: 6f0b0b5d7ae7 ("fs: dlm: remove dlm_node_addrs lookup list")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/lowcomms.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index df40c3fd1070..d28141829c05 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -462,7 +462,8 @@ static bool dlm_lowcomms_con_has_addr(const struct connection *con,
 int dlm_lowcomms_addr(int nodeid, struct sockaddr_storage *addr)
 {
 	struct connection *con;
-	bool ret, idx;
+	bool ret;
+	int idx;
 
 	idx = srcu_read_lock(&connections_srcu);
 	con = nodeid2con(nodeid, GFP_NOFS);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v6.13-rc1 2/5] dlm: fix missing rsb put on scan timer
  2024-12-02 15:26 [PATCH v6.13-rc1 1/5] dlm: fix srcu_read_lock() return type to int Alexander Aring
@ 2024-12-02 15:26 ` Alexander Aring
  2024-12-02 15:26 ` [PATCH v6.13-rc1 3/5] dlm: add more sanity checks related " Alexander Aring
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Alexander Aring @ 2024-12-02 15:26 UTC (permalink / raw)
  To: teigland; +Cc: gfs2, aahringo

I figured out we don't cleanup all rsb on toss list right now if all
locks are unlocked in the cluster for a resource. After investigation
I figured out the condition to add_scan() is wrong and I added a comment
for the second important condition for all possible permuations of the
boolean expression.

The current expression:

"(r->res_master_nodeid != our_nodeid &&
  dlm_dir_nodeid(r) != our_nodeid))"

it only covers the case when we are not the master and not the dir node
which is the second case of the added comment in 2. Case 1 and 3 are
forgotten.

Remove also the del_scan() as it always should remove the rsb out of the
scan timer when it's on, if it's not on del_scan() will do nothing.

For find_rsb_nodir() the del_scan() need to be moved before clearing the
RSB_INACTIVE flag as del_scan() is has a WARN_ON() on it.

Fixes: c217adfc8caa ("dlm: fix add_scan and del_scan usage")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/lock.c | 52 +++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 38 insertions(+), 14 deletions(-)

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index fc1d710166e9..a8885cf82913 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -824,8 +824,11 @@ static int find_rsb_dir(struct dlm_ls *ls, const void *name, int len,
 		r->res_first_lkid = 0;
 	}
 
-	/* A dir record will not be on the scan list. */
-	if (r->res_dir_nodeid != our_nodeid)
+	/* A dir record will not be on the scan list.
+	 * Except if we are the master of the rsb.
+	 */
+	if (r->res_master_nodeid == our_nodeid ||
+	    r->res_dir_nodeid != our_nodeid)
 		del_scan(ls, r);
 	list_move(&r->res_slow_list, &ls->ls_slow_active);
 	rsb_clear_flag(r, RSB_INACTIVE);
@@ -989,10 +992,10 @@ static int find_rsb_nodir(struct dlm_ls *ls, const void *name, int len,
 		r->res_nodeid = 0;
 	}
 
+	del_scan(ls, r);
 	list_move(&r->res_slow_list, &ls->ls_slow_active);
 	rsb_clear_flag(r, RSB_INACTIVE);
 	kref_init(&r->res_ref);
-	del_scan(ls, r);
 	write_unlock_bh(&ls->ls_rsbtbl_lock);
 
 	goto out;
@@ -1337,9 +1340,13 @@ static int _dlm_master_lookup(struct dlm_ls *ls, int from_nodeid, const char *na
 	__dlm_master_lookup(ls, r, our_nodeid, from_nodeid, true, flags,
 			    r_nodeid, result);
 
-	/* A dir record rsb should never be on scan list. */
-	/* Try to fix this with del_scan? */
-	WARN_ON(!list_empty(&r->res_scan_list));
+	/* A dir record rsb should never be on scan list.
+	 * Except when we are the dir and master node.
+	 * This function should only be called by the dir
+	 * node.
+	 */
+	WARN_ON(!list_empty(&r->res_scan_list) &&
+		r->res_master_nodeid != our_nodeid);
 
 	write_unlock_bh(&ls->ls_rsbtbl_lock);
 
@@ -1431,15 +1438,32 @@ static void deactivate_rsb(struct kref *kref)
 
 	/*
 	 * When the rsb becomes unused:
-	 * - If it's not a dir record for a remote master rsb,
-	 *   then it is put on the scan list to be freed.
-	 * - If it's a dir record for a remote master rsb,
-	 *   then it is kept in the inactive state until
-	 *   receive_remove() from the master node.
+	 * - It's on scan list when there is no directory
+	 *   node functionality
+	 * or
+	 * - First case is if we are the rsb master we need to
+	 *   call send_remove() to the dir node and delete
+	 *   ourself.
+	 *
+	 *   The second case is if we are not the dir node we
+	 *   need to delete ourself as well but don't call
+	 *   send_remove() as we are not the master.
+	 *
+	 *   The thrid case when the node is master and
+	 *   dir node at the same time. Then the rsb need to be
+	 *   added to the scan timer as well because the scan
+	 *   acts like a local send_remove() that is done
+	 *   immediately.
+	 *
+	 *   The fourth case is the one we need to avoid here,
+	 *   when we are the directory node and not the master.
+	 *   This is the inverse of the condition below.
+	 *   Then we never should be on the scan timer and
+	 *   waiting of receive_remove() of the first case.
 	 */
-	if (!dlm_no_directory(ls) &&
-	    (r->res_master_nodeid != our_nodeid) &&
-	    (dlm_dir_nodeid(r) != our_nodeid))
+	if (dlm_no_directory(ls) ||
+	    (r->res_master_nodeid == our_nodeid ||
+	     dlm_dir_nodeid(r) != our_nodeid))
 		add_scan(ls, r);
 
 	if (r->res_lvbptr) {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v6.13-rc1 3/5] dlm: add more sanity checks related scan timer
  2024-12-02 15:26 [PATCH v6.13-rc1 1/5] dlm: fix srcu_read_lock() return type to int Alexander Aring
  2024-12-02 15:26 ` [PATCH v6.13-rc1 2/5] dlm: fix missing rsb put on scan timer Alexander Aring
@ 2024-12-02 15:26 ` Alexander Aring
  2024-12-02 15:26 ` [PATCH v6.13-rc1 4/5] dlm: make sense out of force values Alexander Aring
  2024-12-02 15:26 ` [PATCH v6.13-rc1 5/5] dlm: return -ENOENT if no comm was found Alexander Aring
  3 siblings, 0 replies; 5+ messages in thread
From: Alexander Aring @ 2024-12-02 15:26 UTC (permalink / raw)
  To: teigland; +Cc: gfs2, aahringo

This patch adds more sanity checks if an invalid rsb structure is on the
scan timer which should never be on the timer because it is waiting on a
remove messages instead of a timeout of being unused.

The function del_scan() should therefore never be called on certain rsbs
that puts the rsb on the scan list for the scan timer. In case of a
find_rsb_dir() rsbs that are not the master and the dir record should
never be on the scan list as well as they waiting for a remove message.
We just always check on a lookup if there might be other buggy places
that accidentially moved those rsbs on the scan list.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/lock.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index a8885cf82913..a834c270f24e 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -426,8 +426,8 @@ static void del_scan(struct dlm_ls *ls, struct dlm_rsb *r)
 	spin_lock_bh(&ls->ls_scan_lock);
 	r->res_toss_time = 0;
 
-	/* if the rsb is not queued do nothing */
-	if (list_empty(&r->res_scan_list))
+	/* if the rsb is not queued do nothing, should never be the case */
+	if (WARN_ON(list_empty(&r->res_scan_list)))
 		goto out;
 
 	/* get the first element before delete */
@@ -830,6 +830,8 @@ static int find_rsb_dir(struct dlm_ls *ls, const void *name, int len,
 	if (r->res_master_nodeid == our_nodeid ||
 	    r->res_dir_nodeid != our_nodeid)
 		del_scan(ls, r);
+	else
+		WARN_ON(!list_empty(&r->res_scan_list));
 	list_move(&r->res_slow_list, &ls->ls_slow_active);
 	rsb_clear_flag(r, RSB_INACTIVE);
 	kref_init(&r->res_ref); /* ref is now used in active state */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v6.13-rc1 4/5] dlm: make sense out of force values
  2024-12-02 15:26 [PATCH v6.13-rc1 1/5] dlm: fix srcu_read_lock() return type to int Alexander Aring
  2024-12-02 15:26 ` [PATCH v6.13-rc1 2/5] dlm: fix missing rsb put on scan timer Alexander Aring
  2024-12-02 15:26 ` [PATCH v6.13-rc1 3/5] dlm: add more sanity checks related " Alexander Aring
@ 2024-12-02 15:26 ` Alexander Aring
  2024-12-02 15:26 ` [PATCH v6.13-rc1 5/5] dlm: return -ENOENT if no comm was found Alexander Aring
  3 siblings, 0 replies; 5+ messages in thread
From: Alexander Aring @ 2024-12-02 15:26 UTC (permalink / raw)
  To: teigland; +Cc: gfs2, aahringo

Currently users e.g. gfs2 calling lockspace_busy() with 2 which seems
not to be specified but it is specified as the rest of force value
meanings besides 0 and 1 which means: do check nothing. That makes no
sense as we can never introduce any new force value to have a specific
new meaning for it.

We just switch now to have 2 as meaning as do check on nothing and the
other values will return -EINVAL if somebody calls
dlm_release_lockspace() with them. This might break API but it totally
makes no sense to not use the other values of the force parameter.

All kernel users, gfs2 and md-cluster uses 2 as force parameter which
has the meaning: do check on nothing and let recovery deal with it.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/lockspace.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c
index 8afac6e2dff0..1a216a8b6ca2 100644
--- a/fs/dlm/lockspace.c
+++ b/fs/dlm/lockspace.c
@@ -691,8 +691,10 @@ static int lockspace_busy(struct dlm_ls *ls, int force)
 				break;
 			}
 		}
-	} else {
+	} else if (force == 2) {
 		rv = 0;
+	} else {
+		rv = -EINVAL;
 	}
 	read_unlock_bh(&ls->ls_lkbxa_lock);
 	return rv;
@@ -703,6 +705,8 @@ static int release_lockspace(struct dlm_ls *ls, int force)
 	int busy, rv;
 
 	busy = lockspace_busy(ls, force);
+	if (busy < 0)
+		return busy;
 
 	spin_lock_bh(&lslist_lock);
 	if (ls->ls_create_count == 1) {
@@ -730,7 +734,7 @@ static int release_lockspace(struct dlm_ls *ls, int force)
 
 	dlm_device_deregister(ls);
 
-	if (force < 3 && dlm_user_daemon_available())
+	if (dlm_user_daemon_available())
 		do_uevent(ls, 0);
 
 	dlm_recoverd_stop(ls);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v6.13-rc1 5/5] dlm: return -ENOENT if no comm was found
  2024-12-02 15:26 [PATCH v6.13-rc1 1/5] dlm: fix srcu_read_lock() return type to int Alexander Aring
                   ` (2 preceding siblings ...)
  2024-12-02 15:26 ` [PATCH v6.13-rc1 4/5] dlm: make sense out of force values Alexander Aring
@ 2024-12-02 15:26 ` Alexander Aring
  3 siblings, 0 replies; 5+ messages in thread
From: Alexander Aring @ 2024-12-02 15:26 UTC (permalink / raw)
  To: teigland; +Cc: gfs2, aahringo

Currently if no comm can be found dlm_comm_seq() returns -EEXIST which
means entry already exists for a lookup it makes no sense to return
-EEXIST. We change it to -ENOENT. There is no user that will evaluate
the return value on a specific value so this should be fine.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/dlm/config.c b/fs/dlm/config.c
index b2f21aa00719..cf9ba6fd7a28 100644
--- a/fs/dlm/config.c
+++ b/fs/dlm/config.c
@@ -935,7 +935,7 @@ int dlm_comm_seq(int nodeid, uint32_t *seq, bool locked)
 		mutex_unlock(&clusters_root.subsys.su_mutex);
 	}
 	if (!cm)
-		return -EEXIST;
+		return -ENOENT;
 
 	*seq = cm->seq;
 	put_comm(cm);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-12-02 15:26 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-02 15:26 [PATCH v6.13-rc1 1/5] dlm: fix srcu_read_lock() return type to int Alexander Aring
2024-12-02 15:26 ` [PATCH v6.13-rc1 2/5] dlm: fix missing rsb put on scan timer Alexander Aring
2024-12-02 15:26 ` [PATCH v6.13-rc1 3/5] dlm: add more sanity checks related " Alexander Aring
2024-12-02 15:26 ` [PATCH v6.13-rc1 4/5] dlm: make sense out of force values Alexander Aring
2024-12-02 15:26 ` [PATCH v6.13-rc1 5/5] dlm: return -ENOENT if no comm was found Alexander Aring

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox