* [PATCH v6.13-rc1 2/5] dlm: fix missing rsb put on scan timer
2024-12-02 15:26 [PATCH v6.13-rc1 1/5] dlm: fix srcu_read_lock() return type to int Alexander Aring
@ 2024-12-02 15:26 ` Alexander Aring
2024-12-02 15:26 ` [PATCH v6.13-rc1 3/5] dlm: add more sanity checks related " Alexander Aring
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Alexander Aring @ 2024-12-02 15:26 UTC (permalink / raw)
To: teigland; +Cc: gfs2, aahringo
I figured out we don't cleanup all rsb on toss list right now if all
locks are unlocked in the cluster for a resource. After investigation
I figured out the condition to add_scan() is wrong and I added a comment
for the second important condition for all possible permuations of the
boolean expression.
The current expression:
"(r->res_master_nodeid != our_nodeid &&
dlm_dir_nodeid(r) != our_nodeid))"
it only covers the case when we are not the master and not the dir node
which is the second case of the added comment in 2. Case 1 and 3 are
forgotten.
Remove also the del_scan() as it always should remove the rsb out of the
scan timer when it's on, if it's not on del_scan() will do nothing.
For find_rsb_nodir() the del_scan() need to be moved before clearing the
RSB_INACTIVE flag as del_scan() is has a WARN_ON() on it.
Fixes: c217adfc8caa ("dlm: fix add_scan and del_scan usage")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/lock.c | 52 +++++++++++++++++++++++++++++++++++++--------------
1 file changed, 38 insertions(+), 14 deletions(-)
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index fc1d710166e9..a8885cf82913 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -824,8 +824,11 @@ static int find_rsb_dir(struct dlm_ls *ls, const void *name, int len,
r->res_first_lkid = 0;
}
- /* A dir record will not be on the scan list. */
- if (r->res_dir_nodeid != our_nodeid)
+ /* A dir record will not be on the scan list.
+ * Except if we are the master of the rsb.
+ */
+ if (r->res_master_nodeid == our_nodeid ||
+ r->res_dir_nodeid != our_nodeid)
del_scan(ls, r);
list_move(&r->res_slow_list, &ls->ls_slow_active);
rsb_clear_flag(r, RSB_INACTIVE);
@@ -989,10 +992,10 @@ static int find_rsb_nodir(struct dlm_ls *ls, const void *name, int len,
r->res_nodeid = 0;
}
+ del_scan(ls, r);
list_move(&r->res_slow_list, &ls->ls_slow_active);
rsb_clear_flag(r, RSB_INACTIVE);
kref_init(&r->res_ref);
- del_scan(ls, r);
write_unlock_bh(&ls->ls_rsbtbl_lock);
goto out;
@@ -1337,9 +1340,13 @@ static int _dlm_master_lookup(struct dlm_ls *ls, int from_nodeid, const char *na
__dlm_master_lookup(ls, r, our_nodeid, from_nodeid, true, flags,
r_nodeid, result);
- /* A dir record rsb should never be on scan list. */
- /* Try to fix this with del_scan? */
- WARN_ON(!list_empty(&r->res_scan_list));
+ /* A dir record rsb should never be on scan list.
+ * Except when we are the dir and master node.
+ * This function should only be called by the dir
+ * node.
+ */
+ WARN_ON(!list_empty(&r->res_scan_list) &&
+ r->res_master_nodeid != our_nodeid);
write_unlock_bh(&ls->ls_rsbtbl_lock);
@@ -1431,15 +1438,32 @@ static void deactivate_rsb(struct kref *kref)
/*
* When the rsb becomes unused:
- * - If it's not a dir record for a remote master rsb,
- * then it is put on the scan list to be freed.
- * - If it's a dir record for a remote master rsb,
- * then it is kept in the inactive state until
- * receive_remove() from the master node.
+ * - It's on scan list when there is no directory
+ * node functionality
+ * or
+ * - First case is if we are the rsb master we need to
+ * call send_remove() to the dir node and delete
+ * ourself.
+ *
+ * The second case is if we are not the dir node we
+ * need to delete ourself as well but don't call
+ * send_remove() as we are not the master.
+ *
+ * The thrid case when the node is master and
+ * dir node at the same time. Then the rsb need to be
+ * added to the scan timer as well because the scan
+ * acts like a local send_remove() that is done
+ * immediately.
+ *
+ * The fourth case is the one we need to avoid here,
+ * when we are the directory node and not the master.
+ * This is the inverse of the condition below.
+ * Then we never should be on the scan timer and
+ * waiting of receive_remove() of the first case.
*/
- if (!dlm_no_directory(ls) &&
- (r->res_master_nodeid != our_nodeid) &&
- (dlm_dir_nodeid(r) != our_nodeid))
+ if (dlm_no_directory(ls) ||
+ (r->res_master_nodeid == our_nodeid ||
+ dlm_dir_nodeid(r) != our_nodeid))
add_scan(ls, r);
if (r->res_lvbptr) {
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread* [PATCH v6.13-rc1 3/5] dlm: add more sanity checks related scan timer
2024-12-02 15:26 [PATCH v6.13-rc1 1/5] dlm: fix srcu_read_lock() return type to int Alexander Aring
2024-12-02 15:26 ` [PATCH v6.13-rc1 2/5] dlm: fix missing rsb put on scan timer Alexander Aring
@ 2024-12-02 15:26 ` Alexander Aring
2024-12-02 15:26 ` [PATCH v6.13-rc1 4/5] dlm: make sense out of force values Alexander Aring
2024-12-02 15:26 ` [PATCH v6.13-rc1 5/5] dlm: return -ENOENT if no comm was found Alexander Aring
3 siblings, 0 replies; 5+ messages in thread
From: Alexander Aring @ 2024-12-02 15:26 UTC (permalink / raw)
To: teigland; +Cc: gfs2, aahringo
This patch adds more sanity checks if an invalid rsb structure is on the
scan timer which should never be on the timer because it is waiting on a
remove messages instead of a timeout of being unused.
The function del_scan() should therefore never be called on certain rsbs
that puts the rsb on the scan list for the scan timer. In case of a
find_rsb_dir() rsbs that are not the master and the dir record should
never be on the scan list as well as they waiting for a remove message.
We just always check on a lookup if there might be other buggy places
that accidentially moved those rsbs on the scan list.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/lock.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index a8885cf82913..a834c270f24e 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -426,8 +426,8 @@ static void del_scan(struct dlm_ls *ls, struct dlm_rsb *r)
spin_lock_bh(&ls->ls_scan_lock);
r->res_toss_time = 0;
- /* if the rsb is not queued do nothing */
- if (list_empty(&r->res_scan_list))
+ /* if the rsb is not queued do nothing, should never be the case */
+ if (WARN_ON(list_empty(&r->res_scan_list)))
goto out;
/* get the first element before delete */
@@ -830,6 +830,8 @@ static int find_rsb_dir(struct dlm_ls *ls, const void *name, int len,
if (r->res_master_nodeid == our_nodeid ||
r->res_dir_nodeid != our_nodeid)
del_scan(ls, r);
+ else
+ WARN_ON(!list_empty(&r->res_scan_list));
list_move(&r->res_slow_list, &ls->ls_slow_active);
rsb_clear_flag(r, RSB_INACTIVE);
kref_init(&r->res_ref); /* ref is now used in active state */
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v6.13-rc1 4/5] dlm: make sense out of force values
2024-12-02 15:26 [PATCH v6.13-rc1 1/5] dlm: fix srcu_read_lock() return type to int Alexander Aring
2024-12-02 15:26 ` [PATCH v6.13-rc1 2/5] dlm: fix missing rsb put on scan timer Alexander Aring
2024-12-02 15:26 ` [PATCH v6.13-rc1 3/5] dlm: add more sanity checks related " Alexander Aring
@ 2024-12-02 15:26 ` Alexander Aring
2024-12-02 15:26 ` [PATCH v6.13-rc1 5/5] dlm: return -ENOENT if no comm was found Alexander Aring
3 siblings, 0 replies; 5+ messages in thread
From: Alexander Aring @ 2024-12-02 15:26 UTC (permalink / raw)
To: teigland; +Cc: gfs2, aahringo
Currently users e.g. gfs2 calling lockspace_busy() with 2 which seems
not to be specified but it is specified as the rest of force value
meanings besides 0 and 1 which means: do check nothing. That makes no
sense as we can never introduce any new force value to have a specific
new meaning for it.
We just switch now to have 2 as meaning as do check on nothing and the
other values will return -EINVAL if somebody calls
dlm_release_lockspace() with them. This might break API but it totally
makes no sense to not use the other values of the force parameter.
All kernel users, gfs2 and md-cluster uses 2 as force parameter which
has the meaning: do check on nothing and let recovery deal with it.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/lockspace.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c
index 8afac6e2dff0..1a216a8b6ca2 100644
--- a/fs/dlm/lockspace.c
+++ b/fs/dlm/lockspace.c
@@ -691,8 +691,10 @@ static int lockspace_busy(struct dlm_ls *ls, int force)
break;
}
}
- } else {
+ } else if (force == 2) {
rv = 0;
+ } else {
+ rv = -EINVAL;
}
read_unlock_bh(&ls->ls_lkbxa_lock);
return rv;
@@ -703,6 +705,8 @@ static int release_lockspace(struct dlm_ls *ls, int force)
int busy, rv;
busy = lockspace_busy(ls, force);
+ if (busy < 0)
+ return busy;
spin_lock_bh(&lslist_lock);
if (ls->ls_create_count == 1) {
@@ -730,7 +734,7 @@ static int release_lockspace(struct dlm_ls *ls, int force)
dlm_device_deregister(ls);
- if (force < 3 && dlm_user_daemon_available())
+ if (dlm_user_daemon_available())
do_uevent(ls, 0);
dlm_recoverd_stop(ls);
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread* [PATCH v6.13-rc1 5/5] dlm: return -ENOENT if no comm was found
2024-12-02 15:26 [PATCH v6.13-rc1 1/5] dlm: fix srcu_read_lock() return type to int Alexander Aring
` (2 preceding siblings ...)
2024-12-02 15:26 ` [PATCH v6.13-rc1 4/5] dlm: make sense out of force values Alexander Aring
@ 2024-12-02 15:26 ` Alexander Aring
3 siblings, 0 replies; 5+ messages in thread
From: Alexander Aring @ 2024-12-02 15:26 UTC (permalink / raw)
To: teigland; +Cc: gfs2, aahringo
Currently if no comm can be found dlm_comm_seq() returns -EEXIST which
means entry already exists for a lookup it makes no sense to return
-EEXIST. We change it to -ENOENT. There is no user that will evaluate
the return value on a specific value so this should be fine.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/config.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/dlm/config.c b/fs/dlm/config.c
index b2f21aa00719..cf9ba6fd7a28 100644
--- a/fs/dlm/config.c
+++ b/fs/dlm/config.c
@@ -935,7 +935,7 @@ int dlm_comm_seq(int nodeid, uint32_t *seq, bool locked)
mutex_unlock(&clusters_root.subsys.su_mutex);
}
if (!cm)
- return -EEXIST;
+ return -ENOENT;
*seq = cm->seq;
put_comm(cm);
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread