gfs2 filesystem and dlm development
 help / color / mirror / Atom feed
From: Alexander Aring <aahringo@redhat.com>
To: teigland@redhat.com
Cc: gfs2@lists.linux.dev, aahringo@redhat.com
Subject: [PATCH v6.13-rc1 2/5] dlm: fix missing rsb put on scan timer
Date: Mon,  2 Dec 2024 10:26:38 -0500	[thread overview]
Message-ID: <20241202152641.3395369-2-aahringo@redhat.com> (raw)
In-Reply-To: <20241202152641.3395369-1-aahringo@redhat.com>

I figured out we don't cleanup all rsb on toss list right now if all
locks are unlocked in the cluster for a resource. After investigation
I figured out the condition to add_scan() is wrong and I added a comment
for the second important condition for all possible permuations of the
boolean expression.

The current expression:

"(r->res_master_nodeid != our_nodeid &&
  dlm_dir_nodeid(r) != our_nodeid))"

it only covers the case when we are not the master and not the dir node
which is the second case of the added comment in 2. Case 1 and 3 are
forgotten.

Remove also the del_scan() as it always should remove the rsb out of the
scan timer when it's on, if it's not on del_scan() will do nothing.

For find_rsb_nodir() the del_scan() need to be moved before clearing the
RSB_INACTIVE flag as del_scan() is has a WARN_ON() on it.

Fixes: c217adfc8caa ("dlm: fix add_scan and del_scan usage")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/lock.c | 52 +++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 38 insertions(+), 14 deletions(-)

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index fc1d710166e9..a8885cf82913 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -824,8 +824,11 @@ static int find_rsb_dir(struct dlm_ls *ls, const void *name, int len,
 		r->res_first_lkid = 0;
 	}
 
-	/* A dir record will not be on the scan list. */
-	if (r->res_dir_nodeid != our_nodeid)
+	/* A dir record will not be on the scan list.
+	 * Except if we are the master of the rsb.
+	 */
+	if (r->res_master_nodeid == our_nodeid ||
+	    r->res_dir_nodeid != our_nodeid)
 		del_scan(ls, r);
 	list_move(&r->res_slow_list, &ls->ls_slow_active);
 	rsb_clear_flag(r, RSB_INACTIVE);
@@ -989,10 +992,10 @@ static int find_rsb_nodir(struct dlm_ls *ls, const void *name, int len,
 		r->res_nodeid = 0;
 	}
 
+	del_scan(ls, r);
 	list_move(&r->res_slow_list, &ls->ls_slow_active);
 	rsb_clear_flag(r, RSB_INACTIVE);
 	kref_init(&r->res_ref);
-	del_scan(ls, r);
 	write_unlock_bh(&ls->ls_rsbtbl_lock);
 
 	goto out;
@@ -1337,9 +1340,13 @@ static int _dlm_master_lookup(struct dlm_ls *ls, int from_nodeid, const char *na
 	__dlm_master_lookup(ls, r, our_nodeid, from_nodeid, true, flags,
 			    r_nodeid, result);
 
-	/* A dir record rsb should never be on scan list. */
-	/* Try to fix this with del_scan? */
-	WARN_ON(!list_empty(&r->res_scan_list));
+	/* A dir record rsb should never be on scan list.
+	 * Except when we are the dir and master node.
+	 * This function should only be called by the dir
+	 * node.
+	 */
+	WARN_ON(!list_empty(&r->res_scan_list) &&
+		r->res_master_nodeid != our_nodeid);
 
 	write_unlock_bh(&ls->ls_rsbtbl_lock);
 
@@ -1431,15 +1438,32 @@ static void deactivate_rsb(struct kref *kref)
 
 	/*
 	 * When the rsb becomes unused:
-	 * - If it's not a dir record for a remote master rsb,
-	 *   then it is put on the scan list to be freed.
-	 * - If it's a dir record for a remote master rsb,
-	 *   then it is kept in the inactive state until
-	 *   receive_remove() from the master node.
+	 * - It's on scan list when there is no directory
+	 *   node functionality
+	 * or
+	 * - First case is if we are the rsb master we need to
+	 *   call send_remove() to the dir node and delete
+	 *   ourself.
+	 *
+	 *   The second case is if we are not the dir node we
+	 *   need to delete ourself as well but don't call
+	 *   send_remove() as we are not the master.
+	 *
+	 *   The thrid case when the node is master and
+	 *   dir node at the same time. Then the rsb need to be
+	 *   added to the scan timer as well because the scan
+	 *   acts like a local send_remove() that is done
+	 *   immediately.
+	 *
+	 *   The fourth case is the one we need to avoid here,
+	 *   when we are the directory node and not the master.
+	 *   This is the inverse of the condition below.
+	 *   Then we never should be on the scan timer and
+	 *   waiting of receive_remove() of the first case.
 	 */
-	if (!dlm_no_directory(ls) &&
-	    (r->res_master_nodeid != our_nodeid) &&
-	    (dlm_dir_nodeid(r) != our_nodeid))
+	if (dlm_no_directory(ls) ||
+	    (r->res_master_nodeid == our_nodeid ||
+	     dlm_dir_nodeid(r) != our_nodeid))
 		add_scan(ls, r);
 
 	if (r->res_lvbptr) {
-- 
2.43.0


  reply	other threads:[~2024-12-02 15:26 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-02 15:26 [PATCH v6.13-rc1 1/5] dlm: fix srcu_read_lock() return type to int Alexander Aring
2024-12-02 15:26 ` Alexander Aring [this message]
2024-12-02 15:26 ` [PATCH v6.13-rc1 3/5] dlm: add more sanity checks related scan timer Alexander Aring
2024-12-02 15:26 ` [PATCH v6.13-rc1 4/5] dlm: make sense out of force values Alexander Aring
2024-12-02 15:26 ` [PATCH v6.13-rc1 5/5] dlm: return -ENOENT if no comm was found Alexander Aring

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241202152641.3395369-2-aahringo@redhat.com \
    --to=aahringo@redhat.com \
    --cc=gfs2@lists.linux.dev \
    --cc=teigland@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox