Re: [PATCH] md/md-cluster: handle REMOVE message earlier

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Su Yue <l@damenly.org>
To: Heming Zhao <heming.zhao@suse.com>
Cc: song@kernel.org,  yukuai1@huaweicloud.com,  glass.su@suse.com,
	linux-raid@vger.kernel.org
Subject: Re: [PATCH] md/md-cluster: handle REMOVE message earlier
Date: Mon, 28 Jul 2025 14:35:06 +0800	[thread overview]
Message-ID: <wm7sj011.fsf@damenly.org> (raw)
In-Reply-To: <20250728042145.9989-1-heming.zhao@suse.com> (Heming Zhao's message of "Mon, 28 Jul 2025 12:21:40 +0800")

On Mon 28 Jul 2025 at 12:21, Heming Zhao <heming.zhao@suse.com> 
wrote:

> Commit a1fd37f97808 ("md: Don't wait for MD_RECOVERY_NEEDED for
> HOT_REMOVE_DISK ioctl") introduced a regression in the 
> md_cluster
> module. (Failed cases 02r1_Manage_re-add & 02r10_Manage_re-add)
>
> Consider a 2-node cluster:
> - node1 set faulty & remove command on a disk.
> - node2 must correctly update the array metadata.
>
> Before a1fd37f97808, on node1, the delay between 
> msg:METADATA_UPDATED
> (triggered by faulty) and msg:REMOVE was sufficient for node2 to
> reload the disk info (written by node1).
> After a1fd37f97808, node1 no longer waits between faulty and 
> remove,
> causing it to send msg:REMOVE while node2 is still reloading 
> disk info.
> This often results in node2 failing to remove the faulty disk.
>
> == how to trigger ==
>
> set up a 2-node cluster (node1 & node2) with disks vdc & vdd.
>
> on node1:
> mdadm -CR /dev/md0 -l1 -b clustered -n2 /dev/vdc /dev/vdd 
> --assume-clean
> ssh node2-ip mdadm -A /dev/md0 /dev/vdc /dev/vdd
> mdadm --manage /dev/md0 --fail /dev/vdc --remove /dev/vdc
>
> check array status on both nodes with "mdadm -D /dev/md0".
> node1 output:
>     Number   Major   Minor   RaidDevice State
>        -       0        0        0      removed
>        1     254       48        1      active sync   /dev/vdd
> node2 output:
>     Number   Major   Minor   RaidDevice State
>        -       0        0        0      removed
>        1     254       48        1      active sync   /dev/vdd
>
>        0     254       32        -      faulty   /dev/vdc
>
> Fixes: a1fd37f97808 ("md: Don't wait for MD_RECOVERY_NEEDED for 
> HOT_REMOVE_DISK ioctl")
> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> ---
>  drivers/md/md.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 0f03b21e66e4..de9f9a345ed3 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -9758,8 +9758,8 @@ void md_check_recovery(struct mddev 
> *mddev)
>  			 * remove disk.
>  			 */
>  			rdev_for_each_safe(rdev, tmp, mddev) {
> -				if (test_and_clear_bit(ClusterRemove, 
> &rdev->flags) &&
> -						rdev->raid_disk < 0)
>
ClusterRemove and MD_RECOVERY_NEEDED are set in removed 
rdev->flags on node 2 by process_remove_disk().
After commit a1fd37f97808, node 2 could receive REMOVE in short 
time after receive of
METADATA_UPDATED, MD_RECOVERY_NEEDED will be cleared by 
md_check_recovery() but in that time
rdev->raid_disk could be unchanged. Then MD_RECOVERY_NEEDED will 
be cleared in the end of md_check_recovery()
no more chance to meet the condition.

Reviewed-by: Su Yue <glass.su@suse.com>

> +				if (rdev->raid_disk < 0 &&
> +				    test_and_clear_bit(ClusterRemove, 
> &rdev->flags))
>  					md_kick_rdev_from_array(rdev);
>  			}
>  		}
> @@ -10065,8 +10065,11 @@ static void check_sb_changes(struct 
> mddev *mddev, struct md_rdev *rdev)
>
>  	/* Check for change of roles in the active devices */
>  	rdev_for_each_safe(rdev2, tmp, mddev) {
> -		if (test_bit(Faulty, &rdev2->flags))
> +		if (test_bit(Faulty, &rdev2->flags)) {
> +			if (test_bit(ClusterRemove, &rdev2->flags))
> +				set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>  			continue;
> +		}
>
>  		/* Check if the roles changed */
>  		role = le16_to_cpu(sb->dev_roles[rdev2->desc_nr]);

next prev parent reply	other threads:[~2025-07-28  6:40 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-28  4:21 [PATCH] md/md-cluster: handle REMOVE message earlier Heming Zhao
2025-07-28  6:35 ` Su Yue [this message]
2025-07-30 17:56 ` Yu Kuai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=wm7sj011.fsf@damenly.org \
    --to=l@damenly.org \
    --cc=glass.su@suse.com \
    --cc=heming.zhao@suse.com \
    --cc=linux-raid@vger.kernel.org \
    --cc=song@kernel.org \
    --cc=yukuai1@huaweicloud.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.