* [PATCH 3/3] md: remove/add redundancy group only in level change
2025-06-03 5:20 [PATCH V3 0/2] md: call del_gendisk in sync way Xiao Ni
@ 2025-06-03 5:20 ` Xiao Ni
0 siblings, 0 replies; 13+ messages in thread
From: Xiao Ni @ 2025-06-03 5:20 UTC (permalink / raw)
To: linux-raid; +Cc: yukuai3, song, ncroxon
del_gendisk is now called synchronously, so the redundancy group no
longer needs to be handled separately.
Signed-off-by: Xiao Ni <xni@redhat.com>
---
drivers/md/md.c | 29 ++++++++++-------------------
1 file changed, 10 insertions(+), 19 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0913b8236471..84cd21bd85b0 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -818,23 +818,16 @@ void mddev_unlock(struct mddev *mddev)
mddev->sysfs_active = 1;
mutex_unlock(&mddev->reconfig_mutex);
- if (mddev->kobj.sd) {
- if (to_remove != &md_redundancy_group)
- sysfs_remove_group(&mddev->kobj, to_remove);
- if (mddev->pers == NULL ||
- mddev->pers->sync_request == NULL) {
- sysfs_remove_group(&mddev->kobj, &md_redundancy_group);
- if (mddev->sysfs_action)
- sysfs_put(mddev->sysfs_action);
- if (mddev->sysfs_completed)
- sysfs_put(mddev->sysfs_completed);
- if (mddev->sysfs_degraded)
- sysfs_put(mddev->sysfs_degraded);
- mddev->sysfs_action = NULL;
- mddev->sysfs_completed = NULL;
- mddev->sysfs_degraded = NULL;
- }
- }
+ sysfs_remove_group(&mddev->kobj, to_remove);
+ if (mddev->sysfs_action)
+ sysfs_put(mddev->sysfs_action);
+ if (mddev->sysfs_completed)
+ sysfs_put(mddev->sysfs_completed);
+ if (mddev->sysfs_degraded)
+ sysfs_put(mddev->sysfs_degraded);
+ mddev->sysfs_action = NULL;
+ mddev->sysfs_completed = NULL;
+ mddev->sysfs_degraded = NULL;
mddev->sysfs_active = 0;
} else
mutex_unlock(&mddev->reconfig_mutex);
@@ -6475,8 +6468,6 @@ static void __md_stop(struct mddev *mddev)
if (mddev->private)
pers->free(mddev, mddev->private);
mddev->private = NULL;
- if (pers->sync_request && mddev->to_remove == NULL)
- mddev->to_remove = &md_redundancy_group;
put_pers(pers);
clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
--
2.32.0 (Apple Git-132)
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 3/3] md: remove/add redundancy group only in level change
2025-06-04 9:07 [PATCH V4 0/3] md: call del_gendisk in sync way Xiao Ni
@ 2025-06-04 9:07 ` Xiao Ni
0 siblings, 0 replies; 13+ messages in thread
From: Xiao Ni @ 2025-06-04 9:07 UTC (permalink / raw)
To: linux-raid; +Cc: yukuai3, ncroxon, song, yukuai1
del_gendisk is now called synchronously, so the stop path no longer needs
to handle the redundancy group separately.
Signed-off-by: Xiao Ni <xni@redhat.com>
---
drivers/md/md.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8824ce64eee2..e3bb8c725bca 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6528,8 +6528,6 @@ static void __md_stop(struct mddev *mddev)
if (mddev->private)
pers->free(mddev, mddev->private);
mddev->private = NULL;
- if (pers->sync_request && mddev->to_remove == NULL)
- mddev->to_remove = &md_redundancy_group;
put_pers(pers);
clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
--
2.32.0 (Apple Git-132)
* [PATCH 3/3] md: remove/add redundancy group only in level change
2025-06-10 7:20 [PATCH V5 " Xiao Ni
@ 2025-06-10 7:20 ` Xiao Ni
2025-06-11 6:17 ` Yu Kuai
0 siblings, 1 reply; 13+ messages in thread
From: Xiao Ni @ 2025-06-10 7:20 UTC (permalink / raw)
To: linux-raid; +Cc: yukuai3, ncroxon, song, yukuai1
del_gendisk is now called synchronously, so the stop path no longer needs
to handle the redundancy group separately.
Signed-off-by: Xiao Ni <xni@redhat.com>
---
drivers/md/md.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index dde3d2bfd34d..7ae91155f2e4 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6523,8 +6523,6 @@ static void __md_stop(struct mddev *mddev)
if (mddev->private)
pers->free(mddev, mddev->private);
mddev->private = NULL;
- if (pers->sync_request && mddev->to_remove == NULL)
- mddev->to_remove = &md_redundancy_group;
put_pers(pers);
clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
--
2.32.0 (Apple Git-132)
* Re: [PATCH 3/3] md: remove/add redundancy group only in level change
2025-06-10 7:20 ` [PATCH 3/3] md: remove/add redundancy group only in level change Xiao Ni
@ 2025-06-11 6:17 ` Yu Kuai
0 siblings, 0 replies; 13+ messages in thread
From: Yu Kuai @ 2025-06-11 6:17 UTC (permalink / raw)
To: Xiao Ni, linux-raid; +Cc: ncroxon, song, yukuai1, yukuai (C)
On 2025/06/10 15:20, Xiao Ni wrote:
> del_gendisk is called in synchronous way now. So it doesn't need to handle
> redundancy group in stop path separately.
>
> Signed-off-by: Xiao Ni <xni@redhat.com>
> ---
> drivers/md/md.c | 2 --
> 1 file changed, 2 deletions(-)
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
* [PATCH V6 0/3] md: call del_gendisk in sync way
@ 2025-06-11 7:31 Xiao Ni
2025-06-11 7:31 ` [PATCH 1/3] md: call del_gendisk in control path Xiao Ni
` (4 more replies)
0 siblings, 5 replies; 13+ messages in thread
From: Xiao Ni @ 2025-06-11 7:31 UTC (permalink / raw)
To: linux-raid; +Cc: yukuai3, ncroxon, song, yukuai1
Currently del_gendisk is called from queued work, which leaves a small
window in which the mdadm --stop command has exited but the device node
still exists. This causes trouble in regression tests. This patch set
tries to resolve the problem.
v1: replace MD_DELETED with MD_CLOSING
v2: keep MD_CLOSING
v3: call del_gendisk in mddev_unlock, remove ->to_remove in the stop path,
and adjust the order of the patches
v4: only remove the code in the stop path
v5: remove sysfs_remove in md_kobj_release and replace EBUSY with ENODEV
v6: don't initialize ret and add Reviewed-by tags
Xiao Ni (3):
md: call del_gendisk in control path
md: Don't clear MD_CLOSING until mddev is freed
md: remove/add redundancy group only in level change
drivers/md/md.c | 49 ++++++++++++++++++++++++++-----------------------
drivers/md/md.h | 26 ++++++++++++++++++++++++--
2 files changed, 50 insertions(+), 25 deletions(-)
--
2.32.0 (Apple Git-132)
* [PATCH 1/3] md: call del_gendisk in control path
2025-06-11 7:31 [PATCH V6 0/3] md: call del_gendisk in sync way Xiao Ni
@ 2025-06-11 7:31 ` Xiao Ni
2025-06-11 7:31 ` [PATCH 2/3] md: Don't clear MD_CLOSING until mddev is freed Xiao Ni
` (3 subsequent siblings)
4 siblings, 0 replies; 13+ messages in thread
From: Xiao Ni @ 2025-06-11 7:31 UTC (permalink / raw)
To: linux-raid; +Cc: yukuai3, ncroxon, song, yukuai1
Currently del_gendisk and put_disk are called asynchronously from
workqueue work. As a result, the device node can still exist for a short
window after the mdadm --stop command returns, so a udev rule can open the
device node and create the struct mddev in the kernel again. Move
del_gendisk into the control path, and leave put_disk in md_kobj_release
to avoid a use-after-free of the gendisk.
del_gendisk can't be called while holding reconfig_mutex, because that can
deadlock: del_gendisk waits for all sysfs file accesses to finish, while a
sysfs file access may be waiting for reconfig_mutex. So call del_gendisk
after releasing reconfig_mutex.
There is still a window between mddev_unlock and del_gendisk in which
sysfs can be accessed, so some actions (add disk, change level, etc.) can
happen and lead to unexpected results. MD_DELETED is used to resolve this
problem. MD_DELETED is set before releasing reconfig_mutex, and it is
checked by the sysfs accesses that need reconfig_mutex. For sysfs accesses
that don't need reconfig_mutex, del_gendisk waits for them to finish.
This check is not needed in mddev_lock_nointr. There are ten places that
call it:
* Five of them are in dm raid, which we don't need to care about.
MD_DELETED is only used for md raid.
* stop_sync_thread, md_do_sync and md_start_sync are related to sync
requests, and the sync thread must finish before an array is stopped.
* md_ioctl: md_open is called before md_ioctl, so ->openers is
incremented and stopping the array will fail. So MD_DELETED doesn't need
to be checked here.
* md_set_readonly: it needs to call mddev_set_closing_and_sync_blockdev
when setting readonly or read_auto, so stopping the array will fail too
because MD_CLOSING is already set.
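The ordering described above (set the flag under the lock, reject later
lockers, and tear the disk down only after the lock is dropped) can be
sketched in userspace C. This is only an illustration under stated
assumptions: struct toy_array and the toy_* helpers are hypothetical names,
a pthread mutex stands in for reconfig_mutex, and a plain bool stands in
for the gendisk; none of it is kernel code.

```c
#include <assert.h>	/* used by the usage example only */
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mddev; not the kernel structure. */
struct toy_array {
	pthread_mutex_t reconfig_mutex;
	bool deleted;		/* plays the role of MD_DELETED */
	bool disk_present;	/* plays the role of the gendisk/device node */
};

static void toy_init(struct toy_array *a)
{
	pthread_mutex_init(&a->reconfig_mutex, NULL);
	a->deleted = false;
	a->disk_present = true;
}

/* Take the lock, then back out if the array was stopped in the meantime. */
static int toy_lock(struct toy_array *a)
{
	int ret = pthread_mutex_lock(&a->reconfig_mutex);

	if (!ret && a->deleted) {
		pthread_mutex_unlock(&a->reconfig_mutex);
		ret = -ENODEV;
	}
	return ret;
}

/* Drop the lock first, then remove the disk outside the lock. */
static void toy_unlock(struct toy_array *a)
{
	bool deleted = a->deleted;	/* read while still holding the lock */

	pthread_mutex_unlock(&a->reconfig_mutex);
	if (deleted)
		a->disk_present = false;	/* stands in for del_gendisk() */
}

/* Stop path: mark the array deleted while holding the lock. */
static int toy_stop(struct toy_array *a)
{
	int ret = toy_lock(a);

	if (ret)
		return ret;
	a->deleted = true;
	toy_unlock(a);
	return 0;
}
```

With this sketch, a toy_stop() call leaves disk_present false by the time
it returns, and any later toy_lock() fails with -ENODEV instead of
operating on a stopped array.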
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
---
drivers/md/md.c | 33 +++++++++++++++++++++++----------
drivers/md/md.h | 26 ++++++++++++++++++++++++--
2 files changed, 47 insertions(+), 12 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0f03b21e66e4..7445e44eabff 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -636,9 +636,6 @@ static void __mddev_put(struct mddev *mddev)
mddev->ctime || mddev->hold_active)
return;
- /* Array is not configured at all, and not held active, so destroy it */
- set_bit(MD_DELETED, &mddev->flags);
-
/*
* Call queue_work inside the spinlock so that flush_workqueue() after
* mddev_find will succeed in waiting for the work to be done.
@@ -873,6 +870,16 @@ void mddev_unlock(struct mddev *mddev)
kobject_del(&rdev->kobj);
export_rdev(rdev, mddev);
}
+
+ /* Call del_gendisk after releasing reconfig_mutex to avoid a
+ * deadlock (e.g. del_gendisk is called under the lock while an
+ * access to a sysfs file waits for the lock).
+ * MD_DELETED is only used for md raid and is set in
+ * do_md_stop. dm raid only uses md_stop to stop, so dm raid
+ * doesn't need to check MD_DELETED when taking the reconfig lock.
+ */
+ if (test_bit(MD_DELETED, &mddev->flags))
+ del_gendisk(mddev->gendisk);
}
EXPORT_SYMBOL_GPL(mddev_unlock);
@@ -5774,19 +5781,30 @@ md_attr_store(struct kobject *kobj, struct attribute *attr,
struct md_sysfs_entry *entry = container_of(attr, struct md_sysfs_entry, attr);
struct mddev *mddev = container_of(kobj, struct mddev, kobj);
ssize_t rv;
+ struct kernfs_node *kn = NULL;
if (!entry->store)
return -EIO;
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
+
+ if (entry->store == array_state_store && cmd_match(page, "clear"))
+ kn = sysfs_break_active_protection(kobj, attr);
+
spin_lock(&all_mddevs_lock);
if (!mddev_get(mddev)) {
spin_unlock(&all_mddevs_lock);
+ if (kn)
+ sysfs_unbreak_active_protection(kn);
return -EBUSY;
}
spin_unlock(&all_mddevs_lock);
rv = entry->store(mddev, page, length);
mddev_put(mddev);
+
+ if (kn)
+ sysfs_unbreak_active_protection(kn);
+
return rv;
}
@@ -5794,12 +5812,6 @@ static void md_kobj_release(struct kobject *ko)
{
struct mddev *mddev = container_of(ko, struct mddev, kobj);
- if (mddev->sysfs_state)
- sysfs_put(mddev->sysfs_state);
- if (mddev->sysfs_level)
- sysfs_put(mddev->sysfs_level);
-
- del_gendisk(mddev->gendisk);
put_disk(mddev->gendisk);
}
@@ -6646,8 +6658,9 @@ static int do_md_stop(struct mddev *mddev, int mode)
mddev->bitmap_info.offset = 0;
export_array(mddev);
-
md_clean(mddev);
+ set_bit(MD_DELETED, &mddev->flags);
+
if (mddev->hold_active == UNTIL_STOP)
mddev->hold_active = 0;
}
diff --git a/drivers/md/md.h b/drivers/md/md.h
index d45a9e6ead80..67b365621507 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -700,11 +700,26 @@ static inline bool reshape_interrupted(struct mddev *mddev)
static inline int __must_check mddev_lock(struct mddev *mddev)
{
- return mutex_lock_interruptible(&mddev->reconfig_mutex);
+ int ret;
+
+ ret = mutex_lock_interruptible(&mddev->reconfig_mutex);
+
+ /* MD_DELETED is set in do_md_stop with reconfig_mutex.
+ * So check it here.
+ */
+ if (!ret && test_bit(MD_DELETED, &mddev->flags)) {
+ ret = -ENODEV;
+ mutex_unlock(&mddev->reconfig_mutex);
+ }
+
+ return ret;
}
/* Sometimes we need to take the lock in a situation where
* failure due to interrupts is not acceptable.
+ * It doesn't need to check MD_DELETED here, the owner which
+ * holds the lock here can't be stopped. And all paths can't
+ * call this function after do_md_stop.
*/
static inline void mddev_lock_nointr(struct mddev *mddev)
{
@@ -713,7 +728,14 @@ static inline void mddev_lock_nointr(struct mddev *mddev)
static inline int mddev_trylock(struct mddev *mddev)
{
- return mutex_trylock(&mddev->reconfig_mutex);
+ int ret;
+
+ ret = mutex_trylock(&mddev->reconfig_mutex);
+ if (ret && test_bit(MD_DELETED, &mddev->flags)) {
+ ret = 0;
+ mutex_unlock(&mddev->reconfig_mutex);
+ }
+ return ret;
}
extern void mddev_unlock(struct mddev *mddev);
--
2.32.0 (Apple Git-132)
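A note on return conventions: mutex_trylock() returns 1 when the lock was
acquired and 0 on contention, which is the opposite sense of the
0-on-success result of mutex_lock_interruptible(). A trylock wrapper that
rejects a stopped array therefore has to test the acquired case, and
callers that treat the result as a boolean expect 0 to mean "not locked".
A minimal userspace sketch of that convention follows; struct toy_dev and
toy_trylock are hypothetical names, and pthreads stand in for the kernel
mutex.

```c
#include <assert.h>	/* used by the usage example only */
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mddev; not the kernel structure. */
struct toy_dev {
	pthread_mutex_t reconfig_mutex;
	bool deleted;	/* plays the role of MD_DELETED */
};

static void toy_dev_init(struct toy_dev *d)
{
	pthread_mutex_init(&d->reconfig_mutex, NULL);
	d->deleted = false;
}

/*
 * Returns 1 if the lock is now held by the caller, 0 otherwise.
 * pthread_mutex_trylock() returns 0 on success, so convert it to the
 * kernel-style "1 means acquired" convention first.
 */
static int toy_trylock(struct toy_dev *d)
{
	int acquired = pthread_mutex_trylock(&d->reconfig_mutex) == 0;

	/* Only a lock we actually hold may be released again. */
	if (acquired && d->deleted) {
		pthread_mutex_unlock(&d->reconfig_mutex);
		acquired = 0;
	}
	return acquired;
}
```

In this sketch a stopped device looks the same to the caller as a
contended lock: toy_trylock() returns 0 and the mutex is left unlocked.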
* [PATCH 2/3] md: Don't clear MD_CLOSING until mddev is freed
2025-06-11 7:31 [PATCH V6 0/3] md: call del_gendisk in sync way Xiao Ni
2025-06-11 7:31 ` [PATCH 1/3] md: call del_gendisk in control path Xiao Ni
@ 2025-06-11 7:31 ` Xiao Ni
2025-06-11 7:31 ` [PATCH 3/3] md: remove/add redundancy group only in level change Xiao Ni
` (2 subsequent siblings)
4 siblings, 0 replies; 13+ messages in thread
From: Xiao Ni @ 2025-06-11 7:31 UTC (permalink / raw)
To: linux-raid; +Cc: yukuai3, ncroxon, song, yukuai1
UNTIL_STOP is used to prevent mddev from being freed on the last close
before disks are added to it. It should be cleared when an array is
stopped, as mentioned in commit efeb53c0e572 ("md: Allow md devices to be
created by name."). So reset ->hold_active to 0 in md_clean.
MD_CLOSING should be kept until mddev is freed, to avoid a reopen.
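The flag handling in md_clean amounts to masking the flags word with a
single bit, so every flag is cleared except the one being kept. A small
sketch of that masking, assuming BIT_ULL_MASK(nr) yields a mask with only
bit nr set; the toy_* names and bit numbers below are hypothetical and are
not the real values from drivers/md/md.h.

```c
#include <assert.h>	/* used by the usage example only */
#include <stdint.h>

/* Hypothetical bit numbers, for illustration only. */
enum toy_flag {
	TOY_CLOSING = 0,	/* plays the role of MD_CLOSING */
	TOY_OTHER_A = 1,
	TOY_OTHER_B = 2,
};

/* Mask with only bit nr set. */
#define TOY_BIT(nr)	(UINT64_C(1) << (nr))

/* Clear every flag except TOY_CLOSING, preserving its current value. */
static uint64_t toy_clean_flags(uint64_t flags)
{
	return flags & TOY_BIT(TOY_CLOSING);
}
```

If TOY_CLOSING was set before the call it stays set, and if it was clear it
stays clear; all other bits end up as zero either way.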
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
---
drivers/md/md.c | 16 ++++------------
1 file changed, 4 insertions(+), 12 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 7445e44eabff..dde3d2bfd34d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6425,15 +6425,10 @@ static void md_clean(struct mddev *mddev)
mddev->persistent = 0;
mddev->level = LEVEL_NONE;
mddev->clevel[0] = 0;
- /*
- * Don't clear MD_CLOSING, or mddev can be opened again.
- * 'hold_active != 0' means mddev is still in the creation
- * process and will be used later.
- */
- if (mddev->hold_active)
- mddev->flags = 0;
- else
- mddev->flags &= BIT_ULL_MASK(MD_CLOSING);
+ /* if UNTIL_STOP is set, it's cleared here */
+ mddev->hold_active = 0;
+ /* Don't clear MD_CLOSING, or mddev can be opened again. */
+ mddev->flags &= BIT_ULL_MASK(MD_CLOSING);
mddev->sb_flags = 0;
mddev->ro = MD_RDWR;
mddev->metadata_type[0] = 0;
@@ -6660,9 +6655,6 @@ static int do_md_stop(struct mddev *mddev, int mode)
export_array(mddev);
md_clean(mddev);
set_bit(MD_DELETED, &mddev->flags);
-
- if (mddev->hold_active == UNTIL_STOP)
- mddev->hold_active = 0;
}
md_new_event();
sysfs_notify_dirent_safe(mddev->sysfs_state);
--
2.32.0 (Apple Git-132)
* [PATCH 3/3] md: remove/add redundancy group only in level change
2025-06-11 7:31 [PATCH V6 0/3] md: call del_gendisk in sync way Xiao Ni
2025-06-11 7:31 ` [PATCH 1/3] md: call del_gendisk in control path Xiao Ni
2025-06-11 7:31 ` [PATCH 2/3] md: Don't clear MD_CLOSING until mddev is freed Xiao Ni
@ 2025-06-11 7:31 ` Xiao Ni
2025-06-14 6:46 ` [PATCH V6 0/3] md: call del_gendisk in sync way Yu Kuai
2025-06-14 8:18 ` Yu Kuai
4 siblings, 0 replies; 13+ messages in thread
From: Xiao Ni @ 2025-06-11 7:31 UTC (permalink / raw)
To: linux-raid; +Cc: yukuai3, ncroxon, song, yukuai1
del_gendisk is now called synchronously, so the stop path no longer needs
to handle the redundancy group separately.
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
---
drivers/md/md.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index dde3d2bfd34d..7ae91155f2e4 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6523,8 +6523,6 @@ static void __md_stop(struct mddev *mddev)
if (mddev->private)
pers->free(mddev, mddev->private);
mddev->private = NULL;
- if (pers->sync_request && mddev->to_remove == NULL)
- mddev->to_remove = &md_redundancy_group;
put_pers(pers);
clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
--
2.32.0 (Apple Git-132)
* Re: [PATCH V6 0/3] md: call del_gendisk in sync way
2025-06-11 7:31 [PATCH V6 0/3] md: call del_gendisk in sync way Xiao Ni
` (2 preceding siblings ...)
2025-06-11 7:31 ` [PATCH 3/3] md: remove/add redundancy group only in level change Xiao Ni
@ 2025-06-14 6:46 ` Yu Kuai
2025-06-14 8:18 ` Yu Kuai
4 siblings, 0 replies; 13+ messages in thread
From: Yu Kuai @ 2025-06-14 6:46 UTC (permalink / raw)
To: Xiao Ni, linux-raid; +Cc: ncroxon, song, yukuai1, yukuai (C)
On 2025/06/11 15:31, Xiao Ni wrote:
> Now del_gendisk is called in a queue work which has a small window
> that mdadm --stop command exits but the device node still exists.
> It causes trouble in regression tests. This patch set tries to resolve
> this problem.
>
> v1: replace MD_DELETED with MD_CLOSING
> v2: keep MD_CLOSING
> v3: call del_gendisk in mddev_unlock, and remove ->to_remove in stop path
> and adjust the order of patches
> v4: only remove the codes in stop path.
> v5: remove sysfs_remove in md_kobj_release and change EBUSY with ENODEV
> v6: don't initialize ret and add reviewed-by tag
>
> Xiao Ni (3):
> md: call del_gendisk in control path
> md: Don't clear MD_CLOSING until mddev is freed
> md: remove/add redundancy group only in level change
>
> drivers/md/md.c | 49 ++++++++++++++++++++++++++-----------------------
> drivers/md/md.h | 26 ++++++++++++++++++++++++--
> 2 files changed, 50 insertions(+), 25 deletions(-)
>
Applied to md-6.16
Thanks,
Kuai
* Re: [PATCH V6 0/3] md: call del_gendisk in sync way
2025-06-11 7:31 [PATCH V6 0/3] md: call del_gendisk in sync way Xiao Ni
` (3 preceding siblings ...)
2025-06-14 6:46 ` [PATCH V6 0/3] md: call del_gendisk in sync way Yu Kuai
@ 2025-06-14 8:18 ` Yu Kuai
2025-06-15 3:21 ` Xiao Ni
4 siblings, 1 reply; 13+ messages in thread
From: Yu Kuai @ 2025-06-14 8:18 UTC (permalink / raw)
To: Xiao Ni, linux-raid; +Cc: ncroxon, song, yukuai1, yukuai (C)
Hi,
On 2025/06/11 15:31, Xiao Ni wrote:
> Now del_gendisk is called in a queue work which has a small window
> that mdadm --stop command exits but the device node still exists.
> It causes trouble in regression tests. This patch set tries to resolve
> this problem.
>
> v1: replace MD_DELETED with MD_CLOSING
> v2: keep MD_CLOSING
> v3: call del_gendisk in mddev_unlock, and remove ->to_remove in stop path
> and adjust the order of patches
> v4: only remove the codes in stop path.
> v5: remove sysfs_remove in md_kobj_release and change EBUSY with ENODEV
> v6: don't initialize ret and add reviewed-by tag
>
> Xiao Ni (3):
> md: call del_gendisk in control path
> md: Don't clear MD_CLOSING until mddev is freed
> md: remove/add redundancy group only in level change
>
> drivers/md/md.c | 49 ++++++++++++++++++++++++++-----------------------
> drivers/md/md.h | 26 ++++++++++++++++++++++++--
> 2 files changed, 50 insertions(+), 25 deletions(-)
>
I just ran the mdadm tests with loop devices in my VM and found that this
set causes many tests to fail. The first is 02r5grow:
++ /usr/sbin/mdadm -A /dev/md0 /dev/loop1 /dev/loop2 /dev/loop3
++ rv=1
++ case $* in
++ cat /var/tmp/stderr
mdadm: Unable to initialize sysfs
++ return 1
++ check state UUU
++ case $1 in
++ grep -sq 'blocks.*\[UUU\]$' /proc/mdstat
++ die 'state UUU not found!'
++ echo -e '\n\tERROR: state UUU not found! \n'
ERROR: state UUU not found!
++ save_log fail
I have not looked into the details yet.
Thanks
* Re: [PATCH V6 0/3] md: call del_gendisk in sync way
2025-06-14 8:18 ` Yu Kuai
@ 2025-06-15 3:21 ` Xiao Ni
2025-06-16 1:17 ` Yu Kuai
0 siblings, 1 reply; 13+ messages in thread
From: Xiao Ni @ 2025-06-15 3:21 UTC (permalink / raw)
To: Yu Kuai; +Cc: linux-raid, ncroxon, song, yukuai (C)
On Sat, Jun 14, 2025 at 4:18 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2025/06/11 15:31, Xiao Ni wrote:
> > Now del_gendisk is called in a queue work which has a small window
> > that mdadm --stop command exits but the device node still exists.
> > It causes trouble in regression tests. This patch set tries to resolve
> > this problem.
> >
> > v1: replace MD_DELETED with MD_CLOSING
> > v2: keep MD_CLOSING
> > v3: call del_gendisk in mddev_unlock, and remove ->to_remove in stop path
> > and adjust the order of patches
> > v4: only remove the codes in stop path.
> > v5: remove sysfs_remove in md_kobj_release and change EBUSY with ENODEV
> > v6: don't initialize ret and add reviewed-by tag
> >
> > Xiao Ni (3):
> > md: call del_gendisk in control path
> > md: Don't clear MD_CLOSING until mddev is freed
> > md: remove/add redundancy group only in level change
> >
> > drivers/md/md.c | 49 ++++++++++++++++++++++++++-----------------------
> > drivers/md/md.h | 26 ++++++++++++++++++++++++--
> > 2 files changed, 50 insertions(+), 25 deletions(-)
> >
>
> Just running mdadm tests with loop dev in my VM, and found this set can
> cause many tests to fail, the first is 02r5grow:
>
> ++ /usr/sbin/mdadm -A /dev/md0 /dev/loop1 /dev/loop2 /dev/loop3
> ++ rv=1
> ++ case $* in
> ++ cat /var/tmp/stderr
> mdadm: Unable to initialize sysfs
> ++ return 1
> ++ check state UUU
> ++ case $1 in
> ++ grep -sq 'blocks.*\[UUU\]$' /proc/mdstat
> ++ die 'state UUU not found!'
> ++ echo -e '\n\tERROR: state UUU not found! \n'
>
> ERROR: state UUU not found!
>
> ++ save_log fail
>
> I do not look into details yet.
> Thanks
>
Hi Kuai
You need to use the latest upstream mdadm code
https://github.com/md-raid-utilities/mdadm/
Regards
Xiao
* Re: [PATCH V6 0/3] md: call del_gendisk in sync way
2025-06-15 3:21 ` Xiao Ni
@ 2025-06-16 1:17 ` Yu Kuai
2025-06-16 1:43 ` Xiao Ni
0 siblings, 1 reply; 13+ messages in thread
From: Yu Kuai @ 2025-06-16 1:17 UTC (permalink / raw)
To: Xiao Ni, Yu Kuai; +Cc: linux-raid, ncroxon, song, yukuai (C)
Hi,
On 2025/06/15 11:21, Xiao Ni wrote:
> On Sat, Jun 14, 2025 at 4:18 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> On 2025/06/11 15:31, Xiao Ni wrote:
>>> Now del_gendisk is called in a queue work which has a small window
>>> that mdadm --stop command exits but the device node still exists.
>>> It causes trouble in regression tests. This patch set tries to resolve
>>> this problem.
>>>
>>> v1: replace MD_DELETED with MD_CLOSING
>>> v2: keep MD_CLOSING
>>> v3: call del_gendisk in mddev_unlock, and remove ->to_remove in stop path
>>> and adjust the order of patches
>>> v4: only remove the codes in stop path.
>>> v5: remove sysfs_remove in md_kobj_release and change EBUSY with ENODEV
>>> v6: don't initialize ret and add reviewed-by tag
>>>
>>> Xiao Ni (3):
>>> md: call del_gendisk in control path
>>> md: Don't clear MD_CLOSING until mddev is freed
>>> md: remove/add redundancy group only in level change
>>>
>>> drivers/md/md.c | 49 ++++++++++++++++++++++++++-----------------------
>>> drivers/md/md.h | 26 ++++++++++++++++++++++++--
>>> 2 files changed, 50 insertions(+), 25 deletions(-)
>>>
>>
>> Just running mdadm tests with loop dev in my VM, and found this set can
>> cause many tests to fail, the first is 02r5grow:
>>
>> ++ /usr/sbin/mdadm -A /dev/md0 /dev/loop1 /dev/loop2 /dev/loop3
>> ++ rv=1
>> ++ case $* in
>> ++ cat /var/tmp/stderr
>> mdadm: Unable to initialize sysfs
>> ++ return 1
>> ++ check state UUU
>> ++ case $1 in
>> ++ grep -sq 'blocks.*\[UUU\]$' /proc/mdstat
>> ++ die 'state UUU not found!'
>> ++ echo -e '\n\tERROR: state UUU not found! \n'
>>
>> ERROR: state UUU not found!
>>
>> ++ save_log fail
>>
>> I do not look into details yet.
>> Thanks
>>
>
> Hi Kuai
>
> You need to use the latest upstream mdadm code
> https://github.com/md-raid-utilities/mdadm/
>
I use the repo from:
https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
With the latest commit:
8da27191 ("mdadm: enable sync file for udev rules")
Is mdadm not updated there?
I'll run the tests soon. BTW, why do the tests pass at the above commit
before this set is applied?
Thanks,
Kuai
* Re: [PATCH V6 0/3] md: call del_gendisk in sync way
2025-06-16 1:17 ` Yu Kuai
@ 2025-06-16 1:43 ` Xiao Ni
0 siblings, 0 replies; 13+ messages in thread
From: Xiao Ni @ 2025-06-16 1:43 UTC (permalink / raw)
To: Yu Kuai; +Cc: linux-raid, ncroxon, song, yukuai (C)
On Mon, Jun 16, 2025 at 9:18 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2025/06/15 11:21, Xiao Ni wrote:
> > On Sat, Jun 14, 2025 at 4:18 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> Hi,
> >>
> >> On 2025/06/11 15:31, Xiao Ni wrote:
> >>> Now del_gendisk is called in a queue work which has a small window
> >>> that mdadm --stop command exits but the device node still exists.
> >>> It causes trouble in regression tests. This patch set tries to resolve
> >>> this problem.
> >>>
> >>> v1: replace MD_DELETED with MD_CLOSING
> >>> v2: keep MD_CLOSING
> >>> v3: call del_gendisk in mddev_unlock, and remove ->to_remove in stop path
> >>> and adjust the order of patches
> >>> v4: only remove the codes in stop path.
> >>> v5: remove sysfs_remove in md_kobj_release and change EBUSY with ENODEV
> >>> v6: don't initialize ret and add reviewed-by tag
> >>>
> >>> Xiao Ni (3):
> >>> md: call del_gendisk in control path
> >>> md: Don't clear MD_CLOSING until mddev is freed
> >>> md: remove/add redundancy group only in level change
> >>>
> >>> drivers/md/md.c | 49 ++++++++++++++++++++++++++-----------------------
> >>> drivers/md/md.h | 26 ++++++++++++++++++++++++--
> >>> 2 files changed, 50 insertions(+), 25 deletions(-)
> >>>
> >>
> >> Just running mdadm tests with loop dev in my VM, and found this set can
> >> cause many tests to fail, the first is 02r5grow:
> >>
> >> ++ /usr/sbin/mdadm -A /dev/md0 /dev/loop1 /dev/loop2 /dev/loop3
> >> ++ rv=1
> >> ++ case $* in
> >> ++ cat /var/tmp/stderr
> >> mdadm: Unable to initialize sysfs
> >> ++ return 1
> >> ++ check state UUU
> >> ++ case $1 in
> >> ++ grep -sq 'blocks.*\[UUU\]$' /proc/mdstat
> >> ++ die 'state UUU not found!'
> >> ++ echo -e '\n\tERROR: state UUU not found! \n'
> >>
> >> ERROR: state UUU not found!
> >>
> >> ++ save_log fail
> >>
> >> I do not look into details yet.
> >> Thanks
> >>
> >
> > Hi Kuai
> >
> > You need to use the latest upstream mdadm code
> > https://github.com/md-raid-utilities/mdadm/
> >
>
> I use the repo from:
> https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
>
> With the latest commit:
> 8da27191 ("mdadm: enable sync file for udev rules")
>
> Do we not update mdadm here?
>
> I'll run the test soon, and BTW, why in the above commit, test can
> pass before this set?
Hi Kuai
That repo doesn't have the patches that were submitted to GitHub
recently. I don't have permission to sync from GitHub to
https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
https://github.com/md-raid-utilities/mdadm/commit/ea4cdaea1a553685444a3fb39aae6b2cfee387ef
fixes the problem that this patch set can introduce.
Regards
Xiao
>
> Thanks,
> Kuai
>