[PATCH AUTOSEL 6.12 15/31] md: call del_gendisk in control path


From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Xiao Ni <xni@redhat.com>, Yu Kuai <yukuai3@huawei.com>,
	Sasha Levin <sashal@kernel.org>,
	song@kernel.org, linux-raid@vger.kernel.org
Subject: [PATCH AUTOSEL 6.12 15/31] md: call del_gendisk in control path
Date: Sun,  3 Aug 2025 17:19:18 -0400
Message-ID: <20250803211935.3547048-15-sashal@kernel.org>
In-Reply-To: <20250803211935.3547048-1-sashal@kernel.org>

From: Xiao Ni <xni@redhat.com>

[ Upstream commit 9e59d609763f70a992a8f3808dabcce60f14eb5c ]

Currently del_gendisk and put_disk are called asynchronously from
workqueue work. As a result, the device node can still exist for a short
window after the mdadm --stop command returns, so a udev rule can open
the device node and create the struct mddev in the kernel again. So call
del_gendisk in the control path, and still leave put_disk in
md_kobj_release to avoid a use-after-free of the gendisk.

del_gendisk can't be called with reconfig_mutex held. If it is, a
deadlock can happen: del_gendisk waits for all sysfs file accesses to
finish, and a sysfs file access may be waiting for reconfig_mutex. So
call del_gendisk after releasing reconfig_mutex.

But there is still a window between mddev_unlock and del_gendisk in which
sysfs can be accessed. So some actions (add disk, change level, etc.) can
happen and lead to unexpected results. MD_DELETED is used to resolve this
problem. MD_DELETED is set before releasing reconfig_mutex, and it is
checked by the sysfs accesses that need reconfig_mutex. For sysfs accesses
that don't need reconfig_mutex, del_gendisk will wait for them to finish.

But this check is not needed in mddev_lock_nointr. There are ten places
that call it.
* Five of them are in dm raid, which we don't need to care about.
MD_DELETED is only used for md raid.
* stop_sync_thread, md_do_sync and md_start_sync are related to sync
requests, and stopping an array has to wait for the sync thread to finish.
* md_ioctl: md_open is called before md_ioctl, so ->openers is already
raised and stopping the array will fail. So there is no need to check
MD_DELETED here.
* md_set_readonly:
It needs to call mddev_set_closing_and_sync_blockdev when setting readonly
or read_auto. So stopping the array will fail too, because MD_CLOSING is
already set.

Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Link: https://lore.kernel.org/linux-raid/20250611073108.25463-2-xni@redhat.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

**Backport Status: YES**

This commit should be backported to stable kernel trees. Here's my
extensive analysis:

## 1. **Critical Race Condition Fix**

The commit addresses a serious race condition between `mdadm --stop` and
udev that can lead to unexpected behavior:

```c
// Before: del_gendisk was called asynchronously in md_kobj_release
- del_gendisk(mddev->gendisk);
  put_disk(mddev->gendisk);

// After: del_gendisk moved to control path in mddev_unlock
+ if (test_bit(MD_DELETED, &mddev->flags))
+     del_gendisk(mddev->gendisk);
```

The race window allows udev to reopen the device node after `mdadm
--stop` returns but before `del_gendisk()` completes, creating a new
`struct mddev` in the kernel unexpectedly.

## 2. **Use-After-Free Prevention**

The commit splits the cleanup so that moving `del_gendisk()` does not
introduce a use-after-free (UAF) of the gendisk:
- `del_gendisk()` is called synchronously in the control path
- `put_disk()` remains in `md_kobj_release()` to avoid UAF
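
The split between the two calls can be illustrated with a small
user-space model. This is only a sketch under simplifying assumptions:
`struct model_disk` and the `model_*` helpers are hypothetical names, a
plain integer stands in for the gendisk reference count, and a boolean
stands in for the existence of the device node. It is not kernel code.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a gendisk; not the kernel structure. */
struct model_disk {
	int refs;	/* outstanding references */
	bool visible;	/* models whether the device node still exists */
};

static struct model_disk *model_disk_new(void)
{
	struct model_disk *d = calloc(1, sizeof(*d));

	if (d) {
		d->refs = 1;
		d->visible = true;
	}
	return d;
}

/* Models an opener such as a udev rule: works only while visible. */
static bool model_open(struct model_disk *d)
{
	if (!d->visible)
		return false;
	d->refs++;
	return true;
}

/* Models del_gendisk(): hides the disk but keeps the memory alive. */
static void model_del(struct model_disk *d)
{
	d->visible = false;
}

/* Models put_disk(): frees on the last reference; returns true if freed. */
static bool model_put(struct model_disk *d)
{
	if (--d->refs == 0) {
		free(d);
		return true;
	}
	return false;
}
```

In this model, calling `model_del()` as soon as the array is stopped
closes the window for new openers, while the memory stays valid until
the final `model_put()`.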

## 3. **Deadlock Avoidance**

The implementation carefully avoids deadlocks by calling `del_gendisk()`
after releasing `reconfig_mutex`:

```c
// In mddev_unlock():
+ /* Call del_gendisk after release reconfig_mutex to avoid
+  * deadlock (e.g. call del_gendisk under the lock and an
+  * access to sysfs files waits the lock)
+  */
+ if (test_bit(MD_DELETED, &mddev->flags))
+     del_gendisk(mddev->gendisk);
```
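
The ordering can be sketched in user space as follows. This is a
simplified single-threaded model, not kernel code: `struct sketch_array`
and the `sketch_*` helpers are hypothetical names, a C11 `atomic_flag`
stands in for `reconfig_mutex`, and `sketch_teardown()` stands in for
`del_gendisk()`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct mddev; not kernel code. */
struct sketch_array {
	atomic_flag reconfig_lock;	/* stands in for reconfig_mutex */
	bool lock_held;			/* demo bookkeeping only */
	bool deleted;			/* stands in for MD_DELETED */
	int teardown_calls;		/* times the teardown step ran */
	bool teardown_under_lock;	/* set if teardown ran with the lock held */
};

static void sketch_init(struct sketch_array *a)
{
	atomic_flag_clear(&a->reconfig_lock);
	a->lock_held = false;
	a->deleted = false;
	a->teardown_calls = 0;
	a->teardown_under_lock = false;
}

static void sketch_lock(struct sketch_array *a)
{
	while (atomic_flag_test_and_set(&a->reconfig_lock))
		;	/* spin until the lock is free */
	a->lock_held = true;
}

/* Stands in for del_gendisk(): records whether the lock was held. */
static void sketch_teardown(struct sketch_array *a)
{
	if (a->lock_held)
		a->teardown_under_lock = true;
	a->teardown_calls++;
}

/* Same ordering as the patched mddev_unlock(): release, then tear down. */
static void sketch_unlock(struct sketch_array *a)
{
	a->lock_held = false;
	atomic_flag_clear(&a->reconfig_lock);
	if (a->deleted)
		sketch_teardown(a);
}

/* Stands in for the stop path: mark deleted while holding the lock. */
static void sketch_stop(struct sketch_array *a)
{
	sketch_lock(a);
	a->deleted = true;
	sketch_unlock(a);
}
```

Because the teardown step runs only after the lock is released, a
handler that is blocked on the lock can still make progress while the
teardown waits for it.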

## 4. **MD_DELETED Flag Protection**

The commit reuses the existing `MD_DELETED` flag, which is now set in
`do_md_stop()` instead of `__mddev_put()`, and checks it when taking the
lock:

```c
// In mddev_lock():
+ if (!ret && test_bit(MD_DELETED, &mddev->flags)) {
+     ret = -ENODEV;
+     mutex_unlock(&mddev->reconfig_mutex);
+ }
```

This prevents sysfs operations from racing with array shutdown.
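
A user-space sketch of the lock-then-check pattern, under the same
caveats: `struct model_array` and the `model_*` helpers are hypothetical
names, a C11 `atomic_flag` stands in for `reconfig_mutex`, and
`model_store()` stands in for a sysfs store handler that needs the lock.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct mddev; not kernel code. */
struct model_array {
	atomic_flag reconfig_lock;	/* stands in for reconfig_mutex */
	atomic_bool deleted;		/* stands in for MD_DELETED */
};

static void model_init(struct model_array *a)
{
	atomic_flag_clear(&a->reconfig_lock);
	atomic_init(&a->deleted, false);
}

/* Returns 0 with the lock held, or -ENODEV with the lock released. */
static int model_lock(struct model_array *a)
{
	while (atomic_flag_test_and_set(&a->reconfig_lock))
		;	/* spin until the lock is free */
	if (atomic_load(&a->deleted)) {
		atomic_flag_clear(&a->reconfig_lock);
		return -ENODEV;
	}
	return 0;
}

static void model_unlock(struct model_array *a)
{
	atomic_flag_clear(&a->reconfig_lock);
}

/* Stands in for the stop path: set the flag while holding the lock. */
static void model_stop(struct model_array *a)
{
	while (atomic_flag_test_and_set(&a->reconfig_lock))
		;	/* spin until the lock is free */
	atomic_store(&a->deleted, true);
	atomic_flag_clear(&a->reconfig_lock);
}

/* Stands in for a sysfs store handler that needs the lock. */
static int model_store(struct model_array *a, int *level, int new_level)
{
	int ret = model_lock(a);

	if (ret)
		return ret;	/* array is being deleted: change nothing */
	*level = new_level;
	model_unlock(a);
	return 0;
}
```

Because the flag is set while the lock is held, any handler that takes
the lock afterwards is guaranteed to see it and back out without
touching the array.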

## 5. **Minimal and Contained Changes**

The fix is:
- Small (47 insertions and 12 deletions across two files)
- Confined to the md subsystem
- Does not introduce new features
- Does not make architectural changes
- Limited to the stop/cleanup ordering and the `MD_DELETED` check in the
  lock helpers

## 6. **Clear Bug Fix Criteria**

This meets the usual stable backport criteria:
- Fixes a real bug that affects users (race condition causing unexpected
  device recreation)
- Prevents sysfs actions such as adding a disk or changing the level from
  running on an array that is being stopped
- Has limited risk of regression (the changes are confined to the stop
  path and the reconfig lock helpers)
- Has been reviewed (carries a Reviewed-by tag)
- Fixes behavior that violates expected semantics (device should be gone
  after `mdadm --stop`)

## 7. **Related UAF History**

Recent commit history shows ongoing UAF issues in md:
- `8542870237c3 md: fix mddev uaf while iterating all_mddevs list`
- This indicates the subsystem has had similar issues that required
  fixes

The synchronous `del_gendisk()` approach is the correct fix for this
class of problems, making this an important stability improvement for
the md subsystem.

 drivers/md/md.c | 33 +++++++++++++++++++++++----------
 drivers/md/md.h | 26 ++++++++++++++++++++++++--
 2 files changed, 47 insertions(+), 12 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 7809b951e09a..d3a9b56dc0d1 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -619,9 +619,6 @@ static void __mddev_put(struct mddev *mddev)
 	    mddev->ctime || mddev->hold_active)
 		return;
 
-	/* Array is not configured at all, and not held active, so destroy it */
-	set_bit(MD_DELETED, &mddev->flags);
-
 	/*
 	 * Call queue_work inside the spinlock so that flush_workqueue() after
 	 * mddev_find will succeed in waiting for the work to be done.
@@ -856,6 +853,16 @@ void mddev_unlock(struct mddev *mddev)
 		kobject_del(&rdev->kobj);
 		export_rdev(rdev, mddev);
 	}
+
+	/* Call del_gendisk after release reconfig_mutex to avoid
+	 * deadlock (e.g. call del_gendisk under the lock and an
+	 * access to sysfs files waits the lock)
+	 * And MD_DELETED is only used for md raid which is set in
+	 * do_md_stop. dm raid only uses md_stop to stop. So dm raid
+	 * doesn't need to check MD_DELETED when getting reconfig lock
+	 */
+	if (test_bit(MD_DELETED, &mddev->flags))
+		del_gendisk(mddev->gendisk);
 }
 EXPORT_SYMBOL_GPL(mddev_unlock);
 
@@ -5720,19 +5727,30 @@ md_attr_store(struct kobject *kobj, struct attribute *attr,
 	struct md_sysfs_entry *entry = container_of(attr, struct md_sysfs_entry, attr);
 	struct mddev *mddev = container_of(kobj, struct mddev, kobj);
 	ssize_t rv;
+	struct kernfs_node *kn = NULL;
 
 	if (!entry->store)
 		return -EIO;
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
+
+	if (entry->store == array_state_store && cmd_match(page, "clear"))
+		kn = sysfs_break_active_protection(kobj, attr);
+
 	spin_lock(&all_mddevs_lock);
 	if (!mddev_get(mddev)) {
 		spin_unlock(&all_mddevs_lock);
+		if (kn)
+			sysfs_unbreak_active_protection(kn);
 		return -EBUSY;
 	}
 	spin_unlock(&all_mddevs_lock);
 	rv = entry->store(mddev, page, length);
 	mddev_put(mddev);
+
+	if (kn)
+		sysfs_unbreak_active_protection(kn);
+
 	return rv;
 }
 
@@ -5740,12 +5758,6 @@ static void md_kobj_release(struct kobject *ko)
 {
 	struct mddev *mddev = container_of(ko, struct mddev, kobj);
 
-	if (mddev->sysfs_state)
-		sysfs_put(mddev->sysfs_state);
-	if (mddev->sysfs_level)
-		sysfs_put(mddev->sysfs_level);
-
-	del_gendisk(mddev->gendisk);
 	put_disk(mddev->gendisk);
 }
 
@@ -6601,8 +6613,9 @@ static int do_md_stop(struct mddev *mddev, int mode)
 		mddev->bitmap_info.offset = 0;
 
 		export_array(mddev);
-
 		md_clean(mddev);
+		set_bit(MD_DELETED, &mddev->flags);
+
 		if (mddev->hold_active == UNTIL_STOP)
 			mddev->hold_active = 0;
 	}
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 8826dce9717d..cf06a9de6f19 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -669,11 +669,26 @@ static inline bool reshape_interrupted(struct mddev *mddev)
 
 static inline int __must_check mddev_lock(struct mddev *mddev)
 {
-	return mutex_lock_interruptible(&mddev->reconfig_mutex);
+	int ret;
+
+	ret = mutex_lock_interruptible(&mddev->reconfig_mutex);
+
+	/* MD_DELETED is set in do_md_stop with reconfig_mutex.
+	 * So check it here.
+	 */
+	if (!ret && test_bit(MD_DELETED, &mddev->flags)) {
+		ret = -ENODEV;
+		mutex_unlock(&mddev->reconfig_mutex);
+	}
+
+	return ret;
 }
 
 /* Sometimes we need to take the lock in a situation where
  * failure due to interrupts is not acceptable.
+ * It doesn't need to check MD_DELETED here, the owner which
+ * holds the lock here can't be stopped. And all paths can't
+ * call this function after do_md_stop.
  */
 static inline void mddev_lock_nointr(struct mddev *mddev)
 {
@@ -682,7 +697,14 @@ static inline void mddev_lock_nointr(struct mddev *mddev)
 
 static inline int mddev_trylock(struct mddev *mddev)
 {
-	return mutex_trylock(&mddev->reconfig_mutex);
+	int ret;
+
+	ret = mutex_trylock(&mddev->reconfig_mutex);
+	if (!ret && test_bit(MD_DELETED, &mddev->flags)) {
+		ret = -ENODEV;
+		mutex_unlock(&mddev->reconfig_mutex);
+	}
+	return ret;
 }
 extern void mddev_unlock(struct mddev *mddev);
 
-- 
2.39.5
