[PATCH AUTOSEL 6.6 10/23] md: call del

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH AUTOSEL 6.6 10/23] md: call del_gendisk in control path
       [not found] <20250803212031.3547641-1-sashal@kernel.org>
@ 2025-08-03 21:20 ` Sasha Levin
  2025-08-03 21:20 ` [PATCH AUTOSEL 6.6 19/23] md: Don't clear MD_CLOSING until mddev is freed Sasha Levin
  1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2025-08-03 21:20 UTC (permalink / raw)
  To: patches, stable; +Cc: Xiao Ni, Yu Kuai, Sasha Levin, song, linux-raid

From: Xiao Ni <xni@redhat.com>

[ Upstream commit 9e59d609763f70a992a8f3808dabcce60f14eb5c ]

Now del_gendisk and put_disk are called asynchronously in workqueue work.
The asynchronous way has a problem that the device node can still exist
after mdadm --stop command returns in a short window. So udev rule can
open this device node and create the struct mddev in kernel again. So put
del_gendisk in control path and still leave put_disk in md_kobj_release
to avoid uaf of gendisk.

Function del_gendisk can't be called with reconfig_mutex. If it's called
with reconfig mutex, a deadlock can happen. del_gendisk waits all sysfs
files access to finish and sysfs file access waits reconfig mutex. So
put del_gendisk after releasing reconfig mutex.

But there is still a window that sysfs can be accessed between mddev_unlock
and del_gendisk. So some actions (add disk, change level, .e.g) can happen
which lead unexpected results. MD_DELETED is used to resolve this problem.
MD_DELETED is set before releasing reconfig mutex and it should be checked
for these sysfs access which need reconfig mutex. For sysfs access which
don't need reconfig mutex, del_gendisk will wait them to finish.

But it doesn't need to do this in function mddev_lock_nointr. There are
ten places that call it.
* Five of them are in dm raid which we don't need to care. MD_DELETED is
only used for md raid.
* stop_sync_thread, md_do_sync and md_start_sync are related sync request,
and it needs to wait sync thread to finish before stopping an array.
* md_ioctl: md_open is called before md_ioctl, so ->openers is added. It
will fail to stop the array. So it doesn't need to check MD_DELETED here
* md_set_readonly:
It needs to call mddev_set_closing_and_sync_blockdev when setting readonly
or read_auto. So it will fail to stop the array too because MD_CLOSING is
already set.

Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Link: https://lore.kernel.org/linux-raid/20250611073108.25463-2-xni@redhat.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

**Backport Status: YES**

This commit should be backported to stable kernel trees. Here's my
extensive analysis:

## 1. **Critical Race Condition Fix**

The commit addresses a serious race condition between `mdadm --stop` and
udev that can lead to unexpected behavior:

```c
// Before: del_gendisk was called asynchronously in md_kobj_release
- del_gendisk(mddev->gendisk);
  put_disk(mddev->gendisk);

// After: del_gendisk moved to control path in mddev_unlock
+ if (test_bit(MD_DELETED, &mddev->flags))
+     del_gendisk(mddev->gendisk);
```

The race window allows udev to reopen the device node after `mdadm
--stop` returns but before `del_gendisk()` completes, creating a new
`struct mddev` in the kernel unexpectedly.

## 2. **Use-After-Free Prevention**

The commit prevents a potential use-after-free (UAF) of gendisk by
properly sequencing the cleanup:
- `del_gendisk()` is called synchronously in the control path
- `put_disk()` remains in `md_kobj_release()` to avoid UAF

## 3. **Deadlock Avoidance**

The implementation carefully avoids deadlocks by calling `del_gendisk()`
after releasing `reconfig_mutex`:

```c
// In mddev_unlock():
+ /* Call del_gendisk after release reconfig_mutex to avoid
+  * deadlock (e.g. call del_gendisk under the lock and an
+  * access to sysfs files waits the lock)
+  */
+ if (test_bit(MD_DELETED, &mddev->flags))
+     del_gendisk(mddev->gendisk);
```

## 4. **MD_DELETED Flag Protection**

The commit introduces proper synchronization using the `MD_DELETED`
flag:

```c
// In mddev_lock():
+ if (!ret && test_bit(MD_DELETED, &mddev->flags)) {
+     ret = -ENODEV;
+     mutex_unlock(&mddev->reconfig_mutex);
+ }
```

This prevents sysfs operations from racing with array shutdown.

## 5. **Minimal and Contained Changes**

The fix is:
- Relatively small (~100 lines)
- Confined to the md subsystem
- Does not introduce new features
- Does not make architectural changes
- Only modifies the cleanup path timing

## 6. **Clear Bug Fix Criteria**

This meets all stable backport criteria:
- Fixes a real bug that affects users (race condition causing unexpected
  device recreation)
- Prevents potential data corruption/loss scenarios
- Has minimal risk of regression (only affects shutdown path)
- Is well-tested (has Reviewed-by tag)
- Fixes behavior that violates expected semantics (device should be gone
  after `mdadm --stop`)

## 7. **Related UAF History**

Recent commit history shows ongoing UAF issues in md:
- `8542870237c3 md: fix mddev uaf while iterating all_mddevs list`
- This indicates the subsystem has had similar issues that required
  fixes

The synchronous `del_gendisk()` approach is the correct fix for this
class of problems, making this an important stability improvement for
the md subsystem.

 drivers/md/md.c | 33 +++++++++++++++++++++++----------
 drivers/md/md.h | 26 ++++++++++++++++++++++++--
 2 files changed, 47 insertions(+), 12 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index ca7ae3aad265..2758c3ee2f56 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -639,9 +639,6 @@ static void __mddev_put(struct mddev *mddev)
 	    mddev->ctime || mddev->hold_active)
 		return;
 
-	/* Array is not configured at all, and not held active, so destroy it */
-	set_bit(MD_DELETED, &mddev->flags);
-
 	/*
 	 * Call queue_work inside the spinlock so that flush_workqueue() after
 	 * mddev_find will succeed in waiting for the work to be done.
@@ -837,6 +834,16 @@ void mddev_unlock(struct mddev *mddev)
 		kobject_del(&rdev->kobj);
 		export_rdev(rdev, mddev);
 	}
+
+	/* Call del_gendisk after release reconfig_mutex to avoid
+	 * deadlock (e.g. call del_gendisk under the lock and an
+	 * access to sysfs files waits the lock)
+	 * And MD_DELETED is only used for md raid which is set in
+	 * do_md_stop. dm raid only uses md_stop to stop. So dm raid
+	 * doesn't need to check MD_DELETED when getting reconfig lock
+	 */
+	if (test_bit(MD_DELETED, &mddev->flags))
+		del_gendisk(mddev->gendisk);
 }
 EXPORT_SYMBOL_GPL(mddev_unlock);
 
@@ -5616,19 +5623,30 @@ md_attr_store(struct kobject *kobj, struct attribute *attr,
 	struct md_sysfs_entry *entry = container_of(attr, struct md_sysfs_entry, attr);
 	struct mddev *mddev = container_of(kobj, struct mddev, kobj);
 	ssize_t rv;
+	struct kernfs_node *kn = NULL;
 
 	if (!entry->store)
 		return -EIO;
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
+
+	if (entry->store == array_state_store && cmd_match(page, "clear"))
+		kn = sysfs_break_active_protection(kobj, attr);
+
 	spin_lock(&all_mddevs_lock);
 	if (!mddev_get(mddev)) {
 		spin_unlock(&all_mddevs_lock);
+		if (kn)
+			sysfs_unbreak_active_protection(kn);
 		return -EBUSY;
 	}
 	spin_unlock(&all_mddevs_lock);
 	rv = entry->store(mddev, page, length);
 	mddev_put(mddev);
+
+	if (kn)
+		sysfs_unbreak_active_protection(kn);
+
 	return rv;
 }
 
@@ -5636,12 +5654,6 @@ static void md_kobj_release(struct kobject *ko)
 {
 	struct mddev *mddev = container_of(ko, struct mddev, kobj);
 
-	if (mddev->sysfs_state)
-		sysfs_put(mddev->sysfs_state);
-	if (mddev->sysfs_level)
-		sysfs_put(mddev->sysfs_level);
-
-	del_gendisk(mddev->gendisk);
 	put_disk(mddev->gendisk);
 }
 
@@ -6531,8 +6543,9 @@ static int do_md_stop(struct mddev *mddev, int mode,
 		mddev->bitmap_info.offset = 0;
 
 		export_array(mddev);
-
 		md_clean(mddev);
+		set_bit(MD_DELETED, &mddev->flags);
+
 		if (mddev->hold_active == UNTIL_STOP)
 			mddev->hold_active = 0;
 	}
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 46995558d3bd..0a7c9122db50 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -589,11 +589,26 @@ static inline bool is_md_suspended(struct mddev *mddev)
 
 static inline int __must_check mddev_lock(struct mddev *mddev)
 {
-	return mutex_lock_interruptible(&mddev->reconfig_mutex);
+	int ret;
+
+	ret = mutex_lock_interruptible(&mddev->reconfig_mutex);
+
+	/* MD_DELETED is set in do_md_stop with reconfig_mutex.
+	 * So check it here.
+	 */
+	if (!ret && test_bit(MD_DELETED, &mddev->flags)) {
+		ret = -ENODEV;
+		mutex_unlock(&mddev->reconfig_mutex);
+	}
+
+	return ret;
 }
 
 /* Sometimes we need to take the lock in a situation where
  * failure due to interrupts is not acceptable.
+ * It doesn't need to check MD_DELETED here, the owner which
+ * holds the lock here can't be stopped. And all paths can't
+ * call this function after do_md_stop.
  */
 static inline void mddev_lock_nointr(struct mddev *mddev)
 {
@@ -602,7 +617,14 @@ static inline void mddev_lock_nointr(struct mddev *mddev)
 
 static inline int mddev_trylock(struct mddev *mddev)
 {
-	return mutex_trylock(&mddev->reconfig_mutex);
+	int ret;
+
+	ret = mutex_trylock(&mddev->reconfig_mutex);
+	if (!ret && test_bit(MD_DELETED, &mddev->flags)) {
+		ret = -ENODEV;
+		mutex_unlock(&mddev->reconfig_mutex);
+	}
+	return ret;
 }
 extern void mddev_unlock(struct mddev *mddev);
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* [PATCH AUTOSEL 6.6 19/23] md: Don't clear MD_CLOSING until mddev is freed
       [not found] <20250803212031.3547641-1-sashal@kernel.org>
  2025-08-03 21:20 ` [PATCH AUTOSEL 6.6 10/23] md: call del_gendisk in control path Sasha Levin
@ 2025-08-03 21:20 ` Sasha Levin
  1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2025-08-03 21:20 UTC (permalink / raw)
  To: patches, stable; +Cc: Xiao Ni, Yu Kuai, Sasha Levin, song, linux-raid

From: Xiao Ni <xni@redhat.com>

[ Upstream commit 5f286f33553d600e6c2fb5a23dd6afcf99b3ebac ]

UNTIL_STOP is used to avoid mddev is freed on the last close before adding
disks to mddev. And it should be cleared when stopping an array which is
mentioned in commit efeb53c0e572 ("md: Allow md devices to be created by
name."). So reset ->hold_active to 0 in md_clean.

And MD_CLOSING should be kept until mddev is freed to avoid reopen.

Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Link: https://lore.kernel.org/linux-raid/20250611073108.25463-3-xni@redhat.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

**Backport Status: YES**

This commit should be backported to stable kernel trees. Here's my
detailed analysis:

## Bug Fix Analysis

This commit fixes a race condition in the MD (Multiple Device) subsystem
where the MD_CLOSING flag could be cleared prematurely, allowing an
mddev to be reopened while it's in the process of being freed. This is a
clear use-after-free vulnerability.

## Key Issues Fixed

1. **Race condition with MD_CLOSING flag**: The old code would
   conditionally clear MD_CLOSING based on whether `hold_active` was
   set:
  ```c
  if (mddev->hold_active)
  mddev->flags = 0;  // This clears MD_CLOSING!
  else
  mddev->flags &= BIT_ULL_MASK(MD_CLOSING);
  ```

2. **Improper hold_active handling**: The commit properly moves the
   clearing of `hold_active` from `do_md_stop()` into `md_clean()`,
   making the code cleaner and more predictable.

## Impact on Users

- **Security Impact**: This prevents a potential use-after-free
  vulnerability where an mddev could be reopened while being freed.
- **Stability Impact**: Prevents crashes or undefined behavior when
  racing between mddev close/free and reopen operations.
- **Data Integrity**: Could prevent corruption scenarios where
  operations are performed on a partially freed mddev structure.

## Code Changes Analysis

The changes are minimal and focused:
1. Removes the conditional logic that could clear MD_CLOSING
   inappropriately
2. Always preserves MD_CLOSING flag in `md_clean()` with: `mddev->flags
   &= BIT_ULL_MASK(MD_CLOSING)`
3. Moves `hold_active = 0` assignment from `do_md_stop()` to
   `md_clean()` for better code organization
4. Updates comments to clarify the behavior

## Stable Tree Criteria

✓ **Fixes a real bug**: Yes - race condition leading to potential use-
after-free
✓ **Small and contained**: Yes - only 15 lines changed in a single
function
✓ **No new features**: Correct - only fixes existing behavior
✓ **Minimal regression risk**: Yes - preserves existing semantics while
fixing the race
✓ **Important enough**: Yes - security/stability issue in core storage
subsystem

The commit message explicitly references commit efeb53c0e572 which
introduced the UNTIL_STOP mechanism, showing this is fixing a long-
standing issue in the MD subsystem's lifecycle management.

 drivers/md/md.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2758c3ee2f56..60b49f7c74c2 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6265,15 +6265,10 @@ static void md_clean(struct mddev *mddev)
 	mddev->persistent = 0;
 	mddev->level = LEVEL_NONE;
 	mddev->clevel[0] = 0;
-	/*
-	 * Don't clear MD_CLOSING, or mddev can be opened again.
-	 * 'hold_active != 0' means mddev is still in the creation
-	 * process and will be used later.
-	 */
-	if (mddev->hold_active)
-		mddev->flags = 0;
-	else
-		mddev->flags &= BIT_ULL_MASK(MD_CLOSING);
+	/* if UNTIL_STOP is set, it's cleared here */
+	mddev->hold_active = 0;
+	/* Don't clear MD_CLOSING, or mddev can be opened again. */
+	mddev->flags &= BIT_ULL_MASK(MD_CLOSING);
 	mddev->sb_flags = 0;
 	mddev->ro = MD_RDWR;
 	mddev->metadata_type[0] = 0;
@@ -6545,9 +6540,6 @@ static int do_md_stop(struct mddev *mddev, int mode,
 		export_array(mddev);
 		md_clean(mddev);
 		set_bit(MD_DELETED, &mddev->flags);
-
-		if (mddev->hold_active == UNTIL_STOP)
-			mddev->hold_active = 0;
 	}
 	md_new_event();
 	sysfs_notify_dirent_safe(mddev->sysfs_state);
-- 
2.39.5

^ permalink raw reply related	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2025-08-03 21:21 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20250803212031.3547641-1-sashal@kernel.org>
2025-08-03 21:20 ` [PATCH AUTOSEL 6.6 10/23] md: call del_gendisk in control path Sasha Levin
2025-08-03 21:20 ` [PATCH AUTOSEL 6.6 19/23] md: Don't clear MD_CLOSING until mddev is freed Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).