From: Christoph Hellwig <hch@infradead.org>
To: Zhao Heming <heming.zhao@suse.com>
Cc: linux-raid@vger.kernel.org, song@kernel.org,
guoqing.jiang@cloud.ionos.com, lidong.zhong@suse.com,
xni@redhat.com, neilb@suse.de, colyli@suse.com
Subject: Re: [PATCH v2] md: don't create mddev in md_open
Date: Thu, 1 Apr 2021 07:17:22 +0100 [thread overview]
Message-ID: <20210401061722.GA1355765@infradead.org> (raw)
In-Reply-To: <1617240896-15343-1-git-send-email-heming.zhao@suse.com>
[-- Attachment #1: Type: text/plain, Size: 533 bytes --]
On Thu, Apr 01, 2021 at 09:34:56AM +0800, Zhao Heming wrote:
> commit d3374825ce57 ("md: make devices disappear when they are no longer
> needed.") introduced protection between mddev creating & removing. The
> md_open shouldn't create mddev when all_mddevs list doesn't contain
> mddev. With currently code logic, there will be very easy to trigger
> soft lockup in non-preempt env.
As mention below, please don't make this even more of a mess than it
needs to. We can just pick the two patches doing this from the series
I sent:
[-- Attachment #2: 0001-md-factor-out-a-mddev_find_locked-helper-from-mddev_.patch --]
[-- Type: text/plain, Size: 1816 bytes --]
From 86dfff895d62495120d8d61ef2bc5d48db89fe20 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Wed, 3 Mar 2021 15:04:15 +0100
Subject: md: factor out a mddev_find_locked helper from mddev_find
Factor out a self-contained helper to just lookup a mddev by the dev_t
"unit".
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/md.c | 32 +++++++++++++++++++-------------
1 file changed, 19 insertions(+), 13 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 368cad6cd53a6e..5692427e78ba37 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -734,6 +734,17 @@ void mddev_init(struct mddev *mddev)
}
EXPORT_SYMBOL_GPL(mddev_init);
+static struct mddev *mddev_find_locked(dev_t unit)
+{
+ struct mddev *mddev;
+
+ list_for_each_entry(mddev, &all_mddevs, all_mddevs)
+ if (mddev->unit == unit)
+ return mddev;
+
+ return NULL;
+}
+
static struct mddev *mddev_find(dev_t unit)
{
struct mddev *mddev, *new = NULL;
@@ -745,13 +756,13 @@ static struct mddev *mddev_find(dev_t unit)
spin_lock(&all_mddevs_lock);
if (unit) {
- list_for_each_entry(mddev, &all_mddevs, all_mddevs)
- if (mddev->unit == unit) {
- mddev_get(mddev);
- spin_unlock(&all_mddevs_lock);
- kfree(new);
- return mddev;
- }
+ mddev = mddev_find_locked(unit);
+ if (mddev) {
+ mddev_get(mddev);
+ spin_unlock(&all_mddevs_lock);
+ kfree(new);
+ return mddev;
+ }
if (new) {
list_add(&new->all_mddevs, &all_mddevs);
@@ -777,12 +788,7 @@ static struct mddev *mddev_find(dev_t unit)
return NULL;
}
- is_free = 1;
- list_for_each_entry(mddev, &all_mddevs, all_mddevs)
- if (mddev->unit == dev) {
- is_free = 0;
- break;
- }
+ is_free = !mddev_find_locked(dev);
}
new->unit = dev;
new->md_minor = MINOR(dev);
--
2.30.1
[-- Attachment #3: 0002-md-split-mddev_find.patch --]
[-- Type: text/plain, Size: 6044 bytes --]
From f24462e5166808c4fd724f6b09233096d6ac1e56 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Wed, 3 Mar 2021 15:16:53 +0100
Subject: md: split mddev_find
Split mddev_find into a simple mddev_find that just finds an existing
mddev by the unit number, and a more complicated mddev_find that deals
with find or allocating a mddev.
This turns out to fix this bug reported by
Zhao Heming <heming.zhao@suse.com>:
------------------------------ snip ------------------------------
commit d3374825ce57 ("md: make devices disappear when they are no longer
needed.") introduced protection between mddev creating & removing. The
md_open shouldn't create mddev when all_mddevs list doesn't contain
mddev. With currently code logic, there will be very easy to trigger
soft lockup in non-preempt env.
*** env ***
kvm-qemu VM 2C1G with 2 iscsi luns
kernel should be non-preempt
*** script ***
about trigger 1 time with 10 tests
```
1 node1="15sp3-mdcluster1"
2 node2="15sp3-mdcluster2"
3
4 mdadm -Ss
5 ssh ${node2} "mdadm -Ss"
6 wipefs -a /dev/sda /dev/sdb
7 mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \
/dev/sdb --assume-clean
8
9 for i in {1..100}; do
10 echo ==== $i ====;
11
12 echo "test ...."
13 ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb"
14 sleep 1
15
16 echo "clean ....."
17 ssh ${node2} "mdadm -Ss"
18 done
```
I use mdcluster env to trigger soft lockup, but it isn't mdcluster
speical bug. To stop md array in mdcluster env will do more jobs than
non-cluster array, which will leave enough time/gap to allow kernel to
run md_open.
*** stack ***
```
ID: 2831 TASK: ffff8dd7223b5040 CPU: 0 COMMAND: "mdadm"
#0 [ffffa15d00a13b90] __schedule at ffffffffb8f1935f
#1 [ffffa15d00a13ba8] exact_lock at ffffffffb8a4a66d
#2 [ffffa15d00a13bb0] kobj_lookup at ffffffffb8c62fe3
#3 [ffffa15d00a13c28] __blkdev_get at ffffffffb89273b9
#4 [ffffa15d00a13c98] blkdev_get at ffffffffb8927964
#5 [ffffa15d00a13cb0] do_dentry_open at ffffffffb88dc4b4
#6 [ffffa15d00a13ce0] path_openat at ffffffffb88f0ccc
#7 [ffffa15d00a13db8] do_filp_open at ffffffffb88f32bb
#8 [ffffa15d00a13ee0] do_sys_open at ffffffffb88ddc7d
#9 [ffffa15d00a13f38] do_syscall_64 at ffffffffb86053cb ffffffffb900008c
or:
[ 884.226509] mddev_put+0x1c/0xe0 [md_mod]
[ 884.226515] md_open+0x3c/0xe0 [md_mod]
[ 884.226518] __blkdev_get+0x30d/0x710
[ 884.226520] ? bd_acquire+0xd0/0xd0
[ 884.226522] blkdev_get+0x14/0x30
[ 884.226524] do_dentry_open+0x204/0x3a0
[ 884.226531] path_openat+0x2fc/0x1520
[ 884.226534] ? seq_printf+0x4e/0x70
[ 884.226536] do_filp_open+0x9b/0x110
[ 884.226542] ? md_release+0x20/0x20 [md_mod]
[ 884.226543] ? seq_read+0x1d8/0x3e0
[ 884.226545] ? kmem_cache_alloc+0x18a/0x270
[ 884.226547] ? do_sys_open+0x1bd/0x260
[ 884.226548] do_sys_open+0x1bd/0x260
[ 884.226551] do_syscall_64+0x5b/0x1e0
[ 884.226554] entry_SYSCALL_64_after_hwframe+0x44/0xa9
```
*** rootcause ***
"mdadm -A" (or other array assemble commands) will start a daemon "mdadm
--monitor" by default. When "mdadm -Ss" is running, the stop action will
wakeup "mdadm --monitor". The "--monitor" daemon will immediately get
info from /proc/mdstat. This time mddev in kernel still exist, so
/proc/mdstat still show md device, which makes "mdadm --monitor" to open
/dev/md0.
The previously "mdadm -Ss" is removing action, the "mdadm --monitor"
open action will trigger md_open which is creating action. Racing is
happening.
```
<thread 1>: "mdadm -Ss"
md_release
mddev_put deletes mddev from all_mddevs
queue_work for mddev_delayed_delete
at this time, "/dev/md0" is still available for opening
<thread 2>: "mdadm --monitor ..."
md_open
+ mddev_find can't find mddev of /dev/md0, and create a new mddev and
| return.
+ trigger "if (mddev->gendisk != bdev->bd_disk)" and return
-ERESTARTSYS.
```
In non-preempt kernel, <thread 2> is occupying on current CPU. and
mddev_delayed_delete which was created in <thread 1> also can't be
schedule.
In preempt kernel, it can also trigger above racing. But kernel doesn't
allow one thread running on a CPU all the time. after <thread 2> running
some time, the later "mdadm -A" (refer above script line 13) will call
md_alloc to alloc a new gendisk for mddev. it will break md_open
statement "if (mddev->gendisk != bdev->bd_disk)" and return 0 to caller,
the soft lockup is broken.
------------------------------ snip ------------------------------
Fixes: d3374825ce57 ("md: make devices disappear when they are no longer needed.")
Reported-by: Zhao Heming <heming.zhao@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/md.c | 24 +++++++++++++++++++-----
1 file changed, 19 insertions(+), 5 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5692427e78ba37..08d24177bbd4a4 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -746,6 +746,22 @@ static struct mddev *mddev_find_locked(dev_t unit)
}
static struct mddev *mddev_find(dev_t unit)
+{
+ struct mddev *mddev;
+
+ if (MAJOR(unit) != MD_MAJOR)
+ unit &= ~((1 << MdpMinorShift) - 1);
+
+ spin_lock(&all_mddevs_lock);
+ mddev = mddev_find_locked(unit);
+ if (mddev)
+ mddev_get(mddev);
+ spin_unlock(&all_mddevs_lock);
+
+ return mddev;
+}
+
+static struct mddev *mddev_find_or_alloc(dev_t unit)
{
struct mddev *mddev, *new = NULL;
@@ -5650,7 +5666,7 @@ static int md_alloc(dev_t dev, char *name)
* writing to /sys/module/md_mod/parameters/new_array.
*/
static DEFINE_MUTEX(disks_mutex);
- struct mddev *mddev = mddev_find(dev);
+ struct mddev *mddev = mddev_find_or_alloc(dev);
struct gendisk *disk;
int partitioned;
int shift;
@@ -6530,11 +6546,9 @@ static void autorun_devices(int part)
md_probe(dev);
mddev = mddev_find(dev);
- if (!mddev || !mddev->gendisk) {
- if (mddev)
- mddev_put(mddev);
+ if (!mddev)
break;
- }
+
if (mddev_lock(mddev))
pr_warn("md: %s locked, cannot run\n", mdname(mddev));
else if (mddev->raid_disks || mddev->major_version
--
2.30.1
next prev parent reply other threads:[~2021-04-01 6:18 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-04-01 1:34 [PATCH v2] md: don't create mddev in md_open Zhao Heming
2021-04-01 6:17 ` Christoph Hellwig [this message]
2021-04-01 13:01 ` heming.zhao
2021-04-02 5:58 ` Song Liu
2021-04-02 10:48 ` Christoph Hellwig
[not found] ` <b2288ab4-1da0-612d-8988-637cc33db99a@suse.com>
2021-04-02 16:01 ` Song Liu
2021-04-03 2:45 ` heming.zhao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210401061722.GA1355765@infradead.org \
--to=hch@infradead.org \
--cc=colyli@suse.com \
--cc=guoqing.jiang@cloud.ionos.com \
--cc=heming.zhao@suse.com \
--cc=lidong.zhong@suse.com \
--cc=linux-raid@vger.kernel.org \
--cc=neilb@suse.de \
--cc=song@kernel.org \
--cc=xni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox