From: Logan Gunthorpe <logang@deltatee.com>
To: Song Liu <songliubraving@fb.com>, Jens Axboe <axboe@kernel.dk>,
linux-raid <linux-raid@vger.kernel.org>
Cc: David Sloan <david.sloan@eideticom.com>,
Yu Kuai <yukuai3@huawei.com>,
Mateusz Grzonka <mateusz.grzonka@intel.com>,
Saurabh Sengar <ssengar@linux.microsoft.com>,
XU pengfei <xupengfei@nfschina.com>,
Guoqing Jiang <guoqing.jiang@linux.dev>,
Zhou nan <zhounan@nfschina.com>
Subject: Re: [GIT PULL] md-next 20220921
Date: Wed, 21 Sep 2022 17:44:58 -0600
Message-ID: <80560b23-c124-c8ce-d66b-a7afe5b7fa41@deltatee.com>
In-Reply-To: <b347b8e9-d136-3430-5be0-b4b14d067dc4@deltatee.com>
On 2022-09-21 16:37, Logan Gunthorpe wrote:
>
>
> On 2022-09-21 15:33, Song Liu wrote:
>> Hi Jens,
>>
>> Please consider pulling the following changes for md-next on top of your
>> for-6.1/block branch (for-6.1/drivers branch doesn't exist yet).
>>
>> The major changes are:
>>
>> 1. Various raid5 fix and clean up, by Logan Gunthorpe and David Sloan.
>> 2. Raid10 performance optimization, by Yu Kuai.
>> 3. Generate CHANGE uevents for md device, by Mateusz Grzonka.
>
> I may have hit a bug with my tests on the latest md-next branch. Still
> trying to hit it again. The last tests I ran for several days with some
> patches on the previous md-next branch, but I didn't have Mateusz's
> changes, and it also looks like the branch was rebased today so it could
> be caused by either of those things. I'll let you know when I know more.

Yes, ok, I've found two separate issues and both are fixed by reverting

  21023a82bff7 ("md: generate CHANGE uevents for md device")

I suggest we drop that patch for this cycle so we can sort them out.

The issues are:

1) The concrete issue shows up when running mdadm test 01r1fail: I get
the kernel bugs at the end of this email. It seems we cannot call
kobject_uevent() in at least one of the contexts that md_new_event() is
called in, because kobject_uevent() can sleep and, per the traces,
md_error() is reached from md_ioctl() with rcu_read_lock() held.

2) With our custom test suite that creates and destroys arrays, adds and
removes disks, and runs data through them repeatedly, I randomly start
seeing these warnings:

  mdadm: Fail to create md0 when using /sys/module/md_mod/parameters/new_array, fallback to creation via node

And then, very occasionally, I get that warning paired with this error:

  mdadm: unexpected failure opening /dev/md0

That stops the test because it fails to create an array. I also see a
lot of the same bugs as below, so it may be related.

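To illustrate issue 1, here is a toy user-space model in Python. It is not kernel code and not a proposed patch. The class and method names are made up for the example and only loosely mirror the functions in the trace. The model assumes what the splats suggest: the uevent path may sleep, and the failing call is made inside an RCU read-side section. Deferring the event until after the section ends is one possible direction, shown here purely as an assumption.

```python
import contextlib


class SleepInAtomic(RuntimeError):
    """Stands in for 'BUG: sleeping function called from invalid context'."""


class Context:
    """Hypothetical model of a task's state; not a real kernel structure."""

    def __init__(self):
        self.rcu_depth = 0   # models "RCU nest depth" from the splat
        self.pending = []    # events recorded while sleeping is not allowed
        self.sent = []       # events actually delivered

    @contextlib.contextmanager
    def rcu_read_lock(self):
        # Sleeping is forbidden for as long as the depth is non-zero.
        self.rcu_depth += 1
        try:
            yield
        finally:
            self.rcu_depth -= 1

    def might_sleep(self):
        # Models the check that fires in the first trace.
        if self.rcu_depth > 0:
            raise SleepInAtomic(
                f"RCU nest depth: {self.rcu_depth}, expected: 0")

    def uevent(self, name):
        # Stands in for a uevent call that allocates memory and takes a
        # mutex, both of which may sleep.
        self.might_sleep()
        self.sent.append(name)

    def flush(self):
        # Deliver recorded events; only safe outside the read-side section.
        while self.pending:
            self.uevent(self.pending.pop(0))


def fail_disk_direct(ctx):
    """Emit the event inside the read-side section (the failing shape)."""
    with ctx.rcu_read_lock():
        ctx.uevent("change")


def fail_disk_deferred(ctx):
    """Record the event inside the section, emit it after leaving."""
    with ctx.rcu_read_lock():
        ctx.pending.append("change")
    ctx.flush()
```

Calling fail_disk_direct() raises SleepInAtomic and delivers nothing, while fail_disk_deferred() delivers the event once the read-side section has ended. Whether a real fix should defer the uevent, move the call site, or do something else entirely is still open.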
Logan
--
BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 853, name: mdadm
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
1 lock held by mdadm/853:
#0: ffffffff98c623c0 (rcu_read_lock){....}-{1:2}, at: md_ioctl+0x8f0/0x2670
CPU: 2 PID: 853 Comm: mdadm Not tainted 6.0.0-rc2-eid-vmlocalyes-dbg-00096-g9859e343daaf #2680
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x5a/0x74
dump_stack+0x10/0x12
__might_resched.cold+0x146/0x17e
__might_sleep+0x66/0xc0
kmem_cache_alloc_trace+0x2f8/0x400
kobject_uevent_env+0x121/0xa30
kobject_uevent+0xb/0x10
md_new_event+0x6b/0x80
md_error+0x168/0x1b0
md_ioctl+0x989/0x2670
blkdev_ioctl+0x24d/0x450
__x64_sys_ioctl+0xc0/0x100
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
=============================
[ BUG: Invalid wait context ]
6.0.0-rc2-eid-vmlocalyes-dbg-00096-g9859e343daaf #2680 Tainted: G W
-----------------------------
mdadm/853 is trying to lock:
ffffffff990e4950 (uevent_sock_mutex){+.+.}-{3:3}, at: kobject_uevent_env+0x460/0xa30
other info that might help us debug this:
context-{4:4}
1 lock held by mdadm/853:
#0: ffffffff98c623c0 (rcu_read_lock){....}-{1:2}, at: md_ioctl+0x8f0/0x2670
stack backtrace:
CPU: 2 PID: 853 Comm: mdadm Tainted: G W 6.0.0-rc2-eid-vmlocalyes-dbg-00096-g9859e343daaf #2680
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x5a/0x74
dump_stack+0x10/0x12
__lock_acquire.cold+0x2f2/0x31a
lock_acquire+0x183/0x440
__mutex_lock+0x125/0xe20
mutex_lock_nested+0x1b/0x20
kobject_uevent_env+0x460/0xa30
kobject_uevent+0xb/0x10
md_new_event+0x6b/0x80
md_error+0x168/0x1b0
md_ioctl+0x989/0x2670
blkdev_ioctl+0x24d/0x450
__x64_sys_ioctl+0xc0/0x100
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0