public inbox for linux-raid@vger.kernel.org
From: Logan Gunthorpe <logang@deltatee.com>
To: Song Liu <songliubraving@fb.com>, Jens Axboe <axboe@kernel.dk>,
	linux-raid <linux-raid@vger.kernel.org>
Cc: David Sloan <david.sloan@eideticom.com>,
	Yu Kuai <yukuai3@huawei.com>,
	Mateusz Grzonka <mateusz.grzonka@intel.com>,
	Saurabh Sengar <ssengar@linux.microsoft.com>,
	XU pengfei <xupengfei@nfschina.com>,
	Guoqing Jiang <guoqing.jiang@linux.dev>,
	Zhou nan <zhounan@nfschina.com>
Subject: Re: [GIT PULL] md-next 20220921
Date: Wed, 21 Sep 2022 17:44:58 -0600
Message-ID: <80560b23-c124-c8ce-d66b-a7afe5b7fa41@deltatee.com>
In-Reply-To: <b347b8e9-d136-3430-5be0-b4b14d067dc4@deltatee.com>



On 2022-09-21 16:37, Logan Gunthorpe wrote:
> 
> 
> On 2022-09-21 15:33, Song Liu wrote:
>> Hi Jens, 
>>
>> Please consider pulling the following changes for md-next on top of your
>> for-6.1/block branch (for-6.1/drivers branch doesn't exist yet). 
>>
>> The major changes are:
>>
>> 1. Various raid5 fix and clean up, by Logan Gunthorpe and David Sloan.
>> 2. Raid10 performance optimization, by Yu Kuai. 
>> 3. Generate CHANGE uevents for md device, by Mateusz Grzonka. 
> 
> I may have hit a bug with my tests on the latest md-next branch. Still
> trying to hit it again. The last tests I ran for several days with some
> patches on the previous md-next branch, but I didn't have Mateusz's
> changes, and it also looks like the branch was rebased today so it could
> be caused by either of those things. I'll let you know when I know more.

Yes, ok, I've found two separate issues and both are fixed by reverting

   21023a82bff7 ("md: generate CHANGE uevents for md device")

I suggest we drop that patch for this cycle so we can sort them out.

The issues are:

1) The concrete issue shows up when running mdadm test 01r1fail: I get
the kernel BUG reports at the end of this email. It seems we cannot call
kobject_uevent() in at least one of the contexts that md_new_event() is
called in. kobject_uevent() can sleep, and in the traces below md_error()
calls md_new_event() from md_ioctl() while rcu_read_lock() is held.

2) With our custom test suite that creates and destroys arrays, adds and
removes disks, and runs data through them repeatedly, I randomly start
seeing these warnings:

   mdadm: Fail to create md0 when using /sys/module/md_mod/parameters/new_array, fallback to creation via node

And then, very occasionally, I get that warning paired with this error:

   mdadm: unexpected failure opening /dev/md0

Which stops the test because it fails to create an array. I also see a
lot of the same bugs as below so it may be related.
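
To make issue 1 concrete, here is a minimal userspace sketch of the two
call shapes. It is not the md code and not a proposed fix. Every name in
it (fake_rcu_read_lock(), fake_uevent(), fail_device_direct(),
fail_device_deferred()) is a hypothetical stand-in, and the depth counter
only models the "RCU nest depth" check from the first splat below.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
int fake_rcu_depth;        /* models "RCU nest depth" in the splat */
int invalid_context_hits;  /* counts would-be "sleeping function" BUGs */
int events_sent;           /* counts events emitted from a legal context */

void fake_rcu_read_lock(void)   { fake_rcu_depth++; }
void fake_rcu_read_unlock(void) { fake_rcu_depth--; }

/* Models a call that may sleep: only legal when the depth is 0. */
void fake_uevent(void)
{
	if (fake_rcu_depth != 0) {
		invalid_context_hits++;  /* kernel would print the BUG here */
		return;
	}
	events_sent++;
}

/* Problem shape: the event is emitted inside the read-side section. */
void fail_device_direct(void)
{
	fake_rcu_read_lock();
	fake_uevent();
	fake_rcu_read_unlock();
}

/* Deferred shape: only note the event inside, emit it after unlocking. */
void fail_device_deferred(void)
{
	bool pending = false;

	fake_rcu_read_lock();
	pending = true;          /* remember that an event is owed */
	fake_rcu_read_unlock();

	if (pending)
		fake_uevent();   /* depth is back to 0, so this is legal */
}
```

Calling fail_device_direct() bumps invalid_context_hits and sends nothing,
while fail_device_deferred() sends one event without tripping the check.
Whether deferring is the right answer for md_new_event() is a separate
question; this only illustrates why the current call site trips the check.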

Logan

--

 BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274
 in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 853, name: mdadm
 preempt_count: 0, expected: 0
 RCU nest depth: 1, expected: 0
 1 lock held by mdadm/853:
  #0: ffffffff98c623c0 (rcu_read_lock){....}-{1:2}, at: md_ioctl+0x8f0/0x2670
 CPU: 2 PID: 853 Comm: mdadm Not tainted 6.0.0-rc2-eid-vmlocalyes-dbg-00096-g9859e343daaf #2680
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x5a/0x74
  dump_stack+0x10/0x12
  __might_resched.cold+0x146/0x17e
  __might_sleep+0x66/0xc0
  kmem_cache_alloc_trace+0x2f8/0x400
  kobject_uevent_env+0x121/0xa30
  kobject_uevent+0xb/0x10
  md_new_event+0x6b/0x80
  md_error+0x168/0x1b0
  md_ioctl+0x989/0x2670
  blkdev_ioctl+0x24d/0x450
  __x64_sys_ioctl+0xc0/0x100
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x46/0xb0

 =============================
 [ BUG: Invalid wait context ]
 6.0.0-rc2-eid-vmlocalyes-dbg-00096-g9859e343daaf #2680 Tainted: G        W
 -----------------------------
 mdadm/853 is trying to lock:
 ffffffff990e4950 (uevent_sock_mutex){+.+.}-{3:3}, at: kobject_uevent_env+0x460/0xa30
 other info that might help us debug this:
 context-{4:4}
 1 lock held by mdadm/853:
  #0: ffffffff98c623c0 (rcu_read_lock){....}-{1:2}, at: md_ioctl+0x8f0/0x2670
 stack backtrace:
 CPU: 2 PID: 853 Comm: mdadm Tainted: G        W 6.0.0-rc2-eid-vmlocalyes-dbg-00096-g9859e343daaf #2680
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x5a/0x74
  dump_stack+0x10/0x12
  __lock_acquire.cold+0x2f2/0x31a
  lock_acquire+0x183/0x440
  __mutex_lock+0x125/0xe20
  mutex_lock_nested+0x1b/0x20
  kobject_uevent_env+0x460/0xa30
  kobject_uevent+0xb/0x10
  md_new_event+0x6b/0x80
  md_error+0x168/0x1b0
  md_ioctl+0x989/0x2670
  blkdev_ioctl+0x24d/0x450
  __x64_sys_ioctl+0xc0/0x100
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x46/0xb0


