From: Vishal Chourasia <vishalc@linux.ibm.com>
To: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org,
Herbert Xu <herbert@gondor.apana.org.au>,
"David S. Miller" <davem@davemloft.net>,
Nicholas Piggin <npiggin@gmail.com>,
Christophe Leroy <christophe.leroy@csgroup.eu>,
Naveen N Rao <naveen@kernel.org>,
Madhavan Srinivasan <maddy@linux.ibm.com>,
linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: drivers/nx: Invalid wait context issue when rebooting
Date: Fri, 11 Oct 2024 22:50:31 +0530
Message-ID: <ZwleX01sala8Rc3-@linux.ibm.com>
In-Reply-To: <ZwkPT3H_WVd5KyZQ@linux.ibm.com>

On Fri, Oct 11, 2024 at 05:13:11PM +0530, Vishal Chourasia wrote:
> On Fri, Oct 11, 2024 at 09:37:27PM +1100, Michael Ellerman wrote:
> > Vishal Chourasia <vishalc@linux.ibm.com> writes:
> > > Hi,
> > > I am getting an "Invalid wait context" warning when rebooting an LPAR.
> > >
> > > kexec/61926 is trying to acquire `of_reconfig_chain.rwsem` while holding
> > > spinlock `devdata_mutex`
> > >
> > > Note: The name of the spinlock is misleading.
> >
> > Oof, yeah let's rename that to devdata_spinlock at least.
> >
> > > In my case, I compiled a new vmlinux file and loaded it into the running
> > > kernel using `kexec -l` and then hit `reboot`
> > >
> > > dmesg:
> > > ------
> > >
> > > [ BUG: Invalid wait context ]
> > > 6.11.0-test2-10547-g684a64bf32b6-dirty #79 Not tainted
> >
> > Is that v6.11 plus ~10,000 patches? O_o
> >
> > Ah no, 684a64bf32b6 is roughly v6.12-rc1. Maybe if you fetch tags into
> > your tree you will get a more sensible version string?
> >
> > Could also be good to try v6.12-rc2.
> Sure.
> >
> > > -----------------------------
> > > kexec/61926 is trying to lock:
> > > c000000002d8b590 ((of_reconfig_chain).rwsem){++++}-{4:4}, at: blocking_notifier_chain_unregister+0x44/0xa0
> > > other info that might help us debug this:
> > > context-{5:5}
> > > 4 locks held by kexec/61926:
> > > #0: c000000002926c70 (system_transition_mutex){+.+.}-{4:4}, at: __do_sys_reboot+0xf8/0x2e0
> > > #1: c00000000291af30 (&dev->mutex){....}-{4:4}, at: device_shutdown+0x160/0x310
> > > #2: c000000051011938 (&dev->mutex){....}-{4:4}, at: device_shutdown+0x174/0x310
> > > #3: c000000002d88070 (devdata_mutex){....}-{3:3}, at: nx842_remove+0xac/0x1bc
> >
> > That's pretty conclusive.
> >
> > I don't understand why you're the first person to see this. I can't see
> > that any of the relevant code has changed recently. Unless something in
> > lockdep itself changed?
> >
> > Did you just start seeing this on recent kernels? Can you bisect?
> Yes. Sure, I will try bisecting, and get back.
I tested the v6.0, v6.6, and v6.9 kernels, and all of them hit this
bug.
Note that the bug is hit when I load the vmlinux file using `kexec -l`
and then do a reboot.
> >
> > > stack backtrace:
> > > CPU: 2 UID: 0 PID: 61926 Comm: kexec Not tainted 6.11.0-test2-10547-g684a64bf32b6-dirty #79
> > > Hardware name: IBM,9080-HEX POWER10 (architected) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_012) hv:phyp pSeries
> > > Call Trace:
> > > [c0000000bb577400] [c000000001239704] dump_stack_lvl+0xc8/0x130 (unreliable)
> > > [c0000000bb577440] [c000000000248398] __lock_acquire+0xb68/0xf00
> > > [c0000000bb577550] [c000000000248820] lock_acquire.part.0+0xf0/0x2a0
> > > [c0000000bb577670] [c00000000127faa0] down_write+0x70/0x1e0
> > > [c0000000bb5776b0] [c0000000001acea4] blocking_notifier_chain_unregister+0x44/0xa0
> > > [c0000000bb5776e0] [c000000000e2312c] of_reconfig_notifier_unregister+0x2c/0x40
> > > [c0000000bb577700] [c000000000ded24c] nx842_remove+0x148/0x1bc
> > > [c0000000bb577790] [c00000000011a114] vio_bus_remove+0x54/0xc0
> > > [c0000000bb5777c0] [c000000000c1a44c] device_shutdown+0x20c/0x310
> > > [c0000000bb577850] [c0000000001b0ab4] kernel_restart_prepare+0x54/0x70
> > > [c0000000bb577870] [c000000000308718] kernel_kexec+0xa8/0x110
> > > [c0000000bb5778e0] [c0000000001b1144] __do_sys_reboot+0x214/0x2e0
> > > [c0000000bb577a40] [c000000000032f98] system_call_exception+0x148/0x310
> > > [c0000000bb577e50] [c00000000000cedc] system_call_vectored_common+0x15c/0x2ec
> >
> > I don't see why of_reconfig_notifier_unregister() needs to be called
> > with the devdata_mutex held, but I haven't looked that closely at it.
> >
> > So the change below might work.
> >
> > cheers
> >
> > diff --git a/drivers/crypto/nx/nx-common-pseries.c b/drivers/crypto/nx/nx-common-pseries.c
> > index 35f2d0d8507e..a2050c5fb11d 100644
> > --- a/drivers/crypto/nx/nx-common-pseries.c
> > +++ b/drivers/crypto/nx/nx-common-pseries.c
> > @@ -1122,10 +1122,11 @@ static void nx842_remove(struct vio_dev *viodev)
> >
> >  	crypto_unregister_alg(&nx842_pseries_alg);
> >  
> > +	of_reconfig_notifier_unregister(&nx842_of_nb);
> > +
> >  	spin_lock_irqsave(&devdata_mutex, flags);
> >  	old_devdata = rcu_dereference_check(devdata,
> >  			lockdep_is_held(&devdata_mutex));
> > -	of_reconfig_notifier_unregister(&nx842_of_nb);
> >  	RCU_INIT_POINTER(devdata, NULL);
> >  	spin_unlock_irqrestore(&devdata_mutex, flags);
> >  	synchronize_rcu();
> >
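To make the {outer:inner} numbers in the report easier to follow, here is
a small user-space model of the check as I understand it. It is only a
sketch, not the lockdep code: the names wait_type, held_lock_model,
current_inner(), invalid_wait_context() and check_report_case() are made
up for illustration, and the numeric values are simply taken from the
report above (3 for the spinlock, 4 for the mutexes and the rwsem, 5 for
the task context). The rule modelled is that a lock may not be taken if
its outer wait type is larger than the smallest inner wait type already
in effect.

```c
#include <assert.h>
#include <stddef.h>

/* Wait types numbered to match the report above (assumed, not authoritative). */
enum wait_type {
	WAIT_INV = 0,	/* not checked */
	WAIT_FREE,	/* 1 */
	WAIT_SPIN,	/* 2: raw spinlocks */
	WAIT_CONFIG,	/* 3: spinlock_t, e.g. devdata_mutex {3:3} */
	WAIT_SLEEP,	/* 4: mutex, rwsem {4:4} */
	WAIT_MAX,	/* 5: plain task context, context-{5:5} */
};

/* Hypothetical stand-in for one entry of the "locks held" list. */
struct held_lock_model {
	const char *name;
	enum wait_type inner;
};

/* Smallest inner wait type among the context and all held locks. */
enum wait_type current_inner(enum wait_type context,
			     const struct held_lock_model *held, size_t n)
{
	enum wait_type inner = context;
	size_t i;

	for (i = 0; i < n; i++)
		if (held[i].inner != WAIT_INV && held[i].inner < inner)
			inner = held[i].inner;
	return inner;
}

/* Returns 1 if taking a lock with outer type next_outer would be flagged. */
int invalid_wait_context(enum wait_type context,
			 const struct held_lock_model *held, size_t n,
			 enum wait_type next_outer)
{
	return next_outer > current_inner(context, held, n);
}

/* Replays the four held locks from the report, then the reordered case. */
void check_report_case(void)
{
	const struct held_lock_model held[] = {
		{ "system_transition_mutex", WAIT_SLEEP },
		{ "&dev->mutex",             WAIT_SLEEP },
		{ "&dev->mutex",             WAIT_SLEEP },
		{ "devdata_mutex",           WAIT_CONFIG },
	};

	/* rwsem (outer 4) under the spinlock (inner 3): flagged. */
	assert(invalid_wait_context(WAIT_MAX, held, 4, WAIT_SLEEP) == 1);
	/* Same rwsem taken before the spinlock is held: accepted. */
	assert(invalid_wait_context(WAIT_MAX, held, 3, WAIT_SLEEP) == 0);
}
```

Calling check_report_case() from a main() passes both assertions. The
second case corresponds to the reordering in the diff above, where the
notifier is unregistered before devdata_mutex is taken. It says nothing
about whether the reordering is otherwise safe for the driver.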