From: Michael Ellerman <mpe@ellerman.id.au>
To: Vishal Chourasia <vishalc@linux.ibm.com>,
	linuxppc-dev@lists.ozlabs.org,
	Herbert Xu <herbert@gondor.apana.org.au>,
	"David S. Miller" <davem@davemloft.net>,
	Nicholas Piggin <npiggin@gmail.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Naveen N Rao <naveen@kernel.org>,
	Madhavan Srinivasan <maddy@linux.ibm.com>,
	linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: drivers/nx: Invalid wait context issue when rebooting
Date: Fri, 11 Oct 2024 21:37:27 +1100
Message-ID: <87wmif53iw.fsf@mail.lhotse>
In-Reply-To: <ZwjjXJ5UtZ28FH6s@linux.ibm.com>

Vishal Chourasia <vishalc@linux.ibm.com> writes:
> Hi,
> I am getting an "Invalid wait context" warning when rebooting an LPAR.
>
> kexec/61926 is trying to acquire `of_reconfig_chain.rwsem` while holding
> spinlock `devdata_mutex`
>
> Note: Name of the spinlock is misleading.

Oof, yeah let's rename that to devdata_spinlock at least.

> In my case, I compiled a new vmlinux file and loaded it into the running
> kernel using `kexec -l` and then hit `reboot`
>
> dmesg:
> ------
>
> [ BUG: Invalid wait context ]
> 6.11.0-test2-10547-g684a64bf32b6-dirty #79 Not tainted

Is that v6.11 plus ~10,000 patches? O_o

Ah no, 684a64bf32b6 is roughly v6.12-rc1. Maybe if you fetch tags into
your tree you will get a more sensible version string?

Could also be good to try v6.12-rc2.

> -----------------------------
> kexec/61926 is trying to lock:
> c000000002d8b590 ((of_reconfig_chain).rwsem){++++}-{4:4}, at: blocking_notifier_chain_unregister+0x44/0xa0
> other info that might help us debug this:
> context-{5:5}
> 4 locks held by kexec/61926:
>  #0: c000000002926c70 (system_transition_mutex){+.+.}-{4:4}, at: __do_sys_reboot+0xf8/0x2e0
>  #1: c00000000291af30 (&dev->mutex){....}-{4:4}, at: device_shutdown+0x160/0x310
>  #2: c000000051011938 (&dev->mutex){....}-{4:4}, at: device_shutdown+0x174/0x310
>  #3: c000000002d88070 (devdata_mutex){....}-{3:3}, at: nx842_remove+0xac/0x1bc
  
That's pretty conclusive.
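
To spell out why: the {outer:inner} numbers in the lock list are
lockdep wait types. Lock #3 is a spinlock_t ({3:3}) and the rwsem being
taken is a sleeping lock ({4:4}), so a sleeping lock is requested while
the innermost held lock only permits spinning. Below is a rough
userspace model of that comparison. It is a sketch, not lockdep's actual
code: the enum values assume CONFIG_PROVE_RAW_LOCK_NESTING=y, and the
names wait_type and wait_context_ok() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model of lockdep's wait-type ordering. The numeric values
 * assume CONFIG_PROVE_RAW_LOCK_NESTING=y, where spinlock_t gets its own
 * level between raw spinlocks and sleeping locks.
 */
enum wait_type {
	WAIT_INV = 0,	/* not checked */
	WAIT_FREE,	/* 1: wait free, e.g. RCU */
	WAIT_SPIN,	/* 2: raw_spinlock_t */
	WAIT_CONFIG,	/* 3: spinlock_t */
	WAIT_SLEEP,	/* 4: mutex, rwsem */
	WAIT_MAX,	/* 5: plain task context, nothing held */
};

/*
 * Hypothetical helper: return 1 if a lock of type `next` may be taken
 * while the locks in `held` are held, 0 if that would be an invalid
 * wait context. The current context is the most restrictive (lowest)
 * wait type among the held locks, starting from task context.
 */
static int wait_context_ok(const enum wait_type *held, size_t nheld,
			   enum wait_type next)
{
	enum wait_type ctx = WAIT_MAX;
	size_t i;

	assert(held != NULL || nheld == 0);

	/* Unchecked locks are never reported. */
	if (next == WAIT_INV)
		return 1;

	for (i = 0; i < nheld; i++) {
		/* Held locks with an unchecked type do not restrict the context. */
		if (held[i] != WAIT_INV && held[i] < ctx)
			ctx = held[i];
	}

	/* Invalid if the new lock needs more than the context allows. */
	return next <= ctx;
}
```

With the four held locks from the report (wait types 4, 4, 4, 3) and
next = WAIT_SLEEP, the helper returns 0. Without the last held lock,
i.e. with devdata_mutex not held, it returns 1.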

I don't understand why you're the first person to see this. I can't see
that any of the relevant code has changed recently. Unless something in
lockdep itself changed?

Did you just start seeing this on recent kernels? Can you bisect?

> stack backtrace:
> CPU: 2 UID: 0 PID: 61926 Comm: kexec Not tainted 6.11.0-test2-10547-g684a64bf32b6-dirty #79
> Hardware name: IBM,9080-HEX POWER10 (architected) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_012) hv:phyp pSeries
> Call Trace:
> [c0000000bb577400] [c000000001239704] dump_stack_lvl+0xc8/0x130 (unreliable)
> [c0000000bb577440] [c000000000248398] __lock_acquire+0xb68/0xf00
> [c0000000bb577550] [c000000000248820] lock_acquire.part.0+0xf0/0x2a0
> [c0000000bb577670] [c00000000127faa0] down_write+0x70/0x1e0
> [c0000000bb5776b0] [c0000000001acea4] blocking_notifier_chain_unregister+0x44/0xa0
> [c0000000bb5776e0] [c000000000e2312c] of_reconfig_notifier_unregister+0x2c/0x40
> [c0000000bb577700] [c000000000ded24c] nx842_remove+0x148/0x1bc
> [c0000000bb577790] [c00000000011a114] vio_bus_remove+0x54/0xc0
> [c0000000bb5777c0] [c000000000c1a44c] device_shutdown+0x20c/0x310
> [c0000000bb577850] [c0000000001b0ab4] kernel_restart_prepare+0x54/0x70
> [c0000000bb577870] [c000000000308718] kernel_kexec+0xa8/0x110
> [c0000000bb5778e0] [c0000000001b1144] __do_sys_reboot+0x214/0x2e0
> [c0000000bb577a40] [c000000000032f98] system_call_exception+0x148/0x310
> [c0000000bb577e50] [c00000000000cedc] system_call_vectored_common+0x15c/0x2ec

I don't see why of_reconfig_notifier_unregister() needs to be called
with the devdata_mutex held, but I haven't looked that closely at it.

So the change below might work.

cheers

diff --git a/drivers/crypto/nx/nx-common-pseries.c b/drivers/crypto/nx/nx-common-pseries.c
index 35f2d0d8507e..a2050c5fb11d 100644
--- a/drivers/crypto/nx/nx-common-pseries.c
+++ b/drivers/crypto/nx/nx-common-pseries.c
@@ -1122,10 +1122,11 @@ static void nx842_remove(struct vio_dev *viodev)
 
 	crypto_unregister_alg(&nx842_pseries_alg);
 
+	of_reconfig_notifier_unregister(&nx842_of_nb);
+
 	spin_lock_irqsave(&devdata_mutex, flags);
 	old_devdata = rcu_dereference_check(devdata,
 			lockdep_is_held(&devdata_mutex));
-	of_reconfig_notifier_unregister(&nx842_of_nb);
 	RCU_INIT_POINTER(devdata, NULL);
 	spin_unlock_irqrestore(&devdata_mutex, flags);
 	synchronize_rcu();


