From: Vishal Chourasia <vishalc@linux.ibm.com>
To: Ritesh Harjani <ritesh.list@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>,
	linuxppc-dev@lists.ozlabs.org,
	Herbert Xu <herbert@gondor.apana.org.au>,
	"David S. Miller" <davem@davemloft.net>,
	Nicholas Piggin <npiggin@gmail.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Naveen N Rao <naveen@kernel.org>,
	Madhavan Srinivasan <maddy@linux.ibm.com>,
	Sourabh Jain <sourabhjain@linux.ibm.com>,
	linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: drivers/nx: Invalid wait context issue when rebooting
Date: Tue, 15 Oct 2024 16:48:53 +0530
Message-ID: <Zw5PnZEXcMPJdwwy@linux.ibm.com>
In-Reply-To: <87a5f6zxbn.fsf@gmail.com>

On Mon, Oct 14, 2024 at 05:54:44PM +0530, Ritesh Harjani wrote:
> Vishal Chourasia <vishalc@linux.ibm.com> writes:
> 
> > On Fri, Oct 11, 2024 at 09:37:27PM +1100, Michael Ellerman wrote:
> >> 
> >> I don't see why of_reconfig_notifier_unregister() needs to be called
> >> with the devdata_mutex held, but I haven't looked that closely at it.
> >> 
> >> So the change below might work.
> >> 
> >> cheers
> >> 
> >> diff --git a/drivers/crypto/nx/nx-common-pseries.c b/drivers/crypto/nx/nx-common-pseries.c
> >> index 35f2d0d8507e..a2050c5fb11d 100644
> >> --- a/drivers/crypto/nx/nx-common-pseries.c
> >> +++ b/drivers/crypto/nx/nx-common-pseries.c
> >> @@ -1122,10 +1122,11 @@ static void nx842_remove(struct vio_dev *viodev)
> >>  
> >>  	crypto_unregister_alg(&nx842_pseries_alg);
> >>  
> >> +	of_reconfig_notifier_unregister(&nx842_of_nb);
> >> +
> >>  	spin_lock_irqsave(&devdata_mutex, flags);
> >>  	old_devdata = rcu_dereference_check(devdata,
> >>  			lockdep_is_held(&devdata_mutex));
> >> -	of_reconfig_notifier_unregister(&nx842_of_nb);
> >>  	RCU_INIT_POINTER(devdata, NULL);
> >>  	spin_unlock_irqrestore(&devdata_mutex, flags);
> >>  	synchronize_rcu();
> >> 
> > With above changes, I see another similar bug, but what's strange is
> > swapper does not hold any lock and still this bug is being triggered
> 
> Looking at the below stack, it looks like you discovered a new problem
> after the above problem was fixed with the above changes.
> (So maybe you could submit this fix along with [1])

Sure, Ritesh. I have posted another version with the fix.
https://lore.kernel.org/all/20241015105551.1817348-2-vishalc@linux.ibm.com

> Also looking at the history of changes, seems the above problem always
> existed. Not sure why it wasn't caught earlier then?
> 
> [1]: https://lore.kernel.org/linuxppc-dev/ZwyqD-w5hEhrnqTB@linux.ibm.com/T/#u
> 
> I am not much aware of the below code paths. Nor it is evident from the
> stack on why "Invalid wait context". Maybe you can give git bisect a try
> for below issue (or can also wait for someone to comment on below stack).
> (But you might have to keep the nx-common-pseries driver disabled for git bisect to work). 

I will see if I can find a good commit and then carry out the bisect.

> 
> >
> > =============================
> > [ BUG: Invalid wait context ]
> > 6.12.0-rc2-fix-invalid-wait-context-00222-g7d2910da7039-dirty #84 Not tainted
> > -----------------------------
> > swapper/2/0 is trying to lock:
> > c000000004062128 (&xibm->lock){....}-{3:3}, at: xive_spapr_put_ipi+0xb8/0x120
> > other info that might help us debug this:
> > context-{2:2}
> > no locks held by swapper/2/0.
> > stack backtrace:
> > CPU: 2 UID: 0 PID: 0 Comm: swapper/2 Not tainted 6.12.0-rc2-fix-invalid-wait-context-00222-g7d2910da7039-dirty #84
> > Hardware name: IBM,9080-HEX POWER10 (architected) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_012) hv:phyp pSeries
> > Call Trace:
> > [c000000004ac3420] [c00000000130d2e4] dump_stack_lvl+0xc8/0x130 (unreliable)
> > [c000000004ac3460] [c000000000312ca8] __lock_acquire+0xb68/0xf00
> > [c000000004ac3570] [c000000000313130] lock_acquire.part.0+0xf0/0x2a0
> > [c000000004ac3690] [c0000000013955b8] _raw_spin_lock_irqsave+0x78/0x130
> > kexec: waiting for cpu 2 (physical 2) to enter 2 state
> > [c000000004ac36d0] [c000000000194798] xive_spapr_put_ipi+0xb8/0x120
> > [c000000004ac3710] [c000000001383728] xive_cleanup_cpu_ipi+0xc8/0xf0
> > [c000000004ac3750] [c0000000013837f4] xive_teardown_cpu+0xa4/0x100
> > [c000000004ac3780] [c0000000001d2cc4] pseries_kexec_cpu_down+0x54/0x1e0
> > [c000000004ac3800] [c000000000213674] kexec_smp_down+0x124/0x1f0
> > [c000000004ac3890] [c0000000003c9ddc] __flush_smp_call_function_queue+0x28c/0xad0
> > [c000000004ac3950] [c00000000005fb64] smp_ipi_demux_relaxed+0xe4/0xf0
> > [c000000004ac3990] [c0000000000593d8] doorbell_exception+0x108/0x2f0
> > [c000000004ac3a20] [c00000000000a26c] doorbell_super_common_virt+0x28c/0x290
> > --- interrupt: a00 at plpar_hcall_norets_notrace+0x18/0x2c
> > NIP:  c0000000001bee18 LR: c0000000013867a8 CTR: 0000000000000000
> > REGS: c000000004ac3a50 TRAP: 0a00   Not tainted  (6.12.0-rc2-fix-invalid-wait-context-00222-g7d2910da7039-dirty)
> > MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 22000242  XER: 00000001
> > CFAR: 0000000000000000 IRQMASK: 0
> > GPR00: 0000000000000000 c000000004ac3cf0 c000000001e37600 0000000000000000
> > GPR04: 0000000000000000 0000000000000000 0001dc4f97750361 0000000000010000
> > GPR08: 00000000000000c0 0000000000000080 0001dc4f97750554 0000000000000080
> > GPR12: 0000000000000000 c0000007fffee480 0000000000000000 0000000000000000
> > GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > GPR20: 0000000000000000 c000000002ebf778 0000000000000000 00000043a215d824
> > GPR24: 0000000000000000 c000000000ec0f80 c000000002ebf778 0000000000000000
> > GPR28: 0000000000000000 0000000000000001 c0000000021a2300 c0000000021a2308
> > NIP [c0000000001bee18] plpar_hcall_norets_notrace+0x18/0x2c
> > LR [c0000000013867a8] check_and_cede_processor+0x48/0x80
> > --- interrupt: a00
> > [c000000004ac3cf0] [0000000000982538] 0x982538 (unreliable)
> > [c000000004ac3d50] [c000000001386874] dedicated_cede_loop+0x94/0x1a0
> > [c000000004ac3da0] [c00000000138584c] cpuidle_enter_state+0x10c/0x8a8
> > [c000000004ac3e50] [c000000000ec0f80] cpuidle_enter+0x50/0x80
> > [c000000004ac3e90] [c0000000002ba9c8] call_cpuidle+0x48/0xa0
> > [c000000004ac3eb0] [c0000000002cec54] cpuidle_idle_call+0x164/0x250
> > [c000000004ac3f00] [c0000000002cee74] do_idle+0x134/0x1d0
> > [c000000004ac3f50] [c0000000002cf34c] cpu_startup_entry+0x4c/0x50
> > [c000000004ac3f80] [c0000000000607d0] start_secondary+0x280/0x2b0
> > [c000000004ac3fe0] [c00000000000e058] start_secondary_prolog+0x10/0x14
> 
> -ritesh
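
One possible reading of the report above, offered as a sketch rather than
a confirmed diagnosis: "no locks held" does not rule out an invalid wait
context, because lockdep also limits what the execution context itself
may acquire. The trace shows the lock being taken from the doorbell
exception path (context-{2:2}), while &xibm->lock is reported as {3:3}.
The userspace model below is hypothetical. It assumes the wait-type
ordering of the kernel's enum lockdep_wait_type with
CONFIG_PROVE_RAW_LOCK_NESTING=y, and it reduces the held-lock stack to a
single "most restrictive held lock" value. All names in it are made up
for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only. The numeric ordering is assumed to match the
 * kernel's enum lockdep_wait_type with CONFIG_PROVE_RAW_LOCK_NESTING=y.
 */
enum wait_type {
	WAIT_INV = 0,	/* not checked / no lock held */
	WAIT_FREE,	/* 1: wait-free (RCU etc.) */
	WAIT_SPIN,	/* 2: raw_spinlock_t */
	WAIT_CONFIG,	/* 3: spinlock_t (sleeps on PREEMPT_RT) */
	WAIT_SLEEP,	/* 4: mutex, rwsem */
	WAIT_MAX,
};

enum exec_ctx { CTX_TASK, CTX_SOFTIRQ, CTX_HARDIRQ };

/* Most permissive wait type the bare execution context allows. */
static enum wait_type ctx_limit(enum exec_ctx ctx)
{
	switch (ctx) {
	case CTX_HARDIRQ:
		return WAIT_SPIN;	/* non-threaded hardirq (simplification) */
	case CTX_SOFTIRQ:
		return WAIT_CONFIG;
	default:
		return WAIT_MAX;
	}
}

/*
 * held_inner is the wait type of the most restrictive lock already held,
 * or WAIT_INV when nothing is held. Acquiring 'next' is accepted only if
 * it is no "sleepier" than both the context and that held lock.
 */
static bool wait_context_ok(enum exec_ctx ctx, enum wait_type held_inner,
			    enum wait_type next)
{
	enum wait_type limit = ctx_limit(ctx);

	if (held_inner != WAIT_INV && held_inner < limit)
		limit = held_inner;
	return next <= limit;
}

void check_thread_cases(void)
{
	/* Trace above: hardirq context {2:2}, no locks held, lock is {3:3}. */
	assert(!wait_context_ok(CTX_HARDIRQ, WAIT_INV, WAIT_CONFIG));
	/* Assumed shape of the earlier report: sleeping lock under a spinlock_t. */
	assert(!wait_context_ok(CTX_TASK, WAIT_CONFIG, WAIT_SLEEP));
	/* Same sleeping lock taken outside the spinlock section. */
	assert(wait_context_ok(CTX_TASK, WAIT_INV, WAIT_SLEEP));
	/* A raw_spinlock_t-class lock is accepted in hardirq by this model. */
	assert(wait_context_ok(CTX_HARDIRQ, WAIT_INV, WAIT_SPIN));
}
```

Calling check_thread_cases() passes all four assertions. The second and
third cases assume that the notifier unregistration takes a sleeping
lock, which is why moving it out of the spin_lock_irqsave() region would
satisfy the model. The last case only shows what the model accepts and is
not a claim about the right fix for xive_spapr_put_ipi().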
