linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Vishal Chourasia <vishalc@linux.ibm.com>
To: Ritesh Harjani <ritesh.list@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>,
	linuxppc-dev@lists.ozlabs.org,
	Herbert Xu <herbert@gondor.apana.org.au>,
	"David S. Miller" <davem@davemloft.net>,
	Nicholas Piggin <npiggin@gmail.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Naveen N Rao <naveen@kernel.org>,
	Madhavan Srinivasan <maddy@linux.ibm.com>,
	Sourabh Jain <sourabhjain@linux.ibm.com>,
	linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: drivers/nx: Invalid wait context issue when rebooting
Date: Tue, 15 Oct 2024 16:48:53 +0530	[thread overview]
Message-ID: <Zw5PnZEXcMPJdwwy@linux.ibm.com> (raw)
In-Reply-To: <87a5f6zxbn.fsf@gmail.com>

On Mon, Oct 14, 2024 at 05:54:44PM +0530, Ritesh Harjani wrote:
> Vishal Chourasia <vishalc@linux.ibm.com> writes:
> 
> > On Fri, Oct 11, 2024 at 09:37:27PM +1100, Michael Ellerman wrote:
> >> 
> >> I don't see why of_reconfig_notifier_unregister() needs to be called
> >> with the devdata_mutext held, but I haven't looked that closely at it.
> >> 
> >> So the change below might work.
> >> 
> >> cheers
> >> 
> >> diff --git a/drivers/crypto/nx/nx-common-pseries.c b/drivers/crypto/nx/nx-common-pseries.c
> >> index 35f2d0d8507e..a2050c5fb11d 100644
> >> --- a/drivers/crypto/nx/nx-common-pseries.c
> >> +++ b/drivers/crypto/nx/nx-common-pseries.c
> >> @@ -1122,10 +1122,11 @@ static void nx842_remove(struct vio_dev *viodev)
> >>  
> >>  	crypto_unregister_alg(&nx842_pseries_alg);
> >>  
> >> +	of_reconfig_notifier_unregister(&nx842_of_nb);
> >> +
> >>  	spin_lock_irqsave(&devdata_mutex, flags);
> >>  	old_devdata = rcu_dereference_check(devdata,
> >>  			lockdep_is_held(&devdata_mutex));
> >> -	of_reconfig_notifier_unregister(&nx842_of_nb);
> >>  	RCU_INIT_POINTER(devdata, NULL);
> >>  	spin_unlock_irqrestore(&devdata_mutex, flags);
> >>  	synchronize_rcu();
> >> 
> > With above changes, I see another similar bug, but what's strange is
> > swapper does not hold any lock and still this bug is being triggered
> 
> Looking at the below stack, it looks like you discovered a new problem
> after the above problem was fixed with the above changes.
> (So maybe you could submit this fix along with [1])
Sure, Ritesh. I have posted another version with the fix.
https://lore.kernel.org/all/20241015105551.1817348-2-vishalc@linux.ibm.com

> Also looking at the history of changes, seems the above problem always
> existed. Not sure why it wasn't caught earlier then?
> 
> [1]: https://lore.kernel.org/linuxppc-dev/ZwyqD-w5hEhrnqTB@linux.ibm.com/T/#u
> 
> I am not much aware of the below code paths. Nor it is evident from the
> stack on why "Invalid wait context". Maybe you can give git bisect a try
> for below issue (or can also wait for someone to comment on below stack).
> (But you might have to keep the nx-common-pseries driver disabled for git bisect to work). 
I will see if I can find a good commit and then carry out the bisect.
> 
> >
> > =============================
> > [ BUG: Invalid wait context ]
> > 6.12.0-rc2-fix-invalid-wait-context-00222-g7d2910da7039-dirty #84 Not tainted
> > -----------------------------
> > swapper/2/0 is trying to lock:
> > c000000004062128 (&xibm->lock){....}-{3:3}, at: xive_spapr_put_ipi+0xb8/0x120
> > other info that might help us debug this:
> > context-{2:2}
> > no locks held by swapper/2/0.
> > stack backtrace:
> > CPU: 2 UID: 0 PID: 0 Comm: swapper/2 Not tainted 6.12.0-rc2-fix-invalid-wait-context-00222-g7d2910da7039-dirty #84
> > Hardware name: IBM,9080-HEX POWER10 (architected) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_012) hv:phyp pSeries
> > Call Trace:
> > [c000000004ac3420] [c00000000130d2e4] dump_stack_lvl+0xc8/0x130 (unreliable)
> > [c000000004ac3460] [c000000000312ca8] __lock_acquire+0xb68/0xf00
> > [c000000004ac3570] [c000000000313130] lock_acquire.part.0+0xf0/0x2a0
> > [c000000004ac3690] [c0000000013955b8] _raw_spin_lock_irqsave+0x78/0x130
> > kexec: waiting for cpu 2 (physical 2) to enter 2 state
> > [c000000004ac36d0] [c000000000194798] xive_spapr_put_ipi+0xb8/0x120
> > [c000000004ac3710] [c000000001383728] xive_cleanup_cpu_ipi+0xc8/0xf0
> > [c000000004ac3750] [c0000000013837f4] xive_teardown_cpu+0xa4/0x100
> > [c000000004ac3780] [c0000000001d2cc4] pseries_kexec_cpu_down+0x54/0x1e0
> > [c000000004ac3800] [c000000000213674] kexec_smp_down+0x124/0x1f0
> > [c000000004ac3890] [c0000000003c9ddc] __flush_smp_call_function_queue+0x28c/0xad0
> > [c000000004ac3950] [c00000000005fb64] smp_ipi_demux_relaxed+0xe4/0xf0
> > [c000000004ac3990] [c0000000000593d8] doorbell_exception+0x108/0x2f0
> > [c000000004ac3a20] [c00000000000a26c] doorbell_super_common_virt+0x28c/0x290
> > --- interrupt: a00 at plpar_hcall_norets_notrace+0x18/0x2c
> > NIP:  c0000000001bee18 LR: c0000000013867a8 CTR: 0000000000000000
> > REGS: c000000004ac3a50 TRAP: 0a00   Not tainted  (6.12.0-rc2-fix-invalid-wait-context-00222-g7d2910da7039-dirty)
> > MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 22000242  XER: 00000001
> > CFAR: 0000000000000000 IRQMASK: 0
> > GPR00: 0000000000000000 c000000004ac3cf0 c000000001e37600 0000000000000000
> > GPR04: 0000000000000000 0000000000000000 0001dc4f97750361 0000000000010000
> > GPR08: 00000000000000c0 0000000000000080 0001dc4f97750554 0000000000000080
> > GPR12: 0000000000000000 c0000007fffee480 0000000000000000 0000000000000000
> > GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > GPR20: 0000000000000000 c000000002ebf778 0000000000000000 00000043a215d824
> > GPR24: 0000000000000000 c000000000ec0f80 c000000002ebf778 0000000000000000
> > GPR28: 0000000000000000 0000000000000001 c0000000021a2300 c0000000021a2308
> > NIP [c0000000001bee18] plpar_hcall_norets_notrace+0x18/0x2c
> > LR [c0000000013867a8] check_and_cede_processor+0x48/0x80
> > --- interrupt: a00
> > [c000000004ac3cf0] [0000000000982538] 0x982538 (unreliable)
> > [c000000004ac3d50] [c000000001386874] dedicated_cede_loop+0x94/0x1a0
> > [c000000004ac3da0] [c00000000138584c] cpuidle_enter_state+0x10c/0x8a8
> > [c000000004ac3e50] [c000000000ec0f80] cpuidle_enter+0x50/0x80
> > [c000000004ac3e90] [c0000000002ba9c8] call_cpuidle+0x48/0xa0
> > [c000000004ac3eb0] [c0000000002cec54] cpuidle_idle_call+0x164/0x250
> > [c000000004ac3f00] [c0000000002cee74] do_idle+0x134/0x1d0
> > [c000000004ac3f50] [c0000000002cf34c] cpu_startup_entry+0x4c/0x50
> > [c000000004ac3f80] [c0000000000607d0] start_secondary+0x280/0x2b0
> > [c000000004ac3fe0] [c00000000000e058] start_secondary_prolog+0x10/0x14
> 
> -ritesh


  reply	other threads:[~2024-10-15 11:19 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-11  8:35 drivers/nx: Invalid wait context issue when rebooting Vishal Chourasia
2024-10-11 10:37 ` Michael Ellerman
2024-10-11 11:43   ` Vishal Chourasia
2024-10-11 17:20     ` Vishal Chourasia
2024-10-11 12:34   ` Vishal Chourasia
2024-10-14 12:24     ` Ritesh Harjani
2024-10-15 11:18       ` Vishal Chourasia [this message]
2024-10-11 12:36   ` Vishal Chourasia

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zw5PnZEXcMPJdwwy@linux.ibm.com \
    --to=vishalc@linux.ibm.com \
    --cc=christophe.leroy@csgroup.eu \
    --cc=davem@davemloft.net \
    --cc=herbert@gondor.apana.org.au \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=maddy@linux.ibm.com \
    --cc=mpe@ellerman.id.au \
    --cc=naveen@kernel.org \
    --cc=npiggin@gmail.com \
    --cc=ritesh.list@gmail.com \
    --cc=sourabhjain@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).