From: Michael Ellerman <mpe@ellerman.id.au>
To: "Jason A. Donenfeld" <Jason@zx2c4.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	qemu-ppc@nongnu.org, linuxppc-dev@lists.ozlabs.org
Cc: "Peter Zijlstra" <peterz@infradead.org>,
	"Frederic Weisbecker" <fweisbec@gmail.com>,
	"Frederic Weisbecker" <frederic@kernel.org>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Cédric Le Goater" <clg@kaod.org>,
	"Paul Mackerras" <paulus@samba.org>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Suraj Jitindar Singh" <sjitindarsingh@gmail.com>,
	tglx@linutronix.de, "David Gibson" <david@gibson.dropbear.id.au>
Subject: Re: [Qemu-ppc] pseries on qemu-system-ppc64le crashes in doorbell_core_ipi()
Date: Thu, 19 Dec 2019 23:45:26 +1100
Message-ID: <871rt0mql5.fsf@mpe.ellerman.id.au>
In-Reply-To: <7a5b63a0-2bf5-60bb-8678-b7b36671a29c@zx2c4.com>

"Jason A. Donenfeld" <Jason@zx2c4.com> writes:
> Hi folks,
>
> I'm actually still experiencing this sporadically in the WireGuard test 
> suite, which you can see being run on https://build.wireguard.com/ . 

Fancy dashboard you got there :)

> About 50% of the time the powerpc64 build will fail at a place like this:
>
> [   65.147823] Oops: Exception in kernel mode, sig: 4 [#1]
> [   65.149198] LE PAGE_SIZE=4K MMU=Hash PREEMPT SMP NR_CPUS=4 NUMA pSeries
> [   65.149595] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.5.0-rc1+ #1
> [   65.149745] NIP:  c000000000033330 LR: c00000000007eda0 CTR: c00000000007ed80
> [   65.149934] REGS: c000000000a47970 TRAP: 0700   Not tainted  (5.5.0-rc1+)
> [   65.150032] MSR:  800000000288b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 48008288  XER: 00000000
> [   65.150352] CFAR: c0000000000332bc IRQMASK: 1
> [   65.150352] GPR00: c000000000036508 c000000000a47c00 c000000000a4c100 0000000000000001
> [   65.150352] GPR04: c000000000a50998 0000000000000000 c000000000a50908 000000000f509000
> [   65.150352] GPR08: 0000000028000000 0000000000000000 0000000000000000 c00000000ff24f00
> [   65.150352] GPR12: c00000000007ed80 c000000000ad9000 0000000000000000 0000000000000000
> [   65.150352] GPR16: 00000000008c9190 00000000008c94a8 00000000008c92f8 00000000008c98b0
> [   65.150352] GPR20: 00000000008f2f88 fffffffffffffffd 0000000000000014 0000000000e6c100
> [   65.150352] GPR24: 0000000000e6c100 0000000000000001 0000000000000000 c000000000a50998
> [   65.150352] GPR28: c000000000a9e280 c000000000a50aa4 0000000000000002 0000000000000000
> [   65.151591] NIP [c000000000033330] doorbell_try_core_ipi+0xd0/0xf0
> [   65.151750] LR [c00000000007eda0] smp_pseries_cause_ipi+0x20/0x70
> [   65.151913] Call Trace:
> [   65.152109] [c000000000a47c00] [c0000000000c7c9c] _nohz_idle_balance+0xbc/0x300 (unreliable)
> [   65.152370] [c000000000a47c30] [c000000000036508] smp_send_reschedule+0x98/0xb0
> [   65.152711] [c000000000a47c50] [c0000000000c1634] kick_ilb+0x114/0x140
> [   65.152962] [c000000000a47ca0] [c0000000000c86d8] newidle_balance+0x4e8/0x500
> [   65.153213] [c000000000a47d20] [c0000000000c8788] pick_next_task_fair+0x48/0x3a0
> [   65.153424] [c000000000a47d80] [c000000000466620] __schedule+0xf0/0x430
> [   65.153612] [c000000000a47de0] [c000000000466b04] schedule_idle+0x34/0x70
> [   65.153786] [c000000000a47e10] [c0000000000c0bc8] do_idle+0x1a8/0x220
> [   65.154121] [c000000000a47e70] [c0000000000c0e94] cpu_startup_entry+0x34/0x40
> [   65.154313] [c000000000a47ea0] [c00000000000ef1c] rest_init+0x10c/0x124
> [   65.154414] [c000000000a47ee0] [c000000000500004] start_kernel+0x568/0x594
> [   65.154585] [c000000000a47f90] [c00000000000a7cc] start_here_common+0x1c/0x330
> [   65.154854] Instruction dump:
> [   65.155191] 38210030 e8010010 7c0803a6 4e800020 3d220004 39295228 81290000 3929ffff
> [   65.155498] 7d284038 7c0004ac 5508017e 65082800 <7c00411c> e94d0178 812a0000 3929ffff
                                                      ^
Again the faulting instruction there is "msgsndp r8"

> [   65.156155] ---[ end trace 6180d12e268ffdaf ]---
> [   65.185452]
> [   66.187490] Kernel panic - not syncing: Fatal exception
>
> This is with "qemu-system-ppc64 -smp 4 -machine pseries" on QEMU 4.0.0.
>
> I'm not totally sure what's going on here. I'm emulating a pseries, and 
> using that with qemu's pseries model, and I see that selecting the 
> pseries forces the selection of 'config PPC_DOORBELL' (twice in the same 
> section, actually).

Noted.

> Then inside the kernel there appears to be some runtime CPU check for
> doorbell support.

Not really. The kernel looks at the CPU revision (PVR) and decides that
it has doorbell support.

> Is this a case in which QEMU is advertising doorbell support that TCG
> doesn't have? Or is something else happening here?

It's a gap in the emulation I guess. qemu doesn't emulate msgsndp, but
it really should because that's a supported instruction since Power8.

I suspect msgsndp wasn't implemented for TCG because TCG doesn't support
more than one thread per core, and you can only send doorbells to other
threads in the same core, and therefore there is no reason to ever use
msgsndp.

That limitation is the error message Suraj mentioned upthread, e.g.:

  $ qemu-system-ppc64 -nographic -vga none -M pseries -smp 2,threads=2 -cpu POWER8 -kernel build~/vmlinux
  qemu-system-ppc64: TCG cannot support more than 1 thread/core on a pseries machine


But I guess we've hit another case of a CPU sending itself an IPI, and
the way the sibling masks are done, CPUs are siblings of themselves, so
the sibling test passes, eg:

int doorbell_try_core_ipi(int cpu)
{
	int this_cpu = get_cpu();
	int ret = 0;

	if (cpumask_test_cpu(cpu, cpu_sibling_mask(this_cpu))) {
		doorbell_core_ipi(cpu);
		ret = 1;
	}
	...

In which case this patch should fix it.

diff --git a/arch/powerpc/kernel/dbell.c b/arch/powerpc/kernel/dbell.c
index f17ff1200eaa..e45cb9bba193 100644
--- a/arch/powerpc/kernel/dbell.c
+++ b/arch/powerpc/kernel/dbell.c
@@ -63,7 +63,7 @@ int doorbell_try_core_ipi(int cpu)
 	int this_cpu = get_cpu();
 	int ret = 0;
 
-	if (cpumask_test_cpu(cpu, cpu_sibling_mask(this_cpu))) {
+	if (cpu != this_cpu && cpumask_test_cpu(cpu, cpu_sibling_mask(this_cpu))) {
 		doorbell_core_ipi(cpu);
 		ret = 1;
 	}


The other option would be we disable CPU_FTR_DBELL if we detect we're
running under TCG. But I'm not sure we have a particularly clean way to
detect that.

cheers
