All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nathan Lynch <nathanl@linux.ibm.com>
To: linuxppc-dev@lists.ozlabs.org
Cc: Gautham R Shenoy <ego.lkml@gmail.com>,
	Nicholas Piggin <npiggin@gmail.com>
Subject: Re: missing doorbell interrupt when onlining cpu
Date: Tue, 10 Sep 2019 18:04:53 -0500	[thread overview]
Message-ID: <87o8zrrbvu.fsf@linux.ibm.com> (raw)
In-Reply-To: <87woenr6p5.fsf@linux.ibm.com>

Nathan Lynch <nathanl@linux.ibm.com> writes:

> Nathan Lynch <nathanl@linux.ibm.com> writes:
>
>> I'm hoping for some help investigating a behavior I see when doing cpu
>> hotplug under load on P9 and P8 LPARs. Occasionally, while coming online
>> a cpu will seem to get "stuck" in idle, with a pending doorbell
>> interrupt unserviced (cpu 12 here):
>>
>> cpuhp/12-70    [012] 46133.602202: cpuhp_enter:          cpu: 0012 target: 205 step: 174 (0xc000000000028920s)
>>  load.sh-8201  [014] 46133.602248: sched_waking:         comm=cpuhp/12 pid=70 prio=120 target_cpu=012
>>  load.sh-8201  [014] 46133.602251: smp_send_reschedule:  (c000000000052868) cpu=12
>>   <idle>-0     [012] 46133.602252: do_idle:              (c000000000162e08)
>>  load.sh-8201  [014] 46133.602252: smp_muxed_ipi_message_pass: (c0000000000527e8) cpu=12 msg=1
>>  load.sh-8201  [014] 46133.602253: doorbell_core_ipi:    (c00000000004d3e8) cpu=12
>>   <idle>-0     [012] 46133.602257: arch_cpu_idle:        (c000000000022d08)
>>   <idle>-0     [012] 46133.602259: pseries_lpar_idle:    (c0000000000d43c8)
>
> I should be more explicit that given my tracing configuration I would
> expect to see doorbell events etc here e.g.
>
>          <idle>-0     [012] 46133.602086: doorbell_entry:       pt_regs=0xc000000200e7fb50
>          <idle>-0     [012] 46133.602087: smp_ipi_demux_relaxed: (c0000000000530f8)
>          <idle>-0     [012] 46133.602088: scheduler_ipi:        (c00000000015e4f8)
>          <idle>-0     [012] 46133.602091: sched_wakeup:         cpuhp/12:70 [120] success=1 CPU:012
>          <idle>-0     [012] 46133.602092: sched_wakeup:         migration/12:71 [0] success=1 CPU:012
>          <idle>-0     [012] 46133.602093: doorbell_exit:        pt_regs=0xc000000200e7fb50
>
> but instead cpu 12 goes to idle.

Another clue is that I've occasionaly provoked this warning:

WARNING: CPU: 7 PID: 9045 at arch/powerpc/kernel/irq.c:282 arch_local_irq_restore+0xdc/0x150
Modules linked in:
CPU: 7 PID: 9045 Comm: offliner Not tainted 5.3.0-rc2-00190-g9b123d1ea237-dirty #45
NIP:  c00000000001d91c LR: c000000001988210 CTR: 0000000000334ee8
REGS: c00000000e19f390 TRAP: 0700   Not tainted  (5.3.0-rc2-00190-g9b123d1ea237-dirty)
MSR:  800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 44444424  XER: 20040000
CFAR: c00000000001d884 IRQMASK: 0 
GPR00: c000000001988210 c00000000e19f620 c0000000032f6200 0000000000000000 
GPR04: c00000000e589f10 0000000000000006 c00000000e19f664 c00000000395f260 
GPR08: 000000000000003b 0000000000008000 0000000000000009 0000000000000000 
GPR12: 0000000000000001 c00000001eca7780 000000000000005c 00000100106c7de0 
GPR16: 0000000000000000 0000000000000000 00000000100c0a48 0000000000000001 
GPR20: 00000000100c5748 00000001fc710000 0000000000000078 c000000003345c78 
GPR24: c0000003ffd99a00 c000000003349de0 0000000000000000 c0000003fb086c10 
GPR28: 0000000000000000 000000000000000f c0000003fb086c10 0000000000000000 
NIP [c00000000001d91c] arch_local_irq_restore+0xdc/0x150
LR [c000000001988210] _raw_spin_unlock_irqrestore+0xa0/0xd0
Call Trace:
[c00000000e19f6a0] [c000000001988210] _raw_spin_unlock_irqrestore+0xa0/0xd0
[c00000000e19f6d0] [c0000000001be920] try_to_wake_up+0x330/0xf30
[c00000000e19f7a0] [c0000000001bf5b0] wake_up_q+0x70/0xc0
[c00000000e19f7e0] [c0000000002b5a08] cpu_stop_queue_work+0xc8/0x140
[c00000000e19f850] [c0000000002b5bac] queue_stop_cpus_work+0xdc/0x160
[c00000000e19f8b0] [c0000000002b5c98] __stop_cpus+0x68/0xc0
[c00000000e19f950] [c0000000002b65ec] stop_cpus+0x5c/0x90
[c00000000e19f9a0] [c0000000002b6924] stop_machine_cpuslocked+0x194/0x1f0
[c00000000e19fa10] [c00000000016c768] takedown_cpu+0x98/0x260
[c00000000e19fad0] [c00000000016cea4] cpuhp_invoke_callback+0x114/0xf40
[c00000000e19fb60] [c00000000017194c] _cpu_down+0x19c/0x320
[c00000000e19fbd0] [c00000000016ff58] do_cpu_down+0x68/0xb0
[c00000000e19fc10] [c000000000ccccd4] cpu_subsys_offline+0x24/0x40
[c00000000e19fc30] [c000000000cc2860] device_offline+0x100/0x140
[c00000000e19fc70] [c000000000cc2a00] online_store+0x70/0xf0
[c00000000e19fcb0] [c000000000cbcee8] dev_attr_store+0x38/0x60
[c00000000e19fcd0] [c00000000059c970] sysfs_kf_write+0x70/0xb0
[c00000000e19fd10] [c00000000059afa8] kernfs_fop_write+0xf8/0x280
[c00000000e19fd60] [c0000000004b436c] __vfs_write+0x3c/0x70
[c00000000e19fd80] [c0000000004b8700] vfs_write+0xd0/0x220
[c00000000e19fdd0] [c0000000004b8abc] ksys_write+0x7c/0x140
[c00000000e19fe20] [c00000000000bbd8] system_call+0x5c/0x68

i.e. in arch_local_irq_restore():
		/*
		 * We should already be hard disabled here. We had bugs
		 * where that wasn't the case so let's dbl check it and
		 * warn if we are wrong. Only do that when IRQ tracing
		 * is enabled as mfmsr() can be costly.
		 */
		if (WARN_ON_ONCE(mfmsr() & MSR_EE))
			__hard_irq_disable();

Anyway, I've proposed a fix:

https://patchwork.ozlabs.org/patch/1160572/

      reply	other threads:[~2019-09-10 23:06 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-04 22:51 missing doorbell interrupt when onlining cpu Nathan Lynch
2019-09-04 23:18 ` Nathan Lynch
2019-09-10 23:04   ` Nathan Lynch [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87o8zrrbvu.fsf@linux.ibm.com \
    --to=nathanl@linux.ibm.com \
    --cc=ego.lkml@gmail.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=npiggin@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.