From: Sasha Levin <sasha.levin@oracle.com>
To: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>,
	peterz@infradead.org, tglx@linutronix.de, mingo@kernel.org,
	tj@kernel.org, rusty@rustcorp.com.au, akpm@linux-foundation.org,
	fweisbec@gmail.com, hch@infradead.org
Cc: mgorman@suse.de, riel@redhat.com, bp@suse.de,
	rostedt@goodmis.org, mgalbraith@suse.de, ego@linux.vnet.ibm.com,
	paulmck@linux.vnet.ibm.com, oleg@redhat.com, rjw@rjwysocki.net,
	linux-kernel@vger.kernel.org, Dave Jones <davej@redhat.com>
Subject: Re: [PATCH v7 2/2] CPU hotplug, smp: Flush any pending IPI callbacks before CPU offline
Date: Wed, 25 Jun 2014 11:42:31 -0400
Message-ID: <53AAEDE7.8060300@oracle.com>
In-Reply-To: <20140526110831.16203.25130.stgit@srivatsabhat.in.ibm.com>

On 05/26/2014 07:08 AM, Srivatsa S. Bhat wrote:
> During CPU offline, in stop-machine, we don't enforce any rule in the
> _DISABLE_IRQ stage, regarding the order in which the outgoing CPU and the other
> CPUs disable their local interrupts. Hence, we can encounter a scenario as
> depicted below, in which IPIs are sent by the other CPUs to the CPU going
> offline (while it is *still* online), but the outgoing CPU notices them only
> *after* it has gone offline.
> 
> 
>               CPU 1                                         CPU 2
>           (Online CPU)                               (CPU going offline)
> 
>        Enter _PREPARE stage                          Enter _PREPARE stage
> 
>                                                      Enter _DISABLE_IRQ stage
> 
> 
>                                                    =
>        Got a device interrupt,                     | Didn't notice the IPI
>        and the interrupt handler                   | since interrupts were
>        called smp_call_function()                  | disabled on this CPU.
>        and sent an IPI to CPU 2.                   |
>                                                    =
> 
> 
>        Enter _DISABLE_IRQ stage
> 
> 
>        Enter _RUN stage                              Enter _RUN stage
> 
>                                   =
>        Busy loop with interrupts  |                  Invoke take_cpu_down()
>        disabled.                  |                  and take CPU 2 offline
>                                   =
> 
> 
>        Enter _EXIT stage                             Enter _EXIT stage
> 
>        Re-enable interrupts                          Re-enable interrupts
> 
>                                                      The pending IPI is noted
>                                                      immediately, but alas,
>                                                      the CPU is offline at
>                                                      this point.
> 
> 
> 
> This, of course, makes the smp-call-function IPI handler code unhappy and it
> complains about "receiving an IPI on an offline CPU".
> 
> However, if we look closely, we observe that the IPI was sent when CPU 2 was
> still online, and hence it was perfectly legal for CPU 1 to send the IPI at
> that point. Furthermore, receiving an IPI on an offline CPU is terrible only
> if there were pending callbacks yet to be executed by that CPU (in other words,
> it's a bug if the CPU went offline with work still pending).
> 
> So, fix this by flushing all the queued smp-call-function callbacks on the
> outgoing CPU in the CPU_DYING stage[1], including those callbacks for which the
> source CPU's IPIs might not have been received on the outgoing CPU yet. This
> ensures that all pending IPI callbacks are run before the CPU goes completely
> offline. Note, however, that the outgoing CPU can still get IPIs from the other
> CPUs just after it exits stop-machine, due to the scenario mentioned above.
> Because we flush the callbacks before going offline, this will be completely
> harmless.
> 
> Further, this solution also guarantees that there will be pending callbacks
> on an offline CPU *only if* the source CPU initiated the IPI-send-procedure
> *after* the target CPU went offline, which clearly indicates a bug in the
> sender code.
> 
> So, considering all this, teach the smp-call-function IPI handler code to
> complain only if an offline CPU received an IPI *and* it still had pending
> callbacks to execute, since that is the only buggy scenario.
> 
> There is another case (somewhat theoretical though) where IPIs might arrive
> late on the target CPU (possibly _after_ the CPU has gone offline): due to IPI
> latencies in the hardware. But with this patch, even this scenario turns out
> to be harmless, since we explicitly loop through the call_single_queue and
> flush out any pending callbacks without waiting for the corresponding IPIs
> to arrive.
> 
> 
> [1]. The CPU_DYING part needs a little more explanation: by the time we
> execute the CPU_DYING notifier callbacks, the CPU would have already been
> marked offline. But we want to flush out the pending callbacks at this stage,
> ignoring the fact that the CPU is offline. So restructure the IPI handler
> code so that we can by-pass the "is-cpu-offline?" check in this particular
> case. (Of course, the right solution here is to fix CPU hotplug to mark the
> CPU offline _after_ invoking the CPU_DYING notifiers, but this requires a
> lot of auditing to ensure that this change doesn't break any existing code;
> hence let's go with the solution proposed above until that is done).
> 
> Suggested-by: Frederic Weisbecker <fweisbec@gmail.com>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
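
For readers following the description above, here is a minimal userspace sketch of the rule it proposes: flush the queue unconditionally on the CPU_DYING path, and have the IPI handler complain only when the CPU is offline *and* callbacks are still pending. This is not the patch itself. Every name here (struct cpu_model, model_flush_queue() and friends) is hypothetical, the singly linked list merely stands in for call_single_queue, and callback ordering and locking are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these names come from the kernel. */
struct callback {
	struct callback *next;
	void (*func)(void *info);
	void *info;
};

struct cpu_model {
	bool online;
	struct callback *queue;	/* stand-in for the per-CPU call_single_queue */
	unsigned int warnings;	/* times the handler would have complained */
	unsigned int executed;	/* callbacks actually run */
};

/* Example callback: bumps an int counter passed through info. */
void model_count_hit(void *info)
{
	(*(int *)info)++;
}

/* Sender side: queue a callback. In the kernel an IPI would follow. */
void model_queue_callback(struct cpu_model *cpu, struct callback *cb)
{
	assert(cb->func != NULL);
	cb->next = cpu->queue;
	cpu->queue = cb;
}

/*
 * Run everything queued on this CPU. The warning fires only when the
 * caller asks for the check, the CPU is offline, and work is pending.
 */
void model_flush_queue(struct cpu_model *cpu, bool warn_if_offline)
{
	struct callback *cb = cpu->queue;

	cpu->queue = NULL;
	if (warn_if_offline && !cpu->online && cb != NULL)
		cpu->warnings++;

	while (cb != NULL) {
		struct callback *next = cb->next;

		cb->func(cb->info);
		cpu->executed++;
		cb = next;
	}
}

/* IPI arrival: the offline check is active. */
void model_ipi_handler(struct cpu_model *cpu)
{
	model_flush_queue(cpu, true);
}

/*
 * CPU_DYING path: the CPU is already marked offline here, so the
 * offline check is bypassed and the queue is simply drained.
 */
void model_cpu_dying(struct cpu_model *cpu)
{
	cpu->online = false;
	model_flush_queue(cpu, false);
}
```

In this model, a callback queued while the CPU is online and drained by model_cpu_dying() produces no warning, and neither does a late model_ipi_handler() call that finds an empty queue. Only a callback queued after the CPU went offline bumps the warning counter, which matches the "only buggy scenario" in the description.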

Hi all,

While fuzzing with trinity inside a KVM tools guest running the latest -next
kernel, I've stumbled on the following spew:

[ 1982.600053] kernel BUG at kernel/irq_work.c:175!
[ 1982.600053] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 1982.600053] Dumping ftrace buffer:
[ 1982.600053]    (ftrace buffer empty)
[ 1982.600053] Modules linked in:
[ 1982.600053] CPU: 14 PID: 168 Comm: migration/14 Not tainted 3.16.0-rc2-next-20140624-sasha-00024-g332b58d #726
[ 1982.600053] task: ffff88036a5a3000 ti: ffff88036a5ac000 task.ti: ffff88036a5ac000
[ 1982.600053] RIP: irq_work_run (kernel/irq_work.c:175 (discriminator 1))
[ 1982.600053] RSP: 0000:ffff88036a5afbe0  EFLAGS: 00010046
[ 1982.600053] RAX: 0000000080000001 RBX: 0000000000000000 RCX: 0000000000000008
[ 1982.600053] RDX: 000000000000000e RSI: ffffffffaf9185fb RDI: 0000000000000000
[ 1982.600053] RBP: ffff88036a5afc08 R08: 0000000000099224 R09: 0000000000000000
[ 1982.600053] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88036afd8400
[ 1982.600053] R13: 0000000000000000 R14: ffffffffb0cf8120 R15: ffffffffb0cce5d0
[ 1982.600053] FS:  0000000000000000(0000) GS:ffff88036ae00000(0000) knlGS:0000000000000000
[ 1982.600053] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1982.600053] CR2: 00000000019485d0 CR3: 00000002c7c8f000 CR4: 00000000000006a0
[ 1982.600053] Stack:
[ 1982.600053]  ffffffffab20fbb5 0000000000000082 ffff88036afd8440 0000000000000000
[ 1982.600053]  0000000000000001 ffff88036a5afc28 ffffffffab20fca7 0000000000000000
[ 1982.600053]  00000000ffffffef ffff88036a5afc78 ffffffffab19c58e 000000000000000e
[ 1982.600053] Call Trace:
[ 1982.600053] ? flush_smp_call_function_queue (kernel/smp.c:263)
[ 1982.600053] hotplug_cfd (kernel/smp.c:81)
[ 1982.600053] notifier_call_chain (kernel/notifier.c:95)
[ 1982.600053] __raw_notifier_call_chain (kernel/notifier.c:395)
[ 1982.600053] __cpu_notify (kernel/cpu.c:202)
[ 1982.600053] cpu_notify (kernel/cpu.c:211)
[ 1982.600053] take_cpu_down (./arch/x86/include/asm/current.h:14 kernel/cpu.c:312)
[ 1982.600053] multi_cpu_stop (kernel/stop_machine.c:201)
[ 1982.600053] ? __stop_cpus (kernel/stop_machine.c:170)
[ 1982.600053] cpu_stopper_thread (kernel/stop_machine.c:474)
[ 1982.600053] ? put_lock_stats.isra.12 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254)
[ 1982.600053] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/paravirt.h:809 include/linux/spinlock_api_smp.h:160 kernel/locking/spinlock.c:191)
[ 1982.600053] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[ 1982.600053] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2557 kernel/locking/lockdep.c:2599)
[ 1982.600053] smpboot_thread_fn (kernel/smpboot.c:160)
[ 1982.600053] ? __smpboot_create_thread (kernel/smpboot.c:105)
[ 1982.600053] kthread (kernel/kthread.c:210)
[ 1982.600053] ? wait_for_completion (kernel/sched/completion.c:77 kernel/sched/completion.c:93 kernel/sched/completion.c:101 kernel/sched/completion.c:122)
[ 1982.600053] ? kthread_create_on_node (kernel/kthread.c:176)
[ 1982.600053] ret_from_fork (arch/x86/kernel/entry_64.S:349)
[ 1982.600053] ? kthread_create_on_node (kernel/kthread.c:176)
[ 1982.600053] Code: 00 00 00 00 e8 63 ff ff ff 48 83 c4 08 b8 01 00 00 00 5b 5d c3 b8 01 00 00 00 c3 90 65 8b 04 25 a0 da 00 00 a9 00 00 0f 00 75 09 <0f> 0b 0f 1f 80 00 00 00 00 55 48 89 e5 e8 2f ff ff ff 5d c3 66
All code
========
   0:	00 00                	add    %al,(%rax)
   2:	00 00                	add    %al,(%rax)
   4:	e8 63 ff ff ff       	callq  0xffffffffffffff6c
   9:	48 83 c4 08          	add    $0x8,%rsp
   d:	b8 01 00 00 00       	mov    $0x1,%eax
  12:	5b                   	pop    %rbx
  13:	5d                   	pop    %rbp
  14:	c3                   	retq
  15:	b8 01 00 00 00       	mov    $0x1,%eax
  1a:	c3                   	retq
  1b:	90                   	nop
  1c:	65 8b 04 25 a0 da 00 	mov    %gs:0xdaa0,%eax
  23:	00
  24:	a9 00 00 0f 00       	test   $0xf0000,%eax
  29:	75 09                	jne    0x34
  2b:*	0f 0b                	ud2    		<-- trapping instruction
  2d:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  34:	55                   	push   %rbp
  35:	48 89 e5             	mov    %rsp,%rbp
  38:	e8 2f ff ff ff       	callq  0xffffffffffffff6c
  3d:	5d                   	pop    %rbp
  3e:	c3                   	retq
  3f:	66                   	data16
	...

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
   9:	55                   	push   %rbp
   a:	48 89 e5             	mov    %rsp,%rbp
   d:	e8 2f ff ff ff       	callq  0xffffffffffffff41
  12:	5d                   	pop    %rbp
  13:	c3                   	retq
  14:	66                   	data16
	...
[ 1982.600053] RIP irq_work_run (kernel/irq_work.c:175 (discriminator 1))
[ 1982.600053]  RSP <ffff88036a5afbe0>
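
As a reading aid for the disassembly above: the trapping ud2 is reached only when `test $0xf0000,%eax` finds none of the masked bits set, and the register dump shows RAX as 0x80000001 at that point. The sketch below models just that arithmetic. Treating the value loaded from %gs:0xdaa0 as a preempt count and the 0xf0000 mask as its hardirq-nesting field is my assumption, not something the dump states, and all the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask copied from the `test $0xf0000,%eax` instruction in the dump. */
#define MODEL_HARDIRQ_MASK	0x000f0000u
#define MODEL_HARDIRQ_SHIFT	16

/*
 * Nesting depth held in the masked bits. Reading this field as the
 * hardirq depth is an assumption made for illustration.
 */
unsigned int model_hardirq_depth(uint32_t count)
{
	unsigned int depth = (count & MODEL_HARDIRQ_MASK) >> MODEL_HARDIRQ_SHIFT;

	assert(depth <= 0xf);
	return depth;
}

/*
 * True when the jne would not be taken, i.e. execution falls through
 * to the ud2 because none of the masked bits are set.
 */
bool model_check_would_trap(uint32_t count)
{
	return (count & MODEL_HARDIRQ_MASK) == 0;
}
```

With the RAX value from the dump, model_check_would_trap(0x80000001u) is true. That fits the call trace, where irq_work_run is reached from hotplug_cfd in the migration/14 stopper thread rather than from an interrupt handler.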


Thanks,
Sasha
