* [PATCH v6 1/3] smp: Print more useful debug info upon receiving IPI on an offline CPU
2014-05-23 10:11 [PATCH v6 0/3] CPU hotplug: Fix the long-standing "IPI to offline CPU" issue Srivatsa S. Bhat
@ 2014-05-23 10:12 ` Srivatsa S. Bhat
2014-05-23 10:12 ` [PATCH v6 2/3] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU" Srivatsa S. Bhat
2014-05-23 10:12 ` [PATCH v6 3/3] CPU hotplug, smp: Flush any pending IPI callbacks before CPU offline Srivatsa S. Bhat
2 siblings, 0 replies; 20+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-23 10:12 UTC (permalink / raw)
To: peterz, tglx, mingo, tj, rusty, akpm, fweisbec, hch
Cc: mgorman, riel, bp, rostedt, mgalbraith, ego, paulmck, oleg, rjw,
linux-kernel, srivatsa.bhat
Today the smp-call-function code just prints a warning if we get an IPI on
an offline CPU. That is sufficient to let us know that something went wrong,
but from that warning alone it is often very hard to figure out exactly who
sent the IPI and why.
In most cases, we get the warning about the IPI to an offline CPU, immediately
after the CPU going offline comes out of the stop-machine phase and reenables
interrupts. Since all online CPUs participate in stop-machine, the information
regarding the sender of the IPI is already lost by the time we exit the
stop-machine loop. So even if we dump the stack on each CPU at this point,
we won't find anything useful since all of them will show the stack-trace of
the stopper thread. So we need a better way to figure out who sent the IPI and
why.
To achieve this, when we detect an IPI targeted at an offline CPU, loop through
the call-single-data linked list and print out the payload (i.e., the name
of the function which was supposed to be executed by the target CPU). This
gives us insight into who might have sent the IPI and helps us debug the
issue further.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
kernel/smp.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/kernel/smp.c b/kernel/smp.c
index 06d574e..306f818 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -185,14 +185,26 @@ void generic_smp_call_function_single_interrupt(void)
 {
 	struct llist_node *entry;
 	struct call_single_data *csd, *csd_next;
+	static bool warned;
+
+	entry = llist_del_all(&__get_cpu_var(call_single_queue));
+	entry = llist_reverse_order(entry);
 
 	/*
 	 * Shouldn't receive this interrupt on a cpu that is not yet online.
 	 */
-	WARN_ON_ONCE(!cpu_online(smp_processor_id()));
+	if (unlikely(!cpu_online(smp_processor_id()) && !warned)) {
+		warned = true;
+		WARN(1, "IPI on offline CPU %d\n", smp_processor_id());
 
-	entry = llist_del_all(&__get_cpu_var(call_single_queue));
-	entry = llist_reverse_order(entry);
+		/*
+		 * We don't have to use the _safe() variant here
+		 * because we are not invoking the IPI handlers yet.
+		 */
+		llist_for_each_entry(csd, entry, llist)
+			pr_warn("IPI callback %pS sent to offline CPU\n",
+				csd->func);
+	}
 
 	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
 		csd->func(csd->info);
^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v6 2/3] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU"
2014-05-23 10:11 [PATCH v6 0/3] CPU hotplug: Fix the long-standing "IPI to offline CPU" issue Srivatsa S. Bhat
2014-05-23 10:12 ` [PATCH v6 1/3] smp: Print more useful debug info upon receiving IPI on an offline CPU Srivatsa S. Bhat
@ 2014-05-23 10:12 ` Srivatsa S. Bhat
2014-05-23 13:22 ` Frederic Weisbecker
2014-05-23 15:21 ` Peter Zijlstra
2014-05-23 10:12 ` [PATCH v6 3/3] CPU hotplug, smp: Flush any pending IPI callbacks before CPU offline Srivatsa S. Bhat
2 siblings, 2 replies; 20+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-23 10:12 UTC (permalink / raw)
To: peterz, tglx, mingo, tj, rusty, akpm, fweisbec, hch
Cc: mgorman, riel, bp, rostedt, mgalbraith, ego, paulmck, oleg, rjw,
linux-kernel, srivatsa.bhat
During CPU offline, stop-machine is used to take control over all the online
CPUs (via the per-cpu stopper thread) and then run take_cpu_down() on the CPU
that is to be taken offline.
But stop-machine itself has several stages: _PREPARE, _DISABLE_IRQ, _RUN etc.
The important thing to note here is that the _DISABLE_IRQ stage comes quite
some time after stop-machine starts, and hence there is a large window during
which other CPUs can send IPIs to the CPU going offline. As a result, we can
encounter a scenario such as the one depicted below, where an IPI is sent to
the CPU going offline, and that CPU notices it *after* it has gone offline,
triggering the "IPI-to-offline-CPU" warning from the smp-call-function code.
      CPU 1                                      CPU 2
   (Online CPU)                            (CPU going offline)

   Enter _PREPARE stage                    Enter _PREPARE stage

                                           Enter _DISABLE_IRQ stage

                                          =
   Got a device interrupt,                | Didn't notice the IPI
   and the interrupt handler              | since interrupts were
   called smp_call_function()             | disabled on this CPU.
   and sent an IPI to CPU 2.              |
                                          =

   Enter _DISABLE_IRQ stage

   Enter _RUN stage                        Enter _RUN stage

                                          =
   Busy loop with interrupts              | Invoke take_cpu_down()
   disabled.                              | and take CPU 2 offline
                                          =

   Enter _EXIT stage                       Enter _EXIT stage

   Re-enable interrupts                    Re-enable interrupts

                                           The pending IPI is noted
                                           immediately, but alas,
                                           the CPU is offline at
                                           this point.
So, as we can observe from this scenario, the IPI was sent when CPU 2 was
still online, and hence it was perfectly legal. But unfortunately it was
noted only after CPU 2 went offline, resulting in the warning from the
IPI handling code. In other words, the fault was not at the sender, but
at the receiver side - and if we look closely, the real bug is in the
stop-machine sequence itself.
The problem here is that the CPU going offline disabled its local interrupts
(by entering _DISABLE_IRQ phase) *before* the other CPUs. And that's the
reason why it was not able to respond to the IPI before going offline.
A simple solution to this problem is to ensure that the CPU going offline
disables its interrupts only *after* the other CPUs do the same thing.
To achieve this, split the _DISABLE_IRQ state into 2 parts:
1st part: MULTI_STOP_DISABLE_IRQ_INACTIVE, where only the non-active CPUs
(i.e., the "other" CPUs) disable their interrupts.
2nd part: MULTI_STOP_DISABLE_IRQ_ACTIVE, where the active CPU (i.e., the
CPU going offline) disables its interrupts.
With this in place, the CPU going offline will always be the last one to
disable interrupts. After this step, no further IPIs can be sent to the
outgoing CPU, since all the other CPUs would be executing the stop-machine
code with interrupts disabled. And by the time stop-machine ends, the CPU
would have gone offline and disappeared from the cpu_online_mask, and hence
future invocations of smp_call_function() and friends will automatically
prune that CPU out. Thus, we can guarantee that no CPU will end up
*inadvertently* sending IPIs to an offline CPU.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
kernel/stop_machine.c | 39 ++++++++++++++++++++++++++++++++++-----
1 file changed, 34 insertions(+), 5 deletions(-)
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 01fbae5..288f7fe 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -130,8 +130,10 @@ enum multi_stop_state {
 	MULTI_STOP_NONE,
 	/* Awaiting everyone to be scheduled. */
 	MULTI_STOP_PREPARE,
-	/* Disable interrupts. */
-	MULTI_STOP_DISABLE_IRQ,
+	/* Disable interrupts on CPUs not in ->active_cpus mask. */
+	MULTI_STOP_DISABLE_IRQ_INACTIVE,
+	/* Disable interrupts on CPUs in ->active_cpus mask. */
+	MULTI_STOP_DISABLE_IRQ_ACTIVE,
 	/* Run the function */
 	MULTI_STOP_RUN,
 	/* Exit */
@@ -189,12 +191,39 @@ static int multi_cpu_stop(void *data)
 	do {
 		/* Chill out and ensure we re-read multi_stop_state. */
 		cpu_relax();
+
+		/*
+		 * We use 2 separate stages to disable interrupts, namely
+		 * _INACTIVE and _ACTIVE, to ensure that the inactive CPUs
+		 * disable their interrupts first, followed by the active CPUs.
+		 *
+		 * This is done to avoid a race in the CPU offline path, which
+		 * can lead to receiving IPIs on the outgoing CPU *after* it
+		 * has gone offline.
+		 *
+		 * During CPU offline, we don't want the other CPUs to send
+		 * IPIs to the active_cpu (the outgoing CPU) *after* it has
+		 * disabled interrupts (because, then it will notice the IPIs
+		 * only after it has gone offline). We can prevent this by
+		 * making the other CPUs disable their interrupts first - that
+		 * way, they will run the stop-machine code with interrupts
+		 * disabled, and hence won't send IPIs after that point.
+		 */
+
 		if (msdata->state != curstate) {
 			curstate = msdata->state;
 			switch (curstate) {
-			case MULTI_STOP_DISABLE_IRQ:
-				local_irq_disable();
-				hard_irq_disable();
+			case MULTI_STOP_DISABLE_IRQ_INACTIVE:
+				if (!is_active) {
+					local_irq_disable();
+					hard_irq_disable();
+				}
+				break;
+			case MULTI_STOP_DISABLE_IRQ_ACTIVE:
+				if (is_active) {
+					local_irq_disable();
+					hard_irq_disable();
+				}
 				break;
 			case MULTI_STOP_RUN:
 				if (is_active)
* Re: [PATCH v6 2/3] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU"
2014-05-23 10:12 ` [PATCH v6 2/3] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU" Srivatsa S. Bhat
@ 2014-05-23 13:22 ` Frederic Weisbecker
2014-05-23 14:45 ` Srivatsa S. Bhat
2014-05-23 15:21 ` Peter Zijlstra
1 sibling, 1 reply; 20+ messages in thread
From: Frederic Weisbecker @ 2014-05-23 13:22 UTC (permalink / raw)
To: Srivatsa S. Bhat
Cc: peterz, tglx, mingo, tj, rusty, akpm, hch, mgorman, riel, bp,
rostedt, mgalbraith, ego, paulmck, oleg, rjw, linux-kernel
On Fri, May 23, 2014 at 03:42:20PM +0530, Srivatsa S. Bhat wrote:
> During CPU offline, stop-machine is used to take control over all the online
> CPUs (via the per-cpu stopper thread) and then run take_cpu_down() on the CPU
> that is to be taken offline.
> 
> [...]
> 
> -			case MULTI_STOP_DISABLE_IRQ:
> -				local_irq_disable();
> -				hard_irq_disable();
> +			case MULTI_STOP_DISABLE_IRQ_INACTIVE:
> +				if (!is_active) {
> +					local_irq_disable();
> +					hard_irq_disable();
> +				}
> +				break;
> +			case MULTI_STOP_DISABLE_IRQ_ACTIVE:
> +				if (is_active) {
> +					local_irq_disable();
> +					hard_irq_disable();
> +				}
Do we actually need that now that we are flushing the ipi queue on CPU dying?
> break;
> case MULTI_STOP_RUN:
> if (is_active)
>
* Re: [PATCH v6 2/3] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU"
2014-05-23 13:22 ` Frederic Weisbecker
@ 2014-05-23 14:45 ` Srivatsa S. Bhat
2014-05-23 15:04 ` Frederic Weisbecker
2014-05-23 15:12 ` Peter Zijlstra
0 siblings, 2 replies; 20+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-23 14:45 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: peterz, tglx, mingo, tj, rusty, akpm, hch, mgorman, riel, bp,
rostedt, mgalbraith, ego, paulmck, oleg, rjw, linux-kernel
On 05/23/2014 06:52 PM, Frederic Weisbecker wrote:
> On Fri, May 23, 2014 at 03:42:20PM +0530, Srivatsa S. Bhat wrote:
>> During CPU offline, stop-machine is used to take control over all the online
>> CPUs (via the per-cpu stopper thread) and then run take_cpu_down() on the CPU
>> that is to be taken offline.
>> 
>> [...]
>> 
>> -			case MULTI_STOP_DISABLE_IRQ:
>> -				local_irq_disable();
>> -				hard_irq_disable();
>> +			case MULTI_STOP_DISABLE_IRQ_INACTIVE:
>> +				if (!is_active) {
>> +					local_irq_disable();
>> +					hard_irq_disable();
>> +				}
>> +				break;
>> +			case MULTI_STOP_DISABLE_IRQ_ACTIVE:
>> +				if (is_active) {
>> +					local_irq_disable();
>> +					hard_irq_disable();
>> +				}
>
> Do we actually need that now that we are flushing the ipi queue on CPU dying?
>
Yes, we do. Flushing the IPI queue is one thing - it guarantees that a CPU
doesn't go offline without finishing its work. Not receiving IPIs after going
offline is a different thing - it helps avoid warnings from the IPI handling
code (although it will be harmless if the queue had been flushed earlier).
So I think it is good to have both, so that we can keep CPU offline very
clean - no pending work left around, as well as no possibility of (real or
spurious) warnings.
Regards,
Srivatsa S. Bhat
* Re: [PATCH v6 2/3] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU"
2014-05-23 14:45 ` Srivatsa S. Bhat
@ 2014-05-23 15:04 ` Frederic Weisbecker
2014-05-23 15:24 ` Srivatsa S. Bhat
2014-05-23 15:12 ` Peter Zijlstra
1 sibling, 1 reply; 20+ messages in thread
From: Frederic Weisbecker @ 2014-05-23 15:04 UTC (permalink / raw)
To: Srivatsa S. Bhat
Cc: peterz, tglx, mingo, tj, rusty, akpm, hch, mgorman, riel, bp,
rostedt, mgalbraith, ego, paulmck, oleg, rjw, linux-kernel
On Fri, May 23, 2014 at 08:15:35PM +0530, Srivatsa S. Bhat wrote:
> On 05/23/2014 06:52 PM, Frederic Weisbecker wrote:
> > On Fri, May 23, 2014 at 03:42:20PM +0530, Srivatsa S. Bhat wrote:
> >> During CPU offline, stop-machine is used to take control over all the online
> >> CPUs (via the per-cpu stopper thread) and then run take_cpu_down() on the CPU
> >> that is to be taken offline.
> >> 
> >> [...]
> >> 
> >> -			case MULTI_STOP_DISABLE_IRQ:
> >> -				local_irq_disable();
> >> -				hard_irq_disable();
> >> +			case MULTI_STOP_DISABLE_IRQ_INACTIVE:
> >> +				if (!is_active) {
> >> +					local_irq_disable();
> >> +					hard_irq_disable();
> >> +				}
> >> +				break;
> >> +			case MULTI_STOP_DISABLE_IRQ_ACTIVE:
> >> +				if (is_active) {
> >> +					local_irq_disable();
> >> +					hard_irq_disable();
> >> +				}
> >
> > Do we actually need that now that we are flushing the ipi queue on CPU dying?
> >
>
> Yes, we do. Flushing the IPI queue is one thing - it guarantees that a CPU
> doesn't go offline without finishing its work. Not receiving IPIs after going
> offline is a different thing - it helps avoid warnings from the IPI handling
> code (although it will be harmless if the queue had been flushed earlier).
I'm confused. Perhaps I don't understand well how things mix up. How does it avoid the warning?
Isn't there still a risk that some IPIs don't fire due to hardware latency?
I mean, either we do:

    local_irq_enable()
    wait_for_pending_ipi()
    local_irq_disable()

Or we do:

    hotplug_cpu_down {
        flush_ipi()
    }

But something in between looks broken:

    local_irq_disable()
    local_irq_enable()

    flush_ipi()
>
> So I think it is good to have both, so that we can keep CPU offline very
> clean - no pending work left around, as well as no possibility of (real or
> spurious) warnings.
Ah, maybe what you want to avoid is this:
    CPU 0                     CPU 1
    -------------------------
    send IPI to 1
                              flush_ipi()
                              set_cpu_offline()
                              get_ipi()
                              // get late IPI but queue is flushed already
                              smp_single_function_interrupt() {
                                  WARN()
Yeah, but still, your patch doesn't deal with late hardware IPIs.
How about we move the warning to the IPI callback iterator:

-	WARN_ON_ONCE(cpu_is_offline())
 	llist_for_each(...) {
+		WARN_ON_ONCE(cpu_is_offline())
 		csd->func()
 	}
Since what matters is that all functions are executed before going offline.
* Re: [PATCH v6 2/3] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU"
2014-05-23 15:04 ` Frederic Weisbecker
@ 2014-05-23 15:24 ` Srivatsa S. Bhat
0 siblings, 0 replies; 20+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-23 15:24 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: peterz, tglx, mingo, tj, rusty, akpm, hch, mgorman, riel, bp,
rostedt, mgalbraith, ego, paulmck, oleg, rjw, linux-kernel
On 05/23/2014 08:34 PM, Frederic Weisbecker wrote:
> On Fri, May 23, 2014 at 08:15:35PM +0530, Srivatsa S. Bhat wrote:
>> On 05/23/2014 06:52 PM, Frederic Weisbecker wrote:
>>> On Fri, May 23, 2014 at 03:42:20PM +0530, Srivatsa S. Bhat wrote:
>>>> During CPU offline, stop-machine is used to take control over all the online
>>>> CPUs (via the per-cpu stopper thread) and then run take_cpu_down() on the CPU
>>>> that is to be taken offline.
>>>>
[...]
>>>> kernel/stop_machine.c | 39 ++++++++++++++++++++++++++++++++++-----
>>>> 1 file changed, 34 insertions(+), 5 deletions(-)
>>>>
>>>> [...]
>>>>
>>>> -			case MULTI_STOP_DISABLE_IRQ:
>>>> -				local_irq_disable();
>>>> -				hard_irq_disable();
>>>> +			case MULTI_STOP_DISABLE_IRQ_INACTIVE:
>>>> +				if (!is_active) {
>>>> +					local_irq_disable();
>>>> +					hard_irq_disable();
>>>> +				}
>>>> +				break;
>>>> +			case MULTI_STOP_DISABLE_IRQ_ACTIVE:
>>>> +				if (is_active) {
>>>> +					local_irq_disable();
>>>> +					hard_irq_disable();
>>>> +				}
>>>
>>> Do we actually need that now that we are flushing the ipi queue on CPU dying?
>>>
>>
>> Yes, we do. Flushing the IPI queue is one thing - it guarantees that a CPU
>> doesn't go offline without finishing its work. Not receiving IPIs after going
>> offline is a different thing - it helps avoid warnings from the IPI handling
>> code (although it will be harmless if the queue had been flushed earlier).
>
> I'm confused. Perhaps I don't understand well how things mix up. How does it avoid the warning?
> Isn't there still a risk that some IPIs don't fire due to hardware latency?
>
> I mean either we do:
>
> local_irq_enable()
> wait_for_pending_ipi()
> local_irq_disable()
>
> Or we do
>
> hotplug_cpu_down {
> flush_ipi()
> }
>
> But something in between looks broken:
>
> local_irq_disable()
> local_irq_enable()
>
> flush_ipi()
>
>
>>
>> So I think it is good to have both, so that we can keep CPU offline very
>> clean - no pending work left around, as well as no possibility of (real or
>> spurious) warnings.
>
> Ah, maybe what you want to avoid is this:
>
> CPU 0 CPU 1
> -------------------------
>
> send IPI to 1
>
> flush_ipi()
> set_cpu_offline()
> get_ipi()
> //get late IPI but queue is flushed already
> smp_single_function_interrupt() {
> WARN()
>
> Yeah but still, your patch doesn't deal with late hardware IPIs.
> How about we move the warning to the IPI callback iterator:
>
> - WARN_ON_ONCE(cpu_is_offline())
>
> llist_for_each(...) {
> + WARN_ON_ONCE(cpu_is_offline())
> csd->func()
> }
>
> Since what matters is that all functions are executed before going offline.
>
Right, we can't do anything about getting late IPIs, but we need to warn
only if there really was work to be done and the CPU is already offline.
What you suggested above will take care of that. I'll incorporate this in
an updated patch, thank you!
Regards,
Srivatsa S. Bhat
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v6 2/3] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU"
2014-05-23 14:45 ` Srivatsa S. Bhat
2014-05-23 15:04 ` Frederic Weisbecker
@ 2014-05-23 15:12 ` Peter Zijlstra
2014-05-23 15:18 ` Srivatsa S. Bhat
1 sibling, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2014-05-23 15:12 UTC (permalink / raw)
To: Srivatsa S. Bhat
Cc: Frederic Weisbecker, tglx, mingo, tj, rusty, akpm, hch, mgorman,
riel, bp, rostedt, mgalbraith, ego, paulmck, oleg, rjw,
linux-kernel
On Fri, May 23, 2014 at 08:15:35PM +0530, Srivatsa S. Bhat wrote:
> >> + * During CPU offline, we don't want the other CPUs to send
> >> + * IPIs to the active_cpu (the outgoing CPU) *after* it has
> >> + * disabled interrupts (because, then it will notice the IPIs
> >> + * only after it has gone offline). We can prevent this by
> >> + * making the other CPUs disable their interrupts first - that
> >> + * way, they will run the stop-machine code with interrupts
> >> + * disabled, and hence won't send IPIs after that point.
That's complete nonsense, you can send IPIs all you want with interrupts
disabled.
* Re: [PATCH v6 2/3] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU"
2014-05-23 15:12 ` Peter Zijlstra
@ 2014-05-23 15:18 ` Srivatsa S. Bhat
2014-05-23 15:31 ` Peter Zijlstra
0 siblings, 1 reply; 20+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-23 15:18 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Frederic Weisbecker, tglx, mingo, tj, rusty, akpm, hch, mgorman,
riel, bp, rostedt, mgalbraith, ego, paulmck, oleg, rjw,
linux-kernel
On 05/23/2014 08:42 PM, Peter Zijlstra wrote:
> On Fri, May 23, 2014 at 08:15:35PM +0530, Srivatsa S. Bhat wrote:
>>>> + * During CPU offline, we don't want the other CPUs to send
>>>> + * IPIs to the active_cpu (the outgoing CPU) *after* it has
>>>> + * disabled interrupts (because, then it will notice the IPIs
>>>> + * only after it has gone offline). We can prevent this by
>>>> + * making the other CPUs disable their interrupts first - that
>>>> + * way, they will run the stop-machine code with interrupts
>>>> + * disabled, and hence won't send IPIs after that point.
>
> That's complete nonsense, you can send IPIs all you want with interrupts
> disabled.
>
True, but that's not what the comment says. It says "you can't send IPIs
because you are running the *stop-machine* loop, because the stop-machine loop
doesn't send IPIs itself! The only possibility of sending IPIs from within
stop-machine is if that CPU can take an interrupt and the *interrupt handler*
sends the IPI (like what the block layer used to do) - and we precisely avoid
that possibility by disabling interrupts. So no IPIs will be sent beyond
this point.
Regards,
Srivatsa S. Bhat
* Re: [PATCH v6 2/3] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU"
2014-05-23 15:18 ` Srivatsa S. Bhat
@ 2014-05-23 15:31 ` Peter Zijlstra
2014-05-23 15:33 ` Srivatsa S. Bhat
0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2014-05-23 15:31 UTC (permalink / raw)
To: Srivatsa S. Bhat
Cc: Frederic Weisbecker, tglx, mingo, tj, rusty, akpm, hch, mgorman,
riel, bp, rostedt, mgalbraith, ego, paulmck, oleg, rjw,
linux-kernel
On Fri, May 23, 2014 at 08:48:07PM +0530, Srivatsa S. Bhat wrote:
> On 05/23/2014 08:42 PM, Peter Zijlstra wrote:
> > On Fri, May 23, 2014 at 08:15:35PM +0530, Srivatsa S. Bhat wrote:
> >>>> + * During CPU offline, we don't want the other CPUs to send
> >>>> + * IPIs to the active_cpu (the outgoing CPU) *after* it has
> >>>> + * disabled interrupts (because, then it will notice the IPIs
> >>>> + * only after it has gone offline). We can prevent this by
> >>>> + * making the other CPUs disable their interrupts first - that
> >>>> + * way, they will run the stop-machine code with interrupts
> >>>> + * disabled, and hence won't send IPIs after that point.
> >
> > That's complete nonsense, you can send IPIs all you want with interrupts
> > disabled.
> >
>
> True, but that's not what the comment says. It says "you can't send IPIs
> because you are running the *stop-machine* loop, because the stop-machine loop
> doesn't send IPIs itself! The only possibility of sending IPIs from within
> stop-machine is if that CPU can take an interrupt and the *interrupt handler*
> sends the IPI (like what the block layer used to do) - and we precisely avoid
> that possibility by disabling interrupts. So no IPIs will be sent beyond
> this point.
but one of those CPUs is running the stop machine function, which calls
CPU_DYING which runs all kinds of nonsense and therefore can send IPIs
all it wants, right?
* Re: [PATCH v6 2/3] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU"
2014-05-23 15:31 ` Peter Zijlstra
@ 2014-05-23 15:33 ` Srivatsa S. Bhat
2014-05-23 15:37 ` Srivatsa S. Bhat
0 siblings, 1 reply; 20+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-23 15:33 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Frederic Weisbecker, tglx, mingo, tj, rusty, akpm, hch, mgorman,
riel, bp, rostedt, mgalbraith, ego, paulmck, oleg, rjw,
linux-kernel
On 05/23/2014 09:01 PM, Peter Zijlstra wrote:
> On Fri, May 23, 2014 at 08:48:07PM +0530, Srivatsa S. Bhat wrote:
>> On 05/23/2014 08:42 PM, Peter Zijlstra wrote:
>>> On Fri, May 23, 2014 at 08:15:35PM +0530, Srivatsa S. Bhat wrote:
>>>>>> + * During CPU offline, we don't want the other CPUs to send
>>>>>> + * IPIs to the active_cpu (the outgoing CPU) *after* it has
>>>>>> + * disabled interrupts (because, then it will notice the IPIs
>>>>>> + * only after it has gone offline). We can prevent this by
>>>>>> + * making the other CPUs disable their interrupts first - that
>>>>>> + * way, they will run the stop-machine code with interrupts
>>>>>> + * disabled, and hence won't send IPIs after that point.
>>>
>>> That's complete nonsense, you can send IPIs all you want with interrupts
>>> disabled.
>>>
>>
>> True, but that's not what the comment says. It says "you can't send IPIs
>> because you are running the *stop-machine* loop, because the stop-machine loop
>> doesn't send IPIs itself! The only possibility of sending IPIs from within
>> stop-machine is if that CPU can take an interrupt and the *interrupt handler*
>> sends the IPI (like what the block layer used to do) - and we precisely avoid
>> that possibility by disabling interrupts. So no IPIs will be sent beyond
>> this point.
>
> but one of those CPUs is running the stop machine function, which calls
> CPU_DYING which runs all kinds of nonsense and therefore can send IPIs
> all it wants, right?
>
Yes, but that CPU certainly won't IPI itself! (We are trying to avoid getting
IPIs on precisely that CPU - the one which is about to go offline).
Regards,
Srivatsa S. Bhat
* Re: [PATCH v6 2/3] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU"
2014-05-23 15:33 ` Srivatsa S. Bhat
@ 2014-05-23 15:37 ` Srivatsa S. Bhat
2014-05-23 15:48 ` Peter Zijlstra
0 siblings, 1 reply; 20+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-23 15:37 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Frederic Weisbecker, tglx, mingo, tj, rusty, akpm, hch, mgorman,
riel, bp, rostedt, mgalbraith, ego, paulmck, oleg, rjw,
linux-kernel
On 05/23/2014 09:03 PM, Srivatsa S. Bhat wrote:
> On 05/23/2014 09:01 PM, Peter Zijlstra wrote:
>> On Fri, May 23, 2014 at 08:48:07PM +0530, Srivatsa S. Bhat wrote:
>>> On 05/23/2014 08:42 PM, Peter Zijlstra wrote:
>>>> On Fri, May 23, 2014 at 08:15:35PM +0530, Srivatsa S. Bhat wrote:
>>>>>>> + * During CPU offline, we don't want the other CPUs to send
>>>>>>> + * IPIs to the active_cpu (the outgoing CPU) *after* it has
>>>>>>> + * disabled interrupts (because, then it will notice the IPIs
>>>>>>> + * only after it has gone offline). We can prevent this by
>>>>>>> + * making the other CPUs disable their interrupts first - that
>>>>>>> + * way, they will run the stop-machine code with interrupts
>>>>>>> + * disabled, and hence won't send IPIs after that point.
>>>>
>>>> That's complete nonsense, you can send IPIs all you want with interrupts
>>>> disabled.
>>>>
>>>
>>> True, but that's not what the comment says. It says "you can't send IPIs
>>> because you are running the *stop-machine* loop, because the stop-machine loop
>>> doesn't send IPIs itself! The only possibility of sending IPIs from within
>>> stop-machine is if that CPU can take an interrupt and the *interrupt handler*
>>> sends the IPI (like what the block layer used to do) - and we precisely avoid
>>> that possibility by disabling interrupts. So no IPIs will be sent beyond
>>> this point.
>>
>> but one of those CPUs is running the stop machine function, which calls
>> CPU_DYING which runs all kinds of nonsense and therefore can send IPIs
>> all it wants, right?
>>
>
> Yes, but that CPU certainly won't IPI itself! (We are trying to avoid getting
> IPIs on precisely that CPU - the one which is about to go offline).
>
And the comment makes that distinction between the "active-cpu" and "other CPUs"
(where active-cpu is the one which runs the stop-machine function and eventually
goes offline). Thus "other CPUs" won't send IPIs after that point, because they
are running the stop-machine loop with interrupts disabled. This ensures that
the "active-cpu" doesn't get any IPIs - which is what we want.
Regards,
Srivatsa S. Bhat
* Re: [PATCH v6 2/3] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU"
2014-05-23 15:37 ` Srivatsa S. Bhat
@ 2014-05-23 15:48 ` Peter Zijlstra
2014-05-23 15:53 ` Srivatsa S. Bhat
0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2014-05-23 15:48 UTC (permalink / raw)
To: Srivatsa S. Bhat
Cc: Frederic Weisbecker, tglx, mingo, tj, rusty, akpm, hch, mgorman,
riel, bp, rostedt, mgalbraith, ego, paulmck, oleg, rjw,
linux-kernel
On Fri, May 23, 2014 at 09:07:18PM +0530, Srivatsa S. Bhat wrote:
> On 05/23/2014 09:03 PM, Srivatsa S. Bhat wrote:
> > On 05/23/2014 09:01 PM, Peter Zijlstra wrote:
> >> On Fri, May 23, 2014 at 08:48:07PM +0530, Srivatsa S. Bhat wrote:
> >>> On 05/23/2014 08:42 PM, Peter Zijlstra wrote:
> >>>> On Fri, May 23, 2014 at 08:15:35PM +0530, Srivatsa S. Bhat wrote:
> >>>>>>> + * During CPU offline, we don't want the other CPUs to send
> >>>>>>> + * IPIs to the active_cpu (the outgoing CPU) *after* it has
> >>>>>>> + * disabled interrupts (because, then it will notice the IPIs
> >>>>>>> + * only after it has gone offline). We can prevent this by
> >>>>>>> + * making the other CPUs disable their interrupts first - that
> >>>>>>> + * way, they will run the stop-machine code with interrupts
> >>>>>>> + * disabled, and hence won't send IPIs after that point.
> >>>>
> >>>> That's complete nonsense, you can send IPIs all you want with interrupts
> >>>> disabled.
> >>>>
> >>>
> >>> True, but that's not what the comment says. It says "you can't send IPIs
> >>> because you are running the *stop-machine* loop, because the stop-machine loop
> >>> doesn't send IPIs itself! The only possibility of sending IPIs from within
> >>> stop-machine is if that CPU can take an interrupt and the *interrupt handler*
> >>> sends the IPI (like what the block layer used to do) - and we precisely avoid
> >>> that possibility by disabling interrupts. So no IPIs will be sent beyond
> >>> this point.
> >>
> >> but one of those CPUs is running the stop machine function, which calls
> >> CPU_DYING which runs all kinds of nonsense and therefore can send IPIs
> >> all it wants, right?
> >>
> >
> > Yes, but that CPU certainly won't IPI itself! (We are trying to avoid getting
> > IPIs on precisely that CPU - the one which is about to go offline).
> >
>
> And the comment makes that distinction between the "active-cpu" and "other CPUs"
> (where active-cpu is the one which runs the stop-machine function and eventually
> goes offline). Thus "other CPUs" won't send IPIs after that point, because they
> are running the stop-machine loop with interrupts disabled. This ensures that
> the "active-cpu" doesn't get any IPIs - which is what we want.
OK, so clearly I'm having trouble reading today :/ Makes sense now.
> But yes, it's unlikely for CPU_DYING to self-IPI, although if you really
want, I can do ;-)
And I guess the one extra state doesn't hurt too bad for
stop_two_cpus().
* Re: [PATCH v6 2/3] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU"
2014-05-23 15:48 ` Peter Zijlstra
@ 2014-05-23 15:53 ` Srivatsa S. Bhat
2014-05-23 17:05 ` Srivatsa S. Bhat
0 siblings, 1 reply; 20+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-23 15:53 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Frederic Weisbecker, tglx, mingo, tj, rusty, akpm, hch, mgorman,
riel, bp, rostedt, mgalbraith, ego, paulmck, oleg, rjw,
linux-kernel
On 05/23/2014 09:18 PM, Peter Zijlstra wrote:
> On Fri, May 23, 2014 at 09:07:18PM +0530, Srivatsa S. Bhat wrote:
>> On 05/23/2014 09:03 PM, Srivatsa S. Bhat wrote:
>>> On 05/23/2014 09:01 PM, Peter Zijlstra wrote:
>>>> On Fri, May 23, 2014 at 08:48:07PM +0530, Srivatsa S. Bhat wrote:
>>>>> On 05/23/2014 08:42 PM, Peter Zijlstra wrote:
>>>>>> On Fri, May 23, 2014 at 08:15:35PM +0530, Srivatsa S. Bhat wrote:
>>>>>>>>> + * During CPU offline, we don't want the other CPUs to send
>>>>>>>>> + * IPIs to the active_cpu (the outgoing CPU) *after* it has
>>>>>>>>> + * disabled interrupts (because, then it will notice the IPIs
>>>>>>>>> + * only after it has gone offline). We can prevent this by
>>>>>>>>> + * making the other CPUs disable their interrupts first - that
>>>>>>>>> + * way, they will run the stop-machine code with interrupts
>>>>>>>>> + * disabled, and hence won't send IPIs after that point.
>>>>>>
>>>>>> That's complete nonsense, you can send IPIs all you want with interrupts
>>>>>> disabled.
>>>>>>
>>>>>
>>>>> True, but that's not what the comment says. It says "you can't send IPIs
>>>>> because you are running the *stop-machine* loop, because the stop-machine loop
>>>>> doesn't send IPIs itself! The only possibility of sending IPIs from within
>>>>> stop-machine is if that CPU can take an interrupt and the *interrupt handler*
>>>>> sends the IPI (like what the block layer used to do) - and we precisely avoid
>>>>> that possibility by disabling interrupts. So no IPIs will be sent beyond
>>>>> this point.
>>>>
>>>> but one of those CPUs is running the stop machine function, which calls
>>>> CPU_DYING which runs all kinds of nonsense and therefore can send IPIs
>>>> all it wants, right?
>>>>
>>>
>>> Yes, but that CPU certainly won't IPI itself! (We are trying to avoid getting
>>> IPIs on precisely that CPU - the one which is about to go offline).
>>>
>>
>> And the comment makes that distinction between the "active-cpu" and "other CPUs"
>> (where active-cpu is the one which runs the stop-machine function and eventually
>> goes offline). Thus "other CPUs" won't send IPIs after that point, because they
>> are running the stop-machine loop with interrupts disabled. This ensures that
>> the "active-cpu" doesn't get any IPIs - which is what we want.
>
> OK, so clearly I'm having trouble reading today :/ Makes sense now.
>
> But yes, it's unlikely for CPU_DYING to self-IPI, although if you really
> want, I can do ;-)
>
Haha :-)
> And I guess the one extra state doesn't hurt too bad for
> stop_two_cpus().
>
Ok, that's good then.
Regards,
Srivatsa S. Bhat
* Re: [PATCH v6 2/3] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU"
2014-05-23 15:53 ` Srivatsa S. Bhat
@ 2014-05-23 17:05 ` Srivatsa S. Bhat
0 siblings, 0 replies; 20+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-23 17:05 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Frederic Weisbecker, tglx, mingo, tj, rusty, akpm, hch, mgorman,
riel, bp, rostedt, mgalbraith, ego, paulmck, oleg, rjw,
linux-kernel
On 05/23/2014 09:23 PM, Srivatsa S. Bhat wrote:
> On 05/23/2014 09:18 PM, Peter Zijlstra wrote:
>> On Fri, May 23, 2014 at 09:07:18PM +0530, Srivatsa S. Bhat wrote:
>>> On 05/23/2014 09:03 PM, Srivatsa S. Bhat wrote:
>>>> On 05/23/2014 09:01 PM, Peter Zijlstra wrote:
>>>>> On Fri, May 23, 2014 at 08:48:07PM +0530, Srivatsa S. Bhat wrote:
>>>>>> On 05/23/2014 08:42 PM, Peter Zijlstra wrote:
>>>>>>> On Fri, May 23, 2014 at 08:15:35PM +0530, Srivatsa S. Bhat wrote:
>>>>>>>>>> + * During CPU offline, we don't want the other CPUs to send
>>>>>>>>>> + * IPIs to the active_cpu (the outgoing CPU) *after* it has
>>>>>>>>>> + * disabled interrupts (because, then it will notice the IPIs
>>>>>>>>>> + * only after it has gone offline). We can prevent this by
>>>>>>>>>> + * making the other CPUs disable their interrupts first - that
>>>>>>>>>> + * way, they will run the stop-machine code with interrupts
>>>>>>>>>> + * disabled, and hence won't send IPIs after that point.
>>>>>>>
>>>>>>> That's complete nonsense, you can send IPIs all you want with interrupts
>>>>>>> disabled.
>>>>>>>
>>>>>>
>>>>>> True, but that's not what the comment says. It says "you can't send IPIs
>>>>>> because you are running the *stop-machine* loop, because the stop-machine loop
>>>>>> doesn't send IPIs itself! The only possibility of sending IPIs from within
> >>>>>> stop-machine is if that CPU can take an interrupt and the *interrupt handler*
>>>>>> sends the IPI (like what the block layer used to do) - and we precisely avoid
>>>>>> that possibility by disabling interrupts. So no IPIs will be sent beyond
>>>>>> this point.
>>>>>
>>>>> but one of those CPUs is running the stop machine function, which calls
>>>>> CPU_DYING which runs all kinds of nonsense and therefore can send IPIs
>>>>> all it wants, right?
>>>>>
>>>>
>>>> Yes, but that CPU certainly won't IPI itself! (We are trying to avoid getting
>>>> IPIs on precisely that CPU - the one which is about to go offline).
>>>>
>>>
>>> And the comment makes that distinction between the "active-cpu" and "other CPUs"
>>> (where active-cpu is the one which runs the stop-machine function and eventually
>>> goes offline). Thus "other CPUs" won't send IPIs after that point, because they
>>> are running the stop-machine loop with interrupts disabled. This ensures that
>>> the "active-cpu" doesn't get any IPIs - which is what we want.
>>
>> OK, so clearly I'm having trouble reading today :/ Makes sense now.
>>
>> But yes, it's unlikely for CPU_DYING to self-IPI, although if you really
>> want, I can do ;-)
>>
>
> Haha :-)
>
>> And I guess the one extra state doesn't hurt too bad for
>> stop_two_cpus().
>>
>
> Ok, that's good then.
>
Actually, this entire patch 2 becomes unnecessary by going with what Frederic
suggested. If we change the warning condition in patch 3 to "if cpu is offline
and there were pending callbacks, only then complain" instead of "if cpu is
offline and we got an IPI, then complain", then it doesn't matter if an IPI
arrives late (either due to hardware latency or due to stop-machine related
race), as long as we flush the callbacks before going offline.
I'll explain this in more detail in the next version of the patchset.
Regards,
Srivatsa S. Bhat
* Re: [PATCH v6 2/3] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU"
2014-05-23 10:12 ` [PATCH v6 2/3] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU" Srivatsa S. Bhat
2014-05-23 13:22 ` Frederic Weisbecker
@ 2014-05-23 15:21 ` Peter Zijlstra
2014-05-23 15:31 ` Srivatsa S. Bhat
1 sibling, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2014-05-23 15:21 UTC (permalink / raw)
To: Srivatsa S. Bhat
Cc: tglx, mingo, tj, rusty, akpm, fweisbec, hch, mgorman, riel, bp,
rostedt, mgalbraith, ego, paulmck, oleg, rjw, linux-kernel
On Fri, May 23, 2014 at 03:42:20PM +0530, Srivatsa S. Bhat wrote:
> Re-enable interrupts Re-enable interrupts
>
> The pending IPI is noted
> immediately, but alas,
> the CPU is offline at
> this point.
>
So doesn't clear_local_APIC() (and the arch function __cpu_disable() in
general) wipe all pending interrupt state?
* Re: [PATCH v6 2/3] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU"
2014-05-23 15:21 ` Peter Zijlstra
@ 2014-05-23 15:31 ` Srivatsa S. Bhat
0 siblings, 0 replies; 20+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-23 15:31 UTC (permalink / raw)
To: Peter Zijlstra
Cc: tglx, mingo, tj, rusty, akpm, fweisbec, hch, mgorman, riel, bp,
rostedt, mgalbraith, ego, paulmck, oleg, rjw, linux-kernel,
Benjamin Herrenschmidt, Paul Mackerras
On 05/23/2014 08:51 PM, Peter Zijlstra wrote:
> On Fri, May 23, 2014 at 03:42:20PM +0530, Srivatsa S. Bhat wrote:
>> Re-enable interrupts Re-enable interrupts
>>
>> The pending IPI is noted
>> immediately, but alas,
>> the CPU is offline at
>> this point.
>>
>
> So doesn't clear_local_APIC() (and the arch function __cpu_disable() in
> general) wipe all pending interrupt state?
>
Apparently not. At least powerpc explicitly re-enables only the IPIs...
A snippet from xics_migrate_irqs_away() in powerpc, which is called
in the cpu-disable phase:
/* Interrupts are disabled. */
void xics_migrate_irqs_away(void)
{
[...]
/* Reject any interrupt that was queued to us... */
icp_ops->set_priority(0);
[...]
/* Allow IPIs again... */
icp_ops->set_priority(DEFAULT_PRIORITY);
for_each_irq_desc(virq, desc) {
[...]
/* We only need to migrate enabled IRQS */
if (!desc->action)
continue;
[...]
/* We need to get IPIs still. */
if (irq == XICS_IPI || irq == XICS_IRQ_SPURIOUS)
continue;
[...]
}
But that said, it should have cleared any pending IPIs and allowed
only *new* IPIs (for whatever reason). At least the warning I hit
indicates that that didn't happen, and the old IPI really was taken
later.
Ben, Paul, any thoughts on this?
Regards,
Srivatsa S. Bhat
* [PATCH v6 3/3] CPU hotplug, smp: Flush any pending IPI callbacks before CPU offline
2014-05-23 10:11 [PATCH v6 0/3] CPU hotplug: Fix the long-standing "IPI to offline CPU" issue Srivatsa S. Bhat
2014-05-23 10:12 ` [PATCH v6 1/3] smp: Print more useful debug info upon receiving IPI on an offline CPU Srivatsa S. Bhat
2014-05-23 10:12 ` [PATCH v6 2/3] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU" Srivatsa S. Bhat
@ 2014-05-23 10:12 ` Srivatsa S. Bhat
2014-05-23 13:27 ` Frederic Weisbecker
2 siblings, 1 reply; 20+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-23 10:12 UTC (permalink / raw)
To: peterz, tglx, mingo, tj, rusty, akpm, fweisbec, hch
Cc: mgorman, riel, bp, rostedt, mgalbraith, ego, paulmck, oleg, rjw,
linux-kernel, srivatsa.bhat
During CPU offline, in the stop-machine loop, we use 2 separate stages to
disable interrupts, to ensure that the CPU going offline doesn't get any new
IPIs from the other CPUs after it has gone offline.
However, an IPI sent much earlier might arrive late on the target CPU
(possibly _after_ the CPU has gone offline) due to hardware latencies,
and if this happens, then the smp-call-function callbacks queued on the
outgoing CPU will not get noticed (and hence not executed) at all.
This is somewhat theoretical, but in any case, it makes sense to explicitly
loop through the call_single_queue and flush any pending callbacks before the
CPU goes completely offline. So, perform this step in the CPU_DYING stage of
CPU offline. That way, we would have handled all the queued callbacks before
going offline, and also, no new IPIs can be sent by the other CPUs to the
outgoing CPU at that point, because they will all be executing the stop-machine
code with interrupts disabled.
But since the outgoing CPU is already marked offline at this point, we can't
directly invoke generic_smp_call_function_single_interrupt() from CPU_DYING
notifier, because it will trigger the "IPI to offline CPU" warning. Hence,
separate out its functionality into a new function called
'flush_smp_call_function_queue' which skips the "is-cpu-online?" check, and
call this instead (since we know what we are doing in this path).
(Aside: 'generic_smp_call_function_single_interrupt' is too long a name already,
so I didn't want to add an uglier-looking double-underscore prefixed version.
'flush_smp_call_function_queue' is a much more meaningful name).
Suggested-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
kernel/smp.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 50 insertions(+), 9 deletions(-)
diff --git a/kernel/smp.c b/kernel/smp.c
index 306f818..b7a527b 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -29,6 +29,8 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct call_function_data, cfd_data);
static DEFINE_PER_CPU_SHARED_ALIGNED(struct llist_head, call_single_queue);
+static void flush_smp_call_function_queue(void);
+
static int
hotplug_cfd(struct notifier_block *nfb, unsigned long action, void *hcpu)
{
@@ -52,6 +54,18 @@ hotplug_cfd(struct notifier_block *nfb, unsigned long action, void *hcpu)
case CPU_UP_CANCELED:
case CPU_UP_CANCELED_FROZEN:
+ case CPU_DYING:
+ case CPU_DYING_FROZEN:
+ /*
+ * The IPIs for the smp-call-function callbacks queued by other
+ * CPUs might arrive late due to hardware latencies. So flush
+ * out any pending IPI callbacks explicitly (without waiting for
+ * the IPIs to arrive), to ensure that the outgoing CPU doesn't
+ * go offline with work still pending.
+ */
+ flush_smp_call_function_queue();
+ break;
+
case CPU_DEAD:
case CPU_DEAD_FROZEN:
free_cpumask_var(cfd->cpumask);
@@ -177,26 +191,56 @@ static int generic_exec_single(int cpu, struct call_single_data *csd,
return 0;
}
-/*
- * Invoked by arch to handle an IPI for call function single. Must be
- * called from the arch with interrupts disabled.
+/**
+ * flush_smp_call_function_queue - Flush pending smp-call-function callbacks
+ *
+ * Flush any pending smp-call-function callbacks queued on this CPU. This is
+ * invoked by the generic IPI handler, as well as by a CPU about to go offline,
+ * to ensure that all pending IPI functions are run before it goes completely
+ * offline.
+ *
+ * Loop through the call_single_queue and run all the queued functions.
+ * Must be called with interrupts disabled.
*/
-void generic_smp_call_function_single_interrupt(void)
+static void flush_smp_call_function_queue(void)
{
struct llist_node *entry;
struct call_single_data *csd, *csd_next;
- static bool warned;
entry = llist_del_all(&__get_cpu_var(call_single_queue));
entry = llist_reverse_order(entry);
+ llist_for_each_entry_safe(csd, csd_next, entry, llist) {
+ csd->func(csd->info);
+ csd_unlock(csd);
+ }
+}
+
+/**
+ * generic_smp_call_function_single_interrupt - Execute SMP IPI callbacks
+ *
+ * Invoked by arch to handle an IPI for call function single.
+ * Must be called with interrupts disabled.
+ */
+void generic_smp_call_function_single_interrupt(void)
+{
+ static bool warned;
+
+ WARN_ON(!irqs_disabled());
+
/*
* Shouldn't receive this interrupt on a cpu that is not yet online.
*/
if (unlikely(!cpu_online(smp_processor_id()) && !warned)) {
+ struct llist_node *entry;
+ struct call_single_data *csd;
+
warned = true;
WARN(1, "IPI on offline CPU %d\n", smp_processor_id());
+ entry = llist_del_all(&__get_cpu_var(call_single_queue));
+ entry = llist_reverse_order(entry);
+
/*
* We don't have to use the _safe() variant here
* because we are not invoking the IPI handlers yet.
@@ -206,10 +250,7 @@ void generic_smp_call_function_single_interrupt(void)
csd->func);
}
- llist_for_each_entry_safe(csd, csd_next, entry, llist) {
- csd->func(csd->info);
- csd_unlock(csd);
- }
+ flush_smp_call_function_queue();
}
/*
* Re: [PATCH v6 3/3] CPU hotplug, smp: Flush any pending IPI callbacks before CPU offline
2014-05-23 10:12 ` [PATCH v6 3/3] CPU hotplug, smp: Flush any pending IPI callbacks before CPU offline Srivatsa S. Bhat
@ 2014-05-23 13:27 ` Frederic Weisbecker
2014-05-23 14:47 ` Srivatsa S. Bhat
0 siblings, 1 reply; 20+ messages in thread
From: Frederic Weisbecker @ 2014-05-23 13:27 UTC (permalink / raw)
To: Srivatsa S. Bhat
Cc: peterz, tglx, mingo, tj, rusty, akpm, hch, mgorman, riel, bp,
rostedt, mgalbraith, ego, paulmck, oleg, rjw, linux-kernel
On Fri, May 23, 2014 at 03:42:35PM +0530, Srivatsa S. Bhat wrote:
> During CPU offline, in the stop-machine loop, we use 2 separate stages to
> disable interrupts, to ensure that the CPU going offline doesn't get any new
> IPIs from the other CPUs after it has gone offline.
>
> However, an IPI sent much earlier might arrive late on the target CPU
> (possibly _after_ the CPU has gone offline) due to hardware latencies,
> and if this happens, then the smp-call-function callbacks queued on the
> outgoing CPU will not get noticed (and hence not executed) at all.
>
> This is somewhat theoretical, but in any case, it makes sense to explicitly
> loop through the call_single_queue and flush any pending callbacks before the
> CPU goes completely offline. So, perform this step in the CPU_DYING stage of
> CPU offline. That way, we would have handled all the queued callbacks before
> going offline, and also, no new IPIs can be sent by the other CPUs to the
> outgoing CPU at that point, because they will all be executing the stop-machine
> code with interrupts disabled.
>
> But since the outgoing CPU is already marked offline at this point, we can't
> directly invoke generic_smp_call_function_single_interrupt() from CPU_DYING
> notifier, because it will trigger the "IPI to offline CPU" warning. Hence,
> separate out its functionality into a new function called
> 'flush_smp_call_function_queue' which skips the "is-cpu-online?" check, and
> call this instead (since we know what we are doing in this path).
>
> (Aside: 'generic_smp_call_function_single_interrupt' is too long a name already,
> so I didn't want to add an uglier-looking double-underscore prefixed version.
> 'flush_smp_call_function_queue' is a much more meaningful name).
>
> Suggested-by: Frederic Weisbecker <fweisbec@gmail.com>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
>
> kernel/smp.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++---------
> 1 file changed, 50 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/smp.c b/kernel/smp.c
> index 306f818..b7a527b 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -29,6 +29,8 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct call_function_data, cfd_data);
>
> static DEFINE_PER_CPU_SHARED_ALIGNED(struct llist_head, call_single_queue);
>
> +static void flush_smp_call_function_queue(void);
> +
> static int
> hotplug_cfd(struct notifier_block *nfb, unsigned long action, void *hcpu)
> {
> @@ -52,6 +54,18 @@ hotplug_cfd(struct notifier_block *nfb, unsigned long action, void *hcpu)
> case CPU_UP_CANCELED:
> case CPU_UP_CANCELED_FROZEN:
>
> + case CPU_DYING:
> + case CPU_DYING_FROZEN:
> + /*
> + * The IPIs for the smp-call-function callbacks queued by other
> + * CPUs might arrive late due to hardware latencies. So flush
> + * out any pending IPI callbacks explicitly (without waiting for
> + * the IPIs to arrive), to ensure that the outgoing CPU doesn't
> + * go offline with work still pending.
> + */
> + flush_smp_call_function_queue();
> + break;
> +
> case CPU_DEAD:
> case CPU_DEAD_FROZEN:
> free_cpumask_var(cfd->cpumask);
> @@ -177,26 +191,56 @@ static int generic_exec_single(int cpu, struct call_single_data *csd,
> return 0;
> }
>
> -/*
> - * Invoked by arch to handle an IPI for call function single. Must be
> - * called from the arch with interrupts disabled.
> +/**
> + * flush_smp_call_function_queue - Flush pending smp-call-function callbacks
> + *
> + * Flush any pending smp-call-function callbacks queued on this CPU. This is
> + * invoked by the generic IPI handler, as well as by a CPU about to go offline,
> + * to ensure that all pending IPI functions are run before it goes completely
> + * offline.
> + *
> + * Loop through the call_single_queue and run all the queued functions.
> + * Must be called with interrupts disabled.
> */
> -void generic_smp_call_function_single_interrupt(void)
> +static void flush_smp_call_function_queue(void)
> {
> struct llist_node *entry;
> struct call_single_data *csd, *csd_next;
> - static bool warned;
>
> entry = llist_del_all(&__get_cpu_var(call_single_queue));
> entry = llist_reverse_order(entry);
>
> + llist_for_each_entry_safe(csd, csd_next, entry, llist) {
> + csd->func(csd->info);
> + csd_unlock(csd);
> + }
> +}
> +
> +/**
> + * generic_smp_call_function_single_interrupt - Execute SMP IPI callbacks
> + *
> + * Invoked by arch to handle an IPI for call function single.
> + * Must be called with interrupts disabled.
> + */
> +void generic_smp_call_function_single_interrupt(void)
> +{
> + static bool warned;
> +
> + WARN_ON(!irqs_disabled());
> +
> /*
> * Shouldn't receive this interrupt on a cpu that is not yet online.
> */
> if (unlikely(!cpu_online(smp_processor_id()) && !warned)) {
> + struct llist_node *entry;
> + struct call_single_data *csd;
> +
> warned = true;
> WARN(1, "IPI on offline CPU %d\n", smp_processor_id());
>
> + entry = llist_del_all(&__get_cpu_var(call_single_queue));
This is deleting all the entries, the call to flush_smp_call_function_queue()
will then miss these.
> + entry = llist_reverse_order(entry);
> +
> /*
> * We don't have to use the _safe() variant here
> * because we are not invoking the IPI handlers yet.
> @@ -206,10 +250,7 @@ void generic_smp_call_function_single_interrupt(void)
> csd->func);
> }
>
> - llist_for_each_entry_safe(csd, csd_next, entry, llist) {
> - csd->func(csd->info);
> - csd_unlock(csd);
> - }
> + flush_smp_call_function_queue();
> }
>
> /*
>
* Re: [PATCH v6 3/3] CPU hotplug, smp: Flush any pending IPI callbacks before CPU offline
2014-05-23 13:27 ` Frederic Weisbecker
@ 2014-05-23 14:47 ` Srivatsa S. Bhat
0 siblings, 0 replies; 20+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-23 14:47 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: peterz, tglx, mingo, tj, rusty, akpm, hch, mgorman, riel, bp,
rostedt, mgalbraith, ego, paulmck, oleg, rjw, linux-kernel
On 05/23/2014 06:57 PM, Frederic Weisbecker wrote:
> On Fri, May 23, 2014 at 03:42:35PM +0530, Srivatsa S. Bhat wrote:
>> During CPU offline, in the stop-machine loop, we use 2 separate stages to
>> disable interrupts, to ensure that the CPU going offline doesn't get any new
>> IPIs from the other CPUs after it has gone offline.
>>
[...]
>> +/**
>> + * generic_smp_call_function_single_interrupt - Execute SMP IPI callbacks
>> + *
>> + * Invoked by arch to handle an IPI for call function single.
>> + * Must be called with interrupts disabled.
>> + */
>> +void generic_smp_call_function_single_interrupt(void)
>> +{
>> + static bool warned;
>> +
>> + WARN_ON(!irqs_disabled());
>> +
>> /*
>> * Shouldn't receive this interrupt on a cpu that is not yet online.
>> */
>> if (unlikely(!cpu_online(smp_processor_id()) && !warned)) {
>> + struct llist_node *entry;
>> + struct call_single_data *csd;
>> +
>> warned = true;
>> WARN(1, "IPI on offline CPU %d\n", smp_processor_id());
>>
>> + entry = llist_del_all(&__get_cpu_var(call_single_queue));
>
> This is deleting all the entries, the call to flush_smp_call_function_queue()
> will then miss these.
>
Argh! And I thought I had finally nailed it this time. :-(
Regards,
Srivatsa S. Bhat
>> + entry = llist_reverse_order(entry);
>
>
>
>> +
>> /*
>> * We don't have to use the _safe() variant here
>> * because we are not invoking the IPI handlers yet.
>> @@ -206,10 +250,7 @@ void generic_smp_call_function_single_interrupt(void)
>> csd->func);
>> }
>>
>> - llist_for_each_entry_safe(csd, csd_next, entry, llist) {
>> - csd->func(csd->info);
>> - csd_unlock(csd);
>> - }
>> + flush_smp_call_function_queue();
>> }
>>
>> /*
>>
>