All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>, Tejun Heo <tj@kernel.org>,
	akpm@linux-foundation.org, fweisbec@gmail.com,
	paulmck@linux.vnet.ibm.com, tglx@linutronix.de, mingo@kernel.org,
	rusty@rustcorp.com.au, hch@infradead.org, mgorman@suse.de,
	riel@redhat.com, bp@suse.de, rostedt@goodmis.org,
	mgalbraith@suse.de, ego@linux.vnet.ibm.com, rjw@rjwysocki.net,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 UPDATEDv3 3/3] CPU hotplug, smp: Flush any pending IPI callbacks before CPU offline
Date: Tue, 20 May 2014 15:39:59 +0530	[thread overview]
Message-ID: <537B29F7.6060104@linux.vnet.ibm.com> (raw)
In-Reply-To: <20140520094203.GU2485@laptop.programming.kicks-ass.net>

On 05/20/2014 03:12 PM, Peter Zijlstra wrote:
> On Tue, May 20, 2014 at 01:52:41AM +0530, Srivatsa S. Bhat wrote:
>> From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>> [PATCH v5 UPDATEDv3 3/3] CPU hotplug, smp: Flush any pending IPI callbacks before CPU offline
>>
>> During CPU offline, in the stop-machine loop, we use 2 separate stages to
>> disable interrupts, to ensure that the CPU going offline doesn't get any new
>> IPIs from the other CPUs after it has gone offline.
>>
>> However, an IPI sent much earlier might arrive late on the target CPU
>> (possibly _after_ the CPU has gone offline) due to hardware latencies,
>> and due to this, the smp-call-function callbacks queued on the outgoing
>> CPU might not get noticed (and hence not executed) at all.
>>
>> This is somewhat theoretical, but in any case, it makes sense to explicitly
>> loop through the call_single_queue and flush any pending callbacks before the
>> CPU goes completely offline. So, flush the queued smp-call-function callbacks
>> in the MULTI_STOP_DISABLE_IRQ_ACTIVE stage, after disabling interrupts on the
>> active CPU. This can be trivially achieved by invoking the
>> generic_smp_call_function_single_interrupt() function itself (and since the
>> outgoing CPU is still online at this point, we won't trigger the "IPI to offline
>> CPU" warning in this function; so we are safe to call it here).
>>
>> This way, we would have handled all the queued callbacks before going offline,
>> and also, no new IPIs can be sent by the other CPUs to the outgoing CPU at that
>> point, because they will all be executing the stop-machine code with interrupts
>> disabled.
>>
>> Suggested-by: Frederic Weisbecker <fweisbec@gmail.com>
>> Reviewed-by: Tejun Heo <tj@kernel.org>
>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>> ---
>>
>>  include/linux/smp.h   |    2 ++
>>  kernel/smp.c          |   17 ++++++++++++++---
>>  kernel/stop_machine.c |   11 +++++++++++
>>  3 files changed, 27 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/smp.h b/include/linux/smp.h
>> index 633f5ed..e6b090d 100644
>> --- a/include/linux/smp.h
>> +++ b/include/linux/smp.h
>> @@ -151,6 +151,8 @@ smp_call_function_any(const struct cpumask *mask, smp_call_func_t func,
>>  
>>  static inline void kick_all_cpus_sync(void) {  }
>>  
>> +static inline void generic_smp_call_function_single_interrupt(void) { }
>> +
>>  #endif /* !SMP */
>>  
>>  /*
>> diff --git a/kernel/smp.c b/kernel/smp.c
>> index 306f818..b765167 100644
>> --- a/kernel/smp.c
>> +++ b/kernel/smp.c
>> @@ -177,9 +177,18 @@ static int generic_exec_single(int cpu, struct call_single_data *csd,
>>  	return 0;
>>  }
>>  
>> -/*
>> - * Invoked by arch to handle an IPI for call function single. Must be
>> - * called from the arch with interrupts disabled.
>> +/**
>> + * generic_smp_call_function_single_interrupt - Execute SMP IPI callbacks
>> + *
>> + * Invoked by arch to handle an IPI for call function single.
>> + *
>> + * This is also invoked by a CPU about to go offline, to flush any pending
>> + * smp-call-function callbacks queued on this CPU (including those for which
>> + * the source CPU's IPIs might not have been received on this CPU yet).
>> + * This ensures that all pending IPI callbacks are run before the CPU goes
>> + * completely offline.
>> + *
>> + * Must be called with interrupts disabled.
>>   */
>>  void generic_smp_call_function_single_interrupt(void)
>>  {
>> @@ -187,6 +196,8 @@ void generic_smp_call_function_single_interrupt(void)
>>  	struct call_single_data *csd, *csd_next;
>>  	static bool warned;
>>  
>> +	WARN_ON(!irqs_disabled());
>> +
>>  	entry = llist_del_all(&__get_cpu_var(call_single_queue));
>>  	entry = llist_reverse_order(entry);
>>  
>> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
>> index 288f7fe..4069c13 100644
>> --- a/kernel/stop_machine.c
>> +++ b/kernel/stop_machine.c
>> @@ -21,6 +21,7 @@
>>  #include <linux/smpboot.h>
>>  #include <linux/atomic.h>
>>  #include <linux/lglock.h>
>> +#include <linux/smp.h>
>>  
>>  /*
>>   * Structure to determine completion condition and record errors.  May
>> @@ -223,6 +224,16 @@ static int multi_cpu_stop(void *data)
>>  				if (is_active) {
>>  					local_irq_disable();
>>  					hard_irq_disable();
>> +
>> +					/*
>> +					 * IPIs (from the inactive CPUs) might
>> +					 * arrive late due to hardware latencies.
>> +					 * So flush out any pending IPI callbacks
>> +					 * explicitly, to ensure that the outgoing
>> +					 * CPU doesn't go offline with work still
>> +					 * pending (during CPU hotplug).
>> +					 */
>> +					generic_smp_call_function_single_interrupt();
>>  				}
>>  				break;
>>  			case MULTI_STOP_RUN:
> 
> The multi_cpu_stop() path isn't exclusive to hotplug, so your changelog
> is wrong or the patch is.
> 

Yes, I know that multi_cpu_stop() isn't exclusive to hotplug. That's why
I have explicitly referred to CPU hotplug in the comment as well as the
changelog.

But I totally agree that code-wise this is not the best way to do it since
this affects (although harmlessly) usecases other than hotplug as well.

Do you have any other suggestions?

> Furthermore, you yourself have worked on trying to remove stop machine
> from hotplug, that too should've been a big hint this is the wrong place
> for this.
> 

I know :-( But for lack of any other solution (other than getting rid of
stop-machine from the hotplug path), I went with this rather poor fix, as
a stop-gap arrangement until we implement the long-term solution (stop-
machine-free CPU hotplug).

While we are on the topic of stop-machine removal, I'll try to revive that
patchset again and hope that this time it will get more reviews and probably
get a chance to go upstream. That should solve a _lot_ of problems with
hotplug!

Regards,
Srivatsa S. Bhat


  reply	other threads:[~2014-05-20 10:11 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-15 19:12 [PATCH v5 0/3] CPU hotplug: Fix the long-standing "IPI to offline CPU" issue Srivatsa S. Bhat
2014-05-15 19:13 ` [PATCH v5 1/3] smp: Print more useful debug info upon receiving IPI on an offline CPU Srivatsa S. Bhat
2014-05-15 19:19   ` Joe Perches
2014-05-15 19:34     ` Srivatsa S. Bhat
2014-05-15 19:43       ` Joe Perches
2014-05-15 19:51         ` [PATCH v5 UPDATED " Srivatsa S. Bhat
2014-05-15 19:13 ` [PATCH v5 2/3] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU" Srivatsa S. Bhat
2014-05-15 19:17   ` Tejun Heo
2014-05-15 19:18     ` Srivatsa S. Bhat
2014-05-15 19:14 ` [PATCH v5 3/3] CPU hotplug, smp: Flush any pending IPI callbacks before CPU offline Srivatsa S. Bhat
2014-05-15 19:19   ` Tejun Heo
2014-05-15 19:26     ` Srivatsa S. Bhat
2014-05-15 19:36       ` Paul E. McKenney
2014-05-15 19:43         ` [PATCH v5 UPDATED " Srivatsa S. Bhat
2014-05-15 19:45           ` Tejun Heo
2014-05-19 12:19           ` [PATCH v5 UPDATEDv2 " Srivatsa S. Bhat
2014-05-19 16:18             ` Oleg Nesterov
2014-05-19 19:49               ` Srivatsa S. Bhat
2014-05-19 20:22                 ` [PATCH v5 UPDATEDv3 " Srivatsa S. Bhat
2014-05-20  9:42                   ` Peter Zijlstra
2014-05-20 10:09                     ` Srivatsa S. Bhat [this message]
2014-05-20 10:25                       ` Peter Zijlstra
2014-05-20 10:31                         ` Srivatsa S. Bhat
2014-05-20 10:38                           ` Srivatsa S. Bhat
2014-05-20 10:42                             ` Srivatsa S. Bhat
2014-05-15 19:44         ` [PATCH v5 " Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=537B29F7.6060104@linux.vnet.ibm.com \
    --to=srivatsa.bhat@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@suse.de \
    --cc=ego@linux.vnet.ibm.com \
    --cc=fweisbec@gmail.com \
    --cc=hch@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgalbraith@suse.de \
    --cc=mgorman@suse.de \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=rjw@rjwysocki.net \
    --cc=rostedt@goodmis.org \
    --cc=rusty@rustcorp.com.au \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.