* smp_call_function_single with wait=0 considered harmful
@ 2013-12-04 16:46 Christoph Hellwig
2013-12-05 21:43 ` Bjorn Helgaas
2014-02-28 12:26 ` Peter Zijlstra
0 siblings, 2 replies; 8+ messages in thread
From: Christoph Hellwig @ 2013-12-04 16:46 UTC (permalink / raw)
To: Andrew Morton, Ingo Molnar, Thomas Gleixner, Tony Luck,
Robert Richter, Bjorn Helgaas, Aaro Koskinen, David Daney
Cc: linux-kernel
While doing my recent work on the generic smp function calls I noticed
that smp_call_function_single without the wait flag can't work, as
it allocates struct call_single_data on stack, and without the wait
flag will happily return before the IPI has been executed.
This affects the following callers:
arch/ia64/kernel/mca.c:mca_cpu_callback()
arch/ia64/kernel/smpboot.c:ia64_sync_itc()
arch/x86/kernel/kvm.c:kvm_cpu_notify()
arch/x86/oprofile/nmi_int.c:oprofile_cpu_notifier()
arch/x86/pci/amd_bus.c:amd_cpu_notify()
drivers/staging/octeon/ethernet-rx.c:cvm_oct_enable_one_cpu()
kernel/stop_machine.c:stop_two_cpus()
It would be good to get these fixed so that we could remove the
parameter. Either convert them to wait, or use a preallocated
call_single_data and __smp_call_function_single.
After that I'd like to remove the wait argument to prevent further
abuses.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: smp_call_function_single with wait=0 considered harmful
2013-12-04 16:46 smp_call_function_single with wait=0 considered harmful Christoph Hellwig
@ 2013-12-05 21:43 ` Bjorn Helgaas
2013-12-06 10:56 ` Christoph Hellwig
2014-02-28 12:26 ` Peter Zijlstra
1 sibling, 1 reply; 8+ messages in thread
From: Bjorn Helgaas @ 2013-12-05 21:43 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andrew Morton, Ingo Molnar, Thomas Gleixner, Tony Luck,
Robert Richter, Aaro Koskinen, David Daney,
linux-kernel@vger.kernel.org
On Wed, Dec 4, 2013 at 9:46 AM, Christoph Hellwig <hch@infradead.org> wrote:
> While doing my recent work on the generic smp function calls I noticed
> that smp_call_function_single without the wait flag can't work, as
> it allocates struct call_single_data on stack, and without the wait
> flag will happily return before the IPI has been executed.
I don't understand the problem yet. With wait==0,
smp_call_function_single() sets "csd = &__get_cpu_var(csd_data)", so
it's not using a struct on the stack. We'll queue up "func" and
likely will return before it is executed, but that should be fine
because nobody will overwrite csd_data until it *is* executed and
csd_unlock() has been called.
> This affects the following callers:
>
> arch/ia64/kernel/mca.c:mca_cpu_callback()
> arch/ia64/kernel/smpboot.c:ia64_sync_itc()
> arch/x86/kernel/kvm.c:kvm_cpu_notify()
> arch/x86/oprofile/nmi_int.c:oprofile_cpu_notifier()
> arch/x86/pci/amd_bus.c:amd_cpu_notify()
I don't see any reason why amd_cpu_notify() needs to use wait==0.
> drivers/staging/octeon/ethernet-rx.c:cvm_oct_enable_one_cpu()
> kernel/stop_machine.c:stop_two_cpus()
>
> It would be good to get these fixed so that we could remove the
> parameter. Either convert them to wait, or use a preallocated
> call_single_data and __smp_call_function_single.
>
> After that I'd like to remove the wait argument to prevent further
> abuses.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: smp_call_function_single with wait=0 considered harmful
2013-12-05 21:43 ` Bjorn Helgaas
@ 2013-12-06 10:56 ` Christoph Hellwig
0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2013-12-06 10:56 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Christoph Hellwig, Andrew Morton, Ingo Molnar, Thomas Gleixner,
Tony Luck, Robert Richter, Aaro Koskinen, David Daney,
linux-kernel@vger.kernel.org
On Thu, Dec 05, 2013 at 02:43:03PM -0700, Bjorn Helgaas wrote:
> smp_call_function_single() sets "csd = &__get_cpu_var(csd_data)", so
> it's not using a struct on the stack. We'll queue up "func" and
> likely will return before it is executed, but that should be fine
> because nobody will overwrite csd_data until it *is* executed and
> csd_unlock() has been called.
You're right, I missed the usage of the per-cpu data later in the
function.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: smp_call_function_single with wait=0 considered harmful
2013-12-04 16:46 smp_call_function_single with wait=0 considered harmful Christoph Hellwig
2013-12-05 21:43 ` Bjorn Helgaas
@ 2014-02-28 12:26 ` Peter Zijlstra
2014-02-28 12:39 ` Peter Zijlstra
1 sibling, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2014-02-28 12:26 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andrew Morton, Ingo Molnar, Thomas Gleixner, Tony Luck,
Robert Richter, Bjorn Helgaas, Aaro Koskinen, David Daney,
linux-kernel
On Wed, Dec 04, 2013 at 08:46:27AM -0800, Christoph Hellwig wrote:
> While doing my recent work on the generic smp function calls I noticed
> that smp_call_function_single without the wait flag can't work, as
> it allocates struct call_single_data on stack, and without the wait
> flag will happily return before the IPI has been executed.
It doesn't actually; it uses a per-cpu one in the !wait case.
The subsequent csd_lock() ensures it will wait for any prior user to
complete, so only if you're doing multiple smp_call_function_single()
invocations back-to-back will they queue up.
> This affects the following callers:
<snip>
> kernel/stop_machine.c:stop_two_cpus()
That site should work with .wait=1 just fine, but given the above, the
.wait=0 doesn't appear problematic at all.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: smp_call_function_single with wait=0 considered harmful
2014-02-28 12:26 ` Peter Zijlstra
@ 2014-02-28 12:39 ` Peter Zijlstra
2014-02-28 17:06 ` Rik van Riel
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Peter Zijlstra @ 2014-02-28 12:39 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andrew Morton, Ingo Molnar, Thomas Gleixner, Tony Luck,
Robert Richter, Bjorn Helgaas, Aaro Koskinen, David Daney,
linux-kernel, Prarit Bhargava, Rik van Riel, Mel Gorman
On Fri, Feb 28, 2014 at 01:26:24PM +0100, Peter Zijlstra wrote:
> On Wed, Dec 04, 2013 at 08:46:27AM -0800, Christoph Hellwig wrote:
> > kernel/stop_machine.c:stop_two_cpus()
>
> That site should work with .wait=1 just fine, but given the above, the
> .wait=0 doesn't appear problematic at all.
Scratch that; its broken, but not because of smp_call_function_single().
---
Subject: stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus()
We must use smp_call_function_single(.wait=1) for the
irq_cpu_stop_queue_work() to ensure the queueing is actually done under
stop_cpus_lock. Without this we could have dropped the lock by the time
we do the queueing and get the race we tried to fix.
Fixes: 7053ea1a34fa ("stop_machine: Fix race between stop_two_cpus() and stop_cpus()")
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
kernel/stop_machine.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 84571e09c907..01fbae5b97b7 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -293,7 +293,7 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
*/
smp_call_function_single(min(cpu1, cpu2),
&irq_cpu_stop_queue_work,
- &call_args, 0);
+ &call_args, 1);
lg_local_unlock(&stop_cpus_lock);
preempt_enable();
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: smp_call_function_single with wait=0 considered harmful
2014-02-28 12:39 ` Peter Zijlstra
@ 2014-02-28 17:06 ` Rik van Riel
2014-02-28 17:34 ` Prarit Bhargava
2014-03-11 12:36 ` [tip:sched/core] stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() tip-bot for Peter Zijlstra
2 siblings, 0 replies; 8+ messages in thread
From: Rik van Riel @ 2014-02-28 17:06 UTC (permalink / raw)
To: Peter Zijlstra, Christoph Hellwig
Cc: Andrew Morton, Ingo Molnar, Thomas Gleixner, Tony Luck,
Robert Richter, Bjorn Helgaas, Aaro Koskinen, David Daney,
linux-kernel, Prarit Bhargava, Mel Gorman
On 02/28/2014 07:39 AM, Peter Zijlstra wrote:
> Subject: stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus()
>
> We must use smp_call_function_single(.wait=1) for the
> irq_cpu_stop_queue_work() to ensure the queueing is actually done under
> stop_cpus_lock. Without this we could have dropped the lock by the time
> we do the queueing and get the race we tried to fix.
>
> Fixes: 7053ea1a34fa ("stop_machine: Fix race between stop_two_cpus() and stop_cpus()")
> Cc: Prarit Bhargava <prarit@redhat.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
--
All rights reversed
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: smp_call_function_single with wait=0 considered harmful
2014-02-28 12:39 ` Peter Zijlstra
2014-02-28 17:06 ` Rik van Riel
@ 2014-02-28 17:34 ` Prarit Bhargava
2014-03-11 12:36 ` [tip:sched/core] stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() tip-bot for Peter Zijlstra
2 siblings, 0 replies; 8+ messages in thread
From: Prarit Bhargava @ 2014-02-28 17:34 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Christoph Hellwig, Andrew Morton, Ingo Molnar, Thomas Gleixner,
Tony Luck, Robert Richter, Bjorn Helgaas, Aaro Koskinen,
David Daney, linux-kernel, Rik van Riel, Mel Gorman
On 02/28/2014 07:39 AM, Peter Zijlstra wrote:
> On Fri, Feb 28, 2014 at 01:26:24PM +0100, Peter Zijlstra wrote:
>> On Wed, Dec 04, 2013 at 08:46:27AM -0800, Christoph Hellwig wrote:
>>> kernel/stop_machine.c:stop_two_cpus()
>>
>> That site should work with .wait=1 just fine, but given the above, the
>> .wait=0 doesn't appear problematic at all.
>
> Scratch that; its broken, but not because of smp_call_function_single().
>
> ---
> Subject: stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus()
>
> We must use smp_call_function_single(.wait=1) for the
> irq_cpu_stop_queue_work() to ensure the queueing is actually done under
> stop_cpus_lock. Without this we could have dropped the lock by the time
> we do the queueing and get the race we tried to fix.
>
> Fixes: 7053ea1a34fa ("stop_machine: Fix race between stop_two_cpus() and stop_cpus()")
> Cc: Prarit Bhargava <prarit@redhat.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Prarit Bhargava <prarit@redhat.com>
P.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [tip:sched/core] stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus()
2014-02-28 12:39 ` Peter Zijlstra
2014-02-28 17:06 ` Rik van Riel
2014-02-28 17:34 ` Prarit Bhargava
@ 2014-03-11 12:36 ` tip-bot for Peter Zijlstra
2 siblings, 0 replies; 8+ messages in thread
From: tip-bot for Peter Zijlstra @ 2014-03-11 12:36 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, peterz, hch, riel, akpm, mgorman, tglx,
prarit
Commit-ID: 177c53d943368fc97644ebc0a250dc8e2d124250
Gitweb: http://git.kernel.org/tip/177c53d943368fc97644ebc0a250dc8e2d124250
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Fri, 28 Feb 2014 13:39:05 +0100
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 11 Mar 2014 11:33:47 +0100
stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus()
We must use smp_call_function_single(.wait=1) for the
irq_cpu_stop_queue_work() to ensure the queueing is actually done under
stop_cpus_lock. Without this we could have dropped the lock by the time
we do the queueing and get the race we tried to fix.
Fixes: 7053ea1a34fa ("stop_machine: Fix race between stop_two_cpus() and stop_cpus()")
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140228123905.GK3104@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/stop_machine.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 84571e0..01fbae5 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -293,7 +293,7 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
*/
smp_call_function_single(min(cpu1, cpu2),
&irq_cpu_stop_queue_work,
- &call_args, 0);
+ &call_args, 1);
lg_local_unlock(&stop_cpus_lock);
preempt_enable();
^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2014-03-11 12:37 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-12-04 16:46 smp_call_function_single with wait=0 considered harmful Christoph Hellwig
2013-12-05 21:43 ` Bjorn Helgaas
2013-12-06 10:56 ` Christoph Hellwig
2014-02-28 12:26 ` Peter Zijlstra
2014-02-28 12:39 ` Peter Zijlstra
2014-02-28 17:06 ` Rik van Riel
2014-02-28 17:34 ` Prarit Bhargava
2014-03-11 12:36 ` [tip:sched/core] stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() tip-bot for Peter Zijlstra
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox