* smp_call_function_single with wait=0 considered harmful @ 2013-12-04 16:46 Christoph Hellwig 2013-12-05 21:43 ` Bjorn Helgaas 2014-02-28 12:26 ` Peter Zijlstra 0 siblings, 2 replies; 8+ messages in thread From: Christoph Hellwig @ 2013-12-04 16:46 UTC (permalink / raw) To: Andrew Morton, Ingo Molnar, Thomas Gleixner, Tony Luck, Robert Richter, Bjorn Helgaas, Aaro Koskinen, David Daney Cc: linux-kernel While doing my recent work on the generic smp function calls I noticed that smp_call_function_single without the wait flag can't work, as it allocates struct call_single_data on stack, and without the wait flag will happily return before the IPI has been executed. This affects the following callers: arch/ia64/kernel/mca.c:mca_cpu_callback() arch/ia64/kernel/smpboot.c:ia64_sync_itc() arch/x86/kernel/kvm.c:kvm_cpu_notify() arch/x86/oprofile/nmi_int.c:oprofile_cpu_notifier() arch/x86/pci/amd_bus.c:amd_cpu_notify() drivers/staging/octeon/ethernet-rx.c:cvm_oct_enable_one_cpu() kernel/stop_machine.c:stop_two_cpus() It would be good to get these fixed so that we could remove the parameter. Either convert them to wait, or use a preallocated call_single_data and __smp_call_function_single. After that I'd like to remove the wait argument to prevent further abuses. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: smp_call_function_single with wait=0 considered harmful 2013-12-04 16:46 smp_call_function_single with wait=0 considered harmful Christoph Hellwig @ 2013-12-05 21:43 ` Bjorn Helgaas 2013-12-06 10:56 ` Christoph Hellwig 2014-02-28 12:26 ` Peter Zijlstra 1 sibling, 1 reply; 8+ messages in thread From: Bjorn Helgaas @ 2013-12-05 21:43 UTC (permalink / raw) To: Christoph Hellwig Cc: Andrew Morton, Ingo Molnar, Thomas Gleixner, Tony Luck, Robert Richter, Aaro Koskinen, David Daney, linux-kernel@vger.kernel.org On Wed, Dec 4, 2013 at 9:46 AM, Christoph Hellwig <hch@infradead.org> wrote: > While doing my recent work on the generic smp function calls I noticed > that smp_call_function_single without the wait flag can't work, as > it allocates struct call_single_data on stack, and without the wait > flag will happily return before the IPI has been executed. I don't understand the problem yet. With wait==0, smp_call_function_single() sets "csd = &__get_cpu_var(csd_data)", so it's not using a struct on the stack. We'll queue up "func" and likely will return before it is executed, but that should be fine because nobody will overwrite csd_data until it *is* executed and csd_unlock() has been called. > This affects the following callers: > > arch/ia64/kernel/mca.c:mca_cpu_callback() > arch/ia64/kernel/smpboot.c:ia64_sync_itc() > arch/x86/kernel/kvm.c:kvm_cpu_notify() > arch/x86/oprofile/nmi_int.c:oprofile_cpu_notifier() > arch/x86/pci/amd_bus.c:amd_cpu_notify() I don't see any reason why amd_cpu_notify() needs to use wait==0. > drivers/staging/octeon/ethernet-rx.c:cvm_oct_enable_one_cpu() > kernel/stop_machine.c:stop_two_cpus() > > It would be good to get these fixed so that we could remove the > parameter. Either convert them to wait, or use a preallocated > call_single_data and __smp_call_function_single. > > After that I'd like to remove the wait argument to prevent further > abuses. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: smp_call_function_single with wait=0 considered harmful 2013-12-05 21:43 ` Bjorn Helgaas @ 2013-12-06 10:56 ` Christoph Hellwig 0 siblings, 0 replies; 8+ messages in thread From: Christoph Hellwig @ 2013-12-06 10:56 UTC (permalink / raw) To: Bjorn Helgaas Cc: Christoph Hellwig, Andrew Morton, Ingo Molnar, Thomas Gleixner, Tony Luck, Robert Richter, Aaro Koskinen, David Daney, linux-kernel@vger.kernel.org On Thu, Dec 05, 2013 at 02:43:03PM -0700, Bjorn Helgaas wrote: > smp_call_function_single() sets "csd = &__get_cpu_var(csd_data)", so > it's not using a struct on the stack. We'll queue up "func" and > likely will return before it is executed, but that should be fine > because nobody will overwrite csd_data until it *is* executed and > csd_unlock() has been called. You're right, I missed the usage of the per-cpu data later in the function. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: smp_call_function_single with wait=0 considered harmful 2013-12-04 16:46 smp_call_function_single with wait=0 considered harmful Christoph Hellwig 2013-12-05 21:43 ` Bjorn Helgaas @ 2014-02-28 12:26 ` Peter Zijlstra 2014-02-28 12:39 ` Peter Zijlstra 1 sibling, 1 reply; 8+ messages in thread From: Peter Zijlstra @ 2014-02-28 12:26 UTC (permalink / raw) To: Christoph Hellwig Cc: Andrew Morton, Ingo Molnar, Thomas Gleixner, Tony Luck, Robert Richter, Bjorn Helgaas, Aaro Koskinen, David Daney, linux-kernel On Wed, Dec 04, 2013 at 08:46:27AM -0800, Christoph Hellwig wrote: > While doing my recent work on the generic smp function calls I noticed > that smp_call_function_single without the wait flag can't work, as > it allocates struct call_single_data on stack, and without the wait > flag will happily return before the IPI has been executed. It doesn't actually; it uses a per-cpu one in the !wait case. The subsequent csd_lock() ensures it will wait for any prior user to complete, so only if you're doing multiple smp_call_function_single() invocations back-to-back will they queue up. > This affects the following callers: <snip> > kernel/stop_machine.c:stop_two_cpus() That site should work with .wait=1 just fine, but given the above, the .wait=0 doesn't appear problematic at all. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: smp_call_function_single with wait=0 considered harmful 2014-02-28 12:26 ` Peter Zijlstra @ 2014-02-28 12:39 ` Peter Zijlstra 2014-02-28 17:06 ` Rik van Riel ` (2 more replies) 0 siblings, 3 replies; 8+ messages in thread From: Peter Zijlstra @ 2014-02-28 12:39 UTC (permalink / raw) To: Christoph Hellwig Cc: Andrew Morton, Ingo Molnar, Thomas Gleixner, Tony Luck, Robert Richter, Bjorn Helgaas, Aaro Koskinen, David Daney, linux-kernel, Prarit Bhargava, Rik van Riel, Mel Gorman On Fri, Feb 28, 2014 at 01:26:24PM +0100, Peter Zijlstra wrote: > On Wed, Dec 04, 2013 at 08:46:27AM -0800, Christoph Hellwig wrote: > > kernel/stop_machine.c:stop_two_cpus() > > That site should work with .wait=1 just fine, but given the above, the > .wait=0 doesn't appear problematic at all. Scratch that; its broken, but not because of smp_call_function_single(). --- Subject: stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() We must use smp_call_function_single(.wait=1) for the irq_cpu_stop_queue_work() to ensure the queueing is actually done under stop_cpus_lock. Without this we could have dropped the lock by the time we do the queueing and get the race we tried to fix. Fixes: 7053ea1a34fa ("stop_machine: Fix race between stop_two_cpus() and stop_cpus()") Cc: Prarit Bhargava <prarit@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> --- kernel/stop_machine.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c index 84571e09c907..01fbae5b97b7 100644 --- a/kernel/stop_machine.c +++ b/kernel/stop_machine.c @@ -293,7 +293,7 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void * */ smp_call_function_single(min(cpu1, cpu2), &irq_cpu_stop_queue_work, - &call_args, 0); + &call_args, 1); lg_local_unlock(&stop_cpus_lock); preempt_enable(); ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: smp_call_function_single with wait=0 considered harmful 2014-02-28 12:39 ` Peter Zijlstra @ 2014-02-28 17:06 ` Rik van Riel 2014-02-28 17:34 ` Prarit Bhargava 2014-03-11 12:36 ` [tip:sched/core] stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() tip-bot for Peter Zijlstra 2 siblings, 0 replies; 8+ messages in thread From: Rik van Riel @ 2014-02-28 17:06 UTC (permalink / raw) To: Peter Zijlstra, Christoph Hellwig Cc: Andrew Morton, Ingo Molnar, Thomas Gleixner, Tony Luck, Robert Richter, Bjorn Helgaas, Aaro Koskinen, David Daney, linux-kernel, Prarit Bhargava, Mel Gorman On 02/28/2014 07:39 AM, Peter Zijlstra wrote: > Subject: stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() > > We must use smp_call_function_single(.wait=1) for the > irq_cpu_stop_queue_work() to ensure the queueing is actually done under > stop_cpus_lock. Without this we could have dropped the lock by the time > we do the queueing and get the race we tried to fix. > > Fixes: 7053ea1a34fa ("stop_machine: Fix race between stop_two_cpus() and stop_cpus()") > Cc: Prarit Bhargava <prarit@redhat.com> > Cc: Rik van Riel <riel@redhat.com> > Cc: Mel Gorman <mgorman@suse.de> > Signed-off-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Rik van Riel <riel@redhat.com> -- All rights reversed ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: smp_call_function_single with wait=0 considered harmful 2014-02-28 12:39 ` Peter Zijlstra 2014-02-28 17:06 ` Rik van Riel @ 2014-02-28 17:34 ` Prarit Bhargava 2014-03-11 12:36 ` [tip:sched/core] stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() tip-bot for Peter Zijlstra 2 siblings, 0 replies; 8+ messages in thread From: Prarit Bhargava @ 2014-02-28 17:34 UTC (permalink / raw) To: Peter Zijlstra Cc: Christoph Hellwig, Andrew Morton, Ingo Molnar, Thomas Gleixner, Tony Luck, Robert Richter, Bjorn Helgaas, Aaro Koskinen, David Daney, linux-kernel, Rik van Riel, Mel Gorman On 02/28/2014 07:39 AM, Peter Zijlstra wrote: > On Fri, Feb 28, 2014 at 01:26:24PM +0100, Peter Zijlstra wrote: >> On Wed, Dec 04, 2013 at 08:46:27AM -0800, Christoph Hellwig wrote: >>> kernel/stop_machine.c:stop_two_cpus() >> >> That site should work with .wait=1 just fine, but given the above, the >> .wait=0 doesn't appear problematic at all. > > Scratch that; its broken, but not because of smp_call_function_single(). > > --- > Subject: stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() > > We must use smp_call_function_single(.wait=1) for the > irq_cpu_stop_queue_work() to ensure the queueing is actually done under > stop_cpus_lock. Without this we could have dropped the lock by the time > we do the queueing and get the race we tried to fix. > > Fixes: 7053ea1a34fa ("stop_machine: Fix race between stop_two_cpus() and stop_cpus()") > Cc: Prarit Bhargava <prarit@redhat.com> > Cc: Rik van Riel <riel@redhat.com> > Cc: Mel Gorman <mgorman@suse.de> > Signed-off-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Prarit Bhargava <prarit@redhat.com> P. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [tip:sched/core] stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() 2014-02-28 12:39 ` Peter Zijlstra 2014-02-28 17:06 ` Rik van Riel 2014-02-28 17:34 ` Prarit Bhargava @ 2014-03-11 12:36 ` tip-bot for Peter Zijlstra 2 siblings, 0 replies; 8+ messages in thread From: tip-bot for Peter Zijlstra @ 2014-03-11 12:36 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, hpa, mingo, peterz, hch, riel, akpm, mgorman, tglx, prarit Commit-ID: 177c53d943368fc97644ebc0a250dc8e2d124250 Gitweb: http://git.kernel.org/tip/177c53d943368fc97644ebc0a250dc8e2d124250 Author: Peter Zijlstra <peterz@infradead.org> AuthorDate: Fri, 28 Feb 2014 13:39:05 +0100 Committer: Ingo Molnar <mingo@kernel.org> CommitDate: Tue, 11 Mar 2014 11:33:47 +0100 stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() We must use smp_call_function_single(.wait=1) for the irq_cpu_stop_queue_work() to ensure the queueing is actually done under stop_cpus_lock. Without this we could have dropped the lock by the time we do the queueing and get the race we tried to fix. Fixes: 7053ea1a34fa ("stop_machine: Fix race between stop_two_cpus() and stop_cpus()") Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20140228123905.GK3104@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> --- kernel/stop_machine.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c index 84571e0..01fbae5 100644 --- a/kernel/stop_machine.c +++ b/kernel/stop_machine.c @@ -293,7 +293,7 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void * */ smp_call_function_single(min(cpu1, cpu2), &irq_cpu_stop_queue_work, - &call_args, 0); + &call_args, 1); lg_local_unlock(&stop_cpus_lock); preempt_enable(); ^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2014-03-11 12:37 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-12-04 16:46 smp_call_function_single with wait=0 considered harmful Christoph Hellwig 2013-12-05 21:43 ` Bjorn Helgaas 2013-12-06 10:56 ` Christoph Hellwig 2014-02-28 12:26 ` Peter Zijlstra 2014-02-28 12:39 ` Peter Zijlstra 2014-02-28 17:06 ` Rik van Riel 2014-02-28 17:34 ` Prarit Bhargava 2014-03-11 12:36 ` [tip:sched/core] stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() tip-bot for Peter Zijlstra
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox