From: Milton Miller <miltonm@bga.com>
To: Anton Blanchard <anton@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
xiaoguangrong@cn.fujitsu.com, mingo@elte.hu, jaxboe@fusionio.com,
npiggin@gmail.com, rusty@rustcorp.com.au,
akpm@linux-foundation.org, torvalds@linux-foundation.org,
paulmck@linux.vnet.ibm.com, miltonm@bga.com,
benh@kernel.crashing.org, linux-kernel@vger.kernel.org
Subject: [PATCH 1/2] smp_call_function_many SMP race
Date: Tue, 18 Jan 2011 15:07:25 -0600
Message-ID: <smp-call-function-simplified@mdm.bga.com>
In-Reply-To: <smp-call-function-peter-reply@mdm.bga.com>
From: Anton Blanchard <anton@samba.org>
I noticed a failure where we hit the following WARN_ON in
generic_smp_call_function_interrupt:

	if (!cpumask_test_and_clear_cpu(cpu, data->cpumask))
		continue;

	data->csd.func(data->csd.info);

	refs = atomic_dec_return(&data->refs);
	WARN_ON(refs < 0);      <-------------------------

We atomically tested and cleared our bit in the cpumask, and yet the
number of cpus left (i.e. refs) was 0. How can this be?
It turns out commit 54fdade1c3332391948ec43530c02c4794a38172
(generic-ipi: make struct call_function_data lockless)
is at fault. It removes locking from smp_call_function_many and in
doing so creates a rather complicated race.
The problem comes about because:
- The smp_call_function_many interrupt handler walks call_function.queue
without any locking.
- We reuse a percpu data structure in smp_call_function_many.
- We do not wait for any RCU grace period before starting the next
smp_call_function_many.
Imagine a scenario where CPU A does two smp_call_functions back to
back, and CPU B does an smp_call_function in between. We concentrate on
how CPU C handles the calls:

CPU A                     CPU B              CPU C                          CPU D

smp_call_function
                                             smp_call_function_interrupt
                                               walks
                                               call_function.queue sees
                                               data from CPU A on list
                          smp_call_function
                                             smp_call_function_interrupt
                                               walks
                                               call_function.queue sees
                                               (stale) CPU A on list
                                                                            smp_call_function int
                                                                            clears last ref on A
                                                                            list_del_rcu, unlock
smp_call_function reuses
percpu *data A
                                             data->cpumask sees and
                                             clears bit in cpumask
                                             might be using old or new fn!
                                             decrements refs below 0
set data->refs (too late!)

The important thing to note is that, since the interrupt handler walks a
potentially stale call_function.queue without any locking, another
cpu can view the percpu *data structure at any time, even when the
owner is in the process of initialising it.
The following test case hits the WARN_ON 100% of the time on my PowerPC
box (having 128 threads does help :)

#include <linux/module.h>
#include <linux/init.h>
#include <linux/smp.h>		/* smp_call_function() */
#include <linux/workqueue.h>	/* INIT_WORK(), schedule_work_on() */

#define ITERATIONS 100

static void do_nothing_ipi(void *dummy)
{
}

/* Each cpu fires ITERATIONS back-to-back synchronous IPIs at all others. */
static void do_ipis(struct work_struct *dummy)
{
	int i;

	for (i = 0; i < ITERATIONS; i++)
		smp_call_function(do_nothing_ipi, NULL, 1);

	printk(KERN_DEBUG "cpu %d finished\n", smp_processor_id());
}

static struct work_struct work[NR_CPUS];

static int __init testcase_init(void)
{
	int cpu;

	for_each_online_cpu(cpu) {
		INIT_WORK(&work[cpu], do_ipis);
		schedule_work_on(cpu, &work[cpu]);
	}

	return 0;
}

static void __exit testcase_exit(void)
{
}

module_init(testcase_init)
module_exit(testcase_exit)
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Anton Blanchard");

I tried to fix it by ordering the read and the write of ->cpumask and
->refs. In doing so I missed a critical case but Paul McKenney was able
to spot my bug thankfully :) To ensure we aren't viewing previous
iterations the interrupt handler needs to read ->refs then ->cpumask
then ->refs _again_.
Thanks to Milton Miller and Paul McKenney for helping to debug this
issue.
[Milton Miller: add WARN_ON and BUG_ON, remove extra read of refs before
initial read of mask that doesn't help (also noted by Peter Zijlstra),
adjust comments, hopefully clarify scenario]
Signed-off-by: Anton Blanchard <anton@samba.org>
Revised-by: Milton Miller <miltonm@bga.com> [ removed excess tests ]
Signed-off-by: Milton Miller <miltonm@bga.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: stable@kernel.org # 2.6.32 and later
Index: common/kernel/smp.c
===================================================================
--- common.orig/kernel/smp.c 2011-01-17 20:15:54.000000000 -0600
+++ common/kernel/smp.c 2011-01-17 20:16:18.000000000 -0600
@@ -194,6 +194,24 @@ void generic_smp_call_function_interrupt
 	list_for_each_entry_rcu(data, &call_function.queue, csd.list) {
 		int refs;
 
+		/*
+		 * Since we walk the list without any locks, we might
+		 * see an entry that was completed, removed from the
+		 * list and is in the process of being reused.
+		 *
+		 * We must check that the cpu is in the cpumask before
+		 * checking the refs, and both must be set before
+		 * executing the callback on this cpu.
+		 */
+
+		if (!cpumask_test_cpu(cpu, data->cpumask))
+			continue;
+
+		smp_rmb();
+
+		if (atomic_read(&data->refs) == 0)
+			continue;
+
 		if (!cpumask_test_and_clear_cpu(cpu, data->cpumask))
 			continue;
 
@@ -202,6 +220,8 @@ void generic_smp_call_function_interrupt
 		refs = atomic_dec_return(&data->refs);
 		WARN_ON(refs < 0);
 		if (!refs) {
+			WARN_ON(!cpumask_empty(data->cpumask));
+
 			raw_spin_lock(&call_function.lock);
 			list_del_rcu(&data->csd.list);
 			raw_spin_unlock(&call_function.lock);
@@ -453,11 +473,21 @@ void smp_call_function_many(const struct
 
 	data = &__get_cpu_var(cfd_data);
 	csd_lock(&data->csd);
+	BUG_ON(atomic_read(&data->refs) || !cpumask_empty(data->cpumask));
 
 	data->csd.func = func;
 	data->csd.info = info;
 	cpumask_and(data->cpumask, mask, cpu_online_mask);
 	cpumask_clear_cpu(this_cpu, data->cpumask);
+
+	/*
+	 * To ensure the interrupt handler gets a complete view
+	 * we order the cpumask and refs writes and order the read
+	 * of them in the interrupt handler.  In addition we may
+	 * only clear our own cpu bit from the mask.
+	 */
+	smp_wmb();
+
 	atomic_set(&data->refs, cpumask_weight(data->cpumask));
 
 	raw_spin_lock_irqsave(&call_function.lock, flags);