From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Milton Miller <miltonm@bga.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
akpm@linux-foundation.org, Anton Blanchard <anton@samba.org>,
xiaoguangrong@cn.fujitsu.com, mingo@elte.hu, jaxboe@fusionio.com,
npiggin@gmail.com, rusty@rustcorp.com.au, efault@gmx.de,
Jan Beulich <JBeulich@novell.com>,
Dimitri Sivanich <sivanich@sgi.com>,
Tony Luck <tony.luck@intel.com>,
torvalds@linux-foundation.org, benh@kernel.crashing.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/4 v3] call_function_many: add missing ordering
Date: Wed, 16 Mar 2011 05:06:06 -0700 [thread overview]
Message-ID: <20110316120606.GN2273@linux.vnet.ibm.com> (raw)
In-Reply-To: <smp-call-function-barriers-v3@mdm.bga.com>
On Tue, Mar 15, 2011 at 01:27:16PM -0600, Milton Miller wrote:
> Paul McKenney's review pointed out two problems with the barriers
> in the 2.6.38 update to the smp call function many code.
>
> First, a barrier that would force the func and info members of data
> to be visible before their consumption in the interrupt handler
> was missing. This can be solved by adding a smp_wmb between
> setting the func and info members and setting setting the cpumask;
> this will pair with the existing and required smp_rmb ordering
> the cpumask read before the read of refs. This placement avoids
> the need a second smp_rmb in the interrupt handler which would
> be executed on each of the N cpus executing the call request.
> (I was thinking this barrier was present but was not).
>
> Second, the previous write to refs (establishing the zero that
> we the interrupt handler was testing from all cpus) was performed
> by a third party cpu. This would invoke transitivity which, as
> a recient or concurrent addition to memory-barriers.txt now
> explicitly states, would require a full smp_mb().
>
> However, we know the cpumask will only be set by one cpu (the
> data owner) and any preivous iteration of the mask would have
> cleared by the reading cpu. By redundantly writing refs to 0 on
> the owning cpu before the smp_wmb, the write to refs will follow
> the same path as the writes that set the cpumask, which in turn
> allows us to keep the barrier in the interrupt handler a smp_rmb
> instead of promoting it to a smp_mb (which will be be executed
> by N cpus for each of the possible M elements on the list).
>
> I moved and expanded the comment about our (ab)use of the rcu list
> primitives for the concurrent walk earlier into this function.
> I considered moving the first two paragraphs to the queue list
> head and lock, but felt it would have been too disconected from
> the code.
>
> Cc: Paul McKinney <paulmck@linux.vnet.ibm.com>
> Cc: stable (2.6.32 and later)
> Signed-off-by: Milton Miller <miltonm@bga.com>
>
> Paul, please review this alternative to your patch at either of
Hello, Milton,
At first glance, this looks promising, but it will take me a few
more days to wrap my head fully around it. Fair enough?
Thanx, Paul
> http://marc.info/?l=linux-kernel&m=129662029916241&w=2
> https://patchwork.kernel.org/patch/525891/
>
> In contrast to your patch, this proposal keeps the additional
> barriers in the (interrupt enabled) requester instead of the
> execution interrupt, where they would be executed by all N cpus
> in the system triggered concurrently for each of the N possible
> elements in the list.
>
> Index: common/kernel/smp.c
> ===================================================================
> --- common.orig/kernel/smp.c 2011-03-15 05:21:41.000000000 -0500
> +++ common/kernel/smp.c 2011-03-15 05:22:26.000000000 -0500
> @@ -483,23 +483,42 @@ void smp_call_function_many(const struct
>
> data = &__get_cpu_var(cfd_data);
> csd_lock(&data->csd);
> - BUG_ON(atomic_read(&data->refs) || !cpumask_empty(data->cpumask));
>
> - data->csd.func = func;
> - data->csd.info = info;
> - cpumask_and(data->cpumask, mask, cpu_online_mask);
> - cpumask_clear_cpu(this_cpu, data->cpumask);
> + /* This BUG_ON verifies our reuse assertions and can be removed */
> + BUG_ON(atomic_read(&data->refs) || !cpumask_empty(data->cpumask));
>
> /*
> + * The global call function queue list add and delete are protected
> + * by a lock, but the list is traversed without any lock, relying
> + * on the rcu list add and delete to allow safe concurrent traversal.
> * We reuse the call function data without waiting for any grace
> * period after some other cpu removes it from the global queue.
> - * This means a cpu might find our data block as it is writen.
> - * The interrupt handler waits until it sees refs filled out
> - * while its cpu mask bit is set; here we may only clear our
> - * own cpu mask bit, and must wait to set refs until we are sure
> - * previous writes are complete and we have obtained the lock to
> - * add the element to the queue.
> + * This means a cpu might find our data block as it is being
> + * filled out.
> + *
> + * We hold off the interrupt handler on the other cpu by
> + * ordering our writes to the cpu mask vs our setting of the
> + * refs counter. We assert only the cpu owning the data block
> + * will set a bit in cpumask, and each bit will only be cleared
> + * by the subject cpu. Each cpu must first find its bit is
> + * set and then check that refs is set indicating the element is
> + * ready to be processed, otherwise it must skip the entry.
> + *
> + * On the previous iteration refs was set to 0 by another cpu.
> + * To avoid the use of transitivity, set the counter to 0 here
> + * so the wmb will pair with the rmb in the interrupt handler.
> */
> + atomic_set(&data->refs, 0); /* convert 3rd to 1st party write */
> +
> + data->csd.func = func;
> + data->csd.info = info;
> +
> + /* Ensure 0 refs is visible before mask. Also orders func and info */
> + smp_wmb();
> +
> + /* We rely on the "and" being processed before the store */
> + cpumask_and(data->cpumask, mask, cpu_online_mask);
> + cpumask_clear_cpu(this_cpu, data->cpumask);
>
> raw_spin_lock_irqsave(&call_function.lock, flags);
> /*
> @@ -509,8 +528,9 @@ void smp_call_function_many(const struct
> */
> list_add_rcu(&data->csd.list, &call_function.queue);
> /*
> - * We rely on the wmb() in list_add_rcu to order the writes
> - * to func, data, and cpumask before this write to refs.
> + * We rely on the wmb() in list_add_rcu to complete our writes
> + * to the cpumask before this write to refs, which indicates
> + * data is on the list and is ready to be processed.
> */
> atomic_set(&data->refs, cpumask_weight(data->cpumask));
> raw_spin_unlock_irqrestore(&call_function.lock, flags);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
next prev parent reply other threads:[~2011-03-16 12:06 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-01-12 4:07 [PATCH] smp_call_function_many SMP race Anton Blanchard
2011-01-17 18:17 ` Peter Zijlstra
2011-01-18 21:05 ` Milton Miller
2011-01-18 21:06 ` [PATCH 2/2] consolidate writes in smp_call_funtion_interrupt Milton Miller
2011-01-27 16:22 ` Peter Zijlstra
2011-01-27 21:59 ` Milton Miller
2011-01-29 0:20 ` call_function_many: fix list delete vs add race Milton Miller
2011-01-31 7:21 ` Mike Galbraith
2011-01-31 20:26 ` [PATCH] smp_call_function_many: handle concurrent clearing of mask Milton Miller
2011-02-01 3:15 ` Mike Galbraith
2011-01-31 10:27 ` call_function_many: fix list delete vs add race Peter Zijlstra
2011-01-31 20:26 ` Milton Miller
2011-01-31 20:39 ` Peter Zijlstra
2011-01-31 21:17 ` Peter Zijlstra
2011-01-31 21:36 ` Milton Miller
2011-02-01 0:22 ` Benjamin Herrenschmidt
2011-02-01 1:39 ` Linus Torvalds
2011-02-01 2:18 ` Paul E. McKenney
2011-02-01 2:43 ` Linus Torvalds
2011-02-01 4:45 ` Paul E. McKenney
2011-02-01 5:46 ` Linus Torvalds
2011-02-01 6:18 ` Benjamin Herrenschmidt
2011-02-01 14:13 ` Paul E. McKenney
2011-02-01 6:16 ` Benjamin Herrenschmidt
[not found] ` <ipi-list-reply@mdm.bga.com>
2011-02-01 7:12 ` [PATCH 1/3 v2] " Milton Miller
2011-02-01 22:00 ` Paul E. McKenney
2011-02-01 22:00 ` Milton Miller
2011-02-02 4:17 ` Paul E. McKenney
2011-02-06 23:51 ` Paul E. McKenney
2011-03-15 19:27 ` [PATCH 0/4 v3] smp_call_function_many issues from review Milton Miller
2011-03-15 20:22 ` Luck, Tony
2011-03-15 20:32 ` Dimitri Sivanich
2011-03-15 20:39 ` Peter Zijlstra
2011-03-16 17:55 ` Linus Torvalds
2011-03-16 18:13 ` Peter Zijlstra
2011-03-17 3:15 ` Mike Galbraith
2011-02-07 8:12 ` [PATCH 1/3 v2] call_function_many: fix list delete vs add race Mike Galbraith
2011-02-08 19:36 ` Paul E. McKenney
2011-08-21 6:17 ` Mike Galbraith
2011-02-02 6:22 ` Mike Galbraith
2011-02-01 7:12 ` [PATCH 2/3 v2] smp_call_function_many: handle concurrent clearing of mask Milton Miller
2011-03-15 19:27 ` [PATCH 2/4 v3] call_function_many: add missing ordering Milton Miller
2011-03-16 12:06 ` Paul E. McKenney [this message]
2011-03-15 19:27 ` [PATCH 1/4 v3] call_function_many: fix list delete vs add race Milton Miller
2011-03-15 19:27 ` [PATCH 3/4 v3] smp_call_function_many: handle concurrent clearing of mask Milton Miller
2011-03-15 22:32 ` Catalin Marinas
2011-03-16 7:52 ` Jan Beulich
2011-03-15 19:27 ` [PATCH 4/4 v3] smp_call_function_interrupt: use typedef and %pf Milton Miller
2011-01-18 21:07 ` [PATCH 1/2] smp_call_function_many SMP race Milton Miller
2011-01-20 0:41 ` Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110316120606.GN2273@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=JBeulich@novell.com \
--cc=akpm@linux-foundation.org \
--cc=anton@samba.org \
--cc=benh@kernel.crashing.org \
--cc=efault@gmx.de \
--cc=jaxboe@fusionio.com \
--cc=linux-kernel@vger.kernel.org \
--cc=miltonm@bga.com \
--cc=mingo@elte.hu \
--cc=npiggin@gmail.com \
--cc=peterz@infradead.org \
--cc=rusty@rustcorp.com.au \
--cc=sivanich@sgi.com \
--cc=tony.luck@intel.com \
--cc=torvalds@linux-foundation.org \
--cc=xiaoguangrong@cn.fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).