From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>,
David Howells <dhowells@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
Oleg Nesterov <oleg@redhat.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC] percpu: add data dependency barrier in percpu accessors and operations
Date: Thu, 12 Jun 2014 08:34:26 -0700 [thread overview]
Message-ID: <20140612153426.GV4581@linux.vnet.ibm.com> (raw)
In-Reply-To: <20140612135630.GA23606@htj.dyndns.org>
On Thu, Jun 12, 2014 at 09:56:30AM -0400, Tejun Heo wrote:
> >From 2ff90ab638d50d6191ba3a3564b53fccb78bd4cd Mon Sep 17 00:00:00 2001
> From: Tejun Heo <tj@kernel.org>
> Date: Wed, 11 Jun 2014 20:47:02 -0400
>
> percpu areas are zeroed on allocation and, by its nature, accessed
> from multiple cpus. Consider the following scenario.
>
> p = NULL;
>
> CPU-1 CPU-2
> p = alloc_percpu() if (p)
> WARN_ON(this_cpu_read(*p));
>
> As all percpu areas are zeroed on allocation, CPU-2 should never
> trigger the WARN; however, there's no barrier between zeroing of the
> percpu regions and the assignment of @p or between testing of @p and
> dereferencing it and CPU-2 may see garbage value from before the
> clearing and trigger the warning.
>
> Note that this may happen even on archs which don't require data
> dependency barriers. While CPU-2 woudln't reorder percpu area access
> above the testing of @p, nothing prevents CPU-1 from scheduling
> zeroing after the assignment of @p.
>
> This patch fixes the issue by adding a smp_wmb() before a newly
> allocated percpu pointer is returned and a smp_read_barrier_depends()
> in __verify_pcpu_ptr() which is guaranteed to be invoked at least once
> before every percpu access. It currently may be invoked multiple
> times per operation which isn't optimal. Future patches will update
> the definitions so that the macro is invoked only once per access.
>
> It can be argued that smp_read_barrier_depends() is the responsibility
> of the caller; however, discerning local and remote accesses gets very
> confusing with percpu areas (initialized remotely but local to this
> cpu and vice-versa) and data dependency barrier is free on all archs
> except for alpha, so I think it's better to include it as part of
> percpu accessors and operations.
OK, I will bite... Did you find a bug of this form? (I do see a
couple of possible bugs on a quick look, so maybe...)
I would argue that the code above should instead say something like:
smp_store_release(p, alloc_percpu());
I was worried about use of per_cpu() by the reading CPU, but of course
in that case, things are initialized at compiler time.
> I wonder whether we also need a smp_wmb() for other __GFP_ZERO
> allocations. There isn't one and requiring the users to perform
> smp_wmb() to ensure the propagation of zeroing seems too subtle.
I would say "no" even if we do decide that alloc_percpu() needs
an smp_wmb(). The reason is that you really do have to store the
pointer at some point, and you should use smp_store_release() for
this task, at least if you are storing to something accessible to
other CPUs.
Thanx, Paul
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Christoph Lameter <cl@linux-foundation.org>
> Cc: David Howells <dhowells@redhat.com>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> ---
> include/linux/percpu-defs.h | 15 ++++++++++++---
> mm/percpu.c | 9 +++++++++
> 2 files changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/percpu-defs.h b/include/linux/percpu-defs.h
> index a5fc7d0..b91bb37 100644
> --- a/include/linux/percpu-defs.h
> +++ b/include/linux/percpu-defs.h
> @@ -1,6 +1,8 @@
> #ifndef _LINUX_PERCPU_DEFS_H
> #define _LINUX_PERCPU_DEFS_H
>
> +#include <asm/barrier.h>
> +
> /*
> * Base implementations of per-CPU variable declarations and definitions, where
> * the section in which the variable is to be placed is provided by the
> @@ -19,9 +21,15 @@
> __attribute__((section(".discard"), unused))
>
> /*
> - * Macro which verifies @ptr is a percpu pointer without evaluating
> - * @ptr. This is to be used in percpu accessors to verify that the
> - * input parameter is a percpu pointer.
> + * This macro serves two purposes. It verifies @ptr is a percpu pointer
> + * without evaluating @ptr and provides the data dependency barrier paired
> + * with smp_wmb() at the end of the allocation path so that the memory
> + * clearing in the allocation path is visible to all percpu accsses.
> + *
> + * The existence of the data dependency barrier is guaranteed and percpu
> + * users can take advantage of it - e.g. percpu area updates followed by
> + * smp_wmb() and then a percpu pointer assignment are guaranteed to be
> + * visible to accessors which access through the assigned percpu pointer.
> *
> * + 0 is required in order to convert the pointer type from a
> * potential array type to a pointer to a single item of the array.
> @@ -29,6 +37,7 @@
> #define __verify_pcpu_ptr(ptr) do { \
> const void __percpu *__vpp_verify = (typeof((ptr) + 0))NULL; \
> (void)__vpp_verify; \
> + smp_read_barrier_depends(); \
> } while (0)
>
> /*
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 2ddf9a9..bd3cab8 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -816,6 +816,15 @@ area_found:
> /* return address relative to base address */
> ptr = __addr_to_pcpu_ptr(chunk->base_addr + off);
> kmemleak_alloc_percpu(ptr, size);
> +
> + /*
> + * The following wmb is paired with the data dependency barrier in
> + * the percpu accessors and guarantees that the zeroing of the
> + * percpu areas in pcpu_populate_chunk() is visible to all percpu
> + * accesses through the returned percpu pointer. The accessors get
> + * their data dependency barrier from __verify_pcpu_ptr().
> + */
> + smp_wmb();
> return ptr;
>
> fail_unlock:
> --
> 1.9.3
>
next prev parent reply other threads:[~2014-06-12 15:34 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-06-12 13:56 [PATCH RFC] percpu: add data dependency barrier in percpu accessors and operations Tejun Heo
2014-06-12 15:34 ` Paul E. McKenney [this message]
2014-06-12 15:52 ` Tejun Heo
2014-06-17 14:41 ` Paul E. McKenney
2014-06-17 15:27 ` Tejun Heo
2014-06-17 15:56 ` Christoph Lameter
2014-06-17 16:00 ` Tejun Heo
2014-06-17 16:05 ` Tejun Heo
2014-06-17 16:28 ` Christoph Lameter
[not found] ` <CA+55aFxHr8JXwDR-4g4z1mkXvZRtY=OosYcUMPZRD2upfooS1w@mail.gmail.com>
2014-06-17 18:47 ` Christoph Lameter
2014-06-17 18:55 ` Paul E. McKenney
2014-06-17 19:39 ` Christoph Lameter
2014-06-17 19:47 ` Tejun Heo
2014-06-17 19:56 ` Paul E. McKenney
2014-06-19 20:39 ` Christoph Lameter
2014-06-17 16:57 ` Paul E. McKenney
2014-06-17 18:56 ` Tejun Heo
2014-06-17 19:42 ` Christoph Lameter
2014-06-17 20:44 ` Tejun Heo
2014-07-09 0:55 ` Rusty Russell
2014-07-14 11:39 ` Paul E. McKenney
2014-07-14 15:22 ` Christoph Lameter
2014-07-15 10:11 ` Paul E. McKenney
2014-07-15 14:06 ` Christoph Lameter
2014-07-15 14:32 ` Paul E. McKenney
2014-07-15 15:06 ` Christoph Lameter
2014-07-15 15:41 ` Linus Torvalds
2014-07-15 16:12 ` Christoph Lameter
[not found] ` <CA+55aFxU166V5-vH4vmK9OBdTZKyede=71RjjbOVSN9Qh+Se+A@mail.gmail.com>
2014-07-15 17:45 ` Paul E. McKenney
2014-07-15 17:41 ` Paul E. McKenney
2014-07-16 14:40 ` Christoph Lameter
2014-07-15 11:50 ` Rusty Russell
2014-06-17 19:27 ` Christoph Lameter
2014-06-17 19:40 ` Paul E. McKenney
2014-06-19 20:42 ` Christoph Lameter
2014-06-19 20:46 ` Tejun Heo
2014-06-19 21:11 ` Christoph Lameter
2014-06-19 21:15 ` Tejun Heo
2014-06-20 15:23 ` Christoph Lameter
2014-06-20 15:52 ` Tejun Heo
2014-06-19 20:51 ` Paul E. McKenney
2014-06-20 15:29 ` Christoph Lameter
2014-06-20 15:50 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140612153426.GV4581@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=cl@linux-foundation.org \
--cc=dhowells@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=oleg@redhat.com \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox