* Re: [RFC PATCH 1/2] arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables
[not found] ` <20170928084535.GA19060@arm.com>
@ 2017-09-28 15:43 ` Paul E. McKenney
2017-09-28 15:49 ` Will Deacon
2017-09-28 18:59 ` Michael Cree
0 siblings, 2 replies; 13+ messages in thread
From: Paul E. McKenney @ 2017-09-28 15:43 UTC (permalink / raw)
To: Will Deacon
Cc: Peter Zijlstra, kirill.shutemov, linux-kernel, ynorov, rruigrok,
linux-arch, akpm, catalin.marinas, rth, ink, mattst88,
linux-alpha
On Thu, Sep 28, 2017 at 09:45:35AM +0100, Will Deacon wrote:
> On Thu, Sep 28, 2017 at 10:38:01AM +0200, Peter Zijlstra wrote:
> > On Wed, Sep 27, 2017 at 04:49:28PM +0100, Will Deacon wrote:
> > > In many cases, page tables can be accessed concurrently by either another
> > > CPU (due to things like fast gup) or by the hardware page table walker
> > > itself, which may set access/dirty bits. In such cases, it is important
> > > to use READ_ONCE/WRITE_ONCE when accessing page table entries so that
> > > entries cannot be torn, merged or subject to apparent loss of coherence.
> >
> > In fact, we should use lockless_dereference() for many of them. Yes
> > Alpha is the only one that cares about the difference between that and
> > READ_ONCE() and they do have the extra barrier, but if we're going to do
> > this, we might as well do it 'right' :-)
>
> I know this sounds daft, but I think one of the big reasons why
> lockless_dereference() doesn't get an awful lot of use is because it's
> such a mouthful! Why don't we just move the smp_read_barrier_depends()
> into READ_ONCE? Would anybody actually care about the potential impact on
> Alpha (which, frankly, is treading on thin ice given the low adoption of
> lockless_dereference())?
This is my cue to ask my usual question... ;-)
Are people still running mainline kernels on Alpha? (Added Alpha folks.)
As always, if anyone is, we must continue to support Alpha, but sounds
like time to check again.
Thanx, Paul
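For readers joining the thread here, the hazard described in the quoted commit message can be sketched as follows. This is an illustrative stand-in, not the arm64 patch itself; the type and helper names are made up for the example:

/* Illustrative sketch only -- not the actual arm64 code. */
#include <linux/compiler.h>		/* READ_ONCE(), WRITE_ONCE() */

typedef unsigned long pteval_t;		/* simplified stand-in for the real pte type */

/* A hardware walker may set access/dirty bits in *ptep at any time. */
static inline pteval_t pte_load_racy(pteval_t *ptep)
{
	return *ptep;			/* compiler may reload, tear or merge this load */
}

static inline pteval_t pte_load_once(pteval_t *ptep)
{
	return READ_ONCE(*ptep);	/* one full-width load, never re-read */
}

static inline void pte_store_once(pteval_t *ptep, pteval_t pte)
{
	WRITE_ONCE(*ptep, pte);		/* one full-width store, cannot be torn */
}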
* Re: [RFC PATCH 1/2] arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables
2017-09-28 15:43 ` [RFC PATCH 1/2] arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables Paul E. McKenney
@ 2017-09-28 15:49 ` Will Deacon
2017-09-28 16:07 ` Paul E. McKenney
2017-09-28 18:59 ` Michael Cree
1 sibling, 1 reply; 13+ messages in thread
From: Will Deacon @ 2017-09-28 15:49 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Peter Zijlstra, kirill.shutemov, linux-kernel, ynorov, rruigrok,
linux-arch, akpm, catalin.marinas, rth, ink, mattst88,
linux-alpha
On Thu, Sep 28, 2017 at 08:43:54AM -0700, Paul E. McKenney wrote:
> On Thu, Sep 28, 2017 at 09:45:35AM +0100, Will Deacon wrote:
> > On Thu, Sep 28, 2017 at 10:38:01AM +0200, Peter Zijlstra wrote:
> > > On Wed, Sep 27, 2017 at 04:49:28PM +0100, Will Deacon wrote:
> > > > In many cases, page tables can be accessed concurrently by either another
> > > > CPU (due to things like fast gup) or by the hardware page table walker
> > > > itself, which may set access/dirty bits. In such cases, it is important
> > > > to use READ_ONCE/WRITE_ONCE when accessing page table entries so that
> > > > entries cannot be torn, merged or subject to apparent loss of coherence.
> > >
> > > In fact, we should use lockless_dereference() for many of them. Yes
> > > Alpha is the only one that cares about the difference between that and
> > > READ_ONCE() and they do have the extra barrier, but if we're going to do
> > > this, we might as well do it 'right' :-)
> >
> > I know this sounds daft, but I think one of the big reasons why
> > lockless_dereference() doesn't get an awful lot of use is because it's
> > such a mouthful! Why don't we just move the smp_read_barrier_depends()
> > into READ_ONCE? Would anybody actually care about the potential impact on
> > Alpha (which, frankly, is treading on thin ice given the low adoption of
> > lockless_dereference())?
>
> This is my cue to ask my usual question... ;-)
>
> Are people still running mainline kernels on Alpha? (Added Alpha folks.)
>
> As always, if anyone is, we must continue to support Alpha, but sounds
> like time to check again.
I'll be honest and say that I haven't updated mine for a while, but I do
have a soft spot for those machines :(
Will
* Re: [RFC PATCH 1/2] arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables
2017-09-28 15:49 ` Will Deacon
@ 2017-09-28 16:07 ` Paul E. McKenney
0 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2017-09-28 16:07 UTC (permalink / raw)
To: Will Deacon
Cc: Peter Zijlstra, kirill.shutemov, linux-kernel, ynorov, rruigrok,
linux-arch, akpm, catalin.marinas, rth, ink, mattst88,
linux-alpha
On Thu, Sep 28, 2017 at 04:49:54PM +0100, Will Deacon wrote:
> On Thu, Sep 28, 2017 at 08:43:54AM -0700, Paul E. McKenney wrote:
> > On Thu, Sep 28, 2017 at 09:45:35AM +0100, Will Deacon wrote:
> > > On Thu, Sep 28, 2017 at 10:38:01AM +0200, Peter Zijlstra wrote:
> > > > On Wed, Sep 27, 2017 at 04:49:28PM +0100, Will Deacon wrote:
> > > > > In many cases, page tables can be accessed concurrently by either another
> > > > > CPU (due to things like fast gup) or by the hardware page table walker
> > > > > itself, which may set access/dirty bits. In such cases, it is important
> > > > > to use READ_ONCE/WRITE_ONCE when accessing page table entries so that
> > > > > entries cannot be torn, merged or subject to apparent loss of coherence.
> > > >
> > > > In fact, we should use lockless_dereference() for many of them. Yes
> > > > Alpha is the only one that cares about the difference between that and
> > > > READ_ONCE() and they do have the extra barrier, but if we're going to do
> > > > this, we might as well do it 'right' :-)
> > >
> > > I know this sounds daft, but I think one of the big reasons why
> > > lockless_dereference() doesn't get an awful lot of use is because it's
> > > such a mouthful! Why don't we just move the smp_read_barrier_depends()
> > > into READ_ONCE? Would anybody actually care about the potential impact on
> > > Alpha (which, frankly, is treading on thin ice given the low adoption of
> > > lockless_dereference())?
> >
> > This is my cue to ask my usual question... ;-)
> >
> > Are people still running mainline kernels on Alpha? (Added Alpha folks.)
> >
> > As always, if anyone is, we must continue to support Alpha, but sounds
> > like time to check again.
>
> I'll be honest and say that I haven't updated mine for a while, but I do
> have a soft spot for those machines :(
Let's see what the Alpha folks say. I myself have had a close
relationship with Alpha for almost 20 years, but I suspect that in
my case it is more a hard spot on my head rather than a soft spot in
my heart. ;-)
Thanx,
Paul
* Re: [RFC PATCH 1/2] arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables
2017-09-28 15:43 ` [RFC PATCH 1/2] arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables Paul E. McKenney
2017-09-28 15:49 ` Will Deacon
@ 2017-09-28 18:59 ` Michael Cree
2017-09-29 0:58 ` Paul E. McKenney
1 sibling, 1 reply; 13+ messages in thread
From: Michael Cree @ 2017-09-28 18:59 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Will Deacon, Peter Zijlstra, kirill.shutemov, linux-kernel,
ynorov, rruigrok, linux-arch, akpm, catalin.marinas, rth, ink,
mattst88, linux-alpha
On Thu, Sep 28, 2017 at 08:43:54AM -0700, Paul E. McKenney wrote:
> On Thu, Sep 28, 2017 at 09:45:35AM +0100, Will Deacon wrote:
> > On Thu, Sep 28, 2017 at 10:38:01AM +0200, Peter Zijlstra wrote:
> > > On Wed, Sep 27, 2017 at 04:49:28PM +0100, Will Deacon wrote:
> > > > In many cases, page tables can be accessed concurrently by either another
> > > > CPU (due to things like fast gup) or by the hardware page table walker
> > > > itself, which may set access/dirty bits. In such cases, it is important
> > > > to use READ_ONCE/WRITE_ONCE when accessing page table entries so that
> > > > entries cannot be torn, merged or subject to apparent loss of coherence.
> > >
> > > In fact, we should use lockless_dereference() for many of them. Yes
> > > Alpha is the only one that cares about the difference between that and
> > > READ_ONCE() and they do have the extra barrier, but if we're going to do
> > > this, we might as well do it 'right' :-)
> >
> > I know this sounds daft, but I think one of the big reasons why
> > lockless_dereference() doesn't get an awful lot of use is because it's
> > such a mouthful! Why don't we just move the smp_read_barrier_depends()
> > into READ_ONCE? Would anybody actually care about the potential impact on
> > Alpha (which, frankly, is treading on thin ice given the low adoption of
> > lockless_dereference())?
>
> This is my cue to ask my usual question... ;-)
>
> Are people still running mainline kernels on Alpha? (Added Alpha folks.)
Yes. I run two Alpha build daemons that build the unofficial
debian-alpha port. Debian popcon reports nine machines running
Alpha, which are likely to be running the 4.12.y kernel which
is currently in debian-alpha, (and presumably soon to be 4.13.y
which is now built on Alpha in experimental).
Cheers
Michael
* Re: [RFC PATCH 1/2] arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables
2017-09-28 18:59 ` Michael Cree
@ 2017-09-29 0:58 ` Paul E. McKenney
2017-09-29 9:08 ` Will Deacon
0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2017-09-29 0:58 UTC (permalink / raw)
To: Michael Cree, Will Deacon, Peter Zijlstra, kirill.shutemov,
linux-kernel, ynorov, rruigrok, linux-arch, akpm, catalin.marinas,
rth, ink, mattst88, linux-alpha
On Fri, Sep 29, 2017 at 07:59:09AM +1300, Michael Cree wrote:
> On Thu, Sep 28, 2017 at 08:43:54AM -0700, Paul E. McKenney wrote:
> > On Thu, Sep 28, 2017 at 09:45:35AM +0100, Will Deacon wrote:
> > > On Thu, Sep 28, 2017 at 10:38:01AM +0200, Peter Zijlstra wrote:
> > > > On Wed, Sep 27, 2017 at 04:49:28PM +0100, Will Deacon wrote:
> > > > > In many cases, page tables can be accessed concurrently by either another
> > > > > CPU (due to things like fast gup) or by the hardware page table walker
> > > > > itself, which may set access/dirty bits. In such cases, it is important
> > > > > to use READ_ONCE/WRITE_ONCE when accessing page table entries so that
> > > > > entries cannot be torn, merged or subject to apparent loss of coherence.
> > > >
> > > > In fact, we should use lockless_dereference() for many of them. Yes
> > > > Alpha is the only one that cares about the difference between that and
> > > > READ_ONCE() and they do have the extra barrier, but if we're going to do
> > > > this, we might as well do it 'right' :-)
> > >
> > > I know this sounds daft, but I think one of the big reasons why
> > > lockless_dereference() doesn't get an awful lot of use is because it's
> > > such a mouthful! Why don't we just move the smp_read_barrier_depends()
> > > into READ_ONCE? Would anybody actually care about the potential impact on
> > > Alpha (which, frankly, is treading on thin ice given the low adoption of
> > > lockless_dereference())?
> >
> > This is my cue to ask my usual question... ;-)
> >
> > Are people still running mainline kernels on Alpha? (Added Alpha folks.)
>
> Yes. I run two Alpha build daemons that build the unofficial
> debian-alpha port. Debian popcon reports nine machines running
> Alpha, which are likely to be running the 4.12.y kernel which
> is currently in debian-alpha, (and presumably soon to be 4.13.y
> which is now built on Alpha in experimental).
I salute your dedication to Alpha! ;-)
Thanx, Paul
* Re: [RFC PATCH 1/2] arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables
2017-09-29 0:58 ` Paul E. McKenney
@ 2017-09-29 9:08 ` Will Deacon
2017-09-29 16:29 ` Paul E. McKenney
0 siblings, 1 reply; 13+ messages in thread
From: Will Deacon @ 2017-09-29 9:08 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Michael Cree, Peter Zijlstra, kirill.shutemov, linux-kernel,
ynorov, rruigrok, linux-arch, akpm, catalin.marinas, rth, ink,
mattst88, linux-alpha
On Thu, Sep 28, 2017 at 05:58:30PM -0700, Paul E. McKenney wrote:
> On Fri, Sep 29, 2017 at 07:59:09AM +1300, Michael Cree wrote:
> > On Thu, Sep 28, 2017 at 08:43:54AM -0700, Paul E. McKenney wrote:
> > > On Thu, Sep 28, 2017 at 09:45:35AM +0100, Will Deacon wrote:
> > > > On Thu, Sep 28, 2017 at 10:38:01AM +0200, Peter Zijlstra wrote:
> > > > > On Wed, Sep 27, 2017 at 04:49:28PM +0100, Will Deacon wrote:
> > > > > > In many cases, page tables can be accessed concurrently by either another
> > > > > > CPU (due to things like fast gup) or by the hardware page table walker
> > > > > > itself, which may set access/dirty bits. In such cases, it is important
> > > > > > to use READ_ONCE/WRITE_ONCE when accessing page table entries so that
> > > > > > entries cannot be torn, merged or subject to apparent loss of coherence.
> > > > >
> > > > > In fact, we should use lockless_dereference() for many of them. Yes
> > > > > Alpha is the only one that cares about the difference between that and
> > > > > READ_ONCE() and they do have the extra barrier, but if we're going to do
> > > > > this, we might as well do it 'right' :-)
> > > >
> > > > I know this sounds daft, but I think one of the big reasons why
> > > > lockless_dereference() doesn't get an awful lot of use is because it's
> > > > such a mouthful! Why don't we just move the smp_read_barrier_depends()
> > > > into READ_ONCE? Would anybody actually care about the potential impact on
> > > > Alpha (which, frankly, is treading on thin ice given the low adoption of
> > > > lockless_dereference())?
> > >
> > > This is my cue to ask my usual question... ;-)
> > >
> > > Are people still running mainline kernels on Alpha? (Added Alpha folks.)
> >
> > Yes. I run two Alpha build daemons that build the unofficial
> > debian-alpha port. Debian popcon reports nine machines running
> > Alpha, which are likely to be running the 4.12.y kernel which
> > is currently in debian-alpha, (and presumably soon to be 4.13.y
> > which is now built on Alpha in experimental).
>
> I salute your dedication to Alpha! ;-)
Ok, but where does that leave us wrt my initial proposal of moving
smp_read_barrier_depends() into READ_ONCE and getting rid of
lockless_dereference?
Michael (or anybody else running mainline on SMP Alpha) -- would you be
able to give the diff below a spin and see whether there's a measurable
performance impact?
Cheers,
Will
--->8
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index e95a2631e545..0ce21e25492a 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -340,6 +340,7 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
__read_once_size(&(x), __u.__c, sizeof(x)); \
else \
__read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
+ smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \
__u.__val; \
})
#define READ_ONCE(x) __READ_ONCE(x, 1)
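Roughly speaking, and ignoring the type-checking machinery, the proposed READ_ONCE() then behaves like the simplified model below; this is a sketch of the idea, not the real <linux/compiler.h> definition:

/* Simplified model of the proposal -- not the real kernel macro. */
#ifdef CONFIG_ALPHA
#define smp_read_barrier_depends()	__asm__ __volatile__("mb" : : : "memory")
#else
#define smp_read_barrier_depends()	do { } while (0)
#endif

#define READ_ONCE(x)						\
({								\
	typeof(x) ___val = *(volatile typeof(x) *)&(x);		\
	smp_read_barrier_depends(); /* order dependent loads */ \
	___val;							\
})

On arm64, x86 and everything else the added barrier compiles away; only Alpha pays for an extra mb per READ_ONCE(), which is why the performance question above is directed at SMP Alpha specifically.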
* Re: [RFC PATCH 1/2] arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables
2017-09-29 9:08 ` Will Deacon
@ 2017-09-29 16:29 ` Paul E. McKenney
2017-09-29 16:33 ` Will Deacon
0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2017-09-29 16:29 UTC (permalink / raw)
To: Will Deacon
Cc: Michael Cree, Peter Zijlstra, kirill.shutemov, linux-kernel,
ynorov, rruigrok, linux-arch, akpm, catalin.marinas, rth, ink,
mattst88, linux-alpha
On Fri, Sep 29, 2017 at 10:08:43AM +0100, Will Deacon wrote:
> On Thu, Sep 28, 2017 at 05:58:30PM -0700, Paul E. McKenney wrote:
> > On Fri, Sep 29, 2017 at 07:59:09AM +1300, Michael Cree wrote:
> > > On Thu, Sep 28, 2017 at 08:43:54AM -0700, Paul E. McKenney wrote:
> > > > On Thu, Sep 28, 2017 at 09:45:35AM +0100, Will Deacon wrote:
> > > > > On Thu, Sep 28, 2017 at 10:38:01AM +0200, Peter Zijlstra wrote:
> > > > > > On Wed, Sep 27, 2017 at 04:49:28PM +0100, Will Deacon wrote:
> > > > > > > In many cases, page tables can be accessed concurrently by either another
> > > > > > > CPU (due to things like fast gup) or by the hardware page table walker
> > > > > > > itself, which may set access/dirty bits. In such cases, it is important
> > > > > > > to use READ_ONCE/WRITE_ONCE when accessing page table entries so that
> > > > > > > entries cannot be torn, merged or subject to apparent loss of coherence.
> > > > > >
> > > > > > In fact, we should use lockless_dereference() for many of them. Yes
> > > > > > Alpha is the only one that cares about the difference between that and
> > > > > > READ_ONCE() and they do have the extra barrier, but if we're going to do
> > > > > > this, we might as well do it 'right' :-)
> > > > >
> > > > > I know this sounds daft, but I think one of the big reasons why
> > > > > lockless_dereference() doesn't get an awful lot of use is because it's
> > > > > such a mouthful! Why don't we just move the smp_read_barrier_depends()
> > > > > into READ_ONCE? Would anybody actually care about the potential impact on
> > > > > Alpha (which, frankly, is treading on thin ice given the low adoption of
> > > > > lockless_dereference())?
> > > >
> > > > This is my cue to ask my usual question... ;-)
> > > >
> > > > Are people still running mainline kernels on Alpha? (Added Alpha folks.)
> > >
> > > Yes. I run two Alpha build daemons that build the unofficial
> > > debian-alpha port. Debian popcon reports nine machines running
> > > Alpha, which are likely to be running the 4.12.y kernel which
> > > is currently in debian-alpha, (and presumably soon to be 4.13.y
> > > which is now built on Alpha in experimental).
> >
> > I salute your dedication to Alpha! ;-)
>
> Ok, but where does that leave us wrt my initial proposal of moving
> smp_read_barrier_depends() into READ_ONCE and getting rid of
> lockless_dereference?
>
> Michael (or anybody else running mainline on SMP Alpha) -- would you be
> able to give the diff below a spin and see whether there's a measurable
> performance impact?
This will be a sensitive test. The smp_read_barrier_depends() can be
removed from lockless_dereference(). Without this removal Alpha will
get two memory barriers from rcu_dereference() and friends.
Thanx, Paul
> Cheers,
>
> Will
>
> --->8
>
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index e95a2631e545..0ce21e25492a 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -340,6 +340,7 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
> __read_once_size(&(x), __u.__c, sizeof(x)); \
> else \
> __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
> + smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \
> __u.__val; \
> })
> #define READ_ONCE(x) __READ_ONCE(x, 1)
>
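To make the double-barrier concern concrete: with only the hunk above applied, a dependent-load site on Alpha would execute two mb instructions, one now inside READ_ONCE() and one still inside lockless_dereference(). A rough sketch, with made-up types and assuming the kernel's READ_ONCE()/smp_read_barrier_depends(), of what rcu_dereference_raw() would boil down to:

/* Illustrative expansion only -- made-up types, simplified macros. */
struct foo { int a; };
struct foo *gp;				/* published elsewhere via smp_store_release() */

static int reader(void)
{
	/* rcu_dereference_raw(gp) -> lockless_dereference(gp) -> READ_ONCE(gp) */
	struct foo *p = READ_ONCE(gp);	/* volatile load, then mb (new barrier)     */
	smp_read_barrier_depends();	/* mb again, from lockless_dereference()    */
	return p->a;			/* the dependent load both barriers order   */
}

The fixed diff in the follow-up drops the second barrier, so Alpha is back to one mb per dereference.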
* Re: [RFC PATCH 1/2] arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables
2017-09-29 16:29 ` Paul E. McKenney
@ 2017-09-29 16:33 ` Will Deacon
2017-10-03 19:11 ` Paul E. McKenney
0 siblings, 1 reply; 13+ messages in thread
From: Will Deacon @ 2017-09-29 16:33 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Michael Cree, Peter Zijlstra, kirill.shutemov, linux-kernel,
ynorov, rruigrok, linux-arch, akpm, catalin.marinas, rth, ink,
mattst88, linux-alpha
On Fri, Sep 29, 2017 at 09:29:39AM -0700, Paul E. McKenney wrote:
> On Fri, Sep 29, 2017 at 10:08:43AM +0100, Will Deacon wrote:
> > On Thu, Sep 28, 2017 at 05:58:30PM -0700, Paul E. McKenney wrote:
> > > On Fri, Sep 29, 2017 at 07:59:09AM +1300, Michael Cree wrote:
> > > > On Thu, Sep 28, 2017 at 08:43:54AM -0700, Paul E. McKenney wrote:
> > > > > On Thu, Sep 28, 2017 at 09:45:35AM +0100, Will Deacon wrote:
> > > > > > On Thu, Sep 28, 2017 at 10:38:01AM +0200, Peter Zijlstra wrote:
> > > > > > > On Wed, Sep 27, 2017 at 04:49:28PM +0100, Will Deacon wrote:
> > > > > > > > In many cases, page tables can be accessed concurrently by either another
> > > > > > > > CPU (due to things like fast gup) or by the hardware page table walker
> > > > > > > > itself, which may set access/dirty bits. In such cases, it is important
> > > > > > > > to use READ_ONCE/WRITE_ONCE when accessing page table entries so that
> > > > > > > > entries cannot be torn, merged or subject to apparent loss of coherence.
> > > > > > >
> > > > > > > In fact, we should use lockless_dereference() for many of them. Yes
> > > > > > > Alpha is the only one that cares about the difference between that and
> > > > > > > READ_ONCE() and they do have the extra barrier, but if we're going to do
> > > > > > > this, we might as well do it 'right' :-)
> > > > > >
> > > > > > I know this sounds daft, but I think one of the big reasons why
> > > > > > lockless_dereference() doesn't get an awful lot of use is because it's
> > > > > > such a mouthful! Why don't we just move the smp_read_barrier_depends()
> > > > > > into READ_ONCE? Would anybody actually care about the potential impact on
> > > > > > Alpha (which, frankly, is treading on thin ice given the low adoption of
> > > > > > lockless_dereference())?
> > > > >
> > > > > This is my cue to ask my usual question... ;-)
> > > > >
> > > > > Are people still running mainline kernels on Alpha? (Added Alpha folks.)
> > > >
> > > > Yes. I run two Alpha build daemons that build the unofficial
> > > > debian-alpha port. Debian popcon reports nine machines running
> > > > Alpha, which are likely to be running the 4.12.y kernel which
> > > > is currently in debian-alpha, (and presumably soon to be 4.13.y
> > > > which is now built on Alpha in experimental).
> > >
> > > I salute your dedication to Alpha! ;-)
> >
> > Ok, but where does that leave us wrt my initial proposal of moving
> > smp_read_barrier_depends() into READ_ONCE and getting rid of
> > lockless_dereference?
> >
> > Michael (or anybody else running mainline on SMP Alpha) -- would you be
> > able to give the diff below a spin and see whether there's a measurable
> > performance impact?
>
> This will be a sensitive test. The smp_read_barrier_depends() can be
> removed from lockless_dereference(). Without this removal Alpha will
> get two memory barriers from rcu_dereference() and friends.
Oh yes, good point. I was trying to keep the diff simple, but you're
right that this is packing too many barriers. Fixed diff below.
Thanks,
Will
--->8
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index e95a2631e545..c4ee9d6d8f2d 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -340,6 +340,7 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
__read_once_size(&(x), __u.__c, sizeof(x)); \
else \
__read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
+ smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \
__u.__val; \
})
#define READ_ONCE(x) __READ_ONCE(x, 1)
@@ -620,7 +621,6 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
({ \
typeof(p) _________p1 = READ_ONCE(p); \
typeof(*(p)) *___typecheck_p __maybe_unused; \
- smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
(_________p1); \
})
* Re: [RFC PATCH 1/2] arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables
2017-09-29 16:33 ` Will Deacon
@ 2017-10-03 19:11 ` Paul E. McKenney
2017-10-05 16:31 ` Will Deacon
0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2017-10-03 19:11 UTC (permalink / raw)
To: Will Deacon
Cc: Michael Cree, Peter Zijlstra, kirill.shutemov, linux-kernel,
ynorov, rruigrok, linux-arch, akpm, catalin.marinas, rth, ink,
mattst88, linux-alpha
On Fri, Sep 29, 2017 at 05:33:49PM +0100, Will Deacon wrote:
> On Fri, Sep 29, 2017 at 09:29:39AM -0700, Paul E. McKenney wrote:
> > On Fri, Sep 29, 2017 at 10:08:43AM +0100, Will Deacon wrote:
> > > On Thu, Sep 28, 2017 at 05:58:30PM -0700, Paul E. McKenney wrote:
> > > > On Fri, Sep 29, 2017 at 07:59:09AM +1300, Michael Cree wrote:
> > > > > On Thu, Sep 28, 2017 at 08:43:54AM -0700, Paul E. McKenney wrote:
> > > > > > On Thu, Sep 28, 2017 at 09:45:35AM +0100, Will Deacon wrote:
> > > > > > > On Thu, Sep 28, 2017 at 10:38:01AM +0200, Peter Zijlstra wrote:
> > > > > > > > On Wed, Sep 27, 2017 at 04:49:28PM +0100, Will Deacon wrote:
> > > > > > > > > In many cases, page tables can be accessed concurrently by either another
> > > > > > > > > CPU (due to things like fast gup) or by the hardware page table walker
> > > > > > > > > itself, which may set access/dirty bits. In such cases, it is important
> > > > > > > > > to use READ_ONCE/WRITE_ONCE when accessing page table entries so that
> > > > > > > > > entries cannot be torn, merged or subject to apparent loss of coherence.
> > > > > > > >
> > > > > > > > In fact, we should use lockless_dereference() for many of them. Yes
> > > > > > > > Alpha is the only one that cares about the difference between that and
> > > > > > > > READ_ONCE() and they do have the extra barrier, but if we're going to do
> > > > > > > > this, we might as well do it 'right' :-)
> > > > > > >
> > > > > > > I know this sounds daft, but I think one of the big reasons why
> > > > > > > lockless_dereference() doesn't get an awful lot of use is because it's
> > > > > > > such a mouthful! Why don't we just move the smp_read_barrier_depends()
> > > > > > > into READ_ONCE? Would anybody actually care about the potential impact on
> > > > > > > Alpha (which, frankly, is treading on thin ice given the low adoption of
> > > > > > > lockless_dereference())?
> > > > > >
> > > > > > This is my cue to ask my usual question... ;-)
> > > > > >
> > > > > > Are people still running mainline kernels on Alpha? (Added Alpha folks.)
> > > > >
> > > > > Yes. I run two Alpha build daemons that build the unofficial
> > > > > debian-alpha port. Debian popcon reports nine machines running
> > > > > Alpha, which are likely to be running the 4.12.y kernel which
> > > > > is currently in debian-alpha, (and presumably soon to be 4.13.y
> > > > > which is now built on Alpha in experimental).
> > > >
> > > > I salute your dedication to Alpha! ;-)
> > >
> > > Ok, but where does that leave us wrt my initial proposal of moving
> > > smp_read_barrier_depends() into READ_ONCE and getting rid of
> > > lockless_dereference?
> > >
> > > Michael (or anybody else running mainline on SMP Alpha) -- would you be
> > > able to give the diff below a spin and see whether there's a measurable
> > > performance impact?
> >
> > This will be a sensitive test. The smp_read_barrier_depends() can be
> > removed from lockless_dereference(). Without this removal Alpha will
> > get two memory barriers from rcu_dereference() and friends.
>
> Oh yes, good point. I was trying to keep the diff simple, but you're
> right that this is packing too many barriers. Fixed diff below.
Not seeing any objections thus far. If there are none by (say) the
end of this week, I would be happy to queue a patch for the 4.16
merge window. That should give ample opportunity for further review
and testing.
Thanx, Paul
> Thanks,
>
> Will
>
> --->8
>
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index e95a2631e545..c4ee9d6d8f2d 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -340,6 +340,7 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
> __read_once_size(&(x), __u.__c, sizeof(x)); \
> else \
> __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
> + smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \
> __u.__val; \
> })
> #define READ_ONCE(x) __READ_ONCE(x, 1)
> @@ -620,7 +621,6 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
> ({ \
> typeof(p) _________p1 = READ_ONCE(p); \
> typeof(*(p)) *___typecheck_p __maybe_unused; \
> - smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> (_________p1); \
> })
>
>
* Re: [RFC PATCH 1/2] arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables
2017-10-03 19:11 ` Paul E. McKenney
@ 2017-10-05 16:31 ` Will Deacon
2017-10-05 19:22 ` Paul E. McKenney
2017-10-05 19:31 ` Andrea Parri
0 siblings, 2 replies; 13+ messages in thread
From: Will Deacon @ 2017-10-05 16:31 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Michael Cree, Peter Zijlstra, kirill.shutemov, linux-kernel,
ynorov, rruigrok, linux-arch, akpm, catalin.marinas, rth, ink,
mattst88, linux-alpha
Hi Paul,
On Tue, Oct 03, 2017 at 12:11:10PM -0700, Paul E. McKenney wrote:
> On Fri, Sep 29, 2017 at 05:33:49PM +0100, Will Deacon wrote:
> > On Fri, Sep 29, 2017 at 09:29:39AM -0700, Paul E. McKenney wrote:
> > > On Fri, Sep 29, 2017 at 10:08:43AM +0100, Will Deacon wrote:
> > > > Ok, but where does that leave us wrt my initial proposal of moving
> > > > smp_read_barrier_depends() into READ_ONCE and getting rid of
> > > > lockless_dereference?
> > > >
> > > > Michael (or anybody else running mainline on SMP Alpha) -- would you be
> > > > able to give the diff below a spin and see whether there's a measurable
> > > > performance impact?
> > >
> > > This will be a sensitive test. The smp_read_barrier_depends() can be
> > > removed from lockless_dereference(). Without this removal Alpha will
> > > get two memory barriers from rcu_dereference() and friends.
> >
> > Oh yes, good point. I was trying to keep the diff simple, but you're
> > right that this is packing too many barriers. Fixed diff below.
>
> Not seeing any objections thus far. If there are none by (say) the
> end of this week, I would be happy to queue a patch for the 4.16
> merge window. That should give ample opportunity for further review
> and testing.
Ok, full patch below.
Will
--->8
From 15956d0cc6b37208d8542b1858a8d8b64227acf4 Mon Sep 17 00:00:00 2001
From: Will Deacon <will.deacon@arm.com>
Date: Thu, 5 Oct 2017 16:57:36 +0100
Subject: [PATCH] locking/barriers: Kill lockless_dereference
lockless_dereference is a nice idea, but it's gained little traction in
kernel code since its introduction three years ago. This is partly
because it's a pain to type, but also because using READ_ONCE instead
will work correctly on all architectures apart from Alpha, which is a
fully supported but somewhat niche architecture these days.
This patch moves smp_read_barrier_depends() (a NOP on all architectures
other than Alpha) from lockless_dereference into READ_ONCE, converts
the few actual users over to READ_ONCE and then finally removes
lockless_dereference altogether.
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
Documentation/memory-barriers.txt | 12 ------------
.../translations/ko_KR/memory-barriers.txt | 12 ------------
arch/x86/events/core.c | 2 +-
arch/x86/include/asm/mmu_context.h | 4 ++--
arch/x86/kernel/ldt.c | 2 +-
drivers/md/dm-mpath.c | 20 ++++++++++----------
fs/dcache.c | 4 ++--
fs/overlayfs/ovl_entry.h | 2 +-
fs/overlayfs/readdir.c | 2 +-
include/linux/compiler.h | 21 +--------------------
include/linux/rculist.h | 4 ++--
include/linux/rcupdate.h | 4 ++--
kernel/events/core.c | 4 ++--
kernel/seccomp.c | 2 +-
kernel/task_work.c | 2 +-
mm/slab.h | 2 +-
16 files changed, 28 insertions(+), 71 deletions(-)
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index b759a60624fd..470a682f3fa4 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1886,18 +1886,6 @@ There are some more advanced barrier functions:
See Documentation/atomic_{t,bitops}.txt for more information.
- (*) lockless_dereference();
-
- This can be thought of as a pointer-fetch wrapper around the
- smp_read_barrier_depends() data-dependency barrier.
-
- This is also similar to rcu_dereference(), but in cases where
- object lifetime is handled by some mechanism other than RCU, for
- example, when the objects removed only when the system goes down.
- In addition, lockless_dereference() is used in some data structures
- that can be used both with and without RCU.
-
-
(*) dma_wmb();
(*) dma_rmb();
diff --git a/Documentation/translations/ko_KR/memory-barriers.txt b/Documentation/translations/ko_KR/memory-barriers.txt
index a7a813258013..ec3b46e27b7a 100644
--- a/Documentation/translations/ko_KR/memory-barriers.txt
+++ b/Documentation/translations/ko_KR/memory-barriers.txt
@@ -1858,18 +1858,6 @@ Mandatory 배리어들은 SMP 시스템에서도 UP 시스템에서도 SMP 효
참고하세요.
- (*) lockless_dereference();
-
- 이 함수는 smp_read_barrier_depends() 데이터 의존성 배리어를 사용하는
- 포인터 읽어오기 래퍼(wrapper) 함수로 생각될 수 있습니다.
-
- 객체의 라이프타임이 RCU 외의 메커니즘으로 관리된다는 점을 제외하면
- rcu_dereference() 와도 유사한데, 예를 들면 객체가 시스템이 꺼질 때에만
- 제거되는 경우 등입니다. 또한, lockless_dereference() 은 RCU 와 함께
- 사용될수도, RCU 없이 사용될 수도 있는 일부 데이터 구조에 사용되고
- 있습니다.
-
-
(*) dma_wmb();
(*) dma_rmb();
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 80534d3c2480..589af1eec7c1 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2371,7 +2371,7 @@ static unsigned long get_segment_base(unsigned int segment)
struct ldt_struct *ldt;
/* IRQs are off, so this synchronizes with smp_store_release */
- ldt = lockless_dereference(current->active_mm->context.ldt);
+ ldt = READ_ONCE(current->active_mm->context.ldt);
if (!ldt || idx >= ldt->nr_entries)
return 0;
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index c120b5db178a..9037a4e546e8 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -72,8 +72,8 @@ static inline void load_mm_ldt(struct mm_struct *mm)
#ifdef CONFIG_MODIFY_LDT_SYSCALL
struct ldt_struct *ldt;
- /* lockless_dereference synchronizes with smp_store_release */
- ldt = lockless_dereference(mm->context.ldt);
+ /* READ_ONCE synchronizes with smp_store_release */
+ ldt = READ_ONCE(mm->context.ldt);
/*
* Any change to mm->context.ldt is followed by an IPI to all
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index f0e64db18ac8..0a21390642c4 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -101,7 +101,7 @@ static void finalize_ldt_struct(struct ldt_struct *ldt)
static void install_ldt(struct mm_struct *current_mm,
struct ldt_struct *ldt)
{
- /* Synchronizes with lockless_dereference in load_mm_ldt. */
+ /* Synchronizes with READ_ONCE in load_mm_ldt. */
smp_store_release(&current_mm->context.ldt, ldt);
/* Activate the LDT for all CPUs using current_mm. */
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 11f273d2f018..3f88c9d32f7e 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -366,7 +366,7 @@ static struct pgpath *choose_path_in_pg(struct multipath *m,
pgpath = path_to_pgpath(path);
- if (unlikely(lockless_dereference(m->current_pg) != pg)) {
+ if (unlikely(READ_ONCE(m->current_pg) != pg)) {
/* Only update current_pgpath if pg changed */
spin_lock_irqsave(&m->lock, flags);
m->current_pgpath = pgpath;
@@ -390,7 +390,7 @@ static struct pgpath *choose_pgpath(struct multipath *m, size_t nr_bytes)
}
/* Were we instructed to switch PG? */
- if (lockless_dereference(m->next_pg)) {
+ if (READ_ONCE(m->next_pg)) {
spin_lock_irqsave(&m->lock, flags);
pg = m->next_pg;
if (!pg) {
@@ -406,7 +406,7 @@ static struct pgpath *choose_pgpath(struct multipath *m, size_t nr_bytes)
/* Don't change PG until it has no remaining paths */
check_current_pg:
- pg = lockless_dereference(m->current_pg);
+ pg = READ_ONCE(m->current_pg);
if (pg) {
pgpath = choose_path_in_pg(m, pg, nr_bytes);
if (!IS_ERR_OR_NULL(pgpath))
@@ -473,7 +473,7 @@ static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
struct request *clone;
/* Do we need to select a new pgpath? */
- pgpath = lockless_dereference(m->current_pgpath);
+ pgpath = READ_ONCE(m->current_pgpath);
if (!pgpath || !test_bit(MPATHF_QUEUE_IO, &m->flags))
pgpath = choose_pgpath(m, nr_bytes);
@@ -535,7 +535,7 @@ static int __multipath_map_bio(struct multipath *m, struct bio *bio, struct dm_m
bool queue_io;
/* Do we need to select a new pgpath? */
- pgpath = lockless_dereference(m->current_pgpath);
+ pgpath = READ_ONCE(m->current_pgpath);
queue_io = test_bit(MPATHF_QUEUE_IO, &m->flags);
if (!pgpath || !queue_io)
pgpath = choose_pgpath(m, nr_bytes);
@@ -1804,7 +1804,7 @@ static int multipath_prepare_ioctl(struct dm_target *ti,
struct pgpath *current_pgpath;
int r;
- current_pgpath = lockless_dereference(m->current_pgpath);
+ current_pgpath = READ_ONCE(m->current_pgpath);
if (!current_pgpath)
current_pgpath = choose_pgpath(m, 0);
@@ -1826,7 +1826,7 @@ static int multipath_prepare_ioctl(struct dm_target *ti,
}
if (r == -ENOTCONN) {
- if (!lockless_dereference(m->current_pg)) {
+ if (!READ_ONCE(m->current_pg)) {
/* Path status changed, redo selection */
(void) choose_pgpath(m, 0);
}
@@ -1895,9 +1895,9 @@ static int multipath_busy(struct dm_target *ti)
return (m->queue_mode != DM_TYPE_MQ_REQUEST_BASED);
/* Guess which priority_group will be used at next mapping time */
- pg = lockless_dereference(m->current_pg);
- next_pg = lockless_dereference(m->next_pg);
- if (unlikely(!lockless_dereference(m->current_pgpath) && next_pg))
+ pg = READ_ONCE(m->current_pg);
+ next_pg = READ_ONCE(m->next_pg);
+ if (unlikely(!READ_ONCE(m->current_pgpath) && next_pg))
pg = next_pg;
if (!pg) {
diff --git a/fs/dcache.c b/fs/dcache.c
index f90141387f01..34c852af215c 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -231,7 +231,7 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c
{
/*
* Be careful about RCU walk racing with rename:
- * use 'lockless_dereference' to fetch the name pointer.
+ * use 'READ_ONCE' to fetch the name pointer.
*
* NOTE! Even if a rename will mean that the length
* was not loaded atomically, we don't care. The
@@ -245,7 +245,7 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c
* early because the data cannot match (there can
* be no NUL in the ct/tcount data)
*/
- const unsigned char *cs = lockless_dereference(dentry->d_name.name);
+ const unsigned char *cs = READ_ONCE(dentry->d_name.name);
return dentry_string_cmp(cs, ct, tcount);
}
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 878a750986dd..0f6809fa6628 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -74,5 +74,5 @@ static inline struct ovl_inode *OVL_I(struct inode *inode)
static inline struct dentry *ovl_upperdentry_dereference(struct ovl_inode *oi)
{
- return lockless_dereference(oi->__upperdentry);
+ return READ_ONCE(oi->__upperdentry);
}
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 62e9b22a2077..0b389d330613 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -754,7 +754,7 @@ static int ovl_dir_fsync(struct file *file, loff_t start, loff_t end,
if (!od->is_upper && OVL_TYPE_UPPER(ovl_path_type(dentry))) {
struct inode *inode = file_inode(file);
- realfile = lockless_dereference(od->upperfile);
+ realfile = READ_ONCE(od->upperfile);
if (!realfile) {
struct path upperpath;
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index e95a2631e545..f260ff39f90f 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -340,6 +340,7 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
__read_once_size(&(x), __u.__c, sizeof(x)); \
else \
__read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
+ smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \
__u.__val; \
})
#define READ_ONCE(x) __READ_ONCE(x, 1)
@@ -604,24 +605,4 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
(volatile typeof(x) *)&(x); })
#define ACCESS_ONCE(x) (*__ACCESS_ONCE(x))
-/**
- * lockless_dereference() - safely load a pointer for later dereference
- * @p: The pointer to load
- *
- * Similar to rcu_dereference(), but for situations where the pointed-to
- * object's lifetime is managed by something other than RCU. That
- * "something other" might be reference counting or simple immortality.
- *
- * The seemingly unused variable ___typecheck_p validates that @p is
- * indeed a pointer type by using a pointer to typeof(*p) as the type.
- * Taking a pointer to typeof(*p) again is needed in case p is void *.
- */
-#define lockless_dereference(p) \
-({ \
- typeof(p) _________p1 = READ_ONCE(p); \
- typeof(*(p)) *___typecheck_p __maybe_unused; \
- smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
- (_________p1); \
-})
-
#endif /* __LINUX_COMPILER_H */
diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index b1fd8bf85fdc..3a2bb7d8ed4d 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -274,7 +274,7 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
* primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
*/
#define list_entry_rcu(ptr, type, member) \
- container_of(lockless_dereference(ptr), type, member)
+ container_of(READ_ONCE(ptr), type, member)
/**
* Where are list_empty_rcu() and list_first_entry_rcu()?
@@ -367,7 +367,7 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
* example is when items are added to the list, but never deleted.
*/
#define list_entry_lockless(ptr, type, member) \
- container_of((typeof(ptr))lockless_dereference(ptr), type, member)
+ container_of((typeof(ptr))READ_ONCE(ptr), type, member)
/**
* list_for_each_entry_lockless - iterate over rcu list of given type
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index de50d8a4cf41..380a3aeb09d7 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -346,7 +346,7 @@ static inline void rcu_preempt_sleep_check(void) { }
#define __rcu_dereference_check(p, c, space) \
({ \
/* Dependency order vs. p above. */ \
- typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
+ typeof(*p) *________p1 = (typeof(*p) *__force)READ_ONCE(p); \
RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_check() usage"); \
rcu_dereference_sparse(p, space); \
((typeof(*p) __force __kernel *)(________p1)); \
@@ -360,7 +360,7 @@ static inline void rcu_preempt_sleep_check(void) { }
#define rcu_dereference_raw(p) \
({ \
/* Dependency order vs. p above. */ \
- typeof(p) ________p1 = lockless_dereference(p); \
+ typeof(p) ________p1 = READ_ONCE(p); \
((typeof(*p) __force __kernel *)(________p1)); \
})
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6bc21e202ae4..417812ce0099 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4231,7 +4231,7 @@ static void perf_remove_from_owner(struct perf_event *event)
* indeed free this event, otherwise we need to serialize on
* owner->perf_event_mutex.
*/
- owner = lockless_dereference(event->owner);
+ owner = READ_ONCE(event->owner);
if (owner) {
/*
* Since delayed_put_task_struct() also drops the last
@@ -4328,7 +4328,7 @@ int perf_event_release_kernel(struct perf_event *event)
* Cannot change, child events are not migrated, see the
* comment with perf_event_ctx_lock_nested().
*/
- ctx = lockless_dereference(child->ctx);
+ ctx = READ_ONCE(child->ctx);
/*
* Since child_mutex nests inside ctx::mutex, we must jump
* through hoops. We start by grabbing a reference on the ctx.
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index bb3a38005b9c..1daa8b61a268 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -189,7 +189,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
u32 ret = SECCOMP_RET_ALLOW;
/* Make sure cross-thread synced filter points somewhere sane. */
struct seccomp_filter *f =
- lockless_dereference(current->seccomp.filter);
+ READ_ONCE(current->seccomp.filter);
/* Ensure unexpected behavior doesn't result in failing open. */
if (unlikely(WARN_ON(f == NULL)))
diff --git a/kernel/task_work.c b/kernel/task_work.c
index 836a72a66fba..9a9f262fc53d 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -67,7 +67,7 @@ task_work_cancel(struct task_struct *task, task_work_func_t func)
* we raced with task_work_run(), *pprev == NULL/exited.
*/
raw_spin_lock_irqsave(&task->pi_lock, flags);
- while ((work = lockless_dereference(*pprev))) {
+ while ((work = READ_ONCE(*pprev))) {
if (work->func != func)
pprev = &work->next;
else if (cmpxchg(pprev, work, work->next) == work)
diff --git a/mm/slab.h b/mm/slab.h
index 073362816acc..8894f811a89d 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -258,7 +258,7 @@ cache_from_memcg_idx(struct kmem_cache *s, int idx)
* memcg_caches issues a write barrier to match this (see
* memcg_create_kmem_cache()).
*/
- cachep = lockless_dereference(arr->entries[idx]);
+ cachep = READ_ONCE(arr->entries[idx]);
rcu_read_unlock();
return cachep;
--
2.1.4
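The x86 LDT hunks in the patch rely on READ_ONCE() pairing with smp_store_release(); a stripped-down sketch of that publish/consume pattern (illustrative names, not the actual x86 code):

/* Illustrative publish/consume pairing -- not the actual x86 LDT code. */
struct ldt_struct;			/* opaque here; fully initialised before publish */
static struct ldt_struct *ldt_ptr;

static void publish_ldt(struct ldt_struct *new_ldt)
{
	/* Everything written to *new_ldt is ordered before the pointer store. */
	smp_store_release(&ldt_ptr, new_ldt);
}

static struct ldt_struct *load_ldt_ptr(void)
{
	/* With this patch, READ_ONCE() also orders dependent accesses on Alpha. */
	return READ_ONCE(ldt_ptr);
}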
* Re: [RFC PATCH 1/2] arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables
2017-10-05 16:31 ` Will Deacon
@ 2017-10-05 19:22 ` Paul E. McKenney
2017-10-05 19:31 ` Andrea Parri
1 sibling, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2017-10-05 19:22 UTC (permalink / raw)
To: Will Deacon
Cc: Michael Cree, Peter Zijlstra, kirill.shutemov, linux-kernel,
ynorov, rruigrok, linux-arch, akpm, catalin.marinas, rth, ink,
mattst88, linux-alpha
On Thu, Oct 05, 2017 at 05:31:54PM +0100, Will Deacon wrote:
> Hi Paul,
>
> On Tue, Oct 03, 2017 at 12:11:10PM -0700, Paul E. McKenney wrote:
> > On Fri, Sep 29, 2017 at 05:33:49PM +0100, Will Deacon wrote:
> > > On Fri, Sep 29, 2017 at 09:29:39AM -0700, Paul E. McKenney wrote:
> > > > On Fri, Sep 29, 2017 at 10:08:43AM +0100, Will Deacon wrote:
> > > > > Ok, but where does that leave us wrt my initial proposal of moving
> > > > > smp_read_barrier_depends() into READ_ONCE and getting rid of
> > > > > lockless_dereference?
> > > > >
> > > > > Michael (or anybody else running mainline on SMP Alpha) -- would you be
> > > > > able to give the diff below a spin and see whether there's a measurable
> > > > > performance impact?
> > > >
> > > > This will be a sensitive test. The smp_read_barrier_depends() can be
> > > > removed from lockless_dereference(). Without this removal Alpha will
> > > > get two memory barriers from rcu_dereference() and friends.
> > >
> > > Oh yes, good point. I was trying to keep the diff simple, but you're
> > > right that this is packing too many barriers. Fixed diff below.
> >
> > Not seeing any objections thus far. If there are none by (say) the
> > end of this week, I would be happy to queue a patch for the 4.16
> > merge window. That should give ample opportunity for further review
> > and testing.
>
> Ok, full patch below.
Very good, applied for testing and review. I did have to adjust context
a bit for other -rcu commits, and the result is below. (Removed a single
"*" in a comment.)
Thanx, Paul
------------------------------------------------------------------------
commit 7e3675cc18bbf4d84f60bfc02ff563ae3764ad35
Author: Will Deacon <will.deacon@arm.com>
Date: Thu Oct 5 17:31:54 2017 +0100
locking/barriers: Kill lockless_dereference
lockless_dereference is a nice idea, but it's gained little traction in
kernel code since it's introduction three years ago. This is partly
because it's a pain to type, but also because using READ_ONCE instead
will work correctly on all architectures apart from Alpha, which is a
fully supported but somewhat niche architecture these days.
This patch moves smp_read_barrier_depends() (a NOP on all architectures
other than Alpha) from lockless_dereference into READ_ONCE, converts
the few actual users over to READ_ONCE and then finally removes
lockless_dereference altogether.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 519940ec767f..479ecec80593 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1880,18 +1880,6 @@ There are some more advanced barrier functions:
See Documentation/atomic_{t,bitops}.txt for more information.
- (*) lockless_dereference();
-
- This can be thought of as a pointer-fetch wrapper around the
- smp_read_barrier_depends() data-dependency barrier.
-
- This is also similar to rcu_dereference(), but in cases where
- object lifetime is handled by some mechanism other than RCU, for
- example, when the objects removed only when the system goes down.
- In addition, lockless_dereference() is used in some data structures
- that can be used both with and without RCU.
-
-
(*) dma_wmb();
(*) dma_rmb();
diff --git a/Documentation/translations/ko_KR/memory-barriers.txt b/Documentation/translations/ko_KR/memory-barriers.txt
index a7a813258013..ec3b46e27b7a 100644
--- a/Documentation/translations/ko_KR/memory-barriers.txt
+++ b/Documentation/translations/ko_KR/memory-barriers.txt
@@ -1858,18 +1858,6 @@ Mandatory 배리어들은 SMP 시스템에서도 UP 시스템에서도 SMP 효
참고하세요.
- (*) lockless_dereference();
-
- 이 함수는 smp_read_barrier_depends() 데이터 의존성 배리어를 사용하는
- 포인터 읽어오기 래퍼(wrapper) 함수로 생각될 수 있습니다.
-
- 객체의 라이프타임이 RCU 외의 메커니즘으로 관리된다는 점을 제외하면
- rcu_dereference() 와도 유사한데, 예를 들면 객체가 시스템이 꺼질 때에만
- 제거되는 경우 등입니다. 또한, lockless_dereference() 은 RCU 와 함께
- 사용될수도, RCU 없이 사용될 수도 있는 일부 데이터 구조에 사용되고
- 있습니다.
-
-
(*) dma_wmb();
(*) dma_rmb();
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 80534d3c2480..589af1eec7c1 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2371,7 +2371,7 @@ static unsigned long get_segment_base(unsigned int segment)
struct ldt_struct *ldt;
/* IRQs are off, so this synchronizes with smp_store_release */
- ldt = lockless_dereference(current->active_mm->context.ldt);
+ ldt = READ_ONCE(current->active_mm->context.ldt);
if (!ldt || idx >= ldt->nr_entries)
return 0;
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index c120b5db178a..9037a4e546e8 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -72,8 +72,8 @@ static inline void load_mm_ldt(struct mm_struct *mm)
#ifdef CONFIG_MODIFY_LDT_SYSCALL
struct ldt_struct *ldt;
- /* lockless_dereference synchronizes with smp_store_release */
- ldt = lockless_dereference(mm->context.ldt);
+ /* READ_ONCE synchronizes with smp_store_release */
+ ldt = READ_ONCE(mm->context.ldt);
/*
* Any change to mm->context.ldt is followed by an IPI to all
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index f0e64db18ac8..0a21390642c4 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -101,7 +101,7 @@ static void finalize_ldt_struct(struct ldt_struct *ldt)
static void install_ldt(struct mm_struct *current_mm,
struct ldt_struct *ldt)
{
- /* Synchronizes with lockless_dereference in load_mm_ldt. */
+ /* Synchronizes with READ_ONCE in load_mm_ldt. */
smp_store_release(&current_mm->context.ldt, ldt);
/* Activate the LDT for all CPUs using current_mm. */
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 11f273d2f018..3f88c9d32f7e 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -366,7 +366,7 @@ static struct pgpath *choose_path_in_pg(struct multipath *m,
pgpath = path_to_pgpath(path);
- if (unlikely(lockless_dereference(m->current_pg) != pg)) {
+ if (unlikely(READ_ONCE(m->current_pg) != pg)) {
/* Only update current_pgpath if pg changed */
spin_lock_irqsave(&m->lock, flags);
m->current_pgpath = pgpath;
@@ -390,7 +390,7 @@ static struct pgpath *choose_pgpath(struct multipath *m, size_t nr_bytes)
}
/* Were we instructed to switch PG? */
- if (lockless_dereference(m->next_pg)) {
+ if (READ_ONCE(m->next_pg)) {
spin_lock_irqsave(&m->lock, flags);
pg = m->next_pg;
if (!pg) {
@@ -406,7 +406,7 @@ static struct pgpath *choose_pgpath(struct multipath *m, size_t nr_bytes)
/* Don't change PG until it has no remaining paths */
check_current_pg:
- pg = lockless_dereference(m->current_pg);
+ pg = READ_ONCE(m->current_pg);
if (pg) {
pgpath = choose_path_in_pg(m, pg, nr_bytes);
if (!IS_ERR_OR_NULL(pgpath))
@@ -473,7 +473,7 @@ static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
struct request *clone;
/* Do we need to select a new pgpath? */
- pgpath = lockless_dereference(m->current_pgpath);
+ pgpath = READ_ONCE(m->current_pgpath);
if (!pgpath || !test_bit(MPATHF_QUEUE_IO, &m->flags))
pgpath = choose_pgpath(m, nr_bytes);
@@ -535,7 +535,7 @@ static int __multipath_map_bio(struct multipath *m, struct bio *bio, struct dm_m
bool queue_io;
/* Do we need to select a new pgpath? */
- pgpath = lockless_dereference(m->current_pgpath);
+ pgpath = READ_ONCE(m->current_pgpath);
queue_io = test_bit(MPATHF_QUEUE_IO, &m->flags);
if (!pgpath || !queue_io)
pgpath = choose_pgpath(m, nr_bytes);
@@ -1804,7 +1804,7 @@ static int multipath_prepare_ioctl(struct dm_target *ti,
struct pgpath *current_pgpath;
int r;
- current_pgpath = lockless_dereference(m->current_pgpath);
+ current_pgpath = READ_ONCE(m->current_pgpath);
if (!current_pgpath)
current_pgpath = choose_pgpath(m, 0);
@@ -1826,7 +1826,7 @@ static int multipath_prepare_ioctl(struct dm_target *ti,
}
if (r == -ENOTCONN) {
- if (!lockless_dereference(m->current_pg)) {
+ if (!READ_ONCE(m->current_pg)) {
/* Path status changed, redo selection */
(void) choose_pgpath(m, 0);
}
@@ -1895,9 +1895,9 @@ static int multipath_busy(struct dm_target *ti)
return (m->queue_mode != DM_TYPE_MQ_REQUEST_BASED);
/* Guess which priority_group will be used at next mapping time */
- pg = lockless_dereference(m->current_pg);
- next_pg = lockless_dereference(m->next_pg);
- if (unlikely(!lockless_dereference(m->current_pgpath) && next_pg))
+ pg = READ_ONCE(m->current_pg);
+ next_pg = READ_ONCE(m->next_pg);
+ if (unlikely(!READ_ONCE(m->current_pgpath) && next_pg))
pg = next_pg;
if (!pg) {
diff --git a/fs/dcache.c b/fs/dcache.c
index f90141387f01..34c852af215c 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -231,7 +231,7 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c
{
/*
* Be careful about RCU walk racing with rename:
- * use 'lockless_dereference' to fetch the name pointer.
+ * use 'READ_ONCE' to fetch the name pointer.
*
* NOTE! Even if a rename will mean that the length
* was not loaded atomically, we don't care. The
@@ -245,7 +245,7 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c
* early because the data cannot match (there can
* be no NUL in the ct/tcount data)
*/
- const unsigned char *cs = lockless_dereference(dentry->d_name.name);
+ const unsigned char *cs = READ_ONCE(dentry->d_name.name);
return dentry_string_cmp(cs, ct, tcount);
}
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 878a750986dd..0f6809fa6628 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -74,5 +74,5 @@ static inline struct ovl_inode *OVL_I(struct inode *inode)
static inline struct dentry *ovl_upperdentry_dereference(struct ovl_inode *oi)
{
- return lockless_dereference(oi->__upperdentry);
+ return READ_ONCE(oi->__upperdentry);
}
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 62e9b22a2077..0b389d330613 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -754,7 +754,7 @@ static int ovl_dir_fsync(struct file *file, loff_t start, loff_t end,
if (!od->is_upper && OVL_TYPE_UPPER(ovl_path_type(dentry))) {
struct inode *inode = file_inode(file);
- realfile = lockless_dereference(od->upperfile);
+ realfile = READ_ONCE(od->upperfile);
if (!realfile) {
struct path upperpath;
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index e95a2631e545..f260ff39f90f 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -340,6 +340,7 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
__read_once_size(&(x), __u.__c, sizeof(x)); \
else \
__read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
+ smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \
__u.__val; \
})
#define READ_ONCE(x) __READ_ONCE(x, 1)
@@ -604,24 +605,4 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
(volatile typeof(x) *)&(x); })
#define ACCESS_ONCE(x) (*__ACCESS_ONCE(x))
-/**
- * lockless_dereference() - safely load a pointer for later dereference
- * @p: The pointer to load
- *
- * Similar to rcu_dereference(), but for situations where the pointed-to
- * object's lifetime is managed by something other than RCU. That
- * "something other" might be reference counting or simple immortality.
- *
- * The seemingly unused variable ___typecheck_p validates that @p is
- * indeed a pointer type by using a pointer to typeof(*p) as the type.
- * Taking a pointer to typeof(*p) again is needed in case p is void *.
- */
-#define lockless_dereference(p) \
-({ \
- typeof(p) _________p1 = READ_ONCE(p); \
- typeof(*(p)) *___typecheck_p __maybe_unused; \
- smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
- (_________p1); \
-})
-
#endif /* __LINUX_COMPILER_H */
diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 2bea1d5e9930..5ed091c064b2 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -274,7 +274,7 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
* primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
*/
#define list_entry_rcu(ptr, type, member) \
- container_of(lockless_dereference(ptr), type, member)
+ container_of(READ_ONCE(ptr), type, member)
/*
* Where are list_empty_rcu() and list_first_entry_rcu()?
@@ -367,7 +367,7 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
* example is when items are added to the list, but never deleted.
*/
#define list_entry_lockless(ptr, type, member) \
- container_of((typeof(ptr))lockless_dereference(ptr), type, member)
+ container_of((typeof(ptr))READ_ONCE(ptr), type, member)
/**
* list_for_each_entry_lockless - iterate over rcu list of given type
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 1a9f70d44af9..a6ddc42f87a5 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -346,7 +346,7 @@ static inline void rcu_preempt_sleep_check(void) { }
#define __rcu_dereference_check(p, c, space) \
({ \
/* Dependency order vs. p above. */ \
- typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
+ typeof(*p) *________p1 = (typeof(*p) *__force)READ_ONCE(p); \
RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_check() usage"); \
rcu_dereference_sparse(p, space); \
((typeof(*p) __force __kernel *)(________p1)); \
@@ -360,7 +360,7 @@ static inline void rcu_preempt_sleep_check(void) { }
#define rcu_dereference_raw(p) \
({ \
/* Dependency order vs. p above. */ \
- typeof(p) ________p1 = lockless_dereference(p); \
+ typeof(p) ________p1 = READ_ONCE(p); \
((typeof(*p) __force __kernel *)(________p1)); \
})
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6bc21e202ae4..417812ce0099 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4231,7 +4231,7 @@ static void perf_remove_from_owner(struct perf_event *event)
* indeed free this event, otherwise we need to serialize on
* owner->perf_event_mutex.
*/
- owner = lockless_dereference(event->owner);
+ owner = READ_ONCE(event->owner);
if (owner) {
/*
* Since delayed_put_task_struct() also drops the last
@@ -4328,7 +4328,7 @@ int perf_event_release_kernel(struct perf_event *event)
* Cannot change, child events are not migrated, see the
* comment with perf_event_ctx_lock_nested().
*/
- ctx = lockless_dereference(child->ctx);
+ ctx = READ_ONCE(child->ctx);
/*
* Since child_mutex nests inside ctx::mutex, we must jump
* through hoops. We start by grabbing a reference on the ctx.
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index bb3a38005b9c..1daa8b61a268 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -189,7 +189,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
u32 ret = SECCOMP_RET_ALLOW;
/* Make sure cross-thread synced filter points somewhere sane. */
struct seccomp_filter *f =
- lockless_dereference(current->seccomp.filter);
+ READ_ONCE(current->seccomp.filter);
/* Ensure unexpected behavior doesn't result in failing open. */
if (unlikely(WARN_ON(f == NULL)))
diff --git a/kernel/task_work.c b/kernel/task_work.c
index 836a72a66fba..9a9f262fc53d 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -67,7 +67,7 @@ task_work_cancel(struct task_struct *task, task_work_func_t func)
* we raced with task_work_run(), *pprev == NULL/exited.
*/
raw_spin_lock_irqsave(&task->pi_lock, flags);
- while ((work = lockless_dereference(*pprev))) {
+ while ((work = READ_ONCE(*pprev))) {
if (work->func != func)
pprev = &work->next;
else if (cmpxchg(pprev, work, work->next) == work)
diff --git a/mm/slab.h b/mm/slab.h
index 073362816acc..8894f811a89d 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -258,7 +258,7 @@ cache_from_memcg_idx(struct kmem_cache *s, int idx)
* memcg_caches issues a write barrier to match this (see
* memcg_create_kmem_cache()).
*/
- cachep = lockless_dereference(arr->entries[idx]);
+ cachep = READ_ONCE(arr->entries[idx]);
rcu_read_unlock();
return cachep;
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [RFC PATCH 1/2] arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables
2017-10-05 16:31 ` Will Deacon
2017-10-05 19:22 ` Paul E. McKenney
@ 2017-10-05 19:31 ` Andrea Parri
2017-10-05 20:09 ` Paul E. McKenney
1 sibling, 1 reply; 13+ messages in thread
From: Andrea Parri @ 2017-10-05 19:31 UTC (permalink / raw)
To: Will Deacon
Cc: Paul E. McKenney, Michael Cree, Peter Zijlstra, kirill.shutemov,
linux-kernel, ynorov, rruigrok, linux-arch, akpm, catalin.marinas,
rth, ink, mattst88, linux-alpha
Hi Will,
none of my comments below represent objections to this patch, but
let me remark:
On Thu, Oct 05, 2017 at 05:31:54PM +0100, Will Deacon wrote:
> Hi Paul,
>
> On Tue, Oct 03, 2017 at 12:11:10PM -0700, Paul E. McKenney wrote:
> > On Fri, Sep 29, 2017 at 05:33:49PM +0100, Will Deacon wrote:
> > > On Fri, Sep 29, 2017 at 09:29:39AM -0700, Paul E. McKenney wrote:
> > > > On Fri, Sep 29, 2017 at 10:08:43AM +0100, Will Deacon wrote:
> > > > > Ok, but where does that leave us wrt my initial proposal of moving
> > > > > smp_read_barrier_depends() into READ_ONCE and getting rid of
> > > > > lockless_dereference?
> > > > >
> > > > > Michael (or anybody else running mainline on SMP Alpha) -- would you be
> > > > > able to give the diff below a spin and see whether there's a measurable
> > > > > performance impact?
> > > >
> > > > This will be a sensitive test. The smp_read_barrier_depends() can be
> > > > removed from lockless_dereference(). Without this removal Alpha will
> > > > get two memory barriers from rcu_dereference() and friends.
> > >
> > > Oh yes, good point. I was trying to keep the diff simple, but you're
> > > right that this is packing too many barriers. Fixed diff below.
> >
> > Not seeing any objections thus far. If there are none by (say) the
> > end of this week, I would be happy to queue a patch for the 4.16
> > merge window. That should give ample opportunity for further review
> > and testing.
>
> Ok, full patch below.
>
> Will
>
> --->8
>
> From 15956d0cc6b37208d8542b1858a8d8b64227acf4 Mon Sep 17 00:00:00 2001
> From: Will Deacon <will.deacon@arm.com>
> Date: Thu, 5 Oct 2017 16:57:36 +0100
> Subject: [PATCH] locking/barriers: Kill lockless_dereference
>
> lockless_dereference is a nice idea, but it's gained little traction in
> kernel code since its introduction three years ago. This is partly
> because it's a pain to type, but also because using READ_ONCE instead
> will work correctly on all architectures apart from Alpha, which is a
> fully supported but somewhat niche architecture these days.
lockless_dereference might be a mouthful, but it does (explicitly)
say/remark: "Yep, we are relying on the following address dep. to
be "in strong-ppo" ".
Such information will be lost or, at least, not immediately clear
by just reading a READ_ONCE(). (And Yes, this information is only
relevant when we "include" Alpha in the picture/analysis.)
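For concreteness, a minimal sketch of the pattern in question (made-up
names, not code from the patch):

	struct foo { int data; };
	struct foo *gp;

	/* Writer: initialize the object, then publish the pointer; the
	 * release store orders the initialization before the update of gp. */
	void publish(struct foo *p)
	{
		p->data = 42;
		smp_store_release(&gp, p);
	}

	/* Reader: the address dependency from the load of gp to the
	 * dereference below is what lockless_dereference() spelled out;
	 * with this patch the same ordering comes from READ_ONCE() alone,
	 * Alpha included. */
	int consume(void)
	{
		struct foo *p = READ_ONCE(gp);	/* was lockless_dereference(gp) */

		return p ? p->data : -1;
	}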
>
> This patch moves smp_read_barrier_depends() (a NOP on all architectures
> other than Alpha) from lockless_dereference into READ_ONCE, converts
> the few actual users over to READ_ONCE and then finally removes
> lockless_dereference altogether.
Notice that several "potential users" of lockless_dereference are
currently hidden in other call sites for smp_read_barrier_depends
(i.e., cases where this barrier is not called from within a lock-
less or an RCU dereference).
Some of these usages (e.g.,
include/linux/percpu-refcount.h:__ref_is_percpu,
mm/ksm.c:get_ksm_page,
security/keys/keyring.c:search_nested_keyrings )
precede this barrier with a READ_ONCE; others (e.g.,
arch/alpha/include/asm/pgtable.h:pmd_offset,
net/ipv4/netfilter/arp_tables.c:arpt_do_table,
kernel/events/uprobes.c:get_trampoline_vaddr )
with a plain read.
There also appear to be cases where the barrier is preceded by an
ACCESS_ONCE (c.f., fs/dcache.c:prepend_name) or by an xchg_release
(c.f., kernel/locking/qspinlock.c:queued_spin_lock_slowpath), and
it would not be difficult to imagine/create different usages.
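Schematically, the usages in question look something like the following
(purely illustrative; these are not the actual call sites):

	/* (a) load already uses READ_ONCE(); the explicit barrier becomes
	 *     redundant once READ_ONCE() implies it */
	p = READ_ONCE(head);
	smp_read_barrier_depends();

	/* (b) plain load followed by the barrier; converting the load to
	 *     READ_ONCE() would subsume the barrier */
	p = head;
	smp_read_barrier_depends();

	/* (c) ACCESS_ONCE() (or an xchg_release()) followed by the barrier */
	p = ACCESS_ONCE(head);
	smp_read_barrier_depends();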
>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
I understand that we all agree we're missing a Tested-by here ;-).
Andrea
> ---
> Documentation/memory-barriers.txt | 12 ------------
> .../translations/ko_KR/memory-barriers.txt | 12 ------------
> arch/x86/events/core.c | 2 +-
> arch/x86/include/asm/mmu_context.h | 4 ++--
> arch/x86/kernel/ldt.c | 2 +-
> drivers/md/dm-mpath.c | 20 ++++++++++----------
> fs/dcache.c | 4 ++--
> fs/overlayfs/ovl_entry.h | 2 +-
> fs/overlayfs/readdir.c | 2 +-
> include/linux/compiler.h | 21 +--------------------
> include/linux/rculist.h | 4 ++--
> include/linux/rcupdate.h | 4 ++--
> kernel/events/core.c | 4 ++--
> kernel/seccomp.c | 2 +-
> kernel/task_work.c | 2 +-
> mm/slab.h | 2 +-
> 16 files changed, 28 insertions(+), 71 deletions(-)
>
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index b759a60624fd..470a682f3fa4 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1886,18 +1886,6 @@ There are some more advanced barrier functions:
> See Documentation/atomic_{t,bitops}.txt for more information.
>
>
> - (*) lockless_dereference();
> -
> - This can be thought of as a pointer-fetch wrapper around the
> - smp_read_barrier_depends() data-dependency barrier.
> -
> - This is also similar to rcu_dereference(), but in cases where
> - object lifetime is handled by some mechanism other than RCU, for
> - example, when the objects removed only when the system goes down.
> - In addition, lockless_dereference() is used in some data structures
> - that can be used both with and without RCU.
> -
> -
> (*) dma_wmb();
> (*) dma_rmb();
>
> diff --git a/Documentation/translations/ko_KR/memory-barriers.txt b/Documentation/translations/ko_KR/memory-barriers.txt
> index a7a813258013..ec3b46e27b7a 100644
> --- a/Documentation/translations/ko_KR/memory-barriers.txt
> +++ b/Documentation/translations/ko_KR/memory-barriers.txt
> @@ -1858,18 +1858,6 @@ Mandatory 배리어들은 SMP 시스템에서도 UP 시스템에서도 SMP 효
> 참고하세요.
>
>
> - (*) lockless_dereference();
> -
> - 이 함수는 smp_read_barrier_depends() 데이터 의존성 배리어를 사용하는
> - 포인터 읽어오기 래퍼(wrapper) 함수로 생각될 수 있습니다.
> -
> - 객체의 라이프타임이 RCU 외의 메커니즘으로 관리된다는 점을 제외하면
> - rcu_dereference() 와도 유사한데, 예를 들면 객체가 시스템이 꺼질 때에만
> - 제거되는 경우 등입니다. 또한, lockless_dereference() 은 RCU 와 함께
> - 사용될수도, RCU 없이 사용될 수도 있는 일부 데이터 구조에 사용되고
> - 있습니다.
> -
> -
> (*) dma_wmb();
> (*) dma_rmb();
>
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 80534d3c2480..589af1eec7c1 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -2371,7 +2371,7 @@ static unsigned long get_segment_base(unsigned int segment)
> struct ldt_struct *ldt;
>
> /* IRQs are off, so this synchronizes with smp_store_release */
> - ldt = lockless_dereference(current->active_mm->context.ldt);
> + ldt = READ_ONCE(current->active_mm->context.ldt);
> if (!ldt || idx >= ldt->nr_entries)
> return 0;
>
> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> index c120b5db178a..9037a4e546e8 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -72,8 +72,8 @@ static inline void load_mm_ldt(struct mm_struct *mm)
> #ifdef CONFIG_MODIFY_LDT_SYSCALL
> struct ldt_struct *ldt;
>
> - /* lockless_dereference synchronizes with smp_store_release */
> - ldt = lockless_dereference(mm->context.ldt);
> + /* READ_ONCE synchronizes with smp_store_release */
> + ldt = READ_ONCE(mm->context.ldt);
>
> /*
> * Any change to mm->context.ldt is followed by an IPI to all
> diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
> index f0e64db18ac8..0a21390642c4 100644
> --- a/arch/x86/kernel/ldt.c
> +++ b/arch/x86/kernel/ldt.c
> @@ -101,7 +101,7 @@ static void finalize_ldt_struct(struct ldt_struct *ldt)
> static void install_ldt(struct mm_struct *current_mm,
> struct ldt_struct *ldt)
> {
> - /* Synchronizes with lockless_dereference in load_mm_ldt. */
> + /* Synchronizes with READ_ONCE in load_mm_ldt. */
> smp_store_release(&current_mm->context.ldt, ldt);
>
> /* Activate the LDT for all CPUs using current_mm. */
> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index 11f273d2f018..3f88c9d32f7e 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -366,7 +366,7 @@ static struct pgpath *choose_path_in_pg(struct multipath *m,
>
> pgpath = path_to_pgpath(path);
>
> - if (unlikely(lockless_dereference(m->current_pg) != pg)) {
> + if (unlikely(READ_ONCE(m->current_pg) != pg)) {
> /* Only update current_pgpath if pg changed */
> spin_lock_irqsave(&m->lock, flags);
> m->current_pgpath = pgpath;
> @@ -390,7 +390,7 @@ static struct pgpath *choose_pgpath(struct multipath *m, size_t nr_bytes)
> }
>
> /* Were we instructed to switch PG? */
> - if (lockless_dereference(m->next_pg)) {
> + if (READ_ONCE(m->next_pg)) {
> spin_lock_irqsave(&m->lock, flags);
> pg = m->next_pg;
> if (!pg) {
> @@ -406,7 +406,7 @@ static struct pgpath *choose_pgpath(struct multipath *m, size_t nr_bytes)
>
> /* Don't change PG until it has no remaining paths */
> check_current_pg:
> - pg = lockless_dereference(m->current_pg);
> + pg = READ_ONCE(m->current_pg);
> if (pg) {
> pgpath = choose_path_in_pg(m, pg, nr_bytes);
> if (!IS_ERR_OR_NULL(pgpath))
> @@ -473,7 +473,7 @@ static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
> struct request *clone;
>
> /* Do we need to select a new pgpath? */
> - pgpath = lockless_dereference(m->current_pgpath);
> + pgpath = READ_ONCE(m->current_pgpath);
> if (!pgpath || !test_bit(MPATHF_QUEUE_IO, &m->flags))
> pgpath = choose_pgpath(m, nr_bytes);
>
> @@ -535,7 +535,7 @@ static int __multipath_map_bio(struct multipath *m, struct bio *bio, struct dm_m
> bool queue_io;
>
> /* Do we need to select a new pgpath? */
> - pgpath = lockless_dereference(m->current_pgpath);
> + pgpath = READ_ONCE(m->current_pgpath);
> queue_io = test_bit(MPATHF_QUEUE_IO, &m->flags);
> if (!pgpath || !queue_io)
> pgpath = choose_pgpath(m, nr_bytes);
> @@ -1804,7 +1804,7 @@ static int multipath_prepare_ioctl(struct dm_target *ti,
> struct pgpath *current_pgpath;
> int r;
>
> - current_pgpath = lockless_dereference(m->current_pgpath);
> + current_pgpath = READ_ONCE(m->current_pgpath);
> if (!current_pgpath)
> current_pgpath = choose_pgpath(m, 0);
>
> @@ -1826,7 +1826,7 @@ static int multipath_prepare_ioctl(struct dm_target *ti,
> }
>
> if (r == -ENOTCONN) {
> - if (!lockless_dereference(m->current_pg)) {
> + if (!READ_ONCE(m->current_pg)) {
> /* Path status changed, redo selection */
> (void) choose_pgpath(m, 0);
> }
> @@ -1895,9 +1895,9 @@ static int multipath_busy(struct dm_target *ti)
> return (m->queue_mode != DM_TYPE_MQ_REQUEST_BASED);
>
> /* Guess which priority_group will be used at next mapping time */
> - pg = lockless_dereference(m->current_pg);
> - next_pg = lockless_dereference(m->next_pg);
> - if (unlikely(!lockless_dereference(m->current_pgpath) && next_pg))
> + pg = READ_ONCE(m->current_pg);
> + next_pg = READ_ONCE(m->next_pg);
> + if (unlikely(!READ_ONCE(m->current_pgpath) && next_pg))
> pg = next_pg;
>
> if (!pg) {
> diff --git a/fs/dcache.c b/fs/dcache.c
> index f90141387f01..34c852af215c 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -231,7 +231,7 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c
> {
> /*
> * Be careful about RCU walk racing with rename:
> - * use 'lockless_dereference' to fetch the name pointer.
> + * use 'READ_ONCE' to fetch the name pointer.
> *
> * NOTE! Even if a rename will mean that the length
> * was not loaded atomically, we don't care. The
> @@ -245,7 +245,7 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c
> * early because the data cannot match (there can
> * be no NUL in the ct/tcount data)
> */
> - const unsigned char *cs = lockless_dereference(dentry->d_name.name);
> + const unsigned char *cs = READ_ONCE(dentry->d_name.name);
>
> return dentry_string_cmp(cs, ct, tcount);
> }
> diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
> index 878a750986dd..0f6809fa6628 100644
> --- a/fs/overlayfs/ovl_entry.h
> +++ b/fs/overlayfs/ovl_entry.h
> @@ -74,5 +74,5 @@ static inline struct ovl_inode *OVL_I(struct inode *inode)
>
> static inline struct dentry *ovl_upperdentry_dereference(struct ovl_inode *oi)
> {
> - return lockless_dereference(oi->__upperdentry);
> + return READ_ONCE(oi->__upperdentry);
> }
> diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> index 62e9b22a2077..0b389d330613 100644
> --- a/fs/overlayfs/readdir.c
> +++ b/fs/overlayfs/readdir.c
> @@ -754,7 +754,7 @@ static int ovl_dir_fsync(struct file *file, loff_t start, loff_t end,
> if (!od->is_upper && OVL_TYPE_UPPER(ovl_path_type(dentry))) {
> struct inode *inode = file_inode(file);
>
> - realfile = lockless_dereference(od->upperfile);
> + realfile = READ_ONCE(od->upperfile);
> if (!realfile) {
> struct path upperpath;
>
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index e95a2631e545..f260ff39f90f 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -340,6 +340,7 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
> __read_once_size(&(x), __u.__c, sizeof(x)); \
> else \
> __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
> + smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \
> __u.__val; \
> })
> #define READ_ONCE(x) __READ_ONCE(x, 1)
> @@ -604,24 +605,4 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
> (volatile typeof(x) *)&(x); })
> #define ACCESS_ONCE(x) (*__ACCESS_ONCE(x))
>
> -/**
> - * lockless_dereference() - safely load a pointer for later dereference
> - * @p: The pointer to load
> - *
> - * Similar to rcu_dereference(), but for situations where the pointed-to
> - * object's lifetime is managed by something other than RCU. That
> - * "something other" might be reference counting or simple immortality.
> - *
> - * The seemingly unused variable ___typecheck_p validates that @p is
> - * indeed a pointer type by using a pointer to typeof(*p) as the type.
> - * Taking a pointer to typeof(*p) again is needed in case p is void *.
> - */
> -#define lockless_dereference(p) \
> -({ \
> - typeof(p) _________p1 = READ_ONCE(p); \
> - typeof(*(p)) *___typecheck_p __maybe_unused; \
> - smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> - (_________p1); \
> -})
> -
> #endif /* __LINUX_COMPILER_H */
> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> index b1fd8bf85fdc..3a2bb7d8ed4d 100644
> --- a/include/linux/rculist.h
> +++ b/include/linux/rculist.h
> @@ -274,7 +274,7 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
> * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
> */
> #define list_entry_rcu(ptr, type, member) \
> - container_of(lockless_dereference(ptr), type, member)
> + container_of(READ_ONCE(ptr), type, member)
>
> /**
> * Where are list_empty_rcu() and list_first_entry_rcu()?
> @@ -367,7 +367,7 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
> * example is when items are added to the list, but never deleted.
> */
> #define list_entry_lockless(ptr, type, member) \
> - container_of((typeof(ptr))lockless_dereference(ptr), type, member)
> + container_of((typeof(ptr))READ_ONCE(ptr), type, member)
>
> /**
> * list_for_each_entry_lockless - iterate over rcu list of given type
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index de50d8a4cf41..380a3aeb09d7 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -346,7 +346,7 @@ static inline void rcu_preempt_sleep_check(void) { }
> #define __rcu_dereference_check(p, c, space) \
> ({ \
> /* Dependency order vs. p above. */ \
> - typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
> + typeof(*p) *________p1 = (typeof(*p) *__force)READ_ONCE(p); \
> RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_check() usage"); \
> rcu_dereference_sparse(p, space); \
> ((typeof(*p) __force __kernel *)(________p1)); \
> @@ -360,7 +360,7 @@ static inline void rcu_preempt_sleep_check(void) { }
> #define rcu_dereference_raw(p) \
> ({ \
> /* Dependency order vs. p above. */ \
> - typeof(p) ________p1 = lockless_dereference(p); \
> + typeof(p) ________p1 = READ_ONCE(p); \
> ((typeof(*p) __force __kernel *)(________p1)); \
> })
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 6bc21e202ae4..417812ce0099 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -4231,7 +4231,7 @@ static void perf_remove_from_owner(struct perf_event *event)
> * indeed free this event, otherwise we need to serialize on
> * owner->perf_event_mutex.
> */
> - owner = lockless_dereference(event->owner);
> + owner = READ_ONCE(event->owner);
> if (owner) {
> /*
> * Since delayed_put_task_struct() also drops the last
> @@ -4328,7 +4328,7 @@ int perf_event_release_kernel(struct perf_event *event)
> * Cannot change, child events are not migrated, see the
> * comment with perf_event_ctx_lock_nested().
> */
> - ctx = lockless_dereference(child->ctx);
> + ctx = READ_ONCE(child->ctx);
> /*
> * Since child_mutex nests inside ctx::mutex, we must jump
> * through hoops. We start by grabbing a reference on the ctx.
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index bb3a38005b9c..1daa8b61a268 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -189,7 +189,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
> u32 ret = SECCOMP_RET_ALLOW;
> /* Make sure cross-thread synced filter points somewhere sane. */
> struct seccomp_filter *f =
> - lockless_dereference(current->seccomp.filter);
> + READ_ONCE(current->seccomp.filter);
>
> /* Ensure unexpected behavior doesn't result in failing open. */
> if (unlikely(WARN_ON(f == NULL)))
> diff --git a/kernel/task_work.c b/kernel/task_work.c
> index 836a72a66fba..9a9f262fc53d 100644
> --- a/kernel/task_work.c
> +++ b/kernel/task_work.c
> @@ -67,7 +67,7 @@ task_work_cancel(struct task_struct *task, task_work_func_t func)
> * we raced with task_work_run(), *pprev == NULL/exited.
> */
> raw_spin_lock_irqsave(&task->pi_lock, flags);
> - while ((work = lockless_dereference(*pprev))) {
> + while ((work = READ_ONCE(*pprev))) {
> if (work->func != func)
> pprev = &work->next;
> else if (cmpxchg(pprev, work, work->next) == work)
> diff --git a/mm/slab.h b/mm/slab.h
> index 073362816acc..8894f811a89d 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -258,7 +258,7 @@ cache_from_memcg_idx(struct kmem_cache *s, int idx)
> * memcg_caches issues a write barrier to match this (see
> * memcg_create_kmem_cache()).
> */
> - cachep = lockless_dereference(arr->entries[idx]);
> + cachep = READ_ONCE(arr->entries[idx]);
> rcu_read_unlock();
>
> return cachep;
> --
> 2.1.4
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH 1/2] arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables
2017-10-05 19:31 ` Andrea Parri
@ 2017-10-05 20:09 ` Paul E. McKenney
0 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2017-10-05 20:09 UTC (permalink / raw)
To: Andrea Parri
Cc: Will Deacon, Michael Cree, Peter Zijlstra, kirill.shutemov,
linux-kernel, ynorov, rruigrok, linux-arch, akpm, catalin.marinas,
rth, ink, mattst88, linux-alpha
On Thu, Oct 05, 2017 at 09:31:48PM +0200, Andrea Parri wrote:
> Hi Will,
>
> none of my comments below represent objections to this patch, but
> let me remark:
>
>
> On Thu, Oct 05, 2017 at 05:31:54PM +0100, Will Deacon wrote:
> > Hi Paul,
> >
> > On Tue, Oct 03, 2017 at 12:11:10PM -0700, Paul E. McKenney wrote:
> > > On Fri, Sep 29, 2017 at 05:33:49PM +0100, Will Deacon wrote:
> > > > On Fri, Sep 29, 2017 at 09:29:39AM -0700, Paul E. McKenney wrote:
> > > > > On Fri, Sep 29, 2017 at 10:08:43AM +0100, Will Deacon wrote:
> > > > > > Ok, but where does that leave us wrt my initial proposal of moving
> > > > > > smp_read_barrier_depends() into READ_ONCE and getting rid of
> > > > > > lockless_dereference?
> > > > > >
> > > > > > Michael (or anybody else running mainline on SMP Alpha) -- would you be
> > > > > > able to give the diff below a spin and see whether there's a measurable
> > > > > > performance impact?
> > > > >
> > > > > This will be a sensitive test. The smp_read_barrier_depends() can be
> > > > > removed from lockless_dereference(). Without this removal Alpha will
> > > > > get two memory barriers from rcu_dereference() and friends.
> > > >
> > > > Oh yes, good point. I was trying to keep the diff simple, but you're
> > > > right that this is packing too many barriers. Fixed diff below.
> > >
> > > Not seeing any objections thus far. If there are none by (say) the
> > > end of this week, I would be happy to queue a patch for the 4.16
> > > merge window. That should give ample opportunity for further review
> > > and testing.
> >
> > Ok, full patch below.
> >
> > Will
> >
> > --->8
> >
> > From 15956d0cc6b37208d8542b1858a8d8b64227acf4 Mon Sep 17 00:00:00 2001
> > From: Will Deacon <will.deacon@arm.com>
> > Date: Thu, 5 Oct 2017 16:57:36 +0100
> > Subject: [PATCH] locking/barriers: Kill lockless_dereference
> >
> > lockless_dereference is a nice idea, but it's gained little traction in
> > kernel code since its introduction three years ago. This is partly
> > because it's a pain to type, but also because using READ_ONCE instead
> > will work correctly on all architectures apart from Alpha, which is a
> > fully supported but somewhat niche architecture these days.
>
> lockless_dereference might be a mouthful, but it does (explicitly)
> say/remark: "Yep, we are relying on the following address dep. to
> be "in strong-ppo" ".
>
> Such information will be lost or, at least, not immediately clear
> by just reading a READ_ONCE(). (And Yes, this information is only
> relevant when we "include" Alpha in the picture/analysis.)
It is possible to argue that lockless_dereference() should remain for
this reason, even given READ_ONCE() containing smp_read_barrier_depends().
However, such arguments would be much more compelling if there were
tools that cared about the difference.
> > This patch moves smp_read_barrier_depends() (a NOP on all architectures
> > other than Alpha) from lockless_dereference into READ_ONCE, converts
> > the few actual users over to READ_ONCE and then finally removes
> > lockless_dereference altogether.
>
> Notice that several "potential users" of lockless_dereference are
> currently hidden in other call sites for smp_read_barrier_depends
> (i.e., cases where this barrier is not called from within a lock-
> less or an RCU dereference).
>
> Some of these usages (e.g.,
>
> include/linux/percpu-refcount.h:__ref_is_percpu,
> mm/ksm.c:get_ksm_page,
> security/keys/keyring.c:search_nested_keyrings )
>
> precede this barrier with a READ_ONCE; others (e.g.,
>
> arch/alpha/include/asm/pgtable.h:pmd_offset,
> net/ipv4/netfilter/arp_tables.c:arpt_do_table,
> kernel/events/uprobes.c:get_trampoline_vaddr )
>
> with a plain read.
I would welcome patches for the cases where smp_read_barrier_depends() is
preceded by READ_ONCE(). Perhaps the others should gain a READ_ONCE(),
and I suspect that they should, but ultimately that decision is in
the hands of the relevant maintainer, so any such patches need to be
separated and will need at least an ack from the relevant maintainers.
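For the READ_ONCE()-preceded cases the cleanup on top of this patch would
be essentially mechanical, since the barrier is now implied by READ_ONCE()
itself -- along the lines of (hypothetical code, not one of the call sites
named above):

	p = READ_ONCE(gp);
-	smp_read_barrier_depends();
	do_something_with(p->field);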
> There also appear to be cases where the barrier is preceded by an
> ACCESS_ONCE (c.f., fs/dcache.c:prepend_name) or by an xchg_release
> (c.f., kernel/locking/qspinlock.c:queued_spin_lock_slowpath), and
> it would not be difficult to imagine/create different usages.
It would indeed be good to replace ACCESS_ONCE() with READ_ONCE() or
with WRITE_ONCE() where this works. And yes, I agree that there are
other usage patterns possible.
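And for the ACCESS_ONCE() sites, the conversion would look roughly like
(again illustrative only, with made-up names):

-	val = ACCESS_ONCE(obj->field);		/* read side */
+	val = READ_ONCE(obj->field);

-	ACCESS_ONCE(obj->state) = new_state;	/* write side */
+	WRITE_ONCE(obj->state, new_state);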
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
>
> I understand that we all agree we're missing a Tested-by here ;-).
Indeed, hence my "applied for testing and review". ;-)
Thanx, Paul
> Andrea
>
>
> > ---
> > Documentation/memory-barriers.txt | 12 ------------
> > .../translations/ko_KR/memory-barriers.txt | 12 ------------
> > arch/x86/events/core.c | 2 +-
> > arch/x86/include/asm/mmu_context.h | 4 ++--
> > arch/x86/kernel/ldt.c | 2 +-
> > drivers/md/dm-mpath.c | 20 ++++++++++----------
> > fs/dcache.c | 4 ++--
> > fs/overlayfs/ovl_entry.h | 2 +-
> > fs/overlayfs/readdir.c | 2 +-
> > include/linux/compiler.h | 21 +--------------------
> > include/linux/rculist.h | 4 ++--
> > include/linux/rcupdate.h | 4 ++--
> > kernel/events/core.c | 4 ++--
> > kernel/seccomp.c | 2 +-
> > kernel/task_work.c | 2 +-
> > mm/slab.h | 2 +-
> > 16 files changed, 28 insertions(+), 71 deletions(-)
> >
> > diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> > index b759a60624fd..470a682f3fa4 100644
> > --- a/Documentation/memory-barriers.txt
> > +++ b/Documentation/memory-barriers.txt
> > @@ -1886,18 +1886,6 @@ There are some more advanced barrier functions:
> > See Documentation/atomic_{t,bitops}.txt for more information.
> >
> >
> > - (*) lockless_dereference();
> > -
> > - This can be thought of as a pointer-fetch wrapper around the
> > - smp_read_barrier_depends() data-dependency barrier.
> > -
> > - This is also similar to rcu_dereference(), but in cases where
> > - object lifetime is handled by some mechanism other than RCU, for
> > - example, when the objects removed only when the system goes down.
> > - In addition, lockless_dereference() is used in some data structures
> > - that can be used both with and without RCU.
> > -
> > -
> > (*) dma_wmb();
> > (*) dma_rmb();
> >
> > diff --git a/Documentation/translations/ko_KR/memory-barriers.txt b/Documentation/translations/ko_KR/memory-barriers.txt
> > index a7a813258013..ec3b46e27b7a 100644
> > --- a/Documentation/translations/ko_KR/memory-barriers.txt
> > +++ b/Documentation/translations/ko_KR/memory-barriers.txt
> > @@ -1858,18 +1858,6 @@ Mandatory 배리어들은 SMP 시스템에서도 UP 시스템에서도 SMP 효
> > 참고하세요.
> >
> >
> > - (*) lockless_dereference();
> > -
> > - 이 함수는 smp_read_barrier_depends() 데이터 의존성 배리어를 사용하는
> > - 포인터 읽어오기 래퍼(wrapper) 함수로 생각될 수 있습니다.
> > -
> > - 객체의 라이프타임이 RCU 외의 메커니즘으로 관리된다는 점을 제외하면
> > - rcu_dereference() 와도 유사한데, 예를 들면 객체가 시스템이 꺼질 때에만
> > - 제거되는 경우 등입니다. 또한, lockless_dereference() 은 RCU 와 함께
> > - 사용될수도, RCU 없이 사용될 수도 있는 일부 데이터 구조에 사용되고
> > - 있습니다.
> > -
> > -
> > (*) dma_wmb();
> > (*) dma_rmb();
> >
> > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> > index 80534d3c2480..589af1eec7c1 100644
> > --- a/arch/x86/events/core.c
> > +++ b/arch/x86/events/core.c
> > @@ -2371,7 +2371,7 @@ static unsigned long get_segment_base(unsigned int segment)
> > struct ldt_struct *ldt;
> >
> > /* IRQs are off, so this synchronizes with smp_store_release */
> > - ldt = lockless_dereference(current->active_mm->context.ldt);
> > + ldt = READ_ONCE(current->active_mm->context.ldt);
> > if (!ldt || idx >= ldt->nr_entries)
> > return 0;
> >
> > diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> > index c120b5db178a..9037a4e546e8 100644
> > --- a/arch/x86/include/asm/mmu_context.h
> > +++ b/arch/x86/include/asm/mmu_context.h
> > @@ -72,8 +72,8 @@ static inline void load_mm_ldt(struct mm_struct *mm)
> > #ifdef CONFIG_MODIFY_LDT_SYSCALL
> > struct ldt_struct *ldt;
> >
> > - /* lockless_dereference synchronizes with smp_store_release */
> > - ldt = lockless_dereference(mm->context.ldt);
> > + /* READ_ONCE synchronizes with smp_store_release */
> > + ldt = READ_ONCE(mm->context.ldt);
> >
> > /*
> > * Any change to mm->context.ldt is followed by an IPI to all
> > diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
> > index f0e64db18ac8..0a21390642c4 100644
> > --- a/arch/x86/kernel/ldt.c
> > +++ b/arch/x86/kernel/ldt.c
> > @@ -101,7 +101,7 @@ static void finalize_ldt_struct(struct ldt_struct *ldt)
> > static void install_ldt(struct mm_struct *current_mm,
> > struct ldt_struct *ldt)
> > {
> > - /* Synchronizes with lockless_dereference in load_mm_ldt. */
> > + /* Synchronizes with READ_ONCE in load_mm_ldt. */
> > smp_store_release(&current_mm->context.ldt, ldt);
> >
> > /* Activate the LDT for all CPUs using current_mm. */
> > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> > index 11f273d2f018..3f88c9d32f7e 100644
> > --- a/drivers/md/dm-mpath.c
> > +++ b/drivers/md/dm-mpath.c
> > @@ -366,7 +366,7 @@ static struct pgpath *choose_path_in_pg(struct multipath *m,
> >
> > pgpath = path_to_pgpath(path);
> >
> > - if (unlikely(lockless_dereference(m->current_pg) != pg)) {
> > + if (unlikely(READ_ONCE(m->current_pg) != pg)) {
> > /* Only update current_pgpath if pg changed */
> > spin_lock_irqsave(&m->lock, flags);
> > m->current_pgpath = pgpath;
> > @@ -390,7 +390,7 @@ static struct pgpath *choose_pgpath(struct multipath *m, size_t nr_bytes)
> > }
> >
> > /* Were we instructed to switch PG? */
> > - if (lockless_dereference(m->next_pg)) {
> > + if (READ_ONCE(m->next_pg)) {
> > spin_lock_irqsave(&m->lock, flags);
> > pg = m->next_pg;
> > if (!pg) {
> > @@ -406,7 +406,7 @@ static struct pgpath *choose_pgpath(struct multipath *m, size_t nr_bytes)
> >
> > /* Don't change PG until it has no remaining paths */
> > check_current_pg:
> > - pg = lockless_dereference(m->current_pg);
> > + pg = READ_ONCE(m->current_pg);
> > if (pg) {
> > pgpath = choose_path_in_pg(m, pg, nr_bytes);
> > if (!IS_ERR_OR_NULL(pgpath))
> > @@ -473,7 +473,7 @@ static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
> > struct request *clone;
> >
> > /* Do we need to select a new pgpath? */
> > - pgpath = lockless_dereference(m->current_pgpath);
> > + pgpath = READ_ONCE(m->current_pgpath);
> > if (!pgpath || !test_bit(MPATHF_QUEUE_IO, &m->flags))
> > pgpath = choose_pgpath(m, nr_bytes);
> >
> > @@ -535,7 +535,7 @@ static int __multipath_map_bio(struct multipath *m, struct bio *bio, struct dm_m
> > bool queue_io;
> >
> > /* Do we need to select a new pgpath? */
> > - pgpath = lockless_dereference(m->current_pgpath);
> > + pgpath = READ_ONCE(m->current_pgpath);
> > queue_io = test_bit(MPATHF_QUEUE_IO, &m->flags);
> > if (!pgpath || !queue_io)
> > pgpath = choose_pgpath(m, nr_bytes);
> > @@ -1804,7 +1804,7 @@ static int multipath_prepare_ioctl(struct dm_target *ti,
> > struct pgpath *current_pgpath;
> > int r;
> >
> > - current_pgpath = lockless_dereference(m->current_pgpath);
> > + current_pgpath = READ_ONCE(m->current_pgpath);
> > if (!current_pgpath)
> > current_pgpath = choose_pgpath(m, 0);
> >
> > @@ -1826,7 +1826,7 @@ static int multipath_prepare_ioctl(struct dm_target *ti,
> > }
> >
> > if (r == -ENOTCONN) {
> > - if (!lockless_dereference(m->current_pg)) {
> > + if (!READ_ONCE(m->current_pg)) {
> > /* Path status changed, redo selection */
> > (void) choose_pgpath(m, 0);
> > }
> > @@ -1895,9 +1895,9 @@ static int multipath_busy(struct dm_target *ti)
> > return (m->queue_mode != DM_TYPE_MQ_REQUEST_BASED);
> >
> > /* Guess which priority_group will be used at next mapping time */
> > - pg = lockless_dereference(m->current_pg);
> > - next_pg = lockless_dereference(m->next_pg);
> > - if (unlikely(!lockless_dereference(m->current_pgpath) && next_pg))
> > + pg = READ_ONCE(m->current_pg);
> > + next_pg = READ_ONCE(m->next_pg);
> > + if (unlikely(!READ_ONCE(m->current_pgpath) && next_pg))
> > pg = next_pg;
> >
> > if (!pg) {
> > diff --git a/fs/dcache.c b/fs/dcache.c
> > index f90141387f01..34c852af215c 100644
> > --- a/fs/dcache.c
> > +++ b/fs/dcache.c
> > @@ -231,7 +231,7 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c
> > {
> > /*
> > * Be careful about RCU walk racing with rename:
> > - * use 'lockless_dereference' to fetch the name pointer.
> > + * use 'READ_ONCE' to fetch the name pointer.
> > *
> > * NOTE! Even if a rename will mean that the length
> > * was not loaded atomically, we don't care. The
> > @@ -245,7 +245,7 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c
> > * early because the data cannot match (there can
> > * be no NUL in the ct/tcount data)
> > */
> > - const unsigned char *cs = lockless_dereference(dentry->d_name.name);
> > + const unsigned char *cs = READ_ONCE(dentry->d_name.name);
> >
> > return dentry_string_cmp(cs, ct, tcount);
> > }
> > diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
> > index 878a750986dd..0f6809fa6628 100644
> > --- a/fs/overlayfs/ovl_entry.h
> > +++ b/fs/overlayfs/ovl_entry.h
> > @@ -74,5 +74,5 @@ static inline struct ovl_inode *OVL_I(struct inode *inode)
> >
> > static inline struct dentry *ovl_upperdentry_dereference(struct ovl_inode *oi)
> > {
> > - return lockless_dereference(oi->__upperdentry);
> > + return READ_ONCE(oi->__upperdentry);
> > }
> > diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> > index 62e9b22a2077..0b389d330613 100644
> > --- a/fs/overlayfs/readdir.c
> > +++ b/fs/overlayfs/readdir.c
> > @@ -754,7 +754,7 @@ static int ovl_dir_fsync(struct file *file, loff_t start, loff_t end,
> > if (!od->is_upper && OVL_TYPE_UPPER(ovl_path_type(dentry))) {
> > struct inode *inode = file_inode(file);
> >
> > - realfile = lockless_dereference(od->upperfile);
> > + realfile = READ_ONCE(od->upperfile);
> > if (!realfile) {
> > struct path upperpath;
> >
> > diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> > index e95a2631e545..f260ff39f90f 100644
> > --- a/include/linux/compiler.h
> > +++ b/include/linux/compiler.h
> > @@ -340,6 +340,7 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
> > __read_once_size(&(x), __u.__c, sizeof(x)); \
> > else \
> > __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
> > + smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \
> > __u.__val; \
> > })
> > #define READ_ONCE(x) __READ_ONCE(x, 1)
> > @@ -604,24 +605,4 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
> > (volatile typeof(x) *)&(x); })
> > #define ACCESS_ONCE(x) (*__ACCESS_ONCE(x))
> >
> > -/**
> > - * lockless_dereference() - safely load a pointer for later dereference
> > - * @p: The pointer to load
> > - *
> > - * Similar to rcu_dereference(), but for situations where the pointed-to
> > - * object's lifetime is managed by something other than RCU. That
> > - * "something other" might be reference counting or simple immortality.
> > - *
> > - * The seemingly unused variable ___typecheck_p validates that @p is
> > - * indeed a pointer type by using a pointer to typeof(*p) as the type.
> > - * Taking a pointer to typeof(*p) again is needed in case p is void *.
> > - */
> > -#define lockless_dereference(p) \
> > -({ \
> > - typeof(p) _________p1 = READ_ONCE(p); \
> > - typeof(*(p)) *___typecheck_p __maybe_unused; \
> > - smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > - (_________p1); \
> > -})
> > -
> > #endif /* __LINUX_COMPILER_H */
> > diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> > index b1fd8bf85fdc..3a2bb7d8ed4d 100644
> > --- a/include/linux/rculist.h
> > +++ b/include/linux/rculist.h
> > @@ -274,7 +274,7 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
> > * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
> > */
> > #define list_entry_rcu(ptr, type, member) \
> > - container_of(lockless_dereference(ptr), type, member)
> > + container_of(READ_ONCE(ptr), type, member)
> >
> > /**
> > * Where are list_empty_rcu() and list_first_entry_rcu()?
> > @@ -367,7 +367,7 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
> > * example is when items are added to the list, but never deleted.
> > */
> > #define list_entry_lockless(ptr, type, member) \
> > - container_of((typeof(ptr))lockless_dereference(ptr), type, member)
> > + container_of((typeof(ptr))READ_ONCE(ptr), type, member)
> >
> > /**
> > * list_for_each_entry_lockless - iterate over rcu list of given type
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index de50d8a4cf41..380a3aeb09d7 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -346,7 +346,7 @@ static inline void rcu_preempt_sleep_check(void) { }
> > #define __rcu_dereference_check(p, c, space) \
> > ({ \
> > /* Dependency order vs. p above. */ \
> > - typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
> > + typeof(*p) *________p1 = (typeof(*p) *__force)READ_ONCE(p); \
> > RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_check() usage"); \
> > rcu_dereference_sparse(p, space); \
> > ((typeof(*p) __force __kernel *)(________p1)); \
> > @@ -360,7 +360,7 @@ static inline void rcu_preempt_sleep_check(void) { }
> > #define rcu_dereference_raw(p) \
> > ({ \
> > /* Dependency order vs. p above. */ \
> > - typeof(p) ________p1 = lockless_dereference(p); \
> > + typeof(p) ________p1 = READ_ONCE(p); \
> > ((typeof(*p) __force __kernel *)(________p1)); \
> > })
> >
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 6bc21e202ae4..417812ce0099 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -4231,7 +4231,7 @@ static void perf_remove_from_owner(struct perf_event *event)
> > * indeed free this event, otherwise we need to serialize on
> > * owner->perf_event_mutex.
> > */
> > - owner = lockless_dereference(event->owner);
> > + owner = READ_ONCE(event->owner);
> > if (owner) {
> > /*
> > * Since delayed_put_task_struct() also drops the last
> > @@ -4328,7 +4328,7 @@ int perf_event_release_kernel(struct perf_event *event)
> > * Cannot change, child events are not migrated, see the
> > * comment with perf_event_ctx_lock_nested().
> > */
> > - ctx = lockless_dereference(child->ctx);
> > + ctx = READ_ONCE(child->ctx);
> > /*
> > * Since child_mutex nests inside ctx::mutex, we must jump
> > * through hoops. We start by grabbing a reference on the ctx.
> > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > index bb3a38005b9c..1daa8b61a268 100644
> > --- a/kernel/seccomp.c
> > +++ b/kernel/seccomp.c
> > @@ -189,7 +189,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
> > u32 ret = SECCOMP_RET_ALLOW;
> > /* Make sure cross-thread synced filter points somewhere sane. */
> > struct seccomp_filter *f =
> > - lockless_dereference(current->seccomp.filter);
> > + READ_ONCE(current->seccomp.filter);
> >
> > /* Ensure unexpected behavior doesn't result in failing open. */
> > if (unlikely(WARN_ON(f == NULL)))
> > diff --git a/kernel/task_work.c b/kernel/task_work.c
> > index 836a72a66fba..9a9f262fc53d 100644
> > --- a/kernel/task_work.c
> > +++ b/kernel/task_work.c
> > @@ -67,7 +67,7 @@ task_work_cancel(struct task_struct *task, task_work_func_t func)
> > * we raced with task_work_run(), *pprev == NULL/exited.
> > */
> > raw_spin_lock_irqsave(&task->pi_lock, flags);
> > - while ((work = lockless_dereference(*pprev))) {
> > + while ((work = READ_ONCE(*pprev))) {
> > if (work->func != func)
> > pprev = &work->next;
> > else if (cmpxchg(pprev, work, work->next) == work)
> > diff --git a/mm/slab.h b/mm/slab.h
> > index 073362816acc..8894f811a89d 100644
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -258,7 +258,7 @@ cache_from_memcg_idx(struct kmem_cache *s, int idx)
> > * memcg_caches issues a write barrier to match this (see
> > * memcg_create_kmem_cache()).
> > */
> > - cachep = lockless_dereference(arr->entries[idx]);
> > + cachep = READ_ONCE(arr->entries[idx]);
> > rcu_read_unlock();
> >
> > return cachep;
> > --
> > 2.1.4
> >
>
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads: [~2017-10-05 20:09 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
[not found] <1506527369-19535-1-git-send-email-will.deacon@arm.com>
[not found] ` <1506527369-19535-2-git-send-email-will.deacon@arm.com>
[not found] ` <20170928083801.m6rb4frbbgzgam2o@hirez.programming.kicks-ass.net>
[not found] ` <20170928084535.GA19060@arm.com>
2017-09-28 15:43 ` [RFC PATCH 1/2] arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables Paul E. McKenney
2017-09-28 15:49 ` Will Deacon
2017-09-28 16:07 ` Paul E. McKenney
2017-09-28 18:59 ` Michael Cree
2017-09-29 0:58 ` Paul E. McKenney
2017-09-29 9:08 ` Will Deacon
2017-09-29 16:29 ` Paul E. McKenney
2017-09-29 16:33 ` Will Deacon
2017-10-03 19:11 ` Paul E. McKenney
2017-10-05 16:31 ` Will Deacon
2017-10-05 19:22 ` Paul E. McKenney
2017-10-05 19:31 ` Andrea Parri
2017-10-05 20:09 ` Paul E. McKenney