* [PATCH] futex: replace bare barrier() with more lightweight READ_ONCE()
@ 2016-03-03 15:38 Jianyu Zhan
2016-03-03 17:05 ` Darren Hart
0 siblings, 1 reply; 11+ messages in thread
From: Jianyu Zhan @ 2016-03-03 15:38 UTC (permalink / raw)
To: linux-kernel, peterz, tglx, dave, akpm, mingo, linux, dvhart,
borntraeger, fengguang.wu, bigeasy
Cc: nasa4836
Commit e91467ecd1ef ("bug in futex unqueue_me") introduced a barrier()
in unqueue_me() to address the problem below.
The scenario is like this:
====================
original code:
retry:
	lock_ptr = q->lock_ptr;
	if (lock_ptr != 0) {
		spin_lock(lock_ptr)
		if (unlikely(lock_ptr != q->lock_ptr)) {
			spin_unlock(lock_ptr);
			goto retry;
		}
		...
	}
====================
It was observed that the compiler generates code that is equivalent to:
retry:
	if (q->lock_ptr != 0) {
		spin_lock(q->lock_ptr)
		if (unlikely(lock_ptr != q->lock_ptr)) {
			spin_unlock(lock_ptr);
			goto retry;
		}
		...
	}
Since q->lock_ptr might change between the test for non-NULL and spin_lock(),
the double load will cause trouble, so that commit added a barrier() to prevent it.
This patch replaces this bare barrier() with a READ_ONCE().
The reasons are:
1) READ_ONCE() is a weaker form of barrier() that affects only the specific
accesses, while barrier() is a general compiler-level memory barrier.
READ_ONCE() was not available at the time that patch was written.
2) READ_ONCE() is more informative by its very name, while a bare barrier()
without a comment leads to quite a bit of perplexity. (A rough sketch of both
primitives follows below.)
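To make the distinction concrete, here is a rough sketch of the two primitives.
This is a simplified illustration, not the exact code in include/linux/compiler.h:
the barrier() definition matches the usual GCC one, while the READ_ONCE() sketch
is deliberately given a different name, since the real macro goes through
__read_once_size() and also handles objects larger than a word.

	/*
	 * barrier(): a general compiler barrier.  The empty asm with a
	 * "memory" clobber forbids the compiler from reordering or caching
	 * *any* memory access across this point.
	 */
	#define barrier()		__asm__ __volatile__("" : : : "memory")

	/*
	 * Sketch of the READ_ONCE() idea (named differently here on purpose):
	 * the volatile access forces exactly one load of x at this point and
	 * forbids reloading or tearing it, but it says nothing about any
	 * other memory access around it.
	 */
	#define READ_ONCE_SKETCH(x)	(*(const volatile typeof(x) *)&(x))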
The assembly code before (barrier version) and after this patch (READ_ONCE version) is the same:
====================
Before (barrier version):
unqueue_me():
linux/kernel/futex.c:1930
1df6: 4c 8b bd 28 ff ff ff mov -0xd8(%rbp),%r15
linux/kernel/futex.c:1932
1dfd: 4d 85 ff test %r15,%r15
1e00: 0f 84 5c 01 00 00 je 1f62 <futex_wait+0x292>
spin_lock():
linux/include/linux/spinlock.h:302
1e06: 4c 89 ff mov %r15,%rdi
1e09: e8 00 00 00 00 callq 1e0e <futex_wait+0x13e>
====================
After (READ_ONCE version):
__read_once_size():
linux/include/linux/compiler.h:218
1df6: 4c 8b bd 28 ff ff ff mov -0xd8(%rbp),%r15
unqueue_me():
linux/kernel/futex.c:1935
1dfd: 4d 85 ff test %r15,%r15
1e00: 0f 84 5c 01 00 00 je 1f62 <futex_wait+0x292>
spin_lock():
linux/include/linux/spinlock.h:302
1e06: 4c 89 ff mov %r15,%rdi
1e09: e8 00 00 00 00 callq 1e0e <futex_wait+0x13e>
Code size is also the same.
Suggested-by: Darren Hart <dvhart@infradead.org>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
---
kernel/futex.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/kernel/futex.c b/kernel/futex.c
index 5d6ce64..58c1bcc 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1927,8 +1927,11 @@ static int unqueue_me(struct futex_q *q)
 
 	/* In the common case we don't take the spinlock, which is nice. */
 retry:
-	lock_ptr = q->lock_ptr;
-	barrier();
+	/*
+	 * Prevent the compiler to read q->lock_ptr twice (if and spin_lock),
+	 * or that would cause trouble since q->lock_ptr can change in between.
+	 */
+	lock_ptr = READ_ONCE(q->lock_ptr);
 	if (lock_ptr != NULL) {
 		spin_lock(lock_ptr);
 		/*
--
2.4.3
* Re: [PATCH] futex: replace bare barrier() with more lightweight READ_ONCE()
  2016-03-03 15:38 [PATCH] futex: replace bare barrier() with more lightweight READ_ONCE() Jianyu Zhan
@ 2016-03-03 17:05 ` Darren Hart
  2016-03-04 1:12   ` Jianyu Zhan
  0 siblings, 1 reply; 11+ messages in thread
From: Darren Hart @ 2016-03-03 17:05 UTC (permalink / raw)
To: Jianyu Zhan
Cc: linux-kernel, peterz, tglx, dave, akpm, mingo, linux, dvhart,
    borntraeger, fengguang.wu, bigeasy

On Thu, Mar 03, 2016 at 11:38:05PM +0800, Jianyu Zhan wrote:
[...]
> Suggested-by: Darren Hart <dvhart@infradead.org>

A simple suggested-by may cause people to think this patch was my idea, which
would not be accurate. You can credit people for recommendations in the patch
changelog ("Since v1: ..." below the --- line).

> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
> Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
> ---
>  kernel/futex.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
[...]
>  	/* In the common case we don't take the spinlock, which is nice. */
>  retry:
> -	lock_ptr = q->lock_ptr;
> -	barrier();
> +	/*
> +	 * Prevent the compiler to read q->lock_ptr twice (if and spin_lock),
> +	 * or that would cause trouble since q->lock_ptr can change in between.
> +	 */

I thought I provided a corrected comment block.... maybe I didn't. We have been
working on improving the futex documentation, so we're paying close attention
to terminology as well as grammar. This one needs a couple minor tweaks.
I suggest:

	/*
	 * Use READ_ONCE to forbid the compiler from reloading q->lock_ptr and
	 * optimizing lock_ptr out of the logic below.
	 */

The bit about q->lock_ptr possibly changing is already covered by the large
comment block below the spin_lock(lock_ptr) call.

With the above change, I'll add my Reviewed-by.

Thanks,

--
Darren Hart
Intel Open Source Technology Center
* Re: [PATCH] futex: replace bare barrier() with more lightweight READ_ONCE()
  2016-03-03 17:05 ` Darren Hart
@ 2016-03-04 1:12   ` Jianyu Zhan
  2016-03-04 21:05     ` Darren Hart
  0 siblings, 1 reply; 11+ messages in thread
From: Jianyu Zhan @ 2016-03-04 1:12 UTC (permalink / raw)
To: Darren Hart
Cc: LKML, Peter Zijlstra, Thomas Gleixner, dave, Andrew Morton,
    Ingo Molnar, Rasmus Villemoes, dvhart, Christian Borntraeger,
    Fengguang Wu, bigeasy

On Fri, Mar 4, 2016 at 1:05 AM, Darren Hart <dvhart@infradead.org> wrote:
> I thought I provided a corrected comment block.... maybe I didn't. We have been
> working on improving the futex documentation, so we're paying close attention
> to terminology as well as grammar. This one needs a couple minor tweaks.
> I suggest:
>
>         /*
>          * Use READ_ONCE to forbid the compiler from reloading q->lock_ptr and
>          * optimizing lock_ptr out of the logic below.
>          */
>
> The bit about q->lock_ptr possibly changing is already covered by the large
> comment block below the spin_lock(lock_ptr) call.

The large comment block explains why the retry logic is required. To meet
that semantic requirement, the READ_ONCE() is needed to keep the compiler
from optimizing the code into a double load. So I think the comment above
should explain this tricky part:

        /* Use READ_ONCE to forbid the compiler from reloading q->lock_ptr in spin_lock() */

As for preventing the compiler from optimizing lock_ptr out of the retry
code block, I have consulted Paul McKenney, and he suggests that one more
READ_ONCE() should be added here:

        if (unlikely(lock_ptr != READ_ONCE(q->lock_ptr))) {   <------------------------------
                spin_unlock(lock_ptr);
                goto retry;
        }

I think these are two separate problems; should they be split into two patches?

Regards,
Jianyu Zhan
* Re: [PATCH] futex: replace bare barrier() with more lightweight READ_ONCE()
  2016-03-04 1:12   ` Jianyu Zhan
@ 2016-03-04 21:05     ` Darren Hart
  2016-03-04 21:57       ` Paul E. McKenney
  2016-03-07 1:32       ` [PATCH v3] " Jianyu Zhan
  0 siblings, 2 replies; 11+ messages in thread
From: Darren Hart @ 2016-03-04 21:05 UTC (permalink / raw)
To: Jianyu Zhan, Paul McKenney
Cc: LKML, Peter Zijlstra, Thomas Gleixner, dave, Andrew Morton,
    Ingo Molnar, Rasmus Villemoes, dvhart, Christian Borntraeger,
    Fengguang Wu, bigeasy

On Fri, Mar 04, 2016 at 09:12:31AM +0800, Jianyu Zhan wrote:
> The large comment block explains why the retry logic is required. To meet
> that semantic requirement, the READ_ONCE() is needed to keep the compiler
> from optimizing the code into a double load. So I think the comment above
> should explain this tricky part.

Fair point. Consider:

	/*
	 * q->lock_ptr can change between this read and the following spin_lock.
	 * Use READ_ONCE to forbid the compiler from reloading q->lock_ptr and
	 * optimizing lock_ptr out of the logic below.
	 */

> As for preventing the compiler from optimizing lock_ptr out of the retry
> code block, I have consulted Paul McKenney, and he suggests that one more
> READ_ONCE() should be added here:

Let's keep this discussion together so we have a record of the
justification.

+Paul McKenney

Paul, my understanding was that spin_lock was a CPU memory barrier,
which in turn is an implicit compiler barrier (aka barrier()), of which
READ_ONCE is described as a weaker form. Reviewing this, I realize the
scope of barrier() wasn't clear to me. It seems while barrier() ensures
ordering, it does not offer the same guarantee regarding reloading that
READ_ONCE offers. So READ_ONCE is not strictly a weaker form of
barrier() as I had gathered from a spotty reading of
memory-barriers.txt, but it also offers guarantees regarding memory
references that barrier() does not.

Correct?

>         if (unlikely(lock_ptr != READ_ONCE(q->lock_ptr))) {
>                 spin_unlock(lock_ptr);
>                 goto retry;
>         }
>
> I think these are two separate problems; should they be split into two patches?

Yes (pending results of the conversation above).

--
Darren Hart
Intel Open Source Technology Center
* Re: [PATCH] futex: replace bare barrier() with more lightweight READ_ONCE()
  2016-03-04 21:05     ` Darren Hart
@ 2016-03-04 21:57       ` Paul E. McKenney
  2016-03-04 22:38         ` Darren Hart
  0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2016-03-04 21:57 UTC (permalink / raw)
To: Darren Hart
Cc: Jianyu Zhan, LKML, Peter Zijlstra, Thomas Gleixner, dave,
    Andrew Morton, Ingo Molnar, Rasmus Villemoes, dvhart,
    Christian Borntraeger, Fengguang Wu, bigeasy

On Fri, Mar 04, 2016 at 01:05:24PM -0800, Darren Hart wrote:
> Paul, my understanding was that spin_lock was a CPU memory barrier,
> which in turn is an implicit compiler barrier (aka barrier()), of which
> READ_ONCE is described as a weaker form. Reviewing this, I realize the
> scope of barrier() wasn't clear to me. It seems while barrier() ensures
> ordering, it does not offer the same guarantee regarding reloading that
> READ_ONCE offers. So READ_ONCE is not strictly a weaker form of
> barrier() as I had gathered from a spotty reading of
> memory-barriers.txt, but it also offers guarantees regarding memory
> references that barrier() does not.
>
> Correct?

If q->lock_ptr is never changed except under that lock, then there is
indeed no reason for the ACCESS_ONCE().

So, is q->lock_ptr ever changed while the lock is -not- held?  If so,
I suggest that you put an ACCESS_ONCE() there.

							Thanx, Paul
* Re: [PATCH] futex: replace bare barrier() with more lightweight READ_ONCE()
  2016-03-04 21:57       ` Paul E. McKenney
@ 2016-03-04 22:38         ` Darren Hart
  2016-03-04 22:45           ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Darren Hart @ 2016-03-04 22:38 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Jianyu Zhan, LKML, Peter Zijlstra, Thomas Gleixner, dave,
    Andrew Morton, Ingo Molnar, Rasmus Villemoes, dvhart,
    Christian Borntraeger, Fengguang Wu, bigeasy

On Fri, Mar 04, 2016 at 01:57:20PM -0800, Paul McKenney wrote:
> If q->lock_ptr is never changed except under that lock, then there is
> indeed no reason for the ACCESS_ONCE().

The only location where a q->lock_ptr is updated without that lock being held
is in queue_lock(). This is safe as the futex_q is not yet queued onto an hb
until after the lock is held (so unqueue_me() cannot race with queue_lock()).

> So, is q->lock_ptr ever changed while the lock is -not- held?  If so,
> I suggest that you put an ACCESS_ONCE() there.

It is not.

--
Darren Hart
Intel Open Source Technology Center
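For reference, queue_lock() at that time looked roughly like the sketch below.
This is an abridged illustration rather than a verbatim copy of kernel/futex.c
(for instance, the hb_waiters accounting is omitted); it shows why the unlocked
store to q->lock_ptr that Darren mentions is harmless.

	/*
	 * Abridged sketch of queue_lock(); details intentionally omitted.
	 */
	static inline struct futex_hash_bucket *queue_lock(struct futex_q *q)
	{
		struct futex_hash_bucket *hb;

		hb = hash_futex(&q->key);

		/*
		 * q->lock_ptr is written here without hb->lock held, but q has
		 * not been linked into the hash bucket yet (queue_me() does
		 * that later, under hb->lock), so unqueue_me() cannot yet
		 * observe this store.
		 */
		q->lock_ptr = &hb->lock;

		spin_lock(&hb->lock);
		return hb;
	}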
* Re: [PATCH] futex: replace bare barrier() with more lightweight READ_ONCE()
  2016-03-04 22:38         ` Darren Hart
@ 2016-03-04 22:45           ` Paul E. McKenney
  2016-03-04 22:53             ` Darren Hart
  0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2016-03-04 22:45 UTC (permalink / raw)
To: Darren Hart
Cc: Jianyu Zhan, LKML, Peter Zijlstra, Thomas Gleixner, dave,
    Andrew Morton, Ingo Molnar, Rasmus Villemoes, dvhart,
    Christian Borntraeger, Fengguang Wu, bigeasy

On Fri, Mar 04, 2016 at 02:38:01PM -0800, Darren Hart wrote:
> The only location where a q->lock_ptr is updated without that lock being held
> is in queue_lock(). This is safe as the futex_q is not yet queued onto an hb
> until after the lock is held (so unqueue_me() cannot race with queue_lock()).
>
> > So, is q->lock_ptr ever changed while the lock is -not- held?  If so,
> > I suggest that you put an ACCESS_ONCE() there.
>
> It is not.

If I followed that correctly, then I agree that you don't need an
ACCESS_ONCE() in this case.

							Thanx, Paul
* Re: [PATCH] futex: replace bare barrier() with more lightweight READ_ONCE()
  2016-03-04 22:45           ` Paul E. McKenney
@ 2016-03-04 22:53             ` Darren Hart
  0 siblings, 0 replies; 11+ messages in thread
From: Darren Hart @ 2016-03-04 22:53 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Jianyu Zhan, LKML, Peter Zijlstra, Thomas Gleixner, dave,
    Andrew Morton, Ingo Molnar, Rasmus Villemoes, dvhart,
    Christian Borntraeger, Fengguang Wu, bigeasy

On Fri, Mar 04, 2016 at 02:45:11PM -0800, Paul McKenney wrote:
> If I followed that correctly, then I agree that you don't need an
> ACCESS_ONCE() in this case.

Thanks for offering your time and expertise Paul. It's a major effort every
time I open memory-barriers.txt :-)

--
Darren Hart
Intel Open Source Technology Center
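To summarize the conclusion of this exchange in code form, the two reads of
q->lock_ptr in unqueue_me() end up being treated differently. The snippet below
is an annotated sketch only; the comments are added here for illustration and
are not part of any posted patch.

	/* In the common case we don't take the spinlock, which is nice. */
retry:
	/*
	 * Racy read: wakers and requeues may change q->lock_ptr at any point
	 * here, so READ_ONCE() is needed to get one stable load that the
	 * compiler cannot split back into two reads of q->lock_ptr.
	 */
	lock_ptr = READ_ONCE(q->lock_ptr);
	if (lock_ptr != NULL) {
		spin_lock(lock_ptr);
		/*
		 * Recheck: per the conclusion above, q->lock_ptr is only
		 * updated under the relevant hash bucket lock once the
		 * futex_q is queued, so a plain load is sufficient for this
		 * comparison and no second READ_ONCE() is required.
		 */
		if (unlikely(lock_ptr != q->lock_ptr)) {
			spin_unlock(lock_ptr);
			goto retry;
		}
		/* ... proceed to unqueue q from the hash bucket list ... */
	}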
* [PATCH v3] futex: replace bare barrier() with more lightweight READ_ONCE()
  2016-03-04 21:05     ` Darren Hart
  2016-03-04 21:57       ` Paul E. McKenney
@ 2016-03-07 1:32       ` [PATCH v3] " Jianyu Zhan
  2016-03-08 11:26         ` Darren Hart
  2016-03-08 16:09         ` [tip:locking/core] futex: Replace barrier() in unqueue_me() with READ_ONCE() tip-bot for Jianyu Zhan
  1 sibling, 2 replies; 11+ messages in thread
From: Jianyu Zhan @ 2016-03-07 1:32 UTC (permalink / raw)
To: linux-kernel, peterz, tglx, dave, akpm, mingo, linux, dvhart,
    borntraeger, fengguang.wu, bigeasy
Cc: nasa4836

Commit e91467ecd1ef ("bug in futex unqueue_me") introduced a barrier()
in unqueue_me() to address the problem below.

The scenario is like this:

====================
original code:

retry:
	lock_ptr = q->lock_ptr;
	if (lock_ptr != 0) {
		spin_lock(lock_ptr)
		if (unlikely(lock_ptr != q->lock_ptr)) {
			spin_unlock(lock_ptr);
			goto retry;
		}
		...
	}

====================
It was observed that the compiler generates code that is equivalent to:

retry:
	if (q->lock_ptr != 0) {
		spin_lock(q->lock_ptr)
		if (unlikely(lock_ptr != q->lock_ptr)) {
			spin_unlock(lock_ptr);
			goto retry;
		}
		...
	}

Since q->lock_ptr might change between the test for non-NULL and spin_lock(),
the double load will cause trouble, so that commit added a barrier() to
prevent it.

This patch replaces this bare barrier() with a READ_ONCE().

The reasons are:

1) READ_ONCE() is a weaker form of barrier() that affects only the specific
   accesses, while barrier() is a general compiler-level memory barrier.
   READ_ONCE() was not available at the time that patch was written.

2) READ_ONCE() is more informative by its very name, while a bare barrier()
   without a comment leads to quite a bit of perplexity.

The assembly code before (barrier version) and after this patch (READ_ONCE
version) is the same:

====================
Before (barrier version):

unqueue_me():
linux/kernel/futex.c:1930
1df6: 4c 8b bd 28 ff ff ff	mov    -0xd8(%rbp),%r15
linux/kernel/futex.c:1932
1dfd: 4d 85 ff		test   %r15,%r15
1e00: 0f 84 5c 01 00 00	je     1f62 <futex_wait+0x292>
spin_lock():
linux/include/linux/spinlock.h:302
1e06: 4c 89 ff		mov    %r15,%rdi
1e09: e8 00 00 00 00		callq  1e0e <futex_wait+0x13e>

====================
After (READ_ONCE version):

__read_once_size():
linux/include/linux/compiler.h:218
1df6: 4c 8b bd 28 ff ff ff	mov    -0xd8(%rbp),%r15
unqueue_me():
linux/kernel/futex.c:1935
1dfd: 4d 85 ff		test   %r15,%r15
1e00: 0f 84 5c 01 00 00	je     1f62 <futex_wait+0x292>
spin_lock():
linux/include/linux/spinlock.h:302
1e06: 4c 89 ff		mov    %r15,%rdi
1e09: e8 00 00 00 00		callq  1e0e <futex_wait+0x13e>

Code size is also the same.

Many thanks to Darren Hart <dvhart@infradead.org> for reviewing and
suggestions.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
---
 kernel/futex.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 5d6ce64..25dbfed 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1927,8 +1927,12 @@ static int unqueue_me(struct futex_q *q)
 
 	/* In the common case we don't take the spinlock, which is nice. */
 retry:
-	lock_ptr = q->lock_ptr;
-	barrier();
+	/*
+	 * q->lock_ptr can change between this read and the following spin_lock.
+	 * Use READ_ONCE to forbid the compiler from reloading q->lock_ptr and
+	 * optimizing lock_ptr out of the logic below.
+	 */
+	lock_ptr = READ_ONCE(q->lock_ptr);
 	if (lock_ptr != NULL) {
 		spin_lock(lock_ptr);
 		/*
--
2.4.3
* Re: [PATCH v3] futex: replace bare barrier() with more lightweight READ_ONCE()
  2016-03-07 1:32       ` [PATCH v3] " Jianyu Zhan
@ 2016-03-08 11:26         ` Darren Hart
  0 siblings, 0 replies; 11+ messages in thread
From: Darren Hart @ 2016-03-08 11:26 UTC (permalink / raw)
To: Jianyu Zhan
Cc: linux-kernel, peterz, tglx, dave, akpm, mingo, linux, dvhart,
    borntraeger, fengguang.wu, bigeasy

On Mon, Mar 07, 2016 at 09:32:24AM +0800, Jianyu Zhan wrote:
> Commit e91467ecd1ef ("bug in futex unqueue_me") introduced a barrier()
> in unqueue_me() to address the problem below.
[...]
> Many thanks to Darren Hart <dvhart@infradead.org> for reviewing and
> suggestions.
>
> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
> Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>

Thanks for sticking with this Jianyu.

Reviewed-by: Darren Hart <dvhart@linux.intel.com>

--
Darren Hart
Intel Open Source Technology Center
* [tip:locking/core] futex: Replace barrier() in unqueue_me() with READ_ONCE()
  2016-03-07 1:32       ` [PATCH v3] " Jianyu Zhan
  2016-03-08 11:26         ` Darren Hart
@ 2016-03-08 16:09         ` tip-bot for Jianyu Zhan
  1 sibling, 0 replies; 11+ messages in thread
From: tip-bot for Jianyu Zhan @ 2016-03-08 16:09 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, nasa4836, borntraeger, tglx, hpa, dvhart, mingo

Commit-ID:  29b75eb2d56a714190a93d7be4525e617591077a
Gitweb:     http://git.kernel.org/tip/29b75eb2d56a714190a93d7be4525e617591077a
Author:     Jianyu Zhan <nasa4836@gmail.com>
AuthorDate: Mon, 7 Mar 2016 09:32:24 +0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 8 Mar 2016 17:04:02 +0100

futex: Replace barrier() in unqueue_me() with READ_ONCE()

Commit e91467ecd1ef ("bug in futex unqueue_me") introduced a barrier() in
unqueue_me() to prevent the compiler from rereading the lock pointer which
might change after a check for NULL.

Replace the barrier() with a READ_ONCE() for the following reasons:

1) READ_ONCE() is a weaker form of barrier() that affects only the specific
   load operation, while barrier() is a general compiler level memory barrier.
   READ_ONCE() was not available at the time when the barrier was added.

2) Aside of that READ_ONCE() is descriptive and self explanatory while a
   barrier without comment is not clear to the casual reader.

No functional change.

[ tglx: Massaged changelog ]

Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Darren Hart <dvhart@linux.intel.com>
Cc: dave@stgolabs.net
Cc: peterz@infradead.org
Cc: linux@rasmusvillemoes.dk
Cc: akpm@linux-foundation.org
Cc: fengguang.wu@intel.com
Cc: bigeasy@linutronix.de
Link: http://lkml.kernel.org/r/1457314344-5685-1-git-send-email-nasa4836@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/futex.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index bae542e..a5d2e74 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2010,8 +2010,12 @@ static int unqueue_me(struct futex_q *q)
 
 	/* In the common case we don't take the spinlock, which is nice. */
 retry:
-	lock_ptr = q->lock_ptr;
-	barrier();
+	/*
+	 * q->lock_ptr can change between this read and the following spin_lock.
+	 * Use READ_ONCE to forbid the compiler from reloading q->lock_ptr and
+	 * optimizing lock_ptr out of the logic below.
+	 */
+	lock_ptr = READ_ONCE(q->lock_ptr);
 	if (lock_ptr != NULL) {
 		spin_lock(lock_ptr);
 		/*