Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Will Deacon <will.deacon@arm.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: Peter Zijlstra <peterz@infradead.org>,
	"Reshetova, Elena" <elena.reshetova@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"keescook@chromium.org" <keescook@chromium.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"ishkamiel@gmail.com" <ishkamiel@gmail.com>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>,
	parri.andrea@gmail.com, boqun.feng@gmail.com,
	dhowells@redhat.com, david@fromorbit.com
Subject: Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
Date: Thu, 2 Nov 2017 17:16:44 +0000	[thread overview]
Message-ID: <20171102171644.GD595@arm.com> (raw)
In-Reply-To: <Pine.LNX.4.44L0.1711021245570.1277-100000@iolanthe.rowland.org>

On Thu, Nov 02, 2017 at 01:08:52PM -0400, Alan Stern wrote:
> On Thu, 2 Nov 2017, Peter Zijlstra wrote:
> 
> > On Thu, Nov 02, 2017 at 11:40:35AM -0400, Alan Stern wrote:
> > > On Thu, 2 Nov 2017, Peter Zijlstra wrote:
> > > 
> > > > > Lock functions such as refcount_dec_and_lock() &
> > > > > refcount_dec_and_mutex_lock() Provide exactly the same guarantees as
> > > > > they atomic counterparts. 
> > > > 
> > > > Nope. The atomic_dec_and_lock() provides smp_mb() while
> > > > refcount_dec_and_lock() merely orders all prior load/store's against all
> > > > later load/store's.
> > > 
> > > In fact there is no guaranteed ordering when refcount_dec_and_lock()  
> > > returns false; 
> > 
> > It should provide a release:
> > 
> >  - if !=1, dec_not_one will provide release
> >  - if ==1, dec_not_one will no-op, but then we'll acquire the lock and
> >    dec_and_test will provide the release, even if the test fails and we
> >    unlock again it should still dec.
> > 
> > The one exception is when the counter is saturated, but in that case
> > we'll never free the object and the ordering is moot in any case.
> 
> Also if the counter is 0, but that will never happen if the 
> refcounting is correct.
> 
> > > it provides ordering only if the return value is true.  
> > > In which case it provides acquire ordering (thanks to the spin_lock),
> > > and both release ordering and a control dependency (thanks to the
> > > refcount_dec_and_test).
> > > 
> > > > The difference is subtle and involves at least 3 CPUs. I can't seem to
> > > > write up anything simple, keeps turning into monsters :/ Will, Paul,
> > > > have you got anything simple around?
> > > 
> > > The combination of acquire + release is not the same as smp_mb, because 
> > 
> > acquire+release is nothing, its release+acquire that I meant which
> > should order things locally, but now that you've got me looking at it
> > again, we don't in fact do that.
> > 
> > So refcount_dec_and_lock() will provide a release, irrespective of the
> > return value (assuming we're not saturated). If it returns true, it also
> > does an acquire for the lock.
> > 
> > But combined they're acquire+release, which is unfortunate.. it means
> > the lock section and the refcount stuff overlaps, but I don't suppose
> > that's actually a problem. Need to consider more.
> 
> Right.  To address your point: release + acquire isn't the same as a
> full barrier either.  The SB pattern illustrates the difference:
> 
> 	P0		P1
> 	Write x=1	Write y=1
> 	Release a	smp_mb
> 	Acquire b	Read x=0
> 	Read y=0
> 
> This would not be allowed if the release + acquire sequence was 
> replaced by smp_mb.  But as it stands, this is allowed because nothing 
> prevents the CPU from interchanging the order of the release and the 
> acquire -- and then you're back to the acquire + release case.
> 
> However, there is one circumstance where this interchange isn't 
> allowed: when the release and acquire access the same memory 
> location.  Thus:
> 
> 	P0(int *x, int *y, int *a)
> 	{
> 		int r0;
> 
> 		WRITE_ONCE(*x, 1);
> 		smp_store_release(a, 1);
> 		smp_load_acquire(a);
> 		r0 = READ_ONCE(*y);
> 	}
> 
> 	P1(int *x, int *y)
> 	{
> 		int r1;
> 
> 		WRITE_ONCE(*y, 1);
> 		smp_mb();
> 		r1 = READ_ONCE(*x);
> 	}
> 
> 	exists (0:r0=0 /\ 1:r1=0)
> 
> This is forbidden.  It would remain forbidden even if the smp_mb in P1 
> were replaced by a similar release/acquire pair for the same memory 
> location.

Isn't this allowed on x86 mapping smp_mb() to mfence, store-release to plain
store and load-acquire to plain load? All we're saying is that you can forward
from a release to an acquire, which is fine for RCpc semantics.

e.g.

X86 SB+mfence+po-rfi-po
"MFencedWR Fre PodWW Rfi PodRR Fre"
Generator=diyone7 (version 7.46+3)
Prefetch=0:x=F,0:y=T,1:y=F,1:x=T
Com=Fr Fr
Orig=MFencedWR Fre PodWW Rfi PodRR Fre
{
}
 P0          | P1          ;
 MOV [x],$1  | MOV [y],$1  ;
 MFENCE      | MOV [z],$1  ;
 MOV EAX,[y] | MOV EAX,[z] ;
             | MOV EBX,[x] ;
exists
(0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0)

which herd says is allowed:

Test SB+mfence+po-rfi-po Allowed
States 4
0:EAX=0; 1:EAX=1; 1:EBX=0;
0:EAX=0; 1:EAX=1; 1:EBX=1;
0:EAX=1; 1:EAX=1; 1:EBX=0;
0:EAX=1; 1:EAX=1; 1:EBX=1;
Ok
Witnesses
Positive: 1 Negative: 3
Condition exists (0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0)
Observation SB+mfence+po-rfi-po Sometimes 1 3
Time SB+mfence+po-rfi-po 0.00
Hash=0f983e2d7579e5c04c332f9ac620c31f

and I can reproduce using litmus to actually run it on my x86 box:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Results for SB+mfence+po-rfi-po.litmus %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
X86 SB+mfence+po-rfi-po
"MFencedWR Fre PodWW Rfi PodRR Fre"

{}

 P0          | P1          ;
 MOV [x],$1  | MOV [y],$1  ;
 MFENCE      | MOV [z],$1  ;
 MOV EAX,[y] | MOV EAX,[z] ;
             | MOV EBX,[x] ;

exists (0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0)
Generated assembler
#START _litmus_P1
	movl $1,(%r8,%rcx)
	movl $1,(%r9,%rcx)
	movl (%r9,%rcx),%eax
	movl (%rdi,%rcx),%edx
#START _litmus_P0
	movl $1,(%rdx,%rcx)
	mfence
	movl (%rdi,%rcx),%eax

Test SB+mfence+po-rfi-po Allowed
Histogram (4 states)
8     *>0:EAX=0; 1:EAX=1; 1:EBX=0;
1999851:>0:EAX=1; 1:EAX=1; 1:EBX=0;
1999549:>0:EAX=0; 1:EAX=1; 1:EBX=1;
592   :>0:EAX=1; 1:EAX=1; 1:EBX=1;
Ok

Witnesses
Positive: 8, Negative: 3999992
Condition exists (0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0) is validated
Hash=0f983e2d7579e5c04c332f9ac620c31f
Generator=diyone7 (version 7.46+3)
Com=Fr Fr
Orig=MFencedWR Fre PodWW Rfi PodRR Fre
Observation SB+mfence+po-rfi-po Sometimes 8 3999992
Time SB+mfence+po-rfi-po 0.17

Will

next prev parent reply	other threads:[~2017-11-02 17:16 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-23 11:09 [PATCH] refcount: provide same memory ordering guarantees as in atomic_t Elena Reshetova
2017-10-23 13:12 ` Peter Zijlstra
2017-10-27  6:49   ` Reshetova, Elena
2017-10-27 13:56     ` Peter Zijlstra
2017-11-02 11:04       ` Reshetova, Elena
2017-11-02 13:57         ` Peter Zijlstra
2017-11-02 15:40           ` Alan Stern
2017-11-02 16:02             ` Peter Zijlstra
2017-11-02 16:45               ` Peter Zijlstra
2017-11-02 17:08               ` Alan Stern
2017-11-02 17:16                 ` Will Deacon [this message]
2017-11-02 17:26                   ` Peter Zijlstra
2017-11-02 20:21                   ` Alan Stern
2017-11-15 18:05                     ` Will Deacon
2017-11-15 19:15                       ` Alan Stern
2017-11-15 20:03                         ` Peter Zijlstra
2017-11-15 20:22                           ` Alan Stern
2017-11-16  8:46                             ` Peter Zijlstra
2017-11-15 21:01                           ` Andrea Parri
2017-11-16  8:58                             ` Peter Zijlstra
2017-11-16 10:00                               ` Andrea Parri
2017-11-02 17:45                 ` Andrea Parri
2017-11-02 20:28                   ` Alan Stern
2017-11-03 11:55           ` Reshetova, Elena
2017-11-13  9:09           ` Reshetova, Elena
2017-11-13 13:19             ` Paul E. McKenney
2017-11-13 16:01               ` Reshetova, Elena
2017-11-13 16:26                 ` Paul E. McKenney
2017-11-14 11:23                   ` Reshetova, Elena
2017-11-14 17:24                     ` Paul E. McKenney
2017-11-16 13:44             ` Michal Hocko
2017-11-16 15:29               ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171102171644.GD595@arm.com \
    --to=will.deacon@arm.com \
    --cc=boqun.feng@gmail.com \
    --cc=david@fromorbit.com \
    --cc=dhowells@redhat.com \
    --cc=elena.reshetova@intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=ishkamiel@gmail.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=parri.andrea@gmail.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=stern@rowland.harvard.edu \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.