All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bruce Fields <bfields@fieldses.org>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>,
	Jeff Layton <jlayton@redhat.com>,
	Al Viro <viro@ZenIV.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Matthew Wilcox <matthew@wil.cx>
Subject: Re: memory barriers in flock (Re: [PATCH v3] locks: close potential race between setlease and open)
Date: Mon, 19 Aug 2013 09:31:49 -0400	[thread overview]
Message-ID: <20130819133148.GF6726@fieldses.org> (raw)
In-Reply-To: <20130815213106.GP29406@linux.vnet.ibm.com>

On Thu, Aug 15, 2013 at 02:31:06PM -0700, Paul E. McKenney wrote:
> On Thu, Aug 15, 2013 at 09:44:25PM +0100, David Howells wrote:
> > Bruce Fields <bfields@fieldses.org> wrote:
> > 
> > (Adding Paul McKenney who's good at this stuff)
> 
> Well, I should be able to provide a more refined form of confusion...
> 
> > > > v2:
> > > > - fix potential double-free of lease if second check finds conflict
> > > > - add smp_mb's to ensure that other CPUs see i_flock changes
> > > > 
> > > > v3:
> > > > - remove smp_mb calls. Partial ordering is unlikely to help here.
> > > 
> > > Forgive me here, I still don't understand.  So to simplify massively,
> > > the situation looks like:
> > >
> > > 	setlease		open
> > > 	------------		------
> > > 
> > > 	atomic_read		atomic_inc
> > > 	write i_flock		read i_flock
> > > 	atomic_read
> > 
> > Are the three atomic ops reading the same value?  If so, you can have smp_mb()
> > calls here:
> > 
> > 	atomic_read		atomic_inc
> > 				smp_mb()
> > 	write i_flock		read i_flock
> > 	smp_mb()
> >  	atomic_read
> 
> Yes, you need memory barrier of some form or another.  You can use
> smp_mb__after_atomic_inc() after the atomic_inc, which is lighter
> weight on x86.  Note that atomic_inc() does not return a value, and
> thus does not guarantee ordering of any sort.
> 
> > I *think* that memory accesses in one place need to be reverse-ordered wrt to
> > those in the other place, so:
> > 
> > 	atomic_read		atomic_inc
> > 	smp_mb()		smp_mb()
> > 	write i_flock		read i_flock
> >  	atomic_read
> > 
> > doesn't achieve anything.
> 
> The only thing that this arrangement could achive would be if the
> atomic_ operations are all accessing the same variable, in which case
> if the first CPU's last atomic_read preceded the atomic_inc, then
> we would know that the first CPU's first atomic_inc also preceded
> that same atomic_inc.
> 
> Let's add values and look at it a slightly different way:
> 
> 	CPU 0			CPU 1
> 
> 	r1 = atomic_read(&x);	atomic_inc(&x);
> 	smp_mb();		smp_mb__after_atomic_inc()
> 	i_flock = 1;		r3 = i_flock;
> 	r2 = atomic_read(&x);
> 
> Assume that the initial values of x and i_flock are zero.  (This might not
> make sense in the code, but you can substitute different values without
> changing the way this works.)
> 
> Then if r2==0, we know that r1==0 as well.  Of course, if there were other
> atomic_inc(&x) operations in flight, it might be that r1!=r2, but we
> would know that r1 got an earlier value of x than r2.  If x is 64 bits,
> then we know that r1<r2.
> 
> But as Dave points out, the placement of the memory barriers prevents
> us from drawing any similar conclusions about the accesses to i_flock.
> 
> So let's take a similar look at the initial arrangement:
> 
> 	CPU 0			CPU 1
> 
> 	r1 = atomic_read(&x);	atomic_inc(&x);
> 	i_flock = 1;		smp_mb__after_atomic_inc()
> 	smp_mb();		r3 = i_flock;
> 	r2 = atomic_read(&x);
> 
> Again, assume that the initial values of x and of i_flock are zero.
> Assume also that this is the only code that executes.  Then if r3==0, we
> know that r2>=1.  In fact, if r3==0, we know that r2==1.  The load from
> i_flock() had to have come before the store, and so the memory barriers
> guarantee that the load into r2 must see the results of the atomic_inc().
> 
> Similarly, if r2==0, we know that CPU 0's load into r2 came before
> CPU 1's atomic_inc().  The memory barriers then guarantee that CPU 0's
> store into i_flock precedes CPU 1's load from i_flock, so that r3==1.
> Cache coherence gives us another fact.  Both of CPU 0's atomic_read()s
> do volatile loads from the same variable, so they must happen in order
> (if they were not volatile, the compiler would be within its rights
> to interchange them).

OK, but the smp_mb() already guaranteed this, right?

(How would our results be different if we replaced the above by

	r1 = x;		x++;
	i_flock=1;	smp_mb();
	smb_mb();	r3=i_flock;
	r2 = x;

?

Could the compiler could for example assume that x doesn't change at all
between the two assignments and not do the second read at all?  But if
it's allowed to do that then I'm not sure I understand what a compiler
barrier is.)

> Therefore, because CPU 0's atomic_read() into
> r2 precedes CPU 1's atomic_inc() -- recall that r2==0 -- we know that
> CPU 0's atomic_read() into r1 must also precede CPU 1's atomic_inc(),
> meaning that we know r1==0.
> 
> Reasoning about memory barriers takes this same form.  If something after
> memory barrier A can be proven to have happened before something that
> came before memory barrier B, then everything before memory barrier A
> must happen before everything after memory barrier B.  You carry out
> the proof by looking at loads and stores to a given variable.
> 
> This is a key point.  Memory barriers do nothing except for conditionally
> enforcing ordering.  They are not guaranteed to commit values to memory,
> to caches, or much of anything else.  Again, if you know that something
> following memory barrier A happened before something preceding memory
> barrier B, then you know that memory access preceding memory barrier A
> will be seen by a corresponding memory access following memory barrier B.
> 
> > > And we want to be sure that either the setlease caller sees the result
> > > of the atomic_inc, or the opener sees the result of the write to
> > > i_flock.
> > > 
> > > As an example, suppose the above steps happen in the order:
> > > 
> > > 	atomic_read [A]
> > > 	write i_flock [B]
> > > 	atomic_read [C]
> > > 				atomic_inc [X]
> > > 				read i_flock [Y]
> > 
> > (I've labelled the operations for convenience)
> > 
> > > How do we know that the read of i_flock [Y] at the last step reflects the
> > > latest value of i_flock?  For example, couldn't the write still be stuck in
> > > first CPU's cache?
> > 
> > Putting in memory barriers doesn't help here.  If A, B and C are all performed
> > and committed to memory by the time X happens, then Y will see B, but C will
> > not see X.
> 
> Indeed, you don't get both of these at once, except by accident.
> 
> > The thing to remember is that smp_mb() is not a commit operation: it doesn't
> > cause a modification to be committed to memory.  The reason you use it is to
> > make sure the CPU actually does preceding memory ops - eg. makes the
> > modification in question - before it goes and does any following memory ops.
> > 
> > Linux requires the system to be cache-coherent, so if the write is actually
> > performed by a CPU then the result will be obtained by any other CPU, no
> > matter whether it's still lingering in the first CPU's caches or whether it's
> > been passed on.
> 
> Or some later result, but yes.  Again, these accesses must be volatile
> (as in either atomic_read() or ACCESS_ONCE()), or the compiler is within
> its rights to do all sorts of mischief.

Again I'm confused here by the statement in memory-barriers.txt that
"All memory barriers except the data dependency barriers imply a
compiler barrier."  Shouldn't that mean that the compiler is forbidden
any mischief that would make accesses appear to be misordered with
respect to the barriers?  If a compiler barrier doesn't mean at least
that, then I can't figure out what's left that it could mean.

> > -*-
> > 
> > However, I could be wrong.  Memory barriers are mind-bending to try and think
> > through, especially when it's the operations being ordered are R-W vs R-W
> > rather than having W-W on at least one side.
> 
> There are 16 combinations, all of which do something.  Some cases require
> an external store to prove anything, though.  For example:
> 
> 	CPU 0: r1 = ACCESS_ONCE(x); smp_mb(); r2 = ACCESS_ONCE(y);
> 	CPU 1: r3 = ACCESS_ONCE(y); smp_mb(); r4 = ACCESS_ONCE(x);
> 	CPU 2: ACCESS_ONCE(x) = 1;
> 	CPU 3: ACCESS_ONCE(y) = 1;
> 
> Here, we know that if r2==0&&r3==1, then it is impossible to also have
> r4==0&&r1==1.  Similarly, if r4==0&&r1==1, then it is impossible to also
> have r2==0&&r3==1.  If there were no stores, then of course the reads
> could not tell you anything about the ordering.
> 
> > Hopefully Paul will be able to chime in
> 
> Careful what you wish for!  ;-)

Thanks so much for the explanations!

--b.

  parent reply	other threads:[~2013-08-19 13:31 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-08 19:07 [PATCH] locks: close potential race between setlease and open Jeff Layton
2013-08-14 12:11 ` [PATCH v3] " Jeff Layton
2013-08-15 19:32   ` Bruce Fields
2013-08-15 19:43     ` Jeff Layton
2013-08-15 20:30       ` Bruce Fields
2013-08-15 20:44     ` memory barriers in flock (Re: [PATCH v3] locks: close potential race between setlease and open) David Howells
2013-08-15 21:31       ` Paul E. McKenney
2013-08-16 12:09         ` Jeff Layton
2013-08-19 13:31         ` Bruce Fields [this message]
2013-08-16 14:49 ` [PATCH v4] locks: close potential race between setlease and open Jeff Layton
2013-08-19 13:34   ` Bruce Fields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130819133148.GF6726@fieldses.org \
    --to=bfields@fieldses.org \
    --cc=dhowells@redhat.com \
    --cc=jlayton@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=matthew@wil.cx \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.