public inbox for linux-arch@vger.kernel.org
From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: David Goldblatt <davidtgoldblatt@gmail.com>,
	mathieu.desnoyers@efficios.com,
	Florian Weimer <fweimer@redhat.com>,
	triegel@redhat.com, libc-alpha@sourceware.org,
	andrea.parri@amarulasolutions.com, will.deacon@arm.com,
	peterz@infradead.org, boqun.feng@gmail.com, npiggin@gmail.com,
	dhowells@redhat.com, j.alglave@ucl.ac.uk, luc.maranget@inria.fr,
	akiyks@gmail.com, dlustig@nvidia.com, linux-arch@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Linux: Implement membarrier function
Date: Tue, 11 Dec 2018 11:08:01 -0800
Message-ID: <20181211190801.GO4170@linux.ibm.com>
In-Reply-To: <Pine.LNX.4.44L0.1812111115020.1538-100000@iolanthe.rowland.org>

On Tue, Dec 11, 2018 at 11:21:15AM -0500, Alan Stern wrote:
> On Mon, 10 Dec 2018, Paul E. McKenney wrote:
> 
> > On Mon, Dec 10, 2018 at 11:22:31AM -0500, Alan Stern wrote:
> > > On Thu, 6 Dec 2018, Paul E. McKenney wrote:
> > > 
> > > > Hello, David,
> > > > 
> > > > I took a crack at extending LKMM to accommodate what I think would
> > > > support what you have in your paper.  Please see the very end of this
> > > > email for a patch against the "dev" branch of my -rcu tree.
> > > > 
> > > > This gives the expected result for the following three litmus tests,
> > > > but is probably deficient or otherwise misguided in other ways.  I have
> > > > added the LKMM maintainers on CC for their amusement.  ;-)
> > > > 
> > > > Thoughts?
> > > 
> > > Since sys_membarrier() provides a heavyweight barrier comparable to 
> > > synchronize_rcu(), the memory model should treat the two in the same 
> > > way.  That's what this patch does.
> > > 
> > > The corresponding critical section would be any region of code bounded
> > > by compiler barriers.  Since the LKMM doesn't currently handle plain
> > > accesses, the effect is the same as if a compiler barrier were present
> > > between each pair of instructions.  Basically, each instruction acts as
> > > its own critical section.  Therefore the patch below defines memb-rscsi
> > > as the trivial identity relation.  When plain accesses and compiler 
> > > barriers are added to the memory model, a different definition will be 
> > > needed.
> > > 
> > > This gives the correct results for the three C-Goldblat-memb-* litmus 
> > > tests in Paul's email.
> > 
> > Yow!!!
> > 
> > My first reaction was that this cannot possibly be correct because
> > sys_membarrier(), which is probably what we should call it, does not
> > wait for anything.  But your formulation has the corresponding readers
> > being "id", which as you say above is just a single event.
> > 
> > But what makes this work for the following litmus test?
> > 
> > ------------------------------------------------------------------------
> > 
> > C membrcu
> > 
> > {
> > }
> > 
> > P0(intptr_t *x0, intptr_t *x1)
> > {
> > 	WRITE_ONCE(*x0, 2);
> > 	smp_memb();
> > 	intptr_t r2 = READ_ONCE(*x1);
> > }
> > 
> > 
> > P1(intptr_t *x1, intptr_t *x2)
> > {
> > 	WRITE_ONCE(*x1, 2);
> > 	smp_memb();
> > 	intptr_t r2 = READ_ONCE(*x2);
> > }
> > 
> > 
> > P2(intptr_t *x2, intptr_t *x3)
> > {
> > 	WRITE_ONCE(*x2, 2);
> > 	smp_memb();
> > 	intptr_t r2 = READ_ONCE(*x3);
> > }
> > 
> > 
> > P3(intptr_t *x3, intptr_t *x4)
> > {
> > 	rcu_read_lock();
> > 	WRITE_ONCE(*x3, 2);
> > 	intptr_t r2 = READ_ONCE(*x4);
> > 	rcu_read_unlock();
> > }
> > 
> > 
> > P4(intptr_t *x4, intptr_t *x5)
> > {
> > 	rcu_read_lock();
> > 	WRITE_ONCE(*x4, 2);
> > 	intptr_t r2 = READ_ONCE(*x5);
> > 	rcu_read_unlock();
> > }
> > 
> > 
> > P5(intptr_t *x0, intptr_t *x5)
> > {
> > 	rcu_read_lock();
> > 	WRITE_ONCE(*x5, 2);
> > 	intptr_t r2 = READ_ONCE(*x0);
> > 	rcu_read_unlock();
> > }
> > 
> > exists
> > (5:r2=0 /\ 0:r2=0 /\ 1:r2=0 /\ 2:r2=0 /\ 3:r2=0 /\ 4:r2=0)
> > 
> > ------------------------------------------------------------------------
> > 
> > For this, herd gives "Never".  Of course, if I reverse the write and
> > read in any of P3(), P4(), or P5(), I get "Sometimes", which does make
> > sense.  But what is preserving the order between P3() and P4() and
> > between P4() and P5()?  I am not immediately seeing how the analogy
> > with RCU carries over to this case.
> 
> That isn't how it works.  Nothing preserves the orders you mentioned.
> It's more like: the order between P1 and P4 is preserved, as is the
> order between P0 and P5.  You'll see below...
> 
> (I readily agree that this result is not simple or obvious.  It took me
> quite a while to formulate the following analysis.)

For whatever it is worth, David Goldblatt agrees with you to at
least some extent.  I have sent him an inquiry.  ;-)

> To begin with, since there aren't any synchronize_rcu calls in the
> test, the rcu_read_lock and rcu_read_unlock calls do nothing.  They
> can be eliminated.

Agreed.  I was just being lazy.

> Also, I find the variable names "x0" - "x5" to be a little hard to
> work with.  If you don't mind, I'll replace them with "a" - "f".

Easy enough to translate, so have at it!

> Now, a little digression on how sys_membarrier works.  It starts by
> executing a full memory barrier.  Then it injects memory barriers into
> the instruction streams of all the other CPUs and waits for them all
> to complete.  Then it executes an ending memory barrier.
> 
> These barriers are ordered as described.  Therefore we have
> 
> 	mb0s < mb05 < mb0e,
> 	mb1s < mb14 < mb1e,  and
> 	mb2s < mb23 < mb2e,
> 
> where mb0s is the starting barrier of the sys_memb call on P0, mb05 is
> the barrier that it injects into P5, mb0e is the ending barrier of the
> call, and similarly for the other sys_memb calls.  The '<' signs mean
> that the thing on their left finishes before the thing on their right
> does.
> 
> Rewriting the litmus test in these terms gives:
> 
>         P0      P1      P2      P3      P4      P5
>         Wa=2    Wb=2    Wc=2    [mb23]  [mb14]  [mb05]
>         mb0s    mb1s    mb2s    Wd=2    We=2    Wf=2
>         mb0e    mb1e    mb2e    Re=0    Rf=0    Ra=0
>         Rb=0    Rc=0    Rd=0
> 
> Here the brackets in "[mb23]", "[mb14]", and "[mb05]" mean that the
> positions of these barriers in their respective threads' program
> orderings are undetermined; they need not come at the top as shown.
> 
> (Also, in case David is unfamiliar with it, the "Wa=2" notation is
> shorthand for "Write 2 to a" and "Rb=0" is short for "Read 0 from b".)
> 
> Finally, here are a few facts which may be well known and obvious, but
> I'll state them anyway:
> 
> 	A CPU cannot reorder instructions across a memory barrier.
> 	If x is po-after a barrier then x executes after the barrier
> 	is finished.
> 
> 	If a store is po-before a barrier then the store propagates
> 	to every CPU before the barrier finishes.
> 
> 	If a store propagates to some CPU before a load on that CPU
> 	reads from the same location, then the load will obtain the
> 	value from that store or a co-later store.  This implies that
> 	if a load obtains a value co-earlier than some store then the
> 	load must have executed before the store propagated to the
> 	load's CPU.
> 
> The proof consists of three main stages, each requiring three steps.
> Using the facts that b - f are all read as 0, I'll show that P1
> executes Rc before P3 executes Re, then that P0 executes Rb before P4
> executes Rf, and lastly that P5's Ra must obtain 2, not 0.  This will
> demonstrate that the litmus test is not allowed.
> 
> 1.	Suppose that mb23 ends up coming po-later than Wd in P3.
> 	Then we would have:
> 
> 		Wd propagates to P2 < mb23 < mb2e < Rd,
> 
> 	and so Rd would obtain 2, not 0.  Hence mb23 must come
> 	po-before Wd (as shown in the listing):  mb23 < Wd.
> 
> 2.	Since mb23 therefore occurs po-before Re and instructions
> 	cannot be reordered across barriers,  mb23 < Re.
> 
> 3.	Since Rc obtains 0, we must have:
> 
> 		Rc < Wc propagates to P1 < mb2s < mb23 < Re.
> 
> 	Thus Rc < Re.
> 
> 4.	Suppose that mb14 ends up coming po-later than We in P4.
> 	Then we would have:
> 
> 		We propagates to P3 < mb14 < mb1e < Rc < Re,
> 
> 	and so Re would obtain 2, not 0.  Hence mb14 must come
> 	po-before We (as shown in the listing):  mb14 < We.
> 
> 5.	Since mb14 therefore occurs po-before Rf and instructions
> 	cannot be reordered across barriers,  mb14 < Rf.
> 
> 6.	Since Rb obtains 0, we must have:
> 
> 		Rb < Wb propagates to P0 < mb1s < mb14 < Rf.
> 
> 	Thus Rb < Rf.
> 
> 7.	Suppose that mb05 ends up coming po-later than Wf in P5.
> 	Then we would have:
> 
> 		Wf propagates to P4 < mb05 < mb0e < Rb < Rf,
> 
> 	and so Rf would obtain 2, not 0.  Hence mb05 must come
> 	po-before Wf (as shown in the listing):  mb05 < Wf.
> 
> 8.	Since mb05 therefore occurs po-before Ra and instructions
> 	cannot be reordered across barriers,  mb05 < Ra.
> 
> 9.	Now we have:
> 
> 		Wa propagates to P5 < mb0s < mb05 < Ra,
> 
> 	and so Ra must obtain 2, not 0.  QED.

Like this, then, with maximal reordering of P3-P5's reads?

         P0      P1      P2      P3      P4      P5
         Wa=2
         mb0s
                                                 [mb05]
         mb0e                                    Ra=0
         Rb=0    Wb=2
                 mb1s
                                         [mb14]
                 mb1e                    Rf=0
                 Rc=0    Wc=2                    Wf=2
                         mb2s
                                 [mb23]
                         mb2e    Re=0
                         Rd=0            We=2
                                 Wd=2

But don't the sys_membarrier() calls affect everyone, especially given
the shared-variable communication?  If so, why wouldn't this more strict
variant hold?

         P0      P1      P2      P3      P4      P5
         Wa=2
         mb0s
                                 [mb05]  [mb05]  [mb05]
         mb0e
         Rb=0    Wb=2
                 mb1s
                                 [mb14]  [mb14]  [mb14]
                 mb1e
                 Rc=0    Wc=2
                         mb2s
                                 [mb23]  [mb23]  [mb23]
                         mb2e    Re=0    Rf=0    Ra=0
                         Rd=0            We=2    Wf=2
                                 Wd=2

In which case, wouldn't this cycle be forbidden even if it had only one
sys_membarrier() call?

Ah, but the IPIs are not necessarily synchronized across the CPUs,
so that the following could happen:

         P0      P1      P2      P3      P4      P5
         Wa=2
         mb0s
                                 [mb05]  [mb05]  [mb05]
         mb0e                                    Ra=0
         Rb=0    Wb=2
                 mb1s
                                 [mb14]  [mb14]
                                         Rf=0
                                                 Wf=2
                                                 [mb14]
                 mb1e
                 Rc=0    Wc=2
                         mb2s
                                 [mb23]
                                 Re=0
                                         We=2
                                         [mb23]  [mb23]
                         mb2e
                         Rd=0
                                 Wd=2

I guess in light of this post in 2001, I really don't have an excuse,
do I?  ;-)

	https://lists.gt.net/linux/kernel/223555

Or am I still missing something here?

							Thanx, Paul
