public inbox for linux-arch@vger.kernel.org
* mips octeon memory model questions
@ 2014-02-04 18:41 Peter Zijlstra
  2014-02-04 18:41 ` Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Peter Zijlstra @ 2014-02-04 18:41 UTC (permalink / raw)
  To: David Daney, Ralf Baechle
  Cc: linux-arch, linux-mips, linux-kernel, Paul McKenney, Will Deacon,
	Linus Torvalds

Hi all,

I have a number of questions regarding commit 6b07d38aaa520ce.

Given that the Octeon doesn't reorder reads, the following:

"      sync
       ll ...
       .
       .
       .
       sc ...
       .
       .
       sync

  The second SYNC was redundant, but harmless.  "

Still doesn't make sense, because if we need the first sync to stop
writes from being re-ordered with the ll-sc, we also need the second
sync to avoid the same.

Suppose:
   STORE a
   sync
   LL-SC b
   (not a sync)
   STORE c

What avoids this becoming visible as:

  a
  c
  b

?

Then there is:

"       syncw;syncw
        ll
        .
        .
        .
        sc
        .
        .

    Has identical semantics to the first sequence, but is much faster.
    The SYNCW orders the writes, and the SC will not complete successfully
    until the write is committed to the coherent memory system.  So at the
    end all preceeding writes have been committed.  Since Octeon does not
    do speculative reads, this functions as a full barrier."

Read the TRANSITIVITY section of Documentation/memory-barriers.txt: from
the above it doesn't sound like syncw is actually multi-copy atomic. If it
isn't, it doesn't provide transitivity, and the sequence is therefore not
valid for operations that are supposed to imply a full memory barrier.

Please explain.
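
For reference, the three-CPU example from the TRANSITIVITY section can be
sketched as a small exhaustive check. This is a minimal sketch, assuming a
sequentially consistent toy machine in which every access becomes globally
visible at once; it says nothing about what OCTEON hardware actually does.
All names (struct state, explore, run_transitivity_litmus) are made up for
illustration. CPU 1 stores X=1, CPU 2 loads X and then Y, CPU 3 stores Y=1
and then loads X. The outcome a transitive general barrier has to rule out
is CPU 2 seeing X==1 and Y==0 while CPU 3 still sees X==0.

```c
#include <assert.h>
#include <stdbool.h>

enum { NR_CPUS = 3 };

/* Number of memory operations each toy CPU performs. */
static const int nsteps[NR_CPUS] = { 1, 2, 2 };

struct state {
    int x, y;       /* shared memory, both initially 0 */
    int r2x, r2y;   /* CPU 2: LOAD X, then LOAD Y */
    int r3x;        /* CPU 3: LOAD X after its STORE Y=1 */
};

struct result {
    int executions; /* interleavings explored */
    int forbidden;  /* interleavings showing the non-transitive outcome */
};

/* Run operation number pc of the given CPU directly against memory. */
static void step(struct state *s, int cpu, int pc)
{
    assert(pc < nsteps[cpu]);
    if (cpu == 0) {
        s->x = 1;                   /* CPU 1: STORE X=1 */
    } else if (cpu == 1) {
        if (pc == 0)
            s->r2x = s->x;          /* CPU 2: LOAD X */
        else
            s->r2y = s->y;          /* CPU 2: LOAD Y */
    } else {
        if (pc == 0)
            s->y = 1;               /* CPU 3: STORE Y=1 */
        else
            s->r3x = s->x;          /* CPU 3: LOAD X */
    }
}

/* Depth-first walk over every interleaving that keeps program order. */
static void explore(struct state s, const int pc[NR_CPUS], struct result *res)
{
    bool ran = false;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (pc[cpu] == nsteps[cpu])
            continue;               /* this CPU has finished */
        struct state next = s;
        int npc[NR_CPUS] = { pc[0], pc[1], pc[2] };
        step(&next, cpu, npc[cpu]);
        npc[cpu]++;
        explore(next, npc, res);
        ran = true;
    }
    if (!ran) {                     /* all CPUs done: classify the outcome */
        res->executions++;
        if (s.r2x == 1 && s.r2y == 0 && s.r3x == 0)
            res->forbidden++;
    }
}

static struct result run_transitivity_litmus(void)
{
    struct state init = { 0 };
    const int pc[NR_CPUS] = { 0, 0, 0 };
    struct result res = { 0, 0 };

    explore(init, pc, &res);
    return res;
}
```

On such a machine the forbidden outcome never shows up in any interleaving.
The open question in the message above is whether syncw gives the same
guarantee on real hardware.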

* Re: mips octeon memory model questions
  2014-02-04 18:41 mips octeon memory model questions Peter Zijlstra
  2014-02-04 18:41 ` Peter Zijlstra
@ 2014-02-04 18:58 ` Linus Torvalds
  2014-02-04 19:05   ` Peter Zijlstra
  2014-02-04 19:51 ` David Daney
  2 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2014-02-04 18:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Daney, Ralf Baechle, linux-arch@vger.kernel.org, linux-mips,
	Linux Kernel Mailing List, Paul McKenney, Will Deacon

On Tue, Feb 4, 2014 at 10:41 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> Still doesn't make sense, because if we need the first sync to stop
> writes from being re-ordered with the ll-sc, we also need the second
> sync to avoid the same.

Presumably octeon doesn't do speculative writes, only *buffered* writes.

So writes move down, not up.

But it looks like Cavium is one of those clown companies that have a
"contact us" button for technical documentation rather than actually
making it available.

Christ, why would anybody do business with a tech company that hides
technical details? Seriously, that just stinks of "we have so many
bugs that we cannot make the documentation available".

                  Linus


* Re: mips octeon memory model questions
  2014-02-04 18:58 ` Linus Torvalds
@ 2014-02-04 19:05   ` Peter Zijlstra
  2014-02-04 19:05     ` Peter Zijlstra
  2014-02-04 19:16     ` Linus Torvalds
  0 siblings, 2 replies; 9+ messages in thread
From: Peter Zijlstra @ 2014-02-04 19:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Daney, Ralf Baechle, linux-arch@vger.kernel.org, linux-mips,
	Linux Kernel Mailing List, Paul McKenney, Will Deacon

On Tue, Feb 04, 2014 at 10:58:40AM -0800, Linus Torvalds wrote:
> On Tue, Feb 4, 2014 at 10:41 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > Still doesn't make sense, because if we need the first sync to stop
> > writes from being re-ordered with the ll-sc, we also need the second
> > sync to avoid the same.
> 
> Presumably octeon doesn't do speculative writes, only *buffered* writes.

Speculative writes are bad.. :-)

> So writes move down, not up.

Right, but the ll-sc store might move down over a later store. Say,
because the ll-sc first needs to get exclusive ownership of its
cacheline, whereas the later store would be to an already owned line.


* Re: mips octeon memory model questions
  2014-02-04 19:05   ` Peter Zijlstra
  2014-02-04 19:05     ` Peter Zijlstra
@ 2014-02-04 19:16     ` Linus Torvalds
  2014-02-04 19:39       ` Peter Zijlstra
  1 sibling, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2014-02-04 19:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Daney, Ralf Baechle, linux-arch@vger.kernel.org, linux-mips,
	Linux Kernel Mailing List, Paul McKenney, Will Deacon

On Tue, Feb 4, 2014 at 11:05 AM, Peter Zijlstra
>
>> So writes move down, not up.
>
> Right, but the ll-sc store might move down over a later store.

Unlikely. The thing is, in order for the sc to succeed, it has to
already have hit the cache coherency domain (or at least reserved it -
ie maybe the value is not actually *in* the cache, but the sc needs to
have gotten exclusive access to the cacheline).

So just how do you expect a later store (that is *after* the
conditional branch that tests the result of the sc) to move up before
it?

I'm not saying it's physically impossible: speculation is always
possible. But it would require some rather clever speculative store
buffers or caches and killing of same when mispredicted. Which is
actually fairly unlikely, since stores are seldom - if ever - in the
critical path. IOW, "lots and lots of effort for very little gain".

So I'm personally quite willing to believe that a
sc+conditional-branch+st is quite well ordered without any extra
barriers. I'd be more worried about *reads* moving up past the sc
("doesn't reorder reads" to me would imply not moving across the "ll"
part, but it's quite likely that the "sc" actually counts as both a
read and a write).

Without any visible documentation (see aforementioned "clown company"
comment) all of this is obviously just pure speculation.

              Linus
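
The code shape discussed above (an sc, a conditional branch that tests its
result, then a later store) can be sketched in portable C. This is only an
illustration of the control flow, assuming C11 atomics as a stand-in for
ll/sc; the names b, c, llsc_increment_then_store and llsc_demo are
hypothetical. Note that the C11 memory model itself does not promise any
ordering from the branch when everything is memory_order_relaxed, so the
sketch shows where the dependency sits, not a guarantee.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int b;    /* target of the ll/sc style update */
static atomic_int c;    /* unrelated location written afterwards */

/* Increment b with a retry loop, then store c_val to c. */
static int llsc_increment_then_store(int c_val)
{
    /* Plays the role of "ll": read the current value. */
    int old = atomic_load_explicit(&b, memory_order_relaxed);

    /*
     * Plays the role of "sc" plus the conditional branch: retry until
     * the update lands. A failed exchange reloads old for the next try.
     */
    while (!atomic_compare_exchange_weak_explicit(&b, &old, old + 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
        ;

    /* Only reached after the update succeeded: the "later store". */
    atomic_store_explicit(&c, c_val, memory_order_relaxed);
    return old + 1;
}

/* Single-threaded usage example. */
static void llsc_demo(void)
{
    int now = llsc_increment_then_store(7);

    assert(now == atomic_load(&b));
    assert(atomic_load(&c) == 7);
}
```

The store to c sits behind the loop exit, which is the control dependency
referred to above: it cannot be reached until the conditional update has
been seen to succeed.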


* Re: mips octeon memory model questions
  2014-02-04 19:16     ` Linus Torvalds
@ 2014-02-04 19:39       ` Peter Zijlstra
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2014-02-04 19:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Daney, Ralf Baechle, linux-arch@vger.kernel.org, linux-mips,
	Linux Kernel Mailing List, Paul McKenney, Will Deacon

On Tue, Feb 04, 2014 at 11:16:58AM -0800, Linus Torvalds wrote:
> On Tue, Feb 4, 2014 at 11:05 AM, Peter Zijlstra
> >
> >> So writes move down, not up.
> >
> > Right, but the ll-sc store might move down over a later store.
> 
> Unlikely. The thing is, in order for the sc to succeed, it has to
> already have hit the cache coherency domain (or at least reserved it -
> ie maybe the value is not actually *in* the cache, but the sc needs to
> have gotten exclusive access to the cacheline).
> 
> So just how do you expect a later store (that is *after* the
> conditional branch that tests the result of the sc) to move up before
> it?

Ah, I completely overlooked the control dependency to the subsequent
store.

Yes, given that, this makes sense.


* Re: mips octeon memory model questions
  2014-02-04 18:41 mips octeon memory model questions Peter Zijlstra
  2014-02-04 18:41 ` Peter Zijlstra
  2014-02-04 18:58 ` Linus Torvalds
@ 2014-02-04 19:51 ` David Daney
  2014-02-06 11:45   ` Peter Zijlstra
  2 siblings, 1 reply; 9+ messages in thread
From: David Daney @ 2014-02-04 19:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ralf Baechle, linux-arch, linux-mips, linux-kernel, Paul McKenney,
	Will Deacon, Linus Torvalds

On 02/04/2014 10:41 AM, Peter Zijlstra wrote:
> Hi all,
>
> I have a number of questions in regards to commit 6b07d38aaa520ce.
>
> Given that the octeon doesn't reorder reads; the following:
>
> "      sync
>         ll ...
>         .
>         .
>         .
>         sc ...
>         .
>         .
>         sync
>
>    The second SYNC was redundant, but harmless.  "
>
> Still doesn't make sense, because if we need the first sync to stop
> writes from being re-ordered with the ll-sc, we also need the second
> sync to avoid the same.
>
> Suppose:
>     STORE a
>     sync
>     LL-SC b
>     (not a sync)
>     STORE c
>
> What avoids this becoming visible as:
>
>    a
>    c
>    b

On OCTEON, SC implies a SYNC operation for the target memory location. 
So the "SC b" is ordered before any writes that come after the SC.
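
The effect of that rule on the a/b/c example can be written down as a tiny
ordering check. This is a minimal sketch, not a hardware model: it assumes
the leading SYNC orders store a before both later stores, and it treats
the statement above as one extra rule (b before c). The names order_rule,
order_allowed, sync_only and sync_and_sc are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum store { STORE_A, STORE_B, STORE_C, NR_STORES };

/* "first must become visible before second" */
struct order_rule {
    enum store first, second;
};

/* Index of store s within a proposed visibility order. */
static int position(const enum store order[NR_STORES], enum store s)
{
    for (int i = 0; i < NR_STORES; i++)
        if (order[i] == s)
            return i;
    assert(!"store missing from order");
    return -1;
}

/* True if the visibility order violates none of the rules. */
static bool order_allowed(const enum store order[NR_STORES],
                          const struct order_rule *rules, size_t nr_rules)
{
    for (size_t i = 0; i < nr_rules; i++)
        if (position(order, rules[i].first) >
            position(order, rules[i].second))
            return false;
    return true;
}

/* Only the leading SYNC orders anything: a before b, a before c. */
static const struct order_rule sync_only[] = {
    { STORE_A, STORE_B },
    { STORE_A, STORE_C },
};

/* As above, plus the rule that SC b is ordered before later writes. */
static const struct order_rule sync_and_sc[] = {
    { STORE_A, STORE_B },
    { STORE_A, STORE_C },
    { STORE_B, STORE_C },
};
```

With sync_only the order a, c, b passes the check; with sync_and_sc it is
rejected and only a, b, c remains.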


>
> ?
>
> Then there is:
>
> "       syncw;syncw
>          ll
>          .
>          .
>          .
>          sc
>          .
>          .
>
>      Has identical semantics to the first sequence, but is much faster.
>      The SYNCW orders the writes, and the SC will not complete successfully
>      until the write is committed to the coherent memory system.  So at the
>      end all preceeding writes have been committed.  Since Octeon does not
>      do speculative reads, this functions as a full barrier."
>
> Read Documentation/memory-barrier.txt:TRANSITIVITY, the above doesn't
> sound like syncw is actually multi-copy atomic, and therefore doesn't
> provide transitivity, and therefore is not a valid sequence for
> operations that are supposed to imply a full memory-barrier.
>
> Please as to explain.
>

It makes my head hurt.

The sequence:

    SYNCW
    LL a
    <other instructions that are not stores>
    SC a


Should function as a "<general barrier>".


I can try to explain why I think this is so:

Coherency is managed by the L2 Cache controller.

Each CPU has an n-entry write buffer.  The SYNCW ensures that all
preceding stores will commit before the store of the SC.  The
instruction after the SC will not execute until the SC's store is committed.

The full SYNC instruction functions in a similar manner to the above 
sequence.  The only difference is that it doesn't have the side effect 
of modifying the target of the SC instruction.

In both cases all the stores are committed, and following loads are 
delayed until the commit is acknowledged.
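
The write-buffer behaviour described above can be sketched as a toy model.
This is a simplified sketch under stated assumptions, not a description of
the real micro-architecture: the buffer depth, the memory size and the
"epoch" counter used to represent SYNCW are all hypothetical, the LL is
left out, the SC is assumed to succeed, and same-address ordering inside
the buffer is ignored. The names toy_cpu, toy_store, toy_syncw and toy_sc
are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WB_ENTRIES 8    /* hypothetical depth; the text only says "n-entry" */
#define MEM_WORDS  16   /* hypothetical size of the toy coherent memory */

struct wb_entry {
    size_t addr;
    int val;
    unsigned epoch;     /* number of SYNCWs issued before this store */
};

struct toy_cpu {
    int mem[MEM_WORDS];             /* stands in for the coherent L2 */
    struct wb_entry wb[WB_ENTRIES]; /* buffered, not yet visible stores */
    size_t pending;
    unsigned epoch;
};

/* Commit every buffered store issued before the given epoch, oldest first. */
static void commit_older_than(struct toy_cpu *cpu, unsigned epoch)
{
    size_t kept = 0;

    for (size_t i = 0; i < cpu->pending; i++) {
        if (cpu->wb[i].epoch < epoch)
            cpu->mem[cpu->wb[i].addr] = cpu->wb[i].val;
        else
            cpu->wb[kept++] = cpu->wb[i];   /* stays buffered */
    }
    cpu->pending = kept;
}

/* Plain store: sits in the write buffer until something forces it out. */
static bool toy_store(struct toy_cpu *cpu, size_t addr, int val)
{
    assert(addr < MEM_WORDS);
    if (cpu->pending == WB_ENTRIES)
        return false;               /* toy model: report a full buffer */
    cpu->wb[cpu->pending++] = (struct wb_entry){ addr, val, cpu->epoch };
    return true;
}

/* SYNCW: stores issued after it may not commit before earlier ones. */
static void toy_syncw(struct toy_cpu *cpu)
{
    cpu->epoch++;
}

/*
 * SC, assumed to succeed: its store is committed before the next
 * instruction runs, which first forces out every store that a SYNCW
 * ordered ahead of it. Stores from the current epoch may stay buffered.
 */
static void toy_sc(struct toy_cpu *cpu, size_t addr, int val)
{
    assert(addr < MEM_WORDS);
    commit_older_than(cpu, cpu->epoch);
    cpu->mem[addr] = val;
}
```

In this model a store followed by toy_syncw() and toy_sc() leaves the
buffer empty with both values in memory, whereas dropping the toy_syncw()
lets the earlier store stay buffered past the SC. That is the reason the
sequence above starts with a SYNCW.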

Note:  All this is based on my understanding of the OCTEON
micro-architecture.  I have not done any exhaustive testing of the
Transitivity principle with respect to SYNCW/LL/SC as described above.

David Daney


* Re: mips octeon memory model questions
  2014-02-04 19:51 ` David Daney
@ 2014-02-06 11:45   ` Peter Zijlstra
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2014-02-06 11:45 UTC (permalink / raw)
  To: David Daney
  Cc: Ralf Baechle, linux-arch, linux-mips, linux-kernel, Paul McKenney,
	Will Deacon, Linus Torvalds

On Tue, Feb 04, 2014 at 11:51:31AM -0800, David Daney wrote:
> On OCTEON, SC implies a SYNC operation for the target memory location. So
> the "SC b" is ordered before any writes that come after the SC.

Ah, that makes it all come together. I was thrown by octeon initially
having WEAK_REORDERING_BEYOND_LLSC set and thus thinking there were no
implied barriers.

Thanks David.
