* Memory Barrier Definitions
@ 2002-05-07 19:07 Dave Engebretsen
2002-05-07 19:49 ` Alan Cox
0 siblings, 1 reply; 23+ messages in thread
From: Dave Engebretsen @ 2002-05-07 19:07 UTC (permalink / raw)
To: linux-kernel
Hi,
I have been working through a number of issues that became significant
on Power4 based systems, and wanted to start some discussion to
understand which other platforms are impacted in a similar way.
The fundamental issue is that Power4 is weakly consistent and the
PowerPC architecture definitions for memory reference ordering do not
necessarily mesh well with the current Linux barrier primitive use.
Obviously, we are not the only weakly consistent platform, but I suspect
the degree and latencies we see push things more than most systems. What
is less clear to me is how much PPC memory barrier semantics have in
common with other systems; presumably there are some which are similar.
As a specific example, on PowerPC the following memory barriers are
defined:
eieio: Orders all I/O references & store/store to system memory, but
the two spaces separately (no ordering across them)
lwsync: Orders load/load, store/store, and load/store, only to system
memory
sync: Orders everything
In terms of cycles, eieio is relatively cheap, lwsync is perhaps 100's,
while sync is measured in the 1000's. The key is that only a sync
orders both system memory and I/O space references and it is very
expensive, so it should only be used where absolutely necessary, like in
a driver.
Linux defines (more or less) the following barriers:
mb, rmb, wmb, smp_mb, smp_wmb, smp_rmb
An example of where these primitives get us into trouble is the use of
wmb() to order two stores which are only to system memory (where a
lwsync would do for ppc64) and for a store to system memory followed by
a store to I/O (many examples in drivers). Here ppc64 requires a sync.
Therefore we must always pay the high price and use a sync for wmb().
A solution was pointed out by Rusty Russell that we should probably be
using smp_*mb() for system memory ordering and reserve the *mb() calls
for when ordering against I/O is also required. There do seem to be
some limited cases where this has been done, but in general the *mb()
forms are used in most parts of the kernel.
Any thoughts on whether making better use of the smp_* macros would be
the right approach?
Thanks -
Dave Engebretsen
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Memory Barrier Definitions
2002-05-07 19:07 Dave Engebretsen
@ 2002-05-07 19:49 ` Alan Cox
2002-05-07 19:53 ` Dave Engebretsen
0 siblings, 1 reply; 23+ messages in thread
From: Alan Cox @ 2002-05-07 19:49 UTC (permalink / raw)
To: Dave Engebretsen; +Cc: linux-kernel
> A solution was pointed out by Rusty Russell that we should probably be
> using smp_*mb() for system memory ordering and reserve the *mb() calls
For pure compiler level ordering we have barrier()
Alan
* Re: Memory Barrier Definitions
2002-05-07 19:49 ` Alan Cox
@ 2002-05-07 19:53 ` Dave Engebretsen
2002-05-07 20:27 ` Alan Cox
0 siblings, 1 reply; 23+ messages in thread
From: Dave Engebretsen @ 2002-05-07 19:53 UTC (permalink / raw)
To: Alan Cox; +Cc: linux-kernel
Alan Cox wrote:
>
> > A solution was pointed out by Rusty Russell that we should probably be
> > using smp_*mb() for system memory ordering and reserve the *mb() calls
>
> For pure compiler level ordering we have barrier()
>
> Alan
>
Sure, but none of the issues I think need discussion are compiler
reorderings. Perhaps you are just pointing out another barrier primitive
to provide a more complete listing? There are some others, such as the
*before_atomic* ones, that will require a separate discussion, I think.
In case my point was not clear, I'll restate: where PowerPC (at a
minimum) gets into trouble is with the separate ordering between
references to system memory and to I/O space with respect to the various
forms of processor memory barrier instructions. It is _very_ expensive
to blindly force all memory references to be ordered completely to the
separate spaces. The use of wmb(), rmb(), and mb() is overloaded in the
context of PowerPC.
Dave.
* Re: Memory Barrier Definitions
2002-05-07 19:53 ` Dave Engebretsen
@ 2002-05-07 20:27 ` Alan Cox
2002-05-07 21:23 ` Dave Engebretsen
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Alan Cox @ 2002-05-07 20:27 UTC (permalink / raw)
To: Dave Engebretsen; +Cc: Alan Cox, linux-kernel
> forms of processor memory barrier instructions. It is _very_ expensive
> to blindly force all memory references to be ordered completely to the
> separate spaces. The use of wmb(), rmb(), and mb() is overloaded in the
> context of PowerPC.
I think I follow
You have
Compiler ordering
CPU v CPU memory ordering
CPU v I/O memory ordering
I/O v I/O memory ordering
and our current hierarchy is a little bit more squashed than that. I'd
agree. We actually hit a corner case of this on the IDT winchip x86 where
we run relaxed store ordering and have to define wmb() as a locked add of
zero to the top of stack - which does have a penalty that isn't needed
for CPU ordering.
How much of this impacts Mips64 ?
Alan
* Re: Memory Barrier Definitions
2002-05-07 20:27 ` Alan Cox
@ 2002-05-07 21:23 ` Dave Engebretsen
2002-05-07 22:15 ` justincarlson
2002-05-07 22:57 ` Anton Blanchard
2 siblings, 0 replies; 23+ messages in thread
From: Dave Engebretsen @ 2002-05-07 21:23 UTC (permalink / raw)
To: Alan Cox; +Cc: linux-kernel
Alan Cox wrote:
>
> > forms of processor memory barrier instructions. It is _very_ expensive
> I think I follow
>
> You have
>
> Compiler ordering
> CPU v CPU memory ordering
> CPU v I/O memory ordering
> I/O v I/O memory ordering
>
Yep, that is a good summary. And the problem arises from the very large
penalty for the synchronization form used for CPU v I/O ordering. You
only want to pay that when necessary, certainly not when only CPU v CPU
ordering is required. The difference can be on the order of a thousand
cycles (depending on many factors, of course).
Dave.
* Re: Memory Barrier Definitions
2002-05-07 20:27 ` Alan Cox
2002-05-07 21:23 ` Dave Engebretsen
@ 2002-05-07 22:15 ` justincarlson
2002-05-08 2:49 ` Dave Engebretsen
2002-05-07 22:57 ` Anton Blanchard
2 siblings, 1 reply; 23+ messages in thread
From: justincarlson @ 2002-05-07 22:15 UTC (permalink / raw)
To: Alan Cox; +Cc: Dave Engebretsen, linux-kernel
On Tue, 2002-05-07 at 16:27, Alan Cox wrote:
> and our current hierarchy is a little bit more squashed than that. I'd
> agree. We actually hit a corner case of this on the IDT winchip x86 where
> we run relaxed store ordering and have to define wmb() as a locked add of
> zero to the top of stack - which does have a penalty that isn't needed
> for CPU ordering.
>
> How much of this impacts Mips64 ?
In terms of the MIPS{32|64} ISA, the current primitives seem fine;
there's only 1 option defined in the ISA: 'sync'. Order for all
off-cache accesses is guaranteed around a sync.
It gets a bit more complicated when you talk about what particular
implementations do, and ordering rules for uncached vs cached accesses,
but to the best of my knowledge there aren't any fundamental problems as
described for the PPC.
-Justin
* Re: Memory Barrier Definitions
2002-05-07 20:27 ` Alan Cox
2002-05-07 21:23 ` Dave Engebretsen
2002-05-07 22:15 ` justincarlson
@ 2002-05-07 22:57 ` Anton Blanchard
2002-05-13 18:16 ` Jesse Barnes
2 siblings, 1 reply; 23+ messages in thread
From: Anton Blanchard @ 2002-05-07 22:57 UTC (permalink / raw)
To: Alan Cox; +Cc: Dave Engebretsen, linux-kernel, jbarnes
> You have
>
> Compiler ordering
> CPU v CPU memory ordering
> CPU v I/O memory ordering
> I/O v I/O memory ordering
Yep. Maybe we could have:
CPU v CPU smp_*mb or cpu_*mb
CPU v I/O *mb
I/O v I/O io_*mb
Then again before Linus hits me on the head for hoarding vowels,
http://hypermail.spyroid.com/linux-kernel/archived/2001/week41/1270.html
I should suggest we make these a little less cryptic:
CPU v CPU cpu_{read,write,memory}_barrier
CPU v I/O {read,write,memory}_barrier
I/O v I/O io_{read,write,memory}_barrier
> and our current hierarchy is a little bit more squashed than that. I'd
> agree. We actually hit a corner case of this on the IDT winchip x86 where
> we run relaxed store ordering and have to define wmb() as a locked add of
> zero to the top of stack - which does have a penalty that isn't needed
> for CPU ordering.
>
> How much of this impacts Mips64 ?
I remember some ia64 implementations have issues. Jesse, could you
fill us in again? I think you have problems with out of order
loads/stores to noncacheable space, right?
Anton
* Re: Memory Barrier Definitions
2002-05-07 22:15 ` justincarlson
@ 2002-05-08 2:49 ` Dave Engebretsen
2002-05-08 13:54 ` Justin Carlson
2002-05-08 15:27 ` Dave Engebretsen
0 siblings, 2 replies; 23+ messages in thread
From: Dave Engebretsen @ 2002-05-08 2:49 UTC (permalink / raw)
To: justincarlson; +Cc: Alan Cox, linux-kernel
justincarlson@cmu.edu wrote:
>
> On Tue, 2002-05-07 at 16:27, Alan Cox wrote:
> > and our current hierarchy is a little bit more squashed than that. I'd
> > agree. We actually hit a corner case of this on the IDT winchip x86 where
> > we run relaxed store ordering and have to define wmb() as a locked add of
> > zero to the top of stack - which does have a penalty that isn't needed
> > for CPU ordering.
> >
> > How much of this impacts Mips64 ?
>
> In terms of the MIPS{32|64} ISA, the current primitives seem fine;
> there's only 1 option defined in the ISA: 'sync'. Order for all
> off-cache accesses is guaranteed around a sync.
>
> It gets a bit more complicated when you talk about what particular
> implementations do, and ordering rules for uncached vs cached accesses,
> but to the best of my knowledge there aren't any fundamental problems as
> described for the PPC.
>
> -Justin
PPC also guarantees every ordering when using the 'sync' instruction, so
that will give correctness at the price of a thousand cycles or so. You
refer to different rules for cached vs uncached on other implementations
-- that is the essence of our problem. Are there different barrier
instructions in MIPS which provide different levels of performance for
different ordering enforcements?
Dave.
* Re: Memory Barrier Definitions
2002-05-08 2:49 ` Dave Engebretsen
@ 2002-05-08 13:54 ` Justin Carlson
2002-05-08 15:27 ` Dave Engebretsen
1 sibling, 0 replies; 23+ messages in thread
From: Justin Carlson @ 2002-05-08 13:54 UTC (permalink / raw)
To: Dave Engebretsen; +Cc: justincarlson, Alan Cox, linux-kernel
On Tue, 2002-05-07 at 22:49, Dave Engebretsen wrote:
> PPC also guarantees every ordering when using the 'sync' instruction, so
> that will give correctness at the price of a thousand cycles or so. You
> refer to different rules for cached vs uncached on other implementations
> -- that is the essence of our problem. Are there different barrier
> instructions in MIPS which provide different levels of performance for
> different ordering enforcements?
>
> Dave.
No, there aren't. The implementation details can affect which
primitives need to explicitly sync, though.
For instance, the BRCM1250 makes some guarantees about visibility of
uncached writes that aren't strictly required by the architecture spec.
-Justin
* Re: Memory Barrier Definitions
2002-05-08 2:49 ` Dave Engebretsen
2002-05-08 13:54 ` Justin Carlson
@ 2002-05-08 15:27 ` Dave Engebretsen
2002-05-08 15:49 ` Andi Kleen
2002-05-08 17:07 ` David Mosberger
1 sibling, 2 replies; 23+ messages in thread
From: Dave Engebretsen @ 2002-05-08 15:27 UTC (permalink / raw)
To: justincarlson, Alan Cox, linux-kernel, anton, davidm, ak
Dave Engebretsen wrote:
>
> justincarlson@cmu.edu wrote:
> >
> > On Tue, 2002-05-07 at 16:27, Alan Cox wrote:
> > > and our current hierarchy is a little bit more squashed than that. I'd
> > > agree. We actually hit a corner case of this on the IDT winchip x86 where
> > > we run relaxed store ordering and have to define wmb() as a locked add of
> > > zero to the top of stack - which does have a penalty that isn't needed
> > > for CPU ordering.
> > >
> > > How much of this impacts Mips64 ?
> >
> > In terms of the MIPS{32|64} ISA, the current primitives seem fine;
> > there's only 1 option defined in the ISA: 'sync'. Order for all
> > off-cache accesses is guaranteed around a sync.
> >
> > It gets a bit more complicated when you talk about what particular
> > implementations do, and ordering rules for uncached vs cached accesses,
> > but to the best of my knowledge there aren't any fundamental problems as
> > described for the PPC.
> >
> > -Justin
I am curious what the definition of memory barriers is for IA64, Sparc,
and x86-64.
From what I can tell, sparc and x86-64 are like alpha and map directly
to the existing mb, wmb, and rmb semantics, including ordering between
system memory and I/O space. Is that an accurate assessment?
IA64 has both the mf and mf.a instructions, one for system memory the
other for I/O space. What is required for ordering of references
between the spaces? That is not clear to me looking at the ia64
headers.
Thanks for any input -
Dave.
* Re: Memory Barrier Definitions
2002-05-08 15:27 ` Dave Engebretsen
@ 2002-05-08 15:49 ` Andi Kleen
2002-05-08 17:07 ` David Mosberger
1 sibling, 0 replies; 23+ messages in thread
From: Andi Kleen @ 2002-05-08 15:49 UTC (permalink / raw)
To: Dave Engebretsen; +Cc: justincarlson, Alan Cox, linux-kernel, anton, davidm, ak
On Wed, May 08, 2002 at 10:27:10AM -0500, Dave Engebretsen wrote:
> I am curious what the definition of memory barriers is for IA64, Sparc,
> and x86-64.
>
> From what I can tell, sparc and x86-64 are like alpha and map directly
> to the existing mb, wmb, and rmb semantics, including ordering between
> system memory and I/O space. Is that an accurate assessment?
I don't think it is true for alpha, but it is true
for x86-64. x86-64 by default has strong ordering for most loads/stores.
It is possible to use weak ordering for specially marked stores. For that
there are special read, write, and read/write barriers which apply
to all memory (no distinction between I/O space and other memory). In
addition there is a way to mark special memory areas as write combining and
some other settings, but that is ordered by the normal barriers too.
-Andi
* Re: Memory Barrier Definitions
2002-05-08 15:27 ` Dave Engebretsen
2002-05-08 15:49 ` Andi Kleen
@ 2002-05-08 17:07 ` David Mosberger
2002-05-09 7:36 ` Rusty Russell
1 sibling, 1 reply; 23+ messages in thread
From: David Mosberger @ 2002-05-08 17:07 UTC (permalink / raw)
To: Dave Engebretsen; +Cc: justincarlson, Alan Cox, linux-kernel, anton, davidm, ak
>>>>> On Wed, 08 May 2002 10:27:10 -0500, Dave Engebretsen <engebret@vnet.ibm.com> said:
Dave> I am curious what the definition of memory barriers is for
Dave> IA64, Sparc, and x86-64.
I'm not sure it's enough to look just at the memory barriers. The
barriers only make sense within the memory ordering model defined for
each architecture. For ia64, this is defined in Section 4.4.7 of
the System Architecture Guide, which is available at:
http://developer.intel.com/design/itanium/downloads/24531803s.htm
Dave> From what I can tell, sparc and x86-64 are like alpha and map
Dave> directly
Dave> to the existing mb, wmb, and rmb semantics, including ordering
Dave> between system memory and I/O space. Is that an accurate
Dave> assessment?
Dave> IA64 has both the mf and mf.a instructions, one for system
Dave> memory the other for I/O space.
The ia64 memory ordering model is quite orthogonal to the one that
Linux uses (which is based on the Alpha instructions): Linux
distinguishes between read and write memory barriers. ia64 uses an
acquire/release model instead. An acquire orders all *later* memory
accesses and a release orders all *earlier* accesses (regardless of
whether they are reads or writes). Another difference is that the
acquire/release semantics is attached to load/store instructions,
respectively. This means that in an ideal world, ia64 would rarely
need to use the memory barrier instruction.
Now, finding a way to abstract all the differences across
architectures in a way that's easy to use and allows for optimal
implementation on each architecture may not be easy. This problem
also shows up with user-level thread libraries and I have had on and
off discussions about this with Hans Boehm, but neither of us has
really had time to work on it seriously. In truth, it is also the
case that for Itanium and Itanium 2, the cost of "mf" is small enough
that there hasn't been a huge need to get this exactly right. But
when reworking the memory ordering model of Linux, it might just as
well be taken into account.
Dave> What is required for ordering
Dave> of references between the spaces? That is not clear to me
Dave> looking at the ia64 headers.
Look at Table 4-15 in the above document (on page 2-70). I/O space is
simply a memory-mapped region that is mapped uncached. In the table,
the row "Sequential" refers to uncached memory. The quick summary is
that normal loads/stores to memory are not automatically ordered with
respect to accesses to uncached memory.
I'm also discussing some of these issues in my book
(http://www.lia64.org/book/) in the Device I/O chapter, but the
architecture manual mentioned above is of course the ultimate source
if you want all the gory details.
--david
* Re: Memory Barrier Definitions
2002-05-08 17:07 ` David Mosberger
@ 2002-05-09 7:36 ` Rusty Russell
2002-05-09 8:01 ` Keith Owens
2002-05-09 15:00 ` David Mosberger
0 siblings, 2 replies; 23+ messages in thread
From: Rusty Russell @ 2002-05-09 7:36 UTC (permalink / raw)
To: davidm; +Cc: davidm, engebret, justincarlson, alan, linux-kernel, anton, ak
On Wed, 8 May 2002 10:07:08 -0700
David Mosberger <davidm@napali.hpl.hp.com> wrote:
> The ia64 memory ordering model is quite orthogonal to the one that
> Linux uses (which is based on the Alpha instructions): Linux
> distinguishes between read and write memory barriers. ia64 uses an
> acquire/release model instead. An acquire orders all *later* memory
> accesses and a release orders all *earlier* accesses (regardless of
> whether they are reads or writes). Another difference is that the
> acquire/release semantics is attached to load/store instructions,
> respectively. This means that in an ideal world, ia64 would rarely
> need to use the memory barrier instruction.
Hmmm... could you explain more? You're saying that every load is
an "acquire" and every store a "release"? Or that they can be flagged
that way, but aren't always?
Does this mean that an "acquire" means "all accesses after this insn
(in the code stream) must occur after this insn (in time)"? Does
that only apply to the address that instruction touched, or all?
Confused,
Rusty.
--
there are those who do and those who hang on and you don't see too
many doers quoting their contemporaries. -- Larry McVoy
* Re: Memory Barrier Definitions
2002-05-09 7:36 ` Rusty Russell
@ 2002-05-09 8:01 ` Keith Owens
2002-05-09 15:00 ` David Mosberger
1 sibling, 0 replies; 23+ messages in thread
From: Keith Owens @ 2002-05-09 8:01 UTC (permalink / raw)
To: Rusty Russell; +Cc: linux-kernel
On Thu, 9 May 2002 17:36:46 +1000,
Rusty Russell <rusty@rustcorp.com.au> wrote:
>Hmmm... could you explain more? You're saying that every load is
>an "acquire" and every store a "release"? Or that they can be flagged
>that way, but aren't always?
cc trimmed.
The IA64 default is unordered memory accesses, except for special
instructions. From Intel IA-64 Architecture Software Developer's
Manual. Volume 1: IA-64 Application Architecture.
4.4.7 Memory Access Ordering
Memory data access ordering must satisfy read-after-write (RAW),
write-after-write (WAW), and write-after-read (WAR) data dependencies
to the same memory location. In addition, memory writes and flushes
must observe control dependencies. Except for these restrictions,
reads, writes, and flushes may occur in an order different from the
specified program order. Note that no ordering exists between
instruction accesses and data accesses or between any two instruction
accesses. The mechanisms described below are defined to enforce a
particular memory access order. In the following discussion, the terms
"previous" and "subsequent" are used to refer to the program specified
order. The term "visible" is used to refer to all architecturally
visible effects of performing a memory access (at a minimum this
involves reading or writing memory).
Memory accesses follow one of four memory ordering semantics:
unordered, release, acquire or fence. Unordered data accesses may
become visible in any order. Release data accesses guarantee that all
previous data accesses are made visible prior to being made visible
themselves. Acquire data accesses guarantee that they are made visible
prior to all subsequent data accesses. Fence operations combine the
release and acquire semantics into a bi-directional fence, i.e. they
guarantee that all previous data accesses are made visible prior to any
subsequent data accesses being made visible.
Explicit memory ordering takes the form of a set of instructions:
ordered load and ordered check load (ld.acq, ld.c.clr.acq), ordered
store (st.rel), semaphores (cmpxchg, xchg, fetchadd), and memory fence
(mf). The ld.acq and ld.c.clr.acq instructions follow acquire
semantics. The st.rel follows release semantics. The mf instruction is
a fence operation. The xchg, fetchadd.acq, and cmpxchg.acq instructions
have acquire semantics. The cmpxchg.rel, and fetchadd.rel instructions
have release semantics. The semaphore instructions also have implicit
ordering. If there is a write, it will always follow the read. In
addition, the read and write will be performed atomically with no
intervening accesses to the same memory region.
* Re: Memory Barrier Definitions
@ 2002-05-09 11:33 Manfred Spraul
2002-05-09 19:38 ` Dave Engebretsen
0 siblings, 1 reply; 23+ messages in thread
From: Manfred Spraul @ 2002-05-09 11:33 UTC (permalink / raw)
To: Dave Engebretsen, linux-kernel
>
> An example of where these primitives get us into trouble is the use of
> wmb() to order two stores which are only to system memory (where a
> lwsync would do for ppc64) and for a store to system memory followed by
> a store to I/O (many examples in drivers).
>
2 questions:
1) Does that only affect memory barriers, or both memory barriers and
spinlocks?
example (from drivers/net/natsemi.c)
cpu0:
spin_lock(&lock);
writew(1, ioaddr+PGSEL);
...
writew(0, ioaddr+PGSEL);
spin_unlock(&lock);
cpu1:
spin_lock(&lock);
readw(ioaddr+whatever); // assumes that the register window is 0.
writew(1, ioaddr+PGSEL) selects a register window of the NIC. Are writew
and the spinlock synchronized on ppc64?
2) when you write "system memory", is that memory allocated with
kmalloc/gfp, or also memory allocated with pci_alloc_consistent()?
I've always assumed that
pci_alloc_consistent_ptr->data=0;
writew(0, ioaddr+TRIGGER);
is ordered, i.e. the memory write happens before the writew. Is that
guaranteed?
--
Manfred
* Re: Memory Barrier Definitions
2002-05-09 7:36 ` Rusty Russell
2002-05-09 8:01 ` Keith Owens
@ 2002-05-09 15:00 ` David Mosberger
2002-05-13 3:26 ` Rusty Russell
1 sibling, 1 reply; 23+ messages in thread
From: David Mosberger @ 2002-05-09 15:00 UTC (permalink / raw)
To: Rusty Russell
Cc: davidm, davidm, engebret, justincarlson, alan, linux-kernel,
anton, ak
>>>>> On Thu, 9 May 2002 17:36:46 +1000, Rusty Russell <rusty@rustcorp.com.au> said:
Rusty> On Wed, 8 May 2002 10:07:08 -0700 David Mosberger
Rusty> <davidm@napali.hpl.hp.com> wrote:
>> The ia64 memory ordering model is quite orthogonal to the one
>> that Linux uses (which is based on the Alpha instructions): Linux
>> distinguishes between read and write memory barriers. ia64 uses
>> an acquire/release model instead. An acquire orders all *later*
>> memory accesses and a release orders all *earlier* accesses
>> (regardless of whether they are reads or writes). Another
>> difference is that the acquire/release semantics is attached to
>> load/store instructions, respectively. This means that in an
>> ideal world, ia64 would rarely need to use the memory barrier
>> instruction.
Rusty> Hmmm... could you explain more? You're saying that every
Rusty> load is an "acquire" and every store a "release"? Or that
Rusty> they can be flagged that way, but aren't always?
The latter: loads can have "acquire" semantics and stores can have
"release" semantics. For example, at the assembly level, ld8.acq
would be an 8-byte load with acquire semantics, st8.rel an 8-byte
store with release semantics. At the C level, acquire/release
semantics is used for all accesses to "volatile" variables.
One way to think of all this is that using .acq/.rel for *all* memory
accesses will give you a memory model that exactly matches that of a
Pentium III.
Rusty> Does this means that an "acquire" means "all accesses after
Rusty> this insn (in the code stream) must occur after this insn (in
Rusty> time)"?
Yes.
Rusty> Does that only apply to the address that instruction
Rusty> touched, or all?
No, the address doesn't matter (data dependencies are always honored).
--david
* Re: Memory Barrier Definitions
2002-05-09 11:33 Memory Barrier Definitions Manfred Spraul
@ 2002-05-09 19:38 ` Dave Engebretsen
0 siblings, 0 replies; 23+ messages in thread
From: Dave Engebretsen @ 2002-05-09 19:38 UTC (permalink / raw)
To: Manfred Spraul; +Cc: linux-kernel
Manfred Spraul wrote:
>
> >
> > An example of where these primitives get us into trouble is the use of
> > wmb() to order two stores which are only to system memory (where a
> > lwsync would do for ppc64) and for a store to system memory followed by
> > a store to I/O (many examples in drivers).
> >
> 2 questions:
>
> 1) Does that only affect memory barriers, or both memory barriers and
> spinlocks?
>
> example (from drivers/net/natsemi.c)
>
> cpu0:
> spin_lock(&lock);
> writew(1, ioaddr+PGSEL);
> ...
> writew(0, ioaddr+PGSEL);
> spin_unlock(&lock);
>
> cpu1:
> spin_lock(&lock);
> readw(ioaddr+whatever); // assumes that the register window is 0.
>
> writew(1, ioaddr+PGSEL) selects a register window of the NIC. Are writew
> and the spinlock synchronized on ppc64?
This is an interesting example. As the implementation stands today, for
this specific example, we are ok because the spin_lock/unlock pair
provides ordering within system memory access pairs OR I/O space pairs,
but not across the two types (we do not use the heavyweight sync). So if
there are cases where the spin lock is meant to order system memory
accesses against I/O space accesses, we are in trouble.
> 2) when you write "system memory", is that memory allocated with
> kmalloc/gfp, or also memory allocated with pci_alloc_consistent()?
>
> I've always assumed that
> pci_alloc_consistent_ptr->data=0;
> writew(0, ioaddr+TRIGGER);
>
> is ordered, i.e. the memory write happens before the writew. Is that
> guaranteed?
>
It is not guaranteed on all systems (PowerPC being an example).
pci_alloc_consistent allocated storage is just normal system memory that
happens to be mapped to a PCI bus for DMA access.
Your example would fail, and in fact is basically what has been observed
to fail on Power4.
What is needed is:
pci_alloc_consistent_ptr->data = 0;
wmb();
writew(0, ioaddr+TRIGGER);
This code was also observed to fail when wmb() was defined as eieio,
which does not order system memory accesses against I/O space accesses.
At present, we have worked around this by doing a heavyweight 'sync'
before and after writew and its ilk. The point of my initial questions,
though, is that this fix is not exactly optimal :(
Dave.
* Re: Memory Barrier Definitions
2002-05-09 15:00 ` David Mosberger
@ 2002-05-13 3:26 ` Rusty Russell
2002-05-13 16:36 ` David Mosberger
0 siblings, 1 reply; 23+ messages in thread
From: Rusty Russell @ 2002-05-13 3:26 UTC (permalink / raw)
To: davidm
Cc: torvalds, engebret, justincarlson, alan, linux-kernel, anton, ak,
paulus
On Thu, 9 May 2002 08:00:35 -0700
David Mosberger <davidm@napali.hpl.hp.com> wrote:
> The latter: loads can have "acquire" semantics and stores can have
> "release" semantics. For example, at the assembly level, ld8.acq
> would be an 8-byte load with acquire semantics, st8.rel an 8-byte
> store with release semantics. At the C level, acquire/release
> semantics is used for all accesses to "volatile" variables.
OK. So ignoring the fact that you somehow have to attach your barriers
to a load or store for the moment, we have before vs. after (ia64),
read vs. write (most archs), io vs mem (ppc, ppc64), data dependency
vs non-data dependency (alpha), and smp vs up.
{read|write|readwrite} \
_{before|after|bidir} \
_{io|mem|iomem} \
_{dd|nondd} \
_{smp|nonsmp}
Now, I think the non-data-depends case is so common that it doesn't
belong in the name at all, but should be a separate macro:
#ifndef __alpha__
#define data_depends(barrier) (barrier)
#else
#define data_depends(barrier) (ddep_##barrier)
#endif
Also, assuming that no data dependency on normal memory where it doesn't
matter on UP is the default, we can elide those. Also, any barrier mentioning
IO can be assumed to be in force even if UP, so those combinations are invalid.
Note that UP with CONFIG_PREEMPT counts as SMP here:
/* Complete all reads from normal memory before any normal
memory reads which follow this instruction, on SMP or PREEMPT */
read_before();
/* Do not begin any reads from normal memory which follow,
before any normal reads which precede this instruction are
complete, on SMP or PREEMPT */
read_after();
/* read_before(); read_after(); */
read_bidir();
/* Complete all reads from IO before any IO reads which follow
this instruction. */
read_before_io();
/* Complete all reads (IO or memory) before any reads which follow
this instruction. */
read_before_iomem();
/* Do not begin any IO reads which follow, before any IO reads
which precede this instruction are complete. */
read_after_io();
/* Do not begin any reads (IO or memory) which follow, before any
reads which precede this instruction are complete. */
read_after_iomem();
/* read_before_io(); read_after_io(); */
read_bidir_io();
/* read_before_iomem(); read_after_iomem(); */
read_bidir_iomem();
/* read_before(), even if we are non-PREEMPT, non-SMP. */
read_before_nonsmp();
/* read_after(), even if we are non-PREEMPT, non-SMP. */
read_after_nonsmp();
/* read_before_nonsmp(); read_after_nonsmp(); */
read_bidir_nonsmp();
Complete for write_* and readwrite_*.
Suggested semantics for spin_lock():
readwrite_after();
readwrite_after_io();
i.e. no interlock between io and memory: if you're doing both (e.g. acenic)
you need to put in your own iomem barrier (this is for the PPC folk).
Questions:
1) Can we elide any others? In particular, can we remove the _bidir_
ones?
2) Should we prepend a "barr_" prefix?
3) Any variations I missed?
Cheers,
Rusty.
--
there are those who do and those who hang on and you don't see too
many doers quoting their contemporaries. -- Larry McVoy
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Memory Barrier Definitions
2002-05-13 3:26 ` Rusty Russell
@ 2002-05-13 16:36 ` David Mosberger
2002-05-13 16:50 ` Linus Torvalds
0 siblings, 1 reply; 23+ messages in thread
From: David Mosberger @ 2002-05-13 16:36 UTC (permalink / raw)
To: Rusty Russell
Cc: davidm, torvalds, engebret, justincarlson, alan, linux-kernel,
anton, ak, paulus
>>>>> On Mon, 13 May 2002 13:26:05 +1000, Rusty Russell <rusty@rustcorp.com.au> said:
Rusty> OK. So ignoring the fact that you somehow have to attach
Rusty> your barriers to a load or store for the moment, we have
Rusty> before vs. after (ia64), read vs. write (most archs), io vs
Rusty> mem (ppc, ppc64), data dependency vs non-data dependency
Rusty> (alpha), and smp vs up.
An alternative way to think about the ia64 model is that it provides
"ordering" variables. Accesses to those variables won't be reordered
by the compiler or the CPU, and they also order other (normally
unordered) accesses. One way to support this is to have an ORDERING
attribute for variables (which would expand into "volatile" on ia64). This would
have to be complemented by a set of barrier routines which will
achieve the desired ordering on machines that don't have the
acquire/release model of ia64 (and on ia64, they would expand into
nothing).
--david
* Re: Memory Barrier Definitions
2002-05-13 16:36 ` David Mosberger
@ 2002-05-13 16:50 ` Linus Torvalds
2002-05-13 17:53 ` David Mosberger
2002-05-13 23:28 ` Rusty Russell
0 siblings, 2 replies; 23+ messages in thread
From: Linus Torvalds @ 2002-05-13 16:50 UTC (permalink / raw)
To: davidm
Cc: Rusty Russell, engebret, justincarlson, alan, linux-kernel, anton,
ak, paulus
On Mon, 13 May 2002, David Mosberger wrote:
>
> This would have to be complemented by a set of barrier routines which
> will achieve the desired ordering on machines that don't have the
> acquire/release model of ia64 (and on ia64, they would expand into
> nothing).
Earth to ia64, earth calling...
Until ia64 is a noticeable portion of the installed base, and indeed,
until it has shown that it can survive at all, we're not going to design
the Linux SMP memory ordering around that architecture.
If that means that ia64 will have to do strange things and maybe cannot
take advantage of its strange memory models, that's ok. Because reality
rules.
We're _not_ going to make up a complicated, big fancy new model. We might
tweak the current one a bit. And if that means that some architectures get
heavier barriers than they strictly need, then so be it. There are two
overriding concerns:
- sanity: maybe it's better to have one mb() that is a sledgehammer but
obvious, than it is to have many subtle variations that are just asking
for subtle bugs.
- x86 _owns_ the market right now, and we're not going to make up
barriers that add overhead to x86. We may add barriers that end up
being no-op's on x86 (because it is fairly ordered anyway), but
basically it should be designed for the _common_ case, not for some
odd-ball architecture that has sold machines mostly for test purposes.
The x86 situation is obviously just today. In five or ten years maybe
everybody agrees that we should follow the ia-64 model, and x86 can do
strange things that end up being slow.
Linus
* Re: Memory Barrier Definitions
2002-05-13 16:50 ` Linus Torvalds
@ 2002-05-13 17:53 ` David Mosberger
2002-05-13 23:28 ` Rusty Russell
1 sibling, 0 replies; 23+ messages in thread
From: David Mosberger @ 2002-05-13 17:53 UTC (permalink / raw)
To: Linus Torvalds
Cc: davidm, Rusty Russell, engebret, justincarlson, alan,
linux-kernel, anton, ak, paulus
>>>>> On Mon, 13 May 2002 09:50:01 -0700 (PDT), Linus Torvalds <torvalds@transmeta.com> said:
Linus> Until ia64 is a noticeable portion of the installed base, and
Linus> indeed, until it has shown that it can survive at all, we're
Linus> not going to design the Linux SMP memory ordering around that
Linus> architecture.
Well, I hope we can *discuss* ideas for models that could accommodate
all platforms.
Linus> We're _not_ going to make up a complicated, big fancy new
Linus> model. We might tweak the current one a bit. And if that
Linus> means that some architectures get heavier barriers than they
Linus> strictly need, then so be it. There are two overriding
Linus> concerns:
Linus> - sanity: maybe it's better to have one mb() that is a
Linus> sledgehammer but obvious, than it is to have many subtle
Linus> variations that are just asking for subtle bugs.
I tend to agree.
Linus> - x86 _owns_ the market right now, and we're not going to
Linus> make up barriers that add overhead to x86. We may add
Linus> barriers that end up being no-op's on x86 (because it is
Linus> fairly ordered anyway), but basically it should be designed
Linus> for the _common_ case, not for some odd-ball architecture
Linus> that has sold machines mostly for test purposes.
Nobody suggested such a thing.
Linus> The x86 situation is obviously just today. In five or ten
Linus> years maybe everybody agrees that we should follow the ia-64
Linus> model, and x86 can do strange things that end up being slow.
Geez, how about we spend a little time thinking about it *now*?
Perhaps Rusty can come up with a model that will be easy to program
for *and* work well for all platforms. Wouldn't that be neat? If
not, we can always fall back on the sledgehammer (unlike other
platforms, ia64 performance isn't affected much by extraneous memory
barriers).
--david
* Re: Memory Barrier Definitions
2002-05-07 22:57 ` Anton Blanchard
@ 2002-05-13 18:16 ` Jesse Barnes
0 siblings, 0 replies; 23+ messages in thread
From: Jesse Barnes @ 2002-05-13 18:16 UTC (permalink / raw)
To: Anton Blanchard; +Cc: Alan Cox, Dave Engebretsen, linux-kernel
On Wed, May 08, 2002 at 08:57:52AM +1000, Anton Blanchard wrote:
>
> > You have
> >
> > Compiler ordering
> > CPU v CPU memory ordering
> > CPU v I/O memory ordering
> > I/O v I/O memory ordering
>
> Yep. Maybe we could have:
>
> CPU v CPU smp_*mb or cpu_*mb
> CPU v I/O *mb
> I/O v I/O io_*mb
>
> Then again before Linus hits me on the head for hoarding vowels,
>
> http://hypermail.spyroid.com/linux-kernel/archived/2001/week41/1270.html
>
> I should suggest we make these a little less cryptic:
>
> CPU v CPU cpu_{read,write,memory}_barrier
> CPU v I/O {read,write,memory}_barrier
> I/O v I/O io_{read,write,memory}_barrier
>
> > and our current hierarchy is a little bit more squashed than that. I'd
> > agree. We actually hit a corner case of this on the IDT winchip x86 where
> > we run relaxed store ordering and have to define wmb() as a locked add of
> > zero to the top of stack - which does have a penalty that isn't needed
> > for CPU ordering.
> >
> > How much of this impacts Mips64 ?
>
> I remember some ia64 implementations have issues. Jesse, could you
> fill us in again? I think you have problems with out of order
> loads/stores to noncacheable space, right?
Both MIPS64 and our NUMA IA64 implementation have weakly ordered I/O.
The primitives outlined above should be sufficient to order I/O and
memory references on both platforms without unnecessary penalties.
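On ppc64 the three tiers might map onto the instructions from Dave's original mail roughly as below. This is a sketch: the exact per-tier assignment of lwsync/eieio/sync is an assumption, not something settled in the thread.

```c
/* CPU v CPU: lwsync orders cacheable load/load, store/store and
 * load/store.  I/O v I/O: eieio orders I/O references.  CPU v I/O:
 * only sync orders both domains, so the cross-domain tier pays the
 * full (1000s of cycles) price. */
#define cpu_memory_barrier()  __asm__ __volatile__("lwsync" : : : "memory")
#define io_memory_barrier()   __asm__ __volatile__("eieio"  : : : "memory")
#define memory_barrier()      __asm__ __volatile__("sync"   : : : "memory")
```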
Thanks for adding me to the Cc: list, sorry it took me so long to
respond.
It might also be good to summarize the ordering issues in a document
in the Documentation directory. I've got a little something started
(see the ia64 patch for 2.5) for I/O ordering, and I have another
document that covers CPU memory ordering too that I could probably
contribute.
Thanks,
Jesse
* Re: Memory Barrier Definitions
2002-05-13 16:50 ` Linus Torvalds
2002-05-13 17:53 ` David Mosberger
@ 2002-05-13 23:28 ` Rusty Russell
1 sibling, 0 replies; 23+ messages in thread
From: Rusty Russell @ 2002-05-13 23:28 UTC (permalink / raw)
To: Linus Torvalds
Cc: davidm, Rusty Russell, engebret, justincarlson, alan,
linux-kernel, anton, ak, paulus
In message <Pine.LNX.4.44.0205130938380.19524-100000@home.transmeta.com> you write:
> We're _not_ going to make up a complicated, big fancy new model. We might
> tweak the current one a bit. And if that means that some architectures get
> heavier barriers than they strictly need, then so be it. There are two
> overriding concerns:
>
> - sanity: maybe it's better to have one mb() that is a sledgehammer but
> obvious, than it is to have many subtle variations that are just asking
> for subtle bugs.
NO NO NO. Look at what actually happens now:
void init_bh(int nr, void (*routine)(void))
{
bh_base[nr] = routine;
mb();
}
Now, what is this mb() for? Are you sure?
If we can come up with a better fit between the macros and what the
code is actually trying to do, we win, even if they all map to the
same thing *today*. While we're there, if we can get something that
fits with different architectures, great.
Clearer?
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
2002-05-09 11:33 Memory Barrier Definitions Manfred Spraul
2002-05-09 19:38 ` Dave Engebretsen
-- strict thread matches above, loose matches on Subject: below --
2002-05-07 19:07 Dave Engebretsen
2002-05-07 19:49 ` Alan Cox
2002-05-07 19:53 ` Dave Engebretsen
2002-05-07 20:27 ` Alan Cox
2002-05-07 21:23 ` Dave Engebretsen
2002-05-07 22:15 ` justincarlson
2002-05-08 2:49 ` Dave Engebretsen
2002-05-08 13:54 ` Justin Carlson
2002-05-08 15:27 ` Dave Engebretsen
2002-05-08 15:49 ` Andi Kleen
2002-05-08 17:07 ` David Mosberger
2002-05-09 7:36 ` Rusty Russell
2002-05-09 8:01 ` Keith Owens
2002-05-09 15:00 ` David Mosberger
2002-05-13 3:26 ` Rusty Russell
2002-05-13 16:36 ` David Mosberger
2002-05-13 16:50 ` Linus Torvalds
2002-05-13 17:53 ` David Mosberger
2002-05-13 23:28 ` Rusty Russell
2002-05-07 22:57 ` Anton Blanchard
2002-05-13 18:16 ` Jesse Barnes