* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 17:40 David Howells
@ 2006-03-07 10:34 ` Andi Kleen
  2006-03-07 18:30   ` David Howells
  2006-03-07 17:47 ` Stephen Hemminger
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 61+ messages in thread
From: Andi Kleen @ 2006-03-07 10:34 UTC (permalink / raw)
  To: David Howells
  Cc: torvalds, akpm, mingo, linux-arch, linuxppc64-dev, linux-kernel

On Tuesday 07 March 2006 18:40, David Howells wrote:

> +Older and less complex CPUs will perform memory accesses in exactly the order
> +specified, so if one is given the following piece of code:
> +
> +	a = *A;
> +	*B = b;
> +	c = *C;
> +	d = *D;
> +	*E = e;
> +
> +It can be guaranteed that it will complete the memory access for each
> +instruction before moving on to the next line, leading to a definite sequence
> +of operations on the bus:

Actually gcc is free to reorder it 
(often it will not when it cannot prove that they don't alias, but sometimes
it can)

> +
> +     Consider, for example, an ethernet chipset such as the AMD PCnet32. It
> +     presents to the CPU an "address register" and a bunch of "data registers".
> +     The way it's accessed is to write the index of the internal register you
> +     want to access to the address register, and then read or write the
> +     appropriate data register to access the chip's internal register:
> +
> +	*ADR = ctl_reg_3;
> +	reg = *DATA;

You're not supposed to do it this way anyways. The official way to access
MMIO space is using read/write[bwlq]

Haven't read all of it sorry, but thanks for the work of documenting 
it.

-Andi


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 18:30   ` David Howells
@ 2006-03-07 11:13     ` Andi Kleen
  2006-03-07 19:24       ` David Howells
  2006-03-07 18:46     ` Jesse Barnes
  2006-03-07 19:23     ` Bryan O'Sullivan
  2 siblings, 1 reply; 61+ messages in thread
From: Andi Kleen @ 2006-03-07 11:13 UTC (permalink / raw)
  To: David Howells
  Cc: torvalds, akpm, mingo, linux-arch, linuxppc64-dev, linux-kernel

On Tuesday 07 March 2006 19:30, David Howells wrote:

> > You're not supposed to do it this way anyways. The official way to access
> > MMIO space is using read/write[bwlq]
> 
> True, I suppose. I should make it clear that these accessor functions imply
> memory barriers, if indeed they do, 

I don't think they do.

> and that you should use them rather than 
> accessing I/O registers directly (at least, outside the arch you should).

Even inside the architecture it's a good idea.

-Andi


* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 19:23     ` Bryan O'Sullivan
@ 2006-03-07 11:57       ` Andi Kleen
  2006-03-07 20:01         ` Jesse Barnes
                           ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Andi Kleen @ 2006-03-07 11:57 UTC (permalink / raw)
  To: Bryan O'Sullivan
  Cc: David Howells, torvalds, akpm, mingo, linux-arch, linuxppc64-dev,
	linux-kernel

On Tuesday 07 March 2006 20:23, Bryan O'Sullivan wrote:
> On Tue, 2006-03-07 at 18:30 +0000, David Howells wrote:
> 
> > True, I suppose. I should make it clear that these accessor functions imply
> > memory barriers, if indeed they do,
> 
> They don't, but according to Documentation/DocBook/deviceiobook.tmpl
> they are performed by the compiler in the order specified.

I don't think that's correct. Probably the documentation should
be fixed.

-Andi

* [PATCH] Document Linux's memory barriers
@ 2006-03-07 17:40 David Howells
  2006-03-07 10:34 ` Andi Kleen
                   ` (6 more replies)
  0 siblings, 7 replies; 61+ messages in thread
From: David Howells @ 2006-03-07 17:40 UTC (permalink / raw)
  To: torvalds, akpm, mingo; +Cc: linux-arch, linuxppc64-dev, linux-kernel


The attached patch documents the Linux kernel's memory barriers.

Signed-Off-By: David Howells <dhowells@redhat.com>
---
warthog>diffstat -p1 mb.diff 
 Documentation/memory-barriers.txt |  359 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 359 insertions(+)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
new file mode 100644
index 0000000..c2fc51b
--- /dev/null
+++ b/Documentation/memory-barriers.txt
@@ -0,0 +1,359 @@
+			 ============================
+			 LINUX KERNEL MEMORY BARRIERS
+			 ============================
+
+Contents:
+
+ (*) What are memory barriers?
+
+ (*) Linux kernel memory barrier functions.
+
+ (*) Implied kernel memory barriers.
+
+ (*) i386 and x86_64 arch specific notes.
+
+
+=========================
+WHAT ARE MEMORY BARRIERS?
+=========================
+
+Memory barriers are instructions to both the compiler and the CPU to impose a
+partial ordering between the memory access operations specified either side of
+the barrier.
+
+Older and less complex CPUs will perform memory accesses in exactly the order
+specified, so if one is given the following piece of code:
+
+	a = *A;
+	*B = b;
+	c = *C;
+	d = *D;
+	*E = e;
+
+It can be guaranteed that it will complete the memory access for each
+instruction before moving on to the next line, leading to a definite sequence
+of operations on the bus:
+
+	read *A, write *B, read *C, read *D, write *E.
+
+However, with newer and more complex CPUs, this isn't always true because:
+
+ (*) they can rearrange the order of the memory accesses to promote better use
+     of the CPU buses and caches;
+
+ (*) reads are synchronous and may need to be done immediately to permit
+     progress, whereas writes can often be deferred without a problem;
+
+ (*) and they are able to combine reads and writes to improve performance when
+     talking to the SDRAM (modern SDRAM chips can do batched accesses of
+     adjacent locations, cutting down on transaction setup costs).
+
+So what you might actually get from the above piece of code is:
+
+	read *A, read *C+*D, write *E, write *B
+
+Under normal operation, this is probably not going to be a problem; however,
+there are two circumstances where it definitely _can_ be a problem:
+
+ (1) I/O
+
+     Many I/O devices can be memory mapped, and so appear to the CPU as if
+     they're just memory locations. However, to control the device, the driver
+     has to make the right accesses in exactly the right order.
+
+     Consider, for example, an ethernet chipset such as the AMD PCnet32. It
+     presents to the CPU an "address register" and a bunch of "data registers".
+     The way it's accessed is to write the index of the internal register you
+     want to access to the address register, and then read or write the
+     appropriate data register to access the chip's internal register:
+
+	*ADR = ctl_reg_3;
+	reg = *DATA;
+
+     The problem with a clever CPU or a clever compiler is that the write to
+     the address register isn't guaranteed to happen before the access to the
+     data register, if the CPU or the compiler thinks it is more efficient to
+     defer the address write:
+
+	read *DATA, write *ADR
+
+     then things will break.
+
+     The way to deal with this is to insert an I/O memory barrier between the
+     two accesses:
+
+	*ADR = ctl_reg_3;
+	mb();
+	reg = *DATA;
+
+     In this case, the barrier makes a guarantee that all memory accesses
+     before the barrier will happen before all the memory accesses after the
+     barrier. It does _not_ guarantee that all memory accesses before the
+     barrier will be complete by the time the barrier is complete.
+
+ (2) Multiprocessor interaction
+
+     When a system has more than one processor, the CPUs may be working on
+     the same set of data while trying to avoid locks, as locks are quite
+     expensive. This means that accesses that affect both CPUs may have to
+     be carefully ordered to prevent errors.
+
+     Consider the R/W semaphore slow path. In that, a waiting process is
+     queued on the semaphore, as noted by it having a record on its stack
+     linked to the semaphore's list:
+
+	struct rw_semaphore {
+		...
+		struct list_head waiters;
+	};
+
+	struct rwsem_waiter {
+		struct list_head list;
+		struct task_struct *task;
+	};
+
+     To wake up the waiter, up_read() or up_write() has to read the pointer
+     from this record to find out where the next waiter record is, clear the
+     task pointer, call wake_up_process() on the task, and release the
+     reference held on the task struct:
+
+	READ waiter->list.next;
+	READ waiter->task;
+	WRITE waiter->task;
+	CALL wakeup
+	RELEASE task
+
+     If any of these steps occur out of order, then the whole thing may fail.
+
+     Note that the waiter does not get the semaphore lock again - it just waits
+     for its task pointer to be cleared. Since the record is on its stack, this
+     means that if the task pointer is cleared _before_ the next pointer in the
+     list is read, another CPU might resume the waiting process, which might
+     then clobber its stack before the up*() function has had a chance to
+     read the next pointer.
+
+	CPU 0				CPU 1
+	===============================	===============================
+					down_xxx()
+					Queue waiter
+					Sleep
+	up_yyy()
+	READ waiter->task;
+	WRITE waiter->task;
+	<preempt>
+					Resume processing
+					down_xxx() returns
+					call foo()
+					foo() clobbers *waiter
+	</preempt>
+	READ waiter->list.next;
+	--- OOPS ---
+
+     This could be dealt with using a spinlock, but then the down_xxx()
+     function has to get the spinlock again after it's been woken up, which is
+     a waste of resources.
+
+     The way to deal with this is to insert an SMP memory barrier:
+
+	READ waiter->list.next;
+	READ waiter->task;
+	smp_mb();
+	WRITE waiter->task;
+	CALL wakeup
+	RELEASE task
+
+     In this case, the barrier makes a guarantee that all memory accesses
+     before the barrier will happen before all the memory accesses after the
+     barrier. It does _not_ guarantee that all memory accesses before the
+     barrier will be complete by the time the barrier is complete.
+
+     SMP memory barriers are normally no-ops on a UP system because the CPU
+     orders overlapping accesses with respect to itself.
+
+
+=====================================
+LINUX KERNEL MEMORY BARRIER FUNCTIONS
+=====================================
+
+The Linux kernel has six basic memory barriers:
+
+		MANDATORY (I/O)	SMP
+		===============	================
+	GENERAL	mb()		smp_mb()
+	READ	rmb()		smp_rmb()
+	WRITE	wmb()		smp_wmb()
+
+General memory barriers make a guarantee that all memory accesses specified
+before the barrier will happen before all memory accesses specified after the
+barrier.
+
+Read memory barriers make a guarantee that all memory reads specified before
+the barrier will happen before all memory reads specified after the barrier.
+
+Write memory barriers make a guarantee that all memory writes specified before
+the barrier will happen before all memory writes specified after the barrier.
+
+SMP memory barriers are no-ops on uniprocessor compiled systems because it is
+assumed that a CPU will be self-consistent, and will order overlapping accesses
+with respect to itself.
+
+There is no guarantee that any of the memory accesses specified before a memory
+barrier will be complete by the completion of a memory barrier; the barrier can
+be considered to draw a line in the access queue that accesses of the
+appropriate type may not cross.
+
+There is no guarantee that issuing a memory barrier on one CPU will have any
+direct effect on another CPU or any other hardware in the system. The indirect
+effect will be the order in which the first CPU commits its accesses to the bus.
+
+Note that these are the _minimum_ guarantees. Different architectures may give
+more substantial guarantees, but they may not be relied upon outside of arch
+specific code.
+
+
+There are some more advanced barriering functions:
+
+ (*) set_mb(var, value)
+ (*) set_wmb(var, value)
+
+     These assign the value to the variable and then insert at least a write
+     barrier after it, depending on the function.
+
+
+==============================
+IMPLIED KERNEL MEMORY BARRIERS
+==============================
+
+Some of the other functions in the Linux kernel imply memory barriers. For
+instance, all of the following (pseudo-)locking functions imply barriers:
+
+ (*) interrupt disablement and/or interrupts
+ (*) spin locks
+ (*) R/W spin locks
+ (*) mutexes
+ (*) semaphores
+ (*) R/W semaphores
+
+In all cases there are variants on a LOCK operation and an UNLOCK operation.
+
+ (*) LOCK operation implication:
+
+     Memory accesses issued after the LOCK will be completed after the LOCK
+     accesses have completed.
+
+     Memory accesses issued before the LOCK may be completed after the LOCK
+     accesses have completed.
+
+ (*) UNLOCK operation implication:
+
+     Memory accesses issued before the UNLOCK will be completed before the
+     UNLOCK accesses have completed.
+
+     Memory accesses issued after the UNLOCK may be completed before the UNLOCK
+     accesses have completed.
+
+ (*) LOCK vs UNLOCK implication:
+
+     The LOCK accesses will be completed before the UNLOCK accesses.
+
+Locks and semaphores may not provide any guarantee of ordering on UP compiled
+systems, and so can't be counted on in such a situation to actually do
+anything at all, especially with respect to I/O memory barriering.
+
+Either interrupt disablement (LOCK) and enablement (UNLOCK) will barrier
+memory and I/O accesses individually, or interrupt handling will barrier
+memory and I/O accesses on entry and on exit. This prevents an interrupt
+routine interfering with accesses made in a disabled-interrupt section of code
+and vice versa.
+
+This specification is a _minimum_ guarantee; any particular architecture may
+provide more substantial guarantees, but these may not be relied upon outside
+of arch specific code.
+
+
+As an example, consider the following:
+
+	*A = a;
+	*B = b;
+	LOCK
+	*C = c;
+	*D = d;
+	UNLOCK
+	*E = e;
+	*F = f;
+
+The following sequence of events on the bus is acceptable:
+
+	LOCK, *F+*A, *E, *C+*D, *B, UNLOCK
+
+But none of the following are:
+
+	*F+*A, *B,	LOCK, *C, *D,	UNLOCK, *E
+	*A, *B, *C,	LOCK, *D,	UNLOCK, *E, *F
+	*A, *B,		LOCK, *C,	UNLOCK, *D, *E, *F
+	*B,		LOCK, *C, *D,	UNLOCK, *F+*A, *E
+
+
+Consider also the following (going back to the AMD PCnet example):
+
+	DISABLE IRQ
+	*ADR = ctl_reg_3;
+	mb();
+	x = *DATA;
+	*ADR = ctl_reg_4;
+	mb();
+	*DATA = y;
+	*ADR = ctl_reg_5;
+	mb();
+	z = *DATA;
+	ENABLE IRQ
+	<interrupt>
+	*ADR = ctl_reg_7;
+	mb();
+	q = *DATA;
+	</interrupt>
+
+What's to stop "z = *DATA" crossing "*ADR = ctl_reg_7" and reading from the
+wrong register? (There's no guarantee that the process of handling an
+interrupt will barrier memory accesses in any way.)
+
+
+==============================
+I386 AND X86_64 SPECIFIC NOTES
+==============================
+
+Earlier i386 CPUs (pre-Pentium-III) are fully ordered - the operations on the
+bus appear in program order - and so there's no requirement for any sort of
+explicit memory barriers.
+
+Later CPUs add three memory barrier instructions (SFENCE from the Pentium-III,
+LFENCE and MFENCE from the Pentium 4), which correspond to the kernel memory
+barrier functions wmb(), rmb() and mb(). However, there are additional
+implicit memory barriers in the CPU implementation:
+
+ (*) Interrupt processing implies mb().
+
+ (*) The LOCK prefix adds implication of mb() on whatever instruction it is
+     attached to.
+
+ (*) Normal writes to memory imply wmb() [and so SFENCE is normally not
+     required].
+
+ (*) Normal writes imply a semi-rmb(): reads before a write may not complete
+     after that write, but reads after a write may complete before the write
+     (i.e. reads may go _ahead_ of writes).
+
+ (*) Non-temporal writes imply no memory barrier, and are the intended target
+     of SFENCE.
+
+ (*) Accesses to uncached memory imply mb() [eg: memory mapped I/O].
+
+
+======================
+POWERPC SPECIFIC NOTES
+======================
+
+PowerPC is weakly ordered, and its read and write accesses may generally be
+completed in any order. Its memory barriers are also, to some extent, more
+substantial than the minimum requirement, and may directly affect hardware
+outside of the CPU.

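The rwsem wakeup ordering described in the patch above can be illustrated with a userspace C11 sketch. It is a minimal illustration under stated assumptions, not the kernel's rwsem code: struct waiter, wake_waiter() and waiter_released() are hypothetical names, and atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb(). The waker copies everything it needs out of the on-stack record before clearing the task pointer; the sleeper treats a NULL task pointer as permission to return and reuse its stack.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for struct rwsem_waiter. */
struct waiter {
	struct waiter *next;	/* plays the role of list.next */
	_Atomic(void *) task;	/* non-NULL while the owner is asleep */
};

/* Waker: take everything needed from the record, which lives on the
 * sleeper's stack, before clearing task. Returns the task to wake. */
static void *wake_waiter(struct waiter *w, struct waiter **next_out)
{
	struct waiter *next = w->next;	/* READ waiter->list.next */
	void *task = atomic_load_explicit(&w->task, memory_order_relaxed);

	assert(task != NULL);
	/* stand-in for smp_mb(): orders the reads above before the store */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&w->task, NULL, memory_order_relaxed);

	*next_out = next;	/* copied before the record was released */
	return task;		/* caller would wake it and drop its reference */
}

/* Sleeper: once this returns nonzero it may return and reuse its stack. */
static int waiter_released(struct waiter *w)
{
	return atomic_load_explicit(&w->task, memory_order_acquire) == NULL;
}
```

The acquire load in waiter_released() pairs with the fence in wake_waiter(), so once the sleeper sees NULL, the waker's earlier reads of the record are guaranteed to have happened first.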
* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 17:40 David Howells
  2006-03-07 10:34 ` Andi Kleen
@ 2006-03-07 17:47 ` Stephen Hemminger
  2006-03-07 18:40 ` Alan Cox
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 61+ messages in thread
From: Stephen Hemminger @ 2006-03-07 17:47 UTC (permalink / raw)
  To: linux-kernel

This has been needed for quite some time but needs some more
additions:

1) Access to I/O mapped memory does not need memory barriers.

2) Explain difference between mb() and barrier().

3) Explain wmb() versus mmiowb()

Give some more examples of correct usage in drivers.
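Point 2, barrier() versus mb(), can be shown with a minimal userspace sketch. Assumptions: barrier() is spelled as the GCC empty-asm compiler barrier, mb() is approximated by a C11 sequentially consistent fence, and stop_requested and poll_until_stopped() are hypothetical. The plain int flag mirrors kernel style; portable userspace code shared between threads would make it atomic.

```c
#include <assert.h>
#include <stdatomic.h>

/* barrier(): a pure compiler barrier. It emits no instruction; it only
 * stops the compiler from keeping memory values in registers or moving
 * memory accesses across it. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* mb(): approximated by a C11 sequentially consistent fence, which also
 * constrains the CPU, not just the compiler. */
#define mb() atomic_thread_fence(memory_order_seq_cst)

/* Kernel-style plain flag, assumed to be set by some other context. */
static int stop_requested;

/* Without barrier(), the compiler may load stop_requested once and test
 * the cached copy on every iteration; barrier() forces a fresh load. */
static unsigned long poll_until_stopped(unsigned long max_spins)
{
	unsigned long spins = 0;

	while (!stop_requested && spins < max_spins) {
		spins++;
		barrier();
	}
	return spins;
}
```

barrier() is enough when only the compiler could reorder or cache the accesses; mb() is needed when another CPU must observe the order.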

* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 10:34 ` Andi Kleen
@ 2006-03-07 18:30   ` David Howells
  2006-03-07 11:13     ` Andi Kleen
                       ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: David Howells @ 2006-03-07 18:30 UTC (permalink / raw)
  To: Andi Kleen
  Cc: David Howells, torvalds, akpm, mingo, linux-arch, linuxppc64-dev,
	linux-kernel

Andi Kleen <ak@suse.de> wrote:

> Actually gcc is free to reorder it 
> (often it will not when it cannot prove that they don't alias, but sometimes
> it can)

Yeah... I have mentioned the fact that compilers can reorder too, but
obviously not enough.

> You're not supposed to do it this way anyways. The official way to access
> MMIO space is using read/write[bwlq]

True, I suppose. I should make it clear that these accessor functions imply
memory barriers, if indeed they do, and that you should use them rather than
accessing I/O registers directly (at least, outside the arch you should).

David

* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 17:40 David Howells
  2006-03-07 10:34 ` Andi Kleen
  2006-03-07 17:47 ` Stephen Hemminger
@ 2006-03-07 18:40 ` Alan Cox
  2006-03-07 18:54     ` linux-os (Dick Johnson)
  2006-03-07 20:09   ` David Howells
  2006-03-08  2:07 ` Nick Piggin
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 61+ messages in thread
From: Alan Cox @ 2006-03-07 18:40 UTC (permalink / raw)
  To: David Howells
  Cc: torvalds, akpm, mingo, linux-arch, linuxppc64-dev, linux-kernel

On Maw, 2006-03-07 at 17:40 +0000, David Howells wrote:
> +Older and less complex CPUs will perform memory accesses in exactly the order
> +specified, so if one is given the following piece of code:

Not really true. Some of the fairly old dumb processors don't do this to
the bus, and just about anything with a cache won't (as it'll burst cache
lines to main memory)

> +     want to access to the address register, and then read or write the
> +     appropriate data register to access the chip's internal register:
> +
> +	*ADR = ctl_reg_3;
> +	reg = *DATA;

Not allowed anyway

> +     In this case, the barrier makes a guarantee that all memory accesses
> +     before the barrier will happen before all the memory accesses after the
> +     barrier. It does _not_ guarantee that all memory accesses before the
> +     barrier will be complete by the time the barrier is complete.

Better meaningful example would be barriers versus an IRQ handler. Which
leads nicely onto section 2

> +General memory barriers make a guarantee that all memory accesses specified
> +before the barrier will happen before all memory accesses specified after the
> +barrier.

No. They guarantee that to an observer also running on that set of
processors the accesses to main memory will appear to be ordered in that
manner. They don't guarantee I/O related ordering for non main memory
due to things like PCI posting rules and NUMA goings on.

As an example of the difference here a Geode will reorder stores as it
feels but snoop the bus such that it can ensure an external bus master
cannot observe this by holding it off the bus to fix up ordering
violations first.

> +Read memory barriers make a guarantee that all memory reads specified before
> +the barrier will happen before all memory reads specified after the barrier.
> +
> +Write memory barriers make a guarantee that all memory writes specified before
> +the barrier will happen before all memory writes specified after the barrier.

Both with the caveat above

> +There is no guarantee that any of the memory accesses specified before a memory
> +barrier will be complete by the completion of a memory barrier; the barrier can
> +be considered to draw a line in the access queue that accesses of the
> +appropriate type may not cross.

CPU generated accesses to main memory

> + (*) interrupt disablement and/or interrupts
> + (*) spin locks
> + (*) R/W spin locks
> + (*) mutexes
> + (*) semaphores
> + (*) R/W semaphores

Should probably cover schedule() here.

> +Locks and semaphores may not provide any guarantee of ordering on UP compiled
> +systems, and so can't be counted on in such a situation to actually do
> +anything at all, especially with respect to I/O memory barriering.

_irqsave/_irqrestore ...
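The LOCK and UNLOCK rules quoted above correspond to acquire and release ordering. A minimal userspace sketch using a C11 atomic_flag; sketch_spinlock_t and its functions are hypothetical, and since they do not disable interrupts they say nothing about the _irqsave/_irqrestore variants.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
	atomic_flag held;
} sketch_spinlock_t;

#define SKETCH_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static bool sketch_trylock(sketch_spinlock_t *l)
{
	/* acquire: later accesses cannot be hoisted above a successful LOCK */
	return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void sketch_lock(sketch_spinlock_t *l)
{
	while (!sketch_trylock(l))
		;	/* spin; real code would relax the CPU between attempts */
}

static void sketch_unlock(sketch_spinlock_t *l)
{
	/* release: earlier accesses cannot sink below the UNLOCK */
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Both barriers are one-way, which is why accesses from outside the critical section may still drift into it, as the patch describes.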


> +==============================
> +I386 AND X86_64 SPECIFIC NOTES
> +==============================
> +
> +Earlier i386 CPUs (pre-Pentium-III) are fully ordered - the operations on the
> +bus appear in program order - and so there's no requirement for any sort of
> +explicit memory barriers.

Actually they are not. Processors prior to Pentium Pro ensure that the
perceived ordering between processors of writes to main memory is
preserved. The Pentium Pro is supposed to but does not in SMP cases. Our
spin_unlock code knows about this. It also has some problems with this
situation when handling write combining memory. The IDT Winchip series
processors are run in out of order store mode and our lock functions and
dmamappers should know enough about this. 

On x86, memory barriers for reads serialize order using lock instructions;
for writes, the WinChip at least generates serializing instructions.

barrier() is pure CPU level of course
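For reference, the x86 barriers can be spelled with GCC inline asm roughly as follows. This is a sketch, not the kernel's header: the fence mnemonics are real instructions, the 32-bit branch uses a LOCK-prefixed add to the stack as a full barrier, and other architectures fall back to a GCC builtin. try_enter() is a hypothetical Dekker-style check showing why a full mb() is needed when a later read must not go ahead of an earlier write. Whether rmb() and wmb() really need LFENCE and SFENCE depends on the CPU and on the clone and errata issues raised here.

```c
#include <assert.h>

#if defined(__x86_64__)
#define mb()  __asm__ __volatile__("mfence" ::: "memory")
#define rmb() __asm__ __volatile__("lfence" ::: "memory")
#define wmb() __asm__ __volatile__("sfence" ::: "memory")
#elif defined(__i386__)
/* A LOCKed read-modify-write of the stack top acts as a full barrier
 * and works on CPUs without SSE2. */
#define mb()  __asm__ __volatile__("lock; addl $0,0(%%esp)" ::: "memory", "cc")
#define rmb() mb()
#define wmb() mb()
#else
#define mb()  __atomic_thread_fence(__ATOMIC_SEQ_CST)
#define rmb() mb()
#define wmb() mb()
#endif

/* Kernel-style plain variables; portable userspace code would use atomics.
 * x86 may let the load of *theirs go ahead of the store to *mine, so a
 * full mb() is needed between them; wmb() would not be enough. */
static int try_enter(int *mine, const int *theirs)
{
	*mine = 1;
	mb();
	return *theirs == 0;
}
```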

> + (*) Normal writes to memory imply wmb() [and so SFENCE is normally not
> +     required].

Only at an on processor level and not for all clones, also there are
errata here for PPro.

> + (*) Accesses to uncached memory imply mb() [eg: memory mapped I/O].

Not always. MMIO ordering is outside of the CPU ordering rules and into
PCI and other bus ordering rules. Consider

	writel(STOP_DMA, &foodev->ctrl);
	free_dma_buffers(foodev);

This leads to horrible disasters.
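The example hinges on PCI write posting. A minimal sketch of the usual remedy, reading back a register on the same device so the posted write is pushed out before the buffers are freed. Everything here is hypothetical: struct foodev, STOP_DMA, foodev_stop_and_free(), sketch_writel() and sketch_readl() stand in for a real device and for writel()/readl(), and ordinary memory stands in for an ioremap()ed register, so it only shows the call pattern. The discarded read is still a volatile access, which C treats as a side effect the compiler must perform. A real driver may also need to wait for the device to confirm that DMA has actually stopped.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel accessors. */
static inline void sketch_writel(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
}

static inline uint32_t sketch_readl(const volatile uint32_t *addr)
{
	return *addr;
}

#define STOP_DMA 0x1u		/* hypothetical control-register bit */

struct foodev {			/* hypothetical device */
	volatile uint32_t ctrl;
	void *dma_buf;
};

static void foodev_stop_and_free(struct foodev *dev)
{
	sketch_writel(STOP_DMA, &dev->ctrl);
	/* The write may sit in a PCI bridge. Reading a register on the same
	 * device forces it out before the buffers are freed. */
	(void)sketch_readl(&dev->ctrl);
	free(dev->dma_buf);
	dev->dma_buf = NULL;
}
```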


> +
> +======================
> +POWERPC SPECIFIC NOTES

Can't comment on PPC 


* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 18:30   ` David Howells
  2006-03-07 11:13     ` Andi Kleen
@ 2006-03-07 18:46     ` Jesse Barnes
  2006-03-07 19:23     ` Bryan O'Sullivan
  2 siblings, 0 replies; 61+ messages in thread
From: Jesse Barnes @ 2006-03-07 18:46 UTC (permalink / raw)
  To: David Howells
  Cc: Andi Kleen, torvalds, akpm, mingo, linux-arch, linuxppc64-dev,
	linux-kernel

On Tuesday, March 7, 2006 10:30 am, David Howells wrote:
> True, I suppose. I should make it clear that these accessor functions
> imply memory barriers, if indeed they do, and that you should use them
> rather than accessing I/O registers directly (at least, outside the
> arch you should).

But they don't, that's why we have mmiowb().  There are lots of cases to 
handle:
  1) memory vs. memory
  2) memory vs. I/O
  3) I/O vs. I/O
(reads and writes for every case).

AFAIK, we have (1) fairly well handled with a plethora of barrier ops.  
(2) is a bit fuzzy with the current operations I think, and for (3) all 
we have is mmiowb() afaik.  Maybe one of the ppc64 guys can elaborate on 
the barriers their hw needs for the above cases (I think they're the 
pathological case, so covering them should be good enough for everybody).

Btw, thanks for putting together this documentation, it's desperately 
needed.

Jesse

* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 18:40 ` Alan Cox
@ 2006-03-07 18:54     ` linux-os (Dick Johnson)
  2006-03-07 20:09   ` David Howells
  1 sibling, 0 replies; 61+ messages in thread
From: linux-os (Dick Johnson) @ 2006-03-07 18:54 UTC (permalink / raw)
  To: Alan Cox
  Cc: David Howells, torvalds, akpm, mingo, linux-arch, linuxppc64-dev,
	Linux kernel


On Tue, 7 Mar 2006, Alan Cox wrote:
[SNIPPED...]
>
> Not always. MMIO ordering is outside of the CPU ordering rules and into
> PCI and other bus ordering rules. Consider
>
> 	writel(STOP_DMA, &foodev->ctrl);
> 	free_dma_buffers(foodev);
>
> This leads to horrible disasters.

This might be a good place to document:
    dummy = readl(&foodev->ctrl);

Will flush all pending writes to the PCI bus and that:
    (void) readl(&foodev->ctrl);
... won't because `gcc` may optimize it away. In fact, variable
"dummy" should be global or `gcc` may make it go away as well.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.15.4 on an i686 machine (5589.50 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.

****************************************************************
The information transmitted in this message is confidential and may be privileged.  Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited.  If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to DeliveryErrors@analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 18:54     ` linux-os (Dick Johnson)
  (?)
@ 2006-03-07 19:06     ` Matthew Wilcox
  2006-03-07 19:15         ` linux-os (Dick Johnson)
  -1 siblings, 1 reply; 61+ messages in thread
From: Matthew Wilcox @ 2006-03-07 19:06 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Alan Cox, David Howells, torvalds, akpm, mingo, linux-arch,
	linuxppc64-dev, Linux kernel

On Tue, Mar 07, 2006 at 01:54:33PM -0500, linux-os (Dick Johnson) wrote:
> This might be a good place to document:
>     dummy = readl(&foodev->ctrl);
> 
> Will flush all pending writes to the PCI bus and that:
>     (void) readl(&foodev->ctrl);
> ... won't because `gcc` may optimize it away. In fact, variable
> "dummy" should be global or `gcc` may make it go away as well.

static inline unsigned int readl(const volatile void __iomem *addr)
{
	return *(volatile unsigned int __force *) addr;
}

The cast is volatile, so gcc knows not to optimise it away.

* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 19:06     ` Matthew Wilcox
@ 2006-03-07 19:15         ` linux-os (Dick Johnson)
  0 siblings, 0 replies; 61+ messages in thread
From: linux-os (Dick Johnson) @ 2006-03-07 19:15 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Alan Cox, David Howells, torvalds, akpm, mingo, linux-arch,
	linuxppc64-dev, Linux kernel


On Tue, 7 Mar 2006, Matthew Wilcox wrote:

> On Tue, Mar 07, 2006 at 01:54:33PM -0500, linux-os (Dick Johnson) wrote:
>> This might be a good place to document:
>>     dummy = readl(&foodev->ctrl);
>>
>> Will flush all pending writes to the PCI bus and that:
>>     (void) readl(&foodev->ctrl);
>> ... won't because `gcc` may optimize it away. In fact, variable
>> "dummy" should be global or `gcc` may make it go away as well.
>
> static inline unsigned int readl(const volatile void __iomem *addr)
> {
> 	return *(volatile unsigned int __force *) addr;
> }
>
> The cast is volatile, so gcc knows not to optimise it away.
>

When the result is not assigned (i.e., it is cast to void), or when
it is assigned to an otherwise unused variable, `gcc` does indeed
make it go away. These problems caused weeks of chagrin after it
was found that a PCI DMA operation took 20 or more times longer
than it should. The writel(START_DMA, &control), followed by
a dummy = readl(&control), ended up with the readl() missing.
That meant that the DMA didn't start until some timer code
read a status register, wondering why it hadn't completed yet.


Cheers,
Dick Johnson
Penguin : Linux version 2.6.15.4 on an i686 machine (5589.50 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.
_
\x1a\x04

****************************************************************
The information transmitted in this message is confidential and may be privileged.  Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited.  If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to DeliveryErrors@analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Document Linux's memory barriers
@ 2006-03-07 19:15         ` linux-os (Dick Johnson)
  0 siblings, 0 replies; 61+ messages in thread
From: linux-os (Dick Johnson) @ 2006-03-07 19:15 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Alan Cox, David Howells, torvalds, akpm, mingo, linux-arch,
	linuxppc64-dev, Linux kernel


On Tue, 7 Mar 2006, Matthew Wilcox wrote:

> On Tue, Mar 07, 2006 at 01:54:33PM -0500, linux-os (Dick Johnson) wrote:
>> This might be a good place to document:
>>     dummy = readl(&foodev->ctrl);
>>
>> Will flush all pending writes to the PCI bus and that:
>>     (void) readl(&foodev->ctrl);
>> ... won't because `gcc` may optimize it away. In fact, variable
>> "dummy" should be global or `gcc` may make it go away as well.
>
> static inline unsigned int readl(const volatile void __iomem *addr)
> {
> 	return *(volatile unsigned int __force *) addr;
> }
>
> The cast is volatile, so gcc knows not to optimise it away.
>

When the result is not assigned (i.e., it is cast to void), or when
it is assigned to an otherwise unused variable, `gcc` does indeed
make it go away. These problems caused weeks of chagrin after it was
found that a PCI DMA operation took 20 or more times longer than it
should. The writel(START_DMA, &control), followed by a
dummy = readl(&control), ended up with the readl() missing.
That meant that the DMA didn't start until some timer code
read a status register, wondering why it hadn't completed yet.


Cheers,
Dick Johnson
Penguin : Linux version 2.6.15.4 on an i686 machine (5589.50 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 18:30   ` David Howells
  2006-03-07 11:13     ` Andi Kleen
  2006-03-07 18:46     ` Jesse Barnes
@ 2006-03-07 19:23     ` Bryan O'Sullivan
  2006-03-07 11:57       ` Andi Kleen
  2 siblings, 1 reply; 61+ messages in thread
From: Bryan O'Sullivan @ 2006-03-07 19:23 UTC (permalink / raw)
  To: David Howells
  Cc: Andi Kleen, torvalds, akpm, mingo, linux-arch, linuxppc64-dev,
	linux-kernel

On Tue, 2006-03-07 at 18:30 +0000, David Howells wrote:

> True, I suppose. I should make it clear that these accessor functions imply
> memory barriers, if indeed they do,

They don't, but according to Documentation/DocBook/deviceiobook.tmpl
they are performed by the compiler in the order specified.

They also convert between PCI byte order and CPU byte order.  If you
want to avoid that, you need the __raw_* versions, which are not
guaranteed to be provided by all arches.

	<b


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 11:13     ` Andi Kleen
@ 2006-03-07 19:24       ` David Howells
  2006-03-07 19:46         ` Stephen Hemminger
  0 siblings, 1 reply; 61+ messages in thread
From: David Howells @ 2006-03-07 19:24 UTC (permalink / raw)
  To: Andi Kleen, Stephen Hemminger, Jesse Barnes
  Cc: David Howells, torvalds, akpm, mingo, linux-arch, linuxppc64-dev,
	linux-kernel

Andi Kleen <ak@suse.de> wrote:

> > > You're not supposed to do it this way anyways. The official way to access
> > > MMIO space is using read/write[bwlq]
> > 
> > True, I suppose. I should make it clear that these accessor functions imply
> > memory barriers, if indeed they do, 
> 
> I don't think they do.

Hmmm.. Seems Stephen Hemminger disagrees:

| > > 1) Access to i/o mapped memory does not need memory barriers.
| > 
| > There's no guarantee of that. On FRV you have to insert barriers as
| > appropriate when you're accessing I/O mapped memory if ordering is required
| > (accessing an ethernet card vs accessing a frame buffer), but support for
| > inserting the appropriate barriers is built into gcc - which knows the rules
| > for when to insert them.
| > 
| > Or are you referring to the fact that this should be implicit in inX(),
| > outX(), readX(), writeX() and similar?
| 
| yes

David

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 18:54     ` linux-os (Dick Johnson)
  (?)
  (?)
@ 2006-03-07 19:33     ` Alan Cox
  -1 siblings, 0 replies; 61+ messages in thread
From: Alan Cox @ 2006-03-07 19:33 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: David Howells, torvalds, akpm, mingo, linux-arch, linuxppc64-dev,
	Linux kernel

On Tue, 2006-03-07 at 13:54 -0500, linux-os (Dick Johnson) wrote:
> On Tue, 7 Mar 2006, Alan Cox wrote:
> > 	writel(STOP_DMA, &foodev->ctrl);
> > 	free_dma_buffers(foodev);
> >
> > This leads to horrible disasters.
> 
> This might be a good place to document:
>     dummy = readl(&foodev->ctrl);

Absolutely. And this falls outside of the memory barrier functions.
> 
> Will flush all pending writes to the PCI bus and that:
>     (void) readl(&foodev->ctrl);
> ... won't because `gcc` may optimize it away. In fact, variable
> "dummy" should be global or `gcc` may make it go away as well.

If they were ordinary functions then maybe, but they are not so a simple
readl(&foodev->ctrl) will be sufficient and isn't optimised away.
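A user-space sketch of this point, assuming a GCC- or Clang-style C11 compiler. `sim_readl()` and `sim_flush_read()` are hypothetical stand-ins modelled on the readl() quoted earlier in the thread, not the kernel's implementation. Because the load goes through a `volatile`-qualified lvalue, it is an observable side effect under the C standard, so a bare `(void)sim_readl(p)` must still be emitted even though nothing uses the value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for readl(): a 32-bit load through a volatile
 * pointer. The compiler must perform every such access, even when the
 * caller throws the value away. */
static inline uint32_t sim_readl(const volatile void *addr)
{
	/* 32-bit device registers are expected to be naturally aligned. */
	assert(((uintptr_t)addr & 3u) == 0);
	return *(const volatile uint32_t *)addr;
}

/* Flush-style read: no global "dummy" variable is needed, because the
 * volatile access itself cannot be optimised away. */
static inline void sim_flush_read(const volatile void *addr)
{
	(void)sim_readl(addr);
}
```

Looking at the generated assembly (for example with `gcc -O2 -S`) for a caller of `sim_flush_read()` should still show the load.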

Alan


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 19:24       ` David Howells
@ 2006-03-07 19:46         ` Stephen Hemminger
  0 siblings, 0 replies; 61+ messages in thread
From: Stephen Hemminger @ 2006-03-07 19:46 UTC (permalink / raw)
  To: linux-kernel

On Tue, 07 Mar 2006 19:24:03 +0000
David Howells <dhowells@redhat.com> wrote:

> Andi Kleen <ak@suse.de> wrote:
> 
> > > > You're not supposed to do it this way anyways. The official way to access
> > > > MMIO space is using read/write[bwlq]
> > > 
> > > True, I suppose. I should make it clear that these accessor functions imply
> > > memory barriers, if indeed they do, 
> > 
> > I don't think they do.
> 
> Hmmm.. Seems Stephen Hemminger disagrees:
> 
> | > > 1) Access to i/o mapped memory does not need memory barriers.
> | > 
> | > There's no guarantee of that. On FRV you have to insert barriers as
> | > appropriate when you're accessing I/O mapped memory if ordering is required
> | > (accessing an ethernet card vs accessing a frame buffer), but support for
> | > inserting the appropriate barriers is built into gcc - which knows the rules
> | > for when to insert them.
> | > 
> | > Or are you referring to the fact that this should be implicit in inX(),
> | > outX(), readX(), writeX() and similar?
> | 

The problem with all this is like physics it is all relative to the observer.
I get confused and lost when talking about the general case because there are so many possible
specific examples where a barrier is or is not needed.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 11:57       ` Andi Kleen
@ 2006-03-07 20:01         ` Jesse Barnes
  2006-03-07 21:14         ` Bryan O'Sullivan
  2006-03-08  0:35         ` Alan Cox
  2 siblings, 0 replies; 61+ messages in thread
From: Jesse Barnes @ 2006-03-07 20:01 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Bryan O'Sullivan, David Howells, torvalds, akpm, mingo,
	linux-arch, linuxppc64-dev, linux-kernel

On Tuesday, March 7, 2006 3:57 am, Andi Kleen wrote:
> On Tuesday 07 March 2006 20:23, Bryan O'Sullivan wrote:
> > On Tue, 2006-03-07 at 18:30 +0000, David Howells wrote:
> > > True, I suppose. I should make it clear that these accessor
> > > functions imply memory barriers, if indeed they do,
> >
> > They don't, but according to Documentation/DocBook/deviceiobook.tmpl
> > they are performed by the compiler in the order specified.
>
> I don't think that's correct. Probably the documentation should
> be fixed.

On ia64 I'm pretty sure it's true, and it seems like it should be in the 
general case too.  The compiler shouldn't reorder uncached memory 
accesses with volatile semantics...

Jesse

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 18:40 ` Alan Cox
  2006-03-07 18:54     ` linux-os (Dick Johnson)
@ 2006-03-07 20:09   ` David Howells
  2006-03-08  0:32     ` Alan Cox
  2006-03-08  8:25     ` Duncan Sands
  1 sibling, 2 replies; 61+ messages in thread
From: David Howells @ 2006-03-07 20:09 UTC (permalink / raw)
  To: Alan Cox
  Cc: David Howells, torvalds, akpm, mingo, linux-arch, linuxppc64-dev,
	linux-kernel

Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> Better meaningful example would be barriers versus an IRQ handler. Which
> leads nicely onto section 2

Yes, except that I can't think of one that's feasible that doesn't have to do
with I/O - which isn't a problem if you are using the proper accessor
functions.

Such an example has to involve more than one CPU, because you don't tend to
get memory/memory ordering problems on UP.

The obvious one might be circular buffers, except there's no problem there
provided you have a memory barrier between accessing the buffer and updating
your pointer into it.
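The circular-buffer case can be sketched in user space with C11 atomics standing in for the kernel primitives: a release store of the index roughly plays the role of smp_wmb() before the pointer update, and an acquire load plays the role of smp_rmb() after reading it. This assumes a single-producer/single-consumer design, and the names (`struct ring`, `ring_put()`, `ring_get()`, `RING_SIZE`) are hypothetical, not kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8u	/* hypothetical capacity; must be a power of two */

struct ring {
	int buf[RING_SIZE];
	atomic_size_t head;	/* advanced only by the producer */
	atomic_size_t tail;	/* advanced only by the consumer */
};

static void ring_init(struct ring *r)
{
	atomic_init(&r->head, 0);
	atomic_init(&r->tail, 0);
}

/* Producer: store the item, then publish the new head with release
 * ordering, so a consumer that sees the new head also sees the item. */
static bool ring_put(struct ring *r, int v)
{
	size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	assert(head - tail <= RING_SIZE);
	if (head - tail == RING_SIZE)
		return false;	/* full */
	r->buf[head & (RING_SIZE - 1)] = v;
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

/* Consumer: read head with acquire ordering before touching the slot,
 * and hand the slot back (advance tail) only after copying the item. */
static bool ring_get(struct ring *r, int *out)
{
	size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return false;	/* empty */
	*out = r->buf[tail & (RING_SIZE - 1)];
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}
```

With exactly one producer and one consumer this needs no lock. More than one producer or consumer on either side would need additional synchronisation.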

David

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 11:57       ` Andi Kleen
  2006-03-07 20:01         ` Jesse Barnes
@ 2006-03-07 21:14         ` Bryan O'Sullivan
  2006-03-07 21:24           ` Andi Kleen
  2006-03-08  0:35         ` Alan Cox
  2 siblings, 1 reply; 61+ messages in thread
From: Bryan O'Sullivan @ 2006-03-07 21:14 UTC (permalink / raw)
  To: Andi Kleen
  Cc: David Howells, torvalds, akpm, mingo, linux-arch, linuxppc64-dev,
	linux-kernel

On Tue, 2006-03-07 at 12:57 +0100, Andi Kleen wrote:

> > > True, I suppose. I should make it clear that these accessor functions imply
> > > memory barriers, if indeed they do,
> > 
> > They don't, but according to Documentation/DocBook/deviceiobook.tmpl
> > they are performed by the compiler in the order specified.
> 
> I don't think that's correct. Probably the documentation should
> be fixed.

That's why I hedged my words with "according to ..." :-)

But on most arches those accesses do indeed seem to happen in-order.  On
i386 and x86_64, it's a natural consequence of program store ordering.
On at least some other arches, there are explicit memory barriers in the
implementation of the access macros to force this ordering to occur.

	<b


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 21:14         ` Bryan O'Sullivan
@ 2006-03-07 21:24           ` Andi Kleen
  2006-03-08  0:36             ` Alan Cox
  0 siblings, 1 reply; 61+ messages in thread
From: Andi Kleen @ 2006-03-07 21:24 UTC (permalink / raw)
  To: Bryan O'Sullivan
  Cc: David Howells, torvalds, akpm, mingo, linux-arch, linuxppc64-dev,
	linux-kernel

On Tuesday 07 March 2006 22:14, Bryan O'Sullivan wrote:
> On Tue, 2006-03-07 at 12:57 +0100, Andi Kleen wrote:
> > > > True, I suppose. I should make it clear that these accessor functions
> > > > imply memory barriers, if indeed they do,
> > >
> > > They don't, but according to Documentation/DocBook/deviceiobook.tmpl
> > > they are performed by the compiler in the order specified.
> >
> > I don't think that's correct. Probably the documentation should
> > be fixed.
>
> That's why I hedged my words with "according to ..." :-)
>
> But on most arches those accesses do indeed seem to happen in-order.  On
> i386 and x86_64, it's a natural consequence of program store ordering.

Not true for reads on x86.

-Andi

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Document Linux's memory barriers
@ 2006-03-07 23:17 Chuck Ebbert
  2006-03-08  0:15 ` David S. Miller
  2006-03-08  0:24 ` Roberto Nibali
  0 siblings, 2 replies; 61+ messages in thread
From: Chuck Ebbert @ 2006-03-07 23:17 UTC (permalink / raw)
  To: David Howells; +Cc: linux-kernel

In-Reply-To: <31492.1141753245@warthog.cambridge.redhat.com>

On Tue, 07 Mar 2006 17:40:45 +0000, David Howells wrote:

> The attached patch documents the Linux kernel's memory barriers.

References:

AMD64 Architecture Programmer's Manual Volume 2: System Programming
        Chapter 7.1: Memory-Access Ordering
        Chapter 7.4: Buffering and Combining Memory Writes

IA-32 Intel Architecture Software Developer’s Manual, Volume 3:
System Programming Guide
        Chapter 7.1: Locked Atomic Operations
        Chapter 7.2: Memory Ordering
        Chapter 7.4: Serializing Instructions




-- 
Chuck
"Penguins don't come from next door, they come from the Antarctic!"


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 23:17 Chuck Ebbert
@ 2006-03-08  0:15 ` David S. Miller
  2006-03-08  0:24 ` Roberto Nibali
  1 sibling, 0 replies; 61+ messages in thread
From: David S. Miller @ 2006-03-08  0:15 UTC (permalink / raw)
  To: 76306.1226; +Cc: dhowells, linux-kernel

From: Chuck Ebbert <76306.1226@compuserve.com>
Date: Tue, 7 Mar 2006 18:17:19 -0500

> In-Reply-To: <31492.1141753245@warthog.cambridge.redhat.com>
> 
> On Tue, 07 Mar 2006 17:40:45 +0000, David Howells wrote:
> 
> > The attached patch documents the Linux kernel's memory barriers.
> 
> References:

Here are some good ones for Sparc64:

The SPARC Architecture Manual, Version 9
    Chapter 8: Memory Models
    Appendix D: Formal Specification of the Memory Models
    Appendix J: Programming with the Memory Models

UltraSPARC Programmer Reference Manual
    Chapter 5: Memory Accesses and Cacheability
    Chapter 15: Sparc-V9 Memory Models

UltraSPARC III Cu User's Manual
    Chapter 9: Memory Models

UltraSPARC IIIi Processor User's Manual
    Chapter 8: Memory Models

UltraSPARC Architecture 2005
    Chapter 9: Memory
    Appendix D: Formal Specifications of the Memory Models

UltraSPARC T1 Supplement to the UltraSPARC Architecture 2005
    Chapter 8: Memory Models
    Appendix F: Caches and Cache Coherency

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Document Linux's memory barriers
       [not found]     ` <5NPq4-34a-23@gated-at.bofh.it>
@ 2006-03-08  0:22       ` Robert Hancock
  0 siblings, 0 replies; 61+ messages in thread
From: Robert Hancock @ 2006-03-08  0:22 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: linux-kernel

Jesse Barnes wrote:
> On Tuesday, March 7, 2006 10:30 am, David Howells wrote:
>> True, I suppose. I should make it clear that these accessor functions
>> imply memory barriers, if indeed they do, and that you should use them
>> rather than accessing I/O registers directly (at least, outside the
>> arch you should).
> 
> But they don't, that's why we have mmiowb().

I don't think that is why that function exists.. It's a no-op on most 
architectures, even where you would need to be able to do write barriers 
on IO accesses (i.e. x86_64 using CONFIG_UNORDERED_IO). I believe that 
function is intended for a more limited special case.

I think any complete memory barrier description should document that 
function as well as EXPLICITLY specifying whether or not the 
readX/writeX, etc. functions imply barriers or not.

> Btw, thanks for putting together this documentation, it's desperately 
> needed.

Seconded.. The fact that there's debate over what the rules even are 
shows why this is needed so badly.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 23:17 Chuck Ebbert
  2006-03-08  0:15 ` David S. Miller
@ 2006-03-08  0:24 ` Roberto Nibali
  1 sibling, 0 replies; 61+ messages in thread
From: Roberto Nibali @ 2006-03-08  0:24 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: David Howells, linux-kernel

>>The attached patch documents the Linux kernel's memory barriers.
> 
> References:
> 
> AMD64 Architecture Programmer's Manual Volume 2: System Programming
>         Chapter 7.1: Memory-Access Ordering
>         Chapter 7.4: Buffering and Combining Memory Writes
> 
> IA-32 Intel Architecture Software Developer’s Manual, Volume 3:
> System Programming Guide
>         Chapter 7.1: Locked Atomic Operations
>         Chapter 7.2: Memory Ordering
>         Chapter 7.4: Serializing Instructions

Do you guys reckon it might be worthwhile adding Sparc's sequential 
consistency, TSO, RMO and PSO models, although I think only RMO is used 
in the Linux kernel? References can be found for example in:

   Solaris Internals, Core Kernel Architecture, p63-68:
           Chapter 3.3: Hardware Considerations for Locks and
                        Synchronization

   Unix Systems for Modern Architectures, Symmetric Multiprocessing
   and Caching for Kernel Programmers:
           Chapter 13 : Other Memory Models

Or is DaveM the only one fiddling with Sparc memory barriers implementation?

Regards,
Roberto Nibali, ratz
-- 
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 20:09   ` David Howells
@ 2006-03-08  0:32     ` Alan Cox
  2006-03-08  8:25     ` Duncan Sands
  1 sibling, 0 replies; 61+ messages in thread
From: Alan Cox @ 2006-03-08  0:32 UTC (permalink / raw)
  To: David Howells
  Cc: torvalds, akpm, mingo, linux-arch, linuxppc64-dev, linux-kernel

On Tue, 2006-03-07 at 20:09 +0000, David Howells wrote:
> Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> 
> > Better meaningful example would be barriers versus an IRQ handler. Which
> > leads nicely onto section 2
> 
> Yes, except that I can't think of one that's feasible that doesn't have to do
> with I/O - which isn't a problem if you are using the proper accessor
> functions.

We get them off bus masters for one and you can construct silly versions
of the other.


There are several kernel instances of

	while(*ptr != HAVE_RESPONDED && time_before(jiffies, timeout))
		rmb();

where we wait for hardware to bus master respond when it is fast and
doesn't IRQ.
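A user-space analogue of that polling loop, under loud assumptions: the status word is modelled as a C11 `_Atomic uint32_t` (a real bus-master write into memory is not a C11 atomic store), each acquire load stands in for re-reading `*ptr` after rmb(), and an iteration budget replaces `time_before(jiffies, timeout)`. `HAVE_RESPONDED` keeps the name from the loop above, but its value here is made up, as is `wait_for_response()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define HAVE_RESPONDED 1u	/* hypothetical value; only the name comes from the loop above */

/* Spin until the status word reads HAVE_RESPONDED or the budget runs out.
 * The acquire load keeps later reads of the response data from being
 * satisfied before the status itself has been observed. */
static bool wait_for_response(_Atomic uint32_t *status, unsigned long budget)
{
	assert(budget > 0);
	while (budget--) {
		if (atomic_load_explicit(status, memory_order_acquire) == HAVE_RESPONDED)
			return true;
	}
	return false;
}
```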



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 11:57       ` Andi Kleen
  2006-03-07 20:01         ` Jesse Barnes
  2006-03-07 21:14         ` Bryan O'Sullivan
@ 2006-03-08  0:35         ` Alan Cox
  2 siblings, 0 replies; 61+ messages in thread
From: Alan Cox @ 2006-03-08  0:35 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Bryan O'Sullivan, David Howells, torvalds, akpm, mingo,
	linux-arch, linuxppc64-dev, linux-kernel

On Tue, 2006-03-07 at 12:57 +0100, Andi Kleen wrote:
> > They don't, but according to Documentation/DocBook/deviceiobook.tmpl
> > they are performed by the compiler in the order specified.
> 
> I don't think that's correct. Probably the documentation should
> be fixed.

It would be wiser to ensure they are performed in the order specified.
As far as I can see this is currently true due to the volatile cast and
most drivers rely on this property so the brown and sticky will impact
the rotating air impeller pretty fast if it isn't.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 21:24           ` Andi Kleen
@ 2006-03-08  0:36             ` Alan Cox
  0 siblings, 0 replies; 61+ messages in thread
From: Alan Cox @ 2006-03-08  0:36 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Bryan O'Sullivan, David Howells, torvalds, akpm, mingo,
	linux-arch, linuxppc64-dev, linux-kernel

On Tue, 2006-03-07 at 22:24 +0100, Andi Kleen wrote:
> > But on most arches those accesses do indeed seem to happen in-order.  On
> > i386 and x86_64, it's a natural consequence of program store ordering.
> 
> Not true for reads on x86.

You must have a strange kernel Andi. Mine marks them as volatile
unsigned char * references.

Alan


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Document Linux's memory barriers
       [not found]       ` <5NUSF-30Z-5@gated-at.bofh.it>
@ 2006-03-08  1:10         ` Robert Hancock
  2006-03-08 11:35           ` Alan Cox
  2006-03-08 14:55           ` Andi Kleen
  0 siblings, 2 replies; 61+ messages in thread
From: Robert Hancock @ 2006-03-08  1:10 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

Alan Cox wrote:
> On Tue, 2006-03-07 at 22:24 +0100, Andi Kleen wrote:
>>> But on most arches those accesses do indeed seem to happen in-order.  On
>>> i386 and x86_64, it's a natural consequence of program store ordering.
>> Not true for reads on x86.
> 
> You must have a strange kernel Andi. Mine marks them as volatile
> unsigned char * references.

Well, that and the fact that IO memory should be mapped as uncacheable 
in the MTRRs should ensure that readl and writel won't be reordered on 
i386 and x86_64.. except in the case where CONFIG_UNORDERED_IO is 
enabled on x86_64 which can reorder writes since it uses nontemporal 
stores..

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 17:40 David Howells
                   ` (2 preceding siblings ...)
  2006-03-07 18:40 ` Alan Cox
@ 2006-03-08  2:07 ` Nick Piggin
  2006-03-08  3:10 ` Paul Mackerras
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 61+ messages in thread
From: Nick Piggin @ 2006-03-08  2:07 UTC (permalink / raw)
  To: David Howells
  Cc: torvalds, akpm, mingo, linux-arch, linuxppc64-dev, linux-kernel

David Howells wrote:

>The attached patch documents the Linux kernel's memory barriers.
>
>Signed-Off-By: David Howells <dhowells@redhat.com>
>---
>
>

Good :)

>+==============================
>+IMPLIED KERNEL MEMORY BARRIERS
>+==============================
>+
>+Some of the other functions in the linux kernel imply memory barriers. For
>+instance all the following (pseudo-)locking functions imply barriers.
>+
>+ (*) interrupt disablement and/or interrupts
>

Is this really the case? I mean interrupt disablement only synchronises with
the local CPU, so it probably should not _have_ to imply barriers (eg. some
architectures are playing around with "virtual" interrupt disablement).

[...]

>+
>+Either interrupt disablement (LOCK) and enablement (UNLOCK) will barrier
>+memory and I/O accesses individually, or interrupt handling will barrier
>+memory and I/O accesses on entry and on exit. This prevents an interrupt
>+routine interfering with accesses made in a disabled-interrupt section of code
>+and vice versa.
>+
>

But CPUs should always be consistent WRT themselves, so I'm not sure that
it is needed?

Thanks,
Nick

--
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 17:40 David Howells
                   ` (3 preceding siblings ...)
  2006-03-08  2:07 ` Nick Piggin
@ 2006-03-08  3:10 ` Paul Mackerras
  2006-03-08  3:30   ` Linus Torvalds
                     ` (2 more replies)
  2006-03-08 16:18 ` Pavel Machek
  2006-03-08 16:26 ` Christoph Lameter
  6 siblings, 3 replies; 61+ messages in thread
From: Paul Mackerras @ 2006-03-08  3:10 UTC (permalink / raw)
  To: David Howells
  Cc: torvalds, akpm, mingo, linux-arch, linuxppc64-dev, linux-kernel

David Howells writes:

> The attached patch documents the Linux kernel's memory barriers.

Thanks for venturing into this particular lion's den. :)

> +Memory barriers are instructions to both the compiler and the CPU to impose a
> +partial ordering between the memory access operations specified either side of
> +the barrier.

... as observed from another agent in the system - another CPU or a
bus-mastering I/O device.  A given CPU will always see its own memory
accesses in order.

> + (*) reads are synchronous and may need to be done immediately to permit

Leave out the "are synchronous and".  It's not true.

I also think you need to avoid talking about "the bus".  Some systems
don't have a bus, but rather have an interconnection fabric between
the CPUs and the memories.  Talking about a bus implies that all
memory accesses in fact get serialized (by having to be sent one after
the other over the bus) and that you can therefore talk about the
order in which they get to memory.  In some systems, no such order
exists.

It's possible to talk sensibly about the order in which memory
accesses get done without talking about a bus or requiring a total
ordering on the memory access.  The PowerPC architecture spec does
this by specifying that in certain circumstances one load or store has
to be "performed with respect to other processors and mechanisms"
before another.  A load is said to be performed with respect to
another agent when a store by that agent can no longer change the
value returned by the load.  Similarly, a store is performed w.r.t.
an agent when any load done by the agent will return the value stored
(or a later value).

> +     The way to deal with this is to insert an I/O memory barrier between the
> +     two accesses:
> +
> +	*ADR = ctl_reg_3;
> +	mb();
> +	reg = *DATA;

Ummm, this implies mb() is "an I/O memory barrier".  I can see people
getting confused if they read this and then see mb() being used when
no I/O is being done.

> +The Linux kernel has six basic memory barriers:
> +
> +		MANDATORY (I/O)	SMP
> +		===============	================
> +	GENERAL	mb()		smp_mb()
> +	READ	rmb()		smp_rmb()
> +	WRITE	wmb()		smp_wmb()
> +
> +General memory barriers make a guarantee that all memory accesses specified
> +before the barrier will happen before all memory accesses specified after the
> +barrier.

By "memory accesses" do you mean accesses to system memory, or do you
mean loads and stores - which may be to system memory, memory on an I/O
device (e.g. a framebuffer) or to memory-mapped I/O registers?

Linus explained recently that wmb() on x86 does not order stores to
system memory w.r.t. stores to prefetchable I/O memory (at
least that's what I think he said ;).

> +Some of the other functions in the linux kernel imply memory barriers. For
> +instance all the following (pseudo-)locking functions imply barriers.
> +
> + (*) interrupt disablement and/or interrupts

Enabling/disabling interrupts doesn't imply a barrier on powerpc, and
nor does taking an interrupt or returning from one.

> + (*) spin locks

I think it's still an open question as to whether spin locks do any
ordering between accesses to system memory and accesses to I/O
registers.

> + (*) R/W spin locks
> + (*) mutexes
> + (*) semaphores
> + (*) R/W semaphores
> +
> +In all cases there are variants on a LOCK operation and an UNLOCK operation.
> +
> + (*) LOCK operation implication:
> +
> +     Memory accesses issued after the LOCK will be completed after the LOCK
> +     accesses have completed.
> +
> +     Memory accesses issued before the LOCK may be completed after the LOCK
> +     accesses have completed.
> +
> + (*) UNLOCK operation implication:
> +
> +     Memory accesses issued before the UNLOCK will be completed before the
> +     UNLOCK accesses have completed.
> +
> +     Memory accesses issued after the UNLOCK may be completed before the UNLOCK
> +     accesses have completed.

And therefore an UNLOCK followed by a LOCK is equivalent to a full
barrier, but a LOCK followed by an UNLOCK isn't.
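The one-way LOCK/UNLOCK implications quoted above correspond roughly to acquire and release ordering in C11. The following is a user-space sketch with a hypothetical test-and-set lock (`struct toy_lock` and its functions are illustrative names, not kernel API). It only shows where the acquire and release orderings sit; it does not attempt to demonstrate the UNLOCK-followed-by-LOCK case.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set lock, for illustration only. */
struct toy_lock {
	atomic_bool taken;
};

static void toy_lock_init(struct toy_lock *l)
{
	atomic_init(&l->taken, false);
}

/* LOCK side: acquire ordering. Accesses issued after a successful
 * acquisition cannot be performed before it, but earlier accesses may
 * still drift into the critical section. */
static bool toy_spin_trylock(struct toy_lock *l)
{
	bool expected = false;

	return atomic_compare_exchange_strong_explicit(&l->taken, &expected, true,
						       memory_order_acquire,
						       memory_order_relaxed);
}

static void toy_spin_lock(struct toy_lock *l)
{
	while (!toy_spin_trylock(l))
		;	/* spin */
}

/* UNLOCK side: release ordering. Accesses issued before the unlock are
 * performed before it, but later accesses may be pulled in ahead of it. */
static void toy_spin_unlock(struct toy_lock *l)
{
	assert(atomic_load_explicit(&l->taken, memory_order_relaxed));	/* must be held */
	atomic_store_explicit(&l->taken, false, memory_order_release);
}
```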

> +Either interrupt disablement (LOCK) and enablement (UNLOCK) will barrier
> +memory and I/O accesses individually, or interrupt handling will barrier
> +memory and I/O accesses on entry and on exit. This prevents an interrupt
> +routine interfering with accesses made in a disabled-interrupt section of code
> +and vice versa.

I don't think this is right, and I don't think it is necessary to
achieve the end you state, since a CPU will always see its own memory
accesses in program order.

> +The following sequence of events on the bus is acceptable:
> +
> +	LOCK, *F+*A, *E, *C+*D, *B, UNLOCK

What does *F+*A mean?

> +Consider also the following (going back to the AMD PCnet example):
> +
> +	DISABLE IRQ
> +	*ADR = ctl_reg_3;
> +	mb();
> +	x = *DATA;
> +	*ADR = ctl_reg_4;
> +	mb();
> +	*DATA = y;
> +	*ADR = ctl_reg_5;
> +	mb();
> +	z = *DATA;
> +	ENABLE IRQ
> +	<interrupt>
> +	*ADR = ctl_reg_7;
> +	mb();
> +	q = *DATA;
> +	</interrupt>
> +
> +What's to stop "z = *DATA" crossing "*ADR = ctl_reg_7" and reading from the
> +wrong register? (There's no guarantee that the process of handling an
> +interrupt will barrier memory accesses in any way).

Well, the driver should *not* be doing *ADR at all, it should be using
read[bwl]/write[bwl].  The architecture code has to implement
read*/write* in such a way that the accesses generated can't be
reordered.  I _think_ it also has to make sure the write accesses
can't be write-combined, but it would be good to have that clarified.

> +======================
> +POWERPC SPECIFIC NOTES
> +======================
> +
> +The powerpc is weakly ordered, and its read and write accesses may be
> +completed generally in any order. Its memory barriers are also to some extent
> +more substantial than the minimum requirement, and may directly affect
> +hardware outside of the CPU.

Unfortunately mb()/smp_mb() are quite expensive on PowerPC, since the
only instruction we have that implies a strong enough barrier is sync,
which also performs several other kinds of synchronization, such as
waiting until all previous instructions have completed executing to
the point where they can no longer cause an exception.

Paul.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Document Linux's memory barriers
  2006-03-08  3:10 ` Paul Mackerras
@ 2006-03-08  3:30   ` Linus Torvalds
  2006-03-08 12:34     ` David Howells
  2006-03-08  7:41   ` Nick Piggin
  2006-03-08 13:19   ` David Howells
  2 siblings, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2006-03-08  3:30 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: David Howells, akpm, mingo, linux-arch, linuxppc64-dev,
	linux-kernel



On Wed, 8 Mar 2006, Paul Mackerras wrote:
> 
> Linus explained recently that wmb() on x86 does not order stores to
> system memory w.r.t. stores to prefetchable I/O memory (at
> least that's what I think he said ;).

In fact, it won't order stores to normal memory even wrt any 
_non-prefetchable_ IO memory.

PCI (and any other sane IO fabric, for that matter) will do IO posting, so 
the fact that the CPU _core_ may order them due to a wmb() doesn't 
actually mean anything.

The only way to _really_ synchronize with a store to an IO device is 
literally to read from that device (*). No amount of memory barriers will 
do it.

So you can really only order stores to regular memory wrt each other, and 
stores to IO memory wrt each other. For the former, "smp_wmb()" does it.

For IO memory, normal IO memory is _always_ supposed to be in program 
order (at least for PCI. It's part of how the bus is supposed to work), 
unless the IO range allows prefetching (and you've set some MTRR). And if 
you do that, currently you're kind of screwed. mmiowb() should do it, but 
nobody really uses it, and I think it's broken on x86 (it's a no-op, it 
really should be an "sfence").

A full "mb()" is probably most likely to work in practice. And yes, we 
should clean this up.

		Linus

(*) The "read" can of course be any event that tells you that the store 
has happened - it doesn't necessarily have to be an actual "read[bwl]()" 
operation. Eg the store might start a command, and when you get the 
completion interrupt, you obviously know that the store is done, just from 
a causal reason.
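The write-then-read-back rule can be sketched as follows. This is a user-space simulation with hypothetical names (`struct fake_regs`, `sim_writel()`, `sim_readl()`, `post_and_flush()`). Against ordinary memory it shows only the shape of the pattern. On real PCI hardware the flushing effect comes from the bus ordering rule that a read request may not pass earlier posted writes along the same path.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout for a simulated device. */
struct fake_regs {
	uint32_t ctrl;
	uint32_t status;
};

static inline void sim_writel(uint32_t val, volatile uint32_t *reg)
{
	assert(((uintptr_t)reg & 3u) == 0);	/* aligned 32-bit access */
	*reg = val;
}

static inline uint32_t sim_readl(const volatile uint32_t *reg)
{
	assert(((uintptr_t)reg & 3u) == 0);
	return *reg;
}

/* Write a command, then read back from the same device. On real
 * hardware the read cannot complete until the posted write ahead of it
 * has reached the device; callers often ignore the returned value. */
static uint32_t post_and_flush(volatile struct fake_regs *regs, uint32_t cmd)
{
	sim_writel(cmd, &regs->ctrl);
	return sim_readl(&regs->ctrl);
}
```

As the footnote says, any event that proves the store arrived (such as a completion interrupt) can serve the same purpose as the read.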

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Document Linux's memory barriers
  2006-03-08  3:10 ` Paul Mackerras
  2006-03-08  3:30   ` Linus Torvalds
@ 2006-03-08  7:41   ` Nick Piggin
  2006-03-08 13:19   ` David Howells
  2 siblings, 0 replies; 61+ messages in thread
From: Nick Piggin @ 2006-03-08  7:41 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: David Howells, torvalds, akpm, mingo, linux-arch, linuxppc64-dev,
	linux-kernel

Paul Mackerras wrote:
> David Howells writes:

>>+     The way to deal with this is to insert an I/O memory barrier between the
>>+     two accesses:
>>+
>>+	*ADR = ctl_reg_3;
>>+	mb();
>>+	reg = *DATA;
> 
> 
> Ummm, this implies mb() is "an I/O memory barrier".  I can see people
> getting confused if they read this and then see mb() being used when
> no I/O is being done.
> 

Isn't it? Why wouldn't you just use smp_mb() if no IO is being done?

-- 
SUSE Labs, Novell Inc.

* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 20:09   ` David Howells
  2006-03-08  0:32     ` Alan Cox
@ 2006-03-08  8:25     ` Duncan Sands
  2006-03-08 22:06       ` Paul Mackerras
  1 sibling, 1 reply; 61+ messages in thread
From: Duncan Sands @ 2006-03-08  8:25 UTC (permalink / raw)
  To: David Howells
  Cc: Alan Cox, torvalds, akpm, mingo, linux-arch, linuxppc64-dev,
	linux-kernel

On Tuesday 7 March 2006 21:09, David Howells wrote:
> Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> 
> > Better meaningful example would be barriers versus an IRQ handler. Which
> > leads nicely onto section 2
> 
> Yes, except that I can't think of one that's feasible that doesn't have to do
> with I/O - which isn't a problem if you are using the proper accessor
> functions.
> 
> Such an example has to involve more than one CPU, because you don't tend to
> get memory/memory ordering problems on UP.

On UP you at least need compiler barriers, right?  You're in trouble if you think
you are writing in a certain order, and expect to see the same order from an
interrupt handler, but the compiler decided to rearrange the order of the writes...
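The uniprocessor case can be reproduced in userspace by letting a signal handler stand in for the interrupt handler. In this sketch (all names hypothetical) only compiler reordering matters, so a compiler-only fence is enough: C11 spells that `atomic_signal_fence()`, while the kernel's barrier() is an empty GCC `asm` with a "memory" clobber. The payload and flag are C11 atomics because ISO C only defines handler access to lock-free atomics and `volatile sig_atomic_t` objects.

```c
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>

static atomic_int payload;              /* data written by the "main line" */
static atomic_int ready;                /* set once payload is valid */
static volatile sig_atomic_t seen;      /* 0 means "nothing observed yet" */

/* Stand-in for an interrupt handler running on the same CPU. */
static void on_irq(int sig)
{
    (void)sig;
    if (atomic_load_explicit(&ready, memory_order_relaxed)) {
        /* Pairs with the fence in publish(): the payload load cannot be
         * hoisted above the flag check. */
        atomic_signal_fence(memory_order_acquire);
        seen = atomic_load_explicit(&payload, memory_order_relaxed);
    }
}

static void publish(int v)
{
    assert(v != 0);                     /* 0 is reserved for "nothing seen" */
    atomic_store_explicit(&payload, v, memory_order_relaxed);
    /* Compiler-only barrier: typically no CPU fence instruction is
     * emitted, but the compiler may not sink the payload store below
     * the flag store. */
    atomic_signal_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}
```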

> The obvious one might be circular buffers, except there's no problem there
> provided you have a memory barrier between accessing the buffer and updating
> your pointer into it.
> 
> David
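A single-producer, single-consumer ring of the kind described in the quoted text might look like the following. It is a userspace sketch using C11 atomics rather than the kernel's smp_wmb()/smp_rmb(); the `struct ring` layout and function names are hypothetical. The release store on `head` plays the role of the barrier between filling a slot and advancing the producer's pointer, and the consumer's acquire load pairs with it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 16u
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be a power of two");

struct ring {
    int buf[RING_SIZE];
    atomic_size_t head;     /* next slot the producer will fill */
    atomic_size_t tail;     /* next slot the consumer will drain */
};

static void ring_init(struct ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer: fill the slot first, then publish it by advancing head. */
static bool ring_put(struct ring *r, int v)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return false;                                   /* full */
    r->buf[head & (RING_SIZE - 1)] = v;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer: see the new head (acquire) before reading the slot, and hand
 * the slot back (release) only after the read has finished. */
static bool ring_get(struct ring *r, int *out)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;                                   /* empty */
    *out = r->buf[tail & (RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```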

Ciao,

Duncan.

* Re: [PATCH] Document Linux's memory barriers
  2006-03-08  1:10         ` Robert Hancock
@ 2006-03-08 11:35           ` Alan Cox
  2006-03-08 14:55           ` Andi Kleen
  1 sibling, 0 replies; 61+ messages in thread
From: Alan Cox @ 2006-03-08 11:35 UTC (permalink / raw)
  To: Robert Hancock; +Cc: linux-kernel

On Maw, 2006-03-07 at 19:10 -0600, Robert Hancock wrote:
> Alan Cox wrote:
> > You must have a strange kernel Andi. Mine marks them as volatile
> > unsigned char * references.
> 
> Well, that and the fact that IO memory should be mapped as uncacheable 
> in the MTRRs should ensure that readl and writel won't be reordered on 
> i386 and x86_64.. except in the case where CONFIG_UNORDERED_IO is 
> enabled on x86_64 which can reorder writes since it uses nontemporal 
> stores..

You need both

readl/writel need the volatile to stop gcc removing/reordering the
accesses at compiler level, and the mtrr/pci bridge stuff then deals
with bus level ordering for that CPU.


* Re: [PATCH] Document Linux's memory barriers
  2006-03-08  3:30   ` Linus Torvalds
@ 2006-03-08 12:34     ` David Howells
  2006-03-08 16:40       ` Bryan O'Sullivan
  0 siblings, 1 reply; 61+ messages in thread
From: David Howells @ 2006-03-08 12:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul Mackerras, David Howells, akpm, mingo, linux-arch,
	linuxppc64-dev, linux-kernel

Linus Torvalds <torvalds@osdl.org> wrote:

> > Linus explained recently that wmb() on x86 does not order stores to
> > system memory w.r.t. stores to prefetchable I/O memory (at
> > least that's what I think he said ;).

On i386 and x86_64, do IN and OUT instructions imply MFENCE? It's not obvious
from the x86_64 docs.

David

* Re: [PATCH] Document Linux's memory barriers
  2006-03-08  3:10 ` Paul Mackerras
  2006-03-08  3:30   ` Linus Torvalds
  2006-03-08  7:41   ` Nick Piggin
@ 2006-03-08 13:19   ` David Howells
  2006-03-08 21:49     ` Paul Mackerras
  2006-03-10  0:49     ` H. Peter Anvin
  2 siblings, 2 replies; 61+ messages in thread
From: David Howells @ 2006-03-08 13:19 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: David Howells, torvalds, akpm, mingo, linux-arch, linuxppc64-dev,
	linux-kernel

Paul Mackerras <paulus@samba.org> wrote:

> By "memory accesses" do you mean accesses to system memory, or do you
> mean loads and stores - which may be to system memory, memory on an I/O
> device (e.g. a framebuffer) or to memory-mapped I/O registers?

Well, I meant all loads and stores, irrespective of their destination.

However, on i386, for example, you've actually got at least two different I/O
access domains, and I don't know how they impinge upon each other (IN/OUT vs
MOV).

> Enabling/disabling interrupts doesn't imply a barrier on powerpc, and
> nor does taking an interrupt or returning from one.

Surely it ought to, otherwise what's to stop accesses done with interrupts
disabled crossing with accesses done inside an interrupt handler?

> > +Either interrupt disablement (LOCK) and enablement (UNLOCK) will barrier
> ...
> I don't think this is right, and I don't think it is necessary to
> achieve the end you state, since a CPU will always see its own memory
> accesses in program order.

But what about a driver accessing some memory that its device is going to
observe under irq disablement, and then getting an interrupt immediately after
from that same device, the handler for which communicates with the device,
possibly then being broken because the CPU hasn't completed all the memory
accesses that the driver made while interrupts are disabled?

Alternatively, might it be possible for communications between two CPUs to be
stuffed because one took an interrupt that also modified common data before
it had committed the memory accesses done under interrupt disablement?
This would suggest using a lock though.

I'm not sure that I can come up with a feasible example for this, but Alan Cox
seems to think that it's a valid problem too.

The only likely way I can see this being a problem is with unordered I/O
writes, which would suggest you have to place an mmiowb() before unlocking the
spinlock in such a case, assuming it is possible to get unordered I/O writes
(which I think it is).

> What does *F+*A mean?

Combined accesses.

> Well, the driver should *not* be doing *ADR at all, it should be using
> read[bwl]/write[bwl].  The architecture code has to implement
> read*/write* in such a way that the accesses generated can't be
> reordered.  I _think_ it also has to make sure the write accesses
> can't be write-combined, but it would be good to have that clarified.

Then what use is mmiowb()?

Surely write combining and out-of-order reads are reasonable for cacheable
devices like framebuffers.

David

* Re: [PATCH] Document Linux's memory barriers
  2006-03-08  1:10         ` Robert Hancock
  2006-03-08 11:35           ` Alan Cox
@ 2006-03-08 14:55           ` Andi Kleen
  1 sibling, 0 replies; 61+ messages in thread
From: Andi Kleen @ 2006-03-08 14:55 UTC (permalink / raw)
  To: Robert Hancock; +Cc: linux-kernel

Robert Hancock <hancockr@shaw.ca> writes:

> Alan Cox wrote:
> > On Maw, 2006-03-07 at 22:24 +0100, Andi Kleen wrote:
> >>> But on most arches those accesses do indeed seem to happen in-order.  On
> >>> i386 and x86_64, it's a natural consequence of program store ordering.
> >> Not true for reads on x86.
> > You must have a strange kernel Andi. Mine marks them as volatile
> > unsigned char * references.
> 
> Well, that and the fact that IO memory should be mapped as uncacheable
> in the MTRRs should ensure that readl and writel won't be reordered on
> i386 and x86_64.. except in the case where CONFIG_UNORDERED_IO is
> enabled on x86_64 which can reorder writes since it uses nontemporal
> stores..

CONFIG_UNORDERED_IO is a failed experiment. I just removed it.

-Andi

* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 17:40 David Howells
                   ` (4 preceding siblings ...)
  2006-03-08  3:10 ` Paul Mackerras
@ 2006-03-08 16:18 ` Pavel Machek
  2006-03-08 20:16   ` David Howells
  2006-03-08 16:26 ` Christoph Lameter
  6 siblings, 1 reply; 61+ messages in thread
From: Pavel Machek @ 2006-03-08 16:18 UTC (permalink / raw)
  To: David Howells
  Cc: torvalds, akpm, mingo, linux-arch, linuxppc64-dev, linux-kernel

Hi!

> +There are some more advanced barriering functions:
> +
> + (*) set_mb(var, value)
> + (*) set_wmb(var, value)
> +
> +     These assign the value to the variable and then insert at least a write
> +     barrier after it, depending on the function.
> +

I... don't understand what these do. Better explanation would
help... what is "function" here?

Does it try to say that set_mb(var, value) is equivalent to var =
value; mb(); but here mb() affects that one variable, only?


> +In all cases there are variants on a LOCK operation and an UNLOCK operation.
> +
> + (*) LOCK operation implication:
> +
> +     Memory accesses issued after the LOCK will be completed after the LOCK
> +     accesses have completed.

"LOCK access"? Does it try to say that ...will be completed after any
access inside lock region is completed?

("LOCK" looks very much like the well-known i386 prefix. Calling it
*_lock() or something would avoid that confusion. Fortunately there's
no UNLOCK instruction :-)

> + (*) UNLOCK operation implication:
> +
> +     Memory accesses issued before the UNLOCK will be completed before the
> +     UNLOCK accesses have completed.
> +
> +     Memory accesses issued after the UNLOCK may be completed before the UNLOCK
> +     accesses have completed.
> +
> + (*) LOCK vs UNLOCK implication:
> +
> +     The LOCK accesses will be completed before the unlock accesses.
                                                       ~~~~~~
							 capital? Or
						lower it everywhere?


> +==============================
> +I386 AND X86_64 SPECIFIC NOTES
> +==============================
> +
> +Earlier i386 CPUs (pre-Pentium-III) are fully ordered - the operations on the
> +bus appear in program order - and so there's no requirement for any sort of
> +explicit memory barriers.
> +
> +From the Pentium-III onwards were three new memory barrier instructions:
> +LFENCE, SFENCE and MFENCE which correspond to the kernel memory barrier
> +functions rmb(), wmb() and mb(). However, there are additional implicit memory
> +barriers in the CPU implementation:
> +
> + (*) Normal writes imply a semi-rmb(): reads before a write may not complete
> +     after that write, but reads after a write may complete before the write
> +     (ie: reads may go _ahead_ of writes).

This makes it sound like pentium-III+ is incompatible with previous
CPUs. Is it really the case?
								Pavel
-- 
Web maintainer for suspend.sf.net (www.sf.net/projects/suspend) wanted...

* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 17:40 David Howells
                   ` (5 preceding siblings ...)
  2006-03-08 16:18 ` Pavel Machek
@ 2006-03-08 16:26 ` Christoph Lameter
  2006-03-08 17:35   ` David Howells
  6 siblings, 1 reply; 61+ messages in thread
From: Christoph Lameter @ 2006-03-08 16:26 UTC (permalink / raw)
  To: David Howells; +Cc: linux-kernel

You need to explain the difference between the compiler reordering and the 
control of the compiler's arrangement of loads and stores and the CPU
reordering of stores and loads. Note that IA64 has a much more complete
set of means to reorder stores and loads. i386 and x86_64 processors can
only do limited reordering. So it may make sense to deal with general 
reordering and then explain i386 as a specific limited case.

See the "Intel Itanium Architecture Software Developer's Manual" 
(available from Intel's website). Look at Volume 1 section 2.6
"Speculation" and 4.4 "Memory Access"

Also the specific barrier functions of various locking elements varies to 
some extent.

* Re: [PATCH] Document Linux's memory barriers
  2006-03-08 12:34     ` David Howells
@ 2006-03-08 16:40       ` Bryan O'Sullivan
  0 siblings, 0 replies; 61+ messages in thread
From: Bryan O'Sullivan @ 2006-03-08 16:40 UTC (permalink / raw)
  To: David Howells
  Cc: Linus Torvalds, Paul Mackerras, akpm, mingo, linux-arch,
	linuxppc64-dev, linux-kernel

On Wed, 2006-03-08 at 12:34 +0000, David Howells wrote:

> On i386 and x86_64, do IN and OUT instructions imply MFENCE?

No.

	<b


* Re: [PATCH] Document Linux's memory barriers
  2006-03-08 16:26 ` Christoph Lameter
@ 2006-03-08 17:35   ` David Howells
  2006-03-08 17:46     ` Christoph Lameter
  0 siblings, 1 reply; 61+ messages in thread
From: David Howells @ 2006-03-08 17:35 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: David Howells, linux-kernel

Christoph Lameter <clameter@engr.sgi.com> wrote:

> You need to explain the difference between the compiler reordering and the 
> control of the compiler's arrangement of loads and stores and the CPU
> reordering of stores and loads.

Hmmm... I would hope people looking at this doc would understand that, but
I'll see what I can come up with.

> Note that IA64 has a much more complete set of means to reorder stores and
> loads. i386 and x86_64 processors can only do limited reordering. So it may
> make sense to deal with general reordering and then explain i386 as a
> specific limited case.

Don't you need to use sacrifice_goat() for controlling the IA64? :-)

Besides, I'm not sure that I need to explain that any CPU is a limited case;
I'm primarily trying to define the basic minimal guarantees you can expect
from using a memory barrier, and what might happen if you don't. It shouldn't
matter which arch you're dealing with, especially if you're writing a driver.

I tried to create arch-specific sections for describing arch-specific implicit
barriers and the extent of the explicit memory barriers on each arch, but the
i386 section was generating so many exceptions that it looked infeasible to
describe them all; besides, you aren't allowed to rely on such features outside of
arch code (I count arch-specific drivers as "arch code" for this).

> See the "Intel Itanium Architecture Software Developer's Manual" 
> (available from Intel's website). Look at Volume 1 section 2.6
> "Speculation" and 4.4 "Memory Access"

I've added that to the refs, thanks.

> Also the specific barrier functions of various locking elements varies to 
> some extent.

Please elaborate.

David

* Re: [PATCH] Document Linux's memory barriers
  2006-03-08 17:35   ` David Howells
@ 2006-03-08 17:46     ` Christoph Lameter
  2006-03-08 17:59       ` Alan Cox
  0 siblings, 1 reply; 61+ messages in thread
From: Christoph Lameter @ 2006-03-08 17:46 UTC (permalink / raw)
  To: David Howells; +Cc: linux-kernel

On Wed, 8 Mar 2006, David Howells wrote:

> Hmmm... I would hope people looking at this doc would understand that, but
> I'll see what I can come up with.
> 
> > Note that IA64 has a much more complete set of means to reorder stores and
> > loads. i386 and x86_64 processors can only do limited reordering. So it may
> > make sense to deal with general reordering and then explain i386 as a
> > specific limited case.
> 
> Don't you need to use sacrifice_goat() for controlling the IA64? :-)

Likely...
 
> Besides, I'm not sure that I need to explain that any CPU is a limited case;
> I'm primarily trying to define the basic minimal guarantees you can expect
> from using a memory barrier, and what might happen if you don't. It shouldn't
> matter which arch you're dealing with, especially if you're writing a driver.

memory barrier functions have to be targeted to the processor with the 
ability to do the widest range of reordering. This is the Itanium, AFAIK.

> I tried to create arch-specific sections for describing arch-specific implicit
> barriers and the extent of the explicit memory barriers on each arch, but the
> i386 section was generating so many exceptions that it looked infeasible to
> describe them all; besides, you aren't allowed to rely on such features outside of
> arch code (I count arch-specific drivers as "arch code" for this).

i386 does not fully implement things like write barriers since they have 
an implicit ordering of stores.

> > Also the specific barrier functions of various locking elements varies to 
> > some extent.
> 
> Please elaborate.

F.e. spin_unlock has "release" semantics on IA64. That means that prior 
write accesses are visible before the store, read accesses are also 
completed before the store. However, the processor may perform later read 
and write accesses before the results of the store become visible.
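The release semantics described here, together with the matching acquire semantics of taking a lock, can be sketched with C11 atomics. This is a hypothetical userspace lock (`sketch_lock`), not the kernel's spinlock_t or the IA64 implementation: acquiring with memory_order_acquire keeps critical-section accesses from moving above the lock, and releasing with memory_order_release keeps them from moving below the unlock, while accesses after the unlock remain free to move up into the critical section.

```c
#include <stdatomic.h>

/* Hypothetical userspace lock, for illustration only. */
typedef struct { atomic_flag held; } sketch_lock;
#define SKETCH_LOCK_INIT { ATOMIC_FLAG_INIT }

static void sketch_lock_acquire(sketch_lock *l)
{
    /* Acquire: later loads and stores cannot be performed before the
     * test-and-set that takes the lock succeeds. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;   /* spin */
}

static void sketch_lock_release(sketch_lock *l)
{
    /* Release: earlier loads and stores complete before the lock is seen
     * as free; later accesses may still be performed ahead of this store. */
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```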



* Re: [PATCH] Document Linux's memory barriers
  2006-03-08 17:46     ` Christoph Lameter
@ 2006-03-08 17:59       ` Alan Cox
  0 siblings, 0 replies; 61+ messages in thread
From: Alan Cox @ 2006-03-08 17:59 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: David Howells, linux-kernel

> i386 does not fully implement things like write barriers since they have 
> an implicit ordering of stores.

Except when they don't (PPro errata cases, and the explicit support for
this in the IDT Winchip)


* Re: [PATCH] Document Linux's memory barriers
  2006-03-08 16:18 ` Pavel Machek
@ 2006-03-08 20:16   ` David Howells
  2006-03-08 22:01     ` Alan Cox
  0 siblings, 1 reply; 61+ messages in thread
From: David Howells @ 2006-03-08 20:16 UTC (permalink / raw)
  To: Pavel Machek
  Cc: David Howells, torvalds, akpm, mingo, linux-arch, linuxppc64-dev,
	linux-kernel

Pavel Machek <pavel@ucw.cz> wrote:

> > + (*) set_mb(var, value)
> > + (*) set_wmb(var, value)
> > +
> > +     These assign the value to the variable and then insert at least a write
> > +     barrier after it, depending on the function.
> > +
> 
> I... don't understand what these do. Better explanation would
> help... what is "function" here?

I can only guess, and hope someone corrects me if I'm wrong.

> Does it try to say that set_mb(var, value) is equivalent to var =
> value; mb();

Yes.

> but here mb() affects that one variable, only?

No. set_*mb() is simply a canned sequence of assignment, memory barrier.

The type of barrier inserted depends on which function you choose. set_mb()
inserts an mb() and set_wmb() inserts a wmb().
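To make the answer concrete: set_mb(var, value) behaves like `var = value; mb();` packaged as one macro. The sketch below uses hypothetical `_sketch` names and C11 fences instead of the kernel's mb()/wmb(); some architectures may implement the real set_mb() differently (for example with an atomic exchange), so treat this as a model rather than the actual definition. The point it shows is that the barrier is an ordinary fence, so it orders every surrounding access, not only the one to `var`.

```c
#include <stdatomic.h>

/* Hypothetical analogue of set_mb(): assignment, then a full fence.
 * The fence applies to all surrounding memory accesses, not just `var`. */
#define set_mb_sketch(var, value)                       \
    do {                                                \
        (var) = (value);                                \
        atomic_thread_fence(memory_order_seq_cst);      \
    } while (0)

/* Hypothetical analogue of set_wmb().  A C11 release fence is stronger
 * than wmb() (it also orders earlier loads), and C11 fences only give
 * guarantees in combination with atomic accesses, so this approximates
 * wmb() rather than being equivalent to it. */
#define set_wmb_sketch(var, value)                      \
    do {                                                \
        (var) = (value);                                \
        atomic_thread_fence(memory_order_release);      \
    } while (0)
```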

> "LOCK access"?

The LOCK and UNLOCK functions presumably make at least one memory write apiece
to manipulate the target lock (on SMP at least).

> Does it try to say that ...will be completed after any access inside lock
> region is completed?

No. What you get in effect is something like:

	LOCK { *lock = q; }
	*A = a;
	*B = b;
	UNLOCK { *lock = u; }

Except that the accesses to the lock memory are made using special procedures
(LOCK prefixed instructions, XCHG, CAS/CMPXCHG, LL/SC, etc).

> This makes it sound like pentium-III+ is incompatible with previous
> CPUs. Is it really the case?

Yes - hence the alternative instruction stuff.

David

* Re: [PATCH] Document Linux's memory barriers
  2006-03-08 13:19   ` David Howells
@ 2006-03-08 21:49     ` Paul Mackerras
  2006-03-08 22:05       ` Alan Cox
  2006-03-10  0:49     ` H. Peter Anvin
  1 sibling, 1 reply; 61+ messages in thread
From: Paul Mackerras @ 2006-03-08 21:49 UTC (permalink / raw)
  To: David Howells
  Cc: torvalds, akpm, mingo, linux-arch, linuxppc64-dev, linux-kernel

David Howells writes:

> > Enabling/disabling interrupts doesn't imply a barrier on powerpc, and
> > nor does taking an interrupt or returning from one.
> 
> Surely it ought to, otherwise what's to stop accesses done with interrupts
> disabled crossing with accesses done inside an interrupt handler?

The rule that the CPU always sees its own loads and stores in program
order.

If a CPU takes an interrupt after doing some stores, and the interrupt
handler does loads from the same location(s), it has to see the new
values, even if they haven't got to memory yet.  The interrupt isn't
special in this situation; if the instruction stream has a store to a
location followed by a load from it, the load *has* to see the value
stored by the store (assuming no other store to the same location in
the meantime, of course).  That's true whether or not the CPU takes an
exception or interrupt between the store and the load.  Anything else
would make programming really ... um ... interesting. :)

> > > +Either interrupt disablement (LOCK) and enablement (UNLOCK) will barrier
> > ...
> > I don't think this is right, and I don't think it is necessary to
> > achieve the end you state, since a CPU will always see its own memory
> > accesses in program order.
> 
> But what about a driver accessing some memory that its device is going to
> observe under irq disablement, and then getting an interrupt immediately after
> from that same device, the handler for which communicates with the device,
> possibly then being broken because the CPU hasn't completed all the memory
> accesses that the driver made while interrupts are disabled?

Well, we have to be clear about what causes what here.  Is the device
accessing this memory just at a random time, or is the access caused
by (in response to) an MMIO store?  And what causes the interrupt?
Does it just happen to come along at this time or is it in response to
one of the stores?

If the device accesses to memory are in response to an MMIO store,
then the code needs an explicit wmb() between the memory stores and
the MMIO store.  Disabling interrupts isn't going to help here because
the device doesn't see the CPU interrupt enable state.

In general it is possible for the CPU to see a different state of
memory than the device sees.  If the driver needs to be sure that they
both see the same view then it needs to use some sort of
synchronization.  A memory barrier followed by a store to the device,
with no further stores to memory until we have an indication from the
device that it has received the MMIO store, would be a suitable way to
synchronize.  Enabling or disabling interrupts does nothing useful
here because the device doesn't see that.  That applies whether we are
in an interrupt routine or not.

Do you have a specific scenario in mind, with a particular device and
driver?

One thing that driver writers do need to be careful about is that if a
device writes some data to memory and then causes an interrupt, the
fact that the interrupt has reached the CPU and the CPU has invoked
the driver's interrupt routine does *not* mean that the data has got
to memory from the CPU's point of view.  The data could still be
queued up in the PCI host bridge or elsewhere.  Doing an MMIO read
from the device is sufficient to ensure that the CPU will then see the
correct data in memory.

> Alternatively, might it be possible for communications between two CPUs to be
> stuffed because one took an interrupt that also modified common data before
> it had committed the memory accesses done under interrupt disablement?
> This would suggest using a lock though.

Disabling interrupts doesn't do *anything* to help with communication
between CPUs.  You have to use locks or explicit barriers for that.
It is possible for one CPU to see memory accesses done by another CPU
in a different order from the program order on the CPU that did the
accesses.  That applies whether or not some of the accesses were done
inside an interrupt routine.

> > What does *F+*A mean?
> 
> Combined accesses.

Still opaque, sorry: you mean they both happen in some unspecified
order?

> > Well, the driver should *not* be doing *ADR at all, it should be using
> > read[bwl]/write[bwl].  The architecture code has to implement
> > read*/write* in such a way that the accesses generated can't be
> > reordered.  I _think_ it also has to make sure the write accesses
> > can't be write-combined, but it would be good to have that clarified.
> 
> Then what use is mmiowb()?

That was introduced to help some platforms that have difficulty
ensuring that MMIO accesses hit the device in the right order, IIRC.
I'm still not entirely clear on exactly where it's needed or what
guarantees you can rely on if you do or don't use it.

> Surely write combining and out-of-order reads are reasonable for cacheable
> devices like framebuffers.

They are.  read*/write* to non-cacheable non-prefetchable MMIO
shouldn't be reordered or write-combined, but for prefetchable MMIO
I'm not sure whether read*/write* should allow reordering, or whether
drivers should use __raw_read/write* if they want that.  (Of course,
with the __raw_ functions they don't get the endian conversion
either...)

Paul.

* Re: [PATCH] Document Linux's memory barriers
  2006-03-08 20:16   ` David Howells
@ 2006-03-08 22:01     ` Alan Cox
  2006-03-09 11:41       ` David Howells
  0 siblings, 1 reply; 61+ messages in thread
From: Alan Cox @ 2006-03-08 22:01 UTC (permalink / raw)
  To: David Howells
  Cc: Pavel Machek, torvalds, akpm, mingo, linux-arch, linuxppc64-dev,
	linux-kernel

On Mer, 2006-03-08 at 20:16 +0000, David Howells wrote:
> The LOCK and UNLOCK functions presumably make at least one memory write apiece
> to manipulate the target lock (on SMP at least).

No, they merely perform the bus transactions necessary to perform an
update atomically. They are however "serializing" instructions which
means they do cause a certain amount of serialization (see the intel
architecture manual on serializing instructions for detail).

Athlon and later know how to turn it from locked memory accesses into
merely an exclusive cache line grab.

> > This makes it sound like pentium-III+ is incompatible with previous
> > CPUs. Is it really the case?
> 
> Yes - hence the alternative instruction stuff.

It is the case for certain specialist instructions and the fences are
provided to go with those but can also help in other cases. PIII and
later in particular support explicit non temporal stores.


* Re: [PATCH] Document Linux's memory barriers
  2006-03-08 21:49     ` Paul Mackerras
@ 2006-03-08 22:05       ` Alan Cox
  0 siblings, 0 replies; 61+ messages in thread
From: Alan Cox @ 2006-03-08 22:05 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: David Howells, torvalds, akpm, mingo, linux-arch, linuxppc64-dev,
	linux-kernel

On Iau, 2006-03-09 at 08:49 +1100, Paul Mackerras wrote:
> If the device accesses to memory are in response to an MMIO store,
> then the code needs an explicit wmb() between the memory stores and
> the MMIO store.  Disabling interrupts isn't going to help here because
> the device doesn't see the CPU interrupt enable state.

Interrupts are themselves entirely asynchronous anyway. The following
can occur on SMP Pentium III.

	Device
		Raise IRQ

	CPU
		writel(MASK_IRQ, &dev->ctrl);
		readl(&dev->ctrl);

	IRQ arrives
		
CPU specific IRQ masking is synchronous, but IRQ delivery is not,
including IPI delivery (which is asynchronous and not guaranteed to
occur only once per IPI but can be replayed in obscure cases on x86).



* Re: [PATCH] Document Linux's memory barriers
  2006-03-08  8:25     ` Duncan Sands
@ 2006-03-08 22:06       ` Paul Mackerras
  2006-03-08 22:24         ` David S. Miller
  2006-03-08 22:42         ` Alan Cox
  0 siblings, 2 replies; 61+ messages in thread
From: Paul Mackerras @ 2006-03-08 22:06 UTC (permalink / raw)
  To: Duncan Sands
  Cc: David Howells, akpm, linux-arch, linux-kernel, torvalds, mingo,
	linuxppc64-dev, Alan Cox

Duncan Sands writes:

> On UP you at least need compiler barriers, right?  You're in trouble if you think
> you are writing in a certain order, and expect to see the same order from an
> interrupt handler, but the compiler decided to rearrange the order of the writes...

I'd be interested to know what the C standard says about whether the
compiler can reorder writes that may be visible to a signal handler.
An interrupt handler in the kernel is logically equivalent to a signal
handler in normal C code.

Surely there are some C language lawyers on one of the lists that this
thread is going to?

Paul.

* Re: [PATCH] Document Linux's memory barriers
  2006-03-08 22:06       ` Paul Mackerras
@ 2006-03-08 22:24         ` David S. Miller
  2006-03-08 22:31           ` Linus Torvalds
  2006-03-08 22:42         ` Alan Cox
  1 sibling, 1 reply; 61+ messages in thread
From: David S. Miller @ 2006-03-08 22:24 UTC (permalink / raw)
  To: paulus
  Cc: duncan.sands, dhowells, akpm, linux-arch, linux-kernel, torvalds,
	mingo, linuxppc64-dev, alan

From: Paul Mackerras <paulus@samba.org>
Date: Thu, 9 Mar 2006 09:06:05 +1100

> I'd be interested to know what the C standard says about whether the
> compiler can reorder writes that may be visible to a signal handler.
> An interrupt handler in the kernel is logically equivalent to a signal
> handler in normal C code.
> 
> Surely there are some C language lawyers on one of the lists that this
> thread is going to?

Just like for setjmp() I think you have to mark such things
as volatile.

* Re: [PATCH] Document Linux's memory barriers
  2006-03-08 22:24         ` David S. Miller
@ 2006-03-08 22:31           ` Linus Torvalds
  0 siblings, 0 replies; 61+ messages in thread
From: Linus Torvalds @ 2006-03-08 22:31 UTC (permalink / raw)
  To: David S. Miller
  Cc: paulus, duncan.sands, dhowells, akpm, linux-arch, linux-kernel,
	mingo, linuxppc64-dev, alan



On Wed, 8 Mar 2006, David S. Miller wrote:
> 
> Just like for setjmp() I think you have to mark such things
> as volatile.

.. and sig_atomic_t.

		Linus

* Re: [PATCH] Document Linux's memory barriers
  2006-03-08 22:06       ` Paul Mackerras
  2006-03-08 22:24         ` David S. Miller
@ 2006-03-08 22:42         ` Alan Cox
  1 sibling, 0 replies; 61+ messages in thread
From: Alan Cox @ 2006-03-08 22:42 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Duncan Sands, David Howells, akpm, linux-arch, linux-kernel,
	torvalds, mingo, linuxppc64-dev

On Iau, 2006-03-09 at 09:06 +1100, Paul Mackerras wrote:
> I'd be interested to know what the C standard says about whether the
> compiler can reorder writes that may be visible to a signal handler.
> An interrupt handler in the kernel is logically equivalent to a signal
> handler in normal C code.

The C standard doesn't have much to say. POSIX has a lot to say and yes
it can do this. You do need volatile or store barriers in signal touched
code quite often, or, for that matter, locks.

POSIX/SuS also has stuff to say about what functions are signal safe and
what is not allowed.
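The combination of volatile and sig_atomic_t suggested earlier in the thread is the classic signal-safe flag. This is a minimal sketch with hypothetical names; raise() stands in for an asynchronous event, and ISO C guarantees the handler has run before raise() returns, so the loop here exits immediately. In real code the signal could arrive at any point during the loop.

```c
#include <signal.h>

/* volatile: each access in the loop below really reads memory, so the
 * compiler cannot keep the flag in a register or delete the handler's
 * store as dead.  sig_atomic_t: an access cannot be torn by a signal
 * arriving halfway through it. */
static volatile sig_atomic_t got_signal;

static void on_signal(int sig)
{
    (void)sig;
    got_signal = 1;
}

/* Returns 0 once the handler has run, -1 on setup failure. */
static int wait_for_signal(int sig)
{
    if (signal(sig, on_signal) == SIG_ERR)
        return -1;
    if (raise(sig) != 0)        /* stand-in for an asynchronous event */
        return -1;
    while (!got_signal)
        ;                       /* without volatile, this load could be hoisted */
    return 0;
}
```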

Alan


* Re: [PATCH] Document Linux's memory barriers
  2006-03-07 19:15         ` linux-os (Dick Johnson)
  (?)
@ 2006-03-09 11:26         ` Sergei Organov
  -1 siblings, 0 replies; 61+ messages in thread
From: Sergei Organov @ 2006-03-09 11:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: linuxppc64-dev

"linux-os \(Dick Johnson\)" <linux-os@analogic.com> writes:

> On Tue, 7 Mar 2006, Matthew Wilcox wrote:
>
>> On Tue, Mar 07, 2006 at 01:54:33PM -0500, linux-os (Dick Johnson) wrote:
>>> This might be a good place to document:
>>>     dummy = readl(&foodev->ctrl);
>>>
>>> Will flush all pending writes to the PCI bus and that:
>>>     (void) readl(&foodev->ctrl);
>>> ... won't because `gcc` may optimize it away. In fact, variable
>>> "dummy" should be global or `gcc` may make it go away as well.
>>
>> static inline unsigned int readl(const volatile void __iomem *addr)
>> {
>> 	return *(volatile unsigned int __force *) addr;
>> }
>>
>> The cast is volatile, so gcc knows not to optimise it away.
>>
>
> When the assignment is not made a.k.a., cast to void, or when the
> assignment is made to an otherwise unused variable, `gcc` does,
> indeed make it go away.

Wrong. From the GCC texinfo documentation:

" Less obvious expressions are where something which looks like an access
is used in a void context.  An example would be,

     volatile int *src = SOMEVALUE;
     *src;

 With C, such expressions are rvalues, and as rvalues cause a read of
the object, GCC interprets this as a read of the volatile being pointed
to. "

So, did you report the bug to the GCC maintainers?

-- Sergei.



* Re: [PATCH] Document Linux's memory barriers
  2006-03-08 22:01     ` Alan Cox
@ 2006-03-09 11:41       ` David Howells
  2006-03-09 12:28         ` Alan Cox
  2006-03-09 16:32         ` Linus Torvalds
  0 siblings, 2 replies; 61+ messages in thread
From: David Howells @ 2006-03-09 11:41 UTC (permalink / raw)
  To: Alan Cox
  Cc: David Howells, Pavel Machek, torvalds, akpm, mingo, linux-arch,
	linuxppc64-dev, linux-kernel

Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> > The LOCK and UNLOCK functions presumably make at least one memory write apiece
> > to manipulate the target lock (on SMP at least).
> 
> No, they merely perform the bus transactions necessary to perform an
> update atomically. They are however "serializing" instructions which
> means they do cause a certain amount of serialization (see the intel
> architecture manual on serializing instructions for detail).
> 
> Athlon and later know how to turn it from locked memory accesses into
> merely an exclusive cache line grab.

So, you're saying that the LOCK and UNLOCK primitives don't actually modify
memory, but rather simply pin the cacheline into the CPU's cache and refuse to
let anyone else touch it?

No... it can't work like that. It *must* make a memory modification - after
all, the CPU doesn't know that what it's doing is a spin_unlock(), say, rather
than an atomic_set().

David


* Re: [PATCH] Document Linux's memory barriers
  2006-03-09 11:41       ` David Howells
@ 2006-03-09 12:28         ` Alan Cox
  2006-03-09 13:02           ` David Howells
  2006-03-09 16:32         ` Linus Torvalds
  1 sibling, 1 reply; 61+ messages in thread
From: Alan Cox @ 2006-03-09 12:28 UTC (permalink / raw)
  To: David Howells
  Cc: Pavel Machek, torvalds, akpm, mingo, linux-arch, linuxppc64-dev,
	linux-kernel

On Thu, 2006-03-09 at 11:41 +0000, David Howells wrote:
> Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> So, you're saying that the LOCK and UNLOCK primitives don't actually modify
> memory, but rather simply pin the cacheline into the CPU's cache and refuse to
> let anyone else touch it?

Basically yes

> No... it can't work like that. It *must* make a memory modification 

Then you'll have to argue with the chip designers because it doesn't.

It's all built around cache coherency. To make a write to a cache
line I must be the sole owner of the line. Look up "MESI cache" in a
good book on the subject.

If we own the affected line, then we can update just the cache and still
get locked semantics: we own the cache line, and we will write it back if
anyone else asks for it (or, nowadays on some systems, transfer it directly
to the other CPU).
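To make the ownership argument concrete, here is a simplified MESI transition function for one cache line as seen from one CPU. It is a textbook-style model under stated simplifications (no write buffers, no MOESI/MESIF extensions, snooped traffic reduced to "remote read" and "remote read-for-ownership"); the enum and function names are made up for illustration. The property it encodes is the one Alan describes: a write to a line already held Exclusive or Modified needs no bus transaction at all.

```c
#include <assert.h>
#include <stdbool.h>

/* State of one cache line in one CPU's cache. */
enum mesi { MESI_INVALID, MESI_SHARED, MESI_EXCLUSIVE, MESI_MODIFIED };

enum mesi_event {
    LOCAL_READ,    /* this CPU loads from the line */
    LOCAL_WRITE,   /* this CPU stores to it, including a locked RMW */
    REMOTE_READ,   /* another CPU's read is snooped */
    REMOTE_WRITE   /* another CPU requests ownership of the line */
};

/* Next state.  other_copies only matters on a read miss: it says whether
 * some other cache already holds the line. */
enum mesi mesi_next(enum mesi s, enum mesi_event e, bool other_copies)
{
    switch (e) {
    case LOCAL_READ:
        if (s == MESI_INVALID)
            return other_copies ? MESI_SHARED : MESI_EXCLUSIVE;
        return s;                                 /* read hit */
    case LOCAL_WRITE:
        return MESI_MODIFIED;   /* from S or I, other copies get invalidated first */
    case REMOTE_READ:
        /* An M line is written back or forwarded; E and M drop to S. */
        return s == MESI_INVALID ? MESI_INVALID : MESI_SHARED;
    case REMOTE_WRITE:
        return MESI_INVALID;
    }
    assert(!"unknown event");
    return s;
}

/* True if a local access in state s completes without any bus message. */
bool local_access_is_silent(enum mesi s, bool is_write)
{
    if (is_write)
        return s == MESI_EXCLUSIVE || s == MESI_MODIFIED;
    return s != MESI_INVALID;
}
```

In this model, once a CPU holds a lock's line Exclusive or Modified, repeated lock/unlock cycles keep it Modified and generate no bus traffic until another CPU reads or writes that line.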



* Re: [PATCH] Document Linux's memory barriers
  2006-03-09 12:28         ` Alan Cox
@ 2006-03-09 13:02           ` David Howells
  0 siblings, 0 replies; 61+ messages in thread
From: David Howells @ 2006-03-09 13:02 UTC (permalink / raw)
  To: Alan Cox
  Cc: David Howells, Pavel Machek, torvalds, akpm, mingo, linux-arch,
	linuxppc64-dev, linux-kernel

Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> > Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > So, you're saying that the LOCK and UNLOCK primitives don't actually modify
> > memory, but rather simply pin the cacheline into the CPU's cache and refuse to
> > let anyone else touch it?
> 
> Basically yes

What you said is incomplete: the cacheline is wangled into the Exclusive
state, and there it sits until modified (at which point it shifts to the
Modified state) or stolen (when it shifts to the Shared state). Whilst the x86
CPU might pin it there for the duration of the execution of the locked
instruction, it can't leave it there until it detects a spin_unlock() or
equivalent.

I guess LL/SC and LWARX/STWCX work by the reserved load wangling the cacheline
into the Exclusive state, and then the conditional store only doing the store
if the cacheline is still in that state. I don't know whether the conditional
store may modify a cacheline that's in the Modified state, but I'd guess you'd
need more state than that, because you have to pair it with a load reserved.


With inter-CPU memory barriers I think you have to consider the cache part of
the memory, not part of the CPU. The CPU _does_ make a memory modification;
it's just that it doesn't proceed any further than the cache, until the cache
coherency mechanisms transfer the change to another CPU, or until the cache
becomes full and the lock's line gets ejected.

> > No... it can't work like that. It *must* make a memory modification 
> 
> Then you'll have to argue with the chip designers because it doesn't.
> 
> It's all built around cache coherency. To make a write to a cache
> line I must be the sole owner of the line. Look up "MESI cache" in a
> good book on the subject.

http://en.wikipedia.org/wiki/MESI_protocol

And a picture of the state machine may be found here:

https://www.cs.tcd.ie/Jeremy.Jones/vivio/caches/MESIHelp.htm

David


* Re: [PATCH] Document Linux's memory barriers
  2006-03-09 11:41       ` David Howells
  2006-03-09 12:28         ` Alan Cox
@ 2006-03-09 16:32         ` Linus Torvalds
  2006-03-09 17:39           ` David Howells
  1 sibling, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2006-03-09 16:32 UTC (permalink / raw)
  To: David Howells
  Cc: Alan Cox, Pavel Machek, akpm, mingo, linux-arch, linuxppc64-dev,
	linux-kernel



On Thu, 9 Mar 2006, David Howells wrote:
> 
> So, you're saying that the LOCK and UNLOCK primitives don't actually modify
> memory, but rather simply pin the cacheline into the CPU's cache and refuse to
> let anyone else touch it?
> 
> No... it can't work like that. It *must* make a memory modification - after
> all, the CPU doesn't know that what it's doing is a spin_unlock(), say, rather
> than an atomic_set().

Basically, as long as nobody else is reading the lock, the lock will stay 
in the caches.

Only old and stupid architectures go out to the bus for locking. For 
example, I remember the original alpha "load-locked"/"store-conditional", 
and it was totally _horrible_ for anything that wanted performance, 
because it would do the "pending lock" bit on the bus, so it took hundreds 
of cycles even on UP. Gods, how I hated that. It made it almost totally 
useless for anything that just wanted to be irq-safe - it was cheaper to 
just disable interrupts, iirc. STUPID.

All modern CPU's do atomic operations entirely within the cache coherency 
logic. I think x86 still supports the notion of a "locked cycle" on the 
bus, but I think that's entirely relegated to horrible people doing locked 
operations across PCI, and quite frankly, I suspect that it doesn't 
actually mean a thing (ie I'd expect no external hardware to actually 
react to the lock signal). However, nobody really cares, since nobody 
would be crazy enough to do locked cycles over PCI even if they were to 
work.

So in practice, as far as I know, the way _all_ modern CPU's do locked 
cycles is that they do it by getting exclusive ownership on the cacheline 
on the read, and either having logic in place to refuse to release the 
cacheline until the write is complete (ie "locked cycles to the cache"), 
or to re-try the instruction if the cacheline has been released by the 
time the write is ready (ie "load-locked" + "store-conditional" + 
"potentially loop" to the cache).

NOBODY goes out to the bus for locking any more. That would be insane and 
stupid. 

Yes, many spinlocks see contention, and end up going out to the bus. But 
similarly, many spinlocks do _not_ see any contention at all (or other 
CPU's even looking at them), and may end up staying exclusive in a CPU 
cache for a long time.

The "no contention" case is actually pretty important. Many real loads on 
SMP end up being largely single-threaded, and together with some basic CPU 
affinity, you really _really_ want to make that single-threaded case go as 
fast as possible. And a pretty big part of that is locking: the difference 
between a lock that goes to the bus and one that does not is _huge_.

And lots of trivial code is almost dominated by locking costs. In some 
system calls on an SMP kernel, the locking cost can be (depending on how 
good or bad the CPU is at them) quite noticeable. Just a simple small 
read() will take several locks and/or do atomic ops, even if it was cached 
and it looks "trivial".

			Linus
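Linus's uncontended case can be sketched as a user-space test-and-test-and-set spinlock with C11 atomics. The struct and function names are hypothetical and this is not the kernel's spinlock implementation. The compare-exchange needs exclusive ownership of the lock's cache line; the inner loop of plain loads lets waiting CPUs keep a shared copy instead of repeatedly stealing the line, and when nobody else touches the lock, lock/unlock can complete without the line ever leaving the owner's cache.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space spinlock, for illustration only. */
struct spin {
    atomic_int locked;   /* 0 = free, 1 = held */
};

static void spin_lock(struct spin *s)
{
    for (;;) {
        int expected = 0;
        /* The atomic RMW: needs the line in Exclusive/Modified state. */
        if (atomic_compare_exchange_weak_explicit(&s->locked, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return;
        /* Contended: spin on plain loads, which can hit a Shared copy. */
        while (atomic_load_explicit(&s->locked, memory_order_relaxed))
            ;
    }
}

static bool spin_trylock(struct spin *s)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(&s->locked, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

static void spin_unlock(struct spin *s)
{
    assert(atomic_load_explicit(&s->locked, memory_order_relaxed) == 1);
    /* Release store: earlier accesses in the critical section stay before it. */
    atomic_store_explicit(&s->locked, 0, memory_order_release);
}
```

A real implementation would also add a CPU relax hint (such as PAUSE on x86) in the inner loop; it is omitted here to keep the sketch portable.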


* Re: [PATCH] Document Linux's memory barriers
  2006-03-09 16:32         ` Linus Torvalds
@ 2006-03-09 17:39           ` David Howells
  2006-03-09 17:54             ` Linus Torvalds
  0 siblings, 1 reply; 61+ messages in thread
From: David Howells @ 2006-03-09 17:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Howells, Alan Cox, Pavel Machek, akpm, mingo, linux-arch,
	linuxppc64-dev, linux-kernel

Linus Torvalds <torvalds@osdl.org> wrote:

> Basically, as long as nobody else is reading the lock, the lock will stay 
> in the caches.

I think for the purposes of talking about memory barriers, we consider the
cache to be part of the memory since the cache coherency mechanisms will give
the same effect.

I suppose the way the cache can be viewed as working is that bits of memory
are shuttled around between the CPUs, RAM and any other devices that partake
of the coherency mechanism.

> All modern CPU's do atomic operations entirely within the cache coherency 
> logic.

I know that, and I think it's irrelevant to specifying memory barriers.

> I think x86 still supports the notion of a "locked cycle" on the 
> bus,

I wonder if that's what XCHG and XADD do... There's no particular reason they
should be that much slower than LOCK INCL/DECL. Of course, I've only measured
this on my Dual-PPro test box, so other i386 arch CPUs may exhibit other
behaviour.

David


* Re: [PATCH] Document Linux's memory barriers
  2006-03-09 17:39           ` David Howells
@ 2006-03-09 17:54             ` Linus Torvalds
  2006-03-09 17:56               ` Linus Torvalds
  0 siblings, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2006-03-09 17:54 UTC (permalink / raw)
  To: David Howells
  Cc: Alan Cox, Pavel Machek, akpm, mingo, linux-arch, linuxppc64-dev,
	linux-kernel



On Thu, 9 Mar 2006, David Howells wrote:
> 
> I think for the purposes of talking about memory barriers, we consider the
> cache to be part of the memory since the cache coherency mechanisms will give
> the same effect.

Yes and no.

The yes comes from the normal "smp_xxx()" barriers. As far as they are 
concerned, the cache coherency means that caches are invisible.

The "no" comes from the IO side. Basically, since IO bypasses caches and 
sometimes write buffers, it's simply not ordered wrt normal accesses.

And that's where "bus cycles" actually matter wrt barriers. If you have a 
barrier that creates a bus cycle, it suddenly can be ordered wrt IO.

So the fact that x86 SMP ops basically never guarantee any bus cycles 
basically means that they are fundamentally no-ops when it comes to IO 
serialization. That was really my only point.

> > I think x86 still supports the notion of a "locked cycle" on the 
> > bus,
> 
> I wonder if that's what XCHG and XADD do... There's no particular reason they
> should be that much slower than LOCK INCL/DECL. Of course, I've only measured
> this on my Dual-PPro test box, so other i386 arch CPUs may exhibit other
> behaviour.

I think it's an internal core implementation detail. I don't think they do 
anything on the bus, but I suspect that they could easily generate less 
optimized uops, simply because they didn't matter as much and didn't fit 
the "normal" core uop sequence.

			Linus


* Re: [PATCH] Document Linux's memory barriers
  2006-03-09 17:54             ` Linus Torvalds
@ 2006-03-09 17:56               ` Linus Torvalds
  0 siblings, 0 replies; 61+ messages in thread
From: Linus Torvalds @ 2006-03-09 17:56 UTC (permalink / raw)
  To: David Howells
  Cc: Alan Cox, Pavel Machek, akpm, mingo, linux-arch, linuxppc64-dev,
	linux-kernel



On Thu, 9 Mar 2006, Linus Torvalds wrote:
> 
> So the fact that x86 SMP ops basically never guarantee any bus cycles 
> basically means that they are fundamentally no-ops when it comes to IO 
> serialization. That was really my only point.

Side note: of course, locked cycles _do_ "serialize" the core. So they'll 
stop at least the core write merging, and speculative reads. So they do 
have some impact on IO, but they have no way of impacting things like 
write posting etc. that happen outside the CPU.

			Linus


* Re: [PATCH] Document Linux's memory barriers
  2006-03-08 13:19   ` David Howells
  2006-03-08 21:49     ` Paul Mackerras
@ 2006-03-10  0:49     ` H. Peter Anvin
  1 sibling, 0 replies; 61+ messages in thread
From: H. Peter Anvin @ 2006-03-10  0:49 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <28393.1141823992@warthog.cambridge.redhat.com>
By author:    David Howells <dhowells@redhat.com>
In newsgroup: linux.dev.kernel
> 
> However, on i386, for example, you've actually got at least two different I/O
> access domains, and I don't know how they impinge upon each other (IN/OUT vs
> MOV).
> 

You do, but those aren't the ones.

What you have is instead MOVNT versus everything else.  IN/OUT are
total sledgehammers, as they imply not only nonposted operation, but
the instruction implies wait for completion; this is required since
IN/OUT support emulation via SMI.

	-hpa
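hpa's MOVNT point can be sketched with the SSE2 intrinsics _mm_stream_si32() (MOVNTI) and _mm_sfence(). Non-temporal stores are weakly ordered even on x86, so an SFENCE is needed before a flag that tells another agent the buffer is ready. fill_streaming() is a hypothetical helper; on targets without SSE2 it falls back to plain stores. The release store on the flag constrains the compiler and ordinary stores, but not the hardware ordering of MOVNT stores, which is why the explicit fence is still there.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_stream_si32; pulls in _mm_sfence */
#endif

/* Fill buf with value using non-temporal stores where available, then
 * publish completion through *ready. */
void fill_streaming(int *buf, size_t n, int value, atomic_int *ready)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
#if defined(__SSE2__)
        _mm_stream_si32(&buf[i], value);   /* MOVNTI: bypasses the cache */
#else
        buf[i] = value;                    /* fallback: ordinary stores */
#endif
    }
#if defined(__SSE2__)
    _mm_sfence();   /* make the streaming stores globally visible first */
#endif
    atomic_store_explicit(ready, 1, memory_order_release);
}
```

IN/OUT need no such fence: as hpa notes, they already wait for completion.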


end of thread, other threads:[~2006-03-10  0:49 UTC | newest]

Thread overview: 61+ messages
     [not found] <5NONi-2hp-3@gated-at.bofh.it>
     [not found] ` <5NOtZ-1FO-27@gated-at.bofh.it>
     [not found]   ` <5NPgs-2Rw-37@gated-at.bofh.it>
     [not found]     ` <5NPq4-34a-23@gated-at.bofh.it>
2006-03-08  0:22       ` [PATCH] Document Linux's memory barriers Robert Hancock
     [not found] ` <5NQ2U-462-29@gated-at.bofh.it>
     [not found]   ` <5NRLg-6LJ-31@gated-at.bofh.it>
     [not found]     ` <5NRUR-6Yo-11@gated-at.bofh.it>
     [not found]       ` <5NUSF-30Z-5@gated-at.bofh.it>
2006-03-08  1:10         ` Robert Hancock
2006-03-08 11:35           ` Alan Cox
2006-03-08 14:55           ` Andi Kleen
2006-03-07 23:17 Chuck Ebbert
2006-03-08  0:15 ` David S. Miller
2006-03-08  0:24 ` Roberto Nibali
  -- strict thread matches above, loose matches on Subject: below --
2006-03-07 17:40 David Howells
2006-03-07 10:34 ` Andi Kleen
2006-03-07 18:30   ` David Howells
2006-03-07 11:13     ` Andi Kleen
2006-03-07 19:24       ` David Howells
2006-03-07 19:46         ` Stephen Hemminger
2006-03-07 18:46     ` Jesse Barnes
2006-03-07 19:23     ` Bryan O'Sullivan
2006-03-07 11:57       ` Andi Kleen
2006-03-07 20:01         ` Jesse Barnes
2006-03-07 21:14         ` Bryan O'Sullivan
2006-03-07 21:24           ` Andi Kleen
2006-03-08  0:36             ` Alan Cox
2006-03-08  0:35         ` Alan Cox
2006-03-07 17:47 ` Stephen Hemminger
2006-03-07 18:40 ` Alan Cox
2006-03-07 18:54   ` linux-os (Dick Johnson)
2006-03-07 18:54     ` linux-os (Dick Johnson)
2006-03-07 19:06     ` Matthew Wilcox
2006-03-07 19:15       ` linux-os (Dick Johnson)
2006-03-07 19:15         ` linux-os (Dick Johnson)
2006-03-09 11:26         ` Sergei Organov
2006-03-07 19:33     ` Alan Cox
2006-03-07 20:09   ` David Howells
2006-03-08  0:32     ` Alan Cox
2006-03-08  8:25     ` Duncan Sands
2006-03-08 22:06       ` Paul Mackerras
2006-03-08 22:24         ` David S. Miller
2006-03-08 22:31           ` Linus Torvalds
2006-03-08 22:42         ` Alan Cox
2006-03-08  2:07 ` Nick Piggin
2006-03-08  3:10 ` Paul Mackerras
2006-03-08  3:30   ` Linus Torvalds
2006-03-08 12:34     ` David Howells
2006-03-08 16:40       ` Bryan O'Sullivan
2006-03-08  7:41   ` Nick Piggin
2006-03-08 13:19   ` David Howells
2006-03-08 21:49     ` Paul Mackerras
2006-03-08 22:05       ` Alan Cox
2006-03-10  0:49     ` H. Peter Anvin
2006-03-08 16:18 ` Pavel Machek
2006-03-08 20:16   ` David Howells
2006-03-08 22:01     ` Alan Cox
2006-03-09 11:41       ` David Howells
2006-03-09 12:28         ` Alan Cox
2006-03-09 13:02           ` David Howells
2006-03-09 16:32         ` Linus Torvalds
2006-03-09 17:39           ` David Howells
2006-03-09 17:54             ` Linus Torvalds
2006-03-09 17:56               ` Linus Torvalds
2006-03-08 16:26 ` Christoph Lameter
2006-03-08 17:35   ` David Howells
2006-03-08 17:46     ` Christoph Lameter
2006-03-08 17:59       ` Alan Cox
