LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

public inbox for linux-kernel@vger.kernel.org

* LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
@ 2007-10-15 20:47 Mikulas Patocka
  2007-10-15 21:37 ` Arjan van de Ven
  0 siblings, 1 reply; 17+ messages in thread
From: Mikulas Patocka @ 2007-10-15 20:47 UTC
  To: Nick Piggin; +Cc: Linux Kernel Mailing List

> According to latest memory ordering specification documents from Intel 
> and AMD, both manufacturers are committed to in-order loads from 
> cacheable memory for the x86 architecture. Hence, smp_rmb() may be a 
> simple barrier.
>
> http://developer.intel.com/products/processor/manuals/318147.pdf 
> http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf

Hi

I'm just wondering about one thing --- what is the LFENCE instruction good 
for?

SFENCE is for enforcing ordering in write-combining buffers (it doesn't 
make sense in write-back cache mode).
MFENCE is for preventing stores from being moved past loads.

But what is LFENCE for? I read the above documents and they already say 
that CPUs have ordered loads.

In the Intel instruction reference, the description for LFENCE is copied from 
SFENCE (with the word "store" replaced with the word "load"), so it 
doesn't really give much insight into the operation of the instruction.

Or is LFENCE just a no-op reserved for the possibility that Intel would 
relax ordering rules?

Mikulas
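
For reference, a minimal sketch of the three fences named above, assuming GCC or Clang on a target with SSE2. The wrapper names (store_fence, load_fence, full_fence, fences_demo) are hypothetical; only the _mm_* intrinsics are real, and the C11 fallbacks are merely a portable stand-in for non-x86 builds, not an equivalent.

```c
#include <assert.h>
#include <stdatomic.h>

#ifdef __SSE2__
#include <emmintrin.h>   /* _mm_lfence, _mm_mfence; also pulls in _mm_sfence */
#endif

/* Hypothetical wrapper: SFENCE orders earlier stores before later stores. */
static inline void store_fence(void)
{
#ifdef __SSE2__
    _mm_sfence();
#else
    atomic_thread_fence(memory_order_release);
#endif
}

/* Hypothetical wrapper: LFENCE orders earlier loads before later loads. */
static inline void load_fence(void)
{
#ifdef __SSE2__
    _mm_lfence();
#else
    atomic_thread_fence(memory_order_acquire);
#endif
}

/* Hypothetical wrapper: MFENCE orders all earlier loads and stores. */
static inline void full_fence(void)
{
#ifdef __SSE2__
    _mm_mfence();
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* Single-threaded smoke test: fences must never change program results. */
static int fences_demo(void)
{
    volatile int slot = 0;

    slot = 41;
    store_fence();
    slot = slot + 1;
    full_fence();
    int seen = slot;
    load_fence();
    return seen;
}
```

The sketch only shows how the instructions are reached from C; it says nothing about whether LFENCE is needed for ordinary write-back memory, which is the question being asked.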


* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
  2007-10-15 20:47 LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers) Mikulas Patocka
@ 2007-10-15 21:37 ` Arjan van de Ven
  2007-10-15 22:08   ` Mikulas Patocka
  0 siblings, 1 reply; 17+ messages in thread
From: Arjan van de Ven @ 2007-10-15 21:37 UTC
  To: Mikulas Patocka; +Cc: Nick Piggin, Linux Kernel Mailing List

On Mon, 15 Oct 2007 22:47:42 +0200 (CEST)
Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> wrote:

> > According to latest memory ordering specification documents from
> > Intel and AMD, both manufacturers are committed to in-order loads
> > from cacheable memory for the x86 architecture. Hence, smp_rmb()
> > may be a simple barrier.
> >
> > http://developer.intel.com/products/processor/manuals/318147.pdf 
> > http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf
> 
> Hi
> 
> I'm just wondering about one thing --- what is LFENCE instruction
> good for?
> 
> SFENCE is for enforcing ordering in write-combining buffers (it
> doesn't make sense in write-back cache mode).
> MFENCE is for preventing stores from being moved past loads.
> 
> But what is LFENCE for? I read the above documents and they already
> say that CPUs have ordered loads.
> 

The cpus also have an explicit set of instructions that deliberately do
unordered stores/loads, and s/lfence etc are mostly designed for those.


* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
  2007-10-15 21:37 ` Arjan van de Ven
@ 2007-10-15 22:08   ` Mikulas Patocka
  2007-10-16  0:11     ` H. Peter Anvin
  2007-10-16  0:22     ` LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers) Nick Piggin
  0 siblings, 2 replies; 17+ messages in thread
From: Mikulas Patocka @ 2007-10-15 22:08 UTC
  To: Arjan van de Ven; +Cc: Nick Piggin, Linux Kernel Mailing List

> On Mon, 15 Oct 2007 22:47:42 +0200 (CEST)
> Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> wrote:
> 
> > > According to latest memory ordering specification documents from
> > > Intel and AMD, both manufacturers are committed to in-order loads
> > > from cacheable memory for the x86 architecture. Hence, smp_rmb()
> > > may be a simple barrier.
> > >
> > > http://developer.intel.com/products/processor/manuals/318147.pdf 
> > > http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf
> > 
> > Hi
> > 
> > I'm just wondering about one thing --- what is LFENCE instruction
> > good for?
> > 
> > SFENCE is for enforcing ordering in write-combining buffers (it
> > doesn't make sense in write-back cache mode).
> > MFENCE is for preventing stores from being moved past loads.
> > 
> > But what is LFENCE for? I read the above documents and they already
> > say that CPUs have ordered loads.
> > 
> 
> The cpus also have an explicit set of instructions that deliberately do 
> unordered stores/loads, and s/lfence etc are mostly designed for those.

I know about unordered stores (movnti & similar) --- they basically use 
the write-combining method on memory that is normally write-back --- and they 
need sfence. But which instruction does an unordered load and needs 
lfence?

Mikulas
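
The unordered-store case mentioned above can be sketched as follows, assuming GCC or Clang with SSE2. _mm_stream_si32 is the intrinsic for MOVNTI and _mm_sfence emits SFENCE; the helper names fill_nontemporal and fill_and_check are hypothetical, and the plain-store fallback exists only so the sketch builds on other targets.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __SSE2__
#include <emmintrin.h>   /* _mm_stream_si32 (MOVNTI), _mm_sfence (SFENCE) */
#endif

/* Hypothetical helper: fill dst[0..n) using non-temporal stores, then fence
 * so the weakly ordered stores are ordered before anything stored later. */
static void fill_nontemporal(int *dst, size_t n, int value)
{
    for (size_t i = 0; i < n; i++) {
#ifdef __SSE2__
        _mm_stream_si32(&dst[i], value);   /* write-combining style store */
#else
        dst[i] = value;                    /* fallback: ordinary store */
#endif
    }
#ifdef __SSE2__
    _mm_sfence();                          /* order the streamed stores */
#endif
}

/* The issuing CPU always observes its own stores in program order, so
 * reading the buffer back on the same thread is safe. */
static int fill_and_check(void)
{
    int buf[16] = {0};
    int sum = 0;

    fill_nontemporal(buf, 16, 7);
    for (size_t i = 0; i < 16; i++)
        sum += buf[i];
    return sum;
}
```

The fence matters only to other observers (another CPU or a device); a single thread cannot see the reordering, which is why the check above passes with or without it.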


* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
  2007-10-15 22:08   ` Mikulas Patocka
@ 2007-10-16  0:11     ` H. Peter Anvin
  2007-10-16 10:17       ` Mikulas Patocka
  2007-10-16  0:22     ` LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers) Nick Piggin
  1 sibling, 1 reply; 17+ messages in thread
From: H. Peter Anvin @ 2007-10-16  0:11 UTC
  To: Mikulas Patocka; +Cc: Arjan van de Ven, Nick Piggin, Linux Kernel Mailing List

Mikulas Patocka wrote:
> 
> I know about unordered stores (movnti & similar) --- they basically use 
> write-combining method on memory that is normally write-back --- and they 
> need sfence. But which instruction does an unordered load and needs 
> lfence?
> 

PREFETCHNTA.

	-hpa


* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
  2007-10-15 22:08   ` Mikulas Patocka
  2007-10-16  0:11     ` H. Peter Anvin
@ 2007-10-16  0:22     ` Nick Piggin
  2007-10-16 10:33       ` Mikulas Patocka
  2007-10-17  5:51       ` Herbert Xu
  1 sibling, 2 replies; 17+ messages in thread
From: Nick Piggin @ 2007-10-16  0:22 UTC
  To: Mikulas Patocka; +Cc: Arjan van de Ven, Linux Kernel Mailing List

On Tue, Oct 16, 2007 at 12:08:01AM +0200, Mikulas Patocka wrote:
> > On Mon, 15 Oct 2007 22:47:42 +0200 (CEST)
> > Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> wrote:
> > 
> > > > According to latest memory ordering specification documents from
> > > > Intel and AMD, both manufacturers are committed to in-order loads
> > > > from cacheable memory for the x86 architecture. Hence, smp_rmb()
> > > > may be a simple barrier.
> > > >
> > > > http://developer.intel.com/products/processor/manuals/318147.pdf 
> > > > http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf
> > > 
> > > Hi
> > > 
> > > I'm just wondering about one thing --- what is LFENCE instruction
> > > good for?
> > > 
> > > SFENCE is for enforcing ordering in write-combining buffers (it
> > > doesn't make sense in write-back cache mode).
> > > MFENCE is for preventing stores from being moved past loads.
> > > 
> > > But what is LFENCE for? I read the above documents and they already
> > > say that CPUs have ordered loads.
> > > 
> > 
> > The cpus also have an explicit set of instructions that deliberately do 
> > unordered stores/loads, and s/lfence etc are mostly designed for those.
> 
> I know about unordered stores (movnti & similar) --- they basically use 
> write-combining method on memory that is normally write-back --- and they 
> need sfence. But which instruction does an unordered load and needs 
> lfence?

Also, for non-wb memory. I don't think the Intel document referenced
says anything about this, but the AMD document says that loads can pass
loads (page 8, rule b).

This is why our rmb() is still an lfence.
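
The distinction between the two read barriers can be sketched in user space as below, assuming GCC or Clang inline assembly. The sketch_* macros and read_message are hypothetical stand-ins, not the kernel's definitions: sketch_rmb() issues lfence (so it also covers loads that the AMD document allows to pass each other), while sketch_smp_rmb() is the "simple barrier" proposed in the patch for cacheable memory.

```c
#include <assert.h>

/* Compiler-only barrier: stops the compiler from reordering memory accesses. */
#define sketch_barrier()   __asm__ __volatile__("" ::: "memory")

/* Hypothetical IO-strength read barrier: a real LFENCE where available. */
#ifdef __SSE2__
#define sketch_rmb()       __asm__ __volatile__("lfence" ::: "memory")
#else
#define sketch_rmb()       __sync_synchronize()
#endif

/* Hypothetical SMP read barrier for write-back memory: compiler barrier only,
 * relying on the hardware keeping cacheable loads in order. */
#define sketch_smp_rmb()   sketch_barrier()

struct message {
    int payload;
    int ready;
};

/* Consumer side: check the flag, then read the payload it guards. */
static int read_message(const volatile struct message *m)
{
    if (!m->ready)
        return -1;
    sketch_smp_rmb();      /* flag load must not be reordered after payload load */
    return m->payload;
}
```

If the structure lived in non-write-back memory, the stronger sketch_rmb() would be the one to use in read_message().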



* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
  2007-10-16  0:11     ` H. Peter Anvin
@ 2007-10-16 10:17       ` Mikulas Patocka
  2007-10-16 15:42         ` LFENCE instruction H. Peter Anvin
  0 siblings, 1 reply; 17+ messages in thread
From: Mikulas Patocka @ 2007-10-16 10:17 UTC
  To: H. Peter Anvin; +Cc: Arjan van de Ven, Nick Piggin, Linux Kernel Mailing List

On Mon, 15 Oct 2007, H. Peter Anvin wrote:

> Mikulas Patocka wrote:
> > 
> > I know about unordered stores (movnti & similar) --- they basically use
> > write-combining method on memory that is normally write-back --- and they
> > need sfence. But which instruction does an unordered load and needs
> > lfence?
> > 
> 
> PREFETCHNTA.

PREFETCH* doesn't change program semantics. The processor is allowed to 
ignore a prefetch instruction if it doesn't have the resources needed for 
the prefetch. It is not ordered wrt. fences.

PREFETCHNTA was implemented as a prefetch into the L1 cache, bypassing the L2 
cache, on Pentium 3 and M --- and it is implemented as a prefetch into the L2 
cache on others --- so it doesn't really use any special buffers.

Mikulas

> 	-hpa
> 
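
The claim that a prefetch is only a hint can be illustrated with a small sketch, assuming GCC or Clang. __builtin_prefetch(addr, rw, locality) is a real builtin; with locality 0 these compilers typically emit PREFETCHNTA on x86, though that mapping is the compiler's choice. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: plain summation with no prefetch hints. */
static long sum_plain(const int *a, size_t n)
{
    long total = 0;

    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Same loop with a non-temporal prefetch hint 16 elements ahead. */
static long sum_with_prefetch(const int *a, size_t n)
{
    long total = 0;

    for (size_t i = 0; i < n; i++) {
        size_t ahead = (i + 16 < n) ? i + 16 : i;

        /* rw = 0 (read), locality = 0 (no temporal locality). */
        __builtin_prefetch(&a[ahead], 0, 0);
        total += a[i];
    }
    return total;
}

/* Returns 1 when both variants compute the same result. */
static int prefetch_is_transparent(void)
{
    int data[64];

    for (int i = 0; i < 64; i++)
        data[i] = i * 3;
    return sum_plain(data, 64) == sum_with_prefetch(data, 64);
}
```

Whether the hint is honoured or dropped, the computed sum is identical; only timing can differ.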


* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
  2007-10-16  0:22     ` LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers) Nick Piggin
@ 2007-10-16 10:33       ` Mikulas Patocka
  2007-10-16 22:29         ` Nick Piggin
  2007-10-17  5:51       ` Herbert Xu
  1 sibling, 1 reply; 17+ messages in thread
From: Mikulas Patocka @ 2007-10-16 10:33 UTC
  To: Nick Piggin; +Cc: Arjan van de Ven, Linux Kernel Mailing List



On Tue, 16 Oct 2007, Nick Piggin wrote:

> > > The cpus also have an explicit set of instructions that deliberately do 
> > > unordered stores/loads, and s/lfence etc are mostly designed for those.
> > 
> > I know about unordered stores (movnti & similar) --- they basically use 
> > write-combining method on memory that is normally write-back --- and they 
> > need sfence. But which instruction does an unordered load and needs 
> > lfence?
> 
> Also, for non-wb memory. I don't think the Intel document referenced
> says anything about this, but the AMD document says that loads can pass
> loads (page 8, rule b).
> 
> This is why our rmb() is still an lfence.

I see, AMD says that WC memory loads can be out-of-order.

There is very little use for it --- the framebuffer and the AGP aperture are 
the only pieces of memory that are WC, and no kernel structures are placed 
there, so it is possible to remove that lfence.

Mikulas


* Re: LFENCE instruction
  2007-10-16 10:17       ` Mikulas Patocka
@ 2007-10-16 15:42         ` H. Peter Anvin
  2007-10-16 21:25           ` Mikulas Patocka
  0 siblings, 1 reply; 17+ messages in thread
From: H. Peter Anvin @ 2007-10-16 15:42 UTC
  To: Mikulas Patocka; +Cc: Arjan van de Ven, Nick Piggin, Linux Kernel Mailing List

Mikulas Patocka wrote:
> On Mon, 15 Oct 2007, H. Peter Anvin wrote:
> 
>> Mikulas Patocka wrote:
>>> I know about unordered stores (movnti & similar) --- they basically use
>>> write-combining method on memory that is normally write-back --- and they
>>> need sfence. But which instruction does an unordered load and needs
>>> lfence?
>>>
>> PREFETCHNTA.
> 
> PREFETCH* doesn't change program semantics. The processor is allowed to 
> ignore a prefetch instruction if it doesn't have the resources needed for 
> the prefetch. It is not ordered wrt. fences.
> 
> PREFETCHNTA was implemented as a prefetch into the L1 cache, bypassing the L2 
> cache, on Pentium 3 and M --- and it is implemented as a prefetch into the L2 
> cache on others --- so it doesn't really use any special buffers.
> 

Its semantics allow it to, though.  It's not clear to me whether it is 
actually necessary on existing chips.

It does, I believe, a way-restricted prefetch on existing silicon.

	-hpa


* Re: LFENCE instruction
  2007-10-16 15:42         ` LFENCE instruction H. Peter Anvin
@ 2007-10-16 21:25           ` Mikulas Patocka
  0 siblings, 0 replies; 17+ messages in thread
From: Mikulas Patocka @ 2007-10-16 21:25 UTC
  To: H. Peter Anvin; +Cc: Arjan van de Ven, Nick Piggin, Linux Kernel Mailing List

On Tue, 16 Oct 2007, H. Peter Anvin wrote:

> Mikulas Patocka wrote:
> > 
> > PREFETCH* doesn't change program semantics. The processor is allowed to
> > ignore a prefetch instruction if it doesn't have the resources needed for
> > the prefetch. It is not ordered wrt. fences.
> > 
> > PREFETCHNTA was implemented as a prefetch into the L1 cache, bypassing the
> > L2 cache, on Pentium 3 and M --- and it is implemented as a prefetch into
> > the L2 cache on others --- so it doesn't really use any special buffers.
> > 
> 
> Its semantics allow it to, though.  It's not clear to me whether it is
> actually necessary on existing chips.
> 
> It does, I believe, a way-restricted prefetch on existing silicon.

It is allowed to use special buffers for prefetch, but --- because 
prefetch doesn't change program semantics, these special buffers must be 
kept consistent just like caches --- they must be snooped for bus 
transactions and they must be checked each time something writes to cache.

So I doubt anyone will ever implement it this way --- it's too much 
silicon for too little effect.

Mikulas

> 	-hpa


* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
  2007-10-16 10:33       ` Mikulas Patocka
@ 2007-10-16 22:29         ` Nick Piggin
  2007-10-16 23:05           ` Mikulas Patocka
  0 siblings, 1 reply; 17+ messages in thread
From: Nick Piggin @ 2007-10-16 22:29 UTC
  To: Mikulas Patocka; +Cc: Arjan van de Ven, Linux Kernel Mailing List

On Tue, Oct 16, 2007 at 12:33:54PM +0200, Mikulas Patocka wrote:
> 
> 
> On Tue, 16 Oct 2007, Nick Piggin wrote:
> 
> > > > The cpus also have an explicit set of instructions that deliberately do 
> > > > unordered stores/loads, and s/lfence etc are mostly designed for those.
> > > 
> > > I know about unordered stores (movnti & similar) --- they basically use 
> > > write-combining method on memory that is normally write-back --- and they 
> > > need sfence. But which instruction does an unordered load and needs 
> > > lfence?
> > 
> > Also, for non-wb memory. I don't think the Intel document referenced
> > says anything about this, but the AMD document says that loads can pass
> > loads (page 8, rule b).
> > 
> > This is why our rmb() is still an lfence.
> 
> I see, AMD says that WC memory loads can be out-of-order.
> 
> There is very little use for it --- the framebuffer and the AGP aperture are 
> the only pieces of memory that are WC, and no kernel structures are placed 
> there, so it is possible to remove that lfence.

No. In the Linux kernel, rmb() means that all previous loads, including to
any IO regions, will be executed before any subsequent load.

How can you possibly get rid of lfence from there just because you may
happen to *know* that it isn't used? (Btw. the IO serialisation isn't for
kernel data structures, it is for actual IO operations, generally.)

Doing that would lead to an unmaintainable mess. If drivers don't need rmb,
then they don't call it.
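
The kind of driver read sequence this contract is meant to cover might look like the sketch below, assuming GCC or Clang. Everything here is hypothetical: struct fake_dev, FAKE_DEV_DATA_VALID and demo_rmb() stand in for a real device window and the kernel's rmb(), and an ordinary struct replaces a mapped IO region so the code can run in user space.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for rmb(): a real LFENCE where available. */
#ifdef __SSE2__
#define demo_rmb()  __asm__ __volatile__("lfence" ::: "memory")
#else
#define demo_rmb()  __sync_synchronize()
#endif

/* Hypothetical device window: a status word and the data it describes. */
struct fake_dev {
    volatile uint32_t status;
    volatile uint32_t data;
};

#define FAKE_DEV_DATA_VALID 0x1u

/* Read the data register only after the status load has completed. */
static int fake_dev_read(struct fake_dev *dev, uint32_t *out)
{
    if (!(dev->status & FAKE_DEV_DATA_VALID))
        return -1;         /* nothing to read yet */
    demo_rmb();            /* status load ordered before data load */
    *out = dev->data;
    return 0;
}
```

The driver does not need to know which memory type the window is mapped with; it relies on the barrier's contract, which is the maintainability argument made above.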


* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
  2007-10-16 22:29         ` Nick Piggin
@ 2007-10-16 23:05           ` Mikulas Patocka
  2007-10-16 23:21             ` Nick Piggin
  0 siblings, 1 reply; 17+ messages in thread
From: Mikulas Patocka @ 2007-10-16 23:05 UTC
  To: Nick Piggin; +Cc: Arjan van de Ven, Linux Kernel Mailing List

> > I see, AMD says that WC memory loads can be out-of-order.
> > 
> > There is very little use for it --- the framebuffer and the AGP aperture are 
> > the only pieces of memory that are WC, and no kernel structures are placed 
> > there, so it is possible to remove that lfence.
> 
> No. In Linux kernel, rmb() means that all previous loads, including to
> any IO regions, will be executed before any subsequent load.

You already must not place any data structures into WC memory --- for 
example, spinlocks wouldn't work there. wmb() also won't work on WC 
memory, because it assumes that writes are ordered.

> How can you possibly get rid of lfence from there just because you may
> happen to *know* that it isn't used (btw. the IO serialisation isn't for
> kernel data structures, it is for actual IO operations, generally).

IO regions are in uncached memory, and x86 already serializes accesses to it 
fine. It flushes any write buffers on access to uncached memory.

(BTW. what is the general portable rule for serializing writel() and 
readl()? On x86 they are serialized in hardware, but what about other archs?)

> Doing that would lead to an unmaintainable mess. If drivers don't need rmb,
> then they don't call it.

If wmb() doesn't currently work on write-combining memory, why should 
rmb() work there?

The purpose of rmb() is to enforce ordering on architectures that don't 
enforce it in hardware --- that is not the case on x86.

Mikulas


* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
  2007-10-16 23:05           ` Mikulas Patocka
@ 2007-10-16 23:21             ` Nick Piggin
  2007-10-17  0:30               ` Mikulas Patocka
  0 siblings, 1 reply; 17+ messages in thread
From: Nick Piggin @ 2007-10-16 23:21 UTC
  To: Mikulas Patocka; +Cc: Arjan van de Ven, Linux Kernel Mailing List

On Wed, Oct 17, 2007 at 01:05:16AM +0200, Mikulas Patocka wrote:
> > > I see, AMD says that WC memory loads can be out-of-order.
> > > 
> > > There is very little use for it --- the framebuffer and the AGP aperture are 
> > > the only pieces of memory that are WC, and no kernel structures are placed 
> > > there, so it is possible to remove that lfence.
> > 
> > No. In Linux kernel, rmb() means that all previous loads, including to
> > any IO regions, will be executed before any subsequent load.
> 
> You already must not place any data structures into WC memory --- for 
> example, spinlocks wouldn't work there.

What do you mean "already"? If we already have drivers loading data from
WC memory, then rmb() needs to order them, whether or not they actually
need it. If that were prohibitively costly, then we'd introduce a new
barrier which does not order WC memory, right?

> wmb() also won't work on WC 
> memory, because it assumes that writes are ordered.

You mean the one defined like this:
  #define wmb()   asm volatile("sfence" ::: "memory")
? If it assumed writes are ordered, then it would just be a barrier().

> > How can you possibly get rid of lfence from there just because you may
> > happen to *know* that it isn't used (btw. the IO serialisation isn't for
> > kernel data structures, it is for actual IO operations, generally).
> 
> IO regions are in uncached memory, and x86 already serializes it fine. It 
> flushes any write buffers on access to uncached memory.
> 
> (BTW. what is the general portable rule for serializing writel() and 
> readl()? On x86 they are serialized in hardware, but what on other archs?)

Most tend to order them strongly these days. There are also relaxed
variants for architectures that can take advantage of them.

> > Doing that would lead to an unmaintainable mess. If drivers don't need rmb,
> > then they don't call it.
> 
> If wmb() doesn't currently work on write-combining memory, why should 
> rmb() work there?

I don't understand why you say wmb() doesn't work on WC memory. What part
of which spec are you reading (or, given your mistrust of specs, what CPU
are you seeing failures with)?

> The purpose of rmb() is to enforce ordering on architectures that don't 
> force it in hardware --- that is not the case of x86.

Well it clearly is the case because I just pointed you to a document
that says they can go out of order. If you want to argue that existing
implementations do not, then by all means go ahead and send a patch to
Linus and see what he says about it ;)


* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
  2007-10-16 23:21             ` Nick Piggin
@ 2007-10-17  0:30               ` Mikulas Patocka
  2007-10-17 12:24                 ` Nick Piggin
  0 siblings, 1 reply; 17+ messages in thread
From: Mikulas Patocka @ 2007-10-17  0:30 UTC
  To: Nick Piggin; +Cc: Arjan van de Ven, Linux Kernel Mailing List

> > You already must not place any data structures into WC memory --- for 
> > example, spinlocks wouldn't work there.
> 
> What do you mean "already"?

I mean "in current kernel" (I checked it in 2.6.22)

> If we already have drivers loading data from
> WC memory, then rmb() needs to order them, whether or not they actually
> need it. If that were prohibitively costly, then we'd introduce a new
> barrier which does not order WC memory, right?
> 
> 
> > wmb() also won't work on WC 
> > memory, because it assumes that writes are ordered.
> 
> You mean the one defined like this:
>   #define wmb()   asm volatile("sfence" ::: "memory")
> ? If it assumed writes are ordered, then it would just be a barrier().

You read the wrong part of the include file. Really, it is 
(2.6.22, include/asm-i386/system.h):
#ifdef CONFIG_X86_OOSTORE
#define wmb() alternative("lock; addl $0,0(%%esp)", "sfence", X86_FEATURE_XMM)
#else
#define wmb()   __asm__ __volatile__ ("": : :"memory")
#endif

CONFIG_X86_OOSTORE is dependent on MWINCHIP3D || MWINCHIP2 || MWINCHIPC6
--- so on Intel and AMD, it is really just barrier().

So drivers can't assume that wmb() works on write-combining memory.

> > > Doing that would lead to an unmaintainable mess. If drivers don't 
> > > need rmb, then they don't call it.
> > 
> > If wmb() doesn't currently work on write-combining memory, why should 
> > rmb() work there?
> 
> I don't understand why you say wmb() doesn't work on WC memory.

Because it is defined as __asm__ __volatile__ ("": : :"memory")

And WC memory can reorder writes (WB memory can't).

> > The purpose of rmb() is to enforce ordering on architectures that don't 
> > force it in hardware --- that is not the case of x86.
> 
> Well it clearly is the case because I just pointed you to a document
> that says they can go out of order.

> If you want to argue that existing
> implementations do not, then by all means go ahead and send a patch to
> Linus and see what he says about it ;)

I mean this: wmb() assumes that the data to be ordered are not in WC 
memory. rmb() assumes that the data can be in WC memory (lfence is only 
useful on WC --- it doesn't have any effect on other memory types).

Mikulas
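
The shape of that config-dependent definition can be sketched in user space as follows, assuming GCC or Clang. SKETCH_OOSTORE is a hypothetical stand-in for CONFIG_X86_OOSTORE, sketch_wmb() and mailbox_publish() are illustrative names, and the kernel's alternative() runtime patching is deliberately left out.

```c
#include <assert.h>

/* Uncomment to model a kernel built with out-of-order store support.
 * SKETCH_OOSTORE is a hypothetical stand-in for CONFIG_X86_OOSTORE. */
/* #define SKETCH_OOSTORE 1 */

#if defined(SKETCH_OOSTORE) && defined(__SSE2__)
/* Hardware store fence, as in the OOSTORE branch quoted above. */
#define sketch_wmb()            __asm__ __volatile__("sfence" ::: "memory")
#define SKETCH_WMB_IS_HW_FENCE  1
#else
/* Compiler barrier only: emits no instruction at all. */
#define sketch_wmb()            __asm__ __volatile__("" ::: "memory")
#define SKETCH_WMB_IS_HW_FENCE  0
#endif

struct mailbox {
    int data;
    int flag;
};

/* Producer side: write the payload, then the flag that announces it. */
static void mailbox_publish(volatile struct mailbox *mb, int value)
{
    mb->data = value;
    sketch_wmb();          /* payload store ordered before flag store */
    mb->flag = 1;
}
```

With SKETCH_OOSTORE undefined, sketch_wmb() constrains only the compiler, which is sufficient for write-back memory but gives no guarantee for write-combining memory. That is the asymmetry with an lfence-based rmb() being pointed out here.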


* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
  2007-10-16  0:22     ` LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers) Nick Piggin
  2007-10-16 10:33       ` Mikulas Patocka
@ 2007-10-17  5:51       ` Herbert Xu
  2007-10-17 12:28         ` Nick Piggin
  1 sibling, 1 reply; 17+ messages in thread
From: Herbert Xu @ 2007-10-17  5:51 UTC
  To: Nick Piggin; +Cc: mikulas, arjan, linux-kernel, virtualization

Nick Piggin <npiggin@suse.de> wrote:
>
> Also, for non-wb memory. I don't think the Intel document referenced
> says anything about this, but the AMD document says that loads can pass
> loads (page 8, rule b).
> 
> This is why our rmb() is still an lfence.

BTW, Xen (in particular, the code in drivers/xen) uses mb/rmb/wmb
instead of smp_mb/smp_rmb/smp_wmb when it accesses memory that's
shared with other Xen domains or the hypervisor.

The reason this is necessary is because even if a Xen domain is
UP the hypervisor might be SMP.

It would be nice if we can have these adopt the new SMP barriers
on x86 instead of the IO ones as they currently do.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
  2007-10-17  0:30               ` Mikulas Patocka
@ 2007-10-17 12:24                 ` Nick Piggin
  2007-10-18 17:06                   ` Mikulas Patocka
  0 siblings, 1 reply; 17+ messages in thread
From: Nick Piggin @ 2007-10-17 12:24 UTC
  To: Mikulas Patocka; +Cc: Arjan van de Ven, Linux Kernel Mailing List

On Wed, Oct 17, 2007 at 02:30:32AM +0200, Mikulas Patocka wrote:
> > > You already must not place any data structures into WC memory --- for 
> > > example, spinlocks wouldn't work there.
> > 
> > What do you mean "already"?
> 
> I mean "in current kernel" (I checked it in 2.6.22)

Ahh, that's not "current kernel", though ;)

4071c718555d955a35e9651f77086096ad87d498

 
> > If we already have drivers loading data from
> > WC memory, then rmb() needs to order them, whether or not they actually
> > need it. If that were prohibitively costly, then we'd introduce a new
> > barrier which does not order WC memory, right?
> > 
> > 
> > > wmb() also won't work on WC 
> > > memory, because it assumes that writes are ordered.
> > 
> > You mean the one defined like this:
> >   #define wmb()   asm volatile("sfence" ::: "memory")
> > ? If it assumed writes are ordered, then it would just be a barrier().
> 
> You read the wrong part of the include file. Really, it is 
> (2.6.22, include/asm-i386/system.h):
> #ifdef CONFIG_X86_OOSTORE
> #define wmb() alternative("lock; addl $0,0(%%esp)", "sfence", X86_FEATURE_XMM)
> #else
> #define wmb()   __asm__ __volatile__ ("": : :"memory")
> #endif
> 
> CONFIG_X86_OOSTORE is dependent on MWINCHIP3D || MWINCHIP2 || MWINCHIPC6
> --- so on Intel and AMD, it is really just barrier().
> 
> So drivers can't assume that wmb() works on write-combining memory.

Drivers should be able to assume that wmb() orders _everything_ (except
some whacky Altix thing, which I really want to fold under wmb at some
point anyway).

So I decided that old x86 semantics isn't right, and now it really is a
lock op / sfence everywhere.
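
A rough user-space sketch of "lock op / sfence everywhere", assuming GCC or Clang on x86. The macro and function names are hypothetical, and the choice between the two instructions is made at compile time here, whereas the kernel's alternative() mechanism patches it at boot based on X86_FEATURE_XMM.

```c
#include <assert.h>

/* Locked no-op add on the top of the stack: the classic fallback barrier
 * for CPUs without SFENCE. Adding 0 leaves the stack slot unchanged. */
#if defined(__x86_64__)
#define sketch_locked_add() \
    __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc")
#elif defined(__i386__)
#define sketch_locked_add() \
    __asm__ __volatile__("lock; addl $0,0(%%esp)" ::: "memory", "cc")
#else
#define sketch_locked_add() __sync_synchronize()   /* non-x86 stand-in */
#endif

/* Hypothetical unconditional write barrier: always a real instruction. */
#ifdef __SSE2__
#define sketch_wmb_everywhere() __asm__ __volatile__("sfence" ::: "memory")
#else
#define sketch_wmb_everywhere() sketch_locked_add()
#endif

/* Exercise both forms; neither may alter the values being stored. */
static int publish_with_barriers(void)
{
    volatile int data = 0;
    volatile int flag = 0;

    data = 10;
    sketch_wmb_everywhere();
    flag = 1;
    sketch_locked_add();
    return data + flag;
}
```

Unlike a bare compiler barrier, both variants emit an instruction, so the ordering guarantee no longer depends on the memory type being write-back.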




* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
  2007-10-17  5:51       ` Herbert Xu
@ 2007-10-17 12:28         ` Nick Piggin
  0 siblings, 0 replies; 17+ messages in thread
From: Nick Piggin @ 2007-10-17 12:28 UTC
  To: Herbert Xu; +Cc: mikulas, arjan, linux-kernel, virtualization

On Wed, Oct 17, 2007 at 01:51:17PM +0800, Herbert Xu wrote:
> Nick Piggin <npiggin@suse.de> wrote:
> >
> > Also, for non-wb memory. I don't think the Intel document referenced
> > says anything about this, but the AMD document says that loads can pass
> > loads (page 8, rule b).
> > 
> > This is why our rmb() is still an lfence.
> 
> BTW, Xen (in particular, the code in drivers/xen) uses mb/rmb/wmb
> instead of smp_mb/smp_rmb/smp_wmb when it accesses memory that's
> shared with other Xen domains or the hypervisor.
> 
> The reason this is necessary is because even if a Xen domain is
> UP the hypervisor might be SMP.
> 
> It would be nice if we can have these adopt the new SMP barriers
> on x86 instead of the IO ones as they currently do.

That's a good point actually. Something like raw_smp_*mb, which
always orders memory, but only for regular WB operations. I could
put that on the todo list...
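
One way such a family of barriers could be laid out is sketched below, assuming GCC or Clang. All names are hypothetical (sketch_raw_smp_mb() merely mirrors the raw_smp_*mb idea floated above), SKETCH_CONFIG_SMP stands in for CONFIG_SMP, and __sync_synchronize() is used as a generic full hardware barrier.

```c
#include <assert.h>

#define ring_compiler_barrier()  __asm__ __volatile__("" ::: "memory")
#define ring_hw_barrier()        __sync_synchronize()

/* Leave undefined to model a uniprocessor (UP) guest kernel. */
/* #define SKETCH_CONFIG_SMP 1 */

/* IO-strength barrier: always a hardware barrier. */
#define sketch_mb()              ring_hw_barrier()

/* SMP barrier: degrades to a compiler barrier on a UP build. */
#ifdef SKETCH_CONFIG_SMP
#define sketch_smp_mb()          ring_hw_barrier()
#else
#define sketch_smp_mb()          ring_compiler_barrier()
#endif

/* "Raw" SMP barrier: a hardware barrier regardless of the build config,
 * intended for memory shared with another domain or the hypervisor. */
#define sketch_raw_smp_mb()      ring_hw_barrier()

/* Hypothetical shared ring: producer index written by the guest, event
 * word written by the other side. */
struct shared_ring {
    volatile unsigned req_prod;
    volatile unsigned rsp_event;
};

/* Publish a request, then check whether the other side wants a kick.
 * The store-then-load pattern needs a full barrier even on a UP guest. */
static int ring_post_request(struct shared_ring *ring)
{
    ring->req_prod = ring->req_prod + 1;
    sketch_raw_smp_mb();
    return ring->rsp_event != 0;
}
```

In this sketch sketch_mb() and sketch_raw_smp_mb() expand to the same builtin; the intended difference is in the contract (the raw variant would only need to order regular WB memory), which leaves room for a cheaper instruction.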



* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
  2007-10-17 12:24                 ` Nick Piggin
@ 2007-10-18 17:06                   ` Mikulas Patocka
  0 siblings, 0 replies; 17+ messages in thread
From: Mikulas Patocka @ 2007-10-18 17:06 UTC
  To: Nick Piggin; +Cc: Arjan van de Ven, Linux Kernel Mailing List

> > > > You already must not place any data structures into WC memory --- for 
> > > > example, spinlocks wouldn't work there.
> > > 
> > > What do you mean "already"?
> > 
> > I mean "in current kernel" (I checked it in 2.6.22)
> 
> Ahh, that's not "current kernel", though ;)
> 
> 4071c718555d955a35e9651f77086096ad87d498
>
> > So drivers can't assume that wmb() works on write-combining memory.
> 
> Drivers should be able to assume that wmb() orders _everything_ (except
> some whacky Altix thing, which I really want to fold under wmb at some
> point anyway).
> 
> So I decided that old x86 semantics isn't right, and now it really is a
> lock op / sfence everywhere.

I see. I'm just curious --- is there any real use for WC memory, except 
graphics card memory?

Mikulas

