* LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
From: Mikulas Patocka @ 2007-10-15 20:47 UTC
To: Nick Piggin; +Cc: Linux Kernel Mailing List
> According to latest memory ordering specification documents from Intel
> and AMD, both manufacturers are committed to in-order loads from
> cacheable memory for the x86 architecture. Hence, smp_rmb() may be a
> simple barrier.
>
> http://developer.intel.com/products/processor/manuals/318147.pdf
> http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf
Hi

I'm just wondering about one thing --- what is the LFENCE instruction good
for?

SFENCE is for enforcing ordering in write-combining buffers (it makes no
sense in write-back cache mode).
MFENCE is for preventing stores from moving past loads.

But what is LFENCE for? I read the above documents and they already say
that CPUs have ordered loads.

In the Intel instruction reference, the description of LFENCE is copied
from SFENCE (with the word "store" replaced with the word "load"), so it
doesn't really give much insight into the operation of the instruction.

Or is LFENCE just a no-op reserved for the possibility that Intel would
relax the ordering rules?

Mikulas

* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
From: Arjan van de Ven @ 2007-10-15 21:37 UTC
To: Mikulas Patocka; +Cc: Nick Piggin, Linux Kernel Mailing List

On Mon, 15 Oct 2007 22:47:42 +0200 (CEST)
Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> wrote:

> > According to latest memory ordering specification documents from
> > Intel and AMD, both manufacturers are committed to in-order loads
> > from cacheable memory for the x86 architecture. Hence, smp_rmb()
> > may be a simple barrier.
> >
> > http://developer.intel.com/products/processor/manuals/318147.pdf
> > http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf
>
> Hi
>
> I'm just wondering about one thing --- what is the LFENCE instruction
> good for?
>
> SFENCE is for enforcing ordering in write-combining buffers (it
> makes no sense in write-back cache mode).
> MFENCE is for preventing stores from moving past loads.
>
> But what is LFENCE for? I read the above documents and they already
> say that CPUs have ordered loads.
>

The CPUs also have an explicit set of instructions that deliberately do
unordered stores/loads, and s/lfence etc. are mostly designed for those.

* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
From: Mikulas Patocka @ 2007-10-15 22:08 UTC
To: Arjan van de Ven; +Cc: Nick Piggin, Linux Kernel Mailing List

> On Mon, 15 Oct 2007 22:47:42 +0200 (CEST)
> Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> wrote:
>
> > > According to latest memory ordering specification documents from
> > > Intel and AMD, both manufacturers are committed to in-order loads
> > > from cacheable memory for the x86 architecture. Hence, smp_rmb()
> > > may be a simple barrier.
> > >
> > > http://developer.intel.com/products/processor/manuals/318147.pdf
> > > http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf
> >
> > Hi
> >
> > I'm just wondering about one thing --- what is the LFENCE instruction
> > good for?
> >
> > SFENCE is for enforcing ordering in write-combining buffers (it
> > makes no sense in write-back cache mode).
> > MFENCE is for preventing stores from moving past loads.
> >
> > But what is LFENCE for? I read the above documents and they already
> > say that CPUs have ordered loads.
> >
>
> The CPUs also have an explicit set of instructions that deliberately do
> unordered stores/loads, and s/lfence etc. are mostly designed for those.

I know about unordered stores (movnti & similar) --- they basically use
the write-combining method on memory that is normally write-back --- and
they need sfence. But which instruction does unordered loads and needs
lfence?

Mikulas
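
[Editorial sketch, not part of the original message.] The movnti-plus-sfence
pattern mentioned above might look roughly like this in user-space C. The
helper name publish_nt and the ready flag are hypothetical; the intrinsics
_mm_stream_si32 (which compiles to MOVNTI) and _mm_sfence are the standard
SSE2/SSE ones, and the fallback branch is only there so the sketch builds
on targets without SSE2.

```c
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si32 (MOVNTI); also provides _mm_sfence */
#endif

/*
 * Hypothetical helper (not from the thread): fill buf[0..n) with
 * non-temporal stores, then set *ready so a consumer may read buf.
 */
static void publish_nt(int *buf, int n, int value, volatile int *ready)
{
    for (int i = 0; i < n; i++) {
#if defined(__SSE2__)
        _mm_stream_si32(&buf[i], value);  /* weakly ordered, WC-style store */
#else
        buf[i] = value;                   /* fallback: ordinary store */
#endif
    }
#if defined(__SSE2__)
    _mm_sfence();  /* make the NT stores visible before the flag store */
#endif
    *ready = 1;    /* ordinary store, ordered after the sfence */
}
```

Without the sfence, another CPU could in principle observe the flag before
the non-temporal data, which is the ordering problem the message describes.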

* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
From: H. Peter Anvin @ 2007-10-16 0:11 UTC
To: Mikulas Patocka; +Cc: Arjan van de Ven, Nick Piggin, Linux Kernel Mailing List

Mikulas Patocka wrote:
>
> I know about unordered stores (movnti & similar) --- they basically use
> the write-combining method on memory that is normally write-back --- and
> they need sfence. But which instruction does unordered loads and needs
> lfence?
>

PREFETCHNTA.

	-hpa

* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
From: Mikulas Patocka @ 2007-10-16 10:17 UTC
To: H. Peter Anvin; +Cc: Arjan van de Ven, Nick Piggin, Linux Kernel Mailing List

On Mon, 15 Oct 2007, H. Peter Anvin wrote:

> Mikulas Patocka wrote:
> >
> > I know about unordered stores (movnti & similar) --- they basically use
> > the write-combining method on memory that is normally write-back --- and
> > they need sfence. But which instruction does unordered loads and needs
> > lfence?
> >
>
> PREFETCHNTA.

PREFETCH* doesn't change program semantics. The processor is allowed to
ignore a prefetch instruction if it doesn't have the resources needed for
the prefetch. It is not ordered wrt. fences.

PREFETCHNTA was implemented as a prefetch into the L1 cache, omitting the
L2 cache, on Pentium 3 and M --- and it is implemented as a prefetch into
the L2 cache on others --- so it doesn't really use any special buffers.

Mikulas

> 	-hpa
>

* Re: LFENCE instruction
From: H. Peter Anvin @ 2007-10-16 15:42 UTC
To: Mikulas Patocka; +Cc: Arjan van de Ven, Nick Piggin, Linux Kernel Mailing List

Mikulas Patocka wrote:
> On Mon, 15 Oct 2007, H. Peter Anvin wrote:
>
>> Mikulas Patocka wrote:
>>> I know about unordered stores (movnti & similar) --- they basically use
>>> the write-combining method on memory that is normally write-back --- and
>>> they need sfence. But which instruction does unordered loads and needs
>>> lfence?
>>>
>> PREFETCHNTA.
>
> PREFETCH* doesn't change program semantics. The processor is allowed to
> ignore a prefetch instruction if it doesn't have the resources needed for
> the prefetch. It is not ordered wrt. fences.
>
> PREFETCHNTA was implemented as a prefetch into the L1 cache, omitting the
> L2 cache, on Pentium 3 and M --- and it is implemented as a prefetch into
> the L2 cache on others --- so it doesn't really use any special buffers.
>

Its semantics allow it to, though. It's not clear to me whether it is
actually necessary on existing chips.

It does, I believe, a way-restricted prefetch on existing silicon.

	-hpa

* Re: LFENCE instruction
From: Mikulas Patocka @ 2007-10-16 21:25 UTC
To: H. Peter Anvin; +Cc: Arjan van de Ven, Nick Piggin, Linux Kernel Mailing List

On Tue, 16 Oct 2007, H. Peter Anvin wrote:

> Mikulas Patocka wrote:
> >
> > PREFETCH* doesn't change program semantics. The processor is allowed to
> > ignore a prefetch instruction if it doesn't have the resources needed
> > for the prefetch. It is not ordered wrt. fences.
> >
> > PREFETCHNTA was implemented as a prefetch into the L1 cache, omitting
> > the L2 cache, on Pentium 3 and M --- and it is implemented as a prefetch
> > into the L2 cache on others --- so it doesn't really use any special
> > buffers.
> >
>
> Its semantics allow it to, though. It's not clear to me whether it is
> actually necessary on existing chips.
>
> It does, I believe, a way-restricted prefetch on existing silicon.

It is allowed to use special buffers for prefetch, but --- because prefetch
doesn't change program semantics, these special buffers must be kept
consistent just like caches --- they must be snooped for bus transactions
and they must be checked each time something writes to cache. So I doubt
anyone will ever implement it this way --- it's too much silicon for too
little effect.

Mikulas

> 	-hpa

* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
From: Nick Piggin @ 2007-10-16 0:22 UTC
To: Mikulas Patocka; +Cc: Arjan van de Ven, Linux Kernel Mailing List

On Tue, Oct 16, 2007 at 12:08:01AM +0200, Mikulas Patocka wrote:
> > On Mon, 15 Oct 2007 22:47:42 +0200 (CEST)
> > Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> wrote:
> >
> > > > According to latest memory ordering specification documents from
> > > > Intel and AMD, both manufacturers are committed to in-order loads
> > > > from cacheable memory for the x86 architecture. Hence, smp_rmb()
> > > > may be a simple barrier.
> > > >
> > > > http://developer.intel.com/products/processor/manuals/318147.pdf
> > > > http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf
> > >
> > > Hi
> > >
> > > I'm just wondering about one thing --- what is the LFENCE instruction
> > > good for?
> > >
> > > SFENCE is for enforcing ordering in write-combining buffers (it
> > > makes no sense in write-back cache mode).
> > > MFENCE is for preventing stores from moving past loads.
> > >
> > > But what is LFENCE for? I read the above documents and they already
> > > say that CPUs have ordered loads.
> > >
> >
> > The CPUs also have an explicit set of instructions that deliberately do
> > unordered stores/loads, and s/lfence etc. are mostly designed for those.
>
> I know about unordered stores (movnti & similar) --- they basically use
> the write-combining method on memory that is normally write-back --- and
> they need sfence. But which instruction does unordered loads and needs
> lfence?

Also, for non-wb memory. I don't think the Intel document referenced
says anything about this, but the AMD document says that loads can pass
loads (page 8, rule b).

This is why our rmb() is still an lfence.
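
[Editorial sketch, not part of the original message.] The distinction drawn
here, a hardware lfence for rmb() versus a compiler-only barrier for
smp_rmb() on write-back memory, could be sketched as below. All sketch_*
names and read_guarded are hypothetical stand-ins, not the kernel's actual
definitions, and the non-x86 fallback exists only so the example builds
elsewhere.

```c
/* Compiler-only barrier: stops the compiler from reordering memory accesses. */
#define sketch_barrier() __asm__ __volatile__("" ::: "memory")

#if defined(__SSE2__)
/* Hardware load fence, which the thread says rmb() still uses on x86. */
#define sketch_rmb()     __asm__ __volatile__("lfence" ::: "memory")
#else
#define sketch_rmb()     __sync_synchronize()  /* fallback for other targets */
#endif

/*
 * For ordinary write-back memory, x86 keeps loads in order, so the patch
 * under discussion lets the SMP variant be a compiler barrier only.
 */
#define sketch_smp_rmb() sketch_barrier()

/* Read a flag, then the payload it guards; returns 1 if the payload was read. */
static int read_guarded(const volatile int *flag, const volatile int *data,
                        int *out)
{
    if (!*flag)
        return 0;
    sketch_rmb();  /* sketch_smp_rmb() would do if both live in WB memory */
    *out = *data;
    return 1;
}
```

The point of keeping the stronger form is that the caller does not have to
know which memory type the two locations live in.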

* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
From: Mikulas Patocka @ 2007-10-16 10:33 UTC
To: Nick Piggin; +Cc: Arjan van de Ven, Linux Kernel Mailing List

On Tue, 16 Oct 2007, Nick Piggin wrote:

> > > The CPUs also have an explicit set of instructions that deliberately do
> > > unordered stores/loads, and s/lfence etc. are mostly designed for those.
> >
> > I know about unordered stores (movnti & similar) --- they basically use
> > the write-combining method on memory that is normally write-back --- and
> > they need sfence. But which instruction does unordered loads and needs
> > lfence?
>
> Also, for non-wb memory. I don't think the Intel document referenced
> says anything about this, but the AMD document says that loads can pass
> loads (page 8, rule b).
>
> This is why our rmb() is still an lfence.

I see, AMD says that WC memory loads can be out-of-order.

There is very little use for it --- the framebuffer and AGP aperture are
the only memory that is WC and no kernel structures are placed there, so
it is possible to remove that lfence.

Mikulas

* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
From: Nick Piggin @ 2007-10-16 22:29 UTC
To: Mikulas Patocka; +Cc: Arjan van de Ven, Linux Kernel Mailing List

On Tue, Oct 16, 2007 at 12:33:54PM +0200, Mikulas Patocka wrote:
>
>
> On Tue, 16 Oct 2007, Nick Piggin wrote:
>
> > > > The CPUs also have an explicit set of instructions that deliberately do
> > > > unordered stores/loads, and s/lfence etc. are mostly designed for those.
> > >
> > > I know about unordered stores (movnti & similar) --- they basically use
> > > the write-combining method on memory that is normally write-back --- and
> > > they need sfence. But which instruction does unordered loads and needs
> > > lfence?
> >
> > Also, for non-wb memory. I don't think the Intel document referenced
> > says anything about this, but the AMD document says that loads can pass
> > loads (page 8, rule b).
> >
> > This is why our rmb() is still an lfence.
>
> I see, AMD says that WC memory loads can be out-of-order.
>
> There is very little use for it --- the framebuffer and AGP aperture are
> the only memory that is WC and no kernel structures are placed there, so
> it is possible to remove that lfence.

No. In the Linux kernel, rmb() means that all previous loads, including to
any IO regions, will be executed before any subsequent load.

How can you possibly get rid of lfence from there just because you may
happen to *know* that it isn't used? (Btw. the IO serialisation isn't for
kernel data structures, it is for actual IO operations, generally.)

Doing that would lead to an unmaintainable mess. If drivers don't need rmb,
then they don't call it.

* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
From: Mikulas Patocka @ 2007-10-16 23:05 UTC
To: Nick Piggin; +Cc: Arjan van de Ven, Linux Kernel Mailing List

> > I see, AMD says that WC memory loads can be out-of-order.
> >
> > There is very little use for it --- the framebuffer and AGP aperture are
> > the only memory that is WC and no kernel structures are placed there, so
> > it is possible to remove that lfence.
>
> No. In the Linux kernel, rmb() means that all previous loads, including to
> any IO regions, will be executed before any subsequent load.

You already must not place any data structures into WC memory --- for
example, spinlocks wouldn't work there. wmb() also won't work on WC
memory, because it assumes that writes are ordered.

> How can you possibly get rid of lfence from there just because you may
> happen to *know* that it isn't used? (Btw. the IO serialisation isn't for
> kernel data structures, it is for actual IO operations, generally.)

IO regions are in uncached memory, and x86 already serializes it fine. It
flushes any write buffers on access to uncached memory.

(BTW. what is the general portable rule for serializing writel() and
readl()? On x86 they are serialized in hardware, but what about other
archs?)

> Doing that would lead to an unmaintainable mess. If drivers don't need rmb,
> then they don't call it.

If wmb() doesn't currently work on write-combining memory, why should
rmb() work there?

The purpose of rmb() is to enforce ordering on architectures that don't
force it in hardware --- that is not the case on x86.

Mikulas

* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
From: Nick Piggin @ 2007-10-16 23:21 UTC
To: Mikulas Patocka; +Cc: Arjan van de Ven, Linux Kernel Mailing List

On Wed, Oct 17, 2007 at 01:05:16AM +0200, Mikulas Patocka wrote:
> > > I see, AMD says that WC memory loads can be out-of-order.
> > >
> > > There is very little use for it --- the framebuffer and AGP aperture are
> > > the only memory that is WC and no kernel structures are placed there, so
> > > it is possible to remove that lfence.
> >
> > No. In the Linux kernel, rmb() means that all previous loads, including to
> > any IO regions, will be executed before any subsequent load.
>
> You already must not place any data structures into WC memory --- for
> example, spinlocks wouldn't work there.

What do you mean "already"? If we already have drivers loading data from
WC memory, then rmb() needs to order them, whether or not they actually
need it. If that were prohibitively costly, then we'd introduce a new
barrier which does not order WC memory, right?

> wmb() also won't work on WC
> memory, because it assumes that writes are ordered.

You mean the one defined like this:
  #define wmb() asm volatile("sfence" ::: "memory")
? If it assumed writes are ordered, then it would just be a barrier().

> > How can you possibly get rid of lfence from there just because you may
> > happen to *know* that it isn't used? (Btw. the IO serialisation isn't for
> > kernel data structures, it is for actual IO operations, generally.)
>
> IO regions are in uncached memory, and x86 already serializes it fine. It
> flushes any write buffers on access to uncached memory.
>
> (BTW. what is the general portable rule for serializing writel() and
> readl()? On x86 they are serialized in hardware, but what about other
> archs?)

Most tend to order them strongly these days. There are also relaxed
variants for architectures that can take advantage of them.

> > Doing that would lead to an unmaintainable mess. If drivers don't need rmb,
> > then they don't call it.
>
> If wmb() doesn't currently work on write-combining memory, why should
> rmb() work there?

I don't understand why you say wmb() doesn't work on WC memory. What part
of which spec are you reading (or, given your mistrust of specs, what CPU
are you seeing failures with)?

> The purpose of rmb() is to enforce ordering on architectures that don't
> force it in hardware --- that is not the case on x86.

Well it clearly is the case because I just pointed you to a document
that says they can go out of order. If you want to argue that existing
implementations do not, then by all means go ahead and send a patch to
Linus and see what he says about it ;)

* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
From: Mikulas Patocka @ 2007-10-17 0:30 UTC
To: Nick Piggin; +Cc: Arjan van de Ven, Linux Kernel Mailing List

> > You already must not place any data structures into WC memory --- for
> > example, spinlocks wouldn't work there.
>
> What do you mean "already"?

I mean "in current kernel" (I checked it in 2.6.22)

> If we already have drivers loading data from
> WC memory, then rmb() needs to order them, whether or not they actually
> need it. If that were prohibitively costly, then we'd introduce a new
> barrier which does not order WC memory, right?
>
>
> > wmb() also won't work on WC
> > memory, because it assumes that writes are ordered.
>
> You mean the one defined like this:
>   #define wmb() asm volatile("sfence" ::: "memory")
> ? If it assumed writes are ordered, then it would just be a barrier().

You read the wrong part of the include file. Really, it is
(2.6.22, include/asm-i386/system.h):

#ifdef CONFIG_X86_OOSTORE
#define wmb() alternative("lock; addl $0,0(%%esp)", "sfence", X86_FEATURE_XMM)
#else
#define wmb()   __asm__ __volatile__ ("": : :"memory")
#endif

CONFIG_X86_OOSTORE is dependent on MWINCHIP3D || MWINCHIP2 || MWINCHIPC6
--- so on Intel and AMD, it is really just barrier().

So drivers can't assume that wmb() works on write-combining memory.

> > > Doing that would lead to an unmaintainable mess. If drivers don't
> > > need rmb, then they don't call it.
> >
> > If wmb() doesn't currently work on write-combining memory, why should
> > rmb() work there?
>
> I don't understand why you say wmb() doesn't work on WC memory.

Because it is defined as __asm__ __volatile__ ("": : :"memory")
And WC memory can reorder writes (WB memory can't).

> > The purpose of rmb() is to enforce ordering on architectures that don't
> > force it in hardware --- that is not the case on x86.
>
> Well it clearly is the case because I just pointed you to a document
> that says they can go out of order.
> If you want to argue that existing
> implementations do not, then by all means go ahead and send a patch to
> Linus and see what he says about it ;)

I mean this: wmb() assumes that the data to be ordered are not in WC
memory. rmb() assumes that the data can be in WC memory (lfence is only
useful on WC --- it doesn't have any effect on other memory types).

Mikulas

* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
From: Nick Piggin @ 2007-10-17 12:24 UTC
To: Mikulas Patocka; +Cc: Arjan van de Ven, Linux Kernel Mailing List

On Wed, Oct 17, 2007 at 02:30:32AM +0200, Mikulas Patocka wrote:
> > > You already must not place any data structures into WC memory --- for
> > > example, spinlocks wouldn't work there.
> >
> > What do you mean "already"?
>
> I mean "in current kernel" (I checked it in 2.6.22)

Ahh, that's not "current kernel", though ;)

4071c718555d955a35e9651f77086096ad87d498

> > If we already have drivers loading data from
> > WC memory, then rmb() needs to order them, whether or not they actually
> > need it. If that were prohibitively costly, then we'd introduce a new
> > barrier which does not order WC memory, right?
> >
> >
> > > wmb() also won't work on WC
> > > memory, because it assumes that writes are ordered.
> >
> > You mean the one defined like this:
> >   #define wmb() asm volatile("sfence" ::: "memory")
> > ? If it assumed writes are ordered, then it would just be a barrier().
>
> You read the wrong part of the include file. Really, it is
> (2.6.22, include/asm-i386/system.h):
> #ifdef CONFIG_X86_OOSTORE
> #define wmb() alternative("lock; addl $0,0(%%esp)", "sfence",
> X86_FEATURE_XMM)
> #else
> #define wmb()   __asm__ __volatile__ ("": : :"memory")
> #endif
>
> CONFIG_X86_OOSTORE is dependent on MWINCHIP3D || MWINCHIP2 || MWINCHIPC6
> --- so on Intel and AMD, it is really just barrier().
>
> So drivers can't assume that wmb() works on write-combining memory.

Drivers should be able to assume that wmb() orders _everything_ (except
some whacky Altix thing, which I really want to fold under wmb at some
point anyway).

So I decided that old x86 semantics isn't right, and now it really is a
lock op / sfence everywhere.
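
[Editorial sketch, not part of the original message.] The two hardware
forms named above ("lock op / sfence") can be written as stand-alone
macros for illustration. The sketch_* names and write_then_signal are
hypothetical. The kernel selects between the two forms at boot via
alternative(); this sketch simply takes a parameter and, on non-x86
targets, falls back to the compiler's full-barrier builtin.

```c
#if defined(__x86_64__)
/* Locked no-op add on the stack top: works on any x86-64 CPU. */
#define sketch_wmb_locked() \
    __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc")
#elif defined(__i386__)
#define sketch_wmb_locked() \
    __asm__ __volatile__("lock; addl $0,0(%%esp)" ::: "memory", "cc")
#else
#define sketch_wmb_locked() __sync_synchronize()  /* fallback elsewhere */
#endif

#if defined(__SSE__)
/* SFENCE needs SSE, which is why the kernel keys it on X86_FEATURE_XMM. */
#define sketch_wmb_sfence() __asm__ __volatile__("sfence" ::: "memory")
#else
#define sketch_wmb_sfence() sketch_wmb_locked()
#endif

/* Store the payload, issue a store barrier, then raise the flag. */
static void write_then_signal(volatile int *data, volatile int *flag,
                              int value, int use_sfence)
{
    *data = value;
    if (use_sfence)
        sketch_wmb_sfence();
    else
        sketch_wmb_locked();
    *flag = 1;
}
```

Either form is a real instruction, in contrast to the empty asm statement
in the 2.6.22 definition quoted earlier, which only constrains the compiler.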

* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
From: Mikulas Patocka @ 2007-10-18 17:06 UTC
To: Nick Piggin; +Cc: Arjan van de Ven, Linux Kernel Mailing List

> > > > You already must not place any data structures into WC memory --- for
> > > > example, spinlocks wouldn't work there.
> > >
> > > What do you mean "already"?
> >
> > I mean "in current kernel" (I checked it in 2.6.22)
>
> Ahh, that's not "current kernel", though ;)
>
> 4071c718555d955a35e9651f77086096ad87d498
>
> > So drivers can't assume that wmb() works on write-combining memory.
>
> Drivers should be able to assume that wmb() orders _everything_ (except
> some whacky Altix thing, which I really want to fold under wmb at some
> point anyway).
>
> So I decided that old x86 semantics isn't right, and now it really is a
> lock op / sfence everywhere.

I see. I'm just curious --- is there any real use for WC memory, except
graphics card memory?

Mikulas

* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
From: Herbert Xu @ 2007-10-17 5:51 UTC
To: Nick Piggin; +Cc: mikulas, arjan, linux-kernel, virtualization

Nick Piggin <npiggin@suse.de> wrote:
>
> Also, for non-wb memory. I don't think the Intel document referenced
> says anything about this, but the AMD document says that loads can pass
> loads (page 8, rule b).
>
> This is why our rmb() is still an lfence.

BTW, Xen (in particular, the code in drivers/xen) uses mb/rmb/wmb
instead of smp_mb/smp_rmb/smp_wmb when it accesses memory that's
shared with other Xen domains or the hypervisor.

The reason this is necessary is because even if a Xen domain is
UP the hypervisor might be SMP.

It would be nice if we can have these adopt the new SMP barriers
on x86 instead of the IO ones as they currently do.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
From: Nick Piggin @ 2007-10-17 12:28 UTC
To: Herbert Xu; +Cc: mikulas, arjan, linux-kernel, virtualization

On Wed, Oct 17, 2007 at 01:51:17PM +0800, Herbert Xu wrote:
> Nick Piggin <npiggin@suse.de> wrote:
> >
> > Also, for non-wb memory. I don't think the Intel document referenced
> > says anything about this, but the AMD document says that loads can pass
> > loads (page 8, rule b).
> >
> > This is why our rmb() is still an lfence.
>
> BTW, Xen (in particular, the code in drivers/xen) uses mb/rmb/wmb
> instead of smp_mb/smp_rmb/smp_wmb when it accesses memory that's
> shared with other Xen domains or the hypervisor.
>
> The reason this is necessary is because even if a Xen domain is
> UP the hypervisor might be SMP.
>
> It would be nice if we can have these adopt the new SMP barriers
> on x86 instead of the IO ones as they currently do.

That's a good point actually. Something like raw_smp_*mb, which always
orders memory, but only for regular WB operations. I could put that on
the todo list...
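
[Editorial sketch, not part of the original message.] One way to picture
the raw_smp_*mb idea floated above: a barrier that stays a real hardware
barrier even when the kernel is built for a single CPU, because the peer
(another Xen domain or the hypervisor) may be running on a different CPU.
Everything here is hypothetical, including the SKETCH_CONFIG_SMP switch
(a stand-in for CONFIG_SMP), the sketch_* macros and the ring structure;
__sync_synchronize() is the GCC/Clang full-barrier builtin.

```c
/* Compiler-only barrier. */
#define sketch_barrier() __asm__ __volatile__("" ::: "memory")
/* Full hardware barrier via the compiler builtin. */
#define sketch_hw_mb()   __sync_synchronize()

#ifdef SKETCH_CONFIG_SMP
#define sketch_smp_mb()     sketch_hw_mb()
#else
#define sketch_smp_mb()     sketch_barrier()  /* UP build: compiler barrier only */
#endif

/* Always a real barrier, even in a UP build, since the peer may be SMP. */
#define sketch_raw_smp_mb() sketch_hw_mb()

/* Hypothetical memory region shared with another domain. */
struct sketch_ring {
    volatile int payload;
    volatile int produced;
};

/* Publish a value to the shared region. */
static void ring_post(struct sketch_ring *r, int value)
{
    r->payload = value;
    sketch_raw_smp_mb();  /* must not collapse to a compiler barrier on UP */
    r->produced = 1;
}
```

Using sketch_smp_mb() in ring_post would be wrong in a UP guest for exactly
the reason Herbert gives, while the IO-strength barriers are stronger than
ordinary write-back shared memory needs.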