* x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?
@ 2016-03-03 14:33 Dexuan Cui
From: Dexuan Cui @ 2016-03-03 14:33 UTC
To: linux-x86_64@vger.kernel.org, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, David Howells, Paul E. McKenney
Cc: linux-kernel@vger.kernel.org
Hi,
My understanding of arch/x86/include/asm/barrier.h is that Linux clearly
prefers {L,S,M}FENCE: a locked ADD is only used on x86_32 platforms that
don't support XMM2 (i.e. SSE2).
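For readers following along, here is a rough userspace sketch of the two full-barrier forms the header chooses between, assuming GCC or Clang on x86. The kernel makes this choice once at boot by patching the instruction in place (its alternative() mechanism keyed on X86_FEATURE_XMM2); the function pointer and the names `barrier_fn`, `barrier_mfence`, `barrier_locked_add` and `pick_full_barrier` are hypothetical stand-ins for that, not kernel APIs.

```c
#include <stdatomic.h>

/* Hypothetical type for a full-barrier routine (not a kernel API). */
typedef void (*barrier_fn)(void);

#if defined(__x86_64__) || defined(__i386__)
/* Full barrier using MFENCE, an SSE2 instruction. */
static void barrier_mfence(void)
{
    __asm__ __volatile__("mfence" ::: "memory");
}

/* Full barrier using a LOCK-prefixed add of 0 to the word at the top of
 * the stack: the value is unchanged, but for ordinary write-back memory
 * the locked read-modify-write orders earlier loads/stores against later
 * ones. ADD also writes the flags, hence the "cc" clobber. */
static void barrier_locked_add(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc");
#else
    __asm__ __volatile__("lock; addl $0,0(%%esp)" ::: "memory", "cc");
#endif
}

/* Userspace stand-in for the boot-time choice: MFENCE when the CPU has
 * SSE2, otherwise the locked add. */
static barrier_fn pick_full_barrier(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") ? barrier_mfence : barrier_locked_add;
}
#else
/* Non-x86 fallback so the sketch still builds: both map to a C11 fence. */
static void barrier_mfence(void) { atomic_thread_fence(memory_order_seq_cst); }
static void barrier_locked_add(void) { atomic_thread_fence(memory_order_seq_cst); }
static barrier_fn pick_full_barrier(void) { return barrier_mfence; }
#endif
```

On any x86_64 CPU SSE2 is always present, so the selector returns the MFENCE variant, which matches the 64-bit branch of the header using MFENCE unconditionally.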
However, people seem to say that a locked ADD is much faster than the
FENCE instructions, even on modern Intel CPUs like Haswell; see, for
example, these three sources:
" 11.5.1 Locked Instructions as Memory Barriers
Optimization
Use locked instructions to implement Store/Store and Store/Load barriers.
"
http://support.amd.com/TechDocs/47414_15h_sw_opt_guide.pdf
"lock addl %(rsp), 0 is a better solution for StoreLoad barrier ":
http://shipilev.net/blog/2014/on-the-fence-with-dependencies/
"...locked instruction are more efficient barriers...":
http://www.pvk.ca/Blog/2014/10/19/performance-optimisation-~-writing-an-essay/
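The StoreLoad case these sources talk about is the classic store-buffering (Dekker) pattern: each CPU stores to one location and then loads the other. x86 may satisfy the load before its own earlier store is globally visible, so without a full barrier both loads can return 0. The sketch below is a minimal illustration, assuming GCC or Clang on x86 with POSIX threads; it relies on the asm "memory" clobber also acting as a compiler barrier, and the names `full_barrier`, `cpu0`, `cpu1` and `count_forbidden` are hypothetical. It puts a locked add between the store and the load and counts how often the forbidden (0, 0) outcome appears.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared locations; both are reset to 0 before every round. */
static atomic_int x, y;
/* Per-thread results, read by the parent only after pthread_join(). */
static int r1, r2;

/* Full barrier: LOCK ADD of 0 on x86, C11 seq_cst fence elsewhere. */
static inline void full_barrier(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc");
#elif defined(__i386__)
    __asm__ __volatile__("lock; addl $0,0(%%esp)" ::: "memory", "cc");
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

static void *cpu0(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed); /* store x = 1 */
    full_barrier();                                     /* StoreLoad */
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

static void *cpu1(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed); /* store y = 1 */
    full_barrier();                                     /* StoreLoad */
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Run the test `rounds` times; return how many rounds ended with
 * r1 == 0 && r2 == 0 (the outcome a StoreLoad barrier forbids),
 * or -1 if a thread could not be created. */
static int count_forbidden(int rounds)
{
    int forbidden = 0;
    for (int i = 0; i < rounds; i++) {
        pthread_t t0, t1;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        if (pthread_create(&t0, NULL, cpu0, NULL) != 0)
            return -1;
        if (pthread_create(&t1, NULL, cpu1, NULL) != 0) {
            pthread_join(t0, NULL);
            return -1;
        }
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        if (r1 == 0 && r2 == 0)
            forbidden++;
    }
    return forbidden;
}
```

Spawning fresh threads each round makes the racy window tiny, so this illustrates the pattern rather than stress-testing it; with the barrier in place the count should always be 0.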
I also found that FreeBSD prefers Locked Add.
So, I'm curious why Linux prefers MFENCE.
I guess I may be missing something.
I tried to google the question, but didn't find an answer.
Thanks,
-- Dexuan
* Re: x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?
From: Ingo Molnar @ 2016-03-03 15:27 UTC
To: Dexuan Cui
Cc: linux-x86_64@vger.kernel.org, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, David Howells, Paul E. McKenney,
linux-kernel@vger.kernel.org, Michael S. Tsirkin, Peter Zijlstra

* Dexuan Cui <decui@microsoft.com> wrote:

> [...]
> So, I'm curious why Linux prefers MFENCE.
> I guess I may be missing something.
>
> I tried to google the question, but didn't find an answer.

It's being worked on, see this thread on lkml from a few weeks ago:

  C Jan 13 Michael S. Tsir | [PATCH v3 0/4] x86: faster mb()+documentation tweaks
  C Jan 13 Michael S. Tsir | ├─>[PATCH v3 1/4] x86: add cc clobber for addl
  C Jan 13 Michael S. Tsir | ├─>[PATCH v3 2/4] x86: drop a comment left over from X86_OOSTORE
  C Jan 13 Michael S. Tsir | ├─>[PATCH v3 3/4] x86: tweak the comment about use of wmb for IO
  C Jan 13 Michael S. Tsir | ├─>[PATCH v3 4/4] x86: drop mfence in favor of lock+addl

The 4th patch changes MFENCE to a LOCK ADDL locked instruction.

Thanks,

	Ingo
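The series' cover letter is titled "faster mb()". For a rough feel of the difference on your own machine, the sketch below times back-to-back, uncontended MFENCE and LOCK ADD barriers. It assumes GCC or Clang inline asm on x86 and uses C11 `timespec_get` for wall-clock timing; `bench_mfence`, `bench_lock_add` and `time_barrier` are hypothetical names. It says nothing about contended cases or about the correctness caveats raised later in the thread, and the numbers will vary by microarchitecture.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

static void bench_mfence(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("mfence" ::: "memory");
#else
    atomic_thread_fence(memory_order_seq_cst); /* non-x86 fallback */
#endif
}

static void bench_lock_add(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc");
#elif defined(__i386__)
    __asm__ __volatile__("lock; addl $0,0(%%esp)" ::: "memory", "cc");
#else
    atomic_thread_fence(memory_order_seq_cst); /* non-x86 fallback */
#endif
}

/* Wall-clock time in nanoseconds (C11 timespec_get). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Average nanoseconds per call of fn over n back-to-back calls,
 * including the indirect-call overhead (same for both variants). */
static double time_barrier(void (*fn)(void), long n)
{
    uint64_t start = now_ns();
    for (long i = 0; i < n; i++)
        fn();
    return (double)(now_ns() - start) / (double)n;
}
```

Usage: call `time_barrier(bench_mfence, 1000000)` and `time_barrier(bench_lock_add, 1000000)` and compare; a tight single-threaded loop is only a first approximation of real kernel workloads.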
* Re: x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?
From: Peter Zijlstra @ 2016-03-03 15:34 UTC
To: Ingo Molnar
Cc: Dexuan Cui, linux-x86_64@vger.kernel.org, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, David Howells, Paul E. McKenney,
linux-kernel@vger.kernel.org, Michael S. Tsirkin

On Thu, Mar 03, 2016 at 04:27:39PM +0100, Ingo Molnar wrote:
>
> * Dexuan Cui <decui@microsoft.com> wrote:
>
> > [...]
>
> It's being worked on, see this thread on lkml from a few weeks ago:
>
> [...]
>
> The 4th patch changes MFENCE to a LOCK ADDL locked instruction.

Lots of additional chatter here:

  lkml.kernel.org/r/20160112150032-mutt-send-email-mst@redhat.com

And some useful bits here:

  lkml.kernel.org/r/56957D54.5000602@zytor.com

latest version here:

  lkml.kernel.org/r/1453921746-16178-1-git-send-email-mst@redhat.com
* Re: x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?
From: Michael S. Tsirkin @ 2016-03-03 18:35 UTC
To: Peter Zijlstra
Cc: Ingo Molnar, Dexuan Cui, linux-x86_64@vger.kernel.org, Thomas Gleixner,
Ingo Molnar, H. Peter Anvin, David Howells, Paul E. McKenney,
linux-kernel@vger.kernel.org

On Thu, Mar 03, 2016 at 04:34:53PM +0100, Peter Zijlstra wrote:
> [...]
> latest version here:
>
>   lkml.kernel.org/r/1453921746-16178-1-git-send-email-mst@redhat.com

It's ready as far as I am concerned.
Basically, we are just waiting for an ack from hpa.

-- 
MST
* Re: x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?
From: H. Peter Anvin @ 2016-03-03 19:05 UTC
To: Michael S. Tsirkin, Peter Zijlstra
Cc: Ingo Molnar, Dexuan Cui, linux-x86_64@vger.kernel.org, Thomas Gleixner,
Ingo Molnar, David Howells, Paul E. McKenney, linux-kernel@vger.kernel.org

On March 3, 2016 10:35:50 AM PST, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>[...]
>It's ready as far as I am concerned.
>Basically we are just waiting for ack from hpa.

And I'm still discussing this with the hardware people. It seems we
can do this for *most* things, but not all; the question is where
exactly we need to do something different.

-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
* Re: x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?
From: Peter Zijlstra @ 2016-06-03 13:39 UTC
To: H. Peter Anvin
Cc: Michael S. Tsirkin, Ingo Molnar, Dexuan Cui, linux-x86_64@vger.kernel.org,
Thomas Gleixner, Ingo Molnar, David Howells, Paul E. McKenney,
linux-kernel@vger.kernel.org

On Thu, Mar 03, 2016 at 11:05:43AM -0800, H. Peter Anvin wrote:
> >> latest version here:
> >>
> >>   lkml.kernel.org/r/1453921746-16178-1-git-send-email-mst@redhat.com
> >
> >It's ready as far as I am concerned.
> >Basically we are just waiting for ack from hpa.
>
> And I'm still discussing this with the hardware people. It seems we
> can do this for *most* things, but not all; the question is where
> exactly we need to do something different.

Anything on this?
* Re: x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?
From: Michael S. Tsirkin @ 2016-08-03 4:36 UTC
To: H. Peter Anvin
Cc: Peter Zijlstra, Ingo Molnar, Dexuan Cui, linux-x86_64@vger.kernel.org,
Thomas Gleixner, Ingo Molnar, David Howells, Paul E. McKenney,
linux-kernel@vger.kernel.org

On Thu, Mar 03, 2016 at 11:05:43AM -0800, H. Peter Anvin wrote:
> On March 3, 2016 10:35:50 AM PST, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >[...]
> >It's ready as far as I am concerned.
> >Basically we are just waiting for ack from hpa.
>
> And I'm still discussing this with the hardware people. It seems we
> can do this for *most* things, but not all; the question is where
> exactly we need to do something different.

I'm guessing there's still no update?

There's a decent chance that, without documentation, a bunch of current
uses are actually broken. See for example
http://marc.info/?l=linux-kernel&m=145400059304553&w=2
which, going by the manual, is fixing smp_mb misuse for clflush - or
maybe not?
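For context on the clflush question, here is a sketch of the conservative pattern of bracketing a range of CLFLUSHes with full MFENCE barriers, using the SSE2 intrinsics `_mm_clflush` and `_mm_mfence`. The helper name `flush_range_fenced` and the 64-byte line size are assumptions (real code would take the line size from CPUID), and whether a locked instruction could stand in for either fence is exactly the open question in the message above.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_clflush, _mm_mfence */
#endif

/* Assumed cache-line size; real code should query CPUID instead. */
#define ASSUMED_LINE 64

/* Hypothetical helper: write back and evict every cache line covering
 * [p, p + len), with full fences so the flushes are ordered against
 * earlier and later memory accesses. */
static void flush_range_fenced(const void *p, size_t len)
{
    uintptr_t start = (uintptr_t)p & ~(uintptr_t)(ASSUMED_LINE - 1);
    uintptr_t end = (uintptr_t)p + len;
#if defined(__SSE2__)
    _mm_mfence();                        /* order earlier accesses */
    for (uintptr_t a = start; a < end; a += ASSUMED_LINE)
        _mm_clflush((const void *)a);    /* flush one line */
    _mm_mfence();                        /* order later accesses */
#else
    (void)start;
    (void)end;
    atomic_thread_fence(memory_order_seq_cst); /* non-x86: fence only */
#endif
}
```

Flushing does not change the contents of memory, only where the data is cached, so the buffer reads back unchanged afterwards.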
* Re: x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?
From: Henrique de Moraes Holschuh @ 2016-08-03 12:50 UTC
To: Michael S. Tsirkin
Cc: H. Peter Anvin, Peter Zijlstra, Ingo Molnar, Dexuan Cui,
linux-x86_64@vger.kernel.org, Thomas Gleixner, Ingo Molnar, David Howells,
Paul E. McKenney, linux-kernel@vger.kernel.org

On Wed, 03 Aug 2016, Michael S. Tsirkin wrote:
> > And I'm still discussing this with the hardware people. It seems we
> > can do this for *most* things, but not all; the question is where
> > exactly we need to do something different.

Let's hope the "hardware guys" get back to you soon :(


HSD162/BDM116 MOVNTDQA From WC Memory May Pass Earlier Locked
Instructions

  Problem: An execution of (V)MOVNTDQA (streaming load instruction)
  that loads from WC (write combining) memory may appear to pass an
  earlier locked instruction that accesses a different cache line.

  Implication: Software that expects a lock to fence subsequent
  (V)MOVNTDQA instructions may not operate properly.

  Workaround: None identified. Software that relies on a locked
  instruction to fence subsequent executions of (V)MOVNTDQA should
  insert an MFENCE instruction between the locked instruction and
  subsequent (V)MOVNTDQA instruction.


SKL079 MOVNTDQA From WC Memory May Pass Earlier MFENCE Instructions

  Problem: An execution of MOVNTDQA or VMOVNTDQA that loads from WC
  (write combining) memory may appear to pass an earlier execution of
  the MFENCE instruction.

  Implication: When this erratum occurs, an execution of MOVNTDQA or
  VMOVNTDQA may appear to execute before memory operations that
  precede the earlier MFENCE instruction. Software that uses MFENCE
  to order subsequent executions of the MOVNTDQA instructions may not
  operate properly.

  Workaround: It is possible for the BIOS to contain a workaround for
  this erratum. For the steppings affected, see the Summary Table of
  Changes.


These are just examples. Intel might have other errata related to
*FENCE or LOCK, and AMD might have its share of model-specific LOCK or
*FENCE oddities as well (I didn't check).

Note that Skylake is broken in exactly the opposite way that Haswell and
Broadwell are. Fortunately, Skylake could be fixed through a microcode
update, but still...

The point is that we indeed need to be careful if we want to switch away
from *FENCE.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
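The HSD162/BDM116 workaround amounts to one extra instruction. Below is a minimal sketch of that ordering, assuming GCC or Clang on x86 (a `target("sse4.1")` function attribute plus `__builtin_cpu_supports`, so no -msse4.1 flag is needed): a LOCK-prefixed atomic increment of a hypothetical `lock_word`, then MFENCE, then the streaming load through the `_mm_stream_load_si128` intrinsic (MOVNTDQA). `src` is ordinary write-back memory, so the erratum cannot actually trigger here; the sketch only shows where the fence goes. Per SKL079, the MFENCE itself was not reliable on affected Skylake steppings without the firmware/microcode fix.

```c
#include <stdatomic.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#endif

static atomic_int lock_word;                          /* hypothetical lock/flag */
static _Alignas(16) int32_t src[4] = {42, 1, 2, 3};   /* MOVNTDQA needs 16-byte alignment */

#if defined(__x86_64__) || defined(__i386__)
static __attribute__((target("sse4.1"))) int32_t locked_then_stream_load_sse41(void)
{
    /* On x86 GCC/Clang this compiles to a LOCK-prefixed instruction. */
    atomic_fetch_add_explicit(&lock_word, 1, memory_order_seq_cst);
    _mm_mfence(); /* erratum workaround: MFENCE between the locked op and MOVNTDQA */
    __m128i v = _mm_stream_load_si128((__m128i *)src); /* (V)MOVNTDQA */
    return _mm_cvtsi128_si32(v);                       /* low 32 bits = src[0] */
}
#endif

/* Hypothetical wrapper: use the streaming load when SSE4.1 is available,
 * otherwise (or off x86) fall back to an ordinary load. */
static int32_t locked_then_stream_load(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.1"))
        return locked_then_stream_load_sse41();
#endif
    atomic_fetch_add_explicit(&lock_word, 1, memory_order_seq_cst);
    return src[0];
}
```

On write-back memory MOVNTDQA behaves like an ordinary 16-byte load, so the call returns the first element of `src`.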
* Re: x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?
From: Michael S. Tsirkin @ 2016-08-03 13:04 UTC
To: Henrique de Moraes Holschuh
Cc: H. Peter Anvin, Peter Zijlstra, Ingo Molnar, Dexuan Cui,
linux-x86_64@vger.kernel.org, Thomas Gleixner, Ingo Molnar, David Howells,
Paul E. McKenney, linux-kernel@vger.kernel.org

On Wed, Aug 03, 2016 at 09:50:25AM -0300, Henrique de Moraes Holschuh wrote:
> [...]
> HSD162/BDM116 MOVNTDQA From WC Memory May Pass Earlier Locked
> Instructions
> [...]
> SKL079 MOVNTDQA From WC Memory May Pass Earlier MFENCE Instructions
> [...]
> The point is that we indeed need to be careful if we want to switch away
> from *FENCE.

Are any of these used in kernel though?
* Re: x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?
From: Henrique de Moraes Holschuh @ 2016-08-03 23:19 UTC
To: Michael S. Tsirkin
Cc: H. Peter Anvin, Peter Zijlstra, Ingo Molnar, Dexuan Cui,
linux-x86_64@vger.kernel.org, Thomas Gleixner, Ingo Molnar, David Howells,
Paul E. McKenney, linux-kernel@vger.kernel.org

On Wed, 03 Aug 2016, Michael S. Tsirkin wrote:
> Are any of these used in kernel though?

These specific errata were not the point of my post; rather, it was the
fact that errata related to *FENCE and LOCKed instructions exist.

I didn't verify whether anything in the kernel attempts to use
non-temporal loads or stores from WC memory.