x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?

public inbox for linux-kernel@vger.kernel.org

* x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?
@ 2016-03-03 14:33 Dexuan Cui
  2016-03-03 15:27 ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Dexuan Cui @ 2016-03-03 14:33 UTC (permalink / raw)
  To: linux-x86_64@vger.kernel.org, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, David Howells, Paul E. McKenney
  Cc: linux-kernel@vger.kernel.org

Hi,
My understanding of arch/x86/include/asm/barrier.h is that Linux clearly
prefers {L,S,M}FENCE -- a locked ADD is only used on x86_32 platforms that
don't support XMM2.

However, it looks like people say a locked ADD is much faster than the FENCE
instructions, even on modern Intel CPUs like Haswell; please see
these three sources:

" 11.5.1 Locked Instructions as Memory Barriers
Optimization
Use locked instructions to implement Store/Store and Store/Load barriers.
"
http://support.amd.com/TechDocs/47414_15h_sw_opt_guide.pdf

"lock addl %(rsp), 0 is a better solution for StoreLoad barrier ":
http://shipilev.net/blog/2014/on-the-fence-with-dependencies/

"...locked instruction are more efficient barriers...":
http://www.pvk.ca/Blog/2014/10/19/performance-optimisation-~-writing-an-essay/

I also found that FreeBSD prefers Locked Add.

So, I'm curious why Linux prefers MFENCE.
I guess I may be missing something.

I tried to google the question, but didn't find an answer.
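
For reference, the two full-barrier flavours being compared can be sketched with GCC-style inline assembly roughly as below. This is only an illustration, not the contents of barrier.h: the function names mb_mfence() and mb_lock_add() are made up here, and the non-x86_64 fallback is an assumption added so the sketch builds elsewhere.

```c
/* Hypothetical helper names; these are not the kernel's macros. */

/* Full barrier using MFENCE. */
static inline void mb_mfence(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("mfence" ::: "memory");
#else
    __sync_synchronize();   /* assumed fallback for other targets */
#endif
}

/*
 * Full barrier using a locked read-modify-write that changes nothing:
 * add 0 to the word at the top of the stack.  ADD writes EFLAGS, so
 * "cc" is listed as clobbered next to "memory".
 */
static inline void mb_lock_add(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc");
#else
    __sync_synchronize();   /* assumed fallback for other targets */
#endif
}
```

Either helper would sit between a store and a later load that must not be reordered ahead of it; which one is cheaper is exactly the question raised here.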

Thanks,
-- Dexuan


* Re: x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?
  2016-03-03 14:33 x86 memory barrier: why does Linux prefer MFENCE to Locked ADD? Dexuan Cui
@ 2016-03-03 15:27 ` Ingo Molnar
  2016-03-03 15:34   ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2016-03-03 15:27 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: linux-x86_64@vger.kernel.org, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, David Howells, Paul E. McKenney,
	linux-kernel@vger.kernel.org, Michael S. Tsirkin, Peter Zijlstra


* Dexuan Cui <decui@microsoft.com> wrote:

> Hi,
> My understanding of arch/x86/include/asm/barrier.h is that Linux clearly
> prefers {L,S,M}FENCE -- a locked ADD is only used on x86_32 platforms that
> don't support XMM2.
> 
> However, it looks like people say a locked ADD is much faster than the FENCE
> instructions, even on modern Intel CPUs like Haswell; please see
> these three sources:
> 
> " 11.5.1 Locked Instructions as Memory Barriers
> Optimization
> Use locked instructions to implement Store/Store and Store/Load barriers.
> "
> http://support.amd.com/TechDocs/47414_15h_sw_opt_guide.pdf
> 
> "lock addl %(rsp), 0 is a better solution for StoreLoad barrier ":
> http://shipilev.net/blog/2014/on-the-fence-with-dependencies/
> 
> "...locked instruction are more efficient barriers...":
> http://www.pvk.ca/Blog/2014/10/19/performance-optimisation-~-writing-an-essay/
> 
> I also found that FreeBSD prefers Locked Add.
> 
> So, I'm curious why Linux prefers MFENCE.
> I guess I may be missing something.
> 
> I tried to google the question, but didn't find an answer.

It's being worked on, see this thread on lkml from a few weeks ago:

   C Jan 13 Michael S. Tsir    | [PATCH v3 0/4] x86: faster mb()+documentation tweaks
   C Jan 13 Michael S. Tsir    | ├─>[PATCH v3 1/4] x86: add cc clobber for addl
   C Jan 13 Michael S. Tsir    | ├─>[PATCH v3 2/4] x86: drop a comment left over from X86_OOSTORE
   C Jan 13 Michael S. Tsir    | ├─>[PATCH v3 3/4] x86: tweak the comment about use of wmb for IO
   C Jan 13 Michael S. Tsir    | ├─>[PATCH v3 4/4] x86: drop mfence in favor of lock+addl

The 4th patch changes MFENCE to a LOCK ADDL locked instruction.
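
A rough way to compare the two instructions on a given machine is a tight store-plus-barrier loop like the sketch below. It is not taken from the patch series: the macro and function names are hypothetical, the fallback for non-x86_64 targets is an assumption so the code still builds, and no timings are claimed since results depend heavily on the CPU.

```c
#include <time.h>

#if defined(__x86_64__)
/* MFENCE as a full barrier. */
#define BARRIER_MFENCE()   __asm__ __volatile__("mfence" ::: "memory")
/* Locked no-op add on the top of the stack; ADD writes flags, hence "cc". */
#define BARRIER_LOCK_ADD() __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc")
#else
/* Assumed fallback: both map to the same builtin, so no comparison here. */
#define BARRIER_MFENCE()   __sync_synchronize()
#define BARRIER_LOCK_ADD() __sync_synchronize()
#endif

/* Store target, so every iteration has a store for the barrier to order. */
static volatile long sink;

/* CPU seconds spent on `iters` store + MFENCE pairs. */
static double time_mfence(long iters)
{
    clock_t start = clock();
    for (long i = 0; i < iters; i++) {
        sink = i;
        BARRIER_MFENCE();
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* CPU seconds spent on `iters` store + LOCK ADDL pairs. */
static double time_lock_add(long iters)
{
    clock_t start = clock();
    for (long i = 0; i < iters; i++) {
        sink = i;
        BARRIER_LOCK_ADD();
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling both with the same large iteration count and printing the two results gives a first impression only; a serious measurement would pin the thread and repeat the runs.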

Thanks,

	Ingo


* Re: x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?
  2016-03-03 15:27 ` Ingo Molnar
@ 2016-03-03 15:34   ` Peter Zijlstra
  2016-03-03 18:35     ` Michael S. Tsirkin
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2016-03-03 15:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Dexuan Cui, linux-x86_64@vger.kernel.org, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, David Howells, Paul E. McKenney,
	linux-kernel@vger.kernel.org, Michael S. Tsirkin

On Thu, Mar 03, 2016 at 04:27:39PM +0100, Ingo Molnar wrote:
> 
> * Dexuan Cui <decui@microsoft.com> wrote:
> 
> > Hi,
> > My understanding of arch/x86/include/asm/barrier.h is that Linux clearly
> > prefers {L,S,M}FENCE -- a locked ADD is only used on x86_32 platforms that
> > don't support XMM2.
> > 
> > However, it looks like people say a locked ADD is much faster than the FENCE
> > instructions, even on modern Intel CPUs like Haswell; please see
> > these three sources:
> > 
> > " 11.5.1 Locked Instructions as Memory Barriers
> > Optimization
> > Use locked instructions to implement Store/Store and Store/Load barriers.
> > "
> > http://support.amd.com/TechDocs/47414_15h_sw_opt_guide.pdf
> > 
> > "lock addl %(rsp), 0 is a better solution for StoreLoad barrier ":
> > http://shipilev.net/blog/2014/on-the-fence-with-dependencies/
> > 
> > "...locked instruction are more efficient barriers...":
> > http://www.pvk.ca/Blog/2014/10/19/performance-optimisation-~-writing-an-essay/
> > 
> > I also found that FreeBSD prefers Locked Add.
> > 
> > So, I'm curious why Linux prefers MFENCE.
> > I guess I may be missing something.
> > 
> > I tried to google the question, but didn't find an answer.
> 
> It's being worked on, see this thread on lkml from a few weeks ago:
> 
>    C Jan 13 Michael S. Tsir    | [PATCH v3 0/4] x86: faster mb()+documentation tweaks
>    C Jan 13 Michael S. Tsir    | ├─>[PATCH v3 1/4] x86: add cc clobber for addl
>    C Jan 13 Michael S. Tsir    | ├─>[PATCH v3 2/4] x86: drop a comment left over from X86_OOSTORE
>    C Jan 13 Michael S. Tsir    | ├─>[PATCH v3 3/4] x86: tweak the comment about use of wmb for IO
>    C Jan 13 Michael S. Tsir    | ├─>[PATCH v3 4/4] x86: drop mfence in favor of lock+addl
> 
> The 4th patch changes MFENCE to a LOCK ADDL locked instruction.

Lots of additional chatter here:

  lkml.kernel.org/r/20160112150032-mutt-send-email-mst@redhat.com

And some useful bits here:

  lkml.kernel.org/r/56957D54.5000602@zytor.com

latest version here:

  lkml.kernel.org/r/1453921746-16178-1-git-send-email-mst@redhat.com


* Re: x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?
  2016-03-03 15:34   ` Peter Zijlstra
@ 2016-03-03 18:35     ` Michael S. Tsirkin
  2016-03-03 19:05       ` H. Peter Anvin
  0 siblings, 1 reply; 10+ messages in thread
From: Michael S. Tsirkin @ 2016-03-03 18:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Dexuan Cui, linux-x86_64@vger.kernel.org,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, David Howells,
	Paul E. McKenney, linux-kernel@vger.kernel.org

On Thu, Mar 03, 2016 at 04:34:53PM +0100, Peter Zijlstra wrote:
> On Thu, Mar 03, 2016 at 04:27:39PM +0100, Ingo Molnar wrote:
> > 
> > * Dexuan Cui <decui@microsoft.com> wrote:
> > 
> > > Hi,
> > > My understanding of arch/x86/include/asm/barrier.h is that Linux clearly
> > > prefers {L,S,M}FENCE -- a locked ADD is only used on x86_32 platforms that
> > > don't support XMM2.
> > > 
> > > However, it looks like people say a locked ADD is much faster than the FENCE
> > > instructions, even on modern Intel CPUs like Haswell; please see
> > > these three sources:
> > > 
> > > " 11.5.1 Locked Instructions as Memory Barriers
> > > Optimization
> > > Use locked instructions to implement Store/Store and Store/Load barriers.
> > > "
> > > http://support.amd.com/TechDocs/47414_15h_sw_opt_guide.pdf
> > > 
> > > "lock addl %(rsp), 0 is a better solution for StoreLoad barrier ":
> > > http://shipilev.net/blog/2014/on-the-fence-with-dependencies/
> > > 
> > > "...locked instruction are more efficient barriers...":
> > > http://www.pvk.ca/Blog/2014/10/19/performance-optimisation-~-writing-an-essay/
> > > 
> > > I also found that FreeBSD prefers Locked Add.
> > > 
> > > So, I'm curious why Linux prefers MFENCE.
> > > I guess I may be missing something.
> > > 
> > > I tried to google the question, but didn't find an answer.
> > 
> > It's being worked on, see this thread on lkml from a few weeks ago:
> > 
> >    C Jan 13 Michael S. Tsir    | [PATCH v3 0/4] x86: faster mb()+documentation tweaks
> >    C Jan 13 Michael S. Tsir    | ├─>[PATCH v3 1/4] x86: add cc clobber for addl
> >    C Jan 13 Michael S. Tsir    | ├─>[PATCH v3 2/4] x86: drop a comment left over from X86_OOSTORE
> >    C Jan 13 Michael S. Tsir    | ├─>[PATCH v3 3/4] x86: tweak the comment about use of wmb for IO
> >    C Jan 13 Michael S. Tsir    | ├─>[PATCH v3 4/4] x86: drop mfence in favor of lock+addl
> > 
> > The 4th patch changes MFENCE to a LOCK ADDL locked instruction.
> 
> Lots of additional chatter here:
> 
>   lkml.kernel.org/r/20160112150032-mutt-send-email-mst@redhat.com
> 
> And some useful bits here:
> 
>   lkml.kernel.org/r/56957D54.5000602@zytor.com
> 
> latest version here:
> 
>   lkml.kernel.org/r/1453921746-16178-1-git-send-email-mst@redhat.com

It's ready as far as I am concerned.
Basically we are just waiting for ack from hpa.

-- 
MST


* Re: x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?
  2016-03-03 18:35     ` Michael S. Tsirkin
@ 2016-03-03 19:05       ` H. Peter Anvin
  2016-06-03 13:39         ` Peter Zijlstra
  2016-08-03  4:36         ` Michael S. Tsirkin
  0 siblings, 2 replies; 10+ messages in thread
From: H. Peter Anvin @ 2016-03-03 19:05 UTC (permalink / raw)
  To: Michael S. Tsirkin, Peter Zijlstra
  Cc: Ingo Molnar, Dexuan Cui, linux-x86_64@vger.kernel.org,
	Thomas Gleixner, Ingo Molnar, David Howells, Paul E. McKenney,
	linux-kernel@vger.kernel.org

On March 3, 2016 10:35:50 AM PST, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>On Thu, Mar 03, 2016 at 04:34:53PM +0100, Peter Zijlstra wrote:
>> On Thu, Mar 03, 2016 at 04:27:39PM +0100, Ingo Molnar wrote:
>> > 
>> > * Dexuan Cui <decui@microsoft.com> wrote:
>> > 
>> > > Hi,
>> > > My understanding of arch/x86/include/asm/barrier.h is that Linux clearly
>> > > prefers {L,S,M}FENCE -- a locked ADD is only used on x86_32 platforms that
>> > > don't support XMM2.
>> > > 
>> > > However, it looks like people say a locked ADD is much faster than the FENCE
>> > > instructions, even on modern Intel CPUs like Haswell; please see
>> > > these three sources:
>> > > 
>> > > " 11.5.1 Locked Instructions as Memory Barriers
>> > > Optimization
>> > > Use locked instructions to implement Store/Store and Store/Load barriers.
>> > > "
>> > > http://support.amd.com/TechDocs/47414_15h_sw_opt_guide.pdf
>> > > 
>> > > "lock addl %(rsp), 0 is a better solution for StoreLoad barrier ":
>> > > http://shipilev.net/blog/2014/on-the-fence-with-dependencies/
>> > > 
>> > > "...locked instruction are more efficient barriers...":
>> > > http://www.pvk.ca/Blog/2014/10/19/performance-optimisation-~-writing-an-essay/
>> > > 
>> > > I also found that FreeBSD prefers Locked Add.
>> > > 
>> > > So, I'm curious why Linux prefers MFENCE.
>> > > I guess I may be missing something.
>> > > 
>> > > I tried to google the question, but didn't find an answer.
>> > 
>> > It's being worked on, see this thread on lkml from a few weeks ago:
>> > 
>> >    C Jan 13 Michael S. Tsir    | [PATCH v3 0/4] x86: faster mb()+documentation tweaks
>> >    C Jan 13 Michael S. Tsir    | ├─>[PATCH v3 1/4] x86: add cc clobber for addl
>> >    C Jan 13 Michael S. Tsir    | ├─>[PATCH v3 2/4] x86: drop a comment left over from X86_OOSTORE
>> >    C Jan 13 Michael S. Tsir    | ├─>[PATCH v3 3/4] x86: tweak the comment about use of wmb for IO
>> >    C Jan 13 Michael S. Tsir    | ├─>[PATCH v3 4/4] x86: drop mfence in favor of lock+addl
>> > 
>> > The 4th patch changes MFENCE to a LOCK ADDL locked instruction.
>> 
>> Lots of additional chatter here:
>> 
>>   lkml.kernel.org/r/20160112150032-mutt-send-email-mst@redhat.com
>> 
>> And some useful bits here:
>> 
>>   lkml.kernel.org/r/56957D54.5000602@zytor.com
>> 
>> latest version here:
>> 
>>   lkml.kernel.org/r/1453921746-16178-1-git-send-email-mst@redhat.com
>
>It's ready as far as I am concerned.
>Basically we are just waiting for ack from hpa.

And I'm still discussing this with the hardware people.  It seems we can do this for *most* things, but not all; the question is where exactly we need to do something different.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.


* Re: x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?
  2016-03-03 19:05       ` H. Peter Anvin
@ 2016-06-03 13:39         ` Peter Zijlstra
  2016-08-03  4:36         ` Michael S. Tsirkin
  1 sibling, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2016-06-03 13:39 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Michael S. Tsirkin, Ingo Molnar, Dexuan Cui,
	linux-x86_64@vger.kernel.org, Thomas Gleixner, Ingo Molnar,
	David Howells, Paul E. McKenney, linux-kernel@vger.kernel.org

On Thu, Mar 03, 2016 at 11:05:43AM -0800, H. Peter Anvin wrote:
> >> latest version here:
> >> 
> >>   lkml.kernel.org/r/1453921746-16178-1-git-send-email-mst@redhat.com
> >
> >It's ready as far as I am concerned.
> >Basically we are just waiting for ack from hpa.
> 
> And I'm still discussing this with the hardware people.  It seems we
> can do this for *most* things, but not all; the question is where
> exactly we need to do something different.

Anything on this?


* Re: x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?
  2016-03-03 19:05       ` H. Peter Anvin
  2016-06-03 13:39         ` Peter Zijlstra
@ 2016-08-03  4:36         ` Michael S. Tsirkin
  2016-08-03 12:50           ` Henrique de Moraes Holschuh
  1 sibling, 1 reply; 10+ messages in thread
From: Michael S. Tsirkin @ 2016-08-03  4:36 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Peter Zijlstra, Ingo Molnar, Dexuan Cui,
	linux-x86_64@vger.kernel.org, Thomas Gleixner, Ingo Molnar,
	David Howells, Paul E. McKenney, linux-kernel@vger.kernel.org

On Thu, Mar 03, 2016 at 11:05:43AM -0800, H. Peter Anvin wrote:
> On March 3, 2016 10:35:50 AM PST, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >On Thu, Mar 03, 2016 at 04:34:53PM +0100, Peter Zijlstra wrote:
> >> On Thu, Mar 03, 2016 at 04:27:39PM +0100, Ingo Molnar wrote:
> >> > 
> >> > * Dexuan Cui <decui@microsoft.com> wrote:
> >> > 
> >> > > Hi,
> >> > > My understanding of arch/x86/include/asm/barrier.h is that Linux clearly
> >> > > prefers {L,S,M}FENCE -- a locked ADD is only used on x86_32 platforms that
> >> > > don't support XMM2.
> >> > > 
> >> > > However, it looks like people say a locked ADD is much faster than the FENCE
> >> > > instructions, even on modern Intel CPUs like Haswell; please see
> >> > > these three sources:
> >> > > 
> >> > > " 11.5.1 Locked Instructions as Memory Barriers
> >> > > Optimization
> >> > > Use locked instructions to implement Store/Store and Store/Load barriers.
> >> > > "
> >> > > http://support.amd.com/TechDocs/47414_15h_sw_opt_guide.pdf
> >> > > 
> >> > > "lock addl %(rsp), 0 is a better solution for StoreLoad barrier ":
> >> > > http://shipilev.net/blog/2014/on-the-fence-with-dependencies/
> >> > > 
> >> > > "...locked instruction are more efficient barriers...":
> >> > > http://www.pvk.ca/Blog/2014/10/19/performance-optimisation-~-writing-an-essay/
> >> > > 
> >> > > I also found that FreeBSD prefers Locked Add.
> >> > > 
> >> > > So, I'm curious why Linux prefers MFENCE.
> >> > > I guess I may be missing something.
> >> > > 
> >> > > I tried to google the question, but didn't find an answer.
> >> > 
> >> > It's being worked on, see this thread on lkml from a few weeks ago:
> >> > 
> >> >    C Jan 13 Michael S. Tsir    | [PATCH v3 0/4] x86: faster mb()+documentation tweaks
> >> >    C Jan 13 Michael S. Tsir    | ├─>[PATCH v3 1/4] x86: add cc clobber for addl
> >> >    C Jan 13 Michael S. Tsir    | ├─>[PATCH v3 2/4] x86: drop a comment left over from X86_OOSTORE
> >> >    C Jan 13 Michael S. Tsir    | ├─>[PATCH v3 3/4] x86: tweak the comment about use of wmb for IO
> >> >    C Jan 13 Michael S. Tsir    | ├─>[PATCH v3 4/4] x86: drop mfence in favor of lock+addl
> >> > 
> >> > The 4th patch changes MFENCE to a LOCK ADDL locked instruction.
> >> 
> >> Lots of additional chatter here:
> >> 
> >>   lkml.kernel.org/r/20160112150032-mutt-send-email-mst@redhat.com
> >> 
> >> And some useful bits here:
> >> 
> >>   lkml.kernel.org/r/56957D54.5000602@zytor.com
> >> 
> >> latest version here:
> >> 
> >>   lkml.kernel.org/r/1453921746-16178-1-git-send-email-mst@redhat.com
> >
> >It's ready as far as I am concerned.
> >Basically we are just waiting for ack from hpa.
> 
> And I'm still discussing this with the hardware people.  It seems we
> can do this for *most* things, but not all; the question is where
> exactly we need to do something different.

I'm guessing there's still no update?

There's a decent chance that, without documentation, a bunch of current
uses are actually broken. See for example
http://marc.info/?l=linux-kernel&m=145400059304553&w=2
which, going by the manual, fixes an smp_mb misuse for clflush - or maybe not?
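
To make the clflush concern concrete, here is a minimal user-space sketch of the conservative pattern: bracket the flushes with MFENCE instead of relying on a locked instruction to order them. It is not the linked patch. The helper name flush_range() and the 64-byte line size are assumptions for illustration, and whether a locked instruction would also be sufficient is precisely the open question.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__)
#include <emmintrin.h>   /* _mm_clflush(), _mm_mfence() (SSE2) */
#endif

#define CACHE_LINE 64    /* assumed cache-line size */

/* Hypothetical helper: flush every cache line overlapping [addr, addr+len). */
static void flush_range(const void *addr, size_t len)
{
#if defined(__x86_64__)
    uintptr_t p   = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)addr + len;

    _mm_mfence();                       /* order earlier stores before the flushes */
    for (; p < end; p += CACHE_LINE)
        _mm_clflush((const void *)p);
    _mm_mfence();                       /* order the flushes before later accesses */
#else
    (void)addr;
    (void)len;
    __sync_synchronize();               /* assumed fallback; no flush is performed */
#endif
}
```

If mb() became a locked ADD, a caller written like this with smp_mb()/mb() in place of the explicit MFENCE would silently change behaviour, which is why such uses need auditing first.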

> -- 
> Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.


* Re: x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?
  2016-08-03  4:36         ` Michael S. Tsirkin
@ 2016-08-03 12:50           ` Henrique de Moraes Holschuh
  2016-08-03 13:04             ` Michael S. Tsirkin
  0 siblings, 1 reply; 10+ messages in thread
From: Henrique de Moraes Holschuh @ 2016-08-03 12:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: H. Peter Anvin, Peter Zijlstra, Ingo Molnar, Dexuan Cui,
	linux-x86_64@vger.kernel.org, Thomas Gleixner, Ingo Molnar,
	David Howells, Paul E. McKenney, linux-kernel@vger.kernel.org

On Wed, 03 Aug 2016, Michael S. Tsirkin wrote:
> > And I'm still discussing this with the hardware people.  It seems we
> > can do this for *most* things, but not all; the question is where
> > exactly we need to do something different.

Let's hope the "hardware guys" get back to you soon :(


     HSD162/BDM116  MOVNTDQA From WC Memory May Pass Earlier Locked
                    Instructions

     Problem: An execution of (V)MOVNTDQA (streaming load instruction)
     that loads from WC (write combining) memory may appear to pass an
     earlier locked instruction that accesses a different cache line.

     Implication: Software that expects a lock to fence subsequent
     (V)MOVNTDQA instructions may not operate properly.

     Workaround: None identified.  Software that relies on a locked
     instruction to fence subsequent executions of (V)MOVNTDQA should
     insert an MFENCE instruction between the locked instruction and
     subsequent (V)MOVNTDQA instruction.



     SKL079   MOVNTDQA From WC Memory May Pass Earlier MFENCE Instructions

     Problem: An execution of MOVNTDQA or VMOVNTDQA that loads from WC
     (write combining) memory may appear to pass an earlier execution of
     the MFENCE instruction.

     Implication: When this erratum occurs, an execution of MOVNTDQA or
     VMOVNTDQA may appear to execute before memory operations that
     precede the earlier MFENCE instruction.  Software that uses MFENCE
     to order subsequent executions of the MOVNTDQA instructions may not
     operate properly.

     Workaround: It is possible for the BIOS to contain a workaround for
     this erratum.  For the steppings affected, see the Summary Table of
     Changes.


These are just examples.  Intel might have other errata related to
*FENCE or LOCK, and AMD might have its share of model-specific LOCK or
*FENCE oddities as well (I didn't check).

Note that Skylake is broken in exactly the opposite way that Haswell and
Broadwell are.  Fortunately, Skylake could be fixed through a microcode
update, but still...

The point is that we indeed need to be careful if we want to switch away
from *FENCE.
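
As an illustration of the HSD162/BDM116 workaround text quoted above, the sketch below places an MFENCE between a locked instruction and a following MOVNTDQA. The function name is hypothetical, the buffers here are ordinary write-back memory rather than WC memory (so this shows only the instruction sequence, not the erratum itself), and it assumes a CPU with SSE4.1 plus a 16-byte-aligned source.

```c
#include <string.h>

/*
 * Hypothetical helper: locked instruction, then MFENCE as the erratum's
 * workaround asks, then a 16-byte streaming load copied out to dst.
 * src16 must be 16-byte aligned because MOVNTDQA requires it.
 */
static void stream_load_after_lock(const void *src16, void *dst)
{
#if defined(__x86_64__)
    __asm__ __volatile__(
        "lock; addl $0,0(%%rsp)\n\t"   /* the locked instruction         */
        "mfence\n\t"                   /* workaround: fence in between   */
        "movntdqa (%0), %%xmm0\n\t"    /* streaming load from src16      */
        "movdqu %%xmm0, (%1)"          /* store the 16 bytes to dst      */
        :
        : "r"(src16), "r"(dst)
        : "memory", "cc", "xmm0");
#else
    __sync_synchronize();              /* assumed fallback for other targets */
    memcpy(dst, src16, 16);
#endif
}
```

With mb() implemented as MFENCE, the fence in the middle comes for free; with mb() as a locked ADD, code relying on it ahead of streaming loads on the affected parts would need the extra instruction.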

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


* Re: x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?
  2016-08-03 12:50           ` Henrique de Moraes Holschuh
@ 2016-08-03 13:04             ` Michael S. Tsirkin
  2016-08-03 23:19               ` Henrique de Moraes Holschuh
  0 siblings, 1 reply; 10+ messages in thread
From: Michael S. Tsirkin @ 2016-08-03 13:04 UTC (permalink / raw)
  To: Henrique de Moraes Holschuh
  Cc: H. Peter Anvin, Peter Zijlstra, Ingo Molnar, Dexuan Cui,
	linux-x86_64@vger.kernel.org, Thomas Gleixner, Ingo Molnar,
	David Howells, Paul E. McKenney, linux-kernel@vger.kernel.org

On Wed, Aug 03, 2016 at 09:50:25AM -0300, Henrique de Moraes Holschuh wrote:
> On Wed, 03 Aug 2016, Michael S. Tsirkin wrote:
> > > And I'm still discussing this with the hardware people.  It seems we
> > > can do this for *most* things, but not all; the question is where
> > > exactly we need to do something different.
> 
> Let's hope the "hardware guys" get back to you soon :(
> 
> 
>      HSD162/BDM116  MOVNTDQA From WC Memory May Pass Earlier Locked
>                     Instructions
> 
>      Problem: An execution of (V)MOVNTDQA (streaming load instruction)
>      that loads from WC (write combining) memory may appear to pass an
>      earlier locked instruction that accesses a different cache line.
> 
>      Implication: Software that expects a lock to fence subsequent
>      (V)MOVNTDQA instructions may not operate properly.
> 
>      Workaround: None identified.  Software that relies on a locked
>      instruction to fence subsequent executions of (V)MOVNTDQA should
>      insert an MFENCE instruction between the locked instruction and
>      subsequent (V)MOVNTDQA instruction.
> 
> 
> 
>      SKL079   MOVNTDQA From WC Memory May Pass Earlier MFENCE Instructions
> 
>      Problem: An execution of MOVNTDQA or VMOVNTDQA that loads from WC
>      (write combining) memory may appear to pass an earlier execution of
>      the MFENCE instruction.
> 
>      Implication: When this erratum occurs, an execution of MOVNTDQA or
>      VMOVNTDQA may appear to execute before memory operations that
>      precede the earlier MFENCE instruction.  Software that uses MFENCE
>      to order subsequent executions of the MOVNTDQA instructions may not
>      operate properly.
> 
>      Workaround: It is possible for the BIOS to contain a workaround for
>      this erratum.  For the steppings affected, see the Summary Table of
>      Changes.
> 
> 
> These are just examples.  Intel might have other errata related to
> *FENCE or LOCK, and AMD might have its share of model-specific LOCK or
> *FENCE oddities as well (I didn't check).
> 
> Note that Skylake is broken in exactly the opposite way that Haswell and
> Broadwell are.  Fortunately, Skylake could be fixed through a microcode
> update, but still...
> 
> The point is that we indeed need to be careful if we want to switch away
> from *FENCE.

Are any of these used in kernel though?

> -- 
>   "One disk to rule them all, One disk to find them. One disk to bring
>   them all and in the darkness grind them. In the Land of Redmond
>   where the shadows lie." -- The Silicon Valley Tarot
>   Henrique Holschuh


* Re: x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?
  2016-08-03 13:04             ` Michael S. Tsirkin
@ 2016-08-03 23:19               ` Henrique de Moraes Holschuh
  0 siblings, 0 replies; 10+ messages in thread
From: Henrique de Moraes Holschuh @ 2016-08-03 23:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: H. Peter Anvin, Peter Zijlstra, Ingo Molnar, Dexuan Cui,
	linux-x86_64@vger.kernel.org, Thomas Gleixner, Ingo Molnar,
	David Howells, Paul E. McKenney, linux-kernel@vger.kernel.org

On Wed, 03 Aug 2016, Michael S. Tsirkin wrote:
> Are any of these used in kernel though?

These specific errata were not the point of my post; rather, it was the
fact that errata related to *FENCE and LOCKed instructions exist.

I didn't verify whether anything in the kernel attempts non-temporal loads
from, or stores to, WC memory.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

