public inbox for linux-kernel@vger.kernel.org
* Opteron Rev E has a bug ... a locked  instruction doesn't act as a read-acquire barrier
From: Arkadiusz Miskiewicz @ 2008-08-03  9:06 UTC
  To: linux-kernel


Hello,

http://google-perftools.googlecode.com/svn-history/r48/trunk/src/base/atomicops-internals-x86.cc
says

"  // Opteron Rev E has a bug in which on very rare occasions a locked
  // instruction doesn't act as a read-acquire barrier if followed by a
  // non-locked read-modify-write instruction.  Rev F has this bug in 
  // pre-release versions, but not in versions released to customers,
  // so we test only for Rev E, which is family 15, model 32..63 inclusive.
  if (strcmp(vendor, "AuthenticAMD") == 0 &&       // AMD
      family == 15 &&
      32 <= model && model <= 63) {
    AtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug = true;
  } else {
    AtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug = false;
  }
"

Does the kernel have a quirk/workaround for this? I'm looking at arch/x86/kernel/cpu
but I don't see a workaround related to this (possibly I'm overlooking it).
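
For reference, here is a minimal C sketch of the same check, assuming GCC or Clang's <cpuid.h> on x86. The family/model decoding (extended fields folded in only when the base family is 0xf) follows the usual CPUID convention and is an assumption about how the quoted code gets "family" and "model", not a copy of the perftools source. The helper names decode_family_model() and read_cpu_signature() are hypothetical.

```c
#include <stdbool.h>
#include <string.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Decode family/model from CPUID leaf 1 EAX. The extended fields are
 * folded in only when the base family is 0xf (an assumption about how
 * the quoted code obtains "family" and "model"). */
void decode_family_model(unsigned int eax, int *family, int *model)
{
    int base_family = (eax >> 8) & 0xf;
    int base_model = (eax >> 4) & 0xf;

    *family = base_family;
    *model = base_model;
    if (base_family == 0xf) {
        *family += (eax >> 20) & 0xff;        /* extended family */
        *model += ((eax >> 16) & 0xf) << 4;   /* extended model */
    }
}

/* Same predicate as the quoted perftools code: AMD, family 15,
 * model 32..63 inclusive. */
bool has_amd_lock_mb_bug(const char *vendor, int family, int model)
{
    return strcmp(vendor, "AuthenticAMD") == 0 &&
           family == 15 &&
           32 <= model && model <= 63;
}

/* Hypothetical helper: fill in vendor/family/model for the running CPU.
 * Returns false if CPUID is unavailable or this is not an x86 build. */
bool read_cpu_signature(char vendor[13], int *family, int *model)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return false;
    /* Leaf 0 returns the vendor string in EBX, EDX, ECX order. */
    memcpy(vendor, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    vendor[12] = '\0';

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    decode_family_model(eax, family, model);
    return true;
#else
    (void)vendor;
    (void)family;
    (void)model;
    return false;
#endif
}
```

With this decoding, an EAX value of 0x20F12 gives family 15, model 33 and is flagged, while 0x40F12 gives model 65 and is not. A caller would run read_cpu_signature() once at startup and cache the result, much as the quoted code does with AtomicOps_Internalx86CPUFeatures.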

-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/

* Re: Opteron Rev E has a bug ... a locked  instruction doesn't act as a read-acquire barrier
From: Mikael Pettersson @ 2008-08-04 13:26 UTC
  To: Arkadiusz Miskiewicz; +Cc: linux-kernel

Arkadiusz Miskiewicz writes:
 > 
 > Hello,
 > 
 > http://google-perftools.googlecode.com/svn-history/r48/trunk/src/base/atomicops-internals-x86.cc
 > says
 > 
 > "  // Opteron Rev E has a bug in which on very rare occasions a locked
 >   // instruction doesn't act as a read-acquire barrier if followed by a
 >   // non-locked read-modify-write instruction.  Rev F has this bug in 
 >   // pre-release versions, but not in versions released to customers,
 >   // so we test only for Rev E, which is family 15, model 32..63 inclusive.
 >   if (strcmp(vendor, "AuthenticAMD") == 0 &&       // AMD
 >       family == 15 &&
 >       32 <= model && model <= 63) {
 >     AtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug = true;
 >   } else {
 >     AtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug = false;
 >   }
 > "
 > 
 > Does the kernel have a quirk/workaround for this? I'm looking at arch/x86/kernel/cpu
 > but I don't see a workaround related to this (possibly I'm overlooking it).

I can find no reference to this alleged RevE erratum in the
Athlon64/Opteron revision guide (25759.pdf).

But if this bug is real then we need to know about it. Could
you ask the author of the code you quoted above to clarify?

/Mikael

* Re: Opteron Rev E has a bug ... a locked  instruction doesn't act as a read-acquire barrier
From: Arkadiusz Miskiewicz @ 2008-08-04 13:56 UTC
  To: linux-kernel; +Cc: Mikael Pettersson

On Monday 04 August 2008, Mikael Pettersson wrote:
> I can find no reference to this alleged RevE erratum in the
> Athlon64/Opteron revision guide (25759.pdf).
>
> But if this bug is real then we need to know about it. Could
> you ask the author of the code you quoted above to clarify?

Got an answer: OpenSolaris has some workarounds for this bug, though I still
don't know which erratum number it is:

http://groups.google.com/group/google-perftools/browse_thread/thread/3d1b78d4a9db8c6e

BTW, I learned about this bug after hitting this problem:
http://bugs.mysql.com/bug.php?id=26081

> /Mikael

-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/

* Re: Opteron Rev E has a bug ... a locked  instruction doesn't act as a read-acquire barrier
From: Mikael Pettersson @ 2008-08-04 14:54 UTC
  To: Arkadiusz Miskiewicz; +Cc: linux-kernel, Mikael Pettersson

On Mon, 4 Aug 2008 15:56:05 +0200, Arkadiusz Miskiewicz wrote:
>Got an answer: OpenSolaris has some workarounds for this bug, though I still
>don't know which erratum number it is:
>
>http://groups.google.com/group/google-perftools/browse_thread/thread/3d1b78d4a9db8c6e
>
>BTW, I learned about this bug after hitting this problem:
>http://bugs.mysql.com/bug.php?id=26081

Thanks, found the Solaris code in question and the mysql discussion.
I'll dig deeper tomorrow.

/Mikael

* Re: Opteron Rev E has a bug ... a locked  instruction doesn't act as a read-acquire barrier
From: Mikael Pettersson @ 2008-08-06 13:09 UTC
  To: Mikael Pettersson; +Cc: Arkadiusz Miskiewicz, linux-kernel

Mikael Pettersson writes:
 > Thanks, found the Solaris code in question and the mysql discussion.
 > I'll dig deeper tomorrow.

I investigated the Solaris track, but I've found no detailed
explanation of the alleged bug. I've asked the Sun engineer
who committed the fix for an explanation, but so far there's
been no reply.

Anyway, here's what I've found out.

It's Solaris bug #6323525.

They call it "Mutex primitives don't work as expected."

if (number_of_cores() < 2) then don't have bug
if (family == 0xf && model < 0x40) then have bug
if (rdmsr(MSR_BU_CFG/*0xC0011023*/) & 2) then bug is masked
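
Read as code, the three rules amount to roughly the following C sketch. It is one reading of the summary above, not the Solaris source: the core count, family/model and BU_CFG value are passed in as parameters rather than read from the hardware (rdmsr needs ring 0), and the macro name BU_CFG_LOCK_MB_MASKED is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 1 of MSR_BU_CFG (0xC0011023), per the Solaris summary.
 * The macro name is hypothetical. */
#define BU_CFG_LOCK_MB_MASKED 0x2u

/* One reading of the three Solaris rules. The inputs would come from
 * CPUID and from rdmsr, which requires ring 0; here they are parameters. */
bool solaris_needs_lfence(int ncores, int family, int model, uint64_t bu_cfg)
{
    if (ncores < 2)
        return false;                 /* fewer than two cores: no bug */
    if (!(family == 0xf && model < 0x40))
        return false;                 /* outside the affected range */
    if (bu_cfg & BU_CFG_LOCK_MB_MASKED)
        return false;                 /* bug is masked by BU_CFG */
    return true;                      /* patch in the lfence */
}
```

Note that model < 0x40 is wider than the perftools range of 32..63: under this rule an earlier family 0xf model such as 0x05 is flagged too (given at least two cores), whereas perftools would not flag it.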

lock:	// mutex_lock, spin_lock, etc
	...
	lock; cmpxchg ..
	jnz fail
	ret; nop; nop; nop	// patched to "lfence; ret" if bug

The workaround is to place a fencing instruction (lfence) between
the mutex operation and the subsequent read-modify-write instruction.
(This provides the necessary load memory barrier.)

There's no change to the unlock code.
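
The same workaround can be sketched in C11 as a spinlock whose acquire path issues lfence after a successful compare-and-swap (typically a lock cmpxchg on x86) when a hypothetical amd_lock_mb_bug flag is set, with the unlock path left alone. This assumes GCC/Clang inline assembly on x86 and is only an illustration, not the Solaris or Linux implementation; on other architectures the fence is compiled out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag, set once at startup by a detection routine. */
bool amd_lock_mb_bug = false;

typedef struct {
    atomic_int locked;   /* 0 = free, 1 = held */
} demo_spinlock;

/* Emit lfence only when the workaround is enabled; x86 only. */
static inline void lock_mb_workaround(void)
{
#if defined(__i386__) || defined(__x86_64__)
    if (amd_lock_mb_bug)
        __asm__ __volatile__("lfence" ::: "memory");
#endif
}

void demo_spin_lock(demo_spinlock *l)
{
    for (;;) {
        int expected = 0;
        /* On x86 this is typically compiled to lock cmpxchg. */
        if (atomic_compare_exchange_weak_explicit(&l->locked, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            break;
        /* Wait with plain loads until the lock looks free. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
    }
    /* Fence between the locked instruction and the critical section's
     * first read-modify-write, like the Solaris "lfence; ret" patch. */
    lock_mb_workaround();
}

void demo_spin_unlock(demo_spinlock *l)
{
    /* Unlock path unchanged, matching the Solaris fix. */
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

Solaris gets the same effect without a runtime test by patching the "ret; nop; nop; nop" tail of the lock routine; the flag check here is just the simplest portable stand-in for that code patching.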

Does anyone know whom to contact at AMD to confirm or deny this?

/Mikael
