public inbox for linux-kernel@vger.kernel.org
* Re: SMP synchronization on AMD processors (broken?)
@ 2005-10-11 23:50 linux
  2005-10-12  2:12 ` Christopher Friesen
  2005-10-13 12:25 ` Kirill Korotaev
  0 siblings, 2 replies; 27+ messages in thread
From: linux @ 2005-10-11 23:50 UTC (permalink / raw)
  To: dev; +Cc: linux-kernel

> The whole story started when we wrote the following code:
> 
> void XXX(void)
> {
> 	/* ints disabled */
> restart:
> 	spin_lock(&lock);
> 	do_something();
> 	if (!flag)
> 		need_restart = 1;
> 	spin_unlock(&lock);
> 	if (need_restart)
> 		goto restart;	<<<< LOOPS 4EVER ON AMD!!!
> }
> 
> void YYY(void)
> {
> 	spin_lock(&lock);	<<<< SPINS 4EVER ON AMD!!!
> 	flag = 1;
> 	spin_unlock(&lock);
> }
> 
> function XXX() starts on CPU0 and begins to loop since flag is not set;
> then CPU1 calls function YYY(), and it turns out that it cannot take the
> lock for an arbitrarily long time.

The right thing to do here is to wait for the flag to be set *outside*
the lock, and then re-validate inside the lock:

void XXX(void)
{
	int need_restart;
	/* ints disabled */
restart:
	need_restart = 0;	/* must be reset, or a stale 1 restarts forever */
	spin_lock(&lock);
	do_something();
	if (!flag)
		need_restart = 1;
	spin_unlock(&lock);
	if (need_restart) {
		while (!flag)
			cpu_relax();
		goto restart;
	}
}

This way, XXX() keeps the lock dropped for as long as it takes for
YYY() to notice and grab it.


However, I realize that this is of course a simplified case of some real
code, where even *finding* the flag requires the spin lock.

The generic solution is to have a global "progress" counter, which
records "I made progress toward setting flag", that XXX() can
busy-loop on:

int progress;

void XXX(void)
{
	int old_progress, need_restart;
	/* ints disabled */
restart:
	need_restart = 0;	/* must be reset, or a stale 1 restarts forever */
	spin_lock(&lock);
	do_something();
	if (!flag) {
		old_progress = progress;
		need_restart = 1;
	}
	spin_unlock(&lock);
	if (need_restart) {
		while (progress == old_progress)
			cpu_relax();
		goto restart;
	}
}

void YYY(void)
{
	spin_lock(&lock);
	flag = 1;
	progress++;
	spin_unlock(&lock);
}

Your data structure may already contain a field, or a set of fields,
that you can use for this purpose.  The goal is merely to detect
*change*, so you can reacquire the lock and test
definitively.  It's okay to read freed memory while doing this, as long as
you can be sure that:
- The memory read won't oops the kernel, and
- You don't end up depending on the value of the freed memory to
  get you out of the stall.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: SMP synchronization on AMD processors (broken?)
@ 2005-10-08  9:31 Chuck Ebbert
  0 siblings, 0 replies; 27+ messages in thread
From: Chuck Ebbert @ 2005-10-08  9:31 UTC (permalink / raw)
  To: Kirill Korotaev
  Cc: Andrew Morton, Andrey Savochkin, Linus Torvalds, linux-kernel

In-Reply-To: <434520FF.8050100@sw.ru>

On Thu, 06 Oct 2005 at 17:05:03 +0400, Kirill Korotaev wrote:

> The question is whether concurrent spin_lock()'s should
> acquire it in a more or less "fair" fashion, or whether one CPU can starve
> for an arbitrarily long time while the others reacquire it in a loop.

 You neglected to say what CPU type you compiled the kernel for.

 If it wasn't Pentium Pro maybe you could patch include/asm-i386/spinlock.h
line 82 (or the same place in x86-64) like this:

___
  * (PPro errata 66, 92)
  */
 
-#if !defined(CONFIG_X86_OOSTORE) && !defined(CONFIG_X86_PPRO_FENCE)
+#if 0
 
 #define __raw_spin_unlock_string \
         "movb $1,%0" \
___

The theory is that the unlock store might not make it out of the CPU's
write buffer unless a locked instruction performs the update.
__
Chuck

* SMP synchronization on AMD processors (broken?)
@ 2005-10-06 13:05 Kirill Korotaev
  2005-10-06 13:14 ` linux-os (Dick Johnson)
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Kirill Korotaev @ 2005-10-06 13:05 UTC (permalink / raw)
  To: linux-kernel, Linus Torvalds, Andrew Morton, xemul,
	Andrey Savochkin, st

Hello Linus, Andrew and others,

Please help with a non-trivial question about spin_lock/spin_unlock on
SMP archs. The question is whether concurrent spin_lock()'s should
acquire the lock in a more or less "fair" fashion, or whether one CPU can
starve for an arbitrarily long time while the others reacquire it in a loop.

The question arose because the situation we observe on AMD processors
is really strange and makes us believe that something is wrong in the
kernel, in the processor, or in our minds. Below goes an explanation:

The whole story started when we wrote the following code:

void XXX(void)
{
	/* ints disabled */
restart:
	spin_lock(&lock);
	do_something();
	if (!flag)
		need_restart = 1;
	spin_unlock(&lock);
	if (need_restart)
		goto restart;	<<<< LOOPS 4EVER ON AMD!!!
}

void YYY(void)
{
	spin_lock(&lock);	<<<< SPINS 4EVER ON AMD!!!
	flag = 1;
	spin_unlock(&lock);
}

function XXX() starts on CPU0 and begins to loop since flag is not set;
then CPU1 calls function YYY(), and it turns out that it cannot take the
lock for an arbitrarily long time.

Other observations:
- This does not happen on Intel processors; moreover, on Intel the 2 CPUs
take the lock in a fair manner, exactly one by one!
- If do_something() = usleep(3), we observed that XXX() loops forever
while YYY() spins forever on the same lock...
- cpu_relax() doesn't help after spin_unlock()...
- wbinvd() after spin_unlock() helps, and the 2 CPUs begin to take the lock
in a fair manner.

How can this happen? Is it regulated somehow by SMP specifications?

Kirill
P.S. Below is provided /proc/cpuinfo of machines affected.

-----------------------------------------------------------------------------

[root@ts25 ~]# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 35
model name      : AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
stepping        : 2
cpu MHz         : 2010.433
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 
3dnowext 3dnow pni
bogomips        : 3981.31

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 35
model name      : AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
stepping        : 2
cpu MHz         : 2010.433
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 
3dnowext 3dnow pni
bogomips        : 3964.92

-----------------------------------------------------------------------------

[root@opteron root]# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 5
model name      : AMD Opteron(tm) Processor 246
stepping        : 10
cpu MHz         : 1992.595
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 
3dnowext 3dnow
bogomips        : 3915.77



end of thread, other threads:[~2005-10-13 18:32 UTC | newest]

Thread overview: 27+ messages
2005-10-11 23:50 SMP synchronization on AMD processors (broken?) linux
2005-10-12  2:12 ` Christopher Friesen
2005-10-12  2:39   ` linux
2005-10-12  3:27     ` Kyle Moffett
2005-10-13 12:25 ` Kirill Korotaev
  -- strict thread matches above, loose matches on Subject: below --
2005-10-08  9:31 Chuck Ebbert
2005-10-06 13:05 Kirill Korotaev
2005-10-06 13:14 ` linux-os (Dick Johnson)
2005-10-06 13:19 ` Arjan van de Ven
2005-10-06 13:32   ` Andrey Savochkin
2005-10-06 14:22     ` Arjan van de Ven
2005-10-06 13:32 ` Andi Kleen
2005-10-06 13:46   ` Andrey Savochkin
2005-10-06 14:52     ` Linus Torvalds
2005-10-06 15:21       ` Andrey Savochkin
2005-10-06 15:46         ` Linus Torvalds
2005-10-11  0:59         ` Andrew Morton
2005-10-11  1:20           ` Andi Kleen
2005-10-11  3:20             ` Joe Seigh
2005-10-06 13:50   ` Eric Dumazet
2005-10-06 14:45 ` Linus Torvalds
2005-10-06 15:34   ` Hugh Dickins
2005-10-06 15:53     ` Eric Dumazet
2005-10-06 16:01     ` Linus Torvalds
2005-10-07 20:38 ` Joe Seigh
2005-10-07 20:57   ` Stephen Hemminger
2005-10-13 18:24 ` Joe Seigh
