* Re: [Linux-ia64] reader-writer livelock problem
2002-11-08 3:23 [Linux-ia64] reader-writer livelock problem Van Maren, Kevin
@ 2002-11-08 17:13 ` Jeremy Fitzhardinge
2002-11-08 17:25 ` Linus Torvalds
` (16 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: Jeremy Fitzhardinge @ 2002-11-08 17:13 UTC (permalink / raw)
To: linux-ia64
On Thu, 2002-11-07 at 19:51, William Lee Irwin III wrote:
> On Thu, Nov 07, 2002 at 09:23:21PM -0600, Van Maren, Kevin wrote:
> > This is a follow-up to the email thread I started on July 29th.
> > See http://www.cs.helsinki.fi/linux/linux-kernel/2002-30/0446.html
> > and the following discussion on LKML.
> > I'll summarize the problem again to refresh the issue.
> > Again, this is a correctness issue, not a performance one.
> > I am seeing a problem on medium-sized SMPs where user programs are
> > able to livelock the Linux kernel to such an extent that the
> > system appears dead. With the help of some hardware debugging
> > tools, I was able to determine that the problem is caused by
> > the reader-preference reader/writer locks in the Linux kernel.
>
> This is a very serious problem which I have also encountered. My
strategy was to make the readers on the tasklist_lock better behaved,
> and with Ingo's help and co-authorship those changes were cleaned up,
> tuned to provide performance benefits for smaller systems, bugfixed,
> and incorporated in the kernel. They have at least provided 16x systems
> in my lab with much more stability. The issues are still triggerable on
> 32x systems in my lab, to which I do not have regular access.
The normal way of solving this fairness problem is to make pending write
locks block read lock attempts, so that the reader count is guaranteed
to drop to zero as read locks are released. I haven't looked at the
Linux implementation of rwlocks, so I don't know how hard this is to
do. Or perhaps there's some other reason for not implementing it this
way?
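A minimal user-space sketch of that writer-preference scheme, in C with GCC's
(modern) __sync builtins -- the names, layout and memory-ordering details here
are illustrative only, not the kernel's actual rwlock implementation:

typedef struct {
	volatile int readers;	/* number of active readers */
	volatile int writer;	/* non-zero while a writer is pending or active */
} wp_rwlock_t;

static void wp_read_lock(wp_rwlock_t *l)
{
	for (;;) {
		while (l->writer)
			;				/* pending writer blocks new readers */
		__sync_fetch_and_add(&l->readers, 1);
		if (!l->writer)
			return;				/* no writer raced in: lock held */
		__sync_fetch_and_sub(&l->readers, 1);	/* back off so the writer can drain us */
	}
}

static void wp_read_unlock(wp_rwlock_t *l)
{
	__sync_fetch_and_sub(&l->readers, 1);
}

static void wp_write_lock(wp_rwlock_t *l)
{
	while (!__sync_bool_compare_and_swap(&l->writer, 0, 1))
		;					/* serialize writers */
	while (l->readers)
		;					/* reader count is guaranteed to drain to zero */
}

static void wp_write_unlock(wp_rwlock_t *l)
{
	__sync_lock_release(&l->writer);		/* writer = 0, with release semantics */
}

The recheck-and-back-off in wp_read_lock is what makes the guarantee work: once
a writer is pending, no new reader can keep the count above zero.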
J
* Re: [Linux-ia64] reader-writer livelock problem
2002-11-08 3:23 [Linux-ia64] reader-writer livelock problem Van Maren, Kevin
2002-11-08 17:13 ` Jeremy Fitzhardinge
@ 2002-11-08 17:25 ` Linus Torvalds
2002-11-08 17:28 ` Linus Torvalds
` (15 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: Linus Torvalds @ 2002-11-08 17:25 UTC (permalink / raw)
To: linux-ia64
On 8 Nov 2002, Jeremy Fitzhardinge wrote:
>
> The normal way of solving this fairness problem is to make pending write
> locks block read lock attempts, so that the reader count is guaranteed
> to drop to zero as read locks are released. I haven't looked at the
> Linux implementation of rwlocks, so I don't know how hard this is to
> do. Or perhaps there's some other reason for not implementing it this
> way?
There's another reason for not doing it that way: allowing readers to keep
interrupts on even in the presence of interrupt uses of readers.
If you do the "pending writes stop readers" approach, you get
	cpu1				cpu2

	read_lock() - get
					write_lock_irq() - pending
	irq happens
	- read_lock() - deadlock
and that means that you need to make readers protect against interrupts
even if the interrupts only read themselves.
NOTE! I'm not saying the existing practice is necessarily a good tradeoff,
and maybe we should just make sure to find all such cases and turn the
read_lock() calls into read_lock_irqsave() and then make the rw-locks
block readers on pending writers. But it's certainly more work and a cause
of subtler problems than just naively changing the rw implementation.
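Concretely, the conversion Linus describes would look something like this at an
affected call site (tasklist_lock is chosen purely as an example of a lock that
also has interrupt-context readers):

	/* before: relies on reader-preference locks being safe to take
	 * with interrupts enabled even though irq handlers may also
	 * take this lock for reading */
	read_lock(&tasklist_lock);
	/* ... read-side work ... */
	read_unlock(&tasklist_lock);

	/* after: irq-safe, so the lock could block new readers behind
	 * a pending writer without the deadlock shown above */
	unsigned long flags;

	read_lock_irqsave(&tasklist_lock, flags);
	/* ... read-side work ... */
	read_unlock_irqrestore(&tasklist_lock, flags);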
Linus
* Re: [Linux-ia64] reader-writer livelock problem
2002-11-08 3:23 [Linux-ia64] reader-writer livelock problem Van Maren, Kevin
2002-11-08 17:13 ` Jeremy Fitzhardinge
2002-11-08 17:25 ` Linus Torvalds
@ 2002-11-08 17:28 ` Linus Torvalds
2002-11-08 17:34 ` David Howells
` (14 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: Linus Torvalds @ 2002-11-08 17:28 UTC (permalink / raw)
To: linux-ia64
On Fri, 8 Nov 2002, Linus Torvalds wrote:
>
> NOTE! I'm not saying the existing practice is necessarily a good tradeoff,
> and maybe we should just make sure to find all such cases and turn the
> read_lock() calls into read_lock_irqsave() and then make the rw-locks
> block readers on pending writers. But it's certainly more work and a cause
> of subtler problems than just naively changing the rw implementation.
Actually, giving this some more thought, I really suspect that the
simplest solution is to add a separate "fair_read_lock()", and paths
that need to care about fairness (and know they don't have the irq issue)
can use that, slowly porting users over one by one...
Linus
* Re: [Linux-ia64] reader-writer livelock problem
2002-11-08 3:23 [Linux-ia64] reader-writer livelock problem Van Maren, Kevin
` (2 preceding siblings ...)
2002-11-08 17:28 ` Linus Torvalds
@ 2002-11-08 17:34 ` David Howells
2002-11-08 17:38 ` Jeremy Fitzhardinge
` (13 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: David Howells @ 2002-11-08 17:34 UTC (permalink / raw)
To: linux-ia64
[-- Attachment #1: Type: text/plain, Size: 708 bytes --]
> The normal way of solving this fairness problem is to make pending write
> locks block read lock attempts, so that the reader count is guaranteed
> to drop to zero as read locks are released. I haven't looked at the
> Linux implementation of rwlocks, so I don't know how hard this is to
> do. Or perhaps there's some other reason for not implementing it this
> way?
Actually, implementing fair spinlocks and fair rwlocks on the x86 arch is
very easy (at least it is if you have XADD). Any arch which has CMPXCHG can
also do it, just not as easily.
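For the CMPXCHG-only case, the ticket-take can be emulated with a
compare-and-swap loop; a rough portable sketch in plain C (using GCC's __sync
builtins as a stand-in for the arch primitives; this is not part of the
attachments below):

typedef struct {
	volatile unsigned char curr;	/* ticket currently being served */
	volatile unsigned char ticket;	/* next ticket to hand out */
} cas_fairlock_t;

static inline void cas_fair_lock(cas_fairlock_t *lock)
{
	unsigned char mine;

	/* emulate XADD: keep proposing ticket+1 until the CAS lands */
	do {
		mine = lock->ticket;
	} while (!__sync_bool_compare_and_swap(&lock->ticket, mine,
					       (unsigned char)(mine + 1)));

	while (lock->curr != mine)
		;			/* spin until our number is called */
}

static inline void cas_fair_unlock(cas_fairlock_t *lock)
{
	lock->curr++;			/* only the lock holder writes curr */
}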
I've attached an implementation of a fair spinlock and an implementation of a
fair rwlock (which can be compiled and played with in u-space).
David
[-- Attachment #2: Type: text/plain, Size: 1806 bytes --]
typedef struct fairlock {
	u8 curr;
	u8 ticket;
} fairlock_t;

static inline void init_fairlock(fairlock_t *lock)
{
	memset(lock,0,sizeof(*lock));
}

/*
 * spin lock fairly
 */
static inline void fair_lock(fairlock_t *lock)
{
	u8 number = 1;

	__asm__ __volatile__(
	"# beginning fair_lock\n\t"
LOCK_PREFIX "	xaddb %b2,%1\n\t"	/* number = lock->ticket; lock->ticket++; */
	"1:\n\t"
	"	cmpb %b2,%0\n\t"
	"	jne 2f\n\t"		/* jump if number!=lock->curr */
	LOCK_SECTION_START("")
	"2:\n\t"
	"	rep; nop\n\t"
	"	jmp 1b\n"
	LOCK_SECTION_END
	"# ending fair_lock\n\t"
	: "=m"(lock->curr), "=m"(lock->ticket), "=q"(number)	/* "q": %b2 needs a byte register */
	: "r"(lock), "m"(lock->curr), "m"(lock->ticket), "2"(number)
	: "memory", "cc");
}

/*
 * spin trylock fairly
 * - returns non-zero if the lock was obtained
 */
static inline int fair_trylock(fairlock_t *lock)
{
	u32 tmp;

	__asm__ __volatile__(
	"# beginning fair_trylock\n\t"
	"	movw (%3),%%ax\n\t"	/* AL=curr, AH=ticket */
	"1:\n\t"
	"	cmpb %%al,%%ah\n\t"
	"	je 3f\n\t"		/* jump if maybe we can get it */
	"2:\n\t"
	"	sete %b2\n\t"		/* ZF is set here iff we took the lock */
	LOCK_SECTION_START("")
	"3:\n\t"
	"	leaw 0x100(%%eax),%w2\n\t"	/* [propose] ticket=ticket+1 */
LOCK_PREFIX "	cmpxchgw %w2,(%3)\n\t"
	"	je 2b\n\t"		/* jump back if it worked */
	"	rep; nop\n\t"		/* didn't work; AL & AH have been updated */
	"	jmp 1b\n"
	LOCK_SECTION_END
	"# ending fair_trylock\n\t"
	: "=m"(lock->curr), "=m"(lock->ticket), "=&q"(tmp)
	: "r"(lock), "m"(lock->curr), "m"(lock->ticket)
	: "memory", "cc", "eax");

	return (u8)tmp;
}

/*
 * spin unlock fairly
 */
static inline void fair_unlock(fairlock_t *lock)
{
	__asm__ __volatile__(
	"# beginning fair_unlock\n\t"
LOCK_PREFIX "	incb %0\n\t"		/* lock->curr++; */
	"# ending fair_unlock\n\t"
	: "=m"(lock->curr)
	: "r"(lock), "m"(lock->curr)
	: "memory", "cc");
}
[-- Attachment #3: Type: text/plain, Size: 4065 bytes --]
/* rwlock.c: fair read/write spinlocks
 *
 * Copyright (c) 2002 David Howells (dhowells@redhat.com).
 */
#include <linux/types.h>

typedef unsigned char u8;
typedef unsigned int u32;

#define LOCK_PREFIX "lock;"

typedef struct rwlock {
	/* these member variables must be at the beginning and in this order */
	u8 rd_curr;	/* next reader ticket allowed to become active */
	u8 curr;	/* currently active ticket */
	u8 ticket;	/* next ticket to be issued */
	u8 __pad;
} __attribute__((aligned(4))) rwlock_t;

#define RWLOCK_UNLOCKED (rwlock_t) { 0, 0, 0, 0 }

#define rwlock_debug(LOCK,WHERE) \
do { \
	printf(WHERE"{%02x,%02x,%02x}\n",LOCK->rd_curr,LOCK->curr,LOCK->ticket); \
} while(0)

/*
 * obtain a write lock
 * - pull the next ticket from lock->ticket (which is subsequently incremented)
 * - spin until lock->curr catches up to the value that lock->ticket had before the XADD
 * - lock->rd_curr is left equal to lock->curr (and thus my ticket number) to prevent reads
 *   getting a lock
 */
static inline void write_lock(rwlock_t *lock)
{
	u32 eax;

	rwlock_debug(lock,"-->write_lock");
	asm volatile(
	"	# begin write_lock	\n"
LOCK_PREFIX "	xaddw %%ax,1(%3)	\n"	/* my ticket in AH */
	"0:	cmpb %%al,%%ah		\n"	/* lock->curr in AL */
	"	jne 2f			\n"
	"	.section .text.lock,\"ax\"\n"
	"2:				\n"
	"	rep; nop		\n"
	"	movb 1(%3),%%al		\n"	/* reload AL from lock->curr */
	"	jmp 0b			\n"
	"	.previous		\n"
	"	# end write_lock	\n"
	: "=m"(*lock), "=a"(eax)
	: "m"(*lock), "r"(lock), "a"(0x0100)
	: "memory", "cc"
	);
	rwlock_debug(lock,"<--write_lock");
}

/*
 * release a write lock
 * - advance both lock->rd_curr and lock->curr by one to enable the next lock to be granted
 */
static inline void write_unlock(rwlock_t *lock)
{
	u32 eax;

	rwlock_debug(lock,"-->write_unlock");
	asm volatile(
	"	# begin write_unlock	\n"
	"	movw 0(%3),%%ax		\n"
	"	incb %%al		\n"	/* lock->rd_curr++ */
	"	incb %%ah		\n"	/* lock->curr++ */
	"	movw %%ax,0(%3)		\n"
	"	# end write_unlock	\n"
	: "=m"(*lock), "=&a"(eax)
	: "m"(*lock), "r"(lock)
	: "memory", "cc"
	);
	rwlock_debug(lock,"<--write_unlock");
}

/*
 * obtain a read lock
 * - pull the next ticket from lock->ticket (which is subsequently incremented)
 * - spin until lock->rd_curr catches up to the value that lock->ticket had before the XADD
 * - lock->rd_curr is then advanced by one to allow the next read lock to be granted
 * - lock->curr is irrelevant
 */
static inline void read_lock(rwlock_t *lock)
{
	u32 eax;

	rwlock_debug(lock,"-->read_lock");
	asm volatile(
	"	# begin read_lock	\n"
LOCK_PREFIX "	xaddb %%ah,2(%3)	\n"	/* my ticket in AH */
	"0:	movb 0(%3),%%al		\n"	/* AL = lock->rd_curr */
	"	cmpb %%al,%%ah		\n"	/* if (ticket!=lock->rd_curr) */
	"	jne 2f			\n"	/* then jump */
	"	incb 0(%3)		\n"	/* lock->rd_curr++ */
	"	.section .text.lock,\"ax\"\n"
	"2:				\n"
	"	rep; nop		\n"
	"	jmp 0b			\n"
	"	.previous		\n"
	"	# end read_lock		\n"
	: "=m"(*lock), "=a"(eax)
	: "m"(*lock), "r"(lock), "a"(0x0100)
	: "memory", "cc"
	);
	rwlock_debug(lock,"<--read_lock");
}

/*
 * release a read lock
 * - just advance the lock->curr count so that any spinning write lock can happen
 */
static inline void read_unlock(rwlock_t *lock)
{
	rwlock_debug(lock,"-->read_unlock");
	asm volatile(
	"	# begin read_unlock	\n"
LOCK_PREFIX "	incb %0			\n"	/* lock->curr++ */
	"	# end read_unlock	\n"
	: "=m"(lock->curr)
	: "m"(lock->curr)
	: "cc"
	);
	rwlock_debug(lock,"<--read_unlock");
}

/*****************************************************************************/
/*
 * testing stuff
 */
rwlock_t mylock = RWLOCK_UNLOCKED;

#define wibble() asm("nop")

void get_read_lock(void)
{
	wibble();
	read_lock(&mylock);
	read_lock(&mylock);
	read_unlock(&mylock);
	wibble();
	read_lock(&mylock);
	read_unlock(&mylock);
	read_unlock(&mylock);
	wibble();
}

void get_write_lock(void)
{
	wibble();
	write_lock(&mylock);
	wibble();
	write_unlock(&mylock);
	wibble();
}

int main()
{
	get_read_lock();
	get_write_lock();
	get_write_lock();
	get_read_lock();
	return 0;
}
* Re: [Linux-ia64] reader-writer livelock problem
2002-11-08 3:23 [Linux-ia64] reader-writer livelock problem Van Maren, Kevin
` (3 preceding siblings ...)
2002-11-08 17:34 ` David Howells
@ 2002-11-08 17:38 ` Jeremy Fitzhardinge
2002-11-08 17:41 ` Van Maren, Kevin
` (12 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: Jeremy Fitzhardinge @ 2002-11-08 17:38 UTC (permalink / raw)
To: linux-ia64
On Fri, 2002-11-08 at 09:25, Linus Torvalds wrote:
> There's another reason for not doing it that way: allowing readers to keep
> interrupts on even in the presence of interrupt uses of readers.
>
> If you do the "pending writes stop readers" approach, you get
>
>	cpu1				cpu2
>
>	read_lock() - get
>
>					write_lock_irq() - pending
>
>	irq happens
>	- read_lock() - deadlock
>
> and that means that you need to make readers protect against interrupts
> even if the interrupts only read themselves.
Even without interrupts that would be a bug. It isn't ever safe to
attempt to retake a read lock if you already hold it, because you may
deadlock with a pending writer. Fair multi-reader locks aren't
recursive locks.
> NOTE! I'm not saying the existing practice is necessarily a good tradeoff,
> and maybe we should just make sure to find all such cases and turn the
> read_lock() calls into read_lock_irqsave() and then make the rw-locks
> block readers on pending writers. But it's certainly more work and a cause
> of subtler problems than just naively changing the rw implementation.
Yes, I'd agree. It would definitely be a behavioural change with
respect to the legality of retaking a lock for reading, and the
offending places would probably be quite irritating to find (since
they'd only cause a problem when they actually coincide with an
attempted write lock).
> Actually, giving this some more thought, I really suspect that the
> simplest solution is to add a separate "fair_read_lock()", and paths
> that need to care about fairness (and know they don't have the irq
> issue)
> can use that, slowly porting users over one by one...
Do you mean have a separate lock type, or have two different read_lock
operations on the current type?
J
* RE: [Linux-ia64] reader-writer livelock problem
2002-11-08 3:23 [Linux-ia64] reader-writer livelock problem Van Maren, Kevin
` (4 preceding siblings ...)
2002-11-08 17:38 ` Jeremy Fitzhardinge
@ 2002-11-08 17:41 ` Van Maren, Kevin
2002-11-08 17:43 ` David Howells
` (11 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: Van Maren, Kevin @ 2002-11-08 17:41 UTC (permalink / raw)
To: linux-ia64
Yes, that was one of the options I suggested in the original post
to the IA64 list. I'll bounce it to LKML for reference, now that
the discussion has moved there.
In my original post, however, I suggested making (the current) reader
locks recursive spinlocks (eliminating the starvation problem
at the expense of reader parallelism), which would
have the added benefit of "encouraging" people to move to the
fair locks. But your proposal is probably at least as good.
Of course, normal spinlocks do not scale either: under contention they
currently cost on the order of N cache-to-cache transfers for each
processor to acquire and release the lock, which works out to N^2 cache
transfers for every processor to acquire/release the lock once. So with
32 processors contending for the lock, at 1us per cache-to-cache
transfer (picked for easy math, but that is reasonable for large
systems), it takes roughly 32 x 32 x 1us = ~1ms for each processor to
get its turn at the spinlock, even if it only holds it for 10 cpu cycles.
So I'd _also_ like to see an MCS lock implementation replace the current
spinlock code (IBM's "NMCS" lock patch can be trivially used to replace
all spinlocks).
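For reference, the core of an MCS lock looks roughly like this (a user-space C
sketch with GCC builtins and the memory barriers elided; the IBM patch
mentioned above differs in detail):

typedef struct mcs_node {
	struct mcs_node *volatile next;
	volatile int locked;
} mcs_node_t;

/* the lock itself is just a tail pointer; each CPU passes its own node */
static void mcs_lock(mcs_node_t **tail, mcs_node_t *self)
{
	mcs_node_t *pred;

	self->next = NULL;
	self->locked = 1;
	pred = __sync_lock_test_and_set(tail, self);	/* atomic exchange: join the queue */
	if (pred) {
		pred->next = self;			/* link in behind our predecessor */
		while (self->locked)
			;				/* spin on our OWN node's cacheline */
	}
}

static void mcs_unlock(mcs_node_t **tail, mcs_node_t *self)
{
	if (!self->next) {
		/* no successor visible: try to swing the tail back to empty */
		if (__sync_bool_compare_and_swap(tail, self, NULL))
			return;
		while (!self->next)
			;				/* successor is mid-enqueue: wait for the link */
	}
	self->next->locked = 0;				/* hand the lock straight to the successor */
}

Each CPU spins only on its own queue node, so a release dirties one remote
cacheline instead of invalidating the line that every waiter is polling.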
Kevin
-----Original Message-----
From: Linus Torvalds
To: Jeremy Fitzhardinge
Cc: William Lee Irwin III; Van Maren, Kevin; linux-ia64@linuxia64.org; Linux
Kernel List; rusty@rustcorp.com.au; dhowells@redhat.com; mingo@elte.hu
Sent: 11/8/02 12:28 PM
Subject: Re: [Linux-ia64] reader-writer livelock problem
On Fri, 8 Nov 2002, Linus Torvalds wrote:
>
> NOTE! I'm not saying the existing practice is necessarily a good tradeoff,
> and maybe we should just make sure to find all such cases and turn the
> read_lock() calls into read_lock_irqsave() and then make the rw-locks
> block readers on pending writers. But it's certainly more work and a cause
> of subtler problems than just naively changing the rw implementation.
Actually, giving this some more thought, I really suspect that the
simplest solution is to add a separate "fair_read_lock()", and paths
that need to care about fairness (and know they don't have the irq issue)
can use that, slowly porting users over one by one...
Linus
* Re: [Linux-ia64] reader-writer livelock problem
2002-11-08 3:23 [Linux-ia64] reader-writer livelock problem Van Maren, Kevin
` (5 preceding siblings ...)
2002-11-08 17:41 ` Van Maren, Kevin
@ 2002-11-08 17:43 ` David Howells
2002-11-08 17:52 ` Matthew Wilcox
` (10 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: David Howells @ 2002-11-08 17:43 UTC (permalink / raw)
To: linux-ia64
> Do you mean have a separate lock type, or have two different read_lock
> operations on the current type?
You'd probably be better off with separate types... that way you can avoid
problems with conditionals and also with different tracking fields being
required by each grade of lock.
David
* Re: [Linux-ia64] reader-writer livelock problem
2002-11-08 3:23 [Linux-ia64] reader-writer livelock problem Van Maren, Kevin
` (6 preceding siblings ...)
2002-11-08 17:43 ` David Howells
@ 2002-11-08 17:52 ` Matthew Wilcox
2002-11-08 17:54 ` David Howells
` (9 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: Matthew Wilcox @ 2002-11-08 17:52 UTC (permalink / raw)
To: linux-ia64
On Fri, Nov 08, 2002 at 11:41:57AM -0600, Van Maren, Kevin wrote:
> processor to acquire/release the lock once. So with 32 processors
> contending for the lock, at 1us per cache-cache transfer (picked
if you have 32 processors contending for the same spinlock, you have
bigger problems.
--
Revolutions do not require corporate support.
* Re: [Linux-ia64] reader-writer livelock problem
2002-11-08 3:23 [Linux-ia64] reader-writer livelock problem Van Maren, Kevin
` (7 preceding siblings ...)
2002-11-08 17:52 ` Matthew Wilcox
@ 2002-11-08 17:54 ` David Howells
2002-11-08 17:55 ` Stephen Hemminger
` (8 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: David Howells @ 2002-11-08 17:54 UTC (permalink / raw)
To: linux-ia64
> I've attached an implementation of a fair spinlock and an implementation of a
> fair rwlock (which can be compiled and played with in u-space).
For those that prefer patches (or at least something that'll compile without
warnings)...
David
diff -uNr linux-2.5.46/include/asm-i386/fairlock.h linux-fair-2546/include/asm-i386/fairlock.h
--- linux-2.5.46/include/asm-i386/fairlock.h 1970-01-01 01:00:00.000000000 +0100
+++ linux-fair-2546/include/asm-i386/fairlock.h 2002-11-08 17:51:30.000000000 +0000
@@ -0,0 +1,212 @@
+/* fairlock.h: fair spinlocks
+ *
+ * Copyright (C) 2002 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _ASM_FAIRLOCK_H
+#define _ASM_FAIRLOCK_H
+
+#include <linux/config.h>
+#include <linux/string.h>
+#include <asm/atomic.h>
+
+typedef struct fairlock {
+ u8 curr;
+ u8 ticket;
+} fairlock_t;
+
+static inline void init_fairlock(fairlock_t *lock)
+{
+ memset(lock,0,sizeof(*lock));
+}
+
+typedef struct rwfairlock {
+
+ /* these member variables must be at the beginning and in this order */
+ u8 rd_curr; /* next reader ticket allowed to become active */
+ u8 curr; /* currently active ticket */
+ u8 ticket; /* next ticket to be issued */
+ u8 __pad;
+
+} __attribute__((aligned(4))) rwfairlock_t;
+
+#define RWFAIRLOCK_UNLOCKED (rwfairlock_t) { 0, 0, 0, 0 }
+
+/*****************************************************************************/
+/*
+ * spin lock fairly
+ */
+static inline void fair_lock(fairlock_t *lock)
+{
+ u8 number = 1;
+
+ __asm__ __volatile__(
+ "# beginning fair_lock\n\t"
+LOCK_PREFIX " xaddb %b2,%1\n\t" /* number = lock->ticket; lock->ticket++; */
+ "1:\n\t"
+ " cmpb %b2,%0\n\t"
+ " jne 2f\n\t" /* jump if number!=lock->curr */
+ LOCK_SECTION_START("")
+ "2:\n\t"
+ " rep; nop\n\t"
+ " jmp 1b\n"
+ LOCK_SECTION_END
+ "# ending fair_lock\n\t"
+ : "=m"(lock->curr), "=m"(lock->ticket), "=r"(number)
+ : "r"(lock), "m"(lock->curr), "m"(lock->ticket), "2"(number)
+ : "memory", "cc");
+}
+
+/*****************************************************************************/
+/*
+ * spin trylock fairly
+ */
+static inline int fair_trylock(fairlock_t *lock)
+{
+ u32 tmp;
+
+ __asm__ __volatile__(
+ "# beginning fair_trylock\n\t"
+ " movw (%3),%%ax\n\t" /* AL=curr, AH=ticket */
+ "1:\n\t"
+ " cmpb %%al,%%ah\n\t"
+ " je 3f\n\t" /* jump if maybe we can get it */
+ "2:\n\t"
+ " sete %b2\n\t" /* ZF is set here iff we took the lock */
+ LOCK_SECTION_START("")
+ "3:\n\t"
+ " leaw 0x100(%%eax),%w2\n\t" /* [propose] ticket=ticket+1 */
+LOCK_PREFIX " cmpxchgw %w2,(%3)\n\t"
+ " je 2b\n\t" /* jump back if it worked */
+ " rep; nop\n\t" /* didn't work; AL & AH have been updated */
+ " jmp 1b\n"
+ LOCK_SECTION_END
+ "# ending fair_trylock\n\t"
+ : "=m"(lock->curr), "=m"(lock->ticket), "=&q"(tmp)
+ : "r"(lock), "m"(lock->curr), "m"(lock->ticket)
+ : "memory", "cc", "eax");
+ return (u8)tmp;
+}
+
+/*****************************************************************************/
+/*
+ * spin unlock fairly
+ */
+static inline void fair_unlock(fairlock_t *lock)
+{
+ __asm__ __volatile__(
+ "# beginning fair_unlock\n\t"
+LOCK_PREFIX " incb %1\n\t" /* lock->curr++; */
+ "# ending fair_unlock\n\t"
+ : "=m"(lock->curr)
+ : "r"(lock), "m"(lock->curr)
+ : "memory", "cc");
+}
+
+/*****************************************************************************/
+/*
+ * obtain a write lock
+ * - pull the next ticket from lock->ticket (which is subsequently incremented)
+ * - spin until lock->curr catches up to the value that lock->ticket had before the XADD
+ * - lock->rd_curr is left equal to the lock->curr (and thus my ticket number) to prevent reads
+ * getting a lock
+ */
+static inline void fair_write_lock(rwfairlock_t *lock)
+{
+ u32 eax;
+
+ asm volatile(
+ " # begin write_lock \n"
+LOCK_PREFIX " xaddw %%ax,1(%3) \n" /* my ticket in AH */
+ "0: cmpb %%al,%%ah \n" /* lock->curr in AL */
+ " jne 2f \n"
+ " .section .text.lock,\"ax\"\n"
+ "2: \n"
+ " rep; nop \n"
+ " movb 1(%3),%%al \n" /* reload AL from lock->curr */
+ " jmp 0b \n"
+ " .previous \n"
+ " # end write_lock \n"
+ : "=m"(lock), "=a"(eax)
+ : "m"(lock), "r"(lock), "a"(0x0100)
+ : "memory", "cc"
+ );
+}
+
+/*****************************************************************************/
+/*
+ * release a write lock
+ * - advance both lock->rd_curr and lock->curr by one to enable the next lock to be granted
+ */
+static inline void fair_write_unlock(rwfairlock_t *lock)
+{
+ u32 eax;
+
+ asm volatile(
+ " # begin write_unlock \n"
+ " movw 0(%3),%%ax \n"
+ " incb %%al \n" /* lock->rd_curr++ */
+ " incb %%ah \n" /* lock->curr++ */
+ " movw %%ax,0(%3) \n"
+ " # end write_unlock \n"
+ : "=m"(lock), "=&a"(eax)
+ : "m"(lock), "r"(lock)
+ : "cc"
+ );
+}
+
+/*****************************************************************************/
+/*
+ * obtain a read lock
+ * - pull the next ticket from lock->ticket (which is subsequently incremented)
+ * - spin until lock->rd_curr catches up to the value that lock->ticket had before the XADD
+ * - lock->rd_curr is then advanced by one to allow the next read lock to be granted
+ * - lock->curr is irrelevant
+ */
+static inline void fair_read_lock(rwfairlock_t *lock)
+{
+ u32 eax;
+
+ asm volatile(
+ " # begin read_lock \n"
+LOCK_PREFIX " xaddb %%ah,2(%3) \n" /* my ticket in AH */
+ "0: movb 0(%3),%%al \n" /* AL = lock->rd_curr */
+ " cmpb %%al,%%ah \n" /* if (ticket!=lock->rd_curr) */
+ " jne 2f \n" /* then jump */
+ " incb 0(%3) \n" /* lock->rd_curr */
+ " .section .text.lock,\"ax\"\n"
+ "2: \n"
+ " rep; nop \n"
+ " jmp 0b \n"
+ " .previous \n"
+ " # end read_lock \n"
+ : "=m"(lock), "=a"(eax)
+ : "m"(lock), "r"(lock), "a"(0x0100)
+ : "memory", "cc"
+ );
+}
+
+/*****************************************************************************/
+/*
+ * release a read lock
+ * - just advance the lock->curr count so that any spinning write lock can happen
+ */
+static inline void fair_read_unlock(rwfairlock_t *lock)
+{
+ asm volatile(
+ " # begin read_unlock \n"
+LOCK_PREFIX " incb %0 \n" /* lock->curr++ */
+ " # end read_unlock \n"
+ : "=m"(lock->curr)
+ : "m"(lock->curr)
+ : "cc"
+ );
+}
+
+#endif /* _ASM_FAIRLOCK_H */
diff -uNr linux-2.5.46/include/asm-i386/spinlock.h linux-fair-2546/include/asm-i386/spinlock.h
--- linux-2.5.46/include/asm-i386/spinlock.h 2002-10-15 10:12:35.000000000 +0100
+++ linux-fair-2546/include/asm-i386/spinlock.h 2002-11-08 17:50:39.000000000 +0000
@@ -4,6 +4,7 @@
#include <asm/atomic.h>
#include <asm/rwlock.h>
#include <asm/page.h>
+#include <asm/fairlock.h>
#include <linux/config.h>
/*
* Re: [Linux-ia64] reader-writer livelock problem
2002-11-08 3:23 [Linux-ia64] reader-writer livelock problem Van Maren, Kevin
` (8 preceding siblings ...)
2002-11-08 17:54 ` David Howells
@ 2002-11-08 17:55 ` Stephen Hemminger
2002-11-08 18:05 ` Van Maren, Kevin
` (7 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: Stephen Hemminger @ 2002-11-08 17:55 UTC (permalink / raw)
To: linux-ia64
On Fri, 2002-11-08 at 09:34, David Howells wrote:
>
> > The normal way of solving this fairness problem is to make pending write
> > locks block read lock attempts, so that the reader count is guaranteed
> > to drop to zero as read locks are released. I haven't looked at the
> > Linux implementation of rwlocks, so I don't know how hard this is to
> > do. Or perhaps there's some other reason for not implementing it this
> > way?
>
> Actually, implementing fair spinlocks and fair rwlocks on the x86 arch is
> very easy (at least it is if you have XADD). Any arch which has CMPXCHG can
> also do it, just not as easily.
>
> I've attached an implementation of a fair spinlock and an implementation of a
> fair rwlock (which can be compiled and played with in u-space).
>
> David
There is a selection of similar algorithms here:
http://www.cs.rochester.edu/u/scott/synchronization/pseudocode/rw.html#s_f
How does yours compare?
* RE: [Linux-ia64] reader-writer livelock problem
2002-11-08 3:23 [Linux-ia64] reader-writer livelock problem Van Maren, Kevin
` (9 preceding siblings ...)
2002-11-08 17:55 ` Stephen Hemminger
@ 2002-11-08 18:05 ` Van Maren, Kevin
2002-11-08 19:19 ` Matthew Wilcox
` (6 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: Van Maren, Kevin @ 2002-11-08 18:05 UTC (permalink / raw)
To: linux-ia64
Absolutely you should minimize the locking contention.
However, that isn't always possible, such as when you
have 64 processors contending on the same resource.
With the current kernel, the trivial example with reader/
writer locks was having them all call gettimeofday().
But try having 64 processors fstat() the same file,
which I have also seen happen (application looping,
waiting for another process to finish setting up the
file so they can all mmap it).
What MCS locks do is reduce the number of times
the cacheline has to be flung around the system in
order to get work done: they "scale" much better with
the number of processors, O(N) instead of O(N^2).
Kevin
-----Original Message-----
From: Matthew Wilcox
To: Van Maren, Kevin
Cc: 'Linus Torvalds '; 'Jeremy Fitzhardinge '; 'William Lee Irwin III ';
'linux-ia64@linuxia64.org '; 'Linux Kernel List '; 'rusty@rustcorp.com.au ';
'dhowells@redhat.com '; 'mingo@elte.hu '
Sent: 11/8/02 12:52 PM
Subject: Re: [Linux-ia64] reader-writer livelock problem
On Fri, Nov 08, 2002 at 11:41:57AM -0600, Van Maren, Kevin wrote:
> processor to acquire/release the lock once. So with 32 processors
> contending for the lock, at 1us per cache-cache transfer (picked
if you have 32 processors contending for the same spinlock, you have
bigger problems.
--
Revolutions do not require corporate support.
* Re: [Linux-ia64] reader-writer livelock problem
2002-11-08 3:23 [Linux-ia64] reader-writer livelock problem Van Maren, Kevin
` (10 preceding siblings ...)
2002-11-08 18:05 ` Van Maren, Kevin
@ 2002-11-08 19:19 ` Matthew Wilcox
2002-11-08 19:26 ` David Mosberger
` (5 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: Matthew Wilcox @ 2002-11-08 19:19 UTC (permalink / raw)
To: linux-ia64
On Fri, Nov 08, 2002 at 12:05:30PM -0600, Van Maren, Kevin wrote:
> Absolutely you should minimize the locking contention.
> However, that isn't always possible, such as when you
> have 64 processors contending on the same resource.
if you've got 64 processors contending on the same resource, maybe you
need to split that resource up so they can have a copy each. all that
cacheline bouncing can't do your numa boxes any good.
> With the current kernel, the trivial example with reader/
> writer locks was having them all call gettimeofday().
i hear x86-64 has a lockless gettimeofday. maybe that's the solution.
> But try having 64 processors fstat() the same file,
> which I have also seen happen (application looping,
> waiting for another process to finish setting up the
> file so they can all mmap it).
umm.. the call trace:
sys_fstat
 |-> vfs_fstat
 |    |-> fget
 |    |     |-> read_lock(&files->file_lock)
 |    |-> vfs_getattr
 |          |-> inode->i_op->getattr
 |          |-> generic_fillattr
 |-> cp_new_stat64
      |-> memset
      |-> copy_to_user
so you're talking about contention on files->file_lock, right? it's really
not the kernel's fault that your app is badly written. that lock's private
to process & children, so it's not like another application can hurt you.
> What MCS locks do is they reduce the number of times
> the cacheline has to be flung around the system in
> order to get work done: they "scale" much better with
> the number of processors: O(N) instead of O(N^2).
yes, but how slow are they in the uncontended case?
--
Revolutions do not require corporate support.
* Re: [Linux-ia64] reader-writer livelock problem
2002-11-08 3:23 [Linux-ia64] reader-writer livelock problem Van Maren, Kevin
` (11 preceding siblings ...)
2002-11-08 19:19 ` Matthew Wilcox
@ 2002-11-08 19:26 ` David Mosberger
2002-11-08 20:17 ` Van Maren, Kevin
` (4 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: David Mosberger @ 2002-11-08 19:26 UTC (permalink / raw)
To: linux-ia64
>>>>> On Fri, 8 Nov 2002 19:19:07 +0000, Matthew Wilcox <willy@debian.org> said:
Matthew> On Fri, Nov 08, 2002 at 12:05:30PM -0600, Van Maren, Kevin
Matthew> wrote:
>> Absolutely you should minimize the locking contention. However,
>> that isn't always possible, such as when you have 64 processors
>> contending on the same resource.
Matthew> if you've got 64 processors contending on the same
Matthew> resource, maybe you need to split that resource up so they
Matthew> can have a copy each. all that cacheline bouncing can't do
Matthew> your numa boxes any good.
Matthew, please understand that this is NOT a performance problem.
It's a correctness problem. If livelock can result from read-write locks,
it's a huge security problem. Period.
--david
* RE: [Linux-ia64] reader-writer livelock problem
2002-11-08 3:23 [Linux-ia64] reader-writer livelock problem Van Maren, Kevin
` (12 preceding siblings ...)
2002-11-08 19:26 ` David Mosberger
@ 2002-11-08 20:17 ` Van Maren, Kevin
2002-11-08 20:39 ` Matthew Wilcox
` (3 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: Van Maren, Kevin @ 2002-11-08 20:17 UTC (permalink / raw)
To: linux-ia64
> all that cacheline bouncing can't do your numa boxes any good.
It happens even on our non-NUMA boxes. But that was the reason
behind developing MCS locks: they are designed to minimize the
cacheline bouncing due to lock contention, and become a win with
a very small number of processors contending for the same spinlock.
> i hear x86-64 has a lockless gettimeofday. maybe that's the solution.
I was using gettimeofday() as ONE example of the problem.
Fixing gettimeofday(), such as with frlocks (see, for example,
http://lwn.net/Articles/7388), fixes ONE occurrence of the
problem.
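For reference, the frlock scheme in that article is essentially a sequence
counter; a minimal user-space sketch (with __sync_synchronize() standing in
for the proper memory barriers, and names chosen here for illustration):

#include <sys/time.h>

typedef struct {
	volatile unsigned seq;		/* odd while an update is in progress */
	struct timeval tv;
} frtime_t;

static void fr_settime(frtime_t *t, const struct timeval *now)
{
	t->seq++;			/* seq goes odd: update started */
	__sync_synchronize();
	t->tv = *now;
	__sync_synchronize();
	t->seq++;			/* seq even again: update complete */
}

static struct timeval fr_gettime(frtime_t *t)
{
	struct timeval tv;
	unsigned seq;

	do {
		seq = t->seq;
		__sync_synchronize();
		tv = t->tv;
		__sync_synchronize();
	} while ((seq & 1) || seq != t->seq);	/* raced with a writer: retry */

	return tv;
}

Readers retry instead of excluding the writer, so a stampede of gettimeofday()
callers can never starve the timer tick.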
Every reader/writer lock that an application can force
the kernel to acquire can have this problem. If there
is enough time between acquires, it may take 32 or 64
processors to hang the system, but livelock WILL occur.
> it's really
> not the kernel's fault that your app is badly written.
There are MANY other cases where an application can force the
kernel to acquire a lock needed by other things.
The point is not whether the *application* is badly written,
but whether a bad application can mess up the kernel
by causing a livelock.
Spinlocks are a slightly different story. While there isn't
the starvation issue, livelock can still occur if the kernel
needs to acquire the spinlock more often than the time it
takes to acquire it. This is why replacing the xtime_lock with
a spinlock fixes the reader/writer livelock, but not the problem:
while the writer can now get the spinlock, it can take an entire
clock tick to acquire/release it. So the net behavior is the
same: with a 1KHz timer and 1us cache-to-cache latency, 32
processors spinning on gettimeofday() using a spinlock would
see a similar result.
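(Working those numbers through, under the same assumptions as above: 32
contending processors x ~32 cache-to-cache transfers per handoff x 1us per
transfer ~= 1024us ~= 1ms per full round of acquisitions -- exactly the
period of a 1KHz timer, so each round of contention consumes an entire tick.)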
Kevin
* Re: [Linux-ia64] reader-writer livelock problem
2002-11-08 3:23 [Linux-ia64] reader-writer livelock problem Van Maren, Kevin
` (13 preceding siblings ...)
2002-11-08 20:17 ` Van Maren, Kevin
@ 2002-11-08 20:39 ` Matthew Wilcox
2002-11-09 2:48 ` Rusty Russell
` (2 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: Matthew Wilcox @ 2002-11-08 20:39 UTC (permalink / raw)
To: linux-ia64
On Fri, Nov 08, 2002 at 02:17:21PM -0600, Van Maren, Kevin wrote:
> > all that cacheline bouncing can't do your numa boxes any good.
>
> It happens even on our non-NUMA boxes. But that was the reason
> behind developing MCS locks: they are designed to minimize the
> cacheline bouncing due to lock contention, and become a win with
> a very small number of processors contending for the same spinlock.
that's not my point... a resource occupies a number of cachelines;
bouncing those cachelines between processors is expensive. if there's
a real workload where all the processors contend for the same
resource, it's time to split up that resource.
> I was using gettimeofday() as ONE example of the problem.
> Fixing gettimeofday(), such as with frlocks (see, for example,
> http://lwn.net/Articles/7388), fixes ONE occurrence of the
> problem.
>
> Every reader/writer lock that an application can force
> the kernel to acquire can have this problem. If there
> is enough time between acquires, it may take 32 or 64
> processors to hang the system, but livelock WILL occur.
not true. every reader/writer lock which guards a global resource can
have this problem. if the lock only guards your own task's resources,
there can be no problem.
> There are MANY other cases where an application can force the
> kernel to acquire a lock needed by other things.
and i agree they should be fixed.
> Spinlocks are a slightly different story. While there isn't
> the starvation issue, livelock can still occur if the kernel
> needs to acquire the spinlock more often that it takes to
> acquire. This is why replacing the xtime_lock with a spinlock
> fixes the reader/writer livelock, but not the problem: while
> the writer can now get the spinlock, it can take an entire
> clock tick to acquire/release it. So the net behavior is the
> same: with a 1KHz timer and with 1us cache-cache latency, 32
> processors spinning on gettimeofday() using a spinlock would
> have a similar result.
right. so spinlocking this resource is also not good enough. did
you see the "Voyager subarchitecture for 2.5.46" thread earlier this
week which discussed making it per-cpu?
http://www.uwsg.iu.edu/hypermail/linux/kernel/0211.0/1799.html
this seems like the right way to go to me.
--
Revolutions do not require corporate support.
* Re: [Linux-ia64] reader-writer livelock problem
2002-11-08 3:23 [Linux-ia64] reader-writer livelock problem Van Maren, Kevin
` (14 preceding siblings ...)
2002-11-08 20:39 ` Matthew Wilcox
@ 2002-11-09 2:48 ` Rusty Russell
2002-11-11 16:29 ` Mario Smarduch
2002-11-11 20:01 ` [Linux-ia64] reader-writer livelock proble Mario Smarduch
17 siblings, 0 replies; 19+ messages in thread
From: Rusty Russell @ 2002-11-09 2:48 UTC (permalink / raw)
To: linux-ia64
In message <1036777105.13021.13.camel@ixodes.goop.org> you write:
> On Fri, 2002-11-08 at 09:25, Linus Torvalds wrote:
> > There's another reason for not doing it that way: allowing readers to keep
> > interrupts on even in the presence of interrupt uses of readers.
> >
> > If you do the "pending writes stop readers" approach, you get
> >
> >	cpu1				cpu2
> >
> >	read_lock() - get
> >
> >					write_lock_irq() - pending
> >
> >	irq happens
> >	- read_lock() - deadlock
> >
> > and that means that you need to make readers protect against interrupts
> > even if the interrupts only read themselves.
>
> Even without interrupts that would be a bug. It isn't ever safe to
> attempt to retake a read lock if you already hold it, because you may
> deadlock with a pending writer. Fair multi-reader locks aren't
> recursive locks.
That's the point. This is explicitly guaranteed with the current
locks, and you are allowed to recurse on them. The netfilter code
explicitly uses this to retake the net brlock, since it gets called
from different paths.
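The pattern being defended looks schematically like this (hypothetical lock
and call chain, not the actual netfilter code):

static rwlock_t some_rwlock;		/* illustrative lock, not the real brlock */

static void helper(void)		/* reachable both with and without the lock held */
{
	read_lock(&some_rwlock);	/* recursive read: fine with reader-preference
					 * locks; deadlocks if a fair writer queued
					 * between the outer and inner read_lock */
	/* ... */
	read_unlock(&some_rwlock);
}

static void outer_path(void)
{
	read_lock(&some_rwlock);	/* first acquisition */
	helper();
	read_unlock(&some_rwlock);
}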
Implement "read_lock_yield" or "wrlock_t" but don't break existing
semantics until 2.7 *please*!
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
* Re: [Linux-ia64] reader-writer livelock problem
2002-11-08 3:23 [Linux-ia64] reader-writer livelock problem Van Maren, Kevin
` (15 preceding siblings ...)
2002-11-09 2:48 ` Rusty Russell
@ 2002-11-11 16:29 ` Mario Smarduch
2002-11-11 20:01 ` [Linux-ia64] reader-writer livelock proble Mario Smarduch
17 siblings, 0 replies; 19+ messages in thread
From: Mario Smarduch @ 2002-11-11 16:29 UTC (permalink / raw)
To: linux-ia64
Rusty Russell wrote:
> In message <1036777105.13021.13.camel@ixodes.goop.org> you write:
> > On Fri, 2002-11-08 at 09:25, Linus Torvalds wrote:
> > > There's another reason for not doing it that way: allowing readers to keep
> > > interrupts on even in the presence of interrupt uses of readers.
> > >
> > > If you do the "pending writes stop readers" approach, you get
> > >
> > >	cpu1				cpu2
> > >
> > >	read_lock() - get
> > >
> > >					write_lock_irq() - pending
> > >
> > >	irq happens
> > >	- read_lock() - deadlock
> > >
> > > and that means that you need to make readers protect against interrupts
> > > even if the interrupts only read themselves.
> >
> > Even without interrupts that would be a bug. It isn't ever safe to
> > attempt to retake a read lock if you already hold it, because you may
> > deadlock with a pending writer. Fair multi-reader locks aren't
> > recursive locks.
>
> That's the point. This is explicitly guaranteed with the current
> locks, and you are allowed to recurse on them. The netfilter code
> explicitly uses this to retake the net brlock, since it gets called
> from different paths.
>
> Implement "read_lock_yield" or "wrlock_t" but don't break existing
> semantics until 2.7 *please*!
>
> Rusty.
> --
> Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
>
From what I understand this is a huge security risk - any mischievous user
can hang the system. Practically speaking, it's a hard sell to tell any
customer that the problem will be fixed in 2.7 and hope that it doesn't
happen before then. Is there any way to prevent a user (non-root) from
exploiting this weakness? In order for this to happen, do all the CPUs
have to run at 100%? I know that on some commercial Unix systems there
are ways to cap the CPU utilization by user/group ids; are there such
features/patches available on Linux?
- Mario.
* RE: [Linux-ia64] reader-writer livelock proble
2002-11-08 3:23 [Linux-ia64] reader-writer livelock problem Van Maren, Kevin
` (16 preceding siblings ...)
2002-11-11 16:29 ` Mario Smarduch
@ 2002-11-11 20:01 ` Mario Smarduch
17 siblings, 0 replies; 19+ messages in thread
From: Mario Smarduch @ 2002-11-11 20:01 UTC (permalink / raw)
To: linux-ia64
David Mosberger wrote:
> >>>>> On Mon, 11 Nov 2002 10:29:29 -0600, Mario Smarduch <cms063@email.mot.com> said:
>
> Mario> I know that on some commercial Unix systems there are ways to
> Mario> cap the CPU utilization by user/group ids are there such
> Mario> features/patches available on Linux?
>
> There are probably other patches floating around, but Process Resource
> Management (PRM) for Linux is/was one approach to do just that:
>
> http://resourcemanagement.unixsolutions.hp.com/WaRM/prm_linux/
>
> The kernel patches available from this URL are pretty old (up to
> 2.4.6, as far as I could see), and I'm not sure what the future plans
> for PRM on Linux are. Perhaps someone else can provide more details.
>
> --david
I took a look and it appears pretty encouraging. I guess the final question
would be: with CPU caps imposed on non-root users, would that prevent
a user from livelocking the system? I don't recall how long
it took for the system to livelock (I erased the original email), but there
may be an opportunity for livelock to develop before the PRM policies kick in.
- Mario.