public inbox for linux-ia64@vger.kernel.org
* [Linux-ia64] Linux kernel deadlock caused by spinlock bug
@ 2002-07-29 20:37 Van Maren, Kevin
  2002-07-29 20:46 ` Matthew Wilcox
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Van Maren, Kevin @ 2002-07-29 20:37 UTC (permalink / raw)
  To: linux-ia64

Hi all,

I have hit a problem with the Linux reader/writer spinlock
implementation that is causing a kernel deadlock (technically
livelock) which causes the system to hang (hard) when running
certain user applications.  This problem could be exploited as
a DoS attack on medium-to-large SMP machines.

Using some spiffy logic analyzers and debugging tools, I was able
to piece together the following scenario.

With "several" processors acquiring and releasing read locks, it is
possible for a processor to _never_ succeed in acquiring a write lock.
Even though the read lock is held for a very short period, with so much
contention for the cache line the processor would often lose ownership
before it could release the read lock.  [Even if it held the line
longer, the other looping processors mean there would still be a good
chance of losing the cache line while holding the reader lock.]  By the
time the reader got the cache line back to release the lock, another
processor had acquired the read lock.  This behavior resulted in a
processor not being able to acquire the write lock, which it was
attempting to do in an interrupt handler.  So the interrupt handler was
_never_ able to complete, and other interrupts were blocked on that
processor (in my case, network and keyboard interrupts).
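The failure mode above can be modeled in a few lines of C11 atomics.  This is a sketch under my own naming assumptions (the real lock is the IA64 assembly below): bit 31 is the writer bit, bits 0..30 the reader count, and the old write_lock can only succeed when the whole word is zero.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Model (names mine) of the rwlock word: bit 31 = writer bit,
 * bits 0..30 = reader count. */
#define WRITER_BIT 0x80000000u

typedef struct { _Atomic uint32_t word; } model_rwlock_t;

static void model_read_lock(model_rwlock_t *rw)
{
        atomic_fetch_add(&rw->word, 1);
}

static void model_read_unlock(model_rwlock_t *rw)
{
        atomic_fetch_sub(&rw->word, 1);
}

/* One attempt of the old write_lock: it succeeds only when the whole
 * word is 0 (no readers and no writer).  Under a steady stream of
 * overlapping readers the word is never 0, so this step can retry
 * forever -- the starvation described above. */
static bool model_write_trylock(model_rwlock_t *rw)
{
        uint32_t expected = 0;
        return atomic_compare_exchange_strong(&rw->word, &expected,
                                              WRITER_BIT);
}
```

A single reader is enough to make the writer's compare-exchange fail; the livelock comes from there always being at least one.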

The specific case I tracked down consisted of several processes in
a tight gettimeofday() loop, which resulted in the reader count never
getting to zero because there was always an outstanding reader.  While
I will stipulate that it is not a good thing for several processes to
be looping in gettimeofday(), I will assert that it is a very bad thing
for a few processes calling such a benign system call to hang the system.

Unfortunately, this was not even a contrived test case, as I hit it
experimenting with HPL Linpack on a 32-processor Unisys ES7000 Orion 200,
although it does not take nearly 32 processors to reproduce.
See http://www.unisys.com/products/es7000__servers/hardware/index.htm

The HPL code polls for an incoming message in HPL_bcast, called by
HPL_pdupdateTT.  However, in the same while loop, it also calls
gettimeofday via HPL_ptimer.  The system hung when processor 0
received a timer interrupt before the user process it was running
sent the message, and after several processes were waiting for the
message.  So the other processes spun calling gettimeofday waiting
for the message that would not come until they stopped calling
gettimeofday.  Classic deadlock.  It normally took less than 5
minutes to reproduce (at which time I had to hard-reboot the system).

The logic analyzer proved it was not a hardware issue, since the
cache line was being fairly shared by the processors and the
reader count was being updated correctly.

I have included a new version of the write_lock code for IA64 at
the end of this email.  I'm not making any claims about it being
optimal, just that it appears to work.

I changed the code to allow the writer bit to remain set even if
there is a reader.  By only allowing a single processor to set
the writer bit, I don't have to worry about pending writers starving
out readers.  The potential writer that was able to set the writer bit
gains ownership of the lock when the current readers finish.  Since
the retry for read_lock does not keep trying to increment the reader
count, there are no other required changes.
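In C11-atomics terms (a sketch with my own names, not the actual patch, which is the IA64 assembly below), the scheme splits into two phases; a full write_lock would spin on each in turn:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define WRITER_BIT 0x80000000u

typedef struct { _Atomic uint32_t word; } model_rwlock_t;

/* Phase 1: set the writer bit even while readers hold the lock.
 * The compare-exchange guarantees only one processor ever owns
 * the bit, so pending writers cannot starve readers either. */
static bool write_claim(model_rwlock_t *rw)
{
        uint32_t old = atomic_load(&rw->word);
        if (old & WRITER_BIT)
                return false;           /* another writer is pending */
        return atomic_compare_exchange_strong(&rw->word, &old,
                                              old | WRITER_BIT);
}

/* Phase 2: the pending writer owns the lock once the reader count
 * (low 31 bits) drains to zero.  New readers do not increment the
 * count while the writer bit is set, so the drain is guaranteed. */
static bool writer_owns(model_rwlock_t *rw)
{
        return (atomic_load(&rw->word) & ~WRITER_BIT) == 0;
}
```

The key property is that claiming the bit no longer requires catching the reader count at zero; the writer waits the readers out instead of racing them.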


A similar patch is needed for IA32 and any other SMP platform that
supports more than ~4 processors and does not already guarantee fairness
to writers.

The "fix" for IA32 would probably be very similar to the IA64 code,
but retaining the RW_LOCK_BIAS.  The only code that needs to change
is __write_lock_failed, which would need to keep RW_LOCK_BIAS subtracted
if the result is > 0xFF000000UL (0x0 - RW_LOCK_BIAS) [would not get
in __write_lock_failed if the result was 0].  Implementing that change
may require trashing another register.

Actually, the IA32 code should also have a "pause" instruction inserted
(especially for Foster processors) in all the retry code loops...
I've left the actual IA32 fix as an exercise for the reader, but I can
fix it and send out a patch if needed.
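The value test the IA32 __write_lock_failed would need can be sketched in C (the RW_LOCK_BIAS value is from the 2.4 i386 headers; the helper name is mine):

```c
#include <stdbool.h>
#include <stdint.h>

/* i386 rwlock counter model: the word starts at RW_LOCK_BIAS, each
 * reader subtracts 1, and write_lock subtracts the whole bias; a
 * zero result means the writer got the lock outright. */
#define RW_LOCK_BIAS 0x01000000u

/* After an unsuccessful "subl RW_LOCK_BIAS": a result above
 * 0xFF000000 (i.e. 0 - RW_LOCK_BIAS plus a small reader count)
 * means only readers hold the lock, so the writer can keep the
 * bias subtracted and simply wait for the readers to drain.
 * Anything else means another writer already took the bias,
 * and we must back off as the current code does. */
static bool keep_bias_subtracted(uint32_t after_sub)
{
        return after_sub > 0xFF000000u;
}
```

(The result cannot be exactly 0xFF000000 inside __write_lock_failed, since zero readers would have meant a zero result and an immediately successful lock, matching the bracketed remark above.)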

Kevin Van Maren



Here is the new IA64 write_lock code (include/asm-ia64/spinlock.h):


/*
 * write_lock pseudo-code:
 * Assume the lock is unlocked, and try to acquire it.
 * If that fails, wait until there isn't a writer, then set the writer bit.
 * Once we have the writer bit, wait until there are no more readers.
 */
#define write_lock(rw)                                                  \
do {                                                                    \
        __asm__ __volatile__ (                                          \
                "mov ar.ccv = r0\n"                                     \
                "dep r29 = -1, r0, 31, 1\n"                             \
                ";;\n"                                                  \
                "cmpxchg4.acq r2 = [%0], r29, ar.ccv\n"                 \
                ";;\n"                                                  \
                "cmp4.eq p0,p7 = r0, r2\n"                              \
                "(p7) br.cond.spnt.few 2f\n"                            \
                ";;\n"                                                  \
                "1:\n"                                                  \
                ".section .text.lock,\"ax\"\n"                          \
                "2:\tld4 r30 = [%0]\n"                                  \
                ";;\n"                                                  \
                "tbit.nz p7,p0 = r30, 31\n"                             \
                "(p7) br.cond.spnt.few 2b\n"                            \
                ";;\n"                                                  \
                "mov ar.ccv = r30\n"                                    \
                "dep r29 = -1, r0, 31, 1\n"                             \
                ";;\n"                                                  \
                "or r29 = r29, r30\n"                                   \
                ";;\n"                                                  \
                "cmpxchg4.acq r2 = [%0], r29, ar.ccv\n"                 \
                ";;\n"                                                  \
                "cmp4.eq p0,p7 = r30, r2\n"                             \
                "(p7) br.cond.spnt.few 2b\n"                            \
                ";;\n"                                                  \
                "3:\n"                                                  \
                "ld4 r2 = [%0]\n"                                       \
                ";;\n"                                                  \
                "extr.u r30 = r2, 0, 31\n"                              \
                ";;\n"                                                  \
                "cmp4.eq p0,p7 = r0, r30\n"                             \
                "(p7) br.cond.spnt.few 3b\n"                            \
                "br.cond.sptk.few 1b\n"                                 \
                ".previous\n"                                           \
                :: "r"(rw) : "ar.ccv", "p7", "r2", "r29", "r30", "memory"); \
} while(0)

/*
 * clear_bit() has "acq" semantics; we really need "rel" semantics,
 * but for simplicity, we simply do a fence for now...
 */
#define write_unlock(x)                         ({mb(); clear_bit(31, (x));})



The old IA64 code (for comparison) was:

#define write_lock(rw)                                                  \
do {                                                                    \
        __asm__ __volatile__ (                                          \
                "mov ar.ccv = r0\n"                                     \
                "dep r29 = -1, r0, 31, 1\n"                             \
                ";;\n"                                                  \
                "1:\n"                                                  \
                "ld4 r2 = [%0]\n"                                       \
                ";;\n"                                                  \
                "cmp4.eq p0,p7 = r0,r2\n"                               \
                "(p7) br.cond.spnt.few 1b \n"                           \
                "cmpxchg4.acq r2 = [%0], r29, ar.ccv\n"                 \
                ";;\n"                                                  \
                "cmp4.eq p0,p7 = r0, r2\n"                              \
                "(p7) br.cond.spnt.few 1b\n"                            \
                ";;\n"                                                  \
                :: "r"(rw) : "ar.ccv", "p7", "r2", "r29", "memory");    \
} while(0)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Linux-ia64] Linux kernel deadlock caused by spinlock bug
  2002-07-29 20:37 [Linux-ia64] Linux kernel deadlock caused by spinlock bug Van Maren, Kevin
@ 2002-07-29 20:46 ` Matthew Wilcox
  2002-07-29 21:05 ` Van Maren, Kevin
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Matthew Wilcox @ 2002-07-29 20:46 UTC (permalink / raw)
  To: linux-ia64

On Mon, Jul 29, 2002 at 03:37:17PM -0500, Van Maren, Kevin wrote:
> I changed the code to allow the writer bit to remain set even if
> there is a reader.  By only allowing a single processor to set
> the writer bit, I don't have to worry about pending writers starving
> out readers.  The potential writer that was able to set the writer bit
> gains ownership of the lock when the current readers finish.  Since
> the retry for read_lock does not keep trying to increment the reader
> count, there are no other required changes.

however, this is broken.  linux relies on being able to do

read_lock(x);
func()
  -> func()
       -> func()
            -> read_lock(x);

if a writer comes between those two read locks, you're toast.

i suspect the right answer for the contention you're seeing is an improved
gettimeofday which is lockless.
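A lockless gettimeofday would typically use a sequence counter.  The sketch below is my own illustration of that idea, not kernel code (names are mine): the writer makes the counter odd while updating, and readers retry if they see an odd value or a change across their reads.

```c
#include <stdatomic.h>

static _Atomic unsigned tg_seq;
static struct { long sec, usec; } tg_time;

/* Writer (e.g. the timer tick): bump the sequence around the update. */
static void time_write(long sec, long usec)
{
        atomic_fetch_add(&tg_seq, 1);   /* odd: update in progress */
        tg_time.sec = sec;
        tg_time.usec = usec;
        atomic_fetch_add(&tg_seq, 1);   /* even: update complete */
}

/* Reader: retry until a consistent snapshot is observed.  Readers
 * never write the shared line, so spinning readers cannot starve
 * the writer the way the reader-count increment does. */
static void time_read(long *sec, long *usec)
{
        unsigned before, after;
        do {
                before = atomic_load(&tg_seq);
                *sec = tg_time.sec;
                *usec = tg_time.usec;
                after = atomic_load(&tg_seq);
        } while ((before & 1) || before != after);
}
```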

-- 
Revolutions do not require corporate support.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [Linux-ia64] Linux kernel deadlock caused by spinlock bug
  2002-07-29 20:37 [Linux-ia64] Linux kernel deadlock caused by spinlock bug Van Maren, Kevin
  2002-07-29 20:46 ` Matthew Wilcox
@ 2002-07-29 21:05 ` Van Maren, Kevin
  2002-07-29 21:18 ` Matthew Wilcox
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Van Maren, Kevin @ 2002-07-29 21:05 UTC (permalink / raw)
  To: linux-ia64

> On Mon, Jul 29, 2002 at 03:37:17PM -0500, Van Maren, Kevin wrote:
> > I changed the code to allow the writer bit to remain set even if
> > there is a reader.  By only allowing a single processor to set
> > the writer bit, I don't have to worry about pending writers starving
> > out readers.  The potential writer that was able to set the 
> writer bit
> > gains ownership of the lock when the current readers finish.  Since
> > the retry for read_lock does not keep trying to increment the reader
> > count, there are no other required changes.
> 
> however, this is broken.  linux relies on being able to do
> 
> read_lock(x);
> func()
>   -> func()
>        -> func()
>             -> read_lock(x);
> 
> if a writer comes between those two read locks, you're toast.
> 
> i suspect the right answer for the contention you're seeing is an
> improved gettimeofday which is lockless.

Recursive read locks certainly make it more difficult to fix the
problem.  Placing a band-aid on gettimeofday will fix the symptom
in one location, but will not fix the general problem, which is
writer starvation with heavy read lock load.  The only way to fix
that is to make writer locks fair or to eliminate them (make them
_all_ stateless).

Recursive read locks also imply that you can't replace them with
a "normal" spinlock, which would also solve the problem (although
they do _not_ scale under contention -- something like O(N^2)
cache-cache transfers for N processors to acquire once).

There are ways of fixing the writer starvation and allowing recursive
read locks, but that is more work (and heavier-weight than desirable).
How pervasive are recursive reader locks?  Should they be a special
type of reader lock?

Kevin


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Linux-ia64] Linux kernel deadlock caused by spinlock bug
  2002-07-29 20:37 [Linux-ia64] Linux kernel deadlock caused by spinlock bug Van Maren, Kevin
  2002-07-29 20:46 ` Matthew Wilcox
  2002-07-29 21:05 ` Van Maren, Kevin
@ 2002-07-29 21:18 ` Matthew Wilcox
  2002-07-29 21:29 ` Van Maren, Kevin
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Matthew Wilcox @ 2002-07-29 21:18 UTC (permalink / raw)
  To: linux-ia64

On Mon, Jul 29, 2002 at 04:05:35PM -0500, Van Maren, Kevin wrote:
> Recursive read locks certainly make it more difficult to fix the
> problem.  Placing a band-aid on gettimeofday will fix the symptom
> in one location, but will not fix the general problem, which is
> writer starvation with heavy read lock load.  The only way to fix
> that is to make writer locks fair or to eliminate them (make them
> _all_ stateless).

The basic principle is that if you see contention on a spinlock, you
should eliminate the spinlock somehow.  making spinlocks `fair' doesn't
help when you're spending lots of time spinning on a lock.

-- 
Revolutions do not require corporate support.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [Linux-ia64] Linux kernel deadlock caused by spinlock bug
  2002-07-29 20:37 [Linux-ia64] Linux kernel deadlock caused by spinlock bug Van Maren, Kevin
                   ` (2 preceding siblings ...)
  2002-07-29 21:18 ` Matthew Wilcox
@ 2002-07-29 21:29 ` Van Maren, Kevin
  2002-07-29 21:48 ` David Mosberger
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Van Maren, Kevin @ 2002-07-29 21:29 UTC (permalink / raw)
  To: linux-ia64

> On Mon, Jul 29, 2002 at 04:05:35PM -0500, Van Maren, Kevin wrote:
> > Recursive read locks certainly make it more difficult to fix the
> > problem.  Placing a band-aid on gettimeofday will fix the symptom
> > in one location, but will not fix the general problem, which is
> > writer starvation with heavy read lock load.  The only way to fix
> > that is to make writer locks fair or to eliminate them (make them
> > _all_ stateless).
> 
> The basic principle is that if you see contention on a spinlock, you
> should eliminate the spinlock somehow.  making spinlocks `fair' doesn't
> help when you're spending lots of time spinning on a lock.

Yes, but that isn't the point: unless you eliminate all rw locks,
it is conceptually possible to cause a kernel deadlock by forcing
contention on the locks you didn't remove, if the user can force
the kernel to acquire a reader lock and if something else needs to
acquire the writer lock.  Correctness is the issue, not performance.
You have locks because there _could_ be contention, and locks handle
that contention _correctly_.  If you can eliminate the contention,
you can eliminate the locks; but if there is a chance for contention,
the locks have to remain, _and_ they have to handle contention
_correctly_.  The current reader/writer lock code does not: it can
hang the kernel just as dead as a writer arriving between recursive
reader lock calls would with my code.

Kevin


^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [Linux-ia64] Linux kernel deadlock caused by spinlock bug
  2002-07-29 20:37 [Linux-ia64] Linux kernel deadlock caused by spinlock bug Van Maren, Kevin
                   ` (3 preceding siblings ...)
  2002-07-29 21:29 ` Van Maren, Kevin
@ 2002-07-29 21:48 ` David Mosberger
  2002-07-30 15:58 ` Russell Lewis
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: David Mosberger @ 2002-07-29 21:48 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Mon, 29 Jul 2002 16:29:09 -0500, "Van Maren, Kevin" <kevin.vanmaren@unisys.com> said:

  Van> Yes, but that isn't the point: unless you eliminate all rw
  Van> locks, it is conceptually possible to cause a kernel deadlock
  Van> by forcing contention on the locks you didn't remove, if the
  Van> user can force the kernel to acquire a reader lock and if
  Van> something else needs to acquire the writer lock.  Correctness
  Van> is the issue, not performance.

I agree with Kevin here.  There must be some argument as to why
readers cannot indefinitely lock out a writer.  A probabilistic
argument is fine, but just saying "contention doesn't happen"
certainly isn't good enough.

	--david


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Linux-ia64] Linux kernel deadlock caused by spinlock bug
  2002-07-29 20:37 [Linux-ia64] Linux kernel deadlock caused by spinlock bug Van Maren, Kevin
                   ` (4 preceding siblings ...)
  2002-07-29 21:48 ` David Mosberger
@ 2002-07-30 15:58 ` Russell Lewis
  2002-07-30 16:56 ` Richard B. Johnson
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Russell Lewis @ 2002-07-30 15:58 UTC (permalink / raw)
  To: linux-ia64

IDEA: Implement a read/write "bias" field that can show if a lock has 
been gained many  times in succession for either read or write.  When 
locks of the opposite type are attempting (and failing) to get the lock, 
back off the other users until starvation is relieved.

* You would add an unsigned integer field.  When you gain a read lock, 
you check the field.  If it had previously been positive, then just 
increment it.  If it had previously been negative, set it to 1 
(equivalent to setting to 0 and incrementing).  Likewise, if you gain a 
write lock and it had been negative, you DEcrement it, while if it had 
been positive, you set it to -1.  The general idea here is that the bias 
field gives a non-precise view of how many locks of a given type have 
been assigned in a row.
* When you attempt to grab a lock but fail, you set a bit to indicate 
that you are waiting; there is one bit for waiting reads and another 
for waiting writes.
* Before grabbing either a read or write lock, you check the bits and 
the bias field.  If you are grabbing a read AND the bias is positive AND 
the "write waiting" bit is set, then you do NOPs for min(bias,MAX_NOPS) 
cycles BEFORE you attempt to grab the lock.  Likewise, if you are 
grabbing a write AND the bias is negative AND the "read waiting" bit is 
set, then you do NOPs before attempting to grab the lock.
* When you gain either a read or write lock, you clear the "waiting" 
bit.  If there are other waiters that still remain, they will re-set 
that bit soon.

The general idea here is that if there is any lock that is granting an 
excessive number of either reads or writes, you give preference to the 
other type of lock.  As soon as one of the "other" type of lock is 
granted, the bias field is reset and the whole process starts over. 
 Since the bias field doesn't have to be precise, you can increment it 
non-atomically.
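The bookkeeping rules above can be sketched as pure helper functions (names and the MAX_NOPS value are mine; the field is deliberately approximate, as proposed):

```c
#include <stdbool.h>

/* Positive bias = a run of read acquisitions, negative = a run of
 * writes.  May be updated non-atomically, per the proposal. */
#define MAX_NOPS 64

static int bias_after_read(int bias)
{
        return bias > 0 ? bias + 1 : 1;   /* flip resets the run to 1 */
}

static int bias_after_write(int bias)
{
        return bias < 0 ? bias - 1 : -1;  /* flip resets the run to -1 */
}

/* NOP cycles before a read attempt: back off only when reads
 * dominate AND a writer has flagged itself as waiting.  The
 * symmetric write_backoff() would test bias < 0 and the read bit. */
static int read_backoff(int bias, bool write_waiting)
{
        if (bias > 0 && write_waiting)
                return bias < MAX_NOPS ? bias : MAX_NOPS;
        return 0;
}
```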

Thoughts?



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Linux-ia64] Linux kernel deadlock caused by spinlock bug
  2002-07-29 20:37 [Linux-ia64] Linux kernel deadlock caused by spinlock bug Van Maren, Kevin
                   ` (5 preceding siblings ...)
  2002-07-30 15:58 ` Russell Lewis
@ 2002-07-30 16:56 ` Richard B. Johnson
  2002-07-30 22:48 ` Sean Griffin
  2002-07-31 17:37 ` Russell Lewis
  8 siblings, 0 replies; 10+ messages in thread
From: Richard B. Johnson @ 2002-07-30 16:56 UTC (permalink / raw)
  To: linux-ia64

On Tue, 30 Jul 2002, Russell Lewis wrote:

> IDEA: Implement a read/write "bias" field that can show if a lock has 
> been gained many  times in succession for either read or write.  When 
> locks of the opposite type are attempting (and failing) to get the lock, 
> back off the other users until starvation is relieved.
> 

You need to gain a lock just to read the bias field. You can't read
something that somebody else will change while you are deciding
upon what you read. It just can't work.

If we presume that it did work, what problem are you attempting
to fix?  FYI, there are no known 'lock-hogs'. Unlike a wait on
a semaphore, where a waiting task will sleep (give up the CPU), a
deadlock on a spin-lock isn't possible. A task will eventually
get the resource. Because of the well-known phenomenon of "locality",
every possible 'attack' on the spin-lock variable will become
ordered, and the code waiting on the locked resource will get
it on a first-come-first-served basis. This, of course, assumes
that the code isn't broken by attempts to change the natural
order.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
The US military has given us many words, FUBAR, SNAFU, now ENRON.
Yes, top management were graduates of West Point and Annapolis.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Linux-ia64] Linux kernel deadlock caused by spinlock bug
  2002-07-29 20:37 [Linux-ia64] Linux kernel deadlock caused by spinlock bug Van Maren, Kevin
                   ` (6 preceding siblings ...)
  2002-07-30 16:56 ` Richard B. Johnson
@ 2002-07-30 22:48 ` Sean Griffin
  2002-07-31 17:37 ` Russell Lewis
  8 siblings, 0 replies; 10+ messages in thread
From: Sean Griffin @ 2002-07-30 22:48 UTC (permalink / raw)
  To: linux-ia64


Mr. Lewis' solution fails to address the scenario of recursively
taken read locks.  Since in that case, the thread taking the lock
already holds the lock, running some nops doesn't really give
another thread the possibility of acquiring the write lock.  So
it works out to be the same situation only with a bunch of nops
executed in the critical path.


-Sean Griffin




> On Tue, 30 Jul 2002, Russell Lewis wrote:
> 
> > IDEA: Implement a read/write "bias" field that can show if a lock has 
> > been gained many  times in succession for either read or write.  When 
> > locks of the opposite type are attempting (and failing) to get the lock, 
> > back off the other users until starvation is relieved.
> > 
> 
> You need to gain a lock just to read the bias field. You can't read
> something that somebody else will change while you are deciding
> upon what you read. It just can't work.
> 
> If we presume that it did work, what problem are you attempting
> to fix?  FYI, there are no known 'lock-hogs'. Unlike a wait on
> a semaphore, where a waiting task will sleep (give up the CPU), a
> deadlock on a spin-lock isn't possible. A task will eventually
> get the resource. Because of the well-known phenomenon of "locality",
> every possible 'attack' on the spin-lock variable will become
> ordered, and the code waiting on the locked resource will get
> it on a first-come-first-served basis. This, of course, assumes
> that the code isn't broken by attempts to change the natural
> order.
> 
> Cheers,
> Dick Johnson
> Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
> The US military has given us many words, FUBAR, SNAFU, now ENRON.
> Yes, top management were graduates of West Point and Annapolis.
> 
> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Linux-ia64] Linux kernel deadlock caused by spinlock bug
  2002-07-29 20:37 [Linux-ia64] Linux kernel deadlock caused by spinlock bug Van Maren, Kevin
                   ` (7 preceding siblings ...)
  2002-07-30 22:48 ` Sean Griffin
@ 2002-07-31 17:37 ` Russell Lewis
  8 siblings, 0 replies; 10+ messages in thread
From: Russell Lewis @ 2002-07-31 17:37 UTC (permalink / raw)
  To: linux-ia64

Sean Griffin wrote:

>
>Mr. Lewis' solution fails to address the scenario of recursively
>taken read locks.  Since in that case, the thread taking the lock
>already holds the lock, running some nops doesn't really give
>another thread the possibility of acquiring the write lock.  So
>it works out to be the same situation only with a bunch of nops
>executed in the critical path.
>
>
>-Sean Griffin
>
Right.  It's not a very good solution, just one of many hacks that might 
work :(

What if we kept a value (in per-CPU or per-thread memory) that was a 
pointer to the last r/w lock we'd gained?  When we release ANY r/w lock, 
we could blindly initialize the "lastLock" pointer back to NULL.  Then 
we have a very quick (i.e. not very good) heuristic for detecting 
recursive locks; if the "lastLock" pointer equals the pointer to the 
thing you're about to try to lock, then you know that you already hold 
it and are about to grab it recursively...otherwise, you fall back to 
other routines.

If you really wanted, you could keep multiple pointers per CPU (exact 
number tunable by a kernel #define), and on release only clear the one 
you are actually releasing.  Most builds, I imagine, wouldn't need more 
than just a single pointer, though.
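The single-slot version of this heuristic fits in a few lines (names are mine, and I've used `__thread` where the kernel would use per-CPU storage):

```c
#include <stdbool.h>
#include <stddef.h>

/* Per-thread slot remembering the last r/w lock acquired.  Any
 * release blindly clears it, so a match on re-acquire means the
 * lock is (probably) being taken recursively and any writer-
 * fairness backoff must be skipped. */
static __thread const void *last_lock;

static void note_acquire(const void *lock) { last_lock = lock; }
static void note_release(void)             { last_lock = NULL; }

static bool probably_recursive(const void *lock)
{
        return last_lock == lock;
}
```

Note the heuristic errs only toward false negatives after an intervening acquire/release pair, in which case the caller simply falls back to the slower path, as described above.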



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2002-07-31 17:37 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-07-29 20:37 [Linux-ia64] Linux kernel deadlock caused by spinlock bug Van Maren, Kevin
2002-07-29 20:46 ` Matthew Wilcox
2002-07-29 21:05 ` Van Maren, Kevin
2002-07-29 21:18 ` Matthew Wilcox
2002-07-29 21:29 ` Van Maren, Kevin
2002-07-29 21:48 ` David Mosberger
2002-07-30 15:58 ` Russell Lewis
2002-07-30 16:56 ` Richard B. Johnson
2002-07-30 22:48 ` Sean Griffin
2002-07-31 17:37 ` Russell Lewis
