public inbox for linux-kernel@vger.kernel.org
* contention on long-held spinlock
@ 2011-08-19  9:21 Ortwin Glück
  2011-08-19 19:25 ` Bryan Donlan
  2011-08-19 23:30 ` Andi Kleen
  0 siblings, 2 replies; 5+ messages in thread
From: Ortwin Glück @ 2011-08-19  9:21 UTC
  To: linux-kernel

Hi,

I have observed a bad behaviour that is likely caused by spinlocks in 
the qla2xxx driver. This is a QLogic Fibre Channel storage driver.

Somehow the attached SAN had a problem and became unresponsive. Many 
processes queued up waiting to write to the device. The processes were 
doing nothing but wait, but system load increased to insane values (40 
and above on a 4 core machine). The system was very sluggish and 
unresponsive, making it very hard and slow to see what actually was the 
problem.

I didn't run an in-depth analysis, but this is my guess: I see that 
qla2xxx uses spinlocks to guard the HW against concurrent access. So if 
the HW becomes unresponsive all waiters would busy spin and burn 
resources, right? Those spinlocks are superfast as long as the HW 
responds well, but become a CPU burner once the HW becomes slow.

I wonder if spinlocks could be made aware of such a situation and relax. 
Something like if spinning for more than 1000 times, perform a simple 
backoff and sleep. A spinlock should never spin busy for several 
seconds, right?

Thanks,

Ortwin
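
A minimal userspace sketch of the spin-then-back-off idea proposed above,
assuming C11 atomics and POSIX nanosleep(). The names hybrid_lock,
hybrid_lock_acquire, hybrid_lock_release and SPIN_LIMIT are hypothetical
and are not part of qla2xxx or of the kernel's spinlock API. Kernel
spinlocks are normally taken in contexts that must not sleep, so this
illustrates the proposal only and is not a drop-in replacement:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define SPIN_LIMIT 1000          /* busy-spin attempts before sleeping */

struct hybrid_lock {
	atomic_flag held;
};
#define HYBRID_LOCK_INIT { ATOMIC_FLAG_INIT }

/*
 * Spin up to SPIN_LIMIT times, then sleep with exponential backoff.
 * Gives up and returns false after max_sleeps sleep rounds, so a
 * waiter never burns CPU indefinitely on a stuck lock holder.
 */
static bool hybrid_lock_acquire(struct hybrid_lock *l, int max_sleeps)
{
	long delay_ns = 1000;    /* first sleep: 1 microsecond */

	assert(max_sleeps >= 0);
	for (int sleeps = 0; ; sleeps++) {
		for (int i = 0; i < SPIN_LIMIT; i++) {
			if (!atomic_flag_test_and_set_explicit(&l->held,
							memory_order_acquire))
				return true;     /* lock taken */
		}
		if (sleeps >= max_sleeps)
			return false;            /* give up */

		struct timespec ts = { 0, delay_ns };
		nanosleep(&ts, NULL);            /* yield the CPU */
		if (delay_ns < 1000000)
			delay_ns *= 2;           /* cap backoff near 1 ms */
	}
}

static void hybrid_lock_release(struct hybrid_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A caller would check the return value and treat false as "holder is
stuck" rather than keep waiting.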


* Re: contention on long-held spinlock
  2011-08-19  9:21 contention on long-held spinlock Ortwin Glück
@ 2011-08-19 19:25 ` Bryan Donlan
       [not found]   ` <5E4F49720D0BAD499EE1F01232234BA873C669E4D7@AVEXMB1.qlogic.org>
  2011-08-23 16:24   ` Arnd Bergmann
  2011-08-19 23:30 ` Andi Kleen
  1 sibling, 2 replies; 5+ messages in thread
From: Bryan Donlan @ 2011-08-19 19:25 UTC
  To: Ortwin Glück; +Cc: linux-kernel, Andrew Vasquez, linux-driver, linux-scsi

2011/8/19 Ortwin Glück <odi@odi.ch>:
> Hi,
>
> I have observed a bad behaviour that is likely caused by spinlocks in the
> qla2xxx driver. This is a QLogic Fibre Channel storage driver.

Please CC the relevant maintainers when reporting driver bugs (I'm
adding them in this reply); it will help make sure the right people
notice. Maintainer addresses can be found in the MAINTAINERS file at
the root of the linux source tree.

What version of the kernel are you using? It would also help to
provide dmesg output from when the problem is occurring, if anything
out of the ordinary can be found there (if you've already rebooted,
check /var/log/kern.log - or wherever your distribution puts the
kernel log)

> Somehow the attached SAN had a problem and became unresponsive. Many
> processes queued up waiting to write to the device. The processes were doing
> nothing but wait, but system load increased to insane values (40 and above
> on a 4 core machine). The system was very sluggish and unresponsive, making
> it very hard and slow to see what actually was the problem.
>
> I didn't run an in-depth analysis, but this is my guess: I see that qla2xxx
> uses spinlocks to guard the HW against concurrent access. So if the HW
> becomes unresponsive all waiters would busy spin and burn resources, right?
> Those spinlocks are superfast as long as the HW responds well, but become a
> CPU burner once the HW becomes slow.
>
> I wonder if spinlocks could be made aware of such a situation and relax.
> Something like if spinning for more than 1000 times, perform a simple
> backoff and sleep. A spinlock should never spin busy for several seconds,
> right?

That's what mutexes are for. Note, however, that interrupt handlers
cannot use mutexes as they cannot sleep, nor can they wait for lock
holders which may themselves sleep.

Also note that holding spinlocks for a long time is more likely to
result in lockups than a slowdown - a CPU attempting to grab a
spinlock disables migration and preemption, so on your four CPU
system, four processes waiting on spinlocks is enough to completely
lock up the system (unless you're using the real-time branch's kernel,
which converts most spinlocks to mutexes).


* Re: contention on long-held spinlock
  2011-08-19  9:21 contention on long-held spinlock Ortwin Glück
  2011-08-19 19:25 ` Bryan Donlan
@ 2011-08-19 23:30 ` Andi Kleen
  1 sibling, 0 replies; 5+ messages in thread
From: Andi Kleen @ 2011-08-19 23:30 UTC
  To: Ortwin Glück; +Cc: linux-kernel

Ortwin Glück <odi@odi.ch> writes:
>
> I wonder if spinlocks could be made aware of such a situation and
> relax. Something like if spinning for more than 1000 times, perform a
> simple backoff and sleep. A spinlock should never spin busy for
> several seconds, right?

Spinning for several seconds is always a bug. There's no way to make
this work. Please post backtraces (e.g. from perf record -g ; perf
report) and cc the driver maintainer, so that they can fix their
buggy code.

-Andi


-- 
ak@linux.intel.com -- Speaking for myself only


* Re: contention on long-held spinlock
       [not found]   ` <5E4F49720D0BAD499EE1F01232234BA873C669E4D7@AVEXMB1.qlogic.org>
@ 2011-08-23 15:07     ` Bryan Donlan
  0 siblings, 0 replies; 5+ messages in thread
From: Bryan Donlan @ 2011-08-23 15:07 UTC
  To: Chad Dupuis; +Cc: Andrew Vasquez, Ortwin Glück, linux-kernel, Andi Kleen

On Tue, Aug 23, 2011 at 08:59, Chad Dupuis <chad.dupuis@qlogic.com> wrote:
> Hi Bryan,
>
> Could you provide some configuration details and system log files?  It would help us start to pinpoint where the problem may lie.

Hi,

Ortwin was the one with the problem originally - not me. I just added
maintainer CCs to try to get the message to the right place - I don't
even have the hardware in question :)

Thanks,

Bryan


* Re: contention on long-held spinlock
  2011-08-19 19:25 ` Bryan Donlan
       [not found]   ` <5E4F49720D0BAD499EE1F01232234BA873C669E4D7@AVEXMB1.qlogic.org>
@ 2011-08-23 16:24   ` Arnd Bergmann
  1 sibling, 0 replies; 5+ messages in thread
From: Arnd Bergmann @ 2011-08-23 16:24 UTC
  To: Bryan Donlan
  Cc: Ortwin Glück, linux-kernel, Andrew Vasquez, linux-driver,
	linux-scsi

On Friday 19 August 2011, Bryan Donlan wrote:
> That's what mutexes are for. Note, however, that interrupt handlers
> cannot use mutexes as they cannot sleep, nor can they wait for lock
> holders which may themselves sleep.

I agree that there is probably some other bug that needs to be fixed
in the driver, but for testing it may well be worth replacing
the spinlock with a mutex and the request_irq with request_threaded_irq.
A threaded IRQ is slower than a normal one but does allow mutexes.

	Arnd
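
The substitution suggested above might look roughly like the following
kernel-module sketch. It is an untested illustration and is not taken
from qla2xxx: the demo_* names and the demo_irq module parameter are
hypothetical, and the hardware access is left as a placeholder. Only
request_threaded_irq(), free_irq(), IRQF_ONESHOT and the mutex calls are
real kernel interfaces:

```c
/* Kernel-side sketch only; not built or tested against any tree. */
#include <linux/module.h>
#include <linux/interrupt.h>
#include <linux/mutex.h>

/* Hypothetical: the IRQ line is passed in as a module parameter. */
static int demo_irq = -1;
module_param(demo_irq, int, 0444);

/* Takes the place of the spinlock that guarded hardware access. */
static DEFINE_MUTEX(demo_hw_lock);

/* Runs in a kernel thread (process context), so it is allowed to sleep. */
static irqreturn_t demo_irq_thread(int irq, void *dev_id)
{
	mutex_lock(&demo_hw_lock);
	/* ... access the (hypothetical) hardware here ... */
	mutex_unlock(&demo_hw_lock);
	return IRQ_HANDLED;
}

static int __init demo_init(void)
{
	if (demo_irq < 0)
		return -EINVAL;

	/*
	 * No hard-IRQ handler (NULL), so IRQF_ONESHOT is required: the
	 * line stays masked until the threaded handler has finished.
	 */
	return request_threaded_irq(demo_irq, NULL, demo_irq_thread,
				    IRQF_ONESHOT, "demo_threaded",
				    &demo_hw_lock);
}

static void __exit demo_exit(void)
{
	free_irq(demo_irq, &demo_hw_lock);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
```

With this arrangement, waiters on demo_hw_lock sleep instead of spinning
when the lock holder is stalled, at the cost of extra interrupt latency.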

