linux-scsi.vger.kernel.org archive mirror
* [Bug 11120] New: aacraid driver stalls on high-load SMP machines
@ 2008-07-18 21:36 bugme-daemon
  2008-07-19  1:13 ` [Bug 11120] " bugme-daemon
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: bugme-daemon @ 2008-07-18 21:36 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11120

           Summary: aacraid driver stalls on high-load SMP machines
           Product: SCSI Drivers
           Version: 2.5
     KernelVersion: 2.6.24
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: AACRAID
        AssignedTo: scsi_drivers-aacraid@kernel-bugs.osdl.org
        ReportedBy: smurf@smurf.noris.de


Latest working kernel version: unknown
Earliest failing kernel version: probably forever
Distribution: Ubuntu hardy
Hardware Environment: Dell PowerEdge 2650

Problem Description:

Under load, this happens rather often:

Jul 18 22:55:24 nun kernel: [86674.467410] aacraid: Host adapter abort request (0,0,2,0)
Jul 18 22:55:24 nun kernel: [86674.467487] aacraid: Host adapter abort request (0,0,3,0)
Jul 18 22:55:24 nun kernel: [86674.467617] aacraid: Host adapter reset request. SCSI hang ?
Jul 18 22:57:26 nun kernel: [86815.728423] aacraid: Host adapter abort request (0,0,0,0)
Jul 18 22:57:26 nun kernel: [86815.728500] aacraid: Host adapter abort request (0,0,3,0)
Jul 18 22:57:26 nun kernel: [86815.728573] aacraid: Host adapter abort request (0,0,2,0)
Jul 18 22:57:26 nun kernel: [86815.728640] aacraid: Host adapter abort request (0,0,1,0)
Jul 18 22:57:26 nun kernel: [86815.728772] aacraid: Host adapter reset request. SCSI hang ?

Access to the storage thus stalls for ten seconds or so.

I have successfully worked around the problem by using "schedtool -a 1
pid-of-basically-everything", so it seems to be an SMP-related problem.

However, one CPU is _somewhat_ slower than four, which is quite noticeable, so
we'd like to get this handled somehow :-/
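
For readers unfamiliar with the workaround described above, here is a minimal sketch of what pinning a process to a single CPU amounts to on Linux. It uses the glibc sched_setaffinity() call; the helper name pin_to_cpu and the choice of CPU index are assumptions made for illustration, not something taken from the report or from schedtool itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

/*
 * Restrict `pid` (0 means the calling process) to the single CPU `cpu`.
 * Returns 0 on success, or -1 with errno set by sched_setaffinity().
 * Hypothetical helper, for illustration only.
 */
int pin_to_cpu(pid_t pid, int cpu)
{
    cpu_set_t mask;

    /* Precondition: the index must fit in a cpu_set_t. */
    assert(cpu >= 0 && cpu < CPU_SETSIZE);

    CPU_ZERO(&mask);        /* start with an empty CPU set */
    CPU_SET(cpu, &mask);    /* allow exactly one CPU */
    return sched_setaffinity(pid, sizeof(mask), &mask);
}
```

Calling pin_to_cpu() for every relevant PID roughly corresponds to running everything on one processor, which is why throughput drops so noticeably.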


lspci:

05:06.0 SCSI storage controller: Adaptec RAID subsystem HBA (rev 01)
        Subsystem: Dell PowerEdge 2400,2500,2550,4400
        Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 7
        BIST result: 00
        I/O ports at cc00 [size=256]
        Memory at fccff000 (64-bit, non-prefetchable) [size=4K]
        Expansion ROM at fcd00000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2

05:06.1 SCSI storage controller: Adaptec RAID subsystem HBA (rev 01)
        Subsystem: Dell PowerEdge 2400,2500,2550,4400
        Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 11
        BIST result: 00
        I/O ports at c800 [size=256]
        Memory at fccfe000 (64-bit, non-prefetchable) [size=4K]
        Expansion ROM at f8100000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2


lspci -n:
05:06.0 0100: 9005:00c5 (rev 01)
        Subsystem: 1028:00c5
        Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 7
        BIST result: 00
        I/O ports at cc00 [size=256]
        Memory at fccff000 (64-bit, non-prefetchable) [size=4K]
        Expansion ROM at fcd00000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2

05:06.1 0100: 9005:00c5 (rev 01)
        Subsystem: 1028:00c5
        Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 11
        BIST result: 00
        I/O ports at c800 [size=256]
        Memory at fccfe000 (64-bit, non-prefetchable) [size=4K]
        Expansion ROM at f8100000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2


-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


* [Bug 11120] aacraid driver stalls on high-load SMP machines
  2008-07-18 21:36 [Bug 11120] New: aacraid driver stalls on high-load SMP machines bugme-daemon
@ 2008-07-19  1:13 ` bugme-daemon
  2008-07-20 12:30 ` bugme-daemon
  2008-08-29  5:41 ` bugme-daemon
  2 siblings, 0 replies; 4+ messages in thread
From: bugme-daemon @ 2008-07-19  1:13 UTC (permalink / raw)
  To: linux-scsi

------- Comment #1 from smurf@smurf.noris.de  2008-07-18 18:13 -------
Update: my uniprocessor band-aid, besides significantly decreasing performance,
resulted in an eventual soft hang of all CPUs some hours later, so this
workaround obviously doesn't work.



* [Bug 11120] aacraid driver stalls on high-load SMP machines
  2008-07-18 21:36 [Bug 11120] New: aacraid driver stalls on high-load SMP machines bugme-daemon
  2008-07-19  1:13 ` [Bug 11120] " bugme-daemon
@ 2008-07-20 12:30 ` bugme-daemon
  2008-08-29  5:41 ` bugme-daemon
  2 siblings, 0 replies; 4+ messages in thread
From: bugme-daemon @ 2008-07-20 12:30 UTC (permalink / raw)
  To: linux-scsi

------- Comment #2 from Mark_Salyzyn@adaptec.com  2008-07-20 05:30 -------
Increase your SCSI bus timeouts and/or decrease the device queue depth. The
driver is doing what it can when the adapter's firmware gets overloaded and
stops responding. One of the changes after 2.6.18 was to increase the maximum
SGB Length from 128 to 256, which was considered safe at the time. In
combination with other changes and improvements in the block and SCSI
subsystems, this may have allowed this series of adapters to run out of
internal resources.

The line in .../drivers/scsi/aacraid/aacraid.h:

#define AAC_MAX_32BIT_SGBCOUNT  ((unsigned short)256)

affects this value.
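
As a rough illustration of the first suggestion (longer timeouts, smaller queue depth), the sketch below writes new values to the per-device sysfs attributes /sys/block/<disk>/device/timeout and /sys/block/<disk>/device/queue_depth. The helper names write_attr and tune_scsi_disk are hypothetical, and any concrete values passed in (for example 90 seconds and a depth of 16) are assumptions to be tuned per system, not recommendations from this thread. Whether queue_depth is writable depends on the driver and kernel version.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write an unsigned decimal value to an attribute file.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 * Hypothetical helper, for illustration only.
 */
int write_attr(const char *path, unsigned int value)
{
    assert(path != NULL);

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fprintf(f, "%u\n", value) > 0;
    if (fclose(f) != 0)     /* sysfs reports write errors on flush/close */
        ok = 0;
    return ok ? 0 : -1;
}

/*
 * Set the command timeout (in seconds) and the queue depth for one
 * SCSI disk, named as it appears under /sys/block (e.g. "sda").
 * Returns 0 if both writes succeed, -1 otherwise. Needs root on a
 * real system.
 */
int tune_scsi_disk(const char *disk, unsigned int timeout_s,
                   unsigned int depth)
{
    char path[256];

    snprintf(path, sizeof(path), "/sys/block/%s/device/timeout", disk);
    if (write_attr(path, timeout_s) != 0)
        return -1;

    snprintf(path, sizeof(path), "/sys/block/%s/device/queue_depth", disk);
    return write_attr(path, depth);
}
```

These settings do not survive a reboot, so they would have to be reapplied at boot time. Changing AAC_MAX_32BIT_SGBCOUNT, by contrast, requires rebuilding the driver.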



* [Bug 11120] aacraid driver stalls on high-load SMP machines
  2008-07-18 21:36 [Bug 11120] New: aacraid driver stalls on high-load SMP machines bugme-daemon
  2008-07-19  1:13 ` [Bug 11120] " bugme-daemon
  2008-07-20 12:30 ` bugme-daemon
@ 2008-08-29  5:41 ` bugme-daemon
  2 siblings, 0 replies; 4+ messages in thread
From: bugme-daemon @ 2008-08-29  5:41 UTC (permalink / raw)
  To: linux-scsi

smurf@smurf.noris.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|scsi_drivers-aacraid@kernel-|smurf@smurf.noris.de
                   |bugs.osdl.org               |
             Status|NEW                         |ASSIGNED




------- Comment #3 from smurf@smurf.noris.de  2008-08-28 22:41 -------
Thank you. I will test this workaround today.


