* test10 hangs on startup: NMI watchdog hits Adaptec driver
@ 2003-11-25  0:23 Peter Chubb
  2003-11-25  0:37 ` James Bottomley
  0 siblings, 1 reply; 2+ messages in thread
From: Peter Chubb @ 2003-11-25  0:23 UTC
  To: linux-kernel, gibbs, James.Bottomley


Hi folks,
   I've been seeing random hangs on a dual 500MHz Celeron here, so I
rebooted this morning with the NMI watchdog turned on.

With the watchdog enabled, the machine prints the trace below.  It looks
to me as if the lock taken at aic7xxx_osm.c:1709, which is released
*after* ahc_linux_initialize_scsi_bus(), should perhaps be released
earlier.  Otherwise the host lock is held for the duration.


NMI Watchdog detected LOCKUP on CPU0, eip c02fb737, registers:
CPU:    0
EIP:    0060:[<c02fb737>]    Not tainted
EFLAGS: 00000086
EIP is at .text.lock.scsi+0x81/0xaa
eax: c1b01a50   ebx: c1b01800   ecx: c1b01800   edx: c1b01a00
esi: c1b01800   edi: 00000000   ebp: 00000046   esp: f7fa5da4
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 1, threadinfo=f7fa4000 task=c1a47900)
Stack: 00000000 c1b01800 00000000 c1b01c00 00000000 c02ff2bb c1b01800 00000000 
       ffffff41 ffffffff c031f932 c1b01800 00000000 00000000 c1b01c98 00000000 
       c1b01c00 00000000 c030492f c1b01c00 c1b01c90 00000000 00000000 00000000 
Call Trace:
 [<c02ff2bb>] scsi_report_bus_reset+0x1b/0x50
 [<c031f932>] ahc_send_async+0xa2/0x2e0
 [<c030492f>] ahc_run_untagged_queues+0x2f/0x40
 [<c030fb23>] ahc_abort_scbs+0x403/0x4c0
 [<c0310016>] ahc_reset_channel+0x2b6/0x5d0
 [<c018ba09>] proc_create+0x89/0xe0
 [<c031c0dc>] ahc_linux_initialize_scsi_bus+0x1fc/0x210
 [<c031bca8>] ahc_linux_register_host+0x178/0x360
 [<c01905f4>] sysfs_add_file+0xa4/0xb0
 [<c018fee0>] init_file+0x0/0x20
 [<c028a888>] pci_create_newid_file+0x28/0x30
 [<c028ad5c>] pci_register_driver+0x7c/0xa0
 [<c031aa3c>] ahc_linux_detect+0x4c/0x80
 [<c0527b8f>] ahc_linux_init+0xf/0x30
 [<c051095c>] do_initcalls+0x2c/0xa0
 [<c013271f>] init_workqueues+0xf/0x24
 [<c01050f6>] init+0x56/0x180
 [<c01050a0>] init+0x0/0x180
 [<c0107259>] kernel_thread_helper+0x5/0xc

Code: f3 90 80 38 00 7e f9 e9 fc fc ff ff f3 90 80 38 00 7e f9 e9 
console shuts up ...


* Re: test10 hangs on startup: NMI watchdog hits Adaptec driver
  2003-11-25  0:23 test10 hangs on startup: NMI watchdog hits Adaptec driver Peter Chubb
@ 2003-11-25  0:37 ` James Bottomley
  0 siblings, 0 replies; 2+ messages in thread
From: James Bottomley @ 2003-11-25  0:37 UTC
  To: Peter Chubb; +Cc: Linux Kernel, gibbs

On Mon, 2003-11-24 at 18:23, Peter Chubb wrote:
>    I've been seeing random hangs on a dual 500MHz Celeron here, so I
> rebooted this morning with the NMI watchdog turned on.
> 
> With the watchdog enabled, the machine prints the trace below.  It looks
> to me as if the lock taken at aic7xxx_osm.c:1709, which is released
> *after* ahc_linux_initialize_scsi_bus(), should perhaps be released
> earlier.  Otherwise the host lock is held for the duration.

There have been several threads on this.

The fix is attached.

James

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.1483  -> 1.1484 
#	drivers/scsi/scsi_error.c	1.65    -> 1.66   
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/11/24	jejb@raven.il.steeleye.com	1.1484
# Fix locking problems in scsi_report_bus_reset() causing aic7xxx to hang
# 
# All the users of this function in the SCSI tree call it with the host
# lock held.  With the new list traversal code, it was trying to take
# the lock again to traverse the list.
# 
# Fix it to use the unlocked version of list traversal and modify the
# header comments to make it clear that the lock is expected to be held
# on calling it.
# --------------------------------------------
#
diff -Nru a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
--- a/drivers/scsi/scsi_error.c	Mon Nov 24 17:27:38 2003
+++ b/drivers/scsi/scsi_error.c	Mon Nov 24 17:27:38 2003
@@ -911,7 +911,9 @@
 
 	if (rtn == SUCCESS) {
 		scsi_sleep(BUS_RESET_SETTLE_TIME);
+		spin_lock_irqsave(scmd->device->host->host_lock, flags);
 		scsi_report_bus_reset(scmd->device->host, scmd->device->channel);
+		spin_unlock_irqrestore(scmd->device->host->host_lock, flags);
 	}
 
 	return rtn;
@@ -940,7 +942,9 @@
 
 	if (rtn == SUCCESS) {
 		scsi_sleep(HOST_RESET_SETTLE_TIME);
+		spin_lock_irqsave(scmd->device->host->host_lock, flags);
 		scsi_report_bus_reset(scmd->device->host, scmd->device->channel);
+		spin_unlock_irqrestore(scmd->device->host->host_lock, flags);
 	}
 
 	return rtn;
@@ -1608,7 +1612,7 @@
  *
  * Returns:     Nothing
  *
- * Lock status: No locks are assumed held.
+ * Lock status: Host lock must be held.
  *
  * Notes:       This only needs to be called if the reset is one which
  *		originates from an unknown location.  Resets originated
@@ -1622,7 +1626,7 @@
 {
 	struct scsi_device *sdev;
 
-	shost_for_each_device(sdev, shost) {
+	__shost_for_each_device(sdev, shost) {
 		if (channel == sdev->channel) {
 			sdev->was_reset = 1;
 			sdev->expecting_cc_ua = 1;
@@ -1642,7 +1646,7 @@
  *
  * Returns:     Nothing
  *
- * Lock status: No locks are assumed held.
+ * Lock status: Host lock must be held
  *
  * Notes:       This only needs to be called if the reset is one which
  *		originates from an unknown location.  Resets originated
@@ -1656,7 +1660,7 @@
 {
 	struct scsi_device *sdev;
 
-	shost_for_each_device(sdev, shost) {
+	__shost_for_each_device(sdev, shost) {
 		if (channel == sdev->channel &&
 		    target == sdev->id) {
 			sdev->was_reset = 1;

