[2.6.14-rc1] sym scsi boot hang

public inbox for linux-kernel@vger.kernel.org

* [2.6.14-rc1] sym scsi boot hang
@ 2005-09-13 12:48 Dipankar Sarma
  2005-09-13 13:17 ` Anton Blanchard
  0 siblings, 1 reply; 21+ messages in thread
From: Dipankar Sarma @ 2005-09-13 12:48 UTC
  To: linux-scsi; +Cc: linux-kernel

My ppc64 box refuses to boot with 2.6.14-rc1. The console log
is included below. The last message in the log repeats many
times and then the box hangs.

Any idea what might have caused this?

Thanks
Dipankar

sym0: <1010-66> rev 0x1 at pci 0001:01:01.0 irq 115
sym0: No NVRAM, ID 7, Fast-80, LVD, parity checking
sym0: SCSI BUS has been reset.
scsi0 : sym-2.2.1
 target0:0:8: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 31)
  Vendor: IBM       Model: IC35L036UCDY10-0  Rev: S25M
  Type:   Direct-Access                      ANSI SCSI revision: 03
 target0:0:8: tagged command queuing enabled, command queue depth 16.
 target0:0:8: Beginning Domain Validation
 target0:0:8: asynchronous.
 target0:0:8: wide asynchronous.
 target0:0:8: FAST-80 WIDE SCSI 160.0 MB/s DT IU QAS (12.5 ns, offset 31)
sym0: unexpected disconnect
 target0:0:8: Write Buffer failure 700ff
 target0:0:8: Domain Validation Disabing Information Units
 target0:0:8: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31)
sym0: unexpected disconnect
 target0:0:8: Write Buffer failure 700ff
 target0:0:8: Domain Validation detected failure, dropping back
 target0:0:8: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
 target0:0:8: Ending Domain Validation
 target0:0:9: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 31)
  Vendor: IBM       Model: IC35L036UCDY10-0  Rev: S25M
  Type:   Direct-Access                      ANSI SCSI revision: 03
 target0:0:9: tagged command queuing enabled, command queue depth 16.
 target0:0:9: Beginning Domain Validation
 target0:0:9: asynchronous.
 target0:0:9: wide asynchronous.
 target0:0:9: FAST-80 WIDE SCSI 160.0 MB/s DT IU QAS (12.5 ns, offset 31)
sym0: unexpected disconnect
 target0:0:9: Write Buffer failure 700ff
 target0:0:9: Domain Validation Disabing Information Units
 target0:0:9: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31)
sym0: unexpected disconnect
 target0:0:9: Write Buffer failure 700ff
 target0:0:9: Domain Validation detected failure, dropping back
 target0:0:9: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
 target0:0:9: Ending Domain Validation
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 31)
  Vendor: IBM       Model: IC35L036UCDY10-0  Rev: S25M
  Type:   Direct-Access                      ANSI SCSI revision: 03
 target0:0:10: tagged command queuing enabled, command queue depth 16.
 target0:0:10: Beginning Domain Validation
 target0:0:10: asynchronous.
 target0:0:10: wide asynchronous.
 target0:0:10: Domain Validation skipping write tests
 target0:0:10: FAST-80 WIDE SCSI 160.0 MB/s DT IU QAS (12.5 ns, offset 31)
sym0: unexpected disconnect
 target0:0:10: Domain Validation Disabing Information Units
 target0:0:10: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31)
sym0: unexpected disconnect
 target0:0:10: Domain Validation detected failure, dropping back
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
 target0:0:10: Ending Domain Validation
  Vendor: IBM       Model: HSBPM2   PU2SCSI  Rev: 0016
  Type:   Enclosure                          ANSI SCSI revision: 02
 target0:0:14: Beginning Domain Validation
 0:0:14:0: phase change 6-7 9@100503a8 resid=7.
 0:0:14:0: phase change 6-7 9@100503a8 resid=7.
 0:0:14:0: phase change 6-7 9@100503a8 resid=7.
 0:0:14:0: phase change 6-7 9@100503a8 resid=7.
 target0:0:14: Ending Domain Validation
  Vendor: IBM       Model: HSBPD4M  PU3SCSI  Rev: 0016
  Type:   Enclosure                          ANSI SCSI revision: 02
 target0:0:15: Beginning Domain Validation
 0:0:15:0: phase change 6-7 9@100503a8 resid=7.
 0:0:15:0: phase change 6-7 9@100503a8 resid=7.
 0:0:15:0: phase change 6-7 9@100503a8 resid=7.
 0:0:15:0: phase change 6-7 9@100503a8 resid=7.
 target0:0:15: Ending Domain Validation
sym1: <1010-66> rev 0x1 at pci 0001:01:01.1 irq 116
sym1: No NVRAM, ID 7, Fast-80, LVD, parity checking
sym1: SCSI BUS has been reset.
scsi1 : sym-2.2.1
sym2: <1010-66> rev 0x1 at pci 0001:41:01.0 irq 119
sym2: No NVRAM, ID 7, Fast-80, LVD, parity checking
sym2: SCSI BUS has been reset.
scsi2 : sym-2.2.1
sym3: <1010-66> rev 0x1 at pci 0001:41:01.1 irq 120
sym3: No NVRAM, ID 7, Fast-80, LVD, parity checking
sym3: SCSI BUS has been reset.
scsi3 : sym-2.2.1
st: Version 20050830, fixed bufsize 32768, s/g segs 256
SCSI device sda: 71096640 512-byte hdwr sectors (36401 MB)
SCSI device sda: drive cache: write through
SCSI device sda: 71096640 512-byte hdwr sectors (36401 MB)
SCSI device sda: drive cache: write through
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 >
Attached scsi disk sda at scsi0, channel 0, id 8, lun 0
SCSI device sdb: 71096640 512-byte hdwr sectors (36401 MB)
SCSI device sdb: drive cache: write through
SCSI device sdb: 71096640 512-byte hdwr sectors (36401 MB)
SCSI device sdb: drive cache: write through
 sdb: sdb1 sdb2
Attached scsi disk sdb at scsi0, channel 0, id 9, lun 0
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
sdc: Spinning up disk....<6> target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
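
The throughput and capacity figures in the console log above can be
cross-checked with a little arithmetic. The following is only a
minimal sketch: the function names are made up for illustration and
are not taken from the sym2 driver or the SCSI core. It assumes that
a WIDE bus moves 2 bytes per transfer and that the kernel reports
sizes in decimal megabytes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: "WIDE" SCSI moves 16 bits (2 bytes) per transfer. */
#define WIDE_BUS_BYTES 2

/*
 * Hypothetical helper: throughput in decimal MB/s for a transfer period.
 * The period is given in picoseconds so that 12.5 ns stays an integer.
 * transfers/s = 1e12 / period_ps, so MB/s = bus_bytes * 1e6 / period_ps.
 */
static uint64_t scsi_rate_mb_per_s(uint64_t period_ps, uint64_t bus_bytes)
{
	assert(period_ps != 0);
	return bus_bytes * 1000000ULL / period_ps;
}

/* Hypothetical helper: capacity in decimal MB from a sector count. */
static uint64_t disk_size_mb(uint64_t sectors, uint64_t sector_bytes)
{
	return sectors * sector_bytes / 1000000ULL;
}
```

With a 12.5 ns period this gives 160 MB/s (the FAST-80 WIDE line),
with 25 ns it gives 80 MB/s (the FAST-40 WIDE line), and 71096640
sectors of 512 bytes come out as 36401 MB, matching the sda and sdb
lines.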

* Re: [2.6.14-rc1] sym scsi boot hang
  2005-09-13 12:48 [2.6.14-rc1] sym scsi boot hang Dipankar Sarma
@ 2005-09-13 13:17 ` Anton Blanchard
  2005-09-13 14:29   ` Anton Blanchard
  0 siblings, 1 reply; 21+ messages in thread
From: Anton Blanchard @ 2005-09-13 13:17 UTC
  To: Dipankar Sarma; +Cc: linux-scsi, linux-kernel


Hi,

> My ppc64 box refuses to boot with 2.6.14-rc1. The console log
> is included below. The last message in the log repeats many
> times and then the box hangs.
> 
> Any idea what might have caused this?

...

> Attached scsi disk sdb at scsi0, channel 0, id 9, lun 0
>  target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
>  target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
>  target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
> sdc: Spinning up disk....<6> target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
>  target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
>  target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
>  target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
>  target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
>  target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)

Looks like a change between 2.6.13-git11 and 2.6.14-rc1 caused this - so
something in the last 24 hours.

Anton

* Re: [2.6.14-rc1] sym scsi boot hang
  2005-09-13 13:17 ` Anton Blanchard
@ 2005-09-13 14:29   ` Anton Blanchard
  2005-09-13 16:35     ` James Bottomley
  0 siblings, 1 reply; 21+ messages in thread
From: Anton Blanchard @ 2005-09-13 14:29 UTC
  To: Dipankar Sarma; +Cc: linux-scsi, linux-kernel

 
Hi,

> Looks like a change between 2.6.13-git11 and 2.6.14-rc1 caused this - so
> something in the last 24 hours.

I just noticed a similar hang on the ibmvscsi driver. The following
backout patch seems to fix it (part of the scsi merge yesterday), I'll
look closer after I get some sleep.

Anton

diff -urN build/drivers/scsi/scsi_lib.c build2/drivers/scsi/scsi_lib.c
--- build/drivers/scsi/scsi_lib.c	2005-09-13 15:13:32.000000000 +1000
+++ build2/drivers/scsi/scsi_lib.c	2005-09-14 00:44:57.000000000 +1000
@@ -97,30 +97,6 @@
 }
 
 static void scsi_run_queue(struct request_queue *q);
-static void scsi_release_buffers(struct scsi_cmnd *cmd);
-
-/*
- * Function:	scsi_unprep_request()
- *
- * Purpose:	Remove all preparation done for a request, including its
- *		associated scsi_cmnd, so that it can be requeued.
- *
- * Arguments:	req	- request to unprepare
- *
- * Lock status:	Assumed that no locks are held upon entry.
- *
- * Returns:	Nothing.
- */
-static void scsi_unprep_request(struct request *req)
-{
-	struct scsi_cmnd *cmd = req->special;
-
-	req->flags &= ~REQ_DONTPREP;
-	req->special = (req->flags & REQ_SPECIAL) ? cmd->sc_request : NULL;
-
-	scsi_release_buffers(cmd);
-	scsi_put_command(cmd);
-}
 
 /*
  * Function:    scsi_queue_insert()
@@ -140,14 +116,12 @@
  *              commands.
  * Notes:       This could be called either from an interrupt context or a
  *              normal process context.
- * Notes:	Upon return, cmd is a stale pointer.
  */
 int scsi_queue_insert(struct scsi_cmnd *cmd, int reason)
 {
 	struct Scsi_Host *host = cmd->device->host;
 	struct scsi_device *device = cmd->device;
 	struct request_queue *q = device->request_queue;
-	struct request *req = cmd->request;
 	unsigned long flags;
 
 	SCSI_LOG_MLQUEUE(1,
@@ -188,9 +162,8 @@
 	 * function.  The SCSI request function detects the blocked condition
 	 * and plugs the queue appropriately.
          */
-	scsi_unprep_request(req);
 	spin_lock_irqsave(q->queue_lock, flags);
-	blk_requeue_request(q, req);
+	blk_requeue_request(q, cmd->request);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
 	scsi_run_queue(q);
@@ -366,7 +339,7 @@
 	int result;
 	
 	if (sshdr) {
-		sense = kmalloc(SCSI_SENSE_BUFFERSIZE, GFP_NOIO);
+		sense = kmalloc(SCSI_SENSE_BUFFERSIZE, GFP_KERNEL);
 		if (!sense)
 			return DRIVER_ERROR << 24;
 		memset(sense, 0, SCSI_SENSE_BUFFERSIZE);
@@ -579,16 +552,15 @@
  *		I/O errors in the middle of the request, in which case
  *		we need to request the blocks that come after the bad
  *		sector.
- * Notes:	Upon return, cmd is a stale pointer.
  */
 static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd)
 {
-	struct request *req = cmd->request;
 	unsigned long flags;
 
-	scsi_unprep_request(req);
+	cmd->request->flags &= ~REQ_DONTPREP;
+
 	spin_lock_irqsave(q->queue_lock, flags);
-	blk_requeue_request(q, req);
+	blk_requeue_request(q, cmd->request);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
 	scsi_run_queue(q);
@@ -623,14 +595,13 @@
  *
  * Lock status: Assumed that lock is not held upon entry.
  *
- * Returns:     cmd if requeue required, NULL otherwise.
+ * Returns:     cmd if requeue done or required, NULL otherwise
  *
  * Notes:       This is called for block device requests in order to
  *              mark some number of sectors as complete.
  * 
  *		We are guaranteeing that the request queue will be goosed
  *		at some point during this call.
- * Notes:	If cmd was requeued, upon return it will be a stale pointer.
  */
 static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int uptodate,
 					  int bytes, int requeue)
@@ -653,15 +624,14 @@
 		if (!uptodate && blk_noretry_request(req))
 			end_that_request_chunk(req, 0, leftover);
 		else {
-			if (requeue) {
+			if (requeue)
 				/*
 				 * Bleah.  Leftovers again.  Stick the
 				 * leftovers in the front of the
 				 * queue, and goose the queue again.
 				 */
 				scsi_requeue_command(q, cmd);
-				cmd = NULL;
-			}
+
 			return cmd;
 		}
 	}
@@ -887,13 +857,15 @@
 		 * requeueing right here - we will requeue down below
 		 * when we handle the bad sectors.
 		 */
+		cmd = scsi_end_request(cmd, 1, good_bytes, result == 0);
 
 		/*
-		 * If the command completed without error, then either
-		 * finish off the rest of the command, or start a new one.
+		 * If the command completed without error, then either finish off the
+		 * rest of the command, or start a new one.
 		 */
-		if (scsi_end_request(cmd, 1, good_bytes, result == 0) == NULL)
+		if (result == 0 || cmd == NULL ) {
 			return;
+		}
 	}
 	/*
 	 * Now, if we were good little boys and girls, Santa left us a request
@@ -908,7 +880,7 @@
 				 * and quietly refuse further access.
 				 */
 				cmd->device->changed = 1;
-				scsi_end_request(cmd, 0,
+				cmd = scsi_end_request(cmd, 0,
 						this_count, 1);
 				return;
 			} else {
@@ -942,7 +914,7 @@
 				scsi_requeue_command(q, cmd);
 				result = 0;
 			} else {
-				scsi_end_request(cmd, 0, this_count, 1);
+				cmd = scsi_end_request(cmd, 0, this_count, 1);
 				return;
 			}
 			break;
@@ -959,7 +931,7 @@
 				dev_printk(KERN_INFO,
 					   &cmd->device->sdev_gendev,
 					   "Device not ready.\n");
-			scsi_end_request(cmd, 0, this_count, 1);
+			cmd = scsi_end_request(cmd, 0, this_count, 1);
 			return;
 		case VOLUME_OVERFLOW:
 			if (!(req->flags & REQ_QUIET)) {
@@ -969,7 +941,7 @@
 				__scsi_print_command(cmd->data_cmnd);
 				scsi_print_sense("", cmd);
 			}
-			scsi_end_request(cmd, 0, block_bytes, 1);
+			cmd = scsi_end_request(cmd, 0, block_bytes, 1);
 			return;
 		default:
 			break;
@@ -1000,7 +972,7 @@
 		block_bytes = req->hard_cur_sectors << 9;
 		if (!block_bytes)
 			block_bytes = req->data_len;
-		scsi_end_request(cmd, 0, block_bytes, 1);
+		cmd = scsi_end_request(cmd, 0, block_bytes, 1);
 	}
 }
 EXPORT_SYMBOL(scsi_io_completion);
@@ -1146,7 +1118,7 @@
 	if (unlikely(!scsi_device_online(sdev))) {
 		printk(KERN_ERR "scsi%d (%d:%d): rejecting I/O to offline device\n",
 		       sdev->host->host_no, sdev->id, sdev->lun);
-		goto kill;
+		return BLKPREP_KILL;
 	}
 	if (unlikely(sdev->sdev_state != SDEV_RUNNING)) {
 		/* OK, we're not in a running state don't prep
@@ -1156,7 +1128,7 @@
 			 * at all allowed down */
 			printk(KERN_ERR "scsi%d (%d:%d): rejecting I/O to dead device\n",
 			       sdev->host->host_no, sdev->id, sdev->lun);
-			goto kill;
+			return BLKPREP_KILL;
 		}
 		/* OK, we only allow special commands (i.e. not
 		 * user initiated ones */
@@ -1188,11 +1160,11 @@
 		if(unlikely(specials_only) && !(req->flags & REQ_SPECIAL)) {
 			if(specials_only == SDEV_QUIESCE ||
 					specials_only == SDEV_BLOCK)
-				goto defer;
+				return BLKPREP_DEFER;
 			
 			printk(KERN_ERR "scsi%d (%d:%d): rejecting I/O to device being removed\n",
 			       sdev->host->host_no, sdev->id, sdev->lun);
-			goto kill;
+			return BLKPREP_KILL;
 		}
 			
 			
@@ -1210,7 +1182,7 @@
 		cmd->tag = req->tag;
 	} else {
 		blk_dump_rq_flags(req, "SCSI bad req");
-		goto kill;
+		return BLKPREP_KILL;
 	}
 	
 	/* note the overloading of req->special.  When the tag
@@ -1248,13 +1220,8 @@
 		 * required).
 		 */
 		ret = scsi_init_io(cmd);
-		switch(ret) {
-		case BLKPREP_KILL:
-			/* BLKPREP_KILL return also releases the command */
-			goto kill;
-		case BLKPREP_DEFER:
-			goto defer;
-		}
+		if (ret)	/* BLKPREP_KILL return also releases the command */
+			return ret;
 		
 		/*
 		 * Initialize the actual SCSI command for this request.
@@ -1264,7 +1231,7 @@
 			if (unlikely(!drv->init_command(cmd))) {
 				scsi_release_buffers(cmd);
 				scsi_put_command(cmd);
-				goto kill;
+				return BLKPREP_KILL;
 			}
 		} else {
 			memcpy(cmd->cmnd, req->cmd, sizeof(cmd->cmnd));
@@ -1295,9 +1262,6 @@
 	if (sdev->device_busy == 0)
 		blk_plug_device(q);
 	return BLKPREP_DEFER;
- kill:
-	req->errors = DID_NO_CONNECT << 16;
-	return BLKPREP_KILL;
 }
 
 /*
@@ -1372,24 +1336,19 @@
 }
 
 /*
- * Kill a request for a dead device
+ * Kill requests for a dead device
  */
-static void scsi_kill_request(struct request *req, request_queue_t *q)
+static void scsi_kill_requests(request_queue_t *q)
 {
-	struct scsi_cmnd *cmd = req->special;
-
-	blkdev_dequeue_request(req);
+	struct request *req;
 
-	if (unlikely(cmd == NULL)) {
-		printk(KERN_CRIT "impossible request in %s.\n",
-				 __FUNCTION__);
-		BUG();
+	while ((req = elv_next_request(q)) != NULL) {
+		blkdev_dequeue_request(req);
+		req->flags |= REQ_QUIET;
+		while (end_that_request_first(req, 0, req->nr_sectors))
+			;
+		end_that_request_last(req);
 	}
-
-	scsi_init_cmd_errh(cmd);
-	cmd->result = DID_NO_CONNECT << 16;
-	atomic_inc(&cmd->device->iorequest_cnt);
-	__scsi_done(cmd);
 }
 
 /*
@@ -1412,8 +1371,7 @@
 
 	if (!sdev) {
 		printk("scsi: killing requests for dead queue\n");
-		while ((req = elv_next_request(q)) != NULL)
-			scsi_kill_request(req, q);
+		scsi_kill_requests(q);
 		return;
 	}
 
@@ -1440,7 +1398,11 @@
 		if (unlikely(!scsi_device_online(sdev))) {
 			printk(KERN_ERR "scsi%d (%d:%d): rejecting I/O to offline device\n",
 			       sdev->host->host_no, sdev->id, sdev->lun);
-			scsi_kill_request(req, q);
+			blkdev_dequeue_request(req);
+			req->flags |= REQ_QUIET;
+			while (end_that_request_first(req, 0, req->nr_sectors))
+				;
+			end_that_request_last(req);
 			continue;
 		}
 
@@ -1453,14 +1415,6 @@
 		sdev->device_busy++;
 
 		spin_unlock(q->queue_lock);
-		cmd = req->special;
-		if (unlikely(cmd == NULL)) {
-			printk(KERN_CRIT "impossible request in %s.\n"
-					 "please mail a stack trace to "
-					 "linux-scsi@vger.kernel.org",
-					 __FUNCTION__);
-			BUG();
-		}
 		spin_lock(shost->host_lock);
 
 		if (!scsi_host_queue_ready(q, shost, sdev))
@@ -1479,6 +1433,15 @@
 		 */
 		spin_unlock_irq(shost->host_lock);
 
+		cmd = req->special;
+		if (unlikely(cmd == NULL)) {
+			printk(KERN_CRIT "impossible request in %s.\n"
+					 "please mail a stack trace to "
+					 "linux-scsi@vger.kernel.org",
+					 __FUNCTION__);
+			BUG();
+		}
+
 		/*
 		 * Finally, initialize any error handling parameters, and set up
 		 * the timers for timeouts.
@@ -1514,7 +1477,6 @@
 	 * cases (host limits or settings) should run the queue at some
 	 * later time.
 	 */
-	scsi_unprep_request(req);
 	spin_lock_irq(q->queue_lock);
 	blk_requeue_request(q, req);
 	sdev->device_busy--;
diff -urN build/drivers/scsi/scsi_priv.h build2/drivers/scsi/scsi_priv.h
--- build/drivers/scsi/scsi_priv.h	2005-09-13 15:13:32.000000000 +1000
+++ build2/drivers/scsi/scsi_priv.h	2005-09-14 00:44:57.000000000 +1000
@@ -124,7 +124,6 @@
 extern void scsi_sysfs_device_initialize(struct scsi_device *);
 extern int scsi_sysfs_target_initialize(struct scsi_device *);
 extern struct scsi_transport_template blank_transport_template;
-extern void __scsi_remove_device(struct scsi_device *);
 
 extern struct bus_type scsi_bus_type;
 
diff -urN build/drivers/scsi/scsi_scan.c build2/drivers/scsi/scsi_scan.c
--- build/drivers/scsi/scsi_scan.c	2005-09-13 15:13:32.000000000 +1000
+++ build2/drivers/scsi/scsi_scan.c	2005-09-14 00:44:57.000000000 +1000
@@ -870,12 +870,8 @@
  out_free_sdev:
 	if (res == SCSI_SCAN_LUN_PRESENT) {
 		if (sdevp) {
-			if (scsi_device_get(sdev) == 0) {
-				*sdevp = sdev;
-			} else {
-				__scsi_remove_device(sdev);
-				res = SCSI_SCAN_NO_RESPONSE;
-			}
+			scsi_device_get(sdev);
+			*sdevp = sdev;
 		}
 	} else {
 		if (sdev->host->hostt->slave_destroy)
@@ -1264,19 +1260,6 @@
 }
 EXPORT_SYMBOL(__scsi_add_device);
 
-int scsi_add_device(struct Scsi_Host *host, uint channel,
-		    uint target, uint lun)
-{
-	struct scsi_device *sdev = 
-		__scsi_add_device(host, channel, target, lun, NULL);
-	if (IS_ERR(sdev))
-		return PTR_ERR(sdev);
-
-	scsi_device_put(sdev);
-	return 0;
-}
-EXPORT_SYMBOL(scsi_add_device);
-
 void scsi_rescan_device(struct device *dev)
 {
 	struct scsi_driver *drv;
@@ -1293,8 +1276,27 @@
 }
 EXPORT_SYMBOL(scsi_rescan_device);
 
-static void __scsi_scan_target(struct device *parent, unsigned int channel,
-		unsigned int id, unsigned int lun, int rescan)
+/**
+ * scsi_scan_target - scan a target id, possibly including all LUNs on the
+ *     target.
+ * @sdevsca:	Scsi_Device handle for scanning
+ * @shost:	host to scan
+ * @channel:	channel to scan
+ * @id:		target id to scan
+ *
+ * Description:
+ *     Scan the target id on @shost, @channel, and @id. Scan at least LUN
+ *     0, and possibly all LUNs on the target id.
+ *
+ *     Use the pre-allocated @sdevscan as a handle for the scanning. This
+ *     function sets sdevscan->host, sdevscan->id and sdevscan->lun; the
+ *     scanning functions modify sdevscan->lun.
+ *
+ *     First try a REPORT LUN scan, if that does not scan the target, do a
+ *     sequential scan of LUNs on the target id.
+ **/
+void scsi_scan_target(struct device *parent, unsigned int channel,
+		      unsigned int id, unsigned int lun, int rescan)
 {
 	struct Scsi_Host *shost = dev_to_shost(parent);
 	int bflags = 0;
@@ -1308,7 +1310,9 @@
 		 */
 		return;
 
+
 	starget = scsi_alloc_target(parent, channel, id);
+
 	if (!starget)
 		return;
 
@@ -1354,33 +1358,6 @@
 
 	put_device(&starget->dev);
 }
-
-/**
- * scsi_scan_target - scan a target id, possibly including all LUNs on the
- *     target.
- * @parent:	host to scan
- * @channel:	channel to scan
- * @id:		target id to scan
- * @lun:	Specific LUN to scan or SCAN_WILD_CARD
- * @rescan:	passed to LUN scanning routines
- *
- * Description:
- *     Scan the target id on @parent, @channel, and @id. Scan at least LUN 0,
- *     and possibly all LUNs on the target id.
- *
- *     First try a REPORT LUN scan, if that does not scan the target, do a
- *     sequential scan of LUNs on the target id.
- **/
-void scsi_scan_target(struct device *parent, unsigned int channel,
-		      unsigned int id, unsigned int lun, int rescan)
-{
-	struct Scsi_Host *shost = dev_to_shost(parent);
-
-	down(&shost->scan_mutex);
-	if (scsi_host_scan_allowed(shost))
-		__scsi_scan_target(parent, channel, id, lun, rescan);
-	up(&shost->scan_mutex);
-}
 EXPORT_SYMBOL(scsi_scan_target);
 
 static void scsi_scan_channel(struct Scsi_Host *shost, unsigned int channel,
@@ -1406,12 +1383,10 @@
 				order_id = shost->max_id - id - 1;
 			else
 				order_id = id;
-			__scsi_scan_target(&shost->shost_gendev, channel,
-					order_id, lun, rescan);
+			scsi_scan_target(&shost->shost_gendev, channel, order_id, lun, rescan);
 		}
 	else
-		__scsi_scan_target(&shost->shost_gendev, channel,
-				id, lun, rescan);
+		scsi_scan_target(&shost->shost_gendev, channel, id, lun, rescan);
 }
 
 int scsi_scan_host_selected(struct Scsi_Host *shost, unsigned int channel,
@@ -1509,15 +1484,12 @@
  */
 struct scsi_device *scsi_get_host_dev(struct Scsi_Host *shost)
 {
-	struct scsi_device *sdev = NULL;
+	struct scsi_device *sdev;
 	struct scsi_target *starget;
 
-	down(&shost->scan_mutex);
-	if (!scsi_host_scan_allowed(shost))
-		goto out;
 	starget = scsi_alloc_target(&shost->shost_gendev, 0, shost->this_id);
 	if (!starget)
-		goto out;
+		return NULL;
 
 	sdev = scsi_alloc_sdev(starget, 0, NULL);
 	if (sdev) {
@@ -1525,8 +1497,6 @@
 		sdev->borken = 0;
 	}
 	put_device(&starget->dev);
- out:
-	up(&shost->scan_mutex);
 	return sdev;
 }
 EXPORT_SYMBOL(scsi_get_host_dev);
diff -urN build/drivers/scsi/scsi_sysfs.c build2/drivers/scsi/scsi_sysfs.c
--- build/drivers/scsi/scsi_sysfs.c	2005-09-13 15:13:32.000000000 +1000
+++ build2/drivers/scsi/scsi_sysfs.c	2005-09-14 00:44:57.000000000 +1000
@@ -653,7 +653,7 @@
 			error = attr_add(&sdev->sdev_gendev,
 					sdev->host->hostt->sdev_attrs[i]);
 			if (error) {
-				__scsi_remove_device(sdev);
+				scsi_remove_device(sdev);
 				goto out;
 			}
 		}
@@ -667,7 +667,7 @@
 							scsi_sysfs_sdev_attrs[i]);
 			error = device_create_file(&sdev->sdev_gendev, attr);
 			if (error) {
-				__scsi_remove_device(sdev);
+				scsi_remove_device(sdev);
 				goto out;
 			}
 		}
@@ -687,10 +687,17 @@
 	return error;
 }
 
-void __scsi_remove_device(struct scsi_device *sdev)
+/**
+ * scsi_remove_device - unregister a device from the scsi bus
+ * @sdev:	scsi_device to unregister
+ **/
+void scsi_remove_device(struct scsi_device *sdev)
 {
+	struct Scsi_Host *shost = sdev->host;
+
+	down(&shost->scan_mutex);
 	if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
-		return;
+		goto out;
 
 	class_device_unregister(&sdev->sdev_classdev);
 	device_del(&sdev->sdev_gendev);
@@ -699,17 +706,8 @@
 		sdev->host->hostt->slave_destroy(sdev);
 	transport_unregister_device(&sdev->sdev_gendev);
 	put_device(&sdev->sdev_gendev);
-}
-
-/**
- * scsi_remove_device - unregister a device from the scsi bus
- * @sdev:	scsi_device to unregister
- **/
-void scsi_remove_device(struct scsi_device *sdev)
-{
-	down(&sdev->host->scan_mutex);
-	__scsi_remove_device(sdev);
-	up(&sdev->host->scan_mutex);
+out:
+	up(&shost->scan_mutex);
 }
 EXPORT_SYMBOL(scsi_remove_device);
 
diff -urN build/include/scsi/scsi_device.h build2/include/scsi/scsi_device.h
--- build/include/scsi/scsi_device.h	2005-09-13 15:13:32.000000000 +1000
+++ build2/include/scsi/scsi_device.h	2005-09-14 00:44:57.000000000 +1000
@@ -178,8 +178,8 @@
 
 extern struct scsi_device *__scsi_add_device(struct Scsi_Host *,
 		uint, uint, uint, void *hostdata);
-extern int scsi_add_device(struct Scsi_Host *host, uint channel,
-			   uint target, uint lun);
+#define scsi_add_device(host, channel, target, lun) \
+	__scsi_add_device(host, channel, target, lun, NULL)
 extern void scsi_remove_device(struct scsi_device *);
 extern int scsi_device_cancel(struct scsi_device *, int);
 

* Re: [2.6.14-rc1] sym scsi boot hang
  2005-09-13 14:29   ` Anton Blanchard
@ 2005-09-13 16:35     ` James Bottomley
  2005-09-13 16:47       ` Anton Blanchard
  2005-09-14  8:06       ` Anton Blanchard
  0 siblings, 2 replies; 21+ messages in thread
From: James Bottomley @ 2005-09-13 16:35 UTC
  To: Anton Blanchard; +Cc: Dipankar Sarma, SCSI Mailing List, Linux Kernel

On Wed, 2005-09-14 at 00:29 +1000, Anton Blanchard wrote:
> I just noticed a similar hang on the ibmvscsi driver. The following
> backout patch seems to fix it (part of the scsi merge yesterday), I'll
> look closer after I get some sleep.

If that's the cause, it's probably a double down of the host scan
semaphore somewhere in the code.  alt-sysrq-t should work in this case,
can you get a stack trace of the blocked process?

Thanks,

James
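
To illustrate the kind of hang suspected here, a "double down" can be
modelled in user space. This is only a toy sketch: struct toy_sem and
the toy_* functions are hypothetical and are not the kernel's
semaphore API. A try-lock is used instead of a blocking down so that
the nested acquisition is reported rather than hanging.

```c
#include <assert.h>
#include <stddef.h>

/* Toy counting semaphore, NOT the kernel's struct semaphore. */
struct toy_sem {
	int count;
};

/* Returns 0 if the semaphore was taken, 1 if the caller would sleep. */
static int toy_down_trylock(struct toy_sem *s)
{
	assert(s != NULL);
	if (s->count > 0) {
		s->count--;
		return 0;
	}
	return 1;
}

static void toy_up(struct toy_sem *s)
{
	s->count++;
}

/* Inner helper that takes the scan lock on its own. */
static int toy_inner(struct toy_sem *scan_lock)
{
	if (toy_down_trylock(scan_lock))
		return -1;	/* a blocking down() would sleep forever here */
	toy_up(scan_lock);
	return 0;
}

/* Outer path that already holds the lock when it calls the helper. */
static int toy_outer(struct toy_sem *scan_lock)
{
	int ret;

	if (toy_down_trylock(scan_lock))
		return -1;
	ret = toy_inner(scan_lock);	/* second down on the same lock */
	toy_up(scan_lock);
	return ret;
}
```

Calling toy_inner() on a free lock succeeds, while toy_outer() fails
in the nested call. With a blocking down the nested call would never
return, which is why a stack trace of the blocked process would show
it.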



* Re: [2.6.14-rc1] sym scsi boot hang
  2005-09-13 16:35     ` James Bottomley
@ 2005-09-13 16:47       ` Anton Blanchard
  2005-09-13 17:32         ` James Bottomley
  2005-09-14  8:06       ` Anton Blanchard
  1 sibling, 1 reply; 21+ messages in thread
From: Anton Blanchard @ 2005-09-13 16:47 UTC
  To: James Bottomley; +Cc: Dipankar Sarma, SCSI Mailing List, Linux Kernel


Hi,

> If that's the cause, it's probably a double down of the host scan
> semaphore somewhere in the code.  alt-sysrq-t should work in this case,
> can you get a stack trace of the blocked process?

Good idea, I wonder why the IO isn't completing.

Anton

[c0000000004945fc] schedule+0x63c/0xf70
[c0000000004954c8] wait_for_completion+0xb8/0x140
[c0000000002a3800] blk_execute_rq+0xb0/0x120
[c000000000338a98] scsi_execute+0xf8/0x150
[c000000000338bc0] scsi_execute_req+0xd0/0x140
[c00000000033bce4] scsi_probe_and_add_lun+0x204/0x9f0
[c00000000033cfc4] __scsi_scan_target+0x164/0x4f0
[c00000000033d430] scsi_scan_channel+0xe0/0x120
[c00000000033d598] scsi_scan_host_selected+0x128/0x1d0
[c000000000361c20] ibmvscsi_probe+0x270/0x400
[c0000000000367fc] vio_bus_probe+0x7c/0x90
[c000000000299a78] driver_probe_device+0x98/0x160
[c000000000299ce8] __driver_attach+0xa8/0xd0
[c000000000298938] bus_for_each_dev+0x88/0xe0
[c000000000299738] driver_attach+0x28/0x40
[c000000000299074] bus_add_driver+0xc4/0x200
[c00000000029a1cc] driver_register+0x5c/0x80
[c0000000000365b0] vio_register_driver+0x50/0x70
[c000000000565bac] ibmvscsi_module_init+0x1c/0x40

* Re: [2.6.14-rc1] sym scsi boot hang
  2005-09-13 16:47       ` Anton Blanchard
@ 2005-09-13 17:32         ` James Bottomley
  2005-09-13 17:13           ` Anton Blanchard
  2005-09-13 17:33           ` Dipankar Sarma
  0 siblings, 2 replies; 21+ messages in thread
From: James Bottomley @ 2005-09-13 17:32 UTC
  To: Anton Blanchard; +Cc: Dipankar Sarma, SCSI Mailing List, Linux Kernel

On Wed, 2005-09-14 at 02:47 +1000, Anton Blanchard wrote:
> Good idea, I wonder why the IO isn't completing.
> 
> Anton
> 
> [c0000000004945fc] schedule+0x63c/0xf70
> [c0000000004954c8] wait_for_completion+0xb8/0x140
> [c0000000002a3800] blk_execute_rq+0xb0/0x120
> [c000000000338a98] scsi_execute+0xf8/0x150
> [c000000000338bc0] scsi_execute_req+0xd0/0x140
> [c00000000033bce4] scsi_probe_and_add_lun+0x204/0x9f0
> [c00000000033cfc4] __scsi_scan_target+0x164/0x4f0
> [c00000000033d430] scsi_scan_channel+0xe0/0x120
> [c00000000033d598] scsi_scan_host_selected+0x128/0x1d0
> [c000000000361c20] ibmvscsi_probe+0x270/0x400
> [c0000000000367fc] vio_bus_probe+0x7c/0x90
> [c000000000299a78] driver_probe_device+0x98/0x160
> [c000000000299ce8] __driver_attach+0xa8/0xd0
> [c000000000298938] bus_for_each_dev+0x88/0xe0
> [c000000000299738] driver_attach+0x28/0x40
> [c000000000299074] bus_add_driver+0xc4/0x200
> [c00000000029a1cc] driver_register+0x5c/0x80
> [c0000000000365b0] vio_register_driver+0x50/0x70
> [c000000000565bac] ibmvscsi_module_init+0x1c/0x40

That trace says the ibmvscsi driver (not sym2) has lost an I/O

James



* Re: [2.6.14-rc1] sym scsi boot hang
  2005-09-13 17:32         ` James Bottomley
@ 2005-09-13 17:13           ` Anton Blanchard
  2005-09-13 17:33           ` Dipankar Sarma
  1 sibling, 0 replies; 21+ messages in thread
From: Anton Blanchard @ 2005-09-13 17:13 UTC
  To: James Bottomley; +Cc: Dipankar Sarma, SCSI Mailing List, Linux Kernel


> That trace says the ibmvscsi driver (not sym2) has lost an I/O

Yep, this is another machine. The sym2 box is hanging at boot also.
I'll get a backtrace on it next.

Anton

* Re: [2.6.14-rc1] sym scsi boot hang
  2005-09-13 17:32         ` James Bottomley
  2005-09-13 17:13           ` Anton Blanchard
@ 2005-09-13 17:33           ` Dipankar Sarma
  1 sibling, 0 replies; 21+ messages in thread
From: Dipankar Sarma @ 2005-09-13 17:33 UTC
  To: James Bottomley; +Cc: Anton Blanchard, SCSI Mailing List, Linux Kernel

On Tue, Sep 13, 2005 at 12:32:40PM -0500, James Bottomley wrote:
> On Wed, 2005-09-14 at 02:47 +1000, Anton Blanchard wrote:
> > Good idea, I wonder why the IO isn't completing.
> > 
> > Anton
> > 
> > [c0000000004945fc] schedule+0x63c/0xf70
> > [c0000000004954c8] wait_for_completion+0xb8/0x140
> > [c0000000002a3800] blk_execute_rq+0xb0/0x120
> > [c000000000338a98] scsi_execute+0xf8/0x150
> > [c000000000338bc0] scsi_execute_req+0xd0/0x140
> > [c00000000033bce4] scsi_probe_and_add_lun+0x204/0x9f0
> > [c00000000033cfc4] __scsi_scan_target+0x164/0x4f0
> > [c00000000033d430] scsi_scan_channel+0xe0/0x120
> > [c00000000033d598] scsi_scan_host_selected+0x128/0x1d0
> > [c000000000361c20] ibmvscsi_probe+0x270/0x400
> > [c0000000000367fc] vio_bus_probe+0x7c/0x90
> > [c000000000299a78] driver_probe_device+0x98/0x160
> > [c000000000299ce8] __driver_attach+0xa8/0xd0
> > [c000000000298938] bus_for_each_dev+0x88/0xe0
> > [c000000000299738] driver_attach+0x28/0x40
> > [c000000000299074] bus_add_driver+0xc4/0x200
> > [c00000000029a1cc] driver_register+0x5c/0x80
> > [c0000000000365b0] vio_register_driver+0x50/0x70
> > [c000000000565bac] ibmvscsi_module_init+0x1c/0x40
> 
> That trace says the ibmvscsi driver (not sym2) has lost an I/O

Anton's trace is from a different machine. Mine has a Symbios
(sym2) controller. Backing out the previous day's scsi merge with
Anton's backout patch fixes my problem.

Thanks
Dipankar

* Re: [2.6.14-rc1] sym scsi boot hang
  2005-09-13 16:35     ` James Bottomley
  2005-09-13 16:47       ` Anton Blanchard
@ 2005-09-14  8:06       ` Anton Blanchard
  2005-09-14 15:49         ` Alan Stern
  2005-09-14 16:57         ` James Bottomley
  1 sibling, 2 replies; 21+ messages in thread
From: Anton Blanchard @ 2005-09-14  8:06 UTC (permalink / raw)
  To: James Bottomley; +Cc: Dipankar Sarma, SCSI Mailing List, Linux Kernel, stern

Hi,

> If that's the cause, it's probably a double down of the host scan
> semaphore somewhere in the code.  alt-sysrq-t should work in this case,
> can you get a stack trace of the blocked process?

It appears to be this patch:

  [SCSI] SCSI core: fix leakage of scsi_cmnd's

  From:         Alan Stern <stern@rowland.harvard.edu>

  This patch (as559b) adds a new routine, scsi_unprep_request, which
  gets called every place a request is requeued.  (That includes
  scsi_queue_insert as well as scsi_requeue_command.)  It also changes
  scsi_kill_requests to make it call __scsi_done with result equal to
  DID_NO_CONNECT << 16.  (I'm not sure if it's necessary to call
  scsi_init_cmd_errh here; maybe you can check on that.)  Finally, the
  patch changes the return value from scsi_end_request, to avoid
  returning a stale pointer in the case where the request was requeued.
  Fortunately the return value is used in only one place, and the change
  actually simplified it.

And in particular it looks like the scsi_unprep_request in
scsi_queue_insert is causing it. The following patch fixes the boot
problems on the vscsi machine:

Index: build/drivers/scsi/scsi_lib.c
===================================================================
--- build.orig/drivers/scsi/scsi_lib.c	2005-09-14 18:23:34.000000000 +1000
+++ build/drivers/scsi/scsi_lib.c	2005-09-14 18:27:33.000000000 +1000
@@ -188,7 +188,7 @@
 	 * function.  The SCSI request function detects the blocked condition
 	 * and plugs the queue appropriately.
          */
-	scsi_unprep_request(req);
+	//scsi_unprep_request(req);
 	spin_lock_irqsave(q->queue_lock, flags);
 	blk_requeue_request(q, req);
 	spin_unlock_irqrestore(q->queue_lock, flags);

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [2.6.14-rc1] sym scsi boot hang
  2005-09-14  8:06       ` Anton Blanchard
@ 2005-09-14 15:49         ` Alan Stern
  2005-09-14 16:52           ` Mike Christie
  2005-09-14 20:35           ` James Bottomley
  2005-09-14 16:57         ` James Bottomley
  1 sibling, 2 replies; 21+ messages in thread
From: Alan Stern @ 2005-09-14 15:49 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: James Bottomley, Dipankar Sarma, SCSI Mailing List, Linux Kernel

On Wed, 14 Sep 2005, Anton Blanchard wrote:

> Hi,
> 
> > If that's the cause, it's probably a double down of the host scan
> > semaphore somewhere in the code.  alt-sysrq-t should work in this case,
> > can you get a stack trace of the blocked process?
> 
> It appears to be this patch:
> 
>   [SCSI] SCSI core: fix leakage of scsi_cmnd's
> 
>   From:         Alan Stern <stern@rowland.harvard.edu>

> And in particular it looks like the scsi_unprep_request in
> scsi_queue_insert is causing it. The following patch fixes the boot
> problems on the vscsi machine:

In general the scsi_unprep_request routine is correct and needs to be
there.  The one part that might be questionable is the assignment to
req->special.  It may turn out that the real solution is to have
scsi_execute set req->special to NULL; I assumed it would be NULL already
but perhaps I was wrong.
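
The worry about req->special is that one untyped pointer may refer to either of two kinds of object, told apart only by a magic field. A minimal sketch of that kind of check, as a stand-alone toy: the toy_* names, struct layouts and magic values are hypothetical and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical magic values, chosen only for this illustration. */
#define TOY_REQ_MAGIC  0x52455121u
#define TOY_CMND_MAGIC 0x434d4421u

/* Both toy objects begin with the same header so the tag can be read safely. */
struct toy_header { unsigned int magic; };
struct toy_sreq { struct toy_header hdr; int timeout; };
struct toy_cmnd { struct toy_header hdr; int retries; };

/*
 * Interpret an untyped "special" pointer as a toy_sreq only when its magic
 * field says so; return NULL for a NULL pointer or any other object type.
 */
static struct toy_sreq *toy_special_as_sreq(void *special)
{
	struct toy_header *hdr = special;

	if (hdr == NULL || hdr->magic != TOY_REQ_MAGIC)
		return NULL;
	return (struct toy_sreq *)special;
}

/* Exercise the check against both object types and a NULL pointer. */
static bool toy_magic_check_works(void)
{
	struct toy_sreq sreq = { .hdr = { TOY_REQ_MAGIC }, .timeout = 30 };
	struct toy_cmnd cmnd = { .hdr = { TOY_CMND_MAGIC }, .retries = 0 };

	assert(sreq.hdr.magic != cmnd.hdr.magic);
	return toy_special_as_sreq(&sreq) == &sreq &&
	       toy_special_as_sreq(&cmnd) == NULL &&
	       toy_special_as_sreq(NULL) == NULL;
}
```

The check only helps if the pointer is either NULL or points at a tagged object, which is why a stale, uninitialised value would still be a problem.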

(James, I see a possible problem with scsi_insert_special_req.  It adds to
the queue a request with REQ_DONTPREP set.  How can such a request, with
no associated scsi_cmnd, ever work?  Also, won't scsi_end_request and 
__scsi_release_request end up putting the same scsi_command twice?)

Here is a patch that addresses the first problem and fixes up a few other
loose ends.  Please see if it helps.

Alan Stern



Index: usb-2.6/drivers/scsi/scsi_lib.c
===================================================================
--- usb-2.6.orig/drivers/scsi/scsi_lib.c
+++ usb-2.6/drivers/scsi/scsi_lib.c
@@ -116,7 +116,13 @@ static void scsi_unprep_request(struct r
 	struct scsi_cmnd *cmd = req->special;
 
 	req->flags &= ~REQ_DONTPREP;
-	req->special = (req->flags & REQ_SPECIAL) ? cmd->sc_request : NULL;
+	req->special = NULL;
+	if (req->flags & REQ_SPECIAL) {
+		struct scsi_request *sreq = cmd->sc_request;
+
+		if (sreq->sr_magic == SCSI_REQ_MAGIC)
+			req->special = sreq;
+	}
 
 	scsi_release_buffers(cmd);
 	scsi_put_command(cmd);
@@ -343,6 +349,7 @@ int scsi_execute(struct scsi_device *sde
 	req->sense_len = 0;
 	req->timeout = timeout;
 	req->flags |= flags | REQ_BLOCK_PC | REQ_SPECIAL | REQ_QUIET;
+	req->special = NULL;
 
 	/*
 	 * head injection *required* here otherwise quiesce won't work
@@ -1072,9 +1079,6 @@ static int scsi_init_io(struct scsi_cmnd
 	printk(KERN_ERR "req nr_sec %lu, cur_nr_sec %u\n", req->nr_sectors,
 			req->current_nr_sectors);
 
-	/* release the command and kill it */
-	scsi_release_buffers(cmd);
-	scsi_put_command(cmd);
 	return BLKPREP_KILL;
 }
 
@@ -1176,13 +1180,13 @@ static int scsi_prep_fn(struct request_q
 	if (req->flags & REQ_SPECIAL && req->special) {
 		struct scsi_request *sreq = req->special;
 
-		if (sreq->sr_magic == SCSI_REQ_MAGIC) {
-			cmd = scsi_get_command(sreq->sr_device, GFP_ATOMIC);
-			if (unlikely(!cmd))
-				goto defer;
-			scsi_init_cmd_from_req(cmd, sreq);
-		} else
-			cmd = req->special;
+		if (sreq->sr_magic != SCSI_REQ_MAGIC)
+			printk(KERN_ERR "invalid sr_magic in %s\n",
+					__FUNCTION__);
+		cmd = scsi_get_command(sreq->sr_device, GFP_ATOMIC);
+		if (unlikely(!cmd))
+			goto defer;
+		scsi_init_cmd_from_req(cmd, sreq);
 	} else if (req->flags & (REQ_CMD | REQ_BLOCK_PC)) {
 
 		if(unlikely(specials_only) && !(req->flags & REQ_SPECIAL)) {
@@ -1194,17 +1198,14 @@ static int scsi_prep_fn(struct request_q
 			       sdev->host->host_no, sdev->id, sdev->lun);
 			goto kill;
 		}
-			
 			
 		/*
 		 * Now try and find a command block that we can use.
 		 */
-		if (!req->special) {
-			cmd = scsi_get_command(sdev, GFP_ATOMIC);
-			if (unlikely(!cmd))
-				goto defer;
-		} else
-			cmd = req->special;
+		cmd = scsi_get_command(sdev, GFP_ATOMIC);
+		if (unlikely(!cmd))
+			goto defer;
+		cmd->sc_request = NULL;
 		
 		/* pull a tag out of the request if we have one */
 		cmd->tag = req->tag;
@@ -1250,7 +1251,7 @@ static int scsi_prep_fn(struct request_q
 		ret = scsi_init_io(cmd);
 		switch(ret) {
 		case BLKPREP_KILL:
-			/* BLKPREP_KILL return also releases the command */
+			scsi_unprep_request(req);
 			goto kill;
 		case BLKPREP_DEFER:
 			goto defer;


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [2.6.14-rc1] sym scsi boot hang
  2005-09-14 15:49         ` Alan Stern
@ 2005-09-14 16:52           ` Mike Christie
  2005-09-14 16:53             ` Mike Christie
  2005-09-14 20:35           ` James Bottomley
  1 sibling, 1 reply; 21+ messages in thread
From: Mike Christie @ 2005-09-14 16:52 UTC (permalink / raw)
  To: Alan Stern
  Cc: Anton Blanchard, James Bottomley, Dipankar Sarma,
	SCSI Mailing List, Linux Kernel

Alan Stern wrote:
> On Wed, 14 Sep 2005, Anton Blanchard wrote:
> 
> 
>>Hi,
>>
>>
>>>If that's the cause, it's probably a double down of the host scan
>>>semaphore somewhere in the code.  alt-sysrq-t should work in this case,
>>>can you get a stack trace of the blocked process?
>>
>>It appears to be this patch:
>>
>>  [SCSI] SCSI core: fix leakage of scsi_cmnd's
>>
>>  From:         Alan Stern <stern@rowland.harvard.edu>
> 
> 
>>And in particular it looks like the scsi_unprep_request in
>>scsi_queue_insert is causing it. The following patch fixes the boot
>>problems on the vscsi machine:
> 
> 
> In general the scsi_unprep_request routine is correct and needs to be
> there.  The one part that might be questionable is the assignment to
> req->special.  It may turn out that the real solution is to have
> scsi_execute set req->special to NULL; I assumed it would be NULL already
> but perhaps I was wrong.

I think we have scsi_execute and friends setting REQ_SPECIAL. This
could cause a problem because it does not have a scsi_request.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [2.6.14-rc1] sym scsi boot hang
  2005-09-14 16:52           ` Mike Christie
@ 2005-09-14 16:53             ` Mike Christie
  0 siblings, 0 replies; 21+ messages in thread
From: Mike Christie @ 2005-09-14 16:53 UTC (permalink / raw)
  To: Mike Christie
  Cc: Alan Stern, Anton Blanchard, James Bottomley, Dipankar Sarma,
	SCSI Mailing List, Linux Kernel

Mike Christie wrote:
> Alan Stern wrote:
> 
>> On Wed, 14 Sep 2005, Anton Blanchard wrote:
>>
>>
>>> Hi,
>>>
>>>
>>>> If that's the cause, it's probably a double down of the host scan
>>>> semaphore somewhere in the code.  alt-sysrq-t should work in this case,
>>>> can you get a stack trace of the blocked process?
>>>
>>>
>>> It appears to be this patch:
>>>
>>>  [SCSI] SCSI core: fix leakage of scsi_cmnd's
>>>
>>>  From:         Alan Stern <stern@rowland.harvard.edu>
>>
>>
>>
>>> And in particular it looks like the scsi_unprep_request in
>>> scsi_queue_insert is causing it. The following patch fixes the boot
>>> problems on the vscsi machine:
>>
>>
>>
>> In general the scsi_unprep_request routine is correct and needs to be
>> there.  The one part that might be questionable is the assignment to
>> req->special.  It may turn out that the real solution is to have
>> scsi_execute set req->special to NULL; I assumed it would be NULL already
>> but perhaps I was wrong.
> 
> 
> I think we have scsi_execute and friends setting REQ_SPECIAL. This
> could cause a problem because it does not have a scsi_request.
> 

Well, actually it won't, because sc_request should be NULL for those
scsi_execute block pc commands, I think.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [2.6.14-rc1] sym scsi boot hang
  2005-09-14 15:49         ` Alan Stern
  2005-09-14 16:52           ` Mike Christie
@ 2005-09-14 20:35           ` James Bottomley
  1 sibling, 0 replies; 21+ messages in thread
From: James Bottomley @ 2005-09-14 20:35 UTC (permalink / raw)
  To: Alan Stern
  Cc: Anton Blanchard, Dipankar Sarma, SCSI Mailing List, Linux Kernel

On Wed, 2005-09-14 at 11:49 -0400, Alan Stern wrote:
> (James, I see a possible problem with scsi_insert_special_req.  It adds to
> the queue a request with REQ_DONTPREP set.  How can such a request, with
> no associated scsi_cmnd, ever work?  Also, won't scsi_end_request and 
> __scsi_release_request end up putting the same scsi_command twice?)

It's a historical anomaly which will hopefully die when we finally
manage to get sg and st converted to the generic request infrastructure.
Then scsi_request can be killed and this along with it.

What used to happen (as the comment implies) is that drivers would
allocate a single request and then reuse it for multiple independent
commands.  Since they weren't too picky about cleaning it up after each
use, we had to reset the DONTPREP flag to ensure each new invocation was
actually correctly prepared.

James

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [2.6.14-rc1] sym scsi boot hang
  2005-09-14  8:06       ` Anton Blanchard
  2005-09-14 15:49         ` Alan Stern
@ 2005-09-14 16:57         ` James Bottomley
  2005-09-14 20:19           ` Alan Stern
  2005-09-16 10:28           ` Anton Blanchard
  1 sibling, 2 replies; 21+ messages in thread
From: James Bottomley @ 2005-09-14 16:57 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: Dipankar Sarma, SCSI Mailing List, Linux Kernel, stern

On Wed, 2005-09-14 at 18:06 +1000, Anton Blanchard wrote:
> And in particular it looks like the scsi_unprep_request in
> scsi_queue_insert is causing it. The following patch fixes the boot
> problems on the vscsi machine:

OK, my fault.  Your fix is almost correct .. I was going to do this
eventually, honest, because there's no need to unprep and reprep a
command that comes in through scsi_queue_insert().

However, I decided to leave it in to exercise the scsi_unprep_request()
path just to make sure it was working.  What's happening, I think, is
that we also use this path for retries.  Since we kill and reget the
command each time, the retries decrement is never seen, so we're
retrying forever.
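
A toy model of that failure mode, assuming the retry count lives in the per-command state and that getting a fresh command resets it. The toy_* names are hypothetical stand-ins, not the kernel's structures or functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-command state; not struct scsi_cmnd. */
struct toy_cmd {
	int retries;	/* attempts made so far */
	int allowed;	/* attempts permitted before giving up */
};

/* Hypothetical stand-in for getting a fresh command: the count starts at 0. */
static void toy_get_command(struct toy_cmd *cmd, int allowed)
{
	cmd->retries = 0;
	cmd->allowed = allowed;
}

/*
 * Simulate a device that always fails.  Returns the number of attempts made
 * before giving up, or -1 if still retrying after max_iterations.
 */
static int toy_run(bool unprep_on_requeue, int allowed, int max_iterations)
{
	struct toy_cmd cmd;
	int attempts = 0;

	assert(allowed > 0 && max_iterations > 0);
	toy_get_command(&cmd, allowed);

	while (attempts < max_iterations) {
		attempts++;
		if (++cmd.retries >= cmd.allowed)
			return attempts;	/* retries exhausted: give up */
		if (unprep_on_requeue)
			toy_get_command(&cmd, allowed);	/* count is lost */
	}
	return -1;	/* the retry limit was never reached */
}
```

With allowed set to 3, toy_run(false, 3, 1000) gives up after three attempts, while toy_run(true, 3, 1000) never reaches the limit and returns -1.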

This should be the correct reversal.

James
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -140,14 +140,12 @@ static void scsi_unprep_request(struct r
  *              commands.
  * Notes:       This could be called either from an interrupt context or a
  *              normal process context.
- * Notes:	Upon return, cmd is a stale pointer.
  */
 int scsi_queue_insert(struct scsi_cmnd *cmd, int reason)
 {
 	struct Scsi_Host *host = cmd->device->host;
 	struct scsi_device *device = cmd->device;
 	struct request_queue *q = device->request_queue;
-	struct request *req = cmd->request;
 	unsigned long flags;
 
 	SCSI_LOG_MLQUEUE(1,
@@ -188,9 +186,8 @@ int scsi_queue_insert(struct scsi_cmnd *
 	 * function.  The SCSI request function detects the blocked condition
 	 * and plugs the queue appropriately.
          */
-	scsi_unprep_request(req);
 	spin_lock_irqsave(q->queue_lock, flags);
-	blk_requeue_request(q, req);
+	blk_requeue_request(q, cmd->request);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
 	scsi_run_queue(q);



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [2.6.14-rc1] sym scsi boot hang
  2005-09-14 16:57         ` James Bottomley
@ 2005-09-14 20:19           ` Alan Stern
  2005-09-14 20:44             ` James Bottomley
  2005-09-16 10:28           ` Anton Blanchard
  1 sibling, 1 reply; 21+ messages in thread
From: Alan Stern @ 2005-09-14 20:19 UTC (permalink / raw)
  To: James Bottomley
  Cc: Anton Blanchard, Dipankar Sarma, SCSI Mailing List, Linux Kernel

On Wed, 14 Sep 2005, James Bottomley wrote:

> OK, my fault.  Your fix is almost correct .. I was going to do this
> eventually, honest, because there's no need to unprep and reprep a
> command that comes in through scsi_queue_insert().
> 
> However, I decided to leave it in to exercise the scsi_unprep_request()
> path just to make sure it was working.  What's happening, I think, is
> that we also use this path for retries.  Since we kill and reget the
> command each time, the retries decrement is never seen, so we're
> retrying forever.
> 
> This should be the correct reversal.

Then shouldn't you also avoid unprepping and reprepping a command that is
deferred because the host isn't ready?

And isn't it necessary to make sure that req->special is NULL when
submitting a special request with no scsi_request, and that
cmd->sc_request is NULL when associating a command block to a special
request with no scsi_request?

In short, is this patch needed?

Alan Stern



Index: usb-2.6/drivers/scsi/scsi_lib.c
===================================================================
--- usb-2.6.orig/drivers/scsi/scsi_lib.c
+++ usb-2.6/drivers/scsi/scsi_lib.c
@@ -343,6 +343,7 @@ int scsi_execute(struct scsi_device *sde
 	req->sense_len = 0;
 	req->timeout = timeout;
 	req->flags |= flags | REQ_BLOCK_PC | REQ_SPECIAL | REQ_QUIET;
+	req->special = NULL;
 
 	/*
 	 * head injection *required* here otherwise quiesce won't work
@@ -1072,9 +1073,6 @@ static int scsi_init_io(struct scsi_cmnd
 	printk(KERN_ERR "req nr_sec %lu, cur_nr_sec %u\n", req->nr_sectors,
 			req->current_nr_sectors);
 
-	/* release the command and kill it */
-	scsi_release_buffers(cmd);
-	scsi_put_command(cmd);
 	return BLKPREP_KILL;
 }
 
@@ -1205,6 +1203,7 @@ static int scsi_prep_fn(struct request_q
 				goto defer;
 		} else
 			cmd = req->special;
+		cmd->sc_request = NULL;
 		
 		/* pull a tag out of the request if we have one */
 		cmd->tag = req->tag;
@@ -1250,7 +1249,7 @@ static int scsi_prep_fn(struct request_q
 		ret = scsi_init_io(cmd);
 		switch(ret) {
 		case BLKPREP_KILL:
-			/* BLKPREP_KILL return also releases the command */
+			scsi_unprep_request(req);
 			goto kill;
 		case BLKPREP_DEFER:
 			goto defer;
@@ -1514,7 +1513,6 @@ static void scsi_request_fn(struct reque
 	 * cases (host limits or settings) should run the queue at some
 	 * later time.
 	 */
-	scsi_unprep_request(req);
 	spin_lock_irq(q->queue_lock);
 	blk_requeue_request(q, req);
 	sdev->device_busy--;


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [2.6.14-rc1] sym scsi boot hang
  2005-09-14 20:19           ` Alan Stern
@ 2005-09-14 20:44             ` James Bottomley
  2005-09-14 21:33               ` Alan Stern
  0 siblings, 1 reply; 21+ messages in thread
From: James Bottomley @ 2005-09-14 20:44 UTC (permalink / raw)
  To: Alan Stern
  Cc: Anton Blanchard, Dipankar Sarma, SCSI Mailing List, Linux Kernel

On Wed, 2005-09-14 at 16:19 -0400, Alan Stern wrote:
> On Wed, 14 Sep 2005, James Bottomley wrote:
> Then shouldn't you also avoid unprepping and reprepping a command that is
> deferred because the host isn't ready?

Yes ... really the only case for unprep is when we've partially released
the command (like in scsi_io_completion) where we need to tear the rest
of it down.

The rule should be that if it needs preparing, then it's in the same
state as the block layer would send it to us in (with no appendages).
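
As a rough sketch of that rule, with a toy request type: the toy_* names and the flag value are hypothetical, and the real unprep routine also has to deal with REQ_SPECIAL requests, which this simplification ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for "already prepared". */
#define TOY_DONTPREP 0x1UL

struct toy_request {
	unsigned long flags;
	void *special;	/* attached command while prepared, else NULL */
};

/* Preparing attaches a command and marks the request as prepared. */
static void toy_prep(struct toy_request *rq, void *cmd)
{
	rq->special = cmd;
	rq->flags |= TOY_DONTPREP;
}

/* Unpreparing must undo everything that toy_prep() did. */
static void toy_unprep(struct toy_request *rq)
{
	rq->flags &= ~TOY_DONTPREP;
	rq->special = NULL;
}

/* "Pristine" here means: as first handed over, with no appendages. */
static bool toy_is_pristine(const struct toy_request *rq)
{
	return !(rq->flags & TOY_DONTPREP) && rq->special == NULL;
}

/* Prepare and unprepare a request, then check it is pristine again. */
static bool toy_roundtrip_is_pristine(void)
{
	struct toy_request rq = { .flags = 0, .special = NULL };
	int cmd_storage = 0;	/* placeholder object to point at */

	assert(toy_is_pristine(&rq));
	toy_prep(&rq, &cmd_storage);
	if (toy_is_pristine(&rq))
		return false;
	toy_unprep(&rq);
	return toy_is_pristine(&rq);
}
```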

For most requeues, we have all the prepared resources attached, so they
don't need tearing down and repreparing.

> And isn't it necessary to make sure that req->special is NULL when
> submitting a special request with no scsi_request, and that

Yes, but only if the command will be prepared again.

> cmd->sc_request is NULL when associating a command block to a special
> request with no scsi_request?

No, that's zeroed out when the command is allocated.  It's only set in
the path that sends down a scsi_request.

James



> In short, is this patch needed?
> 
> Alan Stern
> 
> 
> 
> Index: usb-2.6/drivers/scsi/scsi_lib.c
> ===================================================================
> --- usb-2.6.orig/drivers/scsi/scsi_lib.c
> +++ usb-2.6/drivers/scsi/scsi_lib.c
> @@ -343,6 +343,7 @@ int scsi_execute(struct scsi_device *sde
>  	req->sense_len = 0;
>  	req->timeout = timeout;
>  	req->flags |= flags | REQ_BLOCK_PC | REQ_SPECIAL | REQ_QUIET;
> +	req->special = NULL;
>  
>  	/*
>  	 * head injection *required* here otherwise quiesce won't work
> @@ -1072,9 +1073,6 @@ static int scsi_init_io(struct scsi_cmnd
>  	printk(KERN_ERR "req nr_sec %lu, cur_nr_sec %u\n", req->nr_sectors,
>  			req->current_nr_sectors);
>  
> -	/* release the command and kill it */
> -	scsi_release_buffers(cmd);
> -	scsi_put_command(cmd);
>  	return BLKPREP_KILL;
>  }
>  
> @@ -1205,6 +1203,7 @@ static int scsi_prep_fn(struct request_q
>  				goto defer;
>  		} else
>  			cmd = req->special;
> +		cmd->sc_request = NULL;
>  		
>  		/* pull a tag out of the request if we have one */
>  		cmd->tag = req->tag;
> @@ -1250,7 +1249,7 @@ static int scsi_prep_fn(struct request_q
>  		ret = scsi_init_io(cmd);
>  		switch(ret) {
>  		case BLKPREP_KILL:
> -			/* BLKPREP_KILL return also releases the command */
> +			scsi_unprep_request(req);
>  			goto kill;
>  		case BLKPREP_DEFER:
>  			goto defer;
> @@ -1514,7 +1513,6 @@ static void scsi_request_fn(struct reque
>  	 * cases (host limits or settings) should run the queue at some
>  	 * later time.
>  	 */
> -	scsi_unprep_request(req);
>  	spin_lock_irq(q->queue_lock);
>  	blk_requeue_request(q, req);
>  	sdev->device_busy--;
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [2.6.14-rc1] sym scsi boot hang
  2005-09-14 20:44             ` James Bottomley
@ 2005-09-14 21:33               ` Alan Stern
  2005-09-15 13:56                 ` James Bottomley
  0 siblings, 1 reply; 21+ messages in thread
From: Alan Stern @ 2005-09-14 21:33 UTC (permalink / raw)
  To: James Bottomley
  Cc: Anton Blanchard, Dipankar Sarma, SCSI Mailing List, Linux Kernel

On Wed, 14 Sep 2005, James Bottomley wrote:

> On Wed, 2005-09-14 at 16:19 -0400, Alan Stern wrote:
> > On Wed, 14 Sep 2005, James Bottomley wrote:
> > Then shouldn't you also avoid unprepping and reprepping a command that is
> > deferred because the host isn't ready?
> 
> Yes ... really the only case for unprep is when we've partially released
> the command (like in scsi_io_completion) where we need to tear the rest
> of it down.

In other words, in scsi_requeue_command and nowhere else.

> The rule should be that if it needs preparing, then it's in the same
> state as the block layer would send it to us in (with no appendages).

That's what the unprep routine was supposed to accomplish.

> For most requeues, we have all the prepared resources attached, so they
> don't need tearing down and repreparing.
> 
> > And isn't it necessary to make sure that req->special is NULL when
> > submitting a special request with no scsi_request, and that
> 
> Yes, but only if the command will be prepared again.

Or will be prepared for the first time, as in scsi_execute.  As far as I 
can tell, a new struct request is not set to all 0's.  So if you queue a 
request with REQ_SPECIAL set and you fail to clear req->special, you're in 
trouble.  Do you have any idea why this hasn't been causing errors all 
along?

> > cmd->sc_request is NULL when associating a command block to a special
> > request with no scsi_request?
> 
> No, that's zeroed out when the command is allocated.  It's only set in
> the path that sends down a scsi_request.

Oops, yes.  I must have been reading __scsi_get_command instead of 
scsi_get_command.

Okay, then how does this patch look (moved the routine over to where it 
gets used, plus two real changes)?

Alan Stern



Index: usb-2.6/drivers/scsi/scsi_lib.c
===================================================================
--- usb-2.6.orig/drivers/scsi/scsi_lib.c
+++ usb-2.6/drivers/scsi/scsi_lib.c
@@ -100,29 +100,6 @@ static void scsi_run_queue(struct reques
 static void scsi_release_buffers(struct scsi_cmnd *cmd);
 
 /*
- * Function:	scsi_unprep_request()
- *
- * Purpose:	Remove all preparation done for a request, including its
- *		associated scsi_cmnd, so that it can be requeued.
- *
- * Arguments:	req	- request to unprepare
- *
- * Lock status:	Assumed that no locks are held upon entry.
- *
- * Returns:	Nothing.
- */
-static void scsi_unprep_request(struct request *req)
-{
-	struct scsi_cmnd *cmd = req->special;
-
-	req->flags &= ~REQ_DONTPREP;
-	req->special = (req->flags & REQ_SPECIAL) ? cmd->sc_request : NULL;
-
-	scsi_release_buffers(cmd);
-	scsi_put_command(cmd);
-}
-
-/*
  * Function:    scsi_queue_insert()
  *
  * Purpose:     Insert a command in the midlevel queue.
@@ -343,6 +320,7 @@ int scsi_execute(struct scsi_device *sde
 	req->sense_len = 0;
 	req->timeout = timeout;
 	req->flags |= flags | REQ_BLOCK_PC | REQ_SPECIAL | REQ_QUIET;
+	req->special = NULL;
 
 	/*
 	 * head injection *required* here otherwise quiesce won't work
@@ -564,6 +542,29 @@ static void scsi_run_queue(struct reques
 }
 
 /*
+ * Function:	scsi_unprep_request()
+ *
+ * Purpose:	Remove all preparation done for a request, including its
+ *		associated scsi_cmnd, so that it can be requeued.
+ *
+ * Arguments:	req	- request to unprepare
+ *
+ * Lock status:	Assumed that no locks are held upon entry.
+ *
+ * Returns:	Nothing.
+ */
+static void scsi_unprep_request(struct request *req)
+{
+	struct scsi_cmnd *cmd = req->special;
+
+	req->flags &= ~REQ_DONTPREP;
+	req->special = (req->flags & REQ_SPECIAL) ? cmd->sc_request : NULL;
+
+	scsi_release_buffers(cmd);
+	scsi_put_command(cmd);
+}
+
+/*
  * Function:	scsi_requeue_command()
  *
  * Purpose:	Handle post-processing of completed commands.
@@ -1514,7 +1515,6 @@ static void scsi_request_fn(struct reque
 	 * cases (host limits or settings) should run the queue at some
 	 * later time.
 	 */
-	scsi_unprep_request(req);
 	spin_lock_irq(q->queue_lock);
 	blk_requeue_request(q, req);
 	sdev->device_busy--;



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [2.6.14-rc1] sym scsi boot hang
  2005-09-14 21:33               ` Alan Stern
@ 2005-09-15 13:56                 ` James Bottomley
  2005-09-15 14:13                   ` Alan Stern
  2005-09-15 17:52                   ` Alan Stern
  0 siblings, 2 replies; 21+ messages in thread
From: James Bottomley @ 2005-09-15 13:56 UTC (permalink / raw)
  To: Alan Stern
  Cc: Anton Blanchard, Dipankar Sarma, SCSI Mailing List, Linux Kernel

On Wed, 2005-09-14 at 17:33 -0400, Alan Stern wrote:
> On Wed, 14 Sep 2005, James Bottomley wrote:
> > Yes ... really the only case for unprep is when we've partially released
> > the command (like in scsi_io_completion) where we need to tear the rest
> > of it down.
> 
> In other words, in scsi_requeue_command and nowhere else.

Pretty much, yes.

> Or will be prepared for the first time, as in scsi_execute.  As far as I 
> can tell, a new struct request is not set to all 0's.  So if you queue a 
> request with REQ_SPECIAL set and you fail to clear req->special, you're in 
> trouble.  Do you have any idea why this hasn't been causing errors all 
> along?

That's true, it's not.  However ll_rw_blk.c:rq_init() clears
req->special (and initialises all other important fields).
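
To illustrate why an init routine makes non-zeroed request memory safe, here is a toy sketch. The one-slot pool and the toy_* names are hypothetical and only mimic the idea of an rq_init()-style reset; they are not the block layer's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_request {
	unsigned long flags;
	void *special;
};

/* One-slot pool that is deliberately never zeroed between uses. */
static struct toy_request toy_slot;

/* Reset the fields that later code relies on being in a known state. */
static void toy_rq_init(struct toy_request *rq)
{
	rq->flags = 0;
	rq->special = NULL;
}

/* Hand out the slot, always running the init routine first. */
static struct toy_request *toy_get_request(void)
{
	struct toy_request *rq = &toy_slot;

	toy_rq_init(rq);
	return rq;
}

/* Returning a request leaves its stale contents in place. */
static void toy_put_request(struct toy_request *rq)
{
	(void)rq;
}

/* Dirty a request, recycle it, and check the next user sees clean fields. */
static bool toy_recycled_request_is_clean(void)
{
	struct toy_request *rq = toy_get_request();

	rq->flags = 0x2UL;		/* previous user set a flag */
	rq->special = &toy_slot;	/* and left a pointer behind */
	toy_put_request(rq);

	rq = toy_get_request();
	assert(rq == &toy_slot);
	return rq->flags == 0 && rq->special == NULL;
}
```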

> Okay, then how does this patch look (moved the routine over to where it 
> gets used, plus two real changes)?

Well ... under pressure to fix this in -mm, I already committed a version
to rc-fixes.  What I did was fully reverse the changes to
scsi_queue_insert() [the patch I sent Anton].  We can move the unprep
function if you feel strongly about it, but I'm also happy to keep it
where it is.

James



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [2.6.14-rc1] sym scsi boot hang
  2005-09-15 13:56                 ` James Bottomley
@ 2005-09-15 14:13                   ` Alan Stern
  2005-09-15 17:52                   ` Alan Stern
  1 sibling, 0 replies; 21+ messages in thread
From: Alan Stern @ 2005-09-15 14:13 UTC (permalink / raw)
  To: James Bottomley
  Cc: Anton Blanchard, Dipankar Sarma, SCSI Mailing List, Linux Kernel

On Thu, 15 Sep 2005, James Bottomley wrote:

> On Wed, 2005-09-14 at 17:33 -0400, Alan Stern wrote:
> > On Wed, 14 Sep 2005, James Bottomley wrote:
> > > Yes ... really the only case for unprep is when we've partially released
> > > the command (like in scsi_io_completion) where we need to tear the rest
> > > of it down.
> > 
> > In other words, in scsi_requeue_command and nowhere else.
> 
> Pretty much, yes.
> 
> > Or will be prepared for the first time, as in scsi_execute.  As far as I 
> > can tell, a new struct request is not set to all 0's.  So if you queue a 
> > request with REQ_SPECIAL set and you fail to clear req->special, you're in 
> > trouble.  Do you have any idea why this hasn't been causing errors all 
> > along?
> 
> That's true, it's not.  However ll_rw_blk.c:rq_init() clears
> req->special (and initialises all other important fields).

(*Sigh*...  I'm trying to do this too fast, not following up properly on 
all the code paths.)  Okay, good, glad to hear it.

> > Okay, then how does this patch look (moved the routine over to where it 
> > gets used, plus two real changes)?
> 
> Well ... under pressure to fix this in -mm, I already committed a version
> to rc-fixes.  What I did was fully reverse the changes to
> scsi_queue_insert() [the patch I sent Anton].  We can move the unprep
> function if you feel strongly about it, but I'm also happy to keep it
> where it is.

I don't care where the function goes, so just leave it.

That leaves only the question of the call to scsi_unprep_request near the 
end of scsi_request_fn, in the not_ready: section.  Looks like that call 
isn't needed and can be taken out also, do you agree?

Alan Stern


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [2.6.14-rc1] sym scsi boot hang
  2005-09-15 13:56                 ` James Bottomley
  2005-09-15 14:13                   ` Alan Stern
@ 2005-09-15 17:52                   ` Alan Stern
  1 sibling, 0 replies; 21+ messages in thread
From: Alan Stern @ 2005-09-15 17:52 UTC (permalink / raw)
  To: James Bottomley
  Cc: Anton Blanchard, Dipankar Sarma, SCSI Mailing List, Linux Kernel

On Thu, 15 Sep 2005, James Bottomley wrote:

> On Wed, 2005-09-14 at 17:33 -0400, Alan Stern wrote:
> > On Wed, 14 Sep 2005, James Bottomley wrote:
> > > Yes ... really the only case for unprep is when we've partially released
> > > the command (like in scsi_io_completion) where we need to tear the rest
> > > of it down.
> > 
> > In other words, in scsi_requeue_command and nowhere else.
> 
> Pretty much, yes.

I found one other thing that needs to be fixed.  The call to 
scsi_release_buffers in scsi_unprep_request causes an oops, because the 
sgtable has already been freed in scsi_io_completion.  The following patch 
is needed.
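
A toy sketch of the double release, using a counter instead of a real free so the problem can be observed safely. The toy_* names are hypothetical; the sketch only assumes that the completion path has already released the buffers before the unprep step runs.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_cmd {
	bool has_buffers;	/* true while the buffers are still owned */
	int release_calls;	/* how many times release was invoked */
};

/* Stand-in for releasing the scatter-gather buffers. */
static void toy_release_buffers(struct toy_cmd *cmd)
{
	cmd->release_calls++;
	cmd->has_buffers = false;
}

/* The completion path releases the buffers first. */
static void toy_io_completion(struct toy_cmd *cmd)
{
	toy_release_buffers(cmd);
}

/* Unprep either repeats the release (the bug) or leaves the buffers alone. */
static void toy_unprep(struct toy_cmd *cmd, bool release_again)
{
	if (release_again)
		toy_release_buffers(cmd);
}

/* Run completion followed by unprep and report how often release ran. */
static int toy_requeue_release_count(bool release_again)
{
	struct toy_cmd cmd = { .has_buffers = true, .release_calls = 0 };

	toy_io_completion(&cmd);
	assert(!cmd.has_buffers);	/* already gone before unprep runs */
	toy_unprep(&cmd, release_again);
	return cmd.release_calls;
}
```

A count of 2 corresponds to releasing the same buffers twice; dropping the call from the unprep step brings it back to 1.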

Alan Stern



Signed-off-by: Alan Stern <stern@rowland.harvard.edu>

Index: usb-2.6/drivers/scsi/scsi_lib.c
===================================================================
--- usb-2.6.orig/drivers/scsi/scsi_lib.c
+++ usb-2.6/drivers/scsi/scsi_lib.c
@@ -118,7 +118,6 @@ static void scsi_unprep_request(struct r
 	req->flags &= ~REQ_DONTPREP;
 	req->special = (req->flags & REQ_SPECIAL) ? cmd->sc_request : NULL;
 
-	scsi_release_buffers(cmd);
 	scsi_put_command(cmd);
 }
 
@@ -1514,7 +1513,6 @@ static void scsi_request_fn(struct reque
 	 * cases (host limits or settings) should run the queue at some
 	 * later time.
 	 */
-	scsi_unprep_request(req);
 	spin_lock_irq(q->queue_lock);
 	blk_requeue_request(q, req);
 	sdev->device_busy--;


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [2.6.14-rc1] sym scsi boot hang
  2005-09-14 16:57         ` James Bottomley
  2005-09-14 20:19           ` Alan Stern
@ 2005-09-16 10:28           ` Anton Blanchard
  1 sibling, 0 replies; 21+ messages in thread
From: Anton Blanchard @ 2005-09-16 10:28 UTC (permalink / raw)
  To: James Bottomley; +Cc: Dipankar Sarma, SCSI Mailing List, Linux Kernel, stern


Hi,

> OK, my fault.  Your fix is almost correct .. I was going to do this
> eventually, honest, because there's no need to unprep and reprep a
> command that comes in through scsi_queue_insert().
> 
> However, I decided to leave it in to exercise the scsi_unprep_request()
> path just to make sure it was working.  What's happening, I think, is
> that we also use this path for retries.  Since we kill and reget the
> command each time, the retries decrement is never seen, so we're
> retrying forever.
> 
> This should be the correct reversal.

Thanks James, that did the trick.

Anton

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2005-09-16 10:31 UTC | newest]

Thread overview: 21+ messages
2005-09-13 12:48 [2.6.14-rc1] sym scsi boot hang Dipankar Sarma
2005-09-13 13:17 ` Anton Blanchard
2005-09-13 14:29   ` Anton Blanchard
2005-09-13 16:35     ` James Bottomley
2005-09-13 16:47       ` Anton Blanchard
2005-09-13 17:32         ` James Bottomley
2005-09-13 17:13           ` Anton Blanchard
2005-09-13 17:33           ` Dipankar Sarma
2005-09-14  8:06       ` Anton Blanchard
2005-09-14 15:49         ` Alan Stern
2005-09-14 16:52           ` Mike Christie
2005-09-14 16:53             ` Mike Christie
2005-09-14 20:35           ` James Bottomley
2005-09-14 16:57         ` James Bottomley
2005-09-14 20:19           ` Alan Stern
2005-09-14 20:44             ` James Bottomley
2005-09-14 21:33               ` Alan Stern
2005-09-15 13:56                 ` James Bottomley
2005-09-15 14:13                   ` Alan Stern
2005-09-15 17:52                   ` Alan Stern
2005-09-16 10:28           ` Anton Blanchard
