* [2.6.14-rc1] sym scsi boot hang @ 2005-09-13 12:48 Dipankar Sarma 2005-09-13 13:17 ` Anton Blanchard 0 siblings, 1 reply; 21+ messages in thread From: Dipankar Sarma @ 2005-09-13 12:48 UTC (permalink / raw) To: linux-scsi; +Cc: linux-kernel My ppc64 box refuses to boot with 2.6.14-rc1. The console log is included below. I see a lot of repeat of the last message in the log and then the box hangs. Any idea what might have caused this ? Thanks Dipankar sym0: <1010-66> rev 0x1 at pci 0001:01:01.0 irq 115 sym0: No NVRAM, ID 7, Fast-80, LVD, parity checking sym0: SCSI BUS has been reset. scsi0 : sym-2.2.1 target0:0:8: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 31) Vendor: IBM Model: IC35L036UCDY10-0 Rev: S25M Type: Direct-Access ANSI SCSI revision: 03 target0:0:8: tagged command queuing enabled, command queue depth 16. target0:0:8: Beginning Domain Validation target0:0:8: asynchronous. target0:0:8: wide asynchronous. target0:0:8: FAST-80 WIDE SCSI 160.0 MB/s DT IU QAS (12.5 ns, offset 31) sym0: unexpected disconnect target0:0:8: Write Buffer failure 700ff target0:0:8: Domain Validation Disabing Information Units target0:0:8: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31) sym0: unexpected disconnect target0:0:8: Write Buffer failure 700ff target0:0:8: Domain Validation detected failure, dropping back target0:0:8: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31) target0:0:8: Ending Domain Validation target0:0:9: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 31) Vendor: IBM Model: IC35L036UCDY10-0 Rev: S25M Type: Direct-Access ANSI SCSI revision: 03 target0:0:9: tagged command queuing enabled, command queue depth 16. target0:0:9: Beginning Domain Validation target0:0:9: asynchronous. target0:0:9: wide asynchronous. target0:0:9: FAST-80 WIDE SCSI 160.0 MB/s DT IU QAS (12.5 ns, offset 31) sym0: unexpected disconnect target0:0:9: Write Buffer failure 700ff target0:0:9: Domain Validation Disabing Information Units target0:0:9: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31) sym0: unexpected disconnect target0:0:9: Write Buffer failure 700ff target0:0:9: Domain Validation detected failure, dropping back target0:0:9: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31) target0:0:9: Ending Domain Validation target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 31) Vendor: IBM Model: IC35L036UCDY10-0 Rev: S25M Type: Direct-Access ANSI SCSI revision: 03 target0:0:10: tagged command queuing enabled, command queue depth 16. target0:0:10: Beginning Domain Validation target0:0:10: asynchronous. target0:0:10: wide asynchronous. target0:0:10: Domain Validation skipping write tests target0:0:10: FAST-80 WIDE SCSI 160.0 MB/s DT IU QAS (12.5 ns, offset 31) sym0: unexpected disconnect target0:0:10: Domain Validation Disabing Information Units target0:0:10: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31) sym0: unexpected disconnect target0:0:10: Domain Validation detected failure, dropping back target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31) target0:0:10: Ending Domain Validation Vendor: IBM Model: HSBPM2 PU2SCSI Rev: 0016 Type: Enclosure ANSI SCSI revision: 02 target0:0:14: Beginning Domain Validation 0:0:14:0: phase change 6-7 9@100503a8 resid=7. 0:0:14:0: phase change 6-7 9@100503a8 resid=7. 0:0:14:0: phase change 6-7 9@100503a8 resid=7. 0:0:14:0: phase change 6-7 9@100503a8 resid=7. target0:0:14: Ending Domain Validation Vendor: IBM Model: HSBPD4M PU3SCSI Rev: 0016 Type: Enclosure ANSI SCSI revision: 02 target0:0:15: Beginning Domain Validation 0:0:15:0: phase change 6-7 9@100503a8 resid=7. 0:0:15:0: phase change 6-7 9@100503a8 resid=7. 0:0:15:0: phase change 6-7 9@100503a8 resid=7. 0:0:15:0: phase change 6-7 9@100503a8 resid=7. target0:0:15: Ending Domain Validation sym1: <1010-66> rev 0x1 at pci 0001:01:01.1 irq 116 sym1: No NVRAM, ID 7, Fast-80, LVD, parity checking sym1: SCSI BUS has been reset. scsi1 : sym-2.2.1 sym2: <1010-66> rev 0x1 at pci 0001:41:01.0 irq 119 sym2: No NVRAM, ID 7, Fast-80, LVD, parity checking sym2: SCSI BUS has been reset. scsi2 : sym-2.2.1 sym3: <1010-66> rev 0x1 at pci 0001:41:01.1 irq 120 sym3: No NVRAM, ID 7, Fast-80, LVD, parity checking sym3: SCSI BUS has been reset. scsi3 : sym-2.2.1 st: Version 20050830, fixed bufsize 32768, s/g segs 256 SCSI device sda: 71096640 512-byte hdwr sectors (36401 MB) SCSI device sda: drive cache: write through SCSI device sda: 71096640 512-byte hdwr sectors (36401 MB) SCSI device sda: drive cache: write through sda: sda1 sda2 sda3 sda4 < sda5 sda6 > Attached scsi disk sda at scsi0, channel 0, id 8, lun 0 SCSI device sdb: 71096640 512-byte hdwr sectors (36401 MB) SCSI device sdb: drive cache: write through SCSI device sdb: 71096640 512-byte hdwr sectors (36401 MB) SCSI device sdb: drive cache: write through sdb: sdb1 sdb2 Attached scsi disk sdb at scsi0, channel 0, id 9, lun 0 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31) target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31) target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31) sdc: Spinning up disk....<6> target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31) target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31) target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31) target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31) target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31) target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31) ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [2.6.14-rc1] sym scsi boot hang 2005-09-13 12:48 [2.6.14-rc1] sym scsi boot hang Dipankar Sarma @ 2005-09-13 13:17 ` Anton Blanchard 2005-09-13 14:29 ` Anton Blanchard 0 siblings, 1 reply; 21+ messages in thread From: Anton Blanchard @ 2005-09-13 13:17 UTC (permalink / raw) To: Dipankar Sarma; +Cc: linux-scsi, linux-kernel Hi, > My ppc64 box refuses to boot with 2.6.14-rc1. The console log > is included below. I see a lot of repeat of the last message > in the log and then the box hangs. > > Any idea what might have caused this ? ... > Attached scsi disk sdb at scsi0, channel 0, id 9, lun 0 > target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31) > target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31) > target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31) > sdc: Spinning up disk....<6> target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31) > target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31) > target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31) > target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31) > target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31) > target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31) Looks like a change between 2.6.13-git11 and 2.6.14-rc1 caused this - so something in the last 24 hours. Anton ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [2.6.14-rc1] sym scsi boot hang 2005-09-13 13:17 ` Anton Blanchard @ 2005-09-13 14:29 ` Anton Blanchard 2005-09-13 16:35 ` James Bottomley 0 siblings, 1 reply; 21+ messages in thread From: Anton Blanchard @ 2005-09-13 14:29 UTC (permalink / raw) To: Dipankar Sarma; +Cc: linux-scsi, linux-kernel Hi, > Looks like a change between 2.6.13-git11 and 2.6.14-rc1 caused this - so > something in the last 24 hours. I just noticed a similar hang on the ibmvscsi driver. The following backout patch seems to fix it (part of the scsi merge yesterday), I'll look closer after I get some sleep. Anton diff -urN build/drivers/scsi/scsi_lib.c build2/drivers/scsi/scsi_lib.c --- build/drivers/scsi/scsi_lib.c 2005-09-13 15:13:32.000000000 +1000 +++ build2/drivers/scsi/scsi_lib.c 2005-09-14 00:44:57.000000000 +1000 @@ -97,30 +97,6 @@ } static void scsi_run_queue(struct request_queue *q); -static void scsi_release_buffers(struct scsi_cmnd *cmd); - -/* - * Function: scsi_unprep_request() - * - * Purpose: Remove all preparation done for a request, including its - * associated scsi_cmnd, so that it can be requeued. - * - * Arguments: req - request to unprepare - * - * Lock status: Assumed that no locks are held upon entry. - * - * Returns: Nothing. - */ -static void scsi_unprep_request(struct request *req) -{ - struct scsi_cmnd *cmd = req->special; - - req->flags &= ~REQ_DONTPREP; - req->special = (req->flags & REQ_SPECIAL) ? cmd->sc_request : NULL; - - scsi_release_buffers(cmd); - scsi_put_command(cmd); -} /* * Function: scsi_queue_insert() @@ -140,14 +116,12 @@ * commands. * Notes: This could be called either from an interrupt context or a * normal process context. - * Notes: Upon return, cmd is a stale pointer. */ int scsi_queue_insert(struct scsi_cmnd *cmd, int reason) { struct Scsi_Host *host = cmd->device->host; struct scsi_device *device = cmd->device; struct request_queue *q = device->request_queue; - struct request *req = cmd->request; unsigned long flags; SCSI_LOG_MLQUEUE(1, @@ -188,9 +162,8 @@ * function. The SCSI request function detects the blocked condition * and plugs the queue appropriately. */ - scsi_unprep_request(req); spin_lock_irqsave(q->queue_lock, flags); - blk_requeue_request(q, req); + blk_requeue_request(q, cmd->request); spin_unlock_irqrestore(q->queue_lock, flags); scsi_run_queue(q); @@ -366,7 +339,7 @@ int result; if (sshdr) { - sense = kmalloc(SCSI_SENSE_BUFFERSIZE, GFP_NOIO); + sense = kmalloc(SCSI_SENSE_BUFFERSIZE, GFP_KERNEL); if (!sense) return DRIVER_ERROR << 24; memset(sense, 0, SCSI_SENSE_BUFFERSIZE); @@ -579,16 +552,15 @@ * I/O errors in the middle of the request, in which case * we need to request the blocks that come after the bad * sector. - * Notes: Upon return, cmd is a stale pointer. */ static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd) { - struct request *req = cmd->request; unsigned long flags; - scsi_unprep_request(req); + cmd->request->flags &= ~REQ_DONTPREP; + spin_lock_irqsave(q->queue_lock, flags); - blk_requeue_request(q, req); + blk_requeue_request(q, cmd->request); spin_unlock_irqrestore(q->queue_lock, flags); scsi_run_queue(q); @@ -623,14 +595,13 @@ * * Lock status: Assumed that lock is not held upon entry. * - * Returns: cmd if requeue required, NULL otherwise. + * Returns: cmd if requeue done or required, NULL otherwise * * Notes: This is called for block device requests in order to * mark some number of sectors as complete. * * We are guaranteeing that the request queue will be goosed * at some point during this call. - * Notes: If cmd was requeued, upon return it will be a stale pointer. */ static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int uptodate, int bytes, int requeue) @@ -653,15 +624,14 @@ if (!uptodate && blk_noretry_request(req)) end_that_request_chunk(req, 0, leftover); else { - if (requeue) { + if (requeue) /* * Bleah. Leftovers again. Stick the * leftovers in the front of the * queue, and goose the queue again. */ scsi_requeue_command(q, cmd); - cmd = NULL; - } + return cmd; } } @@ -887,13 +857,15 @@ * requeueing right here - we will requeue down below * when we handle the bad sectors. */ + cmd = scsi_end_request(cmd, 1, good_bytes, result == 0); /* - * If the command completed without error, then either - * finish off the rest of the command, or start a new one. + * If the command completed without error, then either finish off the + * rest of the command, or start a new one. */ - if (scsi_end_request(cmd, 1, good_bytes, result == 0) == NULL) + if (result == 0 || cmd == NULL ) { return; + } } /* * Now, if we were good little boys and girls, Santa left us a request @@ -908,7 +880,7 @@ * and quietly refuse further access. */ cmd->device->changed = 1; - scsi_end_request(cmd, 0, + cmd = scsi_end_request(cmd, 0, this_count, 1); return; } else { @@ -942,7 +914,7 @@ scsi_requeue_command(q, cmd); result = 0; } else { - scsi_end_request(cmd, 0, this_count, 1); + cmd = scsi_end_request(cmd, 0, this_count, 1); return; } break; @@ -959,7 +931,7 @@ dev_printk(KERN_INFO, &cmd->device->sdev_gendev, "Device not ready.\n"); - scsi_end_request(cmd, 0, this_count, 1); + cmd = scsi_end_request(cmd, 0, this_count, 1); return; case VOLUME_OVERFLOW: if (!(req->flags & REQ_QUIET)) { @@ -969,7 +941,7 @@ __scsi_print_command(cmd->data_cmnd); scsi_print_sense("", cmd); } - scsi_end_request(cmd, 0, block_bytes, 1); + cmd = scsi_end_request(cmd, 0, block_bytes, 1); return; default: break; @@ -1000,7 +972,7 @@ block_bytes = req->hard_cur_sectors << 9; if (!block_bytes) block_bytes = req->data_len; - scsi_end_request(cmd, 0, block_bytes, 1); + cmd = scsi_end_request(cmd, 0, block_bytes, 1); } } EXPORT_SYMBOL(scsi_io_completion); @@ -1146,7 +1118,7 @@ if (unlikely(!scsi_device_online(sdev))) { printk(KERN_ERR "scsi%d (%d:%d): rejecting I/O to offline device\n", sdev->host->host_no, sdev->id, sdev->lun); - goto kill; + return BLKPREP_KILL; } if (unlikely(sdev->sdev_state != SDEV_RUNNING)) { /* OK, we're not in a running state don't prep @@ -1156,7 +1128,7 @@ * at all allowed down */ printk(KERN_ERR "scsi%d (%d:%d): rejecting I/O to dead device\n", sdev->host->host_no, sdev->id, sdev->lun); - goto kill; + return BLKPREP_KILL; } /* OK, we only allow special commands (i.e. not * user initiated ones */ @@ -1188,11 +1160,11 @@ if(unlikely(specials_only) && !(req->flags & REQ_SPECIAL)) { if(specials_only == SDEV_QUIESCE || specials_only == SDEV_BLOCK) - goto defer; + return BLKPREP_DEFER; printk(KERN_ERR "scsi%d (%d:%d): rejecting I/O to device being removed\n", sdev->host->host_no, sdev->id, sdev->lun); - goto kill; + return BLKPREP_KILL; } @@ -1210,7 +1182,7 @@ cmd->tag = req->tag; } else { blk_dump_rq_flags(req, "SCSI bad req"); - goto kill; + return BLKPREP_KILL; } /* note the overloading of req->special. When the tag @@ -1248,13 +1220,8 @@ * required). */ ret = scsi_init_io(cmd); - switch(ret) { - case BLKPREP_KILL: - /* BLKPREP_KILL return also releases the command */ - goto kill; - case BLKPREP_DEFER: - goto defer; - } + if (ret) /* BLKPREP_KILL return also releases the command */ + return ret; /* * Initialize the actual SCSI command for this request. @@ -1264,7 +1231,7 @@ if (unlikely(!drv->init_command(cmd))) { scsi_release_buffers(cmd); scsi_put_command(cmd); - goto kill; + return BLKPREP_KILL; } } else { memcpy(cmd->cmnd, req->cmd, sizeof(cmd->cmnd)); @@ -1295,9 +1262,6 @@ if (sdev->device_busy == 0) blk_plug_device(q); return BLKPREP_DEFER; - kill: - req->errors = DID_NO_CONNECT << 16; - return BLKPREP_KILL; } /* @@ -1372,24 +1336,19 @@ } /* - * Kill a request for a dead device + * Kill requests for a dead device */ -static void scsi_kill_request(struct request *req, request_queue_t *q) +static void scsi_kill_requests(request_queue_t *q) { - struct scsi_cmnd *cmd = req->special; - - blkdev_dequeue_request(req); + struct request *req; - if (unlikely(cmd == NULL)) { - printk(KERN_CRIT "impossible request in %s.\n", - __FUNCTION__); - BUG(); + while ((req = elv_next_request(q)) != NULL) { + blkdev_dequeue_request(req); + req->flags |= REQ_QUIET; + while (end_that_request_first(req, 0, req->nr_sectors)) + ; + end_that_request_last(req); } - - scsi_init_cmd_errh(cmd); - cmd->result = DID_NO_CONNECT << 16; - atomic_inc(&cmd->device->iorequest_cnt); - __scsi_done(cmd); } /* @@ -1412,8 +1371,7 @@ if (!sdev) { printk("scsi: killing requests for dead queue\n"); - while ((req = elv_next_request(q)) != NULL) - scsi_kill_request(req, q); + scsi_kill_requests(q); return; } @@ -1440,7 +1398,11 @@ if (unlikely(!scsi_device_online(sdev))) { printk(KERN_ERR "scsi%d (%d:%d): rejecting I/O to offline device\n", sdev->host->host_no, sdev->id, sdev->lun); - scsi_kill_request(req, q); + blkdev_dequeue_request(req); + req->flags |= REQ_QUIET; + while (end_that_request_first(req, 0, req->nr_sectors)) + ; + end_that_request_last(req); continue; } @@ -1453,14 +1415,6 @@ sdev->device_busy++; spin_unlock(q->queue_lock); - cmd = req->special; - if (unlikely(cmd == NULL)) { - printk(KERN_CRIT "impossible request in %s.\n" - "please mail a stack trace to " - "linux-scsi@vger.kernel.org", - __FUNCTION__); - BUG(); - } spin_lock(shost->host_lock); if (!scsi_host_queue_ready(q, shost, sdev)) @@ -1479,6 +1433,15 @@ */ spin_unlock_irq(shost->host_lock); + cmd = req->special; + if (unlikely(cmd == NULL)) { + printk(KERN_CRIT "impossible request in %s.\n" + "please mail a stack trace to " + "linux-scsi@vger.kernel.org", + __FUNCTION__); + BUG(); + } + /* * Finally, initialize any error handling parameters, and set up * the timers for timeouts. @@ -1514,7 +1477,6 @@ * cases (host limits or settings) should run the queue at some * later time. */ - scsi_unprep_request(req); spin_lock_irq(q->queue_lock); blk_requeue_request(q, req); sdev->device_busy--; diff -urN build/drivers/scsi/scsi_priv.h build2/drivers/scsi/scsi_priv.h --- build/drivers/scsi/scsi_priv.h 2005-09-13 15:13:32.000000000 +1000 +++ build2/drivers/scsi/scsi_priv.h 2005-09-14 00:44:57.000000000 +1000 @@ -124,7 +124,6 @@ extern void scsi_sysfs_device_initialize(struct scsi_device *); extern int scsi_sysfs_target_initialize(struct scsi_device *); extern struct scsi_transport_template blank_transport_template; -extern void __scsi_remove_device(struct scsi_device *); extern struct bus_type scsi_bus_type; diff -urN build/drivers/scsi/scsi_scan.c build2/drivers/scsi/scsi_scan.c --- build/drivers/scsi/scsi_scan.c 2005-09-13 15:13:32.000000000 +1000 +++ build2/drivers/scsi/scsi_scan.c 2005-09-14 00:44:57.000000000 +1000 @@ -870,12 +870,8 @@ out_free_sdev: if (res == SCSI_SCAN_LUN_PRESENT) { if (sdevp) { - if (scsi_device_get(sdev) == 0) { - *sdevp = sdev; - } else { - __scsi_remove_device(sdev); - res = SCSI_SCAN_NO_RESPONSE; - } + scsi_device_get(sdev); + *sdevp = sdev; } } else { if (sdev->host->hostt->slave_destroy) @@ -1264,19 +1260,6 @@ } EXPORT_SYMBOL(__scsi_add_device); -int scsi_add_device(struct Scsi_Host *host, uint channel, - uint target, uint lun) -{ - struct scsi_device *sdev = - __scsi_add_device(host, channel, target, lun, NULL); - if (IS_ERR(sdev)) - return PTR_ERR(sdev); - - scsi_device_put(sdev); - return 0; -} -EXPORT_SYMBOL(scsi_add_device); - void scsi_rescan_device(struct device *dev) { struct scsi_driver *drv; @@ -1293,8 +1276,27 @@ } EXPORT_SYMBOL(scsi_rescan_device); -static void __scsi_scan_target(struct device *parent, unsigned int channel, - unsigned int id, unsigned int lun, int rescan) +/** + * scsi_scan_target - scan a target id, possibly including all LUNs on the + * target. + * @sdevsca: Scsi_Device handle for scanning + * @shost: host to scan + * @channel: channel to scan + * @id: target id to scan + * + * Description: + * Scan the target id on @shost, @channel, and @id. Scan at least LUN + * 0, and possibly all LUNs on the target id. + * + * Use the pre-allocated @sdevscan as a handle for the scanning. This + * function sets sdevscan->host, sdevscan->id and sdevscan->lun; the + * scanning functions modify sdevscan->lun. + * + * First try a REPORT LUN scan, if that does not scan the target, do a + * sequential scan of LUNs on the target id. + **/ +void scsi_scan_target(struct device *parent, unsigned int channel, + unsigned int id, unsigned int lun, int rescan) { struct Scsi_Host *shost = dev_to_shost(parent); int bflags = 0; @@ -1308,7 +1310,9 @@ */ return; + starget = scsi_alloc_target(parent, channel, id); + if (!starget) return; @@ -1354,33 +1358,6 @@ put_device(&starget->dev); } - -/** - * scsi_scan_target - scan a target id, possibly including all LUNs on the - * target. - * @parent: host to scan - * @channel: channel to scan - * @id: target id to scan - * @lun: Specific LUN to scan or SCAN_WILD_CARD - * @rescan: passed to LUN scanning routines - * - * Description: - * Scan the target id on @parent, @channel, and @id. Scan at least LUN 0, - * and possibly all LUNs on the target id. - * - * First try a REPORT LUN scan, if that does not scan the target, do a - * sequential scan of LUNs on the target id. - **/ -void scsi_scan_target(struct device *parent, unsigned int channel, - unsigned int id, unsigned int lun, int rescan) -{ - struct Scsi_Host *shost = dev_to_shost(parent); - - down(&shost->scan_mutex); - if (scsi_host_scan_allowed(shost)) - __scsi_scan_target(parent, channel, id, lun, rescan); - up(&shost->scan_mutex); -} EXPORT_SYMBOL(scsi_scan_target); static void scsi_scan_channel(struct Scsi_Host *shost, unsigned int channel, @@ -1406,12 +1383,10 @@ order_id = shost->max_id - id - 1; else order_id = id; - __scsi_scan_target(&shost->shost_gendev, channel, - order_id, lun, rescan); + scsi_scan_target(&shost->shost_gendev, channel, order_id, lun, rescan); } else - __scsi_scan_target(&shost->shost_gendev, channel, - id, lun, rescan); + scsi_scan_target(&shost->shost_gendev, channel, id, lun, rescan); } int scsi_scan_host_selected(struct Scsi_Host *shost, unsigned int channel, @@ -1509,15 +1484,12 @@ */ struct scsi_device *scsi_get_host_dev(struct Scsi_Host *shost) { - struct scsi_device *sdev = NULL; + struct scsi_device *sdev; struct scsi_target *starget; - down(&shost->scan_mutex); - if (!scsi_host_scan_allowed(shost)) - goto out; starget = scsi_alloc_target(&shost->shost_gendev, 0, shost->this_id); if (!starget) - goto out; + return NULL; sdev = scsi_alloc_sdev(starget, 0, NULL); if (sdev) { @@ -1525,8 +1497,6 @@ sdev->borken = 0; } put_device(&starget->dev); - out: - up(&shost->scan_mutex); return sdev; } EXPORT_SYMBOL(scsi_get_host_dev); diff -urN build/drivers/scsi/scsi_sysfs.c build2/drivers/scsi/scsi_sysfs.c --- build/drivers/scsi/scsi_sysfs.c 2005-09-13 15:13:32.000000000 +1000 +++ build2/drivers/scsi/scsi_sysfs.c 2005-09-14 00:44:57.000000000 +1000 @@ -653,7 +653,7 @@ error = attr_add(&sdev->sdev_gendev, sdev->host->hostt->sdev_attrs[i]); if (error) { - __scsi_remove_device(sdev); + scsi_remove_device(sdev); goto out; } } @@ -667,7 +667,7 @@ scsi_sysfs_sdev_attrs[i]); error = device_create_file(&sdev->sdev_gendev, attr); if (error) { - __scsi_remove_device(sdev); + scsi_remove_device(sdev); goto out; } } @@ -687,10 +687,17 @@ return error; } -void __scsi_remove_device(struct scsi_device *sdev) +/** + * scsi_remove_device - unregister a device from the scsi bus + * @sdev: scsi_device to unregister + **/ +void scsi_remove_device(struct scsi_device *sdev) { + struct Scsi_Host *shost = sdev->host; + + down(&shost->scan_mutex); if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0) - return; + goto out; class_device_unregister(&sdev->sdev_classdev); device_del(&sdev->sdev_gendev); @@ -699,17 +706,8 @@ sdev->host->hostt->slave_destroy(sdev); transport_unregister_device(&sdev->sdev_gendev); put_device(&sdev->sdev_gendev); -} - -/** - * scsi_remove_device - unregister a device from the scsi bus - * @sdev: scsi_device to unregister - **/ -void scsi_remove_device(struct scsi_device *sdev) -{ - down(&sdev->host->scan_mutex); - __scsi_remove_device(sdev); - up(&sdev->host->scan_mutex); +out: + up(&shost->scan_mutex); } EXPORT_SYMBOL(scsi_remove_device); diff -urN build/include/scsi/scsi_device.h build2/include/scsi/scsi_device.h --- build/include/scsi/scsi_device.h 2005-09-13 15:13:32.000000000 +1000 +++ build2/include/scsi/scsi_device.h 2005-09-14 00:44:57.000000000 +1000 @@ -178,8 +178,8 @@ extern struct scsi_device *__scsi_add_device(struct Scsi_Host *, uint, uint, uint, void *hostdata); -extern int scsi_add_device(struct Scsi_Host *host, uint channel, - uint target, uint lun); +#define scsi_add_device(host, channel, target, lun) \ + __scsi_add_device(host, channel, target, lun, NULL) extern void scsi_remove_device(struct scsi_device *); extern int scsi_device_cancel(struct scsi_device *, int); ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [2.6.14-rc1] sym scsi boot hang 2005-09-13 14:29 ` Anton Blanchard @ 2005-09-13 16:35 ` James Bottomley 2005-09-13 16:47 ` Anton Blanchard 2005-09-14 8:06 ` Anton Blanchard 0 siblings, 2 replies; 21+ messages in thread From: James Bottomley @ 2005-09-13 16:35 UTC (permalink / raw) To: Anton Blanchard; +Cc: Dipankar Sarma, SCSI Mailing List, Linux Kernel On Wed, 2005-09-14 at 00:29 +1000, Anton Blanchard wrote: > I just noticed a similar hang on the ibmvscsi driver. The following > backout patch seems to fix it (part of the scsi merge yesterday), I'll > look closer after I get some sleep. If that's the cause, it's probably a double down of the host scan semaphore somewhere in the code. alt-sysrq-t should work in this case, can you get a stack trace of the blocked process? Thanks, James ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [2.6.14-rc1] sym scsi boot hang 2005-09-13 16:35 ` James Bottomley @ 2005-09-13 16:47 ` Anton Blanchard 2005-09-13 17:32 ` James Bottomley 2005-09-14 8:06 ` Anton Blanchard 1 sibling, 1 reply; 21+ messages in thread From: Anton Blanchard @ 2005-09-13 16:47 UTC (permalink / raw) To: James Bottomley; +Cc: Dipankar Sarma, SCSI Mailing List, Linux Kernel Hi, > If that's the cause, it's probably a double down of the host scan > semaphore somewhere in the code. alt-sysrq-t should work in this case, > can you get a stack trace of the blocked process? Good idea, I wonder why the IO isnt completing. Anton [c0000000004945fc] schedule+0x63c/0xf70 [c0000000004954c8] wait_for_completion+0xb8/0x140 [c0000000002a3800] blk_execute_rq+0xb0/0x120 [c000000000338a98] scsi_execute+0xf8/0x150 [c000000000338bc0] scsi_execute_req+0xd0/0x140 [c00000000033bce4] scsi_probe_and_add_lun+0x204/0x9f0 [c00000000033cfc4] __scsi_scan_target+0x164/0x4f0 [c00000000033d430] scsi_scan_channel+0xe0/0x120 [c00000000033d598] scsi_scan_host_selected+0x128/0x1d0 [c000000000361c20] ibmvscsi_probe+0x270/0x400 [c0000000000367fc] vio_bus_probe+0x7c/0x90 [c000000000299a78] driver_probe_device+0x98/0x160 [c000000000299ce8] __driver_attach+0xa8/0xd0 [c000000000298938] bus_for_each_dev+0x88/0xe0 [c000000000299738] driver_attach+0x28/0x40 [c000000000299074] bus_add_driver+0xc4/0x200 [c00000000029a1cc] driver_register+0x5c/0x80 [c0000000000365b0] vio_register_driver+0x50/0x70 [c000000000565bac] ibmvscsi_module_init+0x1c/0x40 ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [2.6.14-rc1] sym scsi boot hang 2005-09-13 16:47 ` Anton Blanchard @ 2005-09-13 17:32 ` James Bottomley 2005-09-13 17:13 ` Anton Blanchard 2005-09-13 17:33 ` Dipankar Sarma 0 siblings, 2 replies; 21+ messages in thread From: James Bottomley @ 2005-09-13 17:32 UTC (permalink / raw) To: Anton Blanchard; +Cc: Dipankar Sarma, SCSI Mailing List, Linux Kernel On Wed, 2005-09-14 at 02:47 +1000, Anton Blanchard wrote: > Good idea, I wonder why the IO isnt completing. > > Anton > > [c0000000004945fc] schedule+0x63c/0xf70 > [c0000000004954c8] wait_for_completion+0xb8/0x140 > [c0000000002a3800] blk_execute_rq+0xb0/0x120 > [c000000000338a98] scsi_execute+0xf8/0x150 > [c000000000338bc0] scsi_execute_req+0xd0/0x140 > [c00000000033bce4] scsi_probe_and_add_lun+0x204/0x9f0 > [c00000000033cfc4] __scsi_scan_target+0x164/0x4f0 > [c00000000033d430] scsi_scan_channel+0xe0/0x120 > [c00000000033d598] scsi_scan_host_selected+0x128/0x1d0 > [c000000000361c20] ibmvscsi_probe+0x270/0x400 > [c0000000000367fc] vio_bus_probe+0x7c/0x90 > [c000000000299a78] driver_probe_device+0x98/0x160 > [c000000000299ce8] __driver_attach+0xa8/0xd0 > [c000000000298938] bus_for_each_dev+0x88/0xe0 > [c000000000299738] driver_attach+0x28/0x40 > [c000000000299074] bus_add_driver+0xc4/0x200 > [c00000000029a1cc] driver_register+0x5c/0x80 > [c0000000000365b0] vio_register_driver+0x50/0x70 > [c000000000565bac] ibmvscsi_module_init+0x1c/0x40 That trace says the ibmvscsi driver (not sym2) has lost an I/O James ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [2.6.14-rc1] sym scsi boot hang 2005-09-13 17:32 ` James Bottomley @ 2005-09-13 17:13 ` Anton Blanchard 2005-09-13 17:33 ` Dipankar Sarma 1 sibling, 0 replies; 21+ messages in thread From: Anton Blanchard @ 2005-09-13 17:13 UTC (permalink / raw) To: James Bottomley; +Cc: Dipankar Sarma, SCSI Mailing List, Linux Kernel > That trace says the ibmvscsi driver (not sym2) has lost an I/O Yep, this is another machine. The sym2 box is hanging at boot also. Ill get a backtrace on it next. Anton ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [2.6.14-rc1] sym scsi boot hang 2005-09-13 17:32 ` James Bottomley 2005-09-13 17:13 ` Anton Blanchard @ 2005-09-13 17:33 ` Dipankar Sarma 1 sibling, 0 replies; 21+ messages in thread From: Dipankar Sarma @ 2005-09-13 17:33 UTC (permalink / raw) To: James Bottomley; +Cc: Anton Blanchard, SCSI Mailing List, Linux Kernel On Tue, Sep 13, 2005 at 12:32:40PM -0500, James Bottomley wrote: > On Wed, 2005-09-14 at 02:47 +1000, Anton Blanchard wrote: > > Good idea, I wonder why the IO isnt completing. > > > > Anton > > > > [c0000000004945fc] schedule+0x63c/0xf70 > > [c0000000004954c8] wait_for_completion+0xb8/0x140 > > [c0000000002a3800] blk_execute_rq+0xb0/0x120 > > [c000000000338a98] scsi_execute+0xf8/0x150 > > [c000000000338bc0] scsi_execute_req+0xd0/0x140 > > [c00000000033bce4] scsi_probe_and_add_lun+0x204/0x9f0 > > [c00000000033cfc4] __scsi_scan_target+0x164/0x4f0 > > [c00000000033d430] scsi_scan_channel+0xe0/0x120 > > [c00000000033d598] scsi_scan_host_selected+0x128/0x1d0 > > [c000000000361c20] ibmvscsi_probe+0x270/0x400 > > [c0000000000367fc] vio_bus_probe+0x7c/0x90 > > [c000000000299a78] driver_probe_device+0x98/0x160 > > [c000000000299ce8] __driver_attach+0xa8/0xd0 > > [c000000000298938] bus_for_each_dev+0x88/0xe0 > > [c000000000299738] driver_attach+0x28/0x40 > > [c000000000299074] bus_add_driver+0xc4/0x200 > > [c00000000029a1cc] driver_register+0x5c/0x80 > > [c0000000000365b0] vio_register_driver+0x50/0x70 > > [c000000000565bac] ibmvscsi_module_init+0x1c/0x40 > > That trace says the ibmvscsi driver (not sym2) has lost an I/O Anton's trace is from a different machine. Mine has symbios. Backing out the scsi merge (Anton's backout patch) of a day ago fixes my problem. Thanks Dipankar ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [2.6.14-rc1] sym scsi boot hang 2005-09-13 16:35 ` James Bottomley 2005-09-13 16:47 ` Anton Blanchard @ 2005-09-14 8:06 ` Anton Blanchard 2005-09-14 15:49 ` Alan Stern 2005-09-14 16:57 ` James Bottomley 1 sibling, 2 replies; 21+ messages in thread From: Anton Blanchard @ 2005-09-14 8:06 UTC (permalink / raw) To: James Bottomley; +Cc: Dipankar Sarma, SCSI Mailing List, Linux Kernel, stern Hi, > If that's the cause, it's probably a double down of the host scan > semaphore somewhere in the code. alt-sysrq-t should work in this case, > can you get a stack trace of the blocked process? It appears to be this patch: [SCSI] SCSI core: fix leakage of scsi_cmnd's From: Alan Stern <stern@rowland.harvard.edu> This patch (as559b) adds a new routine, scsi_unprep_request, which gets called every place a request is requeued. (That includes scsi_queue_insert as well as scsi_requeue_command.) It also changes scsi_kill_requests to make it call __scsi_done with result equal to DID_NO_CONNECT << 16. (I'm not sure if it's necessary to call scsi_init_cmd_errh here; maybe you can check on that.) Finally, the patch changes the return value from scsi_end_request, to avoid returning a stale pointer in the case where the request was requeued. Fortunately the return value is used in only place, and the change actually simplified it. And in particular it looks like the scsi_unprep_request in scsi_queue_insert is causing it. The following patch fixes the boot problems on the vscsi machine: Index: build/drivers/scsi/scsi_lib.c =================================================================== --- build.orig/drivers/scsi/scsi_lib.c 2005-09-14 18:23:34.000000000 +1000 +++ build/drivers/scsi/scsi_lib.c 2005-09-14 18:27:33.000000000 +1000 @@ -188,7 +188,7 @@ * function. The SCSI request function detects the blocked condition * and plugs the queue appropriately. */ - scsi_unprep_request(req); + //scsi_unprep_request(req); spin_lock_irqsave(q->queue_lock, flags); blk_requeue_request(q, req); spin_unlock_irqrestore(q->queue_lock, flags); ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [2.6.14-rc1] sym scsi boot hang 2005-09-14 8:06 ` Anton Blanchard @ 2005-09-14 15:49 ` Alan Stern 2005-09-14 16:52 ` Mike Christie 2005-09-14 20:35 ` James Bottomley 2005-09-14 16:57 ` James Bottomley 1 sibling, 2 replies; 21+ messages in thread From: Alan Stern @ 2005-09-14 15:49 UTC (permalink / raw) To: Anton Blanchard Cc: James Bottomley, Dipankar Sarma, SCSI Mailing List, Linux Kernel On Wed, 14 Sep 2005, Anton Blanchard wrote: > Hi, > > > If that's the cause, it's probably a double down of the host scan > > semaphore somewhere in the code. alt-sysrq-t should work in this case, > > can you get a stack trace of the blocked process? > > It appears to be this patch: > > [SCSI] SCSI core: fix leakage of scsi_cmnd's > > From: Alan Stern <stern@rowland.harvard.edu> > And in particular it looks like the scsi_unprep_request in > scsi_queue_insert is causing it. The following patch fixes the boot > problems on the vscsi machine: In general the scsi_unprep_request routine is correct and needs to be there. The one part that might be questionable is the assignment to req->special. It may turn out that the real solution is to have scsi_execute set req->special to NULL; I assumed it would be NULL already but perhaps I was wrong. (James, I see a possible problem with scsi_insert_special_req. It adds to the queue a request with REQ_DONTPREP set. How can such a request, with no associated scsi_cmnd, ever work? Also, won't scsi_end_request and __scsi_release_request end up putting the same scsi_command twice?) Here is a patch that addresses the first problem and fixes up a few other loose ends. Please see if it helps. Alan Stern Index: usb-2.6/drivers/scsi/scsi_lib.c =================================================================== --- usb-2.6.orig/drivers/scsi/scsi_lib.c +++ usb-2.6/drivers/scsi/scsi_lib.c @@ -116,7 +116,13 @@ static void scsi_unprep_request(struct r struct scsi_cmnd *cmd = req->special; req->flags &= ~REQ_DONTPREP; - req->special = (req->flags & REQ_SPECIAL) ? cmd->sc_request : NULL; + req->special = NULL; + if (req->flags & REQ_SPECIAL) { + struct scsi_request *sreq = cmd->sc_request; + + if (sreq->sr_magic == SCSI_REQ_MAGIC) + req->special = sreq; + } scsi_release_buffers(cmd); scsi_put_command(cmd); @@ -343,6 +349,7 @@ int scsi_execute(struct scsi_device *sde req->sense_len = 0; req->timeout = timeout; req->flags |= flags | REQ_BLOCK_PC | REQ_SPECIAL | REQ_QUIET; + req->special = NULL; /* * head injection *required* here otherwise quiesce won't work @@ -1072,9 +1079,6 @@ static int scsi_init_io(struct scsi_cmnd printk(KERN_ERR "req nr_sec %lu, cur_nr_sec %u\n", req->nr_sectors, req->current_nr_sectors); - /* release the command and kill it */ - scsi_release_buffers(cmd); - scsi_put_command(cmd); return BLKPREP_KILL; } @@ -1176,13 +1180,13 @@ static int scsi_prep_fn(struct request_q if (req->flags & REQ_SPECIAL && req->special) { struct scsi_request *sreq = req->special; - if (sreq->sr_magic == SCSI_REQ_MAGIC) { - cmd = scsi_get_command(sreq->sr_device, GFP_ATOMIC); - if (unlikely(!cmd)) - goto defer; - scsi_init_cmd_from_req(cmd, sreq); - } else - cmd = req->special; + if (sreq->sr_magic != SCSI_REQ_MAGIC) + printk(KERN_ERR "invalid sr_magic in %s\n", + __FUNCTION__); + cmd = scsi_get_command(sreq->sr_device, GFP_ATOMIC); + if (unlikely(!cmd)) + goto defer; + scsi_init_cmd_from_req(cmd, sreq); } else if (req->flags & (REQ_CMD | REQ_BLOCK_PC)) { if(unlikely(specials_only) && !(req->flags & REQ_SPECIAL)) { @@ -1194,17 +1198,14 @@ static int scsi_prep_fn(struct request_q sdev->host->host_no, sdev->id, sdev->lun); goto kill; } - /* * Now try and find a command block that we can use. */ - if (!req->special) { - cmd = scsi_get_command(sdev, GFP_ATOMIC); - if (unlikely(!cmd)) - goto defer; - } else - cmd = req->special; + cmd = scsi_get_command(sdev, GFP_ATOMIC); + if (unlikely(!cmd)) + goto defer; + cmd->sc_request = NULL; /* pull a tag out of the request if we have one */ cmd->tag = req->tag; @@ -1250,7 +1251,7 @@ static int scsi_prep_fn(struct request_q ret = scsi_init_io(cmd); switch(ret) { case BLKPREP_KILL: - /* BLKPREP_KILL return also releases the command */ + scsi_unprep_request(req); goto kill; case BLKPREP_DEFER: goto defer; ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [2.6.14-rc1] sym scsi boot hang 2005-09-14 15:49 ` Alan Stern @ 2005-09-14 16:52 ` Mike Christie 2005-09-14 16:53 ` Mike Christie 2005-09-14 20:35 ` James Bottomley 1 sibling, 1 reply; 21+ messages in thread From: Mike Christie @ 2005-09-14 16:52 UTC (permalink / raw) To: Alan Stern Cc: Anton Blanchard, James Bottomley, Dipankar Sarma, SCSI Mailing List, Linux Kernel Alan Stern wrote: > On Wed, 14 Sep 2005, Anton Blanchard wrote: > > >>Hi, >> >> >>>If that's the cause, it's probably a double down of the host scan >>>semaphore somewhere in the code. alt-sysrq-t should work in this case, >>>can you get a stack trace of the blocked process? >> >>It appears to be this patch: >> >> [SCSI] SCSI core: fix leakage of scsi_cmnd's >> >> From: Alan Stern <stern@rowland.harvard.edu> > > >>And in particular it looks like the scsi_unprep_request in >>scsi_queue_insert is causing it. The following patch fixes the boot >>problems on the vscsi machine: > > > In general the scsi_unprep_request routine is correct and needs to be > there. The one part that might be questionable is the assignment to > req->special. It may turn out that the real solution is to have > scsi_execute set req->special to NULL; I assumed it would be NULL already > but perhaps I was wrong. I think we have scsi_execute and friends setting REQ_SPECIAL. This is could cause a problem becuase it does not have a scsi_request. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [2.6.14-rc1] sym scsi boot hang 2005-09-14 16:52 ` Mike Christie @ 2005-09-14 16:53 ` Mike Christie 0 siblings, 0 replies; 21+ messages in thread From: Mike Christie @ 2005-09-14 16:53 UTC (permalink / raw) To: Mike Christie Cc: Alan Stern, Anton Blanchard, James Bottomley, Dipankar Sarma, SCSI Mailing List, Linux Kernel Mike Christie wrote: > Alan Stern wrote: > >> On Wed, 14 Sep 2005, Anton Blanchard wrote: >> >> >>> Hi, >>> >>> >>>> If that's the cause, it's probably a double down of the host scan >>>> semaphore somewhere in the code. alt-sysrq-t should work in this case, >>>> can you get a stack trace of the blocked process? >>> >>> >>> It appears to be this patch: >>> >>> [SCSI] SCSI core: fix leakage of scsi_cmnd's >>> >>> From: Alan Stern <stern@rowland.harvard.edu> >> >> >> >>> And in particular it looks like the scsi_unprep_request in >>> scsi_queue_insert is causing it. The following patch fixes the boot >>> problems on the vscsi machine: >> >> >> >> In general the scsi_unprep_request routine is correct and needs to be >> there. The one part that might be questionable is the assignment to >> req->special. It may turn out that the real solution is to have >> scsi_execute set req->special to NULL; I assumed it would be NULL already >> but perhaps I was wrong. > > > I think we have scsi_execute and friends setting REQ_SPECIAL. This is > could cause a problem becuase it does not have a scsi_request. > well now actually it won't becuase sc_request should be null for those scsi_execute block pc commands I think. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [2.6.14-rc1] sym scsi boot hang 2005-09-14 15:49 ` Alan Stern 2005-09-14 16:52 ` Mike Christie @ 2005-09-14 20:35 ` James Bottomley 1 sibling, 0 replies; 21+ messages in thread From: James Bottomley @ 2005-09-14 20:35 UTC (permalink / raw) To: Alan Stern Cc: Anton Blanchard, Dipankar Sarma, SCSI Mailing List, Linux Kernel On Wed, 2005-09-14 at 11:49 -0400, Alan Stern wrote: > (James, I see a possible problem with scsi_insert_special_req. It adds to > the queue a request with REQ_DONTPREP set. How can such a request, with > no associated scsi_cmnd, ever work? Also, won't scsi_end_request and > __scsi_release_request end up putting the same scsi_command twice?) It's a historical anomaly which will hopefully die when we finally manage to get sg and st converted to the generic request infrastructure. Then scsi_request can be killed and this along with it. What used to happen (as the comment implies) is that drivers would allocate a single request and then reuse it for multiple independent commands. Since they weren't too picky about cleaning it up after each use, we had to reset the DONTPREP flag to ensure each new invocation was actually correctly prepared. James ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [2.6.14-rc1] sym scsi boot hang 2005-09-14 8:06 ` Anton Blanchard 2005-09-14 15:49 ` Alan Stern @ 2005-09-14 16:57 ` James Bottomley 2005-09-14 20:19 ` Alan Stern 2005-09-16 10:28 ` Anton Blanchard 1 sibling, 2 replies; 21+ messages in thread From: James Bottomley @ 2005-09-14 16:57 UTC (permalink / raw) To: Anton Blanchard; +Cc: Dipankar Sarma, SCSI Mailing List, Linux Kernel, stern On Wed, 2005-09-14 at 18:06 +1000, Anton Blanchard wrote: > And in particular it looks like the scsi_unprep_request in > scsi_queue_insert is causing it. The following patch fixes the boot > problems on the vscsi machine: OK, my fault. Your fix is almost correct .. I was going to do this eventually, honest, because there's no need to unprep and reprep a command that comes in through scsi_queue_insert(). However, I decided to leave it in to exercise the scsi_unprep_request() path just to make sure it was working. What's happening, I think, is that we also use this path for retries. Since we kill and reget the command each time, the retries decrement is never seen, so we're retrying forever. This should be the correct reversal. James diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -140,14 +140,12 @@ static void scsi_unprep_request(struct r * commands. * Notes: This could be called either from an interrupt context or a * normal process context. - * Notes: Upon return, cmd is a stale pointer. */ int scsi_queue_insert(struct scsi_cmnd *cmd, int reason) { struct Scsi_Host *host = cmd->device->host; struct scsi_device *device = cmd->device; struct request_queue *q = device->request_queue; - struct request *req = cmd->request; unsigned long flags; SCSI_LOG_MLQUEUE(1, @@ -188,9 +186,8 @@ int scsi_queue_insert(struct scsi_cmnd * * function. The SCSI request function detects the blocked condition * and plugs the queue appropriately. */ - scsi_unprep_request(req); spin_lock_irqsave(q->queue_lock, flags); - blk_requeue_request(q, req); + blk_requeue_request(q, cmd->request); spin_unlock_irqrestore(q->queue_lock, flags); scsi_run_queue(q); ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [2.6.14-rc1] sym scsi boot hang 2005-09-14 16:57 ` James Bottomley @ 2005-09-14 20:19 ` Alan Stern 2005-09-14 20:44 ` James Bottomley 2005-09-16 10:28 ` Anton Blanchard 1 sibling, 1 reply; 21+ messages in thread From: Alan Stern @ 2005-09-14 20:19 UTC (permalink / raw) To: James Bottomley Cc: Anton Blanchard, Dipankar Sarma, SCSI Mailing List, Linux Kernel On Wed, 14 Sep 2005, James Bottomley wrote: > OK, my fault. Your fix is almost correct .. I was going to do this > eventually, honest, because there's no need to unprep and reprep a > command that comes in through scsi_queue_insert(). > > However, I decided to leave it in to exercise the scsi_unprep_request() > path just to make sure it was working. What's happening, I think, is > that we also use this path for retries. Since we kill and reget the > command each time, the retries decrement is never seen, so we're > retrying forever. > > This should be the correct reversal. Then shouldn't you also avoid unprepping and reprepping a command that is deferred because the host isn't ready? And isn't it necessary to make sure that req->special is NULL when submitting a special request with no scsi_request, and that cmd->sc_request is NULL when associating a command block to a special request with no scsi_request? In short, is this patch needed? Alan Stern Index: usb-2.6/drivers/scsi/scsi_lib.c =================================================================== --- usb-2.6.orig/drivers/scsi/scsi_lib.c +++ usb-2.6/drivers/scsi/scsi_lib.c @@ -343,6 +343,7 @@ int scsi_execute(struct scsi_device *sde req->sense_len = 0; req->timeout = timeout; req->flags |= flags | REQ_BLOCK_PC | REQ_SPECIAL | REQ_QUIET; + req->special = NULL; /* * head injection *required* here otherwise quiesce won't work @@ -1072,9 +1073,6 @@ static int scsi_init_io(struct scsi_cmnd printk(KERN_ERR "req nr_sec %lu, cur_nr_sec %u\n", req->nr_sectors, req->current_nr_sectors); - /* release the command and kill it */ - scsi_release_buffers(cmd); - scsi_put_command(cmd); return BLKPREP_KILL; } @@ -1205,6 +1203,7 @@ static int scsi_prep_fn(struct request_q goto defer; } else cmd = req->special; + cmd->sc_request = NULL; /* pull a tag out of the request if we have one */ cmd->tag = req->tag; @@ -1250,7 +1249,7 @@ static int scsi_prep_fn(struct request_q ret = scsi_init_io(cmd); switch(ret) { case BLKPREP_KILL: - /* BLKPREP_KILL return also releases the command */ + scsi_unprep_request(req); goto kill; case BLKPREP_DEFER: goto defer; @@ -1514,7 +1513,6 @@ static void scsi_request_fn(struct reque * cases (host limits or settings) should run the queue at some * later time. */ - scsi_unprep_request(req); spin_lock_irq(q->queue_lock); blk_requeue_request(q, req); sdev->device_busy--; ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [2.6.14-rc1] sym scsi boot hang 2005-09-14 20:19 ` Alan Stern @ 2005-09-14 20:44 ` James Bottomley 2005-09-14 21:33 ` Alan Stern 0 siblings, 1 reply; 21+ messages in thread From: James Bottomley @ 2005-09-14 20:44 UTC (permalink / raw) To: Alan Stern Cc: Anton Blanchard, Dipankar Sarma, SCSI Mailing List, Linux Kernel On Wed, 2005-09-14 at 16:19 -0400, Alan Stern wrote: > On Wed, 14 Sep 2005, James Bottomley wrote: > Then shouldn't you also avoid unprepping and reprepping a command that is > deferred because the host isn't ready? Yes ... really the only case for unprep is when we've partially released the command (like in scsi_io_completion) where we need to tear the rest of it down. The rule should be that if it needs preparing, then it's in the same state as the block layer would send it to us in (with no appendages). For most requeues, we have all the prepared resources attached, so they don't need tearing down and repreparing. > And isn't it necessary to make sure that req->special is NULL when > submitting a special request with no scsi_request, and that Yes, but only if the command will be prepared again. > cmd->sc_request is NULL when associating a command block to a special > request with no scsi_request? No, that's zeroed out when the command is allocated. It's only set in the path that sends down a scsi_request. James > In short, is this patch needed? > > Alan Stern > > > > Index: usb-2.6/drivers/scsi/scsi_lib.c > =================================================================== > --- usb-2.6.orig/drivers/scsi/scsi_lib.c > +++ usb-2.6/drivers/scsi/scsi_lib.c > @@ -343,6 +343,7 @@ int scsi_execute(struct scsi_device *sde > req->sense_len = 0; > req->timeout = timeout; > req->flags |= flags | REQ_BLOCK_PC | REQ_SPECIAL | REQ_QUIET; > + req->special = NULL; > > /* > * head injection *required* here otherwise quiesce won't work > @@ -1072,9 +1073,6 @@ static int scsi_init_io(struct scsi_cmnd > printk(KERN_ERR "req nr_sec %lu, cur_nr_sec %u\n", req->nr_sectors, > req->current_nr_sectors); > > - /* release the command and kill it */ > - scsi_release_buffers(cmd); > - scsi_put_command(cmd); > return BLKPREP_KILL; > } > > @@ -1205,6 +1203,7 @@ static int scsi_prep_fn(struct request_q > goto defer; > } else > cmd = req->special; > + cmd->sc_request = NULL; > > /* pull a tag out of the request if we have one */ > cmd->tag = req->tag; > @@ -1250,7 +1249,7 @@ static int scsi_prep_fn(struct request_q > ret = scsi_init_io(cmd); > switch(ret) { > case BLKPREP_KILL: > - /* BLKPREP_KILL return also releases the command */ > + scsi_unprep_request(req); > goto kill; > case BLKPREP_DEFER: > goto defer; > @@ -1514,7 +1513,6 @@ static void scsi_request_fn(struct reque > * cases (host limits or settings) should run the queue at some > * later time. > */ > - scsi_unprep_request(req); > spin_lock_irq(q->queue_lock); > blk_requeue_request(q, req); > sdev->device_busy--; > > - > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [2.6.14-rc1] sym scsi boot hang 2005-09-14 20:44 ` James Bottomley @ 2005-09-14 21:33 ` Alan Stern 2005-09-15 13:56 ` James Bottomley 0 siblings, 1 reply; 21+ messages in thread From: Alan Stern @ 2005-09-14 21:33 UTC (permalink / raw) To: James Bottomley Cc: Anton Blanchard, Dipankar Sarma, SCSI Mailing List, Linux Kernel On Wed, 14 Sep 2005, James Bottomley wrote: > On Wed, 2005-09-14 at 16:19 -0400, Alan Stern wrote: > > On Wed, 14 Sep 2005, James Bottomley wrote: > > Then shouldn't you also avoid unprepping and reprepping a command that is > > deferred because the host isn't ready? > > Yes ... really the only case for unprep is when we've partially released > the command (like in scsi_io_completion) where we need to tear the rest > of it down. In other words, in scsi_requeue_command and nowhere else. > The rule should be that if it needs preparing, then it's in the same > state as the block layer would send it to us in (with no appendages). That's what the unprep routine was supposed to accomplish. > For most requeues, we have all the prepared resources attached, so they > don't need tearing down and repreparing. > > > And isn't it necessary to make sure that req->special is NULL when > > submitting a special request with no scsi_request, and that > > Yes, but only if the command will be prepared again. Or will be prepared for the first time, as in scsi_execute. As far as I can tell, a new struct request is not set to all 0's. So if you queue a request with REQ_SPECIAL set and you fail to clear req->special, you're in trouble. Do you have any idea why this hasn't been causing errors all along? > > cmd->sc_request is NULL when associating a command block to a special > > request with no scsi_request? > > No, that's zeroed out when the command is allocated. It's only set in > the path that sends down a scsi_request. Oops, yes. I must have been reading __scsi_get_command instead of scsi_get_command. Okay, then how does this patch look (moved the routine over to where it gets used, plus two real changes)? Alan Stern Index: usb-2.6/drivers/scsi/scsi_lib.c =================================================================== --- usb-2.6.orig/drivers/scsi/scsi_lib.c +++ usb-2.6/drivers/scsi/scsi_lib.c @@ -100,29 +100,6 @@ static void scsi_run_queue(struct reques static void scsi_release_buffers(struct scsi_cmnd *cmd); /* - * Function: scsi_unprep_request() - * - * Purpose: Remove all preparation done for a request, including its - * associated scsi_cmnd, so that it can be requeued. - * - * Arguments: req - request to unprepare - * - * Lock status: Assumed that no locks are held upon entry. - * - * Returns: Nothing. - */ -static void scsi_unprep_request(struct request *req) -{ - struct scsi_cmnd *cmd = req->special; - - req->flags &= ~REQ_DONTPREP; - req->special = (req->flags & REQ_SPECIAL) ? cmd->sc_request : NULL; - - scsi_release_buffers(cmd); - scsi_put_command(cmd); -} - -/* * Function: scsi_queue_insert() * * Purpose: Insert a command in the midlevel queue. @@ -343,6 +320,7 @@ int scsi_execute(struct scsi_device *sde req->sense_len = 0; req->timeout = timeout; req->flags |= flags | REQ_BLOCK_PC | REQ_SPECIAL | REQ_QUIET; + req->special = NULL; /* * head injection *required* here otherwise quiesce won't work @@ -564,6 +542,29 @@ static void scsi_run_queue(struct reques } /* + * Function: scsi_unprep_request() + * + * Purpose: Remove all preparation done for a request, including its + * associated scsi_cmnd, so that it can be requeued. + * + * Arguments: req - request to unprepare + * + * Lock status: Assumed that no locks are held upon entry. + * + * Returns: Nothing. + */ +static void scsi_unprep_request(struct request *req) +{ + struct scsi_cmnd *cmd = req->special; + + req->flags &= ~REQ_DONTPREP; + req->special = (req->flags & REQ_SPECIAL) ? cmd->sc_request : NULL; + + scsi_release_buffers(cmd); + scsi_put_command(cmd); +} + +/* * Function: scsi_requeue_command() * * Purpose: Handle post-processing of completed commands. @@ -1514,7 +1515,6 @@ static void scsi_request_fn(struct reque * cases (host limits or settings) should run the queue at some * later time. */ - scsi_unprep_request(req); spin_lock_irq(q->queue_lock); blk_requeue_request(q, req); sdev->device_busy--; ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [2.6.14-rc1] sym scsi boot hang 2005-09-14 21:33 ` Alan Stern @ 2005-09-15 13:56 ` James Bottomley 2005-09-15 14:13 ` Alan Stern 2005-09-15 17:52 ` Alan Stern 0 siblings, 2 replies; 21+ messages in thread From: James Bottomley @ 2005-09-15 13:56 UTC (permalink / raw) To: Alan Stern Cc: Anton Blanchard, Dipankar Sarma, SCSI Mailing List, Linux Kernel On Wed, 2005-09-14 at 17:33 -0400, Alan Stern wrote: > On Wed, 14 Sep 2005, James Bottomley wrote: > > Yes ... really the only case for unprep is when we've partially released > > the command (like in scsi_io_completion) where we need to tear the rest > > of it down. > > In other words, in scsi_requeue_command and nowhere else. Pretty much, yes. > Or will be prepared for the first time, as in scsi_execute. As far as I > can tell, a new struct request is not set to all 0's. So if you queue a > request with REQ_SPECIAL set and you fail to clear req->special, you're in > trouble. Do you have any idea why this hasn't been causing errors all > along? That's true, it's not. However ll_rq_blk.c:rq_init() clears req- >special (and initialises all other important fields). > Okay, then how does this patch look (moved the routine over to where it > gets used, plus two real changes)? Well ... under pressure to fix this in -mm, I already commited a version to rc-fixes. What I did was fully reverse the changes to the scsi_insert_queue() [the patch I sent Anton]. We can move the unprep function if you feel strongly about it, but I'm also happy to keep it where it is. James ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [2.6.14-rc1] sym scsi boot hang 2005-09-15 13:56 ` James Bottomley @ 2005-09-15 14:13 ` Alan Stern 2005-09-15 17:52 ` Alan Stern 1 sibling, 0 replies; 21+ messages in thread From: Alan Stern @ 2005-09-15 14:13 UTC (permalink / raw) To: James Bottomley Cc: Anton Blanchard, Dipankar Sarma, SCSI Mailing List, Linux Kernel On Thu, 15 Sep 2005, James Bottomley wrote: > On Wed, 2005-09-14 at 17:33 -0400, Alan Stern wrote: > > On Wed, 14 Sep 2005, James Bottomley wrote: > > > Yes ... really the only case for unprep is when we've partially released > > > the command (like in scsi_io_completion) where we need to tear the rest > > > of it down. > > > > In other words, in scsi_requeue_command and nowhere else. > > Pretty much, yes. > > > Or will be prepared for the first time, as in scsi_execute. As far as I > > can tell, a new struct request is not set to all 0's. So if you queue a > > request with REQ_SPECIAL set and you fail to clear req->special, you're in > > trouble. Do you have any idea why this hasn't been causing errors all > > along? > > That's true, it's not. However ll_rq_blk.c:rq_init() clears req- > >special (and initialises all other important fields). (*Sigh*... I'm trying to do this too fast, not following up properly on all the code paths.) Okay, good, glad to hear it. > > Okay, then how does this patch look (moved the routine over to where it > > gets used, plus two real changes)? > > Well ... under pressure to fix this in -mm, I already commited a version > to rc-fixes. What I did was fully reverse the changes to the > scsi_insert_queue() [the patch I sent Anton]. We can move the unprep > function if you feel strongly about it, but I'm also happy to keep it > where it is. I don't care where the function goes, so just leave it. That leaves only the question of the call to scsi_unprep_request near the end of scsi_request_fn, in the not_ready: section. Looks like that call isn't needed and can be taken out also, do you agree? Alan Stern ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [2.6.14-rc1] sym scsi boot hang 2005-09-15 13:56 ` James Bottomley 2005-09-15 14:13 ` Alan Stern @ 2005-09-15 17:52 ` Alan Stern 1 sibling, 0 replies; 21+ messages in thread From: Alan Stern @ 2005-09-15 17:52 UTC (permalink / raw) To: James Bottomley Cc: Anton Blanchard, Dipankar Sarma, SCSI Mailing List, Linux Kernel On Thu, 15 Sep 2005, James Bottomley wrote: > On Wed, 2005-09-14 at 17:33 -0400, Alan Stern wrote: > > On Wed, 14 Sep 2005, James Bottomley wrote: > > > Yes ... really the only case for unprep is when we've partially released > > > the command (like in scsi_io_completion) where we need to tear the rest > > > of it down. > > > > In other words, in scsi_requeue_command and nowhere else. > > Pretty much, yes. I found one other thing that needs to be fixed. The call to scsi_release_buffers in scsi_unprep_request causes an oops, because the sgtable has already been freed in scsi_io_completion. The following patch is needed. Alan Stern Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Index: usb-2.6/drivers/scsi/scsi_lib.c =================================================================== --- usb-2.6.orig/drivers/scsi/scsi_lib.c +++ usb-2.6/drivers/scsi/scsi_lib.c @@ -118,7 +118,6 @@ static void scsi_unprep_request(struct r req->flags &= ~REQ_DONTPREP; req->special = (req->flags & REQ_SPECIAL) ? cmd->sc_request : NULL; - scsi_release_buffers(cmd); scsi_put_command(cmd); } @@ -1514,7 +1513,6 @@ static void scsi_request_fn(struct reque * cases (host limits or settings) should run the queue at some * later time. */ - scsi_unprep_request(req); spin_lock_irq(q->queue_lock); blk_requeue_request(q, req); sdev->device_busy--; ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [2.6.14-rc1] sym scsi boot hang 2005-09-14 16:57 ` James Bottomley 2005-09-14 20:19 ` Alan Stern @ 2005-09-16 10:28 ` Anton Blanchard 1 sibling, 0 replies; 21+ messages in thread From: Anton Blanchard @ 2005-09-16 10:28 UTC (permalink / raw) To: James Bottomley; +Cc: Dipankar Sarma, SCSI Mailing List, Linux Kernel, stern Hi, > OK, my fault. Your fix is almost correct .. I was going to do this > eventually, honest, because there's no need to unprep and reprep a > command that comes in through scsi_queue_insert(). > > However, I decided to leave it in to exercise the scsi_unprep_request() > path just to make sure it was working. What's happening, I think, is > that we also use this path for retries. Since we kill and reget the > command each time, the retries decrement is never seen, so we're > retrying forever. > > This should be the correct reversal. Thanks James, that did the trick. Anton ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2005-09-16 10:31 UTC | newest] Thread overview: 21+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2005-09-13 12:48 [2.6.14-rc1] sym scsi boot hang Dipankar Sarma 2005-09-13 13:17 ` Anton Blanchard 2005-09-13 14:29 ` Anton Blanchard 2005-09-13 16:35 ` James Bottomley 2005-09-13 16:47 ` Anton Blanchard 2005-09-13 17:32 ` James Bottomley 2005-09-13 17:13 ` Anton Blanchard 2005-09-13 17:33 ` Dipankar Sarma 2005-09-14 8:06 ` Anton Blanchard 2005-09-14 15:49 ` Alan Stern 2005-09-14 16:52 ` Mike Christie 2005-09-14 16:53 ` Mike Christie 2005-09-14 20:35 ` James Bottomley 2005-09-14 16:57 ` James Bottomley 2005-09-14 20:19 ` Alan Stern 2005-09-14 20:44 ` James Bottomley 2005-09-14 21:33 ` Alan Stern 2005-09-15 13:56 ` James Bottomley 2005-09-15 14:13 ` Alan Stern 2005-09-15 17:52 ` Alan Stern 2005-09-16 10:28 ` Anton Blanchard
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox