* Block timeouts seem not to be working @ 2008-09-11 15:05 James Bottomley 2008-09-11 15:42 ` Mike Anderson 2008-09-11 18:35 ` Jens Axboe 0 siblings, 2 replies; 5+ messages in thread From: James Bottomley @ 2008-09-11 15:05 UTC (permalink / raw) To: Jens Axboe, Mike Anderson; +Cc: linux-scsi I just noticed this with a rather finickey SAS system I have. It's got a SATA DVD attached over an expander. Periodically the DVD just hangs up, so we wait for the timeout and then send a phy reset which clears it. What I'm seeing with the new block timer code is that the timer never expires. I can dig some more into this, but if you wanted to test it as well, the timer code is easy to excite. Just throw away one command in every 128 or so in the queuecommand routine of your favourite HBA driver. James ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Block timeouts seem not to be working 2008-09-11 15:05 Block timeouts seem not to be working James Bottomley @ 2008-09-11 15:42 ` Mike Anderson 2008-09-11 18:35 ` Jens Axboe 1 sibling, 0 replies; 5+ messages in thread From: Mike Anderson @ 2008-09-11 15:42 UTC (permalink / raw) To: James Bottomley; +Cc: Jens Axboe, linux-scsi James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > I just noticed this with a rather finickey SAS system I have. It's got > a SATA DVD attached over an expander. Periodically the DVD just hangs > up, so we wait for the timeout and then send a phy reset which clears > it. > > What I'm seeing with the new block timer code is that the timer never > expires. I can dig some more into this, but if you wanted to test it as > well, the timer code is easy to excite. Just throw away one command in > every 128 or so in the queuecommand routine of your favourite HBA > driver. I have not seen the case of the timer never expiring, but will look into other test cases. Mike C and I where seeing timeout issues (host staying in recovery state or list debug bugon's) when running tests with timeouts set to 1 or 2 seconds. We are working with Jens to address this. The issue we where hitting appears related to the scsi_eh / scsi_done completion synchronization. -andmike -- Michael Anderson andmike@linux.vnet.ibm.com ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Block timeouts seem not to be working 2008-09-11 15:05 Block timeouts seem not to be working James Bottomley 2008-09-11 15:42 ` Mike Anderson @ 2008-09-11 18:35 ` Jens Axboe 2008-09-12 21:46 ` James Bottomley 1 sibling, 1 reply; 5+ messages in thread From: Jens Axboe @ 2008-09-11 18:35 UTC (permalink / raw) To: James Bottomley; +Cc: Mike Anderson, linux-scsi On Thu, Sep 11 2008, James Bottomley wrote: > I just noticed this with a rather finickey SAS system I have. It's got > a SATA DVD attached over an expander. Periodically the DVD just hangs > up, so we wait for the timeout and then send a phy reset which clears > it. > > What I'm seeing with the new block timer code is that the timer never > expires. I can dig some more into this, but if you wanted to test it as > well, the timer code is easy to excite. Just throw away one command in > every 128 or so in the queuecommand routine of your favourite HBA > driver. James, I've seen a few oddities as well, I'll be beating on it tomorrow again to shake out the last bug(s). -- Jens Axboe ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Block timeouts seem not to be working 2008-09-11 18:35 ` Jens Axboe @ 2008-09-12 21:46 ` James Bottomley 2008-09-15 20:01 ` Mike Christie 0 siblings, 1 reply; 5+ messages in thread From: James Bottomley @ 2008-09-12 21:46 UTC (permalink / raw) To: Jens Axboe, Mike Christie; +Cc: Mike Anderson, linux-scsi On Thu, 2008-09-11 at 20:35 +0200, Jens Axboe wrote: > On Thu, Sep 11 2008, James Bottomley wrote: > > I just noticed this with a rather finickey SAS system I have. It's got > > a SATA DVD attached over an expander. Periodically the DVD just hangs > > up, so we wait for the timeout and then send a phy reset which clears > > it. > > > > What I'm seeing with the new block timer code is that the timer never > > expires. I can dig some more into this, but if you wanted to test it as > > well, the timer code is easy to excite. Just throw away one command in > > every 128 or so in the queuecommand routine of your favourite HBA > > driver. > > James, I've seen a few oddities as well, I'll be beating on it tomorrow > again to shake out the last bug(s). Actually, turns out it's nothing to do with block timeouts, it's a target reset bug. This loop: for (id = 0; id <= shost->max_id; id++) { Never terminates if shost->max_id is set to ~0, like aic94xx does. It's also pretty inefficient since you mostly have compact target numbers, but the max_id can be very high. The best way would be to sort the recovery list by target id and skip them if they're equal, but even a worst case O(N^2) traversal is probably OK here. James --- diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index ad019ec..94ed262 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -1065,10 +1065,10 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost, struct list_head *done_q) { struct scsi_cmnd *scmd, *tgtr_scmd, *next; - unsigned int id; + unsigned int id = 0; int rtn; - for (id = 0; id <= shost->max_id; id++) { + do { tgtr_scmd = NULL; list_for_each_entry(scmd, work_q, eh_entry) { if (id == scmd_id(scmd)) { @@ -1076,8 +1076,18 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost, break; } } + if (!tgtr_scmd) { + /* not one exactly equal; find the next highest */ + list_for_each_entry(scmd, work_q, eh_entry) { + if (scmd_id(scmd) > id && + (!tgtr_scmd || + scmd_id(tgtr_scmd) > scmd_id(scmd))) + tgtr_scmd = scmd; + } + } if (!tgtr_scmd) - continue; + /* no more commands, that's it */ + break; SCSI_LOG_ERROR_RECOVERY(3, printk("%s: Sending target reset " "to target %d\n", @@ -1096,7 +1106,8 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost, " failed target: " "%d\n", current->comm, id)); - } + id++; + } while(id != 0); return list_empty(work_q); } ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: Block timeouts seem not to be working 2008-09-12 21:46 ` James Bottomley @ 2008-09-15 20:01 ` Mike Christie 0 siblings, 0 replies; 5+ messages in thread From: Mike Christie @ 2008-09-15 20:01 UTC (permalink / raw) To: James Bottomley; +Cc: Jens Axboe, Mike Anderson, linux-scsi James Bottomley wrote: > On Thu, 2008-09-11 at 20:35 +0200, Jens Axboe wrote: >> On Thu, Sep 11 2008, James Bottomley wrote: >>> I just noticed this with a rather finickey SAS system I have. It's got >>> a SATA DVD attached over an expander. Periodically the DVD just hangs >>> up, so we wait for the timeout and then send a phy reset which clears >>> it. >>> >>> What I'm seeing with the new block timer code is that the timer never >>> expires. I can dig some more into this, but if you wanted to test it as >>> well, the timer code is easy to excite. Just throw away one command in >>> every 128 or so in the queuecommand routine of your favourite HBA >>> driver. >> James, I've seen a few oddities as well, I'll be beating on it tomorrow >> again to shake out the last bug(s). > > Actually, turns out it's nothing to do with block timeouts, it's a > target reset bug. > > This loop: > > for (id = 0; id <= shost->max_id; id++) { > > Never terminates if shost->max_id is set to ~0, like aic94xx does. > > It's also pretty inefficient since you mostly have compact target > numbers, but the max_id can be very high. The best way would be to sort > the recovery list by target id and skip them if they're equal, but even > a worst case O(N^2) traversal is probably OK here. > Sorry about that. I really screwed up on multiple counts there. I tested your patch with Linus's tree here on some drivers by setting the command timeout to 1 second and letting IO run against slow targets. I saw the eh run multiple times and it ran fine with your patch. Thanks for finding that. ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2008-09-15 20:08 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-09-11 15:05 Block timeouts seem not to be working James Bottomley 2008-09-11 15:42 ` Mike Anderson 2008-09-11 18:35 ` Jens Axboe 2008-09-12 21:46 ` James Bottomley 2008-09-15 20:01 ` Mike Christie
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox