* Block timeouts seem not to be working
@ 2008-09-11 15:05 James Bottomley
2008-09-11 15:42 ` Mike Anderson
2008-09-11 18:35 ` Jens Axboe
0 siblings, 2 replies; 5+ messages in thread
From: James Bottomley @ 2008-09-11 15:05 UTC
To: Jens Axboe, Mike Anderson; +Cc: linux-scsi
I just noticed this with a rather finicky SAS system I have. It's got
a SATA DVD attached over an expander. Periodically the DVD just hangs
up, so we wait for the timeout and then send a phy reset which clears
it.
What I'm seeing with the new block timer code is that the timer never
expires. I can dig some more into this, but if you wanted to test it as
well, the timer code is easy to excite. Just throw away one command in
every 128 or so in the queuecommand routine of your favourite HBA
driver.
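One way to do that kind of fault injection, as a minimal userspace sketch rather than real driver code: the helper below only decides which commands to discard. The names should_drop_command, cmd_counter and DROP_INTERVAL are hypothetical, and 128 is just the figure mentioned above. In a real HBA driver, the queuecommand routine would report success for a dropped command without ever sending or completing it, so that only the timeout can recover it.

```c
#include <stdbool.h>

/* Hypothetical knob: discard roughly one command in every DROP_INTERVAL. */
#define DROP_INTERVAL 128u

/* Commands seen so far; a real driver would keep this per host. */
static unsigned int cmd_counter;

/*
 * Returns true when the current command should be silently discarded.
 * The caller is expected to pretend the command was queued and then
 * never complete it, leaving the timeout as the only way out.
 */
static bool should_drop_command(void)
{
	return (++cmd_counter % DROP_INTERVAL) == 0;
}
```

Calling should_drop_command() once per incoming command drops the 128th, the 256th, and so on.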
James
* Re: Block timeouts seem not to be working
2008-09-11 15:05 Block timeouts seem not to be working James Bottomley
@ 2008-09-11 15:42 ` Mike Anderson
2008-09-11 18:35 ` Jens Axboe
1 sibling, 0 replies; 5+ messages in thread
From: Mike Anderson @ 2008-09-11 15:42 UTC
To: James Bottomley; +Cc: Jens Axboe, linux-scsi
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> I just noticed this with a rather finicky SAS system I have. It's got
> a SATA DVD attached over an expander. Periodically the DVD just hangs
> up, so we wait for the timeout and then send a phy reset which clears
> it.
>
> What I'm seeing with the new block timer code is that the timer never
> expires. I can dig some more into this, but if you wanted to test it as
> well, the timer code is easy to excite. Just throw away one command in
> every 128 or so in the queuecommand routine of your favourite HBA
> driver.
I have not seen the case of the timer never expiring, but will look into
other test cases.
Mike C and I were seeing timeout issues (the host staying in the recovery
state, or list debug BUG_ONs) when running tests with timeouts set to 1
or 2 seconds. We are working with Jens to address this. The issue we were
hitting appears to be related to the scsi_eh / scsi_done completion
synchronization.
-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com
* Re: Block timeouts seem not to be working
2008-09-11 15:05 Block timeouts seem not to be working James Bottomley
2008-09-11 15:42 ` Mike Anderson
@ 2008-09-11 18:35 ` Jens Axboe
2008-09-12 21:46 ` James Bottomley
1 sibling, 1 reply; 5+ messages in thread
From: Jens Axboe @ 2008-09-11 18:35 UTC
To: James Bottomley; +Cc: Mike Anderson, linux-scsi
On Thu, Sep 11 2008, James Bottomley wrote:
> I just noticed this with a rather finicky SAS system I have. It's got
> a SATA DVD attached over an expander. Periodically the DVD just hangs
> up, so we wait for the timeout and then send a phy reset which clears
> it.
>
> What I'm seeing with the new block timer code is that the timer never
> expires. I can dig some more into this, but if you wanted to test it as
> well, the timer code is easy to excite. Just throw away one command in
> every 128 or so in the queuecommand routine of your favourite HBA
> driver.
James, I've seen a few oddities as well, I'll be beating on it tomorrow
again to shake out the last bug(s).
--
Jens Axboe
* Re: Block timeouts seem not to be working
2008-09-11 18:35 ` Jens Axboe
@ 2008-09-12 21:46 ` James Bottomley
2008-09-15 20:01 ` Mike Christie
0 siblings, 1 reply; 5+ messages in thread
From: James Bottomley @ 2008-09-12 21:46 UTC
To: Jens Axboe, Mike Christie; +Cc: Mike Anderson, linux-scsi
On Thu, 2008-09-11 at 20:35 +0200, Jens Axboe wrote:
> On Thu, Sep 11 2008, James Bottomley wrote:
> > I just noticed this with a rather finicky SAS system I have. It's got
> > a SATA DVD attached over an expander. Periodically the DVD just hangs
> > up, so we wait for the timeout and then send a phy reset which clears
> > it.
> >
> > What I'm seeing with the new block timer code is that the timer never
> > expires. I can dig some more into this, but if you wanted to test it as
> > well, the timer code is easy to excite. Just throw away one command in
> > every 128 or so in the queuecommand routine of your favourite HBA
> > driver.
>
> James, I've seen a few oddities as well, I'll be beating on it tomorrow
> again to shake out the last bug(s).
Actually, turns out it's nothing to do with block timeouts, it's a
target reset bug.
This loop:
for (id = 0; id <= shost->max_id; id++) {
never terminates if shost->max_id is set to ~0, as aic94xx does: id is
an unsigned int, so id <= ~0 is always true and id++ just wraps back to 0.
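The wrap-around can be seen in a small userspace sketch. The helpers old_loop_exits and new_loop_exits are hypothetical names, not kernel code, and the step budget exists only so the demonstration itself finishes. The first mirrors the for loop above; the second mirrors a do/while that stops when id wraps to 0.

```c
#include <stdbool.h>

/*
 * Mirrors "for (id = start; id <= max_id; id++)".
 * Returns true if the loop exits on its own within `budget` steps.
 */
static bool old_loop_exits(unsigned int start, unsigned int max_id,
			   unsigned int budget)
{
	unsigned int id, steps = 0;

	for (id = start; id <= max_id; id++) {
		if (++steps > budget)
			return false;	/* gave up: the loop is still running */
	}
	return true;
}

/*
 * Mirrors "do { ...; id++; } while (id != 0)".
 * Unsigned overflow is well defined in C, so id reaches 0 after ~0.
 */
static bool new_loop_exits(unsigned int start, unsigned int budget)
{
	unsigned int id = start, steps = 0;

	do {
		if (++steps > budget)
			return false;
		id++;
	} while (id != 0);
	return true;
}
```

Starting three steps below the top of the range, old_loop_exits(~0u - 2, ~0u, 10) gives up because id wraps and keeps satisfying the condition, while new_loop_exits(~0u - 2, 10) stops as soon as id wraps to 0.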
It's also pretty inefficient since you mostly have compact target
numbers, but the max_id can be very high. The best way would be to sort
the recovery list by target id and skip them if they're equal, but even
a worst case O(N^2) traversal is probably OK here.
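As a rough illustration of the sort-and-skip idea (not what the patch below does, which rescans the list instead), here is a userspace sketch over a plain array of target ids. The function distinct_targets and the array representation are assumptions made for the example; the kernel's recovery queue is a linked list of commands.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for unsigned target ids, safe against overflow. */
static int cmp_id(const void *a, const void *b)
{
	unsigned int x = *(const unsigned int *)a;
	unsigned int y = *(const unsigned int *)b;

	return (x > y) - (x < y);
}

/*
 * Sorts ids[] in place, then writes each distinct id to out[] once,
 * in ascending order. out[] must have room for n entries.
 * Returns the number of distinct ids, i.e. the number of resets needed.
 */
static size_t distinct_targets(unsigned int *ids, size_t n, unsigned int *out)
{
	size_t i, count = 0;

	qsort(ids, n, sizeof(ids[0]), cmp_id);
	for (i = 0; i < n; i++) {
		if (i > 0 && ids[i] == ids[i - 1])
			continue;	/* same target as the previous command */
		out[count++] = ids[i];
	}
	return count;
}
```

The cost is one O(N log N) sort plus a single pass, and it depends only on the number of queued commands, not on how large max_id is.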
James
---
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index ad019ec..94ed262 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1065,10 +1065,10 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost,
struct list_head *done_q)
{
struct scsi_cmnd *scmd, *tgtr_scmd, *next;
- unsigned int id;
+ unsigned int id = 0;
int rtn;
- for (id = 0; id <= shost->max_id; id++) {
+ do {
tgtr_scmd = NULL;
list_for_each_entry(scmd, work_q, eh_entry) {
if (id == scmd_id(scmd)) {
@@ -1076,8 +1076,18 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost,
break;
}
}
+ if (!tgtr_scmd) {
+ /* not one exactly equal; find the next highest */
+ list_for_each_entry(scmd, work_q, eh_entry) {
+ if (scmd_id(scmd) > id &&
+ (!tgtr_scmd ||
+ scmd_id(tgtr_scmd) > scmd_id(scmd)))
+ tgtr_scmd = scmd;
+ }
+ }
if (!tgtr_scmd)
- continue;
+ /* no more commands, that's it */
+ break;
SCSI_LOG_ERROR_RECOVERY(3, printk("%s: Sending target reset "
"to target %d\n",
@@ -1096,7 +1106,8 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost,
" failed target: "
"%d\n",
current->comm, id));
- }
+ id++;
+ } while(id != 0);
return list_empty(work_q);
}
* Re: Block timeouts seem not to be working
2008-09-12 21:46 ` James Bottomley
@ 2008-09-15 20:01 ` Mike Christie
0 siblings, 0 replies; 5+ messages in thread
From: Mike Christie @ 2008-09-15 20:01 UTC
To: James Bottomley; +Cc: Jens Axboe, Mike Anderson, linux-scsi
James Bottomley wrote:
> On Thu, 2008-09-11 at 20:35 +0200, Jens Axboe wrote:
>> On Thu, Sep 11 2008, James Bottomley wrote:
>>> I just noticed this with a rather finicky SAS system I have. It's got
>>> a SATA DVD attached over an expander. Periodically the DVD just hangs
>>> up, so we wait for the timeout and then send a phy reset which clears
>>> it.
>>>
>>> What I'm seeing with the new block timer code is that the timer never
>>> expires. I can dig some more into this, but if you wanted to test it as
>>> well, the timer code is easy to excite. Just throw away one command in
>>> every 128 or so in the queuecommand routine of your favourite HBA
>>> driver.
>> James, I've seen a few oddities as well, I'll be beating on it tomorrow
>> again to shake out the last bug(s).
>
> Actually, turns out it's nothing to do with block timeouts, it's a
> target reset bug.
>
> This loop:
>
> for (id = 0; id <= shost->max_id; id++) {
>
> never terminates if shost->max_id is set to ~0, as aic94xx does: id is
> an unsigned int, so id <= ~0 is always true and id++ just wraps back to 0.
>
> It's also pretty inefficient since you mostly have compact target
> numbers, but the max_id can be very high. The best way would be to sort
> the recovery list by target id and skip them if they're equal, but even
> a worst case O(N^2) traversal is probably OK here.
>
Sorry about that. I really screwed up on multiple counts there. I tested
your patch with Linus's tree here on some drivers by setting the command
timeout to 1 second and letting IO run against slow targets. I saw the
eh run multiple times and it ran fine with your patch. Thanks for
finding that.