public inbox for linux-scsi@vger.kernel.org
* Possible bug handling bad I/Os?
@ 2002-08-29 13:34 Michael Heinz
  2002-08-29 16:41 ` Doug Ledford
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Heinz @ 2002-08-29 13:34 UTC
  To: linux-scsi

I ran into an interesting problem recently, and I'd like to ask if I chose
the correct solution.

The "virtual" HBA driver I've written was having a problem recovering when
it temporarily lost contact with the remote hardware. All signs were that
the HBA itself recovered, but that SCSI stopped issuing I/Os.

The last thing to happen seemed to be that my driver would get a call to
queue_command while I knew the connection was down. Since I knew the
connection was down I would simply immediately return an error and do
nothing else.

Since I already had a watchdog process to manage the reconnect, as an
experiment, I tried putting the bad SCSI_Cmnd on a linked list and returning
success to the SCSI layer. A few seconds later, the watchdog picks up the
command and calls its done function. This immediately resolved the
problem!
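
For illustration, the workaround described above might look roughly like the
sketch below. It is a userspace model in C, not driver code: struct sim_cmd,
sim_queuecommand, sim_watchdog, sim_done, and the link_up flag are all
hypothetical stand-ins, and the error value 1 is a placeholder rather than a
real SCSI result code. The only point is the control flow: the queue entry
point always reports "accepted", and a command that arrives while the link is
down is parked on a list and completed later by the watchdog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a SCSI command; not the kernel structure. */
struct sim_cmd {
    int result;                      /* 0 = good, nonzero = error (placeholder) */
    void (*done)(struct sim_cmd *);  /* completion callback */
    struct sim_cmd *next;            /* link for the deferred list */
};

static bool link_up;                 /* assumed connection-state flag */
static struct sim_cmd *deferred_head;
static struct sim_cmd **deferred_tail = &deferred_head;
static int completed;                /* number of completions seen */

/* Stand-in for the completion handler supplied by the upper layer. */
static void sim_done(struct sim_cmd *cmd)
{
    completed++;
    printf("completed cmd %p, result %d\n", (void *)cmd, cmd->result);
}

/* Stand-in for the driver's queue entry point. It always returns 0
 * ("accepted"); a command that cannot be sent is parked, not rejected. */
static int sim_queuecommand(struct sim_cmd *cmd, void (*done)(struct sim_cmd *))
{
    assert(cmd != NULL && done != NULL);
    cmd->done = done;
    cmd->next = NULL;

    if (!link_up) {
        cmd->result = 1;             /* mark as failed; completed later */
        *deferred_tail = cmd;        /* append to the deferred list */
        deferred_tail = &cmd->next;
        return 0;
    }

    cmd->result = 0;                 /* pretend the I/O succeeded at once */
    cmd->done(cmd);
    return 0;
}

/* Stand-in for one watchdog pass: detach the list, then complete each
 * parked command through its own done callback. */
static void sim_watchdog(void)
{
    struct sim_cmd *cmd = deferred_head;

    deferred_head = NULL;
    deferred_tail = &deferred_head;

    while (cmd != NULL) {
        struct sim_cmd *next = cmd->next;  /* done() may reuse cmd */
        cmd->done(cmd);
        cmd = next;
    }
}
```

In a real driver the list and the flag would need locking against the
watchdog and interrupt context, which this model leaves out.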

So, my question is: Is this the right way to handle this problem, or is
there another issue? At least part of the SCSI system knows the command was
bad, because it never tries to abort it - but it never issues another
command, either.

Any suggestions?
-- 
Michael Heinz <mheinz@infiniconsys.com>
Staff Software Engineer
InfiniCon Systems, Inc.


* Re: Possible bug handling bad I/Os?
@ 2002-08-29 15:12 Martin Peschke3
  2002-08-29 15:14 ` Michael Heinz
  0 siblings, 1 reply; 9+ messages in thread
From: Martin Peschke3 @ 2002-08-29 15:12 UTC
  To: Michael Heinz; +Cc: linux-scsi


The mid-layer queueing code is known to have a starvation problem
under certain conditions. Mid-layer queueing is used if queuecommand
fails.
I think there was a thread about it a few months ago.
We implemented a queue processed by a timer in our HBA driver.
Same thing as you did.
Seems that nobody has tried a mid-layer fix yet.

With kind regards

Martin Peschke

IBM Deutschland Entwicklung GmbH
Linux for eServer Development
Phone: +49-(0)7031-16-2349


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html





end of thread, other threads:[~2002-08-29 19:43 UTC | newest]

Thread overview: 9+ messages
2002-08-29 13:34 Possible bug handling bad I/Os? Michael Heinz
2002-08-29 16:41 ` Doug Ledford
2002-08-29 16:58   ` Michael Heinz
2002-08-29 17:11   ` Michael Heinz
2002-08-29 17:27     ` Doug Ledford
2002-08-29 19:16       ` Luben Tuikov
2002-08-29 19:43         ` Doug Ledford
  -- strict thread matches above, loose matches on Subject: below --
2002-08-29 15:12 Martin Peschke3
2002-08-29 15:14 ` Michael Heinz
