public inbox for linux-scsi@vger.kernel.org
* host_self_blocked question/bug?
@ 2003-11-25 21:46 Brian King
  2003-11-25 21:55 ` James Bottomley
  0 siblings, 1 reply; 5+ messages in thread
From: Brian King @ 2003-11-25 21:46 UTC (permalink / raw)
  To: linux-scsi

I am writing an HBA driver for 2.6 and found that when host_self_blocked 
is true the error handler will still send Test Unit Ready. This seems 
like a bug. I would like to be able to use scsi_block_requests to stop 
anything from coming into queuecommand, as I may be running BIST on the 
adapter, downloading microcode, etc. in which case I cannot accept any 
commands. It seems to me like the solution might be to simply have 
scsi_eh_tur return success if host_self_blocked is true. Thoughts?


-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center



* Re: host_self_blocked question/bug?
  2003-11-25 21:46 host_self_blocked question/bug? Brian King
@ 2003-11-25 21:55 ` James Bottomley
  2003-11-25 22:31   ` Brian King
  0 siblings, 1 reply; 5+ messages in thread
From: James Bottomley @ 2003-11-25 21:55 UTC (permalink / raw)
  To: Brian King; +Cc: SCSI Mailing List

On Tue, 2003-11-25 at 15:46, Brian King wrote:
> I am writing an HBA driver for 2.6 and found that when host_self_blocked 
> is true the error handler will still send Test Unit Ready. This seems 
> like a bug. I would like to be able to use scsi_block_requests to stop 
> anything from coming into queuecommand, as I may be running BIST on the 
> adapter, downloading microcode, etc. in which case I cannot accept any 
> commands. It seems to me like the solution might be to simply have 
> scsi_eh_tur return success if host_self_blocked is true. Thoughts?

Not in isolation...what are you trying to do?

The original design was to allow short hiatuses when the HBA couldn't
accept I/O.  It doesn't work if there's I/O pending (unless the stop is
very short), because the SCSI timers are still ticking and error
recovery doesn't see this flag.

There has been talk of making this interface robust to pending commands
(halt the timers and freeze the error handler) for FC HBA's that take
ages to process loop events, but no work has been done on this---it's
quite a bit more work than simply not allowing the eh to emit TURs.

James




* Re: host_self_blocked question/bug?
  2003-11-25 21:55 ` James Bottomley
@ 2003-11-25 22:31   ` Brian King
  2003-11-26  0:32     ` Patrick Mansfield
  0 siblings, 1 reply; 5+ messages in thread
From: Brian King @ 2003-11-25 22:31 UTC (permalink / raw)
  To: linux-scsi

James Bottomley wrote:
> The original design was to allow short hiatuses when the HBA couldn't
> accept I/O.  It doesn't work if there's I/O pending (unless the stop is
> very short), because the SCSI timers are still ticking and error
> recovery doesn't see this flag.
> 
> There has been talk of making this interface robust to pending commands
> (halt the timers and freeze the error handler) for FC HBA's that take
> ages to process loop events, but no work has been done on this---it's
> quite a bit more work than simply not allowing the eh to emit TURs.

I'd like a way to be able to stop the mid-layer from sending me any 
commands. The scenarios I have today are:

1. Fatal error on the adapter.
2. Microcode download to the adapter.
3. Adapter cache recovery commands.

All of these cases require me to run BIST on the adapter and bring it 
back up. To do this may take 20-30 seconds. I call scsi_block_requests, 
fail all pending ops back with DID_ERROR, reset the adapter, then call 
scsi_unblock_requests. My usage of it gets around the ticking timer 
problem. I agree that the error recovery thread doesn't see this either 
and that this is a potential problem. I had planned to work around that 
by failing abort and device reset, forcing the host_reset to be called, 
which would wait on the completion of the adapter reset, but it would be 
nice if I didn't have to do that.
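
As a rough illustration of the sequence described above, here is a small userspace model, not driver code. Every name in it (reset_host, reset_model_begin, MODEL_DID_ERROR, and so on) is hypothetical; only the comments refer to the real mid-layer calls, and the status values are placeholders rather than the kernel's numeric encoding.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder status codes; NOT the kernel's numeric values. */
enum { MODEL_DID_OK = 0, MODEL_DID_ERROR = 1 };

/* Hypothetical stand-in for an outstanding command. */
struct reset_cmd {
	int result;	/* completion status reported to the mid-layer */
	int done;	/* nonzero once completed back to the mid-layer */
};

/* Hypothetical stand-in for the host state relevant to this thread. */
struct reset_host {
	unsigned self_blocked;	/* models shost->host_self_blocked */
	int adapter_ready;	/* nonzero when the adapter can take commands */
	struct reset_cmd *pending;
	size_t npending;
};

/* New commands may be dispatched only when unblocked and ready. */
int reset_model_can_dispatch(const struct reset_host *h)
{
	return !h->self_blocked && h->adapter_ready;
}

/* Block new requests, fail everything outstanding, take the adapter down. */
void reset_model_begin(struct reset_host *h)
{
	size_t i;

	h->self_blocked = 1;	/* corresponds to scsi_block_requests() */
	for (i = 0; i < h->npending; i++) {
		h->pending[i].result = MODEL_DID_ERROR;	/* DID_ERROR in the driver */
		h->pending[i].done = 1;
	}
	h->npending = 0;	/* nothing left for the timers to fire on */
	h->adapter_ready = 0;	/* BIST and bring-up happen from here */
}

/* Adapter is back: accept commands again. */
void reset_model_finish(struct reset_host *h)
{
	assert(h->self_blocked);	/* must pair with reset_model_begin() */
	h->adapter_ready = 1;
	h->self_blocked = 0;	/* corresponds to scsi_unblock_requests() */
}
```

Because every pending command is completed before the adapter goes down, no command timer is left running during the 20-30 second window. The model says nothing about an error handler thread that is already active, which is the gap discussed above.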



-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center



* Re: host_self_blocked question/bug?
  2003-11-25 22:31   ` Brian King
@ 2003-11-26  0:32     ` Patrick Mansfield
  2003-11-26 13:44       ` Christoph Hellwig
  0 siblings, 1 reply; 5+ messages in thread
From: Patrick Mansfield @ 2003-11-26  0:32 UTC (permalink / raw)
  To: Brian King; +Cc: linux-scsi

On Tue, Nov 25, 2003 at 04:31:09PM -0600, Brian King wrote:
> James Bottomley wrote:
> > The original design was to allow short hiatuses when the HBA couldn't
> > accept I/O.  It doesn't work if there's I/O pending (unless the stop is
> > very short), because the SCSI timers are still ticking and error
> > recovery doesn't see this flag.
> > 
> > There has been talk of making this interface robust to pending commands
> > (halt the timers and freeze the error handler) for FC HBA's that take
> > ages to process loop events, but no work has been done on this---it's
> > quite a bit more work than simply not allowing the eh to emit TURs.

> I'd like a way to be able to stop the mid-layer from sending me any 
> commands. The scenarios I have today are:
> 
> 1. Fatal error on the adapter.
> 2. Microcode download to the adapter.
> 3. Adapter cache recovery commands.
> 
> All of these cases require me to run BIST on the adapter and bring it 
> back up. To do this may take 20-30 seconds. I call scsi_block_requests, 
> fail all pending ops back with DID_ERROR, reset the adapter, then call 
> scsi_unblock_requests. My usage of it gets around the ticking timer 
> problem. I agree that the error recovery thread doesn't see this either 
> and that this is a potential problem. I had planned to work around that 
> by failing abort and device reset, forcing the host_reset to be called, 
> which would wait on the completion of the adapter reset, but it would be 
> nice if I didn't have to do that.

Given the above conditions: could we avoid starting up the eh while
host_self_blocked is set, and abort the eh (starting it up again when
unblocked) if it is already running when we see the flag?

The following blocks the error handler from starting up; we would still
need code to abort an error handler that is already running.

(There should be locking around all the setting and checking of
host_self_blocked.)

Untested, compiled only patch against main line bk:

===== drivers/scsi/scsi_error.c 1.65 vs edited =====
--- 1.65/drivers/scsi/scsi_error.c	Sun Sep 21 10:49:36 2003
+++ edited/drivers/scsi/scsi_error.c	Tue Nov 25 16:11:01 2003
@@ -47,7 +47,8 @@
 /* called with shost->host_lock held */
 void scsi_eh_wakeup(struct Scsi_Host *shost)
 {
-	if (shost->host_busy == shost->host_failed) {
+	if ((shost->host_busy == shost->host_failed) &&
+	    !shost->host_self_blocked) {
 		up(shost->eh_wait);
 		SCSI_LOG_ERROR_RECOVERY(5,
 				printk("Waking error handler thread\n"));
===== drivers/scsi/scsi_lib.c 1.113 vs edited =====
--- 1.113/drivers/scsi/scsi_lib.c	Sat Sep 20 06:53:02 2003
+++ edited/drivers/scsi/scsi_lib.c	Tue Nov 25 16:12:30 2003
@@ -1303,6 +1303,7 @@
 {
 	shost->host_self_blocked = 0;
 	scsi_run_host_queues(shost);
+	scsi_eh_wakeup(shost);
 }
 
 int __init scsi_init_queue(void)
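
The intended effect of the two hunks can be sketched in a userspace model. This is only an illustration under stated assumptions: eh_model_host, eh_model_wakeup and eh_model_unblock are hypothetical names, a counter stands in for up(shost->eh_wait), and no locking is modelled.

```c
#include <assert.h>

/* Hypothetical stand-in for the Scsi_Host fields used by the patch. */
struct eh_model_host {
	int host_busy;			/* commands outstanding */
	int host_failed;		/* commands waiting for error handling */
	unsigned host_self_blocked;	/* set while the LLDD blocks requests */
	int eh_wakeups;			/* counts would-be up(shost->eh_wait) calls */
};

/* Models the patched scsi_eh_wakeup(): no wakeup while self-blocked. */
void eh_model_wakeup(struct eh_model_host *h)
{
	if (h->host_busy == h->host_failed && !h->host_self_blocked)
		h->eh_wakeups++;
}

/* Models the patched scsi_unblock_requests(): retry the wakeup. */
void eh_model_unblock(struct eh_model_host *h)
{
	h->host_self_blocked = 0;
	eh_model_wakeup(h);
}
```

With two failed commands on a blocked host, eh_model_wakeup() leaves eh_wakeups at zero; the deferred wakeup is delivered once eh_model_unblock() runs.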


* Re: host_self_blocked question/bug?
  2003-11-26  0:32     ` Patrick Mansfield
@ 2003-11-26 13:44       ` Christoph Hellwig
  0 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2003-11-26 13:44 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: Brian King, linux-scsi

On Tue, Nov 25, 2003 at 04:32:04PM -0800, Patrick Mansfield wrote:
> (There should be locking around all the setting and checking of
> host_self_blocked.)

The same is true for many of the 1-bit bitfields in struct Scsi_Host/
scsi_device, we should probably use the atomic bitmap operations for
those.  It's been on my todo list for a while, but I wonder whether
that's enough of an interface change to be postponed to 2.7.x.
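
To show the idea outside the kernel: adjacent 1-bit bitfields share a storage word, so an unlocked read-modify-write of one field can lose a concurrent update to its neighbour. In the kernel the fix would use set_bit(), clear_bit() and test_bit() on an unsigned long; the sketch below is a userspace analogue using C11 atomics, and all of its names (model_flags, MODEL_SELF_BLOCKED, MODEL_IN_RECOVERY) are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits replacing individual 1-bit bitfields. */
enum model_host_flag {
	MODEL_SELF_BLOCKED = 1u << 0,
	MODEL_IN_RECOVERY  = 1u << 1
};

/* One atomic word holds all the flags. */
struct model_flags {
	atomic_uint bits;
};

/* Atomically set a flag without disturbing the other bits. */
static inline void model_set_flag(struct model_flags *f, unsigned flag)
{
	atomic_fetch_or(&f->bits, flag);
}

/* Atomically clear a flag without disturbing the other bits. */
static inline void model_clear_flag(struct model_flags *f, unsigned flag)
{
	atomic_fetch_and(&f->bits, ~flag);
}

/* Read a single flag. */
static inline bool model_test_flag(struct model_flags *f, unsigned flag)
{
	return (atomic_load(&f->bits) & flag) != 0;
}
```

Each update is a single atomic read-modify-write, so setting one flag cannot overwrite a concurrent change to another. It does not by itself make a test followed by a separate action atomic; that still needs a lock or a combined test-and-set style operation.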


