* host_self_blocked question/bug?
From: Brian King @ 2003-11-25 21:46 UTC
To: linux-scsi

I am writing an HBA driver for 2.6 and found that when host_self_blocked
is true the error handler will still send Test Unit Ready. This seems
like a bug. I would like to be able to use scsi_block_requests to stop
anything from coming into queuecommand, as I may be running BIST on the
adapter, downloading microcode, etc., in which case I cannot accept any
commands. It seems to me like the solution might be to simply have
scsi_eh_tur return success if host_self_blocked is true. Thoughts?

-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center

* Re: host_self_blocked question/bug?
From: James Bottomley @ 2003-11-25 21:55 UTC
To: Brian King
Cc: SCSI Mailing List

On Tue, 2003-11-25 at 15:46, Brian King wrote:
> I am writing an HBA driver for 2.6 and found that when host_self_blocked
> is true the error handler will still send Test Unit Ready. This seems
> like a bug. I would like to be able to use scsi_block_requests to stop
> anything from coming into queuecommand, as I may be running BIST on the
> adapter, downloading microcode, etc. in which case I cannot accept any
> commands. It seems to me like the solution might be to simply have
> scsi_eh_tur return success if host_self_blocked is true. Thoughts?

Not in isolation... what are you trying to do?

The original design was to allow short hiatuses when the HBA couldn't
accept I/O. It doesn't work if there's I/O pending (unless the stop is
very short), because the SCSI timers are still ticking and error
recovery doesn't see this flag.

There has been talk of making this interface robust to pending commands
(halt the timers and freeze the error handler) for FC HBAs that take
ages to process loop events, but no work has been done on this---it's
quite a bit more work than simply not allowing the eh to emit TURs.

James

* Re: host_self_blocked question/bug?
From: Brian King @ 2003-11-25 22:31 UTC
To: linux-scsi

James Bottomley wrote:
> The original design was to allow short hiatuses when the HBA couldn't
> accept I/O. It doesn't work if there's I/O pending (unless the stop is
> very short), because the SCSI timers are still ticking and error
> recovery doesn't see this flag.
>
> There has been talk of making this interface robust to pending commands
> (halt the timers and freeze the error handler) for FC HBAs that take
> ages to process loop events, but no work has been done on this---it's
> quite a bit more work than simply not allowing the eh to emit TURs.

I'd like a way to be able to stop the mid-layer from sending me any
commands. The scenarios I have today are:

1. Fatal error on the adapter.
2. Microcode download to the adapter.
3. Adapter cache recovery commands.

All of these cases require me to run BIST on the adapter and bring it
back up. This may take 20-30 seconds. I call scsi_block_requests,
fail all pending ops back with DID_ERROR, reset the adapter, then call
scsi_unblock_requests. My usage of it gets around the ticking timer
problem. I agree that the error recovery thread doesn't see this flag
either and that this is a potential problem. I had planned to work
around that by failing abort and device reset, forcing host_reset to be
called, which would wait on the completion of the adapter reset, but it
would be nice if I didn't have to do that.

-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center

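The sequence Brian describes (block requests, fail pending commands back with DID_ERROR, reset the adapter, unblock) can be sketched as a small user-space model. This is only an illustration under stated assumptions: `struct mock_host`, `struct mock_cmd`, `enum mock_status`, `mock_queuecommand` and `mock_reset_adapter` are hypothetical stand-ins invented here, not the kernel's `struct Scsi_Host` or the mid-layer API. Only the flag name `host_self_blocked` and the idea of completing commands with DID_ERROR come from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; NOT the kernel's data structures. */
enum mock_status { MOCK_PENDING, MOCK_DID_ERROR };

struct mock_cmd {
	enum mock_status status;
};

struct mock_host {
	bool host_self_blocked;  /* models the flag discussed in the thread */
	struct mock_cmd *cmds;   /* commands currently owned by the adapter */
	size_t ncmds;
};

/* Models queuecommand being reachable: no new work while blocked. */
bool mock_queuecommand(const struct mock_host *h)
{
	assert(h != NULL);
	return !h->host_self_blocked;
}

/* Models the sequence: block, fail pending ops, reset, unblock.
 * Returns the number of commands failed back. */
size_t mock_reset_adapter(struct mock_host *h)
{
	size_t failed = 0;

	assert(h != NULL);
	h->host_self_blocked = true;              /* block-requests step */

	for (size_t i = 0; i < h->ncmds; i++) {   /* fail pending ops back */
		if (h->cmds[i].status == MOCK_PENDING) {
			h->cmds[i].status = MOCK_DID_ERROR;
			failed++;
		}
	}

	/* The 20-30 second BIST and bring-up would happen here. */

	h->host_self_blocked = false;             /* unblock-requests step */
	return failed;
}
```

Because every pending command is completed before the long reset begins, no command timer is left running against the adapter, which is how this usage sidesteps the ticking-timer problem. The model does not cover the error handler, which is the gap the rest of the thread discusses.
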
* Re: host_self_blocked question/bug?
From: Patrick Mansfield @ 2003-11-26 0:32 UTC
To: Brian King
Cc: linux-scsi

On Tue, Nov 25, 2003 at 04:31:09PM -0600, Brian King wrote:
> James Bottomley wrote:
> > The original design was to allow short hiatuses when the HBA couldn't
> > accept I/O. It doesn't work if there's I/O pending (unless the stop is
> > very short), because the SCSI timers are still ticking and error
> > recovery doesn't see this flag.
> >
> > There has been talk of making this interface robust to pending commands
> > (halt the timers and freeze the error handler) for FC HBAs that take
> > ages to process loop events, but no work has been done on this---it's
> > quite a bit more work than simply not allowing the eh to emit TURs.

> I'd like a way to be able to stop the mid-layer from sending me any
> commands. The scenarios I have today are:
>
> 1. Fatal error on the adapter.
> 2. microcode download to the adapter.
> 3. Adapter cache recovery commands.
>
> All of these cases require me to run BIST on the adapter and bring it
> back up. To do this may take 20-30 seconds. I call scsi_block_requests,
> fail all pending ops back with DID_ERROR, reset the adapter, then call
> scsi_unblock_requests. My usage of it gets around the ticking timer
> problem. I agree that the error recovery thread doesn't see this either
> and that this is a potential problem. I had planned to work around that
> by failing abort and device reset, forcing the host_reset to be called,
> which would wait on the completion of the adapter reset, but it would be
> nice if I didn't have to do that.

Given the above conditions: when we see that host_self_blocked is set,
could we avoid starting the eh, and abort the eh if it is already
running (starting it up again when unblocked)?

The following blocks the error handler from starting up; we would still
need code to abort the error handler if it is already running. (There
should be locking around all the setting and checking of
host_self_blocked.)

Untested, compile-only patch against mainline bk:

===== drivers/scsi/scsi_error.c 1.65 vs edited =====
--- 1.65/drivers/scsi/scsi_error.c	Sun Sep 21 10:49:36 2003
+++ edited/drivers/scsi/scsi_error.c	Tue Nov 25 16:11:01 2003
@@ -47,7 +47,8 @@
 /* called with shost->host_lock held */
 void scsi_eh_wakeup(struct Scsi_Host *shost)
 {
-	if (shost->host_busy == shost->host_failed) {
+	if ((shost->host_busy == shost->host_failed) &&
+	    !shost->host_self_blocked) {
 		up(shost->eh_wait);
 		SCSI_LOG_ERROR_RECOVERY(5,
 			printk("Waking error handler thread\n"));
===== drivers/scsi/scsi_lib.c 1.113 vs edited =====
--- 1.113/drivers/scsi/scsi_lib.c	Sat Sep 20 06:53:02 2003
+++ edited/drivers/scsi/scsi_lib.c	Tue Nov 25 16:12:30 2003
@@ -1303,6 +1303,7 @@
 {
 	shost->host_self_blocked = 0;
 	scsi_run_host_queues(shost);
+	scsi_eh_wakeup(shost);
 }
 
 int __init scsi_init_queue(void)

* Re: host_self_blocked question/bug?
From: Christoph Hellwig @ 2003-11-26 13:44 UTC
To: Patrick Mansfield
Cc: Brian King, linux-scsi

On Tue, Nov 25, 2003 at 04:32:04PM -0800, Patrick Mansfield wrote:
> (There should be locking around all the setting and checking of
> host_self_blocked.)

The same is true for many of the 1-bit bitfields in struct Scsi_Host and
scsi_device; we should probably use the atomic bitmap operations for
those. It's been on my todo list for a while, but I wonder whether
that's enough of an interface change to be postponed to 2.7.x.

Thread overview: 5+ messages
2003-11-25 21:46 host_self_blocked question/bug? Brian King
2003-11-25 21:55 ` James Bottomley
2003-11-25 22:31   ` Brian King
2003-11-26  0:32     ` Patrick Mansfield
2003-11-26 13:44       ` Christoph Hellwig