[PATCH 0/2] : definition, code, and use of new SCSI ML host status DID_COND_REQUEUE

public inbox for linux-scsi@vger.kernel.org

* [PATCH 0/2] : definition, code, and use of new SCSI ML host status DID_COND_REQUEUE
From: Edward Goggin @ 2007-02-02 22:04 UTC
  To: linux-scsi; +Cc: James.Bottomley, eric.moore

Patch Set Summary:

1	Define new SCSI ML host status DID_COND_REQUEUE and
	add its handling code to scsi_decide_disposition.
	scsi_decide_disposition returns ADD_TO_MLQUEUE if and
	only if the request is not marked REQ_FAILFAST.

2	Return DID_COND_REQUEUE instead of DID_BUS_BUSY host status
	in the MPT fusion driver when the IOC status is SUCCESS and
	the SCSI status is BUSY.

Signed-off-by: Ed Goggin <egoggin@vmware.com>


* Re: [PATCH 0/2] : definition, code, and use of new SCSI ML host status DID_COND_REQUEUE
From: James Bottomley @ 2007-02-02 22:54 UTC
  To: Edward Goggin; +Cc: linux-scsi, eric.moore

On Fri, 2007-02-02 at 17:04 -0500, Edward Goggin wrote:
> Patch Set Summary:
> 
> 1	Define new SCSI ML host status DID_COND_REQUEUE and
> 	add its handling code to scsi_decide_disposition.
> 	scsi_decide_disposition returns ADD_TO_MLQUEUE if and
> 	only if the request is not marked REQ_FAILFAST.
> 
> 2	Return DID_COND_REQUEUE instead of DID_BUS_BUSY host status
> 	in the MPT fusion driver when the IOC status is SUCCESS and
> 	the SCSI status is BUSY.

Please, no.

In the first place, as I already said on the previous thread, I don't
think the driver should be interpreting the BUSY return.

Secondly, the original problem was with fibre devices which seemed to
want FAILFAST on BUSY (which looks very bogus to me), but no-one asked
for this behaviour to be preserved.  The original bug report:

        "When a target device responds with BUSY status, the MPT driver
        was sending DID_OK to the SCSI mid layer, which caused the IO to
        be retried indefinitely between the mid layer and the driver.  By
        changing the driver return status to DID_BUS_BUSY, the target BUSY
        status can now flow through the mid layer to an upper layer
        Failover driver, which will manage the I/O timeout."

is about behaviour which is now fixed (BUSY is retried for the command's
maximum lifetime but no longer).

Thirdly, the VMware issue was that the fibre fix was causing your
implementation to time out too fast.

The solution, then, as I said previously, should simply be to pass the
BUSY status up unmodified from the fusion driver.

James


* Re: [PATCH 0/2] : definition, code, and use of new SCSI ML host status DID_COND_REQUEUE
From: Edward Goggin @ 2007-02-02 23:11 UTC
  To: James Bottomley; +Cc: linux-scsi, eric.moore

On Fri, 2007-02-02 at 16:54 -0600, James Bottomley wrote:
> The solution, then, as I said previously should simply be to pass the
> BUSY status up unmodified from the fusion driver.
> 

That solution doesn't work for the RDAC/MPP driver, as the BUSY status
handler retries indefinitely.  We need a solution that works both for a
bare-metal host running RDAC/MPP, which in this use case wants to get
control over the failed command ASAP, and for a VMware host, which may
need to retry longer than DID_BUS_BUSY currently allows.

I'll let the LSI/Engenio people comment further on their needs.



* Re: [PATCH 0/2] : definition, code, and use of new SCSI ML host status DID_COND_REQUEUE
From: James Bottomley @ 2007-02-02 23:18 UTC
  To: Edward Goggin; +Cc: linux-scsi, eric.moore

On Fri, 2007-02-02 at 18:11 -0500, Edward Goggin wrote:
> That solution doesn't work for the RDAC/MPP driver, as the BUSY status
> handler retries indefinitely.  We need a solution that works both for a
> bare-metal host running RDAC/MPP, which in this use case wants to get
> control over the failed command ASAP, and for a VMware host, which may
> need to retry longer than DID_BUS_BUSY currently allows.

No it doesn't, not any longer... the mid-layer retries for the command
up to its timeout before failing.  That's the point about questioning
the validity of the original problem.

James




* Re: [PATCH 0/2] : definition, code, and use of new SCSI ML host status DID_COND_REQUEUE
From: Edward Goggin @ 2007-02-02 23:33 UTC
  To: James Bottomley; +Cc: linux-scsi, eric.moore

On Fri, 2007-02-02 at 17:18 -0600, James Bottomley wrote:
> On Fri, 2007-02-02 at 18:11 -0500, Edward Goggin wrote:
> > That solution doesn't work for the RDAC/MPP driver, as the BUSY status
> > handler retries indefinitely.  We need a solution that works both for a
> > bare-metal host running RDAC/MPP, which in this use case wants to get
> > control over the failed command ASAP, and for a VMware host, which may
> > need to retry longer than DID_BUS_BUSY currently allows.
> 
> No it doesn't, not any longer... the mid-layer retries for the command
> up to its timeout before failing.  That's the point about questioning
> the validity of the original problem.

I think I see your argument: retries for BUSY and all other SCSI/host
statuses are limited by the code in scsi_softirq_done, which filters the
disposition returned by scsi_decide_disposition, so no status will yield
an indefinite retry.

It's not clear whether that's soon enough for RDAC/MPP.  For the VMware
case, it appears to allow an additional 30 seconds (beyond what
DID_BUS_BUSY would allow) for a retry.

