[PATCH 2/5] fusion: vmware bug fix prevent inifinite retries

public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 2/5] fusion: vmware bug fix prevent inifinite retries
@ 2007-01-05  3:46 Eric Moore
  2007-01-06 15:30 ` James Bottomley
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Moore @ 2007-01-05  3:46 UTC (permalink / raw)
  To: linux-scsi, James.Bottomley

Simialar to the previous bug fix, this patch avoids
inifinite retries. This is replicated in Vmware environment
when the guest OS is using the 53c1030 as a emulated storage controller.
This patch will restore original driver behavior for SPI from about
a year ago, though keeping same behavor for FC/SAS for which it
solved a failover issue.

Signed-off-by: Eric Moore <Eric.Moore@lsi.com>

diff -uarpN b/drivers/message/fusion/mptscsih.c a/drivers/message/fusion/mptscsih.c
--- b/drivers/message/fusion/mptscsih.c	2006-12-25 01:51:26.000000000 -0700
+++ a/drivers/message/fusion/mptscsih.c	2006-12-25 01:55:05.000000000 -0700
@@ -779,7 +779,7 @@ mptscsih_io_done(MPT_ADAPTER *ioc, MPT_F
 			sc->resid=0;
 		case MPI_IOCSTATUS_SCSI_RECOVERED_ERROR:	/* 0x0040 */
 		case MPI_IOCSTATUS_SUCCESS:			/* 0x0000 */
-			if (scsi_status == MPI_SCSI_STATUS_BUSY)
+			if (ioc->bus_type != SPI && scsi_status == MPI_SCSI_STATUS_BUSY)
 				sc->result = (DID_BUS_BUSY << 16) | scsi_status;
 			else
 				sc->result = (DID_OK << 16) | scsi_status;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/5] fusion: vmware bug fix prevent inifinite retries
  2007-01-05  3:46 Eric Moore
@ 2007-01-06 15:30 ` James Bottomley
  2007-01-06 16:10   ` Matthew Wilcox
  0 siblings, 1 reply; 15+ messages in thread
From: James Bottomley @ 2007-01-06 15:30 UTC (permalink / raw)
  To: Eric Moore; +Cc: linux-scsi

On Thu, 2007-01-04 at 20:46 -0700, Eric Moore wrote:
> -			if (scsi_status == MPI_SCSI_STATUS_BUSY)
> +			if (ioc->bus_type != SPI && scsi_status == MPI_SCSI_STATUS_BUSY)
>  				sc->result = (DID_BUS_BUSY << 16) | scsi_status;
>  			else
>  				sc->result = (DID_OK << 16) | scsi_status;

DID_BUS_BUSY causes an immediate retry, but it does debit the retry
count, so it shouldn't cause "infinite retries" ... if it does, there's
something else wrong here.

I should also point out that the MPI_SCSI_STATUS_BUSY is
SAM_STAT_BUSY ... this return will cause a queue stop and a requeue, but
it doesn't actually debit the retries, so it *may* cause an infinite
loop if the system is permanently busy.

Finally, whatever's causing this, it should probably be treated the same
for all fusion bus types ...

James

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/5] fusion: vmware bug fix prevent inifinite retries
  2007-01-06 15:30 ` James Bottomley
@ 2007-01-06 16:10   ` Matthew Wilcox
  2007-01-06 16:28     ` James Bottomley
  0 siblings, 1 reply; 15+ messages in thread
From: Matthew Wilcox @ 2007-01-06 16:10 UTC (permalink / raw)
  To: James Bottomley; +Cc: Eric Moore, linux-scsi

On Sat, Jan 06, 2007 at 09:30:45AM -0600, James Bottomley wrote:
> On Thu, 2007-01-04 at 20:46 -0700, Eric Moore wrote:
> > -			if (scsi_status == MPI_SCSI_STATUS_BUSY)
> > +			if (ioc->bus_type != SPI && scsi_status == MPI_SCSI_STATUS_BUSY)
> >  				sc->result = (DID_BUS_BUSY << 16) | scsi_status;
> >  			else
> >  				sc->result = (DID_OK << 16) | scsi_status;
> 
> DID_BUS_BUSY causes an immediate retry, but it does debit the retry
> count, so it shouldn't cause "infinite retries" ... if it does, there's
> something else wrong here.

I wonder if this is the same bug I'm chasing (on ia64 machines,
reproduced with both Montecito and Madison).  

The symptom is a stack overflow caused by this infinite loop:

generic_unplug_device
__generic_unplug_device
  scsi_request_fn [1]
  blk_requeue_request
  elv_requeue_request
  __elv_add_request
__generic_unplug_device
  scsi_request_fn [2]
  blk_requeue_request
  elv_requeue_request
  __elv_add_request
__generic_unplug_device
  scsi_request_fn [3]
  scsi_dispatch_cmd
  scsi_queue_insert
  blk_insert_request
  scsi_request_fn [4]
  blk_plug_device

(stack dump courtesy of incrementing a counter each time through
__generic_unplug_device and checking it in blk_plug_device() and
__generic_unplug_device)

I don't see how it happens; as far as I can tell, by the time we're
going to call blk_plug_device() in scsi_request_fn [4], there's no way
to unplug the queue again before it gets back to scsi_request_fn [3]
... and from the point where we call scsi_dispatch_cmd(), we immediately
either break or test blk_queue_plugged() and exit.  There should be no
way for it to call blk_requeue_request() again.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/5] fusion: vmware bug fix prevent inifinite retries
  2007-01-06 16:10   ` Matthew Wilcox
@ 2007-01-06 16:28     ` James Bottomley
  0 siblings, 0 replies; 15+ messages in thread
From: James Bottomley @ 2007-01-06 16:28 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Eric Moore, linux-scsi

On Sat, 2007-01-06 at 09:10 -0700, Matthew Wilcox wrote:
> On Sat, Jan 06, 2007 at 09:30:45AM -0600, James Bottomley wrote:
> > On Thu, 2007-01-04 at 20:46 -0700, Eric Moore wrote:
> > > -			if (scsi_status == MPI_SCSI_STATUS_BUSY)
> > > +			if (ioc->bus_type != SPI && scsi_status == MPI_SCSI_STATUS_BUSY)
> > >  				sc->result = (DID_BUS_BUSY << 16) | scsi_status;
> > >  			else
> > >  				sc->result = (DID_OK << 16) | scsi_status;
> > 
> > DID_BUS_BUSY causes an immediate retry, but it does debit the retry
> > count, so it shouldn't cause "infinite retries" ... if it does, there's
> > something else wrong here.
> 
> I wonder if this is the same bug I'm chasing (on ia64 machines,
> reproduced with both Montecito and Madison).  

I don't think so ... the done() breaks the stack chain by queueing it up
for the softirq to complete.  In order to get the recursion you see, we
have to go straight to requeue from dispatch_cmd.

At a rough guess I'd say what you're seeing could be a result of the
requeue in scsi_dispatch_cmd caused by the device being in SDEV_BLOCK.

James



^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH 2/5] fusion: vmware bug fix prevent inifinite retries
@ 2007-01-08 22:03 Moore, Eric
  2007-01-08 22:24 ` James Bottomley
  0 siblings, 1 reply; 15+ messages in thread
From: Moore, Eric @ 2007-01-08 22:03 UTC (permalink / raw)
  To: James Bottomley, manon, azimman; +Cc: linux-scsi

[-- Attachment #1: Type: text/plain, Size: 1946 bytes --]

On  Saturday, January 06, 2007 8:31 AM, James Bottomley wrote:

> 
> DID_BUS_BUSY causes an immediate retry, but it does debit the retry
> count, so it shouldn't cause "infinite retries" ... if it 
> does, there's
> something else wrong here.
> 
> I should also point out that the MPI_SCSI_STATUS_BUSY is
> SAM_STAT_BUSY ... this return will cause a queue stop and a 
> requeue, but
> it doesn't actually debit the retries, so it *may* cause an infinite
> loop if the system is permanently busy.
> 
> Finally, whatever's causing this, it should probably be 
> treated the same
> for all fusion bus types ...
> 

James -  I was incorrect in the way I worded this patch.  Please read
further.

Original request came to me an you from Manon Goo <manon@manon.de> on
November 21, see attached.

Here is what VMware says, per Adam Zimman <azimman@vmware.com>:

"VMkernel emulates a 1030/SPI.   Path Failovers induce the vmkernel to
return a BUSY
status for VM initiated SCSI I/O requests.  After a few I/O commands are
returned with
BUSY status, the RHEL VM will make the disk read-only.  The host status
of DID_BUS_BUSY
causes the RHEL scsi error recovery process to retry a BUSY I/O at most
5 times and 
then return an I/O failure upward in the I/O stack. If the I/O request
failed with a scsi 
status of BUSY rather than a host status of DID_BUS_BUSY, the RHEL scsi
error recovery process
would retry the I/O indefinitely."

In the 03.02.19, we add added the current logic for the following
reason:

"When a target device responds with BUSY status, the MPT driver was
sending DID_OK to the 
SCSI mid layer, which caused the IO to be retried indefinitely between
the mid layer and the 
driver.  By changing the driver return status to DID_BUS_BUSY, the
target BUSY status can 
now flow through the mid layer to an upper layer Failover driver, which
will manage the I/O timeout."

Eric 

[-- Attachment #2: Type: message/rfc822, Size: 3958 bytes --]

[-- Attachment #2.1.1: Type: text/plain, Size: 1605 bytes --]

Dear Sirs,

When changing from kernel 2.6.13 to 2.6.14 a change to the mtpscsih.c 
driver was introduced thet changed the behaviour of the driver in respect 
to timeouts.

The introduced changes around line 760.  As far as I undestand this change 
propagets a bussy device running async as a host failture.
This is extremely troublesome when using the mptscsi driver with vmware ESX 
because esx expects the driver to wait when doing SAN pathfailovers or 
going async.
Is there any chance to have the old behavior ?

Thanks in advance
Manon Goo

                        break;

+               case MPI_IOCSTATUS_SCSI_DATA_OVERRUN:           /* 0x0044 */
+                       sc->resid=0;
                case MPI_IOCSTATUS_SCSI_RECOVERED_ERROR:        /* 0x0040 */
                case MPI_IOCSTATUS_SUCCESS:                     /* 0x0000 */
-                       scsi_status = pScsiReply->SCSIStatus;
-                       sc->result = (DID_OK << 16) | scsi_status;
+                       if (scsi_status == MPI_SCSI_STATUS_BUSY)
+                               sc->result = (DID_BUS_BUSY << 16) | 
scsi_status;
+                       else
+                               sc->result = (DID_OK << 16) | scsi_status;
                        if (scsi_state == 0) {
                                ;
                        } else if (scsi_state & 
MPI_SCSI_STATE_AUTOSENSE_VALID) {

Manon Goo
Dembach Goo Informatik GmbH & Co KG
Rathenauplatz 9
D-50674 Köln
Tel: +49 221 801483 0
Mobil: +49 177 8091974
Fax: +49 221 801483 20
Email: manon@dg-i.net

[-- Attachment #2.1.2: Type: application/pgp-signature, Size: 186 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH 2/5] fusion: vmware bug fix prevent inifinite retries
  2007-01-08 22:03 Moore, Eric
@ 2007-01-08 22:24 ` James Bottomley
  0 siblings, 0 replies; 15+ messages in thread
From: James Bottomley @ 2007-01-08 22:24 UTC (permalink / raw)
  To: Moore, Eric; +Cc: manon, azimman, linux-scsi

On Mon, 2007-01-08 at 15:03 -0700, Moore, Eric wrote:
> In the 03.02.19, we add added the current logic for the following
> reason:
> 
> "When a target device responds with BUSY status, the MPT driver was
> sending DID_OK to the 
> SCSI mid layer, which caused the IO to be retried indefinitely between
> the mid layer and the 
> driver.  By changing the driver return status to DID_BUS_BUSY, the
> target BUSY status can 
> now flow through the mid layer to an upper layer Failover driver, which
> will manage the I/O timeout."

Right, I sort of suspected something like this.  BUSY/QUEUE_FULL
handling was a bit iffy in 2.4; but it was sorted out in the 2003/4
timeframe.  Nowadays, I think you want to translate the
MPI_SCSI_STATUS_BUSY directly to SAM_STAT_BUSY (i.e. just remove the
special casing if).

James



^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH 2/5] fusion: vmware bug fix prevent inifinite retries
@ 2007-01-09  1:37 Moore, Eric
  2007-01-09 16:17 ` Michael Reed
  0 siblings, 1 reply; 15+ messages in thread
From: Moore, Eric @ 2007-01-09  1:37 UTC (permalink / raw)
  To: James Bottomley
  Cc: manon, azimman, linux-scsi, Shirron, Stephen, Michael Reed

On  Monday, January 08, 2007 3:25 PM, James Bottomley wrote:

> Right, I sort of suspected something like this.  BUSY/QUEUE_FULL
> handling was a bit iffy in 2.4; but it was sorted out in the 2003/4
> timeframe.  Nowadays, I think you want to translate the
> MPI_SCSI_STATUS_BUSY directly to SAM_STAT_BUSY (i.e. just remove the
> special casing if).
> 

I think your'e on the same page with the folks from VMware,
where the've asked us to go back to our old driver code.
Meaning we kill the check for "MPI_SCSI_STATUS_BUSY", instead the sam
status
is sent back "as is" without changing the DID_OK to DID_BUS_BUSY, etc. 

My problem with that is whether is breaks the Fibre Channel Folks. 
Will FC failover solution work properly if we go back to the old code?
I add Stephen Shirron and Mike Reed. 
I don't know.   Here is an explanation why that fix was needed back
about a year ago:

"When a target device responds with BUSY status, the MPT driver was
sending DID_OK to the 
SCSI mid layer, which caused the IO to be retried indefinitely between
the mid layer and the 
driver.  By changing the driver return status to DID_BUS_BUSY, the
target BUSY status can 
now flow through the mid layer to an upper layer Failover driver, which
will manage the I/O timeout."

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/5] fusion: vmware bug fix prevent inifinite retries
  2007-01-09  1:37 [PATCH 2/5] fusion: vmware bug fix prevent inifinite retries Moore, Eric
@ 2007-01-09 16:17 ` Michael Reed
  2007-01-09 17:49   ` Manon Goo
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Reed @ 2007-01-09 16:17 UTC (permalink / raw)
  To: Moore, Eric; +Cc: James Bottomley, manon, azimman, linux-scsi, Shirron, Stephen



Moore, Eric wrote:
> On  Monday, January 08, 2007 3:25 PM, James Bottomley wrote:
> 
>> Right, I sort of suspected something like this.  BUSY/QUEUE_FULL
>> handling was a bit iffy in 2.4; but it was sorted out in the 2003/4
>> timeframe.  Nowadays, I think you want to translate the
>> MPI_SCSI_STATUS_BUSY directly to SAM_STAT_BUSY (i.e. just remove the
>> special casing if).

Christoph put in code to limit a command's lifetime to prevent infinite
loops in the case of QUEUE_FULL and BUSY.  (See scsi_softirq_done()
for implementation.)

DID_OK / COMMAND_COMPLETE / BUSY results in a ADD_TO_MLQUEUE for a retry,
same as QUEUE_FULL.  I don't infinite retries, just a whole lot of them.
See scsi_decide_disposition().

Mike

>>
> 
> I think your'e on the same page with the folks from VMware,
> where the've asked us to go back to our old driver code.
> Meaning we kill the check for "MPI_SCSI_STATUS_BUSY", instead the sam
> status
> is sent back "as is" without changing the DID_OK to DID_BUS_BUSY, etc. 
> 
> My problem with that is whether is breaks the Fibre Channel Folks. 
> Will FC failover solution work properly if we go back to the old code?
> I add Stephen Shirron and Mike Reed. 
> I don't know.   Here is an explanation why that fix was needed back
> about a year ago:
> 
> 
> "When a target device responds with BUSY status, the MPT driver was
> sending DID_OK to the 
> SCSI mid layer, which caused the IO to be retried indefinitely between
> the mid layer and the 
> driver.  By changing the driver return status to DID_BUS_BUSY, the
> target BUSY status can 
> now flow through the mid layer to an upper layer Failover driver, which
> will manage the I/O timeout."
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/5] fusion: vmware bug fix prevent inifinite retries
  2007-01-09 16:17 ` Michael Reed
@ 2007-01-09 17:49   ` Manon Goo
  0 siblings, 0 replies; 15+ messages in thread
From: Manon Goo @ 2007-01-09 17:49 UTC (permalink / raw)
  To: Michael Reed, Moore, Eric, David Berghoff
  Cc: James Bottomley, azimman, linux-scsi, Shirron, Stephen

[-- Attachment #1: Type: text/plain, Size: 3719 bytes --]

Hmm .... why don't w make the whole thing configurable (david implemented 
this for us)


+/*
+ *  cmd line parameters
+ */
+static int mpt_mpi_busy;
+module_param(mpt_mpi_busy, int, 0);
+MODULE_PARM_DESC(mpt_mpi_busy, " MPT MPI busy workaround for VMWare ESX 
(default=0)");
+
 /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/

 typedef struct _BIG_SENSE_BUF {
@@ -704,10 +711,13 @@
                        sc->resid=0;
                case MPI_IOCSTATUS_SCSI_RECOVERED_ERROR:        /* 0x0040 */
                case MPI_IOCSTATUS_SUCCESS:                     /* 0x0000 */
-                       if (scsi_status == MPI_SCSI_STATUS_BUSY)
+                       if ((scsi_status == MPI_SCSI_STATUS_BUSY) && 
!mpt_mpi_busy)
                                sc->result = (DID_BUS_BUSY << 16) | 
scsi_status;
-                       else
+                       else {
+                                if (mpt_mpi_busy)
+                                        printk(KERN_INFO "MPT MPI ESX busy 
hack enabled ... waiting\n");
                                sc->result = (DID_OK << 16) | scsi_status;
+                        }
                        if (scsi_state == 0) {
                                ;
                        } else if (scsi_state & 
MPI_SCSI_STATE_AUTOSENSE_VALID) {



The prink(KERN... could be set if mpt_mpi_busy == 2 to make debugging of 
the situation optional

Manon

--On 9. Januar 2007 10:17:17 -0600 Michael Reed <mdr@sgi.com> wrote:

>
>
> Moore, Eric wrote:
>> On  Monday, January 08, 2007 3:25 PM, James Bottomley wrote:
>>
>>> Right, I sort of suspected something like this.  BUSY/QUEUE_FULL
>>> handling was a bit iffy in 2.4; but it was sorted out in the 2003/4
>>> timeframe.  Nowadays, I think you want to translate the
>>> MPI_SCSI_STATUS_BUSY directly to SAM_STAT_BUSY (i.e. just remove the
>>> special casing if).
>
> Christoph put in code to limit a command's lifetime to prevent infinite
> loops in the case of QUEUE_FULL and BUSY.  (See scsi_softirq_done()
> for implementation.)
>
> DID_OK / COMMAND_COMPLETE / BUSY results in a ADD_TO_MLQUEUE for a retry,
> same as QUEUE_FULL.  I don't infinite retries, just a whole lot of them.
> See scsi_decide_disposition().
>
> Mike
>
>>>
>>
>> I think your'e on the same page with the folks from VMware,
>> where the've asked us to go back to our old driver code.
>> Meaning we kill the check for "MPI_SCSI_STATUS_BUSY", instead the sam
>> status
>> is sent back "as is" without changing the DID_OK to DID_BUS_BUSY, etc.
>>
>> My problem with that is whether is breaks the Fibre Channel Folks.
>> Will FC failover solution work properly if we go back to the old code?
>> I add Stephen Shirron and Mike Reed.
>> I don't know.   Here is an explanation why that fix was needed back
>> about a year ago:
>>
>>
>> "When a target device responds with BUSY status, the MPT driver was
>> sending DID_OK to the
>> SCSI mid layer, which caused the IO to be retried indefinitely between
>> the mid layer and the
>> driver.  By changing the driver return status to DID_BUS_BUSY, the
>> target BUSY status can
>> now flow through the mid layer to an upper layer Failover driver, which
>> will manage the I/O timeout."
>>
>> -
>> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>



Manon Goo
Dembach Goo Informatik GmbH & Co KG
Rathenauplatz 9
D-50674 Köln
Tel: +49 221 801483 0
Mobil: +49 177 8091974
Fax: +49 221 801483 20
Email: manon@dg-i.net


[-- Attachment #2: Type: application/pgp-signature, Size: 186 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH 2/5] fusion: vmware bug fix prevent inifinite retries
@ 2007-01-09 18:14 Adam Zimman
  2007-01-09 20:55 ` Petr Vandrovec
  0 siblings, 1 reply; 15+ messages in thread
From: Adam Zimman @ 2007-01-09 18:14 UTC (permalink / raw)
  To: Manon Goo, Michael Reed, Moore, Eric, David Berghoff,
	Petr Vandrovec, Vicky Xu
  Cc: James Bottomley, linux-scsi, Shirron, Stephen

Adding VMware engineering... 

-----Original Message-----
From: Manon Goo [mailto:manon@manon.de] 
Sent: Tuesday, January 09, 2007 9:49 AM
To: Michael Reed; Moore, Eric; David Berghoff
Cc: James Bottomley; Adam Zimman; linux-scsi@vger.kernel.org; Shirron,
Stephen
Subject: Re: [PATCH 2/5] fusion: vmware bug fix prevent inifinite retries

Hmm .... why don't w make the whole thing configurable (david implemented
this for us)


+/*
+ *  cmd line parameters
+ */
+static int mpt_mpi_busy;
+module_param(mpt_mpi_busy, int, 0);
+MODULE_PARM_DESC(mpt_mpi_busy, " MPT MPI busy workaround for VMWare ESX 
(default=0)");
+
 
/*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
*/

 typedef struct _BIG_SENSE_BUF {
@@ -704,10 +711,13 @@
                        sc->resid=0;
                case MPI_IOCSTATUS_SCSI_RECOVERED_ERROR:        /* 0x0040 */
                case MPI_IOCSTATUS_SUCCESS:                     /* 0x0000 */
-                       if (scsi_status == MPI_SCSI_STATUS_BUSY)
+                       if ((scsi_status == MPI_SCSI_STATUS_BUSY) && 
!mpt_mpi_busy)
                                sc->result = (DID_BUS_BUSY << 16) | 
scsi_status;
-                       else
+                       else {
+                                if (mpt_mpi_busy)
+                                        printk(KERN_INFO "MPT MPI ESX busy 
hack enabled ... waiting\n");
                                sc->result = (DID_OK << 16) | scsi_status;
+                        }
                        if (scsi_state == 0) {
                                ;
                        } else if (scsi_state & 
MPI_SCSI_STATE_AUTOSENSE_VALID) {



The prink(KERN... could be set if mpt_mpi_busy == 2 to make debugging of 
the situation optional

Manon

--On 9. Januar 2007 10:17:17 -0600 Michael Reed <mdr@sgi.com> wrote:

>
>
> Moore, Eric wrote:
>> On  Monday, January 08, 2007 3:25 PM, James Bottomley wrote:
>>
>>> Right, I sort of suspected something like this.  BUSY/QUEUE_FULL
>>> handling was a bit iffy in 2.4; but it was sorted out in the 2003/4
>>> timeframe.  Nowadays, I think you want to translate the
>>> MPI_SCSI_STATUS_BUSY directly to SAM_STAT_BUSY (i.e. just remove the
>>> special casing if).
>
> Christoph put in code to limit a command's lifetime to prevent infinite
> loops in the case of QUEUE_FULL and BUSY.  (See scsi_softirq_done()
> for implementation.)
>
> DID_OK / COMMAND_COMPLETE / BUSY results in a ADD_TO_MLQUEUE for a retry,
> same as QUEUE_FULL.  I don't infinite retries, just a whole lot of them.
> See scsi_decide_disposition().
>
> Mike
>
>>>
>>
>> I think your'e on the same page with the folks from VMware,
>> where the've asked us to go back to our old driver code.
>> Meaning we kill the check for "MPI_SCSI_STATUS_BUSY", instead the sam
>> status
>> is sent back "as is" without changing the DID_OK to DID_BUS_BUSY, etc.
>>
>> My problem with that is whether is breaks the Fibre Channel Folks.
>> Will FC failover solution work properly if we go back to the old code?
>> I add Stephen Shirron and Mike Reed.
>> I don't know.   Here is an explanation why that fix was needed back
>> about a year ago:
>>
>>
>> "When a target device responds with BUSY status, the MPT driver was
>> sending DID_OK to the
>> SCSI mid layer, which caused the IO to be retried indefinitely between
>> the mid layer and the
>> driver.  By changing the driver return status to DID_BUS_BUSY, the
>> target BUSY status can
>> now flow through the mid layer to an upper layer Failover driver, which
>> will manage the I/O timeout."
>>
>> -
>> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>



Manon Goo
Dembach Goo Informatik GmbH & Co KG
Rathenauplatz 9
D-50674 Köln
Tel: +49 221 801483 0
Mobil: +49 177 8091974
Fax: +49 221 801483 20
Email: manon@dg-i.net

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/5] fusion: vmware bug fix prevent inifinite retries
  2007-01-09 18:14 Adam Zimman
@ 2007-01-09 20:55 ` Petr Vandrovec
  2007-01-09 21:32   ` Edward Goggin
  0 siblings, 1 reply; 15+ messages in thread
From: Petr Vandrovec @ 2007-01-09 20:55 UTC (permalink / raw)
  To: Adam Zimman
  Cc: Manon Goo, Michael Reed, Moore, Eric, David Berghoff, Vicky Xu,
	James Bottomley, linux-scsi, Shirron, Stephen

[-- Attachment #1: Type: text/plain, Size: 2106 bytes --]

Adam Zimman wrote:
> Adding VMware engineering... 
> 
> -----Original Message-----
> From: Manon Goo [mailto:manon@manon.de] 
> Sent: Tuesday, January 09, 2007 9:49 AM
> To: Michael Reed; Moore, Eric; David Berghoff
> Cc: James Bottomley; Adam Zimman; linux-scsi@vger.kernel.org; Shirron, Stephen
> Subject: Re: [PATCH 2/5] fusion: vmware bug fix prevent inifinite retries
> 
> Hmm .... why don't w make the whole thing configurable (david implemented this for us)

In that case I would prefer going on with automatic detection of our 
implementation - see first part of attached mpt-patch.diff I offered 
when this all started.  As passing options to modules loaded by initrd 
is not exactly trivial and varies across distributions, I would prefer 
this runtime detection...
				Thanks,
					Petr Vandrovec

> 
> +/*
> + *  cmd line parameters
> + */
> +static int mpt_mpi_busy;
> +module_param(mpt_mpi_busy, int, 0);
> +MODULE_PARM_DESC(mpt_mpi_busy, " MPT MPI busy workaround for VMWare ESX 
> (default=0)");
> +
>  /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
> 
>  typedef struct _BIG_SENSE_BUF {
> @@ -704,10 +711,13 @@
>                         sc->resid=0;
>                 case MPI_IOCSTATUS_SCSI_RECOVERED_ERROR:        /* 0x0040 */
>                 case MPI_IOCSTATUS_SUCCESS:                     /* 0x0000 */
> -                       if (scsi_status == MPI_SCSI_STATUS_BUSY)
> +                       if ((scsi_status == MPI_SCSI_STATUS_BUSY) && 
> !mpt_mpi_busy)
>                                 sc->result = (DID_BUS_BUSY << 16) | 
> scsi_status;
> -                       else
> +                       else {
> +                                if (mpt_mpi_busy)
> +                                        printk(KERN_INFO "MPT MPI ESX busy 
> hack enabled ... waiting\n");
>                                 sc->result = (DID_OK << 16) | scsi_status;
> +                        }
>                         if (scsi_state == 0) {
>                                 ;
>                         } else if (scsi_state & 
> MPI_SCSI_STATE_AUTOSENSE_VALID) {

[-- Attachment #2: mpt-patch.diff --]
[-- Type: text/plain, Size: 3079 bytes --]

Patch to fix timeouts during multi-path failover with LSILogic adapter on RedHat 4.0 Update 3 & 4.
Patch is needed for other distributions as well - for example current Linus's 2.6.19 kernel needs it.


diff -urN linux-2.6.9.orig/drivers/message/fusion/mptbase.c linux-2.6.9/drivers/message/fusion/mptbase.c
--- linux-2.6.9.orig/drivers/message/fusion/mptbase.c	2006-12-15 11:13:47.000000000 -0800
+++ linux-2.6.9/drivers/message/fusion/mptbase.c	2006-12-15 11:43:01.000000000 -0800
@@ -1435,12 +1435,21 @@
 		ioc->bus_type = SPI;
 		/* 1030 Chip Fix. Disable Split transactions
 		 * for PCIX. Set MOST bits to zero if Rev < C0( = 8).
+		 *
+		 * Also detect VMware's LSILogic emulation - it does
+		 * not have PCI-X capability at offset 0x68 (and does
+		 * not need disabling split transactions although it
+		 * reports itself as revision 1).
 		 */
 		pci_read_config_byte(pdev, PCI_CLASS_REVISION, &revision);
 		if (revision < C0_1030) {
-			pci_read_config_byte(pdev, 0x6a, &pcixcmd);
-			pcixcmd &= 0x8F;
-			pci_write_config_byte(pdev, 0x6a, pcixcmd);
+			if (pci_find_capability(pdev, PCI_CAP_ID_PCIX) == 0x68) {
+				pci_read_config_byte(pdev, 0x6a, &pcixcmd);
+				pcixcmd &= 0x8F;
+				pci_write_config_byte(pdev, 0x6a, pcixcmd);
+			} else {
+				ioc->isVMware = 1;
+			}
 		}
 	}
 	else if (pdev->device == MPI_MANUFACTPAGE_DEVID_1030_53C1035) {
diff -urN linux-2.6.9.orig/drivers/message/fusion/mptbase.h linux-2.6.9/drivers/message/fusion/mptbase.h
--- linux-2.6.9.orig/drivers/message/fusion/mptbase.h	2006-12-15 11:13:47.000000000 -0800
+++ linux-2.6.9/drivers/message/fusion/mptbase.h	2006-12-15 11:40:47.000000000 -0800
@@ -673,7 +673,8 @@
 	u8			 upload_fw;	/* If set, do a fw upload */
 	u8			 reload_fw;	/* Force a FW Reload on next reset */
 	u8			 NBShiftFactor;  /* NB Shift Factor based on Block Size (Facts)  */
-	u8			 pad1[4];
+	u8			 isVMware;
+	u8			 pad1[3];
 	int			 DoneCtx;
 	int			 TaskCtx;
 	int			 InternalCtx;
diff -urN linux-2.6.9.orig/drivers/message/fusion/mptscsi.c linux-2.6.9/drivers/message/fusion/mptscsi.c
--- linux-2.6.9.orig/drivers/message/fusion/mptscsi.c	2006-12-15 11:13:47.000000000 -0800
+++ linux-2.6.9/drivers/message/fusion/mptscsi.c	2006-12-15 11:49:18.000000000 -0800
@@ -773,7 +773,12 @@
 			sc->resid=0;
 		case MPI_IOCSTATUS_SCSI_RECOVERED_ERROR:	/* 0x0040 */
 		case MPI_IOCSTATUS_SUCCESS:			/* 0x0000 */
-			if (scsi_status == MPI_SCSI_STATUS_BUSY)
+			/*
+			 * In the case of emulated adapter busy status may be reported 
+			 * for minutes when storage path switch occurs in the firmware.
+			 * We definitely do not want to give up after standard timeout.
+			 */
+			if (scsi_status == MPI_SCSI_STATUS_BUSY && !ioc->isVMware)
 				sc->result = (DID_BUS_BUSY << 16) | scsi_status;
 			else
 				sc->result = (DID_OK << 16) | scsi_status;
@@ -810,6 +815,7 @@
 				 * Not real sure here either so do nothing...  */
 			}
 
+			/* Perhaps this wanted to test scsi_status and not sc->result? */
 			if (sc->result == MPI_SCSI_STATUS_TASK_SET_FULL)
 				mptscsih_report_queue_full(sc, pScsiReply, pScsiReq);
 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/5] fusion: vmware bug fix prevent inifinite retries
  2007-01-09 20:55 ` Petr Vandrovec
@ 2007-01-09 21:32   ` Edward Goggin
  2007-01-10 16:10     ` James Bottomley
  0 siblings, 1 reply; 15+ messages in thread
From: Edward Goggin @ 2007-01-09 21:32 UTC (permalink / raw)
  To: linux-scsi
  Cc: Adam Zimman, Petr Vandrovec, dgreen, Manon Goo, Michael Reed,
	Moore, Eric, David Berghoff, Vicky Xu, James Bottomley,
	Shirron, Stephen

[-- Attachment #1: Type: text/plain, Size: 6489 bytes --]

On Tue, 2007-01-09 at 12:55 -0800, Petr Vandrovec wrote:
> Adam Zimman wrote:
> > Adding VMware engineering... 
> > 
> > -----Original Message-----
> > From: Manon Goo [mailto:manon@manon.de] 
> > Sent: Tuesday, January 09, 2007 9:49 AM
> > To: Michael Reed; Moore, Eric; David Berghoff
> > Cc: James Bottomley; Adam Zimman; linux-scsi@vger.kernel.org; Shirron, Stephen
> > Subject: Re: [PATCH 2/5] fusion: vmware bug fix prevent inifinite retries
> > 
> > Hmm .... why don't w make the whole thing configurable (david implemented this for us)
> 
> In that case I would prefer going on with automatic detection of our 
> implementation - see first part of attached mpt-patch.diff I offered 
> when this all started.  As passing options to modules loaded by initrd 
> is not exactly trivial and varies across distributions, I would prefer 
> this runtime detection...
> 				Thanks,
> 					Petr Vandrovec
> 

The attached (untested) patch shows a VMware and scsi transport agnostic
approach which introduces a new host status (DID_QUALIFIED_REQUEUE) to
be used by mptscsih.c (and other LLDs) instead of DID_BUS_BUSY.  A host
status of DID_QUALIFIED_REQUEUE will return ADD_TO_MLQUEUE from
scsi_decide_disposition IFF the REQ_FAILFAST bit is not set in the
cmd_flags field of the SCSI command's request structure.

The approach depends on both VMware Linux guests not setting
REQ_FAILFAST and non-VMware Linux hosts with an IBM RDAC/MPP multi-
pathing driver doing so.  This requirement is not a problem for VMware
since its guest operating systems have no need to configure block device
multi-pathing.  This requirement shouldn't be a problem for the IBM
RDAC/MPP driver either since it should already be setting the
REQ_FAILFAST attribute of I/Os for which it is providing multi-pathing,
similar to what the Linux dm-multipath driver already does.

Ed Goggin

> > 
> > +/*
> > + *  cmd line parameters
> > + */
> > +static int mpt_mpi_busy;
> > +module_param(mpt_mpi_busy, int, 0);
> > +MODULE_PARM_DESC(mpt_mpi_busy, " MPT MPI busy workaround for VMWare ESX 
> > (default=0)");
> > +
> >  /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
> > 
> >  typedef struct _BIG_SENSE_BUF {
> > @@ -704,10 +711,13 @@
> >                         sc->resid=0;
> >                 case MPI_IOCSTATUS_SCSI_RECOVERED_ERROR:        /* 0x0040 */
> >                 case MPI_IOCSTATUS_SUCCESS:                     /* 0x0000 */
> > -                       if (scsi_status == MPI_SCSI_STATUS_BUSY)
> > +                       if ((scsi_status == MPI_SCSI_STATUS_BUSY) && 
> > !mpt_mpi_busy)
> >                                 sc->result = (DID_BUS_BUSY << 16) | 
> > scsi_status;
> > -                       else
> > +                       else {
> > +                                if (mpt_mpi_busy)
> > +                                        printk(KERN_INFO "MPT MPI ESX busy 
> > hack enabled ... waiting\n");
> >                                 sc->result = (DID_OK << 16) | scsi_status;
> > +                        }
> >                         if (scsi_state == 0) {
> >                                 ;
> >                         } else if (scsi_state & 
> > MPI_SCSI_STATE_AUTOSENSE_VALID) {
> plain text document attachment (mpt-patch.diff)
> Patch to fix timeouts during multi-path failover with LSILogic adapter on RedHat 4.0 Update 3 & 4.
> Patch is needed for other distributions as well - for example current Linus's 2.6.19 kernel needs it.
> 
> 
> diff -urN linux-2.6.9.orig/drivers/message/fusion/mptbase.c linux-2.6.9/drivers/message/fusion/mptbase.c
> --- linux-2.6.9.orig/drivers/message/fusion/mptbase.c	2006-12-15 11:13:47.000000000 -0800
> +++ linux-2.6.9/drivers/message/fusion/mptbase.c	2006-12-15 11:43:01.000000000 -0800
> @@ -1435,12 +1435,21 @@
>  		ioc->bus_type = SPI;
>  		/* 1030 Chip Fix. Disable Split transactions
>  		 * for PCIX. Set MOST bits to zero if Rev < C0( = 8).
> +		 *
> +		 * Also detect VMware's LSILogic emulation - it does
> +		 * not have PCI-X capability at offset 0x68 (and does
> +		 * not need disabling split transactions although it
> +		 * reports itself as revision 1).
>  		 */
>  		pci_read_config_byte(pdev, PCI_CLASS_REVISION, &revision);
>  		if (revision < C0_1030) {
> -			pci_read_config_byte(pdev, 0x6a, &pcixcmd);
> -			pcixcmd &= 0x8F;
> -			pci_write_config_byte(pdev, 0x6a, pcixcmd);
> +			if (pci_find_capability(pdev, PCI_CAP_ID_PCIX) == 0x68) {
> +				pci_read_config_byte(pdev, 0x6a, &pcixcmd);
> +				pcixcmd &= 0x8F;
> +				pci_write_config_byte(pdev, 0x6a, pcixcmd);
> +			} else {
> +				ioc->isVMware = 1;
> +			}
>  		}
>  	}
>  	else if (pdev->device == MPI_MANUFACTPAGE_DEVID_1030_53C1035) {
> diff -urN linux-2.6.9.orig/drivers/message/fusion/mptbase.h linux-2.6.9/drivers/message/fusion/mptbase.h
> --- linux-2.6.9.orig/drivers/message/fusion/mptbase.h	2006-12-15 11:13:47.000000000 -0800
> +++ linux-2.6.9/drivers/message/fusion/mptbase.h	2006-12-15 11:40:47.000000000 -0800
> @@ -673,7 +673,8 @@
>  	u8			 upload_fw;	/* If set, do a fw upload */
>  	u8			 reload_fw;	/* Force a FW Reload on next reset */
>  	u8			 NBShiftFactor;  /* NB Shift Factor based on Block Size (Facts)  */
> -	u8			 pad1[4];
> +	u8			 isVMware;
> +	u8			 pad1[3];
>  	int			 DoneCtx;
>  	int			 TaskCtx;
>  	int			 InternalCtx;
> diff -urN linux-2.6.9.orig/drivers/message/fusion/mptscsi.c linux-2.6.9/drivers/message/fusion/mptscsi.c
> --- linux-2.6.9.orig/drivers/message/fusion/mptscsi.c	2006-12-15 11:13:47.000000000 -0800
> +++ linux-2.6.9/drivers/message/fusion/mptscsi.c	2006-12-15 11:49:18.000000000 -0800
> @@ -773,7 +773,12 @@
>  			sc->resid=0;
>  		case MPI_IOCSTATUS_SCSI_RECOVERED_ERROR:	/* 0x0040 */
>  		case MPI_IOCSTATUS_SUCCESS:			/* 0x0000 */
> -			if (scsi_status == MPI_SCSI_STATUS_BUSY)
> +			/*
> +			 * In the case of emulated adapter busy status may be reported 
> +			 * for minutes when storage path switch occurs in the firmware.
> +			 * We definitely do not want to give up after standard timeout.
> +			 */
> +			if (scsi_status == MPI_SCSI_STATUS_BUSY && !ioc->isVMware)
>  				sc->result = (DID_BUS_BUSY << 16) | scsi_status;
>  			else
>  				sc->result = (DID_OK << 16) | scsi_status;
> @@ -810,6 +815,7 @@
>  				 * Not real sure here either so do nothing...  */
>  			}
>  
> +			/* Perhaps this wanted to test scsi_status and not sc->result? */
>  			if (sc->result == MPI_SCSI_STATUS_TASK_SET_FULL)
>  				mptscsih_report_queue_full(sc, pScsiReply, pScsiReq);
>  

[-- Attachment #2: vmware_patch --]
[-- Type: text/x-patch, Size: 2008 bytes --]

diff -ru linux-2.6.20-rc3/drivers/message/fusion/mptscsih.c linux-2.6.20-rc3.vmware_patch/drivers/message/fusion/mptscsih.c
--- linux-2.6.20-rc3/drivers/message/fusion/mptscsih.c	2007-01-09 15:10:28.000019000 -0500
+++ linux-2.6.20-rc3.vmware_patch/drivers/message/fusion/mptscsih.c	2007-01-09 15:46:26.000105000 -0500
@@ -769,7 +769,7 @@
 		case MPI_IOCSTATUS_SCSI_RECOVERED_ERROR:	/* 0x0040 */
 		case MPI_IOCSTATUS_SUCCESS:			/* 0x0000 */
 			if (scsi_status == MPI_SCSI_STATUS_BUSY)
-				sc->result = (DID_BUS_BUSY << 16) | scsi_status;
+				sc->result = (DID_QUALIFIED_REQUEUE << 16) | scsi_status;
 			else
 				sc->result = (DID_OK << 16) | scsi_status;
 			if (scsi_state == 0) {
diff -ru linux-2.6.20-rc3/drivers/scsi/scsi_error.c linux-2.6.20-rc3.vmware_patch/drivers/scsi/scsi_error.c
--- linux-2.6.20-rc3/drivers/scsi/scsi_error.c	2007-01-09 14:40:16.000022000 -0500
+++ linux-2.6.20-rc3.vmware_patch/drivers/scsi/scsi_error.c	2007-01-09 15:55:48.000051000 -0500
@@ -1216,6 +1216,14 @@
 	case DID_IMM_RETRY:
 		return NEEDS_RETRY;
 
+	case DID_QUALIFIED_REQUEUE:
+		/*
+		 * Return immediately w/o requeue if the request 
+		 * indicates no retry.
+		 */
+		if (blk_noretry_request(scmd->request)) {
+			return SUCCESS;
+		}
 	case DID_REQUEUE:
 		return ADD_TO_MLQUEUE;
 
diff -ru linux-2.6.20-rc3/include/scsi/scsi.h linux-2.6.20-rc3.vmware_patch/include/scsi/scsi.h
--- linux-2.6.20-rc3/include/scsi/scsi.h	2007-01-09 15:07:30.000009000 -0500
+++ linux-2.6.20-rc3.vmware_patch/include/scsi/scsi.h	2007-01-09 15:45:41.000075000 -0500
@@ -309,6 +309,8 @@
 #define DID_IMM_RETRY   0x0c	/* Retry without decrementing retry count  */
 #define DID_REQUEUE	0x0d	/* Requeue command (no immediate retry) also
 				 * without decrementing the retry count	   */
+#define DID_QUALIFIED_REQUEUE 0x0e /* Requeue cmd w/o decr of retry count IFF
+				 * not blk_noretry_request		   */
 #define DRIVER_OK       0x00	/* Driver status                           */
 
 /*

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH 2/5] fusion: vmware bug fix prevent inifinite retries
@ 2007-01-10  5:44 Moore, Eric
  0 siblings, 0 replies; 15+ messages in thread
From: Moore, Eric @ 2007-01-10  5:44 UTC (permalink / raw)
  To: Edward Goggin, linux-scsi
  Cc: Adam Zimman, Petr Vandrovec, dgreen, Manon Goo, Michael Reed,
	David Berghoff, Vicky Xu, James Bottomley, Shirron, Stephen

On Tuesday, January 09, 2007 2:33 PM, Edward Goggin wrote: 

> multi-pathing.  This requirement shouldn't be a problem for the IBM
> RDAC/MPP driver either since it should already be setting the
> REQ_FAILFAST attribute of I/Os for which it is providing 
> multi-pathing,
> similar to what the Linux dm-multipath driver already does.
> 

This approach seems to be the best.   The other one from Petr is
good as well, however I've not had time today to verify whether 
this detection algorithm will work properly, as we have built pcie 1030
parts,
and its checking for pcix compatibilty at offset 0x68.

With regards to this patch, I need to contact the mpp folks to see
whether they are setting the REQ_FAILFAST attribute.   I will
followup shortly.

Eric 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/5] fusion: vmware bug fix prevent inifinite retries
  2007-01-09 21:32   ` Edward Goggin
@ 2007-01-10 16:10     ` James Bottomley
  2007-01-10 16:44       ` Edward Goggin
  0 siblings, 1 reply; 15+ messages in thread
From: James Bottomley @ 2007-01-10 16:10 UTC (permalink / raw)
  To: Edward Goggin
  Cc: linux-scsi, Adam Zimman, Petr Vandrovec, dgreen, Manon Goo,
	Michael Reed, Moore, Eric, David Berghoff, Vicky Xu,
	Shirron, Stephen

On Tue, 2007-01-09 at 16:32 -0500, Edward Goggin wrote:
> The attached (untested) patch shows a VMware and scsi transport agnostic
> approach which introduces a new host status (DID_QUALIFIED_REQUEUE) to
> be used by mptscsih.c (and other LLDs) instead of DID_BUS_BUSY.  A host
> status of DID_QUALIFIED_REQUEUE will return ADD_TO_MLQUEUE from
> scsi_decide_disposition IFF the REQ_FAILFAST bit is not set in the
> cmd_flags field of the SCSI command's request structure.
> 
> The approach depends on both VMware Linux guests not setting
> REQ_FAILFAST and non-VMware Linux hosts with an IBM RDAC/MPP multi-
> pathing driver doing so.  This requirement is not a problem for VMware
> since its guest operating systems have no need to configure block device
> multi-pathing.  This requirement shouldn't be a problem for the IBM
> RDAC/MPP driver either since it should already be setting the
> REQ_FAILFAST attribute of I/Os for which it is providing multi-pathing,
> similar to what the Linux dm-multipath driver already does.

Not in the driver, please ...

the SAM status BUSY is a well known one for array controllers to return
while contemplating a failover.  Thus, if we think this is the issue,
the mid-layer should be the entity to pass the status through on
REQ_FAILFAST not the driver (i.e. pass SAM_STAT_BUSY through unmodified
and alter the mid-layer).  However, I'd be unhappy about doing this:
BUSY is a standard return for a lot of controllers for transient
resource conditions, which wouldn't necessarily be alleviated on path
failover.

James



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/5] fusion: vmware bug fix prevent inifinite retries
  2007-01-10 16:10     ` James Bottomley
@ 2007-01-10 16:44       ` Edward Goggin
  0 siblings, 0 replies; 15+ messages in thread
From: Edward Goggin @ 2007-01-10 16:44 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Adam Zimman, Petr Vandrovec, dgreen, Manon Goo,
	Michael Reed, Moore, Eric, David Berghoff, Vicky Xu,
	Shirron, Stephen

On Wed, 2007-01-10 at 08:10 -0800, James Bottomley wrote:
> On Tue, 2007-01-09 at 16:32 -0500, Edward Goggin wrote:
> > The attached (untested) patch shows a VMware and scsi transport agnostic
> > approach which introduces a new host status (DID_QUALIFIED_REQUEUE) to
> > be used by mptscsih.c (and other LLDs) instead of DID_BUS_BUSY.  A host
> > status of DID_QUALIFIED_REQUEUE will return ADD_TO_MLQUEUE from
> > scsi_decide_disposition IFF the REQ_FAILFAST bit is not set in the
> > cmd_flags field of the SCSI command's request structure.
> > 
> > The approach depends on both VMware Linux guests not setting
> > REQ_FAILFAST and non-VMware Linux hosts with an IBM RDAC/MPP multi-
> > pathing driver doing so.  This requirement is not a problem for VMware
> > since its guest operating systems have no need to configure block device
> > multi-pathing.  This requirement shouldn't be a problem for the IBM
> > RDAC/MPP driver either since it should already be setting the
> > REQ_FAILFAST attribute of I/Os for which it is providing multi-pathing,
> > similar to what the Linux dm-multipath driver already does.
> 
> Not in the driver, please ...
> 
> the SAM status BUSY is a well known one for array controllers to return
> while contemplating a failover.  Thus, if we think this is the issue,
> the mid-layer should be the entity to pass the status through on
> REQ_FAILFAST not the driver (i.e. pass SAM_STAT_BUSY through unmodified
> and alter the mid-layer).  However, I'd be unhappy about doing this:
> BUSY is a standard return for a lot of controllers for transient
> resource conditions, which wouldn't necessarily be alleviated on path
> failover.
> 
> James
> 

I share your concern about affecting the semantics of SAM_STAT_BUSY.
This is why the patch introduces a new host status instead of simply
changing the code for BUSY in scsi_decide_disposition to requeue only if
not REQ_FAILFAST.

The new host status is meant to provide the option for LLDs to enact
conditional requeuing of the cmd (conditional based on the REQ_FAILFAST
setting) which is not throttled by the cmd's retry count -- a capability
which is not there currently.

> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2007-01-10 16:45 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-01-09  1:37 [PATCH 2/5] fusion: vmware bug fix prevent inifinite retries Moore, Eric
2007-01-09 16:17 ` Michael Reed
2007-01-09 17:49   ` Manon Goo
  -- strict thread matches above, loose matches on Subject: below --
2007-01-10  5:44 Moore, Eric
2007-01-09 18:14 Adam Zimman
2007-01-09 20:55 ` Petr Vandrovec
2007-01-09 21:32   ` Edward Goggin
2007-01-10 16:10     ` James Bottomley
2007-01-10 16:44       ` Edward Goggin
2007-01-08 22:03 Moore, Eric
2007-01-08 22:24 ` James Bottomley
2007-01-05  3:46 Eric Moore
2007-01-06 15:30 ` James Bottomley
2007-01-06 16:10   ` Matthew Wilcox
2007-01-06 16:28     ` James Bottomley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox