public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
* [RESUBMIT][Patch] scsi_dh_rdac: retry IO for 06/3f/03 in rdac_check_sense fn
@ 2010-10-26 13:53 Chauhan, Vijay
  2010-10-26 18:55 ` Malahal Naineni
  2010-10-26 19:18 ` [dm-devel] " Mike Christie
  0 siblings, 2 replies; 7+ messages in thread
From: Chauhan, Vijay @ 2010-10-26 13:53 UTC (permalink / raw)
  To: James Bottomley, device-mapper development
  Cc: sekharan@us.ibm.com, linux-scsi@vger.kernel.org

Resubmitting this patch to get the attention.

This patch adds retry for the IO returned with 06/3f/03((INQUIRY_DATA_CHANGED)) sense code  in rdac_check_sense(). IO returned with 06/3f/03 from controller are currently failed by scsi mid layer, as a reason momentarily path failure is noticed by DM multipath. 

Signed-off-by: Vijay Chauhan<vijay.chauhan@lsi.com>
Reviewed-by: Babu Moger <babu.moger@lsi.com>
Reviewed-by: Bob Stankey <Robert.stankey@lsi.com>
---

diff -uprN linux-2.6.35-rc6-orig/drivers/scsi/device_handler/scsi_dh_rdac.c linux-2.6.35-rc6/drivers/scsi/device_handler/scsi_dh_rdac.c
--- linux-2.6.35-rc6-orig/drivers/scsi/device_handler/scsi_dh_rdac.c	2010-07-22 15:13:38.000000000 -0400
+++ linux-2.6.35-rc6/drivers/scsi/device_handler/scsi_dh_rdac.c	2010-07-27 12:13:58.000000000 -0400
@@ -738,6 +738,11 @@ static int rdac_check_sense(struct scsi_
 			 * Quiescence in progress , just retry.
 			 */
 			return ADD_TO_MLQUEUE;
+		if (sense_hdr->asc == 0x3f && sense_hdr->ascq == 0x03)
+			/*
+			 * INQUIRY DATA has changed, retry again.
+			 */
+			return ADD_TO_MLQUEUE;
 		break;
 	}
 	/* success just means we do not care what scsi-ml does */


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RESUBMIT][Patch] scsi_dh_rdac: retry IO for 06/3f/03 in rdac_check_sense fn
  2010-10-26 13:53 [RESUBMIT][Patch] scsi_dh_rdac: retry IO for 06/3f/03 in rdac_check_sense fn Chauhan, Vijay
@ 2010-10-26 18:55 ` Malahal Naineni
  2010-10-26 19:18 ` [dm-devel] " Mike Christie
  1 sibling, 0 replies; 7+ messages in thread
From: Malahal Naineni @ 2010-10-26 18:55 UTC (permalink / raw)
  To: device-mapper development, linux-scsi@vger.kernel.org

Chauhan, Vijay [Vijay.Chauhan@lsi.com] wrote:
> Resubmitting this patch to get the attention.
> 
> This patch adds retry for the IO returned with 06/3f/03((INQUIRY_DATA_CHANGED)) sense code  in rdac_check_sense(). IO returned with 06/3f/03 from controller are currently failed by scsi mid layer, as a reason momentarily path failure is noticed by DM multipath. 
> 
> Signed-off-by: Vijay Chauhan<vijay.chauhan@lsi.com>
> Reviewed-by: Babu Moger <babu.moger@lsi.com>
> Reviewed-by: Bob Stankey <Robert.stankey@lsi.com>
> ---
> 
> diff -uprN linux-2.6.35-rc6-orig/drivers/scsi/device_handler/scsi_dh_rdac.c linux-2.6.35-rc6/drivers/scsi/device_handler/scsi_dh_rdac.c
> --- linux-2.6.35-rc6-orig/drivers/scsi/device_handler/scsi_dh_rdac.c	2010-07-22 15:13:38.000000000 -0400
> +++ linux-2.6.35-rc6/drivers/scsi/device_handler/scsi_dh_rdac.c	2010-07-27 12:13:58.000000000 -0400
> @@ -738,6 +738,11 @@ static int rdac_check_sense(struct scsi_
>  			 * Quiescence in progress , just retry.
>  			 */
>  			return ADD_TO_MLQUEUE;
> +		if (sense_hdr->asc == 0x3f && sense_hdr->ascq == 0x03)
> +			/*
> +			 * INQUIRY DATA has changed, retry again.
> +			 */
> +			return ADD_TO_MLQUEUE;

The code looks fine.  Is this ASC/ASCQ code specific to RDAC devices? If
not, how about changing the scsi_error.c itself?

Thanks, Malahal.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [dm-devel] [RESUBMIT][Patch] scsi_dh_rdac: retry IO for 06/3f/03 in rdac_check_sense fn
  2010-10-26 13:53 [RESUBMIT][Patch] scsi_dh_rdac: retry IO for 06/3f/03 in rdac_check_sense fn Chauhan, Vijay
  2010-10-26 18:55 ` Malahal Naineni
@ 2010-10-26 19:18 ` Mike Christie
  2010-10-26 19:21   ` James Bottomley
  1 sibling, 1 reply; 7+ messages in thread
From: Mike Christie @ 2010-10-26 19:18 UTC (permalink / raw)
  To: device-mapper development
  Cc: Chauhan, Vijay, James Bottomley, linux-scsi@vger.kernel.org

On 10/26/2010 08:53 AM, Chauhan, Vijay wrote:
> Resubmitting this patch to get the attention.
>
> This patch adds retry for the IO returned with 06/3f/03((INQUIRY_DATA_CHANGED)) sense code  in rdac_check_sense(). IO returned with 06/3f/03 from controller are currently failed by scsi mid layer, as a reason momentarily path failure is noticed by DM multipath.
>

Is it getting failed by accident? In scsi_io_completion we check for UAs 
and will retry if the removable bit is not set. That check is after 
scsi_end_request though (is the scsi_end_request call failing the IO).

Did you guys also want REPORTED_LUNS_DATA_HAS_CHANGED to be retried too. 
I think scsi_dh_alua's REPORTED_LUNS_DATA_HAS_CHANGED maybe should be 
genericly retried, because it seems for both errors we will want to 
retry for all devices.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [dm-devel] [RESUBMIT][Patch] scsi_dh_rdac: retry IO for 06/3f/03 in rdac_check_sense fn
  2010-10-26 19:18 ` [dm-devel] " Mike Christie
@ 2010-10-26 19:21   ` James Bottomley
  2010-10-26 19:32     ` Shyam_Iyer
  0 siblings, 1 reply; 7+ messages in thread
From: James Bottomley @ 2010-10-26 19:21 UTC (permalink / raw)
  To: Mike Christie
  Cc: device-mapper development, Chauhan, Vijay, James Bottomley,
	linux-scsi@vger.kernel.org

On Tue, 2010-10-26 at 14:18 -0500, Mike Christie wrote:
> On 10/26/2010 08:53 AM, Chauhan, Vijay wrote:
> > Resubmitting this patch to get the attention.
> >
> > This patch adds retry for the IO returned with 06/3f/03((INQUIRY_DATA_CHANGED)) sense code  in rdac_check_sense(). IO returned with 06/3f/03 from controller are currently failed by scsi mid layer, as a reason momentarily path failure is noticed by DM multipath.
> >
> 
> Is it getting failed by accident? In scsi_io_completion we check for UAs 
> and will retry if the removable bit is not set. That check is after 
> scsi_end_request though (is the scsi_end_request call failing the IO).
> 
> Did you guys also want REPORTED_LUNS_DATA_HAS_CHANGED to be retried too. 
> I think scsi_dh_alua's REPORTED_LUNS_DATA_HAS_CHANGED maybe should be 
> genericly retried, because it seems for both errors we will want to 
> retry for all devices.

So my primary worry about patches like this is that it eats AENs ...
this is fine because, as Mike says, we should just ignore them.

However, the moment we start processing AENs (as another set of dm
people promise they have in process) we'll lose them from rdac arrays
and people will get unhappy.

If the generic UA retry isn't working, let's fix it there rather than
these hacks that would be hard to spot and pull out when (if) we ever
get a generic AEN infrastructure.

James



^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [dm-devel] [RESUBMIT][Patch] scsi_dh_rdac: retry IO for 06/3f/03 in rdac_check_sense fn
  2010-10-26 19:21   ` James Bottomley
@ 2010-10-26 19:32     ` Shyam_Iyer
  2010-10-26 19:58       ` James Bottomley
  0 siblings, 1 reply; 7+ messages in thread
From: Shyam_Iyer @ 2010-10-26 19:32 UTC (permalink / raw)
  To: James.Bottomley, michaelc; +Cc: dm-devel, Vijay.Chauhan, jejb, linux-scsi

 -----Original Message-----
> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of James
> Bottomley
> Sent: Tuesday, October 26, 2010 3:22 PM
> To: Mike Christie
> Cc: device-mapper development; Chauhan, Vijay; James Bottomley; linux-scsi@vger.kernel.org
> Subject: Re: [dm-devel] [RESUBMIT][Patch] scsi_dh_rdac: retry IO for 06/3f/03 in rdac_check_sense fn
> 
> On Tue, 2010-10-26 at 14:18 -0500, Mike Christie wrote:
> > On 10/26/2010 08:53 AM, Chauhan, Vijay wrote:
> > > Resubmitting this patch to get the attention.
> > >
> > > This patch adds retry for the IO returned with 06/3f/03((INQUIRY_DATA_CHANGED)) sense code  in
> rdac_check_sense(). IO returned with 06/3f/03 from controller are currently failed by scsi mid layer,
> as a reason momentarily path failure is noticed by DM multipath.
> > >
> >
> > Is it getting failed by accident? In scsi_io_completion we check for UAs
> > and will retry if the removable bit is not set. That check is after
> > scsi_end_request though (is the scsi_end_request call failing the IO).
> >
> > Did you guys also want REPORTED_LUNS_DATA_HAS_CHANGED to be retried too.
> > I think scsi_dh_alua's REPORTED_LUNS_DATA_HAS_CHANGED maybe should be
> > genericly retried, because it seems for both errors we will want to
> > retry for all devices.
> 
> So my primary worry about patches like this is that it eats AENs ...
> this is fine because, as Mike says, we should just ignore them.
> 
> However, the moment we start processing AENs (as another set of dm
> people promise they have in process) we'll lose them from rdac arrays
> and people will get unhappy.
> 
> If the generic UA retry isn't working, let's fix it there rather than
> these hacks that would be hard to spot and pull out when (if) we ever
> get a generic AEN infrastructure.
> 
> James

Sometimes the default way to handle a UA may be not the correct one. One arrays implementation to respond to the UA could be different from another array.

Example: A thin provisioning threshold exceed check condition. The device handler infrastructure can be a savior with such hacks.. 

-Shyam

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [dm-devel] [RESUBMIT][Patch] scsi_dh_rdac: retry IO for 06/3f/03 in rdac_check_sense fn
  2010-10-26 19:32     ` Shyam_Iyer
@ 2010-10-26 19:58       ` James Bottomley
  2010-10-26 20:27         ` Shyam_Iyer
  0 siblings, 1 reply; 7+ messages in thread
From: James Bottomley @ 2010-10-26 19:58 UTC (permalink / raw)
  To: Shyam_Iyer; +Cc: michaelc, dm-devel, Vijay.Chauhan, jejb, linux-scsi

On Wed, 2010-10-27 at 01:02 +0530, Shyam_Iyer@Dell.com wrote:
> -----Original Message-----
> > From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of James
> > Bottomley
> > Sent: Tuesday, October 26, 2010 3:22 PM
> > To: Mike Christie
> > Cc: device-mapper development; Chauhan, Vijay; James Bottomley; linux-scsi@vger.kernel.org
> > Subject: Re: [dm-devel] [RESUBMIT][Patch] scsi_dh_rdac: retry IO for 06/3f/03 in rdac_check_sense fn
> > 
> > On Tue, 2010-10-26 at 14:18 -0500, Mike Christie wrote:
> > > On 10/26/2010 08:53 AM, Chauhan, Vijay wrote:
> > > > Resubmitting this patch to get the attention.
> > > >
> > > > This patch adds retry for the IO returned with 06/3f/03((INQUIRY_DATA_CHANGED)) sense code  in
> > rdac_check_sense(). IO returned with 06/3f/03 from controller are currently failed by scsi mid layer,
> > as a reason momentarily path failure is noticed by DM multipath.
> > > >
> > >
> > > Is it getting failed by accident? In scsi_io_completion we check for UAs
> > > and will retry if the removable bit is not set. That check is after
> > > scsi_end_request though (is the scsi_end_request call failing the IO).
> > >
> > > Did you guys also want REPORTED_LUNS_DATA_HAS_CHANGED to be retried too.
> > > I think scsi_dh_alua's REPORTED_LUNS_DATA_HAS_CHANGED maybe should be
> > > genericly retried, because it seems for both errors we will want to
> > > retry for all devices.
> > 
> > So my primary worry about patches like this is that it eats AENs ...
> > this is fine because, as Mike says, we should just ignore them.
> > 
> > However, the moment we start processing AENs (as another set of dm
> > people promise they have in process) we'll lose them from rdac arrays
> > and people will get unhappy.
> > 
> > If the generic UA retry isn't working, let's fix it there rather than
> > these hacks that would be hard to spot and pull out when (if) we ever
> > get a generic AEN infrastructure.
> > 
> > James
> 
> Sometimes the default way to handle a UA may be not the correct one.
> One arrays implementation to respond to the UA could be different from
> another array.

I don't quite understand this observation in the context of this patch.
What the patch does is retry the command (i.e. ignore the UA) which
should pretty much be our default response.

> Example: A thin provisioning threshold exceed check condition. The
> device handler infrastructure can be a savior with such hacks.. 

I'm going to regret asking this (especially given all the noise there's
been on thin provisioning thresholds):  Which arrays don't actually
issue them correctly?

James



^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [dm-devel] [RESUBMIT][Patch] scsi_dh_rdac: retry IO for 06/3f/03 in rdac_check_sense fn
  2010-10-26 19:58       ` James Bottomley
@ 2010-10-26 20:27         ` Shyam_Iyer
  0 siblings, 0 replies; 7+ messages in thread
From: Shyam_Iyer @ 2010-10-26 20:27 UTC (permalink / raw)
  To: James.Bottomley; +Cc: michaelc, dm-devel, Vijay.Chauhan, jejb, linux-scsi

> -----Original Message-----
> From: James Bottomley [mailto:James.Bottomley@suse.de]
> Sent: Tuesday, October 26, 2010 3:59 PM
> To: Iyer, Shyam
> Cc: michaelc@cs.wisc.edu; dm-devel@redhat.com; Vijay.Chauhan@lsi.com; jejb@kernel.org; linux-
> scsi@vger.kernel.org
> Subject: RE: [dm-devel] [RESUBMIT][Patch] scsi_dh_rdac: retry IO for 06/3f/03 in rdac_check_sense fn
> 
> On Wed, 2010-10-27 at 01:02 +0530, Shyam_Iyer@Dell.com wrote:
> > -----Original Message-----
> > > From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of
> James
> > > Bottomley
> > > Sent: Tuesday, October 26, 2010 3:22 PM
> > > To: Mike Christie
> > > Cc: device-mapper development; Chauhan, Vijay; James Bottomley; linux-scsi@vger.kernel.org
> > > Subject: Re: [dm-devel] [RESUBMIT][Patch] scsi_dh_rdac: retry IO for 06/3f/03 in rdac_check_sense
> fn
> > >
> > > On Tue, 2010-10-26 at 14:18 -0500, Mike Christie wrote:
> > > > On 10/26/2010 08:53 AM, Chauhan, Vijay wrote:
> > > > > Resubmitting this patch to get the attention.
> > > > >
> > > > > This patch adds retry for the IO returned with 06/3f/03((INQUIRY_DATA_CHANGED)) sense code  in
> > > rdac_check_sense(). IO returned with 06/3f/03 from controller are currently failed by scsi mid
> layer,
> > > as a reason momentarily path failure is noticed by DM multipath.
> > > > >
> > > >
> > > > Is it getting failed by accident? In scsi_io_completion we check for UAs
> > > > and will retry if the removable bit is not set. That check is after
> > > > scsi_end_request though (is the scsi_end_request call failing the IO).
> > > >
> > > > Did you guys also want REPORTED_LUNS_DATA_HAS_CHANGED to be retried too.
> > > > I think scsi_dh_alua's REPORTED_LUNS_DATA_HAS_CHANGED maybe should be
> > > > genericly retried, because it seems for both errors we will want to
> > > > retry for all devices.
> > >
> > > So my primary worry about patches like this is that it eats AENs ...
> > > this is fine because, as Mike says, we should just ignore them.
> > >
> > > However, the moment we start processing AENs (as another set of dm
> > > people promise they have in process) we'll lose them from rdac arrays
> > > and people will get unhappy.
> > >
> > > If the generic UA retry isn't working, let's fix it there rather than
> > > these hacks that would be hard to spot and pull out when (if) we ever
> > > get a generic AEN infrastructure.
> > >
> > > James
> >
> > Sometimes the default way to handle a UA may be not the correct one.
> > One arrays implementation to respond to the UA could be different from
> > another array.
> 
> I don't quite understand this observation in the context of this patch.
> What the patch does is retry the command (i.e. ignore the UA) which
> should pretty much be our default response.
> 

Observation in this context is based on the move here to make UA handling generic.
 
> > Example: A thin provisioning threshold exceed check condition. The
> > device handler infrastructure can be a savior with such hacks..
> 
> I'm going to regret asking this (especially given all the noise there's
> been on thin provisioning thresholds):  Which arrays don't actually
> issue them correctly?
> 


Its not a question of correct vs incorrect. The implementation detail of how a thin provisioning threshold UA should be handled is outside the scope of T10 and so there are different architectures which deal with this situation differently. Viewing them all with the same prism may cause issues.

The spec only says it should be retried.. and no more. What administrative actions need to be taken based on the event is outside the scope of T10...

It is more of a flexibility to handle the retries based on the array architecture..


Shyam

> James
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-10-26 20:27 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-10-26 13:53 [RESUBMIT][Patch] scsi_dh_rdac: retry IO for 06/3f/03 in rdac_check_sense fn Chauhan, Vijay
2010-10-26 18:55 ` Malahal Naineni
2010-10-26 19:18 ` [dm-devel] " Mike Christie
2010-10-26 19:21   ` James Bottomley
2010-10-26 19:32     ` Shyam_Iyer
2010-10-26 19:58       ` James Bottomley
2010-10-26 20:27         ` Shyam_Iyer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox