public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices
       [not found]     ` <5424472E.4060300@linux.vnet.ibm.com>
@ 2014-09-25 16:57       ` Christoph Hellwig
  2014-09-30 18:05         ` wenxiong
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2014-09-25 16:57 UTC (permalink / raw)
  To: Brian King; +Cc: device-mapper development, linux-scsi, James Bottomley

On Thu, Sep 25, 2014 at 11:47:42AM -0500, Brian King wrote:
> The issue we've run into started when this patch started making its
> way into distros:
> 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi/scsi_error.c?id=14216561e164671ce147458653b1fea06a4ada1e
> 
> That changed the behaviour for user initiated TUR commands. After an ipr
> adapter gets reset, all disk array devices require a start unit command
> to be issued to them before they will accept commands. So, with the SCSI
> EH change, we now end up in a scenario with dual ipr adapters where the
> TUR getting issued from the health checker returns with a Not Ready response
> and since SCSI EH no longer triggers the Start Unit in this scenario,
> the path never recovers.
> 
> The alternative solution would be to change the TUR path checker in multipath-tools
> to issue a Start Unit if it sees a 02/04/02.

Or we could fix up the check introduced by the commit, with something
ala:

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index a2c3d3d..7228d9e 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -459,13 +459,18 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
 	if (! scsi_command_normalize_sense(scmd, &sshdr))
 		return FAILED;	/* no valid sense data */
 
-	if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done)
+	if (scmd->cmnd[0] == TEST_UNIT_READY &&
+	    scmd->request->cmd_type == REQ_TYPE_FS &&
+	    scmd->scsi_done != scsi_eh_done) {
 		/*
 		 * nasty: for mid-layer issued TURs, we need to return the
 		 * actual sense data without any recovery attempt.  For eh
-		 * issued ones, we need to try to recover and interpret
+		 * issued ones, we need to try to recover and interpret,
+		 * and for pass through TURs we just need to stay out of the
+		 * way, so that the device handlers can do the right thing.
 		 */
 		return SUCCESS;
+	}
 
 	scsi_report_sense(sdev, &sshdr);
 

> 
> Thanks,
> 
> Brian
> 
> -- 
> Brian King
> Power Linux I/O
> IBM Linux Technology Center
> 
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
---end quoted text---

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices
  2014-09-25 16:57       ` [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices Christoph Hellwig
@ 2014-09-30 18:05         ` wenxiong
  2014-10-01 12:51           ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: wenxiong @ 2014-09-30 18:05 UTC (permalink / raw)
  To: device-mapper development, Christoph Hellwig
  Cc: Brian King, James Bottomley, linux-scsi


Quoting Christoph Hellwig <hch@infradead.org>:

> On Thu, Sep 25, 2014 at 11:47:42AM -0500, Brian King wrote:
>> The issue we've run into started when this patch started making its
>> way into distros:
>>
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi/scsi_error.c?id=14216561e164671ce147458653b1fea06a4ada1e
>>
>> That changed the behaviour for user initiated TUR commands. After an ipr
>> adapter gets reset, all disk array devices require a start unit command
>> to be issued to them before they will accept commands. So, with the SCSI
>> EH change, we now end up in a scenario with dual ipr adapters where the
>> TUR getting issued from the health checker returns with a Not Ready response
>> and since SCSI EH no longer triggers the Start Unit in this scenario,
>> the path never recovers.
>>
>> The alternative solution would be to change the TUR path checker in  
>> multipath-tools
>> to issue a Start Unit if it sees a 02/04/02.
>
> Or we could fix up the check introduced by the commit, with something
> ala:
>
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index a2c3d3d..7228d9e 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -459,13 +459,18 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
>  	if (! scsi_command_normalize_sense(scmd, &sshdr))
>  		return FAILED;	/* no valid sense data */
>
> -	if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done)
> +	if (scmd->cmnd[0] == TEST_UNIT_READY &&
> +	    scmd->request->cmd_type == REQ_TYPE_FS &&
> +	    scmd->scsi_done != scsi_eh_done) {
>  		/*
>  		 * nasty: for mid-layer issued TURs, we need to return the
>  		 * actual sense data without any recovery attempt.  For eh
> -		 * issued ones, we need to try to recover and interpret
> +		 * issued ones, we need to try to recover and interpret,
> +		 * and for pass through TURs we just need to stay out of the
> +		 * way, so that the device handlers can do the right thing.
>  		 */
>  		return SUCCESS;
> +	}
>
>  	scsi_report_sense(sdev, &sshdr);
>
>

Hi Christoph,

We have verified above patch in our test group system yesterday and  
today. It works fine with their testcases.

Thanks,
Wendy


>>
>> Thanks,
>>
>> Brian
>>
>> --
>> Brian King
>> Power Linux I/O
>> IBM Linux Technology Center
>>
>>
>> --
>> dm-devel mailing list
>> dm-devel@redhat.com
>> https://www.redhat.com/mailman/listinfo/dm-devel
> ---end quoted text---
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices
  2014-09-30 18:05         ` wenxiong
@ 2014-10-01 12:51           ` Christoph Hellwig
  2014-10-06 15:22             ` Brian King
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2014-10-01 12:51 UTC (permalink / raw)
  To: wenxiong
  Cc: device-mapper development, Christoph Hellwig, Brian King,
	James Bottomley, linux-scsi

Unfortunately the patch wasn't quite correct - all TEST_UNIT_READY
commands are sent as BLOCK_PC, so this would basically revert James'
original fix for the SATL case.

Am I right to assume you only need the call to scsi_dh->check_sense and
not the rest of the handling for the multipath path checker?  If that's
the case something like the patch below sould work:

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 5db8454..399c1c8 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -459,14 +459,6 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
 	if (! scsi_command_normalize_sense(scmd, &sshdr))
 		return FAILED;	/* no valid sense data */
 
-	if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done)
-		/*
-		 * nasty: for mid-layer issued TURs, we need to return the
-		 * actual sense data without any recovery attempt.  For eh
-		 * issued ones, we need to try to recover and interpret
-		 */
-		return SUCCESS;
-
 	scsi_report_sense(sdev, &sshdr);
 
 	if (scsi_sense_is_deferred(&sshdr))
@@ -482,6 +474,14 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
 		/* handler does not care. Drop down to default handling */
 	}
 
+	if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done)
+		/*
+		 * nasty: for mid-layer issued TURs, we need to return the
+		 * actual sense data without any recovery attempt.  For eh
+		 * issued ones, we need to try to recover and interpret
+		 */
+		return SUCCESS;
+
 	/*
 	 * Previous logic looked for FILEMARK, EOM or ILI which are
 	 * mainly associated with tapes and returned SUCCESS.

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices
  2014-10-01 12:51           ` Christoph Hellwig
@ 2014-10-06 15:22             ` Brian King
  2014-10-06 21:50               ` wenxiong
  0 siblings, 1 reply; 6+ messages in thread
From: Brian King @ 2014-10-06 15:22 UTC (permalink / raw)
  To: Christoph Hellwig, wenxiong
  Cc: device-mapper development, James Bottomley, linux-scsi

On 10/01/2014 07:51 AM, Christoph Hellwig wrote:
> Unfortunately the patch wasn't quite correct - all TEST_UNIT_READY
> commands are sent as BLOCK_PC, so this would basically revert James'
> original fix for the SATL case.
> 
> Am I right to assume you only need the call to scsi_dh->check_sense and
> not the rest of the handling for the multipath path checker?  If that's
> the case something like the patch below sould work:

This would work if we also duplicated the 02/04/02 K/C/Q check in alua_check_sense
handler.

Wendy - can you try my patch below, along with Christoph's latest patch here
and see if that resolves the issue?

Thanks,

Brian

> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 5db8454..399c1c8 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -459,14 +459,6 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
>  	if (! scsi_command_normalize_sense(scmd, &sshdr))
>  		return FAILED;	/* no valid sense data */
> 
> -	if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done)
> -		/*
> -		 * nasty: for mid-layer issued TURs, we need to return the
> -		 * actual sense data without any recovery attempt.  For eh
> -		 * issued ones, we need to try to recover and interpret
> -		 */
> -		return SUCCESS;
> -
>  	scsi_report_sense(sdev, &sshdr);
> 
>  	if (scsi_sense_is_deferred(&sshdr))
> @@ -482,6 +474,14 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
>  		/* handler does not care. Drop down to default handling */
>  	}
> 
> +	if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done)
> +		/*
> +		 * nasty: for mid-layer issued TURs, we need to return the
> +		 * actual sense data without any recovery attempt.  For eh
> +		 * issued ones, we need to try to recover and interpret
> +		 */
> +		return SUCCESS;
> +
>  	/*
>  	 * Previous logic looked for FILEMARK, EOM or ILI which are
>  	 * mainly associated with tapes and returned SUCCESS.
> 



Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---

 drivers/scsi/device_handler/scsi_dh_alua.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff -puN drivers/scsi/device_handler/scsi_dh_alua.c~alua_allow_restart drivers/scsi/device_handler/scsi_dh_alua.c
--- linux/drivers/scsi/device_handler/scsi_dh_alua.c~alua_allow_restart	2014-10-06 10:19:16.184798305 -0500
+++ linux-bjking1/drivers/scsi/device_handler/scsi_dh_alua.c	2014-10-06 10:20:35.743165951 -0500
@@ -474,6 +474,13 @@ static int alua_check_sense(struct scsi_
 			 * LUN Not Ready -- Offline
 			 */
 			return SUCCESS;
+		if (sdev->allow_restart &&
+		    (sense_hdr->asc == 0x04) && (sense_hdr->ascq == 0x02))
+			/*
+			 * if the device is not started, we need to wake
+			 * the error handler to start the motor
+			 */
+			return FAILED;
 		break;
 	case UNIT_ATTENTION:
 		if (sense_hdr->asc == 0x29 && sense_hdr->ascq == 0x00)
_


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices
  2014-10-06 15:22             ` Brian King
@ 2014-10-06 21:50               ` wenxiong
  2014-10-21 11:03                 ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: wenxiong @ 2014-10-06 21:50 UTC (permalink / raw)
  To: device-mapper development, Brian King
  Cc: Christoph Hellwig, linux-scsi, James Bottomley


Quoting Brian King <brking@linux.vnet.ibm.com>:

> On 10/01/2014 07:51 AM, Christoph Hellwig wrote:
>> Unfortunately the patch wasn't quite correct - all TEST_UNIT_READY
>> commands are sent as BLOCK_PC, so this would basically revert James'
>> original fix for the SATL case.
>>
>> Am I right to assume you only need the call to scsi_dh->check_sense and
>> not the rest of the handling for the multipath path checker?  If that's
>> the case something like the patch below sould work:
>
> This would work if we also duplicated the 02/04/02 K/C/Q check in  
> alua_check_sense
> handler.
>
> Wendy - can you try my patch below, along with Christoph's latest patch here
> and see if that resolves the issue?
>
> Thanks,
>
> Brian
>
>>
>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>> index 5db8454..399c1c8 100644
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -459,14 +459,6 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
>>  	if (! scsi_command_normalize_sense(scmd, &sshdr))
>>  		return FAILED;	/* no valid sense data */
>>
>> -	if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done)
>> -		/*
>> -		 * nasty: for mid-layer issued TURs, we need to return the
>> -		 * actual sense data without any recovery attempt.  For eh
>> -		 * issued ones, we need to try to recover and interpret
>> -		 */
>> -		return SUCCESS;
>> -
>>  	scsi_report_sense(sdev, &sshdr);
>>
>>  	if (scsi_sense_is_deferred(&sshdr))
>> @@ -482,6 +474,14 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
>>  		/* handler does not care. Drop down to default handling */
>>  	}
>>
>> +	if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done)
>> +		/*
>> +		 * nasty: for mid-layer issued TURs, we need to return the
>> +		 * actual sense data without any recovery attempt.  For eh
>> +		 * issued ones, we need to try to recover and interpret
>> +		 */
>> +		return SUCCESS;
>> +
>>  	/*
>>  	 * Previous logic looked for FILEMARK, EOM or ILI which are
>>  	 * mainly associated with tapes and returned SUCCESS.
>>
>
>
>
> Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
> ---
>
>  drivers/scsi/device_handler/scsi_dh_alua.c |    7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff -puN  
> drivers/scsi/device_handler/scsi_dh_alua.c~alua_allow_restart  
> drivers/scsi/device_handler/scsi_dh_alua.c
> ---  
> linux/drivers/scsi/device_handler/scsi_dh_alua.c~alua_allow_restart	2014-10-06 10:19:16.184798305  
> -0500
> +++  
> linux-bjking1/drivers/scsi/device_handler/scsi_dh_alua.c	2014-10-06  
> 10:20:35.743165951 -0500
> @@ -474,6 +474,13 @@ static int alua_check_sense(struct scsi_
>  			 * LUN Not Ready -- Offline
>  			 */
>  			return SUCCESS;
> +		if (sdev->allow_restart &&
> +		    (sense_hdr->asc == 0x04) && (sense_hdr->ascq == 0x02))
> +			/*
> +			 * if the device is not started, we need to wake
> +			 * the error handler to start the motor
> +			 */
> +			return FAILED;
>  		break;
>  	case UNIT_ATTENTION:
>  		if (sense_hdr->asc == 0x29 && sense_hdr->ascq == 0x00)
> _
>
> --

Sorry it took some time since we need to re-config the systems for this test.

With Christoph's new patch only, still saw the failure.
With Christoph's new patch + Brian's patch, works fine, didn't see the  
failure.


Thanks,
Wendy


> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices
  2014-10-06 21:50               ` wenxiong
@ 2014-10-21 11:03                 ` Christoph Hellwig
  0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2014-10-21 11:03 UTC (permalink / raw)
  To: wenxiong
  Cc: device-mapper development, Brian King, Christoph Hellwig,
	linux-scsi, James Bottomley

On Mon, Oct 06, 2014 at 05:50:32PM -0400, wenxiong@linux.vnet.ibm.com wrote:
> Sorry it took some time since we need to re-config the systems for this test.
> 
> With Christoph's new patch only, still saw the failure.
> With Christoph's new patch + Brian's patch, works fine, didn't see the
> failure.

Can one of you send me a tested series with both patches?

Thanks!

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-10-21 11:03 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20140924195721.148637349@linux.vnet.ibm.com>
     [not found] ` <20140924195752.289011902@linux.vnet.ibm.com>
     [not found]   ` <5423B664.2020007@suse.de>
     [not found]     ` <5424472E.4060300@linux.vnet.ibm.com>
2014-09-25 16:57       ` [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices Christoph Hellwig
2014-09-30 18:05         ` wenxiong
2014-10-01 12:51           ` Christoph Hellwig
2014-10-06 15:22             ` Brian King
2014-10-06 21:50               ` wenxiong
2014-10-21 11:03                 ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox