* Re: [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices
[not found] ` <5424472E.4060300@linux.vnet.ibm.com>
@ 2014-09-25 16:57 ` Christoph Hellwig
2014-09-30 18:05 ` wenxiong
0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2014-09-25 16:57 UTC (permalink / raw)
To: Brian King; +Cc: device-mapper development, linux-scsi, James Bottomley
On Thu, Sep 25, 2014 at 11:47:42AM -0500, Brian King wrote:
> The issue we've run into started when this patch started making its
> way into distros:
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi/scsi_error.c?id=14216561e164671ce147458653b1fea06a4ada1e
>
> That changed the behaviour for user initiated TUR commands. After an ipr
> adapter gets reset, all disk array devices require a start unit command
> to be issued to them before they will accept commands. So, with the SCSI
> EH change, we now end up in a scenario with dual ipr adapters where the
> TUR getting issued from the health checker returns with a Not Ready response
> and since SCSI EH no longer triggers the Start Unit in this scenario,
> the path never recovers.
>
> The alternative solution would be to change the TUR path checker in multipath-tools
> to issue a Start Unit if it sees a 02/04/02.
Or we could fix up the check introduced by the commit, with something
ala:
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index a2c3d3d..7228d9e 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -459,13 +459,18 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
if (! scsi_command_normalize_sense(scmd, &sshdr))
return FAILED; /* no valid sense data */
- if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done)
+ if (scmd->cmnd[0] == TEST_UNIT_READY &&
+ scmd->request->cmd_type == REQ_TYPE_FS &&
+ scmd->scsi_done != scsi_eh_done) {
/*
* nasty: for mid-layer issued TURs, we need to return the
* actual sense data without any recovery attempt. For eh
- * issued ones, we need to try to recover and interpret
+ * issued ones, we need to try to recover and interpret,
+ * and for pass through TURs we just need to stay out of the
+ * way, so that the device handlers can do the right thing.
*/
return SUCCESS;
+ }
scsi_report_sense(sdev, &sshdr);
>
> Thanks,
>
> Brian
>
> --
> Brian King
> Power Linux I/O
> IBM Linux Technology Center
>
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
---end quoted text---
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices
2014-09-25 16:57 ` [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices Christoph Hellwig
@ 2014-09-30 18:05 ` wenxiong
2014-10-01 12:51 ` Christoph Hellwig
0 siblings, 1 reply; 6+ messages in thread
From: wenxiong @ 2014-09-30 18:05 UTC (permalink / raw)
To: device-mapper development, Christoph Hellwig
Cc: Brian King, James Bottomley, linux-scsi
Quoting Christoph Hellwig <hch@infradead.org>:
> On Thu, Sep 25, 2014 at 11:47:42AM -0500, Brian King wrote:
>> The issue we've run into started when this patch started making its
>> way into distros:
>>
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi/scsi_error.c?id=14216561e164671ce147458653b1fea06a4ada1e
>>
>> That changed the behaviour for user initiated TUR commands. After an ipr
>> adapter gets reset, all disk array devices require a start unit command
>> to be issued to them before they will accept commands. So, with the SCSI
>> EH change, we now end up in a scenario with dual ipr adapters where the
>> TUR getting issued from the health checker returns with a Not Ready response
>> and since SCSI EH no longer triggers the Start Unit in this scenario,
>> the path never recovers.
>>
>> The alternative solution would be to change the TUR path checker in
>> multipath-tools
>> to issue a Start Unit if it sees a 02/04/02.
>
> Or we could fix up the check introduced by the commit, with something
> ala:
>
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index a2c3d3d..7228d9e 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -459,13 +459,18 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
> if (! scsi_command_normalize_sense(scmd, &sshdr))
> return FAILED; /* no valid sense data */
>
> - if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done)
> + if (scmd->cmnd[0] == TEST_UNIT_READY &&
> + scmd->request->cmd_type == REQ_TYPE_FS &&
> + scmd->scsi_done != scsi_eh_done) {
> /*
> * nasty: for mid-layer issued TURs, we need to return the
> * actual sense data without any recovery attempt. For eh
> - * issued ones, we need to try to recover and interpret
> + * issued ones, we need to try to recover and interpret,
> + * and for pass through TURs we just need to stay out of the
> + * way, so that the device handlers can do the right thing.
> */
> return SUCCESS;
> + }
>
> scsi_report_sense(sdev, &sshdr);
>
>
Hi Christoph,
We have verified above patch in our test group system yesterday and
today. It works fine with their testcases.
Thanks,
Wendy
>>
>> Thanks,
>>
>> Brian
>>
>> --
>> Brian King
>> Power Linux I/O
>> IBM Linux Technology Center
>>
>>
>> --
>> dm-devel mailing list
>> dm-devel@redhat.com
>> https://www.redhat.com/mailman/listinfo/dm-devel
> ---end quoted text---
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices
2014-09-30 18:05 ` wenxiong
@ 2014-10-01 12:51 ` Christoph Hellwig
2014-10-06 15:22 ` Brian King
0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2014-10-01 12:51 UTC (permalink / raw)
To: wenxiong
Cc: device-mapper development, Christoph Hellwig, Brian King,
James Bottomley, linux-scsi
Unfortunately the patch wasn't quite correct - all TEST_UNIT_READY
commands are sent as BLOCK_PC, so this would basically revert James'
original fix for the SATL case.
Am I right to assume you only need the call to scsi_dh->check_sense and
not the rest of the handling for the multipath path checker? If that's
the case something like the patch below sould work:
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 5db8454..399c1c8 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -459,14 +459,6 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
if (! scsi_command_normalize_sense(scmd, &sshdr))
return FAILED; /* no valid sense data */
- if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done)
- /*
- * nasty: for mid-layer issued TURs, we need to return the
- * actual sense data without any recovery attempt. For eh
- * issued ones, we need to try to recover and interpret
- */
- return SUCCESS;
-
scsi_report_sense(sdev, &sshdr);
if (scsi_sense_is_deferred(&sshdr))
@@ -482,6 +474,14 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
/* handler does not care. Drop down to default handling */
}
+ if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done)
+ /*
+ * nasty: for mid-layer issued TURs, we need to return the
+ * actual sense data without any recovery attempt. For eh
+ * issued ones, we need to try to recover and interpret
+ */
+ return SUCCESS;
+
/*
* Previous logic looked for FILEMARK, EOM or ILI which are
* mainly associated with tapes and returned SUCCESS.
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices
2014-10-01 12:51 ` Christoph Hellwig
@ 2014-10-06 15:22 ` Brian King
2014-10-06 21:50 ` wenxiong
0 siblings, 1 reply; 6+ messages in thread
From: Brian King @ 2014-10-06 15:22 UTC (permalink / raw)
To: Christoph Hellwig, wenxiong
Cc: device-mapper development, James Bottomley, linux-scsi
On 10/01/2014 07:51 AM, Christoph Hellwig wrote:
> Unfortunately the patch wasn't quite correct - all TEST_UNIT_READY
> commands are sent as BLOCK_PC, so this would basically revert James'
> original fix for the SATL case.
>
> Am I right to assume you only need the call to scsi_dh->check_sense and
> not the rest of the handling for the multipath path checker? If that's
> the case something like the patch below sould work:
This would work if we also duplicated the 02/04/02 K/C/Q check in alua_check_sense
handler.
Wendy - can you try my patch below, along with Christoph's latest patch here
and see if that resolves the issue?
Thanks,
Brian
>
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 5db8454..399c1c8 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -459,14 +459,6 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
> if (! scsi_command_normalize_sense(scmd, &sshdr))
> return FAILED; /* no valid sense data */
>
> - if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done)
> - /*
> - * nasty: for mid-layer issued TURs, we need to return the
> - * actual sense data without any recovery attempt. For eh
> - * issued ones, we need to try to recover and interpret
> - */
> - return SUCCESS;
> -
> scsi_report_sense(sdev, &sshdr);
>
> if (scsi_sense_is_deferred(&sshdr))
> @@ -482,6 +474,14 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
> /* handler does not care. Drop down to default handling */
> }
>
> + if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done)
> + /*
> + * nasty: for mid-layer issued TURs, we need to return the
> + * actual sense data without any recovery attempt. For eh
> + * issued ones, we need to try to recover and interpret
> + */
> + return SUCCESS;
> +
> /*
> * Previous logic looked for FILEMARK, EOM or ILI which are
> * mainly associated with tapes and returned SUCCESS.
>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---
drivers/scsi/device_handler/scsi_dh_alua.c | 7 +++++++
1 file changed, 7 insertions(+)
diff -puN drivers/scsi/device_handler/scsi_dh_alua.c~alua_allow_restart drivers/scsi/device_handler/scsi_dh_alua.c
--- linux/drivers/scsi/device_handler/scsi_dh_alua.c~alua_allow_restart 2014-10-06 10:19:16.184798305 -0500
+++ linux-bjking1/drivers/scsi/device_handler/scsi_dh_alua.c 2014-10-06 10:20:35.743165951 -0500
@@ -474,6 +474,13 @@ static int alua_check_sense(struct scsi_
* LUN Not Ready -- Offline
*/
return SUCCESS;
+ if (sdev->allow_restart &&
+ (sense_hdr->asc == 0x04) && (sense_hdr->ascq == 0x02))
+ /*
+ * if the device is not started, we need to wake
+ * the error handler to start the motor
+ */
+ return FAILED;
break;
case UNIT_ATTENTION:
if (sense_hdr->asc == 0x29 && sense_hdr->ascq == 0x00)
_
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices
2014-10-06 15:22 ` Brian King
@ 2014-10-06 21:50 ` wenxiong
2014-10-21 11:03 ` Christoph Hellwig
0 siblings, 1 reply; 6+ messages in thread
From: wenxiong @ 2014-10-06 21:50 UTC (permalink / raw)
To: device-mapper development, Brian King
Cc: Christoph Hellwig, linux-scsi, James Bottomley
Quoting Brian King <brking@linux.vnet.ibm.com>:
> On 10/01/2014 07:51 AM, Christoph Hellwig wrote:
>> Unfortunately the patch wasn't quite correct - all TEST_UNIT_READY
>> commands are sent as BLOCK_PC, so this would basically revert James'
>> original fix for the SATL case.
>>
>> Am I right to assume you only need the call to scsi_dh->check_sense and
>> not the rest of the handling for the multipath path checker? If that's
>> the case something like the patch below sould work:
>
> This would work if we also duplicated the 02/04/02 K/C/Q check in
> alua_check_sense
> handler.
>
> Wendy - can you try my patch below, along with Christoph's latest patch here
> and see if that resolves the issue?
>
> Thanks,
>
> Brian
>
>>
>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>> index 5db8454..399c1c8 100644
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -459,14 +459,6 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
>> if (! scsi_command_normalize_sense(scmd, &sshdr))
>> return FAILED; /* no valid sense data */
>>
>> - if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done)
>> - /*
>> - * nasty: for mid-layer issued TURs, we need to return the
>> - * actual sense data without any recovery attempt. For eh
>> - * issued ones, we need to try to recover and interpret
>> - */
>> - return SUCCESS;
>> -
>> scsi_report_sense(sdev, &sshdr);
>>
>> if (scsi_sense_is_deferred(&sshdr))
>> @@ -482,6 +474,14 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
>> /* handler does not care. Drop down to default handling */
>> }
>>
>> + if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done)
>> + /*
>> + * nasty: for mid-layer issued TURs, we need to return the
>> + * actual sense data without any recovery attempt. For eh
>> + * issued ones, we need to try to recover and interpret
>> + */
>> + return SUCCESS;
>> +
>> /*
>> * Previous logic looked for FILEMARK, EOM or ILI which are
>> * mainly associated with tapes and returned SUCCESS.
>>
>
>
>
> Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
> ---
>
> drivers/scsi/device_handler/scsi_dh_alua.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff -puN
> drivers/scsi/device_handler/scsi_dh_alua.c~alua_allow_restart
> drivers/scsi/device_handler/scsi_dh_alua.c
> ---
> linux/drivers/scsi/device_handler/scsi_dh_alua.c~alua_allow_restart 2014-10-06 10:19:16.184798305
> -0500
> +++
> linux-bjking1/drivers/scsi/device_handler/scsi_dh_alua.c 2014-10-06
> 10:20:35.743165951 -0500
> @@ -474,6 +474,13 @@ static int alua_check_sense(struct scsi_
> * LUN Not Ready -- Offline
> */
> return SUCCESS;
> + if (sdev->allow_restart &&
> + (sense_hdr->asc == 0x04) && (sense_hdr->ascq == 0x02))
> + /*
> + * if the device is not started, we need to wake
> + * the error handler to start the motor
> + */
> + return FAILED;
> break;
> case UNIT_ATTENTION:
> if (sense_hdr->asc == 0x29 && sense_hdr->ascq == 0x00)
> _
>
> --
Sorry it took some time since we need to re-config the systems for this test.
With Christoph's new patch only, still saw the failure.
With Christoph's new patch + Brian's patch, works fine, didn't see the
failure.
Thanks,
Wendy
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices
2014-10-06 21:50 ` wenxiong
@ 2014-10-21 11:03 ` Christoph Hellwig
0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2014-10-21 11:03 UTC (permalink / raw)
To: wenxiong
Cc: device-mapper development, Brian King, Christoph Hellwig,
linux-scsi, James Bottomley
On Mon, Oct 06, 2014 at 05:50:32PM -0400, wenxiong@linux.vnet.ibm.com wrote:
> Sorry it took some time since we need to re-config the systems for this test.
>
> With Christoph's new patch only, still saw the failure.
> With Christoph's new patch + Brian's patch, works fine, didn't see the
> failure.
Can one of you send me a tested series with both patches?
Thanks!
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2014-10-21 11:03 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20140924195721.148637349@linux.vnet.ibm.com>
[not found] ` <20140924195752.289011902@linux.vnet.ibm.com>
[not found] ` <5423B664.2020007@suse.de>
[not found] ` <5424472E.4060300@linux.vnet.ibm.com>
2014-09-25 16:57 ` [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices Christoph Hellwig
2014-09-30 18:05 ` wenxiong
2014-10-01 12:51 ` Christoph Hellwig
2014-10-06 15:22 ` Brian King
2014-10-06 21:50 ` wenxiong
2014-10-21 11:03 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox